=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============
PDB (Program Database) is a file format invented by Microsoft that contains
debug information for consumption by debuggers and other tools. Because
officially supported APIs exist on Windows for querying debug information from
PDBs without the user needing to understand the internals of the file format, a
large ecosystem of tools has been built for Windows to consume this format. For
Clang to generate programs that can interoperate with these tools, we must be
able to generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here. We
therefore need to understand the PDB file format at the byte level so that
we can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within it, the format of individual
records within those streams, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today. Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. important::
   Unless otherwise specified, all numeric values are encoded in little endian.
   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
   assume it is little endian!

.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashStream
   CodeViewSymbols
   CodeViewTypes

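
As a minimal illustration of the little-endian rule above, the following Python
sketch decodes a ``uint16_t`` and a ``uint64_t`` from raw bytes using the
standard ``struct`` module. The byte values are made up for the example and are
not taken from a real PDB file.

```python
import struct

# Hypothetical bytes for illustration only; not taken from a real PDB.
raw = bytes([0x34, 0x12, 0xEF, 0xCD, 0xAB, 0x89, 0x67, 0x45, 0x23, 0x01])

# "<" selects little-endian with standard sizes and no padding:
# H = uint16_t, Q = uint64_t.
u16, u64 = struct.unpack_from("<HQ", raw, 0)

print(hex(u16))  # 0x1234
print(hex(u64))  # 0x123456789abcdef
```

The least significant byte comes first in the file, so the bytes ``34 12``
decode to ``0x1234`` rather than ``0x3412``.
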

.. _msf:

The MSF Container
-----------------

A PDB file is a special case of an MSF (Multi-Stream Format) file. An MSF file
is effectively a miniature "file system within a file". It contains multiple
streams (analogous to files) that can hold arbitrary data. Each stream is
divided into blocks, which are not necessarily laid out contiguously within the
file (that is, a stream may be fragmented). Additionally, the MSF contains a
stream directory (analogous to a file system's master file table, or MFT) which
describes how the streams are laid out within the MSF.

For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.

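
To make the "file system within a file" idea concrete, here is a minimal Python
sketch of how a fragmented stream could be reassembled from its blocks. It is
an illustration under stated assumptions, not the actual MSF reader: the
function name ``read_stream``, the 4-byte block size, and the block list are
all hypothetical. In a real MSF file the block size and each stream's block
list come from the structures described in :doc:`MsfFile`.

```python
from typing import Sequence


def read_stream(file_bytes: bytes, block_size: int,
                block_indices: Sequence[int], stream_size: int) -> bytes:
    """Concatenate a stream's blocks in order, then trim to the stream length.

    Hypothetical helper for illustration; the arguments stand in for
    values a real reader would obtain from the MSF stream directory.
    """
    out = bytearray()
    for index in block_indices:
        start = index * block_size
        out += file_bytes[start:start + block_size]
    # The last block may be only partially used by the stream.
    return bytes(out[:stream_size])


# Toy "file" made of four 4-byte blocks. Block 1 belongs to some other stream.
toy_file = b"o wo" + b"????" + b"Hell" + b"rld!"

# The stream occupies blocks 2, 0, 3 (non-contiguous) and is 11 bytes long.
data = read_stream(toy_file, block_size=4, block_indices=[2, 0, 3],
                   stream_size=11)
print(data)  # b'Hello world'
```

The order of the block list, not the physical order of the blocks in the file,
determines the order of the stream's bytes.
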

.. _streams:

Streams
-------

The PDB format contains a number of streams that describe information such as
the types, symbols, source files, and compilands (e.g. object files) of a
program. Additional streams contain hash tables that debuggers and other tools
use for fast lookup of records and types by name, along with other information
about how the program was compiled, such as the specific toolchain used. A
summary of streams contained in a PDB file is as follows:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+

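
The fixed stream indices in the table above can be captured in a small
enumeration. The following Python sketch is only one way to express them: the
names ``FixedStream`` and ``is_fixed_stream`` are hypothetical and chosen for
this example, while the numeric values are the ones listed in the table.

```python
from enum import IntEnum


class FixedStream(IntEnum):
    """Fixed stream indices from the summary table (names are illustrative)."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def is_fixed_stream(index: int) -> bool:
    """Return True if the index is one of the fixed stream indices."""
    try:
        FixedStream(index)
    except ValueError:
        return False
    return True


print(FixedStream.DBI.value)   # 3
print(FixedStream(2).name)     # TPI
print(is_fixed_stream(7))      # False
```

All other streams have no fixed index. A reader has to discover them through
the PDB Stream's named stream map or through indices stored in the DBI, TPI,
and IPI streams, as the "Stream Index" column indicates.
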

More information about the structure of each of these can be found on the
following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams including the Module Substreams,
   source file information, and CodeView symbol records contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for each compilation
   unit, and the format of symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashStream`
   Information about the Hash Table stream, and how it can be used to quickly look up records
   by name.


CodeView
========

CodeView is another format that comes into the picture. While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
more information about the CodeView format.