|  | ======================================== | 
|  | Precompiled Header and Modules Internals | 
|  | ======================================== | 
|  |  | 
|  | .. contents:: | 
|  |    :local: | 
|  |  | 
|  | This document describes the design and implementation of Clang's precompiled | 
|  | headers (PCH) and modules.  If you are interested in the end-user view, please | 
|  | see the :ref:`User's Manual <usersmanual-precompiled-headers>`. | 
|  |  | 
|  | Using Precompiled Headers with ``clang`` | 
|  | ---------------------------------------- | 
|  |  | 
|  | The Clang compiler frontend, ``clang -cc1``, supports two command line options | 
|  | for generating and using PCH files. | 
|  |  | 
|  | To generate PCH files using ``clang -cc1``, use the option ``-emit-pch``: | 
|  |  | 
|  | .. code-block:: bash | 
|  |  | 
|  |    $ clang -cc1 test.h -emit-pch -o test.h.pch | 
|  |  | 
|  | This option is transparently used by ``clang`` when generating PCH files.  The | 
|  | resulting PCH file contains the serialized form of the compiler's internal | 
|  | representation after it has completed parsing and semantic analysis.  The PCH | 
|  | file can then be used as a prefix header with the ``-include-pch`` | 
|  | option: | 
|  |  | 
|  | .. code-block:: bash | 
|  |  | 
|  |    $ clang -cc1 -include-pch test.h.pch test.c -o test.s | 
|  |  | 
|  | Design Philosophy | 
|  | ----------------- | 
|  |  | 
|  | Precompiled headers are meant to improve overall compile times for projects, so | 
|  | the design of precompiled headers is entirely driven by performance concerns. | 
|  | The use case for precompiled headers is relatively simple: when there is a | 
|  | common set of headers that is included in nearly every source file in the | 
|  | project, we *precompile* that bundle of headers into a single precompiled | 
|  | header (PCH file).  Then, when compiling the source files in the project, we | 
|  | load the PCH file first (as a prefix header), which acts as a stand-in for that | 
|  | bundle of headers. | 
|  |  | 
|  | A precompiled header implementation improves performance when: | 
|  |  | 
|  | * Loading the PCH file is significantly faster than re-parsing the bundle of | 
|  | headers stored within the PCH file.  Thus, a precompiled header design | 
|  | attempts to minimize the cost of reading the PCH file.  Ideally, this cost | 
|  | should not vary with the size of the precompiled header file. | 
|  |  | 
|  | * The cost of generating the PCH file initially is not so large that it | 
|  | counters the per-source-file performance improvement due to eliminating the | 
|  | need to parse the bundled headers in the first place.  This is particularly | 
|  | important on multi-core systems, because PCH file generation serializes the | 
|  | build when all compilations require the PCH file to be up-to-date. | 
|  |  | 
|  | Modules, as implemented in Clang, use the same mechanisms as precompiled | 
|  | headers to save a serialized AST file (one per module) and to load those AST | 
|  | files.  From an implementation standpoint, modules are a generalization of | 
|  | precompiled headers, lifting a number of restrictions placed on precompiled | 
|  | headers.  In particular, a translation unit can use only one precompiled | 
|  | header, which must be included at the beginning of the translation unit.  The | 
|  | extensions to the | 
|  | AST file format required for modules are discussed in the section on | 
|  | :ref:`modules <pchinternals-modules>`. | 
|  |  | 
|  | Clang's AST files are designed with a compact on-disk representation, which | 
|  | minimizes both creation time and the time required to initially load the AST | 
|  | file.  The AST file itself contains a serialized representation of Clang's | 
|  | abstract syntax trees and supporting data structures, stored using the same | 
|  | compressed bitstream as `LLVM's bitcode file format | 
|  | <http://llvm.org/docs/BitCodeFormat.html>`_. | 
|  |  | 
|  | Clang's AST files are loaded "lazily" from disk.  When an AST file is initially | 
|  | loaded, Clang reads only a small amount of data from the AST file to establish | 
|  | where certain important data structures are stored.  The amount of data read in | 
|  | this initial load is independent of the size of the AST file, such that a | 
|  | larger AST file does not lead to longer AST load times.  The actual header data | 
|  | in the AST file --- macros, functions, variables, types, etc. --- is loaded | 
|  | only when it is referenced from the user's code, at which point only that | 
|  | entity (and those entities it depends on) is deserialized from the AST file. | 
|  | With this approach, the cost of using an AST file for a translation unit is | 
|  | proportional to the amount of code actually used from the AST file, rather than | 
|  | being proportional to the size of the AST file itself. | 
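
The lazy-loading idea described above can be illustrated with a small Python
model.  This is only a sketch under stated assumptions, not Clang's
implementation: the ``LazyASTFile`` class, its JSON record encoding, and the
entity names are hypothetical.  The model keeps a small index of offsets
available up front and decodes an entity only the first time it is requested.

```python
import json


class LazyASTFile:
    """Toy model of lazy loading; not Clang's real file format."""

    def __init__(self, blob, index):
        self.blob = blob      # serialized bytes for every entity
        self.index = index    # name -> (offset, length); the only eagerly read data
        self.loaded = {}      # entities deserialized so far

    @classmethod
    def build(cls, entities):
        """Serialize each entity and record where it landed in the blob."""
        blob, index = b"", {}
        for name, value in entities.items():
            data = json.dumps(value).encode()
            index[name] = (len(blob), len(data))
            blob += data
        return cls(blob, index)

    def get(self, name):
        """Deserialize an entity on first use and cache the result."""
        if name not in self.loaded:
            offset, length = self.index[name]
            self.loaded[name] = json.loads(self.blob[offset:offset + length])
        return self.loaded[name]


ast_file = LazyASTFile.build({
    "printf": {"kind": "function"},
    "size_t": {"kind": "typedef"},
    "NULL": {"kind": "macro"},
})
ast_file.get("printf")
print(f"{len(ast_file.loaded)}/{len(ast_file.index)} entities read")
```

In this model the work done per translation unit depends on how many entities
are requested, not on how large ``blob`` is.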
|  |  | 
|  | When given the ``-print-stats`` option, Clang produces statistics | 
|  | describing how much of the AST file was actually loaded from disk.  For a | 
|  | simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header | 
|  | (which is built as a precompiled header), this option illustrates how little of | 
|  | the actual precompiled header is required: | 
|  |  | 
|  | .. code-block:: none | 
|  |  | 
|  |    *** AST File Statistics: | 
|  |    895/39981 source location entries read (2.238563%) | 
|  |    19/15315 types read (0.124061%) | 
|  |    20/82685 declarations read (0.024188%) | 
|  |    154/58070 identifiers read (0.265197%) | 
|  |    0/7260 selectors read (0.000000%) | 
|  |    0/30842 statements read (0.000000%) | 
|  |    4/8400 macros read (0.047619%) | 
|  |    1/4995 lexical declcontexts read (0.020020%) | 
|  |    0/4413 visible declcontexts read (0.000000%) | 
|  |    0/7230 method pool entries read (0.000000%) | 
|  |    0 method pool misses | 
|  |  | 
|  | For this small program, only a tiny fraction of the source locations, types, | 
|  | declarations, identifiers, and macros were actually deserialized from the | 
|  | precompiled header.  These statistics can be useful to determine whether the | 
|  | AST file implementation can be improved by making more of the implementation | 
|  | lazy. | 
|  |  | 
|  | Precompiled headers can be chained.  When you create a PCH while including an | 
|  | existing PCH, Clang can create the new PCH by referencing the original file and | 
|  | only writing the new data to the new file.  For example, you could create a PCH | 
|  | out of all the headers that are very commonly used throughout your project, and | 
|  | then create a PCH for every single source file in the project that includes the | 
|  | code that is specific to that file, so that recompiling the file itself is very | 
|  | fast, without duplicating the data from the common headers for every file.  The | 
|  | mechanisms behind chained precompiled headers are discussed in a :ref:`later | 
|  | section <pchinternals-chained>`. | 
|  |  | 
|  | AST File Contents | 
|  | ----------------- | 
|  |  | 
|  | An AST file produced by clang is an object file container with a ``clangast`` | 
|  | (COFF) or ``__clangast`` (ELF and Mach-O) section containing the serialized AST. | 
|  | Other target-specific sections in the object file container are used to hold | 
|  | debug information for the data types defined in the AST.  Tools built on top of | 
|  | libclang that do not need debug information may also produce raw AST files that | 
|  | only contain the serialized AST. | 
|  |  | 
|  | The ``clangast`` section is organized into several different blocks, each of | 
|  | which contains the serialized representation of a part of Clang's internal | 
|  | representation.  Each of the blocks corresponds to either a block or a record | 
|  | within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_. | 
|  | The contents of each of these logical blocks are described below. | 
|  |  | 
|  | .. image:: PCHLayout.png | 
|  |  | 
|  | The ``llvm-objdump`` utility provides a ``-raw-clang-ast`` option to extract the | 
|  | binary contents of the AST section from an object file container. | 
|  |  | 
|  | The `llvm-bcanalyzer <http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_ | 
|  | utility can be used to examine the actual structure of the bitstream for the AST | 
|  | section.  This information can be used both to help understand the structure of | 
|  | the AST section and to isolate areas where the AST representation can still be | 
|  | optimized, e.g., through the introduction of abbreviations. | 
|  |  | 
|  |  | 
|  | Metadata Block | 
|  | ^^^^^^^^^^^^^^ | 
|  |  | 
|  | The metadata block contains several records that provide information about how | 
|  | the AST file was built.  This metadata is primarily used to validate the use of | 
|  | an AST file.  For example, a precompiled header built for a 32-bit x86 target | 
|  | cannot be used when compiling for a 64-bit x86 target.  The metadata block | 
|  | contains information about: | 
|  |  | 
|  | Language options | 
|  | Describes the particular language dialect used to compile the AST file, | 
|  | including major options (e.g., Objective-C support) and more minor options | 
|  | (e.g., support for "``//``" comments).  The contents of this record correspond to | 
|  | the ``LangOptions`` class. | 
|  |  | 
|  | Target architecture | 
|  | The target triple that describes the architecture, platform, and ABI for | 
|  | which the AST file was generated, e.g., ``i386-apple-darwin9``. | 
|  |  | 
|  | AST version | 
|  | The major and minor version numbers of the AST file format.  Changes in the | 
|  | minor version number should not affect backward compatibility, while changes | 
|  | in the major version number imply that a newer compiler cannot read an older | 
|  | precompiled header (and vice-versa). | 
|  |  | 
|  | Original file name | 
|  | The full path of the header that was used to generate the AST file. | 
|  |  | 
|  | Predefines buffer | 
|  | Although not explicitly stored as part of the metadata, the predefines buffer | 
|  | is used in the validation of the AST file.  The predefines buffer itself | 
|  | contains code generated by the compiler to initialize the preprocessor state | 
|  | according to the current target, platform, and command-line options.  For | 
|  | example, the predefines buffer will contain "``#define __STDC__ 1``" when we | 
|  | are compiling C without Microsoft extensions.  The predefines buffer itself | 
|  | is stored within the :ref:`pchinternals-sourcemgr`, but its contents are | 
|  | verified along with the rest of the metadata. | 
|  |  | 
|  | A chained PCH file (that is, one that references another PCH) and a module | 
|  | (which may import other modules) have additional metadata containing the list | 
|  | of all AST files that this AST file depends on.  Each of those files will be | 
|  | loaded along with this AST file. | 
|  |  | 
|  | For chained precompiled headers, the language options, target architecture, and | 
|  | predefines buffer data are taken from the end of the chain, since they have to | 
|  | match anyway. | 
|  |  | 
|  | .. _pchinternals-sourcemgr: | 
|  |  | 
|  | Source Manager Block | 
|  | ^^^^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | The source manager block contains the serialized representation of Clang's | 
|  | :ref:`SourceManager <SourceManager>` class, which handles the mapping from | 
|  | source locations (as represented in Clang's abstract syntax tree) into actual | 
|  | column/line positions within a source file or macro instantiation.  The AST | 
|  | file's representation of the source manager also includes information about all | 
|  | of the headers that were (transitively) included when building the AST file. | 
|  |  | 
|  | The bulk of the source manager block is dedicated to information about the | 
|  | various files, buffers, and macro instantiations into which a source location | 
|  | can refer.  Each of these is referenced by a numeric "file ID", which is a | 
|  | unique number (allocated starting at 1) stored in the source location.  Clang | 
|  | serializes the information for each kind of file ID, along with an index that | 
|  | maps file IDs to the position within the AST file where the information about | 
|  | that file ID is stored.  The data associated with a file ID is loaded only when | 
|  | required by the front end, e.g., to emit a diagnostic that includes a macro | 
|  | instantiation history inside the header itself. | 
|  |  | 
|  | The source manager block also contains information about all of the headers | 
|  | that were included when building the AST file.  This includes information about | 
|  | the controlling macro for the header (e.g., when the preprocessor identified | 
|  | that the contents of the header depend on a macro like | 
|  | ``LLVM_CLANG_SOURCEMANAGER_H``). | 
|  |  | 
|  | .. _pchinternals-preprocessor: | 
|  |  | 
|  | Preprocessor Block | 
|  | ^^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | The preprocessor block contains the serialized representation of the | 
|  | preprocessor.  Specifically, it contains all of the macros that have been | 
|  | defined by the end of the header used to build the AST file, along with the | 
|  | token sequences that comprise each macro.  The macro definitions are only read | 
|  | from the AST file when the name of the macro first occurs in the program.  This | 
|  | lazy loading of macro definitions is triggered by lookups into the | 
|  | :ref:`identifier table <pchinternals-ident-table>`. | 
|  |  | 
|  | .. _pchinternals-types: | 
|  |  | 
|  | Types Block | 
|  | ^^^^^^^^^^^ | 
|  |  | 
|  | The types block contains the serialized representation of all of the types | 
|  | referenced in the translation unit.  Each Clang type node (``PointerType``, | 
|  | ``FunctionProtoType``, etc.) has a corresponding record type in the AST file. | 
|  | When types are deserialized from the AST file, the data within the record is | 
|  | used to reconstruct the appropriate type node using the AST context. | 
|  |  | 
|  | Each type has a unique type ID, which is an integer that uniquely identifies | 
|  | that type.  Type ID 0 represents the NULL type, type IDs less than | 
|  | ``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.), | 
|  | while other "user-defined" type IDs are assigned consecutively from | 
|  | ``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered.  The AST file has | 
|  | an associated mapping from each user-defined type ID to the location within | 
|  | the types block where the serialized representation of that type resides, | 
|  | enabling lazy deserialization of types.  When a type is referenced from within | 
|  | the AST file, that reference is encoded using the type ID shifted left by 3 | 
|  | bits.  The lower three bits are used to represent the ``const``, ``volatile``, | 
|  | and ``restrict`` qualifiers, as in Clang's :ref:`QualType <QualType>` class. | 
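
The encoding of a type reference can be sketched in a few lines of Python.
This is an illustration only: the helper names ``encode_type_ref`` and
``decode_type_ref`` are hypothetical, and the assignment of individual
qualifiers to specific bits is an assumption of this sketch rather than
something stated above.

```python
# Which qualifier occupies which bit is an assumption made for illustration.
CONST, RESTRICT, VOLATILE = 0x1, 0x2, 0x4
QUAL_BITS = 3
QUAL_MASK = (1 << QUAL_BITS) - 1


def encode_type_ref(type_id, quals=0):
    """Pack a type ID and its qualifier bits into one integer."""
    return (type_id << QUAL_BITS) | (quals & QUAL_MASK)


def decode_type_ref(ref):
    """Split an encoded reference back into (type ID, qualifier bits)."""
    return ref >> QUAL_BITS, ref & QUAL_MASK


ref = encode_type_ref(42, CONST | VOLATILE)
type_id, quals = decode_type_ref(ref)
print(ref, type_id, bool(quals & CONST), bool(quals & RESTRICT))
```

Because the qualifiers travel in the low bits of the reference, a qualified
use of a type does not need a type record of its own in this model.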
|  |  | 
|  | .. _pchinternals-decls: | 
|  |  | 
|  | Declarations Block | 
|  | ^^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | The declarations block contains the serialized representation of all of the | 
|  | declarations referenced in the translation unit.  Each Clang declaration node | 
|  | (``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the | 
|  | AST file.  When declarations are deserialized from the AST file, the data | 
|  | within the record is used to build and populate a new instance of the | 
|  | corresponding ``Decl`` node.  As with types, each declaration node has a | 
|  | numeric ID that is used to refer to that declaration within the AST file.  In | 
|  | addition, a lookup table provides a mapping from that numeric ID to the offset | 
|  | within the precompiled header where that declaration is described. | 
|  |  | 
|  | Declarations in Clang's abstract syntax trees are stored hierarchically.  At | 
|  | the top of the hierarchy is the translation unit (``TranslationUnitDecl``), | 
|  | which contains all of the declarations in the translation unit but is not | 
|  | actually written as a specific declaration node.  Its child declarations (such | 
|  | as functions or struct types) may also contain other declarations inside them, | 
|  | and so on.  Within Clang, each declaration is stored within a :ref:`declaration | 
|  | context <DeclContext>`, as represented by the ``DeclContext`` class. | 
|  | Declaration contexts provide the mechanism to perform name lookup within a | 
|  | given declaration (e.g., find the member named ``x`` in a structure) and | 
|  | iterate over the declarations stored within a context (e.g., iterate over all | 
|  | of the fields of a structure for structure layout). | 
|  |  | 
|  | In Clang's AST file format, deserializing a declaration that is a | 
|  | ``DeclContext`` is a separate operation from deserializing all of the | 
|  | declarations stored within that declaration context.  Therefore, Clang will | 
|  | deserialize the translation unit declaration without deserializing the | 
|  | declarations within that translation unit.  When required, the declarations | 
|  | stored within a declaration context will be deserialized.  There are two | 
|  | representations of the declarations within a declaration context, which | 
|  | correspond to the name-lookup and iteration behavior described above: | 
|  |  | 
|  | * When the front end performs name lookup to find a name ``x`` within a given | 
|  | declaration context (for example, during semantic analysis of the expression | 
|  | ``p->x``, where ``p``'s type is defined in the precompiled header), Clang | 
|  | refers to an on-disk hash table that maps from the names within that | 
|  | declaration context to the declaration IDs that represent each visible | 
|  | declaration with that name.  The actual declarations will then be | 
|  | deserialized to provide the results of name lookup. | 
|  | * When the front end performs iteration over all of the declarations within a | 
|  | declaration context, all of those declarations are immediately | 
|  | deserialized.  For large declaration contexts (e.g., the translation unit), | 
|  | this operation is expensive, but large declaration contexts are not | 
|  | traversed in normal compilation, since such a traversal is unnecessary. | 
|  | The code generator and semantic analysis do commonly | 
|  | traverse declaration contexts for structs, classes, unions, and | 
|  | enumerations, but those contexts contain relatively few declarations in | 
|  | the common case. | 
|  |  | 
|  | Statements and Expressions | 
|  | ^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | Statements and expressions are stored in the AST file in both the :ref:`types | 
|  | <pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks, | 
|  | because every statement or expression will be associated with either a type or | 
|  | declaration.  The actual statement and expression records are stored | 
|  | immediately following the declaration or type that owns the statement or | 
|  | expression.  For example, the statement representing the body of a function | 
|  | will be stored directly following the declaration of the function. | 
|  |  | 
|  | As with types and declarations, each statement and expression kind in Clang's | 
|  | abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding | 
|  | record type in the AST file, which contains the serialized representation of | 
|  | that statement or expression.  Each substatement or subexpression within an | 
|  | expression is stored as a separate record (which keeps most records to a fixed | 
|  | size).  Within the AST file, the subexpressions of an expression are stored, in | 
|  | reverse order, prior to the expression that owns those subexpressions, using a form | 
|  | of `Reverse Polish Notation | 
|  | <http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_.  For example, an | 
|  | expression ``3 - 4 + 5`` would be represented as follows: | 
|  |  | 
|  | +-----------------------+ | 
|  | | ``IntegerLiteral(5)`` | | 
|  | +-----------------------+ | 
|  | | ``IntegerLiteral(4)`` | | 
|  | +-----------------------+ | 
|  | | ``IntegerLiteral(3)`` | | 
|  | +-----------------------+ | 
|  | | ``BinaryOperator(-)`` | | 
|  | +-----------------------+ | 
|  | | ``BinaryOperator(+)`` | | 
|  | +-----------------------+ | 
|  | |       ``STOP``        | | 
|  | +-----------------------+ | 
|  |  | 
|  | When reading this representation, Clang evaluates each expression record it | 
|  | encounters, builds the appropriate abstract syntax tree node, and then pushes | 
|  | that expression on to a stack.  When a record contains *N* subexpressions --- | 
|  | ``BinaryOperator`` has two of them --- those expressions are popped from the | 
|  | top of the stack.  The special STOP code indicates that we have reached the end | 
|  | of a serialized expression or statement; other expression or statement records | 
|  | may follow, but they are part of a different expression. | 
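
The stack-based reading procedure can be modeled with a short Python sketch.
It is a toy, not the AST reader itself: the tuple-based record layout and the
``read_expr`` and ``evaluate`` helpers are hypothetical.  The input is the
record sequence for ``3 - 4 + 5``, with the two operators stored as
``BinaryOperator`` records.

```python
def read_expr(records):
    """Rebuild an expression tree from records stored in reverse Polish order."""
    stack = []
    for kind, *operands in records:
        if kind == "STOP":
            break  # end of this serialized expression
        if kind == "IntegerLiteral":
            stack.append(operands[0])
        elif kind == "BinaryOperator":
            # Subexpressions were written in reverse order, so the
            # left-hand side is on top of the stack.
            lhs = stack.pop()
            rhs = stack.pop()
            stack.append((operands[0], lhs, rhs))
        else:
            raise ValueError(f"unknown record kind: {kind}")
    if len(stack) != 1:
        raise ValueError("malformed expression stream")
    return stack[0]


def evaluate(node):
    """Evaluate a tree built by read_expr (only + and - are modeled)."""
    if isinstance(node, int):
        return node
    op, lhs, rhs = node
    left, right = evaluate(lhs), evaluate(rhs)
    return left - right if op == "-" else left + right


records = [
    ("IntegerLiteral", 5),
    ("IntegerLiteral", 4),
    ("IntegerLiteral", 3),
    ("BinaryOperator", "-"),
    ("BinaryOperator", "+"),
    ("STOP",),
]
tree = read_expr(records)
print(tree, evaluate(tree))
```

The resulting tree is ``('+', ('-', 3, 4), 5)``, which matches the grouping
``(3 - 4) + 5`` of the source expression.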
|  |  | 
|  | .. _pchinternals-ident-table: | 
|  |  | 
|  | Identifier Table Block | 
|  | ^^^^^^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | The identifier table block contains an on-disk hash table that maps each | 
|  | identifier mentioned within the AST file to the serialized representation of | 
|  | the identifier's information (e.g., the ``IdentifierInfo`` structure).  The | 
|  | serialized representation contains: | 
|  |  | 
|  | * The actual identifier string. | 
|  | * Flags that describe whether this identifier is the name of a built-in, a | 
|  | poisoned identifier, an extension token, or a macro. | 
|  | * If the identifier names a macro, the offset of the macro definition within | 
|  | the :ref:`pchinternals-preprocessor`. | 
|  | * If the identifier names one or more declarations visible from translation | 
|  | unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these | 
|  | declarations. | 
|  |  | 
|  | When an AST file is loaded, the AST file reader mechanism introduces itself | 
|  | into the identifier table as an external lookup source.  Thus, when the user | 
|  | program refers to an identifier that has not yet been seen, Clang will perform | 
|  | a lookup into the identifier table.  If an identifier is found, its contents | 
|  | (macro definitions, flags, top-level declarations, etc.) will be deserialized, | 
|  | at which point the corresponding ``IdentifierInfo`` structure will have the | 
|  | same contents it would have after parsing the headers in the AST file. | 
|  |  | 
|  | Within the AST file, the identifiers used to name declarations are represented | 
|  | with an integral value.  A separate table provides a mapping from this integral | 
|  | value (the identifier ID) to the location within the on-disk hash table where | 
|  | that identifier is stored.  This mapping is used when deserializing the name of | 
|  | a declaration, the identifier of a token, or any other construct in the AST | 
|  | file that refers to a name. | 
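
The external-lookup arrangement can be sketched as follows.  This Python model
is an assumption-laden illustration: ``ToyIdentifierTable`` and
``ToyASTReader`` are hypothetical stand-ins, and a plain dictionary plays the
role of the on-disk hash table.

```python
class ToyASTReader:
    """Stand-in for the AST reader acting as an external lookup source."""

    def __init__(self, on_disk_table):
        self.on_disk_table = on_disk_table  # models the on-disk hash table
        self.lookups = 0                    # how often the "file" was consulted

    def lookup(self, name):
        self.lookups += 1
        return self.on_disk_table.get(name)


class ToyIdentifierTable:
    """In-memory identifier table that defers to an external source on a miss."""

    def __init__(self, external_source=None):
        self.entries = {}
        self.external_source = external_source

    def get(self, name):
        if name not in self.entries:
            info = None
            if self.external_source is not None:
                info = self.external_source.lookup(name)
            # Identifiers unknown to the AST file start out with empty info.
            self.entries[name] = info if info is not None else {"is_macro": False}
        return self.entries[name]


reader = ToyASTReader({"assert": {"is_macro": True, "macro_offset": 128}})
table = ToyIdentifierTable(external_source=reader)

first = table.get("assert")   # miss: consults the reader
second = table.get("assert")  # hit: served from memory
local = table.get("my_var")   # not in the AST file
print(first["is_macro"], local["is_macro"], reader.lookups)
```

Only the first reference to an identifier reaches the reader; later references
are answered from the in-memory table.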
|  |  | 
|  | .. _pchinternals-method-pool: | 
|  |  | 
|  | Method Pool Block | 
|  | ^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | The method pool block is represented as an on-disk hash table that serves two | 
|  | purposes: it provides a mapping from the names of Objective-C selectors to the | 
|  | set of Objective-C instance and class methods that have that particular | 
|  | selector (which is required for semantic analysis in Objective-C) and also | 
|  | stores all of the selectors used by entities within the AST file.  The design | 
|  | of the method pool is similar to that of the :ref:`identifier table | 
|  | <pchinternals-ident-table>`: the first time a particular selector is formed | 
|  | during the compilation of the program, Clang will search in the on-disk hash | 
|  | table of selectors; if found, Clang will read the Objective-C methods | 
|  | associated with that selector into the appropriate front-end data structure | 
|  | (``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and | 
|  | class methods, respectively). | 
|  |  | 
|  | As with identifiers, selectors are represented by numeric values within the AST | 
|  | file.  A separate index maps these numeric selector values to the offset of the | 
|  | selector within the on-disk hash table, and will be used when deserializing an | 
|  | Objective-C method declaration (or other Objective-C construct) that refers to | 
|  | the selector. | 
|  |  | 
|  | AST Reader Integration Points | 
|  | ----------------------------- | 
|  |  | 
|  | The "lazy" deserialization behavior of AST files requires their integration | 
|  | into several completely different submodules of Clang.  For example, lazily | 
|  | deserializing the declarations during name lookup requires that the name-lookup | 
|  | routines be able to query the AST file to find entities stored there. | 
|  |  | 
|  | For each Clang data structure that requires direct interaction with the AST | 
|  | reader logic, there is an abstract class that provides the interface between | 
|  | the two modules.  The ``ASTReader`` class, which handles the loading of an AST | 
|  | file, inherits from all of these abstract classes to provide lazy | 
|  | deserialization of Clang's data structures.  ``ASTReader`` implements the | 
|  | following abstract classes: | 
|  |  | 
|  | ``ExternalSLocEntrySource`` | 
|  | This abstract interface is associated with the ``SourceManager`` class, and | 
|  | is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to | 
|  | load the details of a file, buffer, or macro instantiation. | 
|  |  | 
|  | ``IdentifierInfoLookup`` | 
|  | This abstract interface is associated with the ``IdentifierTable`` class, and | 
|  | is used whenever the program source refers to an identifier that has not yet | 
|  | been seen.  In this case, the AST reader searches for this identifier within | 
|  | its :ref:`identifier table <pchinternals-ident-table>` to load any top-level | 
|  | declarations or macros associated with that identifier. | 
|  |  | 
|  | ``ExternalASTSource`` | 
|  | This abstract interface is associated with the ``ASTContext`` class, and is | 
|  | used whenever the abstract syntax tree nodes need to be loaded from the AST | 
|  | file.  It provides the ability to deserialize declarations and types | 
|  | identified by their numeric values, read the bodies of functions when | 
|  | required, and read the declarations stored within a declaration context | 
|  | (either for iteration or for name lookup). | 
|  |  | 
|  | ``ExternalSemaSource`` | 
|  | This abstract interface is associated with the ``Sema`` class, and is used | 
|  | whenever semantic analysis needs to read information from the :ref:`global | 
|  | method pool <pchinternals-method-pool>`. | 
|  |  | 
|  | .. _pchinternals-chained: | 
|  |  | 
|  | Chained Precompiled Headers | 
|  | --------------------------- | 
|  |  | 
|  | Chained precompiled headers were initially intended to improve the performance | 
|  | of IDE-centric operations such as syntax highlighting and code completion while | 
|  | a particular source file is being edited by the user.  To minimize the amount | 
|  | of reparsing required after a change to the file, a form of precompiled header | 
|  | --- called a precompiled *preamble* --- is automatically generated by parsing | 
|  | all of the headers in the source file, up to and including the last | 
|  | ``#include``.  When only the source file changes (and none of the headers it | 
|  | depends on), reparsing of that source file can use the precompiled preamble and | 
|  | start parsing after the ``#include``\ s, so parsing time is proportional to the | 
|  | size of the source file (rather than all of its includes).  However, the | 
|  | compilation of that translation unit may already use a precompiled header: in | 
|  | this case, Clang will create the precompiled preamble as a chained precompiled | 
|  | header that refers to the original precompiled header.  This drastically | 
|  | reduces the time needed to serialize the precompiled preamble for use in | 
|  | reparsing. | 
|  |  | 
|  | Chained precompiled headers get their name because each precompiled header can | 
|  | depend on one other precompiled header, forming a chain of dependencies.  A | 
|  | translation unit will then include the precompiled header that starts the chain | 
|  | (i.e., nothing depends on it).  This linearity of dependencies is important for | 
|  | the semantic model of chained precompiled headers, because the most-recent | 
|  | precompiled header can provide information that overrides the information | 
|  | provided by the precompiled headers it depends on, just like a header file | 
|  | ``B.h`` that includes another header ``A.h`` can modify the state produced by | 
|  | parsing ``A.h``, e.g., by ``#undef``'ing a macro defined in ``A.h``. | 
|  |  | 
|  | There are several ways in which chained precompiled headers generalize the AST | 
|  | file model: | 
|  |  | 
|  | Numbering of IDs | 
|  | Many different kinds of entities --- identifiers, declarations, types, etc. | 
|  | --- have ID numbers that start at 1 or some other predefined constant and | 
|  | grow upward.  Each precompiled header records the maximum ID number it has | 
|  | assigned in each category.  Then, when a new precompiled header is generated | 
|  | that depends on (chains to) another precompiled header, it will start | 
|  | counting at the next available ID number.  This way, one can determine, given | 
|  | an ID number, which AST file actually contains the entity. | 
|  |  | 
|  | Name lookup | 
|  | When writing a chained precompiled header, Clang attempts to write only | 
|  | information that has changed from the precompiled header on which it is | 
|  | based.  This changes the lookup algorithm for the various tables, such as the | 
|  | :ref:`identifier table <pchinternals-ident-table>`: the search starts at the | 
|  | most-recent precompiled header.  If no entry is found, lookup then proceeds | 
|  | to the identifier table in the precompiled header it depends on, and so on. | 
|  | Once a lookup succeeds, that result is considered definitive, overriding any | 
|  | results from earlier precompiled headers. | 
|  |  | 
|  | Update records | 
|  | There are various ways in which a later precompiled header can modify the | 
|  | entities described in an earlier precompiled header.  For example, later | 
|  | precompiled headers can add entries into the various name-lookup tables for | 
|  | the translation unit or namespaces, or add new categories to an Objective-C | 
|  | class.  Each of these updates is captured in an "update record" that is | 
|  | stored in the chained precompiled header file and will be loaded along with | 
|  | the original entity. | 
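
The ID numbering and name lookup rules for a chain can be modeled together in
a small Python sketch.  The ``ToyPCH`` class, the file names, and the entity
counts are hypothetical, and update records are not modeled.

```python
class ToyPCH:
    """Toy model of one file in a PCH chain; not Clang's real data structures."""

    def __init__(self, name, num_new_entities, identifiers, base=None):
        self.name = name
        self.base = base  # the PCH this one chains to, if any
        # Continue numbering where the base PCH stopped (IDs start at 1).
        self.first_id = base.max_id + 1 if base else 1
        self.max_id = self.first_id + num_new_entities - 1
        # Only entries that are new or changed relative to the base.
        self.identifiers = identifiers

    def chain(self):
        """Yield this PCH, then the PCHs it depends on, most recent first."""
        pch = self
        while pch is not None:
            yield pch
            pch = pch.base

    def owner_of(self, entity_id):
        """Find which file in the chain contains the entity with this ID."""
        for pch in self.chain():
            if pch.first_id <= entity_id <= pch.max_id:
                return pch.name
        raise KeyError(entity_id)

    def lookup(self, identifier):
        """Search most-recent-first; the first hit is definitive."""
        for pch in self.chain():
            if identifier in pch.identifiers:
                return pch.identifiers[identifier]
        return None


common = ToyPCH("common.pch", 100, {"DEBUG": "macro", "size_t": "typedef"})
preamble = ToyPCH("preamble.pch", 20, {"DEBUG": "undefined"}, base=common)

print(preamble.owner_of(100), preamble.owner_of(101))
print(preamble.lookup("DEBUG"), preamble.lookup("size_t"))
```

Here ``preamble.pch`` overrides the entry for ``DEBUG`` but falls back to
``common.pch`` for ``size_t``, and the ID ranges of the two files do not
overlap.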
|  |  | 
|  | .. _pchinternals-modules: | 
|  |  | 
|  | Modules | 
|  | ------- | 
|  |  | 
|  | Modules generalize the chained precompiled header model yet further, from a | 
|  | linear chain of precompiled headers to an arbitrary directed acyclic graph | 
|  | (DAG) of AST files.  All of the same techniques used to make chained | 
|  | precompiled headers work (ID numbering, name lookup, and update records) are | 
|  | shared with modules.  However, the DAG nature of modules introduces a number of | 
|  | additional complications to the model: | 
|  |  | 
|  | Numbering of IDs | 
|  | The simple, linear numbering scheme used in chained precompiled headers falls | 
|  | apart with the module DAG, because different modules may end up with | 
|  | different numbering schemes for entities they imported from common shared | 
|  | modules.  To account for this, each module file provides information about | 
|  | which modules it depends on and which ID numbers it assigned to the entities | 
|  | in those modules, as well as which ID numbers it took for its own new | 
|  | entities.  The AST reader then maps these "local" ID numbers into a "global" | 
|  | ID number space for the current translation unit, providing a 1-1 mapping | 
|  | between entities (in whatever AST file they inhabit) and global ID numbers. | 
|  | If that translation unit is then serialized into an AST file, this mapping | 
|  | will be stored for use when the AST file is imported. | 
|  |  | 
|  | Declaration merging | 
|  | It is possible for a given entity (from the language's perspective) to be | 
|  | declared multiple times in different places.  For example, two different | 
|  | headers can have the declaration of ``printf`` or could forward-declare | 
|  | ``struct stat``.  If each of those headers is included in a module, and some | 
|  | third party imports both of those modules, there is a potentially serious | 
|  | problem: name lookup for ``printf`` or ``struct stat`` will find both | 
|  | declarations, but the AST nodes are unrelated.  This would result in a | 
|  | compilation error, due to an ambiguity in name lookup.  Therefore, the AST | 
|  | reader performs declaration merging according to the appropriate language | 
|  | semantics, ensuring that the two disjoint declarations are merged into a | 
|  | single redeclaration chain (with a common canonical declaration), so that it | 
|  | is as if one of the headers had been included before the other. | 
|  |  | 
|  | Name visibility | 
|  | Modules allow certain names that occur during module creation to be "hidden", | 
|  | so that they are not part of the public interface of the module and are not | 
|  | visible to its clients.  The AST reader maintains a "visible" bit on various | 
|  | AST nodes (declarations, macros, etc.) to indicate whether that particular | 
|  | AST node is currently visible; the various name lookup mechanisms in Clang | 
|  | inspect the visible bit to determine whether that entity, which is still in | 
|  | the AST (because other, visible AST nodes may depend on it), can actually be | 
|  | found by name lookup.  When a new (sub)module is imported, it may make | 
|  | existing, non-visible, already-deserialized AST nodes visible; it is the | 
|  | responsibility of the AST reader to find and update these AST nodes when it | 
|  | is notified of the import. | 
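
The local-to-global ID mapping described under "Numbering of IDs" can be
sketched in Python.  This is a simplified model under stated assumptions:
``ToyModule`` and ``ToyReader`` are hypothetical, each module numbers the
entities of its direct imports first and its own entities last, and imported
modules have no imports of their own.

```python
class ToyModule:
    """Toy module file: a count of its own entities plus its direct imports."""

    def __init__(self, name, own_count, imports=()):
        self.name = name
        self.own_count = own_count
        self.imports = list(imports)

    def local_ranges(self):
        """Yield (first_local_id, count, owning_module) in local ID order."""
        next_id = 1
        for dep in self.imports:
            yield next_id, dep.own_count, dep
            next_id += dep.own_count
        yield next_id, self.own_count, self


class ToyReader:
    """Assigns each loaded module a block of global IDs for its own entities."""

    def __init__(self):
        self.global_base = {}  # module name -> first global ID
        self.next_global = 1

    def load(self, module):
        for dep in module.imports:
            self.load(dep)
        if module.name not in self.global_base:
            self.global_base[module.name] = self.next_global
            self.next_global += module.own_count

    def to_global(self, module, local_id):
        """Translate a module-local ID into the translation unit's global ID."""
        for first, count, owner in module.local_ranges():
            if first <= local_id < first + count:
                return self.global_base[owner.name] + (local_id - first)
        raise KeyError(local_id)


std = ToyModule("Std", 3)
other = ToyModule("Other", 1)
a = ToyModule("A", 2, [std])
b = ToyModule("B", 4, [other, std])

reader = ToyReader()
reader.load(a)
reader.load(b)

# The first entity of Std has local ID 1 in A but local ID 2 in B.
print(reader.to_global(a, 1), reader.to_global(b, 2))
```

Both references resolve to the same global ID, so an entity imported through
two different modules is identified consistently within the translation unit.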
|  |  |