| The ELF, COFF and Wasm Linkers |
| ============================== |
| |
| The ELF Linker as a Library |
| --------------------------- |
| |
| You can embed LLD to your program by linking against it and calling the linker's |
| entry point function lld::elf::link. |
| |
| The current policy is that it is your reponsibility to give trustworthy object |
| files. The function is guaranteed to return as long as you do not pass corrupted |
| or malicious object files. A corrupted file could cause a fatal error or SEGV. |
| That being said, you don't need to worry too much about it if you create object |
| files in the usual way and give them to the linker. It is naturally expected to |
| work, or otherwise it's a linker's bug. |
| |
| Design |
| ====== |
| |
| We will describe the design of the linkers in the rest of the document. |
| |
| Key Concepts |
| ------------ |
| |
| Linkers are fairly large pieces of software. |
| There are many design choices you have to make to create a complete linker. |
| |
| This is a list of design choices we've made for ELF and COFF LLD. |
| We believe that these high-level design choices achieved a right balance |
| between speed, simplicity and extensibility. |
| |
| * Implement as native linkers |
| |
| We implemented the linkers as native linkers for each file format. |
| |
| The linkers share the same design but share very little code. |
| Sharing code makes sense if the benefit is worth its cost. |
| In our case, the object formats are different enough that we thought the layer |
| to abstract the differences wouldn't be worth its complexity and run-time |
| cost. Elimination of the abstract layer has greatly simplified the |
| implementation. |
| |
| * Speed by design |
| |
| One of the most important things in archiving high performance is to |
| do less rather than do it efficiently. |
| Therefore, the high-level design matters more than local optimizations. |
| Since we are trying to create a high-performance linker, |
| it is very important to keep the design as efficient as possible. |
| |
| Broadly speaking, we do not do anything until we have to do it. |
| For example, we do not read section contents or relocations |
| until we need them to continue linking. |
| When we need to do some costly operation (such as looking up |
| a hash table for each symbol), we do it only once. |
| We obtain a handler (which is typically just a pointer to actual data) |
| on the first operation and use it throughout the process. |
| |
| * Efficient archive file handling |
| |
| LLD's handling of archive files (the files with ".a" file extension) is |
| different from the traditional Unix linkers and similar to Windows linkers. |
| We'll describe how the traditional Unix linker handles archive files, what the |
| problem is, and how LLD approached the problem. |
| |
| The traditional Unix linker maintains a set of undefined symbols during |
| linking. The linker visits each file in the order as they appeared in the |
| command line until the set becomes empty. What the linker would do depends on |
| file type. |
| |
| - If the linker visits an object file, the linker links object files to the |
| result, and undefined symbols in the object file are added to the set. |
| |
| - If the linker visits an archive file, it checks for the archive file's |
| symbol table and extracts all object files that have definitions for any |
| symbols in the set. |
| |
| This algorithm sometimes leads to a counter-intuitive behavior. If you give |
| archive files before object files, nothing will happen because when the linker |
| visits archives, there is no undefined symbols in the set. As a result, no |
| files are extracted from the first archive file, and the link is done at that |
| point because the set is empty after it visits one file. |
| |
| You can fix the problem by reordering the files, |
| but that cannot fix the issue of mutually-dependent archive files. |
| |
| Linking mutually-dependent archive files is tricky. You may specify the same |
| archive file multiple times to let the linker visit it more than once. Or, |
| you may use the special command line options, `--start-group` and |
| `--end-group`, to let the linker loop over the files between the options until |
| no new symbols are added to the set. |
| |
| Visiting the same archive files multiple makes the linker slower. |
| |
| Here is how LLD approaches the problem. Instead of memorizing only undefined |
| symbols, we program LLD so that it memorizes all symbols. When it sees an |
| undefined symbol that can be resolved by extracting an object file from an |
| archive file it previously visited, it immediately extracts the file and link |
| it. It is doable because LLD does not forget symbols it have seen in archive |
| files. |
| |
| We believe that the LLD's way is efficient and easy to justify. |
| |
| The semantics of LLD's archive handling is different from the traditional |
| Unix's. You can observe it if you carefully craft archive files to exploit |
| it. However, in reality, we don't know any program that cannot link with our |
| algorithm so far, so it's not going to cause trouble. |
| |
| Numbers You Want to Know |
| ------------------------ |
| |
| To give you intuition about what kinds of data the linker is mainly working on, |
| I'll give you the list of objects and their numbers LLD has to read and process |
| in order to link a very large executable. In order to link Chrome with debug |
| info, which is roughly 2 GB in output size, LLD reads |
| |
| - 17,000 files, |
| - 1,800,000 sections, |
| - 6,300,000 symbols, and |
| - 13,000,000 relocations. |
| |
| LLD produces the 2 GB executable in 15 seconds. |
| |
| These numbers vary depending on your program, but in general, |
| you have a lot of relocations and symbols for each file. |
| If your program is written in C++, symbol names are likely to be |
| pretty long because of name mangling. |
| |
| It is important to not waste time on relocations and symbols. |
| |
| In the above case, the total amount of symbol strings is 450 MB, |
| and inserting all of them to a hash table takes 1.5 seconds. |
| Therefore, if you causally add a hash table lookup for each symbol, |
| it would slow down the linker by 10%. So, don't do that. |
| |
| On the other hand, you don't have to pursue efficiency |
| when handling files. |
| |
| Important Data Structures |
| ------------------------- |
| |
| We will describe the key data structures in LLD in this section. The linker can |
| be understood as the interactions between them. Once you understand their |
| functions, the code of the linker should look obvious to you. |
| |
| * Symbol |
| |
| This class represents a symbol. |
| They are created for symbols in object files or archive files. |
| The linker creates linker-defined symbols as well. |
| |
| There are basically three types of Symbols: Defined, Undefined, or Lazy. |
| |
| - Defined symbols are for all symbols that are considered as "resolved", |
| including real defined symbols, COMDAT symbols, common symbols, |
| absolute symbols, linker-created symbols, etc. |
| - Undefined symbols represent undefined symbols, which need to be replaced by |
| Defined symbols by the resolver until the link is complete. |
| - Lazy symbols represent symbols we found in archive file headers |
| which can turn into Defined if we read archieve members. |
| |
| There's only one Symbol instance for each unique symbol name. This uniqueness |
| is guaranteed by the symbol table. As the resolver reads symbols from input |
| files, it replaces an existing Symbol with the "best" Symbol for its symbol |
| name using the placement new. |
| |
| The above mechanism allows you to use pointers to Symbols as a very cheap way |
| to access name resolution results. Assume for example that you have a pointer |
| to an undefined symbol before name resolution. If the symbol is resolved to a |
| defined symbol by the resolver, the pointer will "automatically" point to the |
| defined symbol, because the undefined symbol the pointer pointed to will have |
| been replaced by the defined symbol in-place. |
| |
| * SymbolTable |
| |
| SymbolTable is basically a hash table from strings to Symbols |
| with logic to resolve symbol conflicts. It resolves conflicts by symbol type. |
| |
| - If we add Defined and Undefined symbols, the symbol table will keep the |
| former. |
| - If we add Defined and Lazy symbols, it will keep the former. |
| - If we add Lazy and Undefined, it will keep the former, |
| but it will also trigger the Lazy symbol to load the archive member |
| to actually resolve the symbol. |
| |
| * Chunk (COFF specific) |
| |
| Chunk represents a chunk of data that will occupy space in an output. |
| Each regular section becomes a chunk. |
| Chunks created for common or BSS symbols are not backed by sections. |
| The linker may create chunks to append additional data to an output as well. |
| |
| Chunks know about their size, how to copy their data to mmap'ed outputs, |
| and how to apply relocations to them. |
| Specifically, section-based chunks know how to read relocation tables |
| and how to apply them. |
| |
| * InputSection (ELF specific) |
| |
| Since we have less synthesized data for ELF, we don't abstract slices of |
| input files as Chunks for ELF. Instead, we directly use the input section |
| as an internal data type. |
| |
| InputSection knows about their size and how to copy themselves to |
| mmap'ed outputs, just like COFF Chunks. |
| |
| * OutputSection |
| |
| OutputSection is a container of InputSections (ELF) or Chunks (COFF). |
| An InputSection or Chunk belongs to at most one OutputSection. |
| |
| There are mainly three actors in this linker. |
| |
| * InputFile |
| |
| InputFile is a superclass of file readers. |
| We have a different subclass for each input file type, |
| such as regular object file, archive file, etc. |
| They are responsible for creating and owning Symbols and InputSections/Chunks. |
| |
| * Writer |
| |
| The writer is responsible for writing file headers and InputSections/Chunks to |
| a file. It creates OutputSections, put all InputSections/Chunks into them, |
| assign unique, non-overlapping addresses and file offsets to them, and then |
| write them down to a file. |
| |
| * Driver |
| |
| The linking process is driven by the driver. The driver: |
| |
| - processes command line options, |
| - creates a symbol table, |
| - creates an InputFile for each input file and puts all symbols within into |
| the symbol table, |
| - checks if there's no remaining undefined symbols, |
| - creates a writer, |
| - and passes the symbol table to the writer to write the result to a file. |
| |
| Link-Time Optimization |
| ---------------------- |
| |
| LTO is implemented by handling LLVM bitcode files as object files. |
| The linker resolves symbols in bitcode files normally. If all symbols |
| are successfully resolved, it then runs LLVM passes |
| with all bitcode files to convert them to one big regular ELF/COFF file. |
| Finally, the linker replaces bitcode symbols with ELF/COFF symbols, |
| so that they are linked as if they were in the native format from the beginning. |
| |
| The details are described in this document. |
| http://llvm.org/docs/LinkTimeOptimization.html |
| |
| Glossary |
| -------- |
| |
| * RVA (COFF) |
| |
| Short for Relative Virtual Address. |
| |
| Windows executables or DLLs are not position-independent; they are |
| linked against a fixed address called an image base. RVAs are |
| offsets from an image base. |
| |
| Default image bases are 0x140000000 for executables and 0x18000000 |
| for DLLs. For example, when we are creating an executable, we assume |
| that the executable will be loaded at address 0x140000000 by the |
| loader, so we apply relocations accordingly. Result texts and data |
| will contain raw absolute addresses. |
| |
| * VA |
| |
| Short for Virtual Address. For COFF, it is equivalent to RVA + image base. |
| |
| * Base relocations (COFF) |
| |
| Relocation information for the loader. If the loader decides to map |
| an executable or a DLL to a different address than their image |
| bases, it fixes up binaries using information contained in the base |
| relocation table. A base relocation table consists of a list of |
| locations containing addresses. The loader adds a difference between |
| RVA and actual load address to all locations listed there. |
| |
| Note that this run-time relocation mechanism is much simpler than ELF. |
| There's no PLT or GOT. Images are relocated as a whole just |
| by shifting entire images in memory by some offsets. Although doing |
| this breaks text sharing, I think this mechanism is not actually bad |
| on today's computers. |
| |
| * ICF |
| |
| Short for Identical COMDAT Folding (COFF) or Identical Code Folding (ELF). |
| |
| ICF is an optimization to reduce output size by merging read-only sections |
| by not only their names but by their contents. If two read-only sections |
| happen to have the same metadata, actual contents and relocations, |
| they are merged by ICF. It is known as an effective technique, |
| and it usually reduces C++ program's size by a few percent or more. |
| |
| Note that this is not an entirely sound optimization. C/C++ require |
| different functions have different addresses. If a program depends on |
| that property, it would fail at runtime. |
| |
| On Windows, that's not really an issue because MSVC link.exe enabled |
| the optimization by default. As long as your program works |
| with the linker's default settings, your program should be safe with ICF. |
| |
| On Unix, your program is generally not guaranteed to be safe with ICF, |
| although large programs happen to work correctly. |
| LLD works fine with ICF for example. |