|  | ===================================================================== | 
|  | Building a JIT: Adding Optimizations -- An introduction to ORC Layers | 
|  | ===================================================================== | 
|  |  | 
|  | .. contents:: | 
|  | :local: | 
|  |  | 
|  | **This tutorial is under active development. It is incomplete and details may | 
|  | change frequently.** Nonetheless we invite you to try it out as it stands, and | 
|  | we welcome any feedback. | 
|  |  | 
|  | Chapter 2 Introduction | 
|  | ====================== | 
|  |  | 
|  | **Warning: This text is currently out of date due to ORC API updates.** | 
|  |  | 
|  | **The example code has been updated and can be used. The text will be updated | 
|  | once the API churn dies down.** | 
|  |  | 
|  | Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In | 
|  | `Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT | 
|  | class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce | 
|  | executable code in memory. KaleidoscopeJIT was able to do this with relatively | 
|  | little code by composing two off-the-shelf *ORC layers*: IRCompileLayer and | 
|  | ObjectLinkingLayer, to do much of the heavy lifting. | 
|  |  | 
|  | In this chapter we'll learn more about the ORC layer concept by using a new layer, |
|  | IRTransformLayer, to add IR optimization support to KaleidoscopeJIT. | 
|  |  | 
|  | Optimizing Modules using the IRTransformLayer | 
|  | ============================================= | 
|  |  | 
|  | In `Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM" | 
|  | tutorial series the llvm *FunctionPassManager* is introduced as a means for | 
|  | optimizing LLVM IR. Interested readers may read that chapter for details, but | 
|  | in short: to optimize a Module we create an llvm::FunctionPassManager | 
|  | instance, configure it with a set of optimizations, then run the PassManager on | 
|  | a Module to mutate it into a (hopefully) more optimized but semantically | 
|  | equivalent form. In the original tutorial series the FunctionPassManager was | 
|  | created outside the KaleidoscopeJIT and modules were optimized before being | 
|  | added to it. In this chapter we will make optimization a phase of our JIT |
|  | instead. For now this will provide us a motivation to learn more about ORC | 
|  | layers, but in the long term making optimization part of our JIT will yield an | 
|  | important benefit: When we begin lazily compiling code (i.e. deferring | 
|  | compilation of each function until the first time it's run), having | 
|  | optimization managed by our JIT will allow us to optimize lazily too, rather | 
|  | than having to do all our optimization up-front. | 
|  |  | 
|  | To add optimization support to our JIT we will take the KaleidoscopeJIT from | 
|  | Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the | 
|  | IRTransformLayer works in more detail below, but the interface is simple: the | 
|  | constructor for this layer takes a reference to the layer below (as all layers | 
|  | do) plus an *IR optimization function* that it will apply to each Module that | 
|  | is added via addModule: | 
|  |  | 
|  | .. code-block:: c++ | 
|  |  | 
|  | class KaleidoscopeJIT { | 
|  | private: | 
|  | std::unique_ptr<TargetMachine> TM; | 
|  | const DataLayout DL; | 
|  | RTDyldObjectLinkingLayer<> ObjectLayer; | 
|  | IRCompileLayer<decltype(ObjectLayer)> CompileLayer; | 
|  |  | 
|  | using OptimizeFunction = | 
|  | std::function<std::shared_ptr<Module>(std::shared_ptr<Module>)>; | 
|  |  | 
|  | IRTransformLayer<decltype(CompileLayer), OptimizeFunction> OptimizeLayer; | 
|  |  | 
|  | public: | 
|  | using ModuleHandle = decltype(OptimizeLayer)::ModuleHandleT; | 
|  |  | 
|  | KaleidoscopeJIT() | 
|  | : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()), | 
|  | ObjectLayer([]() { return std::make_shared<SectionMemoryManager>(); }), | 
|  | CompileLayer(ObjectLayer, SimpleCompiler(*TM)), | 
|  | OptimizeLayer(CompileLayer, | 
|  | [this](std::shared_ptr<Module> M) { |
|  | return optimizeModule(std::move(M)); | 
|  | }) { | 
|  | llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr); | 
|  | } | 
|  |  | 
|  | Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1, | 
|  | but after the CompileLayer we introduce a type alias for our optimization function. |
|  | In this case we use a std::function (a handy wrapper for "function-like" things) |
|  | from a single std::shared_ptr<Module> input to a std::shared_ptr<Module> output. With |
|  | our optimization function alias in place we can declare our OptimizeLayer, |
|  | which sits on top of our CompileLayer. | 
|  |  | 
|  | To initialize our OptimizeLayer we pass it a reference to the CompileLayer | 
|  | below (standard practice for layers), and we initialize the OptimizeFunction | 
|  | using a lambda that calls out to an "optimizeModule" function that we will | 
|  | define below. | 
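
To see the std::function pattern in isolation, here is a minimal sketch that builds without LLVM. ``ToyModule`` and ``ToyJIT`` are hypothetical stand-ins invented for this illustration (they are not ORC classes); only the shape of the ``OptimizeFunction`` alias and the ``this``-capturing lambda mirror the constructor above.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for llvm::Module so the sketch builds without LLVM.
struct ToyModule {
  std::string Name;
  bool Optimized = false;
};

// Hypothetical miniature of the JIT class: it only stores and runs the functor.
class ToyJIT {
public:
  // Same shape as the tutorial's OptimizeFunction alias.
  using OptimizeFunction =
      std::function<std::shared_ptr<ToyModule>(std::shared_ptr<ToyModule>)>;

  ToyJIT()
      : Optimize([this](std::shared_ptr<ToyModule> M) {
          return optimizeModule(std::move(M));
        }) {}

  // The lambda captures 'this', so copying the object would leave the copy's
  // functor pointing at the original. Forbid copies to keep the sketch safe.
  ToyJIT(const ToyJIT &) = delete;
  ToyJIT &operator=(const ToyJIT &) = delete;

  // Runs the stored functor, as a transform layer would when a module is added.
  std::shared_ptr<ToyModule> add(std::shared_ptr<ToyModule> M) {
    return Optimize(std::move(M));
  }

private:
  std::shared_ptr<ToyModule> optimizeModule(std::shared_ptr<ToyModule> M) {
    M->Optimized = true; // Placeholder for running real optimization passes.
    return M;
  }

  OptimizeFunction Optimize;
};
```

Storing the callback as a std::function keeps the member's type independent of the lambda's unnamed type, which is what lets the tutorial name the layer type in a member declaration.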
|  |  | 
|  | .. code-block:: c++ | 
|  |  | 
|  | // ... | 
|  | auto Resolver = createLambdaResolver( | 
|  | [&](const std::string &Name) { | 
|  | if (auto Sym = OptimizeLayer.findSymbol(Name, false)) | 
|  | return Sym; | 
|  | return JITSymbol(nullptr); | 
|  | }, | 
|  | // ... | 
|  |  | 
|  | .. code-block:: c++ | 
|  |  | 
|  | // ... | 
|  | return cantFail(OptimizeLayer.addModule(std::move(M), | 
|  | std::move(Resolver))); | 
|  | // ... | 
|  |  | 
|  | .. code-block:: c++ | 
|  |  | 
|  | // ... | 
|  | return OptimizeLayer.findSymbol(MangledNameStream.str(), true); | 
|  | // ... | 
|  |  | 
|  | .. code-block:: c++ | 
|  |  | 
|  | // ... | 
|  | cantFail(OptimizeLayer.removeModule(H)); | 
|  | // ... | 
|  |  | 
|  | Next we need to replace references to 'CompileLayer' with references to | 
|  | OptimizeLayer in our key methods: addModule, findSymbol, and removeModule. In | 
|  | addModule we need to be careful to replace both references: the findSymbol call | 
|  | inside our resolver, and the call through to addModule. | 
|  |  | 
|  | .. code-block:: c++ | 
|  |  | 
|  | std::shared_ptr<Module> optimizeModule(std::shared_ptr<Module> M) { | 
|  | // Create a function pass manager. | 
|  | auto FPM = llvm::make_unique<legacy::FunctionPassManager>(M.get()); | 
|  |  | 
|  | // Add some optimizations. | 
|  | FPM->add(createInstructionCombiningPass()); | 
|  | FPM->add(createReassociatePass()); | 
|  | FPM->add(createGVNPass()); | 
|  | FPM->add(createCFGSimplificationPass()); | 
|  | FPM->doInitialization(); | 
|  |  | 
|  | // Run the optimizations over all functions in the module being added to | 
|  | // the JIT. | 
|  | for (auto &F : *M) | 
|  | FPM->run(F); | 
|  |  | 
|  | return M; | 
|  | } | 
|  |  | 
|  | At the bottom of our JIT we add a private method to do the actual optimization: | 
|  | *optimizeModule*. This function sets up a FunctionPassManager, adds some passes | 
|  | to it, runs it over every function in the module, and then returns the mutated | 
|  | module. The specific optimizations are the same ones used in | 
|  | `Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM" | 
|  | tutorial series. Readers may visit that chapter for a more in-depth | 
|  | discussion of these, and of IR optimization in general. | 
|  |  | 
|  | And that's it in terms of changes to KaleidoscopeJIT: When a module is added via | 
|  | addModule the OptimizeLayer will call our optimizeModule function before passing | 
|  | the transformed module on to the CompileLayer below. Of course, we could have | 
|  | called optimizeModule directly in our addModule function and not gone to the | 
|  | bother of using the IRTransformLayer, but doing so gives us another opportunity | 
|  | to see how layers compose. It also provides a neat entry point to the *layer* | 
|  | concept itself, because IRTransformLayer turns out to be one of the simplest | 
|  | implementations of the layer concept that can be devised: | 
|  |  | 
|  | .. code-block:: c++ | 
|  |  | 
|  | template <typename BaseLayerT, typename TransformFtor> | 
|  | class IRTransformLayer { | 
|  | public: | 
|  | using ModuleHandleT = typename BaseLayerT::ModuleHandleT; | 
|  |  | 
|  | IRTransformLayer(BaseLayerT &BaseLayer, | 
|  | TransformFtor Transform = TransformFtor()) | 
|  | : BaseLayer(BaseLayer), Transform(std::move(Transform)) {} | 
|  |  | 
|  | Expected<ModuleHandleT> | 
|  | addModule(std::shared_ptr<Module> M, | 
|  | std::shared_ptr<JITSymbolResolver> Resolver) { | 
|  | return BaseLayer.addModule(Transform(std::move(M)), std::move(Resolver)); | 
|  | } | 
|  |  | 
|  | Error removeModule(ModuleHandleT H) { return BaseLayer.removeModule(H); } |
|  |  | 
|  | JITSymbol findSymbol(const std::string &Name, bool ExportedSymbolsOnly) { | 
|  | return BaseLayer.findSymbol(Name, ExportedSymbolsOnly); | 
|  | } | 
|  |  | 
|  | JITSymbol findSymbolIn(ModuleHandleT H, const std::string &Name, | 
|  | bool ExportedSymbolsOnly) { | 
|  | return BaseLayer.findSymbolIn(H, Name, ExportedSymbolsOnly); | 
|  | } | 
|  |  | 
|  | Error emitAndFinalize(ModuleHandleT H) { |
|  | return BaseLayer.emitAndFinalize(H); |
|  | } |
|  |  | 
|  | TransformFtor& getTransform() { return Transform; } | 
|  |  | 
|  | const TransformFtor& getTransform() const { return Transform; } | 
|  |  | 
|  | private: | 
|  | BaseLayerT &BaseLayer; | 
|  | TransformFtor Transform; | 
|  | }; | 
|  |  | 
|  | This is the whole definition of IRTransformLayer, from | 
|  | ``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h``, stripped of its | 
|  | comments. It is a template class with two template arguments: ``BaseLayerT`` and |
|  | ``TransformFtor`` that provide the type of the base layer and the type of the | 
|  | "transform functor" (in our case a std::function) respectively. This class is | 
|  | concerned with two very simple jobs: (1) Running every IR Module that is added | 
|  | with addModule through the transform functor, and (2) conforming to the ORC | 
|  | layer interface. The interface consists of one typedef and five methods: | 
|  |  | 
|  | +------------------+-----------------------------------------------------------+ | 
|  | |     Interface    |                         Description                       | | 
|  | +==================+===========================================================+ | 
|  | |                  | Provides a handle that can be used to identify a module   | | 
|  | | ModuleHandleT    | set when calling findSymbolIn, removeModule, or           | | 
|  | |                  | emitAndFinalize.                                          | | 
|  | +------------------+-----------------------------------------------------------+ | 
|  | |                  | Takes a given set of Modules and makes them "available    | | 
|  | |                  | for execution". This means that symbols in those modules  | | 
|  | |                  | should be searchable via findSymbol and findSymbolIn, and | | 
|  | |                  | the address of the symbols should be read/writable (for   | | 
|  | |                  | data symbols), or executable (for function symbols) after | | 
|  | |                  | JITSymbol::getAddress() is called. Note: This means that  | | 
|  | |   addModule      | addModule doesn't have to compile (or do any other        | | 
|  | |                  | work) up-front. It *can*, like IRCompileLayer, act        | | 
|  | |                  | eagerly, but it can also simply record the module and     | | 
|  | |                  | take no further action until somebody calls               | | 
|  | |                  | JITSymbol::getAddress(). In IRTransformLayer's case       | | 
|  | |                  | addModule eagerly applies the transform functor to        | | 
|  | |                  | each module in the set, then passes the resulting set     | | 
|  | |                  | of mutated modules down to the layer below.               | | 
|  | +------------------+-----------------------------------------------------------+ | 
|  | |                  | Removes a set of modules from the JIT. Code or data       | | 
|  | |  removeModule    | defined in these modules will no longer be available, and | | 
|  | |                  | the memory holding the JIT'd definitions will be freed.   | | 
|  | +------------------+-----------------------------------------------------------+ | 
|  | |                  | Searches for the named symbol in all modules that have    | | 
|  | |                  | previously been added via addModule (and not yet          | | 
|  | |    findSymbol    | removed by a call to removeModule). In                    | | 
|  | |                  | IRTransformLayer we just pass the query on to the layer   | | 
|  | |                  | below. In our REPL this is our default way to search for  | | 
|  | |                  | function definitions.                                     | | 
|  | +------------------+-----------------------------------------------------------+ | 
|  | |                  | Searches for the named symbol in the module set indicated | | 
|  | |                  | by the given ModuleHandleT. This is just an optimized     | | 
|  | |                  | search, better for lookup speed when you know where a     | |
|  | |                  | symbol definition should be found. In IRTransformLayer    | |
|  | |   findSymbolIn   | we just pass this query on to the layer below. In our     | | 
|  | |                  | REPL we use this method to search for functions           | | 
|  | |                  | representing top-level expressions, since we know exactly | | 
|  | |                  | where we'll find them: in the top-level expression module | | 
|  | |                  | we just added.                                            | | 
|  | +------------------+-----------------------------------------------------------+ | 
|  | |                  | Forces all of the actions required to make the code and   | | 
|  | |                  | data in a module set (represented by a ModuleHandleT)     | | 
|  | |                  | accessible. Behaves as if some symbol in the set had been | | 
|  | |                  | searched for and JITSymbol::getAddress() called. This     | |
|  | | emitAndFinalize  | is rarely needed, but can be useful when dealing with     | | 
|  | |                  | layers that usually behave lazily if the user wants to    | | 
|  | |                  | trigger early compilation (for example, to use idle CPU   | | 
|  | |                  | time to eagerly compile code in the background).          | | 
|  | +------------------+-----------------------------------------------------------+ | 
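
The forwarding pattern behind this interface can be sketched without LLVM. Everything below is a hypothetical toy: ``ToyModule``, ``ToyBaseLayer``, ``ToyTransformLayer`` and ``addUnderscores`` are invented for illustration, a "module" is just a list of symbol names, and ``findSymbol`` returns a bool rather than a JITSymbol. The point is only to show that a transform layer does its work in addModule and delegates every other operation to the layer below.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module: just a list of symbol names.
struct ToyModule {
  std::vector<std::string> Symbols;
};

// Hypothetical bottom layer: stores modules and answers symbol queries.
class ToyBaseLayer {
public:
  using ModuleHandleT = int;

  ModuleHandleT addModule(std::shared_ptr<ToyModule> M) {
    ModuleHandleT H = NextHandle++;
    Modules[H] = std::move(M);
    return H;
  }

  void removeModule(ModuleHandleT H) { Modules.erase(H); }

  bool findSymbol(const std::string &Name) const {
    for (const auto &Entry : Modules)
      for (const auto &Sym : Entry.second->Symbols)
        if (Sym == Name)
          return true;
    return false;
  }

private:
  std::map<ModuleHandleT, std::shared_ptr<ToyModule>> Modules;
  ModuleHandleT NextHandle = 0;
};

// Transform on the way in, delegate everything else to the layer below.
template <typename BaseLayerT, typename TransformFtor>
class ToyTransformLayer {
public:
  using ModuleHandleT = typename BaseLayerT::ModuleHandleT;

  ToyTransformLayer(BaseLayerT &BaseLayer, TransformFtor Transform)
      : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

  ModuleHandleT addModule(std::shared_ptr<ToyModule> M) {
    return BaseLayer.addModule(Transform(std::move(M)));
  }

  void removeModule(ModuleHandleT H) { BaseLayer.removeModule(H); }

  bool findSymbol(const std::string &Name) const {
    return BaseLayer.findSymbol(Name);
  }

private:
  BaseLayerT &BaseLayer;
  TransformFtor Transform;
};

// Example transform: prefix every symbol name with an underscore.
inline std::shared_ptr<ToyModule> addUnderscores(std::shared_ptr<ToyModule> M) {
  for (auto &Sym : M->Symbols)
    Sym = "_" + Sym;
  return M;
}
```

Instantiating ``ToyTransformLayer<ToyBaseLayer, decltype(&addUnderscores)>`` over a ``ToyBaseLayer`` and adding a module that defines ``foo`` makes ``_foo`` findable through the top layer, because the base layer only ever sees the transformed module.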
|  |  | 
|  | This interface attempts to capture the natural operations of a JIT (with some | 
|  | wrinkles like emitAndFinalize for performance), similar to the basic JIT API | 
|  | operations we identified in Chapter 1. Conforming to the layer concept allows | 
|  | classes to compose neatly by implementing their behaviors in terms of these |
|  | same operations, carried out on the layer below. For example, an eager layer | 
|  | (like IRTransformLayer) can implement addModule by running each module in the | 
|  | set through its transform up-front and immediately passing the result to the | 
|  | layer below. A lazy layer, by contrast, could implement addModule by |
|  | squirreling away the modules and doing no other up-front work, deferring the |
|  | transform (and the call to addModule on the layer below) until the client calls |
|  | findSymbol. The JIT'd program behavior will be the same either way, but |
|  | these choices will have different performance characteristics: Doing work | 
|  | eagerly means the JIT takes longer up-front, but proceeds smoothly once this is | 
|  | done. Deferring work allows the JIT to get up-and-running quickly, but will | 
|  | force the JIT to pause and wait whenever some code or data is needed that hasn't | 
|  | already been processed. | 
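
As a rough illustration of the lazy alternative, the sketch below defers the transform until the first lookup. All names are hypothetical and LLVM-free (``ToyModule``, ``ToyBaseLayer``, ``ToyLazyTransformLayer``), and it makes a simplifying assumption: any call to findSymbol flushes every pending module, whereas a real lazy layer would try to process only the module that defines the requested symbol.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module: just a list of symbol names.
struct ToyModule {
  std::vector<std::string> Symbols;
};

// Hypothetical bottom layer: stores modules and answers symbol queries.
class ToyBaseLayer {
public:
  void addModule(std::shared_ptr<ToyModule> M) {
    Modules.push_back(std::move(M));
  }

  bool findSymbol(const std::string &Name) const {
    for (const auto &M : Modules)
      for (const auto &Sym : M->Symbols)
        if (Sym == Name)
          return true;
    return false;
  }

private:
  std::vector<std::shared_ptr<ToyModule>> Modules;
};

using ToyTransform =
    std::function<std::shared_ptr<ToyModule>(std::shared_ptr<ToyModule>)>;

// Lazy variant: addModule only records the module. The transform runs on the
// first lookup after the module was added.
class ToyLazyTransformLayer {
public:
  ToyLazyTransformLayer(ToyBaseLayer &BaseLayer, ToyTransform Transform)
      : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

  // No up-front work: just squirrel the module away.
  void addModule(std::shared_ptr<ToyModule> M) {
    Pending.push_back(std::move(M));
  }

  // Simplification: flush all pending modules, then query the layer below.
  bool findSymbol(const std::string &Name) {
    for (auto &M : Pending)
      BaseLayer.addModule(Transform(std::move(M)));
    Pending.clear();
    return BaseLayer.findSymbol(Name);
  }

private:
  ToyBaseLayer &BaseLayer;
  ToyTransform Transform;
  std::vector<std::shared_ptr<ToyModule>> Pending;
};
```

With a transform that counts its own invocations, the count stays at zero after addModule and rises only when findSymbol is first called, which is the pause-on-first-use behavior described above.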
|  |  | 
|  | Our current REPL is eager: Each function definition is optimized and compiled as | 
|  | soon as it's typed in. If we were to make the transform layer lazy (but not | 
|  | change things otherwise) we could defer optimization until the first time we | 
|  | reference a function in a top-level expression (see if you can figure out why, | 
|  | then check out the answer below [1]_). In the next chapter, however, we'll |
|  | introduce fully lazy compilation, in which functions aren't compiled until |
|  | they're first called at run-time. At this point the trade-offs get much more | 
|  | interesting: the lazier we are, the quicker we can start executing the first | 
|  | function, but the more often we'll have to pause to compile newly encountered | 
|  | functions. If we only code-gen lazily, but optimize eagerly, we'll have a slow | 
|  | startup (while everything is optimized) but relatively short pauses as each |
|  | function just passes through code-gen. If we both optimize and code-gen lazily | 
|  | we can start executing the first function more quickly, but we'll have longer | 
|  | pauses as each function has to be both optimized and code-gen'd when it's first | 
|  | executed. Things become even more interesting if we consider interprocedural |
|  | optimizations like inlining, which must be performed eagerly. These are |
|  | complex trade-offs, and there is no one-size-fits-all solution to them, but by |
|  | providing composable layers we leave the decisions to the person implementing | 
|  | the JIT, and make it easy for them to experiment with different configurations. | 
|  |  | 
|  | `Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_ | 
|  |  | 
|  | Full Code Listing | 
|  | ================= | 
|  |  | 
|  | Here is the complete code listing for our running example with an | 
|  | IRTransformLayer added to enable optimization. To build this example, use: | 
|  |  | 
|  | .. code-block:: bash | 
|  |  | 
|  | # Compile | 
|  | clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy | 
|  | # Run | 
|  | ./toy | 
|  |  | 
|  | Here is the code: | 
|  |  | 
|  | .. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h | 
|  | :language: c++ | 
|  |  | 
|  | .. [1] When we add our top-level expression to the JIT, any calls to functions | 
|  | that we defined earlier will appear to the RTDyldObjectLinkingLayer as | 
|  | external symbols. The RTDyldObjectLinkingLayer will call the SymbolResolver | 
|  | that we defined in addModule, which in turn calls findSymbol on the | 
|  | OptimizeLayer, at which point even a lazy transform layer will have to | 
|  | do its work. |