| .. include:: <isonum.txt> |
| ================================================== |
| Performance |
| ================================================== |
| |
| High-Performance Generalized Matrix Multiplication |
| -------------------------------------------------- |
| |
| Polly automatically detects and optimizes generalized matrix multiplication, |
| the computation C |larr| α ⊗ C ⊕ β ⊗ A ⊗ B, where A, B, and C are three appropriately sized matrices, |
| ⊕ and ⊗ are the operations of the corresponding matrix semiring, and α and β are |
| constants, with β not equal to zero. The result is a highly optimized form structured |
| similarly to the expert implementations of GEMM found in GotoBLAS and its successors. The |
| performance evaluation of GEMM is shown in the following figure. |
| |
| |
| .. image:: images/GEMM_double.png |
| :align: center |
| |
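To make the formula above concrete, here is a minimal sketch of the computation as a plain C loop nest, instantiated for the ordinary (+, ×) semiring on doubles. The function name ``gemm_naive``, the row-major flat layout, and the parameter order are assumptions made for illustration; the sketch shows what is being computed, not the exact source form that Polly's pattern matcher requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference kernel (not taken from Polly):
 * C <- alpha * C + beta * A * B over the ordinary (+, *) semiring.
 * All matrices are row-major: C is m x n, A is m x k, B is k x n. */
void gemm_naive(size_t m, size_t n, size_t k,
                double alpha, double beta,
                double *C, const double *A, const double *B)
{
    assert(C != NULL && A != NULL && B != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Scale the existing entry: alpha (x) C. */
            C[i * n + j] *= alpha;
            /* Accumulate the product term: (+) beta (x) A (x) B. */
            for (size_t p = 0; p < k; ++p)
                C[i * n + j] += beta * A[i * k + p] * B[p * n + j];
        }
    }
}
```

Swapping ``+`` and ``*`` for another semiring's operations (for example, min and +) gives the generalized variants that the formula covers.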
| |
| |
| Compile Time Impact of Polly |
| ---------------------------- |
| |
| Clang+LLVM+Polly are compiled using Clang on an Intel(R) Core(TM) i7-7700 based system. The experiment |
| is run twice, once with Polly enabled and once without, to measure its compile time impact. |
| |
| The following versions are used: |
| |
| |
| - Polly (git hash 0db98a4837b6f233063307bb9184374175401922) |
| - Clang (git hash 3e1d04a92b51ed36163995c96c31a0e4bbb1561d) |
| - LLVM (git hash 0265ec7ebad69a47f5c899d95295b5eb41aba68e) |
| |
| `ninja <https://ninja-build.org/>`_ is used as the build system. |
| |
| In both cases, the whole compilation was performed five times. The compile times, in seconds, are shown in the following table. |
| |
| +--------------+-------------+ |
| |Polly Disabled|Polly Enabled| |
| +==============+=============+ |
| |964 |977 | |
| +--------------+-------------+ |
| |964 |980 | |
| +--------------+-------------+ |
| |967 |981 | |
| +--------------+-------------+ |
| |967 |981 | |
| +--------------+-------------+ |
| |968 |982 | |
| +--------------+-------------+ |
| |
| |
| The median compile time is 967 seconds with Polly disabled and 981 seconds with Polly enabled. The overhead is therefore 14 seconds, or about 1.4% of the Polly-disabled median. |
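The median and overhead figures can be re-derived from the table with a few lines of C. This is only a sketch: the helper names ``median`` and ``overhead_percent`` and the fixed 16-sample limit are illustrative assumptions, not part of any Polly tooling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compile times in seconds, copied from the table above. */
static const int polly_disabled[5] = {964, 964, 967, 967, 968};
static const int polly_enabled[5]  = {977, 980, 981, 981, 982};

/* Three-way comparison for qsort, written to avoid integer overflow. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Median of 1..16 samples; for an even count, the mean of the two
 * middle values. The input array is left unmodified. */
double median(const int *samples, size_t n)
{
    int sorted[16];
    assert(n > 0 && n <= 16);
    memcpy(sorted, samples, n * sizeof sorted[0]);
    qsort(sorted, n, sizeof sorted[0], cmp_int);
    return (n % 2) ? (double)sorted[n / 2]
                   : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}

/* Relative slowdown of `measured` with respect to `baseline`, in percent. */
double overhead_percent(double baseline, double measured)
{
    return 100.0 * (measured - baseline) / baseline;
}
```

Calling ``overhead_percent(median(polly_disabled, 5), median(polly_enabled, 5))`` evaluates (981 - 967) / 967, which rounds to the 1.4% quoted above.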
| |