| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| "http://www.w3.org/TR/html4/strict.dtd"> |
| <!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> |
| <html> |
| <head> <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> |
| <title>Polly - Performance</title> |
| <link type="text/css" rel="stylesheet" href="menu.css"> |
| <link type="text/css" rel="stylesheet" href="content.css"> |
| </head> |
| <body> |
| <div id="box"> |
| <!--#include virtual="menu.html.incl"--> |
| <div id="content"> |
| <h1>Performance</h1> |
| |
| <p>To evaluate the performance benefits Polly currently provides, we compiled the |
| <a href="http://www.cse.ohio-state.edu/~pouchet/software/polybench/">Polybench |
| 2.0</a> benchmark suite. Each benchmark was run with double-precision floating-point |
| values on an Intel Xeon X5670 CPU @ 2.93GHz (12 cores, 24 threads) |
| system. We used <a href="http://pocc.sf.net">PoCC</a> and the included <a |
| href="http://pluto-compiler.sf.net">Pluto</a> transformations to optimize the |
| code. The source code of Polly and LLVM/clang was checked out on |
| 25/03/2011.</p> |
| |
| <p>The results shown were produced fully automatically, without manual |
| intervention. We have not yet spent any time tuning the results. Hence, |
| further improvements may be achieved by tuning the code generated by Polly, the |
| heuristics used by Pluto, or by investigating whether more code could be optimized. |
| As Pluto was never used at such a low level, its heuristics are probably |
| far from perfect. Another area where we expect larger performance improvements |
| is SIMD vector code generation. At the moment, it rarely yields |
| performance improvements, as we have not yet included vectorization in our |
| heuristics. By changing this we should be able to significantly increase the |
| number of test cases that show improvements.</p> |
| |
| <p>The Polybench test suite contains computation kernels from linear algebra |
| routines, stencil computations, image processing and data mining. Polly |
| recognizes the majority of them and is able to show good speedup. However, |
| to show similar speedup on larger examples like the SPEC CPU benchmarks, Polly |
| still lacks support for integer casts, variable-sized multi-dimensional arrays |
| and probably several other constructs. This support is necessary as such |
| constructs appear in larger programs, but not in our limited test suite.</p> |
| |
| <h2> Sequential runs</h2> |
| |
| <p>For the sequential runs we used Polly to create a program structure that is |
| optimized for data locality. One of the major optimizations performed is tiling. |
| The speedups shown do not rely on any multi-core parallelism. No |
| additional hardware is used; the single available core is simply used more |
| efficiently.</p> |
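| <p>As a rough illustration of what tiling does, the following hand-written C |
| sketch contrasts a plain matrix-multiplication loop nest with a tiled one. It is |
| not output produced by Polly or Pluto; the kernel, the function names |
| <code>matmul_naive</code> and <code>matmul_tiled</code>, and the sizes |
| <code>N</code> and <code>TILE</code> are assumptions chosen for illustration.</p> |

```c
#include <stddef.h>

#define N 64     /* matrix dimension (hypothetical, for illustration only) */
#define TILE 16  /* tile edge length (hypothetical; depends on the cache) */

/* Reference kernel: C += A * B with the classic i-j-k loop nest. */
void matmul_naive(double C[N][N], double A[N][N], double B[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Tiled kernel: the same iterations, visited block by block.
 * The three outer loops walk over TILE x TILE x TILE blocks; the three
 * inner loops do the work inside one block, so the parts of A, B and C
 * touched by a block are reused before moving on to the next block. */
void matmul_tiled(double C[N][N], double A[N][N], double B[N][N])
{
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t jj = 0; jj < N; jj += TILE)
            for (size_t kk = 0; kk < N; kk += TILE)
                /* The "< N" checks keep the code correct even if N is
                 * not a multiple of TILE. */
                for (size_t i = ii; i < ii + TILE && i < N; i++)
                    for (size_t j = jj; j < jj + TILE && j < N; j++)
                        for (size_t k = kk; k < kk + TILE && k < N; k++)
                            C[i][j] += A[i][k] * B[k][j];
}
```

| <p>Both functions compute the same result. The tiled variant only changes the |
| order in which the iterations are visited, so that the data of one block is |
| likely to still be in cache when it is reused. A suitable tile size depends on |
| the cache hierarchy of the target machine.</p> |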
| <h3> Small data size</h3> |
| <img src="images/performance/sequential-small.png" /><br /> |
| <h3> Large data size</h3> |
| <img src="images/performance/sequential-large.png" /> |
| <h2> Parallel runs</h2> |
| <p>For the parallel runs we used Polly to expose parallelism and to add calls to an |
| OpenMP runtime library. With OpenMP we can use all 12 hardware cores |
| instead of the single core that was used before. In several |
| cases we obtain super-linear speedup. This additional speedup comes from the |
| improved data locality, which adds to the gain from parallel execution.</p> |
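| <p>To give an idea of the kind of loop-level parallelism involved, the sketch |
| below distributes the independent rows of a matrix-vector product across |
| threads with an OpenMP pragma. This is a hand-written, source-level analogy and |
| not code emitted by Polly, which inserts the runtime calls itself. The function |
| name <code>matvec_parallel</code> and the sizes <code>ROWS</code> and |
| <code>COLS</code> are assumptions made for this example.</p> |

```c
#define ROWS 256  /* number of matrix rows (hypothetical, for illustration) */
#define COLS 256  /* number of matrix columns (hypothetical) */

/* Computes y = A * x.
 * Each iteration of the outer loop writes a different y[i] and only reads
 * A and x, so the iterations are independent and can run in parallel.
 * If the compiler is invoked without OpenMP support (for example without
 * -fopenmp on GCC or clang), the pragma is ignored and the loop simply
 * runs sequentially with the same result. */
void matvec_parallel(double y[ROWS], double A[ROWS][COLS], double x[COLS])
{
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++) {
        double sum = 0.0;  /* private to each iteration */
        for (int j = 0; j < COLS; j++)
            sum += A[i][j] * x[j];
        y[i] = sum;
    }
}
```

| <p>Only the outer loop is parallelized here, so each thread works on whole |
| rows and no synchronization is needed inside the loop body.</p> |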
| <h3> Small data size</h3> |
| <img src="images/performance/parallel-small.png" /><br /> |
| <h3> Large data size</h3> |
| <img src="images/performance/parallel-large.png" /> |
| </div> |
| </div> |
| </body> |
| </html> |