Requirements:

- automake, autoconf, libtool
    (not needed when compiling a release)
- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
    (not needed when compiling a release using the included isl and pet)
- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
    (only needed if you want to compile the pet executable)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
    Unless you have some other reason for wanting to use the svn version,
    it is best to install the latest release (3.9).
    For more details, see pet/README.

If you are installing on Ubuntu, then you can install the following packages:

    automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm

Note that you need at least version 3.2 of libclang-dev (Ubuntu Raring).
Older versions of this package did not include the required libraries.
If you are using an older version of Ubuntu, then you need to compile and
install LLVM/clang from source.


Preparing:

Either download and extract the latest release, or get the source from
the git repository as follows.  Building from the git repository requires
autoconf, automake, libtool and pkg-config.

    git clone git://repo.or.cz/ppcg.git
    cd ppcg
    ./get_submodules.sh
    ./autogen.sh


Compilation:

    ./configure
    make
    make check

If you have installed any of the required libraries in a non-standard
location, then you may need to use the --with-gmp-prefix,
--with-libyaml-prefix and/or --with-clang-prefix options
when calling "./configure".


Using PPCG to generate CUDA or OpenCL code

To convert a fragment of a C program to CUDA or OpenCL, insert a line containing

    #pragma scop

before the fragment and add a line containing

    #pragma endscop

after the fragment.  To generate CUDA code, run

    ppcg --target=cuda file.c

where file.c is the file containing the fragment.  The generated
code is stored in file_host.cu and file_kernel.cu.

To generate OpenCL code, run

    ppcg --target=opencl file.c

where file.c is the file containing the fragment.  The generated code
is stored in file_host.c and file_kernel.cl.
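
As an illustration, an input file might contain a fragment like the one
below.  The function name vector_add and the array size N are made up for
this sketch; only the two pragma lines are what PPCG looks for.  An ordinary
C compiler ignores the pragmas (GCC may warn about them with
-Wunknown-pragmas), so the same file can still be compiled and checked on
the host.

```c
#define N 1024

/* Hypothetical example: element-wise vector addition.
 * Only the region between the two pragmas is analyzed by PPCG. */
void vector_add(float a[N], float b[N], float c[N])
{
#pragma scop
    for (int i = 0; i < N; ++i)
        c[i] = a[i] + b[i];
#pragma endscop
}
```

Running ppcg --target=cuda on such a file, say vector_add.c, would then be
expected to produce vector_add_host.cu and vector_add_kernel.cu, following
the naming scheme described above.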


Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option.  The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces.  The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile.  The elements of the single integer tuple
specify the tile sizes in each dimension.
In the case of hybrid tiling, the first element is half the size of
the tile in the time (sequential) dimension.  The second element
specifies the number of elements in the base of the hexagon.
The remaining elements specify the tile sizes in the remaining space
dimensions.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid.  The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in a block.  The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

    { kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.
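
To make the meaning of tile[64,64] more concrete, the following sketch
tiles two loop dimensions with a tile size of 64 by hand.  It only
illustrates the loop-tiling idea in plain sequential C and is not the code
PPCG generates, which additionally maps tiles and the points within them to
blocks and threads.  The functions sum_plain and sum_tiled are hypothetical.
Tiling changes the order in which the iterations execute but not the set of
iterations, so both functions return the same result.

```c
#define TILE 64

/* Untiled reference: sum of all elements of an n x m row-major array. */
long sum_plain(int n, int m, const int *a)
{
    long s = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            s += a[i * m + j];
    return s;
}

/* Same iteration space, tiled in both dimensions with tile size TILE.
 * The outer two loops enumerate tiles, the inner two loops enumerate
 * the points inside a tile; partial tiles at the border are clipped. */
long sum_tiled(int n, int m, const int *a)
{
    long s = 0;
    for (int ti = 0; ti < n; ti += TILE)
        for (int tj = 0; tj < m; tj += TILE)
            for (int i = ti; i < ti + TILE && i < n; ++i)
                for (int j = tj; j < tj + TILE && j < m; ++j)
                    s += a[i * m + j];
    return s;
}
```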

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel.  If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.
Instead of examining the kernels, you can also specify the option
--dump-sizes on the first run to obtain the default sizes that were
actually used.


Compiling the generated CUDA code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU.  Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler.  We discourage the use of older cards, as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13".  This will not only be slower, but can also cause
correctness issues.
If you want to obtain results that are identical to those obtained
by the original code, then you may need to disable some optimizations
by passing the "--fmad=false" option.


Compiling the generated OpenCL code with gcc

To compile the host code, you need to compile and link it together with
the file ocl_utilities.c, which contains utility functions used by the
generated OpenCL host code.  To compile the host code with gcc, run

    gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL

Note that we have experienced the generated OpenCL code freezing
on some inputs (e.g., the PolyBench symm benchmark) when using
at least some versions of the Nvidia OpenCL library, while the
corresponding CUDA code runs fine.
We have experienced no such freezes when using AMD, ARM or Intel
OpenCL libraries.

By default, the compiled executable will need the _kernel.cl file at
run time.  Alternatively, the option --opencl-embed-kernel-code may be
passed to PPCG to place the kernel code in a string literal.  The kernel
source is then embedded in the host binary, such that the _kernel.cl file
is no longer needed at run time.  Any kernel include files, in particular
those supplied using --opencl-include-file, will still be required at
run time.
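
Conceptually, the embedded variant looks something like the sketch below:
the contents of the kernel file become a C string that the host code hands
to the OpenCL runtime (for instance through clCreateProgramWithSource)
instead of reading _kernel.cl from disk.  The names kernel_code,
embedded_kernel_source, my_functions.cl and my_add, as well as the kernel
itself, are hypothetical and do not reflect PPCG's actual output.  The
#include line shows why include files are still needed: it is only resolved
when the OpenCL compiler builds the program at run time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel source embedded as a string literal.  The
 * #include is not resolved here; the OpenCL compiler resolves it at
 * run time, so my_functions.cl must still be available then. */
static const char kernel_code[] =
    "#include \"my_functions.cl\"\n"
    "__kernel void kernel0(__global float *c,\n"
    "                      __global const float *a,\n"
    "                      __global const float *b)\n"
    "{\n"
    "    int i = get_global_id(0);\n"
    "    c[i] = my_add(a[i], b[i]);\n"
    "}\n";

/* Returns the embedded source and stores its length in *length; a host
 * program would pass both to the OpenCL runtime to build the kernel. */
const char *embedded_kernel_source(size_t *length)
{
    *length = strlen(kernel_code);
    return kernel_code;
}
```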


Function calls

Function calls inside the analyzed fragment are reproduced
in the CUDA or OpenCL code, but for now it is left to the user
to make sure that the functions being called are
available from the generated kernels.

In the case of OpenCL code, the --opencl-include-file option
may be used to specify one or more files to be #include'd
from the generated code.  These files may then contain
the definitions of the functions being called from the
program fragment.  If the pathnames of the included files
are relative to the current directory, then you may additionally
need to specify --opencl-compiler-options=-I.
to make sure that the files can be found by the OpenCL compiler.
The included files may contain definitions of types used by the
generated kernels.  By default, PPCG generates definitions for
types as needed, but these definitions may collide with those in
the included files, as PPCG does not consider the contents of the
included files.  The --no-opencl-print-kernel-types option prevents
PPCG from generating type definitions.
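
For example, a fragment that calls a helper function could look like the
sketch below.  In an OpenCL build, the definition of the helper would go
into a separate file, here called helpers.cl, passed with
--opencl-include-file; in this sketch it is kept in the same C file so that
the code compiles on its own.  The names saxpy, saxpy_elem and helpers.cl
are hypothetical.  Since OpenCL C is based on C99, a simple helper like this
one can usually be written the same way in both settings.

```c
#define N 256

/* In the OpenCL setting, this definition would live in the file passed
 * via --opencl-include-file; PPCG itself only reproduces the call. */
float saxpy_elem(float alpha, float x, float y)
{
    return alpha * x + y;
}

void saxpy(float alpha, float x[N], float y[N])
{
#pragma scop
    for (int i = 0; i < N; ++i)
        y[i] = saxpy_elem(alpha, x[i], y[i]);
#pragma endscop
}
```

A corresponding invocation might then be
ppcg --target=opencl --opencl-include-file=helpers.cl --opencl-compiler-options=-I. file.c,
where the -I. part is only needed because helpers.cl is given relative to
the current directory.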


GNU extensions

By default, PPCG may print out macro definitions that involve
GNU extensions such as __typeof__ and statement expressions.
Some compilers may not support these extensions.
In particular, OpenCL 1.2 beignet 1.1.1 (git-6de6918)
has been reported not to support __typeof__.
The use of these extensions can be turned off with the
--no-allow-gnu-extensions option.
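
To give an idea of what such a macro can look like, the sketch below
contrasts a minimum macro written with __typeof__ and a statement expression
with a portable variant.  The macro names min_gnu and min_portable and the
helper next_value are hypothetical, and the exact macros PPCG prints may
differ.  The GNU version (accepted by GCC and Clang) evaluates each argument
exactly once, whereas the portable version may evaluate one of its arguments
twice, which is harmless as long as the arguments have no side effects.

```c
/* GNU C: statement expression + __typeof__, each argument evaluated once. */
#define min_gnu(x, y) \
    ({ __typeof__(x) _x = (x); __typeof__(y) _y = (y); _x < _y ? _x : _y; })

/* Portable C: no extensions, but x or y may be evaluated twice. */
#define min_portable(x, y) ((x) < (y) ? (x) : (y))

/* Counts how often next_value() is called. */
int count_calls;

int next_value(int v)
{
    ++count_calls;
    return v;
}
```

With this helper, min_gnu(next_value(3), next_value(5)) calls next_value
twice, while min_portable with the same arguments calls it three times.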


Processing PolyBench

When processing a PolyBench/C 3.2 benchmark, you should always specify
-DPOLYBENCH_USE_C99_PROTO on the ppcg command line.  Otherwise, the source
files are inconsistent, having fixed-size arrays but parametrically
bounded loops iterating over them.
However, you should not specify this define when compiling
the PPCG-generated code using nvcc, since CUDA does not support
variable-length arrays (VLAs).
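
The inconsistency can be seen in a small C sketch.  With a fixed-size array
parameter, the array extent (here the hypothetical constant N) is unrelated
to the loop bound n, whereas a C99 prototype with variable-length array
parameters ties both to the same parameter.  This only mimics the effect of
POLYBENCH_USE_C99_PROTO; it is not PolyBench code, and the function names
scale_fixed and scale_c99 are hypothetical.

```c
#define N 4000

/* Fixed-size parameter: the array has extent N, but the loops run to n. */
void scale_fixed(int n, double A[N][N], double alpha)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            A[i][j] *= alpha;
}

/* C99 prototype: the extent of A is given by the same n that bounds
 * the loops, so array sizes and loop bounds are consistent. */
void scale_c99(int n, double A[n][n], double alpha)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            A[i][j] *= alpha;
}
```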


CUDA and function overloading

While CUDA supports function overloading based on the argument types,
no such function overloading exists in the input language C.  Since PPCG
simply prints out the same function name as in the original code, this
may result in a different function being called based on the types
of the arguments.  For example, if the original code contains a call
to the function sqrt() with a float argument, then the argument will
be promoted to a double and the sqrt() function will be called.
In the transformed (CUDA) code, however, overloading will cause the
function sqrtf() to be called.  Until this issue has been resolved in PPCG,
we recommend that users either explicitly call the function sqrtf() or
explicitly cast the argument to double in the input code.


Contact

For bug reports, feature requests and questions,
contact http://groups.google.com/group/isl-development

Whenever you report a bug, please mention the exact version of PPCG
that you are using (output of "./ppcg --version").  If you are unable
to compile PPCG, then report the git version (output of "git describe")
or the version number included in the name of the tarball.


Citing PPCG

If you use PPCG for your research, you are invited to cite
the following paper.

@article{Verdoolaege2013PPCG,
  author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and
            G\'{o}mez, Jos{\'e} Ignacio and Tenllado, Christian and
            Catthoor, Francky},
  title = {Polyhedral parallel code generation for CUDA},
  journal = {ACM Trans. Archit. Code Optim.},
  issue_date = {January 2013},
  volume = {9},
  number = {4},
  month = jan,
  year = {2013},
  issn = {1544-3566},
  pages = {54:1--54:23},
  doi = {10.1145/2400682.2400713},
  acmid = {2400713},
  publisher = {ACM},
  address = {New York, NY, USA},
}