.. raw:: html

  <style type="text/css">
    .none { background-color: #FFCCCC }
    .partial { background-color: #FFFF99 }
    .good { background-color: #CCFF99 }
  </style>

.. role:: none
.. role:: partial
.. role:: good

.. contents::
   :local:

==================
OpenMP Support
==================

Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
and PPC64[LE] targets, and has `basic support for Cuda devices`_.
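Offloading is enabled with the ``-fopenmp-targets=`` flag naming the device
triple; for example (the source file name is illustrative):

```shell
# Host-only OpenMP:
clang -fopenmp app.c -o app

# OpenMP with offloading to a CUDA device (NVPTX target):
clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda app.c -o app
```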

Standalone directives
=====================

* #pragma omp [for] simd: :good:`Complete`.

* #pragma omp declare simd: :partial:`Partial`.  We support parsing and
  semantic analysis and generate the special attributes for the X86 target,
  but the LLVM pass for vectorization is still missing.

* #pragma omp taskloop [simd]: :good:`Complete`.

* #pragma omp target [enter|exit] data: :good:`Complete`.

* #pragma omp target update: :good:`Complete`.

* #pragma omp target: :good:`Complete`.

* #pragma omp declare target: :good:`Complete`.

* #pragma omp teams: :good:`Complete`.

* #pragma omp distribute [simd]: :good:`Complete`.

* #pragma omp distribute parallel for [simd]: :good:`Complete`.

Combined directives
===================

* #pragma omp parallel for simd: :good:`Complete`.

* #pragma omp target parallel: :good:`Complete`.

* #pragma omp target parallel for [simd]: :good:`Complete`.

* #pragma omp target simd: :good:`Complete`.

* #pragma omp target teams: :good:`Complete`.

* #pragma omp teams distribute [simd]: :good:`Complete`.

* #pragma omp target teams distribute [simd]: :good:`Complete`.

* #pragma omp teams distribute parallel for [simd]: :good:`Complete`.

* #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.

Clang does not support any constructs/updates from the upcoming OpenMP 5.0
except for `reduction`-based clauses in the `task` and `target`-based
directives.

In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and
macOS.

.. _basic support for Cuda devices:

Cuda devices support
====================

Directives execution modes
--------------------------

Clang code generation for target regions supports two modes: the SPMD and
non-SPMD modes. Clang chooses one of these two modes automatically based on the
way directives and clauses on those directives are used. The SPMD mode uses a
simplified set of runtime functions, thus increasing performance at the cost of
not supporting some OpenMP features. The non-SPMD mode is the most generic mode
and supports all currently available OpenMP features. The compiler will always
attempt to use the SPMD mode wherever possible. SPMD mode will not be used if:

   - The target region contains an `if()` clause that refers to a `parallel`
     directive.

   - The target region contains a `parallel` directive with a `num_threads()`
     clause.

   - The target region contains user code (other than OpenMP-specific
     directives) in between the `target` and the `parallel` directives.

Data-sharing modes
------------------

Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
modes. The default mode is `Generic`. `Cuda` mode can give additional
performance and can be activated using the `-fopenmp-cuda-mode` flag. In
`Generic` mode all local variables that can be shared in the parallel regions
are stored in the global memory. In `Cuda` mode local variables are not shared
between the threads and it is the user's responsibility to share the required
data between the threads in the parallel regions.

Features not supported or with limited support for Cuda devices
---------------------------------------------------------------

- Reductions across the teams are not supported yet.

- Cancellation constructs are not supported.

- Doacross loop nest is not supported.

- User-defined reductions are supported only for trivial types.

- Nested parallelism: inner parallel regions are executed sequentially.

- Static linking of libraries containing device code is not supported yet.

- Automatic translation of math functions in target regions to device-specific
  math functions is not implemented yet.

- Debug information for OpenMP target regions is not supported yet.