5. Thrust & CUME

5.1. Thrust

Thrust is a C++ template library for CUDA based on the Standard Template Library (STL). Thrust allows you to implement high performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with CUDA C.
Thrust provides a rich collection of data parallel primitives such as scan, sort, and reduce, which can be composed together to implement complex algorithms with concise, readable source code. By describing your computation in terms of these high-level abstractions you provide Thrust with the freedom to select the most efficient implementation automatically. As a result, Thrust can be utilized in rapid prototyping of CUDA applications, where programmer productivity matters most, as well as in production, where robustness and absolute performance are crucial.

Example 1: sorting

We generate random integers and sort them (i5-4570 CPU @ 3.20GHz, GeForce GTX 770):

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <algorithm>
#include <cstdlib>

int main(void)
{
  // generate 32M random numbers serially
  thrust::host_vector<int> cpu_vec(32 << 20);
  std::generate(cpu_vec.begin(), cpu_vec.end(), rand);

  // transfer data to the device
  thrust::device_vector<int> gpu_vec = cpu_vec;

  // sort data on the device (846M keys per second on GeForce GTX 480)
  thrust::sort(gpu_vec.begin(), gpu_vec.end());

  // transfer data back to host
  thrust::copy(gpu_vec.begin(), gpu_vec.end(), cpu_vec.begin());

  return 0;
}

nvcc --link -o thrust_sort.exe thrust_sort.cu --compiler-options -O2 -gencode arch=compute_30,code=sm_30

time ./thrust_sort.exe

real    0m0.432s
user    0m0.361s
sys     0m0.069s

Example 2: reduction (sum)

We generate random integers and compute their sum (i5-4570 CPU @ 3.20GHz, GeForce GTX 770):

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <algorithm>
#include <cstdlib>
#include <iostream>
using namespace std;

int generator() {
    return rand() % 5;
}

int main(void)
{
  // generate random data serially
  thrust::host_vector<int> cpu_vec(1000000);
  std::generate(cpu_vec.begin(), cpu_vec.end(), generator);

  // transfer to device and compute sum
  thrust::device_vector<int> gpu_vec = cpu_vec;
  int sum = thrust::reduce(gpu_vec.begin(), gpu_vec.end(), 0, thrust::plus<int>());
  cout << "sum = " << sum << endl;
  return 0;
}

nvcc --link -o thrust_reduce.exe thrust_reduce.cu --compiler-options -O2 -gencode arch=compute_30,code=sm_30

time ./thrust_reduce.exe 
sum = 2002318

real    0m0.058s
user    0m0.023s
sys     0m0.033s

5.2. CUME

CUME (pronounced "QMi") is an acronym for CUDA Made Easy. It is a framework that aims to simplify CUDA programming by relieving the programmer of tedious tasks such as:

For more information, see: