CUDA: [lecture] [hands-on]
The goal of this lab session is to compare transfer times between host memory and device memory.
Write a CUDA program that takes an array size in bytes as a parameter and then:
cudaError_t cudaMalloc(void **devPtr, size_t size);
cudaError_t cudaMemcpy(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind);
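A minimal sketch of such a program, assuming (as the nvprof output below suggests: 10 Calls per copy direction and a single cudaMalloc) that the array is copied host to device and back 10 times from ordinary pageable `malloc` memory. The names `memory_transfer` and `cuda_failed` are hypothetical, not part of the original exercise.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: prints the CUDA error (if any) and reports whether the call failed.
static bool cuda_failed(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return true;
    }
    return false;
}

// Allocates `bytes` bytes of pageable host memory and of device memory, then
// copies host -> device and device -> host `iterations` times.
// Returns 0 on success, 1 on any allocation or copy error.
int memory_transfer(std::size_t bytes, int iterations) {
    unsigned char* host = static_cast<unsigned char*>(std::malloc(bytes));
    if (host == nullptr) return 1;
    std::memset(host, 0x5A, bytes);  // touch every page so the copies read committed memory

    void* device = nullptr;
    if (cuda_failed(cudaMalloc(&device, bytes), "cudaMalloc")) {
        std::free(host);
        return 1;
    }

    int status = 0;
    for (int i = 0; i < iterations && status == 0; ++i) {
        if (cuda_failed(cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice), "cudaMemcpy HtoD") ||
            cuda_failed(cudaMemcpy(host, device, bytes, cudaMemcpyDeviceToHost), "cudaMemcpy DtoH")) {
            status = 1;
        }
    }

    cudaFree(device);
    std::free(host);
    return status;
}
```

A `main` can read the size with `std::strtoull(argv[1], nullptr, 10)` and call `memory_transfer(size, 10)`; the resulting executable is then run under nvprof as shown below.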
Transfer times are measured with nvprof:
nvprof --unified-memory-profiling off program arguments
nvprof writes its output to stderr. Take the times highlighted in red (the Avg values of the [CUDA memcpy HtoD] and [CUDA memcpy DtoH] rows) and convert them to seconds. Here they are averages over 10 iterations (the Calls column).
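Converting the nvprof durations to seconds can be automated. The sketch below assumes every duration uses one of the suffixes visible in this output (s, ms, us, ns); `nvprof_time_to_seconds` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Converts an nvprof duration such as "106.12ms", "5.1870us", "265ns" or
// "1.06122s" into seconds. The suffix set is an assumption based on the
// sample output; any other suffix raises std::invalid_argument.
double nvprof_time_to_seconds(const std::string& text) {
    std::size_t pos = 0;
    const double value = std::stod(text, &pos);  // numeric prefix
    const std::string unit = text.substr(pos);   // remaining unit suffix
    if (unit == "s")  return value;
    if (unit == "ms") return value * 1e-3;
    if (unit == "us") return value * 1e-6;
    if (unit == "ns") return value * 1e-9;
    throw std::invalid_argument("unknown nvprof time unit: " + unit);
}
```

For instance, the DtoH average of 106.12ms becomes 0.10612 s.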
==11839== NVPROF is profiling process 11839, command: ./bin/memory_transfer.exe 1073741824
==11839== Profiling application: ./bin/memory_transfer.exe 1073741824
==11839== Profiling result:
Time(%) Time Calls Avg Min Max Name
50.36% 1.06122s 10 106.12ms 105.51ms 107.42ms [CUDA memcpy DtoH]
49.64% 1.04620s 10 104.62ms 103.81ms 107.23ms [CUDA memcpy HtoD]
==11839== API calls:
Time(%) Time Calls Avg Min Max Name
95.52% 2.10938s 20 105.47ms 103.82ms 107.60ms cudaMemcpy
4.44% 98.097ms 1 98.097ms 98.097ms 98.097ms cudaMalloc
0.02% 466.87us 90 5.1870us 265ns 182.11us cuDeviceGetAttribute
0.01% 234.44us 1 234.44us 234.44us 234.44us cuDeviceTotalMem
0.00% 44.040us 1 44.040us 44.040us 44.040us cuDeviceGetName
0.00% 2.3910us 2 1.1950us 367ns 2.0240us cuDeviceGetCount
0.00% 847ns 2 423ns 316ns 531ns cuDeviceGet
Then plot the results with gnuplot, as shown below.
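Below are several comparative results for different CPU / graphics card combinations. The columns give the array size in bytes and, for host-to-device (H2D) and device-to-host (D2H) transfers, the transfer time (T) and the corresponding bandwidth (BW).
Note that these figures are for so-called pageable memory, not pinned (page-locked) memory.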
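For comparison, pinned host memory is obtained with `cudaMallocHost` and released with `cudaFreeHost`, instead of `malloc`/`free` for pageable memory. A minimal sketch; the helpers `alloc_host` and `free_host` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Returns a host buffer of `bytes` bytes: pageable memory from malloc, or
// pinned (page-locked) memory from cudaMallocHost. nullptr on failure.
void* alloc_host(std::size_t bytes, bool pinned) {
    if (!pinned) return std::malloc(bytes);
    void* ptr = nullptr;
    if (cudaMallocHost(&ptr, bytes) != cudaSuccess) return nullptr;
    return ptr;
}

// Frees a buffer with the function matching the one that allocated it.
void free_host(void* ptr, bool pinned) {
    if (pinned) {
        cudaFreeHost(ptr);
    } else {
        std::free(ptr);
    }
}
```

Replacing `malloc` with `alloc_host(bytes, true)` in a transfer program lets both kinds of memory be measured with the same nvprof procedure.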
size (bytes) | H2D T (s) | H2D BW (MiB/s) | D2H T (s) | D2H BW (MiB/s) |
2048 | .000000860 | 2271.07 | .000001542 | 1266.61 |
4096 | .000001075 | 3633.72 | .000001692 | 2308.65 |
8192 | .000001510 | 5173.84 | .000001964 | 3977.85 |
16384 | .000002467 | 6333.60 | .000002604 | 6000.38 |
32768 | .000004329 | 7218.75 | .000003913 | 7986.19 |
65536 | .000008083 | 7732.27 | .000006467 | 9664.45 |
131072 | .000012205 | 10241.70 | .000011539 | 10832.82 |
262144 | .000022892 | 10920.84 | .000021769 | 11484.22 |
524288 | .000044006 | 11362.08 | .000042672 | 11717.28 |
1048576 | .000086931 | 11503.37 | .000083654 | 11954.00 |
2097152 | .000226130 | 8844.46 | .000243510 | 8213.21 |
4194304 | .000500070 | 7998.88 | .000563180 | 7102.52 |
8388608 | .001043900 | 7663.56 | .001179700 | 6781.38 |
16777216 | .002253400 | 7100.38 | .002396000 | 6677.79 |
33554432 | .004454900 | 7183.10 | .004946000 | 6469.87 |
67108864 | .009015300 | 7099.04 | .009801100 | 6529.87 |
134217728 | .017917000 | 7144.05 | .019677000 | 6505.05 |
268435456 | .036013000 | 7108.54 | .039292000 | 6515.32 |
536870912 | .072400000 | 7071.82 | .080247000 | 6380.30 |
1073741824 | .144740000 | 7074.75 | .164820000 | 6212.83 |
size (bytes) | H2D T (s) | H2D BW (MiB/s) | D2H T (s) | D2H BW (MiB/s) |
2048 | .000000915 | 2134.56 | .000001587 | 1230.70 |
4096 | .000001155 | 3382.03 | .000001724 | 2265.80 |
8192 | .000001641 | 4760.81 | .000002044 | 3822.16 |
16384 | .000003417 | 4572.72 | .000002598 | 6014.24 |
32768 | .000004636 | 6740.72 | .000003852 | 8112.66 |
65536 | .000008640 | 7233.79 | .000006348 | 9845.62 |
131072 | .000011766 | 10623.83 | .000011254 | 11107.16 |
262144 | .000021727 | 11506.42 | .000021196 | 11794.67 |
524288 | .000042761 | 11692.89 | .000042300 | 11820.33 |
1048576 | .000081823 | 12221.50 | .000078591 | 12724.10 |
2097152 | .000181810 | 11000.49 | .000169060 | 11830.11 |
4194304 | .000410850 | 9735.91 | .000391240 | 10223.90 |
8388608 | .000798780 | 10015.27 | .000820960 | 9744.68 |
16777216 | .001578200 | 10138.13 | .001643100 | 9737.69 |
33554432 | .003243500 | 9865.88 | .003355200 | 9537.43 |
67108864 | .006434300 | 9946.69 | .006524000 | 9809.93 |
134217728 | .013081000 | 9785.18 | .013071000 | 9792.67 |
268435456 | .026024000 | 9837.07 | .026418000 | 9690.36 |
536870912 | .051243000 | 9991.60 | .053770000 | 9522.03 |
1073741824 | .101320000 | 10106.59 | .106050000 | 9655.82 |
size (bytes) | H2D T (s) | H2D BW (MiB/s) | D2H T (s) | D2H BW (MiB/s) |
2048 | .000000764 | 2556.44 | .000000425 | 4595.58 |
4096 | .000000944 | 4137.97 | .000000547 | 7141.22 |
8192 | .000001289 | 6060.89 | .000000822 | 9504.25 |
16384 | .000002025 | 7716.04 | .000001430 | 10926.57 |
32768 | .000003475 | 8992.80 | .000002627 | 11895.69 |
65536 | .000006364 | 9820.86 | .000005087 | 12286.21 |
131072 | .000011078 | 11283.62 | .000010418 | 11998.46 |
262144 | .000021241 | 11769.69 | .000020342 | 12289.84 |
524288 | .000041925 | 11926.05 | .000040437 | 12364.91 |
1048576 | .000083009 | 12046.88 | .000080193 | 12469.91 |
2097152 | .000201330 | 9933.93 | .000201790 | 9911.29 |
4194304 | .000438850 | 9114.73 | .000463310 | 8633.52 |
8388608 | .000919330 | 8701.98 | .000979430 | 8168.01 |
16777216 | .001893600 | 8449.51 | .002021200 | 7916.08 |
33554432 | .003819300 | 8378.49 | .004137000 | 7735.07 |
67108864 | .007777200 | 8229.18 | .008176900 | 7826.92 |
134217728 | .015975000 | 8012.51 | .016380000 | 7814.40 |
268435456 | .030719000 | 8333.60 | .033224000 | 7705.27 |
536870912 | .061641000 | 8306.15 | .065844000 | 7775.95 |
1073741824 | .123570000 | 8286.80 | .130790000 | 7829.34 |
size (bytes) | H2D T (s) | H2D BW (MB/s, 10^6 B/s) | D2H T (s) | D2H BW (MB/s, 10^6 B/s) |
1024 | .0000024000 | 426.6666 | .0000012200 | 839.3442 |
2048 | .0000023600 | 867.7966 | .0000014000 | 1462.8571 |
4096 | .0000026500 | 1545.6603 | .0000011300 | 3624.7787 |
8192 | .0000032700 | 2505.1987 | .0000021700 | 3775.1152 |
16384 | .0000045300 | 3616.7770 | .0000034600 | 4735.2601 |
32768 | .0000069100 | 4742.1128 | .0000056500 | 5799.6460 |
65536 | .0000120000 | 5461.3333 | .0000112000 | 5851.4285 |
131072 | .0000237000 | 5530.4641 | .0000219000 | 5985.0228 |
262144 | .0000438000 | 5985.0228 | .0000408000 | 6425.0980 |
524288 | .0000839000 | 6248.9630 | .0000820000 | 6393.7560 |
1048576 | .0001660000 | 6316.7228 | .0001620000 | 6472.6913 |
2097152 | .0004790000 | 4378.1878 | .0004370000 | 4798.9748 |
4194304 | .0012000000 | 3495.2533 | .0010200000 | 4112.0627 |
8388608 | .0025700000 | 3264.0498 | .0021200000 | 3956.8905 |
16777216 | .0052600000 | 3189.5847 | .0042800000 | 3919.9102 |
33554432 | .0106680000 | 3145.3348 | .0086900000 | 3861.2695 |
You can compare your results with the bandwidthTest utility from the NVIDIA CUDA samples (1_Utilities/bandwidthTest).
For example, on a GeForce GTX 970, transferring 16,777,216 bytes gives:
./bandwidthTest --memory=pageable --mode=range --start=16777216 --end=16777216 --increment=1
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 970
Range Mode
Host to Device Bandwidth, 1 Device(s)
PAGEABLE Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
16777216 10057.4
Device to Host Bandwidth, 1 Device(s)
PAGEABLE Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
16777216 9485.5
Device to Device Bandwidth, 1 Device(s)
PAGEABLE Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
16777216 138641.0
Result = PASS
The results are therefore comparable (the program column is the 16,777,216-byte row of the second table above):
Type | bandwidthTest BW (MB/s) | program BW (MB/s) |
H2D | 10057.4 | 10138.13 |
D2H | 9485.5 | 9737.69 |
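To put a number on "comparable", the relative gap between the two columns can be computed; `relative_difference` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Relative difference of the measured bandwidth with respect to the
// reference (bandwidthTest): 0.01 means the program is 1 % higher.
double relative_difference(double reference, double measured) {
    return (measured - reference) / reference;
}
```

Applied to the table above, this gives about +0.8 % for H2D and +2.7 % for D2H: the program's figures are slightly above those of bandwidthTest.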