Is CUDA good? The performance gap between CUDA and the CPU


Test platform

Jetson Music Nano

Test code

Code repository (please give it a star if it is useful to you!)

https://github.com/LitchiCheng/CUDA_Test

The following is some test code:

#include <cstdio>
#include <cstdlib>   // malloc, free
#include <math.h>    // cos, sin on the host

#include <sys/time.h>
#include <time.h>

#include "cuda_runtime.h"

#include "utility/timecost.h"   // timecost helper from the repository

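// Each thread computes one element: C[idx] = cos(A[idx]) + sin(B[idx]).
// There is no bounds check, so the launch must cover exactly the array length.
__global__ void kernelAdd(const float * A, const float * B, float * C)
{
    // Global 2D coordinates of this thread within the whole grid.
    int ix=threadIdx.x+blockDim.x*blockIdx.x;
    int iy=threadIdx.y+blockDim.y*blockIdx.y;
    // Flatten to a row-major offset; one row holds blockDim.x*gridDim.x elements.
    int idx=ix+iy*blockDim.x*gridDim.x;
    C[idx]=cos(A[idx])+sin(B[idx]);
    // if(idx == 2077){
        // printf("idx[%d],ix[%d],iy[%d],bdx[%d],bdy[%d],bix[%d],biy[%d],gdx[%d],gdy[%d],C[%f]\r\n", \
        // idx,ix,iy,blockDim.x,blockDim.y,blockIdx.x,blockIdx.y,gridDim.x,gridDim.y,C[idx]);
    // }
}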


int main()
{

// Start at 1: cycle == 0 would request a 0x0 grid, which is an invalid launch configuration.
for(int cycle=1;cycle < 32;cycle++)
{
    int gridSize2Dx = 16*cycle;
    int gridSize2Dy = 16*cycle;
    // A block can hold at most 1024 threads, so 32*32 is the largest square block.
    int blockSize2Dx = 32;
    int blockSize2Dy = 32;
    int sum = gridSize2Dx*gridSize2Dy*blockSize2Dx*blockSize2Dy;
    int sum_bytes = sum*sizeof(float);
    printf("size %d \r\n", sum);

    float* A_host=(float*)malloc(sum_bytes);
    for(int i=0;i<sum;i++){
        A_host[i]=(float)i;
    }
  
    float* B_host=(float*)malloc(sum_bytes);
    for(int i=0;i<sum;i++){
        B_host[i]=i;
    }

    float* C_host=(float*)malloc(sum_bytes);

    float *A_dev=NULL;
    float *B_dev=NULL;
    float *C_dev=NULL;
    {
        timecost t1("cuda");
        cudaMalloc((void**)&A_dev,sum_bytes);
        cudaMemcpy(A_dev,A_host,sum_bytes,cudaMemcpyHostToDevice);
        cudaMalloc((void**)&B_dev,sum_bytes);
        cudaMemcpy(B_dev,B_host,sum_bytes,cudaMemcpyHostToDevice);

        cudaMalloc((void**)&C_dev,sum_bytes);
        dim3 gridSize2D(gridSize2Dx, gridSize2Dy);
        dim3 blockSize2D(blockSize2Dx, blockSize2Dy);
        kernelAdd<<<gridSize2D, blockSize2D>>>(A_dev,B_dev,C_dev);
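        // cudaMemcpy blocks until the kernel has finished, so the timer covers the whole GPU path.
        cudaError_t ret = cudaMemcpy(C_host,C_dev,sum_bytes,cudaMemcpyDeviceToHost);
        if(ret != cudaSuccess){
            printf("CUDA error: %s \r\n", cudaGetErrorString(ret));
        }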
    }

    for(int i=0;i<sum;i++){
        // printf("C[%d]:%f \n", i, C_host[i]);
        C_host[i] = 0.0f;
    }

    {
        timecost t2("cpu");
        for(int i=0;i<sum;i++){
            C_host[i]=cos(A_host[i])+sin(B_host[i]);
        }
    }

    for(int i=0;i<sum;i++){
        // printf("C[%d]:%f \n", i, C_host[i]);
        C_host[i] = 0.0f;
    }

    cudaFree(A_dev);
    free(A_host);
    cudaFree(B_dev);
    free(B_host);
    cudaFree(C_dev);
    free(C_host);
}
    return 0;
}

A Python script collects and plots the elapsed times (orange: CPU time, blue: CUDA time; the x-axis is the increasing amount of computation).

Conclusion

This simple comparison shows that the larger the amount of data, the greater CUDA's advantage. When the amount of data is very small, CUDA is actually slower than the CPU. How the CUDA program is written also has a major influence, for example how memory is allocated and how the program logic is decoupled.

This post is from Embedded System

Latest reply: 111 (published on 2024-7-9 09:49)

Reply #2

CUDA runs the computation on the graphics card, so the GPU's advantage shows once the amount of data is large.

Reply #3 (published on 2024-2-29 10:51)

However, CUDA can also cause problems, for example when the graphics card does not support it. I used CUDA before, but after I updated the graphics card driver my computer blue-screened frequently.

Reply #4 (published on 2024-2-29 22:52)

Quoting wangerxian, posted on 2024-2-29 10:51: "However, there may be some problems with using cuda, such as the graphics card not supporting it. I used cuda before, but after updating the graphics card driver, my computer frequently had blue screens."

Mine is a Jetson board rather than my computer's graphics card, so it behaves relatively better.


Reply #5 (published on 2024-3-1 15:41)

Quoting LitchiCheng, posted on 2024-2-29 22:52: "Mine is a Jetson board, which is relatively better, not the graphics card of my computer."

Did you buy the Jetson board to learn artificial intelligence?

Reply #6 (published on 2024-3-4 08:27)

Quoting wangerxian, posted on 2024-3-1 15:41: "Did you buy the Jetson board to learn artificial intelligence?"

It is for preliminary research and testing.

Reply #7

I'm curious, are all engineers' computers equipped with NVIDIA graphics cards?

Reply #8 (published on 2024-7-9 09:49)

111
