Compact CUDA tutorial - CUDA driver API
2022-04-23 19:42:00 【Adenialzz】
Driver API overview
The layers of the CUDA API
The CUDA API has several layers (illustrated by a figure in the original post). For details, see: CUDA Environment details.
- The CUDA Driver API is the driver-level API through which CUDA communicates with the GPU. In early CUDA, all communication with the GPU went directly through the Driver API. Functions whose names begin with cu, such as cuCtxCreate(), are basically Driver API functions. The familiar nvidia-smi command also works at the driver level (strictly speaking it is built on NVML, which ships with the driver, rather than on the cu* functions).
- Later, the Driver API turned out to expose too many details that were too complicated to handle, so the Runtime API evolved. The Runtime API is built on top of the Driver API. Common functions such as cudaMalloc() are all Runtime API functions.
CUDA Driver
Environment
The CUDA Driver is released together with the graphics card driver and should be considered separately from cudatoolkit.
The CUDA Driver corresponds to two files, cuda.h and libcuda.so. Note that cuda.h is included when cudatoolkit is installed, whereas libcuda.so is installed on the system together with the graphics card driver (it does not come with cudatoolkit). So if you copy or move libcuda.so directly, make sure the driver version matches the file.
How to approach the CUDA Driver
This concise course covers the low-level Driver API so that the later Runtime API material and error debugging are easier to follow. The Driver API is the key to understanding how the CUDA Runtime handles contexts. The main topics of this CUDA Driver part are therefore:
- The context management mechanism
- The development conventions of the CUDA family of interfaces (how to check for errors)
- The memory model
Contexts and memory categories
There are two kinds of context:
- Manually managed context: cuCtxCreate. You manage it yourself, with push/pop on a stack.
- Automatically managed context: cuDevicePrimaryCtxRetain. It is managed automatically, and the Runtime API is built on it.
Memory falls into two broad categories:
- CPU memory, called Host Memory
  - Pageable Memory
  - Page-Locked Memory
- GPU memory (video memory), called Device Memory
  - Global Memory
  - Shared Memory
  - ... others
These are all introduced later.
cuInit: driver initialization
- cuInit initializes the Driver API. It only needs to be called once globally. If it has not been called, every other Driver API call returns an error (CUDA_ERROR_NOT_INITIALIZED).
- There is no corresponding cuDestroy. Nothing needs to be released by hand; everything is freed automatically when the program exits.
Return value check
Version 1
Checking the return values of CUDA functions correctly and in a friendly way helps the structure of the program, makes the code more readable, and makes mistakes easier to find.
cuInit returns a CUresult. The return value tells the programmer whether the call succeeded or failed, and why it failed.
The logic of the official version of the check is as follows:
// Use a parameterized macro to check whether a CUDA driver call succeeded,
// and report the file name, line number, and error message where it failed
// Wrapping the macro body in do { ... } while(0) makes it expand to a single
// statement, so it behaves correctly wherever it is used
#define checkDriver(op)                                              \
    do {                                                             \
        auto code = (op);                                            \
        if (code != CUresult::CUDA_SUCCESS) {                        \
            const char* err_name = nullptr;                          \
            const char* err_message = nullptr;                       \
            cuGetErrorName(code, &err_name);                         \
            cuGetErrorString(code, &err_message);                    \
            printf("%s:%d %s failed. \n code = %s, message = %s\n",  \
                   __FILE__, __LINE__, #op, err_name, err_message);  \
            return -1;                                               \
        }                                                            \
    } while (0)
checkDriver is a macro. When we call other APIs, it checks the function's return value and, on error, prints the error code and error message, which makes debugging easier. For example:
checkDriver(cuDeviceGetName(device_name, sizeof(device_name), device));
If the driver has not been initialized, or some other error occurs, the error message is printed clearly.
This is also Nvidia's official version, but it has some problems. For example, the code is hard to read, and the macro makes the calling function return an int error code directly. Version 2 is recommended.
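Why the version 1 macro wraps its body in do { ... } while(0) may not be obvious, so here is a minimal sketch of the effect. It is plain C++ with no CUDA calls: FakeResult, CHECK_WRAPPED, and run_checks are hypothetical names invented for this illustration, and FakeResult merely stands in for CUresult.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for CUresult so the sketch compiles without CUDA.
enum FakeResult { FAKE_SUCCESS = 0, FAKE_ERROR = 1 };

static int g_failures = 0;  // number of failures reported by the macro

// The do { ... } while (0) wrapper makes the macro expand to exactly one
// statement that still needs the trailing semicolon written by the caller.
#define CHECK_WRAPPED(op)                                              \
    do {                                                               \
        FakeResult code_ = (op);                                       \
        if (code_ != FAKE_SUCCESS) {                                   \
            std::printf("%s:%d %s failed\n", __FILE__, __LINE__, #op); \
            ++g_failures;                                              \
        }                                                              \
    } while (0)

// Uses the macro as the body of an unbraced if/else. With a bare { ... }
// macro body, "CHECK(r);" would expand to "{ ... };" and the extra
// semicolon would detach the else branch, which is a compile error.
int run_checks(bool enabled, FakeResult r) {
    g_failures = 0;
    if (enabled)
        CHECK_WRAPPED(r);
    else
        std::printf("checks disabled\n");
    return g_failures;
}
```

The wrapper costs nothing at run time: the loop body executes exactly once.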
Version 2
#include <cuda.h>
#include <stdio.h>

// Wrapping the check in a function like this is clearly more convenient to use
// Macro syntax: #define <macro name>(<parameter list>) <macro body>
#define checkDriver(op) __check_cuda_driver((op), #op, __FILE__, __LINE__)
bool __check_cuda_driver(CUresult code, const char* op, const char* file, int line){
    if(code != CUresult::CUDA_SUCCESS){
        const char* err_name = nullptr;
        const char* err_message = nullptr;
        cuGetErrorName(code, &err_name);
        cuGetErrorString(code, &err_message);
        printf("%s:%d %s failed. \n code = %s, message = %s\n", file, line, op, err_name, err_message);
        return false;
    }
    return true;
}
Clearly, version 2 is much better than version 1 in its return value, readability, and encapsulation. It is used the same way:
checkDriver(cuDeviceGetName(device_name, sizeof(device_name), device));
// Or add a check and exit on error
if (!checkDriver(cuDeviceGetName(device_name, sizeof(device_name), device))) {
    return -1;
}
CUcontext
Manual context management
- A context is the environment that all operations on a GPU are associated with.
- A context is associated with one graphics card, and one graphics card can be associated with more than one context.
- Each thread has a stack that stores contexts. The top of the stack is the context currently in use, and the corresponding push/pop functions operate on this context stack. All API calls operate on the current context.
Imagine having to pass a device to every single call to decide which device executes it: that would be tedious. A context is a convenient way to manage which device the current API calls run on, and the stack keeps the previous context and its device, which makes it easy to control multiple devices.
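The stack behaviour described above can be pictured with a toy model. This is a sketch in plain C++, not the Driver API itself: ToyContext and ToyContextStack are hypothetical names made up for the illustration, standing in for CUcontext and for the per-thread stack that cuCtxPushCurrent and cuCtxPopCurrent operate on.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a CUcontext: it only remembers its device.
struct ToyContext {
    int device;
};

// Hypothetical model of one thread's context stack.
class ToyContextStack {
public:
    // Makes ctx the current context (models a push).
    void push(ToyContext ctx) { stack_.push_back(ctx); }

    // Removes and returns the current context (models a pop).
    ToyContext pop() {
        assert(!stack_.empty());
        ToyContext top = stack_.back();
        stack_.pop_back();
        return top;
    }

    // Device of the context on top of the stack.
    int current_device() const {
        assert(!stack_.empty());
        return stack_.back().device;
    }

    // Models an API call that takes no device argument:
    // the target device comes from the current context.
    std::string run(const std::string& op) const {
        return op + " on device " + std::to_string(current_device());
    }

private:
    std::vector<ToyContext> stack_;
};
```

Pushing a context for device 1 on top of one for device 0 redirects later calls to device 1, and popping it restores device 0 without the caller ever naming a device in the call itself.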
Automatic context management
- In practice, the most common pattern is that a thread keeps accessing one fixed device. It is rare for the same thread to access different devices repeatedly, and there is usually only one context; multiple contexts are seldom used.
- In other words, in most cases managing multiple contexts with CreateContext, PushCurrent, and PopCurrent (that is, cuCtxCreate, cuCtxPushCurrent, and cuCtxPopCurrent) is more trouble than it is worth.
- This is why cuDevicePrimaryCtxRetain exists. It associates a primary context with the device, so allocation, setup, release, and the stack no longer have to be managed by hand.
- primaryContext is a way of managing contexts automatically: give it a device id and it hands back a context that is already set up (note that it is not pushed onto the stack; cuCtxSetCurrent makes it current). Here one device corresponds to one primary context. Different threads get the same primary context as long as the device id is the same, and the context is thread safe.
- The CUDA Runtime API, introduced later, uses cuDevicePrimaryCtxRetain automatically.
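The "one primary context per device, shared by all threads" idea can be sketched as a small registry. This is a toy model in plain C++ under stated assumptions, not how the driver is implemented: ToyPrimaryContext and ToyPrimaryRegistry are hypothetical names, retain and release only mirror the roles of cuDevicePrimaryCtxRetain and cuDevicePrimaryCtxRelease, and destroying the context when the count reaches zero is an assumption of the model.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a primary context; it only records its device.
struct ToyPrimaryContext {
    int device;
};

// Hypothetical registry: at most one primary context per device id,
// shared by every caller and guarded by a mutex.
class ToyPrimaryRegistry {
public:
    // Models retain: creates the context on first use, then bumps its count.
    ToyPrimaryContext* retain(int device) {
        std::lock_guard<std::mutex> lock(mutex_);
        Entry& entry = entries_[device];
        if (!entry.ctx) {
            entry.ctx.reset(new ToyPrimaryContext{device});
        }
        ++entry.refs;
        return entry.ctx.get();
    }

    // Models release: drops one reference and destroys the context at zero.
    void release(int device) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(device);
        assert(it != entries_.end() && it->second.refs > 0);
        if (--it->second.refs == 0) {
            entries_.erase(it);
        }
    }

    // Number of outstanding retains for a device (0 if none).
    int ref_count(int device) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(device);
        return it == entries_.end() ? 0 : it->second.refs;
    }

private:
    struct Entry {
        std::unique_ptr<ToyPrimaryContext> ctx;
        int refs = 0;
    };
    std::mutex mutex_;
    std::map<int, Entry> entries_;
};
```

Two retains with the same device id return the same pointer, which is the property that lets different threads share a context without passing it around.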
Driver API memory management
- Host memory is the computer's own memory. It can be allocated and freed with the CUDA Driver API (for example cuMemAllocHost and cuMemFreeHost, which work with page-locked memory), or with C/C++ malloc/free and new/delete.
- Device memory is the memory on the graphics card, that is, video memory. It has dedicated Driver API functions for allocation and release (for example cuMemAlloc and cuMemFree).
Copyright notice
This article was written by [Adenialzz]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/04/202204231923163837.html