9. cuBLAS Development Guide Chinese Version--Configuration of Atomic Mode in cuBLAS
2022-08-08 16:04:00 【Ho, sweeps the floor】
Configuration of atomic mode in cuBLAS
2.4.20. cublasSetAtomicsMode()
cublasStatus_t cublasSetAtomicsMode(cublasHandle_t handle, cublasAtomicsMode_t mode)
Some routines, such as cublas<t>symv and cublas<t>hemv, have an alternative implementation that uses atomics to accumulate results. This implementation is usually significantly faster, but it can produce results that are not strictly identical from one run to the next. Mathematically, these discrepancies are not significant, but they can get in the way when debugging.
This function allows or disallows the use of atomics in the cuBLAS library for all routines that have an alternative implementation. If the documentation of a cuBLAS routine does not explicitly mention it, that routine has no alternative implementation that uses atomics. When atomics mode is disabled, each cuBLAS routine should produce the same results from one run to the next when called with identical parameters on the same hardware.
The default atomics mode of a default-initialized cublasHandle_t object is CUBLAS_ATOMICS_NOT_ALLOWED. See the section on types for details.
Return Value | Meaning |
---|---|
CUBLAS_STATUS_SUCCESS | Atomic mode set successfully |
CUBLAS_STATUS_NOT_INITIALIZED | The library was not initialized |
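
The run-to-run differences described above come from the fact that floating-point addition is not associative: when partial results are accumulated with atomics, the order in which they arrive can change, and the rounded sum can change with it. The following is a minimal CPU-only sketch of that effect, not cuBLAS code. The helper sum_in_order is hypothetical, and the sketch assumes IEEE-754 single-precision float.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of cuBLAS): adds the n values selected by
 * order[] from x, rounding to float after every addition, roughly like a
 * series of atomic adds into one float accumulator would. */
float sum_in_order(const float *x, const size_t *order, size_t n)
{
    assert(x != NULL && order != NULL);
    volatile float acc = 0.0f; /* volatile forces a float store each step */
    for (size_t i = 0; i < n; ++i) {
        acc = acc + x[order[i]];
    }
    return acc;
}
```

For x = {1.0e8f, -1.0e8f, 1.0f}, summing in the order 0, 1, 2 gives 1.0f, while the order 1, 2, 0 gives 0.0f, because -1.0e8f + 1.0f rounds back to -1.0e8f in single precision. Disabling atomics mode avoids this kind of order dependence at the cost of the faster implementation.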
2.4.21. cublasGetAtomicsMode()
cublasStatus_t cublasGetAtomicsMode(cublasHandle_t handle, cublasAtomicsMode_t *mode)
This function queries the atomics mode of a specific cuBLAS context.
The default atomics mode of a default-initialized cublasHandle_t object is CUBLAS_ATOMICS_NOT_ALLOWED. See the section on types for details.
Return Value | Meaning |
---|---|
CUBLAS_STATUS_SUCCESS | The atomics mode was queried successfully |
CUBLAS_STATUS_NOT_INITIALIZED | The library was not initialized |
CUBLAS_STATUS_INVALID_VALUE | The argument mode is a NULL pointer |
2.4.22. cublasSetMathMode()
cublasStatus_t cublasSetMathMode(cublasHandle_t handle, cublasMath_t mode)
The cublasSetMathMode function lets you choose the compute precision modes defined by cublasMath_t (see cublasMath_t). Users may set the compute precision mode to a logical combination of these values (except the deprecated CUBLAS_TENSOR_OP_MATH). For example, cublasSetMathMode(handle, CUBLAS_DEFAULT_MATH | CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION). Note that the default math mode is CUBLAS_DEFAULT_MATH.
For the matrix and compute precisions allowed by the cublasGemmEx() and cublasLtMatmul() APIs and their strided variants, see: cublasGemmEx(), cublasGemmBatchedEx(), cublasGemmStridedBatchedEx(), and cublasLtMatmul().
Return Value | Meaning |
---|---|
CUBLAS_STATUS_SUCCESS | The math mode was set successfully |
CUBLAS_STATUS_NOT_INITIALIZED | The library was not initialized |
CUBLAS_STATUS_INVALID_VALUE | An invalid mode value was specified. |
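
The "logical combination" of math modes mentioned above is an ordinary bitwise OR of one base mode with a modifier flag. The sketch below illustrates that pattern in plain C without the cuBLAS headers. The demo_ names and the numeric values are assumptions made for illustration only; the real enumerators and their values are defined in the cuBLAS headers.

```c
#include <assert.h>

/* Stand-in for cublasMath_t. Names and values are assumptions for this
 * sketch; consult the cuBLAS headers for the real definitions. */
typedef enum {
    DEMO_DEFAULT_MATH = 0,                               /* base mode */
    DEMO_PEDANTIC_MATH = 2,                              /* base mode */
    DEMO_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION = 16  /* modifier flag */
} demo_math_t;

/* Combines one base mode with the optional modifier flag using bitwise OR. */
int demo_combine(demo_math_t base, int disallow_reduced_precision)
{
    /* The base argument must be a base mode, not the modifier itself. */
    assert(base != DEMO_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION);
    int mode = (int)base;
    if (disallow_reduced_precision) {
        mode |= DEMO_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION;
    }
    return mode;
}

/* Returns nonzero when the modifier flag is present in a combined mode. */
int demo_has_disallow_flag(int mode)
{
    return (mode & DEMO_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION) != 0;
}

/* Strips the modifier flag and returns the remaining base mode. */
int demo_base_mode(int mode)
{
    return mode & ~DEMO_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION;
}
```

With the real library, the combined value would be passed to cublasSetMathMode and read back with cublasGetMathMode; the same mask test can then tell whether the modifier is active.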
2.4.23. cublasGetMathMode()
cublasStatus_t cublasGetMathMode(cublasHandle_t handle, cublasMath_t *mode)
This function returns the math mode used by the library routines.
Return Value | Meaning |
---|---|
CUBLAS_STATUS_SUCCESS | Math type returned successfully. |
CUBLAS_STATUS_NOT_INITIALIZED | The library was not initialized |
CUBLAS_STATUS_INVALID_VALUE | mode is NULL. |
2.4.24. cublasSetSmCountTarget()
cublasStatus_t cublasSetSmCountTarget(cublasHandle_t handle, int smCountTarget)
The cublasSetSmCountTarget function allows you to override the number of multiprocessors available to the library during kernel execution.
This option can be used to improve library performance when cuBLAS routines are known to run concurrently with other work on different CUDA streams. For example, on an NVIDIA A100 GPU, which has 108 SMs, if a concurrent kernel is running with a grid size of 8, you can call cublasSetSmCountTarget with a value of 100 to override the library heuristics and optimize for running on 100 multiprocessors.
When set to 0, the library returns to its default behavior. The input value should not exceed the device's multiprocessor count, which can be obtained with cudaDeviceGetAttribute. Negative values are not accepted.
The user must ensure thread safety when modifying the library handle with this routine, just as when using cublasSetStream.
Return Value | Meaning |
---|---|
CUBLAS_STATUS_SUCCESS | SM Count target successfully set. |
CUBLAS_STATUS_NOT_INITIALIZED | The library was not initialized |
CUBLAS_STATUS_INVALID_VALUE | smCountTarget The value is out of the allowed range. |
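
The arithmetic behind the A100 example (108 SMs minus a concurrent grid of 8 gives a target of 100) can be sketched as a small helper. This is a hypothetical function, not part of cuBLAS, and it assumes that each block of the concurrent kernel occupies one SM, which is the reading the example above suggests.

```c
#include <assert.h>

/* Hypothetical helper (not a cuBLAS function): picks a value to pass to
 * cublasSetSmCountTarget() from the device SM count and the number of SMs
 * assumed to be occupied by other concurrent work.
 * Returns 0, which asks the library for its default behavior, when there
 * is no concurrent work to account for or when no SMs would be left. */
int pick_sm_count_target(int total_sms, int busy_sms)
{
    assert(total_sms > 0);
    if (busy_sms <= 0 || busy_sms >= total_sms) {
        return 0; /* fall back to the library default */
    }
    return total_sms - busy_sms; /* always within [1, total_sms - 1] */
}
```

With the numbers from the text, pick_sm_count_target(108, 8) gives 100. In a real program, total_sms would come from cudaDeviceGetAttribute with cudaDevAttrMultiProcessorCount, and the result would be passed to cublasSetSmCountTarget along with the handle.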
2.4.25. cublasGetSmCountTarget()
cublasStatus_t cublasGetSmCountTarget(cublasHandle_t handle, int *smCountTarget)
This function gets the value previously programmed into the library handle.
Return Value | Meaning |
---|---|
CUBLAS_STATUS_SUCCESS | The SM count target was queried successfully. |
CUBLAS_STATUS_NOT_INITIALIZED | The library was not initialized |
CUBLAS_STATUS_INVALID_VALUE | smCountTarget is NULL. |