unhandled system error, NCCL version 2.7.8
2022-04-23 06:12:00 【wujpbb7】
A DDP-based PyTorch training program runs fine on the host machine,
but when it is run inside a Docker container it fails with the error "unhandled system error, NCCL version 2.7.8".
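Solution:
Prefix the launch command with NCCL_DEBUG=INFO to enable NCCL's debug output, i.e. run NCCL_DEBUG=INFO python -m torch.distributed.launch --nproc_per_node=4 ...
The log then shows: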
s215:623:649 [3] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-send-404da1ec128dc62d-0-3-2 (size 4104)
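If the job is started from a Python wrapper rather than a shell, the same diagnosis can be scripted. The sketch below is only an illustration under stated assumptions: it runs torch.distributed.launch with NCCL_DEBUG=INFO added to the child's environment and keeps just the "NCCL WARN" lines. The helper names and the train.py entry point are hypothetical, and it assumes NCCL's debug output ends up on stdout or stderr.

```python
import os
import re
import subprocess
import sys
from typing import Dict, Iterable, List, Optional

# Matches NCCL warning lines such as
# "s215:623:649 [3] include/shm.h:48 NCCL WARN Error while creating ..."
NCCL_WARN = re.compile(r"NCCL WARN (?P<msg>.*)")


def with_nccl_debug(env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `env` (default: os.environ) with NCCL_DEBUG=INFO set."""
    new_env = dict(os.environ if env is None else env)
    new_env["NCCL_DEBUG"] = "INFO"
    return new_env


def nccl_warnings(lines: Iterable[str]) -> List[str]:
    """Collect the message part of every 'NCCL WARN' line."""
    found = []
    for line in lines:
        match = NCCL_WARN.search(line)
        if match:
            found.append(match.group("msg").strip())
    return found


def run_with_nccl_debug(script: str, nproc: int = 4) -> List[str]:
    """Launch a (hypothetical) training script under torch.distributed.launch
    with NCCL_DEBUG=INFO and return the NCCL warnings it printed."""
    cmd = [sys.executable, "-m", "torch.distributed.launch",
           f"--nproc_per_node={nproc}", script]
    proc = subprocess.run(cmd, env=with_nccl_debug(),
                          capture_output=True, text=True)
    # Scan both streams, since we assume NCCL may log to either one.
    return nccl_warnings((proc.stdout + proc.stderr).splitlines())


if __name__ == "__main__":
    # Offline demo on the warning line quoted in this post.
    sample = ("s215:623:649 [3] include/shm.h:48 NCCL WARN Error while creating "
              "shared memory segment nccl-shm-send-404da1ec128dc62d-0-3-2 (size 4104)")
    print(nccl_warnings([sample, "some unrelated log line"]))
```

Calling run_with_nccl_debug("train.py") would need PyTorch installed and a real training script; the demo at the bottom only exercises the log filter.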
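The warning shows that NCCL could not create a shared-memory segment inside the container. Starting the container with --ipc=host fixes it: the container then shares the host's IPC namespace, including its /dev/shm, instead of the small private one Docker creates by default.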
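To check from inside a running container whether it was probably started without --ipc=host, one can look at the size of /dev/shm. This is a minimal sketch assuming a Linux container and Docker's default private /dev/shm of 64 MiB; treating anything at or below that size as "probably the default" is our own heuristic, not something NCCL or Docker reports. The function names are hypothetical.

```python
import shutil
from typing import Tuple

# Docker's default /dev/shm size (64 MiB) for a container started
# without --ipc=host or a larger --shm-size.
DOCKER_DEFAULT_SHM_BYTES = 64 * 1024 * 1024


def shm_capacity(path: str = "/dev/shm") -> Tuple[int, int]:
    """Return (total, free) bytes of the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.free


def looks_like_docker_default(total_bytes: int) -> bool:
    """Heuristic (assumption): a /dev/shm no larger than 64 MiB suggests the
    container is using Docker's small private shared-memory mount."""
    return total_bytes <= DOCKER_DEFAULT_SHM_BYTES


if __name__ == "__main__":
    try:
        total, free = shm_capacity()
    except FileNotFoundError:
        print("/dev/shm not found (not a Linux container?)")
    else:
        print(f"/dev/shm total={total / 2**20:.0f} MiB, free={free / 2**20:.0f} MiB")
        if looks_like_docker_default(total):
            print("Small /dev/shm: try restarting the container with --ipc=host")
```

Docker's --shm-size flag is another way to enlarge the container's private /dev/shm; the fix used in this post is --ipc=host.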
Copyright notice
This article was written by [wujpbb7]. When reposting, please include a link to the original. Thanks.
https://blog.csdn.net/blueblood7/article/details/122969027