
Errors in multi-node, multi-GPU training

2022-04-23 07:28:00 wujpbb7

Error 1:

“NCCL WARN Connect to  failed : Network is unreachable”

Solution:

Set the environment variable NCCL_SOCKET_IFNAME=enp. Here enp is the name prefix of the local network interface; on your machine it may be eno instead. Run ifconfig first to check which one applies.
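As a rough illustration, the variable can also be set from Python at the top of the training script. The sketch below assumes a Linux machine and that the variable is set on every node before the distributed backend is initialized. The helper names pick_interface_prefix and configure_nccl_interface are made up for this example and are not part of NCCL or any framework.

```python
import os
import socket


def pick_interface_prefix(names, prefixes=("enp", "eno")):
    """Return the first prefix that matches at least one interface name, else None.

    `prefixes` holds the two prefixes mentioned in this post; adjust it to
    whatever ifconfig shows on your machines.
    """
    for prefix in prefixes:
        if any(name.startswith(prefix) for name in names):
            return prefix
    return None


def configure_nccl_interface(prefixes=("enp", "eno")):
    """Set NCCL_SOCKET_IFNAME from the interfaces found on this machine."""
    # socket.if_nameindex() returns (index, name) pairs for local interfaces.
    names = [name for _, name in socket.if_nameindex()]
    prefix = pick_interface_prefix(names, prefixes)
    if prefix is None:
        raise RuntimeError(f"no interface matching {prefixes} among {names}")
    # NCCL treats the value as a name prefix, so "enp" matches e.g. "enp3s0".
    os.environ["NCCL_SOCKET_IFNAME"] = prefix
    return prefix


# Hypothetical usage, before the distributed backend is initialized:
#   configure_nccl_interface()
```

Setting the variable in the shell that launches the training job (export NCCL_SOCKET_IFNAME=enp) has the same effect. The Python version is only convenient when the prefix differs between machines.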

Reference:

The best beginner's guide to distributed deep learning (pitfalls included)

Copyright notice
This article was written by [wujpbb7]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/04/202204230611550332.html