Python: RuntimeError: NCCL Error 2: unhandled system error

created at 08-08-2021

method that no longer works

docker exec -it --user root -e NVIDIA_VISIBLE_DEVICES=0,1,2,3 remote /bin/bash

solution

Following approaches that other people described on GitHub, I created a new container and configured the environment inside it from scratch. With the new container, RuntimeError: NCCL Error 2: unhandled system error no longer occurs.

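docker run -it --user root --gpus all --ipc=host -e NVIDIA_VISIBLE_DEVICES=0,1,2,3 remote /bin/bash

Note that --gpus and --ipc are docker run options (docker exec does not accept them), so they have to be given when the container is created. The command above treats remote as the image name.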
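To check that the new container actually picked up these settings, a small script can be run inside it. The sketch below is only an illustration: the function name container_report is made up for this example, and the idea that a small /dev/shm is the cause here is an assumption (it is a commonly reported trigger for this NCCL error in containers, since Docker gives a container only 64 MB of /dev/shm by default unless --ipc=host or --shm-size is used). It uses only the Python standard library.

```python
import os
import shutil


def container_report(shm_path="/dev/shm"):
    """Collect two settings that the docker flags are expected to influence.

    container_report is a hypothetical helper name, not part of any library.
    """
    report = {
        # Value passed with -e NVIDIA_VISIBLE_DEVICES=...; None if it is not set.
        "visible_devices": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # Filled in below if the shared-memory mount exists.
        "shm_total_mib": None,
    }
    if os.path.isdir(shm_path):
        # Total size of the shared-memory mount, in MiB.
        total_bytes = shutil.disk_usage(shm_path).total
        report["shm_total_mib"] = total_bytes // (1024 * 1024)
    return report


if __name__ == "__main__":
    print(container_report())
```

If shm_total_mib comes back as roughly 64, the container is probably still using Docker's default private shared memory. Setting the environment variable NCCL_DEBUG=INFO before starting the training script also makes NCCL print more detail about what failed.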

summary:

  • Recreate the container with the GPU and IPC options set.
  • Then reconfigure the environment inside the new container.
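Because the fix depends on how the container is created, it can help to keep the creation command in a script so that it is reproducible. The following is a minimal sketch, not the method from the GitHub discussions: docker_run_command is a hypothetical helper, and it assumes that remote is the image name and that the container is created with docker run (the --gpus and --ipc options are applied at creation time).

```python
import shlex


def docker_run_command(image, devices=(0, 1, 2, 3), user="root", shell="/bin/bash"):
    """Build the argument list for creating the container (hypothetical helper)."""
    visible = ",".join(str(d) for d in devices)
    return [
        "docker", "run", "-it",
        "--user", user,
        "--gpus", "all",   # expose the GPUs to the container
        "--ipc=host",      # share the host IPC namespace, including shared memory
        "-e", "NVIDIA_VISIBLE_DEVICES=" + visible,
        image,             # assumption: "remote" is the image name
        shell,
    ]


if __name__ == "__main__":
    # Print the command as a single shell-quoted line for inspection.
    print(shlex.join(docker_run_command("remote")))
```

The list form can be passed directly to subprocess.run if the container should be started from Python; printing it first makes it easy to compare against the command that was typed by hand.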
edited at: 08-08-2021