Solve the error: Unable to determine the device handle for GPU 0000:09:00.0


Reinstall the NVIDIA driver, CUDA, and cuDNN to solve the error: Unable to determine the device handle for GPU 0000:09:00.0

CUDA_HOME on the lab server inexplicably disappeared, and running nvidia-smi reported the error: Unable to determine the device handle for GPU 0000:09:00.0: Unknown Error

After trying the all-purpose fix of rebooting... the problem was still not resolved.
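Before reinstalling anything, it can be worth confirming what state the driver is actually in. A minimal diagnostic sketch, assuming an Ubuntu-style system (package and module names may differ on other distributions):

# check whether the nvidia kernel module is currently loaded
lsmod | grep nvidia
# look for driver or GPU errors in the kernel log
dmesg | grep -i nvidia
# list which NVIDIA driver packages are installed
dpkg -l | grep -i nvidia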

1. NVIDIA driver

Go to https://www.nvidia.cn/Download/index.aspx, find the driver that matches your GPU and system, and download it:

wget https://us.download.nvidia.com/tesla/440.118.02/NVIDIA-Linux-x86_64-440.118.02.run
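The downloaded .run file is not executable by default, so mark it executable first (the file name below is the one from the wget command above):

chmod +x NVIDIA-Linux-x86_64-440.118.02.run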

Run the installer:

sudo ./NVIDIA-xxxx.run

The installer then reported an error because the X server was still running, so it had to be stopped first:

# stop the display manager (this shuts down the X server)
sudo /etc/init.d/lightdm stop
# check its status
sudo /etc/init.d/lightdm status
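The init script above only exists if lightdm is managed that way. On a systemd-based system the equivalent commands are shown below; the unit name lightdm is an assumption, so substitute gdm3, sddm, or whatever display manager the machine actually runs:

# stop the display manager via systemd (replace lightdm with gdm3/sddm if needed)
sudo systemctl stop lightdm
# check its status
sudo systemctl status lightdm
# alternatively, switch to a text-only target, which stops X entirely
sudo systemctl isolate multi-user.target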

Remove any existing NVIDIA driver packages:

sudo apt-get --purge remove nvidia-*
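To confirm the purge actually removed everything, a quick check (Debian/Ubuntu specific):

# should print nothing if all NVIDIA driver packages were removed
dpkg -l | grep -i nvidia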

Reboot (it may also work without rebooting...)

sudo reboot

Then re-run sudo ./NVIDIA-xxxx.run
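Assuming the same installer file as in the download step, the re-run and a quick verification look like this; the --dkms flag is optional and only works if the dkms package is installed:

# re-run the installer; --dkms rebuilds the kernel module automatically after kernel updates (optional)
sudo ./NVIDIA-Linux-x86_64-440.118.02.run --dkms
# verify: this should now list the GPUs instead of the "Unable to determine the device handle" error
nvidia-smi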

2. CUDA

CUDA Toolkit 10.2 is installed here; the download page is https://developer.nvidia.com/cuda-10.2-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal

After selecting the target platform on that page, it displays the exact commands to run; for this configuration they are:

# Download
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run

# Install
sudo sh cuda_10.2.89_440.33.01_linux.run
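Since the original problem was a vanished CUDA_HOME, it is worth restoring the environment variables once the toolkit is installed. A sketch for ~/.bashrc, assuming the default install location /usr/local/cuda-10.2:

# append to ~/.bashrc (paths assume the default CUDA 10.2 install location)
export CUDA_HOME=/usr/local/cuda-10.2
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

# reload the shell configuration and verify
source ~/.bashrc
nvcc --version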

If an older CUDA installation is still present, it can be removed with: sudo apt autoremove cuda


3. cuDNN

cuDNN was already installed on this machine, so I didn't reinstall it...
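To check which cuDNN version is already present, a quick sketch; the header is cudnn_version.h on cuDNN 8 and newer, cudnn.h on older releases, and the path assumes the default CUDA install location:

# cuDNN 8 and newer
grep -A 2 "#define CUDNN_MAJOR" /usr/local/cuda/include/cudnn_version.h
# older cuDNN releases
grep -A 2 "#define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h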

created: 02-15-2022, edited: 02-15-2022