I ran into this error while training a model in PyTorch. On checking, the cause was indeed a label that was out of bounds. This comes from Python indexing starting at 0, so labels need to be handled carefully. That is not the main thing I want to explain here, though. The real question is: how do you find out where the label goes out of bounds?

The following explains how to handle this error:

RuntimeError: CUDA error: device-side assert triggered
  1. First, check whether your labels start from 0;
  2. Move the model, the loss function, and the tensors to the CPU and run or debug there. The exact location of the failure is then reported, because the CPU is much better at pinpointing the error than CUDA. In debug mode I found that the crash was indeed caused by an index going out of range (whereas CUDA had only ever told me that the label was out of range).
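
The first check can be sketched in plain Python. This is only an illustration: the helper names `find_bad_labels` and `shift_to_zero_based` and the sample labels are hypothetical, and it assumes the labels are integer class indices that should lie in the range 0 to num_classes - 1.

```python
def find_bad_labels(labels, num_classes):
    """Return (position, value) pairs for labels outside [0, num_classes)."""
    return [(i, y) for i, y in enumerate(labels) if not 0 <= y < num_classes]


def shift_to_zero_based(labels):
    """Convert labels numbered 1..C to 0..C-1 (only valid if they really start at 1)."""
    return [y - 1 for y in labels]


# Hypothetical labels for a 3-class problem, mistakenly numbered from 1.
labels = [1, 2, 3, 1, 3]
num_classes = 3

bad = find_bad_labels(labels, num_classes)
print(bad)  # [(2, 3), (4, 3)]

fixed = shift_to_zero_based(labels)
print(find_bad_labels(fixed, num_classes))  # []
```

For a PyTorch label tensor, something like `labels.flatten().tolist()` should give a list that can be passed to the check, with `num_classes` set to the size of the model's output layer. For the second step, calling `.cpu()` on the model and on each input tensor is usually enough to run the same forward pass and loss on the CPU.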

After these two steps, the problem can usually be solved. Good luck!

