RuntimeError: cuda runtime error (59) : device-side assert triggered at XXX

created at 08-02-2021

cuda runtime error (59)

The figure shows the error raised while training a text classification model on an Ubuntu GPU server. The final message alone does not reveal the cause, but the traceback points at loss.backward(), which suggests that the problem lies in the loss calculation. Keep in mind that CUDA reports errors asynchronously, so the line named in the traceback is not necessarily the one that actually failed.
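If the traceback is too vague to act on, one common aid is to make CUDA kernel launches synchronous so that the Python traceback stops at the operation that failed. A minimal sketch, assuming the variable is set at the very top of the training script, before CUDA is initialized:

```python
import os

# Force synchronous CUDA kernel launches so the traceback points at the
# failing operation. Set this before CUDA is initialized; the safest place
# is before importing torch. Training runs slower with this enabled, so
# remove it once the bug is found.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # assumed to follow here, then rerun the training script
```

The same setting can be passed on the command line as an environment variable when launching the script.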

In most cases this error comes from how the labels are defined. On the seventh line of the program, I define a dictionary, class_dict, that stores the id corresponding to each category, and the ids run from 1 to 7. At first I thought this definition was fine. However, the model's prediction is obtained by applying argmax to the output, which returns the index of the largest value, and that index starts at 0. The loss calculation uses the label to index the output in the same way.
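The indexing argument can be illustrated without a GPU. The following is a pure-Python sketch, not the framework's implementation: the logits are made-up numbers, and softmax, argmax, and cross_entropy are small illustrative helpers written for this example.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest value; indices start at 0.
    return max(range(len(values)), key=lambda i: values[i])

def cross_entropy(logits, target):
    # The label is used as an index into the class probabilities,
    # so it must lie in [0, number of classes - 1].
    if not 0 <= target < len(logits):
        raise IndexError(
            f"target {target} is out of range for {len(logits)} classes"
        )
    return -math.log(softmax(logits)[target])

# Made-up scores for a 7-class model: valid indices are 0..6.
logits = [0.1, 0.3, 2.5, 0.2, 0.0, 0.4, 0.1]

predicted = argmax(logits)          # position of the largest score
loss_ok = cross_entropy(logits, 6)  # label 6 is the last valid class

try:
    cross_entropy(logits, 7)        # label 7 points past the last output
    out_of_range_failed = False
except IndexError:
    out_of_range_failed = True
```

On the GPU the same out-of-range lookup happens inside a kernel, which is why it surfaces as a device-side assert rather than a readable Python exception.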

solution

Since the index starts from 0, a model with seven output classes only accepts labels 0 to 6. My labels ran from 1 to 7: no sample used label 0, and label 7 pointed past the last output, so the loss could not be calculated and the error was raised. Converting the ids corresponding to the categories to 0 ~ 6 solved the problem.
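A minimal sketch of the conversion, assuming a class_dict shaped like the one described above. The category names here are hypothetical placeholders, and check_labels is an illustrative helper, not part of any library:

```python
# Hypothetical 1-based mapping, standing in for the original class_dict.
old_class_dict = {
    "cat_a": 1, "cat_b": 2, "cat_c": 3, "cat_d": 4,
    "cat_e": 5, "cat_f": 6, "cat_g": 7,
}

# Re-number the categories by the rank of their old id, so 1..7 becomes 0..6.
ordered = sorted(old_class_dict.items(), key=lambda item: item[1])
class_dict = {name: new_id for new_id, (name, _) in enumerate(ordered)}

def check_labels(labels, num_classes):
    """Raise ValueError if any label falls outside 0..num_classes-1."""
    bad = sorted({label for label in labels if not 0 <= label < num_classes})
    if bad:
        raise ValueError(
            f"labels {bad} are outside the valid range 0..{num_classes - 1}"
        )

num_classes = len(class_dict)

# The converted ids pass the check.
check_labels(class_dict.values(), num_classes)

# The original ids fail it, because 7 is out of range.
try:
    check_labels(old_class_dict.values(), num_classes)
    old_labels_rejected = False
except ValueError:
    old_labels_rejected = True
```

Running a check like this on the CPU over the whole label column before training gives a readable error message instead of a device-side assert.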

summary

RuntimeError: cuda runtime error (59): device-side assert triggered at XXX. The error suggests that the loss cannot be backpropagated, and in most cases the cause is the label definition. Converting the labels to consecutive values starting from 0 fixes it.

edited at: 08-02-2021