RuntimeError: CUDA error: device-side assert triggered

created at 12-05-2021

error

RuntimeError: CUDA error: device-side assert triggered 

CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
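
The hint in the message can be followed from inside a script. A minimal sketch, assuming the variable is set before CUDA is initialized (the safest place is before torch is imported), so that kernel launches run synchronously and the stack trace points at the operation that actually failed:

```python
import os

# Set this before the first CUDA call; doing it before "import torch" is the
# safest option. Kernel launches then block, so errors surface at the failing op.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch and build the model only after this line
```

This slows training down, so it is only meant for debugging runs.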

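In my case the cause was an incorrect use of torch.nn.CrossEntropyLoss: the number of classes did not match. CrossEntropyLoss expects pred of shape (N, C, H, W) and labels of shape (N, H, W) holding integer class indices in [0, C - 1]. Below, pred has C = 1, so 0 is the only valid label; any other value is out of range and trips the assert on the GPU.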

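import torch

# pred: (N, C, H, W) with C = 1, so the only valid class index is 0
pred = torch.zeros((128, 1, 128, 128), requires_grad=True)
# labels: (N, H, W) integer class indices; zeros are placeholders here,
# any index outside [0, C - 1] triggers the assert on a CUDA device
labels = torch.zeros((128, 128, 128), dtype=torch.long)

loss_fn = torch.nn.CrossEntropyLoss()

loss = loss_fn(pred, labels)

loss.backward()    # <-- error reported here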
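
Because the GPU assert says nothing about which values are wrong, it can help to validate the labels before calling the loss. A minimal sketch, assuming labels are plain class indices and ignore_index is not used; check_class_indices is a hypothetical helper, not part of PyTorch:

```python
import torch


def check_class_indices(pred: torch.Tensor, labels: torch.Tensor) -> None:
    """Raise a readable error if labels cannot be used as class indices.

    Hypothetical helper: assumes pred is (N, C, ...) and labels holds
    class indices, with no ignore_index value present.
    """
    num_classes = pred.shape[1]
    if labels.dtype != torch.long:
        raise TypeError(f"labels must be torch.long, got {labels.dtype}")
    lo, hi = int(labels.min()), int(labels.max())
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            f"labels span [{lo}, {hi}] but pred has only {num_classes} class channel(s)"
        )


pred = torch.zeros((2, 1, 4, 4))                 # C = 1
good = torch.zeros((2, 4, 4), dtype=torch.long)  # only index 0, valid
bad = torch.ones((2, 4, 4), dtype=torch.long)    # index 1 is out of range for C = 1

check_class_indices(pred, good)  # passes silently
try:
    check_class_indices(pred, bad)
except ValueError as err:
    message = str(err)
```

Running the same batch on the CPU is another way to get a more readable error than the device-side assert.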

solution

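import torch

pred = torch.zeros((128, 1, 128, 128), requires_grad=True)
labels = torch.zeros((128, 128, 128))   # float regression targets

loss_fn = torch.nn.SmoothL1Loss()

# drop the channel dimension so pred and labels have the same shape
loss = loss_fn(pred.view(128, 128, 128), labels)

loss.backward()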

This only illustrates the problem. It does not mean you should simply swap CrossEntropyLoss for SmoothL1Loss; the right fix depends on your actual situation.
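
If the task really is classification, one option is to keep a classification loss and make the shapes agree instead. A rough sketch under stated assumptions: num_classes = 3 and the small tensor sizes are hypothetical values chosen for illustration, and for a single-channel binary output BCEWithLogitsLoss with float 0/1 labels is used as one common choice:

```python
import math
import torch

# Multi-class: one channel per class, labels are int64 indices in [0, C - 1].
num_classes = 3  # hypothetical value for illustration
pred = torch.zeros((4, num_classes, 8, 8), requires_grad=True)
labels = torch.randint(0, num_classes, (4, 8, 8))
ce_loss = torch.nn.CrossEntropyLoss()(pred, labels)
ce_loss.backward()

# Binary with a single output channel: float 0/1 labels with a channel dim,
# so that labels have the same shape as the predictions.
bin_pred = torch.zeros((4, 1, 8, 8), requires_grad=True)
bin_labels = torch.randint(0, 2, (4, 8, 8)).float().unsqueeze(1)
bce_loss = torch.nn.BCEWithLogitsLoss()(bin_pred, bin_labels)
bce_loss.backward()
```

With all-zero logits the two losses come out as ln(3) and ln(2), which is a quick sanity check that the shapes and label ranges are accepted.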

In addition, this RuntimeError message is not very informative, and I am surely not the only one who runs into it. If you hit it in a different situation, feel free to get in touch and compare notes.

edited at: 12-05-2021