PyTorch: one of the variables needed for gradient computation has been modified by an inplace operation

created at 02-15-2022

problem

After writing the training code, I ran it and got the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [544, 768]], which is output 0 of ViewBackward, is at version 1; expected version 0 instead. 

Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

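It means that autograd saved a tensor during the forward pass for use in the gradient computation, and that tensor was then modified by an in-place operation before backward() ran. Every tensor carries a version counter that is incremented by each in-place change; here the saved tensor is at version 1, while autograd expected it to still be at version 0.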
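To make the version check concrete, here is a minimal sketch that reproduces the same kind of error on a tiny tensor. It is not the training code from this post; the tensors x and y are made up for illustration, and _version is a private attribute that is only read here to show the counter.

```python
import torch

# Hypothetical toy tensors, not the model from the post.
x = torch.ones(3, requires_grad=True)

y = x.exp()                    # exp() saves its output y for the backward pass
version_before = y._version    # private attribute: the tensor's version counter

y.add_(1)                      # in-place change of a tensor autograd still needs
version_after = y._version     # the counter was bumped by the in-place op

message = None
try:
    y.sum().backward()         # autograd notices the saved tensor has changed
except RuntimeError as err:
    message = str(err)

print(version_before, version_after)
print(message)
```

The backward pass of exp() reuses its own output, so changing y in place invalidates the value autograd had stored.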

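Following the hint, enable anomaly detection by calling torch.autograd.set_detect_anomaly(True) (it is a function call; assigning True to it, as in torch.autograd.set_detect_anomaly = True, does not enable anything), run the code again, and get the following traceback: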

Traceback (most recent call last):
  File "E:\xxx\main\main.py", line 71, in <module>
    main(args)
  File "E:\xxx\main\main.py", line 57, in main
    train(config)
  File "E:\xxx\train\train.py", line 114, in train
    total_loss.backward()
  File "D:\Anaconda3\envs\nlp\lib\site-packages\torch\tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\Anaconda3\envs\nlp\lib\site-packages\torch\autograd\__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [544, 768]], which is output 0 of ViewBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

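The traceback shows that the error is raised inside total_loss.backward(), that is, during back propagation. The in-place operation that caused it, however, was executed earlier, in the forward pass.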
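As a rough sketch of how anomaly detection is switched on, the snippet below enables it around a small forward and backward pass. The toy tensors are assumptions for illustration only. With the mode enabled, PyTorch additionally emits a warning containing the traceback of the forward call whose gradient could not be computed.

```python
import torch

# Turn anomaly detection on globally (note the function call).
torch.autograd.set_detect_anomaly(True)

x = torch.ones(3, requires_grad=True)   # hypothetical toy input
failed = False
try:
    y = x.exp()          # forward op is recorded together with its traceback
    y.add_(1)            # the offending in-place operation
    y.sum().backward()   # raises; a warning points back at the exp() call
except RuntimeError as err:
    failed = True
    print(err)
finally:
    # The mode slows training down, so switch it off after debugging.
    torch.autograd.set_detect_anomaly(False)
```

The same effect can be limited to a block of code with the context manager torch.autograd.detect_anomaly().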

So what is an inplace operation?

According to a Q&A post on the PyTorch forum:

An in-place operation is an operation that changes directly the content of a given Tensor without making a copy.

  • In PyTorch, in-place operations are methods whose names end with an underscore, such as .add_() or .scatter_(). The .add_() method modifies the tensor directly, so x.add_(y) can be changed to x = x + y. If you need to work on a copy instead, use the .clone() method, as suggested in the second post of that thread.

  • In Python code, augmented assignments such as += or *= applied to tensors are also in-place operations. For example, x += y needs to be changed to x = x + y.
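The two fixes from the list above can be sketched as follows. This is only an illustration on made-up toy tensors (x, y, z are hypothetical names), not the actual change made to the training code in this post.

```python
import torch

x = torch.ones(3, requires_grad=True)   # hypothetical toy input

# Fix 1: replace the in-place op with an out-of-place one.
y = x.exp()
y = y + 1                  # creates a new tensor; the saved exp() output is untouched
y.sum().backward()
grad_out_of_place = x.grad.clone()

# Fix 2: clone first, then modify the copy in place.
x.grad = None              # reset the accumulated gradient between the two runs
y = x.exp()
z = y.clone()              # z has its own storage and its own version counter
z.add_(1)                  # safe: y, which exp() saved, is not changed
z.sum().backward()
grad_clone = x.grad.clone()

print(grad_out_of_place)
print(grad_clone)
```

Both variants run backward() without the error and give the same gradient, because the tensor that autograd saved is never changed.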

edited at: 02-15-2022