torch.Tensor.backward#

Tensor.backward(gradient=None, retain_graph=None, create_graph=False, inputs=None)[source]#

Computes the gradient of the current tensor w.r.t. graph leaves.

The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying a gradient. It should be a tensor of matching type and shape that represents the gradient of the differentiated function w.r.t. self.
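A minimal sketch of the scalar vs. non-scalar cases described above — for a non-scalar tensor, passing `gradient=torch.ones_like(out)` is equivalent to summing the outputs and calling `backward()` on the scalar result:

```python
import torch

# Scalar output: backward() needs no explicit gradient argument.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x ** 2).sum()              # scalar
loss.backward()
print(x.grad)                      # d(sum(x^2))/dx = 2x -> tensor([2., 4., 6.])

# Non-scalar output: a `gradient` tensor of matching shape is required.
y = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = y ** 2                       # shape (3,), non-scalar
out.backward(gradient=torch.ones_like(out))
print(y.grad)                      # tensor([2., 4., 6.])
```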

This function accumulates gradients in the leaves - you might need to zero .grad attributes or set them to None before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.
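A short sketch of the accumulation behavior: repeated calls to backward() add into .grad, so stale gradients must be zeroed (or the attribute set to None) between steps:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward: gradients land in w.grad.
(w * 3.0).sum().backward()
print(w.grad)                      # tensor([3., 3.])

# Second backward WITHOUT zeroing: gradients accumulate.
(w * 3.0).sum().backward()
print(w.grad)                      # tensor([6., 6.])

# Setting .grad to None (or calling w.grad.zero_()) resets accumulation.
w.grad = None
(w * 3.0).sum().backward()
print(w.grad)                      # tensor([3., 3.])
```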

Note

If you run any forward ops, create gradient, and/or call backward in a user-specified CUDA stream context, see Stream semantics of backward passes.

Note

When inputs are provided and a given input is not a leaf, the current implementation will call its grad_fn (though it is not strictly needed to get the gradients). This is an implementation detail on which the user should not rely. See pytorch/pytorch#60521 for more details.

Parameters
  • gradient (Tensor, optional) – The gradient of the function being differentiated w.r.t. self. This argument can be omitted if self is a scalar. Defaults to None.

  • retain_graph (bool, optional) – If False, the graph used to compute the grads will be freed; if True, it will be retained. The default is None, in which case the value is inferred from create_graph (i.e., the graph is retained only when higher-order derivative tracking is requested). Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way.

  • create_graph (bool, optional) – If True, the graph of the derivative will be constructed, allowing higher-order derivative products to be computed. Defaults to False.

  • inputs (Sequence[Tensor], optional) – Inputs w.r.t. which the gradient will be accumulated into .grad. All other tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute the tensors. Defaults to None.
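A sketch illustrating the three behavioral parameters above: retain_graph keeps the graph alive for a second backward pass, create_graph makes second derivatives possible, and inputs= restricts which leaves receive gradients:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# retain_graph=True keeps the graph so backward could be called again.
y.backward(retain_graph=True)
print(x.grad)                      # dy/dx = 3x^2 = tensor(12.)

# create_graph=True builds a graph of the backward pass itself,
# so the first derivative can be differentiated again.
x.grad = None
y = x ** 3
(g,) = torch.autograd.grad(y, x, create_graph=True)   # g = 3x^2
g.backward()                       # d(3x^2)/dx = 6x = 12
print(x.grad)                      # tensor(12.)

# inputs= limits gradient accumulation to the listed leaves only.
a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
(a * b).backward(inputs=[a])
print(a.grad, b.grad)              # tensor(2.) None
```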