
Automatic differentiation package - torch.autograd#

Created On: Dec 23, 2016 | Last Updated On: Jun 12, 2025

torch.autograd provides classes and functions implementing automatic differentiation of arbitrary scalar valued functions.

It requires minimal changes to the existing code - you only need to declare Tensors for which gradients should be computed with the requires_grad=True keyword. As of now, we only support autograd for floating point Tensor types (half, float, double and bfloat16) and complex Tensor types (cfloat, cdouble).

backward

Compute the sum of gradients of given tensors with respect to graph leaves.

grad

Compute and return the sum of gradients of outputs with respect to the inputs.
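As a minimal sketch of the two entry points listed above: the tensor names and the function y = sum(x^2) are illustrative choices, not part of the original page. torch.autograd.grad returns the gradients, while Tensor.backward accumulates them into the .grad attribute of the leaves.

```python
import torch

# Hypothetical example input; any floating point tensor works.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(x_i^2), so dy/dx = 2 * x.
y = (x ** 2).sum()

# torch.autograd.grad returns the gradients as a tuple and leaves x.grad untouched.
# retain_graph=True keeps the graph alive so backward() can be called afterwards.
(g,) = torch.autograd.grad(y, x, retain_graph=True)
grad_attr_before_backward = x.grad  # still None at this point

# backward() accumulates the gradient into the .grad attribute of the leaves.
y.backward()

print(g)
print(x.grad)
```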

Forward-mode Automatic Differentiation#

Warning

This API is in beta. Even though the function signatures are very unlikely to change, improved operator coverage is planned before we consider this stable.

Please see the forward-mode AD tutorial for detailed steps on how to use this API.

forward_ad.dual_level

Context-manager for forward AD, where all forward AD computation must occur within the dual_level context.

forward_ad.make_dual

Associate a tensor value with its tangent to create a "dual tensor" for forward AD gradient computation.

forward_ad.unpack_dual

Unpack a "dual tensor" to get both its Tensor value and its forward AD gradient.

forward_ad.enter_dual_level

Enter a new forward grad level.

forward_ad.exit_dual_level

Exit a forward grad level.

forward_ad.UnpackedDualTensor

Namedtuple returned by unpack_dual() containing the primal and tangent components of the dual tensor.
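The functions above are typically combined as in the following sketch. The choice of sin as the function and the names primal and tangent are assumptions made for illustration; the result is the Jacobian-vector product of sin at primal along tangent.

```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.tensor([1.0, 2.0, 3.0])
tangent = torch.tensor([1.0, 0.0, 0.0])  # direction of the directional derivative

# All forward AD computation happens inside the dual_level context.
with fwAD.dual_level():
    dual_input = fwAD.make_dual(primal, tangent)
    dual_output = dual_input.sin()
    # unpack_dual returns an UnpackedDualTensor namedtuple (primal, tangent).
    unpacked = fwAD.unpack_dual(dual_output)
    out_primal, out_tangent = unpacked.primal, unpacked.tangent

# For an elementwise sin, the JVP is cos(primal) * tangent.
print(out_primal)
print(out_tangent)
```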

Functional higher level API#

Warning

This API is in beta. Even though the function signatures are very unlikely to change, major improvements to performance are planned before we consider this stable.

This section contains the higher level API for autograd that builds on the basic API above and allows you to compute Jacobians, Hessians, etc.

This API works with user-provided functions that take only Tensors as input and return only Tensors. If your function takes other arguments that are not Tensors or Tensors that don’t have requires_grad set, you can use a lambda to capture them. For example, for a function f that takes three inputs, a Tensor for which we want the Jacobian, another tensor that should be considered constant and a boolean flag as f(input, constant, flag=flag), you can use it as functional.jacobian(lambda x: f(x, constant, flag=flag), input).
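The lambda-capture pattern described above can be sketched as follows. The body of f is a made-up example (an elementwise product, optionally followed by exp); only the call shape functional.jacobian(lambda x: f(x, constant, flag=flag), input) comes from the text.

```python
import torch
from torch.autograd import functional


def f(x, constant, flag=False):
    # Hypothetical function: only `x` should be differentiated.
    out = x * constant
    return out.exp() if flag else out


inp = torch.tensor([1.0, 2.0])
constant = torch.tensor([3.0, 4.0])

# The lambda captures `constant` and `flag`, so jacobian only sees a
# Tensor -> Tensor function of `x`.
jac = functional.jacobian(lambda x: f(x, constant, flag=False), inp)

# With flag=False, f is elementwise multiplication, so the Jacobian is diag(constant).
print(jac)
```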

functional.jacobian

Compute the Jacobian of a given function.

functional.hessian

Compute the Hessian of a given scalar function.

functional.vjp

Compute the dot product between a vector v and the Jacobian of the given function at the point given by the inputs.

functional.jvp

Compute the dot product between the Jacobian of the given function at the point given by the inputs and a vector v.

functional.vhp

Compute the dot product between vector v and Hessian of a given scalar function at a specified point.

functional.hvp

Compute the dot product between the scalar function's Hessian and a vector v at a specified point.
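A short sketch of how these functions relate to each other, using two hypothetical functions chosen so the results are easy to verify by hand: g(x) = x^2 has Jacobian diag(2x), and scalar_fn(x) = sum(x^3) has Hessian diag(6x). Each of vjp, jvp and hvp returns a tuple of the function output and the requested product.

```python
import torch
from torch.autograd import functional


def g(x):
    return x ** 2  # elementwise, Jacobian is diag(2 * x)


def scalar_fn(x):
    return (x ** 3).sum()  # Hessian is diag(6 * x)


x = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([1.0, 0.5, 2.0])

_, vjp_result = functional.vjp(g, x, v)          # v^T J
_, jvp_result = functional.jvp(g, x, v)          # J v
_, hvp_result = functional.hvp(scalar_fn, x, v)  # H v
hess = functional.hessian(scalar_fn, x)          # full Hessian, for comparison

print(vjp_result, jvp_result, hvp_result)
```

Because both the Jacobian and the Hessian are diagonal here, the vector-Jacobian and Jacobian-vector products coincide; for a general function they differ.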

Locally disabling gradient computation#

See Locally disabling gradient computation for more information on the differences between no-grad and inference mode as well as other related mechanisms that may be confused with the two. Also see Locally disabling gradient computation for a list of functions that can be used to locally disable gradients.

Default gradient layouts#

When a non-sparse param receives a non-sparse gradient during torch.autograd.backward() or torch.Tensor.backward(), param.grad is accumulated as follows.

If param.grad is initially None:

  1. If param’s memory is non-overlapping and dense, .grad is created with strides matching param (thus matching param’s layout).

  2. Otherwise, .grad is created with rowmajor-contiguous strides.

If param already has a non-sparse .grad attribute:

  3. If create_graph=False, backward() accumulates into .grad in-place, which preserves its strides.

  4. If create_graph=True, backward() replaces .grad with a new tensor .grad + new grad, which attempts (but does not guarantee) matching the preexisting .grad’s strides.

The default behavior (letting .grads be None before the first backward(), such that their layout is created according to 1 or 2, and retained over time according to 3 or 4) is recommended for best performance. Calls to model.zero_grad() or optimizer.zero_grad() will not affect .grad layouts.

In fact, resetting all .grads to None before each accumulation phase, e.g.:

for iterations...
    ...
    for param in model.parameters():
        param.grad = None
    loss.backward()

such that they’re recreated according to 1 or 2 every time, is a valid alternative to model.zero_grad() or optimizer.zero_grad() that may improve performance for some networks.
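The layout rules of this section can be observed directly. The sketch below assumes a hypothetical column-major leaf tensor named param (a transposed view, which is dense and non-overlapping) and checks that the freshly created .grad takes the same strides, that in-place accumulation keeps them, and that resetting .grad to None recreates it.

```python
import torch

# A leaf whose memory is dense but column-major: shape (3, 4), strides (1, 3).
param = torch.randn(4, 3).t().requires_grad_()

# First backward: .grad is None, so it is created with strides matching param.
(param * 2).sum().backward()
strides_after_first = param.grad.stride()

# Second backward with create_graph=False accumulates into .grad in-place.
(param * 3).sum().backward()
strides_after_second = param.grad.stride()
accumulated = param.grad.clone()  # 2 + 3 = 5 everywhere

# Resetting to None makes the next backward recreate .grad from scratch.
param.grad = None
(param * 2).sum().backward()

print(param.stride(), strides_after_first, strides_after_second)
```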

Manual gradient layouts#

If you need manual control over .grad’s strides, assign param.grad = a zeroed tensor with desired strides before the first backward(), and never reset it to None. 3 guarantees your layout is preserved as long as create_graph=False. 4 indicates your layout is likely preserved even if create_graph=True.

In-place operations on Tensors#

Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd’s aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you’re operating under heavy memory pressure, you might never need to use them.

In-place correctness checks#

All Tensors keep track of in-place operations applied to them, and if the implementation detects that a tensor was saved for backward in one of the functions, but it was modified in-place afterwards, an error will be raised once the backward pass is started. This ensures that if you’re using in-place functions and not seeing any errors, you can be sure that the computed gradients are correct.
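A minimal sketch of this check in action. The example relies on exp saving its output for the backward pass; the variable names are illustrative.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.exp()   # exp saves its result for use in the backward pass
b.add_(1)     # in-place modification of the saved tensor

raised = False
try:
    # The backward of exp needs the original result, which no longer exists.
    b.sum().backward()
except RuntimeError as err:
    raised = True
    print("autograd detected the in-place modification:", err)
```

Replacing b.add_(1) with the out-of-place b = b + 1 leaves the saved tensor intact and lets the backward pass run.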

Variable (deprecated)#

Warning

The Variable API has been deprecated: Variables are no longer necessary to use autograd with tensors. Autograd automatically supports Tensors with requires_grad set to True. Below please find a quick guide on what has changed:

  • Variable(tensor) and Variable(tensor, requires_grad) still work as expected, but they return Tensors instead of Variables.

  • var.data is the same thing as tensor.data.

  • Methods such as var.backward(), var.detach(), var.register_hook() now work on tensors with the same method names.

In addition, one can now create tensors with requires_grad=True using factory methods such as torch.randn(), torch.zeros(), torch.ones(), and others like the following:

autograd_tensor = torch.randn((2, 3, 4), requires_grad=True)

Tensor autograd functions#

torch.Tensor.grad

This attribute is None by default and becomes a Tensor the first time a call to backward() computes gradients for self.

torch.Tensor.requires_grad

Is True if gradients need to be computed for this Tensor, False otherwise.

torch.Tensor.is_leaf

All Tensors that have requires_grad which is False will be leaf Tensors by convention.

torch.Tensor.backward([gradient, ...])

Computes the gradient of current tensor wrt graph leaves.

torch.Tensor.detach

Returns a new Tensor, detached from the current graph.

torch.Tensor.detach_

Detaches the Tensor from the graph that created it, making it a leaf.

torch.Tensor.register_hook(hook)

Registers a backward hook.

torch.Tensor.register_post_accumulate_grad_hook(hook)

Registers a backward hook that runs after grad accumulation.

torch.Tensor.retain_grad()

Enables this Tensor to have its grad populated during backward().
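The following sketch ties several of these attributes and methods together on a small, made-up graph: a leaf x, a non-leaf h = 3 * x whose gradient is kept with retain_grad(), and a hook that records the gradient flowing through h.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf tensor
h = x * 3                                          # non-leaf (has a grad_fn)
h.retain_grad()                                    # keep h.grad after backward

seen = []
# The hook receives the gradient w.r.t. h; returning None leaves it unchanged.
handle = h.register_hook(lambda grad: seen.append(grad.clone()))

out = (h ** 2).sum()
out.backward()
handle.remove()  # the hook will not fire on later backward passes

print(x.is_leaf, h.is_leaf)
print(h.grad)   # d out / d h = 2 * h
print(x.grad)   # d out / d x = 3 * 2 * h
```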

Function#

class torch.autograd.Function(*args, **kwargs)#

Base class to create custom autograd.Function.

To create a custom autograd.Function, subclass this class and implement the forward() and backward() static methods. Then, to use your custom op in the forward pass, call the class method apply. Do not call forward() directly.

To ensure correctness and best performance, make sure you are calling the correct methods on ctx and validating your backward function using torch.autograd.gradcheck().

See Extending torch.autograd for more details on how to use this class.

Examples:

>>> class Exp(Function):
>>>     @staticmethod
>>>     def forward(ctx, i):
>>>         result = i.exp()
>>>         ctx.save_for_backward(result)
>>>         return result
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         result, = ctx.saved_tensors
>>>         return grad_output * result
>>>
>>> # Use it by calling the apply method:
>>> output = Exp.apply(input)

Function.forward

Define the forward of the custom autograd Function.

Function.backward

Define a formula for differentiating the operation with backward mode automatic differentiation.

Function.jvp

Define a formula for differentiating the operation with forward mode automatic differentiation.

Function.vmap

Define the behavior for this autograd.Function underneath torch.vmap().
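The advice to validate a custom backward with torch.autograd.gradcheck() can be followed as in this sketch. Cube is a hypothetical Function written for illustration; the input uses double precision, which gradcheck's finite differences generally need.

```python
import torch
from torch.autograd import Function, gradcheck


class Cube(Function):
    """Hypothetical custom op computing x ** 3."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d(x^3)/dx = 3 * x^2, chained with the incoming gradient.
        return grad_output * 3 * x ** 2


inp = torch.randn(4, dtype=torch.double, requires_grad=True)

# Compare the analytical backward against finite differences.
ok = gradcheck(Cube.apply, (inp,), eps=1e-6, atol=1e-4)
print(ok)

# Use the op through apply, never through forward() directly.
Cube.apply(inp).sum().backward()
```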

Context method mixins#

When creating a new Function, the following methods are available to ctx.

function.FunctionCtx.mark_dirty

Mark given tensors as modified in an in-place operation.

function.FunctionCtx.mark_non_differentiable

Mark outputs as non-differentiable.

function.FunctionCtx.save_for_backward

Save given tensors for a future call to backward().

function.FunctionCtx.set_materialize_grads

Set whether to materialize grad tensors.
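As an illustration of two of these ctx methods, the sketch below defines a hypothetical SortWithIndices Function for 1-D inputs. It marks the integer index output as non-differentiable and saves the indices so that backward can scatter the incoming gradient back to the original positions.

```python
import torch
from torch.autograd import Function


class SortWithIndices(Function):
    """Hypothetical op: sort a 1-D tensor and also return the sort indices."""

    @staticmethod
    def forward(ctx, x):
        values, idx = x.sort()
        ctx.mark_non_differentiable(idx)  # integer indices carry no gradient
        ctx.save_for_backward(idx)
        return values, idx

    @staticmethod
    def backward(ctx, grad_values, grad_idx):
        (idx,) = ctx.saved_tensors
        # values[i] == x[idx[i]], so route grad_values[i] back to position idx[i].
        grad_input = torch.zeros_like(grad_values)
        grad_input.index_add_(0, idx, grad_values)
        return grad_input


x = torch.tensor([3.0, 1.0, 2.0], requires_grad=True)
values, idx = SortWithIndices.apply(x)

weights = torch.tensor([1.0, 2.0, 3.0])
(values * weights).sum().backward()
print(values, idx, x.grad)
```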

Custom Function utilities#

Decorator for backward method.

Base custom Function used to build PyTorch utilities

function.BackwardCFunction

This class is used for internal autograd work.

function.InplaceFunction

This class is here only for backward compatibility reasons.

function.NestedIOFunction

This class is here only for backward compatibility reasons.

Numerical gradient checking#

gradcheck

Check gradients computed via small finite differences against analytical gradients wrt tensors in inputs that are of floating point or complex type and with requires_grad=True.

gradgradcheck

Check gradients of gradients computed via small finite differences against analytical gradients wrt tensors in inputs and grad_outputs that are of floating point or complex type and with requires_grad=True.

GradcheckError

Error raised by gradcheck() and gradgradcheck().

Profiler#

Autograd includes a profiler that lets you inspect the cost of different operators inside your model - both on the CPU and GPU. There are three modes implemented at the moment: CPU-only using profile, nvprof-based (registers both CPU and GPU activity) using emit_nvtx, and VTune profiler-based using emit_itt.

class torch.autograd.profiler.profile(enabled=True, *, use_cuda=False, use_device=None, record_shapes=False, with_flops=False, profile_memory=False, with_stack=False, with_modules=False, use_kineto=False, use_cpu=True, experimental_config=None, acc_events=False, custom_trace_id_callback=None)#

Context manager that manages autograd profiler state and holds a summary of results.

Note

This is the backend, most people should use torch.profiler instead.

Under the hood it just records events of functions being executed in C++ and exposes those events to Python. You can wrap any code into it and it will only report runtime of PyTorch functions. Note: profiler is thread local and is automatically propagated into the async tasks.

Parameters
  • enabled (bool, optional) – Setting this to False makes this context manager a no-op.

  • use_cuda (bool, optional) – Enables timing of CUDA events as well using the cudaEvent API. (will be deprecated)

  • use_device (str, optional) – Enables timing of device events. Adds approximately 4us of overhead to each tensor operation when using cuda. The valid device options are ‘cuda’, ‘xpu’, ‘mtia’ and ‘privateuseone’.

  • record_shapes (bool, optional) – If shapes recording is set, information about input dimensions will be collected. This allows one to see which dimensions have been used under the hood and further group by them using prof.key_averages(group_by_input_shape=True). Please note that shape recording might skew your profiling data. It is recommended to use separate runs with and without shape recording to validate the timing. Most likely the skew will be negligible for bottom most events (in a case of nested function calls). But for higher level functions the total self cpu time might be artificially increased because of the shape collection.

  • with_flops (bool, optional) – If with_flops is set, the profiler will estimate the FLOPs (floating point operations) value using the operator’s input shape. This allows one to estimate the hardware performance. Currently, this option only works for the matrix multiplication and 2D convolution operators.

  • profile_memory (bool, optional) – track tensor memory allocation/deallocation.

  • with_stack (bool, optional) – record source information (file and line number) for the ops.

  • with_modules (bool) – record module hierarchy (including function names) corresponding to the callstack of the op. e.g. If module A’s forward calls module B’s forward which contains an aten::add op, then aten::add’s module hierarchy is A.B. Note that this support exists, at the moment, only for TorchScript models and not eager mode models.

  • use_kineto (bool, optional) – experimental, enable profiling with Kineto profiler.

  • use_cpu (bool, optional) – profile CPU events; setting to False requires use_kineto=True and can be used to lower the overhead for GPU-only profiling.

  • experimental_config (_ExperimentalConfig) – A set of experimental options used by profiler libraries like Kineto. Note, backward compatibility is not guaranteed.

  • acc_events (bool) – Enable the accumulation of FunctionEvents across multiple profiling cycles

Warning

Enabling memory profiling or source attribution incurs additional profiler overhead

Warning

This context manager should not be called recursively, i.e. no nested instances are allowed

Warning

Due to some CUDA multiprocessing limitations (see CUDA in multiprocessing), one cannot use the profiler with use_device='cuda' to benchmark DataLoaders with num_workers > 0. If you wish to benchmark data loading, please use use_device=None or num_workers=0.

Example

>>> x = torch.randn((1, 1), requires_grad=True)
>>> with torch.autograd.profiler.profile() as prof:
>>>     for _ in range(100):  # any normal python code, really!
>>>         y = x ** 2
>>>         y.backward()
>>> # NOTE: some columns were removed for brevity
>>> print(prof.key_averages().table(sort_by="self_cpu_time_total"))
-----------------------------------  ---------------  ---------------  ---------------
Name                                 Self CPU total   CPU time avg     Number of Calls
-----------------------------------  ---------------  ---------------  ---------------
mul                                  32.048ms         32.048ms         200
pow                                  27.041ms         27.041ms         200
PowBackward0                         9.727ms          55.483ms         100
torch::autograd::AccumulateGrad      9.148ms          9.148ms          100
torch::autograd::GraphRoot           691.816us        691.816us        100
-----------------------------------  ---------------  ---------------  ---------------

profiler.profile.export_chrome_trace

Export an EventList as a Chrome tracing tools file.

profiler.profile.key_averages

Averages all function events over their keys.

profiler.profile.self_cpu_time_total

Returns total time spent on CPU.

profiler.profile.total_average

Averages all events.

profiler.parse_nvprof_trace

profiler.EnforceUnique

Raises an error if a key is seen more than once.

profiler.KinetoStepTracker

Provides an abstraction for incrementing the step count globally.

profiler.record_function

Context manager/function decorator that adds a label to a code block/function when running autograd profiler.
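A small sketch of labeling a region with record_function inside the autograd profiler. The label name "square_and_backward" and the workload are arbitrary choices for illustration; the label is expected to appear as a key alongside the operator names in the averaged events.

```python
import torch
from torch.autograd import profiler

x = torch.randn(8, 8, requires_grad=True)

with profiler.profile() as prof:
    # Everything inside this block is grouped under the custom label.
    with profiler.record_function("square_and_backward"):
        y = (x ** 2).sum()
        y.backward()

# key_averages() groups events by name; collect the names that were recorded.
labels = [evt.key for evt in prof.key_averages()]
print("square_and_backward" in labels)
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5))
```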

profiler_util.Interval

profiler_util.Kernel

profiler_util.MemRecordsAcc

Acceleration structure for accessing mem_records in interval.

profiler_util.StringTable

class torch.autograd.profiler.emit_nvtx(enabled=True, record_shapes=False)#

Context manager that makes every autograd operation emit an NVTX range.

It is useful when running the program under nvprof:

nvprof --profile-from-start off -o trace_name.prof -- <regular command here>

Unfortunately, there’s no way to force nvprof to flush the data it collected to disk, so for CUDA profiling one has to use this context manager to annotate nvprof traces and wait for the process to exit before inspecting them. Then, either NVIDIA Visual Profiler (nvvp) can be used to visualize the timeline, or torch.autograd.profiler.load_nvprof() can load the results for inspection e.g. in Python REPL.

Parameters
  • enabled (bool, optional) – Setting enabled=False makes this context manager a no-op. Default: True.

  • record_shapes (bool, optional) – If record_shapes=True, the nvtx range wrapping each autograd op will append information about the sizes of Tensor arguments received by that op, in the following format: [[arg0.size(0), arg0.size(1), ...], [arg1.size(0), arg1.size(1), ...], ...] Non-tensor arguments will be represented by []. Arguments will be listed in the order they are received by the backend op. Please note that this order may not match the order in which those arguments were passed on the Python side. Also note that shape recording may increase the overhead of nvtx range creation. Default: False

Example

>>> with torch.cuda.profiler.profile():
...     model(x)  # Warmup CUDA memory allocator and profiler
...     with torch.autograd.profiler.emit_nvtx():
...         model(x)

Forward-backward correlation

When viewing a profile created using emit_nvtx in the Nvidia Visual Profiler, correlating each backward-pass op with the corresponding forward-pass op can be difficult. To ease this task, emit_nvtx appends sequence number information to the ranges it generates.

During the forward pass, each function range is decorated with seq=<N>. seq is a running counter, incremented each time a new backward Function object is created and stashed for backward. Thus, the seq=<N> annotation associated with each forward function range tells you that if a backward Function object is created by this forward function, the backward object will receive sequence number N. During the backward pass, the top-level range wrapping each C++ backward Function’s apply() call is decorated with stashed seq=<M>. M is the sequence number that the backward object was created with. By comparing stashed seq numbers in backward with seq numbers in forward, you can track down which forward op created each backward Function.

Any functions executed during the backward pass are also decorated with seq=<N>. During default backward (with create_graph=False) this information is irrelevant, and in fact, N may simply be 0 for all such functions. Only the top-level ranges associated with backward Function objects’ apply() methods are useful, as a way to correlate these Function objects with the earlier forward pass.

Double-backward

If, on the other hand, a backward pass with create_graph=True is underway (in other words, if you are setting up for a double-backward), each function’s execution during backward is given a nonzero, useful seq=<N>. Those functions may themselves create Function objects to be executed later during double-backward, just as the original functions in the forward pass did. The relationship between backward and double-backward is conceptually the same as the relationship between forward and backward: The functions still emit current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and during the eventual double-backward, the Function objects’ apply() ranges are still tagged with stashed seq numbers, which can be compared to seq numbers from the backward pass.

class torch.autograd.profiler.emit_itt(enabled=True, record_shapes=False)#

Context manager that makes every autograd operation emit an ITT range.

It is useful when running the program under Intel(R) VTune Profiler:

vtune <--vtune-flags> <regular command here>

The Instrumentation and Tracing Technology (ITT) API enables your application to generate and control the collection of trace data during its execution across different Intel tools. This context manager is used to annotate Intel(R) VTune Profiling traces. With the help of this context manager, you will be able to see labeled ranges in the Intel(R) VTune Profiler GUI.

Parameters
  • enabled (bool, optional) – Setting enabled=False makes this context manager a no-op. Default: True.

  • record_shapes (bool, optional) – If record_shapes=True, the itt range wrapping each autograd op will append information about the sizes of Tensor arguments received by that op, in the following format: [[arg0.size(0), arg0.size(1), ...], [arg1.size(0), arg1.size(1), ...], ...] Non-tensor arguments will be represented by []. Arguments will be listed in the order they are received by the backend op. Please note that this order may not match the order in which those arguments were passed on the Python side. Also note that shape recording may increase the overhead of itt range creation. Default: False

Example

>>> with torch.autograd.profiler.emit_itt():
...     model(x)

profiler.load_nvprof

Open an nvprof trace file and parse autograd annotations.

Debugging and anomaly detection#

class torch.autograd.detect_anomaly(check_nan=True)#

Context-manager that enables anomaly detection for the autograd engine.

This does two things:

  • Running the forward pass with detection enabled will allow the backward pass to print the traceback of the forward operation that created the failing backward function.

  • If check_nan is True, any backward computation that generates a “nan” value will raise an error. Default True.

Warning

This mode should be enabled only for debugging as the different tests will slow down your program execution.

Example

>>> import torch
>>> from torch import autograd
>>> class MyFunc(autograd.Function):
...     @staticmethod
...     def forward(ctx, inp):
...         return inp.clone()
...
...     @staticmethod
...     def backward(ctx, gO):
...         # Error during the backward pass
...         raise RuntimeError("Some error in backward")
...         return gO.clone()
>>> def run_fn(a):
...     out = MyFunc.apply(a)
...     return out.sum()
>>> inp = torch.rand(10, 10, requires_grad=True)
>>> out = run_fn(inp)
>>> out.backward()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/your/pytorch/install/torch/_tensor.py", line 93, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/your/pytorch/install/torch/autograd/__init__.py", line 90, in backward
        allow_unreachable=True)  # allow_unreachable flag
      File "/your/pytorch/install/torch/autograd/function.py", line 76, in apply
        return self._forward_cls.backward(self, *args)
      File "<stdin>", line 8, in backward
    RuntimeError: Some error in backward
>>> with autograd.detect_anomaly():
...     inp = torch.rand(10, 10, requires_grad=True)
...     out = run_fn(inp)
...     out.backward()
    Traceback of forward call that caused the error:
      File "tmp.py", line 53, in <module>
        out = run_fn(inp)
      File "tmp.py", line 44, in run_fn
        out = MyFunc.apply(a)
    Traceback (most recent call last):
      File "<stdin>", line 4, in <module>
      File "/your/pytorch/install/torch/_tensor.py", line 93, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/your/pytorch/install/torch/autograd/__init__.py", line 90, in backward
        allow_unreachable=True)  # allow_unreachable flag
      File "/your/pytorch/install/torch/autograd/function.py", line 76, in apply
        return self._forward_cls.backward(self, *args)
      File "<stdin>", line 8, in backward
    RuntimeError: Some error in backward

class torch.autograd.set_detect_anomaly(mode, check_nan=True)#

Context-manager that sets the anomaly detection for the autograd engine on or off.

set_detect_anomaly will enable or disable the autograd anomaly detection based on its argument mode. It can be used as a context-manager or as a function.

See detect_anomaly above for details of the anomaly detection behaviour.

Parameters
  • mode (bool) – Flag whether to enable anomaly detection (True), or disable (False).

  • check_nan (bool) – Flag whether to raise an error when the backward pass generates “nan”
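The check_nan behaviour can be sketched with a computation whose backward pass produces 0 / 0. The expression below is a contrived example chosen for that purpose: the gradient of sqrt at 0, multiplied by an incoming gradient of 0, is nan.

```python
import torch

x = torch.tensor([0.0], requires_grad=True)

caught = None
with torch.autograd.set_detect_anomaly(True, check_nan=True):
    # Forward: sqrt(0) * 0 = 0. Backward of sqrt computes 0 / (2 * 0) = nan.
    out = (x.sqrt() * 0).sum()
    try:
        out.backward()
    except RuntimeError as err:
        caught = err

print(type(caught).__name__, caught)
```

Without anomaly detection the same backward call completes silently and leaves a nan in x.grad, which is usually noticed much later.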

grad_mode.set_multithreading_enabled

Context-manager that sets multithreaded backwards on or off.

Autograd graph#

Autograd exposes methods that allow one to inspect the graph and interpose behavior during the backward pass.

The grad_fn attribute of a torch.Tensor holds a torch.autograd.graph.Node if the tensor is the output of an operation that was recorded by autograd (i.e., grad_mode is enabled and at least one of the inputs required gradients), or None otherwise.

graph.Node.name

Return the name.

graph.Node.metadata

Return the metadata.

graph.Node.next_functions

graph.Node.register_hook

Register a backward hook.

graph.Node.register_prehook

Register a backward pre-hook.

graph.increment_version

Update autograd metadata tracking whether the given Tensor was modified in place.

Some operations need intermediary results to be saved during the forward pass in order to execute the backward pass. These intermediary results are saved as attributes on the grad_fn and can be accessed. For example:

>>> a = torch.tensor([0., 0., 0.], requires_grad=True)
>>> b = a.exp()
>>> print(isinstance(b.grad_fn, torch.autograd.graph.Node))
True
>>> print(dir(b.grad_fn))
['__call__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_raw_saved_result', '_register_hook_dict', '_saved_result', 'metadata', 'name', 'next_functions', 'register_hook', 'register_prehook', 'requires_grad']
>>> print(torch.allclose(b.grad_fn._saved_result, b))
True

You can also define how these saved tensors should be packed / unpacked using hooks. A common application is to trade compute for memory by saving those intermediary results to disk or to CPU instead of leaving them on the GPU. This is especially useful if you notice your model fits on GPU during evaluation, but not training. Also see Hooks for saved tensors.

class torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook)#

Context-manager that sets a pair of pack / unpack hooks for saved tensors.

Use this context-manager to define how intermediary results of an operation should be packed before saving, and unpacked on retrieval.

In that context, the pack_hook function will be called every time an operation saves a tensor for backward (this includes intermediary results saved using save_for_backward() but also those recorded by a PyTorch-defined operation). The output of pack_hook is then stored in the computation graph instead of the original tensor.

The unpack_hook is called when the saved tensor needs to be accessed, namely when executing torch.Tensor.backward() or torch.autograd.grad(). It takes as argument the packed object returned by pack_hook and should return a tensor which has the same content as the original tensor (passed as input to the corresponding pack_hook).

The hooks should have the following signatures:

pack_hook(tensor: Tensor) -> Any

unpack_hook(Any) -> Tensor

where the return value of pack_hook is a valid input to unpack_hook.

In general, you want unpack_hook(pack_hook(t)) to be equal to t in terms of value, size, dtype and device.

Example:

>>> def pack_hook(x):
...     print("Packing", x)
...     return x.detach()
>>>
>>> def unpack_hook(x):
...     print("Unpacking", x)
...     return x
>>>
>>> a = torch.ones(5, requires_grad=True)
>>> b = torch.ones(5, requires_grad=True) * 2
>>> with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
...     y = a * b
Packing tensor([1., 1., 1., 1., 1.], requires_grad=True)
Packing tensor([2., 2., 2., 2., 2.], grad_fn=<MulBackward0>)
>>> y.sum().backward()
Unpacking tensor([1., 1., 1., 1., 1.], requires_grad=True)
Unpacking tensor([2., 2., 2., 2., 2.], grad_fn=<MulBackward0>)

Warning

Performing an inplace operation on the input to either hook may lead to undefined behavior.

Warning

Only one pair of hooks is allowed at a time. When recursively nesting this context-manager, only the inner-most pair of hooks will be applied.

Warning

To avoid a reference cycle, the return value of pack_hook cannot hold a reference to the input tensor. For example, use lambda x: x.detach() instead of lambda x: x as the pack hook.
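The "save to CPU instead of leaving them on the GPU" application mentioned for saved tensor hooks could look roughly like the sketch below. The hook names pack_to_cpu and unpack_from_cpu are hypothetical, and the example runs on CPU tensors so that it works without a GPU; on a CUDA device the same hooks would move saved tensors off and back onto the device.

```python
import torch
from torch.autograd.graph import saved_tensors_hooks


def pack_to_cpu(tensor):
    # Remember the original device and store a detached CPU copy in the graph.
    # Detaching avoids holding a reference to the input tensor itself.
    return tensor.device, tensor.detach().to("cpu")


def unpack_from_cpu(packed):
    device, cpu_tensor = packed
    # Restore the tensor on the device it originally lived on.
    return cpu_tensor.to(device)


a = torch.randn(5, requires_grad=True)
b = torch.randn(5, requires_grad=True)

with saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    y = a * b  # mul saves a and b for backward; both go through pack_to_cpu

y.sum().backward()  # unpack_from_cpu is called when the saved tensors are needed
print(a.grad, b.grad)
```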

class torch.autograd.graph.save_on_cpu(pin_memory=False, device_type='cuda')#

Context manager under which tensors saved by the forward pass will be stored on cpu, then retrieved for backward.

When performing operations within this context manager, intermediary results saved in the graph during the forward pass will be moved to CPU, then copied back to the original device when needed for the backward pass. If the graph was already on CPU, no tensor copy is performed.

Use this context-manager to trade compute for GPU memory usage (e.g. when your model doesn’t fit in GPU memory during training).

Parameters

pin_memory (bool) – If True tensors will be saved to CPU pinned memory during packing and copied to GPU asynchronously during unpacking. Defaults to False. Also see Use pinned memory buffers.

Example:

>>> a = torch.randn(5, requires_grad=True, device="cuda")
>>> b = torch.randn(5, requires_grad=True, device="cuda")
>>> c = torch.randn(5, requires_grad=True, device="cuda")
>>>
>>> def f(a, b, c):
...     prod_1 = a * b  # a and b are saved on GPU
...     with torch.autograd.graph.save_on_cpu():
...         prod_2 = prod_1 * c  # prod_1 and c are saved on CPU
...     y = prod_2 * a  # prod_2 and a are saved on GPU
...     return y
>>>
>>> y = f(a, b, c)
>>> del a, b, c  # for illustration only
>>> # the content of a, b, and prod_2 are still alive on GPU
>>> # the content of prod_1 and c only live on CPU
>>> y.sum().backward()  # all CPU tensors are moved back to GPU, for backward
>>> # all intermediary tensors are released (deleted) after the call to backward

class torch.autograd.graph.disable_saved_tensors_hooks(error_message)#

Context-manager that disables the saved tensors default hooks feature.

Useful if you are creating a feature that does not work with saved tensors default hooks.

Parameters

error_message (str) – When saved tensors default hooks are used while they are disabled, a RuntimeError with this error message gets raised.

Return type

Generator[None, None, None]

Example:

>>> message = "saved tensors default hooks are disabled"
>>> with torch.autograd.graph.disable_saved_tensors_hooks(message):
...     # Raises RuntimeError: saved tensors default hooks are disabled
...     with torch.autograd.graph.save_on_cpu():
...         pass

class torch.autograd.graph.register_multi_grad_hook(tensors, fn, *, mode='all')#

Register a multi-grad backward hook.

There are two supported modes: "all" and "any".

Under the "all" mode, the hook will be called after gradients with respect to every tensor in tensors have been computed. If a tensor is in tensors but is not part of the graph, or if a tensor is not needed to compute the gradients for any inputs specified for the current .backward() or .grad() call, this tensor will be ignored and the hook will not wait for its gradient to be computed.

After every non-ignored tensor’s gradient has been computed, fn will be called with those gradients. None will be passed for tensors that did not have their gradients computed.

Under the "any" mode, the hook will be called after the first gradient with respect to a tensor in tensors has been computed. The hook will be called with that gradient as its argument.

The hook should not modify its arguments.

This function returns a handle with a method handle.remove() that removes the hook.

Note

See Backward Hooks execution for more information on when this hook is executed, and how its execution is ordered relative to other hooks.

Example:

>>> import torch
>>>
>>> a = torch.rand(2, 3, requires_grad=True)
>>> b = torch.rand(2, 3, requires_grad=True)
>>> c = a * b
>>> d = a * b
>>>
>>> def fn(grads):
...     print([g is not None for g in grads])
...
>>> torch.autograd.graph.register_multi_grad_hook((a, b, c, d), fn)
>>>
>>> c.sum().backward(retain_graph=True)
[True, True, True, False]
>>> c.sum().backward(inputs=(a,), retain_graph=True)
[True, False, True, False]

Return type

RemovableHandle

class torch.autograd.graph.allow_mutation_on_saved_tensors#

Context manager under which mutating tensors saved for backward is allowed.

Under this context manager, tensors saved for backward are cloned on mutation, so the original version can still be used during backward. Normally, mutating a tensor saved for backward will result in an error raised when it’s used during backward.

To ensure the correct behavior, both the forward and backward should be run under the same context manager.

Returns

An _AllowMutationOnSavedContext object storing the state managed by this context manager. This object can be useful for debugging purposes. The state managed by the context manager is automatically cleared upon exiting.

Return type

Generator[_AllowMutationOnSavedContext, None, None]

Example:

>>> import torch
>>> with torch.autograd.graph.allow_mutation_on_saved_tensors():
...     # forward
...     a = torch.ones(2, 3, requires_grad=True)
...     b = a.clone()
...     out = (b**2).sum()
...     b.sin_()
...     # backward
...     out.sum().backward()
...
tensor([[0.8415, 0.8415, 0.8415],
        [0.8415, 0.8415, 0.8415]], grad_fn=<SinBackward0>)

class torch.autograd.graph.GradientEdge(node, output_nr, ownership_token=None)#

Object representing a given gradient edge within the autograd graph.

To get the gradient edge where a given Tensor gradient will be computed, you can do edge = autograd.graph.get_gradient_edge(tensor).

torch.autograd.graph.get_gradient_edge(tensor)#

Get the gradient edge for computing the gradient of the given Tensor.

In particular, calling g = autograd.grad(loss, input) is equivalent to calling g = autograd.grad(loss, get_gradient_edge(input)).

Return type

GradientEdge