
InplaceFunction

class torch.autograd.function.InplaceFunction(inplace=False)

This class is here only for backward compatibility reasons. Use Function instead of this for any new use case.

static backward(ctx, *grad_outputs)

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)

It must accept a context ctx as the first argument, followed by as many outputs as forward() returned (None will be passed in for non-tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad, a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
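The contract described above can be illustrated with a minimal sketch. The class name ScaledMul and its scale argument are hypothetical and chosen only for illustration; the point is that backward() returns one value per forward() input, uses None for the non-tensor input, and may consult ctx.needs_input_grad to skip work.

```python
import torch
from torch.autograd import Function


class ScaledMul(Function):
    """Hypothetical example: out = x * y * scale, where scale is a Python float."""

    @staticmethod
    def forward(ctx, x, y, scale):
        ctx.save_for_backward(x, y)  # tensors go through save_for_backward
        ctx.scale = scale            # non-tensor data can be stored on ctx
        return x * y * scale

    @staticmethod
    def backward(ctx, grad_out):
        x, y = ctx.saved_tensors
        grad_x = grad_y = None
        # Only compute the gradients that autograd actually asked for.
        if ctx.needs_input_grad[0]:
            grad_x = grad_out * y * ctx.scale
        if ctx.needs_input_grad[1]:
            grad_y = grad_out * x * ctx.scale
        # One return value per forward() input; None for the non-tensor scale.
        return grad_x, grad_y, None


x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0)  # does not require grad
out = ScaledMul.apply(x, y, 2.0)
out.backward()
```

Here only x requires a gradient, so the branch for y is skipped and y.grad stays unset.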

Return type

Any

static forward(*args, **kwargs)

Define the forward of the custom autograd Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx):

    @staticmethod
    def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
        pass

Usage 2 (Separate forward and ctx):

    @staticmethod
    def forward(*args: Any, **kwargs: Any) -> Any:
        pass

    @staticmethod
    def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
        pass

  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.

  • See Extending torch.autograd for more details.

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

Return type

Any

static jvp(ctx, *grad_inputs)

Define a formula for differentiating the operation with forward mode automatic differentiation.

This function is to be overridden by all subclasses. It must accept a context ctx as the first argument, followed by as many inputs as forward() got (None will be passed in for non-tensor inputs of the forward function), and it should return as many tensors as there were outputs to forward(). Each argument is the gradient w.r.t. the given input, and each returned value should be the gradient w.r.t. the corresponding output. If an output is not a Tensor or the function is not differentiable with respect to that output, you can just pass None as a gradient for that output.

You can use the ctx object to pass any value from the forward to this function.
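As a rough sketch of the jvp() contract, the hypothetical Square function below saves its input with save_for_forward() and returns one tangent for its single output. The class name and the input values are assumptions made for illustration; the forward-mode helpers come from torch.autograd.forward_ad.

```python
import torch
import torch.autograd.forward_ad as fwAD
from torch.autograd import Function


class Square(Function):
    """Hypothetical example: out = x * x, with a forward-mode rule."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_forward(x)  # make x available inside jvp()
        return x * x

    @staticmethod
    def jvp(ctx, x_t):
        # x_t is the tangent of the single forward() input.
        (x,) = ctx.saved_tensors
        return 2 * x * x_t  # tangent of the single output


# requires_grad=True mirrors the save_for_forward() example on this page.
primal = torch.tensor(3.0, requires_grad=True)
tangent = torch.tensor(1.0)

with fwAD.dual_level():
    dual_in = fwAD.make_dual(primal, tangent)
    dual_out = Square.apply(dual_in)
    out, out_tangent = fwAD.unpack_dual(dual_out)
    out_value = out.item()
    tangent_value = out_tangent.item()
```

The values are read inside the dual_level() block because tangents are only associated with their tensors while that level is active.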

Return type

Any

mark_dirty(*args)

Mark given tensors as modified in an in-place operation.

This should be called at most once, in either the setup_context() or forward() methods, and all arguments should be inputs.

Every tensor that’s been modified in-place in a call to forward() should be given to this function, to ensure correctness of our checks. It doesn’t matter whether the function is called before or after modification.

Examples:

>>> import torch
>>> from torch.autograd import Function
>>> from torch.autograd.function import once_differentiable
>>>
>>> class Inplace(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         x_npy = x.numpy()  # x_npy shares storage with x
>>>         x_npy += 1
>>>         ctx.mark_dirty(x)
>>>         return x
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, grad_output):
>>>         return grad_output
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double).clone()
>>> b = a * a
>>> Inplace.apply(a)  # This would lead to wrong gradients!
>>>                   # but the engine would not know unless we mark_dirty
>>> b.backward()  # RuntimeError: one of the variables needed for gradient
>>>               # computation has been modified by an inplace operation

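The example above shows the error that mark_dirty() makes detectable. For contrast, here is a minimal sketch of the non-error path, in which gradients flow through the in-place result. The class name AddOneInplace is hypothetical; the input is cloned first because in-place modification of a leaf tensor that requires grad is not allowed.

```python
import torch
from torch.autograd import Function


class AddOneInplace(Function):
    """Hypothetical example: adds 1 to its input in place."""

    @staticmethod
    def forward(ctx, x):
        x.add_(1)          # modify the input in place
        ctx.mark_dirty(x)  # tell autograd that x was modified
        return x

    @staticmethod
    def backward(ctx, grad_out):
        # d(x + 1)/dx = 1, so the gradient passes through unchanged.
        return grad_out


leaf = torch.tensor([1.0, 2.0], requires_grad=True)
a = leaf.clone()  # non-leaf copy that may be modified in place
out = AddOneInplace.apply(a)
out.sum().backward()
```

After the call, out holds the incremented values and leaf.grad holds the gradient propagated back through both the custom function and the clone.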
mark_non_differentiable(*args)

Mark outputs as non-differentiable.

This should be called at most once, in either the setup_context() or forward() methods, and all arguments should be tensor outputs.

This will mark outputs as not requiring gradients, increasing the efficiency of backward computation. You still need to accept a gradient for each output in backward(), but it’s always going to be a zero tensor with the same shape as the corresponding output.

This is used e.g. for indices returned from a sort. See example:

>>> import torch
>>> from torch.autograd import Function
>>> from torch.autograd.function import once_differentiable
>>>
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         sorted, idx = x.sort()
>>>         ctx.mark_non_differentiable(idx)
>>>         ctx.save_for_backward(x, idx)
>>>         return sorted, idx
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):  # still need to accept g2
>>>         x, idx = ctx.saved_tensors
>>>         grad_input = torch.zeros_like(x)
>>>         grad_input.index_add_(0, idx, g1)
>>>         return grad_input

save_for_backward(*tensors)

Save given tensors for a future call to backward().

save_for_backward should be called at most once, in either the setup_context() or forward() methods, and only with tensors.

All tensors intended to be used in the backward pass should be saved with save_for_backward (as opposed to directly on ctx) to prevent incorrect gradients and memory leaks, and enable the application of saved tensor hooks. See torch.autograd.graph.saved_tensors_hooks. See Extending torch.autograd for more details.

Note that if intermediary tensors, tensors that are neither inputs nor outputs of forward(), are saved for backward, your custom Function may not support double backward. Custom Functions that do not support double backward should decorate their backward() method with @once_differentiable so that performing double backward raises an error. If you’d like to support double backward, you can either recompute intermediaries based on the inputs during backward or return the intermediaries as the outputs of the custom Function. See the double backward tutorial for more details.

In backward(), saved tensors can be accessed through the saved_tensors attribute. Before returning them to the user, a check is made to ensure they weren’t used in any in-place operation that modified their content.

Arguments can also be None. This is a no-op.

See Extending torch.autograd for more details on how to use this method.

Example:

>>> import torch
>>> from torch.autograd import Function
>>> from torch.autograd.function import once_differentiable
>>>
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x: torch.Tensor, y: torch.Tensor, z: int):
>>>         w = x * z
>>>         out = x * y + y * z + w * y
>>>         ctx.save_for_backward(x, y, w, out)
>>>         ctx.z = z  # z is not a tensor
>>>         return out
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, grad_out):
>>>         x, y, w, out = ctx.saved_tensors
>>>         z = ctx.z
>>>         gx = grad_out * (y + y * z)
>>>         gy = grad_out * (x + z + w)
>>>         gz = None
>>>         return gx, gy, gz
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double)
>>> b = torch.tensor(2., requires_grad=True, dtype=torch.double)
>>> c = 4
>>> d = Func.apply(a, b, c)

save_for_forward(*tensors)

Save given tensors for a future call to jvp().

save_for_forward should be called at most once, in either the setup_context() or forward() methods, and all arguments should be tensors.

In jvp(), saved objects can be accessed through the saved_tensors attribute.

Arguments can also be None. This is a no-op.

See Extending torch.autograd for more details on how to use this method.

Example:

>>> import torch
>>> import torch.autograd.forward_ad as fwAD
>>>
>>> class Func(torch.autograd.Function):
>>>     @staticmethod
>>>     def forward(ctx, x: torch.Tensor, y: torch.Tensor, z: int):
>>>         ctx.save_for_backward(x, y)
>>>         ctx.save_for_forward(x, y)
>>>         ctx.z = z
>>>         return x * y * z
>>>
>>>     @staticmethod
>>>     def jvp(ctx, x_t, y_t, _):
>>>         x, y = ctx.saved_tensors
>>>         z = ctx.z
>>>         return z * (y * x_t + x * y_t)
>>>
>>>     @staticmethod
>>>     def vjp(ctx, grad_out):
>>>         x, y = ctx.saved_tensors
>>>         z = ctx.z
>>>         return z * grad_out * y, z * grad_out * x, None
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double)
>>> t = torch.tensor(1., dtype=torch.double)
>>> b = torch.tensor(2., requires_grad=True, dtype=torch.double)
>>> c = 4
>>>
>>> with fwAD.dual_level():
>>>     a_dual = fwAD.make_dual(a, t)
>>>     d = Func.apply(a_dual, b, c)

set_materialize_grads(value)

Set whether to materialize grad tensors. Default is True.

This should be called only from either the setup_context() or forward() methods.

If True, undefined grad tensors will be expanded to tensors full of zeros prior to calling the backward() and jvp() methods.

Example:

>>> import torch
>>> from torch.autograd import Function
>>> from torch.autograd.function import once_differentiable
>>>
>>> class SimpleFunc(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         return x.clone(), x.clone()
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):
>>>         return g1 + g2  # No check for None necessary
>>>
>>> # We modify SimpleFunc to handle non-materialized grad outputs
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         ctx.set_materialize_grads(False)
>>>         ctx.save_for_backward(x)
>>>         return x.clone(), x.clone()
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):
>>>         x, = ctx.saved_tensors
>>>         grad_input = torch.zeros_like(x)
>>>         if g1 is not None:  # We must check for None now
>>>             grad_input += g1
>>>         if g2 is not None:
>>>             grad_input += g2
>>>         return grad_input
>>>
>>> a = torch.tensor(1., requires_grad=True)
>>> b, _ = Func.apply(a)  # induces g2 to be undefined

static setup_context(ctx, inputs, output)

There are two ways to define the forward pass of an autograd.Function.

Either:

  1. Override forward with the signature forward(ctx, *args, **kwargs). setup_context is not overridden. Setting up the ctx for backward happens inside the forward.

  2. Override forward with the signature forward(*args, **kwargs) and override setup_context. Setting up the ctx for backward happens inside setup_context (as opposed to inside the forward).

See torch.autograd.Function.forward() and Extending torch.autograd for more details.
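The two styles can be compared side by side with a minimal sketch. The class names PowCombined and PowSeparate are hypothetical; both compute x ** n for a Python integer n and are expected to produce the same gradient. The second style assumes a PyTorch version that supports setup_context().

```python
import torch
from torch.autograd import Function


class PowCombined(Function):
    """Style 1 (hypothetical example): ctx is set up inside forward."""

    @staticmethod
    def forward(ctx, x, n):
        ctx.save_for_backward(x)
        ctx.n = n  # non-tensor input stored on ctx
        return x ** n

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * ctx.n * x ** (ctx.n - 1), None


class PowSeparate(Function):
    """Style 2 (hypothetical example): ctx is set up in setup_context."""

    @staticmethod
    def forward(x, n):  # no ctx argument in this style
        return x ** n

    @staticmethod
    def setup_context(ctx, inputs, output):
        x, n = inputs  # the same arguments that forward received
        ctx.save_for_backward(x)
        ctx.n = n

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * ctx.n * x ** (ctx.n - 1), None


x1 = torch.tensor(2.0, requires_grad=True)
x2 = torch.tensor(2.0, requires_grad=True)
PowCombined.apply(x1, 3).backward()
PowSeparate.apply(x2, 3).backward()
```

The backward() methods are identical in both classes; only the place where ctx is populated differs.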

Return type

Any

static vjp(ctx, *grad_outputs)

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)

It must accept a context ctx as the first argument, followed by as many outputs as forward() returned (None will be passed in for non-tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad, a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

Return type

Any

static vmap(info, in_dims, *args)

Define the behavior for this autograd.Function underneath torch.vmap().

For a torch.autograd.Function() to support torch.vmap(), you must either override this static method, or set generate_vmap_rule to True (you may not do both).

If you choose to override this staticmethod: it must accept

  • an info object as the first argument. info.batch_size specifies the size of the dimension being vmapped over, while info.randomness is the randomness option passed to torch.vmap().

  • an in_dims tuple as the second argument. For each arg in args, in_dims has a corresponding Optional[int]. It is None if the arg is not a Tensor or if the arg is not being vmapped over, otherwise, it is an integer specifying what dimension of the Tensor is being vmapped over.

  • *args, which is the same as the args to forward().

The return of the vmap staticmethod is a tuple of (output, out_dims). Similar to in_dims, out_dims should be of the same structure as output and contain one out_dim per output that specifies if the output has the vmapped dimension and what index it is in.

Please see Extending torch.func with autograd.Function for more details.
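A minimal sketch of a hand-written vmap rule follows. The class name Double is hypothetical, and the example assumes a PyTorch version that provides torch.vmap() and the separate forward/setup_context style. Because the operation is elementwise, the batched dimension stays where it was, so the rule returns the incoming in_dim as the out_dim.

```python
import torch
from torch.autograd import Function


class Double(Function):
    """Hypothetical example: elementwise out = 2 * x with a custom vmap rule."""

    @staticmethod
    def forward(x):
        return x * 2

    @staticmethod
    def setup_context(ctx, inputs, output):
        pass  # nothing needs to be saved for backward

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * 2

    @staticmethod
    def vmap(info, in_dims, x):
        (x_bdim,) = in_dims  # one entry per argument of forward()
        # x still carries its batch dimension at index x_bdim. The operation
        # is elementwise, so the output has the batch dimension at the same index.
        return Double.apply(x), x_bdim


batched = torch.arange(6.0).reshape(2, 3)
result = torch.vmap(Double.apply)(batched)
```

For operations that are not elementwise, the rule would typically move the batch dimension to a known position first and report that position in out_dims.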