
Inference Mode

c10::InferenceMode is a new RAII guard analogous to NoGradMode, to be used when you are certain your operations will have no interactions with autograd (e.g., model inference). Compared to NoGradMode, code run under this mode gets better performance by disabling autograd-related work such as view tracking and version counter bumps. However, tensors created inside c10::InferenceMode also have more limitations when interacting with the autograd system.

InferenceMode can be enabled for a given block of code. Inside InferenceMode, all newly allocated (non-view) tensors are marked as inference tensors. Inference tensors:

  • do not have a version counter, so an error will be raised if you try to read their version (e.g., because you saved this tensor for backward).

  • are immutable outside InferenceMode. So an error will be raised if you try to:

    • mutate their data outside InferenceMode.

    • mutate them into requires_grad=True outside InferenceMode.

    To work around this, you can make a clone outside InferenceMode to get a normal tensor before mutating.

A non-view tensor is an inference tensor if and only if it was allocated inside InferenceMode. A view tensor is an inference tensor if and only if it is a view of an inference tensor.

Inside an InferenceMode block, we make the following performance guarantees:

  • Like NoGradMode, all operations do not record grad_fn even if their inputs have requires_grad=True. This applies to both inference tensors and normal tensors.

  • View operations on inference tensors do not do view tracking. View and non-view inference tensors are indistinguishable.

  • Inplace operations on inference tensors are guaranteed not to do a version bump.

For more implementation details of InferenceMode, please see the RFC-0011-InferenceMode.

Migration guide from AutoNonVariableTypeMode

In production uses of PyTorch for inference workloads, we have seen a proliferation of uses of the C++ guard AutoNonVariableTypeMode (now AutoDispatchBelowADInplaceOrView), which disables autograd, view tracking and version counter bumps. Unfortunately, this colloquial use of the guard for inference workloads is unsafe: it's possible to use AutoNonVariableTypeMode to bypass PyTorch's safety checks and produce silently wrong results. For example, PyTorch throws an error when tensors saved for backward are subsequently mutated, but a mutation that happens inside AutoNonVariableTypeMode silently bypasses the check and returns wrong gradients to users.

When current users of AutoNonVariableTypeMode think about migrating, the following steps might help you decide the best alternative:

  1. Users trying to run a workload in inference-only mode (like loading a pretrained JIT model and running inference in a C++ runtime) should add a c10::InferenceMode guard to guard all operations on tensors (including model loading). See an inference workload example below:

```cpp
c10::InferenceMode guard;
model.load_jit(saved_model);
auto inputs = preprocess_tensors(data);
auto out = model.forward(inputs);
auto outputs = postprocess_tensors(out);
```

Note c10::InferenceMode offers a drop-in replacement for AutoNonVariableTypeMode which preserves its performance characteristics. But they also have some differences that users should pay additional attention to:

  • Both guards affect the tensor execution process to skip work not related to inference, but InferenceMode also affects tensor creation while AutoNonVariableTypeMode doesn't. In other words, tensors created inside InferenceMode are marked as inference tensors so that certain limitations can be applied after exiting InferenceMode.

  • Enabled/disabled InferenceMode states can be nested while AutoNonVariableTypeMode only allows the enabled state.

```cpp
{
  InferenceMode guard(true);
  // InferenceMode is on
  {
    InferenceMode guard(false);
    // InferenceMode is off
  }
  // InferenceMode is on
}
// InferenceMode is off
```
  2. Users trying to implement a customized kernel who want to redispatch under Autograd dispatch keys should use AutoDispatchBelowADInplaceOrView instead. Note AutoDispatchBelowADInplaceOrView is just a new name for AutoNonVariableTypeMode, since it explains the guard's functionality better. We're deprecating AutoNonVariableTypeMode and it'll be removed in the 1.10 release. See the customized kernel ROIAlignFunction in pytorch/vision for an example:

```cpp
class ROIAlignFunction
    : public torch::autograd::Function<ROIAlignFunction> {
 public:
  static torch::autograd::variable_list forward(
      torch::autograd::AutogradContext* ctx,
      const torch::autograd::Variable& input,
      const torch::autograd::Variable& rois,
      double spatial_scale,
      int64_t pooled_height,
      int64_t pooled_width,
      int64_t sampling_ratio,
      bool aligned) {
    ctx->saved_data["spatial_scale"] = spatial_scale;
    ctx->saved_data["pooled_height"] = pooled_height;
    ctx->saved_data["pooled_width"] = pooled_width;
    ctx->saved_data["sampling_ratio"] = sampling_ratio;
    ctx->saved_data["aligned"] = aligned;
    ctx->saved_data["input_shape"] = input.sizes();
    ctx->save_for_backward({rois});
    // Used to be at::AutoNonVariableTypeMode g;
    at::AutoDispatchBelowADInplaceOrView guard;
    auto result = roi_align(
        input, rois, spatial_scale, pooled_height,
        pooled_width, sampling_ratio, aligned);
    return {result};
  }
  // backward() omitted here for brevity
};
```

Customized in-place and view kernels need some special handling in addition to the guard above; see the custom kernel tutorial for more details.