# Inference Mode
c10::InferenceMode is a new RAII guard analogous to NoGradMode, to be used when you are certain your operations will have no interactions with autograd (e.g., model inference). Compared to NoGradMode, code run under this mode gets better performance by disabling autograd-related work such as view tracking and version counter bumps. However, tensors created inside c10::InferenceMode also have more limitations when interacting with the autograd system.
InferenceMode can be enabled for a given block of code. Inside InferenceMode, all newly allocated (non-view) tensors are marked as inference tensors. Inference tensors:
- do not have a version counter, so an error will be raised if you try to read their version (e.g., because you saved this tensor for backward).
- are immutable outside InferenceMode. An error will be raised if you try to:
  - mutate their data outside InferenceMode, or
  - set requires_grad=True on them outside InferenceMode.

  To work around this, you can make a clone outside InferenceMode to get a normal tensor before mutating.
A non-view tensor is an inference tensor if and only if it was allocated inside InferenceMode. A view tensor is an inference tensor if and only if it is a view of an inference tensor.
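The following is a minimal sketch of these rules, assuming the libtorch C++ frontend (the tensor shapes and values are illustrative only):

```cpp
#include <torch/torch.h>

int main() {
  torch::Tensor t;
  {
    c10::InferenceMode guard;
    t = torch::ones({2, 2});  // allocated inside the guard: an inference tensor
    t.add_(1);                // mutating inference tensors inside InferenceMode is fine
  }
  // t.add_(1);  // would throw here: inference tensors are immutable outside InferenceMode
  auto n = t.clone();  // clone outside InferenceMode yields a normal tensor
  n.add_(1);           // OK: n behaves like any other tensor
  return 0;
}
```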
Inside an InferenceMode block, we make the following performance guarantees:
- Like NoGradMode, all operations do not record grad_fn even if their inputs have requires_grad=True. This applies to both inference tensors and normal tensors.
- View operations on inference tensors do not do view tracking. View and non-view inference tensors are indistinguishable.
- Inplace operations on inference tensors are guaranteed not to do a version bump.
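A minimal sketch of the first guarantee, again assuming the libtorch C++ frontend:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::ones({2, 2}, torch::requires_grad());
  {
    c10::InferenceMode guard;
    auto y = x * 2;  // no grad_fn is recorded, even though x requires grad
    std::cout << std::boolalpha << y.requires_grad() << "\n";  // false
  }
  auto z = x * 2;  // outside the guard, autograd records grad_fn as usual
  std::cout << std::boolalpha << z.requires_grad() << "\n";    // true
  return 0;
}
```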
For more implementation details of InferenceMode, please see RFC-0011-InferenceMode.
## Migration guide from AutoNonVariableTypeMode
In production uses of PyTorch for inference workloads, we have seen a proliferation of uses of the C++ guard AutoNonVariableTypeMode (now AutoDispatchBelowADInplaceOrView), which disables autograd, view tracking and version counter bumps. Unfortunately, the common colloquial use of this guard for inference workloads is unsafe: it's possible to use AutoNonVariableTypeMode to bypass PyTorch's safety checks and produce silently wrong results. For example, PyTorch normally throws an error when tensors saved for backward are subsequently mutated, but a mutation that happens inside AutoNonVariableTypeMode silently bypasses the check and returns wrong gradients to users.
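For illustration, a hypothetical sketch of that unsafe pattern (using the guard's new name; exp() saves its output for the backward pass, so mutating it in place corrupts the gradient):

```cpp
#include <torch/torch.h>

int main() {
  auto x = torch::ones({2, 2}, torch::requires_grad());
  auto y = x.exp();  // exp() saves its output tensor for the backward pass
  {
    // Formerly at::AutoNonVariableTypeMode; skips autograd and version bumps.
    at::AutoDispatchBelowADInplaceOrView guard;
    y.mul_(2);  // no version bump, so the saved-tensor check is bypassed
  }
  // Outside the guard this mutation would make backward() throw; here it
  // silently produces wrong gradients instead.
  y.sum().backward();
  return 0;
}
```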
When current users of AutoNonVariableTypeMode think about migrating, the following steps might help you decide the best alternative:
Users trying to run workloads in inference-only mode (such as loading a pretrained JIT model and running inference in a C++ runtime) should add a c10::InferenceMode guard to guard all operations on tensors (including model loading). See the inference workload example below:
```cpp
c10::InferenceMode guard;
model.load_jit(saved_model);
auto inputs = preprocess_tensors(data);
auto out = model.forward(inputs);
auto outputs = postprocess_tensors(out);
```
Note that c10::InferenceMode offers a drop-in replacement for AutoNonVariableTypeMode which preserves its performance characteristics. But they also have some differences that users should pay extra attention to:
- Both guards affect tensor execution to skip work not related to inference, but InferenceMode also affects tensor creation while AutoNonVariableTypeMode doesn't. In other words, tensors created inside InferenceMode are marked as inference tensors so that certain limitations can be applied after exiting InferenceMode.
- Enabled/disabled InferenceMode states can be nested while AutoNonVariableTypeMode only allows the enabled state.
```cpp
{
  InferenceMode guard(true);
  // InferenceMode is on
  {
    InferenceMode guard(false);
    // InferenceMode is off
  }
  // InferenceMode is on
}
// InferenceMode is off
```
Users trying to implement a customized kernel who want to redispatch under the Autograd dispatch keys should use AutoDispatchBelowADInplaceOrView instead. Note that AutoDispatchBelowADInplaceOrView is just a new name for AutoNonVariableTypeMode, since it describes the guard's functionality better. We're deprecating AutoNonVariableTypeMode and it will be removed in the 1.10 release. See the customized kernel ROIAlignFunction in pytorch/vision for an example:
```cpp
class ROIAlignFunction : public torch::autograd::Function<ROIAlignFunction> {
 public:
  static torch::autograd::variable_list forward(
      torch::autograd::AutogradContext* ctx,
      const torch::autograd::Variable& input,
      const torch::autograd::Variable& rois,
      double spatial_scale,
      int64_t pooled_height,
      int64_t pooled_width,
      int64_t sampling_ratio,
      bool aligned) {
    ctx->saved_data["spatial_scale"] = spatial_scale;
    ctx->saved_data["pooled_height"] = pooled_height;
    ctx->saved_data["pooled_width"] = pooled_width;
    ctx->saved_data["sampling_ratio"] = sampling_ratio;
    ctx->saved_data["aligned"] = aligned;
    ctx->saved_data["input_shape"] = input.sizes();
    ctx->save_for_backward({rois});
    // Used to be at::AutoNonVariableTypeMode g;
    at::AutoDispatchBelowADInplaceOrView guard;
    auto result = roi_align(
        input, rois, spatial_scale, pooled_height,
        pooled_width, sampling_ratio, aligned);
    return {result};
  }
```
Customized inplace & view kernels need some special handling in addition to the guard above; see the custom kernel tutorial for more details.