torch.jit.optimize_for_inference#
- torch.jit.optimize_for_inference(mod, other_methods=None)[source]#
Perform a set of optimization passes to optimize a model for the purposes of inference.
If the model is not already frozen, `optimize_for_inference` will invoke `torch.jit.freeze` automatically.
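Because freezing happens automatically, an already-frozen module can also be passed directly. A minimal sketch (the `Linear` module and input shapes here are illustrative, not part of the API):

```python
import torch

# A tiny scripted module; freezing requires eval() mode.
mod = torch.jit.script(torch.nn.Linear(4, 4).eval())

# Explicitly freeze first, then optimize. Skipping the freeze step and
# passing `mod` directly would have the same effect, since
# optimize_for_inference calls torch.jit.freeze for unfrozen input.
frozen = torch.jit.freeze(mod)
optimized = torch.jit.optimize_for_inference(frozen)

x = torch.randn(2, 4)
# The optimized module computes the same result as the original.
assert torch.allclose(mod(x), optimized(x))
```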
In addition to generic optimizations that should speed up your model regardless of environment, `optimize_for_inference` will also bake in build-specific settings such as the presence of CUDNN or MKLDNN, and may in the future make transformations which speed things up on one machine but slow things down on another. Accordingly, serialization is not implemented following invoking `optimize_for_inference` and is not guaranteed.
This is still in prototype, and may have the potential to slow down your model. The primary use cases targeted so far have been vision models on CPU and, to a lesser extent, GPU.
Example (optimizing a module with Conv->Batchnorm):
```python
import torch

in_channels, out_channels = 3, 32
conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, bias=True)
bn = torch.nn.BatchNorm2d(out_channels, eps=0.001)
mod = torch.nn.Sequential(conv, bn)

frozen_mod = torch.jit.optimize_for_inference(torch.jit.script(mod.eval()))
assert "batch_norm" not in str(frozen_mod.graph)
# if built with MKLDNN, convolution will be run with MKLDNN weights
assert "MKLDNN" in frozen_mod.graph
```
- Return type: ScriptModule
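The `other_methods` parameter lets you optimize exported methods besides `forward`. A hedged sketch (the module `M` and its `double` method are made up for illustration):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

    @torch.jit.export
    def double(self, x):
        return x * 2

scripted = torch.jit.script(M().eval())

# Freeze and optimize the exported "double" method in addition to
# "forward" by naming it in other_methods.
opt = torch.jit.optimize_for_inference(scripted, other_methods=["double"])

assert torch.equal(opt.double(torch.ones(2)), torch.full((2,), 2.0))
```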