Patching Batch Norm#

Created On: Jan 03, 2023 | Last Updated On: Jun 11, 2025

What’s happening?#

Batch Norm requires in-place updates to running_mean and running_var of the same size as the input. functorch does not support an in-place update to a regular tensor that takes in a batched tensor (i.e. regular.add_(batched) is not allowed). So when vmapping over a batch of inputs to a single module, we end up with this error.
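
For illustration, here is a minimal sketch that reproduces the failure (the module and shapes are made up): in training mode, BatchNorm tries to update its running stats in place with values computed from the batched input, which vmap rejects.

import torch
from torch import nn
from torch.func import vmap

net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
x = torch.randn(4, 16, 3, 32, 32)  # vmap over the leading dim of size 4

vmap(net)(x)  # errors: in-place update of running_mean/running_var with a batched tensor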

How to fix#

One of the best supported ways is to switch BatchNorm for GroupNorm. Options 1 and 2 support this.

All of these options assume that you don't need running stats. If you're using a module, this means that it's assumed you won't use batch norm in evaluation mode. If you have a use case that involves running batch norm with vmap in evaluation mode, please file an issue.

Option 1: Change the BatchNorm#

If you want to switch to GroupNorm, anywhere that you have BatchNorm, replace it with:

GroupNorm(G, C)

Here C is the same C as in the original BatchNorm. G is the number of groups to break C into. As such, C % G == 0 and, as a fallback, you can set C == G, meaning each channel will be treated separately.
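
As a concrete sketch (the channel count here is made up):

from torch import nn

C = 64  # channel count from the original BatchNorm2d(C)
G = 8   # number of groups; C % G == 0 must hold
layer = nn.GroupNorm(G, C)     # replaces nn.BatchNorm2d(C)
fallback = nn.GroupNorm(C, C)  # G == C: each channel is normalized separately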

If you must use BatchNorm and you've built the module yourself, you can change the module to not use running stats. In other words, anywhere that there's a BatchNorm module, set the track_running_stats flag to be False:

BatchNorm2d(64, track_running_stats=False)
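
With running stats off, vmap no longer has an in-place update to reject. A minimal sketch (module and shapes made up):

import torch
from torch import nn
from torch.func import vmap

net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8, track_running_stats=False))
x = torch.randn(4, 16, 3, 32, 32)
out = vmap(net)(x)  # works: batch stats are computed on the fly, no buffers are updated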

Option 2: torchvision parameter#

Some torchvision models, like resnet and regnet, can take in a norm_layer parameter. This parameter often defaults to BatchNorm2d.

Instead, you can set it to be GroupNorm:

import torchvision
from torch.nn import GroupNorm

g = 32  # example value; c % g == 0 must hold for every channel count c the model passes in
torchvision.models.resnet18(norm_layer=lambda c: GroupNorm(num_groups=g, num_channels=c))

Here, once again, c % g == 0, so as a fallback, set g = c.
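
Spelled out, that fallback looks like:

import torchvision
from torch.nn import GroupNorm

# g == c: every channel becomes its own group
torchvision.models.resnet18(norm_layer=lambda c: GroupNorm(num_groups=c, num_channels=c))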

If you are attached to BatchNorm, be sure to use a version that doesn't use running stats:

import torchvision
from functools import partial
from torch.nn import BatchNorm2d

torchvision.models.resnet18(norm_layer=partial(BatchNorm2d, track_running_stats=False))

Option 3: functorch’s patching#

functorch has added some functionality to allow for quick, in-place patching of the module to not use running stats. Changing the norm layer is more fragile, so we have not offered that. If you have a net where you want the BatchNorm to not use running stats, you can run replace_all_batch_norm_modules_ to update the module in place:

from torch.func import replace_all_batch_norm_modules_

replace_all_batch_norm_modules_(net)
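
An end-to-end sketch under these assumptions (resnet18 is just an example; any net containing BatchNorm modules works the same way):

import torch
import torchvision
from torch.func import replace_all_batch_norm_modules_, vmap

net = torchvision.models.resnet18()
replace_all_batch_norm_modules_(net)  # in place: BatchNorm modules stop tracking running stats
x = torch.randn(2, 8, 3, 64, 64)      # vmap over a leading dim of 2 mini-batches
out = vmap(net)(x)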

Option 4: eval mode#

When run under eval mode, the running_mean and running_var will not be updated. Therefore, vmap can support this mode:

from torch.func import vmap

model.eval()
vmap(model)(x)
model.train()