Serialization semantics

This note describes how you can save and load PyTorch tensors and module states in Python, and how to serialize Python modules so they can be loaded in C++.

Saving and loading tensors

torch.save() and torch.load() let you easily save and load tensors:

>>> t = torch.tensor([1., 2.])
>>> torch.save(t, 'tensor.pt')
>>> torch.load('tensor.pt')
tensor([1., 2.])

By convention, PyTorch files are typically written with a ‘.pt’ or ‘.pth’ extension.

torch.save() and torch.load() use Python's pickle by default, so you can also save multiple tensors as part of Python objects like tuples, lists, and dicts:

>>> d = {'a': torch.tensor([1., 2.]), 'b': torch.tensor([3., 4.])}
>>> torch.save(d, 'tensor_dict.pt')
>>> torch.load('tensor_dict.pt')
{'a': tensor([1., 2.]), 'b': tensor([3., 4.])}

Custom data structures that include PyTorch tensors can also be saved if the data structure is pickle-able.
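
For example, a user-defined container class holding tensors can be round-tripped the same way (a minimal sketch; the Checkpoint class is hypothetical, and because it is a custom class, loading it back requires weights_only=False or allowlisting it, as discussed later in this note):

>>> class Checkpoint:
...     def __init__(self, weights, step):
...         self.weights = weights   # e.g. a dict of tensors
...         self.step = step
>>> ckpt = Checkpoint({'w': torch.tensor([1., 2.])}, step=10)
>>> torch.save(ckpt, 'checkpoint.pt')
>>> loaded = torch.load('checkpoint.pt', weights_only=False)
>>> loaded.step
10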

Saving and loading tensors preserves views

Saving tensors preserves their view relationships:

>>> numbers = torch.arange(1, 10)
>>> evens = numbers[1::2]
>>> torch.save([numbers, evens], 'tensors.pt')
>>> loaded_numbers, loaded_evens = torch.load('tensors.pt')
>>> loaded_evens *= 2
>>> loaded_numbers
tensor([ 1,  4,  3,  8,  5, 12,  7, 16,  9])

Behind the scenes, these tensors share the same “storage.” See Tensor Views for more on views and storage.

When PyTorch saves tensors it saves their storage objects and tensor metadata separately. This is an implementation detail that may change in the future, but it typically saves space and lets PyTorch easily reconstruct the view relationships between the loaded tensors. In the above snippet, for example, only a single storage is written to ‘tensors.pt’.

In some cases, however, saving the current storage objects may be unnecessary and create prohibitively large files. In the following snippet a storage much larger than the saved tensor is written to a file:

>>> large = torch.arange(1, 1000)
>>> small = large[0:5]
>>> torch.save(small, 'small.pt')
>>> loaded_small = torch.load('small.pt')
>>> loaded_small.storage().size()
999

Instead of saving only the five values in the small tensor to ‘small.pt’, the 999 values in the storage it shares with large were saved and loaded.

When saving tensors with fewer elements than their storage objects, the size of the saved file can be reduced by first cloning the tensors. Cloning a tensor produces a new tensor with a new storage object containing only the values in the tensor:

>>> large = torch.arange(1, 1000)
>>> small = large[0:5]
>>> torch.save(small.clone(), 'small.pt')  # saves a clone of small
>>> loaded_small = torch.load('small.pt')
>>> loaded_small.storage().size()
5

Since the cloned tensors are independent of each other, however, they have none of the view relationships the original tensors did. If both file size and view relationships are important when saving tensors smaller than their storage objects, then care must be taken to construct new tensors that minimize the size of their storage objects but still have the desired view relationships before saving.
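
One way to do this is to clone only the region of storage the views actually need and then re-derive the views from that clone before saving (a minimal sketch, assuming the views can all be expressed over the same small base tensor):

>>> large = torch.arange(1, 1000)
>>> base = large[0:5].clone()     # new 5-element storage holding only what is needed
>>> new_small = base[:]           # view over the small storage
>>> new_evens = base[1::2]        # another view sharing the same 5-element storage
>>> torch.save([new_small, new_evens], 'views.pt')
>>> loaded_small, loaded_evens = torch.load('views.pt')
>>> loaded_small.storage().size()
5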

Saving and loading torch.nn.Modules

See also: Tutorial: Saving and loading modules

In PyTorch, a module’s state is frequently serialized using a ‘state dict.’ A module’s state dict contains all of its parameters and persistent buffers:

>>> bn = torch.nn.BatchNorm1d(3, track_running_stats=True)
>>> list(bn.named_parameters())
[('weight', Parameter containing: tensor([1., 1., 1.], requires_grad=True)),
 ('bias', Parameter containing: tensor([0., 0., 0.], requires_grad=True))]
>>> list(bn.named_buffers())
[('running_mean', tensor([0., 0., 0.])),
 ('running_var', tensor([1., 1., 1.])),
 ('num_batches_tracked', tensor(0))]
>>> bn.state_dict()
OrderedDict([('weight', tensor([1., 1., 1.])),
             ('bias', tensor([0., 0., 0.])),
             ('running_mean', tensor([0., 0., 0.])),
             ('running_var', tensor([1., 1., 1.])),
             ('num_batches_tracked', tensor(0))])

Instead of saving a module directly, for compatibility reasons it is recommended to save only its state dict. Python modules even have a function, load_state_dict(), to restore their states from a state dict:

>>> torch.save(bn.state_dict(), 'bn.pt')
>>> bn_state_dict = torch.load('bn.pt')
>>> new_bn = torch.nn.BatchNorm1d(3, track_running_stats=True)
>>> new_bn.load_state_dict(bn_state_dict)
<All keys matched successfully>

Note that the state dict is first loaded from its file with torch.load() and the state then restored with load_state_dict().

Even custom modules and modules containing other modules have state dicts and can use this pattern:

# A module with two linear layers
>>> class MyModule(torch.nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.l0 = torch.nn.Linear(4, 2)
...         self.l1 = torch.nn.Linear(2, 1)
...
...     def forward(self, input):
...         out0 = self.l0(input)
...         out0_relu = torch.nn.functional.relu(out0)
...         return self.l1(out0_relu)

>>> m = MyModule()
>>> m.state_dict()
OrderedDict([('l0.weight', tensor([[ 0.1400,  0.4563, -0.0271, -0.4406],
                                   [-0.3289,  0.2827,  0.4588,  0.2031]])),
             ('l0.bias', tensor([ 0.0300, -0.1316])),
             ('l1.weight', tensor([[0.6533, 0.3413]])),
             ('l1.bias', tensor([-0.1112]))])
>>> torch.save(m.state_dict(), 'mymodule.pt')
>>> m_state_dict = torch.load('mymodule.pt')
>>> new_m = MyModule()
>>> new_m.load_state_dict(m_state_dict)
<All keys matched successfully>

Serialized file format for torch.save

Since PyTorch 1.6.0, torch.save defaults to writing files as an uncompressed ZIP64 archive unless the user sets _use_new_zipfile_serialization=False.

In this archive, the files are laid out as follows:

checkpoint.pth
├── data.pkl
├── byteorder  # added in PyTorch 2.1.0
├── data/
│   ├── 0
│   ├── 1
│   ├── 2
│   └── …
└── version
The entries are as follows:
  • data.pkl is the result of pickling the object passed to torch.save, excluding the torch.Storage objects that it contains

  • byteorder contains a string with the sys.byteorder when saving (“little” or “big”)

  • data/ contains all the storages in the object, where each storage is a separate file

  • version contains a version number at save time that can be used at load time

When saving, PyTorch pads the local file header of each file so that the file’s data begins at an offset that is a multiple of 64 bytes, i.e. each file in the archive is 64-byte aligned.
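
Because the archive is an ordinary ZIP file, its layout can be inspected with Python's zipfile module (a sketch; 'checkpoint.pth' stands in for any file written by torch.save, and the exact prefix of the entry names depends on the file name used when saving):

>>> import zipfile
>>> with zipfile.ZipFile('checkpoint.pth') as zf:
...     for name in zf.namelist():
...         print(name)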

Note

Tensors on certain devices such as XLA are serialized as pickled numpy arrays. As such, their storages are not serialized. In these cases data/ might not exist in the checkpoint.

torch.load with weights_only=True

Starting in version 2.6, torch.load will use weights_only=True if the pickle_module argument is not passed.

As discussed in the documentation for torch.load(), weights_only=True restricts the unpickler used in torch.load to only executing functions/building classes required for state_dicts of plain torch.Tensors as well as some other primitive types. Further, unlike the default Unpickler provided by the pickle module, the weights_only Unpickler is not allowed to dynamically import anything during unpickling.

As mentioned above, saving a module’s state_dict is a best practice when using torch.save. If loading an old checkpoint that contains an nn.Module, we recommend weights_only=False. When loading a checkpoint that contains tensor subclasses, there will likely be functions/classes that need to be allowlisted; see below for further details.
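
For example, a checkpoint that holds only a plain state_dict loads cleanly under the default (continuing the MyModule example above, with weights_only spelled out for clarity):

>>> torch.save(m.state_dict(), 'mymodule.pt')
>>> m_state_dict = torch.load('mymodule.pt', weights_only=True)
>>> new_m = MyModule()
>>> new_m.load_state_dict(m_state_dict)
<All keys matched successfully>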

If the weights_only Unpickler encounters a function or class in the pickle file that is not allowlisted by default, you should see an actionable error like the following:

_pickle.UnpicklingError: Weights only load failed. This file can still be loaded,
to do so you have two options, do those steps only if you trust the source of the checkpoint.
    1. Re-running `torch.load` with `weights_only` set to `False` will likely succeed,
       but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
    2. Alternatively, to load with `weights_only=True` please check the recommended
       steps in the following error message.
       WeightsUnpickler error: Unsupported global: GLOBAL {__module__}.{__name__} was not an allowed global by
       default. Please use `torch.serialization.add_safe_globals([{__name__}])` or the
       `torch.serialization.safe_globals([{__name__}])` context manager to allowlist this global
       if you trust this class/function.

Please follow the steps in the error message and allowlist the functions or classes only if you trust them.

To get all GLOBALs (functions/classes) in the checkpoint that are not yet allowlisted, you can use torch.serialization.get_unsafe_globals_in_checkpoint(), which will return a list of strings of the form {__module__}.{__name__}. If you trust these functions/classes, you can import them and allowlist them per the error message, either via torch.serialization.add_safe_globals() or the context manager torch.serialization.safe_globals.
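
Putting these together, a typical inspect-then-allowlist flow looks like this (a sketch; 'ckpt.pt' and mypackage.MyThing are hypothetical stand-ins for a checkpoint and a class you have reviewed and trust):

>>> torch.serialization.get_unsafe_globals_in_checkpoint('ckpt.pt')
['mypackage.MyThing']
>>> from mypackage import MyThing
>>> with torch.serialization.safe_globals([MyThing]):
...     obj = torch.load('ckpt.pt', weights_only=True)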

To access the list of user-allowlisted functions/classes you can use torch.serialization.get_safe_globals(), and to clear the current list use torch.serialization.clear_safe_globals().

Troubleshooting weights_only

Getting unsafe globals

A caveat is that torch.serialization.get_unsafe_globals_in_checkpoint() analyzes the checkpoint statically; some types might be built dynamically during the unpickling process and hence will not be reported by it. One such example is dtypes in numpy. With numpy < 1.25, after allowlisting all the functions/classes reported by torch.serialization.get_unsafe_globals_in_checkpoint() you might see an error like:

WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`,
but got <class 'numpy.dtype[float32]'>

This can be allowlisted via {add_}safe_globals([type(np.dtype(np.float32))]).

In numpy >= 1.25 you would see:

WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`,
but got <class 'numpy.dtypes.Float32DType'>

This can be allowlisted via {add_}safe_globals([np.dtypes.Float32DType]).
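
In practice that amounts to something like the following (a sketch; use the line matching your numpy version and the dtype named in your error message):

>>> import numpy as np
>>> # numpy >= 1.25
>>> torch.serialization.add_safe_globals([np.dtypes.Float32DType])
>>> # numpy < 1.25
>>> torch.serialization.add_safe_globals([type(np.dtype(np.float32))])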

Environment Variables

There are two environment variables that will influence the behavior of torch.load. These can be helpful if one does not have access to the torch.load call sites; a brief sketch follows the list below.

  • TORCH_FORCE_WEIGHTS_ONLY_LOAD=1 will override all torch.load call sites to use weights_only=True.

  • TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1 will make torch.load call sites use weights_only=False only if weights_only was not passed as an argument.
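
These variables are typically set in the shell before launching the process, e.g. TORCH_FORCE_WEIGHTS_ONLY_LOAD=1 python my_script.py (the script name is hypothetical). A Python-side sketch, assuming the variable is read when torch.load runs:

>>> import os
>>> os.environ["TORCH_FORCE_WEIGHTS_ONLY_LOAD"] = "1"
>>> sd = torch.load('mymodule.pt')  # now treated as weights_only=True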

Serializing torch.nn.Modules and loading them in C++

See also: Tutorial: Loading a TorchScript Model in C++

ScriptModules can be serialized as a TorchScript program and loaded using torch.jit.load(). This serialization encodes all the modules’ methods, submodules, parameters, and attributes, and it allows the serialized program to be loaded in C++ (i.e. without Python).

The distinction between torch.jit.save() and torch.save() may not be immediately clear. torch.save() saves Python objects with pickle. This is especially useful for prototyping, researching, and training. torch.jit.save(), on the other hand, serializes ScriptModules to a format that can be loaded in Python or C++. This is useful when saving and loading C++ modules or for running modules trained in Python with C++, a common practice when deploying PyTorch models.

To script, serialize and load a module in Python:

>>> scripted_module = torch.jit.script(MyModule())
>>> torch.jit.save(scripted_module, 'mymodule.pt')
>>> torch.jit.load('mymodule.pt')
RecursiveScriptModule( original_name=MyModule
                      (l0): RecursiveScriptModule(original_name=Linear)
                      (l1): RecursiveScriptModule(original_name=Linear) )

Traced modules can also be saved with torch.jit.save(), with the caveat that only the traced code path is serialized. The following example demonstrates this:

# A module with control flow
>>> class ControlFlowModule(torch.nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.l0 = torch.nn.Linear(4, 2)
...         self.l1 = torch.nn.Linear(2, 1)
...
...     def forward(self, input):
...         if input.dim() > 1:
...             return torch.tensor(0)
...         out0 = self.l0(input)
...         out0_relu = torch.nn.functional.relu(out0)
...         return self.l1(out0_relu)

>>> traced_module = torch.jit.trace(ControlFlowModule(), torch.randn(4))
>>> torch.jit.save(traced_module, 'controlflowmodule_traced.pt')
>>> loaded = torch.jit.load('controlflowmodule_traced.pt')
>>> loaded(torch.randn(2, 4))
tensor([[-0.1571],
        [-0.3793]], grad_fn=<AddBackward0>)

>>> scripted_module = torch.jit.script(ControlFlowModule())
>>> torch.jit.save(scripted_module, 'controlflowmodule_scripted.pt')
>>> loaded = torch.jit.load('controlflowmodule_scripted.pt')
>>> loaded(torch.randn(2, 4))
tensor(0)

The above module has an if statement that is not triggered by the traced inputs, and so is not part of the traced module and not serialized with it. The scripted module, however, contains the if statement and is serialized with it. See the TorchScript documentation for more on scripting and tracing.

Finally, to load the module in C++:

>>> torch::jit::script::Module module;
>>> module = torch::jit::load("controlflowmodule_scripted.pt");

See the PyTorch C++ API documentation for details about how to use PyTorch modules in C++.

Saving and loading ScriptModules across PyTorch versions

The PyTorch Team recommends saving and loading modules with the same version of PyTorch. Older versions of PyTorch may not support newer modules, and newer versions may have removed or modified older behavior. These changes are explicitly described in PyTorch’s release notes, and modules relying on functionality that has changed may need to be updated to continue working properly. In limited cases, detailed below, PyTorch will preserve the historic behavior of serialized ScriptModules so they do not require an update.

torch.div performing integer division

In PyTorch 1.5 and earlier, torch.div() would perform floor division when given two integer inputs:

# PyTorch 1.5 (and earlier)
>>> a = torch.tensor(5)
>>> b = torch.tensor(3)
>>> a / b
tensor(1)

In PyTorch 1.7, however, torch.div() will always perform a true division of its inputs, just like division in Python 3:

# PyTorch 1.7
>>> a = torch.tensor(5)
>>> b = torch.tensor(3)
>>> a / b
tensor(1.6667)

The behavior of torch.div() is preserved in serialized ScriptModules. That is, ScriptModules serialized with versions of PyTorch before 1.6 will continue to see torch.div() perform floor division when given two integer inputs, even when loaded with newer versions of PyTorch. ScriptModules using torch.div() and serialized on PyTorch 1.6 and later cannot be loaded in earlier versions of PyTorch, however, since those earlier versions do not understand the new behavior.

torch.full always inferring a float dtype

In PyTorch 1.5 and earlier, torch.full() always returned a float tensor, regardless of the fill value it’s given:

# PyTorch 1.5 and earlier
>>> torch.full((3,), 1)  # Note the integer fill value...
tensor([1., 1., 1.])     # ...but float tensor!

In PyTorch 1.7, however, torch.full() will infer the returned tensor’s dtype from the fill value:

# PyTorch 1.7
>>> torch.full((3,), 1)
tensor([1, 1, 1])
>>> torch.full((3,), True)
tensor([True, True, True])
>>> torch.full((3,), 1.)
tensor([1., 1., 1.])
>>> torch.full((3,), 1 + 1j)
tensor([1.+1.j, 1.+1.j, 1.+1.j])

The behavior of torch.full() is preserved in serialized ScriptModules. That is, ScriptModules serialized with versions of PyTorch before 1.6 will continue to see torch.full return float tensors by default, even when given bool or integer fill values. ScriptModules using torch.full() and serialized on PyTorch 1.6 and later cannot be loaded in earlier versions of PyTorch, however, since those earlier versions do not understand the new behavior.

Utility functions

The following utility functions are related to serialization:

torch.serialization.register_package(priority, tagger, deserializer)

Registers callables for tagging and deserializing storage objects with an associated priority. Tagging associates a device with a storage object at save time, while deserializing moves a storage object to an appropriate device at load time. tagger and deserializer are run in the order given by their priority until a tagger/deserializer returns a value that is not None.

To override the deserialization behavior for a device in the global registry, one can register a tagger with a higher priority than the existing tagger.

This function can also be used to register a tagger and deserializer for new devices.

Parameters

  • priority (int) – priority associated with the tagger and deserializer, used to order them against other registered taggers/deserializers
  • tagger (Callable) – callable that takes a storage object and returns its tagged device as a string, or None
  • deserializer (Callable) – callable that takes a storage object and a location string and returns a storage object on the appropriate device, or None

Returns

None

Example

>>> def ipu_tag(obj):
>>>     if obj.device.type == 'ipu':
>>>         return 'ipu'
>>> def ipu_deserialize(obj, location):
>>>     if location.startswith('ipu'):
>>>         ipu = getattr(torch, "ipu", None)
>>>         assert ipu is not None, "IPU device module is not loaded"
>>>         assert torch.ipu.is_available(), "ipu is not available"
>>>         return obj.ipu(location)
>>> torch.serialization.register_package(11, ipu_tag, ipu_deserialize)
torch.serialization.get_crc32_options()

Get whether torch.save() computes and writes crc32 for each record.

Defaults to True.

Return type

bool

torch.serialization.set_crc32_options(compute_crc32)

Set whether torch.save() computes and writes crc32 for each record.

Note

Setting this to False may make unzipping of the torch.save output fail or warn due to corrupted CRC32. However, torch.load will be able to load the file.

Parameters

compute_crc32 (bool) – set crc32 computation flag
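
Example

A minimal usage sketch (the file name is hypothetical):

>>> torch.serialization.set_crc32_options(False)   # stop computing crc32 on save
>>> torch.serialization.get_crc32_options()
False
>>> torch.save(torch.randn(2, 3), 'no_crc.pt')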

torch.serialization.get_default_load_endianness()

Get fallback byte order for loading files

If the byteorder mark is not present in the saved checkpoint, this byte order is used as the fallback. By default, it’s “native” byte order.

Returns

default_load_endian

Return type

Optional[LoadEndianness]

torch.serialization.set_default_load_endianness(endianness)

Set fallback byte order for loading files

If the byteorder mark is not present in the saved checkpoint, this byte order is used as the fallback. By default, it’s “native” byte order.

Parameters

endianness – the new fallback byte order
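
Example

For example, to force big-endian interpretation of legacy checkpoints that lack a byteorder record (a sketch; the BIG member is assumed by analogy with the NATIVE default referenced in the Config section below):

>>> from torch.serialization import LoadEndianness
>>> torch.serialization.set_default_load_endianness(LoadEndianness.BIG)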

torch.serialization.get_default_mmap_options()

Get default mmap options for torch.load() with mmap=True.

Defaults to mmap.MAP_PRIVATE.

Returns

default_mmap_options

Return type

int

torch.serialization.set_default_mmap_options(flags)

Context manager or function to set default mmap options for torch.load() with mmap=True to flags.

For now, only either mmap.MAP_PRIVATE or mmap.MAP_SHARED are supported. Please open an issue if you need any other option to be added here.

Note

This feature is currently not supported for Windows.

Parameters

flags (int) – mmap.MAP_PRIVATE or mmap.MAP_SHARED
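
Example

A brief sketch (not supported on Windows, per the note above; the file name is hypothetical):

>>> import mmap
>>> torch.serialization.set_default_mmap_options(mmap.MAP_SHARED)
>>> sd = torch.load('mymodule.pt', mmap=True)   # storages now mapped MAP_SHARED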

torch.serialization.add_safe_globals(safe_globals)

Marks the given globals as safe for weights_only load. For example, functions added to this list can be called during unpickling, and classes could be instantiated and have state set.

Each item in the list can either be a function/class or a tuple of the form (function/class, string) where the string is the full path of the function/class.

Within the serialized format, each function is identified with its full path as {__module__}.{__qualname__}. When calling this API, you can provide this full path, which should match the one in the checkpoint; otherwise, the default {fn.__module__}.{fn.__qualname__} will be used.

Parameters

safe_globals (List[Union[Callable, Tuple[Callable, str]]]) – list of globals to mark as safe

Example

>>> import tempfile
>>> class MyTensor(torch.Tensor):
...     pass
>>> t = MyTensor(torch.randn(2, 3))
>>> with tempfile.NamedTemporaryFile() as f:
...     torch.save(t, f.name)
# Running `torch.load(f.name, weights_only=True)` will fail with
# Unsupported global: GLOBAL __main__.MyTensor was not an allowed global by default.
# Check the code and make sure MyTensor is safe to be used when loaded from an arbitrary checkpoint.
...     torch.serialization.add_safe_globals([MyTensor])
...     torch.load(f.name, weights_only=True)
# MyTensor([[-0.5024, -1.8152, -0.5455],
#           [-0.8234,  2.0500, -0.3657]])
torch.serialization.clear_safe_globals()

Clears the list of globals that are safe for weights_only load.

torch.serialization.get_safe_globals()

Returns the list of user-added globals that are safe for weights_only load.

Return type

list[Union[Callable, tuple[Callable, str]]]

torch.serialization.get_unsafe_globals_in_checkpoint(f)

Returns a list of strings of functions/classes in a torch.save object that are not safe for weights_only.

For a given function or class f, the corresponding string will be of the form {f.__module__}.{f.__name__}.

This function will return any GLOBALs in the checkpoint that are not in the set marked safe for weights_only (either via add_safe_globals(), the safe_globals context manager, or allowlisted by torch by default).

Note

This function will statically disassemble the pickle file in the checkpoint. The implication is that any classes dynamically pushed onto the stack during unpickling will not be included in the output.

Parameters

f (Union[str, PathLike[str], IO[bytes]]) – file-like object or string containing the checkpoint object saved via torch.save

Returns

A list of strings of pickle GLOBALs in the checkpoint that are not allowlisted for weights_only.

Return type

list[str]

class torch.serialization.safe_globals(safe_globals)

Context-manager that adds certain globals as safe for weights_only load.

Parameters

safe_globals (list[Union[Callable, tuple[Callable, str]]]) – list of globals for weights_only load

Example

>>> import tempfile
>>> class MyTensor(torch.Tensor):
...     pass
>>> t = MyTensor(torch.randn(2, 3))
>>> with tempfile.NamedTemporaryFile() as f:
...     torch.save(t, f.name)
# Running `torch.load(f.name, weights_only=True)` will fail with
# Unsupported global: GLOBAL __main__.MyTensor was not an allowed global by default.
# Check the code and make sure MyTensor is safe to be used when loaded from an arbitrary checkpoint.
...     with torch.serialization.safe_globals([MyTensor]):
...         torch.load(f.name, weights_only=True)
# MyTensor([[-0.5024, -1.8152, -0.5455],
#           [-0.8234,  2.0500, -0.3657]])
>>> assert torch.serialization.get_safe_globals() == []
class torch.serialization.skip_data(materialize_fake_tensors=False)

Context-manager that skips writing/reading storage bytes for torch.save / torch.load calls.

For the save path, storages will still be saved, but the space that their bytes would usually be written to will be empty space. The storage bytes can then be populated in a separate pass.

For the load path, tensors will be loaded per the checkpoint but their storages will not be populated with data.

Warning

The skip_data context manager is an early prototype and is subject to change.

Parameters

materialize_fake_tensors (bool) – Whether to materialize FakeTensors during save. This is a no-op for the load path.

Example

>>> import tempfile
>>> t = torch.randn(2, 3)
>>> with tempfile.NamedTemporaryFile() as f:
...     with torch.serialization.skip_data():
...         torch.save(t, f.name)
...     torch.load(f.name, weights_only=True)
tensor([[0., 0., 0.],
        [0., 0., 0.]])

Config

torch.utils.serialization.config provides a global config that can control the behavior of torch.save and torch.load; a brief usage sketch follows the option lists below.

torch.utils.serialization.config.save contains options that control the behavior of torch.save.

  • compute_crc32: whether to compute and write the zip file checksum (Default: True). See set_crc32_options().

  • use_pinned_memory_for_d2h: for storages that are on an accelerator when passed to torch.save, whether to move the storage to pinned memory or pageable memory on CPU within torch.save. (Default: False, i.e. pageable)

  • storage_alignment: alignment of storages in the checkpoint during torch.save, in bytes. (Default: 64)

torch.utils.serialization.config.load contains options that control the behavior of torch.load.

  • mmap: See the documentation for the mmap argument in torch.load(). This config will set the behavior of mmap for torch.load if it is not already explicitly passed to the torch.load call (Default: False).

  • endianness: See set_default_load_endianness(). (Default: torch.serialization.LoadEndianness.NATIVE)

  • mmap_flags: See set_default_mmap_options(). (Default: MAP_PRIVATE)

  • calculate_storage_offsets: if this config is set to True, offsets for storages will be calculated rather than read via random reads when using torch.load(mmap=True). This minimizes random reads, which can be helpful when the file is being loaded over a network. (Default: False)
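
As referenced above, here is a brief usage sketch of the config (assuming the module path given in this section; the file name is hypothetical):

>>> import torch.utils.serialization.config as serialization_config
>>> serialization_config.save.compute_crc32 = False
>>> serialization_config.load.mmap = True
>>> torch.save(torch.randn(2, 3), 'ckpt.pt')
>>> loaded = torch.load('ckpt.pt')   # loaded with mmap=True due to the config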