
Save and Load Your PyTorch Models

By Adrian Tam on April 8, 2023 in Deep Learning with PyTorch

A deep learning model is a mathematical abstraction of data that involves a large number of parameters. Training these parameters can take hours, days, or even weeks, but afterward you can apply the result to new data. This is called inference in machine learning. It is important to know how to preserve a trained model on disk and load it later for inference. In this post, you will discover how to save your PyTorch models to files and load them again to make predictions. After reading this post, you will know:

What are states and parameters in a PyTorch model
How to save model states
How to load model states


Let’s get started.

Photo by Joseph Chan. Some rights reserved.

Overview

This post is in three parts; they are

Build an Example Model
What’s Inside a PyTorch Model
Accessing `state_dict` of a Model

Build an Example Model

Let’s start with a very simple model in PyTorch, based on the iris dataset. You will load the dataset using scikit-learn (the targets are integer labels 0, 1, and 2) and train a neural network for this multiclass classification problem. The model uses log softmax as the output activation so that it can be combined with the negative log likelihood loss function. This is equivalent to using no output activation together with the cross entropy loss function.

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data into NumPy arrays
data = load_iris()
X, y = data["data"], data["target"]

# convert NumPy array into PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# PyTorch model
class Multiclass(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)
        self.act = nn.ReLU()
        self.output = nn.Linear(8, 3)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.act(self.hidden(x))
        x = self.logsoftmax(self.output(x))
        return x

model = Multiclass()

# loss metric and optimizer
loss_fn = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# prepare model and training parameters
n_epochs = 100
batch_size = 5
# batches are drawn from the training set only, so count its rows
batch_start = torch.arange(0, len(X_train), batch_size)

# training loop
for epoch in range(n_epochs):
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()

With such a simple model and small dataset, training should not take long. Afterward, you can confirm that the model works by evaluating it on the test set:

...
y_pred = model(X_test)
acc = (torch.argmax(y_pred, 1) == y_test).float().mean()
print("Accuracy: %.2f" % acc)

It prints, for example,

Accuracy: 0.96
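The claim made earlier in this section, that log softmax with negative log likelihood loss is equivalent to no output activation with cross entropy loss, can be checked numerically. The following is a minimal sketch, assuming random values stand in for the raw outputs of the last linear layer; the names `logits` and `target` are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(5, 3)               # stand-in raw outputs: 5 samples, 3 classes
target = torch.tensor([0, 2, 1, 1, 0])   # integer class labels

# Option 1: log softmax as output activation, then negative log likelihood
loss_nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)

# Option 2: no output activation, cross entropy applied to the raw outputs
loss_ce = nn.CrossEntropyLoss()(logits, target)

# The two losses should agree up to floating point error
print(torch.allclose(loss_nll, loss_ce))
```

Either formulation can be used for training; the difference is only whether the model or the loss function applies the log softmax.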



What’s Inside a PyTorch Model

A PyTorch model is an object in Python. It holds deep learning building blocks such as various kinds of layers and activation functions, and it knows how to connect them to produce an output from your input tensors. The algorithm of a model is fixed at the time you create it, but it has trainable parameters that are modified during the training loop so that the model becomes more accurate.

You saw how to get the model parameters when you set up the optimizer for your training loop, namely,

optimizer = optim.Adam(model.parameters(), lr=0.001)


The function model.parameters() gives you a generator that refers to each layer’s trainable parameters in turn, in the form of PyTorch tensors. Therefore, you can make a copy of them or overwrite them, for example:

# create a new model
newmodel = Multiclass()

# ask PyTorch to ignore autograd on update and overwrite parameters
with torch.no_grad():
    for newtensor, oldtensor in zip(newmodel.parameters(), model.parameters()):
        newtensor.copy_(oldtensor)

# test with new model using copied tensor
y_pred = newmodel(X_test)
acc = (torch.argmax(y_pred, 1) == y_test).float().mean()
print("Accuracy: %.2f" % acc)

The result should be exactly the same as before, since you essentially made the two models identical by copying the parameters.

However, this is not always the case. Some models have non-trainable parameters. One example is the batch normalization layer that is common in many convolutional neural networks. It normalizes the tensors produced by the previous layer and passes the normalized tensor on to the next layer. To do so, it keeps two statistics, the running mean and the running variance, which are estimated from your input data during the training loop but are not trained by the optimizer. Therefore, these are not part of model.parameters(), but they are equally important.
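To make the difference concrete, the sketch below inspects a standalone batch normalization layer. It is only an illustration, assuming a `BatchNorm1d` layer with 4 features (the iris model in this post has no such layer); it lists what the optimizer would see, what is kept as non-trainable state, and what a state dict contains.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)   # illustrative layer, not part of the iris model

# Trainable tensors: these are what model.parameters() would yield
param_names = [name for name, _ in bn.named_parameters()]
# Non-trainable state: updated by forward passes in training mode
buffer_names = [name for name, _ in bn.named_buffers()]
# The state dict covers both groups
state_names = list(bn.state_dict().keys())

print("parameters:", param_names)
print("buffers:   ", buffer_names)
print("state_dict:", state_names)

# A forward pass in training mode changes the running statistics
# without any optimizer being involved
before = bn.running_mean.clone()
bn(torch.randn(16, 4))
print("running mean changed:", not torch.equal(before, bn.running_mean))
```

Copying only the tensors from `parameters()` would therefore leave the running statistics of the new layer at their initial values.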

Accessing `state_dict` of a Model

To access all parameters of a model, trainable or not, you can use the state_dict() function. For the model above, this is how you get them:

import pprint

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(model.state_dict())

The model above produces the following:

OrderedDict([   (   'hidden.weight',
                    tensor([[ 0.1480,  0.0336,  0.3425,  0.2832],
        [ 0.5265,  0.8587, -0.7023, -1.1149],
        [ 0.1620,  0.8440, -0.6189, -0.6513],
        [-0.1559,  0.0393, -0.4701,  0.0825],
        [ 0.6364, -0.6622,  1.1150,  0.9162],
        [ 0.2081, -0.0958, -0.2601, -0.3148],
        [-0.0804,  0.1027,  0.7363,  0.6068],
        [-0.4101, -0.3774, -0.1852,  0.1524]])),
                (   'hidden.bias',
                    tensor([ 0.2057,  0.7998, -0.0578,  0.1041, -0.3903, -0.4521, -0.5307, -0.1532])),
                (   'output.weight',
                    tensor([[-0.0954,  0.8683,  1.0667,  0.2382, -0.4245, -0.0409, -0.2587, -0.0745],
        [-0.0829,  0.8642, -1.6892, -0.0188,  0.0420, -0.1020,  0.0344, -0.1210],
        [-0.0176, -1.2809, -0.3040,  0.1985,  0.2423,  0.3333,  0.4523, -0.1928]])),
                ('output.bias', tensor([ 0.0998,  0.6360, -0.2990]))])

It is called state_dict because all state variables of a model are here. It is an OrderedDict object from Python’s built-in collections module. Every component of a PyTorch model has a name, and so do the parameters within it. The OrderedDict object allows you to map the weights back to the parameters correctly by matching their names.
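The name matching can be observed directly. The sketch below is an illustration under assumptions: it redefines the same `Multiclass` model, builds a deliberately incomplete state dict (the variable `partial` is made up for this example), and shows how `load_state_dict()` reports names that do not match.

```python
import torch
import torch.nn as nn

class Multiclass(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)
        self.act = nn.ReLU()
        self.output = nn.Linear(8, 3)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.act(self.hidden(x))
        x = self.logsoftmax(self.output(x))
        return x

source = Multiclass()
target = Multiclass()

# Keep only the entries of the hidden layer (hypothetical partial state)
partial = {k: v for k, v in source.state_dict().items() if k.startswith("hidden.")}

# By default, loading is strict and a missing name raises an error
try:
    target.load_state_dict(partial)
    strict_failed = False
except RuntimeError:
    strict_failed = True
print("strict load failed:", strict_failed)

# With strict=False, matching names are loaded and the rest are reported
result = target.load_state_dict(partial, strict=False)
print("missing keys:   ", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```

After the non-strict load, the hidden layer of `target` holds the same weights as `source`, while its output layer keeps its own random initialization.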

This is how you should save and load the model: fetch the model states into an OrderedDict, serialize it, and save it to disk. For inference, you create a model first (without training) and load the states into it. In Python, the native format for serialization is pickle:

import pickle

# Save model
with open("iris-model.pickle", "wb") as fp:
    pickle.dump(model.state_dict(), fp)

# Create new model and load states
newmodel = Multiclass()
with open("iris-model.pickle", "rb") as fp:
    newmodel.load_state_dict(pickle.load(fp))

# test with new model using copied tensor
y_pred = newmodel(X_test)
acc = (torch.argmax(y_pred, 1) == y_test).float().mean()
print("Accuracy: %.2f" % acc)

You know it works because the model you didn’t train produced the same result as the one you trained.
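Matching accuracy is indirect evidence. A more direct check is to compare the restored states with the original ones, tensor by tensor. The following is a minimal sketch, assuming an untrained `Multiclass` model as a stand-in for the trained one and serializing to an in-memory byte string instead of a file.

```python
import pickle
import torch
import torch.nn as nn

class Multiclass(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)
        self.act = nn.ReLU()
        self.output = nn.Linear(8, 3)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.act(self.hidden(x))
        x = self.logsoftmax(self.output(x))
        return x

model = Multiclass()   # stand-in for the trained model

# Serialize the states to bytes and load them into a fresh model
blob = pickle.dumps(model.state_dict())
newmodel = Multiclass()
newmodel.load_state_dict(pickle.loads(blob))

# Compare every tensor by name
saved = model.state_dict()
restored = newmodel.state_dict()
same = all(torch.equal(saved[name], restored[name]) for name in saved)
print("All tensors identical:", same)
```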

In fact, the recommended way is to use the PyTorch API to save and load the states, instead of using pickle manually:

# Save model
torch.save(model.state_dict(), "iris-model.pth")

# Create new model and load states
newmodel = Multiclass()
newmodel.load_state_dict(torch.load("iris-model.pth"))

# test with new model using copied tensor
y_pred = newmodel(X_test)
acc = (torch.argmax(y_pred, 1) == y_test).float().mean()
print("Accuracy: %.2f" % acc)

The *.pth file is in fact a zip file of some pickle files created by PyTorch. It is recommended because PyTorch can store additional information in it. Note that you stored only the states but not the model. You still need to create the model using Python code and load the states into it. If you wish to store the model as well, you can pass in the entire model instead of the states:

# Save model
torch.save(model, "iris-model-full.pth")

# Load model
# (recent PyTorch releases restrict torch.load to weights by default; there
# you may need torch.load("iris-model-full.pth", weights_only=False))
newmodel = torch.load("iris-model-full.pth")

# test with new model using copied tensor
y_pred = newmodel(X_test)
acc = (torch.argmax(y_pred, 1) == y_test).float().mean()
print("Accuracy: %.2f" % acc)

But remember, due to the nature of the Python language, doing so does not relieve you from keeping the code of the model. The newmodel object above is an instance of the Multiclass class that you defined before. When you load the model from disk, Python needs to know in detail how this class is defined. If you run a script with just the torch.load() line, you will see the following error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../torch/serialization.py", line 789, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/.../torch/serialization.py", line 1131, in _load
    result = unpickler.load()
  File "/.../torch/serialization.py", line 1124, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'Multiclass' on <module '__main__' (built-in)>

That’s why it is recommended to save only the state dict rather than the entire model.
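Saving state dicts also extends naturally to the case where you want to resume training later, since an optimizer has a state dict of its own. The sketch below is one possible arrangement, not something from the code in this post: a small `nn.Linear` layer stands in for the model, and the dictionary keys (`epoch`, `model_state`, `optimizer_state`) and the file name are made up for illustration.

```python
import os
import tempfile
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 3)   # stand-in for a real model
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Take one training step so that the optimizer has some internal state
loss = model(torch.randn(8, 4)).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Bundle several state dicts into one plain dictionary (hypothetical keys)
checkpoint = {
    "epoch": 1,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(checkpoint, path)

# Later: rebuild the objects from code, then restore their states
restored_model = nn.Linear(4, 3)
restored_optimizer = optim.Adam(restored_model.parameters(), lr=0.001)
loaded = torch.load(path)
restored_model.load_state_dict(loaded["model_state"])
restored_optimizer.load_state_dict(loaded["optimizer_state"])
print("Resuming after epoch", loaded["epoch"])
```

As with the model, the optimizer object itself is recreated from code and only its states come from the file.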

Putting everything together, the following is the complete code to demonstrate how to create a model, train it, and save to disk:

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data into NumPy arrays
data = load_iris()
X, y = data["data"], data["target"]

# convert NumPy array into PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# PyTorch model
class Multiclass(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)
        self.act = nn.ReLU()
        self.output = nn.Linear(8, 3)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.act(self.hidden(x))
        x = self.logsoftmax(self.output(x))
        return x

model = Multiclass()

# loss metric and optimizer
loss_fn = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# prepare model and training parameters
n_epochs = 100
batch_size = 5
# batches are drawn from the training set only, so count its rows
batch_start = torch.arange(0, len(X_train), batch_size)

# training loop
for epoch in range(n_epochs):
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()

# Save model
torch.save(model.state_dict(), "iris-model.pth")

And the following is how to load the model from disk and run it for inference:

import torch
import torch.nn as nn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data into NumPy arrays
data = load_iris()
X, y = data["data"], data["target"]

# convert NumPy array into PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# split (random, so this test set is not the one held out during training)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# PyTorch model
class Multiclass(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)
        self.act = nn.ReLU()
        self.output = nn.Linear(8, 3)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.act(self.hidden(x))
        x = self.logsoftmax(self.output(x))
        return x

# Create new model and load states saved by the training script
model = Multiclass()
model.load_state_dict(torch.load("iris-model.pth"))

# Run model for inference
y_pred = model(X_test)
acc = (torch.argmax(y_pred, 1) == y_test).float().mean()
print("Accuracy: %.2f" % acc)
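When a loaded model is used only for prediction, it is common to switch it to evaluation mode and to turn off gradient tracking. The sketch below shows that pattern under stated assumptions: a freshly initialized `Multiclass` model is saved to a temporary file to stand in for the trained `iris-model.pth`, and the single input row in `sample` is just an illustrative set of four feature values.

```python
import os
import tempfile
import torch
import torch.nn as nn

class Multiclass(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)
        self.act = nn.ReLU()
        self.output = nn.Linear(8, 3)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.act(self.hidden(x))
        x = self.logsoftmax(self.output(x))
        return x

# Stand-in for the file produced by the training script
path = os.path.join(tempfile.mkdtemp(), "iris-model.pth")
torch.save(Multiclass().state_dict(), path)

# Rebuild the model and load the states onto the CPU
model = Multiclass()
model.load_state_dict(torch.load(path, map_location="cpu"))
model.eval()   # inference behavior for layers such as dropout or batch norm

# One input row with four features (illustrative values)
sample = torch.tensor([[5.1, 3.5, 1.4, 0.2]])
with torch.no_grad():   # no gradient bookkeeping during inference
    log_probs = model(sample)
    predicted = torch.argmax(log_probs, dim=1)
print("Predicted class index:", predicted.item())
```

For this particular model, `eval()` changes nothing because it has no dropout or batch normalization layer, but the call is harmless and keeps the inference code correct if such layers are added later.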

Summary

In this post, you learned how to keep a copy of your trained PyTorch model on disk and how to reuse it. In particular, you learned:

What are parameters and states in a PyTorch model
How to save all necessary states from a model to disk
How to rebuild a working model from the saved states


4 Responses to Save and Load Your PyTorch Models

Javier, March 10, 2023 at 6:35 am
Thank you it was really useful, just a comment. In the final example the loading operation is shown using pickle instead of torch.load. Torch load will be more consistent with the saving code shown before.
- James Carmichael, March 10, 2023 at 7:59 am
  You are very welcome Javier! Thank you for your feedback and suggestion!
Poult, August 10, 2023 at 4:48 pm
Hello, I think in the last window of code, in lines 31-32,
it could be: ‘with open(“iris-model.pth”, “rb”) as fp:
model.load_state_dict(torch.load(fp))’
- James Carmichael, August 11, 2023 at 8:15 am
  Thank you for your feedback and suggestion Poult!
