What is torch.nn really?#

Created On: Dec 26, 2018 | Last Updated: Jan 24, 2025 | Last Verified: Nov 05, 2024

Authors: Jeremy Howard, fast.ai. Thanks to Rachel Thomas and Francisco Ingham.

We recommend running this tutorial as a notebook, not a script. To download the notebook (.ipynb) file, click the link at the top of the page.

PyTorch provides the elegantly designed modules and classes torch.nn, torch.optim, Dataset, and DataLoader to help you create and train neural networks. In order to fully utilize their power and customize them for your problem, you need to really understand exactly what they’re doing. To develop this understanding, we will first train a basic neural net on the MNIST data set without using any features from these modules; we will initially only use the most basic PyTorch tensor functionality. Then, we will incrementally add one feature from torch.nn, torch.optim, Dataset, or DataLoader at a time, showing exactly what each piece does, and how it works to make the code either more concise, or more flexible.

This tutorial assumes you already have PyTorch installed, and are familiar with the basics of tensor operations. (If you’re familiar with NumPy array operations, you’ll find the PyTorch tensor operations used here nearly identical.)

MNIST data setup#

We will use the classic MNIST dataset, which consists of black-and-white images of hand-drawn digits (between 0 and 9).

We will use pathlib for dealing with paths (part of the Python 3 standard library), and will download the dataset using requests. We will only import modules when we use them, so you can see exactly what’s being used at each point.

from pathlib import Path
import requests
DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"
PATH.mkdir(parents=True, exist_ok=True)
URL = "https://github.com/pytorch/tutorials/raw/main/_static/"
FILENAME = "mnist.pkl.gz"
if not (PATH / FILENAME).exists():
    content = requests.get(URL + FILENAME).content
    (PATH / FILENAME).open("wb").write(content)

This dataset is in NumPy array format, and has been stored using pickle, a Python-specific format for serializing data.

import pickle
import gzip
with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

Each image is 28 x 28, and is being stored as a flattened row of length 784 (=28x28). Let’s take a look at one; we need to reshape it to 2d first.

from matplotlib import pyplot
import numpy as np
pyplot.imshow(x_train[0].reshape((28, 28)), cmap="gray")
# ``pyplot.show()`` only if not on Colab
try:
    import google.colab
except ImportError:
    pyplot.show()
print(x_train.shape)
(50000, 784)

PyTorch uses torch.tensor, rather than NumPy arrays, so we need to convert our data.
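The output that follows implies a conversion step roughly like the one below. This is a minimal sketch, assuming the arrays loaded from the pickle file are NumPy arrays. The small random arrays are stand-ins so the snippet runs on its own, and n and c are taken here to be the number of samples and the number of pixels per image.

```python
import numpy as np
import torch

# Stand-in data (assumption): 6 flattened 28x28 images with integer labels.
rng = np.random.default_rng(0)
x_train = rng.random((6, 784), dtype=np.float32)
y_train = rng.integers(0, 10, size=6)

# torch.tensor copies each NumPy array into a new tensor, keeping its dtype.
x_train, y_train = map(torch.tensor, (x_train, y_train))
n, c = x_train.shape
print(x_train.shape, x_train.dtype, y_train.dtype)
```

With the real data, the same call would be applied to all four arrays (training and validation inputs and labels).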

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]]) tensor([5, 0, 4,  ..., 8, 4, 8])
torch.Size([50000, 784])
tensor(0) tensor(9)

Neural net from scratch (without torch.nn)#

Let’s first create a model using nothing but PyTorch tensor operations. We’re assuming you’re already familiar with the basics of neural networks. (If you’re not, you can learn them at course.fast.ai.)

PyTorch provides methods to create random or zero-filled tensors, which we will use to create our weights and bias for a simple linear model. These are just regular tensors, with one very special addition: we tell PyTorch that they require a gradient. This causes PyTorch to record all of the operations done on the tensor, so that it can calculate the gradient during back-propagation automatically!

For the weights, we set requires_grad after the initialization, since we don’t want that step included in the gradient. (Note that a trailing _ in PyTorch signifies that the operation is performed in-place.)

Note

We are initializing the weights here with Xavier initialisation (by multiplying with 1/sqrt(n)).

import math
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

Thanks to PyTorch’s ability to calculate gradients automatically, we can use any standard Python function (or callable object) as a model! So let’s just write a plain matrix multiplication and broadcasted addition to create a simple linear model. We also need an activation function, so we’ll write log_softmax and use it. Remember: although PyTorch provides lots of prewritten loss functions, activation functions, and so forth, you can easily write your own using plain Python. PyTorch will even create fast accelerator or vectorized CPU code for your function automatically.

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)
def model(xb):
    return log_softmax(xb @ weights + bias)
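To check that the hand-written log_softmax does what we expect, we can compare it with the built-in torch.nn.functional.log_softmax. This is only a sketch on made-up random scores, not part of the training code.

```python
import torch
import torch.nn.functional as F

def log_softmax(x):
    # log(softmax(x)) = x - log(sum(exp(x))) along the last dimension
    return x - x.exp().sum(-1).log().unsqueeze(-1)

torch.manual_seed(0)
logits = torch.randn(4, 10)          # 4 rows of 10 made-up class scores
ours = log_softmax(logits)
builtin = F.log_softmax(logits, dim=-1)

same = torch.allclose(ours, builtin, atol=1e-6)
row_sums = ours.exp().sum(-1)        # probabilities in each row should sum to 1
print(same, row_sums)
```

Exponentiating each row gives probabilities that sum to one, which is what makes the output usable with a negative log-likelihood loss.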

In the above, the @ stands for the matrix multiplication operation. We will call our function on one batch of data (in this case, 64 images). This is one forward pass. Note that our predictions won’t be any better than random at this stage, since we start with random weights.

bs = 64  # batch size
xb = x_train[0:bs]  # a mini-batch from x
preds = model(xb)  # predictions
preds[0], preds.shape
print(preds[0], preds.shape)
tensor([-2.2616, -2.5512, -2.4773, -1.9495, -2.7288, -2.4958, -2.3086, -2.2068,
        -2.0355, -2.2661], grad_fn=<SelectBackward0>) torch.Size([64, 10])

As you see, the preds tensor contains not only the tensor values, but also a gradient function. We’ll use this later to do backprop.

Let’s implement negative log-likelihood to use as the loss function (again, we can just use standard Python):

def nll(input, target):
    return -input[range(target.shape[0]), target].mean()
loss_func = nll
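The indexing inside nll picks, for each row, the log-probability of that row’s true class. A small sketch with made-up probabilities shows the selection step by step.

```python
import torch

# Made-up log-probabilities for 3 samples over 4 classes.
log_probs = torch.log(torch.tensor([
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.10, 0.10, 0.70],
]))
target = torch.tensor([0, 2, 3])     # true class of each sample

# Row i, column target[i]: one log-probability per sample.
picked = log_probs[range(target.shape[0]), target]
loss = -picked.mean()
print(picked.exp(), loss)
```

Confident correct predictions (rows 0 and 2) contribute little to the loss, while the uniform guess in row 1 contributes more.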

Let’s check our loss with our random model, so we can see if we improve after a backprop pass later.

yb = y_train[0:bs]
print(loss_func(preds, yb))
tensor(2.2900, grad_fn=<NegBackward0>)

Let’s also implement a function to calculate the accuracy of our model. For each prediction, if the index with the largest value matches the target value, then the prediction was correct.

def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

Let’s check the accuracy of our random model, so we can see if our accuracy improves as our loss improves.

print(accuracy(preds, yb))
tensor(0.0938)

We can now run a training loop. For each iteration, we will:

  • select a mini-batch of data (of size bs)

  • use the model to make predictions

  • calculate the loss

  • call loss.backward() to update the gradients of the model, in this case weights and bias.

We now use these gradients to update the weights and bias. We do this within the torch.no_grad() context manager, because we do not want these actions to be recorded for our next calculation of the gradient. You can read more about how PyTorch’s Autograd records operations in the PyTorch Autograd documentation.

We then set the gradients to zero, so that we are ready for the next loop. Otherwise, our gradients would record a running tally of all the operations that had happened (i.e. loss.backward() adds the gradients to whatever is already stored, rather than replacing them).
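The accumulation behaviour is easy to see on a tiny tensor. The following sketch uses a made-up two-element parameter w and calls backward twice before zeroing.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# d/dw of sum(3 * w) is 3 for every element.
(3 * w).sum().backward()
first = w.grad.clone()

# A second backward pass adds to the stored gradient instead of replacing it.
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Zeroing in place gives the next iteration a clean start.
w.grad.zero_()
print(first, accumulated, w.grad)
```

After the second call the stored gradient has doubled, which is why the training loop zeroes it on every iteration.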

Tip

You can use the standard Python debugger to step through PyTorch code, allowing you to check the various variable values at each step. Uncomment set_trace() below to try it out.

from IPython.core.debugger import set_trace
lr = 0.5  # learning rate
epochs = 2  # how many epochs to train for
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        #         set_trace()
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()
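The expression (n - 1) // bs + 1 in the loop above is a ceiling division: it counts the batches needed to cover n samples, including a final partial batch. Here is a plain-Python sketch; batch_slices is a hypothetical helper written only for this illustration.

```python
def batch_slices(n, bs):
    """Yield (start, end) index pairs covering n samples in batches of bs."""
    num_batches = (n - 1) // bs + 1      # ceiling of n / bs for n >= 1
    for i in range(num_batches):
        start = i * bs
        # Slicing past the end of a tensor is clamped in the same way.
        yield start, min(start + bs, n)

slices = list(batch_slices(50000, 64))
print(len(slices), slices[0], slices[-1])
```

With 50,000 training images and a batch size of 64, the last batch holds only the 16 leftover samples.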

That’s it: we’ve created and trained a minimal neural network (in this case, a logistic regression, since we have no hidden layers) entirely from scratch!

Let’s check the loss and accuracy and compare those to what we got earlier. We expect that the loss will have decreased and accuracy to have increased, and they have.

print(loss_func(model(xb), yb), accuracy(model(xb), yb))
tensor(0.0815, grad_fn=<NegBackward0>) tensor(1.)

Using torch.nn.functional#

We will now refactor our code, so that it does the same thing as before, only we’ll start taking advantage of PyTorch’s nn classes to make it more concise and flexible. At each step from here, we should be making our code one or more of: shorter, more understandable, and/or more flexible.

The first and easiest step is to make our code shorter by replacing our hand-written activation and loss functions with those from torch.nn.functional (which is generally imported into the namespace F by convention). This module contains all the functions in the torch.nn library (whereas other parts of the library contain classes). As well as a wide range of loss and activation functions, you’ll also find here some convenient functions for creating neural nets, such as pooling functions. (There are also functions for doing convolutions, linear layers, etc., but as we’ll see, these are usually better handled using other parts of the library.)

If you’re using negative log likelihood loss and log softmax activation, then PyTorch provides a single function F.cross_entropy that combines the two. So we can even remove the activation function from our model.

import torch.nn.functional as F
loss_func = F.cross_entropy
def model(xb):
    return xb @ weights + bias

Note that we no longer call log_softmax in the model function. Let’s confirm that our loss and accuracy are the same as before:

print(loss_func(model(xb), yb), accuracy(model(xb), yb))
tensor(0.0815, grad_fn=<NllLossBackward0>) tensor(1.)
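The identical loss value is no coincidence. As a small sketch on made-up logits and labels, F.cross_entropy can be compared directly with F.log_softmax followed by F.nll_loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)              # made-up raw model outputs
targets = torch.randint(0, 10, (8,))     # made-up class labels

two_step = F.nll_loss(F.log_softmax(logits, dim=1), targets)
one_step = F.cross_entropy(logits, targets)
print(two_step, one_step)
```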

Refactor using nn.Module#

Next up, we’ll use nn.Module and nn.Parameter, for a clearer and more concise training loop. We subclass nn.Module (which itself is a class and able to keep track of state). In this case, we want to create a class that holds our weights, bias, and method for the forward step. nn.Module has a number of attributes and methods (such as .parameters() and .zero_grad()) which we will be using.

Note

nn.Module (uppercase M) is a PyTorch specific concept, and is a class we’ll be using a lot. nn.Module is not to be confused with the Python concept of a (lowercase m) module, which is a file of Python code that can be imported.

from torch import nn
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))
    def forward(self, xb):
        return xb @ self.weights + self.bias
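To see what "keeping track of state" means in practice, the sketch below builds a smaller stand-in module (TinyLogistic and its scratch attribute are hypothetical names used only here) and lists what nn.Module has registered.

```python
import math
import torch
from torch import nn

class TinyLogistic(nn.Module):           # hypothetical, smaller stand-in
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(4, 3) / math.sqrt(4))
        self.bias = nn.Parameter(torch.zeros(3))
        self.scratch = torch.zeros(3)    # plain tensor: not registered

    def forward(self, xb):
        return xb @ self.weights + self.bias

model = TinyLogistic()
names = [name for name, _ in model.named_parameters()]
out = model(torch.ones(2, 4))            # calling the module runs forward()
print(names, out.shape)
```

Only attributes wrapped in nn.Parameter show up, so a plain tensor attribute would be skipped by any loop over model.parameters().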

Since we’re now using an object instead of just using a function, we first have to instantiate our model:

model = Mnist_Logistic()

Now we can calculate the loss in the same way as before. Note that nn.Module objects are used as if they are functions (i.e. they are callable), but behind the scenes PyTorch will call our forward method automatically.

print(loss_func(model(xb), yb))
tensor(2.3415, grad_fn=<NllLossBackward0>)

Previously for our training loop we had to update the values for each parameter by name, and manually zero out the grads for each parameter separately, like this:

with torch.no_grad():
    weights -= weights.grad * lr
    bias -= bias.grad * lr
    weights.grad.zero_()
    bias.grad.zero_()

Now we can take advantage of model.parameters() and model.zero_grad() (which are both defined by PyTorch for nn.Module) to make those steps more concise and less prone to the error of forgetting some of our parameters, particularly if we had a more complicated model:

with torch.no_grad():
    for p in model.parameters():
        p -= p.grad * lr
    model.zero_grad()

We’ll wrap our little training loop in a fit function so we can run it again later.

def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs
            end_i = start_i + bs
            xb = x_train[start_i:end_i]
            yb = y_train[start_i:end_i]
            pred = model(xb)
            loss = loss_func(pred, yb)
            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()
fit()

Let’s double-check that our loss has gone down:

print(loss_func(model(xb), yb))
tensor(0.0812, grad_fn=<NllLossBackward0>)

Refactor using nn.Linear#

We continue to refactor our code. Instead of manually defining and initializing self.weights and self.bias, and calculating xb @ self.weights + self.bias, we will instead use the PyTorch class nn.Linear for a linear layer, which does all that for us. PyTorch has many types of predefined layers that can greatly simplify our code, and often make it faster too.

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)
    def forward(self, xb):
        return self.lin(xb)

We instantiate our model and calculate the loss in the same way as before:

model = Mnist_Logistic()
print(loss_func(model(xb), yb))
tensor(2.3909, grad_fn=<NllLossBackward0>)

We are still able to use our same fit method as before.

fit()
print(loss_func(model(xb), yb))
tensor(0.0815, grad_fn=<NllLossBackward0>)
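One detail worth knowing when moving from the hand-written version to nn.Linear is the weight layout. The sketch below, on a made-up random batch, reproduces the layer’s output by hand.

```python
import torch
from torch import nn

torch.manual_seed(0)
lin = nn.Linear(784, 10)
xb = torch.randn(5, 784)                 # made-up batch of 5 flattened images

# nn.Linear stores its weight as (out_features, in_features),
# so the manual version multiplies by the transpose.
manual = xb @ lin.weight.T + lin.bias
matches = torch.allclose(lin(xb), manual, atol=1e-5)
print(lin.weight.shape, lin.bias.shape, matches)
```

The layer also uses its own default initialization, which is why the starting loss differs slightly from the hand-initialized model.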

Refactor using torch.optim#

PyTorch also has a package with various optimization algorithms, torch.optim. We can use the step method from our optimizer to take an optimization step, instead of manually updating each parameter.

This will let us replace our previous manually coded optimization step:

with torch.no_grad():
    for p in model.parameters():
        p -= p.grad * lr
    model.zero_grad()

and instead use just:

opt.step()
opt.zero_grad()

(opt.zero_grad() resets the gradient to 0 and we need to call it before computing the gradient for the next minibatch.)

from torch import optim

We’ll define a little function to create our model and optimizer so we can reuse it in the future.

def get_model():
    model = Mnist_Logistic()
    return model, optim.SGD(model.parameters(), lr=lr)
model, opt = get_model()
print(loss_func(model(xb), yb))
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
print(loss_func(model(xb), yb))
tensor(2.3045, grad_fn=<NllLossBackward0>)
tensor(0.0828, grad_fn=<NllLossBackward0>)
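For plain SGD, the optimizer performs the same arithmetic as the hand-written update. The following sketch checks this on a made-up two-element parameter and a made-up sum-of-squares loss.

```python
import torch
from torch import optim

lr = 0.5
w_manual = torch.tensor([1.0, -2.0], requires_grad=True)
w_optim = torch.tensor([1.0, -2.0], requires_grad=True)
opt = optim.SGD([w_optim], lr=lr)

# Same made-up loss for both copies: sum of squares, gradient 2 * w.
(w_manual ** 2).sum().backward()
(w_optim ** 2).sum().backward()

with torch.no_grad():                    # hand-written update
    w_manual -= w_manual.grad * lr
    w_manual.grad.zero_()

opt.step()                               # optimizer-driven update
opt.zero_grad()
print(w_manual, w_optim)
```

The benefit of the optimizer shows up once we want something beyond this basic rule, since only the constructor call has to change.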

Refactor using Dataset#

PyTorch has an abstract Dataset class. A Dataset can be anything that has a __len__ function (called by Python’s standard len function) and a __getitem__ function as a way of indexing into it. The PyTorch data loading tutorial walks through a nice example of creating a custom FacialLandmarkDataset class as a subclass of Dataset.
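The two-method protocol can be illustrated without PyTorch at all. SquaresDataset below is a hypothetical plain-Python class, shown only to make the roles of __len__ and __getitem__ concrete; in practice you would subclass torch.utils.data.Dataset.

```python
class SquaresDataset:
    """Hypothetical dataset: item i is the pair (i, i * i)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):                   # called by len(ds)
        return self.size

    def __getitem__(self, idx):          # called by ds[idx]
        if idx not in range(self.size):
            raise IndexError(idx)
        return idx, idx * idx

ds = SquaresDataset(5)
items = list(ds)                         # iteration falls back to __getitem__
print(len(ds), ds[3], items)
```

Raising IndexError past the end is what lets Python stop iterating over an object that only defines __getitem__.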

PyTorch’s TensorDataset is a Dataset wrapping tensors. By defining a length and way of indexing, this also gives us a way to iterate, index, and slice along the first dimension of a tensor. This will make it easier to access both the independent and dependent variables in the same line as we train.

from torch.utils.data import TensorDataset

Both x_train and y_train can be combined in a single TensorDataset, which will be easier to iterate over and slice.

train_ds = TensorDataset(x_train, y_train)

Previously, we had to iterate through minibatches of x and y values separately:

xb = x_train[start_i:end_i]
yb = y_train[start_i:end_i]

Now, we can do these two steps together:

xb, yb = train_ds[i * bs : i * bs + bs]

model, opt = get_model()
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb, yb = train_ds[i * bs : i * bs + bs]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
print(loss_func(model(xb), yb))
tensor(0.0823, grad_fn=<NllLossBackward0>)

Refactor using DataLoader#

PyTorch’s DataLoader is responsible for managing batches. You can create a DataLoader from any Dataset. DataLoader makes it easier to iterate over batches. Rather than having to use train_ds[i*bs : i*bs+bs], the DataLoader gives us each minibatch automatically.

from torch.utils.data import DataLoader
train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs)

Previously, our loop iterated over batches (xb, yb) like this:

for i in range((n - 1) // bs + 1):
    xb, yb = train_ds[i * bs : i * bs + bs]
    pred = model(xb)

Now, our loop is much cleaner, as (xb, yb) are loaded automatically from the data loader:

for xb, yb in train_dl:
    pred = model(xb)

model, opt = get_model()
for epoch in range(epochs):
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
print(loss_func(model(xb), yb))
tensor(0.0806, grad_fn=<NllLossBackward0>)
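A quick way to see the batching at work is to run a DataLoader over a tiny made-up dataset whose size is not a multiple of the batch size.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.arange(10, dtype=torch.float32).reshape(10, 1)   # 10 made-up samples
y = torch.arange(10)
dl = DataLoader(TensorDataset(x, y), batch_size=4)

sizes = [len(xb) for xb, yb in dl]
print(len(dl), sizes)
```

With the default settings the last batch is simply smaller, matching what the manual slicing did.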

Thanks to PyTorch’s nn.Module, nn.Parameter, Dataset, and DataLoader, our training loop is now dramatically smaller and easier to understand. Let’s now try to add the basic features necessary to create effective models in practice.

Add validation#

So far, we were just trying to get a reasonable training loop set up for use on our training data. In reality, you should always also have a validation set, in order to identify if you are overfitting.

Shuffling the training data is important to prevent correlation between batches and overfitting. On the other hand, the validation loss will be identical whether we shuffle the validation set or not. Since shuffling takes extra time, it makes no sense to shuffle the validation data.

We’ll use a batch size for the validation set that is twice as large as that for the training set. This is because the validation set does not need backpropagation and thus takes less memory (it doesn’t need to store the gradients). We take advantage of this to use a larger batch size and compute the loss more quickly.

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)
valid_ds = TensorDataset(x_valid, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

We will calculate and print the validation loss at the end of each epoch.

(Note that we always call model.train() before training, and model.eval() before inference, because these are used by layers such as nn.BatchNorm2d and nn.Dropout to ensure appropriate behavior for these different phases.)

model, opt = get_model()
for epoch in range(epochs):
    model.train()
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
    model.eval()
    with torch.no_grad():
        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)
    print(epoch, valid_loss / len(valid_dl))
0 tensor(0.3716)
1 tensor(0.2822)
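Our logistic model has no layers that behave differently between the two modes, so the calls have no visible effect yet. As a sketch of why the habit matters, here is nn.Dropout applied to a made-up tensor of ones in each mode.

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                             # training mode: random zeroing + rescaling
train_out = drop(x)

drop.eval()                              # evaluation mode: identity
eval_out = drop(x)

print(train_out.unique(), torch.equal(eval_out, x))
```

In training mode surviving elements are scaled by 1 / (1 - p), so the values are either 0 or 2; in evaluation mode the input passes through unchanged.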

Create fit() and get_data()#

We’ll now do a little refactoring of our own. Since we go through a similar process twice of calculating the loss for both the training set and the validation set, let’s make that into its own function, loss_batch, which computes the loss for one batch.

We pass an optimizer in for the training set, and use it to perform backprop. For the validation set, we don’t pass an optimizer, so the method doesn’t perform backprop.

def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)

fit runs the necessary operations to train our model and compute the training and validation losses for each epoch.

import numpy as np
def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)
        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)
        print(epoch, val_loss)
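The reason loss_batch returns the batch size alongside the loss is that the last validation batch may be smaller than the others, so a plain average of batch means would over-weight it. A plain-Python sketch with made-up numbers shows the difference.

```python
# (mean loss, batch size) pairs as loss_batch would return them; values are made up.
batch_results = [(0.30, 128), (0.20, 128), (0.50, 64)]

losses, nums = zip(*batch_results)
weighted = sum(l * n for l, n in zip(losses, nums)) / sum(nums)
naive = sum(losses) / len(losses)        # ignores that the last batch is smaller
print(weighted, naive)
```

The weighted figure equals the mean loss over all 320 samples, which is what fit reports.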

get_data returns dataloaders for the training and validation sets.

def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

Now, our whole process of obtaining the data loaders and fitting the model can be run in 3 lines of code:

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
0 0.29553749036192895
1 0.29120843526124957

You can use these basic 3 lines of code to train a wide variety of models. Let’s see if we can use them to train a convolutional neural network (CNN)!

Switch to CNN#

We are now going to build our neural network with three convolutional layers. Because none of the functions in the previous section assume anything about the model form, we’ll be able to use them to train a CNN without any modification.

We will use PyTorch’s predefined Conv2d class as our convolutional layer. We define a CNN with 3 convolutional layers. Each convolution is followed by a ReLU. At the end, we perform an average pooling. (Note that view is PyTorch’s version of NumPy’s reshape.)

class Mnist_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)
    def forward(self, xb):
        xb = xb.view(-1, 1, 28, 28)
        xb = F.relu(self.conv1(xb))
        xb = F.relu(self.conv2(xb))
        xb = F.relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        return xb.view(-1, xb.size(1))
lr = 0.1
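The pooling size of 4 follows from how each stride-2 convolution shrinks the image. The sketch below applies the output-size formula from the Conv2d documentation (with dilation 1) three times; conv_out_size is a hypothetical helper used only for this calculation.

```python
def conv_out_size(size, kernel_size=3, stride=2, padding=1):
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

sizes = [28]
for _ in range(3):                       # three stride-2 convolutions
    sizes.append(conv_out_size(sizes[-1]))
print(sizes)
```

The final 4x4 grid is averaged down to a single value per channel, leaving 10 outputs per image, one for each digit class.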

Momentum is a variation on stochastic gradient descent that takes previous updates into account as well and generally leads to faster training.

model = Mnist_CNN()
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
0 0.33568966286182406
1 0.2266719504714012
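As a rough sketch of what "taking previous updates into account" means, here is the momentum rule written for a single scalar parameter. It follows the form documented for torch.optim.SGD with default dampening and without Nesterov momentum; sgd_momentum_step and the constant gradient are assumptions made only for this illustration.

```python
def sgd_momentum_step(param, grad, buf, lr=0.1, momentum=0.9):
    """One scalar update: buf accumulates past gradients, param follows buf."""
    buf = momentum * buf + grad
    return param - lr * buf, buf

param, buf = 1.0, 0.0
history = []
for _ in range(3):
    grad = 1.0                           # made-up constant gradient
    param, buf = sgd_momentum_step(param, grad, buf)
    history.append(round(param, 4))
print(history)
```

With a constant gradient the step size grows from 0.1 to 0.19 to 0.271, which is the speed-up momentum provides when successive gradients agree.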

Using nn.Sequential#

torch.nn has another handy class we can use to simplify our code: Sequential. A Sequential object runs each of the modules contained within it, in a sequential manner. This is a simpler way of writing our neural network.

To take advantage of this, we need to be able to easily define a custom layer from a given function. For instance, PyTorch doesn’t have a view layer, and we need to create one for our network. Lambda will create a layer that we can then use when defining a network with Sequential.

class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func
    def forward(self, x):
        return self.func(x)
def preprocess(x):
    return x.view(-1, 1, 28, 28)

The model created with Sequential is simple:

model = nn.Sequential(
    Lambda(preprocess),
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(4),
    Lambda(lambda x: x.view(x.size(0), -1)),
)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
0 0.30788119983673096
1 0.21302276641130446
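For the final flattening step specifically, torch.nn also provides an nn.Flatten layer that could stand in for the last Lambda. This is an optional alternative rather than what the tutorial’s model uses; the sketch compares the two on a made-up tensor shaped like the pooling output.

```python
import torch
from torch import nn

x = torch.randn(6, 10, 1, 1)             # made-up output of the pooling layer

via_view = x.view(x.size(0), -1)         # what the last Lambda layer does
via_flatten = nn.Flatten()(x)            # built-in layer, flattens all dims after the first
identical = torch.equal(via_view, via_flatten)
print(via_view.shape, identical)
```

The general Lambda wrapper remains useful for any other function that has no ready-made layer.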

Wrapping DataLoader#

Our CNN is fairly concise, but it only works with MNIST, because:
  • It assumes the input is a 28*28 long vector

  • It assumes that the final CNN grid size is 4*4 (since that’s the average pooling kernel size we used)

Let’s get rid of these two assumptions, so our model works with any 2d single channel image. First, we can remove the initial Lambda layer by moving the data preprocessing into a generator:

def preprocess(x, y):
    return x.view(-1, 1, 28, 28), y
class WrappedDataLoader:
    def __init__(self, dl, func):
        self.dl = dl
        self.func = func
    def __len__(self):
        return len(self.dl)
    def __iter__(self):
        for b in self.dl:
            yield (self.func(*b))
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)

Next, we can replace nn.AvgPool2d with nn.AdaptiveAvgPool2d, which allows us to define the size of the output tensor we want, rather than the input tensor we have. As a result, our model will work with any size input.

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    Lambda(lambda x: x.view(x.size(0), -1)),
)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

Let’s try it out:

fit(epochs, model, loss_func, opt, train_dl, valid_dl)
0 0.34111052169799805
1 0.24150483466386796
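To make the "any size input" claim concrete, the sketch below feeds nn.AdaptiveAvgPool2d two made-up feature maps of different spatial sizes and checks the output shapes.

```python
import torch
from torch import nn

pool = nn.AdaptiveAvgPool2d(1)           # always produce a 1x1 grid per channel

small = pool(torch.randn(2, 10, 4, 4))   # e.g. the 4x4 grid from MNIST
large = pool(torch.randn(2, 10, 9, 13))  # a made-up, non-square feature map
print(small.shape, large.shape)
```

Both results have one value per channel, so the flattening layer that follows always sees 10 features per image.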

Using your Accelerator#

If you’re lucky enough to have access to an accelerator such as CUDA (you can rent one for about $0.50/hour from most cloud providers) you can use it to speed up your code. First check that your accelerator is working in PyTorch:

# If the current accelerator is available, we will use it. Otherwise, we use the CPU.
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")
Using cuda device

Let’s update preprocess to move batches to the accelerator:

def preprocess(x, y):
    return x.view(-1, 1, 28, 28).to(device), y.to(device)
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)

Finally, we can move our model to the accelerator.

model.to(device)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

You should find it runs faster now:

fit(epochs, model, loss_func, opt, train_dl, valid_dl)
0 0.21927833243608474
1 0.2120027590751648

Closing thoughts#

We now have a general data pipeline and training loop which you can use for training many types of models using PyTorch. To see how simple training a model can now be, take a look at the mnist_sample notebook.

Of course, there are many things you’ll want to add, such as data augmentation, hyperparameter tuning, monitoring training, transfer learning, and so forth. These features are available in the fastai library, which has been developed using the same design approach shown in this tutorial, providing a natural next step for practitioners looking to take their models further.

We promised at the start of this tutorial we’d explain through example each of torch.nn, torch.optim, Dataset, and DataLoader. So let’s summarize what we’ve seen:

  • torch.nn:

    • Module: creates a callable which behaves like a function, but can also contain state (such as neural net layer weights). It knows what Parameter(s) it contains and can zero all their gradients, loop through them for weight updates, etc.

    • Parameter: a wrapper for a tensor that tells a Module that it has weights that need updating during backprop. Only tensors with the requires_grad attribute set are updated.

    • functional: a module (usually imported into the F namespace by convention) which contains activation functions, loss functions, etc., as well as non-stateful versions of layers such as convolutional and linear layers.

  • torch.optim: Contains optimizers such as SGD, which update the weights of each Parameter after the backward step.

  • Dataset: An abstract interface of objects with a __len__ and a __getitem__, including classes provided with PyTorch such as TensorDataset.

  • DataLoader: Takes any Dataset and creates an iterator which returns batches of data.
