Posted on Mar 6, 2021 • Originally published at educative.io

PyTorch tutorial: a quick guide for new learners

#python #datascience #deeplearning #tutorial

Python is well-established as the go-to language for data science and machine learning, partially thanks to the open-source ML library PyTorch.

PyTorch's combination of powerful deep neural network building tools and ease of use makes it a popular choice for data scientists. As its popularity grows, more and more companies are moving from TensorFlow to PyTorch, making now the best time to get started with PyTorch.

Today, we'll help you understand what makes PyTorch so popular, cover some basics of using PyTorch, and help you build your first computational models.

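Here’s what we’ll cover today:

What is PyTorch?
Why use PyTorch?
PyTorch vs. TensorFlow
PyTorch Basics
Computation Graphs with PyTorch
Hands-on with PyTorch: Multi-path computation graph
Next steps for your learning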

What is PyTorch?

PyTorch is an open-source machine learning Python library used for deep learning implementations like computer vision (using TorchVision) and natural language processing. It was developed by Facebook's AI research lab (FAIR) in 2016 and has since been adopted across the fields of data science and ML.

PyTorch makes machine learning intuitive for those already familiar with Python and has great features like OOP support and dynamic computation graphs.

Along with building deep neural networks, PyTorch is also great for complicated mathematical computations because of its GPU acceleration. This feature allows PyTorch to use your computer's GPU to massively speed up computations.

This combination of unique features and PyTorch's unparalleled simplicity makes it one of the most popular deep learning libraries, only rivaled by TensorFlow for the top spot.

Why use PyTorch?

Before ML libraries like PyTorch, developers had to work out by hand the calculus that relates back-propagated errors to node weights. Deeper neural networks called for more and more complicated derivations, which restricted machine learning in scale and approachability.

Now, we can use ML libraries to complete all that calculus automatically! They can compute the gradients for a network of almost any size or shape, allowing more developers to build bigger and better networks.

PyTorch takes this accessibility one step further by behaving like standard Python. Instead of learning a new syntax, you can use existing Python knowledge to get started fast. Further, you can use standard Python tooling with PyTorch, such as popular debuggers like the PyCharm debugger.

PyTorch vs. TensorFlow

The main difference between PyTorch and TensorFlow is a tradeoff between simplicity and performance: PyTorch is easier to learn (especially for Python programmers), while TensorFlow has a steeper learning curve but performs better and is more widely used.

Popularity: TensorFlow is the current go-to tool for industry professionals and researchers, in part because it was released a year before PyTorch. However, PyTorch's user base is growing at a faster rate than TensorFlow's, suggesting that PyTorch may soon be the most popular.
Data parallelism: PyTorch includes declarative data parallelism; in other words, it automatically spreads the workload of data processing across different GPUs to speed up performance. TensorFlow has parallelism, but it requires you to assign work manually, which is often time-consuming and less efficient.
Dynamic vs. Static Graphs: PyTorch has dynamic graphs by default that respond to new data immediately. TensorFlow 1.x mostly uses static graphs, with limited support for dynamic graphs through TensorFlow Fold; TensorFlow 2.x adds eager execution, which is enabled by default.
Integrations: PyTorch is good to use for projects on AWS because of its close connection through TorchServe. TensorFlow is well integrated with Google Cloud and is suited for mobile applications due to its Swift API.
Visualization: TensorFlow has more robust visualization tools and offers you finer control over graph settings. PyTorch's Visdom visualization tool or other standard plotting libraries like matplotlib are not as fully featured as TensorFlow's tools, but they're easier to learn.
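
To make the data parallelism point above more concrete, here is a minimal sketch using PyTorch's nn.DataParallel wrapper. The model and batch sizes are arbitrary choices for illustration, and the wrapper is only applied when more than one GPU is visible, so the same script should also run on a CPU-only machine.

```python
import torch
import torch.nn as nn

# A tiny stand-in model; the layer sizes are arbitrary for this sketch.
model = nn.Linear(8, 2)

# Wrap the model only when more than one GPU is visible.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # splits each batch across the GPUs

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

batch = torch.rand(16, 8, device=device)  # 16 samples, 8 features each
output = model(batch)
print(output.shape)  # one 2-value output per sample
```

The rest of the code does not need to know whether the model was wrapped, which is what makes this style of parallelism feel declarative.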

PyTorch Basics

Tensors

PyTorch tensors are multidimensional array variables used as the foundation for all advanced operations. Unlike standard numeric types, tensors can be assigned to use either your CPU or GPU to speed up operations.

They're similar to an n-dimensional NumPy array and can even be converted to a NumPy array in just a single line.
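
As a quick illustration of that conversion, the sketch below moves data in both directions. The variable names are made up for this example, and it assumes NumPy is installed alongside PyTorch.

```python
import numpy as np
import torch

tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Tensor -> NumPy array in a single line (works for CPU tensors)
array = tensor.numpy()

# NumPy array -> tensor
back = torch.from_numpy(np.array([5.0, 6.0]))

print(type(array), array.shape)
print(back.dtype)  # NumPy's default float64 carries over to the tensor
```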

Five of the most commonly used tensor types are:

FloatTensor: 32-bit float
DoubleTensor: 64-bit float
HalfTensor: 16-bit float
IntTensor: 32-bit int
LongTensor: 64-bit int
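
If you want to confirm the sizes in the list above, one option is to convert a tensor to each type and ask how many bytes each element takes. This is only a sketch; the dictionary labels are made up for readability.

```python
import torch

x = torch.tensor([1.0, 2.0])  # float literals give the default 32-bit float

# element_size() reports the bytes used per element
sizes = {
    "float (FloatTensor)": x.element_size(),
    "double (DoubleTensor)": x.double().element_size(),
    "half (HalfTensor)": x.half().element_size(),
    "int (IntTensor)": x.int().element_size(),
    "long (LongTensor)": x.long().element_size(),
}
for name, n_bytes in sizes.items():
    print(name, n_bytes * 8, "bits")
```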

As with all numeric types, you want to use the smallest type that fits your needs to save memory. PyTorch uses FloatTensor as the default type for floating-point tensors, but you can change this using:

torch.set_default_tensor_type(t)

To initialize two FloatTensors:

import torch

# initializing tensors; float literals produce the default FloatTensor type
a = torch.tensor(2.0)
b = torch.tensor(1.0)

Tensors can be used like other numeric types in simple mathematical operations.

# addition
print(a + b)
# subtraction
print(b - a)
# multiplication
print(a * b)
# division
print(a / b)

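You can also move tensors to be handled by the GPU using cuda.

x = torch.tensor(2.0)
y = torch.tensor(1.0)

# move both tensors to the GPU when one is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()

print(x + y)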
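
A common variation on the GPU check above is to pick a device once and send every tensor to it with .to(). The sketch below assumes nothing beyond a working PyTorch install and falls back to the CPU when no GPU is found.

```python
import torch

# Pick the GPU when one is present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor(2.0).to(device)
y = torch.tensor(1.0).to(device)

total = x + y  # computed on whichever device holds x and y
print(total.device.type, total.item())
```

Writing the code this way keeps a single script usable on machines with and without a GPU.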

As tensors are matrices in PyTorch, you can set tensors to represent a table of numbers:

ones_tensor = torch.ones((2, 2))  # tensor containing all ones
rand_tensor = torch.rand((2, 2))  # tensor containing random values

Here, we're specifying that our tensor should be a 2x2 square. The square is populated with either all ones when using the ones() function or random numbers when using the rand() function.

Neural Networks

PyTorch is commonly used to build neural networks, especially classification models such as image classifiers built on convolutional neural networks (CNNs).

Neural networks are layers of connected and weighted data nodes. Each layer allows the model to home in on which classification the input data most closely matches.

Neural networks are only as good as their training and therefore need big datasets. GAN frameworks can help here by generating more challenging training data based on what the model has already mastered.

PyTorch defines neural networks using the torch.nn package, which contains a set of modules to represent each layer of a network.

Each module receives input tensors and computes output tensors, which work together to create the network. The torch.nn package also defines loss functions that we use to train neural networks.

The steps to building a neural network are:

Construction: Create neural network layers, set up parameters, establish weights and biases.
Forward Propagation: Calculate the predicted output using your parameters. Measure error by comparing predicted and actual output.
Back-propagation: After finding the error, take the derivative of the error function in terms of the parameters of our neural network. Backward propagation allows us to update our weight parameters.
Iterative Optimization: Minimize errors by using optimizers that update parameters through iteration using gradient descent.
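
The four steps above can be sketched as a short training loop. This is a minimal illustration, not a recommended recipe: the toy data (points on the line y = 2x + 1), the single linear layer, the learning rate, and the number of iterations are all assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # make the run repeatable

# Toy data (an assumption for this sketch): points on the line y = 2x + 1
inputs = torch.linspace(-1, 1, 20).unsqueeze(1)  # shape (20, 1)
targets = 2 * inputs + 1

# 1. Construction: one linear layer holds a weight and a bias
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(200):
    # 2. Forward propagation: predict, then measure the error
    predictions = model(inputs)
    loss = loss_fn(predictions, targets)

    # 3. Back-propagation: clear old gradients, then compute new ones
    optimizer.zero_grad()
    loss.backward()

    # 4. Iterative optimization: nudge the parameters downhill
    optimizer.step()
    losses.append(loss.item())

print("first loss:", losses[0])
print("last loss:", losses[-1])
```

The loss recorded on the last pass should be far smaller than the first, which is the loop doing its job.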

Here's an example of a neural network in PyTorch:

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

Subclassing nn.Module designates that this will be a neural network. We then define it with 2 Conv2d layers, which perform a 2D convolution, and 3 Linear layers, which perform linear transformations.

Next, we define a forward method to outline how to do forward propagation. We don't need to define a backward propagation method because PyTorch's autograd provides a backward() function by default.

Don't worry if this seems confusing right now; we'll cover simpler PyTorch implementations later in this tutorial.
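
A much smaller network can make the split between the hand-written forward pass and the automatic backward pass easier to see. The sketch below is an illustration only: the class name TinyNet and its layer sizes are hypothetical and do not come from the example above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):  # hypothetical name, used only in this sketch
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 3)  # 4 input features -> 3 hidden units
        self.fc2 = nn.Linear(3, 1)  # 3 hidden units -> 1 output

    def forward(self, x):
        # Only the forward pass is written by hand
        return self.fc2(F.relu(self.fc1(x)))


net = TinyNet()
batch = torch.rand(5, 4)  # 5 samples with 4 features each
output = net(batch)       # calling the module runs forward()
output.sum().backward()   # autograd supplies the backward pass

print(output.shape)
print(net.fc1.weight.grad.shape)  # gradients now exist for the weights
```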

Autograd

Autograd is a PyTorch package used to calculate derivatives essential for neural network operations. These derivatives are called gradients. During a forward pass, autograd records all operations on a gradient-enabled tensor and creates a directed acyclic graph to find the relationship between the tensor and all operations. This operation collection is called automatic differentiation.

The leaves of this graph are input tensors, and the roots are output tensors. Autograd calculates the gradient by tracing the graph from the root to the leaf and multiplying every gradient along the way using the chain rule.

After calculating the gradient, the value of the derivative is automatically populated as a grad attribute of the tensor.

import torch

# pytorch tensor
x = torch.tensor(3.5, requires_grad=True)

# y is defined as a function of x
y = (x - 1) * (x - 2) * (x - 3)

# work out gradients
y.backward()

By default, requires_grad is set to False and PyTorch will not track gradients. Specifying requires_grad as True during initialization will make PyTorch track gradients for this particular tensor whenever we perform some operation on it.

This code looks at y, sees that it came from (x-1) * (x-2) * (x-3), and automatically works out the gradient dy/dx = 3x^2 - 12x + 11.

The instruction also works out the numerical value of that gradient and places it inside the tensor x, as x.grad, alongside the actual value of x, 3.5.

All together, the gradient is 3 * (3.5 * 3.5) - 12 * (3.5) + 11 = 5.75.
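
To check that arithmetic, the sketch below inspects the graph before and after the backward pass and compares autograd's answer with the hand-derived formula. The variable by_hand is introduced only for this comparison.

```python
import torch

x = torch.tensor(3.5, requires_grad=True)
y = (x - 1) * (x - 2) * (x - 3)

print(x.is_leaf)              # x is an input, so it is a leaf of the graph
print(y.grad_fn is not None)  # y remembers the operation that produced it
print(x.grad)                 # no gradient has been computed yet

y.backward()

# Compare autograd's answer with the hand-derived 3x^2 - 12x + 11
by_hand = 3 * 3.5 ** 2 - 12 * 3.5 + 11
print(x.grad.item(), by_hand)
```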

Gradients accumulate by default, which could distort the result if they are not reset. Use model.zero_grad() (or optimizer.zero_grad()) to reset the gradients to zero before each new backward pass.
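
Here is a small sketch of what accumulation looks like in practice. It uses a single tensor rather than a full model, so the reset is done directly on the tensor's grad attribute with zero_().

```python
import torch

x = torch.tensor(3.5, requires_grad=True)

# First backward pass: dy/dx of x*x is 2x = 7
(x * x).backward()
first = x.grad.item()

# Second backward pass without a reset: the new 7 is added to the old 7
(x * x).backward()
accumulated = x.grad.item()

# Reset in place, then run the pass again
x.grad.zero_()
(x * x).backward()
after_reset = x.grad.item()

print(first, accumulated, after_reset)
```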

Optimizers

Optimizers allow you to update the weights and biases within a model to reduce error. This allows you to edit how your model works without having to remake the whole thing.

All PyTorch optimizers are contained in the torch.optim package, with each optimization scheme designed to be useful in specific situations. The torch.optim module allows you to build an abstract optimization scheme by just passing a list of params. PyTorch has many optimizers to choose from, meaning there's almost always one that fits your needs.

For example, we can implement the common optimization algorithm SGD (Stochastic Gradient Descent) to reduce our model's error.

import torch
import torch.optim as optim

params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-3

# SGD
optimizer = optim.SGD([params], lr=learning_rate)

After computing gradients with backward(), call optimizer.step() to tell PyTorch to update the model's parameters.

Without using optimizers, we would need to manually update the model parameters one by one using a loop:

# in-place updates must run without gradient tracking
with torch.no_grad():
    for param in model.parameters():
        param -= param.grad * learning_rate

Overall, optimizers save a lot of time by updating your model's weights for you, so you can tune your model without remaking it.
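
To see that plain SGD and the manual loop do the same thing, the sketch below applies one update each way and compares the results. The loss function is made up for illustration, and the equivalence shown holds for basic SGD without extras such as momentum.

```python
import torch
import torch.optim as optim

learning_rate = 1e-3

# Two copies of the same starting parameters
auto_params = torch.tensor([1.0, 0.0], requires_grad=True)
manual_params = torch.tensor([1.0, 0.0], requires_grad=True)


def loss_fn(p):
    # A made-up loss for illustration: squared distance from 3
    return ((p - 3.0) ** 2).sum()


# Update 1: let the optimizer apply the step
optimizer = optim.SGD([auto_params], lr=learning_rate)
optimizer.zero_grad()
loss_fn(auto_params).backward()
optimizer.step()

# Update 2: the same step written out by hand
loss_fn(manual_params).backward()
with torch.no_grad():
    manual_params -= manual_params.grad * learning_rate

print(auto_params)
print(manual_params)
```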

Computation Graphs with PyTorch

To better understand PyTorch and neural networks, it's important to practice with computation graphs. These graphs are essentially a simplified version of neural networks with a sequence of operations used to see how the output of a system is affected by the input.

In other words, input x is used to find y, which is then used to find the output z.

Imagine that y and z are calculated as follows:

y = x^2

z = 2y + 3

However, we're interested in how output z changes with input x, so we'll need to do some calculus:

dz/dx = (dz/dy) * (dy/dx)

dz/dx = 2 * 2x

dz/dx = 4x

Using this, we can see that at input x = 3.5 the gradient dz/dx is 14.

Knowing how to define each tensor in terms of the others (y and z in terms of x, z in terms of y, etc.) allows PyTorch to build a picture of how these tensors are connected.

This picture is called a computation graph and can help us understand how PyTorch works behind the scenes.

Using this graph, we can see how each tensor will be affected by a change in any other tensor. These relationships are gradients and are used to update a neural network during training.

These graphs are much easier to do using PyTorch than by hand, so let's try it now that we understand what's happening behind the scenes.

import torch

# set up simple graph relating x, y and z
x = torch.tensor(3.5, requires_grad=True)
y = x * x
z = 2 * y + 3

print("x:", x)
print("y = x*x:", y)
print("z= 2*y + 3:", z)

# work out gradients
z.backward()

print("Working out gradients dz/dx")

# what is gradient at x = 3.5
print("Gradient at x = 3.5:", x.grad)

This finds that the gradient dz/dx is 14 at x = 3.5, just as we found by hand above!

Hands-on with PyTorch: Multi-path computation graph

Now that you've seen a computation graph with a single set of relationships, let's try a more complex example.

First, define two tensors, a and b, to function as our inputs. Make sure to set requires_grad=True so we can compute gradients down the line.

import torch

# set up the input tensors a and b
a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)

Next, set up the relationships between our input and each layer of our neural network, x, y, and z. Notice that z is defined in terms of x and y, while x and y are defined using our input values a and b.

import torch

# set up simple graph relating x, y and z
a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)

x = 2 * a + 3 * b
y = 5 * a * a + 3 * b * b * b
z = 2 * x + 3 * y

This builds a chain of relationships that PyTorch can follow to understand all the relationships between data.

We can now work out the gradient dz/da by following the path back from z to a.

There are two paths, one going through x and the other through y. You should follow them both and add the expressions from both paths together. This makes sense because both paths from a to z contribute to the value of z.

We'd have found the same result if we had worked out dz/da using the chain rule of calculus.

The first path through x gives us 2 * 2 and the second path through y gives us 3 * 10a. So, the rate at which z changes with a is 4 + 30a.

If a is 2, then dz/da is 4 + 30 * 2 = 64.

We can confirm this in PyTorch by adding a backward propagation fromz then asking for the gradient (or derivative) ofa.

import torch

# set up simple graph relating x, y and z
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

x = 2 * a + 3 * b
y = 5 * a * a + 3 * b * b * b
z = 2 * x + 3 * y

print("a:", a)
print("b:", b)
print("x:", x)
print("y:", y)
print("z:", z)

# work out gradients
z.backward()

print("Working out gradient dz/da")

# what is gradient at a = 2.0
print("Gradient at a=2.0:", a.grad)
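
The same two-path reasoning should also work for the other input, b. As a sketch that goes one step beyond the walkthrough above: the path through x contributes 2 * 3 and the path through y contributes 3 * 9b^2, so dz/db = 6 + 27b^2, which is 33 at b = 1. The variable by_hand_db is introduced only for this check.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

x = 2 * a + 3 * b
y = 5 * a * a + 3 * b * b * b
z = 2 * x + 3 * y

z.backward()  # one backward pass fills in the gradient for every input

# Path through x gives 2 * 3; path through y gives 3 * 9b^2
by_hand_db = 2 * 3 + 3 * 9 * 1.0 ** 2
print(a.grad.item(), b.grad.item(), by_hand_db)
```

A single call to backward() populates both a.grad and b.grad, so there is no need to repeat the pass for each input.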

Next steps for your learning

Congratulations, you've now completed your quick start to PyTorch and Neural Networks. Completing a computational graph is an essential part of understanding deep learning networks.

As you learn advanced deep learning skills and applications, you'll want to explore:

Complex neural networks with optimization
Visualization design
Training with GANs

To help you get comfortable with real-world deep learning projects, Educative has made Make Your First GAN Using PyTorch. This course gives you a crash course in creating neural networks with PyTorch and teaches you to enhance their accuracy with GANs.

By the end, you'll have hands-on practice creating and optimizing industry-ready networks from start to finish.

Happy learning!