Introduction to PyTorch Tensors#
Created On: Nov 30, 2021 | Last Updated: Sep 22, 2025 | Last Verified: Nov 05, 2024
Follow along with the video below or on YouTube.
Tensors are the central data abstraction in PyTorch. This interactive notebook provides an in-depth introduction to the torch.Tensor class.
First things first, let’s import the PyTorch module. We’ll also add Python’s math module to facilitate some of the examples.
import torch
import math
Creating Tensors#
The simplest way to create a tensor is with the torch.empty() call:
x = torch.empty(3, 4)
print(type(x))
print(x)
<class 'torch.Tensor'>
tensor([[2.7272e+14, 4.5762e-41, 2.7272e+14, 4.5762e-41], [2.7272e+14, 4.5762e-41, 2.7273e+14, 4.5762e-41], [2.7273e+14, 4.5762e-41, 2.7273e+14, 4.5762e-41]])
Let’s unpack what we just did:
We created a tensor using one of the numerous factory methods attached to the torch module.
The tensor itself is 2-dimensional, having 3 rows and 4 columns.
The type of the object returned is torch.Tensor, which is an alias for torch.FloatTensor; by default, PyTorch tensors are populated with 32-bit floating point numbers. (More on data types below.)
You will probably see some random-looking values when printing your tensor. The torch.empty() call allocates memory for the tensor, but does not initialize it with any values - so what you’re seeing is whatever was in memory at the time of allocation.
A brief note about tensors and their number of dimensions, and terminology:
You will sometimes see a 1-dimensional tensor called a vector.
Likewise, a 2-dimensional tensor is often referred to as a matrix.
Anything with more than two dimensions is generally just called a tensor.
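To make the terminology concrete, here is a minimal sketch that builds one tensor of each kind and reads its number of dimensions from the ndim attribute. The variable names are illustrative assumptions, not part of the original notebook.

```python
import torch

# Illustrative names; the shapes are arbitrary.
vector = torch.zeros(4)         # 1-dimensional: a "vector"
matrix = torch.zeros(3, 4)      # 2-dimensional: a "matrix"
volume = torch.zeros(2, 3, 4)   # 3-dimensional: generally just a "tensor"

for name, t in [("vector", vector), ("matrix", matrix), ("volume", volume)]:
    # ndim is the number of dimensions; shape lists the extent of each one
    print(name, t.ndim, tuple(t.shape))
```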
More often than not, you’ll want to initialize your tensor with some value. Common cases are all zeros, all ones, or random values, and the torch module provides factory methods for all of these:
zeros = torch.zeros(2, 3)
print(zeros)
ones = torch.ones(2, 3)
print(ones)
torch.manual_seed(1729)
random = torch.rand(2, 3)
print(random)
tensor([[0., 0., 0.], [0., 0., 0.]])
tensor([[1., 1., 1.], [1., 1., 1.]])
tensor([[0.3126, 0.3791, 0.3087], [0.0736, 0.4216, 0.0691]])
The factory methods all do just what you’d expect - we have a tensor full of zeros, another full of ones, and another with random values between 0 and 1.
Random Tensors and Seeding#
Speaking of the random tensor, did you notice the call to torch.manual_seed() immediately preceding it? Initializing tensors, such as a model’s learning weights, with random values is common, but there are times - especially in research settings - where you’ll want some assurance of the reproducibility of your results. Manually setting your random number generator’s seed is the way to do this. Let’s look more closely:
torch.manual_seed(1729)
random1 = torch.rand(2, 3)
print(random1)
random2 = torch.rand(2, 3)
print(random2)
torch.manual_seed(1729)
random3 = torch.rand(2, 3)
print(random3)
random4 = torch.rand(2, 3)
print(random4)
tensor([[0.3126, 0.3791, 0.3087], [0.0736, 0.4216, 0.0691]])
tensor([[0.2332, 0.4047, 0.2162], [0.9927, 0.4128, 0.5938]])
tensor([[0.3126, 0.3791, 0.3087], [0.0736, 0.4216, 0.0691]])
tensor([[0.2332, 0.4047, 0.2162], [0.9927, 0.4128, 0.5938]])
What you should see above is that random1 and random3 carry identical values, as do random2 and random4. Manually setting the RNG’s seed resets it, so that identical computations depending on random numbers should, in most settings, provide identical results.
For more information, see the PyTorch documentation on reproducibility.
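The same idea can be sketched with an explicit torch.Generator object instead of the global seed. The original notebook does not use this approach; it is shown as an assumption about how you might keep a reproducible stream local to one piece of code. The names gen_a, gen_b, sample_a and sample_b are illustrative.

```python
import torch

# Two independent generators seeded with the same value
gen_a = torch.Generator().manual_seed(1729)
gen_b = torch.Generator().manual_seed(1729)

# Passing a generator makes torch.rand draw from it rather than the global RNG
sample_a = torch.rand(2, 3, generator=gen_a)
sample_b = torch.rand(2, 3, generator=gen_b)

# Same seed, same sequence of draws: the two tensors hold identical values
print(torch.equal(sample_a, sample_b))
```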
Tensor Shapes#
Often, when you’re performing operations on two or more tensors, they will need to be of the same shape - that is, having the same number of dimensions and the same number of cells in each dimension. For that, we have the torch.*_like() methods:
x = torch.empty(2, 2, 3)
print(x.shape)
print(x)
empty_like_x = torch.empty_like(x)
print(empty_like_x.shape)
print(empty_like_x)
zeros_like_x = torch.zeros_like(x)
print(zeros_like_x.shape)
print(zeros_like_x)
ones_like_x = torch.ones_like(x)
print(ones_like_x.shape)
print(ones_like_x)
rand_like_x = torch.rand_like(x)
print(rand_like_x.shape)
print(rand_like_x)
torch.Size([2, 2, 3])
tensor([[[-7.4134e-14, 3.0931e-41, 0.0000e+00], [ 1.4013e-45, 8.9683e-44, 0.0000e+00]], [[ 1.1210e-43, 0.0000e+00, -5.4490e-14], [ 3.0931e-41, 1.4013e-45, 0.0000e+00]]])
torch.Size([2, 2, 3])
tensor([[[-5.3096e-14, 3.0931e-41, 1.4013e-45], [ 0.0000e+00, 1.4013e-45, 0.0000e+00]], [[ 1.4013e-45, 0.0000e+00, 1.4013e-45], [ 0.0000e+00, 1.4013e-45, 0.0000e+00]]])
torch.Size([2, 2, 3])
tensor([[[0., 0., 0.], [0., 0., 0.]], [[0., 0., 0.], [0., 0., 0.]]])
torch.Size([2, 2, 3])
tensor([[[1., 1., 1.], [1., 1., 1.]], [[1., 1., 1.], [1., 1., 1.]]])
torch.Size([2, 2, 3])
tensor([[[0.6128, 0.1519, 0.0453], [0.5035, 0.9978, 0.3884]], [[0.6929, 0.1703, 0.1384], [0.4759, 0.7481, 0.0361]]])
The first new thing in the code cell above is the use of the .shape property on a tensor. This property contains a list of the extent of each dimension of a tensor - in our case, x is a three-dimensional tensor with shape 2 x 2 x 3.
Below that, we call the .empty_like(), .zeros_like(), .ones_like(), and .rand_like() methods. Using the .shape property, we can verify that each of these methods returns a tensor of identical dimensionality and extent.
The last way to create a tensor that we will cover is to specify its data directly from a Python collection:
some_constants = torch.tensor([[3.1415926, 2.71828], [1.61803, 0.0072897]])
print(some_constants)
some_integers = torch.tensor((2, 3, 5, 7, 11, 13, 17, 19))
print(some_integers)
more_integers = torch.tensor(((2, 4, 6), [3, 6, 9]))
print(more_integers)
tensor([[3.1416, 2.7183], [1.6180, 0.0073]])
tensor([ 2, 3, 5, 7, 11, 13, 17, 19])
tensor([[2, 4, 6], [3, 6, 9]])
Using torch.tensor() is the most straightforward way to create a tensor if you already have data in a Python tuple or list. As shown above, nesting the collections will result in a multi-dimensional tensor.
Note
torch.tensor() creates a copy of the data.
Tensor Data Types#
Setting the datatype of a tensor is possible in a couple of ways:
a = torch.ones((2, 3), dtype=torch.int16)
print(a)
b = torch.rand((2, 3), dtype=torch.float64) * 20.
print(b)
c = b.to(torch.int32)
print(c)
tensor([[1, 1, 1], [1, 1, 1]], dtype=torch.int16)
tensor([[ 0.9956, 1.4148, 5.8364], [11.2406, 11.2083, 11.6692]], dtype=torch.float64)
tensor([[ 0, 1, 5], [11, 11, 11]], dtype=torch.int32)
The simplest way to set the underlying data type of a tensor is with an optional argument at creation time. In the first line of the cell above, we set dtype=torch.int16 for the tensor a. When we print a, we can see that it’s full of 1 rather than 1. - PyTorch’s subtle cue that this is an integer type rather than floating point.
Another thing to notice about printing a is that, unlike when we left dtype as the default (32-bit floating point), printing the tensor also specifies its dtype.
You may have also spotted that we went from specifying the tensor’s shape as a series of integer arguments, to grouping those arguments in a tuple. This is not strictly necessary - PyTorch will take a series of initial, unlabeled integer arguments as a tensor shape - but when adding the optional arguments, it can make your intent more readable.
The other way to set the datatype is with the .to() method. In the cell above, we create a random floating point tensor b in the usual way. Following that, we create c by converting b to a 32-bit integer with the .to() method. Note that c contains all the same values as b, but truncated to integers.
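A small sketch of what "truncated" means in practice: converting floating point values to an integer dtype drops the fractional part rather than rounding to the nearest integer. The input values and the names floats and ints are illustrative assumptions.

```python
import torch

# Hand-picked values so the effect of the conversion is easy to see
floats = torch.tensor([1.9, -1.9, 0.5, 11.67], dtype=torch.float64)

# .to() returns a new tensor with the requested dtype; floats is left unchanged
ints = floats.to(torch.int32)

print(floats.dtype, ints.dtype)
print(ints.tolist())  # fractional parts are dropped, not rounded
```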
For more information, see the data types documentation.
Math & Logic with PyTorch Tensors#
Now that you know some of the ways to create a tensor… what can you do with them?
Let’s look at basic arithmetic first, and how tensors interact with simple scalars:
ones = torch.zeros(2, 2) + 1
twos = torch.ones(2, 2) * 2
threes = (torch.ones(2, 2) * 7 - 1) / 2
fours = twos ** 2
sqrt2s = twos ** 0.5
print(ones)
print(twos)
print(threes)
print(fours)
print(sqrt2s)
tensor([[1., 1.], [1., 1.]])
tensor([[2., 2.], [2., 2.]])
tensor([[3., 3.], [3., 3.]])
tensor([[4., 4.], [4., 4.]])
tensor([[1.4142, 1.4142], [1.4142, 1.4142]])
As you can see above, arithmetic operations between tensors and scalars, such as addition, subtraction, multiplication, division, and exponentiation are distributed over every element of the tensor. Because the output of such an operation will be a tensor, you can chain them together with the usual operator precedence rules, as in the line where we create threes.
Similar operations between two tensors also behave like you’d intuitively expect:
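A minimal sketch of a cell consistent with the output that follows. It rebuilds ones, twos, threes and fours from the previous cell so that it runs on its own; the names powers2, fives and dozens are assumptions.

```python
import torch

# Rebuilt from the previous cell so this sketch is self-contained
ones = torch.zeros(2, 2) + 1
twos = torch.ones(2, 2) * 2
threes = (torch.ones(2, 2) * 7 - 1) / 2
fours = twos ** 2

# Element-wise operations between two tensors of identical shape
powers2 = twos ** torch.tensor([[1, 2], [3, 4]])  # 2**1, 2**2, 2**3, 2**4
print(powers2)

fives = ones + fours     # 1 + 4 in every cell
print(fives)

dozens = threes * fours  # 3 * 4 in every cell
print(dozens)
```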
tensor([[ 2., 4.], [ 8., 16.]])
tensor([[5., 5.], [5., 5.]])
tensor([[12., 12.], [12., 12.]])
It’s important to note here that all of the tensors in the previous code cell were of identical shape. What happens when we try to perform a binary operation on tensors of dissimilar shape?
Note
The following cell throws a run-time error. This is intentional.
a = torch.rand(2, 3)
b = torch.rand(3, 2)
print(a * b)
In the general case, you cannot operate on tensors of different shape this way, even in a case like the cell above, where the tensors have an identical number of elements.
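If you would rather see the failure without stopping the notebook, one option is to catch the exception. This is only a sketch: the failed flag is an illustrative name, and the exact error message may differ between PyTorch versions, so it is printed rather than assumed.

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

failed = False
try:
    a * b  # shapes (2, 3) and (3, 2) cannot be combined element-wise
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)

print("multiplication failed:", failed)
```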
In Brief: Tensor Broadcasting#
Note
If you are familiar with broadcasting semantics in NumPy ndarrays, you’ll find the same rules apply here.
The exception to the same-shapes rule is tensor broadcasting. Here’s an example:
rand = torch.rand(2, 4)
doubled = rand * (torch.ones(1, 4) * 2)
print(rand)
print(doubled)
tensor([[0.6146, 0.5999, 0.5013, 0.9397], [0.8656, 0.5207, 0.6865, 0.3614]])
tensor([[1.2291, 1.1998, 1.0026, 1.8793], [1.7312, 1.0413, 1.3730, 0.7228]])
What’s the trick here? How is it we got to multiply a 2x4 tensor by a 1x4 tensor?
Broadcasting is a way to perform an operation between tensors that have similarities in their shapes. In the example above, the one-row, four-column tensor is multiplied by both rows of the two-row, four-column tensor.
This is an important operation in Deep Learning. The common example is multiplying a tensor of learning weights by a batch of input tensors, applying the operation to each instance in the batch separately, and returning a tensor of identical shape - just like our (2, 4) * (1, 4) example above returned a tensor of shape (2, 4).
The rules for broadcasting are:
Each tensor must have at least one dimension - no empty tensors.
Comparing the dimension sizes of the two tensors, going from last to first:
Each dimension must be equal, or
One of the dimensions must be of size 1, or
The dimension does not exist in one of the tensors
Tensors of identical shape, of course, are trivially “broadcastable”, as you saw earlier.
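The shape rules above can be checked without allocating any data by using torch.broadcast_shapes(), which returns the resulting shape or raises a RuntimeError when the shapes are incompatible. The helper name broadcast_result is a hypothetical wrapper introduced only for this sketch.

```python
import torch

def broadcast_result(shape_a, shape_b):
    """Return the broadcast shape as a tuple, or None if the shapes are incompatible."""
    try:
        return tuple(torch.broadcast_shapes(shape_a, shape_b))
    except RuntimeError:
        return None

print(broadcast_result((2, 4), (1, 4)))     # size-1 dimension stretches to match
print(broadcast_result((4, 3, 2), (3, 2)))  # leading dimension absent in one tensor
print(broadcast_result((4, 3, 2), (4, 3)))  # trailing dimensions 2 vs 3 do not match
```

A None result means an operation such as a * b would raise an error for tensors of those shapes.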
Here are some examples of situations that honor the above rules and allow broadcasting:
a = torch.ones(4, 3, 2)
b = a * torch.rand(3, 2)  # 3rd & 2nd dims identical to a, dim 1 absent
print(b)
c = a * torch.rand(3, 1)  # 3rd dim = 1, 2nd dim identical to a
print(c)
d = a * torch.rand(1, 2)  # 3rd dim identical to a, 2nd dim = 1
print(d)
tensor([[[0.6493, 0.2633], [0.4762, 0.0548], [0.2024, 0.5731]], [[0.6493, 0.2633], [0.4762, 0.0548], [0.2024, 0.5731]], [[0.6493, 0.2633], [0.4762, 0.0548], [0.2024, 0.5731]], [[0.6493, 0.2633], [0.4762, 0.0548], [0.2024, 0.5731]]])
tensor([[[0.7191, 0.7191], [0.4067, 0.4067], [0.7301, 0.7301]], [[0.7191, 0.7191], [0.4067, 0.4067], [0.7301, 0.7301]], [[0.7191, 0.7191], [0.4067, 0.4067], [0.7301, 0.7301]], [[0.7191, 0.7191], [0.4067, 0.4067], [0.7301, 0.7301]]])
tensor([[[0.6276, 0.7357], [0.6276, 0.7357], [0.6276, 0.7357]], [[0.6276, 0.7357], [0.6276, 0.7357], [0.6276, 0.7357]], [[0.6276, 0.7357], [0.6276, 0.7357], [0.6276, 0.7357]], [[0.6276, 0.7357], [0.6276, 0.7357], [0.6276, 0.7357]]])
Look closely at the values of each tensor above:
The multiplication operation that created b was broadcast over every “layer” of a.
For c, the operation was broadcast over every layer and row of a - every 3-element column is identical.
For d, we switched it around - now every row is identical, across layers and columns.
For more information on broadcasting, see the PyTorch documentation on the topic.
Here are some examples of attempts at broadcasting that will fail:
Note
The following cell throws a run-time error. This is intentional.
a = torch.ones(4, 3, 2)
b = a * torch.rand(4, 3)  # dimensions must match last-to-first
c = a * torch.rand(2, 3)  # both 3rd & 2nd dims different
d = a * torch.rand((0,))  # can't broadcast with an empty tensor
More Math with Tensors#
PyTorch tensors have over three hundred operations that can be performedon them.
Here is a small sample from some of the major categories of operations:
# common functions
a = torch.rand(2, 4) * 2 - 1
print('Common functions:')
print(torch.abs(a))
print(torch.ceil(a))
print(torch.floor(a))
print(torch.clamp(a, -0.5, 0.5))
# trigonometric functions and their inverses
angles = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4])
sines = torch.sin(angles)
inverses = torch.asin(sines)
print('\nSine and arcsine:')
print(angles)
print(sines)
print(inverses)
# bitwise operations
print('\nBitwise XOR:')
b = torch.tensor([1, 5, 11])
c = torch.tensor([2, 7, 10])
print(torch.bitwise_xor(b, c))
# comparisons:
print('\nBroadcasted, element-wise equality comparison:')
d = torch.tensor([[1., 2.], [3., 4.]])
e = torch.ones(1, 2)  # many comparison ops support broadcasting!
print(torch.eq(d, e))  # returns a tensor of type bool
# reductions:
print('\nReduction ops:')
print(torch.max(d))  # returns a single-element tensor
print(torch.max(d).item())  # extracts the value from the returned tensor
print(torch.mean(d))  # average
print(torch.std(d))  # standard deviation
print(torch.prod(d))  # product of all numbers
print(torch.unique(torch.tensor([1, 2, 1, 2, 1, 2])))  # filter unique elements
# vector and linear algebra operations
v1 = torch.tensor([1., 0., 0.])  # x unit vector
v2 = torch.tensor([0., 1., 0.])  # y unit vector
m1 = torch.rand(2, 2)  # random matrix
m2 = torch.tensor([[3., 0.], [0., 3.]])  # three times identity matrix
print('\nVectors & Matrices:')
print(torch.linalg.cross(v2, v1))  # negative of z unit vector (v1 x v2 == -v2 x v1)
print(m1)
m3 = torch.linalg.matmul(m1, m2)
print(m3)  # 3 times m1
print(torch.linalg.svd(m3))  # singular value decomposition
Common functions:
tensor([[0.9238, 0.5724, 0.0791, 0.2629], [0.1986, 0.4439, 0.6434, 0.4776]])
tensor([[-0., -0., 1., -0.], [-0., 1., 1., -0.]])
tensor([[-1., -1., 0., -1.], [-1., 0., 0., -1.]])
tensor([[-0.5000, -0.5000, 0.0791, -0.2629], [-0.1986, 0.4439, 0.5000, -0.4776]])
Sine and arcsine:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
tensor([0.0000, 0.7854, 1.5708, 0.7854])
Bitwise XOR:
tensor([3, 2, 1])
Broadcasted, element-wise equality comparison:
tensor([[ True, False], [False, False]])
Reduction ops:
tensor(4.)
4.0
tensor(2.5000)
tensor(1.2910)
tensor(24.)
tensor([1, 2])
Vectors & Matrices:
tensor([ 0., 0., -1.])
tensor([[0.7375, 0.8328], [0.8444, 0.2941]])
tensor([[2.2125, 2.4985], [2.5332, 0.8822]])
torch.return_types.linalg_svd(U=tensor([[-0.7889, -0.6145], [-0.6145, 0.7889]]), S=tensor([4.1498, 1.0548]), Vh=tensor([[-0.7957, -0.6056], [ 0.6056, -0.7957]]))
This is a small sample of operations. For more details and the full inventory of math functions, have a look at the documentation. For more details and the full inventory of linear algebra operations, have a look at this documentation.
Altering Tensors in Place#
Most binary operations on tensors will return a third, new tensor. When we say c = a * b (where a and b are tensors), the new tensor c will occupy a region of memory distinct from the other tensors.
There are times, though, that you may wish to alter a tensor in place - for example, if you’re doing an element-wise computation where you can discard intermediate values. For this, most of the math functions have a version with an appended underscore (_) that will alter a tensor in place.
For example:
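A minimal sketch of a cell consistent with the output that follows, assuming the angles used earlier in this tutorial. It uses the Tensor.sin_() method as the in-place counterpart of torch.sin().

```python
import math
import torch

a = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4])
print('a:')
print(a)
print(torch.sin(a))  # this operation creates a new tensor in memory
print(a)             # a has not changed

b = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4])
print('\nb:')
print(b)
print(b.sin_())      # note the underscore: b is modified in place
print(b)             # b has changed
```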
a:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
tensor([0.0000, 0.7854, 1.5708, 2.3562])
b:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
For arithmetic operations, there are functions that behave similarly:
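A minimal sketch of a cell consistent with the output that follows. The b_before snapshot is an extra assumption added here so the effect of the in-place calls can be checked afterwards; the random values will differ from run to run.

```python
import torch

a = torch.ones(2, 2)
b = torch.rand(2, 2)
b_before = b.clone()  # snapshot for later comparison (not in the printed output)

print('Before:')
print(a)
print(b)

print('\nAfter adding:')
print(a.add_(b))  # a is updated in place to a + b, and also returned
print(a)
print(b)          # b is untouched by a.add_(b)

print('\nAfter multiplying')
print(b.mul_(b))  # b is updated in place to b * b
print(b)
```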
Before:
tensor([[1., 1.], [1., 1.]])
tensor([[0.3788, 0.4567], [0.0649, 0.6677]])
After adding:
tensor([[1.3788, 1.4567], [1.0649, 1.6677]])
tensor([[1.3788, 1.4567], [1.0649, 1.6677]])
tensor([[0.3788, 0.4567], [0.0649, 0.6677]])
After multiplying
tensor([[0.1435, 0.2086], [0.0042, 0.4459]])
tensor([[0.1435, 0.2086], [0.0042, 0.4459]])
Note that these in-place arithmetic functions are methods on the torch.Tensor object, not attached to the torch module like many other functions (e.g., torch.sin()). As you can see from a.add_(b), the calling tensor is the one that gets changed in place.
There is another option for placing the result of a computation in an existing, allocated tensor. Many of the methods and functions we’ve seen so far - including creation methods! - have an out argument that lets you specify a tensor to receive the output. If the out tensor is the correct shape and dtype, this can happen without a new memory allocation:
a = torch.rand(2, 2)
b = torch.rand(2, 2)
c = torch.zeros(2, 2)
old_id = id(c)
print(c)
d = torch.matmul(a, b, out=c)
print(c)  # contents of c have changed
assert c is d  # test c & d are same object, not just containing equal values
assert id(c) == old_id  # make sure that our new c is the same object as the old one
torch.rand(2, 2, out=c)  # works for creation too!
print(c)  # c has changed again
assert id(c) == old_id  # still the same object!
tensor([[0., 0.], [0., 0.]])
tensor([[0.3653, 0.8699], [0.2364, 0.3604]])
tensor([[0.0776, 0.4004], [0.9877, 0.0352]])
Copying Tensors#
As with any object in Python, assigning a tensor to a variable makes the variable a label of the tensor, and does not copy it. For example:
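A minimal sketch of a cell consistent with the output that follows; the value 561 is taken from that output.

```python
import torch

a = torch.ones(2, 2)
b = a  # b is a second label for the same tensor, not a copy

a[0][1] = 561  # we change a...
print(b)       # ...and b is also altered
```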
tensor([[ 1., 561.], [ 1., 1.]])
But what if you want a separate copy of the data to work on? The clone() method is there for you:
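A minimal sketch of a cell consistent with the output that follows: the clone starts out equal to the source, but later changes to the source do not reach it.

```python
import torch

a = torch.ones(2, 2)
b = a.clone()

assert b is not a      # different objects in memory...
print(torch.eq(a, b))  # ...but still with the same contents!

a[0][1] = 561          # a changes...
print(b)               # ...but b is still all ones
```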
tensor([[True, True], [True, True]])
tensor([[1., 1.], [1., 1.]])
There is an important thing to be aware of when using clone(). If your source tensor has autograd enabled, then so will the clone. This will be covered more deeply in the video on autograd, but if you want the light version of the details, continue on.
In many cases, this will be what you want. For example, if your model has multiple computation paths in its forward() method, and both the original tensor and its clone contribute to the model’s output, then to enable model learning you want autograd turned on for both tensors. If your source tensor has autograd enabled (which it generally will if it’s a set of learning weights or derived from a computation involving the weights), then you’ll get the result you want.
On the other hand, if you’re doing a computation where neither the original tensor nor its clone need to track gradients, then as long as the source tensor has autograd turned off, you’re good to go.
There is a third case, though: Imagine you’re performing a computation in your model’s forward() function, where gradients are turned on for everything by default, but you want to pull out some values mid-stream to generate some metrics. In this case, you don’t want the cloned copy of your source tensor to track gradients - performance is improved with autograd’s history tracking turned off. For this, you can use the .detach() method on the source tensor:
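A minimal sketch of a cell consistent with the output that follows. The random values will differ from run to run; what matters is which tensors report requires_grad or a grad_fn when printed.

```python
import torch

a = torch.rand(2, 2, requires_grad=True)  # turn on autograd
print(a)

b = a.clone()           # the clone stays attached to a's computation history
print(b)

c = a.detach().clone()  # detach first, then copy: no history is tracked
print(c)

print(a)                # a itself is unchanged by detach()
```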
tensor([[0.0905, 0.4485], [0.8740, 0.2526]], requires_grad=True)
tensor([[0.0905, 0.4485], [0.8740, 0.2526]], grad_fn=<CloneBackward0>)
tensor([[0.0905, 0.4485], [0.8740, 0.2526]])
tensor([[0.0905, 0.4485], [0.8740, 0.2526]], requires_grad=True)
What’s happening here?
We create a with requires_grad=True turned on. We haven’t covered this optional argument yet, but will during the unit on autograd.
When we print a, it informs us that the property requires_grad=True - this means that autograd and computation history tracking are turned on.
We clone a and label it b. When we print b, we can see that it’s tracking its computation history - it has inherited a’s autograd settings, and added to the computation history.
We clone a into c, but we call detach() first.
Printing c, we see no computation history, and no requires_grad=True.
The detach() method detaches the tensor from its computation history. It says, “do whatever comes next as if autograd was off.” It does this without changing a - you can see that when we print a again at the end, it retains its requires_grad=True property.
Moving to Accelerator#
One of the major advantages of PyTorch is its robust acceleration on an accelerator such as CUDA, MPS, MTIA, or XPU. So far, everything we’ve done has been on CPU. How do we move to the faster hardware?
First, we should check whether an accelerator is available, with the is_available() method.
Note
If you do not have an accelerator, the executable cells in this section will not execute any accelerator-related code.
if torch.accelerator.is_available():
    print('We have an accelerator!')
else:
    print('Sorry, CPU only.')
We have an accelerator!
Once we’ve determined that one or more accelerators are available, we need to put our data someplace where the accelerator can see it. Your CPU does computation on data in your computer’s RAM. Your accelerator has dedicated memory attached to it. Whenever you want to perform a computation on a device, you must move all the data needed for that computation to memory accessible by that device. (Colloquially, “moving the data to memory accessible by the GPU” is shortened to “moving the data to the GPU”.)
There are multiple ways to get your data onto your target device. You may do it at creation time:
if torch.accelerator.is_available():
    gpu_rand = torch.rand(2, 2, device=torch.accelerator.current_accelerator())
    print(gpu_rand)
else:
    print('Sorry, CPU only.')
tensor([[0.3344, 0.2640], [0.2119, 0.0582]], device='cuda:0')
By default, new tensors are created on the CPU, so we have to specify when we want to create our tensor on the accelerator with the optional device argument. You can see when we print the new tensor, PyTorch informs us which device it’s on (if it’s not on CPU).
You can query the number of accelerators with torch.accelerator.device_count(). If you have more than one accelerator, you can specify them by index. Taking CUDA as an example: device='cuda:0', device='cuda:1', etc.
As a coding practice, specifying our devices everywhere with string constants is pretty fragile. In an ideal world, your code would perform robustly whether you’re on CPU or accelerator hardware. You can do this by creating a device handle that can be passed to your tensors instead of a string:
my_device = torch.accelerator.current_accelerator() if torch.accelerator.is_available() else torch.device('cpu')
print('Device: {}'.format(my_device))
x = torch.rand(2, 2, device=my_device)
print(x)
Device: cuda
tensor([[0.0024, 0.6778], [0.2441, 0.6812]], device='cuda:0')
If you have an existing tensor living on one device, you can move it to another with the to() method. The following lines of code create a tensor on CPU, and move it to whichever device handle you acquired in the previous cell.
y = torch.rand(2, 2)
y = y.to(my_device)
It is important to know that in order to do computation involving two or more tensors, all of the tensors must be on the same device. The following code will throw a runtime error, regardless of whether you have an accelerator device available (CUDA is used as the example here):
x = torch.rand(2, 2)
y = torch.rand(2, 2, device='cuda')
z = x + y  # exception will be thrown
Manipulating Tensor Shapes#
Sometimes, you’ll need to change the shape of your tensor. Below, we’ll look at a few common cases, and how to handle them.
Changing the Number of Dimensions#
One case where you might need to change the number of dimensions is passing a single instance of input to your model. PyTorch models generally expect batches of input.
For example, imagine having a model that works on 3 x 226 x 226 images - a 226-pixel square with 3 color channels. When you load and transform it, you’ll get a tensor of shape (3, 226, 226). Your model, though, is expecting input of shape (N, 3, 226, 226), where N is the number of images in the batch. So how do you make a batch of one?
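A minimal sketch of a cell consistent with the shapes printed next, using a random tensor as a stand-in for a loaded image.

```python
import torch

a = torch.rand(3, 226, 226)  # stand-in for a single 3-channel image
b = a.unsqueeze(0)           # add a new dimension of extent 1 at position 0

print(a.shape)
print(b.shape)
```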
torch.Size([3, 226, 226])
torch.Size([1, 3, 226, 226])
The unsqueeze() method adds a dimension of extent 1. unsqueeze(0) adds it as a new zeroth dimension - now you have a batch of one!
So if that’s unsqueezing, what do we mean by squeezing? We’re taking advantage of the fact that any dimension of extent 1 does not change the number of elements in the tensor.
c = torch.rand(1, 1, 1, 1, 1)
print(c)
tensor([[[[[0.2347]]]]])
Continuing the example above, let’s say the model’s output is a 20-element vector for each input. You would then expect the output to have shape (N, 20), where N is the number of instances in the input batch. That means that for our single-input batch, we’ll get an output of shape (1, 20).
What if you want to do some non-batched computation with that output - something that’s just expecting a 20-element vector?
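A minimal sketch of a cell consistent with the output that follows: a (1, 20) tensor is squeezed down to 20 elements, and a (2, 2) tensor is passed through squeeze(0) to show that a dimension of extent 2 is left alone. The names a, b, c and d are assumptions chosen to match the discussion below.

```python
import torch

a = torch.rand(1, 20)  # stand-in for a model output for a batch of one
print(a.shape)
print(a)

b = a.squeeze(0)       # remove dimension 0, which has extent 1
print(b.shape)
print(b)

c = torch.rand(2, 2)
print(c.shape)

d = c.squeeze(0)       # dimension 0 has extent 2, so nothing is removed
print(d.shape)
```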
torch.Size([1, 20])
tensor([[0.1899, 0.4067, 0.1519, 0.1506, 0.9585, 0.7756, 0.8973, 0.4929, 0.2367, 0.8194, 0.4509, 0.2690, 0.8381, 0.8207, 0.6818, 0.5057, 0.9335, 0.9769, 0.2792, 0.3277]])
torch.Size([20])
tensor([0.1899, 0.4067, 0.1519, 0.1506, 0.9585, 0.7756, 0.8973, 0.4929, 0.2367, 0.8194, 0.4509, 0.2690, 0.8381, 0.8207, 0.6818, 0.5057, 0.9335, 0.9769, 0.2792, 0.3277])
torch.Size([2, 2])
torch.Size([2, 2])
You can see from the shapes that our 2-dimensional tensor is now 1-dimensional, and if you look closely at the output of the cell above you’ll see that printing a shows an “extra” set of square brackets [] due to having an extra dimension.
You may only squeeze() dimensions of extent 1. See above where we try to squeeze a dimension of size 2 in c, and get back the same shape we started with. Calls to squeeze() and unsqueeze() can only act on dimensions of extent 1 because to do otherwise would change the number of elements in the tensor.
Another place you might use unsqueeze() is to ease broadcasting. Recall the example above where we had the following code:
a = torch.ones(4, 3, 2)
c = a * torch.rand(3, 1)  # 3rd dim = 1, 2nd dim identical to a
print(c)
The net effect of that was to broadcast the operation over dimensions 0 and 2, causing the random, 3 x 1 tensor to be multiplied element-wise by every 3-element column in a.
What if the random vector had just been a 3-element vector? We’d lose the ability to do the broadcast, because the final dimensions would not match up according to the broadcasting rules. unsqueeze() comes to the rescue:
a = torch.ones(4, 3, 2)
b = torch.rand(3)  # trying to multiply a * b will give a runtime error
c = b.unsqueeze(1)  # change to a 2-dimensional tensor, adding new dim at the end
print(c.shape)
print(a * c)  # broadcasting works again!
torch.Size([3, 1])
tensor([[[0.1891, 0.1891], [0.3952, 0.3952], [0.9176, 0.9176]], [[0.1891, 0.1891], [0.3952, 0.3952], [0.9176, 0.9176]], [[0.1891, 0.1891], [0.3952, 0.3952], [0.9176, 0.9176]], [[0.1891, 0.1891], [0.3952, 0.3952], [0.9176, 0.9176]]])
The squeeze() and unsqueeze() methods also have in-place versions, squeeze_() and unsqueeze_():
batch_me = torch.rand(3, 226, 226)
print(batch_me.shape)
batch_me.unsqueeze_(0)
print(batch_me.shape)
torch.Size([3, 226, 226])
torch.Size([1, 3, 226, 226])
Sometimes you’ll want to change the shape of a tensor more radically, while still preserving the number of elements and their contents. One case where this happens is at the interface between a convolutional layer of a model and a linear layer of the model - this is common in image classification models. A convolution kernel will yield an output tensor of shape features x width x height, but the following linear layer expects a 1-dimensional input. reshape() will do this for you, provided that the dimensions you request yield the same number of elements as the input tensor has:
output3d = torch.rand(6, 20, 20)
print(output3d.shape)
input1d = output3d.reshape(6 * 20 * 20)
print(input1d.shape)
# can also call it as a method on the torch module:
print(torch.reshape(output3d, (6 * 20 * 20,)).shape)
torch.Size([6, 20, 20])
torch.Size([2400])
torch.Size([2400])
Note
The (6 * 20 * 20,) argument in the final line of the cell above is because PyTorch expects a tuple when specifying a tensor shape - but when the shape is the first argument of a method, it lets us cheat and just use a series of integers. Here, we had to add the parentheses and comma to convince the method that this is really a one-element tuple.
When it can, reshape() will return a view on the tensor to be changed - that is, a separate tensor object looking at the same underlying region of memory. This is important: That means any change made to the source tensor will be reflected in the view on that tensor, unless you clone() it.
There are conditions, beyond the scope of this introduction, where reshape() has to return a tensor carrying a copy of the data. For more information, see the docs.
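To make the view-versus-copy distinction tangible, here is a small sketch that compares data_ptr() values, which report the address of a tensor’s first element. The transposed tensor is one illustrative case, chosen as an assumption, where the memory layout forces reshape() to copy; all variable names are hypothetical.

```python
import torch

source = torch.arange(6).reshape(2, 3)

# Contiguous source: reshape() can return a view on the same memory
flat_view = source.reshape(6)
shares_memory = flat_view.data_ptr() == source.data_ptr()
source[0, 0] = 100
print(shares_memory, flat_view[0].item())  # the change shows up in the view

# Transposed (non-contiguous) source: reshape() has to copy the data
flat_copy = source.t().reshape(6)
is_copy = flat_copy.data_ptr() != source.data_ptr()
source[0, 0] = -1
print(is_copy, flat_copy[0].item())        # the copy keeps the earlier value
```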
NumPy Bridge#
In the section above on broadcasting, it was mentioned that PyTorch’s broadcast semantics are compatible with NumPy’s - but the kinship between PyTorch and NumPy goes even deeper than that.
If you have existing ML or scientific code with data stored in NumPy ndarrays, you may wish to express that same data as PyTorch tensors, whether to take advantage of PyTorch’s GPU acceleration, or its efficient abstractions for building ML models. It’s easy to switch between ndarrays and PyTorch tensors:
import numpy as np
numpy_array = np.ones((2, 3))
print(numpy_array)
pytorch_tensor = torch.from_numpy(numpy_array)
print(pytorch_tensor)
[[1. 1. 1.] [1. 1. 1.]]
tensor([[1., 1., 1.], [1., 1., 1.]], dtype=torch.float64)
PyTorch creates a tensor of the same shape and containing the same data as the NumPy array, going so far as to keep NumPy’s default 64-bit float data type.
The conversion can just as easily go the other way:
pytorch_rand = torch.rand(2, 3)
print(pytorch_rand)
numpy_rand = pytorch_rand.numpy()
print(numpy_rand)
tensor([[0.8716, 0.2459, 0.3499], [0.2853, 0.9091, 0.5695]])
[[0.87163675 0.2458961 0.34993553] [0.2853077 0.90905803 0.5695162 ]]
It is important to know that these converted objects are using the same underlying memory as their source objects, meaning that changes to one are reflected in the other:
numpy_array[1, 1] = 23
print(pytorch_tensor)
pytorch_rand[1, 1] = 17
print(numpy_rand)
tensor([[ 1., 1., 1.], [ 1., 23., 1.]], dtype=torch.float64)
[[ 0.87163675 0.2458961 0.34993553] [ 0.2853077 17. 0.5695162 ]]