Python Engineer

Free Python and Machine Learning Tutorials

Autograd - PyTorch Beginner 03

25 Dec 2019

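Learn all the basics you need to get started with this deep learning framework! In this part we learn how to calculate gradients using the autograd package in PyTorch. This tutorial covers the autograd package, computing gradients with backpropagation, stopping a tensor from tracking history, and emptying accumulated gradients.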

All code from this course can be found on GitHub.

The Autograd package

The autograd package provides automatic differentiation for all operations on Tensors. To tell PyTorch that we want the gradient, we have to set requires_grad=True. With this attribute set, all operations on the tensor are tracked in the computational graph.

    import torch

    # requires_grad=True -> tracks all operations on the tensor
    x = torch.randn(3, requires_grad=True)
    y = x + 2

    # y was created as a result of an operation, so it has a grad_fn attribute.
    # grad_fn references the Function that created the Tensor.
    print(x)  # created by the user -> grad_fn is None
    print(y)
    print(y.grad_fn)

    # Do more operations on y
    z = y * y * 3
    print(z)
    z = z.mean()
    print(z)

Let's compute the gradients with backpropagation

When we finish our computation, we can simply call .backward() and have all the gradients computed automatically. The gradient with respect to each leaf tensor that has requires_grad=True is accumulated into that tensor's .grad attribute. It is the partial derivative of the function w.r.t. the tensor.

    z.backward()
    print(x.grad)  # dz/dx
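As a sanity check, the gradient can also be derived by hand. Here z = mean(3 * (x + 2)^2), so dz/dx_i = 6 * (x_i + 2) / n for a tensor with n elements. The sketch below repeats the computation above with fixed inputs (an assumption made only so the result is reproducible) and compares autograd's result with the formula.

```python
import torch

# Fixed inputs instead of torch.randn, so the numbers are reproducible.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x + 2
z = (y * y * 3).mean()

z.backward()

# Hand-derived gradient: dz/dx_i = 6 * (x_i + 2) / n, with n = 3 here.
expected = 6 * (x.detach() + 2) / x.numel()

print(x.grad)                            # gradient computed by autograd
print(expected)                          # gradient from the formula
print(torch.allclose(x.grad, expected))  # True
```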

Generally speaking, torch.autograd is an engine for computing vector-Jacobian products. It computes partial derivatives while applying the chain rule.

    # Model with non-scalar output:
    # If a Tensor is non-scalar (more than one element), backward() needs a
    # gradient argument: a tensor of matching shape.
    # It is the vector in the vector-Jacobian product.
    x = torch.randn(3, requires_grad=True)
    y = x * 2
    for _ in range(10):
        y = y * 2
    print(y)
    print(y.shape)

    v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)
    y.backward(v)
    print(x.grad)
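To make the vector-Jacobian product concrete: in the example above y equals 2^11 * x = 2048 * x, so the Jacobian dy/dx is 2048 times the identity matrix and the product with v is simply 2048 * v. The following sketch checks this; the name expected is introduced here only for illustration.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
for _ in range(10):
    y = y * 2  # after the loop: y = 2**11 * x = 2048 * x

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)
y.backward(v)  # accumulates v^T * J into x.grad

# The Jacobian is 2048 * I, so the vector-Jacobian product is 2048 * v.
expected = 2048 * v

print(x.grad)
print(torch.allclose(x.grad, expected))  # True
```

Note that the result does not depend on the random values in x, because y is a linear function of x.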

Stop a tensor from tracking history

For example, when we update our weights during the training loop, the update operation should not be part of the gradient computation. We have 3 options to stop gradient calculations:

.requires_grad_(...) changes an existing flag in-place:

    a = torch.randn(2, 2)
    print(a.requires_grad)
    b = ((a * 3) / (a - 1))
    print(b.grad_fn)

    a.requires_grad_(True)
    print(a.requires_grad)
    b = (a * a).sum()
    print(b.grad_fn)

.detach(): get a new Tensor with the same content but no gradient computation:

    a = torch.randn(2, 2, requires_grad=True)
    print(a.requires_grad)
    b = a.detach()
    print(b.requires_grad)

wrap in with torch.no_grad():

    a = torch.randn(2, 2, requires_grad=True)
    print(a.requires_grad)
    with torch.no_grad():
        print((a ** 2).requires_grad)
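The three options can be compared side by side on the same tensor. This is a small sketch, not part of the original course code; the variable names off_flag, detached_flag, no_grad_flag and tracked_flag are made up for illustration.

```python
import torch

a = torch.randn(2, 2, requires_grad=True)

# Option 1: switch the flag off in place (a is a leaf tensor, so this is allowed).
a.requires_grad_(False)
off_flag = (a * 2).requires_grad
a.requires_grad_(True)  # switch tracking back on for the other options

# Option 2: detach() returns a new tensor that is cut off from the graph.
b = a.detach()
detached_flag = (b * 2).requires_grad

# Option 3: nothing computed inside the no_grad() block is tracked.
with torch.no_grad():
    c = a * 2
no_grad_flag = c.requires_grad

# Outside the block, operations on a are tracked again.
tracked_flag = (a * 2).requires_grad

print(off_flag, detached_flag, no_grad_flag, tracked_flag)  # False False False True
```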

Empty gradients!

backward() accumulates the gradient for a tensor into its .grad attribute, so each call adds to whatever is already stored there. We need to be careful during optimization: use .zero_() to empty the gradients before a new optimization step!

    weights = torch.ones(4, requires_grad=True)

    for epoch in range(3):
        # just a dummy example
        model_output = (weights * 3).sum()
        model_output.backward()

        print(weights.grad)

        # optimize model, i.e. adjust weights...
        with torch.no_grad():
            weights -= 0.1 * weights.grad

        # this is important! It affects the final weights & output
        weights.grad.zero_()

    print(weights)
    print(model_output)
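To see why zeroing matters, here is the same dummy example with the weight update and the .zero_() call left out. The true gradient of (weights * 3).sum() is 3 for every element, but without zeroing the stored values grow with each backward() call.

```python
import torch

weights = torch.ones(4, requires_grad=True)

for epoch in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()
    # No weights.grad.zero_() here, so the gradients pile up:
    # 3s after the first pass, 6s after the second, 9s after the third.
    print(weights.grad)
```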

Optimizers have a zero_grad() method

(We will learn about optimizers in tutorial #6)

    optimizer = torch.optim.SGD([weights], lr=0.1)

    # During training:
    optimizer.step()
    optimizer.zero_grad()
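Put together, the dummy loop from the previous section could look like this with an optimizer. It is only a sketch under the same assumptions as before (a single weights tensor and a made-up model_output); plain SGD with lr=0.1 performs the same update as the manual weights -= 0.1 * weights.grad.

```python
import torch

weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.1)

for epoch in range(3):
    # just a dummy example, as before
    model_output = (weights * 3).sum()
    model_output.backward()

    optimizer.step()       # weights <- weights - lr * weights.grad
    optimizer.zero_grad()  # clear the gradients before the next pass

print(weights)  # each element is close to 1 - 3 * (0.1 * 3) = 0.1
```

Depending on the PyTorch version, zero_grad() either fills .grad with zeros or resets it to None. Both prevent gradients from accumulating across steps.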