
Autograd - PyTorch Beginner 03

In this part we learn how to calculate gradients using the autograd package in PyTorch.

Learn all the basics you need to get started with this deep learning framework! This tutorial covers the following topics:

  • requires_grad attribute for Tensors
  • Computational graph
  • Backpropagation (brief explanation)
  • How to stop autograd from tracking history
  • How to zero (empty) gradients

All code from this course can be found on GitHub.

The Autograd package

The autograd package provides automatic differentiation for all operations on Tensors. To tell PyTorch that we want the gradient, we have to set requires_grad=True. With this attribute set, all operations on the tensor are tracked in the computational graph.

import torch
# requires_grad = True -> tracks all operations on the tensor.
x = torch.randn(3, requires_grad=True)
y = x + 2

# y was created as a result of an operation, so it has a grad_fn attribute.
# grad_fn: references the Function that created the Tensor
print(x) # created by the user -> grad_fn is None
print(y)
print(y.grad_fn)

# Do more operations on y
z = y * y * 3
print(z)
z = z.mean()
print(z)

Let's compute the gradients with backpropagation

When we finish our computation we can simply call .backward() and have all the gradients computed automatically. The gradient for each tensor with requires_grad=True is accumulated into its .grad attribute. It is the partial derivative of the function w.r.t. the tensor.

z.backward()
print(x.grad) # dz/dx
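
As a sanity check, the result can be compared with a derivative worked out by hand. For the z defined above, z = mean(3 * (x + 2)^2) over 3 elements, so dz/dx_i = 2 * (x_i + 2). A minimal sketch, assuming fixed input values (chosen here only for illustration) instead of random ones:

```python
import torch

# Fixed values, picked for illustration, so the result is reproducible
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x + 2
z = (y * y * 3).mean()

z.backward()

# Hand-derived gradient: dz/dx_i = (1/3) * 6 * (x_i + 2) = 2 * (x_i + 2)
expected = 2 * (x.detach() + 2)

print(x.grad)    # gradient values 6, 8, 10
print(expected)  # same values from the formula
```

Both prints should show the same values, which confirms that autograd applied the chain rule through the addition, the multiplication and the mean.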

Generally speaking, torch.autograd is an engine for computing vector-Jacobian products. It computes partial derivatives by applying the chain rule.

# Model with non-scalar output:
# If a Tensor is non-scalar (more than 1 element), we need to pass
# a gradient argument to backward(): a tensor of matching shape.
# It is the vector in the vector-Jacobian product.

x = torch.randn(3, requires_grad=True)

y = x * 2
for _ in range(10):
    y = y * 2

print(y)
print(y.shape)

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)
y.backward(v)
print(x.grad)
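
To see what the gradient argument does: in the example above, x is doubled once and then ten more times, so y = 2048 * x. The Jacobian is therefore 2048 times the identity matrix, and the vector-Jacobian product should equal 2048 * v. A small sketch that checks this:

```python
import torch

x = torch.randn(3, requires_grad=True)

y = x * 2
for _ in range(10):
    y = y * 2  # after the loop: y = 2048 * x

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)
y.backward(v)  # computes v^T J and stores it in x.grad

# J = 2048 * I, so v^T J = 2048 * v
expected = 2048 * v

print(x.grad)
print(expected)
```

Note that the result does not depend on the random values in x, because y is a linear function of x.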

Stop a tensor from tracking history

Sometimes an operation should not be tracked. For example, when we update our weights during the training loop, the update itself should not be part of the gradient computation. We have 3 options to stop autograd from tracking history:

  • x.requires_grad_(False)
  • x.detach()
  • wrap in with torch.no_grad():

.requires_grad_(...) changes an existing flag in-place:

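a = torch.randn(2, 2)
print(a.requires_grad)  # False by default
b = ((a * 3) / (a - 1))
print(b.grad_fn)        # None, nothing was tracked
a.requires_grad_(True)
print(a.requires_grad)  # True
b = (a * a).sum()
print(b.grad_fn)        # now b has a grad_fn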

.detach(): get a new Tensor with the same content but no gradient computation:

a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)  # True
b = a.detach()
print(b.requires_grad)  # False
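
One detail worth knowing: the detached tensor is not a copy, it shares its storage with the original. The sketch below shows that operations on the detached tensor are not tracked and that both tensors point to the same memory:

```python
import torch

a = torch.randn(2, 2, requires_grad=True)
b = a.detach()

# Operations on b are not recorded in the computational graph
c = (b * 2).sum()
print(c.requires_grad)  # False
print(c.grad_fn)        # None

# a and b share the same underlying memory
same_memory = a.data_ptr() == b.data_ptr()
print(same_memory)      # True
```

If an independent copy is needed, a.detach().clone() can be used instead.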

Wrap the code in with torch.no_grad():

a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)  # True
with torch.no_grad():
    print((a ** 2).requires_grad)  # False inside the block
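
Tracking is only switched off inside the with block. A short sketch comparing the same operation inside and outside of torch.no_grad() (the names inside and outside are just illustrative):

```python
import torch

a = torch.randn(2, 2, requires_grad=True)

with torch.no_grad():
    inside = a ** 2   # not tracked

outside = a ** 2      # tracked again after leaving the block

print(inside.requires_grad)   # False
print(outside.requires_grad)  # True
print(a.requires_grad)        # True, the flag on a itself is unchanged
```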

Empty gradients!

backward() accumulates the gradient for this tensor into the .grad attribute, so every call adds to the values that are already there. We need to be careful during optimization!
-> Use .zero_() on the gradient to empty it before a new optimization step!

weights = torch.ones(4, requires_grad=True)

for epoch in range(3):
    # just a dummy example
    model_output = (weights*3).sum()
    model_output.backward()

    print(weights.grad)

    # optimize model, i.e. adjust weights...
    with torch.no_grad():
        weights -= 0.1 * weights.grad

    # this is important! It affects the final weights & output
    weights.grad.zero_()

print(weights)
print(model_output)

Optimizer has zero_grad() method

(We will learn about optimizers in tutorial #6)

optimizer = torch.optim.SGD([weights], lr=0.1)
# During training:
optimizer.step()
optimizer.zero_grad()
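
Put together, a training loop with an optimizer could look like the following. This is only a minimal sketch that reuses the dummy model from above; a real loop would also have data and a loss function:

```python
import torch

weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.1)

for epoch in range(3):
    # same dummy "model" as before
    model_output = (weights * 3).sum()
    model_output.backward()

    optimizer.step()       # weights <- weights - lr * weights.grad
    optimizer.zero_grad()  # reset gradients (to None or to zeros, depending on the PyTorch version)

print(weights)  # each weight went 1.0 -> 0.7 -> 0.4 -> 0.1 (up to float rounding)
```

The optimizer takes care of both the in-place update and the reset, so the manual torch.no_grad() block and the call to .zero_() are no longer needed.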
