Learn all the basics you need to get started with this deep learning framework! In this part we learn how to calculate gradients using the autograd package in PyTorch. This tutorial contains the following topics:
- requires_grad attribute for Tensors
- Computational graph
- Backpropagation (brief explanation)
- How to stop autograd from tracking history
- How to zero (empty) gradients
All code from this course can be found on GitHub.
The Autograd package
The autograd package provides automatic differentiation for all operations on Tensors. To tell PyTorch that we want the gradient with respect to a tensor, we have to set requires_grad=True on it. With this attribute set, all operations on the tensor are tracked in the computational graph.
import torch

# requires_grad = True -> tracks all operations on the tensor.
x = torch.randn(3, requires_grad=True)
y = x + 2

# y was created as a result of an operation, so it has a grad_fn attribute.
# grad_fn: references a Function that has created the Tensor
print(x)  # created by the user -> grad_fn is None
print(y)
print(y.grad_fn)

# Do more operations on y
z = y * y * 3
print(z)
z = z.mean()
print(z)
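To make "tracked in the computational graph" more concrete, the following is a minimal sketch that inspects the is_leaf and grad_fn attributes of the tensors in a chain like the one above. The variable names simply mirror the example; the exact text printed for each grad_fn can differ between PyTorch versions, so no output is shown for those lines.

```python
import torch

x = torch.randn(3, requires_grad=True)  # leaf tensor created by the user
y = x + 2                                # result of an addition
z = (y * y * 3).mean()                   # scalar result of further operations

# Leaf tensors have no grad_fn; results of tracked operations do.
print(x.is_leaf, x.grad_fn)
print(y.is_leaf, type(y.grad_fn).__name__)
print(z.is_leaf, type(z.grad_fn).__name__)

# Each grad_fn links back to the functions that produced its inputs,
# which is how backward() knows the path from z back to x.
print(z.grad_fn.next_functions)
```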
Let's compute the gradients with backpropagation
When we finish our computation we can simply call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into the .grad attribute. It is the partial derivative of the function w.r.t. the tensor.
z.backward()
print(x.grad)  # dz/dx
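As a sanity check, the gradient can also be derived by hand. For z = mean(3 * (x + 2)^2) over n elements, the chain rule gives dz/dx_i = 6 * (x_i + 2) / n. The sketch below recomputes z from scratch and compares autograd's result with that formula; the name `expected` is only illustrative.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x + 2
z = (y * y * 3).mean()
z.backward()

# Hand-derived gradient: dz/dx_i = 6 * (x_i + 2) / n, with n = 3 here.
expected = 6 * (x.detach() + 2) / x.numel()

print(x.grad)
print(expected)
print(torch.allclose(x.grad, expected))
```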
torch.autograd is an engine for computing vector-Jacobian products. It computes partial derivatives while applying the chain rule.
# Model with non-scalar output:
# If a Tensor is non-scalar (more than 1 element), we need to pass an argument to backward():
# a gradient argument that is a tensor of matching shape.
# It is needed for the vector-Jacobian product.
x = torch.randn(3, requires_grad=True)

y = x * 2
for _ in range(10):
    y = y * 2

print(y)
print(y.shape)

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)
y.backward(v)
print(x.grad)
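In this particular example the vector-Jacobian product can be worked out by hand. After the loop, y equals 2^11 * x = 2048 * x, so the Jacobian dy/dx is 2048 times the identity matrix and the product with v is simply 2048 * v. A small sketch that checks this (the name `expected` is illustrative):

```python
import torch

x = torch.randn(3, requires_grad=True)

y = x * 2
for _ in range(10):
    y = y * 2  # after the loop: y = 2**11 * x = 2048 * x

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)
y.backward(v)

# Jacobian is 2048 * I, so the vector-Jacobian product is 2048 * v.
expected = 2048 * v

print(x.grad)
print(torch.allclose(x.grad, expected))
```

Passing a vector of ones as v would give the gradient of y.sum() instead.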
Stop a tensor from tracking history
For example, during the training loop the weight update should not be part of the gradient computation. We have 3 options to stop gradient calculations:
- x.requires_grad_(False)
- x.detach()
- wrap in with torch.no_grad():
.requires_grad_(...) changes an existing flag in-place:
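a = torch.randn(2, 2)
print(a.requires_grad)  # False: tensors do not track gradients by default
b = ((a * 3) / (a - 1))
print(b.grad_fn)  # None: nothing was tracked

a.requires_grad_(True)
print(a.requires_grad)  # True: the flag was changed in-place
b = (a * a).sum()
print(b.grad_fn)  # now b has a grad_fn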
.detach(): get a new Tensor with the same content but no gradient computation:
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
b = a.detach()
print(b.requires_grad)
with torch.no_grad(): operations inside the block are not tracked:
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
with torch.no_grad():
    print((a ** 2).requires_grad)
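To illustrate why a weight update has to be excluded from tracking, here is a small sketch, assuming a hypothetical weight tensor `w`. Autograd rejects an in-place update on a leaf tensor that requires gradients, while the same update inside torch.no_grad() goes through and leaves no trace in the graph.

```python
import torch

w = torch.ones(3, requires_grad=True)  # hypothetical weight tensor

# An in-place update on a leaf tensor that requires grad raises an error.
try:
    w -= 0.1
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Inside torch.no_grad() the same update is allowed and is not tracked.
with torch.no_grad():
    w -= 0.1

print(w)
print(w.requires_grad, w.grad_fn)  # still requires grad, still a leaf
```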
backward() accumulates the gradient for this tensor into the .grad attribute. We need to be careful during optimization! Use .zero_() to empty the gradients before a new optimization step!
weights = torch.ones(4, requires_grad=True)

for epoch in range(3):
    # just a dummy example
    model_output = (weights*3).sum()
    model_output.backward()

    print(weights.grad)

    # optimize model, i.e. adjust weights...
    with torch.no_grad():
        weights -= 0.1 * weights.grad

    # this is important! It affects the final weights & output
    weights.grad.zero_()

print(weights)
print(model_output)
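To see what goes wrong without the reset, the sketch below runs the same dummy computation but never calls .zero_(). The weight update is left out so that only the accumulation is visible; `grads_seen` is an illustrative helper list.

```python
import torch

weights = torch.ones(4, requires_grad=True)

grads_seen = []
for epoch in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()
    # No weights.grad.zero_() here, so each backward() adds to the stored gradient.
    grads_seen.append(weights.grad.clone())

for g in grads_seen:
    print(g)
# tensor([3., 3., 3., 3.])
# tensor([6., 6., 6., 6.])
# tensor([9., 9., 9., 9.])
```

The true gradient is 3 in every iteration, so the values 6 and 9 are stale sums that would lead to incorrect update steps.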
An optimizer has a zero_grad() method that does this for us. (We will learn about optimizers in tutorial #6.)
optimizer = torch.optim.SGD([weights], lr=0.1)
# During training:
optimizer.step()
optimizer.zero_grad()
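Put together with the dummy example from above, a complete loop might look like the following sketch. It assumes plain SGD with lr=0.1 and uses the sum as a stand-in for a real loss, so each step subtracts 0.1 * 3 from every weight.

```python
import torch

weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.1)

for epoch in range(3):
    model_output = (weights * 3).sum()  # dummy stand-in for a loss
    model_output.backward()             # gradient is 3 for every weight
    optimizer.step()                    # weights <- weights - lr * grad
    optimizer.zero_grad()               # reset gradients for the next iteration

print(weights)  # roughly 0.1 everywhere: 1.0 - 3 * (0.1 * 3)
```

Depending on the PyTorch version, zero_grad() either fills .grad with zeros or sets it to None. In both cases the next backward() starts from a clean state.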