Learn all the basics you need to get started with this deep learning framework! In this part we learn how to calculate gradients using the autograd package in PyTorch. This tutorial contains the following topics:

- The `requires_grad` attribute for Tensors
- Computational graph
- Backpropagation (brief explanation)
- How to stop autograd from tracking history
- How to zero (empty) gradients

All code from this course can be found on GitHub.

## The Autograd package

The autograd package provides automatic differentiation for all operations on Tensors. To tell PyTorch that we want the gradient, we have to set `requires_grad=True`. With this attribute set, all operations on the tensor are tracked in the computational graph.

```
import torch
# requires_grad = True -> tracks all operations on the tensor.
x = torch.randn(3, requires_grad=True)
y = x + 2
# y was created as a result of an operation, so it has a grad_fn attribute.
# grad_fn: references a Function that has created the Tensor
print(x) # created by the user -> grad_fn is None
print(y)
print(y.grad_fn)
# Do more operations on y
z = y * y * 3
print(z)
z = z.mean()
print(z)
```
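Every tensor produced by an operation stores its `grad_fn`, and each `grad_fn` links back to the nodes that produced its inputs through `next_functions`. The sketch below repeats the computation above and walks one step back through the recorded graph. `is_leaf` and `next_functions` are standard attributes, but the printed node names (such as `MeanBackward0`) are PyTorch internals and may change between versions.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x + 2
z = (y * y * 3).mean()

# x was created by the user: it is a leaf of the graph and has no grad_fn
print(x.is_leaf, x.grad_fn)
# y and z are results of operations, so they are not leaves
print(y.is_leaf, z.is_leaf)
# Each grad_fn points to the nodes that produced its inputs
print(z.grad_fn)
print(z.grad_fn.next_functions)
```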

## Let's compute the gradients with backpropagation

When we finish our computation we can simply call `.backward()` and have all the gradients computed automatically. The gradient is accumulated into the `.grad` attribute of each leaf tensor that has `requires_grad=True`; here, `x.grad` holds the partial derivative of `z` with respect to `x`.

```
z.backward()
print(x.grad) # dz/dx
```
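For this small example the gradient can be checked by hand: `z = (1/3) * sum(3 * (x_i + 2)^2) = sum((x_i + 2)^2)`, so `dz/dx_i = 2 * (x_i + 2)`. The sketch below recomputes that closed form and compares it with `x.grad`; the fixed seed is only there to make the run reproducible.

```python
import torch

torch.manual_seed(0)  # any seed works; fixed only for reproducibility
x = torch.randn(3, requires_grad=True)
y = x + 2
z = (y * y * 3).mean()  # z = (1/3) * sum(3 * (x_i + 2)^2) = sum((x_i + 2)^2)
z.backward()

# Analytic gradient: dz/dx_i = 2 * (x_i + 2)
expected = 2 * (x.detach() + 2)
print(x.grad)
print(expected)
print(torch.allclose(x.grad, expected))  # True
```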

Generally speaking, `torch.autograd` is an engine for computing vector-Jacobian products. It computes partial derivatives while applying the chain rule.
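Concretely, for a function $\vec{y} = f(\vec{x})$ with $\vec{x} \in \mathbb{R}^n$ and $\vec{y} \in \mathbb{R}^m$, autograd does not build the full Jacobian $J$. Given a vector $v$ with the same shape as $\vec{y}$, `y.backward(v)` accumulates the product $v^\top J$ into `x.grad`:

$$
J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}, \qquad
v^\top J = \left( \sum_{i=1}^{m} v_i \frac{\partial y_i}{\partial x_1}, \; \ldots, \; \sum_{i=1}^{m} v_i \frac{\partial y_i}{\partial x_n} \right)
$$

If $v$ is the gradient of a scalar loss $l$ with respect to $\vec{y}$, i.e. $v_i = \partial l / \partial y_i$, the chain rule makes $v^\top J$ exactly $\partial l / \partial \vec{x}$. For a scalar output, `backward()` implicitly uses $v = 1$.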

```
# Model with non-scalar output:
# If a Tensor is non-scalar (more than one element), backward() needs an argument:
# a gradient tensor of matching shape, which plays the role of the vector v
# in the vector-Jacobian product
x = torch.randn(3, requires_grad=True)
y = x * 2
for _ in range(10):
    y = y * 2
print(y)
print(y.shape)
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)
y.backward(v)
print(x.grad)
```
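To check the vector-Jacobian product numerically, the sketch below builds the explicit Jacobian with `torch.autograd.functional.jacobian` and compares `v @ J` with `x.grad`. Because the loop above only multiplies by 2 eleven times in total, the Jacobian is `2048` times the identity matrix, so `x.grad` should equal `2048 * v`. Wrapping the computation in the helper `f` is our own choice for this sketch.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Same computation as above: y = x * 2, then doubled 10 more times, i.e. y = 2**11 * x
    y = x * 2
    for _ in range(10):
        y = y * 2
    return y

x = torch.randn(3, requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

y = f(x)
y.backward(v)  # autograd computes v^T J without forming J

J = jacobian(f, x.detach())  # explicit 3x3 Jacobian, here 2048 * identity
print(J)
print(x.grad)
print(v @ J)
print(torch.allclose(x.grad, v @ J))  # True
```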

## Stop a tensor from tracking history

For example, during our training loop, when we update our weights, this update operation should not be part of the gradient computation. We have three options to stop gradient calculations:

- `x.requires_grad_(False)`
- `x.detach()`
- wrap in `with torch.no_grad():`

`.requires_grad_(...)` changes an existing flag in-place:

```
a = torch.randn(2, 2)
print(a.requires_grad)
b = ((a * 3) / (a - 1))
print(b.grad_fn)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
```
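A typical use of `requires_grad_(False)` is freezing part of a model so that it is left out of backpropagation. The names `w` and `b` below are just illustrative stand-ins for a weight and a bias.

```python
import torch

w = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

# Freeze b: operations on it are no longer recorded for backpropagation
b.requires_grad_(False)

out = (w * 2 + b).sum()
out.backward()
print(w.grad)  # tensor([2., 2., 2.])
print(b.grad)  # None: b was frozen before the graph was built
```

Note that the flag can only be switched off this way on leaf tensors; calling `requires_grad_(False)` on the result of an operation raises a `RuntimeError`, and `.detach()` is the tool for that case.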

`.detach()` returns a new Tensor with the same content that does not require gradients and is not part of the computational graph:

```
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
b = a.detach()
print(b.requires_grad)
```
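One detail worth knowing: the detached tensor is not a copy, it shares its data with the original. The sketch below shows that an in-place change through the detached tensor is visible in the original, and that chaining `.clone()` gives an independent copy.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.detach()
print(b.requires_grad)  # False

# b shares its data with a, so an in-place change to b is visible in a
b[0] = 5.0
print(a)  # tensor([5., 1., 1.], requires_grad=True)

# Add .clone() when an independent copy is needed
c = a.detach().clone()
c[1] = 7.0
print(a)  # unchanged: tensor([5., 1., 1.], requires_grad=True)
```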

### Wrap in `with torch.no_grad()`

```
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
with torch.no_grad():
    # operations inside this block are not tracked
    print((a ** 2).requires_grad)
```
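Tensors created inside the block keep `requires_grad=False` even after the block ends, while tracking resumes for new operations outside it. `torch.no_grad()` can also be used as a function decorator, which is handy for evaluation code; the function name `evaluate` below is only illustrative.

```python
import torch

x = torch.randn(3, requires_grad=True)

with torch.no_grad():
    y = x * 2
print(y.requires_grad, y.grad_fn)  # False None

# torch.no_grad() also works as a decorator
@torch.no_grad()
def evaluate(t):
    return t ** 2

z = evaluate(x)
print(z.requires_grad)  # False

# Outside the context, tracking resumes
print((x * 2).requires_grad)  # True
```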

## Empty gradients!

`backward()` **accumulates the gradient** for this tensor into the `.grad` attribute, so we need to be careful during optimization!

Use `.zero_()` to empty the gradients before a new optimization step!

```
import torch

weights = torch.ones(4, requires_grad=True)
for epoch in range(3):
    # just a dummy example
    model_output = (weights * 3).sum()
    model_output.backward()
    print(weights.grad)
    # optimize model, i.e. adjust weights...
    with torch.no_grad():
        weights -= 0.1 * weights.grad
    # this is important! It affects the final weights & output
    weights.grad.zero_()
print(weights)
print(model_output)
```
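To see why the `zero_()` call matters, here is the same dummy loop with the update and the reset removed: each `backward()` call adds another `3` to every entry of `weights.grad` instead of replacing it.

```python
import torch

weights = torch.ones(4, requires_grad=True)
for epoch in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()
    # No weights.grad.zero_() here, so gradients pile up across iterations
    print(weights.grad)
# Prints tensor([3., 3., 3., 3.]), then tensor([6., 6., 6., 6.]), then tensor([9., 9., 9., 9.])
```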

### Optimizer has `zero_grad()` method

(We will learn about optimizers in tutorial #6.)

```
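import torch
weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.1)
# During training:
model_output = (weights * 3).sum()
model_output.backward()
optimizer.step()       # update the weights using their .grad
optimizer.zero_grad()  # reset the gradients before the next iteration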
```
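As a rough sketch, assuming plain SGD with its default arguments (no momentum or weight decay), `optimizer.step()` performs the same `weights -= lr * weights.grad` update as the manual loop, and `optimizer.zero_grad()` takes over the job of `weights.grad.zero_()`. The comparison below runs both versions side by side; the variable names are our own.

```python
import torch

# Manual update, as in the dummy loop earlier
w_manual = torch.ones(4, requires_grad=True)
# Same starting point, updated by torch.optim.SGD instead
w_optim = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([w_optim], lr=0.1)

for epoch in range(3):
    (w_manual * 3).sum().backward()
    with torch.no_grad():
        w_manual -= 0.1 * w_manual.grad
    w_manual.grad.zero_()

    (w_optim * 3).sum().backward()
    optimizer.step()       # w_optim -= lr * w_optim.grad
    optimizer.zero_grad()  # reset gradients before the next iteration

print(w_manual)
print(w_optim)
print(torch.allclose(w_manual, w_optim))  # True
```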