What are the two main features of PyTorch?
  1. An n-dimensional Tensor that can run on GPUs
  2. Autograd, which enables automatic differentiation for building and training neural networks
Why PyTorch Tensor and not Numpy?

NumPy provides an n-dimensional array object that is similar to an n-dimensional tensor. However, NumPy is a scientific computing framework that knows nothing about computation graphs, deep learning, or gradients, and it cannot utilise GPUs to accelerate numerical computations.

This is why a PyTorch Tensor is used here. A Tensor is conceptually identical to a NumPy array, except that it can keep track of a computational graph and gradients, and it can utilise GPUs to accelerate its numerical computations.
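As a quick illustrative sketch (the array and variable names below are made up for demonstration, not part of the tutorial code that follows), a Tensor can be created from a NumPy array and moved onto a GPU when one is available:

import numpy as np
import torch

a = np.random.randn(3, 4)                  # a plain NumPy array, CPU only
t = torch.from_numpy(a)                    # a Tensor sharing the same underlying data
if torch.cuda.is_available():
    t = t.to(torch.device('cuda:0'))       # move the Tensor onto the GPU
print(t.device, t.shape)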

PyTorch on a 2-layer Neural Network

1. Import dependencies and initialise input and weights

In [1]:
import torch

dtype = torch.float
device = torch.device('cpu') # running on CPU
# device = torch.device('cuda:0') # running on GPU
In [2]:
N = 64 # batch_size
D_in = 1000 # input dimension
H = 100 # hidden dimension
D_out = 10 # output dimension
In [3]:
x = torch.randn(N, D_in, device = device, dtype = dtype) # randomly generated input
y = torch.randn(N, D_out, device = device, dtype = dtype) # randomly generated output
In [9]:
print(x.shape)
print(y.shape)
torch.Size([64, 1000])
torch.Size([64, 10])
In [8]:
w1 = torch.randn(D_in, H, device = device, dtype = dtype) # randomly initialise weight1
w2 = torch.randn(H, D_out, device = device, dtype = dtype) # randomly initialise weight2
In [10]:
print(w1.shape)
print(w2.shape)
torch.Size([1000, 100])
torch.Size([100, 10])
In [11]:
learning_rate = 1e-6

2. Training

In [12]:
for t in range(500):
    
    # Forward
    h = x.mm(w1) # matrix multiplication of x (64, 1000) and w1 (1000, 100)
    h_relu = h.clamp(min = 0) # non-linearity function
    y_pred = h_relu.mm(w2) # matrix multiplication between output of layer 1 (64, 100) and w2 (100, 10)
    
    # Compute loss
    loss = (y_pred - y).pow(2).sum().item() # sum of squared errors, returned as a Python float by .item()
    if t % 100 == 99:
        print(t, loss)
    
    # Backprop
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # Update weights
    w1 = w1 - learning_rate*grad_w1
    w2 = w2 - learning_rate*grad_w2
99 308.5810546875
199 0.8532795310020447
299 0.00400793831795454
399 0.00013242007116787136
499 2.9330951292649843e-05
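For reference, the manual gradients above come straight from the chain rule. The loss is sum((y_pred - y)^2), so its derivative with respect to y_pred is 2 * (y_pred - y). Because y_pred = h_relu.mm(w2), the gradient with respect to w2 is h_relu.t().mm(grad_y_pred) and the gradient flowing back into h_relu is grad_y_pred.mm(w2.t()). ReLU only passes gradients where h > 0 (hence grad_h[h < 0] = 0), and since h = x.mm(w1), the gradient with respect to w1 is x.t().mm(grad_h).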

Autograd

Above, we had to implement both the forward and backward passes by hand. As a neural network grows larger, implementing the backward pass manually becomes increasingly complex. PyTorch's autograd can compute backward passes for us automatically. When using autograd, the forward pass defines a computational graph whose nodes are Tensors and whose edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then lets us compute gradients easily.

If x is a Tensor with requires_grad = True, then after backprop x.grad will be another Tensor holding the gradient of the loss with respect to x.
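As a tiny illustrative sketch (the tensor v below is made up for demonstration), calling .backward() on a scalar populates .grad for every Tensor created with requires_grad = True:

v = torch.tensor([2.0, 3.0], requires_grad=True)
s = (v ** 2).sum()   # a scalar that depends on v
s.backward()         # computes ds/dv and stores it in v.grad
print(v.grad)        # tensor([4., 6.]), i.e. 2 * v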

1. Initialise inputs and weights

Remember to set requires_grad = True for the Tensors whose gradients you want computed during backprop!

In [20]:
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)


w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

2. Training (with autograd)

In [21]:
for t in range(500):
    
    # Forward pass
    y_pred = x.mm(w1).clamp(min = 0).mm(w2)
    
    # Compute loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item()) # .item() returns the Python number held by a single-element tensor
    
    # Backprop using autograd. This will compute gradient of loss w.r.t all Tensors with requires_grad = True, which
    # in our case is weight 1 and 2.
    loss.backward()
    
    with torch.no_grad():
        # update the weights inside torch.no_grad() so that autograd does not track these update operations
        
        # inplace subtraction instead of assigning it to a new tensor
        w1.sub_(w1.grad*learning_rate)
        w2.sub_(w2.grad*learning_rate)
        
        # set gradient to zero after weights update
        w1.grad.zero_()
        w2.grad.zero_()
99 513.0426635742188
199 1.752084493637085
299 0.008917410857975483
399 0.00017502835544291884
499 3.062380346818827e-05

PyTorch nn module

The nn package provides high-level abstractions over raw computational graphs. It includes a set of Modules that are roughly equivalent to neural network layers. A Module receives input Tensors, computes output Tensors, and may hold internal state such as learnable parameters.

The nn package also has a set of common loss functions.
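As a small sketch (separate from the model defined below), a single torch.nn.Linear module already holds its weight and bias as learnable parameters:

layer = torch.nn.Linear(D_in, H)  # one layer: a Module with learnable weight and bias
print(layer.weight.shape)         # torch.Size([100, 1000])
print(layer.bias.shape)           # torch.Size([100])

The Sequential model and the MSELoss loss function used below are built from exactly these kinds of modules.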

optim

The optim package provides implementations of commonly used optimisation algorithms.

In [22]:
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

1. Using the nn package to define the layers of a two-layer network

In [24]:
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H), 
    torch.nn.ReLU(), 
    torch.nn.Linear(H, D_out),
)

2. Using the nn package to define the loss function

In [25]:
loss_fn = torch.nn.MSELoss(reduction = 'sum')

3. Training

In [26]:
learning_rate = 1e-4
In [27]:
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
In [28]:
for t in range(500):
    
    # Forward
    y_pred = model(x)
    
    # Loss
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
    
    # Zero the gradients before running backprop (as gradients are accumulated by default)
    optimizer.zero_grad()
    
    # Backprop
    loss.backward()
    
    # Update weights
    optimizer.step()
99 47.96703338623047
199 1.0740092992782593
299 0.04033464938402176
399 0.0012894074898213148
499 1.719343345030211e-05

How to create your own complex model using torch.nn.Module

  1. Create a subclass inheriting from torch.nn.Module
  2. Define the forward function that takes in input Tensors and returns output Tensors
In [30]:
class TwoLayerNet(torch.nn.Module):
    
    def __init__(self, D_in, H, D_out):
        
        super(TwoLayerNet, self).__init__()
        
        # Initialise all the layers of your model
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)
    
    def forward(self, x):
        
        # Here, we define our forward pass!
        
        h_relu = self.linear1(x).clamp(min = 0)
        y_pred = self.linear2(h_relu)
        
        return y_pred
In [31]:
model = TwoLayerNet(D_in, H, D_out)
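The post stops at constructing the model, but as a final sketch (reusing x, y and D_in, H, D_out from above; the choice of plain SGD with lr = 1e-4 here is an assumption, not taken from the post), the custom module can be trained with the same loop used for the Sequential model:

criterion = torch.nn.MSELoss(reduction = 'sum')
optimizer = torch.optim.SGD(model.parameters(), lr = 1e-4)

for t in range(500):
    y_pred = model(x)          # forward pass through TwoLayerNet
    loss = criterion(y_pred, y)
    optimizer.zero_grad()      # clear gradients accumulated from the previous step
    loss.backward()            # backprop through linear1 and linear2
    optimizer.step()           # update the parameters of both layers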