An Introduction to PyTorch Fundamentals for Training DL Models

This blog post explains the basics of PyTorch tensors, the workflow to train a 2-layer neural network on a vision dataset, and how to track training progress in TensorBoard
Coding
DL
Author

Senthil Kumar

Published

August 15, 2021

Introduction

  • We have taken the FashionMNIST dataset and prepared a simple 2-layer NN model to uncover the fundamental concepts of PyTorch
  • Before going into the DL portions, let us look at Tensors first

0. What are Tensors

  • Tensors are numerical data structures, similar to arrays and matrices, that encode the inputs, outputs and weights/parameters of a model.
  • Typical 1D and 2D arrays:

image Source: docs.microsoft.com/en-US/learn

  • How to imagine a 3D array:

image Source: docs.microsoft.com/en-US/learn

  • Tensors work well on GPUs and are optimized for automatic differentiation
  • A tensor created with torch.from_numpy shares its underlying memory with the source NumPy array, so a change in one is reflected in the other. For example, review the code below
import numpy as np
import torch

data = [[1,2],[3,4]]
np_array = np.array(data)
tensor_array = torch.from_numpy(np_array)

# doing a multiplication operation on `np_array` in place
np.multiply(np_array,2,out=np_array)

print(f"Numpy array:{np_array}")
print(f"Tensor array:{tensor_array}")
Numpy array:[[2 4]
 [6 8]]
Tensor array:tensor([[2, 4],
        [6, 8]])

How to initialize a tensor?:

# directly from a python datastructure element
data = [[1,2],[3,4]]
x_tensor_from_data = torch.tensor(data)

# from numpy_array
np_array = np.array(data)
x_tensor_from_numpy = torch.from_numpy(np_array)

# from other tensors
x_new_tensor = torch.rand_like(x_tensor_from_data, dtype=torch.float) # dtype overrides the dtype of x_tensor_from_data
    
# random or new tensor of given shape
shape = (2,3,) # or just (2,3)
x_new_tensor_2 = torch.ones(shape)

What are the attributes of a tensor?:

print(f"{x_new_tensor_2.shape}")
print(f"{x_new_tensor_2.dtype}")
print(f"{x_new_tensor_2.device}") # whether stored in CPU or GPU

When to use CPU and when to use GPU while operating on tensors?:

  • Some common tensor operations include: Any arithmetic operation, linear algebra, matrix manipulation (transposing, indexing, slicing)
  • Typical GPUs have thousands of cores and are designed for parallel processing.

image Source: docs.microsoft.com/en-US/learn

  • Typical CPUs have 4 cores; modern CPUs can have up to 16. Cores are the units that do the actual computation, and each core processes tasks sequentially

image Source: docs.microsoft.com/en-US/learn

  • Caveat: Copying large tensors across devices can be expensive w.r.t time and memory

  • PyTorch uses the NVIDIA CUDA library in the backend to operate on GPU cards

if torch.cuda.is_available():
    gpu_tensor = original_tensor.to('cuda') 
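A small, hedged extension of the snippet above: a common device-agnostic pattern is to pick the device once and move tensors (or models) with .to(device); the variable names here are only illustrative.

device = "cuda" if torch.cuda.is_available() else "cpu"
device_tensor = original_tensor.to(device)   # lands on GPU if available, else stays on CPU
cpu_tensor = device_tensor.cpu()             # move back to CPU, e.g. before calling .numpy()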

What are the common tensor operations?:
  • Joining or conCATenation

new_tensor = torch.cat([tensor, tensor],dim=1) # join along column if dim=1
  • Matrix Multiplication
# you would have to do the transpose
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)
y3 = torch.rand_like(tensor)
torch.matmul(tensor, tensor.T, out=y3)
assert torch.equal(y1, y2) and torch.equal(y2, y3)
  • Element-wise Multiplication
z1 = tensor * tensor
z2 = tensor.mul(tensor)
z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)
  • Single element tensor into python numerical value
sum_of_values = tensor.sum()
sum_of_values_python_variable = sum_of_values.item()
print(sum_of_values.dtype, type(sum_of_values_python_variable))
# >> torch.int64, <class 'int'>
  • In-place Operations
# add in_place
tensor.add_(5)
# transpose  in place
tensor.t_()

Summary of the key operations

  • torch.cuda.is_available() gives a boolean output
  • torch.tensor(x)
  • x could be a 1D or 2D iterable (list or tuple)
  • torch.ones_like(tensor_variable), torch.rand_like(tensor_variable)
  • torch.ones(shape_in_a_tuple_or_list) , torch.zeros(shape_in_a_tuple_or_list) and torch.rand(shape_in_a_tuple_or_list)
  • torch_tensor_variable[start_index:end_index:step_value] (similar to a numpy indexing)
  • numpy to torch tensor: torch.from_numpy(np_array)
  • torch_tensor to numpy: torch_tensor_variable.numpy()
  • Concatenate across rows torch.cat((an_iterable_of_tensors),dim=0)
  • Concatenate across columns torch.cat((an_iterable_of_tensors),dim=1)
  • tensor multiplication tensor1 * tensor2 == torch.mul(tensor1,tensor2,out=tensor3) == tensor1.mul(tensor2)
  • convert single_element_tensor into a python datatype using .item() –> single_element_tensor = tensor1.sum(); python_variable = single_element_tensor.item()
  • In-place Operations in torch using _: x.add_(5) will add 5 to each element of x
  • NumPy bridge: n = t.numpy() shares memory with the tensor t, so np.add(n,2,out=n) changes t automatically (and the reverse is also true); see the sketch below
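To make the indexing and NumPy-bridge points above concrete, here is a minimal sketch (the values are illustrative):

t = torch.ones(5)
print(t[1:4])        # slicing works like numpy indexing
n = t.numpy()        # `n` shares memory with `t`
np.add(n, 2, out=n)  # in-place change on the numpy side
print(t)             # tensor([3., 3., 3., 3., 3.]) -- `t` changed too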

Importing relevant modules

Code
%matplotlib inline
import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler
# torchvision.datasets module contains `Dataset` objects for many real-world vision data
from torchvision import datasets # other domain-specific libraries TorchAudio, TorchText
from torchvision.transforms import (
    ToTensor, # for normalizing the pixel values to the range [0,1]
    Lambda, # to make user-defined functions as one of the transformations 
    )
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter
import numpy as np

1. Dataset and DataLoaders

Two data primitives to handle data efficiently:
- torch.utils.data.Dataset
- torch.utils.data.DataLoader

How should the data be preprocessed before training in DL?:
- Pass samples of data in minibatches
- Reshuffle the data at every epoch to reduce overfitting
- Leverage Python's multiprocessing to speed up data retrieval

torch.utils.data.DataLoader abstracts all the above steps

What do Dataset and DataLoader do?
- Dataset: stores data samples and their corresponding labels
- DataLoader: wraps an iterable around a Dataset to enable easy access to the samples; it can also be used along with torch.multiprocessing
- torchvision.datasets and torchtext.datasets are subclasses of torch.utils.data.Dataset (they have __getitem__ and __len__ methods implemented), so they can be passed to a torch.utils.data.DataLoader

What does normalization do?:
- Changes the range of the data
- When one pixel value is 15 and another is 190, the larger value can dominate the learning if the data is not normalized

Why do we normalize data before training a DL model? (see the short sketch below)
- Prediction accuracy is better for normalized data
- The model can learn faster if the data is normalized
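As a quick illustration of what ToTensor's normalization does, a minimal sketch (assuming torchvision is installed; the fake image below is made up):

from torchvision.transforms import ToTensor
fake_image = np.random.randint(0, 256, size=(28, 28, 1), dtype=np.uint8)  # raw pixel values 0-255, HWC layout
tensor_image = ToTensor()(fake_image)  # converts HWC uint8 -> CHW float32 scaled to [0,1]
print(tensor_image.shape)              # torch.Size([1, 28, 28])
print(tensor_image.min().item(), tensor_image.max().item())  # values lie within [0.0, 1.0]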

More details on PyTorch Primitives

  • torchvision.datasets –> to use pre-existing datasets like FashionMNIST, COCO, CIFAR, etc.
  • torchvision.datasets have a transform argument to transform the features (aka inputs) and a target_transform argument to transform the labels (e.g. one-hot encoding of labels)
  • A custom dataset class must implement the Python magic methods __init__, __getitem__ and __len__ expected of a Dataset
  • torchvision.transforms.ToTensor (to transform/modify the features) converts the features into normalized tensors
  • torchvision.transforms.Lambda (to transform the target/labels) can wrap user-defined functions, e.g. Lambda(lambda y: torch.zeros(number_of_classes, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))
  • Tensor.scatter_ is used to change values of a tensor variable at specified indices, as shown in the sketch below
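A tiny sketch of how scatter_ builds the one-hot label vector (the label value 3 is just an example):

y = 3  # an example class label
one_hot = torch.zeros(10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1)
print(one_hot)  # tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])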

1A. Converting Data into Model Suitable Iterables

  • Downloading and transforming the datasets
  • Preparing train, validation and test datasets
# help(datasets.FashionMNIST)
help(datasets.MNIST)
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py:498: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448265233/work/torch/csrc/utils/tensor_numpy.cpp:180.)
  return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
print(test_data.test_labels[0:5])
tensor([9, 2, 1, 1, 6])
training_data.class_to_idx
{'T-shirt/top': 0,
 'Trouser': 1,
 'Pullover': 2,
 'Dress': 3,
 'Coat': 4,
 'Sandal': 5,
 'Shirt': 6,
 'Sneaker': 7,
 'Bag': 8,
 'Ankle boot': 9}
# If you have a custom dataset in your location

# additional imports needed by this custom Dataset class
import os
import pandas as pd
from torch.utils.data import Dataset
from torchvision import io as tvio

class CustomImageDataset(Dataset):
    """FashionMNIST like Image Dataset Class"""
    def __init__(self, 
                 annotations_file,
                 img_dir,
                 transform=None,
                 target_transform=None):
        """
        Args:
            transform (Optional): dataset will take an optional argument transform 
                so that any required processing can be applied on the sample
        """
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform
    
    def __len__(self):
        return len(self.img_labels)
    
    def __getitem__(self, idx):
        # format of data 
        # image_location, label_type
        # tshirt1.jpg, T-shirt/top # class needs to be converted into numerical format
        # pant4.jpg, Trouser # class needs to be converted into numerical format
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx,0])
        image = tvio.read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        sample = {"image": image, "label": label}
        return sample


# target_transform
# turn the integer y values into a `one_hot_encoded` vector 
# 1. create a zero tensor of size 10 torch.zeros(10, dtype=torch.float)
# 2. `scatter_` assigns a value =1
the_target_lambda_function = Lambda(lambda y: torch.zeros(10,
                                    dtype=torch.float).scatter_(dim=0,
                                                    index=torch.tensor(y), value=1))


training_data = datasets.FashionMNIST(
    root="data", # the path where the train/test data is stored
    train=True, # False if it is a test dataset 
    download=False, # downloads the data from the web if not available at root
    transform=ToTensor(), # transforms the features; converts a PIL image or numpy array into a FloatTensor and scales the image's pixel intensities to the range [0,1]
    target_transform=the_target_lambda_function
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=False,
    transform=ToTensor(),
    target_transform=the_target_lambda_function
    # target_transform=torch.nn.functional.one_hot(y, num_classes=10) # alternate way
)

Preparing Validation Data from Training Data

indices = list(range(len(training_data)))
np.random.shuffle(indices)

print(indices[0:5])
[7400, 11594, 9947, 24051, 56426]
split = int(np.floor(0.2 * len(training_data)))
training_data_sample = SubsetRandomSampler(indices[split:])
validation_data_sample = SubsetRandomSampler(indices[:split])
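As an aside, the same 80/20 split can also be expressed with torch.utils.data.random_split instead of samplers; a hedged sketch (the 48000/12000 sizes assume the full 60000-image FashionMNIST training set):

from torch.utils.data import random_split
train_subset, validation_subset = random_split(training_data, [48000, 12000])
# these Subset objects can then be passed directly to DataLoader without a sampler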

Convert into iterables

batchsize = 4

# create iterables 
train_dataloader = DataLoader(training_data, sampler=training_data_sample, batch_size=batchsize)
validation_dataloader = DataLoader(training_data, sampler=validation_data_sample, batch_size=batchsize)
test_dataloader = DataLoader(test_data, batch_size=batchsize)

print(len(train_dataloader))
print(len(validation_dataloader))
print(len(test_dataloader))

# to understand the shape of input features and output
for X,y in test_dataloader:
    print("Shape of Features:",X.shape)
    print("Shape of Labels:",y.shape)
    break
12000
3000
2500
Shape of Features: torch.Size([4, 1, 28, 28])
Shape of Labels: torch.Size([4])
len(train_dataloader)
12000
  • The shape of the image batch above is in the NCHW format
  • batchsize N, no. of channels C, height H, width W

1B. Visualize sample data

dataiter = iter(train_dataloader)
images, labels = next(dataiter)

fig = plt.figure(figsize=(5,5))

for idx in np.arange(4):
    ax = fig.add_subplot(1, 4, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(labels[idx].item())
    fig.tight_layout()

# looking into just one image, label

figure = plt.figure(figsize=(5,5))
img, label = test_data[0]

plt.axis("off")
plt.imshow(img.squeeze(),cmap="gray")
<matplotlib.image.AxesImage at 0x7f38e81aedd0>

# Helper function for inline image display
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
        
dataiter = iter(train_dataloader)
images, labels = next(dataiter)

# Create a grid from the images and show them
img_grid = torchvision.utils.make_grid(images)
matplotlib_imshow(img_grid, one_channel=True)

1C. Initiating the Tensorboard Logs and Visualizing Sample Images

# specifying the log directory
writer = SummaryWriter('runs/fashion_mnist_2_layer_NN_experiment_1')

# writing the grid of 4 images to Tensorboard log dir
writer.add_image('Four Sample Fashion-MNIST Images', img_grid)
writer.flush()

How to load the tensorboard

To view, start TensorBoard on the command line:
- tensorboard --logdir=runs
- open a browser tab to http://localhost:6006/
- the sample images can be viewed in the IMAGES tab

  • Load the TensorBoard notebook extension for jupyter notebook
%load_ext tensorboard
  • Run the tensorboard from jupyter notebook
%tensorboard --logdir runs/fashion_mnist_2_layer_NN_experiment_1

2. Build the Model Layers

Build a NN with 2 hidden layers and 1 output layer

Components of a Neural Network:

  • Typical Neural Network:

image
  • Activation Function, Weight and Bias

image
  • Linear weighted sum of inputs: x = ∑(weights * inputs) + bias

  • f(x) = activation_func(x)

  • Activation Functions add non-linearity to the model

  • Different Activation Functions (evaluated in the short sketch after this list):

    • Sigmoid: 1/(1 + exp(-x))
    • Softmax: exp(x) / (sum(exp(x)))
    • ReLU: max(0,x)
    • Tanh: (exp(x) - exp(-x))/(exp(x) + exp(-x))
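A minimal sketch evaluating these activations on a sample tensor (the input values are purely illustrative):

x = torch.tensor([-2.0, 0.0, 3.0])
print(torch.sigmoid(x))         # 1 / (1 + exp(-x))
print(torch.relu(x))            # max(0, x) -> tensor([0., 0., 3.])
print(torch.tanh(x))            # (exp(x) - exp(-x)) / (exp(x) + exp(-x))
print(torch.softmax(x, dim=0))  # exponentiates and normalizes so the outputs sum to 1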

Building a neural network in PyTorch:
- The torch.nn namespace provides all the building blocks needed to build a NN
- Every module/layer in PyTorch subclasses torch.nn.Module
- A NN is itself a composite module consisting of other modules (layers)

  • Initialize all layers in __init__ module
  • Build a 3-layer NN with
    • a flattened 28*28 image as input,
    • 2 hidden layers with 512 neurons each, and
    • an output layer (here followed by a ReLU activation) with 10 neurons, one for each class
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device}")
Using cuda
# defining the model architecture

class NeuralNetwork(nn.Module):
    def __init__(self):
        # initialize the layers in __init__ constructor
        super(NeuralNetwork,self).__init__()
        # supercharge your sub-class by inheriting the defaults from Parent class
        self.flatten = nn.Flatten()
        # one can also use Functional API in PyTorch 
        # but below codes use Sequential API
        # the below stack of layers generates scores or logits
        self.linear_relu_stack = nn.Sequential(
            # hidden layer 1 consisting of 512 neurons
            nn.Linear(28*28, 512),
            nn.ReLU(),
            # hidden layer 2 consisting of 512 neurons too
            nn.Linear(512,512),
            nn.ReLU(),
            # output layer consisting of 10 neurons 
            nn.Linear(512,10),
            # we can also build a NN without this final layer ReLU
            # instead can also run the log_softmax directly
            nn.ReLU(), 
        )
        
    def forward(self,x): # need to pass the input argument x
        # function where the input is run through 
        # the initialized layers
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
    
# create a instance of the class NeuralNetwork 
# move it to the device (CPU or GPU)
model = NeuralNetwork().to(device)

# print model structure
print(model)

# is nn.ReLU in the final layer?
# https://ai.stackexchange.com/questions/8491/does-it-make-sense-to-apply-softmax-on-top-of-relu
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)
  • Why model(X) instead of model.forward(X)?
    Source
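In short, model(X) goes through nn.Module.__call__, which runs any registered hooks before and after dispatching to forward, whereas calling forward directly bypasses them. A minimal illustration (the hook below is hypothetical):

def print_shape_hook(module, inputs, output):  # a hypothetical forward hook
    print("output shape:", output.shape)

hook_handle = model.register_forward_hook(print_shape_hook)
_ = model(torch.rand(1, 28, 28, device=device))          # hook fires, because __call__ runs it
_ = model.forward(torch.rand(1, 28, 28, device=device))  # calling forward directly skips the hook
hook_handle.remove()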

Dissecting the steps using Functional API

  • Step 1: Flatten the 28*28 image into a contiguous array of 784 pixel values
input_image = torch.rand(3, 28, 28)
print(input_image.size())
# step 1: Flatten the input image
flatten = nn.Flatten() # instantiate
flat_image = flatten(input_image)  # pass the prev layer (input) into the instance
print(flat_image.size())
  • Step 2: A dense or linear layer in PyTorch applies weight * input + bias
# step 2: apply linear transformation `weight * input + bias`
layer1 = nn.Linear(in_features=28*28, out_features=512) # instantiate
hidden1 = layer1(flat_image) # pass the prev layer (flattened image) into the instance
print(hidden1.size())
  • Step 3: Apply Relu activation on the linear transformation
relu_activation = nn.ReLU() #instantiate
hidden1 = relu_activation(hidden1)

Repeat Step 2 and 3 for hidden2:

layer2 = nn.Linear(in_features=512, out_features=512)
hidden2 = layer2(hidden1)
hidden2 = relu_activation(hidden2)
  • Step 4: Compute the logits
# a 2-hidden-layer NN with 512 neurons in each hidden layer
nn_seq_modules = nn.Sequential(
                    flatten,
                    layer1,
                    relu_activation,
                    layer2,
                    relu_activation,
                    nn.Linear(512, 10), # the output layer
                )
input_image = torch.rand(3, 28, 28)
logits =  nn_seq_modules(input_image)   
  • Step 5: Apply Softmax function

softmax = nn.Softmax(dim=1)
predict_probab = softmax(logits)
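As a small follow-up, the predicted class for each image in the batch is simply the argmax over these probabilities (a sketch using the tensors defined above):

predicted_classes = predict_probab.argmax(dim=1)  # one class index per image in the batch
print(predicted_classes.shape)                    # torch.Size([3])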
  • Full NN workflow:

image Source: docs.microsoft.com/en-US/learn

How to see internal layers of a NN in PyTorch:

print("Weights stored in first layer: {model.linear_relu_stack[0].weight} \n")
print("Bias stored in first layer: {model.linear_relu_stack[0].bias} \n") 
    
from name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()}"
Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784])
Layer: linear_relu_stack.0.bias | Size: torch.Size([512])
Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512])
Layer: linear_relu_stack.2.bias | Size: torch.Size([512])
Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512])
Layer: linear_relu_stack.4.bias | Size: torch.Size([10])

3. Training the Model

Training with training data and evaluating loss on Validation Data

3A.Setting Hyperparameters

  • num_of_epochs: The number of times the entire training dataset is passed through the network
  • batch_size: The number of data samples seen by the model before it updates its weights (a derived quantity: steps = total_training_data / batch_size, the number of batches needed to complete an epoch)
  • learning_rate: How much to change the weights in the update w = w - learning_rate * gradient. A smaller value means the model takes longer to find the best weights; a larger value might make the NN miss the optimal weights because the updates can step over the best values
  • Choice of loss_fn
    Common Loss Functions for classification problems :
    • nn.NLLLoss #Negative Log Likelihood
    • nn.CrossEntropyLoss # combination of nn.LogSoftmax and nn.NLLLoss
  • Choice of optimizers
    • torch.optim.SGD
    • torch.optim.Adam
    • torch.optim.RMSProp and many more …
num_of_epochs = 40
batchsize = 4 # already mentioned in the DataLoader arguments
learning_rate = 1e-3

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(),
                            lr=learning_rate
                           )
# the SGD optimizer in PyTorch is effectively mini-batch gradient descent (with optional momentum)
# it updates the weights once per mini-batch (of size batchsize)
# Source: https://discuss.pytorch.org/t/how-sgd-works-in-pytorch/8060

3B. Writing Core Training and Evaluation Loop Functions

  • loss_fn and optimizer are passed to train_loop and just loss_fn to test_loop
for i in range(num_of_epochs):
    print(f"Epoch {i+1}\n ----------------------------")
    train_loop(train_dataloader, validation_dataloader, model, loss_fn, optimizer, i)
    test_loop(test_dataloader, model, loss_fn, i)
print("Over!")    
def train_loop(train_dataloader, validation_dataloader, model, loss_fn, optimizer, epoch):
    train_size = len(train_dataloader.dataset)
    validation_size = len(validation_dataloader.dataset)
    # switch back to training mode, since test_loop puts the model in eval mode
    model.train()
    training_loss_per_epoch = 0
    validation_loss_per_epoch = 0
    for batch_number, (X,y) in enumerate(train_dataloader):
        X,y = X.to(device), y.to(device)
        
        # compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)
        
        # Backpropagation steps
        # key optimizer steps
        # by default, gradients add up in PyTorch
        # we zero out in every iteration
        optimizer.zero_grad() 
        # performs the gradient computation steps (across the DAG)
        loss.backward()
        # adjust the weights
        optimizer.step()
        training_loss_per_epoch += loss.item()
        
#         if batch_number % 100 == 0:
#             print(f"After completing {batch_number * len(X)} samples, the loss is:")
#             print(loss.item()) 
            
    # evaluate on the validation set without tracking gradients
    with torch.no_grad():
        for batch_number, (X,y) in enumerate(validation_dataloader):
            X,y = X.to(device), y.to(device)

            # compute prediction error
            pred = model(X)
            loss = loss_fn(pred, y)

            validation_loss_per_epoch += loss.item()
    avg_training_loss = training_loss_per_epoch/train_size
    avg_validation_loss = validation_loss_per_epoch/validation_size
    print(f"Average Training Loss of {epoch}: {avg_training_loss}")
    print(f"Average Validation Loss of {epoch}: {avg_validation_loss}")
    writer.add_scalars('Training vs. Validation Loss',
                       {'Training': avg_training_loss, 
                        'Validation': avg_validation_loss
                       },
                       epoch
                      )
def test_loop(test_dataloader,model, loss_fn, epoch):
    test_size = len(test_dataloader.dataset)
    # Failing to do eval can yield inconsistent inference results
    model.eval()
    test_loss_per_epoch, accuracy_per_epoch = 0, 0
    # disabling gradient tracking while inference
    with torch.no_grad():
        for X,y in test_dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            loss = loss_fn(pred, y)
            test_loss_per_epoch += loss.item()
            accuracy_per_epoch += (pred.argmax(1)==y).type(torch.float).sum().item()
    print(f"Average Test Loss of {epoch}: {test_loss_per_epoch/test_size}")
    print(f"Average Accuracy of {epoch}: {accuracy_per_epoch/test_size}")

3C. Training the model for many epochs

%%time
for epoch in range(num_of_epochs):
    print(f"Epoch Number: {epoch} \n---------------------")
    train_loop(train_dataloader, validation_dataloader, model, loss_fn, optimizer, epoch)
    test_loop(test_dataloader,model, loss_fn, epoch)
Epoch Number: 0 
---------------------
Average Training Loss of 0: 0.37492141907910503
Average Validation Loss of 0: 0.07822599628902972
Average Test Loss of 0: 0.3941003955438733
Average Accuracy of 0: 0.4513
Epoch Number: 1 
---------------------
Average Training Loss of 1: 0.29412952572156986
Average Validation Loss of 1: 0.06984573040464893
Average Test Loss of 1: 0.3524202892445028
Average Accuracy of 1: 0.5089
Epoch Number: 37 
---------------------
Average Training Loss of 37: 0.13975639427933614
Average Validation Loss of 37: 0.037423237568447926
Average Test Loss of 37: 0.19380079013922005
Average Accuracy of 37: 0.7052
Epoch Number: 38 
---------------------
Average Training Loss of 38: 0.13921849230745761
Average Validation Loss of 38: 0.038412615390023046
Average Test Loss of 38: 0.19745682889677718
Average Accuracy of 38: 0.7015
Epoch Number: 39 
---------------------
Average Training Loss of 39: 0.13862396091737622
Average Validation Loss of 39: 0.03721317019570803
Average Test Loss of 39: 0.1929354560287782
Average Accuracy of 39: 0.7063
CPU times: user 12min 2s, sys: 5.22 s, total: 12min 7s
Wall time: 11min 39s

The epoch-wise results above have been truncated for easy viewing

Points to ponder:
- The accuracy of this 2-layer NN stands at about 71%.
- The hyperparameters (batch_size, learning_rate, choice of optimizer) can be varied to see how the results change.
- Changing the architecture, e.g. adding more hidden layers or switching to a CNN or a pre-trained network such as LeNet-5, can improve the accuracy further.

3D. Saving, Loading and Exporting the model

!mkdir -p model_weights/
torch.save(model.state_dict(),"model_weights/fmnist_2_layer_nn_model_batch_size_4.pth")

How to save and load the model for inference?

# pytorch models save the parameters in a internal state dictionary called `state_dict`
torch.save(model.state_dict(),"data/modelname.pth")
    
# infer from a saved model
# instantiate the model architecture class
model = NeuralNetwork()
model.load_state_dict(torch.load("data/modelname.pth"))
# the eval method is called before inference so that batch normalization and dropout layers are set to `evaluation` mode
# Failing to do this can yield inconsistent inference results
model.eval()

How to export a pytorch model to run in any Programming Language/Platform:

  • ONNX: Open Neural Network Exchange
  • Converting PyTorch model to onnx format aids in running the model in Java, Javascript, C# and ML.NET
# while exporting a pytorch model to onnx, 
# we'd have to pass a sample input of the right shape
# this will help produce a `persisted` ONNX model    
import torch.onnx as onnx
input_image = torch.zeros((1,28,28))
onnx_model_location = 'data/model.onnx'
onnx.export(model, input_image, onnx_model_location)
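A hedged sketch of running the exported model outside PyTorch, assuming the onnxruntime package is installed (the sample input here is just zeros of the right shape):

import onnxruntime as ort
session = ort.InferenceSession("data/model.onnx")
input_name = session.get_inputs()[0].name
sample = np.zeros((1, 28, 28), dtype=np.float32)
outputs = session.run(None, {input_name: sample})
print(outputs[0].shape)  # (1, 10) logits, one score per class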

4. Predict using the Trained Model

Loading the trained model and predicting for unseen data

# construct the model structure
model = NeuralNetwork()
# load the state_dict
model.load_state_dict(torch.load("model_weights/fmnist_2_layer_nn_model_batch_size_4.pth"))

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')
Predicted: "Ankle boot", Actual: "Ankle boot"
# these are logit scores and not softmax outputs 
# yet they are enough for predicting the class 
# since the logits are finally coming out of a ReLU() unit
# A ReLU outputs values in the range [0, inf)
pred[0]
tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.4451, 0.0000, 0.0000, 0.0000,
        5.6093])
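If actual class probabilities are needed, a softmax can be applied on top of these logits (a small sketch):

probabilities = torch.softmax(pred[0], dim=0)
print(probabilities.sum().item())        # 1.0 -- a probability distribution over the 10 classes
print(classes[probabilities.argmax(0)])  # "Ankle boot", the same class as the argmax of the raw logits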

5. Leveraging Tensorboard

Reiterating the steps we have already done using Tensorboard

  • 1. Specifying the log directory and using the add_image method
# `torch.utils.tensorboard.SummaryWriter` class
# specifying the log directory
writer = SummaryWriter('runs/fashion_mnist_2_layer_NN_experiment_1')

# writing the grid of 4 images to Tensorboard log dir
# we can look at `IMAGES` tab of Tensorboard for this
writer.add_image('Four Sample Fashion-MNIST Images', img_grid)
writer.flush()
  • 2. Tracking epoch-level Average Training and Validation Losses.
# We can track in the `SCALARS` tab of the Tensorboard
writer.add_scalars('Training vs. Validation Loss',
                   {'Training': avg_training_loss, 
                    'Validation': avg_validation_loss
                   },
                   epoch
                  )

The Graph of Training Loss (blue line) and Validation Loss (green line) in Tensorboard

  • 3. After the trained model is obtained, we can look at the graph that traces a sample input through the model
# We can track in the `GRAPH` tab of the Tensorboard
dataiter = iter(train_dataloader)
images, labels = next(dataiter)

# add_graph() will trace the sample input through your model
writer.add_graph(model, images)
writer.flush()

NN_graph in Tensorboard
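Once all the logging is done, it is good practice to close the SummaryWriter (a one-line addition):

writer.close()  # flushes any pending events and closes the log files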