Notebook: https://github.com/extensive-nlp/TSAI-DeepNLP-END2.0/blob/main/03_PyTorch101/MNIST_Adder.ipynb
Colab Link: https://githubtocolab.com/extensive-nlp/TSAI-DeepNLP-END2.0/blob/main/03_PyTorch101/MNIST_Adder.ipynb
A data sample is always represented as `(input, target)`, where both the input and the target have to be `torch.Tensor`. Conversion to Tensor is taken care of by PyTorch's `DataLoader` for primitive types; for the MNIST image we need to use torchvision's `transforms.ToTensor`.
My representation for `(input, target)` is `(img, random_number), (target, target + random_number)`.
```python
import numpy as np
from typing import Any, Tuple

from torch.utils.data import DataLoader
from torchvision.datasets import MNIST


class MNISTAdder(MNIST):
    """
    A modified version of MNIST.
    It adds a random number along with the mnist image, and the target now is
    the mnist image's target + the random number.

    For example: [MNIST Image for 1], 5 => 1, 6

    Usage:
        >>> mnistadder = MNISTAdder(root='data')
        >>> dloader = DataLoader(mnistadder)
        >>> batch = next(iter(dloader))
    """

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        img, target = super().__getitem__(index)
        random_number = np.random.randint(low=0, high=10)
        return (img, random_number), (target, target + random_number)
```
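A quick sanity check of what a batch looks like (a sketch; the `ToTensor` transform and `batch_size=32` are my additions here, not necessarily the notebook's settings):

```python
from torchvision import transforms

mnistadder = MNISTAdder(root='data', download=True,
                        transform=transforms.ToTensor())
dloader = DataLoader(mnistadder, batch_size=32)
(img, rand_num), (mnist_y, adder_y) = next(iter(dloader))

print(img.shape)       # torch.Size([32, 1, 28, 28])
print(rand_num.shape)  # torch.Size([32]), ints in 0..9, collated by DataLoader
print(adder_y.shape)   # torch.Size([32]), values in 0..18
```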
The model's primary task is MNIST image classification. For that I simply copy-pasted one of my past MNIST classification models and used only the backbone layers, i.e. everything up to the GAP layer + Conv1d. After this I get 20 channels; these 20 channels are flattened out and concatenated with the one-hot representation of the "random" number, giving 30 features in total. From here on I use linear layers, which will do the addition, and out of those I extract 10 features for my MNIST classification and 19 features for my addition classification.

Why 19 features? Because 0-9 (MNIST) + 0-9 (random) gives sums from 0 to 18, i.e. 19 possible classes in total.
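For context, here is a minimal sketch of how those layers might be declared. The class name, the hidden width of 60, and the stand-in backbone are my assumptions; only the 30-feature input, the 10-way MNIST head, and the 19-way adder head follow from the description above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MNISTAdderModel(nn.Module):  # hypothetical name, for illustration
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.num_classes = num_classes
        # Stand-in for the real conv backbone (GAP + Conv1d, 20 channels out)
        self.mnist_base = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # 20 image features + 10 one-hot features = 30 inputs
        self.prefinal_layer1 = nn.Sequential(nn.Linear(30, 60), nn.ReLU())
        self.prefinal_layer2 = nn.Sequential(nn.Linear(60, 60), nn.ReLU())
        self.mnist_final_layer = nn.Sequential(nn.Linear(60, 10))  # digit head
        self.adder_final_layer = nn.Sequential(nn.Linear(60, 19))  # sum head
```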
```python
def forward(self, mnist_img, rand_num):
    rand_num = F.one_hot(rand_num, num_classes=self.num_classes)
    # mnist embedding: 1x20
    mnist_embed = self.mnist_base(mnist_img)
    # concat the mnist embedding and the random number = 30 features
    ccat = torch.cat([mnist_embed, rand_num], dim=-1)
    pre_out = self.prefinal_layer1(ccat)
    pre_out = self.prefinal_layer2(pre_out)
    mnist_out = self.mnist_final_layer(pre_out)
    adder_out = self.adder_final_layer(pre_out)
    return mnist_out, adder_out
```
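Combining the sketch above with this forward method, a dummy batch confirms the output shapes. (Recent PyTorch versions type-promote the integer one-hot to float inside `torch.cat`; on older versions you may need an explicit `.float()` on the one-hot tensor.)

```python
model = MNISTAdderModel()
img = torch.randn(32, 1, 28, 28)
rand_num = torch.randint(0, 10, (32,))
mnist_out, adder_out = model(img, rand_num)
print(mnist_out.shape)  # torch.Size([32, 10])
print(adder_out.shape)  # torch.Size([32, 19])
```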
The first task is to classify the MNIST image, so the obvious choice of loss function is cross entropy loss, which is basically `log_softmax` + `nll_loss` (negative log likelihood). But for the adder, since I treat the sum as one of 19 classes, cross entropy seems like a good choice for it too. So I went with that.
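A quick check of that equivalence (illustrative snippet, not from the notebook):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))
a = F.cross_entropy(logits, target)
b = F.nll_loss(F.log_softmax(logits, dim=1), target)
assert torch.allclose(a, b)  # cross entropy = log_softmax + nll_loss
```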
Both these losses are combined by simple addition. We could also give more weight to the MNIST loss, because without a correct MNIST prediction we cannot produce the correct adder output.
```python
...
self.loss = nn.CrossEntropyLoss()
...

...
# both mnist and adder use cross entropy loss
mnist_loss = self.loss(mnist_pred, mnist_y)
adder_loss = self.loss(adder_pred, adder_y)
# final loss is the sum of the two losses
loss = mnist_loss + adder_loss
```
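If we wanted to weight the MNIST loss more heavily, as mentioned above, it would be a one-line change (the 2.0 factor is purely illustrative, not a value from the notebook):

```python
# hypothetical weighting: emphasize MNIST, since the adder depends on it
loss = 2.0 * mnist_loss + adder_loss
```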
The output obtained from the model is evaluated against the target using an accuracy function, which basically computes

$$\text{accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}(y_i = \hat{y}_i)$$

where $y$ is a tensor of target values and $\hat{y}$ is a tensor of predictions. Source Code
```python
mnist_pred = torch.argmax(F.log_softmax(mnist_pred, dim=1), dim=1)
adder_pred = torch.argmax(F.log_softmax(adder_pred, dim=1), dim=1)

mnist_acc = accuracy(mnist_pred, mnist_y)
adder_acc = accuracy(adder_pred, adder_y)
```
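Conceptually, the accuracy function reduces to a minimal sketch like this (the real metric comes from the library linked above):

```python
import torch

def accuracy(preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # fraction of positions where the prediction equals the target
    return (preds == target).float().mean()
```

Note that `log_softmax` is monotonic, so taking `argmax` directly over the raw logits would give the same predictions.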
The total accuracy is the product of `mnist_acc` and `adder_acc`, because at the end of the day we need both the MNIST output AND the added-value output to be correct.
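So the combined metric is simply:

```python
# both predictions must be correct, so multiply the two accuracies
total_acc = mnist_acc * adder_acc
```

Indeed, 0.9932 × 0.9922 ≈ 0.9855, which matches the total accuracy in the test results below.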
The model was trained for 20 epochs with SWA (Stochastic Weight Averaging) and OneCycleLR.
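A sketch of how SWA and OneCycleLR are typically wired together in PyTorch Lightning. The optimizer choice, the `swa_lrs` value, and the scheduler arguments are my assumptions (and this assumes the dataloader is defined on the module), not necessarily the notebook's setup:

```python
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import StochasticWeightAveraging

# inside the LightningModule (sketch):
def configure_optimizers(self):
    optimizer = torch.optim.SGD(self.parameters(), lr=0.166)  # lr from the LR finder below
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=0.166,
        epochs=20,
        steps_per_epoch=len(self.train_dataloader()),
    )
    return [optimizer], [{"scheduler": scheduler, "interval": "step"}]

trainer = pl.Trainer(
    max_epochs=20,
    callbacks=[StochasticWeightAveraging(swa_lrs=1e-2)],  # swa_lrs is illustrative
)
```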
Model Parameters
```
  | Name              | Type             | Params
-------------------------------------------------------
0 | mnist_base        | MNISTModel       | 9.9 K
1 | prefinal_layer1   | Sequential       | 1.9 K
2 | prefinal_layer2   | Sequential       | 3.7 K
3 | mnist_final_layer | Sequential       | 600
4 | adder_final_layer | Sequential       | 1.1 K
5 | loss              | CrossEntropyLoss | 0
-------------------------------------------------------
17.3 K    Trainable params
0         Non-trainable params
17.3 K    Total params
0.069     Total estimated model params size (MB)
```
Model Testing
```
>>> trainer.test()

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Testing: 100% 78/78 [00:01<00:00, 55.57it/s]
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'adder_acc': 0.9921875,
 'mnist_acc': 0.9931890964508057,
 'total_acc': 0.9855143427848816,
 'val_loss': 0.057633303105831146}
```
Achieved a total accuracy of 98.5% in 20 epochs.
*Plots: Overall Accuracy; MNIST Accuracy and Adder Accuracy; Learning Rate and Validation Loss.*
LR Finder
```
>>> lr finder suggested lr: 0.16601414902764572
```
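This suggestion typically comes from Lightning's built-in LR finder; a sketch (the exact tuner API varies across Lightning versions):

```python
# PyTorch Lightning 1.x style API
lr_finder = trainer.tuner.lr_find(model)
print('lr finder suggested lr:', lr_finder.suggestion())
model.hparams.lr = lr_finder.suggestion()
```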
Sample Test Outputs