CIFAR10 Image Classifier using PyTorch

    This blog post describes the work I have done in programming an image classifier for the CIFAR10 dataset.

    While creating this notebook, I followed many notebooks and tutorials, and received guidance from my Teaching Assistant, Md Ashaduzzaman Rubel Mondol, and my fellow classmates Subbiah Sharavanan and Abishek Pichaipillai.

The Work

    First, I had to work from the base tutorial code available on the PyTorch website here. It is a beginner tutorial for classifying the images in the CIFAR10 dataset into their respective labels. I also needed to try the different optimizers provided by PyTorch, document the results, and draw some conclusions.
    

CIFAR10 dataset:

    The CIFAR10 dataset consists of 60,000 labeled images, 6,000 per class, of which 5,000 per class are training images and 1,000 per class are test images.

    It has images from 10 classes namely plane, car, bird, cat, deer, dog, frog, horse, ship, and truck.

The Tutorial:

    I started off by understanding the tutorial. All they do is build a convolutional neural network with 2 convolutional layers, 1 max pooling layer, 2 ReLU (rectified linear) activations, and 3 linear (fully connected) layers.

    They used torch.utils.data.DataLoader to load the data in batches. The training data was shuffled, but the test data loader was not.

    They applied an image transformation that only normalized each channel to a mean and standard deviation of 0.5, with no augmentations.
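
    Roughly, the data loading in the tutorial looks like the sketch below (the batch size and number of workers follow the tutorial; treat the details as approximate rather than my exact code):

import torch
import torchvision
import torchvision.transforms as transforms

# Normalize each channel to a mean of 0.5 and a standard deviation of 0.5; no augmentation.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)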

    The tutorial uses the CrossEntropyLoss() loss function and the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001 and a momentum of 0.9.

    For training, they ran 2 epochs, loading the images and labels from the training loader and performing the following steps for each batch:
  • Zero the gradients.
  • Pass the inputs to the model.
  • Calculate the loss.
  • Backward Propagation.
  • Step the optimizer to update the weights.
They got a loss of 1.278 for the above setup. When they tested it with the test data loader, they got an accuracy of 68%.
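
    Put together, the tutorial's training loop looks roughly like the sketch below (net is the small network defined in the tutorial and trainloader is the training data loader from above; this is a sketch, not the exact code):

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):                        # the tutorial trains for 2 epochs
    for inputs, labels in trainloader:
        optimizer.zero_grad()                 # zero the gradients
        outputs = net(inputs)                 # pass the inputs to the model
        loss = criterion(outputs, labels)     # calculate the loss
        loss.backward()                       # backward propagation
        optimizer.step()                      # step the optimizer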

What I did

Initial Commit

    Since I had to use different optimizers, for my initial commit I planned to run all the optimizers with their default hyperparameters against the same image transform and model as given in the example.

    All I did was modularize the training method so that the different optimizers could be initialized and passed into one training function, which performs each of the usual steps: zero the gradients, pass the inputs to the model and get the outputs, calculate the loss, and so on.
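
    Conceptually, the initial commit looked something like this (Net is the tutorial model, train is the modularized training loop, and the names here are illustrative rather than my exact code):

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()

# Each optimizer is built with its default hyperparameters (SGD needs an explicit learning rate).
optimizer_builders = {
    'SGD':     lambda params: optim.SGD(params, lr=0.001, momentum=0.9),
    'Adagrad': lambda params: optim.Adagrad(params),
    'Adam':    lambda params: optim.Adam(params),
    'Adamax':  lambda params: optim.Adamax(params),
    'ASGD':    lambda params: optim.ASGD(params),
    'LBFGS':   lambda params: optim.LBFGS(params),
    'Rprop':   lambda params: optim.Rprop(params),
}

for name, build in optimizer_builders.items():
    net = Net()                               # a fresh copy of the tutorial model per optimizer
    optimizer = build(net.parameters())
    train(net, trainloader, optimizer, criterion)   # same training function for every optimizer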

    Next, while initializing the optimizers, I found out that the LBFGS optimizer's step needs a closure as a required argument, which should re-evaluate the model and return the loss. The documentation also mentions some tricky things that may need to be handled inside the closure, which were unnecessary in my case.

    I then moved the steps that are done in training into a separate closure function so that I could pass it to the step function whenever the optimizer is LBFGS.
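
    Inside the training loop, the LBFGS special case looked roughly like this (a sketch under the same assumptions as above, not the exact code):

import torch

for inputs, labels in trainloader:
    def closure():
        # LBFGS may re-evaluate the model several times per step,
        # so the forward/backward pass lives inside the closure.
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        return loss

    if isinstance(optimizer, torch.optim.LBFGS):
        loss = optimizer.step(closure)        # LBFGS calls the closure itself and returns the loss
    else:
        loss = closure()                      # run the forward/backward pass once
        optimizer.step()                      # then update the weights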

    There was an interesting way in which they validated the model. They first saved a checkpoint to a file, and after all the training was done, they loaded it back and proceeded with the validation.

    I ran into an issue while doing so. I had saved each optimizer's model into a separate file, and while opening them back I got a runtime error saying too many files were open. To resolve this, I saved all the model state values in a single file and retrieved them from there for validation.
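
    The fix amounts to collecting every optimizer's model state in one dictionary and saving it once, roughly like this (the file name, the keys, and the trained_models dictionary of name-to-model pairs are illustrative):

import torch

# Save the state dictionaries of all trained models in one file ...
checkpoints = {name: net.state_dict() for name, net in trained_models.items()}
torch.save(checkpoints, 'cifar10_all_optimizers.pth')

# ... and load them back once for validation, instead of opening one file per optimizer.
checkpoints = torch.load('cifar10_all_optimizers.pth')
net = Net()
net.load_state_dict(checkpoints['Adamax'])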

The first set of accuracies for the individual optimizers is as follows:

SGD     - 55%
Adagrad - 36%
Adam    - 55%
Adamax  - 52%
ASGD    - 32%
LBFGS   - 10%
Rprop   - 15%

For more information about my first commit, refer here.

Improving Accuracy

To improve the accuracy I had to do the following:
  • Image Augmentation and Normalization.
  • Add more layers and modify the layer parameters.
  • Modify the Optimizer Hyperparameters.

Image Augmentation and Normalization

    In image augmentation, we change some properties of the training and test images, for example changing the color of an image, rotating it, cropping it, or flipping it vertically or horizontally. Augmentation helps us expand the training dataset and helps the model improve its prediction accuracy, because it opens up different perspectives of the same image.

    Normalization is the process of changing the range of pixel intensity values.

    In my case, I applied 2 image augmentations, RandomHorizontalFlip and RandomCrop, which improved the model accuracy by 4%.

    Also, I changed the default normalization values to per-channel means of (0.4914, 0.4822, 0.4465) and standard deviations of (0.2023, 0.1994, 0.2010), which also improved the model accuracy by 4%.
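
    The resulting training transform looks roughly like the sketch below (the crop padding of 4 is an assumption on my part for illustration, not a value taken from my code):

import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),     # pad and crop back to 32x32; the padding of 4 is assumed
    transforms.RandomHorizontalFlip(),        # randomly flip half of the images left-right
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),   # per-channel means
                         (0.2023, 0.1994, 0.2010)),  # per-channel standard deviations
])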

Add more layers and modify the layer parameters

    Convolution is the process of filtering the image to extract more features out of it, like curves, horizontal and vertical lines, and so on.

    The above figure shows the convolutional neural network that was taken from this reference. It has 3 Conv2d layers, each followed by a 2D batch norm, a ReLU (rectified linear unit), and a MaxPool2d layer. We also use a Dropout layer to prevent the network from overfitting.

    Batch normalization accelerates training, in some cases by halving the epochs or better, and provides some regularization, reducing generalization error.
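
    A minimal sketch of such a network is shown below; the channel sizes, kernel size, and dropout probability are assumptions for illustration, not the exact values from the reference:

import torch.nn as nn

class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()

        def block(in_ch, out_ch):
            # Conv2d -> BatchNorm2d -> ReLU -> MaxPool2d, as described above.
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(block(3, 32), block(32, 64), block(64, 128))
        self.dropout = nn.Dropout(0.5)                 # helps prevent overfitting
        self.fc = nn.Linear(128 * 4 * 4, num_classes)  # 32x32 input is halved 3 times -> 4x4

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)                      # flatten before the classifier
        x = self.dropout(x)
        return self.fc(x)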

Modify the Optimizer Hyperparameters

    Model accuracy can be improved by changing the hyperparameters of the different optimizers. For example, changing the value of momentum in the SGD optimizer helps us achieve better accuracy in fewer epochs. In the same way, keeping the weight_decay parameter of the Adam optimizer no greater than 1e-4 also helps us reach higher accuracy in fewer epochs.
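
    For example, such adjustments look like this (the learning rates are illustrative values, not necessarily the ones I settled on):

import torch.optim as optim

# A higher momentum pushes SGD towards better accuracy in fewer epochs.
sgd = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Keep Adam's weight_decay small (no larger than 1e-4), as noted above.
adam = optim.Adam(net.parameters(), lr=0.001, weight_decay=1e-4)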

Results

SGD Optimizer



Adam Optimizer


Adagrad Optimizer



Adamax Optimizer


ASGD Optimizer



LBFGS Optimizer



Rprop Optimizer

To Avoid


    Do not store state dictionaries in multiple files and open them separately. Instead, save all of them together in a single file.

    The LBFGS optimizer needs a closure function as an input to its step function, which should return the current loss of the model.

Conclusion

    From the graphs and the accuracy results above, we can see that the winner is the Adamax optimizer, with:
  • Training accuracy: 90.5780%
  • Validation accuracy: 86.47%
  • Average execution time: 0:00:22.445664 (H:MM:SS)
  • Total execution time: 0:07:28.913286 (H:MM:SS)

The final table for all the results:


Bibliography

Wholehearted thanks for helping me achieve this:
