Performance Study of InceptionV3 & CNN
A comparative study of the performance of a CNN built from scratch versus a transfer learning approach (InceptionV3)
| Properties | Details |
|---|---|
| Name | Intel Image Classification |
| Source | Kaggle |
| Author | Puneet Bansal |
| Data Type | Images |
| Problem Type | Image Classification |
| Files | 3 folders (14K training, 3K testing, 7K prediction) |
| Classes | 6 classes (Buildings, Forest, Glacier, Mountain, Sea, Street) |
Preprocessing Steps
- Image Resizing: images are resized to `(150, 150)`.
- Image Normalisation: pixel values are scaled from `0-255` to `0-1`.
- Encoding Image Labels: labels are transformed from categorical to numerical format using a `LabelEncoder`.
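A minimal sketch of these three steps, assuming OpenCV for loading and class-named subfolders as labels; the function and folder names (`load_images`, `seg_train`) are illustrative, not taken from the project code:

```python
import os
import cv2
import numpy as np
from sklearn.preprocessing import LabelEncoder

def load_images(root_dir, size=(150, 150)):
    """Load images from class-named subfolders, resize and normalise them."""
    images, labels = [], []
    for class_name in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, class_name)
        for file_name in os.listdir(class_dir):
            img = cv2.imread(os.path.join(class_dir, file_name))
            img = cv2.resize(img, size)        # Image Resizing -> (150, 150)
            images.append(img / 255.0)         # Normalisation -> 0-1 range
            labels.append(class_name)
    return np.array(images), np.array(labels)

X_train, y_train = load_images("seg_train")    # hypothetical folder name
encoder = LabelEncoder()
y_train = encoder.fit_transform(y_train)       # Buildings -> 0, Forest -> 1, ...
```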
Modelling
In this part, we go through the different CNN versions built from scratch and the transfer learning variants (including fine-tuning), and compare their results:
- Simple CNN (see the first sketch after this list):
  - 1 convolutional layer `(32 units)`
  - 1 MaxPooling layer
  - 2 Dense layers `(128 units, 6)`
  - 10 epochs
- Deep CNN:
  - 4 convolutional layers `(2 × 32 units + 64 units + 128 units)`
  - 4 MaxPooling layers
  - 3 Dense layers `(128 units + 64 units, 6)`
  - 10 epochs
- Deep CNN + Fine-tuning (see the second sketch after this list):
  - Same architecture as above
  - Optimizer changed to `Adamax`
  - Added callbacks (`early stopping` + `LR scheduler`)
  - 20 epochs
- InceptionV3 I (see the third sketch after this list):
  - Excluding the original classification layers (`include_top = False`)
  - Using ImageNet weights (`weights='imagenet'`)
  - Unfreezing the last 25 layers (`inception1.layers[-25:]`)
  - Adding a custom classification head:
    - Dense layer `(1024 units)`
    - Dropout layer `(dropping 20% of units, 0.2)`
    - Classification layer (final dense layer) `(6)`
  - Optimizer `Adamax` with a starting learning rate of `0.0001`
  - Callbacks (`early stopping` + `restore best weights`)
  - 20 epochs
- InceptionV3 II:
  - Same architecture as above
  - Freezing the first 25 layers and unfreezing the rest instead (`inception2.layers[25:]`)
  - 20 epochs
- InceptionV3 III:
  - Using the full architecture (`48 layers`)
  - A different custom classification head:
    - Batch Normalization layer
    - 2 Dense layers `(256 units + 128 units)`
    - 1 classification layer `(6)`
  - Same callbacks
  - 20 epochs
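The sketches below roughly mirror these configurations in Keras; the unit counts follow the list above, while kernel sizes, activations, the loss, and the Simple CNN's optimizer are assumptions. First, the Simple CNN:

```python
from tensorflow.keras import layers, models

# Simple CNN: 1 conv layer (32 units) + 1 max-pooling layer + 2 dense layers (128, 6)
simple_cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(6, activation='softmax'),                    # 6 classes
])
simple_cnn.compile(optimizer='adam',                          # optimizer assumed
                   loss='sparse_categorical_crossentropy',    # integer labels from LabelEncoder
                   metrics=['accuracy'])
# simple_cnn.fit(X_train, y_train, epochs=10, validation_split=0.2)
```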
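Next, a sketch of the Deep CNN together with the fine-tuning changes (`Adamax`, early stopping, LR scheduler); the specific scheduler and patience values are assumptions:

```python
from tensorflow.keras import layers, models, callbacks, optimizers

# Deep CNN: 4 conv layers (32, 32, 64, 128) + 4 max-pooling layers + dense head (128, 64, 6)
deep_cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(6, activation='softmax'),
])

# Fine-tuning variant: Adamax optimizer + early stopping + LR scheduler, 20 epochs
deep_cnn.compile(optimizer=optimizers.Adamax(),
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
cbs = [
    callbacks.EarlyStopping(patience=3, restore_best_weights=True),
    callbacks.ReduceLROnPlateau(factor=0.5, patience=2),   # one possible LR scheduler
]
# deep_cnn.fit(X_train, y_train, epochs=20, validation_split=0.2, callbacks=cbs)
```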
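Finally, a sketch of the InceptionV3 I setup (`include_top=False`, ImageNet weights, last 25 layers unfrozen, custom head, `Adamax` at `0.0001`); the pooling layer between the base and the head and the patience value are assumptions:

```python
from tensorflow.keras import layers, models, callbacks, optimizers
from tensorflow.keras.applications import InceptionV3

# Base model: ImageNet weights, original classification layers excluded
base = InceptionV3(include_top=False, weights='imagenet', input_shape=(150, 150, 3))

# Freeze all layers, then unfreeze the last 25 for fine-tuning
for layer in base.layers:
    layer.trainable = False
for layer in base.layers[-25:]:
    layer.trainable = True

# Custom classification head: Dense(1024) + Dropout(0.2) + Dense(6)
inception1 = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),        # assumed pooling between base and head
    layers.Dense(1024, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(6, activation='softmax'),
])

inception1.compile(optimizer=optimizers.Adamax(learning_rate=0.0001),
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
cbs = [callbacks.EarlyStopping(patience=3, restore_best_weights=True)]
# inception1.fit(X_train, y_train, epochs=20, validation_split=0.2, callbacks=cbs)
```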
Results
I - Simple CNN
- Training: The Simple CNN performed poorly, evaluating at `68%` accuracy with a loss of `1.40`.
- Prediction: The poor training explains the poor predictions made by the Simple CNN, which failed to classify many images.
II - Deep CNN
- Training: The Deep CNN did a better job than the Simple CNN, evaluating at `81%` accuracy, but still showed errors with a loss of `0.82`.
- Prediction: The imperfect training explains the wrong predictions made by the Deep CNN, which still failed to classify some images.
III - Deep CNN + Fine-tuning
- Training: The fine-tuned Deep CNN did better than the previous models, evaluating at `82%` accuracy, though its loss of `0.49` remained fairly high.
- Prediction: The improvement in the training results can be seen in the predictions made by the fine-tuned CNN.
IV - InceptionV3 (ver 1)
- Training: The advantage of transfer learning is already visible here, with an evaluation of `90%` accuracy and a loss of `0.27`.
- Prediction: The difference is hard to see in individual predictions, but overall it performs far better than the previous models.
V - InceptionV3 (ver 2)
- Training: Freezing only the first 25 layers instead made a difference, reaching `91%` accuracy with a loss of `0.25`.
- Prediction: The difference is hard to see in individual predictions, but overall it performs far better than the previous models.
VI - InceptionV3 (ver 3)
- Training: Using the whole model made the performance slightly inferior to the previous Inception versions, with an evaluation of `89%` accuracy and a loss of `0.30`.
- Prediction: The difference is hard to see in individual predictions, but overall it does slightly worse than the other Inception versions.
Bonus
These are the confusion matrices of each model, giving better context to their overall performance:
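A matrix like these can be generated with scikit-learn; `model`, `X_test`, and `y_test` are placeholder names, not identifiers from the project code:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

class_names = ['Buildings', 'Forest', 'Glacier', 'Mountain', 'Sea', 'Street']

# Predicted class = argmax over the 6-way softmax output
y_pred = np.argmax(model.predict(X_test), axis=1)
ConfusionMatrixDisplay.from_predictions(y_test, y_pred, display_labels=class_names)
plt.show()
```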