Machine Learning Engineer Nanodegree

Deep Learning

📑 Practice Project 4: Convolutional Neural Networks

In this notebook, we train a multilayer perceptron (MLP) to classify images of handwritten digits from the MNIST database.

1. Load MNIST Database

In [1]:
%%html
<style>
@import url('https://fonts.googleapis.com/css?family=Orbitron|Roboto');
body {background-color: #add8e6;} 
a {color: darkblue; font-family: 'Roboto';} 
h1 {color: steelblue; font-family: 'Orbitron'; text-shadow: 4px 4px 4px #aaa;} 
h2, h3 {color: #483d8b; font-family: 'Orbitron'; text-shadow: 4px 4px 4px #aaa;}
h4 {color: slategray; font-family: 'Roboto';}
span {text-shadow: 4px 4px 4px #ccc;}
div.output_prompt, div.output_area pre {color: #483d8b;}
div.input_prompt, div.output_subarea {color: darkblue;}      
div.output_stderr pre {background-color: #add8e6;}  
div.output_stderr {background-color: #483d8b;}        
</style>
In [3]:
from keras.datasets import mnist

# use Keras to import pre-shuffled MNIST database
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print("The MNIST database has a training set of %d examples." % len(X_train))
print("The MNIST database has a test set of %d examples." % len(X_test))
The MNIST database has a training set of 60000 examples.
The MNIST database has a test set of 10000 examples.
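Note: in more recent releases the datasets module lives under tensorflow.keras, so the same pre-shuffled split can be loaded like this (a minimal sketch, assuming TensorFlow 2.x is installed):

from tensorflow.keras.datasets import mnist

# load the same pre-shuffled train/test split via tf.keras
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape, X_test.shape)   # (60000, 28, 28) (10000, 28, 28)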

2. Visualize the First Six Training Images

In [4]:
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.cm as cm
import numpy as np

# plot first six training images
fig = plt.figure(figsize=(20,20))
for i in range(6):
    ax = fig.add_subplot(1, 6, i+1, xticks=[], yticks=[])
    ax.imshow(X_train[i], cmap='bone')
    ax.set_title(str(y_train[i]))

3. View an Image in More Detail

In [5]:
def visualize_input(img, ax):
    # show the image and overlay each pixel's value on top of it
    ax.imshow(img, cmap='gray')
    width, height = img.shape
    thresh = img.max()/2.5
    for x in range(width):
        for y in range(height):
            ax.annotate(str(round(img[x][y],2)), xy=(y,x),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if img[x][y]<thresh else 'black')

fig = plt.figure(figsize = (12,12)) 
ax = fig.add_subplot(111)
visualize_input(X_train[0], ax)

4. Rescale the Images by Dividing Every Pixel in Every Image by 255

In [6]:
# rescale [0,255] --> [0,1]
X_train = X_train.astype('float32')/255
X_test = X_test.astype('float32')/255 
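As a quick sanity check (not part of the original notebook), we can confirm the rescaling worked, assuming the cells above were run in order:

# confirm the pixel values now lie in [0, 1]
print(X_train.dtype, X_train.min(), X_train.max())   # float32 0.0 1.0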

5. Encode Categorical Integer Labels Using a One-Hot Scheme

In [7]:
from keras.utils import np_utils

# print first ten (integer-valued) training labels
print('Integer-valued labels:')
print(y_train[:10])

# one-hot encode the labels
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)

# print first ten (one-hot) training labels
print('One-hot labels:')
print(y_train[:10])
Integer-valued labels:
[5 0 4 1 9 2 1 3 1 4]
One-hot labels:
[[ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]]
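Note: np_utils was removed in later Keras versions; the same one-hot encoding is available directly from the utils module (a minimal sketch, assuming a recent TensorFlow/Keras install):

from tensorflow.keras.utils import to_categorical

# identical one-hot encoding with the current API
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)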

6. Define the Model Architecture

In [14]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

# define the model
model = Sequential()
model.add(Flatten(input_shape=X_train.shape[1:]))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

# summarize the model
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_2 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_3 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_4 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 10)                5130      
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
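The parameter counts in the summary follow directly from the layer sizes: each Dense layer has (inputs × units) weights plus one bias per unit. A small check (illustrative only):

# Dense layer parameters = inputs * units + units (biases)
print(784 * 512 + 512)            # 401920  (first hidden layer)
print(512 * 512 + 512)            # 262656  (second hidden layer)
print(512 * 10 + 10)              # 5130    (output layer)
print(401920 + 262656 + 5130)     # 669706 total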

7. Compile the Model

In [15]:
# compile the model
model.compile(loss='categorical_crossentropy', 
              optimizer='nadam', 
              metrics=['accuracy'])
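Because the labels were one-hot encoded above, categorical_crossentropy is the appropriate loss. Had we kept integer labels instead, the equivalent setup (a hedged sketch, not part of the original notebook) would use the sparse variant:

# alternative: skip the one-hot step and use integer labels directly
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='nadam',
              metrics=['accuracy'])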

8. Calculate the Classification Accuracy on the Test Set (Before Training)

In [16]:
# evaluate test accuracy
score = model.evaluate(X_test, y_test, verbose=0)
accuracy = 100*score[1]

# print test accuracy
print('Test accuracy: %.4f%%' % accuracy)
Test accuracy: 11.2300%
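An accuracy of roughly 11% is expected here: with ten balanced classes, an untrained network is essentially guessing, so it lands near the 10% chance level.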

9. Train the Model

In [17]:
from keras.callbacks import ModelCheckpoint   

# train the model
checkpointer = ModelCheckpoint(filepath='mnist.model.best.hdf5', 
                               verbose=1, save_best_only=True)
hist = model.fit(X_train, y_train, batch_size=128, epochs=10,
          validation_split=0.2, callbacks=[checkpointer],
          verbose=1, shuffle=True)
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
47872/48000 [============================>.] - ETA: 0s - loss: 0.2480 - acc: 0.9238Epoch 00000: val_loss improved from inf to 0.10715, saving model to mnist.model.best.hdf5
48000/48000 [==============================] - 28s - loss: 0.2475 - acc: 0.9239 - val_loss: 0.1072 - val_acc: 0.9670
Epoch 2/10
47872/48000 [============================>.] - ETA: 0s - loss: 0.1062 - acc: 0.9671Epoch 00001: val_loss improved from 0.10715 to 0.10036, saving model to mnist.model.best.hdf5
48000/48000 [==============================] - 25s - loss: 0.1064 - acc: 0.9671 - val_loss: 0.1004 - val_acc: 0.9681
Epoch 3/10
47872/48000 [============================>.] - ETA: 0s - loss: 0.0749 - acc: 0.9764Epoch 00002: val_loss improved from 0.10036 to 0.09122, saving model to mnist.model.best.hdf5
48000/48000 [==============================] - 24s - loss: 0.0750 - acc: 0.9764 - val_loss: 0.0912 - val_acc: 0.9729
Epoch 4/10
47872/48000 [============================>.] - ETA: 0s - loss: 0.0623 - acc: 0.9807Epoch 00003: val_loss improved from 0.09122 to 0.07930, saving model to mnist.model.best.hdf5
48000/48000 [==============================] - 23s - loss: 0.0623 - acc: 0.9807 - val_loss: 0.0793 - val_acc: 0.9760
Epoch 5/10
47872/48000 [============================>.] - ETA: 0s - loss: 0.0560 - acc: 0.9825Epoch 00004: val_loss did not improve
48000/48000 [==============================] - 23s - loss: 0.0560 - acc: 0.9825 - val_loss: 0.0968 - val_acc: 0.9730
Epoch 6/10
47872/48000 [============================>.] - ETA: 0s - loss: 0.0470 - acc: 0.9850Epoch 00005: val_loss did not improve
48000/48000 [==============================] - 22s - loss: 0.0469 - acc: 0.9850 - val_loss: 0.0934 - val_acc: 0.9758
Epoch 7/10
47872/48000 [============================>.] - ETA: 0s - loss: 0.0431 - acc: 0.9860Epoch 00006: val_loss did not improve
48000/48000 [==============================] - 22s - loss: 0.0430 - acc: 0.9860 - val_loss: 0.0918 - val_acc: 0.9767
Epoch 8/10
47872/48000 [============================>.] - ETA: 0s - loss: 0.0413 - acc: 0.9864Epoch 00007: val_loss did not improve
48000/48000 [==============================] - 22s - loss: 0.0414 - acc: 0.9863 - val_loss: 0.0969 - val_acc: 0.9752
Epoch 9/10
47872/48000 [============================>.] - ETA: 0s - loss: 0.0383 - acc: 0.9879Epoch 00008: val_loss did not improve
48000/48000 [==============================] - 22s - loss: 0.0382 - acc: 0.9879 - val_loss: 0.0952 - val_acc: 0.9767
Epoch 10/10
47872/48000 [============================>.] - ETA: 0s - loss: 0.0332 - acc: 0.9891Epoch 00009: val_loss did not improve
48000/48000 [==============================] - 22s - loss: 0.0333 - acc: 0.9891 - val_loss: 0.0939 - val_acc: 0.9786
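Since model.fit returns a History object, the learning curves can be inspected after training. A minimal sketch (the keys 'loss' and 'val_loss' are the Keras defaults when validation_split is used):

# plot training vs. validation loss recorded during fitting
plt.plot(hist.history['loss'], label='train loss')
plt.plot(hist.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()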

10. Load the Model with the Lowest Validation Loss

In [18]:
# load the weights that yielded the lowest validation loss (ModelCheckpoint's default monitor)
model.load_weights('mnist.model.best.hdf5')
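Because ModelCheckpoint saves the full model by default (save_weights_only=False), the checkpoint file can also be reloaded on its own, without rebuilding the architecture first (a hedged sketch):

from keras.models import load_model

# reload the checkpointed model (architecture + weights) in one step
best_model = load_model('mnist.model.best.hdf5')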

11. Calculate the Classification Accuracy on the Test Set

In [19]:
# evaluate test accuracy
score = model.evaluate(X_test, y_test, verbose=0)
accuracy = 100*score[1]

# print test accuracy
print('Test accuracy: %.4f%%' % accuracy)
Test accuracy: 97.8900%
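With the best weights restored, individual predictions can be read off with model.predict followed by an argmax over the ten class probabilities (a minimal sketch, reusing the variables defined above):

# predict class probabilities for the first few test images
probs = model.predict(X_test[:5])
predicted = np.argmax(probs, axis=1)
true = np.argmax(y_test[:5], axis=1)   # y_test is one-hot encoded
print('predicted:', predicted)
print('true:     ', true)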