In this project, we'll classify images from the Flower Color Images Dataset. Its content is simple: 210 images (128x128x3) of 10 species of flowering plants, plus a label file, flower_labels.csv. The photos are in .png format and the labels are integers.
We'll preprocess the images, then train a neural network on all the samples. The images need to be normalized and the labels need to be one-hot encoded.
We'll use Keras, the Python deep learning library.
At the end, we'll get to see the neural network's predictions on the sample images.
Let's set up the style of the Jupyter notebook and import the software libraries. The hide_code marker hides the code cells that contain it.
%%html
<style>
@import url('https://fonts.googleapis.com/css?family=Orbitron|Roboto');
body {background-color: aliceblue;}
a {color: #4876ff; font-family: 'Roboto';}
h1 {color: #348ABD; font-family: 'Orbitron'; text-shadow: 4px 4px 4px #ccc;}
h2, h3 {color: slategray; font-family: 'Roboto'; text-shadow: 4px 4px 4px #ccc;}
h4 {color: #348ABD; font-family: 'Orbitron';}
span {text-shadow: 4px 4px 4px #ccc;}
div.output_prompt, div.output_area pre {color: slategray;}
div.input_prompt, div.output_subarea {color: #4876ff;}
div.output_stderr pre {background-color: aliceblue;}
div.output_stderr {background-color: slategrey;}
</style>
<script>
code_show = true;
function code_display() {
if (code_show) {
$('div.input').each(function(id) {
if (id == 0 || $(this).html().indexOf('hide_code') > -1) {$(this).hide();}
});
$('div.output_prompt').css('opacity', 0);
} else {
$('div.input').each(function(id) {$(this).show();});
$('div.output_prompt').css('opacity', 1);
};
code_show = !code_show;
}
$(document).ready(code_display);
</script>
<form action="javascript: code_display()">
<input style="color: #348ABD; background: aliceblue; opacity: 0.8;" \
type="submit" value="Click to display or hide code cells">
</form>
hide_code = ''
import numpy as np
import pandas as pd
from PIL import ImageFile
from tqdm import tqdm
import h5py
import cv2
import matplotlib.pylab as plt
from matplotlib import cm
%matplotlib inline
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.preprocessing import image as keras_image
from keras.models import Sequential, load_model
from keras.layers import Dense, LSTM, GlobalAveragePooling1D, GlobalAveragePooling2D
from keras.layers import Activation, Flatten, Dropout, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D, GlobalMaxPooling2D
from keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
from keras.layers import PReLU, LeakyReLU
Run the following cell to load the dataset from the local data/flower_images/ folder.
hide_code
# Function for processing an image
def image_to_tensor(img_path):
    img = keras_image.load_img("data/flower_images/" + img_path, target_size=(128, 128))
    x = keras_image.img_to_array(img)
    return np.expand_dims(x, axis=0)
# Function for creating the data tensor
def data_to_tensor(img_paths):
    list_of_tensors = [image_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
ImageFile.LOAD_TRUNCATED_IMAGES = True
# Load the data
data = pd.read_csv("data/flower_images/flower_labels.csv")
files = data['file']
targets = data['label'].values
tensors = data_to_tensor(files);
Run the following cell to display the set shapes.
hide_code
# Print the shape
print('Tensor shape:', tensors.shape)
print('Target shape:', targets.shape)
We can create a list of flower names and display image examples.
hide_code
# Create the name list
names = ['phlox', 'rose', 'calendula', 'iris', 'max chrysanthemum',
'bellflower', 'viola', 'rudbeckia laciniata', 'peony', 'aquilegia']
hide_code
# Read from files and display images using OpenCV
def display_images(img_path, ax):
    img = cv2.imread("data/flower_images/" + img_path)
    # OpenCV loads images as BGR, so convert to RGB for Matplotlib
    ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
fig = plt.figure(figsize=(20, 10))
for i in range(8):
    ax = fig.add_subplot(2, 4, i + 1, xticks=[], yticks=[], title=names[targets[i]])
    display_images(files[i], ax)
The data tensors can be saved in an .h5 (HDF5) file.
hide_code
# Create the tensor file (the with block closes the file automatically)
with h5py.File('FlowerColorImages.h5', 'w') as f:
    f.create_dataset('images', data=tensors)
    f.create_dataset('labels', data=targets)
If we decide to come back to this notebook or have to restart it, we can start here.
hide_code
# Read the h5 file
f = h5py.File('FlowerColorImages.h5', 'r')
# List all groups
keys = list(f.keys())
keys
hide_code
# Create tensors and targets
tensors = np.array(f[keys[0]])
targets = np.array(f[keys[1]])
print('Tensor shape:', tensors.shape)
print('Target shape:', targets.shape)
hide_code
# TODO: normalize the tensors
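One possible way to fill in this step, offered as a sketch rather than the notebook's intended solution: scale the 8-bit pixel values into the range [0, 1] by dividing by 255. The helper `normalize_images` and the array `demo_tensors` are hypothetical names; `demo_tensors` stands in for `tensors` so that the snippet runs on its own.

```python
import numpy as np

def normalize_images(batch):
    """Scale 8-bit pixel values (0..255) to float32 values in [0, 1]."""
    return batch.astype('float32') / 255.0

# Hypothetical stand-in for `tensors`: 4 random RGB images of size 128x128
demo_tensors = np.random.randint(0, 256, size=(4, 128, 128, 3)).astype('float32')
demo_normalized = normalize_images(demo_tensors)
print(demo_normalized.min(), demo_normalized.max())
```

In the notebook, the same call would be applied to the real data, for example `tensors = normalize_images(tensors)`.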
Now we'll apply the one-hot encoding function to_categorical to the integer labels.
hide_code
# TODO: one-hot encode the targets
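To make the encoding concrete, here is a NumPy-only sketch of what one-hot encoding does; it is an illustration, not the notebook's own solution. The helper `one_hot` and the array `demo_targets` are hypothetical. Calling Keras' `to_categorical(targets, 10)` should give an equivalent matrix.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype='float32')[labels]

# Hypothetical stand-in for `targets`: integer labels in 0..9
demo_targets = np.array([0, 3, 9, 3])
demo_encoded = one_hot(demo_targets, num_classes=10)
print(demo_encoded.shape)  # (4, 10)
```

`np.argmax(demo_encoded, axis=1)` recovers the original integer labels, which is how the later cells read the class back from `y_train` and `y_test`.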
Apply the function train_test_split to split the data into training, validation, and testing sets.
Use 10% of the data for the testing set and 10% for the validation set.
hide_code
# TODO: split the data
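A sketch of one way to get an 80/10/10 split, assuming two consecutive calls to `train_test_split`: first hold out 20% of the samples, then cut the held-out part in half. The arrays `demo_x` and `demo_y` are hypothetical stand-ins for the normalized tensors and one-hot targets, and `random_state=1` is an arbitrary choice made only for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins: 210 small random images and balanced one-hot labels
demo_x = np.random.rand(210, 8, 8, 3).astype('float32')
demo_y = np.eye(10, dtype='float32')[np.arange(210) % 10]

# Step 1: hold out 20% of the samples (testing and validation together)
x_train, x_rest, y_train, y_rest = train_test_split(
    demo_x, demo_y, test_size=0.2, random_state=1)

# Step 2: split the held-out part in half: 10% testing, 10% validation
x_test, x_valid, y_test, y_valid = train_test_split(
    x_rest, y_rest, test_size=0.5, random_state=1)

print(x_train.shape[0], x_test.shape[0], x_valid.shape[0])  # 168 21 21
```

In the notebook, `demo_x` and `demo_y` would be replaced by `tensors` and `targets`.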
Let's print the shapes of these data sets.
hide_code
# Print the shape
x_train.shape, x_test.shape, x_valid.shape, y_train.shape, y_test.shape, y_valid.shape
We can display an image example from the training set.
hide_code
# Read and display a tensor using Matplotlib
print('Label: ', names[np.argmax(y_train[1])])
plt.figure(figsize=(3,3))
plt.imshow((x_train[1]));
Define a model architecture and compile the model.
hide_code
def model():
    model = Sequential()
    # TODO: Define a model architecture
    # TODO: Compile the model
    return model
model = model()
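As a starting point for the TODO items above, here is a sketch of a small convolutional network. Everything specific in it is an assumption made for illustration: the function name `build_cnn`, the filter counts, kernel sizes, dropout rates, and the `adam` optimizer. It is not the reference solution for this notebook.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, GlobalMaxPooling2D, Dense

def build_cnn(input_shape=(128, 128, 3), num_classes=10):
    """Small CNN sketch: two conv blocks, global pooling, dense classifier."""
    cnn = Sequential()
    # Block 1: 32 filters, then downsample by 2
    cnn.add(Conv2D(32, (5, 5), padding='same', activation='relu',
                   input_shape=input_shape))
    cnn.add(MaxPooling2D(pool_size=(2, 2)))
    cnn.add(Dropout(0.25))
    # Block 2: 96 filters, then downsample by 2
    cnn.add(Conv2D(96, (5, 5), padding='same', activation='relu'))
    cnn.add(MaxPooling2D(pool_size=(2, 2)))
    cnn.add(Dropout(0.25))
    # Collapse each feature map to its maximum, then classify
    cnn.add(GlobalMaxPooling2D())
    cnn.add(Dense(512, activation='relu'))
    cnn.add(Dropout(0.5))
    cnn.add(Dense(num_classes, activation='softmax'))
    # One-hot targets call for categorical cross-entropy
    cnn.compile(loss='categorical_crossentropy', optimizer='adam',
                metrics=['accuracy'])
    return cnn

demo_model = build_cnn()
demo_model.summary()
```

The softmax output has one unit per species, which matches the 10-column one-hot targets. With only 210 images, the dropout layers are there to limit overfitting; the exact rates are worth tuning.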
hide_code
# Create callbacks
checkpointer = ModelCheckpoint(filepath='weights.best.model.hdf5',
                               verbose=2, save_best_only=True)
lr_reduction = ReduceLROnPlateau(monitor='val_loss',
                                 patience=5, verbose=2, factor=0.2)
hide_code
# TODO: Set up parameters
# epochs =
# batch_size =
# Train the model
history = model.fit(x_train, y_train,
                    epochs=epochs, batch_size=batch_size, verbose=2,
                    validation_data=(x_valid, y_valid),
                    callbacks=[checkpointer, lr_reduction])
hide_code
# TODO: Try to apply ImageDataGenerator (keras)
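To show the idea behind this TODO without depending on a particular Keras version, here is a NumPy-only sketch of image augmentation. It is not ImageDataGenerator itself: the helper `augment_batch` and its `max_shift` parameter are hypothetical, and the shift wraps pixels around the image edge, which is a simplification compared with the fill modes a real augmentation pipeline offers.

```python
import numpy as np

def augment_batch(images, rng, max_shift=8):
    """Return a copy of `images` with random horizontal flips and small shifts."""
    out = np.empty_like(images)
    for i, img in enumerate(images):
        # Flip left-right with probability 0.5
        if rng.random() < 0.5:
            img = img[:, ::-1, :]
        # Shift by up to `max_shift` pixels along height and width (wraps around)
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        out[i] = np.roll(img, shift=(dy, dx), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
# Hypothetical stand-in for a batch taken from x_train
demo_batch = np.random.rand(4, 128, 128, 3).astype('float32')
demo_augmented = augment_batch(demo_batch, rng)
print(demo_augmented.shape)  # (4, 128, 128, 3)
```

Because the training set is small, feeding the network slightly different versions of each image on every epoch can help it generalize; the labels stay unchanged because flips and small shifts do not change the species.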
With 10 classes, random guessing yields roughly 10% accuracy, so the model should do better than that. Let's try to reach 60-70%.
hide_code
# Load the weights saved at the best validation loss (the default metric monitored by ModelCheckpoint)
model.load_weights('weights.best.model.hdf5')
# Calculate classification accuracy on the testing set
score = model.evaluate(x_test, y_test)
score
hide_code
# Save/reload models
model.save('model.h5')
model = load_model('model.h5')
The trained model has been saved to the current folder.
hide_code
# Model predictions for the testing dataset: the class with the highest predicted probability
y_test_predict = np.argmax(model.predict(x_test), axis=1)
hide_code
# Display true labels and predictions
fig = plt.figure(figsize=(18, 18))
for i, idx in enumerate(np.random.choice(x_test.shape[0], size=16, replace=False)):
    ax = fig.add_subplot(4, 4, i + 1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(x_test[idx]))
    pred_idx = y_test_predict[idx]
    true_idx = np.argmax(y_test[idx])
    # Title format: predicted name (true name); blue if correct, dark red if not
    ax.set_title("{} ({})".format(names[pred_idx], names[true_idx]),
                 color=("#4876ff" if pred_idx == true_idx else "darkred"))