Tensorflow2 learning note (1)

Fri, Apr 23, 2021 · 8-minute read · Tags: python, Tensorflow2

 

Recently I have been taking a series of online courses to learn Tensorflow2, simply out of curiosity about how it works, especially now that almost everyone is talking about AI, deep learning, etc. 😛. In this post, I will show how to build a neural network model from scratch to classify images of handwritten digits from 0 to 9. This is part 1 of the Tensorflow2 mini-series, and it serves more as a template demonstrating the modeling pipeline than as a sophisticated, performance-oriented project. I set up the model architecture more or less arbitrarily, and the trained model reaches a test accuracy of 97.91% (using the best saved weights). Some models on Kaggle achieve test accuracy above 99.00%; those are also good resources for learning how to process the image data and improve model performance.

Personally, on one hand I am amazed that deep learning models can achieve such impressive performance, superior to that of traditional statistical or machine learning models such as the support vector machine (SVM) or random forest (RF). On the other hand, I sense that the real challenges lie in (1) how to adjust the model structure or refine the data processing to increase the accuracy by 1%, 0.1% or just 0.01%, and (2) how to understand what pattern or relationship the model has learned from the data. As for the first challenge, it seems to me that it requires professional experience and extensive experiments; the classical models available in Keras Applications do not look very intuitive. The second challenge is related to an emerging research area called interpretable machine learning[1][2], and I have heard from a machine learning engineer working at a pharmaceutical company that they use graph neural networks (GNN) to understand the connections between the molecular structure of drugs and their properties. These are definitely very interesting topics; I will explore them further and hopefully cover them in my blog in the future 🤔.

The simple modeling pipeline presented in the post mainly consists of four parts,

  • Load modules and data
  • Construct model and specify training algorithm & evaluation metrics
  • Train model using train set and set up stopping criteria
  • Evaluate model performance on test set

and the details are given below. The work was originally created on Google Colab with a GPU runtime.

 

1. Load modules and Tensorflow2

Instead of installing Tensorflow2 on a local computer, it is more convenient to use Google Colab, where Tensorflow2 can be imported directly and a faster GPU runtime can be selected. Here the Colab default Tensorflow2 version is 2.4.1, and I set the working directory to a folder in Google Drive.

import tensorflow as tf
print('Tensorflow2 version', tf.__version__)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Flatten, Conv2D, MaxPooling2D, 
                                     Softmax, Dropout, BatchNormalization)
from tensorflow.keras.preprocessing import image
from tensorflow.keras import regularizers

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Tensorflow2 version 2.4.1
# set working directory

%cd /content/drive/MyDrive/Colab Notebooks
/content/drive/MyDrive/Colab Notebooks

 

2. Load Data

The MNIST data can be loaded from Keras and is split into a training set (60,000 images) and a test set (10,000 images). I scale the image data to the range [0,1] and append a dummy channel dimension, since the convolutional layer expects each image to have three dimensions (height, width, channels). Some examples of the digit images and their corresponding labels are shown below.

# load the MNIST Digits data
mnist_data = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist_data.load_data()

# scale images to be in range [0,1]
train_images_scale, test_images_scale = train_images/255.0, test_images/255.0

# add dummy channel dimension
train_images_scale = train_images_scale[...,np.newaxis]
test_images_scale = test_images_scale[...,np.newaxis]

print("train_images_scale shape:", train_images_scale.shape)
print("test_images_scale shape:", test_images_scale.shape)
train_images_scale shape: (60000, 28, 28, 1)
test_images_scale shape: (10000, 28, 28, 1)
# example images and labels

random_idx = np.random.choice(train_images_scale.shape[0],size=5)

fig, axes = plt.subplots(1,5,figsize=(5,5))

for i, idx in enumerate(random_idx):
  axes[i].set_axis_off()
  axes[i].imshow(np.squeeze(train_images_scale[idx]), cmap='Greys')
  axes[i].set_title(f'Digit {train_labels[idx]}')

 

3. Build and Compile model

I use the Keras Sequential API to build a neural network model of seven layers; there is no specific rationale behind the configuration, it is simply for fun 😎. Note that the last Dense layer has 10 units, since there are 10 digit classes. When compiling the model, I specify Adam as the training algorithm, sparse_categorical_crossentropy as the loss, and SparseCategoricalAccuracy as the performance metric. I then create my_model by calling the get_model function, and display its summary to show the layer structure and the number of parameters involved.

def get_model(input_shape):

  # specify layers
  model = Sequential([
    Conv2D(filters=16, kernel_size=(3,3), strides=(2,2), 
           padding='SAME', activation='relu', 
           input_shape=input_shape, data_format='channels_last', 
           # Conv2D expects (batch_size,dim,dim,channels); 
           # specify the channel axis accordingly in data_format
           name = "layer_1"),
    MaxPooling2D(pool_size=(3,3), name="layer_2"),
    Flatten(name="layer_3"),
    Dense(units=128, activation = 'relu', kernel_regularizer=regularizers.l2(1e-4),
          name="layer_4"), 
    Dense(units=64, activation = 'relu', kernel_initializer='he_uniform',
          bias_initializer=tf.keras.initializers.Constant(value=0),
          kernel_regularizer=regularizers.l2(1e-4),
          name="layer_5"), 
    BatchNormalization(name="layer_6"),
    Dense(units=10, activation = 'softmax', 
          kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05),
          bias_initializer="zeros",
          kernel_regularizer=regularizers.l2(1e-4),
          name="layer_7")
    ])
  
  # compile model
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), 
                loss='sparse_categorical_crossentropy',
                metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
  
  return model

# instantiate model
my_model = get_model(train_images_scale[0].shape)

# display the model summary
my_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
layer_1 (Conv2D)             (None, 14, 14, 16)        160       
_________________________________________________________________
layer_2 (MaxPooling2D)       (None, 4, 4, 16)          0         
_________________________________________________________________
layer_3 (Flatten)            (None, 256)               0         
_________________________________________________________________
layer_4 (Dense)              (None, 128)               32896     
_________________________________________________________________
layer_5 (Dense)              (None, 64)                8256      
_________________________________________________________________
layer_6 (BatchNormalization) (None, 64)                256       
_________________________________________________________________
layer_7 (Dense)              (None, 10)                650       
=================================================================
Total params: 42,218
Trainable params: 42,090
Non-trainable params: 128
_________________________________________________________________
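
The numbers in the summary can be reproduced by hand, which is a handy check on how each layer is configured. The sketch below is plain Python (no Tensorflow needed); the helper names are my own and only cover the layer types used here. It assumes the usual formulas: with padding='SAME' a convolution outputs ceil(n / stride) values per spatial dimension, and MaxPooling2D with its default settings (stride equal to the pool size, no padding) outputs floor(n / pool).

```python
import math

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # one weight per kernel cell, input channel and filter, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_units, out_units):
    # full weight matrix plus one bias per output unit
    return in_units * out_units + out_units

def batchnorm_params(features):
    # gamma and beta (trainable) + moving mean and moving variance (non-trainable)
    return 4 * features

# spatial sizes for a 28x28 input
conv_out = math.ceil(28 / 2)        # stride 2, padding 'SAME' -> 14
pool_out = conv_out // 3            # pool size 3 -> 4
flat = pool_out * pool_out * 16     # Flatten -> 256

layer_params = {
    "layer_1": conv2d_params(3, 3, 1, 16),
    "layer_4": dense_params(flat, 128),
    "layer_5": dense_params(128, 64),
    "layer_6": batchnorm_params(64),
    "layer_7": dense_params(64, 10),
}

total = sum(layer_params.values())
non_trainable = 2 * 64              # moving mean and variance of layer_6
trainable = total - non_trainable

print(layer_params)
print("total:", total, "trainable:", trainable, "non-trainable:", non_trainable)
```

The printed values match the Param # column and the totals reported by my_model.summary() above.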

 

Before training the model, we first check the model performance using the initial weights. The test accuracy is 7.82%.

def get_test_evaluate(model, x, y):

  test_loss, test_acc = model.evaluate(x, y, verbose=0)
  print("Test loss: {:.3f}\nTest accuracy: {:.2f}%".format(test_loss, 100 * test_acc))

get_test_evaluate(my_model, test_images_scale, test_labels)
Test loss: 2.335
Test accuracy: 7.82%
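
As a rough point of reference, a classifier that assigns equal probability to all 10 digits would score a cross-entropy of ln(10) ≈ 2.303 and an accuracy of about 10% on a balanced test set, so the values above are in line with what one would expect from untrained weights. Note that the loss reported by Keras also includes the L2 regularization penalties, so it need not equal the pure cross-entropy. The snippet is a NumPy sketch with made-up labels and a hand-written loss function, not the Keras implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    # mean negative log-probability assigned to the true class of each sample
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)      # hypothetical integer labels 0..9
uniform_probs = np.full((1000, 10), 0.1)     # "no idea" prediction for every sample

baseline_loss = sparse_categorical_crossentropy(labels, uniform_probs)
print(f"baseline loss: {baseline_loss:.3f}")  # ln(10), about 2.303
```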

 

4. Train model

The model.fit() method is used to train the neural network model for up to 50 epochs, with 15% of the training set set aside as a validation set. Note that there are many helpful callback classes we can use to control the training process. For example, callbacks_early_stop tells the machine to stop training if the validation loss has not improved for 3 consecutive epochs; callbacks_reduce_lr halves the learning rate when the validation loss plateaus; callbacks_csv logs the loss and metrics of each epoch to a .csv file; checkpoints_best saves the best model weights in terms of the validation loss; and checkpoints_epoch saves the model weights after each epoch. The results from the first 5 epochs are given below.

# stop if val_loss stops improving for 3 consecutive epochs
callbacks_early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', patience=3)

# reduce learning rate by a factor 0.5 if val_loss stops improving for 3 consecutive epochs
callbacks_reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3)

# save the loss and metrics after each epoch in the .csv file
callbacks_csv = tf.keras.callbacks.CSVLogger("train_results.csv")

# save the model weights after each epoch
checkpoints_epoch = tf.keras.callbacks.ModelCheckpoint(filepath='checkpoints_epoch/checkpoints_{epoch:03d}',
                                                       save_freq='epoch', save_weights_only=True, verbose=0)

# save the best model weights in terms of the lowest val_loss
checkpoints_best = tf.keras.callbacks.ModelCheckpoint(filepath='checkpoints_best/checkpoints_best',
                                                      save_freq='epoch', save_weights_only=True, verbose=0,
                                                      save_best_only=True, monitor='val_loss')

# train 50 epochs, and split 15% data as validation set
history = my_model.fit(train_images_scale, train_labels, 
                       epochs=50, batch_size = 128, validation_split=0.15, verbose=0,
                       callbacks=[callbacks_early_stop, callbacks_reduce_lr, callbacks_csv,
                                  checkpoints_epoch, checkpoints_best])
# check the training metrics, which are also saved in train_results.csv
df = pd.DataFrame(history.history)
df.head()

   loss      sparse_categorical_accuracy  val_loss  val_sparse_categorical_accuracy  lr
0  0.438134  0.899392                     0.225271  0.953444                         0.001
1  0.130286  0.968392                     0.164475  0.953111                         0.001
2  0.101795  0.977137                     0.119016  0.971222                         0.001
3  0.086796  0.980922                     0.118879  0.971667                         0.001
4  0.077237  0.983980                     0.119780  0.972778                         0.001
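
To make the patience rule concrete, here is a simplified pure-Python sketch of what an early-stopping callback with patience=3 and a save_best_only checkpoint do with a sequence of validation losses. It illustrates the logic only and is not Keras's actual code; the function name is mine. The first five losses are the val_loss values from the table, while the last three are invented so that a stop is actually triggered.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (epoch at which training stops, epoch with the lowest val_loss)."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # number of consecutive epochs without improvement

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # improvement: remember it (this is when a "best" checkpoint would be saved)
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch

    # ran out of epochs without triggering the stop
    return len(val_losses) - 1, best_epoch

# first five values from the table above; the last three are hypothetical
val_losses = [0.225271, 0.164475, 0.119016, 0.118879, 0.119780, 0.121, 0.120, 0.117]

stopped_epoch, best_epoch = simulate_early_stopping(val_losses)
print(stopped_epoch, best_epoch)  # 6 3
```

In this made-up run the training stops three epochs after the best one, so the weights from the last epoch differ from the best weights, which is why it is worth saving both.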

 

5. Evaluate model performance

We now visualize how the model performance changes as the number of training epochs increases. As the plot shows, the loss decreases and the accuracy increases for both the training and validation sets. However, the loss and accuracy on the validation set are worse than those on the training set, which suggests some overfitting. One may consider data augmentation to increase the effective training size, or stronger regularization of the weights.

fig = plt.figure(figsize=(10,5))

fig.add_subplot(121)
plt.plot(history.history['sparse_categorical_accuracy'])
plt.plot(history.history['val_sparse_categorical_accuracy'])
plt.title('sparse_categorical_accuracy VS. epochs')
plt.ylabel('sparse_categorical_accuracy')
plt.xlabel('epoch')
# plt.xticks(np.arange(epochs))
plt.legend(['Training', 'Validation'], loc='lower right')

fig.add_subplot(122)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('loss VS. epochs')
plt.ylabel('loss')
plt.xlabel('epoch')
# plt.xticks(np.arange(epochs))
plt.legend(['Training', 'Validation'], loc='upper right')

plt.show()

print('After model fitting:')
get_test_evaluate(my_model, test_images_scale, test_labels)
After model fitting:
Test loss: 0.096
Test accuracy: 97.70%
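
As one possible form of the data augmentation suggested above, images can be shifted by a few pixels in a random direction. The function below is a NumPy sketch under my own assumptions (a maximum shift of 2 pixels, zero padding, and a hypothetical function name random_shift); Keras also ships its own augmentation utilities, which would be the more usual choice in a real pipeline.

```python
import numpy as np

def random_shift(images, max_shift=2, seed=0):
    """Shift each image of shape (height, width, channels) by up to max_shift pixels."""
    rng = np.random.default_rng(seed)
    n, h, w, _ = images.shape

    # pad the spatial dimensions with zeros (background), then crop a random window
    pad = ((0, 0), (max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(images, pad, mode="constant")

    shifted = np.empty_like(images)
    for i in range(n):
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
        shifted[i] = padded[i, dy:dy + h, dx:dx + w, :]
    return shifted

# toy batch standing in for train_images_scale: a bright square on a dark background
toy_images = np.zeros((4, 28, 28, 1))
toy_images[:, 10:18, 10:18, 0] = 1.0

augmented = random_shift(toy_images)
print(augmented.shape)
```

The augmented copies keep the original labels and could be concatenated with the original training images before calling fit().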

 

In addition, we evaluate the trained model on the test set and obtain a test accuracy of 97.70%, a dramatic increase from 7.82%, the value before training. Moreover, we randomly select two test images, use the trained model to make predictions, and show the predicted probabilities as bar plots. The test images look clear and the model gives pretty good predictions 🥳.

# select two test images
random_idx = np.random.choice(test_images_scale.shape[0], size=2)
random_images = test_images_scale[random_idx, ...]
random_labels = test_labels[random_idx, ...]

# make predictions
random_predictions = my_model.predict(random_images)

# plot
fig, axes = plt.subplots(2, 2, figsize = (10,4))
fig.subplots_adjust(hspace = 0.4, wspace = -0.2)

for i, (pred, image, label) in enumerate(zip(random_predictions, random_images, random_labels)):

  axes[i,0].imshow(np.squeeze(image), cmap='Greys')
  axes[i,0].get_xaxis().set_visible(False)
  axes[i,0].get_yaxis().set_visible(False)
  axes[i,0].set_title(f'Digit {label}')
  axes[i,1].bar(np.arange(len(pred)), pred)
  axes[i,1].set_xticks(np.arange(len(pred)))
  axes[i,1].set_title(f'Model Prediction: {np.argmax(pred)}')

plt.show()
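
Each row returned by predict() is a vector of 10 softmax probabilities, and the predicted digit is simply the index of the largest entry. The NumPy sketch below uses hand-written probability rows (hypothetical values, not output of my_model) to show how predicted labels and an accuracy figure follow from such an array.

```python
import numpy as np

# three hypothetical prediction rows, each summing to 1
probs = np.array([
    [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.55, 0.05, 0.05],
    [0.40, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.20],
])
true_labels = np.array([2, 7, 9])  # hypothetical ground truth

# the predicted class is the position of the largest probability in each row
predicted = np.argmax(probs, axis=1)

# accuracy is the fraction of rows where prediction and label agree
accuracy = np.mean(predicted == true_labels)

print(predicted)  # [2 7 0]
print(f"accuracy: {accuracy:.2%}")
```

The third row is deliberately a wrong prediction: the true digit 9 only receives the second-highest probability, which is the kind of case the bar plots make easy to spot.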

 

6. Load model weights and architecture

As mentioned in section 4, we set up callbacks to save the model weights after each epoch and the best model weights over all epochs. Here we compare the performance of the model using the weights from the last epoch with that of the model using the best weights. The model with the best weights has the higher test accuracy, 97.91%.

# build up models with same layers structure
new_model_last = get_model(train_images_scale[0].shape)
new_model_best = get_model(train_images_scale[0].shape)

# load the weights from the last epoch
new_model_last.load_weights(tf.train.latest_checkpoint('checkpoints_epoch'))

# load the best weights
new_model_best.load_weights('checkpoints_best/checkpoints_best')

print('Use weights from last epoch:')
get_test_evaluate(new_model_last, test_images_scale, test_labels)
print('\nUse weights from best epoch:')
get_test_evaluate(new_model_best, test_images_scale, test_labels)
Use weights from last epoch:
Test loss: 0.096
Test accuracy: 97.70%

Use weights from best epoch:
Test loss: 0.087
Test accuracy: 97.91%

 

The model.load_weights() method is very useful when one needs to stop and then resume the training process. The function tf.keras.models.load_model() can be called if one wants to load a saved model including both the architecture and the weights. Furthermore, if one wants to extract the model configuration, the method model.get_config() retrieves the config as a dictionary, and it is also possible to obtain the config in JSON format via model.to_json() (a YAML export, model.to_yaml(), existed in the Tensorflow version used here but has been removed from later releases).


# extract model architecture
config_dict = my_model.get_config()

# load model architecture
new_model_config = tf.keras.Sequential.from_config(config_dict)
# for models that are not sequential models, use tf.keras.Model.from_config()