Handwritten digit recognition is a classic subtask of Optical Character Recognition (OCR) and a fascinating application of machine learning. In this article, we will explore how to build a neural network to recognize handwritten digits from 0 to 9 using Python and the popular MNIST dataset. We will use the Keras library only to load the dataset; the network itself is implemented from scratch with NumPy. Let's get started!
Introduction
Handwritten digit recognition is a classic problem in machine learning: given an image of a handwritten digit, predict its numerical value. The MNIST dataset consists of 70,000 28x28 pixel grayscale images of handwritten digits (60,000 for training and 10,000 for testing), each paired with its corresponding label.
Libraries Required
Before we dive into the code, make sure you have the following libraries installed:
- numpy
- Keras (with a backend such as TensorFlow)
- matplotlib
- Pillow (PIL, the Python Imaging Library)
The pickle module ships with Python's standard library and needs no installation. You can install the rest using pip if you haven't already:
pip install numpy keras matplotlib pillow
Loading and Preprocessing the Data
We begin by loading the MNIST dataset and performing some preprocessing steps:
import numpy as np
import pickle
from keras.datasets import mnist
import matplotlib.pyplot as plt
from PIL import Image
# Load the MNIST dataset, consisting of training and testing data
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# Scaling factor used for normalization
Scaling = 255 # 255 is the maximum pixel value in grayscale images
# Get the dimensions (width and height) of the training images
WIDTH = X_train.shape[1] # Width of each image in pixels
HEIGHT = X_train.shape[2] # Height of each image in pixels
# Reshape and normalize the data
X_train = X_train.reshape(X_train.shape[0], WIDTH * HEIGHT).T / Scaling
X_test = X_test.reshape(X_test.shape[0], WIDTH * HEIGHT).T / Scaling
We flatten each 28x28 image into a 784-element vector, transpose the result so that each column holds one image, and normalize by dividing every pixel value by 255 to scale it between 0 and 1.
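As a quick sanity check (a minimal sketch, assuming the loading code above has run), we can confirm the shapes and value range, and display one training image:
# Verify the reshaped dimensions: (784, 60000) for training, (784, 10000) for testing
print(X_train.shape, X_test.shape)
# Pixel values should now lie in [0, 1]
print(X_train.min(), X_train.max())
# Display the first training image with its label
plt.gray()
plt.imshow(X_train[:, 0].reshape(WIDTH, HEIGHT), interpolation='nearest')
plt.title(f"Label: {Y_train[0]}")
plt.show()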
Building the Neural Network
Next, we prepare the training data and define the architecture of our neural network. We first split the training set into mini-batches, then define the activation functions and parameter initialization. The network consists of two layers: a hidden layer with ReLU activation and a 10-unit output layer (one unit per digit) with softmax activation.
# Define batch size
batch_size = 128
# Calculate the total number of full batches (each column of X_train is one image)
num_batches = X_train.shape[1] // batch_size
# Create mini-batches by slicing columns of the training data
mini_batches = []
for i in range(num_batches):
    start_idx = i * batch_size
    end_idx = (i + 1) * batch_size
    mini_batch_X = X_train[:, start_idx:end_idx]
    mini_batch_Y = Y_train[start_idx:end_idx]
    mini_batches.append((mini_batch_X, mini_batch_Y))
# If any data points remain, create one final, smaller mini-batch
if X_train.shape[1] % batch_size != 0:
    mini_batch_X = X_train[:, num_batches * batch_size:]
    mini_batch_Y = Y_train[num_batches * batch_size:]
    mini_batches.append((mini_batch_X, mini_batch_Y))
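To confirm the split (a quick check only; note that the training loop below re-slices batches on the fly rather than consuming this list):
print(len(mini_batches))  # 469 batches: 468 of 128 images plus one of 96
print(mini_batches[0][0].shape, mini_batches[0][1].shape)  # (784, 128) (128,)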
# Define the Rectified Linear Unit (ReLU) activation function
def ReLU(Z):
    return np.maximum(Z, 0)

# Compute the derivative of the ReLU activation function
# (a boolean mask: 1 where Z > 0, 0 elsewhere)
def derivative_ReLU(Z):
    return Z > 0

# Define the softmax activation function
def softmax(Z):
    exp = np.exp(Z - np.max(Z, axis=0, keepdims=True))  # Subtract each column's max to avoid overflow
    return exp / exp.sum(axis=0, keepdims=True)

# Initialize neural network parameters (weights and biases)
# Weights are zero-mean and scaled by the fan-in; biases start at zero
def init_params(nodes, size):
    W1 = np.random.randn(nodes, size) * np.sqrt(1. / size)
    b1 = np.zeros((nodes, 1))
    W2 = np.random.randn(10, nodes) * np.sqrt(1. / nodes)  # 10 output units, one per digit
    b2 = np.zeros((10, 1))
    return W1, b1, W2, b2
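A quick sketch (illustrative only) to confirm the activations and initialization behave as expected:
# Softmax turns each column into a probability distribution that sums to 1
Z = np.array([[1.0, 2.0], [3.0, 0.5], [0.2, 0.1]])
print(softmax(Z).sum(axis=0))  # [1. 1.]
# Parameter shapes for a 512-unit hidden layer on 784-pixel inputs
W1, b1, W2, b2 = init_params(512, 784)
print(W1.shape, b1.shape, W2.shape, b2.shape)  # (512, 784) (512, 1) (10, 512) (10, 1)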
Forward and Backward Propagation
We implement the forward and backward propagation functions. These functions compute the forward pass through the network and, during the backward pass, compute the gradients of the cross-entropy loss with respect to each parameter.
# Perform forward propagation through the neural network
def forward_propagation(X, W1, b1, W2, b2):
    Z1 = W1.dot(X) + b1   # Pre-activation of the hidden layer
    A1 = ReLU(Z1)         # Hidden layer activation
    Z2 = W2.dot(A1) + b2  # Pre-activation of the output layer
    A2 = softmax(Z2)      # Output probabilities
    return Z1, A1, Z2, A2
# Create a one-hot encoding of target labels Y
def one_hot(Y, num_classes):
    one_hot_Y = np.zeros((num_classes, Y.size))
    one_hot_Y[Y, np.arange(Y.size)] = 1
    return one_hot_Y
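For example (illustrative only), one_hot encodes each label as a column vector:
print(one_hot(np.array([2, 0]), 10))
# Column 0 has a 1 in row 2, column 1 has a 1 in row 0; every other entry is 0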
# Perform backward propagation to compute gradients
def backward_propagation(X, Y, A1, A2, W2, Z1, m):
    one_hot_Y = one_hot(Y, 10)
    dZ2 = A2 - one_hot_Y  # Gradient of the cross-entropy loss w.r.t. Z2 under softmax
    dW2 = 1 / m * (dZ2.dot(A1.T))
    db2 = 1 / m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = W2.T.dot(dZ2) * derivative_ReLU(Z1)
    dW1 = 1 / m * (dZ1.dot(X.T))
    db1 = 1 / m * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2
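To build confidence in these derivatives, we can run a finite-difference check (a minimal sketch under the definitions above; the helper cross_entropy_loss is introduced here purely for illustration) comparing the analytic gradient of a single weight against a numerical estimate:
# Hypothetical helper: average cross-entropy loss over a batch
def cross_entropy_loss(X, Y, W1, b1, W2, b2):
    _, _, _, A2 = forward_propagation(X, W1, b1, W2, b2)
    m = Y.size
    return -np.mean(np.log(A2[Y, np.arange(m)] + 1e-12))

# Compare analytic and numerical gradients for one entry of W2
X_small = X_train[:, :32]
Y_small = Y_train[:32]
W1, b1, W2, b2 = init_params(64, 784)
Z1, A1, Z2, A2 = forward_propagation(X_small, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward_propagation(X_small, Y_small, A1, A2, W2, Z1, 32)
eps = 1e-6
W2_plus = W2.copy(); W2_plus[0, 0] += eps
W2_minus = W2.copy(); W2_minus[0, 0] -= eps
numerical = (cross_entropy_loss(X_small, Y_small, W1, b1, W2_plus, b2)
             - cross_entropy_loss(X_small, Y_small, W1, b1, W2_minus, b2)) / (2 * eps)
print(dW2[0, 0], numerical)  # The two values should agree closely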
Updating Model Parameters
We update the model parameters (weights and biases) using gradient descent.
# Update model parameters using gradient descent
def update_params(alpha, W1, b1, W2, b2, dW1, db1, dW2, db2):
    W1 -= alpha * dW1
    b1 -= alpha * db1  # db1 already has shape (nodes, 1) thanks to keepdims=True
    W2 -= alpha * dW2
    b2 -= alpha * db2  # db2 likewise has shape (10, 1)
    return W1, b1, W2, b2
Training the Neural Network
We train the neural network using mini-batch gradient descent: for a fixed number of iterations (epochs), we sweep over the full training set, updating the parameters after every batch to reduce the loss. We also define two small helpers, get_predictions and get_accuracy, which the training loop uses to report progress.
# The predicted digit is the output row with the highest probability
def get_predictions(A2):
    return np.argmax(A2, axis=0)

# Fraction of predictions that match the true labels
def get_accuracy(predictions, Y):
    return np.mean(predictions == Y)

# Implement gradient descent training with mini-batches
def gradient_descent(X, Y, alpha, iterations, batch_size):
    size, m = X.shape
    nodes = 512  # Number of hidden-layer units
    # Initialize model parameters
    W1, b1, W2, b2 = init_params(nodes, size)
    for i in range(iterations):
        # Mini-batch training
        for batch_start in range(0, m, batch_size):
            batch_end = min(batch_start + batch_size, m)
            X_batch = X[:, batch_start:batch_end]
            Y_batch = Y[batch_start:batch_end]
            # Forward propagation
            Z1, A1, Z2, A2 = forward_propagation(X_batch, W1, b1, W2, b2)
            # Backward propagation (use the actual batch size: the last batch may be smaller)
            dW1, db1, dW2, db2 = backward_propagation(X_batch, Y_batch, A1, A2, W2, Z1, X_batch.shape[1])
            # Update model parameters
            W1, b1, W2, b2 = update_params(alpha, W1, b1, W2, b2, dW1, db1, dW2, db2)
        if (i + 1) % max(1, iterations // 10) == 0:
            print(f"Iteration: {i + 1} / {iterations}")
            prediction = get_predictions(A2)
            print(f'{get_accuracy(prediction, Y_batch):.3%}')
    return W1, b1, W2, b2
Making Predictions
We define functions to make predictions using the trained model and display sample predictions.
# Function to make predictions using the trained model
def make_predictions(X, W1, b1, W2, b2):
    _, _, _, A2 = forward_propagation(X, W1, b1, W2, b2)
    predictions = get_predictions(A2)
    return predictions

# Function to display a prediction along with the actual label
def show_prediction(index, X, Y, W1, b1, W2, b2):
    vect_X = X[:, index, None]  # Keep the column as a 2D array of shape (784, 1)
    prediction = make_predictions(vect_X, W1, b1, W2, b2)
    label = Y[index]
    print("Prediction:", prediction)
    print("Label:", label)
    current_image = vect_X.reshape((WIDTH, HEIGHT)) * Scaling
    plt.gray()
    plt.imshow(current_image, interpolation='nearest')
    plt.show()
Training and Testing
We train the neural network and evaluate its performance on the test dataset.
# Perform gradient descent to train the neural network model
W1, b1, W2, b2 = gradient_descent(X_train, Y_train, 0.15, 200, batch_size)
# Save the trained model parameters (weights and biases) to a pickle file
with open("trained_params.pkl", "wb") as dump_file:
    pickle.dump((W1, b1, W2, b2), dump_file)
# Load the trained model parameters from a pickle file
with open("trained_params.pkl", "rb") as dump_file:
    W1, b1, W2, b2 = pickle.load(dump_file)
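Because the trained parameters are plain NumPy arrays on disk, you can also classify your own images. Here is a minimal sketch (assuming a hypothetical file digit.png containing a dark digit on a light background; the inversion step mirrors MNIST's white-on-black convention):
# Load a custom image, convert to grayscale, and resize to 28x28
img = Image.open("digit.png").convert("L").resize((WIDTH, HEIGHT))
pixels = np.asarray(img, dtype=np.float64)
pixels = (Scaling - pixels) / Scaling  # Invert so the digit is bright on dark, then normalize
vect = pixels.reshape(WIDTH * HEIGHT, 1)  # A single image as one column
print("Predicted digit:", make_predictions(vect, W1, b1, W2, b2)[0])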
# Display predictions for several test images
show_prediction(4, X_test, Y_test, W1, b1, W2, b2)
show_prediction(97, X_test, Y_test, W1, b1, W2, b2)
show_prediction(533, X_test, Y_test, W1, b1, W2, b2)
# Make predictions on the test dataset using the trained model parameters
test_predictions = make_predictions(X_test, W1, b1, W2, b2)
# Calculate the accuracy of the model on the test dataset
accuracy = get_accuracy(test_predictions, Y_test)
print(f"Accuracy on the test dataset: {accuracy:.3%}")
Conclusion
In this article, we have built and trained a neural network for handwritten digit recognition using Python. We used the MNIST dataset to train our model and achieved a high level of accuracy on the test dataset. This demonstrates the power of neural networks in recognizing and classifying handwritten digits, a fundamental task in the field of machine learning and computer vision. You can further enhance this model by experimenting with different architectures, hyperparameters, and optimization techniques to achieve even better results.