Recognizing Handwritten Digits
Handwritten digit recognition has long been a challenging task, and many real-world scenarios call for classifying handwritten text or numbers. Digit recognition is used in postal mail sorting, bank check processing, form data entry, and other applications; in postal sorting, for example, the raw data are images of scanned segments from five-digit ZIP codes.
In this blog, we are going to recognize handwritten digits (0-9). The images are grayscale, and the dataset can be loaded from the datasets module of the scikit-learn library. First, I am going to load the necessary libraries and then load the dataset.
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_digits
digits = load_digits()
I have loaded the dataset with load_digits() and created an instance of the dataset called 'digits'. I am going to find out more about this dataset by using its DESCR attribute.
print(digits.DESCR)
OUTPUT:
.. _digits_dataset:
Optical recognition of handwritten digits dataset
--------------------------------------------------
**Data Set Characteristics:**
:Number of Instances: 1797
:Number of Attributes: 64
:Attribute Information: 8x8 image of integer pixels in the range 0..16.
:Missing Attribute Values: None
:Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
:Date: July; 1998
This is a copy of the test set of the UCI ML hand-written digits datasets
https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.
Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each block. This generates
an input matrix of 8x8 where each element is an integer in the range
0..16. This reduces dimensionality and gives invariance to small
distortions.
For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G.
T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C.
L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,
1994.
.. topic:: References
- C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their
Applications to Handwritten Digit Recognition, MSc Thesis, Institute of
Graduate Studies in Science and Engineering, Bogazici University.
- E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.
- Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin.
Linear dimensionality reduction using relevance weighted LDA. School of
Electrical and Electronic Engineering Nanyang Technological University.
2005.
- Claudio Gentile. A New Approximate Maximal Margin Classification
Algorithm. NIPS. 2000.
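To make the preprocessing described above concrete, here is a minimal sketch (using NumPy on a randomly generated bitmap, not the original NIST routines) of how a 32x32 binary bitmap reduces to an 8x8 matrix of pixel counts in the range 0..16:
import numpy as np
# A stand-in 32x32 binary bitmap (1 = "on" pixel); the real inputs
# come from the NIST preprocessing described above
bitmap = np.random.randint(0, 2, size=(32, 32))
# Split into non-overlapping 4x4 blocks and count the "on" pixels
# in each block, giving an 8x8 matrix of integers in 0..16
blocks = bitmap.reshape(8, 4, 8, 4)
reduced = blocks.sum(axis=(1, 3))
print(reduced.shape)  # (8, 8)
print(reduced.max() <= 16)  # True: at most 16 pixels per 4x4 block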
This dataset has 1797 grayscale images of handwritten digits. Each image is an 8x8 matrix, and there are no null values. The whole dataset is a dictionary-like object with several keys: 'images' contains all the images, 'data' contains the same data as flattened matrices, 'target' holds the target value of each image, and 'target_names' gives the names of the classes. Let us see how many images are in the 'images' key and look at the matrix of the first image.
digits.images.shape
OUTPUT:
(1797, 8, 8)
digits.images[0]
OUTPUT:
array([[ 0., 0., 5., 13., 9., 1., 0., 0.],
[ 0., 0., 13., 15., 10., 15., 5., 0.],
[ 0., 3., 15., 2., 0., 11., 8., 0.],
[ 0., 4., 12., 0., 0., 8., 8., 0.],
[ 0., 5., 8., 0., 0., 9., 8., 0.],
[ 0., 4., 11., 0., 1., 12., 7., 0.],
[ 0., 2., 14., 5., 10., 12., 0., 0.],
[ 0., 0., 6., 13., 10., 0., 0., 0.]])
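The 'data' key holds the same information in flattened form: each row of digits.data is the corresponding 8x8 image unrolled into 64 values. A quick check (an addition, not in the original post):
import numpy as np
# Each row of digits.data is the matching 8x8 image flattened to 64 values
print(digits.data.shape)  # (1797, 64)
print(np.array_equal(digits.data[0], digits.images[0].ravel()))  # True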
Now, let us plot the first image.
plt.figure(figsize=(10,6))
plt.imshow(digits.images[0], cmap=plt.cm.gray_r, interpolation='nearest')
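As a small sanity check (also an addition to the original post), the ground-truth label can be drawn in the plot title:
plt.imshow(digits.images[0], cmap=plt.cm.gray_r, interpolation='nearest')
plt.title('Label: {0}'.format(digits.target[0]))  # the true digit for this image
plt.show()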
Let us now view the targets and their size.
digits.target
OUTPUT:
array([0, 1, 2, ..., 8, 9, 8])
digits.target.size
OUTPUT:
1797
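Before training, it is worth checking that the ten classes are roughly balanced. One quick way to do this (not shown in the original post) is with NumPy's bincount:
import numpy as np
# Number of samples per digit class 0-9
print(np.bincount(digits.target))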
To recognize the digits, I am going to use a support vector classification (SVC) model with parameters C=100 and gamma=0.01. I am going to run three different cases; in each case I change the size of the training data and check the accuracy of the model.
from sklearn.svm import SVC
model = SVC(C=100., gamma=0.01)
CASE — 1:
I am going to take the last six images as my test images, and the rest will be my training data. First, let us plot the six images I will use for testing.
# Plot the six test images (indices 1791-1796) in a 3x2 grid
for i, idx in enumerate(range(1791, 1797)):
    plt.subplot(3, 2, i + 1)
    plt.imshow(digits.images[idx], cmap=plt.cm.gray_r, interpolation='nearest')
I am going to fit the training data to the model and store the predictions in an instance 'pred'. Note that Python slices exclude their end index, so digits.data[1791:1796] covers images 1791 through 1795, which is why only five predictions appear below.
# Train on images 1 through 1789 (the slice end is exclusive)
model.fit(digits.data[1:1790], digits.target[1:1790])
# Predict images 1791 through 1795 (five images; index 1796 is excluded)
pred = model.predict(digits.data[1791:1796])
print(pred)
OUTPUT:
[4 9 0 8 9]
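As a quick sanity check (an addition, not in the original post), the true labels for the same slice can be printed next to the predictions; if the model is accurate they should match:
# Ground-truth labels for images 1791-1795, for comparison with pred
print(digits.target[1791:1796])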
Now, to evaluate the model, let us import classification_report, confusion_matrix, and accuracy_score from scikit-learn's metrics module. We check these evaluators against the predictions we made and the test data.
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(confusion_matrix(digits.target[1791:1796], pred))
print(classification_report(digits.target[1791:1796],pred))
print('Accuracy : ',accuracy_score(digits.target[1791:1796],pred))
Now I am going to plot a heatmap of the confusion matrix to visualize the results better.
cm = confusion_matrix(digits.target[1791:1796],pred)
plt.figure(figsize=(10,10))
sns.heatmap(cm,annot=True)
plt.ylabel('Actual Value')
plt.xlabel('Predicted Value')
title = 'Accuracy score : {0}'.format(accuracy_score(digits.target[1791:1796],pred))
plt.title(title)
Now let us repeat the same process for two other cases where I reduce the training data. In both cases, I am going to use a built-in function from scikit-learn's model_selection module called train_test_split.
CASE-2:
First, we will have to import the function train_test_split. In this case I am going to split the data so that 50% is training and 50% is test, by setting the test_size parameter to 0.5. The rest will be the same as the case above.
from sklearn.model_selection import train_test_split
model_1 = SVC(C=100., gamma=0.01)
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.5 ,random_state=42)
model_1.fit(X_train, y_train)
pred_1 = model_1.predict(X_test)
print(pred_1)
print(confusion_matrix(y_test, pred_1))
print(classification_report(y_test, pred_1))
print('Accuracy : ', accuracy_score(y_test, pred_1))
The accuracy for this case is 70 percent. Let’s view the heatmap.
cm = confusion_matrix(y_test,pred_1)
plt.figure(figsize=(10,10))
sns.heatmap(cm,annot=True)
plt.ylabel('Actual Value')
plt.xlabel('Predicted Value')
title = 'Accuracy score : {0}'.format(accuracy_score(y_test,pred_1))
plt.title(title)
CASE — 3:
For this case, I am going to take 30% of the data as training data and the rest as test data. Let's check the accuracy of the model.
model_2 = SVC(C=100.,gamma=0.01)
X1_train, X1_test, y1_train, y1_test = train_test_split(digits.data, digits.target, test_size=0.7, random_state=42)
model_2.fit(X1_train, y1_train)
pred_2 = model_2.predict(X1_test)
print(confusion_matrix(y1_test, pred_2))
print(classification_report(y1_test, pred_2))
print('Accuracy : ', accuracy_score(y1_test, pred_2))
The accuracy for this model has dropped to 50 percent. Let's plot the heatmap.
cm = confusion_matrix(y1_test,pred_2)
plt.figure(figsize=(10,10))
sns.heatmap(cm,annot=True)
plt.ylabel('Actual Value')
plt.xlabel('Predicted Value')
title = 'Accuracy score : {0}'.format(accuracy_score(y1_test,pred_2))
plt.title(title)
CONCLUSION:
From the above, we can conclude that our model works well when it is trained properly on more data. You can also try other C and gamma values; with gamma = 0.001, this model works much better.
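Since the conclusion suggests trying other values of C and gamma, here is a brief sketch (not part of the original post) that uses scikit-learn's GridSearchCV to search over a small grid, reusing the 50/50 split from CASE-2:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# A small illustrative grid; wider ranges may work better in practice
param_grid = {'C': [1., 10., 100.], 'gamma': [0.001, 0.01, 0.1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)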
Thank you!