Loading scikit-learn's MNIST Hand-Written Dataset

Goal

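This post shows how to load scikit-learn's handwritten digit image dataset with load_digits. Strictly speaking, this is the small 8 x 8 digits dataset bundled with scikit-learn (1,797 samples), an MNIST-like dataset rather than the original 28 x 28 MNIST.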

Library

In [11]:
from sklearn.datasets import load_digits
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Load Dataset

In [2]:
mnist = load_digits()
In [3]:
type(mnist)
Out[3]:
sklearn.utils.Bunch
In [4]:
mnist.keys()
Out[4]:
dict_keys(['data', 'target', 'target_names', 'images', 'DESCR'])
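To get a feel for what each key holds, it can help to look at the array shapes. The following is a minimal sketch that reuses the `mnist` name from the cell above; the exact set of keys may differ between scikit-learn versions, so only the four array-valued keys listed above are inspected.

```python
from sklearn.datasets import load_digits

mnist = load_digits()

# data: one row per image, each row holding 64 flattened pixel values
print(mnist.data.shape)

# images: the same pixels kept as 8 x 8 arrays
print(mnist.images.shape)

# target: one integer label per image
print(mnist.target.shape)

# target_names: the distinct class labels (digits 0 through 9)
print(mnist.target_names)
```

The first dimension is the same for `data`, `images`, and `target`, since each sample has one flattened row, one 8 x 8 image, and one label.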

Data

In [5]:
pd.DataFrame(mnist.data).head()
Out[5]:
0 1 2 3 4 5 6 7 8 9 ... 54 55 56 57 58 59 60 61 62 63
0 0.0 0.0 5.0 13.0 9.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 6.0 13.0 10.0 0.0 0.0 0.0
1 0.0 0.0 0.0 12.0 13.0 5.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 11.0 16.0 10.0 0.0 0.0
2 0.0 0.0 0.0 4.0 15.0 12.0 0.0 0.0 0.0 0.0 ... 5.0 0.0 0.0 0.0 0.0 3.0 11.0 16.0 9.0 0.0
3 0.0 0.0 7.0 15.0 13.0 1.0 0.0 0.0 0.0 8.0 ... 9.0 0.0 0.0 0.0 7.0 13.0 13.0 9.0 0.0 0.0
4 0.0 0.0 0.0 1.0 11.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 2.0 16.0 4.0 0.0 0.0

5 rows × 64 columns
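The 64 columns are the 8 x 8 pixels of each image flattened row by row. As a small sanity check (a sketch, with `flattened` being a name introduced here for illustration), reshaping `images` should reproduce `data` exactly:

```python
import numpy as np
from sklearn.datasets import load_digits

mnist = load_digits()

# Flatten each 8 x 8 image into a row of 64 values
flattened = mnist.images.reshape(len(mnist.images), -1)

# Compare against the ready-made feature matrix
same = np.array_equal(flattened, mnist.data)
print(same)
```

This is why `data` is the convenient input for most estimators, while `images` is the convenient input for plotting.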

Target

In [6]:
pd.DataFrame(mnist.target).head()
Out[6]:
0
0 0
1 1
2 2
3 3
4 4
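The first five labels happen to be 0 through 4, but the dataset covers all ten digits. One way to check how the labels are distributed is to count them; the sketch below uses `np.bincount`, and `counts` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits

mnist = load_digits()

# Count how many samples carry each label 0..9
counts = np.bincount(mnist.target)

for digit, count in enumerate(counts):
    print(f"digit {digit}: {count} samples")
```

If only the feature matrix and labels are needed, `load_digits(return_X_y=True)` returns them directly as a `(data, target)` tuple.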

Images

This dataset consists of 8 x 8 grayscale images.
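Before plotting, it may be worth confirming the image size and the range of pixel intensities, since that range affects how a colormap renders the digits. A minimal sketch, where `first`, `lowest`, and `highest` are names introduced here for illustration:

```python
from sklearn.datasets import load_digits

mnist = load_digits()

# A single sample is a 2-D array of pixel intensities
first = mnist.images[0]
print(first.shape)

# Intensity range across the whole dataset
lowest = mnist.images.min()
highest = mnist.images.max()
print(lowest, highest)
```

The intensities are small numbers on a 0 to 16 scale rather than the 0 to 255 scale common for 8-bit images, so rescaling may be needed if the data is mixed with other image sources.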

In [13]:
plt.imshow(mnist.images[0]);
In [27]:
fig, axes = plt.subplots(2, 10, figsize=(16, 6))
for i in range(20):
    # Place image i in a 2 x 10 grid: row i // 10, column i % 10
    ax = axes[i // 10, i % 10]
    ax.imshow(mnist.images[i], cmap='gray')
    ax.axis('off')
    ax.set_title(f"target: {mnist.target[i]}")

plt.tight_layout()
