NIELIT Ropar

Deep Learning Techniques · DOAI250006

// Practical — 01

Loading and Visualizing the
MNIST Handwritten Digit Dataset

Aim

To load the MNIST handwritten digit dataset using Keras/TensorFlow, explore its structure, and visualize sample images along with their pixel intensity distributions.

Prerequisites

Python Programming
NumPy Arrays
Matplotlib Basics
TensorFlow / Keras
Image Representation
Data Visualization

Theory

The MNIST (Modified National Institute of Standards and Technology) dataset is one of the most widely used benchmarks in machine learning. It contains 70,000 grayscale images of handwritten digits (0–9), each of size 28×28 pixels. The dataset is split into 60,000 training samples and 10,000 test samples. Each pixel value ranges from 0 (black) to 255 (white), representing the intensity of that pixel.

The dataset is organized as a 3D NumPy array of shape (N, 28, 28) where N is the number of samples. Each 28×28 matrix is a 2D representation of a digit image. For fully connected neural networks, this 2D matrix must be flattened to a 1D vector of size 784 (28×28=784). For CNNs, a channel dimension is added, making the shape (N, 28, 28, 1).
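The reshaping described above can be sketched with a small illustrative batch (real MNIST would have N = 60,000; the zero-filled array here is just a stand-in):

```python
import numpy as np

# Illustrative batch of 5 blank "images" standing in for x_train
x = np.zeros((5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-length vector for Dense layers
x_flat = x.reshape(x.shape[0], 28 * 28)
print(x_flat.shape)   # (5, 784)

# Add a trailing channel dimension for CNNs
x_cnn = x[..., np.newaxis]
print(x_cnn.shape)    # (5, 28, 28, 1)
```

The same two reshapes apply unchanged to the actual arrays returned by load_data().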

Visualization is critical for understanding data quality, class distribution, and identifying anomalies. Using Matplotlib's imshow() function with the 'gray' colormap, we can render each 28×28 array as a human-readable digit image. Bar charts of class distributions help reveal if the dataset is well balanced across all categories, which informs decisions about model training.

The labels are integers 0–9 stored as a 1D array of shape (N,). For multi-class classification, these must often be one-hot encoded — converting an integer label into a binary vector where only the index corresponding to the class is 1 and all others are 0. For example, label 3 becomes [0,0,0,1,0,0,0,0,0,0].
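Keras provides tf.keras.utils.to_categorical for this conversion; the same idea can be sketched in plain NumPy using an identity matrix (the labels array below is illustrative):

```python
import numpy as np

labels = np.array([3, 0, 9])          # example integer labels

# Row i of the 10x10 identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype=int)[labels]
print(one_hot[0])   # [0 0 0 1 0 0 0 0 0 0]
```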

Algorithm / Step-by-Step

  1. Import required libraries: tensorflow, matplotlib.pyplot, and numpy.
  2. Load the MNIST dataset using tf.keras.datasets.mnist.load_data() which returns (x_train, y_train), (x_test, y_test).
  3. Print the shapes of training and test arrays to verify the tensor dimensions.
  4. Display the first 9 images from the training set in a 3×3 grid using Matplotlib subplots, setting titles to their corresponding labels.
  5. Calculate the class distribution (count per digit 0–9) using numpy's unique() with return_counts=True.
  6. Plot a bar chart of the class frequencies to visualize the dataset balance.

Key Code Concepts

Snippet 1 — Loading and Inspecting MNIST

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

# Load the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Verify the dimensions of the tensors
print(f'Training data shape: {x_train.shape}')
print(f'Training labels shape: {y_train.shape}')
print(f'Test data shape: {x_test.shape}')
print(f'Test labels shape: {y_test.shape}')

The load_data() call automatically downloads the MNIST dataset (if not cached) and returns it as NumPy arrays. We then print the shapes of these tensors to verify their dimensions.

Snippet 2 — Visualizing a Grid of Images

# Set up a matplotlib figure and axis grid
plt.figure(figsize=(10, 10))

# Iterate through the first 9 images in the training set
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(x_train[i], cmap='gray')
    plt.title(f'Label: {y_train[i]}')
    plt.axis('off')

plt.tight_layout()
plt.show()

We use a for loop and plt.subplot(3, 3, i + 1) to arrange the first 9 images into a 3×3 grid. The cmap='gray' argument renders the single-channel image in grayscale, and plt.axis('off') hides the coordinate axes.

Snippet 3 — Class Distribution Analysis

# Count samples per digit class
classes, counts = np.unique(y_train, return_counts=True)
print("Class Distribution:", dict(zip(classes, counts)))

# Bar chart
plt.bar(classes, counts, color='steelblue')
plt.xlabel("Digit Class")
plt.ylabel("Count")
plt.title("MNIST Training Set Class Distribution")
plt.show()

This reveals whether the dataset is balanced (approximately equal counts per class) or imbalanced. MNIST is nearly balanced with ~6,000 samples per digit.

Expected Output

Console Output: Shape prints confirming x_train = (60000, 28, 28), y_train = (60000,), x_test = (10000, 28, 28), y_test = (10000,), and the Class Distribution dictionary.

Figure 1: A 3×3 grid of the first 9 grayscale digit images (0–9) with their correct labels shown as titles. Images appear clear and distinguishable without coordinate axes.

Figure 2: A bar chart showing ~5,000–7,000 samples per digit class, confirming a well-balanced dataset with no significant class imbalance.

Viva Questions & Answers

Q1. What does MNIST stand for and how many samples does it contain?
MNIST stands for Modified National Institute of Standards and Technology. It contains 70,000 grayscale images — 60,000 for training and 10,000 for testing — of handwritten digits 0 through 9, each of 28×28 pixel resolution.
Q2. Why do we typically normalize pixel values before training a neural network?
Normalization scales inputs to a small, consistent range (e.g., [0,1]). This prevents neurons with large input magnitudes from dominating gradient updates, leads to faster convergence during stochastic gradient descent, and reduces the risk of vanishing or exploding gradients. Without normalization, weight updates may oscillate or diverge.
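The normalization described above is a one-line operation; a minimal sketch with an illustrative pixel array (real code would apply it to x_train from load_data()):

```python
import numpy as np

# Illustrative uint8 pixel values; MNIST pixels span the same 0-255 range
x = np.array([[0, 128, 255]], dtype=np.uint8)

# Cast to float32 and scale into [0, 1] before feeding the network
x_norm = x.astype('float32') / 255.0
```

After scaling, the minimum value is 0.0 and the maximum is 1.0, regardless of the original integer range.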
Q3. What is one-hot encoding and when is it used for MNIST labels?
One-hot encoding converts integer class labels into binary vectors. For MNIST with 10 classes, label 5 becomes [0,0,0,0,0,1,0,0,0,0]. It is used when the output layer uses a softmax activation with categorical cross-entropy loss, which requires target distributions rather than scalar class indices.
Q4. What is the difference between the shape (60000, 28, 28) and (60000, 784)?
Shape (60000, 28, 28) represents 60,000 2D images of size 28×28. Shape (60000, 784) is the flattened version, where each 28×28 image is unrolled into a 1D vector of 784 values (28×28=784). Flattening is required for Dense/fully-connected layers, whereas CNNs operate directly on the 2D spatial structure.
Q5. How would you check if MNIST is a balanced dataset?
Use numpy.unique(y_train, return_counts=True) to compute the frequency of each class. If all 10 digit classes have approximately equal counts (~6,000 each for MNIST training set), the dataset is balanced. A bar chart of class frequencies visually confirms balance or reveals imbalance.