Aim
To construct, train, and evaluate a Deep Feedforward Artificial Neural Network (ANN) featuring exactly four hidden layers using backpropagation to classify articles of clothing from the Fashion MNIST dataset.
Prerequisites
Theory
A Deep Feedforward Neural Network (or Multilayer Perceptron) consists of an input layer, an output layer, and multiple hidden layers (in this experiment, exactly four). The term "deep" specifically refers to having more than one hidden layer, which allows the network to learn hierarchical and more complex feature representations of the input data.
The network learns through an algorithm called Backpropagation. During training, the forward pass calculates the network's prediction and the resulting loss (error). Backpropagation then moves backward through the network, computing the gradient of the loss with respect to each weight using the chain rule of calculus. An optimizer like Adam uses these gradients to update the weights, minimizing the overall loss over multiple iterations (epochs).
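To make the chain-rule mechanics concrete, here is a minimal NumPy sketch (separate from the practical's Keras code) of one forward pass, one backward pass, and a plain gradient descent update on a tiny ReLU network; Adam would refine this by adapting the step size per weight:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 3 inputs -> 4 hidden (ReLU) -> 1 output, MSE loss
x = rng.normal(size=(1, 3))          # one training sample
y = np.array([[1.0]])                # its target
W1 = rng.normal(scale=0.5, size=(3, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros((1, 1))

def forward(x):
    z1 = x @ W1 + b1
    h = np.maximum(z1, 0)            # ReLU
    y_hat = h @ W2 + b2
    return z1, h, y_hat

def loss(y_hat):
    return float(np.mean((y_hat - y) ** 2))

z1, h, y_hat = forward(x)
before = loss(y_hat)

# Backward pass: apply the chain rule from the loss back to each weight
d_yhat = 2 * (y_hat - y)             # dL/dy_hat
dW2 = h.T @ d_yhat
db2 = d_yhat
dh = d_yhat @ W2.T                   # propagate the error backward
dz1 = dh * (z1 > 0)                  # ReLU derivative gates the gradient
dW1 = x.T @ dz1
db1 = dz1

# Plain gradient descent update with a small learning rate
lr = 0.01
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2

_, _, y_hat_new = forward(x)
after = loss(y_hat_new)
print(before, after)                 # the loss decreases after the update
```

Keras performs exactly this gradient computation automatically for every weight in the network during each call to model.fit.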
A common architectural design in deep ANNs is to use a funnel structure. The first hidden layer has a large number of neurons, and subsequent layers progressively decrease in size (e.g., 256 → 128 → 64 → 32). This acts as a form of feature compression, forcing the network to distill the high-dimensional raw pixel data into low-dimensional, highly discriminative semantic features before the final classification layer.
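The cost of this funnel can be verified with simple arithmetic. The sketch below (plain Python, no TensorFlow required) counts the weights and biases of each Dense layer in the 784 → 256 → 128 → 64 → 32 → 10 stack used in this practical:

```python
# Each Dense layer has (inputs * units) weights plus `units` biases
layer_sizes = [784, 256, 128, 64, 32, 10]  # Flatten output, 4 hidden, output

params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
print(params_per_layer)        # [200960, 32896, 8256, 2080, 330]
print(sum(params_per_layer))   # 244522 trainable parameters in total
```

Note how the first hidden layer dominates the parameter count: most of the network's capacity is spent projecting the raw 784-pixel input into the first feature space.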
In this practical, we utilize the Fashion MNIST dataset. It serves as a direct drop-in replacement for the original MNIST dataset, containing 70,000 28x28 grayscale images representing 10 different categories of clothing (e.g., T-shirts, trousers, sneakers). It poses a slightly harder challenge for simple ANNs compared to handwritten digits.
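In Keras, the loading call returns the 70,000 images already split into 60,000 training and 10,000 test examples. The sketch below uses a synthetic uint8 array as a stand-in for a loaded batch (so it runs without downloading the dataset) to show the expected shapes and the [0, 1] normalization applied before training:

```python
import numpy as np

# Stand-in for a batch from tf.keras.datasets.fashion_mnist.load_data();
# the real x_train has shape (60000, 28, 28) with the same uint8 dtype
rng = np.random.default_rng(42)
x_batch = rng.integers(0, 256, size=(32, 28, 28), dtype=np.uint8)
y_batch = rng.integers(0, 10, size=(32,))      # integer class labels 0-9

# Normalize pixel intensities from [0, 255] to [0.0, 1.0]
x_norm = x_batch.astype(np.float32) / 255.0

print(x_norm.shape)                            # (32, 28, 28)
print(float(x_norm.min()), float(x_norm.max()))
```

With the real dataset, the same division by 255.0 is applied to x_train and x_test, while the integer labels are left untouched for the sparse categorical loss.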
Algorithm / Step-by-Step
- Import the tensorflow and matplotlib.pyplot libraries.
- Load the Fashion MNIST dataset using keras.datasets.fashion_mnist.load_data().
- Normalize the image pixel values from the range [0, 255] to [0.0, 1.0] to facilitate stable gradient descent.
- Initialize a Sequential model.
- Add a Flatten layer to unroll the 28x28 2D images into a 784-element 1D array.
- Add four sequential hidden Dense layers with decreasing units (256, 128, 64, 32) and apply the ReLU activation function to each.
- Add an output Dense layer with 10 units (for the 10 clothing classes) using a softmax activation function.
- Compile the model using the 'adam' optimizer, 'sparse_categorical_crossentropy' loss, and 'accuracy' metrics.
- Train the model using model.fit() on the training data for 15 epochs, setting aside 20% of the data for validation.
- Plot the training and validation accuracy curves over the epochs.
- Evaluate final model performance on the unseen test dataset.
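The plotting step can be sketched as follows. Since model.fit has not run here, a hard-coded dictionary with illustrative values stands in for the real history.history returned by Keras; with a trained model you would pass that object instead:

```python
import matplotlib
matplotlib.use("Agg")               # headless backend; no display required
import matplotlib.pyplot as plt

# Stand-in for history.history from model.fit (illustrative values only)
history = {
    "accuracy":     [0.78, 0.86, 0.88, 0.89, 0.90],
    "val_accuracy": [0.84, 0.86, 0.87, 0.88, 0.88],
}

epochs = range(1, len(history["accuracy"]) + 1)
plt.plot(epochs, history["accuracy"], label="Train Accuracy")
plt.plot(epochs, history["val_accuracy"], label="Validation Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Fashion MNIST ANN Learning Curves")
plt.legend()
plt.savefig("learning_curves.png")
```

A widening gap between the two curves in the real plot is the visual signature of overfitting.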
Key Code Concepts
Snippet 1 — Defining the 4-Hidden-Layer Architecture
model = tf.keras.models.Sequential([
    # Input layer
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # Hidden Layer 1
    tf.keras.layers.Dense(256, activation='relu', name='Hidden_Layer_1'),
    # Hidden Layer 2
    tf.keras.layers.Dense(128, activation='relu', name='Hidden_Layer_2'),
    # Hidden Layer 3
    tf.keras.layers.Dense(64, activation='relu', name='Hidden_Layer_3'),
    # Hidden Layer 4
    tf.keras.layers.Dense(32, activation='relu', name='Hidden_Layer_4'),
    # Output Layer (10 classes for Fashion MNIST)
    tf.keras.layers.Dense(10, activation='softmax', name='Output_Layer')
])
This snippet explicitly defines a multi-layer perceptron. The Flatten layer is mandatory
to convert 2D image data into the 1D vector required by standard Dense layers. Naming
the layers helps clarify the structure when printing the model summary.
Snippet 2 — Compiling and Training with Backpropagation
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model and save the history
history = model.fit(x_train, y_train,
                    epochs=15,
                    batch_size=32,
                    validation_split=0.2)
The compile method configures the training mechanics that backpropagation will use: adam is a
popular adaptive learning rate optimization algorithm, and sparse_categorical_crossentropy
is the appropriate loss because our labels are integers (0-9) rather than one-hot encoded vectors.
The History object returned by fit records per-epoch metrics, allowing post-training
visualization of the learning curves.
Expected Output
Model Summary: A tabular breakdown of the network showing 1 Flatten layer, 4 Dense hidden layers with progressively fewer parameters, and 1 Dense output layer. Total trainable parameters will be approximately 244,500 (244,522 exactly).
Training Logs: Text output for 15 epochs showing decreasing loss and
val_loss alongside increasing accuracy and val_accuracy.
Visualization: A line plot containing two lines (Train and Validation Accuracy) curving upward and plateauing as epochs increase, demonstrating the model's learning trajectory and highlighting any potential overfitting.
Test Accuracy: A final printed console line displaying the test accuracy (typically hovering around 88% - 89% for Fashion MNIST with this architecture).
