Aim
To construct, train, and evaluate a Deep Feedforward Artificial Neural Network (ANN) featuring exactly four hidden layers using backpropagation to classify articles of clothing from the Fashion MNIST dataset.
Prerequisites
Theory
A Deep Feedforward Neural Network (or Multilayer Perceptron) consists of an input layer, an output layer, and multiple hidden layers (in this experiment, exactly four). The term "deep" specifically refers to having more than one hidden layer, which allows the network to learn hierarchical and more complex feature representations of the input data.
The network learns through an algorithm called Backpropagation. During training, the forward pass calculates the network's prediction and the resulting loss (error). Backpropagation then moves backward through the network, computing the gradient of the loss with respect to each weight using the chain rule of calculus. An optimizer like Adam uses these gradients to update the weights, minimizing the overall loss over multiple iterations (epochs).
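To make the chain-rule mechanics concrete, here is a minimal NumPy sketch (separate from the practical's Keras code) of one forward pass, one backward pass, and a plain gradient descent update on a tiny ReLU network; Adam would refine this by adapting the step size per weight:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 3 inputs -> 4 hidden (ReLU) -> 1 output, MSE loss
x = rng.normal(size=(1, 3))          # one training sample
y = np.array([[1.0]])                # its target
W1 = rng.normal(scale=0.5, size=(3, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros((1, 1))

def forward(x):
    z1 = x @ W1 + b1
    h = np.maximum(z1, 0)            # ReLU
    y_hat = h @ W2 + b2
    return z1, h, y_hat

def loss(y_hat):
    return float(np.mean((y_hat - y) ** 2))

z1, h, y_hat = forward(x)
before = loss(y_hat)

# Backward pass: apply the chain rule from the loss back to each weight
d_yhat = 2 * (y_hat - y)             # dL/dy_hat
dW2 = h.T @ d_yhat
db2 = d_yhat
dh = d_yhat @ W2.T                   # propagate the error backward
dz1 = dh * (z1 > 0)                  # ReLU derivative gates the gradient
dW1 = x.T @ dz1
db1 = dz1

# Plain gradient descent update with a small learning rate
lr = 0.01
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2

_, _, y_hat_new = forward(x)
after = loss(y_hat_new)
print(before, after)                 # the loss decreases after the update
```

Keras performs exactly this gradient computation automatically for every weight in the network during each call to model.fit.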
A common architectural design in deep ANNs is to use a funnel structure. The first hidden layer has a large number of neurons, and subsequent layers progressively decrease in size (e.g., 256 → 128 → 64 → 32). This acts as a form of feature compression, forcing the network to distill the high-dimensional raw pixel data into low-dimensional, highly discriminative semantic features before the final classification layer.
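The cost of this funnel can be verified with simple arithmetic. The sketch below (plain Python, no TensorFlow required) counts the weights and biases of each Dense layer in the 784 → 256 → 128 → 64 → 32 → 10 stack used in this practical:

```python
# Each Dense layer has (inputs * units) weights plus `units` biases
layer_sizes = [784, 256, 128, 64, 32, 10]  # Flatten output, 4 hidden, output

params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
print(params_per_layer)        # [200960, 32896, 8256, 2080, 330]
print(sum(params_per_layer))   # 244522 trainable parameters in total
```

Note how the first hidden layer dominates the parameter count: most of the network's capacity is spent projecting the raw 784-pixel input into the first feature space.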
In this practical, we utilize the Fashion MNIST dataset. It serves as a direct drop-in replacement for the original MNIST dataset, containing 70,000 28x28 grayscale images representing 10 different categories of clothing (e.g., T-shirts, trousers, sneakers). It poses a slightly harder challenge for simple ANNs compared to handwritten digits.
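In Keras, the loading call returns the 70,000 images already split into 60,000 training and 10,000 test examples. The sketch below uses a synthetic uint8 array as a stand-in for a loaded batch (so it runs without downloading the dataset) to show the expected shapes and the [0, 1] normalization applied before training:

```python
import numpy as np

# Stand-in for a batch from tf.keras.datasets.fashion_mnist.load_data();
# the real x_train has shape (60000, 28, 28) with the same uint8 dtype
rng = np.random.default_rng(42)
x_batch = rng.integers(0, 256, size=(32, 28, 28), dtype=np.uint8)
y_batch = rng.integers(0, 10, size=(32,))      # integer class labels 0-9

# Normalize pixel intensities from [0, 255] to [0.0, 1.0]
x_norm = x_batch.astype(np.float32) / 255.0

print(x_norm.shape)                            # (32, 28, 28)
print(float(x_norm.min()), float(x_norm.max()))
```

With the real dataset, the same division by 255.0 is applied to x_train and x_test, while the integer labels are left untouched for the sparse categorical loss.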
Algorithm / Step-by-Step
- Import the tensorflow and matplotlib.pyplot libraries.
- Load the Fashion MNIST dataset using keras.datasets.fashion_mnist.load_data().
- Normalize the image pixel values from the range [0, 255] to [0.0, 1.0] to facilitate stable gradient descent.
- Initialize a Sequential model.
- Add a Flatten layer to unroll the 28x28 2D images into a 784-element 1D array.
- Add four sequential hidden Dense layers with decreasing units (256, 128, 64, 32) and apply the ReLU activation function to each.
- Add an output Dense layer with 10 units (for the 10 clothing classes) using a softmax activation function.
- Compile the model using the 'adam' optimizer, 'sparse_categorical_crossentropy' loss, and 'accuracy' metrics.
- Train the model using model.fit() on the training data for 15 epochs, setting aside 20% of the data for validation.
- Plot the training and validation accuracy curves over the epochs.
- Evaluate final model performance on the unseen test dataset.
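The plotting step can be sketched as follows. Since model.fit has not run here, a hard-coded dictionary with illustrative values stands in for the real history.history returned by Keras; with a trained model you would pass that object instead:

```python
import matplotlib
matplotlib.use("Agg")               # headless backend; no display required
import matplotlib.pyplot as plt

# Stand-in for history.history from model.fit (illustrative values only)
history = {
    "accuracy":     [0.78, 0.86, 0.88, 0.89, 0.90],
    "val_accuracy": [0.84, 0.86, 0.87, 0.88, 0.88],
}

epochs = range(1, len(history["accuracy"]) + 1)
plt.plot(epochs, history["accuracy"], label="Train Accuracy")
plt.plot(epochs, history["val_accuracy"], label="Validation Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Fashion MNIST ANN Learning Curves")
plt.legend()
plt.savefig("learning_curves.png")
```

A widening gap between the two curves in the real plot is the visual signature of overfitting.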
Key Code Concepts
Snippet 1 — Defining the 4-Hidden-Layer Architecture
model = tf.keras.models.Sequential([
    # Input layer
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # Hidden Layer 1
    tf.keras.layers.Dense(256, activation='relu', name='Hidden_Layer_1'),
    # Hidden Layer 2
    tf.keras.layers.Dense(128, activation='relu', name='Hidden_Layer_2'),
    # Hidden Layer 3
    tf.keras.layers.Dense(64, activation='relu', name='Hidden_Layer_3'),
    # Hidden Layer 4
    tf.keras.layers.Dense(32, activation='relu', name='Hidden_Layer_4'),
    # Output Layer (10 classes for Fashion MNIST)
    tf.keras.layers.Dense(10, activation='softmax', name='Output_Layer')
])
This snippet explicitly defines a multi-layer perceptron. The Flatten layer is mandatory
to convert 2D image data into the 1D vector required by standard Dense layers. Naming
the layers helps clarify the structure when printing the model summary.
Snippet 2 — Compiling and Training with Backpropagation
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model and save the history
history = model.fit(x_train, y_train,
                    epochs=15,
                    batch_size=32,
                    validation_split=0.2)
The compile method configures the training mechanics that backpropagation will use: adam is a
popular adaptive learning rate optimization algorithm, and sparse_categorical_crossentropy
is the appropriate loss because our labels are integers (0-9) rather than one-hot encoded vectors.
The History object returned by fit records per-epoch metrics, allowing post-training
visualization of the learning curves.
Expected Output
Model Summary: A tabular breakdown of the network showing 1 Flatten layer, 4 Dense hidden layers with progressively fewer parameters, and 1 Dense output layer. Total trainable parameters will be approximately 244,500 (244,522 exactly).
Training Logs: Text output for 15 epochs showing decreasing loss and
val_loss alongside increasing accuracy and val_accuracy.
Visualization: A line plot containing two lines (Train and Validation Accuracy) curving upward and plateauing as epochs increase, demonstrating the model's learning trajectory and highlighting any potential overfitting.
Test Accuracy: A final printed console line displaying the test accuracy (typically hovering around 88% - 89% for Fashion MNIST with this architecture).
