Aim
To design a Convolutional Autoencoder neural network to compress and reconstruct images from the CIFAR-10 dataset, and to visually compare the original input images against the model's reconstructed outputs.
Prerequisites
Theory
An Autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Unlike classification models that require human-labeled targets, autoencoders use the input data itself as the target.
The architecture comprises two distinct parts that work sequentially:
- The Encoder: compresses the input data into a lower-dimensional latent-space representation (often called a bottleneck). In Convolutional Autoencoders, this is achieved using Conv2D and MaxPooling2D layers, which gradually decrease the spatial resolution of the image while capturing high-level feature maps.
- The Decoder: reconstructs the original input from the compressed latent-space representation. It mirrors the encoder, using Conv2D and UpSampling2D (or transposed convolution) layers to progressively expand the spatial dimensions back to the original input shape.
In this practical, we use the CIFAR-10 dataset, which contains 60,000 32x32 color images. The objective is to evaluate how well the autoencoder can "squeeze" these 32x32x3 (3,072-value) images down to an 8x8x16 latent representation and reliably inflate them back into recognizable color images without extreme loss of detail.
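The compression factor implied by this bottleneck can be checked with quick arithmetic (a sketch; the 16-channel bottleneck depth matches the architecture used in Snippet 1 below):

```python
# Input: one 32x32 RGB image
input_values = 32 * 32 * 3          # 3,072 values per image

# Bottleneck: an 8x8 spatial grid with 16 feature channels
bottleneck_values = 8 * 8 * 16      # 1,024 values per image

ratio = input_values / bottleneck_values
print(input_values, bottleneck_values, ratio)  # 3072 1024 3.0
```

So the network must represent each image with a third of the original values, which is why reconstructions lose some fine detail.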
Algorithm / Step-by-Step
- Import tensorflow, keras.layers, keras.models, numpy, and matplotlib.pyplot.
- Load the CIFAR-10 dataset. Discard the y_train and y_test labels, since this is an unsupervised learning task.
- Normalize the pixel values of the training and testing sets to the range [0.0, 1.0] by dividing by 255.0.
- Define a Sequential model.
- Construct the encoder: add Conv2D layers with ReLU activation followed by MaxPooling2D to downsample the images from 32x32 to 16x16, and then to a bottleneck of 8x8.
- Construct the decoder: add Conv2D layers with ReLU activation followed by UpSampling2D to upscale the bottleneck from 8x8 to 16x16, and finally back to 32x32.
- Add a final Conv2D layer with 3 filters and a sigmoid activation function to generate the final 3-channel (RGB) image bounded between 0 and 1.
- Compile the autoencoder using the adam optimizer and the mse (Mean Squared Error) loss function.
- Train the model using model.fit(), passing x_train as both the input (x) and target (y) parameters. Use the test set for validation.
- Generate reconstructed images by calling model.predict() on a subset of the test data.
- Plot the original input images alongside their corresponding reconstructed outputs using Matplotlib to visually compare data retention.
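The normalization step above can be sketched as follows. A small random uint8 batch stands in for the real data here; the actual practical would call keras.datasets.cifar10.load_data() instead:

```python
import numpy as np

# Stand-in for a CIFAR-10 batch: random uint8 images with the same shape
# (the real practical loads these via keras.datasets.cifar10.load_data()).
x_train = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Normalization: cast to float32 and scale pixel values into [0.0, 1.0]
x_train = x_train.astype("float32") / 255.0

print(x_train.shape, x_train.min(), x_train.max())
```

Scaling to [0, 1] matters because the decoder's final sigmoid activation can only produce outputs in that range.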
Key Code Concepts
Snippet 1 — Autoencoder Architecture (Encoder + Decoder)
autoencoder = models.Sequential([
    # ENCODER
    layers.InputLayer(input_shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2), padding='same'),  # Compresses to 16x16
    layers.Conv2D(16, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2), padding='same'),  # Compresses to 8x8 (Bottleneck)

    # DECODER
    layers.Conv2D(16, (3, 3), activation='relu', padding='same'),
    layers.UpSampling2D((2, 2)),  # Decompresses to 16x16
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    layers.UpSampling2D((2, 2)),  # Decompresses to 32x32
    layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')  # 3 channels for RGB
])
This symmetric structure forces the network to learn a compressed representation. The
MaxPooling2D layers aggressively halve the resolution, creating the bottleneck. The
UpSampling2D layers reverse this by duplicating rows and columns. The final
sigmoid activation ensures pixel outputs map cleanly to the [0, 1] range we scaled our
data into.
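The row-and-column duplication performed by UpSampling2D((2, 2)) can be reproduced in plain NumPy (a sketch to illustrate the behavior, not the Keras implementation itself):

```python
import numpy as np

# A tiny 2x2 single-channel "feature map"
fmap = np.array([[1, 2],
                 [3, 4]])

# UpSampling2D((2, 2)) duplicates each row and each column,
# doubling the spatial resolution without learning any parameters.
upsampled = np.repeat(np.repeat(fmap, 2, axis=0), 2, axis=1)
print(upsampled)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

Because this duplication is parameter-free, the trainable Conv2D layers that follow it are what actually learn to smooth the enlarged maps back into plausible image detail.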
Snippet 2 — Compiling and Self-Supervised Training
autoencoder.compile(optimizer='adam', loss='mse')

# Notice x_train is passed for both input AND target
history = autoencoder.fit(x_train, x_train,
                          epochs=10,
                          batch_size=128,
                          validation_data=(x_test, x_test),
                          verbose=1)
Because the network is attempting to recreate the input, we do not use y_train labels.
By passing x_train, x_train, we instruct the model to calculate its Mean Squared Error
(MSE) loss based on the pixel-by-pixel difference between the original image and the reconstructed
image.
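The pixel-by-pixel MSE described above can be computed by hand for a toy pair of images (a sketch with made-up 2x2 values in [0, 1], matching the quantity Keras's mse loss averages during training):

```python
import numpy as np

# Stand-ins for one original image and its reconstruction (values in [0, 1])
original = np.array([[0.0, 0.5],
                     [1.0, 0.25]])
reconstructed = np.array([[0.1, 0.5],
                          [0.8, 0.25]])

# MSE = mean of the squared pixel-by-pixel differences
mse = np.mean((original - reconstructed) ** 2)
print(round(float(mse), 4))  # 0.0125
```

A falling MSE during training therefore means the reconstructions are moving closer to the originals, pixel by pixel.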
Expected Output
Model Summary: A table tracing the dimensions down from (32, 32, 3) to a bottleneck of (8, 8, 16), and back up to (32, 32, 3).
Training Logs: Epoch printouts showing decreasing Mean Squared Error (MSE) loss, indicating the model is getting better at reconstructing images.
Visual Comparison Plot: A generated figure with two rows of images. The top row will display crisp, original images from the CIFAR-10 test set. The bottom row will show the model's reconstructions of those exact images. The reconstructions will look similar in shape and color but slightly blurrier, reflecting the information lost during the compression bottleneck.
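The two-row comparison figure can be produced with a loop like the following (a sketch using random arrays as stand-ins for x_test and the output of model.predict()):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the figure is saved to a file
import matplotlib.pyplot as plt

n = 5  # number of image pairs to display
originals = np.random.rand(n, 32, 32, 3)        # stand-in for x_test[:n]
reconstructions = np.random.rand(n, 32, 32, 3)  # stand-in for model.predict(x_test[:n])

fig, axes = plt.subplots(2, n, figsize=(2 * n, 4))
for i in range(n):
    axes[0, i].imshow(originals[i])         # top row: original images
    axes[1, i].imshow(reconstructions[i])   # bottom row: reconstructions
    axes[0, i].axis("off")
    axes[1, i].axis("off")
axes[0, 0].set_title("Original")
axes[1, 0].set_title("Reconstructed")
fig.savefig("comparison.png")
```

With real model outputs in place of the random arrays, the bottom row will show the slightly blurrier reconstructions described above.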
