
NIELIT Ropar

Deep Learning Techniques · DOAI250006


// Practical — 03

Binary Classification Model for
Disease Risk Prediction using ANN


Aim

To design, train, and evaluate a binary classification Artificial Neural Network (ANN) that predicts whether a breast cancer tumor is malignant or benign from standardized clinical features, assessing its performance with accuracy metrics and a confusion matrix.

Prerequisites

Python & Scikit-learn
Feature Scaling
ANN Architecture
Sigmoid Activation
Binary Cross-Entropy
Confusion Matrix

Theory

Binary classification is a core machine learning task: predicting one of two possible outcomes, such as disease-positive (1) or disease-negative (0). In healthcare AI, this is critical for early risk screening and automated diagnostics. An ANN designed for binary classification must output a single probability value in the open interval (0, 1). This is achieved by placing exactly one neuron in the final layer equipped with the sigmoid activation function: σ(z) = 1 / (1 + e^(-z)).
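
A quick numerical check makes this concrete. The following minimal NumPy sketch (illustrative only, not part of the practical's required code) shows the sigmoid squashing arbitrary real inputs into (0, 1):

import numpy as np

def sigmoid(z):
    # σ(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))  # ≈ [0.00005, 0.269, 0.5, 0.731, 0.99995]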

The binary cross-entropy (or log loss) function is the standard mathematical loss for binary classification: L = -[y·log(p) + (1-y)·log(1-p)]. It heavily penalizes the network when it makes a confident prediction that is entirely wrong, guiding the Adam optimizer to quickly adjust the weights in the correct direction.
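
The asymmetry of the penalty is easy to verify by hand. A short sketch (toy numbers, chosen for illustration) compares the loss for a confident-correct versus a confident-wrong prediction:

import numpy as np

def bce(y, p):
    # L = -[y·log(p) + (1-y)·log(1-p)]
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(bce(1, 0.99))  # confident and correct -> ≈ 0.01
print(bce(1, 0.01))  # confident but wrong   -> ≈ 4.61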

In this practical, we utilize the Breast Cancer Dataset from Scikit-Learn. It contains 569 patient samples, each with 30 numeric clinical features (like tumor radius, texture, area, and smoothness). Because these features exist on wildly different numerical scales (e.g., area might be ~1000 while smoothness is ~0.1), feature standardization (using `StandardScaler`) is mandatory. Without scaling, features with large magnitudes would dominate the gradient updates, leading to a biased or non-converging model.
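
The scale mismatch is easy to inspect directly from the dataset. The sketch below (feature names as exposed by `load_breast_cancer()`) prints two features on very different scales and confirms that standardization brings every column to mean ≈ 0 and standard deviation ≈ 1:

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, names = data.data, list(data.feature_names)

# Two features on very different numeric scales
for name in ['mean area', 'mean smoothness']:
    print(name, X[:, names.index(name)].mean())  # hundreds vs ~0.1

# After standardization, every column has mean ~0 and std ~1
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0).round(2).max(), X_scaled.std(axis=0).round(2).min())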

While accuracy is a good starting metric, medical diagnostics require deeper evaluation. We use a Confusion Matrix to visualize the network's predictions. It breaks down the results into True Positives (correctly identified), True Negatives, False Positives (Type I error / false alarm), and False Negatives (Type II error / missed diagnosis). In cancer screening, minimizing False Negatives is heavily prioritized.
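
Since False Negatives are the costliest error here, it is worth pulling them out of the matrix explicitly. A minimal sketch, assuming `y_test` and `y_pred` as produced in Snippet 3 below:

from sklearn.metrics import confusion_matrix, recall_score

# For binary labels, ravel() flattens the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("False negatives:", fn)

# Recall (sensitivity) = TP / (TP + FN); note that sklearn treats class 1
# (benign in this dataset) as positive by default, so pass pos_label=0
# to score malignant detection instead.
print("Recall:", recall_score(y_test, y_pred))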

Algorithm / Step-by-Step

  1. Import required libraries: `tensorflow`, `pandas`, `sklearn.datasets`, `sklearn.preprocessing`, and visualization tools (`matplotlib`, `seaborn`).
  2. Load the breast cancer dataset using `load_breast_cancer()`.
  3. Extract the feature matrix (`X`) and the target labels (`y`).
  4. Split the data into training (80%) and testing (20%) sets using `train_test_split`.
  5. Initialize a `StandardScaler`. Fit it only on the training data, then transform both the training and testing sets to prevent data leakage.
  6. Build a Sequential ANN model with three layers: `Dense(32, relu)` → `Dense(16, relu)` → `Dense(1, sigmoid)`.
  7. Compile the model specifying `adam` as the optimizer, `binary_crossentropy` as the loss, and tracking the `accuracy` metric.
  8. Train the model using `model.fit()` for 50 epochs with a batch size of 16 and a validation split of 20% (sketched just after this list).
  9. Evaluate the final model on the unseen test set to retrieve the test accuracy and loss.
  10. Generate prediction probabilities for the test set, convert them to binary labels (0 or 1) using a 0.5 threshold, and compute the `confusion_matrix`.
  11. Visualize the confusion matrix as a heatmap using Seaborn.
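
Steps 8 and 9 are not covered by the snippets in the next section; here is a minimal sketch of the training and evaluation calls, assuming the compiled `model` and the scaled arrays from the earlier steps:

# Step 8: train for 50 epochs, holding out 20% of the training data for validation
history = model.fit(X_train_scaled, y_train,
                    epochs=50, batch_size=16,
                    validation_split=0.2, verbose=1)

# Step 9: measure loss and accuracy on the untouched test set
test_loss, test_acc = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")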

Key Code Concepts

Snippet 1 — Data Splitting and Safe Scaling

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data to a mean of 0 and variance of 1
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Crucially, `fit_transform` is only called on the training data. This calculates the mean and standard deviation from the training set and applies the transformation. The exact same scaler is then applied to the test data using only `transform`. This ensures the model receives no statistical hints about the unseen test data (preventing "data leakage").

Snippet 2 — Binary Classification Architecture

model = tf.keras.models.Sequential([
    # Input layer automatically handles the 30 features from the dataset
    tf.keras.layers.Dense(32, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    tf.keras.layers.Dense(16, activation='relu'),
    # Output layer for binary classification must have 1 neuron
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

The network maps the 30 clinical features to 32 neurons, compresses them to 16, and finally to a single output. The sigmoid activation on the final neuron guarantees the output will be squashed strictly between 0 and 1, allowing it to be interpreted as the probability of the tumor being benign.
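
Calling `model.summary()` confirms the layer sizes; the parameter counts follow directly from the architecture (each Dense layer has inputs × units weights plus units biases):

model.summary()
# Dense(32): 30*32 + 32 =  992 parameters
# Dense(16): 32*16 + 16 =  528 parameters
# Dense(1):  16*1  + 1  =   17 parameters
#                 total = 1537 trainable parameters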

Snippet 3 — Generating the Confusion Matrix

from sklearn.metrics import confusion_matrix

# 1. Get continuous probability predictions [e.g., 0.12, 0.89, 0.45]
y_pred_prob = model.predict(X_test_scaled)

# 2. Threshold probabilities at 0.5 to get crisp binary classes [e.g., 0, 1, 0]
y_pred = (y_pred_prob > 0.5).astype(int)

# 3. Compute array of True Positives, False Positives, etc.
cm = confusion_matrix(y_test, y_pred)

Neural networks output floats. To compare the network's predictions against the true labels (which are crisp 0s and 1s), we must apply a decision threshold. With this dataset's label encoding, values above 0.5 are classified as 1 (Benign) and values at or below 0.5 as 0 (Malignant).
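
Step 11's heatmap takes only a few Seaborn calls. A minimal sketch, assuming `cm` from the snippet above (the tick labels are illustrative):

import matplotlib.pyplot as plt
import seaborn as sns

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Malignant (0)', 'Benign (1)'],
            yticklabels=['Malignant (0)', 'Benign (1)'])
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix')
plt.show()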

Expected Output

Training Logs: Text output showing the model training over 50 epochs. The loss should drop steadily (typically toward ~0.05) and the accuracy should climb toward ~98% by the final epochs.

Test Accuracy: A printed line displaying the model's accuracy on the unseen test dataset. Given the relatively clean nature of the Breast Cancer dataset and the use of scaling, this test accuracy typically lands in the 96-98% range.

Confusion Matrix Heatmap: A colorful 2x2 grid generated by Seaborn. The top-left (True Malignant) and bottom-right (True Benign) squares will display high numbers, indicating correct predictions. The top-right and bottom-left squares (the errors) will contain numbers very close to zero, reflecting the model's high accuracy.

Viva Questions & Answers

Q1. Why is the output layer of a binary classification ANN a single neuron with a Sigmoid activation?
The sigmoid mathematical function uniquely maps any real-valued input (from negative infinity to positive infinity) strictly to a range between 0 and 1. This allows the single output to be directly interpreted as a probability (e.g., 0.85 = 85% chance of being class 1). A threshold is then used to convert this probability into a definitive binary choice.
Q2. Why do we use StandardScaler, and why is it only 'fit' on the training data?
StandardScaler normalizes features to have a mean of 0 and a standard deviation of 1, ensuring that features with large numerical ranges don't overwhelm the neural network. We only `fit` it on the training data to calculate the mean/variance. If we fit it on the test data as well, we would inject statistical information about the test set into the training process, a methodological flaw known as "data leakage" that inflates apparent performance.
Q3. What is binary cross-entropy loss?
Binary cross-entropy is a loss function specifically designed for 2-class problems. It heavily penalizes the model when it makes a confident prediction that is wrong (for instance, predicting a 99% probability for Class 1 when the true label is 0). Because the loss grows without bound as the predicted probability approaches the wrong extreme, this steep penalty helps the optimizer adjust the network's weights quickly.
Q4. What exactly does the Confusion Matrix show?
A confusion matrix is a 2x2 table that breaks down the raw accuracy metric. It displays the counts of True Positives (correctly identified class 1), True Negatives (correctly identified class 0), False Positives (predicting 1 when it's actually 0), and False Negatives (predicting 0 when it's actually 1). It helps determine if a model is biased towards a specific class.
Q5. Why use ReLU activation in the hidden layers instead of Sigmoid?
The ReLU (Rectified Linear Unit) function simply outputs the input if it is positive and 0 otherwise. Unlike Sigmoid, which saturates and shrinks gradients, ReLU largely avoids the "vanishing gradient problem," allowing the network to learn faster and more effectively during backpropagation.
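
A quick numerical illustration (a sketch, not part of the practical's required code): the sigmoid derivative peaks at 0.25 and decays toward 0 for large |z|, while the ReLU derivative stays at exactly 1 for every active (positive) unit:

import numpy as np

z = np.array([-5.0, 0.0, 5.0])

# Sigmoid derivative σ(z)·(1 - σ(z)): at most 0.25, vanishing for large |z|
s = 1.0 / (1.0 + np.exp(-z))
print(s * (1 - s))            # ≈ [0.0066, 0.25, 0.0066]

# ReLU derivative: 0 for z < 0, 1 for z > 0; no shrinking for active units
print((z > 0).astype(float))  # [0., 0., 1.]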