
NIELIT Ropar

Deep Learning Techniques · DOAI250006


// Practical — 09

Time Series Forecasting using Deep
Learning Network with LSTM


Aim

To design and implement a robust Multivariate Long Short-Term Memory (LSTM) network for predicting stock prices, utilizing feature engineering, separate data scaling, and early stopping mechanisms.

Prerequisites

Python & Pandas
Time Series Data
Feature Engineering
Data Scaling
RNN / LSTM Architecture
TensorFlow / Keras

Theory

Time series forecasting involves predicting future values based on previously observed values. Traditional Artificial Neural Networks (ANNs) struggle with this because they process each input independently and have no "memory" of past events. Recurrent Neural Networks (RNNs) address this by feeding the hidden state computed at one time step back into the network at the next time step, so earlier observations can influence later predictions.

However, standard RNNs suffer from the vanishing gradient problem, making it hard to learn long-term dependencies. Long Short-Term Memory (LSTM) networks are a specialized type of RNN designed to overcome this issue. They utilize a complex internal structure consisting of memory cells and three gates: a forget gate, an input gate, and an output gate. These gates determine what information is kept, discarded, or passed forward, allowing the network to retain context over long sequences.
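
For reference, one standard formulation of these gates (the notation is the common textbook convention and is not tied to this practical's code) is:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{(cell state update)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) &&\text{(hidden state)}
\end{aligned}
$$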

In this practical, we build a multivariate forecasting model. Instead of just using past stock prices to predict future prices (univariate), we engineer additional technical indicators: the 20-day and 50-day Simple Moving Averages (SMA). Providing these trend-following indicators as inputs gives the LSTM more context to make accurate predictions.
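
A minimal sketch of this feature-engineering step, assuming a yfinance download with a Close column (the ticker and date range are illustrative, not fixed by this practical):

import yfinance as yf

# Download daily OHLCV data for an example ticker
df = yf.download("AAPL", start="2018-01-01", end="2024-01-01")

# Trend-following indicators: 20-day and 50-day simple moving averages of Close
df["SMA_20"] = df["Close"].rolling(window=20).mean()
df["SMA_50"] = df["Close"].rolling(window=50).mean()

# The first 49 rows cannot have a 50-day SMA, so drop rows containing NaN
df = df.dropna()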

Furthermore, to prevent the model from memorizing the training data (overfitting), we use a regularization technique called Dropout, and a training callback known as Early Stopping, which halts training automatically when validation performance stops improving.

Algorithm / Step-by-Step

  1. Import necessary libraries: yfinance, pandas, numpy, sklearn.preprocessing, and tensorflow.keras.
  2. Download historical stock data (e.g., AAPL) using the yfinance library.
  3. Engineer new features by calculating the 20-day and 50-day Simple Moving Averages (SMA) on the Close price. Drop any resulting NaN values.
  4. Extract the features array (Close, SMA_20, SMA_50) and the target array (Close).
  5. Scale the features and target arrays independently using MinMaxScaler(feature_range=(0,1)) to ensure uniform gradient updates (see the scaling-and-split sketch after this list).
  6. Define a sliding window function to create sequence blocks (e.g., using 60 days of data to predict the 61st day).
  7. Split the sequential data chronologically into training (80%) and testing (20%) sets.
  8. Construct a Sequential model containing LSTM layers, Dropout layers (set to 20%), and fully connected Dense output layers.
  9. Compile the model using the adam optimizer and mean_squared_error loss function.
  10. Define an EarlyStopping callback tracking validation loss to prevent overfitting.
  11. Train the model using model.fit() with the training data and validation splits.
  12. Predict values on the test set and apply inverse_transform to revert scaled outputs back to actual currency values. Evaluate using RMSE and MAE.
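
A minimal sketch of the separate-scaling and chronological-split steps (steps 5 and 7), assuming df already holds Close, SMA_20, and SMA_50 and that create_multivariate_sequences from the next section is available; variable names are illustrative:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

features = df[["Close", "SMA_20", "SMA_50"]].values   # shape: (n_days, 3)
target = df[["Close"]].values                          # shape: (n_days, 1)

# Independent scalers: one for the 3 input features, one for the single target
feature_scaler = MinMaxScaler(feature_range=(0, 1))
target_scaler = MinMaxScaler(feature_range=(0, 1))
scaled_features = feature_scaler.fit_transform(features)
scaled_target = target_scaler.fit_transform(target)

# Build 60-day sequences, then split chronologically (no shuffling)
X, y = create_multivariate_sequences(scaled_features, scaled_target, time_step=60)
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]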

Key Code Concepts

Snippet 1 — Sequence Creation (Sliding Window)

import numpy as np

def create_multivariate_sequences(features_data, target_data, time_step=60):
    X, y = [], []
    for i in range(len(features_data) - time_step):
        # Append a block of `time_step` days of multivariate features
        X.append(features_data[i:(i + time_step), :])
        # Append the target (Close price) of the day that follows the block
        y.append(target_data[i + time_step, 0])
    return np.array(X), np.array(y)

Neural networks require inputs in a structured format. This function takes continuous time series data and turns it into a supervised learning problem mapping inputs (X) to labels (y). The resulting shape of X is 3-dimensional: (Samples, Time_Steps, Features), which is mandatory for Keras LSTM layers.
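
A quick sanity check on the output (shapes assume 3 features and a 60-step window; sample counts depend on the data downloaded):

X, y = create_multivariate_sequences(scaled_features, scaled_target, time_step=60)
print(X.shape)  # (samples, 60, 3): each sample is a 60-day block of all features
print(y.shape)  # (samples,): the Close price of the day following each block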

Snippet 2 — Building the LSTM with Early Stopping

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential([
    # First LSTM layer returns the full sequence so the next LSTM layer can consume it
    LSTM(64, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])),
    Dropout(0.2),
    LSTM(64, return_sequences=False),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(1)
])

early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
model.compile(optimizer='adam', loss='mean_squared_error')

return_sequences=True is critical when stacking LSTM layers, as it passes the full sequence of hidden states to the next layer instead of just the final state. The EarlyStopping callback will halt training if the model's performance on the validation set fails to improve for 5 consecutive epochs (patience=5).
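
A hedged sketch of how this model might then be trained and evaluated; the 50 epochs, batch size, and 10% validation split are illustrative choices, and target_scaler is the target-only scaler from the earlier scaling step:

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Train with the EarlyStopping callback watching validation loss
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.1,
    callbacks=[early_stop],
    verbose=1,
)

# Predict on the test set and convert scaled outputs back to dollar values
pred_scaled = model.predict(X_test)
pred = target_scaler.inverse_transform(pred_scaled)
actual = target_scaler.inverse_transform(y_test.reshape(-1, 1))

rmse = np.sqrt(mean_squared_error(actual, pred))
mae = mean_absolute_error(actual, pred)
print(f"RMSE: ${rmse:.2f}  MAE: ${mae:.2f}")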

Expected Output

Initial Plot: A multi-line graph showing the raw closing price interwoven with smoothed 20-day and 50-day Simple Moving Average trendlines.

Training Logs: Epoch printouts showing training and validation loss decreasing. Training will likely halt before the full 50 specified epochs because the Early Stopping callback triggers.

Evaluation Metrics: Console outputs stating the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) evaluated in actual USD amounts (e.g., "RMSE: $4.50").

Final Forecasting Plot: A large chart displaying the training data (blue), actual test data (orange), and the model's predictions (red) closely tracking the test data trajectory over time.

Viva Questions & Answers

Q1. Why are LSTMs preferred over standard Artificial Neural Networks (ANNs) for time series forecasting?
Standard ANNs treat inputs as independent occurrences, lacking the concept of order. LSTMs contain a recurrent structure and memory cells that retain information from previous time steps, allowing them to capture temporal dependencies and sequential patterns crucial for time series data.
Q2. What is the purpose of the sliding window method (sequence creation)?
The sliding window method converts continuous time series data into a supervised learning format. It extracts a specific chunk of past data (e.g., 60 days) to act as the features (X) and maps it to the immediately following data point (e.g., 61st day) which acts as the target label (y).
Q3. Why do we scale the input features and target outputs using two separate scaler objects?
By using one scaler for the multi-feature inputs and an independent scaler for the single-target output, we drastically simplify post-processing. When the model outputs predictions, we can use the target scaler's inverse_transform directly on the predictions without needing to reconstruct dummy columns for the SMA features.
Q4. How does the Early Stopping callback help in training?
Early Stopping monitors a specific metric, usually validation loss. If the validation loss stops decreasing or begins to increase for a set number of epochs (defined by 'patience'), it halts the training process early. This prevents the model from overfitting to the training data and restores the model to its best-performing state.
Q5. What is the role of the Dropout layer used after the LSTM layers?
Dropout is a regularization technique that randomly ignores (sets to zero) a certain percentage of neurons during each training pass. This forces the network to distribute its learning across multiple nodes, preventing it from relying too heavily on specific features and reducing overall overfitting.