Aim
To design and implement a robust Multivariate Long Short-Term Memory (LSTM) network for predicting stock prices, utilizing feature engineering, separate data scaling, and early stopping mechanisms.
Prerequisites
Working knowledge of Python and basic neural networks; the yfinance, pandas, numpy, scikit-learn, and tensorflow packages installed.
Theory
Time series forecasting involves predicting future values based on previously observed values. Traditional Artificial Neural Networks (ANNs) struggle with this because they process inputs independently and have no "memory" of past events. Recurrent Neural Networks (RNNs) solve this by passing the output of a neuron back to itself as an input for the next time step.
However, standard RNNs suffer from the vanishing gradient problem, making it hard to learn long-term dependencies. Long Short-Term Memory (LSTM) networks are a specialized type of RNN designed to overcome this issue. They utilize a complex internal structure consisting of memory cells and three gates: a forget gate, an input gate, and an output gate. These gates determine what information is kept, discarded, or passed forward, allowing the network to retain context over long sequences.
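The three gates can be written compactly using the standard LSTM equations, where sigma is the sigmoid function, the circle denotes elementwise multiplication, and W and b are learned weights and biases:

```
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)    % forget gate
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)    % input gate
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)    % output gate
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c [h_{t-1}, x_t] + b_c)
h_t = o_t \odot \tanh(c_t)
```

The forget gate f_t decides how much of the previous cell state c_{t-1} survives, the input gate i_t decides how much new information enters, and the output gate o_t controls what is exposed as the hidden state h_t.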
In this practical, we build a multivariate forecasting model. Instead of just using past stock prices to predict future prices (univariate), we engineer additional technical indicators: the 20-day and 50-day Simple Moving Averages (SMA). Providing these trend-following indicators as inputs gives the LSTM more context to make accurate predictions.
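The feature-engineering step reduces to two rolling means in pandas. A minimal sketch, using a synthetic price series in place of the yfinance download:

```python
import pandas as pd

# Hypothetical closing prices; in the practical these come from yfinance.
df = pd.DataFrame({"Close": [float(p) for p in range(1, 101)]})

# Trend-following features: 20-day and 50-day simple moving averages.
df["SMA_20"] = df["Close"].rolling(window=20).mean()
df["SMA_50"] = df["Close"].rolling(window=50).mean()

# The first 49 rows have no complete 50-day window, so drop the NaN rows.
df = df.dropna()
print(df.shape)  # (51, 3)
```

Note that dropping the NaN rows discards the first 49 trading days, since SMA_50 needs a full 50-day window before it produces a value.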
Furthermore, to prevent the model from memorizing the training data (overfitting), we use a regularization technique called Dropout, and a training callback known as Early Stopping, which halts training automatically when validation performance stops improving.
Algorithm / Step-by-Step
- Import the necessary libraries: yfinance, pandas, numpy, sklearn.preprocessing, and tensorflow.keras.
- Download historical stock data (e.g., AAPL) using the yfinance library.
- Engineer new features by calculating the 20-day and 50-day Simple Moving Averages (SMA) of the Close price. Drop any resulting NaN rows.
- Extract the features array (Close, SMA_20, SMA_50) and the target array (Close).
- Scale the features and target arrays independently using MinMaxScaler(feature_range=(0, 1)) to ensure uniform gradient updates.
- Define a sliding-window function to create sequence blocks (e.g., using 60 days of data to predict the 61st day).
- Split the sequential data chronologically into training (80%) and testing (20%) sets.
- Construct a Sequential model containing LSTM layers, Dropout layers (rate 0.2), and fully connected Dense output layers.
- Compile the model with the adam optimizer and the mean_squared_error loss function.
- Define an EarlyStopping callback monitoring validation loss to prevent overfitting.
- Train the model with model.fit() using the training data and a validation split.
- Predict on the test set, apply inverse_transform to revert the scaled outputs back to actual currency values, and evaluate using RMSE and MAE.
Key Code Concepts
Snippet 1 — Sequence Creation (Sliding Window)
```python
def create_multivariate_sequences(features_data, target_data, time_step=60):
    X, y = [], []
    for i in range(len(features_data) - time_step):
        # Append a block of 60 days of multivariate features
        X.append(features_data[i:(i + time_step), :])
        # Append the target (Close price) of the 61st day
        y.append(target_data[i + time_step, 0])
    return np.array(X), np.array(y)
```
Neural networks require inputs in a structured format. This function takes continuous time series data and turns it into a supervised learning problem mapping inputs (X) to labels (y). The resulting shape of X is 3-dimensional: (Samples, Time_Steps, Features), which is mandatory for Keras LSTM layers.
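A quick shape check on synthetic data confirms the 3-D layout; the function is restated here so the sketch runs standalone:

```python
import numpy as np

def create_multivariate_sequences(features_data, target_data, time_step=60):
    X, y = [], []
    for i in range(len(features_data) - time_step):
        X.append(features_data[i:(i + time_step), :])
        y.append(target_data[i + time_step, 0])
    return np.array(X), np.array(y)

# 200 days of 3 scaled features and the matching target column.
features = np.random.rand(200, 3)
target = np.random.rand(200, 1)

X, y = create_multivariate_sequences(features, target, time_step=60)
print(X.shape, y.shape)  # (140, 60, 3) (140,)
```

With 200 days and a 60-day window, only 200 - 60 = 140 complete (window, next-day) pairs can be formed, which is why the sample dimension is 140.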
Snippet 2 — Building the LSTM with Early Stopping
```python
model = Sequential([
    # First LSTM layer returns sequences for the next LSTM layer
    LSTM(64, return_sequences=True,
         input_shape=(X_train.shape[1], X_train.shape[2])),
    Dropout(0.2),
    LSTM(64, return_sequences=False),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(1)
])

early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)
model.compile(optimizer='adam', loss='mean_squared_error')
```
return_sequences=True is critical when stacking LSTM layers, as it passes the full
sequence of hidden states to the next layer instead of just the final state. The
EarlyStopping callback will halt training if the model's performance on the validation
set fails to improve for 5 consecutive epochs (patience=5).
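The patience logic itself can be illustrated with a small pure-Python sketch; this mimics the behaviour of the callback rather than reproducing Keras internals:

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training halts, or None.

    Training stops once validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None

# Loss improves for 3 epochs, then plateaus: stops 5 epochs after the best.
losses = [0.9, 0.7, 0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.4]
print(early_stop_epoch(losses))  # 8
```

Note that with restore_best_weights=True, Keras additionally rolls the model back to the weights from the best epoch (epoch 3 in this toy example) rather than keeping the final, stalled weights.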
Expected Output
Initial Plot: A multi-line graph showing the raw closing price interwoven with smoothed 20-day and 50-day Simple Moving Average trendlines.
Training Logs: Epoch printouts showing training and validation loss decreasing. The training will likely stop prematurely (before the specified 50 epochs) due to the Early Stopping callback triggering.
Evaluation Metrics: Console outputs stating the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) evaluated in actual USD amounts (e.g., "RMSE: $4.50").
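Both metrics reduce to short NumPy expressions once the predictions have been inverse-transformed. A minimal sketch with made-up prices:

```python
import numpy as np

# Hypothetical actual vs. predicted prices, already in USD.
actual = np.array([150.0, 152.0, 151.0, 155.0])
predicted = np.array([151.0, 151.0, 153.0, 154.0])

# RMSE penalizes large errors more heavily; MAE is the average miss.
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
mae = np.mean(np.abs(actual - predicted))
print(f"RMSE: ${rmse:.2f}")
print(f"MAE: ${mae:.2f}")
```

Because both metrics are computed after inverse_transform, they are directly interpretable as average prediction error in dollars.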
Final Forecasting Plot: A large chart displaying the training data (blue), actual test data (orange), and the model's predictions (red) closely tracking the test data trajectory over time.
Viva Questions & Answers
Q: Why are the features and the target scaled with separate MinMaxScaler instances?
A: Because the target has its own fitted scaler, we can apply inverse_transform directly on the predictions without needing to reconstruct dummy columns for the SMA features.