yaswanthteja

Posted on May 31

Predicting Stock Prices with AI: A Simple Guide to Using LSTM for Nifty50 Forecasting

#beginners #tutorial #ai #python

Introduction

Have you ever wondered if computers can predict stock prices? Well, they can—not perfectly, but they can learn from past data to make educated guesses about future prices. In this blog, we’ll explore how we can use a special kind of artificial intelligence (AI) called Long Short-Term Memory (LSTM) to predict the future prices of the Nifty50, India’s leading stock market index.

Don’t worry if you’re not a tech expert—we’ll explain everything in simple terms, just like teaching a friend!

How Does Stock Prediction Work?

Imagine you’re trying to predict the weather. You’d look at past weather patterns—like temperature, humidity, and rainfall—to guess if it’ll rain tomorrow. Similarly, stock prediction works by analyzing past stock prices to forecast future trends.

Here’s how we do it:

Get Historical Data – We download years of Nifty50 stock prices.
Clean the Data – Remove errors or missing values (like a teacher correcting a messy notebook).
Train the AI Model – Teach the computer to recognize patterns in stock prices.
Make Predictions – Ask the AI to predict future prices based on what it learned.

Now, let’s dive deeper into each step!

Step 1: Getting the Data

We use a library called yfinance (Yahoo Finance) to download Nifty50 stock prices from 2005 to 2025. This gives us a big table with daily prices—like a giant Excel sheet with dates and closing prices.

import yfinance as yf

start_date = '2005-05-16'
end_date = '2025-05-15'
nifty_data = yf.download('^NSEI', start=start_date, end=end_date)

Think of this as downloading a history book of stock prices.

Step 2: Cleaning the Data

Sometimes, data has missing or incorrect entries (like a torn page in a book). We fix this by:

Sorting dates correctly.
Removing or filling missing values.

print(f"Missing values in the dataset: {nifty_data.isnull().sum().sum()}")
nifty_data = nifty_data.sort_index()

This ensures our AI learns from clean, organized data.
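The snippet above only counts the missing values. As a minimal sketch of the "removing or filling" step, the example below uses a tiny made-up table (the dates and prices are illustrative, not real Nifty50 data) and assumes that carrying the last known price forward is acceptable for your use case.

```python
import numpy as np
import pandas as pd

# Toy stand-in for nifty_data: made-up prices, with gaps and dates out of order.
toy = pd.DataFrame(
    {"Close": [100.0, np.nan, 103.0, np.nan]},
    index=pd.to_datetime(["2024-01-02", "2024-01-01", "2024-01-04", "2024-01-03"]),
)

cleaned = (
    toy.sort_index()  # put the dates in chronological order
       .ffill()       # fill each gap with the last known price
       .dropna()      # drop leading rows that had nothing to carry forward
)

print(cleaned)
```

Forward-filling keeps the row count stable, while dropping rows keeps only prices that were actually observed. Which one fits better depends on how many gaps your data has.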

Step 3: Scaling the Data (Making Numbers Easier to Work With)

Stock prices can be huge (like ₹20,000), but AI works better with smaller numbers (between 0 and 1). We use MinMaxScaler to shrink the numbers while keeping their relationships intact.

from sklearn.preprocessing import MinMaxScaler

# Closing prices as a single column, the shape MinMaxScaler expects
nifty_close = nifty_data['Close'].values.reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(nifty_close)

This is like converting meters into kilometers: same distance, just smaller numbers that are easier to handle.
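To make the scaling concrete, here is a minimal sketch of the arithmetic behind min-max scaling, written in plain NumPy instead of MinMaxScaler. The prices are made up and the helper names (fit_min_max, scale, unscale) are hypothetical. It also takes the minimum and maximum from the training portion only, which is one common way to keep test-set information out of training; the snippet above fits the scaler on all closing prices.

```python
import numpy as np

def fit_min_max(train):
    """Return the (min, max) of the training prices."""
    return float(np.min(train)), float(np.max(train))

def scale(prices, lo, hi):
    """Map prices so that lo becomes 0 and hi becomes 1."""
    return (np.asarray(prices, dtype=float) - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Undo scale(): map values back to the original price units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

prices = np.array([18000.0, 19000.0, 20000.0, 22000.0])  # made-up values
train, test = prices[:3], prices[3:]

lo, hi = fit_min_max(train)              # min and max of the training part only
scaled_train = scale(train, lo, hi)      # 0.0, 0.5, 1.0
scaled_test = scale(test, lo, hi)        # 2.0, because 22000 is above the training max
restored = unscale(scaled_train, lo, hi) # back to 18000, 19000, 20000

print(scaled_train, scaled_test, restored)
```

Notice that a test price above the training maximum scales to a value greater than 1. That is expected when the index keeps climbing after the training period ends.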

Step 4: Preparing the Data for AI Learning

AI learns from sequences. Imagine teaching a child to predict the next number in this sequence:

1, 2, 3, …? (Answer: 4)

Similarly, we give the AI sequences of 60 days’ stock prices and ask it to predict the 61st day.

import numpy as np

def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step - 1):
        X.append(dataset[i:(i + time_step), 0])  # 60 days of data
        y.append(dataset[i + time_step, 0])      # Next day's price
    return np.array(X), np.array(y)

X_train, y_train = create_dataset(train_data, time_step=60)

This way, the AI learns patterns like:

If prices rise for 10 days, will they fall soon?
Do big drops usually recover?
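To see exactly what the windowing produces, here is the same create_dataset function run on a tiny toy series (the numbers 0 to 9, with a window of 3 instead of 60). The toy data is only for illustration.

```python
import numpy as np

def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step - 1):
        X.append(dataset[i:(i + time_step), 0])  # a window of past values
        y.append(dataset[i + time_step, 0])      # the value right after the window
    return np.array(X), np.array(y)

toy = np.arange(10, dtype=float).reshape(-1, 1)  # 0.0, 1.0, ..., 9.0 as one column
X, y = create_dataset(toy, time_step=3)

print(X.shape, y.shape)  # (6, 3) (6,)
print(X[0], y[0])        # [0. 1. 2.] 3.0
print(X[-1], y[-1])      # [5. 6. 7.] 8.0
```

Each row of X is one window and the matching entry of y is the value that follows it. Because of the `- 1` in the loop range, the very last value (9.0 here) is never used as a target.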

Step 5: Building the AI (LSTM Model)

Our AI is an LSTM (Long Short-Term Memory) network — a type of neural network great at learning sequences (like stock prices over time).

We build it like this:

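First LSTM Layer (50 neurons) – Learns basic patterns.
Second LSTM Layer (50 neurons) – Learns deeper trends.
Third LSTM Layer (50 neurons) – Condenses the whole 60-day sequence into a single summary.
Dropout Layers – Help prevent overfitting (an overfitted model is like a student who memorizes answers instead of learning concepts).
Dense Layer (1 neuron) – Turns that summary into the final prediction.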

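from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

time_step = 60  # length of each input sequence, in days
model = Sequential()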
model.add(LSTM(units=50, return_sequences=True, input_shape=(time_step, 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1))  # Final prediction

We then train the model using past data, just like a student practicing with old exam papers.
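If you are curious how big this network is, the sketch below counts its trainable weights by hand. It assumes the standard Keras LSTM layout (four weight sets per layer, each with input weights, recurrent weights, and one bias vector); the helper names are hypothetical, and model.summary() remains the authoritative check.

```python
def lstm_params(input_dim, units):
    # Three gates plus the candidate cell state: four weight sets, each with
    # input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair, plus one bias per output.
    return input_dim * units + units

layer_params = [
    lstm_params(input_dim=1, units=50),   # first LSTM layer sees 1 feature per day
    lstm_params(input_dim=50, units=50),  # second LSTM layer sees 50 values per day
    lstm_params(input_dim=50, units=50),  # third LSTM layer
    dense_params(input_dim=50, units=1),  # output layer
]
total_params = sum(layer_params)

print(layer_params)   # [10400, 20200, 20200, 51]
print(total_params)   # 50851
```

Dropout layers add no weights of their own; they only switch off some connections at random during training.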

Step 6: Making Predictions

After training, the AI can predict future prices. We test it on unseen data (like a surprise test) to see how well it performs.

train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

We measure accuracy using:

RMSE (Root Mean Square Error) – How far predictions are from real prices.
R² Score – How much of the variation in prices the model explains (1 = perfect prediction, 0 = no better than always guessing the average price; a very poor model can even score below 0).

import math
from sklearn.metrics import mean_squared_error, r2_score

train_rmse = math.sqrt(mean_squared_error(y_train_actual, train_predict))
test_r2 = r2_score(y_test_actual, test_predict)

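An R² close to 0.9 on the unseen test data means the model is tracking prices well. Keep in mind that today’s price is usually close to yesterday’s, so it is worth comparing the model against a simple “same as yesterday” guess before trusting it.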
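Both metrics are simple enough to compute by hand. The sketch below does so in plain NumPy on a few made-up prices (the numbers and the helper names rmse and r2 are only for illustration), including a "predict the average" guess that scores exactly 0 and a deliberately bad guess that scores below 0.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error: typical size of a miss, in price units."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def r2(actual, predicted):
    """R² = 1 - (squared error of the model) / (squared error of guessing the mean)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

actual = np.array([100.0, 102.0, 101.0, 105.0, 107.0])      # made-up prices
model_pred = np.array([101.0, 101.0, 102.0, 104.0, 108.0])  # off by 1 each day
mean_pred = np.full_like(actual, actual.mean())             # always guess the average
bad_pred = actual[::-1]                                     # deliberately poor guess

print(rmse(actual, model_pred))  # 1.0
print(r2(actual, model_pred))    # about 0.85
print(r2(actual, mean_pred))     # 0.0
print(r2(actual, bad_pred))      # negative
```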

Step 7: Predicting Future Prices

Finally, we ask the AI: "What will Nifty50 prices be in the next 30 days?"

We feed it the last 60 days’ data and let it predict day by day.

last_60_days = scaled_data[-60:]
future_predictions = []

for _ in range(30):
    X_future = last_60_days.reshape(1, time_step, 1)
    future_pred = model.predict(X_future)
    last_60_days = np.append(last_60_days[1:], future_pred)
    future_predictions.append(future_pred[0, 0])

We then plot the predictions:

The red line shows what the AI thinks will happen!
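The day-by-day loop can be hard to picture, so here is a minimal sketch of the same rolling idea with a stand-in for the trained model. The function window_mean is a hypothetical placeholder that just averages the window; it is not the LSTM, and the numbers are made up.

```python
import numpy as np

def rolling_forecast(window, predict_next, steps):
    """Predict one step, slide the window forward by one, and repeat."""
    window = np.asarray(window, dtype=float).copy()
    forecasts = []
    for _ in range(steps):
        next_value = float(predict_next(window))
        forecasts.append(next_value)
        # Drop the oldest value and append the new prediction
        window = np.append(window[1:], next_value)
    return np.array(forecasts)

def window_mean(window):
    # Hypothetical stand-in for model.predict: the average of the window
    return window.mean()

forecasts = rolling_forecast([1.0, 2.0, 3.0], window_mean, steps=2)
print(forecasts)  # first value is 2.0, second is the mean of [2.0, 3.0, 2.0]
```

After the first step, the window already contains a predicted value rather than a real one. By day 30 the model is predicting almost entirely from its own earlier guesses, so small errors can add up.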

Keep in mind that the sample size here is limited. Getting better accuracy would likely require training the model on more data.

Conclusion: Can AI Really Predict Stocks?

Yes — but with limitations.

✅ Good at:

Finding patterns in historical data.
Making short-term predictions.

❌ Not perfect at:

Predicting sudden crashes (like COVID-19).
Accounting for unexpected news (elections, wars).

Try It Yourself!

Want to run this code? Copy the full script below and try it on Google Colab or Jupyter Notebook.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import datetime as dt
import math

# Download market data
print("Downloading Nifty50 data...")
start_date = '2005-05-16'
end_date = '2025-05-15'
nifty_data = yf.download('^NSEI', start=start_date, end=end_date)

# Data cleaning and preprocessing
print("\nCleaning and preprocessing data...")
print(f"Missing values in the dataset: {nifty_data.isnull().sum().sum()}")
nifty_data = nifty_data.sort_index().dropna()  # sort by date and drop rows with missing values
nifty_close = nifty_data['Close'].values.reshape(-1, 1)

# Scale the data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(nifty_close)

# Split data into training and testing sets
train_size = int(len(scaled_data) * 0.8)
test_size = len(scaled_data) - train_size
train_data, test_data = scaled_data[0:train_size,:], scaled_data[train_size:len(scaled_data),:]
print(f"\nTraining data size: {train_size}, Testing data size: {test_size}")

def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step - 1):
        X.append(dataset[i:(i + time_step), 0])
        y.append(dataset[i + time_step, 0])
    return np.array(X), np.array(y)

# Create the dataset with time steps
time_step = 60
X_train, y_train = create_dataset(train_data, time_step)
X_test, y_test = create_dataset(test_data, time_step)

# Reshape input to be [samples, time steps, features] which is required for LSTM
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Build LSTM model
print("\nBuilding LSTM model...")
model = Sequential()
# First LSTM layer with 50 neurons and return sequences=True to stack another LSTM layer
model.add(LSTM(units=50, return_sequences=True, input_shape=(time_step, 1)))
model.add(Dropout(0.2)) # Dropout to prevent overfitting

# Second LSTM layer with 50 neurons
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))

# Third LSTM layer with 50 neurons
model.add(LSTM(units=50))
model.add(Dropout(0.2))

# Output layer
model.add(Dense(units=1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Early stopping to prevent overfitting
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

# Train the model
print("\nTraining the model...")
batch_size = 32
epochs = 50

history = model.fit(
    X_train, y_train,
    epochs=epochs,
    batch_size=batch_size,
    validation_split=0.1,  # Use 10% of training data for validation
    callbacks=[early_stop],
    verbose=1
)

# Make predictions and evaluate the model
print("\nMaking predictions...")
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

# Inverse transform to get actual values
train_predict = scaler.inverse_transform(train_predict)
test_predict = scaler.inverse_transform(test_predict)
y_train_actual = scaler.inverse_transform(y_train.reshape(-1, 1))
y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))

# Calculate performance metrics
train_rmse = math.sqrt(mean_squared_error(y_train_actual, train_predict))
test_rmse = math.sqrt(mean_squared_error(y_test_actual, test_predict))
train_mae = mean_absolute_error(y_train_actual, train_predict)
test_mae = mean_absolute_error(y_test_actual, test_predict)
train_r2 = r2_score(y_train_actual, train_predict)
test_r2 = r2_score(y_test_actual, test_predict)

# Display results
print(f"\nTraining RMSE: {train_rmse:.2f}")
print(f"Testing RMSE: {test_rmse:.2f}")
print(f"Training MAE: {train_mae:.2f}")
print(f"Testing MAE: {test_mae:.2f}")
print(f"Training R^2 Score: {train_r2:.2f}")
print(f"Testing R^2 Score: {test_r2:.2f}")

# Calculate accuracy as a percentage (simplified for this context)
def calculate_accuracy(actual, predicted, threshold=0.01):
    within_threshold = np.abs(actual - predicted) <= threshold * actual
    accuracy = np.mean(within_threshold) * 100
    return accuracy

train_accuracy = calculate_accuracy(y_train_actual, train_predict)
test_accuracy = calculate_accuracy(y_test_actual, test_predict)

print(f"\nTraining Accuracy: {train_accuracy:.2f}%")
print(f"Testing Accuracy: {test_accuracy:.2f}%")

# Visualize the results
print("\nVisualizing results...")
# y_train[i] is the price at position i + time_step, so the dates start at time_step
train_dates = nifty_data.index[time_step:train_size-1]
test_dates = nifty_data.index[train_size+time_step:-1]

plt.figure(figsize=(14, 6))
plt.plot(train_dates, y_train_actual, label='Actual Training Data')
plt.plot(train_dates, train_predict, label='Training Predictions')
plt.plot(test_dates, y_test_actual, label='Actual Testing Data')
plt.plot(test_dates, test_predict, label='Testing Predictions')
plt.title('Nifty50 Price Prediction')
plt.xlabel('Date')
plt.ylabel('Price (INR)')
plt.legend()
plt.show()

# Future predictions
print("\nMaking future predictions...")
last_60_days = scaled_data[-60:]
future_predictions = []

for _ in range(30):
    # Reshape data for prediction
    X_future = last_60_days.reshape(1, time_step, 1)
    # Make prediction
    future_pred = model.predict(X_future)
    # Append to the input data
    last_60_days = np.append(last_60_days[1:], future_pred)
    last_60_days = last_60_days.reshape(-1, 1)
    # Store the prediction
    future_predictions.append(future_pred[0, 0])

# Inverse transform to get actual values
future_predictions = np.array(future_predictions).reshape(-1, 1)
future_predictions = scaler.inverse_transform(future_predictions)

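# Create future dates (business days only, since the market is closed on weekends;
# exchange holidays are not accounted for)
last_date = nifty_data.index[-1]
future_dates = pd.bdate_range(start=last_date + pd.Timedelta(days=1), periods=30)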

# Visualize future predictions
plt.figure(figsize=(14, 6))
plt.plot(nifty_data.index[-100:], nifty_data['Close'].values[-100:], label='Historical Data')
plt.plot(future_dates, future_predictions, label='Future Predictions', color='red')
plt.title('Nifty50 Future Price Prediction')
plt.xlabel('Date')
plt.ylabel('Price (INR)')
plt.legend()
plt.show()

# Print model summary
print("\nModel Summary:")
model.summary()

Credits: ezcompounding

Final Thoughts

AI is a powerful tool for stock prediction, but it’s not a crystal ball. It helps investors make educated guesses, not certainties.

Would you trust AI for stock advice? Let us know in the comments! 🚀

Happy investing! 📈
