DEV Community

GeekyHumans

Posted on Jul 18, 2022 • Originally published at geekyhumans.com on Jul 14, 2022

How to predict US Stock Price using Python?

#datascience #mlai

Stock market prediction is a hot topic. Stock prices are speculative and react strongly to news, such as Federal Reserve policy changes and interest rate moves. This article describes how to predict US stock prices in Python using a deep neural network. We will write a Python program that predicts the movement of a US stock from its historical data.

Pre-Requisites:

NumPy:

NumPy, short for Numerical Python, is a Python library used for scientific computing and data processing. It provides array and matrix types that are faster and more convenient than Python lists for numerical work, along with many mathematical functions.

You can install NumPy using the following command in your conda terminal.

pip install numpy

Matplotlib:

Matplotlib is an extensive plotting library for Python. Its pyplot interface was modeled on the plotting commands of MATLAB, a program used primarily by engineers and data scientists. Since we’re going to create charts and graphs, we need to install Matplotlib.

You can install Matplotlib using the following command in your conda terminal.

pip install matplotlib

TensorFlow:

TensorFlow is an open-source software library for numerical computation using data flow graphs. The graph nodes represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.

You can install TensorFlow using the following command in your conda terminal.

pip install --upgrade tensorflow

Scikit-learn:

Scikit-learn (imported as sklearn) is a Python machine learning library. In this tutorial we use its MinMaxScaler to scale the prices before training.

You can install scikit-learn using the following command in your conda terminal.

pip install -U scikit-learn

Pandas_datareader:

Pandas DataReader is a Python package that allows us to create a pandas DataFrame object from various data sources on the internet. It is commonly used to download stock price datasets.

You can install pandas-datareader using the following command in your conda terminal.

pip install pandas-datareader
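Before moving on, it can help to confirm that the packages installed above are visible to the Python interpreter you will run the tutorial with. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the `REQUIRED` list are illustrative and not part of the original tutorial.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip commands of this section.
REQUIRED = ["numpy", "matplotlib", "tensorflow", "scikit-learn", "pandas-datareader"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED needs its pip command run again in the same environment.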

Step -1: Import dependencies

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader as web
import datetime as dt

from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM

Step -2: Load the data

To load the data, we have to specify the date range we want to train on and the ticker symbol of the company. You can find the ticker symbol of any company with a quick Google search.

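# ticker symbol of the company (Meta Platforms traded as 'FB' until June 2022 and is now 'META')
company = 'FB'
start = dt.datetime(2014, 1, 1)
end = dt.datetime(2022, 1, 1)

# download the daily prices for the ticker from Yahoo Finance
data = web.DataReader(company, 'yahoo', start, end)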
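If the online data source is unavailable, the rest of the tutorial only needs a DataFrame with a date index and a Close column. The sketch below assumes you have exported daily prices to a CSV file with `Date` and `Close` columns; the helper name `load_prices_from_csv`, the file layout, and the sample values are all made up for illustration.

```python
import datetime as dt
import io

import pandas as pd


def load_prices_from_csv(source, start, end):
    """Read daily prices from a CSV and keep the rows between start and end (inclusive)."""
    frame = pd.read_csv(source, parse_dates=["Date"], index_col="Date")
    frame = frame.sort_index()  # date slicing expects a sorted index
    return frame.loc[start:end]


# In-memory stand-in for a real file path; the prices are made-up values.
sample_csv = io.StringIO(
    "Date,Close\n"
    "2013-12-31,10.0\n"
    "2014-01-02,11.0\n"
    "2014-01-03,12.0\n"
    "2022-01-03,13.0\n"
)
sample = load_prices_from_csv(sample_csv, dt.datetime(2014, 1, 1), dt.datetime(2022, 1, 1))
print(sample)
```

With a real file, you would pass its path instead of `sample_csv` and assign the result to `data`.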

Step -3: Preparing the data

To prepare the data, we are not going to use the whole data frame. We only use the closing price, scaled to the range 0 to 1.

scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(data['Close'].values.reshape(-1, 1))

# how many past days we look at to predict the next day
prediction_days = 60

# defining two empty lists for preparing the training data
x_train = []
y_train = []

# we are counting from the 60th index to the last index
for x in range(prediction_days, len(scaled_data)):
    x_train.append(scaled_data[x-prediction_days:x, 0])
    y_train.append(scaled_data[x, 0])

x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
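To make the scaling and windowing above easier to follow, here is a small sketch on six made-up prices with a window of 3 days instead of 60. The helper names `min_max_scale` and `make_windows` are illustrative; `min_max_scale` applies the same formula that MinMaxScaler uses for the range (0, 1).

```python
import numpy as np


def min_max_scale(values):
    """Rescale a 1-D array to [0, 1] using (v - min) / (max - min)."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())


def make_windows(series, window):
    """Return (x, y): x[i] holds `window` consecutive values, y[i] the value right after them."""
    x, y = [], []
    for i in range(window, len(series)):
        x.append(series[i - window:i])
        y.append(series[i])
    # LSTM layers expect input of shape (samples, time steps, features)
    x = np.array(x).reshape(-1, window, 1)
    return x, np.array(y)


prices = [10.0, 12.0, 11.0, 14.0, 15.0, 13.0]  # made-up closing prices
scaled = min_max_scale(prices)
x_demo, y_demo = make_windows(scaled, window=3)

print(scaled)
print(x_demo.shape, y_demo.shape)
```

Six prices with a window of 3 give three training samples: each sample is three consecutive scaled prices, and its target is the price on the following day.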

Step -4: Build the model and specify the layers

Here we stack three LSTM layers, each followed by a dropout layer. After that, we add a dense layer with a single unit, whose output is the predicted closing price. You can change the number of units in the LSTM layers, but more units mean a longer training time since there is more computation required per layer.

model = Sequential()
# specify the layer
model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
# this is going to be a prediction of the next closing value
model.add(Dense(units=1))
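To see why more units cost more training time, you can count the trainable weights of the model above by hand. This is a sketch using the standard parameter-count formulas for an LSTM layer with bias and for a dense layer; the helper names are illustrative. Dropout layers have no weights.

```python
def lstm_params(units, input_dim):
    """Weights in a standard LSTM layer: 4 gates, each with input weights,
    recurrent weights, and a bias vector."""
    return 4 * (units * input_dim + units * units + units)


def dense_params(units, input_dim):
    """Weights in a dense layer: one weight per input per unit, plus a bias per unit."""
    return input_dim * units + units


layers = [
    ("LSTM 1 (input: 1 feature)", lstm_params(50, 1)),
    ("LSTM 2 (input: 50 units)", lstm_params(50, 50)),
    ("LSTM 3 (input: 50 units)", lstm_params(50, 50)),
    ("Dense (input: 50 units)", dense_params(1, 50)),
]
total = sum(count for _, count in layers)

for name, count in layers:
    print(f"{name}: {count}")
print(f"Total: {total}")
```

The recurrent term grows with the square of the number of units, so doubling the units roughly quadruples the weights in the stacked LSTM layers.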

Step -5: Compiling the Model

model.compile(optimizer='adam', loss='mean_squared_error')
# fit the model on the training data
model.fit(x_train, y_train, epochs=25, batch_size=32)
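The two settings above can be illustrated without TensorFlow. The sketch below computes the mean squared error that the model minimizes and the number of weight updates per epoch for a given batch size. The sample values and the sample count of 1000 are made up, and the helper names are illustrative.

```python
import math

import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def batches_per_epoch(n_samples, batch_size):
    """Number of batches in one pass over the data; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)


# made-up scaled prices and predictions
loss = mean_squared_error([0.50, 0.60, 0.70], [0.40, 0.60, 0.90])
steps = batches_per_epoch(1000, 32)

print(f"loss: {loss:.4f}")
print(f"batches per epoch: {steps}")
```

Because the loss is computed on scaled prices, its value is not in dollars. It is only meaningful relative to other runs on the same scaled data.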

Step -6: Testing the model

# Load test data; start where the training data ends so the two sets do not overlap
test_start = dt.datetime(2022, 1, 1)
test_end = dt.datetime.now()

test_data = web.DataReader(company, 'yahoo', test_start, test_end)

Now we want to see how well the model predicts prices it has not been trained on. We take the actual closing prices for the test period and combine the training and test closing prices into one data set. From that combined data set we take the test period plus the 60 days before it, so that the first test prediction has a full window of past prices, and we scale these values with the scaler that was fitted on the training data.

actual_prices = test_data['Close'].values
total_dataset = pd.concat((data['Close'],test_data['Close']), axis=0)

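# keep the test period plus the prediction_days values that precede it
model_input = total_dataset[len(total_dataset) - len(test_data) - prediction_days:].values
# reshape the input into a single column
model_input = model_input.reshape(-1, 1)
# scale the input with the scaler fitted on the training data
model_input = scaler.transform(model_input)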
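The slice arithmetic above is easier to check on a toy example. This sketch uses plain NumPy arrays with made-up prices and a window of 3 days instead of 60; it assumes the training and test series do not overlap in time.

```python
import numpy as np

prediction_days = 3                    # the article uses 60
train_close = np.arange(1.0, 11.0)     # 10 made-up training prices: 1..10
test_close = np.arange(11.0, 15.0)     # 4 made-up test prices: 11..14

total = np.concatenate((train_close, test_close))

# Keep the test period plus the prediction_days values right before it.
model_input_demo = total[len(total) - len(test_close) - prediction_days:]

# One window per test day: each window holds the 3 prices before that day.
windows = [
    model_input_demo[i - prediction_days:i]
    for i in range(prediction_days, len(model_input_demo))
]

print(model_input_demo)
for window, target in zip(windows, test_close):
    print(window, "->", target)
```

The number of windows equals the number of test days, so every actual test price gets exactly one prediction to compare against.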

Step -7: Predict the next day’s data

x_test = []
for x in range(prediction_days, len(model_input)):
    x_test.append(model_input[x-prediction_days:x, 0])

x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

predicted_price = model.predict(x_test)
predicted_price = scaler.inverse_transform(predicted_price)

# plot the test predictions
plt.plot(actual_prices, color="black", label=f"Actual {company} price")
plt.plot(predicted_price, color='green', label=f"Predicted {company} price")
plt.title(f"{company} share price")
plt.xlabel('Time')
plt.ylabel(f'{company} share price')
plt.legend()
plt.show()

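Here we are going to use the most recent 60 days of real data as the input to predict the closing price for the next day.

# take the last prediction_days scaled closing prices as a single input window
real_data = model_input[-prediction_days:, 0]
real_data = np.reshape(real_data, (1, prediction_days, 1))

prediction = model.predict(real_data)
prediction = scaler.inverse_transform(prediction)
print(f"Prediction: {prediction}")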
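The model outputs a value in the scaled 0 to 1 range, which is why the code calls inverse_transform before printing. As a sketch of what that step does for a (0, 1) min-max scaling, the helpers below convert between prices and scaled values; the function names and the price range are made up for illustration.

```python
def scale(price, price_min, price_max):
    """Map a price into the [0, 1] range used for training."""
    return (price - price_min) / (price_max - price_min)


def inverse_scale(scaled_value, price_min, price_max):
    """Map a scaled model output back to a price."""
    return scaled_value * (price_max - price_min) + price_min


# made-up minimum and maximum closing prices seen during training
price_min, price_max = 10.0, 15.0

scaled_prediction = 0.8  # pretend this came out of model.predict
predicted = inverse_scale(scaled_prediction, price_min, price_max)
print(f"Predicted price: {predicted:.2f}")
```

A scaled prediction above 1 is possible and maps to a price above the highest closing price in the training data.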

Final Words

In this blog, we learned how to predict stock prices with Python! We took a 60-day time series of closing prices and predicted the next day’s price. It’s a slightly complicated process, but it’s not that hard either. That said, I do not recommend using this for trading. I consider this more a learning experience than anything else. I hope you liked the tutorial, and if you have any questions, please feel free to leave them down below and I’ll do my best to answer them!

The post How to predict US Stock Price using Python? appeared first on Geeky Humans.

Top comments (1)

peeerpioneers • Feb 13 '25

web.DataReader is generally the correct function, BUT... Yahoo Finance's API has changed. The yahoo data source in pandas-datareader is no longer officially supported. Yahoo discontinued their free historical data API in 2017. While some workarounds existed for a while, they are generally unreliable. You should use a different data source.