Streamlit: Deploy a Machine Learning Model without learning any web framework.

#machinelearning #datascience #python #streamlit

Explained with a demo project

It can be very tiresome to work on an actual data science project and then spend even more time on a web framework, backend, and frontend. For a data scientist or a machine learning engineer, these technologies are a secondary concern. So how can you deploy an ML model without learning even Flask, a well-known minimal Python web framework? In this blog, I'll present a very useful tool, Streamlit, which lets you focus on your work as a data scientist while it takes care of turning your model into a working web application.

Let us see how Streamlit helps ML/Data Science practitioners like you and me deploy our models. We will use the well-known Capital Bikeshare dataset and implement several classification algorithms to predict a user's member type: Member, Casual, or Unknown. Before we begin, a quick note: I am a neophyte blogger and this is my first blog, so please excuse any mistakes I might have made. I'll keep the explanations as simple as possible, so bear with me. You should know basic Python, and a little working knowledge of implementing ML algorithms with the scikit-learn library is sufficient to follow this project.

To get started, install Streamlit with pip by typing pip install streamlit in your terminal/cmd. The command is the same on Linux and Windows. The code below also uses pandas, NumPy, scikit-learn, and Plotly, which you can install the same way.

Let us begin by importing a few libraries along with Streamlit, as shown below:

	import streamlit as st
	import pandas as pd
	import numpy as np
	from sklearn.model_selection import train_test_split
	from sklearn import metrics
	from sklearn.preprocessing import LabelEncoder
	from sklearn.metrics import classification_report, confusion_matrix
	from sklearn.tree import DecisionTreeClassifier
	from sklearn.neural_network import MLPClassifier
	from sklearn.preprocessing import StandardScaler
	from sklearn.neighbors import KNeighborsClassifier
	import plotly.express as px


We will train Decision Tree, Neural Network, and KNN classifier models to predict the user's member type. Have a look at a sample of our dataset:

Now, let's begin by defining our main() function, where we call other functions to preprocess our dataset and invoke our ML models.

	def main():
		st.title("Prediction of Trip History Data using various Machine Learning Classification Algorithms- A Streamlit Demo!")
		# Function call to load the dataset
		data = loadData()
		# Function call to perform the train-test split on our dataset
		X_train, X_test, y_train, y_test, le = preprocessing(data)

		# Insert a checkbox to show a snippet of the data.
		if st.checkbox('Show Raw Data'):
			st.subheader("Showing raw data---->>>")
			st.write(data.head())  # Displays the first five rows of our dataset on the web app


In the above snippet, st.title() is a Streamlit function that displays a title at the top of the web app. The app is served on your local machine when you run the script: move to the directory where your code resides, type streamlit run your_file_name.py in your terminal or cmd, and the app will be available at http://localhost:8501 (Streamlit's default port). Next, we load the dataset by calling our user-defined function loadData(), then preprocess it and split it into training and testing data by calling preprocessing(). st.checkbox() is another Streamlit function; it adds a checkbox widget to the web app and returns True while the box is ticked. So, when the user ticks the checkbox, st.subheader() and st.write() are called, and the first five rows of the dataset appear in the app.
Have a look at the loadData() and preprocessing() functions:

	@st.cache
	def loadData():
		df = pd.read_csv("2010-capitalbikeshare-tripdata.csv")
		return df

	# Basic and common preprocessing required for all the models.
	def preprocessing(df):
		# Assign X (independent features: Duration, Start station number, End station number)
		# and y (dependent feature, i.e. the 'Member type' column, the last column in the dataset)
		X = df.iloc[:, [0, 3, 5]].values
		y = df.iloc[:, -1].values

		# X is already numeric, but y holds categorical labels, so it needs encoding
		le = LabelEncoder()
		y = le.fit_transform(y)

		# Split X, y into training (80%) and testing (20%) sets
		X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
		return X_train, X_test, y_train, y_test, le


The most interesting and useful Streamlit feature in the code above is the @st.cache decorator. It caches a function's return value: the first call runs the function, and later calls with the same inputs reuse the stored result. Because Streamlit reruns the whole script every time the user interacts with a widget, this means the CSV file is read once instead of on every rerun. (Newer Streamlit versions deprecate @st.cache in favour of @st.cache_data for data such as DataFrames and @st.cache_resource for objects such as trained models; use those if you are on a recent release.) Next, we assign the independent and dependent features to X and y, respectively. The independent features are duration, start station number, and end station number, which are already numeric. The dependent feature is the member type column, which contains categorical labels and therefore needs to be encoded. Here, I have used scikit-learn's LabelEncoder(), but you can use any encoder of your choice. Finally, we split the data into training and testing sets using scikit-learn's train_test_split() method, holding out 20% of the rows for testing.
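To see what preprocessing() produces, here is a small sketch that applies the same steps to a handful of made-up rows. The column layout (Duration at position 0, Start station number at 3, End station number at 5, Member type last) is an assumption chosen to match the iloc indices used above; the values themselves are invented.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows that only mimic the assumed column layout.
toy = pd.DataFrame({
    "Duration": [600, 1200, 300, 900, 450, 2000, 750, 1100, 500, 1300],
    "Start date": ["2010-09-20 11:27"] * 10,
    "End date": ["2010-09-20 11:37"] * 10,
    "Start station number": [31208, 31209, 31208, 31100, 31200, 31208, 31100, 31209, 31200, 31100],
    "Start station": ["A"] * 10,
    "End station number": [31108, 31100, 31209, 31208, 31101, 31108, 31209, 31100, 31208, 31101],
    "End station": ["B"] * 10,
    "Bike number": ["W00001"] * 10,
    "Member type": ["Member", "Casual", "Member", "Unknown", "Member",
                    "Casual", "Member", "Member", "Casual", "Member"],
})

# Same selection as preprocessing(): three numeric features, last column as target.
X = toy.iloc[:, [0, 3, 5]].values
y = toy.iloc[:, -1].values

# LabelEncoder sorts the class names and maps them to 0..n_classes-1.
le = LabelEncoder()
y_encoded = le.fit_transform(y)
print(list(le.classes_))
print(le.inverse_transform([1]))

# 10 rows with test_size=0.20 leaves 8 rows for training and 2 for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.20, random_state=0)
print(X_train.shape, X_test.shape)
```

Because LabelEncoder sorts class names, 'Casual' becomes 0, 'Member' 1, and 'Unknown' 2, which is why le.inverse_transform() can turn a model's integer prediction back into a readable label.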
Now that our training and testing data are ready, let us implement the ML algorithms and show the results on our web app.
First of all, we let the user select the desired ML model from a select box in the sidebar using Streamlit's st.sidebar.selectbox() method, as shown below:

	choose_model = st.sidebar.selectbox("Choose the ML Model",
		["NONE", "Decision Tree", "Neural Network", "K-Nearest Neighbours"])

	if choose_model == "Decision Tree":
		score, report, tree = decisionTree(X_train, X_test, y_train, y_test)
		st.text("Accuracy of Decision Tree model is: ")
		st.write(score, "%")
		st.text("Report of Decision Tree model is: ")
		st.text(report)  # st.text keeps the report's column alignment

		try:
			if st.checkbox("Want to predict on your own input? It is recommended to look at the dataset and enter realistic values below rather than random ones"):
				user_prediction_data = accept_user_data()
				pred = tree.predict(user_prediction_data)
				st.write("The Predicted Class is: ", le.inverse_transform(pred))  # Inverse transform to get the original label.
		except ValueError:
			# Raised while the input boxes are empty or contain non-numeric text
			st.info("Please enter numeric values in all three fields.")


Since this is a Streamlit demo with a hands-on project, explaining the ML algorithms is out of scope, so I assume you have some experience implementing ML algorithms in Python. Alright, back to the code above. If the user chooses our first model, the decision tree classifier, the decisionTree() function (shown below) is called; it returns the model's accuracy, its classification report, and the trained model. We then display the score and the report on the web app using st.text() and st.write().
Moreover, what if users want to feed their own sample to the model and get a prediction? For that, we add another checkbox to the application. When the user ticks it, the accept_user_data() function (shown below) is called, and the user can enter the three independent features manually. The model then predicts the member type for that input.
Code for accept_user_data() function:

	# Accepting user data for predicting its Member Type
	def accept_user_data():
		duration = st.text_input("Enter the Duration: ")
		start_station = st.text_input("Enter the start station number: ")
		end_station = st.text_input("Enter the end station number: ")
		# st.text_input returns strings, so convert them to numbers for the model.
		# float("") raises ValueError while a box is still empty.
		user_prediction_data = np.array([[float(duration), float(start_station), float(end_station)]])

		return user_prediction_data

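Because st.text_input() always returns strings, and returns an empty string until the user types something, it helps to convert and validate the three values explicitly. The sketch below uses a hypothetical helper, parse_trip_input(), that returns a 1x3 float row ready for predict(), or None when a field is empty or not numeric. It needs only NumPy, so you can test it outside Streamlit.

```python
import numpy as np


def parse_trip_input(duration, start_station, end_station):
    """Hypothetical helper: turn the three text-box strings into a 1x3 float
    feature row, or return None if any field is empty or not a number."""
    try:
        values = [float(v) for v in (duration, start_station, end_station)]
    except (TypeError, ValueError):
        # float("") and float("abc") raise ValueError; float(None) raises TypeError.
        return None
    return np.array(values).reshape(1, -1)


row = parse_trip_input("600", "31208", "31108")
print(row.shape)
print(parse_trip_input("", "31208", "31108"))
```

Inside the app, accept_user_data() could return parse_trip_input(duration, start_station, end_station), and the caller would call predict() only when the result is not None, for example showing st.info() otherwise.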

We repeat the same pattern for the Neural Network and KNN classifiers by calling the functions that build them. Note that so far we have only seen the calls to these functions; their definitions, where the models are actually trained, come next. Have a look at the code for those calls:

	elif choose_model == "Neural Network":
		score, report, clf = neuralNet(X_train, X_test, y_train, y_test)
		st.text("Accuracy of Neural Network model is: ")
		st.write(score, "%")
		st.text("Report of Neural Network model is: ")
		st.text(report)  # st.text keeps the report's column alignment

		try:
			if st.checkbox("Want to predict on your own input? It is recommended to look at the dataset and enter realistic values below rather than random ones"):
				user_prediction_data = accept_user_data()
				# Re-fit a scaler on X_train so the input gets exactly the scaling used in training
				scaler = StandardScaler()
				scaler.fit(X_train)
				user_prediction_data = scaler.transform(user_prediction_data)
				pred = clf.predict(user_prediction_data)
				st.write("The Predicted Class is: ", le.inverse_transform(pred))  # Inverse transform to get the original label.
		except ValueError:
			# Raised while the input boxes are empty or contain non-numeric text
			st.info("Please enter numeric values in all three fields.")

	elif choose_model == "K-Nearest Neighbours":
		score, report, clf = Knn_Classifier(X_train, X_test, y_train, y_test)
		st.text("Accuracy of K-Nearest Neighbour model is: ")
		st.write(score, "%")
		st.text("Report of K-Nearest Neighbour model is: ")
		st.text(report)  # st.text keeps the report's column alignment

		try:
			if st.checkbox("Want to predict on your own input? It is recommended to look at the dataset and enter realistic values below rather than random ones"):
				user_prediction_data = accept_user_data()
				pred = clf.predict(user_prediction_data)
				st.write("The Predicted Class is: ", le.inverse_transform(pred))  # Inverse transform to get the original label.
		except ValueError:
			# Raised while the input boxes are empty or contain non-numeric text
			st.info("Please enter numeric values in all three fields.")


Next, we write the ML models themselves using scikit-learn, a well-known machine learning library for Python.
Decision Tree Classifier is written as follows:

	# Training Decision Tree for Classification
	@st.cache(suppress_st_warning=True)
	def decisionTree(X_train, X_test, y_train, y_test):
		# Train the model
		tree = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0)
		tree.fit(X_train, y_train)
		y_pred = tree.predict(X_test)
		score = metrics.accuracy_score(y_test, y_pred) * 100
		report = classification_report(y_test, y_pred)

		return score, report, tree

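decisionTree() reports the accuracy and a classification report, while confusion_matrix, imported at the top, is never used. If you also want to see which member types get confused with each other, a sketch like the following builds a labeled confusion matrix. The true and predicted labels here are made up, and wrapping the matrix in a DataFrame indexed by le.classes_ is just one possible presentation; in the app you could pass cm_df to st.write().

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Hypothetical true and predicted member types, encoded as in preprocessing().
le = LabelEncoder().fit(["Casual", "Member", "Unknown"])
y_test = le.transform(["Member", "Member", "Casual", "Unknown", "Casual", "Member"])
y_pred = le.transform(["Member", "Casual", "Casual", "Member", "Casual", "Member"])

# Accuracy as a percentage, computed the same way as in decisionTree().
score = accuracy_score(y_test, y_pred) * 100

# Rows are true classes, columns are predicted classes, both in le.classes_ order.
cm = confusion_matrix(y_test, y_pred, labels=np.arange(len(le.classes_)))
cm_df = pd.DataFrame(cm, index=le.classes_, columns=le.classes_)
print(score)
print(cm_df)
```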

The Neural Network is written in a similar way using scikit-learn's MLPClassifier. Here, we scale the features before training, because a multilayer perceptron is sensitive to the scale of its inputs, and duration and station numbers span very different ranges. Also notice that the training functions are decorated with @st.cache, as discussed above, so each model is trained once and reused on later reruns instead of being retrained every time the user interacts with the app.
See the Neural Network implementation below:

	# Training Neural Network for Classification.
	@st.cache(suppress_st_warning=True)
	def neuralNet(X_train, X_test, y_train, y_test):
		# Scale the data before feeding it to the Neural Network.
		scaler = StandardScaler()
		scaler.fit(X_train)
		X_train = scaler.transform(X_train)
		X_test = scaler.transform(X_test)
		# Instantiate the classifier and fit the model.
		clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
		clf.fit(X_train, y_train)
		y_pred = clf.predict(X_test)
		score1 = metrics.accuracy_score(y_test, y_pred) * 100
		report = classification_report(y_test, y_pred)

		return score1, report, clf

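In the prediction branch for the Neural Network, a new StandardScaler is fitted on X_train every time a user asks for a prediction. An alternative, sketched below, is scikit-learn's Pipeline (built here with make_pipeline), which stores the fitted scaler together with the classifier so that predict() applies the training-set scaling automatically. The training data and target are synthetic stand-ins for X_train and y_train; the MLPClassifier settings are the ones used in neuralNet().

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up training data: three numeric features on very different scales.
rng = np.random.default_rng(0)
X_train = np.column_stack([
    rng.integers(60, 3600, size=200),      # duration-like values
    rng.integers(31000, 31300, size=200),  # station-number-like values
    rng.integers(31000, 31300, size=200),
]).astype(float)
y_train = (X_train[:, 0] > 1800).astype(int)  # toy target, not the real Member type

# The scaler is fitted inside the pipeline during fit(), and predict()
# re-applies that same scaling to any new row before the classifier sees it.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver="lbfgs", alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1),
)
model.fit(X_train, y_train)

user_row = np.array([[2400.0, 31208.0, 31108.0]])
print(model.predict(user_row))
```

With this design, the prediction branch would only need model.predict(user_prediction_data), with no separate scaler to keep in sync.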

Similarly, the KNN classifier is written using KNeighborsClassifier from the scikit-learn library:

	# Training KNN Classifier
	@st.cache(suppress_st_warning=True)
	def Knn_Classifier(X_train, X_test, y_train, y_test):
		clf = KNeighborsClassifier(n_neighbors=5)
		clf.fit(X_train, y_train)
		y_pred = clf.predict(X_test)
		score = metrics.accuracy_score(y_test, y_pred) * 100
		report = classification_report(y_test, y_pred)

		return score, report, clf

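The three training functions, and the three branches in main() that call them, differ only in which estimator they build. One way to remove that repetition, sketched below under the assumption that you are willing to restructure the app, is a dictionary mapping each select-box label to a model factory plus a single train_and_evaluate() function (both hypothetical names). The hyperparameters are copied from the functions above; the Neural Network entry uses a scaling pipeline instead of scaling by hand, and the data comes from make_classification() rather than the bike dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical registry: select-box label -> function building an untrained model.
MODEL_FACTORIES = {
    "Decision Tree": lambda: DecisionTreeClassifier(max_leaf_nodes=3, random_state=0),
    "Neural Network": lambda: make_pipeline(
        StandardScaler(),
        MLPClassifier(solver="lbfgs", alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1),
    ),
    "K-Nearest Neighbours": lambda: KNeighborsClassifier(n_neighbors=5),
}


def train_and_evaluate(name, X_train, X_test, y_train, y_test):
    """Fit the chosen model and return (accuracy in %, text report, fitted model)."""
    model = MODEL_FACTORIES[name]()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    score = accuracy_score(y_test, y_pred) * 100
    report = classification_report(y_test, y_pred)
    return score, report, model


# Synthetic stand-in data with three features and three classes.
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

for name in MODEL_FACTORIES:
    score, report, model = train_and_evaluate(name, X_train, X_test, y_train, y_test)
    print(f"{name}: {score:.1f}%")
```

In main(), the three branches could then collapse into a single check such as if choose_model in MODEL_FACTORIES: followed by one call to train_and_evaluate().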

We all know the importance of data visualization in Data Science and Machine Learning.
Finally, let us explore how we can show beautiful, interactive visualizations on our web app using Streamlit.
We will build two visualizations from this data for demo purposes. The first shows a few start locations of the vehicles from the dataset on a map. st.map() cannot plot addresses directly; it needs latitude and longitude columns in the dataset. So I used the Geopy library to geocode 1,000 addresses from the start location column into latitudes and longitudes. If you are interested in how I geocoded them, please visit my GitHub repository; otherwise, you can download the datasets with and without coordinates from the same repository.
Given those columns, st.map() automatically plots the data points on an interactive world map. See the example below:

	# Load the data for visualizing vehicles starting from various start locations on the world map.
	@st.cache
	def showMap():
		plotData = pd.read_csv("Trip history with locations.csv")
		Data = pd.DataFrame()
		Data['lat'] = plotData['lat']
		Data['lon'] = plotData['lon']
		return Data

	# Visualization Section
	plotData = showMap()
	st.subheader("Bike Travel History data plotted: first few locations near Washington DC")
	st.map(plotData, zoom=14)

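st.map() reads the coordinates from columns named lat and lon, which is why showMap() builds a DataFrame with exactly those two columns. Since some addresses may fail to geocode, a compact variant of showMap() can select the two columns and drop rows without coordinates in one step. The rows below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical geocoded rows; the second address failed to geocode.
plotData = pd.DataFrame({
    "Start station": ["Station A", "Station B", "Station C"],
    "lat": [38.8977, np.nan, 38.9072],
    "lon": [-77.0365, np.nan, -77.0369],
})

# Keep only the two coordinate columns and drop rows with missing values.
map_data = plotData[["lat", "lon"]].dropna()
print(map_data)
```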

For the last visualization, we will plot histograms showing how often each value of a specific feature occurs. For example, our dataset has three member types, so a histogram drawn with the plotly library shows roughly how many trips belong to each type. We will plot such counts for three features, namely Start station, End station, and Member type, and let the user choose which visualization to view using Streamlit's select box.
See the code below:

	choose_viz = st.sidebar.selectbox("Choose the Visualization",
		["NONE", "Total number of vehicles from various Starting Points",
		 "Total number of vehicles from various End Points",
		 "Count of each Member Type"])

	if choose_viz == "Total number of vehicles from various Starting Points":
		fig = px.histogram(data, x='Start station')
		st.plotly_chart(fig)
	elif choose_viz == "Total number of vehicles from various End Points":
		fig = px.histogram(data, x='End station')
		st.plotly_chart(fig)
	elif choose_viz == "Count of each Member Type":
		fig = px.histogram(data, x='Member type')
		st.plotly_chart(fig)

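A histogram over a categorical column such as Member type simply counts how many rows fall into each category (plotly's default histfunc is "count"). If you want the exact numbers rather than reading bar heights, pandas value_counts() computes the same counts. The tiny sample below is made up; in the app, you could display the result with st.write().

```python
import pandas as pd

# Hypothetical sample of the 'Member type' column.
data = pd.DataFrame({"Member type": ["Member", "Casual", "Member", "Member", "Unknown", "Casual"]})

# One count per category, sorted from most to least frequent.
counts = data["Member type"].value_counts()
print(counts)
```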

Here are a few screenshots of the results from the Streamlit web application:

That's a long tutorial, I know! But the exciting part is that you have learned how to deploy a machine learning web application without learning Flask or Django. Streamlit is also easy to use: in this project we made just 33 Streamlit calls, and many of them are the same kind of call. I hope you liked the article and found it helpful; if so, share it with your friends and colleagues.

NOTE: The snippets above are pieces of a single script (several of them belong inside main()), so copying and running them one by one will not work as-is. To save you some time, here is the link to the complete code on my GitHub - CLICK HERE.

There is a lot of cool stuff you can do with Streamlit, so check out the official Streamlit documentation.