Abstract
The Python Client enables developers to get up and running with SingleStoreDB quickly. In this short article, we'll discuss how to configure and use the Python Client. We'll use a local Jupyter installation as our development environment.
The notebook file used in this article is available on GitHub.
Introduction
Python is one of the most popular programming languages today. It is also widely used by Data Scientists. We'll look at several small code examples of how we can use Python with SingleStoreDB and some of the data analyses we might be interested in performing.
Create a SingleStoreDB Cloud account
A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use Iris Demo Group as our Workspace Group Name and iris-demo as our Workspace Name. We'll make a note of our password and host name. Finally, we'll create a new database using the SQL Editor:
CREATE DATABASE IF NOT EXISTS iris_demo;
Notebook
Let's now start to fill out our notebook.
First, we'll need to install the following:
!pip install matplotlib pandas plotly scikit-learn seaborn singlestoredb --quiet --no-warn-script-location
Next, we'll import some libraries:
import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
import singlestoredb as s2
import seaborn as sns
from sklearn.decomposition import PCA
We'll now create a connection to SingleStoreDB:
conn = s2.connect(
host = "<host>",
port = "3306",
user = "admin",
password = "<password>",
database = "iris_demo",
results_type = "tuples"
)
We'll replace the <host>
and <password>
with the values from our SingleStoreDB Cloud account. The results_type
provides several options and, in this example, we'll work with tuples.
Let's now read our Iris CSV data into a Pandas Dataframe, as follows:
url = "https://gist.githubusercontent.com/VeryFatBoy/9af771d443f5ec4dd6eec8d69a062638/raw/c03ef25a97f23a48ee408ac02114195b663a2364/iris.csv"
df = pd.read_csv(url)
and we can view the data, as follows:
df.head(5)
The output should be similar to the following:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
Let's now prepare the SQL statement to insert the data into SingleStoreDB:
stmt = """
INSERT INTO iris (
sepal_length,
sepal_width,
petal_length,
petal_width,
species
) VALUES (%s, %s, %s, %s, %s)
"""
Next, we'll create the table, insert the data from the Pandas Dataframe into SingleStoreDB and then retrieve the data back from SingleStoreDB:
with conn:
conn.autocommit(True)
with conn.cursor() as cur:
cur.execute("DROP TABLE IF EXISTS iris")
cur.execute("""
CREATE TABLE IF NOT EXISTS iris (
sepal_length FLOAT,
sepal_width FLOAT,
petal_length FLOAT,
petal_width FLOAT,
species VARCHAR(20)
)
""")
cur.executemany(stmt, df)
cur.execute("SELECT * FROM iris")
rows = cur.fetchall()
Next, we'll convert the returned data into a new Pandas Dataframe:
iris_df = pd.DataFrame(rows, columns = [
"sepal_length",
"sepal_width",
"petal_length",
"petal_width",
"species"
])
We can now perform some data analysis, such as PCA:
# https://plotly.com/python/pca-visualization/
X = iris_df[[
"sepal_length",
"sepal_width",
"petal_length",
"petal_width"
]]
pca = PCA(n_components = 2)
components = pca.fit_transform(X)
pca_fig = px.scatter(
components,
x = 0,
y = 1,
color = iris_df["species"]
)
pca_fig.show(renderer = "iframe")
The output should be similar to Figure 1.
We can also check for correlations:
sns.heatmap(
iris_df.drop(columns = ["species"]).corr(),
cmap = "OrRd",
annot = True
)
plt.title("Correlations")
plt.plot()
The output should be similar to Figure 2.
And so on.
Summary
The Python Client provides a wide range of options and, in this article, we have used tuples as they are a convenient way to work with bulk data. The documentation provides examples of how to use the other options.
Top comments (0)