DEV Community

Akmal Chaudhri for SingleStore

Posted on • Updated on

Quick tip: Ibis, Pandas and SingleStoreDB

Abstract

The Ibis Project provides the power of Python Analytics with SQL. In this short article, we'll see how to use Ibis to connect to SingleStoreDB Cloud and run a few commands to demonstrate the integration.

Introduction

In previous articles, we've discussed various options to connect to SingleStoreDB. These have included using, for example:

Another connection option is the Ibis backend being developed by SingleStore Labs. In this short article, we'll test this implementation with some commands.

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use Ibis Demo Group as our Workspace Group Name and ibis-demo as our Workspace Name. We'll make a note of our password and host name.

We'll use the SQL Editor to create a new database, as follows:

CREATE DATABASE IF NOT EXISTS iris_db;
Enter fullscreen mode Exit fullscreen mode

Create a Deepnote account

We'll create a free account on the Deepnote website. Once logged in, we'll create a new Deepnote project to give us a new notebook. We'll also create a new directory called data. We'll use the Iris flower data set and upload the CSV file into the data directory.

Deepnote notebook

First, we'll install the package:

!pip install ibis-singlestoredb
Enter fullscreen mode Exit fullscreen mode

Deepnote will prompt us to add this to the requirements.txt file.

Next, we'll create some imports:

import ibis
import pandas as pd

ibis.options.interactive = True
Enter fullscreen mode Exit fullscreen mode

We'll now read the iris.csv file and look at the data:

iris_df = pd.read_csv("data/iris.csv")

iris_df.head(5)
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
Enter fullscreen mode Exit fullscreen mode

Next, let's check the Pandas Dataframe:

iris_df.info()
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
Enter fullscreen mode Exit fullscreen mode

We'll now create a connection to SingleStoreDB:

conn = ibis.singlestoredb.connect(
    "admin:<password>@<host>:3306/iris_db"
)
Enter fullscreen mode Exit fullscreen mode

We'll replace the <password> and <host> with the values from our SingleStoreDB Cloud account.

Now we'll create a new table in SingleStoreDB using the Pandas Dataframe and then look at the data:

iris_tbl = conn.create_table("iris", iris_df, force = True)

iris_tbl.head(5)
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

   sepal_length  sepal_width  petal_length  petal_width      species
0           4.6          3.1           1.5          0.2  Iris-setosa
1           5.4          3.9           1.3          0.4  Iris-setosa
2           5.1          3.5           1.4          0.3  Iris-setosa
3           4.9          3.1           1.5          0.1  Iris-setosa
4           4.5          2.3           1.3          0.3  Iris-setosa
Enter fullscreen mode Exit fullscreen mode

We can get some further details:

iris_tbl.info()
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

                      Summary of iris
                          150 rows
+--------------+------------------------+---------+---------+
| Name         | Type                   | # Nulls | % Nulls |
+--------------+------------------------+---------+---------+
| sepal_length | Float64(nullable=True) |       0 |    0.00 |
| sepal_width  | Float64(nullable=True) |       0 |    0.00 |
| petal_length | Float64(nullable=True) |       0 |    0.00 |
| petal_width  | Float64(nullable=True) |       0 |    0.00 |
| species      | String(nullable=True)  |       0 |    0.00 |
+--------------+------------------------+---------+---------+
Enter fullscreen mode Exit fullscreen mode

Basic schema information can also be viewed:

iris_tbl.schema()
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

ibis.Schema {
  sepal_length  float64
  sepal_width   float64
  petal_length  float64
  petal_width   float64
  species       string
}
Enter fullscreen mode Exit fullscreen mode

We can also find the details of the CREATE TABLE statement:

conn.show.create_table("iris")
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

|  Name | CreateTable                                                                                                                 |
---------------------------------------------------------------------------------------------------------------------------------------
|  iris | CREATE TABLE `iris` (
          `sepal_length` float DEFAULT NULL,
          `sepal_width` float DEFAULT NULL,
          `petal_length` float DEFAULT NULL,
          `petal_width` float DEFAULT NULL,
          `species` text CHARACTER SET utf8 COLLATE utf8_general_ci,
          KEY `__UNORDERED` () USING CLUSTERED COLUMNSTORE,
          SHARD KEY ()
          ) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES' |
Enter fullscreen mode Exit fullscreen mode

We can count the number of different species:

iris_tbl.species.value_counts()
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

           species  count
0   Iris-virginica     50
1      Iris-setosa     50
2  Iris-versicolor     50
Enter fullscreen mode Exit fullscreen mode

Filtering is also possible. Here, for example, we want to limit the results by Iris-versicolor where the petal_length is greater than 4.5:

res = iris_tbl[iris_tbl.species.like("Iris-versicolor")][iris_tbl.petal_length > 4.5]
Enter fullscreen mode Exit fullscreen mode

Using the filter we can get a count:

res.count()
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

14
Enter fullscreen mode Exit fullscreen mode

Sorting is also possible. This is in ascending order for petal_length:

res.sort_by("petal_length")
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

    sepal_length  sepal_width  petal_length  petal_width          species
0            6.6          2.9           4.6          1.3  Iris-versicolor
1            6.1          3.0           4.6          1.4  Iris-versicolor
2            6.5          2.8           4.6          1.5  Iris-versicolor
3            6.1          2.8           4.7          1.2  Iris-versicolor
4            6.7          3.1           4.7          1.5  Iris-versicolor
5            7.0          3.2           4.7          1.4  Iris-versicolor
6            6.3          3.3           4.7          1.6  Iris-versicolor
7            6.1          2.9           4.7          1.4  Iris-versicolor
8            5.9          3.2           4.8          1.8  Iris-versicolor
9            6.8          2.8           4.8          1.4  Iris-versicolor
10           6.9          3.1           4.9          1.5  Iris-versicolor
11           6.3          2.5           4.9          1.5  Iris-versicolor
12           6.7          3.0           5.0          1.7  Iris-versicolor
13           6.0          2.7           5.1          1.6  Iris-versicolor
Enter fullscreen mode Exit fullscreen mode

Finally, we can convert the results back into a Pandas Dataframe, for additional processing, as follows:

another_iris_df = res.execute()
Enter fullscreen mode Exit fullscreen mode

Summary

The Ibis Project provides another way to work with SingleStoreDB using Python and Pandas. The SingleStore Labs GitHub repo contains example notebooks that use other popular datasets. Check it out.

Top comments (0)