Akmal Chaudhri for SingleStore

Posted on Mar 10, 2023 • Edited on Nov 1, 2024

Quick tip: Ibis, Pandas and SingleStoreDB

#singlestoredb #ibis #pandas #python

Abstract

The Ibis Project provides the power of Python Analytics with SQL. In this short article, we'll see how to use Ibis to connect to SingleStoreDB Cloud and run a few commands to demonstrate the integration.

The notebook file used in this article is available on GitHub.

Introduction

In previous articles, we've discussed various options to connect to SingleStoreDB. These have included using, for example:

Another connection option is the Ibis backend being developed by SingleStore Labs. In this short article, we'll test this implementation with some commands.

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use the Free Shared Tier and take the default names for the Workspace and Database.

Notebook

First, we'll create some imports:

import ibis
import pandas as pd

ibis.options.interactive = True

We'll now read the iris.csv file and look at the data:

url = "https://gist.githubusercontent.com/VeryFatBoy/9af771d443f5ec4dd6eec8d69a062638/raw/c03ef25a97f23a48ee408ac02114195b663a2364/iris.csv"

iris_df = pd.read_csv(url)

iris_df.head(5)

The result should be similar to the following:

   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa

Next, let's check the Pandas Dataframe:

iris_df.info()

The result should be as follows:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)

We'll now create a connection to SingleStoreDB:

conn = ibis.singlestoredb.connect(connection_url)

Now we'll create a new table in SingleStoreDB using the Pandas Dataframe and then look at the data:

iris_tbl = conn.create_table("iris", iris_df, force = True)

iris_tbl.head(5)

The result should be similar to the following:

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ sepal_length ┃ sepal_width ┃ petal_length ┃ petal_width ┃ species         ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ float64      │ float64     │ float64      │ float64     │ string          │
├──────────────┼─────────────┼──────────────┼─────────────┼─────────────────┤
│          4.9 │         3.0 │          1.4 │         0.2 │ Iris-setosa     │
│          5.0 │         3.4 │          1.5 │         0.2 │ Iris-setosa     │
│          4.9 │         3.1 │          1.5 │         0.1 │ Iris-setosa     │
│          4.6 │         3.6 │          1.0 │         0.2 │ Iris-setosa     │
│          7.0 │         3.2 │          4.7 │         1.4 │ Iris-versicolor │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────────────┘

We can get some further details:

iris_tbl.info()

The result should be as follows:

┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┓
┃ name         ┃ type    ┃ nullable ┃ nulls ┃ non_nulls ┃ null_frac ┃ pos  ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━┩
│ string       │ string  │ boolean  │ int64 │ int64     │ float64   │ int8 │
├──────────────┼─────────┼──────────┼───────┼───────────┼───────────┼──────┤
│ sepal_length │ float64 │ True     │     0 │       150 │       0.0 │    0 │
│ sepal_width  │ float64 │ True     │     0 │       150 │       0.0 │    1 │
│ petal_length │ float64 │ True     │     0 │       150 │       0.0 │    2 │
│ petal_width  │ float64 │ True     │     0 │       150 │       0.0 │    3 │
│ species      │ string  │ True     │     0 │       150 │       0.0 │    4 │
└──────────────┴─────────┴──────────┴───────┴───────────┴───────────┴──────┘

Basic schema information can also be viewed:

iris_tbl.schema()

The result should be as follows:

ibis.Schema {
  sepal_length  float64
  sepal_width   float64
  petal_length  float64
  petal_width   float64
  species       string
}

We can also find the details of the CREATE TABLE statement:

conn.show.create_table("iris")

The result should be as follows:

|  Name | CreateTable                                                                                                                                                                                                                                                                                                                                                                                         |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  iris | CREATE TABLE `iris` ( `sepal_length` double DEFAULT NULL, `sepal_width` double DEFAULT NULL, `petal_length` double DEFAULT NULL, `petal_width` double DEFAULT NULL, `species` text CHARACTER SET utf8 COLLATE utf8_general_ci, SORT KEY `__UNORDERED` () , SHARD KEY () ) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES' |

We can count the number of different species:

iris_tbl.species.value_counts()

The result should be as follows:

┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species         ┃ species_count ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string          │ int64         │
├─────────────────┼───────────────┤
│ Iris-virginica  │            50 │
│ Iris-setosa     │            50 │
│ Iris-versicolor │            50 │
└─────────────────┴───────────────┘

Filtering is also possible. Here, for example, we want to limit the results by Iris-versicolor where the petal_length is greater than 4.5:

res = iris_tbl[iris_tbl.species.like("Iris-versicolor")][iris_tbl.petal_length > 4.5]

Using the filter we can get a count:

res.count()

The result should be as follows:

Sorting is also possible. This is in ascending order for petal_length:

res.order_by("petal_length")

The result should be as follows:

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ sepal_length ┃ sepal_width ┃ petal_length ┃ petal_width ┃ species         ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ float64      │ float64     │ float64      │ float64     │ string          │
├──────────────┼─────────────┼──────────────┼─────────────┼─────────────────┤
│          6.6 │         2.9 │          4.6 │         1.3 │ Iris-versicolor │
│          6.5 │         2.8 │          4.6 │         1.5 │ Iris-versicolor │
│          6.1 │         3.0 │          4.6 │         1.4 │ Iris-versicolor │
│          6.3 │         3.3 │          4.7 │         1.6 │ Iris-versicolor │
│          6.1 │         2.8 │          4.7 │         1.2 │ Iris-versicolor │
│          7.0 │         3.2 │          4.7 │         1.4 │ Iris-versicolor │
│          6.1 │         2.9 │          4.7 │         1.4 │ Iris-versicolor │
│          6.7 │         3.1 │          4.7 │         1.5 │ Iris-versicolor │
│          5.9 │         3.2 │          4.8 │         1.8 │ Iris-versicolor │
│          6.8 │         2.8 │          4.8 │         1.4 │ Iris-versicolor │
│            … │           … │            … │           … │ …               │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────────────┘

Finally, we can convert the results back into a Pandas Dataframe, for additional processing, as follows:

another_iris_df = res.execute()

Summary

The Ibis Project provides another way to work with SingleStoreDB using Python and Pandas. The SingleStore Labs GitHub repo contains example notebooks that use other popular datasets. Check it out.

DEV Community

Quick tip: Ibis, Pandas and SingleStoreDB

Abstract

Introduction

Create a SingleStoreDB Cloud account

Notebook

Summary

Top comments (0)

Read next

How to Retrieve EC2 Instances Information Using Python and Boto3

Interactive DataFrame Management with Streamlit Fragments 🚀

Python 🐍 and variable types

6 Essential Python Design Patterns for Scalable Software Architecture