
Bulk load Pandas DataFrames into SQL databases using Jaydebeapi

Loading Pandas DataFrames into SQL databases of all kinds is a common task for developers building data pipelines or automating ETL jobs in general.
For that, the Pandas DataFrame class has the built-in method pandas.DataFrame.to_sql, which makes the job very quick for SQLite and all the databases supported by the SQLAlchemy library. When it comes to databases that are not well supported by it (in my case, IBM DB2), developers find themselves forced to look for a workaround to get the job done.
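For databases that SQLAlchemy does support, the built-in path really is a one-liner. Here is a minimal sketch, assuming a local SQLite file named example.db and a throwaway table named demo (both names are placeholders for illustration):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///example.db')
df = pd.DataFrame({'col_1': [1, 2, 3], 'col_2': ['a', 'b', 'c']})
# to_sql creates the table if needed and writes all rows in one call
df.to_sql('demo', engine, if_exists='replace', index=False)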
Jaydebeapi presents itself as a good alternative, particularly for developers coming from a Java background who are already familiar with accessing databases through the JDBC API.
Let's start by creating the database connection. For that, I will write a simple function that takes all the required information as parameters and returns a connection to DB2.

import jaydebeapi


def get_conn_to_db(user: str,
                   password: str,
                   host: str,
                   port: str,
                   db_name: str,
                   driver_name: str):

    """ Return a connection to a DB2 database """

    login          = {'user': user, 'password': password}
    # Path to the JDBC driver .jar file for your database
    drivers_path   = ['path_to/driver_file.jar']
    connection_url = 'jdbc:db2://' + host + ':' + port + '/' + db_name
    connection     = jaydebeapi.connect(driver_name, connection_url, login, jars=drivers_path)
    return connection
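Before wiring this into the loader, it can help to sanity-check the connection with a trivial query. A quick sketch, using the db_settings dictionary from the final example below and DB2's built-in SYSIBM.SYSDUMMY1 dummy table:

conn = get_conn_to_db(**db_settings)
cursor = conn.cursor()
cursor.execute('SELECT 1 FROM SYSIBM.SYSDUMMY1')
print(cursor.fetchall())   # expect [(1,)]
cursor.close()
conn.close()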

Then let's move on to the bulk_load function, which will be in charge of loading our Pandas DataFrame into DB2 in chunks.

def bulk_load(df: pd.DataFrame, conn, schema_name: str, table: str, chunksize: int) -> list:

    cursor = conn.cursor()
    sql_exceptions = []
    row_nbr = 0
    df_length = df.shape[0]
    schema_table = f"{schema_name}.{table}"
    # Make sure the column names don't contain any SQL keywords
    cols_names_list = df.columns.values.tolist()
    cols_names = "(" + ",".join(cols_names_list) + ")"

    while row_nbr < df_length:
        # Determine the insert statement boundaries (chunk size)
        beginrow = row_nbr
        endrow = min(row_nbr + chunksize, df_length)

        # Extract the chunk as a list of tuples
        tuples = [tuple(x) for x in df.values[beginrow:endrow]]
        # One "?" placeholder per column
        values_params = "(" + ",".join("?" for _ in cols_names_list) + ")"
        sql = f"INSERT INTO {schema_table} {cols_names} VALUES {values_params}"

        try:
            cursor.executemany(sql, tuples)
            conn.commit()
        except Exception as e:
            sql_exceptions.append((beginrow, endrow, e))

        row_nbr = endrow

    cursor.close()
    conn.close()
    return sql_exceptions
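About the keyword warning in the comment above: if you cannot rename the offending columns, one workaround is to quote the identifiers when building the column list. A small sketch of that idea (note that in DB2 double-quoted identifiers are case-sensitive, so they must match the table definition exactly):

# Hypothetical variant of the column-list line above, wrapping each name
# in double quotes so reserved words such as ORDER or GROUP don't break the INSERT
cols_names = "(" + ",".join(f'"{c}"' for c in cols_names_list) + ")"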

Now let's see how we can apply these functions to our main task.

import numpy as np
import pandas as pd
import jaydebeapi


db_settings = {
    'host': 'host-address',
    'port': '50000',
    'user': 'username',
    'password': 'password',
    'driver_name': 'com.ibm.db2.jcc.DB2Driver',
    'db_name': 'bludb'
}

data = np.random.choice(['foo',3,'bar'],size=(100000,3))
df = pd.DataFrame(data, columns=['random_1', 'random_2', 'random_3'])

with get_conn_to_db(**db_settings) as conn:
    bulk_load(df, conn, 'RANDOM_SCHEMA_NAME', 'RANDOM_TABLE_NAME', 1000)
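One last note: bulk_load returns the list of (beginrow, endrow, exception) tuples for the chunks that could not be inserted, so it is worth capturing that return value instead of discarding it. A small illustrative sketch of how the failed ranges could be reported (the variable names are just for the example):

with get_conn_to_db(**db_settings) as conn:
    failed_chunks = bulk_load(df, conn, 'RANDOM_SCHEMA_NAME', 'RANDOM_TABLE_NAME', 1000)

for beginrow, endrow, error in failed_chunks:
    print(f"Rows {beginrow} to {endrow} were not inserted: {error}")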
