Problems with Strings
I am accessing IRIS databases with JDBC (or ODBC) using Python. I want to fetch the data into a pandas dataframe to manipulate the data and create charts from it. I ran into a problem with string handling while using JDBC. This post is to help if anyone else has the same issues. Or, if there is an easier way to solve this, let me know in the comments!
I am using OSX, so I am unsure how unique my problem is. I am using Jupyter Notebooks, although the code would generally be the same if you used any other Python program or framework.
The JDBC problem
When I fetch data from the database the column descriptions and any string data are returned as data type java.lang.String
. If you print
string data data it will look like: "(p,a,i,n,i,n,t,h,e,r,e,a,r)"
instead of the expected "painintherear"
.
This is probably because character strings of data type java.lang.String
are coming through as an iterable or array when fetched using JDBC. This can happen if the Python-Java bridge you're using (e.g., JayDeBeApi, JDBC) is not automatically converting java.lang.String
to a Python str
in a single step.
Python's str
string representation, in contrast, has the whole string as a single unit. When Python retrieves a normal str
(e.g. via ODBC), it doesn't split into individual characters.
The JDBC Solution
To fix this issue, you must ensure that the java.lang.String
is correctly converted into Python's str
type. You can explicitly handle this conversion when processing the fetched data so it is not interpreted as an iterable or list of characters.
There are many ways to do this string manipulation; this is what I did.
import pandas as pd
import pyodbc
import jaydebeapi
import jpype
def my_function(jdbc_used)
# Some other code to create the connection goes here
cursor.execute(query_string)
if jdbc_used:
# Fetch the results, convert java.lang.String in the data to Python str
# (java.lang.String is returned "(p,a,i,n,i,n,t,h,e,r,e,a,r)" Convert to str type "painintherear"
results = []
for row in cursor.fetchall():
converted_row = [str(item) if isinstance(item, jpype.java.lang.String) else item for item in row]
results.append(converted_row)
# Get the column names and ensure they are Python strings
column_names = [str(col[0]) for col in cursor.description]
# Create the dataframe
df = pd.DataFrame.from_records(results, columns=column_names)
# Check the results
print(df.head().to_string())
else:
# I was also testing ODBC
# For very large result sets get results in chunks using cursor.fetchmany(). or fetchall()
results = cursor.fetchall()
# Get the column names
column_names = [column[0] for column in cursor.description]
# Create the dataframe
df = pd.DataFrame.from_records(results, columns=column_names)
# Do stuff with your dataframe
The ODBC problem
When using an ODBC connection, strings are not returned or are NA.
If you're connecting to a database that contains Unicode data (e.g., names in different languages) or if your application needs to store or retrieve non-ASCII characters, you must ensure that the data remains correctly encoded when passed between the database and your Python application.
The ODBC solution
This code ensures that string data is encoded and decoded using UTF-8 when sending and retrieving data to the database. It's especially important when dealing with non-ASCII characters or ensuring compatibility with Unicode data.
def create_connection(connection_string, password):
connection = None
try:
# print(f"Connecting to {connection_string}")
connection = pyodbc.connect(connection_string + ";PWD=" + password)
# Ensure strings are read correctly
connection.setdecoding(pyodbc.SQL_CHAR, encoding="utf8")
connection.setdecoding(pyodbc.SQL_WCHAR, encoding="utf8")
connection.setencoding(encoding="utf8")
except pyodbc.Error as e:
print(f"The error '{e}' occurred")
return connection
connection.setdecoding(pyodbc.SQL_CHAR, encoding="utf8")
Tells pyodbc how to decode character data from the database when fetching SQL_CHAR types (typically, fixed-length character fields).
connection.setdecoding(pyodbc.SQL_WCHAR, encoding="utf8")
Sets the decoding for SQL_WCHAR, wide-character types (i.e., Unicode strings, such as NVARCHAR or NCHAR in SQL Server).
connection.setencoding(encoding="utf8")
Ensures that any strings or character data sent from Python to the database will be encoded using UTF-8,
meaning Python will translate its internal str type (which is Unicode) into UTF-8 bytes when communicating with the database.
Putting it all together
Install JDBC
Install JAVA - use dmg
https://www.oracle.com/middleeast/java/technologies/downloads/#jdk23-mac
Update shell to set default version
$ /usr/libexec/java_home -V
Matching Java Virtual Machines (2):
23 (arm64) "Oracle Corporation" - "Java SE 23" /Library/Java/JavaVirtualMachines/jdk-23.jdk/Contents/Home
1.8.421.09 (arm64) "Oracle Corporation" - "Java" /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home
/Library/Java/JavaVirtualMachines/jdk-23.jdk/Contents/Home
$ echo $SHELL
/opt/homebrew/bin/bash
$ vi ~/.bash_profile
Add JAVA_HOME to your path
export JAVA_HOME=$(/usr/libexec/java_home -v 23)
export PATH=$JAVA_HOME/bin:$PATH
Get the JDBC driver
https://intersystems-community.github.io/iris-driver-distribution/
Put the jar file somewhere... I put it in $HOME
$ ls $HOME/*.jar
/Users/myname/intersystems-jdbc-3.8.4.jar
Sample code
It assumes you have set up ODBC (an example for another day, the dog ate my notes...).
Note: this is a hack of my real code. Note the variable names.
import os
import datetime
from datetime import date, time, datetime, timedelta
import pandas as pd
import pyodbc
import jaydebeapi
import jpype
def jdbc_create_connection(jdbc_url, jdbc_username, jdbc_password):
# Path to JDBC driver
jdbc_driver_path = '/Users/yourname/intersystems-jdbc-3.8.4.jar'
# Ensure JAVA_HOME is set
os.environ['JAVA_HOME']='/Library/Java/JavaVirtualMachines/jdk-23.jdk/Contents/Home'
os.environ['CLASSPATH'] = jdbc_driver_path
# Start the JVM (if not already running)
if not jpype.isJVMStarted():
jpype.startJVM(jpype.getDefaultJVMPath(), classpath=[jdbc_driver_path])
# Connect to the database
connection = None
try:
connection = jaydebeapi.connect("com.intersystems.jdbc.IRISDriver",
jdbc_url,
[jdbc_username, jdbc_password],
jdbc_driver_path)
print("Connection successful")
except Exception as e:
print(f"An error occurred: {e}")
return connection
def odbc_create_connection(connection_string):
connection = None
try:
# print(f"Connecting to {connection_string}")
connection = pyodbc.connect(connection_string)
# Ensure strings are read correctly
connection.setdecoding(pyodbc.SQL_CHAR, encoding="utf8")
connection.setdecoding(pyodbc.SQL_WCHAR, encoding="utf8")
connection.setencoding(encoding="utf8")
except pyodbc.Error as e:
print(f"The error '{e}' occurred")
return connection
# Parameters
odbc_driver = "InterSystems ODBC"
odbc_host = "your_host"
odbc_port = "51773"
odbc_namespace = "your_namespace"
odbc_username = "username"
odbc_password = "password"
jdbc_host = "your_host"
jdbc_port = "51773"
jdbc_namespace = "your_namespace"
jdbc_username = "username"
jdbc_password = "password"
# Create connection and create charts
jdbc_used = True
if jdbc_used:
print("Using JDBC")
jdbc_url = f"jdbc:IRIS://{jdbc_host}:{jdbc_port}/{jdbc_namespace}?useUnicode=true&characterEncoding=UTF-8"
connection = jdbc_create_connection(jdbc_url, jdbc_username, jdbc_password)
else:
print("Using ODBC")
connection_string = f"Driver={odbc_driver};Host={odbc_host};Port={odbc_port};Database={odbc_namespace};UID={odbc_username};PWD={odbc_password}"
connection = odbc_create_connection(connection_string)
if connection is None:
print("Unable to connect to IRIS")
exit()
cursor = connection.cursor()
site = "SAMPLE"
table_name = "your.TableNAME"
desired_columns = [
"RunDate",
"ActiveUsersCount",
"EpisodeCountEmergency",
"EpisodeCountInpatient",
"EpisodeCountOutpatient",
"EpisodeCountTotal",
"AppointmentCount",
"PrintCountTotal",
"site",
]
# Construct the column selection part of the query
column_selection = ", ".join(desired_columns)
query_string = f"SELECT {column_selection} FROM {table_name} WHERE Site = '{site}'"
print(query_string)
cursor.execute(query_string)
if jdbc_used:
# Fetch the results
results = []
for row in cursor.fetchall():
converted_row = [str(item) if isinstance(item, jpype.java.lang.String) else item for item in row]
results.append(converted_row)
# Get the column names and ensure they are Python strings (java.lang.String is returned "(p,a,i,n,i,n,t,h,e,a,r,s,e)"
column_names = [str(col[0]) for col in cursor.description]
# Create the dataframe
df = pd.DataFrame.from_records(results, columns=column_names)
print(df.head().to_string())
else:
# For very large result sets get results in chunks using cursor.fetchmany(). or fetchall()
results = cursor.fetchall()
# Get the column names
column_names = [column[0] for column in cursor.description]
# Create the dataframe
df = pd.DataFrame.from_records(results, columns=column_names)
print(df.head().to_string())
# # Build charts for a site
# cf.build_7_day_rolling_average_chart(site, cursor, jdbc_used)
cursor.close()
connection.close()
# Shutdown the JVM (if you started it)
# jpype.shutdownJVM()
Top comments (0)