Case description
Let’s imagine that you are a Python developer, or have a well-trained team specialized in Python, but the deadline for analyzing some data in IRIS is tight. Of course, InterSystems offers many tools for all kinds of analyses and treatments. However, in this scenario, it is better to get the job done using the good old Pandas and leave the IRIS-native tools for another time.
In the situation above, and in many other cases, you might want to fetch tables from IRIS to manage the data outside InterSystems’ products. However, you may also need to go the other way around, when you have an external table in some format, such as CSV, TXT, or Pickle, that you need to import so you can use IRIS tools on it.
Regardless of whether you have to deal with an issue described above or not, Innovatium taught me that knowing more ways to solve a coding problem can always come in handy. The good news is that you do not need to go through the tedious process of creating a new table, transferring all the rows, and adjusting every type when bringing a table from IRIS.
This article will show you how to quickly transform an IRIS table into a Pandas data frame and back again with just a few lines of code. You can check out the code on my GitHub, where you can find a Jupyter Notebook with every step of this tutorial.
Bringing a table from IRIS
Of course, you should start by importing the libraries required for this project.
import pandas as pd
import sqlalchemy as db
Your next step will be to create the connection between the Python file and the IRIS instance. To do that, we will use SQLAlchemy’s function create_engine(), with a string as an argument. This string should contain information about the dialect of the operation, the username and password, the host and port for the instance, and the destination namespace. For more information on the basic concepts of using sqlalchemy-iris, examine one of my previous articles, SQLAlchemy - the easiest way to use Python and SQL with IRIS's databases.
engine = db.create_engine("iris://_system:SYS@localhost:1972/SAMPLE")
connection = engine.connect()
Then, we can declare the variable that will hold the data frame and call Pandas’ read_sql_table() function on this connection, specifying the table name as a string. You can pass the schema in a separate argument, which is preferable to embedding it in the name string, since a dot in the table name can cause errors in some cases.
df = pd.read_sql_table("NameAge", connection, schema="PD")
It is good practice to double-check that the table we are working with exists in the schema we want to use and, of course, that the schema exists in the first place. The last section of this article shows how to do that, along with some more tips. From here on, if you have a way with Pandas, you can perform any changes and analyses you want. Explore the following example to see how it works.
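A minimal sketch of that existence check, using SQLAlchemy’s inspection API. It runs against an in-memory SQLite engine so the snippet works anywhere; with IRIS you would reuse the engine created above and pass `schema="PD"` (the table and column names here are just illustrative):

```python
import pandas as pd
import sqlalchemy as db

# Stand-in engine; with IRIS: db.create_engine("iris://_system:SYS@localhost:1972/SAMPLE")
engine = db.create_engine("sqlite://")
pd.DataFrame({"Name": ["Anna"], "Age": [30]}).to_sql("NameAge", engine, index=False)

# Check that the table exists before trying to read it
inspection = db.inspect(engine)
if inspection.has_table("NameAge"):  # with IRIS: has_table("NameAge", schema="PD")
    with engine.connect() as connection:
        df = pd.read_sql_table("NameAge", connection)
```

Guarding the read this way avoids an exception when the table has not been created yet.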
Sending a table to IRIS
Before we start, let's change something in our data frame just as an example. We can adapt the values of a column to suit our needs, add rows and columns, and so on. After playing around a bit, I have put the names in lowercase and added a new person and a column based on the existing data. You can review the following illustration to see the result.
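Those edits might look like the following sketch. The "Name" and "Age" columns and the values here are assumptions for illustration, not the actual contents of the PD.NameAge table:

```python
import pandas as pd

# A stand-in for the data frame read from IRIS
df = pd.DataFrame({"Name": ["Anna", "Bruno"], "Age": [30, 42]})

df["Name"] = df["Name"].str.lower()   # put the names in lowercase
df.loc[len(df)] = ["carla", 25]       # add a new person as a new row
df["BirthYear"] = 2023 - df["Age"]    # add a column derived from existing data
```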
Now we can send it back to IRIS in a single line of code. All we need is to specify the engine and the table name.
df.to_sql("NameAge", con=engine, schema="PD", if_exists="replace")
Once again, we need to put the schema in an argument separate from the table name to avoid errors and undesirable behavior. In addition, the if_exists argument specifies what to do if a table with the same name already exists in the given schema. The possible values are replace, fail (the default), and append. The replace option drops the table and creates a new one with an SQL command, whereas append adds the data to the existing table. Remember that this method won’t check for repeated values, so be careful when using this argument. Finally, the fail value raises a ValueError when the table already exists.
Keep in mind that if you specify a table name that doesn’t exist, the function will create it.
Now you can query IRIS to see what’s new, or go to the SQL section of the Management Portal. Remember that if you used the replace value, you should review the class’s source code, because the method rewrote it completely. This means that if you had implemented any methods on the class, you should keep them in a superclass.
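One quick way to verify the upload from Python is to read the table straight back. Demonstrated here on an in-memory SQLite engine so the snippet runs anywhere; against IRIS you would reuse the engine from earlier and pass `schema="PD"`:

```python
import pandas as pd
import sqlalchemy as db

engine = db.create_engine("sqlite://")  # stand-in for the IRIS engine
df = pd.DataFrame({"Name": ["anna"], "Age": [30]})
df.to_sql("NameAge", con=engine, if_exists="replace", index=False)

# Read the table back to confirm the round trip
with engine.connect() as connection:
    check = pd.read_sql_table("NameAge", connection)
```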
More tips on sqlalchemy-iris
If you run into problems that the information above, or other communities and forums related to your application’s code, could not solve, you might find the help you need in this section. Here you will discover a list of tips on how to find details about the engine and the dialect.
Dialect-specific features
SQLAlchemy works with dialects that are chosen automatically based on your engine. When you use create_engine() to connect to an IRIS database, the dialect that gets selected is sqlalchemy-iris, by Dmitry Maslennikov.
You can access and edit its features with the dialect property of your engine.
engine = db.create_engine("iris://_system:SYS@localhost:1972/SAMPLE")
engine.dialect
With the IntelliCode extension for VS Code, you can browse every option of this property, or check the source code on CaretDev's GitHub.
Checking available schemas in an engine
A dialect function worth highlighting is get_schema_names(). It can be crucial for avoiding errors in your code and for iterating over schemas.
connection = engine.connect()
engine.dialect.get_schema_names(connection)
Checking available tables in a schema
Let’s look at a similar situation. You might also need to know the available tables in a schema. In this case, you can use inspection: run inspect() on the engine and save the result in a variable, then use that variable to call get_table_names(). It returns a list of the table names in the specified schema, or in the default “SQLUser” schema if none is given.
inspection = db.inspect(engine)
inspection.get_table_names(schema="Sample")
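A hedged sketch of putting that list to work: loading every table in a schema into a dictionary of data frames. It is shown with an in-memory SQLite engine (which uses the default schema) so it runs anywhere; with IRIS you would pass `schema="Sample"` to both calls, and the table names here are made up:

```python
import pandas as pd
import sqlalchemy as db

engine = db.create_engine("sqlite://")  # stand-in for the IRIS engine
pd.DataFrame({"a": [1]}).to_sql("t1", engine, index=False)
pd.DataFrame({"b": [2]}).to_sql("t2", engine, index=False)

# Iterate over every table name and load each one as a data frame
with engine.connect() as connection:
    frames = {
        name: pd.read_sql_table(name, connection)
        for name in db.inspect(engine).get_table_names()
    }
```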
Additionally, if you want to use more of SQL Alchemy’s features on your data, you can declare a base and have its metadata reflect a schema from the engine.
b = db.orm.declarative_base()
b.metadata.reflect(engine, schema="Sample")
If you need more information to solve a problem, check the SQLAlchemy documentation and the sqlalchemy-iris GitHub repository. Alternatively, you can message me or leave a comment, and we will try to uncover the secret together.
Final considerations
The approach in this article emphasizes using IRIS instances as cloud data providers, making it possible to carry out analyses across different databases, monitor all of them simultaneously, and compare their performance and usage. If you combine this knowledge with the development described in another article about a portal made with Django, you can quickly build a powerful manager for as many features and instances as you need.
This implementation is also an efficient way to move data from outside IRIS into a well-built class. Since you might be familiar with the other Pandas functions for dealing with many different formats, such as CSV, JSON, HTML, Excel, and Pickle, it will be easy for you to change read_sql_table to read_csv, read_json, or any other option. I should warn you that integration with InterSystems is not a built-in feature for some formats and thus may not be very easy. However, the union of SQLAlchemy and Pandas will always come in handy for exporting data from IRIS.
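That swap can be sketched in a few lines: read an external format with Pandas, then hand it to to_sql exactly as before. The StringIO here stands in for a real CSV file path, and the in-memory SQLite engine stands in for the IRIS engine from earlier (where you would also pass `schema="PD"`):

```python
import io

import pandas as pd
import sqlalchemy as db

# Stand-in for a CSV file on disk; with a real file: pd.read_csv("people.csv")
csv_data = io.StringIO("Name,Age\nanna,30\nbruno,42\n")
df = pd.read_csv(csv_data)

# The write side is unchanged: the same to_sql call used for IRIS
engine = db.create_engine("sqlite://")
df.to_sql("NameAge", con=engine, if_exists="replace", index=False)
```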
Therefore, in this article, we have learned that IRIS has all the tools you need for development and integrates easily with the systems and tools your team already knows.