Introduction
In a previous article, I presented the IRIStool module, which integrates the pandas Python library with the InterSystems IRIS database. In this article, I explain how to use IRIStool to make InterSystems IRIS the foundation for semantic search over healthcare data in FHIR format.
This article covers what I did to create the database for another of my projects, the FHIR Data Explorer. Both projects are candidates in the current InterSystems contest, so please vote for them if you find them useful.
You can find both on the InterSystems Open Exchange.
In this article we'll cover:
- Connecting to InterSystems IRIS database through Python
- Creating a FHIR-ready database schema
- Importing FHIR data with vector embeddings for semantic search
Prerequisites
Install IRIStool from the IRIStool and Data Manager GitHub page.
1. IRIS Connection Setup
Start by configuring your connection through environment variables in a .env file:
IRIS_HOST=localhost
IRIS_PORT=9092
IRIS_NAMESPACE=USER
IRIS_USER=_SYSTEM
IRIS_PASSWORD=SYS
Connect to IRIS using IRIStool's context manager:
from utils.iristool import IRIStool
import os
from dotenv import load_dotenv
load_dotenv()
with IRIStool(
    host=os.getenv('IRIS_HOST'),
    port=os.getenv('IRIS_PORT'),
    namespace=os.getenv('IRIS_NAMESPACE'),
    username=os.getenv('IRIS_USER'),
    password=os.getenv('IRIS_PASSWORD')
) as iris:
    # IRIStool manages the connection automatically
    pass
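Values read with os.getenv are always strings, and a missing variable silently becomes None. The following is a minimal sketch of a helper that fails early and converts the port to an integer. The name load_iris_settings is hypothetical, and whether IRIStool requires an integer port is an assumption, not something confirmed by its documentation.

```python
import os

REQUIRED_KEYS = ("IRIS_HOST", "IRIS_PORT", "IRIS_NAMESPACE", "IRIS_USER", "IRIS_PASSWORD")


def load_iris_settings(env=None):
    """Collect IRIS connection settings, failing early on missing values.

    Hypothetical helper: the returned keys mirror the keyword arguments
    used in the IRIStool call above.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "host": env["IRIS_HOST"],
        "port": int(env["IRIS_PORT"]),  # environment values are always strings
        "namespace": env["IRIS_NAMESPACE"],
        "username": env["IRIS_USER"],
        "password": env["IRIS_PASSWORD"],
    }
```

After load_dotenv() has run, the connection could then be opened with IRIStool(**load_iris_settings()), assuming the constructor accepts these keyword arguments as shown above.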
2. Creating the FHIR Schema
First, create a table to store the raw FHIR bundles. Then, while extracting data from those bundles, create a table with vector search capabilities for each extracted FHIR resource type (Patient, Observation, etc.).
IRIStool simplifies table and index creation!
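To make the extraction step concrete, here is a minimal sketch of how the resources in a FHIR bundle could be grouped by type before they are written to their own tables. The function name group_resources and the sample bundle are hypothetical; the only thing taken from the FHIR format is that a Bundle holds its resources under entry[].resource, each with a resourceType field.

```python
import json
from collections import defaultdict


def group_resources(bundle_json):
    """Group the resources of a FHIR bundle by their resourceType."""
    bundle = json.loads(bundle_json)
    grouped = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")
        if resource_type:  # skip entries without a usable resource
            grouped[resource_type].append(resource)
    return dict(grouped)


# Tiny made-up bundle, for illustration only
sample_bundle = json.dumps({
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "patient-123"}},
        {"resource": {"resourceType": "Observation", "id": "obs-1"}},
        {"resource": {"resourceType": "Observation", "id": "obs-2"}},
    ],
})

grouped = group_resources(sample_bundle)
print({resource_type: len(items) for resource_type, items in grouped.items()})
```

Each key of the returned dictionary then maps naturally to one target table (Patient, Observation, and so on).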
FHIR Repository Table
# Create main repository table for raw FHIR bundles
if not iris.table_exists("FHIRrepository", "SQLUser"):
    iris.create_table(
        table_name="FHIRrepository",
        columns={
            "patient_id": "VARCHAR(200)",
            "fhir_bundle": "CLOB"
        }
    )
    iris.quick_create_index(
        table_name="FHIRrepository",
        column_name="patient_id"
    )
Patient Table with Vector Support
# Create Patient table with vector column for semantic search
if not iris.table_exists("Patient", "SQLUser"):
    iris.create_table(
        table_name="Patient",
        columns={
            "patient_row_id": "INT AUTO_INCREMENT PRIMARY KEY",
            "patient_id": "VARCHAR(200)",
            "description": "CLOB",
            "description_vector": "VECTOR(FLOAT, 384)",
            "full_name": "VARCHAR(200)",
            "gender": "VARCHAR(30)",
            "age": "INTEGER",
            "birthdate": "TIMESTAMP"
        }
    )
    # Create standard indexes
    iris.quick_create_index(table_name="Patient", column_name="patient_id")
    iris.quick_create_index(table_name="Patient", column_name="age")
    # Create HNSW vector index for similarity search
    iris.create_hnsw_index(
        index_name="patient_vector_idx",
        table_name="Patient",
        column_name="description_vector",
        distance="Cosine"
    )
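The description_vector column is declared as VECTOR(FLOAT, 384), so every embedding written to it must have exactly 384 components, which matches the output size of the all-MiniLM-L6-v2 model. A small guard like the one below can catch a mismatch before it reaches the database. This is only a sketch: to_vector_literal is a hypothetical helper, and it produces the same bracketed string that str() gives for a Python list.

```python
EMBEDDING_DIM = 384  # must match the VECTOR(FLOAT, 384) column definition


def to_vector_literal(vector, dim=EMBEDDING_DIM):
    """Validate the embedding length and return it as a bracketed string."""
    if len(vector) != dim:
        raise ValueError(f"Expected {dim} components, got {len(vector)}")
    # Same format as str(list_of_floats), e.g. "[0.5, 0.25]"
    return str([float(x) for x in vector])


# Illustration with a tiny 2-dimensional vector
print(to_vector_literal([0.5, 0.25], dim=2))
```

If a different embedding model is swapped in later, both EMBEDDING_DIM and the column definition would need to change together.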
3. Importing FHIR Data with Vectors
Generate vector embeddings from FHIR patient descriptions and insert them into IRIS easily:
from sentence_transformers import SentenceTransformer
# Initialize transformer model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# Example: Process patient data
patient_description = "45-year-old male with hypertension and type 2 diabetes"
patient_id = "patient-123"
# Generate vector embedding
vector = model.encode(patient_description, normalize_embeddings=True).tolist()
# Insert patient data with vector
iris.insert(
    table_name="Patient",
    patient_id=patient_id,
    description=patient_description,
    description_vector=str(vector),
    full_name="John Doe",
    gender="male",
    age=45,
    birthdate="1979-03-15"
)
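In practice, the description, name, and age come from the FHIR Patient resource rather than from hard-coded strings. The sketch below shows one possible way to derive them. The helper names and the wording of the description are hypothetical, and the code assumes birthDate is a full YYYY-MM-DD date (FHIR also allows partial dates, which this sketch does not handle).

```python
from datetime import date


def full_name(patient):
    """Join the given and family names of the first name entry."""
    names = patient.get("name", [])
    if not names:
        return ""
    first = names[0]
    given = " ".join(first.get("given", []))
    return f"{given} {first.get('family', '')}".strip()


def age_on(birthdate_iso, today):
    """Age in whole years on the given day."""
    born = date.fromisoformat(birthdate_iso)
    # Subtract one year if the birthday has not happened yet this year
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def describe_patient(patient, today):
    """Build a short text description suitable for embedding."""
    age = age_on(patient["birthDate"], today)
    return f"{age}-year-old {patient.get('gender', 'unknown')} named {full_name(patient)}"


# Made-up resource, for illustration only
patient_resource = {
    "resourceType": "Patient",
    "id": "patient-123",
    "name": [{"given": ["John"], "family": "Doe"}],
    "gender": "male",
    "birthDate": "1979-03-15",
}

print(describe_patient(patient_resource, today=date(2024, 6, 1)))
```

A richer description could also include conditions or observations from the same bundle, since the embedding can only capture what the text mentions.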
4. Performing Semantic Search
Once your data is loaded, you can perform similarity searches:
# Search query
search_text = "patients with diabetes"
query_vector = model.encode(search_text, normalize_embeddings=True).tolist()
# Define the SQL query (the vector is passed as a parameter)
query = """
SELECT TOP 5
    patient_id,
    full_name,
    description,
    VECTOR_COSINE(description_vector, TO_VECTOR(?)) as similarity
FROM Patient
ORDER BY similarity DESC
"""
# Define query parameters
parameters = [str(query_vector)]
# Find similar patients using vector search
results = iris.query(query, parameters)
# Print one line per row of the returned DataFrame
if not results.empty:
    for _, row in results.iterrows():
        print(f"{row['full_name']}: {row['similarity']:.3f}")
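To see what the similarity score means, here is a small pure-Python sketch of cosine similarity. It also shows why normalize_embeddings=True is convenient: for unit-length vectors, cosine similarity reduces to a plain dot product. The function names and the two-dimensional vectors are only illustrative; real embeddings from the model above have 384 components.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

similarity = cosine_similarity(a, b)
# For unit-length vectors the dot product gives the same value
normalized_dot = dot(normalize(a), normalize(b))
print(round(similarity, 3), round(normalized_dot, 3))
```

Scores close to 1 indicate descriptions pointing in nearly the same direction in embedding space, which is why the query sorts by similarity in descending order.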
Conclusion
- IRIStool simplifies IRIS integration with intuitive Python methods for table and index creation
- IRIS supports hybrid SQL + vector storage natively, enabling both traditional queries and semantic search
- Vector embeddings enable intelligent search across FHIR healthcare data using natural language
- HNSW indexes provide efficient similarity search at scale
This approach demonstrates how InterSystems IRIS can serve as a powerful foundation for building intelligent healthcare applications with semantic search capabilities over FHIR data.