<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: InterSystems Developer</title>
    <description>The latest articles on DEV Community by InterSystems Developer (@intersystemsdev).</description>
    <link>https://dev.to/intersystemsdev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F405886%2F59cd7fa8-de28-47cf-acb0-c800febc5986.png</url>
      <title>DEV Community: InterSystems Developer</title>
      <link>https://dev.to/intersystemsdev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/intersystemsdev"/>
    <language>en</language>
    <item>
      <title>Fast Automatic ML Hyperparameter tuning Using Optuna (w. MLflow model registry and IRIS DB)</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Tue, 16 Jun 2026 15:37:19 +0000</pubDate>
      <link>https://dev.to/intersystems/fast-automatic-ml-hyperparameter-tuning-using-optuna-w-mlflow-model-registry-and-iris-db-5aoc</link>
      <guid>https://dev.to/intersystems/fast-automatic-ml-hyperparameter-tuning-using-optuna-w-mlflow-model-registry-and-iris-db-5aoc</guid>
      <description>&lt;p&gt;This article presents a straightforward approach to automatically and efficiently tune hyperparameters for machine learning models using Optuna as the optimisation framework. We explore how to use both Optuna’s native storage options and InterSystems IRIS as a database backend to track the progress of hyperparameter searches. We also show how MLflow can be used to monitor experiments and manage models through its tracking and model registry UI.&lt;/p&gt;

&lt;p&gt;This article is based on this &lt;a href="https://www.kaggle.com/code/jorgeivnjh/fast-automatic-ml-hyperparameter-tuning-w-optuna" rel="noopener noreferrer"&gt;Kaggle Notebook&lt;/a&gt;, which you can run and directly edit yourself.&lt;/p&gt;

&lt;p&gt;When training ML models, the choice of hyperparameters can strongly influence performance. They are not the only factor, but they can significantly affect both convergence and generalisation.&lt;/p&gt;

&lt;p&gt;Tuning hyperparameters manually takes a lot of effort, largely because hyperparameters interact with each other, so tuning them one at a time is usually not enough. For example, stronger regularisation may require a lower learning rate to keep optimisation stable. A more complex model may need stronger regularisation to avoid overfitting, yet a very small learning rate on a complex model can make learning too slow.&lt;/p&gt;

&lt;p&gt;Optuna is an open source library, released under the MIT license (which permits commercial use), that automates hyperparameter search for ML models built with popular frameworks such as scikit-learn, PyTorch, TensorFlow, and LightGBM. You define a search space and an objective metric to either minimize or maximize, and Optuna then explores the search space efficiently to find well-performing configurations.&lt;/p&gt;

&lt;p&gt;Here we use Optuna to tune a LightGBM model on a dummy dataset and show how to scale the search using shared database storage. We will also use MLflow for experiment tracking and model registry, and IRIS DB as a possible Optuna storage backend for concurrent studies.&lt;/p&gt;

&lt;p&gt;We will use the California Housing dataset, commonly used in ML examples, to populate IRIS tables and run the tuning workflow.&lt;/p&gt;

&lt;p&gt;Note: For the IRIS part of this article, you will need an existing IRIS instance that you can connect to. I am using the one created with Docker by running the docker-compose file from this &lt;a href="https://github.com/JorgeIvanJH/IRIS_and_MLflow-Continuous-Training-Pipeline" rel="noopener noreferrer"&gt;repo&lt;/a&gt;. I am also using the environment variables and requirements.txt from that repository, together with Python 3.12.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cross_val_score&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KFold&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;


&lt;span class="n"&gt;dotenv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Connection String to Existing IRIS Database
&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_SERVER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# e.g. 1972, the default InterSystems superserver port
&lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_NAMESPACE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_USERNAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_PASSWORD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pandas version: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sklearn version: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sklearn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlalchemy version: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optuna version: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lightgbm version: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seaborn version: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matplotlib version: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;matplotlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pandas version: 2.3.3
sklearn version: 1.8.0
sqlalchemy version: 2.0.46
optuna version: 4.8.0
lightgbm version: 4.6.0
seaborn version: 0.13.2
matplotlib version: 3.10.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Quick Intro to Optuna
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://optuna.org/" rel="noopener noreferrer"&gt;Optuna&lt;/a&gt; is a hyperparameter optimization framework that speeds up tuning by training multiple model configurations and learning from their results. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Efficient sampling strategies, such as TPE, to focus on promising regions of the search space&lt;/li&gt;
&lt;li&gt;Pruning strategies to stop unpromising trials early&lt;/li&gt;
&lt;li&gt;Support for distributed optimization through shared storage&lt;/li&gt;
&lt;li&gt;Visualization tools to understand the search space and parameter importance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a more detailed introduction to Optuna, see this &lt;a href="https://www.youtube.com/watch?v=P6NwZVl8ttc" rel="noopener noreferrer"&gt;video&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Optuna to Avoid Endless Hyperparameter Tuning
&lt;/h3&gt;

&lt;p&gt;A practical approach to efficiently find good hyperparameters is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run an initial broad search to identify reasonable ranges and baseline parameters. In a continuous training (CT) pipeline, this would usually happen during the experimentation phase.&lt;/li&gt;
&lt;li&gt;Run a more focused Optuna search over the most promising ranges. In a CT pipeline, this can be repeated when there is data drift, model degradation, or a significant change in the dataset.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Important: hyperparameter tuning must use an appropriate validation setup. Otherwise, the search may simply find the configuration that best overfits the validation split, rather than one that generalizes well to unseen data.&lt;/p&gt;
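
&lt;p&gt;One common way to guard against this, sketched here under stated assumptions, is to set aside a test split that the search never sees and to compare configurations with cross-validation on the remaining data only. The synthetic data from make_regression, the Ridge model, and the candidate_alphas list are stand-ins chosen to keep the example small. They are not the dataset, model, or search space used in this article.&lt;/p&gt;

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in data, used only to keep the sketch self-contained.
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# 1) Set aside a test split that the hyperparameter search never sees.
X_tune, X_test, y_tune, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# 2) Compare candidate configurations with cross-validation on the tuning split only.
cv = KFold(n_splits=3, shuffle=True, random_state=0)
candidate_alphas = [0.01, 1.0, 100.0]  # hypothetical candidate values
cv_mse = {}
for alpha in candidate_alphas:
    scores = cross_val_score(
        Ridge(alpha=alpha), X_tune, y_tune, cv=cv, scoring="neg_mean_squared_error"
    )
    cv_mse[alpha] = -scores.mean()  # flip the sign back to a plain MSE

best_alpha = min(cv_mse, key=cv_mse.get)

# 3) Refit on the tuning split and report once on the untouched test split.
final_model = Ridge(alpha=best_alpha).fit(X_tune, y_tune)
test_mse = mean_squared_error(y_test, final_model.predict(X_test))
print("Best alpha:", best_alpha)
print("CV MSE:", cv_mse[best_alpha], "Test MSE:", test_mse)
```

&lt;p&gt;If the test error is much worse than the cross-validated error, that is a hint that the search has fitted the validation folds too closely.&lt;/p&gt;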

&lt;h2&gt;
  
  
  Loading Dataset
&lt;/h2&gt;

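&lt;p&gt;The cell below loads the California Housing dataset with scikit-learn's fetch_california_housing, replaces any spaces in the column names with underscores, renames the target to median_house_value, and combines features and target into a single DataFrame.&lt;br&gt;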
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Load California Housing Dataset
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch_california_housing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;as_frame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;median_house_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Model Definition and Training
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choosing the right K-fold Split
&lt;/h3&gt;

&lt;p&gt;It is essential to choose the right cross-validation strategy. This depends on the task, whether it is regression or classification, whether the target is imbalanced, whether the order of samples matters, and whether there are groups in the data. For example, if multiple rows belong to the same patient, we may want to avoid having samples from the same patient appear in both training and validation splits.&lt;/p&gt;

&lt;p&gt;Refer to this &lt;a href="https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html#sphx-glr-auto-examples-model-selection-plot-cv-indices-py" rel="noopener noreferrer"&gt;summary&lt;/a&gt; of the options available in scikit-learn for further guidance.&lt;/p&gt;

&lt;p&gt;For simplicity, we can use the following decision rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time_order_matters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;TimeSeriesSplit&lt;/span&gt;   &lt;span class="c1"&gt;# no shuffle equivalent
&lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;groups_exist&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;classes_are_imbalanced&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;StratifiedGroupKFold&lt;/span&gt;   &lt;span class="c1"&gt;# (no shuffle equivalent)
&lt;/span&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;GroupKFold&lt;/span&gt;             &lt;span class="c1"&gt;# → or GroupShuffleSplit
&lt;/span&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;classes_are_imbalanced&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;StratifiedKFold&lt;/span&gt;        &lt;span class="c1"&gt;# → or StratifiedShuffleSplit
&lt;/span&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;KFold&lt;/span&gt;                  &lt;span class="c1"&gt;# → or ShuffleSplit
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
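
&lt;p&gt;The decision rules above are pseudocode. A runnable version could look like the sketch below, where the helper choose_cv_splitter_name and its four boolean arguments are hypothetical names introduced for illustration. It returns the name of the matching class in sklearn.model_selection rather than an instance, so the snippet itself has no dependencies.&lt;/p&gt;

```python
def choose_cv_splitter_name(time_order_matters, groups_exist,
                            classification, classes_are_imbalanced):
    """Return the name of a sklearn.model_selection splitter for the given data traits."""
    stratify = classification and classes_are_imbalanced
    if time_order_matters:
        return "TimeSeriesSplit"
    if groups_exist:
        return "StratifiedGroupKFold" if stratify else "GroupKFold"
    return "StratifiedKFold" if stratify else "KFold"


# The California Housing task is a regression with no groups and no time ordering.
print(choose_cv_splitter_name(False, False, False, False))  # prints KFold
```

&lt;p&gt;The returned name can then be turned into a splitter with getattr(sklearn.model_selection, name)(n_splits=3), since all five classes accept n_splits.&lt;/p&gt;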





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;crossvalstrategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KFold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Hyperparameter Search with Optuna
&lt;/h3&gt;

&lt;p&gt;After choosing the model, in this case LightGBM, we define the hyperparameters that we want to tune and the metric that we want to optimize.&lt;/p&gt;

&lt;p&gt;The cells in this section can be run multiple times until we reach a satisfactory performance level. The variables marked as tweakable are the ones we are likely to adjust between studies.&lt;/p&gt;

&lt;p&gt;The general process is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run an initial study with a broad search space.&lt;/li&gt;
&lt;li&gt;Inspect the best trials, parameter importance, and search-space plots.&lt;/li&gt;
&lt;li&gt;Use those results to define narrower and more promising ranges.&lt;/li&gt;
&lt;li&gt;Run a new study over the refined search space.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Since this is a regression task, we use mean squared error as the metric to minimize. The metric is evaluated using the cross-validation strategy defined above.&lt;/p&gt;

&lt;p&gt;Note: When storage=storage_url points to a supported database, such as SQLite or InterSystems IRIS, Optuna automatically creates the tables needed to track studies, trials, parameters, and results. Each study is identified by its study_name. If the same study name and database are reused with load_if_exists=True, Optuna resumes from the existing study instead of starting from scratch.&lt;/p&gt;

&lt;p&gt;This shared storage is also what enables concurrent optimization: multiple processes, or even multiple machines, can connect to the same database and contribute trials to the same study.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;NUM_TRIALS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="c1"&gt;# Tweak
&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LOKY_MAX_CPU_COUNT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cpu_count&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;param&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;# Tweak
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;# Tweak
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;# Tweak
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_leaves&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_leaves&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda_l2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda_l2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;# Tweak
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LGBMRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_val_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                            &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;crossvalstrategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                            &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neg_mean_squared_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                            &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lightgbm_hyperparam_tuning_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d_%H-%M-%S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                            &lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minimize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="c1"&gt;# storage=storage_url,
&lt;/span&gt;                            &lt;span class="n"&gt;load_if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;sampler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;samplers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TPESampler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;),)&lt;/span&gt;
&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_TRIALS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;show_progress_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;best_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Best parameters: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Best performance: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[I 2026-05-13 15:58:38,618] A new study created in memory with name: lightgbm_hyperparam_tuning_2026-05-13_15-58-38



  0%|          | 0/20 [00:00&amp;lt;?, ?it/s]


[I 2026-05-13 15:59:02,770] Trial 0 finished with value: 0.22124664870518 and parameters: {'learning_rate': 0.00727491708802781, 'max_depth': 48, 'n_estimators': 746, 'num_leaves': 255, 'lambda_l2': 0.002570603566117598, 'max_bin': 255}. Best is trial 0 with value: 0.22124664870518.
[I 2026-05-13 15:59:06,986] Trial 1 finished with value: 0.2059125561807643 and parameters: {'learning_rate': 0.0823143373099555, 'max_depth': 13, 'n_estimators': 222, 'num_leaves': 63, 'lambda_l2': 0.0032112643094417484, 'max_bin': 255}. Best is trial 1 with value: 0.2059125561807643.
[I 2026-05-13 15:59:13,470] Trial 2 finished with value: 0.25714400572802726 and parameters: {'learning_rate': 0.01120548642504815, 'max_depth': 40, 'n_estimators': 239, 'num_leaves': 127, 'lambda_l2': 3.850031979199519e-08, 'max_bin': 127}. Best is trial 1 with value: 0.2059125561807643.
[I 2026-05-13 15:59:22,415] Trial 3 finished with value: 0.26413921215873515 and parameters: {'learning_rate': 0.0050225633119947675, 'max_depth': 7, 'n_estimators': 700, 'num_leaves': 255, 'lambda_l2': 2.133142332373004e-06, 'max_bin': 63}. Best is trial 1 with value: 0.2059125561807643.
[I 2026-05-13 15:59:28,245] Trial 4 finished with value: 0.20942294704047681 and parameters: {'learning_rate': 0.01811326544803337, 'max_depth': 11, 'n_estimators': 972, 'num_leaves': 31, 'lambda_l2': 6.257956190096665e-08, 'max_bin': 255}. Best is trial 1 with value: 0.2059125561807643.
[I 2026-05-13 15:59:54,053] Trial 5 finished with value: 0.22529793459324102 and parameters: {'learning_rate': 0.007840758945457348, 'max_depth': 16, 'n_estimators': 838, 'num_leaves': 255, 'lambda_l2': 4.6876566400928895e-08, 'max_bin': 63}. Best is trial 1 with value: 0.2059125561807643.
[I 2026-05-13 15:59:57,575] Trial 6 finished with value: 0.6243686001512612 and parameters: {'learning_rate': 0.0010296901472345186, 'max_depth': 42, 'n_estimators': 722, 'num_leaves': 31, 'lambda_l2': 0.5860448217200517, 'max_bin': 63}. Best is trial 1 with value: 0.2059125561807643.
[I 2026-05-13 16:00:01,328] Trial 7 finished with value: 0.25616396880444836 and parameters: {'learning_rate': 0.005194929407101736, 'max_depth': 18, 'n_estimators': 743, 'num_leaves': 31, 'lambda_l2': 0.0703178263660987, 'max_bin': 127}. Best is trial 1 with value: 0.2059125561807643.
[I 2026-05-13 16:00:02,230] Trial 8 finished with value: 0.4328137375744699 and parameters: {'learning_rate': 0.015952322469109693, 'max_depth': 23, 'n_estimators': 74, 'num_leaves': 63, 'lambda_l2': 1.4726456718740824, 'max_bin': 255}. Best is trial 1 with value: 0.2059125561807643.
[I 2026-05-13 16:00:03,606] Trial 9 finished with value: 0.5036899804922363 and parameters: {'learning_rate': 0.0033610226697378754, 'max_depth': 6, 'n_estimators': 325, 'num_leaves': 31, 'lambda_l2': 0.1710207048797339, 'max_bin': 127}. Best is trial 1 with value: 0.2059125561807643.
[I 2026-05-13 16:00:07,940] Trial 10 finished with value: 0.21142577467959092 and parameters: {'learning_rate': 0.14804113057514628, 'max_depth': 30, 'n_estimators': 458, 'num_leaves': 63, 'lambda_l2': 3.757350306893132e-05, 'max_bin': 255}. Best is trial 1 with value: 0.2059125561807643.
[I 2026-05-13 16:00:11,156] Trial 11 finished with value: 0.2017814916171883 and parameters: {'learning_rate': 0.08309297264998405, 'max_depth': 12, 'n_estimators': 950, 'num_leaves': 16, 'lambda_l2': 0.0008326596975497944, 'max_bin': 255}. Best is trial 11 with value: 0.2017814916171883.
[I 2026-05-13 16:00:12,488] Trial 12 finished with value: 0.20764432653610213 and parameters: {'learning_rate': 0.10507813096831281, 'max_depth': 28, 'n_estimators': 508, 'num_leaves': 16, 'lambda_l2': 0.0016316751769423123, 'max_bin': 255}. Best is trial 11 with value: 0.2017814916171883.
[I 2026-05-13 16:00:12,862] Trial 13 finished with value: 0.3044026543083153 and parameters: {'learning_rate': 0.054273532006916266, 'max_depth': 3, 'n_estimators': 131, 'num_leaves': 16, 'lambda_l2': 6.119264662645272e-05, 'max_bin': 255}. Best is trial 11 with value: 0.2017814916171883.
[I 2026-05-13 16:00:16,388] Trial 14 finished with value: 0.20646055020810183 and parameters: {'learning_rate': 0.041057846227823123, 'max_depth': 14, 'n_estimators': 366, 'num_leaves': 63, 'lambda_l2': 0.007230065446525416, 'max_bin': 255}. Best is trial 11 with value: 0.2017814916171883.
[I 2026-05-13 16:00:18,008] Trial 15 finished with value: 0.21268042685192567 and parameters: {'learning_rate': 0.04807456550053136, 'max_depth': 21, 'n_estimators': 604, 'num_leaves': 16, 'lambda_l2': 6.458243615671745e-06, 'max_bin': 255}. Best is trial 11 with value: 0.2017814916171883.
[I 2026-05-13 16:00:28,022] Trial 16 finished with value: 0.21844697644015332 and parameters: {'learning_rate': 0.18423283160212306, 'max_depth': 10, 'n_estimators': 992, 'num_leaves': 127, 'lambda_l2': 9.015211997542714, 'max_bin': 255}. Best is trial 11 with value: 0.2017814916171883.
[I 2026-05-13 16:00:29,373] Trial 17 finished with value: 0.20797590828555537 and parameters: {'learning_rate': 0.08294987485804219, 'max_depth': 33, 'n_estimators': 188, 'num_leaves': 63, 'lambda_l2': 0.018231434623139052, 'max_bin': 255}. Best is trial 11 with value: 0.2017814916171883.
[I 2026-05-13 16:00:30,247] Trial 18 finished with value: 0.23633039578627624 and parameters: {'learning_rate': 0.02831149820738454, 'max_depth': 24, 'n_estimators': 355, 'num_leaves': 16, 'lambda_l2': 0.00012197971292668617, 'max_bin': 127}. Best is trial 11 with value: 0.2017814916171883.
[I 2026-05-13 16:00:35,660] Trial 19 finished with value: 0.21720640666066582 and parameters: {'learning_rate': 0.07858633974467637, 'max_depth': 13, 'n_estimators': 879, 'num_leaves': 63, 'lambda_l2': 0.0007188574432995588, 'max_bin': 63}. Best is trial 11 with value: 0.2017814916171883.

Best parameters: {'learning_rate': 0.08309297264998405, 'max_depth': 12, 'n_estimators': 950, 'num_leaves': 16, 'lambda_l2': 0.0008326596975497944, 'max_bin': 255}

Best performance: 0.2017814916171883
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below we inspect the best-performing trials from the study. This gives us a quick view of which hyperparameter combinations performed best and helps guide future searches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;trials_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trials_dataframe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;trials_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trials_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trials_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trials_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;trials_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;params|value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="n"&gt;top_trials_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trials_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;display&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_trials_df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;display&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_trials_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&amp;nbsp;&lt;/th&gt;
&lt;th&gt;value&lt;/th&gt;
&lt;th&gt;params_lambda_l2&lt;/th&gt;
&lt;th&gt;params_learning_rate&lt;/th&gt;
&lt;th&gt;params_max_bin&lt;/th&gt;
&lt;th&gt;params_max_depth&lt;/th&gt;
&lt;th&gt;params_n_estimators&lt;/th&gt;
&lt;th&gt;params_num_leaves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;0.201781&lt;/td&gt;
&lt;td&gt;8.326597e-04&lt;/td&gt;
&lt;td&gt;0.083093&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;950&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.205913&lt;/td&gt;
&lt;td&gt;3.211264e-03&lt;/td&gt;
&lt;td&gt;0.082314&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;222&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;0.206461&lt;/td&gt;
&lt;td&gt;7.230065e-03&lt;/td&gt;
&lt;td&gt;0.041058&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;366&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;0.207644&lt;/td&gt;
&lt;td&gt;1.631675e-03&lt;/td&gt;
&lt;td&gt;0.105078&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;508&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;0.207976&lt;/td&gt;
&lt;td&gt;1.823143e-02&lt;/td&gt;
&lt;td&gt;0.082950&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;188&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0.209423&lt;/td&gt;
&lt;td&gt;6.257956e-08&lt;/td&gt;
&lt;td&gt;0.018113&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;972&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;0.211426&lt;/td&gt;
&lt;td&gt;3.757350e-05&lt;/td&gt;
&lt;td&gt;0.148041&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;458&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;0.212680&lt;/td&gt;
&lt;td&gt;6.458244e-06&lt;/td&gt;
&lt;td&gt;0.048075&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;604&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;0.217206&lt;/td&gt;
&lt;td&gt;7.188574e-04&lt;/td&gt;
&lt;td&gt;0.078586&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;879&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;0.218447&lt;/td&gt;
&lt;td&gt;9.015212e+00&lt;/td&gt;
&lt;td&gt;0.184233&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;992&lt;/td&gt;
&lt;td&gt;127&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&amp;nbsp;&lt;/th&gt;
&lt;th&gt;value&lt;/th&gt;
&lt;th&gt;params_lambda_l2&lt;/th&gt;
&lt;th&gt;params_learning_rate&lt;/th&gt;
&lt;th&gt;params_max_bin&lt;/th&gt;
&lt;th&gt;params_max_depth&lt;/th&gt;
&lt;th&gt;params_n_estimators&lt;/th&gt;
&lt;th&gt;params_num_leaves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;count&lt;/td&gt;
&lt;td&gt;10.000000&lt;/td&gt;
&lt;td&gt;1.000000e+01&lt;/td&gt;
&lt;td&gt;10.000000&lt;/td&gt;
&lt;td&gt;10.000000&lt;/td&gt;
&lt;td&gt;10.000000&lt;/td&gt;
&lt;td&gt;10.000000&lt;/td&gt;
&lt;td&gt;10.000000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;0.209896&lt;/td&gt;
&lt;td&gt;9.047112e-01&lt;/td&gt;
&lt;td&gt;0.087154&lt;/td&gt;
&lt;td&gt;235.800000&lt;/td&gt;
&lt;td&gt;18.500000&lt;/td&gt;
&lt;td&gt;613.900000&lt;/td&gt;
&lt;td&gt;52.100000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;std&lt;/td&gt;
&lt;td&gt;0.005155&lt;/td&gt;
&lt;td&gt;2.849745e+00&lt;/td&gt;
&lt;td&gt;0.049444&lt;/td&gt;
&lt;td&gt;60.715731&lt;/td&gt;
&lt;td&gt;8.759122&lt;/td&gt;
&lt;td&gt;313.844424&lt;/td&gt;
&lt;td&gt;34.252169&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;min&lt;/td&gt;
&lt;td&gt;0.201781&lt;/td&gt;
&lt;td&gt;6.257956e-08&lt;/td&gt;
&lt;td&gt;0.018113&lt;/td&gt;
&lt;td&gt;63.000000&lt;/td&gt;
&lt;td&gt;10.000000&lt;/td&gt;
&lt;td&gt;188.000000&lt;/td&gt;
&lt;td&gt;16.000000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25%&lt;/td&gt;
&lt;td&gt;0.206756&lt;/td&gt;
&lt;td&gt;2.078945e-04&lt;/td&gt;
&lt;td&gt;0.055703&lt;/td&gt;
&lt;td&gt;255.000000&lt;/td&gt;
&lt;td&gt;12.250000&lt;/td&gt;
&lt;td&gt;389.000000&lt;/td&gt;
&lt;td&gt;19.750000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;td&gt;0.208699&lt;/td&gt;
&lt;td&gt;1.232167e-03&lt;/td&gt;
&lt;td&gt;0.082632&lt;/td&gt;
&lt;td&gt;255.000000&lt;/td&gt;
&lt;td&gt;13.500000&lt;/td&gt;
&lt;td&gt;556.000000&lt;/td&gt;
&lt;td&gt;63.000000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;75%&lt;/td&gt;
&lt;td&gt;0.212367&lt;/td&gt;
&lt;td&gt;6.225365e-03&lt;/td&gt;
&lt;td&gt;0.099582&lt;/td&gt;
&lt;td&gt;255.000000&lt;/td&gt;
&lt;td&gt;26.250000&lt;/td&gt;
&lt;td&gt;932.250000&lt;/td&gt;
&lt;td&gt;63.000000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max&lt;/td&gt;
&lt;td&gt;0.218447&lt;/td&gt;
&lt;td&gt;9.015212e+00&lt;/td&gt;
&lt;td&gt;0.184233&lt;/td&gt;
&lt;td&gt;255.000000&lt;/td&gt;
&lt;td&gt;33.000000&lt;/td&gt;
&lt;td&gt;992.000000&lt;/td&gt;
&lt;td&gt;127.000000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After the first broad search, we can estimate which hyperparameters had the strongest impact on performance. This helps us decide which parameters deserve a more focused search in the next study.&lt;/p&gt;

&lt;p&gt;The cell below calculates an importance score for each hyperparameter. By default, Optuna normalizes the scores so that they sum to 1; higher values indicate parameters that had more influence on the objective metric in this study.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;param_importance_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;importance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_param_importances&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;param_importance_dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;param_importance_dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Importance Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Hyperparameter&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Hyperparameter Importance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tight_layout&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2y37we7351tp19208ew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2y37we7351tp19208ew.png" alt=" " width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the plot above, we can identify the most relevant hyperparameters. Next, we choose how many of the top parameters we want to compare. In this example, we select the two most important ones.&lt;/p&gt;

&lt;p&gt;The contour plot below helps us visualize how these two parameters interact and which regions of the search space produced better results. We can use this to define narrower ranges for future studies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;numparamstocompare&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="n"&gt;best2params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;param_importance_dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;numparamstocompare&lt;/span&gt;&lt;span class="p"&gt;:]]&lt;/span&gt;
&lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visualization&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;matplotlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_contour&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;best2params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcidul5pd61ethdcfyqm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcidul5pd61ethdcfyqm.png" alt=" " width="592" height="465"&gt;&lt;/a&gt;&lt;/p&gt;
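
&lt;p&gt;As a rough illustration of how narrower ranges could be derived for a follow-up study, the sketch below takes the five best trials from the results table and widens their observed span by a margin. The choice of learning_rate and max_depth, the helper name narrowed_range, and the 25% margin are assumptions made for this example; in practice you would use whichever parameters the importance plot ranks highest.&lt;/p&gt;

```python
# Top-5 trials copied from the results table above (learning_rate, max_depth).
top_trials = [
    {"learning_rate": 0.083093, "max_depth": 12},
    {"learning_rate": 0.082314, "max_depth": 13},
    {"learning_rate": 0.041058, "max_depth": 14},
    {"learning_rate": 0.105078, "max_depth": 28},
    {"learning_rate": 0.082950, "max_depth": 33},
]


def narrowed_range(trials, name, margin=0.25):
    """Return (low, high) spanning the top trials, widened by a relative margin.

    Hypothetical helper: the 25% default margin is an arbitrary choice.
    """
    values = [t[name] for t in trials]
    low, high = min(values), max(values)
    pad = (high - low) * margin
    return low - pad, high + pad


lr_low, lr_high = narrowed_range(top_trials, "learning_rate")
depth_low, depth_high = narrowed_range(top_trials, "max_depth")

# Integer parameters need integer bounds; max_depth must stay at least 1.
depth_bounds = (max(1, int(depth_low)), int(round(depth_high)))

print(f"learning_rate: {lr_low:.4f} to {lr_high:.4f}")
print(f"max_depth: {depth_bounds[0]} to {depth_bounds[1]}")
```

&lt;p&gt;The resulting bounds could then be passed to trial.suggest_float and trial.suggest_int in the next study's objective. Keeping a margin around the best trials leaves room for the sampler to explore just outside the region that has already been tested.&lt;/p&gt;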


&lt;h1&gt;
  
  
  Concurrent studies to speed up hyperparameter exploration
&lt;/h1&gt;

&lt;p&gt;Every time we test a set of hyperparameters, we should evaluate it properly using cross-validation to avoid selecting a model that just overfits to a particular train/validation split. This means training as many models as the number of folds we choose.&lt;/p&gt;

&lt;p&gt;For example, using 5-fold or 10-fold cross-validation implies training 5–10 models per hyperparameter configuration. There is no strict rule for the number of folds, but 5 or 10 are commonly used depending on how expensive each model is to train. As a result, evaluating each set of hyperparameters becomes 5–10 times more time-consuming, and this cost increases further as the dataset grows.&lt;/p&gt;
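
&lt;p&gt;The arithmetic behind this cost can be sketched as follows. The function names and the 2-second fit time are hypothetical; only the 20-trial count comes from the study above.&lt;/p&gt;

```python
def total_model_fits(n_trials, n_folds):
    """Number of models trained when every trial is scored with k-fold CV."""
    return n_trials * n_folds


def estimated_minutes(n_trials, n_folds, seconds_per_fit):
    """Rough sequential wall-clock estimate that ignores all overhead."""
    return total_model_fits(n_trials, n_folds) * seconds_per_fit / 60


# 20 trials (as in the study above); fold counts and fit time are illustrative.
for folds in (3, 5, 10):
    fits = total_model_fits(20, folds)
    minutes = estimated_minutes(20, folds, seconds_per_fit=2.0)
    print(f"{folds:2d} folds: {fits:3d} fits, about {minutes:.1f} min")
```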

&lt;p&gt;For this reason, we want to accelerate the hyperparameter search. One way to do this is by running multiple processes, each working on the same Optuna study and exploring the same search space in parallel. If a machine has 16 cores, we can run up to 16 workers concurrently, which can significantly reduce the total optimization time (although not always perfectly linearly due to overhead and coordination between workers).&lt;/p&gt;

&lt;p&gt;An important advantage of Optuna is that if all workers point to a common storage database, the study is shared across processes. Optuna will create and manage the required tables in the database, and all workers will contribute trials to the same study. This means that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workers generally avoid evaluating identical hyperparameter configurations&lt;/li&gt;
&lt;li&gt;Completed trials from all workers are used to guide future sampling&lt;/li&gt;
&lt;li&gt;The search becomes more efficient over time as more results are collected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If no storage is given, Optuna keeps the study in memory, where it is visible only to the process that created it. For local runs, you can pass "sqlite:///optuna_lgbm.db" as the storage parameter, and Optuna will create a local SQLite database file for the study. The same approach can also be extended to a centralized database such as InterSystems IRIS, enabling distributed hyperparameter tuning across multiple machines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optuna's native concurrency + MLflow model registry
&lt;/h2&gt;

&lt;p&gt;We can combine Optuna for hyperparameter tuning and MLflow for experiment tracking and model registry. This way, we can leverage the same MLflow model registry capabilities shown in this &lt;a href="https://github.com/JorgeIvanJH/IRIS_and_MLflow-Continuous-Training-Pipeline" rel="noopener noreferrer"&gt;repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One of the main advantages of Optuna is how easy it is to scale hyperparameter tuning across processes or even across machines. We can run the same optimization study from different machines, and as long as all of them point to the same storage database, all workers will contribute trials to the same study. As trials finish, Optuna can use the accumulated results to guide future samples.&lt;/p&gt;

&lt;p&gt;In the example below, we run multiple workers against the same Optuna study. Running this as a separate Python script, not in a standard Jupyter notebook, allows parallel hyperparameter tuning with MLflow tracking. MLflow keeps track of the parent run, each child trial run, the final best parameters, the best cross-validation score, and the final trained model.&lt;/p&gt;

&lt;p&gt;The script below ran 3200 trials in 25 minutes on a Windows laptop with 16 cores, using 16 workers with 200 trials each (16 × 200 = 3200). Each trial used 3 cross-validation splits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;multiprocessing&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mp&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mlflow.lightgbm&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mlflow.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;infer_signature&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cross_val_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KFold&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fetch_california_housing&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;

&lt;span class="n"&gt;dotenv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;STORAGE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite:///optuna_lgbm.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# for local testing
&lt;/span&gt;

&lt;span class="c1"&gt;# Hyperparameter tuning configuration
&lt;/span&gt;&lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cpu_count&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;NUM_TRIALS_PER_WORKER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="n"&gt;BASE_SEED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="n"&gt;NUM_CV_SPLITS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="c1"&gt;# 5 or 10 would be better
&lt;/span&gt;&lt;span class="n"&gt;EXPERIMENT_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LightGBM Hyperparameter Tuning with Optuna and MLflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;crossvalstrategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KFold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_CV_SPLITS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load dataset
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_california_housing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;as_frame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;median_house_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_leaves&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_leaves&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda_l2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda_l2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;random_state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verbosity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_jobs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Parent run ID is passed through the environment (see run_worker) so child runs nest under it
&lt;/span&gt;    &lt;span class="n"&gt;parent_run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_PARENT_RUN_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trial_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;nested&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;parent_run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;parent_run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;# tags={"mlflow.parentRunId": parent_run_id} if parent_run_id else None,
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;child_run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_params&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LGBMRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_val_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;crossvalstrategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neg_mean_squared_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# scikit-learn returns negated MSE here (higher is better), so flip the sign
&lt;/span&gt;        &lt;span class="n"&gt;crossval_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Log current trial's error metric
&lt;/span&gt;        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metrics&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cv_mse_mean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;crossval_score&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fold_idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fold_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fold_idx&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_mse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Make it easy to retrieve the best-performing child run later
&lt;/span&gt;        &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_user_attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;child_run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;crossval_score&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;worker_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parent_run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;
    &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_tracking_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_TRACKING_URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXPERIMENT_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_PARENT_RUN_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_run_id&lt;/span&gt;

    &lt;span class="c1"&gt;# Join the shared study; a distinct seed per worker avoids identical random draws
&lt;/span&gt;    &lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STORAGE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;sampler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;samplers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TPESampler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_SEED&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;worker_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_TRIALS_PER_WORKER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;show_progress_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;worker_id&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="c1"&gt;# MLflow setup
&lt;/span&gt;    &lt;span class="n"&gt;datetime_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d %H:%M:%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;RUN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parent_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;STUDY_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optuna_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;tracking_uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_TRACKING_URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_tracking_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tracking_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXPERIMENT_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;experiment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_experiment_by_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXPERIMENT_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;experiment_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment_id&lt;/span&gt;


    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RUN_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_system_metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;parent_run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parent_run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_PARENT_RUN_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_run_id&lt;/span&gt;

        &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minimize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STUDY_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STORAGE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;load_if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_params&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_trials&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NUM_TRIALS_PER_WORKER&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_workers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cv_n_splits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;crossvalstrategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;study_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;STUDY_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="n"&gt;worker_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;worker_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;STUDY_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parent_run_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;worker_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worker_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

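        &lt;span class="c1"&gt;# Reload the study in the parent process to read the trials contributed by all workers
&lt;/span&gt;        &lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;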
            &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STUDY_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STORAGE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;best_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;
        &lt;span class="n"&gt;best_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_value&lt;/span&gt;
        &lt;span class="n"&gt;best_child_run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_attrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_params&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;best_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()})&lt;/span&gt;
        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;best_cv_mse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;best_value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;best_child_run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_param&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;best_child_run_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;best_child_run_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Train final model on full dataset with best hyperparameters. Important: keep same seed
&lt;/span&gt;        &lt;span class="n"&gt;final_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LGBMRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;verbosity&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;final_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;input_sample&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;signature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;infer_signature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_sample&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_sample&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lightgbm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;lgb_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;final_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;best_model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;input_example&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code above serves as a proof of concept for distributing a search across several processes or machines: each one points to the same shared Optuna storage database and contributes trials to the same study.&lt;/p&gt;
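
&lt;p&gt;As a rough illustration of how the work is divided in the multiprocessing version, the sketch below builds the kind of per-worker plan the script relies on: every worker runs the same number of trials and gets its own sampler seed, derived as BASE_SEED + worker_id. The helper name plan_workers is hypothetical, and the values used (4 workers, 800 trials each, base seed 42) are assumptions chosen only so that the total matches the 3200 trials mentioned in this article; the actual split is not stated here.&lt;/p&gt;

```python
def plan_workers(num_workers, trials_per_worker, base_seed):
    """Return one (worker_id, n_trials, sampler_seed) tuple per worker.

    Hypothetical helper: it mirrors the fixed per-worker trial count and the
    BASE_SEED + worker_id seeding used by the multiprocessing script.
    """
    return [
        (worker_id, trials_per_worker, base_seed + worker_id)
        for worker_id in range(num_workers)
    ]


# Assumed example values; they are not taken from the original script.
plan = plan_workers(num_workers=4, trials_per_worker=800, base_seed=42)
total_trials = sum(n_trials for _, n_trials, _ in plan)

for worker_id, n_trials, seed in plan:
    print(f"worker {worker_id}: {n_trials} trials, sampler seed {seed}")
print(f"total trials: {total_trials}")
```

&lt;p&gt;Giving each worker a different seed keeps the samplers from starting on the same random sequence, while the shared storage is what lets every worker see the trials the others have already completed.&lt;/p&gt;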

&lt;p&gt;On a single machine, however, the simpler version below is usually preferable. It runs the same study with parallel jobs controlled by the n_jobs argument of Optuna's study.optimize. It can achieve similar performance, although the exact trials and the final best model are not guaranteed to match those of the multiprocessing version.&lt;/p&gt;

&lt;p&gt;The code below also ran 3200 trials, in this case in 27 minutes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;multiprocessing&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mp&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mlflow.lightgbm&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mlflow.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;infer_signature&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cross_val_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KFold&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fetch_california_housing&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;

&lt;span class="n"&gt;dotenv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;STORAGE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite:///optuna_lgbm.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# for local testing
&lt;/span&gt;

&lt;span class="c1"&gt;# Hyperparameter tuning configuration
&lt;/span&gt;&lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cpu_count&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;NUM_TRIALS_PER_WORKER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="n"&gt;BASE_SEED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="n"&gt;NUM_CV_SPLITS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="c1"&gt;# 5 or 10 would be better
&lt;/span&gt;&lt;span class="n"&gt;EXPERIMENT_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LightGBM Hyperparameter Tuning with Optuna and MLflow 2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;crossvalstrategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KFold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_CV_SPLITS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load dataset
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_california_housing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;as_frame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;median_house_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_leaves&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_leaves&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda_l2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda_l2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;random_state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verbosity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_jobs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;parent_run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_PARENT_RUN_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trial_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;nested&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;parent_run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;parent_run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;# tags={"mlflow.parentRunId": parent_run_id} if parent_run_id else None,
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;child_run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_params&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LGBMRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_val_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;crossvalstrategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neg_mean_squared_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;crossval_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Log current trial's error metric
&lt;/span&gt;        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metrics&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cv_mse_mean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;crossval_score&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fold_idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fold_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fold_idx&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_mse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Make it easy to retrieve the best-performing child run later
&lt;/span&gt;        &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_user_attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;child_run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;crossval_score&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="c1"&gt;# MLflow setup
&lt;/span&gt;    &lt;span class="n"&gt;datetime_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d %H:%M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;RUN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parent_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;STUDY_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optuna_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;tracking_uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_TRACKING_URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_tracking_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tracking_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXPERIMENT_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;experiment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_experiment_by_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXPERIMENT_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;experiment_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;experiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment_id&lt;/span&gt;


    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RUN_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_system_metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;parent_run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parent_run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_PARENT_RUN_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_run_id&lt;/span&gt;

        &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minimize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STUDY_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STORAGE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;load_if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_params&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_trials&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NUM_TRIALS_PER_WORKER&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_workers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cv_n_splits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;crossvalstrategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;study_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;STUDY_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STUDY_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STORAGE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_TRIALS_PER_WORKER&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;show_progress_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;best_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;
        &lt;span class="n"&gt;best_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_value&lt;/span&gt;
        &lt;span class="n"&gt;best_child_run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_attrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_params&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;best_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()})&lt;/span&gt;
        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;best_cv_mse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;best_value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;best_child_run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_param&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;best_child_run_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;best_child_run_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Train final model on full dataset with best hyperparameters. Important: keep same seed
&lt;/span&gt;        &lt;span class="n"&gt;final_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LGBMRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;verbosity&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;final_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;input_sample&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;signature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;infer_signature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_sample&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_sample&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lightgbm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;lgb_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;final_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;best_model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;input_example&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running either script produces a parent run in MLflow that holds the final model, trained with the best hyperparameters found across the 3200 trials. The parent run also stores the best hyperparameters, the best cross-validation score, and the ID of the best child run. Each child run contains the parameters and metrics for one Optuna trial.&lt;/p&gt;

&lt;p&gt;All of this can be explored in the MLflow UI, for example at &lt;a href="http://localhost:5000/#/experiments" rel="noopener noreferrer"&gt;http://localhost:5000/#/experiments&lt;/a&gt;, where we can inspect the parent run, compare child runs, and download or register the final model.&lt;/p&gt;

&lt;p&gt;The image below shows two plots from MLflow's UI. The left plot gives a sense of the search space by comparing the mean cross-validation MSE across trials with different values of &lt;code&gt;max_depth&lt;/code&gt; and &lt;code&gt;num_leaves&lt;/code&gt;. The right plot shows the 100 worst models, meaning the trials with the highest mean squared error across cross-validation. The best model found achieved a score of approximately 0.199580.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofddwuf9p4n8tu7100m8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofddwuf9p4n8tu7100m8.png" alt=" " width="799" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optuna Concurrency + IRIS DB
&lt;/h2&gt;

&lt;p&gt;When we tried to replicate the same process with IRIS DB as the Optuna storage backend, several issues arose once more than 4 workers ran in parallel. This is likely related to how each worker process creates its own connection to IRIS and writes trial metadata concurrently to the same Optuna study.&lt;/p&gt;

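&lt;p&gt;The code below worked fine with up to 3 workers running at the same time. Note that the listing sets &lt;code&gt;NUM_WORKERS&lt;/code&gt; to &lt;code&gt;min(8, mp.cpu_count())&lt;/code&gt;, so lower that value to stay within the range that worked. Another option is to keep a single Python process pointing to IRIS and set the &lt;code&gt;n_jobs&lt;/code&gt; parameter of &lt;code&gt;study.optimize&lt;/code&gt; to the number of concurrent jobs we want (just as we did above). This approach uses threads inside one process, which can be simpler from a database-connection perspective because it avoids multiple independent Python processes creating separate connections to IRIS.&lt;/p&gt;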

&lt;p&gt;However, this approach is not always equivalent to multiprocessing. Since Optuna's &lt;code&gt;n_jobs&lt;/code&gt; uses threads, CPU-bound Python code can be limited by Python's GIL. In this specific example, most of the expensive work is done by LightGBM and scikit-learn routines, so threading may still provide a useful speedup, but it may not scale the same way as true multiprocessing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lightgbm&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;multiprocessing&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mp&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.pool&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NullPool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cross_val_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KFold&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fetch_california_housing&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;

&lt;span class="n"&gt;dotenv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cpu_count&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;&lt;span class="n"&gt;NUM_TRIALS_PER_WORKER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;&lt;span class="n"&gt;STUDY_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_lightgbm_study_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d_%H-%M-%S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;&lt;span class="n"&gt;BASE_SEED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;
&lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_SERVER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_NAMESPACE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_USERNAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IRIS_PASSWORD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;STORAGE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iris://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;crossvalstrategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KFold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load Dataset
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_california_housing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;as_frame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;median_house_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;param&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_estimators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_leaves&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_leaves&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda_l2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda_l2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# CHANGEABLE
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;suggest_categorical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;random_state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BASE_SEED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verbosity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n_jobs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LGBMRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_val_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;cv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;crossvalstrategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;scoring&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neg_mean_squared_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;worker_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;
    &lt;span class="n"&gt;worker_storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_storage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;worker_storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;sampler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;samplers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TPESampler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_SEED&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;worker_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;optimize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_TRIALS_PER_WORKER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;show_progress_bar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;worker_id&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_storage&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RDBStorage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STORAGE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;engine_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;poolclass&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NullPool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;connect_args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# Helps with heavy concurrent writes
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;main_storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_storage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minimize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STUDY_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;main_storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;load_if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;main_storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_engine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;main_storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_engine&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;dispose&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;worker_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;worker_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;STUDY_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;worker_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NUM_WORKERS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worker_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;final_storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_storage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;final_study&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;optuna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_study&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;study_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STUDY_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;final_storage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Overall Best Value: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;final_study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Overall Best Params: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;final_study&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optuna persists the study in IRIS for future reference: studies, trials, trial parameters, trial values, intermediate values, and related metadata are all written to the storage tables that Optuna creates in IRIS.&lt;/p&gt;

&lt;p&gt;For further performance analysis, we can query these tables directly or, preferably, load the study back through Optuna and use Optuna's built-in visualization and analysis tools to inspect the optimization history, parameter importance, and trial performance.&lt;/p&gt;

&lt;p&gt;The image below shows the Optuna storage tables created in IRIS DB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvx8slxzjpp0qc5kttjf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvx8slxzjpp0qc5kttjf.png" alt=" " width="715" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Discovering PII Inside InterSystems IRIS</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Tue, 16 Jun 2026 15:34:39 +0000</pubDate>
      <link>https://dev.to/intersystems/discovering-pii-inside-intersystems-iris-1i2l</link>
      <guid>https://dev.to/intersystems/discovering-pii-inside-intersystems-iris-1i2l</guid>
      <description>&lt;p&gt;Data privacy regulations such as GDPR, LGPD, and HIPAA demand that organizations know exactly where Personally Identifiable Information (PII) lives inside their databases. Yet in practice, most teams rely on manual inventories, tribal knowledge, or external scanning tools that require data to leave the database engine — a process that itself creates privacy and security risks.&lt;/p&gt;

&lt;p&gt;This article presents an MVP that takes a different approach: it runs PII detection &lt;strong&gt;inside&lt;/strong&gt; InterSystems IRIS using Embedded Python, analyzing data where it lives and never exporting it to an external process. The result is a lightweight, non-intrusive utility that scans your tables, identifies PII using AI, and produces a structured CSV report — all without data ever leaving the IRIS process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: PII You Don't Know You Have
&lt;/h2&gt;

&lt;p&gt;Organizations today face a painful blind spot. A typical IRIS instance may contain hundreds of tables across dozens of schemas, some holding decades of accumulated data. Columns named &lt;code&gt;ContactInfo&lt;/code&gt;, &lt;code&gt;Notes&lt;/code&gt;, or &lt;code&gt;Description&lt;/code&gt; might silently contain social security numbers, email addresses, or government IDs — sometimes intentionally, sometimes as a side effect of free-text fields that capture whatever users type in.&lt;/p&gt;

&lt;p&gt;Traditional approaches to PII discovery share a common flaw: they require data extraction. You export samples, send them to an external service, or pipe them through a standalone tool. Every step in that pipeline is an additional attack surface and a potential compliance violation.&lt;/p&gt;

&lt;p&gt;The principle of &lt;strong&gt;data sovereignty&lt;/strong&gt; — keeping data within its jurisdiction and under controlled access — suggests a better path: bring the analysis to the data, not the data to the analysis.&lt;/p&gt;

&lt;p&gt;This is not just a technical preference; it is a governance requirement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GDPR (EU)&lt;/strong&gt; — Article 28 requires that any processing of personal data by a third-party processor be governed by a binding contract covering subject-matter, duration, purpose, data types, and obligations [&lt;a href="https://gdpr.eu/article-28-processor/" rel="noopener noreferrer"&gt;Art. 28 GDPR&lt;/a&gt;]. Article 44 extends this further: any transfer of personal data to a third country is permitted only if the conditions of Chapter V are met, ensuring the level of protection guaranteed by the Regulation is not undermined [&lt;a href="https://gdpr.eu/article-44-transfer-of-personal-data/" rel="noopener noreferrer"&gt;Art. 44 GDPR&lt;/a&gt;]. Every external tool you send data to becomes a new processor — and every cross-border transfer triggers these obligations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LGPD (Brazil)&lt;/strong&gt;: Brazil's Lei Geral de Proteção de Dados mirrors GDPR's principles. Article 5(X) defines "processing" broadly to include any operation carried out with personal data, and Article 41 requires controllers to appoint a Data Protection Officer (DPO) [&lt;a href="https://www.planalto.gov.br/ccivil_03/_ato2015-2018/2018/lei/l13709.htm" rel="noopener noreferrer"&gt;Lei nº 13.709/2018&lt;/a&gt;]. Any external PII scanning service would itself be classified as a processor under the law.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HIPAA (US)&lt;/strong&gt; — The Security Rule mandates that covered entities and business associates implement technical safeguards to protect the confidentiality, integrity, and availability of electronic protected health information (ePHI). Specifically, the Transmission Security standard (45 CFR §164.312(e)) requires technical security measures to guard against unauthorized access to ePHI that is being transmitted over an electronic network [&lt;a href="https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html" rel="noopener noreferrer"&gt;HIPAA Security Rule Summary&lt;/a&gt;]. Every time ePHI leaves the database engine for an external scan, this safeguard is put at risk.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running the scan inside the database engine eliminates the transmission step entirely, simplifying compliance and reducing risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: Three Decoupled Components
&lt;/h2&gt;

&lt;p&gt;The utility follows a simple but deliberate separation of concerns. Three independent components cooperate in a pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PIIScanner  →  PIIIdentifier  →  PIIReporter
(database)     (AI detection)     (reporting)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;PIIIdentifier&lt;/strong&gt; — Wraps the AI detection library. It has zero knowledge of IRIS, SQL, or database schemas. Its single method, &lt;code&gt;identify(text)&lt;/code&gt;, takes a string and returns the highest-confidence PII entity type (e.g., &lt;code&gt;"EMAIL_ADDRESS"&lt;/code&gt;, &lt;code&gt;"PERSON"&lt;/code&gt;, &lt;code&gt;"CPF"&lt;/code&gt;) or &lt;code&gt;None&lt;/code&gt;. This isolation means the detection logic can be tested, swapped, or upgraded without touching the database layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PIIScanner&lt;/strong&gt; — The only component that interacts with IRIS. It queries &lt;code&gt;INFORMATION_SCHEMA.TABLES&lt;/code&gt; to discover user tables, samples up to N rows per table via &lt;code&gt;SELECT TOP N *&lt;/code&gt;, feeds each column's values to the identifier, and collects findings. It respects schema exclusion patterns (exact match and wildcard prefix like &lt;code&gt;"Ens*"&lt;/code&gt;) and lets the caller configure the sample size.&lt;/p&gt;
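&lt;p&gt;The exclusion rule described above (exact match plus a trailing-&lt;code&gt;*&lt;/code&gt; prefix wildcard) can be sketched in a few lines. This is only an illustration, not the project's code: the function name &lt;code&gt;is_excluded&lt;/code&gt;, the sample exclusion list, and the case-sensitive comparison are all assumptions.&lt;/p&gt;

```python
def is_excluded(schema, patterns):
    """Return True if schema matches an exact name or a trailing-* prefix pattern."""
    for pattern in patterns:
        if pattern.endswith("*"):
            # Wildcard prefix: "Ens*" excludes Ens, EnsLib_HL7, and so on.
            if schema.startswith(pattern[:-1]):
                return True
        elif schema == pattern:
            return True
    return False


# Hypothetical exclusion list, for illustration only.
EXCLUDED = ["INFORMATION_SCHEMA", "Ens*", "%*"]

schemas = ["PIISample", "Ens", "EnsLib_HL7", "INFORMATION_SCHEMA", "%SYS"]
to_scan = [s for s in schemas if not is_excluded(s, EXCLUDED)]
print(to_scan)  # ['PIISample']
```

&lt;p&gt;Keeping the rule to "exact or prefix" avoids a full glob or regex engine, which is usually enough for filtering system schemas.&lt;/p&gt;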

&lt;p&gt;&lt;strong&gt;PIIReporter&lt;/strong&gt; — Deduplicates findings and writes a CSV with five columns: &lt;code&gt;schema_name, table_name, column_name, pii_type, confidence&lt;/code&gt;. The confidence score (0.0–1.0) helps reviewers prioritize findings and identify likely false positives.&lt;/p&gt;

&lt;p&gt;This separation is not accidental. It means the identifier could be replaced with a more powerful model tomorrow without changing a single line of scanner or reporter code.&lt;/p&gt;
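&lt;p&gt;To make the reporter's role concrete, here is a minimal sketch of deduplication followed by CSV output using only the standard library. The five column names come from the description above. The deduplication key (schema, table, column, PII type) and the "keep the highest confidence" rule are assumptions, as are the function names.&lt;/p&gt;

```python
import csv
import io

FIELDS = ["schema_name", "table_name", "column_name", "pii_type", "confidence"]


def deduplicate(findings):
    """Keep one row per (schema, table, column, pii_type), preferring the highest confidence."""
    best = {}
    for finding in findings:
        key = tuple(finding[name] for name in FIELDS[:4])
        current = best.get(key, finding)
        best[key] = max(current, finding, key=lambda row: row["confidence"])
    return list(best.values())


def write_report(findings, stream):
    """Write deduplicated findings as CSV to any file-like object."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(deduplicate(findings))


findings = [
    dict(zip(FIELDS, ["PIISample", "Patients", "Email", "EMAIL_ADDRESS", 1.0])),
    dict(zip(FIELDS, ["PIISample", "Patients", "Email", "EMAIL_ADDRESS", 0.85])),
    dict(zip(FIELDS, ["PIISample", "Patients", "FullName", "PERSON", 0.85])),
]
buffer = io.StringIO()
write_report(findings, buffer)
report = buffer.getvalue()
print(report)
```

&lt;p&gt;Because the reporter only sees plain dictionaries, it can be tested without IRIS or the detection library installed.&lt;/p&gt;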

&lt;h2&gt;
  
  
  Microsoft Presidio and spaCy: The Detection Engine
&lt;/h2&gt;

&lt;p&gt;The PIIIdentifier is powered by &lt;a href="https://microsoft.github.io/presidio/" rel="noopener noreferrer"&gt;Microsoft Presidio&lt;/a&gt;, an open-source data protection and de-identification framework. Presidio is the current detection engine, but the architecture is deliberately engine-agnostic — the &lt;code&gt;PIIIdentifier&lt;/code&gt; wrapper fully isolates the detection library from the scanner and reporter. Swapping to a different detection approach would only require changes to that one module, leaving the rest of the pipeline untouched. Presidio's analyzer combines two detection strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pattern-based recognizers&lt;/strong&gt; — Regular expressions and checksum validators for structured identifiers: email addresses, phone numbers, SSNs, credit card numbers, CPF, and dozens more. These recognizers are deterministic and language-agnostic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NLP-based recognizers&lt;/strong&gt; — Machine learning models that detect entity types like PERSON, LOCATION, and ORGANIZATION from natural language context. This is where spaCy comes in.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The utility configures Presidio with two spaCy models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;en_core_web_sm&lt;/code&gt; — English small model (~12 MB)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pt_core_news_sm&lt;/code&gt; — Portuguese small model (~13 MB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each row of data is analyzed against both languages, and the highest-confidence result wins. Multi-language support is essential for this kind of tool to be useful for users around the world — databases rarely contain data in a single language, and PII detection that only understands English would miss critical findings in Portuguese, Spanish, German, or any other language. The current MVP supports English and Portuguese as a starting point, but the architecture makes it straightforward to add more spaCy models for additional languages.&lt;/p&gt;
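&lt;p&gt;As a rough sketch of what a two-model setup can look like, the configuration below follows the dictionary format that Presidio's &lt;code&gt;NlpEngineProvider&lt;/code&gt; accepts. The names &lt;code&gt;NLP_CONFIGURATION&lt;/code&gt;, &lt;code&gt;LANGUAGES&lt;/code&gt;, and &lt;code&gt;build_analyzer&lt;/code&gt; are illustrative, and the project's actual setup may differ.&lt;/p&gt;

```python
# Configuration dictionary in the format accepted by Presidio's NlpEngineProvider.
NLP_CONFIGURATION = {
    "nlp_engine_name": "spacy",
    "models": [
        {"lang_code": "en", "model_name": "en_core_web_sm"},
        {"lang_code": "pt", "model_name": "pt_core_news_sm"},
    ],
}

# Languages are derived from the model list so the two cannot drift apart.
LANGUAGES = [model["lang_code"] for model in NLP_CONFIGURATION["models"]]


def build_analyzer():
    """Create an AnalyzerEngine backed by both spaCy models.

    Imports are local so the configuration above can be inspected
    even on a machine where Presidio is not installed.
    """
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    provider = NlpEngineProvider(nlp_configuration=NLP_CONFIGURATION)
    return AnalyzerEngine(
        nlp_engine=provider.create_engine(),
        supported_languages=LANGUAGES,
    )


print(LANGUAGES)  # ['en', 'pt']
```

&lt;p&gt;Under this layout, adding a language would mean appending one entry to &lt;code&gt;models&lt;/code&gt; and installing the corresponding spaCy model in the image.&lt;/p&gt;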

&lt;p&gt;For every text input, the &lt;code&gt;identify()&lt;/code&gt; method iterates through both language analyzers, collects all results, and returns the entity type with the highest confidence score:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;identify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;best_entity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;best_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;languages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;best_score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;best_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;
                &lt;span class="n"&gt;best_entity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entity_type&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;best_entity&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This design means a Brazilian CPF mentioned in an English sentence will still be caught by the PT analyzer's pattern recognizer, even though the surrounding text is English.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Inside IRIS: The Embedded Python Advantage
&lt;/h2&gt;

&lt;p&gt;The entire utility runs as a Python module inside the IRIS process via &lt;code&gt;irispython&lt;/code&gt;. No external API calls, no data exports, no network transfers. The scanner uses &lt;code&gt;iris.sql.exec()&lt;/code&gt; — IRIS's native Python SQL interface — to query metadata and sample data directly within the engine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;irispython &lt;span class="nt"&gt;-m&lt;/span&gt; irisapp.pii_discovery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single command starts the scan. The output is a CSV file written to the mounted volume, immediately available on the host machine.&lt;/p&gt;
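&lt;p&gt;The sampling step can be pictured roughly as follows. This is a sketch under assumptions: &lt;code&gt;build_sample_query&lt;/code&gt; and &lt;code&gt;sample_rows&lt;/code&gt; are hypothetical helper names, and the identifier check is a conservative pattern chosen for illustration. Only &lt;code&gt;iris.sql.exec()&lt;/code&gt; is taken from the text, and it is available only inside the Embedded Python runtime.&lt;/p&gt;

```python
import re

# Conservative identifier pattern (an assumption): letters, digits, underscore.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_sample_query(schema, table, sample_size=100):
    """Build a SELECT TOP N query after validating every interpolated part."""
    for name in (schema, table):
        if not IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"SELECT TOP {int(sample_size)} * FROM {schema}.{table}"


def sample_rows(schema, table, sample_size=100):
    """Run the query inside IRIS; works only under irispython / Embedded Python."""
    import iris  # provided by the IRIS Embedded Python runtime

    return list(iris.sql.exec(build_sample_query(schema, table, sample_size)))


print(build_sample_query("PIISample", "Patients", 50))
# SELECT TOP 50 * FROM PIISample.Patients
```

&lt;p&gt;Table and schema names cannot be passed as query parameters, so validating them before interpolation keeps the scanner from executing unexpected SQL.&lt;/p&gt;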

&lt;p&gt;The utility also integrates with IRIS's built-in Task Scheduler. A &lt;code&gt;%SYS.Task.Definition&lt;/code&gt; subclass (&lt;code&gt;PIIScannerTask&lt;/code&gt;) exposes configurable &lt;code&gt;OutputPath&lt;/code&gt; and &lt;code&gt;SampleSize&lt;/code&gt; properties in the Management Portal, and its &lt;code&gt;OnTask()&lt;/code&gt; method invokes the Python module via &lt;code&gt;%SYS.Python.Import()&lt;/code&gt;. The task is registered automatically during Docker build and can be scheduled to run periodically, for instance as a weekly PII inventory scan that appends results to a central compliance report.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# One-shot scan from the command line&lt;/span&gt;
docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;iris irispython &lt;span class="nt"&gt;-m&lt;/span&gt; irisapp.pii_discovery

&lt;span class="c"&gt;# Scan with custom namespace and sample size&lt;/span&gt;
docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;iris irispython &lt;span class="nt"&gt;-m&lt;/span&gt; irisapp.pii_discovery &lt;span class="nt"&gt;-n&lt;/span&gt; USER &lt;span class="nt"&gt;-s&lt;/span&gt; 50

&lt;span class="c"&gt;# Populate sample data + scan in one command&lt;/span&gt;
docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;iris irispython &lt;span class="nt"&gt;-m&lt;/span&gt; irisapp.pii_discovery &lt;span class="nt"&gt;--populate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Sample Database: Testing with Realistic Data
&lt;/h2&gt;

&lt;p&gt;To make the utility immediately testable, the project includes a sample database in the &lt;code&gt;PIISample&lt;/code&gt; schema with three tables that cover the main PII patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PIISample.Patients&lt;/strong&gt; — Structured single-field PII. Each column holds one type of personal data: full names, email addresses, phone numbers, SSNs/CPFs, and street addresses. The table deliberately mixes US and Brazilian records to exercise both NLP models. Non-PII columns (Diagnosis, AdmissionDate) serve as internal controls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PIISample.CustomerFeedback&lt;/strong&gt; — Free-text PII. Narrative paragraphs contain PII embedded in natural language — the hardest detection pattern. Examples include &lt;em&gt;"My SSN is 111-22-3333 for insurance verification"&lt;/em&gt; and &lt;em&gt;"Meu CPF é 345.678.901-22"&lt;/em&gt;. Two rows contain no PII at all, acting as negative controls within the table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PIISample.Products&lt;/strong&gt; — No PII. A control table with product names, categories, prices, and stock quantities. Ideally the scanner should produce zero findings here — in practice, the small NLP model produces false positives, which we will examine in the results section.&lt;/p&gt;

&lt;p&gt;The sample data is populated by a Python function (&lt;code&gt;populate()&lt;/code&gt;) that runs during Docker build and can be re-invoked at any time. It uses &lt;code&gt;DROP TABLE IF EXISTS&lt;/code&gt; before each &lt;code&gt;CREATE TABLE&lt;/code&gt;, making it idempotent and safe to call repeatedly.&lt;/p&gt;
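&lt;p&gt;The drop-then-create pattern is what makes repeated calls safe. The sketch below demonstrates the idea with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in database, since IRIS is not available outside the container. The table layout and the rows are simplified and partly invented, and the real &lt;code&gt;populate()&lt;/code&gt; would issue its statements through IRIS instead.&lt;/p&gt;

```python
import sqlite3


def populate(connection):
    """Recreate and fill a small Products table; safe to call repeatedly."""
    cur = connection.cursor()
    cur.execute("DROP TABLE IF EXISTS Products")
    cur.execute("CREATE TABLE Products (ProductName TEXT, Category TEXT)")
    cur.executemany(
        "INSERT INTO Products VALUES (?, ?)",
        [("Wireless Mouse", "Electronics"), ("Yoga Mat", "Sports")],
    )
    connection.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the IRIS connection
populate(conn)
populate(conn)  # second call rebuilds the table instead of failing or duplicating rows
row_count = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]
print(row_count)  # 2
```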

&lt;h2&gt;
  
  
  Results: What the Scanner Found — and What It Got Wrong
&lt;/h2&gt;

&lt;p&gt;Running the scanner against the sample database produces something like the following report:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;schema_name,table_name,column_name,pii_type,confidence
PIISample,CustomerFeedback,CustomerName,PERSON,0.85
PIISample,CustomerFeedback,FeedbackText,EMAIL_ADDRESS,1.0
PIISample,CustomerFeedback,CreatedAt,DATE_TIME,0.85
PIISample,Patients,FullName,PERSON,0.85
PIISample,Patients,Email,EMAIL_ADDRESS,1.0
PIISample,Patients,Phone,PHONE_NUMBER,0.4
PIISample,Patients,SSN,PHONE_NUMBER,0.4
PIISample,Patients,DateOfBirth,DATE_TIME,0.85
PIISample,Patients,Address,LOCATION,0.85
PIISample,Patients,Diagnosis,LOCATION,0.85
PIISample,Patients,AdmissionDate,DATE_TIME,0.85
PIISample,Products,ProductName,PERSON,0.85
PIISample,Products,Category,LOCATION,0.85
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



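&lt;p&gt;The true positives are clear: names detected as PERSON, emails as EMAIL_ADDRESS, phone numbers as PHONE_NUMBER, addresses as LOCATION. Confidence scores help reviewers prioritize, but only up to a point. In this report, emails score 1.0 and phone-number matches score 0.4, while every NLP-based detection scores 0.85, including the false positives on the Products table. The score therefore separates pattern matches from NLP detections, but on its own it cannot separate a real name from a misread product name.&lt;/p&gt;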

&lt;p&gt;But the results also reveal the limitations of the current approach — and they are not limited to edge cases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Products — not a clean pass.&lt;/strong&gt; The Products table was designed as a no-PII control, containing only product names, categories, prices, and stock quantities. Yet the scanner reports &lt;code&gt;PERSON&lt;/code&gt; in ProductName and &lt;code&gt;LOCATION&lt;/code&gt; in Category. Product names like "Wireless Mouse" and categories like "Sports" are misidentified by the NLP model because the small spaCy model lacks the contextual understanding to distinguish generic nouns from personal names or place names. This is the most striking false positive in the results: a table with zero PII produces two findings, demonstrating exactly where the small model trade-off hurts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagnosis flagged as LOCATION.&lt;/strong&gt; Medical diagnoses like "Hypertension" and "Diabetes Type 2" are misclassified as LOCATION. This is another NLP false positive — the small model confuses medical terminology with geographic references.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SSN detected as PHONE_NUMBER.&lt;/strong&gt; The Patients.SSN column contains values like &lt;code&gt;123-45-6789&lt;/code&gt; (US SSN) and &lt;code&gt;123.456.789-00&lt;/code&gt; (Brazilian CPF). Presidio has dedicated recognizers for both &lt;code&gt;US_SSN&lt;/code&gt; and &lt;code&gt;CPF&lt;/code&gt;, but for these digit-heavy values the PHONE_NUMBER recognizer came out with the highest score (0.4 in the report above). PHONE_NUMBER is a pattern-based recognizer in Presidio, not a spaCy entity, so this looks like a conflict between overlapping pattern recognizers rather than an NLP error. The scanner reports the highest-scoring entity, which in this case is the wrong one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Date columns flagged as DATE_TIME.&lt;/strong&gt; Values like &lt;code&gt;1985-03-15&lt;/code&gt; trigger the DATE_TIME recognizer. Whether dates of birth and admission dates constitute PII is context-dependent: under HIPAA they are, under some interpretations of GDPR they might not be (on their own). The scanner makes no policy judgment — it reports what it finds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One PII type per column.&lt;/strong&gt; The scanner's &lt;code&gt;scan_column()&lt;/code&gt; method returns the first PII type found in a column. If a column contains both email addresses and phone numbers (as FeedbackText does), only the first type detected gets reported. This is by design for the MVP — a full inventory might list all detected types per column.&lt;/p&gt;
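&lt;p&gt;A fuller inventory could aggregate every detected type per column instead of stopping at one. The sketch below assumes the identifier is changed to hand back all &lt;code&gt;(entity_type, score)&lt;/code&gt; pairs rather than a single best entity. The function name &lt;code&gt;summarize_column&lt;/code&gt; and the sample detections are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict


def summarize_column(detections):
    """Map each entity type seen in a column to its best score and hit count.

    detections is an iterable of (entity_type, score) pairs, one per
    detection across all sampled values of the column.
    """
    summary = defaultdict(lambda: {"max_score": 0.0, "hits": 0})
    for entity_type, score in detections:
        entry = summary[entity_type]
        entry["max_score"] = max(entry["max_score"], score)
        entry["hits"] += 1
    return dict(summary)


# Hypothetical detections for a free-text column such as FeedbackText.
feedback_detections = [
    ("EMAIL_ADDRESS", 1.0),
    ("PHONE_NUMBER", 0.4),
    ("EMAIL_ADDRESS", 1.0),
    ("PERSON", 0.85),
]
summary = summarize_column(feedback_detections)
for entity_type in sorted(summary):
    print(entity_type, summary[entity_type])
```

&lt;p&gt;Reporting a hit count next to the score would also give reviewers a second signal: a type seen once in a hundred sampled rows deserves less attention than one seen in most of them.&lt;/p&gt;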

&lt;h2&gt;
  
  
  The spaCy Small Model Trade-off
&lt;/h2&gt;

&lt;p&gt;The false positives and misclassifications stem from a deliberate architectural choice: using spaCy's &lt;strong&gt;small&lt;/strong&gt; models (&lt;code&gt;_sm&lt;/code&gt; suffix) rather than medium (&lt;code&gt;_md&lt;/code&gt;) or large (&lt;code&gt;_lg&lt;/code&gt;) variants.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variant&lt;/th&gt;
&lt;th&gt;Size (EN)&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Load Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;en_core_web_sm&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~12 MB&lt;/td&gt;
&lt;td&gt;Lower&lt;/td&gt;
&lt;td&gt;~100 MB&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;en_core_web_md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~40 MB&lt;/td&gt;
&lt;td&gt;Higher&lt;/td&gt;
&lt;td&gt;~300 MB&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;en_core_web_lg&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~560 MB&lt;/td&gt;
&lt;td&gt;Highest&lt;/td&gt;
&lt;td&gt;~1 GB&lt;/td&gt;
&lt;td&gt;Slow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The small models were chosen for the MVP because they keep the Docker image lean, startup fast, and run comfortably within the memory constraints of a containerized IRIS instance. For a proof-of-concept that needs to demonstrate feasibility, this is the right trade-off.&lt;/p&gt;

&lt;p&gt;But the trade-off is real. spaCy's small pipelines ship without static word vectors, which makes them weaker on unfamiliar words and leads to coarser entity boundaries. In practice, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More false positives&lt;/strong&gt; — The sample database results demonstrate this concretely: the Products table, which contains zero PII, produces two false positive findings (&lt;code&gt;PERSON&lt;/code&gt; in ProductName and &lt;code&gt;LOCATION&lt;/code&gt; in Category). Common nouns like "Wireless Mouse" or "Sports" are misidentified because the small model lacks the word vectors to distinguish them from personal names or place names. Similarly, medical diagnoses like "Hypertension" are misclassified as LOCATION.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More misclassifications&lt;/strong&gt;: SSN and CPF patterns, while matched by Presidio's regex recognizers, can be out-scored by the PHONE_NUMBER recognizer. That recognizer is also pattern-based, so this particular error may persist with a larger model unless recognizer scores or context words are tuned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Poorer context understanding&lt;/strong&gt; — The small model may fail to distinguish &lt;em&gt;"My name is John"&lt;/em&gt; (PERSON) from &lt;em&gt;"John Deere Equipment"&lt;/em&gt; (ORGANIZATION) without sufficient surrounding context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upgrading to medium or large models would improve accuracy significantly, but at a cost:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt; — The large English model alone requires ~1 GB of RAM at runtime, plus a similar footprint for Portuguese. In a containerized environment, this constrains how many workloads can run alongside IRIS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt; — Loading large models adds 5–10 seconds of startup time per scan. For a scheduled task running at 2 AM, this is acceptable. For an interactive scan triggered from a UI, it may not be.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image size&lt;/strong&gt; — The Docker image would grow by hundreds of megabytes, increasing build times and storage requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An alternative path is replacing spaCy with transformer-based models (e.g., HuggingFace BERT or RoBERTa fine-tuned for NER), which offer state-of-the-art accuracy. Presidio supports this via its &lt;code&gt;NlpEngineProvider&lt;/code&gt; — you can configure a Transformers-backed engine instead of spaCy. But transformer models carry even heavier resource requirements: GPU inference for acceptable latency, multiple gigabytes of memory, and significantly longer processing times per text.&lt;/p&gt;

&lt;p&gt;The architecture of this MVP — with the PIIIdentifier fully isolated from the scanner — makes this upgrade path straightforward. Swap the NLP engine configuration, and the rest of the pipeline continues to work unchanged.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pros and Cons
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Strengths
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data sovereignty.&lt;/strong&gt; Data never leaves the IRIS process. No external APIs, no network transfers, no intermediate files containing raw PII. The analysis happens where the data lives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-friction deployment.&lt;/strong&gt; Runs inside the same Docker container as IRIS. No separate service to deploy, monitor, or secure. One command to scan, one CSV file as output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bilingual detection.&lt;/strong&gt; Dual-language support (English + Portuguese) out of the box, with a clean pattern for adding more languages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-intrusive.&lt;/strong&gt; Uses sampling (&lt;code&gt;SELECT TOP N&lt;/code&gt;) rather than full table scans. Configurable sample size and schema exclusions let you control scope and impact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task Scheduler integration.&lt;/strong&gt; Automatic periodic scans via the IRIS Management Portal, with configurable output path and sample size, so no cron jobs or external schedulers are needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular architecture.&lt;/strong&gt; AI detection, database scanning, and reporting are fully decoupled. Upgrading the detection engine is a one-file change.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small model accuracy.&lt;/strong&gt; As discussed, the spaCy small models produce false positives and misclassifications. This is the most significant limitation for production use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One PII type per column.&lt;/strong&gt; The current scanner reports a single entity type per column, not the full set of PII types present. A column containing both emails and phone numbers will only report one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No column-level exclusion.&lt;/strong&gt; You can exclude schemas, but not individual columns. A &lt;code&gt;notes&lt;/code&gt; column that is already known to contain PII cannot be left out of the report to reduce noise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No incremental scanning.&lt;/strong&gt; Every run scans all tables from scratch. There is no tracking of previously scanned tables or columns, which limits scalability for large databases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sample-based detection.&lt;/strong&gt; If PII exists only in row 101 and beyond, a &lt;code&gt;SELECT TOP 100&lt;/code&gt; sample will miss it. Random sampling (e.g., &lt;code&gt;TABLESAMPLE&lt;/code&gt;) would be more robust but is not yet implemented.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No false negative analysis.&lt;/strong&gt; No systematic search for false negatives was performed in this work. PII that exists in the database but is not flagged by the scanner goes unnoticed — unlike false positives, which are visible in the report and can be reviewed by a human, false negatives are invisible. The report should be treated as a lower bound of PII presence, not a complete inventory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker build time.&lt;/strong&gt; Installing Presidio, spaCy, and downloading two NLP models adds significant time to the Docker build. This is a one-time cost but can be painful during development iterations.&lt;/li&gt;
&lt;/ul&gt;
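&lt;p&gt;For the sample-based limitation, one possible alternative to &lt;code&gt;SELECT TOP N&lt;/code&gt; is reservoir sampling on the Python side. This is a different technique from the &lt;code&gt;TABLESAMPLE&lt;/code&gt; idea mentioned above and is not part of the project. The sketch uses a plain range as a stand-in for a row cursor, and &lt;code&gt;reservoir_sample&lt;/code&gt; is an illustrative name.&lt;/p&gt;

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for index, row in enumerate(rows):
        if k > index:
            reservoir.append(row)  # fill the reservoir with the first k rows
        else:
            # Row number index + 1 replaces a kept row with probability k / (index + 1).
            j = rng.randint(0, index)
            if k > j:
                reservoir[j] = row
    return reservoir


rows = range(1, 1001)  # stand-in for a cursor over 1000 table rows
sample = reservoir_sample(rows, 100, seed=42)
print(len(sample))  # 100
```

&lt;p&gt;The trade-off is that every row must be read once, so this gives up part of the non-intrusive property in exchange for coverage beyond the first N rows.&lt;/p&gt;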

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The project runs on InterSystems IRIS Community Edition in Docker. Clone the repository, build the image, and start the container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose build
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The sample database is populated automatically during the build. To run your first scan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;iris irispython &lt;span class="nt"&gt;-m&lt;/span&gt; irisapp.pii_discovery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The report will be written to &lt;code&gt;pii_report.csv&lt;/code&gt; in the project root. Open it, review the findings, and compare them against the sample data to understand what the scanner catches — and what it doesn't.&lt;/p&gt;

&lt;p&gt;You can browse the sample database &lt;a href="http://localhost:55038/csp/sys/exp/%25CSP.UI.Portal.SQL.Home.zen?$NAMESPACE=IRISAPP" rel="noopener noreferrer"&gt;here&lt;/a&gt; by choosing the &lt;code&gt;PIISample&lt;/code&gt; schema. Use the default IRIS Community Edition credentials (&lt;code&gt;_system&lt;/code&gt;/&lt;code&gt;SYS&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;From there, try the &lt;code&gt;--populate&lt;/code&gt; flag to reset the sample data, change the sample size with &lt;code&gt;-s&lt;/code&gt;, or point the scanner at a different namespace with &lt;code&gt;-n&lt;/code&gt;. The &lt;code&gt;--populate&lt;/code&gt; flag is particularly useful: it resets the sample tables and runs the scan in one step, making iteration fast.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is an MVP — a proof of concept that demonstrates the compute-to-data approach for PII discovery inside InterSystems IRIS. The small NLP models are a starting point, not a ceiling. The architecture is built to grow.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was developed with the assistance of Artificial Intelligence tools for drafting and language refinement. All technical validation and final review were performed by the author.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>sql</category>
      <category>security</category>
      <category>python</category>
      <category>database</category>
    </item>
    <item>
      <title>AI-Powered Clinical Matching: Introducing iris-medmatch</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Sat, 30 May 2026 18:50:37 +0000</pubDate>
      <link>https://dev.to/intersystems/ai-powered-clinical-matching-introducing-iris-medmatch-21a9</link>
      <guid>https://dev.to/intersystems/ai-powered-clinical-matching-introducing-iris-medmatch-21a9</guid>
      <description>&lt;p&gt;In the modern healthcare landscape, finding clinically similar patients often feels like looking for a needle in a haystack. Traditional keyword searches often fail because medical language is highly nuanced; a search for "Heart Failure" might miss a record containing "Congestive Cardiac Failure."&lt;/p&gt;
&lt;p&gt;I am excited to share &lt;strong&gt;iris-medmatch&lt;/strong&gt;, an AI-powered patient matching engine built on &lt;em&gt;&lt;strong&gt;InterSystems IRIS for Health&lt;/strong&gt;&lt;/em&gt;. By leveraging &lt;em&gt;Vector Search&lt;/em&gt;, this tool understands clinical intent rather than just matching literal strings.&lt;/p&gt;
&lt;h4&gt;The Core Innovation: Semantic Clinical Search&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;iris-medmatch&lt;/code&gt; bridges the gap between raw FHIR data and actionable AI insights. By utilizing the &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; model, the engine transforms clinical conditions into mathematical vectors.&lt;/p&gt;
&lt;p&gt;While standard searches look for exact words, this engine understands &lt;strong&gt;clinical context&lt;/strong&gt;. For example, it can match a patient with "Hypertension" to a search for "High Blood Pressure" using mathematical vector similarity.&lt;/p&gt;
&lt;h4&gt;✨ Key Technical Features&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core&lt;/strong&gt;: InterSystems IRIS, Embedded Python, InterSystems FHIR Server, Vector Search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI&lt;/strong&gt;: Python, ONNX Runtime, HuggingFace Transformers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Angular 18+&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Technical Architecture&lt;/h4&gt;
&lt;p&gt;The strength of this solution lies in its architectural efficiency. By running Transformers via Embedded Python, we eliminate "data gravity" issues. The data stays in IRIS, and the AI processing happens where the data lives.&lt;/p&gt;
&lt;p&gt;🚀 Application Walkthrough&lt;/p&gt;
&lt;p&gt;1. Semantic Similarity Search (The "Wow" Factor)&lt;/p&gt;
&lt;p&gt;This module uses Vector Search to understand medical synonyms. A search for "Cardiac Issues" will find "Myocardial Infarction" by comparing their vector positions within IRIS. This is achieved using native IRIS SQL, which calculates similarity scores in under a second.&lt;/p&gt;
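&lt;p&gt;To illustrate what "comparing vector positions" means, here is a small pure-Python sketch of cosine similarity, one common similarity measure. Whether the project uses cosine or another metric is an assumption, and the three-dimensional vectors below are invented for illustration. In the application itself the comparison runs in IRIS SQL, not in Python.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors, invented for illustration; real
# all-MiniLM-L6-v2 embeddings have 384 dimensions.
query = [0.9, 0.1, 0.3]  # stands in for "Cardiac Issues"
conditions = {
    "Myocardial Infarction": [0.8, 0.2, 0.4],
    "Ankle Sprain": [0.1, 0.9, 0.2],
}
ranked = sorted(
    conditions,
    key=lambda name: cosine_similarity(query, conditions[name]),
    reverse=True,
)
print(ranked)  # ['Myocardial Infarction', 'Ankle Sprain']
```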

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz49enk42f9179fu48kz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz49enk42f9179fu48kz7.png" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2. Patient Directory &amp;amp; Condition Enrichment&lt;/p&gt;
&lt;p&gt;This module manages existing FHIR resources. Users can add new diagnoses through a high-performance modal, demonstrating real-time synchronization between standard FHIR data and AI-ready vector data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zt9snzhosh35dnkgdgr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zt9snzhosh35dnkgdgr.png" alt=" " width="799" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3. New Patient Registration&lt;/p&gt;
&lt;p&gt;A streamlined entry point for creating new &lt;code&gt;Patient&lt;/code&gt; resources within the InterSystems ecosystem. This features direct interaction with the FHIR R4 Repository via standard RESTful POST requests, ensuring data is indexed and searchable immediately.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq9vlgx4gygeoskxk1eb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq9vlgx4gygeoskxk1eb.png" alt=" " width="800" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Conclusion&lt;/p&gt;
&lt;p&gt;iris-medmatch demonstrates how InterSystems IRIS is evolving into a comprehensive AI-Native database. By combining the reliability of FHIR with the power of Vector Search, we can create healthcare applications that truly "understand" the clinical data they store.&lt;/p&gt;

</description>
      <category>github</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>An Introduction to AI Hub, Part 2: Custom MCP Servers</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Sat, 30 May 2026 18:39:47 +0000</pubDate>
      <link>https://dev.to/intersystems/an-introduction-to-ai-hub-part-2-custom-mcp-servers-4fol</link>
      <guid>https://dev.to/intersystems/an-introduction-to-ai-hub-part-2-custom-mcp-servers-4fol</guid>
      <description>&lt;p&gt;Welcome back to a series of introductory articles on AI Hub, the new product feature currently in an early access program! (links: &lt;a href="https://evaluation.intersystems.com/Eval/early-access/AIHub" rel="noopener noreferrer"&gt;EAP Site&lt;/a&gt; for download, &lt;a href="https://github.com/intersystems-community/ai-hub-eap/tree/master" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In the last article, we covered how to create agents and agent tools directly in ObjectScript using the new %AI classes. Sometimes, however, instead of creating a new agent, you just want to add some custom tools to an existing one, so you can ask your local Claude Code, Codex, Copilot or other agent of choice to query your data directly. This is where MCP servers come in.&lt;/p&gt;
&lt;p&gt;In this guide, we will walk through how you can create your own MCP Servers to access your data.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Disclaimer: AI Hub is an early access preview, with features likely to change before production releases. Any issues identified can be raised on the documentation GitHub repo linked above. The EAP preview is not to be used in production settings.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h2&gt;A very brief intro to MCP&lt;/h2&gt;
&lt;p&gt;I'm going to keep this brief because there are loads of other good articles on the Model Context Protocol (MCP) and MCP servers (I recommend starting with &lt;a href="https://community.intersystems.com/post/model-context-procotol-mcp-intersystems-iris-zero-hero" rel="noopener noreferrer"&gt;this article&lt;/a&gt; from &lt;span&gt;&lt;span&gt;&lt;a class="mentioned-user" href="https://dev.to/pietro"&gt;@pietro&lt;/a&gt;.DiLeo&lt;/span&gt;&lt;/span&gt; or this &lt;a href="https://www.youtube.com/watch?v=pieK0dog66Q" rel="noopener noreferrer"&gt;brilliant introductory video&lt;/a&gt; from InterSystems President Don Woodlock).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Model Context Protocol is a transport protocol allowing external tools to be added to an agent&lt;/strong&gt;. There is a discovery 'handshake' where the MCP server sends a list of tools to the MCP Client. After the tools are discovered, the agent can send requests for tool executions, including parameters, to the MCP server, which executes the tool call and returns the result.&lt;/p&gt;
&lt;p&gt;MCP servers can be remote, i.e. running on a different machine from the client; this usually uses a streamable HTTP/HTTPS connection or Server-Sent Events. They can also be local, i.e. running on the same machine as the client, usually using a stdio connection.&lt;/p&gt;
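&lt;p&gt;As a rough illustration of the discovery and execution steps, the sketch below builds the two JSON-RPC 2.0 messages a client sends once a session is open: &lt;code&gt;tools/list&lt;/code&gt; to discover tools and &lt;code&gt;tools/call&lt;/code&gt; to run one. The initial &lt;code&gt;initialize&lt;/code&gt; exchange is left out, and the tool name and arguments are hypothetical, not taken from any real server.&lt;/p&gt;

```python
import json


def list_tools_request(request_id):
    """JSON-RPC 2.0 request asking an MCP server which tools it offers."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def call_tool_request(request_id, tool_name, arguments):
    """JSON-RPC 2.0 request asking an MCP server to execute one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    print(json.dumps(list_tools_request(1), indent=2))
    # "get_patient_count" and its argument are hypothetical, for illustration only.
    print(json.dumps(call_tool_request(2, "get_patient_count", {"status": "active"}), indent=2))
```

&lt;p&gt;In practice your MCP client builds and sends these messages for you; seeing their shape mainly helps when reading logs or debugging a connection.&lt;/p&gt;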
&lt;h3&gt;An important distinction&lt;/h3&gt;
&lt;p&gt;AI Hub allows you to create custom MCP servers within your IRIS environment, allowing agents to access or monitor your IRIS databases, productions and statuses. &lt;strong&gt;It is not a pre-configured MCP server&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If you are looking for a developer tool which gives your agent free access to an IRIS environment to speed up development, you may be looking for a pre-configured MCP server. If you are looking to create production MCP servers which are secure, auditable and fit within IRIS's governed security environment, AI Hub is what you are looking for.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;There are many pre-configured MCP servers which provide tools for developing with IRIS, including &lt;a href="https://github.com/intersystems-community/iris-agentic-dev" rel="noopener noreferrer"&gt;iris-agentic-dev&lt;/a&gt;, an MCP tool and skills library created by &lt;span&gt;&lt;span&gt;@tomd&lt;/span&gt;&lt;/span&gt;. This is a separate project from AI Hub, so look out for an article about this!&lt;/p&gt;
&lt;h2&gt;MCP in AI Hub&lt;/h2&gt;
&lt;p&gt;In the previous article, we covered creating agent tools and toolsets. Here we will go through how to serve these tools as an MCP server using both HTTP and STDIO. The code covered in this article is available in the ai-hub-dev-template, which is a nice place to start if you want to play around with IRIS AI Hub.&lt;/p&gt;
&lt;p&gt;Before getting to this, though, let's take a look at the architecture of an AI Hub MCP server:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjl2trccwf8s9jk6o24n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjl2trccwf8s9jk6o24n.png" alt=" " width="799" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MCP Clients (blue) communicate with the &lt;code&gt;iris-mcp-server&lt;/code&gt; binary, which bridges the gap between the MCP calls (discovery and execution) and IRIS. This binary then communicates with an MCP Server web application, defined using &lt;code&gt;%AI.MCP.Service&lt;/code&gt; as a dispatch class. This dispatch class then routes the tool calls to ObjectScript tool classes, which can then operate on IRIS databases. This diagram skips the reverse routing (returning the tool responses) as well as the initial handshake between the MCP Client and &lt;code&gt;iris-mcp-server&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;There are more details on this architecture in the documentation, but this simplified view covers the key elements that we need to define. These are:&lt;br&gt;1. Tools / Toolsets&lt;br&gt;2. %AI.MCP.Service dispatch class&lt;br&gt;3. MCP Application&lt;br&gt;4. iris-mcp-server configuration&lt;br&gt;5. MCP Client connection&lt;/p&gt;
&lt;p&gt;Let's go through these one by one.&lt;/p&gt;
&lt;h3&gt;1. Tools&lt;/h3&gt;
&lt;p&gt;We define tools or toolsets by extending %AI.Tool or %AI.ToolSet. This was covered in detail in &lt;a href="https://community.intersystems.com/post/introduction-ai-hub-part-1-agents-objectscript" rel="noopener nofollow noreferrer"&gt;Part 1&lt;/a&gt;, so I'm going to skip over it.&lt;/p&gt;
&lt;h3&gt;2. Defining the dispatch class&lt;/h3&gt;
&lt;p&gt;To define an MCP Service dispatch class, we just need to extend &lt;code&gt;%AI.MCP.Service&lt;/code&gt; and point it at the tools/toolsets we want to include in the &lt;code&gt;SPECIFICATION&lt;/code&gt; parameter:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Class Sample.MCPService Extends %AI.MCP.Service
{
    Parameter SPECIFICATION = "Sample.ToolSet";
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We could include multiple tool/toolset classes by adding the classes as a comma-separated list, but here we've kept it simple with just one.&amp;nbsp;&lt;/p&gt;
&lt;h3&gt;3. Creating the MCP Application&lt;/h3&gt;
&lt;p&gt;Next up, we create an MCP server application. Like other web applications, this can be managed from the Management Portal, or programmatically with the Security.Applications class. For MCP servers, there is a new dedicated page in the Management Portal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o7v1ug1yqlxgzfxbvaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o7v1ug1yqlxgzfxbvaj.png" alt=" " width="793" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But everything else will feel familiar to developers creating Web Applications in IRIS.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi93aw5uul4yootmclev3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi93aw5uul4yootmclev3.png" alt=" " width="799" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key points are to give an endpoint (e.g. &lt;code&gt;/mcp/sample&lt;/code&gt;) and the MCP Service Class we created earlier. I won't show the programmatic version, but this can be done with &lt;code&gt;Security.Applications&lt;/code&gt;: just set the &lt;code&gt;Type&lt;/code&gt; value to &lt;code&gt;18&lt;/code&gt; to register it in the MCP Server menu.&lt;/p&gt;
&lt;p&gt;At this point, you can see the JSON description of the tools being served at http://localhost:52773/mcp/sample/v1/services. This means it is discoverable by the &lt;code&gt;iris-mcp-server&lt;/code&gt; binary, but &lt;strong&gt;it is not discoverable directly by an MCP client.&lt;/strong&gt;&lt;/p&gt;
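&lt;p&gt;A quick way to confirm the web application is serving its catalog is to fetch that URL from a script. The sketch below uses only the Python standard library and assumes the application is still unauthenticated and that IRIS is listening on port 52773; it prints whatever JSON comes back without assuming anything about its structure.&lt;/p&gt;

```python
import json
import urllib.request


def catalog_url(base_url, endpoint):
    """Build the tool-catalog URL for an MCP web application endpoint."""
    return base_url.rstrip("/") + "/" + endpoint.strip("/") + "/v1/services"


def fetch_catalog(url, timeout=5):
    """GET the catalog and decode it as JSON (its structure is not assumed here)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = catalog_url("http://localhost:52773", "/mcp/sample")
    print(json.dumps(fetch_catalog(url), indent=2))
```

&lt;p&gt;If this raises a connection or HTTP error, check the web application definition before moving on to the bridge configuration.&lt;/p&gt;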
&lt;h3&gt;4. iris-mcp-server configuration&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;iris-mcp-server&lt;/code&gt; binary takes a configuration file when it is run. It is started with the following command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;iris-mcp-server -c config.toml run&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We'll get to exactly how this command is actually used in the next section (MCP Client), but for now we are going to focus on the config file.&lt;/p&gt;
&lt;p&gt;The first thing to do when writing your config file is to set your connection to IRIS - this requires the credentials for a gateway-privileged user e.g. CSPSystem, the superserver port used for web-gateway (default 1972) and your MCP endpoints:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[[iris]]
name = "local"
server = { host = "localhost", port = 1972, username = "SuperUser", password = "SYS" }
endpoints = [
    {path = "/mcp/sample" }
]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then you have to define your transport. This is done in the &lt;code&gt;[mcp]&lt;/code&gt; block.&lt;/p&gt;
&lt;p&gt;For &lt;code&gt;stdio&lt;/code&gt;, you just need to set the type of transport:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[mcp]
transport="stdio"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For http/https, you also have to give the host and port:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[mcp]
transport = "http"
host      = "0.0.0.0"
port      = 8080&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;This port is a different port to the management portal&lt;/strong&gt; and will be used only for this MCP server. Mixing the two up is a common mistake because, although you can find the tool catalog on the management portal port at &lt;a href="http://localhost:52773/mcp/sample/v1/services" rel="noopener noreferrer"&gt;http://localhost:52773/mcp/sample/v1/services&lt;/a&gt;, to actually connect to the MCP server you have to set a different port to communicate with the &lt;code&gt;iris-mcp-server&lt;/code&gt; bridge.&lt;/p&gt;
&lt;h4&gt;Authentication&lt;/h4&gt;
&lt;p&gt;We set the web application to unauthenticated above, but this should be avoided for production use. To add authentication, first set the web application to password authenticated. You can add a username and password or a bearer token to the endpoint:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[[iris]]
name = "local"
server = { host = "localhost", port = 1972, username = "SuperUser", password = "SYS" }
endpoints = [
    { path = "/mcp/sample", username = "SuperUser", password = "SYS" }
]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you are using an HTTP/HTTPS connection, you can also choose not to authenticate here and instead handle authentication in the requests from the MCP client. This is shown in the section on connecting from an MCP client below.&lt;/p&gt;
&lt;h4&gt;Other settings&lt;/h4&gt;
&lt;p&gt;There are loads more settings to configure here, like setting up &lt;strong&gt;OAuth&lt;/strong&gt;, using environment &lt;strong&gt;secrets&lt;/strong&gt; rather than hard-coding settings, configuring &lt;strong&gt;logging and telemetry&lt;/strong&gt; and enabling &lt;strong&gt;smart tool discovery&lt;/strong&gt;. For more details, there is a &lt;a href="https://github.com/intersystems-community/ai-hub-eap/blob/master/MCP_Server_Guide.md" rel="noopener nofollow noreferrer"&gt;full guide on the iris-mcp-server usage&lt;/a&gt;, but for basic usage you just need to define &lt;code&gt;[[iris]]&lt;/code&gt; and &lt;code&gt;[mcp]&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;5. Connecting from an MCP Client&lt;/h2&gt;
&lt;p&gt;The method for adding an MCP server will differ depending on which client you are using, but in general there will be an option somewhere in your agent customization settings to add an MCP server. For example, to set up an MCP server on GitHub Copilot, type &lt;code&gt;&amp;gt;MCP: Add Server...&lt;/code&gt; into the VS Code Search bar, or for Claude Code, you can run &lt;code&gt;claude mcp add...&lt;/code&gt;.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;The first option will likely be a choice between stdio or http(s) transport. These have quite different connection methods, so let's tackle them individually.&lt;/p&gt;
&lt;h3&gt;Stdio&lt;/h3&gt;
&lt;p&gt;To use a stdio MCP server, you add &lt;code&gt;/path/to/iris-mcp-server&lt;/code&gt; as the executable. The default location for this in a Docker container is &lt;code&gt;/usr/irissys/bin&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You also need to add arguments for the stdio config file &lt;code&gt;config_stdio.toml&lt;/code&gt; and the run command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/usr/irissys/bin/iris-mcp-server -c config_stdio.toml run&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As the authentication is included in the config file, this is all that is required.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Note, the &lt;code&gt;iris-mcp-server&lt;/code&gt; binary has to be on the same machine (or container) as your MCP client!&lt;/p&gt;&lt;/blockquote&gt;
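&lt;p&gt;Many clients store stdio servers as a small JSON entry made of a command and an argument list. The sketch below generates one in the &lt;code&gt;mcpServers&lt;/code&gt; layout that Claude Code reads from a project &lt;code&gt;.mcp.json&lt;/code&gt; file; other clients use a different top-level key, so treat the exact shape as an assumption to check against your client's documentation. The label &lt;code&gt;iris-sample&lt;/code&gt; is arbitrary.&lt;/p&gt;

```python
import json


def stdio_server_entry(binary_path, config_path):
    """Describe a stdio MCP server as a command plus its argument list."""
    return {"command": binary_path, "args": ["-c", config_path, "run"]}


if __name__ == "__main__":
    client_config = {
        "mcpServers": {
            # "iris-sample" is an arbitrary label chosen for this example.
            "iris-sample": stdio_server_entry(
                "/usr/irissys/bin/iris-mcp-server", "config_stdio.toml"
            )
        }
    }
    print(json.dumps(client_config, indent=2))
```

&lt;p&gt;The client launches the command itself, so a relative config path is resolved from the client's working directory; an absolute path is usually the safer choice.&lt;/p&gt;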
&lt;h3&gt;HTTP&lt;/h3&gt;
&lt;p&gt;To connect to a remote HTTP MCP server from an MCP client, you first need to start the iris-mcp-server transport by opening a shell and running:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;iris-mcp-server -c config_http.toml run&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unlike the STDIO connection, this needs to be continuously running for the HTTP connection to be usable.&lt;/p&gt;
&lt;p&gt;With this running, we can then add this server to our MCP client by selecting HTTP as the transport or type, and giving the server URL: http://localhost:8080/mcp/sample.&lt;/p&gt;
&lt;p&gt;If we want to set authentication at the MCP connection level (rather than the configuration level detailed above), we use standard HTTP authentication headers, like &lt;code&gt;Basic base64(Username:Password)&lt;/code&gt; or &lt;code&gt;Bearer &amp;lt;token&amp;gt;&lt;/code&gt;. The following Python snippet shows an example of connecting to an MCP server using LangChain's &lt;code&gt;MultiServerMCPClient&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import base64
from langchain_mcp_adapters.client import MultiServerMCPClient

# Base64-encode "username:password" for the HTTP Basic Authorization header
AUTH_HEADER = base64.b64encode(b"SuperUser:SYS").decode("utf-8")


async def get_tools():
    client = MultiServerMCPClient(
        {
            "minimal": {
                "transport": "http",
                "url": "http://localhost:8080/mcp/sample",
                "headers": {"Authorization": f"Basic {AUTH_HEADER}"},
            }
        }
    )

    tools = await client.get_tools()
    return tools&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Conclusions&lt;/h1&gt;
&lt;p&gt;We've reached the end of Part 2 of this series! It has been a long part, so well done if you have got this far. Hopefully this guide will give you the confidence to use the AI Hub preview to start building your own MCP servers inside IRIS, giving agents secure and governed access to your IRIS instance.&lt;/p&gt;
&lt;p&gt;If you want to see example code from this article, it is all included in the &lt;a href="https://openexchange.intersystems.com/package/ai-hub-dev-template" rel="noopener noreferrer"&gt;ai-hub-dev-template&lt;/a&gt; on Open Exchange. This is an example Docker project that you can easily clone and use to start working with AI Hub on your local machine. In it, there is a sample MCP server as well as example code to programmatically create the MCP Server Web Application.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
    <item>
      <title>New SMART on FHIR v2 Scopes</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Sun, 24 May 2026 15:13:59 +0000</pubDate>
      <link>https://dev.to/intersystems/new-smart-on-fhir-v2-scopes-e3j</link>
      <guid>https://dev.to/intersystems/new-smart-on-fhir-v2-scopes-e3j</guid>
      <description>&lt;p&gt;In v2026.1 we introduced support for a more robust and real-life secure authorization for your FHIR endpoints.&lt;/p&gt;
&lt;p&gt;This is achieved by using &lt;a href="https://docs.intersystems.com/irisforhealthlatest/csp/docbook/DocBook.UI.Page.cls?KEY=HXFHIRADM_server_auth#HXFHIRADM_server_auth_oauth_scopes" rel="noopener noreferrer"&gt;SMART on FHIR v2 fine-grained scopes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjt7urawsgh29q7so1ca0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjt7urawsgh29q7so1ca0.png" alt=" " width="800" height="435"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;h2&gt;Focus - Not SMART in general, rather, the fine-grained scopes; Hands-on easy sample&lt;/h2&gt;
&lt;p&gt;I have dived into the topic of SMART on FHIR in the past; see, for example, &lt;a href="https://community.intersystems.com/post/smart-fhir-app-sample-hands-exerciseworkshop-instructions" rel="noopener noreferrer"&gt;this article&lt;/a&gt; I wrote (with an accompanying &lt;a href="https://openexchange.intersystems.com/package/smart-day-hands-on" rel="noopener noreferrer"&gt;Open Exchange app&lt;/a&gt;, and &lt;a href="https://www.youtube.com/watch?v=OHaZ5qiyQ1c" rel="noopener noreferrer"&gt;related video series&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Others have also discussed this topic, for example &lt;span&gt;&lt;span&gt;@LuisAngel.PérezRamos&lt;/span&gt;&lt;/span&gt; in his &lt;a href="https://community.intersystems.com/post/developing-smart-fhir-applications-auth0-and-intersystems-iris-fhir-server-introduction" rel="noopener noreferrer"&gt;Developing SMART On FHIR Applications with Auth0 and InterSystems IRIS FHIR Server&lt;/a&gt; article series, &lt;a class="mentioned-user" href="https://dev.to/nicole"&gt;@nicole&lt;/a&gt;.Sun&lt;span&gt;&lt;span&gt;&amp;nbsp;in her&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;a href="https://community.intersystems.com/post/smart-fhir-ehr-launch-iris-health" rel="noopener noreferrer"&gt;SMART on FHIR EHR Launch with IRIS for Health&lt;/a&gt; article, and &lt;a class="mentioned-user" href="https://dev.to/kate"&gt;@kate&lt;/a&gt;.Lau&lt;span&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;in her two-part&amp;nbsp;&lt;a href="https://community.intersystems.com/post/using-postman-testing-oauth20-intersystems-fhir-repository-part1" rel="noopener noreferrer"&gt;Using Postman for testing the OAuth2.0 of the InterSystems FHIR repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In addition, this Learning Services video, &lt;a href="https://www.youtube.com/watch?v=wAB-msyXq_8" id="OWAb43c6521-1114-4055-9f43-89a408a783e1" rel="noopener noreferrer"&gt;Configuring OAuth for InterSystems FHIR Server&lt;/a&gt;, explains this nicely, and even demonstrates part of the latest SMART scope-based result filtering that we'll discuss here.&lt;/p&gt;
&lt;p&gt;In the above-mentioned articles and samples, we used either InterSystems IRIS itself as the OAuth Server or a 3rd party cloud OAuth Server (like Auth0 by Okta). In this article and sample I want to do a few things differently:&lt;/p&gt;
&lt;p&gt;1. I want to use a 3rd party OAuth Server, but not one you will need to register (and perhaps pay) for. 
This will be &lt;a href="https://www.keycloak.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;Keycloak&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;2. I want to take care of all of the setup for you in a &lt;strong&gt;Dockerized sample&lt;/strong&gt; -&amp;nbsp;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;span&gt;&lt;em&gt;InterSystems IRIS for Health&lt;/em&gt;&lt;/span&gt; up and running with a FHIR Endpoint defined, including an OAuth client defined, and some Resources in the Repository.&lt;/li&gt;
&lt;li&gt;
&lt;span&gt;&lt;em&gt;Keycloak &lt;/em&gt;&lt;/span&gt;up and running with a client corresponding to the IRIS OAuth client.&lt;/li&gt;
&lt;li&gt;A &lt;span&gt;&lt;em&gt;Postman &lt;/em&gt;&lt;/span&gt;Collection to allow for quick testing and demonstration.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;3. I want to focus on the relatively new &lt;strong&gt;fine-grained SMART scopes&lt;/strong&gt;, not the basic ones. This is really the crux of the matter here. The other two items above are enablers that let us focus on just this one.&lt;/p&gt;
&lt;h2&gt;SMART Scopes - The Granular Fine-grained Version&lt;/h2&gt;
&lt;p&gt;The infographic above (thank you, NotebookLM) summarizes the general syntax and usage of SMART scopes.&lt;/p&gt;
&lt;p&gt;In particular I want to focus on the filter part: the part of the scope that follows the question mark (?).&lt;/p&gt;
&lt;p&gt;Here you can use standard FHIR Search syntax, with standard FHIR Search Parameters.&lt;/p&gt;
&lt;p&gt;Let's take this example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5n17wu4jx19gnypxukg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5n17wu4jx19gnypxukg.png" alt=" " width="799" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of just allowing access (Read &amp;amp; Search in this case) to all categories of Observations, here we are allowing only access to lab results (category=laboratory).&lt;/p&gt;
&lt;p&gt;So to illustrate, instead of getting access to a set like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpb9gm49k6et2pd63kmro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpb9gm49k6et2pd63kmro.png" alt=" " width="800" height="714"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can limit the access to a set like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmg0nc9x3461rasx2teq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmg0nc9x3461rasx2teq.png" alt=" " width="800" height="705"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This allows for an ABAC (Attribute-Based Access Control) approach to access FHIR data (see more about this topic in the &lt;a href="https://build.fhir.org/security.html#binding" rel="noopener noreferrer"&gt;FHIR docs&lt;/a&gt;).&lt;/p&gt;
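&lt;p&gt;To make the scope syntax discussed above tangible, here is a small Python sketch that splits a SMART v2 resource scope such as &lt;code&gt;user/Observation.rs?category=laboratory&lt;/code&gt; into its context, resource type, permission letters and search filter. The function name and the returned dictionary layout are choices made for this illustration; this is not how IRIS parses scopes internally.&lt;/p&gt;

```python
from urllib.parse import parse_qs


def parse_smart_scope(scope):
    """Split a SMART v2 resource scope into its parts."""
    head, _, query = scope.partition("?")
    context, _, rest = head.partition("/")
    resource_type, _, permissions = rest.partition(".")
    return {
        "context": context,                # patient, user or system
        "resource_type": resource_type,    # e.g. Observation, or * for all types
        "permissions": list(permissions),  # subset of c, r, u, d, s
        "filter": parse_qs(query),         # FHIR search parameters after the ?
    }


if __name__ == "__main__":
    print(parse_smart_scope("user/Observation.rs?category=laboratory"))
```

&lt;p&gt;A scope without a question mark, such as &lt;code&gt;user/Patient.rs&lt;/code&gt;, simply yields an empty filter, which corresponds to the older resource-level granularity.&lt;/p&gt;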
&lt;p&gt;Some national regulations mandate enforcing this kind of access control, including, for example, using security tags to limit access to data.&lt;/p&gt;
&lt;p&gt;One example is the ONC certification in the US, but other countries have similar demands.&lt;/p&gt;
&lt;p&gt;So supporting this is not only important for securing your data; in a growing number of places it is also a hard legal requirement.&lt;/p&gt;
&lt;h2&gt;The Power to Filter (or Not to)&lt;/h2&gt;
&lt;p&gt;When a FHIR request does not adhere exactly to the scopes, you can control whether the server filters out the unauthorized data and returns just what is allowed, or fails the request and returns an HTTP 403 error status.&lt;/p&gt;
&lt;p&gt;This setting is in the FHIR endpoint Authorization settings:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft058qt6elitfrzr2jluw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft058qt6elitfrzr2jluw.png" alt=" " width="799" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that this is relevant not only to fine-grained scopes but also to scopes that do not use the ? filter. For example, if you use _include or the $everything operation, this could filter "whole" Resource Types out of the Result Set.&lt;/p&gt;
&lt;p&gt;Here's an example to illustrate -&lt;/p&gt;
&lt;h4&gt;Observation Search Example&lt;/h4&gt;
&lt;p&gt;Say we issue a Search for Observations, using Basic Authentication, so no SMART scopes are applied.&lt;/p&gt;
&lt;p&gt;You can see here we are getting back 793 Resources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0bt29oujkxpe55h8wqmf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0bt29oujkxpe55h8wqmf.png" alt=" " width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Looking a little closer we can see for example the first one has a category of vital-signs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1eypyi5t0u33kjl0ckwl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1eypyi5t0u33kjl0ckwl.png" alt=" " width="799" height="333"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In comparison, if I use OAuth 2 authentication and have a scope of user/Observation.rs?category=laboratory, we get only 385 (vs. 793 above) Resources:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzelpiswoe3m8426i671.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzelpiswoe3m8426i671.png" alt=" " width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And the first one (instead of vital-signs) unsurprisingly has a category of laboratory:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5zcmyrm8cltdtx6lm2y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5zcmyrm8cltdtx6lm2y.png" alt=" " width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A similar comparison can be seen with $everything -&lt;/p&gt;
&lt;h4&gt;$everything Example&lt;/h4&gt;
&lt;p&gt;With Basic Authentication (no Scopes):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmjlpi45fufagp91fb78q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmjlpi45fufagp91fb78q.png" alt=" " width="799" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We get, of course, the Patient Resource itself, but also related Resources (per the example above): Encounter, Practitioner, Organization, Condition, Claim, ExplanationOfBenefit, Observation (of various types), MedicationRequest, Immunization, DiagnosticReport.&lt;/p&gt;
&lt;p&gt;With OAuth (and scopes that include only: user/Patient.rs and user/Observation.rs?category=laboratory):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ahgk6ynaahkv24bposl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ahgk6ynaahkv24bposl.png" alt=" " width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here apart from the Patient itself, we only get Observations (laboratory ones), and no other related Resources.&lt;/p&gt;
&lt;p&gt;So, 171 Resources vs. 35 after the filtering.&lt;/p&gt;
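&lt;p&gt;The two behaviours described in this section (filter the results, or fail with a 403) can be pictured with a toy model. The sketch below is only an illustration of the idea, not how IRIS implements it: resources are plain dictionaries, the category is simplified to a single string rather than a FHIR CodeableConcept, and &lt;code&gt;apply_scopes&lt;/code&gt; and &lt;code&gt;Forbidden&lt;/code&gt; are names invented for this example.&lt;/p&gt;

```python
class Forbidden(Exception):
    """Stands in for an HTTP 403 response."""


def is_allowed(resource, allowed):
    """allowed maps a resource type to a required category, or None for no filter."""
    rtype = resource["resourceType"]
    if rtype not in allowed:
        return False
    required = allowed[rtype]
    return required is None or resource.get("category") == required


def apply_scopes(resources, allowed, filter_results):
    """Return the permitted resources, or raise Forbidden when filtering is off."""
    permitted = [r for r in resources if is_allowed(r, allowed)]
    if len(permitted) != len(resources) and not filter_results:
        raise Forbidden("request reaches beyond the granted scopes")
    return permitted


if __name__ == "__main__":
    bundle = [
        {"resourceType": "Patient"},
        {"resourceType": "Observation", "category": "laboratory"},
        {"resourceType": "Observation", "category": "vital-signs"},
        {"resourceType": "Encounter"},
    ]
    # Mirrors user/Patient.rs plus user/Observation.rs?category=laboratory
    allowed = {"Patient": None, "Observation": "laboratory"}
    print(len(apply_scopes(bundle, allowed, filter_results=True)))
```

&lt;p&gt;Calling the same function with &lt;code&gt;filter_results=False&lt;/code&gt; raises &lt;code&gt;Forbidden&lt;/code&gt; instead of silently trimming the result, which is the trade-off the endpoint setting lets you choose.&lt;/p&gt;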
&lt;h2&gt;Some Technical Notes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;As mentioned, you need to be on at least v2026.1 for this to be supported.&lt;/li&gt;
&lt;li&gt;Most FHIR Interactions are supported for this kind of Scope (Create, Read, Update, Delete, Search); some are not yet supported (History, VRead).&lt;/li&gt;
&lt;li&gt;As mentioned, you can use standard FHIR Search syntax in the filter search string, but some parameters simply won't make sense in this context (like _include). Some of these might fail the request and others might simply be ignored; see the referenced Docs for details.&lt;/li&gt;
&lt;li&gt;There are some notes regarding $everything and $lastn; again, see the Docs for details.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Debugging&lt;/h2&gt;
&lt;p&gt;When using OAuth in general, and SMART scopes in particular, not everything will always work as expected at first.&lt;/p&gt;
&lt;p&gt;Good resources for debugging your situation are the FHIR Server Log (aka FSLOG) and the HTTP Request Log (aka ISCLOG); see more details in &lt;a href="https://docs.intersystems.com/irisforhealth20261/csp/docbook/DocBook.UI.Page.cls?KEY=HXFHIRADM_server_debugMaintain#HXFHIRADM_server_debug_log" rel="noopener noreferrer"&gt;the Docs here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To illustrate, here's an example:&lt;/p&gt;
&lt;p&gt;Say this time we turned off the filter results setting, and we're trying to Search for all Observations while our scope allows only laboratory ones.&lt;/p&gt;
&lt;p&gt;We will get a 403 Forbidden HTTP status:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpm6ihjxkoaa48t6qxj3l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpm6ihjxkoaa48t6qxj3l.png" alt=" " width="799" height="194"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And if we turn on the FHIR Server Log, we can see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce2g0jfrh9xgajiv5py7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce2g0jfrh9xgajiv5py7.png" alt=" " width="799" height="48"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6odvcizcas74buyipml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6odvcizcas74buyipml.png" alt=" " width="794" height="41"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Sample Demo&lt;/h2&gt;
&lt;p&gt;Tackling a topic hands-on always helps and deepens understanding, so I encourage you to take the related Open Exchange app for a ride. It is a very simple click &amp;amp; go sample, where a docker compose builds and starts up everything you need, and it includes a sample Postman Collection for you to test drive it with.&lt;/p&gt;
&lt;p&gt;Here's a recording of the demo from a READY 2026 session:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.intersystems.com/smarter-scopes-in-action-live-demo-of-smart-v2-with-is-fhir-server-intersystems/" rel="noopener noreferrer"&gt;SMARTer Scopes in Action - Live Demo of SMART v2 with InterSystems FHIR Server&lt;/a&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>oauth</category>
      <category>security</category>
      <category>programming</category>
    </item>
    <item>
      <title>Continuous integration in IRIS with Git and Jenkins</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Sun, 24 May 2026 14:36:04 +0000</pubDate>
      <link>https://dev.to/intersystems/continuous-integration-in-iris-with-git-and-jenkins-a87</link>
      <guid>https://dev.to/intersystems/continuous-integration-in-iris-with-git-and-jenkins-a87</guid>
      <description>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In healthcare interoperability environments, InterSystems Health Connect typically contains critical components such as productions, business processes, operations, services, utility classes, routines, and other ObjectScript artifacts. Traditionally, many deployments of these components have been done manually, by copying classes, importing XML, or using administrative tools from the management portal.&lt;/p&gt;
&lt;p&gt;While this approach may work in the initial stages, it becomes difficult to maintain as the project grows, when multiple developers are working in parallel, or when repeatable deployments are needed across environments such as development, integration, pre-production, and production.&lt;/p&gt;
&lt;p&gt;A more robust alternative is to integrate Health Connect within a &lt;strong&gt;continuous integration&lt;/strong&gt; flow, using Git as the source code repository and Jenkins as the deployment orchestrator.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhl34gn2yqdfrs5k3m0d5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhl34gn2yqdfrs5k3m0d5.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The aim of this article is to show a practical approach to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Version Health Connect code on GitHub.&lt;/li&gt;
&lt;li&gt;Detect only the files modified since the last deployment.&lt;/li&gt;
&lt;li&gt;Copy those files to a staging folder.&lt;/li&gt;
&lt;li&gt;Load and compile the changes to a Health Connect namespace.&lt;/li&gt;
&lt;li&gt;Run the entire process remotely from Jenkins using SSH.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Architecture&lt;/h2&gt;
&lt;p&gt;For our example, we have configured the following elements:&lt;/p&gt;
&lt;h3&gt;IRIS for Health Instance&lt;/h3&gt;
&lt;p&gt;I have deployed InterSystems IRIS for Health on an AWS machine running RHEL 10, with its own Apache server, and enabled connectivity via HTTP and SSH.&lt;/p&gt;
&lt;p&gt;For development, I have configured Visual Studio Code to work on a local instance of IRIS, on which I will make the code changes that I will then upload to GitHub.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F809bozgf0dgwdui9b5z3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F809bozgf0dgwdui9b5z3.png" alt=" " width="800" height="465"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;h3&gt;GitHub repository&lt;/h3&gt;
&lt;p&gt;We have chosen GitHub as our version control system, taking advantage of the extension available in Visual Studio Code. This will allow us to work with branches if necessary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyz64svj9sc5zv3kneiro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyz64svj9sc5zv3kneiro.png" alt=" " width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This element will be key to the CI/CD process since it is where we can obtain the latest code developed for deployment.&lt;/p&gt;
&lt;h3&gt;Jenkins&lt;/h3&gt;
&lt;p&gt;For those of you who don't know Jenkins, it's an open-source automation server widely used for continuous integration processes because it has a multitude of plugins that will make the task easier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fql4ffpcym4bfnezvjbgj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fql4ffpcym4bfnezvjbgj.png" alt=" " width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Jenkins pipelines are written in a Groovy-based scripting language that allows us to implement the necessary steps for the integration process. For this example, we won't get too complicated.&lt;/p&gt;
&lt;h2&gt;Integration procedure&lt;/h2&gt;
&lt;p&gt;For this example, we've assumed we're working on an interoperability project with a DEVELOPMENT instance (deployed on the AWS server) where we want to deploy the changes developers make to their local instances for testing. The steps would be roughly as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The developer implements the functionalities in their local instance.&lt;/li&gt;
&lt;li&gt;The developer uploads changes to the corresponding branch of the GitHub repository.&lt;/li&gt;
&lt;li&gt;The person responsible for the deployment accesses Jenkins and launches a pipeline.&lt;/li&gt;
&lt;li&gt;Jenkins connects via SSH to the DEVELOPMENT server.&lt;/li&gt;
&lt;li&gt;A Linux shell script runs on the server.&lt;/li&gt;
&lt;li&gt;The script downloads the latest changes from the repository (a git fetch followed by a git reset --hard).&lt;/li&gt;
&lt;li&gt;The script identifies new or modified files and copies them to a server directory.&lt;/li&gt;
&lt;li&gt;With the files identified, the script invokes a second script in ObjectScript.&lt;/li&gt;
&lt;li&gt;The second script loads and compiles the files into the IRIS for Health instance.&lt;/li&gt;
&lt;li&gt;If the upload was successful, the script restarts production.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As you can see, we have chosen a very basic operation, but one that can be quite helpful.&lt;/p&gt;
&lt;p&gt;Let's now take a look at the scripts we will run using Jenkins on our DEVELOPMENT server:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/usr/bin/env bash
set -euo pipefail

# =========================
# Configuration
# =========================

REPO_URL="https://github.com/intersystems-ib/workshop-cicd-demo"
BRANCH="main"

# Local clone used to compare commits
CACHE_REPO="/opt/git-cache/project_repo"

# Folder to copy the files to be uploaded into Health Connect
EXPORT_DIR="/projectGit"

# File with the latest processed commit
STATE_FILE="${CACHE_REPO}/.last_sync_commit"

# Clean up EXPORT_DIR before copying the new updates
CLEAN_EXPORT_DIR="true"

# =========================
# Validations
# =========================

if ! command -v git &amp;gt;/dev/null 2&amp;gt;&amp;amp;1; then
  echo "Error: git is not installed."
  exit 1
fi

mkdir -p "${EXPORT_DIR}"
mkdir -p "$(dirname "${CACHE_REPO}")"

# =========================
# Clone or update cache folder
# =========================

if [ ! -d "${CACHE_REPO}/.git" ]; then
  echo "Cloning repository into cache..."
  git clone --branch "${BRANCH}" "${REPO_URL}" "${CACHE_REPO}"
else
  echo "Updating local cache..."
  git -C "${CACHE_REPO}" fetch origin
  git -C "${CACHE_REPO}" checkout "${BRANCH}"
  git -C "${CACHE_REPO}" reset --hard "origin/${BRANCH}"
fi

REMOTE_COMMIT="$(git -C "${CACHE_REPO}" rev-parse HEAD)"

# =========================
# First execution
# =========================

if [ ! -f "${STATE_FILE}" ]; then
  echo "First execution."
  echo "Copying all the contents of the branch into ${EXPORT_DIR}..."

  if [ "${CLEAN_EXPORT_DIR}" = "true" ]; then
    find "${EXPORT_DIR}" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
  fi

  rsync -av --delete --exclude ".git" "${CACHE_REPO}/" "${EXPORT_DIR}/"

  echo "${REMOTE_COMMIT}" &amp;gt; "${STATE_FILE}"
  echo "First export finished."
  exit 0
fi

LAST_COMMIT="$(cat "${STATE_FILE}")"

if [ "${LAST_COMMIT}" = "${REMOTE_COMMIT}" ]; then
  echo "No updates."
  exit 0
fi

echo "Comparing commits:"
echo "  previous: ${LAST_COMMIT}"
echo "  current:  ${REMOTE_COMMIT}"

if [ "${CLEAN_EXPORT_DIR}" = "true" ]; then
  echo "Cleaning up export folder..."
  find "${EXPORT_DIR}" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
fi

# =========================
# Export just added or modified files
# =========================

while IFS= read -r -d '' status &amp;amp;&amp;amp; IFS= read -r -d '' path1; do
  case "${status}" in
    M|A)
      echo "Exporting ${status}: ${path1}"
      mkdir -p "${EXPORT_DIR}/$(dirname "${path1}")"
      cp -f "${CACHE_REPO}/${path1}" "${EXPORT_DIR}/${path1}"
      ;;

    D)
      # Ignoring deletes
      echo "Ignoring deleted: ${path1}"
      ;;

    R*)
      # Renames carry a second path: the new file name
      IFS= read -r -d '' path2
      echo "Exporting renamed: ${path1} -&amp;gt; ${path2}"
      mkdir -p "${EXPORT_DIR}/$(dirname "${path2}")"
      cp -f "${CACHE_REPO}/${path2}" "${EXPORT_DIR}/${path2}"
      ;;

    *)
      echo "Change not automatically managed: ${status} ${path1}"
      ;;
  esac
done &amp;lt; &amp;lt;(git -C "${CACHE_REPO}" diff --name-status -z "${LAST_COMMIT}" "${REMOTE_COMMIT}")

echo "${REMOTE_COMMIT}" &amp;gt; "${STATE_FILE}"
echo "Incremental export concluded in ${EXPORT_DIR}"
echo "Starting file upload and compile in Health Connect"
(echo '_system'; echo 'SYS'; cat iris.script) | iris session IRISHEALTH
echo "Compilation successfully finished"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, this script fetches the latest code from our GitHub repository into a directory on the DEVELOPMENT server, detects the changes compared to the last processed commit, copies them to a second directory (&lt;strong&gt;/projectGit&lt;/strong&gt;) and finally invokes the IRIS script:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(echo '_system'; echo 'SYS'; cat iris.script) | iris session IRISHEALTH &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Those first two &lt;strong&gt;echo&lt;/strong&gt; commands pass the username and password to the terminal session we need to open to run our ObjectScript script:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;zn "DEMO"
set sc = $SYSTEM.OBJ.LoadDir("/projectGit/src/Demo", "ck", , 1)
if '$SYSTEM.Status.IsOK(sc) do $SYSTEM.Status.DisplayError(sc) quit
set production = "Demo.Order.Production"
set ^Ens.Configuration("csp","LastProduction") = production
do ##class(Ens.Director).SetAutoStart(production)
do ##class(Ens.Director).StartProduction(production)
write !,"Production started successfully: ",production,!&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This script is where we import the classes we've identified as modified or created and compile them. If the compilation is successful, we restart the corresponding production environment of our DEMO namespace so that the changes are implemented.&lt;/p&gt;
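&lt;p&gt;The change-detection step of the shell script can also be illustrated outside bash. The following Python sketch is not part of the original project; it mirrors the &lt;code&gt;while read&lt;/code&gt; loop by parsing the NUL-separated output of &lt;code&gt;git diff --name-status -z&lt;/code&gt;. The function name &lt;code&gt;files_to_export&lt;/code&gt; and the sample paths are assumptions made for illustration.&lt;/p&gt;

```python
def files_to_export(diff_output):
    """Return the paths to copy, given 'git diff --name-status -z' output."""
    tokens = iter(t for t in diff_output.split("\0") if t)
    paths = []
    for status in tokens:
        path = next(tokens)
        if status.startswith("R"):
            # A rename carries two paths: the old name first, then the new one.
            path = next(tokens)
            paths.append(path)
        elif status in ("A", "M"):
            paths.append(path)
        # Deletions ("D") and any other status are skipped, as in the shell script.
    return paths


if __name__ == "__main__":
    # Hypothetical diff: one modification, one deletion, one rename, one addition.
    sample = (
        "M\0src/Demo/A.cls\0"
        "D\0src/Demo/Old.cls\0"
        "R100\0src/Demo/B.cls\0src/Demo/C.cls\0"
        "A\0src/Demo/New.cls\0"
    )
    print(files_to_export(sample))
```

&lt;p&gt;Using the NUL separator (&lt;code&gt;-z&lt;/code&gt;) instead of newlines keeps the parsing safe for file names that contain spaces or other unusual characters.&lt;/p&gt;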
&lt;p&gt;Perfect, we have our scripts, our DEVELOPMENT server and our GitHub repository. Now let's configure Jenkins.&lt;/p&gt;
&lt;h2&gt;Configuring Jenkins&lt;/h2&gt;
&lt;p&gt;Before we start creating our pipeline, we must install a plugin that allows us to connect via SSH to our DEVELOPMENT server with our primary username and password.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9noio8ynmf9s94fskinv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9noio8ynmf9s94fskinv.png" alt=" " width="800" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the Jenkins configuration, we created an access credential to our DEVELOPMENT server:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsffpb4igwj2jeqpfbu59.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsffpb4igwj2jeqpfbu59.png" alt=" " width="800" height="1090"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And finally we proceed to create the Pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gsrqiumemwhc3toso8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gsrqiumemwhc3toso8x.png" alt=" " width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Within the pipeline configuration, we define the following script that will allow us to deploy:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pipeline {
    agent any

    parameters {
        string(name: 'GIT_BRANCH', defaultValue: 'main', description: 'Repository branch')
        string(name: 'REMOTE_HOST', defaultValue: 'ec2-**-**-***-**.**-*****.compute.amazonaws.com', description: 'Remote Host')
        string(name: 'REMOTE_USER', defaultValue: 'ec2-user', description: 'Remote SSH user')
        string(name: 'REMOTE_SCRIPT_NAME', defaultValue: 'shell_script.sh', description: 'Remote script name')
    }

    environment {
        REPO_URL = 'https://github.com/intersystems-ib/workshop-cicd-demo'
        SSH_CREDENTIALS_ID = 'ssh-healthconnect-remote'
    }

    stages {
        stage('Checkout') {
            steps {
                git branch: "${params.GIT_BRANCH}", url: "${env.REPO_URL}"
            }
        }

        stage('Validate script') {
            steps {
                sh '''
                    set -eu
                    test -f shell_script.sh
                    chmod +x shell_script.sh
                '''
            }
        }

        stage('Launch remote script') {
            steps {
                sshagent(credentials: ["${env.SSH_CREDENTIALS_ID}"]) {
                    sh '''
                        set -eu

                        ssh -o StrictHostKeyChecking=no "${REMOTE_USER}@${REMOTE_HOST}" \
                          "sudo sh '/${REMOTE_SCRIPT_NAME}'" | tee remote_execution.log
                    '''
                }
            }
        }
    }

    post {
        always {
            archiveArtifacts artifacts: 'remote_execution.log', allowEmptyArchive: true
        }
        success {
            echo 'Remote deployment successfully finished.'
        }
        failure {
            echo 'Remote deployment failed. Check remote_execution.log.'
        }
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What does our script do? It's very simple: it checks out the selected branch of our GitHub repository, verifies that the shell script is present and then, via SSH, sends the instruction to execute the Linux script that is in charge of downloading the changes and updating our instance.&lt;/p&gt;
&lt;p&gt;Let's see it in action with a small example.&lt;/p&gt;
&lt;h2&gt;Running the process&lt;/h2&gt;
&lt;p&gt;Our production is running normally and we want to make a change to one of our components so that the default value shown in one of the parameters is different:&lt;/p&gt;
&lt;br&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpst17pm4hv4qdf3y3si.png" alt=" " width="799" height="355"&gt;

&lt;p&gt;Now we want our &lt;strong&gt;TenantId&lt;/strong&gt; parameter to have the value ZZZ-999. Let's correct the code in our local instance from Visual Studio Code and upload the change to GitHub.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F057g8lmzb696c2z3148s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F057g8lmzb696c2z3148s.png" alt=" " width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;With our change now pushed to our repository, we can run the pipeline from our Jenkins instance. Let's see the pipeline's output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F965l4ikdejfqzv7yfklr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F965l4ikdejfqzv7yfklr.png" alt=" " width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Everything is correct; it has detected our change and executed the script successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1v5bc0c5uju6rhvavd91.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1v5bc0c5uju6rhvavd91.png" alt=" " width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's verify that the parameter has changed and production has restarted successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbztxqx2s4rwrczlin84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbztxqx2s4rwrczlin84.png" alt=" " width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There we have our new TenantId! A complete and resounding success!&lt;/p&gt;
&lt;h2&gt;Conclusions and next steps&lt;/h2&gt;
&lt;p&gt;As you may have noticed, IRIS imposes no technical limitations on taking part in a continuous integration process. You simply need the scripts that best suit your daily operations.&lt;/p&gt;
&lt;p&gt;In this article we have seen a small example of continuous integration with IRIS for Health, but this could be expanded to certain configurations that could be deployed using features such as Configuration Merge.&lt;/p&gt;
&lt;p&gt;Give it a try!&lt;/p&gt;

</description>
      <category>vscode</category>
      <category>automation</category>
      <category>programming</category>
      <category>github</category>
    </item>
    <item>
      <title>Introducing iris-synthetic-data-gen</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Sun, 17 May 2026 15:46:56 +0000</pubDate>
      <link>https://dev.to/intersystems/introducing-iris-synthetic-data-gen-2l8j</link>
      <guid>https://dev.to/intersystems/introducing-iris-synthetic-data-gen-2l8j</guid>
      <description>&lt;p&gt;Today I have published a new &lt;a href="https://openexchange.intersystems.com/package/iris-synthetic-data-gen" rel="noopener noreferrer"&gt;Open Exchange package&lt;/a&gt; for generation of Synthetic Data directly into IRIS.&lt;/p&gt;
&lt;p&gt;It can be a frustrating process to find decent datasets when you are looking to make a demo app. Maybe the dataset doesn't matter that much, but you still want it to appear somewhat genuine, with several linked tables that are usable directly within IRIS with the neat implicit joins with &lt;code&gt;-&amp;gt;&lt;/code&gt;. Or maybe you just want linked tables that are easily installable with IPM to benchmark queries. In either case, this dataset generation would be perfect.&lt;/p&gt;
&lt;p&gt;I have opted to create the datasets using Embedded Python, and they are configurable through custom config files. The datasets are generated directly with a single IRIS class method, and can be scaled with a multiplier to create datasets as small or as large as you want without having to adjust the configs.&lt;/p&gt;
&lt;p&gt;At the moment I have four datasets:&lt;br&gt;- Financial services (e.g. bank cards, accounts, transactions)&lt;br&gt;- Retail (stores, products, users, inventory)&lt;br&gt;- Supply Chain (products, sales orders, inventory movement)&lt;br&gt;- Theme Park management (parks, zones, rides, incidents)&lt;/p&gt;
&lt;p&gt;I am not an expert in any of these domains, so I doubt they are super accurate, and the data generation uses Python libraries like &lt;code&gt;faker&lt;/code&gt; and statistical weighted generation with &lt;code&gt;numpy&lt;/code&gt;, so it all feels a bit synthetic.&lt;/p&gt;
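&lt;p&gt;As a rough idea of what "scaled with a multiplier" and "weighted generation" can mean, here is a minimal sketch. It is not code from the package: it uses the standard library's &lt;code&gt;random&lt;/code&gt; module in place of &lt;code&gt;numpy&lt;/code&gt;, and the names &lt;code&gt;BASE_ROWS&lt;/code&gt;, &lt;code&gt;scaled_rows&lt;/code&gt; and &lt;code&gt;weighted_column&lt;/code&gt;, as well as the table names and weights, are hypothetical.&lt;/p&gt;

```python
import random

# Hypothetical base row counts per table (assumed values, not the package's config).
BASE_ROWS = {"stores": 5, "products": 40}


def scaled_rows(base_rows, multiplier):
    """Scale every table's base row count by a single multiplier."""
    return {table: max(1, round(count * multiplier)) for table, count in base_rows.items()}


def weighted_column(values, weights, n, seed=0):
    """Draw n values so that more heavily weighted values appear more often."""
    rng = random.Random(seed)  # fixed seed keeps the generated data reproducible
    return rng.choices(values, weights=weights, k=n)


if __name__ == "__main__":
    rows = scaled_rows(BASE_ROWS, 2)
    categories = weighted_column(["grocery", "toys", "garden"], [6, 3, 1], rows["products"])
    print(rows)
    print(categories[:10])
```

&lt;p&gt;A single multiplier keeps the proportions between linked tables intact, so the same configuration can serve both a tiny demo and a larger benchmark.&lt;/p&gt;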
&lt;p&gt;I will also be honest that, as a side-of-desk project which I couldn't give a huge amount of time to, this project was only made possible by AI. I used AI extensively for the design of datasets and the generation of the code to create the datasets. I supervised, tested for personal use cases and was very involved with the project design, but the code is all AI generated and I have not carefully reviewed the dataset generation process.&lt;/p&gt;
&lt;p&gt;For me, this project is a great use case for full "vibe coding", i.e. letting the agent handle the entire coding process. That is to say, the consequences of bugs are low, as these datasets are not designed for any production use. The code can largely be judged on the results it outputs, in the knowledge that the details or edge cases don't matter.&lt;/p&gt;
&lt;p&gt;It's also a good template for making new datasets - the first of the datasets took me a couple of hours of careful planning, discussion with agents, and iterating on how best to create the dataset and add it to IRIS. For the last dataset, by contrast, I could ask the agent "Create a new dataset with retail tables that is configured and generated like the others here", and it did a pretty good job without any real oversight.&lt;/p&gt;
&lt;p&gt;I hope this can be useful for some, and feel free to give feedback, contributions or to use it as a template to make your own synthetic datasets!&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>embeddedpython</category>
      <category>productivity</category>
    </item>
    <item>
      <title>An Introduction to AI Hub, Part 1: Agents in ObjectScript</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Sun, 17 May 2026 15:44:11 +0000</pubDate>
      <link>https://dev.to/intersystems/an-introduction-to-ai-hub-part-1-agents-in-objectscript-2p1e</link>
      <guid>https://dev.to/intersystems/an-introduction-to-ai-hub-part-1-agents-in-objectscript-2p1e</guid>
      <description>&lt;p&gt;For those of you that weren't at READY last week, you may have missed the exciting announcement that the Early Access Program for AI Hub is officially open. It was announced during an amazing demo from &lt;span&gt;&lt;a class="mentioned-user" href="https://dev.to/benjamin"&gt;@benjamin&lt;/a&gt;.DeBoe&lt;/span&gt; and &lt;span&gt;@Jeffrey.Fried&lt;/span&gt;, I recommend catching up with this demo when the recording is released! &amp;nbsp;I had the opportunity to play with AI Hub in advance, and thought I might share an introduction with the community.&lt;/p&gt;
&lt;p&gt;Before getting into the details, &lt;a href="https://github.com/intersystems-community/ai-hub-eap/tree/master" rel="noopener nofollow noreferrer"&gt;here is a link for the documentation&lt;/a&gt; and &lt;a href="https://evaluation.intersystems.com/Eval/early-access/AIHub" rel="noopener nofollow noreferrer"&gt;here is a link to the EAP portal to download AI Hub&lt;/a&gt;, its currently available as standalone install kits or container images.&amp;nbsp;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Please note, this is a preview and there are likely to be significant changes before the official release, it is not designed for production use, and you may run into some issues - if you do, raise an issue on the Github page!&lt;/p&gt;&lt;/blockquote&gt;
&lt;h2&gt;Agents&lt;/h2&gt;
&lt;p&gt;The most exciting feature, for me at least, has been the new ObjectScript agents SDK. You can now create agents and tools directly in ObjectScript, using an intuitive SDK.&lt;/p&gt;
&lt;p&gt;Creating an Agent is simple: you give it a system prompt with the &lt;code&gt;XData INSTRUCTIONS&lt;/code&gt; component, then just set the provider, model and tools:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Class Sample.Agent Extends %AI.Agent
{
    /// LLM Model
    Parameter MODEL = "gpt-5-nano";

    /// Toolsets that the agent can use
    Parameter TOOLSETS = "Sample.ToolSet";
    
    /// System Prompt
    XData INSTRUCTIONS [ MimeType = text/markdown ]
    {
    # Sample Assistant

    You are a helpful assistant with access to a set of tools to interact with a database of people.
    }

    Method %OnInit() As %Status
    {
        // Set provider with API key from environment variable
        Set key = $System.Util.GetEnviron("OPENAI_API_KEY")  // or whatever
        Set ..Provider = ##class(%AI.Provider).Create("openai", {"api_key": (key)})
        
        Return $$$OK
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Tools&lt;/h2&gt;
&lt;p&gt;Tools are even easier to create: it's as simple as extending &lt;code&gt;%AI.Tools&lt;/code&gt;. After that, all methods, class methods and queries become tools that agents can use. So we can do something like the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Class Sample.Tools Extends %AI.Tool [dependsOn=Sample.Person]
{

/// Tool to add a person to the database
Method AddPerson(name As %String, age As %Integer) As %Status{
   Set person = ##class(Sample.Person).%New()
   Set person.Name = name
   Set person.Age = age
   Set sc =  person.%Save()
   Quit sc
}

/// Tool to query the database for people younger than a specified age
Query GetPeopleYoungerThan(age As %Integer) As %SQLQuery(ROWSPEC = "Name:%String,Age:%Integer") [ SqlProc ]
{
   SELECT Name, Age From Sample.Person Where Age &amp;lt; :age
}

}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tools can also be organised into toolsets. As the name suggests, these are sets of tools; they can combine tools from different classes, filter tools by regex matching, add policies and use MCP servers defined outside of IRIS.&lt;/p&gt;
&lt;p&gt;In the example below we combine the tools we defined above, &lt;code&gt;Sample.Tools&lt;/code&gt;, with a policy which logs tool calls to the terminal (&lt;code&gt;%AI.Policy.ConsoleAudit&lt;/code&gt;) and a custom Python MCP server.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Class Sample.ToolSet Extends %AI.ToolSet [DependsOn=Sample.Tools]
{
    XData Definition
    {
        &amp;lt;ToolSet&amp;gt;
            &amp;lt;Description&amp;gt;Sample Toolset&amp;lt;/Description&amp;gt;
            
            &amp;lt;Policies&amp;gt;
                &amp;lt;!--Policy to log tool calls to console--&amp;gt;
                &amp;lt;Audit Class="%AI.Policy.ConsoleAudit"/&amp;gt;
            &amp;lt;/Policies&amp;gt;
            
            &amp;lt;!--ObjectScript Tools--&amp;gt;
            &amp;lt;Include Class="Sample.Tools"&amp;gt;&amp;lt;/Include&amp;gt;
            
            &amp;lt;!--Python MCP Server created with FastMCP--&amp;gt;
            &amp;lt;MCP Name="PythonServer"&amp;gt;
                &amp;lt;Stdio Executable="/usr/irissys/bin/irispython"
                Args="/home/irisowner/dev/src/Python/multiplication_mcp.py" /&amp;gt;
            &amp;lt;/MCP&amp;gt;
            
        &amp;lt;/ToolSet&amp;gt;
    }    
}&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Other ObjectScript features&lt;/h2&gt;
&lt;p&gt;There are plenty more features for building powerful agents, including support for agent skills (&lt;code&gt;%AI.Agent.Skill&lt;/code&gt;), delegation of tasks to subagents (&lt;code&gt;%AI.Agent.SubAgent&lt;/code&gt;) and tools for creating knowledge bases with RAG (&lt;code&gt;%AI.RAG&lt;/code&gt;). You can also create custom audit or authentication policies, to either log tool calls or decide whether they should be allowed.&lt;/p&gt;
&lt;p&gt;One very cool feature is that tools and toolsets can be &lt;code&gt;stateful&lt;/code&gt;, meaning they retain the state between tool calls. As such, a tool could be called multiple times, with the actions of the previous tool call being retained. For example, a file could be opened once and the contents 'remembered' the next time the tool is called. To use this, define tools with methods (instead of class methods) and save attributes as properties. There's a nice example of this in the &lt;a href="https://github.com/intersystems-community/ai-hub-eap/blob/master/ObjectScript_SDK_Guide.md#stateful-tools-instance-methods" rel="noopener nofollow noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I've been working with AI Hub for well over a month now, and I'm still overwhelmed by the number of features, particularly at the advanced end, that I have yet to explore.&lt;/p&gt;
&lt;h2&gt;Template&lt;/h2&gt;
&lt;p&gt;If you want to start playing around with AI Hub, I published a &lt;a href="https://openexchange.intersystems.com/package/ai-hub-dev-template" rel="noopener noreferrer"&gt;dev template to the Open Exchange&lt;/a&gt; which includes instructions for downloading and building the AI Hub container, and has a few pre-loaded sample classes (you might recognise them from this article). It even has some agent skills, in case you'd like your AI agent of choice to know what's in the documentation before you do!&lt;/p&gt;
&lt;p&gt;The template also creates an MCP server and includes instructions on how to connect to it.&lt;/p&gt;
&lt;h2&gt;Next time&lt;/h2&gt;
&lt;p&gt;In my next article, I'll show how you can package your agent tools into an MCP server to connect directly to your data from any MCP client!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gpt3</category>
      <category>documentation</category>
      <category>programming</category>
    </item>
    <item>
      <title>IRIS Dockerization and Embedded Python for Data Science — One-Command Setup for Reproducible ML Workflows</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Thu, 30 Apr 2026 15:41:55 +0000</pubDate>
      <link>https://dev.to/intersystems/iris-dockerization-and-embedded-python-for-data-science-one-command-setup-for-reproducible-ml-am4</link>
      <guid>https://dev.to/intersystems/iris-dockerization-and-embedded-python-for-data-science-one-command-setup-for-reproducible-ml-am4</guid>
      <description>&lt;p&gt;A single command is all it takes to spin up an entire IRIS instance for Data Science projects. We then use that instance to compare the speed of three query methods (Dynamic SQL, Pandas Query, and Globals).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0hxffduy4g1brhtt646.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0hxffduy4g1brhtt646.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before joining InterSystems, I worked in a team of web developers as a data scientist. Most of my day-to-day work involved training and embedding ML models in Python-based backend applications through microservices, mainly built with the Django framework and using PostgreSQL for sourcing the data. During development, testing, and deployment, I realized the importance of repeatability of results, both for the model’s inferences and for the performance inside the application, regardless of the hardware being used to run the code.&lt;/p&gt;

&lt;p&gt;This naturally went hand in hand with adopting good coding practices, such as modularization to reduce code repetition and boilerplate, making maintenance easier and speeding up development. For this reason, Docker in particular became an essential tool in our workflow, not only for scalability and ease of deployment, but also to reduce human error and ensure that code behaves the same way everywhere, regardless of the underlying machine.&lt;/p&gt;

&lt;p&gt;When I joined InterSystems, I was immediately impressed by the robustness of IRIS as a data platform. Its resilience to human error when following guidelines to create services through productions, the multi-model nature of how information can be stored, and, in particular, the lightning-fast access to data through globals opened my eyes to a different way of thinking about performance and data access patterns, especially when compared to a traditional relational-only mindset.&lt;/p&gt;

&lt;p&gt;I was also lucky to join the company (September 2025) at a time when a rich ecosystem of tools was already in place, significantly flattening the learning curve. The VS Code ObjectScript Extension Pack, Embedded Python, the official IRIS Docker images, and the InterSystems Package Manager (IPM) for easily importing ObjectScript packages (&lt;a href="https://github.com/intersystems/ipm" rel="noopener noreferrer"&gt;https://github.com/intersystems/ipm&lt;/a&gt;) quickly became my everyday toolbelt.&lt;/p&gt;

&lt;p&gt;After about three months, I felt confident enough working with this stack that I started standardizing my own development environment. In this article, I’d like to share how I set up a fully containerized IRIS instance for Data Science projects using Docker—ready to use Embedded Python out of the box, with all required dependencies installed from both Python’s &lt;code&gt;pip&lt;/code&gt; and IPM.&lt;/p&gt;

&lt;p&gt;I’ll also use this setup to share some insights on the incredible speed of using globals to query tables, in a practical scenario where the popular gradient boosting model &lt;strong&gt;LightGBM&lt;/strong&gt; is used to train and make inferences on a mock dataset. This allows us to measure inference speed while comparing the different querying approaches available in IRIS.&lt;/p&gt;

&lt;p&gt;Some important highlights that will be addressed in this article are how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Link custom Python packages during the Docker build process, so they can be imported naturally (e.g. &lt;code&gt;from mypythonpackage import myclassorfunc&lt;/code&gt;) inside any Embedded Python methods living on ObjectScript classes, without repetitive boilerplate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Automatically execute IRIS terminal commands as soon as the container starts, which in this scenario is used to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Import custom ObjectScript packages into IRIS.&lt;/li&gt;
&lt;li&gt;Install IPM and, through it, Shvarov’s &lt;code&gt;csvgenpy&lt;/code&gt; utility
(&lt;a href="https://community.intersystems.com/post/csvgenpy-import-any-csv-intersystems-iris-using-embedded-python" rel="noopener noreferrer"&gt;https://community.intersystems.com/post/csvgenpy-import-any-csv-intersystems-iris-using-embedded-python&lt;/a&gt;),
used to create and populate new tables from a single CSV file.&lt;/li&gt;
&lt;li&gt;Check whether an IRIS table already exists and, if it doesn’t, populate it using &lt;code&gt;csvgenpy&lt;/code&gt; with a CSV file mounted into the container via Docker volumes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;All of this happens by running a single command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  docker-compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, the repository accompanying this article uses this setup to create a complete IRIS environment with all the tools and data needed to compare different ways of querying the same IRIS table and converting the results into a Pandas DataFrame (NumPy-based), which is typically what gets passed to Python-based machine learning models.&lt;/p&gt;

&lt;p&gt;The comparison includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic SQL queries&lt;/li&gt;
&lt;li&gt;Pandas querying the table directly&lt;/li&gt;
&lt;li&gt;Direct access through globals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For each approach, execution time is measured to quantitatively compare the performance of the different querying methods. This analysis shows that direct global access provides the lowest-latency data retrieval for machine learning inference workloads by far.&lt;/p&gt;

&lt;p&gt;At the same time, consistency across querying methods is validated by asserting equality of the resulting Pandas DataFrames, ensuring that identical dataframes (and therefore identical downstream ML predictions) are produced regardless of the query mechanism used.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.
├── docker-compose.yml             # Docker orchestration configuration
├── dockerfile                     # Multi-stage build with IRIS + Python
├── iris_autoconf.sh               # Auto-configuration script for IRIS terminal commands
├── requirements.txt               # Python libraries
├── MockPackage/                   # Custom package
│   ├── MockDataManager.cls        # Data management utilities
│   ├── MockModelManager.cls       # ML model training
│   └── MockInference.cls          # Data retrieval and inference benchmarks
├── python_utils/                  # Custom Python packages
│   ├── __init__.py
│   ├── utils.py                   # ML preprocessing &amp;amp; inference
│   └── querymethods.py            # Methods for Querying IRIS tables
└── dur/                           # Volume for durable data on host machine and container
    ├── data/                      # CSV datasets
    └── models/                    # Trained LightGBM models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Dockerization of IRIS
&lt;/h2&gt;

&lt;p&gt;This section describes the main building blocks used to dockerize a Python-ready IRIS instance. The goal here is not only to run IRIS inside a container, but to do so in a way that makes it immediately usable for Data Science workflows: Embedded Python enabled, Python dependencies installed, ObjectScript packages available through IPM, and data automatically loaded when the container starts.&lt;/p&gt;

&lt;p&gt;The setup relies on three main components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker-compose.yml&lt;/code&gt; to define how the IRIS container is built and run&lt;/li&gt;
&lt;li&gt;a multi-stage &lt;code&gt;Dockerfile&lt;/code&gt; to prepare Embedded Python and dependencies&lt;/li&gt;
&lt;li&gt;an &lt;code&gt;iris_autoconf.sh&lt;/code&gt; script to automate IRIS-side configuration at startup&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  docker-compose.yml
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: '3.8'

services:
  iris:
    build: # How is the image built
      context: . # Path to the directory containing the Dockerfile
      dockerfile: Dockerfile # Name of the Dockerfile
    container_name: iris-experimentation # Name of the container
    ports:
      - "1972:1972"    # SuperServer port
      - "52773:52773"  # Management Portal/Web Gateway
    volumes:
      - ./dur/.:/dur:rw # map host directory to container directory with read-write permissions
    restart: always # Always restart the container if it stops (unless explicitly stopped)
    healthcheck:
      test: ["CMD", "iris", "session", "iris", "-U", "%SYS", "##class(SYS.Database).GetMountedSize()"] # Health check command
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    command: --after "/usr/irissys/iris_autoconf.sh" # Run autoconf script after startup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker Compose specifies how the IRIS container is built, which ports are exposed, how storage is handled, and what commands are executed at startup. In particular, I want to highlight the following points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;volumes: ./dur/.:/dur:rw&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates the &lt;code&gt;/dur&lt;/code&gt; directory inside the container and maps it to &lt;code&gt;./dur&lt;/code&gt; (relative to the location of &lt;code&gt;docker-compose.yml&lt;/code&gt;) on the host machine, with both read and write permissions.&lt;/p&gt;

&lt;p&gt;In practice, this means that the host machine and the container share the same directory. This makes it very easy to load files into IRIS and inspect or modify them from the host without any extra copying steps.&lt;/p&gt;

&lt;p&gt;In this project, this is how the &lt;code&gt;data/&lt;/code&gt; and &lt;code&gt;models/&lt;/code&gt; folders are made directly available inside the container under &lt;code&gt;/dur&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;command: --after "/usr/irissys/iris_autoconf.sh"&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This command allows the execution of a bash script immediately after the container is up and running. The script contains all the commands needed to open an IRIS terminal session and execute any required IRIS-side configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; The commands in this script are executed every time the container starts. This means that if the container goes down for any reason and restarts (for example, due to &lt;code&gt;restart: always&lt;/code&gt;), all the commands in this script will be executed again. If this behavior is not taken into account when writing the script, it can lead to unintended side effects such as reinstalling packages or resetting tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  dockerfile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Stage 1: Build stage for installing dependencies
FROM python:3.12-slim AS builder

# Set the working directory
WORKDIR /app

# Copy the requirements file into the image
COPY requirements.txt requirements.txt

# Install the Python dependencies into a temporary location
RUN pip install --no-cache-dir --target /install -r requirements.txt

# Stage 2: Final image with InterSystems IRIS and the installed Python libraries
FROM containers.intersystems.com/intersystems/iris-community:latest-em

# Switch to the root user to install necessary system packages
USER root

# Install the correct Python 3.12 development library for Ubuntu Noble
RUN apt-get update &amp;amp;&amp;amp; apt-get install -y libpython3.12-dev wget &amp;amp;&amp;amp; \
    rm -rf /var/lib/apt/lists/*

# Set the environment variables for Embedded Python
ENV PythonRuntimeLibrary=/usr/lib/x86_64-linux-gnu/libpython3.12.so
ENV PythonRuntimeLibraryVersion=3.12

# Update the LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}

# Copy the installed Python packages from the builder stage
COPY --from=builder /install /usr/irissys/mgr/python

# Your own Python package
COPY python_utils /usr/irissys/mgr/python/python_utils
ENV PYTHONPATH=/usr/irissys/mgr/python:${PYTHONPATH}


# Copy ObjectScript classes into the image
COPY MockPackage /usr/irissys/mgr/MockPackage
# Copy and set permissions for the autoconf script while still root
COPY iris_autoconf.sh /usr/irissys/iris_autoconf.sh
RUN chmod +x /usr/irissys/iris_autoconf.sh

# Switch back to the default `irisowner` user
USER irisowner
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a two-stage Dockerfile.&lt;/p&gt;

&lt;p&gt;The first stage is a lightweight build stage used to install all Python dependencies listed in &lt;code&gt;requirements.txt&lt;/code&gt; into a temporary directory. This keeps the final image clean and avoids installing build tools directly into the IRIS image.&lt;/p&gt;

&lt;p&gt;The second stage is based on the official InterSystems IRIS image. Here, the Python runtime library required for Embedded Python is installed, and IRIS is configured so that Embedded Python can recognize both the runtime library and all installed Python packages, including custom ones.&lt;/p&gt;

&lt;p&gt;It is worth highlighting the following configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedded Python runtime configuration&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ENV PythonRuntimeLibrary=/usr/lib/x86_64-linux-gnu/libpython3.12.so
  ENV PythonRuntimeLibraryVersion=3.12
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These environment variables achieve what would otherwise be configured manually through the Management Portal by navigating to:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;System Administration → Configuration → Additional Settings → Advanced Memory&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;and updating the Embedded Python runtime settings. Defining them in the Dockerfile makes the configuration explicit, reproducible, and version-controlled.&lt;/p&gt;

&lt;p&gt;Additionally, the classes inside the package "MockPackage" are copied inside the container through:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;COPY MockPackage /usr/irissys/mgr/MockPackage&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;so that they can later be imported into IRIS automatically when the following bash script runs after the container is up.&lt;/p&gt;

&lt;h3&gt;
  
  
  iris_autoconf.sh
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
set -e

iris session IRIS &amp;lt;&amp;lt;'EOF'

/* Install the IPM/ZPM client first */
s version="latest" s r=##class(%Net.HttpRequest).%New(),r.Server="pm.community.intersystems.com",r.SSLConfiguration="ISC.FeatureTracker.SSL.Config" d r.Get("/packages/zpm/"_version_"/installer"),$system.OBJ.LoadStream(r.HttpResponse.Data,"c")

/* Configure registry */
zpm
repo -r -n registry -url https://pm.community.intersystems.com/ -user "" -pass ""
install csvgenpy
quit

/* Import and Compile the MockPackage */
/* The "ck" flags will Compile and Keep the source */
Do $system.OBJ.Import("/usr/irissys/mgr/MockPackage", "ck")

/* Upload csv data ONCE to Table Automatically using csvgenpy */
SET exists = ##class(%SYSTEM.SQL.Schema).TableExists("MockPackage.NoShowsAppointments")
IF 'exists {   do ##class(shvarov.csvgenpy.csv).Generate("/dur/data/healthcare_noshows_appointments.csv","NoShowsAppointments","MockPackage")   }

halt
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a bash script that is executed inside the container immediately after startup. It opens an IRIS terminal session using &lt;code&gt;iris session IRIS&lt;/code&gt; and runs IRIS-specific commands to perform additional configuration steps automatically.&lt;/p&gt;

&lt;p&gt;These steps include importing custom packages whose classes were copied inside the container's storage, installing IPM (available as &lt;code&gt;zpm&lt;/code&gt; inside the IRIS terminal), installing IPM packages such as &lt;code&gt;csvgenpy&lt;/code&gt;, and using &lt;code&gt;csvgenpy&lt;/code&gt; to load a CSV file mounted into the container at &lt;code&gt;/dur/data/healthcare_noshows_appointments.csv&lt;/code&gt; to create and populate a corresponding table in IRIS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; This script is executed every time the container starts. If this behavior is not considered, it can lead to unintended side effects such as reloading or resetting data. That is why it is important to make the script safe to run multiple times, for example, by checking whether the target table already exists before creating or populating it. This is especially relevant here because the Docker Compose restart policy is set to &lt;code&gt;restart: always&lt;/code&gt;, meaning the container will automatically restart and re-execute these commands whenever it goes down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Packages for Benchmarking
&lt;/h2&gt;

&lt;p&gt;This section introduces the ObjectScript packages used to benchmark different data access strategies in IRIS for a Machine Learning inference workload. The focus here is not on model quality, but on measuring and comparing the time it takes to retrieve data from IRIS, convert it into a Pandas DataFrame, and run inference using a trained LightGBM model.&lt;/p&gt;

&lt;p&gt;Each class plays a specific role in this process, from data preparation, to model training, and finally to inference and performance comparison.&lt;/p&gt;

&lt;h3&gt;
  
  
  MockDataManager.cls
&lt;/h3&gt;

&lt;p&gt;This class contains methods for taking a given CSV file and duplicating its rows to reach a desired dataset size (&lt;code&gt;AdjustDataSize&lt;/code&gt;), as well as updating a given IRIS table with the specified CSV (&lt;code&gt;UpdateTableFromCSV&lt;/code&gt;). The main purpose of these utilities is to allow testing query and inference time across multiple table sizes in a controlled way.&lt;/p&gt;
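
&lt;p&gt;The row-duplication idea behind &lt;code&gt;AdjustDataSize&lt;/code&gt; can be sketched in plain Python. This is a minimal illustration and not the code from the repository: the function name &lt;code&gt;adjust_data_size&lt;/code&gt;, the column names, and the in-memory CSV are all hypothetical.&lt;/p&gt;

```python
import csv
import io
from itertools import cycle, islice


def adjust_data_size(csv_text, target_rows):
    """Return CSV text with exactly target_rows data rows.

    The original rows are repeated in order until the target is reached.
    Hypothetical helper, only meant to illustrate the idea.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    if not rows:
        raise ValueError("the CSV has no data rows to duplicate")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    # cycle() repeats the rows endlessly; islice() stops at target_rows.
    writer.writerows(islice(cycle(rows), target_rows))
    return out.getvalue()


# Hypothetical two-row sample, resized to five rows.
sample = "PatientId,Age\n1,34\n2,61\n"
resized = adjust_data_size(sample, 5)
print(resized)
```

&lt;p&gt;Because the rows are repeated rather than generated, the value distribution of each column stays the same at every size, so the benchmark only changes the row count.&lt;/p&gt;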

&lt;p&gt;Note: Throughout this analysis, we focus exclusively on the &lt;strong&gt;inference time&lt;/strong&gt; of a LightGBM model. We are not concerned with model performance metrics such as F1 score, precision, recall, or accuracy at this stage.&lt;/p&gt;

&lt;h3&gt;
  
  
  MockModelManager.cls
&lt;/h3&gt;

&lt;p&gt;In this class, the only relevant method is &lt;code&gt;TrainNoShowsModel&lt;/code&gt;. It leverages the data processing pipeline defined in &lt;code&gt;python_utils.utils&lt;/code&gt; to prepare the raw data, passed in as a Pandas DataFrame, fit a LightGBM model, and persist the trained model to disk.&lt;/p&gt;

&lt;p&gt;The model is saved to a predefined location, which in this setup corresponds to the persistent storage mounted through Docker volumes in &lt;code&gt;docker-compose.yml&lt;/code&gt;. This allows the trained model to be reused across container restarts and inference runs without retraining.&lt;/p&gt;

&lt;h3&gt;
  
  
  MockInference.cls
&lt;/h3&gt;

&lt;p&gt;The core of the performance comparison lives in this class. The process begins by loading the trained LightGBM model weights from the file path specified in the &lt;code&gt;MODELPATH&lt;/code&gt; parameter. While this path is currently hardcoded, it serves as a static reference point shared by all inference tests.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;RunInferenceWDynamicSQL&lt;/code&gt; represents the first approach. It relies on an ObjectScript method called &lt;code&gt;DynamicSQL&lt;/code&gt;, which executes a Dynamic SQL statement to filter records by age. The results are packed into a &lt;code&gt;%DynamicArray&lt;/code&gt; of &lt;code&gt;%DynamicObjects&lt;/code&gt;. This method is then called by the &lt;code&gt;dynamic_sql_query&lt;/code&gt; Python function in &lt;code&gt;python_utils/querymethods.py&lt;/code&gt;, where the IRIS objects are converted into a structure that can be easily transformed into a Pandas DataFrame.&lt;/p&gt;

&lt;p&gt;The entire workflow, including execution time measurement via a Python decorator defined in &lt;code&gt;python_utils/utils.py&lt;/code&gt;, is orchestrated inside &lt;code&gt;RunInferenceWDynamicSQL&lt;/code&gt;. The resulting DataFrame is then passed through the inference pipeline to produce predictions and measure end-to-end inference latency.&lt;/p&gt;
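
&lt;p&gt;A timing decorator of this kind can be written in a few lines. The sketch below is an assumption about its general shape, not the decorator from &lt;code&gt;python_utils/utils.py&lt;/code&gt;. The names &lt;code&gt;timed&lt;/code&gt; and &lt;code&gt;fake_query&lt;/code&gt; are hypothetical, and the query is a stand-in that never touches IRIS.&lt;/p&gt;

```python
import time
from functools import wraps


def timed(func):
    """Wrap func so that it returns (result, elapsed_seconds)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper


@timed
def fake_query(n_rows):
    # Stand-in for a real query: it just builds a list of dicts.
    return [{"Age": i % 90} for i in range(n_rows)]


rows, seconds = fake_query(1000)
print(f"fetched {len(rows)} rows in {seconds:.6f} s")
```

&lt;p&gt;Using &lt;code&gt;time.perf_counter&lt;/code&gt; rather than &lt;code&gt;time.time&lt;/code&gt; gives a monotonic, high-resolution clock, which matters when the measured step takes only a few milliseconds.&lt;/p&gt;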

&lt;p&gt;&lt;code&gt;RunInferenceWIRISSQL&lt;/code&gt; follows a simpler path. It uses the &lt;code&gt;iris_sql_query&lt;/code&gt; method from &lt;code&gt;python_utils/querymethods.py&lt;/code&gt; to execute the SQL query directly from Python. The resulting IRIS SQL iterator is transformed directly into a Pandas DataFrame, after which the same inference and timing logic used in the previous method is applied.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;RunInferenceWGLobals&lt;/code&gt; is the most direct approach, as it queries the underlying data structures (globals) backing the table. It uses the &lt;code&gt;iris_global_query&lt;/code&gt; method to fetch data directly from &lt;code&gt;^vCVc.Dvei.1&lt;/code&gt;. This particular global was identified as the &lt;code&gt;DataLocation&lt;/code&gt; in the storage definition of the &lt;code&gt;MockPackage.NoShowsAppointments&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;The global name is a result of the hashed storage automatically generated when the table was built from the CSV file.&lt;/p&gt;

&lt;p&gt;Finally, the integrity of all three approaches is verified using the &lt;code&gt;ConsistencyCheck&lt;/code&gt; method. This utility asserts that the Pandas DataFrames produced by each query strategy are identical, ensuring that data types, values, and numerical precision remain perfectly consistent regardless of the access method used.&lt;/p&gt;

&lt;p&gt;Because this check raises no errors, it confirms that Dynamic SQL, direct SQL access from Python, and high-speed global access are all returning exactly the same dataset.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance comparison
&lt;/h2&gt;

&lt;p&gt;To evaluate performance, we measured query and inference times for increasing table sizes. The table below reports the average time, in seconds, over 10 runs for each configuration. Query time corresponds to retrieving the data from the database, while inference time corresponds to running the LightGBM model on the resulting dataset.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rows&lt;/th&gt;
&lt;th&gt;DynamicSQL – Query&lt;/th&gt;
&lt;th&gt;DynamicSQL – Infer&lt;/th&gt;
&lt;th&gt;IRISSQL – Query&lt;/th&gt;
&lt;th&gt;IRISSQL – Infer&lt;/th&gt;
&lt;th&gt;Globals – Query&lt;/th&gt;
&lt;th&gt;Globals – Infer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;0.003219271&lt;/td&gt;
&lt;td&gt;0.042354488&lt;/td&gt;
&lt;td&gt;0.001749706&lt;/td&gt;
&lt;td&gt;0.043090796&lt;/td&gt;
&lt;td&gt;0.001184559&lt;/td&gt;
&lt;td&gt;0.043616056&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;0.031865168&lt;/td&gt;
&lt;td&gt;0.052698898&lt;/td&gt;
&lt;td&gt;0.019246697&lt;/td&gt;
&lt;td&gt;0.056159472&lt;/td&gt;
&lt;td&gt;0.005061340&lt;/td&gt;
&lt;td&gt;0.045210719&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;0.237553477&lt;/td&gt;
&lt;td&gt;0.082497978&lt;/td&gt;
&lt;td&gt;0.099582171&lt;/td&gt;
&lt;td&gt;0.068728352&lt;/td&gt;
&lt;td&gt;0.036206818&lt;/td&gt;
&lt;td&gt;0.061128354&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;td&gt;5.279174852&lt;/td&gt;
&lt;td&gt;0.189197206&lt;/td&gt;
&lt;td&gt;1.122253346&lt;/td&gt;
&lt;td&gt;0.177564192&lt;/td&gt;
&lt;td&gt;0.535172153&lt;/td&gt;
&lt;td&gt;0.175085044&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500,000&lt;/td&gt;
&lt;td&gt;68.741133046&lt;/td&gt;
&lt;td&gt;0.639807224&lt;/td&gt;
&lt;td&gt;7.015313649&lt;/td&gt;
&lt;td&gt;0.610818386&lt;/td&gt;
&lt;td&gt;2.743980526&lt;/td&gt;
&lt;td&gt;0.587647438&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000,000&lt;/td&gt;
&lt;td&gt;196.871173100&lt;/td&gt;
&lt;td&gt;1.145034313&lt;/td&gt;
&lt;td&gt;22.138613220&lt;/td&gt;
&lt;td&gt;1.136569023&lt;/td&gt;
&lt;td&gt;5.987578392&lt;/td&gt;
&lt;td&gt;1.106307745&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2,000,000&lt;/td&gt;
&lt;td&gt;711.319680452&lt;/td&gt;
&lt;td&gt;3.021180152&lt;/td&gt;
&lt;td&gt;60.142974615&lt;/td&gt;
&lt;td&gt;2.879153728&lt;/td&gt;
&lt;td&gt;11.92040014&lt;/td&gt;
&lt;td&gt;2.728573560&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To characterise how query and inference times scale with respect to table size, we fitted a power-law regression of the form:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenephm08qhsw3ec4zzkb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenephm08qhsw3ec4zzkb.png" alt=" " width="388" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Inference Time
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffcvw6y0dacebvenak2m2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffcvw6y0dacebvenak2m2.png" alt=" " width="625" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzu1ioe6qjno3i0950a0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzu1ioe6qjno3i0950a0z.png" alt=" " width="431" height="131"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inference time is very similar across all three query methods, which is expected, as the resulting input DataFrame was verified to be identical in all cases.&lt;/p&gt;

&lt;p&gt;From the measurements, the model is able to perform inference on approximately 1 million rows in about 1 second, highlighting the high throughput of LightGBM.&lt;/p&gt;

&lt;p&gt;The fitted exponent (k ~ 1.3) indicates slightly superlinear scaling of total inference time with respect to the number of rows. This behaviour is commonly observed in large-scale batch processing and is likely attributable to system-level effects such as cache pressure or memory bandwidth saturation, rather than to the algorithmic complexity of the model itself.&lt;/p&gt;

&lt;p&gt;The scaling factor "a" is on the order of tens of nanoseconds, reflecting the efficiency of the per-row computation. While the superlinear exponent implies that the marginal cost per additional row increases with table size, this effect becomes noticeable only at large scales (millions of rows), as illustrated by the increasing slope in the log–log plot.&lt;/p&gt;

&lt;p&gt;The marginal inference cost can be estimated from the derivative of the fitted model:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wnvyih0ql36502xmb8g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wnvyih0ql36502xmb8g.png" alt=" " width="138" height="57"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Evaluating this expression shows that the per-row marginal inference time increases from approximately 1.9e-7 seconds at 1000 rows to 1.5e-6 seconds at 1 million rows, remaining at or below the microsecond scale within the observed data regime.&lt;/p&gt;
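&lt;p&gt;The arithmetic can be reproduced with a few lines of Python. This sketch assumes the fitted model is t(n) = a * n^k + c, so that its derivative is a * k * n^(k - 1). The exponent is the rounded value quoted in the text, and the scaling factor is an assumed value of about 1.84e-8, chosen only because it lands near the quoted figures; neither is the exact fitted parameter.&lt;/p&gt;

```python
def marginal_cost(n, a, k):
    """Derivative of a * n**k + c with respect to n: a * k * n**(k - 1)."""
    return a * k * n ** (k - 1)

# Assumed values: k is the rounded exponent from the text, and a is picked
# so the results land near the quoted figures. Neither is the exact fit.
a, k = 1.84e-8, 1.3

at_1k = marginal_cost(1_000, a, k)
at_1m = marginal_cost(1_000_000, a, k)
print(f"{at_1k:.2e} s per extra row at 1,000 rows")
print(f"{at_1m:.2e} s per extra row at 1,000,000 rows")
```

&lt;p&gt;With these assumed values the marginal cost grows by a factor of 1000^0.3, roughly 8, over three orders of magnitude in table size.&lt;/p&gt;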

&lt;p&gt;Finally, the fitted constant offset (c ~ 0.08 seconds) likely represents a fixed inference overhead, such as model invocation and runtime initialisation, and should be interpreted as a constant cost independent of table size.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Time
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjj1v93hxrjouzb4ij4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjj1v93hxrjouzb4ij4r.png" alt=" " width="357" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpj4on4da3gfoa6e10tpg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpj4on4da3gfoa6e10tpg.png" alt=" " width="447" height="85"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Query time exhibits substantially different scaling behavior across the three access methods. In contrast to inference time, which is largely independent of the query mechanism, query performance is dominated by the data access strategy and its interaction with storage and execution layers.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Globals-based&lt;/strong&gt; approach shows nearly linear scaling (k ~ 1.03), indicating that the cost of retrieving each additional row remains approximately constant across the measured range. This behavior is consistent with sequential access patterns and minimal query-planning overhead, making Globals the most scalable option for large result sets.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;IRISSQL&lt;/strong&gt; approach exhibits moderately superlinear scaling (k ~ 1.48). While still efficient for moderate table sizes, the increasing marginal cost suggests growing overhead from SQL execution, query planning, or intermediate result materialization as the number of rows increases.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;DynamicSQL&lt;/strong&gt; approach displays the most pronounced superlinear scaling (k ~ 1.82), resulting in rapidly increasing query times at larger scales. This behavior explains the steep slope observed in the plot and indicates that DynamicSQL incurs significant additional overhead as result size grows, making it the least scalable method for large batch queries.&lt;/p&gt;

&lt;p&gt;Although the fitted scaling factors "a" are numerically small, they must be interpreted jointly with the exponent "k". In practice, the exponent dominates the asymptotic behavior, which is why DynamicSQL, despite a small "a", becomes significantly slower at large table sizes.&lt;/p&gt;
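&lt;p&gt;A small numeric sketch makes the interplay between "a" and "k" concrete. The exponents below are the fitted values quoted in this section, but the scaling factors are hypothetical, chosen only to pair a small "a" with a large "k"; the constant offset "c" is ignored for simplicity.&lt;/p&gt;

```python
def power_law(n, a, k):
    """Growth term of the fitted model, without the constant offset."""
    return a * n ** k

# Exponents are the values quoted in the text; the scaling factors are
# hypothetical, picked only to show a small "a" paired with a large "k".
globals_a, globals_k = 1e-6, 1.03
dynamic_a, dynamic_k = 1e-9, 1.82

# Table size at which the two curves cross: a1 * n**k1 == a2 * n**k2.
crossover = (globals_a / dynamic_a) ** (1 / (dynamic_k - globals_k))
print(f"curves cross near n = {crossover:,.0f}")

for n in (1_000, 100_000, 10_000_000):
    ratio = power_law(n, dynamic_a, dynamic_k) / power_law(n, globals_a, globals_k)
    print(f"n={n:10,}: steeper curve is {ratio:.3g}x the flatter one")
```

&lt;p&gt;Below the crossover the curve with the smaller "a" is cheaper despite its larger exponent; above it the exponent takes over and the gap keeps widening.&lt;/p&gt;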

&lt;p&gt;The fitted constant term "c" represents the fixed query overhead. For IRISSQL, "c" is close to zero, indicating a small startup cost. This overhead is even smaller for the Globals-based approach, where the fitted value is slightly negative, effectively suggesting a zero fixed cost. This behavior is expected, as data retrieval via a global key proceeds directly without additional query planning or execution overhead.&lt;/p&gt;

&lt;p&gt;In contrast, the relatively large constant offset observed for DynamicSQL indicates a substantial fixed overhead, likely associated with query preparation or execution setup. This fixed cost penalizes performance across all table sizes and is proportionally most significant for small tables, where it can account for most of the total query time.&lt;/p&gt;

&lt;p&gt;Overall, these results highlight that query time, unlike inference time, is highly sensitive to the data access method, with Globals offering near-linear scalability, IRISSQL providing a balanced middle ground, and DynamicSQL exhibiting poor scalability for large result sets.&lt;/p&gt;

&lt;p&gt;Please refer to the following repository for more details:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/JorgeIvanJH/IRIS_dockerization.git" rel="noopener noreferrer"&gt;https://github.com/JorgeIvanJH/IRIS_dockerization.git&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Video demo here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/IcShNKQ4jIk" rel="noopener noreferrer"&gt;https://youtu.be/IcShNKQ4jIk&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have any questions or notice any mistakes, please don’t hesitate to reach out.&lt;/p&gt;

&lt;p&gt;Thank you!&lt;/p&gt;

</description>
      <category>docker</category>
      <category>sql</category>
      <category>python</category>
      <category>performance</category>
    </item>
    <item>
      <title>Vector Search with Embedded Python in InterSystems IRIS</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Thu, 30 Apr 2026 15:35:55 +0000</pubDate>
      <link>https://dev.to/intersystems/vector-search-with-embedded-python-in-intersystems-iris-h3a</link>
      <guid>https://dev.to/intersystems/vector-search-with-embedded-python-in-intersystems-iris-h3a</guid>
      <description>&lt;p&gt;&lt;span&gt;&lt;span&gt;One objective of vectorization is to render unstructured text more machine-usable. Vector embeddings accomplish this by encoding the semantics of text as high-dimensional numeric vectors, which can be employed by advanced search algorithms (normally an approximate nearest neighbor algorithm like Hierarchical Navigable Small World).&amp;nbsp;This not only improves our ability to interact with unstructured text programmatically but makes it searchable by context and by meaning beyond what is captured literally by keyword.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;span&gt;In this article I will walk through a simple vector search implementation that Kwabena Ayim-Aboagye and I fleshed out using embedded python in InterSystems IRIS for Health. I'll also dive a bit into how to use embedded python and dynamic SQL generally, and how to take advantage of vector search features offered natively through IRIS.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;Environment Details:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;OS: Windows Server 2025&lt;/li&gt;
&lt;li&gt;InterSystems IRIS for Health 2025.1&lt;/li&gt;
&lt;li&gt;VS Code / InterSystems Server Manager&lt;/li&gt;
&lt;li&gt;Python 3.13.7&lt;/li&gt;
&lt;li&gt;Python Libraries: pandas, ollama, iris*
&lt;/li&gt;
&lt;li&gt;Ollama 0.12.3 and model all-minilm&lt;/li&gt;
&lt;li&gt;Dynamic SQL&lt;/li&gt;
&lt;li&gt;Sample database of unstructured text (classic poems)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Process:&lt;/h2&gt;
&lt;h3&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; 0. &lt;strong&gt;Set up the environment; complete installs&lt;/strong&gt;
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;h3&gt;&lt;strong&gt;Define an auxiliary table&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;&lt;li&gt;The embeddings table &lt;code&gt;User.SamplePoetryVectors&lt;/code&gt;&amp;nbsp;has a foreign key on &lt;code&gt;User.SamplePoetry&lt;/code&gt; as well as an &lt;code&gt;EMBEDDING&lt;/code&gt; property of type &lt;code&gt;%Library.Vector&lt;/code&gt;. Ollama &lt;code&gt;all-minilm&lt;/code&gt; generates embeddings of 384 dimensions, so we imposed a length constraint accordingly.&lt;ul&gt;
&lt;li&gt;&lt;img src="/sites/default/files/inline/images/table_dfns_0.png" alt=""&gt;&lt;/li&gt;
&lt;li&gt;*Note that because the goal is to ultimately take advantage of &lt;a href="https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&amp;amp;CLASSNAME=%25SQL.Index.HNSW" rel="noopener noreferrer"&gt;IRIS' native HNSWIndex&lt;/a&gt; and &lt;a href="https://docs.intersystems.com/iris20253/csp/docbook/Doc.View.cls?KEY=RSQL_vectorcosine" rel="noopener noreferrer"&gt;IRIS' native vector search methods&lt;/a&gt;,&amp;nbsp;&lt;a href="https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_vecsearch#GSQL_vecsearch_index_hnsw" rel="noopener noreferrer"&gt;we must have a column of type %Library.Vector (or %Library.Embedding) of fixed length that is of type decimal or double&lt;/a&gt; upon which to index.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h3&gt;
&lt;strong&gt;Define a &lt;/strong&gt;&lt;code&gt;&lt;strong&gt;RegisteredObject&lt;/strong&gt;&lt;/code&gt;&lt;strong&gt; class&lt;/strong&gt; for our vectorization methods, which will be written in embedded python. First let's focus on a &lt;code&gt;VectorizeTable()&lt;/code&gt; method, which will contain a driver function (of the same name) and a few supporting process functions all written in Python.&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;The driver function walks through the process as follows:&lt;ol&gt;
&lt;li&gt;Load from IRIS into a Pandas Dataframe (via supporting function &lt;code&gt;load_table()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Generate an embedding column (via supporting class method&amp;nbsp;&lt;code&gt;GetEmbeddingString&lt;/code&gt;, which will later be used to generate embeddings for queries as well)&lt;ul&gt;&lt;li&gt;Convert the embedding column to a string that's compatible with IRIS vector type&lt;/li&gt;&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Write the dataframe into the auxiliary table&lt;/li&gt;
&lt;li&gt;Create an HNSW index on the auxiliary table&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;VectorizeTable()&lt;/code&gt; class method then simply calls the driver function:&lt;ul&gt;&lt;li&gt;&lt;img src="/sites/default/files/inline/images/vectorizetable.png" alt=""&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Let's examine it step-by-step:&lt;/li&gt;
&lt;/ul&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;h4&gt;&lt;strong&gt;Load the table from IRIS into a Pandas Dataframe&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;pre&gt;&lt;code&gt;def load_table(sample_size='*') -&amp;gt; pd.DataFrame:
    # sample_size='*' loads every row; pass an integer to add a LIMIT clause
    sql = f"SELECT * FROM SQLUser.SamplePoetry{f' LIMIT {sample_size}' if sample_size != '*' else ''}"
    result_set = iris.sql.exec(sql)
    df = result_set.dataframe()

    # Entries without text will not be vectorized nor searchable
    for index, row in df.iterrows():
        if row['poem'] == ' ' or row['poem'] is None:
            df = df.drop(index)

    return df&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;This function leverages the &lt;code&gt;dataframe()&lt;/code&gt; method of &lt;a href="https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&amp;amp;CLASSNAME=%25SYS.Python.SQLResultSet#METHOD_dataframe" rel="noopener noreferrer"&gt;the embedded python SQLResultSet objects&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;load_table()&lt;/code&gt; accepts an optional &lt;code&gt;sample_size&lt;/code&gt; argument for testing purposes. There's also a filter for entries without unstructured text. Though our sample database is curated and complete, some use cases may seek to vectorize datasets for which one cannot assume each row will have data for all columns (for example survey responses with skipped questions). As opposed to implementing a "null" or empty vector, we chose to exclude such rows from vector search by removing them at this step in the process.&lt;/li&gt;
&lt;li&gt;*Note that &lt;code&gt;iris&lt;/code&gt; is the &lt;a href="https://docs.intersystems.com/irisforhealthlatest/csp/docbook/DocBook.UI.Page.cls?KEY=GEPYTHON_reference" rel="noopener noreferrer"&gt;InterSystems IRIS Python Module&lt;/a&gt;. It functions as an API to access IRIS classes and methods and to interact with the database.&lt;/li&gt;
&lt;li&gt;*Note that &lt;code&gt;SQLUser&lt;/code&gt; is the &lt;a href="https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_tables#GSQL_tables_schemadefault" rel="noopener noreferrer"&gt;system-wide default schema&lt;/a&gt;, which &lt;a href="https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GOBJ_defpersobj#GOBJ_defpersobj_sqlproj_pkg" rel="noopener noreferrer"&gt;corresponds to the default package&lt;/a&gt; &lt;code&gt;User&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h4&gt;&lt;strong&gt;Generate an embedding column (support method)&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;pre&gt;&lt;code&gt;ClassMethod GetEmbeddingString(aurg As %String) As %String [ Language = python ]
{
  import iris
  import ollama

  response = ollama.embed(model='all-minilm', input=[ aurg ])
  embedding_str = str(response.embeddings[0])

  return embedding_str
}&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;We installed Ollama on our VM, loaded the &lt;code&gt;all-minilm&lt;/code&gt; embedding model, and generated embeddings using Ollama’s Python library. This allowed us to run the model locally and generate embeddings without an API key.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GetEmbeddingString&lt;/code&gt;&amp;nbsp;returns the embedding as a string because&amp;nbsp;&lt;a href="https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_tovector#RSQL_tovector_args_data" rel="noopener noreferrer"&gt;&lt;code&gt;TO_VECTOR&lt;/code&gt;&lt;/a&gt;&amp;nbsp;by default expects the &lt;code&gt;data&lt;/code&gt; argument to be a string, more on that to follow.&lt;/li&gt;
&lt;li&gt;*Note that Embedded Python provides syntax for calling other ObjectScript methods defined within the current class (similar to &lt;code&gt;self&lt;/code&gt; in Python). The earlier example uses &lt;code&gt;iris.cls(__name__)&lt;/code&gt; syntax to get a reference to the current ObjectScript class and invoke &lt;code&gt;GetEmbeddingString&lt;/code&gt;&amp;nbsp;(ObjectScript method) from &lt;code&gt;VectorizeTable&lt;/code&gt; (Embedded Python method inside ObjectScript method).&lt;/li&gt;
&lt;/ul&gt;
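&lt;p&gt;To make the string format concrete, the sketch below shows what &lt;code&gt;str()&lt;/code&gt; produces for a Python list of floats: a bracketed, comma-separated string, which is the shape passed to &lt;code&gt;TO_VECTOR&lt;/code&gt; in this article. The helper names and the three-element sample are hypothetical stand-ins for a real 384-dimensional embedding, and the parsing helper exists only to show that the string round-trips.&lt;/p&gt;

```python
def to_vector_literal(embedding):
    """Format a sequence of floats the way str() formats a list."""
    return str(list(embedding))

def from_vector_literal(text):
    """Parse the bracketed, comma-separated string back into floats."""
    return [float(part) for part in text.strip("[]").split(",")]

sample = [0.12, -0.5, 0.0]  # stand-in for a 384-dimensional embedding
literal = to_vector_literal(sample)
print(literal)  # [0.12, -0.5, 0.0]
print(from_vector_literal(literal) == sample)  # True
```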
&lt;/li&gt;
&lt;li&gt;
&lt;h4&gt;&lt;strong&gt;Write the embeddings from the dataframe into the table in IRIS&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;pre&gt;&lt;code&gt;# Write dataframe into new table
print("Loading data into table...")
# Prepare the INSERT once and reuse the statement for every row
stmt = iris.sql.prepare("INSERT INTO SQLUser.SamplePoetryVectors (ID, EMBEDDING) VALUES (?, TO_VECTOR(?, decimal))")
for index, row in df.iterrows():
    stmt.execute(row['id'], row['embedding'])

print("Data loaded into table.")&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;Here, we use Dynamic SQL to populate &lt;code&gt;SamplePoetryVectors&lt;/code&gt; row-by-row. Because earlier we declared the &lt;code&gt;EMBEDDING&lt;/code&gt; property to be of type &lt;code&gt;%Library.Vector&lt;/code&gt; we must use &lt;a href="http://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_tovector#RSQL_tovector_args_data" rel="noopener noreferrer"&gt;&lt;code&gt;TO_VECTOR&lt;/code&gt;&lt;/a&gt; to convert the embeddings to IRIS' native &lt;a href="https://docs.intersystems.com/iris20253/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&amp;amp;PRIVATE=1&amp;amp;CLASSNAME=%25Library.Vector" rel="noopener noreferrer"&gt;&lt;code&gt;VECTOR&lt;/code&gt;&lt;/a&gt; datatype upon insertion. We ensured compatibility with &lt;code&gt;TO_VECTOR&lt;/code&gt; by converting the embeddings to strings earlier.&lt;ul&gt;&lt;li&gt;The &lt;code&gt;iris&lt;/code&gt; python module again allows us to take advantage of Dynamic SQL from within our Embedded Python function.&lt;/li&gt;&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h4&gt;&lt;strong&gt;Create an HNSW Index&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;pre&gt;&lt;code&gt;# Create Index
iris.sql.exec("CREATE INDEX HNSWIndex ON TABLE SQLUser.SamplePoetryVectors (EMBEDDING) AS HNSW(Distance='Cosine')")
print("Index created.")&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;IRIS will natively implement an &lt;a href="https://arxiv.org/abs/1603.09320" rel="noopener noreferrer"&gt;HNSW graph&lt;/a&gt; for use in vector search methods when an &lt;a href="https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&amp;amp;CLASSNAME=%25SQL.Index.HNSW" rel="noopener noreferrer"&gt;HNSW index&lt;/a&gt; is created on a compatible column. The vector search methods available through IRIS are &lt;code&gt;VECTOR_DOT_PRODUCT&lt;/code&gt; and &lt;code&gt;VECTOR_COSINE&lt;/code&gt;.&amp;nbsp;Once the index is created, IRIS will automatically use it to optimize the corresponding vector search method when called in subsequent queries. The parameter defaults for an HNSW index are &lt;code&gt;Distance = Cosine&lt;/code&gt;,&amp;nbsp;&lt;code&gt;M = 16&lt;/code&gt;, and &lt;code&gt;efConstruction = 200&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Note that &lt;a href="https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_vectorcosine#RSQL_vectorcosine_desc" rel="noopener noreferrer"&gt;&lt;code&gt;VECTOR_COSINE&lt;/code&gt;&lt;/a&gt;&amp;nbsp;implicitly normalizes its input vectors, so we did not need to perform normalization before inserting them into the table in order for our vector search queries to be scored correctly!&lt;/li&gt;
&lt;/ul&gt;
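&lt;p&gt;The reason normalization is unnecessary follows from the definition of cosine similarity, which divides the dot product by both vector lengths. The plain-Python sketch below (not IRIS code; the function name and sample vectors are made up for illustration) shows that scaling an input vector leaves the score unchanged.&lt;/p&gt;

```python
import math

def cosine_similarity(u, v):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

u = [1.0, 2.0, 3.0]
v = [2.0, 0.5, 1.0]
scaled_u = [10 * x for x in u]  # same direction as u, ten times longer

raw = cosine_similarity(u, v)
scaled = cosine_similarity(scaled_u, v)
print(math.isclose(raw, scaled))  # True
```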
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h3&gt;
&lt;strong&gt;Implement a &lt;/strong&gt;&lt;code&gt;&lt;strong&gt;VectorSearch()&lt;/strong&gt;&lt;/code&gt;&lt;strong&gt; class method&lt;/strong&gt;
&lt;/h3&gt;
&lt;ul&gt;&lt;li&gt;&lt;h4&gt;&amp;nbsp;&lt;img src="/sites/default/files/inline/images/vectorsearch_0.png" alt=""&gt;&amp;nbsp;&lt;span&gt;&amp;nbsp;&lt;/span&gt;
&lt;/h4&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;h4&gt;Generate an embedding for the query string&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;pre&gt;&lt;code&gt;# Generate embedding of search parameter
search_vector = iris.cls(__name__).GetEmbeddingString(aurg)&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;Reusing the class method&amp;nbsp;&lt;code&gt;GetEmbeddingString&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h4&gt;Prepare and execute a query that utilizes &lt;code&gt;VECTOR_COSINE&lt;/code&gt;
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;pre&gt;&lt;code&gt;# Prepare and execute SQL statement
stmt = iris.sql.prepare(
        """SELECT TOP 5 p.poem, p.title, p.author
        FROM SQLUser.SamplePoetry AS p
        JOIN SQLUser.SamplePoetryVectors AS v
        ON p.ID = v.ID
        ORDER BY VECTOR_COSINE(v.embedding, TO_VECTOR(?)) DESC"""
)
results = stmt.execute(search_vector)&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;We use a &lt;code&gt;JOIN&lt;/code&gt; here to combine the poetry text with its corresponding vector embedding so we can rank results by semantic similarity.&lt;/li&gt;
&lt;/ul&gt;
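&lt;p&gt;Conceptually, the query above ranks every stored embedding by its cosine similarity to the query vector and keeps the best five. The sketch below is a brute-force, plain-Python version of that ranking over a tiny made-up dataset. It is only an illustration of the idea, not how IRIS executes the query: with the HNSW index in place, IRIS can answer with an approximate nearest-neighbour search instead of scoring every row. All names and vectors here are hypothetical.&lt;/p&gt;

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def top_k(query, rows, k=5):
    """Return the ids of the k rows most similar to the query, best first."""
    scored = [(cosine(query, embedding), row_id) for row_id, embedding in rows]
    scored.sort(reverse=True)
    return [row_id for _, row_id in scored[:k]]

# Made-up two-dimensional "embeddings" keyed by a row id.
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.7, 0.7])]
best = top_k([1.0, 0.1], rows, k=2)
print(best)  # ['a', 'c']
```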
&lt;/li&gt;
&lt;li&gt;
&lt;h4&gt;Output the results&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;pre&gt;&lt;code&gt;results_df = pd.DataFrame(results)

pd.set_option('display.max_colwidth', 25)
results_df.rename(columns={0: 'Poem', 1: 'Title', 2: 'Author'}, inplace=True)

print(results_df)&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;

&lt;li&gt;This uses formatting options from pandas to tweak how the output appears in the IRIS Terminal:&lt;ul&gt;&lt;li&gt;

&lt;img src="/sites/default/files/inline/images/terminal_example.png" alt=""&gt;&amp;nbsp;&lt;span&gt;&amp;nbsp;&lt;/span&gt;
&lt;/li&gt;&lt;/ul&gt;

&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;/ol&gt;

&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>vectordatabase</category>
      <category>database</category>
      <category>tutorial</category>
      <category>ux</category>
    </item>
    <item>
      <title>KMS . Introduction to its use in IRIS and an example of setup on AWS EC2 system</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Sun, 26 Apr 2026 16:19:04 +0000</pubDate>
      <link>https://dev.to/intersystems/kms-introduction-to-its-use-in-iris-and-an-example-of-setup-on-aws-ec2-system-425e</link>
      <guid>https://dev.to/intersystems/kms-introduction-to-its-use-in-iris-and-an-example-of-setup-on-aws-ec2-system-425e</guid>
      <description>&lt;p&gt;IRIS can use a KMS (Key Managment Service) as of release 2023.3.&amp;nbsp; Intersystems&amp;nbsp;documentation is a good resource on KMS implementation but does not go into details of the KMS set up on the system, nor provide an easily followable example of how one might set this up for basic testing.&lt;/p&gt;

&lt;p&gt;The purpose of this article&amp;nbsp;is to supplement the docs with a brief explanation of KMS, an example of its use in IRIS, and notes for setup of a testing system on AWS EC2 RedHat Linux system using the AWS KMS.&amp;nbsp; It is assumed in this document that the reader/implementor&amp;nbsp;already has access/knowledge to set up an AWS EC2 Linux system running IRIS (2023.3&amp;nbsp;or later), and that they have proper authority to access the AWS KMS and AWS IAM (for creating roles and polices), or that they will be able to get this access either on their own or via their organizations Security contact in charge of their AWS access.&lt;br&gt;&lt;/p&gt;

&lt;p&gt;What is KMS and what does it do for IRIS?:&lt;/p&gt;

&lt;p&gt;KMS means Key Management Service.&amp;nbsp; &amp;nbsp;Briefly, it provides an external secure method of encrypting and decrypting IRIS encryption keys through a trusted service, the KMS.&lt;/p&gt;

&lt;p&gt;In the prior&amp;nbsp;implementation, when using unattended startup, IRIS would never store unencrypted encryption keys; IRIS would encrypt each key and keep an encrypted copy of the key encryption key in the key file itself.&amp;nbsp; It would then store a user ID and password in IRIS to decrypt the encrypted key encryption key.&amp;nbsp; This leaves an unencrypted copy of the user ID and password stored in an IRIS database, which places an extra burden on IRIS managers to secure it.&amp;nbsp;&amp;nbsp;&lt;span&gt;&lt;span&gt;The key encryption key is encrypted/decrypted by a symmetric key that is based on a key admin’s password using PBKDF2 (Password-Based Key Derivation Function 2). So the key that encrypts the key encryption key is never stored anywhere – it’s derived on the fly when a key admin supplies their password. Since there can be multiple admins for keys in a given key file, we store in the key file one encrypted copy of the key encryption key (per admin) and then a single encrypted copy of each database/data element encryption key (encrypted with the key encryption key).&lt;/span&gt;&lt;/span&gt;&lt;br&gt;&amp;nbsp;&lt;/p&gt;
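&lt;p&gt;The idea of deriving a key on the fly from a password, rather than storing it, can be sketched with Python's standard library. This is only an illustration of PBKDF2 in general: the hash function, salt length, iteration count, key length and the function name &lt;code&gt;derive_wrapping_key&lt;/code&gt; are all assumptions made for this example and are not the parameters IRIS actually uses.&lt;/p&gt;

```python
import hashlib
import os

def derive_wrapping_key(password, salt, iterations=100_000):
    """Derive a 32-byte symmetric key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

# Only the salt needs to be kept; the derived key is recomputed on demand.
salt = os.urandom(16)
key_at_creation = derive_wrapping_key("admin-passphrase", salt)
key_at_startup = derive_wrapping_key("admin-passphrase", salt)
wrong_key = derive_wrapping_key("wrong-passphrase", salt)

print(key_at_creation == key_at_startup)  # True
print(key_at_creation == wrong_key)       # False
```

&lt;p&gt;The same password and salt always reproduce the same key, which is why the derived key itself never has to be written to disk.&lt;/p&gt;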

&lt;p&gt;With KMS we do not store the ID and password in IRIS.&amp;nbsp; When we create the encryption key with KMS we get an encrypted encryption key, and the KMS&amp;nbsp;keeps the key encryption key for us. We reach out to the KMS server with the encrypted&amp;nbsp;encryption key.&amp;nbsp; The KMS server decrypts&amp;nbsp;the encryption key.&amp;nbsp; The decrypted key is sent back to us and stored in memory.&amp;nbsp; The communications are secured&amp;nbsp;using&amp;nbsp;TLS.&lt;/p&gt;

&lt;p&gt;We don't ever have access to the raw key encryption key.&amp;nbsp; We use it as a service via the KMS.&amp;nbsp; The key encryption key stays on the KMS server.&amp;nbsp; This helps with key management and key security.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;The current implementation (as of 1/22/2024) of KMS&amp;nbsp;is cloud-vendor specific:&lt;/p&gt;

&lt;p&gt;In AWS&amp;nbsp;you must specify creation of a&amp;nbsp;symmetric key.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;In Azure you must specify creation of an RSA&amp;nbsp;key.&lt;/p&gt;

&lt;p&gt;Future implementations may include Google KMS.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;---&lt;/p&gt;

&lt;p&gt;Example of workflow setting up new encryption key in IRIS using KMS:&lt;/p&gt;

&lt;p&gt;The following assumes you have set up an&amp;nbsp;IRIS system to access an AWS KMS server, your instance has been authorized to access the keys there, and you have set up a key for use.&amp;nbsp; (See the Setup Notes following this example&amp;nbsp;for an example of setting up KMS on AWS to connect with an AWS EC2 RedHat Linux instance.)&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;1. %SYS&amp;gt;D ^EncryptionKey&lt;/p&gt;

&lt;p&gt;2. Create New Key&lt;/p&gt;

&lt;p&gt;3. Name the key&lt;/p&gt;

&lt;p&gt;4. Use KMS: yes&lt;/p&gt;

&lt;p&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Here you specify properties of the key.&amp;nbsp; Choose backup if you want a regular encryption key made to back up this KMS key.&amp;nbsp; This is the only place you can do this.&amp;nbsp; Treat this backup as you would a normal encryption key.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;5. Select AWS for the KMS server&lt;/p&gt;

&lt;p&gt;6. Get the key ID and the region from your AWS Key Management Service console&lt;/p&gt;

&lt;p&gt;7. Env Key: you should not need to specify anything here if your system is set up correctly (per this article). See the AWS docs for further details if necessary for your needs.&amp;nbsp; Leave it blank to keep this testing example simple.&lt;/p&gt;

&lt;p&gt;8. You should receive a message like:&lt;/p&gt;

&lt;p&gt;Encryption key file created: iriskmstest1&lt;br&gt;Encryption key created via KMS: 87A85627-9F8C-11EE-8839-0608ECAD1BAF&lt;/p&gt;

&lt;p&gt;This key is NOT activated.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Key activation and use then follow the usual encryption key setup steps.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;If there are issues with the activation at startup, it will error and go into interactive mode.&lt;/p&gt;

&lt;p&gt;For interactive startup, if you pass in a KMS key it will not prompt for a username or password.&lt;/p&gt;

&lt;p&gt;If you put in the backup key (generated in step 4 above), then it will ask for the username and password you created at key creation time (just like a normal key).&lt;/p&gt;

&lt;p&gt;If there are issues you will see errors in your startup, or logged in messages.log if using silent startup.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;In general, your IRIS system does not need to be on AWS or another cloud system; it accesses the KMS&amp;nbsp;for the&amp;nbsp;key over TLS.&lt;/p&gt;

&lt;p&gt;IRIS uses the credentials of the current user when accessing the KMS server, so you need to make sure that user has access to the KMS.&lt;/p&gt;

&lt;p&gt;The AWS key policy defines who can use the key on AWS.&amp;nbsp; See the following setup notes for an example.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;----&lt;/p&gt;

&lt;p&gt;Setup Notes: Getting an AWS EC2 Linux system running IRIS to work with an AWS KMS:&lt;/p&gt;

&lt;p&gt;(The following assumes you already have an AWS EC2 RedHat Linux system running an IRIS version that supports KMS)&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;To set up the AWS EC2 system to use the AWS KMS server:&lt;/p&gt;

&lt;p&gt;Follow the setup instructions at the following link to install the AWS CLI on your EC2 system: &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html" rel="noopener noreferrer"&gt;Install or update the latest version of the AWS CLI - AWS Command Line Interface (amazon.com)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are instructions for different OS types.&amp;nbsp; For this instruction set I used an AWS RedHat Linux system.&amp;nbsp; It was fairly straightforward to follow that doc to install the AWS CLI on the system.&lt;/p&gt;

&lt;p&gt;I also had to run 'sudo yum install unzip' first, because the instructions have you unzip the AWS CLI download.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Here are the steps to create a key that could be used by an IRIS instance for encryption key encryption:&lt;/p&gt;

&lt;p&gt;1. In the AWS Management Console go to Key Management Service.&lt;/p&gt;

&lt;p&gt;2. Click on Customer Managed Keys&lt;/p&gt;

&lt;p&gt;3. Click on Create Key&lt;/p&gt;

&lt;p&gt;5. Accept the Defaults&lt;/p&gt;

&lt;p&gt;6. Enter an Alias; this is the name for the key&lt;/p&gt;

&lt;p&gt;7. Key Admin Options: default policy&lt;/p&gt;

&lt;p&gt;8. Click Finish&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;The IRIS instance will also need to be authorized to use the KMS key. There are three ways to do this: run the instance as a user who has authenticated to AWS and is authorized to use the key; specify a credentials file with the AWS_SHARED_CREDENTIALS_FILE environment variable; or assign to the EC2 instance itself an IAM role that either has a policy attached allowing key usage or is explicitly allowed in the key policy itself.&lt;/p&gt;

&lt;p&gt;For the purpose of this instruction set we follow the third option, as ISC Development has suggested it would be the one most commonly used by customers on AWS.&amp;nbsp; In the following we will create an IAM role that can be assigned to the EC2 instance itself. The role can have a policy attached to it that gives it very targeted privileges to access a given key in the KMS (or even just allow specific operations with the key).&amp;nbsp; We are only exploring the simplest process to give us something to use for testing.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Here are the steps for Authorizing an Instance of IRIS on an AWS EC2 system to use the key on the KMS server:&lt;/p&gt;

&lt;p&gt;1. In the AWS Management Console go to Key Management Service.&lt;/p&gt;

&lt;p&gt;2. Under "Customer managed keys" click on the Key ID of the key you want to use.&lt;/p&gt;

&lt;p&gt;3. In the "General configuration" section click the "Copy" icon next to the ARN to copy the ARN to the clipboard. Paste this value somewhere to use later in the policy configuration.&lt;/p&gt;

&lt;p&gt;4. In the AWS Management Console go to IAM.&lt;br&gt;5. Under "Access Management"&amp;gt;"Policies" click "Create policy".&lt;br&gt;6. Under "Select a service" choose KMS from the drop-down list. Click "Next".&lt;br&gt;7. Under "Actions allowed" click on the "Write" access level expander. Check the "Decrypt" and "Encrypt" checkboxes.&lt;br&gt;8. Under "Resources" click on the "Add ARNs" link.&lt;br&gt;9. Paste the entire ARN from Step 3 above into the "Resource ARN" text field. Click "Add ARNs". Click "Next".&lt;br&gt;10. Under "Policy details" provide a policy name and, if desired, a policy description. Click "Create policy".&lt;/p&gt;

&lt;p&gt;11. In IAM under "Access Management"&amp;gt;"Roles" click "Create role".&lt;br&gt;12. Under "Trusted entity type" click "AWS service". Under "Use case" select EC2 from the drop-down list. Click "Next".&lt;br&gt;13. Under "Permissions policies" start typing the policy name from Step 10 until it appears in the list. Click the checkbox next to it. Click "Next".&lt;br&gt;14. Under "Role details" provide a role name. Click "Create role".&lt;/p&gt;

&lt;p&gt;15. In the AWS Management Console go to EC2. Navigate to "Instances"&amp;gt;"Instances".&lt;br&gt;16. If the EC2 instance already exists:&lt;br&gt;&amp;nbsp; &amp;nbsp; a. Click the checkbox next to the instance name.&lt;br&gt;&amp;nbsp; &amp;nbsp; b. Click "Actions"&amp;gt;"Security"&amp;gt;"Modify IAM role".&lt;br&gt;&amp;nbsp; &amp;nbsp; c. Choose the role from Step 14 from the drop-down list.&lt;br&gt;&amp;nbsp; &amp;nbsp; d. Click "Update IAM role".&lt;br&gt;16. Alternatively, if launching a new EC2 instance:&lt;br&gt;&amp;nbsp; &amp;nbsp; a. Click "Launch instances".&lt;br&gt;&amp;nbsp; &amp;nbsp; b. Under "Advanced details" choose the role from Step 14 in the "IAM instance profile" drop-down list.&lt;/p&gt;

&lt;p&gt;17. You can now use the KMS key in ^EncryptionKey.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Notes:&lt;br&gt;&amp;nbsp;After creating the policy or role, you might need to refresh the Management Console for the new resources to show up.&lt;/p&gt;
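
&lt;p&gt;For reference, the two policies that the console steps above end up creating can be sketched as JSON documents. The following Python is only a sketch under assumptions: the ARN is a made-up placeholder, the function names are hypothetical, and the console may generate a slightly different document (for example, with a statement ID added).&lt;/p&gt;

```python
import json

# Hypothetical placeholder: substitute the ARN copied in Step 3.
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"


def kms_key_policy(key_arn):
    """Permissions policy: allow only Encrypt and Decrypt on one KMS key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kms:Encrypt", "kms:Decrypt"],
                "Resource": key_arn,
            }
        ],
    }


def ec2_trust_policy():
    """Trust policy: lets EC2 instances assume the role that carries the policy."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print both documents so they can be compared with what the console shows.
    print(json.dumps(kms_key_policy(KEY_ARN), indent=2))
    print(json.dumps(ec2_trust_policy(), indent=2))
```

&lt;p&gt;Seeing the permissions policy spelled out makes it easy to confirm that the role can only encrypt and decrypt with the one key from Step 3.&lt;/p&gt;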

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;---&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Supplemental:&lt;/p&gt;

&lt;p&gt;Class methods of interest:&lt;/p&gt;

&lt;p&gt;%SYSTEM.Encryption.KMSCreateEncryptionKey()&lt;/p&gt;

&lt;p&gt;%SYSTEM.Encryption.ActivateEncryptionKey() ;just supply the KMS key, no need for username or password&lt;/p&gt;

&lt;p&gt;do ReadFile^EncryptionKey(&amp;lt;key&amp;gt;,.data) zw data ;it will be obvious from the data returned if the key is a KMS type.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Doc link:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.intersystems.com/irisforhealth20233/csp/docbook/DocBook.UI.Page.cls?KEY=ROARS_encrypt_mgmt#ROARS_encrypt_KMS" rel="noopener noreferrer"&gt;Key Management Tasks | InterSystems IRIS for Health 2023.3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>aws</category>
      <category>encryption</category>
      <category>beginners</category>
    </item>
    <item>
      <title>IRIS SIEM System Integration with Crowdstrike Logscale</title>
      <dc:creator>InterSystems Developer</dc:creator>
      <pubDate>Sun, 26 Apr 2026 16:17:27 +0000</pubDate>
      <link>https://dev.to/intersystems/iris-siem-system-integration-with-crowdstrike-logscale-5406</link>
      <guid>https://dev.to/intersystems/iris-siem-system-integration-with-crowdstrike-logscale-5406</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg68k8f6ffrgaoekhdq3l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg68k8f6ffrgaoekhdq3l.png" alt=" " width="800" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IRIS makes &lt;a href="https://www.irs.gov/privacy-disclosure/security-information-and-event-management-siem-systems" rel="noopener noreferrer"&gt;SIEM&lt;/a&gt; systems integration simple with Structured Logging and Pipes!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Adding a SIEM integration to InterSystems IRIS for "Audit Database Events" was dead simple with the &lt;a href="https://cloud.community.humio.com/" rel="noopener noreferrer"&gt;Community Edition of CrowdStrike's Falcon LogScale&lt;/a&gt;, and here's how I got it done.&amp;nbsp;&amp;nbsp;&lt;br&gt;&lt;br&gt;&lt;strong&gt;CrowdStrike Community&amp;nbsp;LogScale Setup&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloud.community.humio.com/" rel="noopener noreferrer"&gt;Getting Started&lt;/a&gt; was ridiculously straightforward and I had the account approved in a couple of days with the following disclaimer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Falcon LogScale Community is a free service providing you with up to 16 GB/day of data ingest, up to 5 users, and 7 day data retention, if you exceed the limitations, you’ll be asked to upgrade to a paid offering. You can use Falcon LogScale under the limitations as long as you want, provided, that we can modify or terminate the Community program at any time without notice or liability of any kind.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Pretty generous and a good fit for this implementation, with the caveat that all good things can come to an end, I guess. Cut yourself an ingestion token in the UI and save it to your favorite hiding place for secrets.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Python Interceptor - irislogd2crwd.py&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I won't go over this amazing piece of software engineering in detail, but it is a simple Python script that accepts STDIN, breaks up what it sees into events, and ships them off to the SIEM platform to be ingested.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;span class="hljs-comment"&gt;#!/usr/bin/env python&lt;/span&gt;
&lt;span class="hljs-keyword"&gt;import&lt;/span&gt; json
&lt;span class="hljs-keyword"&gt;import&lt;/span&gt; os
&lt;span class="hljs-keyword"&gt;import&lt;/span&gt; sys
&lt;span class="hljs-keyword"&gt;import&lt;/span&gt; socket
&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; datetime &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; datetime, timezone
&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; humiolib.HumioClient &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; HumioIngestClient

&lt;span class="hljs-comment"&gt;# Build the client once; the ingest token is read from the environment&lt;/span&gt;
client = HumioIngestClient(
    base_url=&lt;span class="hljs-string"&gt;"https://cloud.community.humio.com"&lt;/span&gt;,
    ingest_token=os.environ[&lt;span class="hljs-string"&gt;"CRWD_LOGSCALE_APIKEY"&lt;/span&gt;]
)
fqdn = socket.getfqdn()

input_list = sys.stdin.read().splitlines() &lt;span class="hljs-comment"&gt;# From ^LOGDMN Pipe!&lt;/span&gt;
&lt;span class="hljs-keyword"&gt;for&lt;/span&gt; irisevent &lt;span class="hljs-keyword"&gt;in&lt;/span&gt; input_list:
    &lt;span class="hljs-keyword"&gt;if&lt;/span&gt; &lt;span class="hljs-keyword"&gt;not&lt;/span&gt; irisevent.strip():
        &lt;span class="hljs-keyword"&gt;continue&lt;/span&gt;  &lt;span class="hljs-comment"&gt;# Skip blank lines, which are not valid JSON&lt;/span&gt;

    &lt;span class="hljs-comment"&gt;# Required for CRWD Data Source; UTC so the trailing "Z" is accurate&lt;/span&gt;
    now = datetime.now(timezone.utc)

    payload = [
        {
            &lt;span class="hljs-string"&gt;"tags"&lt;/span&gt;: {
                &lt;span class="hljs-string"&gt;"host"&lt;/span&gt;: fqdn,
                &lt;span class="hljs-string"&gt;"source"&lt;/span&gt;: &lt;span class="hljs-string"&gt;"irislogd"&lt;/span&gt;
            },
            &lt;span class="hljs-string"&gt;"events"&lt;/span&gt;: [
                {
                    &lt;span class="hljs-string"&gt;"timestamp"&lt;/span&gt;: now.isoformat(timespec=&lt;span class="hljs-string"&gt;'milliseconds'&lt;/span&gt;).replace(&lt;span class="hljs-string"&gt;'+00:00'&lt;/span&gt;, &lt;span class="hljs-string"&gt;'Z'&lt;/span&gt;),
                    &lt;span class="hljs-string"&gt;"attributes"&lt;/span&gt;: {&lt;span class="hljs-string"&gt;"irislogd"&lt;/span&gt;: json.loads(irisevent)}
                }
            ]
        }
    ]
    ingest_response = client.ingest_json_data(payload)
&lt;/code&gt;&lt;/pre&gt;

&lt;blockquote&gt;
&lt;p&gt;You will want to &lt;strong&gt;chmod +x&lt;/strong&gt; this script and put it where &lt;strong&gt;irisowner&lt;/strong&gt; can enjoy it.&lt;/p&gt;
&lt;/blockquote&gt;
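
&lt;p&gt;If you want to poke at the payload shape without touching the network, the payload-building step can be pulled into a function of its own. This is a sketch, not the script above: the name build_payload is hypothetical, the sample lines are made up, and it batches every event into a single request rather than sending one request per line.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def build_payload(lines, host, source="irislogd"):
    """Turn raw JSON log lines into one structured-ingest request body."""
    events = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines between events
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # silently drop lines that are not valid JSON
        stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
        events.append({
            # isoformat() ends in "+00:00" for UTC; swap it for the "Z" suffix
            "timestamp": stamp.replace("+00:00", "Z"),
            "attributes": {"irislogd": record},
        })
    return [{"tags": {"host": host, "source": source}, "events": events}]


if __name__ == "__main__":
    # Made-up sample lines; real ^LOGDMN output will differ.
    sample = [
        '{"level": "INFO", "event": "Example.Event", "text": "hello"}',
        "",
        "not json",
    ]
    print(json.dumps(build_payload(sample, "iris.example.com"), indent=2))
```

&lt;p&gt;The returned list has the same tags/events layout as the payload in the script, so it could be handed to ingest_json_data in its place. Whether dropping malformed lines quietly is acceptable depends on how much you trust the upstream format.&lt;/p&gt;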

&lt;p&gt;&lt;br&gt;&lt;br&gt;&lt;strong&gt;InterSystems IRIS Structured Logging Setup&lt;/strong&gt;&lt;br&gt;&lt;a href="https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=ALOG" rel="noopener noreferrer"&gt;Structured Logging in IRIS&lt;/a&gt; is documented to the 9's, so this will be a Cliff Note to the end state of configuring ^LOGDMN.&amp;nbsp; The thing that caught my attention in the docs is probably the most unclear part of the implementation, but the most powerful and fun for sure.&lt;br&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frqogkjvf367fcrclgkb7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frqogkjvf367fcrclgkb7.png" alt=" " width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After&lt;br&gt;&lt;strong&gt;ENABLING&lt;/strong&gt; the Log Daemon, &lt;strong&gt;CONFIGURING&lt;/strong&gt; the Log Daemon and &lt;strong&gt;STARTING&lt;/strong&gt; logging, your configuration should look like this:&lt;br&gt;&amp;nbsp;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;span class="hljs-built_in"&gt;%SYS&lt;/span&gt;&amp;gt;&lt;span class="hljs-keyword"&gt;Do&lt;/span&gt; &lt;span class="hljs-symbol"&gt;^LOGDMN&lt;/span&gt;
&lt;span class="hljs-number"&gt;1&lt;/span&gt;) Enable logging
&lt;span class="hljs-number"&gt;2&lt;/span&gt;) Disable logging
&lt;span class="hljs-number"&gt;3&lt;/span&gt;) Display configuration
&lt;span class="hljs-number"&gt;4&lt;/span&gt;) Edit configuration
&lt;span class="hljs-number"&gt;5&lt;/span&gt;) &lt;span class="hljs-keyword"&gt;Set&lt;/span&gt; default configuration
&lt;span class="hljs-number"&gt;6&lt;/span&gt;) Display logging status
&lt;span class="hljs-number"&gt;7&lt;/span&gt;) Start logging
&lt;span class="hljs-number"&gt;8&lt;/span&gt;) Stop logging
&lt;span class="hljs-number"&gt;9&lt;/span&gt;) Restart logging

LOGDMN option? &lt;span class="hljs-number"&gt;3&lt;/span&gt;
LOGDMN configuration

Minimum level: -&lt;span class="hljs-number"&gt;1&lt;/span&gt; (DEBUG)
 Pipe command: /tmp/irislogd2crwd.py
       Format: JSON
     Interval: &lt;span class="hljs-number"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;blockquote&gt;
&lt;pre&gt;/tmp/irislogd2crwd.py  # Location of our chmod +x Python Interceptor
JSON                   # Important&lt;/pre&gt;
&lt;/blockquote&gt;
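
&lt;p&gt;Before pointing ^LOGDMN at the script, it can help to feed it a couple of lines by hand and look at the exit code. The sketch below only imitates a one-shot pipe into STDIN; how ^LOGDMN itself launches the pipe command is not covered here, and the helper name dry_run and the sample lines are hypothetical.&lt;/p&gt;

```python
import subprocess
import sys


def dry_run(command, lines):
    """Feed lines to a command's STDIN, roughly as a pipe would, and return the result."""
    return subprocess.run(
        command,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=False,  # inspect returncode/stderr ourselves instead of raising
    )


if __name__ == "__main__":
    # Stand-in command that only counts the lines it receives. Swap in
    # ["/tmp/irislogd2crwd.py"] (with CRWD_LOGSCALE_APIKEY set) to try the real script.
    counter = [
        sys.executable,
        "-c",
        "import sys; print(len(sys.stdin.read().splitlines()))",
    ]
    result = dry_run(counter, ['{"event": "Example.Event"}', '{"event": "Other.Event"}'])
    print(result.returncode, result.stdout.strip(), result.stderr.strip())
```

&lt;p&gt;A non-zero return code or anything on stderr here (a missing token, a script that is not executable) is much easier to spot than a log daemon that quietly ships nothing.&lt;/p&gt;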

&lt;p&gt;Now that we are logging somewhere else, let's just pump up the verbosity in the Audit Log and enable all the events since somebody else is paying for it.&lt;br&gt;&lt;br&gt;Stealing from Sylvain Guilbaud's &lt;a href="https://community.intersystems.com/post/how-activate-all-audit-system-events" rel="noopener noreferrer"&gt;post&lt;/a&gt;:&lt;br&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh0ii8d2x1fi65u4v2ahc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh0ii8d2x1fi65u4v2ahc.png" alt=" " width="800" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;&lt;strong&gt;CrowdStrike LogScale Event Processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It won't take long to get the hang of it, but the Search Console is the beginning of all good things when setting up customized observability based on your events.&amp;nbsp;&amp;nbsp;The search pane with filter criteria displays in the left corner, the available attributes on the left sidebar and the matching events in the results pane in the main view.&lt;br&gt;&lt;br&gt;LogScale uses&amp;nbsp;the LogScale&amp;nbsp;&lt;em&gt;Query Language&lt;/em&gt;&amp;nbsp;(&lt;a href="https://library.humio.com/data-analysis/syntax.html" rel="noopener noreferrer"&gt;LQL&lt;/a&gt;)&amp;nbsp;&lt;span&gt;&amp;nbsp;to back the widgets, alerts and actions.&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8vz7xje2v52zmgh7wyz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8vz7xje2v52zmgh7wyz.png" alt=" " width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I suck at visualizations, so I am sure you could do better than below with a box of crayons, but here are my 4 widgets of glory to put a clown suit on the SIEM events for this post:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw9fh286x7cg58rut35nn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw9fh286x7cg58rut35nn.png" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we look under the hood of the "Event Types" widget, the following LQL is all that is needed behind a time series graph:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;timechart(irislogd.event)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;br&gt;So we did the thing!&lt;br&gt;&lt;br&gt;&lt;strong&gt;We've integrated IRIS with the Enterprise SIEM implementation&lt;/strong&gt; and the Security Team is "😀&amp;nbsp;"&amp;nbsp;&amp;nbsp;&lt;br&gt;&lt;br&gt;The bonus here is everything else that can be accomplished with the exact same development pattern as above:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Notifications&lt;/li&gt;
&lt;li&gt;Actions&lt;/li&gt;
&lt;li&gt;Scheduled Searches&lt;/li&gt;
&lt;li&gt;Scheduled Daily Reports&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>monitoring</category>
      <category>beginners</category>
      <category>security</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
