<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ron Sweeney</title>
    <description>The latest articles on DEV Community by Ron Sweeney (@sween).</description>
    <link>https://dev.to/sween</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F535288%2Ff3d7b55c-0e0e-4c1b-abea-6b15ca5512c2.jpeg</url>
      <title>DEV Community: Ron Sweeney</title>
      <link>https://dev.to/sween</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sween"/>
    <language>en</language>
    <item>
      <title>Export InterSystems IRIS Data to BigQuery on Google Cloud Platform</title>
      <dc:creator>Ron Sweeney</dc:creator>
      <pubDate>Mon, 07 Dec 2020 12:54:05 +0000</pubDate>
      <link>https://dev.to/intersystems/export-intersystems-iris-data-to-bigquery-on-google-cloud-platform-3f32</link>
      <guid>https://dev.to/intersystems/export-intersystems-iris-data-to-bigquery-on-google-cloud-platform-3f32</guid>
      <description>&lt;p&gt;Loading your IRIS Data to your Google Cloud Big Query Data Warehouse and keeping it current can be a hassle with bulky Commercial Third Party Off The Shelf ETL platforms, but made dead simple using the iris2bq utility.&lt;/p&gt;

&lt;p&gt;Let's say IRIS is contributing to the workload for a hospital system: routing DICOM images, ingesting HL7 messages, posting FHIR resources, or pushing CCDAs to the next provider in a transition of care. Natively, IRIS persists these objects at various stages of the pipeline, through the nature of the business processes and anything you included along the way. Let's send that data up to Google BigQuery to augment and complement the rest of our data warehouse, and ETL (Extract Transform Load) or ELT (Extract Load Transform) to our hearts' content.&lt;/p&gt;

&lt;p&gt;A reference architecture diagram may be worth a thousand words, but three bullet points may work out a little better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It exports the data from IRIS into DataFrames
&lt;/li&gt;
&lt;li&gt; It saves them into GCS as &lt;code&gt;.avro&lt;/code&gt; to keep the schema alongside the data: this avoids having to specify or create the BigQuery table schema beforehand.
&lt;/li&gt;
&lt;li&gt; It starts BigQuery jobs to import those &lt;code&gt;.avro&lt;/code&gt; files into the respective BigQuery tables you specify.
&lt;/li&gt;
&lt;/ul&gt;
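&lt;p&gt;The three steps above can be sketched in a few lines of Python. This is not iris2bq's actual code (the real utility is JVM/Spark based and writes Avro to GCS); it only illustrates the extract-with-schema idea, using sqlite3 as a stand-in for the IRIS JDBC source and a JSON file as a stand-in for Avro:&lt;/p&gt;

```python
# Sketch of the three iris2bq steps. Table and column names are made up;
# sqlite3 stands in for IRIS, and JSON stands in for the Avro format.
import json
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, firstname TEXT, lastname TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [(i, f"First{i}", f"Last{i}") for i in range(1, 4)])

# Step 1: export the table into an in-memory frame (rows plus column names).
cur = conn.execute("SELECT * FROM people")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

# Step 2: write the rows out with the schema embedded alongside the data,
# the way an .avro file carries its own schema.
payload = {"schema": columns, "rows": rows}
path = os.path.join(tempfile.mkdtemp(), "people.json")
with open(path, "w") as f:
    json.dump(payload, f)

# Step 3: a loader on the other side rebuilds the table from the file
# alone, without being told the schema beforehand.
with open(path) as f:
    loaded = json.load(f)
print(loaded["schema"])     # ['id', 'firstname', 'lastname']
print(len(loaded["rows"]))  # 3
```

&lt;p&gt;The payoff of step 2 is that the file is self-describing: the load job never needs the table schema ahead of time, which is what the Avro format buys us with BigQuery.&lt;/p&gt;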

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QtcKGpa3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/fngt3hzmvadf7im30ih5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QtcKGpa3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/fngt3hzmvadf7im30ih5.png" alt="Alt Text"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt; Under the hood, iris2bq uses the Spark framework for the sake of simplicity, but no Hadoop cluster is needed. It is configured as a "local" cluster by default, meaning the application runs standalone.  The tool is meant to be launched on an interval, either through cron or something like Airflow.&lt;/p&gt;

&lt;p&gt;All you have to do is point it at your IRIS instance and tell it which tables you want to sync to BigQuery; it then syncs them to an existing dataset, or creates a new one that you specify.  &lt;/p&gt;

&lt;h2&gt;How To Set Up&lt;/h2&gt;

&lt;p&gt;And if a reference architecture and three bullet points didn't do a good job explaining it, maybe actually running it will:&lt;/p&gt;

&lt;h3&gt;Google Cloud Setup&lt;/h3&gt;

&lt;p&gt;You can do this any way you want; here are a few options, but all you have to do in GCP is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a Project&lt;/li&gt;
&lt;li&gt;Enable the APIs for BigQuery and Cloud Storage&lt;/li&gt;
&lt;li&gt;Create a service account with access to create resources, and download its JSON key file.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Using the Google Cloud Console (Easiest)&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com"&gt;https://cloud.google.com&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;Using gcloud (Impress Your Friends):&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud projects create iris2bq-demo--enable-cloud-apis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;With Terraform (Coolest):&lt;/h4&gt;

&lt;p&gt;Create a main.tf file after modifying the values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Create the GCP Project
resource "google_project" "gcp_project" {
  name = "IRIS 2 Big Query Demo"
  project_id = "iris2bq-demo"
  // You'll need your organization ID here
  org_id     = "1234567"
}
// Enable the APIS
resource "google_project_services" "gcp_project_apis" {
  project = "iris2bq-demo"
  services   = ["bigquery.googleapis.com", "storage.googleapis.com"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then do a:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform init
terraform plan
terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;IRIS Setup&lt;/h2&gt;

&lt;p&gt;Let's quickly jam some data into IRIS for a demonstration.  &lt;/p&gt;

&lt;p&gt;Create a class like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Class User.People Extends (%Persistent, %Populate)
{
Property ID As %String;
Property FirstName As %String(POPSPEC = "NAME");
Property LastName As %String(POPSPEC = "NAME");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the populate to generate some data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;USER&amp;gt;do ##class(User.People).Populate(10000)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, you can grab an &lt;code&gt;irissession&lt;/code&gt;, ensure you are in the USER namespace and run the following commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;USER&amp;gt; SET result=$SYSTEM.SQL.Execute("CREATE TABLE People(ID int, FirstName varchar(255), LastName varchar(255))")
USER&amp;gt; for i=1:1:100000 { SET result=$SYSTEM.SQL.Execute("INSERT INTO People VALUES ("_i_", 'First"_i_"', 'Last"_i_"')") }
Both routes will create a table called "People" and insert 100,000 rows.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whichever route you take, if everything worked out you should be able to query some dummy rows in IRIS.  These are the rows we are sending to BigQuery.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gGcNn93V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5kx8wp83vt000zgvb34x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gGcNn93V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/5kx8wp83vt000zgvb34x.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;IRIS2BQ Setup&lt;/h2&gt;

&lt;p&gt;Download the latest release of the &lt;a href="https://basenube.github.io/iris2bq/"&gt;iris2bq&lt;/a&gt; utility and unzip it.  Then change to the &lt;code&gt;bin&lt;/code&gt; directory, move your credentials file into the root of that directory, and create an application.conf file (as shown below) in the same place. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9tq3sh84--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/h9eilstbz8oiazt6gi3m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9tq3sh84--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/h9eilstbz8oiazt6gi3m.png" alt="Alt Text"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Taking a look at the configuration file below, you can get an idea of how the utility works.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Specify a JDBC URL and the credentials for the system user.&lt;/li&gt;
&lt;li&gt; Give it a list of tables that you want to appear in BigQuery.&lt;/li&gt;
&lt;li&gt; Tell the utility which project to point to and where your credentials file lives.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then tell it a target BigQuery dataset, and a target bucket to write the &lt;code&gt;.avro&lt;/code&gt; files to.  &lt;/p&gt;

&lt;p&gt;Quick note on the gcloud block: the dataset and the bucket do not need to exist beforehand; the utility will create those resources for you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jdbc {
  url = "jdbc:IRIS://127.0.0.1:51773/USER"
  user = "_SYSTEM"
  password = "flounder" // the password is flounder
  tables = [ "people" ] //IRIS tables to send to big query
}
gcloud {
  project = "iris2bq-demo"
  service-account-key-path = "service.key.json" //gcp service account
  bq.dataset = "iris2bqdemods" // target bq dataset
  gcs.tmp-bucket = "iris2bqdemobucket" //target storage bucket
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
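&lt;p&gt;A quick sanity check before a run can save a failed sync. The following Python sketch is not part of iris2bq; it naively scans a config file for the keys the utility expects (a real HOCON parser would be far more robust, and the key list here is taken from the example above):&lt;/p&gt;

```python
# Naive pre-flight check for an iris2bq-style config file.
# Assumption: these are the keys shown in the example configuration.
import re

REQUIRED = ["url", "user", "password", "tables",
            "project", "service-account-key-path",
            "bq.dataset", "gcs.tmp-bucket"]

def missing_keys(text):
    # Strip // comments (note: this also eats the tail of the jdbc URL,
    # which is fine because we only look for `key =` assignments).
    stripped = re.sub(r"//.*", "", text)
    return [k for k in REQUIRED
            if not re.search(rf'(^|\s){re.escape(k)}\s*=', stripped)]

sample = '''
jdbc {
  url = "jdbc:IRIS://127.0.0.1:51773/USER"
  user = "_SYSTEM"
  password = "flounder"
  tables = [ "people" ]
}
gcloud {
  project = "iris2bq-demo"
  service-account-key-path = "service.key.json"
  bq.dataset = "iris2bqdemods"
  gcs.tmp-bucket = "iris2bqdemobucket"
}
'''
print(missing_keys(sample))  # [] -- nothing missing
```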



&lt;p&gt;&lt;strong&gt;Run&lt;/strong&gt;&lt;br&gt;
At this point we should be parked at a command prompt in the root of the utility, with the conf file we created and the JSON credentials file.  &lt;/p&gt;

&lt;p&gt;Now that we have all that in place, let's run it and check the result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ export GOOGLE_CLOUD_PROJECT=iris2bq-demo 
$ exportGOOGLE_APPLICATION_CREDENTIALS=service.key.json 
$./iris2bq -Dconfig.file=configuration.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is a tad chatty, but if the import was successful it will state &lt;code&gt;people import done!&lt;/code&gt;  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--trkd3xw2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/vpwwuybkg7c1agnet9m6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--trkd3xw2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/vpwwuybkg7c1agnet9m6.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's head over to BigQuery and inspect our work...&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NUNIHr87--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gdq46eqg46t9beuignav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NUNIHr87--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gdq46eqg46t9beuignav.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
 &lt;/p&gt;

&lt;p&gt;Hope you found this helpful!  &lt;/p&gt;

&lt;p&gt;Now set up a job to run it on an interval and &lt;strong&gt;LEFT JOIN&lt;/strong&gt; all over your IRIS data in BigQuery!  &lt;/p&gt;
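&lt;p&gt;For the cron route mentioned earlier, a minimal crontab sketch might look like this. The install path, schedule, and log location are assumptions, not part of the utility:&lt;/p&gt;

```shell
# Hypothetical crontab entry: sync IRIS to BigQuery every 15 minutes.
# A cron command must stay on a single line.
*/15 * * * * cd /opt/iris2bq/bin && GOOGLE_CLOUD_PROJECT=iris2bq-demo GOOGLE_APPLICATION_CREDENTIALS=service.key.json ./iris2bq -Dconfig.file=application.conf >> /var/log/iris2bq.log 2>&1
```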

</description>
      <category>googlecloud</category>
      <category>intersystems</category>
    </item>
  </channel>
</rss>
