loading...

Ingest data into Azure Data Explorer via Data Factory

kenakamu profile image Kenichiro Nakamura Updated on ・4 min read

Azure Data Explorer (ADX) is a great service to analyze log types of data. And you need data to play with it. There are many ways to ingest data into ADX, and I explain how to ingest data from blob storage by using Azure Data Factory (ADF).

Prerequisites

I created following Azure resources.

  • Azure Data Explorer
  • Azure Data Factory
  • Azure Storage Account

Alt Text

Upload log to storage

I uploaded csv as data source to blob storage.

1. Go to Storage Account and select "Containers". Then add new container. I named it as "catlogs".

Alt Text

2. Then I uploaded CSV which contains following data.

name,age
chibi,3
mei,3
chelsea,2

Alt Text

Create Database and Table in ADX

1. Go to ADX account from Azure Portal and click "Add database".

Alt Text

2. Create database by giving any name. I call it "catlogsdb".

Alt Text

3. Once database is created, then click "Ingest new data" which brings you to ADX portal.

Alt Text

4. Click "Query" to display created database.

Alt Text

5. Right click the database and click "Create table".

Alt Text

6. Enter table name and select csv from local file.

Alt Text

7. Click "Edit schema", select "Includes column names" to make first row as header, and "Create mapping". Note the mapping name.

Alt Text

8. Finally click create and close to complete the process.

9. Query cats table and confirm you don't have any data yet.

Alt Text

Create pipeline to ingest data

Finally create pipeline in ADF.

Add service principal

To let ADF access to ADX, I need to create service principal. You can give permission at two levels.

  • ADX resource level
  • ADX database level

I give permission on resource level this time.

1. Go to Azure Portal | Azure Active Directory | App registrations and click "New registration".

Alt Text

2. Enter service principal name and click "Register". I named it as "mycatadx-sp".

3. Note "Application ID".

4. Select "Certificates & secrets" and generate new key. Note the key.

5. Then go back to Azure Portal and select ADX resource. Go to "Permissions". Click "Add".

Alt Text

6. Select the created service principal and add.

Alt Text

If you want to give permissions on database level, rather than go to "Permissions" on resource, go to databases and select a database, then select "Permissions".

Alt Text

Add ADX as Linked Service

1. Go to ADF resource from Azure Portal and click "Author & Monitor" which brings you to ADF portal.

Alt Text

2. Select "Manage" icon and click "New" in Linked services.

Alt Text

3. Select " Azure Data Explorer" from the list.

Alt Text

4. Select ADX resource from Azure Subscription, and enter service principal ID/key which you created.

5. Enter database name (catlogsdb) and click "Create".

Add Blob storage as Linked Service

By using similar step, add "Azure Blob storage" as linked service.

Add Dataset

After created linked service, I add dataset to define data model.

1. Go to "Author" menu, and click "+" to add datasets.

Alt Text

2. Select "Azure Blob Storage", then "DeliminatedText".

Alt Text

3. Select created blob linked service and specify file path. Also check "First row as header".

Alt Text

4. Once dataset it created, make sure to add "cats.csv" as file name path and click "Preview for File to confirm you can see the csv file.

Alt Text

5. Add another dataset. Select ADX and select created linked service. Database is auto resolved. Select table. I also renamed the dataset as "adxcats" to distinguish from blob cats dataset.

Alt Text

Create pipeline

Finally create pipeline.

1. From "+" button, select pipeline.

Alt Text

2. Drag and drop "Copy data" from "Move & transform" to the pipeline campus.

Alt Text

3. Select "Source" property and select "cats" dataset which you just created. Click "Preview data" to see the data.

Alt Text

4. Select "Sink" and select "adxcats" from dropdown. Enter mapping name which you obtain when you created table in ADX.

Alt Text

Run the pipeline

Click "Debug" menu to test the pipeline, which triggers the pipeline in debug mode.

Alt Text

Once the pipeline completed, go to ADX portal and query cats table.

Alt Text

Summary

Even though there are multiple steps to provision resources, settings and permissions, everything can be done via GUI.

If you want to automate the process with flexibility, please consider using ADF.

In the next article, I will explain how to consume the data from Databricks.

Discussion

pic
Editor guide