Kenichiro Nakamura

Posted on Sep 17, 2020 • Edited on Sep 23, 2020

Ingest data into Azure Data Explorer via Data Factory

#azure #adx #adf

Azure Data Explorer (ADX) is a great service to analyze log types of data. And you need data to play with it. There are many ways to ingest data into ADX, and I explain how to ingest data from blob storage by using Azure Data Factory (ADF).

Prerequisites

I created following Azure resources.

Azure Data Explorer
Azure Data Factory
Azure Storage Account

Upload log to storage

I uploaded csv as data source to blob storage.

1. Go to Storage Account and select "Containers". Then add new container. I named it as "catlogs".

2. Then I uploaded CSV which contains following data.

name,age
chibi,3
mei,3
chelsea,2

Create Database and Table in ADX

1. Go to ADX account from Azure Portal and click "Add database".

2. Create database by giving any name. I call it "catlogsdb".

3. Once database is created, then click "Ingest new data" which brings you to ADX portal.

4. Click "Query" to display created database.

5. Right click the database and click "Create table".

6. Enter table name and select csv from local file.

7. Click "Edit schema", select "Includes column names" to make first row as header, and "Create mapping". Note the mapping name.

8. Finally click create and close to complete the process.

9. Query cats table and confirm you don't have any data yet.

Create pipeline to ingest data

Finally create pipeline in ADF.

Add service principal

To let ADF access to ADX, I need to create service principal. You can give permission at two levels.

ADX resource level
ADX database level

I give permission on resource level this time.

1. Go to Azure Portal | Azure Active Directory | App registrations and click "New registration".

2. Enter service principal name and click "Register". I named it as "mycatadx-sp".

3. Note "Application ID".

4. Select "Certificates & secrets" and generate new key. Note the key.

5. Then go back to Azure Portal and select ADX resource. Go to "Permissions". Click "Add".

6. Select the created service principal and add.

If you want to give permissions on database level, rather than go to "Permissions" on resource, go to databases and select a database, then select "Permissions".

Add ADX as Linked Service

1. Go to ADF resource from Azure Portal and click "Author & Monitor" which brings you to ADF portal.

2. Select "Manage" icon and click "New" in Linked services.

3. Select " Azure Data Explorer" from the list.

4. Select ADX resource from Azure Subscription, and enter service principal ID/key which you created.

5. Enter database name (catlogsdb) and click "Create".

Add Blob storage as Linked Service

By using similar step, add "Azure Blob storage" as linked service.

Add Dataset

After created linked service, I add dataset to define data model.

1. Go to "Author" menu, and click "+" to add datasets.

2. Select "Azure Blob Storage", then "DeliminatedText".

3. Select created blob linked service and specify file path. Also check "First row as header".

4. Once dataset it created, make sure to add "cats.csv" as file name path and click "Preview for File to confirm you can see the csv file.

5. Add another dataset. Select ADX and select created linked service. Database is auto resolved. Select table. I also renamed the dataset as "adxcats" to distinguish from blob cats dataset.