DEV Community

Rajesh Natarajan
Rajesh Natarajan

Posted on

Ingest data into Azure Data Explorer via Data Factory

Azure Data Explorer (ADX) is a great service to analyze log types of data. And you need data to play with it. There are many ways to ingest data into ADX, and I explain how to ingest data from blob storage by using Azure Data Factory (ADF).

Prerequisites

I created following Azure resources.

  • Azure Data Explorer
  • Azure Data Factory
  • Azure Storage Account

Image description

Upload log to storage
I uploaded csv as data source to blob storage.

1.Go to Storage Account and select "Containers". Then add new container. I named it as "catlogs".

Image description
2.Then I uploaded CSV which contains following data.
name,age
chibi,3
mei,3
chelsea,2

Image description
Create Database and Table in ADX
1.Go to ADX account from Azure Portal and click "Add database".

Image description
2.Create database by giving any name. I call it "catlogsdb".

Image description
3.Once database is created, then click "Ingest new data" which brings you to ADX portal.
Image description
4.Click "Query" to display created database.

Image description
5.Right click the database and click "Create table".
Image description
6.Enter table name and select csv from local file.
Image description
7.Click "Edit schema", select "Includes column names" to make first row as header, and "Create mapping". Note the mapping name.
Image description
8.Finally click create and close to complete the process.

9.Query cats table and confirm you don't have any data yet.
Image description
Create pipeline to ingest data
Finally create pipeline in ADF.

Add service principal
To let ADF access to ADX, I need to create service principal. You can give permission at two levels.

  • ADX resource level
  • ADX database level I give permission on resource level this time.

1.Go to Azure Portal | Azure Active Directory | App registrations and click "New registration".

Image description
2.Enter service principal name and click "Register". I named it as "mycatadx-sp".

3.Note "Application ID".

4.Select "Certificates & secrets" and generate new key. Note the key.

5.Then go back to Azure Portal and select ADX resource. Go to "Permissions". Click "Add".

Image description

6.Select the created service principal and add.

Image description

If you want to give permissions on database level, rather than go to "Permissions" on resource, go to databases and select a database, then select "Permissions".

Image description
Add ADX as Linked Service
1.Go to ADF resource from Azure Portal and click "Author & Monitor" which brings you to ADF portal.

Image description
2.Select "Manage" icon and click "New" in Linked services.

Image description
3.Select " Azure Data Explorer" from the list.
Image description
4.Select ADX resource from Azure Subscription, and enter service principal ID/key which you created.

5.Enter database name (catlogsdb) and click "Create".

Add Blob storage as Linked Service
By using similar step, add "Azure Blob storage" as linked service.

Add Dataset
After created linked service, I add dataset to define data model.

1.Go to "Author" menu, and click "+" to add datasets.

Image description
2.Select "Azure Blob Storage", then "DeliminatedText".
Image description
3.Select created blob linked service and specify file path. Also check "First row as header".

Image description
4.Once dataset it created, make sure to add "cats.csv" as file name path and click "Preview for File to confirm you can see the csv file.

Image description
5.Add another dataset. Select ADX and select created linked service. Database is auto resolved. Select table. I also renamed the dataset as "adxcats" to distinguish from blob cats dataset.

Image description
Create pipeline
Finally create pipeline.

1.From "+" button, select pipeline.

Image description
2.Drag and drop "Copy data" from "Move & transform" to the pipeline campus.

Image description

3.Select "Source" property and select "cats" dataset which you just created. Click "Preview data" to see the data.

Image description

4.Select "Sink" and select "adxcats" from dropdown. Enter mapping name which you obtain when you created table in ADX.
Image description
Run the pipeline
Click "Debug" menu to test the pipeline, which triggers the pipeline in debug mode.

Image description
Once the pipeline completed, go to ADX portal and query cats table.
Image description
Summary
Even though there are multiple steps to provision resources, settings and permissions, everything can be done via GUI.

If you want to automate the process with flexibility, please consider using ADF.

Top comments (0)