DEV Community

Ayaka Hara
Ayaka Hara

Posted on

Query table data in Azure Data Explorer with Kusto to analyse load test results

There are situations where we want to query Table data, such as analysing load test results.
Since the maximum number of entities* that are returned in a single query with LINQ Take operator is 1,000 (Ref - Returning the Top n Entities), you may need to code more to retrieve what you want.

*Entities are sets of properties and can be thought of like rows in a database.

architecture

LINQ has an upper limit of 1000, while Data Explorer with Kusto allows to query large numbers of entities.

In this article, how to query with Kusto in Azure Data Explorer (ADX) will be explained in particular.

TOC

Pre-requisites

First of all, you need to complete the steps to ingest data from Table Storage into Data Explorer via Data Factory to prepare to query large numbers of entities with Kusto.
Please refer to the post "Ingest data from Azure Table Storage into Data Explorer".

Once done with the steps above, you're ready to query data ingested from Azure Table Storage with Kusto.

Objective

In this article, I will use an example from the post "Cost comparison between Azure services to determine architecture" to illustrate how to query Table data with Kusto.

The purpose of querying the table data here is to make sure that one of the requirements, the processing time between device and storage is less than 10 seconds, is met.

Example table data

Let's see how an example table data looks like. It is assumed that there are around 1.2 million entities adding into a single table.

Table Name : telemetry202108180820

PartitionKey RowKey Timestamp Data
992c9af9-b490-44cd-bf95-d9fa61bc3aa4 abcdefghigklmn 2021-08-18T10:33:25.355Z {
"deviceId": "992c9af9-b490-44cd-bf95-d9fa61bc3aa4",
"connectivity": "Online",
"eventType": "Telemetry",
"timestamp": "2021-08-18T19:12:08.1844379+09:00",
"telemetry": {
"6E8E2CE5-3A7D-4997-9056-297BAD62C601":
"12345678901234567890123456789",
"1023EF00-093C-4702-886F-6C9C8B4D3102":
"12345678901234567890123456789",
...
}
}
c06145f6-7843-420c-ae80-fc52710198b5 abcdefghigklmn 2021-08-18T10:33:25.433Z {
"deviceId": "c06145f6-7843-420c-ae80-fc52710198b5",
"connectivity": "Online",
"eventType": "Telemetry",
"timestamp": "2021-08-18T19:12:08.1933468+09:00",
"telemetry": {
"6E8E2CE5-3A7D-4997-9056-297BAD62C601":
"12345678901234567890123456789",
"1023EF00-093C-4702-886F-6C9C8B4D3102":
"12345678901234567890123456789",
...
}
}

timestamp in column Data is the time when telemetry message is sent from each device. Column Timestamp is the time when telemetry message is ingested into Table Storage after processing it with Function App.
These timestamp data will be used to calculate processing time between device and table storage.

Data_Timestamp

Kusto queries to calculate processing time

Here is an example of Kusto queries to calculate processing time between device and table storage.

telemetry202108180820
| project data = parse_json(Data), ingestedTime = Timestamp
| project generatedTime = todatetime(data, timestamp), ingestedTime
| project diff = datetime_diff("Millisecond", ingestedTime, generatedTime)
| summarize avg(diff), max(diff), min(diff), percentiles(diff, 5, 90, 99)
Enter fullscreen mode Exit fullscreen mode

What each step is doing will be explained as the following.

1. Reference to a table

telemetry202108180820
Enter fullscreen mode Exit fullscreen mode

The statement starts with a reference to a table. In this article, the table 'telemetry202108180820' is being used as shown in the example table data section above.

2. Interpret column Data as a JSON

| project 
    data = parse_json(Data), // Interpret column Data as a JSON and rename it to data
    ingestedTime = Timestamp // Rename column Timestamp to ingestedTime
Enter fullscreen mode Exit fullscreen mode

'project' is an operator to select the columns to include, rename or drop, and insert new computed columns.

In the example here, Data is a string and it needs to be interpreted as a JSON value to extract just some properties from the JSON column later.
Also, column Timestamp is renamed to ingestedTime to clarify the difference from another timestamp.

3. Convert timestamp in column Data to datetime

| project 
    generatedTime = todatetime(data, timestamp), // Convert timestamp in column data to datetime scalar and rename it to generatedTime
    ingestedTime // Include ingestedTime
Enter fullscreen mode Exit fullscreen mode

'todatetime' is a function to convert input to datetime scalar. Timestamp in column data is renamed to generatedTime after converting with 'todatetime' function.
ingestedTime which was renamed above is also included again.

4. Calculates calendarian difference between two datetime values

| project 
    diff = datetime_diff("Millisecond", ingestedTime, generatedTime) // Calculates calendarian difference (millisecond) between two datetime values: ingestedTime and generatedTime, and rename it to diff
Enter fullscreen mode Exit fullscreen mode

'datetime_diff' is a function to calculates calendarian difference between two datetime values.
By calculating the difference in millisecond between the ingestedTimeand (when telemetry message is ingested into Table Storage) and the geteratedTime (when telemetry message is sent from each device), the processing time can be obtained. The value is renamed to diff.

5. Produce a table with aggregation functions

| summarize 
    avg(diff), 
    max(diff), 
    min(diff), 
    percentiles(diff, 5, 50, 90, 99) // Produce a table including the average, maximum, minimum, percentile approximate of diff which is calendarian difference (millisecond) between ingestedTime and generatedTime
Enter fullscreen mode Exit fullscreen mode

'summarize' is an operator to produce a table that aggregates the content of the input table.
As an example, the following four aggregation functions are used.

  • avg : Returns an average value across the group
  • max : Returns the maximum value across the group
  • min : Returns the minimum value across the group
  • percentiles : Returns the percentile approximate of the group

Result

Here is the result of the Kusto queries explained ealier. The average is about 5,023 milliseconds which means 5.023 seconds.

ADX_ProcessingTime_Average_result

The result shows that the requirement which the processing time between device and storage should be less than 10 seconds.

Conclusion

This is just an example of how to query table data in Azure Data Explorer with Kusto to analyse load test results.
If you want to retrieve/query large numbers of entities from Table Storage, much more than 1000, one way to do it is to use Kusto instead of LINQ.

Reference

Discussion (0)