DEV Community

Shashank Banerjea
Shashank Banerjea

Posted on

3 1

Useful Azure Data Explorer queries for Azure Data Factory pipelines

Monitoring the Azure Data Factory Pipelines

Monitoring Azure Data Factory is enabled through collecting diagnostic logs and posting them to Log Analytics (part of Azure Monitor). The other options include – Storage Account or Event Hub for custom processing. There are additional settings provide options to decide what information that can are to be captured. It is also possible to have multiple settings for collecting diagnostic logs, where it is possible to send different data to different collection stores.

The most readily available monitoring is available through Log Analytics Workspace and Azure Data Factory Workbook available in the Azure Marketplace. Detailed documentation to enable monitoring through is available here.

Useful Azure Data Explorer Queries

While the workbook is helpful in surfacing some of the key metrics across multiple data factory instance it is helpful to write couple of custom queries in Azure Data Explorer to review some performance statistics of a Pipeline run.

Following the instructions in the link provided in the section above, you will have created a Log Analytics Workspace to store the Azure Data Factory diagnostic data. The data for Pipeline run is available in table called – “ADFPipelineRun”.

Viewing Duration of Pipeline Runs

One of the useful queries that I find is to view how long my succeeding pipeline are taking. For this example, we assume, what we are monitoring a pipeline called - gsc2adlsgen2copy. A query for the pipeline that would look like:

    ADFPipelineRun 
    | where Status == "Succeeded"
    | where PipelineName == "gcs2adlsgen2copy"
    | project PipelineName, RunId, Start, End, (End - Start)
Enter fullscreen mode Exit fullscreen mode

The query above returns all the execution time of the pipeline. This query returns all the executions of the pipeline (within the time frame set by the query explorer) and how much time they took.

Viewing number of Pipeline Runs exceeding a threshold

If we have established an SLA for the Pipeline run, we can add a threshold value into the query to view pipelines that exceed the threshold. The query would look like this with an arbitrary threshold of 15 seconds:

    ADFPipelineRun 
    | where Status == "Succeeded"
    | where PipelineName == "gcs2adlsgen2copy"
    | where (End - Start) > 15s
    | project PipelineName, RunId, Start, End, (End - Start)
Enter fullscreen mode Exit fullscreen mode

Generating Alerts on missed SLA

To generate an alert from the query above, click on the “+ New Alert” button on top of the query window and select to build a new alert on the Custom query.

New Alert Button

Shown below is the alert created for the query above, based on number of rows returned, monitored every 24 hours for past 24 hours.

New Alert Action

Conclusion

Azure Pipelines can be monitored using Azure Monitor. Alerts can be generated using custom Azure Data Explorer queries when a pipeline takes too long or does not run over a period. More reasons to rejoice and rest easy, my fellow cloud dwellers.

Heroku

Deliver your unique apps, your own way.

Heroku tackles the toil — patching and upgrading, 24/7 ops and security, build systems, failovers, and more. Stay focused on building great data-driven applications.

Learn More

Top comments (0)

Jetbrains image

Build Secure, Ship Fast

Discover best practices to secure CI/CD without slowing down your pipeline.

Read more

👋 Kindness is contagious

Dive into this insightful write-up, celebrated within the collaborative DEV Community. Developers at any stage are invited to contribute and elevate our shared skills.

A simple "thank you" can boost someone’s spirits—leave your kudos in the comments!

On DEV, exchanging ideas fuels progress and deepens our connections. If this post helped you, a brief note of thanks goes a long way.

Okay