Manohari Jayachandran

Posted on Jun 13 • Edited on Jun 19

Azure Application Insights: Monitoring, KQL Queries and Observability in Production

#azure #dotnet #csharp #cloudcomputing

At 2am on a Tuesday, an IP address change in Microsoft infrastructure silently broke our entire integration pipeline at Blue Yonder. Messages stopped flowing between Salesforce and ServiceNow. No errors surfaced in the application logs. The systems reported themselves as healthy. But nothing was moving.

It was Azure Application Insights that found it.

The Application Map showed a 100% failure rate on the Service Bus node. I clicked the connection line, saw connection refused errors on the dependency calls, and traced the root cause to the IP change within minutes. I wrote a Logic App to inspect the dead-letter queue and replay every failed message. Zero data loss. Zero SLA breach.

That incident shaped how I think about observability forever. App Insights is not a monitoring tool you add at the end. It is the foundation you build everything on from day one.

This post covers everything I learned using App Insights in production - the queries, the alerts, the patterns, and the lessons that only come from real incidents.

What App Insights Collects Automatically

The first thing that surprises most engineers is how much App Insights collects with zero configuration. Add one NuGet package and one line in Program.cs and you immediately get:

Every HTTP request your API receives - the URL, method, response code, duration, and whether it succeeded.

Every dependency call your app makes outbound - SQL queries, HTTP calls to external APIs, Service Bus operations, Blob Storage reads. Each one tracked with duration and success status.

Every unhandled exception with the full stack trace, the exact line of code that failed, and all exception properties including inner exceptions.

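Every log message you write with ILogger, captured with its custom properties. By default the Application Insights logging provider records Warning and above; Information-level messages are captured once you lower the log level for that provider.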

Performance counters including CPU usage, memory consumption, and request queue length collected automatically on App Service.

Setting It Up in C# ASP.NET Core

Three steps and you are done.

Install the NuGet package:
Microsoft.ApplicationInsights.AspNetCore
Add one line in Program.cs:
// Reads ApplicationInsights:ConnectionString from configuration.
// The overload that takes a string expects an instrumentation key, not a connection string.
builder.Services.AddApplicationInsightsTelemetry();
Add the connection string to appsettings.json or Azure App Service Configuration (never hardcode it).

That is it. ILogger calls at Warning level and above now go to App Insights automatically. No code changes needed in your controllers or services.
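If you also want Information-level messages in App Insights, the logging provider needs an explicit filter. The following is a minimal Program.cs sketch, assuming the 2.x Microsoft.ApplicationInsights.AspNetCore package and a minimal-API project with implicit usings. The /health endpoint is a hypothetical example, not part of the original setup.

```csharp
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.ApplicationInsights;

var builder = WebApplication.CreateBuilder(args);

// Registers Application Insights; the connection string is read from configuration.
builder.Services.AddApplicationInsightsTelemetry();

// An empty category string applies the rule to every logging category.
// This lowers the minimum level for the Application Insights provider only.
builder.Logging.AddFilter<ApplicationInsightsLoggerProvider>("", LogLevel.Information);

var app = builder.Build();

// Hypothetical endpoint used only to emit an Information-level log entry.
app.MapGet("/health", (ILogger<Program> logger) =>
{
    logger.LogInformation("Health check requested");
    return Results.Ok();
});

app.Run();
```

Lowering the level increases telemetry volume, so it is worth pairing this with sampling on busy services.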

For custom events and metrics - tracking business-level events beyond technical telemetry - inject TelemetryClient and call TrackEvent() with a name and custom properties. I used this at Blue Yonder to track every integration completion with the system name and record count as properties, so I could query processing volumes by system over time.
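A sketch of what such a custom event could look like is below. It is not the code from Blue Yonder: the class name IntegrationTelemetry, the event name IntegrationCompleted, and the property names System and RecordCount are all assumptions chosen for illustration. TelemetryClient is resolved from dependency injection once AddApplicationInsightsTelemetry() has been called.

```csharp
using System.Collections.Generic;
using System.Globalization;
using Microsoft.ApplicationInsights;

// Hypothetical wrapper that records one custom event per completed integration run.
public sealed class IntegrationTelemetry
{
    private readonly TelemetryClient _telemetry;

    public IntegrationTelemetry(TelemetryClient telemetry) => _telemetry = telemetry;

    public void TrackCompleted(string systemName, int recordCount)
    {
        // Custom properties are string key/value pairs; they surface in
        // the customDimensions column of the customEvents table.
        var properties = new Dictionary<string, string>
        {
            ["System"] = systemName,
            ["RecordCount"] = recordCount.ToString(CultureInfo.InvariantCulture)
        };

        _telemetry.TrackEvent("IntegrationCompleted", properties);
    }
}
```

With events shaped like this, a KQL query over customEvents can filter on name == "IntegrationCompleted" and sum toint(customDimensions["RecordCount"]) by tostring(customDimensions["System"]) to get volumes per system.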

KQL - The Query Language That Changes Everything

KQL (Kusto Query Language) is what makes App Insights powerful rather than just a log viewer. It reads left to right with pipe operators - each step filters or transforms the result of the previous step.

The basic structure is always the same:
Start with a table name, then pipe through operators like where, project, summarize, order by, and extend.

Once you understand five queries you can write almost any investigation query you need. Here are the five I used most in production:

All failed requests in the last hour:

requests
| where timestamp > ago(1h)
| where success == false
| project timestamp, name, url, resultCode, duration
| order by timestamp desc

This was my first query every morning and after every deployment. If it returned rows, I had work to do.

Slowest API endpoints over the last 24 hours:

requests
| where timestamp > ago(24h)
| summarize AvgDuration = avg(duration), Count = count() by name
| order by AvgDuration desc
| take 10

This identified performance regressions immediately after deployments before customers reported them.

All exceptions with full detail:

exceptions
| where timestamp > ago(4h)
| project timestamp, type, outerMessage, innermostMessage, method
| order by timestamp desc

The innermostMessage field is the one you want - it has the root cause, not the wrapper exception.

Dependency failures - what external calls failed:

dependencies
| where timestamp > ago(1h)
| where success == false
| project timestamp, type, target, name, duration, resultCode
| order by timestamp desc

This was how I found the Microsoft IP change. Service Bus dependencies all showing connection refused, all starting at the same timestamp.

Error rate by hour - spot patterns:

requests
| where timestamp > ago(24h)
| summarize Total=count(), Failed=countif(success==false) by bin(timestamp,1h)
| extend ErrorRate = round(Failed*100.0/Total, 2)
| order by timestamp asc

This chart pattern shows you whether failures are random noise or a systematic problem getting worse.

The Queries I Wrote at Blue Yonder

Beyond the standard queries, I built several patterns specific to integration monitoring that I have not seen documented elsewhere.

Cross-system correlation - following one record through every system it touched. Every request in our pipeline carried a CorrelationId custom property. With this query I could trace a single Salesforce case through Logic App orchestration, Function App transformation, Service Bus messaging, and the final ServiceNow API call - seeing exact timestamps and durations at each step:

union requests, dependencies, traces, exceptions
| where timestamp > ago(24h)
| where tostring(customDimensions["CorrelationId"]) == "your-id"
| project timestamp, itemType, name, message, duration, success
| order by timestamp asc

Token refresh monitoring - tracking the 3-month Salesforce and 6-month ServiceNow credential refresh cycles that were a significant operational risk before I centralized them in Key Vault:

customEvents
| where name == "TokenRefreshed"
| extend System = tostring(customDimensions["System"])
| summarize LastRefresh=max(timestamp), SuccessCount=countif(tobool(customDimensions["Success"])==true) by System

Integration pipeline health - the Monday morning query that showed overnight processing status for every integration flow:

requests
| where timestamp > ago(12h)
| where name contains "Integration"
| summarize Success=countif(success==true), Failed=countif(success==false), AvgDuration=avg(duration) by name
| extend Status = iff(Failed > 0, "DEGRADED", "HEALTHY")
| order by Status asc

Smart Alerts - Stop Watching Dashboards

The real power of App Insights is alerts that find problems for you.

Metric alerts fire when a number crosses a threshold - failed requests greater than 5 in 5 minutes, response time average greater than 2 seconds, exception count greater than 10 per hour.

Log alerts run a KQL query on a schedule and fire if the results meet a condition. This is how I monitored the dead-letter queue - a query that ran every 5 minutes and fired immediately if any messages appeared in the DLQ. In production, a non-empty DLQ is always a signal that something needs investigation.

Smart detection requires no configuration. App Insights learns your baseline automatically and alerts on anomalies - unusual failure rate spikes, abnormal response time degradation, memory leak patterns. It caught two issues at Blue Yonder that I would not have noticed from metrics alone.

Live Metrics During Deployments

Live Metrics shows you what is happening with about one second of latency - incoming requests per second, exception rate, dependency call rate, CPU and memory of every running instance.

I had Live Metrics open on a second monitor during every production deployment. If the exception rate spiked within 30 seconds of a deploy I knew immediately to roll back. If it stayed flat for 2 minutes the deployment was clean.

This practice caught one bad deployment at Blue Yonder that would have caused a production incident if we had waited for customer reports. The rollback took 90 seconds. The alternative would have been hours of incident response.

Application Map

The Application Map is the fastest way to understand what broke and where. It shows every component of your system as a node - your API, the SQL database, Service Bus, external APIs - with connection lines showing call volume and failure rate between them.

When the Microsoft IP change broke our pipeline, the Service Bus node on the Application Map turned red with a 100% failure rate. I clicked the connection line between our API and Service Bus. The details pane showed connection refused with the target IP address. That one click saved 30 minutes of log digging.

Distributed Tracing

Every request in App Insights gets a unique Operation ID that flows automatically through every instrumented system it touches. A single user request that goes through APIM, Logic App, Function App, Service Bus, and SQL carries the same Operation ID at every hop.

In Transaction Search, paste the Operation ID and see every step in chronological order with exact timestamps and durations. The full story of one request across your entire distributed system in one view.

The practical implication for code: add a CorrelationId to your custom log entries and custom events. Then you can find every log entry related to one business transaction even when it crosses system boundaries.
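One way to attach a CorrelationId to every log entry without repeating it in each call is an ILogger scope. The sketch below assumes the Application Insights logging provider with its default scope handling, and assumes Information-level logs are enabled for that provider. The class CaseSyncHandler and its parameters are hypothetical names for illustration.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical handler for one business transaction.
public sealed class CaseSyncHandler
{
    private readonly ILogger<CaseSyncHandler> _logger;

    public CaseSyncHandler(ILogger<CaseSyncHandler> logger) => _logger = logger;

    public void Handle(string correlationId, string caseId)
    {
        // Key/value pairs in the scope are attached to every log entry
        // written inside the using block, so they can be filtered on later.
        var scopeState = new Dictionary<string, object>
        {
            ["CorrelationId"] = correlationId
        };

        using (_logger.BeginScope(scopeState))
        {
            // {CaseId} is a named parameter and becomes its own custom dimension.
            _logger.LogInformation("Processing case {CaseId}", caseId);
            _logger.LogInformation("Finished case {CaseId}", caseId);
        }
    }
}
```

Both entries can then be found in the traces table by filtering on customDimensions["CorrelationId"], which is the same filter the cross-system correlation query uses.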

Connecting App Insights Across the Azure Stack

App Insights ties together the rest of the Azure stack. Logic Apps diagnostic settings send run history to the same Log Analytics workspace you query with KQL. Function Apps track every function execution automatically once the APPLICATIONINSIGHTS_CONNECTION_STRING app setting points at your resource. APIM connected to App Insights logs every API call along with the calling subscription. Service Bus diagnostic settings expose message counts and DLQ depth as metrics.

The goal is one workspace where you can query across all these data sources simultaneously. When an incident spans multiple systems - which in integration work they always do - you want one place to look, not five dashboards.

Key Lessons From Production

Set up App Insights before you write business logic. Retrofitting observability into an existing system is ten times harder than building it in from the start.
Use structured logging with named parameters. log.LogInformation("Processing {OrderId}", order.Id) is infinitely more queryable than string concatenation. The named parameter becomes a filterable field in KQL.
Add CorrelationId to everything. Every log entry, every custom event, every Service Bus message property. Cross-system tracing without it is guesswork.
Learn five queries well rather than memorizing dozens. The queries in this post covered about 90% of the real incident investigations I ran at Blue Yonder.
Set DLQ alerts before going live. Silent message loss in a Service Bus integration is the hardest production bug to diagnose after the fact. The alert costs 5 minutes to set up. The incident it prevents costs days.
Enable sampling in production. A busy integration platform generates GBs of telemetry per day at full collection. Adaptive sampling reduces that volume to stay within cost limits while keeping the telemetry for a sampled operation together. Exceptions are sampled too by default, so exclude them from sampling if you need every one retained.
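For the sampling lesson, here is a minimal Program.cs sketch that keeps adaptive sampling on but excludes exceptions from it. It assumes the 2.x Microsoft.ApplicationInsights.AspNetCore package and a project with implicit usings. The limit of 5 items per second is an illustrative value, not a recommendation from the original setup.

```csharp
using Microsoft.ApplicationInsights.AspNetCore.Extensions;
using Microsoft.ApplicationInsights.Extensibility;

var builder = WebApplication.CreateBuilder(args);

// Turn off the built-in adaptive sampling so it can be configured explicitly below.
var aiOptions = new ApplicationInsightsServiceOptions
{
    EnableAdaptiveSampling = false
};
builder.Services.AddApplicationInsightsTelemetry(aiOptions);

builder.Services.Configure<TelemetryConfiguration>(config =>
{
    var chain = config.DefaultTelemetrySink.TelemetryProcessorChainBuilder;

    // Sample down to roughly 5 items per second per instance (assumed value),
    // but never drop exception telemetry.
    chain.UseAdaptiveSampling(maxTelemetryItemsPerSecond: 5, excludedTypes: "Exception");
    chain.Build();
});

var app = builder.Build();
app.Run();
```

The excludedTypes argument takes a semicolon-separated list, so other telemetry types such as Event can be excluded the same way if custom events must not be sampled.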

App Insights in the TechStack Blog

The C# ASP.NET Core API powering this blog has App Insights enabled. Every visit to techstackblog.com that triggers an API call is tracked - the request duration, the SQL query to Azure SQL, any exceptions. The connection string lives in Azure App Service Configuration, not in the code or GitHub.

If you are reading this post and wondering whether it is being monitored - it is.

Summary

Azure Application Insights gives you complete visibility into every layer of your Azure stack. Automatic telemetry collection means zero-config coverage from day one. KQL gives you the ability to answer any question about your system's behavior. Smart alerts find problems before your customers do. Application Map shows you what broke. Distributed tracing shows you why.

Build it in from day one. Set your alerts. Learn your five queries. Then stop firefighting and start preventing.

Originally published at TechStack Blog:
https://calm-island-0a7b4b30f.7.azurestaticapps.net/post.html?slug=azure-app-insights-deep-dive

Follow for weekly posts on Azure integration, C#, observability, and cloud engineering.