
Hands-on Monitoring and Alerting guide for Azure resources

When we talk about software quality and detecting flaws early, what immediately comes to mind is writing tests and running them as early as possible in the CI/CD pipeline. Overall, though, quality is about ensuring reliability throughout the entire implemented solution, and that is tightly coupled with monitoring resources, tracking performance, and setting up early alerting mechanisms. By proactively detecting issues like high CPU usage, memory leaks, or slow response times, teams can prevent failures before they impact users.

In this article we are going to focus on aspects of quality that do not necessarily require writing and executing tests, but instead rely on the metrics and logs provided directly by the Azure Portal, visualizing them in an Azure Workbook, an interactive and customizable data visualization tool within the Azure Portal.

Setting the scene

Imagine you're part of a DevOps team responsible for maintaining an application hosted on Azure. Before going to production, you would like to be able to detect slowdowns and occasional service disruptions early. Without a clear picture of the system's health and performance, it's difficult to pinpoint the cause and respond quickly. This lack of visibility and proactive alerting leads to longer downtime and frustrated customers. To address this, we need a robust monitoring and alerting strategy using Azure's built-in tools - starting with identifying where the problem lies, setting up monitoring for the relevant metrics, and building alerting rules that help us react before users are affected.

Let's say we're responsible for maintaining an Orders API, which handles incoming HTTP requests from a web frontend app to process customer orders. It's hosted on Azure App Service and backed by an Azure SQL Database, with Application Insights and/or a Log Analytics workspace enabled. Recently, support tickets have reported that requests to the /submit-order endpoint occasionally take too long or fail, especially during high-traffic periods.

To diagnose and resolve this, we want to answer the following questions:

  • Is the API experiencing high response times or failures?
  • What's causing the slowdown - CPU/memory pressure, database latency, or something else?
  • Would it be useful to set up alerts notifying us as soon as performance degrades?

Our approach will follow these steps:

  1. Monitor metrics to understand the API's real-time performance (e.g., response time, request count, error rate)
  2. Enable Diagnostic Logs to capture deeper insights into failures and long-term trends using Log Analytics
  3. Use KQL Queries to investigate patterns and detect anomalies
  4. Create a Workbook to visualize the data in a centralized, interactive dashboard
  5. Define Alerts with thresholds that will notify us when performance degrades or errors spike.

This structured approach ensures we're not just reacting to problems, but actively detecting and preventing them.

Monitor metrics

To start troubleshooting the performance issues on the /submit-order endpoint, we begin by examining the metrics exposed by the Azure App Service that hosts our Orders API. These metrics give us a snapshot of how the application is performing in real time.

Navigate to Metrics in Azure Portal

  1. Go to the Azure Portal
  2. In the search bar, type and select your App Service (e.g., orders-api-prod)
  3. In the left-hand menu under Monitoring, click Metrics.

(Image: the Metrics option under the Monitoring section of the App Service menu)

After clicking on Metrics, we can choose the metric we want to monitor and see a graphical representation of it. For example, we can select Response time from the dropdown and get the following graph:

(Image: Response time metric chart for the App Service)

Other metrics can also help us address the user complaints, depending on our system architecture. For example, we can choose from the following:

  • Server response time - Tells us how long it takes to respond to HTTP requests
  • Requests - Shows the number of incoming requests. Spikes here may correlate with performance issues
  • HTTP 5xx errors - Indicates server-side errors, which can be tied to crashes or overload
  • CPU Percentage - Helps determine if the instance is under CPU pressure
  • Memory Working Set - Tracks memory usage over time
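
These portal charts can later be cross-checked in Log Analytics (covered in the next sections): the App Service's diagnostic settings can also route platform metrics to the workspace, where they land in the AzureMetrics table. Below is a minimal KQL sketch, assuming that routing is enabled and that the App Service is named orders-api-prod (a hypothetical name); the exact metric names can differ per resource type:

    // A hedged sketch: assumes platform metrics are routed to the Log Analytics
    // workspace via diagnostic settings. Metric names such as "AverageResponseTime",
    // "Http5xx" and "Requests" may vary by resource type - check the metrics list first.
    AzureMetrics
    | where TimeGenerated > ago(6h)
    | where Resource =~ "orders-api-prod"            // hypothetical App Service name
    | where MetricName in ("AverageResponseTime", "Http5xx", "Requests")
    | summarize AvgValue = avg(Average), Total = sum(Total) by MetricName, bin(TimeGenerated, 15m)
    | order by TimeGenerated asc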

Monitor logs

While metrics give us a real-time snapshot of the Orders API's performance, the logs collected in Application Insights and/or a Log Analytics workspace provide a deeper and more granular view of what's actually happening inside the application. Logs can help answer questions like:

  • Which specific requests are failing and why?
  • Are there specific error messages or exceptions being thrown?
  • How is the backend database responding?
  • What patterns can we identify over time?

Access and Explore Logs

Once logging is enabled and data starts flowing into your workspace:

  1. Go to your Log Analytics Workspace
  2. Under Monitoring, click on Logs (the same menu section as the Metrics option shown earlier)
  3. In the query editor, you'll see several predefined tables such as:
    • AppRequests – HTTP request data (e.g., method, URL, duration)
    • AppExceptions – Exceptions thrown by your app
    • AppTraces – Custom traces or log messages from your code
    • AppDependencies – External calls, e.g., to databases or APIs

In the query editor we use Kusto Query Language (KQL), a read-only query language optimized for fast and efficient data exploration, enabling users to filter, aggregate and visualize large datasets easily.

Here are a few useful KQL queries to start exploring what's happening behind the scenes:

  • Slow Requests to /submit-order:

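A minimal sketch of such a query; the 24-hour window and the 1,000 ms threshold are assumptions, so adjust them to your own expectations:

    // Requests to /submit-order slower than 1000 ms in the last 24 hours.
    AppRequests
    | where TimeGenerated > ago(24h)
    | where Url contains "/submit-order"
    | where DurationMs > 1000
    | project TimeGenerated, Name, Url, ResultCode, DurationMs
    | order by DurationMs desc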

  • Count of Failed Requests:

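A sketch along these lines, bucketed per hour (the time window and grain are assumptions):

    // Failed requests per result code and hour over the last 24 hours.
    AppRequests
    | where TimeGenerated > ago(24h)
    | where Success == false
    | summarize FailedCount = count() by ResultCode, bin(TimeGenerated, 1h)
    | order by TimeGenerated asc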

  • Top Exception Messages:

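A sketch that groups exceptions by type and message and keeps the most frequent ones (the window and the top count are assumptions):

    // Ten most frequent exception types/messages in the last 24 hours.
    AppExceptions
    | where TimeGenerated > ago(24h)
    | summarize Occurrences = count() by ExceptionType, OuterMessage
    | top 10 by Occurrences

Since the Orders API is backed by Azure SQL, the AppDependencies table can also show whether database calls are the bottleneck. A hedged sketch; the exact DependencyType string recorded for SQL calls depends on the SDK and language:

    // Average duration and call count per dependency target, slowest first.
    AppDependencies
    | where TimeGenerated > ago(24h)
    | where DependencyType contains "SQL"    // the exact type string depends on the SDK
    | summarize AvgDurationMs = avg(DurationMs), Calls = count() by Target
    | order by AvgDurationMs desc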

Configure Diagnostic Settings

If the AppExceptions table, or any other table we need, is not available, we can enable Diagnostic settings to send these logs to a specific Log Analytics Workspace.

To start capturing logs, we need to ensure our App Service is sending data to a Log Analytics Workspace:

  1. Go to your Orders API App Service in the Azure Portal
  2. Under Monitoring, click Diagnostic settings
  3. Click Add diagnostic setting
  4. Give your setting a name and check:
    • Application Logging
    • Request Logs
    • Failed request tracing
    • AppServiceHTTPLogs

  5. Select Send to Log Analytics Workspace and choose an existing workspace or create a new one
  6. Click Save

Note: Logs can differ depending on the resource type. For App Services, HTTP logs and application logs are particularly useful.

Once the Diagnostic settings are in place, the steps are identical to the previous case: we run KQL queries against the Log Analytics workspace.
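
For example, once AppServiceHTTPLogs start flowing in, a sketch like the following summarizes latency and server errors for the /submit-order endpoint (the time window and hourly grain are assumptions):

    // Hourly average and 95th-percentile response time plus 5xx count for /submit-order,
    // based on the App Service HTTP logs routed by the diagnostic setting above.
    AppServiceHTTPLogs
    | where TimeGenerated > ago(24h)
    | where CsUriStem contains "/submit-order"
    | summarize AvgTimeTaken = avg(TimeTaken),
                P95TimeTaken = percentile(TimeTaken, 95),
                ServerErrors = countif(toint(ScStatus) >= 500)
              by bin(TimeGenerated, 1h)
    | order by TimeGenerated asc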

Workbooks

Understanding metrics, logs, and queries is the first step in enabling Azure resource monitoring. Once this foundation is established, we can analyze individual resources by visiting them and monitoring their behavior. However, for a more comprehensive and centralized approach, it is essential to consolidate metrics and logs in a single, structured view.

One of the visualization tools provided by the Azure Portal is Azure Workbooks. This feature allows users to analyze and visualize data from various Azure resources, logs, and metrics within a single, interactive interface.

Creating an Azure Workbook is a straightforward process. Simply type Azure Workbooks in the Azure Portal search bar, select the service, and click on the Create button. From this point, users can choose to create either an empty Workbook or select from preconfigured templates that cater to common monitoring scenarios.

Regardless of the option chosen, users can click on Edit to customize the Workbook according to their requirements. In edit mode, clicking the Add button allows the inclusion of various visualization components.

(Image: the Add menu options in Workbook edit mode)

As seen in the image above, we can use several options to make our Workbook meet our needs:

  • Text - Add markdown or HTML-based text to provide descriptions, explanations, or headers
  • Query - Run Kusto Query Language (KQL) queries to fetch data from Log Analytics, Azure Resource Graph, or Application Insights
  • Parameters - Define dropdowns, text inputs, or checkboxes to make Workbooks dynamic and interactive
  • Links & Tabs - Add navigation links or tabs to switch between different sections of a Workbook
  • Metrics - Fetch real-time Azure Metrics (e.g., CPU usage, memory utilization) and display them visually
  • Group - Organize content logically, making the Workbook easier to read

We can choose Metrics, where the predefined metrics of each resource are available for display, or Query, where the same KQL queries from before can be applied.
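
For instance, a Query step could reuse the failed-requests idea from earlier, aggregated per hour so it plots nicely as a time chart. This is only a sketch; the 7-day window and hourly grain are assumptions:

    // Hourly failure rate of the Orders API, suitable for a time-chart visualization.
    AppRequests
    | where TimeGenerated > ago(7d)
    | summarize FailedRequests = countif(Success == false), TotalRequests = count() by bin(TimeGenerated, 1h)
    | extend FailureRatePercent = round(100.0 * FailedRequests / TotalRequests, 2)
    | project TimeGenerated, FailureRatePercent, TotalRequests
    | order by TimeGenerated asc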

Once the data is loaded we can choose the preferred visualization option:

  • Charts (area, bar, line, pie, scatter, time)
  • Grids
  • Tiles
  • Stats
  • Graphs
  • Maps
  • Text visualization

Creating custom Workbooks provides a graphical view of our resources for both technical and non-technical audiences.

Alerting

Creating Alert rules is a very easy process, as we can simply reuse the same metrics and/or queries that we have used in our Azure Workbook. Following these steps will allow us to set up an alert:

  • Click Create Alert rule
  • Under Scope, select the Azure resource you want to monitor
  • Under Condition, define the metric or query condition that should trigger the alert (for a log search alert this is a KQL query, as sketched after this list)
  • Under Actions, select or create an Action Group to define who gets notified
  • Provide a name and severity level for the alert rule.
  • Click Create to finalize the alert rule

(Image: the Create an alert rule pane in the Azure Portal)
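
For a log search alert, the Condition step takes a KQL query whose result (a row count or a measured value) is compared against a threshold at each evaluation. A minimal sketch counting recent server errors; the 5-minute window and whatever threshold you pair it with are assumptions:

    // Number of 5xx responses in the last five minutes; combine this with a threshold
    // (e.g. greater than 10) and an evaluation frequency in the alert rule condition.
    AppRequests
    | where TimeGenerated > ago(5m)
    | where toint(ResultCode) >= 500
    | summarize ServerErrors = count()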

Conclusion

In conclusion, effective Monitoring and Alerting in Azure is essential for maintaining visibility, performance, and security across cloud resources. Azure Workbooks provide a centralized and interactive way to visualize metrics and logs, enabling teams to analyze data efficiently. Meanwhile, Azure Alerts ensure proactive monitoring by automatically notifying the right people and triggering automated actions when predefined conditions are met. By leveraging Action groups, organizations can streamline alert management and ensure timely responses to potential issues.

Combining these tools allows for a comprehensive monitoring strategy, where teams can track, analyze, and respond to system behavior in real time. With proper Workbook customization, Alert rule configuration, and Action group management, businesses can optimize performance, reduce downtime, and enhance overall cloud reliability.

In case you are looking for a dynamic and knowledge-sharing workplace that respects and encourages your personal growth as part of its own development, we invite you to explore our current job opportunities and be part of Agile Actors.
