Peter Bons

Posted on Jan 6, 2019 • Edited on Jan 15, 2020

Reporting Metrics Using .Net (Core) EventSource and EventCounter

#devops #csharp #monitoring #logging

Hardly anybody argues the importance of instrumenting your code and there are some built-in log options in the .Net Framework, not to mention countless great 3rd party libraries like Nlog, Serilog and Log4Net. But when it comes to reporting metrics (Think: requests per minute, usage, message rate and things like that) the built-in support is very limited. Of course we have the Performance Counters but they cannot be used cross-platform as they are Windows only. Luckily there is a relative simple way to report metrics using the EventSource class that is cross-platform and a part of the Framework so there is no need for external dependencies.

Introducing EventCounters

ETW (Event Tracing for Windows) and the EventSource class have long been a part of the .Net Framework. EventSources are heavily used by the framework itself, and by a lot of (3rd party) libraries as well. It allows you to use structured logging in your application code with a minimal performance overhead. The data can be collected in-process or out-of-process using tools like PerfView.

In this blogpost we will take a closer look to the not so well-known EventCounter class that lets us define and report metrics using an EventSource. I specifically want to show how you can read those counters in-process using an EventListener as this is not very well documented.

As a starter, let us see what the CoreFx team has to say about EventCounters (taken from their tutorial):

While EventSource is fast, logging too many events for very frequent events is still a performance hit. In this tutorial, we will introduce EventCounter, a mechanism for measuring performance for very frequent events.

For event that happen very frequently (for example, if it happen every few milliseconds), in general, you will want the performance overhead per event to be very low (e.g. less than a millisecond), otherwise it is going to cost a significant performance overhead. Logging an event, at the end of the day, need to write something to the disk. If the disk is not fast enough, you will lost events. We need a solution other than logging the event itself.

When dealing with large number of events, knowing the measure per event is not very useful either. Most of the time all we need is just some statistics out of it. So we could crank the statistics within the process itself and then write an event once in a while to report the statistics, that's what EventCounter will do for us.

Sounds great, doesn't it? It is a matter of defining and reporting metrics and the framework will do the aggregation and calculation in memory. All we need to do is to specify the interval at which the data is outputted.

The Recipe

First, we need to define an EventSource that will contain the EventCounter instances. You can dynamically create EventCounter instances at runtime but, due to a bug I documented here, you will have the create at least one EventCounter instance in the constructor.

[EventSource(Name = "My-CustomMetricsEventSource-Minimal")]
public sealed class CustomMetricsEventSource : EventSource
{
    private EventCounter methodDurationCounter;

    private Dictionary<string, EventCounter> dynamicCounters =
        new Dictionary<string, EventCounter>();

    public static CustomMetricsEventSource Log = new CustomMetricsEventSource();

    public CustomMetricsEventSource()
    {
        methodDurationCounter = new EventCounter(nameof(methodDurationCounter), this);
    }

    public void ReportMethodDurationInMs(long milliseconds)
    {
        methodDurationCounter.WriteMetric(milliseconds);
    }

    public void ReportMetric(string name, float value)
    {
        if (!dynamicCounters.TryGetValue(name, out EventCounter counterInstance))
        {
            counterInstance = new EventCounter(name, this);
            dynamicCounters.Add(name, counterInstance);
        }
        counterInstance.WriteMetric(value);
    }
}

We will use CustomMetricsEventSource to log our metrics in a console application. But we also need a listener that picks up the logged metrics and sends them to a destination. In this example we will write the data to the console using a custom implementation of an EventListener:

internal class CustomMetricsEventListener : EventListener
{
    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        var counterData = eventData.ToEventCounterData();

        // Only write to console if actual data has been reported
        if (counterData?.Count == 0)
            return;

        Console.WriteLine(
            $"Counter {counterData.Name} reported values " +
            $"Min: {counterData.Min}, " +
            $"Max: {counterData.Max}, " +
            $"Count {counterData.Count}, " +
            $"Mean {counterData.Mean}, " +
            $"StandardDeviation: {counterData.StandardDeviation}, " +
            $"IntervalSec: {counterData.IntervalSec}");
    }
}

The code listened above uses the ToEventCounterData method. This is actually an extension method. It returns an EventCounterData instance which makes it easier to work with the reported metric data:

public static class EventWrittenEventArgsExtensions
{
    public static bool IsEventCounter(this EventWrittenEventArgs eventData)
    {
        return eventData.EventName == "EventCounters";
    }

    public static EventCounterData ToEventCounterData(this EventWrittenEventArgs eventData)
    {
        if (!eventData.IsEventCounter())
            return null;

        return new EventCounterData(eventData);
    }
}

public class EventCounterData
{
    public EventCounterData(EventWrittenEventArgs eventData)
    {
        var payload = (IDictionary<string, object>) eventData.Payload[0];

        Name = payload["Name"].ToString();
        Mean = (float)payload["Mean"];
        StandardDeviation = (float)payload["StandardDeviation"];
        Count = (int)payload["Count"];
        IntervalSec = (float)payload["IntervalSec"];
        Min = (float)payload["Min"];
        Max = (float)payload["Max"];
    }

    public string Name { get; }
    public float Mean { get; }
    public float StandardDeviation { get; }
    public int Count { get; }
    public float IntervalSec { get; }
    public float Min { get; }
    public float Max { get; }
}

Finally we can create the console application that will call CustomMetricsEventSource to log metric values. The CustomMetricsEventListener is constructed in such a way that the metrics are outputted once per second.

It is important to understand that it is the configuration of the CustomMetricsEventListener that determines the metric reporting interval. This is done by passing a numeric value specifying the interval in seconds to the EventCounterIntervalSec argument.

class Program
{
    static void Main(string[] args)
    {
        var reader = new CustomMetricsEventListener();
        var arguments = new Dictionary<string, string>
        {
            {"EventCounterIntervalSec", "1"}
        };
        reader.EnableEvents(CustomMetricsEventSource.Log, EventLevel.LogAlways, EventKeywords.All, arguments);

        var random = new Random();
        for (int i = 0; i <= 10; i++)
        {
            SleepingBeauty(random.Next(10, 200));
        }

        Console.ReadKey();
    }

    static void SleepingBeauty(int sleepTimeInMs)
    {
        var stopwatch = Stopwatch.StartNew();

        Thread.Sleep(sleepTimeInMs);

        stopwatch.Stop();

        CustomMetricsEventSource.Log.ReportMethodDurationInMs(stopwatch.ElapsedMilliseconds);
        CustomMetricsEventSource.Log.ReportMetric("someCounter", DateTime.Now.Millisecond);
    }
}

When we run the program we will see the metric values being reported to the console:

Counter methodDurationCounter reported values Min: 23, Max: 215, Count 8, Mean 113,125, StandardDeviation: 64,10429, IntervalSec: 1,026394

Counter someCounter reported values Min: 87, Max: 982, Count 9, Mean 603,1111, StandardDeviation: 269,2832, IntervalSec: 1,026394

Do mind that the metrics will always be reported at the specified interval (using EventCounterIntervalSec) even if no call to the WriteMetric method of an EventCounter instance is made. All numeric values, except IntervalSec, will have the default value 0.

You can find the whole working example in my GitHub repository:

Expecho / Blog

Placeholder for code I blogged about, see

Blog

Placeholder for code I blogged about

Folders in this repository

Metrics: Reporting Metrics Using .Net (Core)

View on GitHub

What is next?

Of course, reporting those values to the console does not add much value, it is up to you to send the data to your own logging backend. You could for example send them as custom metrics to Application Insights using the EventCounterCollectionModule, log to your ELK stack or whatever log system you are using.

I really wish reporting metrics gets a bit more attention since having metrics data available is very important in order to have a detailed insight in how your application is behaving. For example, you can use metrics to scale in or out or get insights in the usage of certain parts of your application. Most logging frameworks I know of do not cater for metrics.

There was a lot of discussion about whether or not the support writing metric using the new ILogger interface of .Net Core but in the end there is no support for it right know. More details about the how and the why can be read in the issue created for this discussion:

Add support for reporting application metrics via logging #708

anurse posted on Sep 19, 2017

Overview

We want to enable applications to emit arbitrary named metrics and have them captured by telemetry/logging systems (App Insights, Elasticsearch/ELK, etc.) as well as other metrics systems (Statsd, InfluxDB, etc.). An abstraction is desired in order to allow applications to emit metrics without being coupled to the destination of the metrics, particularly with features like App Insights auto-lightup where the app doesn't directly depend upon App Insights.

Proposal

We will add support for metrics reporting to ILogger. It will be done via a new companion interface IMetricsLogger, which loggers may optionally implement. Applications can emit metrics through this interface (see API below), and logger providers which support metrics can receive and process those metrics. Metrics are (name, value) tuples, where value is always floating point number. Metrics are expected to be viewed in aggregate, hence it doesn't make sense to use a generic data type like object for the metric, so metric values are always doubles. Supporting arbitrary types of metric values is a non-goal. Reporting metrics should be as efficient as possible, especially in the case where no logger provider is registered to listen for them, or they are filtered out.

Why add to `ILogger`?

Initially, I considered developing a new abstraction for Metrics. However, I encountered a few concerns:

We would need to develop a structure for naming/organizing metrics (likely using type/namespace names as a base)
We would need to develop a structure for filtering metrics (by categories, names, granularities, etc.)
Many telemetry systems provide support for both log and metric collection, and would have to provide a new system
Consumers have to pull in two different diagnostics reporters from DI (ILogger<T>, IMetricReporter<T>, etc.)
This would create Yet Another Diagnostics Abstraction (on top of EventSource, EventCounters, DiagnosticSource, Perf Counters, yada yada yada (see what I did there? ;P))

The fact that 1 and 2 above already exist in the Logging pipeline, and many metric sinks are also logging sinks, led to the idea of integrating metrics into Logging rather than creating a completely new abstraction.

The main reason why we would not want to add this to ILogger would be a risk

The API

`IMetricLogger`

Optional companion interface for ILogger implementations to indicate their support for metrics. The ILogger instances returned by ILoggerProvider are expected to implement this interface if and only if they wish to receive metrics.

namespace Microsoft.Extensions.Logging
{
    public interface IMetricLogger
    {
        /// <summary>
        /// Define a new metric with the provided name and return an <see cref="IMetric"/> that can be used to report values for that metric.
        /// </summary>
        /// <param name="name">The name of the metric to define</param>
        /// <returns>An <see cref="IMetric"/> that can be used to report values for the metric</returns>
        IMetric DefineMetric(string name);
    }
}

`IMetric`

Interface that allows metric data to be reported. Metric values

namespace Microsoft.Extensions.Logging
{
    public interface IMetric
    {
        /// <summary>
        /// Record a new value for this metric.
        /// </summary>
        /// <param name="value">The value to record for this metric</param>
        void RecordValue(double value);
    }
}

`LoggerMetricsExtensions`

Provides an extension method on ILogger to enable consumers to emit metrics, even if the underlying logger provider doesn't support it (of course, the metrics will be ignored in that case).

namespace Microsoft.Extensions.Logging
{
    public static class LoggerMetricsExtensions
    {
        /// <summary>
        /// Define a new metric with the provided name and return an <see cref="IMetric"/> that can be used to report values for that metric.
        /// </summary>
        /// <remarks>
        /// If none of the registered logger providers support metrics, values recorded by this metric will be lost.
        /// </remarks>
        /// <param name="name">The name of the metric to define</param>
        /// <returns>An <see cref="IMetric"/> that can be used to report values for the metric</returns>
        public static IMetric DefineMetric(this ILogger logger, string name)
        {
            if(logger is IMetricLogger metricLogger)
            {
                return metricLogger.DefineMetric(name);
            }
            return NullMetric.Instance;
        }
    }
}

`MetricValueExtensions`

Extension methods for IMetric to support other common metric value types (as mentioned above, the underlying type is still double, so the value must be representable as a double).

Suggestion: We could have a .Time() API that returns an IDisposable which uses a Stopwatch to do the timing for you as well. That would be easy to add later though.

using System;

namespace Microsoft.Extensions.Logging
{
    public static class MetricValueExtensions
    {
        /// <summary>
        /// Record a new value for this metric.
        /// </summary>
        /// <remarks>
        /// This is a convenience method that will convert the <see cref="TimeSpan"/> to a <see cref="double"/> via
        /// the <see cref="TimeSpan.TotalMilliseconds"/> property.
        /// </remarks>
        /// <param name="metric">The metric to record the value on.</param>
        /// <param name="value">The value to record for this metric.</param>
        public static void RecordValue(this IMetric metric, TimeSpan value) => metric.RecordValue(value.TotalMilliseconds);
    }
}

`Logger` updates

The aggregate Logger class we return from LoggerFactory.CreateLogger would be updated to support IMetricsLogger, even if none of the ILoggers it aggregates support it. It would only send metrics to the loggers that are known to support the interface.

`Define`/`Record` pattern

In order to keep the performance cost as low as possible for recording the actual metric value, the API uses a Define/Record pattern. Consumers should expect to call DefineMetric("foo") once for a particular value of "foo" and store the result as globally as possible (in singleton types, etc.). It is DefineMetric that does the bulk of the work to set up the metric recording. After calling DefineMetric, the RecordValue call on the IMetric interface can choose how to store the value based on the providers needs. For example, in a provider that plans to send pre-aggregated metrics, the RecordValue could simply update rolling aggregate values and then drop the actual recording (thus having constant storage requirements for each metric). If a provider needs to report individual metrics, it can use dynamically-allocated buffers or ring buffers, depending upon the configuration and the needs of the provider. As an example of an IMetric implementation based on pre-aggregating data, see https://gist.github.com/anurse/03c6746204317bd3616c372b4df9dbba

Filtering Metrics

Metrics should be filterable, and they should participate in the existing Logger filtering pipeline. My current proposal is that Metrics are treated like LogLevel.Critical messages, in that if the category is enabled at all, the metrics are written. We could consider capturing a LogLevel in .DefineMetric and using that to provide different "levels" of metrics. Since the existing filters only let you filter by Provider, Category, and Level, it is very difficult to filter out specific metrics by name (as that would require a new Filter structure). This may make LogLevel support for metrics more important.

@pakrym and I talked and came up with a specific suggestion

Metrics can be filtered at the provider and category name, but there is no concept of a LogLevel for metrics. However, filters can disable all metrics for a particular provider/category combination. To do this, we'll add an AllowMetrics boolean to LoggerFilterRule which defaults to true. When writing a metric, we'll check the provider and category name against the filter options as well as checking this boolean.

Scopes and Property Bags

FEEDBACK REQUESTED

My initial thought is that Metrics exist outside of scopes, as they are generally aggregated and Scopes imply per-event data. Providers could, in theory, give their IMetric instances access to read active scope data, and attach them to metrics if they choose (for example, systems that report unaggregated metrics may want to). For similar reasons, property bags (i.e. Log Values/Structured Logging, etc.) don't really make sense for metrics in general as they are usually aggregated. Having said that, many providers do support arbitrary properties (InfluxDB, App Insights), so some level of support may be useful. We could easily add an optional parameter to .RecordValue to allow adding arbitrary properties to each metric, and providers could choose to disregard it if it doesn't make sense for that provider.

Impact on existing Providers

There is minimal impact on existing Providers. Because the interface provides an opt-in model for a provider to receive metrics, existing providers are unaffected. I'd imagine providers like Serilog would not participate in this system as they don't have an infrastructure for metrics (/cc @nblumhardt , I may be wrong!). Providers like App Insights may choose to either add metrics support to their existing providers or add new providers that only handle metrics (and ignore log messages) that are co-registered.

View on GitHub

Top comments (8)

SteveHarveyUK • Sep 4 '19 • Edited

Great post - just what I was looking for to help with metrics in a .Net Standard library... But... I think that there might be another bug in there.

Have you ever tried having multiple EventSources?

While WriteEvent() calls do appear to be filtered based on the EventSource used to enable to EventListener, I appear to be seeing all EventCounter events on all my EventListeners regardless of how I enabled them.

Peter Bons • Sep 9 '19

I've tried to reproduce it but I can't. Can you share some details (or code) how you defined and enabled the EventSources?

SteveHarveyUK • Sep 9 '19

I've forked your repo to github.com/SteveHarveyUK/Blog.git

In there I've moved things around to provide a factory base example that demonstrates my usage pattern and the issue I see.

If you change the enableFiltering parameter on the factory it'll turn on my own implementation of filtering in the EventListener.

Peter Bons • Sep 10 '19

So let me get this clear: you want to be able to enable or disable collection metrics of a specific EventSource?

SteveHarveyUK • Sep 10 '19

I want to be able to have a number of independent EventSources with EventCounters collecting different metrics with each having its own EventListener that only receives the events from the EventSource and EventCounter that was enabled against it at potentially different frequencies.

So for instance I can have one instance monitoring cache table usage and delivering me the update rate per second and perhaps another giving me the stats against all queues in an application every five seconds.

I believe that I've achieved that with my example code, BUT I've had to manually filter the EventCounter events as they don't seem to be filtered by the framework based on the EventSource passed into EventListener.EnableEvents() method.

I'm just trying to check that I've not missed something as I'd have expected the built in filtering that appears to work on EventSource.WriteEvent() to work for the EventCounter events too.

Peter Bons • Sep 10 '19

If you do not want to collect metrics just don't pass the EventCounterIntervalSec argument in the dictionary. Otherwise just use another interval. I made some changes to your repo but can't push them to your repo. Can you add me as a contributer? (I can't fork your repo since it is a fork of mine)

SteveHarveyUK • Sep 10 '19

Added as collaborator. Thanks for your help!!

Peter Bons • Sep 10 '19 • Edited

I've created a pull request in which I demonstrate how I would do it. Probably better to communicate using the PR from now on.