Matt Hosking

Posted on May 28, 2021

Integrating Apollo Studio with GraphQL for .NET - Part 2

#graphql #dotnet #apollo #protobuf

In the previous article, I covered some of the challenges of supporting a GraphQL implementation, and how to begin integrating with Apollo Studio via their protobuf interface. The supporting code is available on GitHub as mattjhosking/graphql-dotnet-apollo-studio.

Creating a Client

The first thing we need is an interface for our client, for DI and unit testing:

public interface IGraphQLTraceLogger
{
  void LogTrace(HttpContext httpContext, string? operationName, string? query, ExecutionResult result);
  AsyncAutoResetEvent ForceSendTrigger { get; }
  Task Send();
}

The LogTrace method takes the HttpContext (for retrieving headers), operation name and query (for determining the request hash, more on this later) and the execution result containing the Apollo tracing information. Send simply sends all queued traces to Apollo Studio and ForceSendTrigger allows overriding the delay between sending batches. Details on AsyncAutoResetEvent can be found in this Microsoft Dev blogs article.
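For readers without the linked article to hand, here is a minimal sketch of what an AsyncAutoResetEvent can look like, assuming the common queue-of-waiters design. It is not necessarily the implementation used in the supporting repository, and the field names are my own.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Sketch of an async-friendly auto-reset event (assumed design, not the repo's code).
public sealed class AsyncAutoResetEvent
{
    private readonly Queue<TaskCompletionSource<bool>> _waiters = new Queue<TaskCompletionSource<bool>>();
    private bool _signaled;

    // Returns a task that completes once the event is set.
    public Task WaitAsync()
    {
        lock (_waiters)
        {
            if (_signaled)
            {
                // Consume the pending signal: "auto-reset" behaviour.
                _signaled = false;
                return Task.CompletedTask;
            }

            var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
            _waiters.Enqueue(tcs);
            return tcs.Task;
        }
    }

    // Releases exactly one waiter, or remembers the signal if nobody is waiting.
    public void Set()
    {
        TaskCompletionSource<bool>? toRelease = null;
        lock (_waiters)
        {
            if (_waiters.Count > 0)
                toRelease = _waiters.Dequeue();
            else
                _signaled = true;
        }

        // Complete outside the lock so no continuation runs while it is held.
        toRelease?.SetResult(true);
    }
}
```

One property of this particular design worth knowing: a waiter that is abandoned (for example because a timeout won the race) stays in the queue and will absorb the next call to Set.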

Creating a Trace

First, we need to convert the execution result to the Trace object using the metrics to trace converter from the previous article. If this succeeds, then we set the trace ClientName using a specified header ("apollographql-client-name", with fallback to first portion of "User-Agent") and ClientVersion with another header ("apollographql-client-version", with fallback to second portion of "User-Agent").
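The header fallback described above could be sketched roughly as follows. The helper name ClientInfo.Resolve is hypothetical, the header values are passed in as plain strings to avoid an ASP.NET Core dependency, and splitting the User-Agent on '/' or a space is an assumption about what "first portion" and "second portion" mean.

```csharp
using System;

public static class ClientInfo
{
    // Hypothetical helper: picks the client name and version for a trace.
    // clientNameHeader    = value of "apollographql-client-name" (may be missing)
    // clientVersionHeader = value of "apollographql-client-version" (may be missing)
    public static (string? Name, string? Version) Resolve(
        string? clientNameHeader, string? clientVersionHeader, string? userAgent)
    {
        // Assumption: "PostmanRuntime/7.26.8" splits into product and version.
        string[] parts = (userAgent ?? "").Split(new[] { '/', ' ' }, 3);
        string? fallbackName = parts[0].Length > 0 ? parts[0] : null;
        string? fallbackVersion = parts.Length > 1 ? parts[1] : null;

        return (
            string.IsNullOrWhiteSpace(clientNameHeader) ? fallbackName : clientNameHeader,
            string.IsNullOrWhiteSpace(clientVersionHeader) ? fallbackVersion : clientVersionHeader);
    }
}
```

With no Apollo headers present, a User-Agent of "PostmanRuntime/7.26.8" would give the name "PostmanRuntime" and the version "7.26.8"; when both Apollo headers are present, the User-Agent is ignored.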

Grouping Traces

Each trace must be grouped by the operation and query. Apollo identifies each group with a key in the following format:
# {operationName}
{query}
Operation name is the GraphQL operation name and the query is the entire query with whitespace reduced. I used the following method:

private static string MinimalWhitespace(string? requestQuery)
{
  return Regex.Replace((requestQuery ?? "").Trim().Replace("\r", "\n").Replace("\n", " "), @"\s{2,}", " ");
}
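To make the key format concrete, here is a small self-contained sketch that combines the method above with the "-" placeholder for a missing operation name used later in the article. The class name TraceKey and the method name Build are my own.

```csharp
using System.Text.RegularExpressions;

public static class TraceKey
{
    // Same whitespace reduction as in the article.
    private static string MinimalWhitespace(string? requestQuery)
    {
        return Regex.Replace((requestQuery ?? "").Trim().Replace("\r", "\n").Replace("\n", " "), @"\s{2,}", " ");
    }

    // Builds the "# {operationName}\n{query}" grouping key.
    public static string Build(string? operationName, string? query)
    {
        string name = string.IsNullOrWhiteSpace(operationName) ? "-" : operationName;
        return $"# {name}\n{MinimalWhitespace(query)}";
    }
}
```

For example, TraceKey.Build("Foo", "query Foo {\n  bar\n}") produces "# Foo" on the first line and "query Foo { bar }" on the second.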

Thread Safety

Since the trace logger needs to be thread-safe, a lock is required when accessing the traces:

private const int BATCH_THRESHOLD_SIZE = 2 * 1024 * 1024; 
...
lock (_tracesLock)
{
  var tracesAndStats = _traces.GetOrAdd($"# {(string.IsNullOrWhiteSpace(operationName) ? "-" : operationName)}\n{MinimalWhitespace(query)}", key => new TracesAndStats());
  tracesAndStats.Traces.Add(trace);

  // Trigger sending now if we exceed the batch threshold size (2 MB)
  if (Serializer.Measure(CreateReport(_traces)).Length > BATCH_THRESHOLD_SIZE)
      ForceSendTrigger.Set();
}

The GetOrAdd retrieves the existing TracesAndStats grouping for this query (if any traces have been logged since the last batch was sent) or creates a new one. We then add the current trace to this grouping. Next, the protobuf Serializer static class is used to measure how many bytes a report would be if we sent the currently queued batch. If it exceeds the batch size threshold (2 MB, as recommended by Apollo Studio), then we set the ForceSendTrigger to override the delay (more on this later). CreateReport will be defined in the next section.

Sending to Apollo Studio

The Send method is responsible for taking all queued traces and dispatching via HTTP to the Apollo Studio servers.

Thread Safety

To ensure that the traces field is not modified while we're processing it, we swap its value with a new, empty dictionary. Interlocked.Exchange does this as a single atomic operation that no other thread can get in the middle of (and consequently explode). The swap is also performed under the same lock used by LogTrace, so no trace can be part-way through being added to the old dictionary when it is taken for sending.

IDictionary<string, TracesAndStats> traces;
lock (_tracesLock)
  traces = Interlocked.Exchange(ref _traces, new ConcurrentDictionary<string, TracesAndStats>());
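The swap-and-drain pattern can be seen in isolation in the following sketch. TraceQueue is a hypothetical stand-in for the trace logger, and strings stand in for traces so the example has no external dependencies.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the trace logger's queue.
public sealed class TraceQueue
{
    private readonly object _lock = new object();
    private ConcurrentDictionary<string, List<string>> _items =
        new ConcurrentDictionary<string, List<string>>();

    // Writers take the lock so that the inner List<string> is never mutated concurrently.
    public void Add(string key, string item)
    {
        lock (_lock)
            _items.GetOrAdd(key, _ => new List<string>()).Add(item);
    }

    // Hands the current batch to the caller and leaves an empty dictionary behind.
    public IDictionary<string, List<string>> Drain()
    {
        lock (_lock)
            return Interlocked.Exchange(ref _items, new ConcurrentDictionary<string, List<string>>());
    }
}
```

After Drain returns, the caller owns the returned dictionary exclusively, so it can be serialized without holding the lock while new traces accumulate in the fresh one.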

Creating the Report

We first need to create the full report with all traces before we can send. The CreateReport method mentioned earlier creates an instance of the Report class, with a static header. I initialise this as follows:

_reportHeader = new ReportHeader
{
  Hostname = Environment.GetEnvironmentVariable("WEBSITE_HOSTNAME") ?? Environment.MachineName,
  AgentVersion = "engineproxy 0.1.0",
  ServiceVersion = Assembly.GetExecutingAssembly().FullName,
  RuntimeVersion = $".NET Core {Environment.Version}",
  Uname = Environment.OSVersion.ToString(),
  GraphRef = graphRef,
  ExecutableSchemaId = ComputeSha256Hash(new SchemaPrinter(schema).Print())
};

Hostname uses either the Azure app service host name or the current machine name.
Agent version I've set to the same value the NodeJS version of the reporting client uses.
Service version is set as the full assembly details.
Runtime version is the version of .NET Core in use (this really should factor in that 5.0 is not "Core").
Uname is OS version.
Graph ref is provided by configuration as the graph name registered in Apollo Studio combined with the version in use, separated by an '@'.
Executable schema ID is determined by computing a SHA256 hash on the full schema (SchemaPrinter utility from GraphQL.Utilities used for this). The hash is calculated as follows:

private static string ComputeSha256Hash(string rawData)
{
  using SHA256 sha256Hash = SHA256.Create();
  byte[] bytes = sha256Hash.ComputeHash(Encoding.UTF8.GetBytes(rawData));
  StringBuilder builder = new StringBuilder();
  foreach (var t in bytes)
    builder.Append(t.ToString("x2"));
  return builder.ToString();
}

This performs a straightforward SHA256 hash and then converts the resulting bytes to lower-case hex.
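If you are on .NET 5 or later, a shorter variant is possible. This is a sketch of an equivalent, not the code from the supporting repository; the class name SchemaHash is my own.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SchemaHash
{
    public static string ComputeSha256Hash(string rawData)
    {
        using SHA256 sha256 = SHA256.Create();
        byte[] bytes = sha256.ComputeHash(Encoding.UTF8.GetBytes(rawData));

        // Convert.ToHexString (.NET 5+) returns upper-case hex,
        // so lower-case it to match the "x2" formatting used above.
        return Convert.ToHexString(bytes).ToLowerInvariant();
    }
}
```

A quick sanity check is the standard SHA-256 test vector: hashing "abc" should give ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad.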

Preparing the Message

It's pretty straightforward to follow the protobuf-net docs to serialize the report, but we should really GZIP the stream before sending to reduce bandwidth consumption and improve performance:

byte[] bytes;
await using (var memoryStream = new MemoryStream())
{
  await using (var gzipStream = new GZipStream(memoryStream, CompressionLevel.Fastest))
    Serializer.Serialize(gzipStream, report);
  bytes = memoryStream.ToArray();
}

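We're using the fastest compression level here to minimise CPU load. The GZipStream is part of System.IO.Compression and wraps a writable stream, compressing data as it is written to the underlying stream. Note that the GZipStream must be disposed before reading the bytes from the MemoryStream, otherwise the final compressed block may not have been flushed. I'm using a MemoryStream as we have no need to store this anywhere (nor should we). The remainder is fairly simple: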

var httpRequestMessage = new HttpRequestMessage(HttpMethod.Post, new Uri("https://engine-report.apollodata.com/api/ingress/traces"));
httpRequestMessage.Headers.Add("X-Api-Key", _apiKey);

httpRequestMessage.Content = new ByteArrayContent(bytes)
{
  Headers =
  {
    ContentType = new MediaTypeHeaderValue("application/protobuf"),
    ContentEncoding = {"gzip"}
  }
};

var client = _httpClientFactory.CreateClient();
var response = await client.SendAsync(httpRequestMessage);
if (!response.IsSuccessStatusCode)
  _logger.LogWarning("Failed to send traces to Apollo Studio with error code {errorCode}", response.StatusCode);

Of note are the media type of "application/protobuf" and the content encoding set as "gzip". The "X-Api-Key" header is the standard authentication mechanism they use and the URL is fixed (no other servers exist).
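If you want to check the compression step in isolation (for example in a unit test), a round-trip sketch along these lines may help. GzipSketch is a hypothetical helper that works on raw bytes rather than a protobuf report, so it needs nothing beyond the base class library.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipSketch
{
    public static byte[] Compress(byte[] payload)
    {
        using var output = new MemoryStream();

        // Dispose the GZipStream before reading so the final block is flushed.
        using (var gzip = new GZipStream(output, CompressionLevel.Fastest))
            gzip.Write(payload, 0, payload.Length);

        // ToArray still works after the MemoryStream has been closed.
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }
}
```

Decompressing the output of Compress should return the original bytes, and the compressed array should begin with the GZIP magic bytes 0x1F 0x8B.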

Periodic Send of Batches

So after we've put this all together, nothing is sending since nothing is actually calling the Send method. We now need to create a class derived from BackgroundService and override the ExecuteAsync as follows:

while (!stoppingToken.IsCancellationRequested)
{
  using (var scope = _serviceProvider.CreateScope())
  {
    var graphQlTraceLogger = scope.ServiceProvider.GetRequiredService<IGraphQLTraceLogger>();

    // Send every 20 seconds or when forced due to size threshold reached
    await Task.WhenAny(graphQlTraceLogger.ForceSendTrigger.WaitAsync(), Task.Delay(TimeSpan.FromSeconds(20), stoppingToken));

    _logger.LogDebug("Sending queued traces...");
    await graphQlTraceLogger.Send();
  }
}

We don't strictly need to create a scope in the service provider as the service itself is a singleton (and therefore our trace logger must be too), but it's good practice to resolve your classes from a scope in a background service. You can also see where the ForceSendTrigger comes in here: it can override the 20 second wait and send immediately. The reason for doing this all centrally is to avoid a race condition where multiple sends could happen at once. Because only this one loop ever calls Send, that possibility is eliminated.
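The "trigger or timeout" wait at the heart of this loop can be isolated into a small helper, sketched below. SendTrigger.WaitForTriggerOrTimeout is a hypothetical name, and the trigger is passed in as a plain Task so the sketch does not depend on any particular AsyncAutoResetEvent implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SendTrigger
{
    // Returns true if the trigger fired first, false if the interval elapsed
    // (or the token was cancelled) first.
    public static async Task<bool> WaitForTriggerOrTimeout(
        Task trigger, TimeSpan interval, CancellationToken token)
    {
        // WhenAny never throws here: it simply returns whichever task finished first,
        // even when that task is a cancelled Task.Delay.
        Task completed = await Task.WhenAny(trigger, Task.Delay(interval, token));
        return completed == trigger;
    }
}
```

Because a cancelled delay also ends the wait without throwing, the loop in the article performs one last Send on shutdown before the while condition stops it.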

Up Next

This article now gives you a completely working approach to sending traces to Apollo Studio from a GraphQL.NET implementation, but we haven't looked at how we configure that or how they'll appear. The last article in this series will cover those points.
