<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Reliably</title>
    <description>The latest articles on DEV Community by Reliably (@reliably).</description>
    <link>https://dev.to/reliably</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F4098%2Fbd1e4ffa-5e34-48cb-85ec-037e5c8e1def.png</url>
      <title>DEV Community: Reliably</title>
      <link>https://dev.to/reliably</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/reliably"/>
    <language>en</language>
    <item>
      <title>Observing the Reliability of your Java Apps and Services with Spring Boot, Micrometer, Prometheus &amp; Reliably</title>
      <dc:creator>Russ Miles</dc:creator>
      <pubDate>Wed, 04 Aug 2021 14:49:16 +0000</pubDate>
      <link>https://dev.to/reliably/observing-the-reliability-of-your-java-apps-and-services-with-spring-boot-micrometer-prometheus-reliably-27od</link>
      <guid>https://dev.to/reliably/observing-the-reliability-of-your-java-apps-and-services-with-spring-boot-micrometer-prometheus-reliably-27od</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F9216%2F1%2Abd7UX4O4EL3clA4mFxeLEw.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F9216%2F1%2Abd7UX4O4EL3clA4mFxeLEw.jpeg" alt="Photo by [Marten Newhall](https://unsplash.com/@laughayette?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/magnifying-glass?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here at Reliably we are huge fans of &lt;a href="https://spring.io/projects/spring-boot" rel="noopener noreferrer"&gt;Spring Boot&lt;/a&gt; and the &lt;a href="https://micrometer.io" rel="noopener noreferrer"&gt;Micrometer&lt;/a&gt; dimensional metrics instrumentation library for providing the rich set of possible metrics that can be a great foundation for the Service Level Indicators that provide coverage for your &lt;a href="https://reliably.com/docs/getting-started/slos/" rel="noopener noreferrer"&gt;Reliably Service Level Objectives as Code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As of Spring Boot 2.0, Micrometer became the default instrumentation library for the huge range of Spring Boot applications, from monoliths to microservices. With Micrometer bakes in by default, we started to explore just how easy it would be to bring Reliably's "Developer-First" SLOs to bear on your Spring Boot apps and services.&lt;/p&gt;

&lt;p&gt;In this article we share our findings including how Reliably really can work "Bootifully" (TM, Josh Long :) ) with all your Spring Boot apps and services, out of the box and with no extra code required!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;NOTE: The full coded sample for this article is &lt;a href="https://github.com/Lawouach/spring-boot-prometheus-reliably" rel="noopener noreferrer"&gt;available on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Setup: Spring Boot, Prometheus &amp;amp; Reliably
&lt;/h2&gt;

&lt;p&gt;The exercise we wanted to conduct was to show how you could define and collaborate on Reliably Service Level Objectives that were measuring the availability of a simple Spring Boot service. To do this we needed three pieces in the mix:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3576%2F1%2A6E8cWoNwVH_ogyCjZml-lw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3576%2F1%2A6E8cWoNwVH_ogyCjZml-lw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this approach, Spring Boot and Micrometer would push dimensional metrics to Prometheus. Then Reliably would use Prometheus queries to collate Service Level Indicators to back the Service Level Objectives being observed. Simple? Actually, it is…&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;NOTE: We chose Prometheus for this particular article but we could just have easily picked one of the other tools supported by Micrometer and Reliably, such as &lt;a href="https://dev.to/reliably/bringing-reliability-closer-to-you-with-reliably-and-datadog-2jbm"&gt;DataDog&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Sourcing Metrics from our Spring Boot Service
&lt;/h2&gt;

&lt;p&gt;To get things built as quickly and easily as possible, we used the &lt;a href="https://start.spring.io" rel="noopener noreferrer"&gt;Spring Initializr&lt;/a&gt; to generate a very simple HTTP-based application that did nothing more than provide a default root / response of "Greetings from Spring Boot!" to provide the service that we'd look to observe our SLOs on.&lt;/p&gt;

&lt;p&gt;As mentioned in the introduction, by default Spring Boot applications come with all the power of Micrometer by default, the only thing we needed to do was make sure that our Spring Boot service's metrics could be scraped by Prometheus by adding a single line to our service's application.properties:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;h2&gt;
  
  
  Setting up Prometheus to Scrape the Metrics
&lt;/h2&gt;

&lt;p&gt;Next we added a simple Scraper configuration to our instance of Prometheus to periodically grab all the Micrometer metrics for our Spring Boot service from the endpoint we configured in the previous step:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
With this config in place, Prometheus will grab the metrics from our Spring Boot service every 15 seconds.
&lt;h2&gt;
  
  
  Creating and Observing the Reliably SLOs as Code
&lt;/h2&gt;

&lt;p&gt;The final step was for us to use the new Prometheus support in the Reliably CLI v0.23.0 to create our SLO with an SLI implemented as an appropriate Prometheus query:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;This SLO definition includes a Service Level Indicator (SLI) that queries Prometheus for the appropriate metrics to help us judge that the SLO is being met, or not. &lt;/p&gt;

&lt;h2&gt;
  
  
  Pushing your SLIs to Reliably
&lt;/h2&gt;

&lt;p&gt;With your Service Level Objective, and corresponding Prometheus-driven Service Level Indicators, in hand, you can now begin pushing the SLO and SLIs over time to Reliably using the &lt;code&gt;reliably slo agent&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;reliably slo agent &lt;span class="nt"&gt;-i&lt;/span&gt; 10
INFO[0000] &lt;span class="nt"&gt;---&lt;/span&gt; starting slo indicator agent &lt;span class="nt"&gt;---&lt;/span&gt;         
INFO[0000] getting indicators &lt;span class="k"&gt;for &lt;/span&gt;objective: &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'99% of requests return 2xx over last 1 hour'&lt;/span&gt;, &lt;span class="nv"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'my spring boot app'&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; 
INFO[0000] indicator percent: &lt;span class="o"&gt;[&lt;/span&gt;98.91] &lt;span class="k"&gt;for &lt;/span&gt;objective: &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'99% of requests return 2xx over last 1 hour'&lt;/span&gt;, &lt;span class="nv"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'my spring boot app'&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Observing your SLOs from your Command Line
&lt;/h2&gt;

&lt;p&gt;With our Spring Boot service running and receiving requests, our Prometheus instance scraping the available metrics, and Reliably monitoring the above SLO, we have successfully defined and can observe a Java Spring Boot application's reliability with as little code as possible! The final step is to observe the status of your SLO using the &lt;code&gt;reliably slo report -w -m reliably.yaml&lt;/code&gt; command:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwnnrp67vr9amv07re8c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwnnrp67vr9amv07re8c.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to go next: We need … You!
&lt;/h2&gt;

&lt;p&gt;Our goal is to shift reliability left by making it as easy as possible for you and your team to be able to define, observe and learn how to make your system's reliable.  As such we are constantly looking to make it easier for people to collaborate, code and observe Service Level Objectives and Indicators for their own bespoke needs. &lt;/p&gt;

&lt;p&gt;You can check out all the different tools that we currently integrate with in &lt;a href="https://reliably.com/docs/" rel="noopener noreferrer"&gt;our docs&lt;/a&gt;, but if there's something you don't see then please &lt;a href="https://reliably.com/contact/" rel="noopener noreferrer"&gt;get in touch&lt;/a&gt; or maybe even raise a ticket and PR yourself on our &lt;a href="https://github.com/reliablyhq/cli" rel="noopener noreferrer"&gt;free and open source Reliably CLI project&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>sre</category>
      <category>prometheus</category>
      <category>spring</category>
    </item>
    <item>
      <title>How does chaos engineering relate to the mathematical definitions of chaos?</title>
      <dc:creator>Mick Roper</dc:creator>
      <pubDate>Thu, 29 Jul 2021 12:56:05 +0000</pubDate>
      <link>https://dev.to/reliably/how-does-chaos-engineering-relate-to-the-mathematical-definitions-of-chaos-5685</link>
      <guid>https://dev.to/reliably/how-does-chaos-engineering-relate-to-the-mathematical-definitions-of-chaos-5685</guid>
      <description>&lt;p&gt;Recently, we had the pleasure of sharing Reliably's ideas on proactive reliability practices with a fabulous group of devops, engineers, architects and SREs at a bank. In conversations that followed, we discussed how chaos engineering relates to the mathematical definitions of chaos.&lt;/p&gt;

&lt;p&gt;In my view Chaos Engineering principles do align with mathematical chaos, where a chaotic dynamic system is highly sensitive to input conditions, and can generate a non-linear outcome depending on those conditions, as well as the evolution of those conditions over time.&lt;/p&gt;

&lt;p&gt;A good analogy would be weather - it exhibits many of the same conditions of a complex computer system where the input conditions are themselves complex systems, and therefore the outcome is complex and hard to model accurately (which is why 'weather predictions' should be renamed 'weather probabilities').&lt;/p&gt;

&lt;p&gt;Any sufficiently complex system can be subject to the behaviours described by chaos theory, and chaos engineering is simply exploring how the system might respond to turbulent conditions. Turbulence was carefully chosen as the term in the Principles of Chaos because it evokes the weather system example as any attempt at proving and predicting from design principles will have the same problems as predicting the weather.&lt;/p&gt;

&lt;p&gt;The upside with computer systems over weather systems is that, thanks to tools like the Chaos Toolkit (CTK), we don't simulate the the entire system (like you have to do with weather) but instead run a bounded, controlled experiment that allows us to develop an appreciation for the probability of an outcome when given certain inputs.&lt;/p&gt;

&lt;p&gt;I believe that the goal of modern software architecture is to manage the amount of dynamism in a given system. That's a big reason for using things like 'Bounded Context' from DDD, de-coupled service-oriented architectures and *-as-code tools like Terraform, Snyk and Reliably to manage and understand the occurrence of complex events that might impact our system. By reducing the opportunity for an unknown, unplanned input we reduce the opportunity for an unknown, unplanned output from our system.&lt;/p&gt;

&lt;p&gt;These ideas have informed the development and roadmaps of both Reliably and the Chaos Toolkit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://chaostoolkit.org/reference/tutorial/"&gt;Chaos Toolkit&lt;/a&gt; enables developer-first discovery of your system's weaknesses through the exploration and testing of your systems as code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://reliably.com/docs/getting-started/"&gt;Reliably&lt;/a&gt; provides developer-first cloud native application reliability for teams. It enables developers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Define&lt;/strong&gt; service level objectives as code with your team and all stakeholders.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observe&lt;/strong&gt; service level objective trends over time, surfacing detected reliability weaknesses as you code, continuously with gates and guardrails for Service Level Objective trends and detected weaknesses, right in your own CI/CD platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alert&lt;/strong&gt; teams, when your reliability is trending in the wrong direction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore&lt;/strong&gt; the impact of chaotic conditions on your reliability through chaos engineering experiments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix&lt;/strong&gt; reliability problems using the best advice for your infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt; continuously if your reliability fixes are having the right effects on your Service Level Objectives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learn&lt;/strong&gt;, collaborate and share how reliability is managed amongst your team and across your entire organisation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are interested in the concepts and best practices for pro-active reliability and want to learn more, you may find the following resources useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Get started with&lt;/strong&gt; &lt;a href="https://chaostoolkit.org/reference/tutorial/#:~:text=Getting%20Started%20with%20the%20Chaos%20Toolkit%20When%20you,tour%20of%20the%20basic%20elements%20of%20an%20experiment."&gt;Chaos Toolkit here&lt;/a&gt; and &lt;a href="https://join.slack.com/t/chaostoolkit/shared_invite/zt-tyvt79uo-03wPnzgujSZPDvEDONUvaA"&gt;join the Chaos Toolkit community on slack here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get started with&lt;/strong&gt; &lt;a href="https://reliably.com/docs/getting-started/"&gt;Reliably here&lt;/a&gt;. The easiest and quickest way of getting started with Reliably is to run the Reliably CLI on your machine with your local source code files. You can find a link to &lt;a href="https://reliably.com/docs/getting-started/install/#quick-install-guide"&gt;the install guide here&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Join the&lt;/strong&gt; &lt;a href="https://www.meetup.com/globalsre/"&gt;Reliability meetup group&lt;/a&gt; to learn and share skills and experience with fellow engineers who are introducing pro-active reliability in their organisations. &lt;em&gt;The meetup was paused for a little while due to Covid, but will resume meetups in person and online from September 2021&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Are you introducing chaos engineering or proactive reliability practices in your organisation? I'd love to hear more about your experience! Ping me &lt;a href="https://join.slack.com/t/chaostoolkit/shared_invite/zt-tyvt79uo-03wPnzgujSZPDvEDONUvaA"&gt;on slack&lt;/a&gt; or share your thoughts below! &lt;/p&gt;

</description>
      <category>chaos</category>
      <category>reliably</category>
      <category>reliability</category>
      <category>chaostoolkit</category>
    </item>
    <item>
      <title>eBPF for SRE with Reliably</title>
      <dc:creator>Sylvain Hellegouarch</dc:creator>
      <pubDate>Wed, 28 Jul 2021 21:20:46 +0000</pubDate>
      <link>https://dev.to/reliably/ebpf-for-sre-with-reliably-18dc</link>
      <guid>https://dev.to/reliably/ebpf-for-sre-with-reliably-18dc</guid>
      <description>&lt;p&gt;eBPF is a funny piece of technology, it is based of a BPF which is almost as old as Linux itself and yet eBPF has been trending heavily for the past couple of years. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs81m5rhmudoulgw3rhe8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs81m5rhmudoulgw3rhe8.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my book, &lt;a href="https://ebpf.io/" rel="noopener noreferrer"&gt;eBPF&lt;/a&gt; is a system event generator. By tapping into that event pool and listening at the right level, you can gain tons of insights of your system.&lt;/p&gt;

&lt;p&gt;Funnily enough, SRE has also been trending for a couple of years and it precisely talk about how the health of the system.&lt;/p&gt;

&lt;p&gt;So, could these two be made for each other? Well, maybe not in such candid terms, but there is something very appealing to bring them closer.&lt;/p&gt;

&lt;p&gt;SRE introduced Service Level Objective (&lt;a href="https://sre.google/workbook/implementing-slos/" rel="noopener noreferrer"&gt;SLO&lt;/a&gt;) and Service Level Indicators (SLI). SLO encode what good looks like for aspects that matter to you in your system. SLI encode the metrics that aggregates as the SLO's target. eBPF is an interesting data source for indicators.&lt;/p&gt;

&lt;p&gt;For instance, say you have this eBPF program (via &lt;a href="https://github.com/iovisor/bcc" rel="noopener noreferrer"&gt;BCC&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;uapi/linux/ptrace.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;net/sock.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;bcc/proto.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define IP_TCP  6
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;http_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;__sk_buff&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;skb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;u8&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// let's not care for anything not Ethernet or TCP&lt;/span&gt;
    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;ethernet_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ethernet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor_advance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ethernet&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ethernet&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mh"&gt;0x0800&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;ip_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor_advance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nextp&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;IP_TCP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple &lt;a href="https://www.kernel.org/doc/html/latest/networking/filter.html" rel="noopener noreferrer"&gt;socket filtering&lt;/a&gt; really. We only keep TCP packets to inspect them in user-land and ignore the rest.&lt;/p&gt;

&lt;p&gt;From user-land, we now have a Python program that injects this program into the kernel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MySocketHndl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;psocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SocketHndl&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BPF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iface&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        BCC gives the us a socket to listen from. Bind to it.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;function_http_filter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http_filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BPF&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOCKET_FILTER&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;BPF&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;attach_raw_socket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_http_filter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iface&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;socket_fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;function_http_filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_socket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromfd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;socket_fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PF_PACKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOCK_RAW&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IPPROTO_IP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# blocks forever when timeout is None
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;settimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@contextmanager&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bpfsock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iface&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Loads the BPF program and starts listening on the socket we attach
    to the interface. Cleanup when finished.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BPF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ebpf.c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;psock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MySocketHndl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;iface&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;psock&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;psock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We simply attach to the interface and add our filter to the socket used to listen on the interface. Now we can process packets as we see them. We then dismiss any packet we don't care about, here anything not HTTP, and we parse valid packets as HTTP requests/responses, using the awesome &lt;a href="https://gitlab.com/mike01/pypacker/-/tree/master" rel="noopener noreferrer"&gt;pypacker&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;filter_pkt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ethernet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ethernet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Process only packets that are going to or from the target server.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;eth&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ethernet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ethernet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TCP&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tcp_p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;eth&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TCP&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tcp_p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dport&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;target_port&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;tcp_p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sport&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;target_port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;bpfsock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iface&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;psock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pkt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;psock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recvp_iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filter_match_recv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filter_pkt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pkt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TCP&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;body_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Boom, we're gold! &lt;/p&gt;

&lt;p&gt;From there on, all we have to do is collect some information about requests/responses we see (duration, status code, path requested...) and aggregate ratios over time window we are interested in tracking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;total_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;class_2xx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;good_latency_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;window_start&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;window_end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;total_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="c1"&gt;# our SLO latency is 150ms
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;good_latency_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;class_2xx&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;total_count&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="k"&gt;continue&lt;/span&gt;

   &lt;span class="n"&gt;indicators&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;availability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_push&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;next_push&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="mf"&gt;100.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;class_2xx&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;indicators&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_push&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;next_push&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="mf"&gt;100.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;good_latency_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing fancy here.&lt;/p&gt;

&lt;p&gt;Now that we have our indicators, we can send them to Reliably to generate our SLO results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_indicators&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;indicator_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;from_ts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_ts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;indicators&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;TOKEN&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;indicator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;indicator_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;
             &lt;span class="p"&gt;}&lt;/span&gt;
         &lt;span class="p"&gt;},&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;from_ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;to_ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;indicator_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;indicator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;percentile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
       &lt;span class="n"&gt;indicator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_target&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;150ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reliably_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;indicator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, nothing fancy.&lt;/p&gt;

&lt;p&gt;At this stage, &lt;a href="https://reliably.com/" rel="noopener noreferrer"&gt;Reliably&lt;/a&gt; can now generate SLO results you can start viewing using the &lt;a href="https://reliably.com/docs/getting-started/" rel="noopener noreferrer"&gt;Reliably CLI&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;reliably slo report
&lt;span class="go"&gt;Refreshing SLO report every 3 seconds. Press CTRL+C to quit.
                                                                  Current  Objective  / Time Window     Type             Trend      
&lt;/span&gt;&lt;span class="gp"&gt;  Service #&lt;/span&gt;1: ebpf-2021-demo                                 
&lt;span class="go"&gt;  ❌ 99% of the responses are under 150ms                          98.99%      99.5%  / 10s             Latency          ✕ ✕ ✕ ✓ ✕  
  ❌ 99% of the responses to our users are in the 2xx class        98.66%        99%  / 10s             Availability     ✓ ✓ ✕ ✓ ✕  
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kaboom! We have now successfully mapped low-level ebpf events to high-level SLO constructs.&lt;/p&gt;

&lt;p&gt;Obviously, this is a rather trivial showcase but it's promising. Nevertheless, there is some rout to cover before the whole process becomes more attractive as eBPF's UX is perhaps not as transparent as one could hope for. &lt;/p&gt;

&lt;p&gt;Still, so much fun!&lt;/p&gt;

&lt;p&gt;The code can be found at &lt;a href="https://github.com/Lawouach/ebpf-2021-talk" rel="noopener noreferrer"&gt;https://github.com/Lawouach/ebpf-2021-talk&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>sre</category>
      <category>ebpf</category>
    </item>
    <item>
      <title>Bringing reliability closer to you with Reliably and DataDog</title>
      <dc:creator>Sylvain Hellegouarch</dc:creator>
      <pubDate>Fri, 23 Jul 2021 10:14:04 +0000</pubDate>
      <link>https://dev.to/reliably/bringing-reliability-closer-to-you-with-reliably-and-datadog-2jbm</link>
      <guid>https://dev.to/reliably/bringing-reliability-closer-to-you-with-reliably-and-datadog-2jbm</guid>
      <description>&lt;p&gt;As engineers we care about our users, at least we ought to :) They depend on us and our services to run just fine. This is reliability in a nutshell.&lt;/p&gt;

&lt;p&gt;Site Reliability Engineering, or SRE if you're casual, has gained momentum to codify this view on reliability. This article is not about detailing SRE but focusing on how we can use one of its tools, Service Level Objectives (or SLO for short), to signal loss of reliability as close to engineers as we can.&lt;/p&gt;

&lt;p&gt;Let's say we have a web application like this one below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;starlette.applications&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Starlette&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;starlette.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;starlette.routing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Route&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;homepage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'hello'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'world'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Starlette&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;routes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;homepage&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Nothing fancy about it, just a &lt;em&gt;Hello World&lt;/em&gt; example. When running it as follows:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;uvicorn &lt;span class="nt"&gt;--reload&lt;/span&gt; server:app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;where &lt;code&gt;server&lt;/code&gt; is the name of the Python module containing that code: server.py. The &lt;code&gt;--reload&lt;/code&gt; flag allows us to change the code and let uvicorn restart the server automatically.&lt;/p&gt;

&lt;p&gt;We can access this server as follows:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl localhost:8000/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now; let's run a basic load against this server using &lt;a href="https://github.com/rakyll/hey"&gt;hey&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;hey &lt;span class="nt"&gt;-c&lt;/span&gt; 3 &lt;span class="nt"&gt;-q&lt;/span&gt; 10 &lt;span class="nt"&gt;-z&lt;/span&gt; 20s http://localhost:8000/

Summary:
  Total:    20.0125 secs
  Slowest:  0.0164 secs
  Fastest:  0.0020 secs
  Average:  0.0046 secs
  Requests/sec: 29.9813

  Total data:   10200 bytes
  Size/request: 17 bytes

Response &lt;span class="nb"&gt;time &lt;/span&gt;histogram:
  0.002 &lt;span class="o"&gt;[&lt;/span&gt;1] |
  0.003 &lt;span class="o"&gt;[&lt;/span&gt;152]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.005 &lt;span class="o"&gt;[&lt;/span&gt;184]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.006 &lt;span class="o"&gt;[&lt;/span&gt;195]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.008 &lt;span class="o"&gt;[&lt;/span&gt;53]    |■■■■■■■■■■■
  0.009 &lt;span class="o"&gt;[&lt;/span&gt;11]    |■■
  0.011 &lt;span class="o"&gt;[&lt;/span&gt;1] |
  0.012 &lt;span class="o"&gt;[&lt;/span&gt;1] |
  0.014 &lt;span class="o"&gt;[&lt;/span&gt;0] |
  0.015 &lt;span class="o"&gt;[&lt;/span&gt;0] |
  0.016 &lt;span class="o"&gt;[&lt;/span&gt;2] |


Latency distribution:
  10% &lt;span class="k"&gt;in &lt;/span&gt;0.0027 secs
  25% &lt;span class="k"&gt;in &lt;/span&gt;0.0033 secs
  50% &lt;span class="k"&gt;in &lt;/span&gt;0.0043 secs
  75% &lt;span class="k"&gt;in &lt;/span&gt;0.0055 secs
  90% &lt;span class="k"&gt;in &lt;/span&gt;0.0067 secs
  95% &lt;span class="k"&gt;in &lt;/span&gt;0.0074 secs
  99% &lt;span class="k"&gt;in &lt;/span&gt;0.0083 secs

Details &lt;span class="o"&gt;(&lt;/span&gt;average, fastest, slowest&lt;span class="o"&gt;)&lt;/span&gt;:
  DNS+dialup:   0.0000 secs, 0.0020 secs, 0.0164 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0005 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0002 secs
  resp &lt;span class="nb"&gt;wait&lt;/span&gt;:    0.0043 secs, 0.0017 secs, 0.0161 secs
  resp &lt;span class="nb"&gt;read&lt;/span&gt;:    0.0001 secs, 0.0001 secs, 0.0007 secs

Status code distribution:
  &lt;span class="o"&gt;[&lt;/span&gt;200] 600 responses
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This will gently load our server without going overboard.&lt;/p&gt;

&lt;p&gt;We likely want to monitor this server, why not use &lt;a href="https://www.datadoghq.com/"&gt;DataDog&lt;/a&gt; to do so, as follows:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ddtrace&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;ddtrace.profiling.auto&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;starlette.applications&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Starlette&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;starlette.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;starlette.routing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Route&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;homepage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'hello'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'world'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="n"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;starlette&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;starlette&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'service_name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'my-test-service'&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Starlette&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;routes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;homepage&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What differs is that we are importing DataDog &lt;a href="https://docs.datadoghq.com/tracing/setup_overview/setup/python/"&gt;ddtrace&lt;/a&gt; to push requests to the &lt;a href="https://docs.datadoghq.com/agent/docker/?tab=standard"&gt;local DataDog agent&lt;/a&gt;. The agent is started as follows on a different terminal:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;...
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DD_SITE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;datadoghq.eu

&lt;span class="nv"&gt;$ &lt;/span&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; dd-agent &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-v&lt;/span&gt; /var/run/docker.sock:/var/run/docker.sock:ro &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-v&lt;/span&gt; /proc/:/host/proc/:ro &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-v&lt;/span&gt; /sys/fs/cgroup/:/host/sys/fs/cgroup:ro &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;DD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DD_API_KEY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;DD_SITE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DD_SITE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;DD_APM_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;DD_APM_NON_LOCAL_TRAFFIC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-p&lt;/span&gt; 8126:8126/tcp &lt;span class="se"&gt;\&lt;/span&gt;
gcr.io/datadoghq/agent:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;After a couple of minutes, you'll be able to search for metrics from this application on DataDog. Look for metrics with &lt;code&gt;starlette&lt;/code&gt; in the name.&lt;/p&gt;

&lt;p&gt;Could we now trick the application into raising odd errors to fake a faulty service? Why yes of course! By simply returning a 4xx or 5xx class of errors at random from time to time:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;random&lt;/span&gt;


&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ddtrace&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;ddtrace.profiling.auto&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;starlette.applications&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Starlette&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;starlette.requests&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;starlette.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;starlette.routing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Route&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.91&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'boom'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;'hello'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'world'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="n"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;starlette&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;starlette&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'distributed_tracing'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;starlette&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'service_name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'my-frontend-service'&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Starlette&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;routes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let's see how this impacts our client now, run again our mild load:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;hey &lt;span class="nt"&gt;-c&lt;/span&gt; 3 &lt;span class="nt"&gt;-q&lt;/span&gt; 10 &lt;span class="nt"&gt;-z&lt;/span&gt; 20s http://localhost:8000/

Summary:
  Total:    20.0120 secs
  Slowest:  0.0189 secs
  Fastest:  0.0018 secs
  Average:  0.0051 secs
  Requests/sec: 29.9820

  Total data:   10142 bytes
  Size/request: 16 bytes

Response &lt;span class="nb"&gt;time &lt;/span&gt;histogram:
  0.002 &lt;span class="o"&gt;[&lt;/span&gt;1] |
  0.004 &lt;span class="o"&gt;[&lt;/span&gt;146]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.005 &lt;span class="o"&gt;[&lt;/span&gt;193]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.007 &lt;span class="o"&gt;[&lt;/span&gt;146]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.009 &lt;span class="o"&gt;[&lt;/span&gt;101]   |■■■■■■■■■■■■■■■■■■■■■
  0.010 &lt;span class="o"&gt;[&lt;/span&gt;10]    |■■
  0.012 &lt;span class="o"&gt;[&lt;/span&gt;0] |
  0.014 &lt;span class="o"&gt;[&lt;/span&gt;0] |
  0.016 &lt;span class="o"&gt;[&lt;/span&gt;0] |
  0.017 &lt;span class="o"&gt;[&lt;/span&gt;1] |
  0.019 &lt;span class="o"&gt;[&lt;/span&gt;2] |


Latency distribution:
  10% &lt;span class="k"&gt;in &lt;/span&gt;0.0029 secs
  25% &lt;span class="k"&gt;in &lt;/span&gt;0.0036 secs
  50% &lt;span class="k"&gt;in &lt;/span&gt;0.0050 secs
  75% &lt;span class="k"&gt;in &lt;/span&gt;0.0065 secs
  90% &lt;span class="k"&gt;in &lt;/span&gt;0.0074 secs
  95% &lt;span class="k"&gt;in &lt;/span&gt;0.0079 secs
  99% &lt;span class="k"&gt;in &lt;/span&gt;0.0092 secs

Details &lt;span class="o"&gt;(&lt;/span&gt;average, fastest, slowest&lt;span class="o"&gt;)&lt;/span&gt;:
  DNS+dialup:   0.0000 secs, 0.0018 secs, 0.0189 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0005 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0002 secs
  resp &lt;span class="nb"&gt;wait&lt;/span&gt;:    0.0048 secs, 0.0017 secs, 0.0187 secs
  resp &lt;span class="nb"&gt;read&lt;/span&gt;:    0.0002 secs, 0.0001 secs, 0.0011 secs

Status code distribution:
  &lt;span class="o"&gt;[&lt;/span&gt;200] 542 responses
  &lt;span class="o"&gt;[&lt;/span&gt;500] 58 responses
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now notice how we get a summary that does show us some responses were in errors as per our change above. Yai we broke something!&lt;/p&gt;

&lt;p&gt;Can we now ask DataDog about these recorded errors? Yes we can:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# datadog info (change them to fit your owns)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DD_APP_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DD_SITE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;datadoghq.eu

&lt;span class="c"&gt;# your query data&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"(sum:trace.starlette.request.hits{service:my-test-service,resource_name:get_/}.as_count() - sum:trace.starlette.request.errors{service:my-test-service,resource_name:get_/}.as_count()) / (sum:trace.starlette.request.hits{service:my-test-service,resource_name:get_/}.as_count())"&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;from&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s2"&gt;"+%s"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"15 min ago"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;to&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s2"&gt;"+%s"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;curl &lt;span class="nt"&gt;-G&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; GET &lt;span class="s2"&gt;"https://api.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DD_SITE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/api/v1/query"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-urlencode&lt;/span&gt; &lt;span class="s2"&gt;"from=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;from&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-urlencode&lt;/span&gt; &lt;span class="s2"&gt;"to=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;to&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-urlencode&lt;/span&gt; &lt;span class="s2"&gt;"query=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;query&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"DD-API-KEY: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DD_API_KEY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"DD-APPLICATION-KEY: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DD_APP_KEY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The query we are running may look daunting but is rather straightforward. We take the total number of requests and we remove the ones that were on error. We then divide by the total again and this should give us a ratio of good requests as a percentage.&lt;/p&gt;

&lt;p&gt;Great, we now have a query we can use to create a service level object (SLO) that will tell us how our service is doing over time. Let's use &lt;a href="https://reliably.com"&gt;Reliably&lt;/a&gt; for this.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;reliably slo init
? What is the name of the service you want to &lt;span class="nb"&gt;declare &lt;/span&gt;SLOs &lt;span class="k"&gt;for&lt;/span&gt;? my-frontend-service
| Paste your &lt;span class="s1"&gt;'numerator'&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;good events&lt;span class="o"&gt;)&lt;/span&gt; datadog query: &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;:trace.starlette.request.hits&lt;span class="o"&gt;{&lt;/span&gt;service:my-test-service,resource_name:get_/&lt;span class="o"&gt;}&lt;/span&gt;.as_count&lt;span class="o"&gt;()&lt;/span&gt; - &lt;span class="nb"&gt;sum&lt;/span&gt;:trace.starlette.request.errors&lt;span class="o"&gt;{&lt;/span&gt;service:my-test-service,resource_name:get_/&lt;span class="o"&gt;}&lt;/span&gt;.as_count&lt;span class="o"&gt;())&lt;/span&gt;
| Paste your &lt;span class="s1"&gt;'denominator'&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;total events&lt;span class="o"&gt;)&lt;/span&gt; datadog query: &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;:trace.starlette.request.hits&lt;span class="o"&gt;{&lt;/span&gt;service:my-test-service,resource_name:get_/&lt;span class="o"&gt;}&lt;/span&gt;.as_count&lt;span class="o"&gt;())&lt;/span&gt;
? What is your target &lt;span class="k"&gt;for &lt;/span&gt;this SLO &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt; %&lt;span class="o"&gt;)&lt;/span&gt;? 99
? What is your observation window &lt;span class="k"&gt;for &lt;/span&gt;this SLO? custom
? Define your custom observation window PT5M

? What is the name of this SLO? 99% of frontend responses over last 5 minutes are 2xx
SLO &lt;span class="s1"&gt;'99% of frontend responses over last 5 minutes are 2xx'&lt;/span&gt; added to Service &lt;span class="s1"&gt;'my-frontend-service'&lt;/span&gt;

? Do you want to add another SLO? No
Service &lt;span class="s1"&gt;'my-frontend-service'&lt;/span&gt; added

? Do you want to add another Service? No

✓ Your manifest has been saved to ./reliably.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In a nutshell, we created a file that contains the definition of the SLO:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;reliably.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Objective&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;99% of requests  over last 5 minutes&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-test-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;indicatorSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;datadog_denominator_query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum:trace.starlette.request.hits{service:my-test-service,resource_name:get_/}.as_count()&lt;/span&gt;
    &lt;span class="na"&gt;datadog_numerator_query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(sum:trace.starlette.request.hits{service:my-test-service,resource_name:get_/}.as_count() / sum:trace.starlette.request.errors{service:my-test-service,resource_name:get_/}.as_count())&lt;/span&gt;
  &lt;span class="na"&gt;objectivePercent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;99&lt;/span&gt;
  &lt;span class="na"&gt;window&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1h0m0s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now we can make reliably know about it:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;reliably slo &lt;span class="nb"&gt;sync&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Finally, while the application is still running with some load injected into it, start fetching data from DataDog, using the query we saw earlier and let Reliably consolidate them over the window duration given in the objective:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;reliably slo agent &lt;span class="nt"&gt;-i3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Open now a new terminal and run the following:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;reliably slo report &lt;span class="nt"&gt;-w&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This will show you the SLO report for your service as computed by Reliably.&lt;/p&gt;


&lt;div class="ltag_asciinema"&gt;
  
&lt;/div&gt;



&lt;p&gt;So what happened exactly? Well, let's zoom in on a section of the SLO:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;indicatorSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;datadog_denominator_query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum:trace.starlette.request.hits{service:my-test-service,resource_name:get_/}.as_count()&lt;/span&gt;
    &lt;span class="na"&gt;datadog_numerator_query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(sum:trace.starlette.request.hits{service:my-test-service,resource_name:get_/}.as_count() / sum:trace.starlette.request.errors{service:my-test-service,resource_name:get_/}.as_count())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;indicatorSelector&lt;/code&gt; property is how the magic happens. These are used for the following purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;giving the &lt;code&gt;reliably slo agent&lt;/code&gt; command the means to know what provider to use, here DataDog, and therefore how to fetch the required datapoints, here the two queries. These datapoints are stored under the name of &lt;em&gt;indicators&lt;/em&gt; on Reliably&lt;/li&gt;
&lt;li&gt;declaring how these objective and the indicators are mapped together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That second point is key. Indicators themselves are not declared as entities (or objects) as &lt;em&gt;objectives&lt;/em&gt; are. Instead they are merely a stream of values consumed by Reliably when sent by a client (&lt;code&gt;reliably slo agent&lt;/code&gt; or via the API directly). Upon receiving an indicator, Reliably looks at its labels and match this to any &lt;code&gt;indicatorSelector&lt;/code&gt; of any objectives (in the current organization). This tells us that objectives and indicators are loosly coupled. The fact the &lt;code&gt;reliably.yaml&lt;/code&gt; manifest contains the selector doesn't define the indicator, only how to match indicators to objectives.&lt;/p&gt;

&lt;p&gt;At this stage, you have a simple declaration of a service level object that relies on DataDog's data to compute it. Since the SLO is a just a file, you can now store it alongside your code base and use it as part of your CI/CD pipeline to automate decision about releasing. We'll see this in a future article using GitHub actions.&lt;/p&gt;

&lt;p&gt;The code for this article can be found on &lt;a href="https://github.com/Lawouach/reliably-and-datadog"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>sre</category>
      <category>reliability</category>
      <category>reliably</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
