<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yash Nigam</title>
    <description>The latest articles on DEV Community by Yash Nigam (@yashn).</description>
    <link>https://dev.to/yashn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2287820%2F50d2423c-bec3-4474-b428-9c7c224a8fa9.png</url>
      <title>DEV Community: Yash Nigam</title>
      <link>https://dev.to/yashn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yashn"/>
    <language>en</language>
    <item>
      <title>Understanding OpenTelemetry and Observability for SRE</title>
      <dc:creator>Yash Nigam</dc:creator>
      <pubDate>Sun, 27 Oct 2024 20:19:57 +0000</pubDate>
      <link>https://dev.to/yashn/understanding-open-telemetry-and-observability-for-sre-58m1</link>
      <guid>https://dev.to/yashn/understanding-open-telemetry-and-observability-for-sre-58m1</guid>
      <description>&lt;p&gt;&lt;strong&gt;An understanding of OpenTelemetry and observability is essential for an SRE in any organization. This blog post is my attempt to lay out a good understanding of OpenTelemetry (OT) after reading the following book:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud-Native Observability with OpenTelemetry from Packt Publishing &lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  At a high level, OT can be described as:
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;A framework to produce telemetry from your applications using open standards.&lt;/li&gt;
&lt;li&gt;Built around the concept of signals: traces, metrics, and logs.&lt;/li&gt;
&lt;li&gt;Produces telemetry for these signals using the OT APIs.&lt;/li&gt;
&lt;li&gt;Provides tools to gain visibility into the performance of your services by combining tracing, metrics, and logging.&lt;/li&gt;
&lt;li&gt;Allows you to instrument your application code through vendor-neutral APIs, libraries and tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Before proceeding with OpenTelemetry, let us list and understand some other useful concepts and technologies that are interconnected:&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Cloud Native Applications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;There has been a shift to microservices-based architecture for deploying and running applications, aided by cloud technologies such as Kubernetes and serverless.&lt;/li&gt;
&lt;li&gt;Applications are now distributed across multiple cloud services and scaled horizontally, producing logs in multiple places.&lt;/li&gt;
&lt;li&gt;The services are loosely coupled and operate independently.&lt;/li&gt;
&lt;li&gt;In such cases latency is introduced between calling services, as each service sits in its own container.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A Shift towards DevOps
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Small teams (4 to 6 people) manage their own microservices.&lt;/li&gt;
&lt;li&gt;Developers own the lifecycle of their code through all stages: they write, test, build, package, deploy and operate the code in production (with the aid of SREs).&lt;/li&gt;
&lt;li&gt;This accelerates feature development.&lt;/li&gt;
&lt;li&gt;However, as the number of microservices increases, no one has the full picture, and it becomes difficult to find what caused an outage.&lt;/li&gt;
&lt;li&gt;Dev teams have to learn multiple tools for building, deploying, monitoring, etc., which shifts their focus from their main task: coding.&lt;/li&gt;
&lt;li&gt;They may struggle to identify the root cause of production issues as there is not enough visibility across the managed systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Observability
&lt;/h3&gt;

&lt;p&gt;Observability can be defined in different ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;As per &lt;a href="https://en.wikipedia.org/wiki/Observability" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Observability&lt;/a&gt;, "In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs."&lt;/li&gt;
&lt;li&gt;The ability to answer questions such as:

&lt;ul&gt;
&lt;li&gt;Is the system doing what I think it should be?&lt;/li&gt;
&lt;li&gt;If a problem occurred in production, what evidence would you have to be able to identify it?&lt;/li&gt;
&lt;li&gt;Why is this service suddenly overwhelmed when it was fine just a minute ago?&lt;/li&gt;
&lt;li&gt;If a specific condition from a client triggers an anomaly in some underlying service, would you know it without customers or support calling you?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Empowering the people who build and operate distributed applications to understand their code's behaviour while running in production.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;OT ultimately enables observability for the application on which it is configured. Historically, observability has been achieved using the following:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Centralized logging
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;For an application that is large and distributed across many systems, searching through the logs on individual machines is not practical.&lt;/li&gt;
&lt;li&gt;Applications can also run on ephemeral machines that may no longer be present when we need those logs.&lt;/li&gt;
&lt;li&gt;Logs need to be available in a central location for persistent storage and searchability, and thus centralized logging was born.&lt;/li&gt;
&lt;li&gt;  Tools for logging

&lt;ul&gt;
&lt;li&gt;Fluentd&lt;/li&gt;
&lt;li&gt;Logstash&lt;/li&gt;
&lt;li&gt;Apache Flume&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Metrics and dashboards
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Measuring application and system performance via the collection of metrics (signals).&lt;/li&gt;
&lt;li&gt;Metrics can also be used to configure alerting when an error rate becomes greater than an acceptable percentage.&lt;/li&gt;
&lt;li&gt;Tools:

&lt;ul&gt;
&lt;li&gt;Prometheus&lt;/li&gt;
&lt;li&gt;StatsD&lt;/li&gt;
&lt;li&gt;Graphite&lt;/li&gt;
&lt;li&gt;Grafana&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Tracing and analysis
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Tracing an application means having the ability to run through the application code and ensure it's doing what is expected (generally done via a debugger in an IDE).&lt;/li&gt;
&lt;li&gt;This becomes impossible when debugging an application that is spread across multiple services on different hosts across a network.&lt;/li&gt;
&lt;li&gt;Google's whitepaper on this topic: Dapper (https://research.google/pubs/pub36356/)&lt;/li&gt;
&lt;li&gt;Tools:

&lt;ul&gt;
&lt;li&gt;OpenTracing&lt;/li&gt;
&lt;li&gt;Zipkin&lt;/li&gt;
&lt;li&gt;Jaeger&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. Challenges
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Multiple tools for logging, tracing and metrics monitoring&lt;/li&gt;
&lt;li&gt;Multiple standards, libraries, methods&lt;/li&gt;
&lt;li&gt;Time needed to instrument the code/application to generate logs, traces and metrics, plus the time needed to integrate the tools, depending on complexity&lt;/li&gt;
&lt;li&gt;ROI&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Describing OpenTelemetry
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;OT is an ecosystem, or framework, for applications running on cloud-native services.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Standardize how applications are instrumented and how telemetry data is generated, collected, and transmitted&lt;/li&gt;
&lt;li&gt;Give users the tools necessary to correlate that telemetry across systems, languages, and applications&lt;/li&gt;
&lt;li&gt;An open specification&lt;/li&gt;
&lt;li&gt;Language-specific APIs and SDKs&lt;/li&gt;
&lt;li&gt;Instrumentation libraries&lt;/li&gt;
&lt;li&gt;Semantic conventions&lt;/li&gt;
&lt;li&gt;An agent to collect telemetry&lt;/li&gt;
&lt;li&gt;A protocol to organize, transmit, and receive the data&lt;/li&gt;
&lt;li&gt;OpenTelemetry has implementations in 11 languages&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Core Concepts/Categories of Concerns of OpenTelemetry
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Signals
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Signals represent the core of the telemetry data that is generated by instrumenting.&lt;/li&gt;
&lt;li&gt;Signals are:
a) Tracing
b) Baggage
c) Metrics
d) Logging&lt;/li&gt;
&lt;li&gt;The real power of OpenTelemetry is that it allows its users to correlate data across signals to get a better understanding of their systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Specification:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/open-telemetry/opentelemetry-specification" rel="noopener noreferrer"&gt;https://github.com/open-telemetry/opentelemetry-specification&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Data Model
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://opentelemetry.io/docs/specs/otel/metrics/data-model/" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/specs/otel/metrics/data-model/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opentelemetry.io/docs/specs/otel/logs/data-model/" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/specs/otel/logs/data-model/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. API
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Providing users with an API allows them to go through the process of instrumenting their code in a way that is vendor-agnostic. &lt;/li&gt;
&lt;li&gt;The API is decoupled from the code that generates the telemetry, allowing users the flexibility to swap out the underlying implementations as they see fit&lt;/li&gt;
&lt;li&gt;By design, a user who instruments their code using the API but does not configure the SDK will not see any telemetry produced. &lt;/li&gt;
&lt;/ul&gt;
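&lt;p&gt;The decoupling of API and SDK described above can be sketched in plain Python. This is only an illustration of the idea, not OpenTelemetry's actual code: the names NoOpTracer, RecordingTracer, get_tracer and set_tracer are hypothetical. Until an "SDK" is installed, the instrumented code runs normally but records nothing.&lt;/p&gt;

```python
from contextlib import contextmanager


class NoOpTracer:
    """Default 'API-only' tracer: accepts calls but records nothing."""

    @contextmanager
    def start_span(self, name):
        yield None


class RecordingTracer:
    """Stand-in for an SDK implementation that keeps finished span names."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def start_span(self, name):
        yield name
        self.finished.append(name)


_tracer = NoOpTracer()  # what instrumented code gets when no SDK is configured


def get_tracer():
    return _tracer


def set_tracer(tracer):
    global _tracer
    _tracer = tracer


def handle_request():
    # Instrumented application code only ever talks to the API.
    with get_tracer().start_span("handle_request"):
        return "ok"


handle_request()   # no SDK configured: nothing is recorded
sdk = RecordingTracer()
set_tracer(sdk)    # "configure the SDK"
handle_request()
print(sdk.finished)
```

&lt;p&gt;Because handle_request depends only on get_tracer, the implementation behind it can be swapped without touching application code.&lt;/p&gt;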

&lt;h3&gt;
  
  
  5. SDK
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;SDK does most of the heavy lifting in OT.&lt;/li&gt;
&lt;li&gt;Implements the underlying system that generates, aggregates, and transmits telemetry data. &lt;/li&gt;
&lt;li&gt;Provides the controls to configure how telemetry should be collected, where it should be transmitted, and how. &lt;/li&gt;
&lt;li&gt;Configuration of the SDK is supported via in-code configuration, as well as via environment variables defined in the specification. &lt;/li&gt;
&lt;li&gt;As it is decoupled from the API, using the SDK provided by OpenTelemetry is an option for users, but it is not required. Users and vendors are free to implement their own SDKs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Instrumentation Libraries
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ensure users can get up and running quickly.&lt;/li&gt;
&lt;li&gt;Provide instrumentation for popular open source projects and frameworks. In Python, the instrumentation libraries cover Flask, Requests, Django, and others.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Pipelines
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pipelines take the telemetry generated for each signal and export it to a data store.&lt;/li&gt;
&lt;li&gt;Each signal implementation offers a series of mechanisms to generate, process, and transmit telemetry.&lt;/li&gt;
&lt;li&gt;Provider &amp;gt; Generator &amp;gt; Processor &amp;gt; Exporter&lt;/li&gt;
&lt;/ul&gt;
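&lt;p&gt;The provider, generator, processor and exporter stages can be sketched as a toy pipeline. This is a simplified model under stated assumptions, not the OpenTelemetry SDK: every class name here (Span, ListExporter, BatchProcessor, Tracer, TracerProvider) is hypothetical, and resource attributes are copied onto the span at creation time to keep the example short.&lt;/p&gt;

```python
import time
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start: float = field(default_factory=time.time)
    attributes: dict = field(default_factory=dict)


class ListExporter:
    """Exporter: hands finished telemetry to a destination (here, a list)."""

    def __init__(self):
        self.exported = []

    def export(self, spans):
        self.exported.extend(spans)


class BatchProcessor:
    """Processor: buffers spans and decides when they are exported."""

    def __init__(self, exporter, batch_size=2):
        self.exporter = exporter
        self.batch_size = batch_size
        self.buffer = []

    def on_end(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        self.exporter.export(self.buffer)
        self.buffer = []


class Tracer:
    """Generator: what application code uses to create telemetry."""

    def __init__(self, processor, resource):
        self.processor = processor
        self.resource = resource

    def record(self, name):
        span = Span(name, attributes=dict(self.resource))
        self.processor.on_end(span)
        return span


class TracerProvider:
    """Provider: configurable factory that hands out tracers."""

    def __init__(self, processor, resource):
        self.processor = processor
        self.resource = resource

    def get_tracer(self):
        return Tracer(self.processor, self.resource)


exporter = ListExporter()
provider = TracerProvider(BatchProcessor(exporter), {"service.name": "shopper"})
tracer = provider.get_tracer()
tracer.record("visit_store")
tracer.record("add_item_to_cart")
print([span.name for span in exporter.exported])
```

&lt;p&gt;Swapping ListExporter for another exporter, or changing batch_size, alters where and when data leaves the process without changing the code that calls tracer.record.&lt;/p&gt;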

&lt;h4&gt;
  
  
  Providers
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;The starting point of the telemetry pipeline is the provider.&lt;/li&gt;
&lt;li&gt;A provider is a configurable factory that is used to give application code access to an entity used to generate telemetry data. &lt;/li&gt;
&lt;li&gt;Although multiple providers may be configured within an application, a default global provider may also be made available via the SDK. &lt;/li&gt;
&lt;li&gt;Providers should be configured early in the application code, prior to any telemetry data being generated. &lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Generators
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;To generate telemetry data at different points in the code, the telemetry generator instantiated by a provider is made available in the SDK. &lt;/li&gt;
&lt;li&gt;This generator is what most users will interact with through the instrumentation of their application and the use of the API. &lt;/li&gt;
&lt;li&gt;Generators are named differently depending on the signal:

&lt;ul&gt;
&lt;li&gt;the tracing signal calls this a tracer&lt;/li&gt;
&lt;li&gt;the metrics signal calls this a meter&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Processors
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Once the telemetry data has been generated, processors provide the ability to further modify the contents of the data. &lt;/li&gt;
&lt;li&gt;Processors may determine the frequency at which data should be processed or how the data should be exported. &lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Exporters
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Exporters translate the internal data model of OpenTelemetry into the format that best matches the configured exporter's destination. &lt;/li&gt;
&lt;li&gt;Multiple export formats and protocols are supported by the OpenTelemetry project:

&lt;ul&gt;
&lt;li&gt;OpenTelemetry protocol&lt;/li&gt;
&lt;li&gt;Console&lt;/li&gt;
&lt;li&gt;Jaeger&lt;/li&gt;
&lt;li&gt;Zipkin&lt;/li&gt;
&lt;li&gt;Prometheus&lt;/li&gt;
&lt;li&gt;OpenCensus&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  8. Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Used to identify the source of the telemetry data, whether a machine, container, or function.&lt;/li&gt;
&lt;li&gt;Used at analysis time to correlate different events occurring in the same resource. &lt;/li&gt;
&lt;li&gt;Resource attributes are added to the telemetry data from signals at export time.&lt;/li&gt;
&lt;li&gt;Resources are associated with providers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  9. Context propagation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Context propagation is the core concept of distributed tracing.&lt;/li&gt;
&lt;li&gt;It provides the ability to pass valuable contextual information between services that are separated by a logical boundary. &lt;/li&gt;
&lt;li&gt;Context propagation is what allows distributed tracing to tie requests together across multiple systems.&lt;/li&gt;
&lt;li&gt;It allows user-defined values (baggage) to be propagated as well.&lt;/li&gt;
&lt;li&gt;OpenTelemetry defines a context API as part of its specification. &lt;/li&gt;
&lt;li&gt;Python has a built-in context mechanism, ContextVar.&lt;/li&gt;
&lt;/ul&gt;
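&lt;p&gt;A minimal sketch of context propagation using Python's contextvars module follows. It assumes two hypothetical header names, x-trace-id and x-baggage, purely for illustration; real deployments would typically rely on standard headers such as those defined by the W3C Trace Context specification. The helpers inject and extract are likewise hypothetical.&lt;/p&gt;

```python
import contextvars
import uuid

# Hypothetical context keys for this sketch.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)
baggage = contextvars.ContextVar("baggage", default={})


def inject(headers):
    """Caller side: copy the active context into outgoing request headers."""
    headers["x-trace-id"] = current_trace_id.get()
    headers["x-baggage"] = ",".join(f"{k}={v}" for k, v in baggage.get().items())
    return headers


def extract(headers):
    """Callee side: restore the context from incoming request headers."""
    current_trace_id.set(headers["x-trace-id"])
    pairs = [p.split("=", 1) for p in headers["x-baggage"].split(",") if p]
    baggage.set(dict(pairs))


def service_a():
    current_trace_id.set(uuid.uuid4().hex)
    baggage.set({"customer.tier": "gold"})
    return inject({})


def service_b(headers):
    extract(headers)
    return current_trace_id.get(), baggage.get()


# Each "service" runs in its own empty context, as separate processes would.
outgoing = contextvars.Context().run(service_a)
trace_id, received_baggage = contextvars.Context().run(service_b, outgoing)
print(trace_id == outgoing["x-trace-id"], received_baggage)
```

&lt;p&gt;The only thing shared between the two services is the headers dictionary, which is what lets the second service attach its work to the same trace.&lt;/p&gt;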




&lt;h2&gt;
  
  
  Auto Instrumentation, Manual instrumentation and challenges
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Why auto-instrumentation&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The upfront cost of instrumenting code can be a deterrent to even getting started, especially if a solution is complicated to implement and will fail to deliver any value for a long time. &lt;/li&gt;
&lt;li&gt;Auto-instrumentation looks to alleviate some of the burdens of instrumenting code manually&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Challenges of manual instrumentation&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The libraries and APIs that are provided by telemetry frameworks can be hard to learn how to use&lt;/li&gt;
&lt;li&gt;Instrumenting applications can be tricky. This can be especially true for legacy applications where the original author of the code is no longer around&lt;/li&gt;
&lt;li&gt;Knowing what to instrument and how it should be done takes practice&lt;/li&gt;
&lt;li&gt;Modifying code means compiling code again and building the artifact again and deploying again&lt;/li&gt;
&lt;li&gt;The ability to disable instrumentation for a specific code block/module/plugin&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Components of auto-instrumentation&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1. Instrumentation libraries&lt;/li&gt;
&lt;li&gt;Python - Flask, Django, boto&lt;/li&gt;
&lt;li&gt;2. Agent/runner&lt;/li&gt;
&lt;li&gt;automatically invoke the instrumentation libraries without additional work on the part of the user&lt;/li&gt;
&lt;li&gt;configure OpenTelemetry and load the instrumentation libraries that can be used to then generate telemetry&lt;/li&gt;
&lt;li&gt;What it cannot do

&lt;ul&gt;
&lt;li&gt;cannot instrument application-specific code&lt;/li&gt;
&lt;li&gt;it may instrument things you're not interested in. This may result in the same network call being recorded multiple times, or generated data that you're not interested in using&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Instrumentation libraries in Python&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calls to instrumented libraries are intercepted and replaced at runtime via a technique known as monkey patching (&lt;a href="https://en.wikipedia.org/wiki/Monkey_patch" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Monkey_patch&lt;/a&gt;). &lt;/li&gt;
&lt;li&gt;The instrumenting library receives the original call, produces telemetry data, and then calls the underlying library. &lt;/li&gt;
&lt;li&gt;The Python implementation ships a script that can be called to wrap any Python application. &lt;/li&gt;
&lt;li&gt;The opentelemetry-instrument script finds all the instrumentations that have been installed in an environment by loading the entry points registered under the opentelemetry_instrumentor name&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
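&lt;p&gt;Monkey patching, as used by instrumentation libraries, can be illustrated with the standard library alone. In this sketch the helpers instrument and uninstrument and the recorded_calls list are hypothetical stand-ins; a real instrumentation library would emit spans or metrics instead of appending to a list.&lt;/p&gt;

```python
import functools
import json

recorded_calls = []  # stand-in for real telemetry output


def instrument(module, func_name):
    """Replace module.func_name with a wrapper that records each call."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        recorded_calls.append(f"{module.__name__}.{func_name}")
        return original(*args, **kwargs)  # then call the underlying library

    setattr(module, func_name, wrapper)
    return original


def uninstrument(module, func_name, original):
    """Restore the original function."""
    setattr(module, func_name, original)


original_dumps = instrument(json, "dumps")
result = json.dumps({"item": "orange"})  # application code is unchanged
uninstrument(json, "dumps", original_dumps)
json.dumps({"item": "apple"})            # no longer recorded

print(result, recorded_calls)
```

&lt;p&gt;The application still calls json.dumps exactly as before; only the attribute on the module is swapped, which is why no application code changes are needed.&lt;/p&gt;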




&lt;h2&gt;
  
  
  Overview of Traces, Spans, Logs and Metrics using a sample application with OpenTelemetry
&lt;/h2&gt;

&lt;p&gt;A sample application running in a Docker Compose environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdy1xj5i8u99c2dmkt39r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdy1xj5i8u99c2dmkt39r.png" alt="Sample" width="535" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 application pods which emit data: shopper, grocery store and legacy-inventory&lt;/li&gt;
&lt;li&gt;An OpenTelemetry Collector&lt;/li&gt;
&lt;li&gt;Loki: visualized by Grafana&lt;/li&gt;
&lt;li&gt;Jaeger: &lt;a href="http://localhost:16686" rel="noopener noreferrer"&gt;http://localhost:16686&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;Prometheus : &lt;a href="http://localhost:9090/" rel="noopener noreferrer"&gt;http://localhost:9090/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Grafana : &lt;a href="http://localhost:3000/explore" rel="noopener noreferrer"&gt;http://localhost:3000/explore&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Traces
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Trace Context specification

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.w3.org/TR/trace-context-1/" rel="noopener noreferrer"&gt;https://www.w3.org/TR/trace-context-1/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;A distributed trace contains events that cross process, network and security boundaries&lt;/li&gt;

&lt;li&gt;The work captured in a trace is broken into separate units or operations, each represented by a span&lt;/li&gt;

&lt;li&gt;This specification defines standard HTTP headers and a value format to propagate context information that enables distributed tracing scenarios&lt;/li&gt;

&lt;li&gt;Distributed tracing is the foundation behind the tracing signal of OpenTelemetry. &lt;/li&gt;

&lt;li&gt;A distributed trace is a series of event data generated at various points throughout a system tied together via a unique identifier. &lt;/li&gt;

&lt;li&gt;This identifier is propagated across all components responsible for any operation required to complete the request, allowing each operation to associate the event data to the originating request&lt;/li&gt;

&lt;li&gt;Example Jaeger trace&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftn9axp5alhhdjxx1wpup.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftn9axp5alhhdjxx1wpup.png" alt="Image description" width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Trace shows

&lt;ul&gt;
&lt;li&gt;Trace ID&lt;/li&gt;
&lt;li&gt;Start date time&lt;/li&gt;
&lt;li&gt;Duration&lt;/li&gt;
&lt;li&gt;Count of services&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Span
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A span can represent a method call or a subset of the code being called within a method.&lt;/li&gt;
&lt;li&gt;Multiple spans within a trace are linked together in a parent-child relationship, with each child span containing information about its parent. &lt;/li&gt;
&lt;li&gt;The first span in a trace is called the root span and is identified because it does not have a parent span identifier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnud30zvekg4wdsj97fqs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnud30zvekg4wdsj97fqs.png" alt="Image description" width="593" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two spans can be seen here.&lt;/li&gt;
&lt;li&gt;The first has a duration of 7.01 milliseconds, the second 260 milliseconds.&lt;/li&gt;
&lt;li&gt;Each span has a span ID.&lt;/li&gt;
&lt;li&gt;Tags: key-value pairs which give information about the operation being done &lt;/li&gt;
&lt;li&gt;&lt;p&gt;Process: represents which process executed this operation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SpanContext:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Contains information about the trace and must be propagated throughout the system. &lt;/li&gt;
&lt;li&gt;The elements of a trace available within a span context include the following: &lt;/li&gt;
&lt;li&gt;A unique identifier, referred to as a trace ID, identifies the request through the system. &lt;/li&gt;
&lt;li&gt;A second identifier, the span ID, is associated with the span that last interacted with the context. This may also be referred to as the parent identifier.&lt;/li&gt;
&lt;li&gt;Trace flags include additional information about the trace, such as the sampling decision and trace level.&lt;/li&gt;
&lt;li&gt;Vendor-specific information is carried forward using a Trace state field. This allows individual vendors to propagate information necessary for their systems to interpret the tracing data. &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
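&lt;p&gt;The span context fields listed above map directly onto the traceparent and tracestate HTTP headers of the W3C Trace Context specification. The following is a minimal parsing sketch, not a full validator: the SpanContext dataclass and parse_trace_headers function are hypothetical names, and the header values are the examples used in the W3C specification.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str
    sampled: bool
    trace_state: dict


def parse_trace_headers(headers):
    """Build a SpanContext from W3C 'traceparent' and 'tracestate' headers."""
    version, trace_id, span_id, flags = headers["traceparent"].split("-")
    if len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError("malformed traceparent header")
    # The lowest bit of the flags byte carries the sampling decision.
    sampled = int(flags, 16) % 2 == 1
    state = {}
    for entry in headers.get("tracestate", "").split(","):
        if "=" in entry:
            key, value = entry.strip().split("=", 1)
            state[key] = value
    return SpanContext(trace_id, span_id, sampled, state)


# Example header values taken from the W3C Trace Context specification.
ctx = parse_trace_headers({
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "tracestate": "congo=t61rcWkgMzE",
})
print(ctx)
```

&lt;p&gt;A service receiving these headers would use the span ID as the parent of any new span it creates, keeping the same trace ID.&lt;/p&gt;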

&lt;h3&gt;
  
  
  Metrics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Metrics provide information about the state of a running system to developers and operators.&lt;/li&gt;
&lt;li&gt;The data collected via metrics can be aggregated over time to identify trends and patterns in applications, graphed through various tools and visualizations.&lt;/li&gt;
&lt;li&gt;Metrics are critical to monitoring the health of an application and deciding when an on-call engineer should be alerted.&lt;/li&gt;
&lt;li&gt;Metrics form the basis of service level indicators (SLIs) (&lt;a href="https://en.wikipedia.org/wiki/Service_level_indicator" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Service_level_indicator&lt;/a&gt;) that measure the performance of an application. &lt;/li&gt;
&lt;li&gt;These indicators are then used to set service level objectives (SLOs) (&lt;a href="https://en.wikipedia.org/wiki/Service-level_objective" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Service-level_objective&lt;/a&gt;) that organizations use to calculate error budgets. &lt;/li&gt;
&lt;li&gt;OpenTelemetry metrics primarily work with these existing formats:

&lt;ul&gt;
&lt;li&gt;OpenMetrics&lt;/li&gt;
&lt;li&gt;StatsD&lt;/li&gt;
&lt;li&gt;Prometheus&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Metrics may capture data in various Data Point Types&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijjoxw7e4pxy0ajx9r5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijjoxw7e4pxy0ajx9r5y.png" alt="Image description" width="343" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Searching a Metric in Prometheus
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus collects and stores metrics over time in a time series database, which can be queried by metric name.&lt;/li&gt;
&lt;li&gt;The request counter metric is a counter which counts the number of incoming requests on a service.&lt;/li&gt;
&lt;li&gt;Here we can see that querying by the metric name "request_counter" returns 3 rows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4wctn146z3s212q8teo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4wctn146z3s212q8teo.png" alt="Prometheus Metrics" width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each row is for a different service and shows the request_counter value, which is an integer. The metric is of type counter, so its value only ever increases.&lt;/li&gt;
&lt;/ul&gt;
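&lt;p&gt;The behaviour of a counter with one series per service can be sketched as follows. This is a toy model and not the Prometheus or OpenTelemetry client: the Counter class is hypothetical, and the label name service is an assumption, since the actual labels in the sample application may differ.&lt;/p&gt;

```python
from collections import defaultdict


class Counter:
    """A monotonic counter keyed by label value, one series per service."""

    def __init__(self, name):
        self.name = name
        self.series = defaultdict(int)

    def add(self, amount, service):
        if 0 > amount:
            raise ValueError("a counter can only increase")
        self.series[service] += amount

    def samples(self):
        """Render rows similar in shape to a Prometheus query result."""
        return [
            f'{self.name}{{service="{service}"}} {value}'
            for service, value in sorted(self.series.items())
        ]


request_counter = Counter("request_counter")
for service in ["shopper", "grocery-store", "grocery-store", "legacy-inventory"]:
    request_counter.add(1, service)

for row in request_counter.samples():
    print(row)
```

&lt;p&gt;Querying by metric name alone returns one row per distinct label value, which is why three services produce three rows.&lt;/p&gt;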

&lt;h3&gt;
  
  
  Logs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A log is a record of events written to output&lt;/li&gt;
&lt;li&gt;Loki stores all the logs generated by the grocery store application, and Grafana is used to view them.&lt;/li&gt;
&lt;li&gt;Filter the logs using the {job="shopper"} query to retrieve all the logs generated by the shopper application.&lt;/li&gt;
&lt;li&gt;A normal message on console output would be: &lt;code&gt;shopper                  | INFO:shopper:message="add orange to cart"&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpa7vdc8iv316iu37jo39.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpa7vdc8iv316iu37jo39.png" alt="Application log collected by Loki viewed in Grafana" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;However, the same message in Loki looks like this:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c4ew03brd3t13dhjnak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c4ew03brd3t13dhjnak.png" alt="A Correlated trace of the above log" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It contains more details, such as the trace ID, span ID, time, etc.&lt;/li&gt;
&lt;li&gt;Because the same log contains the trace ID, it can be correlated in Jaeger with the trace and span details.&lt;/li&gt;
&lt;/ul&gt;
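&lt;p&gt;One way to get trace and span IDs into log lines is a logging filter, sketched here with the standard logging module only. The TraceContextFilter class and the hard-coded IDs are hypothetical; in an instrumented application the IDs would come from the current span context rather than from constants.&lt;/p&gt;

```python
import io
import logging


class TraceContextFilter(logging.Filter):
    """Attach the active trace and span IDs to every log record."""

    def __init__(self, trace_id, span_id):
        super().__init__()
        self.trace_id = trace_id
        self.span_id = span_id

    def filter(self, record):
        record.trace_id = self.trace_id
        record.span_id = self.span_id
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s:%(name)s:trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
))

logger = logging.getLogger("shopper")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
# Hypothetical IDs; a real setup would read them from the current span.
logger.addFilter(TraceContextFilter("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))

logger.info('message="add orange to cart"')
log_line = stream.getvalue().strip()
print(log_line)
```

&lt;p&gt;With the trace ID present in every line, a log entry found in Grafana can be looked up directly as a trace in Jaeger.&lt;/p&gt;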

</description>
      <category>opentelemetry</category>
      <category>observability</category>
    </item>
  </channel>
</rss>
