<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Elli (Einav Laviv)</title>
    <description>The latest articles on DEV Community by Elli (Einav Laviv) (@einavlaviv).</description>
    <link>https://dev.to/einavlaviv</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F819282%2Fba03075d-2e4e-4227-b6a0-b32301228ccb.jpg</url>
      <title>DEV Community: Elli (Einav Laviv)</title>
      <link>https://dev.to/einavlaviv</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/einavlaviv"/>
    <language>en</language>
    <item>
      <title>API latency in microservices – Trace-based troubleshooting</title>
      <dc:creator>Elli (Einav Laviv)</dc:creator>
      <pubDate>Sun, 16 Jul 2023 14:22:47 +0000</pubDate>
      <link>https://dev.to/einavlaviv/api-latency-in-microservices-trace-based-troubleshooting-51pm</link>
      <guid>https://dev.to/einavlaviv/api-latency-in-microservices-trace-based-troubleshooting-51pm</guid>
      <description>&lt;p&gt;Original article: &lt;a href="https://gethelios.dev/blog/api-latency-in-microservices-trace-based-troubleshooting"&gt;https://gethelios.dev/blog/api-latency-in-microservices-trace-based-troubleshooting&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In microservices architectures, apps are broken down into small, independent services that communicate with each other using APIs in a synchronous or asynchronous way.&lt;/p&gt;

&lt;p&gt;Microservices carry many advantages: increased flexibility and scalability (each microservice can be scaled independently, and APIs make scaling easier by adding or removing instances of a service as needed), enhanced reliability (a failure in one service doesn’t affect the others, as it would in a monolithic app), better security, reduced development time, and more.&lt;/p&gt;

&lt;p&gt;However, microservices APIs are not problem-free. API latency, defined as the time it takes for an API to respond to a request, is a major obstacle in microservices and a critical factor in the performance of microservices applications.&lt;/p&gt;

&lt;p&gt;It’s important here to clarify the difference between latency and response time. Both are key metrics for measuring the performance of a system, and they are often confused, but they measure different things. Response time is the time it takes for a system to respond to a request: it includes receiving the request, processing it, and generating a response. Latency, on the other hand, is the time it takes for a message to travel from one point to another: it includes the time the message spends on the network as well as the time the system takes to receive and process it. In short, latency is the remote response time. While both should be optimized, this article focuses on latency.&lt;/p&gt;
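
&lt;p&gt;As a rough sketch with made-up numbers, the response time a client observes can be decomposed into round-trip network latency plus server processing time:&lt;/p&gt;

```python
# Illustrative decomposition (hypothetical numbers): the response time a
# client observes is the round-trip network latency plus processing time.
def response_time_ms(one_way_latency_ms, processing_ms):
    # the request travels to the server, is processed, and the reply travels back
    return 2 * one_way_latency_ms + processing_ms

# e.g. 40 ms of one-way network latency and 15 ms of server work
print(response_time_ms(40, 15))  # 95
```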

&lt;p&gt;Not surprisingly, latency is a bigger problem in microservices because microservices applications depend heavily on communication between APIs. In a monolithic application, all of the code and data live in a single process, which makes it easier to respond to requests quickly. In a microservices application, each service runs as its own process, so transaction events have to travel between processes across different internal APIs, which adds to the response time and can cause high latency. The problem worsens when scaling microservices: as the complexity of the system grows, so does the risk of increasing latency between microservices and application programming interfaces (APIs).&lt;/p&gt;
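
&lt;p&gt;A back-of-the-envelope sketch (the numbers are purely illustrative) of how per-hop latency compounds across a chain of services:&lt;/p&gt;

```python
# Hypothetical: a request that passes through N services in sequence pays
# every hop's network latency on top of each service's processing time.
def chain_response_ms(n_services, per_hop_ms, processing_ms):
    # n_services hops out (client -> s1 -> ... -> sN), and the replies
    # retrace the same path back
    return 2 * n_services * per_hop_ms + n_services * processing_ms

print(chain_response_ms(1, 5, 10))  # 20  -- near-monolith case
print(chain_response_ms(5, 5, 10))  # 100 -- the same work spread over 5 services
```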

&lt;p&gt;&lt;strong&gt;Microservices vs monolith response time&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y3xcIRmG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gqy29fnzm1b2c3axc4ss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y3xcIRmG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gqy29fnzm1b2c3axc4ss.png" alt="Image description" width="468" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source&lt;/p&gt;

&lt;p&gt;Why is it such a problem? Latency leads to slow response times for users, increased load on servers, reduced scalability, and more.&lt;/p&gt;

&lt;p&gt;Moreover, debugging and troubleshooting API latency in microservices is challenging as tracking down the root cause can be a developer nightmare. After all, it can be caused by one single service, or it can be related to the communication between multiple services.&lt;/p&gt;

&lt;p&gt;This article discusses the challenges of latency in microservices as well as some strategies, best practices and examples for reducing latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal APIs performance issues in microservices&lt;/strong&gt;&lt;br&gt;
There are a number of performance failures or issues that can occur with APIs in microservices architectures. Some of the most common issues include:&lt;/p&gt;

&lt;p&gt;Latency: As mentioned, latency is the time it takes for an API to respond to a request.&lt;/p&gt;

&lt;p&gt;HTTP errors: HTTP errors are a common type of error that can occur with APIs. HTTP errors are returned by the API when there is a problem with the request or the response. Some common examples include 404 Not Found, 500 Internal Server Error, and 403 Forbidden.&lt;/p&gt;

&lt;p&gt;Timeouts: Timeouts can occur when the latency is just too high and an API takes too long to respond to a request. Timeouts can be caused by a variety of factors, such as network congestion, overloaded servers, or errors in the code.&lt;/p&gt;
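
&lt;p&gt;A minimal sketch of the client-side timeout pattern, using a simulated slow downstream call rather than a real API:&lt;/p&gt;

```python
import concurrent.futures
import time

def slow_call():
    # stands in for a downstream API that is too slow to answer
    time.sleep(2)
    return "ok"

# a common client-side pattern: give up on the call after a deadline
with concurrent.futures.ThreadPoolExecutor() as pool:
    future = pool.submit(slow_call)
    try:
        result = future.result(timeout=0.5)  # wait at most 500 ms
    except concurrent.futures.TimeoutError:
        result = "timed out"

print(result)  # timed out
```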

&lt;p&gt;Connection errors: Connection errors can occur when an API is unable to connect to the server. Connection errors can be caused by a variety of factors, such as network problems, server outages, or firewall rules.&lt;/p&gt;

&lt;p&gt;Invalid data: Invalid data can be sent to an API in a request. Invalid data can cause errors in the API, such as malformed requests or invalid parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deep dive into API latency in microservices&lt;/strong&gt;&lt;br&gt;
API latency in microservices is defined as the time it takes for the API to receive the request, process the request, and generate a response.&lt;/p&gt;

&lt;p&gt;High API latency can occur due to multiple reasons, including:&lt;/p&gt;

&lt;p&gt;The number of microservices: The more microservices there are in an architecture, the more potential points of latency there are (which is why the problem grows when scaling microservices).&lt;/p&gt;

&lt;p&gt;The complexity of the microservices: The more complex the microservices are, the more time it takes for them to process requests.&lt;/p&gt;

&lt;p&gt;The network infrastructure: The quality of the network infrastructure can also impact latency.&lt;/p&gt;

&lt;p&gt;Network latency: The time it takes for a message to travel from one point to another can be a major factor in latency. This can be caused by factors such as distance, network congestion, and packet loss.&lt;/p&gt;

&lt;p&gt;Server latency: The time it takes for a server to process a request can also contribute to latency. This can be caused by factors such as server load, server resources, and the complexity of the request.&lt;/p&gt;

&lt;p&gt;Database latency: The time it takes for a database to return data can also be a factor in latency. This can be caused by factors such as database size, database load, and the complexity of the query.&lt;/p&gt;

&lt;p&gt;API design: The design of the API can also impact latency. For example, if the API is not designed to be efficient, it may require more round trips between the client and the server, which can increase latency.&lt;/p&gt;
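
&lt;p&gt;A quick illustration (with hypothetical numbers) of why a chatty API design inflates latency compared to a batched endpoint:&lt;/p&gt;

```python
# With a 40 ms round trip, an API that fetches 50 items one call at a time
# spends most of its time budget on the network; a batched endpoint pays
# the round-trip cost only once.
def network_time_ms(round_trip_ms, n_calls):
    return round_trip_ms * n_calls

print(network_time_ms(40, 50))  # 2000 -- one call per item
print(network_time_ms(40, 1))   # 40   -- single batched call
```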

&lt;p&gt;Caching: Not using caching can cause the API to fetch data from a slower storage medium, such as a database, every time a request is made.&lt;/p&gt;
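
&lt;p&gt;A minimal caching sketch using Python’s functools.lru_cache, with a counter standing in for the slow database query:&lt;/p&gt;

```python
import functools

calls = {"db": 0}

@functools.lru_cache(maxsize=128)
def get_user(user_id):
    calls["db"] += 1  # stands in for a slow database query
    return {"id": user_id}

get_user(1)
get_user(1)  # served from the cache, no second "database" hit
get_user(2)
print(calls["db"])  # 2
```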

&lt;p&gt;Load balancing: If load balancing is not implemented correctly, requests may be routed to overloaded servers.&lt;/p&gt;

&lt;p&gt;Service mesh: If a service mesh is not used, there will be no central point for managing and monitoring microservices.&lt;/p&gt;

&lt;p&gt;Observability: In microservices, if latency is not properly observed, it can lead to performance problems and outages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Troubleshooting high API latency in microservices&lt;/strong&gt;&lt;br&gt;
Why do traditional monitoring methods fail?&lt;br&gt;
Traditional monitoring methods fail to troubleshoot API latency in microservices architectures because they were not designed to handle the complexity of such architectures. They typically focus on monitoring individual servers, and on their own they can’t connect dependencies and performance across the entire customer journey through a distributed architecture.&lt;/p&gt;

&lt;p&gt;While monitoring is important, cloud-native application architectures require observability instead. As detailed in our article about API observability versus monitoring, API observability views much wider performance data compared to monitoring, in one place and in real-time, through observing individual services and the dependencies between them. It uses logs, metrics, and tracing to create a holistic strategy for monitoring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Troubleshooting API latency effectively&lt;/strong&gt;&lt;br&gt;
To troubleshoot API latency in microservices architectures, it is important to use an observability solution that is designed for this type of architecture. These solutions typically provide features such as:&lt;/p&gt;

&lt;p&gt;Distributed tracing: Distributed tracing allows you to see the path that a request takes through a microservices architecture. This can help you to identify the source of latency problems.&lt;/p&gt;

&lt;p&gt;Service level objectives (SLOs): SLOs allow you to define acceptable levels of performance for your microservices. This can help you to identify and troubleshoot latency problems before they impact your users.&lt;/p&gt;
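
&lt;p&gt;A minimal sketch of what an SLO check amounts to, using a simple nearest-rank p95 over hypothetical latency samples:&lt;/p&gt;

```python
import math

def p95(latencies_ms):
    # nearest-rank percentile: the sample below which 95% of requests fall
    ranked = sorted(latencies_ms)
    idx = math.ceil(0.95 * len(ranked)) - 1
    return ranked[idx]

def violates_slo(latencies_ms, slo_p95_ms):
    # SLO: 95% of requests must complete within the objective
    return p95(latencies_ms) > slo_p95_ms

samples = [12, 15, 14, 300, 13, 16, 12, 14, 15, 13]
print(p95(samples))               # 300
print(violates_slo(samples, 200)) # True -- the outlier breaks the objective
```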

&lt;p&gt;Alerting: Alerting allows you to be notified when there are latency problems in your microservices architecture. This can help you to quickly identify and troubleshoot problems before they impact your users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging API latency with distributed tracing and OpenTelemetry&lt;/strong&gt;&lt;br&gt;
Distributed tracing is a way of tracking requests as they travel through a distributed system in order to identify and resolve performance bottlenecks and other problems. The most powerful tool to implement distributed tracing is OpenTelemetry, an OSS observability framework that collects and exports telemetry data from a variety of sources including apps, services, and infrastructure, through data instrumentation. OpenTelemetry can be used to collect data about the request, including the time it took to complete. This data can be used to identify performance bottlenecks and other issues. It’s a great tool as it’s vendor-neutral, and not tied to any specific vendor, technology, language or framework.&lt;/p&gt;
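
&lt;p&gt;A toy sketch of the core idea behind distributed tracing (not the actual OpenTelemetry API): every unit of work becomes a span that carries the request’s trace ID and its parent span ID, so a backend can reconstruct the request’s path:&lt;/p&gt;

```python
import uuid

# Hypothetical sketch of what a tracer records for each unit of work
def make_span(trace_id, name, parent_id=None):
    return {"trace_id": trace_id, "span_id": uuid.uuid4().hex[:16],
            "name": name, "parent_id": parent_id}

trace_id = uuid.uuid4().hex  # one ID for the whole request
root = make_span(trace_id, "api-gateway")
child = make_span(trace_id, "orders-service", parent_id=root["span_id"])

# both spans share the trace ID, so a backend can stitch them together
print(child["trace_id"] == root["trace_id"])  # True
print(child["parent_id"] == root["span_id"])  # True
```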

&lt;p&gt;While there is no doubt that OTel is a life-changing tool, it has a few disadvantages that can be dealt with by using 3rd party tools that are based on this OSS. The main issues include implementation and maintenance complexity, lack of backend storage, lack of a visualization layer, and no actionable insights based on the data it collects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E2E trace-based observability: Visualization and error insights&lt;/strong&gt;&lt;br&gt;
For effective troubleshooting of distributed systems, and API latency in microservices in particular, developers need an E2E observability solution that visualizes traces and spans and collects granular error and performance-related data.&lt;/p&gt;

&lt;p&gt;Helios is an OTel-based tool that helps Dev and Ops teams minimize MTTR in distributed applications. It helps developers install and maintain OpenTelemetry in no time, collect the full payload data, store telemetry data, visualize traces and spans, correlate them with logs and metrics, and enable error insights and alerts.&lt;/p&gt;

&lt;p&gt;The tool provides a dashboard for each specific API in the catalog. This includes trends of the recent spans, duration distribution, HTTP response status code, errors and failures, and more. It lets developers filter APIs by errors with the full E2E context of each API call, enabling them to investigate what happened with the most relevant context and a flow-driven mindset.&lt;/p&gt;

&lt;p&gt;An example: Root cause analysis of the increase in API latency&lt;br&gt;
This example is also shared in this article: API observability: Leveraging OTel to improve developer experience.&lt;/p&gt;

&lt;p&gt;In this example, the flow showing the visualization is composed of various endpoints and involves several services.&lt;/p&gt;

&lt;p&gt;Here’s the first entry point of the visualization’s API in the app:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v5U0jdfj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oikq20eqvv1twczl2532.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v5U0jdfj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oikq20eqvv1twczl2532.png" alt="Image description" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inspecting the API calls with the largest latency based on instrumented data&lt;/p&gt;

&lt;p&gt;Using the API overview for analyzing &amp;amp; troubleshooting latency reported by customers – by quickly identifying the outlier long spans&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b4mwUlYo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/03sv1hn5ck9w7e0ipjbb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b4mwUlYo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/03sv1hn5ck9w7e0ipjbb.png" alt="Image description" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Zeroing in on the API calls and spans that represent the increased latency&lt;/p&gt;

&lt;p&gt;The investigation continues towards a minimal subset of traces, done by clicking on their visualizations and drilling down into the details (through the duration feature that pinpoints the bottlenecks):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Laz0cRxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/er9hk4wewdkoa8k2n3w8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Laz0cRxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/er9hk4wewdkoa8k2n3w8.png" alt="Image description" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Analyzing bottlenecks in the E2E flow using trace visualization&lt;/p&gt;

&lt;p&gt;Investigating other spans can reveal a trend, showing whether the issue occurs in all other traces and whether bottlenecks exist in all of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
In conclusion, API latency poses a major challenge to microservice architectures. It can be caused by a variety of factors, including the number of microservices, their complexity, the network infrastructure, and the design of the API.&lt;/p&gt;

&lt;p&gt;To troubleshoot API latency, it is important to use an observability solution that is specifically designed for microservices architectures. However, not all observability solutions provide all of the features that are needed to troubleshoot API latency in microservice architectures.&lt;/p&gt;

&lt;p&gt;For example, some solutions do not collect data from all of the microservices in an architecture. Others do not provide trace-based visualization, which can make it difficult to identify the source of latency problems. Others still do not provide error insights and alerts, which can make it difficult to take action to resolve latency problems.&lt;/p&gt;

&lt;p&gt;In order to make the most out of OpenTelemetry, you should use a solution that includes visualization, granular payload data, insights, and error alerts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About Helios&lt;/strong&gt;&lt;br&gt;
Helios is a dev-first observability platform that helps Dev and Ops teams shorten the time to find and fix issues in distributed applications. Built on OpenTelemetry, Helios provides traces and correlates them with logs and metrics, enabling end-to-end app visibility and faster troubleshooting.&lt;/p&gt;

</description>
      <category>microservices</category>
      <category>api</category>
      <category>devops</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>API monitoring vs. observability- Debugging microservices efficiently</title>
      <dc:creator>Elli (Einav Laviv)</dc:creator>
      <pubDate>Mon, 05 Jun 2023 11:17:48 +0000</pubDate>
      <link>https://dev.to/einavlaviv/api-monitoring-vs-observability-debugging-microservices-efficiently-5c6d</link>
      <guid>https://dev.to/einavlaviv/api-monitoring-vs-observability-debugging-microservices-efficiently-5c6d</guid>
      <description>&lt;p&gt;As microservices architecture has become more popular, there has been a growing need for API observability. This is because microservices applications are made up of many small, independent services that communicate with each other through APIs. This can make it difficult to track the performance and health of an application, as well as identify and troubleshoot problems.&lt;/p&gt;

&lt;p&gt;API observability is a broader term than API monitoring. Monitoring focuses on tracking known metrics, such as request latency and response time. Observability, on the other hand, also includes tracking unknown metrics, such as error rates and resource utilization. This allows developers to get a more complete picture of how an API is performing, and to identify problems before they impact users.&lt;/p&gt;

&lt;p&gt;There are a number of tools that can be used to implement API observability. Some popular options include:&lt;/p&gt;

&lt;p&gt;Prometheus: A popular open-source monitoring system that can be used to collect metrics from APIs.&lt;/p&gt;

&lt;p&gt;Grafana: A visualization tool that can be used to display Prometheus metrics.&lt;/p&gt;

&lt;p&gt;Jaeger: A distributed tracing tool that can be used to track requests as they flow through an application.&lt;/p&gt;

&lt;p&gt;By implementing API observability, developers can gain a deeper understanding of how their applications are performing. This can help them to identify and troubleshoot problems more quickly, and to improve the overall performance and reliability of their applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gethelios.dev/blog/api-monitoring-vs-observability-in-microservices-troubleshooting-guide/"&gt;Continue reading&lt;/a&gt; to learn more about:&lt;/p&gt;

&lt;p&gt;-The advantages of observability compared to monitoring&lt;/p&gt;

&lt;p&gt;-The pillars of API observability (functional test automation, performance management, security and analytics)&lt;/p&gt;

&lt;p&gt;-Metrics (API dependencies, API stats, API spec)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Benefits and examples&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use cases&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;-Evaluating 3rd party tools&lt;/p&gt;

&lt;p&gt;-API observability with OpenTelemetry and distributed tracing&lt;/p&gt;

&lt;p&gt;-How to make it actionable: Visualization, granular error data and insights&lt;/p&gt;

</description>
      <category>devops</category>
      <category>microservices</category>
      <category>observability</category>
      <category>programming</category>
    </item>
    <item>
      <title>What is OpenTelemetry Tracing: Observability for troubleshooting microservices</title>
      <dc:creator>Elli (Einav Laviv)</dc:creator>
      <pubDate>Sun, 30 Apr 2023 15:54:07 +0000</pubDate>
      <link>https://dev.to/einavlaviv/what-is-opentelemetry-tracing-observability-for-troubleshooting-microservices-4f60</link>
      <guid>https://dev.to/einavlaviv/what-is-opentelemetry-tracing-observability-for-troubleshooting-microservices-4f60</guid>
      <description>&lt;p&gt;OpenTelemetry is buzzing lately and everybody is speaking about distributed tracing as a mean to solve microservices debugging hell and minimize MTTR. &lt;/p&gt;

&lt;p&gt;Developers want to spend less time on debugging and have a better experience. Dev leaders need to ensure velocity and quality, lower the cost of the engineering unit, and find creative ways to deliver at scale (especially in today's atmosphere, when saving costs is a main KPI in the tech ecosystem). &lt;/p&gt;

&lt;p&gt;Debugging in microservices is a big pain. Huge. &lt;/p&gt;

&lt;p&gt;Dependencies, complexities, endless components: these aspects make root cause analysis extremely complex, and traditional monitoring methods that are built on statistical analysis, such as logging, can't offer a reasonable solution. &lt;/p&gt;

&lt;p&gt;Microservices observability, which differs from monitoring in that it is built on data instrumentation and recording events, is needed. &lt;/p&gt;

&lt;p&gt;OpenTelemetry's (an OSS project) distributed tracing capabilities compensate for traditional observability methods, which master monolithic apps but are hardly sufficient for observing and debugging distributed environments. &lt;/p&gt;

&lt;p&gt;OTel allows developers to instrument their microservices apps with the standard instrumentation library that generates telemetry data from various sources, such as logs, metrics, and traces. &lt;/p&gt;

&lt;p&gt;OpenTelemetry agents can then collect and export this telemetry data to multiple systems for logging, tracing, and monitoring. A main advantage of OpenTelemetry is that it aims to be vendor-agnostic, meaning that the data collected can be sent to any backend and moving between them doesn't require any client-side changes. Instrumentation allows deep dive into contextual error data that is needed for fast root cause analysis. &lt;/p&gt;

&lt;p&gt;In order to make OpenTelemetry actionable, there's a need to export the data to a 3rd party tool that can help generate insights, such as Jaeger or similar. &lt;/p&gt;

&lt;p&gt;Some of these tools are better than others, easier to maintain, offer advanced trace based visualization and highlight granular payload data that is critical to understand and solve issues such as bottlenecks, latency, and more. &lt;/p&gt;

&lt;p&gt;Read this article for OTel specifics as well as best practices and tools to minimize overhead and maximize the value &amp;gt;&amp;gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://gethelios.dev/blog/opentelemetry-tracing/"&gt;https://gethelios.dev/blog/opentelemetry-tracing/&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>OpenTelemetry Python – Walkthrough and Examples</title>
      <dc:creator>Elli (Einav Laviv)</dc:creator>
      <pubDate>Mon, 10 Apr 2023 11:50:16 +0000</pubDate>
      <link>https://dev.to/einavlaviv/opentelemetry-python-walkthrough-and-examples-1b9b</link>
      <guid>https://dev.to/einavlaviv/opentelemetry-python-walkthrough-and-examples-1b9b</guid>
      <description>&lt;p&gt;Visit the original article that includes videos &amp;gt; &lt;a href="https://gethelios.dev/blog/opentelemetry-python/"&gt;OpenTelemetry Python – Walkthrough and Examples&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenTelemetry in a Nutshell: &lt;/p&gt;

&lt;p&gt;Microservices architecture has become the new norm for modern applications due to its numerous advantages compared to traditional monolithic architecture. However, microservices also come with several challenges. Especially when it comes to observability, traditional monitoring tools and techniques can no longer handle microservices’ distributed and dynamic nature.&lt;/p&gt;

&lt;p&gt;This is where OpenTelemetry distributed tracing comes in. It’s an open-source observability framework that allows developers to capture telemetry data from distributed cloud-native applications. It provides APIs, tools, and SDKs to collect and generate traces, metrics, and logs. OpenTelemetry can be used with multiple programming languages, including Python, Java, JavaScript, Go, Ruby, and C++.&lt;/p&gt;

&lt;p&gt;One of its key features is its ability to capture and propagate distributed traces and spans across different services and components. It also supports capturing and exporting metrics like request latency, error rates, and resource usage that can be used to monitor and generate alerts on various aspects of the system.&lt;/p&gt;

&lt;p&gt;Compared to traditional monitoring approaches, OpenTelemetry has many advantages and resolves several pain points.&lt;/p&gt;

&lt;p&gt;OpenTelemetry provides a unified and standardized method for collecting telemetry data, simplifying the correlation and analysis of data from various sources.&lt;br&gt;
It supports multiple export formats, including popular observability platforms like Prometheus, Jaeger, and Zipkin.&lt;br&gt;
It’s a vendor-agnostic, future-proof, and community-driven solution that ensures interoperability and extensibility.&lt;/p&gt;

&lt;p&gt;In this article, we will guide you through the process of Python tracing, using OpenTelemetry with Python to capture and export traces and metrics from your applications. We will discuss the essential components of OpenTelemetry and provide examples of integrating it with popular observability platforms.&lt;/p&gt;

&lt;p&gt;Python OpenTelemetry Tutorial – Installation Walkthrough&lt;br&gt;
Installing OTel in Python enables Python monitoring and tracing (later on, I will also share some info about Python monitoring tools that offer additional value on top of OTel). As you will see, it is pretty straightforward. You can use either the OpenTelemetry manual instrumentation approach or the OpenTelemetry auto instrumentation approach. Here, I will walk you through both methods using a simple Flask application and the OpenTelemetry Python SDK.&lt;/p&gt;

&lt;p&gt;Prerequisite – Creating the Python application&lt;br&gt;
Install the Flask framework using the pip3 install flask command, and create a file named server.py. Then, update it with the code below.&lt;/p&gt;

&lt;p&gt;from flask import Flask&lt;br&gt;
import random&lt;br&gt;
app = Flask(__name__)&lt;br&gt;
@app.route('/random')&lt;br&gt;
def index():&lt;br&gt;
  number = random.randint(0,100)&lt;br&gt;
  return 'Random Number : %d' % number&lt;br&gt;
app.run(host='0.0.0.0', port=8000)&lt;/p&gt;

&lt;p&gt;The above code will return a random number between 0 and 100 when the endpoint is called. You can test it by running the project with the python3 server.py command.&lt;/p&gt;

&lt;p&gt;OpenTelemetry Manual Instrumentation&lt;br&gt;
First, I will take you through the steps of OTel manual instrumentation.&lt;/p&gt;

&lt;p&gt;Step 1 – Installing OpenTelemetry libraries&lt;br&gt;
Install opentelemetry-api and opentelemetry-sdk libraries to start instrumenting your Python applications with OpenTelemetry.&lt;/p&gt;

&lt;p&gt;pip install opentelemetry-api &lt;br&gt;
pip install opentelemetry-sdk&lt;/p&gt;

&lt;p&gt;Once the libraries are installed, you can add a tracer object to your server.py file using the following code:&lt;/p&gt;

&lt;p&gt;from opentelemetry import trace&lt;br&gt;
from opentelemetry.sdk.resources import Resource&lt;br&gt;
from opentelemetry.sdk.trace import TracerProvider&lt;br&gt;
from opentelemetry.sdk.trace.export import BatchSpanProcessor&lt;br&gt;
from opentelemetry.sdk.trace.export import ConsoleSpanExporter&lt;br&gt;
provider = TracerProvider()&lt;br&gt;
processor = BatchSpanProcessor(ConsoleSpanExporter())&lt;br&gt;
provider.add_span_processor(processor)&lt;br&gt;
trace.set_tracer_provider(provider)&lt;br&gt;
tracer = trace.get_tracer(__name__)&lt;/p&gt;

&lt;p&gt;The above configuration includes a provider, a processor, and a tracer. The provider (TracerProvider) serves as the API entry point that holds the configuration. The processor specifies how to send the created spans forward. The tracer, on the other hand, is the actual object that generates the spans.&lt;/p&gt;

&lt;p&gt;Step 2 – Adding a tracer to the Flask route&lt;/p&gt;

&lt;p&gt;Then, you can add a tracer object to the Flask route like the below:&lt;/p&gt;

&lt;p&gt;@app.route('/random')&lt;br&gt;
def index():&lt;br&gt;
  with tracer.start_as_current_span("server_request"):&lt;br&gt;
   number = random.randint(0,100)&lt;br&gt;
   return 'Random Number : %d' % number&lt;/p&gt;

&lt;p&gt;app.run(host='0.0.0.0', port=8000)&lt;/p&gt;

&lt;p&gt;Then, restart the application and call the route to see the tracing information in the console.&lt;/p&gt;

&lt;p&gt;{&lt;br&gt;
    "name": "server_request",&lt;br&gt;
    "context": {&lt;br&gt;
        "trace_id": "0x9f1bcbe7c34237ead06a35162f37dc83",&lt;br&gt;
        "span_id": "0x2d42807b6e77dbe4",&lt;br&gt;
        "trace_state": "[]"&lt;br&gt;
    },&lt;br&gt;
    "kind": "SpanKind.INTERNAL",&lt;br&gt;
    "parent_id": null,&lt;br&gt;
    "start_time": "2023-03-16T17:24:07.025035Z",&lt;br&gt;
    "end_time": "2023-03-16T17:24:07.025035Z",&lt;br&gt;
    "status": {&lt;br&gt;
        "status_code": "UNSET"&lt;br&gt;
    },&lt;br&gt;
    "attributes": {},&lt;br&gt;
    "events": [],&lt;br&gt;
    "links": [],&lt;br&gt;
    "resource": {&lt;br&gt;
        "attributes": {&lt;br&gt;
            "telemetry.sdk.language": "python",&lt;br&gt;
            "telemetry.sdk.name": "opentelemetry",&lt;br&gt;
            "telemetry.sdk.version": "1.15.0",&lt;br&gt;
            "service.name": "unknown_service"&lt;br&gt;
        },&lt;br&gt;
        "schema_url": ""&lt;br&gt;
    }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;However, the above tracing information does not provide much information besides the span ID, trace ID, and start and end times. So, let’s see how we can get more insights.&lt;/p&gt;

&lt;p&gt;Step 3 – Updating the span to get more insights&lt;/p&gt;

&lt;p&gt;We can get more details by adding an event to the random number generation function.&lt;/p&gt;

&lt;p&gt;@app.route('/random')&lt;br&gt;
def index():&lt;br&gt;
  with tracer.start_as_current_span(&lt;br&gt;
    "server_request", &lt;br&gt;
    attributes={ "endpoint": "/random" &lt;br&gt;
  }):&lt;br&gt;
   span = trace.get_current_span()&lt;br&gt;
   number = random.randint(0,100)&lt;br&gt;
   span.add_event( "log", {&lt;br&gt;
      "random number": number&lt;br&gt;
    })&lt;br&gt;
   return 'Random Number : %d' % number &lt;/p&gt;

&lt;p&gt;app.run(host='0.0.0.0', port=8000)&lt;/p&gt;

&lt;p&gt;In the above example, we start by adding an attribute to the span in the tracing start method. Then, we retrieve the current span and add an event to be triggered with each API call.&lt;/p&gt;

&lt;p&gt;{&lt;br&gt;
   "name": "server_request",&lt;br&gt;
    "context": {&lt;br&gt;
        "trace_id": "0x9f0f2b9c658a1dba5ab169a19c4c226b",&lt;br&gt;
        "span_id": "0x9d18fdc373ae912a",&lt;br&gt;
        "trace_state": "[]"&lt;br&gt;
    },&lt;br&gt;
    "kind": "SpanKind.INTERNAL",&lt;br&gt;
    "parent_id": null,&lt;br&gt;
    "start_time": "2023-03-17T18:17:39.626005Z",&lt;br&gt;
    "end_time": "2023-03-17T18:17:39.626005Z",&lt;br&gt;
    "status": {&lt;br&gt;
        "status_code": "UNSET"&lt;br&gt;
    },&lt;br&gt;
    "attributes": {&lt;br&gt;
        "endpoint": "/random"&lt;br&gt;
    },&lt;br&gt;
    "events": [&lt;br&gt;
        {&lt;br&gt;
            "name": "log",&lt;br&gt;
            "timestamp": "2023-03-17T18:17:39.626005Z",&lt;br&gt;
            "attributes": {&lt;br&gt;
                "random number": 64&lt;br&gt;
            }&lt;br&gt;
        }&lt;br&gt;
    ],&lt;br&gt;
    "links": [],&lt;br&gt;
    "resource": {&lt;br&gt;
        "attributes": {&lt;br&gt;
            "telemetry.sdk.language": "python",&lt;br&gt;
            "telemetry.sdk.name": "opentelemetry",&lt;br&gt;
            "telemetry.sdk.version": "1.15.0",&lt;br&gt;
            "service.name": "unknown_service"&lt;br&gt;
        },&lt;br&gt;
        "schema_url": ""&lt;br&gt;
    }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;As you can see, the trace log now provides more insights, including the endpoint, the timestamp of the API call, and the result.&lt;/p&gt;

&lt;p&gt;OpenTelemetry Automatic Instrumentation for Python&lt;/p&gt;

&lt;p&gt;Instead of manually configuring attributes and events, you can use OpenTelemetry automatic instrumentation to standardize the process and enable &lt;a href="https://gethelios.dev/python-observability/"&gt;Python observability&lt;/a&gt; without the manual load. It allows instrumenting your application without code modifications using monkey patching or bytecode injection.&lt;/p&gt;

&lt;p&gt;However, automatic OTel instrumentation has limitations and is not as straightforward as a span per function. Instead, instrumentation is implemented separately for each supported framework, and coverage varies significantly between them. Hence, it is necessary to check the list of supported frameworks before you start.&lt;/p&gt;
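&lt;p&gt;To illustrate the idea behind monkey patching, here is a minimal sketch of wrapping a handler function so each call records its duration, without touching the handler's source. This is a deliberate simplification; the real SDK patches at the framework level and exports spans rather than storing a duration:&lt;/p&gt;

```python
import functools
import time

def traced(fn):
    """Wrap fn so each call records its duration, without editing fn's code."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # In real auto-instrumentation this would end a span and export it
            wrapper.last_duration = time.perf_counter() - start
    return wrapper

def handler():
    return "Random Number : 42"

handler = traced(handler)   # patched in place; callers are unaffected
print(handler())            # Random Number : 42
```

&lt;p&gt;This is why no code changes are needed in the steps below: the instrumentation layer rebinds the framework's entry points at startup.&lt;/p&gt;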

&lt;p&gt;Step 1 – Installing OpenTelemetry libraries&lt;/p&gt;

&lt;p&gt;First, you need to install opentelemetry-instrumentation-flask using the below command.&lt;/p&gt;

&lt;p&gt;pip install opentelemetry-instrumentation-flask&lt;/p&gt;

&lt;p&gt;Alternatively, you can run the opentelemetry-bootstrap -a install command, which detects the libraries installed in your environment and installs the matching auto-instrumentation packages.&lt;/p&gt;

&lt;p&gt;Step 2 – Running the application&lt;/p&gt;

&lt;p&gt;Now, you do not need any code modifications. You can simply run the application with a few additional command line arguments.&lt;/p&gt;

&lt;p&gt;opentelemetry-instrument --traces_exporter console flask run&lt;/p&gt;

&lt;p&gt;The above command uses the --traces_exporter flag to export the traces to the console. Other configuration options are described in the OpenTelemetry documentation.&lt;/p&gt;
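&lt;p&gt;The same settings can also be supplied as environment variables. OTEL_TRACES_EXPORTER and OTEL_SERVICE_NAME are standard OpenTelemetry configuration variables; the service name shown here is an assumed example:&lt;/p&gt;

```shell
# Configure the exporter and service name via standard OTEL_* variables
export OTEL_TRACES_EXPORTER=console
export OTEL_SERVICE_NAME=random-service   # replaces "unknown_service" in the output
# then launch as before:
opentelemetry-instrument flask run
```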

&lt;p&gt;{&lt;br&gt;
    "name": "/random",&lt;br&gt;
    "context": {&lt;br&gt;
        "trace_id": "0x541995dffa03529463f2885661ba2550",&lt;br&gt;
        "span_id": "0xde86468a94d54239",&lt;br&gt;
        "trace_state": "[]"&lt;br&gt;
    },&lt;br&gt;
    "kind": "SpanKind.SERVER",&lt;br&gt;
    "parent_id": null,&lt;br&gt;
    "start_time": "2023-03-18T16:43:57.553799Z",&lt;br&gt;
    "end_time": "2023-03-18T16:43:57.554797Z",&lt;br&gt;
    "status": {&lt;br&gt;
        "status_code": "UNSET"&lt;br&gt;
    },&lt;br&gt;
    "attributes": {&lt;br&gt;
        "http.method": "GET",&lt;br&gt;
        "http.server_name": "127.0.0.1",&lt;br&gt;
        "http.scheme": "http",&lt;br&gt;
        "net.host.port": 5000,&lt;br&gt;
        "http.host": "127.0.0.1:5000",&lt;br&gt;
        "http.target": "/random",&lt;br&gt;
        "net.peer.ip": "127.0.0.1",&lt;br&gt;
        "http.user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",&lt;br&gt;
        "net.peer.port": 56717,&lt;br&gt;
        "http.flavor": "1.1",&lt;br&gt;
        "http.route": "/random",&lt;br&gt;
        "http.status_code": 200&lt;br&gt;
    },&lt;br&gt;
    "events": [],&lt;br&gt;
    "links": [],&lt;br&gt;
    "resource": {&lt;br&gt;
        "attributes": {&lt;br&gt;
            "telemetry.sdk.language": "python",&lt;br&gt;
            "telemetry.sdk.name": "opentelemetry",&lt;br&gt;
            "telemetry.sdk.version": "1.15.0",&lt;br&gt;
            "telemetry.auto.version": "0.36b0",&lt;br&gt;
            "service.name": "unknown_service"&lt;br&gt;
        },&lt;br&gt;
        "schema_url": ""&lt;br&gt;
    }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;As you can see, the above trace log shows more details than the manual instrumentation process, and we didn’t have to make any manual code modifications.&lt;/p&gt;
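&lt;p&gt;Since this article is about API latency, note that each exported span already carries the measurement we care about: the request duration is simply end_time minus start_time. A small helper for the console-exported JSON timestamp format shown above (a sketch, assuming the default ISO-8601 format with microseconds):&lt;/p&gt;

```python
from datetime import datetime

def span_duration_ms(span):
    """Latency of a console-exported OTel span, in milliseconds."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    start = datetime.strptime(span["start_time"], fmt)
    end = datetime.strptime(span["end_time"], fmt)
    return (end - start).total_seconds() * 1000.0

# Timestamps taken from the auto-instrumented span output above
span = {
    "start_time": "2023-03-18T16:43:57.553799Z",
    "end_time": "2023-03-18T16:43:57.554797Z",
}
print(span_duration_ms(span))  # ~0.998 ms for this request
```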

&lt;p&gt;Adding Advanced Visualization and Context for Python Spans and Traces&lt;/p&gt;

&lt;p&gt;Tools like Jaeger offer visibility into spans and traces by providing basic visualization to see each span’s start and end times, relationships, and contextual information, such as the span ID and trace ID. However, these tools do not fully leverage the potential of OpenTelemetry. They have some significant limitations, like the lack of advanced visualization options, providing limited context, and missing data on the calls. Hence, we need tools with more advanced visualization and context support to get the maximum from the OTel Python instrumentation.&lt;/p&gt;

&lt;p&gt;That’s where tools like Helios come in. It is a developer-observability solution built on OpenTelemetry that provides actionable insight into the end-to-end application flow. Recently, I used Helios with a Python (Flask) application, and it provided some amazing features like advanced visualizations, bottleneck identification, workflow reproduction, trace-based test generation, and end-to-end visibility.&lt;/p&gt;

&lt;p&gt;So, let’s see how to integrate Helios with Python – OpenTelemetry instrumentation to increase observability while reducing the troubleshooting and debugging time.&lt;/p&gt;

&lt;p&gt;Step 1 – Create a Helios Account&lt;/p&gt;

&lt;p&gt;First, create a new Helios account.&lt;/p&gt;

&lt;p&gt;[Image: Helios distributed tracing sign-in screen]&lt;/p&gt;

&lt;p&gt;Once the account is created, you will get a UI like the one below with installation commands and configurations for all the languages supported by Helios.&lt;/p&gt;

&lt;p&gt;[Image: Multi-language and environment support in Helios distributed tracing]&lt;/p&gt;

&lt;p&gt;Step 2 – Installing Helios Python SDK&lt;/p&gt;

&lt;p&gt;Install Helios SDK for your Python application with the below code:&lt;/p&gt;

&lt;p&gt;pip install helios-opentelemetry-sdk&lt;/p&gt;

&lt;p&gt;Then, all you need to do is set the environment variables, as in the screenshot from step 1.&lt;/p&gt;

&lt;p&gt;export AUTOWRAPT_BOOTSTRAP="helios" # TODO: Variable should be set before process starts.&lt;br&gt;
export HS_TOKEN="6ec524268408451d9854"&lt;br&gt;
export HS_SERVICE_NAME="" # TODO: Replace value with service name.&lt;br&gt;
export HS_ENVIRONMENT="" # TODO: Replace value with service environment.&lt;/p&gt;

&lt;p&gt;Alternatively, you can import Helios and initialize it in code using the API token. However, the recommended approach is to use environment variables.&lt;/p&gt;

&lt;p&gt;from flask import Flask&lt;br&gt;
import random&lt;br&gt;
from helios import initialize&lt;br&gt;
initialize(&lt;br&gt;
  api_token='f1e8b1f587cebd6e45e8', # Insert API Token here.&lt;br&gt;
  service_name='my-python-service', # Insert service name.&lt;br&gt;
  enabled=True,        # Defaults to False if omitted.&lt;br&gt;
  environment='DEPLOYMENT_ENV',  # Defaults to os.environ.get('DEPLOYMENT_ENV') if omitted.&lt;br&gt;
  commit_hash='COMMIT_HASH',  # Defaults to os.environ.get('COMMIT_HASH') if omitted.&lt;br&gt;
)&lt;br&gt;
app = Flask(__name__)&lt;/p&gt;

&lt;p&gt;@app.route('/random')&lt;br&gt;
def index():&lt;br&gt;
  number = random.randint(0,100)&lt;br&gt;
  return 'Random Number : %d' % number &lt;/p&gt;

&lt;p&gt;app.run(host='0.0.0.0', port=8000)&lt;/p&gt;

&lt;p&gt;Step 3 – Observe the Trace Visualizations&lt;/p&gt;

&lt;p&gt;Now, restart the application and refresh the Helios dashboard to get the actionable data of your application.&lt;/p&gt;

&lt;p&gt;[Image: Helios dashboard showing actionable application data]&lt;/p&gt;

&lt;p&gt;As you can see, Helios provides more advanced visualization and context for spans and traces in your Python application compared to stand-alone OTel or Jaeger.&lt;/p&gt;

&lt;p&gt;The API tab lists all the API endpoints of the application, and you can get trace and span data on each API by clicking on them.&lt;/p&gt;

&lt;p&gt;The Traces tab will list all the traces, and you can get trace visualizations and timelines by clicking on them.&lt;/p&gt;

&lt;p&gt;As the number of services and endpoints increases, the visualizations and insights provided by OpenTelemetry can be incredibly helpful in troubleshooting and debugging your applications. Most importantly, the entire process requires only a few simple steps.&lt;/p&gt;

&lt;p&gt;Conclusion&lt;br&gt;
This article discussed the importance of OpenTelemetry in modern microservices architectures and the steps to manually and automatically instrument your Python applications with OpenTelemetry, along with OpenTelemetry Python examples. However, troubleshooting modern distributed applications requires more insights and actionable data to identify issues and reduce MTTR. Hence, getting familiar with advanced visualization and context tools like Helios is important to make your work much easier.&lt;/p&gt;

&lt;p&gt;Thank you for Reading.&lt;/p&gt;

</description>
      <category>python</category>
      <category>microservices</category>
      <category>opentelemetry</category>
    </item>
    <item>
      <title>OTel for the rescue - dev first API observability</title>
      <dc:creator>Elli (Einav Laviv)</dc:creator>
      <pubDate>Wed, 29 Mar 2023 10:36:45 +0000</pubDate>
      <link>https://dev.to/einavlaviv/otel-for-the-rescue-dev-first-api-observability-14k9</link>
      <guid>https://dev.to/einavlaviv/otel-for-the-rescue-dev-first-api-observability-14k9</guid>
      <description>&lt;p&gt;This article discusses how API observability powered by OTel distrubted tracing and instrumented data helps developers debug issues much faster and minimize MTTR in microservices. &lt;/p&gt;

&lt;p&gt;We all know the advantages of APIs; today's software development relies heavily on them. But here as well, distributed architectures make it hard to maintain control and visibility over all APIs across all microservices, and to know where exactly they are used and how. &lt;/p&gt;

&lt;p&gt;Actual instrumented data improves the developer experience by providing API observability and troubleshooting capabilities. In short, it enables visibility into API inventory, specs &amp;amp; runs, and therefore helps identify and debug issues instantly and minimize MTTR. This article is written by a developer at Helios, using API observability features in the platform to monitor and troubleshoot the app she works on. &lt;/p&gt;

&lt;p&gt;Specifically, she talks about applying API observability in the following ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using an auto-generated API catalog, enabling discovery of the entire API inventory used by an application&lt;/li&gt;
&lt;li&gt;Using API overview and spec tools, providing access to API documentation and performance as calculated automatically based on instrumented API calls&lt;/li&gt;
&lt;li&gt;API troubleshooting, allowing immediate access to different kinds of API errors and failures – including the full E2E context coming from distributed tracing and context propagation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Continue to the full article to learn more and review examples: &lt;br&gt;
&lt;a href="https://gethelios.dev/blog/api-observability-leveraging-otel-to-improve-developer-experience/"&gt;https://gethelios.dev/blog/api-observability-leveraging-otel-to-improve-developer-experience/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opentelemetry</category>
      <category>api</category>
      <category>microservices</category>
      <category>observability</category>
    </item>
    <item>
      <title>Jaeger Tracing - Theory and practice</title>
      <dc:creator>Elli (Einav Laviv)</dc:creator>
      <pubDate>Sun, 26 Mar 2023 09:13:24 +0000</pubDate>
      <link>https://dev.to/einavlaviv/jaeger-tracing-theory-and-practice-3ggl</link>
      <guid>https://dev.to/einavlaviv/jaeger-tracing-theory-and-practice-3ggl</guid>
      <description>&lt;p&gt;For the full article, visit:(&lt;a href="https://gethelios.dev/blog/jaeger-vs-helios-which-one-should-you-choose/"&gt;https://gethelios.dev/blog/jaeger-vs-helios-which-one-should-you-choose/&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;OpenTelemetry (OTel) is an open-source framework that provides tools, APIs, and SDKs for observability-oriented data collection in microservices (i.e., logs, metrics, and traces), and is therefore tailored to cloud-native apps. &lt;/p&gt;

&lt;p&gt;Developers use the data collected by OTel to monitor app health and performance, and to debug and test microservices. The data can be exported to external tools such as APMs, the open-source Jaeger and Zipkin, Helios, and others. &lt;/p&gt;

&lt;p&gt;In this article, we’ll explain Jaeger tracing compared to OpenTelemetry, and demonstrate how they can be enriched with advanced observability and insights for troubleshooting microservices &amp;gt;&amp;gt; &lt;a href="https://gethelios.dev/blog/jaeger-vs-helios-which-one-should-you-choose/"&gt;https://gethelios.dev/blog/jaeger-vs-helios-which-one-should-you-choose/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opentelemetry</category>
      <category>microservices</category>
      <category>devops</category>
      <category>otel</category>
    </item>
    <item>
      <title>Golang Distributed Tracing with OpenTelemetry - Solving the Challenge</title>
      <dc:creator>Elli (Einav Laviv)</dc:creator>
      <pubDate>Sun, 05 Mar 2023 12:07:09 +0000</pubDate>
      <link>https://dev.to/einavlaviv/golang-distributed-tracing-with-opentelemetry-solving-the-challenge-3h5a</link>
      <guid>https://dev.to/einavlaviv/golang-distributed-tracing-with-opentelemetry-solving-the-challenge-3h5a</guid>
      <description>&lt;p&gt;This is a preview - LInk to the full article is provided below&lt;/p&gt;

&lt;p&gt;OpenTelemetry (OTel), an open-source framework for trace-based observability, provides a standard set of vendor-agnostic SDKs, APIs, and a variety of tools to connect with observability backends. It supports all major programming languages, including Java, Python, Node.js, .NET, and Go.&lt;/p&gt;

&lt;p&gt;However, Golang distributed tracing (for observability, debugging, and testing) by integrating OTel with Go is challenging for several reasons. &lt;/p&gt;

&lt;p&gt;This article examines the biggest challenges and offers a new approach: compile-time auto-instrumentation, which makes tracing Golang applications faster and with much less friction.&lt;/p&gt;

&lt;p&gt;Read on to see how Golang tracing is executed today, and view an example of the exact steps to simplify it.&lt;/p&gt;

&lt;p&gt;Read more - Golang Distributed Tracing with OpenTelemetry - Solving the Challenge - &lt;a href="https://gethelios.dev/blog/golang-distributed-tracing-opentelemetry-based-observability/"&gt;https://gethelios.dev/blog/golang-distributed-tracing-opentelemetry-based-observability/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>distributedsystems</category>
      <category>devops</category>
      <category>microservices</category>
    </item>
  </channel>
</rss>
