Trace
A trace is the record, or history, of a request. It gives us the big picture of what happens from the moment the application accepts a request until it returns a response. A trace is built from spans.
Span
A span is a unit that records a particular operation within a certain time window. It contains the span name, the trace it belongs to, timestamps for when the operation starts and ends, and additional information in the form of span attributes such as the HTTP method, URL, and route. Here is an example of a hello span in JSON format.
{
  "name": "hello",
  "context": {
    "trace_id": "5b8aa5a2d2c872e8321cf37308d69df2",
    "span_id": "051581bf3cb55c13"
  },
  "parent_id": null,
  "start_time": "2022-04-29T18:52:58.114201Z",
  "end_time": "2022-04-29T18:52:58.114687Z",
  "attributes": {
    "http.route": "some_route1"
  },
  "events": [
    {
      "name": "Guten Tag!",
      "timestamp": "2022-04-29T18:52:58.114561Z",
      "attributes": {
        "event_attributes": 1
      }
    }
  ]
}
The span above contains information about:
- a name, hello
- the span lifetime, in the start_time and end_time fields
- the http.route span attribute, which records route information about the URL
- the events field, which tells us what happened during the span's lifetime
The hello span is a parent span, because the value of its parent_id field is null. Now pay attention to the hello-greetings span below.
{
  "name": "hello-greetings",
  "context": {
    "trace_id": "5b8aa5a2d2c872e8321cf37308d69df2",
    "span_id": "5fb397be34d26b51"
  },
  "parent_id": "051581bf3cb55c13",
  "start_time": "2022-04-29T18:52:58.114304Z",
  "end_time": "2022-04-29T22:52:58.114561Z",
  "attributes": {
    "http.route": "some_route2"
  },
  "events": [
    {
      "name": "hey there!",
      "timestamp": "2022-04-29T18:52:58.114561Z",
      "attributes": {
        "event_attributes": 1
      }
    },
    {
      "name": "bye now!",
      "timestamp": "2022-04-29T18:52:58.114585Z",
      "attributes": {
        "event_attributes": 1
      }
    }
  ]
}
The value of the parent_id field on the hello-greetings span is the same as the value of span_id on the hello span. We therefore call hello-greetings a child span of the hello span. Below is the hello-salutations span, whose parent_id is also equal to hello's span_id. A short Go sketch after this JSON shows how such a parent/child relationship is produced in code.
{
  "name": "hello-salutations",
  "context": {
    "trace_id": "5b8aa5a2d2c872e8321cf37308d69df2",
    "span_id": "93564f51e1abe1c2"
  },
  "parent_id": "051581bf3cb55c13",
  "start_time": "2022-04-29T18:52:58.114492Z",
  "end_time": "2022-04-29T18:52:58.114631Z",
  "attributes": {
    "http.route": "some_route3"
  },
  "events": [
    {
      "name": "hey there!",
      "timestamp": "2022-04-29T18:52:58.114561Z",
      "attributes": {
        "event_attributes": 1
      }
    }
  ]
}
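Before moving on to the Collector, here is a minimal Go sketch of how a parent/child relationship like the one above comes about in code: starting a span from a context that already carries a span makes the new span a child, so its parent_id is filled with the parent's span_id and both share the same trace_id. The tracer and span names here are illustrative only, not part of the article's application.
package main

import (
    "context"

    "go.opentelemetry.io/otel"
)

func handleHello(ctx context.Context) {
    tracer := otel.Tracer("hello-tracer") // illustrative tracer name

    // Parent span: no span in ctx yet, so its parent_id stays null.
    ctx, parent := tracer.Start(ctx, "hello")
    defer parent.End()

    // Child span: started from the parent's context, so its parent_id
    // becomes the parent's span_id and it shares the same trace_id.
    _, child := tracer.Start(ctx, "hello-greetings")
    child.AddEvent("hey there!")
    child.End()
}

func main() {
    // Without a registered SDK tracer provider this uses a no-op tracer;
    // it only illustrates how the hierarchy is formed.
    handleHello(context.Background())
}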
OpenTelemetry Collector
The Collector is a component that receives, processes, and exports telemetry data. The OpenTelemetry Collector can handle telemetry in various formats, such as Jaeger, Prometheus, or Fluent Bit. It is built from several components:
- Receiver
A receiver focuses on collecting data from various sources and in various formats. Here is an example of a receivers configuration.
receivers:
  # Data sources: logs
  fluentforward:
    endpoint: 0.0.0.0:8006

  # Data sources: metrics
  hostmetrics:
    scrapers:
      cpu:
      disk:
      filesystem:
      load:
      memory:
      network:
      process:
      processes:
      paging:

  # Data sources: traces
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      thrift_binary:
      thrift_compact:
      thrift_http:

  # Data sources: traces, metrics, logs
  kafka:
    protocol_version: 2.0.0

  # Data sources: traces, metrics
  opencensus:

  # Data sources: traces, metrics, logs
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: cert.pem
          key_file: cert-key.pem
      http:
        endpoint: 0.0.0.0:4318

  # Data sources: metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 5s
          static_configs:
            - targets: [localhost:8888]

  # Data sources: traces
  zipkin:
- Processor
A processor is used to process data coming from receivers. Processing in this component consists of filtering, renaming, or recalculating data. We can define not just one processor but several of them, depending on our needs. The resulting telemetry data depends on the sequence, or order, in which the processors are applied. Here is an example of a processors definition.
processors:
  # Data sources: traces
  attributes:
    actions:
      - key: environment
        value: production
        action: insert
      - key: db.statement
        action: delete
      - key: email
        action: hash

  # Data sources: traces, metrics, logs
  batch:

  # Data sources: traces, metrics, logs
  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["container.name"] == "app_container_1"'
        - 'resource.attributes["host.name"] == "localhost"'
        - 'name == "app_3"'
      spanevent:
        - 'attributes["grpc"] == true'
        - 'IsMatch(name, ".*grpc.*")'
    metrics:
      metric:
        - 'name == "my.metric" and resource.attributes["my_label"] == "abc123"'
        - 'type == METRIC_DATA_TYPE_HISTOGRAM'
      datapoint:
        - 'metric.type == METRIC_DATA_TYPE_SUMMARY'
        - 'resource.attributes["service.name"] == "my_service_name"'
    logs:
      log_record:
        - 'IsMatch(body, ".*password.*")'
        - 'severity_number < SEVERITY_NUMBER_WARN'

  # Data sources: traces, metrics, logs
  memory_limiter:
    check_interval: 5s
    limit_mib: 4000
    spike_limit_mib: 500

  # Data sources: traces
  resource:
    attributes:
      - key: cloud.zone
        value: zone-1
        action: upsert
      - key: k8s.cluster.name
        from_attribute: k8s-cluster
        action: insert
      - key: redundant-attribute
        action: delete

  # Data sources: traces
  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage: 15

  # Data sources: traces
  span:
    name:
      to_attributes:
        rules:
          - ^\/api\/v1\/document\/(?P<documentId>.*)\/update$
      from_attributes: [db.svc, operation]
      separator: '::'
- Exporter
An exporter is the component that sends processed telemetry data to backends such as Jaeger, Zipkin, Prometheus, or Kafka. Each key under exporters uses the type/name format, where type defines the kind of exporter (for example Jaeger, Kafka, or OTLP) and name is an optional field used to distinguish exporters of the same type. Here is an example of an exporters configuration.
exporters:
  # Data sources: traces, metrics, logs
  file:
    path: ./filename.json

  # Data sources: traces
  otlp/jaeger:
    endpoint: jaeger-server:4317
    tls:
      cert_file: cert.pem
      key_file: cert-key.pem

  # Data sources: traces, metrics, logs
  kafka:
    protocol_version: 2.0.0

  # Data sources: traces, metrics, logs
  # NOTE: Prior to v0.86.0 use `logging` instead of `debug`
  debug:
    verbosity: detailed

  # Data sources: traces, metrics
  opencensus:
    endpoint: otelcol2:55678

  # Data sources: traces, metrics, logs
  otlp:
    endpoint: otelcol2:4317
    tls:
      cert_file: cert.pem
      key_file: cert-key.pem

  # Data sources: traces, metrics
  otlphttp:
    endpoint: https://otlp.example.com:4318

  # Data sources: metrics
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: default

  # Data sources: metrics
  prometheusremotewrite:
    endpoint: http://prometheus.example.com:9411/api/prom/push
    # When using the official Prometheus (running via Docker)
    # endpoint: 'http://prometheus:9090/api/v1/write', add:
    # tls:
    #   insecure: true

  # Data sources: traces
  zipkin:
    endpoint: http://zipkin.example.com:9411/api/v2/spans
Instrumenting Go Application With OpenTelemetry Tracing
Spin Up Collector
Trace and span data emitted by the server will be collected, filtered, and exported by the OpenTelemetry Collector. The collector's behaviour is defined by a configuration file, shown below.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
exporters:
  otlphttp/trace:
    endpoint: http://tempo:4318
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/trace]
The OpenTelemetry Collector collects all emitted traces and spans on the address specified in the receivers.otlp.protocols.http.endpoint field. Its value is 0.0.0.0:4318, where 0.0.0.0 means the collector listens on all IPv4 addresses.
After traces and spans have been collected by the receiver, they are handled by the processors. In this example, the data is processed in batches.
The processed data is then exported to a trace visualization backend such as Grafana Tempo. Exporters are defined in the exporters section.
Finally, we wire collecting, processing, and exporting of trace data together in the service section.
Defining Exporter
At the code level, the exporter's job is to connect the application with the collector. The NewTracer function below creates the exporter and wires it into a tracer provider that sends trace data to the collector.
package trace

import (
    "context"
    "time"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    "go.opentelemetry.io/otel/propagation"
    "go.opentelemetry.io/otel/sdk/resource"
    traceSdk "go.opentelemetry.io/otel/sdk/trace"
)

var globalServiceName string

func NewTracer(
    ctx context.Context,
    serviceName string,
) error {
    // Propagate trace context and baggage across service boundaries.
    propagator := propagation.NewCompositeTextMapPropagator(
        propagation.TraceContext{},
        propagation.Baggage{},
    )
    otel.SetTextMapPropagator(propagator)

    // Create OpenTelemetry tracing client that exports
    // trace data to the collector's OTLP HTTP receiver.
    traceClientHTTP := otlptracehttp.NewClient(
        otlptracehttp.WithInsecure(),
    )
    traceExporter, err := otlptrace.New(
        ctx,
        traceClientHTTP,
    )
    if err != nil {
        return err
    }

    // Describe the service via resource attributes (container info and
    // OTEL_* environment variables such as OTEL_SERVICE_NAME).
    res, err := resource.New(
        ctx,
        resource.WithContainer(),
        resource.WithFromEnv(),
    )
    if err != nil {
        return err
    }

    tracerProvider := traceSdk.NewTracerProvider(
        traceSdk.WithBatcher(
            traceExporter,
            traceSdk.WithBatchTimeout(time.Second),
        ),
        traceSdk.WithResource(
            res,
        ),
    )

    globalServiceName = serviceName
    otel.SetTracerProvider(tracerProvider)
    return nil
}
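One detail NewTracer leaves out is flushing. The batcher buffers spans before sending them, so spans recorded shortly before the process exits can be lost unless the tracer provider is shut down. Below is a minimal standalone sketch of that pattern (it is not part of the article's code): it repeats the exporter setup and keeps a reference to the provider so Shutdown can be deferred.
package main

import (
    "context"
    "log"
    "time"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    traceSdk "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
    ctx := context.Background()

    // Same exporter setup as in NewTracer above.
    exporter, err := otlptrace.New(ctx, otlptracehttp.NewClient(otlptracehttp.WithInsecure()))
    if err != nil {
        log.Fatalf("create exporter: %v", err)
    }

    tp := traceSdk.NewTracerProvider(
        traceSdk.WithBatcher(exporter, traceSdk.WithBatchTimeout(time.Second)),
    )
    otel.SetTracerProvider(tp)

    // Flush buffered spans before the process exits; without this,
    // spans recorded right before shutdown may never reach the collector.
    defer func() {
        shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()
        if err := tp.Shutdown(shutdownCtx); err != nil {
            log.Printf("tracer provider shutdown: %v", err)
        }
    }()

    // ... run the application here ...
}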
Define Tracing Components
In this article, we are going to record events on the HTTP request and on the business logic/service layer by creating a trace and generating a span from each component. HTTPLevelSpan creates spans at the HTTP handler level, while ServiceLevelSpan creates spans at the service level.
package trace

import (
    "context"
    "fmt"
    "net/http"
    "runtime"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
)

// HTTPLevelSpan starts a span for an incoming HTTP request and records
// request metadata as span attributes.
func HTTPLevelSpan(
    ctx context.Context,
    r *http.Request,
) (context.Context, trace.Span) {
    spanName := fmt.Sprintf("%s %s", r.Method, r.URL.String())
    ctx, span := otel.Tracer(globalServiceName).Start(ctx, spanName)
    if span.IsRecording() {
        span.SetAttributes(attribute.String("http.request.method", r.Method))
        span.SetAttributes(attribute.String("server.address", r.RemoteAddr))
        span.SetAttributes(attribute.String("url.full", r.URL.String()))
    }
    return ctx, span
}

// ServiceLevelSpan starts a span named after the calling function,
// discovered via the runtime call stack.
func ServiceLevelSpan(
    ctx context.Context,
) (context.Context, trace.Span) {
    pc := make([]uintptr, 10)
    runtime.Callers(2, pc)
    f := runtime.FuncForPC(pc[0])

    ctx, span := otel.Tracer(globalServiceName).Start(ctx, f.Name())
    if span.IsRecording() {
        span.SetAttributes(attribute.String("code.function.name", f.Name()))
    }
    return ctx, span
}
When tracing any aspect of an application, such as HTTP, services, gRPC, or databases, it is recommended to follow the Semantic Conventions (semconv). Semantic conventions give generated traces and spans consistent, meaningful attribute names. In the code above, code.function.name, server.address, and url.full are examples of semantic convention attributes.
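Instead of writing the attribute keys as raw strings, the Go SDK also ships generated semconv packages with typed helpers. The sketch below is a hedged alternative to the SetAttributes calls above; it assumes the helpers HTTPRequestMethodKey, ServerAddress, and URLFull are present in the semconv version you pin (recent versions such as v1.24.0 provide them, but check the package you import).
package trace

import (
    "net/http"

    semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
    "go.opentelemetry.io/otel/trace"
)

// setHTTPAttributes shows typed semconv helpers in place of hand-written
// attribute.String keys; a sketch only, not part of the article's code.
func setHTTPAttributes(span trace.Span, r *http.Request) {
    if !span.IsRecording() {
        return
    }
    span.SetAttributes(
        semconv.HTTPRequestMethodKey.String(r.Method), // http.request.method
        semconv.ServerAddress(r.RemoteAddr),           // server.address
        semconv.URLFull(r.URL.String()),               // url.full
    )
}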
Instrumenting Application
The code below demonstrates the usage of HTTPLevelSpan and ServiceLevelSpan.
package server

import (
    "log"
    "net/http"
)

func RunServer() {
    http.HandleFunc("/", home())
    if err := http.ListenAndServe(":9320", nil); err != nil {
        log.Fatalf("could not start server: %v\n", err)
    }
}
package server

import (
    "net/http"
    "strconv"

    "go-trace-demo/service"
    utilsTrace "go-trace-demo/utils/trace"
)

func home() http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        ctx := r.Context()

        // Start the HTTP-level span; it ends when the handler returns.
        ctx, span := utilsTrace.HTTPLevelSpan(ctx, r)
        defer span.End()

        // Optional ?fail=true query parameter to simulate a failure.
        failQuery := r.URL.Query().Get("fail")
        if failQuery != "" {
            isFail, err := strconv.ParseBool(failQuery)
            if err != nil {
                w.WriteHeader(http.StatusInternalServerError)
                w.Write([]byte("error parse query"))
                return
            }
            if isFail {
                w.WriteHeader(http.StatusInternalServerError)
                w.Write([]byte("encounter fail"))
                return
            }
        }

        // Service calls; each creates its own child span.
        err := service.GetUserByID(ctx)
        if err != nil {
            return
        }
        err = service.ListStores(ctx)
        if err != nil {
            return
        }

        w.Write([]byte("Hello, World!"))
    }
}
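Note that the fail path above returns a 500 but leaves the HTTP-level span untouched, so the trace will still look successful in Tempo. Here is a hedged sketch of a helper the failure branches could call instead of writing the response directly; failRequest is a hypothetical name, and the status-code attribute follows the HTTP semantic conventions.
package server

import (
    "net/http"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/trace"
)

// failRequest writes an error response and records the failure on the
// active span so the trace shows the request as failed.
func failRequest(w http.ResponseWriter, span trace.Span, status int, msg string, err error) {
    w.WriteHeader(status)
    w.Write([]byte(msg))

    if span.IsRecording() {
        span.SetAttributes(attribute.Int("http.response.status_code", status))
        if err != nil {
            span.RecordError(err) // attaches the error as a span event
        }
        span.SetStatus(codes.Error, msg) // marks the span itself as failed
    }
}
For example, the parse-error branch could call failRequest(w, span, http.StatusInternalServerError, "error parse query", err).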
package service

import (
    "context"

    utilsTrace "go-trace-demo/utils/trace"
)

func GetUserByID(
    ctx context.Context,
) error {
    ctx, span := utilsTrace.ServiceLevelSpan(ctx)
    defer span.End()
    return nil
}

func ListStores(
    ctx context.Context,
) error {
    ctx, span := utilsTrace.ServiceLevelSpan(ctx)
    defer span.End()
    return nil
}
Below is the main code.
package main

import (
    "context"
    "log"
    "os"

    // Import paths assume the module is named go-trace-demo,
    // matching the imports used in the handler above.
    "go-trace-demo/server"
    utilsTrace "go-trace-demo/utils/trace"
)

func main() {
    ctx := context.Background()

    // The tracer name comes from the OTEL_SERVICE_NAME environment variable.
    serviceName := os.Getenv("OTEL_SERVICE_NAME")

    err := utilsTrace.NewTracer(ctx, serviceName)
    if err != nil {
        log.Fatalln("fail setup tracer")
    }

    server.RunServer()
}
Glue Everything
We'll run the application, the OpenTelemetry Collector, Grafana Tempo, and Grafana using the Docker Compose file below.
version: "3"

networks:
  observability:

services:
  tempo:
    image: grafana/tempo:main-c5323bf
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./config/tempo-config.yaml:/etc/tempo.yaml
      - ./config/tempo-data:/var/tempo
    ports:
      - "3200:3200" # tempo
      - "4317"      # open this port for OpenTelemetry gRPC exporter
      - "4318"      # open this port for OpenTelemetry HTTP exporter
    networks:
      - observability
  otel-collector:
    image: otel/opentelemetry-collector:latest
    volumes:
      - ./config/otel-config.yaml:/etc/otel/config.yaml
    command:
      - '--config=/etc/otel/config.yaml'
    ports:
      - "4317:4317" # grpc
      - "4318:4318" # http
    depends_on:
      - tempo
    networks:
      - observability
  app:
    container_name: app
    build:
      context: ./
      dockerfile: ./Dockerfile
    ports:
      - 9320:9320
    networks:
      - observability
    depends_on:
      - otel-collector
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
      - OTEL_SERVICE_NAME=go-loki-app-demo
  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_PATHS_PROVISIONING=/etc/grafana/provisioning
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    entrypoint:
      - sh
      - -euc
      - |
        mkdir -p /etc/grafana/provisioning/datasources
        cat <<EOF > /etc/grafana/provisioning/datasources/ds.yaml
        apiVersion: 1
        datasources:
          - name: Loki
            type: loki
            access: proxy
            orgId: 1
            url: http://loki:3100
            basicAuth: false
            isDefault: true
            version: 1
            editable: false
          - name: Tempo
            type: tempo
            access: proxy
            orgId: 1
            url: http://tempo:3200
            basicAuth: false
            version: 1
            editable: false
        EOF
        /run.sh
    ports:
      - "3000:3000"
    networks:
      - observability
Our application is defined in the services.app section. Note that we set the OTEL_EXPORTER_OTLP_ENDPOINT environment variable for the application. The reason is that when we defined the OpenTelemetry HTTP exporter, we did not specify the collector's address; the SDK looks for the OTEL_EXPORTER_OTLP_ENDPOINT environment variable to know where the application should send trace data. If you want to point at a specific collector in code instead, you can pass the WithEndpointURL option to otlptracehttp.NewClient. Additionally, we define the OTEL_SERVICE_NAME environment variable so that OpenTelemetry can identify which service the trace data comes from. If OTEL_SERVICE_NAME is not provided, OpenTelemetry falls back to an unknown_service value.
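For completeness, here is a hedged sketch of that alternative: configuring the collector address in code rather than through the environment. It assumes the WithEndpointURL option mentioned above is available in the otlptracehttp version you use, and that the address matches the Docker Compose service name.
package trace

import (
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
)

// newTraceClient is a hypothetical helper that pins the collector address
// in code instead of relying on OTEL_EXPORTER_OTLP_ENDPOINT.
func newTraceClient() otlptrace.Client {
    return otlptracehttp.NewClient(
        otlptracehttp.WithInsecure(),
        // Collector address assumed from the docker-compose service name above.
        otlptracehttp.WithEndpointURL("http://otel-collector:4318"),
    )
}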
Visualizing Traces Using Grafana Tempo
After the containers come up with Docker Compose, let's generate trace data by accessing localhost:9320/. Next, open Grafana at localhost:3000, then head to the Explore menu via the dropdown in the top-left corner.
Switch to the Search query type instead of TraceQL, then click the Service Name dropdown. It will show the service name we provided in OTEL_SERVICE_NAME. Clicking Run Query gives us something like this.
Explore the trace data by clicking an available row in the Trace column. We will be redirected to the trace detail view.
Additionally, explore span attributes by clicking one of the spans. We will see the span attributes and resource attributes we defined when creating the OpenTelemetry trace instrumentation.