Gernot Glawe for AWS Community Builders

Posted on • Originally published at tecracer.com

Serverless Spy Vs. Spy Chapter 3: X-Ray vs Jaeger - Send Lambda traces with open telemetry

In modern architectures, Lambda functions co-exist with containers. Cloud-native observability is achieved with OpenTelemetry. I show you how to send OpenTelemetry traces from Lambda to a Jaeger tracing server. Let's see how this compares to the X-Ray tracing service.

As the Lambda setup with TypeScript and Python was already covered well in chapter 2, I will stick to Go here. The CDK code is easy to migrate.

Setting

Architecture overview

The Lambda function (2) sends traces to the Jaeger backend using the OpenTelemetry Protocol (OTLP). Because we do not want to accept requests from the internet, Lambda has to run within the network of the VPC called basevpc. This VPC is created first (1). The Jaeger container announces its IP via AWS service discovery (AWS Cloud Map).

To access the Jaeger frontend (UI), a load balancer is placed between the private IP of the internal Jaeger service and the internet.

The CDK code, the application code and Jaeger itself are written in Go.

Lambda

Lambda Resources

AWS Lambda Resources

  1   lambdaPath := filepath.Join(path, "../dist/main.zip")
  2   adotLayer := lambda.LayerVersion_FromLayerVersionArn(this, aws.String("adotlayer"),
  3     aws.String("arn:aws:lambda:eu-central-1:901920570463:layer:aws-otel-collector-amd64-ver-0-62-1:1"))
  4   fn := lambda.NewFunction(this, aws.String("adotlambda"),
  5   &lambda.FunctionProps{
  6     Vpc: vpc,
  7     Handler: aws.String("main"),
  8     Runtime: lambda.Runtime_PROVIDED_AL2(),
  9     Tracing: lambda.Tracing_ACTIVE,
 10     Environment: &map[string]*string{
 11       "OPENTELEMETRY_COLLECTOR_CONFIG_FILE" : aws.String("/var/task/config.yml"),
 12       // "https://opentelemetry.io/docs/concepts/sdk-configuration/general-sdk-configuration/"
 13       "OTEL_SERVICE_NAME" : aws.String("documentcounter"),
 14     },
 15     AllowPublicSubnet: aws.Bool(true),
 16     Layers: &[]lambda.ILayerVersion{
 17         adotLayer,
 18     },
 19     },
 20   )

You have to define the following configuration, see chapter 2:

  1. Lines 2-3 - The Lambda layer for the otel collector
  2. Line 6 - Run in the VPC
  3. Line 11 - Set the configuration file location
  4. Lines 16-18 - Activate the layer

Lambda Code

In the application you have to do:

1. Configure the middleware to send traces

otelaws.AppendMiddlewares(&cfg.APIOptions)
ClientDDB = dynamodb.NewFromConfig(cfg)

2. Propagate the context through all functions:

  • From main:
    tp, err := xrayconfig.NewTracerProvider(ctx)
    //...
    lambda.Start(otellambda.InstrumentHandler(HandleRequest, xrayconfig.WithRecommendedOptions(tp)...))
  • to HandleRequest:
    func HandleRequest(ctx context.Context, s3Event events.S3Event) (string, error) {
    //...
    putItem(ctx, s3input)
  • to putItem:
    func putItem(ctx context.Context, itemID string) {
    //...
    result, err := ClientDDB.PutItem(ctx, input)

otel flow

At the end, the app performs an S3 ListObjects call, so that you have two AWS services in the traces.

See chapter 2 for more details.

Now Lambda can send traces, so we need a target. I chose Jaeger, an open-source, end-to-end distributed tracing system, originally provided by Uber Technologies.

Jaeger Installation

VPC

We provide a VPC to run the ECS service - just a VPC with a private subnet.

Fargate Service

The JAEGER service

The front end is served on port 16686; the OTLP requests go to port 4317 via gRPC. All Jaeger ports are described in the deployment part of the Jaeger documentation.

To access the Jaeger front end with a DNS name, you need a domain. So change the following configuration in jaeger/cluster.go:

var SERVICE_NAME = "jaeger"
var NAMESPACE = "otel.letsbuild-aws.com"
var HOSTED_ZONE_ID = "Z042038724KH99T9LFKK6"
var DNS_NAME = "service.letsbuild-aws.com"

In this example, I have created a subdomain "service.letsbuild-aws.com" for the Load Balancer. The NAMESPACE is used for service discovery. You do not need a real domain for service discovery.

To get Jaeger up and running, we use the all-in-one image:

jaegertracing/all-in-one:1.39.0

The jaeger container can be configured via the environment:

"SPAN_STORAGE_TYPE":      aws.String("memory"),
"COLLECTOR_OTLP_ENABLED": aws.String("true"),
"LOG_LEVEL":              aws.String("debug"),

To keep it (almost) simple, the storage is set to memory. In production, you could use Cassandra, Elasticsearch and other backends. As stated in the Jaeger documentation, all CLI parameters can be set via environment variables. To be able to receive OTLP data, the OTLP receiver is enabled with COLLECTOR_OTLP_ENABLED.
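The mapping from CLI flag to environment variable follows a simple naming rule: upper-case the flag name and replace punctuation with underscores. The helper flagToEnv below is a hypothetical name of my own, a small sketch of that rule and not part of Jaeger.

```go
package main

import (
	"fmt"
	"strings"
)

// flagToEnv converts a Jaeger CLI flag name into the matching environment
// variable name: strip the leading dashes, upper-case the rest and replace
// "." and "-" with "_".
func flagToEnv(flag string) string {
	name := strings.TrimLeft(flag, "-")
	name = strings.ToUpper(name)
	return strings.NewReplacer(".", "_", "-", "_").Replace(name)
}

func main() {
	for _, f := range []string{"--collector.otlp.enabled", "--log-level"} {
		fmt.Printf("%s -> %s\n", f, flagToEnv(f))
	}
}
```

This is how the flag --collector.otlp.enabled becomes the COLLECTOR_OTLP_ENABLED entry in the container environment above.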

The management UI and OTLP ports are configured for the container:

task.AddContainer(aws.String("jaegerContainer"),
    &ecs.ContainerDefinitionOptions{
        Image:         ecs.ContainerImage_FromRegistry(aws.String("jaegertracing/all-in-one:1.39.0"), nil),
        ContainerName: aws.String("jaeger-all"),
//...
        PortMappings: &[]*ecs.PortMapping{
            {
                ContainerPort: MANAGEMENT_PORT,
                HostPort:      MANAGEMENT_PORT,
                Protocol:      ecs.Protocol_TCP,
                // management
            },
            {
                ContainerPort: aws.Float64(4317),
                HostPort:      aws.Float64(4317),
                Protocol:      ecs.Protocol_TCP,
                // "otel-grpc"
            },
//...

See the jaeger/cluster.go file for the complete source.

Connect Lambda to Jaeger

On the jaeger side a namespace is configured:

namespace := awsservicediscovery.NewPrivateDnsNamespace(this, aws.String("oteltrace-namespace"),
    &awsservicediscovery.PrivateDnsNamespaceProps{
        Name:        aws.String(NAMESPACE),
        Description: aws.String("DNS service discovery subdomain"),
        Vpc:         vpc,
    },
)

This creates an entry in the private domain otel.letsbuild-aws.com:

DNS

On the Lambda side, the first thing is to tell the ADOT layer where to find the config file:

"OPENTELEMETRY_COLLECTOR_CONFIG_FILE" : aws.String("/var/task/config.yml"),

Because Lambda apps are deployed into the directory /var/task on the Lambda microVM, you have to prepend the path /var/task.

You find the file here: app/config.yml

The configuration is added to the Lambda deployment package:

env GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -ldflags="-s -w" -o ../dist/main main/main.go
chmod +x ../dist/main
mv ../dist/main ../dist/bootstrap
cp config.yml ../dist
cd ../dist && zip main.zip bootstrap config.yml

The lambda package build script

Configuration

In the configuration, we have three parts:

1) The local receiver:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

It receives the traces from the Lambda function.

2) The exporter

exporters:
  otlp:
    endpoint: jaeger.otel.letsbuild-aws.com:4317
    tls:
      insecure: true    

Here the DNS name from the awsservicediscovery namespace is used for the endpoint.
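The endpoint is simply the service name, the namespace and the OTLP port put together. A small sketch, assuming the values from jaeger/cluster.go shown earlier; otlpEndpoint is a hypothetical helper used only for illustration:

```go
package main

import (
	"fmt"
	"net"
)

const (
	serviceName = "jaeger"                 // SERVICE_NAME in jaeger/cluster.go
	namespace   = "otel.letsbuild-aws.com" // NAMESPACE in jaeger/cluster.go
	otlpPort    = "4317"                   // OTLP gRPC port of the Jaeger container
)

// otlpEndpoint joins service name and namespace into the private DNS name
// and appends the port.
func otlpEndpoint(service, ns, port string) string {
	return net.JoinHostPort(service+"."+ns, port)
}

func main() {
	fmt.Println(otlpEndpoint(serviceName, namespace, otlpPort))
}
```

If you change SERVICE_NAME or NAMESPACE, the exporter endpoint in config.yml has to change accordingly.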

3) The pipelines

Now the incoming receiver is piped to the outgoing exporter:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]

Details are described in the OpenTelemetry documentation. As stated in chapter 2, not all configurations are valid here.

Compare X-Ray UI to Jaeger UI

X-Ray now

As the collector is not configured for X-Ray traces, we just see the data from the Lambda service, not the function:

X-Ray service only

Switch the configuration back to X-Ray:

1) Change app/config.yml

Samples for the configurations are provided in

  • app/config-otel.yml
  • app/config-xray.yml

2) Deploy app

cd app
task fastdeploy

Then some traffic:

cd ..
task traffic

Then we see all nodes in the X-Ray Map view:

Xray all services

And some traces.

X-Ray Trace Map


Jaeger/Otel

Switch the configuration back to otel and deploy the Lambda app again. After creating some traffic, you see traces in the Jaeger UI.

Access the Jaeger UI via the load balancer DNS entry or your domain name.

Jaeger Trace Map

1) Choose the service documentcounter. This is the name I set with the environment variable OTEL_SERVICE_NAME, configured in the Lambda resource.
2) The button [Find Traces] shows a graphical view (4) and the single traces.

Click on a trace (3) to see the detail view:

Jaeger Timeline

Comparing both trace maps, we notice that the nodes with the AWS service icons are missing in Jaeger.

Comparing the timelines, you see that the segments otellambda AWS::Lambda and otellambda AWS::Lambda::Function only appear in X-Ray, not in Jaeger.

That is because only the Lambda function sends traces, not the Lambda service. In the "cloud-native" container world, it is usually assumed that the container is already running, so the startup time is not interesting. In Lambda, the microVM is started when a request hits a cold start. If that happens often, it may affect your overall latency, so you want to have that data. You can also get the init duration from the Lambda logs. If you need information from the Lambda resource, you might use Lambda extensions and the AWS Lambda Telemetry API, which I will cover in the last chapter.

The detail information is almost the same:

putitem detail

Is there a winner?

Functionality

If you have to decide whether to use X-Ray or other services for your traces, ADOT is the more flexible choice. It supports a wider range of sources.

For services that have a large AWS part, the X-Ray service provides some more functionality, like creating the nodes for AWS services in the map.

Cost

It depends on your metrics!

Otel open-source tracing, e.g. Jaeger

I have seen some other posts which stated that an extra tracing service would be cheaper "because it is open source". If you compare the costs, the TCO has some more parts:

  • Costs of the running container

Price in eu-central-1:
per vCPU per hour: $0.04656
per GB per hour: $0.00511
With 2 vCPU and 4 GB, this would be 62.01 €/month.

  • Cost of storage: depends on backend

In a production environment, you would also want to set up an Application Load Balancer with Cognito authentication, which adds further costs.

production setup

X-Ray

X-Ray costs $5.00 per 1 million traces in eu-central-1. You can also lower the sampling rate so that not every call produces a trace.

Operations

The telemetry infrastructure setup is done only once. Once you have it running, there should not be much to do.

With X-Ray, there is no additional operational cost.

Usage

Because you run the Jaeger container yourself, you can adapt its size to the speed you need. In my tests, the Jaeger frontend seemed much faster than the X-Ray (aka CloudWatch) service map.

Conclusion

With the sample apps from the opentelemetry-lambda repository, the Lambda part itself was easy to implement. What took me some time was providing the Jaeger Fargate service with IaC outside of a k8s environment. But with ECS and service discovery, that was easy in the end. This should be even simpler in an EKS environment with the jaegertracing Helm charts.

Using something other than X-Ray as the tracing solution does not look like a good choice for AWS serverless projects.
But if you already have a container solution up and running, otel is a good choice for an environment where container traces and Lambda traces are stored together.

See also

If you need consulting for your serverless project, don't hesitate to get in touch with the sponsor of this blog, tecRacer.

For more AWS development stuff, follow me on dev https://dev.to/megaproaktiv.

Appendix: Quick Walkthrough

  1. Clone repository
     git clone https://github.com/megaproaktiv/adot-otelstarter.git
     cd adot-otelstarter
  2. Set region export AWS_REGION=yourregion, e.g.
     export AWS_REGION=eu-central-1
  3. If CDK is not bootstrapped:
     task bootstrap
  4. Create VPC
     task jaeger:deploy-vpc
  5. Set domain and service configuration

     Edit jaeger/cluster.go:

     var SERVICE_NAME = "jaeger"
     var NAMESPACE = "otel.letsbuild-aws.com"
     var HOSTED_ZONE_ID = "Z042035555KH99T9LFKK6"
     var DNS_NAME = "service.letsbuild-aws.com"
  6. Create ECS cluster with Jaeger service
     task jaeger:deploy-jaeger
  7. Deploy Lambda resources and function
     task deploy

     Note: because of the ENI this could take a few minutes

  8. Create traffic
     task traffic
