Aditya Gupta

Posted on Nov 2

Real-World Distributed Tracing: Java, OpenTelemetry, and Google Cloud Trace in Production

#gcp #observability #googlecloud #java

Deep Dive into Google Cloud Tracing for Java: A Production-Ready Implementation Guide

Distributed tracing has transformed from an operational luxury to a fundamental requirement for understanding production systems. Over the years, I've debugged countless incidents where proper trace instrumentation made the difference between resolving an issue in minutes versus hours of correlation work across fragmented logs. Google Cloud Trace, now deeply integrated with OpenTelemetry standards, has undergone significant evolution—particularly with the September 2025 introduction of the Telemetry API and the architectural improvements that followed.

This article provides a comprehensive, production-ready guide to implementing Google Cloud Tracing in Java applications, incorporating the latest features, best practices, and real-world implementation patterns refined through operational experience.

Understanding Cloud Trace in the Modern Observability Landscape

Google Cloud Trace is a managed distributed tracing system that collects latency data from applications and provides near real-time visibility into request flows. The fundamental building blocks remain conceptually simple: traces represent the complete journey of a request through your system, while spans represent individual operations within that trace. Each span captures precise timing information, status codes, and contextual attributes that illuminate what occurred during execution.

The power of Cloud Trace lies in its deep integration with Google Cloud's ecosystem. Services like Cloud Run, App Engine, and Cloud Functions automatically generate trace data without explicit instrumentation. For custom applications, particularly Java services, Cloud Trace now supports both the modern OpenTelemetry Protocol (OTLP) via the Telemetry API and the legacy Cloud Trace API.

Official Documentation: Cloud Trace Overview

The Telemetry API Revolution: OpenTelemetry-Native Architecture

The most transformative development in Cloud Trace occurred in September 2025 with the general availability of the Telemetry API. This represents a fundamental architectural shift in how Google Cloud ingests observability data.

The New Telemetry API Endpoint

The Telemetry API implements the OpenTelemetry Protocol (OTLP) natively at telemetry.googleapis.com, providing a vendor-neutral ingestion point for trace data. This isn't merely a compatibility shim—Google restructured its internal storage to use the OpenTelemetry data model natively, yielding substantial improvements across the board.

The limits imposed by the Telemetry API are dramatically more generous than the legacy Cloud Trace API:

Limit	Cloud Trace API	Telemetry API (OTLP)
Attribute key size	128 bytes	512 bytes
Attribute value size	256 bytes	64 KiB
Span name length	128 bytes	1,024 bytes
Attributes per span	32	1,024
Events per span	128	256
Links per span	N/A	128

These improvements fundamentally change what you can instrument. The ability to attach 1,024 attributes to a span instead of 32 enables capturing rich contextual information about requests—user segments, feature flags, request parameters, and business context—without hitting artificial limits.
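
To make the limits in the table concrete, here is a minimal sketch of a pre-flight check that an application could run before attaching attributes to a span. The class `TelemetryLimits` and its method are hypothetical and not part of any Google or OpenTelemetry library; the numbers are copied from the Telemetry API column above, and measuring sizes in UTF-8 bytes is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: checks attributes against the Telemetry API limits quoted in the table.
final class TelemetryLimits {
    static final int MAX_KEY_BYTES = 512;          // attribute key size
    static final int MAX_VALUE_BYTES = 64 * 1024;  // attribute value size (64 KiB)
    static final int MAX_ATTRIBUTES_PER_SPAN = 1024;

    private TelemetryLimits() {}

    /** Returns a list of human-readable violations; the list is empty when all limits are met. */
    static List<String> violations(Map<String, String> attributes) {
        List<String> problems = new ArrayList<>();
        if (attributes.size() > MAX_ATTRIBUTES_PER_SPAN) {
            problems.add("too many attributes: " + attributes.size());
        }
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            int keyBytes = utf8Length(entry.getKey());
            int valueBytes = utf8Length(entry.getValue());
            if (keyBytes > MAX_KEY_BYTES) {
                problems.add("attribute key exceeds " + MAX_KEY_BYTES + " bytes (" + keyBytes + ")");
            }
            if (valueBytes > MAX_VALUE_BYTES) {
                problems.add("attribute value exceeds " + MAX_VALUE_BYTES + " bytes (" + valueBytes + ")");
            }
        }
        return problems;
    }

    // Assumption: limits are counted in UTF-8 encoded bytes.
    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

A check like this is most useful in tests or debug builds, where an oversized attribute can be caught before it is silently truncated or rejected at ingestion.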

Java Implementation: Production-Ready OpenTelemetry Instrumentation

Project Dependencies and Setup

For a production Java application using Spring Boot 3.x, configure your pom.xml with the following dependencies:

<properties>
    <java.version>17</java.version>
    <spring-boot.version>3.2.0</spring-boot.version>
    <opentelemetry.version>1.40.0</opentelemetry.version>
    <opentelemetry-alpha.version>1.40.0-alpha</opentelemetry-alpha.version>
    <google-cloud-trace.version>2.28.0</google-cloud-trace.version>
</properties>

<dependencies>
    <!-- Spring Boot Dependencies -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>

    <!-- OpenTelemetry API -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-api</artifactId>
        <version>${opentelemetry.version}</version>
    </dependency>

    <!-- OpenTelemetry SDK -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-sdk</artifactId>
        <version>${opentelemetry.version}</version>
    </dependency>

    <!-- OpenTelemetry SDK Trace -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-sdk-trace</artifactId>
        <version>${opentelemetry.version}</version>
    </dependency>

    <!-- OpenTelemetry OTLP Exporter -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-exporter-otlp</artifactId>
        <version>${opentelemetry.version}</version>
    </dependency>

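    <!-- OpenTelemetry Semantic Conventions -->
    <!-- Note: this artifact is versioned independently of the SDK; confirm that the
         resolved version exists and still provides the classes the application imports -->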
    <dependency>
        <groupId>io.opentelemetry.semconv</groupId>
        <artifactId>opentelemetry-semconv</artifactId>
        <version>${opentelemetry-alpha.version}</version>
    </dependency>

    <!-- Google Cloud Trace Exporter (Direct Export) -->
    <dependency>
        <groupId>com.google.cloud.opentelemetry</groupId>
        <artifactId>exporter-trace</artifactId>
        <version>0.30.0</version>
    </dependency>

    <!-- GCP Resource Detector -->
    <dependency>
        <groupId>io.opentelemetry.contrib</groupId>
        <artifactId>opentelemetry-gcp-resources</artifactId>
        <version>1.37.0-alpha</version>
    </dependency>

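    <!-- Logging: remove the default Logback starter so Log4j2 is the only backend -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
        <exclusions>
            <exclusion>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-logging</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-log4j2</artifactId>
    </dependency>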

    <!-- Log4j2 JSON Template Layout for structured logging -->
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-layout-template-json</artifactId>
        <version>2.23.1</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
        </plugin>
    </plugins>
</build>

OpenTelemetry Configuration for Production

Create a configuration class that initializes OpenTelemetry with production-grade settings:

package com.example.tracing.config;

import com.google.cloud.opentelemetry.trace.TraceConfiguration;
import com.google.cloud.opentelemetry.trace.TraceExporter;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
import io.opentelemetry.context.propagation.ContextPropagators;
import io.opentelemetry.contrib.gcp.resource.GCPResourceProvider;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.sdk.trace.export.SpanExporter;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.semconv.ResourceAttributes;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.time.Duration;
import java.util.concurrent.TimeUnit;

@Configuration
public class OpenTelemetryConfig {

    @Value("${spring.application.name}")
    private String serviceName;

    @Value("${gcp.project.id}")
    private String gcpProjectId;

    @Value("${otel.traces.sampler.probability:0.1}")
    private double samplingProbability;

    @Value("${otel.exporter.type:telemetry-api}")
    private String exporterType;

    /**
     * Creates the Resource that describes this service.
     * Automatically detects GCP resource attributes when running on GCP.
     */
    @Bean
    public Resource otelResource() {
        // Create base resource with service information
        Resource serviceResource = Resource.getDefault()
                .merge(Resource.create(
                        Attributes.builder()
                                .put(ResourceAttributes.SERVICE_NAME, serviceName)
                                .put(ResourceAttributes.SERVICE_VERSION, "1.0.0")
                                .put(ResourceAttributes.SERVICE_NAMESPACE, "production")
                                .put(ResourceAttributes.DEPLOYMENT_ENVIRONMENT, "production")
                                .build()
                ));

        // Merge with GCP-specific resource attributes
        // This automatically detects GKE, Cloud Run, Compute Engine metadata
        Resource gcpResource = new GCPResourceProvider().createResource(null);

        return serviceResource.merge(gcpResource);
    }

    /**
     * Creates the SpanExporter based on configuration.
     * Supports both direct Cloud Trace export and OTLP to Telemetry API.
     */
    @Bean
    public SpanExporter spanExporter() {
        if ("cloud-trace".equals(exporterType)) {
            // Direct export to Cloud Trace using Google Cloud exporter
            TraceConfiguration traceConfig = TraceConfiguration.builder()
                    .setProjectId(gcpProjectId)
                    .setDeadline(Duration.ofSeconds(10))
                    .build();

            return TraceExporter.createWithConfiguration(traceConfig);
        } else {
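            // Export via OTLP to Telemetry API (recommended for production).
            // Note: the endpoint requires Google credentials on each request; this
            // snippet omits authentication, which must be added before production use.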
            return OtlpGrpcSpanExporter.builder()
                    .setEndpoint("https://telemetry.googleapis.com:443")
                    .setTimeout(10, TimeUnit.SECONDS)
                    .setCompression("gzip")
                    .build();
        }
    }

    /**
     * Creates the BatchSpanProcessor with production-optimized settings.
     * Batching reduces network overhead and API call costs.
     */
    @Bean
    public BatchSpanProcessor batchSpanProcessor(SpanExporter spanExporter) {
        return BatchSpanProcessor.builder(spanExporter)
                .setScheduleDelay(5, TimeUnit.SECONDS)  // Batch every 5 seconds
                .setMaxQueueSize(2048)                   // Queue up to 2048 spans
                .setMaxExportBatchSize(512)              // Export in batches of 512
                .setExporterTimeout(30, TimeUnit.SECONDS) // 30s export timeout
                .build();
    }

    /**
     * Creates the SdkTracerProvider with appropriate sampling configuration.
     */
    @Bean
    public SdkTracerProvider sdkTracerProvider(
            Resource resource,
            BatchSpanProcessor batchSpanProcessor) {

        return SdkTracerProvider.builder()
                .setResource(resource)
                .addSpanProcessor(batchSpanProcessor)
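                // Honor the caller's sampling decision; apply the ratio only to new root traces
                .setSampler(Sampler.parentBased(Sampler.traceIdRatioBased(samplingProbability)))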
                .build();
    }

    /**
     * Creates the OpenTelemetry instance configured for production use.
     */
    @Bean
    public OpenTelemetry openTelemetry(SdkTracerProvider sdkTracerProvider) {
        OpenTelemetrySdk openTelemetrySdk = OpenTelemetrySdk.builder()
                .setTracerProvider(sdkTracerProvider)
                .setPropagators(ContextPropagators.create(
                        W3CTraceContextPropagator.getInstance()))
                .build();

        // Flush pending spans and shut down the tracer provider when the JVM exits
        Runtime.getRuntime().addShutdownHook(new Thread(sdkTracerProvider::close));

        return openTelemetrySdk;
    }
}
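
To illustrate how the queue size and batch size settings in the BatchSpanProcessor configuration interact, the following is a simplified, single-threaded model of a bounded batching queue. It is a sketch for intuition only, not the SDK's implementation: the class `BoundedBatchQueue` is hypothetical, and the real processor also runs a background worker and a timer. The behavior it models is that spans arriving while the queue is full are dropped rather than blocking the application.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a batching queue; not the OpenTelemetry SDK's code.
final class BoundedBatchQueue<T> {
    private final ArrayDeque<T> queue = new ArrayDeque<>();
    private final int maxQueueSize;
    private final int maxExportBatchSize;
    private long dropped;

    BoundedBatchQueue(int maxQueueSize, int maxExportBatchSize) {
        if (maxQueueSize <= 0 || maxExportBatchSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.maxQueueSize = maxQueueSize;
        this.maxExportBatchSize = maxExportBatchSize;
    }

    /** Enqueues an item, or drops it and returns false when the queue is already full. */
    boolean offer(T item) {
        if (queue.size() >= maxQueueSize) {
            dropped++;
            return false;
        }
        queue.addLast(item);
        return true;
    }

    /** Removes and returns up to maxExportBatchSize items, oldest first. */
    List<T> nextBatch() {
        List<T> batch = new ArrayList<>();
        while (batch.size() < maxExportBatchSize && !queue.isEmpty()) {
            batch.add(queue.pollFirst());
        }
        return batch;
    }

    long droppedCount() { return dropped; }

    int size() { return queue.size(); }
}
```

With a queue of 2048 and a batch size of 512, a burst larger than 2048 spans between exports loses the excess, which is why the queue size should be chosen with peak span rate and the schedule delay in mind.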

Application Properties Configuration

Create an application.yml with environment-specific configuration:

spring:
  application:
    name: payment-service

  # Note: the default Logback starter must be excluded in the Maven build (pom.xml);
  # spring.autoconfigure.exclude accepts only auto-configuration classes.

gcp:
  project:
    id: ${GCP_PROJECT_ID:my-production-project}

otel:
  traces:
    sampler:
      # 10% sampling for production; increase for dev/staging
      probability: ${OTEL_TRACES_SAMPLER_PROBABILITY:0.1}

  exporter:
    # Options: "telemetry-api" (recommended) or "cloud-trace"
    type: ${OTEL_EXPORTER_TYPE:telemetry-api}

management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus
  # Spring Boot 3.x property path for the Prometheus registry
  prometheus:
    metrics:
      export:
        enabled: true

logging:
  config: classpath:log4j2.xml

Structured Logging Configuration with Trace Correlation

Create log4j2.xml for JSON structured logging that correlates with traces:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <!-- Use JsonTemplateLayout with GCP-specific template -->
            <JsonTemplateLayout eventTemplateUri="classpath:GcpLayout.json">
                <!-- Extract trace context from MDC and add to logs -->
                <EventTemplateAdditionalField
                    key="logging.googleapis.com/trace"
                    format="JSON"
                    value='{"$resolver": "mdc", "key": "trace_id"}'
                />
                <EventTemplateAdditionalField
                    key="logging.googleapis.com/spanId"
                    format="JSON"
                    value='{"$resolver": "mdc", "key": "span_id"}'
                />
                <EventTemplateAdditionalField
                    key="logging.googleapis.com/trace_sampled"
                    format="JSON"
                    value='{"$resolver": "mdc", "key": "trace_flags"}'
                />

                <!-- Add service context -->
                <EventTemplateAdditionalField
                    key="serviceContext"
                    format="JSON"
                    value='{"service": "${spring:spring.application.name}", "version": "1.0.0"}'
                />
            </JsonTemplateLayout>
        </Console>
    </Appenders>

    <Loggers>
        <Root level="INFO">
            <AppenderRef ref="Console"/>
        </Root>

        <!-- Reduce noise from framework libraries -->
        <Logger name="org.springframework" level="WARN"/>
        <Logger name="io.opentelemetry" level="INFO"/>
        <Logger name="com.google.cloud" level="INFO"/>
    </Loggers>
</Configuration>

Official Documentation: Java Instrumentation Sample

Context Propagation: The Foundation of Distributed Tracing

Context propagation ensures that trace context flows across service boundaries. Propagation is handled by the OpenTelemetry SDK rather than by the ingestion API; the SDK supports multiple propagation formats, with W3C Trace Context as the modern standard.

W3C Trace Context Format

The W3C Trace Context specification defines the traceparent header:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             │  │                                │                │
             │  │                                │                └─ Trace flags (sampled)
             │  │                                └─ Span ID (16 hex digits)
             │  └─ Trace ID (32 hex digits)
             └─ Version (00)
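
As a rough illustration of the layout above, the following sketch parses a version-00 traceparent value into its fields using only the Java standard library. The class `TraceParent` is hypothetical; in a real service the OpenTelemetry propagator does this work, so treat this as a way to read the format rather than as production code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the W3C traceparent header (version 00 only).
final class TraceParent {
    // version "00", 32 hex digit trace ID, 16 hex digit span ID, 2 hex digit flags
    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    final String traceId;
    final String spanId;
    final boolean sampled;

    private TraceParent(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }

    /** Returns the parsed header, or empty when the value is missing or malformed. */
    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        // The specification treats all-zero trace and span IDs as invalid.
        if (traceId.equals("0".repeat(32)) || spanId.equals("0".repeat(16))) {
            return Optional.empty();
        }
        // The lowest bit of the flags byte is the "sampled" flag.
        int flags = Integer.parseInt(m.group(3), 16);
        return Optional.of(new TraceParent(traceId, spanId, (flags & 0x01) != 0));
    }
}
```

Parsing the example header shown above yields trace ID `4bf92f3577b34da6a3ce929d0e0e4736`, span ID `00f067aa0ba902b7`, and a sampled flag of true.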

Automatic Context Propagation in Java

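OpenTelemetry's Java agent and instrumentation libraries propagate context automatically for HTTP and gRPC. The service below shows the manual equivalent: it creates spans explicitly and injects the trace context into outgoing request headers: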

package com.example.tracing.service;

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import io.opentelemetry.context.propagation.TextMapGetter;
import io.opentelemetry.semconv.SemanticAttributes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class PaymentProcessingService {

    private static final Logger logger = LoggerFactory.getLogger(PaymentProcessingService.class);

    private final Tracer tracer;
    private final RestTemplate restTemplate;
    private final OpenTelemetry openTelemetry;

    public PaymentProcessingService(OpenTelemetry openTelemetry, RestTemplate restTemplate) {
        this.openTelemetry = openTelemetry;
        this.tracer = openTelemetry.getTracer("payment-service", "1.0.0");
        this.restTemplate = restTemplate;
    }

    /**
     * Processes a payment with full distributed tracing instrumentation.
     * Demonstrates manual span creation, attribute attachment, and error handling.
     */
    public PaymentResult processPayment(PaymentRequest request) {
        // Create a span for this operation
        Span span = tracer.spanBuilder("processPayment")
                .setSpanKind(SpanKind.INTERNAL)
                .startSpan();

        // Make this span the active span in the context
        try (Scope scope = span.makeCurrent()) {
            // Populate MDC for structured logging correlation
            populateLoggingContext(span);

            // Add semantic attributes following OpenTelemetry conventions
            span.setAttribute(SemanticAttributes.ENDUSER_ID, request.getUserId());
            span.setAttribute("payment.amount", request.getAmount());
            span.setAttribute("payment.currency", request.getCurrency());
            span.setAttribute("payment.method", request.getPaymentMethod());

            logger.info("Processing payment for user={}, amount={} {}",
                    request.getUserId(), request.getAmount(), request.getCurrency());

            // Validate payment (validatePayment starts its own child span)
            validatePayment(request);

            // Process with external payment gateway
            PaymentGatewayResponse gatewayResponse = callPaymentGateway(request);

            // Add response attributes
            span.setAttribute("payment.transaction_id", gatewayResponse.getTransactionId());
            span.setAttribute("payment.status", gatewayResponse.getStatus());

            // Record a span event for significant operations
            span.addEvent("payment.authorized", 
                    io.opentelemetry.api.common.Attributes.builder()
                            .put("authorization.code", gatewayResponse.getAuthCode())
                            .build());

            logger.info("Payment processed successfully: transactionId={}",
                    gatewayResponse.getTransactionId());

            span.setStatus(StatusCode.OK);
            return PaymentResult.success(gatewayResponse);

        } catch (PaymentValidationException e) {
            span.recordException(e);
            span.setStatus(StatusCode.ERROR, "Payment validation failed");
            logger.error("Payment validation failed: {}", e.getMessage(), e);
            throw e;

        } catch (Exception e) {
            span.recordException(e);
            span.setStatus(StatusCode.ERROR, "Payment processing error");
            logger.error("Unexpected error processing payment", e);
            throw new PaymentProcessingException("Payment processing failed", e);

        } finally {
            span.end();
            clearLoggingContext();
        }
    }

    /**
     * Calls external payment gateway with propagated trace context.
     * Demonstrates manual HTTP client instrumentation with context propagation.
     */
    private PaymentGatewayResponse callPaymentGateway(PaymentRequest request) {
        Span span = tracer.spanBuilder("callPaymentGateway")
                .setSpanKind(SpanKind.CLIENT)
                .startSpan();

        try (Scope scope = span.makeCurrent()) {
            String gatewayUrl = "https://payment-gateway.example.com/api/v1/charge";

            // Add HTTP semantic attributes
            span.setAttribute(SemanticAttributes.HTTP_REQUEST_METHOD, "POST");
            span.setAttribute(SemanticAttributes.URL_FULL, gatewayUrl);
            span.setAttribute(SemanticAttributes.SERVER_ADDRESS, "payment-gateway.example.com");
            span.setAttribute(SemanticAttributes.SERVER_PORT, 443);

            // Create HTTP headers and inject trace context
            HttpHeaders headers = new HttpHeaders();
            openTelemetry.getPropagators().getTextMapPropagator()
                    .inject(Context.current(), headers, HttpHeaders::set);

            headers.set("Content-Type", "application/json");
            headers.set("X-API-Key", getApiKey());

            logger.debug("Calling payment gateway: url={}", gatewayUrl);

            // Make HTTP request
            org.springframework.http.HttpEntity<PaymentGatewayRequest> entity = 
                    new org.springframework.http.HttpEntity<>(
                            toGatewayRequest(request), headers);

            ResponseEntity<PaymentGatewayResponse> response = restTemplate.exchange(
                    gatewayUrl,
                    HttpMethod.POST,
                    entity,
                    PaymentGatewayResponse.class
            );

            // Record response attributes
            span.setAttribute(SemanticAttributes.HTTP_RESPONSE_STATUS_CODE, 
                    response.getStatusCode().value());

            if (response.getStatusCode().is2xxSuccessful()) {
                span.setStatus(StatusCode.OK);
                return response.getBody();
            } else {
                span.setStatus(StatusCode.ERROR, "Gateway returned error");
                throw new PaymentGatewayException("Gateway error: " + response.getStatusCode());
            }

        } catch (Exception e) {
            span.recordException(e);
            span.setStatus(StatusCode.ERROR, e.getMessage());
            throw e;
        } finally {
            span.end();
        }
    }

    /**
     * Validates payment with child span instrumentation.
     */
    private void validatePayment(PaymentRequest request) {
        Span span = tracer.spanBuilder("validatePayment")
                .setSpanKind(SpanKind.INTERNAL)
                .startSpan();

        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("validation.amount", request.getAmount());
            span.setAttribute("validation.currency", request.getCurrency());

            if (request.getAmount() <= 0) {
                span.addEvent("validation.failed",
                        io.opentelemetry.api.common.Attributes.builder()
                                .put("reason", "invalid_amount")
                                .build());
                throw new PaymentValidationException("Amount must be positive");
            }

            if (request.getAmount() > 10000.0) {
                span.addEvent("validation.failed",
                        io.opentelemetry.api.common.Attributes.builder()
                                .put("reason", "amount_exceeds_limit")
                                .build());
                throw new PaymentValidationException("Amount exceeds limit");
            }

            span.setStatus(StatusCode.OK);
            logger.debug("Payment validation passed");

        } finally {
            span.end();
        }
    }

    /**
     * Populates SLF4J MDC with trace context for log correlation.
     */
    private void populateLoggingContext(Span span) {
        io.opentelemetry.api.trace.SpanContext spanContext = span.getSpanContext();
        MDC.put("trace_id", "projects/" + System.getenv("GCP_PROJECT_ID") + 
                "/traces/" + spanContext.getTraceId());
        MDC.put("span_id", spanContext.getSpanId());
        MDC.put("trace_flags", spanContext.isSampled() ? "true" : "false");
    }

    private void clearLoggingContext() {
        MDC.remove("trace_id");
        MDC.remove("span_id");
        MDC.remove("trace_flags");
    }

    // Helper method implementations omitted for brevity
    private String getApiKey() { return System.getenv("PAYMENT_GATEWAY_API_KEY"); }
    private PaymentGatewayRequest toGatewayRequest(PaymentRequest request) { /* ... */ return null; }
}
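
The log correlation step in the service above builds the trace field as `projects/<project>/traces/<trace id>`, and a missing `GCP_PROJECT_ID` variable would silently produce `projects/null/...`. The following sketch shows one way to build the same three MDC values while failing fast on a missing project ID. The class `LogCorrelation` is hypothetical; the MDC key names match the ones read by the log4j2.xml configuration in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the MDC values used for Cloud Logging trace correlation.
final class LogCorrelation {
    private LogCorrelation() {}

    /** Returns the MDC entries for the given span, rejecting a missing project ID. */
    static Map<String, String> fields(String projectId, String traceId, String spanId, boolean sampled) {
        if (projectId == null || projectId.isBlank()) {
            throw new IllegalArgumentException("projectId is required for trace correlation");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        // Cloud Logging expects the fully qualified trace resource name.
        fields.put("trace_id", "projects/" + projectId + "/traces/" + traceId);
        fields.put("span_id", spanId);
        fields.put("trace_flags", Boolean.toString(sampled));
        return fields;
    }
}
```

The returned map can be copied into the MDC entry by entry, so a misconfigured deployment fails loudly instead of writing log lines that never link to a trace.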

Extracting Context from Incoming Requests

For services receiving requests, extract context from incoming headers:

package com.example.tracing.controller;

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import io.opentelemetry.context.propagation.TextMapGetter;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

import com.example.tracing.service.PaymentProcessingService;

@RestController
@RequestMapping("/api/v1/payments")
public class PaymentController {

    private final Tracer tracer;
    private final OpenTelemetry openTelemetry;
    private final PaymentProcessingService paymentService;

    // TextMapGetter for extracting context from HTTP headers
    private static final TextMapGetter<HttpHeaders> getter = new TextMapGetter<>() {
        @Override
        public Iterable<String> keys(HttpHeaders headers) {
            return headers.keySet();
        }

        @Override
        public String get(HttpHeaders headers, String key) {
            return headers.getFirst(key);
        }
    };

    public PaymentController(OpenTelemetry openTelemetry, 
                            PaymentProcessingService paymentService) {
        this.openTelemetry = openTelemetry;
        this.tracer = openTelemetry.getTracer("payment-service", "1.0.0");
        this.paymentService = paymentService;
    }

    @PostMapping
    public ResponseEntity<PaymentResult> createPayment(
            @RequestHeader HttpHeaders headers,
            @RequestBody PaymentRequest request) {

        // Extract context from incoming headers
        Context extractedContext = openTelemetry.getPropagators()
                .getTextMapPropagator()
                .extract(Context.current(), headers, getter);

        // Create server span as child of extracted context
        Span span = tracer.spanBuilder("POST /api/v1/payments")
                .setParent(extractedContext)
                .setSpanKind(io.opentelemetry.api.trace.SpanKind.SERVER)
                .startSpan();

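        try (Scope scope = span.makeCurrent()) {
            // Process payment within span context
            PaymentResult result = paymentService.processPayment(request);
            return ResponseEntity.ok(result);
        } catch (RuntimeException e) {
            // Mark the server span as failed before the exception propagates
            span.recordException(e);
            span.setStatus(io.opentelemetry.api.trace.StatusCode.ERROR, "Payment request failed");
            throw e;
        } finally {
            span.end();
        }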
    }
}
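
The getter in the controller above relies on Spring's `HttpHeaders`, which already matches header names case-insensitively. When the carrier is a plain `Map`, for example headers taken from a message queue, the lookup has to handle case itself because HTTP header names are case-insensitive. A minimal sketch under that assumption, with the hypothetical class name `CaseInsensitiveHeaders`:

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup for carriers that store headers in an ordinary, case-sensitive Map.
final class CaseInsensitiveHeaders {
    private CaseInsensitiveHeaders() {}

    /** Returns the first value whose header name matches, ignoring case. */
    static Optional<String> first(Map<String, String> headers, String name) {
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            if (entry.getKey() != null && entry.getKey().equalsIgnoreCase(name)) {
                return Optional.ofNullable(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

A `TextMapGetter` over such a map could delegate its `get` method to this lookup so that `Traceparent` and `traceparent` are treated as the same header.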

Zero-Code Instrumentation with the Java Agent

For comprehensive automatic instrumentation, use the OpenTelemetry Java Agent. This approach requires zero code changes and automatically instruments common frameworks.

Dockerfile with Java Agent

FROM eclipse-temurin:17-jre-alpine

# Download OpenTelemetry Java Agent
ARG OTEL_VERSION=2.8.0
RUN wget -O /opt/opentelemetry-javaagent.jar \
    https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v${OTEL_VERSION}/opentelemetry-javaagent.jar

# Copy application
COPY target/payment-service-1.0.0.jar /opt/app.jar

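# Set environment variables for OpenTelemetry
# Note: telemetry.googleapis.com requires Google credentials on every request; the agent
# does not add them by itself, so pair this endpoint with an authentication extension
# or route traffic through a collector that authenticates.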
ENV OTEL_SERVICE_NAME=payment-service
ENV OTEL_TRACES_EXPORTER=otlp
ENV OTEL_EXPORTER_OTLP_ENDPOINT=https://telemetry.googleapis.com:443
ENV OTEL_EXPORTER_OTLP_PROTOCOL=grpc
ENV OTEL_EXPORTER_OTLP_COMPRESSION=gzip
ENV OTEL_TRACES_SAMPLER=traceidratio
ENV OTEL_TRACES_SAMPLER_ARG=0.1
ENV OTEL_PROPAGATORS=tracecontext,baggage
ENV OTEL_RESOURCE_ATTRIBUTES=service.version=1.0.0,deployment.environment=production

# Enable GCP resource detection
ENV OTEL_RESOURCE_PROVIDERS_GCP_ENABLED=true

# Configure batch span processor
ENV OTEL_BSP_SCHEDULE_DELAY=5000
ENV OTEL_BSP_MAX_QUEUE_SIZE=2048
ENV OTEL_BSP_MAX_EXPORT_BATCH_SIZE=512
ENV OTEL_BSP_EXPORT_TIMEOUT=30000

# Run application with Java Agent
ENTRYPOINT ["java", "-javaagent:/opt/opentelemetry-javaagent.jar", "-jar", "/opt/app.jar"]

Kubernetes Deployment with Java Agent

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      serviceAccountName: payment-service-sa
      containers:
      - name: payment-service
        image: gcr.io/my-project/payment-service:1.0.0
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: GCP_PROJECT_ID
          value: "my-production-project"
        - name: OTEL_SERVICE_NAME
          value: "payment-service"
        - name: OTEL_TRACES_EXPORTER
          value: "otlp"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "https://telemetry.googleapis.com:443"
        - name: OTEL_EXPORTER_OTLP_PROTOCOL
          value: "grpc"
        - name: OTEL_EXPORTER_OTLP_COMPRESSION
          value: "gzip"
        - name: OTEL_TRACES_SAMPLER
          value: "traceidratio"
        - name: OTEL_TRACES_SAMPLER_ARG
          value: "0.1"
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: "service.version=1.0.0,deployment.environment=production"
        - name: OTEL_RESOURCE_PROVIDERS_GCP_ENABLED
          value: "true"
        - name: OTEL_BSP_SCHEDULE_DELAY
          value: "5000"
        - name: OTEL_BSP_MAX_QUEUE_SIZE
          value: "2048"
        - name: OTEL_BSP_MAX_EXPORT_BATCH_SIZE
          value: "512"
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 20
          periodSeconds: 5
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service-sa
  namespace: production
  annotations:
    iam.gke.io/gcp-service-account: payment-service@my-production-project.iam.gserviceaccount.com

Official Documentation: OpenTelemetry Java Agent Configuration

The Google-Built OpenTelemetry Collector

For environments where you need centralized telemetry processing, the Google-Built OpenTelemetry Collector provides a secure, production-ready distribution.

Collector Deployment on GKE

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: observability
data:
  collector.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      batch:
        timeout: 5s
        send_batch_size: 512
        send_batch_max_size: 1024

      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 25

      resourcedetection/gcp:
        detectors: [gcp]
        timeout: 10s

      k8sattributes:
        auth_type: serviceAccount
        passthrough: false
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.deployment.name
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.node.name
          labels:
            - tag_name: app
              key: app
              from: pod

    exporters:
      googlecloud:
        project: my-production-project
        metric:
          prefix: custom.googleapis.com
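        trace:
          # The googlecloud exporter writes through the Cloud Trace API, so it
          # targets cloudtrace.googleapis.com rather than the OTLP endpoint
          endpoint: cloudtrace.googleapis.com:443
          use_insecure: false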
        timeout: 12s
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, resourcedetection/gcp, k8sattributes, batch]
          exporters: [googlecloud]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: observability
spec:
  replicas: 2
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector-sa
      containers:
      - name: otel-collector
        # Google-Built OpenTelemetry Collector image; replace VERSION with a released tag
        image: us-docker.pkg.dev/cloud-ops-agents-artifacts/google-cloud-opentelemetry-collector/otelcol-google:VERSION
        args:
          - --config=/conf/collector.yaml
        volumeMounts:
        - name: config
          mountPath: /conf
        ports:
        - containerPort: 4317
          name: otlp-grpc
        - containerPort: 4318
          name: otlp-http
        resources:
          requests:
            memory: 256Mi
            cpu: 200m
          limits:
            memory: 512Mi
            cpu: 500m
      volumes:
      - name: config
        configMap:
          name: otel-collector-config
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: observability
spec:
  selector:
    app: otel-collector
  ports:
  - name: otlp-grpc
    port: 4317
    targetPort: 4317
  - name: otlp-http
    port: 4318
    targetPort: 4318
  type: ClusterIP


Advanced Sampling Strategies

Sampling controls the volume of trace data ingested, balancing observability with cost.
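
To make that trade-off concrete, here is a minimal, dependency-free sketch of how a ratio-based head sampler can reach a deterministic decision from the trace ID alone. It illustrates the idea only and is not the OpenTelemetry SDK's implementation; `RatioSamplingSketch` and `shouldSample` are hypothetical names, and the trace IDs are arbitrary sample values.

```java
import java.util.List;

/** Hypothetical illustration of deterministic, ratio-based head sampling. */
final class RatioSamplingSketch {

    /**
     * Decides from the trace ID alone, so every service that sees the same
     * trace ID and applies the same ratio reaches the same decision.
     */
    static boolean shouldSample(String traceIdHex, double ratio) {
        if (ratio >= 1.0) {
            return true;
        }
        if (ratio <= 0.0) {
            return false;
        }
        // Use the low 64 bits (last 16 hex characters) of the 128-bit trace ID.
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        // Clear the sign bit to get a non-negative value in [0, 2^63).
        long value = low & Long.MAX_VALUE;
        // Sample when the value falls below the ratio's share of that range.
        long threshold = (long) (ratio * Long.MAX_VALUE);
        return value < threshold;
    }

    public static void main(String[] args) {
        List<String> traceIds = List.of(
                "4bf92f3577b34da6a3ce929d0e0e4736",
                "0af7651916cd43dd8448eb211c80319c",
                "00000000000000000000000000000001");
        for (String id : traceIds) {
            System.out.println(id + " sampled at 10%: " + shouldSample(id, 0.1));
        }
    }
}
```

Because the decision depends only on the trace ID and the ratio, no coordination between services is needed to keep a trace complete, provided they all apply the same ratio.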

Adaptive Sampling Based on Request Attributes

package com.example.tracing.sampling;

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.sdk.trace.samplers.SamplingResult;
import io.opentelemetry.semconv.ErrorAttributes;

import java.util.List;

/**
 * Custom head sampler that makes sampling decisions based on the attributes
 * available when a span starts. Always samples spans that already carry an
 * error type, high-value transactions, and critical operations.
 *
 * Duration and most errors are unknown at span start, so slow or failed
 * requests that only become apparent later fall through to the base rate.
 */
public class AdaptiveSampler implements Sampler {

    private static final AttributeKey<Long> PAYMENT_AMOUNT =
            AttributeKey.longKey("payment.amount");

    private final double baseSamplingRate;
    private final Sampler parentBasedSampler;

    public AdaptiveSampler(double baseSamplingRate) {
        this.baseSamplingRate = baseSamplingRate;
        this.parentBasedSampler = Sampler.parentBased(
                Sampler.traceIdRatioBased(baseSamplingRate));
    }

    @Override
    public SamplingResult shouldSample(
            Context parentContext,
            String traceId,
            String name,
            SpanKind spanKind,
            Attributes attributes,
            List<LinkData> parentLinks) {

        // Always sample if an error type was set at span creation
        if (attributes.get(ErrorAttributes.ERROR_TYPE) != null) {
            return SamplingResult.recordAndSample();
        }

        // Always sample high-value transactions
        Long paymentAmount = attributes.get(PAYMENT_AMOUNT);
        if (paymentAmount != null && paymentAmount > 5000) {
            return SamplingResult.recordAndSample();
        }

        // Always sample specific critical operations
        if (name.contains("payment.authorization") || 
            name.contains("refund.process")) {
            return SamplingResult.recordAndSample();
        }

        // For everything else, use parent-based sampling with base rate
        return parentBasedSampler.shouldSample(
                parentContext, traceId, name, spanKind, attributes, parentLinks);
    }

    @Override
    public String getDescription() {
        return String.format("AdaptiveSampler{baseSamplingRate=%s}", baseSamplingRate);
    }
}

Use the custom sampler in your configuration:

@Bean
public SdkTracerProvider sdkTracerProvider(
        Resource resource,
        BatchSpanProcessor batchSpanProcessor) {

    return SdkTracerProvider.builder()
            .setResource(resource)
            .addSpanProcessor(batchSpanProcessor)
            .setSampler(new AdaptiveSampler(0.1))  // 10% base rate
            .build();
}

Trace Scopes for Multi-Project Visibility

Trace scopes enable searching trace data across multiple Google Cloud projects—critical for microservices architectures where services span organizational boundaries.

Creating and Managing Trace Scopes

# Create a trace scope spanning multiple projects
gcloud alpha trace scopes create production-services \
    --project=my-host-project \
    --description="Production services trace scope" \
    --projects=payment-service-project,order-service-project,inventory-service-project

# Set as default scope
gcloud alpha trace scopes update production-services \
    --project=my-host-project \
    --default

# List trace scopes
gcloud alpha trace scopes list --project=my-host-project

# View trace scope details
gcloud alpha trace scopes describe production-services \
    --project=my-host-project

Each trace scope supports up to 20 projects, and you can create up to 100 trace scopes per project. In practice that leaves enough headroom to define separate scopes per environment or per team.

Official Documentation: Create and Manage Trace Scopes

Production Troubleshooting Patterns

Debugging Missing Spans

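When spans don't appear in Cloud Trace, start by confirming that the SDK is actually producing valid, sampled spans. The following health indicator exposes that state through the actuator health endpoint: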

package com.example.tracing.diagnostics;

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

/**
 * Health check that verifies tracing configuration.
 */
@Component
public class TracingHealthIndicator implements HealthIndicator {

    private static final Logger logger = LoggerFactory.getLogger(TracingHealthIndicator.class);

    private final OpenTelemetry openTelemetry;
    private final Tracer tracer;

    public TracingHealthIndicator(OpenTelemetry openTelemetry) {
        this.openTelemetry = openTelemetry;
        this.tracer = openTelemetry.getTracer("diagnostic-tracer", "1.0.0");
    }

    @Override
    public Health health() {
        try {
            // Create a diagnostic span
            Span diagnosticSpan = tracer.spanBuilder("health_check.tracing")
                    .startSpan();

            boolean isRecording = diagnosticSpan.isRecording();
            boolean isValid = diagnosticSpan.getSpanContext().isValid();
            boolean isSampled = diagnosticSpan.getSpanContext().isSampled();
            String traceId = diagnosticSpan.getSpanContext().getTraceId();

            diagnosticSpan.end();

            if (!isValid) {
                return Health.down()
                        .withDetail("error", "Invalid span context")
                        .build();
            }

            return Health.up()
                    .withDetail("tracing.enabled", true)
                    .withDetail("span.recording", isRecording)
                    .withDetail("span.sampled", isSampled)
                    .withDetail("trace.id", traceId)
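                    // withDetail rejects null values, so fall back when the variable is unset
                    .withDetail("project.id", java.util.Objects.requireNonNullElse(
                            System.getenv("GCP_PROJECT_ID"), "unknown"))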
                    .build();

        } catch (Exception e) {
            logger.error("Tracing health check failed", e);
            return Health.down()
                    .withDetail("error", e.getMessage())
                    .build();
        }
    }
}

Common Issues and Resolutions

Issue: Traces not appearing in Cloud Trace console

Resolution Checklist:

Verify the Cloud Trace API is enabled (and the Telemetry API, if you export over OTLP): gcloud services enable cloudtrace.googleapis.com telemetry.googleapis.com
Confirm IAM permissions: Service account needs roles/cloudtrace.agent
Check sampling: Ensure requests are being sampled (isSampled() == true)
Verify endpoint: telemetry.googleapis.com:443 for OTLP
Check BatchSpanProcessor: Ensure batches are being exported (check logs)
Validate trace storage initialization: New projects require explicit initialization
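
For the sampling check, one quick way to see what an upstream service decided is to inspect the `traceparent` header it sent. In the W3C Trace Context format, the last field carries the trace flags, and its lowest bit is the sampled flag. The sketch below assumes the version `00` header layout; `TraceparentInspector` is a hypothetical name, and this is a diagnostic aid rather than a complete validator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper for reading the sampled flag of a traceparent header. */
final class TraceparentInspector {

    // Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
    private static final Pattern TRACEPARENT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    /** Returns true only when the header is well-formed and the sampled bit is set. */
    static boolean isSampled(String traceparent) {
        if (traceparent == null) {
            return false;
        }
        Matcher matcher = TRACEPARENT.matcher(traceparent.trim());
        if (!matcher.matches()) {
            return false;
        }
        int flags = Integer.parseInt(matcher.group(3), 16);
        return (flags & 0x01) != 0;
    }

    public static void main(String[] args) {
        String header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        System.out.println("sampled: " + isSampled(header));
    }
}
```

If the incoming header reports an unsampled trace, a parent-based sampler downstream will normally honor that decision, which explains missing spans even when the local base rate is high.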

Official Documentation: Troubleshoot Cloud Trace

Cost Optimization Strategies

Cloud Trace pricing is $0.20 per million spans ingested, with the first 2.5 million spans per month free.
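
A back-of-the-envelope estimate helps when choosing a sampling rate. The sketch below applies the pricing stated above to an assumed workload; the class name `TraceCostEstimate`, the 30-day month, and the traffic figures (500 requests per second, 12 spans per request) are illustrative assumptions, not measurements.

```java
/** Hypothetical helper for rough Cloud Trace ingestion cost estimates. */
final class TraceCostEstimate {

    static final long FREE_SPANS_PER_MONTH = 2_500_000L;
    static final double USD_PER_MILLION_SPANS = 0.20;

    /** Cost of one month of ingestion after subtracting the free allotment. */
    static double monthlyCostUsd(long spansIngestedPerMonth) {
        long billable = Math.max(0L, spansIngestedPerMonth - FREE_SPANS_PER_MONTH);
        return billable / 1_000_000.0 * USD_PER_MILLION_SPANS;
    }

    /** Spans ingested in a 30-day month for a steady request rate. */
    static long spansPerMonth(double requestsPerSecond,
                              double spansPerRequest,
                              double samplingRate) {
        double secondsPerMonth = 30 * 24 * 3600.0;
        return Math.round(
                requestsPerSecond * spansPerRequest * samplingRate * secondsPerMonth);
    }

    public static void main(String[] args) {
        // Assumed workload: 500 req/s, 12 spans per request (illustrative numbers).
        for (double rate : new double[] {1.0, 0.1, 0.01}) {
            long spans = spansPerMonth(500, 12, rate);
            System.out.printf("sampling=%.2f spans/month=%d cost=$%.2f%n",
                    rate, spans, monthlyCostUsd(spans));
        }
    }
}
```

Under these assumptions, full sampling comes to roughly $3,110 per month and a 10% base rate to roughly $311. Rules that always sample certain spans push the effective rate above the base rate, so treat the result as a lower bound.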

Reducing Span Volume

package com.example.tracing.config;

import io.opentelemetry.sdk.common.CompletableResultCode;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SpanExporter;

import java.util.Collection;
import java.util.stream.Collectors;

/**
 * Decorator that filters out low-value spans before export.
 * Reduces costs by not exporting health check spans, static resource requests, etc.
 */
public class FilteringSpanExporter implements SpanExporter {

    private final SpanExporter delegate;

    public FilteringSpanExporter(SpanExporter delegate) {
        this.delegate = delegate;
    }

    @Override
    public CompletableResultCode export(Collection<SpanData> spans) {
        // Filter out health check and static resource spans
        Collection<SpanData> filtered = spans.stream()
                .filter(span -> !isFilteredOperation(span))
                .collect(Collectors.toList());

        return delegate.export(filtered);
    }

    private boolean isFilteredOperation(SpanData span) {
        String spanName = span.getName();

        // Filter health checks
        if (spanName.contains("/actuator/health") || 
            spanName.contains("/health") ||
            spanName.contains("/healthz")) {
            return true;
        }

        // Filter static resources
        if (spanName.endsWith(".js") || 
            spanName.endsWith(".css") ||
            spanName.endsWith(".ico")) {
            return true;
        }

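        // Filter very short spans (likely not interesting). Note that this also
        // drops sub-millisecond spans that failed or that have children, which
        // leaves those children without a parent in the trace view.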
        long durationNanos = span.getEndEpochNanos() - span.getStartEpochNanos();
        if (durationNanos < 1_000_000) { // Less than 1ms
            return true;
        }

        return false;
    }

    @Override
    public CompletableResultCode flush() {
        return delegate.flush();
    }

    @Override
    public CompletableResultCode shutdown() {
        return delegate.shutdown();
    }
}

Use the filtering exporter:

@Bean
public SpanExporter spanExporter() {
    SpanExporter baseExporter = OtlpGrpcSpanExporter.builder()
            .setEndpoint("https://telemetry.googleapis.com:443")
            .setTimeout(10, TimeUnit.SECONDS)
            .setCompression("gzip")
            .build();

    return new FilteringSpanExporter(baseExporter);
}

Official Documentation: Google Cloud Observability Pricing

Integration with Cloud Monitoring and Logging

Creating Trace-Based Metrics

package com.example.tracing.metrics;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

import java.util.concurrent.TimeUnit;

/**
 * Aspect that creates Micrometer metrics from traced operations.
 * Enables alerting on trace-derived metrics.
 */
@Aspect
@Component
public class TracingMetricsAspect {

    private final MeterRegistry meterRegistry;

    public TracingMetricsAspect(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    @Around("@annotation(com.example.tracing.annotations.Traced)")
    public Object recordMetrics(ProceedingJoinPoint joinPoint) throws Throwable {
        Span currentSpan = Span.current();
        String operationName = joinPoint.getSignature().toShortString();

        long startTime = System.nanoTime();
        Throwable exception = null;

        try {
            return joinPoint.proceed();
        } catch (Throwable t) {
            exception = t;
            throw t;
        } finally {
            long duration = System.nanoTime() - startTime;

            // Record duration metric
            Timer.builder("traced.operation.duration")
                    .tag("operation", operationName)
                    .tag("status", exception == null ? "success" : "error")
                    .tag("sampled", String.valueOf(currentSpan.getSpanContext().isSampled()))
                    .register(meterRegistry)
                    .record(duration, TimeUnit.NANOSECONDS);

            // Record error rate
            if (exception != null) {
                meterRegistry.counter("traced.operation.errors",
                        "operation", operationName,
                        "exception", exception.getClass().getSimpleName())
                        .increment();
            }
        }
    }
}

Observability Resources

Essential Documentation

Cloud Trace Documentation - Comprehensive Cloud Trace guide
Java Instrumentation Sample - Official Java implementation guide
Telemetry API Overview - OTLP API reference
Cloud Trace Quotas and Limits - Current limits for both APIs
OpenTelemetry Java Documentation - OpenTelemetry Java SDK guide
Spring Cloud GCP Trace - Spring Boot integration
Google-Built OpenTelemetry Collector - Collector deployment guide
Trace Context Propagation - Context propagation standards
OpenTelemetry Semantic Conventions - Standard attribute naming
Cloud Monitoring Integration - Application observability setup

Conclusion

Google Cloud Trace has matured into a production-grade distributed tracing platform, particularly with the introduction of the OpenTelemetry-native Telemetry API. The combination of generous limits, native OTLP support, and deep integration with the broader Google Cloud observability stack makes it a compelling choice for Java applications.

The key to successful tracing implementation is starting with automatic instrumentation via the OpenTelemetry Java Agent, then progressively adding manual instrumentation for business-critical operations. Pair this with thoughtful sampling strategies, structured logging correlation, and integration with Cloud Monitoring for a complete observability solution.

Over time, I've learned that the most valuable traces aren't necessarily those with the most spans, but those that capture the right context at the right time. Focus on instrumenting the critical path, capture meaningful business context through span attributes, and ensure your sampling strategy preserves visibility into errors and anomalies. With these principles and the production-ready code examples provided here, your Java applications will have the observability foundation needed to operate reliably at scale.

The future of Cloud Trace continues to align with OpenTelemetry standards, ensuring that instrumentation investments remain portable and that you benefit from innovations across the entire observability ecosystem. Whether you're running on GKE, Cloud Run, or Compute Engine, the patterns described here provide a solid foundation for production observability.
