ObservabilityGuy

Posted on Mar 10

Master WebSocket End-to-end Observability in One Article

#ai #observability

This article introduces end-to-end WebSocket tracing using OpenTelemetry and LoongSuite agent for AI real-time applications.

Introduction: Technical Evolution and Contemporary Value of WebSocket 1.1 What is WebSocket?

WebSocket is a full-duplex communication protocol based on the TCP protocol (RFC 6455 [1]). It establishes a persistent connection through a single HTTP handshake, enabling bidirectional data transmission between the client and the server. The following is a schematic diagram of a WebSocket communication [2]:

You can see that, unlike HTTP, the client first initiates a handshake request to the server based on the HTTP protocol, and the server returns a response indicating the handshake succeeded. After this, the existing TCP connection is upgraded to a WebSocket connection, and full-duplex communication can take place between the client and the server. The TCP connection continues until one side considers it necessary to close it, and the other side agrees to close it.

To better understand the WebSocket end-to-end observability solution, it is necessary to interpret the protocol details of WebSocket. The remaining content of this section is partially translated and summarized from WebSocket Protocol [3]. If you do not want to read the details, you can directly jump to Section 1.2.

1.1.1 URI Format and Syntax
Very similar to the HTTP protocol family, WebSocket also has a standard protocol and its secure version, distinguished by ws and wss. The security of wss is also implemented using the TLS protocol. Because WebSocket relies on the HTTP protocol for the handshake and subsequently reuses the original TCP connection, the default ports for WebSocket are also 80 (ws) and 443 (wss). The overall format of the URI is also very similar to HTTP.

1.1.2 Start Connection Handshake (Based on HTTP/1.1)
The traditional WebSocket handshake is a typical HTTP request/response. The client actively initiates a WebSocket handshake request (a special GET). If the server supports and allows communication using the WebSocket protocol, it returns a WebSocket handshake response. The WebSocket connection is then established.

The handshake request contains the following headers:

If the server accepts the WebSocket protocol, it sends a response with a StatusCode of 101:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The response includes:

● HTTP/1.1 101 Switching Protocols: indicates a successful upgrade from HTTP to WebSocket.

● Upgrade: websocket: confirms the protocol upgrade.

● Connection: Upgrade: indicates that the connection is upgraded.

● Sec-WebSocket-Accept: a value calculated based on the Sec-WebSocket-Key of the client, used to verify that the server understood the WebSocket handshake request.

The process of upgrading from HTTP/2 and HTTP/3 to WebSocket is slightly different, but it is not the key point discussed in this topic. It will not be repeated here. You are welcome to read the original WebSocket Protocol text [3].

1.1.3 WebSocket Messages and Data Frames
After the handshake is completed, the connection is upgraded to a WebSocket connection. At this time, the client and server can send WebSocket messages bidirectionally at any time to exchange data and instructions. The smallest communication unit in WebSocket is a data frame, and each message may consist of one or more data frames.

Data frames can be divided into the following three types based on their purposes:

● Text frame: The payload is UTF-8 encoded text data.

● Binary frame: The payload is binary data.

● Control frame: used to transmit protocol signals, such as ping, pong, and close frames.

The data composition of a data frame is shown in the following graph:

Regarding the meaning of each segment of data in the data frame, if you are interested, you are welcome to read the original WebSocket Protocol text [3].

1.1.4 Close Connection Handshake
When either the client or the server determines that the connection can be closed, it sends a Close frame (a type of control frame) to the peer. Upon receiving the Close frame, the peer sends another Close frame as a response as soon as possible. After sending the Close frame, the endpoint should not send any more data frames. After both parties exchange Close frames, the TCP connection closes.

1.2 Why Use WebSocket?
It is not difficult to see that the core attributes of WebSocket are reflected in the following aspects:

● Persistent connections: The connection persists after it is established, which avoids the overhead of repeated handshakes.

● Bidirectional data tunnels: The client and server can send data frames (Text/Binary) at any time.

● Low latency attributes: The transmission cost of HTTP polling request headers is eliminated.

● Message framing mechanism: Sharded transmission of extremely large data volumes is supported (the maximum size of a single frame is 2^64 bytes).

This protocol attribute makes it the preferred solution for real-time communication scenarios with large data volumes.

1.3 Revival of the WebSocket Protocol in the AI Era
With the explosion of LLM technology, more scenarios that require real-time interaction have begun to emerge. Intelligence has given new vitality to the WebSocket protocol:

● Intelligent customer service or robots that support real-time conversation and interaction

● Real-time interaction between in-vehicle AI assistants and cloud models

● AI smart glasses for automatic translation and intelligent image recognition

In addition to real-time performance, WebSocket is a stateful connection. Features such as memory retention for multi-turn conversations and immediate interruption of outputs are easier to implement than with traditional HTTP. To date, most mainstream LLM providers offer WebSocket interaction APIs and matching SDKs to help users better build backend service systems. For example:

● OpenAI supports the WebSocket-based Realtime API [4].

● Model Studio LLM service platform publishes a WebSocket-based real-time multimodal interaction protocol [5].

● Google Gemini Supports the WebSocket-based Live API [6].

While WebSocket empowers the real-time performance of AI applications, it also brings significant challenges to the observability of application systems. The high flexibility and extensibility of the WebSocket protocol determine that it cannot achieve end-to-end observability as conveniently as HTTP and gRPC. This topic will specifically analyze the implementation pain points and solutions for end-to-end observability in WebSocket scenarios.

Analysis of Pain Points in WebSocket end-to-en Observability 2.1 Tracing Analysis Dilemma Caused by Protocol Flexibility 2.1.1 Difficulty in Injecting Trace Information For regular HTTP invocations, to ensure the connectivity of the trace, the caller adds a group of key-value pairs to the HTTP headers to carry the trace context. This ensures that the callee can correctly revert the trace context of the caller when parsing the protocol, thereby ensuring that the context can be passed down. The figure shows a specific sample of the trace context header when the W3C trace protocol [7] is used:

In section 1.1.3, we learned that a WebSocket data frame actually consists of only a few bytes of control bits and a data payload. Except for the handshake when the connection is established, there is no other opportunity to transmit metadata such as headers. Therefore, the traditional OpenTelemetry W3C trace context cannot be directly embedded into each data frame. In actual application scenarios, a single WebSocket connection often does not represent just one WebSocket invocation. Relying solely on the HTTP Request and response when the connection is established is far from enough. At the same time, this also leads to a second difficulty: The definition of the span scope is blurred.

2.1.2 Span Scope Definition Blurred
In the observability realm, we generally refer to a critical operation on the trace as a span (span) [8]. A trace usually consists of a group of spans in a tree structure. With the help of the observability frontend, we can retrieve spans belonging to the same trace and render them into a waterfall chart as shown in the figure below based on the parent-child relationship (that is, the invocation relationship) and the occurrence time. This helps us understand all critical operations and invocation relationships that occurred on a trace.

However, in WebSocket scenarios, the definition of operation granularity can be very flexible. As shown in the figure, a span may correspond to the whole process from the beginning to the end of a WebSocket connection, each message sending and receiving, or even each data frame transmission process. The highly flexible definition of span granularity also leads to great changes in trace context injection and management, which also increases the difficulty of business landing.

2.1.3 Reverse Diffusion of Trace Context
Although we divide the two ends into client and server based on the initiator and receiver of the WebSocket connection, the actual service processing process is highly flexible two-way flow, and there may be cases where the server initiates the request and the client processes it. For example, a client can establish a connection with a server and register its services with the server. The server sends a message to call back the client. For this interaction mode, the message producer (caller) is the server, and the consumer (callee) is the client. Therefore, the trace context should be injected into the message by the server, restored by the client, and further passed.

2.2 Trace Break Crisis Caused by Asynchronous Invocation
In WebSocket applications, to improve the connection utilization, the two ends also use asynchronous methods to decouple the message receiving process from the message processing process. The following is a typical asynchronous message processing architecture. In this process, messages may be submitted directly to the thread pool, stored in a queue within a process, or even written directly to external storage such as Redis. This flexible asynchronous mode also brings difficulties to the in-process transmission of trace context, which is very prone to broken traces.

Best Practices of End-to-end Observation Based on LoongSuite 3.1 Scheme Fundamentals From the discussion in the last two sections, we can draw two basic conclusions:

● The usage of WebSocket is quite flexible, and the implementation of tracing analysis depends on the business implementation to a large extent. Developers need to implement some extensions to ensure trace integrity.

● High-frequency business scenarios lack some of the best landing paradigms, making it difficult to implement tracing analysis independently.

In addition, some other types of calls such as NoSQL and HTTP are inevitable on WebSocket traces. Therefore, non-intrusive agents are still required to ensure the concatenation of various calls. This requires that the non-intrusive agents and the trace context generated by custom extensions can communicate well. The extension mechanism based on the OpenTelemetry API provided by the LoongSuite non-intrusive agent is the best way to solve these problems [9].

3.1.1 Working Principle of OpenTelemetry API and LoongSuite Agent
The OpenTelemetry API is one of the important components of the observability data collection standard defined by the OpenTelemetry community [10]. It defines a set of API behavior standards and function descriptions used in observability fields, such as observability data creation, context management/transparent transmission, data reporting and other logic, and provides supporting SDK implementation for many languages. Users can easily implement context management and pass-through based on APIs and SDKs. The following is an illustration of how to use the Tracer API to define a span:

private int doWork() {
  // Create a span.
  Span doWorkSpan = tracer.spanBuilder("doWork").startSpan();
  // Activate the context of the span.
  try (Scope scope = doWorkSpan.makeCurrent()) {
    int result = 0;
    for (int i = 0; i < 10; i++) {
      result += i;
    }
    return result;
  } finally {
    // End the span.
    doWorkSpan.end();
  }
}

The LoongSuite agent is an open-source in-process observability collection component for AI applications, built by the Alibaba Cloud observability team based on the OpenTelemetry agent. For popular open-source components, such as LangChain, OpenAI SDK, and Tomcat, the LoongSuite agent provides rich predefined instrumentation implementations. Users no longer need to develop based on the OpenTelemetry API. Instead, you can simply modify compile or runtime commands. The agent automatically completes key logic such as observability data creation, context management/pass-through, and data rReporting, thereby achieving the goal of non-intrusive observability.

The LoongSuite agent can meet the observability requirements in most scenarios of production applications. However, for some highly custom scenarios, such as complex consumption procedures in message systems, partial MQTT scenarios, and WebSocket communication scenarios, using the OpenTelemetry API/SDK to add custom tracking is the optimal solution to cover the monitoring blind spots of non-intrusive agents.

3.1.2 Interaction between the LoongSuite Agent and Custom Extensions
For languages with relatively strict package management (requiring explicit version specification) such as Java and Go, inconsistent dependency versions may exist between the agent and the application, such as Jackson, gRPC, and OpenTelemetry API/SDK. To avoid dependency conflicts, the shadow method is often used for dependency isolation. However, this also causes the trace context generated when the user uses the OpenTelemetry API and SDK for autonomous tracking to fail to interoperate with the agent, which in turn leads to broken traces.

OpenTelemetry and LoongSuite agents also use a code enhancement mechanism to ensure the sharing of trace contexts. The specific overall schematic is as follows:

● The agent and the application share a set of APIs. The API itself ensures forward compatibility.

● When the agent is initialized, the initialized instance object is registered to GlobalHolder. When custom tracking is performed in the application, you can retrieve the instance object of the agent directly from the GlobalHolder in the API.

● For some methods and static APIs defined in the SDK, such as context and baggage, the original invocation of these functions is skipped through code enhancement. Instead, the corresponding implementation in the agent is used.

Through the above mechanism, the LoongSuite agent can be well connected with the spans created by the OpenTelemetry API/SDK, ensuring the integrity of the trace.

3.2 Best Practices for WebSocket Distributed Tracing Analysis
Having understood these components, the key question is: How should my application add these custom tracking points? In the implementation of the WebSocket end-to-end tracing, you need to first clarify several issues based on business requirements:

● Session granularity issue: Does one WebSocket connection correspond to one trace or multiple traces?

Corresponds to one trace: A WebSocket connection is intended to complete a series of highly correlated operations, and the duration is generally only a few minutes.
Corresponds to multiple traces: A WebSocket connection remains and is continuously reused after being established, and the duration may last for several hours.
● Invocation modeling issue: Can the internal data transmission procedure of WebSocket be modeled as discrete requests and responses?

If the connection is only used for data transfer between both parties after being established, there is no need to specifically create a span for each message. The lifecycle of a span should correspond to the complete procedure of message transfer between both parties.
If one party sends a message and the other party processes the message and returns a response after the connection is established, each group of such invocations can create a pair of parent-child spans. The corresponding data structure needs to allow carrying the serialized trace context.
In response to the different scenarios above, the implementation recommendations for custom tracking will also differ. The following describes them separately.

3.2.1 Import OpenTelemetry API Dependencies
The compatibility of the agent with the API is forward compatibility. The adaptation for the latest version of the API may be relatively limited. In the production environment, the version of the API package does not need to be too new. The basic API is sufficient for use.

For the Java language, it is recommended to import it in pom.xml. API reference: https://javadoc.io/doc/io.opentelemetry/opentelemetry-api/1.28.0/index.html

<dependency>
  <groupId>io.opentelemetry</groupId>
  <artifactId>opentelemetry-api</artifactId>
</dependency>

Retrieve the global instance injected by the agent:

openTelemetry = GlobalOpenTelemetry.get();
tracer = openTelemetry.getTracer("websocket-example", "1.0.0");

For Go, you can run the go get command to obtain the package. API reference: https://pkg.go.dev/go.opentelemetry.io/otel@v1.28.0

go get go.opentelemetry.io/otel

Obtain the global instance injected by the agent:

tracer := otel.GetTracerProvider().Tracer("websocket-example")

For Python, you can obtain it via pip install. API reference: https://opentelemetry-python.readthedocs.io/en/latest/

pip install opentelemetry-api

Obtain the global instance injected by the agent:

from opentelemetry import trace
tracer = trace.get_tracer(__name__)

3.2.2 Session Granularity Issues - Creating a Trace at the WebSocket Connection Dimension
Implementation suggestion: When a WebSocket connection is established, a handshake is initiated based on an HTTP request. The trace context is reused as the context for sub-operations throughout the WebSocket connection.

The following is a basic schematic diagram of implementing an entire WebSocket connection as a single trace. All requests and data transmissions are attached as sub-spans under a single trace. Therefore, this implementation is more suitable for scenarios where WebSocket connections are established on demand and closed in a timely manner.

Client-side code implementation (using the native WebSocket library provided by Java as an example):

public static void main(String[] args) throws Exception {
  // 1. Create a connection-level trace (created before the connection so that the TraceContext is passed during handshake)
  Span connectionSpan = tracer.spanBuilder("websocket.connection")
      .setAttribute("websocket.endpoint", "/native/ws")
      .setAttribute("websocket.destination", "ws://localhost:18081")
      .setAttribute("websocket.connection.type", "client")
      .startSpan();
  // 2. Activate the current span in the context of the thread, and mark the span from the beginning of the connection to the closing of the connection.
  try (Scope scope = connectionSpan.makeCurrent()) {
    WebSocketContainer container = ContainerProvider.getWebSocketContainer();
    // Create a WebSocket client.
    NativeWebSocketClient client = new NativeWebSocketClient();
    // Use the endpoint for connection.
    Session session = container.connectToServer(
        new jakarta.websocket.Endpoint() {
          @Override
          public void onOpen(Session session, EndpointConfig config) {
            client.onOpen(session);
            // Register a message handler (use an anonymous inner class instead of a lambda to avoid generic type inference issues).
            session.addMessageHandler(new MessageHandler.Whole<String>() {
              @Override
              public void onMessage(String message) {
                client.onMessage(message);
              }
            });
          }
          @Override
          public void onClose(Session session, CloseReason closeReason) {
            client.onClose();
          }
          @Override
          public void onError(Session session, Throwable thr) {
            // Record errors to the current span.
            connectionSpan.recordException(thr);
            client.onError(thr);
          }
        },
        // 3. When you initiate a handshake, carry the current context in the request header.
        createHeaderWithUserProperties(),
        URI.create("ws://localhost:18081/native/ws"));
    client.session = session;
    client.sessionId = session.getId();
    log.info("Client started, enter message sent to server (enter 'exit' to exit):");
    // Read input from the console.
    BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = reader.readLine()) != null && !line.equals("exit")) {
      if (!line.trim().isEmpty()) {
        // 4. Send a message to the server.
        client.sendMessage(line);
      }
    }
    // Close the connection.
    client.close();
    log.info("Client has exited");
  } catch (Exception e) {
    // If an error occurs, record it in the span.
    connectionSpan.recordException(e);
    log.error("The client fails to start", e);
  } finally {
    // 5. End the span.
    connectionSpan.end();
  }
  // Wait for the asynchronous report of the span. You do not need to retain it in the actual service.
  Thread.sleep(5000L);
}
private static ClientEndpointConfig createHeaderWithUserProperties() {
  // Create a ClientEndpointConfig to customize the handshake request header.
  ClientEndpointConfig.Builder configBuilder = ClientEndpointConfig.Builder.create();
  // 3.1. Obtain the current TraceContext and prepare HTTP headers.
  final Map<String, List<String>> headersMap = new HashMap<>();
  Context currentContext = Context.current();
  // 3.2. Inject the TraceContext into the headers through the ContextPropagators of the global instance.
  openTelemetry.getPropagators().getTextMapPropagator()
      .inject(currentContext, headersMap, (carrier, key, value) -> carrier.put(key, List.of(value)));
  // 3.3. Set Configurator to add headers during handshake.
  configBuilder.configurator(new ClientEndpointConfig.Configurator() {
    @Override
    public void beforeRequest(Map<String, List<String>> headers) {
      headers.putAll(headersMap);
    }
  });
  return configBuilder.build();
}

Server-side code implementation (using the native WebSocket library provided by Java as an example):

@ServerEndpoint(value = "/native/ws", configurator = NativeWebSocketServer.TraceContextConfigurator.class)
public class NativeWebSocketServer {
  // Manage the context from the client based on the session dimension.
  private static final Map<String, Context> connectionTraceContexts = new ConcurrentHashMap<>();
  // 1. Define a configuration class to extract the TraceContext during handshake.
  public static class TraceContextConfigurator extends ServerEndpointConfig.Configurator {
    private static final TextMapGetter<Map<String, List<String>>> headerGetter = new TextMapGetter<Map<String, List<String>>>() {
      @Override
      public Iterable<String> keys(Map<String, List<String>> carrier) {
        return carrier.keySet();
      }
      @Override
      public String get(Map<String, List<String>> carrier, String key) {
        List<String> values = carrier.get(key);
        return values != null && !values.isEmpty() ? values.get(0) : null;
      }
    };
    @Override
    public void modifyHandshake(ServerEndpointConfig sec, HandshakeRequest request, HandshakeResponse response) {
      // Extract TraceContext from HTTP headers.
      Map<String, List<String>> headers = request.getHeaders();
      Context extractedContext = openTelemetry.getPropagators()
          .getTextMapPropagator()
          .extract(Context.current(), headers, headerGetter);
      // 2. Store the TraceContext to userProperties and extract it when onOpen occurs.
      sec.getUserProperties().put("traceContext", extractedContext);
    }
  }
  @OnOpen
  public void onOpen(Session session, EndpointConfig config) {
    String sessionId = session.getId();
    sessions.put(sessionId, session);
    // 3. Extract TraceContext from userProperties of config (set in Configurator).
    Context parentContext = Context.current();
    Object traceContextObj = config.getUserProperties().get("traceContext");
    if (traceContextObj instanceof Context) {
        parentContext = (Context) traceContextObj;
    }
    // 4. Save the client trace context and obtain it when you need to create a sub-span.
    connectionTraceContexts.put(sessionId, parentContext);
    log.info("Client connection: sessionId ={}, number of current connections ={}, sub-span created from Client TraceContext", sessionId, sessions.size());
    sendMessage(session, "Welcome to connect! Your session ID: " + sessionId);
  }
  @OnClose
  public void onClose(Session session) {
    String sessionId = session.getId();
    sessions.remove(sessionId);
    connectionTraceContexts.remove(sessionId);
    log.info("Client disconnected: sessionId ={}, number of remaining connections ={}, trace ended", sessionId, sessions.size());
  }
}

3.2.3 Session Granularity Issue - Using Session IDs to Associate Different Traces
Implementation suggestion: Reuse the session ID of the WebSocket as an attribute of each span. If necessary, you can also query all traces from the same WebSocket session based on the attribute.

The following is a basic schematic diagram of the implementation that uses session IDs to associate different traces. Each active request initiated by the client side or server side is a separate trace. Relationships between them are not rendered in the trace waterfall chart, but the traces can be filtered and queried by using the session ID attribute. Therefore, this implementation is more suitable for scenarios where the WebSocket connection time is long and reuse may exist.

The implementation scheme of using session IDs to associate different traces is relatively simple. Most frameworks can directly retrieve the ID of the current session. You can invoke the setAttribute API to write the ID into the span. The following is a basic sample:

public void sendMessage(Session session, String message) {
  Span span = tracer.spanBuilder("Client send message").startSpan();
  // Write the session ID to the span.
  span.setAttribute("websocket.session.id", session.getId());
  try (Scope scope = span.makeCurrent()) {
    doSendMessage(message);
  } finally {
    span.end();
  }
}

3.2.4 Invocation Modeling Issue - Obvious Invocation Relationships Exist
Implementation suggestion: Follow the tracing analysis logic of the messaging system. The message sender is the caller, and the message receiver is the callee. Both create a span. The caller span serves as the parent of the callee. If multiple rounds of message sending are involved, as long as the intent is to stream, it is regarded as a single invocation behavior.

The trace effect is shown in the following figure:

This scenario is the most common situation encountered in production applications. To ensure the concatenation of the client trace and the server trace, the caller needs to ensure that the message contains a reserved field similar to headers for passing the trace context when the message is sent. This field needs to be supported for parsing by both the client and the server. Many production services reserve such fields, such as: speech synthesis CosyVoice WebSocket API#instruction (client→Server).

Caller code implementation:

public void sendMessage(String message) {
  // 0. (Optional) If a span is created for a connection, run span.makeCurrent().
  // 1. Create a header field.
  HashMap<String, String> headers = new HashMap<>();
  // 2. Create a span and write the required attributes.
  Span span = tracer.spanBuilder("Client send message").startSpan();
  span.setAttribute("websocket.session.id", session.getId());
  try (Scope scope = span.makeCurrent()) {
  // 3. Call the OpenTelemetry API to inject the context into the header.
    openTelemetry.getPropagators().getTextMapPropagator().inject(Context.current(), headers,
        (headersMap, key, value) -> headersMap.put(key, value));
  // 4. Send the message.
  // If you send a message in a streaming mode, you can add a header to the first message. The calling parties must ensure the idempotence of span creation (that is, only one span is created during the entire streaming mode.)
    sendMessage(message, headers);
  } finally {
    span.end();
  }
}

Callee code implementation:

public void onMessage(String message, Session session) {
  String sessionId = session.getId();
  try {
    // 1. Parse the message.
    MessageWithHeaders msgWithHeaders = objectMapper.readValue(message, MessageWithHeaders.class);
    Map<String, String> headers = msgWithHeaders.getHeaders();
    // 2. Extract the trace context from the message.
    Context remoteContext = openTelemetry.getPropagators().getTextMapPropagator()
        .extract(Context.current(),
            headers, new TextMapGetter<Map<String, String>>() {
              @Override
              public Iterable<String> keys(Map<String, String> headersMap) {
                return headersMap.keySet();
              }
              @Override
              public String get(Map<String, String> headersMap, String key) {
                return headersMap.getOrDefault(key, null);
              }
            });
    // 3. Create a server span with the extracted context as the parent.
    Span serverSpan = tracer.spanBuilder("Server handle message")
        .setParent(remoteContext).startSpan();
    try (Scope scope = serverSpan.makeCurrent()) {
      // 4. Process the message and return the streaming response.
      String body=msgWithHeaders.getBody();
      log.info("Received message [{}] [headers={}]: {}", sessionId, headers, body);
      // Process messages (with headers)
      handleMessage(session, body, headers);
    } catch (Exception e) {
      serverSpan.recordException(e);
    } finally {
      serverSpan.end();
    }
  } catch (Exception e) {
    log.error("The message failed to be received [{}]: {}", sessionId, message, e);
  }
}

3.2.5 nvocation Modeling Issues — No Explicit Invocation Relationship, Only Data Transmission
Implementation suggestion: The data sender creates a span as a sub-span of the entire WebSocket connection span (if any). The spans on both sides do not maintain a parent-child relationship.

The trace effect is shown in the following figure:

Data sender code example:

public void streamingSendMessages(Session session) {
  // 0. (Optional) If a span is created for a connection, run span.makeCurrent().
  Context context = connectionTraceContexts.containsKey(session.getId()) ?
            connectionTraceContexts.get(session.getId()) : Context.current();
  try (Scope pScope = context.makeCurrent()) {
    // 1. Create a span.
    Span span = tracer.spanBuilder("Client send message").setParent(context).startSpan();
    span.setAttribute("websocket.session.id", session.getId());
    try (Scope scope = span.makeCurrent()) {
      // 2. Send the message.
      while (messageQueue != null && messageQueue.containsKey(session.getId())) {
        List<Message> messages = messageQueue.get(session.getId());
        messages.forEach(message -> sendMessage(session, message));
        Thread.sleep(200L);
      }
    } finally {
      span.end();
    }
  }
}

3.2.6 Asynchronous Pass-through Issues - In-process Asynchronous Context Management
In general, there are two implementations of asynchronous operations in WebSocket applications:

● Asynchronous scheduling based on thread pools. Whenever a message is accepted, a Runnable or Callable task is created, or a Go or Python coroutine is created.

● Asynchronous communication based on in-process queues (such as Java Deque, Go channel, and Python Generator). Whenever a message is accepted, the message is enqueued and processed by a unified Worker.

For the first scenario, the LoongSuite agent already supports automatic context pass-through:

public void onMessage(String message) {
  Span messageSpan = tracer.spanBuilder("Server handle message").startSpan();
  // Activate the current span and place it in ThreadLocal.
  try (Scope scope = messageSpan.makeCurrent()) {
    // Call the message processing process asynchronously.
    // The agent automatically passes the span context to the doHandleMessage method when the Runnable task is created.
    // When the doHandleMessage method is actually executed, the context is automatically restored.
    workerExecutor.execute(() -> doHandleMessage(message));
  }
}

For the second scenario, you need to actively perform context pass-through and reversion:

public void onMessage(String message) {
  Span messageSpan = tracer.spanBuilder("Server handle message").startSpan();
  // Activate the current span and place it in ThreadLocal.
  try (Scope scope = messageSpan.makeCurrent()) {
    // Manually associate the TraceContext with the message. You can also use a map.
    message.setTracingContext(Context.current());
    // The message is enqueue.
    messageQueue.offer(message);
  }
}
public void pollAndHandleMessage() {
  while (true) {
    if (!messageQueue.isEmpty) {
      Message message = messageQueue.poll();
      // Obtain the TraceContext and span after the message is dequeue.
      Context tracingContext = message.getTracingContext();
      Span span = Span.fromContext(tracingContext);
      // Reactivate the TraceContext and place it in ThreadLocal.
      try (Scope scope = tracingContext.makeCurrent()) {
        handleMessage(message);
      } finally {
        // End message processing and disable the span.
        span.end();
      }
    }
    Thread.sleep(100L);
  }
}

3.3 Key Business Metrics for WebSocket Streams
In Section 3.2, we can see that in stream scenarios, we record a complete request as a span to prevent performance bottlenecks caused by excessive spans. However, this also erases some key performance information in the stream—during a message processing session, the processing time of certain individual packets is too long, causing the entire response procedure to be slow. In actual production, these metrics can also greatly help us measure application health and evaluate where problems lie in certain traces. The following are several common business metrics:

The following is a simple utility class implementation for calculating these metrics. For detailed usage methods, view the sample code: https://github.com/Cirilla-zmh/asr-demo/blob/main/asr-service/src/main/java/com/example/asr/ws/AsrWebSocketHandler.java

public class WebSocketPerformanceMeasure {
  private static final Logger log = LoggerFactory.getLogger(WebSocketPerformanceMeasure.class);
  private static final long UNINITIALIZED = -1L;
  private Long startTime;
  private Long firstChunkTime;
  private AtomicInteger chunkCounts;
  private AtomicLong totalInterval;
  private Long lastChunkTime;
  public static WebSocketPerformanceMeasure create() {
    WebSocketPerformanceMeasure measure = new WebSocketPerformanceMeasure();
    measure.startTime = System.currentTimeMillis();
    measure.firstChunkTime = UNINITIALIZED;
    measure.chunkCounts = new AtomicInteger(0);
    measure.totalInterval = new AtomicLong(0);
    measure.lastChunkTime = UNINITIALIZED;
    return measure;
  }
  /**
   * Start measurement (if not already started).
   */
  public void start() {
    if (startTime == null) {
      startTime = System.currentTimeMillis();
      firstChunkTime = UNINITIALIZED;
      chunkCounts = new AtomicInteger(0);
      totalInterval = new AtomicLong(0);
      lastChunkTime = UNINITIALIZED;
    }
  }
  /**
   * Record the arrival of a chunk.
   * Automatically calculate time_to_first_chunk and update interval statistics.
   *
   * @return If it is the first chunk, time_to_first_chunk (in milliseconds) is returned. Otherwise, null is returned.
   * /
  public Long recordChunk() {
    if (startTime == null) {
      log.warn("Performance measure not started, calling start() automatically");
      start();
    }
    long currentTime = System.currentTimeMillis();
    chunkCounts.incrementAndGet();
    // Record the time of the first chunk.
    Long timeToFirstChunk = null;
    if (firstChunkTime == UNINITIALIZED) {
      timeToFirstChunk = currentTime - startTime;
      firstChunkTime = currentTime;
      log.debug("First chunk recorded, time_to_first_chunk: {}ms", timeToFirstChunk);
    }
    // Calculate the chunk interval (starting from the second chunk).
    if (lastChunkTime != UNINITIALIZED) {
      long interval = currentTime - lastChunkTime;
      totalInterval.addAndGet(interval);
    }
    lastChunkTime = currentTime;
    return timeToFirstChunk;
  }
  /**
   * Get time_to_first_chunk (ms)
   * If the first chunk has not arrived yet, null is returned.
   * /
  public Long getTimeToFirstChunk() {
    if (firstChunkTime == UNINITIALIZED || startTime == null) {
      return null;
    }
    return firstChunkTime - startTime;
  }
  /**
   * Get time_to_last_chunk (ms).
   * You need to ensure that the call is made after the chunk is fully arrived.
   * If the first chunk has not arrived yet, null is returned.
   * /
  public Long getTimeToLastChunk() {
    if (lastChunkTime == UNINITIALIZED || startTime == null) {
      return null;
    }
    return lastChunkTime - startTime;
  }
  /**
   * Obtain the average chunk interval (in milliseconds).
   * If the number of chunks is less than 2, null is returned.
   * /
  public Long getAverageInterval() {
    int count = chunkCounts.get();
    if (count < 2 || totalInterval == null) {
      return null;
    }
    return totalInterval.get() / (count - 1);
  }
  /**
   * Obtain the total number of chunks.
   */
  public int getChunkCount() {
    return chunkCounts != null ? chunkCounts.get() : 0;
  }
}

Practice in Typical Scenarios: AI Voice Dialogue System In this section, we will combine a common business system in production to briefly introduce the specific practice of this scenario. The demo code is open-source and available at https://github.com/Cirilla-zmh/Apsara Stack Resilience-demo

4.1 System Architecture
The following is a brief illustration of the overall system architecture:

Device → WebSocket → ASR → LLM (intent recognition) ↓
├─ Chat → LLM (generation) → TTS → Device
└ ─ Order → MCP (order) → LLM (generation) → TTS → Device

Call sequence diagram:

4.2 Integrate the LoongSuite Agent
In this sample project, environment variables are reserved for the agent. By mounting the LoongSuite agent, we can access the observability data of the ASR demo service to the ARMS console. Here are the specific steps:

Download and decompress the LoongSuite commercial agent.

To ensure the integrity of the LLM trace, we recommend that you download the agent V4.6.0 or later.

wget "http://arms-apm-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/4.6.0/AliyunJavaAgent.zip" -O AliyunJavaAgent.zip
unzip AliyunJavaAgent.zip

You can add the parameters related to mounting the ARMS agent when you start the application. For more information about the parameters, see Manually install the ARMS agent [11].

export JAVA_AGENT_OPTIONS="-javaagent:/path/to/4.6.0/AliyunJavaAgent/aliyun-java-agent.jar -Darms.licenseKey=${your_license_key} -Darms.appName=websocket-demo -Daliyun.javaagent.regionId=cn-hangzhou -Darms.workspace=${your_cms_workspace}"
./start.sh

You can also integrate the LoongSuite open-source agent or OpenTelemetry agent, and report the observability data to the open-source observability platform. Due to space constraints, we will not expand it here. Refer to https://github.com/alibaba/loongsuite-java-agent for more information.

4.3 System Page and Observability Effect Schematic
The following is the application system page after deployment, which is similar to the current intelligent robot IM system and is used to replace the left and right sides of the device:

After the conversation is initiated, the statistical trace is shown in the figure. You can clearly see the duration of each link in a trace:

In the corresponding WebSocket span, you can view the statistics of the first packet delay and average output interval, which helps you analyze the overall service performance:

Conclusion: Future Outlook
The end-to-end observability of WebSocket has always been a strong demand of many customers and a headache for many development and O&M personnel. Observability solutions cannot be achieved overnight and require continuous in-depth co-construction and cooperation with users. I am very excited to see that Bull has completed the landing of the program in production with the observability team [12], which also provides very valuable experience for the improvement of our program. In the future, we will work with more users and open-source developers to continuously supplement and build more complete and easier-to-use WebSocket observability solutions.

Everyone is welcome to follow the LoongSuite community for the latest progress of related programs:

References
[1] RFC 6455 https://datatracker.ietf.org/doc/html/rfc6455

[2] The Road to WebSockets https://websocket.org/guides/road-to-websockets/

[3] WebSocket Protocol https://websocket.org/guides/websocket-protocol/

[4] OpenAI Realtime API https://platform.openai.com/docs/guides/realtime-websocket

[5] A Real-time multimodal interactive protocol (WebSocket) https://help.aliyun.com/zh/model-studio/multimodal-interaction-protocol

[6] Live API - WebSockets API reference https://ai.google.dev/api/live

[7] Trace Context https://www.w3.org/TR/trace-context/#abstract

[8] Basic introduction to distributed tracing https://observability.cn/project/opentelemetry/rp8k7gzvtys07zsb/

[9] Use OpenTelemetry SDK for Java to add custom instrumentation code to traces https://www.alibabacloud.com/help/en/arms/application-monitoring/use-cases/use-opentelemetry-sdk-for-java-to-manually-instrument-applications

[10] OpenTelemetry Specification Overview https://opentelemetry.io/docs/specs/otel/overview/

[11] Manually install an ARMS agent https://www.alibabacloud.com/help/en/arms/application-monitoring/user-guide/manually-install-arms-agent-for-java-applications

[12] Make every voice wake-up reliable, Bull's Murora team reconstructs the observability system https://mp.weixin.qq.com/s/ONIXSjDsGzJ0O5XPdUUhtg

DEV Community

Master WebSocket End-to-end Observability in One Article

Top comments (0)