ObservabilityGuy

LoongSuite Python Agent Launches: Observability Into Every AI Agent Action, Zero-code Integration

This article introduces the LoongSuite Python Agent, Alibaba Cloud's OpenTelemetry distribution for zero-code AI application observability.

As AI applications grow in complexity, they often hit an inflection point where the features work — but making changes feels increasingly risky. With multi-agent pipelines, tool calling, retrieval-augmented generation (RAG), and memory all in play, the hard questions start to surface: What actually happened during that run? When did the context shift? Which step caused latency to spike? What did that response cost? The deeper challenge is that much of this happens inside the model's black box — leaving teams with limited visibility and no clear starting point for debugging.

The LoongSuite Python Agent brings full observability to your AI applications — no code changes required. Trace any request end to end: which model was called, which tools were invoked, which documents were retrieved, how many tokens were consumed, and how context evolved at each step. Get a clear picture of how your agent actually behaves in production, and streamline analysis, evaluation, and optimization.

I. Three Core Challenges in AI Application Observability
Traditional microservice observability centers on performance and availability. AI applications demand more — the goal is to make runtime context and behavior traceable, reproducible, and analyzable. In practice, three challenges are unavoidable.

1.1 Collecting Runtime Data Without Impacting Performance


In traditional microservices, code is the core asset. In AI applications, what truly matters is the data generated at runtime: conversations, tool calls, retrieval results, memory reads and writes, and multimodal inputs and outputs such as images, audio, and video. This runtime data is what guides agent and model optimization — making your agent smarter over time.


Collecting this data completely — without slowing down the pipeline or disrupting the application — is harder than it sounds:

● Context management is dynamic. It can shift inside a framework or be controlled by business logic. Capturing these changes transparently, across both framework internals and application code, requires a non-invasive approach.

● Multimodal payloads are large. Embedding images or audio directly into the trace pipeline can bottleneck the entire system. They need to be extracted and stored separately without blocking the application.

1.2 Inconsistent Data Semantics Undermine Observability
A range of collection tools exist — OpenTelemetry, OpenInference, Langfuse — and some frameworks like AgentScope and LangChain generate their own observability data. But when each source uses different naming conventions, attributes, and semantics, collected data becomes difficult to use:

● Storage reuse breaks down. Different observability backends support different data protocols, meaning data collected by one tool may not be correctly ingested, processed, and stored by another.

● Consumption logic cannot be shared. Even when tools share the same protocol (e.g., OTLP), semantic differences persist. The same metric may carry different names or labels across tools, making cross-platform display and processing unreliable.

This forces developers into a tight coupling between their observability backend and collection tooling. If a tool doesn't support the framework in use, developers must manually implement the backend's semantic specifications — a costly and error-prone process.


To address this, the OpenTelemetry GenAI SIG [1] — backed by dozens of leading cloud, AI, and observability vendors — established a common semantic specification for AI application observability [2]. It defines what to collect, how to name it, and in what form, across key GenAI interactions.

Platforms like Langfuse and Arize have adopted this standard, effectively decoupling observability backends from collection tooling. Once the collected data complies with the GenAI specification, subsequent visualization, consumption, and iteration will be much easier.
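To make the decoupling concrete: the conventions pin down the attribute names every compliant collector emits. Below is a small illustrative sketch, not any library's actual code, of the attributes a spec-compliant chat span would carry; the attribute names are taken from the OTel GenAI semantic conventions, while the helper function itself is hypothetical.

```python
def genai_chat_attributes(provider, model, input_tokens, output_tokens):
    """Return the span attributes a GenAI-convention-compliant collector
    would attach to one chat completion. Names follow the OTel GenAI
    semantic conventions (check the semconv version you target)."""
    return {
        "gen_ai.operation.name": "chat",
        "gen_ai.system": provider,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }

attrs = genai_chat_attributes("dashscope", "qwen-plus", 15, 20)
```

Because every tool that follows the spec emits these same keys, a backend can render token usage or model names without per-tool adapters.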

That said, correctly implementing the OpenTelemetry GenAI specification remains complex. Better tooling is needed to lower the barrier.

1.3 End-to-End Tracing: In-Process Visibility Is Not Enough
In production, agents and tool services frequently span multiple processes and services. Observing only in-process LLM calls leaves critical gaps: traces go unconnected, latency attribution becomes unclear, and the full request path is invisible. Meaningful troubleshooting and optimization require end-to-end visibility across the entire chain.

Single-framework observability cannot meet this need. Support for cross-process communication components — MCP, A2A, httpx, Flask, and others — is essential to closing the loop.
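Cross-process correlation itself rests on trace-context propagation: HTTP-level instrumentation injects a W3C `traceparent` header on outgoing calls and extracts it on incoming ones, so spans from separate services join one trace. A minimal sketch of the header format (illustrative only; the agent's instrumentation handles this for you):

```python
def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a W3C Trace Context `traceparent` header value in the
    form version-traceid-spanid-flags."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

header = make_traceparent("0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331")
# → "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
```

Any service that forwards this header, whether it speaks MCP, A2A, or plain HTTP, lets the backend stitch its spans into the same end-to-end trace.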

II. Solution: LoongSuite Python Agent
The LoongSuite Python agent addresses all three challenges out of the box.

It is Alibaba Cloud's open-source distribution of the OpenTelemetry Python agent — purpose-built to make AI application observability faster and more practical. It stays compatible with upstream standards while incorporating production-hardened practices and contributing improvements back to the community.

2.1 How It Works
Built on the OpenTelemetry standard, the LoongSuite Python agent instruments your AI application automatically — no changes to business code required. Simply wrap your start command and it handles the rest:

Auto-discovery — Detects and loads instrumentation based on the libraries present in your environment (e.g., DashScope, LangChain, Flask).
Unified semantics — All data conforms to OpenTelemetry GenAI semantic conventions, eliminating repeated adaptation for downstream visualization and consumption.
Full-stack coverage — Instruments both AI interactions (LLM, agent, tool, RAG, memory) and microservice calls (HTTP, gRPC, databases) — the foundation of end-to-end observability.
Flexible export — Exports data via OTLP to any compatible backend, including Jaeger, Langfuse, and Alibaba Cloud Observability.
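The auto-discovery step above can be illustrated with a short sketch. This is not LoongSuite's actual internals, just the underlying idea: check which instrumentable libraries are importable in the current environment and activate instrumentation only for those.

```python
from importlib.util import find_spec

def discover_installed(candidates):
    """Return the subset of candidate libraries importable in this
    environment; only these would get instrumentation loaded."""
    return [name for name in candidates if find_spec(name) is not None]

# stdlib 'json' stands in for a real target library in this sketch
present = discover_installed(["json", "not_an_installed_pkg"])
# → ["json"]
```

In practice the agent does this at startup, which is why wrapping the launch command is enough and no import changes are needed in application code.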


2.2 Getting Started in Three Steps
Step 1: Install LoongSuite Distro from PyPI

pip install loongsuite-distro

Step 2: Install Instrumentation Packages

loongsuite-bootstrap -a install --version 0.1.0

This installs all AI-related instrumentation into your environment. Use --auto-detect to install only what's needed, or --whitelist for precise control over which instrumentation to include.

Step 3: Launch Your Application with the Bootstrapper

# Point the OTLP exporter at your collector's address (gRPC protocol by default).
# Opt in to the latest experimental GenAI semantic conventions and capture
# LLM input/output message content on spans.
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental \
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_ONLY \
loongsuite-instrument python app.py

That's it — your AI application is now fully instrumented.

What You Get
On any OTLP-compatible platform — Jaeger, Langfuse, Alibaba Cloud Observability — you can immediately view:

● Complete trace chains — LLM calls and microservice calls, all in one view.

● Granular performance metrics — Latency and error details for every invocation.

● Full context records — Captures inputs and outputs at key steps, ready for evaluation and downstream analysis.

III. LoongSuite and OpenTelemetry: The Relationship in Brief
The LoongSuite Python agent is a fork of OpenTelemetry Python Contrib. It maintains upstream compatibility while extending GenAI framework support and responding more quickly to the needs of the Chinese AI ecosystem.

3.1 Why a Separate Release
● The upstream OTel instrumentation matrix has limited coverage of the Chinese AI ecosystem.

LoongSuite adds instrumentation for DashScope, AgentScope, Dify, MCP, Mem0, and more.
● Upstream development of opentelemetry-util-genai moves slowly and lacks production-ready features.

LoongSuite extends it with multimodal upload support, additional span types, and updated semantic specifications.
● Alibaba Cloud's commercial deployments have produced valuable practices, including:

ReAct round-level visualization and evaluation
Session-level trace auto-association

Through its independent release cadence, LoongSuite ships updates via loongsuite-distro commands, regularly syncs with upstream, and contributes production-proven improvements back to the OpenTelemetry community.

3.2 Modules and Release Policy

IV. LoongSuite GenAI Util: A Superset of OTel GenAI Util
Not every AI agent is built on a managed framework. Many developers implement custom pipelines — calling self-hosted LLMs via REST APIs, hand-rolling ReAct loops, or building agents from scratch for more flexible and efficient control over context management.

These custom code paths fall outside the reach of automatic instrumentation and require manual tracing. Manual tracing done right involves more than adding a few spans. Developers must also consider:

● Correctly establishing parent-child span relationships

● Conforming to GenAI semantic conventions

● Properly capturing exceptions and faults

● Recording metrics and emitting logs

● Using consistent toggles to control capture of large input/output payloads

● Handling multimodal data separately to avoid bloating traces

● ...

To simplify this, the OTel GenAI SIG launched OpenTelemetry GenAI Util [4], which lets developers construct an invocation object and fill in the relevant fields — the utility handles the rest.

However, upstream development is slow and many features are not yet production-ready. LoongSuite GenAI Util [5] builds on this foundation to deliver a more complete, production-grade solution.

4.1 Supported Operation Types
loongsuite-util-genai is available as a standalone PyPI package. It extends OpenTelemetry GenAI Util with broader span-type coverage, multimodal handling, and enhanced semantic specifications.

4.2 Multimodal Upload: Keep Large Payloads Out of the Trace Pipeline
Images, audio, and video are too large to embed directly in spans or events — doing so slows down the pipeline and inflates storage costs. LoongSuite GenAI Util handles this with asynchronous multimodal upload: large payloads are offloaded to OSS, SLS, or local storage, and only a URI reference is retained in the trace.

PreUploader — Detects Base64, Blob, and URI content; generates upload jobs; replaces multimodal parts in messages with URI references.
Uploader — Processes upload jobs asynchronously, without blocking business threads; supports idempotency to avoid duplicate uploads.
Storage backends — Supports file://, oss://, sls://, and more; integrates with OSS and SLS.
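The PreUploader idea can be sketched as follows. This is a hypothetical simplification, not the library's actual code: hash the payload to get an idempotent object key, emit an upload job for the asynchronous Uploader, and leave only a URI reference in the message.

```python
import hashlib

def pre_upload(parts, base_uri="file:///var/log/genai/multimodal"):
    """Replace inline binary parts with URI references; return (parts, jobs).

    Jobs are handed to an asynchronous uploader; keying objects by the
    content hash makes re-uploads of the same payload idempotent.
    """
    out, jobs = [], []
    for part in parts:
        if part.get("type") == "image" and "base64" in part:
            key = hashlib.sha256(part["base64"].encode()).hexdigest()[:16]
            uri = f"{base_uri}/{key}"
            jobs.append({"uri": uri, "data": part["base64"]})
            out.append({"type": "image", "uri": uri})
        else:
            out.append(part)
    return out, jobs

parts = [{"type": "text", "content": "describe this"},
         {"type": "image", "base64": "aGVsbG8="}]
slim_parts, upload_jobs = pre_upload(parts)
```

The trace pipeline only ever sees `slim_parts`, so span size stays bounded no matter how large the original image was.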


4.3 Getting Started with LoongSuite GenAI Util
Installation:

pip install loongsuite-util-genai
# Include multimodal upload support
pip install loongsuite-util-genai[multimodal_upload]

Environment configuration:

export OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_AND_EVENT
export OTEL_INSTRUMENTATION_GENAI_EMIT_EVENT=true

# Multimodal upload (optional)
export OTEL_INSTRUMENTATION_GENAI_MULTIMODAL_UPLOAD_MODE=both
export OTEL_INSTRUMENTATION_GENAI_MULTIMODAL_STORAGE_BASE_PATH=file:///var/log/genai/multimodal

Manual instrumentation example using ExtendedTelemetryHandler:

from opentelemetry.util.genai.extended_handler import get_extended_telemetry_handler
from opentelemetry.util.genai.extended_types import InvokeAgentInvocation
from opentelemetry.util.genai.types import InputMessage, OutputMessage, Text

# Initialize environment-variable reading. Not required if you launched the
# application with loongsuite-instrument as described in section 2.2.
from opentelemetry.instrumentation._semconv import _OpenTelemetrySemanticConventionStability
if not _OpenTelemetrySemanticConventionStability._initialized:
    _OpenTelemetrySemanticConventionStability._initialize()

# 1. Get the telemetry handler (can be reused as a singleton)
handler = get_extended_telemetry_handler()

# 2. Construct the InvokeAgent invocation
request = {"model": "qwen-plus"}  # placeholder; substitute your actual request
invocation = InvokeAgentInvocation(
    provider="dashscope",
    request_model=request["model"],
    agent_name="OrderAgent",
    input_messages=[
        InputMessage(role="user", parts=[Text(content="Check the status of order #101")]),
        InputMessage(role="system", parts=[Text(content="You are an order manager responsible for querying order information via tools")]),
    ],
)

with handler.invoke_agent(invocation) as invocation:
    # 3. Execute InvokeAgent
    # ... invoke the agent ...
    # 4. Record the InvokeAgent result
    invocation.output_messages = [
        OutputMessage(role="assistant", parts=[Text(content="Let me check that for you... Order #101 could not be found. Please verify the order number.")], finish_reason="stop")
    ]
    invocation.input_tokens = 15
    invocation.output_tokens = 20

V. Release Notes
Full release notes are available at: https://github.com/alibaba/loongsuite-python-agent/releases

1. Distribution and Ecosystem

The loongsuite-distro is now available on PyPI, providing loongsuite-bootstrap and loongsuite-instrument commands for one-command setup and launch.
Expanded instrumentation matrix with Chinese AI ecosystem coverage: the self-developed instrumentation-loongsuite supports DashScope, AgentScope, Dify, MCP, Mem0, LangChain, Google ADK, Claude Agent SDK, Agno, and more.
2. LoongSuite GenAI Util

Multimodal upload — Automatically offloads Base64, Blob, and URI content to OSS, SLS, or local storage; retains URI references in messages; asynchronous by default.
Additional span types: invoke_agent, create_agent, execute_tool, retrieve, rerank, embedding, memory.
Enhanced semantic attributes: gen_ai.usage.total_tokens, gen_ai.response.time_to_first_token.
Expanded multimodal input support — Pre-upload pipeline now handles data URIs and local file paths.
Configurable hooks — Entry point extensions for PreUploader and Uploader.
VI. Conclusion
This release is just the beginning. Our roadmap is clear:

Move faster — Rapidly extend instrumentation coverage to keep pace with the Chinese AI ecosystem.
Go deeper — Deliver more comprehensive multimodal handling, additional span and metric types, and up-to-date semantic specifications through LoongSuite GenAI Util.
Cover end to end — Unified tracing across AI and microservice calls to make end-to-end observability practical for multi-agent systems.
Stay upstream — Regularly sync with OpenTelemetry and contribute production-proven practices back to the community.
If you're building AI applications and care about observability, we invite you to try the LoongSuite Python agent, share your feedback, and contribute.

If you find the project useful, give us a ⭐ on GitHub — and join our developer community to help shape the observability tooling of the AI era.


WeChat group


DingTalk group

VII. References
[1] OpenTelemetry GenAI SIG: https://docs.google.com/document/d/1EKIeDgBGXQPGehUigIRLwAUpRGa7-1kXB736EaYuJ2M
[2] OpenTelemetry GenAI Semantic Convention: https://opentelemetry.io/docs/specs/semconv/gen-ai/
[3] LoongSuite Python agent: https://github.com/alibaba/loongsuite-python-agent
[4] OpenTelemetry GenAI Util: https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/util/opentelemetry-util-genai
[5] LoongSuite GenAI Util: https://github.com/alibaba/loongsuite-python-agent/blob/main/util/opentelemetry-util-genai/README-loongsuite.rst
