DEV Community: Abhishek Vishwakarma

Your AI Agent Is a Black Box—Until OpenTelemetry & SigNoz Step In

Abhishek Vishwakarma — Tue, 14 Jul 2026 17:50:21 +0000

We’ve all been there. You build an AI agent, give it a sophisticated prompt, hook it up to a couple of external APIs, and let it run. It works beautifully in five consecutive tests. Then, on the sixth run, it loops endlessly, calls an LLM twenty times, drains your API credit balance, and returns an empty string.

Debugging a traditional application is comparatively straightforward: you read the stack trace. But debugging an autonomous AI agent? That feels like trying to read a mind. Because LLM outputs are non-deterministic and agent loops can branch dynamically, flat log files rarely tell you which step went wrong or why.

If you want to build production-ready AI applications without flying blind, you need distributed tracing. In this guide, we will step through how to set up the open-source APM tool SigNoz locally, instrument a Python AI agent using the standard OpenTelemetry library, and inspect nested spans to figure out exactly what’s happening under the hood.

The Problem: The Non-Deterministic AI Black Box

A typical AI agent works by executing a cycle of actions:

Retrieve: querying a Vector Database (like Chroma or pgvector) for relevant context.
Reason: sending the query and context to an LLM (like GPT-4 or Claude) to decide what tool to run.
Act: running local tools (e.g., executing Python calculations, searching the web, querying databases).
Synthesize: sending the tool output back to the LLM to generate the final response.

If a step fails or is slow, standard logs only show the end state or print raw, unorganized strings. You can't easily see:

How much latency was added by the Vector DB versus the LLM itself?
What exact prompt was sent to the LLM during the intermediate reasoning step?
How many tokens were consumed, and what was the actual API cost of that single execution?
Which tool invocation caused the agent loop to raise an exception?

The Solution: OpenTelemetry + SigNoz

By using OpenTelemetry (OTel), the industry standard for vendor-neutral observability, we can instrument every phase of our agent's lifecycle. Each phase is represented as a Span, and the entire execution path is linked together into a Trace.

To visualize these traces, we use SigNoz, an open-source application performance monitoring (APM) system built on ClickHouse. SigNoz provides a unified dashboard for metrics, traces, and logs, which makes it a developer-friendly, self-hostable alternative to commercial SaaS observability platforms.
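To make the span/trace relationship concrete, here is a toy model that uses only the standard library. It is not the OpenTelemetry SDK's data model. Each span records the trace it belongs to and the ID of its parent span, and a trace viewer rebuilds the waterfall by following those parent links. The `ToySpan` class, the span IDs, and the durations are hypothetical example values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySpan:
    """Hypothetical, simplified span record (not the OTel SDK's Span class)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    duration_ms: float


def render_waterfall(spans: list[ToySpan]) -> list[str]:
    """Rebuild the parent/child tree from flat span records and indent each child."""
    children: dict[Optional[str], list[ToySpan]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms)")
            walk(span.span_id, depth + 1)  # descend into this span's children

    walk(None, 0)  # start from the root span(s)
    return lines


# Made-up durations shaped like the agent described in this post.
spans = [
    ToySpan("t1", "a", None, "agent_run", 4200),
    ToySpan("t1", "b", "a", "vector_db_search", 200),
    ToySpan("t1", "c", "a", "llm_reasoning", 1800),
    ToySpan("t1", "d", "a", "tool_execution", 500),
    ToySpan("t1", "e", "a", "llm_synthesis", 1100),
]
print("\n".join(render_waterfall(spans)))
```

Real trace backends do the same kind of parent-linking. They also use start timestamps to position each bar in the waterfall.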

Step 1: Spin Up SigNoz Locally

The easiest way to run SigNoz locally is using the SigNoz Foundry CLI (foundryctl).

First, install the CLI:

curl -fsSL https://signoz.io/foundry.sh | bash

Next, generate the deployment configuration:

foundryctl forge

Finally, start the SigNoz services using Docker Compose:

foundryctl cast

This starts ClickHouse, the query-service, the frontend UI, and the OTel Collector. The dashboard will be accessible at http://localhost:8080.

(Note: On the first start, you will register your admin credentials at http://localhost:8080 to initialize the system.)
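Before you run the agent, it can help to confirm that something is actually listening on the OTLP gRPC port the script exports to. That port is localhost:4317, per the exporter comment in Step 2. The sketch below is a stdlib-only TCP reachability check, and `otlp_port_open` is a hypothetical helper name. An open port only proves that some listener exists, not that it is the SigNoz collector.

```python
import socket


def otlp_port_open(host: str = "localhost", port: int = 4317, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    status = "reachable" if otlp_port_open() else "NOT reachable"
    print(f"OTLP gRPC endpoint localhost:4317 is {status}")
```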

Step 2: Instrumenting the AI Agent in Python

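Let's build a Python script (agent.py) that simulates an agentic workflow. The vector DB, LLM, and tool calls are stand-ins built from time.sleep() and random numbers, so no API keys are needed. We'll instrument each step using an OpenTelemetry Tracer.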

1. Install Dependencies

pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

2. The Instrumentation Code (`agent.py`)

Here is the instrumented agent. It creates a root span, agent_run, with child spans nested under it for the vector DB lookup, LLM reasoning, tool execution, and final synthesis. It also records metadata such as model names, token counts, and estimated costs directly as span attributes.

import time
import random
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# 1. Initialize Tracer Provider with Resource Attributes
resource = Resource.create(attributes={
    "service.name": "ai-agent-service",
    "service.version": "1.0.0",
    "environment": "production",
    "project": "SigNoz-Hackathon"
})

provider = TracerProvider(resource=resource)

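# 2. Connect the OTLP exporter to the local SigNoz OTel Collector (gRPC on port 4317)
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
# SimpleSpanProcessor exports each span synchronously as soon as it ends, which keeps
# this demo easy to follow; BatchSpanProcessor is the usual choice in production.
span_processor = SimpleSpanProcessor(otlp_exporter)
provider.add_span_processor(span_processor)
trace.set_tracer_provider(provider)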

tracer = trace.get_tracer("ai-agent-tracer")

# --- Agent Tasks/Steps ---

def vector_db_lookup(query):
    with tracer.start_as_current_span("vector_db_search") as span:
        span.set_attribute("db.system", "chromadb")
        span.set_attribute("db.operation", "similarity_search")
        span.set_attribute("db.query", query)

        # Simulate query latency
        time.sleep(random.uniform(0.1, 0.3))

        span.set_attribute("db.result.count", 3)
        return ["AI agents need tracing", "OpenTelemetry standardizes observability", "SigNoz visualizes spans"]

def llm_reasoning(prompt, context):
    with tracer.start_as_current_span("llm_reasoning") as span:
        span.set_attribute("llm.provider", "openai")
        span.set_attribute("llm.model", "gpt-4o")
        span.set_attribute("llm.prompt", prompt)

        # Simulate token counts (whitespace-split words stand in for real tokenizer output)
        prompt_tokens = len(prompt.split()) + len(str(context).split())
        completion_tokens = random.randint(150, 300)

        span.set_attribute("llm.usage.prompt_tokens", prompt_tokens)
        span.set_attribute("llm.usage.completion_tokens", completion_tokens)
        span.set_attribute("llm.usage.total_tokens", prompt_tokens + completion_tokens)

        # Estimate cost in USD, assuming $2.50 per 1M input tokens and $10 per 1M output tokens
        cost = (prompt_tokens * 2.5 + completion_tokens * 10) / 1_000_000
        span.set_attribute("llm.cost", cost)

        time.sleep(random.uniform(1.2, 2.5))  # LLM latency
        return "Reasoning: We must execute the math calculator tool to verify the cost."

def tool_execution(tool_name, arguments):
    # If anything in this `with` block raises, start_as_current_span records the
    # exception on the span and sets its status to ERROR (record_exception=True and
    # set_status_on_exception=True are the defaults), then re-raises it to the caller.
    # Calling span.record_exception() manually as well would log the exception twice.
    with tracer.start_as_current_span("tool_execution") as span:
        span.set_attribute("tool.name", tool_name)
        span.set_attribute("tool.arguments", str(arguments))
        time.sleep(random.uniform(0.3, 0.8))  # Tool latency
        if tool_name != "calculator":
            raise ValueError(f"Unknown tool: {tool_name}")
        # Demo only: eval() runs arbitrary Python, so never pass it LLM- or user-supplied text.
        result = str(eval(arguments))
        span.set_attribute("tool.status", "success")
        span.set_attribute("tool.output", result)
        return result

def llm_synthesis(reasoning, tool_result):
    with tracer.start_as_current_span("llm_synthesis") as span:
        span.set_attribute("llm.provider", "openai")
        span.set_attribute("llm.model", "gpt-4o")

        prompt_tokens = len(reasoning.split()) + len(tool_result.split())
        completion_tokens = random.randint(80, 150)

        span.set_attribute("llm.usage.prompt_tokens", prompt_tokens)
        span.set_attribute("llm.usage.completion_tokens", completion_tokens)
        span.set_attribute("llm.usage.total_tokens", prompt_tokens + completion_tokens)

        cost = (prompt_tokens * 2.5 + completion_tokens * 10) / 1_000_000  # same assumed pricing as above
        span.set_attribute("llm.cost", cost)

        time.sleep(random.uniform(0.8, 1.5))
        return "Final Answer: The calculation is complete and verified."

# --- Main Agent Loop ---

def run_agent(query):
    # Establish a parent context span for the entire execution
    with tracer.start_as_current_span("agent_run") as span:
        span.set_attribute("agent.query", query)
        print(f"[*] Starting agent with query: {query}")

        # 1. DB Lookup
        context = vector_db_lookup(query)
        print("[+] Vector DB lookup done.")

        # 2. LLM Reasoning
        reasoning = llm_reasoning(query, context)
        print("[+] LLM reasoning done.")

        # 3. Tool Execution
        try:
            tool_res = tool_execution("calculator", "125 * 8")
            print(f"[+] Calculator tool returned: {tool_res}")
        except Exception as e:
            print(f"[-] Tool error: {e}")
            tool_res = "Error"

        # 4. Final LLM Synthesis
        final_answer = llm_synthesis(reasoning, tool_res)
        print(f"[+] Final synthesis: {final_answer}")

        span.set_attribute("agent.status", "completed")

if __name__ == "__main__":
    print("[*] Generating AI Agent execution traces...")
    for i in range(10):
        run_agent(f"Calculate total token cost for query batch #{i+1}")
        time.sleep(1)

    # Flush tracer provider to ensure all spans reach SigNoz
    provider.shutdown()
    print("[*] All traces exported successfully.")
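The calculator tool above passes `arguments` straight to `eval()`. That is acceptable for a hard-coded demo string like "125 * 8", but it becomes dangerous once the expression comes from an LLM, because `eval` will execute arbitrary Python. One common alternative, sketched below, parses the expression with the standard-library `ast` module and evaluates only whitelisted arithmetic operators. This assumes the tool only needs basic arithmetic. `safe_calculate` is a hypothetical name, and exponentiation is deliberately left out because `**` can produce enormous numbers from short inputs.

```python
import ast
import operator

# Whitelisted operators; any other syntax in the expression is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str):
    """Evaluate a basic arithmetic expression without executing arbitrary code."""
    def _eval(node: ast.AST):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attributes, subscripts, etc. are all refused.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    # mode="eval" only accepts a single expression, never statements.
    return _eval(ast.parse(expression, mode="eval"))


print(safe_calculate("125 * 8"))  # 1000
```

To use it, replace `str(eval(arguments))` with `str(safe_calculate(arguments))`. A rejected expression raises ValueError, so the tool_execution span ends up marked as an error.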

Step 3: Run the Agent

Run the script to populate SigNoz with fresh telemetry:

python3 agent.py

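The console prints a [*] or [+] line for each step of each of the 10 runs. Because of the simulated sleeps, the whole script takes roughly half a minute to a minute. The script uses SimpleSpanProcessor, so each span is exported synchronously to the local collector as soon as it ends, rather than in batches.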

Step 4: Visualizing Agent Traces in SigNoz

Open the SigNoz dashboard by navigating to http://localhost:8080 in your web browser.

Service Overview: On the Services page, you will see ai-agent-service listed. SigNoz computes its latency, throughput (request rate), and error rate automatically from the incoming traces.
Trace List: Navigate to the "Traces" tab, filter by service ai-agent-service, and search. You will see a list of the 10 distinct agent_run execution flows.
Trace Waterfall Diagram: Click on any of the traces to see the nested execution.
- You will see the main agent_run span representing the overall execution time.
- Nested inside it are vector_db_search, llm_reasoning, tool_execution, and llm_synthesis.
- Clicking on individual spans exposes the custom metadata attributes. For example, clicking llm_reasoning displays the input prompt, token usage, model (gpt-4o), and estimated API cost.
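- If a tool raises an exception, the tool_execution span is marked as an error (shown in red) and the recorded exception, including its traceback, appears under the exceptions tab. In this demo the calculator call always succeeds, so you will only see a red span if you change the tool call (for example, to an unknown tool name).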
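The script stores usage under custom `llm.*` attribute names and hard-codes per-token prices. OpenTelemetry's GenAI semantic conventions define standard keys such as `gen_ai.request.model`, `gen_ai.usage.input_tokens`, and `gen_ai.usage.output_tokens`. These conventions are still evolving, so check the current spec version before you adopt them. The helper below is a hypothetical sketch that builds an attribute dict with those keys and adds an estimated cost. It uses the same assumed prices as the script: $2.50 per 1M input tokens and $10 per 1M output tokens. `llm.cost_usd` is a custom key of this sketch, not a convention.

```python
# Assumed list prices in USD per 1M tokens (the same figures the script hard-codes);
# verify your provider's current pricing before relying on them.
PRICE_PER_1M_INPUT_USD = 2.50
PRICE_PER_1M_OUTPUT_USD = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost: token counts times the assumed per-token prices."""
    return (input_tokens * PRICE_PER_1M_INPUT_USD
            + output_tokens * PRICE_PER_1M_OUTPUT_USD) / 1_000_000


def llm_usage_attributes(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Span attributes using GenAI semantic-convention keys plus one custom cost key."""
    return {
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "llm.cost_usd": round(estimate_cost_usd(input_tokens, output_tokens), 6),  # custom key
    }


attrs = llm_usage_attributes("gpt-4o", input_tokens=1_000, output_tokens=200)
print(attrs["llm.cost_usd"])  # 0.0045
```

Inside llm_reasoning you could then call `span.set_attributes(llm_usage_attributes("gpt-4o", prompt_tokens, completion_tokens))`. This also keeps the pricing in one place instead of duplicating it in every LLM step.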

Conclusion: Observability Is a Prerequisite for Production AI

AI agents are highly dynamic, stateful, and non-deterministic. Without tracing, they are black boxes that are very hard to optimize or audit. By integrating OpenTelemetry and SigNoz, you gain deep insight into your agent's internals:

Pinpoint exactly where latency is coming from.
Keep track of tokens and compute costs per agent run.
Debug tool and database failures instantly.

Observability is the key to scaling your AI agents from proofs of concept to robust, production-grade applications.

# Peek Inside the Black Box: Why Your AI Agent Needs OpenTelemetry and SigNoz

Abhishek Vishwakarma — Tue, 14 Jul 2026 17:50:18 +0000


Five Bugs Deep in an AI Memory Layer: My Week with Cognee

Abhishek Vishwakarma — Tue, 30 Jun 2026 05:34:36 +0000

By Abhishek Vishwakarma — final-year CS student, SOC analyst background, building toward GenAI/agentic AI engineering.

When I signed up for The Hangover Part AI: Where's My Context? — WeMakeDevs' hackathon built around Cognee — I didn't start by building a flashy demo. I started by reading code. Cognee promises AI agents a real memory: ingest anything, build a hybrid graph-vector knowledge store, and let agents remember(), recall(), improve(), and forget() across infinite sessions instead of waking up with amnesia every session. Before I trusted that promise enough to build on top of it, I wanted to know how solid the foundation actually was.

So instead of a project, I went issue-hunting on the Cognee GitHub repo — 25k+ stars, Python-first, the open-source backbone for a lot of "agent memory" products getting built right now. Five pull requests later, here's what I found and fixed.

1. Retrying an error that was never going to succeed

EmbeddingException is what Cognee's embedding engines raise when a chunk of text is too short to split further but still blows past the embedding model's context window. That's a deterministic failure — retrying it changes nothing. But the @retry decorator on embed_text in FastembedEmbeddingEngine, LiteLLMEmbeddingEngine, and OpenAICompatibleEmbeddingEngine was catching it anyway and retrying with exponential backoff for up to 128 seconds. In production this meant silent hangs on bad input; in CI it meant unit tests covering context-window fallbacks took over four minutes to run.

Fix: added EmbeddingException to the excluded exception types in retry_if_not_exception_type across all three engines, so non-transient errors fail fast instead of burning two minutes pretending they might not.
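The fix depends on separating transient errors from deterministic ones. The @retry decorator described above does this with `retry_if_not_exception_type`, a tenacity retry predicate. The stdlib-only sketch below reproduces the same idea so it runs without extra dependencies. `retry_unless`, the local `EmbeddingException` class, and the backoff values are hypothetical stand-ins, not Cognee's code.

```python
import functools
import time


class EmbeddingException(Exception):
    """Hypothetical stand-in for the deterministic 'chunk too long' error."""


def retry_unless(non_retryable: tuple, attempts: int = 5, base_delay: float = 0.01):
    """Retry with exponential backoff, but re-raise non_retryable errors immediately."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except non_retryable:
                    raise  # deterministic failure: retrying cannot help
                except Exception:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the last transient error
                    time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        return wrapper
    return decorator


calls = {"count": 0}


@retry_unless(non_retryable=(EmbeddingException,))
def embed_text(text: str) -> list:
    calls["count"] += 1
    raise EmbeddingException("chunk exceeds the model's context window")


try:
    embed_text("x" * 10_000)
except EmbeddingException:
    pass
print(calls["count"])  # 1: the deterministic error is not retried
```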

2. When "skip the bad entity" quietly breaks alignment

TripletSearchContextProvider builds search context by gathering results for a list of entities. The problem: when an entity was invalid (_get_entity_text(entity) returned None), it was still passed into _results_to_context(entities, results) alongside the valid entities' search tasks — but search tasks are only created for valid entities. That mismatch in list length silently zipped the wrong results to the wrong entities, with no error, just quietly wrong context.

Fix: filter to valid_entities before generating search tasks, and use that same filtered list when zipping results back into context. Added a unit test specifically verifying alignment holds.
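A toy version of the bug and the fix makes the misalignment easy to see. The names here (`entity_text`, `search`, `build_context_buggy`, `build_context_fixed`) are hypothetical simplifications, not Cognee's functions. The buggy version zips the full entity list against results produced only for valid entities. The fixed version filters once and reuses that same list for both steps.

```python
from typing import Optional


def entity_text(entity: dict) -> Optional[str]:
    """Hypothetical stand-in for _get_entity_text: None means the entity is invalid."""
    return entity.get("text")


def search(text: str) -> str:
    """Hypothetical stand-in for one triplet-search task."""
    return f"results for {text}"


def build_context_buggy(entities: list[dict]) -> list[tuple[str, str]]:
    results = [search(entity_text(e)) for e in entities if entity_text(e) is not None]
    # BUG: every entity (valid or not) is zipped with results for valid ones only,
    # so pairs after an invalid entity shift and the tail is silently dropped.
    return [(str(entity_text(e)), r) for e, r in zip(entities, results)]


def build_context_fixed(entities: list[dict]) -> list[tuple[str, str]]:
    valid_entities = [e for e in entities if entity_text(e) is not None]
    results = [search(entity_text(e)) for e in valid_entities]
    # The same filtered list drives both searching and zipping, so pairs stay aligned.
    return [(entity_text(e), r) for e, r in zip(valid_entities, results)]


entities = [{"text": "Alice"}, {"text": None}, {"text": "Bob"}]
print(build_context_buggy(entities))
print(build_context_fixed(entities))
```

On Python 3.10+, passing `strict=True` to `zip()` turns any remaining length mismatch into a ValueError instead of a silent truncation.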

3. Configuration that pretended to be dynamic

DefaultCrawlerConfig and TavilyConfig referenced environment variables like WEB_SCRAPER_TIMEOUT, but the Pydantic field defaults were evaluated only once, when the class was defined at import time, so changing the env var at runtime did nothing. The config looked configurable. It wasn't.

Fix: wrapped the env lookups in Field(default_factory=...) so timeout, concurrency, and crawl-delay settings are actually read fresh at instantiation, with a test verifying overrides take effect.
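The same pitfall affects any class-level default. Below is a stdlib `dataclasses` analogue. Cognee's configs are Pydantic models, where `Field(default_factory=...)` plays the same role. A plain default is evaluated once, when the class body runs, while a `default_factory` is called on every instantiation. The class names and the fallback value of 10 are hypothetical.

```python
import os
from dataclasses import dataclass, field

os.environ["WEB_SCRAPER_TIMEOUT"] = "10"  # simulate the value present at import time


@dataclass
class StaticCrawlerConfig:
    # Evaluated once, when this class statement runs (usually at import time).
    timeout: int = int(os.getenv("WEB_SCRAPER_TIMEOUT", "10"))


@dataclass
class DynamicCrawlerConfig:
    # The lambda runs on every instantiation, so it sees the current environment.
    timeout: int = field(default_factory=lambda: int(os.getenv("WEB_SCRAPER_TIMEOUT", "10")))


os.environ["WEB_SCRAPER_TIMEOUT"] = "30"  # change the setting at runtime
print(StaticCrawlerConfig().timeout)   # 10: stale value captured at class definition
print(DynamicCrawlerConfig().timeout)  # 30: read fresh at instantiation
```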

4. A docstring lying about its own function

Small one, but the kind of thing that costs someone an hour of debugging: is_embeddable(s: str)'s docstring claimed a string needed at least one alphanumeric character to be embeddable. The actual implementation only checked for one non-whitespace character. Different bar entirely — a string of just punctuation would pass the real check but, per the docs, shouldn't have.

Fix: corrected the docstring to match what the code actually does.
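The gap between the two rules is easy to demonstrate. The two functions below are hypothetical reconstructions of the documented rule and the implemented rule as described above. They are not Cognee's source. A punctuation-only string passes the implemented check but fails the documented one.

```python
def is_embeddable_as_implemented(s: str) -> bool:
    """At least one non-whitespace character (the behavior the code actually had)."""
    return any(not ch.isspace() for ch in s)


def is_embeddable_as_documented(s: str) -> bool:
    """At least one alphanumeric character (what the old docstring claimed)."""
    return any(ch.isalnum() for ch in s)


for sample in ["hello", "   ", "...!?"]:
    print(repr(sample), is_embeddable_as_implemented(sample), is_embeddable_as_documented(sample))
```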

5. Serialization that broke on its own success

SearchResultPayload had two separate problems. First, its serialization logic couldn't properly handle nested Pydantic models, UUIDs, or collections inside result_object — it needed a real recursive serializer, not ad-hoc handling. Second, and sneakier: the result-resolution logic used a truthiness check, so a legitimately empty list, empty string, or empty dict in completion/context was treated as "nothing here" and silently fell back to different behavior — even though an empty result is still a valid result.

Fix: wrote a recursive serialize_value() helper covering BaseModel, UUIDs, lists/tuples/sets, and dicts, and replaced the truthiness check with an explicit is not None check so falsy-but-valid values are returned correctly. Added tests for both the complex serialization case and the falsy-completion case.
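Here is a sketch of both fixes using only the standard library. `serialize_value` below recurses through dicts, lists, tuples, and sets, and converts UUIDs to strings. Pydantic is not imported here, so the sketch treats any object with a `model_dump()` method (Pydantic v2's method name) as a model. That duck-typing check, the `resolve_*` helpers, and the `FakeModel` class are assumptions for illustration, not Cognee's actual code.

```python
from uuid import UUID


def serialize_value(value):
    """Recursively convert a value into JSON-friendly Python types."""
    if hasattr(value, "model_dump"):  # Pydantic v2 style model (duck-typed here)
        return serialize_value(value.model_dump())
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, dict):
        return {str(k): serialize_value(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [serialize_value(v) for v in value]
    return value


def resolve_truthy(completion, fallback):
    # BUG: [], "", and {} are valid results but falsy, so they get replaced.
    return completion if completion else fallback


def resolve_not_none(completion, fallback):
    # Fix: only a genuinely missing (None) completion falls back.
    return completion if completion is not None else fallback


class FakeModel:
    """Hypothetical stand-in for a Pydantic model."""

    def __init__(self, id_: UUID, tags: set):
        self.id, self.tags = id_, tags

    def model_dump(self):
        return {"id": self.id, "tags": self.tags}


uid = UUID("12345678-1234-5678-1234-567812345678")
print(serialize_value({"result": FakeModel(uid, {"a"}), "items": (1, 2)}))
print(resolve_truthy([], "fallback"), resolve_not_none([], "fallback"))
```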

What this actually taught me

None of these are headline bugs — no security holes, no crashes that scream at you in production. They're the quiet kind: a retry that should never retry, a zip that's misaligned by one, a config that lies about being configurable, a docstring that's just wrong, a truthy/falsy mixup that throws away valid empty results. The kind you only find by actually reading the code path end to end instead of skimming the README and writing a demo on top of it.

Coming from a SOC/security background, that's basically the instinct I brought here: don't trust the surface, trace the actual data flow. Turns out that instinct travels well into "is this open-source memory layer solid enough to build agents on."

All five PRs are open and awaiting review as of writing. I'll update this post once they're merged — but whether or not all five land, this was a better use of hackathon week than shipping a demo I'd have to explain away in the README.

PRs: #3565 · #3566 · #3567 · #3568 · #3569

All code, fixes, and pull requests in this post are my own work. I used Claude (AI assistant) to help structure and draft the writeup, as disclosed per the hackathon rules.

Built for The Hangover Part AI by @wemakedevs, powered by Cognee.

The post above was drafted with the help of Claude.
