<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anil Murty</title>
    <description>The latest articles on DEV Community by Anil Murty (@anilmurty).</description>
    <link>https://dev.to/anilmurty</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3925538%2F7429c888-cc92-4f5f-a5da-c22be194079c.png</url>
      <title>DEV Community: Anil Murty</title>
      <link>https://dev.to/anilmurty</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anilmurty"/>
    <language>en</language>
    <item>
      <title>What is OpenTelemetry, and why does it matter for AI agents?</title>
      <dc:creator>Anil Murty</dc:creator>
      <pubDate>Mon, 11 May 2026 19:59:10 +0000</pubDate>
      <link>https://dev.to/anilmurty/what-is-opentelemetry-and-why-does-it-matter-for-ai-agents-3ddo</link>
      <guid>https://dev.to/anilmurty/what-is-opentelemetry-and-why-does-it-matter-for-ai-agents-3ddo</guid>
      <description>&lt;p&gt;&lt;em&gt;This post originally appeared on &lt;a href="https://www.tokenjam.dev/blog/2026-05-10-opentelemetry-for-ai-agents?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=cross-post" rel="noopener noreferrer"&gt;tokenjam.dev/blog&lt;/a&gt;. It's part of a 14-post series on the agentic AI ecosystem.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenTelemetry is the CNCF standard for vendor-neutral observability instrumentation. Write once, export anywhere.&lt;/li&gt;
&lt;li&gt;Three core components: SDKs (in your code), OTLP (the wire protocol), and collectors/backends (where data lives).&lt;/li&gt;
&lt;li&gt;The GenAI semantic conventions define a shared schema for LLM traces: &lt;code&gt;gen_ai.request.model&lt;/code&gt;, &lt;code&gt;gen_ai.usage.input_tokens&lt;/code&gt;, and others. They're actively evolving but already widely adopted.&lt;/li&gt;
&lt;li&gt;Claude Code natively emits OTLP traces with &lt;code&gt;CLAUDE_CODE_ENABLE_TELEMETRY=1&lt;/code&gt;; agent frameworks like LangChain, LlamaIndex, and others follow the same pattern.&lt;/li&gt;
&lt;li&gt;Lock-in is the real problem OTel solves. Instrument once, and any OTel-aware backend can consume your traces without re-architecting.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is OpenTelemetry?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;OpenTelemetry&lt;/strong&gt; is the Cloud Native Computing Foundation's standard for collecting and exporting observability signals (traces, metrics, and logs) from applications. Instead of locking you into a single vendor's telemetry format, OpenTelemetry defines &lt;em&gt;how&lt;/em&gt; applications emit telemetry data in a vendor-neutral way, using the &lt;a href="https://opentelemetry.io/docs/specs/otlp/" rel="noopener noreferrer"&gt;OpenTelemetry Protocol (OTLP)&lt;/a&gt;. You instrument your code once and can send your telemetry to any compatible backend: Datadog, New Relic, Grafana, Jaeger, or any other system that speaks OTLP.&lt;/p&gt;
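
&lt;p&gt;The "instrument once" promise is concrete. Here's a minimal sketch in Python, assuming the &lt;code&gt;opentelemetry-sdk&lt;/code&gt; and OTLP exporter packages are installed; the endpoint is a placeholder, and swapping backends means changing only that value, never the instrumentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# One-time setup. The endpoint is the only backend-specific piece.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="https://your-backend.example.com:4317"))
)
trace.set_tracer_provider(provider)

# Application code emits spans the same way no matter which backend receives them.
tracer = trace.get_tracer("my-agent")
with tracer.start_as_current_span("handle-request"):
    pass  # agent logic goes here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;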

&lt;h2&gt;
  
  
  The three components: SDKs, OTLP, and backends
&lt;/h2&gt;

&lt;p&gt;Three moving parts.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. SDKs (instrumentation in your code)
&lt;/h3&gt;

&lt;p&gt;An OpenTelemetry SDK is a library that runs in your application. It collects traces, metrics, and logs from your code and hands them off for export. You install it, configure it, and call its APIs (or rely on auto-instrumentation) to emit telemetry. For Python agents, the &lt;a href="https://github.com/open-telemetry/opentelemetry-python" rel="noopener noreferrer"&gt;OpenTelemetry Python SDK&lt;/a&gt; is the foundation. For TypeScript, &lt;a href="https://github.com/open-telemetry/opentelemetry-js" rel="noopener noreferrer"&gt;OpenTelemetry JavaScript&lt;/a&gt; serves the same purpose.&lt;/p&gt;

&lt;p&gt;SDKs do the heavy lifting. They manage span lifecycle, batch telemetry, apply sampling policies, and handle backpressure when backends are slow. Different instrumentation libraries (for LangChain, Anthropic, Ollama, and others) sit &lt;em&gt;on top&lt;/em&gt; of an SDK and emit standardized spans into it.&lt;/p&gt;
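
&lt;p&gt;To make "on top of an SDK" concrete, here's a hedged sketch of what such an instrumentation layer does; the client and function names are hypothetical, not from any real package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from opentelemetry import trace

tracer = trace.get_tracer("example-llm-instrumentation")

def traced_llm_call(client, request):
    # Wrap the provider call in a span; the SDK underneath handles
    # batching, sampling, and export.
    with tracer.start_as_current_span("chat") as span:
        span.set_attribute("gen_ai.system", "anthropic")
        span.set_attribute("gen_ai.request.model", request["model"])
        response = client.create(request)
        span.set_attribute("gen_ai.usage.output_tokens", response["output_tokens"])
        return response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;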

&lt;h3&gt;
  
  
  2. OTLP: the wire protocol
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://opentelemetry.io/docs/specs/otlp/" rel="noopener noreferrer"&gt;OTLP (OpenTelemetry Protocol)&lt;/a&gt; is how telemetry gets from your SDK to a backend. OTLP runs over gRPC or HTTP/1.1, uses Protocol Buffers for encoding, and specifies backpressure handling and retry semantics. You don't think about OTLP directly. It's configured via environment variables like &lt;code&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/code&gt; and &lt;code&gt;OTEL_EXPORTER_OTLP_HEADERS&lt;/code&gt;. It's the contract between your SDK and any backend that claims OpenTelemetry support.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Collectors and backends
&lt;/h3&gt;

&lt;p&gt;An OpenTelemetry &lt;strong&gt;collector&lt;/strong&gt; is a standalone service that receives telemetry data via OTLP, applies transformations, handles batching at scale, and routes the data to backends. A &lt;strong&gt;backend&lt;/strong&gt; (Datadog, Grafana Loki, Jaeger, Honeycomb, and others) stores and queries your traces. You can skip the collector for small workloads. Many apps export directly to a cloud backend via OTLP. Collectors give you flexibility: they let you filter and enrich telemetry before it hits your backend, and they buffer data when backends are slow.&lt;/p&gt;
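
&lt;p&gt;A minimal collector configuration shows the receive-process-route shape; the backend endpoint below is illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlphttp:
    endpoint: https://backend-a.example.com:4318

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;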

&lt;h2&gt;
  
  
  Why agents need OpenTelemetry specifically
&lt;/h2&gt;

&lt;p&gt;Vendor lock-in is real for observability. If you instrument your agent to emit telemetry in Datadog's proprietary format, switching to New Relic means rewriting instrumentation across your codebase. For organizations with many agents and teams, this tax is enormous.&lt;/p&gt;

&lt;p&gt;OpenTelemetry fixes this by making the instrumentation the &lt;em&gt;constant&lt;/em&gt;, not the vendor. Your agent code emits OTLP. Your backend is the variable. You can migrate backends, or use multiple backends simultaneously, without touching your instrumentation layer.&lt;/p&gt;

&lt;p&gt;This matters more for agent teams because agent complexity is growing. A modern agent traces LLM calls, tool invocations, retrieval steps, and agent reasoning across multiple frameworks and runtimes. A shared observability standard means you're not training teams to emit telemetry differently for each agent tool; they all follow the same conventions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The GenAI semantic conventions
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry includes a specification for semantic conventions: standardized attribute names and meanings that make spans interoperable across backends. For generative AI, the &lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/" rel="noopener noreferrer"&gt;OpenTelemetry GenAI semantic conventions&lt;/a&gt; define how to structure traces from LLM calls and agent steps.&lt;/p&gt;

&lt;p&gt;Key attributes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gen_ai.system&lt;/code&gt;&lt;/strong&gt;: The GenAI system or LLM provider (e.g., &lt;code&gt;openai&lt;/code&gt;, &lt;code&gt;anthropic&lt;/code&gt;, &lt;code&gt;ollama&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gen_ai.request.model&lt;/code&gt;&lt;/strong&gt;: The name of the model being invoked (e.g., &lt;code&gt;claude-3-5-sonnet&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gen_ai.operation.name&lt;/code&gt;&lt;/strong&gt;: The operation type (e.g., &lt;code&gt;chat&lt;/code&gt;, &lt;code&gt;completion&lt;/code&gt;, &lt;code&gt;embedding&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gen_ai.usage.input_tokens&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;gen_ai.usage.output_tokens&lt;/code&gt;&lt;/strong&gt;: Token counts from the LLM response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gen_ai.response.id&lt;/code&gt;&lt;/strong&gt;: The response ID from the model provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gen_ai.agent.id&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;gen_ai.agent.name&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;gen_ai.agent.version&lt;/code&gt;&lt;/strong&gt;: Identity and version of the agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gen_ai.conversation.id&lt;/code&gt;&lt;/strong&gt;: Unique identifier for a conversation thread (for multi-turn traces).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These conventions are &lt;em&gt;actively evolving&lt;/em&gt;; the specification is not frozen. That's by design. As new use cases emerge (tool use, function calling, retrieval-augmented generation, multi-agent coordination), the spec grows. Tools that adopt the conventions now benefit immediately. They gain interoperability across backends even as the spec matures.&lt;/p&gt;

&lt;p&gt;Adopting these conventions in your agent instrumentation means any OpenTelemetry-aware backend can parse and query your traces without custom parsing logic. You get consistent dashboards and analytics across vendors.&lt;/p&gt;
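
&lt;p&gt;Attaching these attributes with the OpenTelemetry Python API looks like this; the helper function and the token numbers are illustrative, not part of any official SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from opentelemetry import trace

def annotate_llm_span(span, usage):
    # Attach GenAI semantic-convention attributes to an in-flight span.
    span.set_attribute("gen_ai.system", "anthropic")
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.request.model", "claude-3-5-sonnet")
    span.set_attribute("gen_ai.usage.input_tokens", usage["input_tokens"])
    span.set_attribute("gen_ai.usage.output_tokens", usage["output_tokens"])

tracer = trace.get_tracer("agent-demo")
with tracer.start_as_current_span("chat claude-3-5-sonnet") as span:
    annotate_llm_span(span, {"input_tokens": 1200, "output_tokens": 350})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;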

&lt;h2&gt;
  
  
  How real agent runtimes emit OTel today
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry adoption in the agent ecosystem is accelerating. Concrete examples:&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code
&lt;/h3&gt;

&lt;p&gt;Claude Code natively emits OpenTelemetry traces when you set the &lt;code&gt;CLAUDE_CODE_ENABLE_TELEMETRY=1&lt;/code&gt; environment variable. You then configure where traces go using standard OTEL environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE_CODE_ENABLE_TELEMETRY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://your-backend.example.com:4317
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_PROTOCOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;grpc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For full configuration details, see the &lt;a href="https://code.claude.com/docs/en/env-vars" rel="noopener noreferrer"&gt;Claude Code environment variables documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  LangChain
&lt;/h3&gt;

&lt;p&gt;LangChain supports OpenTelemetry instrumentation via the &lt;a href="https://pypi.org/project/opentelemetry-instrumentation-langchain/" rel="noopener noreferrer"&gt;opentelemetry-instrumentation-langchain&lt;/a&gt; package. You instrument your LangChain app and export via any OTLP-compatible backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.instrumentation.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LangchainInstrumentor&lt;/span&gt;

&lt;span class="nc"&gt;LangchainInstrumentor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;instrument&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Traces follow the GenAI conventions, so your LangChain chains are observable across any OpenTelemetry-aware platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  LlamaIndex + OpenInference
&lt;/h3&gt;

&lt;p&gt;LlamaIndex integrates with &lt;a href="https://arize-ai.github.io/openinference/" rel="noopener noreferrer"&gt;OpenInference&lt;/a&gt;, a set of conventions built &lt;em&gt;on top of&lt;/em&gt; OpenTelemetry for AI observability. OpenInference spans are valid OTLP traces, so you get the same portability as native OpenTelemetry.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenLLMetry by Traceloop
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.traceloop.com/docs/openllmetry/introduction" rel="noopener noreferrer"&gt;OpenLLMetry&lt;/a&gt; is a collection of OpenTelemetry instrumentations for LLM apps. It provides ready-made instrumentation for LangChain, Anthropic, Ollama, Pinecone, Qdrant, and many other LLM-adjacent services. Because it's built on OpenTelemetry, any instrumentation you install works with any OTLP backend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Notable tools and SDKs
&lt;/h2&gt;

&lt;p&gt;The OpenTelemetry ecosystem for agents includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/open-telemetry/opentelemetry-python" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenTelemetry Python SDK&lt;/strong&gt;&lt;/a&gt;: The core SDK for Python agents. Use this as your foundation for any Python-based agent instrumentation.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/open-telemetry/opentelemetry-js" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenTelemetry JavaScript SDK&lt;/strong&gt;&lt;/a&gt;: The equivalent for Node.js and browser-based agents.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/traceloop/openllmetry" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenLLMetry&lt;/strong&gt;&lt;/a&gt;: Pre-built instrumentations for LangChain, Anthropic, OpenAI, LlamaIndex, Ollama, Qdrant, and others. Reduces boilerplate if your agent uses popular frameworks.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arize-ai.github.io/openinference/" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenInference&lt;/strong&gt;&lt;/a&gt; by Arize: A semantic convention and instrumentation set for AI workloads. Integrates with OpenTelemetry and works with any OTel backend, including Arize Phoenix, Jaeger, and Datadog.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Arize-ai/phoenix" rel="noopener noreferrer"&gt;&lt;strong&gt;Phoenix by Arize&lt;/strong&gt;&lt;/a&gt;: An open-source observability tool for ML and LLM apps that consumes OpenInference (and thus OpenTelemetry) traces.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collector distributions&lt;/strong&gt;: &lt;a href="https://github.com/open-telemetry/opentelemetry-collector" rel="noopener noreferrer"&gt;OpenTelemetry Collector&lt;/a&gt; is the standard. Vendor-specific distributions (e.g., Datadog Agent, New Relic Agent) also speak OTLP.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why are my LLM calls showing up as HTTP spans instead of GenAI spans?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You probably have base HTTP instrumentation without an LLM-aware layer on top. The default OpenTelemetry HTTP instrumentation captures your LLM API calls as plain HTTP spans (&lt;code&gt;POST /v1/messages&lt;/code&gt;, &lt;code&gt;200 OK&lt;/code&gt;, 142ms). They show up. They're just missing the actually useful attributes: model name, token counts, response ID. To get GenAI semantic-convention spans, install an LLM-aware instrumentor: OpenLLMetry's Anthropic or OpenAI instrumentor, OpenInference, or use a framework that emits GenAI spans natively (Claude Code, LangChain via its OTel package). Install the instrumentor (e.g., &lt;code&gt;opentelemetry-instrumentation-anthropic&lt;/code&gt; from OpenLLMetry) and initialize it before your code creates the LLM client. After that, calls to &lt;code&gt;client.messages.create()&lt;/code&gt; should produce &lt;code&gt;gen_ai.*&lt;/code&gt; spans alongside the HTTP spans, and you can filter on &lt;code&gt;gen_ai.system&lt;/code&gt; in your backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which OpenTelemetry SDK should I use with my agent framework?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Depends on your language and framework. Python agents use the &lt;a href="https://github.com/open-telemetry/opentelemetry-python" rel="noopener noreferrer"&gt;OpenTelemetry Python SDK&lt;/a&gt;. If you're on LangChain, LlamaIndex, or another framework, look for that framework's OTel instrumentation package first (via OpenLLMetry or framework-native support). If no instrumentation exists, you can hand-instrument your code using the SDK directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's OTLP?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OTLP is the OpenTelemetry Protocol: the wire format and transport mechanism for sending telemetry data from your SDK to a collector or backend. It's built on Protocol Buffers and runs over gRPC or HTTP/1.1. You don't configure OTLP directly. You set environment variables like &lt;code&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/code&gt; to point your SDK at a backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does setting &lt;code&gt;CLAUDE_CODE_ENABLE_TELEMETRY=1&lt;/code&gt; send my data to Anthropic?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. The flag tells Claude Code to emit OpenTelemetry traces to whatever OTLP endpoint &lt;em&gt;you&lt;/em&gt; configure via &lt;code&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/code&gt;. If you don't set an endpoint, the SDK has nowhere to send them and they're dropped on the floor. Anthropic doesn't receive your traces from this path. That's distinct from Anthropic's usage-and-billing telemetry, which is sent to Anthropic regardless of the OTel flag because it's how the API gets metered. The OTel data is for &lt;em&gt;you&lt;/em&gt;: send it to Datadog, Grafana, a local Jaeger, or wherever you run observability. See the &lt;a href="https://code.claude.com/docs/en/env-vars" rel="noopener noreferrer"&gt;Claude Code env-vars docs&lt;/a&gt; for the full configuration list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I set up telemetry export in my agent?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Standard pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the OpenTelemetry SDK for your language.&lt;/li&gt;
&lt;li&gt;Install instrumentation packages for your frameworks (LangChain, Anthropic, and so on).&lt;/li&gt;
&lt;li&gt;Initialize the instrumentation in your agent startup code.&lt;/li&gt;
&lt;li&gt;Set OTEL environment variables to point at your backend:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/code&gt;: Your backend's OTLP endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;OTEL_EXPORTER_OTLP_PROTOCOL&lt;/code&gt;: &lt;code&gt;grpc&lt;/code&gt; or &lt;code&gt;http/protobuf&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;OTEL_EXPORTER_OTLP_HEADERS&lt;/code&gt;: Auth headers, if needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;See your backend's documentation for the specific OTLP endpoint URL.&lt;/p&gt;
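
&lt;p&gt;Put together with the LangChain instrumentor from earlier, startup code might look like this sketch; initialize it before constructing any chains or clients:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.langchain import LangchainInstrumentor

# Steps 1-3: SDK, instrumentation package, initialization at startup.
# The exporter reads the OTEL_EXPORTER_OTLP_* environment variables (step 4).
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
LangchainInstrumentor().instrument()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;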

&lt;p&gt;&lt;strong&gt;Can I use OpenTelemetry with multiple backends simultaneously?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Configure multiple exporters in your SDK, or use an OpenTelemetry Collector to fan telemetry out to multiple destinations. Common during a backend migration, or when you want redundancy.&lt;/p&gt;
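
&lt;p&gt;With a collector, fan-out is just naming two exporters in one pipeline; this fragment's endpoints are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;exporters:
  otlphttp/primary:
    endpoint: https://backend-a.example.com:4318
  otlphttp/secondary:
    endpoint: https://backend-b.example.com:4318

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/primary, otlphttp/secondary]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;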

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/" rel="noopener noreferrer"&gt;OpenTelemetry GenAI semantic conventions specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opentelemetry.io/docs/specs/otlp/" rel="noopener noreferrer"&gt;OTLP Protocol specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/env-vars" rel="noopener noreferrer"&gt;Claude Code environment variables and telemetry setup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.traceloop.com/docs/openllmetry/introduction" rel="noopener noreferrer"&gt;OpenLLMetry documentation and instrumentations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arize-ai.github.io/openinference/" rel="noopener noreferrer"&gt;OpenInference specification for AI observability&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;See also: &lt;a href="https://tokenjam.dev/blog/what-is-agent-observability" rel="noopener noreferrer"&gt;What is agent observability?&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://www.tokenjam.dev/blog/2026-05-10-opentelemetry-for-ai-agents?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=cross-post" rel="noopener noreferrer"&gt;tokenjam.dev/blog&lt;/a&gt;. Part of an ongoing series on the agentic AI ecosystem.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>monitoring</category>
      <category>observability</category>
    </item>
    <item>
      <title>What is Agent Observability?</title>
      <dc:creator>Anil Murty</dc:creator>
      <pubDate>Mon, 11 May 2026 19:52:34 +0000</pubDate>
      <link>https://dev.to/anilmurty/what-is-agent-observability-2kpl</link>
      <guid>https://dev.to/anilmurty/what-is-agent-observability-2kpl</guid>
      <description>&lt;p&gt;&lt;em&gt;This post originally appeared on &lt;a href="https://www.tokenjam.dev/blog/2026-05-09-agent-observability?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=cross-post" rel="noopener noreferrer"&gt;tokenjam.dev/blog&lt;/a&gt;. It's part of a 14-post series on the agentic AI ecosystem.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent observability captures what an agent did (tool calls, token costs, latency, reasoning chains) at a level of detail sufficient to debug and audit behavior in production.&lt;/li&gt;
&lt;li&gt;Traditional logs and metrics aren't enough; you need traces that record the LLM's step-by-step decisions, tool invocations, and outcomes.&lt;/li&gt;
&lt;li&gt;Agents are harder to observe than services because of nondeterminism, deeply nested calls, prompts and completions as data, and vocabulary that didn't exist three years ago.&lt;/li&gt;
&lt;li&gt;The OpenTelemetry GenAI semantic conventions are the emerging standard for agent telemetry.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Agent observability is the practice of capturing what an AI agent did (its tool calls, token costs, behavioral patterns, and outcomes) at a level of detail sufficient to debug, optimize, and audit agent behavior in production. You record the agent's full journey: every decision point, every tool invocation, every LLM call with inputs and outputs, latencies, costs, and errors. Service observability captures &lt;em&gt;what your code did&lt;/em&gt;. Agent observability captures the reasoning chain itself: the sequence of thoughts and decisions that led the agent to act.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why agent observability is harder than service observability
&lt;/h2&gt;

&lt;p&gt;Service observability is built on a predictable model. A request comes in, your code executes a series of steps, a response goes out. Each step is deterministic. Logs tell you what happened. Metrics tell you how long it took and whether it succeeded.&lt;/p&gt;

&lt;p&gt;Agents break this model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nondeterminism is the core problem.&lt;/strong&gt; The same input to an agent with the same model and parameters might produce different outputs on different runs. The LLM samples from a probability distribution. You can't debug an agent from logs alone. You have to capture the complete trace of that specific run to understand what reasoning led to that specific output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool calls are deeply nested.&lt;/strong&gt; A service call stack might be five or ten levels deep. An agentic system can have an agent call a tool, which triggers a retrieval operation, which calls an embedding model, which calls a database, which triggers another tool. The nesting is deep and irregular. A trace that doesn't capture every step in this chain will miss the real bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts and completions are your actual data.&lt;/strong&gt; In a service, your data is SQL queries and JSON payloads. In an agent, your data is the prompt sent to the LLM and the completion it returned. These are large and unstructured. They're often sensitive: they contain user context, proprietary information, internal state. Traditional logging systems don't handle this well. Observability for agents has to be built around capturing and safely storing these artifacts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vocabulary didn't exist three years ago.&lt;/strong&gt; Terms like "token usage," "tool selection," "context window," and "hallucination" are specific to the agentic context. Existing APM (application performance monitoring) tools (Datadog, New Relic, Dynatrace) were built for microservices. They have no native concept of an LLM call, a token count, or a tool invocation. Shoehorning agent data into these systems works. It's also awkward.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three pillars, adapted for agents
&lt;/h2&gt;

&lt;p&gt;Observability has three pillars: traces, metrics, logs. The definitions shift when you apply them to agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traces&lt;/strong&gt; capture the complete execution path of a request. In a microservice, a trace is a sequence of function calls and RPC hops. In an agent, a trace is the agent's full journey: the user input, each LLM call (with prompt and completion), each tool invocation and result, latency at each step, token usage at each step, and the final output. A trace is the highest-fidelity record you have. It answers questions like "Why did the agent choose tool X instead of tool Y?" or "Where did the latency spike occur?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt; are aggregations: counts and percentiles. In services, you track request latency, error rate, throughput. For agents, you track cost per request (sum of token usage × model pricing), latency per LLM call, tool invocation frequency, error rates (both LLM errors and tool errors), and token efficiency (useful output tokens vs. wasted context). Metrics let you spot trends over time and set up alerts when something goes wrong at scale.&lt;/p&gt;
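
&lt;p&gt;The cost metric is plain arithmetic once traces carry token counts. A sketch with made-up per-million-token prices (check your provider's current pricing):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical price table: dollars per million tokens (input, output).
PRICES = {"claude-3-5-sonnet": (3.00, 15.00)}

def request_cost(model, input_tokens, output_tokens):
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A request with 12k input tokens and 800 output tokens:
cost = request_cost("claude-3-5-sonnet", 12_000, 800)  # 0.048 dollars
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;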

&lt;p&gt;&lt;strong&gt;Logs&lt;/strong&gt; are raw events: "This LLM call failed," "Token limit exceeded," "Tool returned an error." In a service, logs focus on errors. In an agent, logs are also informational: "Agent selected tool X." "Retry attempt 2 of 3." Logs are lower resolution than traces. They're faster to query and more storage-efficient.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you actually capture
&lt;/h2&gt;

&lt;p&gt;A production-grade agent observability system captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LLM calls&lt;/strong&gt;: Model name, parameters (temperature, max_tokens, top_p), the prompt sent, the completion received, token counts (input and output), latency, cost, success or failure. This is the core of agent observation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tool invocations&lt;/strong&gt;: Tool name, input parameters, output, latency, whether the tool succeeded or failed, and any retry information. Tools are where your agent touches the outside world. They cause most of your latency and most of your errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Token usage per call&lt;/strong&gt;: Not just total tokens consumed. A breakdown: how many tokens in the context window, how many in the prompt, how many in the response. This helps you optimize context and identify tokens wasted on irrelevant context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The agent's reasoning chain&lt;/strong&gt;: The intermediate thoughts or justifications the agent produced at each step. Some LLM frameworks (like ReAct) explicitly generate these; others encode them implicitly. Capturing this chain is what lets you debug why an agent made a particular decision.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model and parameters&lt;/strong&gt;: Which model was used, which version, what temperature and sampling parameters. This matters because the same agent with different parameters can behave very differently.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Errors and retries&lt;/strong&gt;: When a tool call failed, did the agent retry? How many times? Did it eventually succeed or give up? This tells you if your agent is robust or brittle.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency per layer&lt;/strong&gt;: Total latency is a sum of LLM latency + tool latency + overhead. Breaking this down tells you where to optimize.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These signals should conform to the &lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/" rel="noopener noreferrer"&gt;OpenTelemetry semantic conventions for generative AI&lt;/a&gt;. The conventions define a standard schema for representing LLM calls, tool use, embeddings, and agent systems in trace data. Adopting the standard means your agent traces can be ingested by any OpenTelemetry-compatible backend (Jaeger, Datadog, Elastic, or a custom system) without vendor lock-in. See &lt;a href="https://tokenjam.dev/blog/what-is-opentelemetry-for-ai-agents" rel="noopener noreferrer"&gt;What is OpenTelemetry for AI agents?&lt;/a&gt; for a deeper dive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why does my trace show 47 LLM calls when I only invoked the agent once?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three common causes. First, the framework you're using (LangChain, LlamaIndex, AutoGen, CrewAI) might be running nested chains in which each "step" is itself an LLM call: a planning call, an action call, a reflection call, a synthesis call. A single user request fans out fast. Second, retries: if a tool call returns an unexpected error or the LLM produces malformed output, many frameworks silently retry with backoff, multiplying calls. Third, agent loops: if the agent can't converge on an answer, it keeps reasoning and acting until it hits a max-iteration limit. Open the trace tree and look at timestamps. Tightly clustered calls with the same model and parameters mean retries. Spread-out calls with different prompts mean the framework is decomposing the task more than you expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My agent traces are 50MB each. Should I be worried?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, in a specific way. Trace size is dominated by prompt and completion text. A 50MB trace means you're sending massive prompts to the LLM: huge system prompts, retrieved documents, long conversation history, included file contents. The cost is real: that's a lot of input tokens per call. The performance hit is also real because most trace UIs struggle to render or query traces above ~10MB. Two fixes work. First, reduce what you put in the prompt: tighter system prompts, smarter retrieval, summarize conversation history rather than passing it raw. Second, configure your observability tool to truncate long fields above a threshold (Langfuse, Arize Phoenix, and Datadog all support this). Truncated traces are still useful for navigation, and you can fetch the full prompt from your application logs if you actually need it.&lt;/p&gt;
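
&lt;p&gt;Field truncation is a few lines if your tool doesn't do it for you. A sketch; the 4&amp;nbsp;KB threshold is arbitrary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;MAX_FIELD_CHARS = 4096  # arbitrary; tune to what your trace UI handles

def truncate_field(text, limit=MAX_FIELD_CHARS):
    # Keep the head of oversized prompt/completion fields and note the loss.
    if len(text) &amp;gt; limit:
        dropped = len(text) - limit
        return text[:limit] + f"... [truncated {dropped} chars]"
    return text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;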

&lt;p&gt;&lt;strong&gt;Can I use my existing APM (Datadog, New Relic) for agents?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Partially. Datadog and New Relic have built LLM modules onto their existing platforms. They work. They weren't designed for agents from the ground up. They're better at capturing that an LLM call happened than at capturing the reasoning chain or the interaction between multiple tool calls. If you're already in Datadog, LLM Observability is a reasonable choice. If you're starting fresh, a tool built for agents will give you more signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should I capture in production agent traces?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start with: every LLM call (prompt and completion), every tool invocation (name and result), latency per call, total token usage, and final outcome (success or failure). Add error details if the agent failed. Once that's stable, add cost breakdown per model and tool selection reasoning. Don't try to capture everything on day one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I avoid storing sensitive data in traces?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most tools support redaction: marking which fields should not be logged (API keys, user PII, secrets). Some (like Datadog LLM Observability) ship with automatic PII detection. Build redaction into your SDK wrapper early; it's easier to add than to retrofit. Also consider sampling. You don't need to trace every request, just a statistically significant sample.&lt;/p&gt;
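&lt;p&gt;A redaction layer can start as a few regexes applied before any string enters a span. The patterns below are illustrative placeholders; real PII detection needs far broader coverage than this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re

# Illustrative patterns only; extend for your own key and PII formats.
REDACTION_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{16,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]

def redact(text):
    """Mask known secret and PII patterns before logging or tracing."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
&lt;/code&gt;&lt;/pre&gt;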

&lt;p&gt;&lt;strong&gt;How much overhead does observability add?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Good observability SDKs are asynchronous. Traces are queued locally and sent in batches in the background, so they add minimal latency to your agent's response time. Expect overhead of 5–15% at the p99, depending on the tool and your stack. That's a worthwhile trade-off for production visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/" rel="noopener noreferrer"&gt;OpenTelemetry semantic conventions for generative AI&lt;/a&gt;. The emerging standard for agent telemetry. Start with the GenAI spans spec.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tokenjam.dev/blog/what-is-an-ai-agent" rel="noopener noreferrer"&gt;What is an AI agent?&lt;/a&gt;. Background on agent architecture and how agents differ from prompt-based systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tokenjam.dev/blog/what-is-opentelemetry-for-ai-agents" rel="noopener noreferrer"&gt;What is OpenTelemetry for AI agents?&lt;/a&gt;. Deep dive into OpenTelemetry's semantic conventions and how to instrument agents with OTel.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://www.tokenjam.dev/blog/2026-05-09-agent-observability?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=cross-post" rel="noopener noreferrer"&gt;tokenjam.dev/blog&lt;/a&gt;. Part of an ongoing series on the agentic AI ecosystem.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Agents 101: Reasoning, Actions &amp; Autonomy</title>
      <dc:creator>Anil Murty</dc:creator>
      <pubDate>Mon, 11 May 2026 19:30:52 +0000</pubDate>
      <link>https://dev.to/anilmurty/agents-101-reasoning-actions-autonomy-3imm</link>
      <guid>https://dev.to/anilmurty/agents-101-reasoning-actions-autonomy-3imm</guid>
      <description>&lt;p&gt;&lt;em&gt;This post originally appeared on &lt;a href="https://www.tokenjam.dev/blog/2026-05-08-agents-101?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=cross-post" rel="noopener noreferrer"&gt;tokenjam.dev/blog&lt;/a&gt;. It's part of a 14-post series on the agentic AI basics and ecosystem.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AI agent uses an LLM to reason about a goal and decide what actions to take, calling tools and observing results until the goal is reached&lt;/li&gt;
&lt;li&gt;Agents differ fundamentally from chatbots (which don't act) and workflows (which don't decide)&lt;/li&gt;
&lt;li&gt;The ReAct pattern (reasoning + acting) is the dominant architecture in modern agent systems&lt;/li&gt;
&lt;li&gt;Agents range from copilots that suggest actions to fully autonomous systems that run unattended for hours&lt;/li&gt;
&lt;li&gt;Key components: the LLM (reasoning), tools (actions), context/memory (state), and a control loop (orchestration)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What is an AI agent?&lt;/strong&gt; An AI agent is a system that uses a large language model to make decisions and take actions in pursuit of a goal. It calls tools, observes what they return, and iterates until the goal is reached. A chatbot waits for the next message; an agent plans and executes its own sequence of steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it matters
&lt;/h2&gt;

&lt;p&gt;The term entered the mainstream in 2023, when projects like AutoGPT showed that LLMs could direct their own execution. The concept wasn't new. Researchers had been studying goal-directed autonomous systems for decades. What changed was accessibility: capable base models (GPT-4, Claude) and standardized tool-calling APIs made it practical to build a working agent in a few dozen lines of code.&lt;/p&gt;

&lt;p&gt;The word now gets used loosely. Some vendors call a chatbot with a search feature an agent. Others claim that any LLM inference with retrieval is "agentic." This inflation matters. It obscures what's actually new and what's repackaging. Precision helps you know what you're building or evaluating.&lt;/p&gt;

&lt;p&gt;Agents represent a shift in how LLMs are deployed. The old model: user asks a question, system returns an answer, conversation ends. Agents invert that. The system receives a goal, decides on sub-goals, gathers information, corrects itself, and iterates without waiting for permission between steps. New architecture. New error handling. New thinking about safety and observability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents vs. chatbots vs. workflows vs. traditional AI
&lt;/h2&gt;

&lt;p&gt;A quick way to distinguish these four categories is to ask: does it use an LLM to decide what to do next? And can it call tools to act on those decisions?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chatbots&lt;/strong&gt; use an LLM to generate text. They don't call tools, and they don't pursue goals across steps. A customer-service chatbot answers your question. It doesn't modify your account or call internal APIs unless you ask. Even then, it tends to suggest options or retrieve data rather than decide and act. The LLM's job is to understand and respond.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflows&lt;/strong&gt; call tools and pursue goals. They don't use an LLM to decide which tool to call or how to interpret the result. A workflow might be: fetch customer data, run a validation rule, log an event, send an email. Each step is predefined. Branching is rule-based. The LLM is not in the loop. Workflows are predictable and cheap. They break when the task is ambiguous or open-ended.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents&lt;/strong&gt; combine both. The LLM observes the current state and decides which tool to call next. It adapts and self-corrects as it goes. If a tool call fails, the agent reasons about why and tries something else. The flexibility costs you something. Agents are less predictable, more expensive per inference, and harder to debug. The reward is open-ended tasks, where the path isn't predetermined.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional AI/ML systems&lt;/strong&gt; (classifiers, regressions, recommenders) optimize a fixed function learned from data. They have no LLM, and they don't pursue multi-step goals. They are specialized and efficient. Generalizing to a new task means retraining.&lt;/p&gt;

&lt;p&gt;The table below summarizes the differences:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Chatbot&lt;/th&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Traditional ML&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Uses LLM to decide next step?&lt;/td&gt;
&lt;td&gt;No (generates text)&lt;/td&gt;
&lt;td&gt;No (follows rules)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calls tools?&lt;/td&gt;
&lt;td&gt;Rarely; usually retrieval only&lt;/td&gt;
&lt;td&gt;Yes; predefined sequence&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes; chosen by LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pursues multi-step goal?&lt;/td&gt;
&lt;td&gt;No (responds to input)&lt;/td&gt;
&lt;td&gt;Yes; fixed path&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes; adaptive path&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handles ambiguous tasks?&lt;/td&gt;
&lt;td&gt;Moderate (can discuss)&lt;/td&gt;
&lt;td&gt;Poor (requires rigid structure)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Good (can reason and adapt)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Poor&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The ReAct pattern and core components
&lt;/h2&gt;

&lt;p&gt;Most agents built since 2023 follow a pattern called ReAct (Reasoning and Acting), introduced in &lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;Yao et al.'s 2022 paper&lt;/a&gt; from Google Research and Princeton. The idea is straightforward. The LLM produces reasoning steps (thinking aloud about what it needs to do) interleaved with actions (tool calls). It observes the result, then reasons further.&lt;/p&gt;

&lt;p&gt;A ReAct loop looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Observation&lt;/strong&gt;: The agent observes the current state (the original goal, prior tool results, conversation history).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning&lt;/strong&gt;: The LLM thinks through the problem: "I need to fetch the user's account, check their history, then decide whether to approve the request."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action&lt;/strong&gt;: The agent calls a tool, say &lt;code&gt;fetch_account(user_id)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observation&lt;/strong&gt;: The agent receives the result and feeds it back to the LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loop&lt;/strong&gt;: The LLM reasons again, decides on the next action, and repeats until it either reaches the goal or determines that the goal isn't achievable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pattern works because the reasoning traces make the LLM's decisions interpretable. You can see why it chose an action. They also enable self-correction: if a tool result is unexpected, the LLM can reason about what went wrong.&lt;/p&gt;
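&lt;p&gt;The loop above fits in a few dozen lines. A minimal sketch, not any framework's actual API: &lt;code&gt;llm(messages)&lt;/code&gt; is assumed to return either a tool-call dict or a final answer, a hypothetical shape standing in for real tool-calling responses.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def run_agent(llm, tools, goal, max_steps=10):
    """Minimal ReAct-style control loop.

    llm(messages) returns {"tool": name, "args": {...}} to act, or
    {"answer": text} when the goal is reached. tools maps tool names
    to plain Python functions.
    """
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        decision = llm(messages)                # reason: pick the next step
        if "answer" in decision:                # goal reached
            return decision["answer"]
        tool = tools[decision["tool"]]
        observation = tool(**decision["args"])  # act: execute the tool
        messages.append(                        # observe: feed the result back
            {"role": "tool", "content": str(observation)}
        )
    return "stopped: max-iteration limit reached"
&lt;/code&gt;&lt;/pre&gt;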

&lt;p&gt;An agent's core components are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The LLM&lt;/strong&gt; (reasoning engine): Decides what action to take based on the goal and current state. The decision-making layer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt; (action layer): Functions the agent can call: APIs, database queries, code execution, web searches, file operations. Tools are how the agent affects the world.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context and memory&lt;/strong&gt; (state): Everything the agent knows: the original goal, conversation history, prior tool results, and any persistent state it needs. Without good memory management, agents hallucinate and repeat mistakes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Control loop&lt;/strong&gt; (orchestration): The code that runs the loop. It calls the LLM, parses the output for tool calls, executes them, and feeds results back. Modern frameworks (Anthropic's Claude SDK, LangChain, LlamaIndex) handle this. You can also implement it from scratch.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
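&lt;p&gt;To make a tool callable, it is described to the LLM as a name, a description, and a parameter schema. A sketch in the JSON-schema style most tool-calling APIs use (exact field names vary by provider, and &lt;code&gt;fetch_account&lt;/code&gt; is a made-up example):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Illustrative tool definition; field names vary by provider.
fetch_account_tool = {
    "name": "fetch_account",
    "description": (
        "Fetch a user's account record by user ID. "
        "Returns account fields, or the string 'not found'."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "user_id": {"type": "string", "description": "The user's ID"},
        },
        "required": ["user_id"],
    },
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A precise description and an explicit "not found" return value are what let the LLM reason about failures instead of retrying blindly.&lt;/p&gt;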

&lt;h2&gt;
  
  
  Levels of autonomy
&lt;/h2&gt;

&lt;p&gt;Agents exist on a spectrum. On one end are suggestion-based copilots that nudge you. On the other are autonomous systems that run unattended for hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copilot mode&lt;/strong&gt; (suggestion): The agent observes what you're doing and suggests the next action. You approve before it executes. Example: Cursor's autocomplete suggests the next line of code; you hit Tab to accept or Escape to reject. The model is doing some reasoning. You stay in control of execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agentic mode&lt;/strong&gt; (supervised autonomy): The agent makes and executes decisions within a scope you define. You might say "add tests for this file" and the agent writes tests, runs them, and shows you the result, all without asking permission between steps. You can pause or override at any point. Example: Claude Code in an IDE, or an agent working a bounded coding task. The agent is autonomous within the scope, not globally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autonomous agent&lt;/strong&gt; (unattended): The agent pursues a goal with minimal human oversight. You set a goal ("reduce our average response time by 10%") and the agent decides what to measure, what to try, what to roll back, and what to keep. It might run for days, making changes and watching outcomes. Example: an agent managing an experimentation platform, or optimizing an ad-bidding algorithm. These are rare and tend to be domain-specific. The cost of mistakes is too high for general-purpose deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Notable tools
&lt;/h2&gt;

&lt;p&gt;Here are some widely used agent runtimes and frameworks, current as of 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Claude Code&lt;/strong&gt; (&lt;a href="https://www.anthropic.com/product/claude-code" rel="noopener noreferrer"&gt;anthropic.com/product/claude-code&lt;/a&gt;): Anthropic's agentic coding tool in the terminal, IDE, and browser. Understands your codebase, executes tasks, and handles git workflows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cursor&lt;/strong&gt; (&lt;a href="https://cursor.com/" rel="noopener noreferrer"&gt;cursor.com&lt;/a&gt;): AI code editor with agent mode. Autonomously explores your codebase, edits files, runs tests, and implements features.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenHands&lt;/strong&gt; (&lt;a href="https://www.openhands.dev/" rel="noopener noreferrer"&gt;openhands.dev&lt;/a&gt;): Open-source autonomous agent for software engineering. Runs in a Docker sandbox, can execute complex tasks end-to-end, and publishes pull requests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Aider&lt;/strong&gt; (&lt;a href="https://aider.chat/" rel="noopener noreferrer"&gt;aider.chat&lt;/a&gt;): Open-source AI pair programmer for the terminal. Works with your git workflow, supports multiple LLM providers, and commits changes automatically.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Continue&lt;/strong&gt; (&lt;a href="https://www.continue.dev/" rel="noopener noreferrer"&gt;continue.dev&lt;/a&gt;): Open-source IDE extension for VS Code and JetBrains. Offers autocomplete, chat, and agent modes, works with any LLM provider.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AutoGPT&lt;/strong&gt; (&lt;a href="https://agpt.co/" rel="noopener noreferrer"&gt;agpt.co&lt;/a&gt;): Open-source autonomous agent framework, released in 2023. Pioneering example of general-purpose agent architecture; known for demonstrating both promise and limitations of autonomous systems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How is an agent different from a chatbot?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A chatbot responds. An agent pursues. Ask a chatbot "book me a flight" and it asks clarifying questions, then waits for you to confirm. Ask an agent and it gathers options, checks your calendar, considers your budget, and books, without asking permission between steps. The chatbot reacts. The agent acts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the difference between an agent and a workflow?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A workflow is a fixed sequence of steps determined in advance. You define "do A, then B, then C, with these rules for branching." A workflow always takes the same path for the same inputs. An agent reasons about which steps to take and in what order, adapting based on intermediate results. Workflows are predictable and efficient. Agents trade predictability for flexibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does my agent keep calling the same tool five times in a row?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a loop, and the LLM probably doesn't recognize what the tool returned as the answer it was looking for. Common causes: the tool returned an error and the agent retried with the same inputs; the response shape was different from what the LLM expected, so it kept trying; the system prompt left the goal vague enough that the LLM thrashes between candidates. Fixes that work: clearer descriptions in your tool schema, explicit error messages from the tool ("not found" rather than null), and a hard call-count budget so the loop terminates rather than burning tokens.&lt;/p&gt;
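&lt;p&gt;The call-count budget is a few lines in the control loop. A sketch (the class and its limits are illustrative, not a feature of any framework):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class ToolBudget:
    """Hard per-tool call budget so a thrashing agent terminates."""

    def __init__(self, max_calls_per_tool=3):
        self.max_calls = max_calls_per_tool
        self.counts = {}

    def allow(self, tool_name):
        """Count the call; return False once the tool is over budget."""
        self.counts[tool_name] = self.counts.get(tool_name, 0) + 1
        return self.counts[tool_name] &lt;= self.max_calls
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When &lt;code&gt;allow&lt;/code&gt; returns false, feed the LLM an explicit message ("call budget for this tool exhausted; answer with what you have") rather than silently dropping the call.&lt;/p&gt;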

&lt;p&gt;&lt;strong&gt;How autonomous do agents actually get?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Depends on the task and the risk. In low-risk domains (code suggestions, documentation), agents run nearly unsupervised. In higher-risk domains (financial transactions, customer-facing decisions), agents operate under constraints: bounded scope, human review loops, or escalation to a human when confidence is low. Most production agents are supervised autonomy, not full autonomy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is it normal for a single Claude Code session to cost $40?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not normal, but not rare either. A long session that maintains a big context and re-reads files often will pile up tokens fast. Three places to look. First, prompt caching: is the run hitting the cache, or rebuilding the prompt every turn? Second, context bloat: huge system prompts, large repos, and many open files multiply per-call cost. Third, model choice: Opus is meaningfully pricier than Sonnet on the same workload. Set a hard spend cap and watch tokens per turn. Most overruns trace to context size, not call count.&lt;/p&gt;
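
&lt;p&gt;The tokens-per-turn math is worth doing by hand once. A sketch with placeholder per-million-token prices (check your provider's current pricing; the cache discount is also an assumption):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Placeholder prices in dollars per million tokens; not current pricing.
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 15.00, "output": 75.00},
}

def turn_cost(model, input_tokens, output_tokens,
              cached_tokens=0, cache_discount=0.9):
    """Estimate one turn's cost; cached input tokens are discounted."""
    p = PRICES[model]
    uncached = input_tokens - cached_tokens
    dollars = (
        uncached * p["input"]
        + cached_tokens * p["input"] * (1 - cache_discount)
        + output_tokens * p["output"]
    ) / 1_000_000
    return dollars
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Run this over a real session's token counts and the context effect becomes obvious: in context-heavy sessions, input tokens dominate, so shrinking the prompt matters more than making fewer calls.&lt;/p&gt;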

&lt;p&gt;&lt;strong&gt;Why do some agents get stuck or make silly mistakes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents inherit their LLM's limitations. An LLM can hallucinate or misinterpret what a tool returned. Across multiple reasoning steps, these errors compound. A bad tool result leads the agent down the wrong path. Confirmation bias makes it ignore contradictory evidence. Good design mitigates the failure modes: clear tool descriptions, explicit error signals from tools, and a memory model that lets the agent backtrack rather than press on with bad state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;ReAct: Synergizing Reasoning and Acting in Language Models&lt;/a&gt;. Yao et al., 2022 (ICLR 2023). The foundational paper introducing the ReAct pattern.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;Building Effective AI Agents&lt;/a&gt;. Anthropic's guide to architecture patterns, tool design, and implementation frameworks for single and multi-agent systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.anthropic.com/engineering/writing-tools-for-agents" rel="noopener noreferrer"&gt;Writing Effective Tools for AI Agents&lt;/a&gt;. Anthropic's technical advice on tool design for agentic systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/anthropics/anthropic-cookbook/tree/main/patterns/agents" rel="noopener noreferrer"&gt;Anthropic Cookbook: Patterns and Agents&lt;/a&gt;. Reference implementations and code examples.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;See also: &lt;a href="https://tokenjam.dev/blog/what-is-agent-observability" rel="noopener noreferrer"&gt;What is agent observability?&lt;/a&gt;, &lt;a href="https://tokenjam.dev/blog/what-is-mcp-model-context-protocol" rel="noopener noreferrer"&gt;What is MCP (Model Context Protocol)?&lt;/a&gt;, &lt;a href="https://tokenjam.dev/blog/agent-control-loops" rel="noopener noreferrer"&gt;Agent control loops&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://www.tokenjam.dev/blog/2026-05-08-agents-101?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=cross-post" rel="noopener noreferrer"&gt;tokenjam.dev/blog&lt;/a&gt;. Part of an ongoing series on the agentic AI ecosystem.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
