<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arindam Majumder </title>
    <description>The latest articles on DEV Community by Arindam Majumder  (@arindam_1729).</description>
    <link>https://dev.to/arindam_1729</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F965723%2Fe0982512-4de1-4154-b3c3-1869d19e9ecc.png</url>
      <title>DEV Community: Arindam Majumder </title>
      <link>https://dev.to/arindam_1729</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arindam_1729"/>
    <language>en</language>
    <item>
      <title>I built a local dashboard to inspect Claude Code sessions, tokens, and costs</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Thu, 02 Apr 2026 07:58:55 +0000</pubDate>
      <link>https://dev.to/arindam_1729/i-built-a-local-dashboard-to-inspect-claude-code-sessions-tokens-and-costs-173m</link>
      <guid>https://dev.to/arindam_1729/i-built-a-local-dashboard-to-inspect-claude-code-sessions-tokens-and-costs-173m</guid>
      <description>&lt;p&gt;I’ve been using Claude Code heavily over the last few weeks and started wondering where my tokens were actually going.&lt;/p&gt;

&lt;p&gt;Claude stores everything locally in ~/.claude/, which is great, but the data mostly sits in JSON logs. If you want to understand session usage, token costs, tool calls, or activity patterns, you basically end up digging through raw files.&lt;/p&gt;
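&lt;p&gt;For a sense of what that digging looks like, here’s a rough Python sketch that walks the session logs and totals token usage per project. The directory layout and field names (&lt;code&gt;projects/&lt;/code&gt;, &lt;code&gt;message.usage.input_tokens&lt;/code&gt;) are assumptions about what the logs contain, not a documented schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Rough sketch: sum token usage per project from local Claude Code JSONL logs.
# Paths and field names are assumptions; adjust to match your ~/.claude layout.
import json
from collections import Counter
from pathlib import Path

totals = Counter()
for log in Path.home().glob(".claude/projects/*/*.jsonl"):
    for line in log.read_text(encoding="utf-8").splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        usage = (event.get("message") or {}).get("usage") or {}
        totals[log.parent.name] += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)

for project, tokens in totals.most_common():
    print(f"{project}: {tokens} tokens")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;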

&lt;p&gt;So I built a small tool called cc-lens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3spc4nyf2nhk221or95m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3spc4nyf2nhk221or95m.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s a local-first dashboard that reads your Claude Code session files and turns them into something you can actually explore.&lt;/p&gt;

&lt;p&gt;It runs entirely on your machine: no cloud sync, no sign-ups, no telemetry.&lt;/p&gt;

&lt;p&gt;Some things it shows:&lt;/p&gt;

&lt;p&gt;• Usage overview: sessions, messages, tokens, estimated cost&lt;br&gt;
• Per-project breakdown: see which repos are burning the most tokens&lt;br&gt;
• Full session replay: inspect conversations turn-by-turn with token counts and tool calls&lt;br&gt;
• Cost &amp;amp; cache analytics: stacked charts by model and cache usage&lt;br&gt;
• Activity heatmap: GitHub-style view of when you’re using Claude the most&lt;br&gt;
• Memory &amp;amp; plan explorer: browse/edit Claude memory files and saved plans&lt;br&gt;
• Export/import: move dashboards across machines&lt;/p&gt;

&lt;p&gt;You can run it instantly with:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npx cc-lens&lt;/code&gt;&lt;br&gt;
(or clone the repo if you prefer).&lt;/p&gt;

&lt;p&gt;Here's the &lt;a href="https://github.com/Arindam200/cc-lens/" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;, if you want to try it out!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Sat, 28 Mar 2026 13:37:31 +0000</pubDate>
      <link>https://dev.to/arindam_1729/-1p2l</link>
      <guid>https://dev.to/arindam_1729/-1p2l</guid>
      <description>&lt;p&gt;Boost: &lt;a href="https://dev.to/studio1hq/running-llm-applications-across-providers-with-bifrost-313h"&gt;Running LLM Applications Across Providers with Bifrost&lt;/a&gt;, posted for Studio1 (5 min read).&lt;/p&gt;</description>
      <category>ai</category>
      <category>llm</category>
      <category>proxy</category>
      <category>litellm</category>
    </item>
    <item>
      <title>Build a Semantic Movie Discovery App with Claude Code and Weaviate Agent Skills</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Fri, 27 Mar 2026 20:45:45 +0000</pubDate>
      <link>https://dev.to/studio1hq/build-a-semantic-movie-discovery-app-with-claude-code-and-weaviate-agent-skills-30gd</link>
      <guid>https://dev.to/studio1hq/build-a-semantic-movie-discovery-app-with-claude-code-and-weaviate-agent-skills-30gd</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Versatility in agentic coding is increasing as new tools such as Model Context Protocol (MCP) servers and Agent Skills become more common. At the same time, many developers ask the same question when building AI applications: should they use MCP servers or Agent Skills? The important thing is understanding what each approach does well and choosing the one that fits your use case.&lt;/p&gt;

&lt;p&gt;In this post, we’ll explain what MCP servers and Agent Skills are and how they differ, including architecture diagrams and technical details. In the later sections, we’ll also walk through how to use &lt;a href="https://github.com/weaviate/agent-skills" rel="noopener noreferrer"&gt;Weaviate Agent Skills&lt;/a&gt; with &lt;a href="https://code.claude.com/docs/en/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; to build a “Semantic Movie Discovery” application with several useful features.&lt;/p&gt;

&lt;p&gt;Let’s get started!&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding MCP
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; (MCP) is an open standard introduced by Anthropic that enables Large Language Models (LLMs) to interact with external systems such as data sources, APIs and services. MCP provides a structured way for an &lt;a href="https://weaviate.io/agentic-ai" rel="noopener noreferrer"&gt;AI agent&lt;/a&gt; to connect to compliant tools through a single interface instead of requiring custom integrations for each service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqfus3ya7jofj8kchzml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqfus3ya7jofj8kchzml.png" alt="MCP Architecture " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP Architecture
&lt;/h3&gt;

&lt;p&gt;The MCP system operates on a client–server model and consists of three main components.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Host:&lt;/strong&gt; the application that runs the AI model and provides the environment where the agent operates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client:&lt;/strong&gt; the protocol connector inside the host that handles communication between the model and MCP servers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server:&lt;/strong&gt; an external service that exposes tools, resources, or prompts that the agent can access.&lt;/li&gt;
&lt;/ul&gt;
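
&lt;p&gt;To make the server role concrete, here is a minimal sketch of an MCP server written with the official Python SDK (the &lt;code&gt;mcp&lt;/code&gt; package). It exposes a single tool that any MCP-compatible host can call; treat the exact import path as an assumption if your SDK version differs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal MCP server sketch using the official Python SDK (pip install mcp).
# It exposes one tool; an MCP host such as Claude Code connects as the client.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def word_count(text: str) -&gt; int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;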

&lt;h3&gt;
  
  
  MCP and Agentic Coding
&lt;/h3&gt;

&lt;p&gt;Before MCP, each AI tool required custom integrations for every external service it wanted to connect to. MCP simplifies this process by introducing a shared protocol that multiple agents and tools can use.&lt;/p&gt;

&lt;p&gt;Developers can now expose capabilities through an MCP server once and allow any compatible agent to access them without building separate integrations for each system.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Understanding Agent Skills&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt;, also introduced by Anthropic, provide developers with a simple way to extend AI coding agents without running MCP servers. An Agent Skill is a structured configuration file, usually written as markdown files with YAML metadata that defines capabilities, parameter schemas and natural-language instructions describing how the agent should use those capabilities.&lt;/p&gt;

&lt;p&gt;AI tools such as Claude Code read these files at session start and load the skills directly into the agent's working context without requiring an additional runtime.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn13awyixqnmfnllmjlld.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn13awyixqnmfnllmjlld.png" alt="Agent Skills with an AI tool (Claude Code)" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How Agent Skills Work
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When Claude Code detects a skill file in the project directory (typically under &lt;code&gt;.claude/skills/&lt;/code&gt;), it loads the manifest into the agent's context at the beginning of the session.&lt;/li&gt;
&lt;li&gt;The skill definition describes available capabilities, how to invoke them correctly and when to prefer one approach over another. Because the instructions are written in natural language alongside parameter schemas, the agent can reason about how to use the skill.&lt;/li&gt;
&lt;li&gt;Skills are portable across repositories. If a developer commits a skill file to a repository, any collaborator who clones the project and opens it in Claude Code automatically gains access to the same capabilities without additional setup.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP and Agent Skills solve different problems in agent systems. MCP provides a standardized way for AI agents to connect to external tools, APIs, databases and services through a client–server architecture with structured schemas. Agent Skills extend the agent’s capabilities through configuration files that define workflows, instructions and parameter schemas without requiring a running server.&lt;/p&gt;

&lt;p&gt;In simple terms, &lt;strong&gt;MCP enables agents to access external systems, while Agent Skills define how agents perform tasks or workflows within their environment.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Weaviate Agent Skills
&lt;/h2&gt;

&lt;p&gt;Weaviate has released an official set of &lt;a href="https://github.com/weaviate/agent-skills" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt; designed for use with Claude Code and other compatible agent-based development environments like Cursor, Antigravity, Windsurf and more. These skills provide structured access to Weaviate vector databases, allowing agents to perform common operations such as search, querying, schema inspection, data exploration and collection management.&lt;/p&gt;

&lt;p&gt;The repository includes ready-to-use skill definitions for tasks like semantic, hybrid and keyword search, along with natural language querying through the Query Agent. It also supports workflows such as creating collections, importing data and fetching filtered results, and it ships with cookbooks for full projects. This enables agents to build with Weaviate and perform multi-step retrieval and agentic tasks more effectively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhgiqyrgy3vpbq0xxz5ej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhgiqyrgy3vpbq0xxz5ej.png" alt="Weaviate Ecosystem Tools and Features" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent Skills and Vector Databases
&lt;/h2&gt;

&lt;p&gt;AI coding agents face difficulties when working with vector databases. Vector database APIs provide extensive capabilities, including basic “key–value” retrieval, single-vector near-text searches, multimodal near-image searches, hybrid BM25-plus-vector search, generative modules and multi-tenant system support. Without structured guidance, even a capable coding agent may produce suboptimal queries: correct syntax but the wrong search strategy, missing parameters or failure to use powerful features like the Weaviate Query Agent.&lt;/p&gt;
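
&lt;p&gt;For example, the same request can be answered with three different query strategies in the Weaviate Python client, and picking the wrong one is exactly the kind of mistake an unguided agent makes. A sketch, assuming a connected v4 &lt;code&gt;client&lt;/code&gt; and an existing Movie collection:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Three query strategies for the same request (sketch; assumes `client` is a
# connected weaviate.WeaviateClient and a "Movie" collection already exists).
movies = client.collections.get("Movie")

# Keyword (BM25): best for exact terms like titles or actor names.
kw = movies.query.bm25(query="space station thriller", limit=5)

# Pure vector search: best for fuzzy, descriptive queries.
sem = movies.query.near_text(query="a lonely astronaut stranded far from home", limit=5)

# Hybrid: blends both; alpha=0.5 weighs keyword and vector scores equally.
hyb = movies.query.hybrid(query="space station thriller", alpha=0.5, limit=5)

for result in (kw, sem, hyb):
    print([obj.properties["title"] for obj in result.objects])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;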

&lt;p&gt;&lt;a href="https://weaviate.io/blog/weaviate-agent-skills" rel="noopener noreferrer"&gt;Weaviate Agent Skills&lt;/a&gt; address this by providing correct usage patterns, parameter recommendations and decision logic, enabling coding agents to generate production-ready code from their initial attempts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Weaviate Agent Skills repository is organized into two main parts&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facdcuqk3n68wemqdz6hj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facdcuqk3n68wemqdz6hj.png" alt="Overview of Weaviate Agent Skills" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate Skill&lt;/strong&gt; (skills/weaviate): Focused scripts for tasks such as schema inspection, data ingestion and vector search. Agents use these while writing application logic or backend code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cookbooks Skill&lt;/strong&gt; (skills/weaviate-cookbooks): End-to-end project examples that combine tools such as FastAPI, Next.js and Weaviate to demonstrate full application workflows.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Weaviate Agent Skills work with several development environments, including Claude Code, Cursor, GitHub Copilot, VS Code and Gemini CLI. When connected to a Weaviate Cloud instance, agents can directly interact with database modules and perform search, data management and retrieval tasks.&lt;/p&gt;

&lt;p&gt;To evaluate how effective Weaviate Agent Skills really are, let’s build a small project and see how they accelerate RAG and agentic application development with Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Semantic Movie Discovery Application
&lt;/h2&gt;

&lt;p&gt;We will build a &lt;strong&gt;Movie Discovery App&lt;/strong&gt; that takes a natural-language description and returns the most semantically similar movies from a Weaviate collection. In the process, we will explore Weaviate capabilities such as multimodal storage, named vector search, generative AI (RAG) and the Query Agent in action with Claude Code, showing how these Agentic tools help you build applications faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Prerequisites&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;Python 3.10&lt;/a&gt; or higher&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.weaviate.io/weaviate/quickstart" rel="noopener noreferrer"&gt;Weaviate Cloud&lt;/a&gt; – Create a free cluster and obtain an API key.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.themoviedb.org/" rel="noopener noreferrer"&gt;TMDB API key&lt;/a&gt; – Used to fetch movie metadata&lt;/li&gt;
&lt;li&gt;OpenAI API key – Required for &lt;a href="https://weaviate.io/rag" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; features.&lt;/li&gt;
&lt;li&gt;Access to &lt;a href="https://code.claude.com/docs/en/quickstart" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://nodejs.org/en/download" rel="noopener noreferrer"&gt;Node.js 18+&lt;/a&gt; and npm – Required to run the Next.js frontend&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Project Setup
&lt;/h3&gt;

&lt;p&gt;Create a &lt;strong&gt;movie-discovery-app&lt;/strong&gt; folder&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mkdir&lt;/span&gt; &lt;span class="n"&gt;movie&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;discovery&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create and activate a  &lt;strong&gt;Python virtual environment&lt;/strong&gt; in the folder&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;movie-discovery-app py &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source &lt;/span&gt;venv&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate.bat 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install Python dependencies&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;weaviate-client&lt;span class="o"&gt;==&lt;/span&gt;4.20.1 fastapi uvicorn[standard] openai weaviate-agents&amp;gt;&lt;span class="o"&gt;=&lt;/span&gt;1.3.0 requests python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install Node.js dependencies for the frontend (run this once the frontend/ folder exists after Step 3)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;frontend &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, create a &lt;code&gt;.env&lt;/code&gt; file at the project root. Add the following parameters to configure &lt;strong&gt;Weaviate Agent Skills with Claude Code&lt;/strong&gt;, along with your &lt;strong&gt;OpenAI API key&lt;/strong&gt; and &lt;strong&gt;TMDB API key&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;WEAVIATE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;without&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;
&lt;span class="n"&gt;WEAVIATE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;
&lt;span class="n"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;
&lt;span class="n"&gt;TMDB&lt;/span&gt; &lt;span class="n"&gt;API&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tmdb&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After signing up for Weaviate, click the &lt;strong&gt;Create Cluster&lt;/strong&gt; button to start a new cluster for your use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo4cx6bxr7o7xkbqyu1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo4cx6bxr7o7xkbqyu1j.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;“How to Connect”&lt;/strong&gt; to view the required Weaviate connection parameters.&lt;/p&gt;

&lt;p&gt;Now that everything is set up, we can connect Weaviate Cloud with &lt;strong&gt;Claude Code&lt;/strong&gt; by running &lt;code&gt;claude&lt;/code&gt; in your project terminal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9y0xh1tmthf9gp5hilm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9y0xh1tmthf9gp5hilm.png" alt="Claude Code screnshot" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the following prompt in your Claude terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Write and run &lt;span class="sb"&gt;`check_modules.py`&lt;/span&gt; that connects using &lt;span class="sb"&gt;`weaviate.connect_to_weaviate_cloud`&lt;/span&gt;with &lt;span class="sb"&gt;`skip_init_checks=True`&lt;/span&gt;, loads credentials from &lt;span class="sb"&gt;`.env`&lt;/span&gt; with &lt;span class="sb"&gt;`python-dotenv`&lt;/span&gt;,
and prints the full JSON list of enabled Weaviate modules.
Run it with &lt;span class="sb"&gt;`venv/Scripts/python check_modules.py`&lt;/span&gt;."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
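
&lt;p&gt;The script Claude Code generates from this prompt should come out roughly like the following. This is a sketch of the expected result, not the verbatim output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# check_modules.py: sketch of what Claude Code should generate from the prompt.
import json
import os

import weaviate
from dotenv import load_dotenv
from weaviate.classes.init import Auth

load_dotenv()  # pulls WEAVIATE_URL / WEAVIATE_API_KEY from .env

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WEAVIATE_URL"],
    auth_credentials=Auth.api_key(os.environ["WEAVIATE_API_KEY"]),
    skip_init_checks=True,
)
try:
    meta = client.get_meta()
    print(json.dumps(meta.get("modules", {}), indent=2))  # enabled modules
finally:
    client.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;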



&lt;h3&gt;
  
  
  Step 2: Create A Weaviate Collection and Import Sample Movie Data
&lt;/h3&gt;

&lt;p&gt;In this step, we create a Weaviate collection and import the movie dataset into Weaviate.  The dataset contains movie metadata sourced from the TMDB API. Each entry includes: &lt;em&gt;title, overview, release_date, poster_url, popularity, and other important movie fields&lt;/em&gt;. You can import a JSON or CSV dataset directly into Weaviate.&lt;/p&gt;

&lt;p&gt;Run this prompt to retrieve the dataset from the TMDB API and save it to a file named &lt;em&gt;movies.json&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Create a TMDB dataset JSON file, movies.json, that contains 100 movie metadata and poster URLs directly from the TMDB API. 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Afterwards, the &lt;a href="https://github.com/weaviate/agent-skills/blob/main/skills/weaviate/references/import_data.md" rel="noopener noreferrer"&gt;Weaviate Import Skill&lt;/a&gt; creates a Weaviate collection and imports the data from &lt;em&gt;movies.json&lt;/em&gt; into the Weaviate database. Claude Code activates the skill when prompted with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Import &lt;span class="sb"&gt;`movie.json`&lt;/span&gt; into a new Weaviate collection called Movie
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
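
&lt;p&gt;Under the hood, the generated import script typically boils down to something like this. A sketch, assuming &lt;code&gt;client&lt;/code&gt; is connected as in &lt;code&gt;check_modules.py&lt;/code&gt; and &lt;em&gt;movies.json&lt;/em&gt; holds a flat list of objects:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the import the skill performs (assumes `client` is connected as in
# check_modules.py and movies.json holds a list of flat dicts).
import json

client.collections.create("Movie")  # auto-schema infers properties on import

movies = client.collections.get("Movie")
with movies.batch.dynamic() as batch:
    for movie in json.load(open("movies.json", encoding="utf-8")):
        batch.add_object(properties=movie)

total = movies.aggregate.over_all(total_count=True).total_count
print(f"Imported {total} objects")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;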



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbeb2l8quvgqtbfmbzt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbeb2l8quvgqtbfmbzt7.png" alt="Claude Code" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then the data is imported:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuihrumms8ofngypte6vi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuihrumms8ofngypte6vi.png" alt="Terminal Output" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Building the FastAPI Backend and Next.js Frontend with Weaviate Cookbooks
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/weaviate/agent-skills/blob/main/skills/weaviate-cookbooks/references/frontend_interface.md" rel="noopener noreferrer"&gt;Weaviate cookbooks&lt;/a&gt; enable the app to use a two-layer architecture: a FastAPI backend that exposes REST endpoints and a Next.js frontend that renders the UI. The backend connects directly to Weaviate Cloud and the Weaviate Query Agent. Weaviate cookbooks also include some frontend guidelines to communicate with the &lt;a href="https://github.com/weaviate/agent-skills/blob/main/skills/weaviate-cookbooks/references/frontend_interface.md" rel="noopener noreferrer"&gt;Weaviate backend&lt;/a&gt; over HTTP.&lt;/p&gt;

&lt;p&gt;The app is organized into two views accessed via a collapsible sidebar:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Search view&lt;/strong&gt;: performs semantic search and RAG using Weaviate named vectors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat view&lt;/strong&gt;: handles multi-turn conversations through the Weaviate Query Agent.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our app includes the following features:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Layer&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Component&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Role&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;backend.py (FastAPI), REST API on port 8000 (interactive docs at /docs)&lt;/td&gt;
&lt;td&gt;Routes: GET /health, GET /search, POST /ai/explain, POST /ai/plan, POST /chat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js + TypeScript (port 3000)&lt;/td&gt;
&lt;td&gt;Single-page app with sidebar navigation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;SearchView.tsx&lt;/td&gt;
&lt;td&gt;Semantic search (near_text), AI explanations (single_prompt), Movie Night Planner (grouped_task)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;MovieCard.tsx&lt;/td&gt;
&lt;td&gt;Renders base64 poster inline, watchlist add/remove button&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;ChatView.tsx&lt;/td&gt;
&lt;td&gt;Multi-turn Query AI Agent chat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;AppSidebar.tsx&lt;/td&gt;
&lt;td&gt;Navigation (Search/Chat), Weaviate logo + feature summary, watchlist manager with ‘.txt’ export&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Use the following prompts with Claude Code to generate the backend and frontend:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;/weaviate cookbooks 

Create &lt;span class="sb"&gt;`backend.py`&lt;/span&gt;: a FastAPI app with CORS enabled for localhost:3000.
Connect to Weaviate Cloud using credentials from .env with skip_init_checks=True.
The /search endpoint should return genre and vote_average alongside title, description, release_year, and poster.
Implement these routes:  
&lt;span class="p"&gt;
-&lt;/span&gt; GET  /health                  → {"status": "ok"}  
&lt;span class="p"&gt;-&lt;/span&gt; GET  /search?q=...&amp;amp;limit=3    → near_text on text_vector, return title/description/release_year/poster  
&lt;span class="p"&gt;-&lt;/span&gt; POST /ai/explain              → generate.near_text with single_prompt  
&lt;span class="p"&gt;-&lt;/span&gt; POST /ai/plan                 → generate.near_text with grouped_task  
&lt;span class="p"&gt;-&lt;/span&gt; POST /chat                    → QueryAgent.ask() with full message history

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
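
&lt;p&gt;To give a feel for what the cookbook skill generates, the core of the &lt;code&gt;/search&lt;/code&gt; route ends up looking something like this. A trimmed sketch, carrying over the &lt;code&gt;text_vector&lt;/code&gt; named-vector assumption from the prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Trimmed sketch of the generated /search route (not the full backend.py).
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(CORSMiddleware, allow_origins=["http://localhost:3000"],
                   allow_methods=["*"], allow_headers=["*"])

@app.get("/search")
def search(q: str, limit: int = 3):
    movies = client.collections.get("Movie")  # `client` is connected at startup
    res = movies.query.near_text(query=q, limit=limit, target_vector="text_vector")
    return [
        {
            "title": o.properties.get("title"),
            "description": o.properties.get("overview"),
            "release_year": o.properties.get("release_date"),
            "poster": o.properties.get("poster_url"),
        }
        for o in res.objects
    ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;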



&lt;p&gt;&lt;strong&gt;Frontend Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Using Weaviate cookbooks frontend reference, create a Next.js TypeScript app in the frontend/ folder.
MovieCard.tsx should display a star rating (vote_average) and genre tag beneath the movie title. 

Components needed:  
&lt;span class="p"&gt;
-&lt;/span&gt; page.tsx        — SidebarProvider layout, view state (search | chat)  
&lt;span class="p"&gt;-&lt;/span&gt; SearchView.tsx  — search input, MovieCard grid, AI explain and plan buttons  
&lt;span class="p"&gt;-&lt;/span&gt; MovieCard.tsx   — poster image, title, year, description, watchlist button  
&lt;span class="p"&gt;-&lt;/span&gt; ChatView.tsx    — message bubbles, source citations, clear chat  
&lt;span class="p"&gt;-&lt;/span&gt; AppSidebar.tsx  — navigation, Weaviate logo + feature list, watchlist + exportBackend base URL from NEXT_PUBLIC_BACKEND_HOST env var (default localhost:8000)

Run backend and frontend servers with: uvicorn backend:app --reload --port 800 and npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this, Claude Code will automatically build the app by adding relevant files and start both servers. You can start using the application immediately.&lt;/p&gt;

&lt;p&gt;The FastAPI backend runs at &lt;code&gt;http://localhost:8000&lt;/code&gt; (interactive docs at &lt;code&gt;/docs&lt;/code&gt;), while the frontend app is available at &lt;code&gt;http://localhost:3000&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can also manually start both processes in separate terminals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1 — Backend &lt;/span&gt;
uvicorn backend:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;span class="c"&gt;# Terminal 2 — Frontend&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;frontend &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Congratulations! You’ve completed the project without needing to do much manual configuration or coding.&lt;/strong&gt; 🔥&lt;/p&gt;

&lt;h3&gt;
  
  
  Demo
&lt;/h3&gt;

&lt;p&gt;So far, we have used Weaviate Agent Skills with Claude Code to build a Semantic Movie Discovery Application powered by an OpenAI API key, a TMDB API key, and Weaviate.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/4udXaqI0PaQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Movie Discovery app we built includes the following features&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic search:&lt;/strong&gt; Describe a mood or theme and retrieve matching movies using vector-based search (&lt;code&gt;near_text&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI explanations:&lt;/strong&gt; Generate per-movie summaries using RAG with &lt;code&gt;single_prompt&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Movie Night Planner:&lt;/strong&gt; Create a viewing order, snack pairings and a theme summary using &lt;code&gt;grouped_task&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversational chat:&lt;/strong&gt; Ask questions about the movie collection through a chat interface powered by the Weaviate Query Agent, with source citations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watchlist:&lt;/strong&gt; Save movies during your session and export the list as a &lt;code&gt;.txt&lt;/code&gt; file.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What’s Next?
&lt;/h3&gt;

&lt;p&gt;You could add image-based search to find visually similar movies and better match your preferences. You could also add a hybrid search feature that combines keyword-heavy queries with vector and image search.&lt;/p&gt;

&lt;p&gt;You can take your app even further by getting up to speed with Weaviate’s latest &lt;a href="https://weaviate.io/blog" rel="noopener noreferrer"&gt;releases&lt;/a&gt; and becoming familiar with features such as server-side batching, async replication improvements and Object TTL.&lt;/p&gt;

&lt;p&gt;To explore further, check out the latest Weaviate &lt;a href="https://weaviate.io/blog" rel="noopener noreferrer"&gt;releases&lt;/a&gt; and join the discussion on the &lt;a href="https://forum.weaviate.io/" rel="noopener noreferrer"&gt;community forum&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
&lt;strong&gt;Weaviate Agent Skills in Action&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The following Weaviate modules and skills were used in the application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text2vec-weaviate:&lt;/strong&gt; Responsible for text embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi2multivec-weaviate:&lt;/strong&gt; Responsible for embedding images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generative-openai:&lt;/strong&gt; Integrates GPT directly into the query workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate Skill:&lt;/strong&gt; Creates a collection and imports data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate Cookbooks Skill:&lt;/strong&gt; For defining the app’s logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate Query Agent:&lt;/strong&gt; A higher-level abstraction that accepts natural language queries, decides the best query method, executes queries, synthesizes results and returns answers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weaviate Agent Skills help in shipping faster and more accurate RAG applications. Backend development tasks such as schema inspection, data ingestion and search operations are automated and optimized. Ultimately, this helps developers save valuable development time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Both MCP servers and Agent Skills provide useful patterns for building AI-powered applications. MCP servers are well-suited for exposing external tools and services through a standardized interface, while Agent Skills focus on guiding coding agents with structured workflows and best practices.&lt;/p&gt;

&lt;p&gt;In this tutorial, we demonstrated how Weaviate Agent Skills can simplify development by helping Claude Code generate correct database queries, ingestion pipelines and search logic. By combining vector search, multimodal storage and generative capabilities, we built a semantic movie discovery application with minimal manual setup.&lt;/p&gt;

&lt;p&gt;As agentic development environments continue to evolve, tools like MCP servers and Agent Skills will likely be used together. The key is understanding where each approach fits and selecting the one that best supports your application architecture.&lt;/p&gt;

&lt;p&gt;Happy building.&lt;/p&gt;




&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/weaviate/agent-skills" rel="noopener noreferrer"&gt;Weaviate Agent Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Studio1HQ/movie-discovery-app" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt; for the Movie Discovery App&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>rag</category>
      <category>webdev</category>
    </item>
    <item>
      <title>We Cut Our MCP Token Spend in Half. Here's the Architecture</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Wed, 25 Mar 2026 19:04:52 +0000</pubDate>
      <link>https://dev.to/studio1hq/we-cut-our-mcp-token-spend-in-half-heres-the-architecture-1jic</link>
      <guid>https://dev.to/studio1hq/we-cut-our-mcp-token-spend-in-half-heres-the-architecture-1jic</guid>
      <description>&lt;p&gt;When we started scaling our MCP workflows, token usage was something we barely tracked. The system worked well, responses were accurate, and adding more tools felt like the right next step. Over time, the cost began rising in ways that did not align with how much the system was actually used.&lt;/p&gt;

&lt;p&gt;At first, we assumed this was due to higher usage or more complex queries. The data showed something else. Even simple requests were using more tokens than expected. This led us to ask a basic question. What exactly are we sending to the LLM on every call?&lt;/p&gt;

&lt;p&gt;A closer look made things clearer. The issue came from how the system was built: the way we handled context, tool definitions, and execution flow added extra tokens at every step.&lt;/p&gt;

&lt;p&gt;This article explains how we found the root cause and redesigned the architecture to fix it. The changes cut our MCP token usage by nearly half and gave us better control over how the system behaves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Token Usage in MCP Systems
&lt;/h2&gt;

&lt;p&gt;Once we started examining token usage, a clear pattern showed up. The LLM was receiving far more context than most requests actually needed. A large part of this came from tool definitions being sent repeatedly on every call.&lt;/p&gt;

&lt;p&gt;Each request included the full list of tools, even when only one or two were needed. On top of that, earlier outputs and intermediate results were passed back into the model. The context kept growing, even for simple queries.&lt;/p&gt;

&lt;p&gt;The execution flow added to the problem. The LLM would choose a tool, call it, process the result, and then repeat the same cycle if another step was needed. Each step added more tokens, and the same data often appeared many times across calls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fraya207lc4ie4r2yqsd2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fraya207lc4ie4r2yqsd2.png" alt="Image1" width="800" height="1422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This setup worked at a small scale. As the number of tools increased, the cost grew quickly. More tools meant more context. More steps meant repeated processing. The system was doing extra work without adding real value. At this point, the cause was clear. Token usage came from how the system handled context and execution. The design itself was driving the overhead.&lt;/p&gt;
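
&lt;p&gt;A back-of-the-envelope model shows how quickly this compounds. The numbers below are illustrative, not our production figures:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative (made-up) numbers: context cost of resending every tool schema
# on each step of a multi-step workflow, versus loading schemas on demand.
TOOLS = 40               # registered tools
SCHEMA_TOKENS = 350      # average tokens per tool definition
STEPS = 6                # LLM turns in one workflow
HISTORY_TOKENS = 500     # prior outputs replayed per turn (rough average)

classic = sum(TOOLS * SCHEMA_TOKENS + step * HISTORY_TOKENS for step in range(STEPS))
on_demand = 2 * SCHEMA_TOKENS + STEPS * 200  # two schemas read once + small plan context

print(f"classic: ~{classic:,} tokens, on-demand: ~{on_demand:,} tokens")
# classic: ~91,500 tokens, on-demand: ~1,900 tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;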

&lt;h2&gt;
  
  
  Introducing Bifrost
&lt;/h2&gt;

&lt;p&gt;We started looking for a way to change how the system handled tool execution. The goal was simple. Reduce the amount of context sent to the LLM and avoid repeated processing across steps.&lt;/p&gt;

&lt;p&gt;During this process, we came across &lt;a href="https://www.getmaxim.ai/bifrost" rel="noopener noreferrer"&gt;Bifrost&lt;/a&gt;, an &lt;a href="https://github.com/maximhq/bifrost" rel="noopener noreferrer"&gt;open source&lt;/a&gt; MCP gateway. It works between the application, the model, and the tools. It brings structure for how tools are discovered and executed, so the LLM receives only what is needed on each call.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhnphaglsh5ymggy61oe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhnphaglsh5ymggy61oe.png" alt="Image" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This changed how we thought about the system. Tool access became more controlled. Context stayed limited to what was required for each request. The overall flow of execution became easier to follow and reason about.&lt;/p&gt;

&lt;p&gt;These changes directly addressed the issues we were seeing. Tool definitions were sent only when required. Repeated decision loops were reduced. The system handled execution in a more controlled and predictable way.&lt;/p&gt;

&lt;p&gt;From here, the focus moved away from adjusting prompts and toward changing how the system runs end-to-end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architectural Changes with Bifrost Code Mode
&lt;/h2&gt;

&lt;p&gt;The main change came from how execution was handled inside Bifrost. &lt;a href="https://docs.getbifrost.ai/mcp/code-mode" rel="noopener noreferrer"&gt;Code Mode&lt;/a&gt; is a Bifrost feature that changes how the LLM interacts with MCP tools. Earlier, the LLM handled both planning and step-by-step tool interaction. Each step required another call, and each call carried a growing context.&lt;/p&gt;

&lt;p&gt;Code Mode separates these responsibilities. The LLM focuses on planning. It generates executable code that defines the full workflow for a task. &lt;/p&gt;

&lt;p&gt;Code Mode works best when multiple MCP servers are involved, workflows have several steps, or tools need to share data. For simpler setups with one or two tools, Classic MCP works well.&lt;/p&gt;

&lt;p&gt;A mixed setup also works. Use Code Mode for heavier workflows like search or databases, and keep simple tools as direct calls.&lt;/p&gt;
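
&lt;p&gt;Registration-wise, the mix is just a flag per server: the curl call later in this post registers a Code Mode server, and a plain tool server is the same call with &lt;code&gt;is_code_mode_client&lt;/code&gt; set to false. A sketch using Python's requests, with the payload fields mirroring that registration API and a hypothetical "weather" server:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: register a simple server as a classic (non-Code-Mode) MCP client.
# Endpoint and fields mirror the curl registration call shown later.
import requests

resp = requests.post(
    "http://localhost:8080/api/mcp/client",
    json={
        "name": "weather",                    # hypothetical simple server
        "connection_type": "http",
        "connection_string": "http://localhost:3002/mcp",
        "tools_to_execute": ["*"],
        "is_code_mode_client": False,         # classic MCP: schemas sent inline
    },
    timeout=10,
)
resp.raise_for_status()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;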

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcz78lp878cwfdmchwomm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcz78lp878cwfdmchwomm.png" alt="Image2" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Selecting the right tools&lt;/li&gt;
&lt;li&gt;Passing data between tools&lt;/li&gt;
&lt;li&gt;Defining how the final output is produced&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system exposes a minimal interface to the LLM. It can list available tools, read tool details, and, when required, understand how each tool works. Tool definitions are accessed on demand, which keeps the initial context small.&lt;/p&gt;

&lt;p&gt;Once the plan is generated, execution moves to a runtime environment. The code runs in a sandbox and interacts directly with tools. All intermediate steps, tool responses, and data transformations stay within this layer.&lt;/p&gt;

&lt;p&gt;This removes the need for repeated LLM calls during execution. The workflow runs in one pass, guided by the generated code. The LLM is involved mainly at the planning stage and for producing the final response if required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawpurvuv48ogzbgr1rdu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawpurvuv48ogzbgr1rdu.png" alt="Image" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The flow becomes more structured. A request comes in, relevant tools are identified, code is generated, and execution happens in a controlled environment. The system handles state and intermediate data outside the LLM.&lt;/p&gt;

&lt;p&gt;This approach improves clarity in how tasks are executed. The generated code can be inspected, debugged, and understood directly. Each request follows a defined path, which makes behavior easier to track and reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Bifrost CLI in Our Workflow
&lt;/h2&gt;

&lt;p&gt;Getting started required two commands. First, start the gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then launch the CLI from a separate terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MCP servers are registered once through the API. The key flag is &lt;code&gt;is_code_mode_client&lt;/code&gt;, which tells Bifrost to handle that server through Code Mode instead of sending its tool definitions on every request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/api/mcp/client &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "name": "youtube",
    "connection_type": "http",
    "connection_string": "http://localhost:3001/mcp",
    "tools_to_execute": ["*"],
    "is_code_mode_client": true
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once registered, the LLM discovers tools on demand using &lt;code&gt;listToolFiles&lt;/code&gt; and &lt;code&gt;readToolFile&lt;/code&gt;, then submits a full execution plan through &lt;code&gt;executeToolCode&lt;/code&gt;. A workflow that previously took six LLM turns now completes in three to four.&lt;/p&gt;

&lt;p&gt;Bifrost organizes tool definitions using two binding levels. Server-level (default) groups all tools from a server into one &lt;code&gt;.pyi&lt;/code&gt; file. Tool-level gives each tool its own file — better for servers with 30+ tools. Set it once in &lt;code&gt;config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tool_manager_config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"code_mode_binding_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
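
&lt;p&gt;With server-level binding, the youtube server registered earlier surfaces as a single stub file. The following is a hypothetical sketch of what such a &lt;code&gt;.pyi&lt;/code&gt; might contain; the real signatures are derived from the server's tool schemas:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical server-level stub (youtube.pyi) as Code Mode might expose it.
# Real signatures come from the MCP server's tool schemas.
def search(query: str, maxResults: int = 10) -&gt; dict: ...
def getVideoDetails(videoId: str) -&gt; dict: ...
def listComments(videoId: str, maxResults: int = 20) -&gt; dict: ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;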



&lt;p&gt;Debugging became simpler because the generated code is the execution plan. When something went wrong, the issue was visible directly in the code rather than buried in prompt chains. This setup also made execution easier to inspect.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;youtube&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI infrastructure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maxResults&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;titles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;snippet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;titles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;titles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;titles&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated code runs in a Starlark interpreter; Starlark is a restricted dialect of Python. A few constraints to keep in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No import statements, file I/O, or network access&lt;/li&gt;
&lt;li&gt;Classes are not supported; use dictionaries instead&lt;/li&gt;
&lt;li&gt;Tool calls run synchronously; async handling is not required&lt;/li&gt;
&lt;li&gt;Each tool call has a default timeout of 30 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code Mode also works with &lt;a href="https://docs.getbifrost.ai/mcp/agent-mode" rel="noopener noreferrer"&gt;Agent Mode&lt;/a&gt; for automated workflows. The &lt;code&gt;listToolFiles&lt;/code&gt; and &lt;code&gt;readToolFile&lt;/code&gt; tools are always auto-executable since they are read-only. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;executeToolCode&lt;/code&gt; tool only auto-executes if every tool call within the generated code is on the approved list. If any call falls outside that list, Bifrost returns it to the user for approval before running.&lt;/p&gt;

&lt;h2&gt;
  
  
  Impact on Token Usage and System Efficiency
&lt;/h2&gt;

&lt;p&gt;The reduction in token usage came from four specific changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool schemas were sent only when required&lt;/li&gt;
&lt;li&gt;Intermediate outputs stayed within the execution layer&lt;/li&gt;
&lt;li&gt;Repeated context across steps was removed&lt;/li&gt;
&lt;li&gt;Fewer LLM calls were needed, since execution moved to a sandbox and ran in a single flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These changes had a clear effect. Token usage dropped by nearly half, and latency dropped along with it. Execution also became more predictable, since each request followed a defined path with fewer moving parts.&lt;/p&gt;

&lt;p&gt;The broader takeaway is that token cost is a function of system design. Tweaking prompts or trimming outputs helps at the edges, but the bulk of the overhead comes from how the workflow itself is structured.&lt;/p&gt;

&lt;p&gt;LLMs work best when they focus on planning. Managing execution through repeated loops adds cost and introduces variability. A separate execution layer keeps the flow stable and easier to understand. Context also needs careful control. It should be built for each request with only the required information. Letting it grow across steps results in unnecessary overhead and increased token usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Token inefficiency in MCP workflows comes from system design. Bifrost and Code Mode introduced a clear separation between planning and execution. The LLM handles planning, and the runtime handles execution. This brought immediate and measurable improvements in both cost and system behavior.&lt;/p&gt;

&lt;p&gt;If you are working with MCP workflows at scale, &lt;a href="https://www.getmaxim.ai/bifrost" rel="noopener noreferrer"&gt;Bifrost&lt;/a&gt; is worth exploring. The &lt;a href="https://docs.getbifrost.ai/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; provides a good starting point to set up the gateway, connect servers, and run workflows using Code Mode.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Composer 2 is controversial, but my actual experience was solid</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Sat, 21 Mar 2026 06:55:09 +0000</pubDate>
      <link>https://dev.to/arindam_1729/composer-2-is-controversial-but-my-actual-experience-was-solid-5a7h</link>
      <guid>https://dev.to/arindam_1729/composer-2-is-controversial-but-my-actual-experience-was-solid-5a7h</guid>
      <description>&lt;p&gt;I tried Composer 2 properly today, and honestly, if you put all the controversy aside for a second, the model itself is not bad at all.&lt;/p&gt;

&lt;p&gt;In fact, my first impression is that it’s a real upgrade over Composer 1 and 1.5. I gave it a pretty solid test. I asked it to build a full-stack Reddit clone and deploy it too.&lt;/p&gt;

&lt;p&gt;On the first go, it handled most of the work surprisingly well. The deployment also worked, which was a good sign. The main thing that broke was authentication.&lt;/p&gt;

&lt;p&gt;Then on the second prompt, I asked it to fix that, and it actually fixed the auth issue and redeployed the app.&lt;/p&gt;

&lt;p&gt;That said, it was not perfect. There were still some backend issues left that it could not fully solve. So I would not say it is at the level of Claude Opus 4.6 or GPT-5.4 for coding quality.&lt;/p&gt;

&lt;p&gt;But speed-wise, it felt much faster. For me, it was around 5 to 7x faster than Opus 4.6 / GPT-5.4 in actual workflow, and it also feels much more cost-effective.&lt;/p&gt;

&lt;p&gt;That combination matters a lot.&lt;/p&gt;

&lt;p&gt;Because even if the raw coding quality is still below Opus 4.6 / GPT-5.4, the overall experience was smoother than I expected. It gets you from idea to working product much faster, and for a lot of people that tradeoff will be worth it.&lt;/p&gt;

&lt;p&gt;My current take is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better than Composer 1 / 1.5 by a clear margin&lt;/li&gt;
&lt;li&gt;Fast enough to change how often I’d use it&lt;/li&gt;
&lt;li&gt;Good at getting most of the app done quickly&lt;/li&gt;
&lt;li&gt;Still weak enough in backend reliability that I would not fully trust it yet for complex production work&lt;/li&gt;
&lt;li&gt;Not as strong as Opus 4.6 / GPT-5.4 in coding depth, but still very usable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So yeah, I agree with the criticism that it is not on the same level as Opus 4.6 / GPT-5.4 for hard coding tasks (maybe because the base model is Kimi K2.5).&lt;/p&gt;

&lt;p&gt;But I also think some people are dismissing it too quickly. If you judge it as a faster, cheaper, improved Composer, it is genuinely solid.&lt;/p&gt;

&lt;p&gt;I shared a longer breakdown &lt;a href="https://www.youtube.com/watch?v=nv1fcjfC5wg" rel="noopener noreferrer"&gt;here&lt;/a&gt; with the exact build flow, where it got things right, and where it still fell short, in case anyone wants more context.&lt;/p&gt;

</description>
      <category>cursor</category>
      <category>kimi</category>
      <category>ai</category>
      <category>composer</category>
    </item>
    <item>
      <title>Building an AI-Powered Content Moderation API with InsForge Edge Functions</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Fri, 20 Mar 2026 09:55:13 +0000</pubDate>
      <link>https://dev.to/arindam_1729/building-an-ai-powered-content-moderation-api-with-insforge-edge-functions-j0k</link>
      <guid>https://dev.to/arindam_1729/building-an-ai-powered-content-moderation-api-with-insforge-edge-functions-j0k</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Modern applications rely on user-generated content such as comments, reviews, and messages. Platforms must moderate this content to enforce safety policies and maintain compliance. Manual moderation does not scale, so production systems typically rely on automated moderation pipelines powered by AI.&lt;/p&gt;

&lt;p&gt;Traditional implementations require multiple backend services. Developers often provision servers, integrate AI APIs, manage databases, and configure storage separately. This fragmented setup increases operational overhead and slows development. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/InsForge/InsForge" rel="noopener noreferrer"&gt;InsForge&lt;/a&gt; simplifies this architecture by combining Edge Functions, PostgreSQL Database, Storage, and Model Gateway in a single platform. Benchmarks also show that it can deliver &lt;a href="https://insforge.dev/blog/mcpmark-benchmark-results-v2" rel="noopener noreferrer"&gt;~1.6× faster responses and 2.4x lower token usage&lt;/a&gt; compared to fragmented integrations.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will build a production-ready AI moderation API that runs entirely within InsForge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqdx9ozf5ku2uwyr2ypm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqdx9ozf5ku2uwyr2ypm.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Are Building
&lt;/h2&gt;

&lt;p&gt;Here is what we will build: a simple backend moderation workflow composed from InsForge core services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI Moderation API Endpoint:&lt;/strong&gt; We will create an API endpoint using &lt;a href="https://docs.insforge.dev/core-concepts/functions/architecture" rel="noopener noreferrer"&gt;Edge Functions&lt;/a&gt; that accepts user-submitted text content and processes moderation requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-Powered Content Evaluation:&lt;/strong&gt; The API will use Model Gateway to access an AI model that classifies submitted content as SAFE or UNSAFE.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database Storage for Approved Content:&lt;/strong&gt; Approved comments will be stored in a PostgreSQL &lt;a href="https://docs.insforge.dev/core-concepts/database/architecture" rel="noopener noreferrer"&gt;Database managed by InsForge&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attachment Handling with Storage:&lt;/strong&gt; Optional user attachments will be uploaded and stored using &lt;a href="https://docs.insforge.dev/core-concepts/storage/architecture" rel="noopener noreferrer"&gt;Storage Buckets&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated Moderation Response:&lt;/strong&gt; Unsafe content will be rejected immediately, and the API will return a structured moderation response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production-Ready Backend Workflow:&lt;/strong&gt; The moderation pipeline will run entirely within InsForge using Database, Edge Functions, &lt;a href="https://docs.insforge.dev/core-concepts/ai/architecture" rel="noopener noreferrer"&gt;Model Gateway&lt;/a&gt;, and Storage, without external servers or additional infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Project Setup and Repository Structure
&lt;/h2&gt;

&lt;p&gt;Before configuring the backend resources, clone the project repository and review the project structure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Studio1HQ/Content-moderation-Insforge" rel="noopener noreferrer"&gt;Clone the repository&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Studio1HQ/Content-moderation-Insforge
&lt;span class="nb"&gt;cd &lt;/span&gt;content-moderation-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repository contains both the Next.js frontend and the InsForge Edge Function used for moderation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Repository Structure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/app&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Next.js application pages and layouts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/components&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;UI components such as the moderation form&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/lib&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Client utilities for connecting to InsForge APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;insforge-functions/moderate-comment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Edge Function implementation for moderation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;handler.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Serverless function that processes moderation requests&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This structure keeps the frontend and backend logic organized within the same project while allowing the Edge Function to be deployed independently.&lt;/p&gt;

&lt;p&gt;After cloning the repository, proceed with configuring the backend resources in InsForge.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can set up this backend in two ways. Follow the manual steps in this tutorial to create the database, storage bucket, and Edge Function using the dashboard and CLI. Alternatively, you can use InsForge MCP with your AI coding agent to provision the same resources using a single prompt. See the MCP section at the end of the article for the prompt template and instructions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Step 1: Setting Up the Database
&lt;/h2&gt;

&lt;p&gt;InsForge provides a managed PostgreSQL Database that you can configure directly from the dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open the Tables Section&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open your project in the InsForge Dashboard.&lt;/li&gt;
&lt;li&gt;In the left sidebar, select Tables.&lt;/li&gt;
&lt;li&gt;Click the + icon next to Tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3w6xkfmm1hgtpswgto3r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3w6xkfmm1hgtpswgto3r.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Name the table &lt;code&gt;comments&lt;/code&gt; and create the following columns.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;uuid&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Primary key for each comment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;content&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;User submitted comment text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;attachment_url&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;URL for uploaded file (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Moderation result (&lt;code&gt;approved&lt;/code&gt; or &lt;code&gt;rejected&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;created_at&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;timestamp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Time when the comment was created&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Save the Table&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click Create Table to apply the schema.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;comments&lt;/code&gt; table will appear in the Tables panel.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53ywzcg9oeat3v2g9ugd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53ywzcg9oeat3v2g9ugd.png" alt="Image3" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Creating the Edge Function
&lt;/h2&gt;

&lt;p&gt;Next, create the serverless API that will process moderation requests.&lt;/p&gt;

&lt;p&gt;InsForge Edge Functions allow you to run backend logic without managing servers. In this tutorial, the function receives user content, evaluates it using AI, and stores approved results in the database.&lt;/p&gt;

&lt;p&gt;Navigate to the Edge Function directory in the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;insforge-functions/moderate-comment/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside this folder, there will be a file named:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;handler.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file will contain the moderation logic executed by the Edge Function.&lt;/p&gt;

&lt;p&gt;The Edge Function performs the following tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accept a POST request containing user content.&lt;/li&gt;
&lt;li&gt;Send the content to the AI model through Model Gateway.&lt;/li&gt;
&lt;li&gt;Classify the content as SAFE or UNSAFE.&lt;/li&gt;
&lt;li&gt;Upload attachments to Storage if present.&lt;/li&gt;
&lt;li&gt;Insert approved content into the comments table.&lt;/li&gt;
&lt;li&gt;Return a structured moderation response.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All moderation logic runs inside the Edge Function, keeping the backend workflow centralized within InsForge.&lt;/p&gt;
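
&lt;p&gt;To make that flow concrete, below is a minimal sketch of the handler logic. It assumes a Deno-style runtime that serves a &lt;code&gt;(Request) =&gt; Response&lt;/code&gt; handler and an OpenAI-compatible chat endpoint behind Model Gateway; the &lt;code&gt;AI_GATEWAY_URL&lt;/code&gt; and &lt;code&gt;AI_GATEWAY_KEY&lt;/code&gt; variables are hypothetical placeholders, and the storage upload and database insert are reduced to comments. The &lt;code&gt;handler.ts&lt;/code&gt; in the repository is the authoritative version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// handler.ts sketch. Assumptions: Deno-style runtime, OpenAI-compatible
// gateway endpoint; AI_GATEWAY_URL / AI_GATEWAY_KEY are hypothetical names.

async function classify(content: string): Promise&lt;"SAFE" | "UNSAFE"&gt; {
  const res = await fetch(Deno.env.get("AI_GATEWAY_URL") ?? "", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${Deno.env.get("AI_GATEWAY_KEY")}`,
    },
    body: JSON.stringify({
      model: "openai/gpt-4o-mini",
      messages: [
        { role: "system", content: "Classify the user text as SAFE or UNSAFE. Answer with one word." },
        { role: "user", content: content },
      ],
    }),
  });
  const data = await res.json();
  const verdict = data.choices?.[0]?.message?.content?.trim().toUpperCase();
  return verdict === "SAFE" ? "SAFE" : "UNSAFE"; // fail closed on anything unexpected
}

export default async function handler(req: Request): Promise&lt;Response&gt; {
  if (req.method !== "POST") {
    return new Response("Method Not Allowed", { status: 405 });
  }

  const { content } = await req.json();
  const verdict = await classify(content);

  if (verdict === "UNSAFE") {
    // Rejected content is returned immediately; nothing is stored or uploaded.
    return Response.json({ status: "rejected" });
  }

  // Approved path: upload the optional attachment to the "attachments" bucket,
  // then insert the row into the "comments" table through InsForge (omitted here).
  return Response.json({ status: "approved" });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
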

&lt;p&gt;Deploy the function using the InsForge CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;insforge functions deploy moderate-comment--file ./insforge-functions/moderate-comment/handler.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faslzh2e3doyhv84lfjc7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faslzh2e3doyhv84lfjc7.png" alt="Image5" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once deployed, the function becomes available as a backend API endpoint that the frontend application can call.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljbdqd5046ag3genf6us.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljbdqd5046ag3genf6us.png" alt="Image6" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: AI Integration Inside the Function
&lt;/h2&gt;

&lt;p&gt;The moderation logic inside the Edge Function uses Model Gateway, which provides unified access to multiple AI models directly within InsForge.&lt;/p&gt;

&lt;p&gt;Model Gateway allows Edge Functions to call AI models without configuring external API clients or managing provider-specific integrations.&lt;/p&gt;

&lt;p&gt;Open the Model Gateway section in the InsForge dashboard and enable a model for the project.&lt;/p&gt;

&lt;p&gt;For this tutorial, enable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openai/gpt-4o-mini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This model will be used to classify incoming content during moderation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbysoea6ct9gsuzn5ye2v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbysoea6ct9gsuzn5ye2v.png" alt="Image9" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the CLI to send a test request to the moderation API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;insforge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;functions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;invoke&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;moderate-comment--data&lt;/span&gt;&lt;span class="s2"&gt;"{\"&lt;/span&gt;&lt;span class="nx"&gt;content\&lt;/span&gt;&lt;span class="s2"&gt;":\"&lt;/span&gt;&lt;span class="nx"&gt;This&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;community&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;platform&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;very&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;helpful.\&lt;/span&gt;&lt;span class="s2"&gt;"}"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command sends a JSON payload containing the &lt;code&gt;content&lt;/code&gt; field to the Edge Function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngzbpe1ujssmhstwvlyg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngzbpe1ujssmhstwvlyg.png" alt="Image10" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Edge Function also inserts the approved comment into the comments table in the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Configuring InsForge Storage
&lt;/h2&gt;

&lt;p&gt;The moderation workflow also supports optional file uploads using InsForge Storage. Storage provides an S3-compatible object storage system that integrates directly with Edge Functions and the database.&lt;/p&gt;

&lt;p&gt;When a user submits a comment with an attachment, the Edge Function uploads the file to a storage bucket before inserting the comment into PostgreSQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create a Storage Bucket&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open the Storage section in the InsForge dashboard.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to Storage in the sidebar.&lt;/li&gt;
&lt;li&gt;Click Create Bucket.&lt;/li&gt;
&lt;li&gt;Name the bucket: attachments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This bucket will store files uploaded with moderated comments. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7no9lh3kevy5mv8t7n65.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7no9lh3kevy5mv8t7n65.png" alt="Image 10" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The upload operation returns a &lt;strong&gt;public file URL&lt;/strong&gt;, which is stored in the &lt;code&gt;attachment_url&lt;/code&gt; column of the &lt;code&gt;comments&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;The moderation function processes attachments as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user submits content with an optional file.&lt;/li&gt;
&lt;li&gt;The Edge Function evaluates the text using AI moderation.&lt;/li&gt;
&lt;li&gt;If the content is classified as SAFE, the file is uploaded to the attachments bucket.&lt;/li&gt;
&lt;li&gt;The returned file URL is stored in the comments table.&lt;/li&gt;
&lt;li&gt;If the content is UNSAFE, the function rejects the request and no file is uploaded.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This ensures that only approved content and attachments are stored, keeping the storage system aligned with the moderation rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Building the Next.js UI
&lt;/h2&gt;

&lt;p&gt;The repository already includes a &lt;strong&gt;Next.js application&lt;/strong&gt; that provides a simple interface for interacting with the moderation API.&lt;/p&gt;

&lt;p&gt;Navigate to the frontend code inside the &lt;code&gt;src&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key UI Files&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File / Folder&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/app/page.tsx&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Main page that renders the moderation interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/components&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reusable UI components for the moderation workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/lib/insforge.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Utility for connecting the frontend to the InsForge backend&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The UI includes a form where users submit content for moderation.&lt;/p&gt;

&lt;p&gt;The form collects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text content entered by the user&lt;/li&gt;
&lt;li&gt;Optional file attachment&lt;/li&gt;
&lt;li&gt;A submit action that triggers the moderation request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the user submits the form, the application sends a POST request to the Edge Function endpoint.&lt;/p&gt;
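
&lt;p&gt;A minimal sketch of that request is shown below. The base URL comes from the &lt;code&gt;NEXT_PUBLIC_INSFORGE_BASE_URL&lt;/code&gt; environment variable used later during deployment; the &lt;code&gt;/functions/moderate-comment&lt;/code&gt; path is an assumption for illustration, so check the endpoint shown in your dashboard after deploying.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Client-side submit call (sketch; the endpoint path is an assumption).
const BASE_URL = process.env.NEXT_PUBLIC_INSFORGE_BASE_URL;

export async function submitComment(content: string): Promise&lt;{ status: string }&gt; {
  const res = await fetch(`${BASE_URL}/functions/moderate-comment`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content }),
  });
  if (!res.ok) {
    throw new Error(`Moderation request failed with status ${res.status}`);
  }
  return res.json(); // { status: "approved" } or { status: "rejected" }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
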

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zaapex5dbdokcbra9ay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zaapex5dbdokcbra9ay.png" alt="Image11" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The UI handles the API response and updates the interface accordingly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Approved comments appear in the moderation results section.&lt;/li&gt;
&lt;li&gt;Rejected content displays an error message.&lt;/li&gt;
&lt;li&gt;Approved entries are also visible in the comments database table.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup creates a complete workflow where the Next.js UI communicates with the InsForge Edge Function to perform moderation in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using an AI Agent to Build the UI
&lt;/h3&gt;

&lt;p&gt;You can also accelerate this step using an AI coding agent (such as Cursor, Claude Code, or other agent-based tools). Instead of manually writing the UI components, the agent can generate the form, API calls, and component structure based on a prompt.&lt;/p&gt;

&lt;p&gt;Example prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Create&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;Next&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;js&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="nx"&gt;moderation&lt;/span&gt; &lt;span class="nx"&gt;demo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="nx"&gt;Requirements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;A&lt;/span&gt; &lt;span class="nx"&gt;form&lt;/span&gt; &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;textarea&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="nx"&gt;comments&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;An&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt; &lt;span class="nx"&gt;upload&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;A&lt;/span&gt; &lt;span class="nx"&gt;submit&lt;/span&gt; &lt;span class="nx"&gt;button&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Send&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;POST&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;InsForge&lt;/span&gt; &lt;span class="nx"&gt;Edge&lt;/span&gt; &lt;span class="nb"&gt;Function&lt;/span&gt; &lt;span class="nx"&gt;endpoint&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;moderation&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Display&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;moderation&lt;/span&gt; &lt;span class="nf"&gt;result &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;approved&lt;/span&gt; &lt;span class="nx"&gt;or&lt;/span&gt; &lt;span class="nx"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;UI&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Use&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;handle&lt;/span&gt; &lt;span class="nx"&gt;form&lt;/span&gt; &lt;span class="nx"&gt;submission&lt;/span&gt; &lt;span class="nx"&gt;and&lt;/span&gt; &lt;span class="nx"&gt;responses&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6: Testing the API Endpoint
&lt;/h2&gt;

&lt;p&gt;After deploying the Edge Function and setting up the UI, test the moderation workflow to verify that the API behaves correctly. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Submit Safe Content&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Enter a comment through the UI and submit the form.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sz5omlqs9mvejn1zo2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sz5omlqs9mvejn1zo2m.png" alt="Image 12" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Expected behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Edge Function sends the content to the AI moderation model.&lt;/li&gt;
&lt;li&gt;The model classifies the text as SAFE.&lt;/li&gt;
&lt;li&gt;The function inserts the comment into the comments table in PostgreSQL.&lt;/li&gt;
&lt;li&gt;If an attachment is included, the file is uploaded to the attachments storage bucket.&lt;/li&gt;
&lt;li&gt;The API returns an approved response to the frontend.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32e8hp25nekpkmnmwd4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32e8hp25nekpkmnmwd4u.png" alt="Image 13" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, test a rejection case.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3putsxojxrqvnfim1ft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3putsxojxrqvnfim1ft.png" alt="Image 14" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Expected behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Edge Function sends the text to the AI moderation model.&lt;/li&gt;
&lt;li&gt;The model classifies the content as UNSAFE.&lt;/li&gt;
&lt;li&gt;The function immediately returns a rejection response.&lt;/li&gt;
&lt;li&gt;No entry is inserted into the comments table.&lt;/li&gt;
&lt;li&gt;No file is uploaded to Storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu71rplg51e61daejmmuh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu71rplg51e61daejmmuh.png" alt="Image 16" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The table in your InsForge dashboard also reflects the results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrufr30e913dc6304kbk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrufr30e913dc6304kbk.png" alt="Image 17" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Deployment Using InsForge
&lt;/h2&gt;

&lt;p&gt;Once the function and UI are ready, deploy the application using the InsForge CLI. This publishes the Next.js frontend alongside the already-deployed Edge Function and connects everything to the project environment.&lt;/p&gt;

&lt;p&gt;Refer to the &lt;a href="https://insforge.dev/blog/insforge-deployment" rel="noopener noreferrer"&gt;deployment guide here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Authenticate the CLI with your InsForge account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;insforge auth login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Complete the authentication process in the browser. Link the local project directory to your InsForge backend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;insforge &lt;span class="nb"&gt;link&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Select the project created earlier in the InsForge dashboard. This connects the CLI to the correct backend workspace.&lt;/p&gt;

&lt;p&gt;Deploy the Next.js application while passing the required environment variable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;insforge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;deployments&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;deploy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;--env&lt;/span&gt;&lt;span class="s2"&gt;"{\"&lt;/span&gt;&lt;span class="nx"&gt;NEXT_PUBLIC_INSFORGE_BASE_URL\&lt;/span&gt;&lt;span class="s2"&gt;":\"&lt;/span&gt;&lt;span class="nx"&gt;https://your-project.insforge.app\&lt;/span&gt;&lt;span class="s2"&gt;"}"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This environment variable allows the frontend to communicate with the deployed Edge Function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosegkycwonbfpt46vvw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosegkycwonbfpt46vvw9.png" alt="Image 14" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verify the Deployment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After deployment, the application becomes accessible via the InsForge-hosted domain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwej10rcemepkz59bxtit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwej10rcemepkz59bxtit.png" alt="Image 16" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Access the &lt;a href="https://sec3hf94.insforge.site/" rel="noopener noreferrer"&gt;live demo here&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using MCP to Accelerate Development
&lt;/h2&gt;

&lt;p&gt;Instead of manually creating tables, storage buckets, and Edge Functions, you can also configure the backend using &lt;a href="https://docs.insforge.dev/mcp-setup" rel="noopener noreferrer"&gt;Remote MCP (Model Context Protocol)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;MCP exposes InsForge backend capabilities as tools that an AI coding agent can call to provision resources automatically. With a single prompt, the agent can generate the database schema, configure storage, and deploy the moderation function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpt7ojupntpxibvg3snb4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpt7ojupntpxibvg3snb4.png" alt="Image 17" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example prompt used to create this backend workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Create&lt;/span&gt; &lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="n"&gt;moderation&lt;/span&gt; &lt;span class="n"&gt;application&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;InsForge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;Requirements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;Create&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;PostgreSQL&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;named&lt;/span&gt; &lt;span class="nv"&gt;"comments"&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;attachment_url&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;Create&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;storage&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="n"&gt;named&lt;/span&gt; &lt;span class="nv"&gt;"attachments"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;storing&lt;/span&gt; &lt;span class="n"&gt;uploaded&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;Create&lt;/span&gt; &lt;span class="n"&gt;an&lt;/span&gt; &lt;span class="n"&gt;Edge&lt;/span&gt; &lt;span class="k"&gt;Function&lt;/span&gt; &lt;span class="n"&gt;named&lt;/span&gt; &lt;span class="nv"&gt;"moderate-comment"&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;accepts&lt;/span&gt; &lt;span class="n"&gt;POST&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;comment&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sends&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;an&lt;/span&gt; &lt;span class="n"&gt;AI&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;classifies&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;SAFE&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="n"&gt;UNSAFE&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;uploads&lt;/span&gt; &lt;span class="n"&gt;attachments&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="k"&gt;storage&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;present&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;inserts&lt;/span&gt; &lt;span class="n"&gt;approved&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;database&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using MCP, developers can provision backend resources and deploy functions directly from prompts, significantly accelerating backend setup while keeping the same architecture described in this tutorial.&lt;/p&gt;

&lt;p&gt;Refer to the &lt;a href="https://docs.insforge.dev/mcp-setup" rel="noopener noreferrer"&gt;quick demo here&lt;/a&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we built a content moderation API using InsForge Edge Functions, integrated AI-powered classification through Model Gateway, stored approved results in PostgreSQL, and handled optional file uploads with Storage. The entire workflow runs inside InsForge, without external servers or fragmented infrastructure.&lt;/p&gt;

&lt;p&gt;This approach demonstrates how developers can combine Edge Functions, AI integration, database services, and storage to implement production-ready backend APIs with minimal operational overhead.&lt;/p&gt;

&lt;p&gt;If your application relies on user-generated content, moderation pipelines, or AI-assisted workflows, this architecture provides a straightforward and scalable foundation.&lt;/p&gt;

&lt;p&gt;Ready to simplify your backend stack? Explore InsForge’s Edge Functions, Model Gateway, PostgreSQL database, and Storage services to build intelligent APIs without managing infrastructure.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Try &lt;a href="https://github.com/InsForge/InsForge" rel="noopener noreferrer"&gt;InsForge&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quickstart guide &lt;a href="https://github.com/InsForge/InsForge?tab=readme-ov-file#quickstart" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>fullstack</category>
      <category>insforge</category>
      <category>edgefunctions</category>
    </item>
    <item>
      <title>Cursor Composer 2: Features, Pricing, Benchmarks, and Initial Impressions</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Thu, 19 Mar 2026 20:25:28 +0000</pubDate>
      <link>https://dev.to/arindam_1729/cursor-composer-20-features-pricing-benchmarks-and-initial-impressions-19jd</link>
      <guid>https://dev.to/arindam_1729/cursor-composer-20-features-pricing-benchmarks-and-initial-impressions-19jd</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Cursor has released Composer 2, the latest version of its in-house coding model.&lt;/p&gt;

&lt;p&gt;The announcement is focused and fairly easy to summarize. Cursor is making three main claims:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Composer 2 is frontier-level at coding&lt;/li&gt;
&lt;li&gt;it is materially better than previous Composer versions on Cursor’s published benchmarks&lt;/li&gt;
&lt;li&gt;it is priced aggressively enough to be practical for everyday use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination makes the release worth paying attention to. In this post, I’ll walk through what Composer 2 is, what Cursor says improved, how the benchmark results look, what the pricing means, and my initial take on the release.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Composer 2?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tq4zz7n7m00yc7hk1gy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tq4zz7n7m00yc7hk1gy.png" alt="Image1" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Composer 2 is Cursor’s latest in-house coding model.&lt;/p&gt;

&lt;p&gt;Cursor describes it as frontier-level at coding and positions it as a better cost-performance option for agentic software work. The model is now available in Cursor, and the announcement puts most of the emphasis on three areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stronger coding performance&lt;/li&gt;
&lt;li&gt;improved long-horizon task handling&lt;/li&gt;
&lt;li&gt;lower cost than many competing fast models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike some model launches that bundle a large number of product features together, this one is mostly about the model itself. Cursor is not presenting Composer 2 as a general platform shift. It is presenting it as a more capable and more economical coding model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Composer 2 Key Features
&lt;/h2&gt;

&lt;p&gt;The Composer 2 announcement is short, but there are still a few important takeaways.&lt;/p&gt;

&lt;h3&gt;
  
  
  Better coding performance
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F839qounser481rjf4b1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F839qounser481rjf4b1r.png" alt="Image4" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cursor says Composer 2 delivers large improvements on all of the benchmarks it tracks, including Terminal-Bench 2.0 and SWE-bench Multilingual.&lt;/p&gt;

&lt;p&gt;That matters because it suggests the gains are not limited to one internal evaluation. Cursor is showing improvement across several coding-oriented benchmarks rather than relying on a single headline number.&lt;/p&gt;

&lt;h3&gt;
  
  
  Continued pretraining
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5tq4atu9e7ii4hrx0do.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5tq4atu9e7ii4hrx0do.png" alt="Image3" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the most notable details in the post is that these improvements come from Cursor’s first continued pretraining run.&lt;/p&gt;

&lt;p&gt;This is important because continued pretraining is often what gives a model a stronger base before more specialized post-training methods are applied. Cursor is explicitly saying that Composer 2 starts from a better foundation than earlier Composer versions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reinforcement learning for long-horizon tasks
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frphs7vriemh5upww7cyi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frphs7vriemh5upww7cyi.png" alt="Image2" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cursor also says it trains Composer 2 on long-horizon coding tasks using reinforcement learning.&lt;/p&gt;

&lt;p&gt;This is probably the most interesting technical claim in the announcement. Cursor says Composer 2 can solve challenging tasks requiring hundreds of actions. That implies the model is being optimized for sustained multi-step software tasks, not just short code completions or simple edits.&lt;/p&gt;

&lt;h3&gt;
  
  
  A fast variant with the same intelligence
&lt;/h3&gt;

&lt;p&gt;Cursor also introduces a faster Composer 2 variant and says it has the same intelligence.&lt;/p&gt;

&lt;p&gt;That is a useful product choice. Instead of forcing users to pick between a “smart” model and a “fast” model family, Cursor is presenting speed as a deployment option on top of the same underlying capability level.&lt;/p&gt;




&lt;h2&gt;
  
  
  Composer 2 Benchmarks
&lt;/h2&gt;

&lt;p&gt;Cursor publishes three benchmark comparisons in the announcement:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;CursorBench&lt;/th&gt;
&lt;th&gt;Terminal-Bench 2.0&lt;/th&gt;
&lt;th&gt;SWE-bench Multilingual&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Composer 2&lt;/td&gt;
&lt;td&gt;61.3&lt;/td&gt;
&lt;td&gt;61.7&lt;/td&gt;
&lt;td&gt;73.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composer 1.5&lt;/td&gt;
&lt;td&gt;44.2&lt;/td&gt;
&lt;td&gt;47.9&lt;/td&gt;
&lt;td&gt;65.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composer 1&lt;/td&gt;
&lt;td&gt;38.0&lt;/td&gt;
&lt;td&gt;40.0&lt;/td&gt;
&lt;td&gt;56.9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These gains are large enough to be meaningful.&lt;/p&gt;

&lt;p&gt;The biggest point here is not just that Composer 2 is ahead of Composer 1 and 1.5, but that the improvements show up consistently across all three benchmarks. That gives the release more credibility than a single isolated result would.&lt;/p&gt;

&lt;p&gt;Terminal-Bench 2.0 is especially relevant because Cursor frames it as an evaluation for agentic terminal use. If Composer 2 is genuinely stronger there, that supports Cursor’s claim that the model is getting better at longer, more interactive coding tasks.&lt;/p&gt;

&lt;p&gt;SWE-bench Multilingual is also worth noting because it suggests broader coding competence beyond narrow English-only setups.&lt;/p&gt;

&lt;p&gt;Still, these are vendor-published numbers, so the right takeaway is measured optimism rather than certainty.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Composer 2 Is Priced
&lt;/h2&gt;

&lt;p&gt;Cursor says Composer 2 is priced at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$0.50 per million input tokens&lt;/li&gt;
&lt;li&gt;$2.50 per million output tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The faster variant is priced at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$1.50 per million input tokens&lt;/li&gt;
&lt;li&gt;$7.50 per million output tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cursor also says the fast variant has lower cost than other fast models and that fast will be the default option.&lt;/p&gt;

&lt;p&gt;This part of the announcement is more important than it looks. Model releases are usually judged on benchmark quality first, but pricing determines whether a model becomes part of normal daily use or gets reserved for occasional high-value tasks. Cursor is clearly trying to push Composer 2 into the first category.&lt;/p&gt;

&lt;p&gt;On individual plans, Composer usage draws from a standalone usage pool with a generous allowance included.&lt;/p&gt;




&lt;h2&gt;
  
  
  Composer 2 vs Earlier Composer Versions
&lt;/h2&gt;

&lt;p&gt;Based on Cursor’s published table, Composer 2 is a clear step up from Composer 1.5 and Composer 1.&lt;/p&gt;

&lt;p&gt;The improvement is visible across all the benchmarks included in the post, and Cursor attributes that jump to a combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a stronger base model from continued pretraining&lt;/li&gt;
&lt;li&gt;reinforcement learning on long-horizon coding tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a sensible recipe for a coding model. Better base training improves general capability, while long-horizon RL helps the model stay coherent over extended multi-step tasks.&lt;/p&gt;

&lt;p&gt;From the announcement alone, Composer 2 looks like a real model upgrade rather than a minor iteration.&lt;/p&gt;




&lt;h2&gt;
  
  
  Initial Impressions
&lt;/h2&gt;

&lt;p&gt;My first impression is that this is a disciplined release.&lt;/p&gt;

&lt;p&gt;Cursor is not trying to claim that Composer 2 changes everything. The message is narrower and more believable: the model is better, it handles long-horizon coding tasks more effectively, and it is priced aggressively enough to be useful in regular workflows.&lt;/p&gt;

&lt;p&gt;The long-horizon point is the one I would pay most attention to. A lot of coding models can produce a good patch in one pass. Fewer models stay reliable across a task that unfolds over many actions. If Composer 2 is genuinely stronger there, that is a meaningful improvement.&lt;/p&gt;

&lt;p&gt;The pricing is the other major strength. A coding model can be strong on benchmarks and still be awkward in practice if the economics are wrong. Cursor seems to understand that and is making cost a central part of the launch rather than an afterthought.&lt;/p&gt;

&lt;p&gt;At the same time, this is still an announcement built around Cursor’s own evaluation framing. The benchmark gains look strong, but the real test will be whether Composer 2 feels materially better in day-to-day software work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1simb06d45iu9zxnhgo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1simb06d45iu9zxnhgo.png" alt="Image2" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Composer 2 looks like a meaningful upgrade to Cursor’s coding model stack.&lt;/p&gt;

&lt;p&gt;The release is compelling for three reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the benchmark gains are substantial&lt;/li&gt;
&lt;li&gt;the training story is technically coherent&lt;/li&gt;
&lt;li&gt;the pricing is practical&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you already use Cursor, Composer 2 is worth trying.&lt;/p&gt;

&lt;p&gt;If you evaluate coding models more broadly, this release is notable because it tries to improve both capability and economics at the same time. That is the right combination to optimize for.&lt;/p&gt;

</description>
      <category>cursor</category>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Running LLM Applications Across Providers with Bifrost</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Tue, 17 Mar 2026 16:15:23 +0000</pubDate>
      <link>https://dev.to/studio1hq/running-llm-applications-across-providers-with-bifrost-313h</link>
      <guid>https://dev.to/studio1hq/running-llm-applications-across-providers-with-bifrost-313h</guid>
      <description>&lt;p&gt;Many modern applications include AI features that rely on large language models accessed through APIs. When an application sends a prompt to a model and receives a response, that request usually goes through an external service.&lt;/p&gt;

&lt;p&gt;Getting access to different LLMs is easier today. Providers such as &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; and &lt;a href="https://platform.claude.com/" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt; provide model APIs, and platforms like &lt;a href="https://aws.amazon.com/bedrock/" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; and &lt;a href="https://cloud.google.com/vertex-ai" rel="noopener noreferrer"&gt;Google Vertex AI&lt;/a&gt; give access to several models from one place. Because of this, many applications connect to more than one provider to compare models, manage cost, or keep a backup option if one service fails.&lt;/p&gt;

&lt;p&gt;But each provider works a little differently. Authentication methods, rate limits, and request formats are not the same. Managing these differences inside an application can slowly add complexity to the system. In this article, let us explore Bifrost, an open-source LLM gateway that provides a single layer to route requests and manage interactions with multiple model providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Cost of Provider Integrations
&lt;/h2&gt;

&lt;p&gt;Connecting to several LLM providers may look simple at the start. Adding another provider can feel like just integrating one more API.&lt;/p&gt;

&lt;p&gt;That situation changes once the application runs in production. Requests may need to go to different models based on cost, response quality, or latency. If a provider slows down or becomes unavailable, the system must redirect requests to another provider and keep the service running.&lt;/p&gt;

&lt;p&gt;Handling these situations introduces additional logic into the codebase. The application needs to manage how requests are routed between models. It must also include retry logic for failed calls, fallback providers during outages, and tracking for how requests are distributed across models.&lt;/p&gt;

&lt;p&gt;Each of these responsibilities adds work to the system. Over time, operational logic seeps into the application and increases maintenance effort. This overhead is the hidden cost of working directly with multiple model providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Bifrost: A Gateway for LLM Infrastructure
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.getbifrost.ai/overview" rel="noopener noreferrer"&gt;Bifrost&lt;/a&gt; is an &lt;a href="https://github.com/maximhq/bifrost" rel="noopener noreferrer"&gt;open-source&lt;/a&gt; LLM and MCP gateway designed to manage interactions between applications and model providers. It sits between the application and the LLM services and acts as a central layer that controls how requests move between systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsyseg3iy2fg1v6h6yhe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsyseg3iy2fg1v6h6yhe.png" alt="Image1" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Applications often connect directly to each provider they use. Bifrost adds a gateway layer between the application and the providers, so requests pass through a single entry point before reaching the model services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffygdaoyre598cw4i7cdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffygdaoyre598cw4i7cdw.png" alt="Image2" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This structure separates provider management from the application. The application sends requests to one endpoint, and the gateway manages communication with different model providers. Provider configuration and request handling stay inside the gateway layer, reducing provider-specific logic in the application code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Infrastructure Capabilities
&lt;/h2&gt;

&lt;p&gt;Bifrost provides several infrastructure capabilities for managing LLM interactions across providers. These capabilities move provider-specific handling out of the application and into the gateway layer.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-provider routing:&lt;/strong&gt; Bifrost supports multiple AI providers through a single API interface. Applications send requests to one endpoint, and the gateway routes each request to the configured provider or model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing:&lt;/strong&gt; When multiple providers or API keys are configured, Bifrost distributes requests across them based on defined rules. Traffic spreads across providers and reduces the chance of hitting rate limits on a single service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic fallback:&lt;/strong&gt; When a provider returns an error or becomes unavailable, Bifrost sends the request to another configured provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic caching:&lt;/strong&gt; Bifrost stores responses and returns them for semantically similar prompts, reducing repeated API calls and improving response time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Platform Support and Integrations
&lt;/h2&gt;

&lt;p&gt;Bifrost fits environments where applications use multiple models and providers. The gateway exposes an OpenAI-compatible API, so applications that already use OpenAI SDKs can connect with minimal changes and send requests through a single endpoint.&lt;/p&gt;
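
&lt;p&gt;As a minimal sketch of what "minimal changes" means here: the official OpenAI SDKs read their base URL and API key from environment variables, so pointing an existing application at a locally running gateway can be as small as the following (the port is illustrative and depends on how you start Bifrost):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# The OpenAI SDKs honor these environment variables, so the application
# code itself does not change. The address assumes a local gateway on port 8080.
export OPENAI_BASE_URL="http://localhost:8080/v1"
export OPENAI_API_KEY="your-provider-key"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;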

&lt;p&gt;Bifrost works with several &lt;a href="https://docs.getbifrost.ai/providers/supported-providers/overview" rel="noopener noreferrer"&gt;LLM providers&lt;/a&gt;, such as OpenAI, Anthropic, Amazon Bedrock, Google Vertex AI, Cohere, and Mistral. Applications can reach these providers through the same gateway interface.&lt;/p&gt;

&lt;p&gt;The gateway also supports the &lt;a href="https://docs.getbifrost.ai/mcp/overview" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt;. Systems that use MCP can connect tools and external services through the same layer used for model requests. Bifrost also includes a &lt;a href="https://docs.getbifrost.ai/plugins/getting-started" rel="noopener noreferrer"&gt;plugin system&lt;/a&gt; for adding custom behavior such as request validation, logging, or request transformation.&lt;/p&gt;

&lt;p&gt;Bifrost can run using tools such as NPX or Docker and can operate in local setups or production environments. The project is open source under the MIT license and can run across different infrastructure environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Gateway Performance and Benchmark&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A gateway processes every request sent to a model provider. The performance of this layer becomes important in systems that handle a large number of AI requests.&lt;/p&gt;

&lt;p&gt;Bifrost is written in Go, a language often used for backend services that process many requests simultaneously. The system focuses on keeping the extra processing time very small.&lt;/p&gt;

&lt;p&gt;Benchmark tests show that Bifrost adds about 11 microseconds of latency per request at 5,000 requests per second. That is 0.011 milliseconds, a negligible overhead next to model calls that typically take hundreds of milliseconds.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.getbifrost.ai/benchmarking/getting-started" rel="noopener noreferrer"&gt;published benchmarks&lt;/a&gt; were executed on AWS EC2 t3.medium and t3.large instances. These are cloud virtual machines with moderate CPU and memory resources that are commonly used to run backend services and APIs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqud1pe1ewno7lns871w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqud1pe1ewno7lns871w.png" alt="Image3" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bifrost also provides a &lt;a href="https://github.com/maximhq/bifrost-benchmarking" rel="noopener noreferrer"&gt;public benchmarking repository&lt;/a&gt; with the scripts and setup used in the tests. Anyone can run the same tests or perform custom benchmarking based on their own infrastructure, traffic patterns, or model providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Bifrost
&lt;/h2&gt;

&lt;p&gt;Bifrost is designed for quick setup and can run locally or in a server environment. The gateway can start in a few steps and begin routing LLM requests through a single endpoint.&lt;/p&gt;

&lt;p&gt;One way to start Bifrost is by using &lt;strong&gt;NPX&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bifrost can also run using &lt;strong&gt;Docker&lt;/strong&gt;, which allows the gateway to start inside a container environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the gateway starts, applications can send LLM requests to the Bifrost endpoint. The gateway then routes the requests to the configured model providers.&lt;/p&gt;
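
&lt;p&gt;A rough sketch of such a request, assuming a local gateway on port 8080 and an OpenAI-style chat completions route (the model identifier here is illustrative; exact naming depends on your provider configuration, so check the Bifrost docs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Illustrative request against a locally running gateway.
# The route follows the OpenAI-compatible API; the model name is an assumption.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello from Bifrost"}]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;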

&lt;p&gt;Configuration options allow the gateway to define providers, API keys, routing rules, caching behavior, and fallback settings. These configurations control how requests move between different LLM providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Managing several LLM providers inside an application can introduce extra operational logic and maintenance effort. A gateway layer offers a cleaner structure for handling these interactions.&lt;/p&gt;

&lt;p&gt;Bifrost provides this layer by placing a gateway between applications and model providers. Requests go through one endpoint, and the gateway manages routing and provider communication.&lt;/p&gt;

&lt;p&gt;This approach keeps provider integrations outside the core application code and places request management in a separate infrastructure layer.&lt;/p&gt;

&lt;p&gt;To explore configuration options, deployment steps, and additional features, &lt;a href="https://docs.getbifrost.ai/overview" rel="noopener noreferrer"&gt;refer to the official Bifrost documentation&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>proxy</category>
      <category>litellm</category>
    </item>
    <item>
      <title>5 OpenClaw Plugins That Actually Make It Production-Ready</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Fri, 13 Mar 2026 15:19:20 +0000</pubDate>
      <link>https://dev.to/arindam_1729/5-openclaw-plugins-that-actually-make-it-production-ready-14kn</link>
      <guid>https://dev.to/arindam_1729/5-openclaw-plugins-that-actually-make-it-production-ready-14kn</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;There is a certain point every serious &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; user reaches. The agent is running, the setup works, and then slowly, almost without noticing, the cracks show up. A workflow that should take seconds starts requiring three follow-up prompts. The context window fills up with things the agent should already know. The API bill at the end of the month is higher than expected, and there is no clear answer for why.&lt;/p&gt;

&lt;p&gt;Most people at this point start tweaking their skills, adjusting prompts, or switching models, but the problem is usually none of those things.&lt;/p&gt;

&lt;p&gt;OpenClaw's default configuration is designed to get you started, not to match how you actually use it. The real power that makes it suitable for daily professional use lies in the plugin layer, yet most OpenClaw users have never explored it.&lt;/p&gt;

&lt;p&gt;In this post, we are covering five &lt;a href="https://docs.openclaw.ai/tools/plugin" rel="noopener noreferrer"&gt;OpenClaw Plugins&lt;/a&gt;; each solves a different problem and adds a layer that the default setup simply does not have. But before getting into the plugins themselves, it is worth understanding what separates a plugin from a skill.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are OpenClaw Plugins (And Why They're Different from Skills)
&lt;/h2&gt;

&lt;p&gt;If you have spent any time in the OpenClaw community, you have probably seen both terms used interchangeably. They are not the same thing, and the distinction matters more than it seems.&lt;/p&gt;

&lt;p&gt;A skill is a markdown file, specifically a &lt;code&gt;SKILL.md&lt;/code&gt;, that gets injected into the agent's context at inference time. It shapes how the agent thinks, what tone it uses, and what steps it follows. Every time the agent runs, that file loads into the prompt. Skills are useful for behavior, but they come at a cost: they consume tokens on every single request, whether or not they are relevant to what you asked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7n9a9e4277gcv1qhoc0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7n9a9e4277gcv1qhoc0.png" alt="OpenClaw skill vs plugin." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A plugin is fundamentally different. It is a standalone executable that runs as a separate process alongside OpenClaw. Instead of loading into context, it exposes a set of tools through a defined interface that the agent can call when it actually needs them.  OpenClaw loads plugins once at startup and calls into them only when a task requires it. No tokens consumed just by existing.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Install a Plugin
&lt;/h2&gt;

&lt;p&gt;Installing any plugin follows the same pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; &amp;lt;plugin-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command downloads the plugin, registers it in your OpenClaw configuration at &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;, and makes its tools available the next time the agent starts. You can open that file at any time to see which plugins are currently registered and adjust their individual configurations.&lt;/p&gt;

&lt;p&gt;To confirm a plugin is active after installation, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns all registered plugins and their current status. If something is not showing up, a full restart of the OpenClaw daemon is usually all it takes.&lt;/p&gt;

&lt;p&gt;With that covered, here are the five plugins worth adding to your setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. &lt;a href="https://manifest.build/docs/install" rel="noopener noreferrer"&gt;Manifest&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;When you configure OpenClaw, you pick a default model. Claude Opus, GPT-4, whatever you prefer. From that point on, every request, regardless of its type, goes to that model. Asking the agent to list files in a directory costs the same as asking it to debug a race condition across three services. The model does not know the difference, and OpenClaw does not try to make one.&lt;/p&gt;

&lt;p&gt;This is where most API bills quietly spiral. Not from one expensive task, but from hundreds of simple ones hitting a premium model they never needed.&lt;/p&gt;

&lt;p&gt;Manifest sits between OpenClaw and your LLM providers. Every request passes through it before reaching a model. It reads the request, classifies the task complexity, and routes it to the cheapest model capable of handling it. Simple lookups go to lighter models. Reasoning-heavy tasks escalate to whatever model can actually handle them. Routing occurs in milliseconds and is invisible to the agent; it only sees a response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcg691g1sqc289fsfcz0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcg691g1sqc289fsfcz0.png" alt="Manifest plugin routing OpenClaw" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The cost difference compounds fast. Users running OpenClaw through Manifest have reported up to 70% reduction in monthly API spend, not by doing less, but by stopping the habit of paying Opus prices for Haiku-level work. The Manifest dashboard makes this visible: you can see cost broken down per session, per tool call, and per model, so you know exactly where your spend is going and whether the routing decisions are working as expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6cf2c6c4amj9953a552.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6cf2c6c4amj9953a552.png" alt="Manifest Dashboard" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Manifest:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install &lt;/span&gt;manifest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once installed, Manifest registers itself as the default routing layer. You can configure routing thresholds and model preferences in &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; under the &lt;code&gt;manifest&lt;/code&gt; plugin entry.&lt;/p&gt;

&lt;p&gt;Manifest makes the biggest difference in setups where the agent runs long sessions, handles multi-step tasks, or operates overnight without supervision. The more requests flow through OpenClaw, the more the routing logic saves, because the inefficiency it fixes is not a one-time cost; it is per-request.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. &lt;a href="https://composio.dev/toolkits/composio/framework/openclaw" rel="noopener noreferrer"&gt;Composio&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Out of the box, OpenClaw cannot reach your Gmail, Slack, GitHub, or Notion. Not because the agent is incapable, but because every external service requires OAuth authentication, token management, and refresh handling, none of which OpenClaw sets up for you. Most people work around this by manually generating API keys, pasting them into configuration files, and hoping the tokens don't expire mid-session. It works until it does not.&lt;/p&gt;

&lt;p&gt;Composio solves this at the authentication layer. It runs as an MCP server that sits between OpenClaw and every external app you want the agent to reach. You connect your accounts once through the Composio dashboard, and from that point on OpenClaw talks only to Composio, which handles everything else: token refresh, OAuth flows, rate limits, API versioning. None of that touches your OpenClaw config directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumz76k9hvc68dhv6ii1w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumz76k9hvc68dhv6ii1w.png" alt="Composio MCP Server connecting OpenClaw" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each app connection runs in an isolated MCP session. If one integration fails or a token expires, it does not affect the others. The agent continues operating normally while Composio handles the reconnection in the background.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Composio:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install &lt;/span&gt;composio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, connect your apps through the Composio dashboard and add the plugin entry to &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plugins"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"composio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-composio-api-key"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, what this unlocks is straightforward. A single prompt like &lt;em&gt;"summarize my unread emails, open a GitHub issue for anything that needs follow-up, and post a summary to the team Slack channel"&lt;/em&gt; now executes end to end, no switching tabs, no copying API keys, no manual auth setup. The agent has the required access, and Composio ensures it remains valid.&lt;/p&gt;

&lt;p&gt;With 850+ supported apps, Composio covers most of what a professional OpenClaw setup would ever need to reach.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. &lt;a href="https://github.com/hyperspell/hyperspell-openclaw" rel="noopener noreferrer"&gt;Hyperspell&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;OpenClaw's default memory is a &lt;code&gt;MEMORY.md&lt;/code&gt; file. It grows with every session, gets compacted when it reaches a limit, loses information in the process, and reloads entirely on every turn, whether the content is relevant or not. For occasional use, this is fine, but for anyone relying on OpenClaw daily, it becomes a real problem fast.&lt;/p&gt;

&lt;p&gt;Hyperspell replaces this layer entirely. It indexes your connected data sources (emails, documents, and past conversations) into a knowledge graph, then injects only the relevant slice of that graph before each agent turn. The agent gets what it needs, not everything it has ever seen.&lt;/p&gt;

&lt;p&gt;Memory also becomes sharper over time. Every query refines how the knowledge graph is indexed, so context recall improves the more you use it. An agent running Hyperspell can reference a decision you made three weeks ago without you having to bring it up.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Hyperspell:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;openclaw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;plugins&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;@hyperspell/openclaw-hyperspell&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Connect your data sources through the Hyperspell dashboard, then add your API key under the &lt;code&gt;hyperspell&lt;/code&gt; entry in &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;. Context injection is automatic from there.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. &lt;a href="https://github.com/lekt9/openclaw-foundry" rel="noopener noreferrer"&gt;OpenClaw Foundry&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Most workflows repeat. You run the same sequence of tasks every morning, follow the same steps every time a PR needs review, and ask the agent the same three things before a meeting. OpenClaw handles all of these, but it handles them the same way every time, waiting for you to prompt it from scratch. It does not recognize the pattern. It does not try to make things easier on its own.&lt;/p&gt;

&lt;p&gt;Foundry fixes this. It sits in the background during your sessions, watches what you ask for, and, when it detects a recurring pattern, writes a new tool definition into itself. That tool becomes part of the agent's available toolkit the next time you start a session, no manual configuration required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2tv42jbkdfqbbntrf78.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2tv42jbkdfqbbntrf78.png" alt="OpenClaw Foundry plugin" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What makes this different from writing a skill is the output. A skill adds behavioral instructions to the agent's context. Foundry creates an executable tool that the agent can call, with its own inputs and outputs, registered in the tool registry and available on demand.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Foundry:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; @getfoundry/foundry-openclaw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This downloads the plugin from npm, extracts it to &lt;code&gt;~/.openclaw/extensions/foundry/&lt;/code&gt;, enables it automatically, and restarts the gateway. After that, add the following to &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plugins"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"entries"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"foundry"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"autoLearn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"sources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"docs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"experience"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"arxiv"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"github"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"marketplace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"autoPublish"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;autoLearn: true&lt;/code&gt; is the key setting: it tells Foundry to continuously learn from your sessions without requiring you to trigger it manually. The &lt;code&gt;sources&lt;/code&gt; block controls where Foundry pulls additional context when writing new tools: OpenClaw's own documentation, your past session experience, arXiv papers, and public GitHub repos. For most setups, keeping &lt;code&gt;docs&lt;/code&gt; and &lt;code&gt;experience&lt;/code&gt; enabled is enough to start.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. &lt;a href="https://github.com/comet-ml/opik-openclaw" rel="noopener noreferrer"&gt;Opik&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Multi-step agent runs fail in non-obvious ways. A tool call returns incorrect output, a sub-agent silently errors out, or a model call takes 12 seconds on a task that should take 2. Without structured tracing, you are left reading raw logs and guessing. That gets old fast.&lt;/p&gt;

&lt;p&gt;Opik is an open-source LLM and agent observability platform built by Comet ML. The OpenClaw plugin hooks into the gateway process and exports a structured trace for every run: LLM request and response spans, tool call inputs and outputs, sub-agent lifecycle events, latency at each step, and token usage with cost. Every event that matters has a corresponding span in the Opik dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tih007yxok3phfi4enr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tih007yxok3phfi4enr.png" alt="Opik Dashboard" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Image reference: &lt;a href="https://github.com/comet-ml/opik-openclaw" rel="noopener noreferrer"&gt;https://github.com/comet-ml/opik-openclaw&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a different layer from what Manifest covers. Manifest tells you how much a request costs and which model handled it. Opik tells you what the agent actually did inside that request: which tools it called, in what order, what each one returned, and where the run slowed down or failed. The two answer different questions, and neither replaces the other.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Opik:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Requirements: OpenClaw &lt;code&gt;&amp;gt;=2026.3.2&lt;/code&gt;, Node.js &lt;code&gt;&amp;gt;=22.12.0&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; @opik/opik-openclaw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, restart the gateway, then run the setup wizard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw opik configure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This validates your endpoint and API key and automatically writes the config. To verify everything is connected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw opik status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The recommended config in &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"plugins"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"entries"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"opik-openclaw"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;"enabled"&lt;/span&gt;: &lt;span class="nb"&gt;true&lt;/span&gt;,
        &lt;span class="s2"&gt;"config"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"apiKey"&lt;/span&gt;: &lt;span class="s2"&gt;"your-api-key"&lt;/span&gt;,
          &lt;span class="s2"&gt;"apiUrl"&lt;/span&gt;: &lt;span class="s2"&gt;"https://www.comet.com/opik/api"&lt;/span&gt;,
          &lt;span class="s2"&gt;"projectName"&lt;/span&gt;: &lt;span class="s2"&gt;"openclaw"&lt;/span&gt;,
          &lt;span class="s2"&gt;"workspaceName"&lt;/span&gt;: &lt;span class="s2"&gt;"default"&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For teams that cannot send trace data to a third party, Opik is fully self-hostable. Replace &lt;code&gt;apiUrl&lt;/code&gt; with your own instance endpoint, and nothing else changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Where to Go From Here&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Each plugin owns a distinct layer. Hyperspell handles context before the request starts. Manifest handles model routing during it. Composio handles external reach when the agent needs to act. Foundry watches for patterns across sessions and builds tools from them. Opik traces everything after the fact, so you know exactly what happened and why.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F164mtxkcmhpwjyfb6hjr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F164mtxkcmhpwjyfb6hjr.png" alt="openclaw plugins" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;None of them overlap, and none of them are doing the same job twice. You can start with just one, whichever layer is causing the most friction in your current setup, and layer in the rest as your workflow grows.&lt;/p&gt;

&lt;p&gt;Each plugin has its own documentation to read before configuring anything: &lt;a href="https://manifest.build/docs" rel="noopener noreferrer"&gt;&lt;strong&gt;Manifest&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://composio.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;Composio&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://github.com/hyperspell/hyperspell-openclaw" rel="noopener noreferrer"&gt;&lt;strong&gt;Hyperspell&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://github.com/lekt9/openclaw-foundry" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenClaw Foundry&lt;/strong&gt;&lt;/a&gt;, and &lt;a href="https://www.comet.com/docs/opik" rel="noopener noreferrer"&gt;&lt;strong&gt;Opik&lt;/strong&gt;&lt;/a&gt;. The &lt;a href="https://docs.openclaw.ai/tools/plugin" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenClaw plugin docs&lt;/strong&gt;&lt;/a&gt; cover the installation system in full if you want to go deeper on how plugins interact with the gateway.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>programming</category>
      <category>skills</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Sat, 07 Mar 2026 08:35:19 +0000</pubDate>
      <link>https://dev.to/arindam_1729/-554c</link>
      <guid>https://dev.to/arindam_1729/-554c</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/arindam_1729" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F965723%2Fe0982512-4de1-4154-b3c3-1869d19e9ecc.png" alt="arindam_1729"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/arindam_1729/what-is-llm-observability-the-complete-guide-2026-26e6" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;What is LLM Observability? The Complete Guide (2026)&lt;/h2&gt;
      &lt;h3&gt;Arindam Majumder  ・ Mar 6&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#llm&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#observability&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>llm</category>
      <category>observability</category>
      <category>programming</category>
    </item>
    <item>
      <title>What is LLM Observability? The Complete Guide (2026)</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Fri, 06 Mar 2026 07:59:12 +0000</pubDate>
      <link>https://dev.to/arindam_1729/what-is-llm-observability-the-complete-guide-2026-26e6</link>
      <guid>https://dev.to/arindam_1729/what-is-llm-observability-the-complete-guide-2026-26e6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; LLM observability is your ability to understand what your language models are doing in production, not just whether they're up, but whether they're good. &lt;br&gt;
This guide covers everything: what it is, how it differs from traditional monitoring, the four pillars, key metrics, RAG and agent observability, enterprise challenges, the current tools landscape, and how to implement it from scratch.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Shipping an LLM feature is no longer the hard part. Keeping it reliable, fast, and cheap in production is.&lt;/p&gt;

&lt;p&gt;Most engineering teams hit the same wall about three months after their first production deployment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;responses quietly degrade,&lt;/li&gt;
&lt;li&gt;costs balloon unexpectedly,&lt;/li&gt;
&lt;li&gt;a customer escalation surfaces a class of failures nobody noticed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the team realizes they have almost no visibility into what the model is actually doing.&lt;/p&gt;

&lt;p&gt;That's where we need LLM observability.&lt;/p&gt;

&lt;p&gt;In this article, we'll understand what LLM observability is, why traditional monitoring isn't enough, the four pillars you need to instrument, and how to implement it, for simple deployments, RAG pipelines, and agentic systems alike.&lt;/p&gt;

&lt;p&gt;Let's start with the problem that makes all of this necessary.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;You can have a perfect HTTP 200 response from an LLM API and still be on the edge of a production disaster.&lt;/p&gt;

&lt;p&gt;The model responded. No error was thrown. The latency was acceptable. Your uptime check is green.&lt;/p&gt;

&lt;p&gt;And somewhere in that response, the model hallucinated a fact, cited a policy that doesn't exist, or gave a customer the wrong refund amount, with complete confidence, in fluent prose.&lt;/p&gt;

&lt;p&gt;This is the core problem that LLM observability exists to solve. Traditional software monitoring tells you whether your system &lt;em&gt;worked&lt;/em&gt;. LLM observability tells you whether it worked &lt;em&gt;well&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The gap between those two questions is where most enterprise AI deployments silently fail.&lt;/p&gt;

&lt;p&gt;The LLM observability platform market was valued at &lt;a href="https://www.einpresswire.com/article/870921708/large-language-model-llm-observability-platform-market-to-grow-at-363-cagr-2025" rel="noopener noreferrer"&gt;&lt;strong&gt;$1.44 billion in 2024&lt;/strong&gt; and is projected to reach &lt;strong&gt;$6.80 billion by 2029&lt;/strong&gt;&lt;/a&gt;, with a 36.3% CAGR. But market growth doesn't tell you what to do on Monday morning.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is LLM Observability?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkeatmu9enpobhjtc5246.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkeatmu9enpobhjtc5246.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LLM observability is the practice of capturing, analyzing, and acting on telemetry data from large language model applications in production. It gives you the ability to understand the internal state of an LLM system from its outputs, so your team can ensure the models function accurately, reliably, and safely at scale.&lt;/p&gt;

&lt;p&gt;More precisely, LLM observability means capturing inference-level data (token usage, prompt content, response quality, error rates, latency breakdowns, and cost) and correlating it with user interactions to provide a complete, queryable picture of system behavior.&lt;/p&gt;

&lt;p&gt;Before going further, three terms are worth distinguishing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM Monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Continuously tracks performance metrics (latency, token usage, error rates) as a health signal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Goes beyond monitoring &amp;amp; provides in-depth insight into &lt;em&gt;how and why&lt;/em&gt; an LLM behaves the way it does&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM Tracing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Captures request/response flow through an LLM pipeline, tracking inputs, intermediate stages, and outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Observability includes monitoring but adds root-cause analysis. Monitoring tells you &lt;em&gt;something is wrong&lt;/em&gt;. Observability tells you &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why LLM Observability is Different from Traditional APM
&lt;/h2&gt;

&lt;p&gt;Your platform team already runs Datadog, Grafana, New Relic, or some combination. You have RED metrics (Rate, Errors, Duration) on every service. Your infrastructure is well-monitored.&lt;/p&gt;

&lt;p&gt;That's great, but none of these are sufficient for LLMs.&lt;/p&gt;

&lt;p&gt;Traditional APM was designed for deterministic systems. A successful API call means the function completed correctly. An HTTP 200 means the request was handled. These assumptions break completely the moment you introduce a language model.&lt;/p&gt;

&lt;p&gt;Language models are non-deterministic by nature; the same prompt can return five different answers across five calls.&lt;/p&gt;

&lt;p&gt;Here's the fundamental incompatibility:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Traditional APM&lt;/th&gt;
&lt;th&gt;LLM Observability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary question&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Did it work?&lt;/td&gt;
&lt;td&gt;Was it good?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Success signal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;HTTP 200, no exception&lt;/td&gt;
&lt;td&gt;Output quality score&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Input space&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Finite, structured&lt;/td&gt;
&lt;td&gt;Infinite, natural language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deterministic&lt;/td&gt;
&lt;td&gt;Non-deterministic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Failure modes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Errors, timeouts&lt;/td&gt;
&lt;td&gt;Hallucinations, drift, unsafe content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fixed compute&lt;/td&gt;
&lt;td&gt;Variable (token-based)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Debugging unit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stack trace&lt;/td&gt;
&lt;td&gt;Prompt + context + response chain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Analysis type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Threshold alerts&lt;/td&gt;
&lt;td&gt;Semantic search over interactions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most dangerous LLM failures are &lt;strong&gt;silent&lt;/strong&gt;. The model doesn't throw an exception when it makes something up. Degradations happen without a single error being raised.&lt;/p&gt;

&lt;p&gt;A customer service bot starts giving inaccurate answers. A RAG pipeline silently returns less relevant chunks after an index update. A fine-tuned model's quality drifts after a base model version change by the API provider. None of these appear in your existing dashboards.&lt;/p&gt;

&lt;p&gt;LLM observability doesn't replace traditional APM; it supplements it. You still need infrastructure health monitoring. You just need an additional layer that understands what's happening at the &lt;em&gt;content&lt;/em&gt; level, not just the &lt;em&gt;infrastructure&lt;/em&gt; level.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Pillars of LLM Observability
&lt;/h2&gt;

&lt;p&gt;Traditional observability has three pillars: logs, metrics, and traces. LLM observability keeps all three and adds a fourth that doesn't exist in traditional software monitoring at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 1: Metrics
&lt;/h3&gt;

&lt;p&gt;Metrics in LLM observability cover two distinct layers, infrastructure performance and business quality, and you need both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62gl5uh2tinzzve4y9ml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62gl5uh2tinzzve4y9ml.png" alt="Explanation" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Time to First Token (TTFT)&lt;/strong&gt; is the elapsed time between when a request is sent and when the first response token arrives. It's the primary latency signal for streaming interfaces because it's what users &lt;em&gt;feel&lt;/em&gt; as "waiting." TTFT has two components: the prefill phase (the model processes all input tokens and builds its KV-cache) plus any scheduling or queue time. The longer the prompt, the higher the TTFT. This is why prompt size distribution is a leading indicator for latency degradation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Token throughput (TPS)&lt;/strong&gt; is the number of output tokens generated per second after the first token. This is your compute load metric. It's not the same as requests per second; a single request can be 50 tokens or 50,000. Token throughput tells you what the model is actually doing; RPS tells you how many requests arrived. Both matter, for different reasons.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;End-to-end latency&lt;/strong&gt; is what most teams track. But the unit that matters is percentiles, not averages. LLM response time distributions are heavily skewed by prompt length, context size, and load. p50 is your median user experience. p95 is your bad-day experience. p99 is where your SLA commitments should live. Average latency in a skewed distribution is a lie; it will always look better than what your worst users actually experience. A quick way to check this on raw samples is sketched after the figure below.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30o0gclufzfhic4pqk66.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30o0gclufzfhic4pqk66.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zczl9ra3w4499zthote.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zczl9ra3w4499zthote.png" alt="Error Metrics" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not all LLM errors are equal, and treating them as a single "error rate" metric is one of the most common mistakes in inference monitoring.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;4xx errors&lt;/strong&gt;: malformed requests, invalid parameters, context length violations. Client problem. The application team owns it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;429s&lt;/strong&gt;: rate limit exhaustion. Capacity problem. Needs a quota increase or traffic shaping, not a bug fix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5xx errors&lt;/strong&gt;: infrastructure failures, model unavailability. Infra team incident. Pages differently from a 4xx.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When these are merged into a single error rate, a 429 storm looks identical to an infrastructure outage. You escalate to the wrong team, debug the wrong layer, and waste hours while your users are already feeling the impact.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.tokenfactory.nebius.com/ai-models-inference/observability" rel="noopener noreferrer"&gt;Nebius Token Factory&lt;/a&gt; separates these three error classes natively   each tracked as its own signal, filterable by endpoint, region, and API key. This is how error monitoring should work by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quality metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma73sb6rwfxfhuf5baes.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma73sb6rwfxfhuf5baes.png" alt="Quality metrics" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These are unique to LLM observability and have no equivalent in traditional APM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Groundedness score:&lt;/strong&gt; Alignment between the response and source documents (critical for RAG)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relevance score:&lt;/strong&gt; How well the response addresses the actual user query&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faithfulness:&lt;/strong&gt; Whether the response is supported by the retrieved context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination rate:&lt;/strong&gt; Percentage of responses containing fabricated information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refusal rate:&lt;/strong&gt; Proportion of queries declined by safety filters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kcmxmco6h0jc71c0n3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kcmxmco6h0jc71c0n3y.png" alt="cost-metrics" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Token consumption drives your AI budget, and it can grow invisibly fast.&lt;/p&gt;

&lt;p&gt;Track cost per request broken down by model, endpoint, team, and API key. Track input vs. output token ratios: a single bloated system prompt, replicated across millions of requests, can materially change your monthly bill. And track cache hit ratio, since cached input tokens are typically billed at a steep discount.&lt;/p&gt;

&lt;p&gt;Most teams that audit their token usage find that a &lt;a href="https://www.helicone.ai/blog/monitor-and-optimize-llm-costs" rel="noopener noreferrer"&gt;30–50% reduction is achievable without any quality loss&lt;/a&gt;.&lt;/p&gt;
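
&lt;p&gt;The arithmetic is worth making concrete. A sketch with &lt;em&gt;hypothetical&lt;/em&gt; prices; substitute your provider's actual per-million-token rates:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Cost per request from token counts. Prices are PLACEHOLDERS, not real rates.
PRICE_PER_M = {  # model: (input_usd, output_usd) per 1M tokens -- illustrative
    "model-a": (0.50, 1.50),
    "model-b": (2.00, 8.00),
}

def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -&gt; float:
    in_price, out_price = PRICE_PER_M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A bloated 3,000-token system prompt replicated across 1M requests/month:
per_request = request_cost_usd("model-b", input_tokens=3_000, output_tokens=0)
print(f"${per_request * 1_000_000:,.0f}/month spent on the system prompt alone")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;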

&lt;p&gt;&lt;a href="https://docs.tokenfactory.nebius.com/ai-models-inference/observability" rel="noopener noreferrer"&gt;Nebius Token Factory&lt;/a&gt; tracks token throughput broken down by project, endpoint, and API key out of the box   giving you the per-consumer cost visibility that most teams have to build custom tooling for.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcey3r2oh7ufci93fook.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcey3r2oh7ufci93fook.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Pillar 2: Traces
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2j4lt2ulzeb9vufcva9h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2j4lt2ulzeb9vufcva9h.png" alt="Traces" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Traces are the backbone of LLM observability. A trace captures the complete lifecycle of a request as it moves through every component of your system, from the user's input through retrieval, re-ranking, the LLM call, post-processing, and back to the user.&lt;/p&gt;

&lt;p&gt;The trace hierarchy for LLM applications:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Session&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A multi-turn user conversation (groups related traces)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Trace&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complete end-to-end request lifecycle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Span&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A discrete unit of work within the trace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Generation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A single LLM call, prompt in, completion out&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retrieval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A RAG document fetch operation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool Call&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;An external API call made by an agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Event&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A state milestone within a span&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What each generation span should capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The exact prompt content (or a hash, if privacy-constrained)&lt;/li&gt;
&lt;li&gt;Model name and version&lt;/li&gt;
&lt;li&gt;Temperature and sampling parameters&lt;/li&gt;
&lt;li&gt;Input token count and output token count&lt;/li&gt;
&lt;li&gt;TTFT, total latency, and per-token latency&lt;/li&gt;
&lt;li&gt;Cost in USD&lt;/li&gt;
&lt;li&gt;Evaluation scores&lt;/li&gt;
&lt;li&gt;Tool arguments and returns (for agents)&lt;/li&gt;
&lt;li&gt;Retrieved document chunks and relevance scores (for RAG)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The critical discipline: &lt;strong&gt;propagate a single trace_id through every layer of the system.&lt;/strong&gt; Application → retriever → guardrails → model call → post-processing. Without this thread, distributed traces are incoherent. Debugging becomes guessing.&lt;/p&gt;
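
&lt;p&gt;A sketch of both disciplines, the span attributes and the propagation, using the OpenTelemetry Python API. The attribute names loosely follow the OTel GenAI semantic conventions, and &lt;code&gt;call_model&lt;/code&gt; is a placeholder for your client call:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Generation span sketch with the OpenTelemetry Python API.
# call_model() is a placeholder; attribute names loosely follow the
# OTel GenAI semantic conventions.
from opentelemetry import trace

tracer = trace.get_tracer("llm-app")

def generate(prompt: str) -&gt; str:
    with tracer.start_as_current_span("generation") as span:
        span.set_attribute("gen_ai.request.model", "model-a")   # illustrative
        span.set_attribute("gen_ai.request.temperature", 0.2)
        completion, usage = call_model(prompt)                  # placeholder
        span.set_attribute("gen_ai.usage.input_tokens", usage.input_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", usage.output_tokens)
        return completion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the span is opened as the &lt;em&gt;current&lt;/em&gt; span, any retrieval or tool spans created inside it inherit the same trace_id through the OTel context automatically, which is exactly the propagation discipline described above.&lt;/p&gt;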




&lt;h3&gt;
  
  
  Pillar 3: Logs
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo6io0nhhimappvvno0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo6io0nhhimappvvno0z.png" alt="Logs" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LLM logs differ from traditional application logs in one fundamental way: the payload is unstructured natural language, not structured error codes.&lt;/p&gt;

&lt;p&gt;A JSON log of a successful API call tells you almost nothing about whether that call was valuable. You need a different approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended log structure for every LLM call:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;trace_id              # Propagated from parent span
timestamp
model_id
model_provider
input_tokens
output_tokens
latency_ms
ttft_ms
cost_usd
user_id               # If applicable
session_id
environment           # dev / staging / prod
application_name
error_type            # If applicable
evaluation_scores     # groundedness, relevance, etc.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Privacy-first default:&lt;/strong&gt; Log metadata, not content. Token counts, model names, latency, and trace IDs by default, not raw prompt and response content. Prompts frequently contain PII. Enable full content capture only for authenticated sessions with explicit data governance controls.&lt;/p&gt;
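
&lt;p&gt;A sketch of that default in practice: the record carries counts, latency, and a content hash, never the text itself. Field names mirror the schema above; &lt;code&gt;print&lt;/code&gt; stands in for your real log sink:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Privacy-first log record: metadata and a hash, never raw prompt text.
import hashlib
import json
import time

def log_llm_call(trace_id: str, model_id: str, prompt: str,
                 input_tokens: int, output_tokens: int, latency_ms: float):
    record = {
        "trace_id": trace_id,
        "timestamp": time.time(),
        "model_id": model_id,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
        "environment": "prod",
    }
    print(json.dumps(record))   # swap in your real sink

log_llm_call("tr-123", "model-a", "user prompt here", 812, 64, 930.5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;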

&lt;p&gt;A semantic search capability over stored prompts is eventually necessary at scale. Traditional log indexing is insufficient for natural language; you can't grep your way to the root cause in a production LLM incident when the relevant signal is "prompts that started producing low groundedness scores after Tuesday."&lt;/p&gt;




&lt;h3&gt;
  
  
  Pillar 4: Evaluation
&lt;/h3&gt;

&lt;p&gt;This is the pillar that has no equivalent in traditional software observability. It is unique to LLM systems, and it is the pillar most teams skip.&lt;/p&gt;

&lt;p&gt;Evaluation is the practice of systematically assessing the quality of LLM outputs. Not just whether requests were completed, but whether the responses were &lt;em&gt;good&lt;/em&gt; by the standards your application requires.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LLM-as-Judge:&lt;/strong&gt; A separate LLM evaluates the outputs of your primary LLM. Scores across dimensions: relevance, accuracy, coherence, safety, tone. Scales to production traffic volumes without human bottlenecks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Online evaluation:&lt;/strong&gt; Running evals on live traffic in real-time. Flags outputs that fall below quality thresholds as they happen.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Offline evaluation:&lt;/strong&gt; Running evals on captured traces as a batch job. Slower but more thorough, uses more expensive evaluators on a sample of historical traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Human annotation:&lt;/strong&gt; The gold standard for precision. Human labelers reviewing flagged outputs and feeding corrections back into training datasets.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The practical implementation: define 3–5 evaluation metrics you will actually act on. Tracking 20 metrics and acting on none is worse than tracking 3 rigorously. For most applications, a good starting set is: groundedness (for RAG), response relevance, format compliance, and safety.&lt;/p&gt;
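
&lt;p&gt;A minimal LLM-as-Judge sketch for the groundedness metric, assuming an OpenAI-compatible endpoint. The judge model name, base URL, and rubric are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# LLM-as-Judge sketch against an OpenAI-compatible API.
# Model name, base_url, and rubric are placeholders -- swap in your own.
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="...")

RUBRIC = ("Score the ANSWER for groundedness in the CONTEXT from 1 to 5. "
          "Reply with the number only.")

def judge_groundedness(context: str, answer: str) -&gt; int:
    resp = client.chat.completions.create(
        model="judge-model",   # placeholder
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"CONTEXT:\n{context}\n\nANSWER:\n{answer}"},
        ],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;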




&lt;h2&gt;
  
  
  LLM Observability for RAG Pipelines
&lt;/h2&gt;

&lt;p&gt;RAG (Retrieval-Augmented Generation) is now one of the most common LLM deployment patterns in the enterprise. It also introduces multiple independent observable stages, each of which can fail or degrade silently.&lt;/p&gt;

&lt;p&gt;The observable stages in a RAG pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Query processing&lt;/strong&gt;: User query embedding, query rewriting, or expansion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: Vector database lookup, hybrid search, metadata filtering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reranking&lt;/strong&gt;: Scoring and ordering retrieved chunks by relevance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context assembly&lt;/strong&gt;: Stuffing ranked chunks into the prompt within context window limits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt;: The LLM call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post-processing&lt;/strong&gt;: Citation extraction, formatting, safety filtering&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each stage has its own failure mode. Slow retrieval. Poor reranking quality. Context window overflow. Hallucination despite good retrieval. Good retrieval that the model ignores.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG-specific metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context relevance:&lt;/strong&gt; Are the retrieved chunks actually relevant to the query?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faithfulness/groundedness:&lt;/strong&gt; Does the generated answer stay within what the retrieved documents support? (This is your RAG-specific hallucination detector.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer relevance:&lt;/strong&gt; Does the response address the original question?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recall@k:&lt;/strong&gt; Of all truly relevant documents in the corpus, what fraction was retrieved? (See the sketch after this list.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk hit rate:&lt;/strong&gt; How often at least one retrieved chunk was genuinely useful&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window utilization:&lt;/strong&gt; Truncation rates and overflow events&lt;/li&gt;
&lt;/ul&gt;
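
&lt;p&gt;Recall@k, referenced above, is the simplest of these to compute offline against a labeled set:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Recall@k: of all truly relevant docs, what fraction made it into the top-k?
def recall_at_k(retrieved_ids: list, relevant_ids: set, k: int) -&gt; float:
    hits = len(set(retrieved_ids[:k]) &amp; relevant_ids)
    return hits / len(relevant_ids) if relevant_ids else 0.0

print(recall_at_k(["d3", "d7", "d1"], {"d1", "d2"}, k=3))   # 0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;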

&lt;p&gt;The key insight for RAG observability: &lt;strong&gt;evaluate components independently from the full pipeline.&lt;/strong&gt; Good retrieval does not guarantee good answers. Good answers can sometimes emerge from poor retrieval.&lt;/p&gt;

&lt;p&gt;You need both component-level evaluation (retrieval quality) and end-to-end evaluation (answer quality), and you need to track them separately, because they can disagree.&lt;/p&gt;

&lt;p&gt;Wrap your retrieval function in a span that captures: which documents were fetched, their relevance scores, the query that produced them, and the latency. Link this span to the parent trace. When a user gets a bad answer, you need to be able to answer: "Was the retrieval bad, or was the model bad with good data?"&lt;/p&gt;
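
&lt;p&gt;A sketch of that retrieval span, again with the OpenTelemetry Python API; &lt;code&gt;vector_store&lt;/code&gt; is a placeholder for your retriever, and the span's own duration captures the latency:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Retrieval span: capture docs, scores, and query, linked to the parent
# trace through the current OTel context. vector_store is a placeholder.
from opentelemetry import trace

tracer = trace.get_tracer("rag-pipeline")

def retrieve(query: str):
    with tracer.start_as_current_span("retrieval") as span:
        span.set_attribute("retrieval.query", query)
        docs = vector_store.search(query, top_k=5)   # placeholder retriever
        span.set_attribute("retrieval.doc_ids", [d.id for d in docs])
        span.set_attribute("retrieval.scores", [d.score for d in docs])
        return docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;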

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlaos7mjejjn0gtzdde7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlaos7mjejjn0gtzdde7.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  LLM Observability for Agentic Systems
&lt;/h2&gt;

&lt;p&gt;Agents are the hardest observability problem in the LLM space.&lt;/p&gt;

&lt;p&gt;In a simple LLM call, you have one input and one output. In an agentic system, a single user request might execute 15 LLM calls across multiple models, trigger 8 tool calls, read from memory, spawn sub-agents, and make decisions at branching points, all before returning a response.&lt;/p&gt;

&lt;p&gt;None of this is visible in standard API logs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why agents are uniquely hard:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Execution path variability:&lt;/strong&gt; The same input can produce completely different execution paths across runs. Slight phrasing changes, ambiguous instructions, or small differences in retrieved memory can produce different tool call sequences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Depth:&lt;/strong&gt; Multi-hop reasoning chains mean errors can originate deep in the chain, far from where symptoms appear.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal span:&lt;/strong&gt; Agent tasks can run for minutes to hours. The request/response tracing model breaks down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent failure:&lt;/strong&gt; An agent might silently terminate early, return a partial result, or use the wrong tool, without any error being raised.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;According to &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;LangChain's State of AI Agents report&lt;/a&gt;, &lt;strong&gt;89% of organizations&lt;/strong&gt; have implemented some form of observability for their AI agents. But &lt;strong&gt;quality issues remain the primary production barrier&lt;/strong&gt;, cited by 32% of organizations, because monitoring that the agent &lt;em&gt;ran&lt;/em&gt; is not the same as monitoring that the agent &lt;em&gt;worked correctly&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Required trace data for agents:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every LLM call: prompt, completion, token usage, latency, cost&lt;/li&gt;
&lt;li&gt;Every tool call: function name, arguments passed, return value, latency&lt;/li&gt;
&lt;li&gt;Every memory read/write&lt;/li&gt;
&lt;li&gt;Every branching decision with the reasoning that led to it&lt;/li&gt;
&lt;li&gt;Session context across multiple turns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The single most important capability:&lt;/strong&gt; Step-level trace reconstruction. For any failed or low-quality agent run, you must be able to reconstruct the exact sequence of decisions the agent made. Not just "the agent returned a bad result", but "at step 7, the agent called the search tool with query X, received result Y, and then decided Z, which led to the failure."&lt;/p&gt;

&lt;p&gt;Without that capability, debugging agentic failures is not debugging. It's guessing.&lt;/p&gt;
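
&lt;p&gt;A sketch of what step-level capture looks like in an agent loop. The decision and tool-dispatch functions are placeholders for your agent framework; the point is the append-only step record:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# One append-only record per agent decision, so any failed run can be
# replayed step by step. decide_next_action / execute_tool / persist
# are placeholders for your agent framework and log sink.
import time

def run_agent(task: str, trace_id: str, max_steps: int = 10):
    steps, state = [], task
    for i in range(max_steps):
        decision = decide_next_action(state)                  # LLM call
        result = execute_tool(decision.tool, decision.args)   # tool call
        steps.append({
            "trace_id": trace_id, "step": i, "ts": time.time(),
            "tool": decision.tool, "args": decision.args,
            "result_summary": str(result)[:200],
            "reasoning": decision.reasoning,                  # why this step
        })
        if decision.tool == "finish":
            break
        state = result
    persist(steps)   # this is what you replay when step 7 went wrong
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;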




&lt;h2&gt;
  
  
  Enterprise LLM Observability Challenges
&lt;/h2&gt;

&lt;p&gt;For enterprises, observability is not just a developer experience concern. It's a compliance, security, and governance concern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Privacy
&lt;/h3&gt;

&lt;p&gt;LLMs in enterprise settings operate on sensitive data, such as customer records, contracts, source code, financial data, and medical records. Standard observability that logs raw prompts can inadvertently create a regulated data lake nobody intended to build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The practical implications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PII in prompts:&lt;/strong&gt; User inputs frequently contain names, account numbers, addresses, and medical information. Your observability pipeline needs PII scrubbing middleware before data is written to any storage system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt content as IP:&lt;/strong&gt; Even without PII, system prompts may encode proprietary business logic, product roadmaps, or trade secrets. Who has access to that data in your observability tool?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data residency:&lt;/strong&gt; For EU-based or regulated-industry teams, GDPR, HIPAA, and financial services regulations may prohibit transmitting prompt data to US-based SaaS platforms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The default should be: log metadata, not content. Enable full prompt/response logging only where it's been explicitly reviewed and approved.&lt;/p&gt;
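
&lt;p&gt;Where content capture is approved, scrub before storage. A regex-based sketch; these patterns are illustrative, not exhaustive, and production systems should treat a dedicated PII-detection library as the first line of defense:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Regex-based PII scrubbing before anything hits storage.
# Illustrative patterns only -- not an exhaustive PII detector.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scrub(text: str) -&gt; str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(scrub("Reach me at jane@example.com, SSN 123-45-6789."))
# Reach me at [EMAIL], SSN [SSN].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;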

&lt;p&gt;For teams running under GDPR or strict data residency requirements, Nebius Token Factory stores observability metrics in the EU-North region regardless of where inference runs, and supports Zero Data Retention mode, where requests and outputs are never stored or reused. Both options exist because the right choice depends on your regulatory context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compliance Requirements
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;EU AI Act:&lt;/strong&gt; Requires transparency, risk documentation, and human oversight for high-risk AI systems. Observability is the technical foundation for compliance; you cannot document risk you cannot measure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GDPR:&lt;/strong&gt; Data protection, right to explanation, right to erasure. Every AI-assisted decision that affects a user must be traceable. "Right to erasure" is particularly complex for LLMs; models don't support selective forgetting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HIPAA:&lt;/strong&gt; Any LLM processing patient information requires audit trails of every interaction, Business Associate Agreements (BAAs) with every vendor in the data path (including your observability tool), and strict access controls.&lt;/p&gt;

&lt;p&gt;This is not theoretical. These requirements apply to production LLM deployments today, and the cost of non-compliance dwarfs the cost of proper observability infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Tenancy
&lt;/h3&gt;

&lt;p&gt;Enterprises serving multiple internal teams or external customers from shared LLM infrastructure face a set of observability challenges that single-tenant deployments don't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost attribution&lt;/strong&gt; requires per-request metadata tagging (team ID, product, and customer ID) so you can do showback and chargeback across cost centers. Without this, your AI costs are opaque at the organizational level, even if your infrastructure team can see the aggregate bill.&lt;/p&gt;

&lt;p&gt;Nebius Token Factory's per-API-key and per-project filtering makes this practical without additional instrumentation: each consumer's token usage, latency, and error rates are filterable independently.&lt;/p&gt;
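
&lt;p&gt;If you're building showback in-house, the core is just per-request tagging plus aggregation; a sketch with illustrative records:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Showback sketch: tag every request with its cost center, then aggregate.
from collections import defaultdict

usage_log = [  # illustrative records from your request log
    {"team": "search",  "customer": "acme",   "cost_usd": 0.250},
    {"team": "search",  "customer": "globex", "cost_usd": 0.500},
    {"team": "support", "customer": "acme",   "cost_usd": 0.125},
]

cost_by_team = defaultdict(float)
for rec in usage_log:
    cost_by_team[rec["team"]] += rec["cost_usd"]

print(dict(cost_by_team))   # {'search': 0.75, 'support': 0.125}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;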

&lt;p&gt;&lt;strong&gt;Data isolation&lt;/strong&gt; means Tenant A's prompts and conversation history must never appear in Tenant B's context. Your retrieval layer and vector stores must support tenant-scoped filtering, and your observability data access must be role-gated by the same tenant boundaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Differential SLAs&lt;/strong&gt; require per-tenant performance tracking. If you've committed to different latency or availability targets for different customers or teams, you need to monitor against those targets independently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Non-Determinism at Scale
&lt;/h3&gt;

&lt;p&gt;A production LLM can return a different response to the same prompt across runs. This isn't a bug; it's inherent to how these systems work. But it fundamentally changes what monitoring means.&lt;/p&gt;

&lt;p&gt;You cannot monitor LLM quality as binary pass/fail. You have to monitor distributions, trends, and statistical anomalies. A model behaving "correctly" on 97% of requests is very different from one behaving correctly on 83%, yet neither registers as a hard failure. You need statistical significance thresholds, not simple error-rate alerts.&lt;/p&gt;
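
&lt;p&gt;A sketch of what "statistical, not binary" means in practice: a two-proportion z-test comparing today's eval pass rate against a baseline window. The thresholds are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Two-proportion z-test: has the eval pass rate shifted beyond sampling noise?
import math

def proportion_shift_z(passes_a: int, n_a: int, passes_b: int, n_b: int) -&gt; float:
    p_a, p_b = passes_a / n_a, passes_b / n_b
    pooled = (passes_a + passes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

z = proportion_shift_z(970, 1000, 830, 1000)   # 97% baseline vs. 83% today
if abs(z) &gt; 1.96:   # ~95% confidence, illustrative threshold
    print(f"quality shift is statistically significant (z={z:.1f})")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;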

&lt;p&gt;Model provider updates compound this. When OpenAI or Anthropic updates their base model, your fine-tuned model or prompt template may produce different outputs without any change on your end. Observability is the only way to catch this before users do.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Current Tool Landscape
&lt;/h2&gt;

&lt;p&gt;The LLM observability tool market has matured into three categories. Choosing the wrong one for your situation creates lock-in, privacy exposure, or observability blind spots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Purpose-Built LLM Observability Platforms
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://langfuse.com" rel="noopener noreferrer"&gt;Langfuse&lt;/a&gt;&lt;/strong&gt;: Open source (MIT license), self-hostable, framework-agnostic. Best for teams with data residency requirements or those who want full control without per-request SaaS pricing at scale.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.langchain.com/langsmith" rel="noopener noreferrer"&gt;LangSmith&lt;/a&gt;&lt;/strong&gt;: Commercial, deep LangChain/LangGraph integration. Best for teams already invested in the LangChain ecosystem who want native debugging tools.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.helicone.ai" rel="noopener noreferrer"&gt;Helicone&lt;/a&gt;&lt;/strong&gt;: Proxy-based integration (change one URL, logging starts immediately). Best for fast time-to-value when using vanilla API calls.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://phoenix.arize.com" rel="noopener noreferrer"&gt;Arize Phoenix&lt;/a&gt;&lt;/strong&gt;: Open source, OpenTelemetry-native, strong for LLM and RAG evaluation. Best for teams wanting vendor-neutral instrumentation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.getmaxim.ai" rel="noopener noreferrer"&gt;Maxim AI&lt;/a&gt;&lt;/strong&gt;: Ultra-low latency gateway (&amp;lt;11 microseconds at 5,000 RPS) plus evaluation. Best for teams needing gateway + eval + observability in one platform.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enterprise APM Platforms Adding LLM Support
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.datadoghq.com/product/llm-observability/" rel="noopener noreferrer"&gt;Datadog LLM Observability&lt;/a&gt;&lt;/strong&gt;: Native OTel GenAI Semantic Convention support (v1.37+) and "AI Guard" for real-time security guardrails. Best for enterprises already standardized on Datadog who want LLM observability in the same pane of glass as infrastructure monitoring.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://grafana.com" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt; + &lt;a href="https://openlit.io" rel="noopener noreferrer"&gt;OpenLIT&lt;/a&gt;&lt;/strong&gt;: For teams already running Prometheus/Grafana, OpenLIT (&lt;code&gt;pip install openlit&lt;/code&gt;, two lines of setup) exports LLM metrics via OTLP to Grafana Cloud. Best for teams who want to stay in their existing observability stack.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.splunk.com" rel="noopener noreferrer"&gt;Splunk&lt;/a&gt;&lt;/strong&gt;: Hallucination detection, drift management, compliance audit trails. Best for enterprise security/compliance teams already standardized on Splunk.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
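
&lt;p&gt;The OpenLIT setup mentioned above really is two lines; the OTLP endpoint below is a placeholder for your own collector:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# OpenLIT auto-instruments common LLM SDKs and exports via OTLP.
import openlit

openlit.init(otlp_endpoint="http://localhost:4318")   # placeholder endpoint
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;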

&lt;h3&gt;
  
  
  Inference Platforms with Built-in Observability
&lt;/h3&gt;

&lt;p&gt;An often-overlooked category: inference platforms that instrument your models at the infrastructure level, without requiring application-layer SDKs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://tokenfactory.nebius.com" rel="noopener noreferrer"&gt;Nebius Token Factory&lt;/a&gt;&lt;/strong&gt; is the clearest example of this approach. TTFT, token throughput, error categorization (4xx/429/5xx separately), active replica counts, and prompt size distributions are all tracked natively   per endpoint, per API key, per region   with near-real-time updates. Metrics export via Prometheus for integration with your existing Grafana dashboards.&lt;/p&gt;

&lt;p&gt;The advantage of this approach: you get inference-layer observability without any instrumentation overhead in your application code. The disadvantage: it doesn't give you the application-layer context (session IDs, user IDs, evaluation scores) that purpose-built platforms provide. The right answer for most teams is both: infrastructure observability at the inference layer and application observability at the SDK layer, unified in one Grafana dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Choose
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data residency required&lt;/td&gt;
&lt;td&gt;Self-hosted: &lt;a href="https://langfuse.com" rel="noopener noreferrer"&gt;Langfuse&lt;/a&gt;, &lt;a href="https://grafana.com" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt; + &lt;a href="https://openlit.io" rel="noopener noreferrer"&gt;OpenLIT&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Already on LangChain&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.langchain.com/langsmith" rel="noopener noreferrer"&gt;LangSmith&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fastest time-to-value&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.helicone.ai" rel="noopener noreferrer"&gt;Helicone&lt;/a&gt; (proxy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Already on Datadog&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.datadoghq.com/product/llm-observability/" rel="noopener noreferrer"&gt;Datadog LLM Observability&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluation-first approach&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://phoenix.arize.com" rel="noopener noreferrer"&gt;Arize Phoenix&lt;/a&gt;, &lt;a href="https://www.braintrust.dev" rel="noopener noreferrer"&gt;Braintrust&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference-layer visibility&lt;/td&gt;
&lt;td&gt;&lt;a href="https://tokenfactory.nebius.com" rel="noopener noreferrer"&gt;Nebius Token Factory&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise scale + compliance&lt;/td&gt;
&lt;td&gt;Combination of the above&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;The teams that operate AI reliably are not necessarily the ones with the best models. They are the ones who can see what their models are actually doing in production and respond before their users notice.&lt;/p&gt;

&lt;p&gt;LLM observability is not an ML concern or an infra concern. It is a product quality concern. Every blind spot in your observability stack is a failure mode that will eventually become an incident. The only variable is whether your team finds it first or your users do.&lt;/p&gt;

&lt;p&gt;The four pillars (metrics, traces, logs, and evaluation) are your minimum. Start with the metrics and traces. Add evaluation before your team thinks you need it. Close the loop between observation and improvement.&lt;/p&gt;

&lt;p&gt;Instrument everything. Ship nothing you can't see.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>observability</category>
      <category>programming</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Wed, 11 Feb 2026 20:15:31 +0000</pubDate>
      <link>https://dev.to/arindam_1729/-2ae2</link>
      <guid>https://dev.to/arindam_1729/-2ae2</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/copilotkit" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F7820%2F85b7e418-7abd-4fb5-8be6-69eb48a30e53.gif" alt="CopilotKit" width="320" height="320"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F965723%2Fe0982512-4de1-4154-b3c3-1869d19e9ecc.png" alt="" width="612" height="612"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/copilotkit/build-a-multi-agent-telecom-support-system-with-copilotkit-langgraph-js-52oc" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Build a Multi-Agent Telecom Support System with CopilotKit &amp;amp; LangGraph JS&lt;/h2&gt;
      &lt;h3&gt;Arindam Majumder  for CopilotKit ・ Feb 9&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#webdev&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#opensource&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
