<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: TokVera</title>
    <description>The latest articles on DEV Community by TokVera (@tokvera).</description>
    <link>https://dev.to/tokvera</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838455%2F10da102b-c4cf-411a-8804-d79b2ee80fdc.jpg</url>
      <title>DEV Community: TokVera</title>
      <link>https://dev.to/tokvera</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tokvera"/>
    <language>en</language>
    <item>
      <title>How to Trace a Deep-Research Workbench in Node.js</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Fri, 03 Apr 2026 07:50:50 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-trace-a-deep-research-workbench-in-nodejs-49g4</link>
      <guid>https://dev.to/tokvera/how-to-trace-a-deep-research-workbench-in-nodejs-49g4</guid>
      <description>&lt;p&gt;Most research-agent demos optimize for the final answer.&lt;/p&gt;

&lt;p&gt;That is the least useful place to debug them.&lt;/p&gt;

&lt;p&gt;The operational questions show up earlier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how the research brief was framed&lt;/li&gt;
&lt;li&gt;what source directions were chosen&lt;/li&gt;
&lt;li&gt;whether the source mix was too narrow&lt;/li&gt;
&lt;li&gt;how the synthesis was assembled&lt;/li&gt;
&lt;li&gt;whether the final report preserved confidence and disagreement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why we built &lt;code&gt;open-deep-research-workbench&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Tokvera/open-deep-research-workbench" rel="noopener noreferrer"&gt;https://github.com/Tokvera/open-deep-research-workbench&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is a small Node starter that takes a research brief and turns it into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a research plan&lt;/li&gt;
&lt;li&gt;source directions&lt;/li&gt;
&lt;li&gt;a citation-aware synthesis&lt;/li&gt;
&lt;li&gt;recommended next steps&lt;/li&gt;
&lt;li&gt;one Tokvera root trace for the whole workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this is a better starting point than a flashy research demo
&lt;/h2&gt;

&lt;p&gt;A final answer can look polished even when the workflow behind it is weak.&lt;/p&gt;

&lt;p&gt;That is why teams need workflow-level visibility for research agents.&lt;/p&gt;

&lt;p&gt;This starter keeps the work inside one root trace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;research brief
  -&amp;gt; plan_research
  -&amp;gt; collect_sources
  -&amp;gt; synthesize_report
  -&amp;gt; return report + citations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
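&lt;p&gt;The shape above can be sketched in plain Node. The &lt;code&gt;makeRootTrace&lt;/code&gt; helper below is a stand-in for illustration, not the actual Tokvera SDK API, and the step bodies are placeholders for the real model calls:&lt;/p&gt;

```javascript
// Minimal sketch of running every step under one root trace.
// makeRootTrace is illustrative; the real Tokvera SDK API may differ.
function makeRootTrace(name) {
  const steps = [];
  return {
    id: `trc_${name}`,
    steps,
    // Record each workflow step with its name and duration.
    async step(stepName, fn) {
      const started = Date.now();
      const output = await fn();
      steps.push({ stepName, ms: Date.now() - started });
      return output;
    },
  };
}

async function runResearch(brief) {
  const root = makeRootTrace("deep-research");
  const plan = await root.step("plan_research", async () => ({
    topic: brief.topic,
    questions: brief.goals ?? [],
  }));
  const sources = await root.step("collect_sources", async () =>
    plan.questions.map((q, i) => ({ id: `src_${i}`, question: q }))
  );
  const report = await root.step("synthesize_report", async () => ({
    summary: `Synthesized ${sources.length} sources on ${plan.topic}`,
    citations: sources.map((s) => s.id),
  }));
  // One root trace covers the whole workflow, step by step.
  return { report, trace: { id: root.id, steps: root.steps } };
}
```

&lt;p&gt;The point is not the placeholder logic: it is that plan, sources, and synthesis all land under one trace id, so any of them can be inspected later.&lt;/p&gt;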



&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;Express&lt;/li&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;Tokvera JavaScript SDK&lt;/li&gt;
&lt;li&gt;Zod&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mock mode is enabled by default, so it is easy to run locally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Tokvera/open-deep-research-workbench.git
&lt;span class="nb"&gt;cd &lt;/span&gt;open-deep-research-workbench
npm &lt;span class="nb"&gt;install
&lt;/span&gt;cp .env.example .env
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server starts on &lt;code&gt;http://localhost:3400&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Endpoints
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /health&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/demo-brief&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/sample-briefs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /api/research&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example request
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3400/api/research &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "topic": "How engineering teams should evaluate coding agents before letting them open pull requests",
    "audience": "Platform and application engineering leads",
    "goals": [
      "Find the main reliability and review concerns around coding agents",
      "Collect practical examples of evaluation workflow design",
      "Summarize what observability signals matter before production rollout"
    ],
    "timeframe": "current developer guidance"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why the root trace matters
&lt;/h2&gt;

&lt;p&gt;Research-agent failures are usually lineage failures.&lt;/p&gt;

&lt;p&gt;The brief may be weak.&lt;br&gt;
The source directions may be too narrow.&lt;br&gt;
The synthesis may flatten disagreement.&lt;/p&gt;

&lt;p&gt;Without one root trace, you only argue about the final answer.&lt;br&gt;
With one root trace, you can inspect where the workflow drifted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful follow-up links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Tokvera/open-deep-research-workbench" rel="noopener noreferrer"&gt;https://github.com/Tokvera/open-deep-research-workbench&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Website post:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/blog/how-to-build-a-deep-research-workbench-with-one-root-trace" rel="noopener noreferrer"&gt;https://tokvera.org/blog/how-to-build-a-deep-research-workbench-with-one-root-trace&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Multi-step workflow page:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/use-cases/multi-step-ai-workflow-observability" rel="noopener noreferrer"&gt;https://tokvera.org/use-cases/multi-step-ai-workflow-observability&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Agent workflow debugging:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/use-cases/agent-workflow-debugging" rel="noopener noreferrer"&gt;https://tokvera.org/use-cases/agent-workflow-debugging&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>node</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
    <item>
      <title>Build a Coding-Agent PR Planner in Node.js with One Root Trace</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Fri, 03 Apr 2026 06:36:35 +0000</pubDate>
      <link>https://dev.to/tokvera/build-a-coding-agent-pr-planner-in-nodejs-with-one-root-trace-2hg0</link>
      <guid>https://dev.to/tokvera/build-a-coding-agent-pr-planner-in-nodejs-with-one-root-trace-2hg0</guid>
      <description>&lt;p&gt;Coding agents are useful long before you let them write code directly into production repositories.&lt;/p&gt;

&lt;p&gt;The first operationally useful step is smaller:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take a real engineering task&lt;/li&gt;
&lt;li&gt;classify it&lt;/li&gt;
&lt;li&gt;inspect the relevant repo area&lt;/li&gt;
&lt;li&gt;draft a concrete implementation plan&lt;/li&gt;
&lt;li&gt;generate a PR title and summary&lt;/li&gt;
&lt;li&gt;keep the whole workflow inside one root trace&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is what I built in &lt;code&gt;coding-agent-pr-ops&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Tokvera/coding-agent-pr-ops" rel="noopener noreferrer"&gt;https://github.com/Tokvera/coding-agent-pr-ops&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is a better starting point than full auto-code generation
&lt;/h2&gt;

&lt;p&gt;Most coding-agent demos jump too quickly from task input to generated code. That looks impressive, but it skips the part engineering teams actually need to trust:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why the task was classified a certain way&lt;/li&gt;
&lt;li&gt;which repo area the agent thinks matters&lt;/li&gt;
&lt;li&gt;how risky the task is&lt;/li&gt;
&lt;li&gt;whether the review checklist actually protects the rollout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot inspect those steps, the final PR is just a black box with a diff attached.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the repo does
&lt;/h2&gt;

&lt;p&gt;For each task, the starter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;diagnoses work type and rollout risk&lt;/li&gt;
&lt;li&gt;inspects basic repository context&lt;/li&gt;
&lt;li&gt;drafts an implementation plan&lt;/li&gt;
&lt;li&gt;generates a PR title and PR summary&lt;/li&gt;
&lt;li&gt;returns a review checklist&lt;/li&gt;
&lt;li&gt;traces the workflow with Tokvera&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Workflow shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;engineering task
  -&amp;gt; diagnose_task
  -&amp;gt; inspect_repo_context
  -&amp;gt; draft_plan
  -&amp;gt; return PR plan + review checklist
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
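&lt;p&gt;As a rough sketch of the &lt;code&gt;diagnose_task&lt;/code&gt; step, here is one way the classification could work. The heuristics and field names are assumptions for illustration, not the repo's actual implementation:&lt;/p&gt;

```javascript
// Illustrative diagnose_task heuristic; rules and field names are
// assumptions for this sketch, not the repo's real implementation.
function diagnoseTask(task) {
  const text = `${task.title} ${task.body ?? ""}`.toLowerCase();
  const labels = task.labels ?? [];

  const workType =
    labels.includes("bug") || text.startsWith("fix")
      ? "bugfix"
      : labels.includes("feature")
      ? "feature"
      : "chore";

  // In this sketch, anything touching retries, billing, auth, or
  // payments is treated as high rollout risk.
  const riskyHints = ["retr", "billing", "auth", "payment"];
  const risk = riskyHints.some((h) => text.includes(h)) ? "high" : "normal";

  return { workType, risk };
}
```

&lt;p&gt;A static heuristic like this is only the starting point; the value is that the diagnosis becomes an inspectable step in the trace rather than an implicit model judgment.&lt;/p&gt;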



&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;Express&lt;/li&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;Tokvera JavaScript SDK&lt;/li&gt;
&lt;li&gt;Zod&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mock mode is enabled by default, so you can run the whole thing without a live model key.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Tokvera/coding-agent-pr-ops.git
&lt;span class="nb"&gt;cd &lt;/span&gt;coding-agent-pr-ops
npm &lt;span class="nb"&gt;install
&lt;/span&gt;cp .env.example .env
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server runs on &lt;code&gt;http://localhost:3300&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Endpoints
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /health&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/demo-task&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/sample-tasks&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /api/pr-plan&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example request
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3300/api/pr-plan &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "title": "Fix duplicate webhook retries when the upstream API returns 429",
    "body": "When the upstream billing API returns 429, our retry worker creates duplicate webhook attempts instead of backing off cleanly.",
    "repoName": "acme/ops-agent",
    "branchName": "main",
    "labels": ["bug", "webhooks", "billing"],
    "filesHint": ["src/workers/webhook-dispatch.ts", "src/lib/backoff.ts"]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example response shape
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"traceId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"runId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"diagnosis"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"workType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bugfix"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"risk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"backend"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"repoArea"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"request handling and retries"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This looks like a bugfix task in request handling and retries with high rollout risk."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"implementationPlan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Reproduce and isolate the current behavior"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"detail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Use the labels and hinted files to confirm where the current path fails."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reviewChecklist"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Root workflow trace shows classify -&amp;gt; inspect -&amp;gt; plan -&amp;gt; draft PR summary"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prTitle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fix: fix duplicate webhook retries when the upstream api returns 429"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why the root trace matters
&lt;/h2&gt;

&lt;p&gt;Coding-agent failures are workflow failures, not just bad completions.&lt;/p&gt;

&lt;p&gt;The diagnosis might be wrong.&lt;br&gt;
The repo context might point at the wrong files.&lt;br&gt;
The plan might look coherent but still be aimed at the wrong area.&lt;/p&gt;

&lt;p&gt;Without one root trace, you only see fragments.&lt;/p&gt;

&lt;p&gt;With one root trace, you can inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;diagnosis&lt;/li&gt;
&lt;li&gt;repo-context lookup&lt;/li&gt;
&lt;li&gt;planning&lt;/li&gt;
&lt;li&gt;output handoff&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives reviewers something operationally useful instead of just “the model generated this.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Why mock mode is a feature
&lt;/h2&gt;

&lt;p&gt;Mock mode makes the repo far more reusable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;easier to demo&lt;/li&gt;
&lt;li&gt;easier to screenshot&lt;/li&gt;
&lt;li&gt;easier to explain in articles&lt;/li&gt;
&lt;li&gt;easier for developers to fork without setup friction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the workflow is clear, you can replace static hints with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub issue fetches&lt;/li&gt;
&lt;li&gt;repo inspection through MCP or GitHub APIs&lt;/li&gt;
&lt;li&gt;patch generation&lt;/li&gt;
&lt;li&gt;PR review comments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow stays the same. The trace stays useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would inspect before trusting a coding agent
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;whether diagnosis and risk match what a human reviewer would say&lt;/li&gt;
&lt;li&gt;whether repo context points to the right code path&lt;/li&gt;
&lt;li&gt;whether the review checklist protects the risky path&lt;/li&gt;
&lt;li&gt;whether the PR summary explains rollout verification&lt;/li&gt;
&lt;li&gt;whether similar tasks are getting cheaper and more reliable over time&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Useful follow-up reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Website post:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/blog/how-to-build-a-coding-agent-pr-planning-workflow-with-one-root-trace" rel="noopener noreferrer"&gt;https://tokvera.org/blog/how-to-build-a-coding-agent-pr-planning-workflow-with-one-root-trace&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Coding-agent docs:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/docs/coding-agent-tracing" rel="noopener noreferrer"&gt;https://tokvera.org/docs/coding-agent-tracing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Coding-agent use case:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/use-cases/coding-agent-observability" rel="noopener noreferrer"&gt;https://tokvera.org/use-cases/coding-agent-observability&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Agent evals in CI:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/docs/agent-evals-in-ci" rel="noopener noreferrer"&gt;https://tokvera.org/docs/agent-evals-in-ci&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;If you want to fork it, the repo is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Tokvera/coding-agent-pr-ops" rel="noopener noreferrer"&gt;https://github.com/Tokvera/coding-agent-pr-ops&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>typescript</category>
    </item>
    <item>
      <title>How to Add AI Gateway Observability to a Production Control Plane</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Thu, 02 Apr 2026 04:19:00 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-add-ai-gateway-observability-to-a-production-control-plane-4gbb</link>
      <guid>https://dev.to/tokvera/how-to-add-ai-gateway-observability-to-a-production-control-plane-4gbb</guid>
      <description>&lt;p&gt;A lot of teams add an AI gateway for a good reason.&lt;/p&gt;

&lt;p&gt;They want one place to enforce policy.&lt;br&gt;
They want one place to shape traffic.&lt;br&gt;
They want one place to introduce retries, failover, quotas, and model controls without rewriting every application.&lt;/p&gt;

&lt;p&gt;That architecture makes sense.&lt;/p&gt;

&lt;p&gt;But once the gateway starts making real decisions, it is no longer just a proxy.&lt;/p&gt;

&lt;p&gt;It becomes part of the production control plane.&lt;/p&gt;

&lt;p&gt;That is the point where &lt;strong&gt;AI gateway observability&lt;/strong&gt; matters.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why a gateway becomes hard to debug
&lt;/h2&gt;

&lt;p&gt;In a direct-to-provider setup, the debugging path is smaller.&lt;/p&gt;

&lt;p&gt;You usually inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the application request&lt;/li&gt;
&lt;li&gt;the provider call&lt;/li&gt;
&lt;li&gt;the final response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A gateway inserts a new decision layer in the middle.&lt;/p&gt;

&lt;p&gt;Now the same request may go through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a policy check&lt;/li&gt;
&lt;li&gt;a quota or budget guardrail&lt;/li&gt;
&lt;li&gt;route selection logic&lt;/li&gt;
&lt;li&gt;a retry branch&lt;/li&gt;
&lt;li&gt;a failover path&lt;/li&gt;
&lt;li&gt;a downstream provider call&lt;/li&gt;
&lt;li&gt;response shaping before it returns to the app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If latency spikes or the wrong provider is used, the real problem may not be the downstream model at all.&lt;/p&gt;

&lt;p&gt;It may be the control-plane logic that shaped the request before the model call happened.&lt;/p&gt;
&lt;h2&gt;
  
  
  What good gateway observability should answer
&lt;/h2&gt;

&lt;p&gt;A useful gateway trace should help you answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why did this request take this route?&lt;/li&gt;
&lt;li&gt;Did a quota rule change the selected model?&lt;/li&gt;
&lt;li&gt;Did failover trigger because of provider health or a gateway bug?&lt;/li&gt;
&lt;li&gt;Did retries increase latency or token cost?&lt;/li&gt;
&lt;li&gt;Which tenants were affected by the behavior change?&lt;/li&gt;
&lt;li&gt;Did the issue begin in the gateway or at the provider?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot answer those questions from one request lineage, your gateway is still too opaque.&lt;/p&gt;
&lt;h2&gt;
  
  
  A practical trace shape
&lt;/h2&gt;

&lt;p&gt;A small but useful gateway trace can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gateway request
  -&amp;gt; policy check
  -&amp;gt; route selection
  -&amp;gt; quota / budget rule
  -&amp;gt; failover or retry branch
  -&amp;gt; downstream provider call
  -&amp;gt; response + trace metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That structure makes it much easier to separate classes of problems.&lt;/p&gt;

&lt;p&gt;If the provider was slow, you can see it.&lt;/p&gt;

&lt;p&gt;If the provider was fine but the gateway retried too aggressively, you can see that too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example request flow
&lt;/h2&gt;

&lt;p&gt;Suppose a client sends a payload like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acme-enterprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a concise assistant."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Summarize today’s error budget status."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gateway might make decisions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;apply enterprise-specific policies&lt;/li&gt;
&lt;li&gt;prefer the primary provider under normal conditions&lt;/li&gt;
&lt;li&gt;fall back if the provider is degraded&lt;/li&gt;
&lt;li&gt;preserve route metadata for later debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A response record with observability fields might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"route_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"primary_provider_ok"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected_provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"retry_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"failover_used"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acme-enterprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_123abc"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That record gives teams something much more useful than a plain request log.&lt;/p&gt;

&lt;p&gt;It explains the control-plane behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to instrument first
&lt;/h2&gt;

&lt;p&gt;If you are just getting started, begin with the fields that explain route changes and incidents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route reason&lt;/li&gt;
&lt;li&gt;selected provider&lt;/li&gt;
&lt;li&gt;selected model&lt;/li&gt;
&lt;li&gt;override source&lt;/li&gt;
&lt;li&gt;retry count&lt;/li&gt;
&lt;li&gt;failover state&lt;/li&gt;
&lt;li&gt;tenant context&lt;/li&gt;
&lt;li&gt;latency by step&lt;/li&gt;
&lt;li&gt;cost by step&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those fields make it possible to debug most real gateway issues without rebuilding the whole platform.&lt;/p&gt;
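&lt;p&gt;A minimal sketch of assembling those fields into one record, mirroring the example response shown earlier (the function name and defaults are illustrative):&lt;/p&gt;

```javascript
// Sketch of building a route-decision record from the fields above.
// The output shape mirrors the example record shown earlier; the
// function name and defaults are assumptions, not a specific API.
function recordRouteDecision({
  tenantId,
  provider,
  model,
  reason,
  retries = 0,
  failoverUsed = false,
  stepLatencies = {},
}) {
  // Total latency is the sum of per-step latencies, so a slow
  // gateway step is distinguishable from a slow provider call.
  const totalLatencyMs = Object.values(stepLatencies).reduce(
    (sum, ms) => sum + ms,
    0
  );
  return {
    route_reason: reason,
    selected_provider: provider,
    selected_model: model,
    retry_count: retries,
    failover_used: failoverUsed,
    tenant_id: tenantId,
    latency_by_step: stepLatencies,
    latency_ms: totalLatencyMs,
  };
}
```

&lt;p&gt;Emitting one record like this per request is usually enough to answer the route-change questions above without touching the rest of the platform.&lt;/p&gt;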

&lt;h2&gt;
  
  
  What AI gateway observability helps with in practice
&lt;/h2&gt;

&lt;p&gt;Here are common production problems that become easier to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a premium customer got routed to a cheaper model unexpectedly&lt;/li&gt;
&lt;li&gt;traffic shifted to a backup provider but never shifted back&lt;/li&gt;
&lt;li&gt;a policy rollout increased latency for one customer segment&lt;/li&gt;
&lt;li&gt;quota pressure caused silent route changes&lt;/li&gt;
&lt;li&gt;retries doubled cost during partial provider instability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These issues are hard to explain when all you have is provider logging.&lt;/p&gt;

&lt;p&gt;They become much easier to reason about when the gateway decisions themselves are visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The main idea
&lt;/h2&gt;

&lt;p&gt;Most teams think they need more logs.&lt;/p&gt;

&lt;p&gt;What they often need is a clearer operational trace of the gateway as a decision system.&lt;/p&gt;

&lt;p&gt;That means treating the gateway request like a workflow with explicit steps rather than a black box in front of model providers.&lt;/p&gt;

&lt;p&gt;Once you do that, the control plane becomes much easier to operate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;If your gateway shapes routing, policy, failover, or provider behavior, it is already part of production operations.&lt;/p&gt;

&lt;p&gt;That means you need observability for the gateway itself, not just the downstream model call.&lt;/p&gt;

&lt;p&gt;Because the important question in production is usually not:&lt;/p&gt;

&lt;p&gt;“Did the request finish?”&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;“Why did it take this path?”&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>observability</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How to Add AI Agent Handoff Observability to a Multi-Step Workflow</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Wed, 01 Apr 2026 04:15:00 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-add-ai-agent-handoff-observability-to-a-multi-step-workflow-g46</link>
      <guid>https://dev.to/tokvera/how-to-add-ai-agent-handoff-observability-to-a-multi-step-workflow-g46</guid>
      <description>&lt;p&gt;A lot of multi-step AI systems look clean in architecture diagrams.&lt;/p&gt;

&lt;p&gt;One agent classifies.&lt;br&gt;
Another retrieves context.&lt;br&gt;
Another drafts the response.&lt;br&gt;
A human steps in when confidence is low or escalation is required.&lt;/p&gt;

&lt;p&gt;The problem is that production issues often do not happen inside one agent step.&lt;/p&gt;

&lt;p&gt;They happen at the boundary between steps.&lt;/p&gt;

&lt;p&gt;That is where &lt;strong&gt;AI agent handoff observability&lt;/strong&gt; becomes important.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why handoffs are harder than they look
&lt;/h2&gt;

&lt;p&gt;A handoff sounds simple.&lt;/p&gt;

&lt;p&gt;One step finishes and another takes over.&lt;/p&gt;

&lt;p&gt;In practice, that boundary carries a lot of hidden risk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;context may be incomplete&lt;/li&gt;
&lt;li&gt;the wrong owner may be selected&lt;/li&gt;
&lt;li&gt;a human may receive too little evidence&lt;/li&gt;
&lt;li&gt;the next step may repeat work that was already done&lt;/li&gt;
&lt;li&gt;the workflow may appear successful even though continuity was broken&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the important debugging question is often not:&lt;/p&gt;

&lt;p&gt;“What did the model return?”&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“What happened when ownership changed?”&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What a handoff trace should show
&lt;/h2&gt;

&lt;p&gt;A useful handoff trace should let you inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why the handoff was triggered&lt;/li&gt;
&lt;li&gt;which next owner or agent was selected&lt;/li&gt;
&lt;li&gt;what context was passed forward&lt;/li&gt;
&lt;li&gt;what summary or evidence was included&lt;/li&gt;
&lt;li&gt;whether the transfer led to progress or just another branch&lt;/li&gt;
&lt;li&gt;how much latency and cost the transfer added&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that information, teams only see the final output and miss the exact boundary where the workflow became fragile.&lt;/p&gt;
&lt;h2&gt;
  
  
  A practical handoff workflow shape
&lt;/h2&gt;

&lt;p&gt;A multi-step workflow with handoffs can often be modeled like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;request
  -&amp;gt; initial agent step
  -&amp;gt; handoff trigger
  -&amp;gt; ownership transfer
  -&amp;gt; context package
  -&amp;gt; next agent or human step
  -&amp;gt; follow-up action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That shape is simple, but it is enough to make the transfer inspectable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example handoff scenario
&lt;/h2&gt;

&lt;p&gt;Imagine a support workflow that starts with an automated agent.&lt;/p&gt;

&lt;p&gt;The agent reviews an incoming issue, detects that it may involve an enterprise outage, and decides to escalate to a human responder.&lt;/p&gt;

&lt;p&gt;A useful payload might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cust_456"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"plan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enterprise"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"issue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"possible_incident"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Customers are reporting repeated login failures across multiple regions."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.54&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A handoff-aware response should preserve the transfer context, not just the final destination:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"handoff_trigger"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"low_confidence_enterprise_incident"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"from_owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"triage_agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"to_owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"human_on_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context_package"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customer_plan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enterprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"issue_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"possible_incident"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Repeated login failures across multiple regions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"recommended_next_action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"open incident review"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_handoff_789"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is much more useful than a simple “escalated=true” flag.&lt;/p&gt;

&lt;p&gt;It explains the transfer.&lt;/p&gt;
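&lt;p&gt;The trigger in that example can come from a small, explicit guard rather than an opaque boolean. A hedged sketch; the 0.6 threshold and field names are assumptions for illustration:&lt;/p&gt;

```typescript
// Sketch of the guard behind "low_confidence_enterprise_incident".
// The 0.6 threshold and field names are illustrative assumptions.
interface TriageResult {
  customerPlan: string;
  issueType: string;
  confidence: number;
}

// Returns a trigger label when a handoff is needed, or null to stay automated.
function handoffTrigger(r: TriageResult): string | null {
  if (r.customerPlan === "enterprise") {
    if (r.issueType === "possible_incident") {
      if (r.confidence >= 0.6) {
        return null; // confident enough to continue without a human
      }
      return "low_confidence_enterprise_incident";
    }
  }
  return null;
}
```

&lt;p&gt;Keeping the trigger as a named label means the trace can explain the transfer later, instead of only recording that it happened.&lt;/p&gt;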

&lt;h2&gt;
  
  
  What breaks without handoff visibility
&lt;/h2&gt;

&lt;p&gt;Without observability around handoffs, teams run into issues like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the next owner asks the customer to repeat information&lt;/li&gt;
&lt;li&gt;a human reviewer gets a handoff with no useful summary&lt;/li&gt;
&lt;li&gt;an agent hands off too often because confidence logic is noisy&lt;/li&gt;
&lt;li&gt;a downstream step reclassifies or reroutes the issue unnecessarily&lt;/li&gt;
&lt;li&gt;ownership changes become hard to explain during incident reviews&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not just UX issues.&lt;/p&gt;

&lt;p&gt;They are workflow quality issues.&lt;/p&gt;

&lt;p&gt;And they often become visible only after automation is already live.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to instrument first
&lt;/h2&gt;

&lt;p&gt;If you want to keep handoff instrumentation lightweight, start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;handoff trigger reason&lt;/li&gt;
&lt;li&gt;previous owner&lt;/li&gt;
&lt;li&gt;next owner&lt;/li&gt;
&lt;li&gt;summary payload&lt;/li&gt;
&lt;li&gt;preserved context fields&lt;/li&gt;
&lt;li&gt;confidence or escalation score&lt;/li&gt;
&lt;li&gt;latency around the transfer&lt;/li&gt;
&lt;li&gt;follow-up outcome&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those fields make it possible to understand whether the handoff actually helped the workflow continue cleanly.&lt;/p&gt;
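&lt;p&gt;A lightweight way to capture those fields is to wrap the transfer itself. A sketch, assuming only that you already have some trace sink to emit events into:&lt;/p&gt;

```typescript
// Sketch: wrap a transfer so the listed fields land in one trace event.
// TraceSink is a stand-in for whatever trace store you already use.
type TraceSink = (event: { [key: string]: unknown }) => void;

function recordHandoff(
  emitTrace: TraceSink,
  fields: {
    triggerReason: string;
    previousOwner: string;
    nextOwner: string;
    summary: string;
    preservedContext: { [key: string]: unknown };
    confidence: number;
  },
  transfer: () => string // the actual transfer step; returns a follow-up outcome label
): string {
  const start = Date.now();
  const followUpOutcome = transfer();
  emitTrace({
    ...fields,
    latencyMs: Date.now() - start, // latency around the transfer
    followUpOutcome,
  });
  return followUpOutcome;
}
```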

&lt;h2&gt;
  
  
  Why this matters for agent-to-human systems
&lt;/h2&gt;

&lt;p&gt;The value of handoff observability grows when humans are part of the loop.&lt;/p&gt;

&lt;p&gt;If an AI system escalates to a person, the transfer quality affects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;responder speed&lt;/li&gt;
&lt;li&gt;decision confidence&lt;/li&gt;
&lt;li&gt;customer experience&lt;/li&gt;
&lt;li&gt;repeated work&lt;/li&gt;
&lt;li&gt;operational trust in the workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A weak handoff does not just slow one request down.&lt;/p&gt;

&lt;p&gt;It makes the whole automation system harder to trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  The main idea
&lt;/h2&gt;

&lt;p&gt;A multi-step workflow is only as strong as its boundaries.&lt;/p&gt;

&lt;p&gt;The steps themselves might work well, but if the transfer between them is opaque, the workflow becomes hard to debug and hard to improve.&lt;/p&gt;

&lt;p&gt;That is why handoff observability matters.&lt;/p&gt;

&lt;p&gt;It makes the transition itself inspectable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;If your AI system moves work between agents, tools, queues, or humans, the handoff is part of the product logic.&lt;/p&gt;

&lt;p&gt;So it should be observable like any other important production step.&lt;/p&gt;

&lt;p&gt;Because the real question is not just whether the workflow completed.&lt;/p&gt;

&lt;p&gt;It is whether ownership changed in a way that preserved enough context for the next step to succeed.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>observability</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How to Add LLM Routing Visibility to a Multi-Model App</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Tue, 31 Mar 2026 07:22:37 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-add-llm-routing-visibility-to-a-multi-model-app-3fp1</link>
      <guid>https://dev.to/tokvera/how-to-add-llm-routing-visibility-to-a-multi-model-app-3fp1</guid>
      <description>&lt;p&gt;A multi-model app usually starts with a good idea.&lt;/p&gt;

&lt;p&gt;Use a faster model for simple requests.&lt;br&gt;
Use a stronger model for harder ones.&lt;br&gt;
Fail over when a provider is slow.&lt;br&gt;
Route enterprise traffic differently from free-tier traffic.&lt;/p&gt;

&lt;p&gt;All of that makes sense.&lt;/p&gt;

&lt;p&gt;The problem starts later, when the system behaves unexpectedly and nobody can explain why a request took a specific path.&lt;/p&gt;

&lt;p&gt;That is when you need &lt;strong&gt;LLM routing visibility&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why routing visibility matters
&lt;/h2&gt;

&lt;p&gt;In a single-model app, the debugging path is relatively short.&lt;/p&gt;

&lt;p&gt;You inspect the input, the prompt, the model call, and the response.&lt;/p&gt;

&lt;p&gt;In a multi-model system, there are more moving parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route selection logic&lt;/li&gt;
&lt;li&gt;policy or override checks&lt;/li&gt;
&lt;li&gt;fallback branches&lt;/li&gt;
&lt;li&gt;selected provider and model&lt;/li&gt;
&lt;li&gt;downstream execution details&lt;/li&gt;
&lt;li&gt;cost and latency tradeoffs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When something goes wrong, the important question is no longer just “what did the model return?”&lt;/p&gt;

&lt;p&gt;It becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why did the system choose this route?&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  A simple routing shape
&lt;/h2&gt;

&lt;p&gt;A practical routing flow can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;request
  -&amp;gt; route decision
  -&amp;gt; selected model/provider
  -&amp;gt; fallback or retry branch
  -&amp;gt; downstream model call
  -&amp;gt; response + trace metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is enough structure to make routing behavior observable in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example routing logic
&lt;/h2&gt;

&lt;p&gt;Here is a tiny example in TypeScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pickRoute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;complexity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;providerHealth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;degraded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;providerHealth&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;degraded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-3-5-sonnet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;provider_failover&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tier&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;complexity&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4.1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise_high_complexity&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;default_fast_path&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The routing logic itself is not the hard part.&lt;/p&gt;

&lt;p&gt;The hard part is preserving enough metadata so you can inspect what happened later.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to attach to the trace
&lt;/h2&gt;

&lt;p&gt;For each routed request, you usually want to capture at least:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route reason&lt;/li&gt;
&lt;li&gt;selected provider&lt;/li&gt;
&lt;li&gt;selected model&lt;/li&gt;
&lt;li&gt;fallback or retry status&lt;/li&gt;
&lt;li&gt;tenant or plan context&lt;/li&gt;
&lt;li&gt;latency for the routing step&lt;/li&gt;
&lt;li&gt;latency for the downstream model call&lt;/li&gt;
&lt;li&gt;cost for the final route taken&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With that data, a request stops being mysterious.&lt;/p&gt;

&lt;p&gt;You can understand whether the system made an intentional choice or drifted into the wrong branch.&lt;/p&gt;
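&lt;p&gt;One way to keep that metadata is to record the decision alongside the call. A minimal sketch that builds on the &lt;code&gt;pickRoute&lt;/code&gt; shape above; the timing fields and call signatures are illustrative assumptions:&lt;/p&gt;

```typescript
// Sketch: wrap a routed call so the route decision survives in the trace.
// The decide/call signatures and field names are illustrative assumptions.
interface RouteDecision {
  provider: string;
  model: string;
  reason: string;
}

interface RoutedTrace {
  requestId: string;
  routeReason: string;
  selectedProvider: string;
  selectedModel: string;
  fallbackUsed: boolean;
  latencyMs: { routing: number; providerCall: number };
}

function traceRoutedCall(
  requestId: string,
  decide: () => RouteDecision,
  call: (d: RouteDecision) => string
): { output: string; trace: RoutedTrace } {
  const t0 = Date.now();
  const decision = decide();     // route selection step
  const t1 = Date.now();
  const output = call(decision); // downstream model call
  const trace: RoutedTrace = {
    requestId,
    routeReason: decision.reason,
    selectedProvider: decision.provider,
    selectedModel: decision.model,
    fallbackUsed: decision.reason === "provider_failover",
    latencyMs: { routing: t1 - t0, providerCall: Date.now() - t1 },
  };
  return { output, trace };
}
```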

&lt;h2&gt;
  
  
  What routing visibility helps you debug
&lt;/h2&gt;

&lt;p&gt;Here are the kinds of issues that become easier to explain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a request hit an expensive model unexpectedly&lt;/li&gt;
&lt;li&gt;fallback triggered too often during partial outages&lt;/li&gt;
&lt;li&gt;one customer segment saw higher latency after a routing change&lt;/li&gt;
&lt;li&gt;a route change fixed reliability but increased spend&lt;/li&gt;
&lt;li&gt;a caller override was ignored or silently replaced&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are difficult problems when you only have final responses and provider logs.&lt;/p&gt;

&lt;p&gt;They become much easier when the route decision itself is visible.&lt;/p&gt;
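&lt;p&gt;With route metadata stored per request, those questions turn into simple filters. A sketch over an in-memory list of trace records; the field names are illustrative:&lt;/p&gt;

```typescript
// Sketch: routing questions become filters over stored trace records.
// Field names are illustrative assumptions.
interface RouteTrace {
  requestId: string;
  routeReason: string;
  selectedModel: string;
  fallbackUsed: boolean;
  costUsd: number;
}

// "Which requests hit an expensive model unexpectedly?"
function unexpectedExpensive(traces: RouteTrace[], costThresholdUsd: number): RouteTrace[] {
  return traces.filter(t =>
    t.routeReason === "default_fast_path" ? t.costUsd > costThresholdUsd : false
  );
}

// "How often did fallback trigger?"
function fallbackRate(traces: RouteTrace[]): number {
  if (traces.length === 0) {
    return 0;
  }
  return traces.filter(t => t.fallbackUsed).length / traces.length;
}
```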

&lt;h2&gt;
  
  
  Example traced output
&lt;/h2&gt;

&lt;p&gt;A useful response record might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"request_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"req_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"route_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enterprise_high_complexity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected_provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fallback_used"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"routing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider_call"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;841&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.012&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.041&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_xyz789"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That record gives teams something actionable.&lt;/p&gt;

&lt;p&gt;It shows both the route and the execution path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden value of routing visibility
&lt;/h2&gt;

&lt;p&gt;Routing visibility is not only about debugging bad outcomes.&lt;/p&gt;

&lt;p&gt;It is also how teams evaluate whether routing logic is actually helping.&lt;/p&gt;

&lt;p&gt;A route change might reduce provider errors but increase latency.&lt;/p&gt;

&lt;p&gt;A fallback policy might improve reliability but hurt quality.&lt;/p&gt;

&lt;p&gt;A cheaper model path might look efficient until it causes more retries and rework downstream.&lt;/p&gt;

&lt;p&gt;Without visibility into route reasoning and route-level cost, those tradeoffs are hard to measure honestly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start small
&lt;/h2&gt;

&lt;p&gt;If you already have a multi-model app, you do not need to rebuild it.&lt;/p&gt;

&lt;p&gt;Start by making the route explicit.&lt;/p&gt;

&lt;p&gt;Keep a root trace for the request, then add child steps for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route selection&lt;/li&gt;
&lt;li&gt;fallback or retry logic&lt;/li&gt;
&lt;li&gt;downstream model execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even that small amount of structure can make production behavior much easier to reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;A multi-model app becomes significantly harder to operate once routing decisions influence latency, cost, quality, and reliability.&lt;/p&gt;

&lt;p&gt;That is why LLM routing visibility matters.&lt;/p&gt;

&lt;p&gt;You do not just need to know which model returned the answer.&lt;/p&gt;

&lt;p&gt;You need to know why the system chose that path in the first place.&lt;/p&gt;

&lt;p&gt;That is the difference between having routing logic and being able to trust it in production.&lt;/p&gt;

&lt;p&gt;Check out the docs at &lt;a href="https://tokvera.org/docs" rel="noopener noreferrer"&gt;https://tokvera.org/docs&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>observability</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How to Add Ticket Triage Workflow Tracing to a Support AI System</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Tue, 31 Mar 2026 07:13:13 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-add-ticket-triage-workflow-tracing-to-a-support-ai-system-1h6k</link>
      <guid>https://dev.to/tokvera/how-to-add-ticket-triage-workflow-tracing-to-a-support-ai-system-1h6k</guid>
      <description>&lt;p&gt;A lot of support AI demos stop too early.&lt;/p&gt;

&lt;p&gt;A user sends a message. A model returns a response. The example ends.&lt;/p&gt;

&lt;p&gt;That is not how real support systems behave.&lt;/p&gt;

&lt;p&gt;A production support workflow usually has to do more than answer the customer. It has to classify the issue, assign urgency, choose the right queue, decide whether escalation is needed, and hand enough context to the next team.&lt;/p&gt;

&lt;p&gt;Once you add those steps, a new problem appears:&lt;/p&gt;

&lt;p&gt;How do you debug the workflow when the output is wrong?&lt;/p&gt;

&lt;p&gt;That is where &lt;strong&gt;ticket triage workflow tracing&lt;/strong&gt; becomes useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why support triage needs tracing
&lt;/h2&gt;

&lt;p&gt;If a system sends a billing issue to the bug queue, the problem might be in several places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the classification step&lt;/li&gt;
&lt;li&gt;the priority logic&lt;/li&gt;
&lt;li&gt;the queue selection rule&lt;/li&gt;
&lt;li&gt;the escalation branch&lt;/li&gt;
&lt;li&gt;missing customer or SLA context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without tracing, all you see is the final result.&lt;/p&gt;

&lt;p&gt;That makes the workflow hard to debug because the important path is hidden.&lt;/p&gt;

&lt;p&gt;A good trace lets you inspect the full triage sequence, not just the final queue name.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical triage shape
&lt;/h2&gt;

&lt;p&gt;A useful support workflow does not need to be huge.&lt;/p&gt;

&lt;p&gt;A small production-shaped version can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ticket input
  -&amp;gt; classification
  -&amp;gt; priority scoring
  -&amp;gt; queue selection
  -&amp;gt; escalation check
  -&amp;gt; summary + handoff metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That shape gives you enough structure to understand why a ticket moved the way it did.&lt;/p&gt;
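&lt;p&gt;That sequence can be expressed as a plain function pipeline. A minimal sketch with stubbed decision logic; the keyword checks are placeholders, not real triage rules:&lt;/p&gt;

```typescript
// Sketch of the triage pipeline with stubbed rules.
// The keyword checks below are placeholders, not production triage logic.
interface Ticket { subject: string; body: string; plan: string }
interface Triage { classification: string; priority: string; queue: string; escalation: boolean }

function classify(t: Ticket): string {
  return t.body.includes("timeout") || t.body.includes("outage") ? "incident" : "question";
}

function scorePriority(t: Ticket, cls: string): string {
  if (cls === "incident") {
    if (t.plan === "enterprise") {
      return "high";
    }
  }
  return "normal";
}

function triage(t: Ticket): Triage {
  const classification = classify(t);
  const priority = scorePriority(t, classification);
  const queue = classification === "incident" ? "support-engineering" : "general-support";
  const escalation = priority === "high";
  return { classification, priority, queue, escalation };
}
```

&lt;p&gt;Each step is a separate function on purpose: once the steps are explicit, each one can carry its own trace span.&lt;/p&gt;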

&lt;h2&gt;
  
  
  What to capture in each run
&lt;/h2&gt;

&lt;p&gt;At minimum, a triage trace should help answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what issue class was chosen&lt;/li&gt;
&lt;li&gt;what priority or SLA score was assigned&lt;/li&gt;
&lt;li&gt;which queue was selected&lt;/li&gt;
&lt;li&gt;whether escalation was triggered&lt;/li&gt;
&lt;li&gt;what summary and next actions were produced&lt;/li&gt;
&lt;li&gt;how long each step took&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That turns the workflow from a black box into something teams can inspect when production behavior changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example request shape
&lt;/h2&gt;

&lt;p&gt;A triage system can accept a payload like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cust_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"plan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enterprise"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ticket"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"API requests are timing out for our support agents"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"We are seeing repeated timeouts in production and our queue is backing up."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"channel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A useful response should include both customer-facing and internal workflow context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"classification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"incident"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"queue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"support-engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"escalation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Enterprise customer reporting production API timeouts affecting support operations."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"next_actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"route to support engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"open incident review"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"notify account owner"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_abc123"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The key idea: treat triage like a workflow, not a prompt
&lt;/h2&gt;

&lt;p&gt;A lot of teams still try to solve triage with one prompt.&lt;/p&gt;

&lt;p&gt;That works for demos, but it usually breaks down once support teams need to trust the result.&lt;/p&gt;

&lt;p&gt;Triage is a workflow because it includes multiple decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understanding the issue type&lt;/li&gt;
&lt;li&gt;interpreting urgency&lt;/li&gt;
&lt;li&gt;applying business logic&lt;/li&gt;
&lt;li&gt;deciding whether escalation is needed&lt;/li&gt;
&lt;li&gt;handing clean context to the next owner&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you model those steps explicitly, tracing them becomes much easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  What breaks without visibility
&lt;/h2&gt;

&lt;p&gt;Without triage tracing, common support issues become harder to explain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why was the wrong queue selected?&lt;/li&gt;
&lt;li&gt;Why was an enterprise issue not escalated?&lt;/li&gt;
&lt;li&gt;Why did the system label this as a normal bug instead of an incident?&lt;/li&gt;
&lt;li&gt;Why did the workflow take much longer for one customer segment?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are workflow questions, not just model questions.&lt;/p&gt;

&lt;p&gt;That is why the trace should preserve both the workflow path and the operational metadata around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple instrumentation mindset
&lt;/h2&gt;

&lt;p&gt;You do not need to instrument every field from day one.&lt;/p&gt;

&lt;p&gt;Start by keeping one root trace for the triage request, then add child steps for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification&lt;/li&gt;
&lt;li&gt;priority scoring&lt;/li&gt;
&lt;li&gt;queue routing&lt;/li&gt;
&lt;li&gt;escalation logic&lt;/li&gt;
&lt;li&gt;summary generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That alone gives you a much clearer picture of how the system behaves in production.&lt;/p&gt;
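
&lt;p&gt;A minimal sketch of that root-plus-children shape, using a hypothetical in-memory tracer in Python (the real Tokvera SDK surface will differ):&lt;/p&gt;

```python
import time
import uuid

class TriageTrace:
    """Hypothetical tracer: one root trace, one recorded span per triage step."""

    def __init__(self):
        self.trace_id = f"trc_{uuid.uuid4().hex[:8]}"
        self.spans = []

    def record_step(self, name, output):
        # each child step keeps its own name, timing, and output
        self.spans.append({
            "name": name,
            "recorded_at": time.time(),
            "output": output,
        })

trace = TriageTrace()
trace.record_step("classification", {"label": "bug"})
trace.record_step("priority_scoring", {"priority": "high"})
trace.record_step("queue_routing", {"queue": "engineering"})
trace.record_step("escalation_logic", {"should_escalate": True})
trace.record_step("summary_generation", {"summary": "..."})
```

&lt;p&gt;Even this flat structure answers the "which step decided that" question, and you can upgrade it to a real tracing backend later without changing the workflow shape.&lt;/p&gt;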

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Support AI is more useful when it helps teams make operational decisions, not just generate text.&lt;/p&gt;

&lt;p&gt;Once your workflow starts classifying tickets, assigning urgency, choosing owners, and escalating issues, you need a way to inspect the path that produced those decisions.&lt;/p&gt;

&lt;p&gt;That is what ticket triage workflow tracing gives you.&lt;/p&gt;

&lt;p&gt;Not more logs.&lt;/p&gt;

&lt;p&gt;A debuggable workflow.&lt;/p&gt;

&lt;p&gt;Visit &lt;a href="https://tokvera.org" rel="noopener noreferrer"&gt;tokvera.org&lt;/a&gt; for more details.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>observability</category>
      <category>api</category>
    </item>
    <item>
      <title>How to Build a LangGraph Support Triage Workflow with Trace Visibility</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Tue, 24 Mar 2026 18:30:00 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-build-a-langgraph-support-triage-workflow-with-trace-visibility-222</link>
      <guid>https://dev.to/tokvera/how-to-build-a-langgraph-support-triage-workflow-with-trace-visibility-222</guid>
      <description>&lt;p&gt;A lot of LangGraph demos prove that graphs can run.&lt;/p&gt;

&lt;p&gt;Fewer prove that teams can operate them.&lt;/p&gt;

&lt;p&gt;That difference matters.&lt;/p&gt;

&lt;p&gt;Once a workflow starts classifying tickets, choosing queues, deciding whether to escalate, and generating internal summaries, the important question is no longer just "did the graph execute?" It becomes "why did it make that decision?"&lt;/p&gt;

&lt;p&gt;That is the motivation behind &lt;a href="https://github.com/Tokvera/langgraph-ticket-triage" rel="noopener noreferrer"&gt;&lt;code&gt;langgraph-ticket-triage&lt;/code&gt;&lt;/a&gt;, a small Python starter that shows how to build a support triage workflow with LangGraph, FastAPI, and Tokvera trace visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why LangGraph workflows need observability
&lt;/h2&gt;

&lt;p&gt;LangGraph is useful because it gives you a clean way to model multi-step workflows.&lt;/p&gt;

&lt;p&gt;But in production-like systems, graph execution alone is not enough.&lt;/p&gt;

&lt;p&gt;Teams still need to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how a ticket was classified&lt;/li&gt;
&lt;li&gt;why a queue was selected&lt;/li&gt;
&lt;li&gt;whether escalation logic was applied&lt;/li&gt;
&lt;li&gt;what summary was generated for the internal team&lt;/li&gt;
&lt;li&gt;whether the result came from mock mode or a live model call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that visibility, graph-based systems can become just as opaque as a large one-shot prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this starter repo does
&lt;/h2&gt;

&lt;p&gt;The repo focuses on a practical support triage flow instead of a toy graph.&lt;/p&gt;

&lt;p&gt;For each incoming ticket, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;starts a LangGraph workflow run&lt;/li&gt;
&lt;li&gt;classifies the ticket&lt;/li&gt;
&lt;li&gt;chooses a destination queue&lt;/li&gt;
&lt;li&gt;assigns SLA and suggested ownership&lt;/li&gt;
&lt;li&gt;generates an internal summary&lt;/li&gt;
&lt;li&gt;returns triage metadata, next actions, and Tokvera trace IDs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That makes it a strong reference for teams that want a Python-first agent workflow example with real operational shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow structure is intentionally simple
&lt;/h2&gt;

&lt;p&gt;The current graph uses two nodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;classify&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;summarize&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the workflow path looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ticket input
  -&amp;gt; classify node
  -&amp;gt; summarize node
  -&amp;gt; triage response + Tokvera trace IDs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a good starter shape because it keeps the graph readable while still separating two different responsibilities.&lt;/p&gt;

&lt;p&gt;Classification handles routing decisions.&lt;/p&gt;

&lt;p&gt;Summarization handles internal communication.&lt;/p&gt;

&lt;p&gt;That separation makes the workflow easier to inspect and extend.&lt;/p&gt;
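
&lt;p&gt;In the repo this is a LangGraph &lt;code&gt;StateGraph&lt;/code&gt; with the two nodes wired by edges. Here is the same two-step shape sketched as dependency-free Python, so the split between the responsibilities is easy to see; the keyword rule is a stand-in for the real classification logic:&lt;/p&gt;

```python
def classify_node(state):
    # routing decision: classification drives queue selection
    label = "bug" if "error" in state["message"] else "general"
    return {**state, "classification": label}

def summarize_node(state):
    # internal communication: turn the decision into a handoff summary
    summary = f"[{state['classification']}] {state['subject']}"
    return {**state, "summary": summary}

def run_workflow(ticket):
    # ticket input, then classify, then summarize, then triage response
    return summarize_node(classify_node(ticket))
```

&lt;p&gt;Because each node takes state in and returns state out, each one maps cleanly to its own trace span.&lt;/p&gt;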

&lt;h2&gt;
  
  
  Why this workflow is more realistic than a simple agent demo
&lt;/h2&gt;

&lt;p&gt;A realistic support flow has to do more than produce text.&lt;/p&gt;

&lt;p&gt;It has to turn an inbound ticket into operational decisions.&lt;/p&gt;

&lt;p&gt;In this starter, that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification such as &lt;code&gt;bug&lt;/code&gt;, &lt;code&gt;billing&lt;/code&gt;, &lt;code&gt;feature&lt;/code&gt;, or &lt;code&gt;general&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;priority setting&lt;/li&gt;
&lt;li&gt;queue selection&lt;/li&gt;
&lt;li&gt;escalation recommendation&lt;/li&gt;
&lt;li&gt;suggested ownership&lt;/li&gt;
&lt;li&gt;SLA expectations&lt;/li&gt;
&lt;li&gt;next actions for the support team&lt;/li&gt;
&lt;li&gt;an internal summary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the kind of output support and platform teams can actually use.&lt;/p&gt;

&lt;h2&gt;
  
  
  The API surface
&lt;/h2&gt;

&lt;p&gt;The project exposes a small set of routes for health checks, reusable sample payloads, and direct workflow execution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /health&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/demo-ticket&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/sample-tickets&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /api/triage&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3200/api/triage &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "subject": "Bug: team members cannot open traces",
    "message": "Our support team sees a permissions error whenever they click a trace detail page.",
    "plan": "enterprise",
    "customer_name": "Ava",
    "customer_email": "ava@example.com"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That keeps local evaluation simple and makes the repo easy to demonstrate in articles, screenshots, and developer onboarding flows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the response gives you
&lt;/h2&gt;

&lt;p&gt;The output is not just a generated summary.&lt;/p&gt;

&lt;p&gt;It returns the data that an internal support workflow actually needs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ticket"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bug: team members cannot open traces"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"plan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enterprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customer_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ava"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customer_email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ava@example.com"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"triage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"classification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bug"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"queue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"should_escalate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggested_owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"support-engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggested_sla_hours"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"urgent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"short_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"incident language detected"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"next_actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Assign to support-engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Respond within 2 hours"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Collect reproduction details, timestamps, and failing trace IDs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Escalate because the enterprise plan requires faster handling"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That combination of workflow metadata plus trace identifiers is what makes the example useful beyond a basic LangGraph demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the workflow behaves
&lt;/h2&gt;

&lt;p&gt;The classification step can run in mock mode or with a live model.&lt;/p&gt;

&lt;p&gt;The repo includes heuristic fallback behavior for issues like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bugs and incidents&lt;/li&gt;
&lt;li&gt;billing questions&lt;/li&gt;
&lt;li&gt;feature requests&lt;/li&gt;
&lt;li&gt;general support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the summarization step turns the classification output into a short internal handoff summary and a set of next actions.&lt;/p&gt;

&lt;p&gt;That is a good pattern for real teams because it separates decision logic from communication logic.&lt;/p&gt;
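
&lt;p&gt;A plausible shape for that heuristic fallback, shown as an illustration rather than the repo's exact rules; the keyword lists here are assumptions:&lt;/p&gt;

```python
# Illustrative keyword fallback for when no live model is available.
# The categories match the article; the keywords are example choices.
FALLBACK_RULES = [
    ("bug", ("error", "crash", "timeout", "incident", "broken")),
    ("billing", ("invoice", "charge", "refund", "payment")),
    ("feature", ("feature request", "would be nice", "can you add")),
]

def heuristic_classify(text):
    lowered = text.lower()
    for label, keywords in FALLBACK_RULES:
        if any(keyword in lowered for keyword in keywords):
            return label
    # everything else falls through to general support
    return "general"
```

&lt;p&gt;The value of a fallback like this is not accuracy; it is that mock mode still produces a structured, traceable decision instead of an empty field.&lt;/p&gt;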

&lt;h2&gt;
  
  
  Why Tokvera fits well with LangGraph
&lt;/h2&gt;

&lt;p&gt;LangGraph gives you workflow structure.&lt;/p&gt;

&lt;p&gt;Tokvera gives you workflow visibility.&lt;/p&gt;

&lt;p&gt;This starter uses Tokvera to make the graph inspectable at two useful levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;graph root runs&lt;/li&gt;
&lt;li&gt;node-level execution spans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means you can inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the overall workflow run&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;classify_ticket&lt;/code&gt; decision step&lt;/li&gt;
&lt;li&gt;the model-backed classification call when live mode is enabled&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;summarize_triage&lt;/code&gt; step&lt;/li&gt;
&lt;li&gt;the model-backed summary generation call when live mode is enabled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That distinction matters because debugging agent workflows usually requires more than raw model telemetry.&lt;/p&gt;

&lt;p&gt;You need to understand the workflow path itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this helps you debug
&lt;/h2&gt;

&lt;p&gt;With node-level visibility, you can answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the graph classify a billing issue as a bug?&lt;/li&gt;
&lt;li&gt;Was escalation triggered because of the plan, the message content, or both?&lt;/li&gt;
&lt;li&gt;Did the classification step behave correctly but the summary step produce weak output?&lt;/li&gt;
&lt;li&gt;Did mock mode hide a live-model issue during local testing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are the kinds of questions teams actually hit when they move from demo graphs to production-like workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running it locally
&lt;/h2&gt;

&lt;p&gt;The project defaults to mock mode, which is the right choice for a starter.&lt;/p&gt;

&lt;p&gt;It lets you evaluate the workflow without needing live provider credentials on day one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;.&lt;/span&gt; .venv/Scripts/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
copy .env.example .env
uvicorn app.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 3200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default, the API runs on &lt;code&gt;http://localhost:3200&lt;/code&gt;. Note that the commands above assume Windows; on macOS or Linux, activate the environment with &lt;code&gt;. .venv/bin/activate&lt;/code&gt; and copy the env file with &lt;code&gt;cp .env.example .env&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To use a live provider, set &lt;code&gt;MOCK_MODE=false&lt;/code&gt; and provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TOKVERA_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also configure &lt;code&gt;TOKVERA_INGEST_URL&lt;/code&gt;, &lt;code&gt;TOKVERA_TENANT_ID&lt;/code&gt;, and &lt;code&gt;OPENAI_MODEL&lt;/code&gt;.&lt;/p&gt;
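
&lt;p&gt;Reading those flags usually looks something like this. The variable names come from the article; the parsing helper and the model default are a common pattern, not necessarily the repo's code:&lt;/p&gt;

```python
import os

def env_flag(name, default=True):
    # treat "false", "0", and "no" as off; anything else as on
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() not in ("false", "0", "no")

MOCK_MODE = env_flag("MOCK_MODE", default=True)
# the fallback model name here is illustrative
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o-mini")

if not MOCK_MODE:
    # live mode needs both provider and tracing credentials
    missing = [key for key in ("OPENAI_API_KEY", "TOKVERA_API_KEY")
               if not os.getenv(key)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {missing}")
```

&lt;p&gt;Failing fast on missing credentials is worth the few extra lines: it turns a confusing mid-request provider error into an obvious startup error.&lt;/p&gt;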

&lt;h2&gt;
  
  
  Why this repo is valuable for Python-first teams
&lt;/h2&gt;

&lt;p&gt;A lot of OSS AI starter content leans heavily toward JavaScript.&lt;/p&gt;

&lt;p&gt;This repo matters because it gives Python teams a concrete example of how to combine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI for the API surface&lt;/li&gt;
&lt;li&gt;LangGraph for workflow orchestration&lt;/li&gt;
&lt;li&gt;OpenAI for model-backed steps&lt;/li&gt;
&lt;li&gt;Tokvera for root-run and node-level visibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination makes it a good reference for teams building internal agents, support flows, and other stateful multi-step workflows in Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to customize next
&lt;/h2&gt;

&lt;p&gt;The starter is intentionally compact, which makes it easy to extend.&lt;/p&gt;

&lt;p&gt;The next useful upgrades would be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add more graph nodes for knowledge-base lookup or escalation review&lt;/li&gt;
&lt;li&gt;add a human-in-the-loop approval step before escalation&lt;/li&gt;
&lt;li&gt;add queue-specific summary formats&lt;/li&gt;
&lt;li&gt;persist workflow runs to a database&lt;/li&gt;
&lt;li&gt;attach screenshots or payload references to traces&lt;/li&gt;
&lt;li&gt;build a lightweight support console UI on top of the API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are natural next steps for any team turning a graph demo into a real workflow surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The best LangGraph examples do more than show nodes and edges.&lt;/p&gt;

&lt;p&gt;They show how a workflow makes decisions and how a team can inspect those decisions later.&lt;/p&gt;

&lt;p&gt;That is why &lt;code&gt;langgraph-ticket-triage&lt;/code&gt; is useful.&lt;/p&gt;

&lt;p&gt;It gives Python teams a practical support-triage workflow with clear graph structure, useful operational output, and trace visibility that makes the system debuggable instead of opaque.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/Tokvera/langgraph-ticket-triage" rel="noopener noreferrer"&gt;https://github.com/Tokvera/langgraph-ticket-triage&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangGraph tracing docs: &lt;a href="https://tokvera.org/docs/integrations/langgraph" rel="noopener noreferrer"&gt;https://tokvera.org/docs/integrations/langgraph&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Get started: &lt;a href="https://tokvera.org/docs/get-started" rel="noopener noreferrer"&gt;https://tokvera.org/docs/get-started&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
    <item>
      <title>How to Build an OpenAI-Compatible LLM Gateway with Model Routing Visibility</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Sun, 22 Mar 2026 18:30:00 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-build-an-openai-compatible-llm-gateway-with-model-routing-visibility-2jnh</link>
      <guid>https://dev.to/tokvera/how-to-build-an-openai-compatible-llm-gateway-with-model-routing-visibility-2jnh</guid>
      <description>&lt;p&gt;Most teams do not start with a full AI platform.&lt;/p&gt;

&lt;p&gt;They start with a problem.&lt;/p&gt;

&lt;p&gt;Maybe one team wants to proxy OpenAI traffic through an internal service. Maybe another wants to route small prompts to a cheaper model and longer prompts to a stronger one. Maybe the platform team wants one place to add policy, fallback, logging, rate limits, or tenant-specific rules.&lt;/p&gt;

&lt;p&gt;That is usually the moment when a gateway becomes more valuable than another direct SDK call.&lt;/p&gt;

&lt;p&gt;The challenge is that once you insert a gateway between the application and the model provider, you also create a new layer that can become opaque. A request gets routed somewhere, a model gets selected, a response comes back, and later nobody remembers why that route was chosen.&lt;/p&gt;

&lt;p&gt;That is the motivation behind &lt;a href="https://github.com/Tokvera/llm-gateway-template" rel="noopener noreferrer"&gt;&lt;code&gt;llm-gateway-template&lt;/code&gt;&lt;/a&gt;, an open-source Node.js starter that shows how to build an OpenAI-compatible gateway with model routing and Tokvera trace visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why an LLM gateway is useful
&lt;/h2&gt;

&lt;p&gt;An LLM gateway gives platform teams a control point.&lt;/p&gt;

&lt;p&gt;Instead of letting every application talk to providers directly, the gateway becomes the place where you can standardize request handling and enforce common decisions.&lt;/p&gt;

&lt;p&gt;That usually includes things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;routing &lt;code&gt;auto&lt;/code&gt; requests to different models&lt;/li&gt;
&lt;li&gt;applying policy before a provider call happens&lt;/li&gt;
&lt;li&gt;centralizing observability and audit metadata&lt;/li&gt;
&lt;li&gt;adding tenant-level behavior without changing every client app&lt;/li&gt;
&lt;li&gt;introducing fallback logic without touching each product surface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially useful when the application team wants a familiar API contract but the platform team wants more control underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this starter repo does
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;llm-gateway-template&lt;/code&gt; is intentionally small, but it captures the workflow shape that matters.&lt;/p&gt;

&lt;p&gt;For each incoming OpenAI-style request, the service:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;accepts a &lt;code&gt;/v1/chat/completions&lt;/code&gt; payload&lt;/li&gt;
&lt;li&gt;decides whether to keep the requested model or auto-route it&lt;/li&gt;
&lt;li&gt;forwards the request to a downstream provider or mock responder&lt;/li&gt;
&lt;li&gt;returns an OpenAI-compatible completion response&lt;/li&gt;
&lt;li&gt;includes Tokvera metadata for the route and trace&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That makes the repo useful for teams that want to prototype gateway behavior without having to build a large internal platform first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The API shape stays familiar
&lt;/h2&gt;

&lt;p&gt;One of the best choices in this starter is that it keeps the interface simple.&lt;/p&gt;

&lt;p&gt;Clients can call it using a familiar OpenAI-style payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3100/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "auto",
    "messages": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "Summarize the importance of model routing in two bullet points." }
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters because it lowers adoption friction.&lt;/p&gt;

&lt;p&gt;You can introduce gateway logic without forcing every internal caller to learn a completely new contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  How routing works in the starter
&lt;/h2&gt;

&lt;p&gt;The default routing logic is simple on purpose.&lt;/p&gt;

&lt;p&gt;If the caller specifies an explicit model, the gateway passes that through unchanged.&lt;/p&gt;

&lt;p&gt;If the caller uses &lt;code&gt;model: "auto"&lt;/code&gt;, the gateway estimates prompt size and chooses either a small model or a larger one.&lt;/p&gt;

&lt;p&gt;In the current implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;explicit models become passthrough requests&lt;/li&gt;
&lt;li&gt;short prompts route to the smaller model&lt;/li&gt;
&lt;li&gt;longer prompts route to the larger model&lt;/li&gt;
&lt;li&gt;the response carries the route reason and selected model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is enough to demonstrate the control plane behavior that most teams care about first.&lt;/p&gt;
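
&lt;p&gt;That routing policy fits in a few lines. The sketch below is in Python for brevity (the repo itself is Node.js), and the model names and size threshold are illustrative, not the template's actual defaults:&lt;/p&gt;

```python
import operator

SMALL_MODEL = "gpt-4o-mini"   # illustrative model names
LARGE_MODEL = "gpt-4o"
SMALL_PROMPT_LIMIT = 2000     # characters; the threshold is a tunable guess

def route(requested_model, messages):
    # explicit models become passthrough requests
    if requested_model != "auto":
        return {"model": requested_model, "route_reason": "explicit_model"}
    total_chars = sum(len(m["content"]) for m in messages)
    # short prompts default to the cheaper model
    if operator.le(total_chars, SMALL_PROMPT_LIMIT):
        return {"model": SMALL_MODEL, "route_reason": "short_prompt_default",
                "total_characters": total_chars}
    return {"model": LARGE_MODEL, "route_reason": "long_prompt_upgrade",
            "total_characters": total_chars}
```

&lt;p&gt;Returning the &lt;code&gt;route_reason&lt;/code&gt; alongside the selected model is the important part: it is what later lets a trace explain the decision instead of just recording it.&lt;/p&gt;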

&lt;h2&gt;
  
  
  Why visibility matters at the gateway layer
&lt;/h2&gt;

&lt;p&gt;A gateway is not only an HTTP proxy.&lt;/p&gt;

&lt;p&gt;It is a decision engine.&lt;/p&gt;

&lt;p&gt;Once the gateway starts selecting models, estimating prompt size, or applying policy, it becomes one of the most important places to observe.&lt;/p&gt;

&lt;p&gt;Without visibility, teams run into questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why did this request choose the large model?&lt;/li&gt;
&lt;li&gt;Did the client override the route, or did the gateway decide?&lt;/li&gt;
&lt;li&gt;Was the request expensive because of the prompt, the chosen model, or both?&lt;/li&gt;
&lt;li&gt;Did the provider fail, or did routing logic choose the wrong path?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your only evidence is the final completion response, debugging turns into guesswork.&lt;/p&gt;

&lt;p&gt;That is why tracing the gateway itself matters just as much as tracing the downstream model call.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Tokvera fits into the flow
&lt;/h2&gt;

&lt;p&gt;The starter uses Tokvera to trace both the gateway root and the downstream model execution.&lt;/p&gt;

&lt;p&gt;The architecture is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenAI-style request
  -&amp;gt; route_request
  -&amp;gt; downstream_provider_call
  -&amp;gt; completion response + Tokvera metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That structure gives you a coherent trace instead of isolated model events.&lt;/p&gt;

&lt;p&gt;You can inspect the routing step, see the selected model, review route reasoning, and keep the downstream provider call attached to the same workflow lineage.&lt;/p&gt;

&lt;p&gt;That is much more useful than observing only the final provider response in isolation.&lt;/p&gt;
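
&lt;p&gt;The lineage idea can be sketched in a few lines of Python (the Node.js repo and the Tokvera SDK will differ; the ID format and span fields here are assumptions):&lt;/p&gt;

```python
import uuid

def new_id(prefix):
    return f"{prefix}_{uuid.uuid4().hex[:8]}"

def handle_request(payload):
    # one trace per gateway request; every step shares its lineage
    trace_id = new_id("trc")
    run_id = new_id("run")
    routing_span = {"trace_id": trace_id, "run_id": run_id,
                    "name": "route_request"}
    provider_span = {"trace_id": trace_id, "run_id": run_id,
                     "name": "downstream_provider_call",
                     "parent": routing_span["name"]}
    return {"tokvera": {"traceId": trace_id, "runId": run_id},
            "spans": [routing_span, provider_span]}
```

&lt;p&gt;Because the provider span carries the same identifiers as the routing span, the downstream call stays attached to the workflow that produced it.&lt;/p&gt;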

&lt;h2&gt;
  
  
  What the response gives you
&lt;/h2&gt;

&lt;p&gt;The gateway returns a familiar completion response and includes a &lt;code&gt;tokvera&lt;/code&gt; object with routing and request metadata.&lt;/p&gt;

&lt;p&gt;Example shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl_mock_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mock gateway response from gpt-4o-mini: ..."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tokvera"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"traceId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"runId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"routing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"routeReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"short_prompt_default"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"sizeClass"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"small"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"selectedModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"totalCharacters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;124&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"estimatedPromptTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"request"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"requestedModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"messageCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"mockMode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mock"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That extra metadata is what makes the gateway operationally useful.&lt;/p&gt;

&lt;p&gt;It lets platform teams see not only what the model said, but also how the request moved through the routing system.&lt;/p&gt;
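&lt;p&gt;As an illustration, the &lt;code&gt;routing&lt;/code&gt; block in the response above can come from a very small heuristic. The sketch below is not the repo's actual implementation: the 4-characters-per-token ratio, the token cutoff, the large-model name, and the &lt;code&gt;long_prompt_upgrade&lt;/code&gt; reason string are all assumptions.&lt;/p&gt;

```typescript
// Hypothetical size-based router that mirrors the trace fields above.
// Cutoff, char-per-token ratio, and large-model name are assumptions.
type Message = { role: string; content: string };

const SMALL_MODEL = "gpt-4o-mini";
const LARGE_MODEL = "gpt-4o"; // assumed large-tier model
const LARGE_PROMPT_TOKENS = 1000; // assumed upgrade threshold

function routeRequest(messages: Message[]) {
  const totalCharacters = messages.reduce((n, m) => n + m.content.length, 0);
  // Rough heuristic: roughly 4 characters per token.
  const estimatedPromptTokens = Math.ceil(totalCharacters / 4);
  const isLarge = estimatedPromptTokens > LARGE_PROMPT_TOKENS;
  return {
    routeReason: isLarge ? "long_prompt_upgrade" : "short_prompt_default",
    sizeClass: isLarge ? "large" : "small",
    selectedModel: isLarge ? LARGE_MODEL : SMALL_MODEL,
    totalCharacters,
    estimatedPromptTokens,
  };
}
```

&lt;p&gt;Feeding it a 124-character prompt reproduces the trace values above: &lt;code&gt;sizeClass&lt;/code&gt; of &lt;code&gt;"small"&lt;/code&gt;, &lt;code&gt;estimatedPromptTokens&lt;/code&gt; of 31, and &lt;code&gt;gpt-4o-mini&lt;/code&gt; as the selected model.&lt;/p&gt;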

&lt;h2&gt;
  
  
  Running it locally
&lt;/h2&gt;

&lt;p&gt;Like the support-router starter, this project defaults to mock mode.&lt;/p&gt;

&lt;p&gt;That makes it easy to evaluate and demo without needing live provider traffic on day one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install
cp&lt;/span&gt; .env.example .env
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default, the service runs on &lt;code&gt;http://localhost:3100&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To use live requests, set &lt;code&gt;MOCK_MODE=false&lt;/code&gt; and provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TOKVERA_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also configure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OPENAI_MODEL_SMALL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;OPENAI_MODEL_LARGE&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GATEWAY_TENANT_ID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TOKVERA_INGEST_URL&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
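&lt;p&gt;Put together, a live-mode &lt;code&gt;.env&lt;/code&gt; might look like this. The variable names come from the lists above; every value here is a placeholder, not a real default.&lt;/p&gt;

```shell
# Live mode: disable the built-in mock provider
MOCK_MODE=false

# Required for live traffic (placeholder values)
OPENAI_API_KEY=sk-your-key
TOKVERA_API_KEY=your-tokvera-key

# Optional overrides (values are illustrative)
OPENAI_MODEL_SMALL=gpt-4o-mini
OPENAI_MODEL_LARGE=gpt-4o
GATEWAY_TENANT_ID=acme-dev
TOKVERA_INGEST_URL=https://ingest.example.com
```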

&lt;p&gt;That makes the starter good for both local demos and real integration experiments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to customize next
&lt;/h2&gt;

&lt;p&gt;The repo is deliberately minimal, which makes it a good foundation for platform-specific extensions.&lt;/p&gt;

&lt;p&gt;The next useful upgrades would be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add provider fallback chains&lt;/li&gt;
&lt;li&gt;add latency-aware or cost-aware routing&lt;/li&gt;
&lt;li&gt;add tenant-specific policies and budgets&lt;/li&gt;
&lt;li&gt;add rate limiting and request logging&lt;/li&gt;
&lt;li&gt;add payload redaction or prompt policy checks&lt;/li&gt;
&lt;li&gt;add Anthropic or Gemini as downstream providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are the kinds of features that turn a starter into a real internal AI gateway.&lt;/p&gt;
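&lt;p&gt;To show the shape of the first extension, a provider fallback chain can be as simple as trying each configured provider in order and reporting which ones failed. This is a sketch, not code from the repo: real provider calls would be async and would capture richer error metadata per attempt.&lt;/p&gt;

```typescript
// Sketch of a provider fallback chain (illustrative, not from the repo).
type Provider = { name: string; complete: (prompt: string) => string };

function completeWithFallback(providers: Provider[], prompt: string) {
  const failures: string[] = [];
  for (const p of providers) {
    try {
      // First provider that succeeds wins; earlier failures are reported.
      return { provider: p.name, text: p.complete(prompt), failures };
    } catch {
      failures.push(p.name);
    }
  }
  throw new Error("all providers failed: " + failures.join(", "));
}
```

&lt;p&gt;Surfacing &lt;code&gt;failures&lt;/code&gt; alongside the successful completion matters for the same reason as the routing metadata: operators can see that a fallback happened, not just that an answer came back.&lt;/p&gt;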

&lt;h2&gt;
  
  
  Why this repo is commercially useful
&lt;/h2&gt;

&lt;p&gt;A lot of AI infrastructure work happens before a team is ready for a full orchestration platform.&lt;/p&gt;

&lt;p&gt;Those teams still need a place to enforce routing rules, centralize cost control, and inspect why requests were handled the way they were.&lt;/p&gt;

&lt;p&gt;That is exactly where an OpenAI-compatible gateway becomes valuable.&lt;/p&gt;

&lt;p&gt;And that is why &lt;code&gt;llm-gateway-template&lt;/code&gt; is a strong reference repo.&lt;/p&gt;

&lt;p&gt;It shows how to preserve a familiar client interface while making gateway behavior observable, inspectable, and extensible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/Tokvera/llm-gateway-template" rel="noopener noreferrer"&gt;https://github.com/Tokvera/llm-gateway-template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Existing app tracing guide: &lt;a href="https://tokvera.org/docs/integrations/existing-app" rel="noopener noreferrer"&gt;https://tokvera.org/docs/integrations/existing-app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Get started: &lt;a href="https://tokvera.org/docs/get-started" rel="noopener noreferrer"&gt;https://tokvera.org/docs/get-started&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>opensource</category>
      <category>api</category>
    </item>
    <item>
      <title>How to Build a Customer Support AI Router with Trace Visibility</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Sun, 22 Mar 2026 14:19:01 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-build-a-customer-support-ai-router-with-trace-visibility-37</link>
      <guid>https://dev.to/tokvera/how-to-build-a-customer-support-ai-router-with-trace-visibility-37</guid>
      <description>&lt;p&gt;Most AI support demos stop at a single prompt. A user asks a question, the model returns a reply, and the tutorial ends there.&lt;/p&gt;

&lt;p&gt;That is not how real support systems behave.&lt;/p&gt;

&lt;p&gt;A real support workflow has to decide what kind of issue it is, where it should go, whether it needs escalation, what policy context applies, and how the final response should be written. Once you add those steps, you also need a way to inspect why the workflow made each decision.&lt;/p&gt;

&lt;p&gt;That is the problem behind &lt;a href="https://github.com/Tokvera/ai-support-router-starter" rel="noopener noreferrer"&gt;&lt;code&gt;ai-support-router-starter&lt;/code&gt;&lt;/a&gt;, a small open-source Node.js example from Tokvera that shows how to build a realistic customer-support AI workflow with trace visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why support AI needs routing, not just prompting
&lt;/h2&gt;

&lt;p&gt;A support assistant is not only a writing tool.&lt;/p&gt;

&lt;p&gt;It is also a routing system.&lt;/p&gt;

&lt;p&gt;If a customer reports unexpected charges, the system should recognize that this is likely a billing issue, route it to the right internal queue, decide whether escalation is needed, and apply the right response guidance before drafting the final reply.&lt;/p&gt;

&lt;p&gt;In practice, that usually means you need at least these layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;policy or knowledge lookup&lt;/li&gt;
&lt;li&gt;reply drafting&lt;/li&gt;
&lt;li&gt;operational metadata like SLA, ownership, and escalation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that structure, you end up with a nice demo and a fragile system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the starter repo does
&lt;/h2&gt;

&lt;p&gt;The starter focuses on a practical workflow shape instead of pretending a single prompt solves support automation.&lt;/p&gt;

&lt;p&gt;It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classifies inbound tickets into categories like billing, bug, feature, or general support&lt;/li&gt;
&lt;li&gt;chooses a queue, owner, priority, and escalation recommendation&lt;/li&gt;
&lt;li&gt;looks up policy guidance before drafting the reply&lt;/li&gt;
&lt;li&gt;returns internal next actions along with the customer-facing answer&lt;/li&gt;
&lt;li&gt;emits Tokvera trace data so the end-to-end request path is inspectable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to be a complete helpdesk product.&lt;/p&gt;

&lt;p&gt;The goal is to give teams a realistic foundation they can fork and extend.&lt;/p&gt;

&lt;h2&gt;
  
  
  The API shape
&lt;/h2&gt;

&lt;p&gt;The project exposes a small set of routes for health checks, sample payloads, and workflow execution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /health&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/demo-ticket&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/sample-tickets&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /api/tickets/reply&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3000/api/tickets/reply &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "subject": "Need help understanding extra usage charges",
    "message": "Our finance team saw a larger invoice this week. Can you explain what changed?",
    "plan": "pro",
    "customerName": "Riya",
    "customerEmail": "riya@example.com"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the response is more useful than a plain model completion because it contains workflow output, not just generated text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"traceId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"runId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"triage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"billing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"queue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"billing-ops"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggestedOwner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"billing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggestedSlaHours"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"shouldEscalate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"reassuring"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"nextActions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Assign to billing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Respond within 8 hours"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Review included usage, overages, and invoice change history"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reply"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Workflow architecture
&lt;/h2&gt;

&lt;p&gt;The starter keeps the flow intentionally simple and inspectable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Inbound ticket
  -&amp;gt; classify_ticket
  -&amp;gt; lookup_policy
  -&amp;gt; draft_reply
  -&amp;gt; return triage + reply + next actions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That separation matters.&lt;/p&gt;

&lt;p&gt;If classification, policy lookup, and drafting all live inside one opaque prompt, you only see the final answer and are left guessing where the system went wrong.&lt;/p&gt;

&lt;p&gt;When the workflow is split into distinct steps, debugging becomes much easier.&lt;/p&gt;
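&lt;p&gt;To make the first step concrete, here is a minimal keyword-based version of &lt;code&gt;classify_ticket&lt;/code&gt;. The category-to-queue routing table mirrors the values in the example response above; the keyword lists are assumptions, and the actual repo may classify with a model call instead.&lt;/p&gt;

```typescript
// Illustrative keyword-based classify_ticket step (not the repo's code).
type Triage = {
  category: string;
  queue: string;
  priority: string;
  suggestedSlaHours: number;
  tone: string;
};

// Routing table mirrors the example response; bug/feature rows are assumed.
const ROUTES: { [category: string]: Triage } = {
  billing: { category: "billing", queue: "billing-ops", priority: "medium", suggestedSlaHours: 8, tone: "reassuring" },
  bug: { category: "bug", queue: "engineering", priority: "high", suggestedSlaHours: 4, tone: "direct" },
  feature: { category: "feature", queue: "product", priority: "low", suggestedSlaHours: 72, tone: "appreciative" },
  general: { category: "general", queue: "support", priority: "medium", suggestedSlaHours: 24, tone: "friendly" },
};

// Keyword lists are illustrative assumptions.
const KEYWORDS: { [category: string]: string[] } = {
  billing: ["invoice", "charge", "refund", "billing"],
  bug: ["error", "crash", "broken", "not working"],
  feature: ["feature request", "would be great", "could you add"],
};

function classifyTicket(subject: string, message: string): Triage {
  const text = (subject + " " + message).toLowerCase();
  for (const category of Object.keys(KEYWORDS)) {
    if (KEYWORDS[category].some((k) => text.includes(k))) {
      return ROUTES[category];
    }
  }
  // Nothing matched: fall through to general support.
  return ROUTES.general;
}
```

&lt;p&gt;Running it on the example ticket ("Need help understanding extra usage charges") yields the same billing triage shown in the response above, which is exactly the kind of step-level output you want visible in a trace.&lt;/p&gt;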

&lt;h2&gt;
  
  
  Why trace visibility matters
&lt;/h2&gt;

&lt;p&gt;Support automation can fail in several different ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a billing ticket gets routed to engineering&lt;/li&gt;
&lt;li&gt;an enterprise account does not escalate quickly enough&lt;/li&gt;
&lt;li&gt;the correct queue is chosen but the wrong policy guidance is used&lt;/li&gt;
&lt;li&gt;the workflow classifies correctly but drafts the wrong tone or final answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If all you can see is the final response, root cause analysis becomes slow and fuzzy.&lt;/p&gt;

&lt;p&gt;With trace visibility, you can inspect the workflow path that produced the result and see which step actually broke down.&lt;/p&gt;

&lt;p&gt;That is where Tokvera fits into the starter.&lt;/p&gt;

&lt;p&gt;Instead of only tracking raw model usage, Tokvera helps you inspect the root workflow trace and the individual decisions made along the way.&lt;/p&gt;
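&lt;p&gt;The mechanics are easy to sketch generically. The helper below is illustrative only and is not Tokvera's actual SDK; it just shows the shape of the idea, where every workflow step records its name, duration, and outcome on a shared trace.&lt;/p&gt;

```typescript
// Generic per-step tracing helper (illustrative, not Tokvera's SDK).
type Span = { step: string; ms: number; ok: boolean };

function makeTracer(traceId: string) {
  const spans: Span[] = [];
  // Wrap a workflow step so its execution is recorded on the trace.
  function traced(step: string, fn: () => unknown) {
    const start = Date.now();
    try {
      const out = fn();
      spans.push({ step, ms: Date.now() - start, ok: true });
      return out;
    } catch (err) {
      // Failed steps still land on the trace, which is the point:
      // you can see exactly which stage broke down.
      spans.push({ step, ms: Date.now() - start, ok: false });
      throw err;
    }
  }
  return { traceId, spans, traced };
}
```

&lt;p&gt;Wrapping &lt;code&gt;classify_ticket&lt;/code&gt;, &lt;code&gt;lookup_policy&lt;/code&gt;, and &lt;code&gt;draft_reply&lt;/code&gt; this way gives you an ordered list of spans per request, so a misrouted ticket can be traced back to the specific step that made the wrong call.&lt;/p&gt;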

&lt;h2&gt;
  
  
  Running the project locally
&lt;/h2&gt;

&lt;p&gt;One nice detail in this repo is that it defaults to mock mode.&lt;/p&gt;

&lt;p&gt;That makes it useful for local evaluation, demos, screenshots, and onboarding even before you wire in live provider traffic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install
cp&lt;/span&gt; .env.example .env
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you want to switch to live traffic, set &lt;code&gt;MOCK_MODE=false&lt;/code&gt; and provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TOKVERA_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives teams a clean path from local development to real tracing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to customize next
&lt;/h2&gt;

&lt;p&gt;If you want to take this beyond a starter, the next obvious extensions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;replace the policy lookup stub with a real knowledge base or help center integration&lt;/li&gt;
&lt;li&gt;add Slack or email escalation hooks for urgent tickets&lt;/li&gt;
&lt;li&gt;persist ticket state and triage output to a database&lt;/li&gt;
&lt;li&gt;add provider fallback for reply drafting&lt;/li&gt;
&lt;li&gt;build a lightweight support review UI on top of the API&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this repo is useful
&lt;/h2&gt;

&lt;p&gt;A lot of AI tutorials show how to get text back from a model.&lt;/p&gt;

&lt;p&gt;Fewer show how to build a workflow that another team can actually operate.&lt;/p&gt;

&lt;p&gt;That is what makes &lt;code&gt;ai-support-router-starter&lt;/code&gt; valuable.&lt;/p&gt;

&lt;p&gt;It treats customer support AI as a decision pipeline rather than a one-shot prompt, and it gives you a traced, extensible foundation for building something real.&lt;/p&gt;

&lt;p&gt;If you want to try it, start with the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/Tokvera/ai-support-router-starter" rel="noopener noreferrer"&gt;https://github.com/Tokvera/ai-support-router-starter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://tokvera.org/docs/integrations/existing-app" rel="noopener noreferrer"&gt;https://tokvera.org/docs/integrations/existing-app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Canonical article: &lt;a href="https://tokvera.org/blog/customer-support-ai-router-trace-visibility" rel="noopener noreferrer"&gt;https://tokvera.org/blog/customer-support-ai-router-trace-visibility&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>opensource</category>
      <category>api</category>
    </item>
  </channel>
</rss>
