<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: TokVera</title>
    <description>The latest articles on DEV Community by TokVera (@tokvera).</description>
    <link>https://dev.to/tokvera</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838455%2F10da102b-c4cf-411a-8804-d79b2ee80fdc.jpg</url>
      <title>DEV Community: TokVera</title>
      <link>https://dev.to/tokvera</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tokvera"/>
    <language>en</language>
    <item>
      <title>How to Trace a Deep-Research Workbench in Node.js</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Fri, 03 Apr 2026 07:50:50 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-trace-a-deep-research-workbench-in-nodejs-49g4</link>
      <guid>https://dev.to/tokvera/how-to-trace-a-deep-research-workbench-in-nodejs-49g4</guid>
      <description>&lt;p&gt;Most research-agent demos optimize for the final answer.&lt;/p&gt;

&lt;p&gt;That is the least useful place to debug them.&lt;/p&gt;

&lt;p&gt;The operational questions show up earlier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how the research brief was framed&lt;/li&gt;
&lt;li&gt;what source directions were chosen&lt;/li&gt;
&lt;li&gt;whether the source mix was too narrow&lt;/li&gt;
&lt;li&gt;how the synthesis was assembled&lt;/li&gt;
&lt;li&gt;whether the final report preserved confidence and disagreement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why we built &lt;code&gt;open-deep-research-workbench&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Tokvera/open-deep-research-workbench" rel="noopener noreferrer"&gt;https://github.com/Tokvera/open-deep-research-workbench&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is a small Node starter that takes a research brief and turns it into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a research plan&lt;/li&gt;
&lt;li&gt;source directions&lt;/li&gt;
&lt;li&gt;a citation-aware synthesis&lt;/li&gt;
&lt;li&gt;recommended next steps&lt;/li&gt;
&lt;li&gt;one Tokvera root trace for the whole workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this is a better starting point than a flashy research demo
&lt;/h2&gt;

&lt;p&gt;A final answer can look polished even when the workflow behind it is weak.&lt;/p&gt;

&lt;p&gt;That is why teams need workflow-level visibility for research agents.&lt;/p&gt;

&lt;p&gt;This starter keeps the work inside one root trace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;research brief
  -&amp;gt; plan_research
  -&amp;gt; collect_sources
  -&amp;gt; synthesize_report
  -&amp;gt; return report + citations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
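&lt;p&gt;The shape above can be sketched in plain Node. The &lt;code&gt;makeRootTrace&lt;/code&gt; helper below is a stand-in for illustration, not the actual Tokvera SDK API, and the step bodies are placeholders for the real model calls:&lt;/p&gt;

```javascript
// Minimal sketch of running every step under one root trace.
// makeRootTrace is illustrative; the real Tokvera SDK API may differ.
function makeRootTrace(name) {
  const steps = [];
  return {
    id: `trc_${name}`,
    steps,
    // Record each workflow step with its name and duration.
    async step(stepName, fn) {
      const started = Date.now();
      const output = await fn();
      steps.push({ stepName, ms: Date.now() - started });
      return output;
    },
  };
}

async function runResearch(brief) {
  const root = makeRootTrace("deep-research");
  const plan = await root.step("plan_research", async () => ({
    topic: brief.topic,
    questions: brief.goals ?? [],
  }));
  const sources = await root.step("collect_sources", async () =>
    plan.questions.map((q, i) => ({ id: `src_${i}`, question: q }))
  );
  const report = await root.step("synthesize_report", async () => ({
    summary: `Synthesized ${sources.length} sources on ${plan.topic}`,
    citations: sources.map((s) => s.id),
  }));
  // One root trace covers the whole workflow, step by step.
  return { report, trace: { id: root.id, steps: root.steps } };
}
```

&lt;p&gt;The point is not the placeholder logic: it is that plan, sources, and synthesis all land under one trace id, so any of them can be inspected later.&lt;/p&gt;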



&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;Express&lt;/li&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;Tokvera JavaScript SDK&lt;/li&gt;
&lt;li&gt;Zod&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mock mode is enabled by default, so it is easy to run locally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Tokvera/open-deep-research-workbench.git
&lt;span class="nb"&gt;cd &lt;/span&gt;open-deep-research-workbench
npm &lt;span class="nb"&gt;install
&lt;/span&gt;cp .env.example .env
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server starts on &lt;code&gt;http://localhost:3400&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Endpoints
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /health&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/demo-brief&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/sample-briefs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /api/research&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example request
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3400/api/research &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "topic": "How engineering teams should evaluate coding agents before letting them open pull requests",
    "audience": "Platform and application engineering leads",
    "goals": [
      "Find the main reliability and review concerns around coding agents",
      "Collect practical examples of evaluation workflow design",
      "Summarize what observability signals matter before production rollout"
    ],
    "timeframe": "current developer guidance"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why the root trace matters
&lt;/h2&gt;

&lt;p&gt;Research-agent failures are usually lineage failures.&lt;/p&gt;

&lt;p&gt;The brief may be weak.&lt;br&gt;
The source directions may be too narrow.&lt;br&gt;
The synthesis may flatten disagreement.&lt;/p&gt;

&lt;p&gt;Without one root trace, you only argue about the final answer.&lt;br&gt;
With one root trace, you can inspect where the workflow drifted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful follow-up links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Tokvera/open-deep-research-workbench" rel="noopener noreferrer"&gt;https://github.com/Tokvera/open-deep-research-workbench&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Website post:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/blog/how-to-build-a-deep-research-workbench-with-one-root-trace" rel="noopener noreferrer"&gt;https://tokvera.org/blog/how-to-build-a-deep-research-workbench-with-one-root-trace&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Multi-step workflow page:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/use-cases/multi-step-ai-workflow-observability" rel="noopener noreferrer"&gt;https://tokvera.org/use-cases/multi-step-ai-workflow-observability&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Agent workflow debugging:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/use-cases/agent-workflow-debugging" rel="noopener noreferrer"&gt;https://tokvera.org/use-cases/agent-workflow-debugging&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>node</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
    <item>
      <title>Build a Coding-Agent PR Planner in Node.js with One Root Trace</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Fri, 03 Apr 2026 06:36:35 +0000</pubDate>
      <link>https://dev.to/tokvera/build-a-coding-agent-pr-planner-in-nodejs-with-one-root-trace-2hg0</link>
      <guid>https://dev.to/tokvera/build-a-coding-agent-pr-planner-in-nodejs-with-one-root-trace-2hg0</guid>
      <description>&lt;p&gt;Coding agents are useful long before you let them write code directly into production repositories.&lt;/p&gt;

&lt;p&gt;The first operationally useful step is smaller:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take a real engineering task&lt;/li&gt;
&lt;li&gt;classify it&lt;/li&gt;
&lt;li&gt;inspect the relevant repo area&lt;/li&gt;
&lt;li&gt;draft a concrete implementation plan&lt;/li&gt;
&lt;li&gt;generate a PR title and summary&lt;/li&gt;
&lt;li&gt;keep the whole workflow inside one root trace&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is what I built in &lt;code&gt;coding-agent-pr-ops&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Tokvera/coding-agent-pr-ops" rel="noopener noreferrer"&gt;https://github.com/Tokvera/coding-agent-pr-ops&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is a better starting point than full auto-code generation
&lt;/h2&gt;

&lt;p&gt;Most coding-agent demos jump too quickly from task input to generated code. That looks impressive, but it skips the part engineering teams actually need to trust:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why the task was classified a certain way&lt;/li&gt;
&lt;li&gt;which repo area the agent thinks matters&lt;/li&gt;
&lt;li&gt;how risky the task is&lt;/li&gt;
&lt;li&gt;whether the review checklist actually protects the rollout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot inspect those steps, the final PR is just a black box with a diff attached.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the repo does
&lt;/h2&gt;

&lt;p&gt;For each task, the starter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;diagnoses work type and rollout risk&lt;/li&gt;
&lt;li&gt;inspects basic repository context&lt;/li&gt;
&lt;li&gt;drafts an implementation plan&lt;/li&gt;
&lt;li&gt;generates a PR title and PR summary&lt;/li&gt;
&lt;li&gt;returns a review checklist&lt;/li&gt;
&lt;li&gt;traces the workflow with Tokvera&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Workflow shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;engineering task
  -&amp;gt; diagnose_task
  -&amp;gt; inspect_repo_context
  -&amp;gt; draft_plan
  -&amp;gt; return PR plan + review checklist
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
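&lt;p&gt;As a rough sketch of the &lt;code&gt;diagnose_task&lt;/code&gt; step, here is one way the classification could work. The heuristics and field names are assumptions for illustration, not the repo's actual implementation:&lt;/p&gt;

```javascript
// Illustrative diagnose_task heuristic; rules and field names are
// assumptions for this sketch, not the repo's real implementation.
function diagnoseTask(task) {
  const text = `${task.title} ${task.body ?? ""}`.toLowerCase();
  const labels = task.labels ?? [];

  const workType =
    labels.includes("bug") || text.startsWith("fix")
      ? "bugfix"
      : labels.includes("feature")
      ? "feature"
      : "chore";

  // In this sketch, anything touching retries, billing, auth, or
  // payments is treated as high rollout risk.
  const riskyHints = ["retr", "billing", "auth", "payment"];
  const risk = riskyHints.some((h) => text.includes(h)) ? "high" : "normal";

  return { workType, risk };
}
```

&lt;p&gt;A static heuristic like this is only the starting point; the value is that the diagnosis becomes an inspectable step in the trace rather than an implicit model judgment.&lt;/p&gt;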



&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;Express&lt;/li&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;Tokvera JavaScript SDK&lt;/li&gt;
&lt;li&gt;Zod&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mock mode is enabled by default, so you can run the whole thing without a live model key.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Tokvera/coding-agent-pr-ops.git
&lt;span class="nb"&gt;cd &lt;/span&gt;coding-agent-pr-ops
npm &lt;span class="nb"&gt;install
&lt;/span&gt;cp .env.example .env
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server runs on &lt;code&gt;http://localhost:3300&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Endpoints
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /health&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/demo-task&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/sample-tasks&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /api/pr-plan&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example request
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3300/api/pr-plan &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "title": "Fix duplicate webhook retries when the upstream API returns 429",
    "body": "When the upstream billing API returns 429, our retry worker creates duplicate webhook attempts instead of backing off cleanly.",
    "repoName": "acme/ops-agent",
    "branchName": "main",
    "labels": ["bug", "webhooks", "billing"],
    "filesHint": ["src/workers/webhook-dispatch.ts", "src/lib/backoff.ts"]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example response shape
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"traceId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"runId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"diagnosis"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"workType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bugfix"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"risk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"backend"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"repoArea"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"request handling and retries"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This looks like a bugfix task in request handling and retries with high rollout risk."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"implementationPlan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Reproduce and isolate the current behavior"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"detail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Use the labels and hinted files to confirm where the current path fails."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reviewChecklist"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Root workflow trace shows classify -&amp;gt; inspect -&amp;gt; plan -&amp;gt; draft PR summary"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prTitle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fix: fix duplicate webhook retries when the upstream api returns 429"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why the root trace matters
&lt;/h2&gt;

&lt;p&gt;Coding-agent failures are workflow failures, not just bad completions.&lt;/p&gt;

&lt;p&gt;The diagnosis might be wrong.&lt;br&gt;
The repo context might point at the wrong files.&lt;br&gt;
The plan might look coherent but still be aimed at the wrong area.&lt;/p&gt;

&lt;p&gt;Without one root trace, you only see fragments.&lt;/p&gt;

&lt;p&gt;With one root trace, you can inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;diagnosis&lt;/li&gt;
&lt;li&gt;repo-context lookup&lt;/li&gt;
&lt;li&gt;planning&lt;/li&gt;
&lt;li&gt;output handoff&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives reviewers something operationally useful instead of just “the model generated this.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Why mock mode is a feature
&lt;/h2&gt;

&lt;p&gt;Mock mode makes the repo far more reusable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;easier to demo&lt;/li&gt;
&lt;li&gt;easier to screenshot&lt;/li&gt;
&lt;li&gt;easier to explain in articles&lt;/li&gt;
&lt;li&gt;easier for developers to fork without setup friction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the workflow is clear, you can replace static hints with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub issue fetches&lt;/li&gt;
&lt;li&gt;repo inspection through MCP or GitHub APIs&lt;/li&gt;
&lt;li&gt;patch generation&lt;/li&gt;
&lt;li&gt;PR review comments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow stays the same. The trace stays useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would inspect before trusting a coding agent
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;whether diagnosis and risk match what a human reviewer would say&lt;/li&gt;
&lt;li&gt;whether repo context points to the right code path&lt;/li&gt;
&lt;li&gt;whether the review checklist protects the risky path&lt;/li&gt;
&lt;li&gt;whether the PR summary explains rollout verification&lt;/li&gt;
&lt;li&gt;whether similar tasks are getting cheaper and more reliable over time&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Useful follow-up reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Website post:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/blog/how-to-build-a-coding-agent-pr-planning-workflow-with-one-root-trace" rel="noopener noreferrer"&gt;https://tokvera.org/blog/how-to-build-a-coding-agent-pr-planning-workflow-with-one-root-trace&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Coding-agent docs:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/docs/coding-agent-tracing" rel="noopener noreferrer"&gt;https://tokvera.org/docs/coding-agent-tracing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Coding-agent use case:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/use-cases/coding-agent-observability" rel="noopener noreferrer"&gt;https://tokvera.org/use-cases/coding-agent-observability&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Agent evals in CI:

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokvera.org/docs/agent-evals-in-ci" rel="noopener noreferrer"&gt;https://tokvera.org/docs/agent-evals-in-ci&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;If you want to fork it, the repo is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Tokvera/coding-agent-pr-ops" rel="noopener noreferrer"&gt;https://github.com/Tokvera/coding-agent-pr-ops&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>typescript</category>
    </item>
    <item>
      <title>How to Add AI Gateway Observability to a Production Control Plane</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Thu, 02 Apr 2026 04:19:00 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-add-ai-gateway-observability-to-a-production-control-plane-4gbb</link>
      <guid>https://dev.to/tokvera/how-to-add-ai-gateway-observability-to-a-production-control-plane-4gbb</guid>
      <description>&lt;p&gt;A lot of teams add an AI gateway for a good reason.&lt;/p&gt;

&lt;p&gt;They want one place to enforce policy.&lt;br&gt;
They want one place to shape traffic.&lt;br&gt;
They want one place to introduce retries, failover, quotas, and model controls without rewriting every application.&lt;/p&gt;

&lt;p&gt;That architecture makes sense.&lt;/p&gt;

&lt;p&gt;But once the gateway starts making real decisions, it is no longer just a proxy.&lt;/p&gt;

&lt;p&gt;It becomes part of the production control plane.&lt;/p&gt;

&lt;p&gt;That is the point where &lt;strong&gt;AI gateway observability&lt;/strong&gt; matters.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why a gateway becomes hard to debug
&lt;/h2&gt;

&lt;p&gt;In a direct-to-provider setup, the debugging path is smaller.&lt;/p&gt;

&lt;p&gt;You usually inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the application request&lt;/li&gt;
&lt;li&gt;the provider call&lt;/li&gt;
&lt;li&gt;the final response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A gateway inserts a new decision layer in the middle.&lt;/p&gt;

&lt;p&gt;Now the same request may go through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a policy check&lt;/li&gt;
&lt;li&gt;a quota or budget guardrail&lt;/li&gt;
&lt;li&gt;route selection logic&lt;/li&gt;
&lt;li&gt;a retry branch&lt;/li&gt;
&lt;li&gt;a failover path&lt;/li&gt;
&lt;li&gt;a downstream provider call&lt;/li&gt;
&lt;li&gt;response shaping before it returns to the app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If latency spikes or the wrong provider is used, the real problem may not be the downstream model at all.&lt;/p&gt;

&lt;p&gt;It may be the control-plane logic that shaped the request before the model call happened.&lt;/p&gt;
&lt;h2&gt;
  
  
  What good gateway observability should answer
&lt;/h2&gt;

&lt;p&gt;A useful gateway trace should help you answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why did this request take this route?&lt;/li&gt;
&lt;li&gt;Did a quota rule change the selected model?&lt;/li&gt;
&lt;li&gt;Did failover trigger because of provider health or a gateway bug?&lt;/li&gt;
&lt;li&gt;Did retries increase latency or token cost?&lt;/li&gt;
&lt;li&gt;Which tenants were affected by the behavior change?&lt;/li&gt;
&lt;li&gt;Did the issue begin in the gateway or at the provider?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot answer those questions from one request lineage, your gateway is still too opaque.&lt;/p&gt;
&lt;h2&gt;
  
  
  A practical trace shape
&lt;/h2&gt;

&lt;p&gt;A small but useful gateway trace can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gateway request
  -&amp;gt; policy check
  -&amp;gt; route selection
  -&amp;gt; quota / budget rule
  -&amp;gt; failover or retry branch
  -&amp;gt; downstream provider call
  -&amp;gt; response + trace metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That structure makes it much easier to separate classes of problems.&lt;/p&gt;

&lt;p&gt;If the provider was slow, you can see it.&lt;/p&gt;

&lt;p&gt;If the provider was fine but the gateway retried too aggressively, you can see that too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example request flow
&lt;/h2&gt;

&lt;p&gt;Suppose a client sends a payload like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acme-enterprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a concise assistant."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Summarize today’s error budget status."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gateway might make decisions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;apply enterprise-specific policies&lt;/li&gt;
&lt;li&gt;prefer the primary provider under normal conditions&lt;/li&gt;
&lt;li&gt;fall back if the provider is degraded&lt;/li&gt;
&lt;li&gt;preserve route metadata for later debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A response record with observability fields might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"route_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"primary_provider_ok"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected_provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"retry_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"failover_used"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acme-enterprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_123abc"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That record gives teams something much more useful than a plain request log.&lt;/p&gt;

&lt;p&gt;It explains the control-plane behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to instrument first
&lt;/h2&gt;

&lt;p&gt;If you are just getting started, begin with the fields that explain route changes and incidents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route reason&lt;/li&gt;
&lt;li&gt;selected provider&lt;/li&gt;
&lt;li&gt;selected model&lt;/li&gt;
&lt;li&gt;override source&lt;/li&gt;
&lt;li&gt;retry count&lt;/li&gt;
&lt;li&gt;failover state&lt;/li&gt;
&lt;li&gt;tenant context&lt;/li&gt;
&lt;li&gt;latency by step&lt;/li&gt;
&lt;li&gt;cost by step&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those fields make it possible to debug most real gateway issues without rebuilding the whole platform.&lt;/p&gt;
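&lt;p&gt;A minimal sketch of assembling those fields into one record, mirroring the example response shown earlier (the function name and defaults are illustrative):&lt;/p&gt;

```javascript
// Sketch of building a route-decision record from the fields above.
// The output shape mirrors the example record shown earlier; the
// function name and defaults are assumptions, not a specific API.
function recordRouteDecision({
  tenantId,
  provider,
  model,
  reason,
  retries = 0,
  failoverUsed = false,
  stepLatencies = {},
}) {
  // Total latency is the sum of per-step latencies, so a slow
  // gateway step is distinguishable from a slow provider call.
  const totalLatencyMs = Object.values(stepLatencies).reduce(
    (sum, ms) => sum + ms,
    0
  );
  return {
    route_reason: reason,
    selected_provider: provider,
    selected_model: model,
    retry_count: retries,
    failover_used: failoverUsed,
    tenant_id: tenantId,
    latency_by_step: stepLatencies,
    latency_ms: totalLatencyMs,
  };
}
```

&lt;p&gt;Emitting one record like this per request is usually enough to answer the route-change questions above without touching the rest of the platform.&lt;/p&gt;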

&lt;h2&gt;
  
  
  What AI gateway observability helps with in practice
&lt;/h2&gt;

&lt;p&gt;Here are common production problems that become easier to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a premium customer got routed to a cheaper model unexpectedly&lt;/li&gt;
&lt;li&gt;traffic shifted to a backup provider but never shifted back&lt;/li&gt;
&lt;li&gt;a policy rollout increased latency for one customer segment&lt;/li&gt;
&lt;li&gt;quota pressure caused silent route changes&lt;/li&gt;
&lt;li&gt;retries doubled cost during partial provider instability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These issues are hard to explain when all you have is provider logging.&lt;/p&gt;

&lt;p&gt;They become much easier to reason about when the gateway decisions themselves are visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The main idea
&lt;/h2&gt;

&lt;p&gt;Most teams think they need more logs.&lt;/p&gt;

&lt;p&gt;What they often need is a clearer operational trace of the gateway as a decision system.&lt;/p&gt;

&lt;p&gt;That means treating the gateway request like a workflow with explicit steps rather than a black box in front of model providers.&lt;/p&gt;

&lt;p&gt;Once you do that, the control plane becomes much easier to operate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;If your gateway shapes routing, policy, failover, or provider behavior, it is already part of production operations.&lt;/p&gt;

&lt;p&gt;That means you need observability for the gateway itself, not just the downstream model call.&lt;/p&gt;

&lt;p&gt;Because the important question in production is usually not:&lt;/p&gt;

&lt;p&gt;“Did the request finish?”&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;“Why did it take this path?”&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>observability</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How to Add AI Agent Handoff Observability to a Multi-Step Workflow</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Wed, 01 Apr 2026 04:15:00 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-add-ai-agent-handoff-observability-to-a-multi-step-workflow-g46</link>
      <guid>https://dev.to/tokvera/how-to-add-ai-agent-handoff-observability-to-a-multi-step-workflow-g46</guid>
      <description>&lt;p&gt;A lot of multi-step AI systems look clean in architecture diagrams.&lt;/p&gt;

&lt;p&gt;One agent classifies.&lt;br&gt;
Another retrieves context.&lt;br&gt;
Another drafts the response.&lt;br&gt;
A human steps in when confidence is low or escalation is required.&lt;/p&gt;

&lt;p&gt;The problem is that production issues often do not happen inside one agent step.&lt;/p&gt;

&lt;p&gt;They happen at the boundary between steps.&lt;/p&gt;

&lt;p&gt;That is where &lt;strong&gt;AI agent handoff observability&lt;/strong&gt; becomes important.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why handoffs are harder than they look
&lt;/h2&gt;

&lt;p&gt;A handoff sounds simple.&lt;/p&gt;

&lt;p&gt;One step finishes and another takes over.&lt;/p&gt;

&lt;p&gt;In practice, that boundary carries a lot of hidden risk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;context may be incomplete&lt;/li&gt;
&lt;li&gt;the wrong owner may be selected&lt;/li&gt;
&lt;li&gt;a human may receive too little evidence&lt;/li&gt;
&lt;li&gt;the next step may repeat work that was already done&lt;/li&gt;
&lt;li&gt;the workflow may appear successful even though continuity was broken&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the important debugging question is often not:&lt;/p&gt;

&lt;p&gt;“What did the model return?”&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“What happened when ownership changed?”&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What a handoff trace should show
&lt;/h2&gt;

&lt;p&gt;A useful handoff trace should let you inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why the handoff was triggered&lt;/li&gt;
&lt;li&gt;which next owner or agent was selected&lt;/li&gt;
&lt;li&gt;what context was passed forward&lt;/li&gt;
&lt;li&gt;what summary or evidence was included&lt;/li&gt;
&lt;li&gt;whether the transfer led to progress or just another branch&lt;/li&gt;
&lt;li&gt;how much latency and cost the transfer added&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that information, teams only see the final output and miss the exact boundary where the workflow became fragile.&lt;/p&gt;
&lt;h2&gt;
  
  
  A practical handoff workflow shape
&lt;/h2&gt;

&lt;p&gt;A multi-step workflow with handoffs can often be modeled like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;request
  -&amp;gt; initial agent step
  -&amp;gt; handoff trigger
  -&amp;gt; ownership transfer
  -&amp;gt; context package
  -&amp;gt; next agent or human step
  -&amp;gt; follow-up action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That shape is simple, but it is enough to make the transfer inspectable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example handoff scenario
&lt;/h2&gt;

&lt;p&gt;Imagine a support workflow that starts with an automated agent.&lt;/p&gt;

&lt;p&gt;The agent reviews an incoming issue, detects that it may involve an enterprise outage, and decides to escalate to a human responder.&lt;/p&gt;

&lt;p&gt;A useful payload might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cust_456"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"plan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enterprise"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"issue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"possible_incident"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Customers are reporting repeated login failures across multiple regions."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.54&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A handoff-aware response should preserve the transfer context, not just the final destination:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"handoff_trigger"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"low_confidence_enterprise_incident"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"from_owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"triage_agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"to_owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"human_on_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context_package"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customer_plan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enterprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"issue_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"possible_incident"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Repeated login failures across multiple regions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"recommended_next_action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"open incident review"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_handoff_789"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is much more useful than a simple “escalated=true” flag.&lt;/p&gt;

&lt;p&gt;It explains the transfer.&lt;/p&gt;
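&lt;p&gt;The trigger in that example can come from a small, explicit guard rather than an opaque boolean. A hedged sketch; the 0.6 threshold and field names are assumptions for illustration:&lt;/p&gt;

```typescript
// Sketch of the guard behind "low_confidence_enterprise_incident".
// The 0.6 threshold and field names are illustrative assumptions.
interface TriageResult {
  customerPlan: string;
  issueType: string;
  confidence: number;
}

// Returns a trigger label when a handoff is needed, or null to stay automated.
function handoffTrigger(r: TriageResult): string | null {
  if (r.customerPlan === "enterprise") {
    if (r.issueType === "possible_incident") {
      if (r.confidence >= 0.6) {
        return null; // confident enough to continue without a human
      }
      return "low_confidence_enterprise_incident";
    }
  }
  return null;
}
```

&lt;p&gt;Keeping the trigger as a named label means the trace can explain the transfer later, instead of only recording that it happened.&lt;/p&gt;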

&lt;h2&gt;
  
  
  What breaks without handoff visibility
&lt;/h2&gt;

&lt;p&gt;Without observability around handoffs, teams run into issues like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the next owner asks the customer to repeat information&lt;/li&gt;
&lt;li&gt;a human reviewer gets a handoff with no useful summary&lt;/li&gt;
&lt;li&gt;an agent hands off too often because confidence logic is noisy&lt;/li&gt;
&lt;li&gt;a downstream step reclassifies or reroutes the issue unnecessarily&lt;/li&gt;
&lt;li&gt;ownership changes become hard to explain during incident reviews&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not just UX issues.&lt;/p&gt;

&lt;p&gt;They are workflow quality issues.&lt;/p&gt;

&lt;p&gt;And they often become visible only after automation is already live.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to instrument first
&lt;/h2&gt;

&lt;p&gt;If you want to keep handoff instrumentation lightweight, start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;handoff trigger reason&lt;/li&gt;
&lt;li&gt;previous owner&lt;/li&gt;
&lt;li&gt;next owner&lt;/li&gt;
&lt;li&gt;summary payload&lt;/li&gt;
&lt;li&gt;preserved context fields&lt;/li&gt;
&lt;li&gt;confidence or escalation score&lt;/li&gt;
&lt;li&gt;latency around the transfer&lt;/li&gt;
&lt;li&gt;follow-up outcome&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those fields make it possible to understand whether the handoff actually helped the workflow continue cleanly.&lt;/p&gt;
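&lt;p&gt;A lightweight way to capture those fields is to wrap the transfer itself. A sketch, assuming only that you already have some trace sink to emit events into:&lt;/p&gt;

```typescript
// Sketch: wrap a transfer so the listed fields land in one trace event.
// TraceSink is a stand-in for whatever trace store you already use.
type TraceSink = (event: { [key: string]: unknown }) => void;

function recordHandoff(
  emitTrace: TraceSink,
  fields: {
    triggerReason: string;
    previousOwner: string;
    nextOwner: string;
    summary: string;
    preservedContext: { [key: string]: unknown };
    confidence: number;
  },
  transfer: () => string // the actual transfer step; returns a follow-up outcome label
): string {
  const start = Date.now();
  const followUpOutcome = transfer();
  emitTrace({
    ...fields,
    latencyMs: Date.now() - start, // latency around the transfer
    followUpOutcome,
  });
  return followUpOutcome;
}
```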

&lt;h2&gt;
  
  
  Why this matters for agent-to-human systems
&lt;/h2&gt;

&lt;p&gt;The value of handoff observability grows when humans are part of the loop.&lt;/p&gt;

&lt;p&gt;If an AI system escalates to a person, the transfer quality affects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;responder speed&lt;/li&gt;
&lt;li&gt;decision confidence&lt;/li&gt;
&lt;li&gt;customer experience&lt;/li&gt;
&lt;li&gt;repeated work&lt;/li&gt;
&lt;li&gt;operational trust in the workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A weak handoff does not just slow one request down.&lt;/p&gt;

&lt;p&gt;It makes the whole automation system harder to trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  The main idea
&lt;/h2&gt;

&lt;p&gt;A multi-step workflow is only as strong as its boundaries.&lt;/p&gt;

&lt;p&gt;The steps themselves might work well, but if the transfer between them is opaque, the workflow becomes hard to debug and hard to improve.&lt;/p&gt;

&lt;p&gt;That is why handoff observability matters.&lt;/p&gt;

&lt;p&gt;It makes the transition itself inspectable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;If your AI system moves work between agents, tools, queues, or humans, the handoff is part of the product logic.&lt;/p&gt;

&lt;p&gt;So it should be observable like any other important production step.&lt;/p&gt;

&lt;p&gt;Because the real question is not just whether the workflow completed.&lt;/p&gt;

&lt;p&gt;It is whether ownership changed in a way that preserved enough context for the next step to succeed.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>observability</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How to Add LLM Routing Visibility to a Multi-Model App</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Tue, 31 Mar 2026 07:22:37 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-add-llm-routing-visibility-to-a-multi-model-app-3fp1</link>
      <guid>https://dev.to/tokvera/how-to-add-llm-routing-visibility-to-a-multi-model-app-3fp1</guid>
      <description>&lt;p&gt;A multi-model app usually starts with a good idea.&lt;/p&gt;

&lt;p&gt;Use a faster model for simple requests.&lt;br&gt;
Use a stronger model for harder ones.&lt;br&gt;
Fail over when a provider is slow.&lt;br&gt;
Route enterprise traffic differently from free-tier traffic.&lt;/p&gt;

&lt;p&gt;All of that makes sense.&lt;/p&gt;

&lt;p&gt;The problem starts later, when the system behaves unexpectedly and nobody can explain why a request took a specific path.&lt;/p&gt;

&lt;p&gt;That is when you need &lt;strong&gt;LLM routing visibility&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why routing visibility matters
&lt;/h2&gt;

&lt;p&gt;In a single-model app, the debugging path is relatively short.&lt;/p&gt;

&lt;p&gt;You inspect the input, the prompt, the model call, and the response.&lt;/p&gt;

&lt;p&gt;In a multi-model system, there are more moving parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route selection logic&lt;/li&gt;
&lt;li&gt;policy or override checks&lt;/li&gt;
&lt;li&gt;fallback branches&lt;/li&gt;
&lt;li&gt;selected provider and model&lt;/li&gt;
&lt;li&gt;downstream execution details&lt;/li&gt;
&lt;li&gt;cost and latency tradeoffs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When something goes wrong, the important question is no longer just “what did the model return?”&lt;/p&gt;

&lt;p&gt;It becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why did the system choose this route?&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  A simple routing shape
&lt;/h2&gt;

&lt;p&gt;A practical routing flow can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;request
  -&amp;gt; route decision
  -&amp;gt; selected model/provider
  -&amp;gt; fallback or retry branch
  -&amp;gt; downstream model call
  -&amp;gt; response + trace metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is enough structure to make routing behavior observable in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example routing logic
&lt;/h2&gt;

&lt;p&gt;Here is a tiny example in TypeScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pickRoute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;complexity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;providerHealth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;degraded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;providerHealth&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;degraded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-3-5-sonnet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;provider_failover&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tier&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;complexity&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4.1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;enterprise_high_complexity&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;default_fast_path&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The routing logic itself is not the hard part.&lt;/p&gt;

&lt;p&gt;The hard part is preserving enough metadata so you can inspect what happened later.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to attach to the trace
&lt;/h2&gt;

&lt;p&gt;For each routed request, you usually want to capture at least:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route reason&lt;/li&gt;
&lt;li&gt;selected provider&lt;/li&gt;
&lt;li&gt;selected model&lt;/li&gt;
&lt;li&gt;fallback or retry status&lt;/li&gt;
&lt;li&gt;tenant or plan context&lt;/li&gt;
&lt;li&gt;latency for the routing step&lt;/li&gt;
&lt;li&gt;latency for the downstream model call&lt;/li&gt;
&lt;li&gt;cost for the final route taken&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With that data, a request stops being mysterious.&lt;/p&gt;

&lt;p&gt;You can understand whether the system made an intentional choice or drifted into the wrong branch.&lt;/p&gt;
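&lt;p&gt;One way to keep that metadata is to record the decision alongside the call. A minimal sketch that builds on the &lt;code&gt;pickRoute&lt;/code&gt; shape above; the timing fields and call signatures are illustrative assumptions:&lt;/p&gt;

```typescript
// Sketch: wrap a routed call so the route decision survives in the trace.
// The decide/call signatures and field names are illustrative assumptions.
interface RouteDecision {
  provider: string;
  model: string;
  reason: string;
}

interface RoutedTrace {
  requestId: string;
  routeReason: string;
  selectedProvider: string;
  selectedModel: string;
  fallbackUsed: boolean;
  latencyMs: { routing: number; providerCall: number };
}

function traceRoutedCall(
  requestId: string,
  decide: () => RouteDecision,
  call: (d: RouteDecision) => string
): { output: string; trace: RoutedTrace } {
  const t0 = Date.now();
  const decision = decide();     // route selection step
  const t1 = Date.now();
  const output = call(decision); // downstream model call
  const trace: RoutedTrace = {
    requestId,
    routeReason: decision.reason,
    selectedProvider: decision.provider,
    selectedModel: decision.model,
    fallbackUsed: decision.reason === "provider_failover",
    latencyMs: { routing: t1 - t0, providerCall: Date.now() - t1 },
  };
  return { output, trace };
}
```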

&lt;h2&gt;
  
  
  What routing visibility helps you debug
&lt;/h2&gt;

&lt;p&gt;Here are the kinds of issues that become easier to explain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a request hit an expensive model unexpectedly&lt;/li&gt;
&lt;li&gt;fallback triggered too often during partial outages&lt;/li&gt;
&lt;li&gt;one customer segment saw higher latency after a routing change&lt;/li&gt;
&lt;li&gt;a route change fixed reliability but increased spend&lt;/li&gt;
&lt;li&gt;a caller override was ignored or silently replaced&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are difficult problems when you only have final responses and provider logs.&lt;/p&gt;

&lt;p&gt;They become much easier when the route decision itself is visible.&lt;/p&gt;
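&lt;p&gt;With route metadata stored per request, those questions turn into simple filters. A sketch over an in-memory list of trace records; the field names are illustrative:&lt;/p&gt;

```typescript
// Sketch: routing questions become filters over stored trace records.
// Field names are illustrative assumptions.
interface RouteTrace {
  requestId: string;
  routeReason: string;
  selectedModel: string;
  fallbackUsed: boolean;
  costUsd: number;
}

// "Which requests hit an expensive model unexpectedly?"
function unexpectedExpensive(traces: RouteTrace[], costThresholdUsd: number): RouteTrace[] {
  return traces.filter(t =>
    t.routeReason === "default_fast_path" ? t.costUsd > costThresholdUsd : false
  );
}

// "How often did fallback trigger?"
function fallbackRate(traces: RouteTrace[]): number {
  if (traces.length === 0) {
    return 0;
  }
  return traces.filter(t => t.fallbackUsed).length / traces.length;
}
```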

&lt;h2&gt;
  
  
  Example traced output
&lt;/h2&gt;

&lt;p&gt;A useful response record might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"request_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"req_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"route_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enterprise_high_complexity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected_provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fallback_used"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"routing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider_call"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;841&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.012&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.041&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_xyz789"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That record gives teams something actionable.&lt;/p&gt;

&lt;p&gt;It shows both the route and the execution path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden value of routing visibility
&lt;/h2&gt;

&lt;p&gt;Routing visibility is not only about debugging bad outcomes.&lt;/p&gt;

&lt;p&gt;It is also how teams evaluate whether routing logic is actually helping.&lt;/p&gt;

&lt;p&gt;A route change might reduce provider errors but increase latency.&lt;/p&gt;

&lt;p&gt;A fallback policy might improve reliability but hurt quality.&lt;/p&gt;

&lt;p&gt;A cheaper model path might look efficient until it causes more retries and rework downstream.&lt;/p&gt;

&lt;p&gt;Without visibility into route reasoning and route-level cost, those tradeoffs are hard to measure honestly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start small
&lt;/h2&gt;

&lt;p&gt;If you already have a multi-model app, you do not need to rebuild it.&lt;/p&gt;

&lt;p&gt;Start by making the route explicit.&lt;/p&gt;

&lt;p&gt;Keep a root trace for the request, then add child steps for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route selection&lt;/li&gt;
&lt;li&gt;fallback or retry logic&lt;/li&gt;
&lt;li&gt;downstream model execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even that small amount of structure can make production behavior much easier to reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;A multi-model app becomes significantly harder to operate once routing decisions influence latency, cost, quality, and reliability.&lt;/p&gt;

&lt;p&gt;That is why LLM routing visibility matters.&lt;/p&gt;

&lt;p&gt;You do not just need to know which model returned the answer.&lt;/p&gt;

&lt;p&gt;You need to know why the system chose that path in the first place.&lt;/p&gt;

&lt;p&gt;That is the difference between having routing logic and being able to trust it in production.&lt;/p&gt;

&lt;p&gt;Check out the docs at &lt;a href="https://tokvera.org/docs" rel="noopener noreferrer"&gt;https://tokvera.org/docs&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>observability</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How to Add Ticket Triage Workflow Tracing to a Support AI System</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Tue, 31 Mar 2026 07:13:13 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-add-ticket-triage-workflow-tracing-to-a-support-ai-system-1h6k</link>
      <guid>https://dev.to/tokvera/how-to-add-ticket-triage-workflow-tracing-to-a-support-ai-system-1h6k</guid>
      <description>&lt;p&gt;A lot of support AI demos stop too early.&lt;/p&gt;

&lt;p&gt;A user sends a message. A model returns a response. The example ends.&lt;/p&gt;

&lt;p&gt;That is not how real support systems behave.&lt;/p&gt;

&lt;p&gt;A production support workflow usually has to do more than answer the customer. It has to classify the issue, assign urgency, choose the right queue, decide whether escalation is needed, and hand enough context to the next team.&lt;/p&gt;

&lt;p&gt;Once you add those steps, a new problem appears:&lt;/p&gt;

&lt;p&gt;How do you debug the workflow when the output is wrong?&lt;/p&gt;

&lt;p&gt;That is where &lt;strong&gt;ticket triage workflow tracing&lt;/strong&gt; becomes useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why support triage needs tracing
&lt;/h2&gt;

&lt;p&gt;If a system sends a billing issue to the bug queue, the problem might be in several places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the classification step&lt;/li&gt;
&lt;li&gt;the priority logic&lt;/li&gt;
&lt;li&gt;the queue selection rule&lt;/li&gt;
&lt;li&gt;the escalation branch&lt;/li&gt;
&lt;li&gt;missing customer or SLA context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without tracing, all you see is the final result.&lt;/p&gt;

&lt;p&gt;That makes the workflow hard to debug because the important path is hidden.&lt;/p&gt;

&lt;p&gt;A good trace lets you inspect the full triage sequence, not just the final queue name.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical triage shape
&lt;/h2&gt;

&lt;p&gt;A useful support workflow does not need to be huge.&lt;/p&gt;

&lt;p&gt;A small production-shaped version can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ticket input
  -&amp;gt; classification
  -&amp;gt; priority scoring
  -&amp;gt; queue selection
  -&amp;gt; escalation check
  -&amp;gt; summary + handoff metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That shape gives you enough structure to understand why a ticket moved the way it did.&lt;/p&gt;
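&lt;p&gt;That sequence can be expressed as a plain function pipeline. A minimal sketch with stubbed decision logic; the keyword checks are placeholders, not real triage rules:&lt;/p&gt;

```typescript
// Sketch of the triage pipeline with stubbed rules.
// The keyword checks below are placeholders, not production triage logic.
interface Ticket { subject: string; body: string; plan: string }
interface Triage { classification: string; priority: string; queue: string; escalation: boolean }

function classify(t: Ticket): string {
  return t.body.includes("timeout") || t.body.includes("outage") ? "incident" : "question";
}

function scorePriority(t: Ticket, cls: string): string {
  if (cls === "incident") {
    if (t.plan === "enterprise") {
      return "high";
    }
  }
  return "normal";
}

function triage(t: Ticket): Triage {
  const classification = classify(t);
  const priority = scorePriority(t, classification);
  const queue = classification === "incident" ? "support-engineering" : "general-support";
  const escalation = priority === "high";
  return { classification, priority, queue, escalation };
}
```

&lt;p&gt;Each step is a separate function on purpose: once the steps are explicit, each one can carry its own trace span.&lt;/p&gt;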

&lt;h2&gt;
  
  
  What to capture in each run
&lt;/h2&gt;

&lt;p&gt;At minimum, a triage trace should help answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what issue class was chosen&lt;/li&gt;
&lt;li&gt;what priority or SLA score was assigned&lt;/li&gt;
&lt;li&gt;which queue was selected&lt;/li&gt;
&lt;li&gt;whether escalation was triggered&lt;/li&gt;
&lt;li&gt;what summary and next actions were produced&lt;/li&gt;
&lt;li&gt;how long each step took&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That turns the workflow from a black box into something teams can inspect when production behavior changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example request shape
&lt;/h2&gt;

&lt;p&gt;A triage system can accept a payload like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cust_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"plan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enterprise"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ticket"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"API requests are timing out for our support agents"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"We are seeing repeated timeouts in production and our queue is backing up."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"channel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A useful response should include both customer-facing and internal workflow context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"classification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"incident"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"queue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"support-engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"escalation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Enterprise customer reporting production API timeouts affecting support operations."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"next_actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"route to support engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"open incident review"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"notify account owner"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_abc123"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The key idea: treat triage like a workflow, not a prompt
&lt;/h2&gt;

&lt;p&gt;A lot of teams still try to solve triage with one prompt.&lt;/p&gt;

&lt;p&gt;That works for demos, but it usually breaks down once support teams need to trust the result.&lt;/p&gt;

&lt;p&gt;Triage is a workflow because it includes multiple decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understanding the issue type&lt;/li&gt;
&lt;li&gt;interpreting urgency&lt;/li&gt;
&lt;li&gt;applying business logic&lt;/li&gt;
&lt;li&gt;deciding whether escalation is needed&lt;/li&gt;
&lt;li&gt;handing clean context to the next owner&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you model those steps explicitly, tracing them becomes much easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  What breaks without visibility
&lt;/h2&gt;

&lt;p&gt;Without triage tracing, common support issues become harder to explain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why was the wrong queue selected?&lt;/li&gt;
&lt;li&gt;Why was an enterprise issue not escalated?&lt;/li&gt;
&lt;li&gt;Why did the system label this as a normal bug instead of an incident?&lt;/li&gt;
&lt;li&gt;Why did the workflow take much longer for one customer segment?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are workflow questions, not just model questions.&lt;/p&gt;

&lt;p&gt;That is why the trace should preserve both the workflow path and the operational metadata around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple instrumentation mindset
&lt;/h2&gt;

&lt;p&gt;You do not need to instrument every field from day one.&lt;/p&gt;

&lt;p&gt;Start by keeping one root trace for the triage request, then add child steps for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification&lt;/li&gt;
&lt;li&gt;priority scoring&lt;/li&gt;
&lt;li&gt;queue routing&lt;/li&gt;
&lt;li&gt;escalation logic&lt;/li&gt;
&lt;li&gt;summary generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That alone gives you a much clearer picture of how the system behaves in production.&lt;/p&gt;
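
&lt;p&gt;A minimal sketch of that root-plus-children shape, using a hypothetical in-memory tracer in Python (the real Tokvera SDK surface will differ):&lt;/p&gt;

```python
import time
import uuid

class TriageTrace:
    """Hypothetical tracer: one root trace, one recorded span per triage step."""

    def __init__(self):
        self.trace_id = f"trc_{uuid.uuid4().hex[:8]}"
        self.spans = []

    def record_step(self, name, output):
        # each child step keeps its own name, timing, and output
        self.spans.append({
            "name": name,
            "recorded_at": time.time(),
            "output": output,
        })

trace = TriageTrace()
trace.record_step("classification", {"label": "bug"})
trace.record_step("priority_scoring", {"priority": "high"})
trace.record_step("queue_routing", {"queue": "engineering"})
trace.record_step("escalation_logic", {"should_escalate": True})
trace.record_step("summary_generation", {"summary": "..."})
```

&lt;p&gt;Even this flat structure answers the "which step decided that" question, and you can upgrade it to a real tracing backend later without changing the workflow shape.&lt;/p&gt;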

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Support AI is more useful when it helps teams make operational decisions, not just generate text.&lt;/p&gt;

&lt;p&gt;Once your workflow starts classifying tickets, assigning urgency, choosing owners, and escalating issues, you need a way to inspect the path that produced those decisions.&lt;/p&gt;

&lt;p&gt;That is what ticket triage workflow tracing gives you.&lt;/p&gt;

&lt;p&gt;Not more logs.&lt;/p&gt;

&lt;p&gt;A debuggable workflow.&lt;/p&gt;

&lt;p&gt;Visit &lt;a href="https://tokvera.org" rel="noopener noreferrer"&gt;tokvera.org&lt;/a&gt; for more details.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>observability</category>
      <category>api</category>
    </item>
    <item>
      <title>How to Build a LangGraph Support Triage Workflow with Trace Visibility</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Tue, 24 Mar 2026 18:30:00 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-build-a-langgraph-support-triage-workflow-with-trace-visibility-222</link>
      <guid>https://dev.to/tokvera/how-to-build-a-langgraph-support-triage-workflow-with-trace-visibility-222</guid>
      <description>&lt;p&gt;A lot of LangGraph demos prove that graphs can run.&lt;/p&gt;

&lt;p&gt;Fewer prove that teams can operate them.&lt;/p&gt;

&lt;p&gt;That difference matters.&lt;/p&gt;

&lt;p&gt;Once a workflow starts classifying tickets, choosing queues, deciding whether to escalate, and generating internal summaries, the important question is no longer just "did the graph execute?" It becomes "why did it make that decision?"&lt;/p&gt;

&lt;p&gt;That is the motivation behind &lt;a href="https://github.com/Tokvera/langgraph-ticket-triage" rel="noopener noreferrer"&gt;&lt;code&gt;langgraph-ticket-triage&lt;/code&gt;&lt;/a&gt;, a small Python starter that shows how to build a support triage workflow with LangGraph, FastAPI, and Tokvera trace visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why LangGraph workflows need observability
&lt;/h2&gt;

&lt;p&gt;LangGraph is useful because it gives you a clean way to model multi-step workflows.&lt;/p&gt;

&lt;p&gt;But in production-like systems, graph execution alone is not enough.&lt;/p&gt;

&lt;p&gt;Teams still need to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how a ticket was classified&lt;/li&gt;
&lt;li&gt;why a queue was selected&lt;/li&gt;
&lt;li&gt;whether escalation logic was applied&lt;/li&gt;
&lt;li&gt;what summary was generated for the internal team&lt;/li&gt;
&lt;li&gt;whether the result came from mock mode or a live model call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that visibility, graph-based systems can become just as opaque as a large one-shot prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this starter repo does
&lt;/h2&gt;

&lt;p&gt;The repo focuses on a practical support triage flow instead of a toy graph.&lt;/p&gt;

&lt;p&gt;For each incoming ticket, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;starts a LangGraph workflow run&lt;/li&gt;
&lt;li&gt;classifies the ticket&lt;/li&gt;
&lt;li&gt;chooses a destination queue&lt;/li&gt;
&lt;li&gt;assigns SLA and suggested ownership&lt;/li&gt;
&lt;li&gt;generates an internal summary&lt;/li&gt;
&lt;li&gt;returns triage metadata, next actions, and Tokvera trace IDs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That makes it a strong reference for teams that want a Python-first agent workflow example with real operational shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow structure is intentionally simple
&lt;/h2&gt;

&lt;p&gt;The current graph uses two nodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;classify&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;summarize&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the workflow path looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ticket input
  -&amp;gt; classify node
  -&amp;gt; summarize node
  -&amp;gt; triage response + Tokvera trace IDs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a good starter shape because it keeps the graph readable while still separating two different responsibilities.&lt;/p&gt;

&lt;p&gt;Classification handles routing decisions.&lt;/p&gt;

&lt;p&gt;Summarization handles internal communication.&lt;/p&gt;

&lt;p&gt;That separation makes the workflow easier to inspect and extend.&lt;/p&gt;
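
&lt;p&gt;In the repo this is a LangGraph &lt;code&gt;StateGraph&lt;/code&gt; with the two nodes wired by edges. Here is the same two-step shape sketched as dependency-free Python, so the split between the responsibilities is easy to see; the keyword rule is a stand-in for the real classification logic:&lt;/p&gt;

```python
def classify_node(state):
    # routing decision: classification drives queue selection
    label = "bug" if "error" in state["message"] else "general"
    return {**state, "classification": label}

def summarize_node(state):
    # internal communication: turn the decision into a handoff summary
    summary = f"[{state['classification']}] {state['subject']}"
    return {**state, "summary": summary}

def run_workflow(ticket):
    # ticket input, then classify, then summarize, then triage response
    return summarize_node(classify_node(ticket))
```

&lt;p&gt;Because each node takes state in and returns state out, each one maps cleanly to its own trace span.&lt;/p&gt;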

&lt;h2&gt;
  
  
  Why this workflow is more realistic than a simple agent demo
&lt;/h2&gt;

&lt;p&gt;A realistic support flow has to do more than produce text.&lt;/p&gt;

&lt;p&gt;It has to turn an inbound ticket into operational decisions.&lt;/p&gt;

&lt;p&gt;In this starter, that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification such as &lt;code&gt;bug&lt;/code&gt;, &lt;code&gt;billing&lt;/code&gt;, &lt;code&gt;feature&lt;/code&gt;, or &lt;code&gt;general&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;priority setting&lt;/li&gt;
&lt;li&gt;queue selection&lt;/li&gt;
&lt;li&gt;escalation recommendation&lt;/li&gt;
&lt;li&gt;suggested ownership&lt;/li&gt;
&lt;li&gt;SLA expectations&lt;/li&gt;
&lt;li&gt;next actions for the support team&lt;/li&gt;
&lt;li&gt;an internal summary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the kind of output support and platform teams can actually use.&lt;/p&gt;

&lt;h2&gt;
  
  
  The API surface
&lt;/h2&gt;

&lt;p&gt;The project exposes a small set of routes for health checks, reusable sample payloads, and direct workflow execution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /health&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/demo-ticket&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/sample-tickets&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /api/triage&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3200/api/triage &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "subject": "Bug: team members cannot open traces",
    "message": "Our support team sees a permissions error whenever they click a trace detail page.",
    "plan": "enterprise",
    "customer_name": "Ava",
    "customer_email": "ava@example.com"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That keeps local evaluation simple and makes the repo easy to demonstrate in articles, screenshots, and developer onboarding flows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the response gives you
&lt;/h2&gt;

&lt;p&gt;The output is not just a generated summary.&lt;/p&gt;

&lt;p&gt;It returns the data that an internal support workflow actually needs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ticket"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bug: team members cannot open traces"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"plan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enterprise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customer_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ava"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customer_email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ava@example.com"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"triage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"classification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bug"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"queue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"should_escalate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggested_owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"support-engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggested_sla_hours"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"urgent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"short_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"incident language detected"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"next_actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Assign to support-engineering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Respond within 2 hours"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Collect reproduction details, timestamps, and failing trace IDs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Escalate because the enterprise plan requires faster handling"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That combination of workflow metadata plus trace identifiers is what makes the example useful beyond a basic LangGraph demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the workflow behaves
&lt;/h2&gt;

&lt;p&gt;The classification step can run in mock mode or with a live model.&lt;/p&gt;

&lt;p&gt;The repo includes heuristic fallback behavior for issues like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bugs and incidents&lt;/li&gt;
&lt;li&gt;billing questions&lt;/li&gt;
&lt;li&gt;feature requests&lt;/li&gt;
&lt;li&gt;general support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the summarization step turns the classification output into a short internal handoff summary and a set of next actions.&lt;/p&gt;

&lt;p&gt;That is a good pattern for real teams because it separates decision logic from communication logic.&lt;/p&gt;
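
&lt;p&gt;A plausible shape for that heuristic fallback, shown as an illustration rather than the repo's exact rules; the keyword lists here are assumptions:&lt;/p&gt;

```python
# Illustrative keyword fallback for when no live model is available.
# The categories match the article; the keywords are example choices.
FALLBACK_RULES = [
    ("bug", ("error", "crash", "timeout", "incident", "broken")),
    ("billing", ("invoice", "charge", "refund", "payment")),
    ("feature", ("feature request", "would be nice", "can you add")),
]

def heuristic_classify(text):
    lowered = text.lower()
    for label, keywords in FALLBACK_RULES:
        if any(keyword in lowered for keyword in keywords):
            return label
    # everything else falls through to general support
    return "general"
```

&lt;p&gt;The value of a fallback like this is not accuracy; it is that mock mode still produces a structured, traceable decision instead of an empty field.&lt;/p&gt;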

&lt;h2&gt;
  
  
  Why Tokvera fits well with LangGraph
&lt;/h2&gt;

&lt;p&gt;LangGraph gives you workflow structure.&lt;/p&gt;

&lt;p&gt;Tokvera gives you workflow visibility.&lt;/p&gt;

&lt;p&gt;This starter uses Tokvera to make the graph inspectable at two useful levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;graph root runs&lt;/li&gt;
&lt;li&gt;node-level execution spans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means you can inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the overall workflow run&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;classify_ticket&lt;/code&gt; decision step&lt;/li&gt;
&lt;li&gt;the model-backed classification call when live mode is enabled&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;summarize_triage&lt;/code&gt; step&lt;/li&gt;
&lt;li&gt;the model-backed summary generation call when live mode is enabled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That distinction matters because debugging agent workflows usually requires more than raw model telemetry.&lt;/p&gt;

&lt;p&gt;You need to understand the workflow path itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this helps you debug
&lt;/h2&gt;

&lt;p&gt;With node-level visibility, you can answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the graph classify a billing issue as a bug?&lt;/li&gt;
&lt;li&gt;Was escalation triggered because of the plan, the message content, or both?&lt;/li&gt;
&lt;li&gt;Did the classification step behave correctly but the summary step produce weak output?&lt;/li&gt;
&lt;li&gt;Did mock mode hide a live-model issue during local testing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are the kinds of questions teams actually hit when they move from demo graphs to production-like workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running it locally
&lt;/h2&gt;

&lt;p&gt;The project defaults to mock mode, which is the right choice for a starter.&lt;/p&gt;

&lt;p&gt;It lets you evaluate the workflow without needing live provider credentials on day one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;.&lt;/span&gt; .venv/Scripts/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
copy .env.example .env
uvicorn app.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 3200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default, the API runs on &lt;code&gt;http://localhost:3200&lt;/code&gt;. Note that the commands above assume Windows; on macOS or Linux, activate the environment with &lt;code&gt;. .venv/bin/activate&lt;/code&gt; and copy the env file with &lt;code&gt;cp .env.example .env&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To use a live provider, set &lt;code&gt;MOCK_MODE=false&lt;/code&gt; and provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TOKVERA_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also configure &lt;code&gt;TOKVERA_INGEST_URL&lt;/code&gt;, &lt;code&gt;TOKVERA_TENANT_ID&lt;/code&gt;, and &lt;code&gt;OPENAI_MODEL&lt;/code&gt;.&lt;/p&gt;
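
&lt;p&gt;Reading those flags usually looks something like this. The variable names come from the article; the parsing helper and the model default are a common pattern, not necessarily the repo's code:&lt;/p&gt;

```python
import os

def env_flag(name, default=True):
    # treat "false", "0", and "no" as off; anything else as on
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() not in ("false", "0", "no")

MOCK_MODE = env_flag("MOCK_MODE", default=True)
# the fallback model name here is illustrative
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o-mini")

if not MOCK_MODE:
    # live mode needs both provider and tracing credentials
    missing = [key for key in ("OPENAI_API_KEY", "TOKVERA_API_KEY")
               if not os.getenv(key)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {missing}")
```

&lt;p&gt;Failing fast on missing credentials is worth the few extra lines: it turns a confusing mid-request provider error into an obvious startup error.&lt;/p&gt;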

&lt;h2&gt;
  
  
  Why this repo is valuable for Python-first teams
&lt;/h2&gt;

&lt;p&gt;A lot of OSS AI starter content leans heavily toward JavaScript.&lt;/p&gt;

&lt;p&gt;This repo matters because it gives Python teams a concrete example of how to combine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI for the API surface&lt;/li&gt;
&lt;li&gt;LangGraph for workflow orchestration&lt;/li&gt;
&lt;li&gt;OpenAI for model-backed steps&lt;/li&gt;
&lt;li&gt;Tokvera for root-run and node-level visibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination makes it a good reference for teams building internal agents, support flows, and other stateful multi-step workflows in Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to customize next
&lt;/h2&gt;

&lt;p&gt;The starter is intentionally compact, which makes it easy to extend.&lt;/p&gt;

&lt;p&gt;The next useful upgrades would be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add more graph nodes for knowledge-base lookup or escalation review&lt;/li&gt;
&lt;li&gt;add a human-in-the-loop approval step before escalation&lt;/li&gt;
&lt;li&gt;add queue-specific summary formats&lt;/li&gt;
&lt;li&gt;persist workflow runs to a database&lt;/li&gt;
&lt;li&gt;attach screenshots or payload references to traces&lt;/li&gt;
&lt;li&gt;build a lightweight support console UI on top of the API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are natural next steps for any team turning a graph demo into a real workflow surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The best LangGraph examples do more than show nodes and edges.&lt;/p&gt;

&lt;p&gt;They show how a workflow makes decisions and how a team can inspect those decisions later.&lt;/p&gt;

&lt;p&gt;That is why &lt;code&gt;langgraph-ticket-triage&lt;/code&gt; is useful.&lt;/p&gt;

&lt;p&gt;It gives Python teams a practical support-triage workflow with clear graph structure, useful operational output, and trace visibility that makes the system debuggable instead of opaque.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/Tokvera/langgraph-ticket-triage" rel="noopener noreferrer"&gt;https://github.com/Tokvera/langgraph-ticket-triage&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangGraph tracing docs: &lt;a href="https://tokvera.org/docs/integrations/langgraph" rel="noopener noreferrer"&gt;https://tokvera.org/docs/integrations/langgraph&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Get started: &lt;a href="https://tokvera.org/docs/get-started" rel="noopener noreferrer"&gt;https://tokvera.org/docs/get-started&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
    <item>
      <title>How to Build an OpenAI-Compatible LLM Gateway with Model Routing Visibility</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Sun, 22 Mar 2026 18:30:00 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-build-an-openai-compatible-llm-gateway-with-model-routing-visibility-2jnh</link>
      <guid>https://dev.to/tokvera/how-to-build-an-openai-compatible-llm-gateway-with-model-routing-visibility-2jnh</guid>
      <description>&lt;p&gt;Most teams do not start with a full AI platform.&lt;/p&gt;

&lt;p&gt;They start with a problem.&lt;/p&gt;

&lt;p&gt;Maybe one team wants to proxy OpenAI traffic through an internal service. Maybe another wants to route small prompts to a cheaper model and longer prompts to a stronger one. Maybe the platform team wants one place to add policy, fallback, logging, rate limits, or tenant-specific rules.&lt;/p&gt;

&lt;p&gt;That is usually the moment when a gateway becomes more valuable than another direct SDK call.&lt;/p&gt;

&lt;p&gt;The challenge is that once you insert a gateway between the application and the model provider, you also create a new layer that can become opaque. A request gets routed somewhere, a model gets selected, a response comes back, and later nobody remembers why that route was chosen.&lt;/p&gt;

&lt;p&gt;That is the motivation behind &lt;a href="https://github.com/Tokvera/llm-gateway-template" rel="noopener noreferrer"&gt;&lt;code&gt;llm-gateway-template&lt;/code&gt;&lt;/a&gt;, an open-source Node.js starter that shows how to build an OpenAI-compatible gateway with model routing and Tokvera trace visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why an LLM gateway is useful
&lt;/h2&gt;

&lt;p&gt;An LLM gateway gives platform teams a control point.&lt;/p&gt;

&lt;p&gt;Instead of letting every application talk to providers directly, the gateway becomes the place where you can standardize request handling and enforce common decisions.&lt;/p&gt;

&lt;p&gt;That usually includes things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;routing &lt;code&gt;auto&lt;/code&gt; requests to different models&lt;/li&gt;
&lt;li&gt;applying policy before a provider call happens&lt;/li&gt;
&lt;li&gt;centralizing observability and audit metadata&lt;/li&gt;
&lt;li&gt;adding tenant-level behavior without changing every client app&lt;/li&gt;
&lt;li&gt;introducing fallback logic without touching each product surface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially useful when the application team wants a familiar API contract but the platform team wants more control underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this starter repo does
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;llm-gateway-template&lt;/code&gt; is intentionally small, but it captures the workflow shape that matters.&lt;/p&gt;

&lt;p&gt;For each incoming OpenAI-style request, the service:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;accepts a &lt;code&gt;/v1/chat/completions&lt;/code&gt; payload&lt;/li&gt;
&lt;li&gt;decides whether to keep the requested model or auto-route it&lt;/li&gt;
&lt;li&gt;forwards the request to a downstream provider or mock responder&lt;/li&gt;
&lt;li&gt;returns an OpenAI-compatible completion response&lt;/li&gt;
&lt;li&gt;includes Tokvera metadata for the route and trace&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That makes the repo useful for teams that want to prototype gateway behavior without having to build a large internal platform first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The API shape stays familiar
&lt;/h2&gt;

&lt;p&gt;One of the best choices in this starter is that it keeps the interface simple.&lt;/p&gt;

&lt;p&gt;Clients can call it using a familiar OpenAI-style payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3100/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "auto",
    "messages": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "Summarize the importance of model routing in two bullet points." }
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters because it lowers adoption friction.&lt;/p&gt;

&lt;p&gt;You can introduce gateway logic without forcing every internal caller to learn a completely new contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  How routing works in the starter
&lt;/h2&gt;

&lt;p&gt;The default routing logic is simple on purpose.&lt;/p&gt;

&lt;p&gt;If the caller specifies an explicit model, the gateway passes that through unchanged.&lt;/p&gt;

&lt;p&gt;If the caller uses &lt;code&gt;model: "auto"&lt;/code&gt;, the gateway estimates prompt size and chooses either a small model or a larger one.&lt;/p&gt;

&lt;p&gt;In the current implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;explicit models become passthrough requests&lt;/li&gt;
&lt;li&gt;short prompts route to the smaller model&lt;/li&gt;
&lt;li&gt;longer prompts route to the larger model&lt;/li&gt;
&lt;li&gt;the response carries the route reason and selected model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is enough to demonstrate the control plane behavior that most teams care about first.&lt;/p&gt;
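
&lt;p&gt;That routing policy fits in a few lines. The sketch below is in Python for brevity (the repo itself is Node.js), and the model names and size threshold are illustrative, not the template's actual defaults:&lt;/p&gt;

```python
import operator

SMALL_MODEL = "gpt-4o-mini"   # illustrative model names
LARGE_MODEL = "gpt-4o"
SMALL_PROMPT_LIMIT = 2000     # characters; the threshold is a tunable guess

def route(requested_model, messages):
    # explicit models become passthrough requests
    if requested_model != "auto":
        return {"model": requested_model, "route_reason": "explicit_model"}
    total_chars = sum(len(m["content"]) for m in messages)
    # short prompts default to the cheaper model
    if operator.le(total_chars, SMALL_PROMPT_LIMIT):
        return {"model": SMALL_MODEL, "route_reason": "short_prompt_default",
                "total_characters": total_chars}
    return {"model": LARGE_MODEL, "route_reason": "long_prompt_upgrade",
            "total_characters": total_chars}
```

&lt;p&gt;Returning the &lt;code&gt;route_reason&lt;/code&gt; alongside the selected model is the important part: it is what later lets a trace explain the decision instead of just recording it.&lt;/p&gt;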

&lt;h2&gt;
  
  
  Why visibility matters at the gateway layer
&lt;/h2&gt;

&lt;p&gt;A gateway is not only an HTTP proxy.&lt;/p&gt;

&lt;p&gt;It is a decision engine.&lt;/p&gt;

&lt;p&gt;Once the gateway starts selecting models, estimating prompt size, or applying policy, it becomes one of the most important places to observe.&lt;/p&gt;

&lt;p&gt;Without visibility, teams run into questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why did this request choose the large model?&lt;/li&gt;
&lt;li&gt;Did the client override the route, or did the gateway decide?&lt;/li&gt;
&lt;li&gt;Was the request expensive because of the prompt, the chosen model, or both?&lt;/li&gt;
&lt;li&gt;Did the provider fail, or did routing logic choose the wrong path?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your only evidence is the final completion response, debugging turns into guesswork.&lt;/p&gt;

&lt;p&gt;That is why tracing the gateway itself matters just as much as tracing the downstream model call.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Tokvera fits into the flow
&lt;/h2&gt;

&lt;p&gt;The starter uses Tokvera to trace both the gateway root and the downstream model execution.&lt;/p&gt;

&lt;p&gt;The architecture is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenAI-style request
  -&amp;gt; route_request
  -&amp;gt; downstream_provider_call
  -&amp;gt; completion response + Tokvera metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That structure gives you a coherent trace instead of isolated model events.&lt;/p&gt;

&lt;p&gt;You can inspect the routing step, see the selected model, review route reasoning, and keep the downstream provider call attached to the same workflow lineage.&lt;/p&gt;

&lt;p&gt;That is much more useful than observing only the final provider response in isolation.&lt;/p&gt;
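
&lt;p&gt;The lineage idea can be sketched in a few lines of Python (the Node.js repo and the Tokvera SDK will differ; the ID format and span fields here are assumptions):&lt;/p&gt;

```python
import uuid

def new_id(prefix):
    return f"{prefix}_{uuid.uuid4().hex[:8]}"

def handle_request(payload):
    # one trace per gateway request; every step shares its lineage
    trace_id = new_id("trc")
    run_id = new_id("run")
    routing_span = {"trace_id": trace_id, "run_id": run_id,
                    "name": "route_request"}
    provider_span = {"trace_id": trace_id, "run_id": run_id,
                     "name": "downstream_provider_call",
                     "parent": routing_span["name"]}
    return {"tokvera": {"traceId": trace_id, "runId": run_id},
            "spans": [routing_span, provider_span]}
```

&lt;p&gt;Because the provider span carries the same identifiers as the routing span, the downstream call stays attached to the workflow that produced it.&lt;/p&gt;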

&lt;h2&gt;
  
  
  What the response gives you
&lt;/h2&gt;

&lt;p&gt;The gateway returns a familiar completion response and includes a &lt;code&gt;tokvera&lt;/code&gt; object with routing and request metadata.&lt;/p&gt;

&lt;p&gt;Example shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl_mock_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mock gateway response from gpt-4o-mini: ..."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tokvera"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"traceId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"runId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"routing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"routeReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"short_prompt_default"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"sizeClass"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"small"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"selectedModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"totalCharacters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;124&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"estimatedPromptTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"request"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"requestedModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"messageCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"mockMode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mock"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That extra metadata is what makes the gateway operationally useful.&lt;/p&gt;

&lt;p&gt;It lets platform teams see not only what the model said, but also how the request moved through the routing system.&lt;/p&gt;
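&lt;p&gt;As an illustration, the &lt;code&gt;routing&lt;/code&gt; block in the response above can come from a very small heuristic. The sketch below is not the repo's actual implementation: the 4-characters-per-token ratio, the token cutoff, the large-model name, and the &lt;code&gt;long_prompt_upgrade&lt;/code&gt; reason string are all assumptions.&lt;/p&gt;

```typescript
// Hypothetical size-based router that mirrors the trace fields above.
// Cutoff, char-per-token ratio, and large-model name are assumptions.
type Message = { role: string; content: string };

const SMALL_MODEL = "gpt-4o-mini";
const LARGE_MODEL = "gpt-4o"; // assumed large-tier model
const LARGE_PROMPT_TOKENS = 1000; // assumed upgrade threshold

function routeRequest(messages: Message[]) {
  const totalCharacters = messages.reduce((n, m) => n + m.content.length, 0);
  // Rough heuristic: roughly 4 characters per token.
  const estimatedPromptTokens = Math.ceil(totalCharacters / 4);
  const isLarge = estimatedPromptTokens > LARGE_PROMPT_TOKENS;
  return {
    routeReason: isLarge ? "long_prompt_upgrade" : "short_prompt_default",
    sizeClass: isLarge ? "large" : "small",
    selectedModel: isLarge ? LARGE_MODEL : SMALL_MODEL,
    totalCharacters,
    estimatedPromptTokens,
  };
}
```

&lt;p&gt;Feeding it a 124-character prompt reproduces the trace values above: &lt;code&gt;sizeClass&lt;/code&gt; of &lt;code&gt;"small"&lt;/code&gt;, &lt;code&gt;estimatedPromptTokens&lt;/code&gt; of 31, and &lt;code&gt;gpt-4o-mini&lt;/code&gt; as the selected model.&lt;/p&gt;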

&lt;h2&gt;
  
  
  Running it locally
&lt;/h2&gt;

&lt;p&gt;Like the support-router starter, this project defaults to mock mode.&lt;/p&gt;

&lt;p&gt;That makes it easy to evaluate and demo without needing live provider traffic on day one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install
cp&lt;/span&gt; .env.example .env
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default, the service runs on &lt;code&gt;http://localhost:3100&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To use live requests, set &lt;code&gt;MOCK_MODE=false&lt;/code&gt; and provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TOKVERA_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also configure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OPENAI_MODEL_SMALL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;OPENAI_MODEL_LARGE&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GATEWAY_TENANT_ID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TOKVERA_INGEST_URL&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
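&lt;p&gt;Put together, a live-mode &lt;code&gt;.env&lt;/code&gt; might look like this. The variable names come from the lists above; every value here is a placeholder, not a real default.&lt;/p&gt;

```shell
# Live mode: disable the built-in mock provider
MOCK_MODE=false

# Required for live traffic (placeholder values)
OPENAI_API_KEY=sk-your-key
TOKVERA_API_KEY=your-tokvera-key

# Optional overrides (values are illustrative)
OPENAI_MODEL_SMALL=gpt-4o-mini
OPENAI_MODEL_LARGE=gpt-4o
GATEWAY_TENANT_ID=acme-dev
TOKVERA_INGEST_URL=https://ingest.example.com
```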

&lt;p&gt;That makes the starter good for both local demos and real integration experiments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to customize next
&lt;/h2&gt;

&lt;p&gt;The repo is deliberately minimal, which makes it a good foundation for platform-specific extensions.&lt;/p&gt;

&lt;p&gt;The next useful upgrades would be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add provider fallback chains&lt;/li&gt;
&lt;li&gt;add latency-aware or cost-aware routing&lt;/li&gt;
&lt;li&gt;add tenant-specific policies and budgets&lt;/li&gt;
&lt;li&gt;add rate limiting and request logging&lt;/li&gt;
&lt;li&gt;add payload redaction or prompt policy checks&lt;/li&gt;
&lt;li&gt;add Anthropic or Gemini as downstream providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are the kinds of features that turn a starter into a real internal AI gateway.&lt;/p&gt;
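&lt;p&gt;To show the shape of the first extension, a provider fallback chain can be as simple as trying each configured provider in order and reporting which ones failed. This is a sketch, not code from the repo: real provider calls would be async and would capture richer error metadata per attempt.&lt;/p&gt;

```typescript
// Sketch of a provider fallback chain (illustrative, not from the repo).
type Provider = { name: string; complete: (prompt: string) => string };

function completeWithFallback(providers: Provider[], prompt: string) {
  const failures: string[] = [];
  for (const p of providers) {
    try {
      // First provider that succeeds wins; earlier failures are reported.
      return { provider: p.name, text: p.complete(prompt), failures };
    } catch {
      failures.push(p.name);
    }
  }
  throw new Error("all providers failed: " + failures.join(", "));
}
```

&lt;p&gt;Surfacing &lt;code&gt;failures&lt;/code&gt; alongside the successful completion matters for the same reason as the routing metadata: operators can see that a fallback happened, not just that an answer came back.&lt;/p&gt;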

&lt;h2&gt;
  
  
  Why this repo is commercially useful
&lt;/h2&gt;

&lt;p&gt;A lot of AI infrastructure work happens before a team is ready for a full orchestration platform.&lt;/p&gt;

&lt;p&gt;Those teams still need a place to enforce routing rules, centralize cost control, and inspect why requests were handled the way they were.&lt;/p&gt;

&lt;p&gt;That is exactly where an OpenAI-compatible gateway becomes valuable.&lt;/p&gt;

&lt;p&gt;And that is why &lt;code&gt;llm-gateway-template&lt;/code&gt; is a strong reference repo.&lt;/p&gt;

&lt;p&gt;It shows how to preserve a familiar client interface while making gateway behavior observable, inspectable, and extensible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/Tokvera/llm-gateway-template" rel="noopener noreferrer"&gt;https://github.com/Tokvera/llm-gateway-template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Existing app tracing guide: &lt;a href="https://tokvera.org/docs/integrations/existing-app" rel="noopener noreferrer"&gt;https://tokvera.org/docs/integrations/existing-app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Get started: &lt;a href="https://tokvera.org/docs/get-started" rel="noopener noreferrer"&gt;https://tokvera.org/docs/get-started&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>opensource</category>
      <category>api</category>
    </item>
    <item>
      <title>How to Build a Customer Support AI Router with Trace Visibility</title>
      <dc:creator>TokVera</dc:creator>
      <pubDate>Sun, 22 Mar 2026 14:19:01 +0000</pubDate>
      <link>https://dev.to/tokvera/how-to-build-a-customer-support-ai-router-with-trace-visibility-37</link>
      <guid>https://dev.to/tokvera/how-to-build-a-customer-support-ai-router-with-trace-visibility-37</guid>
      <description>&lt;p&gt;Most AI support demos stop at a single prompt. A user asks a question, the model returns a reply, and the tutorial ends there.&lt;/p&gt;

&lt;p&gt;That is not how real support systems behave.&lt;/p&gt;

&lt;p&gt;A real support workflow has to decide what kind of issue it is, where it should go, whether it needs escalation, what policy context applies, and how the final response should be written. Once you add those steps, you also need a way to inspect why the workflow made each decision.&lt;/p&gt;

&lt;p&gt;That is the problem behind &lt;a href="https://github.com/Tokvera/ai-support-router-starter" rel="noopener noreferrer"&gt;&lt;code&gt;ai-support-router-starter&lt;/code&gt;&lt;/a&gt;, a small open-source Node.js example from Tokvera that shows how to build a realistic customer-support AI workflow with trace visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why support AI needs routing, not just prompting
&lt;/h2&gt;

&lt;p&gt;A support assistant is not only a writing tool.&lt;/p&gt;

&lt;p&gt;It is also a routing system.&lt;/p&gt;

&lt;p&gt;If a customer reports unexpected charges, the system should recognize that this is likely a billing issue, route it to the right internal queue, decide whether escalation is needed, and apply the right response guidance before drafting the final reply.&lt;/p&gt;

&lt;p&gt;In practice, that usually means you need at least these layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;policy or knowledge lookup&lt;/li&gt;
&lt;li&gt;reply drafting&lt;/li&gt;
&lt;li&gt;operational metadata like SLA, ownership, and escalation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that structure, you end up with a nice demo and a fragile system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the starter repo does
&lt;/h2&gt;

&lt;p&gt;The starter focuses on a practical workflow shape instead of pretending a single prompt solves support automation.&lt;/p&gt;

&lt;p&gt;It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classifies inbound tickets into categories like billing, bug, feature, or general support&lt;/li&gt;
&lt;li&gt;chooses a queue, owner, priority, and escalation recommendation&lt;/li&gt;
&lt;li&gt;looks up policy guidance before drafting the reply&lt;/li&gt;
&lt;li&gt;returns internal next actions along with the customer-facing answer&lt;/li&gt;
&lt;li&gt;emits Tokvera trace data so the end-to-end request path is inspectable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to be a complete helpdesk product.&lt;/p&gt;

&lt;p&gt;The goal is to give teams a realistic foundation they can fork and extend.&lt;/p&gt;

&lt;h2&gt;
  
  
  The API shape
&lt;/h2&gt;

&lt;p&gt;The project exposes a small set of routes for health checks, sample payloads, and workflow execution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GET /health&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/demo-ticket&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GET /api/sample-tickets&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /api/tickets/reply&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3000/api/tickets/reply &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "subject": "Need help understanding extra usage charges",
    "message": "Our finance team saw a larger invoice this week. Can you explain what changed?",
    "plan": "pro",
    "customerName": "Riya",
    "customerEmail": "riya@example.com"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the response is more useful than a plain model completion because it contains workflow output, not just generated text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"traceId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trc_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"runId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"triage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"billing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"queue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"billing-ops"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggestedOwner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"billing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggestedSlaHours"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"shouldEscalate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"reassuring"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"nextActions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Assign to billing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Respond within 8 hours"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Review included usage, overages, and invoice change history"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reply"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Workflow architecture
&lt;/h2&gt;

&lt;p&gt;The starter keeps the flow intentionally simple and inspectable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Inbound ticket
  -&amp;gt; classify_ticket
  -&amp;gt; lookup_policy
  -&amp;gt; draft_reply
  -&amp;gt; return triage + reply + next actions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That separation matters.&lt;/p&gt;

&lt;p&gt;If classification, policy lookup, and drafting all live inside one opaque prompt, you only see the final answer and are left guessing where the system went wrong.&lt;/p&gt;

&lt;p&gt;When the workflow is split into distinct steps, debugging becomes much easier.&lt;/p&gt;
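&lt;p&gt;To make the first step concrete, here is a minimal keyword-based version of &lt;code&gt;classify_ticket&lt;/code&gt;. The category-to-queue routing table mirrors the values in the example response above; the keyword lists are assumptions, and the actual repo may classify with a model call instead.&lt;/p&gt;

```typescript
// Illustrative keyword-based classify_ticket step (not the repo's code).
type Triage = {
  category: string;
  queue: string;
  priority: string;
  suggestedSlaHours: number;
  tone: string;
};

// Routing table mirrors the example response; bug/feature rows are assumed.
const ROUTES: { [category: string]: Triage } = {
  billing: { category: "billing", queue: "billing-ops", priority: "medium", suggestedSlaHours: 8, tone: "reassuring" },
  bug: { category: "bug", queue: "engineering", priority: "high", suggestedSlaHours: 4, tone: "direct" },
  feature: { category: "feature", queue: "product", priority: "low", suggestedSlaHours: 72, tone: "appreciative" },
  general: { category: "general", queue: "support", priority: "medium", suggestedSlaHours: 24, tone: "friendly" },
};

// Keyword lists are illustrative assumptions.
const KEYWORDS: { [category: string]: string[] } = {
  billing: ["invoice", "charge", "refund", "billing"],
  bug: ["error", "crash", "broken", "not working"],
  feature: ["feature request", "would be great", "could you add"],
};

function classifyTicket(subject: string, message: string): Triage {
  const text = (subject + " " + message).toLowerCase();
  for (const category of Object.keys(KEYWORDS)) {
    if (KEYWORDS[category].some((k) => text.includes(k))) {
      return ROUTES[category];
    }
  }
  // Nothing matched: fall through to general support.
  return ROUTES.general;
}
```

&lt;p&gt;Running it on the example ticket ("Need help understanding extra usage charges") yields the same billing triage shown in the response above, which is exactly the kind of step-level output you want visible in a trace.&lt;/p&gt;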

&lt;h2&gt;
  
  
  Why trace visibility matters
&lt;/h2&gt;

&lt;p&gt;Support automation can fail in several different ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a billing ticket gets routed to engineering&lt;/li&gt;
&lt;li&gt;an enterprise account does not escalate quickly enough&lt;/li&gt;
&lt;li&gt;the correct queue is chosen but the wrong policy guidance is used&lt;/li&gt;
&lt;li&gt;the workflow classifies correctly but drafts the wrong tone or final answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If all you can see is the final response, root cause analysis becomes slow and fuzzy.&lt;/p&gt;

&lt;p&gt;With trace visibility, you can inspect the workflow path that produced the result and see which step actually broke down.&lt;/p&gt;

&lt;p&gt;That is where Tokvera fits into the starter.&lt;/p&gt;

&lt;p&gt;Instead of only tracking raw model usage, Tokvera helps you inspect the root workflow trace and the individual decisions made along the way.&lt;/p&gt;
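&lt;p&gt;The mechanics are easy to sketch generically. The helper below is illustrative only and is not Tokvera's actual SDK; it just shows the shape of the idea, where every workflow step records its name, duration, and outcome on a shared trace.&lt;/p&gt;

```typescript
// Generic per-step tracing helper (illustrative, not Tokvera's SDK).
type Span = { step: string; ms: number; ok: boolean };

function makeTracer(traceId: string) {
  const spans: Span[] = [];
  // Wrap a workflow step so its execution is recorded on the trace.
  function traced(step: string, fn: () => unknown) {
    const start = Date.now();
    try {
      const out = fn();
      spans.push({ step, ms: Date.now() - start, ok: true });
      return out;
    } catch (err) {
      // Failed steps still land on the trace, which is the point:
      // you can see exactly which stage broke down.
      spans.push({ step, ms: Date.now() - start, ok: false });
      throw err;
    }
  }
  return { traceId, spans, traced };
}
```

&lt;p&gt;Wrapping &lt;code&gt;classify_ticket&lt;/code&gt;, &lt;code&gt;lookup_policy&lt;/code&gt;, and &lt;code&gt;draft_reply&lt;/code&gt; this way gives you an ordered list of spans per request, so a misrouted ticket can be traced back to the specific step that made the wrong call.&lt;/p&gt;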

&lt;h2&gt;
  
  
  Running the project locally
&lt;/h2&gt;

&lt;p&gt;One nice detail in this repo is that it defaults to mock mode.&lt;/p&gt;

&lt;p&gt;That makes it useful for local evaluation, demos, screenshots, and onboarding even before you wire in live provider traffic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install
cp&lt;/span&gt; .env.example .env
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you want to switch to live traffic, set &lt;code&gt;MOCK_MODE=false&lt;/code&gt; and provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TOKVERA_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives teams a clean path from local development to real tracing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to customize next
&lt;/h2&gt;

&lt;p&gt;If you want to take this beyond a starter, the next obvious extensions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;replace the policy lookup stub with a real knowledge base or help center integration&lt;/li&gt;
&lt;li&gt;add Slack or email escalation hooks for urgent tickets&lt;/li&gt;
&lt;li&gt;persist ticket state and triage output to a database&lt;/li&gt;
&lt;li&gt;add provider fallback for reply drafting&lt;/li&gt;
&lt;li&gt;build a lightweight support review UI on top of the API&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this repo is useful
&lt;/h2&gt;

&lt;p&gt;A lot of AI tutorials show how to get text back from a model.&lt;/p&gt;

&lt;p&gt;Fewer show how to build a workflow that another team can actually operate.&lt;/p&gt;

&lt;p&gt;That is what makes &lt;code&gt;ai-support-router-starter&lt;/code&gt; valuable.&lt;/p&gt;

&lt;p&gt;It treats customer support AI as a decision pipeline rather than a one-shot prompt, and it gives you a traced, extensible foundation for building something real.&lt;/p&gt;

&lt;p&gt;If you want to try it, start with the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/Tokvera/ai-support-router-starter" rel="noopener noreferrer"&gt;https://github.com/Tokvera/ai-support-router-starter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://tokvera.org/docs/integrations/existing-app" rel="noopener noreferrer"&gt;https://tokvera.org/docs/integrations/existing-app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Canonical article: &lt;a href="https://tokvera.org/blog/customer-support-ai-router-trace-visibility" rel="noopener noreferrer"&gt;https://tokvera.org/blog/customer-support-ai-router-trace-visibility&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>opensource</category>
      <category>api</category>
    </item>
  </channel>
</rss>
