<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Devansh</title>
    <description>The latest articles on DEV Community by Devansh (@devansh365).</description>
    <link>https://dev.to/devansh365</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F679755%2F9dc6ebfe-a1d9-4613-8192-f2854324ea75.png</url>
      <title>DEV Community: Devansh</title>
      <link>https://dev.to/devansh365</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/devansh365"/>
    <language>en</language>
    <item>
      <title>How to Make Your Codebase Work for AI Coding Agents (Without Better Prompts)</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Wed, 03 Jun 2026 02:57:19 +0000</pubDate>
      <link>https://dev.to/devansh365/how-to-make-your-codebase-work-for-ai-coding-agents-without-better-prompts-kcb</link>
      <guid>https://dev.to/devansh365/how-to-make-your-codebase-work-for-ai-coding-agents-without-better-prompts-kcb</guid>
      <description>&lt;p&gt;Your agent wrote valid code. It still missed the point.&lt;/p&gt;

&lt;p&gt;Wrong package manager. Tests run with a flag your pipeline never uses. Business logic landed in a route handler because the model found a similar file three folders away. You pasted more context, tightened the prompt, ran again. Same failure on the next task.&lt;/p&gt;

&lt;p&gt;That is not a model problem. It is a repo problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5agfm6ftp1x881yp5ju.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5agfm6ftp1x881yp5ju.png" alt="Tuning prompts loops the same failures; fixing AGENTS.md and golden commands in the repo reduces repeats" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A wave of posts between mid-2025 and early 2026 (Medeiros, Fabisevich, Marmelab, Sourcegraph, Vstorm, and others) converged on the same idea: &lt;strong&gt;agent productivity is architectural&lt;/strong&gt;. Tools matter. Structure and feedback loops matter more.&lt;/p&gt;

&lt;p&gt;This post is a practical distillation. No tool worship. What to add to your repository so Copilot, Claude Code, Cursor, Codex, or whatever you use next month can ship without you re-explaining the project every session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why your prompts stop working
&lt;/h2&gt;

&lt;p&gt;Humans absorb tribal knowledge. Half-documented setup scripts. "Ask Priya about auth." Agents do not ask Priya. They pattern-match on what is in the tree and what they can grep.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.heliomedeiros.com/posts/2025-08-07-agent-friendly-codebase/" rel="noopener noreferrer"&gt;Hélio Medeiros&lt;/a&gt; frames the repository as an interface. &lt;a href="https://www.infoworld.com/article/4142019/coding-for-agents.html" rel="noopener noreferrer"&gt;InfoWorld's "Coding for agents"&lt;/a&gt; goes further: context is infrastructure. Test commands, boundaries, and "do not touch" paths are part of how work runs when the worker is an agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The litmus test&lt;/strong&gt; (use this before you blame the model):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Delete the chat history.&lt;/li&gt;
&lt;li&gt;Open a fresh agent session on the same branch.&lt;/li&gt;
&lt;li&gt;Give one real task: "Add a field to the checkout API" or "Fix the failing test in module X."&lt;/li&gt;
&lt;li&gt;Do not paste architecture essays.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the agent cannot finish using only committed files, you are still carrying the load. The agent is typing.&lt;/p&gt;

&lt;p&gt;That test takes ten minutes. It tells you exactly where to invest next.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmdpqs12tggq81lstksn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmdpqs12tggq81lstksn.png" alt="What to put in AGENTS.md and how to verify a fresh agent session can finish using only the repo" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What you get when the repo carries the instructions
&lt;/h2&gt;

&lt;p&gt;Teams that retrofit for agents report the same wins:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fewer wrong commands (install, test, lint, migrate).&lt;/li&gt;
&lt;li&gt;Fewer edits to generated files, lockfiles, or secrets.&lt;/li&gt;
&lt;li&gt;Smaller diffs that match how your team actually layers code.&lt;/li&gt;
&lt;li&gt;Less time re-typing "we use pnpm" or "migrations are generated" in every thread.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://oss.vstorm.co/blog/agents-md-ai-friendly-codebase/" rel="noopener noreferrer"&gt;Vstorm's guide&lt;/a&gt; and &lt;a href="https://cobusgreyling.substack.com/p/what-is-agentsmd" rel="noopener noreferrer"&gt;community writeups on AGENTS.md&lt;/a&gt; put the setup time at roughly &lt;strong&gt;15 minutes&lt;/strong&gt; for a first version. The payback shows up in the first week of review loops you do not have to run.&lt;/p&gt;

&lt;p&gt;You are not building for robots. You are writing down what a good senior engineer would need on day one. Agents just force the issue because they never attend onboarding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Add &lt;code&gt;AGENTS.md&lt;/code&gt; at the repo root
&lt;/h2&gt;

&lt;p&gt;A year ago every tool wanted its own rules file. &lt;code&gt;.cursorrules&lt;/code&gt;, &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;, tool-specific Gemini configs. Same conventions copied four times, drifting within weeks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/strong&gt; is the convention that stuck: one Markdown file at the root that multiple agents read. Plain text. No JSON schema. Works across Copilot, Codex, Claude Code, Cursor, and others (see &lt;a href="https://dev.to/jason_peterson_607e54abf5/testing-agentsmd-across-three-agentic-coding-platforms-universal-context-has-arrived-1lg0"&gt;this cross-platform test on DEV&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Keep tool-specific extras if you want (&lt;code&gt;CLAUDE.md&lt;/code&gt; for Claude-only workflow). &lt;strong&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; should stand alone.&lt;/strong&gt; If an agent reads one file, it should still know how to work here.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to put in it (highest leverage first)
&lt;/h3&gt;

&lt;p&gt;Copy this skeleton and fill in the blanks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# AGENTS.md&lt;/span&gt;

&lt;span class="gu"&gt;## Project overview&lt;/span&gt;
[Name]. [One line: what it does].
Stack: [language, framework, database, package manager].

&lt;span class="gu"&gt;## Commands&lt;/span&gt;
&lt;span class="gh"&gt;# Install&lt;/span&gt;
[exact command]

&lt;span class="gh"&gt;# Dev&lt;/span&gt;
[exact command]

&lt;span class="gh"&gt;# Test&lt;/span&gt;
[exact command]

&lt;span class="gh"&gt;# Lint / format&lt;/span&gt;
[exact command]

&lt;span class="gu"&gt;## Structure&lt;/span&gt;
[key directories only, 10-15 lines max]
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`src/api/`&lt;/span&gt;: HTTP handlers
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`src/domain/`&lt;/span&gt;: business rules
&lt;span class="p"&gt;-&lt;/span&gt; ...

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [Where new endpoints go]
&lt;span class="p"&gt;-&lt;/span&gt; [How you name tests]
&lt;span class="p"&gt;-&lt;/span&gt; [Patterns agents get wrong: e.g. flush() in repo, not commit()]

&lt;span class="gu"&gt;## Do not modify&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [generated migrations]
&lt;span class="p"&gt;-&lt;/span&gt; [lockfiles]
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`.env`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [auto-generated docs]

&lt;span class="gu"&gt;## More context&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`docs/architecture.md`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`CONTRIBUTING.md`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The sections that prevent the most damage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Section&lt;/th&gt;
&lt;th&gt;What it stops&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Commands&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pip&lt;/code&gt; vs &lt;code&gt;uv&lt;/code&gt;, &lt;code&gt;npm&lt;/code&gt; vs &lt;code&gt;pnpm&lt;/code&gt;, wrong test runner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structure&lt;/td&gt;
&lt;td&gt;Logic dropped in &lt;code&gt;main.ts&lt;/code&gt; or the wrong package&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conventions&lt;/td&gt;
&lt;td&gt;Architecturally "valid" code that violates your patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Do not modify&lt;/td&gt;
&lt;td&gt;Ruined migrations, committed secrets, reformatted lockfiles&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Do not paste your entire README. &lt;a href="https://www.infoworld.com/article/4142019/coding-for-agents.html" rel="noopener noreferrer"&gt;OpenAI's harness engineering notes&lt;/a&gt; (summarized widely in 2026) argue that one giant agent manual goes stale. Use &lt;strong&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; as a map&lt;/strong&gt;, not an encyclopedia.&lt;/p&gt;
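
&lt;p&gt;One way to keep the file honest is a small check that the skeleton's sections are still present. The sketch below is illustrative only: the function name &lt;code&gt;missing_sections&lt;/code&gt; and the required-section list are assumptions based on the skeleton above, not part of any AGENTS.md tooling.&lt;/p&gt;

```python
import re

# Section names taken from the skeleton above; adjust to your own template.
REQUIRED_SECTIONS = ["Project overview", "Commands", "Structure", "Conventions", "Do not modify"]


def missing_sections(agents_md: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return required section names that have no matching '## ' heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##\s+(.+)$", agents_md, flags=re.MULTILINE)
    }
    return [name for name in required if name.lower() not in headings]


sample = """# AGENTS.md

## Project overview
Shop API. Handles checkout.

## Commands
pnpm install

## Conventions
- New endpoints go in src/api/
"""

# Lists the sections this sample has not written yet.
print(missing_sections(sample))
```

&lt;p&gt;Run against the sample, this reports the Structure and Do not modify sections as missing. A check like this could sit in a pre-commit hook, but it only verifies headings, not whether the commands under them still work.&lt;/p&gt;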

&lt;h2&gt;
  
  
  Step 2: Add &lt;code&gt;llms.txt&lt;/code&gt; if agents need a wider map
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://build.ms/2026/3/11/small-steps-for-agent-friendly-codebases/" rel="noopener noreferrer"&gt;Joe Fabisevich's Recap 2.0 writeup&lt;/a&gt; describes a small &lt;code&gt;llms.txt&lt;/code&gt; that points agents at the right docs without dumping the whole repo into context.&lt;/p&gt;

&lt;p&gt;Use it for pointers, not rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# llms.txt
/docs/architecture.md
/docs/api.md
/CONTRIBUTING.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Put operational rules in &lt;code&gt;AGENTS.md&lt;/code&gt;. Put "where to look next" in &lt;code&gt;llms.txt&lt;/code&gt; or &lt;code&gt;public/llms.txt&lt;/code&gt; for web projects.&lt;/p&gt;
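
&lt;p&gt;Pointer files go stale when docs move. A minimal sketch of a check, assuming the one-path-per-line layout shown above (other &lt;code&gt;llms.txt&lt;/code&gt; layouts, such as Markdown link lists, would need a different parser). The helper name &lt;code&gt;missing_pointers&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
from pathlib import Path
import tempfile


def missing_pointers(repo_root: Path, llms_name: str = "llms.txt") -> list[str]:
    """List entries in the pointer file that do not exist in the repo."""
    missing = []
    for raw in (repo_root / llms_name).read_text(encoding="utf-8").splitlines():
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and the title/comment line
        if not (repo_root / entry.lstrip("/")).exists():
            missing.append(entry)
    return missing


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "architecture.md").write_text("# Architecture\n", encoding="utf-8")
    (root / "llms.txt").write_text(
        "# llms.txt\n/docs/architecture.md\n/docs/api.md\n", encoding="utf-8"
    )
    result = missing_pointers(root)

print(result)
```

&lt;p&gt;In the demo only &lt;code&gt;/docs/api.md&lt;/code&gt; is reported, because that file was never created.&lt;/p&gt;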

&lt;h2&gt;
  
  
  Step 3: One golden path for commands (and match CI)
&lt;/h2&gt;

&lt;p&gt;Medeiros recommends stable entrypoints, often wrapped in Make:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make bootstrap
make &lt;span class="nb"&gt;test
&lt;/span&gt;make lint
make run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your implementation can be npm scripts, pnpm, mise, or a Taskfile. The agent does not care about the wrapper. It cares that &lt;strong&gt;one string always works on a clean clone&lt;/strong&gt; and that &lt;strong&gt;CI runs the same string&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Bad state: local &lt;code&gt;npm test&lt;/code&gt;, CI &lt;code&gt;pnpm test --filter=api&lt;/code&gt;. The agent optimizes for whatever just ran in the terminal. The change is green locally and red in the pipeline.&lt;/p&gt;

&lt;p&gt;Good state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"test"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vitest run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eslint ."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…and the workflow file calls &lt;code&gt;pnpm test&lt;/code&gt; and &lt;code&gt;pnpm lint&lt;/code&gt;, not a different incantation.&lt;/p&gt;
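
&lt;p&gt;Drift between the two can be caught mechanically. The sketch below scans workflow text for &lt;code&gt;pnpm&lt;/code&gt; invocations and reports any script name that &lt;code&gt;package.json&lt;/code&gt; does not define. It is a rough heuristic under stated assumptions: the built-in list is partial, flag-first invocations are ignored, and the &lt;code&gt;typecheck&lt;/code&gt; step in the demo is hypothetical.&lt;/p&gt;

```python
import json
import re

# pnpm subcommands that are not package.json scripts (partial, illustrative list).
BUILTINS = {"install", "run", "exec", "add", "dlx"}


def undefined_script_calls(workflow_text: str, package_json_text: str) -> list[str]:
    """Return script names the workflow calls via pnpm that package.json lacks."""
    scripts = json.loads(package_json_text).get("scripts", {})
    # Capture the word after "pnpm" or "pnpm run"; names must start with a letter.
    called = re.findall(r"\bpnpm\s+(?:run\s+)?([a-zA-Z][\w:-]*)", workflow_text)
    return sorted({name for name in called if name not in BUILTINS and name not in scripts})


package_json = '{"scripts": {"test": "vitest run", "lint": "eslint ."}}'
workflow = """
steps:
  - run: pnpm install --frozen-lockfile
  - run: pnpm test
  - run: pnpm lint
  - run: pnpm typecheck
"""

print(undefined_script_calls(workflow, package_json))
```

&lt;p&gt;Here the only mismatch is &lt;code&gt;typecheck&lt;/code&gt;, which CI calls but the scripts block never defines.&lt;/p&gt;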

&lt;p&gt;When verification is slow or flaky, the agent becomes a diff machine and you become the test runner. Fast unit tests on pure domain code (where you have any) shorten the loop more than swapping to a frontier model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Shrink where a change is allowed to live
&lt;/h2&gt;

&lt;p&gt;You do not need hexagonal architecture on every side project. You do need &lt;strong&gt;obvious boundaries&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Medeiros and others recommend ports-and-adapters style layouts because they make violations visible: domain code cannot import the database driver, so the build fails when an agent takes a shortcut.&lt;/p&gt;
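
&lt;p&gt;If your build does not enforce that rule yet, a small import check can stand in for it. This is a sketch for Python sources using the standard &lt;code&gt;ast&lt;/code&gt; module; the forbidden-module list and the function name are assumptions for illustration. Other stacks would typically express the same rule as a lint configuration.&lt;/p&gt;

```python
import ast

# Hypothetical rule: domain code must not import these modules.
FORBIDDEN_IN_DOMAIN = {"psycopg", "sqlalchemy", "requests"}


def forbidden_imports(source: str, forbidden=FORBIDDEN_IN_DOMAIN) -> list[str]:
    """Return forbidden top-level packages that `source` imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]  # "sqlalchemy.orm" counts as "sqlalchemy"
            if top in forbidden:
                found.add(top)
    return sorted(found)


domain_file = """
from dataclasses import dataclass
import sqlalchemy.orm

@dataclass
class Order:
    total_cents: int
"""

print(forbidden_imports(domain_file))
```

&lt;p&gt;The source is only parsed, never executed, so the check is safe to run in CI over every file under the domain folder and to fail the job when the list is non-empty.&lt;/p&gt;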

&lt;p&gt;Transferable pattern for any stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Put business rules in one place (domain, &lt;code&gt;core/&lt;/code&gt;, &lt;code&gt;lib/domain/&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Keep framework glue thin (handlers, UI routes, CLI).&lt;/li&gt;
&lt;li&gt;Wire dependencies at the edges (&lt;code&gt;main&lt;/code&gt;, &lt;code&gt;app/&lt;/code&gt;, composition root).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a feature-folder Next.js app, that might mean: routes in &lt;code&gt;app/&lt;/code&gt;, product logic in &lt;code&gt;features/*/&lt;/code&gt;, shared MDX paths documented in &lt;code&gt;AGENTS.md&lt;/code&gt; so "add a blog post" does not create &lt;code&gt;data/blog/&lt;/code&gt; and &lt;code&gt;features/blog/data/posts/&lt;/code&gt; on the same day.&lt;/p&gt;

&lt;p&gt;Add a &lt;strong&gt;one-paragraph README&lt;/strong&gt; in folders agents confuse often (&lt;code&gt;src/billing/&lt;/code&gt;, &lt;code&gt;packages/api/&lt;/code&gt;). Agents frequently read folder READMEs when they list a directory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Treat agent mistakes as repo tickets
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://marmelab.com/blog/2026/01/21/agent-experience.html" rel="noopener noreferrer"&gt;Marmelab's agent experience post&lt;/a&gt; is long. The habit worth stealing is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every time an agent does something stupid, ask if the repository should have prevented it.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent mistake&lt;/th&gt;
&lt;th&gt;Repo fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Wrong test command&lt;/td&gt;
&lt;td&gt;Add to &lt;code&gt;AGENTS.md&lt;/code&gt; Commands&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reinvented helper&lt;/td&gt;
&lt;td&gt;Add convention: search before creating&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Same formatting nit on every PR&lt;/td&gt;
&lt;td&gt;Pre-commit hook or agent hook&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Broke auth on a "small" change&lt;/td&gt;
&lt;td&gt;Document blast radius; list related paths in &lt;code&gt;AGENTS.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tooling and MCP servers come last in Marmelab's ordering. Most teams still fail on missing context, not missing plugins.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 80% problem (and what to do at your scale)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://sourcegraph.com/blog/agentic-coding" rel="noopener noreferrer"&gt;Sourcegraph's agentic coding guide&lt;/a&gt; names a pattern teams recognize: the agent finishes the &lt;strong&gt;visible&lt;/strong&gt; 80%. Tests pass in the files it touched. Days later, CI fails elsewhere because middleware, DTOs, audit logs, or a sibling service still expect the old contract.&lt;/p&gt;

&lt;p&gt;That is incomplete context, not stupidity.&lt;/p&gt;

&lt;p&gt;On a single app, blast radius is smaller. Still run this before you call a task done: &lt;strong&gt;grep for every symbol the agent renamed or exported.&lt;/strong&gt; Open files it never touched. If something depends on the old shape, the task is not done.&lt;/p&gt;
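
&lt;p&gt;The grep step can be scripted so it runs the same way every time. A minimal sketch, assuming a small repo where a plain directory walk is fast enough. The symbol and file names in the demo are invented, and a real version would also skip folders such as &lt;code&gt;node_modules&lt;/code&gt;.&lt;/p&gt;

```python
from pathlib import Path
import re
import tempfile


def find_references(root: Path, symbol: str, suffixes=(".ts", ".tsx", ".py")) -> list[str]:
    """Return 'relative/path:line' for each whole-word occurrence of symbol."""
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            lines = path.read_text(encoding="utf-8").splitlines()
            for number, line in enumerate(lines, start=1):
                if pattern.search(line):
                    hits.append(f"{path.relative_to(root).as_posix()}:{number}")
    return hits


# Demo: the agent renamed getCartTotal in cart.ts but never opened audit.ts.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cart.ts").write_text("export function computeCartTotal() {}\n", encoding="utf-8")
    (root / "audit.ts").write_text(
        "import { getCartTotal } from './cart'\nlog(getCartTotal())\n", encoding="utf-8"
    )
    stale = find_references(root, "getCartTotal")

print(stale)
```

&lt;p&gt;Any hit for the old name in a file the agent did not touch means the task is not finished.&lt;/p&gt;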

&lt;p&gt;On large or multi-repo codebases, you need deterministic cross-repo search and explicit scoping before merge. The fix scales up; the diagnosis stays the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your 30-minute retrofit checklist
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpt5bf8l6rblf93fsl0l7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpt5bf8l6rblf93fsl0l7.png" alt="Five-step retrofit checklist and the invisible cross-cutting dependencies agents often miss" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Do this on the repo you use agents on most:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Write&lt;/strong&gt; &lt;code&gt;AGENTS.md&lt;/code&gt; using the skeleton above (15 minutes).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Align local test/lint with CI&lt;/strong&gt; (one script name, both places).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add folder READMEs&lt;/strong&gt; where agents keep landing wrong (5 minutes each, only where needed).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run the litmus test&lt;/strong&gt; with a fresh session and one real task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After the task&lt;/strong&gt;, add one line to &lt;code&gt;AGENTS.md&lt;/code&gt; for anything the agent had to be told in chat.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Start on a small project if you are learning the pattern. &lt;a href="https://build.ms/2026/3/11/small-steps-for-agent-friendly-codebases/" rel="noopener noreferrer"&gt;Fabisevich's advice&lt;/a&gt; is to practice on something bounded, then port the habits to the big codebase.&lt;/p&gt;

&lt;p&gt;Reading about agent-friendly repos does nothing until a file lands in git. The litmus test is the scoreboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;p&gt;Primary sources behind this post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://speakerdeck.com/helmedeiros/its-time-for-an-agent-friendly-codebase" rel="noopener noreferrer"&gt;It's Time for an Agent-Friendly Codebase&lt;/a&gt; (Hélio Medeiros) and &lt;a href="https://blog.heliomedeiros.com/posts/2025-08-07-agent-friendly-codebase/" rel="noopener noreferrer"&gt;blog companion&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://build.ms/2026/3/11/small-steps-for-agent-friendly-codebases/" rel="noopener noreferrer"&gt;Small Steps For Agent-Friendly Codebases&lt;/a&gt; (Joe Fabisevich)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://oss.vstorm.co/blog/agents-md-ai-friendly-codebase/" rel="noopener noreferrer"&gt;AGENTS.md: AI-Agent Friendly Codebase Guide&lt;/a&gt; (Vstorm OSS)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://marmelab.com/blog/2026/01/21/agent-experience.html" rel="noopener noreferrer"&gt;Agent Experience: Best Practices&lt;/a&gt; (Marmelab)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.infoworld.com/article/4142019/coding-for-agents.html" rel="noopener noreferrer"&gt;Coding for agents&lt;/a&gt; (InfoWorld)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sourcegraph.com/blog/agentic-coding" rel="noopener noreferrer"&gt;Agentic Coding in 2026: A Practical Guide for Big Code&lt;/a&gt; (Sourcegraph)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cobusgreyling.substack.com/p/what-is-agentsmd" rel="noopener noreferrer"&gt;What is AGENTS.md?&lt;/a&gt; (Cobus Greyling)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/jason_peterson_607e54abf5/testing-agentsmd-across-three-agentic-coding-platforms-universal-context-has-arrived-1lg0"&gt;Testing AGENTS.md Across Three Platforms&lt;/a&gt; (DEV Community)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>openai</category>
    </item>
    <item>
      <title>OpenCode Go + Oh My OpenAgent: The Model Routing Config That Actually Saves Money</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Sun, 24 May 2026 02:53:03 +0000</pubDate>
      <link>https://dev.to/devansh365/opencode-go-oh-my-openagent-the-model-routing-config-that-actually-saves-money-3jmj</link>
      <guid>https://dev.to/devansh365/opencode-go-oh-my-openagent-the-model-routing-config-that-actually-saves-money-3jmj</guid>
      <description>&lt;p&gt;Most guides on OpenCode Go start with the models. I want to start with the thing most guides get wrong: the limits are denominated in dollars, not requests.&lt;/p&gt;

&lt;p&gt;That sounds like a minor distinction. It isn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing everyone misses
&lt;/h2&gt;

&lt;p&gt;OpenCode Go costs $5 for the first month, then $10/month. Your usage caps are $12 per 5-hour window, $30 per week, and $60 per month.&lt;/p&gt;

&lt;p&gt;When you spend $12 in a 5-hour window on DeepSeek V4 Flash, you get approximately 31,650 requests. When you spend the same $12 on GLM-5.1, you get around 880. Same budget. 36x difference in volume.&lt;/p&gt;

&lt;p&gt;This is why routing actually matters. If you pick one model and use it for everything, you are either burning premium requests on tasks that don't need them, or you are under-using cheap models that are surprisingly capable. The right move is assigning models to tasks based on what each task actually requires.&lt;/p&gt;

&lt;p&gt;MiniMax M2.5 has a hard cap of 100,000 requests per month regardless of cost. It activates only ~10B parameters and is priced about 16.7x lower than Claude Opus 4.6 on input tokens. For high-volume, low-complexity work it is the obvious choice, and most people don't know it exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you lose running on a single premium model
&lt;/h2&gt;

&lt;p&gt;Say you put everything through DeepSeek V4 Pro: 10,200 requests per 5-hour window. That sounds fine for light use. But Oh My OpenAgent runs multiple agents in parallel. Prometheus decomposes your task, Metis synthesizes context, Atlas manages sequencing, Sisyphus runs execution, and the Librarian reads docs. A single complex task can fan out into 30-50 requests without you doing anything. Your 5-hour budget evaporates in a few hours of active work.&lt;/p&gt;

&lt;p&gt;The problem isn't the quality gap. V4 Pro at 80.6% on SWE-Bench Verified is 7 percentage points behind Claude Opus 4.7 at 87.6%, and for most routine tickets that gap is invisible. The problem is you don't need that quality for every step of a multi-agent workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tier breakdown with actual numbers
&lt;/h2&gt;

&lt;p&gt;Here is what the available models score on benchmarks that matter for coding tasks, plus the API pricing that drives the routing math:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;SWE-Bench Verified&lt;/th&gt;
&lt;th&gt;Input price (per M tokens)&lt;/th&gt;
&lt;th&gt;Requests/5hrs ($12)&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;87.6%&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;~480&lt;/td&gt;
&lt;td&gt;200K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Pro&lt;/td&gt;
&lt;td&gt;80.6%&lt;/td&gt;
&lt;td&gt;$0.435 (promo, ends May 31)&lt;/td&gt;
&lt;td&gt;~5,500&lt;/td&gt;
&lt;td&gt;1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2.6&lt;/td&gt;
&lt;td&gt;80.2%&lt;/td&gt;
&lt;td&gt;$0.95&lt;/td&gt;
&lt;td&gt;~2,500&lt;/td&gt;
&lt;td&gt;256K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;79.6%&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;~800&lt;/td&gt;
&lt;td&gt;200K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MiMo-V2.5-Pro&lt;/td&gt;
&lt;td&gt;78.9%&lt;/td&gt;
&lt;td&gt;~$0.40&lt;/td&gt;
&lt;td&gt;~6,000&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.6 Plus&lt;/td&gt;
&lt;td&gt;78.8%&lt;/td&gt;
&lt;td&gt;$0.325&lt;/td&gt;
&lt;td&gt;~7,400&lt;/td&gt;
&lt;td&gt;1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Flash&lt;/td&gt;
&lt;td&gt;~79.0%&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;~17,000&lt;/td&gt;
&lt;td&gt;1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLM-5.1&lt;/td&gt;
&lt;td&gt;SWE-Bench Pro 58.4%&lt;/td&gt;
&lt;td&gt;~$1.50&lt;/td&gt;
&lt;td&gt;~1,600&lt;/td&gt;
&lt;td&gt;200K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5 Plus&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$0.08&lt;/td&gt;
&lt;td&gt;~30,000&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MiniMax M2.5&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$0.03&lt;/td&gt;
&lt;td&gt;up to 100K/month&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;(Requests per 5-hour window calculated at roughly 2,500 average tokens per request.)&lt;/em&gt;&lt;/p&gt;
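
&lt;p&gt;The requests column can be reproduced with one division. The sketch below is an inference from the table, not a documented formula: the published figures match a $12 budget divided by the input price of about 5,000 tokens per request. Plugging in 2,500 tokens would give roughly double these counts, so treat the token figure as an assumption.&lt;/p&gt;

```python
BUDGET_USD = 12.0           # per 5-hour window, from the plan limits above
TOKENS_PER_REQUEST = 5_000  # assumption chosen because it reproduces the table

# Input prices in USD per million tokens, copied from the table.
INPUT_PRICE = {
    "Claude Opus 4.7": 5.00,
    "DeepSeek V4 Pro": 0.435,
    "Kimi K2.6": 0.95,
    "Claude Sonnet 4.6": 3.00,
    "Qwen3.6 Plus": 0.325,
    "DeepSeek V4 Flash": 0.14,
}


def requests_per_window(price_per_million: float) -> int:
    """Estimate how many requests fit in one window at a given input price."""
    cost_per_request = price_per_million * TOKENS_PER_REQUEST / 1_000_000
    return round(BUDGET_USD / cost_per_request)


for model, price in INPUT_PRICE.items():
    print(f"{model}: ~{requests_per_window(price):,}")
```

&lt;p&gt;Changing &lt;code&gt;TOKENS_PER_REQUEST&lt;/code&gt; to match your own average request size is the quickest way to see how far your real budget stretches.&lt;/p&gt;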

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftoj9u2dn5r4rntiska4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftoj9u2dn5r4rntiska4w.png" alt="Cost vs performance" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The original Kimi K2.6 series was discontinued on May 25, 2026. The model itself stays available, but the series no longer receives updates. DeepSeek V4 Pro's promotional pricing ($0.435/M) ends May 31. After that the price increases, which changes the requests-per-window math.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude Opus 4.7 at 87.6% is genuinely the strongest model available for coding tasks right now, 7 points above V4 Pro. But at $5/M tokens, it costs about 36x as much as DeepSeek V4 Flash per input token. Within the $12/5hr window, you get around 480 Opus 4.7 requests vs 17,000 Flash requests.&lt;/p&gt;

&lt;p&gt;DeepSeek V4 Flash sits within two points of V4 Pro on SWE-Bench Verified (about 79.0% vs 80.6%) at about 3x lower cost per token. For most routine coding tasks, that gap does not show up in practice. V4 Flash runs 284B total parameters with 13B active. V4 Pro runs 1.6T total with 49B active.&lt;/p&gt;

&lt;p&gt;Kimi K2.6 is a 1-trillion-parameter MoE model with 32B active parameters, 80.2% SWE-Bench Verified. That puts it above Qwen3.6 Plus and close to V4 Pro, making it the right choice for genuinely hard multi-step reasoning when V4 Flash stalls.&lt;/p&gt;

&lt;p&gt;GLM-5.1 sits at 744B total / 40B active. Its 200K context makes it suitable for deep planning tasks, and it handles the Oracle and Prometheus roles well at a mid-range cost point.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Oh My OpenAgent is structured
&lt;/h2&gt;

&lt;p&gt;Oh My OpenAgent v4.2.3 (as of May 2026, with 48K+ GitHub stars) uses a 3-layer architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Planning Layer&lt;/strong&gt; handles strategic decomposition and knowledge synthesis. Two agents: Prometheus (breaks down what needs to happen) and Metis (synthesizes context and prior knowledge).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orchestration Layer&lt;/strong&gt; is Atlas. It maintains a todo-list, enforces sequencing, and tracks completion. It does not do the work itself. It manages what gets done in what order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execution Layer&lt;/strong&gt; is where the work happens. Sisyphus is the default orchestrator with a 32K extended thinking budget. Nine or more specialized agents handle specific task types.&lt;/p&gt;

&lt;p&gt;v4.0.0 introduced Team Mode, which activates 7 additional hooks (61 total vs 54 in standard mode). Team Mode is worth enabling if you are running parallel workstreams. It is off by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  The routing configuration
&lt;/h2&gt;

&lt;p&gt;This is the community-recommended agent-to-model assignment. It is the result of a lot of trial and error, not theory:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Primary Model&lt;/th&gt;
&lt;th&gt;Fallback&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sisyphus&lt;/td&gt;
&lt;td&gt;Kimi K2.6&lt;/td&gt;
&lt;td&gt;DeepSeek V4 Pro, then Qwen3.6 Plus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hephaestus&lt;/td&gt;
&lt;td&gt;DeepSeek V4 Pro&lt;/td&gt;
&lt;td&gt;DeepSeek V4 Flash, then Kimi K2.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Oracle&lt;/td&gt;
&lt;td&gt;GLM-5.1&lt;/td&gt;
&lt;td&gt;Kimi K2.6, then DeepSeek V4 Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Librarian&lt;/td&gt;
&lt;td&gt;DeepSeek V4 Flash&lt;/td&gt;
&lt;td&gt;Qwen3.5 Plus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Explore&lt;/td&gt;
&lt;td&gt;DeepSeek V4 Flash&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;td&gt;GLM-5.1&lt;/td&gt;
&lt;td&gt;Qwen3.6 Plus, then DeepSeek V4 Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metis&lt;/td&gt;
&lt;td&gt;Qwen3.6 Plus&lt;/td&gt;
&lt;td&gt;DeepSeek V4 Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Atlas&lt;/td&gt;
&lt;td&gt;DeepSeek V4 Pro&lt;/td&gt;
&lt;td&gt;DeepSeek V4 Flash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code-reviewer&lt;/td&gt;
&lt;td&gt;Kimi K2.6&lt;/td&gt;
&lt;td&gt;DeepSeek V4 Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multimodal-Looker&lt;/td&gt;
&lt;td&gt;MiMo-V2.5-Pro&lt;/td&gt;
&lt;td&gt;Qwen3.6 Plus&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmsfqa2yj5vvms4o0stf5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmsfqa2yj5vvms4o0stf5.png" alt="Agent routing m" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sisyphus gets Kimi K2.6 because it runs extended thinking at up to 32K tokens. You want the strongest reasoning model here, even at lower volume. Kimi's 256K context window handles long execution traces.&lt;/p&gt;

&lt;p&gt;Librarian and Explore get V4 Flash. These agents read docs, fetch context, and do lookup work. They do not need frontier-level reasoning. Wasting V4 Pro on Librarian is the single most common budget mistake I see.&lt;/p&gt;

&lt;p&gt;Oracle and Prometheus both get GLM-5.1. Planning and deep reasoning are where GLM-5.1 earns its slot. It is not the cheapest model, but it is not the most expensive either, and it performs well on the kinds of open-ended decomposition tasks these agents handle.&lt;/p&gt;

&lt;p&gt;Hephaestus (the primary coding agent) gets V4 Pro as primary with V4 Flash as fallback. The gap between them is small enough that on simpler coding tasks, falling back to Flash costs you nothing visible.&lt;/p&gt;

&lt;p&gt;MiMo-V2.5-Pro on Multimodal-Looker is deliberate. It scored 78.9% on SWE-Bench Verified and is specifically designed for agentic workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The routing decision rule
&lt;/h2&gt;

&lt;p&gt;Route through V4 Flash first for any task that will exceed 100 requests. Escalate to Kimi K2.6 or V4 Pro only if V4 Flash gets stuck.&lt;/p&gt;

&lt;p&gt;This works because V4 Flash at 79.0% SWE-Bench Verified handles the majority of real-world coding tasks correctly. The one-point gap to V4 Pro is real but rarely shows up unless you are hitting genuinely hard tickets. When it does, the fallback chain handles it.&lt;/p&gt;

&lt;p&gt;Do not escalate preemptively. Let the model fail first, then escalate. Preemptive escalation is how you burn through your window in an hour.&lt;/p&gt;
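
&lt;p&gt;The escalate-on-failure rule can be sketched in a few lines of TypeScript. This is an illustration only: the &lt;code&gt;ModelRunner&lt;/code&gt; type, &lt;code&gt;runWithEscalation&lt;/code&gt;, and the stub runner are hypothetical names invented for the example, not part of OpenCode or Oh My OpenAgent.&lt;/p&gt;

```typescript
// Hypothetical types for illustration; not an OpenCode or Oh My OpenAgent API.
type Attempt = { ok: boolean; output: string };
type ModelRunner = (model: string, task: string) => Attempt;
type Outcome = { model: string; output: string; tried: string[] };

// Try the primary model first and move down the chain only after a failure.
function runWithEscalation(
  primary: string,
  fallback: string[],
  task: string,
  run: ModelRunner,
): Outcome | null {
  const tried: string[] = [];
  for (const model of [primary, ...fallback]) {
    tried.push(model);
    const attempt = run(model, task);
    if (attempt.ok) {
      return { model, output: attempt.output, tried };
    }
  }
  return null; // every model in the chain failed
}

// Stub runner (an assumption for the demo): V4 Flash gets stuck on "hard" tasks.
const stubRunner: ModelRunner = (model, task) => ({
  ok: task !== "hard" || model !== "deepseek-v4-flash",
  output: `${model} handled ${task}`,
});

const chain = ["kimi-k2.6", "deepseek-v4-pro"];
const easy = runWithEscalation("deepseek-v4-flash", chain, "easy", stubRunner);
const hard = runWithEscalation("deepseek-v4-flash", chain, "hard", stubRunner);
console.log(easy?.tried); // only the cheap model was used
console.log(hard?.tried); // the cheap model failed first, then one escalation
```

&lt;p&gt;The point of the shape is that the expensive models never see a request unless the cheap one has already failed on it.&lt;/p&gt;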

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffb8umlny27ljuymztw8d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffb8umlny27ljuymztw8d.png" alt="Budget comparison" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What $10/month actually buys
&lt;/h2&gt;

&lt;p&gt;With usage hard-capped at $60/month (the monthly ceiling), here is the math:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~5 active hours per day across 5 working days = 25 hours of active window time per week&lt;/li&gt;
&lt;li&gt;Each 5-hour window: $12 budget&lt;/li&gt;
&lt;li&gt;Routed correctly, a typical Oh My OpenAgent session on a medium-complexity feature might use 400-600 requests, weighted toward V4 Flash and Qwen3.5 Plus&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice: you can run 8-12 substantial coding sessions per month before feeling the ceiling. For individual developer use, $10/month is genuinely enough. OpenCode hit 150K GitHub stars in May 2026 in part because that math works out.&lt;/p&gt;
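
&lt;p&gt;As a rough check on those figures, the arithmetic below derives what the numbers imply. It assumes the ceiling you feel is the $60 monthly cap and that sessions cost about the same each; the variable names are made up for the example.&lt;/p&gt;

```typescript
const monthlyCapUsd = 60;   // monthly ceiling, from the text
const windowBudgetUsd = 12; // budget per 5-hour window, from the text
const sessionsPerMonth = { low: 8, high: 12 }; // the 8-12 session estimate

// Number of windows that can be spent to their limit before the monthly cap is hit.
const fullWindowsPerMonth = monthlyCapUsd / windowBudgetUsd; // 5

// Implied average spend per session if the whole cap is used (an assumption).
const impliedCostPerSessionUsd = {
  atFewestSessions: monthlyCapUsd / sessionsPerMonth.low, // 7.5
  atMostSessions: monthlyCapUsd / sessionsPerMonth.high,  // 5
};

console.log(fullWindowsPerMonth, impliedCostPerSessionUsd);
```

&lt;p&gt;Under those assumptions a substantial session lands at roughly $5 to $7.50, which is why a single maxed-out window covers about two of them.&lt;/p&gt;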

&lt;p&gt;The realistic comparison: Claude API at similar quality levels would cost $150-300/month for the same volume. That is where the 10-20x cost reduction claim comes from, and in my experience it holds.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest trade-off
&lt;/h2&gt;

&lt;p&gt;The gap between this stack and Claude Opus 4.7 on real-world bug fixes is about 7 percentage points. That is real. Some tickets require multiple iterations where Claude would have gotten it right on the first try. Budget for that.&lt;/p&gt;

&lt;p&gt;The 7-point gap is an average across all task types. On well-scoped tickets with clear acceptance criteria, the gap narrows significantly. The routing configuration is specifically designed to escalate to Kimi K2.6 or V4 Pro on the tasks where that gap is most likely to show up.&lt;/p&gt;

&lt;p&gt;Where this stack genuinely struggles: ambiguous requirements, complex multi-file refactors with implicit dependencies, and tasks that require understanding undocumented system behavior. On those, premium models earn their cost. The routing configuration handles this by putting Kimi K2.6 on the hardest tasks, but Kimi has a 256K context window vs Qwen3.6 Plus's 1M, so very long context tasks may require a different allocation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual configuration
&lt;/h2&gt;

&lt;p&gt;Two files control everything: &lt;code&gt;opencode.json&lt;/code&gt; at your project root, and &lt;code&gt;.omc/config.json&lt;/code&gt; for Oh My OpenAgent routing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;opencode.json&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://opencode.ai/config.schema.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"theme"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"opencode"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"autoshare"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"opencode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-pro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"kimi-k2.6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"qwen3.6-plus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"qwen3.5-plus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"mimo-v2.5-pro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"minimax-m2.5"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;"model"&lt;/code&gt; field sets your default. V4 Flash is the right default because it handles most tasks at the lowest cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.omc/config.json&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4.2.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"teamMode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sisyphus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kimi-k2.6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-pro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen3.6-plus"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"thinkingBudget"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;32000&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"hephaestus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-pro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kimi-k2.6"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"oracle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"kimi-k2.6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-pro"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prometheus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"qwen3.6-plus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-pro"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"metis"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen3.6-plus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-pro"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"atlas"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-pro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-flash"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"librarian"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"qwen3.5-plus"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"explore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code-reviewer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kimi-k2.6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"deepseek-v4-pro"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"multimodal-looker"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mimo-v2.5-pro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"qwen3.6-plus"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"routing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"escalationPolicy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"on-failure"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"budgetAlert"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"windowBudget"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;12.00&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;escalationPolicy: "on-failure"&lt;/code&gt; enforces the core rule: models escalate only when the primary fails, not preemptively. &lt;code&gt;budgetAlert&lt;/code&gt; triggers a warning at $10 so you know you have $2 left in the window before the ceiling hits.&lt;/p&gt;
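
&lt;p&gt;Because the two files have to agree with each other, a small sanity check can catch typos before a session starts. The sketch below is a hypothetical helper, not something either tool ships: it assumes you have already parsed both JSON files and only verifies that every routed model is in the available list and that the alert sits below the window budget.&lt;/p&gt;

```typescript
// Shapes mirror the JSON above; the checker itself is a hypothetical helper.
type AgentRoute = { model: string; fallback: string[] };
type RoutingConfig = {
  agents: { [agent: string]: AgentRoute };
  routing: { budgetAlert: number; windowBudget: number };
};

function findConfigProblems(available: string[], config: RoutingConfig): string[] {
  const problems: string[] = [];
  for (const agent of Object.keys(config.agents)) {
    const route = config.agents[agent];
    for (const model of [route.model, ...route.fallback]) {
      if (available.indexOf(model) === -1) {
        problems.push(`${agent}: unknown model ${model}`);
      }
    }
  }
  if (config.routing.budgetAlert > config.routing.windowBudget) {
    problems.push("budgetAlert is above windowBudget, so the warning cannot fire first");
  }
  return problems;
}

// Deliberately trimmed model list so the demo reports one problem.
const availableModels = ["deepseek-v4-pro", "deepseek-v4-flash", "kimi-k2.6"];
const sampleConfig: RoutingConfig = {
  agents: {
    hephaestus: { model: "deepseek-v4-pro", fallback: ["deepseek-v4-flash", "kimi-k2.6"] },
    librarian: { model: "deepseek-v4-flash", fallback: ["qwen3.5-plus"] },
  },
  routing: { budgetAlert: 10.0, windowBudget: 12.0 },
};

const problems = findConfigProblems(availableModels, sampleConfig);
console.log(problems); // librarian's fallback is flagged: the trimmed list omits qwen3.5-plus
```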

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install OpenCode Go&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; opencode-ai

&lt;span class="c"&gt;# Install Oh My OpenAgent&lt;/span&gt;
npx omc &lt;span class="nb"&gt;install &lt;/span&gt;oh-my-openagent

&lt;span class="c"&gt;# Create opencode.json and .omc/config.json from the templates above, then:&lt;/span&gt;
omc init &lt;span class="nt"&gt;--preset&lt;/span&gt; oh-my-openagent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check your current window spend&lt;/span&gt;
opencode usage &lt;span class="nt"&gt;--window&lt;/span&gt; current
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Knowing where you are in the $12 window changes how aggressively you escalate to premium models.&lt;/p&gt;




&lt;p&gt;For a deeper walkthrough of the original configuration approach, the guide that got me started is Jatin Malik's post: &lt;a href="https://medium.com/@jatinkrmalik/opencode-go-oh-my-openagent-the-complete-guide-to-sota-model-routing-without-hitting-limits-49fdc8cb3417" rel="noopener noreferrer"&gt;OpenCode Go + Oh My OpenAgent: The Complete Guide to SOTA Model Routing Without Hitting Limits&lt;/a&gt;. It covers the earlier v4.0-v4.1 configuration in detail and is worth reading alongside this.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built a Browser SDK That Detects LLM Agents. Here's How It Works.</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Fri, 22 May 2026 04:19:55 +0000</pubDate>
      <link>https://dev.to/devansh365/i-built-a-browser-sdk-that-detects-llm-agents-heres-how-it-works-3bdk</link>
      <guid>https://dev.to/devansh365/i-built-a-browser-sdk-that-detects-llm-agents-heres-how-it-works-3bdk</guid>
      <description>&lt;p&gt;Every bot detection system I've seen works with two actors: human or bot. Block the bot, let the human through.&lt;/p&gt;

&lt;p&gt;That model is wrong in 2026.&lt;/p&gt;

&lt;p&gt;There is a third actor: AI agents acting legitimately on behalf of real users. Shopping assistants. Automated onboarding flows. Fintech integrations. These agents look like bots by every traditional signal: headless browser characteristics, scripted input patterns, no idle pauses. But blocking them means turning away real business.&lt;/p&gt;

&lt;p&gt;I built Nyasa to handle all three.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why existing tools fail now
&lt;/h2&gt;

&lt;p&gt;CAPTCHA was built for bots that couldn't read distorted text. Those bots are dead. CAPTCHA farms solve challenges at $0.50 each. LLM vision models solve them in milliseconds.&lt;/p&gt;

&lt;p&gt;Device fingerprinting catches webdriver flags and automation markers. Modern headless browsers patch those out. Playwright, Puppeteer, and every major automation framework have community patches specifically for passing fingerprint checks.&lt;/p&gt;

&lt;p&gt;Behavioral analytics (typing cadence, mouse movement) catch scripted bots. But LLM agents don't use scripted input anymore. They type at 60-80 WPM with realistic keystroke intervals, move the mouse in curved paths, and pause at form fields before filling them.&lt;/p&gt;

&lt;p&gt;The deeper problem is architectural. Existing systems ask: "Is this a bot?" Nyasa asks: "Who is this session?" The answer is one of three things.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three-actor model
&lt;/h2&gt;

&lt;p&gt;Nyasa classifies every session as exactly one of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Human&lt;/strong&gt;: no detection rules fired&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AuthorizedAgent&lt;/strong&gt;: holds a valid cryptographic identity claim, automatically bypasses bot rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UnauthorizedBot&lt;/strong&gt;: one or more detection rules fired&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AuthorizedAgent category is the real addition. An AI shopping assistant built on top of your product shouldn't have to pass a CAPTCHA. It should present a signed identity claim, and your system should recognize and respect it. Nyasa handles that handshake.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3kk01v6tf9e8ugawtxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3kk01v6tf9e8ugawtxr.png" alt="Three distinct session verdict types: Human, AuthorizedAgent, and UnauthorizedBot. Both unauthorized LLM agents and headless bots resolve to UnauthorizedBot, but badge labels differentiate them for downstream routing." width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The signal stack
&lt;/h2&gt;

&lt;p&gt;24 signals total, split across three layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Behavioral signals (13)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;What it measures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Keystroke dwell and flight time&lt;/td&gt;
&lt;td&gt;How long keys are held, time between keystrokes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mouse path curvature&lt;/td&gt;
&lt;td&gt;Deviation from straight-line movement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Paste vs typed ratio&lt;/td&gt;
&lt;td&gt;Whether text was typed character by character or bulk-pasted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Click precision (center offset)&lt;/td&gt;
&lt;td&gt;Distance from click point to element center&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session burst-pause rhythm&lt;/td&gt;
&lt;td&gt;Alternation between fast activity and idle gaps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backspace corrections&lt;/td&gt;
&lt;td&gt;Correction frequency during text input&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scroll depth&lt;/td&gt;
&lt;td&gt;How far down the page a session goes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Touch mechanics&lt;/td&gt;
&lt;td&gt;Multi-touch patterns and pressure distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Field-level timing&lt;/td&gt;
&lt;td&gt;Time spent on each form field before moving on&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input origin&lt;/td&gt;
&lt;td&gt;Typed vs pasted vs dropped vs programmatic fill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tab visibility&lt;/td&gt;
&lt;td&gt;Whether the session loses and regains focus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File upload mechanics&lt;/td&gt;
&lt;td&gt;How files are attached (drag, click, or programmatic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session rhythm&lt;/td&gt;
&lt;td&gt;Overall pace and structure of the session&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Fingerprint signals (8)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;What it catches&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Webdriver and CDP markers&lt;/td&gt;
&lt;td&gt;Automation framework artifacts in the browser object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iframe vs parent plugin consistency&lt;/td&gt;
&lt;td&gt;Mismatches between iframe and parent context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Canvas fingerprint hash&lt;/td&gt;
&lt;td&gt;Rendering environment identifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WebGL renderer string&lt;/td&gt;
&lt;td&gt;SwiftShader and LLVMpipe detection (headless GPU emulation)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audio fingerprint via OfflineAudioContext&lt;/td&gt;
&lt;td&gt;Audio processing environment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Incognito detection via storage quota probe&lt;/td&gt;
&lt;td&gt;Private browsing mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Timezone vs navigator.language consistency&lt;/td&gt;
&lt;td&gt;Region mismatch between system and browser config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistent device UUID with isNew flag&lt;/td&gt;
&lt;td&gt;Tracks whether this device has been seen before&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Network signals (3)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;What it measures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Page reaction time&lt;/td&gt;
&lt;td&gt;Time from page load to first interaction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection type&lt;/td&gt;
&lt;td&gt;Network type reported by Navigator API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page load timing&lt;/td&gt;
&lt;td&gt;Performance timing breakdown&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
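
&lt;p&gt;To make one row of the behavioral table concrete, here is a minimal sketch of how keystroke dwell and flight time could be derived from key-down and key-up timestamps. The &lt;code&gt;KeyTiming&lt;/code&gt; shape and both function names are assumptions for illustration; Nyasa's internal types are not shown in this post.&lt;/p&gt;

```typescript
// Hypothetical event shape; timestamps are in milliseconds.
type KeyTiming = { key: string; downAt: number; upAt: number };

// Dwell time: how long each key is held down.
function dwellTimes(keys: KeyTiming[]): number[] {
  return keys.map((k) => k.upAt - k.downAt);
}

// Flight time: gap between releasing one key and pressing the next.
function flightTimes(keys: KeyTiming[]): number[] {
  const flights: number[] = [];
  for (let i = 1; keys.length > i; i++) {
    flights.push(keys[i].downAt - keys[i - 1].upAt);
  }
  return flights;
}

const sampleKeys: KeyTiming[] = [
  { key: "a", downAt: 0, upAt: 100 },
  { key: "b", downAt: 150, upAt: 240 },
  { key: "c", downAt: 300, upAt: 420 },
];

console.log(dwellTimes(sampleKeys));  // [100, 90, 120]
console.log(flightTimes(sampleKeys)); // [50, 60]
```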

&lt;h2&gt;
  
  
  Detection rules
&lt;/h2&gt;

&lt;p&gt;Six rules run against the collected signals. Each rule fires independently. Any fired rule pushes the verdict toward UnauthorizedBot, except isAuthorizedAgent, which short-circuits to AuthorizedAgent regardless of other rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;isHeadless&lt;/strong&gt; reads the fingerprint layer for automation markers: webdriver properties, CDP exposure, WebGL renderer strings like SwiftShader or LLVMpipe, iframe/parent inconsistencies. If the browser environment looks like a headless runner, this fires.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;isScripted&lt;/strong&gt; reads behavioral signals for bot-like input patterns. Scripted bots fill fields in milliseconds, never backspace, and move between fields in perfect sequence. This rule catches that pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;isLLMAgent&lt;/strong&gt; is the hardest one. I'll cover it in the next section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;isAuthorizedAgent&lt;/strong&gt; reads the agent identity claim from &lt;code&gt;window.__nyasaAgentSignature&lt;/code&gt; or a meta tag. If a valid claim is present, the session is classified as AuthorizedAgent and no further rules are evaluated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;isUploadAutomation&lt;/strong&gt; checks file upload mechanics. Human uploads involve a file picker dialog or drag interaction. Programmatic uploads bypass both. This rule catches the bypass pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;isMultimodalBot&lt;/strong&gt; looks for cross-signal contradictions: a session that passes one rule but shows soft signals consistent with automation across several others. It reads sibling DetectionResults directly rather than re-sampling signals, which avoids divergence when two rules read the same underlying data at different times.&lt;/p&gt;
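
&lt;p&gt;The precedence between these rules can be summarized in a short sketch. The function and type names below are illustrative assumptions rather than Nyasa's actual API; the logic follows the description above: a valid agent claim short-circuits, otherwise any fired rule yields UnauthorizedBot, and no fired rules means Human.&lt;/p&gt;

```typescript
type VerdictType = "Human" | "AuthorizedAgent" | "UnauthorizedBot";
type RuleResult = { rule: string; fired: boolean }; // hypothetical result shape

function resolveVerdictType(hasValidAgentClaim: boolean, results: RuleResult[]): VerdictType {
  // A valid identity claim wins regardless of what the bot rules found.
  if (hasValidAgentClaim) {
    return "AuthorizedAgent";
  }
  const anyFired = results.some((r) => r.fired);
  return anyFired ? "UnauthorizedBot" : "Human";
}

const headlessOnly: RuleResult[] = [
  { rule: "isHeadless", fired: true },
  { rule: "isScripted", fired: false },
];

console.log(resolveVerdictType(false, headlessOnly)); // "UnauthorizedBot"
console.log(resolveVerdictType(true, headlessOnly));  // "AuthorizedAgent"
console.log(resolveVerdictType(false, []));           // "Human"
```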

&lt;h2&gt;
  
  
  isLLMAgent deep dive
&lt;/h2&gt;

&lt;p&gt;This is the rule I spent the most time on. LLM agents are genuinely hard to distinguish from fast, focused humans. Seven signals, evaluated together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Machine-speed keystroke bursts under 20ms.&lt;/strong&gt; Human dwell times cluster around 80-200ms. Sub-20ms bursts don't happen in human typing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mouse stillness above 70%.&lt;/strong&gt; Humans move the mouse constantly, even when not clicking. LLM-driven sessions often keep the cursor parked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uniform keystroke variance near zero.&lt;/strong&gt; Human typing has natural rhythm variation. LLM agent keystrokes have suspiciously consistent intervals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero backspace rate.&lt;/strong&gt; Humans make corrections. An agent filling a form it computed upfront doesn't backspace.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pixel-perfect click precision.&lt;/strong&gt; Humans click near the center of an element but not exactly on it. Agents click at the computed center coordinate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing field exploration.&lt;/strong&gt; Humans often click into a field, leave, return, re-read the label. LLM agents visit each field once in sequence and move on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No idle micro-pauses.&lt;/strong&gt; Humans have sub-second pauses between thoughts. Agent sessions show continuous forward progress.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No single signal is definitive. A fast typist has low keystroke variance. A focused user might not backspace. isLLMAgent requires several of these signals to align before it fires.&lt;/p&gt;
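
&lt;p&gt;One way to express "several signals must align" is to count how many of the seven checks pass and compare that count to a threshold. The sketch below is not Nyasa's implementation: the field names are hypothetical, the 20ms and 70% cutoffs come from the list above, and the remaining cutoffs plus the threshold of four are assumptions chosen for the example.&lt;/p&gt;

```typescript
// Hypothetical feature shape; field names are assumptions for illustration.
type LlmAgentFeatures = {
  minKeystrokeIntervalMs: number;   // fastest observed keystroke interval
  mouseStillnessRatio: number;      // 0..1 share of the session with a parked cursor
  keystrokeIntervalVariance: number;
  backspaceCount: number;
  meanClickCenterOffsetPx: number;
  fieldRevisitCount: number;
  microPauseCount: number;
};

function countAlignedSignals(f: LlmAgentFeatures): number {
  const checks: boolean[] = [
    20 > f.minKeystrokeIntervalMs,    // machine-speed bursts under 20ms (from the text)
    f.mouseStillnessRatio > 0.7,      // mouse stillness above 70% (from the text)
    1 > f.keystrokeIntervalVariance,  // "near zero": the cutoff of 1 is assumed
    f.backspaceCount === 0,           // zero backspace rate
    0.5 > f.meanClickCenterOffsetPx,  // "pixel-perfect": the cutoff is assumed
    f.fieldRevisitCount === 0,        // missing field exploration
    f.microPauseCount === 0,          // no idle micro-pauses
  ];
  return checks.filter((passed) => passed).length;
}

const REQUIRED_ALIGNED = 4; // assumed; the post only says "several"

function looksLikeLlmAgent(f: LlmAgentFeatures): boolean {
  return countAlignedSignals(f) >= REQUIRED_ALIGNED;
}

// A fast, focused human: low variance and no backspaces, but everything else is human-like.
const fastTypist: LlmAgentFeatures = {
  minKeystrokeIntervalMs: 60, mouseStillnessRatio: 0.2, keystrokeIntervalVariance: 0.5,
  backspaceCount: 0, meanClickCenterOffsetPx: 6, fieldRevisitCount: 1, microPauseCount: 12,
};
const agentLike: LlmAgentFeatures = {
  minKeystrokeIntervalMs: 5, mouseStillnessRatio: 0.9, keystrokeIntervalVariance: 0.1,
  backspaceCount: 0, meanClickCenterOffsetPx: 0, fieldRevisitCount: 0, microPauseCount: 0,
};

console.log(countAlignedSignals(fastTypist), looksLikeLlmAgent(fastTypist)); // 2 false
console.log(countAlignedSignals(agentLike), looksLikeLlmAgent(agentLike));   // 7 true
```

&lt;p&gt;The fast typist trips two checks and still passes as human, which is the behavior the paragraph above describes.&lt;/p&gt;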

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe74yj8xq6nny3l1tbdlk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe74yj8xq6nny3l1tbdlk.png" alt="Side-by-side comparison of behavioral signals between a human typist and an LLM agent across seven dimensions: keystroke interval distribution, mouse activity, keystroke variance, backspace rate, click precision, field exploration pattern, and micro-pause presence." width="800" height="1071"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Feature extraction architecture
&lt;/h2&gt;

&lt;p&gt;Early versions of Nyasa had each detection rule computing its own derived metrics from raw signals. That caused two problems: duplicated math across rules, and rules diverging when they read the same underlying signal at slightly different times during a session.&lt;/p&gt;

&lt;p&gt;The feature extraction layer solves this. It runs once per session and computes 8 shared derived metrics before any detection rule evaluates. Every rule reads from the same computed values.&lt;/p&gt;

&lt;p&gt;The metrics include things like overall typing variance, click precision distribution, and mouse activity ratio. Computing them once means a detection rule that needs "mouse stillness percentage" and a sibling rule that also needs it will always agree on the number.&lt;/p&gt;

&lt;p&gt;isMultimodalBot benefits from this the most. It reads the DetectionResults of its sibling rules rather than re-running signal evaluation. Near-miss composition (where a session nearly triggers multiple rules without fully triggering any) gets caught without any rule having to re-sample data that may have aged out.&lt;/p&gt;
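
&lt;p&gt;Roughly, the "compute once, read everywhere" pattern looks like the sketch below. The shapes &lt;code&gt;RawSignals&lt;/code&gt; and &lt;code&gt;Features&lt;/code&gt;, the two metrics, and the rule thresholds are assumptions made for illustration. The real layer computes 8 metrics and its field names are not shown here:&lt;/p&gt;

```typescript
// Hypothetical raw-signal and feature shapes, for illustration only.
interface RawSignals {
  keyIntervalsMs: number[];
  mouseMoveCount: number;
  totalEventCount: number;
}

interface Features {
  typingVariance: number;
  mouseActivityRatio: number;
}

// Runs once per session and returns a frozen snapshot.
function extractFeatures(raw: RawSignals): Features {
  const n = raw.keyIntervalsMs.length;
  const mean = n === 0 ? 0 : raw.keyIntervalsMs.reduce((a, b) => a + b, 0) / n;
  const typingVariance = n === 0
    ? 0
    : raw.keyIntervalsMs.reduce((acc, x) => acc + (x - mean) * (x - mean), 0) / n;
  const mouseActivityRatio = raw.totalEventCount === 0
    ? 0
    : raw.mouseMoveCount / raw.totalEventCount;
  return Object.freeze({ typingVariance, mouseActivityRatio });
}

// Rules never touch raw signals; they only read the shared snapshot,
// so two rules that need the same metric always see the same number.
function ruleUniformTyping(f: Features): boolean {
  return 5 > f.typingVariance; // assumed threshold
}

function ruleStillMouse(f: Features): boolean {
  return 0.05 > f.mouseActivityRatio; // assumed threshold
}

const features = extractFeatures({
  keyIntervalsMs: [100, 100, 100],
  mouseMoveCount: 0,
  totalEventCount: 3,
});
console.log(ruleUniformTyping(features), ruleStillMouse(features)); // true true
```

&lt;p&gt;Freezing the snapshot is one way to guarantee that no rule can change a value a sibling rule is about to read.&lt;/p&gt;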

&lt;h2&gt;
  
  
  The verdict system
&lt;/h2&gt;

&lt;p&gt;Every session gets a verdict object with three fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;NyasaVerdict&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Human&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AuthorizedAgent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;UnauthorizedBot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;// 0.0 to 1.0&lt;/span&gt;
  &lt;span class="nl"&gt;badges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DetectionBadge&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;     &lt;span class="c1"&gt;// which rules fired or nearly fired&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confidence is a noisy-OR score across all active rules. If one rule fires with 0.8 confidence and a second fires with 0.6, the combined score is &lt;code&gt;1 - (1 - 0.8) * (1 - 0.6)&lt;/code&gt; = 0.92. Multiple weak signals compound.&lt;/p&gt;
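
&lt;p&gt;The same arithmetic as a small function. This is a sketch of the noisy-OR formula itself. The name &lt;code&gt;noisyOr&lt;/code&gt; and the clamping step are assumptions, not necessarily how the SDK implements it:&lt;/p&gt;

```typescript
// Hypothetical helper: combine per-rule confidences with noisy-OR.
function noisyOr(confidences: number[]): number {
  // Running product of (1 - c): the chance that every rule missed at once.
  let allMiss = 1;
  for (const c of confidences) {
    // Clamp to [0, 1] so a bad input cannot push the score out of range.
    const clamped = Math.min(1, Math.max(0, c));
    allMiss *= 1 - clamped;
  }
  return 1 - allMiss;
}

console.log(noisyOr([0.8, 0.6]).toFixed(2)); // "0.92"
```

&lt;p&gt;With no active rules the product stays at 1 and the score is 0. A single rule at 1.0 pins the score at 1 no matter what the others report.&lt;/p&gt;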

&lt;p&gt;Badge labels tell you which rules contributed. A session might come back as UnauthorizedBot with badges for &lt;code&gt;isHeadless&lt;/code&gt; and &lt;code&gt;isLLMAgent&lt;/code&gt;, which tells you this is a headless LLM agent. That gets different handling than a scripted form filler.&lt;/p&gt;

&lt;p&gt;The verdict payload ships via &lt;code&gt;navigator.sendBeacon&lt;/code&gt;. Non-blocking, fires after the page interaction completes, survives page unload. Your analytics pipeline or backend decision layer receives it without adding latency to the user-facing flow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45qn6qq85m8hbhktweeh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45qn6qq85m8hbhktweeh.png" alt="How 24 signals across three collection layers feed into a shared feature extraction layer, then into six independent detection rules, and finally into a single typed verdict with confidence score and badge labels." width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61ebk1qdaoadnl39gt26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61ebk1qdaoadnl39gt26.png" alt="Nyasa SDK Architecture" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The SDK runs entirely in the browser. Signals are collected passively as the session progresses. The feature extraction layer runs on a timer and on key events. Detection rules evaluate when a verdict is requested or automatically at session end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dual packaging
&lt;/h2&gt;

&lt;p&gt;Nyasa ships as both ESM and IIFE from a single tsup build config.&lt;/p&gt;

&lt;p&gt;ESM for bundlers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createNyasa&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@devanshhq/nyasa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nyasa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createNyasa&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://your-backend.com/nyasa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;agentBypass&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;nyasa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;IIFE for script tags, no bundler required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://cdn.jsdelivr.net/npm/@devanshhq/nyasa/dist/nyasa.iife.js"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
  &lt;span class="nx"&gt;Nyasa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createNyasa&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/nyasa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;agentBypass&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same source, two outputs. The tsup config handles the split. No separate build for CDN distribution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation and quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @devanshhq/nyasa
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Minimal setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createNyasa&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@devanshhq/nyasa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nyasa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createNyasa&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/session-verdict&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;nyasa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Get the verdict at any point&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;verdict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;nyasa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getVerdict&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;       &lt;span class="c1"&gt;// 'Human' | 'AuthorizedAgent' | 'UnauthorizedBot'&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 0.0 - 1.0&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;badges&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;     &lt;span class="c1"&gt;// ['isHeadless', 'isLLMAgent', ...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the backend, you receive the verdict via the beacon endpoint and decide what to do: allow, challenge, block, or route differently based on the type.&lt;/p&gt;
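
&lt;p&gt;One possible shape for that decision, as a sketch. The &lt;code&gt;decideAction&lt;/code&gt; function, the &lt;code&gt;Action&lt;/code&gt; names, and the 0.9 cutoff are all hypothetical. Only the three verdict types and the payload fields come from the interface shown earlier:&lt;/p&gt;

```typescript
type VerdictType = 'Human' | 'AuthorizedAgent' | 'UnauthorizedBot';
type Action = 'allow' | 'allow-and-log' | 'challenge' | 'block';

interface VerdictPayload {
  type: VerdictType;
  confidence: number; // 0.0 to 1.0
  badges: string[];
}

// Assumed cutoff: above this, block outright instead of challenging.
const BLOCK_THRESHOLD = 0.9;

function decideAction(verdict: VerdictPayload): Action {
  if (verdict.type === 'Human') return 'allow';
  if (verdict.type === 'AuthorizedAgent') return 'allow-and-log';
  // UnauthorizedBot: scale the response to how sure the SDK is.
  return verdict.confidence >= BLOCK_THRESHOLD ? 'block' : 'challenge';
}

console.log(decideAction({
  type: 'UnauthorizedBot',
  confidence: 0.92,
  badges: ['isHeadless', 'isLLMAgent'],
})); // "block"
```

&lt;p&gt;The badges are not used in this sketch, but they are the natural place to branch further, for example handling a headless LLM agent differently from a scripted form filler.&lt;/p&gt;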

&lt;h2&gt;
  
  
  The authorized agent bypass
&lt;/h2&gt;

&lt;p&gt;If you're building an AI agent that needs to interact with Nyasa-protected pages, set the signature before the SDK initializes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Injected by your agent into the page, before the Nyasa SDK initializes&lt;/span&gt;
&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;__nyasaAgentSignature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;signed-jwt-from-your-auth-server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;shopping-assistant-v2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;issuedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via meta tag for server-rendered flows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"nyasa-agent-signature"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"signed-jwt-here"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;isAuthorizedAgent&lt;/code&gt; rule reads this claim, validates the signature, and short-circuits to &lt;code&gt;AuthorizedAgent&lt;/code&gt;. No other rules run. The session is logged as a known agent, not blocked as a bot.&lt;/p&gt;

&lt;p&gt;This is the part that most bot detection tools don't have a concept for. If you're running a fintech integration or an AI onboarding assistant, you shouldn't have to fight your own security layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it catches that others miss
&lt;/h2&gt;

&lt;p&gt;Traditional fingerprinting misses LLM agents because they run in real browsers with patched automation markers. Traditional behavioral analytics miss them because modern LLM agents have realistic typing cadence.&lt;/p&gt;

&lt;p&gt;Nyasa catches them through the combination: machine-speed micro-bursts that no human produces, combined with zero backspace rate and pixel-perfect clicks. Any one signal has false positives. All three together rarely do.&lt;/p&gt;

&lt;p&gt;The multimodal rule catches the edge cases: sessions that pass fingerprinting and look almost human behaviorally but have soft contradictions across signals that don't fit either profile cleanly.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Live:&lt;/strong&gt; &lt;a href="https://nyasa.devanshtiwari.com" rel="noopener noreferrer"&gt;nyasa.devanshtiwari.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Devansh-365/nyasa" rel="noopener noreferrer"&gt;github.com/Devansh-365/nyasa&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;npm:&lt;/strong&gt; &lt;code&gt;npm install @devanshhq/nyasa&lt;/code&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>typescript</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Your Google Sheets backend is silently dropping rows. Here's why.</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Sun, 26 Apr 2026 23:30:00 +0000</pubDate>
      <link>https://dev.to/devansh365/your-google-sheets-backend-is-silently-dropping-rows-heres-why-3o74</link>
      <guid>https://dev.to/devansh365/your-google-sheets-backend-is-silently-dropping-rows-heres-why-3o74</guid>
      <description>&lt;p&gt;A signup form POSTing to Google Sheets is the most common "backend" on the indie web.&lt;/p&gt;

&lt;p&gt;It works for your landing page demo. It works when you test it with 5 friends. It works right up until the moment you don't want it to fail.&lt;/p&gt;

&lt;p&gt;Here's the part nobody tells you: &lt;strong&gt;Google's own &lt;code&gt;values.append&lt;/code&gt; endpoint silently drops rows under concurrent writes.&lt;/strong&gt; Two simultaneous POSTs can resolve to the same target row and one of them gets overwritten. No error in your logs. No error in your client. Just rows that silently didn't land.&lt;/p&gt;

&lt;p&gt;Every "Sheets as a backend" wrapper you've heard of — SheetDB, Sheety, SheetBest, NoCodeAPI — forwards your request straight to &lt;code&gt;values.append&lt;/code&gt;. They inherit the bug.&lt;/p&gt;

&lt;p&gt;I spent the last few weeks building a fix. It's called SheetForge, it's MIT-licensed, and this post is about the actual bug and why the fix isn't as simple as "throw a mutex on it."&lt;/p&gt;

&lt;h2&gt;
  
  
  The bug, reproduced in 4 lines
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="nx"&gt;sheets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;rowA&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;sheets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;rowB&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;sheets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;rowC&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;sheets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;rowD&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four POSTs. You expect four rows. Under load you often get three. Sometimes two.&lt;/p&gt;

&lt;p&gt;The reason is inside &lt;code&gt;values.append&lt;/code&gt;. The operation reads the current last row, then writes to the position after it. When two calls race, they can read the same "current last row" and write to the same target cell range. One value wins. The other silently disappears.&lt;/p&gt;
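
&lt;p&gt;The read-then-write race is easy to reproduce without Google at all. The sketch below is a toy in-memory model of the pattern described above, not the real Sheets implementation. &lt;code&gt;ToySheet&lt;/code&gt; and &lt;code&gt;demoRace&lt;/code&gt; are made-up names:&lt;/p&gt;

```typescript
// Toy model: append reads the last row, yields once, then writes.
class ToySheet {
  rows: string[] = [];

  async append(value: string) {
    const target = this.rows.length; // step 1: read the current last row
    await Promise.resolve();         // yield, as a network hop would
    this.rows[target] = value;       // step 2: write to the position after it
  }
}

async function demoRace() {
  const sheet = new ToySheet();
  // All four calls read length 0 before any of them writes.
  await Promise.all([
    sheet.append('rowA'),
    sheet.append('rowB'),
    sheet.append('rowC'),
    sheet.append('rowD'),
  ]);
  return sheet.rows;
}

demoRace().then((rows) => console.log(rows.length)); // 1
```

&lt;p&gt;In this toy every caller picks the same target, so only the last write survives. Real traffic interleaves less perfectly, which is why the loss is partial and hard to spot.&lt;/p&gt;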

&lt;p&gt;Google has &lt;a href="https://developers.google.com/sheets/api/guides/values#appending_values" rel="noopener noreferrer"&gt;documented this&lt;/a&gt;. The official workaround is: "Don't write concurrently." That's a fine rule when your launch gets 3 signups. It's actively destructive when you hit HN's front page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2c4ejfe7ra8amw1navaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2c4ejfe7ra8amw1navaf.png" alt="4 POSTs, 3 rows. rowC was silently overwritten. No error." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why your form loses 40% of rows during a traffic spike
&lt;/h2&gt;

&lt;p&gt;Your form looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// /api/signup&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;email&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sheets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()]]&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is deployed on Vercel. Each request gets its own isolated serverless invocation. They run truly in parallel. There is zero coordination between them.&lt;/p&gt;

&lt;p&gt;Now Product Hunt puts you on the daily leaderboard. Forty people land on your page in the same 10 seconds. Twelve of them submit the form at nearly the same moment.&lt;/p&gt;

&lt;p&gt;Twelve concurrent POSTs to &lt;code&gt;values.append&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You won't end up with twelve rows. You'll end up with something closer to eight, maybe nine. The exact number depends on how Google's backend happens to interleave the writes, and that ordering isn't deterministic, which is the whole problem. The lost rows show no error. The users who submitted them see a green checkmark. Their email is gone.&lt;/p&gt;

&lt;p&gt;This isn't theoretical. Levels.fyi &lt;a href="https://www.levels.fyi/blog/scaling-to-millions-with-google-sheets.html" rel="noopener noreferrer"&gt;wrote a long engineering post&lt;/a&gt; about running their entire site on Sheets until exactly this class of problem forced them to migrate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a mutex doesn't fix it
&lt;/h2&gt;

&lt;p&gt;The naive fix is obvious: put a lock in front of the Sheets API call. One write at a time per sheet. Problem solved.&lt;/p&gt;

&lt;p&gt;In practice, this is where it breaks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Your API runs on serverless.&lt;/strong&gt; A lock in Node process memory doesn't work when requests are spread across 12 cold starts. You need a distributed lock.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. A distributed lock has its own bugs.&lt;/strong&gt; Redis &lt;code&gt;SETNX&lt;/code&gt; with a TTL is the standard answer, but it has a classic failure mode: process A acquires the lock, gets paused by GC, the TTL expires, process B grabs the lock, then process A wakes up and releases "its" lock, which is now B's lock. The lock is free while B is still writing, so the next process to ask gets it too. Now two processes think they hold the lock. You're back to silent data loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Retries break everything.&lt;/strong&gt; Your client retries a POST that looked failed (a timeout, say) even though the first attempt had already written its row. The lock-protected write succeeds twice. Now you have duplicate rows. Fixing drops created duplicates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Crashes strand the lock.&lt;/strong&gt; If your process dies while holding the lock, the next writer has to wait for the TTL to expire. For TTLs long enough to be safe (30+ seconds), that stalls your whole write throughput.&lt;/p&gt;

&lt;p&gt;Every one of these is a real bug I hit while prototyping. The fix isn't a mutex. The fix is a proper queue.&lt;/p&gt;
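
&lt;p&gt;The stale-release failure from point 2 can be replayed with a toy lock and a hand-driven clock. This is a simulation of the SETNX-plus-TTL pattern, not real Redis. &lt;code&gt;ToyTtlLock&lt;/code&gt; and &lt;code&gt;demoStaleRelease&lt;/code&gt; are illustrative names:&lt;/p&gt;

```typescript
// Toy TTL lock with an explicit clock argument; mimics SETNX + expiry.
class ToyTtlLock {
  private holder: string | null = null;
  private expiresAt = 0;

  tryAcquire(who: string, now: number, ttl: number): boolean {
    if (this.holder !== null) {
      if (this.expiresAt > now) return false; // still held and not expired
    }
    this.holder = who;
    this.expiresAt = now + ttl;
    return true;
  }

  // Naive release: deletes the key without checking who owns it.
  release(): void {
    this.holder = null;
  }
}

function demoStaleRelease(): string[] {
  const lock = new ToyTtlLock();
  const granted: string[] = [];
  if (lock.tryAcquire('A', 0, 10)) granted.push('A');  // A acquires, then stalls
  if (lock.tryAcquire('B', 11, 10)) granted.push('B'); // TTL expired, B acquires
  lock.release();                                      // A wakes up, releases "its" lock
  if (lock.tryAcquire('C', 12, 10)) granted.push('C'); // C gets in while B is still working
  return granted;
}

console.log(demoStaleRelease()); // [ 'A', 'B', 'C' ]
```

&lt;p&gt;By time 12, B and C both believe they hold the lock. Checking ownership on release narrows the window, but the TTL still sets how long a paused holder can stay dangerous.&lt;/p&gt;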

&lt;h2&gt;
  
  
  How SheetForge actually fixes it
&lt;/h2&gt;

&lt;p&gt;The architecture, in one sentence: &lt;strong&gt;every write goes into a per-sheet queue, one worker per sheet pulls from that queue inside a Postgres transaction, and an idempotency key dedupes retries.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;submitWrite(row)
  └─ INSERT INTO write_ledger (sheet_id, idempotency_key, payload)
     (partial unique index dedupes retries before we even touch Sheets)
  └─ XADD to Redis Stream for this sheet
  └─ return { writeId, status: 'pending' }

processNext()  ← runs in a loop on the worker
  └─ XREADGROUP from the sheet's stream
  └─ BEGIN transaction
     └─ SELECT pg_advisory_xact_lock(hashtextextended(streamKey, 0))
     └─ call sheets.values.append(payload)
     └─ UPDATE write_ledger SET status = 'committed'
  └─ COMMIT
  └─ XACK only after commit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four things matter here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The advisory lock is the fence.&lt;/strong&gt; &lt;code&gt;pg_advisory_xact_lock&lt;/code&gt; acquires a lock inside the transaction. If the transaction commits or aborts, the lock is released automatically by Postgres. There is no TTL. No lease clock. No split-brain. If your process dies mid-handler, the transaction rolls back, the lock releases, and the message redelivers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The idempotency key is the deduper.&lt;/strong&gt; Every write comes in with an &lt;code&gt;Idempotency-Key&lt;/code&gt; header. A partial unique index on &lt;code&gt;(sheet_id, idempotency_key)&lt;/code&gt; WHERE &lt;code&gt;status IN ('pending', 'committed')&lt;/code&gt; means the database rejects duplicates before the worker even sees them. Retry the same request 100 times, you get 1 row.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ledger is the truth.&lt;/strong&gt; The write is durable in Postgres the moment you get the &lt;code&gt;writeId&lt;/code&gt; back. Sheets is downstream. If Google's API is down, your writes queue up and flush when it recovers. Your users see a green checkmark and it means something.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;XACK happens post-commit.&lt;/strong&gt; Messages that are read but not acked stay in Redis Streams' PEL (pending entries list), where a worker can reclaim and reprocess them. If the worker crashes mid-transaction, Postgres rolls back, the message is processed again, and the idempotency key catches the replay. Each write commits to the ledger exactly once.&lt;/p&gt;
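
&lt;p&gt;To make the dedupe step concrete, here is an in-memory stand-in for the ledger and its partial unique index. It is a sketch under assumptions: &lt;code&gt;ToyLedger&lt;/code&gt; and &lt;code&gt;collides&lt;/code&gt; are invented names, the &lt;code&gt;'failed'&lt;/code&gt; status is a guess at what falls outside the index, and a retry gets the original entry back where Postgres would reject the insert:&lt;/p&gt;

```typescript
// 'failed' is an assumed third status that the partial index does not cover.
type WriteStatus = 'pending' | 'committed' | 'failed';

interface LedgerEntry {
  writeId: number;
  sheetId: string;
  idempotencyKey: string;
  status: WriteStatus;
}

// True when an entry would collide under the partial unique index:
// same sheet, same key, and a status of 'pending' or 'committed'.
function collides(entry: LedgerEntry, sheetId: string, key: string): boolean {
  if (entry.sheetId !== sheetId) return false;
  if (entry.idempotencyKey !== key) return false;
  return entry.status !== 'failed';
}

class ToyLedger {
  readonly entries: LedgerEntry[] = [];

  submit(sheetId: string, key: string): LedgerEntry {
    const existing = this.entries.find((e) => collides(e, sheetId, key));
    if (existing !== undefined) return existing; // retry: hand back the original write
    const entry: LedgerEntry = {
      writeId: this.entries.length + 1,
      sheetId,
      idempotencyKey: key,
      status: 'pending',
    };
    this.entries.push(entry);
    return entry;
  }
}

const ledger = new ToyLedger();
Array.from({ length: 100 }).forEach(() => ledger.submit('sheet-1', 'key-1'));
console.log(ledger.entries.length); // 1
```

&lt;p&gt;Scoping the check to live statuses is what lets a write that genuinely failed be resubmitted under the same key.&lt;/p&gt;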

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh004xnip7xx24dxgk2c0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh004xnip7xx24dxgk2c0.png" alt="Queue Architecture" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The full test for this concurrency
&lt;/h2&gt;

&lt;p&gt;Every change to the write-queue slice requires a concurrency test. This is the one I do not break:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;50 parallel writes land 50 rows in order&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sheet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createTestSheet&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;writes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`user-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;@test.com`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;idempotencyKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`key-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}))&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;writes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;sheetforge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;idempotencyKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;idempotencyKey&lt;/span&gt; &lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;waitForQueueDrain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sheet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;readSheet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sheet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;writes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;50 parallel POSTs. 50 rows. In order. Retry safe.&lt;/p&gt;

&lt;p&gt;This same test against raw &lt;code&gt;values.append&lt;/code&gt; reliably fails. It fails under SheetDB, Sheety, and SheetBest too — I checked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhard04qrm1c4pkalpz5i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhard04qrm1c4pkalpz5i.png" alt="Concurrency Test Result" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The typed SDK is the bonus
&lt;/h2&gt;

&lt;p&gt;Once you have a proper queue, the rest gets interesting. Since SheetForge knows your sheet's header row, it can generate a typed TypeScript client with literal union types inferred from your sample cells.&lt;/p&gt;

&lt;p&gt;Header row: &lt;code&gt;email | plan | created_at&lt;/code&gt;&lt;br&gt;
Sample cells: &lt;code&gt;hi@example.com | free | 2026-04-15&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Generated SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./sheetforge-client&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sheet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SHEETFORGE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;sheetId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sht_abc123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sheet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hi@example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;free&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// 'free' | 'pro' — inferred from sample cells&lt;/span&gt;
    &lt;span class="na"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;idempotencyKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The compiler catches header drift. Rename a column in the sheet and regenerate the client — TypeScript tells you every call site that needs updating.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;One-click hosted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://getsheetforge.vercel.app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sign in with Google. Connect a sheet. Copy your API key. Done.&lt;/p&gt;

&lt;p&gt;Self-host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Devansh-365/sheetforge.git
&lt;span class="nb"&gt;cd &lt;/span&gt;sheetforge
pnpm &lt;span class="nb"&gt;install
cp&lt;/span&gt; .env.example .env   &lt;span class="c"&gt;# Google OAuth + DATABASE_URL + Redis&lt;/span&gt;
pnpm db:push
pnpm dev               &lt;span class="c"&gt;# web :3000, api :3001&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Prereqs: Node 20+, pnpm 9+, Postgres 14+, Redis 6+ (or Upstash REST).&lt;/p&gt;

&lt;p&gt;The OSS core (&lt;code&gt;packages/queue&lt;/code&gt;, &lt;code&gt;packages/codegen&lt;/code&gt;, &lt;code&gt;packages/sdk-ts&lt;/code&gt;) is MIT and stays free forever. The hosted SaaS runs the same code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyi08g3kjqj1953ks7sa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyi08g3kjqj1953ks7sa.png" alt="If you need webhooks today, use SheetDB. If you need your rows to land, come back." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What SheetForge is not
&lt;/h2&gt;

&lt;p&gt;It is not a Postgres replacement. If you need complex queries, indices, or relational integrity, use a real database.&lt;/p&gt;

&lt;p&gt;It is not a high-throughput pipe. Google's Sheets API quota caps you at roughly 60 write requests per minute per user, regardless of what sits in front. SheetForge makes sure those writes land; it doesn't make them faster.&lt;/p&gt;

&lt;p&gt;It is not a reason to keep Sheets as your backend forever. It's the right tool for landing pages, waitlists, internal forms, ops tools, and MVPs where you need rows to actually land. If your app outgrows Sheets, it outgrows SheetForge too.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one takeaway
&lt;/h2&gt;

&lt;p&gt;The Sheets-as-backend wrappers I tested (SheetDB, Sheety, and SheetBest) share a silent data-loss bug that only shows up when traffic spikes. That's exactly the moment you most care about losing rows.&lt;/p&gt;

&lt;p&gt;If you've ever shipped a form on Sheets and watched rows vanish mid-launch, give SheetForge a try. If it saves you one bug, star the repo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Devansh-365/sheetforge" rel="noopener noreferrer"&gt;github.com/Devansh-365/sheetforge&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Hosted:&lt;/strong&gt; &lt;a href="https://getsheetforge.vercel.app" rel="noopener noreferrer"&gt;getsheetforge.vercel.app&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>googlesheets</category>
      <category>startup</category>
      <category>saas</category>
    </item>
    <item>
      <title>Gemini 2.5 Flash was returning 37 tokens. Here's why.</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Sun, 19 Apr 2026 23:30:00 +0000</pubDate>
      <link>https://dev.to/devansh365/gemini-25-flash-was-returning-37-tokens-heres-why-4ppp</link>
      <guid>https://dev.to/devansh365/gemini-25-flash-was-returning-37-tokens-heres-why-4ppp</guid>
      <description>&lt;p&gt;I set &lt;code&gt;max_tokens: 1000&lt;/code&gt; on a Gemini 2.5 Flash call.&lt;/p&gt;

&lt;p&gt;The response came back with 37 tokens. &lt;code&gt;finish_reason: "MAX_TOKENS"&lt;/code&gt;. No error. No warning. Just a string that stopped mid-sentence.&lt;/p&gt;

&lt;p&gt;I changed it to 2000. Got back 41 tokens. Then 5000. Got back 38.&lt;/p&gt;

&lt;p&gt;That's when I knew something was actually broken, not just a config issue.&lt;/p&gt;

&lt;p&gt;I spent a day tracing this. The root cause is surprising, the official docs don't explain it, and the fix depends on which version of which SDK you're using. Here's what I learned, and a diagnostic script at the end so you can figure out which variant of the bug you hit.&lt;/p&gt;

&lt;h2&gt;
  
  
  The symptom
&lt;/h2&gt;

&lt;p&gt;Your Gemini 2.5 Flash or Pro call returns one of these shapes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"candidates"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"parts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"finishReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MAX_TOKENS"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usageMetadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"promptTokenCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"candidatesTokenCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"thoughtsTokenCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;964&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"totalTokenCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1084&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or the response is cut off mid-sentence, with &lt;code&gt;candidatesTokenCount&lt;/code&gt; near zero and &lt;code&gt;thoughtsTokenCount&lt;/code&gt; close to whatever you set &lt;code&gt;max_output_tokens&lt;/code&gt; to.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;thoughtsTokenCount&lt;/code&gt; field is the giveaway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;Gemini 2.5 Flash and Pro are reasoning models. Like OpenAI's o-series, they spend tokens on internal reasoning before writing the visible response. Google counts those thinking tokens against your &lt;code&gt;max_output_tokens&lt;/code&gt; budget, and thinking is on by default, so a limit sized for the visible answer alone can be used up before any text is written.&lt;/p&gt;

&lt;p&gt;So when you ask for 1,000 tokens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The model thinks. This uses some number of tokens, tracked as &lt;code&gt;thoughtsTokenCount&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Once &lt;code&gt;thoughtsTokenCount + candidatesTokenCount&lt;/code&gt; hits your budget, generation stops.&lt;/li&gt;
&lt;li&gt;If thinking consumed most of the budget, &lt;code&gt;candidatesTokenCount&lt;/code&gt; ends up near zero.&lt;/li&gt;
&lt;/ol&gt;
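&lt;p&gt;The arithmetic behind those three steps fits in a few lines. This is a minimal sketch, not SDK code: &lt;code&gt;visible_tokens_left&lt;/code&gt; is a hypothetical helper, and the numbers are illustrative values in the same range as the response shown earlier.&lt;/p&gt;

```python
def visible_tokens_left(max_output_tokens, thoughts_token_count):
    """Tokens still available for visible text once thinking is paid for."""
    # Thinking and visible output draw from the same max_output_tokens budget.
    return max(max_output_tokens - thoughts_token_count, 0)


# Illustrative numbers: 964 thinking tokens against two different budgets.
print(visible_tokens_left(1000, 964))   # 36
print(visible_tokens_left(8192, 964))   # 7228
```

&lt;p&gt;With the same amount of thinking, the larger budget leaves thousands of tokens for the answer, while the 1,000-token budget leaves only a few dozen.&lt;/p&gt;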

&lt;p&gt;Gemini 2.5 Flash defaults to a dynamic thinking budget. It decides how much to think based on the task. For anything non-trivial, it will happily burn 90 to 98 percent of your budget on reasoning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyutlsc5hr5fc8x0t2zmc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyutlsc5hr5fc8x0t2zmc.png" alt="Where your max_tokens actually go"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see this directly in the API response. If you're using the Google GenAI SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
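&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# reads the API key from the environment&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;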
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize quantum computing.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Output tokens:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage_metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates_token_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Thinking tokens:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage_metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;thoughts_token_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage_metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_token_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Finish reason:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;finish_reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;thoughts_token_count&lt;/code&gt; field is where your budget actually went.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three fixes, ranked
&lt;/h2&gt;

&lt;p&gt;There are three ways to handle this, and they have real tradeoffs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Disable thinking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;thinking_budget: 0&lt;/code&gt; (Flash) or &lt;code&gt;reasoning_effort: "none"&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Lower on complex reasoning&lt;/td&gt;
&lt;td&gt;Chat UIs, structured extraction, high-volume endpoints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cap thinking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;thinking_budget: 1024&lt;/code&gt; + &lt;code&gt;max_output_tokens: 8192&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Most production workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dynamic thinking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Let Flash decide, set &lt;code&gt;max_output_tokens&lt;/code&gt; to 8K+&lt;/td&gt;
&lt;td&gt;Slowest&lt;/td&gt;
&lt;td&gt;Highest&lt;/td&gt;
&lt;td&gt;Best&lt;/td&gt;
&lt;td&gt;Research queries, complex analysis, one-shot deep tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The third option is the default, and it's the source of the bug. It's only the right choice if you're actually okay with burning most of your tokens on reasoning and waiting 5 to 30 seconds per response.&lt;/p&gt;
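&lt;p&gt;The "Cap thinking" row can be sketched as a small config builder. This is a sketch under assumptions: &lt;code&gt;capped_thinking_config&lt;/code&gt; is a hypothetical helper, the defaults are the values from the table, and it assumes the SDK accepts the nested &lt;code&gt;thinking_config&lt;/code&gt; dict the same way it accepts the flat config dict used above.&lt;/p&gt;

```python
def capped_thinking_config(thinking_budget=1024, max_output_tokens=8192):
    """Build a config dict that caps thinking and leaves room for the answer."""
    if thinking_budget >= max_output_tokens:
        raise ValueError("thinking_budget must be smaller than max_output_tokens")
    return {
        "max_output_tokens": max_output_tokens,
        "thinking_config": {"thinking_budget": thinking_budget},
    }


config = capped_thinking_config()
# Tokens guaranteed to remain for visible output with the table's values.
print(config["max_output_tokens"] - config["thinking_config"]["thinking_budget"])  # 7168
```

&lt;p&gt;The resulting dict is meant to be passed as &lt;code&gt;config=&lt;/code&gt; to &lt;code&gt;client.models.generate_content&lt;/code&gt;. The guard exists because a thinking budget that equals or exceeds the output limit recreates the original problem.&lt;/p&gt;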

&lt;h2&gt;
  
  
  Fix 1: Disable thinking for Flash
&lt;/h2&gt;

&lt;p&gt;For Gemini 2.5 Flash, you can turn thinking off entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain circuit breakers in 2 sentences.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;max_output_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thinking_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ThinkingConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thinking_budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;thinking_budget=0&lt;/code&gt; is valid for 2.5 Flash and Flash-Lite, but not for Pro. Pro refuses to run without at least some thinking and rejects the request with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Thinking can't be disabled for this model.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Pro, the minimum accepted value is 128. Using &lt;code&gt;thinking_budget=128&lt;/code&gt; gets you the closest thing to "off" that Pro allows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 2: The OpenAI-compat escape hatch (underdocumented)
&lt;/h2&gt;

&lt;p&gt;If you're hitting Gemini through the OpenAI-compatible endpoint (either Google's own &lt;code&gt;generativelanguage.googleapis.com/v1beta/openai&lt;/code&gt; or through a proxy like LiteLLM), you can use &lt;code&gt;reasoning_effort&lt;/code&gt; instead of &lt;code&gt;thinking_budget&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
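&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://generativelanguage.googleapis.com/v1beta/openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# read the key from the environment&lt;/span&gt;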
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain circuit breakers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reasoning_effort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# or "low", "medium", "high"
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is barely documented. Google's official OpenAI-compatibility page mentions it in passing, and almost no tutorials cover it. But it works, and it's the cleanest way to control reasoning from code that uses the OpenAI SDK.&lt;/p&gt;

&lt;p&gt;Mapping:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;reasoning_effort: "none"&lt;/code&gt; → &lt;code&gt;thinking_budget: 0&lt;/code&gt; (Flash only)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reasoning_effort: "low"&lt;/code&gt; → &lt;code&gt;thinking_budget: 1024&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reasoning_effort: "medium"&lt;/code&gt; → &lt;code&gt;thinking_budget: 8192&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reasoning_effort: "high"&lt;/code&gt; → &lt;code&gt;thinking_budget: 24576&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
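&lt;p&gt;The mapping above, written as a lookup you can keep next to your client code. This is a sketch: &lt;code&gt;thinking_budget_for&lt;/code&gt; is a hypothetical helper, and the check for "pro" in the model name is an assumption made for illustration, not something the API exposes.&lt;/p&gt;

```python
# Budgets taken from the mapping listed above.
EFFORT_TO_BUDGET = {"none": 0, "low": 1024, "medium": 8192, "high": 24576}


def thinking_budget_for(effort, model="gemini-2.5-flash"):
    """Return the thinking budget a reasoning_effort value corresponds to."""
    if effort not in EFFORT_TO_BUDGET:
        raise ValueError(f"unknown reasoning_effort: {effort!r}")
    # Hypothetical guard: treat any model name containing "pro" as unable to disable thinking.
    if effort == "none" and "pro" in model:
        raise ValueError("thinking can't be disabled for Pro; use 'low' instead")
    return EFFORT_TO_BUDGET[effort]


print(thinking_budget_for("low"))    # 1024
print(thinking_budget_for("high"))   # 24576
```

&lt;p&gt;Failing early on &lt;code&gt;"none"&lt;/code&gt; for Pro keeps the error in your own code, where the message can say what to use instead.&lt;/p&gt;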

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm75kc7oclgtgkgc5r2x6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm75kc7oclgtgkgc5r2x6.png" alt="Fix decision tree"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 3: The integration-specific gotchas
&lt;/h2&gt;

&lt;p&gt;The bug manifests differently depending on your stack. Some quick notes from actual GitHub issues (&lt;a href="https://github.com/googleapis/python-genai/issues/782" rel="noopener noreferrer"&gt;python-genai #782&lt;/a&gt;, &lt;a href="https://github.com/google-gemini/gemini-cli/issues/23081" rel="noopener noreferrer"&gt;gemini-cli #23081&lt;/a&gt;, &lt;a href="https://github.com/langchain-ai/langchain-google/issues/1490" rel="noopener noreferrer"&gt;langchain-google #1490&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangChain&lt;/strong&gt; silently truncates output. Developers report setting &lt;code&gt;max_tokens=16000&lt;/code&gt; and still getting cut-off responses. Fix: pass &lt;code&gt;thinking_budget&lt;/code&gt; via &lt;code&gt;model_kwargs&lt;/code&gt;, or switch to the OpenAI-compat endpoint through &lt;code&gt;ChatOpenAI&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LiteLLM&lt;/strong&gt; accepts &lt;code&gt;reasoning_effort&lt;/code&gt; and maps it to the Gemini parameter, but as of late 2025 passing &lt;code&gt;"none"&lt;/code&gt; for Pro was rejected with "Thinking can't be disabled." Fix: use &lt;code&gt;reasoning_effort="low"&lt;/code&gt; instead of &lt;code&gt;"none"&lt;/code&gt; for Pro.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ha-llmvision&lt;/strong&gt; defaulted &lt;code&gt;thinkingBudget&lt;/code&gt; to 35 to 50 tokens. That value gets fully consumed by thinking, leaving nothing for output. Fix: set to 1024 or higher.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cline&lt;/strong&gt; set &lt;code&gt;thinkingBudget: 0&lt;/code&gt;, which works for Flash Lite but throws on Pro. The fix depends on which model you're targeting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex AI&lt;/strong&gt; expects &lt;code&gt;thinkingConfig.thinkingBudget&lt;/code&gt; nested inside the config object. If a raw API request puts it at the top level, it is silently ignored.&lt;/p&gt;
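&lt;p&gt;For raw requests, the nesting looks roughly like this. This is a sketch of a request body only: &lt;code&gt;build_request_body&lt;/code&gt; is a hypothetical helper, and it assumes the config object in question is the REST API's &lt;code&gt;generationConfig&lt;/code&gt;. No request is sent.&lt;/p&gt;

```python
import json


def build_request_body(prompt, thinking_budget, max_output_tokens):
    """Build a raw generateContent body with thinkingBudget nested correctly."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "maxOutputTokens": max_output_tokens,
            # thinkingBudget lives under thinkingConfig, not at the top level.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


body = build_request_body("Explain circuit breakers.", 1024, 8192)
print(json.dumps(body, indent=2))
```

&lt;p&gt;Building the body in one place makes it harder for a stray top-level &lt;code&gt;thinkingBudget&lt;/code&gt; key to slip in unnoticed.&lt;/p&gt;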

&lt;h2&gt;
  
  
  Diagnostic script
&lt;/h2&gt;

&lt;p&gt;If you're not sure which variant of the bug you hit, run this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;diagnose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage_metadata&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;  &lt;span class="c1"&gt;# response.text is None when no text part came back&lt;/span&gt;
    &lt;span class="n"&gt;finish&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;finish_reason&lt;/span&gt;

    &lt;span class="n"&gt;thinking_pct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;thoughts_token_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;thoughts_token_count&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;output_pct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates_token_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates_token_count&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model:          &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Budget:         &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Thinking used:  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;thoughts_token_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;thinking_pct&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Output tokens:  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates_token_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output_pct&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Finish reason:  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Response len:   &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; chars&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;finish&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAX_TOKENS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates_token_count&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;DIAGNOSIS: Thinking tokens ate your budget.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FIX: Set thinking_budget=0 (Flash) or reasoning_effort=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;finish&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAX_TOKENS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;DIAGNOSIS: Output actually hit the cap. Raise max_output_tokens.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;diagnose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a short poem about debugging.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script prints a percentage breakdown showing exactly where your budget went. If thinking is over 50 percent of your budget, you need to cap it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gateway-level fix
&lt;/h2&gt;

&lt;p&gt;All of this is fixable at the application layer, but it requires every caller to know about reasoning budgets. That doesn't scale if you have multiple services calling Gemini.&lt;/p&gt;

&lt;p&gt;I run &lt;a href="https://github.com/devansh-365/freellm" rel="noopener noreferrer"&gt;FreeLLM&lt;/a&gt; in front of my LLM calls. It's an OpenAI-compatible gateway that routes across six providers, and it sets the right reasoning budget per Gemini model automatically. Flash gets &lt;code&gt;reasoning_effort: "none"&lt;/code&gt;. Pro gets &lt;code&gt;"low"&lt;/code&gt;. Your full &lt;code&gt;max_tokens&lt;/code&gt; budget goes to the actual answer. You can override per-request if you need reasoning back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
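&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:3000/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;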
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 1000
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before the gateway: 37 output tokens. After: 670+ tokens, &lt;code&gt;finish_reason: stop&lt;/code&gt;. Same prompt, same budget.&lt;/p&gt;

&lt;p&gt;The point is not "use my tool." The point is that gateway-level defaults let you fix provider quirks once instead of in every service.&lt;/p&gt;
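
&lt;p&gt;For a sense of what a gateway-level default involves, here is a minimal Python sketch. It is not FreeLLM's actual code: the &lt;code&gt;REASONING_DEFAULTS&lt;/code&gt; table and &lt;code&gt;apply_reasoning_default&lt;/code&gt; are hypothetical names, and only the Flash and Pro values come from the description above.&lt;/p&gt;

```python
from typing import Any

# Hypothetical defaults mirroring the post: Flash disables thinking, Pro keeps it low.
REASONING_DEFAULTS = {
    "gemini-2.5-flash": "none",
    "gemini-2.5-pro": "low",
}


def apply_reasoning_default(body: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the request body with a reasoning_effort default filled in."""
    patched = dict(body)
    # Strip an optional "provider/" prefix such as "gemini/gemini-2.5-flash".
    model = str(patched.get("model", "")).split("/")[-1]
    default = REASONING_DEFAULTS.get(model)
    # A caller-supplied value always wins, so reasoning can be re-enabled per request.
    if default is not None and "reasoning_effort" not in patched:
        patched["reasoning_effort"] = default
    return patched


flash = apply_reasoning_default({"model": "gemini/gemini-2.5-flash", "max_tokens": 1000})
override = apply_reasoning_default({"model": "gemini-2.5-pro", "reasoning_effort": "high"})
print(flash["reasoning_effort"], override["reasoning_effort"])
```

&lt;p&gt;Because an explicit &lt;code&gt;reasoning_effort&lt;/code&gt; is left untouched, callers can still opt back into reasoning per request.&lt;/p&gt;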

&lt;h2&gt;
  
  
  What to take away
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Gemini 2.5 is a reasoning model. Its thinking tokens count against your &lt;code&gt;max_output_tokens&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The dynamic default will eat 90 to 98 percent of your budget on anything non-trivial.&lt;/li&gt;
&lt;li&gt;For Flash, disable thinking with &lt;code&gt;thinking_budget: 0&lt;/code&gt; or &lt;code&gt;reasoning_effort: "none"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For Pro, cap thinking with &lt;code&gt;thinking_budget: 128&lt;/code&gt; (minimum) or &lt;code&gt;reasoning_effort: "low"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If you're using an OpenAI-compat endpoint, &lt;code&gt;reasoning_effort&lt;/code&gt; is the cleaner option, though it is underdocumented.&lt;/li&gt;
&lt;li&gt;Run the diagnostic script above when in doubt.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The official docs don't make any of this obvious. Hopefully this post saves you the day I spent figuring it out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/devansh-365/freellm" rel="noopener noreferrer"&gt;github.com/devansh-365/freellm&lt;/a&gt; (the gateway that handles this for you)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>openai</category>
      <category>development</category>
    </item>
    <item>
      <title>LiteLLM got hacked. I built a simpler LLM gateway you can actually audit.</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Tue, 14 Apr 2026 00:10:45 +0000</pubDate>
      <link>https://dev.to/devansh365/litellm-got-hacked-i-built-a-simpler-llm-gateway-you-can-actually-audit-3hia</link>
      <guid>https://dev.to/devansh365/litellm-got-hacked-i-built-a-simpler-llm-gateway-you-can-actually-audit-3hia</guid>
      <description>&lt;p&gt;On March 24, 2026, LiteLLM versions 1.82.7 and 1.82.8 were uploaded to PyPI with a credential harvester, a Kubernetes lateral-movement toolkit, and a persistent remote code execution backdoor baked in.&lt;/p&gt;

&lt;p&gt;The malicious package was live for about 40 minutes before PyPI quarantined it.&lt;/p&gt;

&lt;p&gt;40 minutes doesn't sound like much. But LiteLLM gets 95 million downloads a month. It's the default multi-provider routing library for anyone building on LLMs. Teams running &lt;code&gt;pip install litellm&lt;/code&gt; during that window got compromised automatically. No explicit import needed. The payload triggered on Python interpreter startup via a &lt;code&gt;.pth&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;Google brought in Mandiant for the investigation. Snyk, Kaspersky, and Trend Micro all published breakdowns. The attack vector: a compromised Trivy security scanner leaked CircleCI credentials, including the PyPI publishing token and a GitHub PAT.&lt;/p&gt;

&lt;p&gt;This is not a theoretical risk. This happened.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn61cm58mhjeckf0ttnf8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn61cm58mhjeckf0ttnf8.png" alt="LiteLLM Attack Timeline" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem is not one attack
&lt;/h2&gt;

&lt;p&gt;LiteLLM does a lot. 2,000+ models across 100+ providers. Proxy server, load balancing, spend tracking, A/B testing, caching, logging, guardrails, prompt management.&lt;/p&gt;

&lt;p&gt;That scope is the problem.&lt;/p&gt;

&lt;p&gt;A developer on HN described the codebase as having a 7,000+ line &lt;code&gt;utils.py&lt;/code&gt;. A 30-year engineer called it "the worst code I have ever read in my life." Before the supply chain attack, a DEV Community post titled "5 Real Issues With LiteLLM That Are Pushing Teams Away in 2026" was already documenting the trust erosion.&lt;/p&gt;

&lt;p&gt;The supply chain attack was the tipping point, not the root cause. The root cause is depending on a massive, opaque library for critical routing infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a simpler design looks like
&lt;/h2&gt;

&lt;p&gt;I ran into the same multi-provider routing problem last year while building Metis, an AI stock analysis tool. Kept burning through Groq's free tier in 20 minutes, switching to Gemini manually, hitting their cap, switching again.&lt;/p&gt;

&lt;p&gt;Built FreeLLM to stop doing that manually. It solves a narrower problem than LiteLLM, and that's the point.&lt;/p&gt;

&lt;p&gt;FreeLLM is an OpenAI-compatible gateway that routes across Groq, Gemini, Mistral, Cerebras, NVIDIA NIM, and Ollama. When one provider rate-limits, the next one answers. That's the core of it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqv5zmkmnwy550olr0tz4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqv5zmkmnwy550olr0tz4.png" alt="LiteLLM vs FreeLLM" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What it does
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
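&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:3000/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;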
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model": "free-fast", "messages": [{"role": "user", "content": "Hello!"}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your existing OpenAI SDK code works. Swap the base URL. Keep your code.&lt;/p&gt;

&lt;p&gt;Three meta-models handle routing: &lt;code&gt;free-fast&lt;/code&gt; (lowest latency, usually Groq/Cerebras), &lt;code&gt;free-smart&lt;/code&gt; (best reasoning, usually Gemini 2.5 Pro), and &lt;code&gt;free&lt;/code&gt; (max availability).&lt;/p&gt;

&lt;h3&gt;
  
  
  What it fixes that LiteLLM doesn't
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Gemini 2.5 reasoning tokens eating your output.&lt;/strong&gt; This is one of the most reported Gemini bugs right now. Gemini 2.5 Flash and Pro are reasoning models. They burn 90-98% of your &lt;code&gt;max_tokens&lt;/code&gt; on internal thinking before producing visible text. Ask for 1,000 tokens and you get back 37. There are 15+ open GitHub issues about this across multiple SDKs.&lt;/p&gt;

&lt;p&gt;FreeLLM fixes it at the gateway. Flash gets &lt;code&gt;reasoning_effort: "none"&lt;/code&gt; by default. Pro gets &lt;code&gt;"low"&lt;/code&gt;. Your full token budget goes to the actual answer. Override per-request if you want the reasoning back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provider outages don't break your app.&lt;/strong&gt; Claude went down for three consecutive days in early April. 8,000+ Downdetector reports. If your app depends on one provider, that's three days of broken service. FreeLLM's circuit breakers pull failing providers from rotation and test for recovery automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response caching without a separate layer.&lt;/strong&gt; Identical prompts return in ~23ms with zero quota burn. The cache refuses to store truncated responses (another Gemini bug: reasoning models returning cut-off output that then poisons your cache for an hour).&lt;/p&gt;
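
&lt;p&gt;The "refuse to cache truncated output" rule is easy to picture in code. This is a rough Python sketch, not FreeLLM's implementation: &lt;code&gt;store&lt;/code&gt;, &lt;code&gt;lookup&lt;/code&gt; and &lt;code&gt;cache_key&lt;/code&gt; are hypothetical helpers, the one-hour TTL is assumed from the remark above, and it treats anything other than an OpenAI-style &lt;code&gt;finish_reason&lt;/code&gt; of &lt;code&gt;"stop"&lt;/code&gt; as not worth caching, which is a simplification.&lt;/p&gt;

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 3600  # assumption: one hour
_cache = {}


def cache_key(body):
    """Hash the request body so identical prompts map to the same entry."""
    canonical = json.dumps(body, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def store(body, response, now=None):
    """Cache a response unless any choice was cut off; returns True if stored."""
    finishes = [choice.get("finish_reason") for choice in response.get("choices", [])]
    # Only complete answers are replayed; "length" means the output was truncated.
    if not finishes or any(reason != "stop" for reason in finishes):
        return False
    stamp = time.time() if now is None else now
    _cache[cache_key(body)] = (stamp + CACHE_TTL_SECONDS, response)
    return True


def lookup(body, now=None):
    """Return a cached response, or None on a miss or an expired entry."""
    entry = _cache.get(cache_key(body))
    stamp = time.time() if now is None else now
    if entry is None or stamp >= entry[0]:
        return None
    return entry[1]
```

&lt;p&gt;The &lt;code&gt;now&lt;/code&gt; parameter exists only so the expiry logic can be exercised without waiting on a real clock.&lt;/p&gt;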

&lt;p&gt;&lt;strong&gt;Browser-safe tokens for static sites.&lt;/strong&gt; Mint a short-lived HMAC-signed token from a serverless function, pass it to the browser, call the gateway directly from client-side JavaScript. No auth backend. No session store.&lt;/p&gt;
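
&lt;p&gt;To make the token idea concrete, here is a small Python sketch of minting and verifying a short-lived HMAC-signed token using only the standard library. The token layout (base64url payload, a dot, base64url signature) and the function names are assumptions for illustration, not FreeLLM's actual wire format.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json


def mint_token(secret: bytes, ttl_seconds: int, now: float) -> str:
    """Create a token of the form base64url(payload).base64url(signature)."""
    payload = json.dumps({"exp": int(now) + ttl_seconds}).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return ".".join(
        base64.urlsafe_b64encode(part).decode("ascii") for part in (payload, signature)
    )


def verify_token(secret: bytes, token: str, now: float) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return False  # wrong shape or invalid base64
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(signature, expected):
        return False
    return json.loads(payload)["exp"] > now
```

&lt;p&gt;The serverless function would hold the secret and call something like &lt;code&gt;mint_token&lt;/code&gt;; the gateway shares the same secret and only needs the verify step, which is why no session store is involved.&lt;/p&gt;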

&lt;h3&gt;
  
  
  Key stacking: 360 free requests per minute
&lt;/h3&gt;

&lt;p&gt;Every provider env var accepts a comma-separated list of keys. FreeLLM rotates through them round-robin.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GROQ_API_KEY=gsk_key1,gsk_key2,gsk_key3
GEMINI_API_KEY=AI_key1,AI_key2,AI_key3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stack 3 keys across 5 cloud providers: ~360 req/min. All free. Enough to prototype an entire product without spending anything.&lt;/p&gt;
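
&lt;p&gt;The rotation and the arithmetic can be sketched in a few lines of Python. &lt;code&gt;load_keys&lt;/code&gt; and &lt;code&gt;key_rotation&lt;/code&gt; are hypothetical helpers, the per-key limits are approximate free-tier figures, and the 40 req/min for NVIDIA NIM is an assumption inferred from the ~360 total rather than a published number.&lt;/p&gt;

```python
import itertools
import os


def load_keys(env_var):
    """Split a comma-separated env var into a list of non-empty keys."""
    raw = os.environ.get(env_var, "")
    return [key.strip() for key in raw.split(",") if key.strip()]


def key_rotation(keys):
    """Yield keys round-robin, forever."""
    return itertools.cycle(keys)


os.environ["GROQ_API_KEY"] = "gsk_key1,gsk_key2,gsk_key3"  # placeholder keys
rotation = key_rotation(load_keys("GROQ_API_KEY"))
first_four = [next(rotation) for _ in range(4)]

# Approximate per-key limits in req/min. The NVIDIA NIM figure is an assumption.
PER_KEY_LIMITS = {"groq": 30, "gemini": 15, "cerebras": 30, "mistral": 5, "nvidia_nim": 40}
keys_per_provider = 3
total = keys_per_provider * sum(PER_KEY_LIMITS.values())
print(first_four, total)
```

&lt;p&gt;The fourth request wraps back to the first key, and three keys per provider at roughly 120 req/min combined gives the ~360 figure.&lt;/p&gt;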

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrn8oxy5n9bvgvnpbjze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrn8oxy5n9bvgvnpbjze.png" alt="Key Stacking Math" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Get it running
&lt;/h2&gt;

&lt;p&gt;Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gsk_... &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;AI... &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/devansh-365/freellm:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or one-click deploy on Railway or Render (buttons in the README).&lt;/p&gt;

&lt;p&gt;Use it from Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:3000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unused&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;free-smart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain circuit breakers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TypeScript, Go, Ruby, anything that speaks OpenAI. Same pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters beyond FreeLLM
&lt;/h2&gt;

&lt;p&gt;The LiteLLM attack exposed something the community already suspected: critical AI infrastructure is running on libraries nobody audits.&lt;/p&gt;

&lt;p&gt;The fix is not "use my tool instead." The fix is smaller dependencies, pinned versions, and codebases you can read in an afternoon. FreeLLM has 262 tests across 22 files. It is written in TypeScript, not Python, ships Docker images with pinned deps, and is MIT licensed.&lt;/p&gt;

&lt;p&gt;If you don't use FreeLLM, build something similarly scoped. The era of "install this 100-provider mega-library and trust it with your API keys" should be over.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsl47ise5n7ao6pfjztu3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsl47ise5n7ao6pfjztu3.png" alt="Request Flow" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;262 tests. 6 providers. One endpoint. Zero cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/devansh-365/freellm" rel="noopener noreferrer"&gt;github.com/devansh-365/freellm&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
      <category>openai</category>
    </item>
    <item>
      <title>I built an OpenAI-compatible gateway that routes across 5 free LLM providers</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Mon, 06 Apr 2026 20:22:07 +0000</pubDate>
      <link>https://dev.to/devansh365/i-built-an-openai-compatible-gateway-that-routes-across-5-free-llm-providers-6jo</link>
      <guid>https://dev.to/devansh365/i-built-an-openai-compatible-gateway-that-routes-across-5-free-llm-providers-6jo</guid>
      <description>&lt;p&gt;Every LLM provider has a free tier.&lt;/p&gt;

&lt;p&gt;Groq gives you 30 requests per minute. Gemini gives you 15. Cerebras gives you 30. Mistral gives you 5.&lt;/p&gt;

&lt;p&gt;Combined, that's about 80 requests per minute. Enough for prototyping, internal tools, and side projects where you don't want to pay for API access yet.&lt;/p&gt;

&lt;p&gt;The problem: each provider has its own SDK, its own rate limits, its own auth, and its own downtime. You end up writing provider-switching logic, catching 429 errors, and managing API keys across a separate dashboard for each one.&lt;/p&gt;

&lt;p&gt;I got tired of this while building &lt;a href="https://trymetis.app" rel="noopener noreferrer"&gt;Metis&lt;/a&gt;, an AI stock analysis tool. Kept hitting Groq's limits while Gemini had capacity sitting idle. So I built FreeLLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  What FreeLLM does
&lt;/h2&gt;

&lt;p&gt;One endpoint. Five providers. Twenty models. All free.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
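&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:3000/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;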
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model": "free-fast", "messages": [{"role": "user", "content": "Hello!"}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your existing OpenAI SDK code works. Just change the base URL. That's the whole migration.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the routing works
&lt;/h2&gt;

&lt;p&gt;When a request comes in, FreeLLM:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Checks which providers are healthy (circuit breakers track this automatically)&lt;/li&gt;
&lt;li&gt;Picks the best available provider based on your model choice&lt;/li&gt;
&lt;li&gt;If that provider returns a 429 or fails, it tries the next one&lt;/li&gt;
&lt;li&gt;You get a response&lt;/li&gt;
&lt;/ol&gt;
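
&lt;p&gt;Those four steps boil down to a short failover loop. Here is a minimal Python sketch with stand-in providers. &lt;code&gt;route&lt;/code&gt;, &lt;code&gt;RateLimited&lt;/code&gt; and the two stubs are hypothetical names, and the real gateway is TypeScript, so treat this as an illustration of the flow rather than its code.&lt;/p&gt;

```python
class RateLimited(Exception):
    """Raised by a provider stand-in to simulate an HTTP 429."""


def route(providers, healthy, request):
    """Try healthy providers in priority order and return the first success."""
    errors = []
    for name, call in providers:
        if name not in healthy:
            continue  # step 1: skip providers the circuit breaker has pulled
        try:
            return name, call(request)  # steps 2 and 4: the first one that answers wins
        except Exception as exc:  # step 3: a 429 or any failure moves to the next one
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")


def groq_stub(request):
    raise RateLimited("429 Too Many Requests")


def gemini_stub(request):
    return "echo: " + request["messages"][-1]["content"]


providers = [("groq", groq_stub), ("gemini", gemini_stub)]
request = {"messages": [{"role": "user", "content": "Hello!"}]}
winner, reply = route(providers, {"groq", "gemini"}, request)
print(winner, reply)
```

&lt;p&gt;Here the Groq stand-in returns a 429, so the request falls through to the Gemini stand-in and the caller never sees the failure.&lt;/p&gt;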

&lt;p&gt;Three meta-models handle routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;free-fast   → lowest latency (usually Groq or Cerebras)
free-smart  → most capable model (usually Gemini 2.5)
free        → maximum availability across all providers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Providers and their free tiers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Models&lt;/th&gt;
&lt;th&gt;Free Tier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Groq&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B, Llama 4 Scout, Qwen3 32B&lt;/td&gt;
&lt;td&gt;~30 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;2.5 Flash, 2.5 Pro, 2.0 Flash&lt;/td&gt;
&lt;td&gt;~15 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cerebras&lt;/td&gt;
&lt;td&gt;Llama 3.1 8B, Qwen3 235B, GPT-OSS 120B&lt;/td&gt;
&lt;td&gt;~30 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral&lt;/td&gt;
&lt;td&gt;Small, Medium, Nemo&lt;/td&gt;
&lt;td&gt;~5 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;Any local model&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwx65ruk33vv4zqkl0q5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwx65ruk33vv4zqkl0q5.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's under the hood
&lt;/h2&gt;

&lt;p&gt;This isn't a simple round-robin proxy. The routing layer handles real production concerns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sliding-window rate limiter.&lt;/strong&gt; Each provider's limits are tracked independently. FreeLLM knows how many requests you've sent to Groq in the last 60 seconds and won't send another if you're near the cap.&lt;/p&gt;
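
&lt;p&gt;A sliding-window limiter of this kind can be sketched in Python as follows. The class name and the explicit &lt;code&gt;now&lt;/code&gt; argument are illustrative choices, not FreeLLM's API. The limit of 30 mirrors Groq's approximate free tier.&lt;/p&gt;

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling `window` seconds."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.stamps = deque()

    def allow(self, now):
        # Drop timestamps that have slid out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True


groq = SlidingWindowLimiter(limit=30)
burst = [groq.allow(now=float(i)) for i in range(31)]  # one request per second, t=0..30
later = groq.allow(now=60.0)  # the request sent at t=0 has expired by now
print(burst.count(True), burst[30], later)
```

&lt;p&gt;The 31st request inside the window is refused, and capacity comes back one slot at a time as old timestamps expire, rather than all at once on a minute boundary.&lt;/p&gt;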

&lt;p&gt;&lt;strong&gt;Circuit breakers.&lt;/strong&gt; If Gemini starts returning 500s, FreeLLM pulls it from rotation. Every 30 seconds, it sends a test request. When the provider recovers, it goes back in.&lt;/p&gt;
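
&lt;p&gt;Here is a rough Python sketch of that breaker. The 30-second cooldown comes from the description above. The failure threshold of three and all the names are assumptions, since the post doesn't say how many errors it takes to open the circuit.&lt;/p&gt;

```python
class CircuitBreaker:
    """Open after repeated failures; allow one probe per cooldown period."""

    def __init__(self, failure_threshold=3, cooldown=30.0):
        self.failure_threshold = failure_threshold  # assumption: not given in the post
        self.cooldown = cooldown  # 30 seconds, per the description above
        self.failures = 0
        self.opened_at = None

    def allow(self, now):
        """Closed: always allow. Open: allow one probe once the cooldown has passed."""
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown:
            self.opened_at = now  # restart the timer so only one probe goes out
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now


breaker = CircuitBreaker()
for t in (0.0, 1.0, 2.0):
    breaker.record_failure(now=t)  # three 500s in a row open the circuit at t=2
blocked = breaker.allow(now=10.0)
probe = breaker.allow(now=32.0)
breaker.record_success()  # the probe succeeded, so the provider rejoins rotation
recovered = breaker.allow(now=33.0)
print(blocked, probe, recovered)
```

&lt;p&gt;If the probe fails instead, &lt;code&gt;record_failure&lt;/code&gt; reopens the circuit and the provider stays out for another cooldown.&lt;/p&gt;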

&lt;p&gt;&lt;strong&gt;Per-client rate limiting.&lt;/strong&gt; If you expose this to a team, each client gets their own limit. Admin auth protects the config endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zod validation.&lt;/strong&gt; Every request is validated before it hits any provider. Bad payloads fail fast with clear error messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-time dashboard.&lt;/strong&gt; React frontend showing provider health, request logs, and latency. You can see which providers are healthy at a glance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get it running in 30 seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/devansh-365/freellm.git
&lt;span class="nb"&gt;cd &lt;/span&gt;freellm
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env   &lt;span class="c"&gt;# add your free API keys&lt;/span&gt;
docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;API on &lt;code&gt;localhost:3000&lt;/code&gt;. Dashboard on &lt;code&gt;localhost:3000/dashboard&lt;/code&gt;. Done.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using it with the OpenAI SDK
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:3000/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;not-needed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;free-fast&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain circuit breakers in 2 sentences&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No new SDK to learn. No migration effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I built this
&lt;/h2&gt;

&lt;p&gt;I was building Metis and kept running into the same pattern: burn through Groq's free tier in 20 minutes of testing, switch to Gemini manually, hit their limit, switch to Mistral. Repeat.&lt;/p&gt;

&lt;p&gt;Wrote a quick proxy to automate the switching. Added failover because providers go down randomly. Added circuit breakers because I didn't want to wait for timeouts. Added a dashboard because I wanted to see what was happening.&lt;/p&gt;

&lt;p&gt;It grew into a proper tool. Open-sourced it because every developer prototyping with LLMs has this exact problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;p&gt;TypeScript, Express 5, React 19, Zod, Docker. MIT licensed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/devansh-365/freellm" rel="noopener noreferrer"&gt;github.com/devansh-365/freellm&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>typescript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>react native animation</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Wed, 10 Sep 2025 16:11:18 +0000</pubDate>
      <link>https://dev.to/devansh365/react-native-animation-46f2</link>
      <guid>https://dev.to/devansh365/react-native-animation-46f2</guid>
      <description></description>
      <category>reactnative</category>
      <category>react</category>
      <category>animation</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
