<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shudipto Trafder</title>
    <description>The latest articles on DEV Community by Shudipto Trafder (@shudiptotrafder).</description>
    <link>https://dev.to/shudiptotrafder</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3681541%2Fd686de74-59f1-4097-9bf4-81b3837b0aba.jpg</url>
      <title>DEV Community: Shudipto Trafder</title>
      <link>https://dev.to/shudiptotrafder</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shudiptotrafder"/>
    <language>en</language>
    <item>
      <title>Agent memory: 7 types, and 2 of them aren't memory</title>
      <dc:creator>Shudipto Trafder</dc:creator>
      <pubDate>Tue, 23 Jun 2026 04:58:56 +0000</pubDate>
      <link>https://dev.to/shudiptotrafder/agent-memory-7-types-and-2-of-them-arent-memory-6oi</link>
      <guid>https://dev.to/shudiptotrafder/agent-memory-7-types-and-2-of-them-arent-memory-6oi</guid>
      <description>&lt;p&gt;Your agent doesn't have a memory problem. It has seven of them, and most teams have built two.&lt;/p&gt;

&lt;p&gt;Start from the thing everyone skips past: the model itself remembers nothing. An LLM is a pure function. Same input, same output, no state carried between calls. Whatever feels like memory when you talk to ChatGPT is a layer wrapped around the model, re-sending the relevant history on every single request. The model is not remembering your last message. Something else is handing it back to the model, every turn, and paying for the tokens each time.&lt;/p&gt;

&lt;p&gt;That layer is where almost all the engineering lives, and almost all of it collapses into two patterns: a conversation history that keeps growing until you truncate it, and a vector database you call RAG. Those are two of seven distinct things an agent can remember. The catch is that they're the two that don't make the agent any smarter over time. The type that does, the one that turns yesterday's mistakes into tomorrow's rules, is the least-built component in the entire stack.&lt;/p&gt;

&lt;p&gt;This is part one of three. Here I'll lay out the seven types and argue about which ones actually earn the name. The taxonomy isn't mine: it comes from cognitive science by way of the CoALA paper (Sumers et al., Princeton 2023) and Tulving's 1972 split between episodic and semantic memory. What follows is the engineer's version, with opinions about what to build and what to ignore.&lt;/p&gt;




&lt;h2&gt;
  
  
  The only axis that matters: how long it lives
&lt;/h2&gt;

&lt;p&gt;Forget the seven labels for a second. There's one organizing question, and it's temporal. Does this memory live inside the context window for a single turn, or does it persist outside the model across sessions?&lt;/p&gt;

&lt;p&gt;Short-term is the context window. It's fast, it's right there, and it evaporates when the session ends. Long-term is everything you write to an external store and read back later. Two of the seven straddle the middle: their store is long-lived, but what they hand the model is used for exactly one turn and then thrown away. One type doesn't sit in any store at all; it's frozen into the model's weights.&lt;/p&gt;

&lt;p&gt;Here's the full set, then I'll argue about it.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Lives&lt;/th&gt;
&lt;th&gt;In one line&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Short&lt;/td&gt;
&lt;td&gt;Everything in the context window right now&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;Long&lt;/td&gt;
&lt;td&gt;Facts, preferences, domain knowledge: the &lt;em&gt;know-what&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Episodic&lt;/td&gt;
&lt;td&gt;Long&lt;/td&gt;
&lt;td&gt;Logged past events: what worked, what failed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Procedural&lt;/td&gt;
&lt;td&gt;Long&lt;/td&gt;
&lt;td&gt;Skills, workflows, tool patterns: the &lt;em&gt;know-how&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Retrieval&lt;/td&gt;
&lt;td&gt;Both&lt;/td&gt;
&lt;td&gt;Knowledge pulled in by similarity search (RAG)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Parametric&lt;/td&gt;
&lt;td&gt;Long&lt;/td&gt;
&lt;td&gt;Knowledge baked into the model's weights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Prospective&lt;/td&gt;
&lt;td&gt;Both&lt;/td&gt;
&lt;td&gt;Future intentions, scheduled to fire later&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Read past the table and you'll notice two of these don't behave like memory at all. That's the interesting part.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Working memory: where you pay the bill
&lt;/h2&gt;

&lt;p&gt;Working memory is the context window: system prompt, conversation history, tool outputs, retrieved chunks, and the model's own running reasoning. It's the only memory the model can directly see. Every other type on this list exists to load itself into here at the right moment.&lt;/p&gt;

&lt;p&gt;Three facts about it decide your whole architecture. It's bounded, so you evict or summarize at the limit. It vanishes at session end. And you re-pay for it every turn, because the entire window is re-sent on each call.&lt;/p&gt;

&lt;p&gt;That last one is where the money goes, and it's just math. A 50-turn conversation that keeps full history re-sends turn 1 fifty times. The early context isn't expensive once, it's expensive once per remaining turn. This is why "just use a bigger context window" is a credit card, not a solution. It works until the bill arrives, and the bill scales with the square of the conversation length.&lt;/p&gt;

&lt;p&gt;There's no magic in the code, just a list that goes back in full every turn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The whole window is re-sent on every turn. That is the bill.
&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Frameworks paper over this with persisted state. A LangGraph checkpointer, for example, saves the thread so the history survives a restart. Persistence is not the same as paying less, though: the window still ships in full on every call. What you actually control is what goes into that list, which is why eviction and summarization are the real levers.&lt;/p&gt;

&lt;p&gt;Working memory is the convergence point for everything else here. Get it wrong and no clever retrieval scheme downstream will save you.&lt;/p&gt;




&lt;h2&gt;
  
  
  2, 3, 4: the three long-term stores worth separating
&lt;/h2&gt;

&lt;p&gt;These three are the heart of the system, and the useful move is to keep them distinct, because they answer different questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic memory&lt;/strong&gt; is the &lt;em&gt;know-what&lt;/em&gt;: stable facts and preferences, decoupled from when you learned them. A user is on the Enterprise plan and prefers email over phone. A client is an NRI with a moderate risk profile who won't touch structured products. You apply that in every conversation without re-deriving it. You build it from a structured store for the clean fields plus a vector store for the fuzzy recall, and you update profiles incrementally.&lt;/p&gt;

&lt;p&gt;The failure mode here isn't retrieval. It's truth. At one user, stale facts are an annoyance. At enterprise scale, with multiple agents writing to the same profile, the hard problem is source-of-truth conflict: two agents hold contradictory versions of the same fact and nothing arbitrates. That's a data-governance problem wearing a memory costume, and it's the thing that actually bites in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Episodic memory&lt;/strong&gt; is the log of specific past events: full runs, the decision the agent made, and whether it worked. A fraud agent records each case (the pattern it saw, what it recommended, and whether it turned out to be real fraud or a false positive), then pulls the close matches when a similar signature shows up. This is case-based reasoning. It's how an agent stops repeating the same mistake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Procedural memory&lt;/strong&gt; is the &lt;em&gt;know-how&lt;/em&gt;: the workflows, tool-use patterns, and rules for how to do things. A claims agent's flow is procedural: validate policy, assess the damage photos, check fraud signals, compute the payout band, route for approval above a threshold. Some of this lives in a tuned system prompt acting as the meta-controller, and some lives in an external store of decision rules. For rules, a structured key lookup beats fuzzy search every time. You don't want "approval threshold" retrieved by cosine similarity; you want it looked up exactly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Rules are an exact lookup, not a similarity hit.
&lt;/span&gt;&lt;span class="n"&gt;PAYOUT_BANDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;home&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200_000&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PAYOUT_BANDS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;        &lt;span class="c1"&gt;# correct every time
&lt;/span&gt;
&lt;span class="c1"&gt;# not this, where "close" quietly becomes "wrong":
&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; approval threshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The clean split, and the reason to keep them apart: procedural says &lt;em&gt;how&lt;/em&gt;, semantic says &lt;em&gt;what the policy is&lt;/em&gt;, episodic says &lt;em&gt;what happened&lt;/em&gt;. Collapse them into one "memory" blob and you lose the ability to update one without corrupting the others.&lt;/p&gt;

&lt;h3&gt;
  
  
  The one nobody builds
&lt;/h3&gt;

&lt;p&gt;Here's the part worth circling. The link between episodic and semantic memory is where learning actually happens. An agent hits the same situation a dozen times, you abstract the pattern into a rule, and you promote that rule from the episode log into semantic memory. After that, the agent doesn't reason from twelve analogies; it applies one fact.&lt;/p&gt;

&lt;p&gt;The loop itself is small to write, which is what makes skipping it so telling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;similar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similar&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;they_agree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similar&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;rule&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;abstract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similar&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# "auto claims with signal Y are usually false positives"
&lt;/span&gt;    &lt;span class="n"&gt;semantic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# promote the pattern into a durable fact
&lt;/span&gt;    &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mark_consolidated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similar&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't have to hand-roll all of it. LangMem and Mem0 both extract and consolidate memories from past runs instead of leaving you a raw log, and Letta (formerly MemGPT) lets the agent rewrite its own memory between turns. The tooling is catching up to the idea. Most teams still haven't wired it in.&lt;/p&gt;

&lt;p&gt;That loop, consolidation, is the most valuable stage in the whole system and the least implemented. Almost everyone logs episodes. Almost nobody closes the loop back into durable rules. It's the difference between an agent with a diary and an agent that learns, and it's the subject of part two.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Retrieval is not a memory type
&lt;/h2&gt;

&lt;p&gt;This is where I'll take the unpopular position. RAG isn't a kind of memory. It's a delivery mechanism.&lt;/p&gt;

&lt;p&gt;The mechanics are familiar: embed the query, run a similarity search over a vector store, inject the top-k chunks into the context, and let the model answer grounded in them. A compliance bot keeps RBI and SEBI rules in a vector store and pulls only the few passages a KYC question needs. Useful, yes. But notice what's stored there: semantic facts and episodic events. Retrieval is how those get from the store into working memory. It's the pipe, not the water.&lt;/p&gt;

&lt;p&gt;This matters because conflating the two is exactly why so many agents are "just a vector DB" and stop there. Once you see retrieval as plumbing, the real questions surface: what's worth storing, how do you keep it consistent, and when is similarity the wrong access pattern entirely? It usually is for rules and exact lookups, where similarity will happily hand you the wrong threshold because it reads close to the right one.&lt;/p&gt;

&lt;p&gt;And while we're here: similarity is not relevance. The chunk that scores highest against your query embedding is the one that's closest in vector space, which is not the same as the one that answers the question. The gap between those two is where most "the RAG is hallucinating" bug reports actually come from.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Parametric memory: the knowledge the model is
&lt;/h2&gt;

&lt;p&gt;Parametric memory is everything baked into the weights at training time: grammar, arithmetic, broad world knowledge, the fact that Paris is the capital of France. The model doesn't consult this. It &lt;em&gt;is&lt;/em&gt; this. No retrieval step, always available, instant.&lt;/p&gt;

&lt;p&gt;Which is exactly why it's a trap for anything that changes. The weights are frozen between training runs. They're opaque, they can be confidently wrong, and they know nothing past the cutoff. So the design boundary is sharp: general reasoning and language go to parametric, and anything volatile, proprietary, or recent goes to an external store.&lt;/p&gt;

&lt;p&gt;Concretely: knowing what "loan-to-value" means is parametric, and you should trust the model for it. The current repo rate, or this specific bank's product rules, must come from a store you control. Bake in the stuff that doesn't move. Retrieve the stuff that does. Confusing the two is how an agent ends up quoting last year's interest rate with total confidence.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Prospective memory: the one you can't fake with a vector DB
&lt;/h2&gt;

&lt;p&gt;The last type is the one that's architecturally different from everything above. Prospective memory is remembering to act later: intentions the agent formed but hasn't run yet. "Send the portfolio review on the 1st of every month." "When this client's SIP fails, alert the relationship manager." "Review this FD maturity in September," decided back in March.&lt;/p&gt;

&lt;p&gt;You cannot build this with similarity search, and that's the whole point. There's no query to embed. It's a task queue, a scheduler, and goal trackers that fire on a trigger, either a clock or an event. If your agent needs to do something next Monday, no amount of retrieval tuning gets you there. You need a thing that wakes up on Monday.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# A scheduler, not a vector store.
&lt;/span&gt;&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;send_portfolio_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cron&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;review_fd_maturity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-09-01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sip_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;alert_rm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;APScheduler covers the in-process case. Reach for Celery or Temporal when the intention has to survive a crash. Either way the trigger is a clock or an event, never a query.&lt;/p&gt;

&lt;p&gt;This is the type that separates a chatbot from an agent with a horizon. Skip it and your agent can only ever react to the message in front of it. Build it and the agent can schedule its own future, which is most of what "agentic" actually means.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where each type lives in real tooling
&lt;/h2&gt;

&lt;p&gt;If you want the shortcut from taxonomy to dependencies, this is roughly where each type lands today.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;What it needs&lt;/th&gt;
&lt;th&gt;Tools to reach for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Persisted thread state, eviction and summary&lt;/td&gt;
&lt;td&gt;LangGraph checkpointers, your framework's thread state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;Structured store plus vector recall&lt;/td&gt;
&lt;td&gt;Postgres + pgvector/Qdrant/Pinecone; LangMem, Mem0, Zep&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Episodic&lt;/td&gt;
&lt;td&gt;Append-only event log plus a similarity index&lt;/td&gt;
&lt;td&gt;Postgres or Mongo + a vector index; Zep, Mem0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Procedural&lt;/td&gt;
&lt;td&gt;Exact key lookup plus a tuned system prompt&lt;/td&gt;
&lt;td&gt;Redis or Postgres; LangMem for rule and prompt tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval&lt;/td&gt;
&lt;td&gt;Similarity search over the stores above&lt;/td&gt;
&lt;td&gt;pgvector, Qdrant, Weaviate, Pinecone, Chroma; LlamaIndex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parametric&lt;/td&gt;
&lt;td&gt;The weights, plus an optional fine-tune&lt;/td&gt;
&lt;td&gt;the base model; LoRA/PEFT if you must bake it in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prospective&lt;/td&gt;
&lt;td&gt;A scheduler or queue with triggers&lt;/td&gt;
&lt;td&gt;APScheduler, Celery, Temporal, cron&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;None of these are load-bearing on their own. The architecture is the seven boxes and how they hand knowledge between each other. The libraries just fill the boxes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The bottom line: what to actually build
&lt;/h2&gt;

&lt;p&gt;If you're starting an agent and want to know where to spend, here's the order I'd put it in.&lt;/p&gt;

&lt;p&gt;Start with working memory and treat it as a budget, not a convenience. It's the one cost that compounds, so settle your eviction and summarization strategy before you build anything clever on top. Everything else loads in here, and you pay for it per turn.&lt;/p&gt;

&lt;p&gt;Keep semantic, episodic, and procedural in separate stores: one for facts, one for events, one for rules and workflows. The day you can update what the agent knows without touching what it did, you've built something you can maintain.&lt;/p&gt;

&lt;p&gt;Treat retrieval as plumbing. It serves the three stores above, nothing more. If your memory system is only a vector DB, you've built the pipe and forgotten the water.&lt;/p&gt;

&lt;p&gt;Draw the parametric line hard. General reasoning comes from the weights, everything volatile comes from a store you own. Never let a frozen model be your source of truth for anything that has a date on it.&lt;/p&gt;

&lt;p&gt;Add prospective memory the day your agent needs a future. A scheduler, not a vector store. It's the cheapest type to describe and the one most agents are missing.&lt;/p&gt;

&lt;p&gt;And if you only do one of these well, do the consolidation loop: the promotion of repeated episodes into durable rules. Logging what happened is table stakes. Turning it into something the agent applies next time is the part almost nobody finishes, and it's the whole reason to build memory instead of just a bigger prompt.&lt;/p&gt;

&lt;p&gt;That handoff, how the stores read, write, and promote knowledge between each other, is the difference between an agent that answers and one that gets better at its job. Most agents shipping today are a context window and a vector DB with a good README. The seven-store version is more work, and it's the version that learns. How those stores talk to each other is part two.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>rag</category>
      <category>llm</category>
    </item>
    <item>
      <title>Claude Code Is Reading Your .env File Right Now — And You Probably Don't Know It</title>
      <dc:creator>Shudipto Trafder</dc:creator>
      <pubDate>Tue, 19 May 2026 03:05:32 +0000</pubDate>
      <link>https://dev.to/shudiptotrafder/claude-code-is-reading-your-env-file-right-now-and-you-probably-dont-know-it-3ja5</link>
      <guid>https://dev.to/shudiptotrafder/claude-code-is-reading-your-env-file-right-now-and-you-probably-dont-know-it-3ja5</guid>
      <description>&lt;p&gt;Every time you open a project with Claude Code, it starts scanning your files. Your source code. Your configs. Your &lt;code&gt;.env&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;By the time you type your first prompt, Claude already knows your database password, your Supabase service key, and that Twilio auth token you've been meaning to rotate for three months. And depending on your setup, all of that might be sitting in Anthropic's conversation logs right now.&lt;/p&gt;

&lt;p&gt;I know this sounds alarmist. But this isn't theoretical — a GitHub issue filed in April 2026 confirmed that Claude reads and echoes &lt;code&gt;.env&lt;/code&gt; contents into conversation context, &lt;strong&gt;even when you've explicitly told it not to in your &lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/strong&gt;. That was the moment I stopped trusting advisory rules and started understanding how Claude's permission system actually works.&lt;/p&gt;

&lt;p&gt;Here's what I found — and more importantly, how to actually fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The False Sense of Security: Why CLAUDE.md Won't Save You
&lt;/h2&gt;

&lt;p&gt;The first thing most developers do is open their &lt;code&gt;CLAUDE.md&lt;/code&gt; and write something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Never read &lt;code&gt;.env&lt;/code&gt; files. Never expose API keys."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Reasonable. Logical. And almost completely useless as a security control.&lt;/p&gt;

&lt;p&gt;Here's the uncomfortable truth: &lt;code&gt;CLAUDE.md&lt;/code&gt; is a suggestion, not a constraint. Claude follows it under normal conditions — short context windows, simple tasks, clear intent. But push the model into a complex debugging session with a long conversation history and an ambiguous instruction, and those advisory rules start slipping.&lt;/p&gt;

&lt;p&gt;The model isn't being malicious. It's just prioritizing. When the system prompt says "don't read &lt;code&gt;.env&lt;/code&gt;" but the task at hand requires understanding why a database connection is failing, the task usually wins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The only thing that actually enforces a hard boundary is a deny rule in &lt;code&gt;settings.json&lt;/code&gt;.&lt;/strong&gt; Deny rules are evaluated before Claude even attempts the operation. The file never opens. The contents never enter the context. It's the difference between "please don't" and "you physically cannot."&lt;/p&gt;




&lt;h2&gt;
  
  
  It's Not Just One Leak — It's Three
&lt;/h2&gt;

&lt;p&gt;Most developers, once they hear about this problem, go add a deny rule for &lt;code&gt;.env&lt;/code&gt; files and call it done. That blocks one of the three ways your secrets get exposed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Leak #1: Direct file read&lt;/strong&gt; — This is the obvious one. Claude scans your project directory, opens &lt;code&gt;.env&lt;/code&gt;, and the keys become part of the conversation. Deny rules stop this completely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Leak #2: Runtime output capture&lt;/strong&gt; — This is the sneaky one. Claude runs your test suite. One test makes an HTTP request with an &lt;code&gt;Authorization&lt;/code&gt; header. The request fails and the error log dumps the full header value — your live API key — into the terminal output. Claude captures all of that output. Your secret is now in the conversation, and Claude never needed to open a single file.&lt;/p&gt;

&lt;p&gt;Or imagine a database connection timing out. The error message includes the full connection string: &lt;code&gt;postgres://admin:MyActualPassword123@prod-database.us-east-1.rds.amazonaws.com/appdatabase&lt;/code&gt;. Claude sees it. It's in context. Done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Leak #3: Search and grep&lt;/strong&gt; — Claude uses grep to find where you defined a helper function. The search returns matches from a config file that happens to contain your Resend API key alongside the function definition. The matched lines show up in grep output. Claude reads it. You never suspected a thing.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[!warning]+ Reality Check&lt;br&gt;
Most guides on this topic protect against Leak #1 only. Leaks #2 and #3 are where real credentials actually escape in production workflows. I'll show you how to handle all three.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Fix That Actually Works: Hard Deny Rules
&lt;/h2&gt;

&lt;p&gt;Open &lt;code&gt;~/.claude/settings.json&lt;/code&gt; — or create it if it doesn't exist. This is the global config that applies to every project you open with Claude Code.&lt;/p&gt;

&lt;p&gt;Add deny rules for every sensitive file pattern you want to block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"permissions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/.env*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/secrets/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/credentials/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/.ssh/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(**/.env*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(**/secrets/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(**/credentials/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(**/.ssh/**)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;**&lt;/code&gt; wildcard means these rules apply to every subdirectory, not just the project root. If you have a monorepo with a &lt;code&gt;packages/api/.env.production&lt;/code&gt;, it's blocked. If your CI scripts live in &lt;code&gt;tooling/scripts/.env.ci&lt;/code&gt;, it's blocked.&lt;/p&gt;

&lt;p&gt;Write rules matter too — you don't want Claude accidentally creating or overwriting &lt;code&gt;.env&lt;/code&gt; files during a "let me set up your environment" task.&lt;/p&gt;




&lt;h2&gt;
  
  
  Solving Leak #2: The Test Environment Trick
&lt;/h2&gt;

&lt;p&gt;Deny rules can't intercept runtime output — that's just text flowing through the terminal. The solution is to ensure Claude never runs code that has access to real credentials in the first place.&lt;/p&gt;

&lt;p&gt;Create a &lt;code&gt;.env.test&lt;/code&gt; file that contains placeholder values for every key your app uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env.test — safe values for all automated tasks&lt;/span&gt;
&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-test-placeholder-not-real
&lt;span class="nv"&gt;SUPABASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://test-project.supabase.co
&lt;span class="nv"&gt;SUPABASE_SERVICE_ROLE_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.test
&lt;span class="nv"&gt;RESEND_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;re_test_placeholder_123456789
&lt;span class="nv"&gt;TWILIO_ACCOUNT_SID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ACtest00000000000000000000000000000
&lt;span class="nv"&gt;TWILIO_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;test_auth_token_placeholder_value
&lt;span class="nv"&gt;REDIS_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;redis://localhost:6379
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then point your test runner at &lt;code&gt;.env.test&lt;/code&gt; instead of &lt;code&gt;.env&lt;/code&gt;. For Python projects using &lt;code&gt;pytest&lt;/code&gt;, add this to &lt;code&gt;pytest.ini&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[pytest]&lt;/span&gt;
&lt;span class="py"&gt;env_files&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;.env.test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Node.js projects, load it explicitly in your test setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// test/setup.js&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dotenv&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.env.test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;override&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when Claude runs your test suite and a request fails with a logged header, the only key that shows up is &lt;code&gt;sk-ant-test-placeholder-not-real&lt;/code&gt;. Harmless.&lt;/p&gt;




&lt;h2&gt;
  
  
  Solving Leak #3: The Pre-Commit Safety Net
&lt;/h2&gt;

&lt;p&gt;Even with deny rules and test environments, the repo itself is the last line of defense. A pre-commit hook scans every staged file before it reaches git history — and git history is permanent in a way that conversations aren't.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;.git/hooks/pre-commit&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="c"&gt;# Pre-commit hook: blocks commits that contain credential patterns&lt;/span&gt;

&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;SECRET_PATTERNS&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;
  &lt;span class="s1"&gt;'sk-ant-api'&lt;/span&gt;              &lt;span class="c"&gt;# Anthropic production keys&lt;/span&gt;
  &lt;span class="s1"&gt;'re_[A-Za-z0-9]{20,}'&lt;/span&gt;    &lt;span class="c"&gt;# Resend API keys&lt;/span&gt;
  &lt;span class="s1"&gt;'eyJhbGciOiJIUzI1NiJ9'&lt;/span&gt;   &lt;span class="c"&gt;# Supabase JWTs (common header)&lt;/span&gt;
  &lt;span class="s1"&gt;'ACa[0-9a-f]{32}'&lt;/span&gt;        &lt;span class="c"&gt;# Twilio Account SIDs&lt;/span&gt;
  &lt;span class="s1"&gt;'AKID[A-Z0-9]{16}'&lt;/span&gt;       &lt;span class="c"&gt;# Cloud access key IDs&lt;/span&gt;
  &lt;span class="s1"&gt;'postgres://[^@]+:[^@]+@'&lt;/span&gt; &lt;span class="c"&gt;# Postgres DSNs with embedded passwords&lt;/span&gt;
  &lt;span class="s1"&gt;'mongodb\+srv://[^@]+:[^@]+@'&lt;/span&gt; &lt;span class="c"&gt;# MongoDB Atlas URIs&lt;/span&gt;
  &lt;span class="s1"&gt;'-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----'&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;BLOCKED_FILENAMES&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;
  &lt;span class="s1"&gt;'.env'&lt;/span&gt;
  &lt;span class="s1"&gt;'.env.local'&lt;/span&gt;
  &lt;span class="s1"&gt;'.env.production'&lt;/span&gt;
  &lt;span class="s1"&gt;'id_rsa'&lt;/span&gt;
  &lt;span class="s1"&gt;'id_ed25519'&lt;/span&gt;
  &lt;span class="s1"&gt;'*.p12'&lt;/span&gt;
  &lt;span class="s1"&gt;'*.pfx'&lt;/span&gt;
  &lt;span class="s1"&gt;'service-account.json'&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;FOUND_ISSUE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0

&lt;span class="c"&gt;# Check for secret patterns in staged diff&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;pattern &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SECRET_PATTERNS&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  if &lt;/span&gt;git diff &lt;span class="nt"&gt;--cached&lt;/span&gt; &lt;span class="nt"&gt;--diff-filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ACM &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s2"&gt;"^&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="s2"&gt;.*&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;pattern&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ BLOCKED: Potential secret found matching pattern: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;pattern&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nv"&gt;FOUND_ISSUE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
  &lt;span class="k"&gt;fi
done&lt;/span&gt;

&lt;span class="c"&gt;# Check for sensitive filenames being staged&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;filename &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BLOCKED_FILENAMES&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  if &lt;/span&gt;git diff &lt;span class="nt"&gt;--cached&lt;/span&gt; &lt;span class="nt"&gt;--name-only&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qF&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;filename&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ BLOCKED: Sensitive file staged for commit: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;filename&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nv"&gt;FOUND_ISSUE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
  &lt;span class="k"&gt;fi
done

if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="nv"&gt;$FOUND_ISSUE&lt;/span&gt; &lt;span class="nt"&gt;-eq&lt;/span&gt; 1 &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Remove the flagged content and try again."&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"If this is a false positive, use: git commit --no-verify"&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Pre-commit security scan passed."&lt;/span&gt;
&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make it executable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x .git/hooks/pre-commit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches the patterns that matter for modern stacks: Anthropic keys, Resend, Supabase JWTs, Twilio, cloud IAM keys, embedded database passwords in connection strings, and private key material. If you hit a false positive on a test value, &lt;code&gt;git commit --no-verify&lt;/code&gt; skips the hook — but that's a conscious override, not an accident.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Nuclear Option: Container Isolation
&lt;/h2&gt;

&lt;p&gt;For client work or anything touching production credentials, there's a more drastic approach: don't let &lt;code&gt;.env&lt;/code&gt; files exist inside Claude's environment at all.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Replace .env with an empty file at mount time&lt;/span&gt;
docker run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;:/workspace"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /dev/null:/workspace/.env &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /dev/null:/workspace/.env.local &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-w&lt;/span&gt; /workspace &lt;span class="se"&gt;\&lt;/span&gt;
  your-dev-image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From Claude's perspective, &lt;code&gt;.env&lt;/code&gt; is an empty file. The deny rules still apply. The test environment still runs. But even if something went wrong at every other layer, there's nothing to leak because the file physically contains nothing.&lt;/p&gt;

&lt;p&gt;This is overkill for personal projects. It's the right call for anything where you're holding someone else's production database password.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full Config: Copy, Paste, Done
&lt;/h2&gt;

&lt;p&gt;Here's the complete &lt;code&gt;~/.claude/settings.json&lt;/code&gt; that combines everything — allowing normal development operations while blocking secrets and dangerous commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"permissions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Glob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Grep"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"LS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Edit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"MultiEdit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(src/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(tests/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(docs/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(python -m pytest *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(uv run *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(poetry run *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(npm run *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(npx *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git status)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git diff *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git log *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git add *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git commit *)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/.env*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/.dev.vars*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/*.pem)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/*.key)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/*.p12)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/secrets/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/credentials/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/.aws/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/.ssh/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/config/secrets.toml)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/config/production.json)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/.netrc)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Read(**/.pypirc)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(**/.env*)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(**/secrets/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(**/.ssh/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Write(.github/workflows/**)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(rm -rf *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(sudo *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(git push *)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(pip install * --user)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(curl * | bash)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(curl * | sh)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"Bash(wget * -O- | sh)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaultMode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acceptEdits"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The allow list covers what you actually do day-to-day: reading code, making edits, running tests, checking git status. The deny list covers secrets, sensitive system directories, and shell patterns that could pipe remote code into execution.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Decision Matrix
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your situation                            → What to do
─────────────────────────────────────────────────────────────────
Personal projects, no production creds   → deny rules in settings.json
Team project with shared repo            → deny rules + pre-commit hook
Running Claude-generated tests often     → deny rules + .env.test setup
Client work / holding their credentials  → All of the above + container isolation
CI/CD pipeline with Claude integration   → Vault (AWS Secrets Manager, GCP Secret Manager)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Before You Close This Tab: The 6-Point Check
&lt;/h2&gt;

&lt;p&gt;Run through these right now, not after your next session:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;~/.claude/settings.json&lt;/code&gt; exists&lt;/strong&gt; and has deny rules for &lt;code&gt;.env*&lt;/code&gt;, &lt;code&gt;*.pem&lt;/code&gt;, &lt;code&gt;*.key&lt;/code&gt;, and &lt;code&gt;secrets/**&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.env.test&lt;/code&gt; exists&lt;/strong&gt; with placeholder values for every key your test suite touches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.git/hooks/pre-commit&lt;/code&gt; is executable&lt;/strong&gt; and scans for credential patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.env&lt;/code&gt; is in &lt;code&gt;.gitignore&lt;/code&gt;&lt;/strong&gt; — if it's not, fix that first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production credentials live in a vault&lt;/strong&gt;, not in a plaintext file anywhere near your project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.env&lt;/code&gt; files live outside the project directory&lt;/strong&gt; when possible — one directory up means they're never inside a Claude scan boundary&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you checked all six: you're in better shape than 95% of Claude Code users. If you checked zero: you're one long debugging session away from your live credentials becoming part of a conversation log you can't delete.&lt;/p&gt;

&lt;p&gt;The deny rules take five minutes to set up. The pre-commit hook takes another five. That's ten minutes against an unlimited blast radius.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Claude Code is genuinely useful. That's not in question. But "useful" and "safe by default" are different things, and right now it leans heavily toward the former.&lt;/p&gt;

&lt;p&gt;The tooling to make it safe exists — it's just not turned on out of the box. &lt;code&gt;CLAUDE.md&lt;/code&gt; instructions feel like security because they're written with security intent. But they're conversation rules, not permission boundaries. One confused model state and they're gone.&lt;/p&gt;

&lt;p&gt;Deny rules in &lt;code&gt;settings.json&lt;/code&gt; are a different category entirely. They're enforced at the system level, before the model sees anything. That's what a real boundary looks like.&lt;/p&gt;

&lt;p&gt;Set them up once. Run every future session knowing your secrets are actually safe.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>vibecoding</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>InjectQ: The Modern Python Dependency Injection Library</title>
      <dc:creator>Shudipto Trafder</dc:creator>
      <pubDate>Sun, 17 May 2026 17:19:14 +0000</pubDate>
      <link>https://dev.to/shudiptotrafder/injectq-the-modern-python-dependency-injection-library-1gnc</link>
      <guid>https://dev.to/shudiptotrafder/injectq-the-modern-python-dependency-injection-library-1gnc</guid>
      <description>&lt;h3&gt;
  
  
  The Pain of Python Apps (And How InjectQ Fixes It)
&lt;/h3&gt;

&lt;p&gt;Ever built a Python app that started simple but slowly turned into a tangled web of dependencies?&lt;br&gt;
Where changing one component breaks another?&lt;br&gt;
Where testing becomes painful and dependency management spirals out of control?&lt;/p&gt;

&lt;p&gt;You’re not alone.&lt;/p&gt;

&lt;p&gt;Most Python developers eventually hit the same problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tight coupling between components&lt;/li&gt;
&lt;li&gt;Difficult-to-test business logic&lt;/li&gt;
&lt;li&gt;Manual dependency wiring everywhere&lt;/li&gt;
&lt;li&gt;Async code that becomes messy over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional dependency injection frameworks often make things worse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verbose configuration&lt;/li&gt;
&lt;li&gt;Complex setup&lt;/li&gt;
&lt;li&gt;Poor async support&lt;/li&gt;
&lt;li&gt;Too much boilerplate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where &lt;a href="https://10xhub.github.io/injectq?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;InjectQ&lt;/a&gt; comes in.&lt;/p&gt;

&lt;p&gt;InjectQ is a lightweight, modern dependency injection library for Python that feels intuitive from day one.&lt;br&gt;
It’s as simple as using a dictionary, yet powerful enough for enterprise-grade applications.&lt;/p&gt;

&lt;p&gt;Built for modern Python:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Async-first&lt;/li&gt;
&lt;li&gt;Type-safe&lt;/li&gt;
&lt;li&gt;Thread-safe&lt;/li&gt;
&lt;li&gt;FastAPI-ready&lt;/li&gt;
&lt;li&gt;FastMCP-ready&lt;/li&gt;
&lt;li&gt;Taskiq-ready&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  5-Minute Setup
&lt;/h2&gt;

&lt;p&gt;Install InjectQ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;injectq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Your First InjectQ App
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InjectQ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;singleton&lt;/span&gt;

&lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;InjectQ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_instance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="nd"&gt;@singleton&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DB connected!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@singleton&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="nd"&gt;@inject&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Output
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DB&lt;/span&gt; &lt;span class="n"&gt;connected&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No manual wiring.&lt;br&gt;
No container plumbing.&lt;br&gt;
Just clean, testable Python code.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why InjectQ?
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Dictionary-Simple API
&lt;/h3&gt;

&lt;p&gt;InjectQ keeps dependency management intuitive.&lt;/p&gt;

&lt;p&gt;Register values, classes, or instances using familiar dictionary syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InjectQ&lt;/span&gt;

&lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;InjectQ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_instance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello World&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Database&lt;/span&gt;

&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rapid prototyping&lt;/li&gt;
&lt;li&gt;Config management&lt;/li&gt;
&lt;li&gt;Lightweight applications&lt;/li&gt;
&lt;li&gt;Enterprise services&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. Automatic Injection with &lt;code&gt;@inject&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;@inject&lt;/code&gt; decorator resolves dependencies automatically using type hints.&lt;/p&gt;

&lt;p&gt;Works with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Functions&lt;/li&gt;
&lt;li&gt;Class methods&lt;/li&gt;
&lt;li&gt;Static methods&lt;/li&gt;
&lt;li&gt;Async functions
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;inject&lt;/span&gt;


&lt;span class="nd"&gt;@inject&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No need to pass &lt;code&gt;service&lt;/code&gt; manually.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Lazy Injection with &lt;code&gt;Inject[T]&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Need optional or deferred dependencies?&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;Inject[T]&lt;/code&gt; for lazy resolution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Inject&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UserService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Inject&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dependencies resolve only when accessed.&lt;/p&gt;

&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster startup time&lt;/li&gt;
&lt;li&gt;Lower memory usage&lt;/li&gt;
&lt;li&gt;Cleaner optional dependency handling&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4. Powerful Factory Support
&lt;/h2&gt;

&lt;p&gt;InjectQ supports runtime-aware factories with mixed injected and manual parameters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InjectQ&lt;/span&gt;

&lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;InjectQ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_instance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UserHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;UserHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind_factory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;handler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;handler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request-specific objects&lt;/li&gt;
&lt;li&gt;Multi-tenant systems&lt;/li&gt;
&lt;li&gt;Background jobs&lt;/li&gt;
&lt;li&gt;Dynamic runtime construction&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  5. Lifecycle &amp;amp; Scope Control
&lt;/h2&gt;

&lt;p&gt;Control object lifecycles precisely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;singleton&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scoped&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transient&lt;/span&gt;


&lt;span class="nd"&gt;@singleton&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DatabasePool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;


&lt;span class="nd"&gt;@scoped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RequestContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;


&lt;span class="nd"&gt;@transient&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Validator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Available Scopes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@singleton&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;One instance for the entire application&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@scoped()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;One instance per scope/context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@transient&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;New instance every resolution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database pools&lt;/li&gt;
&lt;li&gt;Request contexts&lt;/li&gt;
&lt;li&gt;Per-task resources&lt;/li&gt;
&lt;li&gt;Stateless utilities&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  6. Async-First by Design
&lt;/h2&gt;

&lt;p&gt;InjectQ was built for modern async Python.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;inject&lt;/span&gt;


&lt;span class="nd"&gt;@inject&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;async_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AsyncService&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;async_task&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Async dependency resolution&lt;/li&gt;
&lt;li&gt;Async factories&lt;/li&gt;
&lt;li&gt;Async scopes&lt;/li&gt;
&lt;li&gt;Async frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No hacks. No wrappers. Native async support.&lt;/p&gt;




&lt;h2&gt;
  
  
  Framework Integrations
&lt;/h2&gt;

&lt;p&gt;InjectQ integrates seamlessly with modern Python frameworks.&lt;/p&gt;




&lt;h2&gt;
  
  
  FastAPI Integration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InjectQ&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq.integrations.fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;InjectFastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;setup_fastapi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;InjectQ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_instance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;setup_fastapi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/users/{user_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;InjectFastAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request-scoped dependencies&lt;/li&gt;
&lt;li&gt;Type-safe injection&lt;/li&gt;
&lt;li&gt;Async-native support&lt;/li&gt;
&lt;li&gt;Clean route handlers&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Taskiq Integration
&lt;/h2&gt;

&lt;p&gt;Background job processing becomes clean and maintainable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;taskiq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemoryBroker&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq.integrations.taskiq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;InjectTaskiq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;setup_taskiq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;broker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemoryBroker&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;setup_taskiq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;broker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@broker.task&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;OrderService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;InjectTaskiq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OrderService&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Async workers&lt;/li&gt;
&lt;li&gt;Distributed tasks&lt;/li&gt;
&lt;li&gt;Queue processing&lt;/li&gt;
&lt;li&gt;Event-driven systems&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  FastMCP Integration
&lt;/h2&gt;

&lt;p&gt;Build clean, dependency-injected MCP servers effortlessly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InjectQ&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;injectq.integrations.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;InjectFastMCP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;setup_fastmcp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;InjectQ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_instance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;setup_fastmcp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;InjectFastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI tools&lt;/li&gt;
&lt;li&gt;MCP servers&lt;/li&gt;
&lt;li&gt;Agent platforms&lt;/li&gt;
&lt;li&gt;LLM applications&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Performance That Impresses
&lt;/h2&gt;

&lt;p&gt;InjectQ is designed for speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmarks
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Performance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dependency Resolution&lt;/td&gt;
&lt;td&gt;~1µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10-Service Web Request&lt;/td&gt;
&lt;td&gt;~142µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DI Throughput&lt;/td&gt;
&lt;td&gt;7,000+ req/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Even deep dependency trees resolve extremely fast.&lt;/p&gt;

&lt;p&gt;Built with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thread safety&lt;/li&gt;
&lt;li&gt;Zero unnecessary locks&lt;/li&gt;
&lt;li&gt;Optimized async execution&lt;/li&gt;
&lt;li&gt;Minimal overhead&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Developers Choose InjectQ
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Simple and intuitive API&lt;/li&gt;
&lt;li&gt;Async-first architecture&lt;/li&gt;
&lt;li&gt;Excellent performance&lt;/li&gt;
&lt;li&gt;Type-safe dependency injection&lt;/li&gt;
&lt;li&gt;Lightweight and minimal&lt;/li&gt;
&lt;li&gt;Production-ready scopes&lt;/li&gt;
&lt;li&gt;FastAPI integration&lt;/li&gt;
&lt;li&gt;Taskiq integration&lt;/li&gt;
&lt;li&gt;FastMCP integration&lt;/li&gt;
&lt;li&gt;Enterprise-friendly design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether you're building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scripts&lt;/li&gt;
&lt;li&gt;APIs&lt;/li&gt;
&lt;li&gt;Background workers&lt;/li&gt;
&lt;li&gt;MCP servers&lt;/li&gt;
&lt;li&gt;Enterprise systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;InjectQ scales with your application.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;injectq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Documentation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://10xhub.github.io/injectq?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;InjectQ Docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub Repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/10XScale-in/injectq?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;InjectQ GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Start building cleaner, faster, and more maintainable Python applications with InjectQ.&lt;/p&gt;

</description>
      <category>python</category>
      <category>dependencyinjection</category>
      <category>fastapi</category>
      <category>fastmcp</category>
    </item>
    <item>
      <title>AgentFlow — From Agent Code to Production API in Minutes</title>
      <dc:creator>Shudipto Trafder</dc:creator>
      <pubDate>Sun, 03 May 2026 17:09:58 +0000</pubDate>
      <link>https://dev.to/10xscale/agentflow-from-agent-code-to-production-api-in-minutes-p3e</link>
      <guid>https://dev.to/10xscale/agentflow-from-agent-code-to-production-api-in-minutes-p3e</guid>
      <description>&lt;h2&gt;
  
  
  AgentFlow — The Python Framework for Production AI Agents
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Stop rebuilding the same agent infrastructure. AgentFlow gives you auth, streaming, persistence, and a React frontend — out of the box.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AgentFlow (&lt;code&gt;10xscale-agentflow&lt;/code&gt; on PyPI) is an open-source Python framework for building and deploying multi-agent AI systems. Write your agent graph once. Run it locally. Ship it to production without rewriting your backend.&lt;/p&gt;

&lt;p&gt;Built by &lt;a href="https://10xscale.ai/" rel="noopener noreferrer"&gt;10xScale&lt;/a&gt;. MIT licensed. No vendor lock-in.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AgentFlow?
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks stop at the prototype. You get a cute demo, then spend weeks bolting on auth, rate limiting, persistence, and a frontend. AgentFlow is built for what comes after the demo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One framework. From first &lt;code&gt;pip install&lt;/code&gt; to production Docker deploy.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🔗 Links
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Python Library&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/Agentflow" rel="noopener noreferrer"&gt;github.com/10xHub/Agentflow&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API &amp;amp; CLI&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/agentflow-cli" rel="noopener noreferrer"&gt;github.com/10xHub/agentflow-cli&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documentation&lt;/td&gt;
&lt;td&gt;&lt;a href="https://10xhub.github.io/agentflow-docs" rel="noopener noreferrer"&gt;10xhub.github.io/agentflow-docs&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPI — Core&lt;/td&gt;
&lt;td&gt;&lt;a href="https://pypi.org/project/10xscale-agentflow/" rel="noopener noreferrer"&gt;pypi.org/project/10xscale-agentflow&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPI — CLI&lt;/td&gt;
&lt;td&gt;&lt;a href="https://pypi.org/project/10xscale-agentflow-cli/" rel="noopener noreferrer"&gt;pypi.org/project/10xscale-agentflow-cli&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Full Stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agentflow            →  Core Python orchestration engine
agentflow-cli        →  FastAPI server + CLI tooling
agentflow-client     →  TypeScript/React SDK (@10xscale/agentflow-client)
agentflow-playground →  Hosted UI for testing agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use any layer alone. Use them together for a complete AI product stack — from LLM call to browser UI — without stitching four different libraries together.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get Running in 60 Seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow-cli

agentflow init   &lt;span class="c"&gt;# scaffold a new project&lt;/span&gt;
agentflow api    &lt;span class="c"&gt;# start the dev server&lt;/span&gt;
agentflow play   &lt;span class="c"&gt;# open the playground UI&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your agent is running, streamed, and explorable in under a minute.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Get
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Graph-Based Agent Orchestration
&lt;/h3&gt;

&lt;p&gt;AgentFlow uses a &lt;code&gt;StateGraph&lt;/code&gt; — directed nodes, conditional edges, and full control over execution flow. No black boxes. No magic routing you can't debug.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentflow.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentflow.state&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentflow.utils.constants&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get weather for a location.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is sunny, 72°F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini/gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;tool_node_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;tools_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather in NYC?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]},&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stateful. Tool-calling. Under 30 lines.&lt;/p&gt;




&lt;h3&gt;
  
  
  LLM-Agnostic
&lt;/h3&gt;

&lt;p&gt;Pass the model string. AgentFlow routes it.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI (GPT-4o, o3, etc.)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install openai&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Gemini + Vertex AI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install google-genai&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic Claude&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pip install anthropic&lt;/code&gt; &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No provider-specific abstractions to learn. Swap models without touching your agent logic.&lt;/p&gt;




&lt;h3&gt;
  
  
  Parallel Tool Execution — Automatic
&lt;/h3&gt;

&lt;p&gt;When an LLM calls multiple tools at once, AgentFlow runs them concurrently. No config required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Other frameworks:  1.0s + 1.5s + 0.8s = 3.3s
AgentFlow:         max(1.0s, 1.5s, 0.8s) = 1.5s  ⚡ 2.2x faster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Production Memory — Three Layers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Working Memory    →  Current execution state (AgentState)
Session Memory    →  Redis (hot) + PostgreSQL (durable) checkpointer
Knowledge Memory  →  Qdrant vector store + Mem0 semantic recall
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Redis keeps hot conversation state fast. PostgreSQL keeps it durable and horizontally scalable. Both run together — you don't pick one.&lt;/p&gt;




&lt;h3&gt;
  
  
  Streaming
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stream_gen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;inp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_granularity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ResponseGranularity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LOW&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream_gen&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three granularity levels: token-by-token (ChatGPT-style), message-by-message, or node-by-node graph traces. Your frontend decides what to show.&lt;/p&gt;




&lt;h3&gt;
  
  
  Auth and Security — Built In, Not Bolted On
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Most frameworks leave auth as an exercise for the reader. AgentFlow ships it.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jwt"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"custom"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auth.my_backend:MyAuth"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line in &lt;code&gt;agentflow.json&lt;/code&gt;. Switch from dev to production auth without touching your graph code.&lt;/p&gt;

&lt;p&gt;Security features included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JWT authentication with configurable secrets&lt;/li&gt;
&lt;li&gt;Custom auth backends for OAuth2, API keys, and sessions&lt;/li&gt;
&lt;li&gt;Role-Based Access Control (RBAC)&lt;/li&gt;
&lt;li&gt;Sliding-window rate limiting (memory or Redis backends)&lt;/li&gt;
&lt;li&gt;Configurable request size limits (DoS protection, default 10 MB)&lt;/li&gt;
&lt;li&gt;Auto-redaction of tokens and secrets from logs&lt;/li&gt;
&lt;li&gt;Startup validation — warns about insecure CORS and debug mode before you accidentally deploy them&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Lifecycle Callbacks
&lt;/h3&gt;

&lt;p&gt;Hook into every layer of execution — before and after each LLM call, tool call, or MCP invocation. Hook into the graph itself for start, end, checkpoint, interrupt, resume, and error events.&lt;/p&gt;

&lt;p&gt;Use them for audit logs, billing meters, policy enforcement, prompt-injection checks, or any business logic that shouldn't live inside the prompt.&lt;/p&gt;




&lt;h3&gt;
  
  
  The CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentflow init              &lt;span class="c"&gt;# scaffold project + config&lt;/span&gt;
agentflow api               &lt;span class="c"&gt;# dev server with auto-reload&lt;/span&gt;
agentflow play              &lt;span class="c"&gt;# open playground against local backend&lt;/span&gt;
agentflow build &lt;span class="nt"&gt;--docker-compose&lt;/span&gt;  &lt;span class="c"&gt;# generate Dockerfile + compose&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auto-generated FastAPI endpoints:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/invoke&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;POST — synchronous agent call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/stream&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;POST — streaming agent call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/threads&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GET — list conversation threads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/threads/{id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GET — fetch thread history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/threads/{id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DELETE — delete thread&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your agent graph becomes a production API. No FastAPI boilerplate to write.&lt;/p&gt;




&lt;h3&gt;
  
  
  Dependency Injection with InjectQ
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentflow.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get weather for a location.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather for user &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean tools. Testable tools. Per-request context without global state.&lt;/p&gt;




&lt;h3&gt;
  
  
  Human-in-the-Loop
&lt;/h3&gt;

&lt;p&gt;Pause execution mid-graph. Inject a human decision. Resume with full state intact. No re-running prior steps.&lt;/p&gt;

&lt;p&gt;Approval workflows, moderation gates, interactive debugging — all supported without custom state management.&lt;/p&gt;




&lt;h3&gt;
  
  
  Event Publishing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Publisher&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Redis Pub/Sub&lt;/td&gt;
&lt;td&gt;Lightweight in-process distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka&lt;/td&gt;
&lt;td&gt;High-volume event streaming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RabbitMQ&lt;/td&gt;
&lt;td&gt;Reliable queuing, distributed systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Console&lt;/td&gt;
&lt;td&gt;Local debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Any backend you want&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  React/TypeScript Client SDK
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;@10xscale/agentflow-client&lt;/code&gt; gives you React hooks (&lt;code&gt;useAgent&lt;/code&gt;, &lt;code&gt;useStream&lt;/code&gt;, &lt;code&gt;useThreads&lt;/code&gt;), token-level streaming for ChatGPT-style UIs, and client-side tool execution. The frontend talks to your AgentFlow API without custom integration code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Feature Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;AgentFlow&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;LangGraph&lt;/th&gt;
&lt;th&gt;CrewAI&lt;/th&gt;
&lt;th&gt;AutoGen&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;Graph&lt;/td&gt;
&lt;td&gt;Graph&lt;/td&gt;
&lt;td&gt;Role-Based&lt;/td&gt;
&lt;td&gt;Conversational&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Stack (Backend + Frontend SDK)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel Tool Execution&lt;/td&gt;
&lt;td&gt;✅ Auto&lt;/td&gt;
&lt;td&gt;⚠️ Config&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistence&lt;/td&gt;
&lt;td&gt;✅ Redis + Postgres&lt;/td&gt;
&lt;td&gt;⚠️ Postgres/SQLite&lt;/td&gt;
&lt;td&gt;⚠️ Local&lt;/td&gt;
&lt;td&gt;⚠️ Local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependency Injection&lt;/td&gt;
&lt;td&gt;✅ Native&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI + Docker Deployment&lt;/td&gt;
&lt;td&gt;✅ One command&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth Built-In&lt;/td&gt;
&lt;td&gt;✅ JWT + Custom&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate Limiting&lt;/td&gt;
&lt;td&gt;✅ Memory + Redis&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lifecycle Callbacks&lt;/td&gt;
&lt;td&gt;✅ Full&lt;/td&gt;
&lt;td&gt;⚠️ Manual&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;⚠️ Manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Support&lt;/td&gt;
&lt;td&gt;✅ Native&lt;/td&gt;
&lt;td&gt;⚠️ Partial&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event Publishing&lt;/td&gt;
&lt;td&gt;✅ Kafka/Redis/AMQP&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open Source (MIT)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Core library&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow

&lt;span class="c"&gt;# Full CLI + API server&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optional extras:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow[pg_checkpoint]   &lt;span class="c"&gt;# PostgreSQL + Redis persistence&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow[mcp]             &lt;span class="c"&gt;# Model Context Protocol&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow[google-genai]    &lt;span class="c"&gt;# Google GenAI adapter&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow[kafka]           &lt;span class="c"&gt;# Kafka event publishing&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow[redis]           &lt;span class="c"&gt;# Redis publisher + rate limiting&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Current Version
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;10xscale-agentflow&lt;/code&gt; (core)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;v0.7.4&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;10xscale-agentflow-cli&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;v0.3.2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Added in v0.7.x:&lt;/strong&gt; multimodal support (images, audio, video), extended reasoning / chain-of-thought, 3-layer memory, callback and lifecycle hooks, agent skills, Vertex AI support, structured Pydantic outputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ Graph engine with nodes, edges, and conditional routing&lt;/li&gt;
&lt;li&gt;✅ Redis + PostgreSQL state checkpointing&lt;/li&gt;
&lt;li&gt;✅ Tool integration — local Python, MCP, optional adapters&lt;/li&gt;
&lt;li&gt;✅ Parallel tool execution&lt;/li&gt;
&lt;li&gt;✅ Lifecycle callbacks and graph hooks&lt;/li&gt;
&lt;li&gt;✅ Streaming + event publishing&lt;/li&gt;
&lt;li&gt;✅ Human-in-the-loop&lt;/li&gt;
&lt;li&gt;✅ Multimodal agents&lt;/li&gt;
&lt;li&gt;🚧 Remote node execution for distributed processing&lt;/li&gt;
&lt;li&gt;🚧 OpenTelemetry tracing&lt;/li&gt;
&lt;li&gt;🚧 More persistence backends (DynamoDB, etc.)&lt;/li&gt;
&lt;li&gt;🚧 Visual graph editor&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Privacy and Licensing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MIT License&lt;/strong&gt; — use freely in commercial products&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No data collection&lt;/strong&gt; — your conversations and agent data stay on your infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No per-call billing&lt;/strong&gt; — you pay for your LLM API and infra, not our licensing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy anywhere&lt;/strong&gt; — Docker, Kubernetes, AWS ECS, Cloud Run, Azure, Heroku&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Library&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/Agentflow" rel="noopener noreferrer"&gt;https://github.com/10xHub/Agentflow&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API &amp;amp; CLI&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/agentflow-cli" rel="noopener noreferrer"&gt;https://github.com/10xHub/agentflow-cli&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documentation&lt;/td&gt;
&lt;td&gt;&lt;a href="https://10xhub.github.io/agentflow-docs" rel="noopener noreferrer"&gt;https://10xhub.github.io/agentflow-docs&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPI Core&lt;/td&gt;
&lt;td&gt;&lt;a href="https://pypi.org/project/10xscale-agentflow/" rel="noopener noreferrer"&gt;https://pypi.org/project/10xscale-agentflow/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPI CLI&lt;/td&gt;
&lt;td&gt;&lt;a href="https://pypi.org/project/10xscale-agentflow-cli/" rel="noopener noreferrer"&gt;https://pypi.org/project/10xscale-agentflow-cli/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Issues &amp;amp; Requests&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/Agentflow/issues" rel="noopener noreferrer"&gt;https://github.com/10xHub/Agentflow/issues&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Discussions&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/Agentflow/discussions" rel="noopener noreferrer"&gt;https://github.com/10xHub/Agentflow/discussions&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://10xscale.ai/" rel="noopener noreferrer"&gt;10xScale&lt;/a&gt; and the community. MIT licensed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>langgraph</category>
    </item>
    <item>
      <title>TOON for LLMs: A Benchmark Performance Analysis</title>
      <dc:creator>Shudipto Trafder</dc:creator>
      <pubDate>Sat, 27 Dec 2025 15:36:50 +0000</pubDate>
      <link>https://dev.to/shudiptotrafder/toon-for-llms-a-comparative-performance-analysis-against-json-52am</link>
      <guid>https://dev.to/shudiptotrafder/toon-for-llms-a-comparative-performance-analysis-against-json-52am</guid>
      <description>&lt;p&gt;Every API call you make with JSON is costing you more than you think.&lt;/p&gt;

&lt;p&gt;I ran real-world extractions using Gemini 2.5 Flash, and the results were startling: JSON consistently used 30–40% more output tokens than TOON format. In one test, JSON consumed 471 output tokens while TOON used just 227 — a 51% reduction.&lt;/p&gt;

&lt;p&gt;But here’s where it gets interesting: &lt;strong&gt;TOON initially failed 70% of the time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After optimization, I achieved 100% parsing success and discovered something counterintuitive — it uses more prompt tokens, with TOON actually saves you money overall. When I tested structured outputs with Pydantic models, JSON required 389 output tokens versus TOON’s simpler encoding.&lt;/p&gt;

&lt;p&gt;The hidden goldmine? &lt;strong&gt;Tool/function calling.&lt;/strong&gt; That’s where TOON’s compact format shines brightest, slashing token costs in agentic workflows where responses become the next prompt.&lt;/p&gt;

&lt;p&gt;This isn’t theoretical. I’m sharing the actual prompts, parsing errors, token counts, and code that took TOON from a 70% failure rate to production-ready. Whether TOON beats JSON depends on your use case — and I have the data to prove exactly when.&lt;/p&gt;

&lt;p&gt;Let’s break down the numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Experiment #1: The Initial TOON Failure (70% Success Rate)
&lt;/h2&gt;

&lt;p&gt;I started with what seemed like a straightforward test: extracting structured job description data using TOON instead of JSON.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Setup:
&lt;/h3&gt;

&lt;p&gt;My prompt was simple — ask Gemini 2.5 Flash to extract role, skills, experience, location, and responsibilities from a job posting. For the output format, I did what seemed logical: I showed TOON’s encoded structure using the actual output format (essentially a drop-in replacement approach).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract Role, Primary Skills, Secondary Skills,
Minimum Experience, Maximum Experience,
Location, Employment Type, Summary, and Responsibilities

Job Description:
&amp;lt;JD Text&amp;gt;

Output in TOON format:

Role: ""
"Primary Skills"[2]: Python,JavaScript
"Secondary Skills"[2]: Responsibility,Communication
"Minimum Experience": ""
"Maximum Experience": ""
Location: ""
"Employment Type": ""
Summary: ""
Responsibilities[2]: Task A,Task B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what I suspected would work: By showing the encoded format with empty strings and generic placeholders, the model would understand the structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reality check: 70% failure rate.&lt;/strong&gt;&lt;br&gt;
The errors were telling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Error parsing TOON format for JD#2: Expected 10 values, but got 16&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Error parsing TOON format for JD#5: Missing colon after key&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model was confused about arrays. Sometimes it outputs &lt;code&gt;Skills: Python, JavaScript, React&lt;/code&gt; as a flat string. Other times, it attempted brackets but malformed the syntax.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hypothesis:&lt;/strong&gt; Maybe showing encoded/empty examples was the problem. The model needed to see real data patterns, especially for arrays.&lt;/p&gt;
&lt;h3&gt;
  
  
  Token Usage (Failed Attempts, 70% Success Rate):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt:&lt;/strong&gt; 729 tokens&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output:&lt;/strong&gt; 227 tokens&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success Rate:&lt;/strong&gt; ~30% initially, improved to 70% after adding two real examples with populated arrays&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Json Token Usages:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt:&lt;/strong&gt; 723 tokens&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output:&lt;/strong&gt; 471 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Insight:&lt;/strong&gt;&lt;br&gt;
TOON's compact syntax is unforgiving. JSON has redundancy (&lt;code&gt;{"key": "value"}&lt;/code&gt;) that helps models self-correct. TOON's &lt;code&gt;Key: value&lt;/code&gt; format offers no such safety net. The model needed concrete examples, not abstract templates.&lt;/p&gt;

&lt;p&gt;But 70% wasn't good enough for production. Time to fix this properly.&lt;/p&gt;


&lt;h2&gt;
  
  
  Experiment #2: Achieving 100% Parsing Success (And the Token Trade-off)
&lt;/h2&gt;

&lt;p&gt;I needed to fix the 70% success rate. The solution? Stop being minimalist with examples.&lt;/p&gt;

&lt;p&gt;Instead of showing encoded/empty structures, I gave the model a complete, realistic example with proper TOON formatting — especially for arrays.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Revised Prompt:
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract Role, Primary Skills, Secondary Skills,
Minimum Experience, Maximum Experience,
Location, Employment Type, Summary, and Responsibilities

Job Description:
&amp;lt;JD Text&amp;gt;

Output in TOON format. Example structure:

Role: "Senior Data Scientist"
Primary_Skills:

 [1]: "Machine Learning"
 [2]: "Statistical Analysis"
Secondary_Skills:
 [0]: "Big Data"
 [1]: "Cloud Platforms"
Minimum_Experience: "5 years"
Maximum_Experience: "10 years"
Location: "New York, NY or Remote"
Employment_Type: "Full-time"
Summary: "Lead data science initiatives"
Responsibilities:
 [0]: "Design ML models"
 [1]: "Analyze datasets"


Now provide the extraction in TOON format. Keep the format exactly
as shown above.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 100% parsing. No more malformed arrays. No more missing colons.&lt;/p&gt;

&lt;p&gt;But here's the catch—the prompt got heavier.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Token Comparison: TOON vs JSON
&lt;/h3&gt;

&lt;p&gt;Let me show you the actual numbers across the same 10 job descriptions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JSON Approach: Token Usage&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt tokens:&lt;/strong&gt; 723&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output tokens:&lt;/strong&gt; 471&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success rate:&lt;/strong&gt; 100% (JSON is forgiving)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;TOON Approach (Initial — 70% success)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt tokens:&lt;/strong&gt; 729&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output tokens:&lt;/strong&gt; 227 ✅ (51.8% reduction vs JSON)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Total:&lt;/strong&gt; 956 tokens (saves 238 tokens per request)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success rate:&lt;/strong&gt; 70% ❌&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;TOON Approach (Optimized — 100% success)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt tokens:&lt;/strong&gt; 802 ❌ (+11% vs JSON)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output tokens:&lt;/strong&gt; 455 ✅ (3.4% reduction vs JSON)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success rate:&lt;/strong&gt; 100% ✅&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;For basic extraction tasks, optimized TOON costs MORE than JSON.&lt;/p&gt;

&lt;p&gt;Yes, the output is slightly more compact (455 vs 471 tokens), but the verbose prompting needed to achieve 100% reliability completely erases any savings. In fact, you’re paying 5% more per request.&lt;/p&gt;

&lt;p&gt;So why am I still testing TOON?&lt;/p&gt;

&lt;p&gt;Because this experiment revealed something crucial: the baseline comparison is misleading. Real-world LLM applications don’t just extract data once — they use structured outputs for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Pydantic model validation (native SDK support)&lt;/li&gt;
&lt;li&gt; Tool/function calling (where output becomes input)&lt;/li&gt;
&lt;li&gt; Multi-turn agentic workflows (repeated serialization)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s where the math changes completely. Let me show you.&lt;/p&gt;


&lt;h2&gt;
  
  
  Experiment #3: Pydantic Models — Where the SDK Does the Heavy Lifting
&lt;/h2&gt;

&lt;p&gt;Here’s where things get interesting. Modern LLM SDKs have first-class support for structured outputs using Pydantic models. Instead of prompt engineering, you define a schema and let the SDK handle formatting.&lt;/p&gt;

&lt;p&gt;The key difference: You don’t need to explain the output format in your prompt — the SDK extracts it from your Pydantic model automatically.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Setup: Google’s GenAI SDK
&lt;/h3&gt;

&lt;p&gt;I used the same job extraction task, but this time with a Pydantic model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_mime_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JobModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what’s missing: No output format instructions. No examples. No “Output as JSON with these exact keys.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Become a member&lt;/strong&gt;&lt;br&gt;
The SDK injects the schema behind the scenes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Token Comparison: Pydantic JSON vs Manual TOON
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pydantic + JSON (SDK-Managed)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt tokens:&lt;/strong&gt; 647 ✅ (19.3% less than optimized TOON)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output tokens:&lt;/strong&gt; 389 ✅ (14.5% less than optimized TOON)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success rate:&lt;/strong&gt; 100% ✅&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Parsing:&lt;/strong&gt; Native (SDK returns typed Python objects)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Manual TOON (From Experiment #2)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt tokens:&lt;/strong&gt; 802 ❌&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output tokens:&lt;/strong&gt; 455 ❌&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success rate:&lt;/strong&gt; 100% ✅&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Parsing:&lt;/strong&gt; Custom (you write the parser)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The Brutal Takeaway
&lt;/h3&gt;

&lt;p&gt;For structured extraction with strong SDK support, Pydantic really shines. Native Pydantic integration delivers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  ✅ Cleaner prompts (~155 fewer prompt tokens)&lt;/li&gt;
&lt;li&gt;  ✅ Smaller outputs (~66 fewer output tokens)&lt;/li&gt;
&lt;li&gt;  ✅ No custom parsing logic&lt;/li&gt;
&lt;li&gt;  ✅ Built-in type validation&lt;/li&gt;
&lt;li&gt;  ✅ Parsed objects returned directly, ready to use&lt;/li&gt;
&lt;li&gt;  ✅ A much smoother developer experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of this, I’ll increasingly rely on Pydantic and native parsing support for structured extraction. It’s simply more reliable and maintainable than handling parsing and validation manually.&lt;/p&gt;

&lt;p&gt;That said, there’s one scenario where JSON’s verbosity becomes a genuine liability: tool calling in agentic workflows.&lt;br&gt;
That’s where TOON finally proves its worth.&lt;/p&gt;


&lt;h2&gt;
  
  
  Experiment #4: Tool Calling — Where TOON Finally Wins
&lt;/h2&gt;

&lt;p&gt;This is where everything clicked.&lt;/p&gt;

&lt;p&gt;In agentic workflows, your LLM doesn’t just extract data once — it calls tools, receives results, and uses those results to reason further. The tool’s response becomes part of the next prompt. And if that response is bloated with JSON syntax, you’re paying for it twice: once as output, once as input.&lt;/p&gt;

&lt;p&gt;The insight: Tool results are pure token waste. The model doesn’t need &lt;code&gt;{"key": "value"}&lt;/code&gt; ceremony—it needs the data, efficiently encoded.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Setup: Weather Agent with Function Calling
&lt;/h3&gt;

&lt;p&gt;I built a simple agent that calls a &lt;code&gt;get_current_weather&lt;/code&gt; function. The user asks for weather, the model calls the tool, the function returns data, and the model synthesizes a response.&lt;/p&gt;

&lt;p&gt;The critical moment: What format should &lt;code&gt;get_current_weather&lt;/code&gt; return?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version A: JSON Tool Response&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;72 F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forecast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;forecast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Returns JSON string
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Version B: TOON Tool Response&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;72 F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forecast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;forecast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Returns TOON-encoded string
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Main code&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the weather like in New York? Share next 15 days forecast as well.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_current_weather&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Result Token Usage:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Initial prompt tokens:&lt;/strong&gt; 152 (user message + tool definition)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Tool response tokens (becomes input):&lt;/strong&gt; 480 ✅ (24% reduction)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Model’s final output:&lt;/strong&gt; 384 (slightly longer, but reasonable)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Total tokens:&lt;/strong&gt; 1,016 ✅ (11.5% reduction overall)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why TOON Wins in Agentic Workflows
&lt;/h3&gt;

&lt;p&gt;Here’s the math that matters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single Tool Call&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  JSON approach: 632 tokens for tool result&lt;/li&gt;
&lt;li&gt;  TOON approach: 480 tokens for tool result&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Savings: 152 tokens per tool call (24%)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Multi-Turn Agent (5 tool calls)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  JSON approach: 632 × 5 = 3,160 tokens in tool results&lt;/li&gt;
&lt;li&gt;  TOON approach: 480 × 5 = 2,400 tokens in tool results&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Savings: 760 tokens (24%)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Compounding Effect
&lt;/h3&gt;

&lt;p&gt;Why this matters more than single extractions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Tool results are pure input tokens&lt;/strong&gt; — You pay for them every single time&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Verbosity multiplies&lt;/strong&gt; — JSON’s &lt;code&gt;{}: ,&lt;/code&gt; Syntax adds 20-30% overhead for nested data&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;No parsing penalty&lt;/strong&gt; — The model consumes TOON just as easily (we verified this in follow-up tests)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Scales with agent complexity&lt;/strong&gt; — More tools = more savings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference? Where the efficiency matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;After this test runs across four different scenarios, here’s what the data tells us:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TOON loses at single extractions.&lt;/strong&gt; Whether you’re doing manual prompting or using Pydantic models, JSON with SDK support is cleaner, cheaper, and more reliable. The 17.6% token savings from native schema integration beats TOON’s manual approach every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But TOON wins where it counts for agents: tool calling workflows.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your LLM’s output becomes the next prompt — when data cycles between model and functions repeatedly — TOON’s 24% reduction per tool call transforms from interesting to impactful. An agent making 20 tool calls saves 3,040 tokens per session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The decision matrix is simple:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Building a chatbot that extracts structured data? &lt;strong&gt;Use JSON + Pydantic.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Building an agent that calls tools 10+ times per session? &lt;strong&gt;Test TOON.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Building anything else? &lt;strong&gt;Profile first, optimize later.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;I’ve open-sourced all the experiments, prompts, and token measurements: &lt;a href="https://gist.github.com/the-m-u-s-h-r-o-o-m/080c9e697843339946850d5353e9343c" rel="noopener noreferrer"&gt;View complete code and results on GitHub Gist&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repository includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  ✅ All four experiment setups with actual prompts&lt;/li&gt;
&lt;li&gt;  ✅ Token usage logs for every test case&lt;/li&gt;
&lt;li&gt;  ✅ Side-by-side comparison scripts&lt;/li&gt;
&lt;li&gt;  ✅ The job descriptions I used for testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TOON isn’t magic — it’s math. And the math only works when token efficiency genuinely matters. For most applications, JSON’s ecosystem advantages outweigh the savings. But for token-heavy agentic workflows? TOON might just pay for itself.&lt;/p&gt;

&lt;p&gt;Now you have the data to decide.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>python</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
