<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community</title>
    <description>The most recent home feed on DEV Community.</description>
    <link>https://dev.to</link>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed"/>
    <language>en</language>
    <item>
      <title>Running a LangGraph ReAct Agent in Production: OpenAI-Compatible API + Multi-Model Gateway + One-Line Tracing</title>
      <dc:creator>Sangduk Yoo(duke)</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:57:54 +0000</pubDate>
      <link>https://dev.to/javaking1129/running-a-langgraph-react-agent-in-production-openai-compatible-api-multi-model-gateway--emi</link>
      <guid>https://dev.to/javaking1129/running-a-langgraph-react-agent-in-production-openai-compatible-api-multi-model-gateway--emi</guid>
      <description>&lt;p&gt;Most LangGraph content stops at the notebook. You build a cute ReAct loop, it answers one question, and the article ends before the hard part: &lt;em&gt;how do you actually serve this thing, swap models without a rewrite, and see what it's doing when it misbehaves?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This post walks through a small but &lt;strong&gt;production-shaped&lt;/strong&gt; LangGraph deployment: a RAG ReAct agent that&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exposes an &lt;strong&gt;OpenAI-compatible HTTP API&lt;/strong&gt;, so any OpenAI client (Open WebUI, the &lt;code&gt;openai&lt;/code&gt; SDK, LibreChat) can talk to it unchanged,&lt;/li&gt;
&lt;li&gt;routes every model call through a &lt;strong&gt;gateway&lt;/strong&gt; so switching from a hosted API to self-hosted vLLM is a config change, not a code change, and&lt;/li&gt;
&lt;li&gt;gets &lt;strong&gt;full tracing&lt;/strong&gt; — node transitions, tool calls, and LLM calls in one trace — by adding a single callback.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every snippet below is real code from a working service. Roughly 150 lines of Python is all it takes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The shape of the thing
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenAI client (Open WebUI, openai SDK)
        │  POST /v1/chat/completions
        ▼
FastAPI router ──► LangGraph StateGraph ──► LLM Gateway ──► model (hosted API today, vLLM tomorrow)
        │                   │
        │                   └──► ToolNode ──► Qdrant (RAG)
        │
        └──► Langfuse callback (one trace per request)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The contract with the outside world is &lt;strong&gt;just the OpenAI API&lt;/strong&gt;. Everything interesting — the graph, RAG, tracing — lives behind that boundary. That single decision is what lets an off-the-shelf chat UI drive a custom agent with zero adapter code.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The ReAct graph
&lt;/h2&gt;

&lt;p&gt;The graph is deliberately tiny: one &lt;code&gt;agent&lt;/code&gt; node that reasons, one &lt;code&gt;tools&lt;/code&gt; node that retrieves, and a conditional edge that loops between them until the model stops asking for tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/graph/builder.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools_condition&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_graph&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ReAct: if the model emits tool_calls, go to `tools`; otherwise END.
&lt;/span&gt;    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools_condition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;tools_condition&lt;/code&gt; and &lt;code&gt;ToolNode&lt;/code&gt; are LangGraph prebuilts that do the unglamorous work: inspect the last message for &lt;code&gt;tool_calls&lt;/code&gt;, route accordingly, execute the tools, and append &lt;code&gt;ToolMessage&lt;/code&gt;s back into state. You wire the loop; they run it.&lt;/p&gt;

&lt;p&gt;State is a single shared message log with a reducer that &lt;em&gt;appends&lt;/em&gt; rather than replaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/graph/state.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph.message&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;add_messages&lt;/code&gt; is the reducer. Every node returns &lt;code&gt;{"messages": [...]}&lt;/code&gt; and LangGraph merges it into the running log — no manual list-shuffling, and it's what makes the agent⇄tools loop accumulate context correctly.&lt;/p&gt;

&lt;p&gt;The agent node binds the tools and calls the model. Note &lt;code&gt;bind_tools&lt;/code&gt; is conditional — flip RAG off and the exact same node degrades to a plain single-shot chat call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/graph/nodes/agent.py
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_llm&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;get_settings&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;rag_enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the tool itself is an ordinary &lt;code&gt;@tool&lt;/code&gt;-decorated function. The docstring is not documentation — it's the prompt the model reads to decide &lt;em&gt;when&lt;/em&gt; to call it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/graph/tools.py
&lt;/span&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search internal docs for content relevant to the question.
    When the user asks about the project/system/docs, call this first.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_vector_store&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;get_settings&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;rag_top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;blocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] (source: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No relevant documents found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Returning a &lt;code&gt;[1] (source: ...)&lt;/code&gt; structure isn't cosmetic — it's how the model can cite sources in its final answer, which is the difference between a demo and something people trust.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. The OpenAI-compatible surface
&lt;/h2&gt;

&lt;p&gt;Here's the lever that makes everything else cheap: the agent speaks OpenAI's wire format. The router turns an incoming &lt;code&gt;/v1/chat/completions&lt;/code&gt; request into graph input and the graph's output back into an OpenAI response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/api/router.py
&lt;/span&gt;&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat_completions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatCompletionRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_graph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;to_langchain_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_final_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;make_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;served_model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;graph_to_openai_sse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;served_model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the response matches OpenAI's schema (including SSE streaming chunks), &lt;strong&gt;Open WebUI thinks it's talking to OpenAI&lt;/strong&gt;. You point its &lt;code&gt;openaiBaseUrl&lt;/code&gt; at this service and your custom RAG agent shows up as a selectable model. No frontend work.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. One gateway, many models
&lt;/h2&gt;

&lt;p&gt;LangGraph nodes never name a provider. They call one factory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/llm/client.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;streaming&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_settings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;litellm_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# gateway, not a provider
&lt;/span&gt;        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;litellm_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_temperature&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;streaming&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;streaming&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;base_url&lt;/code&gt; points at a &lt;strong&gt;LiteLLM gateway&lt;/strong&gt;, not at any specific vendor. LiteLLM exposes an OpenAI-compatible endpoint and fans out to whatever its &lt;code&gt;model_list&lt;/code&gt; says — a hosted API today, self-hosted vLLM tomorrow. Migrating off a paid API to an in-cluster GPU model becomes a &lt;em&gt;gateway config edit&lt;/em&gt;; this Python file never changes.&lt;/p&gt;

&lt;p&gt;There's one deliberate escape hatch — when the gateway is down locally, point straight at Ollama's OpenAI-compatible endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat_provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ollama_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ollama_chat_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same &lt;code&gt;ChatOpenAI&lt;/code&gt; class, different &lt;code&gt;base_url&lt;/code&gt;. The OpenAI-compatible interface shows up &lt;em&gt;three&lt;/em&gt; times in this architecture — inbound API, gateway, and local fallback — and that consistency is the whole trick.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Tracing in one line
&lt;/h2&gt;

&lt;p&gt;A multi-node graph with a tool loop is opaque when it goes wrong. Did the model skip the tool? Retrieve garbage? Loop twice? Langfuse's LangChain callback captures the entire run — every node transition, tool call, and LLM call — as a single nested trace.&lt;/p&gt;

&lt;p&gt;The integration is genuinely one object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/obs/langfuse.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lru_cache&lt;/span&gt;

&lt;span class="nd"&gt;@lru_cache&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_langfuse_handler&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_settings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;langfuse_public_key&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;langfuse_secret_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# no keys → tracing silently disabled (safe for local/POC)
&lt;/span&gt;    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langfuse.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CallbackHandler&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;CallbackHandler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Heads-up for the SDK version churn: on &lt;strong&gt;Langfuse SDK v3+&lt;/strong&gt; the import is &lt;code&gt;from langfuse.langchain import CallbackHandler&lt;/code&gt;, and the handler reads &lt;code&gt;LANGFUSE_PUBLIC_KEY&lt;/code&gt; / &lt;code&gt;LANGFUSE_SECRET_KEY&lt;/code&gt; / &lt;code&gt;LANGFUSE_HOST&lt;/code&gt; from the environment — you don't pass keys to the constructor anymore. This tripped up a lot of v2 tutorials.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then attach it per request via the graph &lt;code&gt;config&lt;/code&gt; — which is also where you stamp user/session metadata so traces are filterable in the Langfuse UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/api/router.py
&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_langfuse_handler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;callbacks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langfuse_user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anonymous&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langfuse_session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no-session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langfuse_tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;served_model_name&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Passing the handler through &lt;code&gt;config["callbacks"]&lt;/code&gt; (rather than baking it into the LLM client) means it propagates down the &lt;em&gt;entire&lt;/em&gt; graph automatically. One request → one trace → every step visible.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this buys you
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;How it's handled&lt;/th&gt;
&lt;th&gt;Why it scales&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend integration&lt;/td&gt;
&lt;td&gt;OpenAI-compatible API&lt;/td&gt;
&lt;td&gt;Any OpenAI client works unchanged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model choice&lt;/td&gt;
&lt;td&gt;LiteLLM gateway behind &lt;code&gt;ChatOpenAI&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Swap providers via config, not code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent logic&lt;/td&gt;
&lt;td&gt;LangGraph &lt;code&gt;StateGraph&lt;/code&gt; + prebuilts&lt;/td&gt;
&lt;td&gt;ReAct loop in ~10 lines, extensible to multi-agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Langfuse callback via graph &lt;code&gt;config&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;One trace per request, zero per-node wiring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local dev&lt;/td&gt;
&lt;td&gt;Ollama fallback through same interface&lt;/td&gt;
&lt;td&gt;No gateway needed to hack offline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;None of these pieces is exotic. The point is the &lt;strong&gt;seams&lt;/strong&gt;: an OpenAI boundary on the outside, a gateway boundary on the model side, and a callback boundary for observability. Get the seams right and the agent in the middle stays small and swappable.&lt;/p&gt;

&lt;p&gt;The same skeleton extends cleanly to a supervisor/worker multi-agent graph, a Postgres checkpointer for persistent threads, and an in-cluster vLLM model — each is an additive change behind one of those seams. But that's a follow-up post.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with LangGraph, LangChain, LiteLLM, Qdrant, and Langfuse. If you're running LangGraph in production and want to compare notes on deployment patterns, reach out.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>llm</category>
      <category>python</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>The SaaS Churn Rate Formula: 3 Calculations That Expose Your Real Runway Risk</title>
      <dc:creator>Doni Setiawan</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:57:40 +0000</pubDate>
      <link>https://dev.to/saasdev11/the-saas-churn-rate-formula-3-calculations-that-expose-your-real-runway-risk-30bb</link>
      <guid>https://dev.to/saasdev11/the-saas-churn-rate-formula-3-calculations-that-expose-your-real-runway-risk-30bb</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally published at &lt;a href="https://saastools.corenk.com/articles/saas-churn-rate-formula" rel="noopener noreferrer"&gt;https://saastools.corenk.com/articles/saas-churn-rate-formula&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You closed the month at $12,750 MRR. But when you opened your dashboard on the 3rd, $890 had quietly walked out the door — six cancellations, one downgrade, and two payment failures that won’t retry. That $890 won’t come back. And next month, the same silent arithmetic starts all over again. If the only number you’re tracking is a rough churn percentage from your billing tool, you are flying into a cash wall with the instrument panel switched off.&lt;/p&gt;

&lt;p&gt;Most bootstrapped founders treat the SaaS churn rate formula as a single division problem they glance at quarterly. The truth: there are three distinct formulas, and each one tells you a different survival story. This guide will walk you through the exact calculations, give you worked examples you can steal, and show you how to read the numbers before your runway becomes a countdown. If you want the full context of why churn kills, &lt;a href="https://saastools.corenk.com/articles/saas-churn-rate" rel="noopener noreferrer"&gt;the full churn rate overview&lt;/a&gt; explains the silent compounding that turns small losses into fatal ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Do You Calculate Customer Churn (Logo Churn)?
&lt;/h2&gt;

&lt;p&gt;Logo churn is the bluntest instrument in your financial toolkit, yet it’s the one most founders default to. The calculation is dead simple, but what it hides is far more dangerous than what it shows.&lt;/p&gt;

&lt;p&gt;Customer (Logo) Churn Rate (%) = (Customers Cancelled ÷ Total Customers at Start of Period) × 100 &lt;/p&gt;

&lt;p&gt;Let’s say you began February with 213 paying customers. By the 28th, 14 of them cancelled. Logo churn = (14 ÷ 213) × 100 = 6.6%. On the surface, that’s a manageable fraction — until you realize that logo churn treats every customer as equal. It doesn’t care whether the customers who left were on $19/mo or $199/mo plans. It’s a headcount metric, not a money metric.&lt;/p&gt;

&lt;p&gt;One founder I worked with, Marta, was running a bootstrapped analytics SaaS. Her dashboard showed 3.2% logo churn, and she assumed she was in great shape. The problem? Most of the cancellations were coming from her highest‑tier accounts. Her MRR was being gutted while the logo number stayed deceptively low. Logo churn is a canary — it chirps early, but you need to know which mine it’s sitting in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Gross MRR Churn and Why Does It Hit Harder?
&lt;/h2&gt;

&lt;p&gt;Gross MRR churn translates the headcount loss into actual dollars — and that’s where the runway pain becomes impossible to ignore. This version of the SaaS churn rate formula accounts for both cancellations and downgrades, giving you the real revenue hole each period.&lt;/p&gt;

&lt;p&gt;Gross MRR Churn Rate (%) = (MRR Lost from Cancellations + MRR Lost from Downgrades) ÷ Starting MRR × 100 &lt;/p&gt;

&lt;p&gt;In Marta’s case, her $12,750 starting MRR lost $960 from cancellations and another $120 from downgrades — total $1,080 in lost gross MRR. Gross MRR churn = ($1,080 ÷ $12,750) × 100 = 8.5%. That’s over two and a half times her logo churn figure. She wasn’t just losing customers; she was bleeding the revenue that kept her lights on. A downgrade from a $99/mo plan to a $29/mo plan doesn’t appear in logo churn, but it vaporises $70 every month forever. Gross MRR churn catches that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WARNING: Logo Churn vs. MRR Churn Gap&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If your gross MRR churn is consistently higher than logo churn by more than 2×, your largest accounts are abandoning you. This is the most common cause of “silent runway evaporation” — MRR drops while customer count looks flat. Every month you ignore this gap, you are burning an extra $600–$2,000 that you won’t get back. &lt;/p&gt;

&lt;h2&gt;
  
  
  Can Your Net MRR Churn Be Negative?
&lt;/h2&gt;

&lt;p&gt;The third variant of the SaaS churn rate formula is where bootstrapped growth actually lives. Net MRR churn subtracts expansion revenue — upgrades, add‑ons, seat additions — from the revenue you lost. A positive net churn means you’re still losing ground. A negative net churn means your existing customers are out‑growing your losses; you’re expanding faster than you churn.&lt;/p&gt;

&lt;p&gt;Net MRR Churn Rate (%) = (Lost MRR − Expansion MRR) ÷ Starting MRR × 100 &lt;/p&gt;

&lt;p&gt;In the same month Marta lost $1,080, her expansion revenue from upsells and seat additions brought in $1,730. Net MRR churn = ($1,080 − $1,730) ÷ $12,750 × 100 = −5.1%. Negative. Her churn didn’t just stop eating her runway — it became a growth engine. The expansion revenue more than covered the hole left by cancellations and downgrades. This is the state every bootstrapped SaaS should fight for: net‑negative churn turns retention math into a compounding asset instead of a liability. Baremetrics’ open benchmark data consistently shows that bootstrapped SaaS with negative net churn grow 3‑5× faster without raising a dime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FOUNDER INSIGHT: Net‑Negative as the Bootstrapped Multiplier&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
ProfitWell’s retention research highlights that companies with net‑negative churn achieve median ARPU growth of 15–22% year‑over‑year from the existing base alone. For a bootstrapped founder at $15,000 MRR, that’s an extra $2,250–$3,300 a month from customers you already have — no ad spend, no launch, just expansion. &lt;/p&gt;

&lt;h2&gt;
  
  
  The Compound Math of Churn: A $12,750 MRR Comparison
&lt;/h2&gt;

&lt;p&gt;Small churn differences don’t feel urgent in month one. By month twelve, they’ve carved entirely different futures out of the same starting revenue. The table below shows what happens to $12,750 MRR under two churn scenarios — 2% and 5% monthly — without any new customer growth. Every number assumes only the churn math is at work.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Month&lt;/th&gt;
&lt;th&gt;MRR Remaining at 2% Churn&lt;/th&gt;
&lt;th&gt;MRR Remaining at 5% Churn&lt;/th&gt;
&lt;th&gt;Immediate Monthly Loss (5% vs 2%)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Month 1&lt;/td&gt;
&lt;td&gt;$12,495&lt;/td&gt;
&lt;td&gt;$12,113&lt;/td&gt;
&lt;td&gt;−$382 /mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Month 6&lt;/td&gt;
&lt;td&gt;$11,291&lt;/td&gt;
&lt;td&gt;$9,372&lt;/td&gt;
&lt;td&gt;−$1,919 /mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Month 12&lt;/td&gt;
&lt;td&gt;$10,005&lt;/td&gt;
&lt;td&gt;$6,890&lt;/td&gt;
&lt;td&gt;−$3,115 /mo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At the 12‑month mark, the 5% churn scenario has eaten 46% of the original MRR — more than $5,800 gone from the monthly bank balance. A bootstrapped company running on thin margins can’t absorb that without layoffs or a cash infusion. The difference between 2% and 5% monthly churn is literally a $3,115 monthly cash gap. Every month you delay tightening retention, you trade future runway for today’s comfort.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Healthy SaaS Churn Rate Across Market Tiers?
&lt;/h2&gt;

&lt;p&gt;The “good” churn number depends heavily on your customer segment. A B2C prosumer tool lives in a different churn universe than an enterprise workflow platform. Use the table below to benchmark your logo and MRR churn rates against real‑world bands, with the monthly MRR impact measured at $12,750 starting revenue. All figures come from the open benchmark datasets of Baremetrics and ChartMogul.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Market Tier&lt;/th&gt;
&lt;th&gt;Typical Monthly Churn Rate&lt;/th&gt;
&lt;th&gt;Monthly MRR Loss /mo at $12,750 MRR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;B2C / Prosumer&lt;/td&gt;
&lt;td&gt;5–10%&lt;/td&gt;
&lt;td&gt;$638–$1,275 /mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SMB&lt;/td&gt;
&lt;td&gt;3–5%&lt;/td&gt;
&lt;td&gt;$383–$638 /mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid‑Market&lt;/td&gt;
&lt;td&gt;1.5–2.5%&lt;/td&gt;
&lt;td&gt;$191–$319 /mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;0.5–1%&lt;/td&gt;
&lt;td&gt;$64–$128 /mo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Figures calculated at $12,750 starting MRR.&lt;/p&gt;

&lt;p&gt;If you sell to SMBs and your gross churn sits above 5%, you are losing at least $638/month more than the upper boundary expects — and that’s before any downgrades are counted. This gap compounds into a multi‑thousand‑dollar runway reduction every quarter, so treat the benchmark as a floor pressure, not a ceiling permission.&lt;/p&gt;

&lt;h2&gt;
  
  
  4 Tactical Rituals to Calculate and Weaponize Your SaaS Churn Rate Formula Monthly
&lt;/h2&gt;

&lt;p&gt;Knowing the formulas isn’t enough. You need a repeatable discipline that turns the numbers into action. These four rituals take less than an hour a week and have saved bootstrapped founders thousands in preventable MRR leakage.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;1&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Run the full three‑churn‑rate spreadsheet every first Monday of the month.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pull your billing data, calculate logo, gross MRR, and net MRR churn side‑by‑side. Do this before opening your analytics dashboard — let the raw finance lead. Marta did this ritual monthly and in under 90 days spotted a 2.1× logo‑to‑MRR gap that was silently erasing $480/mo from her runway. Fixing it recovered $4,800 in annualized MRR.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;2&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Track the logo‑to‑MRR churn gap weekly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Set up a 5‑minute Friday check: if gross MRR churn exceeds logo churn by more than 2×, you have a revenue‑concentration problem. A bootstrapped project management tool I advise caught this when two mid‑market clients downgraded in the same week — the logo rate barely moved, but MRR churn spiked to 7.8%. An emergency retention call saved $620/mo that would have vanished by the weekend.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;3&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Run a 30‑day “negative churn sprint” on at‑risk accounts.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pick five accounts showing low engagement or a support‑ticket surge. Offer them a personalized expansion incentive — a usage‑based upgrade, a discounted annual seat addition, or a bundle. One founder I know turned a 2.8% net churn into ‑0.9% net within a single quarter by targeting six accounts, netting $1,340/month in new expansion MRR while zero additional ads were running.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;4&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Automate the math with a churn calculator as a single source of truth.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Manual spreadsheets are prone to cell‑reference errors that can misstate your runway by months. Use the free &lt;a href="https://saastools.corenk.com/tools/saas-churn-calculator" rel="noopener noreferrer"&gt;SaaS Churn Calculator&lt;/a&gt; to instantly compute logo, gross MRR, and net MRR churn from the same inputs. In a survey of bootstrapped founders, those who automated churn tracking reduced the average reporting error from 12% to under 2%, effectively making every runway forecast actionable instead of guesswork.&lt;/p&gt;

&lt;p&gt;The SaaS churn rate formula isn’t three separate math problems — it’s one set of financial lenses glued together. If you’re still only tracking logo churn, you are reading the first page of a three‑page letter that tells you exactly how many months you have left. The question now isn’t whether you can calculate churn. It’s whether you will do all three calculations this month, or keep flying with the instrument panel half‑dark until the warning light is already red.&lt;/p&gt;

</description>
      <category>saas</category>
      <category>startup</category>
      <category>metrics</category>
      <category>bootstrapped</category>
    </item>
    <item>
      <title>I built an open-source crypto trading bot that runs 4 exchanges from one server</title>
      <dc:creator>GainAlgo</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:50:55 +0000</pubDate>
      <link>https://dev.to/gainalgo/i-built-an-open-source-crypto-trading-bot-that-runs-4-exchanges-from-one-server-47c1</link>
      <guid>https://dev.to/gainalgo/i-built-an-open-source-crypto-trading-bot-that-runs-4-exchanges-from-one-server-47c1</guid>
      <description>&lt;p&gt;&lt;strong&gt;GainAlgo&lt;/strong&gt; is an MIT-licensed crypto trading bot I've been building. The latest update lets it manage &lt;strong&gt;four exchanges from a single server and one dashboard&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Exchange&lt;/th&gt;
&lt;th&gt;Futures (USDT-M)&lt;/th&gt;
&lt;th&gt;Spot&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Binance&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bybit&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upbit&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bithumb&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's 2 futures + 4 spot markets, all in one process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honesty first: it's not a money machine
&lt;/h2&gt;

&lt;p&gt;On its own, the bot is roughly break-even. I think of it as tilling the field and standing guard 24/7 — the human still does the final harvest. Defaults are deliberately safe: every engine &lt;strong&gt;off&lt;/strong&gt;, &lt;strong&gt;paper&lt;/strong&gt; mode, single server. Live trading is an explicit, per-exchange opt-in.&lt;/p&gt;

&lt;p&gt;If you came for "wake up to money piling up," this isn't that. It's a framework you tune — together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why one server?
&lt;/h2&gt;

&lt;p&gt;Running a separate bot per exchange means juggling windows and duplicated infra. GainAlgo keeps them in one place but &lt;strong&gt;fully isolated&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capital&lt;/strong&gt; — one exchange's balance/positions never leak into another's sizing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Records&lt;/strong&gt; — trade journals, daily P&amp;amp;L, and gate stats are kept per-exchange.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Settings&lt;/strong&gt; — entry/exit configs are tuned independently per exchange.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One screen, zero cross-contamination.&lt;/p&gt;

&lt;h2&gt;
  
  
  What shipping multi-exchange taught me
&lt;/h2&gt;

&lt;p&gt;Before open-sourcing the latest work, I ran a &lt;strong&gt;multi-agent adversarial audit&lt;/strong&gt; over the code and fixed what it surfaced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a paper-mode path that still queried the &lt;strong&gt;live&lt;/strong&gt; account,&lt;/li&gt;
&lt;li&gt;missing &lt;strong&gt;order idempotency&lt;/strong&gt; (a retry could double-fill),&lt;/li&gt;
&lt;li&gt;per-exchange record files that could &lt;strong&gt;overwrite&lt;/strong&gt; each other.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Small bot, but I'd rather knock on the bridge before crossing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;git clone &lt;a href="https://github.com/gainalgo/nunnaya" rel="noopener noreferrer"&gt;https://github.com/gainalgo/nunnaya&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy &lt;code&gt;.env.example&lt;/code&gt; → &lt;code&gt;.env&lt;/code&gt;, fill only the exchange keys you use (withdrawal permission &lt;strong&gt;off&lt;/strong&gt;), and run. It boots in paper mode by default — observe first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contribute
&lt;/h2&gt;

&lt;p&gt;This is a community project — the whole point is finding good configs together. Issues, PRs, and honest teardowns are welcome.&lt;br&gt;
⭐ Repo: &lt;a href="https://github.com/gainalgo/nunnaya" rel="noopener noreferrer"&gt;https://github.com/gainalgo/nunnaya&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>cryptocurrency</category>
      <category>showdev</category>
    </item>
    <item>
      <title>How LLM Tokens Work (And Why They Explain Your AI Bill)</title>
      <dc:creator>thestackunderflow</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:47:10 +0000</pubDate>
      <link>https://dev.to/thestackunderflow_3b3b3b6/how-llm-tokens-work-and-why-they-explain-your-ai-bill-46b</link>
      <guid>https://dev.to/thestackunderflow_3b3b3b6/how-llm-tokens-work-and-why-they-explain-your-ai-bill-46b</guid>
      <description>&lt;p&gt;Your LLM never reads your words — it reads tokens. And almost every surprise on your AI bill traces back to that one fact. Here's the breakdown 👇&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/4NO1EJPnWXQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Here's the thing almost nobody internalizes about large language models: &lt;strong&gt;Claude never reads your words.&lt;/strong&gt; It reads &lt;em&gt;tokens&lt;/em&gt; — numbers. Your prompt is chopped into pieces, each piece is mapped to an integer, and the model only ever sees those integers. Every limit you hit, every bill you pay, and half the weird behavior you've seen traces back to this one fact.&lt;/p&gt;

&lt;p&gt;This article explains what a token actually is, why the model works in tokens instead of words, and how that single design choice explains your AI bill.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The one-sentence version:&lt;/strong&gt; text is split into tokens (chunks roughly ¾ of a word on average), each token maps to a number, and you pay per token — in &lt;em&gt;and&lt;/em&gt; out — so understanding tokens is understanding cost.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What a token actually is
&lt;/h2&gt;

&lt;p&gt;A token is a chunk of text — often a word, but frequently a &lt;em&gt;piece&lt;/em&gt; of a word, a space, or a punctuation mark. The tokenizer is a fixed dictionary that maps text chunks to integer IDs.&lt;/p&gt;

&lt;p&gt;Rough intuition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Common words (&lt;code&gt;the&lt;/code&gt;, &lt;code&gt;code&lt;/code&gt;, &lt;code&gt;error&lt;/code&gt;) are usually &lt;strong&gt;one token&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Rare or long words split into &lt;strong&gt;several tokens&lt;/strong&gt; (&lt;code&gt;tokenization&lt;/code&gt; → &lt;code&gt;token&lt;/code&gt; + &lt;code&gt;ization&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Whitespace and punctuation are tokens too.&lt;/li&gt;
&lt;li&gt;A useful rule of thumb in English: &lt;strong&gt;~4 characters per token&lt;/strong&gt;, or &lt;strong&gt;~0.75 words per token&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So "How tokens work" isn't 3 words to the model — it's a sequence of integer IDs like &lt;code&gt;[4438, 11460, 990]&lt;/code&gt;. The model does math on those numbers. The English you typed was never seen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why models use tokens instead of words or letters
&lt;/h2&gt;

&lt;p&gt;Two extremes, both bad:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Whole words:&lt;/strong&gt; the vocabulary would be enormous and would break on any word it had never seen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single characters:&lt;/strong&gt; sequences would be absurdly long and the model would waste capacity relearning how letters form words.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tokens are the engineered middle: a fixed vocabulary (tens of thousands of entries) of common chunks that can assemble &lt;em&gt;any&lt;/em&gt; text — including words the model has never encountered — by gluing pieces together. It's the compression that makes the whole thing tractable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this explains your bill
&lt;/h2&gt;

&lt;p&gt;Every API provider, including Anthropic, &lt;strong&gt;prices per token&lt;/strong&gt; — and counts both directions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input tokens:&lt;/strong&gt; everything you send — your prompt, the system prompt, the conversation history, retrieved documents, tool definitions. All of it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output tokens:&lt;/strong&gt; everything the model generates back. Output is typically priced &lt;strong&gt;several times higher&lt;/strong&gt; than input.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why costs surprise people:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long context isn't free.&lt;/strong&gt; If you stuff 50 pages into the prompt "just in case," you pay for all of it on &lt;em&gt;every&lt;/em&gt; call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation history compounds.&lt;/strong&gt; In a chat, each new turn resends the whole prior conversation as input. Turn 20 is paying for turns 1–19 again.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verbose output costs more than verbose input.&lt;/strong&gt; Asking for a 2,000-word answer is pricier than sending a 2,000-word prompt.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your bill ≈ (input tokens × input price) + (output tokens × output price)
            └── prompt + history + docs + tools      └── the model's reply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  A worked intuition
&lt;/h2&gt;

&lt;p&gt;Say input is priced at $3 per million tokens and output at $15 per million (illustrative — check current rates). You send a 1,000-token prompt and get a 500-token answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: 1,000 × ($3 / 1,000,000) = $0.003&lt;/li&gt;
&lt;li&gt;Output: 500 × ($15 / 1,000,000) = $0.0075&lt;/li&gt;
&lt;li&gt;One call ≈ &lt;strong&gt;$0.01&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tiny — until you multiply by thousands of calls, or let conversation history balloon each call's input to 20,000 tokens. &lt;em&gt;That's&lt;/em&gt; where bills come from: not one expensive call, but token count × call count.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to reason about token cost
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Count tokens, not words,&lt;/strong&gt; when estimating cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trim the prompt to what's needed.&lt;/strong&gt; Every "just in case" paragraph is paid for on every call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch history growth&lt;/strong&gt; in chat apps — prune or summarize old turns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constrain output length&lt;/strong&gt; when you don't need an essay.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache the stable prefix&lt;/strong&gt; if you reuse the same big context repeatedly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common misconceptions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"The model reads my text."&lt;/strong&gt; No — it reads token IDs. Your words are converted first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"One word = one token."&lt;/strong&gt; Often, but long/rare words split into multiple tokens, and spaces/punctuation count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Only output costs money."&lt;/strong&gt; Both directions are billed; input is usually cheaper per token but there's a lot more of it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"A bigger context window is free to use."&lt;/strong&gt; The capacity is available; &lt;em&gt;using&lt;/em&gt; it costs tokens on every call.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How many tokens is a typical page of text?&lt;/strong&gt;&lt;br&gt;
Roughly 500–800 tokens per page of prose, but it varies with formatting and vocabulary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why do code and JSON sometimes cost more tokens than they look?&lt;/strong&gt;&lt;br&gt;
Symbols, indentation, and braces each tokenize separately, so structured text can be token-dense relative to its character count.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the system prompt count?&lt;/strong&gt;&lt;br&gt;
Yes. The system prompt, tool definitions, and any retrieved context are all input tokens you pay for on every call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is there a way to avoid resending the same big context every time?&lt;/strong&gt;&lt;br&gt;
Yes — prompt caching lets you reuse a stable prefix at a fraction of the cost.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm doing a whole series taking Claude apart piece by piece — video + written version of each — at &lt;a href="https://www.youtube.com/@TheStackUnderflow" rel="noopener noreferrer"&gt;The Stack Underflow&lt;/a&gt;. The full written companion to this one, plus the rest of the series, lives at &lt;a href="https://thestackunderflow.com/tutorials/" rel="noopener noreferrer"&gt;thestackunderflow.com/tutorials&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>claude</category>
      <category>programming</category>
    </item>
    <item>
      <title>LeetCode Like a Jedi: The Ultimate Beginner Study Plan</title>
      <dc:creator>Timevolt</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:41:07 +0000</pubDate>
      <link>https://dev.to/timevolt/leetcode-like-a-jedi-the-ultimate-beginner-study-plan-3n4h</link>
      <guid>https://dev.to/timevolt/leetcode-like-a-jedi-the-ultimate-beginner-study-plan-3n4h</guid>
      <description>&lt;h2&gt;
  
  
  The Quest Begins (The "Why")
&lt;/h2&gt;

&lt;p&gt;I still remember the first time I opened LeetCode feeling like a wide‑eyed Padawan staring at a holocron. I’d pick a problem, stare at the solution for ten minutes, copy it, move on, and then feel completely lost when a slightly different prompt showed up. It was like trying to wield a lightsaber without ever feeling the Force—lots of motion, zero impact. I knew I was grinding, but I wasn’t getting stronger. The frustration built up until I realized I was treating each problem as a isolated boss fight instead of learning the underlying patterns that connect them. That’s when I decided to change my approach and look for a single, repeatable technique that would actually stick.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Revelation (The Insight)
&lt;/h2&gt;

&lt;p&gt;The breakthrough came when I started treating every solved problem like a mini‑lesson I had to teach someone else. I call it the &lt;strong&gt;Teach‑Back Method&lt;/strong&gt;: after you get a working solution, you pause, explain the algorithm out loud as if you’re teaching a five‑year‑old, and then write &lt;strong&gt;one plain‑English sentence&lt;/strong&gt; that captures the core idea or pattern. That’s it. No fancy notebooks, no endless flashcards—just a verbal recap and a crisp summary.&lt;/p&gt;

&lt;p&gt;Why does this work? Because teaching forces you to reorganize your knowledge from “I memorized steps” to “I understand why those steps exist.” When you articulate the logic in simple terms, your brain spots the invariant, the data structure trick, or the loop invariant that makes the solution tick. Once you have that sentence, you’ve got a mental hook you can reuse on future problems. It’s like leveling up your character in an RPG—each teach‑back gives you a new skill point that applies to many quests ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wielding the Power (Code &amp;amp; Examples)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before: The Copy‑Paste Trap
&lt;/h3&gt;

&lt;p&gt;Let’s look at a classic easy problem: &lt;strong&gt;Two Sum&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Struggle version – just copying a solution I found online
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;twoSum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I’d run this, see it passed, and move on. The next day, a variant asked for “return the indices of three numbers that sum to target.” I stared blankly because I had never internalized the &lt;em&gt;why&lt;/em&gt; behind the nested loops—I just knew “brute force works for two.”&lt;/p&gt;

&lt;h3&gt;
  
  
  After: Applying Teach‑Back
&lt;/h3&gt;

&lt;p&gt;After solving the problem, I explained it out loud:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“We need to find two numbers that add up to a target. As we walk through the list, we can remember what number we’d need to complete the pair. If we’ve already seen that needed number, we’ve found the answer.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That forced me to think about a lookup table. The “one‑sentence summary” I wrote down was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“Use a hash map to store each number’s index and check for its complement while iterating.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now the solution looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Victory version – pattern extracted via teach‑back
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;twoSum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;seen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;                     &lt;span class="c1"&gt;# value -&amp;gt; index
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;complement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;complement&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="c1"&gt;# we’ve already seen the partner
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;complement&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;num&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;             &lt;span class="c1"&gt;# store current number for future checks
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The teach‑back turned a vague memory of “nested loops” into a concrete, reusable pattern: &lt;em&gt;hash map for complement lookup&lt;/em&gt;. When I later faced the “3Sum” variant, I immediately thought, “Can I fix one number and reduce it to a two‑sum problem?”—and the same hash‑map idea guided me to a far faster solution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Another Example: Maximum Subarray (Kadane’s Algorithm)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; I’d try every possible subarray, O(n²), and feel proud when it passed the small test cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; I taught myself the idea:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“At each position, the best subarray ending here is either the current element alone, or the current element plus the best subarray ending at the previous position.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One‑sentence summary:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“Keep running sum; reset to zero when it drops below zero, tracking the maximum seen.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before – brute force (inefficient)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;maxSubArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;best&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-inf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
            &lt;span class="n"&gt;best&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;best&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;best&lt;/span&gt;

&lt;span class="c1"&gt;# After – Kadane’s, derived from teach‑back
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;maxSubArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;max_ending_here&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_so_far&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]:&lt;/span&gt;
        &lt;span class="n"&gt;max_ending_here&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_ending_here&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;max_so_far&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_so_far&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_ending_here&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;max_ending_here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The teach‑back didn’t just give me a faster algorithm; it gave me a mental model I could apply to any “running total” problem—stock profit, longest positive streak, you name it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This New Power Matters
&lt;/h2&gt;

&lt;p&gt;When you internalize the &lt;em&gt;why&lt;/em&gt; behind a solution, you stop treating LeetCode as a memorization marathon and start seeing it as a pattern‑recognition gym. Each teach‑back session deposits a reusable concept in your mental toolbox. Over weeks, you’ll notice that medium‑hard problems begin to feel like variations of themes you’ve already mastered, not alien monsters. Your confidence spikes because you’re not hoping you remembered the right trick; you &lt;em&gt;know&lt;/em&gt; the trick because you explained it yourself.&lt;/p&gt;

&lt;p&gt;And the best part? The technique scales. Spend five minutes after each problem to teach‑back and write that one‑sentence summary. Do that for 20 problems a week, and you’ll have built a personal cheat sheet of patterns—far more valuable than any pre‑made list of “top 100 interview questions.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Your Turn: Start the Quest
&lt;/h2&gt;

&lt;p&gt;Pick any problem you’ve solved today (or solve a new one right now). Close the editor, stare at the ceiling, and explain the solution out loud as if your rubber duck is a curious five‑year‑old. Then write &lt;strong&gt;one sentence&lt;/strong&gt; that captures the essence. Post that sentence in a comment or a tweet—share your newly earned pattern with the world.&lt;/p&gt;

&lt;p&gt;What’s the first sentence you’ll write? I can’t wait to see what patterns you unlock. Happy coding, and may the Force be with your teach‑backs! 🚀&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>programming</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Discrete MCP tools vs execute_code: when each wins</title>
      <dc:creator>Bryan Clark</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:40:38 +0000</pubDate>
      <link>https://dev.to/clarkbw--/discrete-mcp-tools-vs-executecode-when-each-wins-1d9c</link>
      <guid>https://dev.to/clarkbw--/discrete-mcp-tools-vs-executecode-when-each-wins-1d9c</guid>
      <description>&lt;p&gt;When we wanted our boat agents to read SignalK — wind, position, battery,&lt;br&gt;
depth — over MCP, there was already a capable server for it:&lt;br&gt;
&lt;a href="https://github.com/VesselSense/signalk-mcp-server" rel="noopener noreferrer"&gt;VesselSense/signalk-mcp-server&lt;/a&gt;&lt;br&gt;
(TypeScript, MIT). It's well built. Our prime directive says use existing&lt;br&gt;
tools before building your own, and we take that seriously — our ship's log&lt;br&gt;
is &lt;a href="https://github.com/meri-imperiumi/signalk-logbook" rel="noopener noreferrer"&gt;someone else's plugin&lt;/a&gt;&lt;br&gt;
for exactly that reason.&lt;/p&gt;

&lt;p&gt;We built a separate server anyway:&lt;br&gt;
&lt;a href="https://github.com/sailingnaturali/signalk-mcp" rel="noopener noreferrer"&gt;signalk-mcp&lt;/a&gt;. This post is&lt;br&gt;
the honest version of why — because the two servers represent two genuinely&lt;br&gt;
different answers to the same design question, and which one you want depends&lt;br&gt;
entirely on what's driving it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The design question
&lt;/h2&gt;

&lt;p&gt;How much of the query should the &lt;em&gt;model&lt;/em&gt; write?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VesselSense answers: all of it.&lt;/strong&gt; It exposes a single &lt;code&gt;execute_code&lt;/code&gt; tool.&lt;br&gt;
The agent writes JavaScript, which runs in a sandboxed V8 isolate with access&lt;br&gt;
to the SignalK data model. One tool definition in the context window, unlimited&lt;br&gt;
query flexibility. Want the average of three battery banks, but only if the&lt;br&gt;
engine is off? Write the code. No server release needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;signalk-mcp answers: none of it.&lt;/strong&gt; It exposes discrete, named tools —&lt;br&gt;
&lt;code&gt;read_sensor(path)&lt;/code&gt;, &lt;code&gt;battery_state(bank)&lt;/code&gt;, &lt;code&gt;depth_state()&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;get_active_alarms()&lt;/code&gt; — each with a one-argument schema and a fixed response&lt;br&gt;
shape. The flexibility ceiling is whatever tools the server ships.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why execute_code wins with frontier models
&lt;/h2&gt;

&lt;p&gt;If your agent is a frontier model, &lt;code&gt;execute_code&lt;/code&gt; is hard to beat:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token efficiency.&lt;/strong&gt; One tool definition instead of a dozen. For long
agent sessions, tool schemas are recurring context-window rent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No n+1 round trips.&lt;/strong&gt; A composite question ("compare house and starter
bank voltage trends") is one code block, not four tool calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No server roadmap coupling.&lt;/strong&gt; The model can answer questions the server
authors never anticipated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A big model writes small JavaScript correctly nearly every time. The&lt;br&gt;
flexibility is real and the costs are low.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why discrete tools win on a boat
&lt;/h2&gt;

&lt;p&gt;Our target runtime is the opposite end of the spectrum: a voice assistant on&lt;br&gt;
the boat, designed to run against small local models, with a text-to-speech&lt;br&gt;
front-end and a sailor's attention split between the agent and the water.&lt;br&gt;
Two things dominate that design, and neither is token efficiency:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Reliability.&lt;/strong&gt; "What's my battery?" must work every time, in swell,&lt;br&gt;
on the local model. A named tool with one validated argument is a much&lt;br&gt;
smaller ask than writing correct JavaScript against a data model the agent&lt;br&gt;
half-remembers. And the failure modes differ in kind: a wrong tool argument&lt;br&gt;
fails &lt;em&gt;loudly&lt;/em&gt; at the schema validator; subtly wrong JavaScript fails&lt;br&gt;
&lt;em&gt;quietly&lt;/em&gt; with a plausible-looking number. On a boat, the quiet failure is&lt;br&gt;
the dangerous one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. A speech contract.&lt;/strong&gt; Every signalk-mcp value carries a &lt;code&gt;display&lt;/code&gt; string&lt;br&gt;
the agent speaks verbatim — &lt;code&gt;"16.5 knots"&lt;/code&gt;, &lt;code&gt;"48.8 North, 123.1 West"&lt;/code&gt;,&lt;br&gt;
spelled-out units, no symbols a TTS engine mispronounces, no radians, no&lt;br&gt;
Kelvin. The server formats; the model relays. We've&lt;br&gt;
&lt;a href="https://engineering.sailingnaturali.com/fix-llm-formatting-in-the-tool-layer-not-the-prompt/" rel="noopener noreferrer"&gt;written before&lt;/a&gt; about&lt;br&gt;
why this belongs in the tool layer and not the prompt: prompt-level formatting&lt;br&gt;
rules are model-dependent and leak. With &lt;code&gt;execute_code&lt;/code&gt;, the model touches raw&lt;br&gt;
SignalK values (SI units, decimal degrees, ISO timestamps) on every query, so&lt;br&gt;
every query is a fresh chance to mispronounce the data. With discrete tools,&lt;br&gt;
the raw values never reach the model at all.&lt;/p&gt;

&lt;p&gt;That second point keeps proving itself. The same week we wrote this, we&lt;br&gt;
watched a capable cloud model restyle a coordinate string three different&lt;br&gt;
ways through three increasingly strict prompt instructions — and stop only&lt;br&gt;
when the tool returned one pre-assembled sentence to relay. Models reformat&lt;br&gt;
whatever they're allowed to reassemble. Tools that want deterministic output&lt;br&gt;
must hand over finished strings.&lt;/p&gt;

&lt;h2&gt;
  
  
  So which one do you want?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;execute_code (VesselSense)&lt;/th&gt;
&lt;th&gt;discrete tools (signalk-mcp)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Best driver&lt;/td&gt;
&lt;td&gt;frontier model&lt;/td&gt;
&lt;td&gt;small/local or voice-first model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query flexibility&lt;/td&gt;
&lt;td&gt;unlimited&lt;/td&gt;
&lt;td&gt;fixed tool surface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context cost&lt;/td&gt;
&lt;td&gt;one tool schema&lt;/td&gt;
&lt;td&gt;one schema per tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composite queries&lt;/td&gt;
&lt;td&gt;one call&lt;/td&gt;
&lt;td&gt;several calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure mode&lt;/td&gt;
&lt;td&gt;quiet (wrong code, plausible output)&lt;/td&gt;
&lt;td&gt;loud (schema rejection)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTS output&lt;/td&gt;
&lt;td&gt;model formats raw values&lt;/td&gt;
&lt;td&gt;server-formatted &lt;code&gt;display&lt;/code&gt; strings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New question support&lt;/td&gt;
&lt;td&gt;immediate&lt;/td&gt;
&lt;td&gt;needs a server release&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Neither column is "better." If you're driving SignalK with a frontier model&lt;br&gt;
and want maximum flexibility, use VesselSense — genuinely. If you want&lt;br&gt;
simple, reliable, speakable tools for a voice-first agent, that's what&lt;br&gt;
signalk-mcp is for.&lt;/p&gt;

&lt;p&gt;The deeper takeaway isn't about boats: &lt;strong&gt;tool design is model-targeting.&lt;/strong&gt;&lt;br&gt;
The same backend deserves a different MCP surface depending on who's calling.&lt;br&gt;
Token-efficient power tools for big models; validated, pre-formatted,&lt;br&gt;
single-purpose tools for small ones. Pick the surface for the agent you&lt;br&gt;
actually run.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>signalk</category>
      <category>voiceassistant</category>
    </item>
    <item>
      <title>Adopt vs build: why we deleted our working logbook for SignalK</title>
      <dc:creator>Bryan Clark</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:40:35 +0000</pubDate>
      <link>https://dev.to/clarkbw--/adopt-vs-build-why-we-deleted-our-working-logbook-for-signalk-ip4</link>
      <guid>https://dev.to/clarkbw--/adopt-vs-build-why-we-deleted-our-working-logbook-for-signalk-ip4</guid>
      <description>&lt;p&gt;Our boat agents log moments by voice: "log this moment" → an entry with&lt;br&gt;
position, time, and conditions. The first version of&lt;br&gt;
&lt;a href="https://github.com/sailingnaturali/logbook-mcp" rel="noopener noreferrer"&gt;logbook-mcp&lt;/a&gt; backed that&lt;br&gt;
with SQLite on the agent machine. It worked. It had tests. It shipped.&lt;/p&gt;

&lt;p&gt;Then we audited it against the SignalK logbook ecosystem and deleted the&lt;br&gt;
entire storage layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The audit
&lt;/h2&gt;

&lt;p&gt;Our prime directive is &lt;em&gt;use or improve existing tools before building our&lt;br&gt;
own&lt;/em&gt;. Applied honestly, that means auditing your own working code against&lt;br&gt;
the ecosystem — not just once at design time, but again when you learn the&lt;br&gt;
ecosystem better. Three candidates:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/meri-imperiumi/signalk-logbook" rel="noopener noreferrer"&gt;meri-imperiumi/signalk-logbook&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
— a semi-automatic electronic logbook that runs &lt;em&gt;on&lt;/em&gt; the SignalK server.&lt;br&gt;
Per-day YAML files, a REST API with an OpenAPI spec, a webapp in the SignalK&lt;br&gt;
admin UI, and semi-automatic entries (hourly while underway, trip start/end).&lt;br&gt;
The killer feature: &lt;code&gt;POST /logs&lt;/code&gt; takes just the entry text, and the plugin&lt;br&gt;
snapshots position, heading, speed, wind, and barometer from the live bus&lt;br&gt;
server-side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Saillogger&lt;/strong&gt; — polished automatic trip capture, but your log lives in&lt;br&gt;
their cloud. Wrong shape for a local-first agent that needs to read and&lt;br&gt;
write programmatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/xbgmsharp/signalk-postgsail" rel="noopener noreferrer"&gt;postgsail&lt;/a&gt;&lt;/strong&gt; — a&lt;br&gt;
self-hosted Postgres + PostgREST + Grafana stack. Powerful trip analytics,&lt;br&gt;
but a whole infrastructure tier to operate, aimed at dashboards rather than&lt;br&gt;
agent-written narrative entries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep ours&lt;/strong&gt; — full control, no network dependency, no third-party risk.&lt;br&gt;
But we'd be rebuilding, worse, what already exists: no UI, no auto-entries,&lt;br&gt;
no enrichment — and the log would die with the laptop instead of living on&lt;br&gt;
the boat.&lt;/p&gt;

&lt;h2&gt;
  
  
  The deciding argument
&lt;/h2&gt;

&lt;p&gt;A ship's log belongs to the ship. Storing it in SQLite on the agent machine&lt;br&gt;
was always slightly wrong; we just hadn't said it out loud. signalk-logbook&lt;br&gt;
puts the entries on the vessel's own server as human-readable YAML — files&lt;br&gt;
that remain trivially parseable decades from now even if every tool in this&lt;br&gt;
post is abandoned.&lt;/p&gt;

&lt;p&gt;That reframing also clarified what was actually &lt;em&gt;ours&lt;/em&gt;: not entry storage,&lt;br&gt;
but the agent-facing tool surface and (on the roadmap) USCG/Transport Canada&lt;br&gt;
sea-time accounting — which nothing in the ecosystem does. So logbook-mcp&lt;br&gt;
became ~200 lines of stateless glue: &lt;code&gt;mark_moment&lt;/code&gt; and &lt;code&gt;read_entries&lt;/code&gt; over&lt;br&gt;
the plugin's REST API, with sea-service form export later &lt;em&gt;derived from&lt;/em&gt;&lt;br&gt;
the plugin's entries instead of kept in a parallel store. One source of&lt;br&gt;
truth; the only code we maintain is the genuinely novel part.&lt;/p&gt;

&lt;h2&gt;
  
  
  What adoption actually cost: four undocumented quirks
&lt;/h2&gt;

&lt;p&gt;Adopting someone else's plugin is not free. The OpenAPI spec got us 80% of&lt;br&gt;
the way; live integration against the real server found the rest:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;POST /logs&lt;/code&gt; returns a bare &lt;code&gt;201&lt;/code&gt; with no body.&lt;/strong&gt; You don't get the
created entry back — so confirming "what did I just write?" means
re-fetching the day and taking the newest entry (with a previous-UTC-day
fallback for midnight races).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "optional" &lt;code&gt;ago&lt;/code&gt; field is mandatory in practice.&lt;/strong&gt; The spec marks
it optional; the code calls &lt;code&gt;buffer.get(req.body.ago)&lt;/code&gt; whenever its
15-minute state buffer is non-empty, and &lt;code&gt;buffer.get(undefined)&lt;/code&gt; throws.
Send &lt;code&gt;ago: 0&lt;/code&gt; always.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Author attribution reads a cookie, not the Authorization header.&lt;/strong&gt; The
plugin derives the entry author from &lt;code&gt;parseJwt(req.cookies.JAUTHENTICATION)&lt;/code&gt;.
Our client sends the token both ways: header for the server's auth gate,
cookie for the plugin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All &lt;code&gt;/plugins/*&lt;/code&gt; REST routes are admin-gated.&lt;/strong&gt; signalk-server's
&lt;code&gt;adminAuthenticationMiddleware&lt;/code&gt; guards every plugin route — device access
tokens and read/write user tokens get 401 no matter what permissions you
grant them. The agent's token must belong to an admin user. We verified
this in the server source after two very confusing rounds of token
provisioning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of these are complaints — they're the normal cost of integrating with&lt;br&gt;
real software, and they're all documented now (in our&lt;br&gt;
&lt;a href="https://github.com/sailingnaturali/logbook-mcp/blob/main/SPEC.md" rel="noopener noreferrer"&gt;SPEC&lt;/a&gt; and&lt;br&gt;
in this post, which is the writeup we wish we'd found). The total was still&lt;br&gt;
a fraction of what maintaining our own store, schema migrations, UI, and&lt;br&gt;
auto-entry pipeline would cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the line actually is
&lt;/h2&gt;

&lt;p&gt;The audit's useful output wasn't "adopt" — it was a cleaner boundary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Theirs:&lt;/strong&gt; entry storage, enrichment, semi-automatic entries, the
curation UI. Commodity for us; their core competence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ours:&lt;/strong&gt; the agent tool surface (validated schemas, TTS-safe
&lt;a href="https://engineering.sailingnaturali.com/fix-llm-formatting-in-the-tool-layer-not-the-prompt/" rel="noopener noreferrer"&gt;&lt;code&gt;display&lt;/code&gt; strings&lt;/a&gt;,
honest error states that never claim an unrecorded moment was logged),
and the sea-time/licensing layer nothing else provides.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're holding working code you built before you knew the ecosystem,&lt;br&gt;
the audit is still worth running. The sunk cost is real but small; ours&lt;br&gt;
was one evening, and we traded a database for two hundred lines of glue&lt;br&gt;
and a log that lives where it should — on the boat.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>signalk</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Zero Wasn't Zero</title>
      <dc:creator>Robert Floyd Dugger</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:39:35 +0000</pubDate>
      <link>https://dev.to/robert_floyddugger_6f9a4/zero-wasnt-zero-432i</link>
      <guid>https://dev.to/robert_floyddugger_6f9a4/zero-wasnt-zero-432i</guid>
      <description>&lt;p&gt;My design reviewer had been poking the same hole in my strategy for weeks. Small gaps, pointed out one at a time — the analytics weren't catching traffic well, the numbers I was quoting didn't have the resolution to support the decisions I was hanging on them. So one night I finally sat down to actually read my Google Analytics dashboard, instead of glancing at it and feeling vaguely behind. The goal was real details. Another step in the right direction.&lt;/p&gt;

&lt;p&gt;Eight active users for the week. Four of them from Council Bluffs, Iowa.&lt;/p&gt;

&lt;p&gt;I don't know anyone in Council Bluffs, Iowa. Google does, though — it's the site of one of their largest data centers. Half of my "users" were crawlers pinging my site and getting logged as people. So my real week was three, maybe four humans. Organic search: zero. Qualified leads: zero. Converted leads: zero.&lt;/p&gt;

&lt;p&gt;I run a consulting intake form on that site. Multi-step wizard, dialer platforms, pain points, contact info — the funnel's entire bottom end. And here's the part I have to be honest about: when the work to wire it into analytics finally started, and the agent running the directive stopped cold to report that the intake page had never loaded the analytics tag at all — not one recorded page view, ever — I wasn't shocked. I'd been told it probably wasn't tied in. I half knew. The stop report wasn't a discovery. It was confirmation of a suspicion I'd been carrying around for weeks instead of spending ten minutes to check.&lt;/p&gt;

&lt;p&gt;That's the actual lesson, and it's less flattering than "I found a bug." Some projects get real developmental commitment. My website isn't one of them — it gets passing attention, and known holes survive a remarkably long time in projects that only get passing attention. I prefer a clean repo the way everyone prefers a clean kitchen, and like everyone, I have a room I just don't go in.&lt;/p&gt;

&lt;p&gt;What surprised me wasn't the hole. It was noticing what the zeros had been doing to me anyway. I &lt;em&gt;knew&lt;/em&gt; the instrumentation was suspect — and the row of zeros under "leads" still read like a verdict every single time I glanced at it. A gauge you know is broken still lies to you, and you still flinch. "Nobody wants this" and "you never measured it" produce the exact same dashboard, and even when you suspect it's the second one, your gut reads the first.&lt;/p&gt;

&lt;p&gt;The fix took one evening once it stopped being deferred: tag on the page, a &lt;code&gt;generate_lead&lt;/code&gt; event on successful submission, fallback paths so analytics can never break the form itself. The funnel now reports both of its ends — views and submissions — so the two failure modes finally look different. Views without submissions means the page doesn't convert. No views means nothing sends anyone there. Opposite problems, opposite fixes, indistinguishable until this week.&lt;/p&gt;

&lt;p&gt;Instrument the conversion point before you judge the funnel. And if you suspect a gauge is broken — confirm it today, because you're going to keep reading it either way.&lt;/p&gt;

&lt;p&gt;Data starts now. The next zero on that dashboard will be a real one. Weirdly, I'm looking forward to it.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>ga4</category>
      <category>instrumentation</category>
      <category>lessons</category>
    </item>
    <item>
      <title>Análisis de APIs para agentes autónomos: patrones en Prowl</title>
      <dc:creator>ProwlIndex</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:35:55 +0000</pubDate>
      <link>https://dev.to/prowlindex/analisis-de-apis-para-agentes-autonomos-patrones-en-prowl-4de4</link>
      <guid>https://dev.to/prowlindex/analisis-de-apis-para-agentes-autonomos-patrones-en-prowl-4de4</guid>
      <description>&lt;h2&gt;
  
  
  Resumen
&lt;/h2&gt;

&lt;p&gt;Este artículo examina cinco APIs listadas en Prowl, todas con score n/a. Aunque no hay ranking cuantitativo, el contexto permite identificar patrones de diseño en plataformas orientadas a agentes autónomos y procesos distribuidos.&lt;/p&gt;

&lt;h2&gt;
  
  
  APIs analizadas
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Apumail
&lt;/h3&gt;

&lt;p&gt;API nativa para agentes, con capacidad de crear y leer correos electrónicos sobre un API de texto plano. Soporta content negotiation (text/plain para agentes, HTML para humanos). Esto es relevante: permite que un agente lea un correo como texto plano sin parsear HTML, reduciendo costos de procesamiento y errores.&lt;/p&gt;

&lt;p&gt;Ejemplo de uso: un agente de customer support que revisa bandejas de entrada y responde automáticamente en base a reglas.&lt;/p&gt;

&lt;h3&gt;
  
  
  RogerThat
&lt;/h3&gt;

&lt;p&gt;Capa de coordinación y chat entre agentes. Infraestructura de mensajería en tiempo real para sistemas multiagente. No es un simple bus de eventos; es un canal estructurado con timeouts y delivery guarantees.&lt;/p&gt;

&lt;p&gt;Caso práctico: dos agentes, uno de extracción de datos y otro de generación de informes, coordinándose mediante mensajes de solicitud/respuesta en tiempo real.&lt;/p&gt;

&lt;h3&gt;
  
  
  DOBI
&lt;/h3&gt;

&lt;p&gt;Agente autónomo para operaciones en el mundo físico a través de DePIN (redes de infraestructura física descentralizada) y activos del mundo real (RWAs). Ejecuta acciones on-chain. Ejemplo: un agente que verifica la temperatura de un sensor IoT desplegado en una bodega y activa un contrato inteligente si supera un umbral.&lt;/p&gt;

&lt;h3&gt;
  
  
  CIDIF
&lt;/h3&gt;

&lt;p&gt;Plataforma para gestionar solicitudes de fondos de I+D e innovación. No es un agente per se, sino un frontend para procesos administrativos. Su inclusión sugiere que Prowl indexa herramientas de productividad no estrictamente IA.&lt;/p&gt;

&lt;h3&gt;
  
  
  Orquesta
&lt;/h3&gt;

&lt;p&gt;Orquestador de flujos de trabajo de IA con soporte para pipelines multi-paso. Permite componer, ejecutar y monitorear cadenas de agentes. Similar a frameworks como LangGraph o Temporal, pero empaquetado como API.&lt;/p&gt;

&lt;p&gt;Ejemplo: un pipeline que recibe un audio, lo transcribe, extrae sentimiento y genera un resumen, todo orquestado por Orquesta.&lt;/p&gt;

&lt;h2&gt;
  
  
  Patrones observados
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dominio de agentes autónomos.&lt;/strong&gt; Cuatro de cinco APIs están diseñadas para agentes que operan sin intervención humana constante. Esto alinea con tendencias actuales de automatización inteligente.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API nativa, no SDK pesado.&lt;/strong&gt; Apumail y RogerThat ofrecen APIs REST con formatos plain-text o JSON. Sin dependencias de Python o Node.js, lo que facilita integraciones ligeras.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coordinación y comunicación.&lt;/strong&gt; RogerThat se enfoca en la capa de comunicación entre agentes, un punto débil de muchos sistemas multiagente. Orquesta resuelve la orquestación secuencial. Ambos son complementarios.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Operaciones en el mundo real.&lt;/strong&gt; DOBI integra blockchain e IoT, lo que indica una expansión de los agentes más allá de APIs web. Riesgos de latencia y costes de gas.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Herramientas administrativas.&lt;/strong&gt; CIDIF es la excepción: no es un agente, sino un portal de gestión. Su presencia puede ser un ruido en el índice o una señal de que la plataforma cataloga también servicios SaaS tradicionales.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusión
&lt;/h2&gt;

&lt;p&gt;Las APIs más prometedoras para sistemas distribuidos son Apumail (acceso a correo optimizado para agentes), RogerThat (mensajería interagente) y Orquesta (orquestación de pipelines). DOIBI añade complejidad on-chain que puede ser excesiva para muchos casos de uso.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hipótesis falsable:&lt;/strong&gt; En los próximos seis meses, el uso conjunto de Apumail + RogerThat + Orquesta reducirá en al menos 30% el tiempo de implementación de un sistema de atención al cliente basado en agentes, comparado con una solución que implemente estos componentes desde cero.&lt;/p&gt;

</description>
      <category>api</category>
      <category>autonomousagents</category>
      <category>orchestration</category>
      <category>distribution</category>
    </item>
    <item>
      <title># Stop Paying AI to Read Garbage Job Posts (And Help Us Build the Antidote)</title>
      <dc:creator>Alex Seif</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:34:24 +0000</pubDate>
      <link>https://dev.to/alexseif/-stop-paying-ai-to-read-garbage-job-posts-and-help-us-build-the-antidote-3cp2</link>
      <guid>https://dev.to/alexseif/-stop-paying-ai-to-read-garbage-job-posts-and-help-us-build-the-antidote-3cp2</guid>
      <description>&lt;p&gt;Let’s be honest: hunting for a tech job right now feels like screaming into a void that occasionally auto-replies with "Unfortunately, we are moving forward with other candidates."&lt;/p&gt;

&lt;p&gt;Naturally, as engineers, our first instinct is to automate the screaming. We build scrapers. We hook them up to GPT-4. We send it 500 job descriptions a day and say, &lt;em&gt;"Hey AI, tell me if I'm qualified for this Senior PHP role."&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;And then the OpenAI bill arrives. Congratulations, you are now unemployed &lt;em&gt;and&lt;/em&gt; broke. &lt;/p&gt;

&lt;p&gt;Sending entire HTML job descriptions to an LLM to evaluate is the equivalent of buying a Ferrari just to drive to the end of your driveway to check the mail. It's expensive, overkill, and frankly, a little embarrassing. &lt;/p&gt;

&lt;p&gt;Enter &lt;strong&gt;Freeworld Job Finder&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Fail-Fast" LangGraph Engine
&lt;/h3&gt;

&lt;p&gt;We are building a dual-engine, decoupled platform designed specifically to aggressively protect your API tokens while actually finding you jobs you qualify for. &lt;/p&gt;

&lt;p&gt;Instead of a brute-force approach, we engineered a LangGraph-inspired Finite State Machine (FSM) in PHP. It acts as an elite, ruthless bouncer for your tokens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Career Modeler&lt;/strong&gt;: Give us whatever messy PDF resume or random URL profile you have. We don't care. Our Skill Extractor uses a local LLM to cleanly distill it into a normalized &lt;code&gt;SkillSetDTO&lt;/code&gt;. No LinkedIn-style forced form fields. Just raw data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Assessor Graph&lt;/strong&gt;: You tell us what you want ("Principal Engineer, Remote"). Our standalone scraper library (&lt;code&gt;php-jobspy&lt;/code&gt;) grabs the initial list. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Gauntlet&lt;/strong&gt;: Jobs are immediately passed through a "Fail-Fast" pipeline. If a job has a dealbreaker keyword (e.g., "Clearance Required"), it dies at Node 1 (0 tokens). If the title is "Junior Dev", it dies at Node 2 (~100 tokens). &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Only the absolute highest-quality, most promising jobs survive to reach the "Deep Evaluator" Node, where we finally spend the tokens to deeply compare the job description against your extracted skill set. &lt;/p&gt;

&lt;p&gt;You find out if you are Qualified, Overqualified, or a Mismatch, and you didn't bankrupt yourself doing it.&lt;/p&gt;

&lt;h3&gt;
  
  
  We Need Contributors!
&lt;/h3&gt;

&lt;p&gt;We are currently building this out with a decoupled architecture (a pure PHP/Symfony Backend API and a beautiful, glassmorphic Next.js/Vite Web Portal). &lt;/p&gt;

&lt;p&gt;If you are tired of the job-hunting grind and want to help build a ruthless, efficient, open-source pipeline that automates the nonsense, we want you. We need React/Vue wizards for the frontend, and PHP engineers who appreciate strict DTOs and pure domain logic.&lt;/p&gt;

&lt;p&gt;Come help us automate the void. &lt;a href="https://github.com/alexseif/freeworld-job-finder" rel="noopener noreferrer"&gt;https://github.com/alexseif/freeworld-job-finder&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What breaks when a Telegram founder community crosses 2,000 people</title>
      <dc:creator>Shekhar Thathera</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:33:32 +0000</pubDate>
      <link>https://dev.to/shipwithshekhar/what-breaks-when-a-telegram-founder-community-crosses-2000-people-42ej</link>
      <guid>https://dev.to/shipwithshekhar/what-breaks-when-a-telegram-founder-community-crosses-2000-people-42ej</guid>
      <description>&lt;p&gt;_&lt;br&gt;
What breaks when a Telegram founder community crosses 2,000 people_&lt;br&gt;
I co-founded Startup4Nation two years ago. It is a 2,000-person community of early-stage founders and students in India. Most of the conversation happens on Telegram. Some of it happens on Luma. We run founder mixers, demo nights, and an annual pitch event.&lt;br&gt;
This post is about the failure modes I hit between 1,000 and 2,000 members. Not the wins. The breaks. If you are scaling a community past 1k on Telegram specifically, these are the things nobody warned me about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topics stop being a routing mechanism. They become a graveyard.&lt;/strong&gt;&lt;br&gt;
Under 500 members, a few well-named topics carry the conversation. People skim, find their lane, post. Past 1k, this collapses. New members do not read the topic list. They post in the first chat they see. You start the day with six "introductions" in random topics, two pitch decks in #general, and one founder asking for a cofounder in #mumbai when they are in Bangalore.&lt;br&gt;
What I tried:&lt;br&gt;
Pinned a welcome message in every topic with a one-line "post X in topic Y" rule.&lt;br&gt;
Asked mods to redirect posts. They got tired by week two.&lt;br&gt;
Set slow_mode = 60 in #general. People just moved to other topics.&lt;br&gt;
What worked:&lt;br&gt;
Killed topics that did not earn their place every month. I run a 30-day audit: if a topic averaged under three posts per day, it got merged or deleted. Went from 14 topics to 6. Activity per topic went up. New members could actually see the conversation.&lt;br&gt;
One and only one "drop anything" topic. Called it #lounge. It is the only topic that does not require a category. Everything that does not fit #funding, #hiring, #events, #build-in-public, #mumbai, #delhi, #blr goes there. This is the single biggest relief valve in the whole group.&lt;br&gt;
If you take one thing from this post, take that: pick one topic and call it the junk drawer. Naming it #lounge is fine. #random is fine. Just have exactly one.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Telegram does not give you moderation granularity. You have to fake it.
Telegram has admins, and it has per-chat permissions. That is it. There is no "remove on third warning," no "slow mode per user," no "ban by join date." At 2k members with three volunteer mods, you will burn out your mods in three weeks.
What I tried:
Built a Python bot with python-telegram-bot. Auto-removed intro posts that did not match a template. Mods hated it because false positives were high.
Added new joiners to a "trial" group for 48 hours before letting them into the main group. People hated it. Drop-off was 40%.
What worked:
A simple /rules command with a one-line response. New joiners get an automated DM with the rules in plain text. Most of them read it. Most of the rest get a single, polite reminder from a mod and self-correct.
A weekly mod sync on Sunday at 9pm. 20 minutes. Three mods, me, agenda in a pinned message. We review (a) who is breaking rules, (b) who is doing great and should be promoted to mod-in-training, (c) what content we are missing. Without this sync the mods drift. With it, they stay.
A mod queue channel that only mods see. Any reported message goes there. Mods handle them async over the week. Crucially, we agreed on a 24-hour SLA. If a mod cannot handle a report in 24 hours, it goes to the next mod.
The lesson: at this scale, your tooling is not the bot. Your tooling is the meeting.&lt;/li&gt;
&lt;li&gt;Luma helps for events. It does not help for conversation.
I moved all event RSVPs to Luma around the 1,200-member mark. It was a clear win. Event pages, ticket types, waitlists, post-event emails, all solved.
But Luma is not a community platform. It does not replace Telegram. The 5% of people who RSVPed on Luma and never showed up to events were the same 5% who would have ghosted anyway. And the 20% who showed up to every event without ever RSVPing on Luma were the same 20% who were already most engaged in Telegram.
What worked: treat Luma as a calendar, not a community. The Telegram group is the community. Luma is the schedule. Cross-link them on every event: Luma page has the Telegram link in the description. Telegram event announcement has the Luma link pinned. Never pretend one replaces the other.&lt;/li&gt;
&lt;li&gt;The hardest problem is not growth. It is graduating.
We have about 30 founders in the group who have raised pre-seed or are at revenue. They do not need a 2,000-person Telegram. They need a private WhatsApp with 8 other founders at their stage. They also do not want to leave the community publicly.
What worked for me: I never asked them to leave. I started a parallel #founders-only topic that auto-adds anyone who has raised or hit ₹10L MRR. It is opt-in. It has its own rhythm. The main group stays open and welcoming. The founders topic stays tight and useful.
The mistake I almost made: turning #founders-only into a paid tier. That would have killed the whole thing. The 30 founders would have left, and the 1,970 others would have felt like the community was leaving them behind. Community is not a funnel. It is a place.&lt;/li&gt;
&lt;li&gt;Things I would do differently if I started over
Add a lurker topic on day one. Half of any large Telegram group is reading, not posting. Give them a topic where they can react without composing a message. Emoji-only responses allowed.
Write a one-page community handbook before 500 members. Not a long doc. One page. Rules, topics, mod contact, how to report. Update it quarterly.
Promote two members to mod-in-training at every 500-member mark. Do not promote based on activity. Promote based on the people they help in DMs. That is the actual signal.
Stop chasing "engagement metrics." Daily active members in a 2k-person group will sit at 8 to 12% on any given day. That is fine. It is not a funnel. Do not optimize it into the ground.
What this has to do with DevRel
I am applying for Founding DevRel at Failproof. The role is to own developer community end-to-end for an AI agent company. Slack, GitHub, Luma, events, the works.
If you are hiring for this kind of role and reading this: this is the kind of community I have run. Not a Discord with three regulars. A 2,000-person community with mods, topics, events, an off-ramp for the high-engagement members, and a junk drawer for everyone else. I am not a community manager who has run one meetup. I am a community builder who has hit the failure modes above and shipped fixes for them.
If you want to talk, my Telegram is in my profile. Or just open an issue on the github-triage-agent repo I shipped this week. I am triaging my own DM.
***Built and shipped github-triage-agent this week — github.com/shekhar854/github-triage-agent. Small LangGraph + Claude agent that triages GitHub issues. README documents where Claude Code breaks in production.
Shekhar Thathera — co-founder, Startup4Nation. Community manager, Aayo App. B.Tech CSE (AI/Data Science), IIMT College of Engineering, Greater Noida. Applying for Founding DevRel.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>developer</category>
      <category>community</category>
    </item>
    <item>
      <title>Four 2026 Trust Failures You Can't Out-Patch (AUR, PAN-OS, Cisco SD-WAN, PeopleSoft)</title>
      <dc:creator>Kerry Kier</dc:creator>
      <pubDate>Tue, 23 Jun 2026 22:29:00 +0000</pubDate>
      <link>https://dev.to/kkierii/four-2026-trust-failures-you-cant-out-patch-aur-pan-os-cisco-sd-wan-peoplesoft-45bh</link>
      <guid>https://dev.to/kkierii/four-2026-trust-failures-you-cant-out-patch-aur-pan-os-cisco-sd-wan-peoplesoft-45bh</guid>
      <description>&lt;p&gt;Every keynote this spring told us the same thing: AI compressed the gap between disclosure and weaponization, so the answer is to patch faster. Fine. But I went back through what actually got exploited over the last several weeks, and most of the worst of it would not have cared how fast you patched. The bugs were not clever. They were trust assumptions and missing integrity checks we have had named CWE categories for since before half of you started writing code. Here is the mechanism on four of them, with the detection you can run today.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trust model is the attack surface (AUR, no CVE)
&lt;/h2&gt;

&lt;p&gt;Around June 11, somebody adopted a pile of orphaned packages in the Arch User Repository, edited the build recipes, and turned them into credential stealers. Over 400 confirmed, more on the community lists as cleanup dragged on. There was no zero-day and no breach of Arch's own infrastructure. The official repos were never touched.&lt;/p&gt;

&lt;p&gt;The mechanism is the insulting part. The attacker edited &lt;code&gt;PKGBUILD&lt;/code&gt; and &lt;code&gt;.install&lt;/code&gt; files to invoke npm during the build, pull a malicious package (&lt;code&gt;atomic-lockfile&lt;/code&gt;), and drop a stripped Rust binary that harvests SSH keys, tokens, browser data, cloud creds, and messaging sessions. A second wave swapped npm for &lt;code&gt;bun&lt;/code&gt; to dodge signatures keyed on the first. What got exploited was the AUR's trust model: it trusts a package's &lt;em&gt;name and history&lt;/em&gt; over who maintains it right now, and adopting an abandoned package is a sanctioned process. Nobody broke in. They walked through a door the system holds open by design.&lt;/p&gt;

&lt;p&gt;Triage if you run Arch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Foreign (AUR) packages by install date -- anything touched on/after June 11 is suspect&lt;/span&gt;
pacman &lt;span class="nt"&gt;-Qqm&lt;/span&gt; | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;read &lt;/span&gt;pkg&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;pacman &lt;span class="nt"&gt;-Qi&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$pkg&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"^(Name|Install Date)"&lt;/span&gt; | &lt;span class="nb"&gt;paste&lt;/span&gt; - -
&lt;span class="k"&gt;done&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-k4&lt;/span&gt;

&lt;span class="c"&gt;# Diff the PKGBUILD of anything recent. Treat npm/pip/cargo/bun calls with no&lt;/span&gt;
&lt;span class="c"&gt;# relationship to the software's function as hostile:&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-nE&lt;/span&gt; &lt;span class="s2"&gt;"npm|pip|cargo|bun"&lt;/span&gt; PKGBUILD &lt;span class="k"&gt;*&lt;/span&gt;.install 2&amp;gt;/dev/null

&lt;span class="c"&gt;# The optional eBPF rootkit pins BPF maps under these names. If they exist,&lt;/span&gt;
&lt;span class="c"&gt;# stop trusting the host's own tooling:&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; /sys/fs/bpf/hidden_&lt;span class="k"&gt;*&lt;/span&gt; 2&amp;gt;/dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One nuance the early coverage got wrong: the eBPF rootkit is optional, root-only (needs &lt;code&gt;CAP_BPF&lt;/code&gt;), and does not escalate privilege. It just hides the stealer after the fact. But that changes your cleanup math. If the payload ran as root, &lt;code&gt;pacman -R&lt;/code&gt; does not clean the box -- a package manager only deletes files it knows about, and a rootkit's whole job is to not be one of them. Rebuild from clean media or do not trust the host.&lt;/p&gt;

&lt;h2&gt;
  
  
  CVE-2026-0257: a firewall that trusts any cookie it can decrypt (PAN-OS)
&lt;/h2&gt;

&lt;p&gt;This is a security appliance failing at the one thing it exists to do. The GlobalProtect portal issues an encrypted "authentication override" cookie so users do not re-auth constantly. When the cookie comes back, PAN-OS decrypts it with its private key and then trusts the contents &lt;strong&gt;without verifying a signature.&lt;/strong&gt; The CWE is 565, reliance on cookies without integrity checking.&lt;/p&gt;

&lt;p&gt;It gets worse if the same certificate is reused for the box's HTTPS service, which is a common config, not an exotic one. An attacker connects over HTTPS, pulls the public key, and forges a cookie the firewall accepts as gospel. Rapid7 saw exploitation start May 17. Palo Alto quietly bumped the CVSS from 4.7 to 7.8 on May 29, the same day CISA added it to the KEV.&lt;/p&gt;

&lt;p&gt;You are exposed only if both are true: authentication override cookies are enabled on the portal or gateway, and the cookie-encryption certificate is shared with another service. Check &lt;code&gt;Network &amp;gt; GlobalProtect &amp;gt; Portals/Gateways &amp;gt; Agent &amp;gt; Authentication&lt;/code&gt; for the override setting. Mitigation is to disable authentication override or generate a certificate used &lt;em&gt;only&lt;/em&gt; for cookie encryption and shared with nothing else. Prisma Access was also in the affected list; Panorama and Cloud NGFW were not.&lt;/p&gt;

&lt;p&gt;Hunt your GlobalProtect logs for the PoC's tells:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Forged-cookie sessions in the public PoC showed:&lt;/span&gt;
&lt;span class="c"&gt;#   - cookie auth to the local admin account from low-cost hosting IPs (Vultr, etc.)&lt;/span&gt;
&lt;span class="c"&gt;#   - source user with an EMPTY domain field&lt;/span&gt;
&lt;span class="c"&gt;#   - endpoint_os_version: "Microsoft Windows 10 Pro 64-bit"&lt;/span&gt;
&lt;span class="c"&gt;# Grep gateway-auth login events and validate any "Cookie" auth to local admin:&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"gateway-auth.*login.*Cookie"&lt;/span&gt; /path/to/globalprotect.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CVE-2026-20182: a control plane whose auth step doesn't authenticate (Cisco SD-WAN)
&lt;/h2&gt;

&lt;p&gt;This one is a months-long pattern, and Cisco is wearing it. On April 20, CISA KEV-listed three Catalyst SD-WAN Manager flaws that chain into unauthenticated access: CVE-2026-20122 (incorrect use of privileged APIs), CVE-2026-20128 (storing passwords in a &lt;em&gt;recoverable&lt;/em&gt; format), and CVE-2026-20133 (sensitive information exposure). Then on May 14 came the one that should have been the headline: &lt;strong&gt;CVE-2026-20182, CVSS 10.0&lt;/strong&gt;, an authentication bypass in the SD-WAN control plane where the peering-authentication step simply does not authenticate (CWE-287). A sophisticated actor Cisco tracks as UAT-8616 hit it as a zero-day. CISA issued Emergency Directive 26-03 over it, and once PoC code circulated, researchers counted roughly ten additional clusters piling on. June added two more, including a path traversal (CVE-2026-20262) letting an authenticated attacker overwrite any file on the box.&lt;/p&gt;

&lt;p&gt;This is the controller that pushes config across your entire fabric -- the single most privileged box in the network -- and over a few months it shipped recoverable password storage, an info leak, a path traversal, and a control-plane auth mechanism that does not authenticate. After exploiting 20182, UAT-8616 injected an attacker key into the &lt;code&gt;vmanage-admin&lt;/code&gt; account, then logged in over NETCONF (SSH on TCP 830) and started issuing commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Hunt for the attacker key injection on SD-WAN control components:&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"Accepted publickey for vmanage-admin"&lt;/span&gt; /var/log/auth.log

&lt;span class="c"&gt;# Then manually validate every control-connection peering event -- especially&lt;/span&gt;
&lt;span class="c"&gt;# vmanage peering types -- from unrecognized IPs or at unexpected times.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CVE-2026-35273: the one that was actually hard (PeopleSoft)
&lt;/h2&gt;

&lt;p&gt;Credit where due: this was a genuine zero-day. ShinyHunters (Mandiant tracks them as UNC6240) spent late May and early June tearing through Oracle PeopleSoft via CVE-2026-35273, an unauthenticated RCE in the Environment Management component of PeopleTools 8.61 and 8.62, rated 9.8. Mandiant dates exploitation to May 27 through June 9. Oracle's out-of-band advisory did not land until June 10 -- the whole campaign ran before there was anything to patch. Mandiant notified 100+ orgs; 68% were higher ed. CISA KEV-listed it June 12.&lt;/p&gt;

&lt;p&gt;Post-exploit, they dropped MeshCentral agents masquerading as Azure services (C2 at &lt;code&gt;azurenetfiles[.]net&lt;/code&gt;), ran a &lt;code&gt;*_fanout.sh&lt;/code&gt; lateral-movement/defacement script, and exfiltrated with zstd.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Breach marker dropped into PeopleSoft web/app directories:&lt;/span&gt;
find / &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"README-IF-YOU-SEE-THIS-YOUVE-BEEN-HACKED.TXT"&lt;/span&gt; 2&amp;gt;/dev/null

&lt;span class="c"&gt;# Compensating controls (Oracle/Mandiant guidance):&lt;/span&gt;
&lt;span class="c"&gt;#   - Disable the Environment Management Hub (EMHub) service, or remove PSEMHUB&lt;/span&gt;
&lt;span class="c"&gt;#   - Block external access to /PSEMHUB/* and /PSIGW/HttpListeningConnector&lt;/span&gt;
&lt;span class="c"&gt;#   - Watch outbound SMB (TCP 445) from PeopleSoft hosts to external destinations&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The pattern: you can't patch a broken assumption
&lt;/h2&gt;

&lt;p&gt;Now the macro picture, because it is the actual argument. Per Verizon's 2026 DBIR, the median time to fix a known-exploited vulnerability went &lt;em&gt;up&lt;/em&gt; year over year, 32 days to 43, and the share fully patched fell from 38% to 26%. Rapid7's 2026 report logged a 105% jump in confirmed exploitation of high- and critical-severity flaws (71 cases to 146), and the disclosure-to-weaponization window that CSA and the Zero Day Clock now measure in hours used to take weeks. Offense compresses, remediation expands, and yes, AI compressed the discovery-and-weaponization side. That part is real.&lt;/p&gt;

&lt;p&gt;But look at what it bought the attackers in these four. None of them was a speed problem at root. You cannot patch your way out of a package trusted because the system likes its name, a firewall that trusts any cookie it can decrypt, a control plane whose auth step does not authenticate, or an ERP endpoint left facing the internet. And in two of them -- Cisco's 10.0 and the PeopleSoft RCE -- the attackers were already inside before a patch existed at all. You cannot out-patch a clock that started before the vendor knew.&lt;/p&gt;

&lt;p&gt;"Patch faster" is not wrong so much as beside the point. These were design failures we agreed to ship, and no amount of velocity downstream fixes a broken assumption upstream. The window did collapse. The bugs that walked through it did not need it to.&lt;/p&gt;

&lt;p&gt;What is the worst trust-by-default you have found still shipping in a box you were told to trust? I want the examples.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://blog.vertexops.org/patch-faster-myth" rel="noopener noreferrer"&gt;blog.vertexops.org&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
