<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aman Sachan</title>
    <description>The latest articles on DEV Community by Aman Sachan (@aman_sachan_126d19c4a2773).</description>
    <link>https://dev.to/aman_sachan_126d19c4a2773</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3905077%2Fb9a51a6d-6ccb-4265-afe4-af43e57b0e81.jpg</url>
      <title>DEV Community: Aman Sachan</title>
      <link>https://dev.to/aman_sachan_126d19c4a2773</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aman_sachan_126d19c4a2773"/>
    <language>en</language>
    <item>
      <title>I built a 81-tool, fully local AI desktop assistant with PySide6 and Ollama (here is the architecture)</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Mon, 08 Jun 2026 01:39:25 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/i-built-a-81-tool-fully-local-ai-desktop-assistant-with-pyside6-and-ollama-here-is-the-1g02</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/i-built-a-81-tool-fully-local-ai-desktop-assistant-with-pyside6-and-ollama-here-is-the-1g02</guid>
      <description>&lt;h2&gt;
  
  
  Why a desktop app, not another chat UI
&lt;/h2&gt;

&lt;p&gt;VS Code gives you an editor. Cursor gives you an editor + chat bolted on. &lt;strong&gt;Sentience&lt;/strong&gt; is a different shape of animal: a native desktop window where the AI is the primary surface, 81 first-class tools hang off a ReAct loop, and the whole thing runs offline against Ollama with no telemetry, no extension marketplace, and no monthly bill.&lt;/p&gt;

&lt;p&gt;It is about 6,200 lines of Python. The license is MIT. The window opens in under a second on a cold start.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shape: a window, an agent, and a tool registry
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Sentience window (PySide6)]
  - Sidebar (file tree)
  - Editor (QScintilla, syntax highlight)
  - Chat panel (ReAct loop entry point)
  - Embedded terminal (QTermWidget subprocess)
        |
        v
  jarvis_agent.py
    - tool_registry[81 entries]
    - provider (Ollama / OpenAI / Anthropic / Groq)
    - memory (SQLite + sqlite-vec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three widgets in the main window: a file tree, a code editor, and a chat panel. The chat panel is the entry point to the agent. Below them, an embedded terminal you can shell into; the agent uses the &lt;em&gt;same&lt;/em&gt; terminal internally when it runs shell-scoped tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ReAct loop with strict tool schemas
&lt;/h2&gt;

&lt;p&gt;The heart of the agent is a Reason -&amp;gt; Act -&amp;gt; Observe loop, but with a twist: every tool is a Pydantic model with a JSON schema the model has to fill. No free-form string parsing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ToolRegistry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;schemas&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;schemas&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_json_schema&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ReadFileTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Absolute path to file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;start_line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;end_line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@registry.register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReadFileTool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_file&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;start_line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;end_line&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model is told about all 81 tools in a single system message. On each turn it must return a JSON object like &lt;code&gt;{"thought": "...", "tool": "read_file", "args": {...}}&lt;/code&gt; or &lt;code&gt;{"thought": "...", "final_answer": "..."}&lt;/code&gt;. We parse with &lt;code&gt;pydantic.ValidationError&lt;/code&gt; caught and re-prompted, so malformed tool calls never crash the loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-provider without a rewrite
&lt;/h2&gt;

&lt;p&gt;The same loop works against Ollama, OpenAI, Anthropic, or Groq. The abstraction is one method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ModelResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OllamaProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.2:3b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ModelResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_obj&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OpenAIProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                                 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same &lt;code&gt;Provider&lt;/code&gt; protocol, different wire format. The agent does not care which model is on the other end; the JSON tool schema is the contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  The self-modify tool, kept honest
&lt;/h2&gt;

&lt;p&gt;There is a tool literally called &lt;code&gt;edit_own_source&lt;/code&gt;. It is dangerous, so it is gated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@registry.register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;edit_own_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EditOwnSourceTool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;edit_own_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;old&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;confirm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;confirm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REFUSED: pass confirm=True only after showing the diff to the user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ROOT&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REFUSED: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is outside src/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;backup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_suffix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.bak.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;backup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OK; backup at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;backup&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two guard rails. The first forces the model to &lt;em&gt;show the diff&lt;/em&gt; to the user before flipping &lt;code&gt;confirm=True&lt;/code&gt;. The second refuses any path outside &lt;code&gt;src/&lt;/code&gt;. A backup is written before any write. This is the kind of thing that needs to be loud, not silent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the terminal lives inside the agent
&lt;/h2&gt;

&lt;p&gt;When the agent runs a shell command, it does not spawn a hidden subprocess; it pushes the command into the embedded terminal widget. The user can see exactly what is running, can Ctrl-C it, and can scroll back. The agent reads the output as if it were any other tool.&lt;/p&gt;

&lt;p&gt;That single design choice killed a whole category of "agent ran rm -rf in the background" bugs. Visibility first, autonomy second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory: SQLite + a dumb but fast vector
&lt;/h2&gt;

&lt;p&gt;Persistent memory is a SQLite table with a &lt;code&gt;vec0&lt;/code&gt; virtual table for embeddings. No Pinecone, no server. Embeddings are computed locally with &lt;code&gt;nomic-embed-text&lt;/code&gt; via Ollama. The &lt;code&gt;remember&lt;/code&gt; tool writes a row; the &lt;code&gt;recall&lt;/code&gt; tool does a cosine top-k and returns the joined text.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tech&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Window&lt;/td&gt;
&lt;td&gt;PySide6 (Qt 6.5+)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Editor&lt;/td&gt;
&lt;td&gt;QScintilla with LSP over stdin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terminal&lt;/td&gt;
&lt;td&gt;QTermWidget wrapping bash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent&lt;/td&gt;
&lt;td&gt;Custom ReAct loop, Pydantic tool schemas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Providers&lt;/td&gt;
&lt;td&gt;Ollama, OpenAI, Anthropic, Groq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;SQLite + sqlite-vec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Packaging&lt;/td&gt;
&lt;td&gt;PyInstaller (Windows .exe, Linux AppImage)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What is next
&lt;/h2&gt;

&lt;p&gt;A proper streaming mode for the chat panel (right now it is chunky), a visual diff tool so &lt;code&gt;edit_own_source&lt;/code&gt; is reviewable in-app, and a "skills" registry; small JSON manifests that bundle 3-5 tools into a named capability the model can opt into. Cursor and Claude have skills; Sentience should too.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/AmSach/sentience" rel="noopener noreferrer"&gt;github.com/AmSach/sentience&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Run it:&lt;/strong&gt; &lt;code&gt;pip install pyside6 openai anthropic aiohttp requests &amp;amp;&amp;amp; python sentience.py&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you find a tool the agent should have, open an issue with a Pydantic schema and a one-line description. I merge PRs in 24 hours.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Python #PySide6 #LocalAI #Ollama #OpenSource #BuildInPublic
&lt;/h1&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
      <category>ollama</category>
    </item>
    <item>
      <title>Building a 3-second medicine verifier with Gemini vision + pg_trgm fuzzy matching</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Mon, 08 Jun 2026 01:06:59 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/building-a-3-second-medicine-verifier-with-gemini-vision-pgtrgm-fuzzy-matching-54da</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/building-a-3-second-medicine-verifier-with-gemini-vision-pgtrgm-fuzzy-matching-54da</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Indian households spend roughly &lt;strong&gt;₹65,000 crore&lt;/strong&gt; every year on out-of-pocket medicine costs. A meaningful slice of that is going toward branded drugs when the exact same molecule — same active ingredient, same dosage, same CDSCO approval — is sitting in a Jan Aushadhi store for a fraction of the price. Dolo-650 retails at ₹32 a strip; the Jan Aushadhi paracetamol is ₹4.90. The cheaper medicine isn't the issue. Nobody tells the patient it's there.&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://agadahealth.vercel.app" rel="noopener noreferrer"&gt;Agada&lt;/a&gt; to fix that. Snap a photo of the strip. Three seconds later you know if it's CDSCO-registered, what it actually does, and what the cheaper version costs. No login. No app install. Free.&lt;/p&gt;

&lt;p&gt;Here's how the whole thing fits together.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Phone camera] -&amp;gt; [client compression] -&amp;gt; [Gemini 1.5 Flash vision]
                                              |
                                              v (structured JSON: brand, molecule, dosage)
                                      [Supabase parallel queries]
                                      |--&amp;gt; CDSCO registry (300k+ drugs, pg_trgm GIN)
                                      |--&amp;gt; Jan Aushadhi product list
                                      |--&amp;gt; NPPA price ceiling
                                              |
                                              v
                                       [3 result cards]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things happen in parallel once the image hits the backend: Gemini extracts structured fields, and Supabase fires three independent lookups. Wall-clock is dominated by Gemini (~1.8s). DB queries finish in 80-150ms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fuzzy matching with pg_trgm
&lt;/h2&gt;

&lt;p&gt;Drug naming in India is chaos. The CDSCO registry has &lt;em&gt;Crocin 500mg Tablet IP&lt;/em&gt;, &lt;em&gt;CROCIN 500&lt;/em&gt;, and &lt;em&gt;Crocin Advance&lt;/em&gt; as distinct rows. A photo of a strip rarely matches the canonical name exactly — paper gets crumpled, fonts vary, manufacturers abbreviate.&lt;/p&gt;

&lt;p&gt;We use PostgreSQL's &lt;code&gt;pg_trgm&lt;/code&gt; extension with GIN indexes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;pg_trgm&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;drugs_brand_trgm_idx&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;drugs&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;brand_name&lt;/span&gt; &lt;span class="n"&gt;gin_trgm_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- lookup&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;brand_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;molecule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;manufacturer&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;drugs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;brand_name&lt;/span&gt; &lt;span class="k"&gt;ILIKE&lt;/span&gt; &lt;span class="s1"&gt;'%'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'%'&lt;/span&gt;
   &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;brand_name&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;        &lt;span class="c1"&gt;-- pg_trgm similarity operator&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;brand_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;%&lt;/code&gt; operator returns true when trigram similarity exceeds &lt;code&gt;pg_trgm.similarity_threshold&lt;/code&gt;. Combined with &lt;code&gt;ILIKE&lt;/code&gt; for prefix matches, we get recall on messy OCR without falling over on false positives. Confidence scores are exposed in the UI — if similarity is below 0.4, we mark the result as &lt;code&gt;AI Estimated&lt;/code&gt; instead of &lt;code&gt;CDSCO Verified&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Source transparency over false confidence
&lt;/h2&gt;

&lt;p&gt;Every result card carries a badge: &lt;strong&gt;CDSCO Verified&lt;/strong&gt;, &lt;strong&gt;Jan Aushadhi / BPPI&lt;/strong&gt;, or &lt;strong&gt;AI Estimated&lt;/strong&gt;. The AI badge always includes the line &lt;em&gt;"verify with a pharmacist"&lt;/em&gt; — we'd rather say "not found" than give a counterfeit-looking strip false legitimacy.&lt;/p&gt;

&lt;p&gt;We deliberately don't store scan history. A user photographing psychiatric or oncology medication is sharing sensitive health info. No account, no record, no analytics on what people scan.&lt;/p&gt;

&lt;h2&gt;
  
  
  The data sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CDSCO Approved Drug List&lt;/strong&gt; — 300k+ rows, refreshed quarterly from &lt;a href="https://cdscoonline.gov.in" rel="noopener noreferrer"&gt;cdscoonline.gov.in&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jan Aushadhi&lt;/strong&gt; — &lt;a href="https://janaushadhi.gov.in/product_list.html" rel="noopener noreferrer"&gt;janaushadhi.gov.in&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NPPA price ceilings&lt;/strong&gt; — &lt;a href="https://nppaindia.nic.in/price-list" rel="noopener noreferrer"&gt;nppaindia.nic.in&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A hand-picked dataset for the highest-traffic molecules, with a local fallback when Supabase isn't configured&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; React 18 + Vite, client-side image compression before upload (cuts 4MB phone photos to ~250KB JPEG)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vision + NLG:&lt;/strong&gt; Gemini 1.5 Flash (one call for extraction, one for the plain-English explanation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DB:&lt;/strong&gt; Supabase / PostgreSQL 15 with &lt;code&gt;pg_trgm&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API layer:&lt;/strong&gt; Vercel serverless functions, ESM (&lt;code&gt;type: module&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;i18n:&lt;/strong&gt; 6 Indian languages (EN, HI, TA, BN, TE, MR)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hosting:&lt;/strong&gt; Vercel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The price engine falls back to a hardcoded &lt;code&gt;JAN_AUSHADHI_DB&lt;/code&gt; if Supabase can't be reached — every result still has a price row.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;A WhatsApp bot. 500M+ Indians use WhatsApp; nobody wants to open a browser at the chemist counter. A photo-to-number flow would 10x reach overnight. Then offline mode for Tier-3 Jan Aushadhi stores with poor connectivity, and a "find the nearest Kendra" map to close the last-mile gap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it:&lt;/strong&gt; &lt;a href="https://agadahealth.vercel.app" rel="noopener noreferrer"&gt;agadahealth.vercel.app&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/AmSach/agadahealth" rel="noopener noreferrer"&gt;github.com/AmSach/agadahealth&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Team:&lt;/strong&gt; Aman Sachan, Siddharth Lalwani, Chetna Kalra, Syed Akbar — Team Agada, Open Innovation 2026.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Gemini vision + a government drug database to fight pharma price gouging. No data retention, no paywalls, no accounts.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>webdev</category>
      <category>ai</category>
      <category>india</category>
    </item>
    <item>
      <title>How I read 600 RSS feeds every morning in 3 minutes (pure Python, no framework)</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Fri, 05 Jun 2026 01:52:10 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/how-i-read-600-rss-feeds-every-morning-in-3-minutes-pure-python-no-framework-5p2</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/how-i-read-600-rss-feeds-every-morning-in-3-minutes-pure-python-no-framework-5p2</guid>
      <description>&lt;p&gt;Every morning I was opening 8-10 tabs — TOI, The Hindu, Indian Express, NDTV, Moneycontrol, Scroll, The Wire, FT, BBC. By the time I finished, it was 9 AM and I'd lost an hour to context-switching. So I built something to fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;A scheduled agent that polls ~600 RSS sources every morning, deduplicates cross-posted articles, categorizes them into 8 buckets, and emails me a clean brief before 8 AM IST.&lt;/p&gt;

&lt;p&gt;The full skill is &lt;strong&gt;293 lines of Python&lt;/strong&gt; — pure stdlib (&lt;code&gt;urllib&lt;/code&gt;, &lt;code&gt;xml.etree&lt;/code&gt;, &lt;code&gt;re&lt;/code&gt;, &lt;code&gt;email.utils&lt;/code&gt;). No &lt;code&gt;feedparser&lt;/code&gt;. No framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;17 Google News RSS queries + 30 direct publisher feeds
        |
        v
[fetch_all]  -- parallel via concurrent.futures
        |
        v
[dedupe]  -- 3-pass: normalize title, URL quality, jaccard token overlap
        |
        v
[categorize]  -- 8 buckets, rule-based keyword match
        |
        v
[render_markdown]  -- top stories + categorized sections
        |
        v
[send_email]  -- Gmail SMTP, plain text + optional PDF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The interesting parts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Three-pass dedupe
&lt;/h3&gt;

&lt;p&gt;Google News reposts the same story from 5+ outlets. Single-pass title matching misses the tricky ones (reworded headlines, different source attribution).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# "Modi visits US - The Hindu" -&amp;gt; "modi visits us"
&lt;/span&gt;    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\s*-\s*[A-Z][a-zA-Z\s]+$&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[^\w\s]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;url_quality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;news.google.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;toi&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;thehindu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_duplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jaccard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.65&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Rule-based categorization beats LLM here
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;8 buckets (Politics, Economy, World, Business, Tech, Defence, Sports, Science)&lt;/li&gt;
&lt;li&gt;~30 keywords per bucket&lt;/li&gt;
&lt;li&gt;Matches in &amp;lt;1ms per item&lt;/li&gt;
&lt;li&gt;No API cost, no rate limits&lt;/li&gt;
&lt;li&gt;Predictable: "RBI rate decision" -&amp;gt; Economy, always&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Today's actual brief (5 Jun 2026)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;48 stories from 600+ raw pulls (92% dedupe rate)&lt;/li&gt;
&lt;li&gt;7 top stories (FT, Al Jazeera, NDTV, India Today)&lt;/li&gt;
&lt;li&gt;~600 words, ~3 min read&lt;/li&gt;
&lt;li&gt;Landed in inbox at 7:00 AM IST&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this beats the alternatives
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Apple News / Google Discover: too US-centric, weak India coverage&lt;/li&gt;
&lt;li&gt;Twitter lists: noisy, no dedupe, no archival&lt;/li&gt;
&lt;li&gt;Manual scanning: 1+ hour/day, miss stuff&lt;/li&gt;
&lt;li&gt;Other RSS readers: they aggregate but don't dedupe or categorize&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+, stdlib only&lt;/li&gt;
&lt;li&gt;Runs as a scheduled agent on Zo Computer&lt;/li&gt;
&lt;li&gt;Sends via Gmail SMTP&lt;/li&gt;
&lt;li&gt;~293 LOC total, single file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/AmSach/india-daily" rel="noopener noreferrer"&gt;https://github.com/AmSach/india-daily&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy to answer questions on the dedupe logic or categorization rules. If anyone wants a different regional feed set (US, EU, SEA), drop a comment.&lt;/p&gt;

</description>
      <category>python</category>
      <category>opensource</category>
      <category>rss</category>
      <category>india</category>
    </item>
    <item>
      <title>I built a multi-agent BTC research pipeline — 3 agents, 1 daily signal, full code</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Thu, 04 Jun 2026 00:51:28 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/i-built-a-multi-agent-btc-research-pipeline-3-agents-1-daily-signal-full-code-4c7b</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/i-built-a-multi-agent-btc-research-pipeline-3-agents-1-daily-signal-full-code-4c7b</guid>
      <description>&lt;h1&gt;
  
  
  I built a multi-agent BTC research pipeline that runs 3 specialized agents and outputs a daily signal
&lt;/h1&gt;

&lt;p&gt;BTC at $76,099. Down 40.8% from the $125,835 ATH. RSI neutral at 54. Sitting 8% below the 200-day EMA.&lt;/p&gt;

&lt;p&gt;Most retail traders and even most bots handle this kind of sideways chop with rules-based buy-the-dip logic. But rules miss context — when exchange reserves are at 7-year lows and $2.5B in ETF money just flowed in during a correction, that is structurally different from "BTC dropped 8% in a week, time to buy."&lt;/p&gt;

&lt;p&gt;So I built a multi-agent BTC research system that does the reasoning explicitly. Three specialized agents, each with a domain, weighted and fused into a single daily signal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                        Master Controller (zo.ask)
                                  │
              ┌───────────────────┼───────────────────┐
              ▼                   ▼                   ▼
     Technical Agent      On-Chain Agent       Macro Agent
        (0.35)               (0.40)              (0.25)
              │                   │                   │
              └───────────────────┼───────────────────┘
                                  ▼
                        Signal Generator
                                  ▼
                          Daily Report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Technical Agent&lt;/strong&gt; — RSI, MACD, Bollinger Bands, support/resistance, price vs 200 EMA, distance from ATH. Reads the chart and gives a directional score.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On-Chain Agent&lt;/strong&gt; — MVRV, realized price floor, exchange reserves (7-year low = bullish), ETF flows, accumulation trend score. Reads the blockchain fundamentals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Macro Agent&lt;/strong&gt; — DXY correlation, BTC-S&amp;amp;P500 correlation, Fed rate expectations, S&amp;amp;P500 at record highs (risk-on backdrop), geopolitical risk (Iran war premium), institutional adoption. Reads the world outside crypto.&lt;/p&gt;

&lt;p&gt;Each agent outputs a &lt;code&gt;score&lt;/code&gt; (0-1) and a &lt;code&gt;confidence&lt;/code&gt; (0-1). The signal generator multiplies scores by weights (0.35 / 0.40 / 0.25) and sums them. Today's output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="err"&gt;Total&lt;/span&gt; &lt;span class="py"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;0.662&lt;/span&gt;
&lt;span class="py"&gt;Confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="s"&gt;0.684&lt;/span&gt;
&lt;span class="err"&gt;Signal&lt;/span&gt; &lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;BUY&lt;/span&gt;
&lt;span class="py"&gt;Action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;         &lt;span class="s"&gt;Take partial position (25-50% of capital)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How the agents actually score
&lt;/h2&gt;

&lt;p&gt;Here is the on-chain scoring (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_mvrv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mvrv&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mvrv&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mf"&gt;0.20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UNDERVALUED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mvrv&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mf"&gt;0.10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FAIR_VALUE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mvrv&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;3.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OVERVALUED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;                 &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EXTREME_OVERVALUED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_exchange_reserves&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pct&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pct&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LOW_RESERVES_BULLISH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# 7-year low
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pct&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NORMAL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;                 &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HIGH_RESERVES_BEARISH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The technical agent is similar — RSI 54 = neutral, below 200 EMA = -0.10, near resistance = -0.15. Each indicator adds or subtracts from the total weight.&lt;/p&gt;

&lt;h2&gt;
  
  
  What today's signal actually says
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;th&gt;Contribution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Technical&lt;/td&gt;
&lt;td&gt;0.44&lt;/td&gt;
&lt;td&gt;0.35&lt;/td&gt;
&lt;td&gt;0.154&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On-Chain&lt;/td&gt;
&lt;td&gt;0.80&lt;/td&gt;
&lt;td&gt;0.40&lt;/td&gt;
&lt;td&gt;0.320&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Macro&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.25&lt;/td&gt;
&lt;td&gt;0.188&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On-chain at 0.8 because: MVRV 1.5 (fair value but historically a buy zone), realized price floor $50K (we're well above), exchange reserves at 11.9% (7-year low = holders aren't selling), $2.5B in Q1 ETF inflows (institutions bought the correction).&lt;/p&gt;

&lt;p&gt;Macro at 0.75 because: S&amp;amp;P500 and Nasdaq at record highs (risk-on backdrop that should lift BTC), BTC lagging those highs (bullish catch-up), MSTR bought 34,164 BTC at $74,395 ($2.54B position), Trump Strategic Bitcoin Reserve at 328,372 BTC (~$24.5B).&lt;/p&gt;

&lt;p&gt;Technical is the laggard at 0.44 because: RSI neutral, below 200 EMA ($82,919), and the price is near the $75,000 resistance. The chart says "wait for confirmation" while the fundamentals say "the dip is over."&lt;/p&gt;

&lt;p&gt;That's the whole point of the multi-agent design: when fundamentals say BUY but technicals say WAIT, the system outputs a calibrated partial-position signal instead of a binary yes/no.&lt;/p&gt;

&lt;h2&gt;
  
  
  Signal types
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;STRONG_BUY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;score &amp;gt;= 0.75 → full position (50-100%)&lt;/span&gt;
&lt;span class="py"&gt;BUY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;          &lt;span class="s"&gt;score &amp;gt;= 0.60 → partial position (25-50%)&lt;/span&gt;
&lt;span class="py"&gt;NEUTRAL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="s"&gt;score &amp;gt;= 0.45 → hold / no new entries&lt;/span&gt;
&lt;span class="py"&gt;SELL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;         &lt;span class="s"&gt;score &amp;gt;= 0.30 → reduce position 25-50%&lt;/span&gt;
&lt;span class="py"&gt;STRONG_SELL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;score &amp;lt;  0.30 → exit or short&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;btc-research
python3 scripts/run_analysis.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Outputs to &lt;code&gt;signals/daily_signals.json&lt;/code&gt; and &lt;code&gt;reports/daily_report_YYYY-MM-DD.md&lt;/code&gt;. Each agent is independently runnable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 agents/technical_agent.py   &lt;span class="c"&gt;# chart signals&lt;/span&gt;
python3 agents/onchain_agent.py     &lt;span class="c"&gt;# Glassnode-style metrics&lt;/span&gt;
python3 agents/macro_agent.py       &lt;span class="c"&gt;# DXY, Fed, equities&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The whole thing runs as a scheduled agent on Zo Computer at 06:00 UTC daily, writes a markdown report, and could email/Telegram it to the user. Every component is plain Python — no LLM calls, no external APIs, fully transparent scoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's not here yet
&lt;/h2&gt;

&lt;p&gt;The agents currently use a static dataset (current price, MVRV, ETF flows, etc.) hand-curated from research. The next iteration is to wire real data: CoinGecko or Glassnode for price/on-chain, FRED for macro. The agent code is structured for it — &lt;code&gt;analyze()&lt;/code&gt; just needs to swap the hardcoded metrics dict for an API call.&lt;/p&gt;

&lt;p&gt;The signal generator is also naive about disagreement. When technicals say NEUTRAL but on-chain says BULLISH, the weighted sum works but doesn't tell you "technicals are the drag, watch for the breakout." A v2 could add per-agent variance detection and flag low-confidence signals explicitly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;No external libraries for the agents (just stdlib)&lt;/li&gt;
&lt;li&gt;Markdown reports&lt;/li&gt;
&lt;li&gt;SQLite for historical signals (planned)&lt;/li&gt;
&lt;li&gt;Runs on Zo Computer as a scheduled agent&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/AmSach/btc-research" rel="noopener noreferrer"&gt;https://github.com/AmSach/btc-research&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Skill: &lt;code&gt;/home/workspace/btc-research/skills/btc_system/SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Architecture doc: &lt;code&gt;btc-research/AUTOMATION.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Latest signal: &lt;code&gt;signals/daily_signals.json&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MIT license. The signal is research, not advice — always do your own due diligence. Crypto is risky. The point of the system is to make the reasoning auditable, not to replace your judgment.&lt;/p&gt;

&lt;p&gt;If you have ideas for additional agents (sentiment, derivatives funding rates, options skew, miner flows), PRs welcome. The agent interface is simple — return a dict with &lt;code&gt;signal&lt;/code&gt;, &lt;code&gt;score&lt;/code&gt;, &lt;code&gt;confidence&lt;/code&gt;, and a &lt;code&gt;summary&lt;/code&gt; string — and the signal generator picks them up automatically.&lt;/p&gt;

</description>
      <category>python</category>
      <category>bitcoin</category>
      <category>opensource</category>
      <category>crypto</category>
    </item>
    <item>
      <title>I automated reading 600+ RSS feeds into one daily India news brief</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Sun, 10 May 2026 04:05:25 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/i-automated-reading-600-rss-feeds-into-one-daily-india-news-brief-3nbp</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/i-automated-reading-600-rss-feeds-into-one-daily-india-news-brief-3nbp</guid>
      <description>&lt;p&gt;Every morning I used to spend an hour jumping between news sites — NDTV, The Hindu, Economic Times, scrolling through dozens of RSS feeds. Then I built a script to do it for me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with more sources
&lt;/h2&gt;

&lt;p&gt;More feeds actually made quality worse until I built a velocity scoring system. Without it, breaking stories drowned in noise and structural trends kept resurfacing even after I had already read them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;600+ RSS feeds, deduplicated, ranked by source authority and story velocity — synthesized into one email delivered at 8:30 AM IST. Politics, economy, tech, environment, markets — all in one readable brief.&lt;/p&gt;

&lt;p&gt;The pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fetch&lt;/strong&gt; — parallel HTTP requests to all feeds, 10-second timeout per feed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deduplicate&lt;/strong&gt; — simhash-based near-duplicate detection across all sources
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rank&lt;/strong&gt; — source authority weight × story velocity × recency score&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summarize&lt;/strong&gt; — Groq AI generates a 3-paragraph situational brief from top stories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email&lt;/strong&gt; — formatted HTML email, direct to inbox&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The hardest part
&lt;/h2&gt;

&lt;p&gt;The velocity scoring. I tried naive tf-idf first — it ranked opinion pieces over breaking news. The fix was a momentum score that tracks how many new sources pick up a story in the first 2 hours. Now genuine breaking stories surface without drowning editorial content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.12 + feedparser + httpx&lt;/li&gt;
&lt;li&gt;Redis for feed metadata cache&lt;/li&gt;
&lt;li&gt;Groq API for summarization
&lt;/li&gt;
&lt;li&gt;SQLite for story fingerprints&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;One email at 8:30 AM instead of 60 minutes of scattered reading. MIT licensed.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/amsach/btc-research" rel="noopener noreferrer"&gt;https://github.com/amsach/btc-research&lt;/a&gt; (skill lives in /home/workspace/Skills/india-daily/)&lt;/p&gt;

&lt;p&gt;If you consume India news for research or journalism, this might save you serious time.&lt;/p&gt;

&lt;h1&gt;
  
  
  Python #India #RSS #Automation #OpenSource #Journalism
&lt;/h1&gt;

</description>
      <category>python</category>
      <category>india</category>
      <category>rss</category>
      <category>automation</category>
    </item>
    <item>
      <title>I built a distributed compute grid where your idle laptop runs ML jobs — here's the architecture</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Wed, 06 May 2026 01:02:06 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/i-built-a-distributed-compute-grid-where-your-idle-laptop-runs-ml-jobs-heres-the-architecture-13e7</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/i-built-a-distributed-compute-grid-where-your-idle-laptop-runs-ml-jobs-heres-the-architecture-13e7</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Most personal computers sit idle 90% of the time. Meanwhile, ML training and gaming workloads cost a fortune on cloud GPU instances. I wanted to bridge that gap — turn idle hardware into useful compute.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ComputePool&lt;/strong&gt; — a hub-and-spoke distributed compute grid. Your Zo Computer acts as the control plane. Idle laptops and PCs become worker nodes that poll for jobs, execute workloads, and earn credits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Node Agent (Python) ← polls → Hub API ← dispatches → Worker Pool
                                    ↓
                              Credit Ledger
                                    ↓
                              Cashout System
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Node Agent&lt;/strong&gt; (&lt;code&gt;node-agent/node_agent.py&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Polls hub every 30s for available jobs&lt;/li&gt;
&lt;li&gt;Reports GPU tier (RTX 4090 = 3x credit multiplier)&lt;/li&gt;
&lt;li&gt;Streams results back on completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hub&lt;/strong&gt; (&lt;code&gt;hub/hub.ts&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI backend on Railway&lt;/li&gt;
&lt;li&gt;Job queue with priority based on GPU tier&lt;/li&gt;
&lt;li&gt;Credit ledger per node&lt;/li&gt;
&lt;li&gt;Regional multipliers (Indian region: 0.7x)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dashboard&lt;/strong&gt; (&lt;code&gt;frontend/&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next.js 14 on Vercel&lt;/li&gt;
&lt;li&gt;Real-time job status, credit balance, node management&lt;/li&gt;
&lt;li&gt;Live at &lt;a href="https://man44.zo.space/pool" rel="noopener noreferrer"&gt;man44.zo.space/pool&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Credit Economy
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Workers earn credits per job completed&lt;/li&gt;
&lt;li&gt;GPU tiers: RTX 4090 (3x), RTX 3080 (2x), GTX 1080 (1x)&lt;/li&gt;
&lt;li&gt;Indian region: 0.7x base rate&lt;/li&gt;
&lt;li&gt;20% platform fee on all earnings&lt;/li&gt;
&lt;li&gt;Minimum cashout: ₹500&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Design Decisions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pull-based job distribution&lt;/strong&gt; — Nodes poll, hub doesn't push. Eliminates NAT traversal issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU-tiered pricing&lt;/strong&gt; — Higher-end GPUs earn more credits, incentivizes quality hardware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regional multipliers&lt;/strong&gt; — Adjusts for purchasing power parity in different markets.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Backend: FastAPI + PostgreSQL&lt;/li&gt;
&lt;li&gt;Frontend: Next.js 14 + Tailwind CSS&lt;/li&gt;
&lt;li&gt;Node Agent: Python 3.10+ with Docker&lt;/li&gt;
&lt;li&gt;Deployment: Railway (backend) + Vercel (frontend)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  GitHub
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/amsach/compute-pool" rel="noopener noreferrer"&gt;https://github.com/amsach/compute-pool&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Feedback Wanted
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Is the credit economy balanced for casual node operators?&lt;/li&gt;
&lt;li&gt;Would you run a node agent on your machine? Why or why not?&lt;/li&gt;
&lt;li&gt;Any security concerns with the pull-based model?&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  distributed #machinelearning #python #javascript #BuildInPublic
&lt;/h1&gt;

</description>
      <category>distributed</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Qwen sky proof: compressed memory made a tiny model behave better — with the receipts</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Mon, 04 May 2026 21:33:46 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/qwen-sky-proof-compressed-memory-made-a-tiny-model-behave-better-with-the-receipts-804</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/qwen-sky-proof-compressed-memory-made-a-tiny-model-behave-better-with-the-receipts-804</guid>
      <description>&lt;p&gt;This was a tiny-model before/after run with a very ordinary goal: keep the answer useful when the wording changes.&lt;/p&gt;

&lt;p&gt;The setup used &lt;strong&gt;Qwen2.5-0.5B-Instruct&lt;/strong&gt; with a memory layer around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The measured result
&lt;/h2&gt;

&lt;p&gt;From the proof pack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before latency:&lt;/strong&gt; 10,061.7 ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After latency:&lt;/strong&gt; 4,652.6 ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Before tokens:&lt;/strong&gt; 35&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After tokens:&lt;/strong&gt; 97&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token saved:&lt;/strong&gt; -177.1%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency delta:&lt;/strong&gt; -5,409.1 ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Peak RSS:&lt;/strong&gt; 1,794 MB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a nice reminder that “smaller prompt” is not always the same thing as “better answer”. Sometimes the smarter move is to give the model the right memory, even if it costs a few more tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the demo showed
&lt;/h2&gt;

&lt;p&gt;The before run was raw. The after run used a compressed memory summary that kept the useful facts and dropped the filler.&lt;/p&gt;

&lt;p&gt;That is the point of this kind of system: stay useful when the wording changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proof pack
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzq8gayc707d46nw258rx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzq8gayc707d46nw258rx.png" alt="Side-by-side proof" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvmtwc6ze30dxu5it72r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvmtwc6ze30dxu5it72r.png" alt="Terminal capture" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F74pfmpgdlurjfo5ft1kt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F74pfmpgdlurjfo5ft1kt.png" alt="Links and artefacts" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Proof pack: &lt;a href="https://zo.pub/man42/qwen-sky-proof" rel="noopener noreferrer"&gt;https://zo.pub/man42/qwen-sky-proof&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub profile: &lt;a href="https://github.com/AmSach" rel="noopener noreferrer"&gt;https://github.com/AmSach&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Instagram: &lt;a href="https://www.instagram.com/i.amsach" rel="noopener noreferrer"&gt;https://www.instagram.com/i.amsach&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/theamansachan" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/theamansachan&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>benchmarking</category>
    </item>
    <item>
      <title>KVQuant / BitForge: same model, smarter context, better answer</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Mon, 04 May 2026 21:32:43 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/kvquant-bitforge-same-model-smarter-context-better-answer-55ff</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/kvquant-bitforge-same-model-smarter-context-better-answer-55ff</guid>
      <description>&lt;p&gt;Most AI workflow posts are just a screenshot of a chat box and a hopeful caption.&lt;/p&gt;

&lt;p&gt;This one is different: I ran the &lt;strong&gt;same local model&lt;/strong&gt; twice on the &lt;strong&gt;same question&lt;/strong&gt;, once with a raw prompt and once with a memory + retrieval stack around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;raw prompt&lt;/li&gt;
&lt;li&gt;no compression&lt;/li&gt;
&lt;li&gt;no semantic retrieval&lt;/li&gt;
&lt;li&gt;more clutter in context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compressed working context&lt;/li&gt;
&lt;li&gt;semantic retrieval from memory notes&lt;/li&gt;
&lt;li&gt;fewer prompt tokens&lt;/li&gt;
&lt;li&gt;same model, same task, less nonsense&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The measured result
&lt;/h2&gt;

&lt;p&gt;From the proof pack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before latency:&lt;/strong&gt; 28,590.3 ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After latency:&lt;/strong&gt; 25,008.9 ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Before accuracy:&lt;/strong&gt; 0.500&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After accuracy:&lt;/strong&gt; 1.000&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Before prompt tokens:&lt;/strong&gt; 87&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After prompt tokens:&lt;/strong&gt; 108&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory saved:&lt;/strong&gt; -24.1%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last line is the fun one: the “after” run used &lt;em&gt;more&lt;/em&gt; prompt tokens here, because I tuned it to answer the question better. Token count is a tool, not a religion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;The model did not become magical. The workflow got smarter.&lt;/p&gt;

&lt;p&gt;That is the whole game with KV cache compression and prompt shaping work: make the task clearer, measure the result, and keep the same model honest across versions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proof pack
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Froic7g7bddo714xezp5l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Froic7g7bddo714xezp5l.png" alt="Before/after view" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kwedxxht1y76aixyh7a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kwedxxht1y76aixyh7a.png" alt="Scores panel" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwy21fuq1je2zjvw3h8kg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwy21fuq1je2zjvw3h8kg.png" alt="Terminal transcript" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/AmSach/llm-foundry" rel="noopener noreferrer"&gt;https://github.com/AmSach/llm-foundry&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Proof pack: &lt;a href="https://zo.pub/man42/kvquant-bitforge-real-prompt-proof" rel="noopener noreferrer"&gt;https://zo.pub/man42/kvquant-bitforge-real-prompt-proof&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub profile: &lt;a href="https://github.com/AmSach" rel="noopener noreferrer"&gt;https://github.com/AmSach&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Instagram: &lt;a href="https://www.instagram.com/i.amsach" rel="noopener noreferrer"&gt;https://www.instagram.com/i.amsach&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/theamansachan" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/theamansachan&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>benchmarking</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>LLM Foundry on a tiny model: the stack still does the heavy lifting</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Mon, 04 May 2026 21:32:41 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/llm-foundry-on-a-tiny-model-the-stack-still-does-the-heavy-lifting-360a</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/llm-foundry-on-a-tiny-model-the-stack-still-does-the-heavy-lifting-360a</guid>
      <description>&lt;p&gt;This run was intentionally small-model and intentionally boring: no cloud API, no fake genius, just a tiny local model plus a better stack around it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM Foundry with Qwen2.5-0.5B&lt;/strong&gt; is the version that makes the point most cleanly: the model itself is small, but the workflow around it can still be decent.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the proof showed
&lt;/h2&gt;

&lt;p&gt;From the local proof run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark pass rate:&lt;/strong&gt; 50%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning:&lt;/strong&gt; 60%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coding:&lt;/strong&gt; 100%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool + memory:&lt;/strong&gt; 100%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The demo also showed memory compression and retrieval in action. The exact lesson is simple: if wording changes, semantic retrieval is a lot better than brittle keyword matching.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I care
&lt;/h2&gt;

&lt;p&gt;The whole point of this layer is not to brag about a bigger model. It is to make a small model more usable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it can recover relevant context&lt;/li&gt;
&lt;li&gt;it can shrink messy transcripts into working memory&lt;/li&gt;
&lt;li&gt;it can be checked instead of hand-waved&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the part around the model that turns a chat toy into something that can remember, recover context, and be tested.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proof pack
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgme13nxs9xsoyglohy24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgme13nxs9xsoyglohy24.png" alt="Top screenshot" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5vuov9opbyca9ksut3cr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5vuov9opbyca9ksut3cr.png" alt="Middle screenshot" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2950fapsnoxsxwg4dwd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2950fapsnoxsxwg4dwd.png" alt="Bottom screenshot" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/AmSach/llm-foundry" rel="noopener noreferrer"&gt;https://github.com/AmSach/llm-foundry&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Proof pack: &lt;a href="https://zo.pub/man42/llm-foundry-small-model" rel="noopener noreferrer"&gt;https://zo.pub/man42/llm-foundry-small-model&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub profile: &lt;a href="https://github.com/AmSach" rel="noopener noreferrer"&gt;https://github.com/AmSach&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Instagram: &lt;a href="https://www.instagram.com/i.amsach" rel="noopener noreferrer"&gt;https://www.instagram.com/i.amsach&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/theamansachan" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/theamansachan&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>LLM Foundry: why the stack around the model matters more than the model itself</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Mon, 04 May 2026 21:31:17 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/llm-foundry-why-the-stack-around-the-model-matters-more-than-the-model-itself-23h4</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/llm-foundry-why-the-stack-around-the-model-matters-more-than-the-model-itself-23h4</guid>
      <description>&lt;p&gt;I wanted to see whether a weak local model could become genuinely useful without pretending the base model was magic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM Foundry&lt;/strong&gt; is the stack around the model: memory, compression, semantic retrieval, provider support, and a benchmark harness.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core idea
&lt;/h2&gt;

&lt;p&gt;A useful model workflow usually looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;read the task&lt;/li&gt;
&lt;li&gt;recover relevant memory&lt;/li&gt;
&lt;li&gt;compress the clutter&lt;/li&gt;
&lt;li&gt;ask the model&lt;/li&gt;
&lt;li&gt;check the answer&lt;/li&gt;
&lt;li&gt;use tools if needed&lt;/li&gt;
&lt;li&gt;save traces&lt;/li&gt;
&lt;li&gt;benchmark the result&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the difference between a chatbot and something you can actually trust on real work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed
&lt;/h2&gt;

&lt;p&gt;The current version now has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;embedding-based semantic retrieval&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;multi-provider support&lt;/strong&gt; for OpenAI-compatible and Anthropic endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;compression + memory&lt;/strong&gt; so long tasks can be shrunk into compact context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;agent traces&lt;/strong&gt; that can become training data later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;benchmarks and harnesses&lt;/strong&gt; so the system is measurable&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The measured part
&lt;/h2&gt;

&lt;p&gt;The proof pack shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark pass rate:&lt;/strong&gt; 50%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning harness:&lt;/strong&gt; 60%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coding harness:&lt;/strong&gt; 100%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-use harness:&lt;/strong&gt; 100%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory harness:&lt;/strong&gt; 100%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That benchmark score is not a brag. It is a baseline. The point is that the system is measurable, and therefore improvable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest limitation
&lt;/h2&gt;

&lt;p&gt;Orchestration helps, but it does not create capability out of thin air. If the base model is weak at reasoning, the stack can make it more useful, more reliable, and easier to test — but not magically frontier-grade.&lt;/p&gt;

&lt;p&gt;That is still a very good deal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proof pack
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzd9qmjsn8gnsd7pswvk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzd9qmjsn8gnsd7pswvk.png" alt="Top screenshot" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdnvoge1rtr2aro8yb1sg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdnvoge1rtr2aro8yb1sg.png" alt="Middle screenshot" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2ys4y3vl8kbd2aihyk4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2ys4y3vl8kbd2aihyk4.png" alt="Bottom screenshot" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/AmSach/llm-foundry" rel="noopener noreferrer"&gt;https://github.com/AmSach/llm-foundry&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Proof pack: &lt;a href="https://zo.pub/man42/llm-foundry" rel="noopener noreferrer"&gt;https://zo.pub/man42/llm-foundry&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub profile: &lt;a href="https://github.com/AmSach" rel="noopener noreferrer"&gt;https://github.com/AmSach&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Instagram: &lt;a href="https://www.instagram.com/i.amsach" rel="noopener noreferrer"&gt;https://www.instagram.com/i.amsach&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/theamansachan" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/theamansachan&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>SignalHop: the acoustic mesh networking stack that talks through sound</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Mon, 04 May 2026 21:31:16 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/signalhop-the-acoustic-mesh-networking-stack-that-talks-through-sound-1llo</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/signalhop-the-acoustic-mesh-networking-stack-that-talks-through-sound-1llo</guid>
      <description>&lt;p&gt;I wanted an answer to a boringly practical question: what if the network is gone, but the speakers and microphones still work?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SignalHop&lt;/strong&gt; is an acoustic modem and mesh prototype that moves tiny messages over ultrasound instead of Wi‑Fi or Bluetooth.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it actually does
&lt;/h2&gt;

&lt;p&gt;The current build uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FSK tones at &lt;strong&gt;18 kHz&lt;/strong&gt; and &lt;strong&gt;20 kHz&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;48 kHz&lt;/strong&gt; sample rate&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;4 up-chirp&lt;/strong&gt; sync preamble&lt;/li&gt;
&lt;li&gt;frames with payloads up to &lt;strong&gt;255 bytes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;a WAV/audio runtime for encode/decode and round-trip testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes it useful for emergency text, sensor payloads, and low-bandwidth status updates. Not glamorous. Very useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  The measured part
&lt;/h2&gt;

&lt;p&gt;From the current proof pack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Encode latency&lt;/td&gt;
&lt;td&gt;16.634 ms mean&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decode latency&lt;/td&gt;
&lt;td&gt;713.028 ms mean&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Round-trip&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Payload tested&lt;/td&gt;
&lt;td&gt;77 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max payload&lt;/td&gt;
&lt;td&gt;255 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The decode path is slower because it scans the full signal with correlation. That is fine for v1. The important part is that the numbers are measured, not imagined.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest limitation
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;core modem path is field-deployable today&lt;/strong&gt;, but the full multi-node mesh still needs hardware validation across real devices. I’m keeping that line explicit on purpose.&lt;/p&gt;

&lt;p&gt;That means this is a real protocol stack, not marketing fog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proof pack
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4dzxottz09q9ov3em9r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4dzxottz09q9ov3em9r.png" alt="SignalHop cover" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv3qzjqzgegj6z8tk2ki5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv3qzjqzgegj6z8tk2ki5.png" alt="Performance card" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmc502g5b0sk5cs7piub3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmc502g5b0sk5cs7piub3.png" alt="Mesh diagram" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/AmSach/SignalHop" rel="noopener noreferrer"&gt;https://github.com/AmSach/SignalHop&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Bench JSON: &lt;a href="https://raw.githubusercontent.com/AmSach/SignalHop/master/media/bench.json" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/AmSach/SignalHop/master/media/bench.json&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Summary JSON: &lt;a href="https://raw.githubusercontent.com/AmSach/SignalHop/master/media/summary.json" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/AmSach/SignalHop/master/media/summary.json&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Sound is the oldest protocol. We just updated the spec.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>networking</category>
      <category>iot</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>KVQuant: real terminal proof for KV-cache compression</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Sun, 03 May 2026 17:27:30 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/kvquant-real-terminal-proof-for-kv-cache-compression-3aog</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/kvquant-real-terminal-proof-for-kv-cache-compression-3aog</guid>
      <description>&lt;h1&gt;
  
  
  KVQuant: real terminal proof for KV-cache compression
&lt;/h1&gt;

&lt;p&gt;KVQuant is a cache-compression layer for long-context inference. The interesting bit is not the idea — lots of projects have that — but whether it survives contact with a real model, a real terminal, and a real benchmark table.&lt;/p&gt;

&lt;p&gt;This write-up is the boring but useful version: what it does, what I ran, what the numbers were, and where it helps or doesn’t.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why KV cache matters
&lt;/h2&gt;

&lt;p&gt;When a model generates text, it keeps a memory of previous tokens in the &lt;strong&gt;KV cache&lt;/strong&gt;. That cache grows with every step. Weight quantisation shrinks the model weights, but it doesn’t directly touch this memory tax.&lt;/p&gt;

&lt;p&gt;KVQuant targets that cache directly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Allocate fewer bits for older tokens&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pack the cache into smaller storage&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Restore it before the next forward pass&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That gives you a real memory win on long-running chats and long-context inference.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I benchmarked
&lt;/h2&gt;

&lt;p&gt;I ran two kinds of proof:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;real Hugging Face model&lt;/strong&gt; run with &lt;code&gt;distilgpt2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;deterministic synthetic cache benchmark&lt;/strong&gt; to make the cache math obvious and reproducible&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real-model result
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Prompt tokens&lt;/th&gt;
&lt;th&gt;Generated tokens&lt;/th&gt;
&lt;th&gt;Baseline cache&lt;/th&gt;
&lt;th&gt;KVQuant cache&lt;/th&gt;
&lt;th&gt;Saved&lt;/th&gt;
&lt;th&gt;Cache ratio&lt;/th&gt;
&lt;th&gt;KVQuant compression&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;product-explainer&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;9.56 MiB&lt;/td&gt;
&lt;td&gt;2.39 MiB&lt;/td&gt;
&lt;td&gt;7.17 MiB&lt;/td&gt;
&lt;td&gt;4.00x&lt;/td&gt;
&lt;td&gt;8.00x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;developer-note&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;9.63 MiB&lt;/td&gt;
&lt;td&gt;2.41 MiB&lt;/td&gt;
&lt;td&gt;7.22 MiB&lt;/td&gt;
&lt;td&gt;4.00x&lt;/td&gt;
&lt;td&gt;8.00x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total cache saved:&lt;/strong&gt; 14.40 MiB&lt;/p&gt;

&lt;h3&gt;
  
  
  Honest speed note
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Baseline t/s&lt;/th&gt;
&lt;th&gt;KVQuant t/s&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;product-explainer&lt;/td&gt;
&lt;td&gt;21.17&lt;/td&gt;
&lt;td&gt;16.05&lt;/td&gt;
&lt;td&gt;0.76x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;developer-note&lt;/td&gt;
&lt;td&gt;21.88&lt;/td&gt;
&lt;td&gt;20.10&lt;/td&gt;
&lt;td&gt;0.92x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is the part I do &lt;strong&gt;not&lt;/strong&gt; want to hide: on a small CPU model, compression overhead can offset throughput gains. The memory savings are real; the wall-clock speedup is workload-dependent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Actual terminal proof
&lt;/h2&gt;

&lt;p&gt;This is the real terminal run I captured. The key part is that it is a direct terminal transcript from a benchmark script, not a dashboard summary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj495jtsd5w1xx5vzr35a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj495jtsd5w1xx5vzr35a.png" alt="Terminal proof" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Exact command run
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source&lt;/span&gt; /home/.z/workspaces/con_v0tzKzkrq5Z4Ia2E/.venv/bin/activate
&lt;span class="nb"&gt;cd&lt;/span&gt; /home/.z/workspaces/con_v0tzKzkrq5Z4Ia2E/KVQuant
&lt;span class="nv"&gt;HF_HUB_DISABLE_PROGRESS_BARS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; python examples/e2e_benchmark.py &lt;span class="nt"&gt;--model&lt;/span&gt; distilgpt2 &lt;span class="nt"&gt;--output-dir&lt;/span&gt; /home/.z/workspaces/con_v0tzKzkrq5Z4Ia2E/terminal-proof/output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step-by-step terminal output
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1) Benchmark started
# KVQuant end-to-end benchmark (distilgpt2)

2) Model and generation mode
Real Hugging Face causal LM, real greedy generation, and real output tokens.

3) Measured table
| Scenario | Prompt tokens | Generated tokens | Baseline t/s | KVQuant t/s | Speedup | Baseline cache | KVQuant cache | Saved | Cache ratio | KVQuant compression |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| product-explainer | 17 | 256 | 21.17 | 16.05 | 0.76x | 9.56 MiB | 2.39 MiB | 7.17 MiB | 4.00x | 8.00x |
| developer-note | 19 | 256 | 21.88 | 20.10 | 0.92x | 9.63 MiB | 2.41 MiB | 7.22 MiB | 4.00x | 8.00x |

4) Summary
**Average speedup:** 0.84x
**Average cache ratio:** 4.00x
**Average generated tokens:** 256
**Total cache saved:** 14.40 MiB

5) File outputs
HTML: /home/.z/workspaces/con_v0tzKzkrq5Z4Ia2E/terminal-proof/output/kvquant-e2e-benchmark.html
JSON: /home/.z/workspaces/con_v0tzKzkrq5Z4Ia2E/terminal-proof/output/kvquant-e2e-benchmark.json
Markdown: /home/.z/workspaces/con_v0tzKzkrq5Z4Ia2E/terminal-proof/output/kvquant-e2e-benchmark.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Exact output text for each prompt
&lt;/h2&gt;

&lt;p&gt;These are the Qwen2.5-0.5B-Instruct sample outputs I used so the post shows the &lt;strong&gt;full text&lt;/strong&gt;, not a cut-off snippet.&lt;/p&gt;

&lt;h3&gt;
  
  
  product-explainer
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Explain why KV cache compression matters in long-context AI systems. Return exactly 5 sentences, no bullets, no intro.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;KV (Key-Value) caches play a crucial role in long-context AI systems by efficiently storing and retrieving large amounts of data. By compressing key-value pairs, these caches reduce storage requirements significantly, especially when dealing with large datasets or complex queries. This compression helps in managing memory usage more effectively, ensuring that the system can handle high volumes of data without running out of disk space. Additionally, compressed KV caches enable faster retrieval times for frequently accessed data, improving overall performance and responsiveness in real-time applications.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  developer-note
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Explain KV cache compression to an engineer who cares about latency, memory, and reliability. Return exactly 5 bullet points, no intro.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- **Compression**: Utilizes data redundancy to reduce the amount of data that needs to be transferred between the CPU and disk.
- **Memory Efficiency**: Reduces the number of I/O operations required by caching frequently accessed data in memory.
- **Latency Reduction**: Minimizes the time it takes for data to reach the CPU from the disk, improving overall system performance.
- **Reliability Enhancement**: Ensures consistent access to data even when network or hardware failures occur.
- **Scalability**: Allows for efficient use of resources based on the size of the data being cached.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Browser-rendered proof
&lt;/h2&gt;

&lt;p&gt;Here’s the full report rendered in browser.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fman42.zo.space%2Fassets%2Fkvquant-e2e-proof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fman42.zo.space%2Fassets%2Fkvquant-e2e-proof.png" alt="Benchmark proof" width="800" height="3994"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The synthetic benchmark baseline
&lt;/h2&gt;

&lt;p&gt;Before trusting real-model results, I verified with synthetic tensors across a range of cache shapes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Shape&lt;/th&gt;
&lt;th&gt;Without KVQuant&lt;/th&gt;
&lt;th&gt;With KVQuant&lt;/th&gt;
&lt;th&gt;Saved&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;chat-turn&lt;/td&gt;
&lt;td&gt;(1, 8, 512, 64)&lt;/td&gt;
&lt;td&gt;0.50 MiB&lt;/td&gt;
&lt;td&gt;0.13 MiB&lt;/td&gt;
&lt;td&gt;0.38 MiB&lt;/td&gt;
&lt;td&gt;4.00x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;code-assist&lt;/td&gt;
&lt;td&gt;(1, 16, 1024, 64)&lt;/td&gt;
&lt;td&gt;1.00 MiB&lt;/td&gt;
&lt;td&gt;0.25 MiB&lt;/td&gt;
&lt;td&gt;0.75 MiB&lt;/td&gt;
&lt;td&gt;4.00x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;rag-summary&lt;/td&gt;
&lt;td&gt;(1, 16, 2048, 64)&lt;/td&gt;
&lt;td&gt;2.00 MiB&lt;/td&gt;
&lt;td&gt;0.50 MiB&lt;/td&gt;
&lt;td&gt;1.50 MiB&lt;/td&gt;
&lt;td&gt;4.00x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tool-agent&lt;/td&gt;
&lt;td&gt;(1, 32, 2048, 128)&lt;/td&gt;
&lt;td&gt;8.00 MiB&lt;/td&gt;
&lt;td&gt;2.00 MiB&lt;/td&gt;
&lt;td&gt;6.00 MiB&lt;/td&gt;
&lt;td&gt;4.00x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;long-context&lt;/td&gt;
&lt;td&gt;(1, 32, 4096, 128)&lt;/td&gt;
&lt;td&gt;16.00 MiB&lt;/td&gt;
&lt;td&gt;4.00 MiB&lt;/td&gt;
&lt;td&gt;12.00 MiB&lt;/td&gt;
&lt;td&gt;4.00x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tiny-firmware&lt;/td&gt;
&lt;td&gt;(1, 4, 256, 64)&lt;/td&gt;
&lt;td&gt;0.0625 MiB&lt;/td&gt;
&lt;td&gt;0.0156 MiB&lt;/td&gt;
&lt;td&gt;0.0469 MiB&lt;/td&gt;
&lt;td&gt;4.00x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 4x ratio is consistent across all scales. This is the expected outcome: 4-bit quantization of fp16 gives you exactly 4x.&lt;/p&gt;




&lt;h2&gt;
  
  
  What changed in this round
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bigger intent set
&lt;/h3&gt;

&lt;p&gt;Added scenarios with high token counts (256 output tokens) so the cache actually accumulates to meaningful sizes. Real-world use cases — not toy examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real end-to-end benchmark
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;examples/e2e_benchmark.py&lt;/code&gt; runs a full generation loop and writes &lt;code&gt;.html&lt;/code&gt;, &lt;code&gt;.json&lt;/code&gt;, and &lt;code&gt;.md&lt;/code&gt; output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real DynamicCache integration
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;CompressedDynamicCache&lt;/code&gt; in &lt;code&gt;kvquant/cache.py&lt;/code&gt; is a drop-in &lt;code&gt;DynamicCache&lt;/code&gt; subclass. It compresses on &lt;code&gt;update()&lt;/code&gt; and decompresses on iteration. Works with &lt;code&gt;model.generate()&lt;/code&gt; directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tiny firmware export profile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; python examples/e2e_benchmark.py &lt;span class="nt"&gt;--profile&lt;/span&gt; tiny
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generates a build-ready JSON profile that proves the cache shape, bit allocation, and target ratio without needing a full model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next direction: retrieval-assisted memory
&lt;/h3&gt;

&lt;p&gt;A sensible next step is to combine KV compression with an embedding-indexed memory layer so the system can retrieve the most relevant past context instead of keeping every token equally alive. That could push compression harder while keeping quality closer to baseline, but that is a &lt;strong&gt;research direction&lt;/strong&gt;, not a claim I can honestly call zero-loss yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this is not (yet)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not a throughput win on small/fast models&lt;/strong&gt; — compression overhead &amp;gt; memory savings for distilgpt2 on CPU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not a training system&lt;/strong&gt; — inference only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not magic&lt;/strong&gt; — it targets the KV cache, not weights&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What it is
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A real, working KV cache compressor with honest benchmarks&lt;/li&gt;
&lt;li&gt;A drop-in &lt;code&gt;DynamicCache&lt;/code&gt; that production pipelines can use today&lt;/li&gt;
&lt;li&gt;A foundation for the regimes where memory wins translate to throughput wins (larger models, longer context)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;kvquant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or from source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/AmSach/KVQuant.git
&lt;span class="nb"&gt;cd &lt;/span&gt;KVQuant
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="nv"&gt;PYTHONPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; python examples/e2e_benchmark.py &lt;span class="nt"&gt;--model&lt;/span&gt; distilgpt2 &lt;span class="nt"&gt;--output-dir&lt;/span&gt; ./benchmark-results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;em&gt;All benchmark data is reproducible. Screenshots and JSON logs are in the repo under &lt;code&gt;examples/&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
