<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shakib S.</title>
    <description>The latest articles on DEV Community by Shakib S. (@workspacedex).</description>
    <link>https://dev.to/workspacedex</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3773333%2F1a24000d-1620-4beb-bd9c-d71676ed32f8.jpg</url>
      <title>DEV Community: Shakib S.</title>
      <link>https://dev.to/workspacedex</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/workspacedex"/>
    <language>en</language>
    <item>
      <title>The Hitchhiker's Guide to Running Agentic Systems Locally</title>
      <dc:creator>Shakib S.</dc:creator>
      <pubDate>Thu, 09 Apr 2026 18:07:09 +0000</pubDate>
      <link>https://dev.to/workspacedex/the-hitchhikers-guide-to-running-agentic-systems-locally-312p</link>
      <guid>https://dev.to/workspacedex/the-hitchhikers-guide-to-running-agentic-systems-locally-312p</guid>
      <description>&lt;p&gt;&lt;strong&gt;Engineering a Hybrid LLM Router for Production Agentic Systems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every agentic system eventually confronts the same wall: intelligence costs latency, and latency destroys experience. The standard prescription — throw more compute at it — is lazy engineering disguised as ambition.&lt;/p&gt;

&lt;p&gt;After months of iterating on agentic workflows on an Arch Linux rig, I found a third path. Not faster models. Not cheaper models. A smarter layer that decides which model to use — and when.&lt;/p&gt;

&lt;p&gt;Small open-weight models are excellent for routine tasks: fast, private, and inexpensive to run. Their limitation appears when prompts require multi-step reasoning, structured output, or strict tool use.&lt;/p&gt;

&lt;p&gt;But small models hit what I call a &lt;strong&gt;reasoning ceiling&lt;/strong&gt;, a hard limit where multi-step logical deduction collapses into the &lt;strong&gt;confidence loop&lt;/strong&gt;: the model executes the wrong tool with complete conviction, or hallucinates a JSON schema that does not exist.&lt;/p&gt;

&lt;p&gt;Frontier APIs — DeepSeek V3.2, GPT-4o — solve the ceiling problem. But they introduce extra latency, especially on simple requests, and a monthly bill that resembles a lease payment more than a software expense.&lt;/p&gt;

&lt;p&gt;The more practical solution is not to depend on one model for everything, but to add a routing layer that chooses the right model for the task.&lt;/p&gt;

&lt;p&gt;What follows is a production engineering account of how to build one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Routing Layer: Theory vs. Reality
&lt;/h2&gt;

&lt;p&gt;The naive formulation is seductive: route 'trivial' tasks to the local 9B, 'complex' tasks to the cloud. The production reality is that complexity is not a binary flag — and the cost of misclassification flows in both directions.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Failure of Keyword-Based Routing
&lt;/h3&gt;

&lt;p&gt;My first iteration used a keyword router. Prompts containing 'analyze' or 'compare' were dispatched to the cloud. It produced two failure modes that made it unusable in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False positives:&lt;/strong&gt; A prompt like 'Compare 2+2 and 3+3' hit the cloud, wasting API credits and adding 2+ seconds of network round-trip for a task any model could answer instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False negatives:&lt;/strong&gt; A deceptively simple-looking prompt — 'Summarize this 10k-token log and find the one-line error' — was sent local. The 9B choked on the context window and hallucinated the result with full confidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Confidence-Based Architecture
&lt;/h3&gt;

&lt;p&gt;The solution was to evaluate prompts across three independent signal vectors rather than classify them by keyword pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constraint Density.&lt;/strong&gt; Does the prompt contain more than three strict, simultaneous constraints? ('JSON output,' 'Under 50 words,' 'Reference Page 4.') High constraint density is a reliable predictor of structured output failure in quantized models. Route to cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context Pressure.&lt;/strong&gt; Is the input token count above 8k? Local 9B models begin to exhibit needle-in-a-haystack degradation in this range — they process the context window but lose positional accuracy on retrievals. Route to cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Scout Classifier.&lt;/strong&gt; A dedicated 1B model — lightweight enough to run in under 50ms — whose sole function is to categorize incoming prompts as Trivial, Standard, or Complex. It adds minimal overhead while dramatically improving routing accuracy.&lt;/p&gt;
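
&lt;p&gt;The first two signals are cheap enough to compute without any model at all. A minimal sketch — the marker list and the 4-characters-per-token estimate are illustrative values, not production-tuned ones:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Illustrative markers for strict output constraints.
CONSTRAINT_MARKERS = [
    r"\bjson\b", r"\bunder \d+ words\b", r"\bexactly\b",
    r"\bpage \d+\b", r"\bschema\b", r"\bbullet points\b",
]

def constraint_density(prompt: str) -> int:
    # Count how many strict, simultaneous constraints the prompt names.
    text = prompt.lower()
    return sum(1 for pattern in CONSTRAINT_MARKERS if re.search(pattern, text))

def context_pressure(prompt: str, limit_tokens: int = 8000) -> bool:
    # Rough estimate: about 4 characters per token for English text.
    return len(prompt) / 4 > limit_tokens

def needs_cloud(prompt: str) -> bool:
    return constraint_density(prompt) > 3 or context_pressure(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;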

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4xe81fokkc8xaatd4kbj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4xe81fokkc8xaatd4kbj.png" alt=" " width="800" height="505"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Unit Economics of Agentic Compute
&lt;/h2&gt;

&lt;p&gt;The standard metric — monthly API spend — is the wrong unit of measurement for agentic systems. It optimizes for the wrong variable and obscures the actual cost structure of the work.&lt;/p&gt;

&lt;p&gt;The correct metric is &lt;strong&gt;Cost per Successful Task (CPST)&lt;/strong&gt;. If a local model is 'free' but fails 30% of the time, requiring manual human correction, the cost is not zero — it is your time, which is the most expensive resource in the system. If a cloud model charges $0.05 and succeeds 100% of the time, it is, by any rational accounting, cheaper.&lt;/p&gt;

&lt;p&gt;Free models are not free. They externalize cost onto the operator's attention.&lt;/p&gt;
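
&lt;p&gt;The arithmetic is worth making explicit. A sketch, using the 30% failure rate from above; the ten-minute correction time and $60/hour rate are illustrative assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def cost_per_successful_task(api_cost: float, success_rate: float,
                             correction_minutes: float = 10.0,
                             hourly_rate: float = 60.0) -> float:
    # Failed runs cost operator time; fold that into the per-task price.
    # correction_minutes and hourly_rate are illustrative assumptions.
    failure_rate = 1.0 - success_rate
    correction_cost = failure_rate * (correction_minutes / 60.0) * hourly_rate
    return api_cost + correction_cost

local_cpst = cost_per_successful_task(0.0, 0.70)   # 'free' model: about 3.00
cloud_cpst = cost_per_successful_task(0.05, 1.00)  # paid model: 0.05
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under those assumptions, the 'free' model costs sixty times more per successful task than the $0.05 cloud call.&lt;/p&gt;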

&lt;h3&gt;
  
  
  The Tradeoffs in the Quantization Curve
&lt;/h3&gt;

&lt;p&gt;A week of benchmarking q4_K_M vs. q8_0 (GGUF) produced a finding that materially changes how the hybrid system should be designed:&lt;/p&gt;

&lt;p&gt;For most routine tasks, q4_K_M performs close to full precision, but structured tool-calling is where reliability begins to degrade.&lt;/p&gt;

&lt;p&gt;For structured tool-calling, q4 quantization introduces an intermittent &lt;strong&gt;bracket-drop failure&lt;/strong&gt;: occasionally missing a closing brace in a generated JSON schema. This single failure mode propagates up the agent loop and crashes the entire execution chain.&lt;/p&gt;

&lt;p&gt;The engineering resolution is straightforward: use q4_K_M for scout classification and general conversational tasks; reserve a dedicated q8_0 inference slice for the tool-calling engine specifically. The additional memory overhead is modest. The reliability gain is non-trivial.&lt;/p&gt;
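
&lt;p&gt;In practice this is a small routing table. A sketch (the model tags are placeholders for whatever you actually serve locally):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Task classes mapped to quantization slices; the tags are placeholders.
MODEL_SLICES = {
    "scout":     "scout-1b:q4_K_M",     # classification only
    "chat":      "general-9b:q4_K_M",   # routine conversational work
    "tool_call": "tools-9b:q8_0",       # structured output gets full q8
}

def pick_model(task_class: str) -> str:
    # Unknown classes fall back to the conversational slice.
    return MODEL_SLICES.get(task_class, MODEL_SLICES["chat"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;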

&lt;h3&gt;
  
  
  ROI Breakdown: Daily Heavy Usage
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F122tuw3rgqp8nazx7eek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F122tuw3rgqp8nazx7eek.png" alt=" " width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The daily operating cost of this architecture — under heavy professional use — is approximately &lt;strong&gt;$0.17&lt;/strong&gt;. For comparison: a single GPT-4o API session on a complex document task can easily exceed that figure on its own.&lt;/p&gt;




&lt;h2&gt;
  
  
  Implementation: Beyond the Toy Script
&lt;/h2&gt;

&lt;p&gt;A production router cannot be a collection of if-statements around &lt;code&gt;os.popen&lt;/code&gt; calls. The core requirements are: asynchronous evaluation, type-safe output validation, and resilient fallback semantics.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Async + Pydantic Stack
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F307okcfexossp4mavebl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F307okcfexossp4mavebl.png" alt=" " width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The current implementation uses &lt;code&gt;asyncio&lt;/code&gt; for parallel prompt evaluation — the scout classification and context pressure check run concurrently, not sequentially. Pydantic enforces the routing schema: if the local model produces an invalid tool call, the resulting &lt;code&gt;ValidationError&lt;/code&gt; is caught at the boundary, and execution silently fails over to the cloud model. The user sees a valid response. The failure is logged for analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;## Production routing logic - async, type-safe, resilient
&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;LLMResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;## Parallel evaluation: scout + context pressure
&lt;/span&gt;    &lt;span class="n"&gt;classification&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pressure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;scout_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nf"&gt;check_context_pressure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;complex&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;pressure&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;cloud_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;local_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ResponseSchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Pydantic guard
&lt;/span&gt;    &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LocalInferenceError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;## Graceful degradation - invisible to the caller
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;cloud_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;ValidationError&lt;/code&gt; catch is the critical architectural seam. Without it, a single malformed tool call from the local model becomes a process-level exception that kills the agent loop.&lt;/p&gt;

&lt;p&gt;With it, the system degrades gracefully to the cloud path without user-visible impact, while preserving the performance characteristics for the 90%+ of tasks that the local model handles correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability: What to Instrument
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Route decision distribution per session (local% vs. cloud%)&lt;/li&gt;
&lt;li&gt;Local validation failure rate — a rising trend signals model drift or prompt distribution shift&lt;/li&gt;
&lt;li&gt;End-to-end CPST across task categories — the ground truth metric for system health&lt;/li&gt;
&lt;li&gt;Scout classifier latency — should remain under 80ms or the overhead defeats its purpose&lt;/li&gt;
&lt;/ul&gt;
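
&lt;p&gt;None of this needs heavy tooling to start. A minimal in-process sketch (names are hypothetical; swap in your actual metrics backend):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

class RouterMetrics:
    # Minimal in-process counters for the signals listed above.
    def __init__(self) -> None:
        self.routes = Counter()          # "local" / "cloud"
        self.validation_failures = 0
        self.scout_latencies_ms = []

    def record(self, route: str, scout_ms: float, validation_ok: bool = True) -> None:
        self.routes[route] += 1
        self.scout_latencies_ms.append(scout_ms)
        if not validation_ok:
            self.validation_failures += 1

    def local_share(self) -> float:
        total = sum(self.routes.values())
        return self.routes["local"] / total if total else 0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;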




&lt;h2&gt;
  
  
  Computational Sovereignty: The Strategic Dimension
&lt;/h2&gt;

&lt;p&gt;This architecture is more than a cost optimization exercise. It is a hedge against &lt;strong&gt;dependency risk&lt;/strong&gt; — a class of risk that most engineering teams do not model until it becomes acute.&lt;/p&gt;

&lt;p&gt;Depending entirely on a cloud provider for inference capability introduces a category of operational risk that conventional SLAs do not address. API pricing is not fixed. Rate limits shift. Providers modify moderation behavior. A workflow that runs cleanly today may be rejected tomorrow due to policy changes applied server-side, without notice, to weights you do not own.&lt;/p&gt;

&lt;p&gt;By maintaining a local 9B baseline, you own the weights. You retain a survival-minimum of intelligence that operates offline, during outages, and independent of any provider's policy decisions.&lt;/p&gt;

&lt;p&gt;The hybrid architecture is not a concession to the limits of local models. It is a deliberate design choice that treats cloud inference as a performance upgrade — powerful, but optional — rather than a fundamental dependency. The baseline capability is yours. The ceiling is rented.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion: The Personal Intelligence Stack
&lt;/h2&gt;

&lt;p&gt;The pattern described here — scout, route, validate, fallback — is not novel as an abstract architecture. Routing layers exist across distributed systems engineering. What is new is applying it to the inference layer of agentic AI, at the level of the individual practitioner.&lt;/p&gt;

&lt;p&gt;Today, this looks like a sophisticated personal optimization. A few years from now, it will look like table stakes. The practitioners who ship reliable agentic systems are not waiting for a single model to solve the latency-intelligence paradox. They are building the routing layer that makes the paradox irrelevant.&lt;/p&gt;

&lt;p&gt;People won't talk about using AI tools. They'll talk about running &lt;strong&gt;personal intelligence stacks&lt;/strong&gt;. This is what that looks like — early.&lt;/p&gt;

&lt;p&gt;The weights are already cheap. The inference hardware is already accessible. The only remaining variable is the engineering discipline to compose them correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build the router.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>local</category>
      <category>llm</category>
      <category>architecture</category>
      <category>cloud</category>
    </item>
    <item>
      <title>I Tried to Build a Local Claude-Style Assistant</title>
      <dc:creator>Shakib S.</dc:creator>
      <pubDate>Mon, 09 Mar 2026 11:35:53 +0000</pubDate>
      <link>https://dev.to/workspacedex/i-tried-to-build-a-local-claude-style-assistant-gjf</link>
      <guid>https://dev.to/workspacedex/i-tried-to-build-a-local-claude-style-assistant-gjf</guid>
      <description>&lt;p&gt;I didn't want a demo. I wanted a real assistant.&lt;/p&gt;

&lt;p&gt;The plan was simple: use &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; with Qwen's latest 3.5 models.&lt;/p&gt;

&lt;p&gt;Not another local LLM that could summarize text and write boilerplate. I wanted something with memory — something that could keep a user profile, retrieve my notes, use tools, live in Telegram, and actually feel persistent. The kind of thing you close your laptop and trust is still &lt;em&gt;there&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So I did what the local AI community makes look achievable: I tried to build it myself.&lt;/p&gt;

&lt;p&gt;My machine: Arch Linux, an RTX 3050-class GPU with 6 GB VRAM, ~12 GB of system RAM. Enough to run small models. Enough to experiment. Enough, I thought, to build something real.&lt;/p&gt;

&lt;p&gt;What I got instead was a sharp education in the gap between "running a model locally" and "running an agent framework locally." They are not the same workload, and conflating them is the most common mistake in this space.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack That Made Sense on Paper
&lt;/h2&gt;

&lt;p&gt;The tool I wanted to build on was &lt;strong&gt;OpenClaw&lt;/strong&gt; — an open-source agent framework that layers tools, memory, sessions, multi-channel support, and structured workflows on top of a language model backend. It promised to be the missing piece between "I can run a model" and "I have an assistant."&lt;/p&gt;

&lt;p&gt;The plan:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; to serve models locally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qwen&lt;/strong&gt; as the model (small, recent, reportedly strong for its size)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt; as the agent layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On paper, this is a reasonable stack. In practice, it surfaces every assumption that local AI discourse glosses over.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mistake #1: Assuming the Model Was the Whole Problem
&lt;/h2&gt;

&lt;p&gt;I started with &lt;code&gt;qwen3.5:4b&lt;/code&gt; through Ollama.&lt;/p&gt;

&lt;p&gt;It ran. That wasn't the issue.&lt;/p&gt;

&lt;p&gt;The issue was behavioral: the model kept slipping into visible chain-of-thought output. Every response came with narrated reasoning. Not broken, but wrong — asking for a daily assistant and getting a model that wanted to think out loud at every turn is like hiring a receptionist who reads their internal monologue aloud before answering the phone.&lt;/p&gt;

&lt;p&gt;So I did what everyone does: I tried to prompt my way out of it. Stricter system prompts. Custom Modelfiles. Context limits. Persona instructions. Explicit &lt;code&gt;/no_think&lt;/code&gt; directives.&lt;/p&gt;

&lt;p&gt;That helped. It didn't solve the core mismatch.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;qwen3.5:4b&lt;/code&gt; model is from the &lt;em&gt;thinking&lt;/em&gt; branch of the Qwen family — it's optimized for reasoning tasks, not conversational fluency. The model family matters. Using a reasoning model for a chat assistant is a category error, not a configuration problem.&lt;/p&gt;

&lt;p&gt;The fix was straightforward once I admitted it: switch to an instruct model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen3:4b-instruct-2507-q4_K_M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;q4_K_M&lt;/code&gt; quantization is a solid default for 4B models on 6 GB VRAM — it cuts memory footprint meaningfully without destroying output quality. With a pure instruct model, the behavioral issues cleared up immediately.&lt;/p&gt;

&lt;p&gt;But the harder problem was still waiting.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mistake #2: Thinking "Model Works" Means "Stack Works"
&lt;/h2&gt;

&lt;p&gt;Here's what a plain local chat app sends to a model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[system prompt]
[recent conversation turns]
[user message]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. A few hundred tokens, maybe a few thousand if you keep a long history. Totally manageable.&lt;/p&gt;

&lt;p&gt;Here's what an agent framework sends to a model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[system prompt]
[tool schemas — every tool the agent can call]
[session state and memory]
[workspace context]
[bootstrap instructions]
[structured output expectations]
[past actions and results]
[conversation history]
[user message]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every one of those elements costs tokens. And on a local setup, tokens aren't a billing abstraction — they're a &lt;strong&gt;memory and latency problem&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is the thing local AI content almost never explains clearly: the model is not the product. The &lt;em&gt;framework around the model&lt;/em&gt; is the product. And that framework has weight.&lt;/p&gt;
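
&lt;p&gt;You can put rough numbers on that weight. The per-component budgets below are illustrative, not measured from OpenClaw, but the shape of the problem is accurate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative token budgets for each prompt component an agent
# framework injects; real numbers vary by framework and configuration.
SCAFFOLDING_TOKENS = {
    "system_prompt": 800,
    "tool_schemas": 3500,
    "session_memory": 2000,
    "workspace_context": 1500,
    "bootstrap_instructions": 1200,
    "past_actions": 2500,
}

def remaining_budget(num_ctx: int, history_tokens: int) -> int:
    overhead = sum(SCAFFOLDING_TOKENS.values())
    return num_ctx - overhead - history_tokens

# At a 16K window, scaffolding alone eats most of the budget:
print(remaining_budget(16384, history_tokens=2000))  # 2884
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;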

&lt;p&gt;OpenClaw made this visible very quickly.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Context Window Trap: 4K vs 16K vs 262K
&lt;/h2&gt;

&lt;p&gt;The first error I hit was blunt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: context window too small. Minimum required: 16000 tokens. Current: 4096.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fine. I increased the context window.&lt;/p&gt;

&lt;p&gt;Then Ollama started allocating memory as if the context was 262,144 tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: model requires ~38.9 GiB of system memory. Available: ~12.5 GiB.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not a typo. 38.9 GB for a 4B model, because of context window size.&lt;/p&gt;

&lt;p&gt;Here's why: the KV cache (the memory structure that stores the keys and values computed for every token in context) scales linearly with the context window you allocate, and it is reserved up front. For a 4B model at &lt;code&gt;q4_K_M&lt;/code&gt; quantization, the model weights themselves are around 2.5 GB. But a 262K-token KV cache can dwarf that by an order of magnitude.&lt;/p&gt;

&lt;p&gt;The math, roughly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;KV cache size ≈ 2 × layers × heads × head_dim × context_length × bytes_per_element

For Qwen 4B:
≈ 2 × 32 × 8 × 128 × 262144 × 2 bytes
≈ ~34 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see how a "reasonable" context ceiling becomes a hardware wall fast.&lt;/p&gt;
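
&lt;p&gt;The same estimate as a function, so you can plug in your own model's geometry. The layer and head counts are the rough figures used above, not official specs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_length: int, bytes_per_element: int = 2) -> int:
    # The leading 2 accounts for the separate K and V tensors per layer.
    return 2 * layers * kv_heads * head_dim * context_length * bytes_per_element

# The rough Qwen-4B estimate above, at a 262,144-token window:
size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, context_length=262144)
print(f"{size / 1024**3:.1f} GiB")  # prints "32.0 GiB" (about 34 GB decimal)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;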

&lt;p&gt;The trap is that the minimum functional context OpenClaw needs (16K) is already well above what fits comfortably in a 6 GB VRAM setup if you want any headroom for inference. And 16K is the &lt;em&gt;floor&lt;/em&gt;, not the sweet spot.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Configuration Spiral
&lt;/h2&gt;

&lt;p&gt;What followed was a long sequence of plausible-looking fixes that went nowhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set &lt;code&gt;num_ctx: 8192&lt;/code&gt; in Ollama → OpenClaw complained the session minimum wasn't met&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;num_ctx: 16384&lt;/code&gt; → worked, but sessions kept inflating back to 262K in the state view&lt;/li&gt;
&lt;li&gt;Manually edited OpenClaw config files → watched changes revert&lt;/li&gt;
&lt;li&gt;Checked whether I was editing the wrong state directory → yes, sometimes&lt;/li&gt;
&lt;li&gt;Created local model aliases with explicit context caps → partial success&lt;/li&gt;
&lt;li&gt;Hard-capped context in every config location I could find → Ollama respected it; OpenClaw didn't always agree with the result&lt;/li&gt;
&lt;/ul&gt;
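
&lt;p&gt;One approach that did hold up was passing the cap explicitly per request. Ollama's &lt;code&gt;/api/generate&lt;/code&gt; endpoint accepts an &lt;code&gt;options&lt;/code&gt; object, so the context size travels with the call instead of living in a config file something else can overwrite. A sketch that only builds the request body:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

# Request body for Ollama's /api/generate endpoint with an explicit
# context cap, so the server does not fall back to a model default.
payload = {
    "model": "qwen3:4b-instruct-2507-q4_K_M",
    "prompt": "Summarize the last three log lines.",
    "options": {"num_ctx": 16384},
    "stream": False,
}
body = json.dumps(payload)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Modelfile route (&lt;code&gt;PARAMETER num_ctx 16384&lt;/code&gt; baked into a local alias) is the server-side equivalent.&lt;/p&gt;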

&lt;p&gt;The pattern is seductive. Every partial success feels like momentum. The model loaded? Great. The context lowered? Okay. The agent accepted the config? Almost there.&lt;/p&gt;

&lt;p&gt;Then the next hidden assumption surfaces.&lt;/p&gt;

&lt;p&gt;The system wasn't randomly broken. It was &lt;em&gt;consistently&lt;/em&gt; surfacing the same underlying incompatibility: the framework assumed a context budget that my hardware couldn't provide. Every workaround was borrowing against that fundamental gap, not closing it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Insight That Reframed Everything
&lt;/h2&gt;

&lt;p&gt;At some point I stopped treating this as a configuration bug and asked a different question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What kind of system is OpenClaw actually designed for?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not "can I make it work with a 4B model on 6 GB VRAM?" but "what does the framework assume about its environment?"&lt;/p&gt;

&lt;p&gt;The answer, once you look honestly at the architecture, is clear:&lt;/p&gt;

&lt;p&gt;OpenClaw is built for models with &lt;strong&gt;real context headroom&lt;/strong&gt; — 32K, 64K, 128K tokens where the agent scaffolding is a small fraction of available budget rather than the entire budget.&lt;/p&gt;

&lt;p&gt;It's built for models with &lt;strong&gt;low-latency inference&lt;/strong&gt; — where tool call round-trips and multi-step reasoning don't become multi-minute waits.&lt;/p&gt;

&lt;p&gt;It's built for &lt;strong&gt;the API tier&lt;/strong&gt;, not the consumer GPU tier.&lt;/p&gt;

&lt;p&gt;That's not a criticism. It's a design reality. The framework does things that genuinely require those resources: persistent memory, multi-turn tool use, session-aware behavior, complex orchestration. That stuff is the whole value proposition. And it has a minimum viable substrate.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Changed When I Switched to a Cloud Backend
&lt;/h2&gt;

&lt;p&gt;I pointed OpenClaw at Kimi K2.5 via a cloud API and the experience shifted immediately.&lt;/p&gt;

&lt;p&gt;Not magically. The framework still has quirks. But the fundamental friction — the constant negotiation over whether the infrastructure could physically support the next operation — disappeared.&lt;/p&gt;

&lt;p&gt;Messages went through cleanly. Context stopped being the entire conversation. The tool layer worked the way the documentation described. I could actually evaluate the product rather than fighting the substrate.&lt;/p&gt;

&lt;p&gt;The comparison is useful: the same framework, the same prompts, the same configuration. The only variable was whether the model backend could absorb the overhead without drowning.&lt;/p&gt;

&lt;p&gt;Local: every interaction was a resource negotiation.&lt;br&gt;&lt;br&gt;
Cloud: the framework did what it was supposed to do.&lt;/p&gt;


&lt;h2&gt;
  
  
  What This Actually Means for Local AI
&lt;/h2&gt;

&lt;p&gt;I want to be precise here, because "just use the cloud" is a lazy conclusion and I don't believe it.&lt;/p&gt;

&lt;p&gt;Small local models are genuinely good at a real set of tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Plain conversational chat&lt;/strong&gt; — instruct models at 4B–8B are solid here&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Focused code help&lt;/strong&gt; — constrained tasks where context window is not the bottleneck&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document drafting&lt;/strong&gt; — one-shot or few-shot generation over small inputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local RAG&lt;/strong&gt; — retrieval over a small, well-scoped document set&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy-sensitive workflows&lt;/strong&gt; — anything that shouldn't leave your machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where local models struggle is not a model quality problem. It's a &lt;strong&gt;systems design problem&lt;/strong&gt;: if your tool layer, memory layer, session model, and orchestration all assume a generous context budget, a small local setup will spend most of its energy surviving the framework rather than doing useful work.&lt;/p&gt;

&lt;p&gt;The honest reframe is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Running a 4B model in a simple chat interface and running that same 4B model inside a full agent framework are not the same workload. One fits on your GPU. The other assumes datacenter-class headroom.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Treating them as equivalent is why so many local AI projects stall out in configuration hell rather than producing something useful.&lt;/p&gt;


&lt;h2&gt;
  
  
  Practical Recommendations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If your goal is a local everyday assistant:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use a small instruct model in a lean interface. Ollama + Open WebUI or a minimal Python frontend. Keep the context requirement below 8K. Avoid frameworks that inject large amounts of scaffolding unless you've measured the overhead. A &lt;code&gt;q4_K_M&lt;/code&gt; quantized 4B–8B instruct model in a simple chat loop is genuinely useful.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull llama3.2:3b-instruct-q4_K_M  &lt;span class="c"&gt;# ~2GB, fast, good for chat&lt;/span&gt;
ollama pull qwen3:8b-instruct-q4_K_M     &lt;span class="c"&gt;# ~5GB, better reasoning&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
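
&lt;p&gt;A lean loop needs surprisingly little code. A sketch against Ollama's &lt;code&gt;/api/chat&lt;/code&gt; endpoint, with a crude history trimmer to keep the context bounded; the 4-characters-per-token estimate and the budget are assumptions to tune:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/chat"
TOKEN_BUDGET = 6000  # stay well under an 8K num_ctx

def approx_tokens(messages: list) -> int:
    # Crude estimate: about 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def trim_history(messages: list, budget: int = TOKEN_BUDGET) -> list:
    # Keep the system prompt, drop the oldest turns until we fit.
    system, turns = messages[:1], messages[1:]
    while turns and approx_tokens(system + turns) > budget:
        turns = turns[2:]  # drop one user/assistant pair
    return system + turns

def chat(messages: list) -> str:
    body = json.dumps({"model": "qwen3:8b-instruct-q4_K_M",
                       "messages": trim_history(messages),
                       "stream": False}).encode()
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;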



&lt;p&gt;&lt;strong&gt;If your goal is to actually experience OpenClaw:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start with a cloud-backed model. Let the framework do what it was designed to do before you optimize for local deployment. You'll learn what the product actually is rather than spending all your time fighting the substrate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're committed to local + agent features:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You need either a machine with 24+ GB VRAM (RTX 4090, A-series workstation GPUs), or you need to be very intentional about which agent features you enable and what their context cost is. Profile the token overhead of each feature before enabling it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Question I Should Have Asked First
&lt;/h2&gt;

&lt;p&gt;I went into this project asking: &lt;em&gt;"Which model should I run?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's the wrong question. The right question is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What kind of system am I trying to run, and what does that system actually require?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Plain chat and agent frameworks are different categories with different resource profiles. A model that works beautifully in a simple interface can fail badly inside a framework that assumes 10x the context budget.&lt;/p&gt;

&lt;p&gt;Understanding that distinction early would have saved me a lot of configuration spirals. It's also just a more accurate mental model for thinking about local AI in general — not as "models you run" but as "systems with compute requirements," where the framework overhead is often larger than the model itself.&lt;/p&gt;

&lt;p&gt;That's the lesson that actually transfers.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you run into context window or memory walls with local agent frameworks? What workarounds have actually held up? I'd like to hear what's working.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>Why Local AI Agents Fail Silently — and What to Measure Before You Ship</title>
      <dc:creator>Shakib S.</dc:creator>
      <pubDate>Sun, 08 Mar 2026 18:10:44 +0000</pubDate>
      <link>https://dev.to/workspacedex/small-llms-arent-dumb-theyre-just-missing-tools-2fnh</link>
      <guid>https://dev.to/workspacedex/small-llms-arent-dumb-theyre-just-missing-tools-2fnh</guid>
      <description>&lt;p&gt;If you spend enough time around AI engineering, you eventually run into the same frustration.&lt;/p&gt;

&lt;p&gt;You use a cloud model like ChatGPT or Claude, and it feels impressively capable. It can reason through multi-step tasks, fetch up-to-date information, write code, and respond with the kind of fluency that makes it feel far more useful than a simple text generator.&lt;/p&gt;

&lt;p&gt;Then you run a local model on your own machine — Llama, Mistral, Qwen, or another open model — and the experience feels much more limited.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It cannot answer questions about current events.&lt;/li&gt;
&lt;li&gt;It struggles with tasks that require live information.&lt;/li&gt;
&lt;li&gt;It often feels weaker than the cloud systems you are used to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The immediate reaction is: &lt;em&gt;"Open-source models just aren't as good."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But that explanation is incomplete.&lt;/p&gt;

&lt;p&gt;The real difference between a cloud AI product and a local model is rarely just model quality. More often, &lt;strong&gt;the gap comes from the surrounding infrastructure.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud AI systems are rarely "just a model." They are packaged with orchestration layers, tool calling, retrieval systems, search, memory, and routing logic that make the model feel more capable than it would on its own.&lt;/p&gt;

&lt;p&gt;When you run a local model, you are usually interacting with the raw foundation model directly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To make a local LLM genuinely useful, you often do not need a bigger model first. You need to give it access to tools. You need to turn it into an &lt;strong&gt;agent&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this tutorial, we will build a simple local AI agent using Python and Ollama. By the end, your local model will be able to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reason about a task&lt;/li&gt;
&lt;li&gt;decide when it needs external information&lt;/li&gt;
&lt;li&gt;call a web search tool&lt;/li&gt;
&lt;li&gt;return answers that are more useful and up to date&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What We Will Build
&lt;/h2&gt;

&lt;p&gt;By the end of this tutorial, you will have a local AI agent that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run a local LLM with Ollama&lt;/li&gt;
&lt;li&gt;decide when it needs external data&lt;/li&gt;
&lt;li&gt;call a web search tool automatically&lt;/li&gt;
&lt;li&gt;integrate tool results into its reasoning loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User
  │
  ▼
Local LLM (Ollama)
  │
  ▼
Agent Loop (ReAct)
  │
  ▼
Tool Router
  │
  └── Web Search Tool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of treating the model as an all-knowing oracle, we treat it as the &lt;strong&gt;reasoning engine&lt;/strong&gt; inside a larger application.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Local LLMs Feel Weak
&lt;/h2&gt;

&lt;p&gt;Before writing any code, it's important to understand why local models feel weaker out of the box.&lt;/p&gt;

&lt;p&gt;The issue isn't necessarily the model. &lt;strong&gt;The issue is the missing infrastructure around the model.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Missing Orchestration
&lt;/h3&gt;

&lt;p&gt;When you chat with systems like ChatGPT, your message is not simply passed to a model. Behind the scenes, an orchestration layer decides things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;whether the model should search the web&lt;/li&gt;
&lt;li&gt;whether it should execute Python&lt;/li&gt;
&lt;li&gt;whether additional context should be retrieved&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your local LLM does none of this. It simply predicts the next token in a sequence. Without orchestration, the model is forced to guess information it cannot access.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Lack of External Tools
&lt;/h3&gt;

&lt;p&gt;LLMs are excellent reasoning engines, but terrible databases. If you ask a model for today's weather, the correct answer requires live data.&lt;/p&gt;

&lt;p&gt;Humans solve this by using tools: calculators, web browsers, APIs. LLMs can do the same — but only if you give them access to those tools. Without tools, the model is effectively trapped inside its training data.&lt;/p&gt;
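<p>To make "tool" concrete before we build the search tool: a tool is just an ordinary function with a clear contract. Here is a minimal, hypothetical <code>calculate()</code> tool (our own illustration, not part of any library) that evaluates arithmetic safely instead of reaching for <code>eval()</code>:</p>

```python
import ast
import operator as op

# Whitelisted operators; anything else in the expression is rejected
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
        ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}

def calculate(expression: str) -> str:
    """Evaluate a basic arithmetic expression, e.g. '2 * (3 + 4)'."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return str(_eval(ast.parse(expression, mode="eval").body))

print(calculate("2 * (3 + 4)"))  # 14
```

<p>The model never executes anything itself. It only emits a request like <code>calculate("2 * (3 + 4)")</code>; your code runs the function and feeds the result back.</p>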

&lt;h3&gt;
  
  
  3. Missing Middleware and State
&lt;/h3&gt;

&lt;p&gt;Production AI systems include middleware that handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;context management&lt;/li&gt;
&lt;li&gt;memory summarization&lt;/li&gt;
&lt;li&gt;structured tool outputs&lt;/li&gt;
&lt;li&gt;retry logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without these systems, a local model can quickly lose context or fail at multi-step tasks.&lt;/p&gt;

&lt;p&gt;This leads to an important shift in perspective:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;❌ Old thinking&lt;/th&gt;
&lt;th&gt;✅ New thinking&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"The model should know everything."&lt;/td&gt;
&lt;td&gt;"The model should decide which tools to use."&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your LLM becomes the &lt;strong&gt;CPU&lt;/strong&gt; of an AI application.&lt;/p&gt;




&lt;h2&gt;
  
  
  Agent Architecture
&lt;/h2&gt;

&lt;p&gt;To enable tool usage, we need a simple agent architecture. One of the most widely used patterns is called &lt;strong&gt;ReAct (Reasoning + Acting)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of generating a single response, the model runs inside a reasoning loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user sends a query&lt;/li&gt;
&lt;li&gt;The model reasons about the problem&lt;/li&gt;
&lt;li&gt;The model decides if it needs a tool&lt;/li&gt;
&lt;li&gt;The tool executes&lt;/li&gt;
&lt;li&gt;The result is returned to the model&lt;/li&gt;
&lt;li&gt;The model produces the final answer&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Components of the System
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Brain (LLM)&lt;/strong&gt;&lt;br&gt;
The local model running in Ollama. It reads the query and decides what to do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Tool Library&lt;/strong&gt;&lt;br&gt;
A collection of Python functions such as &lt;code&gt;search_web()&lt;/code&gt;, &lt;code&gt;read_file()&lt;/code&gt;, &lt;code&gt;calculate()&lt;/code&gt;. Each tool exposes a clear schema so the model knows how to call it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Router&lt;/strong&gt;&lt;br&gt;
The router connects the LLM to the tools. If the LLM requests a tool call, the router identifies the tool, executes the Python function, and returns the result to the model.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 1 — Install Ollama
&lt;/h2&gt;

&lt;p&gt;Ollama makes it easy to run local models with an API interface similar to OpenAI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Download Ollama&lt;/strong&gt; from &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;https://ollama.com&lt;/a&gt; and install it for your OS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pull a tool-capable model:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull llama3.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 Instruct-tuned models generally perform better with tool calling.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Start the model:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run llama3.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pull step downloads approximately 4–5 GB of model weights; &lt;code&gt;ollama run&lt;/code&gt; then opens an interactive session. Exit with &lt;code&gt;/bye&lt;/code&gt; or &lt;code&gt;Ctrl+D&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set up your Python environment:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv agent_env
&lt;span class="nb"&gt;source &lt;/span&gt;agent_env/bin/activate
pip &lt;span class="nb"&gt;install &lt;/span&gt;ollama duckduckgo-search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ollama&lt;/code&gt; for model interaction&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;duckduckgo-search&lt;/code&gt; for live web search&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 2 — Add a Web Search Tool
&lt;/h2&gt;

&lt;p&gt;Create a file called &lt;code&gt;agent.py&lt;/code&gt; and add the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;duckduckgo_search&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DDGS&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Search the web for up-to-date information.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Tool] Searching the web for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DDGS&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;snippet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why docstrings matter
&lt;/h3&gt;

&lt;p&gt;When Ollama exposes tools to the model, it builds a schema from the function name, its arguments, and its &lt;strong&gt;docstring&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The model uses that schema to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the tool does&lt;/li&gt;
&lt;li&gt;when it should be used&lt;/li&gt;
&lt;li&gt;what arguments to pass&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ If your tool descriptions are vague, the model is more likely to hallucinate bad tool calls. Good docstrings directly improve reliability.&lt;/p&gt;
&lt;/blockquote&gt;
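<p>To see roughly what the model receives, here is a simplified sketch of deriving a schema from a function. The helper <code>tool_schema()</code> is our own illustration; the real ollama client does this internally and produces a fuller JSON schema, but the ingredients (name, parameters, docstring) are the same:</p>

```python
import inspect

def tool_schema(fn):
    """Illustrative schema builder; the real ollama client builds a
    fuller JSON schema from the same ingredients."""
    sig = inspect.signature(fn)
    params = {}
    for name, p in sig.parameters.items():
        if p.annotation is inspect.Parameter.empty:
            params[name] = "any"
        else:
            params[name] = p.annotation.__name__
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
    }

def web_search(query: str) -> str:
    """Search the web for up-to-date information."""
    return ""

print(tool_schema(web_search))
```

<p>A vague docstring produces a vague <code>description</code> field, and the model picks tools based on exactly that text.</p>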




&lt;h2&gt;
  
  
  Step 3 — Implement a Tool Router
&lt;/h2&gt;

&lt;p&gt;Next, add a router that safely executes tool calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;

&lt;span class="n"&gt;AVAILABLE_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;web_search&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
    &lt;span class="n"&gt;arguments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;AVAILABLE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: tool &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AVAILABLE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool execution error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This router acts as a &lt;strong&gt;security layer&lt;/strong&gt;. If the LLM hallucinates a tool like &lt;code&gt;hack_mainframe()&lt;/code&gt;, the router safely blocks it.&lt;/p&gt;
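<p>You can watch the whitelist work without a running model by handing the router a fabricated tool call. This snippet re-declares a condensed router so it runs standalone; <code>SimpleNamespace</code> stands in for the client's tool-call object:</p>

```python
from types import SimpleNamespace

AVAILABLE_TOOLS = {"web_search": lambda query: f"results for {query}"}

def execute_tool_call(tool_call):
    # Condensed version of the router above: same whitelist check
    name = tool_call.function.name
    if name not in AVAILABLE_TOOLS:
        return f"Error: tool {name} not found"
    try:
        return AVAILABLE_TOOLS[name](**tool_call.function.arguments)
    except Exception as e:
        return f"Tool execution error: {e}"

# A hallucinated tool call never reaches real code
fake = SimpleNamespace(function=SimpleNamespace(name="hack_mainframe", arguments={}))
print(execute_tool_call(fake))  # Error: tool hack_mainframe not found

# A legitimate call is dispatched normally
real = SimpleNamespace(function=SimpleNamespace(name="web_search",
                                                arguments={"query": "ollama"}))
print(execute_tool_call(real))  # results for ollama
```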




&lt;h2&gt;
  
  
  Step 4 — Run the Agent Loop
&lt;/h2&gt;

&lt;p&gt;Now implement the full ReAct agent loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an assistant with access to tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This loop implements the full ReAct cycle. The model can reason, call a tool, receive results, and generate a final answer — all in one flow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 5 — Test It
&lt;/h2&gt;

&lt;p&gt;Add this to the bottom of &lt;code&gt;agent.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who won the Super Bowl in 2024?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python agent.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: Who won the Super Bowl in 2024?
[Tool] Searching the web for: Super Bowl 2024 winner
Agent: The Kansas City Chiefs won Super Bowl LVIII in 2024 with a score of 25-22 against the San Francisco 49ers.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, the local model has done something important:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It recognized it did not know the answer&lt;/li&gt;
&lt;li&gt;It decided to call the web search tool&lt;/li&gt;
&lt;li&gt;It retrieved fresh information&lt;/li&gt;
&lt;li&gt;It incorporated that result into its final response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your model is no longer limited to its training data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full &lt;code&gt;agent.py&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;duckduckgo_search&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DDGS&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Search the web for up-to-date information.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Tool] Searching the web for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DDGS&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;snippet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;AVAILABLE_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;web_search&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
    &lt;span class="n"&gt;arguments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;AVAILABLE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: tool &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AVAILABLE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool execution error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an assistant with access to tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who won the Super Bowl in 2024?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;What's Next?&lt;/h2&gt;

&lt;p&gt;Using the same pattern, you can extend the agent with tools that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;query databases&lt;/li&gt;
&lt;li&gt;read PDFs&lt;/li&gt;
&lt;li&gt;run shell commands&lt;/li&gt;
&lt;li&gt;call external APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some ideas to try:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Add more tools to AVAILABLE_TOOLS
&lt;/span&gt;&lt;span class="n"&gt;AVAILABLE_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;run_python_snippet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query_database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
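&lt;p&gt;Each entry maps a name to a plain callable. As a sketch, here are minimal implementations of &lt;code&gt;read_file&lt;/code&gt; and &lt;code&gt;query_database&lt;/code&gt;; the signatures and error handling are illustrative assumptions, not a fixed API:&lt;/p&gt;

```python
import sqlite3
from pathlib import Path

def read_file(path: str) -> str:
    """Return a file's text, or an error string the model can recover from."""
    try:
        return Path(path).read_text()
    except OSError as e:
        return f"Error reading {path}: {e}"

def query_database(sql: str, db_path: str = "app.db") -> str:
    """Run a query and return rows as plain text the model can read."""
    try:
        with sqlite3.connect(db_path) as conn:
            rows = conn.execute(sql).fetchall()
        return "\n".join(str(row) for row in rows) or "(no rows)"
    except sqlite3.Error as e:
        return f"Database error: {e}"
```

&lt;p&gt;With functions shaped like this, adding a capability is one dictionary entry; the agent loop itself never changes. Note that errors come back as strings, so the model can read the failure and try something else.&lt;/p&gt;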






&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The perceived "incompetence" of local LLMs is often not a model problem at all. &lt;strong&gt;It is an infrastructure problem.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you wrap a local model in an agent architecture — with a reasoning loop, tool library, and routing layer — it becomes far more useful than a raw chat interface suggests.&lt;/p&gt;

&lt;p&gt;Cloud AI systems will continue to dominate on raw scale and infrastructure maturity. But local agents offer something different: &lt;strong&gt;control, privacy, flexibility&lt;/strong&gt;, and the ability to shape the system around your own workflow.&lt;/p&gt;

&lt;p&gt;Once you start thinking this way, local models stop feeling like weak copies of cloud AI — and start feeling like programmable building blocks.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? Drop a ❤️ and follow for more AI engineering content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why Your Local LLM Feels "Dumb" Compared to Cloud APIs</title>
      <dc:creator>Shakib S.</dc:creator>
      <pubDate>Sat, 28 Feb 2026 09:28:10 +0000</pubDate>
      <link>https://dev.to/workspacedex/why-your-local-llm-feels-dumb-compared-to-cloud-apis-4id7</link>
      <guid>https://dev.to/workspacedex/why-your-local-llm-feels-dumb-compared-to-cloud-apis-4id7</guid>
      <description>&lt;h3&gt;
  
  
  The Experiment That Changed How I Think About AI
&lt;/h3&gt;

&lt;p&gt;I had just set up a fully local AI stack. Ollama running Llama 3 8B, clean terminal, no API keys, no monthly bill. I typed a real-world task:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Research the latest patch notes for Elden Ring and save a summary to my desktop."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It failed. Politely, apologetically — but it failed.&lt;/p&gt;

&lt;p&gt;Then I opened Claude. Same task. It searched the web, summarized the results, and handed me a file.&lt;/p&gt;

&lt;p&gt;My first instinct was the obvious one: &lt;em&gt;Claude is just smarter.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That instinct was wrong. And understanding why it's wrong is the most useful thing you can learn about AI right now.&lt;/p&gt;




&lt;h2&gt;The Brain in a Jar Problem&lt;/h2&gt;

&lt;p&gt;A local LLM in its default state is exactly this: &lt;strong&gt;a brain in a jar.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It has knowledge. Enormous knowledge, compressed into billions of parameters. But it has no limbs. It cannot reach the internet. It cannot write to your filesystem. It cannot verify its own outputs. It cannot call an API. It just... predicts the next token, over and over, inside a sealed container.&lt;/p&gt;

&lt;p&gt;When Claude "thinks" through a complex task, it isn't doing so purely with a bigger or smarter model. It's using &lt;strong&gt;tool calling&lt;/strong&gt; — a mechanism where the model emits structured instructions, and a surrounding system executes them.&lt;/p&gt;

&lt;p&gt;The model says: &lt;em&gt;"I need to search the web."&lt;/em&gt;&lt;br&gt;&lt;br&gt;
The system does the search.&lt;br&gt;&lt;br&gt;
The results come back into context.&lt;br&gt;&lt;br&gt;
The model says: &lt;em&gt;"Now write this to a file."&lt;/em&gt;&lt;br&gt;&lt;br&gt;
The system writes the file.&lt;/p&gt;

&lt;p&gt;The model itself never touched the internet. Never touched your filesystem. It just coordinated — and the orchestration layer did the actual work.&lt;/p&gt;
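&lt;p&gt;The exchange above can be sketched as a plain message transcript. The tool name, argument shape, and content strings here are illustrative, not any particular vendor's API:&lt;/p&gt;

```python
import json

# Illustrative transcript of one tool-calling round trip.
# The tool name and argument shape are hypothetical.
transcript = [
    {"role": "user", "content": "Summarize the latest patch notes."},
    # The model answers with a structured instruction, not prose:
    {"role": "assistant",
     "content": json.dumps({"tool": "web_search",
                            "args": {"query": "latest patch notes"}})},
    # The orchestration layer runs the search and feeds results back:
    {"role": "tool", "content": "Patch 1.2: balance changes to ..."},
    # With the data now in context, the model answers in plain text:
    {"role": "assistant", "content": "Summary: patch 1.2 rebalances ..."},
]

def is_tool_call(message: dict) -> bool:
    """The orchestrator's core test: is this message a structured instruction?"""
    try:
        return "tool" in json.loads(message["content"])
    except (json.JSONDecodeError, TypeError):
        return False
```

&lt;p&gt;Everything the model "does" reduces to emitting one of these structured messages and reading whatever the orchestrator sends back.&lt;/p&gt;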

&lt;p&gt;&lt;strong&gt;That orchestration layer is what you're paying for when you pay for Claude or GPT-4.&lt;/strong&gt; The model is increasingly a commodity. The IP is the system built around it.&lt;/p&gt;


&lt;h2&gt;Proving It: Raw vs. Orchestrated&lt;/h2&gt;

&lt;p&gt;To make this concrete, I ran an experiment with a simple setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inference: Ollama (Llama 3 8B)&lt;/li&gt;
&lt;li&gt;Orchestration: Python middleware with basic function calling&lt;/li&gt;
&lt;li&gt;Tools: A web search API (Tavily) and a local filesystem writer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The logic flow for the Elden Ring task:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without orchestration, the model has no path to success. It knows what patch notes are. It knows how to summarize. But it cannot get the data, so the task dies before it starts.&lt;/p&gt;

&lt;p&gt;With orchestration, the flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User sends the request&lt;/li&gt;
&lt;li&gt;The model emits structured JSON: &lt;code&gt;{"tool": "web_search", "query": "Elden Ring patch notes 2026"}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The Python middleware intercepts this, runs the search, returns the text&lt;/li&gt;
&lt;li&gt;The model summarizes and emits: &lt;code&gt;{"tool": "write_file", "filename": "summary.txt", "content": "..."}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The middleware writes the file&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Same 8B model. Completely different result.&lt;/p&gt;

&lt;p&gt;The finding was stark: &lt;strong&gt;an orchestrated 8B model consistently outperforms a raw 70B model on tasks that require external data or side effects.&lt;/strong&gt; Not because it's smarter — because it has agency. It has limbs.&lt;/p&gt;


&lt;h2&gt;The Code That Makes It Real&lt;/h2&gt;

&lt;p&gt;Here's a minimal Python implementation of this pattern. This is not pseudocode — this runs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;OLLAMA_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3:8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;write_to_disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Replace with your Tavily or SearXNG endpoint
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.tavily.com/search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_to_disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File written: /tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are an agent with access to tools. When you need to use a tool, 
    respond ONLY with valid JSON in this format:
    {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arg1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}}

    Available tools:
    - web_search: {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}}
    - write_file: {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}}

    When the task is complete, respond normally in plain text.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# max 5 tool calls
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OLLAMA_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;KeyError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# Model responded in plain text — task is done
&lt;/span&gt;            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Final answer:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max iterations reached.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the latest Elden Ring patch notes and save a summary to my desktop.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the bridge. Not complex — but powerful. Once you have this pattern, you can add any tool: database lookups, calendar access, code execution, anything.&lt;/p&gt;
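&lt;p&gt;One hedged way to keep that extensible is a plain registry: each tool is a named Python function, and the loop dispatches by name. Everything below (tool names, the example tools, the payload shape) is illustrative, not a fixed API:&lt;/p&gt;

```python
# A tiny tool registry: the agent loop looks up tools by name and calls them.
TOOLS = {}

def tool(name):
    # Decorator that registers a function under the name the model will use
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("read_file")
def read_file(path):
    with open(path) as f:
        return f.read()

@tool("word_count")
def word_count(text):
    return len(text.split())

def dispatch(name, **kwargs):
    # The agent loop calls this with the parsed {"tool": ..., "args": ...} payload
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**kwargs)

print(dispatch("word_count", text="hello agentic world"))
```

&lt;p&gt;Adding a database lookup or calendar tool is then one decorated function, with no change to the loop itself.&lt;/p&gt;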




&lt;h2&gt;
  
  
  The Distillation Bonus
&lt;/h2&gt;

&lt;p&gt;There's a second force making this even more interesting right now.&lt;/p&gt;

&lt;p&gt;Larger models are being used to train smaller ones — a process called &lt;strong&gt;distillation&lt;/strong&gt;. The practical result: we are approaching a point where much of the reasoning capability of a trillion-parameter model gets compressed into a 7-billion-parameter one.&lt;/p&gt;

&lt;p&gt;Models like DeepSeek-R1, Qwen3, and Mistral's latest releases are examples of this trend. The gap between a "small" local model and a frontier cloud model is shrinking every quarter.&lt;/p&gt;

&lt;p&gt;This creates a compounding advantage for the local orchestration approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The model gets smarter&lt;/strong&gt; (distillation brings frontier reasoning to edge hardware)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The system gives it agency&lt;/strong&gt; (orchestration adds tools and memory)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your data stays local&lt;/strong&gt; (the privacy moat stays intact)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You get increasing capability without increasing your exposure. That's a rare combination.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Practical Takeaway
&lt;/h2&gt;

&lt;p&gt;Stop searching for a smarter model. The model is rarely your bottleneck.&lt;/p&gt;

&lt;p&gt;The three things that actually determine usefulness are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Web access&lt;/strong&gt; — A model without current information is operating blind. Add a search tool. Even a free SearXNG instance changes everything.&lt;/p&gt;
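&lt;p&gt;A minimal sketch of such a tool, assuming a local SearXNG instance at &lt;code&gt;localhost:8080&lt;/code&gt; with its JSON output format enabled in &lt;code&gt;settings.yml&lt;/code&gt;:&lt;/p&gt;

```python
import json
import urllib.parse
import urllib.request

# Assumed local SearXNG endpoint; "json" must be listed under search formats
# in its settings.yml for format=json to be accepted.
SEARXNG_URL = "http://localhost:8080/search"

def format_results(data, max_results=5):
    # Trim the raw SearXNG payload down to what the model actually needs
    return [
        {"title": r.get("title"), "url": r.get("url"), "snippet": r.get("content")}
        for r in data.get("results", [])[:max_results]
    ]

def web_search(query, max_results=5):
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(f"{SEARXNG_URL}?{params}", timeout=10) as resp:
        return format_results(json.load(resp), max_results)

# Example (requires a running instance):
# print(web_search("latest Elden Ring patch notes"))
```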

&lt;p&gt;&lt;strong&gt;2. Memory / context persistence&lt;/strong&gt; — By default, each conversation starts from zero. A simple vector store (Chroma, Qdrant) or even a plain text log fed back into context gives your model continuity.&lt;/p&gt;
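&lt;p&gt;The plain-text-log variant really is as small as it sounds. A sketch (the file name and note format are just assumptions):&lt;/p&gt;

```python
from pathlib import Path

LOG = Path("agent_memory.log")  # hypothetical location for the rolling log

def remember(line):
    # Append one note per line to the plain-text memory file
    with LOG.open("a") as f:
        f.write(line.rstrip() + "\n")

def recall(max_lines=20):
    # Feed the most recent notes back into the prompt as context
    if not LOG.exists():
        return ""
    lines = LOG.read_text().splitlines()[-max_lines:]
    return "Notes from earlier sessions:\n" + "\n".join(lines)

remember("User prefers concise answers.")
system_prompt = "You are a helpful assistant.\n\n" + recall()
```

&lt;p&gt;Swap the flat file for Chroma or Qdrant later if you need semantic lookup; the shape of the pattern stays the same.&lt;/p&gt;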

&lt;p&gt;&lt;strong&gt;3. File system / execution access&lt;/strong&gt; — The ability to write, read, and run code transforms the model from an advisor into an agent.&lt;/p&gt;
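&lt;p&gt;A hedged sketch of a file-writing tool; the sandbox directory is a made-up convention, but confining writes to one folder is the part worth keeping:&lt;/p&gt;

```python
from pathlib import Path

# Hypothetical convention: the agent may only write inside this one directory
SANDBOX = Path("agent-workspace")

def write_file(relative_path, content):
    # Resolve the target and refuse anything that escapes the sandbox
    target = (SANDBOX / relative_path).resolve()
    if SANDBOX.resolve() not in target.parents:
        raise ValueError("refusing to write outside the sandbox")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return str(target)

print(write_file("notes/summary.txt", "Patch 1.12 highlights, summarized."))
```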

&lt;p&gt;These are not advanced features. Each one is a weekend project. And together, they close most of the gap between a local 8B model and a productized cloud API.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing: The Race Has Already Shifted
&lt;/h2&gt;

&lt;p&gt;The AI competition used to be about who could train the biggest model. That race is becoming irrelevant for most practical use cases.&lt;/p&gt;

&lt;p&gt;The new race is about who builds the best system around the most efficient model.&lt;/p&gt;

&lt;p&gt;Cloud providers understood this first — that's why their APIs feel so capable. But the tools to replicate that architecture locally are open, documented, and running on consumer hardware right now.&lt;/p&gt;

&lt;p&gt;You don't need a bigger model.&lt;br&gt;&lt;br&gt;
You need better plumbing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you want to go deeper on the self-hosting stack itself — Ollama, Open WebUI, and SearXNG working together — I covered the full setup in &lt;a href="https://medium.com/@strangelyevil/replacing-cloud-ai-with-a-privacy-first-local-llm-stack-8d9c651a0710" rel="noopener noreferrer"&gt;Part 1 of this series&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>architecture</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"Fuck You NVIDIA" (and What I Learned Staring at a Blank Screen)</title>
      <dc:creator>Shakib S.</dc:creator>
      <pubDate>Thu, 26 Feb 2026 18:13:03 +0000</pubDate>
      <link>https://dev.to/workspacedex/fuck-you-nvidia-and-what-i-learned-staring-at-a-blank-screen-3g1g</link>
      <guid>https://dev.to/workspacedex/fuck-you-nvidia-and-what-i-learned-staring-at-a-blank-screen-3g1g</guid>
      <description>&lt;p&gt;When I A bug or an system display related issue I found in Arch Linux running with SDDM.&lt;/p&gt;

&lt;p&gt;It wakes up.&lt;/p&gt;

&lt;p&gt;Black screen.&lt;/p&gt;

&lt;p&gt;I'm running &lt;strong&gt;Arch Linux with SDDM&lt;/strong&gt;. NVIDIA GPU. Consumer card.&lt;/p&gt;

&lt;p&gt;Here's the thing about NVIDIA on Linux: their &lt;strong&gt;power management on consumer-level hardware is broken by design.&lt;/strong&gt; When the display sleeps and wakes, the driver conflicts. The screen doesn't recover. You're left staring at nothing, wondering if you broke something or if something was always broken.&lt;/p&gt;

&lt;p&gt;It's a known issue. Documented in forums. Mentioned in bug trackers. NVIDIA just hasn't cared enough to properly fix it for us.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuyfr5os6vci9brb5pk0v.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuyfr5os6vci9brb5pk0v.gif" alt=" " width="498" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Linus Torvalds, 2012 (Still accurate)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  A Year on Arch (Without Going Down the Rice Hole)
&lt;/h2&gt;

&lt;p&gt;I've been on Arch for a year now.&lt;/p&gt;

&lt;p&gt;I didn't rice it. Not obsessively, anyway. I knew the trap — you spend three months building a desktop that's perfectly yours: every keybinding, every color, every font chosen by your own hands, understood by exactly one person on Earth. Beautiful to you. Useless to your deadline.&lt;/p&gt;

&lt;p&gt;That wasn't wise for me. Not yet.&lt;/p&gt;

&lt;p&gt;But I still learned &lt;em&gt;more&lt;/em&gt; from this OS than any hand-holding distro ever taught me. Arch doesn't protect you from yourself. It hands you a blank canvas, a wiki, and your own stubbornness — then steps back.&lt;/p&gt;

&lt;p&gt;You learn because you &lt;em&gt;have&lt;/em&gt; to. And somehow that sticks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Today's Rabbit Hole
&lt;/h2&gt;

&lt;p&gt;I was installing &lt;strong&gt;Omarchy&lt;/strong&gt; on a CachyOS base — Hyprland setup, fresh install, ready to go. Used what was labeled a "safe" test script from GitHub.&lt;/p&gt;

&lt;p&gt;It wasn't perfect.&lt;/p&gt;

&lt;p&gt;I started troubleshooting. Logs, terminal output, forum threads from 2019 that are somehow still the most relevant thing on the internet. Feeding output to my AI, clicking through configs, muttering to myself.&lt;/p&gt;

&lt;p&gt;And then — the pieces connected.&lt;/p&gt;

&lt;p&gt;That display bug I'd been living with for &lt;em&gt;months&lt;/em&gt;? I finally traced it. NVIDIA drivers conflicting on wake from sleep. Consumer power management. A problem that's been sitting in plain sight, documented and unfixed, waiting for me to finally look it in the eye.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hours You Spend Confused Are the Investment
&lt;/h2&gt;

&lt;p&gt;Nobody tells you this when you install Arch.&lt;/p&gt;

&lt;p&gt;The blank screens, the &lt;code&gt;journalctl&lt;/code&gt; rabbit holes, the 3am forum threads — that's not wasted time. That's &lt;strong&gt;tuition&lt;/strong&gt;. You're paying for a mental model of your own system. One that no YouTube tutorial can hand you.&lt;/p&gt;

&lt;p&gt;I questioned those moments. Hard. Staring at nothing, wondering why I was doing this to myself, wondering what normal people do with their evenings.&lt;/p&gt;

&lt;p&gt;But today I can say: &lt;em&gt;I know what was wrong. I understand my system.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And the fix?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In /etc/systemd/logind.conf&lt;/span&gt;
&lt;span class="nv"&gt;HandleLidSwitch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line. Tell logind to ignore the lid switch, and the machine never suspends on lid close. The sleep/wake conflict never gets a chance to trigger.&lt;/p&gt;
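&lt;p&gt;One way to apply it (a sketch; hand-editing the file and rebooting works just as well, and the &lt;code&gt;sed&lt;/code&gt; line assumes the stock commented-out entry is present):&lt;/p&gt;

```shell
# Set the option whether the line is commented out or set to something else,
# then reload logind so the change takes effect.
sudo sed -i 's/^#\?HandleLidSwitch=.*/HandleLidSwitch=ignore/' /etc/systemd/logind.conf
sudo systemctl restart systemd-logind   # or just reboot
```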

&lt;h2&gt;
  
  
  That's it. That's the ending. Months of blank screens, solved by one config line I could've written on day one — if I'd known enough to write it.
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Still With You, Linus
&lt;/h2&gt;

&lt;p&gt;NVIDIA makes powerful hardware. They also make Linux users' lives unnecessarily difficult — and have for decades. The open-source community has worked around them, patched around them, and occasionally yelled at them in legendary fashion.&lt;/p&gt;

&lt;p&gt;I'm still here. Still on Arch. Still learning things the hard way, which turns out to be the only way that actually sticks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The system I'm running today — I &lt;em&gt;understand&lt;/em&gt; it. Not perfectly. Not completely. But more than I did yesterday, and infinitely more than if I'd stayed somewhere comfortable.
&lt;/h2&gt;

&lt;h2&gt;
  
  
  What I Actually Learned (TL;DR for the skimmers)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;NVIDIA + Linux + display sleep = known conflict, poorly maintained&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;HandleLidSwitch=ignore&lt;/code&gt; in &lt;code&gt;/etc/systemd/logind.conf&lt;/code&gt; sidesteps the wake issue&lt;/li&gt;
&lt;li&gt;A year on Arch without ricing was the right call for me — depth over aesthetics&lt;/li&gt;
&lt;li&gt;The painful hours are the curriculum. There's no shortcut that gives you the same understanding&lt;/li&gt;
&lt;li&gt;Omarchy on CachyOS/Hyprland is worth exploring — just go in with eyes open&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Running Arch. Still here. Send help (or just more coffee).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you've hit the same NVIDIA sleep bug — drop your fix in the comments. There are a hundred ways to solve this and I've probably only found one of them.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>linux</category>
      <category>archlinux</category>
      <category>nvidia</category>
      <category>devjournal</category>
    </item>
    <item>
      <title>LeetCode: The “Contains Duplicate” Problem</title>
      <dc:creator>Shakib S.</dc:creator>
      <pubDate>Mon, 16 Feb 2026 19:25:23 +0000</pubDate>
      <link>https://dev.to/workspacedex/leetcode-the-contains-duplicate-problem-23m7</link>
      <guid>https://dev.to/workspacedex/leetcode-the-contains-duplicate-problem-23m7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Day 2 of Refusing to Write Code Without Understanding It.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6kcgc8gfl6tcfy1ogkm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6kcgc8gfl6tcfy1ogkm.png" alt=" " width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There’s a specific question the computer is asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Does this list have the same number appearing more than once?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Simple. Almost boring.&lt;/p&gt;

&lt;p&gt;But I made one rule for myself on this journey:&lt;/p&gt;

&lt;p&gt;I don’t just want the green “Accepted” badge on LeetCode.&lt;/p&gt;

&lt;p&gt;I want to see the code actually run.&lt;/p&gt;

&lt;p&gt;I want to watch the output.&lt;/p&gt;

&lt;p&gt;I want to feel the logic execute.&lt;/p&gt;

&lt;p&gt;That rule has kept this process fulfilling.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem (Explained)
&lt;/h2&gt;

&lt;p&gt;Examples everyone understands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[3, 7, 1, 9]   → false 
[3, 7, 3, 9]   → true 
[5]            → false 
[8, 8]         → true 
[]             → false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We need a reliable way to detect if any number appears more than once.&lt;/p&gt;

&lt;p&gt;Let’s solve it the most human way first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The “Looking With Your Eyes” Method
&lt;/h2&gt;

&lt;p&gt;Imagine this list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[4, 1, 7, 2, 9, 4, 8]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You would naturally do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Take 4 → remember it
Take 1 → new → remember
Take 7 → new
Take 2 → new
Take 9 → new
Take 4 → wait… I’ve seen 4 before → duplicate!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the entire algorithm.&lt;/p&gt;

&lt;p&gt;Remember what you’ve seen.&lt;/p&gt;

&lt;p&gt;If you see it again → stop.&lt;/p&gt;

&lt;p&gt;Now let’s write that in code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Turning Human Thinking Into Python
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def has_duplicate(nums):
   seen = []                # empty list = our memory

   for num in nums:         # look at each number one by one
       if num in seen:      # is it already in memory?
           return True      # yes → duplicate found
       seen.append(num)     # no → remember it

   return False             # no duplicates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s mentally run it:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Input: [4, 1, 7, 4]&lt;br&gt;
4 → add → [4]&lt;br&gt;
1 → add → [4,1]&lt;br&gt;
7 → add → [4,1,7]&lt;br&gt;
4 → already in → return True&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Perfect.&lt;/p&gt;

&lt;p&gt;But here’s where things get interesting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Performance Reality
&lt;/h2&gt;

&lt;p&gt;Lists in Python are slow at checking membership.&lt;/p&gt;

&lt;p&gt;When you do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if num in seen:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python checks every element in the list until it finds a match.&lt;/p&gt;

&lt;p&gt;That’s &lt;a href="https://en.wikipedia.org/wiki/Big_O_notation" rel="noopener noreferrer"&gt;O(n)&lt;/a&gt; time.&lt;/p&gt;

&lt;p&gt;If the list has 100,000 elements, that check might scan through all 100,000.&lt;/p&gt;

&lt;p&gt;And we do that inside a loop.&lt;/p&gt;

&lt;p&gt;That makes the total time complexity:&lt;br&gt;
O(n²)&lt;/p&gt;

&lt;p&gt;For interviews? Not good.&lt;/p&gt;
&lt;h2&gt;
  
  
  Enter the Hash Map (via Set)
&lt;/h2&gt;

&lt;p&gt;Python has a built-in data structure called a &lt;code&gt;set&lt;/code&gt;. This is the secret sauce.&lt;/p&gt;

&lt;p&gt;A set is implemented using a hash table (a structure that allows near constant-time lookup).&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;p&gt;Checking &lt;code&gt;num in seen&lt;/code&gt; becomes approximately O(1).&lt;/p&gt;

&lt;p&gt;Now the entire loop becomes O(n).&lt;/p&gt;

&lt;p&gt;Here’s the optimized version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def has_duplicate(nums):
   seen = set()             # fast lookup structure

   for num in nums:
       if num in seen:
           return True
       seen.add(num)

   return False

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same logic.&lt;br&gt;
Different container.&lt;br&gt;
Massive performance difference.&lt;br&gt;
That’s the power of data structures.&lt;/p&gt;
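&lt;p&gt;If you’d rather measure that than take it on faith, a quick benchmark sketch (sizes picked arbitrarily) makes the gap concrete:&lt;/p&gt;

```python
import time

def has_duplicate_list(nums):
    # O(n^2): every membership check scans the list front to back
    seen = []
    for num in nums:
        if num in seen:
            return True
        seen.append(num)
    return False

def has_duplicate_set(nums):
    # O(n): set membership is a single hash lookup
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False

worst_case = list(range(10_000))  # no duplicates, so every element gets checked

start = time.perf_counter()
has_duplicate_list(worst_case)
list_time = time.perf_counter() - start

start = time.perf_counter()
has_duplicate_set(worst_case)
set_time = time.perf_counter() - start

print(f"list: {list_time:.3f}s, set: {set_time:.3f}s")
```

&lt;p&gt;Expect the list version to be orders of magnitude slower at this size, and the gap widens as n grows. A common shorthand for the set approach is &lt;code&gt;len(set(nums)) != len(nums)&lt;/code&gt;, though it always builds the full set instead of exiting early on the first duplicate.&lt;/p&gt;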
&lt;h2&gt;
  
  
  The Lab Setup (The Rule I Follow. So Should You.)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyw6iochdt0fs0wmtoqfh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyw6iochdt0fs0wmtoqfh.png" alt=" " width="800" height="196"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is where my personal rule kicks in.&lt;/p&gt;

&lt;p&gt;I don’t just submit to LeetCode.&lt;/p&gt;

&lt;p&gt;I build a tiny lab and run it myself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Solution:
   def containsDuplicate(self, nums):
       seen = set()
       for num in nums:
           if num in seen:
               return True
           seen.add(num)
       return False

s = Solution()
nums = [1, 0, 2, 5, 8, 9, 1]
result = s.containsDuplicate(nums)
print(result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s more satisfying.&lt;/p&gt;

&lt;p&gt;It feels real.&lt;/p&gt;

&lt;p&gt;It feels engineered.&lt;/p&gt;

&lt;p&gt;Not gamified.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Problem Actually Taught Me
&lt;/h2&gt;

&lt;p&gt;This wasn’t about duplicates.&lt;/p&gt;

&lt;p&gt;It was about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choosing the right data structure&lt;/li&gt;
&lt;li&gt;Understanding time complexity&lt;/li&gt;
&lt;li&gt;Thinking in terms of scale&lt;/li&gt;
&lt;li&gt;Translating human logic into machine logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LeetCode is helping me build algorithmic discipline.&lt;/p&gt;

&lt;p&gt;And that discipline matters when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Designing APIs&lt;/li&gt;
&lt;li&gt;Optimizing backend systems&lt;/li&gt;
&lt;li&gt;Handling large datasets&lt;/li&gt;
&lt;li&gt;Preventing performance bottlenecks&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Thanks for reading.&lt;/p&gt;

&lt;p&gt;This week I’m focused on hash maps and strings.&lt;br&gt;
Cheers to learning.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>learning</category>
      <category>leetcode</category>
    </item>
  </channel>
</rss>
