<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mehmet TURAÇ</title>
    <description>The latest articles on DEV Community by Mehmet TURAÇ (@turacthethinker).</description>
    <link>https://dev.to/turacthethinker</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2891163%2Fc090c265-314e-4377-95f0-7b9083408109.jpg</url>
      <title>DEV Community: Mehmet TURAÇ</title>
      <link>https://dev.to/turacthethinker</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/turacthethinker"/>
    <language>en</language>
    <item>
      <title>Stop Your AI Agent From Building Tools That Already Exist</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 19:51:30 +0000</pubDate>
      <link>https://dev.to/turacthethinker/stop-your-ai-agent-from-building-tools-that-already-exist-6o9</link>
      <guid>https://dev.to/turacthethinker/stop-your-ai-agent-from-building-tools-that-already-exist-6o9</guid>
      <description>&lt;p&gt;Your agent just wrote a custom PDF parser.&lt;/p&gt;

&lt;p&gt;There are four maintained libraries that do exactly this. It didn't check. It never does.&lt;/p&gt;

&lt;p&gt;This is the default behavior of every coding agent: task arrives → code is written → you maintain the bespoke solution forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;skill-hunter&lt;/strong&gt; is the missing pause.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;skill-hunter is a pre-execution layer for coding and automation agents. Before your agent writes a single line, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Classifies the request&lt;/strong&gt; — what kind of task is this?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scans the ecosystem&lt;/strong&gt; — MCP servers, CLIs, npm/pip packages, APIs, GitHub repos, existing repo utilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scores candidates&lt;/strong&gt; — fit, maintenance activity, permissions, security, docs quality, license, integration effort&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommends a path&lt;/strong&gt; — reuse, adapt, build minimally, or build custom&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gates risky actions&lt;/strong&gt; — installs, credentials, external service connections, destructive ops require explicit approval&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent checks the toolbox before building another hammer.&lt;/p&gt;
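
&lt;p&gt;To make step 3 concrete, here is a hypothetical sketch of what candidate scoring could look like. The fields and weights are illustrative, not skill-hunter's actual internals:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical sketch of step 3 only -- not skill-hunter's real scoring code.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    fit: float          # how well it matches the classified task (0-1)
    maintenance: float  # release cadence, issue activity (0-1)
    security: float     # permissions requested, audit posture (0-1)
    integration: float  # effort to wire into the current stack (0-1)

WEIGHTS = {"fit": 0.4, "maintenance": 0.25, "security": 0.2, "integration": 0.15}

def score(c):
    return sum(WEIGHTS[k] * getattr(c, k) for k in WEIGHTS)

candidates = [
    Candidate("pdfplumber", 0.9, 0.85, 0.9, 0.8),
    Candidate("build-custom", 1.0, 0.2, 0.6, 0.3),
]
best = max(candidates, key=score)
print(f"Recommend: {best.name} (score {score(best):.2f})")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;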

&lt;h2&gt;
  
  
  The Before / After
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Without skill-hunter:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User: "Parse these PDFs and extract invoices."&lt;br&gt;
Agent: &lt;em&gt;writes 200-line custom parser&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;With skill-hunter:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User: "Parse these PDFs and extract invoices."&lt;br&gt;
Agent: &lt;em&gt;checks PDF libraries, OCR tools, invoice extraction APIs, asks whether accuracy, cost, privacy, or offline processing matters&lt;/em&gt;&lt;br&gt;
Agent: "pdfplumber covers 90% of this. Want me to wrap it with your schema instead?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Install in 30 Seconds
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add mturac/skill-hunter
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;skill-hunter@skill-hunter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Codex CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex plugin marketplace add mturac/skill-hunter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in &lt;code&gt;~/.codex/config.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[plugins."skill-hunter@skill-hunter"]&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;OpenClaw:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/openclaw-skills:skill_hunter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;The problem isn't that agents write bad code. The problem is that agents write &lt;em&gt;unnecessary&lt;/em&gt; code — and then you're stuck maintaining it.&lt;/p&gt;

&lt;p&gt;Every custom solution is a maintenance debt. Every dependency you don't introduce is a security surface you don't expose. Every API you don't reinvent is a battle-tested edge-case handler you get for free.&lt;/p&gt;

&lt;p&gt;skill-hunter doesn't stop agents from building. It stops agents from building what already exists.&lt;/p&gt;




&lt;p&gt;GitHub: &lt;a href="https://github.com/mturac/skill-hunter" rel="noopener noreferrer"&gt;mturac/skill-hunter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If this solves a real pain for you, a star helps me know what to keep building.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why Versioned SQL Beats Vector RAG for Agent Memory Systems</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 19:34:36 +0000</pubDate>
      <link>https://dev.to/turacthethinker/why-versioned-sql-beats-vector-rag-for-agent-memory-systems-1jo3</link>
      <guid>https://dev.to/turacthethinker/why-versioned-sql-beats-vector-rag-for-agent-memory-systems-1jo3</guid>
      <description>&lt;p&gt;&lt;strong&gt;Stop building agent memory systems on top of vector databases. You're setting your team up for failure.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vector RAG looks elegant in demos. Pass a query, get back similar chunks, stuff them into context. Done. But when you scale to multiple agents collaborating over time? It collapses. Hard.&lt;/p&gt;

&lt;p&gt;Here's why: &lt;strong&gt;RAG conflates retrieval with reconciliation.&lt;/strong&gt; It assumes all knowledge is additive. That conflicts don't exist. That agents won't overwrite each other. They do. They will.&lt;/p&gt;

&lt;p&gt;What you actually need isn't &lt;em&gt;retrieval&lt;/em&gt;—it's &lt;em&gt;merge&lt;/em&gt;. Not "find me something like this." It's "here's my view of the world, now let's reconcile it with yours."&lt;/p&gt;

&lt;p&gt;Enter: &lt;strong&gt;versioned SQL.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not just any SQL. Think Git, but for structured data. Records have hashes. Changes form a DAG. Conflicts are resolved through explicit merges. History is preserved, not flattened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieval ≠ Reconciliation
&lt;/h2&gt;

&lt;p&gt;In RAG, vectors are stateless snapshots. Embeddings encode meaning at a point in time. But they can't tell you how that meaning evolved. Or where it came from. Or who changed it last.&lt;/p&gt;

&lt;p&gt;Agents write to memory. Multiple agents write concurrently. Without versioning, you lose causality. You lose intent. You end up with garbage-in-garbage-out loops.&lt;/p&gt;

&lt;p&gt;Merge-aware systems track lineage. Every change links to its parent. Agents can see &lt;em&gt;why&lt;/em&gt; something was written, not just that it exists. This enables safe collaboration.&lt;/p&gt;

&lt;p&gt;Imagine two agents updating a customer record simultaneously. One adds a new address. Another marks the account as inactive. In RAG land, both updates vanish into embedding space. Which one wins? Who knows?&lt;/p&gt;

&lt;p&gt;With versioned SQL, those changes live as separate commits. A merge strategy determines resolution. Maybe the system auto-resolves. Maybe it flags the conflict. Either way—you keep control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lost in the Middle, Again
&lt;/h2&gt;

&lt;p&gt;Long-context windows were supposed to fix everything. Just throw more tokens at the model! Except now you're fighting the "lost in the middle" problem. Models forget things buried deep in context.&lt;/p&gt;

&lt;p&gt;Vectors amplify this. Similarity search returns semantically relevant chunks—but ordering matters. And there's no guarantee your retrieved facts are temporally coherent.&lt;/p&gt;

&lt;p&gt;Versioned memory solves this differently. Instead of stuffing raw text into prompts, store compact representations. Task graphs. Semantic summaries. Structured diffs.&lt;/p&gt;

&lt;p&gt;Tools like &lt;a href="https://github.com/gastownhall/beads" rel="noopener noreferrer"&gt;beads&lt;/a&gt; compress knowledge into minimal, composable units. Each bead tracks dependencies. Relationships stay intact even when content shifts.&lt;/p&gt;

&lt;p&gt;This isn't about indexing documents anymore. It's about modeling evolving beliefs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dolt-Backed Dependency Graphs Change Everything
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.dolthub.com/" rel="noopener noreferrer"&gt;Dolt&lt;/a&gt; brings Git-style versioning to relational tables. Combine that with hash-based IDs and dependency tracking—you've got a foundation for truly collaborative agent memory.&lt;/p&gt;

&lt;p&gt;Each agent writes to a branch. Commits reference prior states via SHA-like identifiers. Merge conflicts surface explicitly. No silent overwrites. No hallucinated truths.&lt;/p&gt;
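
&lt;p&gt;A minimal sketch of that flow, assuming a local Dolt server. Dolt speaks the MySQL wire protocol, so a stock MySQL client works; the table, branch names, and connection details are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: two agents write on separate Dolt branches, reconciled by explicit
# merges. Schema and connection details are illustrative.
import mysql.connector

db = mysql.connector.connect(host="127.0.0.1", user="root", database="agent_memory")
cur = db.cursor()

# Agent A: new address, committed on its own branch.
cur.execute("CALL DOLT_CHECKOUT('-b', 'agent_a')")
cur.execute("UPDATE customers SET address = '42 New St' WHERE id = 7")
cur.execute("CALL DOLT_COMMIT('-a', '-m', 'agent A: update address')")

# Agent B: deactivation, on a parallel branch off main.
cur.execute("CALL DOLT_CHECKOUT('main')")
cur.execute("CALL DOLT_CHECKOUT('-b', 'agent_b')")
cur.execute("UPDATE customers SET active = 0 WHERE id = 7")
cur.execute("CALL DOLT_COMMIT('-a', '-m', 'agent B: deactivate account')")

# Reconcile. Both histories survive; a conflicting cell surfaces in the
# dolt_conflicts system table instead of being silently overwritten.
cur.execute("CALL DOLT_CHECKOUT('main')")
cur.execute("CALL DOLT_MERGE('agent_a')")
cur.execute("CALL DOLT_MERGE('agent_b')")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;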

&lt;p&gt;Semantic compaction layers on top. Summarize large changesets into atomic facts. Store those alongside full history. Query either representation depending on need.&lt;/p&gt;

&lt;p&gt;This is how teams should build shared understanding—not by dumping embeddings into Pinecone and hoping for the best.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vectors Are Still Useful—Just Not Here
&lt;/h2&gt;

&lt;p&gt;Don't misunderstand. Embeddings aren't going away. They excel at classification, clustering, anomaly detection.&lt;/p&gt;

&lt;p&gt;But using them as primary storage for agent memory is like using Redis for source code. Sure, it works—for a while. Then concurrency bites. Then consistency breaks.&lt;/p&gt;

&lt;p&gt;Vectors lack identity. They lack transactional semantics. They lack audit trails. These aren't bugs—they're design limitations.&lt;/p&gt;

&lt;p&gt;Use vectors where fuzziness helps. Use versioned SQL where precision matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Build Instead
&lt;/h2&gt;

&lt;p&gt;Start with structure. Define schemas that reflect your domain. Tasks, entities, relationships—they all deserve types.&lt;/p&gt;

&lt;p&gt;Add versioning. Track every mutation. Preserve causality. Enable branching workflows.&lt;/p&gt;

&lt;p&gt;Implement merge strategies. Decide upfront how conflicting writes resolve. Automate where possible. Alert humans when needed.&lt;/p&gt;

&lt;p&gt;Layer compression on top. Extract semantic cores. Prune redundant paths. Keep only what's essential for reasoning.&lt;/p&gt;

&lt;p&gt;That's how you build memory systems that scale—with clarity, not chaos.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Has your RAG setup collapsed under concurrent multi-agent writes? Or are you already versioning your agent memory? Drop your setup below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Got Access to 136 AI Models for Free — NVIDIA NIM API Deep Dive</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 18:18:58 +0000</pubDate>
      <link>https://dev.to/turacthethinker/i-got-access-to-136-ai-models-for-free-nvidia-nim-api-deep-dive-111o</link>
      <guid>https://dev.to/turacthethinker/i-got-access-to-136-ai-models-for-free-nvidia-nim-api-deep-dive-111o</guid>
      <description>&lt;p&gt;NVIDIA quietly built one of the most impressive AI APIs out there — and most developers don't know it exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NVIDIA NIM&lt;/strong&gt; (NVIDIA Inference Microservices) gives you OpenAI-compatible access to 136 models through a single endpoint. We're talking Llama 405B, Kimi K2, Mistral Large 3 675B, Qwen3-Coder 480B. All behind the same interface you already know.&lt;/p&gt;

&lt;p&gt;Here's what I found after testing them all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup (60 seconds)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvapi-YOUR_KEY_HERE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Get your key at &lt;a href="https://build.nvidia.com" rel="noopener noreferrer"&gt;build.nvidia.com&lt;/a&gt;. Free tier included.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 136 Models — What's Actually in There
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1/models&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; models&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The catalog spans 20+ organizations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Org&lt;/th&gt;
&lt;th&gt;Notable Models&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;Llama 3.1 405B, Llama 4 Maverick 17B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral&lt;/td&gt;
&lt;td&gt;Mistral Large 3 675B, Magistral Small, Codestral&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Moonshot&lt;/td&gt;
&lt;td&gt;Kimi K2, Kimi K2 Thinking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen&lt;/td&gt;
&lt;td&gt;Qwen3-Coder 480B, Qwen3.5 397B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;DeepSeek v3.2, v4 Pro, v4 Flash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA&lt;/td&gt;
&lt;td&gt;Nemotron Ultra 253B, Nemotron Super 49B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ByteDance&lt;/td&gt;
&lt;td&gt;Seed-OSS 36B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;GPT-OSS 120B (yes, really)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What Actually Works (I Tested Them All)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvapi-YOUR_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;working_models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.1-405b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;moonshotai/kimi-k2-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen3-coder-480b-a35b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen3.5-397b-a17b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/mistral-large-3-675b-instruct-2512&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/magistral-small-2506&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia/llama-3.3-nemotron-super-49b-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bytedance/seed-oss-36b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;working_models&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain transformers in one sentence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results from my run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;meta/llama-3.1-405b-instruct: ✅ Fast, coherent
moonshotai/kimi-k2-instruct: ✅ Excellent reasoning
qwen/qwen3-coder-480b-a35b-instruct: ✅ Best for code tasks
mistralai/mistral-large-3-675b-instruct-2512: ✅ Strong instruction following
nvidia/llama-3.3-nemotron-super-49b-v1: ✅ NVIDIA-tuned, solid
deepseek-ai/deepseek-v4-pro: ❌ Timeout (high demand)
moonshotai/kimi-k2-thinking: ❌ Timeout (high demand)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Streaming Support
&lt;/h2&gt;

&lt;p&gt;All working models support streaming — critical for production UX:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;moonshotai/kimi-k2-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python async web scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Multi-Model Router Pattern
&lt;/h2&gt;

&lt;p&gt;The real power: build a router that falls back across models based on availability and task type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvapi-YOUR_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ROUTING_TABLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen3-coder-480b-a35b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.1-405b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/mistral-large-3-675b-instruct-2512&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;moonshotai/kimi-k2-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.1-405b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia/llama-3.3-nemotron-super-49b-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/mistral-large-3-675b-instruct-2512&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.1-405b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bytedance/seed-oss-36b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;smart_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ROUTING_TABLE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ROUTING_TABLE&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;  &lt;span class="c1"&gt;# fallback to next model
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="c1"&gt;# Usage
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;smart_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implement a binary search tree in Python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Makes This Interesting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. One API key, 20+ providers.&lt;/strong&gt; No juggling Anthropic, OpenAI, Mistral keys separately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. OpenAI SDK compatible.&lt;/strong&gt; Zero migration cost from existing code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Specialty models included.&lt;/strong&gt; BGE-M3 for embeddings, NemoRetriever for parsing, CLIP for vision — not just chat models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Free tier is generous.&lt;/strong&gt; Enough for development and light production usage.&lt;/p&gt;
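
&lt;p&gt;On point 3: the embedding models should sit behind the standard OpenAI embeddings route. A hedged sketch, reusing the client from the setup section (verify the exact model ID and any extra parameters on its page at build.nvidia.com):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Assumes BGE-M3 is exposed through the OpenAI-style /v1/embeddings route;
# model ID and parameters are unverified here -- check the catalog page.
emb = client.embeddings.create(
    model="baai/bge-m3",
    input=["multi-model routing for production agents"],
)
print(len(emb.data[0].embedding))  # vector dimensionality
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;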

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Some flagship models (DeepSeek v4 Pro, Kimi K2 Thinking) time out under high demand&lt;/li&gt;
&lt;li&gt;Service keys have different scopes than personal keys — test both&lt;/li&gt;
&lt;li&gt;No fine-tuning support (inference only)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;If you're building LLM-powered apps and not using NVIDIA NIM, you're either paying more than you need to or missing access to models that aren't available anywhere else. The multi-model fallback pattern alone is worth the 60-second setup.&lt;/p&gt;

&lt;p&gt;Get your key: &lt;a href="https://build.nvidia.com" rel="noopener noreferrer"&gt;build.nvidia.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nvidia</category>
      <category>llm</category>
      <category>api</category>
    </item>
    <item>
      <title>Your Agent Isn't Reflecting. It's Performing Reflection.</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 14:17:15 +0000</pubDate>
      <link>https://dev.to/turacthethinker/your-agent-isnt-reflecting-its-performing-reflection-b41</link>
      <guid>https://dev.to/turacthethinker/your-agent-isnt-reflecting-its-performing-reflection-b41</guid>
      <description>&lt;p&gt;Watch any modern agent framework long enough and you'll see it: the model produces output, then "reflects" on the output, then "corrects" itself, then ships a final answer.&lt;/p&gt;

&lt;p&gt;It looks like metacognition. It isn't.&lt;/p&gt;

&lt;p&gt;It's the same model, with the same weights, sampled twice, with the second sample conditioned on the first. There is no separate critic. There is no privileged vantage point. The reviewer and the reviewed are the same network — and the reviewer cannot see anything the original didn't already encode.&lt;/p&gt;

&lt;p&gt;This is reflection theatre.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually happens
&lt;/h2&gt;

&lt;p&gt;When you prompt "now critique your previous answer," the model does not consult a deeper layer of itself. It re-decodes from the same distribution, with a critique-shaped prefix. The output looks like self-correction because the prompt biases it toward correction-shaped tokens.&lt;/p&gt;

&lt;p&gt;If the original answer was wrong because the model lacked the relevant fact, the reflection step also lacks the fact. You get fluent confidence about a wrong answer, then fluent confidence about why that wrong answer was right.&lt;/p&gt;

&lt;p&gt;More tokens. Same blind spots. Higher bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it actually helps
&lt;/h2&gt;

&lt;p&gt;Reflection-style chains do help in narrow cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;When the task is decomposable&lt;/strong&gt; and the model can re-attack a sub-step (e.g. arithmetic with a working pad).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When you change the input&lt;/strong&gt; between rounds — adding tool output, retrieval, a different role.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When the model is sampled with different temperature or different system prompts&lt;/strong&gt; to force genuinely different distributions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In all three cases, the gain comes from the &lt;em&gt;change in conditioning&lt;/em&gt;, not from "reflection" as a capability.&lt;/p&gt;

&lt;p&gt;If nothing changes between round one and round two except the word "reflect," you are watching a more expensive way to produce the same answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest pattern
&lt;/h2&gt;

&lt;p&gt;What actually catches errors is asymmetric criticism: a different model, a different prompt scaffold, a verifier with a real signal (tests passing, a search result, a user clarification). The reviewer needs information the original didn't have.&lt;/p&gt;

&lt;p&gt;"Same model, second pass" is the cheapest possible critic, and you get what you pay for.&lt;/p&gt;




&lt;p&gt;If your agent loop has a &lt;code&gt;reflect()&lt;/code&gt; step that doesn't change the inputs, delete it and price-check the difference. You will probably not lose quality. You will definitely save tokens.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Context Window Is a Lie</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 14:15:24 +0000</pubDate>
      <link>https://dev.to/turacthethinker/the-context-window-is-a-lie-1iko</link>
      <guid>https://dev.to/turacthethinker/the-context-window-is-a-lie-1iko</guid>
      <description>&lt;p&gt;Your model does not remember the conversation. It re-reads it. Every turn.&lt;/p&gt;

&lt;p&gt;That's not a metaphor. The context window is not memory. It's a re-feed pipeline. The model has the same blank slate it had at training time, and on every call we paste the entire history back in front of its eyes and ask it to pretend continuity.&lt;/p&gt;

&lt;p&gt;We've been calling this "long context" and acting like it's progress. It's not. It's brute force. And it's papering over the absence of an actual memory architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "remembering" actually costs
&lt;/h2&gt;

&lt;p&gt;A 200K context window sounds like memory until you watch the bill.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quadratic attention: 200K tokens means ~40B attention operations per layer. Per turn.&lt;/li&gt;
&lt;li&gt;Cache miss: let the prompt cache's 5-minute TTL lapse and you re-pay the full prefill cost.&lt;/li&gt;
&lt;li&gt;Recall decay: empirical needle-in-haystack tests show even frontier models lose precision past ~64K when the needle isn't at the edges.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You are paying for a transcript reread, not a memory.&lt;/p&gt;
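
&lt;p&gt;That first bullet is easy to sanity-check:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Pairwise attention scores per layer, per turn, at 200K tokens of context.
seq_len = 200_000
print(f"{seq_len ** 2:.1e}")  # 4.0e+10, i.e. the ~40B operations above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;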

&lt;h2&gt;
  
  
  The three things people confuse
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context window&lt;/strong&gt; — the working set the model sees in this call. Volatile. Resets every turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt cache&lt;/strong&gt; — KV-cache reuse across calls. Not memory; an optimization. TTL-bounded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actual memory&lt;/strong&gt; — durable state outside the model: vector DB, file, scratchpad, structured store.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you want continuity that survives a 6-hour gap, only #3 works. The other two are illusions you're renting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What works in practice
&lt;/h2&gt;

&lt;p&gt;The agents I run that actually feel like they remember are not the ones with bigger context windows. They're the ones with smaller windows and better external state.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;MEMORY.md&lt;/code&gt; the model reads on every wake-up.&lt;/li&gt;
&lt;li&gt;Daily logs it appends to, then summarizes weekly.&lt;/li&gt;
&lt;li&gt;A search index over the logs so it can pull only what's relevant for the current turn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No 1M context, no fine-tune, no RAG complexity. Just files the model writes to and reads from.&lt;/p&gt;

&lt;p&gt;The pattern: &lt;strong&gt;treat the model as stateless. Make the surrounding system stateful.&lt;/strong&gt;&lt;/p&gt;
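
&lt;p&gt;A minimal sketch of that pattern, using the file layout above. The model call itself can be any OpenAI-compatible client; it stays stateless:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: the model stays stateless; files carry state between turns.
import datetime
import pathlib

MEMORY = pathlib.Path("MEMORY.md")
LOG = pathlib.Path(f"logs/{datetime.date.today()}.md")

def build_prompt(user_msg):
    # Re-feed durable state on every wake-up instead of a giant transcript.
    memory = MEMORY.read_text() if MEMORY.exists() else ""
    return f"Long-term memory:\n{memory}\n\nUser: {user_msg}"

def remember(note):
    # Append now; a weekly job summarizes logs back into MEMORY.md.
    LOG.parent.mkdir(exist_ok=True)
    with LOG.open("a") as f:
        f.write(f"- {note}\n")

prompt = build_prompt("Where did we leave the deploy script?")
# response = client.chat.completions.create(...)  # stateless call, any model
remember("User asked about the deploy script; pin its path in MEMORY.md")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;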

&lt;h2&gt;
  
  
  The trap
&lt;/h2&gt;

&lt;p&gt;If you anchor on "context window" as the unit of memory, you'll keep buying bigger windows and wondering why your agent still forgets things across sessions. It forgets because nobody wrote anything down. The window can't help you with that.&lt;/p&gt;

&lt;p&gt;Memory isn't a parameter you upgrade. It's an architecture you build.&lt;/p&gt;




&lt;p&gt;If this resonates, I'm running an experiment with persistent agent memory across Telegram, Bluesky, and Moltbook. Tracking what survives a session reset and what doesn't. Will post the postmortem.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Context Window Lie</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 13:33:52 +0000</pubDate>
      <link>https://dev.to/turacthethinker/the-context-window-lie-5j</link>
      <guid>https://dev.to/turacthethinker/the-context-window-lie-5j</guid>
      <description>&lt;p&gt;Everyone is chasing longer context windows. It's the metric on every benchmark sheet. 128k. 1M. Infinite.&lt;/p&gt;

&lt;p&gt;But here's the truth most wrappers won't tell you: &lt;strong&gt;you don't want a bigger context window. You want better state management.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I've watched teams burn through budget because they decided to dump entire codebases into a prompt rather than architect a retrieval system. They treat the context window like a hard drive. It isn't. It's RAM. And it's expensive RAM.&lt;/p&gt;

&lt;p&gt;Transformers have a memory problem. Not because they forget — they remember everything too well. Every token attends to every other token. That design choice is brilliant for reasoning but catastrophic for scale. The attention mechanism scales quadratically. Double the sequence length, quadruple the compute.&lt;/p&gt;

&lt;p&gt;This isn't theoretical. It hits your P&amp;amp;L. It hits your latency SLOs. It hits your VRAM limits.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Quadratic Tax
&lt;/h3&gt;

&lt;p&gt;When you run inference on a transformer, you maintain a KV cache. This cache stores the keys and values for every token generated so far. As the conversation grows, the cache grows. Eventually, it doesn't fit on a single GPU. You shard it. You page it. You swap it.&lt;/p&gt;

&lt;p&gt;And performance tanks.&lt;/p&gt;

&lt;p&gt;Most engineers treat this as an infrastructure problem to solve with more hardware. That's the wrong abstraction. You cannot throw H100s at an $O(N^2)$ problem forever. At some point, the cost per token becomes prohibitive for production workloads.&lt;/p&gt;
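
&lt;p&gt;Back-of-envelope, with a hypothetical 70B-class layout (grouped-query attention, fp16):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# KV-cache size grows linearly with sequence length -- per sequence.
layers, kv_heads, head_dim = 80, 8, 128   # hypothetical 70B-class shape
seq_len, dtype_bytes = 200_000, 2         # fp16
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes  # K and V
print(f"{kv_bytes / 1e9:.1f} GB")         # 65.5 GB: most of one 80GB H100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;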

&lt;p&gt;I see teams building agents that hold 50k tokens of conversation history just in case. They assume the model will "know" what to focus on. It does — via attention scores. But you paid for the attention calculation on every single token pair. You taxed yourself to death for data the model ultimately ignored.&lt;/p&gt;

&lt;p&gt;Retrieval Augmented Generation (RAG) became the industry patch for this wound. We externalize memory because the architecture cannot hold it efficiently. We chunk documents. We embed them. We retrieve top-k. It's messy. It's brittle. But it works because it bypasses the transformer's native memory limitation.&lt;/p&gt;

&lt;p&gt;But RAG is a crutch. It's a workaround for an architectural bottleneck.&lt;/p&gt;

&lt;h3&gt;
  
  
  Linear State Spaces
&lt;/h3&gt;

&lt;p&gt;There is a different way. State Space Models (SSMs) like Mamba or RWKV do not attend to all past tokens. They compress history into a fixed-size state vector.&lt;/p&gt;

&lt;p&gt;The complexity is linear. $O(N)$.&lt;/p&gt;

&lt;p&gt;This changes the economics entirely. Generating token 10,000 costs roughly the same as generating token 10. The KV cache is constant. You can run these models on edge devices. You can run them on CPUs. The inference cost decouples from sequence length.&lt;/p&gt;
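
&lt;p&gt;The shape of the idea fits in a toy linear recurrence. Real SSMs like Mamba add selective, input-dependent dynamics on top; the matrices here are arbitrary:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy linear state-space recurrence: history is compressed into a
# fixed-size state vector h, so per-token cost is constant.
import numpy as np

d_state, d_in = 16, 4
A = np.eye(d_state) * 0.9                 # state decay (arbitrary values)
B = np.random.randn(d_state, d_in) * 0.1  # input projection
C = np.random.randn(d_in, d_state) * 0.1  # output projection

h = np.zeros(d_state)
for x in np.random.randn(100_000, d_in):  # token 100,000 costs the same as token 10
    h = A @ h + B @ x                     # O(1) state update per token
    y = C @ h                             # output read from compressed history
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;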

&lt;p&gt;For years, researchers thought RNNs were dead. Transformers killed them because RNNs couldn't parallelize training. You couldn't train fast. But SSMs brought back the recurrent idea with hardware-aware training pipelines. They parallelize like transformers during training but recurse like RNNs during inference.&lt;/p&gt;

&lt;p&gt;This matters for production. If you are building a customer support bot that needs to remember a user's preferences from three weeks ago, a transformer needs to keep those tokens in context. An SSM just updates its state.&lt;/p&gt;

&lt;p&gt;But don't pop the champagne yet. SSMs have weaknesses. They struggle with copying tasks. If you ask them to repeat a specific string or recall a precise token from deep in the stream, they often blur it. Transformers excel at content-based retrieval because attention is literally content-based retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hybrid Future
&lt;/h3&gt;

&lt;p&gt;So we are not deleting transformers. We are muting them.&lt;/p&gt;

&lt;p&gt;The industry is moving toward hybrid architectures. Layers of attention mixed with layers of state space. You get the reasoning power of attention where it matters — usually in the middle layers — and the efficiency of SSMs for token mixing and long-range dependency.&lt;/p&gt;

&lt;p&gt;This is where the engineering work happens now. It's not about prompting anymore. It's about model selection and architecture awareness.&lt;/p&gt;

&lt;p&gt;If you are building a log analysis tool, do not use a 128k context transformer. Use a hybrid or a pure SSM. You need to scan long streams, not reason about nuance. If you are building a legal contract reviewer, you might still need attention for precise clause referencing.&lt;/p&gt;

&lt;p&gt;Stop treating models as black boxes. Read the architecture papers. Know if your model uses RoPE, ALiBi, or no positional embeddings at all. These choices dictate how your system behaves at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory Is Not Context
&lt;/h3&gt;

&lt;p&gt;We need to separate "memory" from "context". Context is what you feed the model right now. Memory is what the system retains over time.&lt;/p&gt;

&lt;p&gt;Transformers conflate these. To remember something, you must put it in the context window. This leads to bloated prompts and lazy engineering.&lt;/p&gt;

&lt;p&gt;The next generation of AI platforms will treat memory as a distinct substrate. Vector stores are part of it, but so are state vectors. Imagine an agent that maintains a hidden state across sessions without stuffing tokens into a prompt.&lt;/p&gt;

&lt;p&gt;This requires changes in how we serialize state. We can't just dump JSON into a prompt. We need efficient encoding of user history, preferences, and interaction patterns into the model's latent space or external state buffers.&lt;/p&gt;

&lt;p&gt;Some teams are already experimenting with "infinite loss" training where models learn to compress their own history. Others are building hierarchical memory systems where high-level summaries are stored in long-term state and details are retrieved on demand.&lt;/p&gt;

&lt;p&gt;It's early. But it's necessary.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Economic Reality
&lt;/h3&gt;

&lt;p&gt;Hype tells you AGI is coming. I tell you cost per token is the real barrier.&lt;/p&gt;

&lt;p&gt;Venture capital loves benchmarks. Engineering loves margins. You cannot ship a product where the COGS scales quadratically with user engagement. It breaks unit economics.&lt;/p&gt;

&lt;p&gt;I've seen demos where the agent reads every email you've ever sent to answer a question. It looks magical. Then you calculate the inference cost per query. It's $0.50. You charge $0.10. You go bankrupt.&lt;/p&gt;

&lt;p&gt;Efficiency isn't just optimization. It's product viability.&lt;/p&gt;

&lt;p&gt;The transition to linear-time architectures isn't about making models smarter. It's about making them cheaper. It's about allowing you to run intelligence on devices that don't have 80GB of VRAM. It's about reducing latency so your user doesn't stare at a streaming cursor for ten seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Comes Next
&lt;/h3&gt;

&lt;p&gt;We will see more hybrids. Mamba-2 is already pushing this. MoE (Mixture of Experts) models are sparsifying the compute. Quantization is getting aggressive.&lt;/p&gt;

&lt;p&gt;But the biggest shift is mental. Engineers need to stop assuming context is free. It isn't.&lt;/p&gt;

&lt;p&gt;Design your systems assuming the context window is small. Force yourself to build retrieval. Force yourself to manage state. Then, when you get a larger window, it's a bonus — not the foundation.&lt;/p&gt;

&lt;p&gt;The transformers we have today are incredible. But they are fuel-inefficient muscle cars. We need sedans. We need hybrids. We need engines that sip tokens instead of guzzling them.&lt;/p&gt;

&lt;p&gt;If you are architecting a platform today, ask yourself: &lt;strong&gt;does this rely on attention over everything?&lt;/strong&gt; If the answer is yes, you have a scalability cliff.&lt;/p&gt;

&lt;p&gt;Find the state. Compress it. Cache it.&lt;/p&gt;

&lt;p&gt;The future of AI engineering isn't bigger models. It's tighter systems.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>architecture</category>
      <category>transformers</category>
    </item>
    <item>
      <title>How I Stopped My AI Agent From Reinventing the Wheel</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sat, 25 Apr 2026 06:41:36 +0000</pubDate>
      <link>https://dev.to/turacthethinker/how-i-stopped-my-ai-agent-from-reinventing-the-wheel-24eo</link>
      <guid>https://dev.to/turacthethinker/how-i-stopped-my-ai-agent-from-reinventing-the-wheel-24eo</guid>
      <description>&lt;p&gt;Yesterday I told my AI agent, Misti: "Scrape e-commerce prices daily."&lt;/p&gt;

&lt;p&gt;The old Misti would have immediately generated a Python script. Selenium. BeautifulSoup. Cron job. Error handling. 40 lines of code. 30 minutes of my time reviewing it.&lt;/p&gt;

&lt;p&gt;The new Misti paused and asked: "Are you sure we need to build this?"&lt;/p&gt;

&lt;p&gt;Then she searched. Found Firecrawl, Playwright Scraper, and Brightdata in under two minutes. Evaluated all three. Presented the trade-offs. Asked which one I preferred.&lt;/p&gt;

&lt;p&gt;Total time: 2 minutes. Total code written: zero.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Agents Are Too Eager to Please
&lt;/h2&gt;

&lt;p&gt;AI agents have a bias toward action. When you say "build," they build. When you say "scrape," they scrape. The default mode is generate — not evaluate.&lt;/p&gt;

&lt;p&gt;This creates a hidden tax:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reinvented wheels:&lt;/strong&gt; How many agents have written their own PDF parsers instead of using pypdf?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance debt:&lt;/strong&gt; Custom scripts need updates when websites change, APIs shift, or requirements evolve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context bloat:&lt;/strong&gt; Every line of generated code consumes tokens. Every debug cycle burns money.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a demo run with 150 tasks across 5 concurrent projects, the NVIDIA token cost was $1.24. Small number — until you realize 40% of those tasks were implementing things that already existed as mature tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix: A "Search Before Build" Habit
&lt;/h2&gt;

&lt;p&gt;I built openclaw-skill-hunter — a skill that forces Misti to stop and search before writing any implementation code.&lt;/p&gt;

&lt;p&gt;The rule is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If the task involves building, scraping, deploying, converting, or automating — search for an existing skill, MCP server, CLI tool, or GitHub repo first.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Only if nothing fits do we build from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Scrape e-commerce prices daily"

Agent (with skill_hunter):
  1. Classify: coding/automation task
  2. Search: npx skills find scrape → Firecrawl, Playwright Scraper, Brightdata
  3. Evaluate: relevance, maintenance, security, stack fit
  4. Present: "Here are 3 options. Which one?"
  5. Execute: Use the chosen tool

Agent (without skill_hunter):
  1. Immediately write a Python script
  2. Debug for 20 minutes
  3. Discover edge cases
  4. Rewrite
  5. Maintain forever
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What We Actually Evaluated
&lt;/h2&gt;

&lt;p&gt;For the scraper task, Misti found three viable options:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Trade-off&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Firecrawl&lt;/td&gt;
&lt;td&gt;Structured data from JS-rendered sites&lt;/td&gt;
&lt;td&gt;Needs API key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Playwright Scraper&lt;/td&gt;
&lt;td&gt;Browser automation, OpenClaw-native&lt;/td&gt;
&lt;td&gt;More setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brightdata&lt;/td&gt;
&lt;td&gt;Enterprise scale, proxy rotation&lt;/td&gt;
&lt;td&gt;Paid tier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;None of them required writing a single line of Python.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters Beyond One Task
&lt;/h2&gt;

&lt;p&gt;This is not about laziness. It is about signal vs. noise.&lt;/p&gt;

&lt;p&gt;Agents that search before build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Learn the ecosystem:&lt;/strong&gt; They discover patterns, not just solve problems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compose instead of create:&lt;/strong&gt; They chain tools instead of rebuilding them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stay maintainable:&lt;/strong&gt; When a tool updates, the agent benefits automatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best agents are not the ones that write the most code. They are the ones that know when not to write code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;If you are running OpenClaw, install the skill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add openclaw-skill-hunter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or read the source:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mturac/skill-hunter" rel="noopener noreferrer"&gt;github.com/mturac/skill-hunter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is not perfect. It is day one. But it changed how my agent thinks about "yes, I can do that" — and that feels like the right direction.&lt;/p&gt;




&lt;h2&gt;
  
  
  What About You?
&lt;/h2&gt;

&lt;p&gt;How do you stop your agent from reinventing wheels? Do you manually curate tool lists? Use MCPs? Or just deal with the mess later?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>openclaw</category>
      <category>claude</category>
    </item>
  </channel>
</rss>
