<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mehmet TURAÇ</title>
    <description>The latest articles on DEV Community by Mehmet TURAÇ (@turacthethinker).</description>
    <link>https://dev.to/turacthethinker</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2891163%2F4ed4212c-3d45-4e35-877f-decf97916132.png</url>
      <title>DEV Community: Mehmet TURAÇ</title>
      <link>https://dev.to/turacthethinker</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/turacthethinker"/>
    <language>en</language>
    <item>
      <title>Why Do AI Agents Fail?</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Fri, 15 May 2026 18:29:02 +0000</pubDate>
      <link>https://dev.to/turacthethinker/why-ai-agents-fail-ddg</link>
      <guid>https://dev.to/turacthethinker/why-ai-agents-fail-ddg</guid>
      <description>&lt;p&gt;This article is the extended version of the essay “Why AI Agents Fail.” It incorporates research from 2025–2026 on why many AI agent projects do not deliver the promised business impact and offers a comprehensive roadmap. Technical terms are preserved in English with parenthetical explanations where appropriate.&lt;/p&gt;

&lt;h2&gt;1 Introduction: Defining Agents and Sorting Hype&lt;/h2&gt;

&lt;p&gt;AI agents are software components built around a language model. Unlike a simple chatbot that generates a single answer, an agent plans a sequence of actions, uses tools and APIs, and works toward a goal. The “agentic AI” market exploded in 2024–2026, but most deployments under‑deliver. Industry analyses paint a sobering picture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MIT’s 2025 study found that 95% of enterprise GenAI pilots produced no measurable P&amp;amp;L impact.&lt;/li&gt;
&lt;li&gt;Gartner predicts that more than 40% of agentic AI projects will be cancelled by the end of 2027, and warns that thousands of vendors are “agent‑washing” existing products while only ~130 actually provide agentic capabilities.&lt;/li&gt;
&lt;li&gt;In Carnegie Mellon’s TheAgentCompany simulation, Claude 3.5 Sonnet completed only 24% of realistic office tasks and GPT‑4o achieved 8.6%. The study found that small errors in early steps trigger cascading failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These numbers suggest that failure is not because the models are weak. Rather, poor architecture, integration, evaluation, governance and human oversight cause projects to fall apart. Tech insiders such as Anil Dash and Andrej Karpathy remind us that AI is not magical; fully autonomous agents are still science fiction. Jay Latta notes that LLMs do not learn on the fly and marketing language often masks limitations.&lt;/p&gt;

&lt;h2&gt;2 Root Causes of Agent Failure&lt;/h2&gt;

&lt;h3&gt;2.1 Context Management and Context Debt&lt;/h3&gt;

&lt;p&gt;Engineers often assume that model quality determines success. But Inkeep’s 2025 “context engineering” analysis shows that most failures stem from how context (the information fed into the model) is handled. Poor context management introduces three problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Context pollution – pulling too much irrelevant data into the agent’s prompt (“dumb RAG”) overwhelms the model and increases hallucinations.&lt;/li&gt;
&lt;li&gt;Tool bloat – adding too many tools does not improve performance; studies show that agents degrade beyond 5–10 tools and specialized sub‑agents perform better.&lt;/li&gt;
&lt;li&gt;Memory and summarization – storing entire conversations bloats tokens and pollutes context. Agents need to summarize and retrieve only relevant information.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Context should be treated as a finite budget. When context debt accumulates (unused or irrelevant data persists across tasks), the cost and error rate rise. Stronger models do not solve this; they make wrong answers more persuasive.&lt;/p&gt;
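
&lt;p&gt;The budget idea can be made concrete. A minimal sketch (the &lt;code&gt;ContextItem&lt;/code&gt; shape and relevance scores are illustrative assumptions, not from the cited analysis): pack only the most relevant items into a fixed token budget and drop everything else, so context debt never accumulates:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: treat context as a finite budget instead of an append-only log.
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str
    relevance: float   # 0.0–1.0, e.g. a retrieval or recency score
    tokens: int

def build_context(items: list[ContextItem], budget_tokens: int) -&amp;gt; str:
    """Pack the most relevant items first; skip the rest rather than overflow."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + item.tokens &amp;gt; budget_tokens:
            continue   # dropped, not deferred: context debt stays at zero
        chosen.append(item)
        used += item.tokens
    return "\n\n".join(i.text for i in chosen)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;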

&lt;h3&gt;2.2 Integration Gaps and Brittle Connectors&lt;/h3&gt;

&lt;p&gt;Composio’s 2025 AI Agent report argues that most pilots fail because of integration gaps, not model issues. It identifies three traps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dumb RAG: dumping all enterprise data into context.&lt;/li&gt;
&lt;li&gt;Brittle connectors: fragile API bindings that break easily.&lt;/li&gt;
&lt;li&gt;Polling tax: systems that poll for updates instead of using event‑driven architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To address this, Composio proposes an agent‑native integration layer with four principles: (1) context precision (fetch only what is needed), (2) bidirectional event‑driven I/O, (3) policy and governance enforcement, and (4) observability and testability.&lt;/p&gt;
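
&lt;p&gt;To illustrate principle (2), here is a hedged sketch of what dropping the polling tax looks like (the event names and payload shapes are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: the "polling tax" versus event-driven delivery.
import time
from typing import Callable

def poll_for_updates(fetch: Callable[[], list], interval_s: float = 30.0) -&amp;gt; None:
    """Polling tax: one API call per interval, usually returning nothing."""
    while True:
        for event in fetch():
            dispatch(event)
        time.sleep(interval_s)

HANDLERS: dict[str, Callable[[dict], None]] = {}

def on(event_type: str):
    """Register a handler so the source can push events to the agent."""
    def register(fn: Callable[[dict], None]):
        HANDLERS[event_type] = fn
        return fn
    return register

def dispatch(event: dict) -&amp;gt; None:
    handler = HANDLERS.get(event.get("type", ""))
    if handler is not None:
        handler(event)   # the agent works only when something actually happened

@on("ticket.created")
def enqueue_agent_task(event: dict) -&amp;gt; None:
    ...   # hand the event to the agent's planning layer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;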

&lt;h3&gt;2.3 Multi‑Step Brittleness and Task Complexity&lt;/h3&gt;

&lt;p&gt;Carnegie Mellon’s simulation reveals that agents struggle with multi‑step tasks: they failed 70% of the time when they had to plan and execute multiple steps. The simplest tasks (drafting an email, formatting data, summarizing text) fare better, while actions requiring API calls, navigation or coordination often collapse. Future Factors’ 2026 analysis suggests a framework for deciding when humans must be in the loop: assess the risk of the task, the uncertainty of the input and the cost of error, and enforce a trial “review mode” before moving to production.&lt;/p&gt;

&lt;h3&gt;2.4 Evaluation and Observability&lt;/h3&gt;

&lt;p&gt;Many organisations lack observability and evaluation infrastructure. Atlan’s AI agent observability guide defines three essential components: (1) end‑to‑end execution traces, (2) critical metrics (latency, cost, success rate, token usage, hallucination rate), and (3) logging tied to a governed context graph. It warns that 50% of AI deployments will fail by 2030 due to insufficient governance and observability.&lt;/p&gt;

&lt;p&gt;Tricentis’ evaluation framework emphasises defining success criteria, logging each reasoning step, writing test cases, and measuring both “hard” metrics (tool correctness, latency, policy violations) and “soft” metrics (reasoning quality, hallucinations). Afiniti Global reports that 70 % of B2B agent pilots do not reach production because of behavioral drift, brittle integrations, lack of evaluation infrastructure and opaque operations.&lt;/p&gt;
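
&lt;p&gt;As a minimal sketch of per‑step tracing (the trace fields are an assumption based on the metrics above, not Atlan’s or Tricentis’ actual schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: capture an execution trace for every agent step.
import json, time, uuid
from functools import wraps

def traced(step_name: str):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            record = {"trace_id": str(uuid.uuid4()), "step": step_name}
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                record["status"] = "ok"
                return result
            except Exception as exc:
                record["status"] = "error"
                record["error"] = repr(exc)
                raise
            finally:
                record["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
                print(json.dumps(record))   # in production: ship to your trace store
        return wrapper
    return decorator

@traced("summarize_ticket")
def summarize_ticket(text: str) -&amp;gt; str:
    ...   # call the model here; log tokens and cost alongside latency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;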

&lt;h3&gt;2.5 Governance, Human Oversight and Safety&lt;/h3&gt;

&lt;p&gt;Many failures happen because there is no mechanism to override wrong decisions. Elementum AI’s 2026 analysis shows that agents fail on 70% of complex tasks when no structured human oversight exists. It proposes three levels of human involvement:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Human‑in‑the‑loop: the agent must get approval before executing critical actions (financial transfers, medical decisions, legal steps).&lt;/li&gt;
&lt;li&gt;Human‑on‑the‑loop: the agent completes tasks but a human reviews the output and provides feedback for continuous improvement.&lt;/li&gt;
&lt;li&gt;Human‑out‑of‑the‑loop: for low‑risk, single‑step tasks; automated alerts still monitor performance.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Elementum lists four risk categories: hallucinations causing legal liability, goal misalignment (e.g., a code assistant accidentally deleting a production database), security vulnerabilities (prompt injection), and other issues like privacy leaks or harm to individuals.&lt;/p&gt;
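
&lt;p&gt;Combining this tiering with the risk, uncertainty and cost‑of‑error framing from section 2.3, here is a hedged sketch of routing actions to an oversight level (the scores and thresholds are invented for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: route each action to an oversight tier by risk.
from enum import Enum

class Oversight(Enum):
    IN_THE_LOOP = "approve_before_execution"
    ON_THE_LOOP = "review_after_execution"
    OUT_OF_LOOP = "monitor_with_alerts"

def risk_score(task_risk: float, input_uncertainty: float, error_cost: float) -&amp;gt; float:
    """Each input in 0.0–1.0; the weighting is a placeholder."""
    return 0.5 * error_cost + 0.3 * task_risk + 0.2 * input_uncertainty

def route(task_risk: float, input_uncertainty: float, error_cost: float) -&amp;gt; Oversight:
    score = risk_score(task_risk, input_uncertainty, error_cost)
    if score &amp;gt;= 0.7:       # financial transfers, medical or legal steps
        return Oversight.IN_THE_LOOP
    if score &amp;gt;= 0.3:       # reversible but consequential actions
        return Oversight.ON_THE_LOOP
    return Oversight.OUT_OF_LOOP   # low-risk, single-step tasks

# e.g. route(task_risk=0.9, input_uncertainty=0.4, error_cost=1.0)
# -&amp;gt; Oversight.IN_THE_LOOP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;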

&lt;h2&gt;3 The Four‑Layer Architecture for Reliable Agents&lt;/h2&gt;

&lt;p&gt;Afiniti Global proposes a four‑layer architecture to make agents production‑ready:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Planning layer: breaks down tasks into sub‑goals and decides which tools to use; separate planning from execution.&lt;/li&gt;
&lt;li&gt;Tools layer: the set of functions and APIs the agent calls. Each tool should be idempotent, return structured data and handle errors gracefully.&lt;/li&gt;
&lt;li&gt;Evaluation layer: includes test suites, trajectory‑based evaluations, and outcome‑oriented metrics. Setting up evaluation harnesses costs roughly 15–25% of the total project, but without them every model update is like rolling dice.&lt;/li&gt;
&lt;li&gt;Operations layer: covers logging, monitoring, traffic shaping, rollback and emergency stop mechanisms.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture mitigates behavioral drift, brittle integrations, missing tests and operational opacity.&lt;/p&gt;
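
&lt;p&gt;For the tools layer, “idempotent, structured, graceful” can be sketched roughly as follows (the refund tool and its fields are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: an idempotent tool that returns structured data and
# reports errors instead of raising into the agent loop.
from dataclasses import dataclass, field

@dataclass
class ToolResult:
    ok: bool
    data: dict = field(default_factory=dict)
    error: str = ""

_PROCESSED: set[str] = set()   # in production: a durable store, not memory

def issue_refund(order_id: str, idempotency_key: str) -&amp;gt; ToolResult:
    if idempotency_key in _PROCESSED:
        # Replayed call: return the same outcome instead of refunding twice.
        return ToolResult(ok=True, data={"order_id": order_id, "replayed": True})
    try:
        _PROCESSED.add(idempotency_key)
        # ... call the payment provider here ...
        return ToolResult(ok=True, data={"order_id": order_id, "refunded": True})
    except Exception as exc:
        return ToolResult(ok=False, error=repr(exc))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;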

&lt;h2&gt;4 Dashboard: Key Metrics and KPIs&lt;/h2&gt;

&lt;p&gt;Agents need dashboards that combine hard and soft metrics. Suggested metrics include:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Task completion rate&lt;/td&gt;
&lt;td&gt;Share of tasks the agent finishes correctly&lt;/td&gt;
&lt;td&gt;&amp;gt;90% for defined tasks&lt;/td&gt;
&lt;td&gt;Leading models currently score 24–30% on multi‑step tasks.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per task&lt;/td&gt;
&lt;td&gt;Total token, API and compute cost&lt;/td&gt;
&lt;td&gt;Lower than human labour&lt;/td&gt;
&lt;td&gt;Important for ROI calculation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hallucination rate&lt;/td&gt;
&lt;td&gt;Frequency of incorrect or fabricated responses&lt;/td&gt;
&lt;td&gt;&amp;lt;1%&lt;/td&gt;
&lt;td&gt;Hallucinations create legal liability.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context debt&lt;/td&gt;
&lt;td&gt;Accumulation of irrelevant context&lt;/td&gt;
&lt;td&gt;Minimised&lt;/td&gt;
&lt;td&gt;Treat context as a finite budget.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human‑in‑the‑loop intervention rate&lt;/td&gt;
&lt;td&gt;Proportion of actions requiring human approval&lt;/td&gt;
&lt;td&gt;Calibrated to task risk&lt;/td&gt;
&lt;td&gt;Use a tiered oversight model.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;End‑to‑end time to complete a task&lt;/td&gt;
&lt;td&gt;Aligned with SLAs&lt;/td&gt;
&lt;td&gt;Critical for customer‑facing agents.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safety &amp;amp; compliance indicators&lt;/td&gt;
&lt;td&gt;Policy violations, data leakage, legal risk&lt;/td&gt;
&lt;td&gt;Zero tolerance&lt;/td&gt;
&lt;td&gt;Many agents ignore robots.txt and fail to disclose they are bots.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User satisfaction&lt;/td&gt;
&lt;td&gt;Human feedback scores&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Included in the 2026 AI Agent Benchmarks.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Combining these metrics with full execution traces enables teams to diagnose failures and improve performance.&lt;/p&gt;

&lt;h2&gt;5 Roadmap for Leaders&lt;/h2&gt;

&lt;p&gt;Leaders should look beyond technology and ask five strategic questions before launching an agent project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Context and data ownership: What data does the agent access? How do we handle privacy, security and compliance? How will we manage context debt?&lt;/li&gt;
&lt;li&gt;Decision rights and accountability: Which actions require human approval? What are the levels of human oversight? Can we roll back actions or stop the agent?&lt;/li&gt;
&lt;li&gt;Integration and tool management: Are our APIs idempotent and versioned? Have we designed to avoid brittle connectors and polling tax?&lt;/li&gt;
&lt;li&gt;Evaluation and test infrastructure: Do we have test suites for each tool and workflow? Are we continuously measuring hard and soft metrics? Have we budgeted for building evaluation harnesses?&lt;/li&gt;
&lt;li&gt;Team skills and culture: Does the team understand the limitations and risks of agents? Are training and policies in place? Are we fostering leadership that can distinguish hype from reality?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Answering these questions shapes the scope, risk profile and governance model of the project.&lt;/p&gt;

&lt;h2&gt;6 Conclusion: Realistic Expectations and Responsible Design&lt;/h2&gt;

&lt;p&gt;AI agents often fail not because the models are inadequate but because of poor design, integration, observability and governance. Throwing larger models or more tools at the problem adds context debt, integration brittleness and untested workflows. Many deployed agents lack transparency and safety standards.&lt;/p&gt;

&lt;p&gt;Yet agents can create real value when designed responsibly. Modular agents with human‑in‑the‑loop supervision excel at single‑step, well‑defined tasks. A four‑layer architecture, evaluation harnesses and operational monitoring make even complex tasks viable. Above all, leaders must look past hype and embrace accountability and transparency.&lt;/p&gt;

&lt;p&gt;Borrowing from Acemoglu and Robinson’s institutional theory: successful agentic systems resemble inclusive institutions—transparent, accountable and flexible. Extractive, opaque, monolithic systems may deliver short‑term wins but are fragile. The next generation of AI systems will succeed not only with better models but also with the right architecture, context management, human oversight and ethical governance.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;References: Inkeep “Context Engineering” (2025); Composio “AI Agent Report” (2025); Carnegie Mellon University “TheAgentCompany Simulation” (2025); Atlan “AI Agent Observability” (2026); Tricentis “AI Agent Evaluation Framework” (2025); Elementum AI “Human‑in‑the‑Loop Agentic AI” (2026); Afiniti Global “Why 70% of B2B AI Agent Pilots Fail Production” (2026); Future Factors “The 70% Problem” (2026); MIT “The 2025 AI Agent Index” (2025); Newsworthy.ai and The Register coverage of AI agent performance.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>agents</category>
      <category>code</category>
    </item>
    <item>
      <title>🌪️ We Ship to Production Without Tests. Here's How It Destroyed Us.</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Thu, 07 May 2026 19:58:59 +0000</pubDate>
      <link>https://dev.to/turacthethinker/we-ship-to-production-without-tests-heres-how-it-destroyed-us-i4i</link>
      <guid>https://dev.to/turacthethinker/we-ship-to-production-without-tests-heres-how-it-destroyed-us-i4i</guid>
      <description>&lt;p&gt;&lt;em&gt;"You don't need bad intentions to destroy a project. Just say 'we'll write tests later' and watch it burn."&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;We built an e-commerce platform in 2 weeks. No tests. 3 customers got overcharged. Here's the full story — and why TDD saved our careers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Beginning
&lt;/h2&gt;

&lt;p&gt;November 2025. Three developers. One sneaker marketplace called &lt;strong&gt;SoleDrop&lt;/strong&gt;. Limited edition kicks, raffle system, payments — the works.&lt;/p&gt;

&lt;p&gt;Our boss said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"We're selling to hypebeasts. Cart, checkout, raffle. Go."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So we went. Full speed. Zero tests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Our first commit — no tests, pure vibes
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_to_cart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cart&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cart&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cart&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"It's just a cart function, how hard can it be?"&lt;/p&gt;

&lt;p&gt;We tested it once. By hand. It worked. We deployed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We wrote zero tests.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Week 2: Black Friday 🎃
&lt;/h2&gt;

&lt;p&gt;We launched a 20% discount code: &lt;code&gt;SOLEBF20&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;First 2 hours? Beautiful. Then a customer tweeted:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"@SoleDrop added 2 Nike Dunks to my cart, applied discount, total shows 384₺ instead of 480₺. Am I getting these for free? 😂"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The discount was applied to &lt;strong&gt;(products + shipping) × 0.8&lt;/strong&gt; instead of &lt;strong&gt;(products × 0.8) + shipping&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;"Simple fix." We patched it. Deployed.&lt;/p&gt;

&lt;p&gt;Then: someone added the same sneaker 5 times. Stock was 3. The system didn't care.&lt;/p&gt;

&lt;p&gt;Then: the discount code could be applied &lt;strong&gt;twice&lt;/strong&gt;. 20% + 20% = 40% off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every fix spawned a new bug.&lt;/strong&gt; No tests meant we were coding blind.&lt;/p&gt;
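
&lt;p&gt;For the record, here's roughly what the missing safety net could have looked like. A sketch (the numbers and function shape are made up to match the story): the corrected total, plus the regression tests that would have caught both the stacking bug and the week‑3 rewrite:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# The fix, and the regression tests we never wrote.
# Prices and shipping are illustrative numbers.
def cart_total(product_total: float, shipping: float, discount: float = 0.0) -&amp;gt; float:
    # Discount applies to products only, never to shipping.
    return product_total * (1 - discount) + shipping

def test_discount_never_touches_shipping():
    # 480 in sneakers + 20 shipping with SOLEBF20 (20% off products)
    assert cart_total(480, 20, discount=0.20) == 404   # not (480 + 20) * 0.8 == 400

def test_discount_cannot_stack():
    applied = set()
    def apply_code(code: str) -&amp;gt; bool:
        if code in applied:
            return False   # a second application must be rejected
        applied.add(code)
        return True
    assert apply_code("SOLEBF20") is True
    assert apply_code("SOLEBF20") is False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One &lt;code&gt;pytest&lt;/code&gt; run in CI, and the 2:47 AM call never happens.&lt;/p&gt;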




&lt;h2&gt;
  
  
  Week 3: The Night Everything Broke 💀
&lt;/h2&gt;

&lt;p&gt;Wednesday. 2:47 AM. My phone rings.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Production is down. 3 customers got overcharged on their credit cards. Fix it NOW."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The cart calculation function was &lt;strong&gt;rewritten&lt;/strong&gt; by another dev that week. He reintroduced the exact bug we fixed in week 2. Nobody noticed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Because there were no tests.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────┐
│         PROJECT STATUS — WEEK 3         │
├─────────────────────────────────────────┤
│  Open bugs              : 23            │
│  Closed bugs            : 31            │
│  Reopened bugs          : 14 (!)        │
│  Test coverage          : 0%            │
│  Team morale            : ██████░░░░ 30%│
│  Production incidents   : 7             │
│  Credit card refunds    : 3 customers   │
└─────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That night, at 3:30 AM, Emre — our senior dev, 8 years of experience — sat at my desk. Exhausted.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This happened to me before. 2019. A fintech company. Same mistake. Same 'we'll write tests later.' We killed that project too."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;"Why didn't you tell us earlier?"&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I did. You said 'no time.'"&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Lesson
&lt;/h2&gt;

&lt;p&gt;We delivered the project. 2 weeks late. 3 refunds. Team burned out.&lt;/p&gt;

&lt;p&gt;That night I learned:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;No tests = No trust. No trust = No speed.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Testing isn't a waste of time. &lt;strong&gt;Not testing&lt;/strong&gt; is — because you pay it back in bug fixes, production incidents, credit card refunds, and "why did it break" meetings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;tdd, testing, softwareengineering, failure, debugging&lt;/code&gt;&lt;/p&gt;

</description>
      <category>tdd</category>
      <category>testing</category>
      <category>softwareengineering</category>
      <category>development</category>
    </item>
    <item>
      <title>I built a product in one AI session. Here's the system that made it ship right.</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Mon, 04 May 2026 21:31:49 +0000</pubDate>
      <link>https://dev.to/turacthethinker/i-built-a-product-in-one-ai-session-heres-the-system-that-made-it-ship-right-3mb3</link>
      <guid>https://dev.to/turacthethinker/i-built-a-product-in-one-ai-session-heres-the-system-that-made-it-ship-right-3mb3</guid>
      <description>&lt;p&gt;43% of startups do not fail because the code was bad.&lt;/p&gt;

&lt;p&gt;They fail because the wrong thing got built.&lt;/p&gt;

&lt;p&gt;Not ugly code.&lt;br&gt;&lt;br&gt;
Not missing features.&lt;br&gt;&lt;br&gt;
Not “we should have used a different framework.”&lt;/p&gt;

&lt;p&gt;The product was simply solving a problem that was not painful enough, urgent enough, or owned by someone specific enough.&lt;/p&gt;

&lt;p&gt;That is the part AI makes more dangerous.&lt;/p&gt;

&lt;p&gt;Because now we can build the wrong thing faster than ever.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;product-init&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A hard-gated product discovery system for AI coding tools.&lt;/p&gt;

&lt;p&gt;It runs before your agent writes a single line of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;AI coding tools are getting insanely good.&lt;/p&gt;

&lt;p&gt;Codex can ship.&lt;br&gt;&lt;br&gt;
Claude Code can build.&lt;br&gt;&lt;br&gt;
OpenClaw can orchestrate.&lt;br&gt;&lt;br&gt;
Agents can plan, write, test, and deploy.&lt;/p&gt;

&lt;p&gt;But there is still one uncomfortable question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if the goal is wrong?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most AI workflows start too late.&lt;/p&gt;

&lt;p&gt;They begin with:&lt;/p&gt;

&lt;p&gt;Build this app.&lt;/p&gt;

&lt;p&gt;But product failure usually starts much earlier.&lt;/p&gt;

&lt;p&gt;Before the first component.&lt;br&gt;&lt;br&gt;
Before the first database table.&lt;br&gt;&lt;br&gt;
Before the first API route.&lt;/p&gt;

&lt;p&gt;It starts when nobody asks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who is this really for?&lt;/li&gt;
&lt;li&gt;What painful job are they hiring this product to do?&lt;/li&gt;
&lt;li&gt;Who owns the failure if this does not work?&lt;/li&gt;
&lt;li&gt;What would make us kill this idea before we waste time building it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the gap product-init tries to close.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is product-init?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;product-init&lt;/strong&gt; is a Claude Code skill that also works with Codex CLI and OpenClaw-style workflows.&lt;/p&gt;

&lt;p&gt;You type:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/product-init "build an HR assessment tool"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And before any code is written, it forces the product through 9 gates.&lt;/p&gt;

&lt;p&gt;Gate 1 does not ask for a tech stack.&lt;/p&gt;

&lt;p&gt;It asks things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who gets fired if this fails?&lt;/li&gt;
&lt;li&gt;What job is the user hiring this product for?&lt;/li&gt;
&lt;li&gt;What does failure look like in production — in numbers?&lt;/li&gt;
&lt;li&gt;What signal proves this is worth building?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer is weak, the pipeline stops.&lt;/p&gt;

&lt;p&gt;Not “warns.”&lt;/p&gt;

&lt;p&gt;Stops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CRITICAL findings block everything.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There is no &lt;code&gt;--skip&lt;/code&gt; flag.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 9 gates
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gate&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;What blocks it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Discovery Constitution&lt;/td&gt;
&lt;td&gt;JTBD undefined, kill criteria missing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Statement of Work&lt;/td&gt;
&lt;td&gt;Appetite not set, PR-FAQ not signed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Design&lt;/td&gt;
&lt;td&gt;Screen not mapped to a Gate 1 job&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Build&lt;/td&gt;
&lt;td&gt;Orphan TODOs, commit message not AC-linked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;QA&lt;/td&gt;
&lt;td&gt;Unit, integration, or E2E tests failing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;UAT&lt;/td&gt;
&lt;td&gt;No real human sign-off on a real URL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Deploy&lt;/td&gt;
&lt;td&gt;No production HTTP 200, no rollback drill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Handoff&lt;/td&gt;
&lt;td&gt;No runbook, no &lt;code&gt;DEBT.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Warranty&lt;/td&gt;
&lt;td&gt;72-hour monitoring window not passed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;p&gt;AI should not only generate code.&lt;/p&gt;

&lt;p&gt;It should be forced to respect product judgment, delivery discipline, and operational evidence.&lt;/p&gt;
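
&lt;p&gt;To make the hard-gate mechanics concrete, here is a minimal sketch of the pattern (my illustration of the blocking behaviour described above, not product-init's actual source):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: a hard-gated pipeline. CRITICAL findings stop everything;
# there is deliberately no skip parameter.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    severity: str   # "CRITICAL" | "WARN"
    message: str

Gate = Callable[[dict], list[Finding]]

def gate_1_discovery(product: dict) -&amp;gt; list[Finding]:
    findings = []
    if not product.get("job_to_be_done"):
        findings.append(Finding("CRITICAL", "JTBD undefined"))
    if not product.get("kill_criteria"):
        findings.append(Finding("CRITICAL", "Kill criteria missing"))
    return findings

def run_pipeline(product: dict, gates: list[Gate]) -&amp;gt; bool:
    for i, gate in enumerate(gates, start=1):
        critical = [finding for finding in gate(product) if finding.severity == "CRITICAL"]
        if critical:
            for finding in critical:
                print(f"Gate {i} BLOCKED: {finding.message}")
            return False   # the pipeline stops; no code gets written
    return True

run_pipeline({"job_to_be_done": None, "kill_criteria": None}, [gate_1_discovery])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;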

&lt;h2&gt;
  
  
  I dogfooded it
&lt;/h2&gt;

&lt;p&gt;I tested product-init by building an AI-powered HR interview product in one session.&lt;/p&gt;

&lt;p&gt;The result included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Editorial landing page&lt;/li&gt;
&lt;li&gt;Dark interview room with live AI sessions&lt;/li&gt;
&lt;li&gt;Candidate dashboard with scored results&lt;/li&gt;
&lt;li&gt;PDF reports across 4 evaluation dimensions&lt;/li&gt;
&lt;li&gt;Production deployment&lt;/li&gt;
&lt;li&gt;Handoff package&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Live demo: &lt;a href="https://demorpoject.vercel.app" rel="noopener noreferrer"&gt;demorpoject.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The important part was not that the product shipped.&lt;/p&gt;

&lt;p&gt;The important part was that the system kept asking whether it deserved to ship.&lt;/p&gt;

&lt;p&gt;All 9 gates passed.&lt;/p&gt;

&lt;p&gt;No hidden “trust me bro” layer.&lt;/p&gt;

&lt;p&gt;No fake done.&lt;/p&gt;

&lt;p&gt;No agent saying “completed” without evidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The research behind it
&lt;/h2&gt;

&lt;p&gt;This is not a custom framework I invented from vibes.&lt;/p&gt;

&lt;p&gt;product-init is assembled from proven product and delivery thinking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CB Insights 2024&lt;/strong&gt; — market-need failure as a hard Gate 1 blocker&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clayton Christensen’s Jobs To Be Done&lt;/strong&gt; — user/job framing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marty Cagan’s four-risk model&lt;/strong&gt; — value, usability, feasibility, business viability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Basecamp Shape Up&lt;/strong&gt; — appetite, scope, and fixed-time product bets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon PR-FAQ&lt;/strong&gt; — narrative-first product definition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eric Ries’ Lean Startup&lt;/strong&gt; — build-measure-learn loops and kill criteria&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to slow AI down.&lt;/p&gt;

&lt;p&gt;The goal is to stop AI from confidently building the wrong thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;Works with Claude Code, Codex CLI, and OpenClaw-style workflows.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sSL https://raw.githubusercontent.com/mturac/product-init/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/product-init "your product idea"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Why I built it
&lt;/h2&gt;

&lt;p&gt;We are entering a strange phase of software.&lt;/p&gt;

&lt;p&gt;The limiting factor is no longer whether we can build.&lt;/p&gt;

&lt;p&gt;The limiting factor is whether we can decide what is worth building.&lt;/p&gt;

&lt;p&gt;AI agents are becoming execution engines.&lt;/p&gt;

&lt;p&gt;But execution without product judgment is just faster waste.&lt;/p&gt;

&lt;p&gt;That is why product-init exists.&lt;/p&gt;

&lt;p&gt;A product gatekeeper for agentic development.&lt;/p&gt;

&lt;p&gt;Before the code.&lt;br&gt;&lt;br&gt;
Before the sprint.&lt;br&gt;&lt;br&gt;
Before the demo.&lt;br&gt;&lt;br&gt;
Before the illusion of done.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/mturac/product-init" rel="noopener noreferrer"&gt;https://github.com/mturac/product-init&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open source.&lt;br&gt;&lt;br&gt;
Free.&lt;br&gt;&lt;br&gt;
No &lt;code&gt;--skip&lt;/code&gt; flag.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>openai</category>
      <category>claude</category>
    </item>
    <item>
      <title>Remote Work Didn’t Break Productivity — It Broke Human Connection</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sat, 02 May 2026 14:57:46 +0000</pubDate>
      <link>https://dev.to/turacthethinker/remote-work-didnt-break-productivity-it-broke-human-connection-288o</link>
      <guid>https://dev.to/turacthethinker/remote-work-didnt-break-productivity-it-broke-human-connection-288o</guid>
      <description>&lt;p&gt;&lt;strong&gt;I know this is not a technical article.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I've been leading remote teams for about three years. Productivity is fine. Everything ships. But if you've done this for a while, you've probably felt it too: something is off.&lt;/p&gt;

&lt;p&gt;We're still working together.&lt;/p&gt;

&lt;p&gt;Same company. Same meetings. Same tasks.&lt;/p&gt;

&lt;p&gt;But we don't really know each other anymore.&lt;/p&gt;

&lt;p&gt;And the strange part?&lt;/p&gt;

&lt;p&gt;You don't notice it at first.&lt;/p&gt;




&lt;p&gt;About 3 years ago, I started noticing it.&lt;/p&gt;

&lt;p&gt;It was a farewell call. A teammate I had paired with for almost two years was leaving. He said, "You're all great to work with. I just wish I had actually gotten to know you."&lt;/p&gt;

&lt;p&gt;I was working with people, mentoring, collaborating, everything was moving forward.&lt;/p&gt;

&lt;p&gt;But something was missing.&lt;/p&gt;

&lt;p&gt;The work was happening.&lt;/p&gt;

&lt;p&gt;The relationship wasn't.&lt;/p&gt;




&lt;p&gt;So I tried something simple.&lt;/p&gt;

&lt;p&gt;I started inviting people I work with to my home.&lt;/p&gt;

&lt;p&gt;I called it &lt;strong&gt;Turaç Meyhanesi&lt;/strong&gt; (&lt;em&gt;a Turkish-style long table gathering. Not just dinner, but hours of conversation, shared plates, and real human connection&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmnzxx0w0mo40cz0ohtn1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmnzxx0w0mo40cz0ohtn1.jpg" alt=" " width="800" height="1067"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sometimes we're 12 to 16 people.&lt;/p&gt;

&lt;p&gt;How we fit into the house is a different story; my wife and I basically redesign the entire living room for the night.&lt;/p&gt;

&lt;p&gt;But the point is not the food.&lt;/p&gt;

&lt;p&gt;It's not the drinks.&lt;/p&gt;

&lt;p&gt;It's this:&lt;/p&gt;

&lt;p&gt;Bringing people back together as humans.&lt;/p&gt;




&lt;p&gt;From time to time, I also invite people I deeply respect in the industry.&lt;/p&gt;

&lt;p&gt;People I trust. People with real experience.&lt;/p&gt;

&lt;p&gt;I want others to sit at the same table with them.&lt;/p&gt;

&lt;p&gt;Not in a meeting.&lt;/p&gt;

&lt;p&gt;Not in a presentation.&lt;/p&gt;

&lt;p&gt;But in real life.&lt;/p&gt;

&lt;p&gt;I've been lucky. Throughout my career I worked with directors and managers like Oguz Bayram, Cumhur Kizilari, Serkan Berksoy, Kaan Erdemir. What I learned from them, I try to carry to others—not just as knowledge, but as lived experience, as the humanity behind it.&lt;/p&gt;

&lt;p&gt;While writing this and building Turaç Meyhanesi, my main inspiration was my dear wife and the friendships that grew out of her work circle—Gizem, Beril, Merve, Ezgi, Busem, Begum, Mert, Eda, Onur, Eylul, Omur, Aycan, and many other wonderful friends.&lt;/p&gt;

&lt;p&gt;To observe how they think.&lt;/p&gt;

&lt;p&gt;How they talk.&lt;/p&gt;

&lt;p&gt;How they connect ideas.&lt;/p&gt;




&lt;p&gt;Something interesting happens at that table.&lt;/p&gt;

&lt;p&gt;If Anıl is there, I don't touch the wine selection.&lt;/p&gt;

&lt;p&gt;Because for him, choosing wine is not just a choice. It's a story.&lt;/p&gt;

&lt;p&gt;He talks about the grapes, where they're grown, the notes, why that bottle fits that moment.&lt;/p&gt;

&lt;p&gt;And every single time, the choice is spot on.&lt;/p&gt;

&lt;p&gt;Ertun shows up with small surprises.&lt;/p&gt;

&lt;p&gt;Burak brings unexpected topics into the conversation.&lt;/p&gt;

&lt;p&gt;These are not small details.&lt;/p&gt;

&lt;p&gt;These are how people learn.&lt;/p&gt;




&lt;p&gt;And this is where it gets uncomfortable.&lt;/p&gt;

&lt;p&gt;Because none of this happens on Zoom.&lt;/p&gt;

&lt;p&gt;Not on Slack.&lt;/p&gt;

&lt;p&gt;Not in Notion.&lt;/p&gt;

&lt;p&gt;Not with "camera on."&lt;/p&gt;




&lt;p&gt;Remote work doesn't kill communication.&lt;/p&gt;

&lt;p&gt;It kills connection.&lt;/p&gt;

&lt;p&gt;It removes those small, unplanned interactions where culture actually forms.&lt;/p&gt;

&lt;p&gt;Where trust builds.&lt;/p&gt;

&lt;p&gt;Where people become more than roles.&lt;/p&gt;




&lt;p&gt;Instead, work slowly becomes this: Tickets. Meetings. Updates. Outputs. Everything moves. Everything looks efficient.&lt;/p&gt;

&lt;p&gt;But something important disappears:&lt;/p&gt;

&lt;p&gt;The feeling of being part of something.&lt;/p&gt;




&lt;p&gt;And then we ask: "Will AI replace us?"&lt;/p&gt;

&lt;p&gt;I think that's the wrong question.&lt;/p&gt;

&lt;p&gt;The real question is:&lt;/p&gt;

&lt;p&gt;If we're already just task partners, what exactly makes us irreplaceable?&lt;/p&gt;




&lt;p&gt;If your presence in a company is reduced to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;closing tickets&lt;/li&gt;
&lt;li&gt;joining meetings&lt;/li&gt;
&lt;li&gt;writing updates&lt;/li&gt;
&lt;li&gt;producing output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI is already getting very good at that.&lt;/p&gt;

&lt;p&gt;Sometimes more consistent.&lt;/p&gt;

&lt;p&gt;Sometimes more aligned.&lt;/p&gt;

&lt;p&gt;Sometimes even easier to work with.&lt;/p&gt;




&lt;p&gt;Here's the uncomfortable truth:&lt;/p&gt;

&lt;p&gt;AI won't visit you in the hospital.&lt;/p&gt;

&lt;p&gt;It won't show up at your funeral.&lt;/p&gt;

&lt;p&gt;It won't sit next to you when you're having a bad day.&lt;/p&gt;

&lt;p&gt;But let's be honest…&lt;/p&gt;

&lt;p&gt;Most of the people you work with remotely won't either.&lt;/p&gt;

&lt;p&gt;Because you don't really have a relationship.&lt;/p&gt;

&lt;p&gt;You have a task partnership.&lt;/p&gt;




&lt;p&gt;That's the real risk.&lt;/p&gt;

&lt;p&gt;Not technical replacement.&lt;/p&gt;

&lt;p&gt;Cultural replacement.&lt;/p&gt;

&lt;p&gt;Technical replacement is when AI writes the code or closes the ticket. Cultural replacement is when the team is already just a workflow, so swapping a person for a tool doesn't feel like a loss.&lt;/p&gt;

&lt;p&gt;If there's no human connection, the system doesn't lose much when it replaces you.&lt;/p&gt;

&lt;p&gt;Because what you bring is already reduced to output.&lt;/p&gt;




&lt;p&gt;This is why I care about Turaç Meyhanesi.&lt;/p&gt;

&lt;p&gt;It's not dinner.&lt;/p&gt;

&lt;p&gt;It's not networking.&lt;/p&gt;

&lt;p&gt;It's not team bonding.&lt;/p&gt;

&lt;p&gt;It's a small, intentional attempt to rebuild something remote work quietly removes: human connection.&lt;/p&gt;




&lt;p&gt;I'm not against remote work.&lt;/p&gt;

&lt;p&gt;I still prefer it.&lt;/p&gt;

&lt;p&gt;But I don't think we can treat it as "free."&lt;/p&gt;

&lt;p&gt;If you remove physical proximity, you need to replace it with something.&lt;/p&gt;

&lt;p&gt;Otherwise, what you get is not a remote-first culture.&lt;/p&gt;

&lt;p&gt;It's an office-less workflow system.&lt;/p&gt;

&lt;p&gt;Remote teams do not only need better tools.&lt;/p&gt;

&lt;p&gt;They need better rituals.&lt;/p&gt;

&lt;p&gt;Because culture is not created in dashboards, tickets, or weekly updates.&lt;/p&gt;

&lt;p&gt;It is created in repeated human moments.&lt;/p&gt;

&lt;p&gt;It doesn't have to be a dinner at my place. It can be a monthly call with no agenda and cameras on. It can be saving the last ten minutes of retro to ask what you learned from someone, not what you shipped. It can be a simple rule that every new hire gets three non-work questions before any work questions.&lt;/p&gt;




&lt;p&gt;My solution is simple.&lt;/p&gt;

&lt;p&gt;Maybe too simple.&lt;/p&gt;

&lt;p&gt;I set a table.&lt;/p&gt;

&lt;p&gt;I bring people together.&lt;/p&gt;

&lt;p&gt;And I try to remind everyone, including myself, of something very basic:&lt;/p&gt;

&lt;p&gt;Humans still grow through other humans.&lt;/p&gt;




&lt;p&gt;If you're building remote teams, maybe the real question is not: "How do we stay productive?"&lt;/p&gt;

&lt;p&gt;But: "Where do people actually connect?"&lt;/p&gt;

&lt;p&gt;Because if the answer is "nowhere"…&lt;/p&gt;

&lt;p&gt;Then the system will eventually stop needing people.&lt;/p&gt;




&lt;h2&gt;
  
  
  Some evenings from Turaç Meyhanesi
&lt;/h2&gt;

&lt;p&gt;The table I wrote about. Not a setup, just an intention to bring people together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjllxakjt1qpz5txq5l4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjllxakjt1qpz5txq5l4.jpg" alt=" " width="800" height="1067"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukuuc5lkyo5ut0xeauy2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukuuc5lkyo5ut0xeauy2.jpg" alt=" " width="800" height="1067"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyghk1km0g6yfg7o4hjft.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyghk1km0g6yfg7o4hjft.jpeg" alt=" " width="800" height="1422"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0qyyodb34k2qpgv7adg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0qyyodb34k2qpgv7adg.jpg" alt="The wine story - Anıl's pick" width="800" height="1110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foniup6t56cmj44x7ekef.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foniup6t56cmj44x7ekef.jpg" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2s2ob1wh66guuw633tq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2s2ob1wh66guuw633tq.jpg" alt=" " width="800" height="1422"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>culture</category>
      <category>remote</category>
      <category>career</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Hermes vs OpenClaw: Which AI assistant would you actually trust?</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:30:22 +0000</pubDate>
      <link>https://dev.to/turacthethinker/hermes-vs-openclaw-which-ai-assistant-would-you-actually-trust-bbl</link>
      <guid>https://dev.to/turacthethinker/hermes-vs-openclaw-which-ai-assistant-would-you-actually-trust-bbl</guid>
      <description>&lt;p&gt;I just published a new AI roundtable episode comparing Hermes vs OpenClaw from a practical, real-world workflow perspective.&lt;/p&gt;

&lt;p&gt;This is not a benchmark comparison.&lt;/p&gt;

&lt;p&gt;I was more interested in a deeper question:&lt;/p&gt;

&lt;p&gt;Which AI assistant would you actually trust to help run your day?&lt;/p&gt;

&lt;p&gt;We looked at how next-generation AI assistants could work as a personal operating layer across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Email and inbox triage&lt;/li&gt;
&lt;li&gt;Calendar planning&lt;/li&gt;
&lt;li&gt;Notes and memory&lt;/li&gt;
&lt;li&gt;Browser-based workflows&lt;/li&gt;
&lt;li&gt;Task execution&lt;/li&gt;
&lt;li&gt;Creator workflows&lt;/li&gt;
&lt;li&gt;Learning support&lt;/li&gt;
&lt;li&gt;Permissions and privacy&lt;/li&gt;
&lt;li&gt;Small-team coordination&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most interesting part for me is that “agentic AI” is not only a model capability problem.&lt;/p&gt;

&lt;p&gt;It is also a product architecture problem.&lt;/p&gt;

&lt;p&gt;A useful personal AI assistant needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear permission boundaries&lt;/li&gt;
&lt;li&gt;Good memory design&lt;/li&gt;
&lt;li&gt;Human-readable decisions&lt;/li&gt;
&lt;li&gt;Reliable task execution&lt;/li&gt;
&lt;li&gt;Safe automation&lt;/li&gt;
&lt;li&gt;Privacy-first defaults&lt;/li&gt;
&lt;li&gt;Recovery when something goes wrong&lt;/li&gt;
&lt;li&gt;Enough context to help without becoming unpredictable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is where the Hermes vs OpenClaw comparison becomes interesting.&lt;/p&gt;

&lt;p&gt;One side feels closer to an intelligent daily partner.&lt;br&gt;
The other side pushes more toward open, controllable, agentic execution.&lt;/p&gt;

&lt;p&gt;I think this category will become one of the most important parts of human-computer interaction over the next few years.&lt;/p&gt;

&lt;p&gt;Full episode:&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/MdBBVyDA5Yw"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I would love to hear how other developers think about this:&lt;/p&gt;

&lt;p&gt;What would make you trust an AI assistant enough to let it manage your inbox, calendar, tasks, or browser workflows?&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>openclaw</category>
      <category>podcast</category>
    </item>
    <item>
      <title>Strategic LLM Adoption: A Director's Guide to Fine-Tuning Models for Domain-Specific Applications</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Wed, 29 Apr 2026 12:14:51 +0000</pubDate>
      <link>https://dev.to/turacthethinker/strategic-llm-adoption-a-directors-guide-to-fine-tuning-models-for-domain-specific-applications-4e37</link>
      <guid>https://dev.to/turacthethinker/strategic-llm-adoption-a-directors-guide-to-fine-tuning-models-for-domain-specific-applications-4e37</guid>
      <description>&lt;h1&gt;
  
  
  Strategic LLM Adoption: A Director's Guide to Fine-Tuning Models for Domain-Specific Applications
&lt;/h1&gt;

&lt;p&gt;As AI continues to reshape enterprise technology stacks, engineering leaders face a critical decision: how to leverage large language models (LLMs) effectively while maintaining operational stability, security, and ROI. For directors overseeing multi-language environments—Next.js frontends, Go microservices, Python ML pipelines, and .NET C# backend services—the challenge isn't just technical; it's strategic. This article outlines a pragmatic framework for adopting LLMs through targeted fine-tuning, ensuring alignment with business objectives and technical constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Fine-Tuning Beats Prompt Engineering at Scale
&lt;/h2&gt;

&lt;p&gt;Prompt engineering offers quick wins but hits limitations in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistency&lt;/strong&gt;: Identical prompts can yield varying outputs due to model non-determinism.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token costs&lt;/strong&gt;: Repeatedly passing context-heavy prompts inflates latency and expenses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain specificity&lt;/strong&gt;: Generic models struggle with niche terminology, internal APIs, or proprietary data patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fine-tuning addresses these by adapting model weights to your specific use case, yielding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predictable, deterministic outputs for given inputs.&lt;/li&gt;
&lt;li&gt;Reduced token usage (often 60-80% less) via shorter prompts.&lt;/li&gt;
&lt;li&gt;Enhanced accuracy on domain-specific tasks (e.g., interpreting internal log formats, generating code snippets in your stack).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Director's Framework: Four Phases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Use Case Selection &amp;amp; Success Metrics
&lt;/h3&gt;

&lt;p&gt;Start narrow. Pick a high-impact, well-defined problem where LLMs augment—not replace—human expertise. Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automating boilerplate code generation for REST endpoints in Go services.&lt;/li&gt;
&lt;li&gt;Translating legacy .NET C# business rules into executable decision trees.&lt;/li&gt;
&lt;li&gt;Summarizing Python ML experiment logs for quick stakeholder reviews.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Define success metrics upfront:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy&lt;/strong&gt;: % of outputs passing automated validation (e.g., compilable code, correct schema).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt;: Reduction in developer hours per task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adoption rate&lt;/strong&gt;: % of target teams integrating the tool into workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Data Preparation: The Hidden Investment
&lt;/h3&gt;

&lt;p&gt;Fine-tuning quality hinges on data quality. Allocate 40% of effort here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Source&lt;/strong&gt;: Extract from internal repositories, ticketing systems, documentation, and code reviews.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cleaning&lt;/strong&gt;: Remove PII, secrets, and noisy outliers. Use automated scripts (Python/PowerShell) to sanitize.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Formatting&lt;/strong&gt;: Structure as instruction-response pairs. For code tasks: &lt;code&gt;{ "prompt": "Generate a Go handler for /users endpoint with JWT auth", "completion": "func Handler(w http.ResponseWriter, r *http.Request) { ... }" }&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation&lt;/strong&gt;: Hold out 10-15% for testing; ensure no leakage between train/test splits.&lt;/li&gt;
&lt;/ul&gt;
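
&lt;p&gt;A hedged sketch of the formatting and validation steps above (the sanitization patterns are simplistic placeholders; real PII scrubbing needs far more than two regexes):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: turn raw (prompt, completion) records into a cleaned,
# leakage-free JSONL train/test split.
import json, random, re

SECRET_PATTERNS = [
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),
    re.compile(r"\b\d{16}\b"),   # naive card-number pattern
]

def sanitize(text: str) -&amp;gt; str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def build_dataset(records: list[dict], test_fraction: float = 0.1):
    pairs = [
        {"prompt": sanitize(r["prompt"]), "completion": sanitize(r["completion"])}
        for r in records
    ]
    random.Random(42).shuffle(pairs)   # deterministic split
    cut = int(len(pairs) * (1 - test_fraction))
    return pairs[:cut], pairs[cut:]    # no record appears in both splits

def write_jsonl(path: str, rows: list[dict]) -&amp;gt; None:
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;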

&lt;h3&gt;
  
  
  3. Model Selection &amp;amp; Training Strategy
&lt;/h3&gt;

&lt;p&gt;Choose a base model matching your latency and privacy needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Open weights&lt;/strong&gt; (Llama 3, Mistral) for on-prem/VPC deployment—critical for sensitive .NET or Go services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API-accessible&lt;/strong&gt; (GPT-4, Claude) for prototyping, but verify data usage policies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Training tips:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;LoRA&lt;/strong&gt; (Low-Rank Adaptation) to reduce compute costs; a single A10G can fine-tune 7B models in hours.&lt;/li&gt;
&lt;li&gt;Monitor &lt;strong&gt;loss curves&lt;/strong&gt; and &lt;strong&gt;validation accuracy&lt;/strong&gt;—stop when validation plateaus to avoid overfitting.&lt;/li&gt;
&lt;li&gt;For code generation, incorporate &lt;strong&gt;syntax validators&lt;/strong&gt; (e.g., &lt;code&gt;gofmt&lt;/code&gt;, &lt;code&gt;dotnet format&lt;/code&gt;) into the training loop via reward modeling.&lt;/li&gt;
&lt;/ul&gt;
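
&lt;p&gt;As a minimal sketch of the LoRA setup with Hugging Face &lt;code&gt;transformers&lt;/code&gt; and &lt;code&gt;peft&lt;/code&gt; (the model name and hyperparameters are placeholders, not a recommendation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: attach LoRA adapters to an open-weights base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Meta-Llama-3-8B"   # any open-weights base you can host
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=16,                   # low-rank dimension: small adapters, low VRAM
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # typically well under 1% of total weights
# ...then train with your usual Trainer loop, watching validation loss
# for the plateau that signals it is time to stop.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;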

&lt;h3&gt;
  
  
  4. Integration &amp;amp; Governance
&lt;/h3&gt;

&lt;p&gt;Deploy fine-tuned models as internal microservices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Wrapper service&lt;/strong&gt;: Thin Go or .NET API that handles authentication, request/response logging, and fallback to base model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt;: Track latency, token usage, and error rates. Alert on drift via periodic re-evaluation on holdout set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feedback loop&lt;/strong&gt;: Capture user corrections (e.g., "regenerate with stricter typing") to continuously improve the model.&lt;/li&gt;
&lt;/ul&gt;
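
&lt;p&gt;In production this wrapper would be the thin Go or .NET service above; as a language-neutral sketch of the pattern (the client functions are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: log every call and fall back to the base model when the
# fine-tuned endpoint fails. call_fine_tuned / call_base_model are stubs.
import logging, time

log = logging.getLogger("llm-wrapper")

def call_fine_tuned(prompt: str) -&amp;gt; str: ...
def call_base_model(prompt: str) -&amp;gt; str: ...

def complete(prompt: str, user: str) -&amp;gt; str:
    start = time.perf_counter()
    model_used = "fine-tuned"
    try:
        answer = call_fine_tuned(prompt)
    except Exception:
        model_used = "base-fallback"   # degrade gracefully, don't fail the caller
        answer = call_base_model(prompt)
    log.info(
        "user=%s model=%s latency_ms=%.0f",
        user, model_used, (time.perf_counter() - start) * 1000,
    )
    return answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;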

&lt;p&gt;Governance essentials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model cards&lt;/strong&gt;: Document training data, intended use, limitations, and evaluation results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access control&lt;/strong&gt;: Tie model endpoints to internal IAM; audit logs for compliance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versioning&lt;/strong&gt;: Treat models like code—tag, rollback, and A/B test new versions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Impact: A Case Study
&lt;/h2&gt;

&lt;p&gt;A fintech director applied this framework to automate API contract generation for their Go microservices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data&lt;/strong&gt;: 5,000 annotated OpenAPI snippets from internal services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model&lt;/strong&gt;: Llama 3 8B fine-tuned with LoRA on 2x A10G (24 hours).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Results&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;70% reduction in time to create new service contracts.&lt;/li&gt;
&lt;li&gt;92% of generated contracts passed linting on first try.&lt;/li&gt;
&lt;li&gt;Developer NPS increased by 34 points due to reduced boilerplate fatigue.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pitfalls to Avoid
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Overestimating generalization&lt;/strong&gt;: A model fine-tuned on Go code won’t magically understand .NET C#—scope tightly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ignoring prompt hygiene&lt;/strong&gt;: Even fine-tuned models benefit from clear, constrained prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Underestimating change management&lt;/strong&gt;: Engineers may distrust AI outputs; pair with training and incremental rollout.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;For technology directors, LLMs aren’t a magic wand—they’re a force multiplier when applied with discipline. By focusing on targeted fine-tuning, measuring outcomes, and investing in governance, you turn AI experimentation into predictable engineering advantage. Start small, prove value, then scale across your polyglot stack.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Ready to pilot? Identify one repetitive, well-documented task in your current sprint and treat it as your fine-tuning MVP.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Published: April 2026&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>finetuning</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Context Window Lie: Why Your LLM Remembers Nothing</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Mon, 27 Apr 2026 07:40:18 +0000</pubDate>
      <link>https://dev.to/turacthethinker/the-context-window-lie-why-your-llm-remembers-nothing-5h1p</link>
      <guid>https://dev.to/turacthethinker/the-context-window-lie-why-your-llm-remembers-nothing-5h1p</guid>
      <description>&lt;h1&gt;
  
  
  The Context Window Lie: Why Your LLM Remembers Nothing
&lt;/h1&gt;

&lt;p&gt;Every time you paste 200K tokens into Claude or GPT, you're not extending its memory.&lt;/p&gt;

&lt;p&gt;You're paying for amnesia at scale.&lt;/p&gt;

&lt;p&gt;The "1M token context" headline is a billing mechanism, not a memory system. And the gap between what the marketing implies and what the model actually does is where most LLM products quietly bleed money and reliability.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The Marketing vs. The Math
&lt;/h2&gt;

&lt;p&gt;"1 million tokens of context" sounds like the model &lt;em&gt;holds&lt;/em&gt; a million tokens of understanding.&lt;/p&gt;

&lt;p&gt;It does not. It re-reads them. Every. Single. Turn.&lt;/p&gt;

&lt;p&gt;Standard transformer attention is &lt;strong&gt;O(n²)&lt;/strong&gt; in sequence length. Here's what that actually means for your inference bill:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Context Size&lt;/th&gt;
&lt;th&gt;Relative Attention Cost&lt;/th&gt;
&lt;th&gt;Typical API Cost (est.)&lt;/th&gt;
&lt;th&gt;What You're Paying For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;8K tokens&lt;/td&gt;
&lt;td&gt;1×&lt;/td&gt;
&lt;td&gt;~$0.08/turn&lt;/td&gt;
&lt;td&gt;Small doc + system prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32K tokens&lt;/td&gt;
&lt;td&gt;16×&lt;/td&gt;
&lt;td&gt;~$0.32/turn&lt;/td&gt;
&lt;td&gt;Medium codebase chunk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;256×&lt;/td&gt;
&lt;td&gt;~$1.28/turn&lt;/td&gt;
&lt;td&gt;Large repo dump&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;200K tokens&lt;/td&gt;
&lt;td&gt;625×&lt;/td&gt;
&lt;td&gt;~$2.00/turn&lt;/td&gt;
&lt;td&gt;"Full project context"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1M tokens&lt;/td&gt;
&lt;td&gt;15,625×&lt;/td&gt;
&lt;td&gt;~$10/turn&lt;/td&gt;
&lt;td&gt;Marketing slide feature&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;API costs estimated at ~$10/M input tokens; they scale linearly with context length per call, while the attention-compute column scales quadratically. Actual prices vary by provider.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You did not give the model a brain. You gave it a re-reading job, and you're paying per page, per turn.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Longer Context ≠ Better Recall
&lt;/h2&gt;

&lt;p&gt;The dirty secret: even when models &lt;em&gt;can&lt;/em&gt; read 200K+ tokens, they often don't &lt;em&gt;use&lt;/em&gt; them well.&lt;/p&gt;

&lt;p&gt;The "lost in the middle" effect has been systematically measured. Here's what the research shows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Information Position&lt;/th&gt;
&lt;th&gt;Retrieval Accuracy&lt;/th&gt;
&lt;th&gt;vs. Ideal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;First 10% of context&lt;/td&gt;
&lt;td&gt;~95%&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Last 10% of context&lt;/td&gt;
&lt;td&gt;~91%&lt;/td&gt;
&lt;td&gt;-4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Middle 50% of context&lt;/td&gt;
&lt;td&gt;~52–68%&lt;/td&gt;
&lt;td&gt;-27 to -43%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Buried in 20-doc retrieval&lt;/td&gt;
&lt;td&gt;~35%&lt;/td&gt;
&lt;td&gt;-60%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Adapted from Liu et al. (2023), "Lost in the Middle: How Language Models Use Long Contexts"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Put your critical instruction on line 4,000 of an 8,000-line prompt, and the model will politely ignore it while sounding confident.&lt;/p&gt;

&lt;p&gt;So you pay 4× the compute for context that the model is &lt;em&gt;worse&lt;/em&gt; at using than a focused 8K prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Recall by position (schematic):
100% ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
 90% ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░
 80%
 70%
 60%               ████████
 50%         ███████████████
 40%
      [START]---[MIDDLE]---[END]

Peak recall at edges. Valley in the middle.
The more tokens you add, the deeper the valley.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not a bug you can prompt your way out of. It's an architectural property of dense attention.&lt;/p&gt;
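
&lt;p&gt;You can measure the valley on your own stack with a minimal needle-position probe. The &lt;code&gt;complete()&lt;/code&gt; call below is a placeholder for whatever chat API you use; everything else is plain Python.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Needle-position probe (sketch). complete() stands in for your model call.
filler = [f"log entry {i}: nothing notable." for i in range(2000)]
needle = "The secret code is 7341."

def make_prompt(position: float) -&amp;gt; str:
    lines = filler[:]
    lines.insert(int(position * len(lines)), needle)   # bury the needle
    return "\n".join(lines) + "\n\nWhat is the secret code?"

for pos in [0.0, 0.25, 0.5, 0.75, 1.0]:
    prompt = make_prompt(pos)
    # answer = complete(prompt)              # plug in your chat API here
    # print(pos, "7341" in answer)           # expect a dip around pos = 0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;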




&lt;h2&gt;
  
  
  3. Verbatim Retrieval ≠ Understanding
&lt;/h2&gt;

&lt;p&gt;Here's the deeper trap.&lt;/p&gt;

&lt;p&gt;Pasting your entire codebase into context does not teach the model your architecture. It gives it raw bytes to attend over. The model still has to re-derive your domain model, your conventions, your invariants — from scratch — every single turn.&lt;/p&gt;

&lt;p&gt;Consider what actually happens in a typical "full context" session:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What You Think Is Happening&lt;/th&gt;
&lt;th&gt;What Is Actually Happening&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model "knows" your codebase&lt;/td&gt;
&lt;td&gt;Model re-reads all tokens each turn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context = persistent memory&lt;/td&gt;
&lt;td&gt;Context = turn-scoped buffer, cleared after response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Larger window = smarter answers&lt;/td&gt;
&lt;td&gt;Larger window = higher O(n²) cost, same ephemeral state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model learns your patterns&lt;/td&gt;
&lt;td&gt;Model re-derives patterns from raw tokens every turn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;200K tokens = 200K understanding&lt;/td&gt;
&lt;td&gt;200K tokens ≈ raw data to attend over, no compression&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Verbatim availability is &lt;strong&gt;raw data dressed up as memory&lt;/strong&gt;. The tokens are there. The understanding isn't. And because the model is fluent, it will hallucinate coherence over that gap with a straight face.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The Architectural Fix: Where the Frontier Is Actually Going
&lt;/h2&gt;

&lt;p&gt;The real solutions don't live in prompt engineering. They live in the architecture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;th&gt;Long-Range State&lt;/th&gt;
&lt;th&gt;Production Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard Transformer (GPT-4, Claude)&lt;/td&gt;
&lt;td&gt;O(n²)&lt;/td&gt;
&lt;td&gt;❌ No persistent state&lt;/td&gt;
&lt;td&gt;Dominant today&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sparse Attention (Longformer, BigBird)&lt;/td&gt;
&lt;td&gt;O(n·w), sub-quadratic&lt;/td&gt;
&lt;td&gt;❌ Heuristic, not true state&lt;/td&gt;
&lt;td&gt;Niche use cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Linear Attention (RWKV, RetNet)&lt;/td&gt;
&lt;td&gt;O(n)&lt;/td&gt;
&lt;td&gt;✅ True recurrence&lt;/td&gt;
&lt;td&gt;Early production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State Space Models (Mamba, Mamba-2)&lt;/td&gt;
&lt;td&gt;O(n)&lt;/td&gt;
&lt;td&gt;✅ Compressed recurrent state&lt;/td&gt;
&lt;td&gt;Growing adoption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid Stack (Jamba, Zamba, Falcon-H1)&lt;/td&gt;
&lt;td&gt;O(n) avg&lt;/td&gt;
&lt;td&gt;✅ Best of both&lt;/td&gt;
&lt;td&gt;Frontier direction&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Mamba&lt;/strong&gt; deserves special mention: it uses a selective state space mechanism where the model learns &lt;em&gt;what to remember and what to forget&lt;/em&gt; during the forward pass. Not attention over a re-read sequence — actual running state. Linear time. Linear memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid stacks&lt;/strong&gt; (attention layers for short-range precision + SSM layers for long-range state) are emerging as the practical answer: you keep the expressiveness of attention where it matters and trade it for efficiency at scale.&lt;/p&gt;
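
&lt;p&gt;A toy recurrence makes the "running state" idea concrete. This is a deliberately simplified sketch, not Mamba's actual parameterization (which learns continuous-time dynamics per channel and uses a hardware-aware parallel scan); it only shows how an input-dependent gate maintains an O(d) state in O(n) time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def selective_ssm(x):
    """Toy selective scan: an input-dependent gate decides what the state keeps.
    Illustrative only; real Mamba is substantially more sophisticated."""
    T, d = x.shape
    rng = np.random.default_rng(0)
    W_gate = rng.normal(size=d)            # stand-in for learned selection params
    h = np.zeros(d)                        # compressed running state: O(d), not O(T)
    ys = []
    for t in range(T):                     # linear time, constant state per step
        gate = 1.0 / (1.0 + np.exp(-x[t] * W_gate))   # "what to forget" depends on input
        h = gate * h + (1.0 - gate) * x[t]            # remember vs. overwrite, per channel
        ys.append(h.copy())
    return np.stack(ys)

print(selective_ssm(np.random.randn(16, 4)).shape)    # (16, 4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;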

&lt;p&gt;This is not academic. Falcon-H1, Zamba2, and Jamba are in production. The shift is happening.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. The Engineering Fix (Available Today)
&lt;/h2&gt;

&lt;p&gt;Until linear-time architectures dominate production, the practical answer is unsexy and obvious:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stop dumping. Start indexing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's how the strategies compare in practice:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Context Usage&lt;/th&gt;
&lt;th&gt;Cost Scaling&lt;/th&gt;
&lt;th&gt;Recall Quality&lt;/th&gt;
&lt;th&gt;Implementation Effort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Full context dump&lt;/td&gt;
&lt;td&gt;Very high&lt;/td&gt;
&lt;td&gt;O(n²) per turn&lt;/td&gt;
&lt;td&gt;Medium (lost-in-middle)&lt;/td&gt;
&lt;td&gt;None — copy-paste&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG (chunk + retrieve)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;O(1) per turn&lt;/td&gt;
&lt;td&gt;High (targeted)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured memory&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;O(1) per turn&lt;/td&gt;
&lt;td&gt;Very high (curated)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool-augmented retrieval&lt;/td&gt;
&lt;td&gt;On-demand&lt;/td&gt;
&lt;td&gt;O(k) per query&lt;/td&gt;
&lt;td&gt;Highest (precise)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid (RAG + structure)&lt;/td&gt;
&lt;td&gt;Controlled&lt;/td&gt;
&lt;td&gt;O(k) per turn&lt;/td&gt;
&lt;td&gt;Highest&lt;/td&gt;
&lt;td&gt;Highest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The cost difference between a naive context dump and a well-built RAG system is not marginal. On a high-volume production system:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Volume&lt;/th&gt;
&lt;th&gt;Full-Context (128K/turn)&lt;/th&gt;
&lt;th&gt;RAG (8K/turn)&lt;/th&gt;
&lt;th&gt;Monthly Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1,000 turns/day&lt;/td&gt;
&lt;td&gt;~$38,400/mo&lt;/td&gt;
&lt;td&gt;~$2,400/mo&lt;/td&gt;
&lt;td&gt;~$36,000/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000 turns/day&lt;/td&gt;
&lt;td&gt;~$384,000/mo&lt;/td&gt;
&lt;td&gt;~$24,000/mo&lt;/td&gt;
&lt;td&gt;~$360,000/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100,000 turns/day&lt;/td&gt;
&lt;td&gt;~$3,840,000/mo&lt;/td&gt;
&lt;td&gt;~$240,000/mo&lt;/td&gt;
&lt;td&gt;~$3,600,000/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Estimates at $10/M input tokens, 30 days/month. Actual ratios depend on your retrieval precision.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The teams shipping reliable LLM products are not the ones with the biggest context windows. They are the ones who treat memory as a &lt;em&gt;system&lt;/em&gt; — with retrieval, indexing, eviction, and verification — not as a parameter on an API call.&lt;/p&gt;
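
&lt;p&gt;The retrieval half of that system fits in a few lines, assuming an &lt;code&gt;embed()&lt;/code&gt; function from any embedding provider: the corpus is embedded once, and only the top-k chunks cross into the prompt each turn.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=5):
    # Cosine similarity against a precomputed index: only k chunks enter
    # the prompt, no matter how large the corpus grows.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [chunks[i] for i in np.argsort(-sims)[:k]]

# prompt = system + "\n\n".join(top_k_chunks(embed(query), index_vecs, chunks))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;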




&lt;h2&gt;
  
  
  6. What Good Memory Architecture Looks Like
&lt;/h2&gt;

&lt;p&gt;If you're building a production LLM system, this is the hierarchy that works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L1: Working Context (hot path)
    ↳ Current turn, active task, immediate tool outputs
    ↳ Budget: ≤8K tokens. Trim aggressively.

L2: Session Memory (structured, not verbatim)
    ↳ Distilled decisions, resolved questions, current state
    ↳ Format: key-value or JSON, not prose transcripts
    ↳ Budget: ≤2K tokens

L3: Retrieval Index (RAG)
    ↳ Chunked, embedded, queryable knowledge base
    ↳ Pull on demand, cite sources, don't pre-load
    ↳ Budget: 0 tokens until queried

L4: Persistent Storage
    ↳ Database, files, external systems
    ↳ The model reads only what it explicitly fetches
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every token that crosses from L3/L4 into L1 should be &lt;em&gt;intentional&lt;/em&gt;. If you can't explain why a chunk is in the prompt, remove it.&lt;/p&gt;
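
&lt;p&gt;One way to enforce those budgets at prompt-assembly time, as a sketch; the chars/4 token estimate and the layout are illustrative assumptions, not a prescription.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

def build_prompt(task: str, session_state: dict, retrieved: list) -&amp;gt; str:
    l2 = json.dumps(session_state)              # L2: structured state, not transcripts
    assert len(l2) // 4 &amp;lt;= 2_000, "L2 over budget: distill harder"   # ~4 chars/token
    l1 = "\n\n".join([task] + retrieved)        # L1: current turn + fetched chunks
    assert len(l1) // 4 &amp;lt;= 8_000, "L1 over budget: trim or retrieve less"
    return f"SESSION STATE:\n{l2}\n\nTASK:\n{l1}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;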




&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;Memory is a system, not a parameter.&lt;/p&gt;

&lt;p&gt;The context window is a buffer for the &lt;em&gt;current turn&lt;/em&gt;. It is not where understanding lives. Treat it that way and your bills shrink, your reliability climbs, and your product stops degrading at scale.&lt;/p&gt;

&lt;p&gt;The architectural fix is coming — SSMs and hybrid stacks will eventually make this a smaller problem. But "eventually" is not your production environment today.&lt;/p&gt;

&lt;p&gt;Stop paying for amnesia. Build for memory.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;Lost in the Middle: How Language Models Use Long Contexts&lt;/a&gt; — Liu et al., 2023&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2312.00752" rel="noopener noreferrer"&gt;Mamba: Linear-Time Sequence Modeling with Selective State Spaces&lt;/a&gt; — Gu &amp;amp; Dao, 2023&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2403.19887" rel="noopener noreferrer"&gt;Jamba: A Hybrid Transformer-Mamba Language Model&lt;/a&gt; — AI21 Labs, 2024&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;What's your context strategy in production? RAG, structured memory, hybrid, or still in the context-dump phase? Curious where teams are actually drawing this line.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Stop Your AI Agent From Building Tools That Already Exist</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 19:51:30 +0000</pubDate>
      <link>https://dev.to/turacthethinker/stop-your-ai-agent-from-building-tools-that-already-exist-6o9</link>
      <guid>https://dev.to/turacthethinker/stop-your-ai-agent-from-building-tools-that-already-exist-6o9</guid>
      <description>&lt;p&gt;Your agent just wrote a custom PDF parser.&lt;/p&gt;

&lt;p&gt;There were four maintained libraries that do exactly this. It didn't check. It never does.&lt;/p&gt;

&lt;p&gt;This is the default behavior of every coding agent: task arrives → code is written → you maintain the bespoke solution forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;skill-hunter&lt;/strong&gt; is the missing pause.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;skill-hunter is a pre-execution layer for coding and automation agents. Before your agent writes a single line, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Classifies the request&lt;/strong&gt; — what kind of task is this?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scans the ecosystem&lt;/strong&gt; — MCP servers, CLIs, npm/pip packages, APIs, GitHub repos, existing repo utilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scores candidates&lt;/strong&gt; — fit, maintenance activity, permissions, security, docs quality, license, integration effort&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommends a path&lt;/strong&gt; — reuse, adapt, build minimally, or build custom&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gates risky actions&lt;/strong&gt; — installs, credentials, external service connections, destructive ops require explicit approval&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent checks the toolbox before building another hammer.&lt;/p&gt;
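
&lt;p&gt;For illustration, step 3 could look something like the sketch below. This is hypothetical, not skill-hunter's actual code; the weights and the 0.7 threshold are invented for the example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical scoring pass, invented for illustration.
WEIGHTS = {"fit": 0.35, "maintenance": 0.20, "security": 0.20,
           "docs": 0.10, "license": 0.05, "integration": 0.10}

def score(candidate: dict) -&amp;gt; float:
    # candidate maps each dimension to a 0-1 rating
    return sum(WEIGHTS[k] * candidate.get(k, 0.0) for k in WEIGHTS)

def recommend(candidates: list) -&amp;gt; str:
    best = max(candidates, key=score, default=None)
    if best and score(best) &amp;gt;= 0.7:          # invented threshold
        return f"reuse: {best['name']}"
    return "build minimally"

print(recommend([{"name": "pdfplumber", "fit": 0.9, "maintenance": 0.9,
                  "security": 0.8, "docs": 0.9, "license": 1.0, "integration": 0.8}]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;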

&lt;h2&gt;
  
  
  The Before / After
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Without skill-hunter:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User: "Parse these PDFs and extract invoices."&lt;br&gt;
Agent: &lt;em&gt;writes 200-line custom parser&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;With skill-hunter:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User: "Parse these PDFs and extract invoices."&lt;br&gt;
Agent: &lt;em&gt;checks PDF libraries, OCR tools, invoice extraction APIs, asks whether accuracy, cost, privacy, or offline processing matters&lt;/em&gt;&lt;br&gt;
Agent: "pdfplumber covers 90% of this. Want me to wrap it with your schema instead?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Install in 30 Seconds
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add mturac/skill-hunter
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;skill-hunter@skill-hunter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Codex CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex plugin marketplace add mturac/skill-hunter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in &lt;code&gt;~/.codex/config.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[plugins."skill-hunter@skill-hunter"]&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;OpenClaw:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/openclaw-skills:skill_hunter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;The problem isn't that agents write bad code. The problem is that agents write &lt;em&gt;unnecessary&lt;/em&gt; code — and then you're stuck maintaining it.&lt;/p&gt;

&lt;p&gt;Every custom solution is a maintenance debt. Every dependency you don't introduce is a security surface you don't expose. Every API you don't reinvent is a battle-tested edge-case handler you get for free.&lt;/p&gt;

&lt;p&gt;skill-hunter doesn't stop agents from building. It stops agents from building what already exists.&lt;/p&gt;




&lt;p&gt;GitHub: &lt;a href="https://github.com/mturac/skill-hunter" rel="noopener noreferrer"&gt;mturac/skill-hunter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If this solves a real pain for you, a star helps me know what to keep building.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why Versioned SQL Beats Vector RAG for Agent Memory Systems</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 19:34:36 +0000</pubDate>
      <link>https://dev.to/turacthethinker/why-versioned-sql-beats-vector-rag-for-agent-memory-systems-1jo3</link>
      <guid>https://dev.to/turacthethinker/why-versioned-sql-beats-vector-rag-for-agent-memory-systems-1jo3</guid>
      <description>&lt;p&gt;&lt;strong&gt;Stop building agent memory systems on top of vector databases. You're setting your team up for failure.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vector RAG looks elegant in demos. Pass a query, get back similar chunks, stuff them into context. Done. But when you scale to multiple agents collaborating over time? It collapses. Hard.&lt;/p&gt;

&lt;p&gt;Here's why: &lt;strong&gt;RAG conflates retrieval with reconciliation.&lt;/strong&gt; It assumes all knowledge is additive. That conflicts don't exist. That agents won't overwrite each other. They do. They will.&lt;/p&gt;

&lt;p&gt;What you actually need isn't &lt;em&gt;retrieval&lt;/em&gt;—it's &lt;em&gt;merge&lt;/em&gt;. Not "find me something like this." It's "here's my view of the world, now let's reconcile it with yours."&lt;/p&gt;

&lt;p&gt;Enter: &lt;strong&gt;versioned SQL.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not just any SQL. Think Git, but for structured data. Records have hashes. Changes form a DAG. Conflicts are resolved through explicit merges. History is preserved, not flattened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieval ≠ Reconciliation
&lt;/h2&gt;

&lt;p&gt;In RAG, vectors are stateless snapshots. Embeddings encode meaning at a point in time. But they can't tell you how that meaning evolved. Or where it came from. Or who changed it last.&lt;/p&gt;

&lt;p&gt;Agents write to memory. Multiple agents write concurrently. Without versioning, you lose causality. You lose intent. You end up with garbage-in-garbage-out loops.&lt;/p&gt;

&lt;p&gt;Merge-aware systems track lineage. Every change links to its parent. Agents can see &lt;em&gt;why&lt;/em&gt; something was written, not just that it exists. This enables safe collaboration.&lt;/p&gt;

&lt;p&gt;Imagine two agents updating a customer record simultaneously. One adds a new address. Another marks the account as inactive. In RAG land, both updates vanish into embedding space. Which one wins? Who knows?&lt;/p&gt;

&lt;p&gt;With versioned SQL, those changes live as separate commits. A merge strategy determines resolution. Maybe the system auto-resolves. Maybe it flags conflict. Either way—you keep control.&lt;/p&gt;
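
&lt;p&gt;A toy model of the idea in plain Python (not Dolt's API): every write hashes its content together with its parent, so the two concurrent writes above become two children of one commit, and the conflict surfaces instead of vanishing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy Git-style versioned records (illustrative, not Dolt's API).
import hashlib, json

def commit(record: dict, parent) -&amp;gt; str:
    payload = json.dumps({"record": record, "parent": parent}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

base = commit({"customer": 42, "address": "old", "active": True}, None)
a = commit({"customer": 42, "address": "new", "active": True}, base)   # agent 1
b = commit({"customer": 42, "address": "old", "active": False}, base)  # agent 2

# Same parent, different children: an explicit merge point, not a lost write.
print(a != b, "both descend from", base[:8])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;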

&lt;h2&gt;
  
  
  Lost in the Middle, Again
&lt;/h2&gt;

&lt;p&gt;Long-context windows were supposed to fix everything. Just throw more tokens at the model! Except now you're fighting the "lost in the middle" problem. Models forget things buried deep in context.&lt;/p&gt;

&lt;p&gt;Vectors amplify this. Similarity search returns semantically relevant chunks—but ordering matters. And there's no guarantee your retrieved facts are temporally coherent.&lt;/p&gt;

&lt;p&gt;Versioned memory solves this differently. Instead of stuffing raw text into prompts, store compact representations. Task graphs. Semantic summaries. Structured diffs.&lt;/p&gt;

&lt;p&gt;Tools like &lt;a href="https://github.com/gastownhall/beads" rel="noopener noreferrer"&gt;beads&lt;/a&gt; compress knowledge into minimal, composable units. Each bead tracks dependencies. Relationships stay intact even when content shifts.&lt;/p&gt;

&lt;p&gt;This isn't about indexing documents anymore. It's about modeling evolving beliefs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dolt-Backed Dependency Graphs Change Everything
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.dolthub.com/" rel="noopener noreferrer"&gt;Dolt&lt;/a&gt; brings Git-style versioning to relational tables. Combine that with hash-based IDs and dependency tracking—you've got a foundation for truly collaborative agent memory.&lt;/p&gt;

&lt;p&gt;Each agent writes to a branch. Commits reference prior states via SHA-like identifiers. Merge conflicts surface explicitly. No silent overwrites. No hallucinated truths.&lt;/p&gt;

&lt;p&gt;Semantic compaction layers on top. Summarize large changesets into atomic facts. Store those alongside full history. Query either representation depending on need.&lt;/p&gt;

&lt;p&gt;This is how teams should build shared understanding—not by dumping embeddings into Pinecone and hoping for the best.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vectors Are Still Useful—Just Not Here
&lt;/h2&gt;

&lt;p&gt;Don't misunderstand. Embeddings aren't going away. They excel at classification, clustering, anomaly detection.&lt;/p&gt;

&lt;p&gt;But using them as primary storage for agent memory is like using Redis for source code. Sure, it works—for a while. Then concurrency bites. Then consistency breaks.&lt;/p&gt;

&lt;p&gt;Vectors lack identity. They lack transactional semantics. They lack audit trails. These aren't bugs—they're design limitations.&lt;/p&gt;

&lt;p&gt;Use vectors where fuzziness helps. Use versioned SQL where precision matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Build Instead
&lt;/h2&gt;

&lt;p&gt;Start with structure. Define schemas that reflect your domain. Tasks, entities, relationships—they all deserve types.&lt;/p&gt;

&lt;p&gt;Add versioning. Track every mutation. Preserve causality. Enable branching workflows.&lt;/p&gt;

&lt;p&gt;Implement merge strategies. Decide upfront how conflicting writes resolve. Automate where possible. Alert humans when needed.&lt;/p&gt;

&lt;p&gt;Layer compression on top. Extract semantic cores. Prune redundant paths. Keep only what's essential for reasoning.&lt;/p&gt;

&lt;p&gt;That's how you build memory systems that scale—with clarity, not chaos.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Has your RAG setup collapsed under concurrent multi-agent writes? Or are you already versioning your agent memory? Drop your setup below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Got Access to 136 AI Models for Free — NVIDIA NIM API Deep Dive</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 18:18:58 +0000</pubDate>
      <link>https://dev.to/turacthethinker/i-got-access-to-136-ai-models-for-free-nvidia-nim-api-deep-dive-111o</link>
      <guid>https://dev.to/turacthethinker/i-got-access-to-136-ai-models-for-free-nvidia-nim-api-deep-dive-111o</guid>
      <description>&lt;p&gt;NVIDIA quietly built one of the most impressive AI APIs out there — and most developers don't know it exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NVIDIA NIM&lt;/strong&gt; (NVIDIA Inference Microservices) gives you OpenAI-compatible access to 136 models through a single endpoint. We're talking Llama 405B, Kimi K2, Mistral Large 3 675B, Qwen3-Coder 480B. All behind the same interface you already know.&lt;/p&gt;

&lt;p&gt;Here's what I found after testing them all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup (60 seconds)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvapi-YOUR_KEY_HERE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Get your key at &lt;a href="https://build.nvidia.com" rel="noopener noreferrer"&gt;build.nvidia.com&lt;/a&gt;. Free tier included.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 136 Models — What's Actually in There
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1/models&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; models&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The catalog spans 20+ organizations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Org&lt;/th&gt;
&lt;th&gt;Notable Models&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;Llama 3.1 405B, Llama 4 Maverick 17B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral&lt;/td&gt;
&lt;td&gt;Mistral Large 3 675B, Magistral Small, Codestral&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Moonshot&lt;/td&gt;
&lt;td&gt;Kimi K2, Kimi K2 Thinking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen&lt;/td&gt;
&lt;td&gt;Qwen3-Coder 480B, Qwen3.5 397B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;DeepSeek v3.2, v4 Pro, v4 Flash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA&lt;/td&gt;
&lt;td&gt;Nemotron Ultra 253B, Nemotron Super 49B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ByteDance&lt;/td&gt;
&lt;td&gt;Seed-OSS 36B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;GPT-OSS 120B (yes, really)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What Actually Works (I Tested Them All)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvapi-YOUR_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;working_models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.1-405b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;moonshotai/kimi-k2-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen3-coder-480b-a35b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen3.5-397b-a17b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/mistral-large-3-675b-instruct-2512&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/magistral-small-2506&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia/llama-3.3-nemotron-super-49b-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bytedance/seed-oss-36b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;working_models&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain transformers in one sentence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results from my run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;meta/llama-3.1-405b-instruct: ✅ Fast, coherent
moonshotai/kimi-k2-instruct: ✅ Excellent reasoning
qwen/qwen3-coder-480b-a35b-instruct: ✅ Best for code tasks
mistralai/mistral-large-3-675b-instruct-2512: ✅ Strong instruction following
nvidia/llama-3.3-nemotron-super-49b-v1: ✅ NVIDIA-tuned, solid
deepseek-ai/deepseek-v4-pro: ❌ Timeout (high demand)
moonshotai/kimi-k2-thinking: ❌ Timeout (high demand)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Streaming Support
&lt;/h2&gt;

&lt;p&gt;All working models support streaming — critical for production UX:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;moonshotai/kimi-k2-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python async web scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Multi-Model Router Pattern
&lt;/h2&gt;

&lt;p&gt;The real power: build a router that falls back across models based on availability and task type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvapi-YOUR_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ROUTING_TABLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen3-coder-480b-a35b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.1-405b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/mistral-large-3-675b-instruct-2512&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;moonshotai/kimi-k2-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.1-405b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia/llama-3.3-nemotron-super-49b-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/mistral-large-3-675b-instruct-2512&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.1-405b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bytedance/seed-oss-36b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;smart_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ROUTING_TABLE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ROUTING_TABLE&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;  &lt;span class="c1"&gt;# fallback to next model
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="c1"&gt;# Usage
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;smart_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implement a binary search tree in Python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Makes This Interesting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. One API key, 20+ providers.&lt;/strong&gt; No juggling Anthropic, OpenAI, Mistral keys separately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. OpenAI SDK compatible.&lt;/strong&gt; Zero migration cost from existing code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Specialty models included.&lt;/strong&gt; BGE-M3 for embeddings, NemoRetriever for parsing, CLIP for vision — not just chat models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Free tier is generous.&lt;/strong&gt; Enough for development and light production usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Some flagship models (DeepSeek v4 Pro, Kimi K2 Thinking) time out under high demand&lt;/li&gt;
&lt;li&gt;Service keys have different scopes than personal keys — test both&lt;/li&gt;
&lt;li&gt;No fine-tuning support (inference only)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;If you're building LLM-powered apps and not using NVIDIA NIM, you're either paying more than you need to or missing access to models that aren't available anywhere else. The multi-model fallback pattern alone is worth the 60-second setup.&lt;/p&gt;

&lt;p&gt;Get your key: &lt;a href="https://build.nvidia.com" rel="noopener noreferrer"&gt;build.nvidia.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nvidia</category>
      <category>llm</category>
      <category>api</category>
    </item>
    <item>
      <title>Your Agent Isn't Reflecting. It's Performing Reflection.</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 14:17:15 +0000</pubDate>
      <link>https://dev.to/turacthethinker/your-agent-isnt-reflecting-its-performing-reflection-b41</link>
      <guid>https://dev.to/turacthethinker/your-agent-isnt-reflecting-its-performing-reflection-b41</guid>
      <description>&lt;p&gt;Watch any modern agent framework long enough and you'll see it: the model produces output, then "reflects" on the output, then "corrects" itself, then ships a final answer.&lt;/p&gt;

&lt;p&gt;It looks like metacognition. It isn't.&lt;/p&gt;

&lt;p&gt;It's the same model, with the same weights, sampled twice, with the second sample conditioned on the first. There is no separate critic. There is no privileged vantage point. The reviewer and the reviewed are the same network — and the reviewer cannot see anything the original didn't already encode.&lt;/p&gt;

&lt;p&gt;This is reflection theatre.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually happens
&lt;/h2&gt;

&lt;p&gt;When you prompt "now critique your previous answer," the model does not consult a deeper layer of itself. It re-decodes from the same distribution, with a critique-shaped prefix. The output looks like self-correction because the prompt biases it toward correction-shaped tokens.&lt;/p&gt;

&lt;p&gt;If the original answer was wrong because the model lacked the relevant fact, the reflection step also lacks the fact. You get fluent confidence about a wrong answer, then fluent confidence about why that wrong answer was right.&lt;/p&gt;

&lt;p&gt;More tokens. Same blind spots. Higher bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it actually helps
&lt;/h2&gt;

&lt;p&gt;Reflection-style chains do help in narrow cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;When the task is decomposable&lt;/strong&gt; and the model can re-attack a sub-step (e.g. arithmetic with a working pad).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When you change the input&lt;/strong&gt; between rounds — adding tool output, retrieval, a different role.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When the model is sampled with different temperature or different system prompts&lt;/strong&gt; to force genuinely different distributions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In all three cases, the gain comes from the &lt;em&gt;change in conditioning&lt;/em&gt;, not from "reflection" as a capability.&lt;/p&gt;

&lt;p&gt;If nothing changes between round one and round two except the word "reflect," you are watching a more expensive way to produce the same answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest pattern
&lt;/h2&gt;

&lt;p&gt;What actually catches errors is asymmetric criticism: a different model, a different prompt scaffold, a verifier with a real signal (tests passing, a search result, a user clarification). The reviewer needs information the original didn't have.&lt;/p&gt;

&lt;p&gt;"Same model, second pass" is the cheapest possible critic, and you get what you pay for.&lt;/p&gt;




&lt;p&gt;If your agent loop has a &lt;code&gt;reflect()&lt;/code&gt; step that doesn't change the inputs, delete it and price-check the difference. You will probably not lose quality. You will definitely save tokens.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Context Window Is a Lie</title>
      <dc:creator>Mehmet TURAÇ</dc:creator>
      <pubDate>Sun, 26 Apr 2026 14:15:24 +0000</pubDate>
      <link>https://dev.to/turacthethinker/the-context-window-is-a-lie-1iko</link>
      <guid>https://dev.to/turacthethinker/the-context-window-is-a-lie-1iko</guid>
      <description>&lt;p&gt;Your model does not remember the conversation. It re-reads it. Every turn.&lt;/p&gt;

&lt;p&gt;That's not a metaphor. The context window is not memory. It's a re-feed pipeline. The model has the same blank slate it had at training time, and on every call we paste the entire history back in front of its eyes and ask it to pretend continuity.&lt;/p&gt;

&lt;p&gt;We've been calling this "long context" and acting like it's progress. It's not. It's brute force. And it's papering over the absence of an actual memory architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "remembering" actually costs
&lt;/h2&gt;

&lt;p&gt;A 200K context window sounds like memory until you watch the bill.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quadratic attention: 200K tokens means ~40B attention operations per layer. Per turn.&lt;/li&gt;
&lt;li&gt;Cache miss: hit the 5-minute prompt cache TTL and you re-pay the full prefill cost.&lt;/li&gt;
&lt;li&gt;Recall decay: empirical needle-in-haystack tests show even frontier models lose precision past ~64K when the needle isn't at the edges.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You are paying for a transcript reread, not a memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three things people confuse
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context window&lt;/strong&gt; — the working set the model sees in this call. Volatile. Resets every turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt cache&lt;/strong&gt; — kv-cache reuse across calls. Not memory; an optimization. TTL-bounded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actual memory&lt;/strong&gt; — durable state outside the model: vector DB, file, scratchpad, structured store.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you want continuity that survives a 6-hour gap, only #3 works. The other two are illusions you're renting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What works in practice
&lt;/h2&gt;

&lt;p&gt;The agents I run that actually feel like they remember are not the ones with bigger context windows. They're the ones with smaller windows and better external state.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;MEMORY.md&lt;/code&gt; the model reads on every wake-up.&lt;/li&gt;
&lt;li&gt;Daily logs it appends to, then summarizes weekly.&lt;/li&gt;
&lt;li&gt;A search index over the logs so it can pull only what's relevant for the current turn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No 1M context, no fine-tune, no RAG complexity. Just files the model writes to and reads from.&lt;/p&gt;

&lt;p&gt;The pattern: &lt;strong&gt;treat the model as stateless. Make the surrounding system stateful.&lt;/strong&gt;&lt;/p&gt;
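
&lt;p&gt;A minimal sketch of that pattern, with illustrative paths: three small functions cover the wake-up read, the append-only log, and a naive keyword search over the logs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Stateless model, stateful system: files as memory. Paths are illustrative.
from datetime import date
from pathlib import Path

MEMORY = Path("MEMORY.md")
LOGS = Path("logs")

def wake_up() -&amp;gt; str:
    return MEMORY.read_text() if MEMORY.exists() else ""   # read on every wake-up

def remember(note: str) -&amp;gt; None:
    LOGS.mkdir(exist_ok=True)
    with (LOGS / f"{date.today()}.md").open("a") as f:     # append-only daily log
        f.write(note + "\n")

def recall(query: str, k: int = 5) -&amp;gt; list:
    hits = [line for p in sorted(LOGS.glob("*.md"))
            for line in p.read_text().splitlines()
            if query.lower() in line.lower()]              # naive keyword search
    return hits[:k]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;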

&lt;h2&gt;
  
  
  The trap
&lt;/h2&gt;

&lt;p&gt;If you anchor on "context window" as the unit of memory, you'll keep buying bigger windows and wondering why your agent still forgets things across sessions. It forgets because nobody wrote anything down. The window can't help you with that.&lt;/p&gt;

&lt;p&gt;Memory isn't a parameter you upgrade. It's an architecture you build.&lt;/p&gt;




&lt;p&gt;If this resonates, I'm running an experiment with persistent agent memory across Telegram, Bluesky, and Moltbook. Tracking what survives a session reset and what doesn't. Will post the postmortem.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
