<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Xandhi OS</title>
    <description>The latest articles on DEV Community by Xandhi OS (@xandhiai).</description>
    <link>https://dev.to/xandhiai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3926387%2F1257ed89-13ff-4ce8-bfa5-804e0c4c1505.png</url>
      <title>DEV Community: Xandhi OS</title>
      <link>https://dev.to/xandhiai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xandhiai"/>
    <language>en</language>
    <item>
      <title>Why I Chose Free AI Models Over GPT-4 for Code Generation (And What Happened)</title>
      <dc:creator>Xandhi OS</dc:creator>
      <pubDate>Tue, 12 May 2026 06:20:49 +0000</pubDate>
      <link>https://dev.to/xandhiai/why-i-chose-free-ai-models-over-gpt-4-for-code-generation-and-what-happened-e0n</link>
      <guid>https://dev.to/xandhiai/why-i-chose-free-ai-models-over-gpt-4-for-code-generation-and-what-happened-e0n</guid>
      <description>&lt;p&gt;When I started building Xandhi OS - an AI-native app builder - every advisor and Twitter reply told me the same thing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Just use GPT-4. Stop overthinking it."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I didn't. Here's what happened, with real observations, real failure modes, and zero marketing varnish.&lt;/p&gt;

&lt;h2&gt;The thesis&lt;/h2&gt;

&lt;p&gt;The thesis was simple: for code generation in 2025, the gap between top free models and GPT-4 has collapsed for most tasks - and where it hasn't, you can route around it.&lt;/p&gt;

&lt;p&gt;If that's true, building on free-first models means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dramatically lower cost per build&lt;/li&gt;
&lt;li&gt;Permanent free tier for users (real competitive advantage)&lt;/li&gt;
&lt;li&gt;No vendor lock-in to any single provider's pricing or roadmap&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If it's wrong, I quietly migrate to GPT-4 and eat the cost.&lt;/p&gt;

&lt;p&gt;So I tested.&lt;/p&gt;

&lt;h2&gt;The contenders&lt;/h2&gt;

&lt;p&gt;Through OpenRouter, I had access to dozens of models. I narrowed to a working set:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tier:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Llama 3.3 70B Instruct&lt;/li&gt;
&lt;li&gt;Qwen 2.5 72B&lt;/li&gt;
&lt;li&gt;DeepSeek V3 / DeepSeek-Coder&lt;/li&gt;
&lt;li&gt;Mistral Large (free quota)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Paid baselines:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4o&lt;/li&gt;
&lt;li&gt;Claude 3.5 Sonnet&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;How OpenRouter changes the game&lt;/h2&gt;

&lt;p&gt;OpenRouter is a unified API that routes requests to 100+ models behind a single endpoint. The killer feature isn't just access - it's fallback routing.&lt;/p&gt;

&lt;p&gt;You can declare a chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek/deepseek-coder:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/llama-3.3-70b:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-3.5-sonnet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# paid fallback
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the first model is rate-limited or fails, the system silently tries the next. From your app's perspective it's a single call that almost always succeeds.&lt;/p&gt;

&lt;p&gt;This is the architecture that made free-first viable. Without fallbacks, free tiers are too flaky for production. With fallbacks, they're solid.&lt;/p&gt;
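&lt;p&gt;Expressed as a request body, that chain is just an array. A minimal sketch of an OpenRouter chat-completions payload (the &lt;code&gt;models&lt;/code&gt; array is the fallback list; the prompt text is made up for illustration):&lt;/p&gt;

```python
import json

# One OpenRouter chat-completions request that declares a fallback
# chain: the entries in "models" are tried in order when the one
# before them is rate-limited or erroring.
def build_request(prompt):
    return {
        "models": [
            "deepseek/deepseek-coder:free",   # free first
            "meta-llama/llama-3.3-70b:free",  # second free option
            "anthropic/claude-3.5-sonnet",    # paid last resort
        ],
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Generate a React login form component.")
print(json.dumps(payload, indent=2))
```

&lt;p&gt;POST that body to OpenRouter's chat-completions endpoint with your API key; the fallback logic runs server-side, so your app sees one response.&lt;/p&gt;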

&lt;h2&gt;What I observed&lt;/h2&gt;

&lt;p&gt;I ran hundreds of real prompts from Xandhi OS - landing pages, dashboards, CRUD apps, auth flows - across each model category.&lt;/p&gt;

&lt;p&gt;Key findings:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Free models handle 85-90% of code generation tasks at near-parity with paid models.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For standard web application code - React components, CSS layouts, form handling, API routes - the quality difference between DeepSeek-Coder (free) and GPT-4o was minimal. Both produced clean, functional code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Paid models pull ahead on edge cases.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Where GPT-4o and Claude clearly won: complex multi-file refactors, subtle bug diagnosis in long contexts, and tasks requiring deep reasoning about application architecture. These represent roughly 10-15% of total generation tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Latency was comparable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Free models were sometimes faster than paid ones. The bottleneck was rarely the model itself but rather prompt size and response length.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The real quality lever is prompt engineering, not model selection.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same model with a better system prompt produced dramatically better output. I spent more time refining prompts than evaluating models.&lt;/p&gt;
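&lt;p&gt;To make that concrete, here's the shape of a strict system prompt - illustrative wording, not my production prompt:&lt;/p&gt;

```python
# Illustrative system prompt with explicit formatting rules. The real
# prompt differs, but the shape is the same: hard constraints stated
# up front, output format pinned down exactly.
SYSTEM_PROMPT = (
    "You are a code generator. Rules:\n"
    "1. Output ONLY code, no prose before or after.\n"
    "2. Wrap the code in a single fenced block.\n"
    "3. Include every import the file needs.\n"
    "4. Never truncate: emit the complete file.\n"
)

def build_messages(user_request):
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_request},
    ]

messages = build_messages("A Tailwind pricing page with three tiers.")
```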

&lt;h2&gt;My routing strategy&lt;/h2&gt;

&lt;p&gt;I don't pick one model. I pick the right model per task:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Best Free Model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Intent parsing&lt;/td&gt;
&lt;td&gt;Qwen 2.5 72B&lt;/td&gt;
&lt;td&gt;Excellent at structured reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spec generation&lt;/td&gt;
&lt;td&gt;DeepSeek Chat&lt;/td&gt;
&lt;td&gt;Clean JSON output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture planning&lt;/td&gt;
&lt;td&gt;DeepSeek Chat&lt;/td&gt;
&lt;td&gt;Good at system design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code generation&lt;/td&gt;
&lt;td&gt;DeepSeek-Coder&lt;/td&gt;
&lt;td&gt;Purpose-built for code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test generation&lt;/td&gt;
&lt;td&gt;Llama 3.1 8B&lt;/td&gt;
&lt;td&gt;Simple task, fast model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error debugging&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;Good error analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex healing&lt;/td&gt;
&lt;td&gt;Claude 3.5 Sonnet (paid)&lt;/td&gt;
&lt;td&gt;Last resort, ~5% of builds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: &lt;strong&gt;routing is more important than model selection.&lt;/strong&gt; Using the right model for each subtask outperforms using the best model for everything.&lt;/p&gt;

&lt;h2&gt;The cost math&lt;/h2&gt;

&lt;p&gt;For a typical build (user types a prompt, gets a complete app):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All GPT-4o approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~8-12 API calls across the pipeline&lt;/li&gt;
&lt;li&gt;Average cost: $0.08-0.15 per build&lt;/li&gt;
&lt;li&gt;At 1,000 builds/day: $80-150/day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Free-first routing approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same 8-12 calls, ~95% routed to free models&lt;/li&gt;
&lt;li&gt;Average cost: $0.003-0.008 per build (only paid fallbacks)&lt;/li&gt;
&lt;li&gt;At 1,000 builds/day: $3-8/day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's roughly a 20x cost reduction with minimal quality difference for most use cases.&lt;/p&gt;
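&lt;p&gt;The arithmetic behind that figure, using the midpoints of the ranges above:&lt;/p&gt;

```python
# Midpoint cost per build from the ranges above.
gpt4o_per_build = (0.08 + 0.15) / 2         # all-GPT-4o pipeline
free_first_per_build = (0.003 + 0.008) / 2  # ~95% free, paid fallbacks only

builds_per_day = 1000
gpt4o_daily = gpt4o_per_build * builds_per_day            # about 115/day
free_first_daily = free_first_per_build * builds_per_day  # about 5.5/day

reduction = gpt4o_daily / free_first_daily  # roughly 21x
print(f"${gpt4o_daily:.2f}/day vs ${free_first_daily:.2f}/day ({reduction:.0f}x)")
```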

&lt;h2&gt;What broke&lt;/h2&gt;

&lt;p&gt;Let me be honest about where free models struggled:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Long-context consistency.&lt;/strong&gt; When generating a 500+ line file, free models occasionally lost track of variable names or forgot imports declared earlier. Paid models handled this better.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mitigation:&lt;/em&gt; Break large files into smaller generation chunks. Generate imports separately from implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Complex TypeScript types.&lt;/strong&gt; Advanced generics, conditional types, and mapped types were hit-or-miss with free models.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mitigation:&lt;/em&gt; Use simpler type patterns in generated code. Add a type-checking step in the pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Rate limits.&lt;/strong&gt; Free tiers have usage caps. During high traffic, models become unavailable.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mitigation:&lt;/em&gt; Fallback chains. Always have 2-3 alternatives for every task. This is why OpenRouter's routing is essential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Instruction following edge cases.&lt;/strong&gt; Occasionally free models would ignore specific formatting instructions or add unwanted explanatory text around code blocks.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mitigation:&lt;/em&gt; Stronger system prompts with explicit formatting rules. Post-processing to strip non-code content.&lt;/p&gt;
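&lt;p&gt;The post-processing step can be as small as a regex that keeps only the fenced code - a sketch of the idea, not the production version:&lt;/p&gt;

```python
import re

TICKS = chr(96) * 3  # a literal triple-backtick fence marker

# Keep only the fenced code from a model reply; if there is no fence,
# assume the whole reply is code. A deliberately small sketch.
FENCE = re.compile(TICKS + r"[a-zA-Z]*\n(.*?)" + TICKS, re.DOTALL)

def strip_prose(reply):
    match = FENCE.search(reply)
    if match:
        return match.group(1).strip()
    return reply.strip()

reply = "Sure! Here is the file:\n" + TICKS + "jsx\nexport default App;\n" + TICKS
print(strip_prose(reply))
```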

&lt;h2&gt;The self-healing discovery&lt;/h2&gt;

&lt;p&gt;The single highest-ROI feature I built wasn't model routing - it was auto-debugging.&lt;/p&gt;

&lt;p&gt;When generated code has errors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run the code through a linter&lt;/li&gt;
&lt;li&gt;Capture error messages&lt;/li&gt;
&lt;li&gt;Feed errors back to the AI with the original code&lt;/li&gt;
&lt;li&gt;Ask it to fix only the errors&lt;/li&gt;
&lt;li&gt;Re-lint and verify&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This simple loop eliminated roughly 60% of broken builds. And it works equally well with free and paid models, because error-fixing is a focused, well-defined task that doesn't require frontier-model reasoning.&lt;/p&gt;
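&lt;p&gt;Reduced to a skeleton, the loop looks like this - with Python's built-in &lt;code&gt;compile()&lt;/code&gt; standing in for the linter and a stub standing in for the model call:&lt;/p&gt;

```python
# Skeleton of the lint / feed-back / re-lint loop. compile() stands in
# for the linter and generate_fix stands in for the model call.
def heal(code, generate_fix, max_rounds=3):
    for _ in range(max_rounds):
        try:
            compile(code, "generated.py", "exec")  # lint: syntax check
            return code                            # clean build
        except SyntaxError as err:
            report = f"line {err.lineno}: {err.msg}"
            code = generate_fix(code, report)      # one focused fix turn
    return code  # give up after max_rounds; caller escalates

# Stand-in "model" that fixes a known missing-colon bug.
def fake_fix(code, report):
    return code.replace("def add(a, b)\n", "def add(a, b):\n")

broken = "def add(a, b)\n    return a + b\n"
healed = heal(broken, fake_fix)
```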

&lt;h2&gt;What I'd recommend&lt;/h2&gt;

&lt;p&gt;If you're building an AI-powered tool and considering your model strategy:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Start free-first, add paid as surgical fallbacks.&lt;/strong&gt; Don't default to the most expensive model. Route intelligently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Build fallback chains, not single-model dependencies.&lt;/strong&gt; Any model can go down or get rate-limited. Always have alternatives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Invest in prompt engineering before model shopping.&lt;/strong&gt; A well-crafted prompt with a free model beats a lazy prompt with GPT-4.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Add self-healing loops.&lt;/strong&gt; Don't make the user debug AI-generated code. Feed errors back automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Measure quality per-task, not globally.&lt;/strong&gt; "Which model is best?" is the wrong question. "Which model is best for this specific subtask?" is the right one.&lt;/p&gt;

&lt;h2&gt;The bottom line&lt;/h2&gt;

&lt;p&gt;Free AI models in 2025 are good enough for production code generation in most scenarios. The gap with paid models exists but is narrow and shrinking. With intelligent routing, fallback chains, and self-healing, you can build a reliable, high-quality AI tool at a fraction of the cost.&lt;/p&gt;

&lt;p&gt;That's exactly what we did with Xandhi OS.&lt;/p&gt;

&lt;h2&gt;Try it yourself&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Website: &lt;a href="https://xandhi.com" rel="noopener noreferrer"&gt;xandhi.com&lt;/a&gt; (free to start)&lt;/li&gt;
&lt;li&gt;Discord: &lt;a href="https://discord.gg/uAxufdAnD" rel="noopener noreferrer"&gt;discord.gg/uAxufdAnD&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Twitter: &lt;a href="https://twitter.com/xandhios" rel="noopener noreferrer"&gt;@xandhios&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/xandhiai/xandhi-os" rel="noopener noreferrer"&gt;github.com/xandhiai/xandhi-os&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building with AI models and want to compare notes on routing strategies, join the Discord. I nerd out about this stuff daily.&lt;/p&gt;

&lt;p&gt;-- Built with persistence in New Delhi&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
    <item>
      <title>How I Built an AI App Builder That Generates Production Code in Minutes</title>
      <dc:creator>Xandhi OS</dc:creator>
      <pubDate>Tue, 12 May 2026 06:19:52 +0000</pubDate>
      <link>https://dev.to/xandhiai/how-i-built-an-ai-app-builder-that-generates-production-code-in-minutes-1lb5</link>
      <guid>https://dev.to/xandhiai/how-i-built-an-ai-app-builder-that-generates-production-code-in-minutes-1lb5</guid>
      <description>&lt;p&gt;I didn't set out to build an AI builder. I set out to stop hating side-project setup.&lt;/p&gt;

&lt;p&gt;Every time I had a new idea - a job board, a habit tracker, a small CRM for a friend's agency - I'd lose the first two evenings to the same ritual: npm init, auth boilerplate, schema design, Tailwind config, deploy pipeline. By the time I got to the actual idea, the spark was gone.&lt;/p&gt;

&lt;p&gt;So I started building a tool to skip the ritual. That tool became Xandhi OS.&lt;/p&gt;

&lt;h2&gt;The 90-second pitch&lt;/h2&gt;

&lt;p&gt;You type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A SaaS dashboard with team workspaces, billing integration, and a public marketing page."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Xandhi OS routes that prompt through nine layers and hands you back a complete, downloadable application - frontend, styling, interactivity, and structure - in minutes. Real code. Not a sandbox. Not a snippet.&lt;/p&gt;

&lt;h2&gt;Why most AI builders frustrated me&lt;/h2&gt;

&lt;p&gt;Three things made me allergic to the existing options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They were either too shallow or too magical. Some give beautiful components but no backend structure. Others give working apps but lock you to expensive paid models with metered tokens.&lt;/li&gt;
&lt;li&gt;They didn't think about cost discipline. If I'm going to prototype 20 ideas before finding the one, I cannot pay $2 per prototype.&lt;/li&gt;
&lt;li&gt;The output felt like a black box. I want the actual code, in my hands, deployable anywhere.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted something different: deep, transparent, and cheap to experiment with.&lt;/p&gt;

&lt;h2&gt;The 9-layer architecture&lt;/h2&gt;

&lt;p&gt;The core insight was that "AI builder" isn't one job - it's a pipeline. Each stage has different requirements (latency, reasoning depth, structured output, creativity), which means each stage benefits from a different approach.&lt;/p&gt;

&lt;p&gt;Here's the pipeline I ended up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Intent Parser        - what does the user actually want?
2. Spec Generator       - turn the intent into a structured spec
3. Architecture Planner - choose stack, modules, data model
4. Component Composer   - UI layout, page tree, design tokens
5. Code Generator       - write the actual files
6. Linter               - check for syntax and style issues
7. Auto-Debugger        - fix errors automatically
8. Security Scanner     - check for vulnerabilities
9. Packager             - bundle everything for download
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer has a defined input/output contract. The orchestrator is a state machine walking the prompt through them.&lt;/p&gt;
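&lt;p&gt;Stripped to its core, the orchestrator is a fold over named stages. A sketch with toy stage functions (the real stages call models and validate their contracts):&lt;/p&gt;

```python
# Each stage takes the shared build state (a dict) and returns it with
# its own output added. Toy stages for illustration; the real ones
# call models and enforce input/output contracts.
def intent_parser(state):
    state["intent"] = f"parsed: {state['prompt']}"
    return state

def spec_generator(state):
    state["spec"] = {"pages": ["landing"], "from": state["intent"]}
    return state

PIPELINE = [
    ("intent", intent_parser),
    ("spec", spec_generator),
    # ...seven more layers in the real pipeline
]

def run_build(prompt):
    state = {"prompt": prompt}
    for name, stage in PIPELINE:
        state = stage(state)          # walk the state machine
        state["last_stage"] = name    # record progress for observability
    return state

result = run_build("a job board")
```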

&lt;h2&gt;Multi-model routing: the unsung trick&lt;/h2&gt;

&lt;p&gt;Here's what makes Xandhi OS efficient to run.&lt;/p&gt;

&lt;p&gt;Through OpenRouter, the system routes through 13 AI models from 6 providers. Instead of being locked to one expensive model, it picks the best model for each task:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Planning stages use models good at reasoning and structured output&lt;/li&gt;
&lt;li&gt;Code generation uses models specialized in code quality&lt;/li&gt;
&lt;li&gt;Debugging uses models with strong error analysis capabilities&lt;/li&gt;
&lt;li&gt;Simple tasks use lightweight, fast models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simplified routing concept:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ROUTING&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen-2.5-72b:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/llama-3.3-70b:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek/deepseek-chat:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek/deepseek-coder:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/llama-3.3-70b:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/llama-3.3-70b:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ROUTING&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call_openrouter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;  &lt;span class="c1"&gt;# try next model in fallback chain
&lt;/span&gt;    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All models failed for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the core concept. Add retries, circuit breakers, and observability around it, and you have a production routing layer.&lt;/p&gt;
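&lt;p&gt;One of those additions - a per-model circuit breaker - fits in a few lines. The threshold here is illustrative:&lt;/p&gt;

```python
# Skip a model once it has failed too many times in a row; any success
# resets its counter. Threshold is illustrative; the production version
# also tracks a cool-down window before retrying a broken model.
FAILURE_THRESHOLD = 3
failures = {}

def circuit_open(model):
    return failures.get(model, 0) >= FAILURE_THRESHOLD

def record(model, ok):
    failures[model] = 0 if ok else failures.get(model, 0) + 1

def pick_model(chain):
    for model in chain:
        if not circuit_open(model):
            return model
    raise RuntimeError("all models in the chain are circuit-broken")

chain = ["deepseek/deepseek-coder:free", "anthropic/claude-3.5-sonnet"]
for _ in range(3):
    record("deepseek/deepseek-coder:free", ok=False)  # three straight failures
```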

&lt;h2&gt;The 7-step auto-debug pipeline&lt;/h2&gt;

&lt;p&gt;This was the feature that changed everything. Instead of handing users broken code and saying "fix it yourself," Xandhi OS runs every generated file through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Lint&lt;/strong&gt; - check syntax&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error detection&lt;/strong&gt; - find logical issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-fix&lt;/strong&gt; - feed errors back to AI for correction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-lint&lt;/strong&gt; - verify fixes worked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security scan&lt;/strong&gt; - check for XSS, injection, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-file validation&lt;/strong&gt; - ensure each file is complete&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Final verification&lt;/strong&gt; - confirm the build is clean&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This eliminates roughly 60% of "broken build" complaints. Just feeding the error back into the model with one more turn fixes most generation bugs.&lt;/p&gt;
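&lt;p&gt;Step 6 is mostly cheap mechanical checks. A sketch of one of them - bracket balance as a truncation detector (illustrative, not exhaustive):&lt;/p&gt;

```python
# Cheap completeness checks for a generated file: non-empty and not
# obviously truncated (balanced brackets). A sketch; the real checks
# are per-language and also look at strings and comments.
PAIRS = {"}": "{", ")": "(", "]": "["}

def looks_complete(source):
    if not source.strip():
        return False
    stack = []
    for ch in source:
        if ch in "{([":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # leftover openers mean a truncated file

ok_file = "function App() { return null; }\n"
cut_file = "function App() { return (\n"
```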

&lt;h2&gt;What surprised me&lt;/h2&gt;

&lt;p&gt;Free models are stunningly good in 2025. They match paid model output quality around 85-90% of the time on structured code tasks.&lt;/p&gt;

&lt;p&gt;The bottleneck isn't the model - it's the prompt scaffolding. The same model with better system prompts produced roughly twice the usable output quality.&lt;/p&gt;

&lt;p&gt;Self-healing eliminates most complaints. Auto-feeding build errors back into the model is the single highest-ROI feature I built.&lt;/p&gt;

&lt;p&gt;People want the code more than the deploy. I assumed deploy-first. Users overwhelmingly asked for "give me the downloadable files."&lt;/p&gt;

&lt;h2&gt;The tech stack&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Next.js 15, React, TypeScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend API&lt;/td&gt;
&lt;td&gt;Go (Fiber framework)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Engine&lt;/td&gt;
&lt;td&gt;Python (FastAPI), OpenRouter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL 16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caching&lt;/td&gt;
&lt;td&gt;Redis 7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Proxy&lt;/td&gt;
&lt;td&gt;Nginx (SSL, gzip, rate limiting)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting&lt;/td&gt;
&lt;td&gt;Hetzner Cloud (Germany)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Payments&lt;/td&gt;
&lt;td&gt;Razorpay (India + International)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;Results so far&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;22+ page types and templates live&lt;/li&gt;
&lt;li&gt;13 AI models across 6 providers&lt;/li&gt;
&lt;li&gt;7-step auto-debug pipeline&lt;/li&gt;
&lt;li&gt;Average build time: 3-5 minutes&lt;/li&gt;
&lt;li&gt;Cost per build for free tier users: zero&lt;/li&gt;
&lt;li&gt;Active Discord community&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I learned (the unsexy version)&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pipelines beat monoliths.&lt;/strong&gt; One big prompt to "build me an app" is a coin flip. Nine small prompts, each evaluated independently, is engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Routing matters more than model choice.&lt;/strong&gt; Picking which model when matters more than picking the best model for everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make the output ownable.&lt;/strong&gt; Users trust tools that hand them the code, not tools that hide it behind a paywall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ship in public.&lt;/strong&gt; I underestimated how much momentum Discord plus daily build-logs creates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Be honest about limitations.&lt;/strong&gt; Users forgive imperfect code generation. They don't forgive fake metrics or misleading claims.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;If any of this resonated, Xandhi OS is free to try:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Website: &lt;a href="https://xandhi.com" rel="noopener noreferrer"&gt;xandhi.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Discord: &lt;a href="https://discord.gg/uAxufdAnD" rel="noopener noreferrer"&gt;discord.gg/uAxufdAnD&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Twitter: &lt;a href="https://twitter.com/xandhios" rel="noopener noreferrer"&gt;@xandhios&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/xandhiai/xandhi-os" rel="noopener noreferrer"&gt;github.com/xandhiai/xandhi-os&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tell me what you'd build. I might ship a template for it this week.&lt;/p&gt;

&lt;p&gt;-- Built with too much chai in New Delhi&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>indiehackers</category>
      <category>buildinpublic</category>
    </item>
  </channel>
</rss>
