<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: saltxd</title>
    <description>The latest articles on DEV Community by saltxd (@saltxd).</description>
    <link>https://dev.to/saltxd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4013913%2F455c3e56-eda0-48ae-a2ad-b48d4570c17d.jpg</url>
      <title>DEV Community: saltxd</title>
      <link>https://dev.to/saltxd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saltxd"/>
    <language>en</language>
    <item>
      <title>I replaced 1,000 lines of Python with a 500-word prompt</title>
      <dc:creator>saltxd</dc:creator>
      <pubDate>Fri, 03 Jul 2026 17:14:14 +0000</pubDate>
      <link>https://dev.to/saltxd/i-replaced-1000-lines-of-python-with-a-500-word-prompt-29ao</link>
      <guid>https://dev.to/saltxd/i-replaced-1000-lines-of-python-with-a-500-word-prompt-29ao</guid>
      <description>&lt;p&gt;&lt;em&gt;Or: what building a $0/month autonomous AI librarian taught me about running LLM agents in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;My documentation wiki was rotting. Not dramatically — just the usual entropy: pages with no tags, books filed on the wrong shelf, duplicated titles, stale metadata blocks pasted from templates. I had written governance rules. Nobody was enforcing them, because the only "body" available was me at 11pm.&lt;/p&gt;

&lt;p&gt;So I built an autonomous agent to do it. Twice, it turned out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Version 1: the framework reflex
&lt;/h2&gt;

&lt;p&gt;The first version was what most engineers would build in 2026: a Python service. Rule engine for the mechanical checks, an LLM API call for the judgment calls, an executor module to apply fixes, an SDK wrapper, retry logic. About a thousand lines.&lt;/p&gt;

&lt;p&gt;It worked. It also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cost ~$0.70 per run, because one badly-scoped rule sent every page to the model without a pre-filter,&lt;/li&gt;
&lt;li&gt;died mid-sweep the first live night when API credits ran out,&lt;/li&gt;
&lt;li&gt;and hid two genuine integration bugs (a skipped protocol handshake that made tool calls silently no-op) inside plumbing I had written myself and therefore trusted.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I retired it the same day it shipped.&lt;/p&gt;

&lt;h2&gt;
  
  
  Version 2: the prompt is the program
&lt;/h2&gt;

&lt;p&gt;The realization: an agentic CLI (I use Claude Code, but the shape generalizes) already &lt;strong&gt;is&lt;/strong&gt; the rule engine, the tool executor, the retry loop, and the judgment module. I had rebuilt, worse, the thing I was paying for.&lt;/p&gt;

&lt;p&gt;Version 2 is a Kubernetes CronJob. The container has the CLI installed. The entrypoint is ~50 lines of shell: load a prompt from a ConfigMap, point the CLI at an MCP server that exposes my wiki's API as tools, run headless, post the summary to my chat channel. The prompt — about 500 words — &lt;em&gt;is&lt;/em&gt; the curator logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CronJob (weekly, Sunday 04:00)
  └─ agent pod (fresh per run)
       ├─ agentic CLI + auth
       ├─ ConfigMap: system-prompt.md   ← the "codebase"
       ├─ MCP server: wiki API as tools
       └─ entrypoint: load prompt → run → post summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because it authenticates with the subscription I already pay for rather than metered API keys, the marginal cost per run is zero. That's not an accounting footnote — it changed the architecture. When runs are free, you can afford to re-scan everything weekly instead of building incremental-state machinery. Half of v1's code existed to avoid API spend.&lt;/p&gt;

&lt;p&gt;And the prompt-based version finds &lt;em&gt;more&lt;/em&gt; real problems than the Python version did, with better judgment — it flags near-miss duplicate titles for human review instead of blindly "fixing" them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part that actually matters: letting an agent edit things
&lt;/h2&gt;

&lt;p&gt;Giving an LLM write access to a system you care about is the whole question. Permissions alone don't answer it. Three design choices did the work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Two lanes, not one.&lt;/strong&gt; Every finding is classified: &lt;em&gt;Tier 1&lt;/em&gt; (mechanically obvious — auto-fix) or &lt;em&gt;Tier 2&lt;/em&gt; (judgment — propose to the human, touch nothing). Tag-case normalization is Tier 1. "This page might belong in a different book" is Tier 2 forever, because moving someone's content is disruptive even when you're right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. An edit-review gate the agent runs against itself.&lt;/strong&gt; Before &lt;em&gt;any&lt;/em&gt; write, the prompt requires the agent to state the exact change and ask: &lt;em&gt;"Could a reasonable maintainer object to this?"&lt;/em&gt; If yes — or anywhere near yes — the item is demoted to the propose lane with a reason. The bias is explicit: a bad auto-fix costs more than another week of backlog. In months of weekly runs, zero regressions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Undo as a first-class dependency.&lt;/strong&gt; The wiki keeps full page revision history, so every agent edit is one click from reverted. I would not point this pattern at a system without native versioning. Reversibility, not permission scoping, is what makes autonomy safe.&lt;/p&gt;

&lt;p&gt;One more habit that earned its place: the agent reads its own previous report before starting, so every summary carries a trend line — &lt;em&gt;"12 items awaiting review, was 14 last run, net −2."&lt;/em&gt; A snapshot tells you nothing; the derivative tells you whether the system is healing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons I'd generalize
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Delete the framework.&lt;/strong&gt; If your agent code is mostly orchestration — routing, retries, tool dispatch — you've rebuilt the harness. Ship a prompt and a schedule instead. Mine went from ~1,000 lines to ~50 plus 500 words of English.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts are code.&lt;/strong&gt; The curator prompt lives in git, deploys through the same chart as everything else, and gets reviewed like a module — because it is one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Give agents an undo, not just an allowlist.&lt;/strong&gt; Scoped credentials limit blast radius; versioned targets make mistakes cheap. You want both, but if I had to pick one, I'd pick undo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make the agent defend each write.&lt;/strong&gt; A self-applied "would a maintainer object?" gate sounds soft. Empirically it's the difference between an agent you trust weekly and one you babysit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marginal cost shapes architecture.&lt;/strong&gt; Free re-runs beat clever state. Before optimizing an agent, check whether the economics that forced the complexity still exist.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The same skeleton now runs other jobs in my cluster — different prompt, same image, same MCP tools, same report-to-chat pattern. Each new agent is a config change, not a codebase.&lt;/p&gt;

&lt;p&gt;The wiki, for the record, is clean. The librarian doesn't sleep, doesn't get bored, and — unlike v1 — has never once billed me.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm writing a series on AI-native, self-hosted IT automation: running LLM agents in production on infrastructure you own — safely, cheaply, and without a SaaS bill. Follow along if that's your kind of problem.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>kubernetes</category>
      <category>selfhosted</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
