<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 东道主</title>
    <description>The latest articles on DEV Community by 东道主 (@nalnanananana).</description>
    <link>https://dev.to/nalnanananana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4014323%2F01949b62-709c-4d67-aa24-9f915eff9f35.jpg</url>
      <title>DEV Community: 东道主</title>
      <link>https://dev.to/nalnanananana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nalnanananana"/>
    <language>en</language>
    <item>
      <title>I spent ~500M tokens building a prompt optimization tool</title>
      <dc:creator>东道主</dc:creator>
      <pubDate>Sat, 04 Jul 2026 03:05:54 +0000</pubDate>
      <link>https://dev.to/nalnanananana/i-burned-through-roughly-500m-tokens-building-a-prompt-optimization-tool-30mh</link>
      <guid>https://dev.to/nalnanananana/i-burned-through-roughly-500m-tokens-building-a-prompt-optimization-tool-30mh</guid>
      <description>&lt;p&gt;Hey everyone,&lt;/p&gt;

&lt;p&gt;I've been working on an automated prompt optimization project for a while now, and I've gone through roughly 500M tokens iterating on the core loop.&lt;/p&gt;

&lt;p&gt;Along the way, I tried leaning on pretty much every major model out there (GLM, DeepSeek, GPT, Claude, you name it) to help me refine the architecture. But honestly, their output was underwhelming for this specific task, and most of their built-in agent/skill features did little to help me actually design a better optimization pipeline.&lt;/p&gt;

&lt;p&gt;This is the core design pattern I'm currently running with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        ┌──────────────────────────────────────────────────────┐
        ▼                                                        │
Current Prompt ──► Evaluate (target + judge) ──► Score + deductions
        ▲                                                        │
        │                                                        ▼
Optimizer Model ◄────────── rewrite from feedback ◄─── keep best-scoring version
        (repeats until round budget is hit; highest-scoring prompt wins)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
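
&lt;p&gt;For anyone who prefers code to boxes and arrows, here is a minimal Python sketch of the loop in the diagram above. It is not the tool's actual code: the names (Evaluation, optimize, toy_evaluate, toy_rewrite) are hypothetical, the toy functions stand in for real target/judge and optimizer model calls, and it assumes each rewrite starts from the best-scoring version so far.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evaluation:
    score: float            # higher is better
    deductions: list[str]   # the judge's reasons for lost points


def optimize(
    prompt: str,
    evaluate: Callable[[str], Evaluation],
    rewrite: Callable[[str, list[str]], str],
    rounds: int,
) -> tuple[str, Evaluation]:
    """Run the evaluate/rewrite loop and return the best prompt seen."""
    best_prompt, best_eval = prompt, evaluate(prompt)
    for _ in range(rounds):  # round budget
        # Assumption: rewrite from the best version so far,
        # using that version's deductions as feedback.
        candidate = rewrite(best_prompt, best_eval.deductions)
        candidate_eval = evaluate(candidate)
        # Keep the candidate only if it strictly improves the score.
        if candidate_eval.score > best_eval.score:
            best_prompt, best_eval = candidate, candidate_eval
    return best_prompt, best_eval


# Toy stand-ins for the target/judge and optimizer models (hypothetical).
REQUIRED = ["step by step", "cite sources"]


def toy_evaluate(prompt: str) -> Evaluation:
    # Deduct 5 points for every required phrase the prompt lacks.
    missing = [phrase for phrase in REQUIRED if phrase not in prompt]
    return Evaluation(
        score=10.0 - 5.0 * len(missing),
        deductions=[f"missing: {m}" for m in missing],
    )


def toy_rewrite(prompt: str, deductions: list[str]) -> str:
    if not deductions:
        return prompt
    # Address one deduction per round.
    return prompt + " Please " + deductions[0].removeprefix("missing: ") + "."


best, result = optimize("Answer the question.", toy_evaluate, toy_rewrite, rounds=3)
print(best, result.score)
```

&lt;p&gt;Passing evaluate and rewrite in as plain callables keeps the loop independent of any particular model or API, so real model calls could replace the toy functions without touching optimize.&lt;/p&gt;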



&lt;p&gt;I've added a few extras on top: a prompt library, a test question bank, and some other quality-of-life features. But I can't shake the feeling that these are surface-level additions that don't really move the needle on how well the core optimization actually works.&lt;/p&gt;

&lt;p&gt;That's why I'm posting here. I'd love to get this community's take:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What would you change about this core loop to make it fundamentally better?&lt;/li&gt;
&lt;li&gt;What features do you actually find valuable in a prompt optimization tool, beyond the basics?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm relatively new to sharing my work here, so any advice, critiques, or wild ideas are greatly appreciated. Thanks in advance!&lt;/p&gt;


</description>
      <category>ai</category>
      <category>promptengineering</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
