<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AI Dev Hub</title>
    <description>The latest articles on DEV Community by AI Dev Hub (@aidevhub).</description>
    <link>https://dev.to/aidevhub</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3769170%2F51b2c1be-6090-4a70-b86f-000759e46929.png</url>
      <title>DEV Community: AI Dev Hub</title>
      <link>https://dev.to/aidevhub</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aidevhub"/>
    <language>en</language>
    <item>
      <title>MCP Inspector vs Postman in 2026: which one I actually use</title>
      <dc:creator>AI Dev Hub</dc:creator>
      <pubDate>Tue, 23 Jun 2026 14:00:03 +0000</pubDate>
      <link>https://dev.to/aidevhub/mcp-inspector-vs-postman-in-2026-which-one-i-actually-use-3j37</link>
      <guid>https://dev.to/aidevhub/mcp-inspector-vs-postman-in-2026-which-one-i-actually-use-3j37</guid>
      <description>&lt;h1&gt;
  
  
  MCP Inspector vs Postman in 2026: which one I actually use
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;I use two of the three: MCP Inspector for live calls, and a small client-side validator for checking definitions before I ever start a server. Postman's MCP support works, but it was too much setup for the quick checks I do most. Same broken tool, run through all three, below.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Full disclosure: the MCP Tool Tester I link to below is one I built. I'd tried four other validators, and every one needed a running server, an npm install, or an account before it'd tell me my inputSchema had a typo. Mine doesn't. It's free, runs entirely in your browser, no signup, nothing uploaded. Paste it in, get an answer. If you've got a better one, tell me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The task: a broken currency tool
&lt;/h2&gt;

&lt;p&gt;Three weeks ago I was wiring up an MCP server for a currency tool, and the agent kept refusing to call it. No error message. It just ignored the tool. After 47 minutes of squinting I found it: my handler read a field called &lt;code&gt;from_currency&lt;/code&gt;, but the schema I advertised defined &lt;code&gt;currency_from&lt;/code&gt;. The model saw a contract it couldn't satisfy and quietly walked away.&lt;/p&gt;

&lt;p&gt;Here's the thing about MCP tool definitions: the schema you advertise and the handler you write live in two different places, and nothing forces them to agree. JSON Schema will happily describe a field your code never reads. Most agents won't tell you why they skipped a tool; they just skip it. The expected behavior here was simple: send 100 USD with EUR as the target, get a converted amount back. What I actually got was nothing, no call attempted, which is the worst kind of bug because there's no stack trace to follow.&lt;/p&gt;

&lt;p&gt;So I rebuilt that broken tool on purpose and ran it through three things people reach for when testing MCP: the official Inspector, Postman, and the validator I made. Here's the server, mismatch and all.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// server.js  (run: npm i @modelcontextprotocol/sdk zod &amp;amp;&amp;amp; node server.js)&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;McpServer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/mcp.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StdioServerTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/stdio.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;McpServer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fx&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registerTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;convert_currency&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Convert an amount between two ISO 4217 currency codes&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;amount to convert&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;currency_from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;currency_to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// Bug: the schema defines currency_from, the handler reads from_currency.&lt;/span&gt;
  &lt;span class="c1"&gt;// The names never line up, so the agent sees a field it can't supply.&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;from_currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;currency_to&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;from_currency&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; -&amp;gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;currency_to&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StdioServerTransport&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
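
&lt;p&gt;To see the failure mode in isolation, here is a minimal sketch with no SDK involved. It assumes the handler only ever receives the argument names the schema defines; &lt;code&gt;validatedArgs&lt;/code&gt;, &lt;code&gt;brokenHandler&lt;/code&gt;, and &lt;code&gt;fixedHandler&lt;/code&gt; are names made up for this illustration, not part of the server above.&lt;/p&gt;

```javascript
// Hypothetical stand-in for the arguments a client sends after reading the advertised schema.
const validatedArgs = { amount: 100, currency_from: "USD", currency_to: "EUR" };

// Same destructuring as the broken handler: from_currency is not a key in the arguments.
const brokenHandler = ({ amount, from_currency, currency_to }) =>
  `${amount} ${from_currency} -> ${currency_to}`;

// Fix: destructure the name the schema actually defines.
const fixedHandler = ({ amount, currency_from, currency_to }) =>
  `${amount} ${currency_from} -> ${currency_to}`;

const brokenText = brokenHandler(validatedArgs);
const fixedText = fixedHandler(validatedArgs);

console.log(brokenText); // 100 undefined -> EUR
console.log(fixedText);  // 100 USD -> EUR
```

&lt;p&gt;In this stripped-down form the mismatch surfaces as &lt;code&gt;undefined&lt;/code&gt; in the output rather than an exception. A handler that does real work with the value would more likely fail further down, which is part of why the cause is hard to spot.&lt;/p&gt;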



&lt;h2&gt;
  
  
  MCP Inspector: what happened
&lt;/h2&gt;

&lt;p&gt;MCP Inspector is the official debugger. You run &lt;code&gt;npx @modelcontextprotocol/inspector node server.js&lt;/code&gt; and it opens a local UI where you can list a server's tools and call them by hand. I ran it last Tuesday against the broken server above.&lt;/p&gt;

&lt;p&gt;It connected in about two seconds and the tool showed up. Inspector reads your inputSchema and renders a form, so the field labels it expected (including &lt;code&gt;currency_from&lt;/code&gt;) were right there on screen. That's where the bug got visible to me, because I knew my handler wanted &lt;code&gt;from_currency&lt;/code&gt;. When I filled the form and hit call, my own handler threw on an undefined field. Honest, but late: I had to boot a full server to learn something a static check could have told me in seconds.&lt;/p&gt;

&lt;p&gt;One more thing worth flagging. Inspector caches the tool list per session, so when I edited the schema and restarted the server, I had to reconnect to see the change. Minor, but I lost a couple of minutes the first time wondering why my fix wasn't showing. Once you learn to reconnect after every restart, it's fine. The history panel is also handy for replaying a call you already got working.&lt;/p&gt;

&lt;p&gt;Inspector's strength is that it talks to a real, running server over the actual transport. Its limit is the same thing. It can't say a word about a definition until there's a live process to connect to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Postman: what happened
&lt;/h2&gt;

&lt;p&gt;Postman shipped MCP support in 2025, and for HTTP-based servers it's solid. I pointed it at the same tool after switching the transport to streamable HTTP, because Postman won't drive a stdio process. It discovered the tool, showed the schema, and let me send a call.&lt;/p&gt;

&lt;p&gt;The request builder is nicer than Inspector's plain form, I'll give it that. Two things bugged me, though. I had to change my transport just to test, so I was poking at a slightly different server than the one I ship. And the validation is shallow: it happily sent garbage and reported the failure as a generic error response instead of pointing at the field that was wrong. Setup ate about 15 minutes. If you already live in Postman, that cost is mostly paid. For a fast definition check, it's heavy.&lt;/p&gt;

&lt;p&gt;To be fair to Postman, the collection sharing is real value if you're on a team. I could save the MCP connection and hand it to a coworker, and they'd get the same setup without me writing a README. That's something neither Inspector nor my validator does. It just doesn't help the specific thing I was testing, which was whether a definition is correct before anyone runs it.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Tool Tester: what happened
&lt;/h2&gt;

&lt;p&gt;This one I built, so weigh that accordingly. The workflow is intentionally dumb: paste the tool definition (the JSON a server advertises, or a WebMCP &lt;code&gt;tools&lt;/code&gt; array) and it checks the shape against the MCP schema plus a handful of lint rules I kept tripping over in real projects.&lt;/p&gt;

&lt;p&gt;On the broken currency tool it flagged the mismatch in under a second: a required name with no matching property. It also caught two issues the other two never looked at: a description longer than the cutoff where some clients truncate (I still don't know the exact limit for every client, but I've watched it break around 1,024 characters) and an enum with a duplicated value. No server to boot, no install. It runs in the browser, so the definition never leaves your machine, which I care about because my tool descriptions leak internal endpoint names.&lt;/p&gt;

&lt;p&gt;The lint rules came straight from bugs that cost me time. A required field with no property to back it. A description left empty, which makes some clients drop the tool entirely. I keep adding rules as I get burned, so the list grows in an embarrassingly autobiographical way. A few days ago a teammate's PR defined a tool that required &lt;code&gt;user_id&lt;/code&gt; but only declared &lt;code&gt;userId&lt;/code&gt;. Same family of bug as mine. The validator flagged it before review started, which saved a confused back-and-forth in comments.&lt;/p&gt;
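
&lt;p&gt;The rules described above can be sketched in a few lines. This is not the validator's actual code, just a minimal illustration: &lt;code&gt;lintToolDefinition&lt;/code&gt; and the sample &lt;code&gt;get_user&lt;/code&gt; tool are hypothetical, and the sketch assumes the definition is the plain JSON object a server advertises, with &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and a JSON Schema &lt;code&gt;inputSchema&lt;/code&gt;. The 1,024-character default is the rough figure I mentioned earlier, not a documented limit.&lt;/p&gt;

```javascript
// Minimal lint sketch for one MCP tool definition (plain JSON, as advertised by a server).
// Hypothetical helper; the rules mirror the ones described in the text.
function lintToolDefinition(tool, maxDescriptionLength = 1024) {
  const problems = [];
  const schema = tool.inputSchema || {};
  const properties = schema.properties || {};
  const required = Array.isArray(schema.required) ? schema.required : [];

  // Rule: every required name needs a matching property.
  for (const name of required) {
    if (!Object.prototype.hasOwnProperty.call(properties, name)) {
      problems.push(`required field "${name}" has no matching property`);
    }
  }

  // Rule: the description must be non-empty and not overly long.
  const description = typeof tool.description === "string" ? tool.description.trim() : "";
  if (description.length === 0) {
    problems.push("description is empty");
  } else if (description.length > maxDescriptionLength) {
    problems.push(`description is longer than ${maxDescriptionLength} characters`);
  }

  // Rule: enum values should be unique.
  for (const [name, prop] of Object.entries(properties)) {
    if (Array.isArray(prop?.enum)) {
      const unique = new Set(prop.enum);
      if (unique.size !== prop.enum.length) {
        problems.push(`enum for "${name}" contains a duplicated value`);
      }
    }
  }

  return problems;
}

// Made-up definition modeled on the PR described above: requires user_id, declares userId.
const sampleProblems = lintToolDefinition({
  name: "get_user",
  description: "Look up a user",
  inputSchema: {
    type: "object",
    properties: { userId: { type: "string" } },
    required: ["user_id"],
  },
});

console.log(sampleProblems);
```

&lt;p&gt;Returning a list of messages instead of throwing on the first problem means one paste reports every issue at once, which fits a check you run before review.&lt;/p&gt;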

&lt;p&gt;What it won't do is execute your tool. It checks definitions, not behavior. Think of it as the step you run before Inspector.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scorecard, and which one I reach for
&lt;/h2&gt;

&lt;p&gt;Here's the same task scored across the things I actually care about:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;MCP Inspector&lt;/th&gt;
&lt;th&gt;Postman&lt;/th&gt;
&lt;th&gt;MCP Tool Tester&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Setup to first result&lt;/td&gt;
&lt;td&gt;about 10s via npx&lt;/td&gt;
&lt;td&gt;about 15 min&lt;/td&gt;
&lt;td&gt;about 3s, paste only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Needs a running server&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flags a bad definition pre-deploy&lt;/td&gt;
&lt;td&gt;only as a runtime error&lt;/td&gt;
&lt;td&gt;shallow&lt;/td&gt;
&lt;td&gt;yes, field-level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Actually calls the tool&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validates WebMCP tool arrays&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;partial&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;free, open source&lt;/td&gt;
&lt;td&gt;free tier, paid plans&lt;/td&gt;
&lt;td&gt;free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is clear enough: Postman never wins a row outright for my use case. It's competent everywhere and best nowhere, which is a perfectly respectable place to be for a general tool that happens to speak MCP.&lt;/p&gt;

&lt;p&gt;So which do I actually use? Two of them, at different moments. While I'm authoring a definition or reviewing a teammate's pull request, I paste it into the validator first, because catching a name mismatch in three seconds beats catching it after a 90-second server boot and a confused agent. Once the definition is clean, I bring up Inspector and call the thing for real over the transport I'll ship. Postman stays in the box unless a project already runs on it.&lt;/p&gt;

&lt;p&gt;If I had to give up two of the three and keep one, I'd actually struggle, because they solve different halves of the problem. Definition correctness and runtime behavior aren't the same question. The validator answers the first in seconds; Inspector answers the second properly. That split is why I run both rather than picking a single winner.&lt;/p&gt;

&lt;p&gt;If you want to throw your own definitions at the validator, it's here: &lt;a href="https://aidevhub.io/mcp-tool-tester/" rel="noopener noreferrer"&gt;MCP Tool Tester&lt;/a&gt;. Paste one in, read what's wrong, move on. I added the WebMCP checks last month after a reader pointed out that browser tool arrays carry their own sharp edges.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Does MCP Inspector validate my schema without running the server?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; No. It connects to a live server over stdio or HTTP, lists the advertised tools, and lets you call them. If the server won't start, you get nothing to inspect. For static definition checks you want a validator that reads the raw JSON.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; What's the difference between MCP and WebMCP here?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; MCP tools are advertised by a server you run over stdio or HTTP. WebMCP exposes tools from inside a web page to a browser-side agent. The definition shape is similar, but WebMCP adds constraints that most server-focused testers skip.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Can I trust a browser-based validator with internal tool descriptions?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Only if it's genuinely client-side. Open the network tab and confirm nothing leaves your machine. The one I built runs entirely in-page for that reason. Don't take my word for it, check the requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Is Postman a bad choice for MCP?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Not at all. If your server is HTTP and your team already uses Postman, the request history and sharing are genuinely useful. It's just heavier than I want for a quick sanity check on a definition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Which one is fastest for a quick check?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; The validator, by a wide margin, because there's no server to start. Paste and read. For anything involving real calls and real responses, that speed stops mattering and Inspector's live connection is what you want.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Written with AI assistance and human review. Try the tool at &lt;a href="https://aidevhub.io/mcp-tool-tester/" rel="noopener noreferrer"&gt;aidevhub.io/mcp-tool-tester&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>mcp</category>
      <category>testing</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Stop hand-aligning markdown tables in 2026</title>
      <dc:creator>AI Dev Hub</dc:creator>
      <pubDate>Thu, 18 Jun 2026 14:00:01 +0000</pubDate>
      <link>https://dev.to/aidevhub/stop-hand-aligning-markdown-tables-in-2026-1eig</link>
      <guid>https://dev.to/aidevhub/stop-hand-aligning-markdown-tables-in-2026-1eig</guid>
      <description>&lt;h1&gt;
  
  
  Stop hand-aligning markdown tables in 2026
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Use a visual, spreadsheet-style editor that exports GitHub-Flavored Markdown so you stop counting spaces by hand. You paste a CSV export, tweak the alignment per column, and the markdown updates as you type. The editor I cover here runs in your browser and never uploads your data. It cut a 47-minute chore down to about two minutes for me.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Quick disclosure: the Markdown Table Generator I link to below is one I built. I'd tried five other generators. Every one either logged my table data to a server or buried the alignment controls behind a paywall, and a couple choked on cells that contained a pipe character. So I wrote my own. It's free and runs entirely client-side. There's no signup, and nothing gets uploaded. If you know a better one, tell me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 47 minutes I'll never get back
&lt;/h2&gt;

&lt;p&gt;Three weeks ago I sat down to update the README for an internal CLI tool. It had grown to fourteen flags, and a teammate asked for a table documenting each one. I typed it by hand the way I always had, lining up the pipes with spaces so the raw file looked tidy in my editor.&lt;/p&gt;

&lt;p&gt;Then I added one more flag. Every column shifted. I spent the next stretch nudging spaces around like it was 2011, and when I finally pushed, the diff came back as a wall of red and green because the re-padding had touched every line in the block. The actual change was a single row.&lt;/p&gt;

&lt;p&gt;I checked the clock afterward. 47 minutes on one table. Not the docs, not the code. The table itself.&lt;/p&gt;

&lt;p&gt;Here's the thing that really gets me about those diffs. When you re-pad a whole table to fit one new row, every line registers as changed, so a reviewer can't see what you actually edited. They either skip the table review entirely or waste time eyeballing forty lines to find your one real change. Both outcomes are bad, and both come from a formatting quirk rather than anything meaningful.&lt;/p&gt;

&lt;p&gt;Markdown tables look simple, and for a 2x2 they are. The trouble starts once a table grows past a handful of rows: manual alignment stops paying for itself, and any single edit reshuffles the whole block. The alignment markers don't help my memory either. Left is &lt;code&gt;:---&lt;/code&gt;, right is &lt;code&gt;---:&lt;/code&gt;, center is &lt;code&gt;:---:&lt;/code&gt;, and I still open the docs every time to remember which side the colon goes on.&lt;/p&gt;

&lt;p&gt;This is a tiny problem, which is exactly why it irritates me so much. Nobody plans for it. You don't budget time to reformat a table the way you'd budget for a refactor, so it sneaks up during work you thought was basically finished and quietly eats real minutes. Multiply that across every doc and changelog you touch in a year, and the total stops being small.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the generator actually does
&lt;/h2&gt;

&lt;p&gt;There's no magic under the hood. A table generator holds your data as a 2D array and re-renders the text output on every keystroke. The fiddly part is the padding. Each column's width equals the length of its longest cell. Every other cell in that column gets padded to match, and the separator row carries the alignment colons.&lt;/p&gt;

&lt;p&gt;Here's the core of that logic in plain JavaScript. Save it as &lt;code&gt;table.js&lt;/code&gt; and run it with Node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;toMarkdownTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;aligns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;widths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="c1"&gt;// Floor each width at 3 so short columns still line up with the "---" separator.&lt;/span&gt;
    &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;align&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;gap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;width&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;align&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;right&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gap&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;align&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;center&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;left&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gap&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;left&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gap&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;left&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gap&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cells&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;| &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;cells&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;pad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;widths&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nx"&gt;aligns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; | &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; |&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;widths&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;aligns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;right&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;dash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;aligns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;center&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;dash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;dash&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;header&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;header&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;| &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;sep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; | &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; |&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;flag&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;default&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;--verbose&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;false&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;--retries&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;int&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;toMarkdownTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;left&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;left&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;right&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;node table.js&lt;/code&gt; and you get a clean, aligned table out the other side. A visual editor wraps this same logic in a spreadsheet grid, so you can tab between cells and paste a block of CSV while the markdown updates live in a pane beside the data. You can &lt;a href="https://aidevhub.io/markdown-table-generator/" rel="noopener noreferrer"&gt;try the editor here&lt;/a&gt; and drop in a CSV export straight from a spreadsheet app; it parses the rows in the browser and keeps the data on your machine.&lt;/p&gt;

&lt;p&gt;The part I care about most is that CSV import. Most of my tables already exist somewhere, usually as a spreadsheet export or a query result. Retyping them by hand is the step that actually wastes my afternoon, so being able to paste raw CSV and get GFM back is the whole point for me.&lt;/p&gt;

&lt;p&gt;One detail that took me longer than I'd like to admit: cells that contain a pipe character. In GFM a raw &lt;code&gt;|&lt;/code&gt; inside a cell breaks the table unless you escape it as &lt;code&gt;\|&lt;/code&gt;. The generator does that escaping for you on the way out, which is the single bug that pushed me off two of the hosted tools I'd been using. I don't know why so many of them skip it, but they do.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it stacks up against the alternatives
&lt;/h2&gt;

&lt;p&gt;I didn't build this in a vacuum. There are several common ways to make a markdown table, and each comes with a tradeoff. Here's how I'd line up the ones I actually reached for before I gave up and wrote my own:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;CSV import&lt;/th&gt;
&lt;th&gt;Alignment controls&lt;/th&gt;
&lt;th&gt;Works offline&lt;/th&gt;
&lt;th&gt;Keeps data local&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;This generator&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Per-column buttons&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tablesgenerator.com&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Sends to a server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VS Code table extension&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Auto-format only&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typing it by hand&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Manual colons&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The VS Code formatter is genuinely good if you already have the rows typed out. It'll re-align an existing table for you on save, which covers the "I added a row and everything shifted" pain nicely. What it won't do is take a CSV dump and build the table from scratch, and that build step is exactly where my time goes. The hosted generators do handle CSV, but the ones I tried either round-tripped my data through their backend or put alignment behind an account wall. For a thirty-second job, that's too much friction and too much trust to ask.&lt;/p&gt;

&lt;p&gt;None of these are bad tools, to be clear. The hand-typed table is fine until it grows. The VS Code extension is the right answer if your rows are already in the file. My complaint is narrow: I wanted one workflow that started from a CSV and ended at GFM I could trust, without an account in the middle.&lt;/p&gt;

&lt;p&gt;So the gap I kept hitting was a tool that could import CSV and give me real alignment controls without shipping my data off my laptop. That combination is what I ended up building.&lt;/p&gt;

&lt;h2&gt;
  
  
  When I don't reach for it
&lt;/h2&gt;

&lt;p&gt;A generator isn't always the right call, and pretending it is would be dishonest.&lt;/p&gt;

&lt;p&gt;If the table is tiny, say two columns and two rows, just type it. Opening any tool is slower than the keystrokes, and you won't fight alignment at that size anyway.&lt;/p&gt;

&lt;p&gt;If the table is produced from data inside a script or a CI job, generate the markdown in code instead. The function above is a fine starting point, and a GUI parked in the middle of an automated pipeline defeats the purpose.&lt;/p&gt;
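
&lt;p&gt;For pipelines that aren't running Node, the same idea ports easily. What follows is a minimal Python sketch, not the generator's actual code: the names &lt;code&gt;csv_to_gfm&lt;/code&gt;, &lt;code&gt;escape_cell&lt;/code&gt;, and &lt;code&gt;MARKERS&lt;/code&gt; are made up for illustration, and it assumes the first CSV row is the header. It reads CSV text, escapes pipes, and emits a GFM table with per-column alignment.&lt;/p&gt;

```python
import csv
import io

# Separator cells GFM understands for each alignment.
MARKERS = {"left": "---", "right": "---:", "center": ":---:"}


def escape_cell(text):
    # A raw pipe would be read as a column boundary, so escape it.
    return text.strip().replace("|", "\\|")


def csv_to_gfm(csv_text, aligns=None):
    # Parse with the csv module so quoted commas survive; skip blank lines.
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    if not rows:
        return ""
    width = len(rows[0])
    aligns = aligns or ["left"] * width
    if len(aligns) != width:
        raise ValueError("need one alignment per header column")

    def line(cells):
        # Pad or trim so every row has exactly as many cells as the header.
        cells = (cells + [""] * width)[:width]
        return "| " + " | ".join(escape_cell(c) for c in cells) + " |"

    separator = "| " + " | ".join(MARKERS[a] for a in aligns) + " |"
    return "\n".join([line(rows[0]), separator] + [line(r) for r in rows[1:]])


sample = 'flag,type,default\n--retries,int,3\n"a|b",str,x\n'
table = csv_to_gfm(sample, ["left", "left", "right"])
```

&lt;p&gt;This version skips the cosmetic padding that lines the source up in a text editor. GitHub renders the table the same either way, so in a CI job the shorter output is usually fine.&lt;/p&gt;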

&lt;p&gt;If you need merged cells or row spans, plain markdown can't express either of those. You'll have to drop down to raw HTML inside your markdown, or pick a different format. No generator can paper over a limitation that lives in the spec itself.&lt;/p&gt;

&lt;p&gt;And honestly, I still hand-write a quick table now and then when I'm offline and can't be bothered to open a browser tab. Old habits stick around.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Does it output GitHub-Flavored Markdown specifically?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Yes. The separator row uses the colon syntax GitHub renders, so whatever alignment you set shows up correctly once the table lands in a repo or an issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Can I paste data from Excel or Google Sheets?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Yes. Copy a range and paste it straight in. Spreadsheet copy usually arrives as tab-separated text, and the editor reads that the same way it reads CSV, so you don't need to export a file first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Is any of my table data sent anywhere?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; No. Parsing and rendering both happen in your browser, and nothing about the table leaves your machine. That was the main reason I stopped using the hosted options I'd been relying on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; What about very wide tables?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; They work, but think about whoever reads the result. A table with twelve columns is painful in a rendered doc and it'll scroll sideways on a phone. I try to keep mine under six columns when the data lets me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Does it handle alignment per column or all at once?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Per column. Each column gets its own alignment toggle, so a numbers column can sit right-aligned while the text label beside it stays left. That mixed alignment is the part raw typing makes genuinely tedious, since every change means recomputing the padding by hand.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Written with AI assistance and human review. Try the tool at &lt;a href="https://aidevhub.io/markdown-table-generator/" rel="noopener noreferrer"&gt;aidevhub.io/markdown-table-generator&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>showdev</category>
      <category>tooling</category>
      <category>writing</category>
    </item>
    <item>
      <title>Building a rules-file MCP server in 2026</title>
      <dc:creator>AI Dev Hub</dc:creator>
      <pubDate>Thu, 11 Jun 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/aidevhub/building-a-rules-file-mcp-server-in-2026-2j74</link>
      <guid>https://dev.to/aidevhub/building-a-rules-file-mcp-server-in-2026-2j74</guid>
      <description>&lt;h1&gt;
  
  
  Building a rules-file MCP server in 2026
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;You can build an MCP server that writes CLAUDE.md, .cursorrules, and copilot-instructions.md from one prompt in about 30 lines of Python with FastMCP. Register the tool, point your assistant's config at it, and ask. The code is the easy part. Keeping three formats in sync is the actual work. Here's the server and the part that broke first.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The rules-file generator I link to below is one I built. I'd opened four separate template repos last spring trying to bootstrap a CLAUDE.md, and every one assumed a single assistant. Cursor users had .cursorrules. The Claude crowd had their own file. Nobody emitted every format from one input. So I wrote a small web page that does. It's free, runs entirely in your browser, no signup, nothing gets uploaded. If you've got a better one, tell me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The goal
&lt;/h2&gt;

&lt;p&gt;A local MCP server that any MCP-aware assistant can call to write its own rules file. You say 'set up the rules for this repo,' the assistant calls the tool once per format with what it already knows about your stack, and three files land on disk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CLAUDE.md at the repo root&lt;/li&gt;
&lt;li&gt;.cursorrules beside it&lt;/li&gt;
&lt;li&gt;.github/copilot-instructions.md for the Copilot users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No copy-paste between editor tabs. The outcome you'd screenshot: your assistant printing &lt;code&gt;wrote 412 bytes to CLAUDE.md&lt;/code&gt; right in the chat, and the file appearing in your sidebar a half second later.&lt;/p&gt;

&lt;p&gt;A rules file is just instructions your assistant loads before it reads your code: your conventions, plus the things it keeps getting wrong. Most teams write one by hand, then never touch it again, and it rots. Generating it from a tool means you can regenerate when the stack changes instead of editing prose at midnight. The sync problem is why I cared. I'd update CLAUDE.md and forget the .cursorrules copy, and then two teammates on different assistants would get different rules. That drift is quiet and it's expensive.&lt;/p&gt;

&lt;p&gt;Why a server and not a plain script? Because the assistant can call it mid-conversation, with context it already has. It knows you're on Postgres and Vite before you say a word. That's the whole reason to bother.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup and auth
&lt;/h2&gt;

&lt;p&gt;Here's the part nobody warns you about: there's no auth. I spent a chunk of last Thursday hunting for where an API key goes, and there isn't one. A local MCP server talks to your assistant over stdio, on your machine, as your user. The "credential" is filesystem permission. I was wrong about needing a token, and it took me 47 minutes of reading the spec before I believed it.&lt;/p&gt;

&lt;p&gt;Install the SDK with &lt;code&gt;pip install "mcp[cli]"&lt;/code&gt;. Then register the server. For Claude Code, that's an &lt;code&gt;.mcp.json&lt;/code&gt; file in the project root with a &lt;code&gt;command&lt;/code&gt; and &lt;code&gt;args&lt;/code&gt; pointing at your Python file. Cursor and Claude Desktop use their own JSON config, same shape. One gotcha that cost me real time: a stdio server must never write to stdout, because that stream carries the JSON-RPC frames. Any stray &lt;code&gt;print()&lt;/code&gt; corrupts the protocol. Log to stderr or to a file.&lt;/p&gt;
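
&lt;p&gt;One way to keep debug output off stdout is to route everything through the standard &lt;code&gt;logging&lt;/code&gt; module with a handler pinned to stderr. This is a minimal sketch under that assumption; the helper name &lt;code&gt;make_logger&lt;/code&gt; and the logger name &lt;code&gt;rules-server&lt;/code&gt; are illustrative, not part of the MCP SDK.&lt;/p&gt;

```python
import logging
import sys


def make_logger(name="rules-server"):
    # Attach one stderr handler; stdout stays reserved for JSON-RPC frames.
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    # Don't bubble up to the root logger, whose handlers you don't control.
    logger.propagate = False
    return logger


log = make_logger()
log.debug("generate_rules called with target=%r", "claude")
```

&lt;p&gt;Swapping every &lt;code&gt;print()&lt;/code&gt; for &lt;code&gt;log.debug()&lt;/code&gt; keeps the habit of quick debug lines without touching the protocol stream.&lt;/p&gt;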

&lt;p&gt;You can pass a default through the environment too. I set &lt;code&gt;RULES_DEFAULT_STACK&lt;/code&gt; in the &lt;code&gt;.mcp.json&lt;/code&gt; &lt;code&gt;env&lt;/code&gt; block so the tool has a fallback when the model forgets to send one. Small thing, but it turned a class of empty-stack files into a sane default. Restart the assistant after editing the config, by the way: most of them read MCP settings once at launch and won't pick up changes live, which confused me for a good ten minutes the first time.&lt;/p&gt;
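
&lt;p&gt;The server listing in this post takes &lt;code&gt;stack&lt;/code&gt; as a plain required argument, so here is a rough sketch of how the environment fallback could be wired. The helper &lt;code&gt;resolve_stack&lt;/code&gt;, the &lt;code&gt;"unspecified"&lt;/code&gt; placeholder, and the &lt;code&gt;python server.py&lt;/code&gt; command are assumptions for illustration. The &lt;code&gt;mcpServers&lt;/code&gt; layout is the shape I'd expect in &lt;code&gt;.mcp.json&lt;/code&gt;, so check it against your assistant's docs before relying on it.&lt;/p&gt;

```python
import json
import os


def resolve_stack(stack=""):
    # Prefer what the model sent, then the environment, then a placeholder.
    from_env = os.environ.get("RULES_DEFAULT_STACK", "")
    return stack.strip() or from_env.strip() or "unspecified"


# Assumed shape of a project-level .mcp.json entry for this server.
config = {
    "mcpServers": {
        "rules-file-generator": {
            "command": "python",
            "args": ["server.py"],
            "env": {"RULES_DEFAULT_STACK": "FastAPI + Postgres"},
        }
    }
}
config_text = json.dumps(config, indent=2)
```

&lt;p&gt;Building the config as a dict and dumping it avoids hand-editing JSON, which is where a trailing comma tends to sneak in.&lt;/p&gt;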

&lt;p&gt;If you just want the files without running a server, the rules-file generator at &lt;a href="https://aidevhub.io/rules-file-generator/" rel="noopener noreferrer"&gt;aidevhub.io/rules-file-generator&lt;/a&gt; produces the same multi-format output in the browser. I reach for it when I'm scaffolding a repo I'll touch once and forget.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core code
&lt;/h2&gt;

&lt;p&gt;The whole server fits in one file. Here it is, with the guard I added only after it bit me (more on that in the last section):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# server.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rules-file-generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;TEMPLATES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLAUDE.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cursor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.cursorrules&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;copilot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.github/copilot-instructions.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_rules&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Write an AI rules file. target: claude | cursor | copilot.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TEMPLATES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# guard added after the bug in the last section
&lt;/span&gt;        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown target &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt;; pick claude, cursor, or copilot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;# &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stack: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;## Conventions&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Keep functions under 40 lines.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Put tests next to the code they cover.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- No new dependency without a note in the PR.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# encode once so the reported size is real bytes, not characters
&lt;/span&gt;    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wrote &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; bytes to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@mcp.tool()&lt;/code&gt; decorator does the heavy lifting. FastMCP reads your type hints (&lt;code&gt;target: str&lt;/code&gt;, and the rest) and generates the JSON schema the assistant uses to call the tool correctly. You write a normal Python function and the protocol wiring is generated for you. Return a string and it shows up in the chat as the tool result.&lt;/p&gt;
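
&lt;p&gt;To make that less magic, here is a toy version of the idea using only &lt;code&gt;inspect&lt;/code&gt;. It is not how FastMCP is implemented, and &lt;code&gt;toy_schema&lt;/code&gt; and &lt;code&gt;PY_TO_JSON&lt;/code&gt; are names invented for this sketch. It only shows that a function signature carries enough information to describe the arguments as a JSON-schema-like dict.&lt;/p&gt;

```python
import inspect

# Hypothetical mapping from Python annotations to JSON schema type names.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def toy_schema(func):
    # Walk the signature: annotations become types, missing defaults become required.
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


def generate_rules(target: str, project_name: str, stack: str = ""):
    # Stand-in with the same parameter names as the tool; body omitted.
    return ""


schema = toy_schema(generate_rules)
```

&lt;p&gt;Giving &lt;code&gt;stack&lt;/code&gt; a default in this stand-in drops it from the required list, which is the kind of detail a model sees when it decides which arguments it must fill in.&lt;/p&gt;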

&lt;p&gt;To call it, you don't do anything special. You ask the assistant in plain language ('generate a Claude rules file for this project, we're on FastAPI and Postgres') and it maps that to the &lt;code&gt;generate_rules&lt;/code&gt; arguments on its own. The first time you watch a model fill in &lt;code&gt;stack="FastAPI + Postgres"&lt;/code&gt; without you naming the parameter, it feels like cheating. That mapping is exactly what the type hints buy you.&lt;/p&gt;

&lt;p&gt;One design choice worth calling out: the tool writes the file itself instead of returning the text for the assistant to write. I went back and forth on this. Returning text keeps the server side-effect free, which is cleaner, but then the assistant has to turn around and call its own file-write tool, and you've added a round trip and a chance for it to mangle the content. Writing directly from the server means the bytes that get validated are the bytes that hit disk. For a generator like this, I'd rather own the write.&lt;/p&gt;

&lt;h2&gt;
  
  
  How FastMCP compares to the alternatives
&lt;/h2&gt;

&lt;p&gt;I tried the TypeScript SDK first and bounced off it. It's fine. I just had a Python repo open and didn't want a Node toolchain to write three files. Here's how the realistic options stack up on the axes I cared about:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Lines to first working tool&lt;/th&gt;
&lt;th&gt;Schema validation&lt;/th&gt;
&lt;th&gt;Best when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FastMCP (Python)&lt;/td&gt;
&lt;td&gt;~12&lt;/td&gt;
&lt;td&gt;inferred from type hints&lt;/td&gt;
&lt;td&gt;you want a tool running before lunch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript SDK&lt;/td&gt;
&lt;td&gt;~40&lt;/td&gt;
&lt;td&gt;explicit, via Zod&lt;/td&gt;
&lt;td&gt;your stack is already Node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw JSON-RPC over stdio&lt;/td&gt;
&lt;td&gt;~120&lt;/td&gt;
&lt;td&gt;hand-written&lt;/td&gt;
&lt;td&gt;you need zero dependencies&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Python row wins for this job because the template logic is a dozen lines of string formatting and the SDK gets out of the way. If your assistant tooling already lives in Node, the gap closes and the TypeScript SDK is the saner pick. Raw JSON-RPC is there if you enjoy pain or have a runtime the SDKs don't support yet.&lt;/p&gt;

&lt;p&gt;One axis I left off the table on purpose: performance. The differences between these options don't matter at this scale. You're writing a few hundred bytes once. If you're benchmarking a rules-file generator, you've taken a wrong turn somewhere. Pick the SDK that matches the language your tooling already speaks and move on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What broke the first time I ran it
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;KeyError&lt;/code&gt;, and it cost me an afternoon. The first version looked up &lt;code&gt;TEMPLATES[target]&lt;/code&gt; with a plain bracket, no guard. The model called the tool with &lt;code&gt;target="claude-code"&lt;/code&gt; instead of &lt;code&gt;"claude"&lt;/code&gt;, Python raised a bare &lt;code&gt;KeyError&lt;/code&gt;, and MCP wrapped it into a generic "tool failed" with nothing useful in the chat. I sat there convinced the stdio transport was broken. It was a typo in one argument.&lt;/p&gt;

&lt;p&gt;The fix is the &lt;code&gt;.get()&lt;/code&gt; plus an explicit &lt;code&gt;ValueError&lt;/code&gt; that names the valid options. That error string goes straight back to the model, which reads it and retries with &lt;code&gt;"claude"&lt;/code&gt;. That feedback loop is the real point of returning good errors from a tool: the assistant self-corrects if you let it. Honestly this annoyed me for most of a day before it clicked.&lt;/p&gt;
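
&lt;p&gt;The difference is easy to see outside MCP. This small sketch compares the text each version hands back for the same bad argument. The helper names &lt;code&gt;lookup_unguarded&lt;/code&gt;, &lt;code&gt;lookup_guarded&lt;/code&gt;, and &lt;code&gt;error_text&lt;/code&gt; are illustrative, and the guarded message here lists the valid keys from the dict instead of hard-coding them.&lt;/p&gt;

```python
TEMPLATES = {
    "claude": "CLAUDE.md",
    "cursor": ".cursorrules",
    "copilot": ".github/copilot-instructions.md",
}


def lookup_unguarded(target):
    # First version: a plain bracket lookup that raises a bare KeyError.
    return TEMPLATES[target]


def lookup_guarded(target):
    # Fixed version: name the valid options so the caller can retry.
    filename = TEMPLATES.get(target)
    if filename is None:
        valid = ", ".join(sorted(TEMPLATES))
        raise ValueError(f"unknown target {target!r}; pick one of: {valid}")
    return filename


def error_text(func, target):
    # Return the message a caller would see, or an empty string on success.
    try:
        func(target)
    except (KeyError, ValueError) as exc:
        return str(exc)
    return ""


bare = error_text(lookup_unguarded, "claude-code")
helpful = error_text(lookup_guarded, "claude-code")
```

&lt;p&gt;The bare message is just the bad key in quotes. The guarded one tells the model what to send instead, which is all it needs to retry.&lt;/p&gt;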

&lt;p&gt;The second thing: that stray &lt;code&gt;print()&lt;/code&gt; I mentioned. I'd left one in to debug the KeyError, and it quietly corrupted the JSON-RPC frames. The server "connected" and then every call hung with no error at all. Move the debug output to stderr and the server comes back to life. Both bugs were mine, and both took longer to find than the whole server took to write.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Do I need a separate server for Cursor and Claude Code?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; No. It's the same server and the same stdio protocol. You register it in each tool's config file, but the server code doesn't change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Can the tool read my existing code to infer conventions?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Yes. Add a second tool that globs the repo and feeds a short summary into the template. Keep it read-only so a bad prompt can't rewrite files you didn't mean to touch.&lt;/p&gt;
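
&lt;p&gt;A rough sketch of what that read-only helper could look like, before it gets wrapped as a tool. The function name &lt;code&gt;summarize_repo&lt;/code&gt;, the &lt;code&gt;SKIP_DIRS&lt;/code&gt; set, and the choice to summarize by file extension are all assumptions for illustration. It only reads directory entries and never opens a file for writing.&lt;/p&gt;

```python
from collections import Counter
from pathlib import Path

# Directories that would drown out the signal; adjust for your repo.
SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__"}


def summarize_repo(root=".", top=5):
    # Count files per extension, ignoring vendored and generated directories.
    base = Path(root)
    counts = Counter()
    for path in base.rglob("*"):
        parts = path.relative_to(base).parts
        if any(part in SKIP_DIRS for part in parts):
            continue
        if path.is_file() and path.suffix:
            counts[path.suffix] += 1
    if not counts:
        return "no source files found"
    return ", ".join(f"{ext}: {n}" for ext, n in counts.most_common(top))
```

&lt;p&gt;The returned string is short enough to drop into the template as one extra line, and because the function has no write path, a confused prompt can't turn it into an edit.&lt;/p&gt;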

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Why three files instead of one shared format?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Because the assistants haven't agreed on a format. CLAUDE.md is Markdown prose. The .cursorrules format is its own shape, and Copilot wants a specific path under .github. Until that converges, you generate each one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Is FastMCP ready for real use in 2026?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; For local developer tooling, yes. I wouldn't expose one to the public internet without real auth in front of it, since the local model assumes a single trusted user.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Written with AI assistance and human review. Try the tool at &lt;a href="https://aidevhub.io/rules-file-generator/" rel="noopener noreferrer"&gt;aidevhub.io/rules-file-generator&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>mcp</category>
      <category>python</category>
      <category>tooling</category>
    </item>
    <item>
      <title>3 LLM diff tools, 1 task: which one I actually use in 2026</title>
      <dc:creator>AI Dev Hub</dc:creator>
      <pubDate>Mon, 08 Jun 2026 13:26:34 +0000</pubDate>
      <link>https://dev.to/aidevhub/3-llm-diff-tools-1-task-which-one-i-actually-use-in-2026-5cfi</link>
      <guid>https://dev.to/aidevhub/3-llm-diff-tools-1-task-which-one-i-actually-use-in-2026-5cfi</guid>
      <description>&lt;h1&gt;
  
  
  3 LLM diff tools, 1 task: which one I actually use in 2026
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;I use the diff-based comparator for daily prompt work, and promptfoo when I need a repeatable eval in CI. Vercel's Playground is the quickest way to eyeball two models, but it stops there. If all you want is to see what changed between two model answers, a side-by-side diff beats rereading both in full. Below is the same task run through all three, with the warts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Upfront, so you can weigh what follows: the AI Response Comparator I link to below is one I built. I'd tried four different playgrounds and every single one made me read two full answers and find the differences by eye, which fell apart past a paragraph. So I wrote a thing that diffs them instead. It's free, runs entirely in your browser, no signup, and nothing you paste ever leaves the tab. If you know a better one, tell me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The task: summarize a changelog, two ways
&lt;/h2&gt;

&lt;p&gt;Last Tuesday I had a boring job that turned into a useful test. A changelog with 31 lines needed to become a 3-bullet summary a PM could read without asking me what "idempotent" means. Simple enough on its face. The catch: one summary wasn't the goal. I wanted to see how two models handled the exact same prompt, and more importantly where they drifted apart, because the drift is usually where the interesting bug hides.&lt;/p&gt;

&lt;p&gt;I've been burned by this before. Two summaries that read as twins, except one had quietly invented a config flag that never shipped, and I only caught it in review three weeks later. So the comparison isn't busywork. It's the step that catches a model confidently making something up while sounding exactly as calm as the model that got it right.&lt;/p&gt;

&lt;p&gt;So the input was the raw changelog: dependency bumps, a null-pointer fix, the usual stuff. The expected output was three plain bullets, no jargon, nothing a non-engineer would trip on. I ran the prompt through gpt-4o and claude-sonnet-4-6, got two summaries that looked roughly the same at a glance, and then hit the actual question of the day: how do I compare them without reading both four or five times and still missing something? That comparison step is exactly where these three tools stop being interchangeable, so I ran the same pair of outputs through all of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  promptfoo: the config file that pays off later
&lt;/h2&gt;

&lt;p&gt;promptfoo is the open source eval runner a lot of teams already lean on. You describe your prompt, your providers, and your assertions in a YAML file, then run it from the terminal. Here's the config I used, trimmed to the parts that matter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# promptfooconfig.yaml&lt;/span&gt;
&lt;span class="c1"&gt;# run: npx promptfoo@latest eval -c promptfooconfig.yaml&lt;/span&gt;
&lt;span class="na"&gt;prompts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;changelog&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;bullets&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;non-technical&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;reader:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;{{changelog}}"&lt;/span&gt;
&lt;span class="na"&gt;providers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;openai:gpt-4o&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;anthropic:claude-sonnet-4-6&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;openai:gpt-4o-mini&lt;/span&gt;
&lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;vars&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;changelog&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;- Fixed null pointer in auth handler (line 238)&lt;/span&gt;
        &lt;span class="s"&gt;- Bumped pg driver to 16.2&lt;/span&gt;
        &lt;span class="s"&gt;- Added retry with backoff on 429s&lt;/span&gt;
    &lt;span class="na"&gt;assert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;contains&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-rubric&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Uses&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;jargon&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PM&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;wouldn't&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;understand"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run that, open the results with &lt;code&gt;promptfoo view&lt;/code&gt;, and you get a browser matrix: prompts down one axis, providers across the top, pass or fail on each assertion in every cell. It's genuinely good for what it is. The llm-rubric assert is the clever bit, since it grades free-form output against a plain-English standard instead of an exact string. The first run, though, cost me 47 minutes of fighting provider strings and one stale API key before a single result appeared. Annoying. Once the config exists it's repeatable, which is the entire reason you'd reach for it: you wire it into CI and it screams when an edited system prompt quietly changes the output. What promptfoo won't give me is a character-level diff between two answers. The cells sit politely side by side and I'm still the one reading them line by line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vercel's AI Playground: fast, but you read everything yourself
&lt;/h2&gt;

&lt;p&gt;The Vercel AI Playground (the one over at sdk.vercel.ai) is the opposite trade-off. Paste a prompt, pick two or three models from a dropdown, hit run, and watch the columns stream in next to each other. Zero config, no file, no keys of your own to manage. Under a minute from a cold tab to two finished answers on screen. For my 3-bullet task that was honestly plenty, and it's still the thing I open when I want a quick gut check on a new model.&lt;/p&gt;

&lt;p&gt;The ceiling arrives fast, though. The comparison ends at "here are both outputs, side by side, good luck telling them apart." For three short bullets your eyes handle the diffing fine. For a twelve-sentence answer, or a blob of JSON, or a refactored function, that approach quietly falls over and you start trusting whichever answer you read second. There's no character diff, no assertion, no rubric, and no saved history unless you log in. It's a great front door and a mediocre workbench, and I think it's honest about which one it's trying to be.&lt;/p&gt;

&lt;h2&gt;
  
  
  The diff comparator: it shows me what changed
&lt;/h2&gt;

&lt;p&gt;This is the one I built, so weigh the praise accordingly. You paste answer A and answer B into two boxes, and it renders a side-by-side view with insertions and deletions highlighted, the same mental model as a git diff. There's also an analysis mode that flags the spots where the two answers assert different facts, which is the part I lean on most. The diff is the whole reason the thing exists.&lt;/p&gt;

&lt;p&gt;On the changelog task it earned its place in about a second. Both models produced nearly identical bullets, except Claude held onto a line about the pg driver bump to 16.2 that gpt-4o silently folded away. I had read both summaries twice by eye and missed that gap both times. The diff caught it instantly, and the analysis mode called it out as a factual difference rather than a wording change. That split between a reworded sentence and a genuine factual conflict is the line I care about when one model hallucinates and the other stays honest. I don't fully understand why a highlighted diff is so much easier on the brain than two clean columns, but it plainly is. Probably the same reason &lt;code&gt;git diff&lt;/code&gt; beats opening two copies of a file in separate windows.&lt;/p&gt;
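&lt;p&gt;To show why a diff surfaces a dropped line so quickly, here is a minimal sketch using Python's standard &lt;code&gt;difflib&lt;/code&gt;. It is not how the comparator is implemented, and the two summaries are made-up stand-ins rather than the real model outputs.&lt;/p&gt;

```python
import difflib

# Made-up stand-ins for the two summaries (not the real model outputs).
# Answer A drops the driver update; answer B keeps it.
answer_a = [
    "Fixed a crash that could happen during sign-in.",
    "Requests now retry automatically when the service is busy.",
]
answer_b = [
    "Fixed a crash that could happen during sign-in.",
    "Updated the database driver to 16.2.",
    "Requests now retry automatically when the service is busy.",
]

# ndiff prefixes each line: "  " unchanged, "- " only in A, "+ " only in B.
diff = list(difflib.ndiff(answer_a, answer_b))
only_in_b = [line[2:] for line in diff if line.startswith("+ ")]

print("\n".join(diff))
print("only in B:", only_in_b)
```

&lt;p&gt;The line that exists in only one answer is the single entry carrying a &lt;code&gt;+&lt;/code&gt; marker, so there's nothing to hunt for by eye.&lt;/p&gt;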

&lt;p&gt;What it deliberately doesn't do: it won't call the models for you. You bring the two answers you've already generated, which is why there's no API key to set and nothing you paste ever leaves the browser. It's also a real limit: this is a comparison tool, not an eval runner, and if you need scoring across a test set you're back to promptfoo.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scores, and which one I reach for
&lt;/h2&gt;

&lt;p&gt;Here's how the three landed on the things I actually care about:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;promptfoo&lt;/th&gt;
&lt;th&gt;Vercel Playground&lt;/th&gt;
&lt;th&gt;AI Response Comparator&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Setup time&lt;/td&gt;
&lt;td&gt;~47 min first run&lt;/td&gt;
&lt;td&gt;under 1 min&lt;/td&gt;
&lt;td&gt;under 1 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repeatable in CI&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Character-level diff&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flags fact conflicts&lt;/td&gt;
&lt;td&gt;partial (rubric)&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;free, open source&lt;/td&gt;
&lt;td&gt;free tier&lt;/td&gt;
&lt;td&gt;free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Saves history&lt;/td&gt;
&lt;td&gt;local files&lt;/td&gt;
&lt;td&gt;login only&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;None of these wins outright, and I genuinely use two of them most weeks. promptfoo stays wired into CI, where it catches regressions the moment someone edits a system prompt and doesn't realize the output shifted underneath them. That month my eval runs cost about $18.30 in API credits, which is nothing for the silent breakage it's caught. For the daily "did this prompt tweak actually change anything" question, I paste both outputs into the &lt;a href="https://aidevhub.io/ai-response-comparator/" rel="noopener noreferrer"&gt;diff comparator&lt;/a&gt; and read the highlighted bits instead of rereading two full answers. Vercel's Playground I keep one tab away for a thirty-second look and nothing heavier.&lt;/p&gt;

&lt;p&gt;Free matters here more than it sounds. A tool I have to expense or justify is a tool I won't open for a thirty-second check, and the whole value of a diff is that the friction to run one is basically zero.&lt;/p&gt;

&lt;p&gt;If you only adopt one, pick by the question you ask most. Worried about silent prompt regressions shipping to prod? That's promptfoo's job, full stop. Already holding two answers and you just want the delta highlighted? That's the diff tool, and it's the one I open almost every day. The Playground earns its keep for a fast first look at a model when nothing needs to persist past the session.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Can promptfoo do diffs?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Not at the character level. It lays outputs out in a matrix and supports rubric-based asserts so you can grade each cell, but you're still the one reading them and spotting the differences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Do I need API keys for the diff comparator?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; No. You paste the two answers you already produced somewhere else. It never calls a model itself, so there's no key to configure and nothing leaves your browser tab.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Which models did you actually test?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; gpt-4o, gpt-4o-mini, and claude-sonnet-4-6, on a single changelog-summary prompt back in April 2026. It's a small sample, but the workflow gaps between the three tools showed up on the very first run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Is the diff approach only good for summaries?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; No. I run the same flow on refactored functions and on JSON outputs. Anything where "what changed" tells you more than "is this any good" in isolation.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Written with AI assistance and human review. Try the tool at &lt;a href="https://aidevhub.io/ai-response-comparator/" rel="noopener noreferrer"&gt;aidevhub.io/ai-response-comparator&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Catch skill regressions before they ship in 2026</title>
      <dc:creator>AI Dev Hub</dc:creator>
      <pubDate>Tue, 26 May 2026 14:01:04 +0000</pubDate>
      <link>https://dev.to/aidevhub/catch-skill-regressions-before-they-ship-in-2026-23g9</link>
      <guid>https://dev.to/aidevhub/catch-skill-regressions-before-they-ship-in-2026-23g9</guid>
      <description>&lt;h1&gt;
  
  
  Catch skill regressions before they ship in 2026
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Build a deterministic regression suite that reruns every skill case on each prompt change and blocks the merge when the risk-weighted pass rate drops below your gate. The Skill Regression Suite Builder generates those case files for you in a CI-ready format. It can't replace human judgment on edge cases. What it does is stop the silent breakages that ship when you tweak one line of a system prompt.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Skill Regression Suite Builder I link to below is one I built. I tried four eval frameworks before this, and every one assumed I'd ship my prompts and test data to their servers. I didn't want my system prompts leaving my laptop. So it runs entirely client-side. It's free and asks for no signup, and nothing you paste ever leaves the browser. If you've got a better workflow, tell me, I'm not precious about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bug I shipped because I trusted a one-line prompt edit
&lt;/h2&gt;

&lt;p&gt;Three weeks ago, on a Tuesday afternoon, I edited a single sentence in a skill that classifies incoming support tickets. The change looked harmless. I added "prioritize billing issues" to the instructions because a stakeholder asked. I ran the skill once by hand. The output looked sensible, so I merged it.&lt;/p&gt;

&lt;p&gt;By Thursday morning the skill was misrouting password-reset tickets into the billing queue. Not all of them. About 1 in 6. Enough that nobody caught it for a day, and enough that our support lead pinged me at 8:42am asking why the billing queue had doubled overnight.&lt;/p&gt;

&lt;p&gt;The fix took two minutes. Finding it took most of the morning, because I had no record of what "working" used to look like. I'd been testing skills the way most people do: open it, type a few inputs I can think of off the top of my head, eyeball the output, ship. That works right up until it doesn't. The trouble is that the inputs I dream up on the spot are the easy ones. The cases that actually break are the ones I'd never think to type, which is exactly why they slip through.&lt;/p&gt;

&lt;p&gt;Here's the thing about skills. A skill is just a prompt plus some tools, and prompts are absurdly sensitive to wording. Shift one clause and the model quietly re-weights everything downstream. A regression suite for application code is normal practice. A regression suite for prompts barely exists in most teams I've talked to, even though prompts break in the same sneaky, invisible ways code does. I wanted the same safety net I already have for my functions: a fixed set of inputs with known-good outputs that runs automatically on every change and ends in one clear pass or fail.&lt;/p&gt;

&lt;p&gt;That gap is what this tool fills. It does one job: turn the test cases living in your head into a file your CI can run on every change.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the builder turns loose test ideas into a CI gate
&lt;/h2&gt;

&lt;p&gt;The core idea is boring on purpose, and that's a compliment. You hand it a list of cases. Each case has three parts: an input, its expected behavior, and a risk weight you assign to it. The builder produces a deterministic suite file plus a small runner you can drop straight into CI. Deterministic is the operative word: it pins temperature to 0 and uses exact or rule-based matching, so the same input should give you the same verdict on every run. Flaky evals are worse than no evals, because the first time one fails for no reason you stop trusting all of them. A suite you don't trust is just guilt that runs in CI.&lt;/p&gt;

&lt;p&gt;The risk weight is the piece I didn't expect to care about, and now I won't build a suite without it. Test cases don't all matter the same amount. A misrouted billing ticket costs real money and a real apology. A slightly stiff greeting costs nothing. So each case carries a weight, and the gate checks a weighted pass rate rather than a raw count. You can pass 95% of your cases and still fail the gate if the 5% you broke happened to be the expensive ones. I set my own weights on a one-to-five scale and treat anything that touches money or data as a five. You'll pick your own scale. The point is just that the gate respects it.&lt;/p&gt;
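&lt;p&gt;To make the 95% claim concrete, here is a small worked example. The numbers are hypothetical: 20 cases, of which 19 are low-risk (weight 1) and one is a weight-5 case, with a 0.90 gate.&lt;/p&gt;

```python
# Hypothetical suite: 19 low-risk cases (weight 1) and one weight-5 case.
weights = [1] * 19 + [5]
# Every case passes except the single weight-5 one.
passed = [True] * 19 + [False]
gate = 0.90

raw_rate = sum(passed) / len(passed)
earned = sum(w for w, ok in zip(weights, passed) if ok)
weighted_rate = earned / sum(weights)
passes_gate = weighted_rate >= gate

print(f"raw pass rate: {raw_rate:.1%}")            # 95.0%
print(f"weighted pass rate: {weighted_rate:.1%}")  # 79.2%
print("gate passed:", passes_gate)                 # False
```

&lt;p&gt;Nineteen of twenty cases pass, but the one failure holds 5 of the 24 total weight, so the weighted rate is 19/24 and the gate fails.&lt;/p&gt;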

&lt;p&gt;Here is what a generated suite and its runner look like. This is real, runnable Python against a stubbed skill function, so you can read the gate logic without needing an API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Cases generated by the Skill Regression Suite Builder.
# Each case: input, expected route, and a risk weight.
&lt;/span&gt;&lt;span class="n"&gt;CASES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route-password-reset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I forgot my password and can&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t log in&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expect_route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route-billing-charge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Why was I charged twice this month?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expect_route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;billing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;greeting-tone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello there&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expect_route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Swap this stub for your real skill call.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;billing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;charg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cases&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.90&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cases&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;earned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expect_route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;earned&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PASS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FAIL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (w=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;earned&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weighted pass rate: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;SystemExit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;gate&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CASES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run that and the stubbed skill fails the password-reset case, which carries weight 5, because the dummy router only recognizes the substring "charg". The weighted score lands at 6 out of 11, about 54.5%, well under the 0.90 gate, so the process exits with code 1 and your CI run goes red. Swap &lt;code&gt;run_skill&lt;/code&gt; for your actual skill call and you have a genuine regression gate. The builder writes both files for you, including the GitHub Actions step, so the only part you actually do by hand is describe the cases and assign the weights.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it stacks up against the eval tools I tried
&lt;/h2&gt;

&lt;p&gt;I didn't build this in a vacuum. I tried the obvious options first, and a couple of them are excellent. Here's the honest comparison, including the rows where the builder loses:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Risk-weighted gates&lt;/th&gt;
&lt;th&gt;Runs client-side&lt;/th&gt;
&lt;th&gt;Setup time&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Skill Regression Suite Builder&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;~5 min&lt;/td&gt;
&lt;td&gt;Skill/prompt regression in CI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Promptfoo&lt;/td&gt;
&lt;td&gt;No (raw pass rate)&lt;/td&gt;
&lt;td&gt;Local CLI, configs on disk&lt;/td&gt;
&lt;td&gt;~30 min&lt;/td&gt;
&lt;td&gt;Broad LLM eval matrices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosted eval platform (generic)&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;No, cloud upload&lt;/td&gt;
&lt;td&gt;~1 hr + account&lt;/td&gt;
&lt;td&gt;Teams wanting dashboards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hand-rolled pytest&lt;/td&gt;
&lt;td&gt;Whatever you build&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Hours&lt;/td&gt;
&lt;td&gt;Full control, no time budget&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Promptfoo is genuinely good and far more flexible than what I built. If you need a sprawling matrix of prompts against providers, reach for it. It does treat every assertion as equal by default, though, so wiring up weighted gates meant writing my own scoring on top. The hosted platforms gave me pretty dashboards and a login screen I didn't ask for, plus the upload problem from earlier. Hand-rolled pytest is exactly what I had before the bug, and it works fine. It just costs you an afternoon every time you want to reshape the suite.&lt;/p&gt;

&lt;p&gt;The builder sits in a narrow spot: you want weighted regression gates for your skills running in CI today, and you don't want your prompts sitting on someone else's server. If that describes you, &lt;a href="https://aidevhub.io/skill-regression-suite-builder/" rel="noopener noreferrer"&gt;generate your suite here&lt;/a&gt; and paste the output into your repo. It took me about five minutes the first time, and most of that was me arguing with myself over the weights. The weights are the only judgment call, and that's by design.&lt;/p&gt;

&lt;h2&gt;
  
  
  When you should skip this entirely
&lt;/h2&gt;

&lt;p&gt;I'd rather tell you when not to use it, because the "when not to" section is the part AI-written tool reviews always quietly drop. Honesty about limits is the only reason to trust the rest.&lt;/p&gt;

&lt;p&gt;Don't reach for a regression suite while your skill is still changing shape every day. Early on, the "correct" output is a moving target, and you'll burn more time rewriting expected values than improving the actual skill. Wait until behavior settles. I usually start a suite once a skill has gone a full week without a structural rewrite.&lt;/p&gt;

&lt;p&gt;Skip it for purely generative, open-ended skills where no single right answer exists. If the output is a creative paragraph, exact matching is useless and rule-based matching is mostly wishful thinking. You want an LLM-as-judge setup there, which is a different tool and a longer argument about how much you trust the judge model.&lt;/p&gt;

&lt;p&gt;And please don't read a green gate as proof the skill is correct. A green gate means the skill still does the things you remembered to test. My password-reset disaster taught me that the case I most need is usually the exact one I forgot to write down. A suite catches regressions against behavior you already knew about. It can't catch a whole category of input you never imagined existed.&lt;/p&gt;

&lt;p&gt;One last thing. If you have three test cases total, skip all of this and use plain pytest. The weighting and the CI scaffolding start earning their keep somewhere around 15 to 20 cases, not 3.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Does this work with Claude skills and Codex skills?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Yes. The case format is model-agnostic. You point the runner's skill call at whatever backend you use, and the builder only cares about the case input and the expected behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; How is the weighted pass rate actually calculated?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Each case has a weight. The score is the sum of the weights for passing cases divided by the sum of every weight. A gate of 0.90 means you need 90% of your weighted risk to pass, so breaking one heavy case hurts far more than breaking a light one.&lt;/p&gt;
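
&lt;p&gt;If the arithmetic is easier to read as code, here's a minimal Python sketch of that formula. The &lt;code&gt;CaseResult&lt;/code&gt; structure, the case names, and the weights are all made up for illustration; this is not the builder's actual output format.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str      # hypothetical case label
    weight: float  # relative risk assigned to the case
    passed: bool


def weighted_pass_rate(results: list[CaseResult]) -> float:
    """Sum of the weights of passing cases divided by the sum of every weight."""
    total = sum(r.weight for r in results)
    if total == 0:
        raise ValueError("suite has no weighted cases")
    return sum(r.weight for r in results if r.passed) / total


# Illustrative results: one heavy case fails, two lighter ones pass.
results = [
    CaseResult("password-reset", weight=5.0, passed=False),
    CaseResult("greeting", weight=1.0, passed=True),
    CaseResult("refund-policy", weight=4.0, passed=True),
]
score = weighted_pass_rate(results)
gate = 0.90
print(f"score={score:.2f} gate={gate:.2f} pass={score >= gate}")
```

&lt;p&gt;Two of three cases pass here, but the failing one carries half the total weight, so the score lands well under the gate. That's the whole point of weighting.&lt;/p&gt;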

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Can I run it without an API key?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; The builder itself runs in your browser with no key and no signup. The generated runner needs whatever credentials your real skill call uses, since it has to actually invoke your skill to test it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Is it really deterministic?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; As deterministic as your skill call is. The suite pins temperature to 0 and uses exact or rule-based matching, but if your model still wobbles at temperature 0, add a tolerance rule or a retry. I haven't needed to yet, though your mileage may vary.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Written with AI assistance and human review. Try the tool at &lt;a href="https://aidevhub.io/skill-regression-suite-builder/" rel="noopener noreferrer"&gt;aidevhub.io/skill-regression-suite-builder&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>showdev</category>
      <category>testing</category>
    </item>
    <item>
      <title>How token counters actually work in 2026, and when to trust them</title>
      <dc:creator>AI Dev Hub</dc:creator>
      <pubDate>Wed, 29 Apr 2026 17:58:09 +0000</pubDate>
      <link>https://dev.to/aidevhub/how-token-counters-actually-work-in-2026-and-when-to-trust-them-20jj</link>
      <guid>https://dev.to/aidevhub/how-token-counters-actually-work-in-2026-and-when-to-trust-them-20jj</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Most "free token counter" tools in your bookmarks are not running the model's tokenizer. They're running a character-ratio estimate and labeling the output "tokens". For OpenAI's GPT family the official tokenizer is open and easy to ship in a browser. For Claude, Gemini, and most others it isn't. Here's what that means for your context-window math.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Up-front disclosure on this one: the tool I link to below is one I built. I got tired of paste-counter-paste-counter loops where the same input produced different numbers, and tired of tools that claim to support every model but quietly use one tokenizer for all of them. Free, client-side, no signup. I'm linking to it because it's what I use, and because I'd rather show you how it works than pitch it.&lt;/p&gt;

&lt;p&gt;If you've ever opened three "GPT token counter" tabs and gotten three different numbers, you're not crazy and the tools aren't all wrong. They're doing different things and labeling them the same way. Knowing which is which makes the difference between "this prompt fits" and "the API will reject it at the boundary".&lt;/p&gt;

&lt;h2&gt;
  
  
  What "tokenization" actually does
&lt;/h2&gt;

&lt;p&gt;A tokenizer takes raw text and splits it into the integer IDs the model actually consumes. Every model family ships its own vocabulary, trained on its own corpus. Same input string yields different token counts because the vocabularies differ.&lt;/p&gt;

&lt;p&gt;OpenAI's GPT-4 family uses an encoding called &lt;code&gt;cl100k_base&lt;/code&gt;. The newer GPT-4o, GPT-5, o3 and o4 models use &lt;code&gt;o200k_base&lt;/code&gt;, a larger vocabulary tuned for multilingual and code-heavy input. Anthropic's Claude family uses its own vocabulary, which isn't published as a local library; counts are available only through a server-side counting endpoint. Google's Gemini family is similar: server-side counting, no public local tokenizer at the time of writing (April 2026).&lt;/p&gt;

&lt;p&gt;The rule of thumb people quote, "1 token is about 4 characters of English", is fine for napkin math and wrong by 10 to 20 percent on real input. German tokenizes worse than English because compound words don't fit the English-trained vocabulary. Code with many short identifiers tokenizes better than prose. Emoji are usually 2 to 4 tokens each. JSON with verbose keys tokenizes much worse than minified JSON. If you're sitting near the context window, the rule of thumb will lie to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exact vs estimated, the real divide
&lt;/h2&gt;

&lt;p&gt;Free token counters fall into two camps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exact counters&lt;/strong&gt; ship the model's actual tokenizer in the browser and run it on your input. The numbers match what the API will charge, give or take a token or two. This is feasible only when the tokenizer is published as a runnable library. For OpenAI's GPT and o-series, the libraries are &lt;code&gt;tiktoken&lt;/code&gt; (Python, from OpenAI) and &lt;code&gt;gpt-tokenizer&lt;/code&gt; (JavaScript, a community port). Both are MIT-licensed and small enough to ship client-side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Estimating counters&lt;/strong&gt; apply a character-ratio heuristic. They divide the character count by some constant (3.5 to 4.0 depending on the model family) and round up. The number is roughly right on plain English. It can be 10 to 20 percent off on code, JSON, German, mixed scripts, or anything with unusual whitespace. If a counter is fast on a 100,000-character paste regardless of which model you pick, it's almost certainly estimating.&lt;/p&gt;
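
&lt;p&gt;For reference, the whole heuristic fits in a few lines of Python. This is a sketch of the general technique, not any particular tool's code; the function name and the sample string are my own.&lt;/p&gt;

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Character-ratio estimate: character count divided by a constant, rounded up."""
    return math.ceil(len(text) / chars_per_token)


sample = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(sample))       # divisor 4.0
print(estimate_tokens(sample, 3.5))  # divisor 3.5
```

&lt;p&gt;Notice the function never looks at what the characters are. That's why it's fast on any paste size, and also why it drifts on code, JSON, and non-English text.&lt;/p&gt;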

&lt;p&gt;The honest move is to label which is which. Most counters don't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the tool I built actually does
&lt;/h2&gt;

&lt;p&gt;Since I'm linking to one of these, I owe you the spec.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aidevhub.io/token-counter/" rel="noopener noreferrer"&gt;aidevhub.io/token-counter&lt;/a&gt; uses &lt;code&gt;gpt-tokenizer&lt;/code&gt; to compute exact counts for OpenAI's GPT-4, GPT-5, o3, and o4 model names. For every other family (Claude 3.x, Claude 4.x, Gemini, Llama, DeepSeek, Mistral, Grok) it uses a character-ratio estimate calibrated per family. Claude is &lt;code&gt;chars / 3.5&lt;/code&gt;. The others are &lt;code&gt;chars / 4.0&lt;/code&gt;. The output labels each row as either &lt;code&gt;exact&lt;/code&gt; or &lt;code&gt;estimate&lt;/code&gt; so you can tell which you're looking at.&lt;/p&gt;

&lt;p&gt;This is honest about what's possible. I can't ship Anthropic's tokenizer client-side because it isn't published as a local library. I can't ship Google's either. The choice was either to fake-claim "supports every tokenizer" (the easy lie) or to label estimates as estimates (the harder honesty). Picked the second.&lt;/p&gt;

&lt;p&gt;For most context-budget math at 30 to 70 percent of the window, the estimate is close enough. For boundary cases at 95+ percent of the window, you want the actual tokenizer. The next section is how to get certainty when you need it.&lt;/p&gt;
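
&lt;p&gt;One way to turn that rule into a check is to inflate the estimate by its worst-case error and see whether it still fits. A rough sketch, assuming the 20 percent upper bound quoted earlier and a 200,000-token window; &lt;code&gt;fits_with_margin&lt;/code&gt; is a name I made up for this example.&lt;/p&gt;

```python
import math


def fits_with_margin(estimated_tokens: int, window: int, error: float = 0.20) -> bool:
    """True if the estimate still fits after inflating it by the error margin."""
    worst_case = math.ceil(estimated_tokens * (1 + error))
    return window >= worst_case


window = 200_000  # assumed context window for the example
print(fits_with_margin(120_000, window))  # estimate at 60% of the window
print(fits_with_margin(190_000, window))  # estimate at 95% of the window
```

&lt;p&gt;When the check fails, that's the cue to stop estimating and get an exact count.&lt;/p&gt;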

&lt;h2&gt;
  
  
  How to get certainty when the number matters
&lt;/h2&gt;

&lt;p&gt;If the count matters (you're at the boundary, or you're billing customers per-token), don't trust any browser tool, including mine. Use the model's own counting endpoint or library.&lt;/p&gt;

&lt;p&gt;For OpenAI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;
&lt;span class="n"&gt;enc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encoding_for_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the source of truth. &lt;code&gt;gpt-tokenizer&lt;/code&gt; in the browser uses the same encodings (&lt;code&gt;cl100k_base&lt;/code&gt; for GPT-4 era, &lt;code&gt;o200k_base&lt;/code&gt; for GPT-4o and newer), so a browser-based exact counter and &lt;code&gt;tiktoken&lt;/code&gt; should match within a token or two. If they don't, check that both sides resolved the model name to the same encoding; a stale &lt;code&gt;tiktoken&lt;/code&gt; install may not recognize a model released after it.&lt;/p&gt;

&lt;p&gt;For Claude, Anthropic publishes a server-side counting endpoint accessible via the SDK as &lt;code&gt;client.messages.count_tokens()&lt;/code&gt; (or &lt;code&gt;client.beta.messages.count_tokens()&lt;/code&gt; depending on SDK version). It costs nothing to call but it does need network and an API key. Returns the exact count the API will charge for that exact &lt;code&gt;messages&lt;/code&gt; array including system prompt and tool definitions.&lt;/p&gt;

&lt;p&gt;For Gemini, the SDK exposes &lt;code&gt;model.count_tokens()&lt;/code&gt; which similarly calls Google's server.&lt;/p&gt;

&lt;p&gt;The post-call &lt;code&gt;usage&lt;/code&gt; field on every modern API is also authoritative. After your call, the response includes &lt;code&gt;input_tokens&lt;/code&gt; and &lt;code&gt;output_tokens&lt;/code&gt; as the actual billed counts (OpenAI's Chat Completions API names them &lt;code&gt;prompt_tokens&lt;/code&gt; and &lt;code&gt;completion_tokens&lt;/code&gt;). If your local count and the API's &lt;code&gt;usage&lt;/code&gt; consistently disagree, your local tokenizer is the one that's wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where token counts and API math diverge
&lt;/h2&gt;

&lt;p&gt;A counter on raw text isn't the full picture for an API call. Three things eat budget that a naive counter doesn't see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System prompt and tool definitions count.&lt;/strong&gt; Every modern API includes them in the input total. If you're counting only the user message, you're under-counting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message structure adds overhead.&lt;/strong&gt; Each message in a chat-format request costs a few tokens for the role markers and separators, on top of the content. OpenAI documents this; Anthropic does too. It's small (3 to 6 tokens per message) but at scale it matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output tokens are a separate budget.&lt;/strong&gt; The 200,000 number you see in Claude's docs is the context window, which input and output share. The output cap is configured separately. Claude 4 family also has a configurable budget for thinking tokens, which are billed as output. Always check the model's docs for the specific split.&lt;/li&gt;
&lt;/ul&gt;
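
&lt;p&gt;Put together, a rough input total looks something like the sketch below. The per-message overhead of 4 is an assumption picked from the 3 to 6 range above, and every token count is a placeholder; real values come from your tokenizer or the counting endpoint.&lt;/p&gt;

```python
def request_input_tokens(
    message_tokens: list[int],
    system_tokens: int = 0,
    tool_tokens: int = 0,
    per_message_overhead: int = 4,  # assumption: somewhere in the 3 to 6 range
) -> int:
    """Rough input total: content plus system prompt, tools, and per-message framing."""
    overhead = per_message_overhead * len(message_tokens)
    return sum(message_tokens) + system_tokens + tool_tokens + overhead


# Placeholder counts for a three-message conversation.
total = request_input_tokens([1200, 350, 900], system_tokens=800, tool_tokens=1500)
print(total)
```

&lt;p&gt;The user-visible content is 2,450 tokens here, but the request costs nearly twice that once the system prompt and tool definitions are included.&lt;/p&gt;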

&lt;p&gt;A browser counter that gives you a single number against a single model is a useful sanity check, not a complete budget calculation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The compact summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Counter type&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;tiktoken&lt;/code&gt; (Python)&lt;/td&gt;
&lt;td&gt;Runs OpenAI's official tokenizer locally&lt;/td&gt;
&lt;td&gt;Exact for GPT and o-series&lt;/td&gt;
&lt;td&gt;Boundary cases, prod budget math&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;gpt-tokenizer&lt;/code&gt; (JS)&lt;/td&gt;
&lt;td&gt;Same vocabularies, browser-shippable&lt;/td&gt;
&lt;td&gt;Exact for GPT and o-series&lt;/td&gt;
&lt;td&gt;Browser tools, paste-and-count UIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic &lt;code&gt;count_tokens&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Server-side API call&lt;/td&gt;
&lt;td&gt;Exact for Claude, includes message overhead&lt;/td&gt;
&lt;td&gt;When the count matters and you have a key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini &lt;code&gt;count_tokens&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Server-side API call&lt;/td&gt;
&lt;td&gt;Exact for Gemini, includes message overhead&lt;/td&gt;
&lt;td&gt;When the count matters and you have a key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Character-ratio estimate&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;chars / 3.5&lt;/code&gt; or &lt;code&gt;chars / 4.0&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Within 10 to 20 percent on most input&lt;/td&gt;
&lt;td&gt;Quick sanity check, no key needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  A few small habits that pay off
&lt;/h2&gt;

&lt;p&gt;After watching too many "but my count said it'd fit" boundary failures, three habits I've stuck with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Count against the actual target model&lt;/strong&gt;, not "GPT-4 close enough". Different vocabularies give different numbers on identical input. If you're sending to Claude 4.6, count with Anthropic's counting endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minify JSON before sending.&lt;/strong&gt; Pretty-printed JSON spends tokens on whitespace. The model doesn't care. Editor reads the indented version, model reads the minified one. Easy to script in your client.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log token counts on every prod call&lt;/strong&gt; and graph the average weekly. If your average prompt size starts creeping up because someone added a new few-shot example, you'll see it before it tips over the budget. Costs about 10 lines of code per service.&lt;/li&gt;
&lt;/ol&gt;
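
&lt;p&gt;The minify step from habit 2 is a one-liner with Python's standard &lt;code&gt;json&lt;/code&gt; module. A small sketch with a made-up payload:&lt;/p&gt;

```python
import json

# Made-up payload for illustration.
payload = {"user": {"id": 42, "tags": ["a", "b"]}, "active": True}

pretty = json.dumps(payload, indent=2)                  # what you read in the editor
minified = json.dumps(payload, separators=(",", ":"))   # what you send to the model

print(len(pretty), len(minified))
assert json.loads(pretty) == json.loads(minified)  # same data either way
```

&lt;p&gt;Passing &lt;code&gt;separators=(",", ":")&lt;/code&gt; matters: the default separators still put a space after every comma and colon.&lt;/p&gt;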

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Are there official tokenizers I can run locally for every model?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Only OpenAI publishes one as a runnable library (&lt;code&gt;tiktoken&lt;/code&gt; in Python, with the community &lt;code&gt;gpt-tokenizer&lt;/code&gt; port in JS). Anthropic and Google publish counting as server APIs only. If a third-party tool claims to do exact tokenization for Claude or Gemini in your browser, it's almost certainly estimating, no matter what the marketing says.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Why does the count change when I add a system prompt?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Because system prompt is part of the input. Same for tool definitions if you're using tool-use APIs. The input window includes the entire request payload, not just the user turn. This trips people who count only their user message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How accurate is the post-call &lt;code&gt;usage&lt;/code&gt; field?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; It's the source of truth. That's what was billed. Counters before the call are estimates of what &lt;code&gt;usage&lt;/code&gt; will say. They should match within 1 to 2 tokens if your local tokenizer uses the same encoding as the model and you counted the whole request, not just the user message. Consistent drift usually means the wrong encoding or a stale local library.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does whitespace really matter that much?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Yes, on text-heavy input. Repeated newlines and indentation are often single tokens each, but they add up. A pretty-printed 5,000-line JSON file can use noticeably more tokens than the same JSON minified, with no information loss. If you're trimming for budget, that's the first place to look.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What about thinking tokens on Claude 4 and reasoning tokens on o-series?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Both count toward output. Claude 4 family has a configurable &lt;code&gt;thinking&lt;/code&gt; token budget, and the tokens it spends are billed as output. OpenAI's o-series has reasoning tokens that count against output. Check the specific model's docs because the rules vary by version.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Written with AI assistance and human review.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
    <item>
      <title>5 cron expression gotchas that catch experienced devs in 2026</title>
      <dc:creator>AI Dev Hub</dc:creator>
      <pubDate>Sat, 25 Apr 2026 13:34:21 +0000</pubDate>
      <link>https://dev.to/aidevhub/5-cron-expression-gotchas-that-catch-experienced-devs-in-2026-21l1</link>
      <guid>https://dev.to/aidevhub/5-cron-expression-gotchas-that-catch-experienced-devs-in-2026-21l1</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Cron is one of those tools where the syntax looks obvious until a job fires at the wrong time and you start digging. Five behaviors below are documented in the man page and still catch people who've been writing cron for years. Each one is in a footnote most tutorials skip.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Quick disclosure on this one: the cron builder I link to below is something I built. After enough years of writing 5-field expressions by hand, I wanted a tool that showed me the next 5 fire times in my actual local timezone before I committed. Free, client-side, no signup. Linking to it because it's the workflow I use now.&lt;/p&gt;

&lt;p&gt;I think most devs learn cron the same way. You copy something off Stack Overflow that looks close to what you want, you tweak a number, you commit it, and then a few days later something fires at the wrong time and you start reading the man page properly. The 5 behaviors below are the ones I see trip people up over and over. None are exotic. All are documented. All pass code review.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotcha 1: &lt;code&gt;*/5&lt;/code&gt; is anchored to the field origin, not to "now"
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;*/5 * * * *&lt;/code&gt; does not mean "every 5 minutes from whenever the job loaded". It means "every minute whose value is divisible by 5". So it fires at :00, :05, :10, :15, etc. If you load the job at :07 and expect the next fire 5 minutes later, you'll see the next fire at :10, not :12.&lt;/p&gt;

&lt;p&gt;The same rule applies to every field. &lt;code&gt;0 */6 * * *&lt;/code&gt; fires at 00:00, 06:00, 12:00, 18:00, anchored to midnight. Not to whenever you started the scheduler.&lt;/p&gt;

&lt;p&gt;This is the right behavior for most use cases (predictable, aligned across machines) but it's not what people often expect on the first read. The lesson: &lt;code&gt;*/N&lt;/code&gt; is anchored to the field's natural origin, never to the load time.&lt;/p&gt;
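
&lt;p&gt;To make the anchoring concrete, here's a small Python sketch that computes the next fire time for a &lt;code&gt;*/N&lt;/code&gt; minute field. It only models the minute field and is not how any cron daemon is implemented; &lt;code&gt;next_step_fire&lt;/code&gt; is a name I made up.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def next_step_fire(now: datetime, step: int = 5) -> datetime:
    """Next fire time for */step in the minute field, anchored to minute 0."""
    candidate = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while candidate.minute % step != 0:
        candidate += timedelta(minutes=1)
    return candidate


loaded_at = datetime(2026, 4, 25, 10, 7)  # job loaded at :07
print(next_step_fire(loaded_at).strftime("%H:%M"))
```

&lt;p&gt;The load time never enters the calculation except as "what's the next minute that qualifies".&lt;/p&gt;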

&lt;h2&gt;
  
  
  Gotcha 2: day-of-month and day-of-week are OR, not AND
&lt;/h2&gt;

&lt;p&gt;This one is in the POSIX spec and almost nobody reads it. The expression &lt;code&gt;0 9 1 * 1&lt;/code&gt; does NOT mean "the 1st of the month, but only if it's a Monday". It means "at 9am on the 1st of every month, OR on every Monday". So it fires about five times a month, where the AND reading would fire only in months that happen to start on a Monday.&lt;/p&gt;
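
&lt;p&gt;You can count the difference for a real month. The sketch below evaluates the two day fields of &lt;code&gt;0 9 1 * 1&lt;/code&gt; for April 2026 under both readings. It's a simplified model of just those two fields, and &lt;code&gt;fire_days&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import calendar
from datetime import date


def fire_days(year: int, month: int, dom: int, weekday: int, mode: str) -> list[date]:
    """Days in the month matched by a day-of-month / day-of-week pair.

    weekday uses Python's convention (Monday is 0), not cron's.
    mode "or" mirrors standard cron; mode "and" is the common misreading.
    """
    days = []
    for day in range(1, calendar.monthrange(year, month)[1] + 1):
        d = date(year, month, day)
        dom_hit = day == dom
        dow_hit = d.weekday() == weekday
        hit = (dom_hit or dow_hit) if mode == "or" else (dom_hit and dow_hit)
        if hit:
            days.append(d)
    return days


# Day fields of "0 9 1 * 1", evaluated for April 2026.
print(len(fire_days(2026, 4, dom=1, weekday=0, mode="or")))
print(len(fire_days(2026, 4, dom=1, weekday=0, mode="and")))
```

&lt;p&gt;April 2026 starts on a Wednesday, so the AND reading never matches that month, while the OR reading matches the 1st plus every Monday.&lt;/p&gt;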

&lt;p&gt;There's no way to express AND between those two fields in standard POSIX cron. Two common workarounds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
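&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;

&lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;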

&lt;span class="c1"&gt;# Cron fires every Monday. Script filters down to "first Monday of the month".
&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;weekday&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;run_billing_job&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skipping; not first Monday of the month&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cron expression becomes &lt;code&gt;0 9 * * 1&lt;/code&gt; (every Monday at 9am) and the script handles the "first" qualifier. Two pieces of logic, each obvious on its own.&lt;/p&gt;

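&lt;p&gt;The other workaround is to switch to a scheduler that can express this directly. Quartz-style syntax (used by many JVM schedulers, and closely mirrored by AWS EventBridge) doesn't let you restrict both day fields at once, since one of them must be &lt;code&gt;?&lt;/code&gt;, but it adds &lt;code&gt;#&lt;/code&gt; for "nth weekday of the month": in Quartz, &lt;code&gt;0 0 9 ? * MON#1&lt;/code&gt; is 9am on the first Monday. Different platform, different rule. Worth knowing which one you're on.&lt;/p&gt;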

&lt;h2&gt;
  
  
  Gotcha 3: launchd reads local time, not UTC
&lt;/h2&gt;

&lt;p&gt;This is a Mac-specific gotcha and it's caused enough confusion that I now put a comment at the top of every plist. macOS &lt;code&gt;launchd&lt;/code&gt; interprets &lt;code&gt;StartCalendarInterval&lt;/code&gt; in the system's local timezone. If your plist has &lt;code&gt;Hour=14&lt;/code&gt;, the job fires at 14:00 wherever the Mac thinks it is. There is no built-in "interpret as UTC" flag.&lt;/p&gt;

&lt;p&gt;If you're migrating a cron job from a Linux server (where cron typically runs in UTC unless configured otherwise) to launchd on a Mac in another timezone, the job will fire at a different absolute time. The expression looks identical. The behavior isn't.&lt;/p&gt;

&lt;p&gt;Two ways to fix it on launchd:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set the system timezone to UTC. Works if you control the machine and don't mind the rest of the OS displaying UTC times.&lt;/li&gt;
&lt;li&gt;Compute the UTC-equivalent local hour and update it twice a year for daylight saving. Less elegant but doesn't change anything else on the system.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I pick option 2 with a comment in the plist that says "fires at 13:00 UTC; adjust for DST in March and October". Ugly, but explicit, which is what you want when you read the file 6 months later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotcha 4: cron does not catch up missed firings
&lt;/h2&gt;

&lt;p&gt;If your laptop is asleep at the scheduled time, the job does NOT fire on wake. Cron has no built-in catch-up. If your job is "delete files older than 30 days" and the machine is asleep through 3 firings, it just runs once when the next scheduled time arrives. The 3 missed firings are gone.&lt;/p&gt;

&lt;p&gt;This is a portable laptop problem more than a server problem. A server that's always on rarely misses. A Mac that sleeps overnight can easily miss its 3am job most nights and never log an error, because there's no error to log. The job didn't fail. It just wasn't fired.&lt;/p&gt;

&lt;p&gt;On launchd, &lt;code&gt;StartCalendarInterval&lt;/code&gt; already covers this: per &lt;code&gt;man 5 launchd.plist&lt;/code&gt;, a job whose calendar time passed while the machine slept is started on wake, with multiple missed firings coalesced into one run. &lt;code&gt;StartInterval&lt;/code&gt; is the key that drops an interval when the machine is asleep at fire time. For cron itself, use a tool with persistent scheduling that does catch up: &lt;code&gt;anacron&lt;/code&gt; is the classic Linux answer (&lt;code&gt;cronie&lt;/code&gt; ships one), and various job runners (systemd timers with &lt;code&gt;Persistent=true&lt;/code&gt;, etc.) handle this natively.&lt;/p&gt;

&lt;p&gt;I default to catch-up scheduling for anything maintenance-shaped (backups, cleanup, log rotation) where the exact time matters less than "did it run today". Strict clock-time scheduling with no catch-up is for anything time-sensitive (a daily 9am email) where running at 11am after the laptop wakes would be wrong.&lt;/p&gt;
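
&lt;p&gt;If your scheduler can't catch up on its own, the "did it run today" check can live in the script instead: schedule the job more often than you need (hourly, say) and let a stamp file decide whether there's work to do. A minimal Python sketch; the stamp file name and the helper names are hypothetical.&lt;/p&gt;

```python
import tempfile
from datetime import date
from pathlib import Path


def should_run(stamp: Path, today: date) -> bool:
    """True unless the stamp file already records a run for today."""
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != today.isoformat()


def mark_done(stamp: Path, today: date) -> None:
    """Record that today's run finished."""
    stamp.write_text(today.isoformat())


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    stamp = Path(tmp) / "cleanup.stamp"  # hypothetical stamp file
    today = date(2026, 4, 25)
    print(should_run(stamp, today))              # no stamp yet
    mark_done(stamp, today)
    print(should_run(stamp, today))              # already ran today
    print(should_run(stamp, date(2026, 4, 26)))  # next day
```

&lt;p&gt;Call &lt;code&gt;mark_done&lt;/code&gt; only after the job succeeds, so a failed run gets retried on the next firing.&lt;/p&gt;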

&lt;h2&gt;
  
  
  Gotcha 5: a cron expression has no timezone embedded in it
&lt;/h2&gt;

&lt;p&gt;This is the one that bites distributed teams. The expression &lt;code&gt;0 9 * * *&lt;/code&gt; says "at 9:00 in whatever timezone the scheduler runs in". It doesn't say UTC. It doesn't say Berlin. It says "whatever the scheduler thinks 9:00 is".&lt;/p&gt;

&lt;p&gt;If you write the expression in Berlin, deploy the code to a server in US-East, and that server's cron runs in UTC, your job fires at 9:00 UTC, which is 10:00 or 11:00 Berlin time depending on the season. The expression looks fine in code review. The behavior is wrong.&lt;/p&gt;
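
&lt;p&gt;You can check this before deploying with Python's standard &lt;code&gt;zoneinfo&lt;/code&gt; module. The sketch below converts a 9:00 UTC firing to Berlin wall-clock time on a winter date and a summer date. It assumes the machine has a timezone database installed, and &lt;code&gt;local_fire_time&lt;/code&gt; is a name I made up.&lt;/p&gt;

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo


def local_fire_time(utc_hour: int, on: date, zone: str) -> str:
    """Wall-clock time in `zone` when a job fires at utc_hour:00 UTC on that date."""
    fired = datetime(on.year, on.month, on.day, utc_hour, tzinfo=timezone.utc)
    return fired.astimezone(ZoneInfo(zone)).strftime("%H:%M")


print(local_fire_time(9, date(2026, 1, 15), "Europe/Berlin"))  # winter date
print(local_fire_time(9, date(2026, 7, 15), "Europe/Berlin"))  # summer date
```

&lt;p&gt;Same expression, same server, two different Berlin times depending on the date.&lt;/p&gt;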

&lt;p&gt;A few things help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For Linux cron, &lt;code&gt;CRON_TZ=Europe/Berlin&lt;/code&gt; at the top of the crontab file pins all subsequent entries to that zone on implementations that support it (cronie does; check &lt;code&gt;man 5 crontab&lt;/code&gt; on your system). Easy to miss.&lt;/li&gt;
&lt;li&gt;For Quartz and other JVM schedulers, the timezone is usually a separate config field (&lt;code&gt;zone&lt;/code&gt; in Spring's &lt;code&gt;@Scheduled&lt;/code&gt;, for example).&lt;/li&gt;
&lt;li&gt;For launchd, you compute it yourself or set the system timezone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I add a comment to every cron entry now that says what timezone I expect it to fire in. Adds 3 seconds to writing the entry and saves the timezone-archaeology session that always comes a month later.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I'd write each of these now
&lt;/h2&gt;

&lt;p&gt;For reference, here's how each gotcha translates to a defensible expression.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Naive attempt&lt;/th&gt;
&lt;th&gt;What it actually does&lt;/th&gt;
&lt;th&gt;Defensible version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Every 5 minutes from now&lt;/td&gt;
&lt;td&gt;&lt;code&gt;*/5 * * * *&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fires at :00, :05, :10...&lt;/td&gt;
&lt;td&gt;Same expression, accept the alignment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First Monday of month at 9am&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0 9 1 * 1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1st of month OR every Monday at 9am&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;0 9 * * 1&lt;/code&gt; plus script-side date check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14:00 UTC daily on launchd&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Hour=14&lt;/code&gt; in plist&lt;/td&gt;
&lt;td&gt;14:00 in local timezone, not UTC&lt;/td&gt;
&lt;td&gt;Compute local hour, comment with intended zone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily backup at 3am&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;0 3 * * *&lt;/code&gt; in plain cron&lt;/td&gt;
&lt;td&gt;Skips firings when machine is asleep&lt;/td&gt;
&lt;td&gt;
launchd &lt;code&gt;StartCalendarInterval&lt;/code&gt; or another catch-up scheduler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anything moderately complex&lt;/td&gt;
&lt;td&gt;Hand-typed&lt;/td&gt;
&lt;td&gt;Often wrong on the first try&lt;/td&gt;
&lt;td&gt;Build visually, paste, comment what it fires on&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When raw cron is still fine
&lt;/h2&gt;

&lt;p&gt;I'm not saying never write cron by hand. For "every minute" (&lt;code&gt;* * * * *&lt;/code&gt;) or "every hour at the top" (&lt;code&gt;0 * * * *&lt;/code&gt;) it's faster to just type it. The break point for me is anything with more than one non-&lt;code&gt;*&lt;/code&gt; field. Two fields with values is where my error rate spikes and the cost of building visually is zero.&lt;/p&gt;

&lt;p&gt;Worth knowing: most cron implementations support extensions that aren't in POSIX. &lt;code&gt;@daily&lt;/code&gt;, &lt;code&gt;@weekly&lt;/code&gt;, &lt;code&gt;@reboot&lt;/code&gt;, &lt;code&gt;@hourly&lt;/code&gt; all exist in Vixie cron and read better than the equivalent expressions. If your environment supports them, prefer them. They're more readable to whoever opens the file in 2027.&lt;/p&gt;

&lt;p&gt;The free cron builder I made and use regularly now is at &lt;a href="https://aidevhub.io/cron-builder/" rel="noopener noreferrer"&gt;aidevhub.io/cron-builder&lt;/a&gt;. Pick days, hours, minutes from dropdowns, get the expression, see the next 5 fire times in your local timezone. The next-fire preview is the part I find most useful, because it catches the "this expression doesn't actually fire when I think it does" cases before they ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Why is the day-of-week / day-of-month thing an OR?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; It's a POSIX thing, dating back to the original Unix cron. The spec says if either field is restricted (not &lt;code&gt;*&lt;/code&gt;), they're OR-ed together. There's a footnote in &lt;code&gt;man 5 crontab&lt;/code&gt; if you want to read it. Most cron tutorials skip this part because it's a footgun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does this work for AWS EventBridge cron expressions?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; EventBridge uses a 6-field cron syntax with year, and it won't let you restrict both day fields at once: one of them has to be &lt;code&gt;?&lt;/code&gt;. So if you're going EventBridge, that specific gotcha can't arise. The other 4 still apply. The mandatory &lt;code&gt;?&lt;/code&gt; is its own kind of footgun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is there a cron syntax that's better than the 5-field one?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Quartz scheduler's syntax is more expressive (seconds, year, and a mandatory &lt;code&gt;?&lt;/code&gt; in one of the day fields, so there's no OR surprise). Most Linux distros ship systemd timers (&lt;code&gt;systemd.timer&lt;/code&gt; units), which are way more readable but are their own thing. Pick whatever your platform supports best. I find systemd timers the cleanest for new Linux work and stick with launchd for Mac because the alternatives aren't worth the friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How do I test a cron expression without waiting?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Easiest path is a builder that shows the next 5 fire times so you can eyeball whether the schedule matches your intent. Beyond that, &lt;code&gt;croniter&lt;/code&gt; for Python and &lt;code&gt;cron-parser&lt;/code&gt; for Node both let you iterate the next N firings programmatically. I write a one-line script when I'm not sure: &lt;code&gt;python3 -c "from croniter import croniter; from datetime import datetime; c=croniter('0 9 * * 1'); [print(c.get_next(datetime)) for _ in range(5)]"&lt;/code&gt;. If the printed times look right, the expression is right.&lt;/p&gt;
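
&lt;p&gt;If you can't install &lt;code&gt;croniter&lt;/code&gt;, a brute-force version fits in the standard library. This is a rough sketch, not a full cron parser: it only understands numbers, &lt;code&gt;*&lt;/code&gt;, lists, ranges and &lt;code&gt;/step&lt;/code&gt; (no month or weekday names, no nicknames), it ignores timezones and DST, and &lt;code&gt;parse_field&lt;/code&gt; and &lt;code&gt;next_fires&lt;/code&gt; are names I picked for illustration.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def parse_field(field, low, high):
    """Expand one cron field into a set of ints (supports *, a, a-b, lists, /step)."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = int(base)
            end = high if step else start  # "5/15" runs from 5 to the top of the range
        values.update(range(start, end + 1, int(step or 1)))
    return values


def next_fires(expr, start, count=5, max_minutes=366 * 24 * 60):
    """Brute-force the next `count` fire times after `start`, minute by minute."""
    minute, hour, dom, month, dow = expr.split()
    minutes = parse_field(minute, 0, 59)
    hours = parse_field(hour, 0, 23)
    doms = parse_field(dom, 1, 31)
    months = parse_field(month, 1, 12)
    dows = {d % 7 for d in parse_field(dow, 0, 7)}  # 0 and 7 both mean Sunday

    fires = []
    t = start.replace(second=0, microsecond=0)
    for _ in range(max_minutes):  # give up after about a year of minutes
        if len(fires) == count:
            break
        t += timedelta(minutes=1)
        dom_ok = t.day in doms
        dow_ok = t.isoweekday() % 7 in dows
        # Both day fields restricted: either match fires. Otherwise both must hold.
        day_ok = (dom_ok and dow_ok) if "*" in (dom, dow) else (dom_ok or dow_ok)
        if t.minute in minutes and t.hour in hours and t.month in months and day_ok:
            fires.append(t)
    return fires


for fire in next_fires("0 9 * * 1", datetime(2026, 6, 23, 14, 0)):
    print(fire)
```

&lt;p&gt;Passing a fixed start time instead of "now" keeps the output reproducible, so the same call can sit in a unit test next to the crontab it guards.&lt;/p&gt;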

&lt;p&gt;&lt;strong&gt;Q: What about Quartz cron expressions?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Different beast. 6 or 7 fields (seconds required, year optional), &lt;code&gt;?&lt;/code&gt; placeholder for day fields, &lt;code&gt;L&lt;/code&gt; for last, &lt;code&gt;#&lt;/code&gt; for the nth weekday of the month (&lt;code&gt;6#3&lt;/code&gt; is the third Friday). More expressive, less portable. If you're on a Quartz-based stack you're already in a different syntax and most of the POSIX gotchas above don't apply.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Written with AI assistance and human review.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>devtools</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
