<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: port</title>
    <description>The latest articles on DEV Community by port (@port).</description>
    <link>https://dev.to/port</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2459712%2Fd03984f1-d806-4886-ad9d-55a9e1c42928.jpeg</url>
      <title>DEV Community: port</title>
      <link>https://dev.to/port</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/port"/>
    <language>en</language>
    <item>
      <title>I asked Fable 5 to build a dex. Here's how it went.</title>
      <dc:creator>port</dc:creator>
      <pubDate>Thu, 11 Jun 2026 14:33:09 +0000</pubDate>
      <link>https://dev.to/port/i-asked-fable-5-to-build-a-dex-heres-how-it-went-23ol</link>
      <guid>https://dev.to/port/i-asked-fable-5-to-build-a-dex-heres-how-it-went-23ol</guid>
      <description>&lt;p&gt;A few days ago I published the Gemma 4 12B test, where a free local model wrote a dapp and found zero of its own bugs. The obvious follow-up was to run the same test on a frontier model and keep the same score. So I asked Claude's new Fable 5 to build me a dex, full stack, contracts to frontend, in one autonomous pass. It needed me exactly once across the whole build, to send 5 testnet MON to an address it generated for itself.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Gemma test, with the same methodology and scoring: &lt;a href="https://portdeveloper.github.io/articles/i-asked-gemma-4-12b-to-create-a-dapp.html" rel="noopener noreferrer"&gt;I asked Gemma 4 12B to create a dapp. Make no mistakes.&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Same test, opposite end of the scale
&lt;/h2&gt;

&lt;p&gt;My prompt was the same kind of one-liner I gave Gemma. Here it is in full, and it is everything the model got from me up front:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You are claude fable 5, I want to measure your web3 dapp generation capabilities. How should we go about this? Usually I ask an agent to create a simple dex. Let's plan first.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The first thing it did was go find my Gemma article and copy the methodology out of it (the same compile-test-deploy gauntlet, scored by how many times a human has to step in). Which means the scoring you're about to read was partly designed by the thing being scored. Make of that what you will.&lt;/p&gt;

&lt;p&gt;It asked me a few planning questions before starting. I told it to verify everything locally first and then deploy to Monad testnet, asked for the full stack, and turned down milestone check-ins. Those three answers, plus the gas later, are the complete list of things I typed for the rest of the build. It also set one constraint on itself that I liked: write the AMM from scratch instead of forking Uniswap V2, because a fork only proves you can copy. Then it went off and I watched. One model the whole way through, no subagents and no fallback to anything smaller.&lt;/p&gt;

&lt;h2&gt;
  
  
  The contracts compiled first try, and it debugged its own tests
&lt;/h2&gt;

&lt;p&gt;The Solidity came back as a proper v2-style constant-product AMM (factory, pair, router, two demo tokens), no OpenZeppelin, and it compiled clean on the first attempt. The security work showed up without me asking for any of it: a reentrancy lock and a fee-adjusted k-check in the pair, plus slippage and deadline guards on every router entry point. The one that got me was the minimum-liquidity burn against the first-depositor inflation attack, because it also wrote a test that actually runs the attack and checks the attacker ends up owning 1 share out of 1001. Here it is, trimmed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vm.startPrank(attacker);
router.addLiquidity(address(tokenA), address(tokenB), 1001, 1001, 0, 0, attacker, DEADLINE);
Pair p = Pair(factory.getPair(address(tokenA), address(tokenB)));
// Donate to inflate share price.
tokenA.transfer(address(p), 10_000e18);
tokenB.transfer(address(p), 10_000e18);
p.sync();
vm.stopPrank();

// The attacker holds 1 of 1001 total shares: &amp;gt;99.9% of the donation
// accrues to the locked dead shares, not the attacker.
assertEq(p.balanceOf(attacker), 1);
assertEq(p.totalSupply(), 1001);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The comments are its own. I went looking for this pattern because it's the kind of thing auditors bill for, and it was already in the test file with the attack spelled out.&lt;/p&gt;
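
&lt;p&gt;For the arithmetic behind that "1 of 1001", here is a rough TypeScript sketch of first-deposit minting in a v2-style pair. It assumes the pair locks 1000 shares on the first deposit, the way Uniswap V2 does, which is consistent with the numbers in the test but is not confirmed here. &lt;code&gt;isqrt&lt;/code&gt; and &lt;code&gt;firstDepositShares&lt;/code&gt; are hypothetical names, not taken from the repo.&lt;/p&gt;

```typescript
// Sketch only: first-deposit share minting in a v2-style pair.
// ASSUMPTION: 1000 shares are locked forever on the first deposit
// (Uniswap V2's MINIMUM_LIQUIDITY); the pair in the article may differ.
// BigInt(...) calls are used instead of 1000n literals so the sketch
// also compiles on pre-ES2020 targets.
const MINIMUM_LIQUIDITY = BigInt(1000);

// Integer square root (rounded down) via Newton's method.
function isqrt(n: bigint): bigint {
  if (n === BigInt(0)) return n;
  let x = n;
  let y = (x + BigInt(1)) / BigInt(2);
  while (x > y) {
    x = y;
    y = (x + n / x) / BigInt(2);
  }
  return x;
}

// Shares created by the very first deposit of amountA and amountB.
function firstDepositShares(amountA: bigint, amountB: bigint) {
  const totalSupply = isqrt(amountA * amountB);
  if (MINIMUM_LIQUIDITY >= totalSupply) {
    throw new Error("deposit too small to mint any shares");
  }
  return {
    totalSupply, // every share in existence after the deposit
    locked: MINIMUM_LIQUIDITY, // sent to a dead address, never redeemable
    depositor: totalSupply - MINIMUM_LIQUIDITY, // what the first depositor holds
  };
}

const shares = firstDepositShares(BigInt(1001), BigInt(1001));
console.log(shares.depositor, shares.totalSupply);
```

&lt;p&gt;With the 1001/1001 deposit from the test, the depositor ends up with 1 share of a 1001 total, so nearly all of a later donation accrues to the locked shares.&lt;/p&gt;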

&lt;p&gt;The test suite is where the Gemma comparison gets interesting. First run: 18 of 20 green. With Gemma, every red test turned into me reading the trace and spelling out the cause. Here the model read its own failure output, worked out that both failures were bugs in the test fixtures (its attacker tried to donate more tokens than it had left after seeding the pool, and one test expected the k-check revert when an earlier input check fires first), fixed the tests, and left the contracts alone. The full suite is 21 tests including fuzz runs, all passing, and the contracts never changed after their first draft.&lt;/p&gt;

&lt;h2&gt;
  
  
  It read the library instead of dreaming it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Gemma's failure mode was inventing APIs. Fable's habit is checking them.&lt;/strong&gt; wagmi 3 came out after most training data, and instead of writing imports from memory it grepped the actual &lt;code&gt;.d.ts&lt;/code&gt; files in &lt;code&gt;node_modules&lt;/code&gt; to confirm which hooks and connectors exist before using them. The frontend still wasn't flawless: the type checker caught two config-level mistakes (the Next.js scaffold targets ES2017, which breaks BigInt literals, and wagmi narrows chain ids in a way its first attempt didn't satisfy), but it found and fixed both from the compiler output. At one point its fix "didn't work" because TypeScript's incremental cache was serving stale errors, and it figured that out too instead of churning on correct code. That is precisely the trap the 12B fell into for three rounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  It built its own way to click the buttons
&lt;/h2&gt;

&lt;p&gt;My favorite part of the run. A headless agent can't operate MetaMask, so it gave the app a dev-only mock wallet wired to an Anvil account, wrote a Playwright script, and drove its own UI through the whole swap flow like a user, reading the numbers off the screen as it went. The quote shown in the interface for 100 WMON over a 10,000/20,000 pool was 197.431606 USDC, which matches the constant-product formula to all six displayed decimals. The script passed on its first run.&lt;/p&gt;
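
&lt;p&gt;If you want to check that quote yourself, here is a minimal sketch of the v2-style constant-product formula in integer math. It assumes a 0.3% swap fee, WMON with 18 decimals and USDC with 6. None of those are stated above, but together they reproduce the number on the screen. &lt;code&gt;getAmountOut&lt;/code&gt; and &lt;code&gt;toUnits&lt;/code&gt; are illustrative names, not the repo's.&lt;/p&gt;

```typescript
// Sketch only: a v2-style constant-product quote in integer math.
// ASSUMPTIONS (not stated in the article): a 0.3% swap fee (997/1000),
// WMON with 18 decimals and USDC with 6.
function getAmountOut(amountIn: bigint, reserveIn: bigint, reserveOut: bigint): bigint {
  const amountInWithFee = amountIn * BigInt(997);
  const numerator = amountInWithFee * reserveOut;
  const denominator = reserveIn * BigInt(1000) + amountInWithFee;
  return numerator / denominator; // bigint division rounds down, like Solidity
}

// Whole tokens to base units, e.g. toUnits(100, 18) is 100 WMON.
function toUnits(whole: number, decimals: number): bigint {
  let units = BigInt(whole);
  for (let i = 0; decimals > i; i++) {
    units = units * BigInt(10);
  }
  return units;
}

const out = getAmountOut(toUnits(100, 18), toUnits(10000, 18), toUnits(20000, 6));
console.log(out); // 197431606n, i.e. 197.431606 USDC at 6 decimals
```

&lt;p&gt;Subtracting that output from the 20,000 USDC reserve leaves 19802.568394, which lines up with the reserves shown in the screenshot.&lt;/p&gt;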

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fi-asked-fable-5-to-build-a-dex%2Fafter-swap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fi-asked-fable-5-to-build-a-dex%2Fafter-swap.png" title="the pool tab right after its own headless swap, 10100 / 19802.5683 reserves" alt="the dex UI after the headless swap" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The machine fought back a little (my fault, the box is a graveyard of old benchmark sessions 😭). A leftover Anvil from some previous run was still holding port 8545 with dirty nonces, so the first local deploy landed on wrong addresses, and ports 3000 through 3002 were busy too. It killed the stale node and moved its dev server to 3010 without any of that friction reaching me.&lt;/p&gt;

&lt;h2&gt;
  
  
  One human, one job: gas
&lt;/h2&gt;

&lt;p&gt;For the testnet deploy it generated a fresh keypair, dropped the key in a gitignored &lt;code&gt;.env&lt;/code&gt;, and asked me to fund the address. I sent 5 MON, which is the entire human contribution to this project. It deployed, seeded the pool, then verified the deployment the paranoid way: pulled 1,000 WMON from its own faucet contract and executed a real swap on testnet. 10 WMON in, 19.920139 USDC out, again exact against the formula. All five contracts came back &lt;code&gt;exact_match&lt;/code&gt; on Monad's Sourcify, and about 1.4 MON of the 5 got spent. The whole run, from that first prompt to verified contracts on the explorer, took about 25 minutes of wall clock.&lt;/p&gt;

&lt;p&gt;Final score, same scale as last time. Gemma needed a human to name every bug before it could fix one. Fable's sheet reads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;contracts: compiled on the first attempt, never edited after their first draft&lt;/li&gt;
&lt;li&gt;tests: 18/20 on the first run, both reds self-diagnosed as fixture bugs, 21/21 after one pass&lt;/li&gt;
&lt;li&gt;frontend: two type-level mistakes, both found and fixed from compiler output&lt;/li&gt;
&lt;li&gt;browser e2e it wrote for itself: passed first run&lt;/li&gt;
&lt;li&gt;humans required: one, holding 5 testnet MON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One honest caveat: the browser test signs through that mock connector, so real wallet UX, the network-switch prompt, a user rejecting a transaction, never got exercised end to end. The testnet swap went through &lt;code&gt;cast&lt;/code&gt; rather than the UI. If there's a bug left in this thing, it's hiding in that gap. Next round I'd make it drive a real wallet extension instead of the mock, and I'd hand it something nastier than a textbook AMM (fee-on-transfer tokens are the classic way to break a pool like this one).&lt;/p&gt;

&lt;p&gt;So my verdict, same framing as last time. The 12B was a fast junior who needed me standing over its shoulder. This worked more like a contractor I hired and never met: it scoped the job, built it, checked its own work, and billed me for materials. What's left of my job on a build like this is choosing what gets built and reviewing what comes back, and the caveat above is exactly why the reviewing half hasn't gone anywhere.&lt;/p&gt;

&lt;p&gt;The dex is live at &lt;a href="https://fableswap.vercel.app" rel="noopener noreferrer"&gt;https://fableswap.vercel.app&lt;/a&gt;: connect a wallet on Monad testnet, grab demo tokens from the in-app faucet, and swap against the same pool it deployed. The contracts are verified on the explorer, with the factory at &lt;code&gt;0x514d4aD259143c4a6bE7C2399D46CBe8B1F9E2Db&lt;/code&gt; (&lt;a href="https://testnet.monadexplorer.com/address/0x514d4aD259143c4a6bE7C2399D46CBe8B1F9E2Db" rel="noopener noreferrer"&gt;explorer&lt;/a&gt;), and the repo with the full run log is at &lt;a href="https://github.com/portdeveloper/fableswap" rel="noopener noreferrer"&gt;https://github.com/portdeveloper/fableswap&lt;/a&gt; (the scorecard lives in &lt;code&gt;BENCHMARK.md&lt;/code&gt;). If you run a model through this same gauntlet, I want to see the scorecard.&lt;/p&gt;

&lt;p&gt;Questions?&lt;/p&gt;

</description>
      <category>claude</category>
      <category>monad</category>
      <category>ai</category>
    </item>
    <item>
      <title>I asked Gemma 4 12B to create a dapp. Make no mistakes.</title>
      <dc:creator>port</dc:creator>
      <pubDate>Thu, 04 Jun 2026 22:53:40 +0000</pubDate>
      <link>https://dev.to/port/i-asked-gemma-4-12b-to-create-a-dapp-make-no-mistakes-56c5</link>
      <guid>https://dev.to/port/i-asked-gemma-4-12b-to-create-a-dapp-make-no-mistakes-56c5</guid>
      <description>&lt;p&gt;A free model that fits on a laptop wrote my entire dapp, contract and frontend, and then couldn't find a single one of its own bugs.&lt;/p&gt;

&lt;p&gt;I had a question in mind: can a free, open model you run on your own machine actually build something real for an EVM chain? So I set it up as a test. A local Gemma 4 12B wrote the code, and Claude operated it, sending the prompts and pasting back whatever the compiler said. I kept every prompt and every broken file, so you can see for yourself where a 12B helps and where it falls over.&lt;/p&gt;

&lt;p&gt;The model is the new Gemma 4 12B, out June 3rd under an Apache 2.0 license, so you can do what you like with it. It fits in about 16GB, so I ran it on my own machine with llama.cpp, no API key and nothing leaving the laptop. It managed 20 to 40 tokens a second. The thing I had it build is a game called last-clicker. You pay a tiny fee to click, and each click resets a short countdown. Whoever clicked last when the timer runs out takes the pot. I built it against Anvil, Foundry's local node.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first draft was good code that didn't compile
&lt;/h2&gt;

&lt;p&gt;I gave it one prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Build a "last clicker" game in Solidity with Foundry: a pot funded by a small fee per click, a short countdown that resets on each click, and whoever clicked last when the timer ends can claim the pot. Give me the contract.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The game logic came back right on the first try, and so did the security. Its &lt;code&gt;claim()&lt;/code&gt; clears the balance before it sends any money out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function claim() external {
    require(block.timestamp &amp;gt;= gameEndTime, "Timer has not expired yet");
    require(msg.sender == lastClickListener, "You were not the last clicker");
    require(pot &amp;gt; 0, "Pot is empty");

    uint256 amount = pot;
    pot = 0;                              // state cleared first
    gameActive = false;
    lastClickListener = address(0);

    payable(msg.sender).transfer(amount); // then the transfer
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That ordering, state first and the external call last, is what stops a reentrancy attack, where the recipient calls back into &lt;code&gt;claim()&lt;/code&gt; and drains the contract before the balance updates. Reentrancy is the bug behind the 2016 DAO hack. I assumed a 12B would reach for the naive version, but it wrote the safe one.&lt;/p&gt;

&lt;p&gt;What it could not do was hand me a project that compiled. The test file opened with this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import "hardhat"; // If using standard, but for Foundry we use:
import "../src/LastClicker.sol";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a Hardhat import in a Foundry project, with a half-finished comment where the model started to correct itself and gave up. The contract declared its constructor twice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;constructor() {
    gameActive = true;
    gameEndTime = block.timestamp + COUNTDOWN_DURATION;
}
// ...further down, in the same contract...
constructor() {
    owner = msg.sender;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the test set itself up with a deploy helper that doesn't exist in Foundry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;game = LastClicker(deploy(LastClicker.sol));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;None of it compiles, so I pasted back just the first error, the Hardhat import, and it rewrote the whole file in one pass, fixing every compile error, including the ones I hadn't pointed at. For boilerplate it can't quite remember, that's a fast way back to green.&lt;/p&gt;

&lt;h2&gt;
  
  
  Then it couldn't debug its own tests
&lt;/h2&gt;

&lt;p&gt;The code compiled, so I ran the tests. All three reverted on the first line that moved money:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vm.prank(player1);
game.click{value: 0.001 ether}();   // reverts: player1 holds no ether
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The test never funded the accounts. In Foundry you give a test address a balance with &lt;code&gt;vm.deal&lt;/code&gt;, and that one line fixes all three. I handed it the failure. It added &lt;code&gt;vm.warp&lt;/code&gt;, then on the next round &lt;code&gt;vm.roll&lt;/code&gt;, convinced the problem was timing. Three rounds in, the tests were failing exactly as before, down to the gas, and it was still editing the clock while the real cause sat untouched in its own output.&lt;/p&gt;

&lt;p&gt;So I stopped asking it to fix the tests and told it the cause instead:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The tests revert on the first &lt;code&gt;click{value:}&lt;/code&gt; because the player accounts have a zero balance. In Foundry you fund an address with &lt;code&gt;vm.deal&lt;/code&gt;. Fix the test.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It added &lt;code&gt;vm.deal&lt;/code&gt;, and one of the three passed. The other two had their own bugs: a timer check that never advanced the clock, and player addresses set to &lt;code&gt;address(1)&lt;/code&gt; and &lt;code&gt;address(2)&lt;/code&gt;, which are precompiles and can't receive ether. Each passed only after I named the exact cause. &lt;strong&gt;It can apply a fix you hand it, but it can't find one on its own.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fi-asked-gemma-4-12b-to-create-a-dapp%2Ftests.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fi-asked-gemma-4-12b-to-create-a-dapp%2Ftests.png" title="three rounds red, then green the moment I named the cause" alt="forge test output, three rounds failing then passing" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The frontend looked finished and was hollow
&lt;/h2&gt;

&lt;p&gt;I asked for a single-page frontend with viem. The layout it returned was genuinely good, a clean dark card with a live countdown. The web3 layer under it was invented from scratch, starting with the imports:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;createPublicClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;createWalletClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;parseEther&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;publicAddress&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;solidityAbiInterpreter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;formatEther&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://esm.sh/viem&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;publicAddress&lt;/code&gt; and &lt;code&gt;solidityAbiInterpreter&lt;/code&gt; are not part of viem. They sound like they should be, which is the whole problem. It then sent transactions through a method it invented:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;walletClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendTransaction&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CONTRACT_ADDRESS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;contract&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;writeMethods&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;encoded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// not a real thing&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It built the chain config with the wrong shape and called &lt;code&gt;wallet_switchChain&lt;/code&gt;, which isn't a real wallet method (the real one is &lt;code&gt;wallet_switchEthereumChain&lt;/code&gt;). On a library it has seen less of, it knows the silhouette of the right code and fills the specifics with confident fiction, and the glue between a contract and a UI is almost all specifics. I rewrote the wiring myself. The interface was its work, the plumbing was mine.&lt;/p&gt;
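
&lt;p&gt;For reference, a small sketch of what the real request looks like. &lt;code&gt;wallet_switchEthereumChain&lt;/code&gt; comes from EIP-3326 and takes the chain id as a hex string. &lt;code&gt;buildSwitchChainRequest&lt;/code&gt; is a hypothetical helper (not a viem export), and 10143 as Monad testnet's chain id is an assumption worth checking against the chain's docs.&lt;/p&gt;

```typescript
// Sketch only: the JSON-RPC payload for switching chains (EIP-3326).
// buildSwitchChainRequest is a hypothetical helper, not a viem export.
function buildSwitchChainRequest(chainId: number) {
  return {
    method: "wallet_switchEthereumChain",
    // The chain id must be a 0x-prefixed hex string, not a number.
    params: [{ chainId: "0x" + chainId.toString(16) }],
  };
}

// ASSUMPTION: 10143 is Monad testnet's chain id; check the chain's docs.
const request = buildSwitchChainRequest(10143);
console.log(request.params[0].chainId); // "0x279f"

// In a browser with an injected wallet this would be sent as:
//   await window.ethereum.request(request);
```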

&lt;h2&gt;
  
  
  The reveal: it was Monad, and it took one line
&lt;/h2&gt;

&lt;p&gt;I never told the model what chain this was for, because there was nothing to tell it. Anvil is just the EVM, and every line it wrote was ordinary EVM code. Once the contract and tests were green, I pointed Foundry at one URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;forge create src/LastClicker.sol:LastClicker &lt;span class="nt"&gt;--rpc-url&lt;/span&gt; https://testnet-rpc.monad.xyz &lt;span class="nt"&gt;--broadcast&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Foundry read the chain id off the endpoint on its own, and the deploy went through on the first try. Verifying the source on Monad's explorer was one more API call that came back a perfect match. The chain was Monad (where I work, so grain of salt), and the model never needed to know it, because Monad runs EVM bytecode and the Solidity it already knew was correct. The only Monad-specific detail in the whole build was that one RPC URL, and even the testnet MON for gas came from an agent faucet over an API call.&lt;/p&gt;

&lt;p&gt;One honest caveat: forge's linter flagged the timer for leaning on &lt;code&gt;block.timestamp&lt;/code&gt;, which validators can nudge. That matters more on a one-second chain than a twelve-second one, and you would tighten it before mainnet.&lt;/p&gt;

&lt;p&gt;The result is live at &lt;a href="https://gemma-last-clicker.vercel.app" rel="noopener noreferrer"&gt;https://gemma-last-clicker.vercel.app&lt;/a&gt;. Connect a wallet with a little testnet MON and click.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fi-asked-gemma-4-12b-to-create-a-dapp%2Flive-game.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fi-asked-gemma-4-12b-to-create-a-dapp%2Flive-game.png" title="the game, live on monad testnet" alt="the last-clicker game running on Monad testnet" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every click is a real transaction that confirms in about a second and costs a fraction of a cent, which is the only reason a game made of last-second clicks can live entirely on-chain.&lt;/p&gt;

&lt;h2&gt;
  
  
  So how usable is it?
&lt;/h2&gt;

&lt;p&gt;Treat a free local model as a fast junior. It is genuinely good at the parts it has seen a thousand times, standard contract logic and clean HTML, and it reached for the right security pattern without being asked. It saves you real time on the first draft. It comes apart the moment it touches a specific library's real API or has to read a stack trace, and across this whole build it found zero of its own bugs. Every error was caught by the compiler or by me.&lt;/p&gt;

&lt;p&gt;So a 12B gets you a working first draft of a contract and a good-looking shell of a frontend, and then you do the debugging and the integration by hand. For learning and for things you'll throw away, that's plenty. For anything you would deploy and walk away from, it needs someone next to it who can read the errors it can't.&lt;/p&gt;

&lt;p&gt;The repo has the code and every prompt I used: &lt;a href="https://github.com/portdeveloper/gemma-last-clicker" rel="noopener noreferrer"&gt;https://github.com/portdeveloper/gemma-last-clicker&lt;/a&gt;. The file that finally got it deploying to Monad cleanly is &lt;code&gt;MONAD_CONTEXT.md&lt;/code&gt; in there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fi-asked-gemma-4-12b-to-create-a-dapp%2Fwizard-meme.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fi-asked-gemma-4-12b-to-create-a-dapp%2Fwizard-meme.gif" title="go do some magic" alt="wizard meme" width="480" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Questions?&lt;/p&gt;

</description>
      <category>gemma</category>
      <category>monad</category>
      <category>ai</category>
    </item>
    <item>
      <title>Simple just works: how i built puddleswap</title>
      <dc:creator>port</dc:creator>
      <pubDate>Wed, 20 May 2026 11:18:48 +0000</pubDate>
      <link>https://dev.to/port/simple-just-works-how-i-built-puddleswap-gmi</link>
      <guid>https://dev.to/port/simple-just-works-how-i-built-puddleswap-gmi</guid>
      <description>&lt;p&gt;Any problem yields to enough complexity.&lt;/p&gt;

&lt;p&gt;Most of puddleswap was built by an AI agent, and that is most of why I want to talk about it. An agent reaches for the textbook answer by default, because the textbook is what it read. Engineers do the same, since most of us meet the general case years before we ever meet the specific one. The work that's left for a human is catching the moment a clever solution is solving a problem you don't actually have. I almost missed that moment on the routing. Here's how that went, plus the gut-check I run now before writing anything clever. If you ever feel yourself overengineering things, this is for you.&lt;/p&gt;

&lt;p&gt;I was at a &lt;a href="https://blitz.devnads.com" rel="noopener noreferrer"&gt;Monad Blitz&lt;/a&gt; event, if I am not mistaken it was the one in Ankara, and I was watching everyone around me hack on cool stuff while I sat in the corner answering their questions. I mean that's my job but it felt weird not building stuff while everyone else is trying their best.&lt;/p&gt;

&lt;p&gt;So at some point I figured I should just build something (while not ignoring people at the same time lol). Something simple enough that the brag would be how little it took.&lt;/p&gt;

&lt;p&gt;That's how puddleswap happened. A dex on Monad testnet.&lt;/p&gt;

&lt;p&gt;Going in, I wanted the fewest moving parts I could get away with. The thing I'd be most proud of would be how little there was to maintain. (If possible, I wanted nothing to maintain at all.)&lt;/p&gt;

&lt;p&gt;The agent did the bulk of it. It wrote the React frontend and deployed the contracts; the swap UI came together as it went. The contracts are stock Uniswap V2, audited a thousand times over the years (centuries in web3). The frontend is Vite + React with no backend. The swap accepts real Circle USDC, a mock USDT we deployed for testnet liquidity, and WMON. A small rebalancer service on Railway keeps the price pegs roughly honest.&lt;/p&gt;

&lt;p&gt;It's live at &lt;a href="https://app.puddleswap.org/" rel="noopener noreferrer"&gt;app.puddleswap.org&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The build was mostly uneventful. The agent did its thing, I reviewed diffs, we iterated. What I want to talk about is the one decision I almost got wrong: the routing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing I almost overengineered
&lt;/h2&gt;

&lt;p&gt;The standard answer to "how does a DEX UI route swaps" is a graph algorithm. You have N tokens and M pools, so you build the liquidity graph, run a shortest-path search weighted by output amount, and return the best route. 1inch and Matcha both work this way and every aggregator article online tells you to do the same, so I started writing it.&lt;/p&gt;

&lt;p&gt;Then I looked at my actual data.&lt;/p&gt;

&lt;p&gt;Three "core" tokens: USDC, USDT, WMON. Maybe ten pools, every one of them touching at least one core. I was writing a graph algorithm to solve a problem I didn't have.&lt;/p&gt;

&lt;p&gt;The gut-check is dumber than it sounds: look at your actual data before you pick the algorithm, and count the inputs while you're in there. I had three hubs and ten pools. I was about to write code for a scale I would never see.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fsimple-just-works-how-i-built-puddleswap%2Fstar-routing.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fsimple-just-works-how-i-built-puddleswap%2Fstar-routing.svg" title="the whole graph: A and B connect through three core hubs, plus a direct edge when one exists" alt="Star routing diagram with A on the left, B on the right, and three core hubs USDC, USDT and WMON in the middle" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I deleted it and wrote the following instead (s/o to @danielvf for the idea + the initial PRD).&lt;/p&gt;

&lt;h2&gt;
  
  
  The enumeration
&lt;/h2&gt;

&lt;p&gt;For any swap A → B, enumerate every plausible route through the hubs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct: &lt;code&gt;A → B&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Through one hub: &lt;code&gt;A → USDC → B&lt;/code&gt;, &lt;code&gt;A → USDT → B&lt;/code&gt;, &lt;code&gt;A → WMON → B&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Through two hubs: &lt;code&gt;A → USDC → USDT → B&lt;/code&gt;, &lt;code&gt;A → USDC → WMON → B&lt;/code&gt;, &lt;code&gt;A → USDT → WMON → B&lt;/code&gt;, and the reverse of each&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's at most ten candidate paths. Send all ten quote requests in one multicall, pick the path with the highest output, swap on that.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;routes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildCandidateRoutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tokenIn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tokenOut&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cores&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;publicClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;multicall&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;contracts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;routes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;abi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;routerAbi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;getAmountsOut&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;amountIn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;})),&lt;/span&gt;
  &lt;span class="na"&gt;allowFailure&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;best&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;selectBestQuote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The whole router is around 50 lines. It builds the candidate list (deduped) and returns whichever path the multicall said had the highest quote.&lt;/p&gt;
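&lt;p&gt;For illustration, here is roughly what the candidate-building half could look like. This is a sketch written for this explanation, not the code from the repo: the function name is borrowed from the snippet above, while the &lt;code&gt;Address&lt;/code&gt; alias, the plain-string addresses and the dedupe key are assumptions.&lt;/p&gt;

```typescript
// Hypothetical sketch, not the repo's routing.ts.
// Assumption: addresses are plain strings with consistent casing.
type Address = string;

function buildCandidateRoutes(
  tokenIn: Address,
  tokenOut: Address,
  cores: Address[],
): Address[][] {
  // A hub is only useful as a middle hop if it is not one of the endpoints.
  const hubs = cores.filter((hub) => ![tokenIn, tokenOut].includes(hub));

  // Direct route first.
  const routes: Address[][] = [[tokenIn, tokenOut]];

  // One hub in the middle.
  for (const hub of hubs) {
    routes.push([tokenIn, hub, tokenOut]);
  }

  // Two hubs in the middle, in both orders.
  for (const first of hubs) {
    for (const second of hubs) {
      if (first !== second) {
        routes.push([tokenIn, first, second, tokenOut]);
      }
    }
  }

  // Dedupe in case `cores` lists the same hub twice.
  const seen = new Set();
  return routes.filter((path) => {
    const key = path.join(",");
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

&lt;p&gt;With three hubs and two non-hub tokens this returns ten paths. If &lt;code&gt;tokenIn&lt;/code&gt; is itself a hub, that hub is dropped from the middle hops and the list shrinks to five.&lt;/p&gt;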

&lt;h2&gt;
  
  
  The agent will hand you the general solution
&lt;/h2&gt;

&lt;p&gt;I'm not saying graph routing is wrong. For a mainnet aggregator routing across thousands of pools and dozens of DEXes, it's the right tool. But I wasn't building that.&lt;/p&gt;

&lt;p&gt;The old lesson was: "a lot of code over-solves the problem."&lt;/p&gt;

&lt;p&gt;You see it everywhere once you start looking. A sorting algorithm where the data is always ten items or fewer, when plain insertion sort would have done. A caching layer sitting in front of a database that gets hit twice a day, as if the database weren't already a cache. Or my favorite, pub/sub wired up for exactly one publisher and one subscriber, where you could have called the function. Another one you might have noticed: claude suggesting redis for caching in a tiny app, where a simple in-memory cache would do because the app doesn't restart often enough to justify anything more.&lt;/p&gt;

&lt;p&gt;That redis suggestion is the tell, and it's worth sitting with. The smart-looking solution is usually the general problem dressed up, and there are now two reasons it ends up in your editor. An engineer reaches for it because the general case is what they studied, and because the small version doesn't look like much (nobody brags about an insertion sort). An agent reaches for it because the general case is most of what it read. "Trained on" is literal for the agent and a figure of speech for the human, and the two of you ship the same overbuilt code.&lt;/p&gt;

&lt;p&gt;And the new problem we are facing is that &lt;strong&gt;the interesting work has shifted from writing the solution to spotting the constraint.&lt;/strong&gt; The agent can write the graph router faster than I can, and it will, unless I hand it the shape of what I actually have. On puddleswap that shape is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One chain, one DEX&lt;/li&gt;
&lt;li&gt;Three hub tokens I control (or my agent controls)&lt;/li&gt;
&lt;li&gt;Operator-maintained liquidity&lt;/li&gt;
&lt;li&gt;A UI so simple that my grandma could use it (rip grandma)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Give it those four lines and enumeration falls out on its own. Within those constraints it's correct (every meaningful route gets checked) and faster than graph traversal, since it's one batched RPC instead of N round-trips. It's also a fraction of the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  When this breaks
&lt;/h2&gt;

&lt;p&gt;I'd be lying if I said this scales. The enumeration is correct because of one invariant I quietly lean on: every pool touches a core token, so every route worth taking runs through a hub. The failure modes are all just that invariant giving way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exotic-to-exotic pools that bypass the hubs entirely. Enumeration misses them.&lt;/li&gt;
&lt;li&gt;A hub runs dry of liquidity on one side. The router still checks routes through it and eats a bad quote.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The day that invariant stops holding is the day I bother writing the graph router.&lt;br&gt;
(it'll probably do fine as it is right now)&lt;/p&gt;
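&lt;p&gt;In the meantime, the selection step can absorb part of the second failure mode: the multicall in the router snippet runs with &lt;code&gt;allowFailure: true&lt;/code&gt;, so a quote that reverts comes back as a failure entry instead of throwing. A minimal sketch of that step, assuming viem's result shape for &lt;code&gt;allowFailure: true&lt;/code&gt;. The two-argument signature (routes plus results) and the &lt;code&gt;BestQuote&lt;/code&gt; type are hypothetical; the router snippet passes only the results.&lt;/p&gt;

```typescript
// Assumed shape of one multicall entry when allowFailure is true.
type QuoteResult =
  | { status: "success"; result: readonly bigint[] }
  | { status: "failure"; error: Error };

// Hypothetical return type for this sketch.
type BestQuote = { path: string[]; amountOut: bigint };

// results[i] is assumed to be the quote for routes[i].
function selectBestQuote(
  routes: string[][],
  results: QuoteResult[],
): BestQuote | null {
  let best: BestQuote | null = null;

  for (let i = 0; i !== results.length; i++) {
    const res = results[i];

    // A reverted quote (for example a pool with no liquidity) is skipped.
    if (res.status !== "success") continue;

    // getAmountsOut returns one amount per hop; the last is the final output.
    const amountOut = res.result[res.result.length - 1];
    if (amountOut === undefined) continue;

    if (best === null || amountOut > best.amountOut) {
      best = { path: routes[i], amountOut };
    }
  }

  // null means no candidate route produced a usable quote.
  return best;
}
```

&lt;p&gt;A drained hub that still returns a quote simply loses to any route with a higher output. It only hurts when every surviving route runs through it, which is the invariant failing again.&lt;/p&gt;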

&lt;h2&gt;
  
  
  The end
&lt;/h2&gt;

&lt;p&gt;If you're building on Monad testnet and need swaps for your tests, puddleswap is live at &lt;a href="https://app.puddleswap.org/" rel="noopener noreferrer"&gt;app.puddleswap.org&lt;/a&gt;. The router is at &lt;a href="https://github.com/portdeveloper/puddleswap/blob/main/web/src/lib/routing.ts" rel="noopener noreferrer"&gt;puddleswap/web/src/lib/routing.ts&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So before you accept the clever thing your agent just wrote, do the part it won't do for you: look at your actual data and ask whether a smaller solution already covers it, because it usually does. And ask the agent for the simpler version out loud, since it won't offer one on its own.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Related: &lt;a href="https://portdeveloper.github.io/articles/how-a-button-i-built-for-one-docs-site-ended-up-on-twenty.html" rel="noopener noreferrer"&gt;How to find ideas worth building&lt;/a&gt; - the same heuristic applied to a different problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Questions?&lt;/p&gt;

</description>
      <category>puddleswap</category>
      <category>monad</category>
      <category>dex</category>
    </item>
    <item>
      <title>You don't know how to vibe-code</title>
      <dc:creator>port</dc:creator>
      <pubDate>Sun, 17 May 2026 12:30:08 +0000</pubDate>
      <link>https://dev.to/port/you-dont-know-how-to-vibe-code-9m9</link>
      <guid>https://dev.to/port/you-dont-know-how-to-vibe-code-9m9</guid>
      <description>&lt;p&gt;It's 2026. We have AGI (or at least the ability to code almost anything thanks to models like Opus 4.5 from Anthropic and GPT 5.2 from OpenAI).&lt;/p&gt;

&lt;p&gt;But there's one problem. What you create in minutes creates problems you spend hours trying to fix. And if you're unlucky, you end up with a spaghetti codebase that no LLM can untangle. You no longer understand the code. It doesn't even make sense to read it anymore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So, what are you doing wrong and what could you do better, and how do some people get everything right when they are vibe-coding?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Honestly, vibe coding kinda gave people the wrong impression about using LLMs to write code. Somehow everyone ended up thinking "yeah i can do this with ONE PROMPT, without EVER LOOKING AT THE CODE".&lt;/p&gt;

&lt;p&gt;That just won't work, unless you consider this good work:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Fgeneric-vibeslop.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Fgeneric-vibeslop.jpg" title="generic vibeslop, with lots of ai-purple" alt="Generic AI-styled application screenshot" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the code behind it is even worse. The AI's knowledge is months old, maybe a year. It doesn't know your codebase. It doesn't know what "done" means.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alright, here's how I actually vibe-code. Or rather, how I use my current favorite tool (claude code) to build real projects.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'm going to walk you through how I built &lt;a href="https://execevents.xyz/" rel="noopener noreferrer"&gt;execevents.xyz&lt;/a&gt;, a real-time execution visualizer for Monad. Blocks race across the screen as they go through consensus. Transactions stream in live. You can see state changes, call traces, gas usage.&lt;/p&gt;

&lt;p&gt;[Embedded media: a short glance at execevents.xyz]&lt;/p&gt;

&lt;p&gt;This isn't a toy project. Under the hood, execevents connects to Monad's Execution Events API—a Rust service that reads blockchain data directly from shared memory, HFT-style. We're talking sub-millisecond latency for real-time block and transaction data. Building something that interfaces with infrastructure this performant would normally require deep systems knowledge.&lt;/p&gt;

&lt;p&gt;But here's the thing: I built this in HOURS, not days, not weeks. Using Claude Code and the methodology below, anyone can build high-performance applications on Monad without being a systems engineer or even a regular developer.&lt;/p&gt;

&lt;p&gt;Below is my methodology for vibe-coding, or simply how I code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Think about the end goal
&lt;/h2&gt;

&lt;p&gt;Visualize the most basic version of what you want to build. I usually ask claude something like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I read about execution events from Monad docs and I want to build an app showing how to use them. Here is the page about execution events: (i paste the markdown here) Do not start building until I confirm. Tell me how you are planning to build this. Then ask me to confirm. Also, ask me any questions you have. Our first goal is to reach to a basic MVP.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Fclaude-plan.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Fclaude-plan.jpg" title="possible answer from my besto-frendo, kraudu kodu-san" alt="Claude Code implementation plan screenshot" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Above is the answer I got from claude. Notice how it told me exactly what it's going to do. I can now visualize what I'm going to get and can direct the project better. This is the point where I want to stop and think. If everything looks OK, I move on to the questions claude asks and start answering them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Much like real coding, you want to spend time thinking about the code rather than writing it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Ftime-spent.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Ftime-spent.jpg" title="here is how I would suggest you to spend time" alt="Suggested distribution of time spent while building with AI coding tools" width="800" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You might do several iterations before you even tell claude to build. In every message, I usually ask it not to build until I like the implementation plan.&lt;/p&gt;

&lt;p&gt;I also use plan mode a lot. It is the new way of telling claude to ask you questions, and it just works really well!&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Build the MVP, then use it
&lt;/h2&gt;

&lt;p&gt;Then, ask claude to start building. When it finishes, test it. This is the part people LOVE skipping, not knowing that the problems that arise later stem from skipping it. When you find an issue, ask claude to fix it. After it fixes that one, go back and find the next problem, and repeat until there are no issues left.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Fvibing-cycle.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Fvibing-cycle.jpg" title="the cycle of vibing" alt="The cycle of prompting, testing, finding issues, and fixing them" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Iterate with small, focused prompts
&lt;/h2&gt;

&lt;p&gt;This is where most people mess up. They find five things wrong and try to fix them all in one massive prompt.&lt;/p&gt;

&lt;p&gt;Don't do that.&lt;/p&gt;

&lt;p&gt;Every time you find something broken, fix just that one thing. Here's what my prompts actually looked like:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt 1:&lt;/strong&gt; "The TPS calculation is wrong. It's counting blocks that arrive in batches over WebSocket. Make it only count consecutive block numbers."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt 2:&lt;/strong&gt; "This doesn't work on mobile. Add a responsive layout with a bottom sheet for block details."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt 3:&lt;/strong&gt; "The block state transitions are too abrupt. Add CSS transitions so blocks slide smoothly between states."&lt;/p&gt;

&lt;p&gt;Each prompt is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specific -&amp;gt; I'm telling it exactly what's wrong&lt;/li&gt;
&lt;li&gt;Small -&amp;gt; targeting one thing&lt;/li&gt;
&lt;li&gt;Reviewable -&amp;gt; I can read the diff and understand what changed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 4: Read the Code
&lt;/h2&gt;

&lt;p&gt;Or at least, take a quick glance at it. Every time Claude makes a change, I read the diff. Not because I don't trust it, but because I need to understand what I'm shipping.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Fclaude-ascii.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Fclaude-ascii.jpg" title="claude is surprisingly good at creating ascii stuff" alt="Claude Code creating ASCII art" width="800" height="667"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reading doesn't mean auditing every line. It usually means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skimming the diff&lt;/li&gt;
&lt;li&gt;Understanding the approach&lt;/li&gt;
&lt;li&gt;Asking yourself "does this make sense?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By reading the code, you will catch mistakes, learn, and stay in control. The moment you stop understanding your codebase is the moment you can't fix it anymore. Do not turn your project into a mess you can't make sense of.&lt;/p&gt;

&lt;p&gt;And if there's something in the code you don't understand, you can open a new terminal window and ask claude code to explain it to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Building execevents taught me things I wouldn't have learned from tutorials.&lt;/p&gt;

&lt;p&gt;On the systems side: I now understand how Monad's Execution Events work at a low level, how the Rust API pulls data from shared memory, why certain event types arrive in batches, and how to handle the timing edge cases that come with real-time blockchain data. Claude didn't just write code; it explained the architecture as we built it. When the TPS calculation was wrong, debugging it meant understanding WebSocket message ordering and block finality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Fme-with-claude.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-dont-know-how-to-vibe-code%2Fme-with-claude.png" title="me with claude" alt="Me with Claude" width="236" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the vibe-coding side: I learned that the quality of your output directly reflects the quality of your iteration loop. The people who fail at vibe-coding aren't bad at prompting, they're bad at testing and reading diffs. They skip the boring parts.&lt;/p&gt;

&lt;p&gt;The real unlock is this: with the right methodology, AI tools let you punch above your weight. You can build performant, production-grade applications that interface with serious infrastructure, even if you've never written Rust or worked with shared memory systems. The barrier isn't coding ability anymore. It's knowing how to guide the process.&lt;/p&gt;

&lt;p&gt;Now, go.&lt;/p&gt;

&lt;p&gt;And do magic, for we live in a magical era.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>claudecode</category>
      <category>monad</category>
    </item>
    <item>
      <title>You are prompting GPT 5.5 wrong.</title>
      <dc:creator>port</dc:creator>
      <pubDate>Sun, 17 May 2026 12:29:54 +0000</pubDate>
      <link>https://dev.to/port/you-are-prompting-gpt-55-wrong-505n</link>
      <guid>https://dev.to/port/you-are-prompting-gpt-55-wrong-505n</guid>
      <description>&lt;p&gt;Source: OpenAI.&lt;/p&gt;

&lt;p&gt;Prompting GPT 5.5 is A LOT different than how you prompted any model before. And GPT 5.5 itself can't write good prompts for itself! See the screenshot below from &lt;a class="mentioned-user" href="https://dev.to/victortaelin"&gt;@victortaelin&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-are-prompting-gpt-5-5-wrong%2Fvictor-taelin-prompt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-are-prompting-gpt-5-5-wrong%2Fvictor-taelin-prompt.jpg" title="btw def follow Taelin!" alt="Screenshot of a Victor Taelin post about GPT 5.5 prompting"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, in this short article, I will be talking about how to create good prompts for GPT 5.5 so that you can do your work better &amp;amp; faster.&lt;/p&gt;

&lt;p&gt;Btw before we go any further, this guide is for using GPT 5.5 inside Codex.&lt;/p&gt;

&lt;p&gt;So here's what changed. Older models needed you to walk them through the steps. First do this, then check that, then call this tool. GPT 5.5 reasons more efficiently and that kind of prompting actively makes it worse. It narrows the search space &amp;amp; you end up with mechanical answers.&lt;/p&gt;

&lt;p&gt;The fix is the opposite of what people are doing. Describe the destination, not the route. Let the model figure out the path.&lt;/p&gt;

&lt;p&gt;I've been changing how I prompt since 5.5 dropped. Here are the 5 moves with the highest hit rate, with examples you can paste in (or modify) directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Lead with the outcome
&lt;/h2&gt;

&lt;p&gt;Stop telling the model HOW to solve the problem. Instead, tell it what the result should look like.&lt;/p&gt;

&lt;p&gt;(btw the full examples are at the end)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Resolve the customer's issue end to end.

Success means:
- the eligibility decision is made from the available policy and account data
- any allowed action is completed before responding
- the final answer includes completed_actions, customer_message, and blockers
- if evidence is missing, ask for the smallest missing field
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Kill the preamble
&lt;/h2&gt;

&lt;p&gt;Codex loves to narrate. "I'll start by examining the file structure." "Let me first check the existing implementation." "Now I'll proceed to make the changes."&lt;/p&gt;

&lt;p&gt;You don't need any of this. You can see what it's doing. The preamble is noise &amp;amp; it eats latency before any real work happens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Skip preambles. Do not narrate what you are about to do before doing it. Do not announce tool calls. Do not end with "Let me know if you'd like adjustments" or "Feel free to ask if you have questions."

When you finish, report what changed in 2-4 lines. File paths, what was modified, anything I need to know to use the change. That's it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Bias to action, finish what you start
&lt;/h2&gt;

&lt;p&gt;Default Codex behavior on a hard task is to surface a plan and stop. We don't want that. We want action. Get action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bias to action. If the request is clear and the next step is reversible, just do it. Do not stop at analysis, do not stop at a plan, do not stop after the first file change.

Persist until the task is fully handled end to end in this turn:
- carry changes through implementation, verification, and a clear summary
- if you hit a blocker, try one more reasonable approach before stopping
- only stop early if the next step is irreversible, destructive, or genuinely ambiguous

Unless I explicitly ask for a plan or a question, assume I want code shipped.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(btw this is from the OpenAI Codex starter prompt)&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Read in parallel, not one file at a time
&lt;/h2&gt;

&lt;p&gt;Watch Codex on a real task. It reads package.json, waits, reads src/index.ts, waits, reads src/utils.ts, aaaand waits some more... Use this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When you need to read multiple files, read them in parallel in a single batch, not sequentially.

Workflow:
1. Plan all the files you need before reading any
2. Issue one parallel batch of reads
3. Analyze together
4. Only do another batch if new unpredictable reads come up

Same for searches. If you need to grep for 3 patterns, run 3 searches in parallel. Sequential reads are only justified when one result genuinely determines the next.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Make it actually verify
&lt;/h2&gt;

&lt;p&gt;Run validation and tests. Don't trust "this should work":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;After making changes, run the relevant validation:
- targeted tests for the behavior you changed
- typecheck and lint
- build, if the change touches anything build-time sensitive
- a quick smoke test on the running app if it's user-facing

If validation fails, fix it before reporting done. If validation can't run in this environment, say so &amp;amp; describe the next best check I can run myself.

"Done" means verified, not "code is written."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are 3 simple rules to follow when prompting GPT 5.5:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add a completeness rule&lt;/li&gt;
&lt;li&gt;Add a stop condition&lt;/li&gt;
&lt;li&gt;Force verification.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-are-prompting-gpt-5-5-wrong%2Ffour-rules.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fyou-are-prompting-gpt-5-5-wrong%2Ffour-rules.jpg" title="the four rules" alt="Screenshot summarizing the prompting rules"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are five examples you can adjust to your use case:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Building a feature
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build [feature]. Done = it works in the running app, has at least one test for the new behavior, types and lint clean, diff scoped to this change only.

Stop &amp;amp; ask only if: the next step is destructive, requirements are genuinely ambiguous, or you'd need to expand scope to 3+ unrelated files. Otherwise just ship it.

No preamble. Don't narrate before doing. When done, report changed files + what was modified in 2-4 lines.

Verify before reporting done: run affected tests, typecheck, lint. If anything fails, fix it. "Should work" is not done.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Fixing a bug
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fix [bug]. Done = root cause is fixed (not the symptom), a test exists that fails before the fix and passes after, no other behavior regressed, diff scoped to the fix.

Stop &amp;amp; ask only if: the bug isn't reproducible from what I gave you, the root cause is in unexpected scope (different module, infra, dependency), or two plausible root causes exist and the wrong fix would mask the real bug.

No preamble. Don't walk me through your hypothesis before testing it. When done, report root cause + fix + what you verified in 3-5 lines.

Verify before reporting done: run the regression test, run the affected module's full suite, confirm the original repro is gone.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Refactoring
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Refactor [target]. Done = behavior is byte-identical before and after, all existing tests pass without modification, types and lint clean, diff scoped to the refactor.

Stop &amp;amp; ask if: you can't preserve behavior without changing a test (means the refactor changed semantics), the refactor naturally pulls in a 3rd+ file beyond what we discussed, or you find a real bug while refactoring (surface it separately, don't silently fix it inside the refactor diff).

No preamble. Don't explain the refactor plan before doing it. When done, report what moved, what's now where, and what was verified in 2-4 lines.

Verify before reporting done: run the FULL test suite (refactors break unexpected places), typecheck, build.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Migration / upgrade
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Migrate [target] from [old] to [new]. Done = the codebase compiles and runs on the new version, all existing tests pass without behavior changes, deprecation warnings from the migration are resolved (not suppressed), diff is scoped to the migration only.

Stop &amp;amp; ask if: the new version requires a behavior change that affects users (don't make that call alone), the migration touches config, infra, or build files in ways we didn't discuss, or you find code that depends on the old version's bugs (genuinely tricky - surface it, don't paper over it).

No preamble. Don't list every breaking change in the changelog before starting - read the changelog yourself and apply what's needed. When done, report what was migrated, what was left untouched and why, and any deprecation warnings still standing.

Verify before reporting done: run the full test suite (migrations break unexpected places), typecheck, build. If the project has integration or e2e tests, run those too - unit tests pass through migrations more often than you'd think.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Adding tests to existing code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Add tests for [target]. Done = the tests exercise the actual behavior (not implementation details), they pass against the current code, they would fail if the behavior broke, coverage hits the meaningful branches not just the happy path.

Stop &amp;amp; ask if: the code is genuinely hard to test because of how it's structured (don't refactor it to make testing easier without checking), you find a real bug while writing tests (surface it separately, don't quietly fix it), or the existing tests already cover this and I missed it.

No preamble. Don't outline the test plan before writing - just write the tests. When done, report what's covered, what's intentionally not covered, and anything you found while writing them.

Verify before reporting done: run the new tests (must pass), then mutate the code under test in a small way and rerun (the tests must fail - if they don't, they're testing the wrong thing). Run the full suite to make sure nothing else broke.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  And here are 5 things to avoid:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Telling Codex HOW to solve it instead of what done looks like&lt;/li&gt;
&lt;li&gt;Asking GPT to create a prompt for itself&lt;/li&gt;
&lt;li&gt;Using the same chat for more than one task&lt;/li&gt;
&lt;li&gt;Sequential file reads on multi-file tasks (waste of latency)&lt;/li&gt;
&lt;li&gt;Trusting "this should work" without running the tests (never do this)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Alright, if you take one thing from this: before you reach for that Extra High button, rewrite the prompt using the tips above. (and give me a follow)&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Read more: &lt;a href="https://developers.openai.com/api/docs/guides/prompt-guidance" rel="noopener noreferrer"&gt;developers.openai.com/api/docs/guides/prompt-guidance&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>gpt55</category>
      <category>prompting</category>
      <category>codex</category>
    </item>
    <item>
      <title>Skills don't work the way we think they do</title>
      <dc:creator>port</dc:creator>
      <pubDate>Sun, 17 May 2026 12:29:53 +0000</pubDate>
      <link>https://dev.to/port/skills-dont-work-the-way-we-think-they-do-494j</link>
      <guid>https://dev.to/port/skills-dont-work-the-way-we-think-they-do-494j</guid>
      <description>&lt;p&gt;I just finished reading the SkillBench paper: &lt;a href="https://arxiv.org/pdf/2602.12670" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2602.12670&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the results are definitely not what most people expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  What researchers did
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mpe0bst31q027kojnfa.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mpe0bst31q027kojnfa.jpg" alt="SkillBench research setup screenshot" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They ran 86 real-world tasks across 11 domains, for 7,308 runs in total.&lt;/p&gt;

&lt;p&gt;Each task was tested in three modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Baseline (no skills)&lt;/li&gt;
&lt;li&gt;Curated skills (human-written)&lt;/li&gt;
&lt;li&gt;Self-generated skills by the model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fskills-dont-work-the-way-we-think-they-do%2Fhaiku-skills-opus.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fskills-dont-work-the-way-we-think-they-do%2Fhaiku-skills-opus.jpg" title="haiku with good skills is better than vanilla opus" alt="SkillBench result comparing smaller models with skills to larger models without skills" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Without further ado, below are some conclusions that I found interesting in the paper.&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-generated skills don't help
&lt;/h2&gt;

&lt;p&gt;One of the most hyped ideas in agent research is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Let the model write its own tools / skills."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But it is mostly wasted effort. In this research, self-generated skills produced no meaningful improvement over the baseline.&lt;/p&gt;

&lt;p&gt;In some cases, they made performance worse.&lt;/p&gt;

&lt;p&gt;Today's models simply cannot reliably create useful reusable procedural abstractions.&lt;/p&gt;

&lt;p&gt;This matters because a huge part of current agent research assumes models can recursively improve by generating better skills/tools. This benchmark suggests that assumption is premature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgvhskj7mvris5aob4l7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgvhskj7mvris5aob4l7.jpg" alt="SkillBench chart showing self-generated skills did not meaningfully improve performance" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Human-made skills work A LOT better
&lt;/h2&gt;

&lt;p&gt;When Skills were carefully written by humans, performance jumped by 16.2 percentage points on average.&lt;/p&gt;

&lt;p&gt;But here's what's even more surprising:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain variance was extreme&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some domains saw small gains (~4-5 pp)&lt;/li&gt;
&lt;li&gt;Others saw enormous gains (~50+ pp)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4s6huh7gu3ef3tin93y.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4s6huh7gu3ef3tin93y.jpg" alt="SkillBench chart showing high domain variance for human-made skills" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Skills don't help equally across fields. They disproportionately help in structured, procedural domains.&lt;/p&gt;

&lt;h2&gt;
  
  
  Smaller models + skills ≈ bigger models without skills
&lt;/h2&gt;

&lt;p&gt;A smaller model with curated Skills matched or exceeded a larger model without Skills.&lt;/p&gt;

&lt;p&gt;This is huge for cost optimization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local agents&lt;/li&gt;
&lt;li&gt;Edge deployment&lt;/li&gt;
&lt;li&gt;Open-source models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Too many skills can hurt
&lt;/h2&gt;

&lt;p&gt;Overly broad or verbose skill libraries degraded performance. Focused, minimal skill modules performed better.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd36f3n4pkpwxitsm20iq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd36f3n4pkpwxitsm20iq.jpg" alt="SkillBench result showing too many skills can degrade performance" width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pick your skills carefully. 2-3 skills work better than 4+ skills.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here is my takeaway
&lt;/h2&gt;

&lt;p&gt;If this paper is right (and I think it is, mostly because of my personal experience with skill files):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scaling alone isn't enough&lt;/li&gt;
&lt;li&gt;Autonomy narratives are premature&lt;/li&gt;
&lt;li&gt;Skill architecture design is now a first-class research problem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read the full paper: &lt;a href="https://arxiv.org/pdf/2602.12670" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2602.12670&lt;/a&gt;&lt;/p&gt;

</description>
      <category>claudeskills</category>
      <category>skillbench</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>so... how to create a skill that works?</title>
      <dc:creator>port</dc:creator>
      <pubDate>Sun, 17 May 2026 12:29:52 +0000</pubDate>
      <link>https://dev.to/port/so-how-to-create-a-skill-that-works-3k7p</link>
      <guid>https://dev.to/port/so-how-to-create-a-skill-that-works-3k7p</guid>
      <description>&lt;p&gt;In my previous article, I argued that skills don't work the way most people expect.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Related: &lt;a href="https://portdeveloper.github.io/articles/skills-dont-work-the-way-we-think-they-do.html" rel="noopener noreferrer"&gt;Skills don't work the way we think they do&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The data from SkillBench supports this. Attaching skills doesn't automatically guarantee better performance.&lt;/p&gt;

&lt;p&gt;So the real question becomes:&lt;/p&gt;

&lt;p&gt;If skills don't magically fix models, how do you engineer them properly?&lt;/p&gt;

&lt;p&gt;To answer that, we need to understand how knowledge itself works.&lt;/p&gt;

&lt;p&gt;I think human knowledge is like a block of cheese.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fhow-to-create-a-skill-that-works%2Fhuman-knowledge-cheese.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportdeveloper.github.io%2Fassets%2Farticles%2Fhow-to-create-a-skill-that-works%2Fhuman-knowledge-cheese.png" title="human knowledge or a block of cheese" alt="A block of cheese representing human knowledge" width="540" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It grows over time, with holes ever-present.&lt;/p&gt;

&lt;p&gt;When we hit something we don't know, we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;look it up&lt;/li&gt;
&lt;li&gt;learn it&lt;/li&gt;
&lt;li&gt;apply it&lt;/li&gt;
&lt;li&gt;patch the hole and move forward&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLMs don't do this.&lt;/p&gt;

&lt;p&gt;When they hit a hole, they don't say "I don't know."&lt;/p&gt;

&lt;p&gt;They hallucinate. They lazily fill the gap with plausible-sounding but incorrect information.&lt;/p&gt;

&lt;p&gt;Aaand that's where things break, and we, being the superior entity, come in to help.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Types of Holes
&lt;/h2&gt;

&lt;p&gt;Through trial and error, I've noticed there are two kinds.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Knowledge gaps
&lt;/h3&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;My OpenClaw agent tries to open a browser extension. It fails.&lt;/p&gt;

&lt;p&gt;I tell it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You already have a browser. Open that."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Suddenly the dumdum understands the task and opens the freaking browser.&lt;/p&gt;

&lt;p&gt;It wasn't incapable.&lt;/p&gt;

&lt;p&gt;It just didn't reason through the environment correctly.&lt;/p&gt;

&lt;p&gt;That's a hole.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Moldy knowledge
&lt;/h3&gt;

&lt;p&gt;Sometimes it does know something, but it's outdated.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using &lt;code&gt;useScaffoldContractRead&lt;/code&gt; instead of &lt;code&gt;useScaffoldReadContract&lt;/code&gt; in Scaffold-ETH&lt;/li&gt;
&lt;li&gt;Manually defining Monad mainnet instead of importing from &lt;code&gt;viem/chains&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's stale info on the LLM's side. I call it mold.&lt;/p&gt;

&lt;p&gt;And mold spreads silently. If you don't correct it once, it keeps reappearing in future runs. And you might never notice it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Create Skill Files
&lt;/h2&gt;

&lt;p&gt;Here's my actual process.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. I let the model fail
&lt;/h3&gt;

&lt;p&gt;For example, when I was building the monad-development skill, I simply said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Create a token on Monad."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. Then I watched it fail.&lt;/p&gt;

&lt;p&gt;I didn't over-direct it.&lt;/p&gt;

&lt;p&gt;I wanted to see where the holes were.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. I take notes on every failure
&lt;/h3&gt;

&lt;p&gt;This sounds weird, but yes: I watch the run and take notes, or let the model take notes afterwards. After the LLM completes its run, I ask it "What did you have problems with?" and "What did you fail to do on the first try?". Then I check whether the thing I asked for is built the way I wanted it to be.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. I create the skill.md file
&lt;/h3&gt;

&lt;p&gt;The skill file contains the patches. It fills the gaps in the LLM's knowledge, removes the mold, and fills the hole that removing the moldy part leaves behind.&lt;/p&gt;

&lt;p&gt;The file is concise, specific, and clear.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. I re-run and benchmark
&lt;/h3&gt;

&lt;p&gt;I run the same prompt again with the skill attached. If it still struggles, I refine the skill.&lt;/p&gt;

&lt;p&gt;I repeat until:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First-attempt success rate is high&lt;/li&gt;
&lt;li&gt;Hallucinations drop (mostly)&lt;/li&gt;
&lt;li&gt;Tool usage becomes clean and consistent&lt;/li&gt;
&lt;/ul&gt;
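
&lt;p&gt;One way to put numbers on this loop is sketched below. It is a minimal illustration rather than the exact tooling used here: the run records and the &lt;code&gt;succeededFirstTry&lt;/code&gt; field are hypothetical, the example data is made up, and you would record the real values yourself after each run.&lt;/p&gt;

```javascript
// Share of runs that succeeded on the first attempt (0 when there are no runs).
// `succeededFirstTry` is a hypothetical field you record yourself per run.
function firstAttemptSuccessRate(runs) {
  if (runs.length === 0) return 0;
  const passed = runs.filter((run) => run.succeededFirstTry).length;
  return passed / runs.length;
}

// Compare the same prompt with and without the skill attached.
function compareSkill(baselineRuns, skillRuns) {
  const baseline = firstAttemptSuccessRate(baselineRuns);
  const withSkill = firstAttemptSuccessRate(skillRuns);
  // Difference in percentage points, rounded to one decimal place.
  const deltaPp = Math.round((withSkill - baseline) * 1000) / 10;
  return { baseline, withSkill, deltaPp };
}

// Made-up example data: four runs per condition.
const baselineRuns = [
  { succeededFirstTry: false },
  { succeededFirstTry: false },
  { succeededFirstTry: true },
  { succeededFirstTry: false },
];
const skillRuns = [
  { succeededFirstTry: true },
  { succeededFirstTry: true },
  { succeededFirstTry: false },
  { succeededFirstTry: true },
];

const result = compareSkill(baselineRuns, skillRuns);
console.log(result); // { baseline: 0.25, withSkill: 0.75, deltaPp: 50 }
```

&lt;p&gt;Keeping the prompt fixed between the two sets of runs matters more than the exact metric. If the delta stays near zero after a refinement, the skill is probably not patching the hole you think it is.&lt;/p&gt;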

&lt;h2&gt;
  
  
  What This Really Is
&lt;/h2&gt;

&lt;p&gt;This is systematic failure harvesting. Treat the LLM as a system with blind spots and engineer around them.&lt;/p&gt;

&lt;p&gt;Prompt. Let it fail. Take notes. Create a skill file out of your notes. Rinse and repeat until you are at a desired success rate.&lt;/p&gt;

&lt;p&gt;This is how you create a skill that actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;p&gt;SkillBench paper:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Skills Don't Always Improve Performance&lt;br&gt;&lt;br&gt;
&lt;a href="https://arxiv.org/pdf/2602.12670" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2602.12670&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My previous article:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://portdeveloper.github.io/articles/skills-dont-work-the-way-we-think-they-do.html" rel="noopener noreferrer"&gt;Skills don't work the way we think they do&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Vercel's agents.md versus skills.md article:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AGENTS.md outperforms skills in our agent evals&lt;br&gt;&lt;br&gt;
&lt;a href="https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals" rel="noopener noreferrer"&gt;https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>claudeskills</category>
      <category>aiagents</category>
      <category>skillbench</category>
    </item>
    <item>
      <title>I built a copy-for-LLMs button for Docusaurus. Then Ethereum and Sui shipped it.</title>
      <dc:creator>port</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:01:36 +0000</pubDate>
      <link>https://dev.to/port/i-built-a-copy-for-llms-button-for-docusaurus-then-ethereum-and-sui-shipped-it-3d7l</link>
      <guid>https://dev.to/port/i-built-a-copy-for-llms-button-for-docusaurus-then-ethereum-and-sui-shipped-it-3d7l</guid>
      <description>&lt;p&gt;&lt;strong&gt;A few months ago I got tired of selecting docs pages and pasting them into Claude. Half the time the nav came along with the content. So I built &lt;code&gt;docusaurus-plugin-copy-page-button&lt;/code&gt;: a one-line install that drops a Copy page button into your Docusaurus sidebar.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I click the button, I get the page as clean markdown. I also added a dropdown that opens the page directly in ChatGPT, Claude, or Gemini.&lt;/p&gt;

&lt;p&gt;Setup:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;npm install docusaurus-plugin-copy-page-button
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then one line in &lt;code&gt;docusaurus.config.js&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;plugins: ['docusaurus-plugin-copy-page-button']
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That's it.&lt;/p&gt;

&lt;p&gt;Six months later, I see the plugin running on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ethereum execution-apis&lt;/li&gt;
&lt;li&gt;Sui, Walrus, Seal, SuiNS (Mysten Labs)&lt;/li&gt;
&lt;li&gt;Monad&lt;/li&gt;
&lt;li&gt;Flare&lt;/li&gt;
&lt;li&gt;Kaia&lt;/li&gt;
&lt;li&gt;Nillion&lt;/li&gt;
&lt;li&gt;Chronicle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Around 10k installs a month, mostly blockchain ecosystems. I didn't aim at that niche; it just landed there.&lt;/p&gt;

&lt;h3&gt;
  
  
  What was actually hard
&lt;/h3&gt;

&lt;p&gt;Three things took most of the time.&lt;/p&gt;

&lt;p&gt;Content extraction. Docusaurus pages come wrapped in nav, breadcrumbs, edit-this-page links, footers, and a sidebar. The plugin walks the DOM, finds the article container, drops the chrome, and hands the rest to a markdown converter that handles code blocks, tables, lists, and admonitions.&lt;/p&gt;

&lt;p&gt;Then SPA route changes. Docusaurus uses client-side navigation. Inject the button on first load and it vanishes when the user clicks a link. The plugin watches popstate, Docusaurus's own events, and URL changes, then re-injects on each route.&lt;/p&gt;
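
&lt;p&gt;The re-injection idea can be sketched without any framework. This is not the plugin's actual source: &lt;code&gt;createRouteWatcher&lt;/code&gt;, &lt;code&gt;injectButton&lt;/code&gt;, and the &lt;code&gt;copy-page-button&lt;/code&gt; id are hypothetical names, and the 500 ms polling interval is an arbitrary assumption. The sketch covers only &lt;code&gt;popstate&lt;/code&gt; and URL polling, not Docusaurus's own events.&lt;/p&gt;

```javascript
// Hypothetical helper: calls onRouteChange whenever the path differs from
// the one seen on the previous check. Returns true if a change was detected.
function createRouteWatcher(getPath, onRouteChange) {
  let lastPath = getPath();
  return function check() {
    const currentPath = getPath();
    if (currentPath === lastPath) return false;
    lastPath = currentPath;
    onRouteChange(currentPath);
    return true;
  };
}

// Browser-only wiring; skipped when there is no DOM (for example in Node).
if (typeof window !== "undefined") {
  // Hypothetical injector: adds the button only if it is not already present.
  const injectButton = () => {
    if (document.getElementById("copy-page-button")) return;
    const button = document.createElement("button");
    button.id = "copy-page-button";
    button.textContent = "Copy page";
    document.body.appendChild(button);
  };

  injectButton(); // first load
  const check = createRouteWatcher(() => window.location.pathname, injectButton);
  window.addEventListener("popstate", check); // back/forward navigation
  setInterval(check, 500); // fallback: poll for client-side URL changes
}
```

&lt;p&gt;The polling fallback is there because &lt;code&gt;history.pushState&lt;/code&gt; does not fire &lt;code&gt;popstate&lt;/code&gt;, so a listener alone misses ordinary link clicks in a client-side router.&lt;/p&gt;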

&lt;p&gt;And mobile. Docusaurus collapses the TOC sidebar on small screens. The button needs to live somewhere visible without breaking the layout. Took a few iterations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;If you run a Docusaurus site, install it. If something's missing, open an issue.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/portdeveloper/docusaurus-plugin-copy-page-button" rel="noopener noreferrer"&gt;https://github.com/portdeveloper/docusaurus-plugin-copy-page-button&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;npm: &lt;a href="https://www.npmjs.com/package/docusaurus-plugin-copy-page-button" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/docusaurus-plugin-copy-page-button&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Live demo: &lt;a href="https://portdeveloper.github.io/copy-page-button-showcase/" rel="noopener noreferrer"&gt;https://portdeveloper.github.io/copy-page-button-showcase/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docusaurus</category>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
