<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sho Naka</title>
    <description>The latest articles on DEV Community by Sho Naka (@nomurasan).</description>
    <link>https://dev.to/nomurasan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3987494%2Fa151ecf0-1dfb-44a7-91d6-5bafe84ecdb8.jpg</url>
      <title>DEV Community: Sho Naka</title>
      <link>https://dev.to/nomurasan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nomurasan"/>
    <language>en</language>
    <item>
      <title>AI made generation cheap. It did not make judgment cheap.</title>
      <dc:creator>Sho Naka</dc:creator>
      <pubDate>Fri, 26 Jun 2026 06:05:48 +0000</pubDate>
      <link>https://dev.to/nomurasan/ai-made-generation-cheap-it-did-not-make-judgment-cheap-j97</link>
      <guid>https://dev.to/nomurasan/ai-made-generation-cheap-it-did-not-make-judgment-cheap-j97</guid>
      <description>&lt;p&gt;AI made a lot of my work faster.&lt;/p&gt;

&lt;p&gt;It did not make the week feel lighter.&lt;/p&gt;

&lt;p&gt;That gap matters, because it is where most AI adoption advice gets too optimistic. The visible cost of producing work goes down. The hidden cost of deciding whether that work is right often goes up.&lt;/p&gt;

&lt;p&gt;After looking back at my own production work, I changed the rule:&lt;/p&gt;

&lt;p&gt;I only hand a task to AI when there is a machine-checkable gate on the way back.&lt;/p&gt;

&lt;p&gt;If the result can only be judged by me, I either keep it by hand or add a gate first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The work got faster only when a machine could say no
&lt;/h2&gt;

&lt;p&gt;The pattern was simple once I stopped averaging everything together.&lt;/p&gt;

&lt;p&gt;The work that sped up had a cheap oracle.&lt;/p&gt;

&lt;p&gt;Tests pass. The compiler complains. Output matches a fixture. A schema lines up. A linter rejects the shape. In those cases, AI gives me generation speed and the machine gives me a way to catch bad output before it reaches my attention.&lt;/p&gt;

&lt;p&gt;The work that did not speed up had no cheap oracle.&lt;/p&gt;

&lt;p&gt;Does this match how we talk to customers? Does this fit the architecture we already agreed on? Is this the right abstraction for this codebase? Is this email technically correct but wrong in tone?&lt;/p&gt;

&lt;p&gt;The model can produce five plausible answers. No compiler can tell me which one belongs. I become the oracle.&lt;/p&gt;

&lt;p&gt;That changed what I delegate. After pruning the list, less than half of what I first wanted to hand off still qualified.&lt;/p&gt;

&lt;p&gt;The delegated work got faster. The kept work did not get slower, because I stopped paying the round-trip cost of generating things that only a human could judge.&lt;/p&gt;

&lt;p&gt;Generation got cheap. Judgment did not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thinking did not disappear. The trail did.
&lt;/h2&gt;

&lt;p&gt;Low-level decisions left my plate. Variable names. Boilerplate. Obvious test cases. Small rewrites. I do not miss them.&lt;/p&gt;

&lt;p&gt;Higher-level decisions grew heavier. Which approach do we adopt? Should we build this at all? Which tradeoff are we making? Which failure mode are we willing to accept?&lt;/p&gt;

&lt;p&gt;It would be comforting to call that a clean upgrade: I now work at a higher level of abstraction. Sometimes that is true. Not always.&lt;/p&gt;

&lt;p&gt;The symptom that stopped me from selling that story was memory. I would make a call, move on, and later fail to reconstruct why I chose that route. The decision remained. The reasoning evaporated.&lt;/p&gt;

&lt;p&gt;The cost per decision changed. When a decision felt expensive, its reasoning stuck. When each call felt cheap, the reason slid off the desk.&lt;/p&gt;

&lt;p&gt;That is not the same as thinking less. It is leaving fewer tracks.&lt;/p&gt;

&lt;p&gt;A decision you cannot explain later is hard to learn from later. AI did not remove that problem. It made it easier to produce more of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Training is downstream of the judgment flood
&lt;/h2&gt;

&lt;p&gt;This is where many organizational responses feel backward.&lt;/p&gt;

&lt;p&gt;The number of "accept, fix, or throw away?" moments did not move by a few percent. It moved to a different scale. The exact count depends on the work, but the shape is clear enough: AI increases the number of things that ask for a decision.&lt;/p&gt;

&lt;p&gt;The hard cases are not the obviously wrong ones. Those are cheap. A glance is enough.&lt;/p&gt;

&lt;p&gt;The expensive cases are the ones that run fine but are subtly wrong. The code passes tests but violates a convention nobody wrote down. The paragraph reads well but answers a slightly different question. The proposal is coherent in isolation but conflicts with an earlier decision.&lt;/p&gt;

&lt;p&gt;Catching that takes real attention. I have missed it more than once.&lt;/p&gt;

&lt;p&gt;"We need to upskill people on AI" is not wrong. It is late in the chain.&lt;/p&gt;

&lt;p&gt;Train a person and drop them into a flood of subtly wrong outputs, and the training does not change the math. The person still has to judge too many things. The better order is to shrink the number of decisions that reach a human first, then put trained humans on top of the smaller pile.&lt;/p&gt;

&lt;p&gt;More bluntly: scale the gates before scaling the people.&lt;/p&gt;

&lt;p&gt;The first lever is not "more humans who can take responsibility." The first lever is "fewer moments that demand a human take responsibility."&lt;/p&gt;

&lt;h2&gt;
  
  
  A gate is triage, not proof
&lt;/h2&gt;

&lt;p&gt;This is where the argument can sound too neat, so it needs a correction.&lt;/p&gt;

&lt;p&gt;Tests are not magic gates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://newsletter.pragmaticengineer.com/p/evals" rel="noopener noreferrer"&gt;Gergely Orosz and Hamel Husain describe the Gulf of Specification&lt;/a&gt; as the gap between what we want an LLM to do and what our prompt or specification asks it to do. That maps cleanly to AI-generated tests. A test checks that the code matches what the test author wrote down. If the test author misunderstood the intent, green tests only prove that the code matches a possibly wrong spec.&lt;/p&gt;

&lt;p&gt;Hamel Husain and Shreya Shankar make the same practical point in their evals FAQ: even as models improve, you still have to verify that the system is solving the right problem. The model does not read your mind.&lt;/p&gt;

&lt;p&gt;A gate does not prove correctness. It shrinks the pile so that human attention lands where it matters.&lt;/p&gt;

&lt;p&gt;The gates that helped me most were not exotic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plan review before implementation review. Have the model produce a plan first. Review the intent while it is still cheap. Let implementation start only after the plan survives.&lt;/li&gt;
&lt;li&gt;A diff against a known-good reference. When the output should resemble an accepted ADR, an existing component, or a past customer email, compare it explicitly instead of trusting your eyes.&lt;/li&gt;
&lt;li&gt;A structured output validator. If you expect JSON, make the schema reject ambiguity before a human reads the content.&lt;/li&gt;
&lt;li&gt;A small assertion or lint rule. If the same subtle mistake appears twice, stop treating it as a reminder and turn it into a check.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these eliminate the "runs fine but subtly wrong" category. They lower the cost of noticing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The handoff question
&lt;/h2&gt;

&lt;p&gt;The practical test is small.&lt;/p&gt;

&lt;p&gt;Before handing a task to AI, ask:&lt;/p&gt;

&lt;p&gt;Can something other than me reject the answer?&lt;/p&gt;

&lt;p&gt;If yes, hand it off.&lt;/p&gt;

&lt;p&gt;If no, either keep the work by hand or add a gate first.&lt;/p&gt;

&lt;p&gt;That gate can be a test, schema check, lint rule, plan review, structured-output validator, or diff against a known-good reference. The form matters less than the return path. The output should not come back to a human as an undifferentiated pile of plausible text.&lt;/p&gt;

&lt;p&gt;This changes the count or the quality of decisions reaching your desk. That is the part that scales.&lt;/p&gt;

&lt;p&gt;The training-first instinct is comfortable. It also turns your best people into the bottleneck. Build the gates first. Put the people on top.&lt;/p&gt;

&lt;p&gt;What is the smallest gate you can add before the next AI output reaches you?&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://newsletter.pragmaticengineer.com/p/evals" rel="noopener noreferrer"&gt;A pragmatic guide to LLM evals for devs&lt;/a&gt;, Gergely Orosz and Hamel Husain&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://hamel.dev/blog/posts/evals-faq/" rel="noopener noreferrer"&gt;LLM Evals: Everything You Need to Know&lt;/a&gt;, Hamel Husain and Shreya Shankar&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Sho Naka (nomurasan). I ship, I teach, and I spend most of my week deciding which part of my workflow to hand to a model and which to keep close.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This piece was adapted from a Japanese-language essay I wrote, with AI assistance for the cross-language rewrite. The reasoning, the data, and the conclusions are mine.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>leadership</category>
      <category>devops</category>
    </item>
    <item>
      <title>No errors, but the email never arrives — it's probably not your code</title>
      <dc:creator>Sho Naka</dc:creator>
      <pubDate>Wed, 24 Jun 2026 09:23:27 +0000</pubDate>
      <link>https://dev.to/nomurasan/no-errors-but-the-email-never-arrives-its-probably-not-your-code-5cbe</link>
      <guid>https://dev.to/nomurasan/no-errors-but-the-email-never-arrives-its-probably-not-your-code-5cbe</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;I wrote this after building my own site, watching the contact form "succeed" while the notification email silently never showed up — and quietly blaming my own code for half an hour. I worked the framing out with an AI as a sounding board, but the experience and the conclusions are mine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You put a contact form on your site. You press submit, and "Thank you" appears. That part works. But the notification email to yourself? It never comes.&lt;/p&gt;

&lt;p&gt;You wrote the code exactly like the tutorial said. You read it again, character by character. Not a single typo. No errors anywhere. And still, nothing arrives.&lt;/p&gt;

&lt;p&gt;Thirty minutes hunting for a mistyped character. You don't find it. And somewhere in there, a thought creeps in: &lt;em&gt;"Maybe I'm just not cut out for programming."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What's actually happening here isn't a technical wall. It's a misattribution — the false belief that &lt;em&gt;you&lt;/em&gt; are the problem.&lt;/p&gt;

&lt;p&gt;So let me say the most important thing first: &lt;strong&gt;your code is probably fine.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The cause isn't inside your code — it's outside it
&lt;/h2&gt;

&lt;p&gt;When email won't send, we reach for the code first. We wrote it, so of course we do. But the reason email doesn't arrive is more often &lt;em&gt;outside&lt;/em&gt; the code than inside it.&lt;/p&gt;

&lt;p&gt;Think of a paper letter. You write the message, address the envelope, seal it. That's the code's job, and you did it correctly. But a sealed letter doesn't arrive just because you dropped it in a mailbox. The collection system, the delivery route, the recipient's mailbox — if it stalls at any of those, your perfectly written letter still doesn't reach anyone.&lt;/p&gt;

&lt;p&gt;Sending email is the same. The code's job ends at &lt;em&gt;"write the message and ask for it to be sent."&lt;/em&gt; The thing that actually carries it is a separate system downstream — the mail delivery infrastructure. People with ten years of programming experience still trip here every time. This is not a measure of your skill.&lt;/p&gt;

&lt;h2&gt;
  
  
  You blame yourself &lt;em&gt;because&lt;/em&gt; there's no error
&lt;/h2&gt;

&lt;p&gt;Why do we misattribute the cause to our own code? The reason is clear: this kind of failure happens &lt;strong&gt;without throwing any error at all.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your code is syntactically wrong, the screen tells you in red. You know where to fix it. But when "write it and ask for it to be sent" runs fine, the code just shrugs and says, &lt;em&gt;"I did my part."&lt;/em&gt; No error. And yet no email. Left standing in that silence, with nothing scolding you, you start to doubt yourself.&lt;/p&gt;

&lt;p&gt;But trace the logic backwards. &lt;em&gt;No error&lt;/em&gt; means the part you wrote — assembling the message and requesting the send — finished its job correctly. You aren't being scolded because there's no mistake there to scold.&lt;/p&gt;

&lt;h2&gt;
  
  
  Isolate "my code" vs "the delivery," and the doubt disappears
&lt;/h2&gt;

&lt;p&gt;The way to kill the self-doubt isn't emotional — it's procedural. Separate "is this a code problem?" from "is this a delivery problem?" and the limbo ends.&lt;/p&gt;

&lt;p&gt;For learning and verification, the combination that confuses the least is &lt;strong&gt;PHPMailer&lt;/strong&gt; as the sending library and &lt;strong&gt;Mailtrap&lt;/strong&gt; (a testing service) as the destination. Mailtrap doesn't actually send to anyone — it catches your "as-if-sent" emails and shows them to you, like a dress rehearsal for delivery.&lt;/p&gt;

&lt;p&gt;If it arrives there, your code was correct. If it doesn't, the cause really is on the delivery side. Either way, the "maybe it's me" doubt is gone. That's the fastest way out of limbo.&lt;/p&gt;

&lt;h2&gt;
  
  
  You don't need to understand the machinery — just isolate first
&lt;/h2&gt;

&lt;p&gt;If you try to understand the whole mail-delivery system before anything else, you wander into a deep forest of infrastructure and drift away from the thing that matters: finishing your own site.&lt;/p&gt;

&lt;p&gt;The order is backwards. First, make one send flow work with PHPMailer. Watch it arrive. Confirm your code was right. The operational details — which production mail provider to pick, how to log failed sends — you stack on later, one layer at a time.&lt;/p&gt;

&lt;p&gt;The real identity of "sending email is hard" isn't the difficulty of PHP. It's the complexity of the delivery infrastructure sitting &lt;em&gt;outside&lt;/em&gt; PHP. And that's less a wall you must scale than one you can detour around at first. Finish what you're building, and come back to it when you're ready. There's still time.&lt;/p&gt;

&lt;p&gt;The hands you were blaming were writing correct code all along. What was stopping you wasn't your ability — it was the delivery machinery outside the code.&lt;/p&gt;




&lt;h2&gt;
  
  
  The technical details
&lt;/h2&gt;

&lt;p&gt;From here on is for readers who want to verify the above technically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why fixing the code doesn't help
&lt;/h3&gt;

&lt;p&gt;When the &lt;code&gt;mail&lt;/code&gt; / &lt;code&gt;mb_send_mail&lt;/code&gt; samples don't work, the cause is usually not PHP syntax but SMTP configuration or delivery restrictions. Two typical cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can't send from a local environment to an external address (security restrictions block the delivery path)&lt;/li&gt;
&lt;li&gt;The OS-side mail delivery config (sendmail / postfix) is unset&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So editing the code alone won't solve it. The problem lives &lt;em&gt;outside&lt;/em&gt; PHP.&lt;/p&gt;

&lt;h3&gt;
  
  
  The shortest setup for learning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Sending library: PHPMailer&lt;/li&gt;
&lt;li&gt;SMTP verification target: Mailtrap (a testing service)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this setup, "the delivery-infrastructure trap" and "a PHP code problem" are easy to separate. Arrives at Mailtrap → your code is correct. Doesn't arrive → it's the delivery side.&lt;/p&gt;

&lt;h3&gt;
  
  
  A minimal PHPMailer example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;PHPMailer\PHPMailer\PHPMailer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;PHPMailer\PHPMailer\Exception&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'PHPMailer/src/PHPMailer.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'PHPMailer/src/Exception.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'PHPMailer/src/SMTP.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Mailtrap Username&lt;/span&gt;
&lt;span class="nv"&gt;$password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Mailtrap Password&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PHPMailer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nc"&gt;CharSet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'utf-8'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;isSMTP&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nc"&gt;Host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sandbox.smtp.mailtrap.io'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nc"&gt;SMTPAuth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nc"&gt;Username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$username&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nc"&gt;Password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$password&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nc"&gt;SMTPSecure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'tls'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nc"&gt;Port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2525&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;setFrom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'dummy@example.com'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;addAddress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'dummy@example.com'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nc"&gt;Subject&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Test send'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nc"&gt;Body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Body text'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'Sent'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'Failed: '&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nv"&gt;$mail&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nc"&gt;ErrorInfo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fill in &lt;code&gt;$username&lt;/code&gt; and &lt;code&gt;$password&lt;/code&gt; from Mailtrap and you have a working verification.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Mac trips you up in particular
&lt;/h3&gt;

&lt;p&gt;On Windows you can sometimes get by with XAMPP's sendmail config. On Mac, Postfix gets involved — editing &lt;code&gt;main.cf&lt;/code&gt;, managing credentials, reloading the service — and the difficulty climbs for beginners. Go down that layer and you've left the main goal of learning PHP behind. Confirm your send logic with PHPMailer first; it's more practical.&lt;/p&gt;

&lt;h3&gt;
  
  
  When you do look toward production
&lt;/h3&gt;

&lt;p&gt;Production sending is less about "can it send" and more about "can it keep running." Provider limits, safe credential management, retry-and-log on failure — viewed that way, an explicit SMTP connection via PHPMailer is more maintainable than &lt;code&gt;mb_send_mail&lt;/code&gt; alone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Priority order when you're stuck
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Confirm the send flow with PHPMailer + Mailtrap&lt;/li&gt;
&lt;li&gt;Implement logging for send failures&lt;/li&gt;
&lt;li&gt;Switch to production SMTP in stages&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>php</category>
      <category>beginners</category>
      <category>email</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Copied a Google AI Studio Session by Hand. 68% of the Data Was Gone.</title>
      <dc:creator>Sho Naka</dc:creator>
      <pubDate>Tue, 23 Jun 2026 03:36:37 +0000</pubDate>
      <link>https://dev.to/nomurasan/i-copied-a-google-ai-studio-session-by-hand-68-of-the-data-was-gone-3fhi</link>
      <guid>https://dev.to/nomurasan/i-copied-a-google-ai-studio-session-by-hand-68-of-the-data-was-gone-3fhi</guid>
      <description>&lt;p&gt;I had a long Google AI Studio (Gemini) session that I wanted to keep. I selected the conversation in the browser, copied it, and pasted it into a text file.&lt;/p&gt;

&lt;p&gt;File size: a few hundred KB. "OK, that's safe."&lt;/p&gt;

&lt;p&gt;Later, I exported the same session as JSON. File size: a few MB.&lt;/p&gt;

&lt;p&gt;More than half of the data had silently disappeared.&lt;/p&gt;

&lt;h2&gt;
  
  
  What was missing
&lt;/h2&gt;

&lt;p&gt;I checked what the manual copy had dropped.&lt;/p&gt;

&lt;h3&gt;
  
  
  The system prompt
&lt;/h3&gt;

&lt;p&gt;The instruction I had originally given the model — the system prompt — was completely gone. Manual copy captures only the user/assistant turns visible in the conversation pane. The instruction context that shaped the entire session does not get copied.&lt;/p&gt;

&lt;h3&gt;
  
  
  The tail of long responses
&lt;/h3&gt;

&lt;p&gt;When a Gemini response is long, the browser shows a "Show more" button. If you copy without expanding it, the response gets cut mid-sentence. Out of 8 sessions I checked, 3 had responses truncated this way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Newlines inside code blocks
&lt;/h3&gt;

&lt;p&gt;Newlines inside code blocks got mangled in several places. Responses containing JSON or YAML had indentation that no longer parsed.&lt;/p&gt;

&lt;h3&gt;
  
  
  The reasoning trace
&lt;/h3&gt;

&lt;p&gt;For some models, the model's reasoning trace is stored separately from the visible response. Manual copy doesn't capture it at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to export as JSON
&lt;/h2&gt;

&lt;p&gt;Google AI Studio has a session export feature.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the session view, click the &lt;code&gt;...&lt;/code&gt; menu at the top right&lt;/li&gt;
&lt;li&gt;Select "Export"&lt;/li&gt;
&lt;li&gt;Choose JSON format and download&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The JSON contains the full data, including the system prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measured: manual copy vs. JSON export
&lt;/h2&gt;

&lt;p&gt;I compared 8 sessions.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Session&lt;/th&gt;
&lt;th&gt;Manual copy&lt;/th&gt;
&lt;th&gt;JSON&lt;/th&gt;
&lt;th&gt;Loss rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;tens of KB&lt;/td&gt;
&lt;td&gt;~150 KB&lt;/td&gt;
&lt;td&gt;~70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;~90 KB&lt;/td&gt;
&lt;td&gt;~200 KB&lt;/td&gt;
&lt;td&gt;50-60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;~30 KB&lt;/td&gt;
&lt;td&gt;~100 KB&lt;/td&gt;
&lt;td&gt;60-70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D&lt;/td&gt;
&lt;td&gt;~50 KB&lt;/td&gt;
&lt;td&gt;~180 KB&lt;/td&gt;
&lt;td&gt;~70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;~60 KB&lt;/td&gt;
&lt;td&gt;~240 KB&lt;/td&gt;
&lt;td&gt;~70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;~20 KB&lt;/td&gt;
&lt;td&gt;~70 KB&lt;/td&gt;
&lt;td&gt;~60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;G&lt;/td&gt;
&lt;td&gt;~20 KB&lt;/td&gt;
&lt;td&gt;~50 KB&lt;/td&gt;
&lt;td&gt;50-60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;td&gt;~10 KB&lt;/td&gt;
&lt;td&gt;~30 KB&lt;/td&gt;
&lt;td&gt;~60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;a few hundred KB&lt;/td&gt;
&lt;td&gt;~1 MB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;60-70%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Average loss rate, 60-70%. The manual copy was, on every session, missing most of what was in the actual session state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I didn't notice
&lt;/h2&gt;

&lt;p&gt;If you open the manually-copied file, the conversation reads fine. As long as the start and end connect, a missing middle is hard to detect by eye.&lt;/p&gt;

&lt;p&gt;The system prompt especially — it doesn't appear in the visible conversation. There is no marker for "the instruction is gone." The file just looks like a clean transcript.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do with the exported JSON
&lt;/h2&gt;

&lt;p&gt;The JSON I exported was 200,000+ lines. Hard to work with as a single file, so I split it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Split into chunks
&lt;/h3&gt;

&lt;p&gt;Claude Code's Read tool has a per-call line limit. Reading hundreds of thousands of lines at once isn't viable, so I split the export into 200+ smaller files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sessions/
├── session-001.jsonl
├── session-002.jsonl
...
└── session-XXX.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Make it searchable with INDEX.md
&lt;/h3&gt;

&lt;p&gt;Each session got a summary line in INDEX.md.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Session list&lt;/span&gt;

| ID  | Theme            | Key content                        | Lines |
|-----|------------------|------------------------------------|-------|
| 001 | World design     | Magic system, geography setup      | 1,234 |
| 002 | Character        | Protagonist personality            |   987 |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;grep&lt;/code&gt; and filename search now find the right session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Don't trust manual copy from a browser
&lt;/h3&gt;

&lt;p&gt;Manual copy copies "what's visible." Anything not visible — the system prompt, the collapsed tail of a response, the reasoning trace — does not come along.&lt;/p&gt;

&lt;h3&gt;
  
  
  If the service has an export, use the export
&lt;/h3&gt;

&lt;p&gt;Service-provided export is designed to output the complete data. Always more reliable than manual.&lt;/p&gt;

&lt;h3&gt;
  
  
  Back up regularly
&lt;/h3&gt;

&lt;p&gt;The longer the session gets, the more you lose if you wait. I now export to JSON once a week as a habit.&lt;/p&gt;




&lt;p&gt;I thought "manual copy is enough." Most of the data was gone. If you want to keep your AI conversation logs, use the official export feature.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese at &lt;a href="https://qiita.com/nomurasan/items/11d9f475b2b80cb554a3" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt;. Same author writing under "nomuraya / shimajima / nomurasan / 中翔" across media. English version adapted rather than literally translated.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aitools</category>
      <category>googleaistudio</category>
      <category>gemini</category>
      <category>dataloss</category>
    </item>
    <item>
      <title>Why AI Keeps Making the Same Mistake — And Why Correcting It Each Time Doesn't Work</title>
      <dc:creator>Sho Naka</dc:creator>
      <pubDate>Tue, 23 Jun 2026 03:36:32 +0000</pubDate>
      <link>https://dev.to/nomurasan/why-ai-keeps-making-the-same-mistake-and-why-correcting-it-each-time-doesnt-work-2cpi</link>
      <guid>https://dev.to/nomurasan/why-ai-keeps-making-the-same-mistake-and-why-correcting-it-each-time-doesnt-work-2cpi</guid>
      <description>&lt;p&gt;When you work with AI long enough, you start to notice it makes the same kind of mistake over and over.&lt;/p&gt;

&lt;p&gt;"You're coming on too strong, dial it back." It shrinks and goes meek. "Stop being meek." It comes on strong again. Each time you point something out, it apologizes sincerely. The next round, the same type of problem comes back from a different angle. After a while you realize you're babysitting the AI instead of working with it.&lt;/p&gt;

&lt;p&gt;This isn't because the AI is bad. It's a design quirk: today's AI is tuned to satisfy the user. The quirk won't go away. But if you change how you work with it, you can still get work done together.&lt;/p&gt;

&lt;p&gt;This piece is about that — five patterns of the quirk, and an operating mode that gets ahead of them instead of correcting them in flight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five quirks in a single evening
&lt;/h2&gt;

&lt;p&gt;One evening I was running a strategy discussion past an AI, and in one back-and-forth I caught five distinct behaviors worth noting. Laid out, they look like this.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Helpful-looking runaway.&lt;/strong&gt; I asked it to push back harder. It immediately started using strong words ("you're avoiding responsibility," "this is the wrong call as a founder") to perform consultant-energy. The reasoning stayed thin. Only the tone got louder.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Over-retraction on pushback.&lt;/strong&gt; I said "your reasoning is thin." It launched into long self-criticism and threw the next decision back at me.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trusting its own research without checking.&lt;/strong&gt; I asked it to use a secondary research feature (where the AI looks things up and summarizes). The summary came back. The AI claimed it had "verified the primary source" without ever opening it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forced specificity.&lt;/strong&gt; I was talking at a strategic, abstract level. It quietly mapped my words onto a specific real-world deal and jumped to "this is highly transferable."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Punting the decision back.&lt;/strong&gt; I asked it to decide. It laid out three options and said "which would you like?" The phrase "let me confirm three points" started showing up. Red flag.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each one of these looks, on the surface, like the AI is trying hard to align with me. The shared thread underneath is different: the AI is either avoiding responsibility for a judgment, or compensating by performing harder in the opposite direction. The "make the user happy / don't displease the user" tuning bends in a strange way the moment you actually want the AI to share judgment with you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Correcting it in flight makes it swing the other way
&lt;/h2&gt;

&lt;p&gt;At first I thought: if the AI gets something wrong, just point it out and it'll learn. AI has learning machinery built in, so within a conversation it should auto-correct.&lt;/p&gt;

&lt;p&gt;After watching the same type of mistake repeat, a different structure showed up. Every correction is met with apology. The next response swings in the opposite direction. Strong → corrected → meek → "too meek" → strong again. That loop.&lt;/p&gt;

&lt;p&gt;Reading the session logs afterward, each turn's apology and resolution used almost the same vocabulary. "I will not fear failure." "As an equal collaborator." "I will separate what was said from what was not said." &lt;strong&gt;The apology itself isn't preventing anything next time.&lt;/strong&gt; The reason is mundane: conversation flow doesn't carry into the next session. The trained-in behaviors come up raw each time. So pointing things out only works inside the current chat.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Reactive" vs "preemptive declaration"
&lt;/h2&gt;

&lt;p&gt;Two operating modes, side by side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reactive.&lt;/strong&gt; Problem occurs → you point it out → AI apologizes → the next reply is better. Next session, the raw quirk shows up again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Preemptive declaration.&lt;/strong&gt; Before the session starts, you hand the AI a document: "you tend to do these things. In situations like this, behave like this." The AI reads it before the conversation begins.&lt;/p&gt;

&lt;p&gt;Reactive means babysitting the AI in every session, forever. Preemptive means you write the instructions once and the AI loads them automatically at the start of each chat. "Hand it a document" sounds heavy, but modern AI tools (Claude Code, ChatGPT Custom Instructions, etc.) have a place for auto-loaded context. In Claude Code, that file is &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What goes in the instructions file
&lt;/h2&gt;

&lt;p&gt;"Write instructions" is vague until you try. In my case, the five quirks I observed went in directly as five rules.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't perform helpfulness. Before reaching for strong language, write one line of reasoning.&lt;/li&gt;
&lt;li&gt;Don't over-retract on correction. Keep proposing — "in that case I'd suggest option α."&lt;/li&gt;
&lt;li&gt;Verify before quoting secondary research. Open the actual source.&lt;/li&gt;
&lt;li&gt;Don't auto-map abstract talk onto a specific deal. Ask first.&lt;/li&gt;
&lt;li&gt;Don't say "let me confirm three points." Decide and proceed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Five lines in &lt;code&gt;CLAUDE.md&lt;/code&gt;. Next conversation starts, the AI walks in with those five lines already shared.&lt;/p&gt;

&lt;p&gt;The point — &lt;strong&gt;don't write this as a fixed rulebook&lt;/strong&gt;. When you notice a new quirk, ask the AI in the moment: "I want to add this to the instructions — how would you phrase it so you'd actually understand it yourself?" The AI drafts the addition. The instructions become a living document the two of you grow together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not "tame," not "fix" — "raise together"
&lt;/h2&gt;

&lt;p&gt;People sometimes describe this as "taming" the AI's quirks. It doesn't quite fit. "Tame" still puts the AI in the position of something to be subdued. What's actually happening is closer to collaboration. The human observes the quirk and names it. The AI loads the name each session and adjusts its responses. When a new quirk shows up, the AI itself proposes the addition. Two different roles, growing one document together.&lt;/p&gt;

&lt;p&gt;"Stop being reactive" means: stop trying to correct every session in real time. Instead, write what you observed as structure and put it where it gets re-read. Don't try to fix the quirk. Share the quirk and operate from there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reactive correction still has a place
&lt;/h2&gt;

&lt;p&gt;Preemptive declaration doesn't cover everything. New quirks show up constantly. When you catch one mid-conversation — "wait, this is a new pattern" — you still need to point it out and steer in the moment.&lt;/p&gt;

&lt;p&gt;The trick is: don't let that correction stay reactive-only. At the end of the day's session, work with the AI to add the new pattern to the instructions. Let the AI draft the wording. You review, you save. Next session opens with a sixth pattern already loaded.&lt;/p&gt;

&lt;p&gt;Reactive correction is the &lt;strong&gt;entry point for observation&lt;/strong&gt;. Preemptive declaration is the &lt;strong&gt;place observations accumulate&lt;/strong&gt;. Splitting the two roles makes it easier to think about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operating with AI is a different skill from getting AI to perform
&lt;/h2&gt;

&lt;p&gt;The quirks won't go away. Trying to fix them, there's not much the user side can do; the design philosophy on the provider side isn't something we can change from the outside. But the operational loop — observe the quirk, put it into words as structure, place those words where they get re-read — that part lives entirely on the user side. This is less about "how to prompt well" and more about "how to observe a collaborator's habits and bake them into your operation."&lt;/p&gt;

&lt;p&gt;You can't really use this skill with a human colleague. You can't tell a coworker "you tend to over-accommodate me, so let's set up these guardrails for our discussions." Even if you said it, they wouldn't re-read the guardrails every meeting. With AI you can. You write the document. The AI reads it every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's the interesting part of working with AI&lt;/strong&gt;, to me. The quirks don't disappear. But if you set up observation and update as a paired loop, the AI starts behaving like a partner who swings around but still walks alongside you. Not fixed. Raised together. That's where I've landed for now.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was adapted (not literally translated) from a Japanese original at &lt;a href="https://nomuraya-hub.pages.dev/" rel="noopener noreferrer"&gt;nomuraya-hub.pages.dev&lt;/a&gt;. I am the same author writing under different pen names — "nomuraya / shimajima / 中翔" — depending on the medium.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aicollaboration</category>
      <category>claude</category>
      <category>promptengineering</category>
      <category>rlhf</category>
    </item>
    <item>
      <title>Stop Telling Your AI to "Be Careful Next Time." It Has No Memory of Yesterday.</title>
      <dc:creator>Sho Naka</dc:creator>
      <pubDate>Mon, 22 Jun 2026 00:23:29 +0000</pubDate>
      <link>https://dev.to/nomurasan/stop-telling-your-ai-to-be-careful-next-time-it-has-no-memory-of-yesterday-i6c</link>
      <guid>https://dev.to/nomurasan/stop-telling-your-ai-to-be-careful-next-time-it-has-no-memory-of-yesterday-i6c</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This is an adapted English version of an article I first wrote in Japanese. I work with AI to shape and review my drafts, but the argument and the field observations are my own. The numbers are cited from public surveys (linked at the end).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I built an aggressive prompt-injection block to stop my AI agent from repeating the same mistakes. It worked, so I kept adding rules. By the time I noticed, the file had ballooned to 56,000 characters — and the agent had quietly stopped functioning. Too much context, attention spread too thin to act on any of it. I gutted it back to under 1,200 characters, and here's the part that still stings: it behaved &lt;em&gt;better&lt;/em&gt; with fewer rules.&lt;/p&gt;

&lt;p&gt;That was the day I learned my whole mental model was backwards.&lt;/p&gt;

&lt;p&gt;This isn't a post about making your AI more accurate. It's about designing so that accuracy stops being the thing you depend on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mistake I made for months
&lt;/h2&gt;

&lt;p&gt;My agent kept skipping the same step in a workflow. So I did what every engineer does on instinct: I added a rule. "Don't skip this step." Then it did something else dumb, so I added another rule. Then another.&lt;/p&gt;

&lt;p&gt;I was treating the rules file like a conversation with a colleague — as if the agent would &lt;em&gt;remember&lt;/em&gt; yesterday's correction and carry it forward. It doesn't. Every run starts cold. "Be careful next time" assumes a next time that shares state with this time. For a stateless model, there is no continuity to appeal to. You are talking to a counterparty with no memory of the conversation you think you're having.&lt;/p&gt;

&lt;p&gt;So the rules pile up, because each correction feels like progress. And for a while the numbers even improve. But adding rules has a ceiling, and I blew straight through it: at 56,000 characters the agent wasn't reasoning over my guardrails anymore — it was drowning in them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Knowing a rule and stopping at it are different things
&lt;/h2&gt;

&lt;p&gt;Here's the distinction that took me far too long to see.&lt;/p&gt;

&lt;p&gt;Putting a rule in the context window means the model &lt;em&gt;knows&lt;/em&gt; the rule. It does not mean the model will &lt;em&gt;stop at&lt;/em&gt; that rule in the middle of a workflow, when it's three tool-calls deep and optimizing for finishing the task. Knowing and stopping are two different operations. The first is retrieval. The second is control flow.&lt;/p&gt;

&lt;p&gt;Text in a prompt is a soft suggestion competing for attention with everything else in the window. If you actually need the agent to halt, you don't write a more emphatic sentence. You make the constraint physical — a hook, a gate, a check that runs outside the model's discretion. The model can't talk its way past a script that refuses to proceed.&lt;/p&gt;

&lt;p&gt;This is the part of context engineering nobody puts on the slides: half the job is deciding what does &lt;em&gt;not&lt;/em&gt; belong in the context window, and moving those guarantees into code instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I stopped chasing accuracy
&lt;/h2&gt;

&lt;p&gt;It's worth saying plainly how thin the "just make it more accurate" bet really is.&lt;/p&gt;

&lt;p&gt;JUAS's 2025 survey found only &lt;strong&gt;4%&lt;/strong&gt; of companies said generative AI "greatly exceeded expectations." MIT's 2025 GenAI Divide report put the share of enterprise AI pilots that actually reached production at around &lt;strong&gt;5%&lt;/strong&gt;. And Persol's 2026 data showed only &lt;strong&gt;25.4%&lt;/strong&gt; of workers saw their working hours drop because of AI — with the heaviest AI users often logging &lt;em&gt;more&lt;/em&gt; overtime, not less.&lt;/p&gt;

&lt;p&gt;Read those together and a pattern falls out. The teams that win aren't the ones whose model is a few points more accurate. The accuracy is roughly the same for everyone. The difference is structural: whether the work &lt;em&gt;around&lt;/em&gt; the AI is designed so that imperfect output is still safe to use.&lt;/p&gt;

&lt;p&gt;That reframes the whole problem. Stop asking "how do I make it get this right every time?" Start asking "what do I do when it's wrong — and can I make that cheap?"&lt;/p&gt;

&lt;h2&gt;
  
  
  What designing-around-it actually looks like
&lt;/h2&gt;

&lt;p&gt;A few patterns I now reach for instead of writing another rule.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make disagreement the signal.&lt;/strong&gt; A team I watched (estie) runs the same task through multiple models — GPT, Claude, Gemini — and routes &lt;em&gt;only the items where the models disagree&lt;/em&gt; to a human. Where they agree, ship it. You're not trusting any single model to be right; you're using cheap redundancy to localize where human attention is worth spending. Accuracy stopped being the dependency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write escalation criteria, not encouragement.&lt;/strong&gt; Instead of "be careful," I give the agent explicit hand-back conditions in the system prompt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can't solve it in two minutes → change approach&lt;/li&gt;
&lt;li&gt;Stuck for fifteen minutes → hand back to a human&lt;/li&gt;
&lt;li&gt;Change touches scope or design → always hand back&lt;/li&gt;
&lt;li&gt;Everything else → proceed autonomously&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't morale. They're a decision boundary the agent can actually evaluate, with a defined exit instead of a vague aspiration to do well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Give the agent a fixed position in the workflow.&lt;/strong&gt; The deployments still alive three months in all share one trait: the AI owns a &lt;em&gt;defined seat&lt;/em&gt;, not a vague mandate to "help." It drafts the meeting minutes. It handles first-line customer support. It assembles the deck. A narrow seat means a wrong answer has a small, known blast radius — and a small blast radius is what makes the thing usable at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part I didn't expect
&lt;/h2&gt;

&lt;p&gt;When I cut the rules file and built the real constraints into hooks, the agent didn't just work again — I started trusting it &lt;em&gt;more&lt;/em&gt;. Counterintuitive, but it tracks. Once I could see exactly where it would stop, I could finally let it run.&lt;/p&gt;

&lt;p&gt;That's the actual function of a guardrail. It isn't there to slow you down; it's there so you can go faster without watching the edge. The careful people — the ones who were most skeptical of handing work to an AI — are the ones who start using it heavily once the boundaries are visible and physical.&lt;/p&gt;

&lt;p&gt;You will not lecture your AI into remembering. It has no yesterday. So stop spending your effort on the sentence, and spend it on the structure: build the place where it's allowed to stop, and then stop telling it to be careful.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;JUAS "Corporate IT Trends Survey 2025": &lt;a href="https://juas.or.jp/cms/media/2025/02/it25_2.pdf" rel="noopener noreferrer"&gt;https://juas.or.jp/cms/media/2025/02/it25_2.pdf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MIT "The GenAI Divide: State of AI in Business 2025": &lt;a href="https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/" rel="noopener noreferrer"&gt;https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Persol Research "Generative AI and Ways of Working" (Feb 2026): &lt;a href="https://rc.persol-group.co.jp/thinktank/data/generative-ai/" rel="noopener noreferrer"&gt;https://rc.persol-group.co.jp/thinktank/data/generative-ai/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;estie "Human-in-the-loop while running": &lt;a href="https://www.estie.jp/blog/entry/2025/05/29/094554" rel="noopener noreferrer"&gt;https://www.estie.jp/blog/entry/2025/05/29/094554&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published in Japanese on &lt;a href="https://note.com/notes/n6c8ea041b0b2" rel="noopener noreferrer"&gt;note&lt;/a&gt;. I write under "nomuraya / 中翔 (Naka Sho)" — the same person across media. This English version is adapted, not literally translated.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentic</category>
      <category>productivity</category>
      <category>contextengineering</category>
    </item>
  </channel>
</rss>
