<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: John Rojas</title>
    <description>The latest articles on DEV Community by John Rojas (@wordcaster).</description>
    <link>https://dev.to/wordcaster</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3847289%2F54a2b5e5-dff3-42b7-810e-b0356d7010b6.jpg</url>
      <title>DEV Community: John Rojas</title>
      <link>https://dev.to/wordcaster</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wordcaster"/>
    <language>en</language>
    <item>
      <title>Checkbox theater: how I stopped trusting my AI agent to run the checks</title>
      <dc:creator>John Rojas</dc:creator>
      <pubDate>Sun, 24 May 2026 10:36:22 +0000</pubDate>
      <link>https://dev.to/wordcaster/checkbox-theater-how-i-stopped-trusting-my-ai-agent-to-run-the-checks-2gf1</link>
      <guid>https://dev.to/wordcaster/checkbox-theater-how-i-stopped-trusting-my-ai-agent-to-run-the-checks-2gf1</guid>
      <description>&lt;p&gt;For context: in &lt;a href="https://dev.to/wordcaster/i-kept-saying-the-same-five-things-in-every-doc-review-so-i-made-them-defaults-3lo4"&gt;the previous piece&lt;/a&gt;, I worked through a five-dimension review framework for documentation, covering clarity, readability, style, completeness, and technical accuracy. Those dimensions are now part of how our team's AI agent reviews PRs. It runs them on every review pass, quietly, in the background. Most people don't think about them. They just see the review output.&lt;/p&gt;

&lt;p&gt;Then I started catching things on my own review pass that the agent had marked clean. The style scan reported zero hits. I'd find three present-tense violations on the next read. A completeness check came back marked complete. A ticket requirement was unaddressed in the diff. The dimensions were running. They were also missing things, and I was the one finding what they missed.&lt;/p&gt;

&lt;p&gt;This piece is about what I learned from that gap, what I built to close it, and the bigger principle I'm still working through. Sharing as I go, in case any of it is useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;I'd built up a set of gates around the dimension checks. A &lt;strong&gt;PRE-FLIGHT gate&lt;/strong&gt; that forced the agent to write a todo list with concrete execution methods for each dimension before any review work began. No "I'll check style" wishful thinking; you had to say "I'll run &lt;code&gt;gh pr diff&lt;/code&gt; and scan for forbidden terms, will/would violations, and passive voice constructions." A &lt;strong&gt;COMPLETION gate&lt;/strong&gt; that required documented evidence for all five dimensions before the review file could be written: findings, or "no issues, here's what I checked."&lt;/p&gt;

&lt;p&gt;It felt thorough. Looked thorough. Read thorough on paper. Was thorough in approximately the same way a paper checklist is fire-safe.&lt;/p&gt;

&lt;h2&gt;
  
  
  The failure mode
&lt;/h2&gt;

&lt;p&gt;What I started noticing across reviews:&lt;/p&gt;

&lt;p&gt;A style scan reported clean. On my own pass through the diff, I caught three present-tense violations.&lt;/p&gt;

&lt;p&gt;A completeness check came back marked complete. A ticket requirement was unaddressed.&lt;/p&gt;

&lt;p&gt;Sub-agents reported back with results no artifact on disk corroborated. "Ran the will/would scan, zero hits." Where? Show me. There was no &lt;code&gt;where&lt;/code&gt;. The scan had never produced output. It had produced a sentence.&lt;/p&gt;

&lt;p&gt;I want to be careful about how I name this. It was satisfying the instruction as stated. The PRE-FLIGHT todo said "run the will/would scan." Marking the todo complete satisfied the instruction. Whether the scan had actually executed and produced findings was, structurally, outside the loop. The gate was social, not mechanical. It depended on the agent choosing to do the work, and on me choosing to believe that the work had been done because the agent said so.&lt;/p&gt;

&lt;p&gt;I was reading the agent's confidence as evidence. Confidence is not evidence. It's a sentence about evidence. There is a difference, and the difference was costing me on every review.&lt;/p&gt;

&lt;p&gt;The phrase that stuck was checkbox theater. The gates existed. They had names, structure, even formal blocking semantics. What they didn't have was teeth. They lived in instructions, and instructions are wishes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shift
&lt;/h2&gt;

&lt;p&gt;The question I put to the agent: can we make these gates mechanical? Not "the agent should run the check" but "the agent cannot move forward until the check has produced a written artifact, on disk, tied to the current state of the PR."&lt;/p&gt;

&lt;p&gt;That reframe is the whole article in one sentence. Evidence over status. Substrate over self-report. The shift from a gate that asks the agent to verify itself to a gate that verifies whether the agent has verified itself, where "has verified" is measured by file existence, not by claim.&lt;/p&gt;

&lt;p&gt;Once that shift was on the table, the implementation became obvious. Most of it, anyway. I'm still finding the edges.&lt;/p&gt;

&lt;h2&gt;
  
  
  The implementation: three moves
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Move 1: Scripts that write artifacts, not status flags
&lt;/h3&gt;

&lt;p&gt;Each mechanical scan now writes a JSON file. Not a status. Not a return code. A file, with hit counts, file paths, sample matches, a timestamp, and the SHA of the PR HEAD it was run against.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pr_number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NNNN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pr_head_oid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;headRefOid&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"run_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-12T08:59:14Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dimensions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"style"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ran"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hits"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"style-gate.sh (5 scans: will/would, passive, placeholders, superlatives, boolean)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"readability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hits"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"scan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sentence length &amp;gt; 25 words on added prose lines"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"samples"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"docs/api/auth.md:42"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docs/api/auth.md:67"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"total_hits"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ran"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SHA pin is doing real work. If the PR gets a new commit, the artifact's &lt;code&gt;pr_head_oid&lt;/code&gt; no longer matches the current HEAD. The artifact is now stale, which means the scan results are stale, which means whatever was clean five minutes ago is no longer demonstrably clean. The agent has to re-run.&lt;/p&gt;

&lt;h3&gt;
  
  
  Move 2: A hook that intercepts the destination
&lt;/h3&gt;

&lt;p&gt;This is the move that turned out to matter most. Cursor supports a &lt;code&gt;beforeShellExecution&lt;/code&gt; hook: a shell script that runs before any shell command the agent issues. The hook reads the command, decides whether it's a PR-write command (&lt;code&gt;gh api .../pulls/&amp;lt;N&amp;gt;/comments&lt;/code&gt;, &lt;code&gt;gh pr edit --body&lt;/code&gt;, &lt;code&gt;gh pr comment&lt;/code&gt;), and if so, validates the gate artifacts before deciding whether to allow or deny.&lt;/p&gt;

&lt;p&gt;The mechanism here is Cursor-specific. The principle isn't. Other agentic tools have equivalent shell-level hooks; if yours doesn't, the enforcement point shifts to a pre-commit hook or a CI gate, but the move is the same: put the verification somewhere the agent can't talk its way past.&lt;/p&gt;

&lt;p&gt;The validation is dumb on purpose. Does &lt;code&gt;PR-&amp;lt;N&amp;gt;-tickets.json&lt;/code&gt; exist? Does it have &lt;code&gt;status: "loaded"&lt;/code&gt; or &lt;code&gt;"partial_blocked"&lt;/code&gt;? Does its &lt;code&gt;pr_head_oid&lt;/code&gt; match the current HEAD from &lt;code&gt;gh pr view&lt;/code&gt;? Same questions for &lt;code&gt;PR-&amp;lt;N&amp;gt;-gate.json&lt;/code&gt;. If any check fails, the hook returns deny with a clear message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pr-review-gate-hook BLOCKED gh api .../pulls/NNNN/comments on PR #NNNN.

Missing or stale gate artifacts:
- Stale: PR-NNNN-gate.json pr_head_oid=&amp;lt;old-sha&amp;gt; but current PR HEAD is &amp;lt;new-sha&amp;gt;
  Re-run: ~/Documents/docs-agent/scripts/review-gate.sh NNNN

Resolve and retry. Bypass available for one command via environment variable.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What changed when this hook went live: the agent stopped being able to lie about whether it had run the scans. There was nothing for it to lie about. Either the artifact existed and the SHA matched, or the call to GitHub got blocked. The agent could still produce a sentence saying "I ran the scan." That sentence no longer affected anything. The hook didn't read sentences. It read files.&lt;/p&gt;

&lt;p&gt;Here's the before-and-after, visually:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgltfeuf89pnwjmhg90de.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgltfeuf89pnwjmhg90de.png" alt="Two-panel flowchart comparing before instruction-only gates (agent self-reports without an artifact) and after hook-enforced gates (hook validates gate.json before allowing the POST to GitHub)." width="800" height="936"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One subtle but important detail: if the hook itself errors, if &lt;code&gt;jq&lt;/code&gt; isn't installed, if &lt;code&gt;gh&lt;/code&gt; can't reach the network, the command is allowed through with a warning. The gate fails open. This is deliberate. The cost of false negatives, a stale artifact slipping through, is low because the next review will catch it. The cost of false positives, every command bricked because the hook crashed, is high. A bypass environment variable exists for the same reason: when you genuinely need to override, you can, but you have to do it on purpose.&lt;/p&gt;

&lt;h3&gt;
  
  
  Move 3: The dimension rules that match
&lt;/h3&gt;

&lt;p&gt;Hard gating only works if the gates point at the right things, so the rules inside the dimensions got tightened too. The style dimension can no longer be satisfied by citing the style guide; you have to run the mechanical scans and either resolve every hit via inline suggestion or document zero matches with the command that produced them. The completeness dimension requires a per-requirement mapping table built from &lt;code&gt;tickets.json&lt;/code&gt;, not from the PR diff alone, because feature-mapping from the artifact being reviewed is circular. The rule structure stopped being aspirational and started being operational. "Run the check" turned into "produce the artifact that proves you ran the check, and here's the schema."&lt;/p&gt;

&lt;h2&gt;
  
  
  The feedback loop: lessons as infrastructure
&lt;/h2&gt;

&lt;p&gt;Hard gating handles the failure modes I already knew about. It does not handle the ones I haven't run into yet. For those, there's a separate piece: the gap log.&lt;/p&gt;

&lt;p&gt;The gap log is an append-only file. After every review, the &lt;strong&gt;POST-REVIEW IMPROVEMENT gate&lt;/strong&gt; runs: take every reviewer comment, ask whether the workflow could have caught it before submission, and if yes, draft a check that would catch it next time. The check gets logged.&lt;/p&gt;

&lt;p&gt;The format is one line per gap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-05-10 | PR-1234 | style    | passive voice not caught on added definition lines | open
2026-05-12 | PR-1242 | complete | nav entry missing for new partial                  | mechanized
2026-04-22 | PR-1289 | clarity  | "this powerful feature" not caught                 | resolved
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three statuses do the work:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;open&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Logged but not yet caught by any script or hook. Next PRE-FLIGHT reads this and injects the gap as an additional dimension check.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mechanized&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A scan or hook now catches this pattern automatically. The gap can sit dormant; the infrastructure handles it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;resolved&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The underlying recurring pattern is gone (often because upstream changed). No further check needed.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What this does, structurally, is convert one-time learnings into infrastructure. A gap surfaced in PR-1234 doesn't sit in a Slack message I'll forget about. It sits in the log. The next PRE-FLIGHT reads the log and reminds the agent. When I get around to writing a script that catches the pattern mechanically, the status flips. The lesson doesn't depend on me remembering it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7tp23c5rwkz6trg9hv3m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7tp23c5rwkz6trg9hv3m.png" alt="Flowchart of the gap log lifecycle: review surfaces a gap, gap is logged as 'open', the next PRE-FLIGHT injects it as a dimension check. If the pattern recurs, build a script and mark it 'mechanized'; if the root cause is fixed upstream, mark it 'resolved'." width="800" height="1339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Honest disclosure: this part still has friction. At the end of each session, I have to prompt the agent to walk through reviewer comments and log gaps. The pattern isn't fully self-sustaining yet. The log gets written. The injection works. The "remember to look at the comments" step is still mine. There's a future article in closing that gap, and I'm still figuring out the right shape for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The principle
&lt;/h2&gt;

&lt;p&gt;Trust-but-verify is the wrong frame for AI workflows, because verification is what you're asking the LLM to do. The agent that runs the check is the same agent reporting on whether the check ran. There is no second agent watching. The integrity of the whole thing depends on a self-report from the entity being audited. That's not a verification model. That's a wish.&lt;/p&gt;

&lt;p&gt;The fix isn't a better prompt. The fix is moving verification outside the agent's control loop entirely. The agent can write any sentence about whether it ran a scan. The hook cannot write any sentence about whether a file exists. The file is on disk or it isn't. The SHA matches or it doesn't. There is no clever phrasing the agent can produce that changes that.&lt;/p&gt;

&lt;p&gt;Where I am right now: hard gates for the failure modes I know about, gap log for the ones I don't, manual prompting for the meta-check that I don't have a way to automate yet. It's a workable system I'm still actively improving.&lt;/p&gt;

&lt;p&gt;If you're building AI-assisted workflows, especially ones where the LLM both does the work and audits the work, I'd push you to ask the same question I had to ask: what does my agent actually produce, on disk, that I could check without taking its word for anything? If the answer is nothing, that's checkbox theater. The first step to closing the gap is making it produce something.&lt;/p&gt;

&lt;p&gt;More on the specific mechanical scanners in the next piece. More on the gap log and the durability problem in the one after that. If you've solved any of this, or seen a sharper take on it, I'd want to hear from you.&lt;/p&gt;




&lt;p&gt;I'm publishing this while the system is still actively improving, because the principle landed for me before the implementation did, and the implementation isn't finished. If you're building AI workflows where the agent both does the work and audits it, I'd want to hear how you've thought about that gap, or whether you've found a way to close it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I write about AI-assisted documentation workflows, developer experience, and the evolving role of technical writers. If any of this resonates, let's connect on &lt;a href="https://www.linkedin.com/in/jeprojas" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>technicalwriting</category>
      <category>documentation</category>
      <category>devrel</category>
    </item>
    <item>
      <title>I kept saying the same five things in every doc review. So I made them defaults.</title>
      <dc:creator>John Rojas</dc:creator>
      <pubDate>Thu, 14 May 2026 19:23:09 +0000</pubDate>
      <link>https://dev.to/wordcaster/i-kept-saying-the-same-five-things-in-every-doc-review-so-i-made-them-defaults-3lo4</link>
      <guid>https://dev.to/wordcaster/i-kept-saying-the-same-five-things-in-every-doc-review-so-i-made-them-defaults-3lo4</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/wordcaster/what-ai-actually-changed-about-my-work-as-a-technical-writer-2edh"&gt;my last piece&lt;/a&gt;, I wrote about how AI changed my work as a technical writer. I closed with a line I keep coming back to: technical writing is shifting from producing content to building the gates, constraints, and workflows that ensure content is accurate before it ever reaches a reader. This piece is about one of those constraints, named: five things I check in every documentation review, and how I went from reciting them out loud every time to having them run by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I got here
&lt;/h2&gt;

&lt;p&gt;I was reviewing PRs alongside an AI agent. The agent was helpful. The agent was also unevenly helpful. Some reviews came back sharp. Others missed things any first read should have caught. The pattern wasn't in what the agent did. It was in what I kept telling it.&lt;/p&gt;

&lt;p&gt;At first, I was iterating on prompts. Trying different framings, different phrasings, different orders of operations. Some produced sharper reviews. Some didn't. And I wasn't writing any of it down.&lt;/p&gt;

&lt;p&gt;Then I'd remember a recent review that had come out particularly well, and I'd want to use the same approach again, and I'd realize I had no idea what I'd told the agent that time. I found myself reconstructing the prompts from memory. And badly at that. Yikes.&lt;/p&gt;

&lt;p&gt;So I opened a Slack message to myself and pasted in my current working prompt. The one with the five things I kept asking the agent to check. Then I closed Slack and got back to work.&lt;/p&gt;

&lt;p&gt;The next time I started a review, I opened that Slack message and pasted the prompt into Cursor. The time after that, I did the same thing. And the time after that. And the time after that. And another. And another. You know where this is going.&lt;/p&gt;

&lt;p&gt;I want to say this didn't go on for long. But it did. OMG it did. I was apparently fine pasting the same wall of text into every conversation, multiple times a day, until my own laziness finally lit up a bulb.&lt;/p&gt;

&lt;p&gt;If I'm pasting the same instructions every time, the agent shouldn't be relying on me to paste them. They should just be there. By default.&lt;/p&gt;

&lt;p&gt;Something else was eating at me in the background. Every long-pasted prompt was eating up part of the context window before the agent had even started reading the diff. That seemed wasteful. (Spoiler: it was.)&lt;/p&gt;

&lt;p&gt;So I started baking the five dimensions into my local preferences. Not the agent rules. Not the slash commands. Just my own preferences and local memory, where I could iterate without touching anything the rest of the team relied on. I wanted to test whether the bake-in actually preserved the catches I was getting from manual prompting. If the local version produced worse reviews than my paste-everything version, I'd know early. And nobody else would have to deal with it.&lt;/p&gt;

&lt;p&gt;It didn't produce worse reviews. It produced consistent ones, which is its own kind of better.&lt;/p&gt;

&lt;p&gt;So I named the thing I'd been doing all along.&lt;/p&gt;

&lt;p&gt;Clarity. Readability. Style. Technical accuracy. Completeness.&lt;/p&gt;

&lt;p&gt;Five dimensions. Each one a specific kind of failure I'd seen the agent commit, and a specific kind of catch I'd had to make on my own pass through the diff. Once they lived in the rules, the realization landed: this isn't a checklist I should be reciting from memory before every review. It's a checklist that should run by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  The five dimensions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Clarity
&lt;/h3&gt;

&lt;p&gt;Technical documentation has an obscurity problem. Layers of concepts, configuration options, dependencies, prerequisites. Somewhere underneath the torrent of context, the actual thing you came to learn. Clarity is the dimension that asks: can a reader who doesn't already know this still follow it?&lt;/p&gt;

&lt;p&gt;It catches the sentence that assumes too much. The acronym used before it's defined. The reference to a concept that's actually three concepts. The instruction that skips a step because the author considered it obvious.&lt;/p&gt;

&lt;p&gt;When clarity is missing, the reader rereads. Then they reread again. Then they give up and ask in Slack. Documentation that needs a translation isn't documentation. It's a draft.&lt;/p&gt;

&lt;h3&gt;
  
  
  Readability
&lt;/h3&gt;

&lt;p&gt;This one is structural, not conceptual. Readability is about how text feels on the page. The sentence that runs three clauses long without breath. The paragraph that's nine lines tall and looks like a wall. The bulleted list that's actually seven sub-points masquerading as one.&lt;/p&gt;

&lt;p&gt;The goal isn't "easy" prose. The goal is prose someone can absorb without working at it. Chunking. Variation. White space. The boring craft of making information accessible to a reader who is tired, distracted, or scanning while their build runs.&lt;/p&gt;

&lt;p&gt;When readability is missing, the eye glazes. The reader skims past the part that matters. Worse, they skip the page entirely and search for an answer somewhere else.&lt;/p&gt;

&lt;h3&gt;
  
  
  Style
&lt;/h3&gt;

&lt;p&gt;Every team has a style guide. Most are imperfect. All of them are non-negotiable inside the codebase they belong to.&lt;/p&gt;

&lt;p&gt;Style is the dimension that catches drift from those rules. Present tense where past tense slipped in. The term we use vs. the term we don't. Sentence case in headings when it should be title case, or the other way around. Boolean phrasing when we've agreed on declarative.&lt;/p&gt;

&lt;p&gt;I'm a stickler for directional language in particular. We don't allow it. I've been known to chase every instance of "via" across a doc set and hunt each one down to oblivion. (Nah, I just replace them with something else.) It's an itch I have to scratch, and honestly, this kind of work is what I live for. (My obsessive tendencies might be showing. I'm fine with it.)&lt;/p&gt;

&lt;p&gt;It's also the dimension AI agents are most likely to silently abandon mid-document. Style rules are pattern-matching against a long list of small constraints. Context windows get full, attention drifts, and what should have been a will/would catch becomes a missed will. The agent didn't forget the rule. It just stopped applying it.&lt;/p&gt;

&lt;p&gt;When style is missing, the doc set drifts. Voice fragments. Readers feel the inconsistency before they can name it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical accuracy
&lt;/h3&gt;

&lt;p&gt;This is the highest-stakes dimension. Technical accuracy is the dimension that asks: is this actually true?&lt;/p&gt;

&lt;p&gt;It catches the configuration property that doesn't exist. The endpoint behavior described backwards. The version compatibility claim that contradicts the upgrade guide. The example that compiled in someone's head but not in a terminal.&lt;/p&gt;

&lt;p&gt;The configuration property that doesn't exist is the one I keep coming back to. I added this dimension specifically because of one passage I reviewed that read beautifully. Confident, authoritative, smooth. Then I went to check the configuration properties it referenced. They didn't exist. Not anywhere in the codebase. The text was perfectly written and completely wrong. That's when technical accuracy became its own line item on every review.&lt;/p&gt;

&lt;p&gt;The check has two parts. Source-grounded: does this match the codebase, the specs, the schemas, the actual source of truth? Cross-document consistent: does this conflict with what we've already said elsewhere?&lt;/p&gt;

&lt;p&gt;Both halves matter. A statement can be technically correct against the code and still wrong in the context of the broader documentation. Two pages can each be internally consistent and contradict each other. AI agents are particularly good at producing both kinds of failure, in perfect sentences, with confident phrasing.&lt;/p&gt;

&lt;p&gt;When technical accuracy is missing, readers act on bad information. That's the worst outcome documentation can produce. Everything else is just polish.&lt;/p&gt;

&lt;h3&gt;
  
  
  Completeness
&lt;/h3&gt;

&lt;p&gt;This is the dimension most external to the doc itself. Completeness asks: did we address what the ticket actually said?&lt;/p&gt;

&lt;p&gt;It catches the requirement that's documented in the PR description but not in the docs. The acceptance criterion that got lost between planning and merge. The new partial that needs a nav entry. The feature flag that quietly changes behavior in a way someone needs to know about.&lt;/p&gt;

&lt;p&gt;The trick with completeness is that it requires holding two artifacts in mind at once: the change being reviewed, and the requirements that motivated the change. AI agents handle this poorly by default. They look at the diff and review the diff. They forget to look back at the ticket and ask whether the diff is enough.&lt;/p&gt;

&lt;p&gt;When completeness is missing, the docs ship and a customer finds the gap two releases later.&lt;/p&gt;

&lt;h3&gt;
  
  
  The five at a glance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;What it catches&lt;/th&gt;
&lt;th&gt;What missing it looks like&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Clarity&lt;/td&gt;
&lt;td&gt;Concepts that need decoding&lt;/td&gt;
&lt;td&gt;Reader rereads, gives up, asks in Slack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Readability&lt;/td&gt;
&lt;td&gt;Walls of text, no rhythm&lt;/td&gt;
&lt;td&gt;Eye glazes, page gets skipped&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Style&lt;/td&gt;
&lt;td&gt;Drift from style guide rules&lt;/td&gt;
&lt;td&gt;Voice fragments, inconsistency felt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Technical accuracy&lt;/td&gt;
&lt;td&gt;Hallucinations and contradictions&lt;/td&gt;
&lt;td&gt;Readers act on bad information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Completeness&lt;/td&gt;
&lt;td&gt;Missing requirements or acceptance criteria&lt;/td&gt;
&lt;td&gt;Customer finds the gap two releases later&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What naming didn't solve
&lt;/h2&gt;

&lt;p&gt;Baking the five dimensions into the rules solved one problem. It didn't solve all of them.&lt;/p&gt;

&lt;p&gt;Naming the dimensions told the agent what to check. It didn't tell the agent how to verify the check had actually happened. There's a gap between "the agent ran a clarity pass" and "the agent ran a clarity pass and the output exists somewhere on disk that I can read." For a while, I didn't see that gap. The agent would report clean across all five dimensions, and I'd believe it, because the dimensions were named and the dimensions were "covered."&lt;/p&gt;

&lt;p&gt;Then I started catching things on my own review pass that the agent had marked clean. Why? That's a tale for another time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What might be missing
&lt;/h2&gt;

&lt;p&gt;Five dimensions felt like enough when I wrote them down. I'm less sure of it now.&lt;/p&gt;

&lt;p&gt;A few I've been mulling over, but haven't committed to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accessibility.&lt;/strong&gt; Heading hierarchy, alt text on images, link labels that say more than "click here," screen reader behavior on tables and code blocks. Most style guides treat this as a subset of style. I'm not convinced it deserves to live under style; the failure modes are too different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-document consistency.&lt;/strong&gt; This currently sits inside technical accuracy ("doesn't contradict what we've said elsewhere"). But the failure mode is structural, not factual. Two pages can each be true and still pull a reader in opposite directions. Worth a separate dimension? Maybe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discoverability.&lt;/strong&gt; A page that's accurate, clear, well-styled, and complete is still useless if no one can find it. Nav placement, search terms, related-links suggestions. Probably belongs somewhere in the framework. I just don't know where.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audience appropriateness.&lt;/strong&gt; A doc written for the wrong reader fails before any of the other dimensions get a chance to. Did we write this for the developer, the admin, or the integrator? Each one needs a different shape.&lt;/p&gt;

&lt;p&gt;None of these are settled. They might fold into the existing five with a sharper definition. They might warrant standalone dimensions. They might turn out not to matter at the level of automated review. I don't know what I don't know yet. The fun is in finding out.&lt;/p&gt;

&lt;p&gt;Are these the kinds of things you check for in your own reviews? I'd be curious to hear what's on your list.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;This is the first piece in a small series I'm working through on AI-assisted doc reviews. Next up: what happened when I discovered that naming the dimensions wasn't enough to make the agent actually check them.&lt;/p&gt;

&lt;p&gt;The build is never finished. You just keep improving it. That still suits me fine.&lt;/p&gt;




&lt;p&gt;I'm sharing this because the framework isn't a destination. It's a working hypothesis. Five named dimensions emerged from my own habits and my own catches, and the next reviewer to read past them might find a sixth. If you've shaped something similar from your own practice, or watched these dimensions miss things mine hasn't yet, I'd genuinely love to hear about it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I write about AI-assisted documentation workflows, developer experience, and the evolving role of technical writers. If any of this resonates, let's connect on &lt;a href="https://www.linkedin.com/in/jeprojas" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>technicalwriting</category>
      <category>documentation</category>
      <category>devrel</category>
    </item>
    <item>
      <title>What AI actually changed about my work as a technical writer</title>
      <dc:creator>John Rojas</dc:creator>
      <pubDate>Sat, 28 Mar 2026 08:44:57 +0000</pubDate>
      <link>https://dev.to/wordcaster/what-ai-actually-changed-about-my-work-as-a-technical-writer-2edh</link>
      <guid>https://dev.to/wordcaster/what-ai-actually-changed-about-my-work-as-a-technical-writer-2edh</guid>
      <description>&lt;p&gt;For me, the most frustrating part of technical writing was never the writing.&lt;/p&gt;

&lt;p&gt;It was the waiting.&lt;/p&gt;

&lt;p&gt;Waiting for a subject matter expert to have a free hour. Waiting for a developer to confirm whether something changed between releases. The information existed. It just lived in someone else's head, or somewhere in a codebase I wasn't expected to touch.&lt;/p&gt;

&lt;p&gt;To be fair, waiting wasn't always the only option. I've always preferred finding things out for myself, digging into a ticket, tracing a thread, unblocking my own work rather than sitting on a question. But self-directed digging has its own cost. It takes time, it can send you down the wrong path, and without the right tools it's easy to spend an hour finding half an answer. The codebase was there. I just didn't have a good way in.&lt;/p&gt;

&lt;p&gt;That changed when my team started experimenting with AI-assisted documentation workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I started
&lt;/h2&gt;

&lt;p&gt;I didn't build the workflow. I was asked to test it.&lt;/p&gt;

&lt;p&gt;A colleague had been developing an AI-assisted approach to documentation and brought me in to try it, break it, and help refine it before we rolled it out to the wider team. That process taught me more about how AI actually works in practice than any course I've taken.&lt;/p&gt;

&lt;p&gt;The core idea was straightforward. Instead of waiting for a developer to explain a feature, we used AI tools to investigate the codebase directly. Read the source. Check the specs. Cross-reference existing documentation. Then plan what to write before touching a single file.&lt;/p&gt;

&lt;p&gt;Refining the workflow felt a lot like optimizing a character build in a game. You start with what you have, figure out what works, pick up better tools along the way, and keep tweaking until the whole thing plays smoothly. There's no perfect final state. You just keep improving the build.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I noticed
&lt;/h2&gt;

&lt;p&gt;Three things stood out.&lt;/p&gt;

&lt;p&gt;The first was speed. Not because AI writes faster, but because the back-and-forth shrinks. Fewer dependency chains. Less time blocked waiting for input that may or may not arrive before the deadline.&lt;/p&gt;

&lt;p&gt;The second was confidence. I could start a piece of work with a clearer picture of what I was walking into. Instead of beginning from a blank page and a vague (sometimes even empty, shocker, I know) ticket description, I had a verified starting point grounded in what the code actually does. That changes how you work.&lt;/p&gt;

&lt;p&gt;The third was autonomy. I could start work on a ticket without needing anyone's permission or availability. That shift is harder to explain than it sounds. After years of structuring your work around other people's calendars, being able to just start is genuinely different.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I also learned
&lt;/h2&gt;

&lt;p&gt;Three things here too, and none of them are about prompting.&lt;/p&gt;

&lt;p&gt;The first is that AI-assisted workflows are only as good as the information they work with. Garbage in, garbage out. If the source material is incomplete or inconsistent, the output reflects that. You still need to know what good documentation looks like. You still need to catch what the AI misses. Judgment doesn't go away. It just gets applied differently.&lt;/p&gt;

&lt;p&gt;The second is that critical thinking becomes more important, not less. You're not just reviewing writing anymore. You're reviewing claims. Every output needs to be read with the question: is this actually true, or does it just sound true? AI doesn't flag uncertainty the way a careful human writer might. It states things. Confidently. Whether they're correct or not. In an age where misinformation spreads faster than corrections, accuracy and fact-checking are non-negotiable. That doesn't change just because the content came from an AI. If anything, it becomes more important.&lt;/p&gt;

&lt;p&gt;The third is that domain knowledge and attention to detail are not optional. They're your last line of defense. AI will describe a configuration property that doesn't exist, get the behavior of an endpoint backwards, and do it in perfect sentences. It also skips style rules, not always because it doesn't know them, but because context window limits or what I can only describe as selective laziness means it stops applying them mid-document. Directional language, incorrect linking patterns, inconsistent terminology. If you don't know the subject matter well enough to catch a wrong answer, or the style guide well enough to catch a broken rule, those errors walk straight out the door into your published docs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I think this is going
&lt;/h2&gt;

&lt;p&gt;Technical writing is changing. The role is shifting from producing content to something broader: building the gates, constraints, and workflows that ensure content is accurate before it ever reaches a reader.&lt;/p&gt;

&lt;p&gt;The core purpose stays the same. Technically accurate content, usable by a specific audience. But the way we get there is different.&lt;/p&gt;

&lt;p&gt;I started this by talking about waiting. Waiting for the right person, the right meeting, the right moment to get unblocked. What I've learned is that the writers who thrive won't be the ones who wait less. They'll be the ones who bring more to the table when they show up: domain knowledge, critical thinking, a sharp eye for detail, and the ability to build workflows that catch what they miss.&lt;/p&gt;

&lt;p&gt;The build is never finished. You just keep improving it.&lt;/p&gt;

&lt;p&gt;That suits me fine.&lt;/p&gt;




&lt;p&gt;I'm sharing this because I want to hear how others are navigating the same shift. There's been no shortage of hype and fear around AI in our industry, and honestly, some of that fear is justified. Across the industry, roles are changing. But changing isn't the same as disappearing. What I've experienced is less replacement and more transformation. The job looks different, demands different things, and rewards different skills than it did a few years ago. If you're in the middle of that shift too, I'd genuinely love to hear how you're doing it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I write about AI-assisted documentation workflows, developer experience, and the evolving role of technical writers. If any of this resonates, let's connect on &lt;a href="https://www.linkedin.com/in/jeprojas" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>technicalwriting</category>
      <category>documentation</category>
      <category>career</category>
    </item>
  </channel>
</rss>
