<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: mufeng</title>
    <description>The latest articles on DEV Community by mufeng (@changyou).</description>
    <link>https://dev.to/changyou</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3969584%2Fc906e166-2d8c-4b15-8630-8abd8638cad2.png</url>
      <title>DEV Community: mufeng</title>
      <link>https://dev.to/changyou</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/changyou"/>
    <language>en</language>
    <item>
      <title>The Real AI Productivity Hack Is Not a Better Prompt</title>
      <dc:creator>mufeng</dc:creator>
      <pubDate>Sat, 04 Jul 2026 00:47:04 +0000</pubDate>
      <link>https://dev.to/changyou/the-real-ai-productivity-hack-is-not-a-better-prompt-27dk</link>
      <guid>https://dev.to/changyou/the-real-ai-productivity-hack-is-not-a-better-prompt-27dk</guid>
      <description>&lt;p&gt;I used to think the next jump in AI productivity would come from writing better prompts.&lt;/p&gt;

&lt;p&gt;Longer prompts. More precise prompts. Prompts with role definitions, tone rules, examples, constraints, and output formats.&lt;/p&gt;

&lt;p&gt;After reading a book on Agent Skills, I think that framing is too small.&lt;/p&gt;

&lt;p&gt;The real bottleneck is not that I fail to explain a task once. The real bottleneck is that I keep explaining the same class of task again and again: how I want an article structured, how I review code, how I prepare App Store release notes, how I generate visuals, how I check a draft before publishing.&lt;/p&gt;

&lt;p&gt;At some point, “using AI” quietly turns into “managing AI manually.”&lt;/p&gt;

&lt;p&gt;The book’s most useful idea is simple:&lt;/p&gt;

&lt;p&gt;AI productivity does not come from making every prompt longer. It comes from turning repeated work into executable, maintainable, testable skills.&lt;/p&gt;

&lt;p&gt;That changed how I think about AI work.&lt;/p&gt;

&lt;h2&gt;
  A Skill Is Not a Prompt
&lt;/h2&gt;

&lt;p&gt;A prompt is a temporary instruction inside one conversation.&lt;/p&gt;

&lt;p&gt;A skill is a reusable operating manual for an agent.&lt;/p&gt;

&lt;p&gt;That difference sounds small until you use AI every day. A prompt tells the model what you want right now. A skill tells the agent how a category of work should be done every time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;when to activate&lt;/li&gt;
&lt;li&gt;what input to read&lt;/li&gt;
&lt;li&gt;what steps to follow&lt;/li&gt;
&lt;li&gt;which tools or scripts to call&lt;/li&gt;
&lt;li&gt;what output to produce&lt;/li&gt;
&lt;li&gt;what must never happen&lt;/li&gt;
&lt;li&gt;where the agent should stop and ask for human judgment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last part matters.&lt;/p&gt;

&lt;p&gt;The goal is not to remove the human from the work. The goal is to stop spending human attention on the same low-level instructions.&lt;/p&gt;

&lt;p&gt;For me, the most obvious candidates are not exotic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a writing style skill&lt;/li&gt;
&lt;li&gt;a code review skill&lt;/li&gt;
&lt;li&gt;an iOS release checklist skill&lt;/li&gt;
&lt;li&gt;an App Store release notes skill&lt;/li&gt;
&lt;li&gt;a book notes skill&lt;/li&gt;
&lt;li&gt;a weekly review skill&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not tasks I cannot do. They are tasks where I keep repeating the same standards, preferences, caveats, and checks.&lt;/p&gt;

&lt;p&gt;That repetition is the real cost.&lt;/p&gt;

&lt;h2&gt;
  The Useful Split: Judgment, Mechanics, and Workflow
&lt;/h2&gt;

&lt;p&gt;One of the cleanest distinctions in the book is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompts handle semantic judgment&lt;/li&gt;
&lt;li&gt;scripts handle deterministic mechanics&lt;/li&gt;
&lt;li&gt;skills orchestrate the whole workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This sounds obvious, but many AI workflows fail because they give the model the wrong job.&lt;/p&gt;

&lt;p&gt;For example, asking a model to decide where an article needs illustrations is reasonable. Asking it to reliably rename files, validate image dimensions, split long documents, or calculate table values is usually a mistake.&lt;/p&gt;

&lt;p&gt;Those are deterministic jobs. They should be handled by scripts or strict tools.&lt;/p&gt;

&lt;p&gt;The model is better used for judgment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;choosing the angle of an essay&lt;/li&gt;
&lt;li&gt;identifying the weak part of a draft&lt;/li&gt;
&lt;li&gt;comparing two architecture options&lt;/li&gt;
&lt;li&gt;explaining a tradeoff&lt;/li&gt;
&lt;li&gt;turning rough material into clear language&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The skill sits above both. It says: when this kind of task appears, use the model for the judgment parts, use scripts for the mechanical parts, and preserve the checkpoints where a human decision is required.&lt;/p&gt;

&lt;p&gt;That is a much more durable pattern than trying to put everything into one giant prompt.&lt;/p&gt;

&lt;h2&gt;
  Context Is a Workbench, Not a Warehouse
&lt;/h2&gt;

&lt;p&gt;Large context windows make it tempting to dump everything into the conversation.&lt;/p&gt;

&lt;p&gt;Style guides. Prior chats. Examples. Templates. API docs. Drafts. Personal preferences. All of it.&lt;/p&gt;

&lt;p&gt;The book argues for the opposite discipline: load the right material at the right time.&lt;/p&gt;

&lt;p&gt;That is how skills should be designed. The main SKILL.md should not become a warehouse. It should contain the core workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;trigger conditions&lt;/li&gt;
&lt;li&gt;inputs and outputs&lt;/li&gt;
&lt;li&gt;main steps&lt;/li&gt;
&lt;li&gt;hard constraints&lt;/li&gt;
&lt;li&gt;failure modes&lt;/li&gt;
&lt;li&gt;references to load only when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Long templates, examples, API notes, and style samples belong in separate reference files.&lt;/p&gt;

&lt;p&gt;This is not just about token savings. It is about attention. The more unrelated material you push into context, the easier it becomes for the model to miss the one rule that actually matters.&lt;/p&gt;

&lt;p&gt;Context should feel like a workbench: only the tools needed for the current job should be on it.&lt;/p&gt;

&lt;h2&gt;
  Good Workflows Are Not Fully Automatic
&lt;/h2&gt;

&lt;p&gt;The dangerous version of AI automation is the one that looks efficient because it removes every pause.&lt;/p&gt;

&lt;p&gt;Give the agent source material. Let it choose the angle. Let it write the draft. Let it polish the draft. Let it generate images. Let it publish.&lt;/p&gt;

&lt;p&gt;That looks like a productivity win. Often it is just a way to outsource the most important decisions.&lt;/p&gt;

&lt;p&gt;The better workflow is more selective.&lt;/p&gt;

&lt;p&gt;For writing, I want AI to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;analyze source material&lt;/li&gt;
&lt;li&gt;propose several angles&lt;/li&gt;
&lt;li&gt;stop&lt;/li&gt;
&lt;li&gt;let me choose the angle&lt;/li&gt;
&lt;li&gt;draft from that angle&lt;/li&gt;
&lt;li&gt;revise against my standards&lt;/li&gt;
&lt;li&gt;prepare platform-specific versions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pause is not friction. It is the point.&lt;/p&gt;

&lt;p&gt;The same applies to development. AI can propose implementation plans, write tests, scan for regressions, and generate release notes. But architecture decisions, product tradeoffs, and publish decisions still need human ownership.&lt;/p&gt;

&lt;p&gt;AI can do the prep work. It should not silently take over the judgment.&lt;/p&gt;

&lt;h2&gt;
  Skills Need Engineering, Not Decoration
&lt;/h2&gt;

&lt;p&gt;A useful skill should be treated more like a small software product than a clever note.&lt;/p&gt;

&lt;p&gt;That means it has a lifecycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;define the real problem&lt;/li&gt;
&lt;li&gt;build the smallest usable version&lt;/li&gt;
&lt;li&gt;run it on real tasks&lt;/li&gt;
&lt;li&gt;record failure modes&lt;/li&gt;
&lt;li&gt;add tests or examples&lt;/li&gt;
&lt;li&gt;refactor when the file becomes too large&lt;/li&gt;
&lt;li&gt;keep improving it as the work changes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The most useful part of a skill is often not the elegant workflow. It is the “gotchas” section.&lt;/p&gt;

&lt;p&gt;That is where you record the failures that keep happening:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the agent forgot to read the reference template&lt;/li&gt;
&lt;li&gt;the output sounded too generic&lt;/li&gt;
&lt;li&gt;the script handled the wrong file path&lt;/li&gt;
&lt;li&gt;the model rewrote sections it should have preserved&lt;/li&gt;
&lt;li&gt;the task needed a human checkpoint before publishing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where personal experience becomes operational memory.&lt;/p&gt;

&lt;p&gt;If the same mistake happens twice, it probably belongs in the skill. If the same task happens three times, it is probably a candidate for a skill.&lt;/p&gt;

&lt;h2&gt;
  The Security Boundary Is Part of the Design
&lt;/h2&gt;

&lt;p&gt;Skills become more serious when they can read files, write files, call scripts, access the network, or publish content.&lt;/p&gt;

&lt;p&gt;At that point, they are not just prompts. They are operational tools.&lt;/p&gt;

&lt;p&gt;So the safety rules need to be designed in from the beginning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;limit where the skill can read and write&lt;/li&gt;
&lt;li&gt;avoid destructive actions without confirmation&lt;/li&gt;
&lt;li&gt;back up before overwriting important files&lt;/li&gt;
&lt;li&gt;test publishing workflows with fake data first&lt;/li&gt;
&lt;li&gt;remove local paths, secrets, and personal assumptions before sharing a skill publicly&lt;/li&gt;
&lt;li&gt;inspect third-party skills before running their scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not paranoia. It is basic engineering hygiene.&lt;/p&gt;

&lt;p&gt;The more capable the agent becomes, the more explicit the boundaries must be.&lt;/p&gt;

&lt;h2&gt;
  What I Am Going to Try First
&lt;/h2&gt;

&lt;p&gt;The book made the idea feel concrete enough that I can turn it into a weekly habit.&lt;/p&gt;

&lt;p&gt;This week, I would start with three small skills.&lt;/p&gt;

&lt;p&gt;First: a writing style skill.&lt;/p&gt;

&lt;p&gt;Not a giant manifesto. Just a role, three style principles, a short banned-phrase list, and a few examples of what “good” looks like.&lt;/p&gt;

&lt;p&gt;Second: an iOS or app release checklist skill.&lt;/p&gt;

&lt;p&gt;The first version only needs to cover version number, release notes, screenshots, privacy text, and a final manual confirmation before submission.&lt;/p&gt;

&lt;p&gt;Third: a gotchas section for existing skills.&lt;/p&gt;

&lt;p&gt;Take the last three AI outputs that were disappointing. Convert each failure into a specific rule. Do not patch for one example. Capture the pattern.&lt;/p&gt;

&lt;p&gt;There is also one experiment worth running immediately:&lt;/p&gt;

&lt;p&gt;Take a piece of material you want to turn into an article. Do not ask AI to write the article. Ask it to do only two things: analyze the material and propose three angles. Then stop and choose the angle yourself.&lt;/p&gt;

&lt;p&gt;If the final article improves, the human checkpoint paid for itself.&lt;/p&gt;

&lt;h2&gt;
  The Shift
&lt;/h2&gt;

&lt;p&gt;The book did not make me want to use AI more.&lt;/p&gt;

&lt;p&gt;It made me want to manage AI less manually.&lt;/p&gt;

&lt;p&gt;That is the real shift: from temporary instruction to reusable workflow; from prompt accumulation to experience engineering; from asking AI to remember my preferences to writing those preferences into a system that can be maintained.&lt;/p&gt;

&lt;p&gt;Better prompts still matter.&lt;/p&gt;

&lt;p&gt;But the real compounding return comes when the prompt stops being a one-off instruction and becomes part of a skill.&lt;/p&gt;

&lt;p&gt;Disclosure: this essay was adapted from my Chinese reading notes and drafted with AI assistance.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>skills</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Two Ways Claude Code Calls Codex: One-Shot Subprocess vs. Persistent App Server</title>
      <dc:creator>mufeng</dc:creator>
      <pubDate>Fri, 19 Jun 2026 09:59:31 +0000</pubDate>
      <link>https://dev.to/changyou/two-ways-claude-code-calls-codex-one-shot-subprocess-vs-persistent-app-server-18a6</link>
      <guid>https://dev.to/changyou/two-ways-claude-code-calls-codex-one-shot-subprocess-vs-persistent-app-server-18a6</guid>
      <description>&lt;p&gt;"Claude Code calls Codex" sounds like one feature. It's at least two different process models, and they have almost nothing in common past the name.&lt;/p&gt;

&lt;p&gt;The first spawns a one-shot subprocess with &lt;code&gt;codex exec&lt;/code&gt;. You hand it one explicit instruction, it produces a file or a structured result, and it exits. The second runs a persistent runtime with &lt;code&gt;codex app-server&lt;/code&gt; and talks to it over JSON-RPC, managing threads, turns, reviews, and interrupts for work that needs to carry state across rounds.&lt;/p&gt;

&lt;p&gt;Both let Claude Code borrow Codex. They differ on startup cost, protocol, permissions, error recovery, and the kind of task they fit. Get the distinction wrong and you either over-engineer a one-shot job or reach for a stateless call on work that needs to resume.&lt;/p&gt;

&lt;h2&gt;
  
  
  The conclusion first: two architectures, not two commands
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;codex exec&lt;/code&gt; one-shot subprocess&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;codex app-server&lt;/code&gt; persistent service&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Reference implementation&lt;/td&gt;
&lt;td&gt;baoyu &lt;code&gt;codex-imagegen&lt;/code&gt; backend&lt;/td&gt;
&lt;td&gt;OpenAI Codex Plugin for Claude Code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Process shape&lt;/td&gt;
&lt;td&gt;Spawned per task, exits when done&lt;/td&gt;
&lt;td&gt;Long-running, reused within a session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transport&lt;/td&gt;
&lt;td&gt;Launch args, stdin, JSONL event stream&lt;/td&gt;
&lt;td&gt;JSON-RPC requests and notifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State model&lt;/td&gt;
&lt;td&gt;Single run, no dependence on the last&lt;/td&gt;
&lt;td&gt;Thread holds multiple turns, can resume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Permission posture&lt;/td&gt;
&lt;td&gt;The example uses &lt;code&gt;danger-full-access&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Review is read-only; task can switch to &lt;code&gt;workspace-write&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical task&lt;/td&gt;
&lt;td&gt;Image gen, file generation, single deterministic op&lt;/td&gt;
&lt;td&gt;Code review, long delegated tasks, multi-turn work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Main risk&lt;/td&gt;
&lt;td&gt;Full-access child, cold start every time&lt;/td&gt;
&lt;td&gt;More protocol and lifecycle complexity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The one-line test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you need to run once and get a single verifiable artifact, reach for &lt;code&gt;codex exec&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If you need ongoing collaboration, retained context, and the ability to cancel or resume, reach for &lt;code&gt;codex app-server&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Version scope: keep the numbers honest
&lt;/h2&gt;

&lt;p&gt;The first thing this writeup exposed wasn't architecture. It was version accounting. I had carried over the original draft's phrasing about "the current local version," and only after checking the install records did I confirm that the marketplace source and the active plugin were not the same snapshot.&lt;/p&gt;

&lt;p&gt;Local commands and plugin records show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Codex CLI is &lt;code&gt;0.140.0&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The OpenAI Codex Plugin for Claude Code is &lt;code&gt;1.0.4&lt;/code&gt;, commit &lt;code&gt;807e03a&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The baoyu-skills marketplace source snapshot is &lt;code&gt;2.5.1&lt;/code&gt;, commit &lt;code&gt;441ca30&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;But Claude Code's installed-plugin record still points baoyu-skills at the earlier &lt;code&gt;1.111.1&lt;/code&gt; snapshot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the accurate way to state the &lt;code&gt;baoyu-codex-imagegen&lt;/code&gt; analysis below is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It's based on the baoyu-skills v2.5.1 source snapshot in the local marketplace, not a claim that the active plugin has been upgraded to v2.5.1.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is easy to miss. The marketplace source, the cached snapshot, and the active version can all be different commits. Read the directory name or the changelog alone and you'll write "the version I read" when you mean "the version actually running."&lt;/p&gt;

&lt;h2&gt;
  
  
  Path one: &lt;code&gt;codex exec&lt;/code&gt;, Codex as a one-shot operator
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What it solves
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;baoyu-codex-imagegen&lt;/code&gt; skill has a narrow job: let a non-Codex host like Claude Code call the &lt;code&gt;image_gen&lt;/code&gt; tool built into the Codex CLI, and save the result to a chosen path.&lt;/p&gt;

&lt;p&gt;Tasks like that share a shape:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clear input boundary, usually a prompt, an aspect ratio, and an output path.&lt;/li&gt;
&lt;li&gt;Clear result boundary, usually one file and one line of structured status.&lt;/li&gt;
&lt;li&gt;No need for multiple rounds, and no need to restore prior context.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So it skips a persistent service and spawns directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sandbox&lt;/span&gt; danger-full-access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--skip-git-repo-check&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  -
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a reference image exists, it appends one or more &lt;code&gt;--image&lt;/code&gt; arguments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why each flag is there
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;exec&lt;/code&gt; runs non-interactively for scripting. OpenAI's CLI docs position it as the execution path for automation and CI: run, return a result, done.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--json&lt;/code&gt; turns process output into line-delimited JSON events, or JSONL. The caller doesn't parse terminal display text; it reads structured events for the thread, tool calls, usage, and the final message.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--sandbox danger-full-access&lt;/code&gt; is here because this implementation needs Codex to copy the image from its default generation directory to an arbitrary target path the caller specifies, so it grants full file permissions.&lt;/p&gt;

&lt;p&gt;That is not a general best practice. OpenAI's docs recommend &lt;code&gt;workspace-write&lt;/code&gt; for automation and say to avoid unnecessary full access unless the runtime is already isolated.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--skip-git-repo-check&lt;/code&gt; lets Codex run outside a Git repo, since image jobs may launch from a temp or plugin directory rather than a trusted repository.&lt;/p&gt;

&lt;p&gt;The trailing &lt;code&gt;-&lt;/code&gt; tells Codex to read the instruction from stdin. The wrapper writes the task contract with &lt;code&gt;child.stdin.write(instruction)&lt;/code&gt; and then closes stdin.&lt;/p&gt;
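
&lt;p&gt;To make the flag list concrete, here is a minimal Python sketch of the same spawn. It is not the skill's wrapper, which writes through &lt;code&gt;child.stdin.write&lt;/code&gt; in Node; the function names, the &lt;code&gt;timeout_s&lt;/code&gt; parameter, and the choice to place &lt;code&gt;--image&lt;/code&gt; before the trailing &lt;code&gt;-&lt;/code&gt; are my assumptions.&lt;/p&gt;

```python
import subprocess


def build_codex_exec_args(sandbox: str, image_paths: list[str]) -> list[str]:
    """Assemble the argv for a one-shot `codex exec` run that reads its prompt from stdin."""
    args = ["codex", "exec", "--json", "--sandbox", sandbox, "--skip-git-repo-check"]
    for path in image_paths:
        # One --image flag per reference image (placement before "-" is an assumption).
        args += ["--image", path]
    args.append("-")  # trailing "-" means: read the instruction from stdin
    return args


def run_codex_exec(instruction: str, sandbox: str, image_paths: list[str], timeout_s: float) -> list[str]:
    """Spawn the child, hand it the instruction on stdin, and return the raw JSONL lines."""
    completed = subprocess.run(
        build_codex_exec_args(sandbox, image_paths),
        input=instruction,  # written to the child's stdin, which is then closed
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return [line for line in completed.stdout.splitlines() if line.strip()]
```

&lt;p&gt;Keeping the argv builder separate from the spawn means the flag set can be checked without starting Codex at all.&lt;/p&gt;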

&lt;h3&gt;
  
  
  The task contract is the real work
&lt;/h3&gt;

&lt;p&gt;This path doesn't pass the user prompt straight through. It wraps a strict instruction, roughly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TASK:
Generate an image and save it to the given path.

STEPS:
1. You must call the built-in image_gen.
2. Copy the result to the target path.
3. Check that the target file exists.
4. Return one line of JSON only.

HARD CONSTRAINTS:
- Do not call an external image API.
- Do not fake the image with a script.
- You must use image_gen to produce real pixels.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the "sub-agent as operator" design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fixed input structure.&lt;/li&gt;
&lt;li&gt;Fixed set of allowed tools.&lt;/li&gt;
&lt;li&gt;Fixed file side effects.&lt;/li&gt;
&lt;li&gt;Fixed output format.&lt;/li&gt;
&lt;li&gt;Explicit prohibitions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For an automated pipeline, the constraints matter more than the phrasing. The caller wants a verifiable result, not an open conversation.&lt;/p&gt;
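
&lt;p&gt;The "fixed output format" rule is only useful if the caller enforces it. A minimal sketch of that enforcement, assuming the final message arrives as a string; the function name and the &lt;code&gt;status&lt;/code&gt; and &lt;code&gt;path&lt;/code&gt; keys in the usage line are hypothetical, not the skill's actual schema.&lt;/p&gt;

```python
import json


def parse_final_reply(reply: str) -> dict:
    """Accept the agent's final message only if it is exactly one line holding a JSON object."""
    lines = [line for line in reply.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one line, got {len(lines)}")
    payload = json.loads(lines[0])  # JSONDecodeError is a ValueError subclass
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


# Hypothetical reply shape, for illustration only.
result = parse_final_reply('{"status": "ok", "path": "/tmp/out.png"}\n')
```

&lt;p&gt;Anything chattier than one JSON line is rejected instead of being interpreted.&lt;/p&gt;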

&lt;h3&gt;
  
  
  Don't trust self-reported success: four checks
&lt;/h3&gt;

&lt;p&gt;The engineering detail worth keeping is that this implementation does not call the job done just because Codex replied "success."&lt;/p&gt;

&lt;p&gt;It checks, in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Whether the JSONL events contain a thread ID.&lt;/li&gt;
&lt;li&gt;Whether an image actually appears under &lt;code&gt;$CODEX_HOME/generated_images/{threadId}/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If the directory check fails, whether the tool calls include a &lt;code&gt;cp&lt;/code&gt; or &lt;code&gt;mv&lt;/code&gt; from the generation directory to the target path.&lt;/li&gt;
&lt;li&gt;Whether the target file actually exists and has a byte count above zero.&lt;/li&gt;
&lt;/ol&gt;
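
&lt;p&gt;The checks above can be sketched roughly like this. It is an illustration, not the code in &lt;code&gt;spawn.ts&lt;/code&gt;: the &lt;code&gt;thread_id&lt;/code&gt; event field, the image suffix list, and the function names are assumptions, and the &lt;code&gt;cp&lt;/code&gt;/&lt;code&gt;mv&lt;/code&gt; fallback is left out to keep it short.&lt;/p&gt;

```python
import json
from pathlib import Path
from typing import Optional

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set of image extensions


def find_thread_id(jsonl_lines: list[str]) -> Optional[str]:
    """Return the first thread id seen in the event stream, if any."""
    for line in jsonl_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise in the stream
        if isinstance(event, dict) and event.get("thread_id"):
            return event["thread_id"]
    return None


def verify_image_job(jsonl_lines: list[str], codex_home: Path, target: Path) -> Optional[str]:
    """Return None on success, or a short reason string for the first failed check."""
    thread_id = find_thread_id(jsonl_lines)
    if thread_id is None:
        return "no thread id in events"
    generated = codex_home / "generated_images" / thread_id
    has_image = generated.is_dir() and any(
        p.suffix.lower() in IMAGE_SUFFIXES for p in generated.iterdir()
    )
    if not has_image:
        return "no image in generation directory"
    if not target.is_file() or target.stat().st_size == 0:
        return "target file missing or empty"
    return None
```

&lt;p&gt;Every check reads the filesystem or the event stream. None of them reads the agent's prose.&lt;/p&gt;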

&lt;p&gt;Failure becomes a structured error:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent_refused&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;no_image_gen_tool_use&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;timeout&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;codex_not_installed&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;spawn_failed&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
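
&lt;p&gt;One way to keep those codes from drifting is to pin them in an enum and map process-level exceptions onto them. The five code strings come from the list above; the exception mapping is my assumption about a Python caller, and the two agent-level codes would come from inspecting events rather than from exceptions.&lt;/p&gt;

```python
import subprocess
from enum import Enum


class CodexError(str, Enum):
    AGENT_REFUSED = "agent_refused"
    NO_IMAGE_GEN_TOOL_USE = "no_image_gen_tool_use"
    TIMEOUT = "timeout"
    CODEX_NOT_INSTALLED = "codex_not_installed"
    SPAWN_FAILED = "spawn_failed"


def classify_spawn_exception(exc: BaseException) -> CodexError:
    """Map a failure raised while launching or waiting on the child to a structured code."""
    if isinstance(exc, subprocess.TimeoutExpired):
        return CodexError.TIMEOUT
    if isinstance(exc, FileNotFoundError):
        return CodexError.CODEX_NOT_INSTALLED  # executable not found on PATH
    return CodexError.SPAWN_FAILED  # any other launch failure
```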

&lt;p&gt;The point isn't the image. It's a general principle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An agent's natural-language reply is a claim. Files, events, and repeatable checks are evidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Where it fits and where it doesn't
&lt;/h3&gt;

&lt;p&gt;Good fit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single image or file generation.&lt;/li&gt;
&lt;li&gt;A code transform with clear boundaries.&lt;/li&gt;
&lt;li&gt;One-off analysis that returns structured JSON.&lt;/li&gt;
&lt;li&gt;Automation that doesn't need inherited context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Limits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every run pays process and model cold-start cost.&lt;/li&gt;
&lt;li&gt;No cross-run state by default.&lt;/li&gt;
&lt;li&gt;With &lt;code&gt;danger-full-access&lt;/code&gt;, the trust boundary is very wide.&lt;/li&gt;
&lt;li&gt;Timeout, cancellation, and recovery usually fall to the wrapper to build.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Path two: &lt;code&gt;codex app-server&lt;/code&gt;, Codex as a stateful service
&lt;/h2&gt;

&lt;p&gt;The OpenAI Codex Plugin for Claude Code does not re-run &lt;code&gt;codex exec&lt;/code&gt; per command. It starts &lt;code&gt;codex app-server&lt;/code&gt; and manages an ongoing session over JSON-RPC.&lt;/p&gt;

&lt;p&gt;OpenAI's docs define the App Server's core abstraction in three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thread: a conversation that persists.&lt;/li&gt;
&lt;li&gt;Turn: one round of user input and agent execution inside a thread.&lt;/li&gt;
&lt;li&gt;Item: events inside a turn, such as messages, reasoning, commands, and file edits.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Direct connection and broker
&lt;/h3&gt;

&lt;p&gt;The plugin supports two connection modes.&lt;/p&gt;

&lt;p&gt;Direct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code
    |
    | stdin/stdout JSONL
    v
codex app-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The client starts &lt;code&gt;codex app-server&lt;/code&gt; itself and sends line-delimited JSON-RPC over stdio.&lt;/p&gt;

&lt;p&gt;Broker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code command
    |
    | Unix socket
    v
Broker
    |
    | reuse
    v
codex app-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The plugin stores the broker endpoint in &lt;code&gt;CODEX_COMPANION_APP_SERVER_ENDPOINT&lt;/code&gt; so review, rescue, and status commands in the same Claude Code session share one Codex runtime.&lt;/p&gt;

&lt;p&gt;If the broker returns the busy error &lt;code&gt;-32001&lt;/code&gt;, or the connection hits &lt;code&gt;ENOENT&lt;/code&gt; or &lt;code&gt;ECONNREFUSED&lt;/code&gt;, the plugin drops the broker and starts an App Server directly to retry.&lt;/p&gt;
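
&lt;p&gt;The fallback rule is small enough to write down as a predicate. This is a sketch of the decision only, with hypothetical names; the plugin's real logic lives in its own &lt;code&gt;.mjs&lt;/code&gt; files and may check more than this.&lt;/p&gt;

```python
import errno
from typing import Optional

BROKER_BUSY_CODE = -32001  # busy error code quoted in the text above
FALLBACK_ERRNOS = {errno.ENOENT, errno.ECONNREFUSED}


def should_fall_back_to_direct(
    rpc_error_code: Optional[int] = None,
    os_error: Optional[OSError] = None,
) -> bool:
    """True when the broker is busy or unreachable and a direct App Server should be started."""
    if rpc_error_code == BROKER_BUSY_CODE:
        return True
    return os_error is not None and os_error.errno in FALLBACK_ERRNOS
```

&lt;p&gt;Any other error is left to surface, so a genuine protocol bug does not get hidden behind a silent retry.&lt;/p&gt;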

&lt;p&gt;That's one more layer than a one-shot subprocess, and it buys:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime reuse within a session.&lt;/li&gt;
&lt;li&gt;Thread persistence.&lt;/li&gt;
&lt;li&gt;Background task management.&lt;/li&gt;
&lt;li&gt;Cancel and resume.&lt;/li&gt;
&lt;li&gt;Permission isolation between review and task.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Handshake: initialize first
&lt;/h3&gt;

&lt;p&gt;Once the App Server connection is up, the client sends &lt;code&gt;initialize&lt;/code&gt;, then an &lt;code&gt;initialized&lt;/code&gt; notification.&lt;/p&gt;

&lt;p&gt;The plugin passes this client identity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Codex Plugin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claude Code"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.0.4"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also uses &lt;code&gt;optOutNotificationMethods&lt;/code&gt; to unsubscribe from some token-level delta events, keeping the structured notifications that are worth more to the caller and cutting noise.&lt;/p&gt;
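
&lt;p&gt;A rough sketch of the two handshake messages as line-delimited JSON. The method names and the client identity values come from the text above. The &lt;code&gt;clientInfo&lt;/code&gt; and &lt;code&gt;capabilities&lt;/code&gt; field names, the presence of the &lt;code&gt;jsonrpc&lt;/code&gt; field on the wire, and the &lt;code&gt;example/delta&lt;/code&gt; method name are assumptions, so check them against the App Server docs before relying on them.&lt;/p&gt;

```python
import json


def encode_message(message: dict) -> str:
    """Serialize one message as a single line, as a line-delimited transport requires."""
    return json.dumps(message, separators=(",", ":")) + "\n"


def build_handshake(request_id: int, opt_out_methods: list[str]) -> list[str]:
    """Return the `initialize` request followed by the `initialized` notification."""
    initialize = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            # Field names here are assumptions; the values are the ones quoted above.
            "clientInfo": {"title": "Codex Plugin", "name": "Claude Code", "version": "1.0.4"},
            "capabilities": {"optOutNotificationMethods": opt_out_methods},
        },
    }
    # A notification carries no "id", so no response is expected for it.
    initialized = {"jsonrpc": "2.0", "method": "initialized"}
    return [encode_message(initialize), encode_message(initialized)]


handshake_lines = build_handshake(1, ["example/delta"])  # hypothetical method name
```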

&lt;h3&gt;
  
  
  Session model: threads and turns
&lt;/h3&gt;

&lt;p&gt;The key RPC methods the plugin uses:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;thread/start&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create a new thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;thread/name/set&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Name a thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;thread/resume&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Resume an existing thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;thread/list&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Query past threads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;turn/start&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start a turn in a thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;review/start&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start a code review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;turn/interrupt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interrupt a running turn&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So the App Server isn't a single-round wrapper that "sends a prompt and waits." It's a managed session runtime.&lt;/p&gt;
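
&lt;p&gt;What "managed session" means for a client, in a minimal sketch: requests get increasing ids, and incoming messages are sorted by whether they carry an &lt;code&gt;id&lt;/code&gt;, a &lt;code&gt;method&lt;/code&gt;, or both, which is plain JSON-RPC 2.0. The class and function names are hypothetical, and the params passed to each method are placeholders, not the App Server's real schema.&lt;/p&gt;

```python
import itertools

# Method names taken from the table above.
KNOWN_METHODS = {
    "thread/start", "thread/name/set", "thread/resume", "thread/list",
    "turn/start", "review/start", "turn/interrupt",
}


class RequestBuilder:
    """Allocates increasing request ids and refuses methods outside the table."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def request(self, method: str, params: dict) -> dict:
        if method not in KNOWN_METHODS:
            raise ValueError(f"unknown method: {method}")
        return {"jsonrpc": "2.0", "id": next(self._ids), "method": method, "params": params}


def classify_incoming(message: dict) -> str:
    """Sort an incoming message by the presence of `id` and `method`."""
    has_id, has_method = "id" in message, "method" in message
    if has_id and has_method:
        return "request"       # the peer is asking this side for something
    if has_method:
        return "notification"  # fire-and-forget event, no reply expected
    if has_id:
        return "response"      # answer to a request sent earlier
    return "invalid"
```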

&lt;h3&gt;
  
  
  Review and task have different permissions
&lt;/h3&gt;

&lt;p&gt;The plugin keeps the two actions separate.&lt;/p&gt;

&lt;p&gt;Review runs read-only, on a temporary thread, through &lt;code&gt;review/start&lt;/code&gt;. It returns findings and does not touch code.&lt;/p&gt;

&lt;p&gt;Task defaults to read-only. Pass &lt;code&gt;--write&lt;/code&gt; and it switches to &lt;code&gt;workspace-write&lt;/code&gt;. It can save the thread, and it can continue prior work with &lt;code&gt;--resume&lt;/code&gt; or &lt;code&gt;--resume-last&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is closer to what an engineering system's default should look like than "run everything with full access." Set the minimum permission by the nature of the task, then decide whether to widen write scope.&lt;/p&gt;
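
&lt;p&gt;The permission rule fits in one small function. A sketch with a hypothetical name; it encodes only what is described above: review is always read-only, and task is read-only unless write access is requested.&lt;/p&gt;

```python
def sandbox_for(action: str, write: bool = False) -> str:
    """Return the sandbox mode for an action, starting from the least privilege."""
    if action == "review":
        if write:
            raise ValueError("review never writes")
        return "read-only"
    if action == "task":
        return "workspace-write" if write else "read-only"
    raise ValueError(f"unknown action: {action}")
```

&lt;p&gt;Note what is absent: there is no branch that returns &lt;code&gt;danger-full-access&lt;/code&gt;.&lt;/p&gt;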

&lt;h3&gt;
  
  
  Hooks wire Codex into the Claude Code lifecycle
&lt;/h3&gt;

&lt;p&gt;The plugin registers three Claude Code hooks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SessionStart&lt;/code&gt;: prepare the shared runtime.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SessionEnd&lt;/code&gt;: clean up the broker and session resources.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Stop&lt;/code&gt;: an optional stop-gate review.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the review gate on, every time Claude Code is about to stop, it can have Codex check whether the last round has a blocking problem.&lt;/p&gt;

&lt;p&gt;The value isn't "one more model." It's putting a second model inside the delivery flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude makes a change
    |
    v
Codex reviews independently
    |
    +-- ALLOW: stop is permitted
    |
    +-- BLOCK: return findings, keep working
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It has a cost. The official plugin README warns that the review gate can create long Claude/Codex loops and burn through usage fast, so don't turn it on unconditionally.&lt;/p&gt;
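
&lt;p&gt;One way to keep the gate from looping is a round budget. This is my own sketch, not something the plugin is documented to do: &lt;code&gt;max_rounds&lt;/code&gt; and the function name are hypothetical, and only the ALLOW and BLOCK verdicts come from the diagram above.&lt;/p&gt;

```python
def stop_gate(verdict: str, rounds_used: int, max_rounds: int) -> str:
    """Decide what happens after a gate review: 'stop' or 'continue'."""
    if verdict == "ALLOW":
        return "stop"
    if verdict != "BLOCK":
        raise ValueError(f"unexpected verdict: {verdict}")
    # Hypothetical budget: once the cap is reached, stop and leave the findings to a human.
    if rounds_used >= max_rounds:
        return "stop"
    return "continue"
```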

&lt;h2&gt;
  
  
  How to choose
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When &lt;code&gt;codex exec&lt;/code&gt; fits
&lt;/h3&gt;

&lt;p&gt;Use a one-shot subprocess when most of these hold:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The task is a single round.&lt;/li&gt;
&lt;li&gt;The result can be verified by a file or JSON.&lt;/li&gt;
&lt;li&gt;You don't need to restore prior context.&lt;/li&gt;
&lt;li&gt;Cold-start cost is acceptable.&lt;/li&gt;
&lt;li&gt;The caller can handle timeout and retry on its own.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Examples: generate an image, convert input to a fixed format, run one analysis on a file, run a check once in CI.&lt;/p&gt;

&lt;h3&gt;
  
  
  When &lt;code&gt;codex app-server&lt;/code&gt; fits
&lt;/h3&gt;

&lt;p&gt;Use the persistent service when you need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Multiple rounds of conversation.&lt;/li&gt;
&lt;li&gt;Thread resumption.&lt;/li&gt;
&lt;li&gt;Background runs and status queries.&lt;/li&gt;
&lt;li&gt;Interruption of a running task.&lt;/li&gt;
&lt;li&gt;Separate review and write permissions.&lt;/li&gt;
&lt;li&gt;Integration with Claude Code's session lifecycle.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Examples: review a branch continuously, delegate a long investigation, let Codex change code and then add tests, or run an automatic second-model gate before stopping.&lt;/p&gt;
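
&lt;p&gt;The two checklists collapse into a simple rule: if any stateful requirement is present, the one-shot path is the wrong tool. A sketch of that rule, with hypothetical field names that mirror the six items above.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskNeeds:
    multi_turn: bool = False
    resume: bool = False
    background: bool = False
    interrupt: bool = False
    split_permissions: bool = False
    session_lifecycle: bool = False


def choose_path(needs: TaskNeeds) -> str:
    """Pick the persistent service as soon as any stateful requirement is set."""
    stateful = any(vars(needs).values())
    return "codex app-server" if stateful else "codex exec"
```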

&lt;h2&gt;
  
  
  How this was verified
&lt;/h2&gt;

&lt;p&gt;This published version doesn't lean on the draft's description. I redid a minimal verification.&lt;/p&gt;

&lt;p&gt;The steps I ran:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read the draft and listed every factual claim about versions, commands, RPC methods, and permissions.&lt;/li&gt;
&lt;li&gt;Ran &lt;code&gt;codex --version&lt;/code&gt;, &lt;code&gt;codex exec --help&lt;/code&gt;, and &lt;code&gt;codex app-server --help&lt;/code&gt; to confirm the current CLI's commands and flags.&lt;/li&gt;
&lt;li&gt;Checked the OpenAI plugin manifest, install records, &lt;code&gt;app-server.mjs&lt;/code&gt;, &lt;code&gt;codex.mjs&lt;/code&gt;, and the hook config.&lt;/li&gt;
&lt;li&gt;Checked &lt;code&gt;spawn.ts&lt;/code&gt;, &lt;code&gt;main.ts&lt;/code&gt;, the version file, and the Git commit in the baoyu marketplace source.&lt;/li&gt;
&lt;li&gt;Cross-checked against the OpenAI Codex CLI, App Server, Codex Plugin, and Claude Code Hooks docs.&lt;/li&gt;
&lt;li&gt;Recorded "current active version" and "source snapshot I actually read" separately.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The mistake and the lesson
&lt;/h3&gt;

&lt;p&gt;I first took the draft's baoyu-skills v2.5.1 as "the current local version." On further checking, the v2.5.1 marketplace source does exist locally, but Claude Code's installed-plugin record still points at an earlier snapshot.&lt;/p&gt;

&lt;p&gt;Without checking the install record, that phrasing looks reasonable and is wrong.&lt;/p&gt;

&lt;p&gt;The lesson:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you analyze a local plugin, record at least the marketplace HEAD, the install cache path, the plugin manifest, and the commit. No single one of those stands in for "the version actually running."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Practical advice
&lt;/h2&gt;

&lt;h3&gt;
  
  
  One-shot tasks: hardcode the output contract
&lt;/h3&gt;

&lt;p&gt;Don't write "generate an image for me" or "check my code." An automation prompt should include at least:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Goal
Allowed tools
Input and output paths
Prohibitions
Verification steps
Final return format
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That cuts the uncertainty of an agent improvising, and it lets the caller judge success or failure.&lt;/p&gt;
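&lt;p&gt;One way to keep those six parts from being forgotten is to build the prompt from a small structure instead of free text. The sketch below is illustrative only: &lt;code&gt;TaskContract&lt;/code&gt; and its field names are hypothetical, and the rendered layout is an assumption, not a format any tool requires.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class TaskContract:
    goal: str
    allowed_tools: list
    input_paths: list
    output_paths: list
    prohibitions: list
    verification_steps: list
    return_format: str

    def render(self):
        """Render the contract as a plain-text prompt, one section per field."""
        def bullets(items):
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("Goal", self.goal),
            ("Allowed tools", bullets(self.allowed_tools)),
            ("Input paths", bullets(self.input_paths)),
            ("Output paths", bullets(self.output_paths)),
            ("Prohibitions", bullets(self.prohibitions)),
            ("Verification steps", bullets(self.verification_steps)),
            ("Final return format", self.return_format),
        ]
        # Refuse to send a prompt with an empty section.
        missing = [name for name, body in sections if not body.strip()]
        if missing:
            raise ValueError(f"contract is missing: {', '.join(missing)}")
        return "\n\n".join(f"{name}:\n{body}" for name, body in sections)
```

&lt;p&gt;Because &lt;code&gt;render()&lt;/code&gt; raises on an empty section, an incomplete contract fails in the caller before any agent runs.&lt;/p&gt;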

&lt;h3&gt;
  
  
  Long tasks: resume with the delta only
&lt;/h3&gt;

&lt;p&gt;When you resume a thread, send only what changed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Continue the last task. Apply the first fix and add the matching test.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's no reason to re-paste the whole background. Repeating context adds noise and can make the model misread the task boundary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Review tasks: bind every finding to evidence
&lt;/h3&gt;

&lt;p&gt;Whether you run a standard review or an adversarial one, require each finding to carry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The file or diff actually examined.&lt;/li&gt;
&lt;li&gt;A reproducible failure path.&lt;/li&gt;
&lt;li&gt;A clear risk level.&lt;/li&gt;
&lt;li&gt;A split between fact, inference, and open question.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A "might be a problem" with no evidence rarely makes it into an engineering decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Permissions: start at the smallest scope
&lt;/h3&gt;

&lt;p&gt;Prefer the narrowest mode that works, in this order:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;read-only
    |
    v
workspace-write
    |
    v
danger-full-access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Widen only when the task genuinely needs a larger file scope and the runtime is trusted.&lt;/p&gt;
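&lt;p&gt;The same rule as a small decision function. The three mode names come from the ladder above; &lt;code&gt;pick_sandbox&lt;/code&gt; and its boolean inputs are hypothetical, and how the chosen mode is passed to Codex is left out on purpose.&lt;/p&gt;

```python
MODES = ("read-only", "workspace-write", "danger-full-access")


def pick_sandbox(needs_write, needs_outside_workspace, runtime_trusted):
    """Return the narrowest mode that covers the task.

    Full access is refused on an untrusted runtime instead of being
    silently granted.
    """
    level = 0
    if needs_write:
        level = 1
    if needs_outside_workspace:
        level = 2
    if level == 2 and not runtime_trusted:
        raise PermissionError("full access requested on an untrusted runtime")
    return MODES[level]
```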

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;"Claude Code calls Codex" is not one calling convention.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;codex exec&lt;/code&gt; is a one-shot, stateless subprocess that's easy to wrap. It fits single tasks with clear boundaries and verifiable results.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;codex app-server&lt;/code&gt; is a stateful, resumable, manageable agent service. It fits code review, task delegation, and complex work that needs ongoing collaboration.&lt;/p&gt;

&lt;p&gt;The real selection criteria aren't "which is more advanced." They are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does the task need state?&lt;/li&gt;
&lt;li&gt;Can the result be verified in one shot?&lt;/li&gt;
&lt;li&gt;Do you need interruption, resume, and background management?&lt;/li&gt;
&lt;li&gt;Do different actions, such as review and write, need different permission levels?&lt;/li&gt;
&lt;li&gt;Is the extra protocol complexity worth it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Simple tasks get a simple process. Ongoing collaboration gets a stateful service. Draw that line clearly and the system gets easier to understand and to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/cli/reference" rel="noopener noreferrer"&gt;OpenAI Codex CLI Command Line Options&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/app-server" rel="noopener noreferrer"&gt;OpenAI Codex App Server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/openai/codex-plugin-cc/tree/807e03ac9d5aa23bc395fdec8c3767500a86b3cf" rel="noopener noreferrer"&gt;OpenAI Codex Plugin for Claude Code v1.0.4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;Claude Code Hooks Reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/packages/baoyu-codex-imagegen/src/spawn.ts" rel="noopener noreferrer"&gt;baoyu-codex-imagegen spawn.ts&lt;/a&gt; and &lt;a href="https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/packages/baoyu-codex-imagegen/src/main.ts" rel="noopener noreferrer"&gt;main.ts&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>codex</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>Your blog is invisible to AI. Here's the 1999 fix.</title>
      <dc:creator>mufeng</dc:creator>
      <pubDate>Mon, 15 Jun 2026 03:49:58 +0000</pubDate>
      <link>https://dev.to/changyou/your-blog-is-invisible-to-ai-heres-the-1999-fix-4d8i</link>
      <guid>https://dev.to/changyou/your-blog-is-invisible-to-ai-heres-the-1999-fix-4d8i</guid>
      <description>&lt;p&gt;&lt;em&gt;A quick story about a dead protocol, a confused chatbot, and the ten minutes that gave my blog a new kind of reader.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Hey friends,&lt;/p&gt;

&lt;p&gt;A small thing happened the other day that I haven't been able to stop thinking about.&lt;/p&gt;

&lt;p&gt;I dropped a link to my blog into Claude and asked it to read a few of my recent posts. It came back and told me it couldn't fetch them. The page returned an empty shell: &lt;code&gt;undefined | loading&lt;/code&gt;. My blog runs on NotionNext, the content renders client-side with JavaScript, and most AI fetchers don't execute JS. All it got was the skeleton that exists &lt;em&gt;before&lt;/em&gt; the page comes to life.&lt;/p&gt;

&lt;p&gt;I stared at that spinner for a few seconds, and something clicked: &lt;strong&gt;in the AI era, a site built only for human eyes is worth only half of what it could be.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The other half belongs to machine readers. And the door to those readers was already built back in 1999. It's called RSS.&lt;/p&gt;

&lt;p&gt;If you've been around the internet long enough, you just felt a little nostalgia twinge. Stay with me — this turned out to be one of the highest-leverage things I've done for my writing in years.&lt;/p&gt;

&lt;h2&gt;
  
  
  What RSS actually is
&lt;/h2&gt;

&lt;p&gt;One sentence: RSS is a &lt;strong&gt;read-only API&lt;/strong&gt; your blog exposes to the world.&lt;/p&gt;

&lt;p&gt;It's a static XML file listing your most recent posts in reverse-chronological order — title, link, publish date, and either a summary or the full text. Any program can grab it with a single HTTP request. No JavaScript, no login, no API key.&lt;/p&gt;

&lt;p&gt;If you're technical, picture a public &lt;code&gt;GET /articles?limit=20&lt;/code&gt; endpoint whose response format has barely changed in over two decades. RSS dates from 1999, and today's readers still parse feeds written to that format. In web terms, that's a living fossil.&lt;/p&gt;
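&lt;p&gt;To make the "read-only API" idea concrete, here is a rough sketch that reads a feed with nothing but the Python standard library. It assumes an RSS 2.0 document with the usual &lt;code&gt;item&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;link&lt;/code&gt;, and &lt;code&gt;pubDate&lt;/code&gt; elements; the function names are mine, and Atom feeds use different element names that this does not handle.&lt;/p&gt;

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_feed(xml_text):
    """Return one dict per item in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return posts


def fetch_feed(url):
    """One plain HTTP GET: no JavaScript, no login, no API key."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_feed(response.read())
```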

&lt;p&gt;It solves exactly one problem: &lt;strong&gt;readers no longer have to keep reopening your site to check for updates.&lt;/strong&gt; Someone adds your feed to their reader, the reader polls it on a schedule, and new posts show up in their list. The subscription lives entirely in &lt;em&gt;their&lt;/em&gt; hands: no algorithm, no rate limit, no platform taking a cut.&lt;/p&gt;

&lt;p&gt;(Sound familiar? It's basically what you're doing by reading this email. A newsletter is RSS with a friendlier face.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Why we forgot about it
&lt;/h2&gt;

&lt;p&gt;The platforms won.&lt;/p&gt;

&lt;p&gt;When Google Reader shut down in 2013, control over information flow shifted from &lt;em&gt;subscription&lt;/em&gt; to &lt;em&gt;recommendation&lt;/em&gt;. Twitter/X, TikTok, Instagram — algorithms decide what you see and feed your attention on a drip. Subscription is too "dumb" for that business model: it won't guess what you like, won't manufacture anxiety, won't keep you scrolling.&lt;/p&gt;

&lt;p&gt;So RSS retreated to the corner, kept alive by a small group: programmers, content creators, deep readers.&lt;/p&gt;

&lt;p&gt;But here's the twist — &lt;strong&gt;that small group is exactly the audience an independent writer most wants.&lt;/strong&gt; People still using an RSS reader actively curate their own sources. They don't scroll a feed; they choose their springs. Get into their list and you've earned a long-term seat at the table: they read &lt;em&gt;everything&lt;/em&gt; you publish, not the one piece an algorithm happened to surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI is bringing it back
&lt;/h2&gt;

&lt;p&gt;Two shifts changed my mind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One: machines are your blog's new readers.&lt;/strong&gt; People now ask ChatGPT and Claude to summarize your work, point assistants at your site to track updates, and let agents pull your content into research. Most of those crawlers don't run JavaScript, so a client-rendered blog is a blank page to them. An RSS feed is plain XML served as-is; any client can parse it with a standard XML library. When I sent Claude my &lt;em&gt;RSS&lt;/em&gt; link instead, it read every recent post right away. Same content: the HTML page is a welded-shut door, the feed is an open window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two: AI fixes RSS's old fatal flaw.&lt;/strong&gt; Subscription used to die under its own weight — a hundred feeds, hundreds of daily updates, no human can keep up. An LLM dissolves that. More people now let AI sweep every source once a day and produce a linked digest, surfacing only the few pieces worth reading closely. &lt;em&gt;You&lt;/em&gt; pick the sources, AI does the skimming, you keep the deep reading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In the algorithm era, a platform uses AI to feed you. In the RSS + LLM era, you use AI to feed yourself.&lt;/strong&gt; The controls have flipped.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do it in ten minutes
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Confirm you have a feed.&lt;/strong&gt; Most frameworks ship one for free. Try &lt;code&gt;yourdomain.com/rss/feed.xml&lt;/code&gt; (NotionNext), &lt;code&gt;/atom.xml&lt;/code&gt; (Hexo with its feed plugin), &lt;code&gt;/index.xml&lt;/code&gt; (Hugo), &lt;code&gt;yourdomain.com/feed&lt;/code&gt; (WordPress), or &lt;code&gt;yourdomain.com/rss&lt;/code&gt; (Ghost). See XML? It works.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it visible.&lt;/strong&gt; Put an RSS link (with the orange icon) in your footer or About page, and confirm your HTML &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; has the auto-discovery line:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;   &lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"alternate"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"application/rss+xml"&lt;/span&gt; &lt;span class="na"&gt;title=&lt;/span&gt;&lt;span class="s"&gt;"RSS"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://yourdomain.com/rss/feed.xml"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
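&lt;p&gt;To check that line from a script instead of by eye, something like the following should work. It is a sketch using Python's built-in HTML parser; &lt;code&gt;FeedLinkFinder&lt;/code&gt; and &lt;code&gt;discover_feeds&lt;/code&gt; are names I made up, and the check is limited to the RSS and Atom media types.&lt;/p&gt;

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect href values from feed auto-discovery link tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        media_type = (attr.get("type") or "").lower()
        if "alternate" in rel and media_type in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def discover_feeds(html_text):
    """Return every feed URL advertised in the given HTML."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds
```

&lt;p&gt;An empty list for your home page means feed readers and agents have no standard way to find the feed from the page itself.&lt;/p&gt;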



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Use it yourself.&lt;/strong&gt; Install &lt;a href="https://feedly.com" rel="noopener noreferrer"&gt;Feedly&lt;/a&gt;, &lt;a href="https://reederapp.com" rel="noopener noreferrer"&gt;Reeder&lt;/a&gt;, or &lt;a href="https://folo.is" rel="noopener noreferrer"&gt;Folo&lt;/a&gt;. Subscribe to five writers you admire plus your own blog. Live with it a week and feel the difference between &lt;em&gt;information finding you&lt;/em&gt; and &lt;em&gt;you chasing it&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Want to go further? Use n8n or GitHub Actions to pull your feeds on a schedule, send the updates to an LLM API for a daily digest, and push it to your inbox or Telegram. An evening's work — probably the highest-ROI personal infrastructure you'll ever build.&lt;/p&gt;
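&lt;p&gt;The core of that pipeline is small. Below is a sketch of the two middle steps, assuming each post is a dict with &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;link&lt;/code&gt;, and an RSS-style &lt;code&gt;published&lt;/code&gt; date string. Those keys, the function names, and the prompt wording are all my own placeholders; the scheduler, the LLM call, and the delivery step are left to whichever tools you pick.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def recent_posts(posts, now, window=timedelta(days=1)):
    """Keep posts whose RSS pubDate falls inside the last `window`."""
    cutoff = now - window
    fresh = []
    for post in posts:
        try:
            published = parsedate_to_datetime(post["published"])
        except (TypeError, ValueError):
            continue  # skip items with a missing or malformed date
        if published.tzinfo is None:
            # Treat dates without an offset as UTC so they can be compared.
            published = published.replace(tzinfo=timezone.utc)
        if published >= cutoff:
            fresh.append(post)
    return fresh


def digest_prompt(posts):
    """Build the text you would send to an LLM API of your choice."""
    lines = [f"- {p['title']} ({p['link']})" for p in posts]
    return "Summarize today's updates and keep the links:\n" + "\n".join(lines)


if __name__ == "__main__":
    print(digest_prompt(recent_posts([], datetime.now(timezone.utc))))
```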

&lt;h2&gt;
  
  
  The honest limits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It won't reach a mass audience.&lt;/strong&gt; Most people don't know what RSS is. Bulk traffic still comes from social and search. RSS serves the small high-value slice — and the machines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Almost no engagement data.&lt;/strong&gt; No open rates, no idea who's reading. For dashboard people, it feels like writing in the dark.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full text vs. summary is a real tradeoff.&lt;/strong&gt; Full text is kind to readers but invites scrapers; summaries drive clicks but degrade the experience. My take: ship full text. An independent writer's enemy was never being reposted — it's not being read at all.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  One last thing
&lt;/h2&gt;

&lt;p&gt;After years of building software, I keep coming back to one conviction: &lt;strong&gt;the good protocols outlive the platforms.&lt;/strong&gt; Email is older than every social app and won't die. HTTP has watched products rise, throw their banquet, and collapse. RSS has been pronounced dead more times than anyone can count — and in the AI era, it found its second spring.&lt;/p&gt;

&lt;p&gt;Platforms change. Algorithms change. Whichever channel is hot this quarter will change. But the need for &lt;em&gt;an open, machine-readable outlet anyone can subscribe to&lt;/em&gt; does not.&lt;/p&gt;

&lt;p&gt;Spend ten minutes today: find your feed, surface it, subscribe to it. Then hand the link to your AI assistant and watch it read back every post in the feed.&lt;/p&gt;

&lt;p&gt;That's the moment you realize your blog just gained a whole new audience that's always online.&lt;/p&gt;

&lt;p&gt;Until next time,&lt;br&gt;
Joey&lt;/p&gt;





</description>
      <category>ai</category>
      <category>rss</category>
    </item>
    <item>
      <title>Your AI Agent Is Underperforming Because of Your Harness, Not the Model</title>
      <dc:creator>mufeng</dc:creator>
      <pubDate>Thu, 11 Jun 2026 05:08:36 +0000</pubDate>
      <link>https://dev.to/changyou/your-ai-agent-is-underperforming-because-of-your-harness-not-the-model-1cf7</link>
      <guid>https://dev.to/changyou/your-ai-agent-is-underperforming-because-of-your-harness-not-the-model-1cf7</guid>
      <description>&lt;p&gt;The pattern is familiar: your AI agent produces garbage output, so you switch to a better model. Things improve for a few days, then the same problems resurface. You upgrade again.&lt;/p&gt;

&lt;p&gt;Here's what you're probably missing: &lt;strong&gt;the model is just one input. The rest is harness — and that's almost always where the real problem lives.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Harness?
&lt;/h2&gt;

&lt;p&gt;The cleanest definition comes from engineer Vtrivedy, who coined the term:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Agent = Model + Harness. If you're not the model, you're the harness.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A harness encompasses everything except the model itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System prompts, CLAUDE.md / AGENTS.md files, Skill definitions&lt;/li&gt;
&lt;li&gt;Tool descriptions, MCP servers, and their technical specifications&lt;/li&gt;
&lt;li&gt;Execution environment: filesystem, sandboxes, headless browsers&lt;/li&gt;
&lt;li&gt;Subagent orchestration: spawning logic, task handoffs, routing&lt;/li&gt;
&lt;li&gt;Hooks: deterministic enforcement layers (linting, formatting, permission checks)&lt;/li&gt;
&lt;li&gt;Observability: cost monitoring, latency tracking, logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This entire surface area is yours to design, not the model provider's.&lt;/p&gt;

&lt;p&gt;Claude Code, Cursor, Codex, and Cline might all run on the same underlying model, but the behavior you experience is dominated by the harness each one provides. Same model, different harness, very different results.&lt;/p&gt;

&lt;p&gt;This leads to a counterintuitive but well-supported finding:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A decent model with a great harness consistently outperforms a great model with a bad harness.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Engineers Default to Model-Blaming
&lt;/h2&gt;

&lt;p&gt;When an agent does something nonsensical, blaming the model is the path of least resistance. It's the most visible component, and failures often &lt;em&gt;look&lt;/em&gt; like reasoning problems.&lt;/p&gt;

&lt;p&gt;But most failures are legible if you look closely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent ignored a coding convention → Add it to AGENTS.md&lt;/li&gt;
&lt;li&gt;Agent ran a destructive command → Write a Hook to block it&lt;/li&gt;
&lt;li&gt;Agent got lost in a 40-step task → Split into Planner and Executor subagents&lt;/li&gt;
&lt;li&gt;Agent consistently ships broken types → Wire a type-checker signal into the loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As HumanLayer frames it: &lt;em&gt;"It's not a model problem. It's a configuration problem."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Benchmarks point the same way: the same model can score far lower inside an off-the-shelf framework than inside a custom, carefully tuned harness. The model's capabilities didn't change; the harness is what unlocks them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Ratchet: Every Failure Becomes a Rule
&lt;/h2&gt;

&lt;p&gt;The most important habit in harness engineering is treating agent failures as permanent signals, not one-off flukes to retry and forget.&lt;/p&gt;

&lt;p&gt;Think of a mechanical ratchet: it only moves forward, never backward.&lt;/p&gt;

&lt;p&gt;When an agent makes a mistake, you don't retry and hope for better luck. You engineer a permanent fix so the same exact failure cannot happen again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; An agent submits a PR with commented-out tests. It gets merged into main.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Wrong response:&lt;/em&gt; Fix it manually. Move on.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Harness response:&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add to AGENTS.md: "Never comment out tests. Delete or fix them."&lt;/li&gt;
&lt;li&gt;Add a pre-commit Hook that automatically flags commented-out tests and &lt;code&gt;.skip(&lt;/code&gt; calls in any diff.&lt;/li&gt;
&lt;li&gt;Update the Reviewer subagent's instructions: commented-out tests are a blocking issue.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three layers. The same failure now has to slip past all three.&lt;/p&gt;
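&lt;p&gt;For the hook layer, a rough sketch of the &lt;code&gt;.skip(&lt;/code&gt; check in Python. It assumes the hook script is wired into Git's pre-commit stage and that skipped tests show up as &lt;code&gt;.skip(&lt;/code&gt; in added lines; &lt;code&gt;skipped_test_lines&lt;/code&gt; and &lt;code&gt;main&lt;/code&gt; are hypothetical names, and detecting commented-out tests would need a separate, language-specific check.&lt;/p&gt;

```python
import subprocess
import sys


def skipped_test_lines(diff_text):
    """Return added lines in a unified diff that contain `.skip(`."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" is the file header, not content.
        is_added = line.startswith("+") and not line.startswith("+++")
        if is_added and ".skip(" in line:
            hits.append(line[1:].strip())
    return hits


def main():
    # `git diff --cached` prints the staged changes as a unified diff.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = skipped_test_lines(diff)
    for hit in hits:
        print(f"skipped test staged: {hit}", file=sys.stderr)
    return 1 if hits else 0
```

&lt;p&gt;Calling &lt;code&gt;sys.exit(main())&lt;/code&gt; from the hook script makes a non-zero return block the commit.&lt;/p&gt;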

&lt;p&gt;Constraints should be added when you observe a real failure, and removed when a more capable model makes them redundant. &lt;strong&gt;Every line in a good system prompt should trace back to a specific, historical failure.&lt;/strong&gt; A harness that grows without bound is just as broken as one that never grows.&lt;/p&gt;




&lt;h2&gt;
  
  
  CLAUDE.md Is a Failure Log, Not Documentation
&lt;/h2&gt;

&lt;p&gt;This is the mistake I see most often. Engineers treat CLAUDE.md like a README written for an AI: project overview, tech stack, coding conventions. Useful — but incomplete.&lt;/p&gt;

&lt;p&gt;Mature harnesses treat CLAUDE.md differently: &lt;strong&gt;every rule should trace back to a specific, real incident.&lt;/strong&gt; If you can't remember the failure that generated a rule, it's probably noise that dilutes the signal of the rules that actually matter.&lt;/p&gt;

&lt;p&gt;Examples of rules with provenance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"Never use &lt;code&gt;any&lt;/code&gt; type without explicit authorization"&lt;/em&gt; → From a production bug after TypeScript checks were bypassed.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Run the full test suite before committing, even for one-line changes"&lt;/em&gt; → From a regression where a small fix touched adjacent logic without running tests.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Back up configuration files before modifying"&lt;/em&gt; → From an agent that overwrote a production config.&lt;/li&gt;
&lt;/ul&gt;
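&lt;p&gt;One way to hold yourself to that standard is to keep each rule paired with its incident and generate the rules section from the pairs. A small sketch, assuming you are happy to maintain the rules as data; &lt;code&gt;Rule&lt;/code&gt;, &lt;code&gt;render_rules&lt;/code&gt;, and the bullet layout are my own invention, not a format CLAUDE.md requires.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Rule:
    text: str        # the constraint the agent must follow
    incident: str    # the real failure that produced it


def render_rules(rules):
    """Render rules as Markdown bullets, refusing any rule without provenance."""
    lines = []
    for rule in rules:
        if not rule.incident.strip():
            raise ValueError(f"rule has no incident behind it: {rule.text!r}")
        lines.append(f"- {rule.text} (from: {rule.incident})")
    return "\n".join(lines)
```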

&lt;p&gt;Rules derived from real incidents carry weight in the agent's reasoning. Rules written speculatively get treated as suggestions — not because the model is bad, but because they lack the contextual authority that real constraints carry.&lt;/p&gt;




&lt;h2&gt;
  
  
  Context Engineering: The Harness Layer People Miss
&lt;/h2&gt;

&lt;p&gt;There's a component of harness design that gets less attention than it deserves: context management.&lt;/p&gt;

&lt;p&gt;Antonio Gullí, Engineering Director at Google, defines &lt;strong&gt;Context Engineering&lt;/strong&gt; in &lt;em&gt;Agentic Design Patterns&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Not information dumping. Carefully selecting, trimming, and packaging context. To get AI to peak accuracy, you must give it short, focused, powerful context.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This distinguishes Context Engineering from the more common Prompt Engineering. Prompt Engineering asks: &lt;em&gt;How should I phrase this request?&lt;/em&gt; Context Engineering asks: &lt;em&gt;What should already be in front of the agent before it even sees the request?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The discipline applies to every part of the harness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool descriptions:&lt;/strong&gt; Concise and precise, not comprehensive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill files:&lt;/strong&gt; Exact schemas and templates the agent needs, not everything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System prompts:&lt;/strong&gt; Specific constraints from real failures, not generic guidelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An agent drowning in context doesn't perform better — it performs worse. Every line in your CLAUDE.md or system prompt is doing Context Engineering. Noise in equals noise in the agent's reasoning.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two-Tier Configuration: Team Brain + Personal Brain
&lt;/h2&gt;

&lt;p&gt;Claude Code's configuration architecture is worth understanding as a design pattern applicable to any agent harness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project &lt;code&gt;.claude/&lt;/code&gt;&lt;/strong&gt; — lives in the repo, committed to Git&lt;br&gt;
Team-shared rules, hooks, security policies, workflow definitions. Every engineer who clones the repo inherits the full agent behavior constraints automatically. This is an engineering asset, maintained alongside code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Global &lt;code&gt;~/.claude/&lt;/code&gt;&lt;/strong&gt; — personal directory, stays out of Git&lt;br&gt;
Personal coding style preferences, cross-project shortcuts, individual tool configurations.&lt;/p&gt;

&lt;p&gt;The separation enforces the right ownership boundaries: team standards are reliable and shared, personal preferences are free and local. New team members inherit your agent setup the moment they clone the repository.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Changes When You See It This Way
&lt;/h2&gt;

&lt;p&gt;Once you internalize Agent = Model + Harness, the questions you ask about AI tools shift.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which model has better code generation?&lt;/li&gt;
&lt;li&gt;What's the context window size?&lt;/li&gt;
&lt;li&gt;What's the price per token?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How mature is this harness?&lt;/li&gt;
&lt;li&gt;What does the failure recovery path look like?&lt;/li&gt;
&lt;li&gt;How are harness rules maintained over time?&lt;/li&gt;
&lt;li&gt;What's the observability story?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model is table stakes at this point. The harness is the differentiator.&lt;/p&gt;

&lt;p&gt;Anthropic's engineering team published this framing directly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The gap between what today's models can theoretically do and what you actually see them doing is largely a harness gap.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The ceiling isn't the model. The floor you're operating at is almost entirely determined by your harness.&lt;/p&gt;




&lt;h2&gt;
  
  
  Start Here
&lt;/h2&gt;

&lt;p&gt;Open your CLAUDE.md, or create one if it doesn't exist.&lt;/p&gt;

&lt;p&gt;Think about the last thing your agent got wrong. Not a model failure — a behavioral failure. Something it did that violated an expectation.&lt;/p&gt;

&lt;p&gt;Write one rule. Note where the failure came from. One sentence is enough.&lt;/p&gt;

&lt;p&gt;That's the first notch on the ratchet. Over months, this file becomes a compressed history of your collaboration — every line representing a mistake that was never repeated.&lt;/p&gt;

&lt;p&gt;The harness isn't designed. It's earned.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I write about practical AI engineering, agent design, and building production systems with Claude. Follow for more.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>claude</category>
    </item>
    <item>
      <title>How to Make AI Coding Agents Actually Follow Engineering Process</title>
      <dc:creator>mufeng</dc:creator>
      <pubDate>Sun, 07 Jun 2026 15:53:26 +0000</pubDate>
      <link>https://dev.to/changyou/how-to-make-ai-coding-agents-actually-follow-engineering-process-5b1b</link>
      <guid>https://dev.to/changyou/how-to-make-ai-coding-agents-actually-follow-engineering-process-5b1b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbs1mpgipxfnnim9rvrcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbs1mpgipxfnnim9rvrcz.png" alt="Engineering Process"&gt;&lt;/a&gt;&lt;br&gt;
The problem isn't that AI coding agents write bad code.&lt;/p&gt;

&lt;p&gt;The problem is that they skip steps.&lt;/p&gt;

&lt;p&gt;Ask an agent to fix a bug—it reads a few files, guesses a cause, patches the code. Ask it to add a feature—it starts writing before anyone's agreed on what the feature actually does. Ask it to refactor—it touches unrelated files, reformats half the codebase, and hands you a diff too large to review.&lt;/p&gt;

&lt;p&gt;None of this is stupidity. It's the absence of process discipline.&lt;/p&gt;

&lt;p&gt;Software development has always required workflow constraints: clarify before implementing, plan before coding, test before shipping, debug root causes not symptoms, verify before declaring done. The question is whether your AI agent follows them—or bypasses them entirely.&lt;/p&gt;

&lt;p&gt;Superpowers is a plugin framework for Claude Code and Codex that encodes those constraints as loadable, composable agent workflows. This is what it is, when to use it, and how to get started.&lt;/p&gt;


&lt;h2&gt;
  
  
  What "Skills" Actually Are
&lt;/h2&gt;

&lt;p&gt;The word "skill" is overloaded in AI contexts. Here it means something specific: a workflow protocol that loads into an agent session and constrains &lt;em&gt;how&lt;/em&gt; the agent approaches a category of task.&lt;/p&gt;

&lt;p&gt;Not "be more careful." Not a style guide. A specific sequence of steps with defined inputs, outputs, and verification gates.&lt;/p&gt;

&lt;p&gt;The analogy is a checklist for a surgeon or a pilot—not because either lacks expertise, but because cognitive discipline under pressure requires procedural anchors.&lt;/p&gt;

&lt;p&gt;The core Superpowers Skills cover the major failure modes in AI-assisted development:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Failure Mode It Prevents&lt;/th&gt;
&lt;th&gt;What It Produces&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;brainstorming&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Implementing the wrong thing&lt;/td&gt;
&lt;td&gt;Clarified scope with edge cases surfaced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;writing-plans&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Drifting mid-implementation&lt;/td&gt;
&lt;td&gt;Executable task list: file scope + verification per step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test-driven-development&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"Works on my machine" guesswork&lt;/td&gt;
&lt;td&gt;RED-GREEN-REFACTOR cycles that lock behavior first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;systematic-debugging&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Shotgun-patching symptoms&lt;/td&gt;
&lt;td&gt;Root cause hypotheses, evidence-based elimination, minimal fix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;verification-before-completion&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"Should be done" claims&lt;/td&gt;
&lt;td&gt;Actual test runs, browser paths, or device checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;requesting-code-review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Merging unreviewed code&lt;/td&gt;
&lt;td&gt;Severity-ranked risk list before merge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;using-git-worktrees&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Task bleed across workstreams&lt;/td&gt;
&lt;td&gt;Isolated workspaces with clean baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These aren't independent tips—they chain into a complete development pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Vague requirement
  → brainstorming  (scope + edge cases)
  → writing-plans  (executable task list)
  → test-driven-development  (behavior locked by tests)
  → requesting-code-review  (risks surfaced)
  → verification-before-completion  (actually verified)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
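&lt;p&gt;The ordering and the gates can be pictured as a loop that refuses to advance until the current stage is verified. This is only an illustration of the idea, not how Superpowers is implemented; the stage names come from the chain above, while &lt;code&gt;run_pipeline&lt;/code&gt; and the shape of &lt;code&gt;gates&lt;/code&gt; are assumptions.&lt;/p&gt;

```python
STAGES = [
    "brainstorming",
    "writing-plans",
    "test-driven-development",
    "requesting-code-review",
    "verification-before-completion",
]


def run_pipeline(gates):
    """Run stages in order and stop at the first gate that does not pass.

    `gates` maps a stage name to a zero-argument callable returning True
    when that stage's output has been verified.
    """
    completed = []
    for stage in STAGES:
        gate = gates.get(stage)
        if gate is None or not gate():
            return completed, stage  # the stage that blocked the pipeline
        completed.append(stage)
    return completed, None
```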






&lt;h2&gt;
  
  
  The Key Insight: Process Errors vs. Code Errors
&lt;/h2&gt;

&lt;p&gt;AI agents will get better at writing correct code over time. They won't automatically get better at following process—unless process is encoded somewhere.&lt;/p&gt;

&lt;p&gt;The failures Superpowers Skills prevent aren't syntax errors or logic bugs. They're:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building the wrong feature because nobody asked the right clarifying questions&lt;/li&gt;
&lt;li&gt;Writing code that "looks complete" but has zero coverage on the edge cases that matter&lt;/li&gt;
&lt;li&gt;Patching a symptom while the root cause persists&lt;/li&gt;
&lt;li&gt;Refactoring that expands scope until the diff is unmergeable&lt;/li&gt;
&lt;li&gt;Shipping because the agent said "done" without running anything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A more capable model doesn't fix these. A faster agent arguably makes them worse—more code written in the wrong direction before anyone catches it.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Worked Example: Adding Invoice Export
&lt;/h2&gt;

&lt;p&gt;Imagine you tell an agent: &lt;em&gt;"Add a billing export feature."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Without workflow constraints, it will probably find the billing service, write an endpoint, add a download button, and report completion. Whether that implementation handles empty data, unauthorized requests, large datasets, or export format edge cases depends entirely on whether the model guessed right.&lt;/p&gt;

&lt;p&gt;With Superpowers Skills, the flow looks like this:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: &lt;code&gt;brainstorming&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Before touching any files, the agent surfaces questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Export format: PDF, CSV, or Excel?&lt;/li&gt;
&lt;li&gt;Date range limits?&lt;/li&gt;
&lt;li&gt;Permission checks required?&lt;/li&gt;
&lt;li&gt;Sync download or async background job?&lt;/li&gt;
&lt;li&gt;What does the user see on failure?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't bureaucracy. It's the list of decisions that would otherwise be made silently by the model, and possibly in the wrong direction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: &lt;code&gt;writing-plans&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;A compliant plan doesn't say "implement invoice export." It says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Add exportInvoiceCsv(userId, range) to billing service.
   Verify: unit tests covering empty data, normal data, unauthorized access.

2. Wire export endpoint in API routes.
   Verify: 403 on missing permissions, valid text/csv response on success.

3. Add download button to billing page.
   Verify: file downloads on click, loading and error states render correctly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every task names where the change lands and how it will be verified. That's what makes the plan executable instead of aspirational.&lt;/p&gt;
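&lt;p&gt;One way to make the "verification gate" idea concrete is to treat the plan as data and reject any task that lacks a scope or a check. The sketch below is illustrative only: &lt;code&gt;PlanTask&lt;/code&gt;, &lt;code&gt;findUngatedTasks&lt;/code&gt;, and the field names are assumptions made for this example, not part of Superpowers.&lt;/p&gt;

```typescript
// Hypothetical shape for one plan task; the field names are illustrative only.
type PlanTask = {
  title: string;
  scope: string;    // where the change lands, e.g. "billing service"
  verify: string[]; // checks that must pass before the task counts as done
};

// The three tasks from the plan above, expressed as data.
const plan: PlanTask[] = [
  {
    title: "Add exportInvoiceCsv(userId, range)",
    scope: "billing service",
    verify: ["unit test: empty data", "unit test: normal data", "unit test: unauthorized access"],
  },
  {
    title: "Wire export endpoint",
    scope: "API routes",
    verify: ["403 on missing permissions", "valid text/csv response on success"],
  },
  {
    title: "Add download button",
    scope: "billing page",
    verify: ["file downloads on click", "loading and error states render"],
  },
];

// Return the titles of tasks that lack a scope or a verification gate.
function findUngatedTasks(tasks: PlanTask[]): string[] {
  return tasks
    .filter((t) => t.scope.trim() === "" || t.verify.length === 0)
    .map((t) => t.title);
}

console.log(findUngatedTasks(plan)); // prints []
```

&lt;p&gt;A one-line plan such as "implement invoice export" would come back from &lt;code&gt;findUngatedTasks&lt;/code&gt; as ungated, because it has neither a scope nor a check.&lt;/p&gt;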

&lt;h3&gt;
  
  
  Step 3: &lt;code&gt;test-driven-development&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Tests first. Not as documentation—as behavior contracts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;exportInvoiceCsv&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;exports invoices as csv rows&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;csv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;exportInvoiceCsv&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;inv_001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;USD&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;inv_002&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;USD&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toContain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id,amount,currency&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toContain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;inv_001,1999,USD&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toContain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;inv_002,2999,USD&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write the failing test. Confirm it fails. Implement the minimum to pass. Confirm it passes. Then refactor. The order matters.&lt;/p&gt;
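&lt;p&gt;For illustration, here is a minimal implementation that would satisfy the test above. It is a sketch under assumptions: the &lt;code&gt;Invoice&lt;/code&gt; type is inferred from the fields the test uses, the function takes an array of invoices as the test does (not the &lt;code&gt;userId, range&lt;/code&gt; signature from the plan), and the &lt;code&gt;csvField&lt;/code&gt; quoting helper is an addition that goes slightly beyond what this one test demands.&lt;/p&gt;

```typescript
// Hypothetical invoice shape, inferred from the fields used in the test above.
type Invoice = { id: string; amount: number; currency: string };

// Quote a field only when it contains a comma, a double quote, or a line break.
function csvField(value: string | number): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Minimal implementation: a header row followed by one row per invoice.
function exportInvoiceCsv(invoices: Invoice[]): string {
  const header = "id,amount,currency";
  const rows = invoices.map((inv) =>
    [inv.id, inv.amount, inv.currency].map(csvField).join(",")
  );
  return [header, ...rows].join("\n");
}

console.log(exportInvoiceCsv([{ id: "inv_001", amount: 1999, currency: "USD" }]));
```

&lt;p&gt;With an empty array this version returns only the header row. Whether that is the right behavior for empty data is exactly the kind of decision the next failing test should pin down.&lt;/p&gt;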

&lt;h3&gt;
  
  
  Step 4: &lt;code&gt;requesting-code-review&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Before merge, the review targets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does this match the agreed plan?&lt;/li&gt;
&lt;li&gt;Any authorization gaps?&lt;/li&gt;
&lt;li&gt;Large dataset edge cases?&lt;/li&gt;
&lt;li&gt;Unhandled error states?&lt;/li&gt;
&lt;li&gt;Files changed outside the agreed scope?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 5: &lt;code&gt;verification-before-completion&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Depending on project type:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project Type&lt;/th&gt;
&lt;th&gt;Verification Method&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Web app&lt;/td&gt;
&lt;td&gt;Start dev server, walk the critical path in browser&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend service&lt;/td&gt;
&lt;td&gt;Run tests, type check, hit the endpoint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI tool&lt;/td&gt;
&lt;td&gt;Run the command, check actual output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iOS app&lt;/td&gt;
&lt;td&gt;Test on real device (especially IAP, StoreKit, permissions)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SDK / Library&lt;/td&gt;
&lt;td&gt;Unit tests + integration tests + example project&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The principle: &lt;em&gt;evidence over claims&lt;/em&gt;. "I think it's done" is not verification.&lt;/p&gt;
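&lt;p&gt;"Evidence over claims" can be sketched as a small gate: a task counts as complete only when at least one check actually ran and every check passed. The &lt;code&gt;Evidence&lt;/code&gt; type, its fields, and the sample values below are hypothetical and chosen for this example.&lt;/p&gt;

```typescript
// Hypothetical record of one verification step; the field names are illustrative.
type Evidence = { check: string; passed: boolean; output: string };

// Complete only when at least one check actually ran and every check passed.
function canDeclareComplete(evidence: Evidence[]): boolean {
  return evidence.length > 0 ? evidence.every((e) => e.passed) : false;
}

// "I think it's done" with nothing run: no evidence at all.
const claimOnly: Evidence[] = [];

// Checks that were actually executed, with their observed results.
const verified: Evidence[] = [
  { check: "unit tests", passed: true, output: "3 passed, 0 failed" },
  { check: "export endpoint without permission", passed: true, output: "403" },
];

console.log(canDeclareComplete(claimOnly)); // prints false
console.log(canDeclareComplete(verified));  // prints true
```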




&lt;h2&gt;
  
  
  How to Install
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Claude Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin &lt;span class="nb"&gt;install &lt;/span&gt;superpowers@claude-plugins-official
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via the Superpowers marketplace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add obra/superpowers-marketplace
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;superpowers@superpowers-marketplace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Codex CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugins
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Search for &lt;code&gt;superpowers&lt;/code&gt;, then select &lt;code&gt;Install Plugin&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Codex App
&lt;/h3&gt;

&lt;p&gt;Sidebar → Plugins → Coding category → Superpowers → &lt;code&gt;+&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use vs. Skip
&lt;/h2&gt;

&lt;p&gt;Not every task needs a full workflow. A typo fix doesn't need a plan. A one-liner doesn't need TDD.&lt;/p&gt;

&lt;p&gt;The right mental model is risk-proportional discipline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Recommended Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Typo fix, config lookup&lt;/td&gt;
&lt;td&gt;Direct action—just verify the output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-file small change&lt;/td&gt;
&lt;td&gt;Optional workflow; at minimum verify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bug with unclear root cause&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;systematic-debugging&lt;/code&gt; required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New feature&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;brainstorming&lt;/code&gt; + &lt;code&gt;writing-plans&lt;/code&gt; + TDD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-module refactor&lt;/td&gt;
&lt;td&gt;Plan + verification strongly recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-merge / pre-deploy&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;requesting-code-review&lt;/code&gt; + &lt;code&gt;verification-before-completion&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Skills should add friction proportional to the blast radius of getting it wrong.&lt;/p&gt;
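&lt;p&gt;The table can also be read as a lookup from task type to the skills worth loading. The sketch below encodes it that way. The &lt;code&gt;TaskKind&lt;/code&gt; names and the &lt;code&gt;recommendedSkills&lt;/code&gt; function are assumptions for illustration, and two rows are interpreted: "at minimum verify" is mapped to &lt;code&gt;verification-before-completion&lt;/code&gt;, and "Plan + verification" to &lt;code&gt;writing-plans&lt;/code&gt; plus &lt;code&gt;verification-before-completion&lt;/code&gt;.&lt;/p&gt;

```typescript
// Task categories taken from the table above; the identifiers are illustrative.
type TaskKind =
  | "typo-or-config-lookup"
  | "single-file-change"
  | "unclear-bug"
  | "new-feature"
  | "cross-module-refactor"
  | "pre-merge";

// Map each task kind to the skills the table recommends for it.
function recommendedSkills(kind: TaskKind): string[] {
  switch (kind) {
    case "typo-or-config-lookup":
      return []; // direct action, just check the output
    case "single-file-change":
      return ["verification-before-completion"];
    case "unclear-bug":
      return ["systematic-debugging"];
    case "new-feature":
      return ["brainstorming", "writing-plans", "test-driven-development"];
    case "cross-module-refactor":
      return ["writing-plans", "verification-before-completion"];
    case "pre-merge":
      return ["requesting-code-review", "verification-before-completion"];
  }
}

console.log(recommendedSkills("unclear-bug").join(", ")); // prints systematic-debugging
```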




&lt;h2&gt;
  
  
  Three Skills to Start With
&lt;/h2&gt;

&lt;p&gt;If you're integrating Superpowers into an existing project, don't try to use everything at once. Start with three:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;code&gt;systematic-debugging&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Tell the agent:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Use systematic-debugging. Do not modify any code yet. List your root cause hypotheses first, then we'll validate them one by one."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This stops the shotgun-patch reflex before it starts.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;writing-plans&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Before any non-trivial feature or change:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Use writing-plans. Produce an executable plan first. I'll confirm before you implement anything."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This surfaces scope creep before it happens, not while you're reviewing a 500-line diff.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;code&gt;verification-before-completion&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Add this to your project's &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Before declaring any task complete, use verification-before-completion. Run tests, verify in a browser or on a device, and report exactly what you checked and what the result was."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This closes the gap between "I think it works" and "I confirmed it works."&lt;/p&gt;




&lt;h2&gt;
  
  
  The Broader Pattern: Startup Superpowers
&lt;/h2&gt;

&lt;p&gt;Startup Superpowers, a project that applies the same framework to startup validation, shows that the pattern generalizes beyond coding.&lt;/p&gt;

&lt;p&gt;It codifies a professional workflow into loadable agent protocols for hypothesis tracking, competitor research, customer interviews, and MVP scoping. Available slash commands:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/whats-next&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Assess current stage, recommend next action&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/competitors&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Map direct and indirect competitors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/market-research&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Research customers, pricing, and trends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/hypotheses&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Write testable hypotheses with evidence tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/interviews&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Design scripts and analyze transcripts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/surveys&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Design surveys and manage responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/mvp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Design the minimum testable product&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Everything is stored as Markdown in a &lt;code&gt;startup/&lt;/code&gt; directory—version-controllable, agent-readable, no SaaS dependency.&lt;/p&gt;

&lt;p&gt;That's the actual pattern: take a repeatable professional workflow, encode it as agent steps with defined inputs and outputs, make it loadable in any session, and store all state in files the agent can read and write. The AI doesn't get smarter. The process gets stable.&lt;/p&gt;
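&lt;p&gt;To make "state in files the agent can read and write" concrete, here is a sketch that renders one hypothesis record as Markdown text. Everything in it is an assumption for illustration: the &lt;code&gt;Hypothesis&lt;/code&gt; type, the &lt;code&gt;renderHypothesis&lt;/code&gt; function, and the layout are hypothetical and do not describe the file format Startup Superpowers actually uses.&lt;/p&gt;

```typescript
// Hypothetical record; not the real Startup Superpowers schema.
type Hypothesis = {
  id: string;
  statement: string;
  status: "open" | "supported" | "rejected";
  evidence: string[];
};

// Render the record as plain Markdown so it can be versioned and re-read later.
function renderHypothesis(h: Hypothesis): string {
  const evidenceLines =
    h.evidence.length > 0 ? h.evidence.map((e) => `- ${e}`) : ["- (none yet)"];
  return [
    `# ${h.id}: ${h.statement}`,
    "",
    `Status: ${h.status}`,
    "",
    "## Evidence",
    ...evidenceLines,
  ].join("\n");
}

console.log(
  renderHypothesis({
    id: "H1",
    statement: "Freelancers will pay for automated invoice export",
    status: "open",
    evidence: [],
  })
);
```

&lt;p&gt;Because the output is plain text, a later session can load the same file, append evidence, and update the status without any external service.&lt;/p&gt;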




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Superpowers Skills solves a specific problem: AI coding agents that know how to write code but don't know how to do software development.&lt;/p&gt;

&lt;p&gt;The six questions it forces an agent to answer before declaring a task complete:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Did you clarify the requirements before implementing?&lt;/li&gt;
&lt;li&gt;Did you make a verifiable plan before writing code?&lt;/li&gt;
&lt;li&gt;Did you write tests before the implementation?&lt;/li&gt;
&lt;li&gt;Did you find the root cause before patching?&lt;/li&gt;
&lt;li&gt;Did you get a review before merging?&lt;/li&gt;
&lt;li&gt;Did you actually verify—not just assume—that it works?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without workflow constraints, developers have to ask these questions themselves, every session, every task. With Superpowers, the constraints are stable, loadable, and consistent across sessions, developers, and projects.&lt;/p&gt;

&lt;p&gt;If you're using AI coding agents in real projects today, start with three skills: &lt;code&gt;systematic-debugging&lt;/code&gt;, &lt;code&gt;writing-plans&lt;/code&gt;, and &lt;code&gt;verification-before-completion&lt;/code&gt;. They won't make development magical. They'll make your agent behave like a collaborator with engineering discipline instead of one without it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Superpowers: &lt;a href="https://github.com/obra/superpowers" rel="noopener noreferrer"&gt;github.com/obra/superpowers&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Startup Superpowers: &lt;a href="https://github.com/SergeiGorbatiuk/startup-superpowers" rel="noopener noreferrer"&gt;github.com/SergeiGorbatiuk/startup-superpowers&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codex</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
