<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rim Zabarov</title>
    <description>The latest articles on DEV Community by Rim Zabarov (@zabarov).</description>
    <link>https://dev.to/zabarov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3974186%2Fcdb07f3e-add6-40f6-bb5e-acf525727e17.png</url>
      <title>DEV Community: Rim Zabarov</title>
      <link>https://dev.to/zabarov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zabarov"/>
    <language>en</language>
    <item>
      <title>When One Prompt Becomes a Process: How I Split Responsibility Inside an AI Skill</title>
      <dc:creator>Rim Zabarov</dc:creator>
      <pubDate>Tue, 09 Jun 2026 08:47:59 +0000</pubDate>
      <link>https://dev.to/zabarov/when-one-prompt-becomes-a-process-how-i-split-responsibility-inside-an-ai-skill-1055</link>
      <guid>https://dev.to/zabarov/when-one-prompt-becomes-a-process-how-i-split-responsibility-inside-an-ai-skill-1055</guid>
      <description>&lt;p&gt;I started with a simple AI prompt for developer work.&lt;/p&gt;

&lt;p&gt;It had the usual parts: role, task, output format, a few constraints. That was&lt;br&gt;
enough for small jobs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;review this function;&lt;/li&gt;
&lt;li&gt;explain this error;&lt;/li&gt;
&lt;li&gt;suggest a plan;&lt;/li&gt;
&lt;li&gt;clean up this note.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the tasks got closer to real engineering work.&lt;/p&gt;

&lt;p&gt;One short request changed the shape of the prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review this pull request before merge.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sounds simple. It is not.&lt;/p&gt;

&lt;p&gt;A useful PR review has to read the change, understand the intent, notice&lt;br&gt;
missing context, separate serious risks from small suggestions, think about&lt;br&gt;
tests, and give a result that a developer can act on.&lt;/p&gt;

&lt;p&gt;My first reaction was to keep adding rules to the prompt.&lt;/p&gt;

&lt;p&gt;If the AI jumped to a fix too quickly, I added a rule about understanding the&lt;br&gt;
task boundary first. If it mixed blockers with style comments, I added a rule&lt;br&gt;
about prioritization. If it sounded confident without enough proof, I added a&lt;br&gt;
rule about evidence. If the answer was correct but hard to read, I added a rule&lt;br&gt;
about the final format.&lt;/p&gt;

&lt;p&gt;Each rule was useful. The prompt itself was becoming the problem.&lt;/p&gt;

&lt;p&gt;Everything lived in one long instruction: input analysis, implementation&lt;br&gt;
review, architecture, risk, tests, and final writing. The output could still&lt;br&gt;
look polished, but the actual checks were hard to see.&lt;/p&gt;

&lt;p&gt;So I stopped treating the prompt as one large text and started treating it as a&lt;br&gt;
small process.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I Mean By An AI Skill
&lt;/h2&gt;

&lt;p&gt;By AI skill I mean a repeatable AI workflow for one kind of work.&lt;/p&gt;

&lt;p&gt;It can be a Codex skill, a custom assistant, a system prompt, repository rules,&lt;br&gt;
or another mechanism. The tool is less important than the pattern: the user&lt;br&gt;
brings a recurring kind of task, and the AI handles it in a predictable way.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;review a pull request;&lt;/li&gt;
&lt;li&gt;triage a bug;&lt;/li&gt;
&lt;li&gt;prepare a safe fix plan;&lt;/li&gt;
&lt;li&gt;check a release list;&lt;/li&gt;
&lt;li&gt;summarize a long task for handoff;&lt;/li&gt;
&lt;li&gt;clean up technical documentation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a tiny task, a short prompt is usually fine.&lt;/p&gt;

&lt;p&gt;For repeated developer work, the prompt starts carrying more responsibility.&lt;br&gt;
It has to know what counts as input, what counts as risk, what needs a test,&lt;br&gt;
what can block the work, and how the final answer should help the human make a&lt;br&gt;
decision.&lt;/p&gt;

&lt;p&gt;My solution was to split those responsibilities inside the same skill.&lt;/p&gt;

&lt;p&gt;The user still talks to one AI skill and receives one answer. Inside the skill,&lt;br&gt;
the task is handled as several kinds of work.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Problem With A Big Prompt
&lt;/h2&gt;

&lt;p&gt;A big prompt often grows from reasonable corrections.&lt;/p&gt;

&lt;p&gt;The AI misses context, so we add a context rule. It ignores a risk, so we add a&lt;br&gt;
risk rule. It writes vague advice, so we add an output rule. It forgets tests,&lt;br&gt;
so we add a verification rule.&lt;/p&gt;

&lt;p&gt;After a while, the prompt contains many good rules, but the AI still has to use&lt;br&gt;
all of them at once.&lt;/p&gt;

&lt;p&gt;For code review, that means one pass is expected to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read the diff;&lt;/li&gt;
&lt;li&gt;infer the intent;&lt;/li&gt;
&lt;li&gt;notice missing information;&lt;/li&gt;
&lt;li&gt;understand the implementation;&lt;/li&gt;
&lt;li&gt;check possible user impact;&lt;/li&gt;
&lt;li&gt;think about permissions, data, compatibility, and failure paths;&lt;/li&gt;
&lt;li&gt;decide what blocks merge;&lt;/li&gt;
&lt;li&gt;suggest tests;&lt;/li&gt;
&lt;li&gt;write a clear review.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a lot of work to hide behind one smooth answer.&lt;/p&gt;

&lt;p&gt;The review may sound reasonable, but the developer still has to ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which comments are blockers?&lt;/li&gt;
&lt;li&gt;Which ones are suggestions?&lt;/li&gt;
&lt;li&gt;What did the AI treat as a fact?&lt;/li&gt;
&lt;li&gt;What is only an assumption?&lt;/li&gt;
&lt;li&gt;What should be tested before merge?&lt;/li&gt;
&lt;li&gt;Is the conclusion strong enough to act on?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When those questions are not visible in the structure, the AI answer becomes&lt;br&gt;
less useful as an engineering tool.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Responsibility Split
&lt;/h2&gt;

&lt;p&gt;I started separating the work into responsibilities inside the same AI skill.&lt;/p&gt;

&lt;p&gt;The exact names do not matter. For developer tasks, the split often looks like&lt;br&gt;
this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;th&gt;What It Checks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input intake&lt;/td&gt;
&lt;td&gt;What was provided, what is missing, and what cannot be assumed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Implementation review&lt;/td&gt;
&lt;td&gt;Whether the change solves the stated problem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Action planning&lt;/td&gt;
&lt;td&gt;What the smallest useful next step should be&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk review&lt;/td&gt;
&lt;td&gt;Data, permissions, compatibility, irreversible actions, user impact&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quality check&lt;/td&gt;
&lt;td&gt;Tests, reproduction, evidence, manual verification, uncertainty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Final editing&lt;/td&gt;
&lt;td&gt;A concise answer the developer can act on&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is still one skill. The user should not have to read six separate reports.&lt;/p&gt;

&lt;p&gt;The point is to make the internal work clearer. The final answer can stay short,&lt;br&gt;
but it should carry the result of these checks: what is known, what is risky,&lt;br&gt;
what blocks the decision, what needs verification, and what can wait.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Pull Request Review Example
&lt;/h2&gt;

&lt;p&gt;A weak AI review might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Consider renaming this variable.
- Maybe add a test.
- Check permission handling.
- The code could be easier to read.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each line is plausible. Together, they do not help much.&lt;/p&gt;

&lt;p&gt;A style comment, a missing test, a possible permission issue, and a readability&lt;br&gt;
note all have the same weight. The author of the pull request still has to&lt;br&gt;
decide what matters before merge.&lt;/p&gt;

&lt;p&gt;With a responsibility split, the same review can become more practical.&lt;/p&gt;

&lt;p&gt;The input intake checks whether the AI has the diff, the task description, and&lt;br&gt;
any constraints. The implementation review checks whether the change solves the&lt;br&gt;
actual problem. The risk review looks for cases where a small change can affect&lt;br&gt;
users, data, permissions, or compatibility. The quality check asks how the&lt;br&gt;
conclusion can be verified.&lt;/p&gt;

&lt;p&gt;The final answer might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Blockers:
- After an authorization failure, the code can return a cached result. This
  can show stale or unauthorized data to the current user.

Questions:
- Is there a test for the authorization failure path?

Suggestions:
- Keep the cache fallback for technical failures, but handle access denial as
  a hard stop.

Conclusion:
- I would not merge this PR yet. First, make the authorization failure path
  explicit and cover it with a test.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The value comes from the order, not from making the answer longer.&lt;/p&gt;

&lt;p&gt;The important issue has a clear place. The question is separate from the&lt;br&gt;
recommendation. The suggestion does not hide the blocker. The conclusion tells&lt;br&gt;
the developer what decision the review supports.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Same Pattern For Bug Triage
&lt;/h2&gt;

&lt;p&gt;The same split helps when the request is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Here is an error. Fix it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AI often wants to jump straight to the likely file and suggest a patch. That can&lt;br&gt;
be useful for obvious issues. For a real bug, the useful work often starts one&lt;br&gt;
step earlier.&lt;/p&gt;

&lt;p&gt;Input intake separates facts from guesses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What exactly happened?&lt;/li&gt;
&lt;li&gt;Is there a stack trace?&lt;/li&gt;
&lt;li&gt;Can the issue be reproduced?&lt;/li&gt;
&lt;li&gt;Which environment and version are involved?&lt;/li&gt;
&lt;li&gt;What has already been checked?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Implementation review looks for the likely area of the cause.&lt;/p&gt;

&lt;p&gt;Action planning chooses a small path through the code instead of turning the&lt;br&gt;
bug into a broad refactor.&lt;/p&gt;

&lt;p&gt;Risk review asks whether the fix touches data, permissions, migrations, public&lt;br&gt;
APIs, background jobs, or production behavior.&lt;/p&gt;

&lt;p&gt;Quality check asks how to prove the fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a failing test before the change;&lt;/li&gt;
&lt;li&gt;the same test passing after the change;&lt;/li&gt;
&lt;li&gt;a reproduction command;&lt;/li&gt;
&lt;li&gt;a manual check;&lt;/li&gt;
&lt;li&gt;a clear note about what could not be verified.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The final answer can stay compact. It should tell the developer the cause, the&lt;br&gt;
change, the verification, and the remaining risk.&lt;/p&gt;

&lt;p&gt;That is the part an experienced engineer usually keeps in their head. The skill&lt;br&gt;
just makes it explicit enough for the AI to follow.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why This Helps
&lt;/h2&gt;

&lt;p&gt;The first benefit is fewer hidden assumptions.&lt;/p&gt;

&lt;p&gt;When the skill has an input step, it is more likely to say what is missing&lt;br&gt;
before writing a confident answer. That matters in code review and bug triage,&lt;br&gt;
because a confident guess can waste more time than an honest question.&lt;/p&gt;

&lt;p&gt;The second benefit is better prioritization.&lt;/p&gt;

&lt;p&gt;A useful review is more than a list of possible improvements. It tells the&lt;br&gt;
developer what blocks the decision, what needs an answer, and what is only a&lt;br&gt;
suggestion.&lt;/p&gt;

&lt;p&gt;The third benefit is easier improvement of the skill itself.&lt;/p&gt;

&lt;p&gt;If all rules live in one large prompt, it is hard to see what failed. Did the AI&lt;br&gt;
miss the input boundary? Did it miss the risk? Did it fail to ask for evidence?&lt;br&gt;
Did it write a good technical answer in a bad format?&lt;/p&gt;

&lt;p&gt;When responsibilities are separate, the next edit is more targeted.&lt;/p&gt;

&lt;p&gt;If the AI invents missing facts, improve input intake. If it misses permission&lt;br&gt;
risks, improve risk review. If it gives long unfocused answers, improve final&lt;br&gt;
editing. The skill can grow where it actually fails.&lt;/p&gt;
&lt;h2&gt;
  
  
  When I Would Use It
&lt;/h2&gt;

&lt;p&gt;I would use this structure for tasks where the answer supports a decision:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;merge or hold a pull request;&lt;/li&gt;
&lt;li&gt;change a public API;&lt;/li&gt;
&lt;li&gt;fix a bug with unclear cause;&lt;/li&gt;
&lt;li&gt;prepare a release check;&lt;/li&gt;
&lt;li&gt;touch user data or permissions;&lt;/li&gt;
&lt;li&gt;hand off a long task to another session or person;&lt;/li&gt;
&lt;li&gt;review material that will be published.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For small requests, the structure becomes extra weight.&lt;/p&gt;

&lt;p&gt;If I need a Git command, a quick explanation of a compiler error, or a small&lt;br&gt;
code example, a full review process gets in the way. The skill should fit the&lt;br&gt;
weight of the task.&lt;/p&gt;

&lt;p&gt;There is also a failure mode in the other direction: roles that repeat each&lt;br&gt;
other. If every responsibility says the same thing, the result is just noise.&lt;br&gt;
Each responsibility should either catch a different kind of problem or make a&lt;br&gt;
different decision.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Public Demo
&lt;/h2&gt;

&lt;p&gt;I made a small public repository to show the idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/zabarov/demo-codex-skill-dev-review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is a text-based demo skill for code review. Start with &lt;code&gt;SKILL.md&lt;/code&gt;, then look&lt;br&gt;
at the sample review output.&lt;/p&gt;

&lt;p&gt;The repository is intentionally small. It shows the structure: review input,&lt;br&gt;
implementation, risk, quality, and final answer. You can adapt the same pattern&lt;br&gt;
to your own review workflow without copying any particular process.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Took From This
&lt;/h2&gt;

&lt;p&gt;The useful shift was simple: stop trying to make one perfect prompt carry every&lt;br&gt;
rule in the same place.&lt;/p&gt;

&lt;p&gt;For developer work, an AI skill becomes easier to trust when it has visible&lt;br&gt;
discipline. It should know what input it has, what is still missing, where the&lt;br&gt;
risk is, what needs verification, and what decision the final answer supports.&lt;/p&gt;

&lt;p&gt;The user still owns the decision. The AI still makes mistakes. But the work is&lt;br&gt;
less opaque.&lt;/p&gt;

&lt;p&gt;For me, this is where many practical AI skills are going: from a long prompt&lt;br&gt;
toward a small process with clear responsibilities inside one tool.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
