<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gagik Harutyunyan</title>
    <description>The latest articles on DEV Community by Gagik Harutyunyan (@gagharutyunyan).</description>
    <link>https://dev.to/gagharutyunyan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3960167%2F7d3e6cb6-fe1e-4f7f-b1ff-9fd256cbbe35.jpeg</url>
      <title>DEV Community: Gagik Harutyunyan</title>
      <link>https://dev.to/gagharutyunyan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gagharutyunyan"/>
    <language>en</language>
    <item>
      <title>Claude Does Not Need More Prompts. It Needs Reasoning Discipline.</title>
      <dc:creator>Gagik Harutyunyan</dc:creator>
      <pubDate>Sat, 30 May 2026 18:40:20 +0000</pubDate>
      <link>https://dev.to/gagharutyunyan/claude-does-not-need-more-prompts-it-needs-reasoning-discipline-32da</link>
      <guid>https://dev.to/gagharutyunyan/claude-does-not-need-more-prompts-it-needs-reasoning-discipline-32da</guid>
      <description>&lt;p&gt;Large language models are good at sounding structured. That is not the same as&lt;br&gt;
being structured.&lt;/p&gt;

&lt;p&gt;Ask an AI assistant to "use first principles" and it may produce a confident&lt;br&gt;
answer with the phrase "first principles" near the top. Ask it to "red-team this&lt;br&gt;
plan" and it may list generic risks. Ask it to "apply OODA" and it may give you&lt;br&gt;
four headings without doing the hard part: orienting against assumptions,&lt;br&gt;
constraints, and evidence.&lt;/p&gt;

&lt;p&gt;That failure mode is subtle because the answer looks responsible. It has the&lt;br&gt;
right vocabulary. It has the right shape. But the method did not actually&lt;br&gt;
control the analysis.&lt;/p&gt;

&lt;p&gt;I built &lt;code&gt;methodology-toolkit&lt;/code&gt; to target that gap.&lt;/p&gt;

&lt;p&gt;The goal is not to add more clever prompts to Claude Code. The goal is to add a&lt;br&gt;
small layer of discipline around non-trivial decisions: classify the problem,&lt;br&gt;
choose methods that fit, apply those methods explicitly, verify load-bearing&lt;br&gt;
claims, and stress-test plans before they harden into action.&lt;/p&gt;

&lt;p&gt;Repository: &lt;a href="https://github.com/gagharutyunyan1993/methodology-toolkit" rel="noopener noreferrer"&gt;https://github.com/gagharutyunyan1993/methodology-toolkit&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;The Problem: Methodology Theater&lt;/h2&gt;

&lt;p&gt;Methodologies are useful because they constrain attention.&lt;/p&gt;

&lt;p&gt;First Principles asks you to strip assumptions and rebuild from base facts. ACH&lt;br&gt;
asks you to compare competing hypotheses by disconfirming evidence, not by&lt;br&gt;
collecting confirmations for your favorite answer. OODA asks you to separate&lt;br&gt;
raw observation from orientation, where bias and context do most of the work.&lt;br&gt;
Pre-mortem asks you to imagine the plan has already failed so optimism does not&lt;br&gt;
screen out obvious risks.&lt;/p&gt;

&lt;p&gt;When an AI assistant merely names those methods, you get the cost without the&lt;br&gt;
benefit.&lt;/p&gt;

&lt;p&gt;The answer becomes longer, more formal, and more convincing, but not necessarily&lt;br&gt;
more correct. That is worse than a short intuitive answer because the structure&lt;br&gt;
creates false confidence.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;methodology-toolkit&lt;/code&gt; treats that as the core anti-pattern:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If a method is named, its steps must be walked.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not hinted at. Not summarized. Applied.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhj5x0brdm0xy9wem1rj8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhj5x0brdm0xy9wem1rj8.png" alt="Woman Yelling at a Cat meme" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Methodology theater: right vocabulary, no method actually in control.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;The Second Problem: Confident Wrongness&lt;/h2&gt;

&lt;p&gt;The other failure mode is more operational: AI agents often make load-bearing&lt;br&gt;
claims from memory or partial context.&lt;/p&gt;

&lt;p&gt;In a codebase, that can look like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;assuming which file owns behavior without opening it;&lt;/li&gt;
&lt;li&gt;trusting stale docs instead of current code;&lt;/li&gt;
&lt;li&gt;patching the nearest visible symptom;&lt;/li&gt;
&lt;li&gt;treating generated types or comments as ground truth;&lt;/li&gt;
&lt;li&gt;deciding before running the grep, test, or build that would falsify the idea.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where "reasoning" by itself is not enough. A polished argument with&lt;br&gt;
unverified premises is still fragile.&lt;/p&gt;

&lt;p&gt;So the toolkit includes a dedicated Quality of Information Check. Its rule is&lt;br&gt;
simple: before a conclusion depends on a fact, confirm that fact against primary&lt;br&gt;
evidence when possible.&lt;/p&gt;

&lt;p&gt;Primary evidence means things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;code just read;&lt;/li&gt;
&lt;li&gt;command output;&lt;/li&gt;
&lt;li&gt;test results;&lt;/li&gt;
&lt;li&gt;git history;&lt;/li&gt;
&lt;li&gt;observed runtime behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docs, comments, and memory can be useful, but they are not the final authority&lt;br&gt;
when the code or command output says otherwise.&lt;/p&gt;
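&lt;p&gt;As a rough illustration of that rule, here is a minimal sketch that flags load-bearing claims resting only on secondary sources. Every name in it (&lt;code&gt;EvidenceKind&lt;/code&gt;, &lt;code&gt;Claim&lt;/code&gt;, &lt;code&gt;unverifiedClaims&lt;/code&gt;) and the sample claims are assumptions made for this example; the toolkit itself is a set of instructions for the agent, and nothing here is taken from its implementation.&lt;/p&gt;

```typescript
// Hypothetical model: names and sample data are illustrative, not the toolkit's.
type EvidenceKind =
  | "code_read"
  | "command_output"
  | "test_result"
  | "git_history"
  | "runtime_observation"
  | "docs"
  | "comment"
  | "memory";

// The five primary kinds mirror the list in the article.
const PRIMARY_KINDS: EvidenceKind[] = [
  "code_read",
  "command_output",
  "test_result",
  "git_history",
  "runtime_observation",
];

interface Claim {
  statement: string;
  loadBearing: boolean; // does the conclusion depend on this claim?
  evidence: EvidenceKind[];
}

function isPrimary(kind: EvidenceKind): boolean {
  return PRIMARY_KINDS.indexOf(kind) !== -1;
}

// Returns the load-bearing claims that have no primary evidence behind them.
function unverifiedClaims(claims: Claim[]): Claim[] {
  return claims.filter((claim) =>
    claim.loadBearing ? !claim.evidence.some(isPrimary) : false
  );
}

const claims: Claim[] = [
  { statement: "auth.ts owns token refresh", loadBearing: true, evidence: ["memory"] },
  { statement: "the profile test fails on main", loadBearing: true, evidence: ["test_result"] },
  { statement: "the README mentions caching", loadBearing: false, evidence: ["docs"] },
];

console.log(unverifiedClaims(claims).map((claim) => claim.statement));
```

&lt;p&gt;Only the first claim is flagged: it carries the conclusion but rests on memory alone. The docs-based claim passes because nothing depends on it.&lt;/p&gt;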

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frx2qs87vcukpp7rcejpa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frx2qs87vcukpp7rcejpa.png" alt="I Should Buy a Boat Cat meme" width="800" height="589"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A polished argument with unverified premises is still fragile.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;What the Plugin Actually Adds&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;methodology-toolkit&lt;/code&gt; bundles three pieces that share one methodology index.&lt;/p&gt;

&lt;p&gt;The first is the &lt;code&gt;methodology-driven-thinking&lt;/code&gt; skill. It can activate on&lt;br&gt;
non-trivial tasks like architecture decisions, prioritization, root-cause&lt;br&gt;
analysis, strategy, planning under uncertainty, or tradeoff analysis. It starts&lt;br&gt;
with Cynefin as a dispatcher: clear tasks should be answered directly, while&lt;br&gt;
complicated, complex, or chaotic tasks get different treatment.&lt;/p&gt;

&lt;p&gt;The second is the &lt;code&gt;/methodology-toolkit:method&lt;/code&gt; slash command. This gives you a&lt;br&gt;
manual trigger when you explicitly want the full protocol, or when you want to&lt;br&gt;
force a specific method:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/methodology-toolkit:method how should we prioritize the Q3 backlog?
/methodology-toolkit:method ACH+pre-mortem should we migrate polling to WebSocket?
/methodology-toolkit:method red-team &amp;lt;the plan you just wrote&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The third is the &lt;code&gt;red-team-critic&lt;/code&gt; subagent. It is intentionally adversarial. It&lt;br&gt;
does not try to balance the positives. Its job is to find load-bearing&lt;br&gt;
assumptions, failure modes, attack paths, and disconfirming evidence.&lt;/p&gt;

&lt;p&gt;The shared index currently contains 29 methods, including Cynefin, OODA, PDCA,&lt;br&gt;
First Principles, 5 Whys, Porter, ADKAR, JTBD, Theory of Constraints, OKR,&lt;br&gt;
Minto, BATNA, ACH, Red Team, Pre-mortem, PMESII, SWOT/TOWS, SAT, and Quality of&lt;br&gt;
Information Check.&lt;/p&gt;

&lt;p&gt;The number is not the point. The point is that each method has explicit&lt;br&gt;
&lt;code&gt;use_when&lt;/code&gt;, &lt;code&gt;avoid_when&lt;/code&gt;, &lt;code&gt;steps&lt;/code&gt;, and expected output. The agent is instructed&lt;br&gt;
to read that index instead of relying on memory.&lt;/p&gt;
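&lt;p&gt;To make the idea of an index entry concrete, here is a hypothetical sketch of what one entry and a strict lookup could look like. The field names &lt;code&gt;use_when&lt;/code&gt;, &lt;code&gt;avoid_when&lt;/code&gt;, and &lt;code&gt;steps&lt;/code&gt; come from the description above; the &lt;code&gt;output&lt;/code&gt; field name, the wording of the sample entry, and the &lt;code&gt;lookupMethod&lt;/code&gt; helper are assumptions, and the real index file may be laid out differently.&lt;/p&gt;

```typescript
// Hypothetical entry shape. use_when, avoid_when, and steps are named in the
// article; "output" and all sample wording are assumptions for illustration.
interface MethodEntry {
  use_when: string[];
  avoid_when: string[];
  steps: string[];
  output: string;
}

const METHOD_INDEX: { [id: string]: MethodEntry } = {
  "pre-mortem": {
    use_when: ["a plan is about to harden into action"],
    avoid_when: ["the task is clear and easy to reverse"],
    steps: [
      "Assume the plan has already failed",
      "Work backward to the most plausible causes",
      "Turn each cause into a mitigation or a check",
    ],
    output: "ranked failure causes with mitigations",
  },
};

// Looks a method up in the index and refuses to guess when it is absent.
function lookupMethod(id: string): MethodEntry {
  const entry = Object.prototype.hasOwnProperty.call(METHOD_INDEX, id)
    ? METHOD_INDEX[id]
    : undefined;
  if (entry === undefined) {
    throw new Error("Unknown method: " + id);
  }
  return entry;
}

console.log(lookupMethod("pre-mortem").steps);
```

&lt;p&gt;Throwing on an unknown id is the code-level analogue of "read the index instead of relying on memory": a method that is not in the index has no steps to walk.&lt;/p&gt;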
&lt;h2&gt;Design Choice 1: Classify Before Applying&lt;/h2&gt;

&lt;p&gt;The easiest way to misuse a methodology is to apply it to the wrong type of&lt;br&gt;
problem.&lt;/p&gt;

&lt;p&gt;Some problems are clear. A syntax question does not need OODA. A direct command&lt;br&gt;
does not need a pre-mortem. A small deterministic fix does not need three&lt;br&gt;
frameworks and a leadership memo.&lt;/p&gt;

&lt;p&gt;Other problems are complicated: the answer is knowable through expertise and&lt;br&gt;
analysis. That is where methods like First Principles, ACH, 5 Whys, Theory of&lt;br&gt;
Constraints, Porter, or PMESII can help.&lt;/p&gt;

&lt;p&gt;Some problems are complex: cause and effect are only visible in hindsight. Those&lt;br&gt;
need probes, feedback loops, and iteration. OODA, PDCA, Double Diamond, and JTBD&lt;br&gt;
fit better there.&lt;/p&gt;

&lt;p&gt;Some problems are chaotic: the first job is stabilization, not analysis.&lt;/p&gt;

&lt;p&gt;That is why the skill uses Cynefin first. It prevents the plugin from becoming a&lt;br&gt;
framework machine that turns every question into a workshop.&lt;/p&gt;
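&lt;p&gt;The dispatch described above can be pictured as a small lookup table. This is only a sketch: the skill is prompt-driven, so the &lt;code&gt;Domain&lt;/code&gt; type, the &lt;code&gt;dispatch&lt;/code&gt; function, and the action labels are hypothetical, while the method groupings are the ones listed in this section.&lt;/p&gt;

```typescript
// Hypothetical dispatcher: type and function names are illustrative only.
type Domain = "clear" | "complicated" | "complex" | "chaotic";

interface Dispatch {
  action: string;
  candidateMethods: string[];
}

// Method groupings follow the article; the action labels are paraphrases.
const DISPATCH_TABLE: { [K in Domain]: Dispatch } = {
  clear: { action: "answer directly", candidateMethods: [] },
  complicated: {
    action: "analyze with expertise",
    candidateMethods: [
      "First Principles",
      "ACH",
      "5 Whys",
      "Theory of Constraints",
      "Porter",
      "PMESII",
    ],
  },
  complex: {
    action: "probe, get feedback, iterate",
    candidateMethods: ["OODA", "PDCA", "Double Diamond", "JTBD"],
  },
  chaotic: { action: "stabilize first", candidateMethods: [] },
};

function dispatch(domain: Domain): Dispatch {
  return DISPATCH_TABLE[domain];
}

console.log(dispatch("clear").action);
console.log(dispatch("complex").candidateMethods.join(", "));
```

&lt;p&gt;The empty method lists for the clear and chaotic domains are the important part: classification is allowed to conclude that no framework should run at all.&lt;/p&gt;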

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftoacarwlr6ge9kd38ueu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftoacarwlr6ge9kd38ueu.png" alt="Persian Cat Room Guardian meme" width="638" height="600"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Design Choice 2: Apply Methods Explicitly&lt;/h2&gt;

&lt;p&gt;The toolkit has a hard rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Never name a method without walking through its steps.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the answer uses OODA, the user should see Observe, Orient, Decide, and Act.&lt;br&gt;
If it uses ACH, the user should see competing hypotheses, evidence, and&lt;br&gt;
disconfirming logic. If it uses Pre-mortem, the answer should imagine failure&lt;br&gt;
and work backward to causes.&lt;/p&gt;

&lt;p&gt;This is not about making the answer longer. It is about making failure visible.&lt;/p&gt;

&lt;p&gt;When the structure is visible, the user can inspect it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the agent skip the real bottleneck?&lt;/li&gt;
&lt;li&gt;Did it confirm the favorite hypothesis instead of trying to disprove it?&lt;/li&gt;
&lt;li&gt;Did it treat a secondary source as fact?&lt;/li&gt;
&lt;li&gt;Did the Orient step name the actual assumptions?&lt;/li&gt;
&lt;li&gt;Did the pre-mortem surface concrete failure modes or generic worries?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Visible structure makes the analysis debuggable.&lt;/p&gt;
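&lt;p&gt;One way to picture "debuggable" is a crude check for whether each step of a named method has its own labeled, non-empty section. The sketch below is an assumption-heavy illustration, not part of the toolkit: &lt;code&gt;unwalkedSteps&lt;/code&gt; is a hypothetical helper, the "Step: content" line format is invented, and both sample answers are made up.&lt;/p&gt;

```typescript
// Hypothetical check: flags steps of a method that an answer never walks.
// Assumes each walked step appears on its own line as "Step: some content".
const OODA_STEPS: string[] = ["Observe", "Orient", "Decide", "Act"];

function unwalkedSteps(steps: string[], answer: string): string[] {
  const lines = answer.split("\n").map((line) => line.trim().toLowerCase());
  return steps.filter((step) => {
    const prefix = step.toLowerCase() + ":";
    // A step counts as walked only if a line starts with "step:" and has content after it.
    const walked = lines.some((line) =>
      line.indexOf(prefix) === 0 ? line.length > prefix.length : false
    );
    return !walked;
  });
}

// Invented sample answers, for illustration only.
const theater = "Applying OODA, we should migrate to WebSocket.";
const walkedAnswer = [
  "Observe: updates currently arrive through periodic polling.",
  "Orient: the untested assumption is that users need sub-second freshness.",
  "Decide: measure whether fresher data changes user behavior.",
  "Act: run a small experiment before committing to a migration.",
].join("\n");

console.log(unwalkedSteps(OODA_STEPS, theater));
console.log(unwalkedSteps(OODA_STEPS, walkedAnswer));
```

&lt;p&gt;The first answer names OODA and skips all four steps; the second leaves nothing unwalked. A check like this cannot judge quality, only presence, which is why the inspection questions above still need a human reader.&lt;/p&gt;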
&lt;h2&gt;Design Choice 3: Separate the Critic&lt;/h2&gt;

&lt;p&gt;Self-review is useful, but it has a weakness: the same context that produced the&lt;br&gt;
first answer often rationalizes it during review.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;red-team-critic&lt;/code&gt; subagent exists to create a sharper second pass. It is&lt;br&gt;
designed to critique only. It looks for what would make the plan fail, what an&lt;br&gt;
opponent would exploit, which assumptions carry the most weight, and what&lt;br&gt;
evidence would change the decision.&lt;/p&gt;

&lt;p&gt;This is intentionally not run silently for every task. Independent critique has&lt;br&gt;
a cost. The toolkit encourages it when a decision is hard to reverse; when it&lt;br&gt;
touches money, auth, data integrity, or security; or when the first answer was&lt;br&gt;
not grounded in verified evidence.&lt;/p&gt;
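&lt;p&gt;Those triggers can be written down as a simple predicate. This is a sketch under stated assumptions: the &lt;code&gt;Decision&lt;/code&gt; shape and &lt;code&gt;shouldRunCritic&lt;/code&gt; are hypothetical names, and in the actual plugin the judgment is made by the agent following instructions, not by code like this.&lt;/p&gt;

```typescript
// Hypothetical predicate mirroring the triggers named in the article.
interface Decision {
  hardToReverse: boolean;
  touches: string[]; // areas the change affects
  groundedInVerifiedEvidence: boolean;
}

const SENSITIVE_AREAS: string[] = ["money", "auth", "data integrity", "security"];

function shouldRunCritic(decision: Decision): boolean {
  const touchesSensitiveArea = decision.touches.some(
    (area) => SENSITIVE_AREAS.indexOf(area) !== -1
  );
  // Any single trigger is enough to justify the cost of an independent pass.
  return (
    decision.hardToReverse ||
    touchesSensitiveArea ||
    !decision.groundedInVerifiedEvidence
  );
}

const renameVariable: Decision = {
  hardToReverse: false,
  touches: ["ui copy"],
  groundedInVerifiedEvidence: true,
};

const changeTokenRefresh: Decision = {
  hardToReverse: false,
  touches: ["auth"],
  groundedInVerifiedEvidence: true,
};

console.log(shouldRunCritic(renameVariable));
console.log(shouldRunCritic(changeTokenRefresh));
```

&lt;p&gt;The cheap, reversible, well-grounded change skips the critic; the auth change gets one even though it is reversible and verified.&lt;/p&gt;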

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fem8vz39t6glzy7jajg1d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fem8vz39t6glzy7jajg1d.png" alt="Grumpy Cat meme" width="600" height="740"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The &lt;code&gt;red-team-critic&lt;/code&gt; does not balance the positives. That is the point.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Example: Architecture Decision&lt;/h2&gt;

&lt;p&gt;Suppose the question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Should we migrate polling to WebSocket?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A generic AI answer will often drift toward WebSocket because it sounds more&lt;br&gt;
modern and more real-time. It will list familiar pros and cons: latency,&lt;br&gt;
complexity, scaling, browser support, server load.&lt;/p&gt;

&lt;p&gt;That is not useless, but it is shallow.&lt;/p&gt;

&lt;p&gt;With the toolkit, the analysis should change shape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cynefin classifies the decision as complicated or complex, not clear.&lt;/li&gt;
&lt;li&gt;First Principles asks what real-time property is actually required.&lt;/li&gt;
&lt;li&gt;ACH compares polling, SSE, and WebSocket against evidence that could
disconfirm each option.&lt;/li&gt;
&lt;li&gt;Pre-mortem asks how the migration fails after launch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That can lead to a narrower answer:&lt;/p&gt;

&lt;p&gt;Use WebSocket if bidirectional low-latency interaction is actually required. Use&lt;br&gt;
SSE if the server mostly pushes updates to the client. Keep or tune polling if&lt;br&gt;
freshness requirements are loose, the operational surface must stay small, or&lt;br&gt;
the current bottleneck is elsewhere.&lt;/p&gt;
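&lt;p&gt;That narrower answer is essentially a decision rule, so it can be sketched as one. The &lt;code&gt;Requirements&lt;/code&gt; fields and &lt;code&gt;chooseTransport&lt;/code&gt; are hypothetical names, and the order of the checks is an assumption: the paragraph above does not say which condition wins when several apply, so this version lets the reasons to stay on polling take precedence.&lt;/p&gt;

```typescript
// Hypothetical decision rule derived from the paragraph above.
type Transport = "polling" | "sse" | "websocket";

interface Requirements {
  needsBidirectionalLowLatency: boolean;
  serverMostlyPushes: boolean;
  freshnessIsLoose: boolean;
  mustKeepOpsSurfaceSmall: boolean;
  bottleneckIsElsewhere: boolean;
}

function chooseTransport(req: Requirements): Transport {
  // Assumption: reasons to keep polling are checked first, because they make
  // a migration moot regardless of what the alternatives could offer.
  if (req.freshnessIsLoose || req.mustKeepOpsSurfaceSmall || req.bottleneckIsElsewhere) {
    return "polling";
  }
  if (req.needsBidirectionalLowLatency) {
    return "websocket";
  }
  if (req.serverMostlyPushes) {
    return "sse";
  }
  // Nothing argues for a change, so keep what already works.
  return "polling";
}

const dashboard: Requirements = {
  needsBidirectionalLowLatency: false,
  serverMostlyPushes: true,
  freshnessIsLoose: false,
  mustKeepOpsSurfaceSmall: false,
  bottleneckIsElsewhere: false,
};

console.log(chooseTransport(dashboard));
```

&lt;p&gt;For a dashboard where the server mostly pushes updates and nothing else applies, the rule returns &lt;code&gt;sse&lt;/code&gt;. Writing the rule out forces each input to be answered with evidence, which is where First Principles and ACH do their work.&lt;/p&gt;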

&lt;p&gt;The method does not guarantee the answer. It improves the path to the answer.&lt;/p&gt;
&lt;h2&gt;Example: Codebase Diagnosis&lt;/h2&gt;

&lt;p&gt;Suppose a page crashes when a user has no profile.&lt;/p&gt;

&lt;p&gt;A fast assistant might patch the component with optional chaining:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That may be correct. It may also hide the real defect.&lt;/p&gt;

&lt;p&gt;With the toolkit, 5 Whys is only appropriate while the causal chain stays&lt;br&gt;
mechanical:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Why did the page crash? Rendering accessed a missing field.&lt;/li&gt;
&lt;li&gt;Why was the field missing? The API returned no profile object.&lt;/li&gt;
&lt;li&gt;Why did the API return no profile? The session was partially expired.&lt;/li&gt;
&lt;li&gt;Why did that state reach the UI? The auth refresh path did not normalize the
response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At that point, the patch may belong in the auth/data layer, not the component.&lt;/p&gt;

&lt;p&gt;The Quality of Information Check matters here. The agent should read the&lt;br&gt;
component, grep the callers, inspect the API mapper, and run the relevant test&lt;br&gt;
before deciding where the fix belongs.&lt;/p&gt;
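&lt;p&gt;If that evidence confirms the chain above, a fix in the auth/data layer could look roughly like the sketch below. Everything in it is hypothetical: the response shape, the &lt;code&gt;Session&lt;/code&gt; type, and &lt;code&gt;normalizeRefresh&lt;/code&gt; are invented for illustration, and it assumes that in this imagined API a missing profile only ever means a partially expired session.&lt;/p&gt;

```typescript
// Hypothetical types: the article does not show the real API or session model.
interface Profile {
  name: string;
}

// Raw refresh response, where any field may be absent.
interface RawRefreshResponse {
  user?: { id?: string; profile?: { name?: string } | null } | null;
}

// What the UI is allowed to see: a complete user or an explicit expiry.
type Session =
  | { status: "authenticated"; user: { id: string; profile: Profile } }
  | { status: "expired" };

// Collapses every partial response into the explicit "expired" state.
function normalizeRefresh(raw: RawRefreshResponse): Session {
  const user = raw.user;
  if (user === undefined || user === null) {
    return { status: "expired" };
  }
  const id = user.id;
  const profile = user.profile;
  if (typeof id !== "string" || profile === undefined || profile === null) {
    return { status: "expired" };
  }
  const name = profile.name;
  if (typeof name !== "string") {
    return { status: "expired" };
  }
  return { status: "authenticated", user: { id, profile: { name } } };
}

function greeting(session: Session): string {
  // No optional chaining needed: the union already rules out a missing profile.
  return session.status === "authenticated"
    ? "Hello, " + session.user.profile.name
    : "Please sign in again";
}

console.log(greeting(normalizeRefresh({ user: { id: "u1" } })));
console.log(greeting(normalizeRefresh({ user: { id: "u1", profile: { name: "Ada" } } })));
```

&lt;p&gt;The component no longer needs a defensive patch because the half-authenticated state never reaches it. Whether that is the right fix still depends on what the code and tests actually show.&lt;/p&gt;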
&lt;h2&gt;What This Does Not Solve&lt;/h2&gt;

&lt;p&gt;This plugin does not make AI reasoning magically correct.&lt;/p&gt;

&lt;p&gt;It does not replace domain expertise. It does not remove the need to run tests,&lt;br&gt;
inspect production telemetry, talk to users, or understand the business. It also&lt;br&gt;
does not mean every answer should become a methodology exercise.&lt;/p&gt;

&lt;p&gt;In fact, one of the rules is to exit when the problem is clear and simple.&lt;/p&gt;

&lt;p&gt;The value is narrower and more practical: it reduces a few predictable failure&lt;br&gt;
modes in AI-assisted work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It makes it harder to use framework names as decoration.&lt;/li&gt;
&lt;li&gt;It makes assumptions easier to inspect.&lt;/li&gt;
&lt;li&gt;It encourages primary evidence before confident claims.&lt;/li&gt;
&lt;li&gt;It gives high-stakes plans an adversarial second pass.&lt;/li&gt;
&lt;li&gt;It turns "think harder" into a repeatable protocol.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;Install it in Claude Code by adding the repository as a plugin marketplace:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/plugin marketplace add gagharutyunyan1993/methodology-toolkit
/plugin install methodology-toolkit@methodology-toolkit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or try it locally:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude &lt;span class="nt"&gt;--plugin-dir&lt;/span&gt; /path/to/methodology-toolkit/plugins/methodology-toolkit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then ask a non-trivial question, or invoke the method command directly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/methodology-toolkit:method should we rewrite this module or stabilize it incrementally?
/methodology-toolkit:method ACH+pre-mortem should we ship this migration this week?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is not to make Claude sound like a consultant.&lt;/p&gt;

&lt;p&gt;The point is to make its decisions easier to inspect, challenge, and correct.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feucm6v26mbhipgia2d5r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feucm6v26mbhipgia2d5r.png" alt="Bongo Cat meme" width="600" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
