<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Eric Young</title>
    <description>The latest articles on DEV Community by Eric Young (@ericyoung183).</description>
    <link>https://dev.to/ericyoung183</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3943562%2Fce2dfe76-2a50-41e1-858a-afca48352eec.jpg</url>
      <title>DEV Community: Eric Young</title>
      <link>https://dev.to/ericyoung183</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ericyoung183"/>
    <language>en</language>
    <item>
      <title>Prompting Is Not Enough: Code-Enforced Research Workflows for AI Agents</title>
      <dc:creator>Eric Young</dc:creator>
      <pubDate>Mon, 01 Jun 2026 05:19:24 +0000</pubDate>
      <link>https://dev.to/ericyoung183/prompting-is-not-enough-code-enforced-research-workflows-for-ai-agents-524d</link>
      <guid>https://dev.to/ericyoung183/prompting-is-not-enough-code-enforced-research-workflows-for-ai-agents-524d</guid>
      <description>&lt;p&gt;Most AI workflow failures do not happen because the prompt is too short.&lt;/p&gt;

&lt;p&gt;They happen because the prompt is the only thing holding the process together.&lt;/p&gt;

&lt;p&gt;In long research tasks, especially business research, the model can start well and still drift later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It summarizes before verifying.&lt;/li&gt;
&lt;li&gt;It treats weak sources as if they were primary evidence.&lt;/li&gt;
&lt;li&gt;It updates a conclusion but forgets to update the chart or table behind it.&lt;/li&gt;
&lt;li&gt;It cites a source that cites another source, then presents the second-hand claim as if it were original.&lt;/li&gt;
&lt;li&gt;It becomes overconfident when the evidence is thin.&lt;/li&gt;
&lt;li&gt;It skips the boring quality-control step when the context gets long.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why I built Alpha Insights as a harness-enforced research workflow instead of a large prompt template.&lt;/p&gt;

&lt;p&gt;Alpha Insights is an open-source business research skill for Claude Code and Codex Desktop. It packages consulting-style research into a staged workflow with frameworks, evidence grading, validators, and report generation.&lt;/p&gt;

&lt;p&gt;The more interesting part is not the list of frameworks. It is the execution model.&lt;/p&gt;

&lt;h2&gt;The Core Problem&lt;/h2&gt;

&lt;p&gt;Prompting is probabilistic.&lt;/p&gt;

&lt;p&gt;You can ask the model to check sources, reconcile numbers, red-team its assumptions, and maintain chart consistency. Sometimes it will. Sometimes it will quietly skip the step, especially after the task becomes long and messy.&lt;/p&gt;

&lt;p&gt;For casual work, that may be fine.&lt;/p&gt;

&lt;p&gt;For research, it is not fine. A report can look polished while hiding weak evidence, stale numbers, mismatched charts, or unsupported conclusions.&lt;/p&gt;

&lt;p&gt;So the design question changes:&lt;/p&gt;

&lt;p&gt;Instead of asking, "How do I write a better prompt?"&lt;/p&gt;

&lt;p&gt;Ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What artifact must exist before the workflow advances?&lt;/li&gt;
&lt;li&gt;Which claims need source confidence?&lt;/li&gt;
&lt;li&gt;Which checks should be deterministic?&lt;/li&gt;
&lt;li&gt;Which failure modes should block the next stage?&lt;/li&gt;
&lt;li&gt;What should be written to disk so the workflow can survive context drift?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between a prompt and a harness.&lt;/p&gt;

&lt;h2&gt;What Alpha Insights Enforces&lt;/h2&gt;

&lt;p&gt;Alpha Insights uses the model for reasoning, synthesis, and judgment. But it tries to move repeatable control logic out of the prompt and into the surrounding system.&lt;/p&gt;

&lt;p&gt;The workflow includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;19 business frameworks: Porter's Five Forces, BCG Matrix, PESTEL, TAM/SAM/SOM, JTBD, Flywheel, Business Model Canvas, Value Chain, and more.&lt;/li&gt;
&lt;li&gt;9 thinking methods: issue trees, MECE, hypothesis-driven research, pyramid principle, triangulation, first principles, ACH, pre-mortem, and expert-interview logic.&lt;/li&gt;
&lt;li&gt;Evidence grading: claims are tagged by source confidence instead of treating all citations as equal.&lt;/li&gt;
&lt;li&gt;Stage gates: validators and hooks block progression when required artifacts or checks are missing.&lt;/li&gt;
&lt;li&gt;HTML reports: the final output is a decision-ready report with ECharts visualizations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important shift is that "do good research" becomes a set of explicit intermediate artifacts.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A research plan must exist before evidence collection.&lt;/li&gt;
&lt;li&gt;Evidence needs source confidence instead of anonymous citation stuffing.&lt;/li&gt;
&lt;li&gt;Claims should link back to supporting evidence.&lt;/li&gt;
&lt;li&gt;Report headlines should not drift away from chart data.&lt;/li&gt;
&lt;li&gt;Weak evidence should not support strong strategic recommendations without warning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of these checks still require judgment. But many failure modes are mechanical enough to catch with code.&lt;/p&gt;
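
&lt;p&gt;As an illustration of a mechanical check, here is a minimal sketch of "claims should link back to supporting evidence". The ledger layout, the field names, and &lt;code&gt;unsupported_claims&lt;/code&gt; are assumptions made up for this example, not the actual Alpha Insights data model.&lt;/p&gt;

```python
def unsupported_claims(claims, ledger):
    """Return ids of claims with no evidence, or evidence missing from the ledger."""
    problems = []
    for claim in claims:
        refs = claim.get("evidence", [])
        if not refs or any(ref not in ledger for ref in refs):
            problems.append(claim["id"])
    return problems


# Hypothetical ledger and claims, keyed by made-up ids.
ledger = {"E1": {"source": "annual report", "confidence": "high"}}
claims = [
    {"id": "C1", "evidence": ["E1"]},
    {"id": "C2", "evidence": ["E9"]},  # points at evidence that was never logged
    {"id": "C3", "evidence": []},      # no support at all
]
problems = unsupported_claims(claims, ledger)
```

&lt;p&gt;A gate built on a check like this can refuse to advance while &lt;code&gt;problems&lt;/code&gt; is non-empty, regardless of how confident the prose sounds.&lt;/p&gt;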

&lt;h2&gt;What Should Be Code, Not Prompt&lt;/h2&gt;

&lt;p&gt;After building and iterating on the workflow, I now think several AI-agent failure modes should be treated as engineering problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stale numbers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a number changes in one part of the report, downstream tables, charts, and executive summaries should not silently keep the old value.&lt;/p&gt;
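
&lt;p&gt;One way to make this mechanical, sketched under the assumption that every figure is stored once and the report text refers to it by key. The &lt;code&gt;METRICS&lt;/code&gt; registry, the &lt;code&gt;{{key}}&lt;/code&gt; placeholder syntax, and &lt;code&gt;render&lt;/code&gt; are hypothetical and only illustrate the idea.&lt;/p&gt;

```python
import re

# Hypothetical single source of truth: every reported figure lives here once.
METRICS = {"market_share_pct": 42, "tam_usd_bn": 3.1}

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template, metrics):
    """Fill {{key}} placeholders from the registry; fail on unknown keys."""
    def substitute(match):
        key = match.group(1)
        if key not in metrics:
            raise KeyError(f"no metric named {key!r}")
        return str(metrics[key])
    return PLACEHOLDER.sub(substitute, template)


summary = render("Share is {{market_share_pct}}% of a ${{tam_usd_bn}}bn market.", METRICS)
```

&lt;p&gt;Because the summary, the table, and the chart would all read from the same registry, changing a value in one place changes it everywhere, and a misspelled key fails loudly instead of leaving an old number behind.&lt;/p&gt;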

&lt;p&gt;&lt;strong&gt;Source laundering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If source A cites source B, the system should not pretend A is the primary source. The claim should preserve the evidence chain.&lt;/p&gt;
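
&lt;p&gt;A minimal sketch of preserving that chain, assuming each source record can point at the source it relays. The &lt;code&gt;Source&lt;/code&gt; class and its &lt;code&gt;cites&lt;/code&gt; field are illustrative names, not part of Alpha Insights.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    name: str
    cites: Optional["Source"] = None  # the source this one got the claim from


def evidence_chain(source):
    """Return source names from the cited source back to the original, in order."""
    chain = []
    current = source
    while current is not None:
        chain.append(current.name)
        current = current.cites
    return chain


def is_primary(source):
    """A source is primary only if it does not relay another source."""
    return source.cites is None


report_b = Source("Regulator filing (B)")
article_a = Source("News article (A)", cites=report_b)
chain = evidence_chain(article_a)
```

&lt;p&gt;With the link recorded, a validator can downgrade or flag any claim whose only support is a relaying source.&lt;/p&gt;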

&lt;p&gt;&lt;strong&gt;Chart/report mismatch&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a chart says 42% and the paragraph says 47%, that should be a validation issue, not a writing style issue.&lt;/p&gt;
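
&lt;p&gt;A check like this can be a few lines of code. The sketch below assumes the chart data is available as a plain list of percentages and only looks at figures written as &lt;code&gt;N%&lt;/code&gt; in the text; &lt;code&gt;find_mismatches&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import re

PERCENT = re.compile(r"(\d+(?:\.\d+)?)%")


def find_mismatches(paragraph, chart_values):
    """Return percentages quoted in the text that the chart does not contain."""
    quoted = [float(m) for m in PERCENT.findall(paragraph)]
    allowed = {float(v) for v in chart_values}
    return [value for value in quoted if value not in allowed]


chart = [42, 31, 27]
issues = find_mismatches("Adoption reached 47% this year, up from 31%.", chart)
```

&lt;p&gt;Here &lt;code&gt;issues&lt;/code&gt; contains 47.0, the figure that appears in the paragraph but not in the chart data.&lt;/p&gt;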

&lt;p&gt;&lt;strong&gt;Skipped artifacts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the workflow requires a plan, an evidence ledger, a red-team pass, or a report-quality check, the system should verify that the artifact exists before moving on.&lt;/p&gt;
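
&lt;p&gt;The simplest version of that verification is a file check. The artifact file names below are assumptions for the example; the real workflow may store its artifacts differently.&lt;/p&gt;

```python
import tempfile
from pathlib import Path

# Hypothetical artifact names; the real workflow may use different files.
REQUIRED_ARTIFACTS = ["plan.md", "evidence_ledger.json", "red_team.md"]


def missing_artifacts(workdir, required=REQUIRED_ARTIFACTS):
    """List required files that are absent or empty in workdir."""
    missing = []
    for name in required:
        path = Path(workdir) / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Demo: only the plan exists, so the other two artifacts are reported.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "plan.md").write_text("# Research plan\n")
    gaps = missing_artifacts(tmp)
```

&lt;p&gt;Treating an empty file as missing closes the loophole where a placeholder file is created just to satisfy the gate.&lt;/p&gt;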

&lt;p&gt;&lt;strong&gt;Overconfidence from weak evidence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a claim is supported only by low-confidence sources, the language should not become definitive without an explicit warning.&lt;/p&gt;
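
&lt;p&gt;A rough sketch of such a check, assuming claims carry the grades of their supporting sources. The A/B/C grading scheme and the word list are invented for illustration, and a keyword match is only a crude proxy for "definitive language".&lt;/p&gt;

```python
# Hypothetical grading scheme: A = primary, B = reputable secondary, C = weak.
WEAK_GRADES = {"C"}
DEFINITIVE_WORDS = {"will", "proves", "certainly", "definitely", "guaranteed"}


def overconfidence_warnings(claims):
    """Flag claims that use definitive wording on weak-only evidence."""
    warnings = []
    for claim in claims:
        grades = set(claim["grades"])
        weak_only = bool(grades) and grades.issubset(WEAK_GRADES)
        words = {w.strip(".,;:!?").lower() for w in claim["text"].split()}
        if weak_only and words.intersection(DEFINITIVE_WORDS):
            warnings.append(claim["text"])
    return warnings


claims = [
    {"text": "The segment will double by 2027.", "grades": ["C"]},
    {"text": "The segment may double by 2027.", "grades": ["C"]},
    {"text": "Revenue will grow, per the audited filing.", "grades": ["A", "C"]},
]
flagged = overconfidence_warnings(claims)
```

&lt;p&gt;Only the first claim is flagged: the second is hedged, and the third has at least one strong source behind it.&lt;/p&gt;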

&lt;p&gt;These are exactly the kinds of things prompts are bad at enforcing over long sessions.&lt;/p&gt;

&lt;h2&gt;Harness Engineering&lt;/h2&gt;

&lt;p&gt;The pattern I am exploring is what I call harness engineering:&lt;/p&gt;

&lt;p&gt;Use prompts to describe intent.&lt;/p&gt;

&lt;p&gt;Use code, state machines, hooks, validators, and explicit files to enforce the workflow.&lt;/p&gt;

&lt;p&gt;The model is still doing the hard thinking. But the system around it decides whether the work is complete enough to advance.&lt;/p&gt;

&lt;p&gt;That boundary matters.&lt;/p&gt;

&lt;p&gt;If everything lives in the prompt, the model is both the worker and the inspector. In long workflows, that is fragile.&lt;/p&gt;

&lt;p&gt;If the harness owns the process, the model can focus on reasoning while the system checks structure, evidence, and completion.&lt;/p&gt;
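
&lt;p&gt;To make the worker/inspector split concrete, here is a minimal sketch of a staged harness. The stage names, &lt;code&gt;Harness&lt;/code&gt;, and &lt;code&gt;GateError&lt;/code&gt; are hypothetical and much simpler than a real implementation; the point is only that the decision to advance lives in code, not in the model's context.&lt;/p&gt;

```python
STAGES = ["plan", "evidence", "insight", "report"]  # hypothetical stage names


class GateError(Exception):
    """Raised when the harness refuses to advance the workflow."""


class Harness:
    def __init__(self, gates):
        self.gates = gates  # maps a stage name to a check returning a list of problems
        self.index = 0

    @property
    def stage(self):
        return STAGES[self.index]

    def advance(self, state):
        """Move to the next stage only if the current stage's gate passes."""
        problems = self.gates[self.stage](state)
        if problems:
            raise GateError(f"{self.stage}: {'; '.join(problems)}")
        if self.index != len(STAGES) - 1:
            self.index += 1
        return self.stage


gates = {
    "plan": lambda s: [] if s.get("plan") else ["research plan is missing"],
    "evidence": lambda s: [] if s.get("evidence") else ["evidence ledger is empty"],
    "insight": lambda s: [],
    "report": lambda s: [],
}
harness = Harness(gates)
```

&lt;p&gt;Calling &lt;code&gt;harness.advance({})&lt;/code&gt; raises &lt;code&gt;GateError&lt;/code&gt; and leaves the workflow in the plan stage; the model can produce whatever text it likes, but the stage only changes once the gate's check passes.&lt;/p&gt;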

&lt;h2&gt;Why This Matters&lt;/h2&gt;

&lt;p&gt;AI agents are getting better at producing plausible work.&lt;/p&gt;

&lt;p&gt;That makes verification more important, not less.&lt;/p&gt;

&lt;p&gt;For business research, the goal is not a longer report. The goal is a report where the reasoning chain is visible, the evidence quality is explicit, and the workflow cannot quietly skip the boring parts.&lt;/p&gt;

&lt;p&gt;Alpha Insights is one implementation of that idea.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Ericyoung-183/alpha-insights" rel="noopener noreferrer"&gt;https://github.com/Ericyoung-183/alpha-insights&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Demo report: &lt;a href="https://ericyoung-183.github.io/alpha-insights/assets/demo-report.html" rel="noopener noreferrer"&gt;https://ericyoung-183.github.io/alpha-insights/assets/demo-report.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed. Feedback is very welcome, especially from people building agent workflows where the boundary between model judgment and deterministic enforcement is still unclear.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>claude</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I built Alpha Insights: AI business research with validators, not just prompts</title>
      <dc:creator>Eric Young</dc:creator>
      <pubDate>Thu, 21 May 2026 09:24:38 +0000</pubDate>
      <link>https://dev.to/ericyoung183/i-built-alpha-insights-ai-business-research-with-validators-not-just-prompts-307a</link>
      <guid>https://dev.to/ericyoung183/i-built-alpha-insights-ai-business-research-with-validators-not-just-prompts-307a</guid>
      <description>&lt;p&gt;Most AI research tools can summarize. That is not the hard part.&lt;/p&gt;

&lt;p&gt;The hard part is making the model behave like a serious analyst when the context gets long, the evidence is messy, and the answer needs to support a real decision.&lt;/p&gt;

&lt;p&gt;That is why I built &lt;strong&gt;Alpha Insights&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Ericyoung-183/alpha-insights" rel="noopener noreferrer"&gt;https://github.com/Ericyoung-183/alpha-insights&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;When you ask a raw AI model to do business research, the failure mode is usually not dramatic. It is subtle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it gives a clean answer before the research is actually done&lt;/li&gt;
&lt;li&gt;it cites weak evidence with too much confidence&lt;/li&gt;
&lt;li&gt;it skips framework steps when the context gets crowded&lt;/li&gt;
&lt;li&gt;it mixes facts, assumptions, and recommendations into one fluent paragraph&lt;/li&gt;
&lt;li&gt;it produces a report that looks finished, but is hard to audit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In business analysis, that is dangerous. A polished answer is not the same thing as a decision-ready answer.&lt;/p&gt;

&lt;h2&gt;What Alpha Insights does differently&lt;/h2&gt;

&lt;p&gt;Alpha Insights is an open-source business analysis skill for Claude Code-compatible runtimes and Codex Desktop.&lt;/p&gt;

&lt;p&gt;It is not a prompt pack. It is a research workflow with external constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;19 business frameworks&lt;/strong&gt;: Porter's Five Forces, Value Chain, SWOT, PESTEL, BCG Matrix, TAM/SAM/SOM, JTBD, Blue Ocean, Three Horizons, Flywheel, SCP, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9 analyst methodologies&lt;/strong&gt;: MECE, Issue Tree, Hypothesis-Driven, Pyramid Principle, Triangulation, Pre-Mortem, First Principles, ACH, Expert Interview&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 research scenarios&lt;/strong&gt;: industry research, competitive analysis, product analysis, business model teardown, opportunity discovery, market entry, investment decision, strategic planning, due diligence, ad-hoc advisory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evidence chain&lt;/strong&gt;: conclusions are tied to source quality and confidence, instead of floating as polished prose&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-track research&lt;/strong&gt;: public sources, optional knowledge bases, optional internal data, and expert-interview workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is simple: make AI stop acting like a generic summarizer and start following an analyst-grade research process.&lt;/p&gt;

&lt;h2&gt;The technical idea: harness over prompt&lt;/h2&gt;

&lt;p&gt;The most important design decision in Alpha Insights V4 is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Prompt instructions are probabilistic. Harness checks are deterministic.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So Alpha Insights adds a runtime harness around the AI workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;state machine&lt;/strong&gt; tracks the research stage, tier, loaded frameworks, and deliverables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;stage gate validators&lt;/strong&gt; check whether each step has actually produced the required artifacts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;hooks&lt;/strong&gt; guard report generation, trigger gate checks, and persist progress incrementally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML write guards&lt;/strong&gt; prevent the model from jumping straight to a final report before the evidence and insight stages are validated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;dual-platform adapters&lt;/strong&gt; support both Claude Code-compatible runtimes and Codex Desktop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because agent quality problems are often execution problems, not wording problems.&lt;/p&gt;

&lt;p&gt;If the model can silently skip a stage, it eventually will. If there is no artifact boundary, the report becomes unauditable. If evidence quality is not checked before recommendations, the output can look smart while resting on sand.&lt;/p&gt;
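
&lt;p&gt;As a rough illustration of the write-guard idea, the sketch below refuses to write an HTML file until a state file records the earlier stages as validated. The state-file layout, the stage names, and &lt;code&gt;guarded_write&lt;/code&gt; are assumptions for this example and do not describe how Alpha Insights implements its hooks.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path

REQUIRED_BEFORE_REPORT = ("evidence", "insight")  # hypothetical stage names


def guarded_write(path, content, state_file):
    """Write a file, but refuse HTML reports until required stages are validated."""
    path = Path(path)
    if path.suffix.lower() == ".html":
        state = json.loads(Path(state_file).read_text())
        validated = set(state.get("validated_stages", []))
        pending = [s for s in REQUIRED_BEFORE_REPORT if s not in validated]
        if pending:
            raise PermissionError(f"report blocked, unvalidated stages: {pending}")
    path.write_text(content)
    return path


# Demo: the insight stage has not been validated, so the report write is blocked.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    state_file.write_text(json.dumps({"validated_stages": ["evidence"]}))
    try:
        guarded_write(Path(tmp) / "report.html", "report body", state_file)
        blocked = False
    except PermissionError:
        blocked = True
```

&lt;p&gt;Because the guard reads persisted state rather than the conversation, it behaves the same way however long or messy the session has become.&lt;/p&gt;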

&lt;h2&gt;Why this may be useful beyond business research&lt;/h2&gt;

&lt;p&gt;Alpha Insights is a business analysis tool, but the engineering lesson is broader:&lt;/p&gt;

&lt;p&gt;For serious AI workflows, we should stop relying only on better prompts.&lt;/p&gt;

&lt;p&gt;A good agent should have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;explicit stages&lt;/li&gt;
&lt;li&gt;persistent intermediate artifacts&lt;/li&gt;
&lt;li&gt;validators before transitions&lt;/li&gt;
&lt;li&gt;source and confidence tracking&lt;/li&gt;
&lt;li&gt;hooks that enforce the boring-but-important parts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the difference between "the model probably followed the instruction" and "the workflow can prove what happened."&lt;/p&gt;

&lt;h2&gt;Install&lt;/h2&gt;

&lt;p&gt;For Codex Desktop:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Ericyoung-183/alpha-insights.git
&lt;span class="nb"&gt;cd &lt;/span&gt;alpha-insights
python3 scripts/install_codex.py &lt;span class="nt"&gt;--verify&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Claude Code-compatible runtimes, install the folder as a skill package and keep the root &lt;code&gt;SKILL.md&lt;/code&gt; frontmatter hooks intact, then run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 scripts/verify_cloudcode.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is also an agent-first installation guide in the repository. To use it, give your agent an instruction such as:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Install Alpha Insights from this repository. Follow INSTALL_FOR_AGENTS.md exactly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Feedback welcome&lt;/h2&gt;

&lt;p&gt;This is open source and MIT licensed.&lt;/p&gt;

&lt;p&gt;If you are building AI agents, research workflows, or business-analysis tools, I would love feedback on the harness design, the validator layer, and the dual-platform installation path.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Ericyoung-183/alpha-insights" rel="noopener noreferrer"&gt;https://github.com/Ericyoung-183/alpha-insights&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stars are appreciated, but serious critique is even more useful.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Disclosure: This article was drafted with AI assistance and reviewed by Eric before publication.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>ai</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
