DEV Community: Eric Young

Prompting Is Not Enough: Code-Enforced Research Workflows for AI Agents

Eric Young — Mon, 01 Jun 2026 05:19:24 +0000

Most AI workflow failures do not happen because the prompt is too short.

They happen because the prompt is the only thing holding the process together.

In long research tasks, especially business research, the model can start well and still drift later:

It summarizes before verifying.
It treats weak sources as if they were primary evidence.
It updates a conclusion but forgets to update the chart or table behind it.
It cites a source that cites another source, then presents the second-hand claim as if it were original.
It becomes overconfident when the evidence is thin.
It skips the boring quality-control step when the context gets long.

This is why I built Alpha Insights as a harness-enforced research workflow instead of a large prompt template.

Alpha Insights is an open-source business research skill for Claude Code and Codex Desktop. It packages consulting-style research into a staged workflow with frameworks, evidence grading, validators, and report generation.

The more interesting part is not the list of frameworks. It is the execution model.

The Core Problem

Prompting is probabilistic.

You can ask the model to check sources, reconcile numbers, red-team its assumptions, and maintain chart consistency. Sometimes it will. Sometimes it will quietly skip the step, especially after the task becomes long and messy.

For casual work, that may be fine.

For research, it is not fine. A report can look polished while hiding weak evidence, stale numbers, mismatched charts, or unsupported conclusions.

So the design question changes:

Instead of asking "How do I write a better prompt?", ask:

What artifact must exist before the workflow advances?
Which claims need source confidence?
Which checks should be deterministic?
Which failure modes should block the next stage?
What should be written to disk so the workflow can survive context drift?

That is the difference between a prompt and a harness.

What Alpha Insights Enforces

Alpha Insights uses the model for reasoning, synthesis, and judgment. But it tries to move repeatable control logic out of the prompt and into the surrounding system.

The workflow includes:

19 business frameworks: Porter's Five Forces, BCG Matrix, PESTEL, TAM/SAM/SOM, JTBD, flywheel, business model canvas, value chain, and more.
9 thinking methods: issue trees, MECE, hypothesis-driven research, pyramid principle, triangulation, first principles, ACH, pre-mortem, and expert-interview logic.
Evidence grading: claims are tagged by source confidence instead of treating all citations as equal.
Stage gates: validators and hooks block progression when required artifacts or checks are missing.
HTML reports: the final output is a decision-ready report with ECharts visualizations.

The important shift is that "do good research" becomes a set of explicit intermediate artifacts.

For example:

A research plan must exist before evidence collection.
Evidence needs source confidence instead of anonymous citation stuffing.
Claims should link back to supporting evidence.
Report headlines should not drift away from chart data.
Weak evidence should not support strong strategic recommendations without warning.
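
A minimal sketch of one such mechanical check, assuming a hypothetical ledger in which every evidence item has an id and a confidence grade, and every claim lists the evidence ids it relies on. The names (`Evidence`, `Claim`, `unlinked_claims`) and the A/B/C grades are illustrative assumptions, not the actual Alpha Insights schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    id: str
    source: str
    confidence: str  # hypothetical grades: "A" (primary) down to "C" (weak)


@dataclass
class Claim:
    text: str
    evidence_ids: list[str] = field(default_factory=list)


def unlinked_claims(claims: list[Claim], ledger: list[Evidence]) -> list[str]:
    """Return one message per claim that has no evidence or cites unknown ids."""
    known = {e.id for e in ledger}
    problems = []
    for claim in claims:
        if not claim.evidence_ids:
            problems.append(f"no evidence: {claim.text}")
        missing = [i for i in claim.evidence_ids if i not in known]
        if missing:
            problems.append(f"unknown evidence {missing}: {claim.text}")
    return problems


ledger = [Evidence("E1", "company annual report", "A")]
claims = [
    Claim("Revenue grew year over year.", ["E1"]),
    Claim("The market will double by 2030."),  # cites nothing
    Claim("Churn is falling.", ["E9"]),        # cites an id not in the ledger
]
problems = unlinked_claims(claims, ledger)
```

A gate built on a check like this would refuse to advance while `problems` is non-empty, regardless of how finished the prose looks.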

Some of these checks still require judgment. But many failure modes are mechanical enough to catch with code.

What Should Be Code, Not Prompt

After building and iterating on the workflow, I now think several AI-agent failure modes should be treated as engineering problems:

Stale numbers

If a number changes in one part of the report, downstream tables, charts, and executive summaries should not silently keep the old value.
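
One way to make this checkable, sketched under the assumption that every metric has a single canonical value and that each downstream artifact records the values it was built from. The function name and the data layout are hypothetical, chosen only for illustration.

```python
def stale_references(
    canonical: dict[str, float],
    artifacts: dict[str, dict[str, float]],
) -> list[str]:
    """Compare the values each artifact used against the canonical metrics."""
    issues = []
    for artifact, used in artifacts.items():
        for key, value in used.items():
            if key not in canonical:
                issues.append(f"{artifact}: unknown metric {key!r}")
            elif value != canonical[key]:
                issues.append(
                    f"{artifact}: {key} is {value}, canonical value is {canonical[key]}"
                )
    return issues


# The market size was revised to 12.4, but the chart still carries 11.9.
canonical = {"cagr_pct": 8.0, "market_size_usd_bn": 12.4}
artifacts = {
    "executive_summary": {"cagr_pct": 8.0, "market_size_usd_bn": 12.4},
    "chart_market": {"market_size_usd_bn": 11.9},
}
issues = stale_references(canonical, artifacts)
```

The design choice is that numbers live in one place and artifacts declare what they consumed, so a revision upstream turns into a concrete list of files to regenerate.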

Source laundering

If source A cites source B, the system should not pretend A is the primary source. The claim should preserve the evidence chain.
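
Preserving the chain can be as simple as storing, for each source, which other source it took the claim from. The sketch below assumes a hypothetical `Source` record with an optional `cites` field; it is not the Alpha Insights data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    id: str
    title: str
    cites: Optional[str] = None  # id of the source this one took the claim from


def evidence_chain(source_id: str, sources: dict[str, Source]) -> list[str]:
    """Follow `cites` links from a source back to where the claim originated."""
    chain: list[str] = []
    current: Optional[str] = source_id
    while current is not None:
        if current in chain:
            raise ValueError(f"citation cycle at {current!r}")
        if current not in sources:
            raise KeyError(f"unknown source {current!r}")
        chain.append(current)
        current = sources[current].cites
    return chain


sources = {
    "blog": Source("blog", "Industry blog post", cites="report"),
    "report": Source("report", "Analyst report", cites="filing"),
    "filing": Source("filing", "Regulatory filing"),
}
chain = evidence_chain("blog", sources)
is_primary = len(chain) == 1  # the blog is two hops away from the origin
```

With the chain recorded, a report can cite the filing as the origin and the blog as the place it was found, instead of collapsing the two.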

Chart/report mismatch

If a chart says 42% and the paragraph says 47%, that should be a validation issue, not a writing style issue.
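
A deliberately crude sketch of that validation: pull the percentages out of a paragraph and flag any that the chart data does not contain. It assumes every percentage in the paragraph is meant to describe the chart, which will not hold for all text; the function name and tolerance are illustrative.

```python
import re

PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def percent_mismatches(
    paragraph: str, chart_values: list[float], tolerance: float = 0.05
) -> list[float]:
    """Return percentages quoted in the text that no chart value backs up."""
    quoted = [float(m) for m in PERCENT.findall(paragraph)]
    return [
        q for q in quoted
        if not any(abs(q - v) <= tolerance for v in chart_values)
    ]


chart_values = [42.0, 31.0, 27.0]
bad = percent_mismatches("Adoption reached 47% among enterprise buyers.", chart_values)
good = percent_mismatches("Adoption reached 42% among enterprise buyers.", chart_values)
```

Here `bad` holds the unsupported figure and `good` is empty, so only the first paragraph would fail the check.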

Skipped artifacts

If the workflow requires a plan, an evidence ledger, a red-team pass, or a report-quality check, the system should verify that the artifact exists before moving on.
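
A minimal sketch of such an existence check, assuming each stage declares the files that must already be on disk. The stage names and file names are hypothetical, not the ones Alpha Insights uses.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping: artifacts that must exist before a stage may start.
REQUIRED = {
    "evidence": ["plan.md"],
    "insight": ["plan.md", "evidence_ledger.json"],
    "report": ["plan.md", "evidence_ledger.json", "red_team.md"],
}


def missing_artifacts(stage: str, workdir: Path) -> list[str]:
    """List required files that are absent or empty for the given stage."""
    missing = []
    for name in REQUIRED[stage]:
        path = workdir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / "plan.md").write_text("# Research plan\n")
    before = missing_artifacts("insight", workdir)  # ledger not written yet
    (workdir / "evidence_ledger.json").write_text("[]")
    after = missing_artifacts("insight", workdir)   # nothing missing now
```

Checking for empty files as well as absent ones guards against a placeholder being created just to satisfy the gate, although it says nothing about the quality of what is inside.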

Overconfidence from weak evidence

If a claim is supported only by low-confidence sources, the language should not become definitive without an explicit warning.
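
Part of this is judgment, but a rough lint is possible. The sketch below assumes hypothetical confidence grades (with "C" as the weakest) and a small, hand-picked list of definitive words; both are illustrative and would need tuning for real reports.

```python
import re

# Hypothetical list of wording that signals certainty.
DEFINITIVE = re.compile(
    r"\b(will|always|never|clearly|certainly|proves?|guaranteed?)\b",
    re.IGNORECASE,
)
WEAK_GRADES = {"C"}  # assumption: "C" is the lowest confidence grade


def overconfident(text: str, grades: list[str], has_warning: bool = False) -> bool:
    """Flag definitive wording on a claim backed only by weak evidence."""
    # A claim with no evidence at all is treated as weakly supported too.
    only_weak = all(g in WEAK_GRADES for g in grades)
    return only_weak and not has_warning and bool(DEFINITIVE.search(text))


flagged = overconfident("This segment will dominate by 2030.", ["C"])
hedged = overconfident("This segment may lead by 2030.", ["C"])
supported = overconfident("This segment will dominate by 2030.", ["A", "C"])
```

A word list will miss plenty of overconfident phrasing, so a check like this works better as a prompt for review than as a final verdict.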

These are exactly the kinds of things prompts are bad at enforcing over long sessions.

Harness Engineering

The pattern I am exploring is what I call harness engineering:

Use prompts to describe intent.

Use code, state machines, hooks, validators, and explicit files to enforce the workflow.

The model is still doing the hard thinking. But the system around it decides whether the work is complete enough to advance.

That boundary matters.

If everything lives in the prompt, the model is both the worker and the inspector. In long workflows, that is fragile.

If the harness owns the process, the model can focus on reasoning while the system checks structure, evidence, and completion.

Why This Matters

AI agents are getting better at producing plausible work.

That makes verification more important, not less.

For business research, the goal is not a longer report. The goal is a report where the reasoning chain is visible, the evidence quality is explicit, and the workflow cannot quietly skip the boring parts.

Alpha Insights is one implementation of that idea.

GitHub: https://github.com/Ericyoung-183/alpha-insights

Demo report: https://ericyoung-183.github.io/alpha-insights/assets/demo-report.html

MIT licensed. Feedback is very welcome, especially from people building agent workflows where the boundary between model judgment and deterministic enforcement is still unclear.

I built Alpha Insights: AI business research with validators, not just prompts

Eric Young — Thu, 21 May 2026 09:24:38 +0000

Most AI research tools can summarize. That is not the hard part.

The hard part is making the model behave like a serious analyst when the context gets long, the evidence is messy, and the answer needs to support a real decision.

That is why I built Alpha Insights.

GitHub: https://github.com/Ericyoung-183/alpha-insights

The problem

When you ask a raw AI model to do business research, the failure mode is usually not dramatic. It is subtle:

it gives a clean answer before the research is actually done
it cites weak evidence with too much confidence
it skips framework steps when the context gets crowded
it mixes facts, assumptions, and recommendations into one fluent paragraph
it produces a report that looks finished, but is hard to audit

In business analysis, that is dangerous. A polished answer is not the same thing as a decision-ready answer.

What Alpha Insights does differently

Alpha Insights is an open-source business analysis skill for Claude Code-compatible runtimes and Codex Desktop.

It is not a prompt pack. It is a research workflow with external constraints:

19 business frameworks: Porter's Five Forces, Value Chain, SWOT, PESTEL, BCG Matrix, TAM/SAM/SOM, JTBD, Blue Ocean, Three Horizons, Flywheel, SCP, and more
9 analyst methodologies: MECE, Issue Tree, Hypothesis-Driven, Pyramid Principle, Triangulation, Pre-Mortem, First Principles, ACH, Expert Interview
10 research scenarios: industry research, competitive analysis, product analysis, business model teardown, opportunity discovery, market entry, investment decision, strategic planning, due diligence, ad-hoc advisory
Evidence chain: conclusions are tied to source quality and confidence, instead of floating as polished prose
Multi-track research: public sources, optional knowledge bases, optional internal data, and expert-interview workflows

The goal is simple: make AI stop acting like a generic summarizer and start following an analyst-grade research process.

The technical idea: harness over prompt

The most important design decision in Alpha Insights V4 is this:

Prompt instructions are probabilistic. Harness checks are deterministic.

So Alpha Insights adds a runtime harness around the AI workflow:

a state machine tracks the research stage, tier, loaded frameworks, and deliverables
stage gate validators check whether each step has actually produced the required artifacts
hooks guard report generation, trigger gate checks, and persist progress incrementally
HTML write guards prevent the model from jumping straight to a final report before the evidence and insight stages are validated
dual-platform adapters support both Claude Code-compatible runtimes and Codex Desktop
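
To make the shape of such a harness concrete, here is a minimal sketch of a stage machine with gate validators and a write guard. The stage names, the `Workflow` class, and the `require` helper are assumptions made for illustration; they do not reproduce the Alpha Insights implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

STAGES = ["plan", "evidence", "insight", "report"]  # hypothetical stage names

# A gate inspects the artifacts and returns a list of problems (empty = pass).
Gate = Callable[[dict], list[str]]


@dataclass
class Workflow:
    gates: dict[str, Gate]
    stage: str = STAGES[0]
    artifacts: dict = field(default_factory=dict)

    def advance(self) -> str:
        """Move to the next stage only if the current stage's gate passes."""
        problems = self.gates[self.stage](self.artifacts)
        if problems:
            raise RuntimeError(f"gate '{self.stage}' blocked: {problems}")
        idx = STAGES.index(self.stage)
        if idx + 1 < len(STAGES):
            self.stage = STAGES[idx + 1]
        return self.stage

    def can_write_html(self) -> bool:
        """Write guard: the final report may only be written in the last stage."""
        return self.stage == "report"


def require(key: str) -> Gate:
    """Build a gate that passes only when the named artifact is present."""
    return lambda artifacts: [] if artifacts.get(key) else [f"missing {key}"]


wf = Workflow(gates={
    "plan": require("plan"),
    "evidence": require("ledger"),
    "insight": require("insights"),
    "report": require("html"),
})
wf.artifacts["plan"] = "research_plan.md"
wf.advance()          # plan gate passes, workflow is now in "evidence"
try:
    wf.advance()      # no ledger yet, so the evidence gate blocks
except RuntimeError as err:
    blocked = str(err)
```

The model never calls `advance` on its own authority in this design: the surrounding code does, and it raises instead of trusting a claim that the stage is done.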

This matters because agent quality problems are often execution problems, not wording problems.

If the model can silently skip a stage, it eventually will. If there is no artifact boundary, the report becomes unauditable. If evidence quality is not checked before recommendations, the output can look smart while resting on sand.

Why this may be useful beyond business research

Alpha Insights is a business analysis tool, but the engineering lesson is broader:

For serious AI workflows, we should stop relying only on better prompts.

A good agent should have:

explicit stages
persistent intermediate artifacts
validators before transitions
source and confidence tracking
hooks that enforce the boring-but-important parts

That is the difference between "the model probably followed the instruction" and "the workflow can prove what happened."

Install

For Codex Desktop:

git clone https://github.com/Ericyoung-183/alpha-insights.git
cd alpha-insights
python3 scripts/install_codex.py --verify

For Claude Code-compatible runtimes, install the folder as a skill package, keep the frontmatter hooks in the root SKILL.md intact, and then run:

python3 scripts/verify_cloudcode.py

There is also an agent-first installation guide in the repository. To use it, give your agent an instruction like this:

Install Alpha Insights from this repository. Follow INSTALL_FOR_AGENTS.md exactly.

Feedback welcome

This is open source and MIT licensed.

If you are building AI agents, research workflows, or business-analysis tools, I would love feedback on the harness design, the validator layer, and the dual-platform installation path.

GitHub: https://github.com/Ericyoung-183/alpha-insights

Stars are appreciated, but serious critique is even more useful.

Disclosure: This article was drafted with AI assistance and reviewed by Eric before publication.