aswe

Posted on Jun 8

Stop Reinventing the Wheel: A Prior Art Investigation Framework for the SDD Era

#ai #architecture #designsystem #productivity

I spent hours designing something that already had a name, a Wikipedia page, and 10 years of papers. Here's how I fixed that.

The Mistake

When I started building llm-distil-loop, I designed a system from scratch:

"Use an LLM to generate labeled training data, then train a smaller ML model on those outputs."

I wrote requirements. I sketched architecture. I started thinking about data schemas.

A few hours in, I searched for something loosely related — and found it.

Knowledge Distillation. A research field that Hinton et al. named in 2015. Hundreds of papers. Multiple production-ready OSS implementations. Documented failure patterns. A decade of practitioners learning what not to do.

I had been designing in a vacuum that didn't exist.

The problem wasn't that I was careless. The problem is that spec-driven development and AI agents make you move fast, and moving fast means skipping the "does this already have a name?" check.

That check is now a framework: prior-art-investigation.

What It Actually Does

It's a prompt collection — I'll be honest about that upfront. Not a library, not a CLI tool. Prompts that wire into your SDD workflow.

But the prompts encode something non-trivial: the questions that senior engineers and system designers actually ask before committing to an approach.

The 7 Questions

Every investigation runs through some or all of these:

#	Question	Phase
Q1	Is the problem framing correct? Am I solving the right problem?	Requirements, Design
Q2	Why hasn't this approach already become the standard? If it's obvious, why isn't everyone doing it?	Design
Q3	Who has tried this and failed? How did they fail?	Design
Q4	Who thinks most deeply about this domain? Where do their words live?	Design
Q5	Have I read primary sources — papers, RFCs, commit logs, issues — not just READMEs and blog posts?	Design
Q6	If this fails in the worst possible way, what causes it? What should I verify now?	Requirements, Design
Q7	I now know the concept name. How does that change my design?	Tasks

Q7 — the "So What" question — is the one most people skip. Finding the concept name isn't the goal. Letting it change your design is.

What It Returns

With web search enabled, the agent queries arXiv, Papers with Code, and Semantic Scholar in real time, so results aren't limited to the model's training cutoff. It returns:

Research Lineage

Concept: Knowledge Distillation

2015 — Hinton et al., "Distilling the Knowledge in a Neural Network"
       https://arxiv.org/abs/1503.02531
       Key insight: Temperature-scaled softmax enables knowledge transfer
       between models of different sizes.

2019 — Sanh et al., "DistilBERT"
       https://arxiv.org/abs/1910.01108
       Key insight: BERT-scale distillation is practical and production-ready.

2020 — Wang et al., "MiniLM"
       https://arxiv.org/abs/2002.10957
       Key insight: Distilling the teacher's last-layer self-attention
       (attention distributions plus value relations) yields strong small models.

2023 — Hsieh et al., "Distilling Step-by-Step"
       https://arxiv.org/abs/2305.02301
       Key insight: LLM-generated rationales can be distilled as extra
       supervision, not just the final labels.

OSS Evaluation Matrix

Tool              License      Last Commit   Fit    Verdict
──────────────────────────────────────────────────────────
HF transformers   Apache-2.0   Active        High   ✅ Adopt
LLaMA-Factory     Apache-2.0   Active        Med    ✅ Evaluate
Paper code        Varies       Stale         Low    ❌ Reference only

License tiers are explicit: MIT/Apache-2.0 are Tier 1 (adopt freely), GPL is Tier 3 (legal review required), AGPL/SSPL are Tier 4 (do not adopt).
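The tier rule is simple enough to express as a lookup table. Below is a minimal Python illustration. It assumes SPDX license identifiers as input. `LICENSE_TIERS`, `license_tier`, and `license_action` are hypothetical names, not part of the framework. The post doesn't say which licenses fall into Tier 2, so anything not listed returns `None` and goes to a human.

```python
from typing import Optional

# Hypothetical mapping of SPDX identifiers to the tiers described above.
# Tier 2 is not defined in the post, so no license is assigned to it here.
LICENSE_TIERS = {
    "MIT": 1,
    "Apache-2.0": 1,
    "GPL-2.0-only": 3,
    "GPL-2.0-or-later": 3,
    "GPL-3.0-only": 3,
    "GPL-3.0-or-later": 3,
    "AGPL-3.0-only": 4,
    "AGPL-3.0-or-later": 4,
    "SSPL-1.0": 4,
}

TIER_ACTIONS = {
    1: "adopt freely",
    3: "legal review required",
    4: "do not adopt",
}


def license_tier(spdx_id: str) -> Optional[int]:
    """Return the tier for an SPDX id, or None if it is not classified."""
    return LICENSE_TIERS.get(spdx_id.strip())


def license_action(spdx_id: str) -> str:
    """Translate a license into the action its tier prescribes."""
    tier = license_tier(spdx_id)
    if tier is None:
        # Unlisted licenses (including anything that would be Tier 2) go to a human.
        return "unclassified: review manually"
    return TIER_ACTIONS[tier]


if __name__ == "__main__":
    for lic in ["Apache-2.0", "GPL-3.0-only", "SSPL-1.0", "MPL-2.0"]:
        print(f"{lic:14} -> {license_action(lic)}")
```

Unknown licenses are routed to manual review rather than defaulted to a tier. That keeps the table from quietly approving something nobody classified.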

Known Failure Patterns

Teacher bias propagates to the student model
Without quality gates on generated labels, distillation silently fails
Temperature and loss weighting are sensitive — small changes break training

This last section is what saves the most time. You don't just learn what the thing is called — you learn what breaks it, from people who already learned the hard way.
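The temperature-and-weighting failure pattern is easier to see once you know where those two knobs enter the objective. Below is a minimal Python sketch of the Hinton et al. (2015) loss. It is a weighted sum of two terms:

- a KL divergence between the temperature-softened teacher and student distributions, scaled by T² as the paper recommends (KL gives the same gradients as the paper's soft-target cross-entropy);
- ordinary cross-entropy against the true label at T = 1.

The sketch assumes you have teacher logits, which hosted LLM APIs often don't expose. The function names and the defaults `temperature=4.0` and `alpha=0.7` are illustrative, not recommendations.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    true_label: int,
    temperature: float = 4.0,  # illustrative value
    alpha: float = 0.7,        # illustrative weight on the soft term
) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    # Soft term: match the teacher's softened distribution at temperature T.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # Soft-term gradients shrink roughly as 1/T^2, so multiply by T^2 to compensate.
    soft = kl_divergence(p_teacher, p_student_soft) * temperature ** 2
    # Hard term: ordinary cross-entropy on the true label at T = 1.
    # (No clamping here, so extreme logits can underflow; fine for a sketch.)
    hard = -math.log(softmax(student_logits, 1.0)[true_label])
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [6.0, 2.0, -1.0]  # illustrative logits, not real model outputs
    student = [3.0, 2.5, 0.0]
    for t in (1.0, 2.0, 4.0, 8.0):
        loss = distillation_loss(student, teacher, true_label=0, temperature=t)
        print(f"T={t}: loss={loss:.4f}")
```

Suppose you drop the `temperature ** 2` factor. The soft term's gradients then shrink roughly as 1/T², so raising T silently shifts the balance toward the hard labels. That is one concrete way a "small change" breaks training.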

It Also Works for OSS and Technology Selection

Prior art investigation isn't only for research concepts. I've used it for:

OCR and PDF library selection — evaluating Tesseract vs EasyOCR vs cloud APIs across accuracy, license, offline support, and maintenance health before writing a single line of integration code.

Programming language technology decisions — when a project's language has specific constraints (runtime, ecosystem maturity, async model), the framework surfaces those tradeoffs from primary sources rather than Stack Overflow opinions.

The evaluation criteria in the prompts are not fixed. Because it's a prompt collection, you can adjust the selection matrix for your context — stricter license requirements, different maintenance thresholds, specific performance benchmarks. The framework adapts to what you're actually deciding.

The underlying question is always the same: what do I need to know before I commit to this?

Is this author an individual or an organization? (Long-term maintenance signal)
When was the last commit? (Health signal)
What's the license tier? (Legal risk signal)
How does it compare to the two closest alternatives?
Does this language/runtime have known limitations for this use case?
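The first four questions are mechanical enough to pre-screen in code before anyone reads a README. Below is a minimal Python sketch. It assumes you've already looked up each candidate's owner type, last commit date, and license tier. `Candidate`, `screen`, `compare`, and the 365-day staleness threshold are all hypothetical. The last question (language and runtime limits) still needs primary sources, not a script.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

# Hypothetical threshold: flag a project with no commits in the last year.
STALE_AFTER_DAYS = 365


@dataclass
class Candidate:
    name: str
    owner_is_org: bool   # Long-term maintenance signal
    last_commit: date    # Health signal
    license_tier: int    # Legal risk signal, using the post's tier numbers


def screen(candidate: Candidate, today: date) -> List[str]:
    """Return warning flags for one candidate; an empty list means no red flags."""
    flags = []
    if not candidate.owner_is_org:
        flags.append("individual maintainer: check who else can merge and release")
    age_days = (today - candidate.last_commit).days
    if age_days > STALE_AFTER_DAYS:
        flags.append(f"stale: last commit {age_days} days ago")
    if candidate.license_tier == 4:
        flags.append("license tier 4: do not adopt")
    elif candidate.license_tier != 1:
        flags.append(f"license tier {candidate.license_tier}: needs review")
    return flags


def compare(candidates: List[Candidate], today: date) -> List[Tuple[str, List[str]]]:
    """Screen a candidate and its closest alternatives; fewest flags first."""
    results = [(c.name, screen(c, today)) for c in candidates]
    return sorted(results, key=lambda r: len(r[1]))


if __name__ == "__main__":
    # Hypothetical candidates, not real projects.
    options = [
        Candidate("lib-a", owner_is_org=True, last_commit=date(2025, 5, 20), license_tier=1),
        Candidate("lib-b", owner_is_org=False, last_commit=date(2023, 2, 1), license_tier=1),
        Candidate("lib-c", owner_is_org=True, last_commit=date(2025, 4, 2), license_tier=3),
    ]
    for name, flags in compare(options, date.today()):
        print(name, flags or "no red flags")
```

Counting flags is a blunt ranking. It only decides what a human reads first, not what gets adopted.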

How It Integrates

Standalone

git clone https://github.com/as-we/prior-art-investigation
cd prior-art-investigation
make install

In VS Code + Copilot Chat:

/prior-art full I want to use LLM outputs to train a smaller ML model

Add #web to enable live search beyond the model's training cutoff.
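Under the hood, a live search is just a query against each source's public API. Here is an illustration only, not the framework's own code: a minimal Python sketch that queries arXiv's export API for a concept name and parses the Atom feed it returns. The helper names are hypothetical. Only the standard library is used.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Namespace used by arXiv's Atom feed


def arxiv_query_url(concept: str, max_results: int = 5) -> str:
    """Build an arXiv API URL that searches all fields for the exact phrase."""
    params = {
        "search_query": f'all:"{concept}"',
        "start": 0,
        "max_results": max_results,
        "sortBy": "relevance",
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def parse_arxiv_feed(xml_text: str) -> List[Tuple[str, str, str]]:
    """Extract (published date, title, abstract URL) from an arXiv Atom feed."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.findall(f"{ATOM}entry"):
        # Titles in the feed can contain line breaks; collapse the whitespace.
        title = " ".join(entry.findtext(f"{ATOM}title", "").split())
        published = entry.findtext(f"{ATOM}published", "")[:10]
        url = entry.findtext(f"{ATOM}id", "")
        results.append((published, title, url))
    return results


def search_arxiv(concept: str, max_results: int = 5) -> List[Tuple[str, str, str]]:
    """Fetch and parse results; requires network access."""
    with urllib.request.urlopen(arxiv_query_url(concept, max_results), timeout=30) as resp:
        return parse_arxiv_feed(resp.read().decode("utf-8"))
```

Calling `search_arxiv("knowledge distillation")` needs network access. The URL builder and the parser are kept separate so they can be checked offline.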

Wired into SDD Workflows

Once wired in, the framework runs at different depths depending on the phase, automatically and without manual triggering:

Phase	Questions	Depth
Requirements	Q1 + Q6	Quick check — 2 questions
Design	All 7	Full investigation
Tasks	Q7 only	So What check

By the time I'm writing tasks, the research is already done.
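The phase table is effectively a routing rule. Below is a minimal Python sketch of it as data, for anyone wiring the framework into a tool the post doesn't cover. `PHASE_PLAN`, `plan_for`, and `command_for` are hypothetical names. The sketch also assumes the `minimal` and `sowhat` modes are invoked like `/prior-art full`; the post shows them only as hook values.

```python
from typing import Dict, List, Tuple

# Mirrors the phase table: phase -> (investigation depth, questions asked).
PHASE_PLAN: Dict[str, Tuple[str, List[str]]] = {
    "requirements": ("minimal", ["Q1", "Q6"]),
    "design": ("full", [f"Q{i}" for i in range(1, 8)]),  # Q1..Q7
    "tasks": ("sowhat", ["Q7"]),
}


def plan_for(phase: str) -> Tuple[str, List[str]]:
    """Return the depth and question list for an SDD phase."""
    try:
        return PHASE_PLAN[phase.lower()]
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None


def command_for(phase: str) -> str:
    """Build the slash command that would run the investigation for this phase."""
    depth, _ = plan_for(phase)
    return f"/prior-art {depth}"


if __name__ == "__main__":
    for phase in ("requirements", "design", "tasks"):
        print(phase, command_for(phase), plan_for(phase)[1])
```

An unknown phase raises an error instead of silently skipping the check.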

GitHub SpecKit (VS Code + GitHub Copilot)

specify extension add prior-art-investigation --from <zip-url>

Then add to your .specify/extensions.yml:

hooks:
  before_specify: prior-art minimal
  before_plan:    prior-art full
  before_tasks:   prior-art sowhat

Three agent files handle each phase: prior-art-minimal.agent.md, prior-art-full.agent.md, prior-art-sowhat.agent.md.

Kiro SDD

Native hook integration via .kiro/hooks/. No additional setup beyond copying the hook files.

Claude Code

Add the snippet from claude-code/CLAUDE.md.snippet to your project's CLAUDE.md. Claude Code reads this as a persistent instruction and fires prior art checks at each phase automatically.

Cursor / Windsurf / other agent IDEs

Use the prompt files directly as agent prompts. Manual trigger required.

The Output Gets Recorded

Results aren't ephemeral. Each investigation writes to research.md:

## Named Concept

| Field | Value |
|-------|-------|
| Concept | Knowledge Distillation |
| First published | 2015 / Hinton et al., arXiv |
| Maturity | ✅ Production Ready |
| Paper URL | https://arxiv.org/abs/1503.02531 |
| Design impact | Use temperature scaling; add quality gate on LLM labels |
| Differentiation | Custom quality gate logic specific to our label schema |

## OSS Decision

| Package | License | Last Commit | Verdict |
|---------|---------|-------------|---------|
| HF transformers | Apache-2.0 | 2025-05 | ✅ Adopted |
| LLaMA-Factory | Apache-2.0 | 2025-04 | ❌ Overkill for this use case |

Future team members — or future you — can see exactly what was considered and why.
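Because the record is plain Markdown, generating it is straightforward. Below is a minimal Python sketch that renders the Named Concept table in the same layout as the example and appends it to research.md. The helper names are hypothetical. Appending rather than rewriting the file is an assumption, chosen so earlier decisions stay on record.

```python
from typing import Dict, List


def markdown_table(headers: List[str], rows: List[List[str]]) -> str:
    """Render a Markdown table in the same layout as the research.md example."""
    def cell(value: str) -> str:
        # A literal pipe would split the cell, so escape it.
        return str(value).replace("|", "\\|")

    lines = [
        "| " + " | ".join(cell(h) for h in headers) + " |",
        "|" + "|".join("-------" for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


def named_concept_section(fields: Dict[str, str]) -> str:
    """Build the '## Named Concept' section; dict order becomes row order."""
    rows = [[field, value] for field, value in fields.items()]
    return "## Named Concept\n\n" + markdown_table(["Field", "Value"], rows) + "\n"


def append_to_research(section: str, path: str = "research.md") -> None:
    """Append a section so earlier investigations stay on record (assumed behavior)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n" + section)


if __name__ == "__main__":
    print(named_concept_section({
        "Concept": "Knowledge Distillation",
        "Maturity": "Production Ready",
        "Paper URL": "https://arxiv.org/abs/1503.02531",
    }))
```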

Standing on the Shoulders of People Who Struggled

There's a manga called Chi. — Chikyuu no Undou ni Tsuite ("Chi. — About the Movement of the Earth"). It follows ordinary people across centuries who, at enormous personal cost, pursued the idea that the Earth moves around the Sun — not the other way around. Each of them built on the suffering and insight of the person before them, usually without recognition, often at great risk.

I think about that when I read a paper published in 2015.

Geoffrey Hinton didn't write "Distilling the Knowledge in a Neural Network" in an afternoon. That insight came from years of thinking about how biological neural systems learn, how compressed representations form, what it means for a model to "understand" rather than memorize. The footnotes in that paper point to decades of prior work by people I'll never know.

When I run /prior-art full and get back a research lineage in thirty seconds, I'm not just saving time. I'm being handed a map that took hundreds of people years of struggle to draw.

The least I can do is read it carefully.

This framework is built around that belief. Q5 — "Have I read primary sources, not just READMEs?" — is a discipline question as much as a research question. It asks: did you actually engage with what these people discovered, or did you skim the surface and move on?

Speed is valuable. Efficiency is valuable. But efficiency that treats human knowledge as a lookup table misses something important. The research lineage isn't just context — it's the record of how hard certain problems actually are, written by the people who found out the hard way.

Use this framework to go fast. But go fast with your eyes open.

Why This Matters Now

AI agents and SDD workflows have changed the speed of implementation. A well-framed problem statement becomes working code in hours, not weeks.

That's powerful. It's also dangerous.

When implementation is fast, the cost of starting with the wrong design compounds quickly. You ship fast in the wrong direction.

Prior art investigation is the check that keeps speed from becoming waste. Five minutes before you start. The research is already out there — someone already named it, studied it, failed at it, and wrote it down. This framework finds it before you repeat their mistakes.

DEV Community