I spent hours designing something that already had a name, a Wikipedia page, and 10 years of papers. Here's how I fixed that.
The Mistake
When I started building llm-distil-loop, I designed a system from scratch:
"Use an LLM to generate labeled training data, then train a smaller ML model on those outputs."
I wrote requirements. I sketched architecture. I started thinking about data schemas.
A few hours in, I searched for something loosely related — and found it.
Knowledge Distillation. A research field since Hinton et al., 2015. Hundreds of papers. Multiple production-ready OSS implementations. Documented failure patterns. A decade of practitioners learning what not to do.
I had been designing in a vacuum that didn't exist.
The problem wasn't that I was careless. The problem is that spec-driven development and AI agents make you move fast, and moving fast means skipping the "does this already have a name?" check.
That check is now a framework: prior-art-investigation.
What It Actually Does
It's a prompt collection, and I'll be honest about that upfront. Not a library, not a CLI tool. Prompts that wire into your spec-driven development (SDD) workflow.
But the prompts encode something non-trivial: the questions that senior engineers and system designers actually ask before committing to an approach.
The 7 Questions
Every investigation runs through some or all of these:
| # | Question | When |
|---|---|---|
| Q1 | Is the problem framing correct? Am I solving the right problem? | Requirements, Design |
| Q2 | Why hasn't this approach already become the standard? If it's obvious, why isn't everyone doing it? | Design |
| Q3 | Who has tried this and failed? How did they fail? | Design |
| Q4 | Who thinks most deeply about this domain? Where do their words live? | Design |
| Q5 | Have I read primary sources — papers, RFCs, commit logs, issues — not just READMEs and blog posts? | Design |
| Q6 | If this fails in the worst possible way, what causes it? What should I verify now? | Requirements, Design |
| Q7 | I now know the concept name. How does that change my design? | Tasks |
Q7 — the "So What" question — is the one most people skip. Finding the concept name isn't the goal. Letting it change your design is.
What It Returns
With live search enabled, the agent queries arXiv, Papers with Code, and Semantic Scholar at request time, so results are not limited by the model's knowledge cutoff. It returns:
Research Lineage
Concept: Knowledge Distillation
2015 — Hinton et al., "Distilling the Knowledge in a Neural Network"
https://arxiv.org/abs/1503.02531
Key insight: Temperature-scaled softmax enables knowledge transfer
between models of different sizes.
2019 — Sanh et al., "DistilBERT"
https://arxiv.org/abs/1910.01108
Key insight: BERT-scale distillation is practical and production-ready.
2020 — Wang et al., "MiniLM"
https://arxiv.org/abs/2002.10957
Key insight: Distilling the teacher's last-layer self-attention improves small model quality.
2023 — Hsieh et al., "Distilling Step-by-Step"
https://arxiv.org/abs/2305.02301
Key insight: LLM reasoning chains can be distilled, not just outputs.
OSS Evaluation Matrix
| Tool | License | Last Commit | Fit | Verdict |
|---|---|---|---|---|
| HF transformers | Apache-2.0 | Active | High | ✅ Adopt |
| LLaMA-Factory | Apache-2.0 | Active | Med | ✅ Evaluate |
| Paper code | Varies | Stale | Low | ❌ Reference only |
License tiers are explicit: MIT/Apache-2.0 are Tier 1 (adopt freely), GPL is Tier 3 (legal review required), AGPL/SSPL are Tier 4 (do not adopt).
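The tier rule above is simple enough to encode as a lookup. What follows is a minimal sketch, not code from the prior-art-investigation repository: the `Tier` enum, the `license_tier` function, and the family-prefix matching are all assumptions made for illustration. Tier 2 is left out because the text above does not define it.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Tiers named in the text above. Tier 2 is not defined there, so it is omitted."""
    ADOPT_FREELY = 1   # MIT, Apache-2.0
    LEGAL_REVIEW = 3   # GPL
    DO_NOT_ADOPT = 4   # AGPL, SSPL


# Hypothetical lookup keyed by license family: the upper-cased text
# before the first "-" in the identifier (e.g. "GPL-3.0-only" -> "GPL").
TIER_BY_FAMILY = {
    "MIT": Tier.ADOPT_FREELY,
    "APACHE": Tier.ADOPT_FREELY,
    "GPL": Tier.LEGAL_REVIEW,
    "AGPL": Tier.DO_NOT_ADOPT,
    "SSPL": Tier.DO_NOT_ADOPT,
}


def license_tier(license_id: str) -> Optional[Tier]:
    """Return the tier for a license identifier, or None if the text lists no tier for it."""
    family = license_id.strip().upper().split("-")[0]
    return TIER_BY_FAMILY.get(family)
```

Returning `None` for anything unlisted (LGPL, BSD, custom licenses) is a deliberate choice in this sketch: an unknown license should prompt a human decision rather than silently fall into a tier.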
Known Failure Patterns
- Teacher bias propagates to the student model
- Without quality gates on generated labels, distillation silently fails
- Temperature and loss weighting are sensitive — small changes break training
This last section is what saves the most time. You don't just learn what the thing is called — you learn what breaks it, from people who already learned the hard way.
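To make the temperature and loss-weighting pattern concrete, here is a minimal pure-Python sketch of temperature-scaled softmax and one widely used distillation objective: alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy on the hard label. This weighting is assumed here for illustration; it is not taken from llm-distil-loop, and the function names are hypothetical.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits / temperature. Higher temperature gives a flatter distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    hard_label: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, hard_label)."""
    p_teacher = softmax_with_temperature(teacher_logits, temperature)
    p_student = softmax_with_temperature(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Ordinary cross-entropy against the hard label, at temperature 1.
    ce = -math.log(softmax_with_temperature(student_logits, 1.0)[hard_label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

Both `temperature` and `alpha` enter the loss multiplicatively, which is one reason small changes to either can shift training behavior noticeably. The sketch does not guard against extreme logits that would drive a student probability to zero.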
It Also Works for OSS and Technology Selection
Prior art investigation isn't only for research concepts. I've used it for:
OCR and PDF library selection — evaluating Tesseract vs EasyOCR vs cloud APIs across accuracy, license, offline support, and maintenance health before writing a single line of integration code.
Programming language technology decisions — when a project's language has specific constraints (runtime, ecosystem maturity, async model), the framework surfaces those tradeoffs from primary sources rather than Stack Overflow opinions.
The evaluation criteria in the prompts are not fixed. Because it's a prompt collection, you can adjust the selection matrix for your context — stricter license requirements, different maintenance thresholds, specific performance benchmarks. The framework adapts to what you're actually deciding.
The underlying question is always the same: what do I need to know before I commit to this? For a library or language choice, that breaks down into:
- Is the maintainer an individual or an organization? (Long-term maintenance signal)
- When was the last commit? (Health signal)
- What's the license tier? (Legal risk signal)
- How does it compare to the two closest alternatives?
- Does this language/runtime have known limitations for this use case?
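The first three signals in the list above can be sketched as a small check with adjustable thresholds. This is an illustration only: the `Criteria` and `Candidate` types, the 365-day staleness default, and the example library name are assumptions, not part of the prompt collection.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Criteria:
    """Adjustable thresholds. The defaults are illustrative assumptions."""
    stale_after_days: int = 365
    allowed_licenses: FrozenSet[str] = frozenset({"MIT", "Apache-2.0"})


@dataclass(frozen=True)
class Candidate:
    name: str
    license: str
    last_commit: date
    maintained_by_org: bool


def signals(c: Candidate, today: date, criteria: Criteria = Criteria()) -> Dict[str, str]:
    """Summarize the maintenance, health, and license signals for one candidate."""
    age_days = (today - c.last_commit).days
    return {
        "maintenance": "organization" if c.maintained_by_org else "individual",
        "health": "active" if age_days <= criteria.stale_after_days else "stale",
        "license": "allowed" if c.license in criteria.allowed_licenses else "needs review",
    }
```

Passing a stricter `Criteria`, for example `Criteria(stale_after_days=90, allowed_licenses=frozenset({"MIT"}))`, changes the verdicts without touching the evaluation logic, mirroring how the prompts let you tighten the selection matrix for your own context.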
How It Integrates
Standalone
git clone https://github.com/as-we/prior-art-investigation
cd prior-art-investigation
make install
In VS Code + Copilot Chat:
/prior-art full I want to use LLM outputs to train a smaller ML model
Add #web for live search beyond training cutoff.
Wired into SDD Workflows
In workflows with hook support, the framework runs at a different depth in each phase, automatically and without manual triggering:
| Phase | Questions | Depth |
|---|---|---|
| Requirements | Q1 + Q6 | Quick check — 2 questions |
| Design | All 7 | Full investigation |
| Tasks | Q7 only | So What check |
By the time I'm writing tasks, the research is already done.
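The phase table above amounts to a small lookup from phase to question set. A minimal sketch, assuming hypothetical names (`PHASE_QUESTIONS`, `questions_for`) that do not come from the repository:

```python
from typing import Dict, Tuple

# Q1 through Q7, as listed in the 7 Questions table.
ALL_QUESTIONS: Tuple[str, ...] = tuple(f"Q{i}" for i in range(1, 8))

# Phase -> questions, copied from the table above.
PHASE_QUESTIONS: Dict[str, Tuple[str, ...]] = {
    "requirements": ("Q1", "Q6"),  # quick check
    "design": ALL_QUESTIONS,       # full investigation
    "tasks": ("Q7",),              # "So What" check
}


def questions_for(phase: str) -> Tuple[str, ...]:
    """Return the question IDs to run for a phase, or raise ValueError for an unknown phase."""
    try:
        return PHASE_QUESTIONS[phase.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None
```

Raising on an unknown phase, rather than defaulting to the full set, keeps a typo in a hook configuration from quietly running the wrong depth.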
GitHub SpecKit (VS Code + GitHub Copilot)
Register as a SpecKit Extension. It hooks into before_specify, before_plan, and before_tasks automatically:
specify extension add prior-art-investigation --from <zip-url>
Then add to your .specify/extensions.yml:
hooks:
  before_specify: prior-art minimal
  before_plan: prior-art full
  before_tasks: prior-art sowhat
Three agent files handle each phase: prior-art-minimal.agent.md, prior-art-full.agent.md, prior-art-sowhat.agent.md.
Kiro SDD
Native hook integration via .kiro/hooks/. No additional setup beyond copying the hook files.
Claude Code
Add the snippet from claude-code/CLAUDE.md.snippet to your project's CLAUDE.md. Claude Code reads this as a persistent instruction and fires prior art checks at each phase automatically.
Cursor / Windsurf / other agent IDEs
Use the prompt files directly as agent prompts. Manual trigger required.
The Output Gets Recorded
Results aren't ephemeral. Each investigation writes to research.md:
## Named Concept
| Field | Value |
|-------|-------|
| Concept | Knowledge Distillation |
| First published | 2015 (arXiv) / Hinton et al., NIPS 2014 Deep Learning Workshop |
| Maturity | ✅ Production Ready |
| Paper URL | https://arxiv.org/abs/1503.02531 |
| Design impact | Use temperature scaling; add quality gate on LLM labels |
| Differentiation | Custom quality gate logic specific to our label schema |
## OSS Decision
| Package | License | Last Commit | Verdict |
|---------|---------|-------------|---------|
| HF transformers | Apache-2.0 | 2025-05 | ✅ Adopted |
| LLaMA-Factory | Apache-2.0 | 2025-04 | ❌ Overkill for this use case |
Future team members — or future you — can see exactly what was considered and why.
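For readers who want to produce the same record format from their own scripts, here is a minimal sketch that renders a Field/Value table like the one above and appends it to a file. The helper names `render_record` and `append_record` are hypothetical; in the framework itself the agent prompts write this file.

```python
from typing import Mapping


def render_record(title: str, fields: Mapping[str, str]) -> str:
    """Render a heading plus a two-column Markdown table, one row per field."""
    lines = [f"## {title}", "| Field | Value |", "|-------|-------|"]
    for field, value in fields.items():
        # Escape pipes so a value cannot break the table row.
        safe = str(value).replace("|", "\\|")
        lines.append(f"| {field} | {safe} |")
    return "\n".join(lines) + "\n"


def append_record(path: str, title: str, fields: Mapping[str, str]) -> None:
    """Append a rendered record to the given file (for example, research.md)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(render_record(title, fields))
```

Appending rather than overwriting preserves earlier investigations, so the file keeps its value as a decision log.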
Standing on the Shoulders of People Who Struggled
There's a manga called Chi. — Chikyuu no Undou ni Tsuite ("Chi. — About the Movement of the Earth"). It follows ordinary people across centuries who, at enormous personal cost, pursued the idea that the Earth moves around the Sun — not the other way around. Each of them built on the suffering and insight of the person before them, usually without recognition, often at great risk.
I think about that when I read a paper published in 2015.
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean didn't write "Distilling the Knowledge in a Neural Network" in an afternoon. That insight came from years of thinking about how biological neural systems learn, how compressed representations form, what it means for a model to "understand" rather than memorize. The references in that paper point to decades of prior work by people I'll never know.
When I run /prior-art full and get back a research lineage in thirty seconds, I'm not just saving time. I'm being handed a map that took hundreds of people years of struggle to draw.
The least I can do is read it carefully.
This framework is built around that belief. Q5 — "Have I read primary sources, not just READMEs?" — is a discipline question as much as a research question. It asks: did you actually engage with what these people discovered, or did you skim the surface and move on?
Speed is valuable. Efficiency is valuable. But efficiency that treats human knowledge as a lookup table misses something important. The research lineage isn't just context — it's the record of how hard certain problems actually are, written by the people who found out the hard way.
Use this framework to go fast. But go fast with your eyes open.
Why This Matters Now
AI agents and SDD workflows have changed the speed of implementation. A well-framed problem statement becomes working code in hours, not weeks.
That's powerful. It's also dangerous.
When implementation is fast, the cost of starting with the wrong design compounds quickly. You ship fast in the wrong direction.
Prior art investigation is the check that keeps speed from becoming waste. Five minutes before you start. The research is already out there — someone already named it, studied it, failed at it, and wrote it down. This framework finds it before you repeat their mistakes.
Links
- prior-art-investigation: github.com/as-we/prior-art-investigation — MIT License
- llm-distil-loop: github.com/as-we/llm-distil-loop — the project where I learned I needed this — Apache-2.0
Built while working on a music analysis distillation pipeline. The irony of needing prior art investigation while building a prior art investigation tool was not lost on me.