Every time you ask an AI to write code, something disappears.
Not the code — the code shows up fine. What disappears is the trail. The GitHub discussion where someone spent two hours explaining why cursor-based pagination beats offset for live-updating datasets. The Stack Overflow answer from 2019 where one person, after a week of debugging, documented exactly why that approach fails under concurrent writes. The RFC your team wrote six months ago that established the pattern the AI just silently copied.
The AI consumed all of it. The humans who produced it got nothing.
And I don't mean "nothing" philosophically. I mean: no citation in the codebase. No way for a new developer to trace why the code is written the way it is. No signal to the person who wrote the original answer that their work mattered.
Over time, at scale, those people stop contributing. Why maintain a detailed GitHub discussion if AI will summarize it into oblivion and no one will read the original?
This is the quiet cost of AI-assisted development that nobody is measuring. I've been thinking about it for a while, and I built something to address it.
The scenario
A developer joins a team. Six months of AI-assisted codebase. They hit a bug in the pagination logic — cursor-based, unusual implementation, nobody on the team remembers why it was built that way. The original developer who designed it has left.
Old answer: two days of archaeology. git blame points to a commit message that says "fix pagination." The commit before that says "implement pagination." Dead end.
With poc.py trace src/utils/paginator.py, that same developer sees this in thirty seconds:
Provenance trace: src/utils/paginator.py
────────────────────────────────────────────────────────────
[HIGH] @tannerlinsley on github
Cursor pagination discussion
https://github.com/TanStack/query/discussions/123
Insight: cursor beats offset for live-updating datasets
Knowledge gaps (AI-synthesized, no human source):
• Error retry strategy — no human source cited
• Concurrent write handling — AI chose this arbitrarily
They now know exactly where the pattern came from and — critically — which parts of the code have no traceable human source. That second section is what saves them. The concurrent write handling is where the bug lives. AI made a choice nobody reviewed.
That's what this tool does. Not enforcement first. Archaeology first.
What I built
proof-of-contribution is a Claude Code skill that keeps the human knowledge chain intact inside AI-assisted codebases.
The core idea is simple: every AI-generated artifact should stay tethered to the human knowledge that inspired it. Not as a comment at the top of a file that nobody reads. As a structured, queryable, enforceable record that lives next to the code.
When the skill is active, Claude automatically appends a Provenance Block to every generated output:
## PROOF OF CONTRIBUTION
Generated artifact: fetch_github_discussions()
Confidence: MEDIUM
## HUMAN SOURCES THAT INSPIRED THIS
[1] GitHub GraphQL API Documentation Team
Source type: Official Docs
URL: docs.github.com/en/graphql
Contribution: cursor-based pagination pattern
[2] GitHub Community (multiple contributors)
Source type: GitHub Discussions
URL: github.com/community/community
Contribution: "ghost" fallback for deleted accounts
surfaced in bug reports
## KNOWLEDGE GAPS (AI synthesized, no human cited)
- Error handling / retry logic
- Rate limit strategy
## RECOMMENDED HUMAN EXPERTS TO CONSULT
- github.com/octokit community for pagination
The section that matters most is Knowledge Gaps. That's where AI admits what it synthesized without a traceable human source. No other tool I know of produces this. It's the part that turns "the AI wrote it" from a shrug into an auditable fact.
How Knowledge Gaps actually get detected
This is the part worth explaining carefully, because the obvious assumption — that the AI just introspects and reports what it doesn't know — is wrong. LLMs hallucinate confidently. An AI that could reliably detect its own knowledge gaps wouldn't produce knowledge gaps in the first place.
The detection mechanism is different. It's a comparison, not introspection.
When you use spec-writer before building, it generates a structured spec with an explicit assumptions list — every decision the AI is making that you didn't specify, each one impact-rated. That list is the contract: here is every claim this feature rests on.
When the code ships, proof-of-contribution cross-checks the final implementation against that contract. Anything the code does that doesn't map to a spec assumption or a cited human source gets flagged as a Knowledge Gap. The AI isn't grading its own exam. The spec is the answer key.
The result is deterministic. If the retry logic wasn't specified and no human source covers it, the gap appears in the block regardless of how confident the model was when it wrote the code. The boundary holds because it comes from the spec, not from the model's confidence.
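The comparison step reduces to a set difference. A minimal sketch of that logic, with the caveat that the function name, data shapes, and example strings here are illustrative, not poc.py's actual internals:

```python
def find_knowledge_gaps(implementation_units, spec_assumptions, cited_sources):
    """Flag implementation units covered by neither a spec assumption
    nor a cited human source. No model introspection involved: the
    boundary comes from the spec contract and the seeded citations."""
    covered = set(spec_assumptions) | set(cited_sources)
    return [unit for unit in implementation_units if unit not in covered]

units = ["cursor_pagination", "retry_logic", "rate_limit_strategy"]
assumptions = ["cursor_pagination"]      # from the spec-writer contract
sources = ["rate_limit_strategy"]        # from seeded human citations

print(find_knowledge_gaps(units, assumptions, sources))
# ['retry_logic']
```

However confident the model was when it generated the retry logic, the gap surfaces, because the answer key is external to the model.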
This is also why the confidence levels mean something. HIGH means the spec explicitly covered it or the user provided the source directly. MEDIUM means the pattern traces to recognized human-authored work but the exact source isn't pinned. LOW means the model synthesized it — human review strongly recommended before this code goes anywhere near production.
There's a second detection path that doesn't require spec-writer at all. poc.py verify runs Python's built-in ast module against the file and extracts every function definition, conditional branch, and return path. It cross-checks each one against the seeded claims. No API calls. No model confidence. Pure static analysis. When you run it on a file where import-spec was used first, only the assumptions with no resolved citation surface as gaps. When you run it cold, every uncited structural unit surfaces as a baseline. Either way, the AI's confidence at generation time is irrelevant — the boundary comes from the code's actual structure.
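The extraction side of that path can be sketched with the standard-library ast module. The function name and the tuple shape below are illustrative, not poc.py's actual internals:

```python
import ast

def structural_units(source: str):
    """Walk a file's AST and collect the units poc.py verify
    cross-checks against seeded claims: function definitions,
    conditional branches, and return paths."""
    tree = ast.parse(source)
    units = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            units.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.If):
            units.append(("branch", "if", node.lineno))
        elif isinstance(node, ast.Return):
            units.append(("return", "return", node.lineno))
    return units

code = """
def handle_concurrent_writes(rows):
    if not rows:
        return []
    return sorted(rows)
"""
print(structural_units(code))
```

Each tuple is a structural unit; any unit with no matching claim in the database is a deterministic gap.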
Four things the skill does
Provenance Blocks — attached automatically to any generated code, doc, or architecture output. You don't have to ask. It's always there.
Knowledge Graph schema — when you're building a system to track contributions at scale. Claude generates a complete graph schema for Neo4j, Postgres, or JSON-LD. Nodes for code artifacts, human sources, individual experts, AI sessions, and knowledge claims. Edges that let you ask: "who are the humans behind this module?" or "what did @username contribute to this codebase?"
Static analyser (poc.py verify) — runs after the agent builds. Parses the file's structure using Python's AST, cross-checks every function and branch against seeded claims, and reports deterministic Knowledge Gaps. Zero API calls. Exit code 0 means clean, 1 means gaps found — CI-compatible.
HITL (human-in-the-loop) Indexing architecture — when you want AI to surface human experts instead of summarizing them. The query interface returns Expert Cards:
Answer: Use cursor-based pagination with GraphQL endCursor.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
HUMAN EXPERTS ON THIS TOPIC
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
👤 @tannerlinsley (GitHub)
Expertise signal: 23 contributions on pagination patterns
Key contribution: github.com/TanStack/query/discussions/123
Quote: "Cursor beats offset when rows can be inserted mid-page"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Not a summary. A pointer. The human expert stays visible.
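The Knowledge Graph item above is easiest to see as a relational sketch. Here is a minimal version in SQLite (the tool's default store); every table and column name is illustrative, not the schema the skill actually generates:

```python
import sqlite3

# Minimal relational sketch of the knowledge graph. A claim row with a
# NULL source_id is exactly a Knowledge Gap: an assertion the code rests
# on with no human source behind it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE source   (id INTEGER PRIMARY KEY, author TEXT, url TEXT,
                       source_type TEXT);
CREATE TABLE claim    (id INTEGER PRIMARY KEY,
                       artifact_id INTEGER REFERENCES artifact(id),
                       source_id   INTEGER REFERENCES source(id),
                       insight TEXT,
                       confidence TEXT
                         CHECK (confidence IN ('HIGH','MEDIUM','LOW')));
""")
conn.execute("INSERT INTO artifact VALUES (1, 'src/utils/paginator.py')")
conn.execute("INSERT INTO source VALUES (1, '@tannerlinsley', "
             "'https://github.com/TanStack/query/discussions/123', "
             "'GitHub Discussions')")
conn.execute("INSERT INTO claim VALUES (1, 1, 1, "
             "'cursor beats offset', 'HIGH')")

# "Who are the humans behind this module?" becomes a join:
rows = conn.execute("""
    SELECT s.author, c.insight FROM claim c
    JOIN source s ON s.id = c.source_id
    WHERE c.artifact_id = 1
""").fetchall()
print(rows)
```

The same shape maps onto Neo4j or JSON-LD directly: tables become node types, foreign keys become edges.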
Getting started takes one command
I didn't want this to be another tool that requires you to choose a database before you can do anything. The default is SQLite. It works immediately.
# Install the skill
mkdir -p ~/.claude/skills
git clone https://github.com/dannwaneri/proof-of-contribution.git ~/.claude/skills/proof-of-contribution
# Scaffold your project (run once, in your repo root)
python ~/.claude/skills/proof-of-contribution/assets/scripts/poc_init.py
That creates four things:
- .poc/provenance.db — SQLite database, local only, gitignored
- .poc/config.json — project config, committed
- .github/PULL_REQUEST_TEMPLATE.md — PR template with an AI Provenance section
- .github/workflows/poc-check.yml — GitHub Action that fails PRs missing attribution
Then you get a local CLI:
python poc.py add src/utils/parser.py # record attribution interactively
python poc.py trace src/utils/parser.py # show full human attribution chain
python poc.py report # repo-wide provenance health
python poc.py experts # top cited humans in your graph
poc.py verify is what catches gaps before they become incidents:
python poc.py verify src/utils/csv_exporter.py
Verify: src/utils/csv_exporter.py
────────────────────────────────────────────────────────────
Structural units detected : 11
Seeded claims : 3
Covered by cited source : 2
Deterministic gaps : 1
Deterministic Knowledge Gaps (no human source):
• function: handle_concurrent_writes (lines 47–61)
Seeded assumption: concurrent write handling — AI chose this arbitrarily
Resolve: python poc.py add src/utils/csv_exporter.py
poc.py trace is the command I use most; it gives the full attribution picture. This is what it looks like on a real file:
Provenance trace: src/utils/csv_exporter.py
────────────────────────────────────────────────────────────
[HIGH] @juliandeangelis on github
Spec Driven Development at MercadoLibre
https://github.com/mercadolibre/sdd-docs
Insight: separate functional from technical spec
[MEDIUM] @tannerlinsley on github
Cursor pagination discussion
https://github.com/TanStack/query/discussions/123
Insight: cursor beats offset for live-updating datasets
Knowledge gaps (AI-synthesized, no human source):
• Error retry strategy — no human source cited
• CSV column ordering — AI chose this arbitrarily
The GitHub Action is for teams that already find the trace valuable
Once you've used poc.py trace enough times that it's saved you real hours — that's when you push the GitHub Action. Not before.
git add .github/ .poc/config.json poc.py
git commit -m "chore: add proof-of-contribution"
git push
After that, every PR gets checked. If a developer submits AI-assisted code without an ## 🤖 AI Provenance section in the PR description, the action fails and posts a comment explaining what's needed.
The opt-out is simple: write "100% human-written" anywhere in the PR body and the check skips.
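The check itself comes down to string matching on the PR body. A sketch of that logic, assuming the real workflow may implement it differently:

```python
AI_PROVENANCE_HEADER = "## 🤖 AI Provenance"
OPT_OUT = "100% human-written"

def check_pr_body(body: str):
    """Return (passed, reason) for a PR description.
    Illustrative only; not the action's actual source."""
    if OPT_OUT in body:
        return True, "skipped: declared 100% human-written"
    if AI_PROVENANCE_HEADER in body:
        return True, "provenance section found"
    return False, "missing " + AI_PROVENANCE_HEADER + " section"

ok, reason = check_pr_body("Refactor parser.\n\n100% human-written")
print(ok, reason)
```

In CI, a False result fails the check and triggers the explanatory comment.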
The enforcement works because the tool already saved them hours before they turned it on. The PR check isn't introducing friction — it's standardizing something people already want to do. That's the only version of a mandate that doesn't get gamed.
It works with spec-writer
I built spec-writer first. It turns vague feature requests into structured specs, technical plans, and task breakdowns before the agent starts building. The problem spec-writer solves is ambiguity before the code exists.
proof-of-contribution solves attribution after the code exists.
They connect at the assumption layer. spec-writer generates an assumptions list — every implicit decision the AI made that you didn't specify, impact-rated, with guidance on when to correct it. Each correction can now carry a citation. Each citation becomes a node in the knowledge graph. By the time a developer runs poc.py trace on a finished module, the full chain is visible:
feature request → spec decision → human source → code artifact
↑
poc.py verify closes this loop
without asking the AI what it missed
That chain is what I mean when I say AI should be a pointer to human expertise. Not a replacement. A pointer.
Why 2026 is the right time to build this
The tools are mature. Coding agents are shipping code at scale. The question of "who is responsible for this output?" is becoming real — in teams, in code reviews, in enterprise audits.
The provenance infrastructure doesn't exist yet. git blame tells you who committed. It doesn't tell you what human knowledge shaped the decision. That gap is getting wider every month.
proof-of-contribution is one piece of the infrastructure. It's not the whole answer. But it's the piece I could build, and it's the piece I think matters most: keeping the humans whose knowledge powers AI visible in the artifacts AI produces.
Install
mkdir -p ~/.claude/skills
git clone https://github.com/dannwaneri/proof-of-contribution.git ~/.claude/skills/proof-of-contribution
Works with Claude Code, Cursor, Gemini CLI, and any agent that supports the Agent Skills standard.
Top comments (5)
The provenance problem is real and underappreciated. What you're describing is a version of what happens in infrastructure too — a configuration decision gets copied between systems, the original context gets lost, and six months later nobody knows why the timeout is set to 47 seconds or why that specific network range is excluded.
The difference in infrastructure is that the blast radius of losing context is more immediately visible. A misconfigured firewall rule breaks something today. A lost reasoning trail in code breaks something in 18 months when someone refactors without understanding what they're touching.
Your trace tool addresses the symptom correctly. The deeper question — why AI systems don't naturally preserve attribution — is probably structural. The training data was consumed, not cited. The output is synthesis, not quotation. Building attribution back in requires explicit tooling like what you've built, because the model itself has no incentive to surface where it learned something.
I've been documenting architectural decisions in CLAUDE.md for exactly this reason — not for the AI, but for the next session of the AI, and for the human reviewing the output. It's a partial solution to the same problem.
The 47-second timeout analogy is doing real work. That's exactly the failure mode — context that made sense in the original decision becomes cargo cult config by the second transfer. The difference you're naming (18-month blast radius vs. same-day) is why provenance feels optional until it's catastrophic.
The structural point is the one that doesn't have a clean answer: training consumed, not cited. Building attribution back in is archaeologist work, not feature work. You're right that it has to be explicit tooling because the incentive to surface sources was never in the training objective.
The CLAUDE.md approach is interesting — documenting for the next session rather than the current one. I've been thinking about whether that pattern needs to be machine-readable rather than just human-legible. Something that lets the next session actually query the reasoning, not just encounter it in plaintext. Does your CLAUDE.md have any structure beyond prose, or is it narrative-first?
Narrative-first, mostly — but with enough structure that Claude Code can extract what it needs without reading everything.
The pattern I've settled on is sections with consistent headers and a "Current State" block at the top that acts as the entry point for a new session.
The "Current State" block is the machine-readable part in practice — Claude Code reads it first and uses it to orient without re-deriving everything from the codebase. The architecture decisions section is genuinely prose, because the reasoning rarely compresses into structured data without losing the nuance that makes it useful.
The machine-readable vs human-legible tension you're naming is real. My instinct is that structured data (JSON, YAML) would be more queryable but would break the incentive to maintain it — nobody wants to write JSON to explain why they chose one approach over another. Prose has lower friction to update, which means it actually gets updated.
The open question for me is whether there's a middle ground: structured enough that a model can run a semantic search against it, but loose enough that a human writes it naturally. Something like Obsidian's approach to linking thoughts without forcing schema. Haven't solved it.
The maintenance incentive point is the one that usually kills schema-first approaches. JSON for architectural reasoning is a losing proposition — the friction is too high exactly where the thinking is most complex. What you've described is a workable compromise: structured enough at the entry point, prose where the nuance lives.
The Obsidian analogy is the right frame. What that approach actually does is keep linking lightweight — [[decision]] is lower friction than a foreign key. The structure emerges from the connections, not from enforcing a schema upfront.
The middle ground might already exist in how you've described your "Architecture decisions" section: prose that a model can semantic-search without ever needing to parse it as data. Vectorized prose retrieval is probably more useful than structured queries for this class of content anyway — the reasoning rarely has clean boundaries you'd want to filter on.
What I haven't figured out is the staleness problem. "Current State — last updated [date]" breaks the moment someone forgets to update it, which is most of the time. Does your pattern have any forcing function for that, or is it discipline-dependent?
Interesting perspective, Daniel — and the tooling is genuinely clever, especially the Knowledge Gaps detection via deterministic spec comparison rather than relying on model introspection. That’s a smart, grounded approach.
I see the problem differently though. You’re describing how AI is making expert developers less visible. I’m coming from the opposite situation entirely: I’m not a traditional developer. I never wrote production code before AI.
CORE — a governed autonomous runtime with 686 source files, constitutional rules, and a self-correcting daemon — exists because AI made it possible for someone like me to build it. Not “faster.” Possible.
And here’s what strikes me most about the attribution question: who wrote the AI in the first place? Every pattern, idiom, and architectural instinct the model uses came from the collective output of millions of developers whose code ended up in the training data. The expertise wasn’t destroyed. It was distilled.
From where I stand, AI isn’t making human expertise invisible. It’s fundamentally changing who gets to have expertise in the first place — democratizing access to knowledge that used to be locked behind years of deliberate practice, expensive education, or gatekept communities.
Your tool solves a real and important problem for existing teams worried about long-term maintainability and credit. I just think the larger civilizational shift is the opposite of invisibility: it’s the boundary between “developer” and “non-developer” quietly dissolving.