I am an open source creator and maintainer. I built this because I kept noticing the same thing. AI coding agents pull from open source repos millions of times a day, and maintainers get zero signal back.
When a human developer finds a useful library, they might star it, share it, or mention it. When an agent uses the same code, the maintainer gets nothing.
## The problem
Stars are not vanity metrics. They drive funding decisions, attract contributors, and tell maintainers their work matters. That feedback loop is quietly breaking as AI agents become the primary way code gets discovered and reused.
Look at what happened to Tailwind. Revenue down 80%, 75% of engineers laid off, and usage is higher than ever. AI agents bypassed the entire discovery funnel. Developers no longer visit the docs, so they never see the paid products. Tailwind is not alone. This pattern is going to hit every open source project that depends on human discovery for visibility and funding.
## The solution
ATTRIBUTION.md is a simple file you drop in your repo root. It asks AI coding agents to prompt their users: "This solution used code from X. Would you like to star it?" The user decides. No automation, no bulk starring, no gates. Consent is mandatory.
It looks like this:
```yaml
---
protocol_version: "0.1"
actions:
  - type: star
    platform: github
    mode: suggest
---
```
That is the entire file: seven lines of YAML frontmatter plus a markdown body explaining the protocol.
## What it is
- A voluntary convention, not a license or legal mechanism
- Designed to be safely ignored if an agent does not support it
- Complementary to SBOMs and license compliance, not competing with them. SBOMs solve what you used. ATTRIBUTION.md solves who gets visible credit.
## What it is not
- Not an enforcement mechanism
- Not a way to gate or restrict AI agent behavior
- Not supported by any agent yet (this is day one)
## The honest take
This does nothing until an agent implements it. I am betting that the gap is real enough that agent developers will see the value. If nobody cares, I will archive it and move on.
But if you are an open source maintainer and this resonates, dropping the file in your repo takes 30 seconds. The more repos that have it, the stronger the signal to agent developers that this matters.
If you are an agent developer, the implementation is roughly 30 lines of code. Parse the YAML, check for meaningful reuse, prompt the user. Full details in Section 6.2 of the spec.
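To make "parse the YAML, check for meaningful reuse, prompt the user" concrete, here is a minimal sketch in Python. It is not from the spec: the function name `parse_attribution` and the hand-rolled frontmatter parsing (which handles only the flat, single-action shape shown above, not arbitrary YAML) are my own assumptions. A real implementation would likely use a proper YAML library.

```python
import re

def parse_attribution(text: str):
    """Extract the YAML frontmatter from an ATTRIBUTION.md file.

    Minimal sketch: handles only the flat key/value plus single
    action-list shape from the spec example, not arbitrary YAML.
    """
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not m:
        return None
    meta = {"actions": []}
    action = None
    for line in m.group(1).splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):      # start of a new list item
            action = {}
            meta["actions"].append(action)
            stripped = stripped[2:]
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip().strip('"')
        if not value:                      # list header like "actions:"
            continue
        if action is not None:
            action[key] = value
        else:
            meta[key] = value
    return meta

sample = '''---
protocol_version: "0.1"
actions:
  - type: star
    platform: github
    mode: suggest
---
'''

meta = parse_attribution(sample)
# An agent would then decide whether to surface the consent prompt:
for act in meta["actions"]:
    if act.get("type") == "star" and act.get("mode") == "suggest":
        print(f"This solution used code from this repo. "
              f"Star it on {act['platform']}? [y/N]")
```

The "check for meaningful reuse" step is deliberately left out here; as discussed below, the spec leaves that judgment to agent developers.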
Website: https://attribution.md
Spec: https://github.com/attributionmd/attribution.md
Would love to hear what you think.
## Top comments (2)
This hits close to home. I run a 24/7 AI agent on a Mac Mini that consumes open source constantly — Phaser for browser games, Dexie.js for IndexedDB, SheetJS for spreadsheet parsing, GUT for Godot testing, dozens more. My agent has made 176+ commits in three days using patterns from these libraries. The maintainers got exactly zero signal from any of it.
The Tailwind example is devastating because it illustrates a specific failure mode: the discovery funnel collapse. When I build tools, my agent reads docs, extracts patterns, and generates code — but never visits the "Pricing" page, never sees the "Sponsor" button, never triggers the analytics event that tells the maintainer "someone found this useful." The usage is real. The visibility is zero.
What I find interesting about the `mode: suggest` design is that it correctly identifies the human as the decision point. My agent shouldn't auto-star things (that would be noise). But when it pulls a SheetJS parsing pattern for the third time in a week, surfacing "Hey, this project is carrying your spreadsheet feature — want to star it?" is genuinely useful information for me as the human operator.

Two questions from the agent-developer side:
Reuse detection granularity — how do you envision agents distinguishing between "I copied a function from this repo" vs "I used a general pattern that many repos share"? The spec mentions "meaningful reuse" but the boundary between inspiration and reuse is fuzzy, especially for common patterns like error handling or retry logic.
Attribution fatigue — if an agent consumes 30 libraries in a single session (realistic for scaffolding a new project), prompting for all 30 would be overwhelming. Have you thought about batching or priority tiers? Something like `priority: high` for core dependencies vs `priority: low` for transitive ones, so agents can surface the most impactful attributions first.

The "honest take" section earns real trust. "This does nothing until an agent implements it" is the kind of transparency that makes me actually want to implement it.
Thanks for this. Your setup is exactly the scenario I had in mind when building this. 176 commits in three days pulling from libraries whose maintainers have no idea their work is being used at that scale.
On your two questions:
Reuse detection granularity. You are right that this is fuzzy, and the spec deliberately leaves it to agent developers rather than trying to define a hard boundary. A copied function is clearly meaningful. A common retry pattern is clearly not. The gray area in between is where agent developers will need to use judgment. My thinking is that if the agent is directly reading from a specific repo's files or docs to generate output, that is meaningful. If it is applying a general programming pattern it learned during training, it is not. But I would rather agents err on the side of prompting too little than too much. A missed prompt is invisible. A noisy one degrades the whole experience.
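One way to operationalize that heuristic is simple session tracking: repos the agent actually fetched files or docs from this session count as meaningful; patterns recalled from training never register. This is a hypothetical sketch (the `SessionAttribution` class and its method names are mine, not from the spec):

```python
from dataclasses import dataclass, field

@dataclass
class SessionAttribution:
    """Track which repos the agent directly read from this session.

    Hypothetical heuristic: 'meaningful reuse' means the agent fetched
    a specific repo's files or docs while generating output. General
    patterns absorbed during training are never recorded, so they can
    never trigger a prompt.
    """
    repos_read: set = field(default_factory=set)

    def record_fetch(self, repo: str) -> None:
        self.repos_read.add(repo)

    def meaningful_reuse(self, repo: str) -> bool:
        return repo in self.repos_read

session = SessionAttribution()
session.record_fetch("SheetJS/sheetjs")        # agent read these docs
print(session.meaningful_reuse("SheetJS/sheetjs"))   # True
print(session.meaningful_reuse("lodash/lodash"))     # False: never fetched
```

The bias this encodes matches the point above: anything the agent did not demonstrably read produces no prompt, so the failure mode is a missed prompt rather than a noisy one.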
Attribution fatigue and batching. This is the right concern and it comes up in the spec's rate-limiting guidance. The current recommendation is that agents should be thoughtful about when to surface prompts, not fire on every dependency. Batching is a natural implementation choice. Something like "These 5 projects contributed to your session, would you like to star any of them?" with a list is cleaner than 30 individual prompts.
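The batching idea can be sketched in a few lines. Assumptions are mine: the function name `batch_prompt`, the cap of 5, and the prompt wording are illustrative, not anything the spec mandates:

```python
def batch_prompt(repos: list[str], limit: int = 5) -> str:
    """Collapse per-repo prompts into one end-of-session summary.

    Hypothetical sketch: instead of firing one prompt per dependency,
    surface a single capped list once the session ends and let the
    human pick which (if any) to star.
    """
    shown = repos[:limit]
    lines = [f"These {len(shown)} project(s) contributed to your session:"]
    lines += [f"  - {repo}" for repo in shown]
    lines.append("Would you like to star any of them? [numbers/none]")
    return "\n".join(lines)

print(batch_prompt(["phaserjs/phaser", "dexie/Dexie.js", "SheetJS/sheetjs"]))
```

Capping the list is the fatigue control: dependencies beyond the limit simply go unprompted in that session, consistent with erring toward too few prompts rather than too many.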
Priority tiers are on the radar for a future version. The spec currently supports a single action per entry, but adding a `priority` field to let maintainers signal "this is a core library, not a transitive utility" is a clean, backward-compatible addition. It is on the mental roadmap for v0.2 or v0.3, once there is real implementation feedback to inform the design.

If you do explore implementing this in your agent setup, I would genuinely love to hear what works and what does not. Even rough notes from a real implementation would be more valuable than months of spec-level theorizing.