Two bipartisan bills dropped in Congress this month that would force every AI company in America to publicly disclose every copyrighted work used in training. Including retroactively, for models that already exist.
The bills are the CLEAR Act in the Senate and the TRAIN Act in the House. They arrived the same week Anthropic was fighting a $3 billion music piracy lawsuit and two months after it settled a separate $1.5 billion case over pirated books. The timing is not subtle.
What the Bills Require
The CLEAR Act — Copyright Labeling and Ethical Reporting Act — was introduced February 10 by Senators Adam Schiff (D-CA) and John Curtis (R-UT). It requires AI companies to file notices with the U.S. Copyright Office listing copyrighted works used in training before releasing any new model. For models already deployed, retroactive disclosure is mandatory.
The Copyright Office would maintain a searchable public database. Penalties: $5,000 per missed notice, capped at $2.5 million. Courts can issue injunctions blocking use of undisclosed training data until the violation is cured. Copyright owners can sue directly.
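The penalty arithmetic is worth seeing concretely. A minimal sketch, using only the figures stated in the bill as described above (the function name is illustrative, not anything in the legislation):

```python
# CLEAR Act penalty structure as described: $5,000 per missed
# notice, capped at $2.5 million total.
PENALTY_PER_NOTICE = 5_000
PENALTY_CAP = 2_500_000

def clear_act_penalty(missed_notices: int) -> int:
    """Statutory penalty for a given count of missed notices."""
    return min(missed_notices * PENALTY_PER_NOTICE, PENALTY_CAP)

print(clear_act_penalty(10))       # 50000
print(clear_act_penalty(100_000))  # 2500000 (capped)
```

Note the cap is hit at just 500 works. For a model trained on hundreds of thousands of copyrighted works, the flat cap makes the per-notice fine largely symbolic; the real teeth are the injunctions and the private right of action.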
The House companion, the TRAIN Act (H.R. 7209), was introduced by Representatives Madeleine Dean (D-PA) and Nathaniel Moran (R-TX). It takes a different approach: modeled on the DMCA's anti-piracy subpoena process, it lets copyright holders compel AI companies to disclose whether specific works were used. The enforcement mechanism is brutal. If a company ignores a valid subpoena, courts apply a rebuttable presumption that the company copied the work. The burden of proof flips. Silence becomes admission.
The TRAIN Act's Senate companion (S. 2455) has four sponsors: Senators Peter Welch (D-VT), Marsha Blackburn (R-TN), Josh Hawley (R-MO), and Schiff.
Both bills are bipartisan. In a Congress that can't agree on lunch, that detail matters.
Everybody With a Union Endorsed It
Twenty-five creator organizations signed on to the CLEAR Act. SAG-AFTRA. The Writers Guild, East and West. The Directors Guild. IATSE. The American Federation of Musicians. The National Music Publishers Association. The Authors Guild. The RIAA. The Copyright Alliance. SoundExchange. The Television Academy. Artists Rights Alliance. The National Association of Voice Actors.
That coalition doesn't form unless the underlying grievance is specific and documented. It formed.
The Receipts Are Already Leaking
The bills arrive against a backdrop of lawsuits that have already exposed what AI training data looks like in practice.
In August 2025, Anthropic settled Bartz v. Anthropic for $1.5 billion — the largest copyright settlement in U.S. history. The case revealed that approximately 500,000 pirated books from shadow libraries like Library Genesis and Pirate Library Mirror were used to train Claude. Roughly $3,000 per work. Judge Alsup called the terms "fair." Final approval is scheduled for April 23, 2026.
Then on January 28, a coalition led by Universal Music Group, Concord, and ABKCO filed a $3 billion lawsuit against Anthropic over 20,000 pirated songs. The complaint alleges that co-founder Benjamin Mann personally used BitTorrent to download approximately five million pirated files from Library Genesis in 2021, and that CEO Dario Amodei authorized it. The publishers found the evidence during discovery in the Bartz case. If true, the CLEAR Act would require Anthropic to publicly catalog the same activity it just paid $1.5 billion to settle.
The New York Times lawsuit against OpenAI is still pending. So is the visual artists' class action against Stability AI, Midjourney, and DeviantArt. Getty Images sued Stability AI. The Intercept and Raw Story sued OpenAI. Authors including George R.R. Martin, John Grisham, and Sarah Silverman have filed separate claims.
No major AI company has voluntarily disclosed a complete list of training data.
Europe Is Already Doing This
The EU AI Act made training data disclosure mandatory as of August 2, 2025. General-purpose AI providers must individually identify large training datasets, disclose commercially licensed sources, list the top 10 percent of web domains scraped, and describe content moderation approaches. Models released before the deadline get until August 2027 to comply. Penalties run up to 15 million euros or 3 percent of global annual revenue.
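For comparison with the CLEAR Act's flat cap, the EU ceiling scales with company size. A sketch of the fine ceiling described above, assuming the greater of the two figures applies (the typical structure of EU fines; the article states only "up to 15 million euros or 3 percent of global annual revenue"):

```python
# EU AI Act fine ceiling as described: up to 15 million euros or
# 3 percent of global annual revenue (assumed: whichever is higher).
def eu_ai_act_max_fine(global_annual_revenue_eur: float) -> float:
    return max(15_000_000, 0.03 * global_annual_revenue_eur)

# A provider with 2 billion EUR in revenue faces up to 60 million EUR:
print(eu_ai_act_max_fine(2_000_000_000))  # 60000000.0
```

Unlike the CLEAR Act's $2.5 million cap, this exposure grows without bound as revenue grows, which changes the compliance calculus for the largest labs.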
The format uses narrative responses rather than rigid checklists. Updates are required every six months or sooner for material changes.
The CLEAR Act is narrower — it focuses on copyrighted works specifically rather than all training data. But the retroactive requirement is comparable, and the public database would be unprecedented in the United States.
What Happens If It Passes
Senator Schiff introduced the Generative AI Copyright Disclosure Act in 2024 as H.R. 7913. It required a "sufficiently detailed summary" of copyrighted works, with penalties starting at $5,000. It died in committee.
The CLEAR Act is the second attempt, upgraded with a Senate companion, bipartisan co-sponsorship, and a 25-organization endorsement wall. The political dynamics have shifted. Anthropic's settlements have made the abstract concrete. The UMG lawsuit put a face on the scale: five million files, downloaded via BitTorrent, by a named co-founder.
The opposition is already forming. On December 29, xAI sued to invalidate California's AB 2013, a state-level training data transparency law that took effect January 1. The arguments: Fifth Amendment takings (forced disclosure destroys trade secret value), Fourteenth Amendment vagueness (what counts as a "high-level summary"?), and First Amendment compelled speech. If those arguments succeed against California's law, which requires less granular disclosure than the CLEAR Act, the federal bills face the same constitutional gauntlet.
No major AI company has publicly endorsed either bill. OpenAI has urged the Trump administration to "prevent less innovative countries from imposing their legal regimes on American AI firms." AI-backed super PACs funded by Andreessen Horowitz and Palantir co-founder Joe Lonsdale have called proposed transparency laws "ideological and politically motivated legislation" that would "handcuff" the industry.
As of October 2025, at least 51 copyright lawsuits against AI companies were pending in U.S. courts. Not one has produced a ruling on the core fair use question. The CLEAR Act and TRAIN Act exist because Congress got tired of waiting for courts to answer a question the industry would prefer stay open forever.
If you work with AI, check out my AI prompt engineering packs on Polar — battle-tested prompts for developers.