DEV Community

VelocityAI


The Copyright Apocalypse: Why Training on Everything Might Be the Last Time Anyone Can Do It

The lawsuits are mounting. Authors, artists, musicians, and publishers are suing AI companies for training on their work without permission. The cases are complex. The outcomes are uncertain. But one thing is clear: the era of training on "everything" may be coming to an end. Future models may be trained on a fraction of the data. They may be significantly less capable. This is the Copyright Apocalypse.

We are at a crossroads. The current generation of AI was trained on a vast, unlicensed corpus of human creativity. The next generation may be trained on a carefully curated, legally scrubbed dataset. The difference in quality could be dramatic.

The Legal Landscape
The lawsuits are numerous and varied.

The Plaintiffs:

Authors (e.g., Sarah Silverman, John Grisham).

Visual artists.

Music publishers.

News organizations (e.g., The New York Times).

The Claims:

Copyright Infringement: The AI companies copied and used copyrighted works without permission.

Right of Publicity: The AI companies used artists' names and likenesses.

Unfair Competition: The AI companies created products that compete with the original works.

The Defenses:

Fair Use: The AI companies argue that training is transformative and does not harm the market for the original works.

Lack of Direct Copying: The AI companies argue that a model does not store copies of the works. It learns statistical patterns from them.

A Contrarian Take: The Lawsuits Are Not About Copyright. They Are About Control.

The legal arguments are about copyright. But the real issue is control. The creators want to control how their work is used. They want to be compensated. They want to be acknowledged.

The AI companies argue that their models are just reading. The creators argue that the companies are stealing. Both are right. The law is trying to catch up.

The Opt-Out Mechanisms
In response to the lawsuits, AI companies have introduced opt-out mechanisms.

The Mechanisms:

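Robots.txt: Website owners can ask web crawlers, including AI training crawlers, not to fetch their pages. The file is a request, not an enforcement mechanism.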

Data Removal Requests: Creators can request that their work be removed from training datasets.

NoAI Tags: New metadata tags that signal "do not train on this."
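To make the robots.txt mechanism concrete, here is a minimal sketch of the check a well-behaved crawler would run before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt content and the example.com URL are made up for illustration, and GPTBot appears only as an example of a crawler user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/essay.html"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Note that the check lives entirely in the crawler's own code. Nothing on the website's side enforces it, so a crawler that skips this step sees no difference.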

The Problem:

Opt-out is reactive, not proactive.

Most creators do not know about the mechanisms.

The mechanisms are easy to ignore.

A Contrarian Take: Opt-Out Is Not Consent. It Is a Trap.

Opt-out shifts the burden to the creator. It says: "We will use your work unless you tell us not to." That is not consent. That is permission presumed from silence.

A true consent regime would require opt-in. The AI companies would have to ask permission. They are not asking.
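The difference between the two regimes comes down to one default: what happens when a creator has said nothing. The sketch below is a hypothetical illustration, not any company's actual policy logic. The function name may_train and the regime labels are assumptions made for the example.

```python
from typing import Optional


def may_train(consent: Optional[bool], regime: str) -> bool:
    """Decide whether a work may be used for training.

    consent: True = creator said yes, False = creator said no,
             None = creator was never asked or never answered.
    regime:  "opt-in" or "opt-out" (hypothetical labels).
    """
    if regime not in ("opt-in", "opt-out"):
        raise ValueError(f"unknown regime: {regime}")
    if consent is not None:
        # An explicit answer is honored under both regimes.
        return consent
    # Silence is the only case where the regimes disagree.
    return regime == "opt-out"


if __name__ == "__main__":
    for regime in ("opt-in", "opt-out"):
        print(regime, "with silent creator:", may_train(None, regime))
```

Since most creators never answer at all, the treatment of None decides the outcome for most works.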

The Future of Training Data
If the lawsuits succeed, the future of training data will look very different.

The Optimistic Scenario:

The AI companies pay for licenses.

They create a "Spotify for text" where creators are compensated.

The models are still powerful.

The Pessimistic Scenario:

The AI companies lose the lawsuits.

They are forced to delete their datasets.

Future models are trained on a fraction of the data.

They are significantly less capable.

A Contrarian Take: The Pessimistic Scenario Is Unlikely.

The AI companies have deep pockets. They will not let the lawsuits destroy their business.

They will pay the settlements. They will negotiate the licenses. They will find a way to keep training on massive datasets. The question is not whether they will train. It is what they will train on.

The Quality Gap
If future models are trained on less data, they will be less capable.

The Gap:

Factual Accuracy: Less data means fewer facts.

Nuance: Less data means less subtlety.

Creativity: Less data means fewer surprising combinations.

The Consequence:

The next generation of AI may be a step backward.

The "golden age" of AI may be behind us.

A Contrarian Take: The Gap Might Be Smaller Than We Think.

The current models are trained on vast amounts of data. But they are also trained on vast amounts of noise. Much of the data is redundant.

A smaller, carefully curated dataset might be more efficient. It might produce better results.
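As a rough illustration of how redundancy can be stripped out, here is a minimal sketch of exact-duplicate removal: normalize each document, hash it, and keep only the first copy. This is a simplified assumption about what curation could involve, not a description of any real training pipeline. Production systems typically also look for near-duplicates, which this sketch does not attempt.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants hash the same."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, comparing normalized text."""
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    corpus = [
        "The law is trying to catch up.",
        "the law  is trying to catch up. ",  # same text, different spacing and case
        "A different sentence.",
    ]
    print(len(corpus), "documents before,", len(deduplicate(corpus)), "after")
```

Removing copies shrinks the dataset without removing any distinct text, which is why a smaller corpus is not automatically a poorer one.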

What You Can Do
You are not a lawyer. But you can still pay attention.

  1. Follow the Lawsuits:

The cases are ongoing.

The outcomes will shape the future of AI.

  2. Support Creators:

If you like a creator's work, support them directly.

Pay for their content. Share their work.

  3. Advocate for Fairness:

The AI companies should compensate creators.

The creators should have a say in how their work is used.

  4. Stay Informed:

The copyright apocalypse is not a single event. It is a process.

Stay informed about the latest developments.

The Last Lawsuit
The last lawsuit is not about copyright. It is about the future.

You ask: "What is the future of AI?"
The model says: "The future is uncertain."
You realize: The future depends on the choices we make today.

If you could design a fair system for training AI on copyrighted works, what would it look like? How would creators be compensated?
