Asmit Phuyal

Originally published at blog.asmitphuyal.com.np

Model Collapse: When AI Learns from AI

Let’s imagine a line of people playing the telephone game. The first person, F, whispers a message to E. E whispers what she heard to D, and the chain continues until the message reaches A.

By the time A receives the message, it is usually very different from what F originally said, full of distortions and inaccuracies.

The same thing happens in AI model training. If synthetic (AI-generated) data is used to train the next model, and that model’s synthetic output is in turn used to train another, each generation drifts further from the original human data. The final model tends to produce more homogeneous output: more error-prone, less useful, less diverse, and less accurate.
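To make the telephone game concrete, here is a minimal toy simulation of recursive training (my own sketch, not from any particular paper). Assume the “model” is nothing more than an estimate of word frequencies, and each generation is trained only on a finite sample of the previous generation’s output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: a vocabulary where a few words dominate and many
# are rare, a crude stand-in for the diversity of human text.
vocab = np.arange(1000)
probs = 1.0 / (vocab + 1.0)   # Zipf-like frequencies
probs /= probs.sum()

for generation in range(1, 7):
    # "Train" the next model: estimate word frequencies from a
    # finite sample drawn from the previous generation's model.
    sample = rng.choice(vocab, size=5_000, p=probs)
    counts = np.bincount(sample, minlength=len(vocab))

    # Any word that never appeared in the sample gets probability
    # zero and can never come back in later generations.
    probs = counts / counts.sum()

    surviving = int((probs > 0).sum())
    print(f"generation {generation}: {surviving} of {len(vocab)} words survive")
```

Run it and the surviving vocabulary shrinks every generation, while the most common words take up an ever larger share. Real model collapse is far more subtle than this, but the mechanism is the same: the tails of the distribution disappear first.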

Let’s get deeper into it.

You are probably familiar with the importance of diversity in ecosystems. The same applies to AI training data: without diverse, human-generated inputs, models risk collapse.

With the rapid rise of LLMs, the internet is increasingly flooded with AI-generated content. Since LLMs are trained heavily on data scraped from the web, future training datasets will inevitably include AI-generated text.

Organizations building LLMs value human-generated data, which may become harder to find over time. One idea I’ve heard is to make AI-generated data easily identifiable, so future systems can separate synthetic data from real. In other words, a watermark: information embedded in AI-generated outputs that humans can’t detect but machines can.
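One published idea along these lines is statistical watermarking, such as the “green list” scheme of Kirchenbauer et al. (2023). The sketch below is a heavily simplified toy version of that idea, not anyone’s production implementation; the 50-word vocabulary and all names here are made up for illustration:

```python
import hashlib
import random

VOCAB = [f"tok{i}" for i in range(50)]  # toy vocabulary
GREEN_FRACTION = 0.5                    # share of vocab marked "green"

def green_list(prev_token: str) -> set[str]:
    # Seed a PRNG with a hash of the previous token, so the generator
    # and the detector derive the exact same "green" subset of the
    # vocabulary at every position without sharing any extra data.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    shuffled = VOCAB[:]
    random.Random(seed).shuffle(shuffled)
    return set(shuffled[: int(len(VOCAB) * GREEN_FRACTION)])

def generate_watermarked(length: int, seed: int = 0) -> list[str]:
    # A real LLM would merely bias its logits toward the green list;
    # this toy "model" just samples from the green list directly.
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB)]
    for _ in range(length - 1):
        tokens.append(rng.choice(sorted(green_list(tokens[-1]))))
    return tokens

def green_score(tokens: list[str]) -> float:
    # Fraction of tokens that fall in their position's green list.
    # Unwatermarked text scores near GREEN_FRACTION (~0.5);
    # watermarked text scores far higher.
    hits = sum(tokens[i] in green_list(tokens[i - 1])
               for i in range(1, len(tokens)))
    return hits / (len(tokens) - 1)

print(green_score(generate_watermarked(200)))                   # ~1.0
print(green_score([random.choice(VOCAB) for _ in range(200)]))  # ~0.5
```

Detection here is purely statistical, it needs many tokens to be reliable, and it only works if the watermark survives until the text is scraped.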

But here’s the catch: there are already tools that make AI-generated content appear more human, stripping away exactly these kinds of signals. If that laundered AI data is used to train models again, what happens? A deadlock? An infinite feedback loop? Irreversible defects?

Here’s what I think happens if a model collapses:

  1. Poor performance on rare cases: models will struggle with edge-case scenarios.
  2. Less diverse output: users expect unique responses but get repetitive, similar ones.
  3. Amplified bias on rare topics: models lean ever harder on dominant patterns.

Model collapse is a significant challenge for the future development of robust and reliable AI. If we don’t pay attention now, future AIs might lose the very thing that made them powerful — their connection to human experience.
