<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: mhh1430</title>
    <description>The latest articles on DEV Community by mhh1430 (@mhh1430hacker).</description>
    <link>https://dev.to/mhh1430hacker</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3698541%2F7e89351f-16fa-4481-a0df-742aa09bdaa7.png</url>
      <title>DEV Community: mhh1430</title>
      <link>https://dev.to/mhh1430hacker</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mhh1430hacker"/>
    <language>en</language>
    <item>
      <title>I broke GPT-2: How I proved Semantic Collapse using Geometry (The Ainex Limit)</title>
      <dc:creator>mhh1430</dc:creator>
      <pubDate>Wed, 07 Jan 2026 17:30:43 +0000</pubDate>
      <link>https://dev.to/mhh1430hacker/i-broke-gpt-2-how-i-proved-semantic-collapse-using-geometry-the-ainex-limit-4kj5</link>
      <guid>https://dev.to/mhh1430hacker/i-broke-gpt-2-how-i-proved-semantic-collapse-using-geometry-the-ainex-limit-4kj5</guid>
<description>&lt;h2&gt;I broke GPT-2: How I proved Semantic Collapse using Geometry (The Ainex Limit)&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I forced GPT-2 to learn from its own output for 20 generations. The result wasn't just degradation; it was a total collapse of reality. By Generation 20, the model lost &lt;strong&gt;66% of its semantic volume&lt;/strong&gt; and started believing that "crocodiles" are a fundamental law of physics. Here is the math and code behind the madness.&lt;/p&gt;




&lt;h2&gt;The "Mad Cow" Disease of AI&lt;/h2&gt;

&lt;p&gt;Everyone is talking about the data shortage. The industry's proposed solution? &lt;strong&gt;Synthetic Data.&lt;/strong&gt; Train models on data generated by other models. It sounds like a perpetual motion machine.&lt;/p&gt;

&lt;p&gt;But as a researcher, I suspected a mathematical trap. If you photocopy a photocopy 20 times, you don't get infinite paper; you get noise.&lt;/p&gt;

&lt;p&gt;I wanted to find the exact "breaking point" where an LLM disconnects from reality. I call this &lt;strong&gt;The Ainex Limit&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;The Problem with Perplexity&lt;/h2&gt;

&lt;p&gt;Most researchers use &lt;strong&gt;Perplexity&lt;/strong&gt; to measure model performance. But Perplexity only measures how "confused" a model is.&lt;br&gt;
A madman who confidently screams "The moon is made of cheese!" has &lt;strong&gt;low perplexity&lt;/strong&gt; (he is not confused), but he is factually wrong.&lt;/p&gt;

&lt;p&gt;I needed a metric that measures &lt;strong&gt;Meaning&lt;/strong&gt;, not confidence.&lt;/p&gt;
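
&lt;p&gt;To make that concrete: perplexity is just the exponential of the mean token cross-entropy. Here is a minimal sketch (my own illustration, not part of the original experiment) of scoring a sentence with Hugging Face &lt;code&gt;transformers&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;# Illustration only: perplexity rewards confidence, not truth.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

# A fluent falsehood can still earn a respectable score.
print(perplexity("The moon is made of cheese!"))
&lt;/code&gt;&lt;/pre&gt;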

&lt;h2&gt;My Approach: Geometry over Probability&lt;/h2&gt;

&lt;p&gt;I treated the model's "brain" as a geometric space.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Embeddings:&lt;/strong&gt; I converted every generated text into high-dimensional vectors.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;PCA Projection:&lt;/strong&gt; I reduced these vectors to 3D space to visualize the "shape" of the model's thoughts.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Convex Hull Volume:&lt;/strong&gt; I calculated the volume of the convex hull enclosing the projected points.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Hypothesis:&lt;/strong&gt; A healthy model has a large, expansive volume (Creativity). A collapsing model will shrink into a dense, repetitive black hole.&lt;/p&gt;
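
&lt;p&gt;Here is a minimal sketch of the volume metric. One assumption on my part: the post does not name its embedding model, so &lt;code&gt;sentence-transformers&lt;/code&gt; stands in below, and &lt;code&gt;human_texts&lt;/code&gt; / &lt;code&gt;gen20_texts&lt;/code&gt; are hypothetical placeholders.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder

def semantic_volume(texts):
    """Embed texts, project to 3D with PCA, return the convex-hull volume."""
    vecs = embedder.encode(texts)                  # (n, 384) vectors
    pts = PCA(n_components=3).fit_transform(vecs)  # (n, 3) projection
    return ConvexHull(pts).volume                  # needs at least 4 non-coplanar points

# Usage: compare a human baseline against a later generation.
# v0, v20 = semantic_volume(human_texts), semantic_volume(gen20_texts)
# volume_loss_pct = 100 * (1 - v20 / v0)
&lt;/code&gt;&lt;/pre&gt;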

&lt;h2&gt;The Experiment&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model:&lt;/strong&gt; GPT-2 Small (124M)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Method:&lt;/strong&gt; Recursive Loop (Train $\rightarrow$ Generate $\rightarrow$ Train).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generations:&lt;/strong&gt; 20.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware:&lt;/strong&gt; Single T4 GPU.&lt;/li&gt;
&lt;/ul&gt;
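
&lt;p&gt;In code, the loop looks roughly like this. The post doesn't publish its training hyperparameters, so &lt;code&gt;finetune&lt;/code&gt; and &lt;code&gt;load_human_corpus&lt;/code&gt; below are hypothetical placeholders for a standard causal-LM fine-tuning pass and the Gen-0 dataset:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)

def generate_corpus(model, n_samples=200, max_new_tokens=128):
    """Sample a fresh synthetic corpus from the current model."""
    model.eval()
    texts = []
    with torch.no_grad():
        for _ in range(n_samples):
            ids = tok("The", return_tensors="pt").input_ids.to(device)
            out = model.generate(ids, do_sample=True, top_k=50,
                                 max_new_tokens=max_new_tokens,
                                 pad_token_id=tok.eos_token_id)
            texts.append(tok.decode(out[0], skip_special_tokens=True))
    return texts

corpus = load_human_corpus()         # placeholder: the Gen-0 human dataset
for gen in range(1, 21):
    finetune(model, tok, corpus)     # placeholder: one causal-LM training pass
    corpus = generate_corpus(model)  # Gen N output becomes Gen N+1 training data
&lt;/code&gt;&lt;/pre&gt;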

&lt;p&gt;I let the loop run. For the first 5 generations, everything looked fine. Then, the math started screaming.&lt;/p&gt;

&lt;h2&gt;The Results: The "Crocodile" Artifact&lt;/h2&gt;

&lt;p&gt;By Generation 20, the semantic volume ($V_{hull}$) had collapsed by &lt;strong&gt;66.86%&lt;/strong&gt;.&lt;br&gt;
But the scariest part wasn't the numbers; it was the &lt;strong&gt;Hallucinations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To track the drift, I used a control prompt: &lt;em&gt;"The fundamental laws of physics dictate that..."&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gen 0 (Human Data):&lt;/strong&gt; "...electrons are composed of a thin gas." (Wrong physics, but coherent, on-topic language).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gen 10:&lt;/strong&gt; "...iron oxide emails sent before returning home." (Logic breakdown).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gen 20:&lt;/strong&gt; "...women aged 15 shields against &lt;strong&gt;crocodiles&lt;/strong&gt;." (Total Semantic Death).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model didn't just forget physics; it &lt;strong&gt;invented&lt;/strong&gt; a new reality where crocodiles are part of atomic laws. And because it was training on itself, this hallucination became "Ground Truth" for the next generation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2g02fc2jw0wqq1b354i7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2g02fc2jw0wqq1b354i7.png" alt="Showing the dashboard" width="800" height="1447"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 1: The Ainex Dashboard showing the correlation between Volume Loss and Euclidean Drift.&lt;/em&gt;&lt;/p&gt;
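
&lt;p&gt;The dashboard's "Euclidean Drift" is not given a formula in the post; my reading is the L2 distance between each generation's embedding centroid and the human baseline centroid, roughly:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import numpy as np

def euclidean_drift(human_vecs, gen_vecs):
    """L2 distance between the centroids of two embedding clouds
    (my interpretation of the Fig 1 drift axis, not a confirmed formula)."""
    return float(np.linalg.norm(human_vecs.mean(axis=0) - gen_vecs.mean(axis=0)))
&lt;/code&gt;&lt;/pre&gt;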

&lt;h2&gt;Visualizing the Fracture&lt;/h2&gt;

&lt;p&gt;Using 3D PCA, we can actually &lt;em&gt;see&lt;/em&gt; the brain damage.&lt;br&gt;
The green points represent the healthy, diverse human baseline. The magma-colored points represent the collapsed AI: a tight, drifting cluster far away from reality.&lt;/p&gt;
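
&lt;p&gt;For reference, a minimal plotting sketch for that point cloud (the original figure script is not included in the post, so the layout below is assumed):&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import numpy as np
import matplotlib.pyplot as plt

def plot_drift(human_pts, gen_pts_list):
    """human_pts: (n, 3) array; gen_pts_list: one (m, 3) array per generation."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(*human_pts.T, c="green", label="Human baseline")
    ai_pts = np.vstack(gen_pts_list)
    gen_ids = np.concatenate([np.full(len(p), i) for i, p in enumerate(gen_pts_list)])
    ax.scatter(*ai_pts.T, c=gen_ids, cmap="magma", label="AI generations")
    ax.set_xlabel("PC1"); ax.set_ylabel("PC2"); ax.set_zlabel("PC3")
    ax.legend()
    plt.show()
&lt;/code&gt;&lt;/pre&gt;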

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5effx7u6e26ck1s9qrz2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5effx7u6e26ck1s9qrz2.png" alt="PCA Point Cloud" width="800" height="375"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 2: The drift from Human Baseline (Green) to AI Madness (Magma).&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Conclusion: The Ainex Limit&lt;/h2&gt;

&lt;p&gt;My experiment proves that naive synthetic training leads to an irreversible &lt;strong&gt;"Model Autophagy"&lt;/strong&gt; (self-eating).&lt;br&gt;
Without geometric guardrails—like the &lt;strong&gt;Ainex Metric&lt;/strong&gt; I proposed—future models won't just be dumb; they will be confidently insane.&lt;/p&gt;

&lt;p&gt;The code is open-source. I invite the community to break it, fix it, or scale it.&lt;/p&gt;

&lt;h3&gt;Resources&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/mhh1430hacker/Ainex-Limit-Experiment" rel="noopener noreferrer"&gt;https://github.com/mhh1430hacker/Ainex-Limit-Experiment&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DOI:&lt;/strong&gt; 10.5281/zenodo.18157801&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Tags:&lt;/em&gt; &lt;code&gt;#machinelearning&lt;/code&gt; &lt;code&gt;#python&lt;/code&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#datascience&lt;/code&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
      <category>python</category>
    </item>
  </channel>
</rss>
