<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jack Elston</title>
    <description>The latest articles on DEV Community by Jack Elston (@jack_elston_14706fe87c3f8).</description>
    <link>https://dev.to/jack_elston_14706fe87c3f8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3833548%2F85945c48-f6ba-43e5-9314-d2829482b370.jpg</url>
      <title>DEV Community: Jack Elston</title>
      <link>https://dev.to/jack_elston_14706fe87c3f8</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jack_elston_14706fe87c3f8"/>
    <language>en</language>
    <item>
      <title>I Gave 6 AI Models the Same March Madness Bracket. They All Agreed.</title>
      <dc:creator>Jack Elston</dc:creator>
      <pubDate>Thu, 19 Mar 2026 10:57:14 +0000</pubDate>
      <link>https://dev.to/jack_elston_14706fe87c3f8/i-gave-6-ai-models-the-same-march-madness-bracket-they-all-agreed-16n2</link>
      <guid>https://dev.to/jack_elston_14706fe87c3f8/i-gave-6-ai-models-the-same-march-madness-bracket-they-all-agreed-16n2</guid>
      <description>&lt;h1&gt;The guy who flew drones into tornadoes used 6 AI models to predict March Madness and discovered they all think the same thing.&lt;/h1&gt;

&lt;p&gt;I spent last night doing what any engineer would do during March Madness: building a prediction system that pits Claude, GPT-4o, Gemini, Grok, Llama, and DeepSeek against each other.&lt;/p&gt;

&lt;p&gt;The question was not just "who wins the tournament?" It was also &lt;strong&gt;"do these models actually think differently, or are they all trained on the same data and producing the same outputs?"&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The Setup&lt;/h2&gt;

&lt;p&gt;Six frontier AI models from six different companies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;API cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;~$5-10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$0.27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;~$0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok 3 Mini&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;td&gt;~$0.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;Meta (via Groq)&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek Chat&lt;/td&gt;
&lt;td&gt;DeepSeek AI&lt;/td&gt;
&lt;td&gt;$0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each model independently predicted all 32 first-round games. Then we layered on Monte Carlo sensitivity analysis, a cross-model adversarial debate, an ML model calibrated on 10 years of historical tournament data, and KenPom ratings.&lt;/p&gt;
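&lt;p&gt;The post does not spell out how its Monte Carlo sensitivity analysis works. What follows is only a minimal sketch of one common approach. It assumes each model outputs a win probability for the favorite and that this probability carries Gaussian uncertainty. The sketch then estimates how often a perturbation pushes the probability across the 50% line and flips the pick. The function &lt;code&gt;pick_flip_rate&lt;/code&gt;, the &lt;code&gt;noise_sd&lt;/code&gt; value, and the game probabilities are hypothetical placeholders, not values from this project.&lt;/p&gt;

```python
import random

def pick_flip_rate(p_fav, noise_sd=0.08, trials=10_000, seed=0):
    """Fraction of trials in which a Gaussian perturbation of the favorite's
    win probability flips the pick (crosses the 0.5 threshold).

    p_fav: a model's win probability for the favorite (hypothetical input).
    noise_sd: assumed uncertainty in that probability (hypothetical value).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    original_pick = p_fav >= 0.5
    flips = 0
    for _ in range(trials):
        # Perturb the probability and clamp it to [0, 1].
        perturbed = min(1.0, max(0.0, rng.gauss(p_fav, noise_sd)))
        if (perturbed >= 0.5) != original_pick:
            flips += 1
    return flips / trials

# Hypothetical favorite win probabilities for three first-round games.
games = {"1 vs 16": 0.98, "5 vs 12": 0.64, "8 vs 9": 0.52}
for name, p in games.items():
    print(f"{name}: pick flips in {pick_flip_rate(p):.1%} of trials")
```

&lt;p&gt;Games whose pick flips often are where the bracket is fragile. Near-coin-flip games, such as the hypothetical 8-vs-9 matchup, dominate that list.&lt;/p&gt;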

&lt;p&gt;&lt;strong&gt;Total: 1,300+ API calls. ~$15.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The Big Finding&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;80.6% of predictions were identical across all six models.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Adding models 4, 5, and 6 changed &lt;em&gt;zero&lt;/em&gt; bracket picks. DeepSeek, at $0.01, produced the same predictions as GPT-4o at $0.27: a 27x cost difference for the same output.&lt;/p&gt;
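&lt;p&gt;The two headline numbers can be computed from per-model picks roughly as follows. This is a minimal sketch that assumes each model's output is a list of winners in a fixed game order. &lt;code&gt;unanimous_rate&lt;/code&gt;, &lt;code&gt;majority_picks&lt;/code&gt;, and the toy picks are hypothetical illustrations, not code from the repository. Unanimous agreement is one reading of "identical across all six models," and majority vote is one way a consensus bracket could be formed.&lt;/p&gt;

```python
from collections import Counter

def unanimous_rate(picks_by_model):
    """Share of games in which every model picked the same winner."""
    games = list(zip(*picks_by_model.values()))  # one tuple of picks per game
    unanimous = sum(1 for game in games if len(set(game)) == 1)
    return unanimous / len(games)

def majority_picks(picks_by_model):
    """Consensus bracket: the most common pick in each game.
    Ties go to the pick encountered first (Counter ordering, Python 3.7+)."""
    return [Counter(game).most_common(1)[0][0] for game in zip(*picks_by_model.values())]

# Toy picks for four games ("F" = favorite, "U" = underdog); hypothetical, not real model output.
picks = {
    "model_1": ["F", "F", "F", "U"],
    "model_2": ["F", "F", "F", "F"],
    "model_3": ["F", "F", "F", "U"],
    "model_4": ["F", "F", "F", "U"],
    "model_5": ["F", "F", "F", "U"],
    "model_6": ["F", "F", "F", "U"],
}

first_three = dict(list(picks.items())[:3])
print(f"Unanimous agreement: {unanimous_rate(picks):.1%}")
print("3-model consensus == 6-model consensus:",
      majority_picks(first_three) == majority_picks(picks))
```

&lt;p&gt;In this toy data, the consensus from the first three models already matches the six-model consensus. That is the same pattern the post reports for models 4 through 6.&lt;/p&gt;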

&lt;h2&gt;Why Do They All Agree?&lt;/h2&gt;

&lt;p&gt;They all learned basketball from the same internet. Same ESPN articles. Same KenPom data. Same Reddit takes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model diversity ≠ information diversity.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most important finding. If you pay for several AI services to get independent second opinions, you may be paying several times for the same opinion.&lt;/p&gt;
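&lt;p&gt;A standard statistical result makes this concrete. Suppose n models each make an error with variance σ², and every pair of errors has correlation ρ. Then the variance of their average is σ²(ρ + (1 − ρ)/n). As n grows, that variance falls toward ρσ², not toward zero. The sketch below evaluates the formula. The ρ values are illustrative assumptions; the post does not measure them, and &lt;code&gt;variance_of_average&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def variance_of_average(sigma2, rho, n):
    """Variance of the mean of n equally correlated estimators.

    Each estimator has variance sigma2 and every pair has correlation rho:
    Var(mean) = sigma2 * (rho + (1 - rho) / n)
    """
    return sigma2 * (rho + (1 - rho) / n)

for rho in (0.0, 0.5, 0.9):  # illustrative correlations, not measured values
    three = variance_of_average(1.0, rho, 3)
    six = variance_of_average(1.0, rho, 6)
    print(f"rho={rho}: 3 models -> {three:.3f}, 6 models -> {six:.3f}")
```

&lt;p&gt;At an assumed ρ = 0.9, doubling from three models to six trims the variance of the average only from about 0.933σ² to 0.917σ². Highly correlated models add little new information.&lt;/p&gt;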

&lt;h2&gt;The Human Insight That Beat 1,300 API Calls&lt;/h2&gt;

&lt;p&gt;The most impactful change to our bracket? Not the ML model (92.1% accuracy). Not the Monte Carlo analysis. Not the cross-model debate.&lt;/p&gt;

&lt;p&gt;It was me saying &lt;strong&gt;"history says we need more upsets."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That single human observation, grounded in the fact that the average tournament has 10+ first-round upsets, changed more picks than all 1,300 API calls combined.&lt;/p&gt;
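&lt;p&gt;Why can one observation move more picks than a pile of API calls? If every model picks the favorite whenever its win probability exceeds 50%, the bracket contains zero upsets. Yet the expected number of upsets is the sum of the underdogs' win probabilities, which can be well above zero. The sketch below assumes hypothetical per-game favorite probabilities and a simple quota rule. The rule calls upsets in the k least-certain games, where k is the rounded expected upset count. &lt;code&gt;expected_upsets&lt;/code&gt; and &lt;code&gt;upset_quota_picks&lt;/code&gt; are illustrative names, and this is not the method used in the repository.&lt;/p&gt;

```python
def expected_upsets(fav_probs):
    """Expected number of underdog wins: the sum of (1 - p) over all games."""
    return sum(1 - p for p in fav_probs.values())

def upset_quota_picks(fav_probs):
    """Hypothetical rule: call upsets in the k games where the favorite is least
    certain, with k = round(expected number of upsets). Everything else is chalk."""
    k = round(expected_upsets(fav_probs))
    least_certain = set(sorted(fav_probs, key=fav_probs.get)[:k])
    return {game: ("underdog" if game in least_certain else "favorite") for game in fav_probs}

# Hypothetical favorite win probabilities; a pure chalk bracket would call 0 upsets here.
fav_probs = {"game_1": 0.99, "game_2": 0.85, "game_3": 0.70,
             "game_4": 0.62, "game_5": 0.55, "game_6": 0.51}

print(f"Expected upsets: {expected_upsets(fav_probs):.2f}")
print(upset_quota_picks(fav_probs))
```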

&lt;h2&gt;The UAS Analogy&lt;/h2&gt;

&lt;p&gt;I fly drones into tornadoes for a living (I'm CEO of &lt;a href="https://bst.aero" rel="noopener noreferrer"&gt;Black Swift Technologies&lt;/a&gt;), and the same principle applies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two weather AI companies with the same NOAA data = same forecast&lt;/li&gt;
&lt;li&gt;One company with sensors inside a hurricane = better forecast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The AI model is not the moat. The DATA is the moat.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The Numbers&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;92.1%&lt;/strong&gt;: ML model accuracy on 10 years of historical data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;80.6%&lt;/strong&gt;: unanimous agreement rate across all 6 models&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$0.01&lt;/strong&gt;: DeepSeek's cost for the same predictions GPT-4o made for $0.27&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;1 in 763&lt;/strong&gt;: odds of a perfect bracket (63 games) even at 90% per-game accuracy&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$22M&lt;/strong&gt;: Kentucky's NIL spend (the most expensive roster, yet only a 7-seed)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;77th&lt;/strong&gt;: Florida's rank in NIL spending (and Florida won the championship last year)&lt;/li&gt;
&lt;/ul&gt;
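&lt;p&gt;The 1-in-763 figure is consistent with treating the 63 games of the 64-team bracket (excluding the First Four) as independent calls, each right 90% of the time: 1 / 0.9^63 ≈ 763. Below is a minimal sketch of that arithmetic under the same independence and equal-accuracy assumptions. &lt;code&gt;perfect_bracket_odds&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def perfect_bracket_odds(per_game_accuracy, games=63):
    """Return N such that a perfect bracket is a 1-in-N event, assuming every
    game is called independently with the same per-game accuracy."""
    return 1 / per_game_accuracy ** games

print(f"90% per game: 1 in {perfect_bracket_odds(0.90):,.0f}")  # prints "90% per game: 1 in 763"
print(f"coin flips:   1 in {perfect_bracket_odds(0.50):,.0f}")
```

&lt;p&gt;Because per-game accuracy is raised to the 63rd power, small changes matter enormously. The same function gives 1 in 2^63 for coin-flip picks.&lt;/p&gt;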

&lt;h2&gt;The Seldon Parallel&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Psychohistory dealt not with man, but with man-masses. The reaction of one man could be forecast by no known mathematics; the reaction of a billion is something else again."&lt;/em&gt; — Isaac Asimov, Foundation&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;68 teams is somewhere between one man and a billion. We are using the same principle: aggregate enough independent signals and the noise cancels out, leaving the signal.&lt;/p&gt;

&lt;p&gt;The ~8% the ML model misses? A player having the game of their life. A referee's whistle. A lucky bounce. Even Hari Seldon could not predict that.&lt;/p&gt;

&lt;h2&gt;Try It Yourself&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard:&lt;/strong&gt; &lt;a href="https://madness.elstonj.com" rel="noopener noreferrer"&gt;madness.elstonj.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paper (9 pages):&lt;/strong&gt; &lt;a href="https://madness.elstonj.com/paper.pdf" rel="noopener noreferrer"&gt;madness.elstonj.com/paper.pdf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/elstonj/march-madness-2026" rel="noopener noreferrer"&gt;github.com/elstonj/march-madness-2026&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hidden Easter egg:&lt;/strong&gt; Try the Konami code on the site 🦬&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Champion pick: Duke over Arizona.&lt;/strong&gt; Let us see if psychohistory holds up. 🏀&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Jack Elston, Ph.D. — CEO, Black Swift Technologies — Boulder, CO&lt;/em&gt;&lt;br&gt;
&lt;em&gt;The guy who flew the first drone into a tornadic supercell thunderstorm.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>sports</category>
    </item>
  </channel>
</rss>
