<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: CodersAcademy006</title>
    <description>The latest articles on DEV Community by CodersAcademy006 (@codersacademy006).</description>
    <link>https://dev.to/codersacademy006</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1255300%2Fd6ebd328-0de6-4d80-b39f-dbb32ce605d6.png</url>
      <title>DEV Community: CodersAcademy006</title>
      <link>https://dev.to/codersacademy006</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/codersacademy006"/>
    <language>en</language>
    <item>
      <title>I Built a Local DuckDB Engine to Mathematically Settle Cricket Debates</title>
      <dc:creator>CodersAcademy006</dc:creator>
      <pubDate>Fri, 29 May 2026 12:02:39 +0000</pubDate>
      <link>https://dev.to/codersacademy006/i-built-a-local-duckdb-engine-to-mathematically-settle-cricket-debates-5b38</link>
      <guid>https://dev.to/codersacademy006/i-built-a-local-duckdb-engine-to-mathematically-settle-cricket-debates-5b38</guid>
      <description>&lt;p&gt;For years, sports commentators have repeated the same clichés about T20 cricket: &lt;em&gt;"You have to win the powerplay,"&lt;/em&gt; &lt;em&gt;"You need an anchor to win,"&lt;/em&gt; and &lt;em&gt;"Team X always chokes."&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;I wanted to know if any of that was actually true. &lt;/p&gt;

&lt;p&gt;The problem is that querying 15+ years of historical ball-by-ball telemetry (over 294,000 deliveries) via commercial sports APIs is painfully slow and brutally rate-limited. &lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://github.com/CodersAcademy006/Midwicket" rel="noopener noreferrer"&gt;Midwicket&lt;/a&gt;, an open-source SDK that bypasses APIs entirely. It pulls raw open data into a local DuckDB and PyArrow engine, turning your laptop into a sub-millisecond sports data warehouse.&lt;/p&gt;

&lt;p&gt;To test the architecture, I wrote a logistic regression Win Probability model (AUC 0.87) trained on 1,239 IPL matches. Because the query engine is entirely local, I could compute the &lt;strong&gt;Win Probability Added (WPA)&lt;/strong&gt;, the change in the model's win probability from one ball to the next, for every delivery in the dataset.&lt;/p&gt;
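
&lt;p&gt;As a rough illustration of the WPA idea, here is a minimal Python sketch. The features, the coefficients, and the function names are hypothetical placeholders, not Midwicket's trained model or API. The only part taken as given is the definition: WPA for a delivery is the win probability after the ball minus the win probability before it.&lt;/p&gt;

```python
import math

# Hypothetical coefficients for illustration only; NOT the trained Midwicket model.
COEF = {
    "intercept": 0.0,
    "runs_needed": -0.05,
    "balls_left": 0.04,
    "wickets_in_hand": 0.25,
}


def win_probability(runs_needed, balls_left, wickets_in_hand):
    """Logistic model: sigmoid of a linear combination of the chase state."""
    z = (
        COEF["intercept"]
        + COEF["runs_needed"] * runs_needed
        + COEF["balls_left"] * balls_left
        + COEF["wickets_in_hand"] * wickets_in_hand
    )
    return 1.0 / (1.0 + math.exp(-z))


def wpa(state_before, state_after):
    """Win Probability Added by one delivery: WP after minus WP before."""
    return win_probability(*state_after) - win_probability(*state_before)


# Each state is (runs_needed, balls_left, wickets_in_hand).
start = (30, 12, 5)
wpa_boundary = wpa(start, (26, 11, 5))  # four runs scored off one ball
wpa_wicket = wpa(start, (30, 11, 4))    # wicket falls, no runs scored
```

&lt;p&gt;Under these made-up coefficients the boundary has a positive WPA and the wicket a negative one. Summing per-ball WPA over the deliveries a batter faces gives that batter's WPA for the innings.&lt;/p&gt;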

&lt;p&gt;When I ran the numbers, the data completely broke some of the sport's biggest myths.&lt;/p&gt;




&lt;h2&gt;1. The "Choke Index" is real, and it's ruthless.&lt;/h2&gt;

&lt;p&gt;I queried the database for every match where a chasing team reached a &lt;strong&gt;Win Probability of 80% or higher&lt;/strong&gt; at any point but still lost the game.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Franchise&lt;/th&gt;
&lt;th&gt;80%+ WP Matches&lt;/th&gt;
&lt;th&gt;Chokes&lt;/th&gt;
&lt;th&gt;Choke %&lt;/th&gt;
&lt;th&gt;Rating&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Royal Challengers Bangalore (RCB)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;66.7%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Notorious&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mumbai Indians (MI)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;40.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chennai Super Kings (CSK)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kolkata Knight Riders (KKR)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;22.2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Composed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The Finding:&lt;/strong&gt; RCB lost 6 of the 9 chases in which they reached 80% win probability, a staggering 66.7% choke rate. From that position they were more likely to throw the game away than to finish the job. KKR, on the other hand, converted 7 of their 9 such positions (77.8%).&lt;/p&gt;
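
&lt;p&gt;The choke definition and the percentages in the table can be sketched in a few lines of Python. The function names and the shape of the per-match win probability series are assumptions made for illustration, not Midwicket's actual schema or query. The match and choke counts are copied from the table above.&lt;/p&gt;

```python
def is_choke(wp_series, chasing_team_won, threshold=0.80):
    """True if the chasing side reached the threshold at some point but lost.

    wp_series is an assumed list of the chasing team's win probability
    after each ball of the chase.
    """
    return max(wp_series) >= threshold and not chasing_team_won


def choke_rate(chokes, matches):
    """Chokes as a percentage of 80%+ WP matches, rounded to one decimal."""
    return round(100.0 * chokes / matches, 1)


# (80%+ WP matches, chokes) per franchise, copied from the table above.
table = {"RCB": (9, 6), "MI": (10, 4), "CSK": (10, 3), "KKR": (9, 2)}

rates = {
    team: choke_rate(chokes, matches)
    for team, (matches, chokes) in table.items()
}
```

&lt;p&gt;Here &lt;code&gt;rates&lt;/code&gt; reproduces the Choke % column, and 100 minus each value gives the conversion rate quoted for KKR.&lt;/p&gt;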




&lt;h2&gt;2. The Anchor vs. The Finisher (Kohli vs Dhoni)&lt;/h2&gt;

&lt;p&gt;Who is more valuable in a run chase: the guy who bats for 15 overs to build a foundation (Kohli), or the guy who comes in at the end (Dhoni)? &lt;/p&gt;

&lt;p&gt;The WPA math heavily favors the finisher. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Player&lt;/th&gt;
&lt;th&gt;Total Innings Analyzed&lt;/th&gt;
&lt;th&gt;Positive WPA %&lt;/th&gt;
&lt;th&gt;Average WPA per Innings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MS Dhoni&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+46.5%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Virat Kohli&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+25.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The Finding:&lt;/strong&gt; Kohli is highly consistent, with positive WPA in 70% of the innings analyzed, but Dhoni averages about 20 percentage points more WPA per innings (+46.5% vs. +25.8%). The model suggests that run chases are highly non-linear: the leverage in the final 3 overs is so large that a finisher essentially holds the team's win probability in their hands.&lt;/p&gt;
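
&lt;p&gt;The gap between the two averages in the table can be read two ways, and the difference matters. A small Python check, using only the two figures from the table (the variable names are illustrative):&lt;/p&gt;

```python
# Average WPA per innings, in percent, copied from the table above.
dhoni_avg_wpa = 46.5
kohli_avg_wpa = 25.8

# Absolute gap, in percentage points of win probability per innings.
gap_points = round(dhoni_avg_wpa - kohli_avg_wpa, 1)

# Relative gap: how many times larger Dhoni's average is than Kohli's.
ratio = round(dhoni_avg_wpa / kohli_avg_wpa, 2)
```

&lt;p&gt;The absolute gap is about 20.7 percentage points, while in relative terms Dhoni's average is roughly 1.8 times Kohli's. With only 10 innings per player in the table, both figures are best read as indicative.&lt;/p&gt;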




&lt;h2&gt;3. "Winning the Powerplay" is a myth. Over 19 is the cliff.&lt;/h2&gt;

&lt;p&gt;If you simulate a catastrophic 2-wicket collapse at different points in a chase, the Powerplay (Overs 1-6) is surprisingly forgiving: a collapse there drops your win probability by about 25%.&lt;/p&gt;

&lt;p&gt;However, a collapse in &lt;strong&gt;Over 19&lt;/strong&gt; drops it by &lt;strong&gt;60.9%&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;By the 19th over, the margin for error has shrunk to almost nothing. It is the definitive breaking point of the sport.&lt;/p&gt;
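
&lt;p&gt;One way to set up this kind of collapse simulation is as a counterfactual: score the same match state twice, once as it is and once with two fewer wickets in hand, then take the difference. The sketch below assumes a model that accepts (runs needed, balls left, wickets in hand). The stand-in model and its coefficients are hypothetical, so it will not reproduce the 25% and 60.9% figures, which come from the trained model.&lt;/p&gt;

```python
import math


def toy_win_probability(runs_needed, balls_left, wickets_in_hand):
    """Hypothetical stand-in model; NOT the trained Midwicket model."""
    z = 0.25 * wickets_in_hand + 0.04 * balls_left - 0.05 * runs_needed
    return 1.0 / (1.0 + math.exp(-z))


def collapse_drop(model, runs_needed, balls_left, wickets_in_hand, wickets_lost=2):
    """Win probability lost if wickets_lost wickets fall at once.

    Runs needed and balls left are held fixed so that only the
    wickets change between the two scored states.
    """
    before = model(runs_needed, balls_left, wickets_in_hand)
    remaining = max(wickets_in_hand - wickets_lost, 0)
    after = model(runs_needed, balls_left, remaining)
    return before - after


# Example state: 20 runs needed off the last 12 balls with 5 wickets in hand.
late_drop = collapse_drop(toy_win_probability, 20, 12, 5)
```

&lt;p&gt;Running &lt;code&gt;collapse_drop&lt;/code&gt; over representative states for each over, with the real model passed in place of the stand-in, would give an over-by-over view of where a collapse costs the most.&lt;/p&gt;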




&lt;h3&gt;The Open Source Engine&lt;/h3&gt;

&lt;p&gt;If you’re a sports analytics nerd, a data engineer, or just someone who wants to run SQL queries against 15 years of sports data without paying for an API, I’ve open-sourced the entire engine, the PyArrow schema, and the trained models. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check out the repo here:&lt;/strong&gt; &lt;a href="https://github.com/CodersAcademy006/Midwicket" rel="noopener noreferrer"&gt;Midwicket on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’d love to hear your thoughts on the architecture, the DuckDB implementation, or the data findings! What team should I run the choke index on next?&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>data</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
