<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bid Canvas</title>
    <description>The latest articles on DEV Community by Bid Canvas (@bidcanvas).</description>
    <link>https://dev.to/bidcanvas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3974568%2F8738fb07-5264-4c1a-a27b-77bd431c1236.png</url>
      <title>DEV Community: Bid Canvas</title>
      <link>https://dev.to/bidcanvas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bidcanvas"/>
    <language>en</language>
    <item>
      <title>The Beta-Binomial trick for not overreacting to a tiny sample</title>
      <dc:creator>Bid Canvas</dc:creator>
      <pubDate>Mon, 08 Jun 2026 17:14:00 +0000</pubDate>
      <link>https://dev.to/bidcanvas/the-beta-binomial-trick-for-not-overreacting-to-a-tiny-sample-4lmj</link>
      <guid>https://dev.to/bidcanvas/the-beta-binomial-trick-for-not-overreacting-to-a-tiny-sample-4lmj</guid>
      <description>&lt;p&gt;It's the third quarter. A team that shoots &lt;strong&gt;40% from three&lt;/strong&gt; on the season is sitting at &lt;strong&gt;4-for-20 (20%)&lt;/strong&gt; tonight and down 10. Every instinct says &lt;em&gt;ice cold, stay away.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here's the uncomfortable part: 20% on 20 attempts, when your true rate is 40%, is &lt;strong&gt;not&lt;/strong&gt; statistically weird. It's inside two standard deviations of the expected rate. If you've ever had to decide how much to trust a small sample (a conversion rate after 20 sessions, an error rate after 20 requests, a model's accuracy on a tiny eval set), this is the same problem wearing a jersey.&lt;/p&gt;

&lt;h2&gt;Why 20 attempts tells you almost nothing&lt;/h2&gt;

&lt;p&gt;Each shot is a Bernoulli trial with &lt;code&gt;p = 0.40&lt;/code&gt;. The standard deviation of the &lt;em&gt;observed&lt;/em&gt; percentage over &lt;code&gt;n&lt;/code&gt; shots is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SD = sqrt(p * (1 - p) / n)
   = sqrt(0.40 * 0.60 / 20)
   = sqrt(0.012)
   ≈ 0.1095   # ~11 percentage points
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Eleven points of standard deviation on 20 attempts. A true-40% team will &lt;em&gt;routinely&lt;/em&gt; look like a 29% team or a 51% team in a single game, and both are completely normal. 4-for-20 sits about 1.8 SD below the mean. Under an exact binomial, a true-40% team makes 4 or fewer of 20 about 5% of the time, so expect it a few times a season. Unusual-ish. Not shocking.&lt;/p&gt;
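
&lt;p&gt;If you'd rather check that arithmetic than take it on faith, here is a minimal sketch in plain Python. The variable names are mine, and the exact-tail line is an extra check that goes one step past the normal approximation used above.&lt;/p&gt;

```python
from math import comb, sqrt

p, n, makes = 0.40, 20, 4

sd = sqrt(p * (1 - p) / n)   # SD of the observed percentage
z = (makes / n - p) / sd     # distance from the mean, in SD units

# Exact binomial probability of `makes` or fewer in n attempts
tail = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(makes + 1))

print(f"SD: {sd:.4f}")                     # 0.1095
print(f"z-score: {z:.2f}")                 # -1.83
print(f"P(4 or fewer makes): {tail:.3f}")  # 0.051
```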

&lt;p&gt;The signal-to-noise ratio at &lt;code&gt;n = 20&lt;/code&gt; is just terrible. Over a full season (~2,500 attempts) the picture stabilizes; in one game it's mostly noise.&lt;/p&gt;
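
&lt;p&gt;To see how quickly the noise shrinks, the same formula can be run at a few sample sizes. This is only a sketch: 20 and 2,500 come from the text above, while 100 and 500 are intermediate values I picked for illustration.&lt;/p&gt;

```python
from math import sqrt

p = 0.40
# 20 and 2500 are from the article; 100 and 500 are illustrative midpoints
sd_by_n = {n: sqrt(p * (1 - p) / n) for n in (20, 100, 500, 2500)}

for n, sd in sd_by_n.items():
    print(f"n = {n:4d}: SD = {sd:.1%}")
# n =   20: SD = 11.0%
# n =  100: SD = 4.9%
# n =  500: SD = 2.2%
# n = 2500: SD = 1.0%
```

&lt;p&gt;Because the SD falls with the square root of &lt;code&gt;n&lt;/code&gt;, it takes roughly 125 times the attempts to cut the noise from about 11 points to about 1.&lt;/p&gt;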

&lt;h2&gt;The fix: shrink toward the prior&lt;/h2&gt;

&lt;p&gt;Instead of asking &lt;em&gt;"what's their percentage tonight?"&lt;/em&gt; (answer: 20%, meaningless), ask &lt;em&gt;"given everything we know, what's the best estimate of their true rate right now?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's a &lt;strong&gt;Beta-Binomial&lt;/strong&gt; update — the workhorse of this kind of problem.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prior:&lt;/strong&gt; the season says 40%. Encode it as &lt;code&gt;Beta(40, 60)&lt;/code&gt; — as if we'd already seen 40 makes in 100 attempts before tip-off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Likelihood:&lt;/strong&gt; tonight, 4 makes / 16 misses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Posterior:&lt;/strong&gt; &lt;code&gt;Beta(40 + 4, 60 + 16) = Beta(44, 76)&lt;/code&gt; → &lt;code&gt;44/120 ≈ 36.7%&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The estimate moves from 40% to &lt;strong&gt;36.7%&lt;/strong&gt;, not to 20%. The tiny in-game sample nudges the belief; the season-long record dominates.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;beta&lt;/span&gt;

&lt;span class="c1"&gt;# Prior strength = how much you trust the season average (in pseudo-attempts)
&lt;/span&gt;&lt;span class="n"&gt;prior_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior_strength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;a0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prior_rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;prior_strength&lt;/span&gt;          &lt;span class="c1"&gt;# 40
&lt;/span&gt;&lt;span class="n"&gt;b0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prior_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;prior_strength&lt;/span&gt;    &lt;span class="c1"&gt;# 60
&lt;/span&gt;
&lt;span class="n"&gt;makes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;makes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;makes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# Beta(44, 76)
&lt;/span&gt;
&lt;span class="n"&gt;post_mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;lo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ppf&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;0.025&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.975&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# weight the in-game data actually got
&lt;/span&gt;&lt;span class="n"&gt;in_game_weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;prior_strength&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;posterior mean: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;post_mean&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# 36.7%
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;95% interval:   &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lo&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; – &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hi&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# ~28% – 46%
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in-game weight: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;in_game_weight&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# 17%
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With only 20 attempts, the model gives the live data just &lt;strong&gt;17%&lt;/strong&gt; weight; the other 83% comes from the prior. The split is only even when the attempts match the prior strength, so you'd need ~100 attempts, which no team takes in a single game.&lt;/p&gt;
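
&lt;p&gt;That weight is not a separate modeling choice. The Beta posterior mean is algebraically a weighted average of tonight's raw rate and the prior rate, with weight &lt;code&gt;attempts / (attempts + prior_strength)&lt;/code&gt; on tonight. A small sketch to confirm the two routes agree (the names &lt;code&gt;w&lt;/code&gt;, &lt;code&gt;blended&lt;/code&gt; and &lt;code&gt;direct&lt;/code&gt; are mine):&lt;/p&gt;

```python
prior_rate, prior_strength = 0.40, 100
makes, attempts = 4, 20

# Route 1: blend tonight's raw rate with the prior rate
w = attempts / (attempts + prior_strength)   # weight on tonight's data
blended = w * (makes / attempts) + (1 - w) * prior_rate

# Route 2: posterior mean straight from the Beta parameters
direct = (prior_rate * prior_strength + makes) / (prior_strength + attempts)

print(f"weight on tonight: {w:.3f}")
print(f"blended: {blended:.4f}  direct: {direct:.4f}")
```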

&lt;p&gt;The lever is &lt;strong&gt;prior strength&lt;/strong&gt; — how much you trust the season number:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prior strength (α+β)&lt;/th&gt;
&lt;th&gt;Interpretation&lt;/th&gt;
&lt;th&gt;Posterior given 4/20&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;treat each game fresh&lt;/td&gt;
&lt;td&gt;30.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;some season context&lt;/td&gt;
&lt;td&gt;34.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;full-season confidence&lt;/td&gt;
&lt;td&gt;36.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;multi-season track record&lt;/td&gt;
&lt;td&gt;38.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
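
&lt;p&gt;The posterior column can be recomputed in a few lines. This is a sketch under the same assumptions as the snippet above (a 40% prior rate and 4 makes on 20 attempts); the helper name &lt;code&gt;posterior_mean&lt;/code&gt; is mine.&lt;/p&gt;

```python
prior_rate = 0.40
makes, attempts = 4, 20

def posterior_mean(prior_strength):
    """Posterior mean after adding tonight's makes and misses to the prior."""
    a0 = prior_rate * prior_strength          # prior pseudo-makes
    return (a0 + makes) / (prior_strength + attempts)

table = {s: posterior_mean(s) for s in (20, 50, 100, 200)}
for strength, mean in table.items():
    print(f"prior strength {strength:3d}: posterior mean {mean:.1%}")
```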

&lt;p&gt;Even the &lt;em&gt;weakest&lt;/em&gt; prior lands at 30%, halfway back to the season's 40% and well above the raw 20%. The Bayesian answer is always "expect regression." The only question is how much.&lt;/p&gt;

&lt;h2&gt;The one thing shrinkage can't see&lt;/h2&gt;

&lt;p&gt;The model assumes ability hasn't &lt;em&gt;changed&lt;/em&gt;. But maybe the defense adjusted and the team is taking genuinely worse looks. That's where shot-quality models (expected effective FG% from location, defender distance, shot type, clock) earn their keep: they separate &lt;em&gt;"bad luck on good shots"&lt;/em&gt; from &lt;em&gt;"good luck would've been needed on bad shots."&lt;/em&gt; Shrinkage handles the first; only shot quality catches the second.&lt;/p&gt;
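
&lt;p&gt;A toy illustration of that split, assuming you already have a per-shot make probability from some shot-quality model. Everything here is hypothetical: the two scenarios and their probabilities are invented to show the bookkeeping, not taken from real data or from any particular model.&lt;/p&gt;

```python
# Hypothetical per-shot make probabilities from some shot-quality model.
# These numbers are invented for illustration; they are not real data.
scenarios = {
    "good looks": [0.40] * 20,   # same quality as the season average
    "tough looks": [0.28] * 20,  # defense forcing harder attempts
}
makes = 4

luck_gap = {}
for label, shot_probs in scenarios.items():
    expected_makes = sum(shot_probs)           # makes the looks "deserved"
    luck_gap[label] = makes - expected_makes   # negative means below expectation
    print(f"{label}: expected {expected_makes:.1f} makes, "
          f"actual {makes}, gap {luck_gap[label]:+.1f}")
```

&lt;p&gt;In the first scenario the team is 4 makes under expectation, which is the bad luck that shrinkage is built for. In the second the gap is only 1.6, so most of the cold night is explained by the shots themselves, and the prior should be re-centered rather than trusted.&lt;/p&gt;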

&lt;p&gt;If you want the longer version — the shot-quality decomposition, the credible-interval reasoning, and how this plays out when live in-game markets overreact to exactly this noise — I wrote up the full breakdown with the shot-quality model &lt;a href="https://bidcanvas.com/research/3pt-shooting-variance" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Takeaway&lt;/h2&gt;

&lt;p&gt;Small samples lie, and they lie &lt;em&gt;loudly&lt;/em&gt;. Whether it's 20 three-point attempts or 20 API calls, the discipline is the same: don't read the raw rate, shrink it toward what you already knew, and size your confidence to how much data you actually have.&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>datascience</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
