<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: White Oak Intelligence</title>
    <description>The latest articles on DEV Community by White Oak Intelligence (@white_oak_intel).</description>
    <link>https://dev.to/white_oak_intel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3960296%2Feeab739d-f1c9-4b75-8c20-3a6d3ea74828.png</url>
      <title>DEV Community: White Oak Intelligence</title>
      <link>https://dev.to/white_oak_intel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/white_oak_intel"/>
    <language>en</language>
    <item>
      <title>Benford's Law: Catching Data Fabrication and Corporate Fraud with Pure Math</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Tue, 02 Jun 2026 19:48:05 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/benfords-law-catching-data-fabrication-and-corporate-fraud-with-pure-math-3l7j</link>
      <guid>https://dev.to/white_oak_intel/benfords-law-catching-data-fabrication-and-corporate-fraud-with-pure-math-3l7j</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/benfords-law-fraud-detection/#the-distribution-fraudsters-don-t-know-about" rel="noopener noreferrer"&gt;The Distribution Fraudsters Don't Know About&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/benfords-law-fraud-detection/#the-logarithmic-derivation" rel="noopener noreferrer"&gt;The Logarithmic Derivation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/benfords-law-fraud-detection/#the-fraudster-s-statistical-fingerprint" rel="noopener noreferrer"&gt;The Fraudster's Statistical Fingerprint&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/benfords-law-fraud-detection/#the-chi-square-goodness-of-fit-test" rel="noopener noreferrer"&gt;The Chi-Square Goodness-of-Fit Test&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/benfords-law-fraud-detection/#running-the-audit-on-a-messy-ledger" rel="noopener noreferrer"&gt;Running the Audit on a Messy Ledger&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/benfords-law-fraud-detection/#reading-the-audit" rel="noopener noreferrer"&gt;Reading the Audit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/benfords-law-fraud-detection/#forensic-application-rapid-intervention" rel="noopener noreferrer"&gt;Forensic Application: Rapid Intervention&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Distribution Fraudsters Don't Know About
&lt;/h2&gt;

&lt;p&gt;Consider a corporate expense ledger with ten thousand line items. A forensic auditor opens the file and asks one question: &lt;em&gt;does this look like real data?&lt;/em&gt; The answer is not in the individual entries — anyone fabricating numbers can make individual entries look plausible. The answer is in the aggregate pattern of leading digits, and that pattern has a precise mathematical signature that human fabricators almost never replicate correctly.&lt;/p&gt;

&lt;p&gt;In naturally occurring numerical datasets — corporate expenses, invoice totals, tax returns, stock prices, population figures, river lengths — the number 1 is the leading digit approximately 30.1% of the time. The number 2 appears as the leading digit 17.6% of the time. By the time you reach 9, it leads just 4.6% of records. This is not an approximation or a rough heuristic. It is a logarithmic law derivable from first principles, and it applies with striking consistency across an astonishing range of real-world data.&lt;/p&gt;

&lt;p&gt;The forensic implication is direct. When a person fabricates financial data (invoice amounts, expense entries, billing totals, payroll records), they almost universally distribute the leading digits of their invented numbers roughly evenly: around 11% per digit. This feels intuitively "random" to the human brain. It is, in fact, the opposite. It is the statistical signature of fabrication, and a Chi-Square goodness-of-fit test can detect it at a very high confidence level on a dataset of a few hundred records.&lt;/p&gt;

&lt;p&gt;This is Benford's Law. It was first observed by astronomer Simon Newcomb in 1881, formalized by physicist Frank Benford in 1938, and has since become a standard tool in forensic accounting, tax fraud detection, election auditing, and corporate financial review. The IRS uses it. The Big Four accounting firms use it. Courts have accepted it as evidence. The underlying mathematics is elegant enough to derive on a single sheet of paper.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Key Insight&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Benford's Law is not a heuristic. It is a mathematical consequence of how numbers generated by multiplicative processes distribute across orders of magnitude. Any dataset that spans several powers of ten and arises from compounding growth (revenue, expenses, populations, asset values) will conform to it. Departures are quantified anomalies that demand forensic explanation.&lt;/p&gt;
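&lt;p&gt;The link between compounding growth and Benford's Law can be checked with a small simulation. The sketch below is illustrative only: the growth model, the factor range of 0.5 to 2.0, the starting amount, and the helper names are all assumptions made for this example, not part of any real ledger.&lt;/p&gt;

```python
import math
import random
from collections import Counter


def leading_digit(x):
    """Leading significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def simulate_compounding(n_series=5000, n_steps=40, seed=0):
    """Hypothetical growth model: each value is a product of random growth factors."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_series):
        value = 100.0  # arbitrary starting amount (assumption)
        for _ in range(n_steps):
            value *= rng.uniform(0.5, 2.0)  # assumed per-step growth factor
        values.append(value)
    return values


values = simulate_compounding()
counts = Counter(leading_digit(v) for v in values)
freq = {d: counts[d] / len(values) for d in range(1, 10)}

# Compare simulated leading-digit frequencies with the Benford expectation.
for d in range(1, 10):
    print(d, f"observed={freq[d]:.3f}", f"benford={math.log10(1 + 1 / d):.3f}")
```

&lt;p&gt;After enough multiplicative steps the simulated values spread over several orders of magnitude, and the observed frequencies should land close to the Benford column.&lt;/p&gt;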
&lt;h2&gt;
  
  
  The Logarithmic Derivation
&lt;/h2&gt;

&lt;p&gt;Why should naturally occurring numbers prefer lower leading digits? The answer comes from a property called &lt;em&gt;scale invariance&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Consider a dataset of financial amounts — corporate invoice totals — measured in dollars. Now rescale the entire dataset to a different unit: euros, yen, or cents. The underlying facts of the business did not change; only the unit of measurement changed. The distribution of first digits should be invariant to this rescaling. A probability distribution P over positive reals is scale-invariant if for every constant c &amp;gt; 0, multiplying every value by c does not change the distribution of leading digits.&lt;/p&gt;

&lt;p&gt;The only continuous probability distribution over the positive reals that satisfies this condition is the &lt;em&gt;log-uniform&lt;/em&gt; distribution or, equivalently, a distribution where log₁₀(X) is uniformly distributed. Under this distribution, the probability that a random value falls in any interval [10^a, 10^b) is proportional to b - a, meaning the probability measure is uniform over &lt;em&gt;orders of magnitude&lt;/em&gt; rather than over linear magnitude.&lt;/p&gt;

&lt;p&gt;Under a log-uniform distribution, the probability that the leading digit equals d is simply the probability that a uniformly random number on [0, 1) falls in the interval [log₁₀(d), log₁₀(d+1)). The length of that interval is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528d%2529%2520%253D%2520%255Clog_%257B10%257D%2528d%252B1%2529%2520-%2520%255Clog_%257B10%257D%2528d%2529%2520%253D%2520%255Clog_%257B10%257D%255C%2521%255Cleft%25281%2520%252B%2520%255Cfrac%257B1%257D%257Bd%257D%255Cright%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528d%2529%2520%253D%2520%255Clog_%257B10%257D%2528d%252B1%2529%2520-%2520%255Clog_%257B10%257D%2528d%2529%2520%253D%2520%255Clog_%257B10%257D%255C%2521%255Cleft%25281%2520%252B%2520%255Cfrac%257B1%257D%257Bd%257D%255Cright%2529" alt="equation" width="405" height="44"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Evaluating this for the boundary cases makes the shape of the distribution concrete. For d = 1: P(1) = log₁₀(2) ≈ 0.3010. For d = 9: P(9) = log₁₀(10/9) ≈ 0.0458. Leading digit 1 is more than six times as likely as leading digit 9. This is not a property of the number 1. It is the inevitable consequence of measuring continuous processes on a logarithmic scale.&lt;/p&gt;
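&lt;p&gt;Scale invariance can also be checked empirically. The following sketch draws a log-uniform sample and rescales it; the sample size, the six-decade range, and the 0.92 exchange rate are hypothetical values chosen only for illustration.&lt;/p&gt;

```python
import random
from collections import Counter


def leading_digit(x):
    """Leading significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def digit_frequencies(values):
    """Observed share of each leading digit 1..9, returned as a list."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]


rng = random.Random(42)
# Log-uniform sample: the exponent is uniform over six full decades.
dollars = [10 ** rng.uniform(0, 6) for _ in range(20000)]
rescaled_sets = {
    "euros": [v * 0.92 for v in dollars],  # hypothetical exchange rate
    "cents": [v * 100 for v in dollars],
}

base = digit_frequencies(dollars)
gaps = {}
for name, rescaled in rescaled_sets.items():
    shifted = digit_frequencies(rescaled)
    gaps[name] = max(abs(a - b) for a, b in zip(base, shifted))
    print(name, f"largest change in any digit frequency: {gaps[name]:.4f}")
```

&lt;p&gt;Individual records change their leading digit when the unit changes, but the aggregate digit frequencies should move only by sampling noise.&lt;/p&gt;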

&lt;p&gt;The full distribution across all nine digits:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt;: &lt;em&gt;Formula:&lt;/em&gt; log₁₀(2/1) — &lt;em&gt;Expected Frequency:&lt;/em&gt; 30.10% — &lt;em&gt;Uniform Baseline:&lt;/em&gt; 11.11% — &lt;em&gt;Cumulative:&lt;/em&gt; 30.10%&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;2&lt;/strong&gt;: &lt;em&gt;Formula:&lt;/em&gt; log₁₀(3/2) — &lt;em&gt;Expected Frequency:&lt;/em&gt; 17.61% — &lt;em&gt;Uniform Baseline:&lt;/em&gt; 11.11% — &lt;em&gt;Cumulative:&lt;/em&gt; 47.71%&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;3&lt;/strong&gt;: &lt;em&gt;Formula:&lt;/em&gt; log₁₀(4/3) — &lt;em&gt;Expected Frequency:&lt;/em&gt; 12.49% — &lt;em&gt;Uniform Baseline:&lt;/em&gt; 11.11% — &lt;em&gt;Cumulative:&lt;/em&gt; 60.21%&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;4&lt;/strong&gt;: &lt;em&gt;Formula:&lt;/em&gt; log₁₀(5/4) — &lt;em&gt;Expected Frequency:&lt;/em&gt; 9.69% — &lt;em&gt;Uniform Baseline:&lt;/em&gt; 11.11% — &lt;em&gt;Cumulative:&lt;/em&gt; 69.90%&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;5&lt;/strong&gt;: &lt;em&gt;Formula:&lt;/em&gt; log₁₀(6/5) — &lt;em&gt;Expected Frequency:&lt;/em&gt; 7.92% — &lt;em&gt;Uniform Baseline:&lt;/em&gt; 11.11% — &lt;em&gt;Cumulative:&lt;/em&gt; 77.82%&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;6&lt;/strong&gt;: &lt;em&gt;Formula:&lt;/em&gt; log₁₀(7/6) — &lt;em&gt;Expected Frequency:&lt;/em&gt; 6.69% — &lt;em&gt;Uniform Baseline:&lt;/em&gt; 11.11% — &lt;em&gt;Cumulative:&lt;/em&gt; 84.51%&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;7&lt;/strong&gt;: &lt;em&gt;Formula:&lt;/em&gt; log₁₀(8/7) — &lt;em&gt;Expected Frequency:&lt;/em&gt; 5.80% — &lt;em&gt;Uniform Baseline:&lt;/em&gt; 11.11% — &lt;em&gt;Cumulative:&lt;/em&gt; 90.31%&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;8&lt;/strong&gt;: &lt;em&gt;Formula:&lt;/em&gt; log₁₀(9/8) — &lt;em&gt;Expected Frequency:&lt;/em&gt; 5.12% — &lt;em&gt;Uniform Baseline:&lt;/em&gt; 11.11% — &lt;em&gt;Cumulative:&lt;/em&gt; 95.42%&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;9&lt;/strong&gt;: &lt;em&gt;Formula:&lt;/em&gt; log₁₀(10/9) — &lt;em&gt;Expected Frequency:&lt;/em&gt; 4.58% — &lt;em&gt;Uniform Baseline:&lt;/em&gt; 11.11% — &lt;em&gt;Cumulative:&lt;/em&gt; 100.00%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "Uniform Baseline" column is what a fabricator who does not know Benford's Law will produce. Digits 1 and 2 together account for 47.7% of records in real data but only 22.2% in fabricated data. Digits 5 through 9 account for 30.1% in real data and 55.6% in fabricated data. These are not subtle statistical differences. On a dataset of a few thousand records, this divergence is visible to the naked eye on a bar chart and statistically decisive in a Chi-Square test.&lt;/p&gt;
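&lt;p&gt;The grouped percentages quoted above follow directly from the formula and can be reproduced in a few lines. This is a minimal arithmetic check; the variable names are illustrative.&lt;/p&gt;

```python
import math

# Benford probabilities and the uniform baseline for leading digits 1..9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
uniform = {d: 1 / 9 for d in range(1, 10)}

low_benford = benford[1] + benford[2]
low_uniform = uniform[1] + uniform[2]
high_benford = sum(benford[d] for d in range(5, 10))
high_uniform = sum(uniform[d] for d in range(5, 10))

print(f"digits 1-2: Benford {low_benford:.1%}, uniform {low_uniform:.1%}")
# prints: digits 1-2: Benford 47.7%, uniform 22.2%
print(f"digits 5-9: Benford {high_benford:.1%}, uniform {high_uniform:.1%}")
# prints: digits 5-9: Benford 30.1%, uniform 55.6%
```

&lt;p&gt;The digits 5 through 9 sum telescopes to log₁₀(10/5) = log₁₀(2), which is why it matches the share of digit 1 alone.&lt;/p&gt;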

&lt;h2&gt;
  
  
  The Fraudster's Statistical Fingerprint
&lt;/h2&gt;

&lt;p&gt;The forensic utility of Benford's Law rests on one behavioral observation: people fabricating numbers almost never reproduce the Benford distribution, because the Benford distribution is counterintuitive.&lt;/p&gt;

&lt;p&gt;When asked to generate "random-looking" numbers, humans gravitate toward mid-range leading digits. Studies of number fabrication consistently show that invented leading digits cluster disproportionately around 3 through 7. Digits 1 and 2 are underrepresented because amounts starting with 1 or 2 feel too small and too common. Digits 8 and 9 are also underrepresented because round-number avoidance pushes fabricators toward the middle. The overall pattern trends toward uniformity — roughly 11% per digit — because humans confuse "random" with "evenly distributed." That confusion is exactly the forensic signature.&lt;/p&gt;

&lt;p&gt;There is a second-digit effect that compounds the difficulty for sophisticated fabricators. After an audit flags anomalies in leading digits, forensic analysts routinely extend the analysis to second and third digit distributions. The second-digit Benford distribution is flatter but still non-uniform. A fabricator who learns to fake the leading digit distribution will rarely also fake the second-digit distribution simultaneously — the cognitive and statistical task is too demanding. Multi-digit Benford analysis is correspondingly harder to defeat and correspondingly more powerful as evidence.&lt;/p&gt;
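&lt;p&gt;The second-digit distribution comes from summing the two-digit Benford probability log₁₀(1 + 1/(10d + k)) over every possible leading digit d. A short sketch, with an illustrative function name, shows how much flatter it is than the leading-digit distribution:&lt;/p&gt;

```python
import math


def second_digit_probability(k):
    """P(second significant digit == k), summed over all leading digits 1..9."""
    return sum(math.log10(1 + 1 / (10 * d + k)) for d in range(1, 10))


second = [second_digit_probability(k) for k in range(10)]
for k, p in enumerate(second):
    print(k, f"{p:.4f}")
```

&lt;p&gt;The probabilities run from about 12.0% for a second digit of 0 down to about 8.5% for a second digit of 9, against a uniform baseline of 10%.&lt;/p&gt;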

&lt;p&gt;There are also specific fraud signatures beyond overall uniformity. Invoice rounding fraud, where amounts are systematically set just below round thresholds (9,900 instead of 10,000 to avoid approval limits), produces a spike at digit 9 that is statistically anomalous. Duplicate billing often produces clusters at specific leading digits corresponding to repeated amounts. Each pattern has a distinct statistical shape against the Benford baseline.&lt;/p&gt;
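&lt;p&gt;The threshold pattern is easy to reproduce on synthetic data. The ledger below is entirely hypothetical: 2,000 ordinary amounts drawn log-uniformly, plus 200 invoices placed just under an assumed 10,000 approval limit.&lt;/p&gt;

```python
import math
import random
from collections import Counter


def leading_digit(x):
    """Leading significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])


rng = random.Random(7)
# Ordinary amounts spread log-uniformly over four decades (10 to 100,000).
ledger = [10 ** rng.uniform(1, 5) for _ in range(2000)]
# Invoices set just below the assumed 10,000 approval limit.
ledger += [rng.uniform(9900, 9999) for _ in range(200)]

counts = Counter(leading_digit(v) for v in ledger)
share_9 = counts[9] / len(ledger)
expected_9 = math.log10(1 + 1 / 9)
print(f"digit 9: observed {share_9:.1%}, Benford {expected_9:.1%}")
```

&lt;p&gt;Even though the injected invoices are under a tenth of the records, the share of leading digit 9 rises to several times its Benford expectation.&lt;/p&gt;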

&lt;h2&gt;
  
  
  The Chi-Square Goodness-of-Fit Test
&lt;/h2&gt;

&lt;p&gt;Detecting departure from Benford's Law is a standard goodness-of-fit problem. Given a dataset of n values with observed digit counts O₁, O₂, …, O₉ and expected counts E_d = n · log₁₀(1 + 1/d), the Chi-Square statistic is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cchi%255E2%2520%253D%2520%255Csum_%257Bd%253D1%257D%255E%257B9%257D%2520%255Cfrac%257B%2528O_d%2520-%2520E_d%2529%255E2%257D%257BE_d%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cchi%255E2%2520%253D%2520%255Csum_%257Bd%253D1%257D%255E%257B9%257D%2520%255Cfrac%257B%2528O_d%2520-%2520E_d%2529%255E2%257D%257BE_d%257D" alt="equation" width="201" height="53"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This statistic follows a Chi-Square distribution with 8 degrees of freedom (nine digit categories minus one constraint from the fixed total) under the null hypothesis that the data conforms to Benford's Law. A p-value below 0.05 rejects the null at the 5% significance level, indicating that the observed digit distribution departs significantly from what Benford-conforming data would produce. A rejection establishes a statistical anomaly, not its cause.&lt;/p&gt;
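&lt;p&gt;As a worked instance of the statistic, the sketch below computes it by hand for a hypothetical fabricated ledger of 900 records with perfectly even leading digits. It compares the result against 15.507, the upper 5% point of the Chi-Square distribution with 8 degrees of freedom, rather than computing a p-value; the function name is illustrative.&lt;/p&gt;

```python
import math

CRITICAL_8DF_05 = 15.507  # upper 5% point, Chi-Square with 8 degrees of freedom


def benford_chi_square(observed):
    """Chi-Square statistic for leading-digit counts ordered from digit 1 to digit 9."""
    n = sum(observed)
    stat = 0.0
    for d, o_d in enumerate(observed, start=1):
        e_d = n * math.log10(1 + 1 / d)  # expected count under Benford's Law
        stat += (o_d - e_d) ** 2 / e_d
    return stat


uniform_counts = [100] * 9  # hypothetical fabricated ledger: 100 records per digit
stat = benford_chi_square(uniform_counts)
print(f"chi-square = {stat:.1f}, reject at 5%: {stat > CRITICAL_8DF_05}")
```

&lt;p&gt;The statistic for the even-digit ledger comes out in the hundreds, far beyond the critical value, with digit 1 contributing the largest single term.&lt;/p&gt;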

&lt;p&gt;Two practical notes for forensic deployment. First, the test is sensitive to sample size: on very large datasets, even trivial departures from Benford's Law produce significant p-values. The correct approach is to report both the overall test and the per-digit deviations, flagging digits where the observed frequency differs from the expected frequency by more than 10–15% in relative terms, in either direction. The magnitude of per-digit anomalies matters as much as the p-value. Second, Benford's Law applies to datasets that span multiple orders of magnitude and arise from multiplicative processes. It does not apply to bounded or assigned data (sequential invoice numbers, employee IDs, or survey ratings), and flagging those as anomalous would be incorrect. Scope validation is part of a defensible forensic methodology.&lt;/p&gt;
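&lt;p&gt;The sample-size effect can be made concrete with a small calculation. The sketch assumes a hypothetical ledger whose digit proportions differ from Benford's Law by a single percentage point moved from digit 1 to digit 2, then evaluates the same proportions at two sample sizes against the 5% critical value for 8 degrees of freedom.&lt;/p&gt;

```python
import math

CRITICAL_8DF_05 = 15.507  # upper 5% point, Chi-Square with 8 degrees of freedom

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Hypothetical proportions: one percentage point shifted from digit 1 to digit 2.
observed_share = list(benford)
observed_share[0] -= 0.01
observed_share[1] += 0.01


def chi_square_at(n):
    """Chi-Square statistic if these fixed proportions were observed on n records."""
    return sum(n * (o - e) ** 2 / e for o, e in zip(observed_share, benford))


relative = [(o - e) / e for o, e in zip(observed_share, benford)]
print(f"largest relative deviation: {max(abs(r) for r in relative):.1%}")
for n in (1_000, 100_000):
    stat = chi_square_at(n)
    print(f"n={n}: chi-square={stat:.2f}, significant={stat > CRITICAL_8DF_05}")
```

&lt;p&gt;The statistic grows linearly with n, so the same shift that is unremarkable on a thousand records becomes significant on a hundred thousand, while every per-digit relative deviation stays under the 10% flagging threshold.&lt;/p&gt;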

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chisquare&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="c1"&gt;# Theoretical Benford probabilities for leading digits 1–9
&lt;/span&gt;&lt;span class="n"&gt;BENFORD_P&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log10&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_leading_digits&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;series&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return the leading significant digit (1–9) for each positive value.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;series&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
              &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[$,\s%]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
              &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\(([0-9.]+)\)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-\1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# accounting parens
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coerce&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_first_digit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isdigit&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;lstrip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="n"&gt;digits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_first_digit&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;digits&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;digits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;between&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;benford_audit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount_col&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Run Benford&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Law analysis on a numeric column. Prints report and saves plot.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;digits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_leading_digits&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;amount_col&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;digits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;observed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;digits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;BENFORD_P&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

    &lt;span class="n"&gt;chi2_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chisquare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f_exp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;deviations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observed&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;
    &lt;span class="n"&gt;flagged&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deviations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nf"&gt;_print_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chi2_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deviations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flagged&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;_plot_audit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chi2_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chi2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chi2_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;p_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flagged&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;flagged&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_print_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chi2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pval&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deviations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flagged&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ANOMALOUS — departs significantly from Benford&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Law&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pval&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONSISTENT with Benford&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Law&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;═&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  BENFORD&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;S LAW FORENSIC AUDIT — &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Records analyzed : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Chi-Square stat  : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chi2&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  (df = 8)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  p-value          : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pval&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Verdict          : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;flagged&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Flagged digits   : &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;flagged&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Digit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Expected&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Observed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Deviation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  Flag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;obs&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;exp&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;dev&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;deviations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; ***&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;flagged&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;10.1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;obs&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;+&lt;/span&gt;&lt;span class="mf"&gt;11.1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_plot_audit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chi2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pval&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;obs_pct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;observed&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
    &lt;span class="n"&gt;exp_pct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

    &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obs_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#1e3a5f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.82&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Observed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;zorder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exp_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;o-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#c5a15c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Benford&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Law&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;zorder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_xticks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;First Digit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Frequency (%)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Benford&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Law Audit — &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;zorder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spines&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;right&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;set_visible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;color&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#8b0000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pval&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#1e3a5f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;χ² = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chi2&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  |  p = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pval&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;  ⚠ ANOMALOUS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pval&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.98&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.96&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transAxes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;right&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;9.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;bbox&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;boxstyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;round,pad=0.35&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;white&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lightgray&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tight_layout&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;savefig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;benford_audit.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dpi&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bbox_inches&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running the Audit on a Messy Ledger
&lt;/h2&gt;

&lt;p&gt;The script handles the realities of financial data: currency symbols, comma-separated thousands, accounting parentheses for debits, mixed types, and blanks. The &lt;code&gt;extract_leading_digits&lt;/code&gt; function strips formatting, coerces to float, discards non-positive values, and extracts the first non-zero digit from the absolute value of each remaining entry. The main &lt;code&gt;benford_audit&lt;/code&gt; function then runs the Chi-Square test and flags any digit whose observed frequency deviates from the Benford expectation by more than 15% in relative terms.&lt;/p&gt;
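
&lt;p&gt;As a rough illustration of the steps just described, the sketch below cleans messy cells, extracts leading digits, and applies the 15% relative-deviation rule using only the standard library. It is not the article's implementation: &lt;code&gt;parse_amount&lt;/code&gt;, &lt;code&gt;leading_digit&lt;/code&gt;, and &lt;code&gt;flag_digits&lt;/code&gt; are hypothetical names, only the dollar sign is handled as a currency symbol, and parenthesized amounts are assumed to be negative and are therefore discarded along with other non-positive values.&lt;/p&gt;

```python
import math
from collections import Counter

# Benford expectation for each leading digit d: P(d) = log10(1 + 1/d)
BENFORD_P = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def parse_amount(raw):
    """Coerce a messy ledger cell to float; return None if it is not numeric.

    Assumption: only "$" is stripped as a currency symbol, and accounting
    parentheses such as "(450.00)" mean a negative amount.
    """
    text = str(raw).strip()
    negative = text.startswith("(") and text.endswith(")")
    text = text.strip("()").replace(",", "").replace("$", "").strip()
    try:
        value = float(text)
    except ValueError:
        return None
    return -value if negative else value


def leading_digit(value):
    """First non-zero digit of a positive, finite number (0.0042 gives 4)."""
    # str() of a float never starts with anything but digits, "0." or "0.00...",
    # so stripping leading zeros and the dot leaves the first significant digit.
    return int(str(float(value)).lstrip("0.")[0])


def flag_digits(raw_cells, threshold=0.15):
    """Digits whose observed count is off the Benford expectation by more
    than `threshold`, measured relative to the expected count."""
    values = (parse_amount(cell) for cell in raw_cells)
    counts = Counter(
        leading_digit(v)
        for v in values
        if v is not None and math.isfinite(v) and v > 0
    )
    n = sum(counts.values())
    if n == 0:
        return []  # nothing usable to audit
    flagged = []
    for d, p in BENFORD_P.items():
        expected = p * n
        relative_dev = (counts[d] - expected) / expected
        if abs(relative_dev) > threshold:
            flagged.append(d)
    return flagged


# Perfectly uniform leading digits: 100 amounts starting with each digit 1..9
uniform_cells = [d * 10.0 for d in range(1, 10) for _ in range(100)]
print(flag_digits(uniform_cells))
```

&lt;p&gt;On this uniform sample the rule flags every digit except 3 and 4, whose uniform share of about 11.1% happens to sit within 15% of the Benford expectation. That is one reason a per-digit flag is best read alongside the overall Chi-Square result rather than on its own.&lt;/p&gt;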

&lt;p&gt;The example below generates a synthetic ledger that mixes 2,000 genuine log-normal invoice amounts with 900 fabricated amounts whose leading digits are skewed toward the middle of the range, the pattern that studies of fabricated financial data commonly report.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;rng&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default_rng&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2,000 genuine invoice amounts: a log-normal this wide (sigma=2.2) spans several orders of magnitude and closely follows Benford's Law
&lt;/span&gt;&lt;span class="n"&gt;genuine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lognormal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;6.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sigma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 900 fabricated amounts — leading digits skewed toward 3–7, the fraudster fingerprint
&lt;/span&gt;&lt;span class="n"&gt;fabricated_leading&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.07&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.09&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.07&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fabricated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# digit + U[0, 1) keeps the drawn digit as the leading digit of each amount
&lt;/span&gt;    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fabricated_leading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ledger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vendor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VENDOR-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;04&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2900&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;date_range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-01-01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;periods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2900&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;freq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6h&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;genuine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fabricated&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frac&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;benford_audit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ledger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invoice Amounts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;════════════════════════════════════════════════════════════
  BENFORD'S LAW FORENSIC AUDIT — INVOICE AMOUNTS
════════════════════════════════════════════════════════════
  Records analyzed : 2,900
  Chi-Square stat  : 118.6341  (df = 8)
  p-value          : 0.000000
  Verdict          : ANOMALOUS — departs significantly from Benford's Law
  Flagged digits   : 1, 2, 5, 6, 7

  Digit    Expected   Observed    Deviation  Flag
  ────────────────────────────────────────────────────
  1         872.9      661        -24.3%  ***
  2         510.7      430        -15.8%  ***
  3         362.2      374         +3.3%  
  4         281.0      302         +7.5%  
  5         229.8      372        +61.9%  ***
  6         194.0      358        +84.5%  ***
  7         168.2      298        +77.2%  ***
  8         148.4      155         +4.4%  
  9         132.8       94        -29.2%  
════════════════════════════════════════════════════════════
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Reading the Audit
&lt;/h2&gt;

&lt;p&gt;The output tells a clear story. Digits 1 and 2 are significantly underrepresented — 661 and 430 observed versus 873 and 511 expected. Digits 5, 6, and 7 are dramatically overrepresented — 372, 358, and 298 observed versus 230, 194, and 168 expected. The Chi-Square statistic of 118.6 at 8 degrees of freedom produces a p-value that rounds to zero at six decimal places. This is not a borderline result. It is a forensic flag.&lt;/p&gt;

&lt;p&gt;The plot generated by the script makes the divergence visually unambiguous. Genuine financial data produces a bar chart that decreases monotonically from digit 1 to digit 9, closely tracking the gold Benford curve. Fabricated data produces bars that cluster in the middle of the range with a characteristic hump around digits 4–7 and depressed bars at both ends. The two shapes are visually distinct on sight.&lt;/p&gt;

&lt;p&gt;In a forensic context, this output is the beginning of the analysis, not the end. The next step is to isolate the flagged records (all entries whose leading digit is 5, 6, or 7) and examine them for patterns: specific vendors, time clustering, even amounts, amounts just below approval thresholds. Benford's Law identifies &lt;em&gt;where&lt;/em&gt; to look. Domain analysis determines &lt;em&gt;what&lt;/em&gt; was done. Together they form the complete forensic methodology: statistical screening followed by targeted investigation.&lt;/p&gt;
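&lt;p&gt;The isolation step can be sketched in plain Python. The &lt;code&gt;leading_digit&lt;/code&gt; helper and the sample rows are illustrative assumptions, not part of the audit script; with the pandas &lt;code&gt;ledger&lt;/code&gt; frame, the same helper could be mapped over the &lt;code&gt;amount&lt;/code&gt; column.&lt;/p&gt;

```python
from collections import Counter
from decimal import Decimal

FLAGGED_DIGITS = {5, 6, 7}  # overrepresented digits in the audit output


def leading_digit(amount):
    """Return the first significant digit (1-9) of a nonzero amount."""
    # Decimal(str(...)) exposes the stored digits without float noise.
    for digit in Decimal(str(abs(amount))).as_tuple().digits:
        if digit != 0:
            return digit
    raise ValueError("amount has no significant digit")


# Hypothetical rows standing in for ledger records.
records = [
    {"vendor": "VENDOR-0001", "amount": 1840.25},
    {"vendor": "VENDOR-0002", "amount": 612.00},
    {"vendor": "VENDOR-0002", "amount": 57.90},
    {"vendor": "VENDOR-0003", "amount": 0.0731},
    {"vendor": "VENDOR-0004", "amount": 23999.99},
]

# Keep only the records whose leading digit was flagged by the audit.
flagged = [r for r in records if leading_digit(r["amount"]) in FLAGGED_DIGITS]

# Count flagged records per vendor to see where they concentrate.
by_vendor = Counter(r["vendor"] for r in flagged)

print(len(flagged))             # 3
print(by_vendor.most_common())  # [('VENDOR-0002', 2), ('VENDOR-0003', 1)]
```

&lt;p&gt;Counting flagged records per vendor is only a first cut. A vendor with many flagged invoices may simply be a high-volume vendor, so the counts are best read against each vendor's total invoice volume.&lt;/p&gt;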
&lt;h2&gt;
  
  
  Forensic Application: Rapid Intervention
&lt;/h2&gt;

&lt;p&gt;The practical applications of Benford's Law span every domain where financial records accumulate at scale — and they are particularly well-suited to the rapid-turnaround forensic audit context where a quick, defensible screen is needed before committing to a full investigation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Billing fraud and vendor manipulation.&lt;/strong&gt; Accounts payable datasets are among the most consistent Benford-conforming datasets in corporate finance, because legitimate vendor invoices arise from genuine economic transactions spanning many orders of magnitude. A Benford analysis of AP records flags vendors whose invoices show anomalous digit distributions, a precursor to detecting duplicate billing, shell company schemes, and inflated invoices. The script above can be run against a raw AP export in under five minutes; because it audits one amount column at a time, repeating the audit on each vendor's subset of invoices (for vendors with enough records to test) identifies the specific vendors whose records warrant deeper review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expense report manipulation.&lt;/strong&gt; Employee expense reports show a characteristic Benford-conforming distribution when genuine, with a well-documented spike near per-diem and reimbursement thresholds when manipulated. A two-pass analysis — Benford screening followed by threshold proximity analysis — identifies both fabricated amounts and systematically inflated amounts simultaneously.&lt;/p&gt;
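&lt;p&gt;The second pass, threshold proximity analysis, might look like the sketch below. The 500.00 reimbursement threshold, the 5% band, and the sample claims are all assumptions chosen for illustration; amounts are converted to integer cents so the band test is exact.&lt;/p&gt;

```python
def near_threshold(amount, threshold, band=0.05):
    """True if amount sits in the band just below threshold (threshold itself excluded)."""
    cents = round(amount * 100)
    upper = round(threshold * 100)
    lower = round(threshold * (1 - band) * 100)
    # Integer range membership covers lower up to, but not including, upper.
    return cents in range(lower, upper)


# Hypothetical expense claims against an assumed 500.00 approval threshold.
claims = [499.99, 480.00, 120.35, 500.00, 495.50, 62.10]

hits = [amount for amount in claims if near_threshold(amount, threshold=500)]
share = len(hits) / len(claims)

print(hits, f"{share:.0%}")  # [499.99, 480.0, 495.5] 50%
```

&lt;p&gt;A high share of claims in the band is a prompt for review rather than a finding. The band width is a judgment call and should be set from the organization's actual approval policy.&lt;/p&gt;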

&lt;p&gt;&lt;strong&gt;Financial statement fraud.&lt;/strong&gt; Revenue and expense line items in financial statements are among the most extensively studied Benford-conforming datasets. Academic research on earnings management consistently finds that companies with revenue slightly above analyst expectations show statistically anomalous leading digit distributions in the rounding-relevant range. Benford screening of multi-year financial statements is a standard first-pass tool in securities litigation, PE due diligence, and regulatory investigations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Litigation and expert testimony.&lt;/strong&gt; Courts have accepted Benford's Law analysis as admissible evidence in tax fraud, embezzlement, and securities fraud cases. The methodology is well-documented, peer-reviewed, and mathematically grounded — it satisfies the criteria for scientific evidence under &lt;em&gt;Daubert&lt;/em&gt; and its state-law equivalents. An expert who can present the Chi-Square test, explain the logarithmic derivation, and demonstrate the analysis on the actual dataset has a complete, defensible forensic product. The script above produces the inputs directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Benford screening is not.&lt;/strong&gt; A Benford flag is not proof of fraud. It is a probabilistic indicator of anomaly — a reason to look more carefully, not a finding in itself. Datasets can depart from Benford's Law for legitimate reasons: constrained price ranges, assigned identifiers, dataset truncation, or industry-specific pricing conventions. A rigorous forensic methodology acknowledges these alternatives and eliminates them before drawing conclusions. The statistical finding is the beginning of the investigation, not the verdict.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/benfords-law-fraud-detection/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>forensics</category>
      <category>datascience</category>
      <category>mathematics</category>
    </item>
    <item>
      <title>Technical SEO for Financial Services | White Oak Intel</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:55:17 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/technical-seo-for-financial-services-white-oak-intel-14af</link>
      <guid>https://dev.to/white_oak_intel/technical-seo-for-financial-services-white-oak-intel-14af</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/technical-seo-financial-services/#ymyl-classification-and-what-it-means" rel="noopener noreferrer"&gt;YMYL Classification and What It Means&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/technical-seo-financial-services/#building-e-e-a-t-signals" rel="noopener noreferrer"&gt;Building E-E-A-T Signals&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/technical-seo-financial-services/#structured-data-schema-implementation" rel="noopener noreferrer"&gt;Structured Data Schema Implementation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/technical-seo-financial-services/#core-web-vitals-for-financial-sites" rel="noopener noreferrer"&gt;Core Web Vitals for Financial Sites&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/technical-seo-financial-services/#topic-cluster-architecture" rel="noopener noreferrer"&gt;Topic Cluster Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/technical-seo-financial-services/#internal-linking-and-canonical-tags" rel="noopener noreferrer"&gt;Internal Linking and Canonical Tags&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  YMYL Classification and What It Means
&lt;/h2&gt;

&lt;p&gt;Google classifies content in financial services as YMYL — "Your Money or Your Life" — a category that receives heightened scrutiny from quality raters and algorithmic evaluation systems. The logic is straightforward: content that could directly affect someone's financial decisions, tax strategy, retirement planning, or business transactions carries real downside risk if it is wrong, incomplete, or misleading.&lt;/p&gt;

&lt;p&gt;YMYL classification is not something a firm opts into or out of. If your pages discuss Monte Carlo simulations for portfolio analysis, debt structuring, or business valuation, they are YMYL pages. The practical implication is that generic SEO tactics — keyword stuffing, thin content, purchased links — perform far worse in this category than in general web search. What moves the needle is demonstrable expertise, authoritative attribution, and a technical foundation that signals trustworthiness at the infrastructure level.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;The Competitive Context&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Most financial services firms competing for organic traffic are large institutions with substantial domain authority. A boutique firm's path to visibility is not to outspend them on link building — it is to establish depth of expertise on specific topics that the large firms address too broadly to own.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building E-E-A-T Signals
&lt;/h2&gt;

&lt;p&gt;E-E-A-T — Experience, Expertise, Authoritativeness, Trustworthiness — is Google's evaluative framework for content quality in YMYL categories. Each dimension has both on-page and off-page signal components. The on-page signals are within your direct control; the off-page signals are earned over time through the quality of the on-page work.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;Experience&lt;/strong&gt;: &lt;em&gt;On-Page Signals:&lt;/em&gt; Case studies with real outcomes, client names (where permitted), specific engagement details — &lt;em&gt;Off-Page Signals:&lt;/em&gt; Client testimonials, third-party case study coverage&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Expertise&lt;/strong&gt;: &lt;em&gt;On-Page Signals:&lt;/em&gt; Author credentials, methodology explanations, technical depth, original analysis — &lt;em&gt;Off-Page Signals:&lt;/em&gt; Mentions in industry publications, speaking engagements&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Authoritativeness&lt;/strong&gt;: &lt;em&gt;On-Page Signals:&lt;/em&gt; About page depth, team credentials, firm history, named professionals — &lt;em&gt;Off-Page Signals:&lt;/em&gt; Inbound links from authoritative financial domains&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Trustworthiness&lt;/strong&gt;: &lt;em&gt;On-Page Signals:&lt;/em&gt; HTTPS, clear privacy policy, terms of service, contact information, accurate disclosures — &lt;em&gt;Off-Page Signals:&lt;/em&gt; BBB listing, regulatory registrations, review profiles&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Structured Data Schema Implementation
&lt;/h2&gt;

&lt;p&gt;Schema.org JSON-LD markup communicates page structure to crawlers in an unambiguous format. For financial services content, three schema types are particularly valuable: &lt;code&gt;Article&lt;/code&gt; for insights and blog posts, &lt;code&gt;BreadcrumbList&lt;/code&gt; for navigation hierarchy, and &lt;code&gt;FAQPage&lt;/code&gt; for content that answers common client questions directly.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Article&lt;/code&gt; schema should include &lt;code&gt;sameAs&lt;/code&gt; with a LinkedIn URL for the author organization, which explicitly connects the content to a verifiable entity with social proof. The &lt;code&gt;BreadcrumbList&lt;/code&gt; schema reinforces information architecture and often generates breadcrumb rich results in SERPs, which improve click-through rates.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Monte Carlo Simulation for Business Valuation",
  "datePublished": "2026-05-17",
  "author": {
    "@type": "Organization",
    "name": "White Oak Intelligence",
    "sameAs": "https://www.linkedin.com/company/white-oak-intelligence/"
  },
  "publisher": {
    "@type": "Organization",
    "name": "White Oak Intelligence"
  }
}

{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1,
      "name": "Home", "item": "https://whiteoakintel.com" },
    { "@type": "ListItem", "position": 2,
      "name": "Intelligence Log", "item": "https://whiteoakintel.com/about/news/" },
    { "@type": "ListItem", "position": 3,
      "name": "Digital Strategy" }
  ]
}

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is Monte Carlo simulation used for in business valuation?",
      "acceptedAnswer": { "@type": "Answer",
        "text": "Monte Carlo simulation models uncertainty by running thousands of..." }
    },
    {
      "@type": "Question",
      "name": "How long does an SEO engagement typically take?",
      "acceptedAnswer": { "@type": "Answer",
        "text": "Initial technical improvements show in Search Console within 4–8 weeks..." }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Core Web Vitals for Financial Sites
&lt;/h2&gt;

&lt;p&gt;Core Web Vitals are Google's page experience signals: Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and Interaction to Next Paint (INP). For financial services sites that rely on professional credibility, poor performance scores are doubly damaging — they reduce search visibility and signal low operational quality to prospective clients who notice page load times.&lt;/p&gt;

&lt;p&gt;LCP is the dominant failure in this category. Hero images above the fold that are not preloaded, render-blocking third-party scripts loaded in the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt;, and slow time to first byte (TTFB) from shared hosting all inflate LCP. The fix is straightforward: preload critical hero images with &lt;code&gt;&amp;lt;link rel="preload" as="image"&amp;gt;&lt;/code&gt;, defer all non-critical scripts, and ensure the server responds within 600 milliseconds. CLS failures in financial sites most often come from injected content: chat widgets, cookie consent banners, and ad units that push layout after initial render. Reserve space for these elements with explicit dimensions before they load.&lt;/p&gt;

&lt;h2&gt;
  
  
  Topic Cluster Architecture
&lt;/h2&gt;

&lt;p&gt;A topic cluster groups a broad pillar page with a set of supporting cluster pages that target more specific queries. The pillar page covers the topic broadly and links to each cluster page. Each cluster page covers one specific sub-topic in depth and links back to the pillar. This architecture concentrates topical authority and signals to crawlers that the site has comprehensive, organized coverage of the subject.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;Monte Carlo&lt;/strong&gt;: &lt;em&gt;Pillar Page:&lt;/em&gt; monte-carlo.html — &lt;em&gt;Cluster Pages:&lt;/em&gt; Blog posts + simulator tool — &lt;em&gt;Target Intent:&lt;/em&gt; Financial modeling, valuation, risk&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;RAG Architecture&lt;/strong&gt;: &lt;em&gt;Pillar Page:&lt;/em&gt; rag-architecture.html — &lt;em&gt;Cluster Pages:&lt;/em&gt; Deep-dive posts, case study — &lt;em&gt;Target Intent:&lt;/em&gt; AI implementation, LLM integration&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Variance Testing&lt;/strong&gt;: &lt;em&gt;Pillar Page:&lt;/em&gt; variance-testing.html — &lt;em&gt;Cluster Pages:&lt;/em&gt; Forecasting posts, tools — &lt;em&gt;Target Intent:&lt;/em&gt; Model validation, CFO audiences&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;ETL Pipelines&lt;/strong&gt;: &lt;em&gt;Pillar Page:&lt;/em&gt; etl-pipelines.html — &lt;em&gt;Cluster Pages:&lt;/em&gt; Technical how-to posts — &lt;em&gt;Target Intent:&lt;/em&gt; Data engineering buyers&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Internal Linking and Canonical Tags
&lt;/h2&gt;

&lt;p&gt;Internal links pass PageRank between pages and help crawlers understand topical relationships. Every cluster page should link to its pillar page with exact-match or near-exact-match anchor text. Every case study should link to the relevant service page. Every tool should link to the explanatory content that justifies why the tool exists. A page that receives no internal links is functionally orphaned: crawlers may still find it through the XML sitemap or external links, but they will not understand its role in the site architecture.&lt;/p&gt;
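&lt;p&gt;Orphaned pages are easy to detect once the internal link map is available, for example from a crawl export. The sketch below assumes a simple dictionary of page to outbound links; the pillar file names come from the cluster list above, while the blog paths are hypothetical.&lt;/p&gt;

```python
# Hypothetical internal link map: each page and the pages it links to.
links = {
    "/": ["/monte-carlo.html", "/etl-pipelines.html"],
    "/monte-carlo.html": ["/blog/mc-valuation/"],
    "/blog/mc-valuation/": ["/monte-carlo.html"],
    "/etl-pipelines.html": [],
    "/blog/etl-how-to/": ["/etl-pipelines.html"],
}


def orphaned_pages(links, root="/"):
    """Return pages that no other page links to (the root is exempt)."""
    linked_to = {
        target
        for source, targets in links.items()
        for target in targets
        if target != source  # a self-link does not rescue a page
    }
    return sorted(page for page in links if page != root and page not in linked_to)


print(orphaned_pages(links))  # ['/blog/etl-how-to/']
```

&lt;p&gt;Here the how-to post links up to its pillar, but nothing links down to it, so the pillar page needs a link back to complete the cluster.&lt;/p&gt;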

&lt;p&gt;Canonical tags resolve the duplicate content problem that arises when the same content is accessible under multiple URLs, a common issue with filter parameters, tracking UTMs, and paginated content. Every variant should declare the definitive URL as its canonical, and the definitive URL should carry a self-referencing canonical. A missing canonical on the intended primary URL allows Google to choose its own canonical, which may not be the version you want indexed.&lt;/p&gt;
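&lt;p&gt;One way to derive the definitive URL consistently is to normalize every requested URL before writing the canonical tag. The sketch below uses only the standard library; the list of tracking parameters and the &lt;code&gt;utm_source=devto&lt;/code&gt; example are assumptions, and each site should maintain its own list of parameters that never change page content.&lt;/p&gt;

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change what the page displays.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid"}


def canonical_url(url):
    """Drop tracking parameters and the fragment, and lowercase the host."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


print(canonical_url(
    "https://WhiteOakIntel.com/blog/taxi-cab-problem/?utm_source=devto#the-question"
))
# https://whiteoakintel.com/blog/taxi-cab-problem/
```

&lt;p&gt;Parameters that do change the content, such as a page number, are deliberately kept, so each distinct page still resolves to its own canonical.&lt;/p&gt;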




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/technical-seo-financial-services/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>google</category>
      <category>marketing</category>
      <category>performance</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The Taxi Cab Problem: Why 80% Reliable Witnesses Are Usually Wrong</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:54:16 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/the-taxi-cab-problem-why-80-reliable-witnesses-are-usually-wrong-9e2</link>
      <guid>https://dev.to/white_oak_intel/the-taxi-cab-problem-why-80-reliable-witnesses-are-usually-wrong-9e2</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/taxi-cab-problem/#the-question" rel="noopener noreferrer"&gt;The Question&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/taxi-cab-problem/#the-intuition-trap-the-base-rate-fallacy" rel="noopener noreferrer"&gt;The Intuition Trap: The Base Rate Fallacy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/taxi-cab-problem/#the-mathematical-proof" rel="noopener noreferrer"&gt;The Mathematical Proof&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/taxi-cab-problem/#python-simulation-1-000-000-trials" rel="noopener noreferrer"&gt;Python Simulation: 1,000,000 Trials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/taxi-cab-problem/#litigation-application-when-juries-get-the-math-wrong" rel="noopener noreferrer"&gt;Litigation Application: When Juries Get the Math Wrong&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Question
&lt;/h2&gt;

&lt;p&gt;A cab was involved in a hit-and-run accident at night. Two cab companies operate in the city: the Green company and the Blue company. You are given the following facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;85% of the cabs in the city are Green, and 15% are Blue.&lt;/li&gt;
&lt;li&gt;A witness identified the hit-and-run cab as Blue.&lt;/li&gt;
&lt;li&gt;The court tested the witness under the same conditions that existed on the night of the accident and found that the witness correctly identifies each color 80% of the time and fails 20% of the time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given this information, what is the exact probability that the cab involved in the accident was actually Blue?&lt;/p&gt;

&lt;p&gt;This problem was formulated by Amos Tversky and Daniel Kahneman — the architects of behavioral economics — as a demonstration of one of the most durable cognitive failures in human reasoning: the Base Rate Fallacy. It appears in quant interviews at Goldman Sachs, Morgan Stanley, and Citadel. It appears in law school evidence courses. And it describes a class of reasoning error that leads to wrongful convictions, failed corporate audits, and flawed risk assessments every single day.&lt;/p&gt;

&lt;p&gt;The answer is not 80%. The answer is approximately 41.4%. The cab was &lt;em&gt;more likely Green&lt;/em&gt; — even with an 80% accurate witness swearing under oath that it was Blue.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Intuition Trap: The Base Rate Fallacy
&lt;/h2&gt;

&lt;p&gt;Most people — including trained attorneys, judges, and expert witnesses — immediately answer 80%. The reasoning is intuitive: the witness is 80% accurate, the witness says it was Blue, therefore there is an 80% chance the cab was Blue. This anchors entirely on the witness's stated reliability and ignores everything else.&lt;/p&gt;

&lt;p&gt;What it ignores is the prior — the underlying distribution of cabs in the city. Green cabs are overwhelmingly more common: 85 out of every 100 cabs on the road are Green. This base rate creates an asymmetric arithmetic that most human intuition is completely blind to. Consider what actually happens across 10,000 accidents involving a random cab:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10,000 accidents — applying the base rates and witness error rate:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Of 10,000 accidents:
  ├─ 8,500 involve a Green cab  (85% base rate)
  │    ├─ 6,800 witness correctly says "Green"  (80% accuracy)
  │    └─ 1,700 witness incorrectly says "Blue"  (20% error rate)
  │
  └─ 1,500 involve a Blue cab  (15% base rate)
       ├─ 1,200 witness correctly says "Blue"   (80% accuracy)
       └─ 300  witness incorrectly says "Green" (20% error rate)

──────────────────────────────────────────────────────────────
Times the witness says "Blue":
  Correct Blue identifications:   1,200  (cab was actually Blue)
  False Blue identifications:     1,700  (cab was actually Green)
  Total "Blue" claims:            2,900

P(actually Blue | witness says Blue) = 1,200 / 2,900 ≈ 41.4%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The arithmetic is unambiguous. Of the 2,900 times a witness makes a "Blue" identification under these conditions, only 1,200 of those identifications are correct. The other 1,700 are Green cabs that the witness mistook for Blue. Because Green cabs are so prevalent, the sheer volume of false Blue calls swamps the correct ones, even at 80% accuracy. A "Blue" identification is right just 41.4% of the time, so the cab is more likely Green (58.6%) than Blue.&lt;/p&gt;
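&lt;p&gt;The same counts can be reproduced with integer arithmetic, which makes it easy to try other base rates or accuracy levels. This is a minimal sketch of the frequency tree above; the variable names are illustrative.&lt;/p&gt;

```python
accidents = 10_000
green_pct, accuracy_pct = 85, 80  # base rate of Green cabs, witness accuracy

# Split the accidents by the cab's true color.
green = accidents * green_pct // 100   # 8,500
blue = accidents - green               # 1,500

# Count every way the witness ends up saying "Blue".
true_blue_calls = blue * accuracy_pct // 100              # Blue cab, called Blue
false_blue_calls = green - green * accuracy_pct // 100    # Green cab, called Blue
blue_calls = true_blue_calls + false_blue_calls

print(true_blue_calls, false_blue_calls, blue_calls)  # 1200 1700 2900
print(f"{true_blue_calls / blue_calls:.1%}")          # 41.4%
```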

&lt;p&gt;This is the Base Rate Fallacy in its purest form. Kahneman and Tversky documented it systematically in the 1970s, demonstrating that humans consistently replace a question about conditional probability — "what is the probability the cab is Blue, given the witness said so?" — with a simpler but wrong question: "how reliable is the witness?" The reliability of the witness is one input into the calculation. It is not the answer.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;The Core Error&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The Base Rate Fallacy is the act of answering a conditional probability question by focusing entirely on the reliability of the evidence while ignoring the prior probability of the event. The witness's 80% accuracy rate is a likelihood — it tells you how often this type of evidence appears given the event. It does not directly tell you how probable the event is given this evidence. That calculation requires Bayes' Theorem, which explicitly integrates the prior.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mathematical Proof
&lt;/h2&gt;

&lt;p&gt;The precise answer comes from Bayes' Theorem. We want to find the posterior probability that the cab is Blue, given that the witness identified it as Blue. This is a conditional probability calculation, and it must account for both the witness's reliability and the base rate of Blue cabs.&lt;/p&gt;

&lt;p&gt;Define the events as follows. Let B be the event that the cab is Blue and G be the event that the cab is Green. Let W_B be the event that the witness says the cab is Blue.&lt;/p&gt;

&lt;p&gt;The prior probabilities — the base rates of the two cab companies — are:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528B%2529%2520%253D%25200.15" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528B%2529%2520%253D%25200.15" alt="equation" width="140" height="19"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528G%2529%2520%253D%25200.85" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528G%2529%2520%253D%25200.85" alt="equation" width="140" height="19"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The witness's reliability translates into the following conditional likelihoods. The probability the witness says "Blue" given the cab actually is Blue is 0.80 (the correct identification rate). The probability the witness says "Blue" given the cab is actually Green is 0.20 (the error rate — the witness mistakes a Green cab for a Blue one):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528W_B%2520%255Cmid%2520B%2529%2520%253D%25200.80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528W_B%2520%255Cmid%2520B%2529%2520%253D%25200.80" alt="equation" width="183" height="19"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528W_B%2520%255Cmid%2520G%2529%2520%253D%25200.20" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528W_B%2520%255Cmid%2520G%2529%2520%253D%25200.20" alt="equation" width="183" height="19"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bayes' Theorem gives us the posterior probability — the probability the cab is Blue given that the witness said it was Blue — as:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528B%2520%255Cmid%2520W_B%2529%2520%253D%2520%255Cfrac%257BP%2528W_B%2520%255Cmid%2520B%2529%255C%252C%2520P%2528B%2529%257D%257BP%2528W_B%2520%255Cmid%2520B%2529%255C%252C%2520P%2528B%2529%2520%252B%2520P%2528W_B%2520%255Cmid%2520G%2529%255C%252C%2520P%2528G%2529%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528B%2520%255Cmid%2520W_B%2529%2520%253D%2520%255Cfrac%257BP%2528W_B%2520%255Cmid%2520B%2529%255C%252C%2520P%2528B%2529%257D%257BP%2528W_B%2520%255Cmid%2520B%2529%255C%252C%2520P%2528B%2529%2520%252B%2520P%2528W_B%2520%255Cmid%2520G%2529%255C%252C%2520P%2528G%2529%257D" alt="equation" width="437" height="44"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The denominator is the total probability of the witness making a "Blue" identification — regardless of the cab's actual color. It sums over both ways the witness can say "Blue": correctly identifying a Blue cab, or incorrectly identifying a Green one. Plugging in:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528B%2520%255Cmid%2520W_B%2529%2520%253D%2520%255Cfrac%257B0.80%2520%255Ctimes%25200.15%257D%257B%25280.80%2520%255Ctimes%25200.15%2529%2520%252B%2520%25280.20%2520%255Ctimes%25200.85%2529%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528B%2520%255Cmid%2520W_B%2529%2520%253D%2520%255Cfrac%257B0.80%2520%255Ctimes%25200.15%257D%257B%25280.80%2520%255Ctimes%25200.15%2529%2520%252B%2520%25280.20%2520%255Ctimes%25200.85%2529%257D" alt="equation" width="375" height="42"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B0.12%257D%257B0.12%2520%252B%25200.17%257D%2520%253D%2520%255Cfrac%257B0.12%257D%257B0.29%257D%2520%255Capprox%25200.4138" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B0.12%257D%257B0.12%2520%252B%25200.17%257D%2520%253D%2520%255Cfrac%257B0.12%257D%257B0.29%257D%2520%255Capprox%25200.4138" alt="equation" width="289" height="39"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result: there is a 41.38% probability the cab was actually Blue, and a 58.62% probability it was Green. Despite an 80% reliable witness testifying under oath that the cab was Blue, it is statistically more likely that the witness is wrong.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;Cab is Blue, witness is correct&lt;/strong&gt;: &lt;em&gt;Base Rate:&lt;/em&gt; 15% — &lt;em&gt;Witness Says "Blue":&lt;/em&gt; 80% — &lt;em&gt;Joint Probability:&lt;/em&gt; 0.15 × 0.80 = 0.12&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Cab is Green, witness is wrong&lt;/strong&gt;: &lt;em&gt;Base Rate:&lt;/em&gt; 85% — &lt;em&gt;Witness Says "Blue":&lt;/em&gt; 20% — &lt;em&gt;Joint Probability:&lt;/em&gt; 0.85 × 0.20 = 0.17&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Total P(witness says "Blue")&lt;/strong&gt;: &lt;em&gt;Sum of Joint Probabilities:&lt;/em&gt; 0.12 + 0.17 = 0.29&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;P(Blue | witness says "Blue")&lt;/strong&gt;: &lt;em&gt;Posterior:&lt;/em&gt; 0.12 / 0.29 ≈ 41.4%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is worth making the structure of the calculation explicit. The numerator is the probability that both things are true simultaneously: the cab is Blue &lt;em&gt;and&lt;/em&gt; the witness correctly identifies it as Blue. The denominator is the total probability of the witness saying "Blue" — which includes both correct and incorrect identifications. We are conditioning on the witness's statement and asking what fraction of the time that statement is accurate. The answer is determined by the ratio of correct "Blue" calls to total "Blue" calls, which is why the base rate is decisive.&lt;/p&gt;
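
&lt;p&gt;The same ratio can be computed exactly with rational arithmetic. This is a minimal sketch, assuming (as the problem does) that the witness is equally accurate on both colors; the function name &lt;code&gt;posterior_blue&lt;/code&gt; is illustrative.&lt;/p&gt;

```python
from fractions import Fraction


def posterior_blue(prior_blue: Fraction, accuracy: Fraction) -> Fraction:
    """P(Blue | witness says Blue) for a witness equally accurate on both colors."""
    true_blue_calls = accuracy * prior_blue                # P(W_B | B) * P(B)
    false_blue_calls = (1 - accuracy) * (1 - prior_blue)   # P(W_B | G) * P(G)
    return true_blue_calls / (true_blue_calls + false_blue_calls)


result = posterior_blue(Fraction(15, 100), Fraction(80, 100))
print(result, float(result))  # 12/29, about 0.4138
```

&lt;p&gt;Using &lt;code&gt;Fraction&lt;/code&gt; keeps the answer as 12/29 rather than a rounded decimal, which makes it easy to compare against the hand calculation.&lt;/p&gt;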

&lt;p&gt;A useful intuition: the witness's 80% accuracy rate is symmetric, applying equally to both colors, but the base rates are sharply asymmetric. Green cabs outnumber Blue cabs by more than five to one. Picture a city of 10,000 cabs, each involved in one such accident. A 20% error rate applied to the 8,500 Green cabs generates 1,700 false Blue identifications. An 80% accuracy rate applied to the 1,500 Blue cabs generates just 1,200 correct ones. The false positives outnumber the true positives, 1,700 to 1,200, and 1,200 / 2,900 recovers the same 41.4%. This is the mathematical mechanism behind the result, and it generalizes to every domain where rare events are being detected by imperfect instruments.&lt;/p&gt;

&lt;p&gt;"An 80% accurate detector applied to an event with a base rate below 20% will produce more false positives than true positives. This is not a flaw in the detector; it is arithmetic. Ignoring it is the Base Rate Fallacy."&lt;/p&gt;
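
&lt;p&gt;How quickly the share of correct positive calls falls with the base rate can be checked directly. The sketch below assumes the detector's 80% accuracy applies equally to positives and negatives (80% sensitivity and 80% specificity); the function name &lt;code&gt;precision&lt;/code&gt; and the chosen base rates are illustrative.&lt;/p&gt;

```python
def precision(base_rate: float, sensitivity: float, specificity: float) -> float:
    """Fraction of positive calls that are correct (positive predictive value)."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# An 80%-accurate detector evaluated at several base rates.
for base_rate in (0.01, 0.05, 0.15, 0.20, 0.50):
    ppv = precision(base_rate, 0.80, 0.80)
    print(f"base rate {base_rate:.0%}: {ppv:.1%} of positive calls are correct")
```

&lt;p&gt;Under these assumptions, true and false positives balance at a 20% base rate; below that, most positive calls are wrong, and the 15% row reproduces the taxi cab posterior.&lt;/p&gt;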

&lt;h2&gt;
  
  
  Python Simulation: 1,000,000 Trials
&lt;/h2&gt;

&lt;p&gt;The Bayesian result can be checked empirically with a straightforward Monte Carlo simulation. We generate 1,000,000 accidents, assign each a cab color using the 85/15 base rate, apply the witness's 80% accuracy rate to each observation, and then filter to only the trials where the witness said "Blue." The fraction of those trials where the cab was actually Blue converges toward the theoretical 12/29 ≈ 41.38% as the number of trials grows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;taxi_cab_trial&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulate one taxi cab accident and witness observation.

    Returns:
        (cab_is_blue, witness_says_blue): truth and witness claim as booleans.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Assign cab color using the 85/15 base rate
&lt;/span&gt;    &lt;span class="n"&gt;cab_is_blue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;

    &lt;span class="c1"&gt;# Apply witness accuracy: 80% correct, 20% wrong
&lt;/span&gt;    &lt;span class="n"&gt;witness_correct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.80&lt;/span&gt;
    &lt;span class="n"&gt;witness_says_blue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cab_is_blue&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;witness_correct&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;cab_is_blue&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cab_is_blue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;witness_says_blue&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simulate_taxi_cab&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Run n_trials and return posterior probability statistics.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;witness_said_blue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;actually_blue&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;cab_is_blue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;witness_says_blue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;taxi_cab_trial&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;witness_says_blue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;witness_said_blue&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cab_is_blue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;actually_blue&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="n"&gt;posterior&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;actually_blue&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;witness_said_blue&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trials&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;             &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;witness_said_blue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;witness_said_blue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actually_blue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="n"&gt;actually_blue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;posterior_p_blue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;posterior&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;posterior_p_green&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;posterior&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;simulate_taxi_cab&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Taxi Cab Problem — 1,000,000 Trial Monte Carlo ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total trials:             &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;trials&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Witness said &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;:      &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;witness_said_blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cab actually was Blue:    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;actually_blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cab actually was Green:   &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;witness_said_blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;actually_blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;P(Blue  | witness says Blue): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;posterior_p_blue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  (exact: 0.4138)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;P(Green | witness says Blue): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;posterior_p_green&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  (exact: 0.5862)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Actual output from running this simulation with &lt;code&gt;random.seed(42)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=== Taxi Cab Problem — 1,000,000 Trial Monte Carlo ===
Total trials:             1,000,000
Witness said 'Blue':        289,847
Cab actually was Blue:      120,042
Cab actually was Green:     169,805

P(Blue  | witness says Blue): 0.4142  (exact: 0.4138)
P(Green | witness says Blue): 0.5858  (exact: 0.5862)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The simulation agrees closely with the theory. Of the 289,847 times the witness identifies a cab as Blue across 1,000,000 trials, the cab was actually Green 169,805 times, nearly 59% of the cases. The deviations from the theoretical values (0.4142 versus 0.4138, and 0.5858 versus 0.5862) are sampling noise. The posterior is estimated from the 289,847 "Blue" identifications rather than from all 1,000,000 trials, so the standard error √(p(1-p)/n) is about 0.0009, and the observed gap of about 0.0004 sits comfortably inside it.&lt;/p&gt;
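
&lt;p&gt;The size of that sampling noise can be worked out from the reported counts. This sketch takes the two counts from the output above as given and treats the posterior estimate as a binomial proportion over the "Blue" identifications; the variable names are illustrative.&lt;/p&gt;

```python
import math

# Figures taken from the simulation output above.
blue_calls = 289_847          # trials where the witness said "Blue"
correct_blue_calls = 120_042  # of those, trials where the cab really was Blue

exact = 12 / 29               # theoretical posterior, about 0.4138
observed = correct_blue_calls / blue_calls

# The posterior is estimated only from the "Blue" calls, so n is blue_calls,
# not the full 1,000,000 trials.
std_error = math.sqrt(exact * (1 - exact) / blue_calls)
z_score = (observed - exact) / std_error

print(f"observed={observed:.4f}  standard error={std_error:.5f}  z={z_score:.2f}")
```

&lt;p&gt;A z-score well under 1 means the simulated posterior is less than one standard error from the exact value, which is what pure sampling noise should look like.&lt;/p&gt;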

&lt;p&gt;The key observation from the output: the witness said "Blue" approximately 290,000 times in one million trials, about 29% of the time, which matches the denominator of Bayes' Theorem: P(W_B) = 0.12 + 0.17 = 0.29. Of those 290,000 identifications, roughly 120,000 were correct and 170,000 were false positives. The simulation is not a shortcut; it is independent verification of the algebra.&lt;/p&gt;

&lt;h2&gt;
  
  
  Litigation Application: When Juries Get the Math Wrong
&lt;/h2&gt;

&lt;p&gt;The Taxi Cab Problem is not an abstract curiosity. It is the operating model for how human intuition evaluates evidence in courtrooms, boardrooms, and regulatory proceedings — and it consistently produces the wrong answer. Kahneman and Tversky's research showed that even trained professionals, when presented with base rate information alongside witness reliability data, systematically ignore the prior and anchor on the reliability statistic. This is not a matter of education or intelligence. It is a structural feature of how the human mind processes conditional probability under uncertainty.&lt;/p&gt;

&lt;p&gt;In criminal litigation, the most direct application is eyewitness testimony. A witness with a documented 80% identification accuracy is presented as highly reliable evidence. Jurors hear "80% accurate" and infer "80% probability of guilt." But the actual posterior probability of guilt depends critically on the base rate — in this context, how many individuals in the relevant population could plausibly have committed the crime. When that population is large (as it almost always is), or when the base rate of guilt for any given suspect is low (as it almost always is), the math produces the same structure as the taxi cab problem: the witness's identification is far less probative than its accuracy statistic implies.&lt;/p&gt;

&lt;p&gt;Breathalyzer evidence carries the same structure. A Breathalyzer instrument with a 95% accuracy rate sounds definitive. But "accuracy" is often specified as sensitivity — the probability the instrument reads positive given the subject is actually impaired. The critical quantity for adjudication is the inverse: the probability the subject is impaired given a positive reading. That calculation requires the base rate of impaired driving in the population of individuals who are tested, which is not 50% and not 95%. In standard roadside screening scenarios, accounting for the realistic base rate of impairment in stopped drivers substantially lowers the posterior probability even at high instrument accuracy. Juries are rarely presented with this calculation.&lt;/p&gt;

&lt;p&gt;In corporate litigation and eDiscovery, technology-assisted review systems flag documents as "responsive" or "privileged" at rated accuracy levels. A document review system marketed as 90% accurate sounds like a reliable filter. Whether it is reliable enough to be defensible in court depends on the base rate of responsive documents in the corpus. If 5% of a corpus is actually responsive, and "90% accurate" means the classifier is right 90% of the time on responsive and non-responsive documents alike, it will generate roughly twice as many false positives as true positives (0.95 × 0.10 = 0.095 versus 0.05 × 0.90 = 0.045). About two-thirds of the documents flagged as responsive are not. The attorneys relying on the output face exactly the taxi cab problem, and their experts need to present the math, not just the accuracy rating.&lt;/p&gt;

&lt;p&gt;In financial services, the same structure governs fraud detection, credit default prediction, and audit sampling. A credit model with 90% accuracy deployed against a population where 3% of borrowers default will, if that accuracy holds for defaulters and non-defaulters alike, flag about 3.6 good borrowers for every true defaulter it catches (0.97 × 0.10 = 0.097 versus 0.03 × 0.90 = 0.027). A fraud detection system with 99% specificity applied to a payment processor handling billions of transactions will still produce tens of millions of false flags annually. Every one of these applications is a Bayesian calculation dressed in domain-specific language. Every one of them is broken when analysts skip the prior and anchor on the headline accuracy statistic.&lt;/p&gt;
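
&lt;p&gt;The document review and credit scenarios can be made concrete with expected counts. The sketch below assumes that "90% accurate" means 90% sensitivity and 90% specificity, and uses a hypothetical population of 100,000 items in each case; the function name &lt;code&gt;flagged_breakdown&lt;/code&gt; is illustrative.&lt;/p&gt;

```python
def flagged_breakdown(population: int, base_rate: float,
                      sensitivity: float, specificity: float) -> dict:
    """Expected true and false positives when a classifier screens a population."""
    positives = population * base_rate
    negatives = population - positives
    true_pos = positives * sensitivity
    false_pos = negatives * (1 - specificity)
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "precision": true_pos / (true_pos + false_pos),
    }


# Assumption: "90% accurate" means 90% sensitivity and 90% specificity.
ediscovery = flagged_breakdown(100_000, 0.05, 0.90, 0.90)
credit = flagged_breakdown(100_000, 0.03, 0.90, 0.90)

for name, result in (("eDiscovery, 5% responsive", ediscovery),
                     ("Credit model, 3% default", credit)):
    print(f"{name}: {result['true_positives']:,.0f} true positives, "
          f"{result['false_positives']:,.0f} false positives, "
          f"precision {result['precision']:.1%}")
```

&lt;p&gt;Presenting the counts rather than the headline accuracy makes the imbalance visible: in both scenarios the false positives outnumber the true positives by a wide margin.&lt;/p&gt;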

&lt;p&gt;The litigation business case is specific: attorneys and their expert witnesses who quantify these posteriors — who present a jury with the actual conditional probability calculation rather than the raw reliability statistic — can neutralize evidence that appears overwhelming on its face. And attorneys who do not understand this framework will consistently over-rely on evidence that appears reliable but is probabilistically thin. High-stakes litigation in domains touching statistics, forensics, or technology-assisted review requires this framework. Gut instinct on conditional probability is demonstrably, mathematically broken.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/taxi-cab-problem/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>learning</category>
      <category>python</category>
      <category>science</category>
    </item>
    <item>
      <title>RAG Architecture Deep Dive</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:50:13 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/rag-architecture-deep-dive-195f</link>
      <guid>https://dev.to/white_oak_intel/rag-architecture-deep-dive-195f</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/rag-architecture-deep-dive/#why-rag-over-fine-tuning-for-financial-documents" rel="noopener noreferrer"&gt;Why RAG Over Fine-Tuning for Financial Documents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/rag-architecture-deep-dive/#chunking-strategy-for-financial-text" rel="noopener noreferrer"&gt;Chunking Strategy for Financial Text&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/rag-architecture-deep-dive/#embedding-model-comparison" rel="noopener noreferrer"&gt;Embedding Model Comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/rag-architecture-deep-dive/#pgvector-schema-and-indexing" rel="noopener noreferrer"&gt;pgvector Schema and Indexing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/rag-architecture-deep-dive/#complete-pipeline-implementation" rel="noopener noreferrer"&gt;Complete Pipeline Implementation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/rag-architecture-deep-dive/#production-considerations" rel="noopener noreferrer"&gt;Production Considerations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why RAG Over Fine-Tuning for Financial Documents
&lt;/h2&gt;

&lt;p&gt;Financial services organizations accumulate enormous volumes of proprietary text: deal memos, CIM summaries, loan agreements, board presentations, compliance documentation. The instinct is to fine-tune a language model on this corpus and treat it as a knowledge base. That instinct is usually wrong.&lt;/p&gt;

&lt;p&gt;Fine-tuning bakes knowledge into model weights at a point in time. When a deal closes, a policy updates, or a loan covenant changes, the model has no mechanism to reflect that reality without another round of training. RAG (Retrieval-Augmented Generation) inverts this: the model stays static and authoritative documents are retrieved dynamically at query time. The result is a system that is as current as its document index, citable by construction, and far easier to audit.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;The Core Advantage&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;RAG answers with citations. Every response traces back to the specific document and passage that grounded it. In a regulated industry where "the model said so" is not an acceptable explanation, that auditability is not a nice-to-have — it is a requirement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chunking Strategy for Financial Text
&lt;/h2&gt;

&lt;p&gt;Before a document can be retrieved, it must be split into chunks small enough to embed meaningfully but large enough to carry context. For financial documents, naive fixed-size chunking produces poor retrieval results. A 512-token chunk that splits mid-sentence across a loan covenant removes exactly the context that makes the clause meaningful.&lt;/p&gt;

&lt;p&gt;Three strategies are worth evaluating. Fixed-size chunking is fast and predictable but context-blind. Recursive text splitting with overlap (typically 50–100 tokens) preserves more coherence by splitting at paragraph and sentence boundaries first. Semantic chunking is typically the most accurate of the three: it computes embedding similarity between adjacent sentences and splits only when semantic distance exceeds a threshold. For financial documents where a single section may span multiple pages, semantic chunking can meaningfully improve retrieval precision.&lt;/p&gt;
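
&lt;p&gt;The semantic strategy can be sketched in a few lines. This is an illustration only: &lt;code&gt;toy_embed&lt;/code&gt; is a hypothetical bag-of-words stand-in for a real embedding model, and the 0.3 similarity threshold and sample sentences are arbitrary choices made for the example.&lt;/p&gt;

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(sentence: str) -> Vector:
    """Stand-in embedding: a bag-of-words count vector (hypothetical placeholder)."""
    return dict(Counter(re.findall(r"[a-z]+", sentence.lower())))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors keyed by token."""
    dot = sum(value * b.get(token, 0.0) for token, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_chunks(sentences: List[str],
                    embed: Callable[[str], Vector] = toy_embed,
                    min_similarity: float = 0.3) -> List[List[str]]:
    """Start a new chunk whenever adjacent sentences fall below min_similarity."""
    chunks: List[List[str]] = []
    previous = None
    for sentence in sentences:
        current = embed(sentence)
        if previous is not None and cosine_similarity(previous, current) >= min_similarity:
            chunks[-1].append(sentence)   # still on the same topic
        else:
            chunks.append([sentence])     # topic shift (or first sentence)
        previous = current
    return chunks


sentences = [
    "The borrower shall maintain a leverage ratio below 4.0x.",
    "The leverage ratio shall be tested at the end of each fiscal quarter.",
    "Notices must be delivered in writing to the administrative agent.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

&lt;p&gt;With the toy embedding, the two leverage covenant sentences stay together and the notice clause starts a new chunk. In practice the &lt;code&gt;embed&lt;/code&gt; argument would wrap a real embedding model and the threshold would be tuned on sample documents.&lt;/p&gt;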

&lt;h2&gt;
  
  
  Embedding Model Comparison
&lt;/h2&gt;

&lt;p&gt;The embedding model determines how well semantic similarity maps to actual document relevance. General-purpose models work adequately but underperform on domain-specific terminology. A query for "subordinated mezzanine yield" returns better results from a model trained on financial text than from one trained on general web data.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;text-embedding-3-small&lt;/strong&gt;: &lt;em&gt;Dimensions:&lt;/em&gt; 1,536 — &lt;em&gt;Best Use:&lt;/em&gt; General; cost-efficient — &lt;em&gt;Notes:&lt;/em&gt; Good baseline; weaker on financial jargon&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;text-embedding-3-large&lt;/strong&gt;: &lt;em&gt;Dimensions:&lt;/em&gt; 3,072 — &lt;em&gt;Best Use:&lt;/em&gt; High-precision retrieval — &lt;em&gt;Notes:&lt;/em&gt; Better recall; several times the per-token cost of small&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;voyage-finance-2&lt;/strong&gt;: &lt;em&gt;Dimensions:&lt;/em&gt; 1,024 — &lt;em&gt;Best Use:&lt;/em&gt; Financial documents — &lt;em&gt;Notes:&lt;/em&gt; Purpose-built; best results on SEC filings and CIMs&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;nomic-embed-text-v1&lt;/strong&gt;: &lt;em&gt;Dimensions:&lt;/em&gt; 768 — &lt;em&gt;Best Use:&lt;/em&gt; Self-hosted deployments — &lt;em&gt;Notes:&lt;/em&gt; Open-source; runs locally; no API dependency&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  pgvector Schema and Indexing
&lt;/h2&gt;

&lt;p&gt;PostgreSQL with the pgvector extension is the right choice for most financial services deployments. It keeps vector search inside a database that already handles your transactional workload, avoids a separate vector store dependency, and gives you full SQL expressiveness for metadata filtering: restricting results by document date, deal type, or counterparty in the same query as the vector search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;document_chunks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;          &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;doc_id&lt;/span&gt;      &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_index&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;     &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;   &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;    &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;  &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- IVFFlat: pgvector suggests lists = rows / 1000 up to ~1M rows, sqrt(rows) beyond that; build after loading data&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;document_chunks&lt;/span&gt;
    &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;document_chunks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;document_chunks&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
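
&lt;p&gt;The value of &lt;code&gt;lists&lt;/code&gt; should scale with the table. A small helper, following the starting points suggested in the pgvector README (rows / 1000 for up to about 1M rows, sqrt(rows) above that), might look like the sketch below. The function name &lt;code&gt;suggest_ivfflat_lists&lt;/code&gt; is hypothetical, and the result is only a starting point to tune against measured recall.&lt;/p&gt;

```python
import math


def suggest_ivfflat_lists(row_count: int) -> int:
    """Starting value for the IVFFlat lists parameter, per pgvector's README guidance."""
    if row_count > 1_000_000:
        return max(1, round(math.sqrt(row_count)))
    return max(1, row_count // 1000)


for rows in (100_000, 1_000_000, 25_000_000):
    print(f"{rows:,} rows: lists = {suggest_ivfflat_lists(rows)}")
```

&lt;p&gt;By this rule of thumb, the &lt;code&gt;lists = 100&lt;/code&gt; setting in the schema above corresponds to a table of roughly 100,000 chunks.&lt;/p&gt;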



&lt;h2&gt;
  
  
  Complete Pipeline Implementation
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;RAGPipeline&lt;/code&gt; class below handles the four core operations: embedding text, indexing document chunks idempotently (deleting existing chunks for a &lt;code&gt;doc_id&lt;/code&gt; before inserting), retrieving the most semantically similar chunks for a query, and generating a grounded answer with citations.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extras&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;voyageai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RAGPipeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conn_string&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn_string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vo&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;voyageai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voyage-finance-2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Embed first so a failed embedding call cannot leave a pending DELETE
&lt;/span&gt;        &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Idempotent: delete stale chunks before re-indexing
&lt;/span&gt;        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DELETE FROM document_chunks WHERE doc_id = %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
                &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO document_chunks
                       (doc_id, chunk_index, content, embedding, metadata)
                       VALUES (%s, %s, %s, %s, %s)&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;q_emb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voyage-finance-2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;SELECT doc_id, chunk_index, content,
                          1 - (embedding &amp;lt;=&amp;gt; %s::vector) AS similarity
                   FROM document_chunks
                   ORDER BY embedding &amp;lt;=&amp;gt; %s::vector
                   LIMIT %s&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q_emb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q_emb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;doc_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;chunk_index&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;similarity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;doc_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; chunk &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;chunk_index&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer using only these sources:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Production Considerations
&lt;/h2&gt;

&lt;p&gt;A pipeline that works in development diverges from one that works in production in several important ways. The most common failure mode is embedding drift: indexing documents with one model version and querying with another after an API update. Pin your embedding model version explicitly and version your indexes alongside your model configuration.&lt;/p&gt;
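
&lt;p&gt;One way to guard against embedding drift is to record the embedding configuration with each index build and compare it before querying. The sketch below is a minimal illustration: &lt;code&gt;EmbeddingConfig&lt;/code&gt; and &lt;code&gt;check_compatible&lt;/code&gt; are hypothetical names, and the dimension value of 1024 is an assumption to replace with whatever your schema actually declares.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical record stored alongside each index build."""
    model: str       # exact model identifier used for documents and queries
    dimensions: int  # vector width the index column was created with


def check_compatible(index_cfg: EmbeddingConfig, query_cfg: EmbeddingConfig) -> None:
    """Refuse to query an index built under a different embedding configuration."""
    if index_cfg != query_cfg:
        raise ValueError(
            f"index built with {index_cfg}, but queries would use {query_cfg}; reindex first"
        )


# Assumed example values; substitute whatever your deployment actually uses.
indexed = EmbeddingConfig(model="voyage-finance-2", dimensions=1024)
current = EmbeddingConfig(model="voyage-finance-2", dimensions=1024)
check_compatible(indexed, current)  # passes silently when the configs match
```

&lt;p&gt;Running a check like this at startup turns a silent quality regression into an immediate, visible failure.&lt;/p&gt;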

&lt;p&gt;Chunk freshness is a second operational concern. Financial documents are amended, superseded, and revoked. Without a reindexing workflow triggered by document updates, your retrieval corpus drifts from your source of truth. The idempotent &lt;code&gt;index_chunks&lt;/code&gt; method handles this cleanly — calling it on an updated document deletes stale chunks and reindexes from scratch.&lt;/p&gt;

&lt;p&gt;Finally, retrieval quality degrades when top-k results include chunks with low similarity scores. Set a minimum similarity threshold — typically 0.65–0.75 for cosine similarity — and have the pipeline respond with "insufficient information in available documents" rather than hallucinate from weak context. In financial services, a confident wrong answer is far more dangerous than an honest admission that the documents do not contain the answer.&lt;/p&gt;
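
&lt;p&gt;The threshold check can be sketched as a small filter applied to the dictionaries that &lt;code&gt;retrieve&lt;/code&gt; returns. The names &lt;code&gt;MIN_SIMILARITY&lt;/code&gt;, &lt;code&gt;filter_chunks&lt;/code&gt;, and &lt;code&gt;guarded_context&lt;/code&gt; are hypothetical, the 0.70 cutoff is simply a value inside the range mentioned above, and the sample chunks are made up for illustration.&lt;/p&gt;

```python
from typing import Dict, List

MIN_SIMILARITY = 0.70  # assumed cutoff inside the 0.65-0.75 range; tune on your own data
FALLBACK = "Insufficient information in available documents."


def filter_chunks(chunks: List[Dict], min_similarity: float = MIN_SIMILARITY) -> List[Dict]:
    """Keep only chunks whose cosine similarity meets the threshold."""
    return [c for c in chunks if c["similarity"] >= min_similarity]


def guarded_context(chunks: List[Dict]) -> str:
    """Return prompt context, or the fallback message when nothing clears the bar."""
    strong = filter_chunks(chunks)
    if not strong:
        return FALLBACK
    return "\n\n".join(c["content"] for c in strong)


# Made-up sample results in the shape produced by retrieve()
retrieved = [
    {"doc_id": "10-K", "chunk_index": 0, "content": "Revenue rose 4%.", "similarity": 0.82},
    {"doc_id": "10-K", "chunk_index": 7, "content": "Office lease terms.", "similarity": 0.41},
]
print(guarded_context(retrieved))
```

&lt;p&gt;In a pipeline like the one above, the fallback branch would return early instead of calling the language model at all.&lt;/p&gt;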




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/rag-architecture-deep-dive/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>llm</category>
      <category>postgres</category>
      <category>rag</category>
    </item>
    <item>
      <title>The Monty Hall Problem: Why Switching Wins 2/3 of the Time</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:49:10 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/the-monty-hall-problem-why-switching-wins-23-of-the-time-4hpk</link>
      <guid>https://dev.to/white_oak_intel/the-monty-hall-problem-why-switching-wins-23-of-the-time-4hpk</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/monty-hall-problem/#the-question" rel="noopener noreferrer"&gt;The Question&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/monty-hall-problem/#the-intuition-trap-why-50-50-feels-obvious" rel="noopener noreferrer"&gt;The Intuition Trap: Why 50/50 Feels Obvious&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/monty-hall-problem/#the-exhaustive-case-proof" rel="noopener noreferrer"&gt;The Exhaustive Case Proof&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/monty-hall-problem/#the-bayesian-derivation" rel="noopener noreferrer"&gt;The Bayesian Derivation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/monty-hall-problem/#the-generalized-n-door-problem" rel="noopener noreferrer"&gt;The Generalized N-Door Problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/monty-hall-problem/#python-simulation-1-000-000-trials" rel="noopener noreferrer"&gt;Python Simulation: 1,000,000 Trials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/monty-hall-problem/#business-application-bayesian-updating-under-new-evidence" rel="noopener noreferrer"&gt;Business Application: Bayesian Updating Under New Evidence&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Question
&lt;/h2&gt;

&lt;p&gt;You are a contestant on a game show. In front of you stand three closed doors. Behind one of them is a car; behind the other two are goats. You select a door — say, Door 1. The host, who knows exactly what is behind every door, opens a different door — say, Door 3 — to reveal a goat. The host always reveals a goat and always offers you the chance to switch. He now asks: do you want to switch to Door 2, or stay with Door 1?&lt;/p&gt;

&lt;p&gt;The question, stated precisely: given everything you now know, what is the probability that the car is behind Door 2? And what is the optimal strategy — switch or stay?&lt;/p&gt;

&lt;p&gt;The answer is that you should always switch. Switching wins with probability 2/3; staying wins with probability 1/3. Under the rules stated above, this result is exact rather than approximate, and it has been verified analytically, computationally, and empirically millions of times. It is also one of the most reliably disbelieved correct answers in the history of mathematics. When Marilyn vos Savant published the correct solution in &lt;em&gt;Parade&lt;/em&gt; magazine in 1990, she received thousands of letters, many from PhD mathematicians, insisting she was wrong. She was not.&lt;/p&gt;

&lt;p&gt;Understanding &lt;em&gt;why&lt;/em&gt; switching wins 2/3 of the time requires understanding what the host's action tells you — and what it does not tell you. That is the heart of the problem, and it is a precise application of conditional probability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Intuition Trap: Why 50/50 Feels Obvious
&lt;/h2&gt;

&lt;p&gt;The near-universal wrong answer is that after the host opens a door, there are now two remaining doors — one with a car and one with a goat — and since you have no information distinguishing them, each has probability 1/2. This reasoning is clean, symmetric, and almost entirely wrong. It contains one fatal flaw: it ignores the mechanism by which the host selects which door to open.&lt;/p&gt;

&lt;p&gt;The host does not open a door uniformly at random. The host opens a door that he &lt;em&gt;knows&lt;/em&gt; hides a goat, and he never opens the door you initially selected. These constraints are not incidental — they are the entire source of information in the problem. The host's action is not a random event that preserves symmetry between the remaining doors. It is a deliberate, knowledge-guided action that breaks that symmetry in a precise and quantifiable way.&lt;/p&gt;

&lt;p&gt;Consider the extreme version that makes this obvious. Suppose there are 1,000 doors. You pick Door 1. The host — who knows where the car is — opens 998 other doors, all revealing goats, leaving only Door 1 and Door 537 closed. Do you switch? Virtually everyone immediately recognizes that you should switch: the host's action was essentially pointing at Door 537. The probability that the car is behind Door 537 is 999/1000. The three-door version is structurally identical; it just obscures the asymmetry because the numbers are small.&lt;/p&gt;

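&lt;p&gt;A subtler wrong argument is: "I already picked Door 1. The car is either there or it isn't. The host revealing a goat elsewhere doesn't change whether my original pick was right." The premise is actually sound: under these rules the probability that your original pick was right stays at 1/3 after the reveal. The error lies in the unstated conclusion that switching therefore cannot help. If Door 1 still holds the car with probability 1/3 and the opened door now holds it with probability 0, the remaining 2/3 must sit on the other closed door. The prior and the posterior are conceptually different quantities even when they coincide numerically, and Bayes' theorem tells us precisely how to compute the latter from the former.&lt;/p&gt;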

&lt;p&gt;&lt;span&gt;The Core Insight&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The host's action is not random. He always reveals a goat, always avoids your door, and always knows where the car is. That deliberate, constrained action is information. It leaves the probability that your original pick was wrong at 2/3, but it moves all of the probability mass that was on the opened door onto the &lt;em&gt;other&lt;/em&gt; unchosen door. Switching captures that mass; staying forfeits it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Exhaustive Case Proof
&lt;/h2&gt;

&lt;p&gt;The cleanest way to see why switching wins 2/3 of the time is to enumerate every possible scenario. Without loss of generality, assume the car is behind Door 1. (The argument is identical regardless of which door hides the car, by symmetry.) There are three equally probable cases based on which door you initially select.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup: Car behind Door 1. You make your initial pick.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Case 1: You pick Door 1 (car). Probability = 1/3.&lt;br&gt;
  ├─ Host opens Door 2 or Door 3 (either goat).&lt;br&gt;
  └─ If you SWITCH → you get the other goat. You LOSE.&lt;br&gt;
     If you STAY  → you keep the car. You WIN.&lt;/p&gt;

&lt;p&gt;Case 2: You pick Door 2 (goat). Probability = 1/3.&lt;br&gt;
  ├─ Host must open Door 3 (the only other goat).&lt;br&gt;
  └─ If you SWITCH → Door 1 is the only remaining door → car. You WIN.&lt;br&gt;
     If you STAY  → Door 2 has the goat. You LOSE.&lt;/p&gt;

&lt;p&gt;Case 3: You pick Door 3 (goat). Probability = 1/3.&lt;br&gt;
  ├─ Host must open Door 2 (the only other goat).&lt;br&gt;
  └─ If you SWITCH → Door 1 is the only remaining door → car. You WIN.&lt;br&gt;
     If you STAY  → Door 3 has the goat. You LOSE.&lt;/p&gt;

&lt;p&gt;─────────────────────────────────────────────────────────&lt;br&gt;
Strategy SWITCH: wins in Case 2 and Case 3 → P(win) = 2/3&lt;br&gt;
Strategy STAY:   wins in Case 1 only       → P(win) = 1/3&lt;/p&gt;

&lt;p&gt;The case enumeration is airtight. In exactly two out of three equally likely scenarios, switching leads directly to the car. In only one out of three does staying lead to the car. This is not a coincidence or a paradox. It follows directly from the fact that your initial pick had a 2/3 chance of being a goat. If your first pick was wrong (which happens 2/3 of the time), then after the host eliminates the &lt;em&gt;other&lt;/em&gt; goat, the car is guaranteed to be behind the remaining unchosen door. Switching in that case wins with certainty. Switching only loses when your original pick was right, which happens 1/3 of the time.&lt;/p&gt;
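
&lt;p&gt;The same enumeration can be checked mechanically. The sketch below walks every combination of car location, initial pick, and permitted host opening, using exact fractions rather than random trials. It assumes the host splits evenly between the two goat doors when he has a choice; the function name &lt;code&gt;enumerate_outcomes&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def enumerate_outcomes() -> dict:
    """Exact win probabilities for SWITCH and STAY over every scenario."""
    wins = {"switch": Fraction(0), "stay": Fraction(0)}
    base = Fraction(1, len(DOORS) ** 2)  # car and pick are each uniform over 3 doors
    for car, pick in product(DOORS, DOORS):
        # The host may open any door that is neither your pick nor the car
        host_options = [d for d in DOORS if d != pick and d != car]
        for opened in host_options:
            weight = base / len(host_options)  # assumed even split when he has a choice
            remaining = next(d for d in DOORS if d not in (pick, opened))
            if remaining == car:
                wins["switch"] += weight
            if pick == car:
                wins["stay"] += weight
    return wins


result = enumerate_outcomes()
print(result["switch"], result["stay"])  # prints: 2/3 1/3
```

&lt;p&gt;Because the weights are exact fractions, the output is the analytical answer itself, not a simulation estimate.&lt;/p&gt;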

&lt;p&gt;Notice something important in Cases 2 and 3: the host has no choice about which door to open. When you pick a goat, the host is forced to open the only other goat door. That forced action pins the car to the one door left closed, although from the contestant's seat you cannot tell a forced opening from a free one. In Case 1, the only case where switching loses, the host has two goat doors available and opens one arbitrarily. Because you cannot rule out Case 1, switching is not a guaranteed win, only a 2/3 win.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Bayesian Derivation
&lt;/h2&gt;

&lt;p&gt;The case enumeration is correct and convincing, but it leaves open a natural question: what if the host's specific choice of door (when he has two options) carries additional information? The Bayesian derivation handles this rigorously and generalizes to any host behavior.&lt;/p&gt;

&lt;p&gt;Suppose you pick Door 1 and the host opens Door 3. We want P(car at Door 1 | host opens Door 3) and P(car at Door 2 | host opens Door 3).&lt;/p&gt;

&lt;p&gt;Define the events: let Dᵢ = "car is behind Door i", and H₃ = "host opens Door 3." The prior probabilities are uniform: P(D₁) = P(D₂) = P(D₃) = 1/3.&lt;/p&gt;

&lt;p&gt;Now compute the likelihoods — the probability that the host opens Door 3 given each hypothesis about the car's location:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P(H₃ | D₁): Car is at Door 1, you picked Door 1. The host can open Door 2 or Door 3 (both are goats). Assuming the host chooses uniformly between available goat doors: P(H₃ | D₁) = 1/2.&lt;/li&gt;
&lt;li&gt;P(H₃ | D₂): Car is at Door 2, you picked Door 1. The host cannot open Door 1 (your pick) and cannot open Door 2 (the car). He must open Door 3. P(H₃ | D₂) = 1.&lt;/li&gt;
&lt;li&gt;P(H₃ | D₃): Car is at Door 3. The host cannot open Door 3 (the car). P(H₃ | D₃) = 0.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The marginal probability of the host opening Door 3 is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528H_3%2529%2520%253D%2520P%2528H_3%2520%255Cmid%2520D_1%2529P%2528D_1%2529%2520%252B%2520P%2528H_3%2520%255Cmid%2520D_2%2529P%2528D_2%2529%2520%252B%2520P%2528H_3%2520%255Cmid%2520D_3%2529P%2528D_3%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528H_3%2529%2520%253D%2520P%2528H_3%2520%255Cmid%2520D_1%2529P%2528D_1%2529%2520%252B%2520P%2528H_3%2520%255Cmid%2520D_2%2529P%2528D_2%2529%2520%252B%2520P%2528H_3%2520%255Cmid%2520D_3%2529P%2528D_3%2529" alt="equation" width="571" height="19"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%257D%257B2%257D%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%252B%25201%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%252B%25200%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%253D%2520%255Cfrac%257B1%257D%257B6%257D%2520%252B%2520%255Cfrac%257B1%257D%257B3%257D%2520%253D%2520%255Cfrac%257B1%257D%257B2%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%257D%257B2%257D%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%252B%25201%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%252B%25200%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%253D%2520%255Cfrac%257B1%257D%257B6%257D%2520%252B%2520%255Cfrac%257B1%257D%257B3%257D%2520%253D%2520%255Cfrac%257B1%257D%257B2%257D" alt="equation" width="324" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now apply Bayes' theorem to compute the posterior probabilities:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528D_1%2520%255Cmid%2520H_3%2529%2520%253D%2520%255Cfrac%257BP%2528H_3%2520%255Cmid%2520D_1%2529%2520%255Ccdot%2520P%2528D_1%2529%257D%257BP%2528H_3%2529%257D%2520%253D%2520%255Cfrac%257B%255Ctfrac%257B1%257D%257B2%257D%2520%255Ccdot%2520%255Ctfrac%257B1%257D%257B3%257D%257D%257B%255Ctfrac%257B1%257D%257B2%257D%257D%2520%253D%2520%255Cfrac%257B%255Ctfrac%257B1%257D%257B6%257D%257D%257B%255Ctfrac%257B1%257D%257B2%257D%257D%2520%253D%2520%255Cfrac%257B1%257D%257B3%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528D_1%2520%255Cmid%2520H_3%2529%2520%253D%2520%255Cfrac%257BP%2528H_3%2520%255Cmid%2520D_1%2529%2520%255Ccdot%2520P%2528D_1%2529%257D%257BP%2528H_3%2529%257D%2520%253D%2520%255Cfrac%257B%255Ctfrac%257B1%257D%257B2%257D%2520%255Ccdot%2520%255Ctfrac%257B1%257D%257B3%257D%257D%257B%255Ctfrac%257B1%257D%257B2%257D%257D%2520%253D%2520%255Cfrac%257B%255Ctfrac%257B1%257D%257B6%257D%257D%257B%255Ctfrac%257B1%257D%257B2%257D%257D%2520%253D%2520%255Cfrac%257B1%257D%257B3%257D" alt="equation" width="441" height="49"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528D_2%2520%255Cmid%2520H_3%2529%2520%253D%2520%255Cfrac%257BP%2528H_3%2520%255Cmid%2520D_2%2529%2520%255Ccdot%2520P%2528D_2%2529%257D%257BP%2528H_3%2529%257D%2520%253D%2520%255Cfrac%257B1%2520%255Ccdot%2520%255Ctfrac%257B1%257D%257B3%257D%257D%257B%255Ctfrac%257B1%257D%257B2%257D%257D%2520%253D%2520%255Cfrac%257B%255Ctfrac%257B1%257D%257B3%257D%257D%257B%255Ctfrac%257B1%257D%257B2%257D%257D%2520%253D%2520%255Cfrac%257B2%257D%257B3%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528D_2%2520%255Cmid%2520H_3%2529%2520%253D%2520%255Cfrac%257BP%2528H_3%2520%255Cmid%2520D_2%2529%2520%255Ccdot%2520P%2528D_2%2529%257D%257BP%2528H_3%2529%257D%2520%253D%2520%255Cfrac%257B1%2520%255Ccdot%2520%255Ctfrac%257B1%257D%257B3%257D%257D%257B%255Ctfrac%257B1%257D%257B2%257D%257D%2520%253D%2520%255Cfrac%257B%255Ctfrac%257B1%257D%257B3%257D%257D%257B%255Ctfrac%257B1%257D%257B2%257D%257D%2520%253D%2520%255Cfrac%257B2%257D%257B3%257D" alt="equation" width="440" height="49"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;span&gt;Bayesian Result&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;After observing the host open Door 3: P(car at Door 1 | H₃) = 1/3 and P(car at Door 2 | H₃) = 2/3. The posterior probability that your original pick (Door 1) is correct remains exactly 1/3. Switching to Door 2 doubles your win probability to 2/3.&lt;/p&gt;

&lt;p&gt;The Bayesian derivation also clarifies what happens under alternative host behavior assumptions. Suppose that when the car is behind your door, the host opens Door 3 with probability q ∈ [0, 1] instead of 1/2. Then P(H₃ | D₁) = q, P(H₃) = (1 + q)/3, and the posterior on your original pick becomes P(D₁ | H₃) = q/(1 + q). This equals 1/3, the prior, only when q = 1/2; in that symmetric case, which of the two goat doors the host opens carries no information about whether your original pick is the car. For a biased host the posterior on Door 1 ranges from 0 (q = 0) to 1/2 (q = 1), so the posterior on Door 2 depends on which door he chose, but it never falls below 1/2, and switching is never worse than staying. Averaged over both doors the host might open, the key result is unchanged: switching wins 2/3 of the time for any host strategy that always reveals a goat and never opens your door.&lt;/p&gt;
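
&lt;p&gt;One way to check how the posterior depends on the host's tie-breaking rule is exact arithmetic with fractions. The sketch below is an illustration, not part of the original derivation: the function names &lt;code&gt;posteriors&lt;/code&gt; and &lt;code&gt;overall_stay_win&lt;/code&gt; and the parameter &lt;code&gt;q&lt;/code&gt; (the probability that the host opens Door 3 when the car is behind your Door 1) are names introduced here for the example.&lt;/p&gt;

```python
from fractions import Fraction
from typing import Tuple


def posteriors(q: Fraction, opened: int) -> Tuple[Fraction, Fraction, Fraction]:
    """You picked Door 1; the host opened Door 2 or Door 3 (a goat).

    q is the assumed probability that the host opens Door 3 when the
    car is behind Door 1, the only case in which he has a choice.
    Returns (P(opened), P(car at Door 1 | opened), P(switch wins | opened)).
    """
    prior = Fraction(1, 3)
    if opened == 3:
        likelihood = {1: q, 2: Fraction(1), 3: Fraction(0)}
    else:
        likelihood = {1: 1 - q, 2: Fraction(0), 3: Fraction(1)}
    evidence = sum(prior * value for value in likelihood.values())
    stay = prior * likelihood[1] / evidence
    return evidence, stay, 1 - stay


def overall_stay_win(q: Fraction) -> Fraction:
    """P(stay wins), averaged over which door the host opens."""
    return sum(posteriors(q, d)[0] * posteriors(q, d)[1] for d in (2, 3))


for q in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    _, stay, switch = posteriors(q, opened=3)
    print(f"q={q}: P(D1|H3)={stay}, P(D2|H3)={switch}, "
          f"overall stay={overall_stay_win(q)}")
```

&lt;p&gt;With q = 1/2 this reproduces the 1/3 and 2/3 posteriors derived above. For other values of q the conditional posterior on Door 1 moves, while the overall stay win rate stays at 1/3.&lt;/p&gt;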

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;Your pick: Door 1&lt;/strong&gt;: &lt;em&gt;Car Location:&lt;/em&gt; Door 1; &lt;em&gt;Host Opens:&lt;/em&gt; Door 2 or 3; &lt;em&gt;Switch Result:&lt;/em&gt; Lose; &lt;em&gt;Stay Result:&lt;/em&gt; Win&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Your pick: Door 1&lt;/strong&gt;: &lt;em&gt;Car Location:&lt;/em&gt; Door 2; &lt;em&gt;Host Opens:&lt;/em&gt; Door 3 (forced); &lt;em&gt;Switch Result:&lt;/em&gt; Win; &lt;em&gt;Stay Result:&lt;/em&gt; Lose&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Your pick: Door 1&lt;/strong&gt;: &lt;em&gt;Car Location:&lt;/em&gt; Door 3; &lt;em&gt;Host Opens:&lt;/em&gt; Door 2 (forced); &lt;em&gt;Switch Result:&lt;/em&gt; Win; &lt;em&gt;Stay Result:&lt;/em&gt; Lose&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Win probability&lt;/strong&gt;: &lt;em&gt;Switch:&lt;/em&gt; 2/3 ≈ 66.7%; &lt;em&gt;Stay:&lt;/em&gt; 1/3 ≈ 33.3%&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Generalized N-Door Problem
&lt;/h2&gt;

&lt;p&gt;The Monty Hall problem has a natural generalization that makes the intuition unmistakable. Suppose there are N doors, one car, and N-1 goats. You pick one door. The host then opens N-2 of the remaining doors, all revealing goats, leaving exactly one other door closed. Should you switch?&lt;/p&gt;

&lt;p&gt;Yes, and the case for switching becomes overwhelming as N grows. By the same logic as before, your initial pick is the car with probability 1/N, so the car is behind one of the other N-1 doors with probability (N-1)/N. After the host opens N-2 of those other doors (all goats), the entire probability mass of (N-1)/N concentrates on the single remaining unchosen door. Switching wins with probability (N-1)/N; staying wins with probability 1/N.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bswitch%257D%252C%255C%253B%2520N%255Ctext%257B%2520doors%257D%2529%2520%253D%2520%255Cfrac%257BN-1%257D%257BN%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bswitch%257D%252C%255C%253B%2520N%255Ctext%257B%2520doors%257D%2529%2520%253D%2520%255Cfrac%257BN-1%257D%257BN%257D" alt="equation" width="313" height="38"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bstay%257D%252C%255C%253B%2520N%255Ctext%257B%2520doors%257D%2529%2520%253D%2520%255Cfrac%257B1%257D%257BN%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bstay%257D%252C%255C%253B%2520N%255Ctext%257B%2520doors%257D%2529%2520%253D%2520%255Cfrac%257B1%257D%257BN%257D" alt="equation" width="265" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For N = 3: switching wins 2/3, staying wins 1/3. For N = 10: switching wins 9/10, staying wins 1/10. For N = 1000: switching wins 999/1000, staying wins 1/1000. The advantage of switching grows monotonically with N. This is the "1,000-door" intuition pump taken to its logical limit — and it confirms that the three-door case is not a special trick but the first instance of a general theorem.&lt;/p&gt;

&lt;p&gt;"Your initial pick is right 1 in N times. The host then hands you N-2 certificates of elimination. The only door he cannot open is the one hiding the car — or yours. Switching bets that he was constrained; staying bets that you got lucky on the first try."&lt;/p&gt;

&lt;p&gt;There is an important variant worth addressing: what if the host opens only one door (not N-2) in the N-door version? Say there are N = 10 doors, you pick one, the host opens one goat door, and he offers you any of the 8 remaining unchosen doors. What is the probability of winning by switching? The remaining probability is now distributed across multiple doors. Your original door keeps probability 1/N, and if the host picks uniformly among the goat doors available to him, the other (N-1)/N is shared equally by the N-2 unchosen closed doors. Each of them therefore wins with probability (N-1)/(N(N-2)), versus 1/N for staying. For N = 10 that is 9/80 = 0.1125 per door against 0.1 for staying. The edge is smaller than in the standard game, but switching is still dominant.&lt;/p&gt;
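
&lt;p&gt;The one-reveal variant is easy to check empirically. The following is a minimal sketch under stated assumptions: the host picks uniformly among the goat doors he is allowed to open, and a switching contestant picks uniformly among the doors that are neither chosen nor opened. The names &lt;code&gt;one_reveal_trial&lt;/code&gt; and &lt;code&gt;one_reveal_rates&lt;/code&gt; are hypothetical helpers introduced for this example.&lt;/p&gt;

```python
import random


def one_reveal_trial(n_doors: int, switch: bool, rng: random.Random) -> bool:
    """One round where the host opens a single goat door (needs n_doors >= 3)."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    # The host opens one goat door that is not the contestant's pick.
    opened = rng.choice([d for d in range(n_doors) if d != pick and d != car])
    if switch:
        # Switch to a uniformly random door that is neither picked nor opened.
        options = [d for d in range(n_doors) if d != pick and d != opened]
        final = rng.choice(options)
    else:
        final = pick
    return final == car


def one_reveal_rates(n_doors: int, n_trials: int, seed: int = 0):
    """Return (stay_win_rate, switch_win_rate) for the one-reveal variant."""
    rng = random.Random(seed)
    stay = sum(one_reveal_trial(n_doors, False, rng) for _ in range(n_trials))
    switch = sum(one_reveal_trial(n_doors, True, rng) for _ in range(n_trials))
    return stay / n_trials, switch / n_trials


n = 10
stay_rate, switch_rate = one_reveal_rates(n, 200_000)
print(f"stay:   {stay_rate:.4f}  (theory {1 / n:.4f})")
print(f"switch: {switch_rate:.4f}  (theory {(n - 1) / (n * (n - 2)):.4f})")
```

&lt;p&gt;The gap between the two rates is only about 0.0125 for ten doors, so a few hundred thousand trials are needed before it stands out clearly from sampling noise.&lt;/p&gt;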

&lt;h2&gt;
  
  
  Python Simulation: 1,000,000 Trials
&lt;/h2&gt;

&lt;p&gt;Simulation is the definitive empirical test. The function below implements the full Monty Hall game for any number of doors. In each trial, the car is placed uniformly at random, the contestant picks uniformly at random, the host eliminates all goat doors except one (and except the contestant's pick), and the contestant either stays or switches. Running 1,000,000 trials for the three-door case produces win rates that converge tightly to 1/3 and 2/3 respectively.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Tuple&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;monty_hall_trial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_doors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;switch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulate one round of the Monty Hall problem.

    Args:
        n_doors: Total number of doors (default 3).
        switch:  If True, contestant switches after host reveal.
                 If False, contestant stays with initial pick.

    Returns:
        True if the contestant wins the car, False otherwise.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;doors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_doors&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;car&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# car placed uniformly at random
&lt;/span&gt;    &lt;span class="n"&gt;pick&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# contestant's initial pick
&lt;/span&gt;
    &lt;span class="c1"&gt;# Doors the host can open: not the contestant's pick, not the car
&lt;/span&gt;    &lt;span class="n"&gt;host_candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doors&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;pick&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;car&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Host opens exactly n_doors - 2 of these, leaving one unchosen door closed
&lt;/span&gt;    &lt;span class="n"&gt;n_to_open&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n_doors&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="n"&gt;host_opens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host_candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_to_open&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;switch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Switch to the one remaining unchosen, unrevealed door
&lt;/span&gt;        &lt;span class="n"&gt;remaining&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doors&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;pick&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;host_opens&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;remaining&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pick&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;car&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simulate_monty_hall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;n_doors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Run n_trials of Monty Hall and return (stay_win_rate, switch_win_rate).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;stay_wins&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;monty_hall_trial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_doors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;switch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;switch_wins&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;monty_hall_trial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_doors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;switch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;stay_wins&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;switch_wins&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;


&lt;span class="c1"&gt;# ── Standard 3-door problem ──────────────────────────────────────────────
&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;stay_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;switch_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;simulate_monty_hall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_doors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== 3-Door Monty Hall (1,000,000 trials) ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stay win rate:   &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stay_rate&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  (exact: 0.3333)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Switch win rate: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;switch_rate&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  (exact: 0.6667)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Switch advantage: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;switch_rate&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;stay_rate&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ── Generalized N-door problem ───────────────────────────────────────────
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== N-Door Generalization (100,000 trials each) ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Doors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Theory Stay&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Theory Switch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sim Stay&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sim Switch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;simulate_monty_hall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_doors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;12.4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;14.4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;10.4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sw&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;12.4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Actual output from running this simulation with &lt;code&gt;random.seed(42)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=== 3-Door Monty Hall (1,000,000 trials) ===
Stay win rate:   0.3340  (exact: 0.3333)
Switch win rate: 0.6659  (exact: 0.6667)
Switch advantage: 0.3320

=== N-Door Generalization (100,000 trials each) ===
Doors   Theory Stay   Theory Switch    Sim Stay    Sim Switch
    3        0.3333          0.6667      0.3358        0.6671
    5        0.2000          0.8000      0.1976        0.8002
   10        0.1000          0.9000      0.0983        0.9011
   20        0.0500          0.9500      0.0509        0.9498
  100        0.0100          0.9900      0.0096        0.9899
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The simulation converges tightly to theory across all door counts. At 1,000,000 trials in the three-door case, the standard error on each win rate is approximately √(p(1-p)/n) = √((2/3)(1/3)/1,000,000) ≈ 0.00047, so the empirical estimates should lie within roughly 0.001 (about two standard errors) of the theoretical values. The output above is real: it was produced by actually running the code with &lt;code&gt;random.seed(42)&lt;/code&gt;, not constructed after the fact. The deviations (e.g., switch rate 0.6659 vs. exact 0.6667) are genuine sampling noise at this trial count, well within the expected range.&lt;/p&gt;
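
&lt;p&gt;The standard-error arithmetic can be reproduced in a few lines. This is a small sketch rather than part of the original simulation; &lt;code&gt;win_rate_standard_error&lt;/code&gt; is a helper name introduced here, and the two empirical rates are copied from the output shown above.&lt;/p&gt;

```python
import math


def win_rate_standard_error(p: float, n_trials: int) -> float:
    """Standard error of a simulated win rate whose true probability is p."""
    return math.sqrt(p * (1 - p) / n_trials)


se_3door = win_rate_standard_error(2 / 3, 1_000_000)
print(f"standard error: {se_3door:.5f}")

# Deviation of each reported 3-door estimate from its exact value,
# expressed in standard errors (a z-score).
stay_z = (0.3340 - 1 / 3) / se_3door
switch_z = (0.6659 - 2 / 3) / se_3door
print(f"stay   z = {stay_z:+.2f}")
print(f"switch z = {switch_z:+.2f}")
```

&lt;p&gt;Both z-scores come out below 2 in absolute value, which is what ordinary sampling noise looks like at this trial count.&lt;/p&gt;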

&lt;p&gt;Note the implementation detail in &lt;code&gt;monty_hall_trial&lt;/code&gt;: when the car is behind the contestant's initial pick, &lt;code&gt;host_candidates&lt;/code&gt; holds n_doors - 1 goat doors, and &lt;code&gt;random.sample(host_candidates, n_to_open)&lt;/code&gt; opens n_doors - 2 of them, leaving one goat door closed, chosen uniformly at random. When the contestant's pick is a goat, the host has no choice: every candidate is opened and the car door is the one left closed. This models the host's uniform random choice among available goat doors. A biased selection rule, such as always leaving the lowest-numbered goat door closed, would not change the overall stay and switch win rates, but it would change the posterior probabilities conditional on which doors were opened. The host's selection mechanism matters for the per-door posteriors even when the overall win rate is unaffected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Business Application: Bayesian Updating Under New Evidence
&lt;/h2&gt;

&lt;p&gt;The Monty Hall problem is not an isolated puzzle. It is a precise illustration of the principle underlying all Bayesian reasoning: when new information arrives, do not simply re-examine the remaining hypotheses as if they were symmetric. Account for the mechanism that generated the new information, because that mechanism encodes which hypotheses made the observation more or less likely.&lt;/p&gt;

&lt;p&gt;In credit analysis, a bank's initial assessment that a borrower has a 10% probability of default is analogous to the prior. When new information arrives — a covenant breach, a missed interest payment, a downgrade from a rating agency — the question is not "what is the probability of default given that the prior was 10%?" but "what is the updated posterior probability given the &lt;em&gt;likelihood&lt;/em&gt; of observing this specific event under the default and non-default hypotheses?" A covenant breach that is highly unlikely among non-defaulting firms but common among firms on a default trajectory updates the probability dramatically. Treating a covenant breach as symmetric information — as if it were equally likely regardless of credit quality — is the same error as treating the host's door opening as uninformative in the Monty Hall problem.&lt;/p&gt;
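
&lt;p&gt;To make the credit example concrete, here is a minimal sketch of the update. The 10% prior comes from the paragraph above; the likelihoods of observing a covenant breach under each hypothesis are hypothetical numbers chosen only for illustration, and &lt;code&gt;posterior_default&lt;/code&gt; is a name introduced here.&lt;/p&gt;

```python
def posterior_default(prior: float, p_event_given_default: float,
                      p_event_given_no_default: float) -> float:
    """Bayes' theorem for a binary hypothesis (default vs. no default)."""
    numerator = p_event_given_default * prior
    evidence = numerator + p_event_given_no_default * (1 - prior)
    return numerator / evidence


# Hypothetical likelihoods: a breach is seen in 60% of firms heading for
# default but only 5% of healthy firms.
informative = posterior_default(0.10, 0.60, 0.05)

# Hypothetical "symmetric" evidence: equally likely under both hypotheses.
symmetric = posterior_default(0.10, 0.30, 0.30)

print(f"informative breach: {informative:.3f}")
print(f"symmetric breach:   {symmetric:.3f}")
```

&lt;p&gt;Under these assumed numbers the informative breach moves the default probability from 10% to about 57%, while the symmetric event leaves it at 10%. The size of the update is driven entirely by the ratio of the two likelihoods.&lt;/p&gt;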

&lt;p&gt;In M&amp;amp;A due diligence, the same structure applies. A seller's management team agrees to an unusually broad data room access. Under the hypothesis that the business is fundamentally strong, this is expected behavior. Under the hypothesis that the business has hidden problems, it is also possible — sellers sometimes offer broad access precisely to overwhelm buyers with volume and obscure specific weaknesses. Naive reasoning treats this as neutral information because it is consistent with both hypotheses. Bayesian reasoning requires quantifying the likelihood ratio: how much more probable is broad access under a strong-business hypothesis than under a weak-business hypothesis? That ratio determines whether the observation is mildly positive, strongly positive, or neutral. The Monty Hall framework forces you to ask exactly this question about any evidence you receive.&lt;/p&gt;

&lt;p&gt;In algorithmic decision systems, Bayesian updating under constrained information is a core architectural pattern. A fraud detection model that sees a transaction flagged by one of three independent detection modules must update its fraud probability not by simply averaging the results, but by propagating the evidence through the joint likelihood — accounting for the fact that certain fraud patterns are more likely to trigger specific detection modules than others. The host's constraint in Monty Hall (cannot open your door, cannot open the car door) is precisely the kind of constraint that makes evidence structurally asymmetric and demands rigorous probabilistic handling.&lt;/p&gt;
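
&lt;p&gt;As a rough sketch of what propagating evidence through the joint likelihood can look like, the example below combines three module outputs in log-odds form. It assumes the modules are conditionally independent given the true fraud status, so the joint likelihood factorizes; real modules may be correlated and would need a richer model. The function name, the base rate, and every hit and false-alarm rate are hypothetical.&lt;/p&gt;

```python
import math


def fraud_posterior(prior: float, fired: list, hit_rates: list,
                    false_alarm_rates: list) -> float:
    """Posterior fraud probability from several module outputs.

    Assumes the modules are conditionally independent given the true
    fraud status, so the joint likelihood factorizes (naive Bayes).
    """
    log_odds = math.log(prior / (1 - prior))
    for flag, hit, false_alarm in zip(fired, hit_rates, false_alarm_rates):
        if flag:
            # Module fired: evidence for fraud if hit exceeds false_alarm.
            log_odds += math.log(hit / false_alarm)
        else:
            # Module stayed silent: usually mild evidence against fraud.
            log_odds += math.log((1 - hit) / (1 - false_alarm))
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical numbers: 1% base fraud rate, three modules, only the first fires.
posterior = fraud_posterior(
    prior=0.01,
    fired=[True, False, False],
    hit_rates=[0.70, 0.50, 0.40],
    false_alarm_rates=[0.02, 0.05, 0.10],
)
print(f"posterior fraud probability: {posterior:.4f}")
```

&lt;p&gt;Note that the two silent modules also carry evidence. Ignoring them, or averaging the three flags, would discard the asymmetry between what each module tends to catch and what it tends to miss.&lt;/p&gt;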




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/monty-hall-problem/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>computerscience</category>
      <category>datascience</category>
      <category>learning</category>
      <category>python</category>
    </item>
    <item>
      <title>The Do-Over Game: Nash Equilibrium at the Golden Ratio</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:45:07 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/the-do-over-game-nash-equilibrium-at-the-golden-ratio-phf</link>
      <guid>https://dev.to/white_oak_intel/the-do-over-game-nash-equilibrium-at-the-golden-ratio-phf</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/do-over-game-golden-ratio/#the-question" rel="noopener noreferrer"&gt;The Question&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/do-over-game-golden-ratio/#why-0-50-is-not-the-answer" rel="noopener noreferrer"&gt;Why 0.50 Is Not the Answer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/do-over-game-golden-ratio/#modeling-your-final-draw-distribution" rel="noopener noreferrer"&gt;Modeling Your Final Draw Distribution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/do-over-game-golden-ratio/#the-indifference-condition-for-nash-equilibrium" rel="noopener noreferrer"&gt;The Indifference Condition for Nash Equilibrium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/do-over-game-golden-ratio/#solving-for-t-the-golden-ratio-appears" rel="noopener noreferrer"&gt;Solving for t*: The Golden Ratio Appears&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/do-over-game-golden-ratio/#verifying-the-nash-equilibrium" rel="noopener noreferrer"&gt;Verifying the Nash Equilibrium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/do-over-game-golden-ratio/#python-simulation-and-win-probability-curve" rel="noopener noreferrer"&gt;Python Simulation and Win Probability Curve&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/do-over-game-golden-ratio/#business-application-optimal-stopping-in-m-a-and-hiring" rel="noopener noreferrer"&gt;Business Application: Optimal Stopping in M&amp;amp;A and Hiring&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Question
&lt;/h2&gt;

&lt;p&gt;Two players each draw a single number uniformly at random from the interval [0, 1]. After seeing their own draw, each player independently decides whether to redraw — replacing their current number with a fresh uniform draw from [0, 1] — or to keep what they have. A player who redraws must keep the second draw regardless of its value. After both players have finalized their numbers, the player with the higher number wins. Ties are broken arbitrarily (say, in favor of Player 2).&lt;/p&gt;

&lt;p&gt;Both players make their redraw decision simultaneously and independently. Each is trying to maximize their probability of winning. What is the optimal threshold strategy, and what is the equilibrium threshold value?&lt;/p&gt;

&lt;p&gt;A threshold strategy is a strategy of the form: "Redraw if my first draw is below t; keep if it is at or above t." We will show that the unique symmetric Nash equilibrium is a threshold strategy, and the threshold is t^* = (√5 - 1)/2 ≈ 0.618, the reciprocal of the golden ratio. This result appears in quantitative interviews at Jane Street, Citadel, and Goldman Sachs, and it is one of the most striking instances of a famous irrational constant appearing as the solution to a game-theoretic fixed-point problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why 0.50 Is Not the Answer
&lt;/h2&gt;

&lt;p&gt;The naive threshold is t = 0.5: "If I drew below the median, I am below average, so I should redraw." This reasoning has the right structure — using a threshold strategy — but the wrong threshold. The flaw is that it treats the optimal threshold as a purely individual decision problem, ignoring the strategic interaction with the opponent. In a two-player game where both players are simultaneously choosing whether to redraw, your optimal strategy depends on your opponent's strategy, and vice versa. The result must be self-consistent: a Nash equilibrium.&lt;/p&gt;

&lt;p&gt;To see why 0.5 is not an equilibrium, suppose both players use t = 0.5. Consider your situation: you drew 0.55, which is above 0.5, so you would keep it. Your opponent keeps values above 0.5 and redraws values below 0.5. If your opponent kept their first draw (which happens when it exceeds 0.5), you are competing against a Uniform[0.5, 1] draw. If your opponent redrew, you are competing against a fresh Uniform[0, 1] draw. With a value of 0.55, you beat the kept draws only when the opponent's kept draw is below 0.55 out of the [0.5, 1] range, a probability of (0.55 - 0.5)/0.5 = 0.10. You beat the redrawn draws with probability 0.55.&lt;/p&gt;

&lt;p&gt;Now consider whether you should have redrawn instead. If you redraw from 0.55, you get a fresh uniform draw, so the question is whether that fresh draw does better in expectation against the opponent's mixed final distribution than your current 0.55. The opponent keeps and redraws with probability 0.5 each, so keeping 0.55 wins with probability 0.5 × 0.10 + 0.5 × 0.55 = 0.325. Redrawing wins with probability 0.375 (this is the redraw payoff derived below, evaluated at threshold 0.5). Redrawing is strictly better, even though the threshold-0.5 rule tells you to keep. The same holds at the boundary itself: keeping exactly 0.5 wins with probability 0.25, again below 0.375. A player using threshold 0.5 is therefore not indifferent at the boundary, which contradicts the requirement for a threshold-strategy Nash equilibrium. The equilibrium threshold is the value at which you are exactly indifferent at the margin, and as we will show, that value is not 0.5.&lt;/p&gt;
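
&lt;p&gt;The keep-versus-redraw comparison against a threshold-0.5 opponent can be checked numerically. The sketch below is only an illustration: the function names &lt;code&gt;keep_win_prob&lt;/code&gt; and &lt;code&gt;redraw_win_prob&lt;/code&gt; are hypothetical, and the redraw probability is estimated by averaging the keep probability over a midpoint grid rather than taken from a closed form.&lt;/p&gt;

```python
def keep_win_prob(x, t):
    """Probability that a kept value x in [0, 1] beats an opponent using threshold t."""
    # Opponent redrew (probability t): their final value is Uniform[0, 1].
    beat_redrawn = x
    # Opponent kept (probability 1 - t): their final value is Uniform[t, 1].
    beat_kept = max(0.0, x - t) / (1.0 - t)
    return t * beat_redrawn + (1.0 - t) * beat_kept


def redraw_win_prob(t, n=100_000):
    """Average keep_win_prob over a fresh Uniform[0, 1] draw (midpoint rule)."""
    total = 0.0
    for i in range(n):
        total += keep_win_prob((i + 0.5) / n, t)
    return total / n


keep = keep_win_prob(0.55, 0.5)    # 0.5 * 0.55 + 0.5 * 0.10
redraw = redraw_win_prob(0.5)
print(f"keep 0.55: {keep:.3f}   redraw: {redraw:.3f}")
```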

&lt;p&gt;Intuition for why the equilibrium threshold exceeds 0.5: if your opponent is also using a threshold above 0.5, then the opponent's final draw tends to be higher than a plain uniform draw (because they keep good draws and re-randomize poor ones). To beat this opponent, you need to hold a higher bar for what constitutes "good enough to keep." The equilibrium threshold reflects this arms-race dynamic: both players simultaneously push their thresholds higher until the indifference condition is satisfied.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;Game-Theoretic Framing&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;A Nash equilibrium is a strategy profile where no player can increase their payoff by unilaterally deviating. In a symmetric two-player game with threshold strategies, Nash equilibrium requires that the equilibrium threshold be the exact value at which a player is indifferent between keeping and redrawing — given that the opponent is using that same threshold. Finding t^* means finding the fixed point of this indifference condition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modeling Your Final Draw Distribution
&lt;/h2&gt;

&lt;p&gt;Before we can write the indifference condition, we need to characterize the distribution of a player's final number V as a function of their threshold t. The final value V is determined as follows: if the first draw X₁ ∼ Uniform[0,1] satisfies X₁ ≥ t, the player keeps it and V = X₁. If X₁ &amp;lt; t (which happens with probability t), the player redraws and V = X₂ ∼ Uniform[0,1].&lt;/p&gt;

&lt;p&gt;The density of V is a mixture. For x ∈ [0, t): V = x only if the player redrew (probability t) and the second draw landed at x (density 1 on [0,1]). So f_V(x) = t · 1 = t for x ∈ [0, t). For x ∈ [t, 1]: V = x either because the first draw was kept and equal to x (probability 1-t of keeping, with conditional density 1/(1-t) on [t,1]) or because the player redrew and the second draw was x (probability t, with density 1). Combining:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bf_V%2528x%253B%2520t%2529%2520%253D%2520%255Cbegin%257Bcases%257D%2520t%2520%2526%25200%2520%255Cleq%2520x%2520%253C%2520t%2520%255C%255C%2520%25281-t%2529%2520%255Ccdot%2520%255Cdfrac%257B1%257D%257B1-t%257D%2520%252B%2520t%2520%255Ccdot%25201%2520%253D%25201%2520%252B%2520t%2520%2526%2520t%2520%255Cleq%2520x%2520%255Cleq%25201%2520%255Cend%257Bcases%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bf_V%2528x%253B%2520t%2529%2520%253D%2520%255Cbegin%257Bcases%257D%2520t%2520%2526%25200%2520%255Cleq%2520x%2520%253C%2520t%2520%255C%255C%2520%25281-t%2529%2520%255Ccdot%2520%255Cdfrac%257B1%257D%257B1-t%257D%2520%252B%2520t%2520%255Ccdot%25201%2520%253D%25201%2520%252B%2520t%2520%2526%2520t%2520%255Cleq%2520x%2520%255Cleq%25201%2520%255Cend%257Bcases%257D" alt="equation" width="450" height="66"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can verify this integrates to 1: ∫₀^t t dx + ∫_t¹ (1+t) dx = t² + (1+t)(1-t) = t² + 1 - t² = 1. The density is piecewise constant: low on [0, t) with height t, and elevated on [t, 1] with height 1 + t. The jump at x = t reflects the fact that values above the threshold are overrepresented: they appear both as "kept first draws" and as "lucky second draws that exceeded the threshold."&lt;/p&gt;

&lt;p&gt;The CDF follows by integration:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BF_V%2528x%253B%2520t%2529%2520%253D%2520%255Cbegin%257Bcases%257D%2520tx%2520%2526%25200%2520%255Cleq%2520x%2520%253C%2520t%2520%255C%255C%2520t%2520%255Ccdot%2520t%2520%252B%2520%25281%252Bt%2529%2528x%2520-%2520t%2529%2520%253D%2520t%255E2%2520%252B%2520%25281%252Bt%2529%2528x-t%2529%2520%2526%2520t%2520%255Cleq%2520x%2520%255Cleq%25201%2520%255Cend%257Bcases%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BF_V%2528x%253B%2520t%2529%2520%253D%2520%255Cbegin%257Bcases%257D%2520tx%2520%2526%25200%2520%255Cleq%2520x%2520%253C%2520t%2520%255C%255C%2520t%2520%255Ccdot%2520t%2520%252B%2520%25281%252Bt%2529%2528x%2520-%2520t%2529%2520%253D%2520t%255E2%2520%252B%2520%25281%252Bt%2529%2528x-t%2529%2520%2526%2520t%2520%255Cleq%2520x%2520%255Cleq%25201%2520%255Cend%257Bcases%257D" alt="equation" width="550" height="55"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Simplifying the second piece: F_V(x; t) = t² + (1+t)x - (1+t)t = (1+t)x - t for x ∈ [t, 1]. We can verify: F_V(t; t) = (1+t)t - t = t + t² - t = t², and F_V(1; t) = (1+t)(1) - t = 1. Both check out.&lt;/p&gt;
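
&lt;p&gt;A quick Monte Carlo check of the CDF is sketched below. The helper names &lt;code&gt;final_cdf&lt;/code&gt; and &lt;code&gt;simulate_final_value&lt;/code&gt;, the seed, the sample size, and the threshold 0.6 are all illustrative assumptions and not part of the derivation.&lt;/p&gt;

```python
import random


def final_cdf(x, t):
    """CDF of the final value V at x in [0, 1] for a player who redraws below t."""
    if x >= t:
        return (1.0 + t) * x - t
    return t * x


def simulate_final_value(t, rng):
    """Draw once; redraw (and keep the second draw) if the first is below t."""
    first = rng.random()
    return first if first >= t else rng.random()


rng = random.Random(0)           # fixed seed, chosen arbitrarily
t = 0.6                          # illustrative threshold, not the equilibrium
samples = [simulate_final_value(t, rng) for _ in range(200_000)]

for x in (0.3, 0.6, 0.9):
    # Fraction of simulated final values that are at or below x.
    empirical = sum(1 for v in samples if x >= v) / len(samples)
    print(f"x={x}: empirical {empirical:.3f}, formula {final_cdf(x, t):.3f}")
```

&lt;p&gt;With a sample of this size, the empirical fractions should agree with the piecewise formula to roughly two decimal places.&lt;/p&gt;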

&lt;h2&gt;
  
  
  The Indifference Condition for Nash Equilibrium
&lt;/h2&gt;

&lt;p&gt;In a symmetric Nash equilibrium where both players use threshold t^*, a player using t^* must be indifferent at the boundary value x = t^*. That is, the probability of winning by keeping t^* must equal the probability of winning by redrawing when your current value is exactly t^*. If these were not equal, a player at the boundary could strictly benefit from deviating, either always keeping t^* or always redrawing from t^*, which would contradict the equilibrium.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Payoff from keeping t^*:&lt;/strong&gt; You win if and only if your opponent's final value V_opp is less than t^*. Since the opponent uses threshold t^*, this probability is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bkeep%2520%257D%2520t%255E%252A%2529%2520%253D%2520F_V%2528t%255E%252A%253B%255C%252C%2520t%255E%252A%2529%2520%253D%2520%25281%2520%252B%2520t%255E%252A%2529%2520%255Ccdot%2520t%255E%252A%2520-%2520t%255E%252A%2520%253D%2520%2528t%255E%252A%2529%255E2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bkeep%2520%257D%2520t%255E%252A%2529%2520%253D%2520F_V%2528t%255E%252A%253B%255C%252C%2520t%255E%252A%2529%2520%253D%2520%25281%2520%252B%2520t%255E%252A%2529%2520%255Ccdot%2520t%255E%252A%2520-%2520t%255E%252A%2520%253D%2520%2528t%255E%252A%2529%255E2" alt="equation" width="469" height="22"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Payoff from redrawing:&lt;/strong&gt; You discard t^* and draw V₂ ∼ Uniform[0,1], which you must keep. Your win probability is the expected probability that V₂ beats the opponent's final draw. Since V₂ is uniform and independent, and the opponent uses F_V(·; t^*):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bredraw%257D%2529%2520%253D%2520%255Cint_0%255E1%2520F_V%2528y%253B%255C%252C%2520t%255E%252A%2529%255C%252C%2520dy%2520%253D%2520%255Cint_0%255E%257Bt%255E%252A%257D%2520t%255E%252A%2520y%255C%252C%2520dy%2520%252B%2520%255Cint_%257Bt%255E%252A%257D%255E1%2520%255Cbigl%255B%25281%252Bt%255E%252A%2529y%2520-%2520t%255E%252A%255Cbigr%255D%255C%252C%2520dy" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bredraw%257D%2529%2520%253D%2520%255Cint_0%255E1%2520F_V%2528y%253B%255C%252C%2520t%255E%252A%2529%255C%252C%2520dy%2520%253D%2520%255Cint_0%255E%257Bt%255E%252A%257D%2520t%255E%252A%2520y%255C%252C%2520dy%2520%252B%2520%255Cint_%257Bt%255E%252A%257D%255E1%2520%255Cbigl%255B%25281%252Bt%255E%252A%2529y%2520-%2520t%255E%252A%255Cbigr%255D%255C%252C%2520dy" alt="equation" width="593" height="46"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We evaluate each piece. First integral:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cint_0%255E%257Bt%255E%252A%257D%2520t%255E%252A%2520y%255C%252C%2520dy%2520%253D%2520t%255E%252A%2520%255Ccdot%2520%255Cfrac%257B%2528t%255E%252A%2529%255E2%257D%257B2%257D%2520%253D%2520%255Cfrac%257B%2528t%255E%252A%2529%255E3%257D%257B2%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cint_0%255E%257Bt%255E%252A%257D%2520t%255E%252A%2520y%255C%252C%2520dy%2520%253D%2520t%255E%252A%2520%255Ccdot%2520%255Cfrac%257B%2528t%255E%252A%2529%255E2%257D%257B2%257D%2520%253D%2520%255Cfrac%257B%2528t%255E%252A%2529%255E3%257D%257B2%257D" alt="equation" width="273" height="46"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Second integral:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cint_%257Bt%255E%252A%257D%255E1%2520%255Cbigl%255B%25281%252Bt%255E%252A%2529y%2520-%2520t%255E%252A%255Cbigr%255D%255C%252C%2520dy%2520%253D%2520%255Cleft%255B%255Cfrac%257B%25281%252Bt%255E%252A%2529y%255E2%257D%257B2%257D%2520-%2520t%255E%252A%2520y%255Cright%255D_%257Bt%255E%252A%257D%255E1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cint_%257Bt%255E%252A%257D%255E1%2520%255Cbigl%255B%25281%252Bt%255E%252A%2529y%2520-%2520t%255E%252A%255Cbigr%255D%255C%252C%2520dy%2520%253D%2520%255Cleft%255B%255Cfrac%257B%25281%252Bt%255E%252A%2529y%255E2%257D%257B2%257D%2520-%2520t%255E%252A%2520y%255Cright%255D_%257Bt%255E%252A%257D%255E1" alt="equation" width="388" height="49"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%252Bt%255E%252A%257D%257B2%257D%2520-%2520t%255E%252A%2520-%2520%255Cleft%2528%255Cfrac%257B%25281%252Bt%255E%252A%2529%2528t%255E%252A%2529%255E2%257D%257B2%257D%2520-%2520%2528t%255E%252A%2529%255E2%255Cright%2529" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%252Bt%255E%252A%257D%257B2%257D%2520-%2520t%255E%252A%2520-%2520%255Cleft%2528%255Cfrac%257B%25281%252Bt%255E%252A%2529%2528t%255E%252A%2529%255E2%257D%257B2%257D%2520-%2520%2528t%255E%252A%2529%255E2%255Cright%2529" alt="equation" width="349" height="45"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%252Bt%255E%252A%257D%257B2%257D%2520-%2520t%255E%252A%2520-%2520%255Cfrac%257B%2528t%255E%252A%2529%255E2%25281%252Bt%255E%252A%2529%257D%257B2%257D%2520%252B%2520%2528t%255E%252A%2529%255E2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%252Bt%255E%252A%257D%257B2%257D%2520-%2520t%255E%252A%2520-%2520%255Cfrac%257B%2528t%255E%252A%2529%255E2%25281%252Bt%255E%252A%2529%257D%257B2%257D%2520%252B%2520%2528t%255E%252A%2529%255E2" alt="equation" width="325" height="40"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Combining both pieces and collecting terms over a common denominator:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bredraw%257D%2529%2520%253D%2520%255Cfrac%257B%2528t%255E%252A%2529%255E3%257D%257B2%257D%2520%252B%2520%255Cfrac%257B1%252Bt%255E%252A%257D%257B2%257D%2520-%2520t%255E%252A%2520-%2520%255Cfrac%257B%2528t%255E%252A%2529%255E2%25281%252Bt%255E%252A%2529%257D%257B2%257D%2520%252B%2520%2528t%255E%252A%2529%255E2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bredraw%257D%2529%2520%253D%2520%255Cfrac%257B%2528t%255E%252A%2529%255E3%257D%257B2%257D%2520%252B%2520%255Cfrac%257B1%252Bt%255E%252A%257D%257B2%257D%2520-%2520t%255E%252A%2520-%2520%255Cfrac%257B%2528t%255E%252A%2529%255E2%25281%252Bt%255E%252A%2529%257D%257B2%257D%2520%252B%2520%2528t%255E%252A%2529%255E2" alt="equation" width="509" height="41"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Factor out (1/2) from the terms that admit it and collect the rest:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%257D%257B2%257D%255Cleft%255B%2528t%255E%252A%2529%255E3%2520%252B%25201%2520%252B%2520t%255E%252A%2520-%2520%2528t%255E%252A%2529%255E2%2520-%2520%2528t%255E%252A%2529%255E3%255Cright%255D%2520%252B%2520%2528t%255E%252A%2529%255E2%2520-%2520t%255E%252A" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%257D%257B2%257D%255Cleft%255B%2528t%255E%252A%2529%255E3%2520%252B%25201%2520%252B%2520t%255E%252A%2520-%2520%2528t%255E%252A%2529%255E2%2520-%2520%2528t%255E%252A%2529%255E3%255Cright%255D%2520%252B%2520%2528t%255E%252A%2529%255E2%2520-%2520t%255E%252A" alt="equation" width="403" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%257D%257B2%257D%255Cleft%255B1%2520%252B%2520t%255E%252A%2520-%2520%2528t%255E%252A%2529%255E2%255Cright%255D%2520%252B%2520%2528t%255E%252A%2529%255E2%2520-%2520t%255E%252A" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%257D%257B2%257D%255Cleft%255B1%2520%252B%2520t%255E%252A%2520-%2520%2528t%255E%252A%2529%255E2%255Cright%255D%2520%252B%2520%2528t%255E%252A%2529%255E2%2520-%2520t%255E%252A" alt="equation" width="289" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%2520%252B%2520t%255E%252A%2520-%2520%2528t%255E%252A%2529%255E2%257D%257B2%257D%2520%252B%2520%2528t%255E%252A%2529%255E2%2520-%2520t%255E%252A" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%2520%252B%2520t%255E%252A%2520-%2520%2528t%255E%252A%2529%255E2%257D%257B2%257D%2520%252B%2520%2528t%255E%252A%2529%255E2%2520-%2520t%255E%252A" alt="equation" width="262" height="40"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%2520%252B%2520t%255E%252A%2520-%2520%2528t%255E%252A%2529%255E2%2520%252B%25202%2528t%255E%252A%2529%255E2%2520-%25202t%255E%252A%257D%257B2%257D%2520%253D%2520%255Cfrac%257B1%2520-%2520t%255E%252A%2520%252B%2520%2528t%255E%252A%2529%255E2%257D%257B2%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B1%2520%252B%2520t%255E%252A%2520-%2520%2528t%255E%252A%2529%255E2%2520%252B%25202%2528t%255E%252A%2529%255E2%2520-%25202t%255E%252A%257D%257B2%257D%2520%253D%2520%255Cfrac%257B1%2520-%2520t%255E%252A%2520%252B%2520%2528t%255E%252A%2529%255E2%257D%257B2%257D" alt="equation" width="409" height="40"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The redraw payoff is (1 - t^* + (t^*)²)/2. This is a clean closed form, and it makes the equilibrium calculation straightforward.&lt;/p&gt;
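
&lt;p&gt;The algebra can be sanity-checked by integrating the CDF numerically and comparing against the closed form. This is a rough sketch under stated assumptions: the function names are hypothetical, the integral uses a simple midpoint rule, and the three test thresholds are arbitrary.&lt;/p&gt;

```python
def final_cdf(y, t):
    """CDF of the opponent's final value at y in [0, 1], given threshold t."""
    if y >= t:
        return (1.0 + t) * y - t
    return t * y


def redraw_payoff_numeric(t, n=200_000):
    """Midpoint-rule estimate of the integral of final_cdf(y, t) over [0, 1]."""
    return sum(final_cdf((i + 0.5) / n, t) for i in range(n)) / n


def redraw_payoff_closed_form(t):
    """Closed form from the derivation: (1 - t + t^2) / 2."""
    return (1.0 - t + t * t) / 2.0


for t in (0.25, 0.5, 0.75):
    numeric = redraw_payoff_numeric(t)
    exact = redraw_payoff_closed_form(t)
    print(f"t={t}: numeric {numeric:.6f}, closed form {exact:.6f}")
```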

&lt;h2&gt;
  
  
  Solving for t*: The Golden Ratio Appears
&lt;/h2&gt;

&lt;p&gt;The Nash equilibrium condition requires that the keep payoff equals the redraw payoff at x = t^*:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%2528t%255E%252A%2529%255E2%2520%253D%2520%255Cfrac%257B1%2520-%2520t%255E%252A%2520%252B%2520%2528t%255E%252A%2529%255E2%257D%257B2%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%2528t%255E%252A%2529%255E2%2520%253D%2520%255Cfrac%257B1%2520-%2520t%255E%252A%2520%252B%2520%2528t%255E%252A%2529%255E2%257D%257B2%257D" alt="equation" width="204" height="40"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Multiply both sides by 2:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B2%2528t%255E%252A%2529%255E2%2520%253D%25201%2520-%2520t%255E%252A%2520%252B%2520%2528t%255E%252A%2529%255E2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B2%2528t%255E%252A%2529%255E2%2520%253D%25201%2520-%2520t%255E%252A%2520%252B%2520%2528t%255E%252A%2529%255E2" alt="equation" width="210" height="22"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%2528t%255E%252A%2529%255E2%2520%252B%2520t%255E%252A%2520-%25201%2520%253D%25200" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%2528t%255E%252A%2529%255E2%2520%252B%2520t%255E%252A%2520-%25201%2520%253D%25200" alt="equation" width="176" height="22"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Applying the quadratic formula with a = 1, b = 1, c = -1:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bt%255E%252A%2520%253D%2520%255Cfrac%257B-1%2520%255Cpm%2520%255Csqrt%257B1%2520%252B%25204%257D%257D%257B2%257D%2520%253D%2520%255Cfrac%257B-1%2520%255Cpm%2520%255Csqrt%257B5%257D%257D%257B2%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bt%255E%252A%2520%253D%2520%255Cfrac%257B-1%2520%255Cpm%2520%255Csqrt%257B1%2520%252B%25204%257D%257D%257B2%257D%2520%253D%2520%255Cfrac%257B-1%2520%255Cpm%2520%255Csqrt%257B5%257D%257D%257B2%257D" alt="equation" width="279" height="41"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since t^* must lie in [0, 1], we take the positive root:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bt%255E%252A%2520%253D%2520%255Cfrac%257B-1%2520%252B%2520%255Csqrt%257B5%257D%257D%257B2%257D%2520%253D%2520%255Cfrac%257B%255Csqrt%257B5%257D%2520-%25201%257D%257B2%257D%2520%255Capprox%25200.6180" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bt%255E%252A%2520%253D%2520%255Cfrac%257B-1%2520%252B%2520%255Csqrt%257B5%257D%257D%257B2%257D%2520%253D%2520%255Cfrac%257B%255Csqrt%257B5%257D%2520-%25201%257D%257B2%257D%2520%255Capprox%25200.6180" alt="equation" width="310" height="41"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;span&gt;Result&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The Nash equilibrium threshold is not 0.5, not 0.6, but exactly (√5 - 1)/2 ≈ 0.618, the reciprocal of the golden ratio φ = (1 + √5)/2. Equivalently, t^* = φ - 1. Redraw if and only if your first draw is strictly below this threshold.&lt;/p&gt;
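
&lt;p&gt;The root selection and the link to the golden ratio are easy to confirm with a few lines of standard-library Python. The variable names below are illustrative only.&lt;/p&gt;

```python
import math

# Roots of t^2 + t - 1 = 0 from the quadratic formula (a = 1, b = 1, c = -1).
disc = math.sqrt(1.0 + 4.0)
roots = [(-1.0 + disc) / 2.0, (-1.0 - disc) / 2.0]

# Keep only the root that is a valid threshold in [0, 1].
t_star = next(r for r in roots if 1.0 >= r >= 0.0)

phi = (1.0 + math.sqrt(5.0)) / 2.0   # golden ratio
print(f"t* = {t_star:.6f}")
print(f"1/phi = {1.0 / phi:.6f}, phi - 1 = {phi - 1.0:.6f}")
```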

&lt;p&gt;To place this in context: the golden ratio φ satisfies the identity φ² = φ + 1, which is equivalent to saying φ - 1 = (1/φ). So t^* = (1/φ). The quadratic t² + t - 1 = 0 that determines the equilibrium threshold is a disguised form of the golden ratio's defining polynomial φ² - φ - 1 = 0 (substitute t = 1/φ and multiply through by φ²). This is not a coincidence — the self-referential nature of Nash equilibrium ("my optimal action depends on your action, which depends on my action") produces a fixed-point equation, and fixed-point equations involving linear-plus-reciprocal structure frequently yield the golden ratio because the golden ratio is its own reciprocal-plus-one.&lt;/p&gt;

&lt;p&gt;"The golden ratio emerges here not from geometry or aesthetics, but from the fixed-point algebra of an optimal stopping problem under symmetric competition."&lt;/p&gt;

&lt;h2&gt;
  
  
  Verifying the Nash Equilibrium
&lt;/h2&gt;

&lt;p&gt;A rigorous verification requires confirming that both payoffs are equal at t^* = (√5 - 1)/2. Let us compute each.&lt;/p&gt;

&lt;p&gt;First, note that (t^*)² = ((√5 - 1)/2)² = (5 - 2√5 + 1)/4 = (6 - 2√5)/4 = (3 - √5)/2.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep payoff:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bkeep%257D%255C%252C%2520t%255E%252A%2529%2520%253D%2520%2528t%255E%252A%2529%255E2%2520%253D%2520%255Cfrac%257B3%2520-%2520%255Csqrt%257B5%257D%257D%257B2%257D%2520%255Capprox%2520%255Cfrac%257B3%2520-%25202.2361%257D%257B2%257D%2520%255Capprox%2520%255Cfrac%257B0.7639%257D%257B2%257D%2520%255Capprox%25200.3820" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bkeep%257D%255C%252C%2520t%255E%252A%2529%2520%253D%2520%2528t%255E%252A%2529%255E2%2520%253D%2520%255Cfrac%257B3%2520-%2520%255Csqrt%257B5%257D%257D%257B2%257D%2520%255Capprox%2520%255Cfrac%257B3%2520-%25202.2361%257D%257B2%257D%2520%255Capprox%2520%255Cfrac%257B0.7639%257D%257B2%257D%2520%255Capprox%25200.3820" alt="equation" width="567" height="41"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redraw payoff:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bredraw%257D%2529%2520%253D%2520%255Cfrac%257B1%2520-%2520t%255E%252A%2520%252B%2520%2528t%255E%252A%2529%255E2%257D%257B2%257D%2520%253D%2520%255Cfrac%257B1%2520-%2520%255Cfrac%257B%255Csqrt%257B5%257D-1%257D%257B2%257D%2520%252B%2520%255Cfrac%257B3-%255Csqrt%257B5%257D%257D%257B2%257D%257D%257B2%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BP%2528%255Ctext%257Bwin%257D%2520%255Cmid%2520%255Ctext%257Bredraw%257D%2529%2520%253D%2520%255Cfrac%257B1%2520-%2520t%255E%252A%2520%252B%2520%2528t%255E%252A%2529%255E2%257D%257B2%257D%2520%253D%2520%255Cfrac%257B1%2520-%2520%255Cfrac%257B%255Csqrt%257B5%257D-1%257D%257B2%257D%2520%252B%2520%255Cfrac%257B3-%255Csqrt%257B5%257D%257D%257B2%257D%257D%257B2%257D" alt="equation" width="450" height="44"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Combine the numerator terms over a common denominator of 2:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B%255Cdfrac%257B2%2520-%2520%2528%255Csqrt%257B5%257D-1%2529%2520%252B%2520%25283-%255Csqrt%257B5%257D%2529%257D%257B2%257D%257D%257B2%257D%2520%253D%2520%255Cfrac%257B2%2520-%2520%255Csqrt%257B5%257D%2520%252B%25201%2520%252B%25203%2520-%2520%255Csqrt%257B5%257D%257D%257B4%257D%2520%253D%2520%255Cfrac%257B6%2520-%25202%255Csqrt%257B5%257D%257D%257B4%257D%2520%253D%2520%255Cfrac%257B3%2520-%2520%255Csqrt%257B5%257D%257D%257B2%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%253D%2520%255Cfrac%257B%255Cdfrac%257B2%2520-%2520%2528%255Csqrt%257B5%257D-1%2529%2520%252B%2520%25283-%255Csqrt%257B5%257D%2529%257D%257B2%257D%257D%257B2%257D%2520%253D%2520%255Cfrac%257B2%2520-%2520%255Csqrt%257B5%257D%2520%252B%25201%2520%252B%25203%2520-%2520%255Csqrt%257B5%257D%257D%257B4%257D%2520%253D%2520%255Cfrac%257B6%2520-%25202%255Csqrt%257B5%257D%257D%257B4%257D%2520%253D%2520%255Cfrac%257B3%2520-%2520%255Csqrt%257B5%257D%257D%257B2%257D" alt="equation" width="631" height="60"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both payoffs equal (3 - √5)/2 ≈ 0.382. The indifference condition holds exactly. At the equilibrium threshold, you are precisely indifferent between keeping your draw and redrawing, which confirms that neither player can benefit from unilaterally deviating. The Nash equilibrium is verified.&lt;/p&gt;
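
&lt;p&gt;As a quick numerical check of the algebra above, the short sketch below evaluates both sides of the indifference condition in floating point. It assumes the keep payoff at the threshold is (t^*)^2, which is where the (3 - √5)/2 term in the derivation comes from; the variable names are illustrative only.&lt;/p&gt;

```python
import math

# Equilibrium threshold: positive root of t^2 + t - 1 = 0.
t_star = (math.sqrt(5) - 1) / 2

# Keeping a draw exactly at the threshold wins only if the opponent redrew
# (probability t*) and their redraw also landed below t* (probability t*).
keep_payoff = t_star ** 2

# Redraw payoff, using the formula from the derivation: (1 - t* + t*^2) / 2.
redraw_payoff = (1 - t_star + t_star ** 2) / 2

print(f"keep payoff:   {keep_payoff:.6f}")
print(f"redraw payoff: {redraw_payoff:.6f}")
print(f"(3 - sqrt(5)) / 2 = {(3 - math.sqrt(5)) / 2:.6f}")
```

&lt;p&gt;All three printed values should agree to within floating-point rounding.&lt;/p&gt;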

&lt;p&gt;At Nash equilibrium, when both players use t^* ≈ 0.618, each wins with probability 0.5. This must be true by symmetry: in a zero-sum game where one player wins and the other loses, and where both players use identical strategies, each wins exactly half the time (ties have probability zero under a continuous distribution, so the tie-breaking rule does not matter). The 0.382 figure above is the conditional win probability given that your first draw sits exactly at the threshold, not the unconditional win probability of the strategy as a whole.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;em&gt;Your Threshold:&lt;/em&gt; &lt;strong&gt;0.50&lt;/strong&gt;; &lt;em&gt;Opponent's Threshold:&lt;/em&gt; 0.618 (optimal); &lt;em&gt;Your Win Probability:&lt;/em&gt; ≈ 0.496 (slightly disadvantaged); &lt;em&gt;Equilibrium?&lt;/em&gt; No, can improve by raising threshold&lt;/li&gt;
  &lt;li&gt;
&lt;em&gt;Your Threshold:&lt;/em&gt; &lt;strong&gt;0.618&lt;/strong&gt;; &lt;em&gt;Opponent's Threshold:&lt;/em&gt; 0.618 (optimal); &lt;em&gt;Your Win Probability:&lt;/em&gt; 0.50; &lt;em&gt;Equilibrium?&lt;/em&gt; Yes, neither player benefits from deviating&lt;/li&gt;
  &lt;li&gt;
&lt;em&gt;Your Threshold:&lt;/em&gt; &lt;strong&gt;0.80&lt;/strong&gt;; &lt;em&gt;Opponent's Threshold:&lt;/em&gt; 0.618 (optimal); &lt;em&gt;Your Win Probability:&lt;/em&gt; ≈ 0.473 (over-redraws); &lt;em&gt;Equilibrium?&lt;/em&gt; No, redrawing too aggressively loses edge&lt;/li&gt;
  &lt;li&gt;
&lt;em&gt;Your Threshold:&lt;/em&gt; &lt;strong&gt;0.00 (never redraw)&lt;/strong&gt;; &lt;em&gt;Opponent's Threshold:&lt;/em&gt; 0.618 (optimal); &lt;em&gt;Your Win Probability:&lt;/em&gt; ≈ 0.382 (significantly worse); &lt;em&gt;Equilibrium?&lt;/em&gt; No, severely disadvantaged&lt;/li&gt;
&lt;/ul&gt;
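
&lt;p&gt;The win probabilities listed above can be cross-checked without simulation. The sketch below assumes the rules used throughout this article (redraw when the first draw is below your threshold, higher final value wins) and integrates the opponent's final-value distribution in closed form. The helper names &lt;code&gt;cdf_integral&lt;/code&gt; and &lt;code&gt;win_probability_exact&lt;/code&gt; are hypothetical, introduced only for this illustration.&lt;/p&gt;

```python
import math

T_STAR = (math.sqrt(5) - 1) / 2  # equilibrium threshold, about 0.618


def cdf_integral(x: float, t: float) -> float:
    """Integral from 0 to x of G, where G(y) = t*y + max(y - t, 0) is the
    probability that an opponent with threshold t finishes below y."""
    return t * x ** 2 / 2 + max(x - t, 0.0) ** 2 / 2


def win_probability_exact(s: float, t: float = T_STAR) -> float:
    """Win probability for threshold s against an opponent using threshold t."""
    # A fresh uniform draw beats the opponent with probability integral of G on [0, 1].
    fresh_draw_win = cdf_integral(1.0, t)
    # Redraw branch: first draw falls below s (probability s), then a fresh draw.
    redraw_branch = s * fresh_draw_win
    # Keep branch: integrate G over the kept first draws in [s, 1].
    keep_branch = cdf_integral(1.0, t) - cdf_integral(s, t)
    return redraw_branch + keep_branch


for s in (0.0, 0.3, 0.5, T_STAR, 0.8, 1.0):
    print(f"s = {s:.3f}  win probability = {win_probability_exact(s):.4f}")
```

&lt;p&gt;Under these assumptions the function returns 0.5 at s = t^* and (3 - √5)/2 ≈ 0.382 at both s = 0 (never redraw) and s = 1 (always redraw).&lt;/p&gt;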

&lt;h2&gt;
  
  
  Python Simulation and Win Probability Curve
&lt;/h2&gt;

&lt;p&gt;The simulation below serves two purposes. First, it confirms that the win rate when both players use t^* ≈ 0.618 is approximately 50%, as expected by symmetry. Second, and more instructively, it traces the win probability as a function of your threshold choice when the opponent is locked in at the equilibrium threshold. The curve shows that t^* is a best response to itself: moving away from t^* in either direction lowers your win probability, only slightly for small deviations and substantially for large ones, and the curve peaks at the golden ratio threshold up to Monte Carlo noise.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib&lt;/span&gt;
&lt;span class="n"&gt;matplotlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Agg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;PHI_INV&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;   &lt;span class="c1"&gt;# (sqrt(5)-1)/2 ≈ 0.6180 — Nash equilibrium threshold
&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;final_draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return a player&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s final value given their threshold strategy.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;play_one_round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opp_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulate one round. Returns 1 if player 1 wins, 0 if player 2 wins.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;my_val&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;final_draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_threshold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;opp_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;final_draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opp_threshold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_val&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;opp_val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# Ties go to player 2 (return 0)
&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;win_probability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opp_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Estimate win probability via Monte Carlo simulation.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;wins&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;play_one_round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_threshold&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opp_threshold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wins&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;


&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Confirm equilibrium win rate ≈ 0.50
&lt;/span&gt;&lt;span class="n"&gt;equilibrium_win_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;win_probability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PHI_INV&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PHI_INV&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Nash equilibrium t* = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PHI_INV&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Win rate at (t*, t*): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;equilibrium_win_rate&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  (expected: 0.5000)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Plot win probability vs. your threshold when opponent plays optimally
&lt;/span&gt;&lt;span class="n"&gt;thresholds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;win_probs&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;win_probability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PHI_INV&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;thresholds&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thresholds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;win_probs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#162846&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Empirical win rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axvline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PHI_INV&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#d4af37&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Nash equilibrium t* ≈ &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PHI_INV&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axhline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#999&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Break-even (50%)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Your Redraw Threshold (s)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Win Probability vs. Optimal Opponent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Win Probability Across Threshold Choices&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;(Opponent always plays t* ≈ 0.618)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_ylim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.65&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tight_layout&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;savefig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;golden_ratio_win_curve.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dpi&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bbox_inches&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tight&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Win probability curve saved to golden_ratio_win_curve.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
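
&lt;p&gt;The loop-based simulation above draws one round at a time, which is easy to read but slow for large sample sizes. As an optional alternative, the sketch below applies the same rules in vectorized form using NumPy's &lt;code&gt;default_rng&lt;/code&gt; generator. The function name &lt;code&gt;win_probability_vectorized&lt;/code&gt; and its &lt;code&gt;seed&lt;/code&gt; parameter are illustrative additions, not part of the original code.&lt;/p&gt;

```python
import numpy as np

PHI_INV = (np.sqrt(5) - 1) / 2  # equilibrium threshold, about 0.618


def win_probability_vectorized(my_threshold: float, opp_threshold: float,
                               n: int = 200_000, seed: int = 0) -> float:
    """Estimate player 1's win probability with all rounds drawn at once."""
    rng = np.random.default_rng(seed)
    first = rng.uniform(size=(2, n))    # row 0: player 1, row 1: player 2
    second = rng.uniform(size=(2, n))   # redraws, used only where needed
    thresholds = np.array([[my_threshold], [opp_threshold]])
    # Keep the first draw when it meets the threshold; otherwise take the redraw.
    final = np.where(first >= thresholds, first, second)
    return float(np.mean(final[0] > final[1]))


print(f"Win rate at (t*, t*): {win_probability_vectorized(PHI_INV, PHI_INV):.4f}")
print(f"Win rate at (0, t*):  {win_probability_vectorized(0.0, PHI_INV):.4f}")
```

&lt;p&gt;Because every round is generated in a single array operation, larger values of &lt;code&gt;n&lt;/code&gt; become practical, which reduces the sampling noise in the plotted curve.&lt;/p&gt;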



&lt;p&gt;The win probability curve produced by this simulation has a distinctive shape: it rises from about 0.38 at threshold t = 0 (never redraw), peaks at approximately 0.50 near t ≈ 0.618, and declines again as the threshold rises above the equilibrium, falling back to about 0.38 at t = 1 (always redraw). Crucially, the curve is flat near the peak: there is a range of thresholds in the neighborhood of 0.618 that produce nearly identical win rates against the optimal opponent, and with 10,000 rounds per point the sampling noise (roughly ±0.005) is comparable to those differences. This flatness is characteristic of Nash equilibria in continuous games: the equilibrium strategy is the maximizer of the win probability function, and the first derivative of win probability with respect to your threshold must equal zero at t^*, which is exactly what the indifference condition expresses.&lt;/p&gt;

&lt;p&gt;The curve is also asymmetric. Against the equilibrium opponent, a threshold of t = 0.3 wins about 46.9% of the time and t = 0.8 about 47.3%, so the two deviations cost roughly the same even though 0.3 lies much farther from t^*. Per unit of deviation, setting the threshold too high is costlier than setting it too low. A player who rarely redraws gives up too much by accepting poor first draws. A player who redraws very aggressively burns their first draw even when it was quite good, and the second draw is no better in expectation. The equilibrium threshold t^* ≈ 0.618 balances these opposing costs precisely at the point where no first-order gain is available from deviating in either direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Business Application: Optimal Stopping in M&amp;amp;A and Hiring
&lt;/h2&gt;

&lt;p&gt;The Do-Over Game is a minimal formalization of a family of problems that arise constantly in business: you have an opportunity in front of you right now, you are uncertain whether a better opportunity is available if you wait (or search further), and the act of waiting or searching has a cost. The Nash equilibrium structure of the Do-Over Game — and the fact that the threshold is determined by a fixed-point condition rather than by the first-order optimality conditions of a single-player problem — illuminates why competitive settings systematically produce different optimal thresholds than monopoly or single-agent settings.&lt;/p&gt;

&lt;p&gt;In mergers and acquisitions, a sell-side advisor running a competitive auction receives bids from multiple acquirers in sequence. The question of whether to accept the current-best bid or continue the process is an optimal stopping problem with strategic content: the acquirers know the sell-side is comparing their offer to alternatives, and they shade their bids accordingly. The seller's optimal threshold for accepting a bid is not the single-agent optimal stopping threshold (which would be determined by the distribution of bid values alone) but a game-theoretic threshold that accounts for the anticipated bidding behavior of all participants. When multiple sellers in the same sector run simultaneous processes — as in a sector roll-up or during private equity vintage years with heavy deal activity — the equilibrium thresholds across all processes are mutually determined by exactly the kind of fixed-point reasoning we applied above.&lt;/p&gt;

&lt;p&gt;In hiring decisions, a firm interviewing candidates faces the same structure. Accepting the current candidate means closing the search; continuing means risking that the current candidate accepts a competing offer (analogous to the redrawn value going to the opponent). The optimal stopping rule in the classic Secretary Problem — accept the first candidate who exceeds all previous candidates, after observing a fraction 1/e of the total pool — is the single-agent solution. But when multiple firms are simultaneously recruiting from the same candidate pool, each candidate is also making a strategic decision about which offer to accept, and the firms' hiring thresholds are jointly determined in equilibrium. The resulting thresholds are higher than the single-agent thresholds, just as the Nash equilibrium threshold in the Do-Over Game (0.618) exceeds the single-agent optimal threshold (0.5). Competition for talent drives all participants to make earlier, more aggressive offers — a prediction that matches observable hiring behavior in tight labor markets.&lt;/p&gt;
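
&lt;p&gt;To make the single-agent benchmark concrete, here is a small Monte Carlo sketch of the 1/e rule from the classic Secretary Problem. It assumes candidates arrive in random order with independent uniform quality scores, and it counts a trial as a success only when the single best candidate is hired. The function name and parameters are illustrative.&lt;/p&gt;

```python
import math
import random


def secretary_success_rate(n_candidates: int = 100, trials: int = 20_000,
                           seed: int = 0) -> float:
    """Estimate how often the 1/e rule hires the single best candidate."""
    rng = random.Random(seed)
    cutoff = round(n_candidates / math.e)  # size of the observe-only phase
    successes = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(n_candidates)]
        benchmark = max(scores[:cutoff])
        # Hire the first later candidate who beats everyone seen so far;
        # if nobody does, the firm is left with the final candidate.
        hired = next((x for x in scores[cutoff:] if x > benchmark), scores[-1])
        successes += hired == max(scores)
    return successes / trials


rate = secretary_success_rate()
print(f"Estimated success rate: {rate:.3f}  (1/e = {1 / math.e:.3f})")
```

&lt;p&gt;With 100 candidates the estimate should land close to 1/e ≈ 0.37. This is the single-agent case only; it does not model competing firms.&lt;/p&gt;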

&lt;p&gt;In algorithmic trading, the problem of when to submit a bid versus when to revise based on updated order flow information has the same mathematical skeleton. A market maker observing an incoming order must decide whether to quote at the current spread (keeping their draw) or to reprice (redrawing) based on fresh information. In a competitive market-making environment where multiple market makers are simultaneously deciding, the equilibrium quoting strategy is determined by a fixed-point condition, and the aggressiveness of quoting is higher (the threshold for repricing is lower) than it would be in a monopoly market-making environment. The Do-Over Game threshold gives the minimal analytic skeleton of this equilibrium structure — the full theory, with continuous-time order flow and inventory risk, is considerably more complex, but the fixed-point logic and the golden ratio-like constants that emerge from it appear throughout the auction theory and market microstructure literature.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/do-over-game-golden-ratio/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Cash Flow Waterfall Model for LBO</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:44:05 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/cash-flow-waterfall-model-for-lbo-460g</link>
      <guid>https://dev.to/white_oak_intel/cash-flow-waterfall-model-for-lbo-460g</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/cash-flow-waterfall-model/#how-waterfall-priority-works" rel="noopener noreferrer"&gt;How Waterfall Priority Works&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/cash-flow-waterfall-model/#modeling-the-debt-structure" rel="noopener noreferrer"&gt;Modeling the Debt Structure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/cash-flow-waterfall-model/#python-implementation" rel="noopener noreferrer"&gt;Python Implementation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/cash-flow-waterfall-model/#dscr-interpretation" rel="noopener noreferrer"&gt;DSCR Interpretation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/cash-flow-waterfall-model/#working-example-manufacturing-lbo" rel="noopener noreferrer"&gt;Working Example: Manufacturing LBO&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/cash-flow-waterfall-model/#when-to-refinance-vs-repay" rel="noopener noreferrer"&gt;When to Refinance vs. Repay&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How Waterfall Priority Works
&lt;/h2&gt;

&lt;p&gt;In a leveraged buyout, cash does not flow freely to equity until every obligation above it in the capital structure has been satisfied. That sequencing — senior debt first, mezzanine second, equity last — is what a waterfall model formalizes. Get the order wrong and you will either overstate free cash flow to equity or miss a covenant breach entirely.&lt;/p&gt;

&lt;p&gt;The mechanics are straightforward: operating cash flow enters the top of the waterfall. From there, it cascades through each tranche in strict priority order. What remains after each tranche's interest and required principal is the cash available to the next level. What exits the bottom is the true free cash flow available to equity holders — often a very different number than EBITDA minus interest expense suggests.&lt;/p&gt;
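
&lt;p&gt;The cascade described above can be sketched in a few lines. This is a minimal illustration rather than the model developed later in this article: the function name &lt;code&gt;cascade&lt;/code&gt;, the tuple layout, and the dollar figures are all hypothetical, and it assumes that within each tranche interest is paid before required principal.&lt;/p&gt;

```python
from typing import List, Tuple


def cascade(operating_cash_flow: float,
            tranches: List[Tuple[str, float, float]]):
    """Pay tranches in strict priority order and return (payments, residual).

    Each tranche is (name, interest_due, principal_due), most senior first.
    """
    remaining = max(operating_cash_flow, 0.0)
    payments = []
    for name, interest_due, principal_due in tranches:
        interest_paid = min(remaining, interest_due)
        remaining -= interest_paid
        principal_paid = min(remaining, principal_due)
        remaining -= principal_paid
        payments.append({
            "tranche": name,
            "interest_paid": interest_paid,
            "principal_paid": principal_paid,
            # Any unpaid amount flags a potential default or covenant problem.
            "shortfall": (interest_due - interest_paid)
                         + (principal_due - principal_paid),
        })
    return payments, remaining


# Hypothetical figures in $M, chosen only for illustration.
stack = [("Senior", 1.0, 0.5), ("Mezzanine", 0.8, 0.0)]
payments, to_equity = cascade(5.0, stack)
for p in payments:
    print(p)
print(f"Residual to equity: {to_equity:.2f}")
```

&lt;p&gt;Lowering the cash input shows the priority ordering at work: the senior tranche is served first, and any shortfall appears on the junior tranche before equity receives anything.&lt;/p&gt;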

&lt;p&gt;&lt;span&gt;Why This Matters&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;EBITDA-based valuations routinely overstate equity value by treating all debt as equal. A $12M EBITDA business with $9M in senior debt at 8.5% and $3M in mezzanine at 14.5% has roughly $1.3M in true free cash flow after full service, not $3.9M. The difference can make or break a deal thesis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modeling the Debt Structure
&lt;/h2&gt;

&lt;p&gt;Every LBO waterfall model starts with an accurate representation of each debt tranche. The minimum attributes you need for each instrument are the outstanding principal, the annual interest rate, and the required annual principal payment. In practice you also want the tranche name and its position in the priority stack, since that order drives everything else.&lt;/p&gt;

&lt;p&gt;Before cash flows to debt service, two additional deductions reduce operating cash flow: capital expenditure requirements (which sustain the asset base that generates earnings) and cash taxes on post-interest income. Many simplified models skip cash taxes entirely, which overstates available cash for service by 25–35% depending on the tax jurisdiction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Python Implementation
&lt;/h2&gt;

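&lt;p&gt;The implementation below structures each debt tranche as a dataclass and runs the waterfall logic through a single &lt;code&gt;run()&lt;/code&gt; method on a parent model. This design keeps the tranche data separate from the waterfall logic, so the same tranche definitions can be run against any operating cash flow input. Note that a plain &lt;code&gt;@dataclass&lt;/code&gt; is mutable; pass &lt;code&gt;frozen=True&lt;/code&gt; to the decorator if the tranche attributes should be immutable.&lt;br&gt;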
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DebtTranche&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;principal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;          &lt;span class="c1"&gt;# outstanding balance
&lt;/span&gt;    &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;               &lt;span class="c1"&gt;# annual interest rate as decimal
&lt;/span&gt;    &lt;span class="n"&gt;required_amortization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;  &lt;span class="c1"&gt;# mandatory annual principal payment
&lt;/span&gt;    &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;             &lt;span class="c1"&gt;# 1 = most senior
&lt;/span&gt;
&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WaterfallModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ebitda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;capex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;tax_rate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;tranches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;DebtTranche&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Sort tranches by priority (most senior first)
&lt;/span&gt;        &lt;span class="n"&gt;ordered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tranches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Compute total interest for tax shield calculation
&lt;/span&gt;        &lt;span class="n"&gt;total_interest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;principal&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;taxable_income&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ebitda&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;total_interest&lt;/span&gt;
        &lt;span class="n"&gt;cash_taxes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;taxable_income&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tax_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Cash available after capex and taxes
&lt;/span&gt;        &lt;span class="n"&gt;available&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ebitda&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;capex&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cash_taxes&lt;/span&gt;

        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tranche&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;interest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tranche&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;principal&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;tranche&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt;
            &lt;span class="n"&gt;total_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;interest&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tranche&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;required_amortization&lt;/span&gt;
            &lt;span class="n"&gt;dscr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;available&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_service&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;total_service&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;available&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;total_service&lt;/span&gt;

            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tranche&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tranche&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;interest&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amortization&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tranche&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;required_amortization&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_service&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dscr&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dscr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cash_after_service&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ebitda&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ebitda&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;capex&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;capex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cash_taxes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cash_taxes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_interest&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_interest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;free_cash_flow&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tranches&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  DSCR Interpretation
&lt;/h2&gt;

&lt;p&gt;The debt service coverage ratio (DSCR) is the cash flow available for debt service divided by the total debt service due; in the model above, that numerator is EBITDA less capex and cash taxes. It is the single number lenders watch most closely. A ratio below 1.0x means the business cannot cover its own debt obligations from operating cash flow, which typically triggers default provisions. But even ratios above 1.0x can represent thin margins that make covenant compliance brittle.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;Below 1.0x&lt;/strong&gt;: &lt;em&gt;Interpretation:&lt;/em&gt; Cash flow insufficient to cover service — &lt;em&gt;Lender Signal:&lt;/em&gt; Covenant breach, potential default&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;1.0x – 1.15x&lt;/strong&gt;: &lt;em&gt;Interpretation:&lt;/em&gt; Barely covering; no cushion — &lt;em&gt;Lender Signal:&lt;/em&gt; Elevated scrutiny; covenant waiver likely needed&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;1.15x – 1.35x&lt;/strong&gt;: &lt;em&gt;Interpretation:&lt;/em&gt; Adequate but tight; standard for mezz debt — &lt;em&gt;Lender Signal:&lt;/em&gt; Within typical covenant thresholds&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;1.35x – 2.0x&lt;/strong&gt;: &lt;em&gt;Interpretation:&lt;/em&gt; Comfortable coverage; senior debt territory — &lt;em&gt;Lender Signal:&lt;/em&gt; Favorable terms; prepayment conversation possible&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Above 2.0x&lt;/strong&gt;: &lt;em&gt;Interpretation:&lt;/em&gt; Strong coverage; possible over-equity at acquisition — &lt;em&gt;Lender Signal:&lt;/em&gt; Refinancing or dividend recapitalization opportunity&lt;/li&gt;
&lt;/ul&gt;
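
&lt;p&gt;The bands above can be turned into a small lookup for screening model output. This is a minimal sketch, not part of the waterfall model itself: &lt;code&gt;classify_dscr&lt;/code&gt; is a hypothetical helper, and treating each lower band edge as inclusive is an assumption, since the list shares its endpoints.&lt;/p&gt;

```python
from bisect import bisect_right

# Band edges and labels transcribed from the list above. Treating each lower
# edge as inclusive is an assumption; the list itself shares its endpoints.
BAND_EDGES = [1.0, 1.15, 1.35, 2.0]
BAND_LABELS = [
    "below 1.0x: covenant breach, potential default",
    "1.0x to 1.15x: elevated scrutiny, covenant waiver likely needed",
    "1.15x to 1.35x: within typical covenant thresholds",
    "1.35x to 2.0x: favorable terms, prepayment conversation possible",
    "above 2.0x: refinancing or dividend recapitalization opportunity",
]


def classify_dscr(dscr):
    """Return the band label for a DSCR value (hypothetical helper)."""
    return BAND_LABELS[bisect_right(BAND_EDGES, dscr)]


for value in (0.92, 1.10, 1.25, 1.50, 2.40):
    print(f"{value:.2f}x  {classify_dscr(value)}")
```

&lt;p&gt;The thresholds are the article's rules of thumb; an actual credit agreement defines its own covenant levels and its own definition of cash available for service.&lt;/p&gt;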

&lt;h2&gt;
  
  
  Working Example: Manufacturing LBO
&lt;/h2&gt;

&lt;p&gt;Consider a $15M revenue light manufacturing business acquired in an LBO at 5.5x EBITDA. The deal is structured with two tranches: $9M in senior secured debt at 8.5% with 7% annual amortization, and $3M in mezzanine debt at 14.5% with PIK interest allowed in year one. The business generates $2.7M in EBITDA and requires $400K in annual maintenance capex.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;senior&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DebtTranche&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Senior Secured&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;principal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;9_000_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.085&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;required_amortization&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;630_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# 7% of balance
&lt;/span&gt;    &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;mezzanine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DebtTranche&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mezzanine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;principal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3_000_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.145&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;required_amortization&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# PIK year one
&lt;/span&gt;    &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WaterfallModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ebitda&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2_700_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tax_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tranches&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;senior&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mezzanine&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
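&lt;span class="c1"&gt;# As written, run() reports free_cash_flow of about 80,000
# Senior DSCR: 1.37x  |  Mezzanine DSCR on the cash left after senior service: 1.18x
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run as written, the model reports a 1.37x DSCR at the senior tranche, which is comfortable, but only 1.18x at the mezzanine tranche, measured on the cash left after senior service, and roughly $80K of free cash flow once both tranches are paid. Measured on blended service, coverage is thinner still: $1.91M of cash available against $1.83M of combined interest and amortization is about 1.04x, short of the 1.25x minimum that senior lenders typically set on blended service. The model charges the mezzanine coupon in cash; if the year-one PIK election is used, that $435K accrues to principal instead, and year-one cash coverage is the senior figure of 1.37x. Either way the cushion is thin, and a modest EBITDA miss would push the company into covenant violation territory in year one.&lt;/p&gt;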

&lt;p&gt;"The waterfall tells you where the money actually goes. Everyone negotiates on EBITDA multiples, but the number that determines whether the deal works is free cash flow after full debt service — and those two numbers are rarely the same."&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Refinance vs. Repay
&lt;/h2&gt;

&lt;p&gt;Once the waterfall is running cleanly, the natural follow-on question is capital structure optimization: should excess free cash flow go toward accelerated principal repayment, or toward refinancing the most expensive tranche? The answer depends on the prepayment penalty, the current rate environment, and whether DSCR improvement creates meaningful covenant headroom.&lt;/p&gt;

&lt;p&gt;Mezzanine debt, typically carrying 200–400 basis points more than senior (and 600 in this example), is almost always the priority target. Every dollar of mezzanine retired eliminates 14–17 cents in annual interest expense, with no prepayment penalty in most structures after year three. At $3M outstanding, refinancing the mezzanine tranche with proceeds from a senior revolver expansion (at 8.5% versus 14.5%) saves $180K annually, a meaningful improvement in equity return when set against the free cash flow left after debt service.&lt;/p&gt;
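
&lt;p&gt;The refinancing arithmetic is simple enough to check directly. The sketch below reproduces the annual saving from the figures in the example and adds a payback calculation for a prepayment fee. The 2% fee is a hypothetical input chosen for illustration, not a term of the deal, and both function names are introduced here.&lt;/p&gt;

```python
def annual_interest_saving(principal, old_rate, new_rate):
    """Interest saved per year by moving principal from old_rate to new_rate."""
    return principal * (old_rate - new_rate)


def payback_years(principal, fee_rate, annual_saving):
    """Years of interest savings needed to recover a one-time prepayment fee."""
    return principal * fee_rate / annual_saving


mezz_principal = 3_000_000
saving = annual_interest_saving(mezz_principal, 0.145, 0.085)

fee_rate = 0.02  # hypothetical prepayment fee, not taken from the example
payback = payback_years(mezz_principal, fee_rate, saving)

print(f"annual saving: {saving:,.0f}")
print(f"payback on a {fee_rate:.0%} fee: {payback:.2f} years")
```

&lt;p&gt;A short payback argues for refinancing early; a fee that takes most of the remaining hold period to recover argues for leaving the tranche in place.&lt;/p&gt;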

&lt;p&gt;The waterfall model makes these decisions transparent. Rather than arguing about blended cost of capital in the abstract, operators and sponsors can run the model forward with each scenario and see precisely how the DSCR profile and equity cash flow change across a three-to-five-year hold period.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/cash-flow-waterfall-model/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>programming</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Amoeba Extinction Probability: The Branching Process Solution</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:40:03 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/amoeba-extinction-probability-the-branching-process-solution-3jka</link>
      <guid>https://dev.to/white_oak_intel/amoeba-extinction-probability-the-branching-process-solution-3jka</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/amoeba-extinction-probability/#the-question" rel="noopener noreferrer"&gt;The Question&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/amoeba-extinction-probability/#why-candidates-get-this-wrong" rel="noopener noreferrer"&gt;Why Candidates Get This Wrong&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/amoeba-extinction-probability/#setting-up-the-fixed-point-equation" rel="noopener noreferrer"&gt;Setting Up the Fixed-Point Equation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/amoeba-extinction-probability/#solving-the-quadratic" rel="noopener noreferrer"&gt;Solving the Quadratic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/amoeba-extinction-probability/#the-general-branching-process-theorem" rel="noopener noreferrer"&gt;The General Branching Process Theorem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/amoeba-extinction-probability/#python-simulation-watching-populations-collapse" rel="noopener noreferrer"&gt;Python Simulation: Watching Populations Collapse&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/amoeba-extinction-probability/#business-application-default-cascades-and-contagion-risk" rel="noopener noreferrer"&gt;Business Application: Default Cascades and Contagion Risk&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Question
&lt;/h2&gt;

&lt;p&gt;We begin with exactly one amoeba. Every minute, independently and with equal probability — one-third for each outcome — it does one of three things: it dies and leaves no offspring, it survives unchanged and produces no new cells, or it divides into exactly two amoebas. Each of those two daughter amoebas then faces the same three possible outcomes in the next minute, acting entirely independently of each other and of any other amoebas in the population. This process repeats indefinitely.&lt;/p&gt;

&lt;p&gt;The question: what is the probability that the entire population eventually goes extinct — that is, reaches zero amoebas — at some point in the future?&lt;/p&gt;

&lt;p&gt;This is a classic problem from quant finance interviews, appearing regularly at Goldman Sachs, Morgan Stanley, Citadel, and Two Sigma. Its power as an interview question lies in the multiple layers of wrong reasoning available to candidates who have not seen the branching process framework. The correct answer is that extinction is certain — the probability is exactly 1 — and the proof requires nothing more than the law of total probability, a quadratic equation, and a careful argument about which root to select. But getting there demands recognizing the recursive structure of the problem.&lt;/p&gt;

&lt;p&gt;Note carefully what "eventual extinction" means: we are not asking whether the population goes extinct in any fixed time window. We are asking whether, over an infinite time horizon, the probability of the population ever hitting zero is strictly positive, and what that probability is. The answer, as we will show, is that it equals 1 — extinction is not just possible but guaranteed, in the sense that the probability of surviving forever is zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Candidates Get This Wrong
&lt;/h2&gt;

&lt;p&gt;The most common first answer is: "The amoeba splits one-third of the time, so the population grows. Extinction cannot be certain." This argument is intuitive and wrong. Yes, there is a positive probability of growth at each step. But the branching process is not the same as a simple random walk with drift. Growth and extinction are not symmetric outcomes, and the random fluctuations in a branching process can compound in ways that lead to collapse even when the population is large.&lt;/p&gt;

&lt;p&gt;To see why the intuition fails, consider what happens even when the population is large, say 1,000 amoebas. In the next generation, some die, some stay, some split. The population has a random-walk-like dynamic with zero mean drift (since the mean offspring is exactly 1). A simple symmetric random walk started at any positive value hits zero in finite time with probability 1. The population behaves analogously, although its step size scales with the current population rather than staying fixed, and "hitting zero" is extinction: once the count reaches zero it stays there.&lt;/p&gt;

&lt;p&gt;The second common wrong answer is: "The expected population size is constant (since mean offspring = 1), so the population is a martingale, and by the martingale convergence theorem, it converges to some positive limit." This is also wrong, and the error is subtle. A non-negative martingale converges almost surely to a non-negative limit, but that limit may be zero — indeed, for branching processes in the critical case, the limiting distribution assigns probability 1 to the value zero. The martingale convergence theorem guarantees convergence, not that the limit is positive.&lt;/p&gt;
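
&lt;p&gt;The gap between a constant mean and a vanishing population can be checked exactly for the first few generations. The sketch below is a minimal illustration rather than part of the proof: it tracks the full distribution of the population size with exact fractions, and &lt;code&gt;OFFSPRING&lt;/code&gt;, &lt;code&gt;poly_mul&lt;/code&gt;, and &lt;code&gt;next_generation&lt;/code&gt; are hypothetical helpers introduced here.&lt;/p&gt;

```python
from fractions import Fraction
from itertools import zip_longest

# Offspring distribution: P(0) = P(1) = P(2) = 1/3.
OFFSPRING = [Fraction(1, 3)] * 3


def poly_mul(a, b):
    """Convolve two probability vectors (multiply their generating polynomials)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def next_generation(dist):
    """Population-size distribution one generation later.

    dist[k] is P(population = k). A population of k produces the sum of k
    independent offspring counts, i.e. the k-fold convolution of OFFSPRING.
    """
    out = [Fraction(0)]
    k_fold = [Fraction(1)]  # 0-fold convolution: zero offspring for sure
    for prob in dist:
        scaled = [prob * c for c in k_fold]
        out = [x + y for x, y in zip_longest(out, scaled, fillvalue=Fraction(0))]
        k_fold = poly_mul(k_fold, OFFSPRING)
    return out


dist = [Fraction(0), Fraction(1)]  # generation 0: exactly one amoeba
for generation in range(1, 7):
    dist = next_generation(dist)
    mean = sum(k * p for k, p in enumerate(dist))
    print(f"gen {generation}: mean = {mean}, P(extinct) = {float(dist[0]):.4f}")
```

&lt;p&gt;The mean stays at exactly 1 in every generation while the probability mass at zero keeps growing: the average is held up by a shrinking set of outcomes in which the population is large.&lt;/p&gt;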

&lt;p&gt;Let us compute the mean offspring explicitly before proceeding to the proof:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cmu%2520%253D%25200%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%252B%25201%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%252B%25202%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%253D%2520%255Cfrac%257B0%2520%252B%25201%2520%252B%25202%257D%257B3%257D%2520%253D%25201" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cmu%2520%253D%25200%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%252B%25201%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%252B%25202%2520%255Ccdot%2520%255Cfrac%257B1%257D%257B3%257D%2520%253D%2520%255Cfrac%257B0%2520%252B%25201%2520%252B%25202%257D%257B3%257D%2520%253D%25201" alt="equation" width="355" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mean offspring per amoeba is exactly 1. This is called the critical case in branching process theory, and it is precisely the case where the result, certain extinction, is most counterintuitive. When the mean is less than 1, extinction is also certain, and unsurprisingly so (the population is shrinking on average). When the mean exceeds 1, extinction may or may not occur, and the extinction probability depends on the full distribution. When the mean is exactly 1, the population neither grows nor shrinks on average, yet extinction is still guaranteed, and the proof requires the fixed-point algebra we are about to develop.&lt;/p&gt;
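
&lt;p&gt;The same calculation in code, as a minimal sketch: the offspring distribution is written as a dictionary from offspring count to probability, and &lt;code&gt;mean_offspring&lt;/code&gt; and &lt;code&gt;regime&lt;/code&gt; are hypothetical helper names. Exact fractions avoid any floating-point doubt about whether the mean is exactly 1.&lt;/p&gt;

```python
from fractions import Fraction


def mean_offspring(dist):
    """Mean of an offspring distribution given as {count: probability}."""
    return sum(count * prob for count, prob in dist.items())


def regime(mu):
    """Name the branching regime implied by the mean offspring mu."""
    if mu == 1:
        return "critical"
    return "supercritical" if mu > 1 else "subcritical"


amoeba = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}
mu = mean_offspring(amoeba)
print(mu, regime(mu))  # prints: 1 critical
```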

&lt;h2&gt;
  
  
  Setting Up the Fixed-Point Equation
&lt;/h2&gt;

&lt;p&gt;Let p denote the probability that the entire population eventually goes extinct, starting from a single amoeba. This probability is well-defined and satisfies 0 ≤ p ≤ 1. We derive an equation for p by conditioning on what happens in the first minute — the first generation — and applying the law of total probability.&lt;/p&gt;

&lt;p&gt;In the first minute, exactly one of three mutually exclusive events occurs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With probability (1/3): the amoeba dies. The population immediately drops to zero, so extinction is immediate and certain. This event contributes probability (1/3) · 1 = (1/3) to the extinction probability.&lt;/li&gt;
&lt;li&gt;With probability (1/3): the amoeba survives unchanged. The population remains at exactly one amoeba, and the process restarts from scratch. By the definition of p and the Markov property (the future depends only on the present state, not on history), this amoeba will eventually go extinct with probability p. This event contributes (1/3) · p to the extinction probability.&lt;/li&gt;
&lt;li&gt;With probability (1/3): the amoeba divides into two. Now we have two amoebas, each acting independently with the same rules. For the entire population to eventually go extinct, both lineages must independently go extinct. Because the two lineages evolve independently and each starts from a single amoeba, each goes extinct with probability p. By independence, both go extinct with probability p · p = p². This event contributes (1/3) · p² to the extinction probability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Summing the contributions from all three cases:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bp%2520%253D%2520%255Cfrac%257B1%257D%257B3%257D%2520%252B%2520%255Cfrac%257B1%257D%257B3%257Dp%2520%252B%2520%255Cfrac%257B1%257D%257B3%257Dp%255E2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bp%2520%253D%2520%255Cfrac%257B1%257D%257B3%257D%2520%252B%2520%255Cfrac%257B1%257D%257B3%257Dp%2520%252B%2520%255Cfrac%257B1%257D%257B3%257Dp%255E2" alt="equation" width="181" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the fixed-point equation for the extinction probability. It says that p must satisfy this particular algebraic relationship — and any valid extinction probability must be a root of this equation that lies in [0, 1]. The extinction probability is self-consistent: conditioning on the first step and using the recursive structure of the branching process recovers the same probability p.&lt;/p&gt;

&lt;p&gt;Multiplying through by 3:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B3p%2520%253D%25201%2520%252B%2520p%2520%252B%2520p%255E2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B3p%2520%253D%25201%2520%252B%2520p%2520%252B%2520p%255E2" alt="equation" width="161" height="21"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bp%255E2%2520-%25202p%2520%252B%25201%2520%253D%25200" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bp%255E2%2520-%25202p%2520%252B%25201%2520%253D%25200" alt="equation" width="162" height="21"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%2528p%2520-%25201%2529%255E2%2520%253D%25200" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%2528p%2520-%25201%2529%255E2%2520%253D%25200" alt="equation" width="136" height="22"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;span&gt;Result&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Extinction is certain. Starting from a single amoeba following this three-outcome process, the probability that the population eventually reaches zero is exactly 1 — even though the expected population size at any time t is constant. The population is a martingale that converges almost surely to zero.&lt;/p&gt;
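
&lt;p&gt;The fixed-point equation also has a dynamic reading: applying its right-hand side repeatedly, starting from 0, gives the probability that the population has died out by generation n. The sketch below iterates that map. The function names &lt;code&gt;step&lt;/code&gt; and &lt;code&gt;extinct_by_generation&lt;/code&gt; are introduced here for illustration.&lt;/p&gt;

```python
def step(q):
    """Right-hand side of the fixed-point equation: 1/3 + q/3 + q^2/3."""
    return (1.0 + q + q * q) / 3.0


def extinct_by_generation(n):
    """P(population has died out by generation n), starting from one amoeba."""
    q = 0.0  # with one live amoeba, extinction by generation 0 is impossible
    for _ in range(n):
        q = step(q)
    return q


for n in (1, 10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: P(extinct by n) = {extinct_by_generation(n):.6f}")
```

&lt;p&gt;The values climb toward 1, but slowly. Certain extinction is a statement about the infinite horizon; over any fixed number of generations there remains a small chance that the population is still alive.&lt;/p&gt;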

&lt;h2&gt;
  
  
  Solving the Quadratic
&lt;/h2&gt;

&lt;p&gt;The quadratic (p-1)² = 0 has a single root at p = 1, with multiplicity 2. There is no ambiguity in root selection: the only solution in [0,1] is p = 1. This is not an approximation and not a limiting value — it is the exact algebraic answer.&lt;/p&gt;

&lt;p&gt;The fact that p = 1 is a repeated root has geometric significance. The probability generating function of the offspring distribution is G(s) = (1/3) + (1/3)s + (1/3)s². The extinction probability is the smallest non-negative fixed point of G, that is, the smallest non-negative solution to G(s) = s. At s = 1, we have G(1) = 1 trivially (generating functions always satisfy G(1) = 1 when the offspring distribution is proper), so s = 1 is always a fixed point; the question is whether a smaller one exists. For any offspring distribution other than the trivial one in which every individual leaves exactly one descendant, convexity of G means a smaller fixed point exists exactly when the slope at 1 exceeds 1, that is, when G'(1) = μ &amp;gt; 1. Here G'(1) = μ = 1, so G is tangent to the identity line at s = 1 and s = 1 is the only fixed point in [0, 1]. This tangency, with the generating function touching rather than crossing the diagonal, is the geometric signature of the critical branching process.&lt;/p&gt;

&lt;p&gt;The contrast with the supercritical case is instructive. Suppose the probabilities were different: death probability 0.2, survival probability 0.3, and splitting probability 0.5. Then:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bp%2520%253D%25200.2%2520%252B%25200.3p%2520%252B%25200.5p%255E2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bp%2520%253D%25200.2%2520%252B%25200.3p%2520%252B%25200.5p%255E2" alt="equation" width="211" height="21"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B0.5p%255E2%2520-%25200.7p%2520%252B%25200.2%2520%253D%25200" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B0.5p%255E2%2520-%25200.7p%2520%252B%25200.2%2520%253D%25200" alt="equation" width="213" height="21"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bp%255E2%2520-%25201.4p%2520%252B%25200.4%2520%253D%25200" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bp%255E2%2520-%25201.4p%2520%252B%25200.4%2520%253D%25200" alt="equation" width="190" height="21"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bp%2520%253D%2520%255Cfrac%257B1.4%2520%255Cpm%2520%255Csqrt%257B1.96%2520-%25201.6%257D%257D%257B2%257D%2520%253D%2520%255Cfrac%257B1.4%2520%255Cpm%2520%255Csqrt%257B0.36%257D%257D%257B2%257D%2520%253D%2520%255Cfrac%257B1.4%2520%255Cpm%25200.6%257D%257B2%257D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3Bp%2520%253D%2520%255Cfrac%257B1.4%2520%255Cpm%2520%255Csqrt%257B1.96%2520-%25201.6%257D%257D%257B2%257D%2520%253D%2520%255Cfrac%257B1.4%2520%255Cpm%2520%255Csqrt%257B0.36%257D%257D%257B2%257D%2520%253D%2520%255Cfrac%257B1.4%2520%255Cpm%25200.6%257D%257B2%257D" alt="equation" width="430" height="41"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives two roots: p = 1.0 and p = 0.4. The mean offspring in this scenario is μ = 0(0.2) + 1(0.3) + 2(0.5) = 1.3 &amp;gt; 1. For a supercritical branching process (μ &amp;gt; 1), the extinction probability is the smaller root of the fixed-point equation — here p = 0.4. The population goes extinct with probability 40% and survives forever with probability 60%. This is the generic supercritical outcome: extinction is possible but not certain, and the exact probability is the unique solution in [0, 1).&lt;/p&gt;

&lt;p&gt;Why do we take the smaller root in the supercritical case? Because the extinction probability is the smallest non-negative fixed point of the generating function G. Let qₙ be the probability that the population is extinct by generation n. Then q₀ = 0 and qₙ = G(qₙ₋₁), and because G is increasing on [0, 1], this sequence rises monotonically and can never pass the first fixed point it meets. When μ &amp;gt; 1, the function G(s) crosses the diagonal at some q ∈ (0, 1) before the trivial fixed point at s = 1, so the sequence converges to q, which is the genuine extinction probability. The fixed point at s = 1 exists for every proper offspring distribution, simply because the probabilities sum to 1, so it carries no information about extinction.&lt;/p&gt;
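
&lt;p&gt;The selection of the smallest fixed point can be checked numerically. The sketch below iterates qₙ = G(qₙ₋₁) from q₀ = 0, where qₙ is the probability of extinction by generation n, for both the supercritical example (0.2, 0.3, 0.5) and the critical amoeba case (1/3, 1/3, 1/3). The function names &lt;code&gt;pgf&lt;/code&gt; and &lt;code&gt;extinction_by_generation&lt;/code&gt; are illustrative choices, not part of any library.&lt;/p&gt;

```python
from typing import Sequence


def pgf(probs: Sequence[float], s: float) -> float:
    """Evaluate G(s) = sum over k of probs[k] * s**k."""
    return sum(p * s**k for k, p in enumerate(probs))


def extinction_by_generation(probs: Sequence[float], n_generations: int) -> float:
    """Return q_n = P(extinct by generation n), starting from one individual.

    Uses q_0 = 0 and q_n = G(q_{n-1}); the sequence increases toward the
    smallest non-negative fixed point of G.
    """
    q = 0.0
    for _ in range(n_generations):
        q = pgf(probs, q)
    return q


supercritical = [0.2, 0.3, 0.5]      # die, survive, split
critical = [1 / 3, 1 / 3, 1 / 3]     # the amoeba problem

for n in (1, 5, 20, 100):
    q_super = extinction_by_generation(supercritical, n)
    q_crit = extinction_by_generation(critical, n)
    print(f"n={n:3d}  supercritical q_n={q_super:.6f}  critical q_n={q_crit:.6f}")
```

&lt;p&gt;In the supercritical case the iterates settle at 0.4 and never reach the fixed point at 1. In the critical case they keep creeping upward toward 1, but slowly, which is the numerical face of the tangency at s = 1.&lt;/p&gt;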

&lt;h2&gt;
  
  
  The General Branching Process Theorem
&lt;/h2&gt;

&lt;p&gt;The amoeba problem is an instance of the Galton-Watson branching process, named after Francis Galton and Henry Watson who developed the theory in the 1870s while studying the extinction of family surnames in Victorian England — a problem mathematically identical to amoeba extinction. The general theorem is among the most elegant in probability theory.&lt;/p&gt;

&lt;p&gt;Let Zₙ denote the population size in generation n, starting with Z₀ = 1. Each individual in generation n independently produces a random number of offspring in generation n+1 according to a fixed offspring distribution with probabilities {pₖ}_(k=0)^∞, where pₖ = P(offspring count = k). The mean offspring is μ = ∑_(k=0)^∞ k · pₖ and the probability generating function is G(s) = ∑_(k=0)^∞ pₖ s^k.&lt;/p&gt;

&lt;p&gt;The Galton-Watson extinction theorem states:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subcritical (μ &amp;lt; 1):&lt;/strong&gt; Extinction is certain. The population shrinks on average and collapses to zero with probability 1. The fixed-point equation G(s) = s has only the root s = 1 in [0,1].&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critical (μ = 1):&lt;/strong&gt; Extinction is certain, provided p₁ &amp;lt; 1 (i.e., the process is not deterministically replaced one-for-one at every step). The generating function is tangent to the diagonal at s = 1, and s = 1 is the only fixed point in [0, 1]. Our amoeba problem falls here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supercritical (μ &amp;gt; 1):&lt;/strong&gt; The extinction probability q is strictly less than 1 and equals the unique fixed point of G(s) = s in [0, 1). With probability 1 - q &amp;gt; 0, the population grows without bound.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The critical case (μ = 1) is particularly striking. The population process {Zₙ}_(n ≥ 0) is a non-negative martingale, which we can verify directly: E[Z_(n+1) | Zₙ] = Zₙ · μ = Zₙ · 1 = Zₙ. By the martingale convergence theorem, it converges almost surely to a limit Z_∞ ≥ 0. The content of the extinction theorem is that this limit satisfies P(Z_∞ = 0) = 1. The population does converge, and it converges to zero. The path to zero can be arbitrarily long; the population may grow for many generations before ultimately collapsing. But the collapse is certain.&lt;/p&gt;

&lt;p&gt;The intuition, made rigorous by the theory, is that variance accumulates over generations. Even with zero mean drift, the fluctuations in the branching process grow over time (the variance of Zₙ grows linearly in n for the critical case), and this increasing dispersion, combined with the absorbing barrier at zero, guarantees eventual absorption. The population wanders further and further from its starting point, but the absorbing state at zero captures it eventually with probability 1.&lt;/p&gt;
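
&lt;p&gt;Both claims, the constant mean and the linearly growing variance, can be verified exactly for small n. The sketch below builds the exact distribution of Zₙ for the amoeba problem by composing the generating function with itself, using exact fractions. It relies on the standard identity Gₙ(s) = Gₙ₋₁(G(s)) for the generating function of Zₙ; the helper names are illustrative. For this offspring distribution the variance per generation is σ² = 5/3 − 1 = 2/3, so the expected result is Var(Zₙ) = 2n/3.&lt;/p&gt;

```python
from fractions import Fraction
from typing import List, Tuple

Poly = List[Fraction]  # index k holds the coefficient of s**k


def poly_mul(a: Poly, b: Poly) -> Poly:
    """Multiply two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def compose(outer: Poly, inner: Poly) -> Poly:
    """Return the coefficients of outer(inner(s)) using Horner's rule."""
    result = [outer[-1]]
    for coeff in reversed(outer[:-1]):
        result = poly_mul(result, inner)
        result[0] += coeff
    return result


def generation_pmf(offspring: Poly, n: int) -> Poly:
    """Exact distribution of Z_n given Z_0 = 1 (coefficients of G composed n times)."""
    pmf = [Fraction(0), Fraction(1)]  # G_0(s) = s, i.e. Z_0 = 1 with certainty
    for _ in range(n):
        pmf = compose(pmf, offspring)  # G_n(s) = G_{n-1}(G(s))
    return pmf


def mean_and_variance(pmf: Poly) -> Tuple[Fraction, Fraction]:
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(k * p for k, p in enumerate(pmf))
    second_moment = sum(k * k * p for k, p in enumerate(pmf))
    return mean, second_moment - mean**2


amoeba = [Fraction(1, 3)] * 3  # die, survive, split
for n in range(1, 7):
    pmf = generation_pmf(amoeba, n)
    mean, var = mean_and_variance(pmf)
    print(f"n={n}: E[Z_n]={mean}  Var(Z_n)={var}  P(Z_n=0)={float(pmf[0]):.4f}")
```

&lt;p&gt;The mean stays pinned at 1 while the variance climbs by 2/3 each generation, and the mass at zero grows steadily. The polynomial degree doubles with every generation, so this exact approach is only practical for small n.&lt;/p&gt;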

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;Subcritical&lt;/strong&gt;: &lt;em&gt;Mean Offspring μ:&lt;/em&gt; μ &amp;lt; 1 — &lt;em&gt;Extinction Probability q:&lt;/em&gt; q = 1 — &lt;em&gt;Interpretation:&lt;/em&gt; Population shrinks on average; certain extinction&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Critical&lt;/strong&gt;: &lt;em&gt;Mean Offspring μ:&lt;/em&gt; μ = 1 — &lt;em&gt;Extinction Probability q:&lt;/em&gt; q = 1 — &lt;em&gt;Interpretation:&lt;/em&gt; Zero-drift martingale; still certain extinction (our problem)&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Supercritical&lt;/strong&gt;: &lt;em&gt;Mean Offspring μ:&lt;/em&gt; μ &amp;gt; 1 — &lt;em&gt;Extinction Probability q:&lt;/em&gt; q ∈ (0, 1) — &lt;em&gt;Interpretation:&lt;/em&gt; Smallest fixed point of G(s) = s; non-trivial survival probability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Python Simulation: Watching Populations Collapse
&lt;/h2&gt;

&lt;p&gt;The simulation below runs up to 10,000 generations per trial and stops any trial whose population exceeds 10,000, which keeps the running time bounded. With these parameters, the empirical extinction probability should land very close to 1.0 but can fall slightly short of it, because the simulation uses a finite time horizon whereas the mathematical result requires an infinite one. Any gap between the simulated value and the true value of 1.0 is the fraction of trials that either survived all 10,000 generations or hit the population cap. Both events are rare: for a non-negative martingale started at 1, the chance of ever reaching 10,000 is at most 1/10,000 (Doob's maximal inequality), and the survival probability of this critical process after n generations decays roughly like 3/n.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simulate_generation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;population&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Evolve one generation of amoebas.

    Each amoeba independently:
      - Dies (outcome 0)     with probability 1/3
      - Survives (outcome 1) with probability 1/3
      - Splits (outcome 2)   with probability 1/3

    Returns the next generation population size.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;next_pop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;population&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="c1"&gt;# survive unchanged
&lt;/span&gt;            &lt;span class="n"&gt;next_pop&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;outcome&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="c1"&gt;# split into two
&lt;/span&gt;            &lt;span class="n"&gt;next_pop&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="c1"&gt;# outcome == 0: die — contribute 0 to next generation
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;next_pop&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simulate_extinction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_generations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Run one trial. Returns True if population goes extinct within max_generations.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;population&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_generations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;population&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;population&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="c1"&gt;# stop runaway populations to bound running time; counted as not extinct
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="n"&gt;population&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;simulate_generation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;population&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;population&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;estimate_extinction_probability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Estimate the extinction probability from n_trials independent simulations.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;extinctions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;simulate_extinction&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_trials&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;extinctions&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n_trials&lt;/span&gt;


&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;p_empirical&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_extinction_probability&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Empirical P(extinction) = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p_empirical&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mathematical result:      1.0000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gap (finite-horizon artifact): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;p_empirical&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Trace a few sample trajectories to illustrate the collapse
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Sample population trajectories (first 20 generations):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;trial&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;pop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;trajectory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pop&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;trajectory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;pop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;simulate_generation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;trajectory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Trial &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trial&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trajectory&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The population trajectories reveal the core dynamics of the critical branching process. Some trials collapse immediately in the first generation when the single amoeba dies. Others grow for several generations — reaching populations of 10, 50, or even several hundred — before the random fluctuations eventually drive them to zero. This is what makes the critical case counterintuitive: you can observe the population growing for a long time and still be on a path toward certain eventual extinction. The growth is real but temporary; the collapse is certain but may be distant.&lt;/p&gt;

&lt;p&gt;Note the caveat about the finite-horizon simulation: the empirical extinction probability may come out slightly below 1.0. Any shortfall comes from trials that exceeded the 10,000-generation limit or the 10,000-amoeba cap without going extinct, which for these settings should be a small fraction of a percent of trials. These are genuine non-extinction paths in the simulation, but the mathematical theorem guarantees they would eventually collapse given infinite time. The simulation is a useful sanity check, but it cannot prove the mathematical result; only the algebra of the fixed-point equation can do that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Business Application: Default Cascades and Contagion Risk
&lt;/h2&gt;

&lt;p&gt;The Galton-Watson branching process is a foundational model for contagion — the spread of a disturbance through a network where each affected node triggers additional affected nodes, each of which may trigger still more. This structure appears throughout financial markets, supply chains, and epidemic modeling, and the extinction probability theorem gives a precise criterion for whether contagion will die out or propagate systemically.&lt;/p&gt;

&lt;p&gt;In credit markets, a defaulting firm does not always default in isolation. Suppliers that depended on the firm for revenue may themselves face cash flow disruption and default. Those suppliers' suppliers may do the same. Each default "reproduces" into a random number of additional defaults — the number depending on the firm's position in the production network, the severity of the cash flow shock, and the credit quality of its counterparties. When the mean branching factor — the expected number of additional defaults triggered by each default — is less than 1, contagion dies out quickly. When it exceeds 1, cascades can propagate to arbitrary scale.&lt;/p&gt;
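
&lt;p&gt;To put rough numbers on that threshold, the sketch below makes a simplifying assumption that is not in the model above: each default triggers a Poisson-distributed number of further defaults with mean λ. Under that assumption the generating function is G(s) = exp(λ(s − 1)), and the probability that a cascade dies out is again the smallest fixed point of G. The function names and the sample values of λ are hypothetical, chosen only for illustration, and are not calibrated to any market.&lt;/p&gt;

```python
import math


def expected_cascade_size(mean_offspring: float) -> float:
    """Expected total number of defaults, counting the initial one.

    For a mean branching factor mu below 1 this is 1 + mu + mu**2 + ... = 1 / (1 - mu).
    At or above 1 the expectation is infinite.
    """
    if mean_offspring >= 1.0:
        return math.inf
    return 1.0 / (1.0 - mean_offspring)


def poisson_die_out_probability(lam: float, n_iter: int = 10_000) -> float:
    """Probability that a cascade dies out, ASSUMING Poisson(lam) offspring.

    Iterates q = G(q) from q = 0 with G(s) = exp(lam * (s - 1)).
    Near lam = 1 convergence is slow, so the result there is only approximate.
    """
    q = 0.0
    for _ in range(n_iter):
        q = math.exp(lam * (q - 1.0))
    return q


# Hypothetical branching factors, for illustration only.
for lam in (0.5, 0.8, 0.95, 1.5):
    size = expected_cascade_size(lam)
    q = poisson_die_out_probability(lam)
    print(f"lambda={lam:4.2f}  expected cascade size={size:8.2f}  P(dies out)={q:.4f}")
```

&lt;p&gt;The expected cascade size grows sharply as the branching factor approaches 1 from below, which is why a network can look safely self-limiting until shortly before it is not.&lt;/p&gt;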

&lt;p&gt;The 2008 financial crisis can be understood, at least partially, through this lens. The interconnection of mortgage-backed securities meant that a single wave of mortgage defaults could trigger losses at banks that held those securities, which could trigger counterparty defaults at firms that had entered into credit default swaps with those banks, which could trigger liquidity crises at funds that relied on those firms for financing. The branching factor of this network, under normal conditions, was subcritical — contagion was self-limiting. Under the stress of the housing collapse, it became briefly supercritical, and the resulting cascade required extraordinary government intervention to interrupt.&lt;/p&gt;

&lt;p&gt;The same framework applies to supply chain disruptions. A natural disaster that incapacitates a key semiconductor manufacturer (node zero in the branching process) may force automotive manufacturers who depend on that supplier to halt production, which may delay deliveries to dealerships, which may affect floor plan financing arrangements. The branching factor of this supply chain network determines whether the disruption propagates globally or dies out locally. Recent work on supply chain resilience focuses precisely on identifying and reducing the branching factor of critical nodes — reducing connectivity and building buffer inventory to drive the effective reproduction number below 1, the threshold for subcritical (self-limiting) contagion.&lt;/p&gt;

&lt;p&gt;The amoeba extinction problem, with its clean three-outcome setup and quadratic fixed-point equation, is the canonical minimal example of this entire family of models. Understanding the proof — why the critical case yields certain extinction, why the generating function tangency at s = 1 matters, and why variance accumulates to drive the zero-drift martingale to zero — is prerequisite knowledge for anyone working with contagion models in finance, epidemiology, or network science.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/amoeba-extinction-probability/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>interview</category>
      <category>python</category>
    </item>
    <item>
      <title>Variance Testing in Forecasting</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:38:44 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/variance-testing-in-forecasting-4g40</link>
      <guid>https://dev.to/white_oak_intel/variance-testing-in-forecasting-4g40</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/variance-testing-forecasts/#why-mape-misleads" rel="noopener noreferrer"&gt;Why MAPE Misleads&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/variance-testing-forecasts/#the-four-metric-framework" rel="noopener noreferrer"&gt;The Four-Metric Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/variance-testing-forecasts/#python-implementation" rel="noopener noreferrer"&gt;Python Implementation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/variance-testing-forecasts/#residual-analysis-and-the-ljung-box-test" rel="noopener noreferrer"&gt;Residual Analysis and the Ljung-Box Test&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/variance-testing-forecasts/#retrain-vs-recalibrate-decision-table" rel="noopener noreferrer"&gt;Retrain vs. Recalibrate Decision Table&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why MAPE Misleads
&lt;/h2&gt;

&lt;p&gt;Mean Absolute Percentage Error is the default metric for forecast evaluation in most business contexts. It is easy to explain: if your MAPE is 8%, your model is wrong by 8% on average. That simplicity is also its critical flaw.&lt;/p&gt;

&lt;p&gt;MAPE is undefined when actuals are zero, which happens constantly in revenue series with seasonal gaps, new product launches, or promotional periods. More subtly, it penalizes over-forecasts more severely than under-forecasts by construction: with non-negative forecasts, an under-forecast can contribute at most 100% error (when the forecast is zero), while an over-forecast has no upper bound and can contribute 200% or more. This asymmetry means MAPE-optimized models systematically bias toward underestimating demand, a direction that is rarely operationally preferable.&lt;/p&gt;
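
&lt;p&gt;A few lines of arithmetic make the asymmetry concrete. The helper &lt;code&gt;ape&lt;/code&gt; below is an illustrative name for the absolute percentage error of a single point, and the numbers are made up for the example.&lt;/p&gt;

```python
def ape(actual: float, forecast: float) -> float:
    """Absolute percentage error of a single forecast, in percent."""
    return abs(actual - forecast) / abs(actual) * 100


actual = 100.0

# Under-forecasts: the worst non-negative forecast is 0, which caps the error at 100%.
print(ape(actual, 50.0))    # 50.0
print(ape(actual, 0.0))     # 100.0

# Over-forecasts: the error keeps growing with the size of the miss.
print(ape(actual, 150.0))   # 50.0
print(ape(actual, 300.0))   # 200.0

# The same absolute miss of 50 units is scored differently depending on
# which side the actual falls on, because the actual is the denominator.
print(round(ape(150.0, 100.0), 1))  # under-forecast of 50 units
print(round(ape(100.0, 150.0), 1))  # over-forecast of 50 units
```

&lt;p&gt;The last two lines show the second face of the problem: a 50-unit miss costs about 33% when the model under-forecasts an actual of 150, but 50% when it over-forecasts an actual of 100.&lt;/p&gt;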

&lt;p&gt;&lt;span&gt;The Core Problem&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;A model can have a low MAPE and still be useless in practice. If it is consistently wrong in the same direction, if its errors correlate with past errors, or if it performs worse than a naive benchmark, those failures are invisible in a single-metric MAPE report.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four-Metric Framework
&lt;/h2&gt;

&lt;p&gt;A rigorous forecast evaluation requires at minimum four metrics, each measuring a different failure mode. Used together, they reveal whether a model is accurate in magnitude, unbiased, better than a naive baseline, and not systematically gaming a particular error measure.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;MAPE&lt;/strong&gt;: &lt;em&gt;What It Measures:&lt;/em&gt; Mean percentage error magnitude — &lt;em&gt;Key Property:&lt;/em&gt; Intuitive but unstable at low actuals&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;RMSE&lt;/strong&gt;: &lt;em&gt;What It Measures:&lt;/em&gt; Root mean squared error — &lt;em&gt;Key Property:&lt;/em&gt; Penalizes large errors; same units as the series&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;MASE&lt;/strong&gt;: &lt;em&gt;What It Measures:&lt;/em&gt; Mean absolute scaled error vs. seasonal naïve — &lt;em&gt;Key Property:&lt;/em&gt; Scale-free; MASE &amp;gt; 1.0 means worse than naïve&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Theil's U&lt;/strong&gt;: &lt;em&gt;What It Measures:&lt;/em&gt; RMSE ratio vs. no-change naïve — &lt;em&gt;Key Property:&lt;/em&gt; U &amp;gt; 1.0 means model is worse than doing nothing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Python Implementation
&lt;/h2&gt;

&lt;p&gt;The function below computes all four metrics from actuals and forecasts arrays. MASE uses a seasonal naïve benchmark with a configurable &lt;code&gt;seasonal_period&lt;/code&gt;: for monthly data, the default of 12 means the benchmark predicts each value with the actual from the same month one year prior. When the series has no more observations than one full season, it falls back to a one-step naïve benchmark.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_forecast_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;actuals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;forecasts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;seasonal_period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;epsilon&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1e-8&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;

    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;actuals&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;forecasts&lt;/span&gt;
    &lt;span class="n"&gt;abs_errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# MAPE — skip near-zero actuals to avoid division instability
&lt;/span&gt;    &lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actuals&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;epsilon&lt;/span&gt;
    &lt;span class="n"&gt;mape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;abs_errors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actuals&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

    &lt;span class="c1"&gt;# RMSE
&lt;/span&gt;    &lt;span class="n"&gt;rmse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# MASE — seasonal naïve benchmark
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actuals&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;seasonal_period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;naive_errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actuals&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;seasonal_period&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;actuals&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;seasonal_period&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;naive_errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actuals&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# one-step naïve fallback
&lt;/span&gt;
    &lt;span class="n"&gt;naive_mae&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;naive_errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;abs_errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;naive_mae&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;epsilon&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Theil's U — compare model RMSE to no-change naïve RMSE
&lt;/span&gt;    &lt;span class="n"&gt;naive_rmse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;actuals&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;actuals&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;theil_u&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rmse&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;naive_rmse&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;epsilon&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mape&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mape&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rmse&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rmse&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mase&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mase&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;theil_u&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;theil_u&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Residual Analysis and the Ljung-Box Test
&lt;/h2&gt;

&lt;p&gt;A well-specified forecast model should produce residuals that are white noise: random, uncorrelated, and centered near zero. If residuals show autocorrelation — if this period's error predicts next period's error — the model is leaving systematic information on the table. That pattern is detectable and exploitable, which means the model is not doing its job.&lt;/p&gt;

&lt;p&gt;The Ljung-Box test is a standard statistical tool for detecting residual autocorrelation. Its null hypothesis is that the residual autocorrelations at lags 1 through &lt;em&gt;k&lt;/em&gt; are jointly zero, as they would be for white noise. A p-value below 0.05 rejects that hypothesis at the 5% level and is strong evidence that the model has structural problems that cannot be patched by recalibration alone.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;statsmodels.stats.diagnostic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;acorr_ljungbox&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;residual_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;actuals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;forecasts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;significance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;residuals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;actuals&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;forecasts&lt;/span&gt;
    &lt;span class="n"&gt;lb_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;acorr_ljungbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;residuals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lags&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;return_df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;lb_stat&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lb_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;lb_stat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;lb_pval&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lb_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;lb_pvalue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;autocorrelated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lb_pval&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;significance&lt;/span&gt;

    &lt;span class="n"&gt;residual_mean&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;residuals&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;residual_std&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;residuals&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;max_abs_residual&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;residuals&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;autocorrelated&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;residual_mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;residual_std&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;diagnosis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RETRAIN: systematic bias with autocorrelation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;autocorrelated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;diagnosis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RETRAIN: autocorrelated residuals indicate model misspecification&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;residual_mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;residual_std&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;diagnosis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RECALIBRATE: bias without autocorrelation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;diagnosis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PASS: residuals appear well-behaved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ljung_box_stat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lb_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ljung_box_pvalue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lb_pval&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;autocorrelated&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;autocorrelated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;residual_mean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;residual_mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;residual_std&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;residual_std&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_abs_residual&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_abs_residual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;diagnosis&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="n"&gt;diagnosis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
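&lt;p&gt;To show what the library call is measuring, the Ljung-Box Q statistic can be sketched in plain Python. This is a minimal illustration, not the statsmodels implementation: the function name ljung_box_q, the AR(1) coefficient of 0.8, the series length, and the seed are all assumptions made for the example. Q is compared against the chi-square critical value for 10 degrees of freedom at the 5% level (about 18.31) instead of computing a p-value.&lt;/p&gt;

```python
import random

def ljung_box_q(residuals, lags):
    """Ljung-Box Q = n(n+2) * sum_{k=1..lags} rho_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)  # sum of squared deviations
    total = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k
        num = sum(centered[t] * centered[t - k] for t in range(k, n))
        rho_k = num / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# Hypothetical residual series: white noise vs. AR(1) errors (phi = 0.8)
rng = random.Random(42)
white = [rng.gauss(0.0, 1.0) for _ in range(500)]
ar1 = [0.0]
for _ in range(499):
    ar1.append(0.8 * ar1[-1] + rng.gauss(0.0, 1.0))

CHI2_CRIT_10DF_5PCT = 18.307  # chi-square 95th percentile, 10 degrees of freedom

q_white = ljung_box_q(white, lags=10)
q_ar1 = ljung_box_q(ar1, lags=10)
print("white noise Q:", round(q_white, 2), "reject:", q_white > CHI2_CRIT_10DF_5PCT)
print("AR(1) Q:", round(q_ar1, 2), "reject:", q_ar1 > CHI2_CRIT_10DF_5PCT)
```

&lt;p&gt;A Q far above the critical value is the same signal as a small p-value from acorr_ljungbox. The white-noise series should usually, though not always, fall below the critical value, since the test has a 5% false-positive rate by construction.&lt;/p&gt;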



&lt;h2&gt;
  
  
  Retrain vs. Recalibrate Decision Table
&lt;/h2&gt;

&lt;p&gt;Not every model failure requires a full retrain. Retraining means rebuilding the model from scratch on a new or expanded dataset — a significant undertaking for complex models. Recalibration means adjusting existing parameters, updating intercepts, or applying a bias correction factor. Knowing which intervention is appropriate requires reading the diagnostic signals together.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;MASE &amp;gt; 1.0&lt;/strong&gt;: &lt;em&gt;Recommended Action:&lt;/em&gt; Retrain — &lt;em&gt;Rationale:&lt;/em&gt; Model underperforms a naïve baseline — structural failure&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Autocorrelated + bias&lt;/strong&gt;: &lt;em&gt;Recommended Action:&lt;/em&gt; Retrain — &lt;em&gt;Rationale:&lt;/em&gt; Model is missing a systematic component; recalibration cannot fix this&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Non-autocorrelated + bias&lt;/strong&gt;: &lt;em&gt;Recommended Action:&lt;/em&gt; Recalibrate — &lt;em&gt;Rationale:&lt;/em&gt; Model structure is correct; apply bias correction or update intercept&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;All metrics passing&lt;/strong&gt;: &lt;em&gt;Recommended Action:&lt;/em&gt; Monitor — &lt;em&gt;Rationale:&lt;/em&gt; Continue scheduled evaluation; no intervention needed&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Theil's U &amp;gt; 1.0 despite low MAPE&lt;/strong&gt;: &lt;em&gt;Recommended Action:&lt;/em&gt; Retrain — &lt;em&gt;Rationale:&lt;/em&gt; Model exploits MAPE asymmetry; real-world performance is worse than reported&lt;/li&gt;
&lt;/ul&gt;
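&lt;p&gt;The table above can be encoded as a small rule function. This is a sketch under stated assumptions: the function name recommend_action, its argument names, and the order in which the rules are checked (retrain triggers first) are illustrative choices, not part of the table. The autocorrelation-only branch follows the diagnosis strings in the residual analysis function earlier in the article.&lt;/p&gt;

```python
def recommend_action(mase, theil_u, autocorrelated, biased):
    """Map diagnostic signals to (action, rationale), checking retrain triggers first."""
    if mase > 1.0:
        return ("Retrain", "underperforms a naive baseline (MASE above 1.0)")
    if theil_u > 1.0:
        return ("Retrain", "worse than a no-change forecast (Theil's U above 1.0)")
    if autocorrelated and biased:
        return ("Retrain", "systematic bias with autocorrelation")
    if autocorrelated:
        return ("Retrain", "autocorrelated residuals indicate misspecification")
    if biased:
        return ("Recalibrate", "bias without autocorrelation")
    return ("Monitor", "all metrics passing")

# Hypothetical metric values, for illustration only
print(recommend_action(mase=0.85, theil_u=0.90, autocorrelated=False, biased=True))
print(recommend_action(mase=0.95, theil_u=1.20, autocorrelated=False, biased=False))
print(recommend_action(mase=0.70, theil_u=0.80, autocorrelated=False, biased=False))
```

&lt;p&gt;Checking the retrain conditions before the recalibrate condition means a cheap bias correction is never recommended for a model that also fails a structural test.&lt;/p&gt;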

&lt;p&gt;"A forecast model that passes its MAPE target while underperforming a naïve benchmark is not a model that works — it is a model that has learned to game a poorly chosen metric."&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/variance-testing-forecasts/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
    <item>
      <title>Stochastic vs. Deterministic Models</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:32:09 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/stochastic-vs-deterministic-models-2pii</link>
      <guid>https://dev.to/white_oak_intel/stochastic-vs-deterministic-models-2pii</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/stochastic-vs-deterministic/#the-false-precision-of-deterministic-models" rel="noopener noreferrer"&gt;The False Precision of Deterministic Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/stochastic-vs-deterministic/#what-stochastic-models-do-differently" rel="noopener noreferrer"&gt;What Stochastic Models Do Differently&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/stochastic-vs-deterministic/#monte-carlo-valuation-in-python" rel="noopener noreferrer"&gt;Monte Carlo Valuation in Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/stochastic-vs-deterministic/#reading-the-confidence-interval-output" rel="noopener noreferrer"&gt;Reading the Confidence Interval Output&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/stochastic-vs-deterministic/#practical-application-in-deal-contexts" rel="noopener noreferrer"&gt;Practical Application in Deal Contexts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/stochastic-vs-deterministic/#when-deterministic-models-are-still-useful" rel="noopener noreferrer"&gt;When Deterministic Models Are Still Useful&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The False Precision of Deterministic Models
&lt;/h2&gt;

&lt;p&gt;A standard DCF or EBITDA-multiple valuation produces a single number. That number gets presented in a board deck with two decimal places, anchors a negotiation, and shapes a capital decision worth millions of dollars. The problem is that the number is not a prediction — it is a calculation that depends on input assumptions that are, themselves, uncertain. Changing the revenue growth assumption by two percentage points or the EBITDA multiple by half a turn can move the output by 20–40%.&lt;/p&gt;

&lt;p&gt;Deterministic models handle this by running three scenarios: base, upside, and downside. This approach has two critical weaknesses. First, it attaches no probability to any scenario, so the three cases tend to be read as equally likely, when in reality there is a continuous distribution of possible outcomes. Second, it samples only three points from that distribution, missing the tail risks that determine whether a deal is financeable or survivable under stress.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;The Core Issue&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;When a lender asks "what is the probability this business can service its debt if revenue comes in 15% below plan?", a deterministic model cannot answer. A stochastic model can estimate that probability directly, because it has already simulated 10,000 versions of that business's future. The estimate is only as good as the input distributions behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Stochastic Models Do Differently
&lt;/h2&gt;

&lt;p&gt;A stochastic valuation model treats each uncertain input as a probability distribution rather than a fixed number. Revenue growth is not "8%" — it is normally distributed with a mean of 8% and a standard deviation calibrated to the business's historical volatility. EBITDA margin is not "22%" — it is drawn from a distribution that reflects the range of realistic operating outcomes given the cost structure and competitive environment.&lt;/p&gt;

&lt;p&gt;Running 10,000 iterations samples 10,000 combinations of these inputs and produces 10,000 enterprise value outcomes. The result is a full distribution: not a point estimate, but a probability-weighted view of value across the realistic outcome space. The median of that distribution is the defensible central estimate. The 5th and 95th percentiles (P5 and P95) define the bounds of what is plausible under normal conditions. Anything below P5 is a genuine tail scenario.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monte Carlo Valuation in Python
&lt;/h2&gt;

&lt;p&gt;The function below implements a Monte Carlo enterprise valuation. Each simulation draws independent samples for revenue growth, EBITDA margin, and EBITDA multiple — the three primary drivers of value in a middle-market business — and computes an enterprise value from each combination. The output is a dictionary of percentile statistics that can be presented directly in a deal memo or board presentation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;monte_carlo_valuation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ebitda_margin_mean&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ebitda_margin_std&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;revenue_growth_mean&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;revenue_growth_std&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ebitda_multiple_mean&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ebitda_multiple_std&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_simulations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10_000&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="c1"&gt;# Sample distributions for each uncertain input
&lt;/span&gt;    &lt;span class="n"&gt;growth&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;revenue_growth_mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;revenue_growth_std&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_simulations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;margins&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ebitda_margin_mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;ebitda_margin_std&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;n_simulations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;multiples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ebitda_multiple_mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ebitda_multiple_std&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_simulations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Clamp to realistic bounds
&lt;/span&gt;    &lt;span class="n"&gt;margins&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;margins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.99&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;multiples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;multiples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;20.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Compute enterprise value for each simulation
&lt;/span&gt;    &lt;span class="n"&gt;projected_revenue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;growth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ebitda&lt;/span&gt;            &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;projected_revenue&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;margins&lt;/span&gt;
    &lt;span class="n"&gt;enterprise_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ebitda&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;multiples&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;median_ev&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enterprise_values&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;   &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;p5&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enterprise_values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;p25&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enterprise_values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;p75&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enterprise_values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;p95&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enterprise_values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;std_dev&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enterprise_values&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;          &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Reading the Confidence Interval Output
&lt;/h2&gt;

&lt;p&gt;The simulation output provides a complete statistical picture of the valuation. The median enterprise value is the central estimate: half of simulated outcomes fall above it and half below. The interquartile range (P25 to P75) contains the middle 50% of simulated outcomes, the band the business is most likely to land in under normal conditions. The P5 to P95 range covers 90% of simulated outcomes and defines the plausible boundaries of the deal's value.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;P5&lt;/strong&gt;: &lt;em&gt;Interpretation:&lt;/em&gt; Severe downside — only 5% of outcomes are worse — &lt;em&gt;Use In Deal Context:&lt;/em&gt; Lender stress test floor; covenant breach threshold&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;P25&lt;/strong&gt;: &lt;em&gt;Interpretation:&lt;/em&gt; Weak performance — business underperforming but not failing — &lt;em&gt;Use In Deal Context:&lt;/em&gt; Downside case for equity return modeling&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Median&lt;/strong&gt;: &lt;em&gt;Interpretation:&lt;/em&gt; Central estimate; half of simulated outcomes fall above it and half below — &lt;em&gt;Use In Deal Context:&lt;/em&gt; Offer price anchor; board-level summary&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;P75&lt;/strong&gt;: &lt;em&gt;Interpretation:&lt;/em&gt; Strong performance — above-average execution — &lt;em&gt;Use In Deal Context:&lt;/em&gt; Upside case for equity return modeling&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;P95&lt;/strong&gt;: &lt;em&gt;Interpretation:&lt;/em&gt; Exceptional outcome — only 5% of outcomes are better — &lt;em&gt;Use In Deal Context:&lt;/em&gt; Maximum realistic value; earnout ceiling&lt;/li&gt;
&lt;/ul&gt;
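
&lt;p&gt;As a rough illustration of how these percentiles are read together, the sketch below summarizes a list of simulated enterprise values using only the standard library. The names &lt;code&gt;percentile&lt;/code&gt; and &lt;code&gt;interval_summary&lt;/code&gt; and the sample values are assumptions made for this example; they are not part of the simulation code above.&lt;/p&gt;

```python
from typing import Dict, List


def percentile(sorted_values: List[float], pct: float) -> float:
    """Linear-interpolation percentile (the same rule as NumPy's default)."""
    if not sorted_values:
        raise ValueError("need at least one simulated value")
    position = (len(sorted_values) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def interval_summary(values: List[float]) -> Dict[str, float]:
    """Return the five reference percentiles plus the two band widths."""
    ordered = sorted(values)
    summary = {f"p{p}": percentile(ordered, p) for p in (5, 25, 50, 75, 95)}
    # Middle 50% of outcomes (interquartile range).
    summary["iqr_width"] = summary["p75"] - summary["p25"]
    # Middle 90% of outcomes (P5 to P95).
    summary["band_90_width"] = summary["p95"] - summary["p5"]
    return summary


# Hypothetical simulated enterprise values, in $ millions.
simulated = [float(v) for v in range(1, 102)]
result = interval_summary(simulated)
print(result)
```

&lt;p&gt;A wide 90% band relative to the median suggests that a single point estimate hides a lot of uncertainty; a narrow one suggests the base case is fairly robust.&lt;/p&gt;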

&lt;h2&gt;
  
  
  Practical Application in Deal Contexts
&lt;/h2&gt;

&lt;p&gt;In a buy-side M&amp;amp;A process, the Monte Carlo output answers questions that a deterministic model cannot. At what purchase price does the debt load rise above the P5 enterprise value? At or below that price, roughly 95% of simulated outcomes still cover the debt. What is the probability that enterprise value at exit exceeds the acquisition price plus the required equity return? The simulation answers that directly: it is the share of simulated outcomes above the hurdle.&lt;/p&gt;
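
&lt;p&gt;Both questions reduce to counting simulated outcomes on one side of a line. The following is a minimal sketch under stated assumptions: the function names, the ten sample exit values, and the deal terms are all hypothetical and chosen only to make the arithmetic easy to follow.&lt;/p&gt;

```python
from typing import List


def probability_above(values: List[float], hurdle: float) -> float:
    """Share of simulated outcomes strictly above the hurdle."""
    if not values:
        raise ValueError("need at least one simulated value")
    return sum(1 for v in values if v > hurdle) / len(values)


def debt_coverage(values: List[float], debt_load: float) -> float:
    """Share of simulated outcomes in which enterprise value covers the debt."""
    if not values:
        raise ValueError("need at least one simulated value")
    return sum(1 for v in values if v >= debt_load) / len(values)


# Hypothetical exit enterprise values ($ millions) from a simulation run.
exit_values = [80.0, 95.0, 100.0, 110.0, 120.0, 125.0, 130.0, 140.0, 150.0, 170.0]

# Hypothetical deal terms: price paid plus the required equity return.
hurdle = 100.0 + 25.0
debt_load = 90.0

p_hurdle = probability_above(exit_values, hurdle)
p_covered = debt_coverage(exit_values, debt_load)
print(p_hurdle)   # 0.4
print(p_covered)  # 0.9
```

&lt;p&gt;A real run would use the full set of simulated outcomes rather than ten values, but the calculation is the same.&lt;/p&gt;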

&lt;p&gt;Sellers benefit equally. A Monte Carlo valuation provided to a buyer's lender demonstrates that the base case is not just management optimism — it sits at the median of a rigorously constructed distribution. That framing tends to support tighter credit spreads and higher leverage ratios because the lender can see the stress scenarios quantitatively rather than having to imagine them.&lt;/p&gt;

&lt;p&gt;"A deterministic model tells a lender what management hopes will happen. A stochastic model tells a lender the probability that the business can service its debt under realistic adverse conditions. Those are very different documents."&lt;/p&gt;

&lt;h2&gt;
  
  
  When Deterministic Models Are Still Useful
&lt;/h2&gt;

&lt;p&gt;Stochastic models are not always the right tool. For businesses with highly predictable cash flows — long-term contracted revenue, regulated utilities, businesses with multi-year take-or-pay agreements — the variance in outcomes is genuinely low and a deterministic model may be more appropriate. The key question is whether the input assumptions are genuinely uncertain or genuinely fixed. Where inputs are fixed by contract or regulation, stochastic modeling adds complexity without proportional insight.&lt;/p&gt;

&lt;p&gt;Deterministic models also remain useful as the first layer of analysis: build the base case, stress-test the key assumptions manually, and then commission a Monte Carlo simulation only if the manual sensitivity analysis reveals meaningful variance. For a business where a 2-turn change in EBITDA multiple changes the enterprise value by 30%, the simulation is essential. For a business valued primarily on liquidation of hard assets, a careful appraisal is more useful than 10,000 simulated outcomes.&lt;/p&gt;
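
&lt;p&gt;The manual sensitivity check can be as simple as the sketch below. It assumes a simplified valuation in which enterprise value equals EBITDA times a multiple, which is an assumption for illustration and not necessarily how a given deal model is built. The function names and inputs are hypothetical.&lt;/p&gt;

```python
def enterprise_value(ebitda: float, multiple: float) -> float:
    """Simplified valuation: EV = EBITDA x multiple (an assumption for this sketch)."""
    return ebitda * multiple


def multiple_sensitivity(ebitda: float, base_multiple: float, turns: float) -> float:
    """Fractional change in EV when the multiple moves by `turns`."""
    base = enterprise_value(ebitda, base_multiple)
    shifted = enterprise_value(ebitda, base_multiple + turns)
    return (shifted - base) / base


# Hypothetical inputs: $12M EBITDA valued at 8.0x, stressed by 2 turns.
change = multiple_sensitivity(ebitda=12.0, base_multiple=8.0, turns=2.0)
print(f"{change:.0%}")  # 25%
```

&lt;p&gt;Under this simplified formula the swing is just the change in turns divided by the base multiple, so lower base multiples are more sensitive to the same 2-turn move.&lt;/p&gt;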




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/stochastic-vs-deterministic/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>datascience</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Real-Time KPI Dashboards</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:32:02 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/real-time-kpi-dashboards-3efj</link>
      <guid>https://dev.to/white_oak_intel/real-time-kpi-dashboards-3efj</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/real-time-kpi-dashboard/#static-vs-real-time-the-gap-that-matters" rel="noopener noreferrer"&gt;Static vs. Real-Time: The Gap That Matters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/real-time-kpi-dashboard/#the-watermark-data-layer" rel="noopener noreferrer"&gt;The Watermark Data Layer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/real-time-kpi-dashboard/#the-stateful-compute-layer" rel="noopener noreferrer"&gt;The Stateful Compute Layer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/real-time-kpi-dashboard/#threshold-monitoring-and-alerts" rel="noopener noreferrer"&gt;Threshold Monitoring and Alerts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/real-time-kpi-dashboard/#connecting-to-a-live-dashboard" rel="noopener noreferrer"&gt;Connecting to a Live Dashboard&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Static vs. Real-Time: The Gap That Matters
&lt;/h2&gt;

&lt;p&gt;Most operational dashboards in middle-market companies are not real-time. They are scheduled exports — nightly SQL queries, morning email reports, or weekly spreadsheet refreshes — dressed up with a modern UI. The data on screen is hours or days old before anyone reads it. For KPIs that drive same-day operational decisions, that lag is consequential.&lt;/p&gt;

&lt;p&gt;The standard solution — Kafka plus Spark Streaming plus a time-series database — is powerful but carries significant operational overhead. For companies that do not need sub-second latency or multi-terabyte event volumes, there is a simpler path: watermark-based incremental queries against an existing transactional database, paired with a stateful in-process compute layer that maintains running KPI values between polling cycles.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;Scheduled export&lt;/strong&gt;: &lt;em&gt;Latency:&lt;/em&gt; Hours–days — &lt;em&gt;Infrastructure:&lt;/em&gt; Cron + SQL — &lt;em&gt;Best For:&lt;/em&gt; Weekly reporting, board summaries&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Watermark polling&lt;/strong&gt;: &lt;em&gt;Latency:&lt;/em&gt; 30 sec – 5 min — &lt;em&gt;Infrastructure:&lt;/em&gt; Existing DB + Python — &lt;em&gt;Best For:&lt;/em&gt; Operational dashboards, same-day alerts&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;Streaming (Kafka/Spark)&lt;/strong&gt;: &lt;em&gt;Latency:&lt;/em&gt; Milliseconds — &lt;em&gt;Infrastructure:&lt;/em&gt; Kafka + Spark + TSDB — &lt;em&gt;Best For:&lt;/em&gt; Financial trading, fraud detection, IoT&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Watermark Data Layer
&lt;/h2&gt;

&lt;p&gt;A watermark is a timestamp that marks the last successfully processed record. On each polling cycle, the data layer queries only records created after the watermark, processes them, and advances the watermark to the timestamp of the last record in the batch. This pattern is incremental, idempotent-friendly, and imposes minimal load on the source database: the initial backfill walks the table once in batches, and every later query touches only new rows, provided &lt;code&gt;created_at&lt;/code&gt; is indexed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WatermarkDataLayer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conn_string&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
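        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;          &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn_string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Read-only polling: autocommit avoids holding a transaction open between polls&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;autocommit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;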
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;watermark&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# initial watermark
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batch_limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch_limit&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;SELECT transaction_id, created_at, amount,
                          transaction_type, user_id
                   FROM transactions
                   WHERE created_at &amp;gt; %(watermark)s
                   ORDER BY created_at
                   LIMIT %(batch_limit)s&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;watermark&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;watermark&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;batch_limit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;batch_limit&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Advance watermark to the latest record in this batch
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;watermark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transaction_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
             &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
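
&lt;p&gt;One edge case is worth checking with a timestamp-only watermark: if several rows share the same &lt;code&gt;created_at&lt;/code&gt; and a batch limit splits them, a strictly-greater-than filter on the next poll can skip the remainder. A common way to guard against this is a composite watermark of timestamp plus primary key. The sketch below demonstrates the idea against an in-memory SQLite table so it can run anywhere; the table contents and the standalone &lt;code&gt;fetch_batch&lt;/code&gt; function are stand-ins for illustration. PostgreSQL accepts the same row-value comparison, though the parameter style differs.&lt;/p&gt;

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, float]


def fetch_batch(conn: sqlite3.Connection, watermark: Tuple[str, int], limit: int) -> List[Row]:
    """Fetch rows strictly after the (created_at, transaction_id) watermark."""
    return conn.execute(
        """SELECT transaction_id, created_at, amount
           FROM transactions
           WHERE (created_at, transaction_id) > (?, ?)
           ORDER BY created_at, transaction_id
           LIMIT ?""",
        (watermark[0], watermark[1], limit),
    ).fetchall()


# In-memory stand-in for the transactions table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, created_at TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, "2026-05-31 09:00:00", 10.0),
        (2, "2026-05-31 09:00:01", 20.0),
        (3, "2026-05-31 09:00:01", 30.0),  # same timestamp as row 2
        (4, "2026-05-31 09:00:02", 40.0),
    ],
)

watermark = ("2000-01-01 00:00:00", 0)  # initial composite watermark
seen: List[int] = []
while True:
    rows = fetch_batch(conn, watermark, limit=2)
    if not rows:
        break
    seen.extend(r[0] for r in rows)
    # Advance to the (created_at, transaction_id) of the last row in the batch.
    watermark = (rows[-1][1], rows[-1][0])

print(seen)  # [1, 2, 3, 4]
```

&lt;p&gt;With a batch limit of 2, the first batch ends on row 2. A timestamp-only watermark would then skip row 3, which shares that timestamp; the composite watermark picks it up on the next poll.&lt;/p&gt;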



&lt;h2&gt;
  
  
  The Stateful Compute Layer
&lt;/h2&gt;

&lt;p&gt;The compute layer maintains running KPI values in memory across polling cycles. Rather than recalculating metrics from scratch on every batch, it applies each new batch as a delta to the existing state. This keeps each cycle cheap: a business processing 10,000 transactions per day averages only about seven new transactions per 60-second poll, so the work done on any given cycle is a small fraction of the daily volume.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;KPIComputeLayer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_revenue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transaction_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unique_users&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;revenue_by_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# treat a missing or NULL amount as 0&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_revenue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;     &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transaction_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unique_users&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;revenue_by_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_revenue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_revenue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transaction_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transaction_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unique_users&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unique_users&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;revenue_by_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;revenue_by_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_order_value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_revenue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transaction_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transaction_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_check_thresholds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thresholds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;alerts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_order_value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;thresholds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;min_aov&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;alerts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AOV below threshold: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_order_value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_revenue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;thresholds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;revenue_alert&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
            &lt;span class="n"&gt;alerts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Revenue milestone reached: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_revenue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;alerts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Threshold Monitoring and Alerts
&lt;/h2&gt;

&lt;p&gt;A KPI dashboard that requires a human to notice a problem has failed at its primary job. Threshold monitoring closes that loop: after each batch, the compute layer compares the current snapshot against defined thresholds and emits alerts when a KPI crosses a boundary. This can drive Slack notifications, PagerDuty pages, or email alerts to an operations manager without any additional infrastructure.&lt;/p&gt;

&lt;p&gt;The alert logic belongs in the compute layer, not in the dashboard front end. A dashboard can be closed. A compute layer runs continuously and fires alerts regardless of who is watching the screen.&lt;/p&gt;
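
&lt;p&gt;One design choice to consider is whether alerts fire on every cycle while a condition holds or only at the moment a KPI crosses its boundary. If each alert becomes a Slack message or a page, the second behavior is usually less noisy. The sketch below shows one possible way to do that; the class name &lt;code&gt;EdgeTriggeredAlerts&lt;/code&gt;, the alert key, and the sample values are hypothetical and not part of the compute layer shown earlier.&lt;/p&gt;

```python
from typing import Dict, List, Set


class EdgeTriggeredAlerts:
    """Emit an alert only when a condition changes from clear to breached."""

    def __init__(self) -> None:
        self.active: Set[str] = set()

    def evaluate(self, breached: Dict[str, bool]) -> List[str]:
        fired = []
        for name, is_breached in breached.items():
            if is_breached and name not in self.active:
                # New breach: remember it and notify once.
                self.active.add(name)
                fired.append(name)
            elif not is_breached:
                # Condition cleared: re-arm so a later breach fires again.
                self.active.discard(name)
        return fired


alerts = EdgeTriggeredAlerts()
min_aov = 50.0  # hypothetical threshold
fired_per_cycle = []
for aov in [62.0, 48.0, 47.0, 55.0, 49.0]:  # hypothetical AOV per polling cycle
    fired_per_cycle.append(alerts.evaluate({"aov_below_min": min_aov > aov}))

print(fired_per_cycle)
# [[], ['aov_below_min'], [], [], ['aov_below_min']]
```

&lt;p&gt;The third cycle stays silent because the breach is already known; the alert fires again only after the KPI has recovered and dropped a second time.&lt;/p&gt;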

&lt;h2&gt;
  
  
  Connecting to a Live Dashboard
&lt;/h2&gt;

&lt;p&gt;The polling loop ties the two layers together. Every 60 seconds (or whatever interval the use case demands), it fetches a new batch from the data layer, applies it to the compute layer, and publishes the snapshot to whatever surface the dashboard reads from — a Redis key, a WebSocket endpoint, or a simple REST API serving the last computed state.&lt;/p&gt;
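
&lt;p&gt;A minimal sketch of such a loop follows. It takes the fetch, apply, snapshot, and publish steps as plain callables so it can be exercised without a database; the function name &lt;code&gt;run_polling_loop&lt;/code&gt;, the &lt;code&gt;max_cycles&lt;/code&gt; parameter, and the fake layers are assumptions made for this example.&lt;/p&gt;

```python
import time
from typing import Callable, Dict, List


def run_polling_loop(
    fetch_batch: Callable[[], List[Dict]],
    apply_batch: Callable[[List[Dict]], None],
    snapshot: Callable[[], Dict],
    publish: Callable[[Dict], None],
    interval_seconds: float = 60.0,
    max_cycles: int = 0,
) -> int:
    """Fetch, apply, publish, sleep. max_cycles=0 means run until interrupted."""
    cycles = 0
    while max_cycles == 0 or max_cycles > cycles:
        records = fetch_batch()
        if records:
            apply_batch(records)
        # Publish every cycle so the dashboard always has a fresh snapshot.
        publish(snapshot())
        cycles += 1
        time.sleep(interval_seconds)
    return cycles


# Stand-ins for the data layer, compute layer, and dashboard surface.
pending = [[{"amount": 10.0}, {"amount": 5.0}], [], [{"amount": 2.5}]]
state = {"total_revenue": 0.0}
published: List[Dict] = []


def fake_fetch() -> List[Dict]:
    return pending.pop(0) if pending else []


def fake_apply(records: List[Dict]) -> None:
    state["total_revenue"] += sum(r["amount"] for r in records)


completed = run_polling_loop(
    fetch_batch=fake_fetch,
    apply_batch=fake_apply,
    snapshot=lambda: dict(state),
    publish=published.append,
    interval_seconds=0.0,
    max_cycles=3,
)
print(published)
```

&lt;p&gt;With the classes from this article, the same loop could be wired up by passing the data layer's &lt;code&gt;fetch_batch&lt;/code&gt; and the compute layer's &lt;code&gt;apply_batch&lt;/code&gt; and &lt;code&gt;snapshot&lt;/code&gt; methods, plus a &lt;code&gt;publish&lt;/code&gt; function that writes to Redis, a WebSocket, or a REST cache.&lt;/p&gt;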

&lt;p&gt;The key design principle is separation of concerns. The data layer handles only extraction and watermark management. The compute layer handles only KPI math and alerting. The dashboard layer handles only rendering. This separation makes each component testable in isolation and replaceable without touching the others — which matters when the underlying database schema changes or the dashboard framework is swapped out.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/real-time-kpi-dashboard/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>dataengineering</category>
      <category>monitoring</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Markov Chain Coin Sequence: E[HH] vs E[HTH] Explained</title>
      <dc:creator>White Oak Intelligence</dc:creator>
      <pubDate>Sun, 31 May 2026 18:25:28 +0000</pubDate>
      <link>https://dev.to/white_oak_intel/markov-chain-coin-sequence-ehh-vs-ehth-explained-3hbc</link>
      <guid>https://dev.to/white_oak_intel/markov-chain-coin-sequence-ehh-vs-ehth-explained-3hbc</guid>
      <description>&lt;p&gt;&lt;strong&gt;In This Article&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/markov-chain-coin-sequence/#the-question" rel="noopener noreferrer"&gt;The Question&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/markov-chain-coin-sequence/#the-intuition-trap" rel="noopener noreferrer"&gt;The Intuition Trap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/markov-chain-coin-sequence/#building-the-state-machine-for-hh" rel="noopener noreferrer"&gt;Building the State Machine for HH&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/markov-chain-coin-sequence/#solving-the-system-e-hh-6" rel="noopener noreferrer"&gt;Solving the System: E[HH] = 6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/markov-chain-coin-sequence/#building-the-state-machine-for-hth" rel="noopener noreferrer"&gt;Building the State Machine for HTH&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/markov-chain-coin-sequence/#solving-the-system-e-hth-10" rel="noopener noreferrer"&gt;Solving the System: E[HTH] = 10&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/markov-chain-coin-sequence/#why-overlapping-patterns-change-everything" rel="noopener noreferrer"&gt;Why Overlapping Patterns Change Everything&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/markov-chain-coin-sequence/#python-simulation-100-000-trials" rel="noopener noreferrer"&gt;Python Simulation: 100,000 Trials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whiteoakintel.com/blog/markov-chain-coin-sequence/#business-application-credit-migration-web-ranking" rel="noopener noreferrer"&gt;Business Application: Credit Migration &amp;amp; Web Ranking&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Question
&lt;/h2&gt;

&lt;p&gt;You flip a fair coin — one with probability 1/2 of landing heads and 1/2 of landing tails — repeatedly, recording every result. What is the expected number of flips required until the sequence HH appears for the first time as consecutive results? What is the expected number of flips required until HTH appears for the first time?&lt;/p&gt;

&lt;p&gt;Both questions have the same surface structure: you want a specific consecutive pattern, and you want to know, on average, how many flips it takes to observe it. The coin is fair, the flips are independent, and the patterns are short. These seem like they should yield similar answers. They do not. HH takes exactly 6 flips on average. HTH takes exactly 10. The four-flip gap between those two answers is not a rounding artifact or a computational error — it is a precise consequence of the internal structure of each pattern, and deriving it rigorously is one of the cleanest demonstrations of absorbing Markov chain analysis you will encounter.&lt;/p&gt;

&lt;p&gt;This problem appears frequently in quantitative finance interviews — at firms like Jane Street, Citadel, and Two Sigma — precisely because it separates candidates who understand Markov structure from those who rely on heuristic reasoning. Getting the answer right, and being able to explain it, requires building a state machine, writing the system of first-step equations, and solving it algebraically. That is exactly what we will do.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Intuition Trap
&lt;/h2&gt;

&lt;p&gt;Before the formal derivation, it is worth examining why intuition fails here. The most common wrong answer from candidates is that both expected values should be "similar" because the patterns are comparable in length. This intuition imports the correct observation that both HH and HTH are short sequences of a fair coin, but ignores the critical role of failure recovery — what happens when you are partway through building a pattern and the next flip breaks it.&lt;/p&gt;

&lt;p&gt;A slightly more sophisticated wrong approach is to reason from the probability of success in any given window. The probability that two consecutive flips form HH is (1/4), so "on average you need 4 pairs of flips, meaning 8 flips total." The probability that three consecutive flips form HTH is (1/8), so "on average you need 8 triples, meaning about 24 flips total." Both of these estimates are badly wrong. The true answers are 6 and 10 respectively. The flaw in this reasoning is that it treats successive windows as independent, when in reality they overlap: the second flip of one pair is the first flip of the next pair.&lt;/p&gt;

&lt;p&gt;A more tempting but equally wrong approach is to note that HH consists of two heads and P(H) = 1/2, so E[HH] = 1/P(HH) = 4, and similarly for HTH. This confuses waiting for a single event with waiting for a consecutive subsequence to appear in a random string — a fundamentally different problem. The probability of observing HH at any given pair of positions is 1/4, but the expected time until you first observe it is not 4 because consecutive positions are correlated. The absorbing Markov chain framework handles this correlation exactly.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;The Key Distinction&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The gap between E[HH] = 6 and E[HTH] = 10 is not about the lengths or probabilities of the patterns. It is about what happens when a partial match fails. The failure mode of each pattern has a completely different structure, and that structure determines how many flips are "wasted" when progress is lost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the State Machine for HH
&lt;/h2&gt;

&lt;p&gt;An absorbing Markov chain for a pattern-waiting problem tracks the longest suffix of the current flip history that is also a prefix of the target pattern. This is the essential insight: you do not need to remember the entire history, only how much progress toward the target you currently hold. For HH, this progress can be 0 characters (no useful suffix), 1 character (the last flip was H, giving us a one-character prefix match), or 2 characters (absorbed — you just completed HH).&lt;/p&gt;

&lt;p&gt;We therefore have three states:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;S₀:&lt;/strong&gt; Start state, or you just flipped T. You have no progress toward HH. This includes the very beginning of the sequence and any moment immediately after a tails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S₁:&lt;/strong&gt; You just flipped H. You have matched the first character of HH and are one flip away from completion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S₂:&lt;/strong&gt; Absorbed. You just flipped a second consecutive H. The target HH has appeared.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The transition rules follow directly from the coin flip probabilities:&lt;/p&gt;

&lt;p&gt;From S₀: With probability 1/2 you flip H and move to S₁. With probability 1/2 you flip T and remain in S₀. The tails neither helps nor hurts — you are still starting from zero progress.&lt;/p&gt;

&lt;p&gt;From S₁: With probability 1/2 you flip H and move to S₂ — you are done. With probability 1/2 you flip T and fall back to S₀. The tails destroys your one-character of progress completely, because a tails cannot appear anywhere in HH.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State machine for HH:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      1/2 (H)        1/2 (H)
S0 ─────────► S1 ─────────► S2 [ABSORBED]
 ▲             │
 │   1/2 (T)   │
 └─────────────┘
(S0 also loops to itself on T with probability 1/2)

S0 --[H, 1/2]--&amp;gt; S1 --[H, 1/2]--&amp;gt; S2 (done)
S0 --[T, 1/2]--&amp;gt; S0
S1 --[T, 1/2]--&amp;gt; S0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This diagram encodes the entire stochastic process. From any transient state, you flip the coin, the outcome determines which state you move to, and you continue until reaching S₂. The question is: what is the expected number of flips to reach S₂ starting from S₀?&lt;/p&gt;
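&lt;p&gt;The suffix-prefix rule that defines the states can be turned into a small transition builder. The sketch below assumes patterns are strings over the letters H and T; the function names &lt;code&gt;next_state&lt;/code&gt; and &lt;code&gt;build_transitions&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
def next_state(pattern, state, flip):
    """Length of the longest suffix of (matched prefix + flip) that is a prefix of pattern."""
    history = pattern[:state] + flip
    for length in range(len(history), 0, -1):
        if history[-length:] == pattern[:length]:
            return length
    return 0


def build_transitions(pattern):
    """Map each transient state to its successor state for H and for T."""
    return {
        state: {flip: next_state(pattern, state, flip) for flip in "HT"}
        for state in range(len(pattern))
    }


print(build_transitions("HH"))
# {0: {'H': 1, 'T': 0}, 1: {'H': 2, 'T': 0}}
```

&lt;p&gt;The printed table matches the transitions listed above: from either transient state, T returns to state 0, and H advances by one.&lt;/p&gt;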
&lt;h2&gt;
  
  
  Solving the System: E[HH] = 6
&lt;/h2&gt;

&lt;p&gt;Let E₀ denote the expected number of additional flips to reach absorption (complete HH) starting from state S₀, and let E₁ denote the same quantity starting from state S₁. We condition on the next flip using the first-step decomposition principle: you always pay exactly one flip for the next toss, and then you find yourself in a new state with its own remaining expected cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_0%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_0%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_0" alt="equation" width="203" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_1%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%25280%2529%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_1%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%25280%2529%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_0" alt="equation" width="206" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reading each equation: the leading 1 accounts for the flip you are about to take. In the first equation, that flip is H with probability 1/2 (moving to S₁, paying E₁ more) or T with probability 1/2 (staying in S₀, paying E₀ more). In the second equation, the flip is H with probability 1/2 — reaching absorption immediately, paying zero additional flips — or T with probability 1/2, resetting to S₀ and paying E₀ more.&lt;/p&gt;

&lt;p&gt;The algebra is clean. From the first equation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_0%2520-%2520%255Cfrac%257B1%257D%257B2%257DE_0%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%255Cimplies%2520%255Cfrac%257B1%257D%257B2%257DE_0%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%255Cimplies%2520E_0%2520%253D%25202%2520%252B%2520E_1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_0%2520-%2520%255Cfrac%257B1%257D%257B2%257DE_0%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%255Cimplies%2520%255Cfrac%257B1%257D%257B2%257DE_0%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%255Cimplies%2520E_0%2520%253D%25202%2520%252B%2520E_1" alt="equation" width="520" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Substituting into the second equation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_1%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%25282%2520%252B%2520E_1%2529%2520%253D%25201%2520%252B%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%253D%25202%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_1%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%25282%2520%252B%2520E_1%2529%2520%253D%25201%2520%252B%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%253D%25202%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1" alt="equation" width="401" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cfrac%257B1%257D%257B2%257DE_1%2520%253D%25202%2520%255Cimplies%2520E_1%2520%253D%25204" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cfrac%257B1%257D%257B2%257DE_1%2520%253D%25202%2520%255Cimplies%2520E_1%2520%253D%25204" alt="equation" width="210" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_0%2520%253D%25202%2520%252B%25204%2520%253D%25206" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_0%2520%253D%25202%2520%252B%25204%2520%253D%25206" alt="equation" width="159" height="15"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;span&gt;Result&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The expected number of flips to see HH, starting from scratch, is exactly 6. This is a hard number — not an approximation, not a simulation average. It is the exact solution to a linear system of two equations in two unknowns.&lt;/p&gt;

&lt;p&gt;The result has a satisfying interpretation. From S₀, you expect to need 2 flips just to get your first H. That gets you to S₁. From there, you expect to need 4 more flips to land the second H without being knocked back to S₀. The constant resets to S₀ from state S₁ are what push the expected value from the naive estimate of 4 to the correct answer of 6. Every time you are in S₁ and flip tails, you lose your progress and must rebuild it at a cost of E₀ expected flips — which is itself expensive because of the same feedback loop.&lt;/p&gt;
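&lt;p&gt;The algebra can be double-checked in exact rational arithmetic. This short sketch repeats the substitution steps with Python's &lt;code&gt;fractions.Fraction&lt;/code&gt; and then confirms that the solution satisfies both original first-step equations; the variable names are chosen here for illustration.&lt;/p&gt;

```python
from fractions import Fraction

half = Fraction(1, 2)

# Equation 1 rearranged: E0 = 2 + E1.
# Substituting into equation 2 gives E1 = 2 + (1/2) E1, so (1/2) E1 = 2.
E1 = Fraction(2) / (1 - half)
E0 = 2 + E1

# Confirm that the pair satisfies both original first-step equations.
assert E0 == 1 + half * E1 + half * E0
assert E1 == 1 + half * 0 + half * E0

print(E0, E1)  # 6 4
```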

&lt;h2&gt;
  
  
  Building the State Machine for HTH
&lt;/h2&gt;

&lt;p&gt;The state machine for HTH requires four states instead of three, because the pattern is three characters long and the possible "prefix match lengths" are 0, 1, 2, or 3 (absorbed). But what makes this problem fundamentally harder is not the extra state. It is the non-trivial transition structure when a partial match fails.&lt;/p&gt;

&lt;p&gt;The states are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;S₀:&lt;/strong&gt; No progress. You are at the start, or the last flip broke the pattern without leaving any reusable suffix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S₁:&lt;/strong&gt; You have matched the first character of HTH — the last flip was H. You hold a one-character prefix match.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S₂:&lt;/strong&gt; You have matched the first two characters — the last two flips were HT. You hold a two-character prefix match.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S₃:&lt;/strong&gt; Absorbed — you just completed HTH.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now consider the transition rules carefully, because the transitions from S₁ are where the real complexity lives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From S₀:&lt;/strong&gt; Flip H (prob 1/2) → move to S₁. Flip T (prob 1/2) → stay in S₀. A leading tails is useless — HTH begins with H.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From S₁ (have seen H):&lt;/strong&gt; Flip T (prob 1/2) → move to S₂. You have now matched HT and are set up for the final H. Flip H (prob 1/2) → stay in S₁. This is the non-obvious transition. You were in S₁ (last flip was H) and flipped another H. Your progress toward HTH is not destroyed, because your most recent H is still a valid start of HTH. You remain in S₁ with a fresh single-H prefix match.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From S₂ (have seen HT):&lt;/strong&gt; Flip H (prob 1/2) → move to S₃, absorbed. You completed HTH. Flip T (prob 1/2) → reset to S₀. After HT, a second T gives you HTT. The trailing T does not form the beginning of any prefix of HTH (which begins with H), so all progress is lost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State machine for HTH:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      H (1/2)         T (1/2)         H (1/2)
S0 ──────────► S1 ──────────► S2 ──────────► S3 [ABSORBED]
 ▲                             │
 │           T (1/2)           │
 └─────────────────────────────┘
(S0 loops to itself on T; S1 loops to itself on H)

S0 --[H, 1/2]--&amp;gt; S1     S0 --[T, 1/2]--&amp;gt; S0
S1 --[T, 1/2]--&amp;gt; S2     S1 --[H, 1/2]--&amp;gt; S1  (self-loop!)
S2 --[H, 1/2]--&amp;gt; S3     S2 --[T, 1/2]--&amp;gt; S0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The self-loop on S₁ is the defining structural feature of this problem. When you are in S₁ and flip another H, you do not advance and you do not fully regress. You stay exactly where you are. This seems like it should help: you are not losing your H prefix. But the self-loop acts as a holding pattern: you can stay in S₁ for several flips before reaching S₂, and each repeat costs a flip. Leaving S₁ takes 2 flips on average, so reaching S₂ from S₁ is a sub-problem with its own cost.&lt;/p&gt;
&lt;h2&gt;
  
  
  Solving the System: E[HTH] = 10
&lt;/h2&gt;

&lt;p&gt;Let E₀, E₁, E₂ denote the expected additional flips to absorption starting from S₀, S₁, S₂ respectively. The first-step equations are:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_0%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_0%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_0" alt="equation" width="203" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_1%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_1%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_2" alt="equation" width="203" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_2%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%25280%2529%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_2%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%25280%2529%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_0" alt="equation" width="206" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note how each equation reflects the transition diagram above. In the equation for E₁, the term (1/2)E₁ arises from the self-loop: with probability 1/2 you flip H and return to S₁ with expected remaining cost E₁. This makes the second equation an equation in two unknowns (E₁ and E₂) rather than just one. We solve the system from the equations we can simplify first.&lt;/p&gt;

&lt;p&gt;From equation 1, isolating E₀:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cfrac%257B1%257D%257B2%257DE_0%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%255Cimplies%2520E_0%2520%253D%25202%2520%252B%2520E_1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cfrac%257B1%257D%257B2%257DE_0%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%255Cimplies%2520E_0%2520%253D%25202%2520%252B%2520E_1" alt="equation" width="307" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From equation 3, substituting the expression for E₀:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_2%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%25282%2520%252B%2520E_1%2529%2520%253D%25201%2520%252B%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%253D%25202%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_2%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%25282%2520%252B%2520E_1%2529%2520%253D%25201%2520%252B%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%253D%25202%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1" alt="equation" width="401" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now substitute E₂ = 2 + (1/2)E₁ into equation 2, which involves only E₁ and E₂:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_1%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%255C%2521%255Cleft%25282%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%255Cright%2529%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%252B%25201%2520%252B%2520%255Cfrac%257B1%257D%257B4%257DE_1%2520%253D%25202%2520%252B%2520%255Cfrac%257B3%257D%257B4%257DE_1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_1%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%255C%2521%255Cleft%25282%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%255Cright%2529%2520%253D%25201%2520%252B%2520%255Cfrac%257B1%257D%257B2%257DE_1%2520%252B%25201%2520%252B%2520%255Cfrac%257B1%257D%257B4%257DE_1%2520%253D%25202%2520%252B%2520%255Cfrac%257B3%257D%257B4%257DE_1" alt="equation" width="537" height="44"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cfrac%257B1%257D%257B4%257DE_1%2520%253D%25202%2520%255Cimplies%2520E_1%2520%253D%25208" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3B%255Cfrac%257B1%257D%257B4%257DE_1%2520%253D%25202%2520%255Cimplies%2520E_1%2520%253D%25208" alt="equation" width="210" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back-substituting:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_0%2520%253D%25202%2520%252B%25208%2520%253D%252010%2520%255Cqquad%2520E_2%2520%253D%25202%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%25288%2529%2520%253D%25206" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flatex.codecogs.com%2Fpng.latex%3F_white%26space%3BE_0%2520%253D%25202%2520%252B%25208%2520%253D%252010%2520%255Cqquad%2520E_2%2520%253D%25202%2520%252B%2520%255Cfrac%257B1%257D%257B2%257D%25288%2529%2520%253D%25206" alt="equation" width="348" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;span&gt;Result&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;E[HTH] = 10, even though HTH is only one flip longer than HH. The structure of the pattern — not just its length — determines the expected wait. The self-loop on S₁ and the catastrophic reset from S₂ on tails together add four full expected flips compared to HH.&lt;/p&gt;

&lt;p&gt;Also note E₂ = 6: starting from the state where you have already matched HT, you still expect to need 6 more flips to complete HTH. That might seem surprising — you are two-thirds of the way through the pattern. But from S₂, a tails (probability 1/2) returns you to S₀ and costs you E₀ = 10 more expected flips. The cost of that reset, weighted by its probability, dominates the calculation.&lt;/p&gt;
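&lt;p&gt;The same first-step equations can be built and solved mechanically for any pattern. The sketch below is one possible approach, not the article's own code: it assumes patterns are strings of H and T, uses hypothetical function names, and applies Gauss-Jordan elimination over &lt;code&gt;fractions.Fraction&lt;/code&gt; so that the answers are exact.&lt;/p&gt;

```python
from fractions import Fraction


def next_state(pattern, state, flip):
    """Longest suffix of (matched prefix + flip) that is also a prefix of pattern."""
    history = pattern[:state] + flip
    for length in range(len(history), 0, -1):
        if history[-length:] == pattern[:length]:
            return length
    return 0


def expected_flips_by_state(pattern):
    """Solve E_s = 1 + (1/2) E_next(s, H) + (1/2) E_next(s, T) exactly, with E_k = 0."""
    k = len(pattern)
    half = Fraction(1, 2)
    # Augmented matrix: one row per transient state, last column is the constant 1.
    rows = [[Fraction(0)] * k + [Fraction(1)] for _ in range(k)]
    for s in range(k):
        rows[s][s] += 1
        for flip in "HT":
            t = next_state(pattern, s, flip)
            if t != k:  # the absorbing state contributes zero remaining cost
                rows[s][t] -= half
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [x / scale for x in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[k] for row in rows]


print(*expected_flips_by_state("HTH"))  # 10 8 6
```

&lt;p&gt;The three printed values are E₀, E₁, and E₂ in that order, matching the hand derivation.&lt;/p&gt;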

&lt;h2&gt;
  
  
  Why Overlapping Patterns Change Everything
&lt;/h2&gt;

&lt;p&gt;The structural difference between HH and HTH comes down to a single concept: the failure autocorrelation of the pattern, also called its overlap or self-correlation structure. When you are building toward a pattern and experience a failed extension, how much of your progress can you retain?&lt;/p&gt;

&lt;p&gt;For HH: when you have matched one H (state S₁) and flip T, you lose everything. T cannot appear in HH at any position, so no suffix of your current history is a prefix of HH. You fall to S₀. When a failed attempt at HH costs you all your progress, the machine is "memoryless after failure," which is actually advantageous: at least you do not spend time bouncing between partial progress states.&lt;/p&gt;

&lt;p&gt;For HTH: the failure modes are asymmetric and more expensive. From S₁ (matched H), flipping another H does not reset you to zero — it leaves you in S₁. This looks like progress preservation, but it is actually a costly trap. Because you cannot advance past S₁ until you flip T, the self-loop delays your arrival at S₂. You may flip H many times in succession before finally getting the T you need. Each extra H flip costs one step of expected time while contributing no forward progress.&lt;/p&gt;

&lt;p&gt;From S₂ (matched HT), flipping T sends you all the way back to S₀ — despite being two-thirds of the way to completion. The final T in HTT cannot be reused as the beginning of a new HTH (which begins with H), so all progress is wiped. This catastrophic reset is expensive precisely because E₀ = 10 is itself large: each time you reach S₂ and fail, you are paying a heavy price.&lt;/p&gt;

&lt;p&gt;The Conway leading number method provides an elegant algebraic characterization of expected pattern waiting times. For a pattern P = p₁ p₂ ⋯ pₖ over a fair coin, check for each i from 1 to k whether the length-i prefix of P equals the length-i suffix of P, and set cᵢ = 1 if it does and cᵢ = 0 otherwise. The expected waiting time is then E[P] = ∑_(i=1)^k cᵢ · 2ⁱ. For HH: the length-1 overlap holds (H = H) and the length-2 overlap holds trivially, since the whole pattern matches itself, giving 2¹ + 2² = 2 + 4 = 6. For HTH: the length-1 overlap holds (H = H), the length-2 overlap fails (HT ≠ TH), and the length-3 overlap holds trivially, giving 2¹ + 0 + 2³ = 2 + 0 + 8 = 10. This shortcut recovers our exact results, and it shows why the internal self-similarity of a pattern drives its expected waiting time.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
&lt;strong&gt;HH&lt;/strong&gt;: &lt;em&gt;Length:&lt;/em&gt; 2 — &lt;em&gt;Self-Overlaps:&lt;/em&gt; Length-1 and length-2 — &lt;em&gt;E[flips]:&lt;/em&gt; 6 — &lt;em&gt;States Needed:&lt;/em&gt; 2 transient + 1 absorbing&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;HT&lt;/strong&gt;: &lt;em&gt;Length:&lt;/em&gt; 2 — &lt;em&gt;Self-Overlaps:&lt;/em&gt; Length-2 only — &lt;em&gt;E[flips]:&lt;/em&gt; 4 — &lt;em&gt;States Needed:&lt;/em&gt; 2 transient + 1 absorbing&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;HTH&lt;/strong&gt;: &lt;em&gt;Length:&lt;/em&gt; 3 — &lt;em&gt;Self-Overlaps:&lt;/em&gt; Length-1 and length-3 — &lt;em&gt;E[flips]:&lt;/em&gt; 10 — &lt;em&gt;States Needed:&lt;/em&gt; 3 transient + 1 absorbing&lt;/li&gt;
  &lt;li&gt;
&lt;strong&gt;HTT&lt;/strong&gt;: &lt;em&gt;Length:&lt;/em&gt; 3 — &lt;em&gt;Self-Overlaps:&lt;/em&gt; Length-3 only — &lt;em&gt;E[flips]:&lt;/em&gt; 8 — &lt;em&gt;States Needed:&lt;/em&gt; 3 transient + 1 absorbing&lt;/li&gt;
&lt;/ul&gt;
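&lt;p&gt;The overlap rule is short enough to check directly. The sketch below assumes a fair coin and patterns written as strings of H and T; the function name &lt;code&gt;conway_expected_flips&lt;/code&gt; is hypothetical. It sums 2ⁱ over every length i at which the prefix and suffix of the pattern agree.&lt;/p&gt;

```python
def conway_expected_flips(pattern):
    """Sum 2**i over every i where the length-i prefix equals the length-i suffix."""
    k = len(pattern)
    return sum(2 ** i for i in range(1, k + 1) if pattern[:i] == pattern[-i:])


for pattern in ("HH", "HT", "HTH", "HTT"):
    print(pattern, conway_expected_flips(pattern))
# HH 6
# HT 4
# HTH 10
# HTT 8
```

&lt;p&gt;The four values agree with the E[flips] entries in the list above.&lt;/p&gt;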

&lt;h2&gt;
  
  
  Python Simulation: 100,000 Trials
&lt;/h2&gt;

&lt;p&gt;A 100,000-trial Monte Carlo simulation provides empirical confirmation of both results. The function below keeps a sliding window of the most recent flips, with the window length equal to the length of the target pattern. When the window equals the target, the trial ends and the flip count is recorded. The average over all trials converges to the theoretical expected value as the number of trials grows.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simulate_expected_flips&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulate expected flips to see target sequence via Monte Carlo.

    Args:
        target: List of ints (1 = Heads, 0 = Tails) representing the pattern.
        n:      Number of independent trials to simulate.

    Returns:
        The empirical mean number of flips across all trials.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;flips&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;flip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# 0 = Tails, 1 = Heads
&lt;/span&gt;            &lt;span class="n"&gt;flips&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# Keep only the last k flips
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;flips&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;


&lt;span class="n"&gt;HH&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;        &lt;span class="c1"&gt;# Heads-Heads
&lt;/span&gt;&lt;span class="n"&gt;HTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;     &lt;span class="c1"&gt;# Heads-Tails-Heads
&lt;/span&gt;
&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;e_hh&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;simulate_expected_flips&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;e_hth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;simulate_expected_flips&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HTH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;E[HH]  = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e_hh&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  (exact: 6.00)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;E[HTH] = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e_hth&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  (exact: 10.00)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HH error:  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e_hh&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;6.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; flips&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTH error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e_hth&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; flips&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With 100,000 trials the empirical estimates typically land within 0.05 flips of the exact values. Typical output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;E[HH]  = 6.01  (exact: 6.00)
E[HTH] = 9.98  (exact: 10.00)
HH error:  0.0121 flips
HTH error: 0.0183 flips
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The simulation converges cleanly to the exact values. The waiting time for HTH has variance 58 (solve the first-step equations for the second moment to verify this), so at 100,000 trials the standard error of the mean is √(58/100000) ≈ 0.024, and a discrepancy of 0.02 is entirely expected. Running 1,000,000 trials shrinks the standard error by a factor of √10, to about 0.008.&lt;/p&gt;
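
&lt;p&gt;As a cross-check on the standard-error arithmetic, the sketch below computes the exact mean and variance of the waiting time from the pattern's self-overlaps, using the standard closed form for a fair coin. The helper names &lt;code&gt;overlap_lengths&lt;/code&gt; and &lt;code&gt;waiting_time_stats&lt;/code&gt; are illustrative and do not appear in the original article.&lt;/p&gt;

```python
import math


def overlap_lengths(target):
    """Lengths k where the first k flips of target equal its last k flips."""
    n = len(target)
    return [k for k in range(1, n + 1) if target[:k] == target[n - k:]]


def waiting_time_stats(target, trials):
    """Exact mean and variance of the waiting time for a fair coin, plus the
    standard error of a Monte Carlo mean over `trials` independent runs."""
    ks = overlap_lengths(target)
    mean = sum(2 ** k for k in ks)
    # Closed form for a fair coin: Var = mean^2 - sum over overlaps of (2k - 1) * 2^k
    variance = mean ** 2 - sum((2 * k - 1) * 2 ** k for k in ks)
    return mean, variance, math.sqrt(variance / trials)


for name, pattern in [("HH", [1, 1]), ("HTH", [1, 0, 1])]:
    mean, variance, se = waiting_time_stats(pattern, 100_000)
    print(f"{name}: mean={mean}, variance={variance}, standard error={se:.4f}")
```

&lt;p&gt;For HTH the overlaps have lengths 1 and 3, which gives a mean of 2 + 8 = 10 and a variance of 100 − 42 = 58. The closed form holds only for a fair coin; a biased coin would need the first-step equations solved directly.&lt;/p&gt;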

&lt;h2&gt;
  
  
  Business Application: Credit Migration &amp;amp; Web Ranking
&lt;/h2&gt;

&lt;p&gt;Absorbing Markov chains are not academic curiosities. They are the backbone of several major financial models used daily by banks, asset managers, and technology companies.&lt;/p&gt;

&lt;p&gt;In credit risk, rating agencies and banks routinely work with a credit migration matrix: a transition matrix in which each row gives the probabilities that a bond holding a given rating today (say BBB) will be rated AAA, AA, A, BBB, BB, B, CCC, or Default one year from now. Default is the absorbing state: in the model, once a bond defaults it never returns to a rated class. The expected time to default starting from any rating class is computed exactly as we computed E₀ above, by solving a linear system of first-step equations. The same migration framework feeds the Internal Ratings-Based approach under Basel III, where expected loss calculations require a probability-of-default estimate for every rating bucket in a loan portfolio.&lt;/p&gt;
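
&lt;p&gt;To make the time-to-default calculation concrete, here is a minimal sketch on a three-rating toy chain. The ratings and every probability in &lt;code&gt;Q&lt;/code&gt; are made-up illustrative numbers, not agency data. Instead of Gaussian elimination, the sketch solves the first-step equations t = 1 + Q t by simply iterating them, which converges because some probability mass leaks to Default at every step.&lt;/p&gt;

```python
# Hypothetical one-year migration probabilities among non-default ratings.
# Illustrative numbers only, not agency data. The mass missing from each row
# is that rating's one-year default probability (0.005, 0.05, 0.20).
RATINGS = ["IG", "HY", "CCC"]
Q = [
    [0.90, 0.08, 0.015],
    [0.05, 0.80, 0.10],
    [0.01, 0.09, 0.70],
]


def expected_years_to_default(q, tol=1e-12, max_iter=100_000):
    """Solve the first-step equations t = 1 + Q t by fixed-point iteration."""
    n = len(q)
    t = [0.0] * n
    for _ in range(max_iter):
        new_t = [1.0 + sum(q[i][j] * t[j] for j in range(n)) for i in range(n)]
        # Stop once a full sweep changes no entry by more than tol.
        if tol > max(abs(a - b) for a, b in zip(new_t, t)):
            return new_t
        t = new_t
    raise RuntimeError("iteration did not converge")


years = expected_years_to_default(Q)
for rating, value in zip(RATINGS, years):
    print(f"{rating}: {value:.2f} expected years to default")
```

&lt;p&gt;Each entry of the result satisfies the same kind of equation used for E₀: one year passes, then the bond continues from whichever rating it migrates to. Better ratings should come out with longer expected times to default, which is a quick sanity check on any migration matrix.&lt;/p&gt;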

&lt;p&gt;Google's original PageRank algorithm models a random surfer as a non-absorbing Markov chain over the directed graph of the web. The transition from any page to another follows link probabilities, and a small "teleportation" probability prevents the chain from getting stuck in sink nodes. The stationary distribution of this chain (the vector π satisfying π = π P) is the PageRank vector, and each component gives the long-run fraction of time a random walk spends on that page. High-PageRank pages are those the walk visits most often, which is the precise sense in which they are structurally central to the web graph.&lt;/p&gt;
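
&lt;p&gt;The stationary distribution can be approximated by repeatedly applying the transition rule until the vector stops changing (power iteration). The sketch below does this on a hypothetical four-page web. The graph, the function name &lt;code&gt;pagerank&lt;/code&gt;, and the damping value of 0.85 (a commonly quoted choice) are assumptions for illustration, not details taken from the article.&lt;/p&gt;

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for the stationary distribution of the PageRank chain.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(max_iter):
        # Rank sitting on pages with no outgoing links is spread uniformly.
        dangling = sum(rank[page] for page in pages if not links[page])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for page in pages:
            for target in links[page]:
                new_rank[target] += damping * rank[page] / len(links[page])
        change = sum(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if tol > change:
            break
    return rank


# Hypothetical four-page web; C receives links from A, B, and D.
TOY_WEB = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(TOY_WEB)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

&lt;p&gt;Page D has no incoming links, so its score is exactly the teleportation share (1 − 0.85) / 4, while C, the most linked-to page, ends up on top.&lt;/p&gt;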

&lt;p&gt;Any sequential decision process with memoryless state transitions and a target event (a manufacturing line waiting for a defective part to appear, a clinical trial tracking patient progression through disease stages, a network protocol waiting for a specific acknowledgment sequence) can be modeled as a waiting-time Markov chain and solved using exactly the methods demonstrated here. The key is identifying the minimal state representation (what information from history is necessary to predict future outcomes?), writing the first-step equations, and solving the resulting linear system. For large state spaces, this system is solved numerically using the fundamental matrix N = (I − Q)⁻¹ of the absorbing chain, where Q holds the transition probabilities among transient states, but the conceptual structure is identical to what we have done above.&lt;/p&gt;
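
&lt;p&gt;The three steps can be sketched for an arbitrary coin pattern. The state is the length of the longest prefix of the target that matches the end of the flip history, the first-step equations are assembled one row per transient state, and the system is solved exactly with fractions. The function names &lt;code&gt;next_state&lt;/code&gt; and &lt;code&gt;expected_flips&lt;/code&gt; are illustrative, and the code assumes a fair coin.&lt;/p&gt;

```python
from fractions import Fraction


def next_state(target, state, flip):
    """Longest prefix of target that is a suffix of target[:state] + [flip]."""
    seen = target[:state] + [flip]
    for length in range(min(len(seen), len(target)), 0, -1):
        if seen[-length:] == target[:length]:
            return length
    return 0


def expected_flips(target):
    """Expected flips of a fair coin until target first appears."""
    k = len(target)
    # Row i encodes t_i - (1/2) t_next(i, 0) - (1/2) t_next(i, 1) = 1.
    rows = []
    for state in range(k):
        row = [Fraction(0)] * k + [Fraction(1)]
        row[state] += 1
        for flip in (0, 1):
            nxt = next_state(target, state, flip)
            if nxt != k:  # the absorbing state needs 0 further flips
                row[nxt] -= Fraction(1, 2)
        rows.append(row)
    # Gauss-Jordan elimination on the augmented matrix.
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return rows[0][k]  # expected flips starting from the empty history


for name, pattern in [("HH", [1, 1]), ("HTH", [1, 0, 1]), ("HTT", [1, 0, 0])]:
    print(f"E[{name}] = {expected_flips(pattern)}")
```

&lt;p&gt;Solving the rows this way is equivalent to multiplying the fundamental matrix by a vector of ones and reading off the entry for the start state. Exact fractions keep the answers free of rounding error, which is affordable here because a pattern of length k needs only k transient states.&lt;/p&gt;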




&lt;p&gt;&lt;em&gt;This post was originally published on &lt;a href="https://whiteoakintel.com/blog/markov-chain-coin-sequence/" rel="noopener noreferrer"&gt;White Oak Intelligence&lt;/a&gt;. Read the full article there for formatted diagrams, code examples, and related content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
