<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Waqas R</title>
    <description>The latest articles on DEV Community by Waqas R (@waqas_r_47bca4fef1922623d).</description>
    <link>https://dev.to/waqas_r_47bca4fef1922623d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3969502%2Fd6bea643-f5fb-4dca-8794-1b17f0a359f6.png</url>
      <title>DEV Community: Waqas R</title>
      <link>https://dev.to/waqas_r_47bca4fef1922623d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/waqas_r_47bca4fef1922623d"/>
    <language>en</language>
    <item>
      <title>How we predict the FIFA World Cup 2026 with a Dixon-Coles bivariate Poisson model</title>
      <dc:creator>Waqas R</dc:creator>
      <pubDate>Tue, 23 Jun 2026 08:07:43 +0000</pubDate>
      <link>https://dev.to/waqas_r_47bca4fef1922623d/how-we-predict-the-fifa-world-cup-2026-with-a-dixon-coles-bivariate-poisson-model-41kc</link>
      <guid>https://dev.to/waqas_r_47bca4fef1922623d/how-we-predict-the-fifa-world-cup-2026-with-a-dixon-coles-bivariate-poisson-model-41kc</guid>
      <description>&lt;p&gt;We're building Onside Arena, an open AI football analytics platform for the FIFA World Cup 2026 and Fantasy Premier League (FPL). Live model record: 75% of matchday 1 (MD1) calls correct. Here's the technical core.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Dixon-Coles bivariate Poisson on team goal expectations&lt;/li&gt;
&lt;li&gt;Bayesian-shrunk ratings learned from 12 past World Cups + 8 Premier League seasons (~32K matches)&lt;/li&gt;
&lt;li&gt;Live recalibration after every played match in the tournament&lt;/li&gt;
&lt;li&gt;Outputs per-match win/draw/loss probabilities, scoreline distributions, and Monte Carlo simulations of the bracket&lt;/li&gt;
&lt;li&gt;Receipts published live at onsidearena.com/world-cup-2026/model-record&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Dixon-Coles
&lt;/h2&gt;

&lt;p&gt;A standard independent-Poisson model assumes home and away goal counts are independent given attack/defence rates. That assumption fits football poorly: in observed results, 0-0 and 1-1 occur more often than independent Poisson predicts, and 1-0 / 0-1 occur less often. Dixon-Coles (1997) adds a low-score correction term, governed by a dependence parameter rho, that rescales the probabilities of those four scorelines and leaves every other scoreline unchanged.&lt;/p&gt;

&lt;p&gt;The rho parameter is learned from data. For our WC + PL training set, rho is approximately -0.13, which materially shifts predicted draw probabilities by 4-6 percentage points on average.&lt;/p&gt;
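
&lt;p&gt;To make the correction concrete, here is a minimal sketch of the Dixon-Coles adjustment applied to an independent-Poisson scoreline grid. The function names and the example rates (1.4 and 1.1 expected goals) are hypothetical and chosen for illustration; only the rho of -0.13 comes from the text above. This is not our production code.&lt;/p&gt;

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def dc_tau(home_goals, away_goals, lam, mu, rho):
    """Dixon-Coles low-score factor; 1.0 outside the four lowest scorelines."""
    if home_goals == 0 and away_goals == 0:
        return 1.0 - lam * mu * rho
    if home_goals == 0 and away_goals == 1:
        return 1.0 + lam * rho
    if home_goals == 1 and away_goals == 0:
        return 1.0 + mu * rho
    if home_goals == 1 and away_goals == 1:
        return 1.0 - rho
    return 1.0

def outcome_probs(lam, mu, rho, max_goals=10):
    """Return (home win, draw, away win) from the corrected scoreline grid."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise away the truncated tail
    return home / total, draw / total, away / total

# Hypothetical rates: 1.4 expected home goals, 1.1 expected away goals.
independent = outcome_probs(1.4, 1.1, rho=0.0)
corrected = outcome_probs(1.4, 1.1, rho=-0.13)
print(f"draw, independent Poisson: {independent[1]:.3f}")
print(f"draw, Dixon-Coles:         {corrected[1]:.3f}")
```

&lt;p&gt;With a negative rho the factor moves probability out of 1-0 and 0-1 and into 0-0 and 1-1, so the draw probability rises while the four adjustments cancel and the grid still sums to 1.&lt;/p&gt;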

&lt;h2&gt;
  
  
  Where the team ratings come from
&lt;/h2&gt;

&lt;p&gt;Attack/defence rates are not observed — they're estimated. We use a hierarchical Bayesian shrinkage model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each team has a latent attack strength and defence strength&lt;/li&gt;
&lt;li&gt;Priors centered on confederation mean (UEFA, CONMEBOL, etc.) so newly-qualified nations aren't extreme outliers&lt;/li&gt;
&lt;li&gt;Likelihood: every observed match score in our 32K-match corpus contributes evidence&lt;/li&gt;
&lt;li&gt;MAP estimation via Stan-style sampler, but we cache point estimates per nation pair for fast scoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Home advantage is a single global parameter (~0.31 log-goals), with a learned multiplier for neutral-venue WC matches (~0.83x of league home advantage).&lt;/p&gt;
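
&lt;p&gt;As a rough sketch of how those pieces could combine into goal expectations, assume a log-linear form: attack minus opposing defence, plus a venue boost for the nominal home side. That form, the sign convention for defence, and the example ratings are assumptions for illustration; only the 0.31 and 0.83 figures come from the text.&lt;/p&gt;

```python
import math

HOME_ADVANTAGE = 0.31      # log-goals, from the text
NEUTRAL_MULTIPLIER = 0.83  # neutral-venue World Cup multiplier, from the text

def expected_goals(attack, opp_defence, venue_boost=0.0):
    """Poisson rate from log-scale strengths (assumed log-linear form)."""
    return math.exp(attack - opp_defence + venue_boost)

def match_rates(home, away, neutral=False):
    """Return (lam, mu): expected goals for the nominal home side and the away side."""
    boost = HOME_ADVANTAGE * (NEUTRAL_MULTIPLIER if neutral else 1.0)
    lam = expected_goals(home["attack"], away["defence"], boost)
    mu = expected_goals(away["attack"], home["defence"])
    return lam, mu

# Hypothetical ratings on the log scale (higher defence means fewer goals conceded).
strong = {"attack": 0.3, "defence": 0.2}
weak = {"attack": -0.1, "defence": -0.2}

print(match_rates(strong, weak))                # league-style home fixture
print(match_rates(strong, weak, neutral=True))  # neutral World Cup venue
```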

&lt;h2&gt;
  
  
  Live recalibration
&lt;/h2&gt;

&lt;p&gt;This is the part most public models don't do. After every WC 2026 match plays out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Compare the model's pre-match expected goals for both teams with the actual scoreline&lt;/li&gt;
&lt;li&gt;Compute the Bayesian update to that team-pair's posterior&lt;/li&gt;
&lt;li&gt;Propagate the update to the team's confederation-cluster prior&lt;/li&gt;
&lt;li&gt;Re-score all future matches involving either team&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Net effect: a side like Iraq, which had a wide posterior because of limited recent international form, sharpened ~2x faster than a side like France whose prior was already tight.&lt;/p&gt;
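
&lt;p&gt;The intuition can be shown with a textbook normal-normal update, which stands in here for the full model. The variances and the observed signal below are hypothetical numbers, not our fitted values: the same single-match evidence moves a wide prior much further than a tight one.&lt;/p&gt;

```python
def normal_update(prior_mean, prior_var, observation, obs_var):
    """Posterior (mean, variance) for a normal prior and one normal observation."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + observation / obs_var)
    return post_mean, post_var

OBS_VAR = 0.30  # hypothetical noise of a single-match strength signal

wide_mean, wide_var = normal_update(0.0, 0.40, 0.5, OBS_VAR)    # little prior evidence
tight_mean, tight_var = normal_update(0.0, 0.05, 0.5, OBS_VAR)  # long track record

print(f"wide prior:  mean moves to {wide_mean:.3f}, variance 0.40 -> {wide_var:.3f}")
print(f"tight prior: mean moves to {tight_mean:.3f}, variance 0.05 -> {tight_var:.3f}")
```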

&lt;h2&gt;
  
  
  Sanity-check: what we got right and wrong
&lt;/h2&gt;

&lt;p&gt;From MD1:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Argentina to top Group H @ 73% -&amp;gt; 2-0 vs Austria (correct)&lt;/li&gt;
&lt;li&gt;France to top Group K @ 81% -&amp;gt; 3-0 vs Iraq (correct)&lt;/li&gt;
&lt;li&gt;England to win Group C @ 68% -&amp;gt; won 2-0 (correct)&lt;/li&gt;
&lt;li&gt;Germany draw @ 64% -&amp;gt; lost (model was too confident in Germany's defensive solidity vs current form)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Live accuracy: 24/32 calls correct = 75%. Brier score on win-probability: 0.179 (lower is better, 0.25 is naive baseline).&lt;/p&gt;
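
&lt;p&gt;For readers new to the metric, this is a minimal sketch of a binary Brier score. The four probabilities reuse the examples listed above purely as sample inputs; the function name is ours and the sketch is not the grading code behind the model-record page.&lt;/p&gt;

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Sample calls: probability given to the predicted outcome, 1 if it happened.
probs = [0.73, 0.81, 0.68, 0.64]
hits = [1, 1, 1, 0]

print(brier_score(probs, hits))
print(brier_score([0.5] * 4, hits))  # always saying 50% scores exactly 0.25
```

&lt;p&gt;A forecaster who always says 50% scores 0.25 whatever happens, which is why 0.25 serves as the naive baseline.&lt;/p&gt;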

&lt;h2&gt;
  
  
  What's in the API
&lt;/h2&gt;

&lt;p&gt;We publish the model's outputs as free JSON via MCP and REST:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GET /api/v1/wc/probabilities - per-match win/draw/loss probabilities&lt;/li&gt;
&lt;li&gt;GET /api/v1/wc/champions — current Monte Carlo champion distribution (10K sims)&lt;/li&gt;
&lt;li&gt;GET /api/v1/wc/upsets — biggest projected upsets in upcoming 7 days&lt;/li&gt;
&lt;li&gt;npm: onside-football-mcp — drop-in for Claude / Cursor / ChatGPT App Directory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full docs at onsidearena.com/llms.txt.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we'd love feedback on
&lt;/h2&gt;

&lt;p&gt;Things we're still tuning:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Squad-rotation prior: We don't yet condition on starting XI announcements — model still uses pre-tournament team ratings. Fix is in progress.&lt;/li&gt;
&lt;li&gt;Set-piece specialist weighting: A team's set-piece goal share is volatile and we under-weight it.&lt;/li&gt;
&lt;li&gt;Tail risk in knockouts: The model is conservative on extra-time and penalty shootouts. We use a separate logistic mixture there.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you build prediction models for sports, or are interested in Bayesian methods applied to systems that recalibrate live, we'd love to hear how you handle these problems.&lt;/p&gt;




&lt;p&gt;Live model record (we update it after every match): &lt;a href="https://onsidearena.com/world-cup-2026/model-record" rel="noopener noreferrer"&gt;https://onsidearena.com/world-cup-2026/model-record&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow @onsidearena on X for daily picks and post-match receipts.&lt;/p&gt;

</description>
      <category>datascience</category>
    </item>
    <item>
      <title>Cohort Retention Analysis in Excel - Without SQL</title>
      <dc:creator>Waqas R</dc:creator>
      <pubDate>Mon, 22 Jun 2026 18:04:54 +0000</pubDate>
      <link>https://dev.to/waqas_r_47bca4fef1922623d/cohort-retention-analysis-in-excel-without-sql-2l5h</link>
      <guid>https://dev.to/waqas_r_47bca4fef1922623d/cohort-retention-analysis-in-excel-without-sql-2l5h</guid>
      <description>&lt;p&gt;If you want to know whether customers actually stick around, a cohort retention table is the clearest view there is - and you don't need SQL or a BI tool to build one. Plain Excel will do it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a cohort retention table shows
&lt;/h2&gt;

&lt;p&gt;You group customers by the month they first appeared (their &lt;em&gt;cohort&lt;/em&gt;), then track what fraction of each cohort is still active in month +1, +2, +3 and so on. Read down a column to see how retention is trending across cohorts; read across a row to see how a single cohort decays over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building it from a transactions sheet
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One row per customer per active month.&lt;/strong&gt; From a transactions list, derive each customer's first-active month and their active months.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute the month offset.&lt;/strong&gt; &lt;code&gt;offset = active_month - cohort_month&lt;/code&gt; (0, 1, 2, ...).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pivot.&lt;/strong&gt; Rows = cohort month, columns = offset, values = count of &lt;em&gt;distinct&lt;/em&gt; customers. A PivotTable does this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convert to percentages.&lt;/strong&gt; Divide each cell by the cohort's month-0 size to get retention %.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Colour it.&lt;/strong&gt; Conditional formatting turns the grid into a heatmap so the decay pattern jumps out.&lt;/li&gt;
&lt;/ol&gt;
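
&lt;p&gt;If you want to cross-check the spreadsheet, the same steps can be sketched in a few lines of plain Python. The function names and the sample rows are hypothetical; months are assumed to be zero-padded 'YYYY-MM' strings.&lt;/p&gt;

```python
from collections import defaultdict

def month_index(ym):
    """Turn 'YYYY-MM' into a running month count so months can be subtracted."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1

def cohort_retention(transactions):
    """transactions: iterable of (customer_id, 'YYYY-MM'). Returns {cohort: {offset: share}}."""
    active = defaultdict(set)  # customer -> distinct active months
    for customer, ym in transactions:
        active[customer].add(ym)

    members = defaultdict(lambda: defaultdict(set))  # cohort -> offset -> distinct customers
    for customer, months in active.items():
        cohort = min(months)  # first active month; zero-padded strings sort correctly
        for ym in months:
            offset = month_index(ym) - month_index(cohort)
            members[cohort][offset].add(customer)

    table = {}
    for cohort, by_offset in members.items():
        size = len(by_offset[0])  # month-0 cohort size
        table[cohort] = {off: len(ids) / size for off, ids in sorted(by_offset.items())}
    return table

# Hypothetical sample: three customers, two cohorts, one duplicate transaction.
rows = [("a", "2026-01"), ("a", "2026-02"), ("a", "2026-02"),
        ("b", "2026-01"), ("c", "2026-02"), ("c", "2026-03")]
print(cohort_retention(rows))
```

&lt;p&gt;Using sets means a customer with several transactions in one month is counted once.&lt;/p&gt;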

&lt;p&gt;I wrote up the full step-by-step with the helper formulas here: &lt;a href="https://www.datahubpro.co.uk/tutorials/cohort-analysis-in-excel" rel="noopener noreferrer"&gt;Cohort analysis in Excel&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  A few things that trip people up
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Count distinct customers, not transactions&lt;/strong&gt; - a PivotTable counts rows by default, so de-duplicate to distinct customers per cohort/offset.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Young cohorts look better than they are&lt;/strong&gt; - the newest cohorts have only had a month or two to churn, so don't over-read their high early retention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pair it with RFM&lt;/strong&gt; - cohorts tell you &lt;em&gt;when&lt;/em&gt; people churn; &lt;a href="https://www.datahubpro.co.uk/tutorials/rfm-in-excel" rel="noopener noreferrer"&gt;RFM segmentation&lt;/a&gt; tells you &lt;em&gt;who&lt;/em&gt; is most valuable and most at risk.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you'd rather not rebuild the grid by hand each month, I made a free browser tool that does cohorts (plus forecasts, segments and more) straight from a CSV, no signup: &lt;a href="https://www.datahubpro.co.uk/free-tools" rel="noopener noreferrer"&gt;free tools&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Cohort retention looks advanced but it's really just careful bookkeeping. Build it once and you'll never trust a single headline "churn rate" again.&lt;/p&gt;

</description>
      <category>excel</category>
      <category>datascience</category>
      <category>tutorial</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Holt-Winters Forecasting in Excel: Trend + Seasonality, Explained</title>
      <dc:creator>Waqas R</dc:creator>
      <pubDate>Sun, 21 Jun 2026 09:29:39 +0000</pubDate>
      <link>https://dev.to/waqas_r_47bca4fef1922623d/holt-winters-forecasting-in-excel-trend-seasonality-explained-1jje</link>
      <guid>https://dev.to/waqas_r_47bca4fef1922623d/holt-winters-forecasting-in-excel-trend-seasonality-explained-1jje</guid>
      <description>&lt;p&gt;If you forecast anything with both a trend and a repeating seasonal pattern - monthly sales, web traffic, energy use - a plain moving average won't cut it. &lt;strong&gt;Holt-Winters&lt;/strong&gt; (triple exponential smoothing) is the classic method that handles both, and you can run it in Excel with no add-ins.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three pieces
&lt;/h2&gt;

&lt;p&gt;Holt-Winters tracks three things and updates each as new data arrives:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Level&lt;/strong&gt; - where the series is right now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trend&lt;/strong&gt; - how fast it's climbing or falling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seasonality&lt;/strong&gt; - the repeating pattern within a cycle (e.g. 12 months).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each gets its own smoothing weight (alpha, beta, gamma) between 0 and 1. A higher weight reacts faster to recent data; a lower one is smoother and more stable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The update equations (additive)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Level:    l_t = alpha*(y_t - s_{t-m}) + (1-alpha)*(l_{t-1} + b_{t-1})
Trend:    b_t = beta*(l_t - l_{t-1}) + (1-beta)*b_{t-1}
Season:   s_t = gamma*(y_t - l_t) + (1-gamma)*s_{t-m}
Forecast: y_hat_{t+h} = l_t + h*b_t + s_{t-m+h}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



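&lt;p&gt;where &lt;code&gt;m&lt;/code&gt; is the season length (12 for monthly data with a yearly cycle) and &lt;code&gt;h&lt;/code&gt; is the number of periods ahead. As written, the seasonal index in the forecast line holds for &lt;code&gt;h&lt;/code&gt; up to &lt;code&gt;m&lt;/code&gt;; further ahead, you reuse the most recent full cycle of seasonal terms.&lt;/p&gt;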
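
&lt;p&gt;The updates above translate almost line for line into code. This is a minimal sketch, assuming a simple initialisation (first-cycle mean for the level, difference of the first two cycle means for the trend); the function name and the sample series are hypothetical, and other initialisation schemes exist.&lt;/p&gt;

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Run the additive updates over y and forecast `horizon` steps (horizon up to m)."""
    if 2 * m > len(y):
        raise ValueError("need at least two full seasonal cycles")
    if horizon > m:
        raise ValueError("this sketch only forecasts up to one cycle ahead")
    # Simple initialisation (an assumption, not the only option).
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]  # season[t % m] holds s_{t-m} before updating

    for t in range(m, len(y)):
        prev_level = level
        s_old = season[t % m]
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_old

    n = len(y)
    return [level + h * trend + season[(n - 1 + h) % m] for h in range(1, horizon + 1)]

# Hypothetical series: a flat level of 10 with a repeating 4-period pattern.
series = [12, 9, 8, 11] * 3
print(holt_winters_additive(series, m=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4))
```

&lt;p&gt;On a series with no trend and a perfectly repeating pattern, the forecast should simply continue the pattern, which makes a handy check before you point it at real data.&lt;/p&gt;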

&lt;h2&gt;
  
  
  Doing it in Excel
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The one-function way.&lt;/strong&gt; Excel 2016+ has &lt;code&gt;FORECAST.ETS&lt;/code&gt;, which is essentially auto-tuned Holt-Winters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=FORECAST.ETS(target_date, values, timeline, seasonality)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;seasonality&lt;/code&gt; to 12 for monthly data, and pair it with &lt;code&gt;FORECAST.ETS.CONFINT&lt;/code&gt; for a confidence band.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The manual way.&lt;/strong&gt; Build the level/trend/season columns straight from the equations so you can audit every step - the only way to really answer "why does it predict that?". I wrote up the full manual build with initialisation and a worked example here: &lt;a href="https://www.datahubpro.co.uk/tutorials/holt-winters-in-excel" rel="noopener noreferrer"&gt;Holt-Winters in Excel&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pitfalls worth knowing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Too few cycles.&lt;/strong&gt; You need at least two full seasonal cycles (24 months for monthly data) before the seasonal component is trustworthy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Additive vs multiplicative.&lt;/strong&gt; If seasonal swings grow as the series grows, use the multiplicative form.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Over-reacting.&lt;/strong&gt; Large weights chase noise; auto-tuning by minimising one-step error usually beats eyeballing them.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A quick sanity-check
&lt;/h2&gt;

&lt;p&gt;If you just want a fast trend forecast from a column of numbers without building the whole sheet, I made a free browser tool that auto-tunes the weights and charts the result: &lt;a href="https://www.datahubpro.co.uk/forecast-calculator" rel="noopener noreferrer"&gt;free forecast calculator&lt;/a&gt;. No signup, runs locally in your browser.&lt;/p&gt;

&lt;p&gt;Forecasting won't make the future certain - but Holt-Winters gives you a defensible, transparent baseline, which is usually what the conversation actually needs.&lt;/p&gt;

</description>
      <category>excel</category>
      <category>forecasting</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How we built a 10,000-run Monte Carlo simulator for the 2026 World Cup</title>
      <dc:creator>Waqas R</dc:creator>
      <pubDate>Fri, 05 Jun 2026 09:23:08 +0000</pubDate>
      <link>https://dev.to/waqas_r_47bca4fef1922623d/how-we-built-a-10000-run-monte-carlo-simulator-for-the-2026-world-cup-1kcj</link>
      <guid>https://dev.to/waqas_r_47bca4fef1922623d/how-we-built-a-10000-run-monte-carlo-simulator-for-the-2026-world-cup-1kcj</guid>
      <description>&lt;p&gt;The 2026 World Cup is the first with 48 teams and 104 matches, which makes it a genuinely interesting simulation problem: a new Round of 32, best-third qualification rules, and group tiebreakers that branch in ugly ways. We built a simulator that runs the whole tournament 10,000 times and publishes champion probabilities for every nation. Here's the engineering side.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Monte Carlo instead of closed-form
&lt;/h2&gt;

&lt;p&gt;With 12 groups of 4 plus best-third qualification, the bracket space explodes. Closed-form approaches lose the path-dependence (who you meet in the R32 depends on which groups produce best-thirds). Sampling the tournament end-to-end 10,000 times is simple to reason about, and the sampling error is small: the standard error of an estimated probability p is sqrt(p(1-p)/N), which at N = 10,000 is at most 0.5 percentage points.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture (boring on purpose)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Per-match win/draw/loss probabilities come from our rating model (the same engine behind our FPL projections; inputs are public signals like rankings and squad data).&lt;/li&gt;
&lt;li&gt;The simulator is a pure TypeScript function, deterministic given a seed (mulberry32 PRNG), so any board we publish is reproducible.&lt;/li&gt;
&lt;li&gt;It runs in a Next.js ISR route revalidating hourly. No workers, no queues: 10,000 tournament runs are just arithmetic over a fixtures array and finish in well under a second.&lt;/li&gt;
&lt;li&gt;Played matches lock in real results; the sim only samples what hasn't happened yet, so the board tilts as the tournament progresses.&lt;/li&gt;
&lt;/ul&gt;
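
&lt;p&gt;The simulator itself is TypeScript, but the seeding idea is easy to sketch in Python. Below is a port of the mulberry32 generator using unsigned 32-bit arithmetic, plus a toy sampler over two fixtures. The fixture probabilities, function names and seed are hypothetical, and this samples independent matches only; it does not model groups, tiebreakers or the bracket.&lt;/p&gt;

```python
MASK = 2 ** 32  # keep all arithmetic in unsigned 32 bits

def mulberry32(seed):
    """Return a function producing floats in [0, 1) from a 32-bit seed."""
    state = seed % MASK

    def next_float():
        nonlocal state
        state = (state + 0x6D2B79F5) % MASK
        t = ((state ^ (state >> 15)) * (1 | state)) % MASK
        t = ((t + ((t ^ (t >> 7)) * (61 | t)) % MASK) ^ t) % MASK
        return (t ^ (t >> 14)) / MASK

    return next_float

def sample_outcome(rng, p_home, p_draw):
    """Draw one result from win/draw/loss probabilities."""
    u = rng()
    if p_home > u:
        return "home"
    if p_home + p_draw > u:
        return "draw"
    return "away"

def simulate(seed, fixtures, runs):
    """Tally sampled outcomes per fixture; the same seed gives the same tallies."""
    rng = mulberry32(seed)
    tallies = [{"home": 0, "draw": 0, "away": 0} for _ in fixtures]
    for _ in range(runs):
        for tally, (p_home, p_draw) in zip(tallies, fixtures):
            tally[sample_outcome(rng, p_home, p_draw)] += 1
    return tallies

# Hypothetical fixtures as (home win probability, draw probability).
fixtures = [(0.55, 0.25), (0.30, 0.30)]
print(simulate(2026, fixtures, 10_000))
```

&lt;p&gt;Because the generator is the only source of randomness, publishing the seed alongside a board is enough for anyone to regenerate it.&lt;/p&gt;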

&lt;h2&gt;
  
  
  The part that matters: a public accuracy record
&lt;/h2&gt;

&lt;p&gt;Prediction content is cheap; accountability isn't. Every match prediction is auto-graded after full time on a public model-record page: probability given, result, running Brier score. If the model has a bad tournament, that page will say so. Every prediction site should do this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open data
&lt;/h2&gt;

&lt;p&gt;Model outputs (per-match probabilities, champion odds, fixtures) are published as CSVs under CC BY 4.0:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live endpoints: &lt;a href="https://onsidearena.com/data" rel="noopener noreferrer"&gt;https://onsidearena.com/data&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Kaggle mirror: &lt;a href="https://www.kaggle.com/datasets/wr0027/world-cup-2026-predictions-onside-model-outputs" rel="noopener noreferrer"&gt;https://www.kaggle.com/datasets/wr0027/world-cup-2026-predictions-onside-model-outputs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Interactive simulator: &lt;a href="https://onsidearena.com/world-cup-2026/simulator" rel="noopener noreferrer"&gt;https://onsidearena.com/world-cup-2026/simulator&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Accuracy record: &lt;a href="https://onsidearena.com/world-cup-2026/model-record" rel="noopener noreferrer"&gt;https://onsidearena.com/world-cup-2026/model-record&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy to answer questions about the simulation layer, the Next.js setup, or how we grade accuracy. (The rating model's internals stay private; everything about the simulation layer is fair game.)&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>webdev</category>
      <category>nextjs</category>
    </item>
  </channel>
</rss>
