DEV Community: Waqas R

What BI actually costs a small team in 2026 (a pricing breakdown)

Waqas R — Wed, 15 Jul 2026 13:02:53 +0000

If you run a small company or a lean finance team, you have probably had this moment: you open a BI tool's pricing page, see a friendly "$14 a month", sign your team up, and three months later the invoice is five times what you expected.

Here is where that money actually goes.

The two ways BI pricing quietly scales

Per user. Power BI Pro is $14 per user per month. Tableau starts around $75 per Creator seat. That $14 looks great until everyone who needs to see a dashboard needs a licence. A five-person team is $70/month for Power BI, $375 for Tableau — and it climbs with every hire.

Per data source. Databox starts at $159/month (Pro) and includes three data sources. Every extra source — a GA4 property, an ad account, a client — is $5.60/month on top. Agencies feel this fastest.

Neither model is dishonest. But both mean the number on the pricing page is the floor, not the price.

What a 5-person / 5-source team really pays

Tool	Headline	Real monthly cost
Power BI Pro	$14/user	$70
Tableau	~$75/user	$375+
Databox	$159	~$170
DataHub Pro	$14.99 flat	$14.99

(Prices from each vendor's site, July 2026.)
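The arithmetic behind that table can be sketched in a few lines. This is a minimal illustration using only the prices quoted above; the function names are made up for the example, and real invoices may include taxes, annual discounts or tiers not modelled here.

```python
def per_user_cost(price_per_user, users):
    """Monthly cost when every viewer needs a licence."""
    return price_per_user * users


def per_source_cost(base_price, included_sources, extra_source_price, sources):
    """Monthly cost when the plan meters data sources beyond an allowance."""
    extra = max(0, sources - included_sources)
    return base_price + extra * extra_source_price


USERS, SOURCES = 5, 5  # the 5-person / 5-source team from the table

costs = {
    "Power BI Pro": per_user_cost(14, USERS),
    "Tableau": per_user_cost(75, USERS),
    "Databox": per_source_cost(159, 3, 5.60, SOURCES),
    "DataHub Pro": 14.99,  # flat, independent of users and sources
}

for tool, monthly in costs.items():
    print(f"{tool}: ${monthly:.2f}/month")
```

Change USERS or SOURCES to see how quickly the per-seat and per-source lines pull away from the flat one.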

Why we went flat

I'm the founder of DataHub Pro. We priced it at $14.99/month flat — unlimited users, unlimited files, no per-source meter, no AI-credit top-ups. A small team's usage is spiky and headcount changes; charging per seat or per source punishes exactly the growth you want.

Underneath the pricing is a product decision: most BI tools assume you have a data analyst. Small teams usually don't — they have a founder or an accountant with a spreadsheet and a question. So it works from the file you already have: upload an Excel or CSV, ask "which region grew fastest last quarter?" in plain English, get the number, the chart and the working in about a minute.

Full breakdown and a live cost calculator: the most affordable AI BI tool for small business & finance. And the free plan needs no card.

Happy to answer pricing questions in the comments — I've spent an unreasonable amount of time inside competitor pricing pages.

Our football model went 63-for-76 at the World Cup. Here are the 13 it got wrong.

Waqas R — Sun, 12 Jul 2026 13:04:48 +0000

Most football prediction sites publish a hit rate. Almost none publish the list of matches they got wrong.

That asymmetry is the whole problem with accuracy claims in this space: a hit rate you can't audit is a marketing number, not a result. So here is ours, with the losses attached.

Our model's favourite came through in 63 of 76 decisive World Cup 2026 matches. 82.9%. In the knockout rounds, its favourite advanced in 20 of 24 ties.

The full graded record is public at onsidearena.com/model-record, the raw data is free to reuse under CC BY 4.0 at onsidearena.com/data, and the method is written up at onsidearena.com/methodology.

How it was graded

A scorecard is worthless if you get to pick the rules after seeing the results, so these were fixed in advance:

The question is binary and boring. Did the model's favourite win the match (group stage) or advance (knockouts)? Not "were we directionally interesting." Did the pick come through, yes or no.
Group-stage draws are excluded from the denominator. A draw isn't a win for our pick, but it isn't a defeat of it either, and quietly counting draws as hits is the oldest trick in this genre. 76 is the count of decisive matches.
Knockout ties are graded on advancement, including extra time and penalties. If our pick went out on penalties, that's a loss. No asterisks.
Every miss is listed. Not summarised, not aggregated into a percentage. Named.
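As a rough illustration of those rules, here is a small grading sketch. It is not the production grader: the record layout and function name are assumptions for the example, and the sample rows mix three results mentioned in this post with one hypothetical group-stage draw.

```python
def grade(matches):
    """Return (hits, decisive, misses) under the rules above.

    Each match is a dict with 'pick' and 'winner'. For knockout ties,
    'winner' is the side that advanced (extra time and penalties included).
    A group-stage draw has winner None and is left out of the denominator.
    """
    hits, decisive, misses = 0, 0, []
    for match in matches:
        if match["winner"] is None:
            continue  # draw: neither a hit nor a miss
        decisive += 1
        if match["pick"] == match["winner"]:
            hits += 1
        else:
            misses.append(match)  # every miss is kept, not just counted
    return hits, decisive, misses


sample = [
    {"stage": "group", "fixture": "Ghana v Panama", "winner": "Ghana", "pick": "Panama"},
    {"stage": "R32", "fixture": "Germany v Paraguay", "winner": "Paraguay", "pick": "Germany"},
    {"stage": "knockout", "fixture": "Morocco v Canada", "winner": "Morocco", "pick": "Morocco"},
    # Hypothetical draw, included only to show the exclusion rule.
    {"stage": "group", "fixture": "Team X v Team Y", "winner": None, "pick": "Team X"},
]

hits, decisive, misses = grade(sample)
print(f"{hits} of {decisive} decisive matches")
for miss in misses:
    print("missed:", miss["fixture"], "picked", miss["pick"])

# The headline figure is the same division on the full record.
headline = round(63 / 76 * 100, 1)
```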

The 13 misses

Round	Result	Our pick
Group	Ghana 1-0 Panama	Panama
Group	South Africa 1-0 South Korea	South Korea
Group	Australia 2-0 Turkiye	Turkiye
Group	Ivory Coast 1-0 Ecuador	Ecuador
Group	Turkiye 0-1 Paraguay	Turkiye
Group	Norway 3-2 Senegal	Senegal
Group	Bosnia & Herzegovina 3-1 Qatar	Qatar
Group	Ecuador 2-1 Germany	Germany
Group	Turkiye 3-2 United States	United States
R32	Germany 1-1 (pens 3-4) Paraguay	Germany
R32	Netherlands 1-1 (pens 2-3) Morocco	Netherlands
R16	Colombia 0-0 (pens 3-4) Switzerland	Colombia
R16	Brazil 0-2 Norway	Brazil

Three of those are penalty shootouts, which are close to coin flips and which no model should claim to predict. The rest are straightforward: we called it, and it didn't happen.

What the knockouts looked like

The model held up better once the tournament narrowed: 20 of 24 ties. It called Morocco over Canada, Spain over Portugal, Argentina over Egypt, and England over Mexico. Three of its four knockout losses went to penalties.

That pattern is what you'd hope for. Knockout ties concentrate quality gaps that group-stage football tends to blur, and a model built on team strength should do relatively better there. It did.

Why publish the losses

Two reasons, and only one of them is high-minded.

The high-minded one: a prediction you can't check isn't a prediction, it's content. The category is full of "AI football tips" that never publish a scorecard, because a scorecard can be checked and content cannot. If the number is going to mean anything, it has to be falsifiable.

The self-interested one: it's the one claim a competitor can't match by writing better marketing copy. Anyone can say "83% accurate." Almost nobody will publish the thirteen matches behind the other 17%, because it's uncomfortable. That discomfort is the moat.

The data

Free to download, cite and reuse under CC BY 4.0:

The graded record, every call and every result: onsidearena.com/model-record
The raw data: onsidearena.com/data
The methodology: onsidearena.com/methodology

If you're building something similar, take it. If you find an error in the grading, I'd rather hear it than not.

The same engine now points at Fantasy Premier League, currently at 0.86 mean absolute error across 51,518 out-of-sample predictions. The 2026/27 season starts 21 August, and the record gets published the same way: every gameweek, wins and losses, in public.

How we predict the FIFA World Cup 2026 with a Dixon-Coles bivariate Poisson model

Waqas R — Tue, 23 Jun 2026 08:07:43 +0000

We're building Onside Arena, an open AI football analytics platform for the FIFA World Cup 2026 and FPL. Live model record: 75% of matchday 1 (MD1) winners called correctly. Here's the technical core.

TL;DR

Dixon-Coles bivariate Poisson on team goal expectations
Bayesian-shrunk ratings learned from 12 past World Cups + 8 Premier League seasons (~32K matches)
Live recalibration after every played match in the tournament
Outputs per-match win/draw probabilities, scoreline distributions, and Monte Carlo simulations of the bracket
Receipts published live at onsidearena.com/world-cup-2026/model-record

Why Dixon-Coles

A standard independent-Poisson model assumes home and away goal counts are independent given attack/defence rates. That's wrong for football: 0-0 and 1-1 are over-represented vs Poisson, and 1-0 / 0-1 are under-represented. Dixon-Coles (1997) introduces a correction term, controlled by a single parameter rho, that rescales the probabilities of just those four low scorelines and leaves every other scoreline at its independent-Poisson value.

The rho parameter is learned from data. For our WC + PL training set, rho is approximately -0.13, which materially shifts predicted draw probabilities by 4-6 percentage points on average.
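To make the correction concrete, here is a minimal sketch of the Dixon-Coles adjustment as given in the 1997 paper, applied to a truncated scoreline grid. The goal rates (1.4 and 1.1) are hypothetical values chosen for illustration, not outputs of our model, and the helper names are made up for the example.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles factor for home goals x and away goals y.

    lam and mu are the home and away goal rates. The factor differs
    from 1.0 only on the four low scorelines.
    """
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def score_matrix(lam, mu, rho, max_goals=10):
    """matrix[x][y] = P(home scores x, away scores y), truncated at max_goals."""
    return [
        [dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
         for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]


def draw_probability(matrix):
    return sum(matrix[i][i] for i in range(len(matrix)))


LAM, MU = 1.4, 1.1  # hypothetical goal rates, for illustration only
independent = draw_probability(score_matrix(LAM, MU, rho=0.0))
corrected = draw_probability(score_matrix(LAM, MU, rho=-0.13))
print(f"draw, independent Poisson: {independent:.3f}")
print(f"draw, rho = -0.13:        {corrected:.3f}")
```

With a negative rho, probability moves from 1-0 and 0-1 into 0-0 and 1-1, so the draw probability rises while the grid still sums to (almost exactly) one.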

Where the team ratings come from

Attack/defence rates are not observed — they're estimated. We use a hierarchical Bayesian shrinkage model:

Each team has a latent attack strength and defence strength
Priors centered on confederation mean (UEFA, CONMEBOL, etc.) so newly-qualified nations aren't extreme outliers
Likelihood: every observed match score in our 32K-match corpus contributes evidence
MAP estimation via Stan-style sampler, but we cache point estimates per nation pair for fast scoring

Home advantage is a single global parameter (~0.31 log-goals), with a learned multiplier for neutral-venue WC matches (~0.83x of league home advantage).
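For readers who want to see how those pieces could combine, here is one plausible log-linear parameterisation. The functional form, the sign convention (higher defence means fewer goals conceded) and the function name are assumptions for illustration; only the 0.31 and 0.83 figures come from the text above.

```python
import math

HOME_ADVANTAGE = 0.31      # log-goals, from the text above
NEUTRAL_MULTIPLIER = 0.83  # scaling applied at neutral World Cup venues


def goal_expectations(attack_home, defence_home, attack_away, defence_away,
                      neutral=False, base=0.0):
    """Return (lam, mu): expected goals for the nominal home and away sides.

    Assumed form: log(rate) = base + attack - opponent defence (+ home edge).
    """
    edge = HOME_ADVANTAGE * (NEUTRAL_MULTIPLIER if neutral else 1.0)
    lam = math.exp(base + attack_home - defence_away + edge)
    mu = math.exp(base + attack_away - defence_home)
    return lam, mu


# Two evenly matched hypothetical sides: only the venue term separates them.
league = goal_expectations(0.2, 0.1, 0.2, 0.1)
world_cup = goal_expectations(0.2, 0.1, 0.2, 0.1, neutral=True)
print("league   lam/mu:", league[0] / league[1])
print("neutral  lam/mu:", world_cup[0] / world_cup[1])
```

The two rates returned here are what a scoreline model such as Dixon-Coles takes as input.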

Live recalibration

This is the part most public models don't do. After every WC 2026 match plays out:

Take the model's pre-match attack/defence rates and the actual scoreline
Compute the Bayesian update to that team-pair's posterior
Propagate the update to the team's confederation-cluster prior
Re-score all future matches involving either team

Net effect: a side like Iraq, which had a wide posterior because of limited recent international form, sharpened ~2x faster than a side like France whose prior was already tight.
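The intuition for why a wide posterior moves faster can be shown with a textbook normal-normal update. This is a simplified stand-in, not our actual update rule: it treats one rating as a Gaussian, and the variances and the observed value below are hypothetical numbers. Propagation to the confederation prior is not covered.

```python
def update_rating(prior_mean, prior_var, observed, obs_var):
    """Conjugate Gaussian update of a single rating.

    The weight on the new observation grows with prior uncertainty,
    so a loosely known team moves further on the same evidence.
    """
    weight = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + weight * (observed - prior_mean)
    post_var = (1 - weight) * prior_var
    return post_mean, post_var


OBSERVED, OBS_VAR = 0.5, 0.30  # hypothetical evidence from one match

wide_mean, wide_var = update_rating(0.0, 0.20, OBSERVED, OBS_VAR)
tight_mean, tight_var = update_rating(0.0, 0.05, OBSERVED, OBS_VAR)

print(f"wide prior:  mean moves to {wide_mean:.3f}, variance {wide_var:.3f}")
print(f"tight prior: mean moves to {tight_mean:.3f}, variance {tight_var:.3f}")
```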

Sanity-check: what we got right and wrong

From MD1:

Argentina to top Group H @ 73% -> 2-0 vs Austria (correct)
France to top Group K @ 81% -> 3-0 vs Iraq (correct)
England to win Group C @ 68% -> won 2-0 (correct)
Germany draw @ 64% -> lost (model was too confident in Germany's defensive solidity vs current form)

Live accuracy: 24/32 calls correct = 75%. Brier score on win-probability: 0.179 (lower is better; 0.25 is the naive baseline you get by giving every binary call 50%).
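For anyone unfamiliar with the metric, a binary Brier score is just the mean squared gap between the stated probability and what happened. A minimal sketch, with made-up forecasts rather than our MD1 numbers:

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    squared = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared) / len(squared)


outcomes = [1, 0, 1, 0]  # hypothetical results: 1 = the pick won

coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], outcomes)
sharper = brier_score([0.8, 0.3, 0.7, 0.4], outcomes)

print("always 50%:", coin_flip)
print("sharper forecasts:", round(sharper, 3))
```

A forecaster who says 50% every time scores 0.25 regardless of results, which is why that figure is the bar to beat.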

What's in the API

We publish the model's outputs as free JSON via MCP and REST:

GET /api/v1/wc/probabilities — per-match win/draw probabilities
GET /api/v1/wc/champions — current Monte Carlo champion distribution (10K sims)
GET /api/v1/wc/upsets — biggest projected upsets in upcoming 7 days
npm: onside-football-mcp — drop-in for Claude / Cursor / ChatGPT App Directory

Full docs at onsidearena.com/llms.txt.

What we'd love feedback on

Things we're still tuning:

Squad-rotation prior: We don't yet condition on starting XI announcements — model still uses pre-tournament team ratings. Fix is in progress.
Set-piece specialist weighting: A team's set-piece goal share is volatile and we under-weight it.
Tail risk in knockouts: The model is conservative on extra-time and penalty shootouts. We use a separate logistic mixture there.

If you build prediction models for sports, or are interested in Bayesian methods applied to live recalibrating systems, we'd love to hear how you handle these problems.

Live model record (we update it after every match): https://onsidearena.com/world-cup-2026/model-record

Follow @onsidearena on X for daily picks and post-match receipts.

Cohort Retention Analysis in Excel - Without SQL

Waqas R — Mon, 22 Jun 2026 18:04:54 +0000

If you want to know whether customers actually stick around, a cohort retention table is the clearest view there is - and you don't need SQL or a BI tool to build one. Plain Excel will do it.

What a cohort retention table shows

You group customers by the month they first appeared (their cohort), then track what fraction of each cohort is still active in month +1, +2, +3 and so on. Read down a column to see how retention is trending across cohorts; read across a row to see how a single cohort decays over time.

Building it from a transactions sheet

One row per customer per active month. From a transactions list, derive each customer's first-active month and their active months.
Compute the month offset. offset = active_month - cohort_month (0, 1, 2, ...).
Pivot. Rows = cohort month, columns = offset, values = count of distinct customers. A PivotTable does this.
Convert to percentages. Divide each cell by the cohort's month-0 size to get retention %.
Colour it. Conditional formatting turns the grid into a heatmap so the decay pattern jumps out.
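If it helps to see the same bookkeeping outside a spreadsheet, the steps above can be sketched in plain Python. This is an illustration of the logic, not the Excel build itself; the function names and the sample transactions are invented for the example.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Months since year 0, so subtraction gives a month offset."""
    return d.year * 12 + (d.month - 1)


def month_label(index):
    return f"{index // 12}-{index % 12 + 1:02d}"


def cohort_retention(transactions):
    """transactions: iterable of (customer_id, date).

    Returns {cohort_label: {offset: retention}} where retention is the
    share of the cohort's month-0 customers active at that offset.
    """
    active_months = defaultdict(set)  # customer -> set of active months
    for customer, when in transactions:
        active_months[customer].add(month_index(when))

    cells = defaultdict(set)  # (cohort, offset) -> distinct customers
    for customer, months in active_months.items():
        cohort = min(months)  # first-active month
        for month in months:
            cells[(cohort, month - cohort)].add(customer)

    table = defaultdict(dict)
    for (cohort, offset), customers in cells.items():
        cohort_size = len(cells[(cohort, 0)])
        table[month_label(cohort)][offset] = len(customers) / cohort_size
    return dict(table)


sample = [
    ("a", date(2026, 1, 5)), ("a", date(2026, 1, 20)), ("a", date(2026, 2, 3)),
    ("b", date(2026, 1, 9)),
    ("c", date(2026, 2, 14)), ("c", date(2026, 3, 1)),
]
retention = cohort_retention(sample)
for cohort in sorted(retention):
    print(cohort, dict(sorted(retention[cohort].items())))
```

Using sets per cell is what enforces "distinct customers, not transactions": customer a's two January purchases count once.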

I wrote up the full step-by-step with the helper formulas here: Cohort analysis in Excel.

A few things that trip people up

Count distinct customers, not transactions - a PivotTable counts rows by default, so de-duplicate to distinct customers per cohort/offset.
Young cohorts look better than they are - the newest cohorts have only had a month or two to churn, so don't over-read their high early retention.
Pair it with RFM - cohorts tell you when people churn; RFM segmentation tells you who is most valuable and most at risk.

If you'd rather not rebuild the grid by hand each month, I made a free browser tool that does cohorts (plus forecasts, segments and more) straight from a CSV, no signup: free tools.

Cohort retention looks advanced but it's really just careful bookkeeping. Build it once and you'll never trust a single headline "churn rate" again.

Holt-Winters Forecasting in Excel: Trend + Seasonality, Explained

Waqas R — Sun, 21 Jun 2026 09:29:39 +0000

If you forecast anything with both a trend and a repeating seasonal pattern - monthly sales, web traffic, energy use - a plain moving average won't cut it. Holt-Winters (triple exponential smoothing) is the classic method that handles both, and you can run it in Excel with no add-ins.

The three pieces

Holt-Winters tracks three things and updates each as new data arrives:

Level - where the series is right now.
Trend - how fast it's climbing or falling.
Seasonality - the repeating pattern within a cycle (e.g. 12 months).

Each gets its own smoothing weight (alpha, beta, gamma) between 0 and 1. A higher weight reacts faster to recent data; a lower one is smoother and more stable.

The update equations (additive)

Level:    l_t = alpha*(y_t - s_{t-m}) + (1-alpha)*(l_{t-1} + b_{t-1})
Trend:    b_t = beta*(l_t - l_{t-1}) + (1-beta)*b_{t-1}
Season:   s_t = gamma*(y_t - l_t) + (1-gamma)*s_{t-m}
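Forecast: y_hat_{t+h} = l_t + h*b_t + s_{t-m+h}

where m is the season length (12 for monthly data with a yearly cycle) and h is the number of steps ahead. As written, the seasonal index s_{t-m+h} covers h up to m; for longer horizons, reuse the most recent estimate for the same position in the cycle.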
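The same four equations translate almost line for line into code. The sketch below is one way to do it, with fixed weights rather than auto-tuning; the initialisation (trend from the first two cycle averages, seasonals from the detrended first cycle) is a simple choice made for this example, not the only valid one.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for the next `horizon` points.

    y: observations in time order; m: season length.
    Needs at least two full cycles to initialise.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasonal cycles")

    # Initialisation (an assumption of this sketch).
    mean_first = sum(y[:m]) / m
    mean_second = sum(y[m:2 * m]) / m
    trend = (mean_second - mean_first) / m
    level = mean_first + trend * (m - 1) / 2  # level at the end of cycle one
    season = [y[i] - (mean_first + trend * (i - (m - 1) / 2)) for i in range(m)]

    # Update equations, applied from the second cycle onwards.
    for t in range(m, len(y)):
        s_prev = season[t % m]  # s_{t-m}
        prev_level = level
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_prev

    last = len(y) - 1
    return [level + h * trend + season[(last + h) % m]
            for h in range(1, horizon + 1)]


# Synthetic quarterly series: straight-line trend plus a fixed pattern.
pattern = [3, -1, -4, 2]
series = [10 + 2 * t + pattern[t % 4] for t in range(12)]
forecast = holt_winters_additive(series, m=4, alpha=0.3, beta=0.1,
                                 gamma=0.2, horizon=4)
print([round(v, 2) for v in forecast])
```

On a noise-free series like this one the forecast simply continues the line and the pattern, which makes it a handy check that a spreadsheet build is wired up correctly.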

Doing it in Excel

1. The one-function way. Excel 2016+ has FORECAST.ETS, which is essentially auto-tuned additive Holt-Winters (Microsoft documents it as the AAA version of exponential triple smoothing):

=FORECAST.ETS(target_date, values, timeline, seasonality)

Set seasonality to 12 for monthly data, and pair it with FORECAST.ETS.CONFINT for a confidence band.

2. The manual way. Build the level/trend/season columns straight from the equations so you can audit every step - the only way to really answer "why does it predict that?". I wrote up the full manual build with initialisation and a worked example here: Holt-Winters in Excel.

Pitfalls worth knowing

Too few cycles. You need at least two full seasonal cycles (24 months for monthly data) before the seasonal component is trustworthy.
Additive vs multiplicative. If seasonal swings grow as the series grows, use the multiplicative form.
Over-reacting. Large weights chase noise; auto-tuning by minimising one-step error usually beats eyeballing them.

A quick sanity-check

If you just want a fast trend forecast from a column of numbers without building the whole sheet, I made a free browser tool that auto-tunes the weights and charts the result: free forecast calculator. No signup, runs locally in your browser.

Forecasting won't make the future certain - but Holt-Winters gives you a defensible, transparent baseline, which is usually what the conversation actually needs.

How we built a 10,000-run Monte Carlo simulator for the 2026 World Cup

Waqas R — Fri, 05 Jun 2026 09:23:08 +0000

The 2026 World Cup is the first with 48 teams and 104 matches, which makes it a genuinely interesting simulation problem: a new Round of 32, best-third qualification rules, and group tiebreakers that branch in ugly ways. We built a simulator that runs the whole tournament 10,000 times and publishes champion probabilities for every nation. Here's the engineering side.

Why Monte Carlo instead of closed-form

With 12 groups of 4 plus best-third qualification, the bracket space explodes. Closed-form approaches lose the path-dependence (who you meet in the R32 depends on which groups produce best-thirds). Sampling the tournament end-to-end 10,000 times converges nicely for champion probabilities and is simple to reason about.

The architecture (boring on purpose)

Per-match win/draw/loss probabilities come from our rating model (the same engine behind our FPL projections; inputs are public signals like rankings and squad data).
The simulator is a pure TypeScript function, deterministic given a seed (mulberry32 PRNG), so any board we publish is reproducible.
It runs in a Next.js ISR route revalidating hourly. No workers, no queues: 10,000 tournament runs are just arithmetic over a fixtures array and finish in well under a second.
Played matches lock in real results; the sim only samples what hasn't happened yet, so the board tilts as the tournament progresses.
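The shape of that design fits in a short sketch. The real simulator is TypeScript and handles groups, best-thirds and tiebreakers; what follows is a Python illustration on a hypothetical four-team knockout, with a port of the widely circulated mulberry32 routine and a made-up strength ratio standing in for the rating model.

```python
def mulberry32(seed):
    """Seeded PRNG returning floats in [0, 1); Python port of mulberry32."""
    state = seed & 0xFFFFFFFF

    def next_float():
        nonlocal state
        state = (state + 0x6D2B79F5) & 0xFFFFFFFF
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & 0xFFFFFFFF
        t ^= (t + (((t ^ (t >> 7)) * (t | 61)) & 0xFFFFFFFF)) & 0xFFFFFFFF
        return ((t ^ (t >> 14)) & 0xFFFFFFFF) / 4294967296

    return next_float


def simulate_champions(strengths, bracket, locked, runs, seed):
    """Champion frequencies over `runs` simulated knockouts.

    strengths: team -> positive number (hypothetical stand-in for ratings).
    bracket: teams in draw order; adjacent pairs meet in each round.
    locked: (team_a, team_b) -> actual winner, for ties already played.
    """
    rng = mulberry32(seed)
    titles = {team: 0 for team in bracket}
    for _ in range(runs):
        teams = list(bracket)
        while len(teams) > 1:
            winners = []
            for i in range(0, len(teams), 2):
                a, b = teams[i], teams[i + 1]
                if (a, b) in locked:
                    winners.append(locked[(a, b)])  # real result, no sampling
                else:
                    p_a = strengths[a] / (strengths[a] + strengths[b])
                    winners.append(a if rng() < p_a else b)
            teams = winners
        titles[teams[0]] += 1
    return {team: count / runs for team, count in titles.items()}


STRENGTHS = {"A": 3.0, "B": 2.0, "C": 1.5, "D": 1.0}  # hypothetical
BRACKET = ["A", "B", "C", "D"]

before = simulate_champions(STRENGTHS, BRACKET, {}, runs=10_000, seed=2026)
after = simulate_champions(STRENGTHS, BRACKET, {("A", "B"): "B"},
                           runs=10_000, seed=2026)
print("before any result:", before)
print("after B beats A:  ", after)
```

Because the generator is seeded, rerunning with the same inputs gives the same board, and locking in a result only changes what is sampled downstream of it.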

The part that matters: a public accuracy record

Prediction content is cheap; accountability isn't. Every match prediction is auto-graded after full time on a public model-record page: probability given, result, running Brier score. If the model has a bad tournament, that page will say so. Every prediction site should do this.

Open data

Model outputs (per-match probabilities, champion odds, fixtures) are published as CSVs under CC BY 4.0:

Live endpoints: https://onsidearena.com/data
Kaggle mirror: https://www.kaggle.com/datasets/wr0027/world-cup-2026-predictions-onside-model-outputs
Interactive simulator: https://onsidearena.com/world-cup-2026/simulator
Accuracy record: https://onsidearena.com/world-cup-2026/model-record

Happy to answer questions about the simulation layer, the Next.js setup, or how we grade accuracy. (The rating model's internals stay private; everything about the simulation layer is fair game.)