One Big Table vs the star schema: I think everyone's arguing about the wrong thing

#analytics #database #dataengineering #sql

Every few months the "One Big Table vs star schema" argument flares up again on data Twitter, and every time I watch two groups of smart people talk completely past each other. It took me a while to figure out why — and once I did, the whole debate kind of dissolved.

Short version: they're not disagreeing about modeling. They're optimizing for different things and not saying so out loud.

Let me lay it out.

The two camps

If you've somehow avoided this one so far: One Big Table (OBT) means you flatten your fact and all its dimensions into a single wide, denormalized table. No joins. Every order row already carries the customer name, the product category, the store region, the date attributes — everything, inline.

The star schema keeps facts and dimensions separate and joins them at query time. Fact in the middle, dimension tables around it, assembled per question.

That's the whole fork: OBT joins once, in the pipeline, ahead of time. The star joins every time, at query time.

Why OBT people love OBT

And honestly, they're not wrong:

No joins for the analyst. Every question is a single-table SELECT ... GROUP BY. Nobody fat-fingers a join. Self-serve BI users stop filing tickets.
Predictable performance. One wide columnar table scans fast and consistently, which matters a lot if your BI tool generates clumsy join SQL or you've got high dashboard concurrency.
It fits columnar storage. Wide, repetitive tables compress beautifully. "Electronics" repeated a million times costs almost nothing after compression. The old penalty for very wide tables mostly isn't a thing anymore.

I used to roll my eyes at OBT. I don't anymore. For the right job it's genuinely great.

Why it bites you

The costs are real too, they just show up later, which is the dangerous part:

It's built for the questions you already thought of. New question that needs an attribute you didn't flatten in? That's a pipeline change, not a new query. The star's flexibility is exactly the thing OBT trades away.
History gets awkward. Handling slowly changing dimensions in a flat table is clumsy compared to a proper dimension with validity dates.
Combinatorial sprawl. Because each OBT is shaped for a set of questions, teams build lots of them — one per dashboard — and now "revenue" is computed a dozen slightly-different ways across a dozen wide tables that quietly drift apart. Congrats, you've reinvented the "three different revenue numbers" problem, one big table at a time.

Here's the comparison I wish someone had handed me earlier:

	Star schema	One Big Table
Joins at query time	Yes	None
Flexibility for new questions	High	Low (rebuild pipeline)
Query simplicity for analysts	Medium	High
Handling SCDs / history	Clean	Awkward
Risk at scale	Complexity	Definitions drift across many OBTs
Best as	Core model	Serving layer

Where I actually landed

The framing that fixed it for me: they're layers, not rivals.

Keep a star schema as your core model — the flexible, governed, single-source-of-truth layer where facts and dimensions live cleanly and history is handled properly. Then, where a specific high-traffic dashboard or a join-averse tool needs it, build a One Big Table on top as a serving layer — a denormalized projection derived from the star.

You get both properties without the trap. The star keeps flexibility and one definition of each metric. The OBT delivers join-free speed to the consumers who benefit. And because the OBT is derived, it can't invent its own private definition of revenue — it inherits the star's.

So I stopped asking "star or OBT?" and started asking "which layer am I building right now?" That question usually answers itself.

The one exception: if you genuinely have one dashboard and a small team, an OBT alone might be all you ever need. Don't build a star schema to feed a single report. Match the tool to the actual job, not the tribe.

I wrote a longer version of this with the full reasoning over on my site if you want it: One Big Table vs the Star Schema. It's part of a set of pieces I've been writing on dimensional modeling — the star vs snowflake one and normalization vs denormalization are the closest companions.

But mostly I'm curious about your experience: has anyone here run OBT as the only layer at real scale? What broke first — the rebuild-for-every-new-question cost, or the "wait, why does each dashboard have a different revenue number" problem? Or did it just... work, and I'm overthinking it?

DEV Community

One Big Table vs the star schema: I think everyone's arguing about the wrong thing

The two camps

Why OBT people love OBT

Why it bites you

Where I actually landed

Top comments (0)