DEV Community

Lawrence Murithi

The Great Data Debate: Should You Build Your Warehouse Top-Down or Bottom-Up?

Introduction

Imagine you have a massive, disorganized garage. You need to clean it up so you can actually find things. You have two ways to tackle this.
The first way is to take every single item out of the garage, build a perfect, custom-sized shelving unit for the entire space, categorize every loose screw and tool into a master list, and then put everything in its exact, permanent place.
The second way is to just clean out the corner where you keep your gardening tools because it’s spring and that’s what you need right now. Later, when winter comes, you can clean out the corner for your snow shovels.
This is how the data engineering world looks at building a data warehouse. The "clean the whole garage first" method is the Inmon approach. The "clean corner by corner" method is the Kimball approach.
If your company wants to store data to make smart business decisions, you will inevitably bump into these two names: Bill Inmon and Ralph Kimball.
This article looks at how the architectures work, the good, the bad, and which one you should actually use.

The Inmon Architecture (The Top-Down Master Plan)

Bill Inmon is often called the father of the data warehouse. His philosophy is that a data warehouse should be the single, ultimate source of truth for the entire business.

How it works

Inmon uses a top-down approach. You start by looking at the entire company, pulling data from all the different software systems (sales, HR, finance) and cleaning it up. Then, you store all of it in one massive, highly organized central database.
Because of this design, the Inmon approach requires that business requirements are defined first. You must have a complete understanding of the enterprise's overarching data needs before building the model. Furthermore, it relies on strong governance, meaning there are strict, centralized rules controlling data quality, security, and standardization across the board.
Inmon uses a normalized structure, typically third normal form (3NF). This means each piece of data is stored once, with as little duplication as possible. If a customer's name changes, you only have to update it in one place.
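To make the "update it in one place" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `sales_order` tables and their rows are made up for illustration; a real Inmon-style warehouse would have far more tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized tables: the customer name is stored once.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Acme Ltd');
    INSERT INTO sales_order VALUES (100, 1, 250.0), (101, 1, 400.0);
""")

# The name lives in exactly one row, so a single UPDATE is enough.
conn.execute("UPDATE customer SET name = 'Acme Group' WHERE customer_id = 1")

# Every order picks up the new name through the join.
rows = conn.execute("""
    SELECT o.order_id, c.name, o.amount
    FROM sales_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

The trade-off is visible in the final query: reading the data back requires a join, which is part of why normalized models are harder for business users to work with.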
Building a centralized warehouse first is the core of this method. Once this giant central warehouse is built, you carve out smaller pieces of it, Data Marts, for specific departments to use. Each department gets their own data mart, but that data mart is fed strictly by the central warehouse.
The flow below shows multiple source systems feeding into a single staging area, which flows into a large central Enterprise Data Warehouse, which then splits into smaller Data Marts serving the end users.
Source → ETL → Data Warehouse → Data Marts → Reports
Inmon approach
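As a rough sketch of the "marts are fed strictly by the central warehouse" idea, the example below models the central warehouse as a few normalized tables and each data mart as a view over them. The table names, view names, and figures are hypothetical, and real data marts are often separate physical tables loaded by ETL rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Central warehouse tables (hypothetical, heavily simplified).
    CREATE TABLE department (
        department_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        department_id INTEGER NOT NULL REFERENCES department(department_id),
        salary        REAL NOT NULL
    );
    CREATE TABLE sale (
        sale_id     INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
        amount      REAL NOT NULL
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Support');
    INSERT INTO employee VALUES (10, 1, 5000.0), (11, 1, 5500.0), (12, 2, 4000.0);
    INSERT INTO sale VALUES (1, 10, 300.0), (2, 10, 200.0), (3, 11, 700.0);

    -- Data marts: each one reads only from the central tables.
    CREATE VIEW mart_sales AS
        SELECT e.employee_id, SUM(s.amount) AS total_sales
        FROM sale AS s
        JOIN employee AS e ON e.employee_id = s.employee_id
        GROUP BY e.employee_id;
    CREATE VIEW mart_hr AS
        SELECT d.name AS department, COUNT(*) AS headcount, SUM(e.salary) AS payroll
        FROM employee AS e
        JOIN department AS d ON d.department_id = e.department_id
        GROUP BY d.name;
""")

sales_mart = conn.execute("SELECT * FROM mart_sales ORDER BY employee_id").fetchall()
hr_mart = conn.execute("SELECT * FROM mart_hr ORDER BY department").fetchall()
print(sales_mart)
print(hr_mart)
```

Because both marts read the same `employee` table, they cannot disagree about who works where, which is the consistency benefit the top-down design is after.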

Pros

- Single source of truth - Because everything flows from one central hub, different teams should not end up with conflicting numbers.
- High consistency - Due to strong governance and a centralized structure, definitions and metrics mean the exact same thing across the entire enterprise.
- Good for large organizations - The robust, highly structured foundation is capable of handling vast amounts of complex, enterprise-wide data efficiently over the long term.
- Easy to update - Since data isn't duplicated, updating records or fixing errors is very clean and simple.
- Built for the future - If the company grows or adds new departments, the foundation is already solid.

Cons

- Slow to implement - Designing a perfect system for an entire enterprise takes months, sometimes years, before anyone sees real value.
- It’s expensive - You need highly specialized database experts and a massive upfront budget to build and maintain the central hub.
- Hard for business users to read - The normalized database is great for computers, but very confusing for a regular business person trying to run a report.
- Hard to change - Because the entire enterprise is highly integrated and normalized, pivoting the architecture to accommodate new, unforeseen business models is difficult and time-consuming.

The Kimball Architecture (The Bottom-Up Quick Win)

Ralph Kimball felt Inmon's method was slow and expensive and set out to craft a more practical one. His philosophy is that a data warehouse should focus on business processes and answer specific business questions as quickly as possible.

How it works

Kimball uses a bottom-up approach prioritized around fast delivery. Instead of building a giant central warehouse first, you start by building individual Data Marts.
For example, if the sales team needs a report urgently, you pull data just for the sales team, run it through ETL and build a Sales Data Mart. Then later, you build an HR Data Mart.
Kimball uses a denormalized structure known as the star schema. Some duplication of data is accepted in exchange for simpler, faster queries. Data is organized into Facts (numeric measurements such as sales amount) and Dimensions (context such as time, location, or customer name).
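A small star schema can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_customer`, `dim_date`), columns, and sample rows below are illustrative assumptions, not a prescribed layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: descriptive context, deliberately flat.
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        name         TEXT,
        city         TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,   -- e.g. 20240115
        month    TEXT
    );
    -- Fact: one row per sale, holding the numbers plus keys to the dimensions.
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_date(date_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        sales_amount REAL
    );
    INSERT INTO dim_customer VALUES (1, 'Acme Ltd', 'Nairobi'), (2, 'Beta Co', 'Mombasa');
    INSERT INTO dim_date VALUES (20240115, '2024-01'), (20240210, '2024-02');
    INSERT INTO fact_sales VALUES
        (20240115, 1, 250.0), (20240115, 2, 100.0), (20240210, 1, 400.0);
""")

# A typical report: sum the fact, grouped by attributes from the dimensions.
revenue_by_city_month = conn.execute("""
    SELECT c.city, d.month, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY c.city, d.month
    ORDER BY c.city, d.month
""").fetchall()
print(revenue_by_city_month)
```

Every query follows the same shape: start at the fact table and join one hop out to each dimension. That regularity is what makes the model approachable for reporting tools and non-technical users.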
Rather than being isolated silos, these individual Data Marts are eventually linked together to form an Integrated Warehouse. To keep things from getting chaotic, Kimball uses conformed dimensions, the basis of what he calls the enterprise bus architecture. This is a strict rule that says if both the Sales mart and the HR mart use a Date or a Customer, they must use the exact same definition, allowing the data marts to connect logically for company-wide reporting.
The flow below shows source systems feeding into an ETL process, which builds independent Data Marts (star schemas) first. These marts are linked together by shared conformed dimensions to form a logical Integrated Warehouse, which is then used for end-user reports.
Source → ETL → Data Marts → Integrated Warehouse → Reports
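To illustrate how a conformed dimension lets two marts answer one question together, the sketch below gives a Sales fact table and an HR training fact table the same `dim_date`. All table names and numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One shared (conformed) date dimension used by both marts.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        revenue  REAL
    );
    CREATE TABLE fact_training (
        date_key INTEGER REFERENCES dim_date(date_key),
        cost     REAL
    );
    INSERT INTO dim_date VALUES
        (20240115, '2024-01'), (20240120, '2024-01'), (20240210, '2024-02');
    INSERT INTO fact_sales VALUES (20240115, 250.0), (20240120, 150.0), (20240210, 400.0);
    INSERT INTO fact_training VALUES (20240120, 80.0), (20240210, 120.0);
""")

# Aggregate each fact to the shared grain first, then join the results.
# Joining the raw fact tables directly would multiply rows and inflate totals.
combined = conn.execute("""
    WITH sales AS (
        SELECT d.month, SUM(f.revenue) AS revenue
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.month
    ),
    training AS (
        SELECT d.month, SUM(f.cost) AS cost
        FROM fact_training AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.month
    )
    SELECT s.month, s.revenue, t.cost
    FROM sales AS s
    JOIN training AS t ON t.month = s.month
    ORDER BY s.month
""").fetchall()
print(combined)
```

The combined report only works because both marts agree on what a date key and a month mean. If each mart had its own date table with its own definitions, the final join would have nothing reliable to match on.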

Pros

- Faster implementation - You can get a single department up and running with data in a matter of weeks, delivering immediate ROI.
- Cheaper to start - You don't need a massive upfront budget.
- Business-friendly - The star schema is easy for regular business users to understand. They can drag and drop fields in tools like Tableau or Power BI.
- Flexible - It is much easier to add new data marts or modify existing ones as business needs change without breaking a massive central database.

Cons

- Data duplication - Because data is stored in multiple different marts, you use up more storage space.
- Harder to update - Because Kimball favors speed and query performance over strict organization, the same piece of data is intentionally stored in multiple places. For example, if a customer's address changes, you might have to update it in five different data marts.
- Risk of inconsistency - If you aren't strictly enforcing conformed dimensions, your data marts will drift apart. Because data is duplicated across different marts, sales and finance might end up reporting different total revenue numbers.
- Integration challenges - Because the system is built piece-by-piece rather than centrally planned from the start, tying all the disparate data marts together into a unified, integrated warehouse later on can become technically complex and messy.
For example, if the Sales mart is built in January and the HR mart in July, the teams might design their databases differently. A user trying to generate a combined report showing Sales Revenue vs. Employee Training Costs might realize that Sales measures time in weeks, while HR measures time in months. Joining the two data marts to answer enterprise-wide questions then becomes technically complex.
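The weeks-versus-months mismatch is easy to demonstrate. The short sketch below uses only Python's standard library; the helper name `months_touched` is made up for this example. It shows that some weeks belong to two calendar months at once, so weekly totals cannot simply be relabelled as monthly ones.

```python
from datetime import date, timedelta


def months_touched(week_start: date) -> set[str]:
    """Return the calendar months (YYYY-MM) covered by the 7 days from week_start."""
    return {(week_start + timedelta(days=i)).strftime("%Y-%m") for i in range(7)}


# Monday 8 Jan 2024 to Sunday 14 Jan 2024 sits inside a single month.
clean_week = months_touched(date(2024, 1, 8))

# Monday 29 Jan 2024 to Sunday 4 Feb 2024 straddles January and February.
straddling_week = months_touched(date(2024, 1, 29))

print(clean_week)
print(straddling_week)
```

A weekly Sales figure for the straddling week has no single month to join to in the HR mart. One common way around this, assuming the source data allows it, is to store both marts at daily grain against a shared date dimension and roll up from there.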

Which Architecture is better?

If you ask a room full of data engineers this question, you will probably start an argument. But realistically, neither is better. It entirely depends on what your company needs.

You should use Inmon if

  • You work in a highly regulated industry (like banking, insurance, or healthcare) where data accuracy and audit trails are more important than speed.
  • You have a large budget, a big team of data engineers, and plenty of time.
  • Your company's data is incredibly complex and changes constantly.

You should use Kimball if

  • You are a startup, a retail business, or a fast-moving company that needs data right now.
  • You want your non-technical business teams to build their own reports without asking IT for help every time.
  • You are on a tight budget and need to prove the value of the data warehouse to your boss quickly.

The Modern Reality

It is worth mentioning that technology has changed a lot since Inmon and Kimball wrote their books in the 1990s.
Back then, computer storage was incredibly expensive and Inmon’s method of not duplicating data saved money.
Today, cloud storage is incredibly cheap. Because storage is cheap, many companies lean heavily toward Kimball's Star Schema because the cost of duplicating data just doesn't matter much anymore.
Furthermore, new hybrid approaches have popped up. The Data Vault architecture (by Dan Linstedt) is becoming very popular. It essentially takes the best of Inmon’s strict central storage and pairs it with Kimball’s easy-to-read data marts.

The Bottom Line

When it comes to building a data warehouse, don't get caught up in treating Inmon or Kimball like a religion. You aren't building a monument; you're building a tool to help your company make money.
If your company has the patience to build a bulletproof foundation, go top-down with Inmon. If your company needs answers tomorrow to keep the lights on, go bottom-up with Kimball.
Pick the approach that fits your business reality, not the one that looks prettiest on a whiteboard.
