DEV Community

Cover image for Data warehouse explained
Barbara
Barbara

Posted on • Edited on

2 1

Data warehouse explained

WHY

Different business processes need different data structures due to different needs.
The aim of a data warehouse is to

  • make the data simple to understand
  • performant
  • quality assured
  • ready for new questions
  • secure

It should deliver the data to the user

  • in an understandable and performant dimensional model
  • with conformed dimensions or separate data marts
  • to report and visualize -- by interacting directly with the model -- or through an BI application

OLTP - online transactional processing

For running a business it is important to track different things, like for example inventory, financial transactions or different types of movements.
A single relational database might be sufficient for this part, as there is no redundancy and high integrity.

OLAP - online analytical processing

To understand what is going on in a business it is important to do analysis and monitoring. Like examine market segmentation over a period of time or split users into market segmentations.
As single relational database is not sufficient, because it might become too slow for analytics and quite hard to understand.

WHAT

A data warehouse is a system that enables the support of an analytical process. It gathers data from multiple sources. It is a copy of transaction data specifically structured for queries and analysis.

The data from OLTP will run through an ETL process to provide data for the OLAP.

ETL - Extract Transform Load

Extract

  • get data from a source
  • update old states

Transform

  • integrate many sources together
  • clean data
  • produce metadata

Load

  • structure the data
  • load into a dimensional data model

sketch ETL

Dimensional modeling

The dimensional model will help to understand the data easily and perform fast analytical queries.

sketch starscheme

Fact table - numeric and additive

  • business events, like a phone call or a review of a book
  • context of the business events in quantifiable metrics. Like duration of a call, the rating of a book

Dimension table

  • context of business events: who, what, where, why
  • attributes, like: date, time, physical locations, human roles, good sold

Image of Datadog

The Future of AI, LLMs, and Observability on Google Cloud

Datadog sat down with Google’s Director of AI to discuss the current and future states of AI, ML, and LLMs on Google Cloud. Discover 7 key insights for technical leaders, covering everything from upskilling teams to observability best practices

Learn More

Top comments (0)

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay