DEV Community

Paulet Wairagu
Paulet Wairagu

Posted on

QN : Data warehouses in Fabric

Relational data warehouses are at the center of most enterprise business intelligence (BI) solutions. They provide a structured, SQL-based environment where organizations store, query, and analyze business data at scale.

Fabric provides a fully managed data warehouse with full transactional T-SQL capabilities, including the ability to create tables and insert, update, and delete data.

A data warehouse is a centralized, structured store designed for analytical queries and reporting; It is optimized for analysis.

important steps when building a data warehouse:

  1. Data ingestion - Moving data from source systems into the warehouse.
  2. Data storage - Storing the data in a format optimized for analytics.
  3. Data processing - Transforming the data into a format ready for consumption by analytical tools.
  4. Data analysis and delivery - Analyzing the data to gain insights and delivering them to the business.

Data warehouses contain tables organized in a schema optimized for multidimensional modeling.

This organization, known as dimensional modeling, involves structuring tables into fact tables and dimension tables.

Fact tables contain the numerical data that you want to analyze. Fact tables typically have a large number of rows and are the primary source of data for analysis.
E.g For example, a fact table might contain the total amount paid for sales orders that occurred on a specific date or at a particular store.

Dimension tables contain descriptive information about the data in the fact tables. Dimension tables typically have a few rows and provide context for the data in the fact tables. For example, a dimension table might contain information about the customers who placed sales orders.

a dimension table contains a unique key column that uniquely identifies each row in the table

  1. A surrogate key is a unique identifier for each row in the dimension table. It's often an integer value that the database management system generates automatically when you insert a new row
  2. An alternate key is often a natural or business key that identifies a specific instance of an entity in the transactional source system - such as a product code or a customer ID.

_Surrogate keys are specific to the data warehouse and help maintain consistency and accuracy.
Alternate keys are specific to the source system and help maintain traceability between the data warehouse and the source system.
_

Special types of dimensions provide additional context and enable more comprehensive data analysis.

Time dimensions provide information about the time period in which an event occurred. This table enables data analysts to aggregate data over temporal intervals. For example, a time dimension might include columns for the year, quarter, month, and day of a sales order.

Slowly changing dimensions track changes to dimension attributes over time, like changes to a customer's address or a product's price. They're significant in a data warehouse because they enable you to analyze and understand changes to data over time. Slowly changing dimensions ensure that data stays up-to-date and accurate, which is important for making good business decisions.

In a data warehouse however, the dimension data is denormalized* to reduce the number of joins required to query the data.

a data warehouse uses a star schema, in which a fact table relates directly to the dimension tables

If there are lots of levels or attributes shared by different things, it might make sense to use a snowflake schema

Top comments (0)