DEV Community

Rubens Barbosa
Rubens Barbosa

Posted on

Slowly Changing Dimensions (SCD)

Slowly Changing Dimensions (SCD) are a fundamental part of Dimensional Data Modeling, particularly in data warehousing and business intelligence. Before we delve into the details of SCD, it is helpful to focus on some fundamental concepts.

What is Data Modeling?

Data Modeling is the process of creating a visual representation/diagram and its relationships that represent your data system.

What is a dimension?

Dimensions are attributes of an entity.

What is an entity?

An entity is a representation of either a real-world object or a concept that can represent abstract ideas.

Example of entities:

  • Real-World Objects: customer, car, product etc.
  • Concepts: course, sale, order etc.

Attributes of an entity aka dimensions are specific pieces of information about that entity. For example, a customer entity might have attributes such as: name, birthday, address, and phone number.

Dimensions are categorized into two types

dimensions-types

In data modeling, it is important to consider whether an attribute is fixed or slowly changing. Considering the above attributes of a customer entity, the birthday is a great example of a fixed dimension, no one can change their birthday, and phone number is an example of slowing changing because the customer can keep change their phone number, which means that this attribute is time-dependent and, as it is time-dependent, it is changing slowly.

Slowly Changing Dimensions (SCD)

Slowly Changing Dimensions (SCD) are a concept in data warehousing that refers to how the attributes of a given dimension table are managed when some records may change slowly over time, often in an unpredictable manner. There are different types of SCDs, each of with its own approach to handling changes.

In Germany there is a service of the Deutsche Post called "Post Nachsendeauftrag" which forwards order/letters in case of a relocation (from the old address to the new one) I will use this example: a customer has moved and now needs to inform Deutsche Post of his new address in order to receive letters at his new address. in order to understand the different types of SCDs.

SCD Type 0: Fixed dimensions

No changes are allowed. The dimension attributes remain static and are never updated.

SCD Type 1: Overwrite the old value

The old value of the dimension attribute is overwritten with the new value. This approach does not keep any history of changes. For example, if a customer moves to a new address, the old address is updated with the new address.

customer_id name address city
1 Friedrich Goethestraße 1 München
2 Sophia Eiffestraße 12 Hamburg

If Friedrich moves to Ebertstraße, 17 in Berlim, the Deutsche dimensional customer table would be updated as follows:

customer_id name address city
1 Friedrich Ebertstraße 17 Berlim
2 Sophia Eiffestraße 12 Hamburg

SCD Type 2: Adding a new row

In Type 2 SCD, a new row is added to the dimension table to represent the new value, and the old row is marked as inactive or expired. This approach maintains a full history of changes.

Using the same Deutsche customer dimension table, if Friedrich moves to Ebertstraße, 17 in Berlim, a new row is added for Friedrich with the new address, and the old row is marked as inactive.

id name address city active effective_date end_date
1 Friedrich Goethestraße 1 München N 2020-01-01 2025-03-13
1 Friedrich Ebertstraße 17 Berlim Y 2025-03-14 NULL
2 Sophia Eiffestraße 12 Hamburg Y 2022-02-10 NULL

Instead of using NULL to indicate that a record is currently active, some dimensional data modeling might use a future date far enough to mark that the record is still valid. For example, using '9999-12-31' for active records.

id name address city active effective_date end_date
1 Friedrich Goethestraße 1 München N 2020-01-01 2025-03-13
1 Friedrich Ebertstraße 17 Berlim Y 2025-03-14 9999-12-31
2 Sophia Eiffestraße 12 Hamburg Y 2022-02-10 9999-12-31

The effective date column indicates the date from which a particular record of a dimension becomes active or valid. It marks the beginning of the period during which the information in that record is considered current.

The end date column indicates the date when a particular record of a dimension is no longer active or valid. It marks the end of the period during which the information in that record was considered valid. This column might also be called expiry date, effective end date, or similar names.

SCD Type 3: Adding a new column

A new column is added to the dimension table to store the new value, while the old value is preserved in the original column. This approach maintains limited history, as only the previous value is preserved.

If Friedrich moves to Ebertstraße, 17 in Berlim, a new column is added to store the new address, and the old address is preserved in the original column.

customer_id name previous_address new_address
1 Friedrich Goethestraße 1, München Ebertstraße 17, Berlim
2 Sophia Eiffestraße 12, Hamburg NULL

Conclusion

There are other types of SCDs, but I will only go into these above in this article. Each type of SCD has its own advantages and disadvantages, and the choice of which type to use depends on the specific requirements of the data warehousing project. I would say the SCD Type 2 is the most commonly used in data warehousing and business intelligence, because it allows store historical data changes in dimension attributes, which is crucial for many analytical and reporting purposes.

"In a well-designed dimensional model, dimension tables have many columns or attributes. It is not uncommon for a dimension table to have 50 to 100 attributes. The power of the data warehouse is directly proportional to the quality and depth of the dimension attributes." (The Data Warehouse Toolkit)

Top comments (0)