DEV Community

Cover image for GETTING TO KNOW:STAR AND SNOWFLAKE SCHEMA.
Ken kimani
Ken kimani

Posted on

GETTING TO KNOW:STAR AND SNOWFLAKE SCHEMA.

By: Kennedy Ngugi Kimani

INTRODUCTION

Terms to be familiar with:

  1. Central Fact Table - is the core of a star schema in a data warehouse, containing quantitative measures of a business process, such as sales amounts or quantities, linked to surrounding dimension tables via foreign keys to provide business context for analysis and reporting.

2.Normalized dimension Tables -This break down attributes into separate tables to reduce data redundancy.

3.De-normalized Dimension Tables -A technique in dimensional modeling where redundancy is intentionally introduced to combine related data into a single, wide, "flat" table to improve query performance and usability in data warehousing and business intelligence.

UNDERSTANDING STAR SCHEMA AND SNOWFLAKE SCHEMA.

First of all this are data modeling techniques in data warehousing.

STAR SCHEMA.

A star schema has one main table in the center (the fact table) connected to several supporting tables (dimension tables) around it. These dimension tables are not split into smaller parts, which makes the data easier to understand and faster to analyze. This design is especially useful for creating reports and dashboards quickly.
This is a visual representation of Star schema.

At the center is the fact table, which contains measurable business data: sales, clicks, revenue, inventory changes, etc. Around it are dimension tables that add context to each fact: customers, products, dates, stores, and so on.

Advantages.

  • Simplified querying: Star schemas are easy to understand and implement. Their de-normalized structure reduces the number of joins required to retrieve data. This simplifies and leads to faster data aggregation and reporting.
  • Faster performance: The reduced join complexity and efficient indexing of fact and dimension tables enhance data retrieval. This is particularly important for decision-makers who require quick access to insights.
  • Intuitive analysis: Star schemas enable intuitive and straightforward data analysis. Users can easily understand relationships and hierarchies among dimensions.
  • Robust support: Star schemas provide support for OLAP structures such as data cubes – multi-dimensional arrays used to improve data analysis.

Dis-advantages.

  • Lack of integrity: Denormalization can cause data redundancy. Dimensional attributes are often repeated across multiple records within a dimension table which can cause data quality issues. Since data is duplicated in denormalization, frequent changes can also cause certain tables to display out-of-date information.
  • Increased costs: Adding redundant data increases computing and storage costs. This can be especially troubling when handling large datasets.
  • Limited flexibility: Star schemas are relatively less robust than normalized structures since they are built for specific use cases. Other approaches might be more effective for complex querying involving multiple joins.
  • Maintenance difficulties: As data changes over time, maintaining a star schema can become challenging. Updates to dimension attributes might require changes in multiple places.

When to use star schema.

  • You need dashboards that load quickly — fewer table connections make reports and filters run faster.
  • Your data doesn’t change structure often, so it’s easy to maintain.
  • Your users are business people or analysts who prefer simple data layouts that are easy to understand.
  • You’re using tools like Power BI or Tableau, which work better with straightforward, less complex data models.

SNOWFLAKE SCHEMA.

Snowflake Schema is similar to a basic setup where one main table (the fact table) connects to several supporting tables (dimension tables). The difference is that in a snowflake schema, those supporting tables are further split into smaller, related tables. This helps avoid repeating the same data and shows clear levels of information — like categories and subcategories.
This is a visual representation of Snowflake schema.

Advantages.

  • Small storage. The snowflake schema doesn’t require as much storage space due to its normalization.
  • High granularity. Thanks to its data normalization, it adds new layers of analysis.
  • Integrity. Due to normalization, the schema has a higher level of data integrity and low redundancies.

Dis-advantages.

  • Complexity: The presence of multiple dimension tables increases the complexity of query creation.
  • Slow Processing: Numerous lookup tables require additional JOIN operations, which can slow down query performance.
  • Hard to Maintain: Increased granularity results in greater complexity, making the schema more difficult to manage and maintain.

When to use star schema.

  • You want to avoid repeating the same data, which saves space and reduces storage costs.
  • Your data naturally fits into levels, like category → subcategory → product, showing clear hierarchies.
  • You need to make updates easily — changing one table automatically updates related information.
  • You’re handling very large datasets — the snowflake schema keeps main tables smaller and faster by storing detailed information separately.

CONCLUSION

In summary, the choice between a Star Schema and a Snowflake Schema is driven by each business need.

Top comments (0)