DEV Community

Chiiraq

Schema Design Patterns: Because Even Data Needs Good Architecture

INTRODUCTION

In the ever-growing world of data, chaos is the default: records are scattered across multiple systems, each telling a slightly different version of the same story. Left untouched, data does not organize itself into neat, logical structures. Instead, it duplicates and contradicts itself, and yet, from this chaos, businesses need answers. Terrifying as it is, this is the natural state of data: raw, unstructured and growing exponentially. This is where data experts come into play, bringing order to the chaos. Their most powerful tool, you ask? The schema!

What exactly is data modelling??

According to Joe Reis and Matt Housley (no, they're not made-up dudes; they're the authors of Fundamentals of Data Engineering):
"This is the process of creating a visual representation of either a whole information system or part of it to communicate connections between data points and structures."

What is a Schema??

A schema is the structure of a database: it defines how data is organized and how the pieces of data relate to one another. A well-designed schema is the foundation of query performance and data integrity in analytical systems.

What is the importance of good data modelling??

Good data modelling ensures:
a) Improved database performance: well-designed data models have been reported to improve report performance by up to 90%. I don't know about you, but to me that's an astronomical figure. Well-designed models also make it easier to spot new opportunities for optimization, and problems in them are easier to diagnose.

b) Improved application quality: data modelling gives your organisation a clear vision of how data can meet your business needs.

c) Improved data quality: the data modelling process establishes rules for monitoring data quality and identifies redundancies or omissions, taking the hassle out of cleaning large data sets.

d) Better documentation: it enables consistent documentation, which simplifies database maintenance while preserving operational efficiency.

e) Time and money saved: it empowers businesses to get to market faster by catching errors early.

Note: these are just some of the perks that great data modelling provides. The list goes on and on, and I could keep going, but let's get to the juicy part of the steak, no?

SCHEMAS

In a schema, a table typically has a primary key, which uniquely identifies each row, and may have one or more foreign keys, which point at the primary key of another table. These keys are used in SQL statements to join tables together, creating a unified view of information. Schema diagrams are particularly helpful for showing relationships between tables, and they help analysts understand which keys to join on.
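
To make the key-and-join idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns and the sample rows are all made up for illustration; they are not from any real system.

```python
import sqlite3

# Hypothetical tables: customers (primary key: customer_id) and
# orders (foreign key: customer_id, pointing back at customers).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.5);
""")

# Join the two tables on the shared key to get one unified view.
joined = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# An order that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(joined)
print(orphan_rejected)
```

The primary key keeps every customer unique, while the foreign key is what lets the join line orders up with the right customer and stops orphan rows from sneaking in.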

While several schema designs exist, we will focus primarily on the star schema and the snowflake schema. Why, you ask? The two are the most widely used design patterns for analytical workloads in relational database management systems and Power BI.

star schema

A star schema is a type of relational database schema composed of a single, central fact table surrounded by dimension tables. It can have any number of dimension tables.

Star schema featuring a many to one relationship
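
A star schema like the one pictured could be sketched as follows, assuming a tiny retail example. The fact_sales, dim_product and dim_store tables and their rows are hypothetical, chosen only to show the shape: one fact table in the middle, and every dimension exactly one join away.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
    -- Dimension tables: descriptive context, one row per product or store.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category     TEXT NOT NULL   -- kept inline (denormalized) in a star schema
    );
    CREATE TABLE dim_store (
        store_id INTEGER PRIMARY KEY,
        city     TEXT NOT NULL
    );
    -- Fact table: numeric measures plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        store_id   INTEGER NOT NULL REFERENCES dim_store(store_id),
        quantity   INTEGER NOT NULL,
        revenue    REAL NOT NULL
    );
    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');
    INSERT INTO dim_store VALUES (1, 'Nairobi'), (2, 'Mombasa');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 2, 2000.0), (2, 2, 1, 1, 500.0),
        (3, 3, 2, 4, 800.0),  (4, 2, 2, 3, 1500.0);
""")

# Many fact rows map to one dimension row, so each slice needs a single join.
revenue_by_category = star.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

revenue_by_city = star.execute("""
    SELECT s.city, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()

print(revenue_by_category)
print(revenue_by_city)
```

Notice that the category name is repeated on every product row. That redundancy is the price a star schema pays for keeping every question one join away from the fact table.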

snowflake schema

The snowflake schema consists of one fact table connected to many dimension tables, which can in turn be connected to other dimension tables through many-to-one relationships.
Tables in a snowflake schema are usually normalized to the third normal form. Each dimension table represents exactly one level in a hierarchy.

Snowflake schema
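
Here is a rough sketch of the snowflake idea, again with hypothetical tables. The assumption is a two-level product hierarchy (product, then category), where the category level is split out into its own dimension table instead of being repeated on every product row.

```python
import sqlite3

snow = sqlite3.connect(":memory:")
snow.executescript("""
    -- One table per hierarchy level: category sits above product.
    CREATE TABLE dim_category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT NOT NULL
    );
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category_id  INTEGER NOT NULL REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        revenue    REAL NOT NULL
    );
    INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
    INSERT INTO dim_product VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Desk', 2);
    INSERT INTO fact_sales VALUES (1, 1, 2000.0), (2, 2, 500.0), (3, 3, 800.0);
""")

# Reaching the category now takes two hops: fact -> product -> category.
revenue_by_category = snow.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

# Each category name is stored exactly once, however many products use it.
category_rows = snow.execute("SELECT COUNT(*) FROM dim_category").fetchone()[0]

print(revenue_by_category)
print(category_rows)
```

The trade-off shows up right in the query: less redundancy in the dimensions, but one more join every time you want to roll up to a higher level of the hierarchy.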

starflake schema

A starflake schema is a combination of a star schema and a snowflake schema. Only some of its dimension tables are normalized, as in a snowflake schema, while the rest stay flat, as in a star schema. Normalization is applied where it removes redundancy in the dimensions.

Starflake schema

TO NOTE:
Fact table: the central table in a star schema. It stores quantitative, numerical data (the measures), along with foreign keys to the dimension tables.
Dimension table: a table that stores descriptive, textual or contextual data about business entities.
Relationship: a logical link between two or more tables that share common data, primarily established using primary and foreign keys. The main types are one-to-one, one-to-many and many-to-many.
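
One way to get a feel for the relationship types is to look at how keys pair up in actual data. The helper below is a hypothetical sketch, not a standard library function: it classifies a list of (left, right) key pairs purely from what it observes, so it can only describe the sample it is given, not the rule the schema is meant to enforce.

```python
from collections import defaultdict

def classify_relationship(pairs):
    """Classify the link between two key columns from observed (left, right) pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # Does any left key map to several right keys, or the other way round?
    left_fans_out = any(len(rights) > 1 for rights in rights_per_left.values())
    right_fans_out = any(len(lefts) > 1 for lefts in lefts_per_right.values())

    if left_fans_out and right_fans_out:
        return "many-to-many"
    if left_fans_out:
        return "one-to-many"
    if right_fans_out:
        return "many-to-one"
    return "one-to-one"

# Made-up sample data for each case.
person_passport = [("amina", "P1"), ("brian", "P2")]
customer_orders = [("amina", 10), ("amina", 11), ("brian", 12)]
student_courses = [("amina", "sql"), ("amina", "python"), ("brian", "sql")]

print(classify_relationship(person_passport))  # one-to-one
print(classify_relationship(customer_orders))  # one-to-many
print(classify_relationship(student_courses))  # many-to-many
```

Read from the other side, a one-to-many link is a many-to-one link, which is the direction a fact table sees its dimensions from.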

Throughout this article, we have covered the journey from absolute chaos to refined structures and schemas through good modelling, but real mastery comes from practice and from making design decisions. Every schema you design teaches you something. Every relationship you define deepens your understanding of how data flows through business processes. Until next time, keep your data clean and your terminal keen. Peace, madudes.

Citations:

a) Reis, J., & Housley, M. (2022). Fundamentals of Data Engineering: Plan and Build Robust Data Systems. O'Reilly Media, p. 156.
b) Kleppmann, M. (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly Media, p. 39.
