Data Modeling

Kendrick Onyango

**Introduction**

Data modeling is a process used in database design and information systems in which data is structured and organized to serve specific business needs. It involves creating a conceptual representation of data to understand how data elements relate to each other and to support efficient data storage, retrieval, and processing. Data modeling is a crucial step in the development of databases, data warehouses, and other information systems.

Data models are built around business needs. Rules and requirements are defined upfront through feedback from business stakeholders so they can be incorporated into the design of a new system or adapted in the iteration of an existing one.

Data can be modeled at various levels of abstraction. The process begins by collecting information about business requirements from stakeholders and end users. These business rules are then translated into data structures to formulate a concrete database design. A data model can be compared to a roadmap, an architect’s blueprint, or any formal diagram that facilitates a deeper understanding of what is being designed.

Data modeling is an iterative process that involves close collaboration between business stakeholders, data analysts, and database designers to create a data structure that meets the organization's needs and supports effective data management and analysis.

Data modeling employs standardized schemas and formal techniques. This provides a common, consistent, and predictable way of defining and managing data resources across an organization, or even beyond.

Ideally, data models are living documents that evolve along with changing business needs. They play an important role in supporting business processes and planning IT architecture and strategy.

**Aspects and Types of data models**

Database and information system design begins at a high level of abstraction and becomes progressively more concrete and specific, just like any other design process. Data models can be divided into three categories according to their degree of abstraction: the process starts with a conceptual model, progresses to a logical model, and ends with a physical model.

Conceptual Data Models: These provide the highest-level view of the data, focusing on business requirements and data entities without considering implementation details. They are usually created as part of the process of gathering initial project requirements, often in the form of an Entity-Relationship Diagram (ERD) that represents entities and their relationships.
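To make this concrete, here is a minimal sketch of what a conceptual model might capture for a hypothetical ordering domain (the Customer, Address, and Order entities and their relationship names are illustrative, not drawn from any real system): entities and the relationships between them, with no attributes or storage details yet.

```python
# A minimal, hypothetical conceptual model: just entities and
# named relationships, no data types or storage details yet.
conceptual_model = {
    "entities": ["Customer", "Address", "Order"],
    "relationships": [
        ("Customer", "lives at", "Address"),
        ("Order", "is placed by", "Customer"),
        ("Order", "is shipped to", "Address"),
    ],
}

# Print the relationships the way an ERD would draw them.
for subject, verb, obj in conceptual_model["relationships"]:
    print(f"{subject} --[{verb}]--> {obj}")
```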


Logical Data Models: At this level, the focus shifts to designing a database schema that is independent of a specific database management system (DBMS). The goal is to create a structured and normalized representation of data elements, tables, and relationships using tools like Entity-Relationship Diagrams or UML class diagrams. They are less abstract and provide greater detail about the concepts and relationships in the domain under consideration. These indicate data attributes, such as data types and their corresponding lengths, and show the relationships among entities. They can be used in highly procedural implementation environments, or for projects that are data-oriented by nature, such as data warehouse design or reporting system development.
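Continuing the same hypothetical domain, the sketch below shows one plausible way a logical model adds detail: each entity now carries named attributes with data types and lengths, but nothing is tied to a particular DBMS yet. The attribute names and types here are assumptions for illustration.

```python
# The hypothetical domain at the logical level: named attributes
# with abstract data types and lengths, still DBMS-independent.
logical_model = {
    "Customer": {
        "customer_id": "integer",
        "first_name": "varchar(50)",
        "last_name": "varchar(50)",
        "telephone": "varchar(20)",
    },
    "Address": {
        "address_id": "integer",
        "street": "varchar(100)",
        "city": "varchar(50)",
        "country": "varchar(50)",
        "zip_code": "varchar(10)",
    },
}

for entity, attributes in logical_model.items():
    print(entity)
    for name, dtype in attributes.items():
        print(f"  {name}: {dtype}")
```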


Physical Data Models: In physical data modeling, the logical data model is translated into a database schema that is specific to a particular DBMS. This includes defining data types, constraints, indexes, and other implementation details. The outcome is a database design that can be used to create the actual database. Physical models offer a finalized design that can be implemented as a relational database, including associative tables that illustrate the relationships among entities as well as the primary keys and foreign keys that will maintain those relationships. They can also capture DBMS-specific properties, such as performance tuning settings.
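As a rough illustration, the hypothetical logical model above could be translated into a physical schema for one specific DBMS (SQLite is used here purely because it ships with Python). The table and column definitions are assumptions, but they show the kind of detail a physical model pins down: concrete column types, primary keys, and a foreign key.

```python
import sqlite3

# A hypothetical physical model for SQLite: concrete column types,
# primary keys, and a foreign key implementing the
# "customer lives at address" relationship.
ddl = """
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    city       TEXT NOT NULL,
    country    TEXT NOT NULL,
    zip_code   TEXT
);

CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    telephone   TEXT,
    address_id  INTEGER REFERENCES address(address_id)
);
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(ddl)
print(conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall())
conn.close()
```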


**Data modeling process**

As a discipline, data modeling invites stakeholders to evaluate how data is processed and stored. Modeling techniques differ in which symbols they use to represent the data, how models are laid out, and how business requirements are conveyed, but all of them provide formalized workflows with tasks to be performed iteratively. The workflow generally looks like this:

  1. Identify the entities. The process of data modeling begins with the identification of the things, events or concepts that are represented in the data set that is to be modeled. Each entity should be cohesive and logically discrete from all others.

  2. Identify key properties of each entity. Each entity type can be differentiated from all others because it has one or more unique properties, called attributes. For instance, an entity called “customer” might possess such attributes as a first name, last name, telephone number and salutation, while an entity called “address” might include a street name and number, a city, state, country and zip code.

  3. Identify relationships among entities. The earliest draft of a data model will specify the nature of the relationships each entity has with the others. In the above example, each customer “lives at” an address. If that model were expanded to include an entity called “orders,” each order would be shipped to and billed to an address as well. These relationships are usually documented via the Unified Modeling Language (UML).

  4. Map attributes to entities completely. This will ensure the model reflects how the business will use the data. Several formal data modeling patterns are in widespread use. Object-oriented developers often apply analysis patterns or design patterns, while stakeholders from other business domains may turn to other patterns.

  5. Assign keys as needed, and decide on a degree of normalization that balances the need to reduce redundancy with performance requirements. Normalization is a technique for organizing data models (and the databases they represent) in which numerical identifiers, called keys, are assigned to groups of data to represent relationships between them without repeating the data. For instance, if customers are each assigned a key, that key can be linked to both their address and their order history without having to repeat this information in the table of customer names. Normalization tends to reduce the amount of storage space a database requires, but it can come at a cost to query performance (see the sketch after this list).

  6. Finalize and validate the data model. Data modeling is an iterative process that should be repeated and refined as business needs change.
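The sketch below illustrates step 5 with a deliberately simplified, hypothetical schema: each customer is assigned a key, and that key links the customer to an order history, so the customer's details are stored once and recovered through a join rather than repeated on every order row.

```python
import sqlite3

# Hypothetical normalized schema: customer details live in one
# table; orders reference them through the customer_id key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    item        TEXT NOT NULL
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ada Lovelace')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "keyboard"), (1, "monitor")],
)

# The join recovers the order history through the key alone;
# denormalizing would avoid the join at the cost of duplicating
# customer data on every order row.
print(conn.execute("""
    SELECT c.name, o.item
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
""").fetchall())
conn.close()
```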

**Types of data modeling**

Data modeling has evolved alongside database management systems, with model types increasing in complexity as businesses' data storage needs have grown. Here are several model types:

  • Hierarchical data models
  • Relational data models
  • Entity-relationship (ER) models
  • Object-oriented data models
  • Dimensional data models

**Benefits of data modeling**

Data modeling makes it easier for developers, data architects, business analysts, and other stakeholders to view and understand relationships among the data in a database or data warehouse. In addition, it can:

  1. Reduce errors in software and database development.
  2. Increase consistency in documentation and system design across the enterprise.
  3. Improve application and database performance.
  4. Ease data mapping throughout the organization.
  5. Improve communication between developers and business intelligence teams.
  6. Ease and speed the process of database design at the conceptual, logical and physical levels.
