This post is part of my Learning Databases Beyond SQL journey. I’m starting to realize that writing SQL queries is only half the story and how tables are designed matters just as much.
This is my beginner-friendly walkthrough of data modelling: how you go from vague requirements to actual database tables.
1. Introduction
Data modelling is the process of taking real-world requirements and translating them into database tables.
I quickly learned this is a huge topic and easily a full course on its own. This post isn’t meant to make anyone an expert. It’s just a clear, high-level view of the steps involved and why they matter.
2. Overview: The Big Picture
Designing a database usually follows these steps:
- Capture requirements
- Build a conceptual model
- Design a logical model
- Create the physical model
One thing that stood out to me:
this is not a one-pass process
It is iterative and you constantly go back and adjust things as you understand the problem better, ideally while checking in with the people who will actually use the system.
3. Capturing Requirements
This step is about answering one question:
What information does the system need to store?
You get this by talking to users , that is the customers, staff, or anyone who will interact with the application.
Example: Hospital Appointment System
Imagine we’re building a system that lets patients:
- View available appointment times
- Book an appointment
- View upcoming appointments
To support this, we need to store:
- Appointment date and time
- Clinic location
- Patient name
- Consultant name
These are functional requirements, these describe what the system must do.
There are also non-functional requirements, like:
- Pages must load in under two seconds
These don’t describe features, but they still affect how the database is designed. They describe how the system should behave while performing its functions.
4. Conceptual Model
The conceptual model is a high-level view of the data. At this stage, we only care about the main entities , that is the real-world things the system tracks.
For the hospital system, those entities are:
- Patient
- Consultant
- Appointment
No columns yet. Just identifying what exists.
5. Logical Model
This is where things start to feel more “database-like”.
The logical model:
- Defines the attributes of each entity
- Shows relationships between entities
- Is often drawn as an Entity Relationship Diagram (ERD)
First pass
We might start with:
Patient
- name
Consultant
- name
Appointment
- date & time
- clinic name & address
- patient
- consultant
Spotting problems (normalization begins)
This is where I dove a bit deeper into the art and science of designing database and tables.
Every appointment at the same clinic will have the same address. Storing the address in every appointment row means:
- duplicated data
- higher chance of errors
So instead, we introduce a new entity:
Clinic
- clinic name
- address
And now Appointment just references the clinic.
This process of removing duplicated data is called normalization.
At this stage, we also:
- choose data types
- define constraints (e.g. appointment dates must be in the future)
- identify primary keys
6. Normalization (At a High Level)
Normalization in simple terms means:
Store each fact once, and only once.
Normalization means storing each real-world fact in one place and linking everything else to it and there are different normal forms (1NF, 2NF, 3NF, etc.). Each one has rules that reduce redundancy and anomalies.
A full deep dive is out of scope here, but the key takeaway for me was:
- normalization improves data quality
- it makes systems easier to maintain
7. Physical Model
The physical model is where the design turns into actual SQL.
This includes:
-
CREATE TABLEstatements - data types
- performance decisions (indexes, partitions, etc.)
Example tables might look like:
create table consultants (
consultant_id integer,
consultant_name varchar2(100)
);
create table patients (
patient_id integer,
patient_name varchar2(100)
);
create table clinics (
clinic_name varchar2(30),
address varchar2(1000)
);
create table appointments (
appointment_id integer,
appointment_datetime date,
clinic_name varchar2(30),
consultant_id integer,
patient_id integer
);
This is where non-functional requirements (like performance) really start to matter.
8. Supertypes and Subtypes
At this point, something interesting shows up.
Both patients and consultants are people — and people have names.
If someone can be both a patient and a consultant, storing their name in two tables can cause inconsistencies and makes querying the data a bit more complex. Imagine you wanted to edit the name of a patient who is also a doctor, you'd have to edit both tables and ensure that there are no inconsistencies.
Introducing a supertype
So we can create a people table and generalize the data:
create table people (
person_id integer,
full_name varchar2(100)
);
Now:
-
peopleis the supertype -
patientsandconsultantsare subtypes
Subtypes store only what’s specific to that role.
This structure:
- avoids duplication
- keeps shared data consistent
Whether you need supertypes depends on your requirements and that decision often changes over time.
9. Relational vs. Document Storage
It can be tempting to skip all of this and store everything as a single JSON document. These formats for me are more familiar due to my previous experiences.
Example:
{
"appointmentDatetime": "2018-09-01 10:00",
"location": {
"name": "PHYSIO",
"address": "1 Hospital Way"
},
"consultant": {
"name": "Doctor Awesome"
},
"patient": {
"name": "Miss Sick"
}
}
This looks convenient but it has downsides:
- harder to query
- duplicated data
- inconsistent values
Over time, these shortcuts usually create more problems than they solve.
Relational design takes longer upfront, but it tends to produce systems that are:
- easier to query
- easier to maintain
- more flexible long-term
Final Thoughts
What stood out to me most is that good database design is about thinking, not syntax. SQL is just the final expression of decisions made much earlier.
I’m still very early in learning this, but even this high-level understanding already changes how I think about tables.
Top comments (0)