This blog is the second part of my series on learning Database Management Systems (DBMS). In the first blog, I covered the basics of DBMS, and in this post, we will delve into the details of Relational Database Management Systems (RDBMS). I highly encourage you to read my previous blog before continuing with this one. So let’s get started!😁
What is RDBMS?
A Relational Database Management System (RDBMS) is a type of database management system that stores data in the form of tables (also called relations). Each table consists of rows (records) and columns (attributes), and the data is organized so that relationships between different tables can be easily established using keys.
Real-life Analogy
Think of an RDBMS like an Excel workbook where each sheet (table) holds data about a specific topic (like students, courses, teachers). You can link different sheets using common columns like Student_ID
or Course_ID
— this linking is what makes it "relational."
Key Features of RDBMS
- Uses tables to store structured data
- Supports primary keys and foreign keys to enforce data integrity
- Allows SQL querying for data manipulation
- Supports normalization to remove redundancy
- Ensures data consistency and integrity
Relational Terminologies
Before diving deeper into relational databases, it's important to understand some basic terms used in RDBMS. These terms describe how data is structured and organized inside a relational database.
Relation (Table)
A relation is simply a table in an RDBMS. It represents a collection of related data entries and consists of rows and columns. Each relation has a unique name and stores data about a particular type of object or entity.
Example: AStudent
table is a relation that holds data about students.Tuple (Row / Record)
A tuple is a single row in a table. It represents one complete set of related data or a record.
Example: A single row in theStudent
table that stores data of one student is a tuple.Cardinality
Cardinality refers to the number of tuples (rows) in a relation.
Example: If there are 50 students in theStudent
table, its cardinality is 50.Attribute (Column /Field)
An attribute is a column in a table. It describes a specific property or characteristic of the data stored in each row. Attributes are also referred to as fields.
Example: In aStudent
table, columns likeStudent_ID
,Name
, andAge
are attributes.Domain
A domain refers to the set of valid values an attribute can take. For example, the domain of theAge
attribute could be all integer values from 15 to 100.Degree
Degree is the number of attributes (columns) in a relation.
Example: If theStudent
table has 4 columns (Student_ID
,Name
,Age
,Email
), then its degree is 4.
Properties of Relational tables
- Cells contains atomic values
- Values in a column are of the same kind
- Each row is unique
- No two tables can have the same name in a relational schema.
- Each column has a unique name
- The sequence of rows is insignificant
- The sequence of columns is insignificant.
Data Anomalies
When a database is not properly structured, it can lead to data anomalies — unexpected issues that arise when inserting, updating, or deleting data. These problems often occur when all the data is stored in a single large table instead of being divided into related tables.
Why Do Data Anomalies Occur?
Data anomalies typically occur when:
- A table has repeated data (redundancy)
- Multiple types of data are stored together (like student + course + teacher in one table)
- The database is not normalized, meaning it hasn't been structured properly into separate tables with relationships
Types of Data Anomalies
Insertion Anomaly
Occurs when you can’t insert data into a table without inserting something else unnecessarily.
Example: If a new course is being introduced but no students have enrolled yet, you can't insert the course without leaving student-related fields blank.Update Anomaly
Occurs when the same piece of data is stored in multiple places, and not all copies are updated correctly.
Example: If a teacher's name is updated in one row but not in others, the database now holds inconsistent data.Deletion Anomaly
Occurs when deleting one piece of data accidentally removes other useful information.
Example: If the last student enrolled in a course is deleted from the table, the course information may also get deleted.
How to Solve Data Anomalies?
The most effective way to solve data anomalies is by using Normalization. It is a process that involves:
- Splitting large tables into smaller, related tables
- Removing redundancy
- Organizing data into logical groupings
We’ll explore normalization and normal forms (1NF, 2NF, 3NF, BCNF) in upcoming sections of this blog.
Functional Dependencies (FDs)
In relational database design, Functional Dependency (FD) is a very important concept that helps us understand how attributes (columns) relate to one another. It forms the foundation of normalization.
What is a Functional Dependency?
A Functional Dependency exists when the value of one attribute uniquely determines the value of another attribute in a relation (table).
Let’s say we have a relation R, and attributes A and B. We say:
A → B (A functionally determines B)
This means: if two rows have the same value for attribute A, they must have the same value for attribute B.
Simple Example
Consider a Student
table:
Student_ID | Name | Course |
---|---|---|
101 | Alice | DBMS |
102 | Bob | OS |
101 | Alice | DBMS |
Here:
-
Student_ID → Name
Because ifStudent_ID
is the same, theName
must also be the same. - But
Name → Student_ID
is not valid, because two students can have the same name.
Why Functional Dependencies Matter
- They help us find candidate keys
- They guide table splitting in normalization
- They help eliminate data redundancy and anomalies
Attribute Closure
Attribute closure of a set of attributes X, denoted as X⁺, is the set of all attributes that can be functionally determined from X using the given set of functional dependencies.
- It helps you check if a set of attributes is a super key
- It helps you determine the minimal cover of FDs
- It is used in finding candidate keys during schema design
Step-by-Step Example
Let’s say we have a relation R(A, B, C, D)
And the functional dependencies (FDs) are:
- A → B
- B → C
- A → D
Let’s find the closure of A (A⁺):
Step 1: Start with A⁺ = {A}
Step 2: A → B → add B ⇒ A⁺ = {A, B}
Step 3: B → C → add C ⇒ A⁺ = {A, B, C}
Step 4: A → D → add D ⇒ A⁺ = {A, B, C, D}
So, A⁺ = {A, B, C, D}
That means A is a super key, because it determines all attributes in the relation.
General Algorithm to Find Closure
- Start with X⁺ = X (initial set)
- Repeatedly apply all FDs where:
- If
Y → Z
andY ⊆ X⁺
, then addZ
toX⁺
- Stop when no more attributes can be added
Armstrong’s Axioms
Armstrong’s Axioms are a set of rules or inference laws used to derive all valid functional dependencies from a given set. These rules are sound and complete, meaning they will only derive correct FDs and can derive all correct FDs.
They form the theoretical foundation for reasoning about functional dependencies in relational databases.
Why Do We Need Armstrong’s Axioms?
- To generate new functional dependencies from a known set
- To prove whether a particular FD holds
- To simplify and minimize functional dependencies
- To help in finding candidate keys and in normalization
Core Armstrong’s Axioms (3 Basic Rules)
Reflexivity
If Y is a subset of X, then X → Y
Example: If X = {A, B}, then X → A and X → B are validAugmentation
If X → Y, then XZ → YZ (add same attributes to both sides)
Example: If A → B, then AC → BCTransitivity
If X → Y and Y → Z, then X → Z
Example: If A → B and B → C, then A → C
Additional (Derived) Rules
Using the core axioms, we can derive more useful rules:
- Union: If X → Y and X → Z, then X → YZ
- Decomposition: If X → YZ, then X → Y and X → Z
- Pseudo-Transitivity: If X → Y and YZ → W, then XZ → W
These are not part of the original axioms but can be proved using them.
Summary Table
Axiom | Rule Description | Example |
---|---|---|
Reflexivity | If Y ⊆ X, then X → Y | AB → A |
Augmentation | If X → Y, then XZ → YZ | A → B ⟹ AC → BC |
Transitivity | If X → Y and Y → Z, then X → Z | A → B, B → C ⟹ A → C |
Union | If X → Y and X → Z, then X → YZ | A → B, A → C ⟹ A → BC |
Decomposition | If X → YZ, then X → Y and X → Z | A → BC ⟹ A → B, A → C |
Pseudo-Transitivity | If X → Y and YZ → W, then XZ → W | A → B, BC → D ⟹ AC → D |
When to Use Armstrong’s Axioms
- During attribute closure calculation
- To simplify or minimize a set of FDs
- To prove whether an FD logically follows from a given set
Keys in RDBMS
In relational databases, keys are crucial because they help uniquely identify rows (tuples) and maintain data integrity. Without keys, it would be difficult to retrieve, relate, or update data accurately.
What is a Key?
A key is one or more attributes (columns) that can uniquely identify a record in a table.
Different types of keys serve different purposes in database design.
Types of Keys in RDBMS
Super Key
A super key is any set of attributes that can uniquely identify a tuple (row) in a table.
It may contain extra (redundant) attributes.
Example:{Student_ID}
,{Student_ID, Email}
Candidate Key
A minimal super key — meaning it uniquely identifies a record, and if you remove any attribute, it no longer does.
A table can have multiple candidate keys.
Example:{Student_ID}
,{Email}
(if both are unique)Primary Key
One of the candidate keys is selected as the primary key.
It cannot have NULL values and must be unique for each row.
Example:Student_ID
is chosen as the primary keyAlternate Key
A candidate key not chosen as the primary key is called an alternate key.
Example: IfEmail
is a candidate key but not the primary key, it’s an alternate key.Composite Key
A key that consists of two or more attributes that together uniquely identify a record.
Example: In aStudent_Course
table, the combination ofStudent_ID
andCourse_ID
can form a composite key.Foreign Key
A foreign key is an attribute in one table that refers to the primary key of another table.
It is used to establish a relationship between two tables.
Example:Department_ID
in theEmployee
table refers to theDepartment
table.
Summary Table
Key Type | Description | Example |
---|---|---|
Super Key | Any set of attributes that uniquely identifies a row | {Student_ID, Email} |
Candidate Key | Minimal super key (no redundant attributes) |
{Student_ID} , {Email}
|
Primary Key | Chosen candidate key; cannot be null | Student_ID |
Alternate Key | Candidate key not selected as primary key | Email |
Composite Key | Key formed by combining two or more attributes | {Student_ID, Course_ID} |
Foreign Key | Attribute linking to primary key of another table |
Department_ID in Employee
|
Normalization & Normal Forms
When a database is poorly structured, it can suffer from problems like data redundancy, update anomalies, and inconsistency. This is where normalization comes in.
Normalization?
Normalization is a process in relational database design that organizes data into well-structured tables to minimize redundancy and avoid anomalies.
It involves dividing a large table into smaller, related tables and defining relationships between them.
Why Normalize?
- To eliminate duplicate data
- To reduce insertion, update, and deletion anomalies
- To maintain data integrity
- To make the database easier to maintain
Normal Forms
Normalization is done through a series of normal forms — each with specific rules. Let's understand the first three main normal forms (commonly used in practice):
-
First Normal Form (1NF)
- A table is in 1NF if:
- All values in each column are atomic (indivisible)
- There are no repeating groups or arrays
- Example: A student should not have multiple phone numbers stored in a single column like
12345, 67890
. Instead, store them in separate rows.
-
Second Normal Form (2NF)
- A table is in 2NF if:
- It is already in 1NF
- Every non-prime attribute is fully functionally dependent on the entire primary key
- Mainly applies to composite keys
- Removes partial dependency
- Example: In a
Student_Course
table with primary key(Student_ID, Course_ID)
,Student_Name
should not depend only onStudent_ID
.
-
Third Normal Form (3NF)
- A table is in 3NF if:
- It is already in 2NF
- There is no transitive dependency between non-key attributes
- Removes dependencies like: A → B, B → C, so A → C (which is transitive)
- Example: In a table where
Student_ID → Department_ID
andDepartment_ID → Department_Name
, you should split the tables to avoid storingDepartment_Name
repeatedly.
-
Boyce-Codd Normal Form (BCNF)
- A stricter version of 3NF
- Every determinant must be a candidate key
Summary
Normal Form | Key Rules | Purpose |
---|---|---|
1NF | Atomic values, no repeating groups | Basic structure |
2NF | No partial dependency on composite keys | Eliminates redundancy |
3NF | No transitive dependency among non-key attributes | Ensures clear relationships |
BCNF | Every determinant is a candidate key | Advanced, strict 3NF |
Indexing in RDBMS
As your database grows, searching for data can become slow. This is where indexing comes into play. It helps make data retrieval faster — just like how an index in a book helps you find topics quickly.
What is Indexing?
Indexing is a database technique used to speed up data retrieval operations on a table.
An index is a data structure (usually a B-tree or hash) that allows the database to find rows more quickly without scanning the entire table.
Why Use Indexing?
- Speeds up SELECT queries significantly
- Helps in sorting and searching large datasets
- Improves performance when using WHERE, JOIN, ORDER BY, etc.
Real-Life Analogy
Think of a database table as a book and an index as the table of contents. Without an index, you would have to flip through every page to find what you need. With an index, you can go directly to the right page.
Types of Indexes
Primary Index
Created automatically on the primary key. Ensures uniqueness and fast lookup.Secondary Index
Created manually on non-primary key columns to improve performance.Composite Index
An index on multiple columns. Useful when queries filter on more than one column.Clustered Index
Sorts the actual data rows according to the index. A table can have only one clustered index.Non-Clustered Index
Stores pointers to the data instead of sorting it. A table can have multiple non-clustered indexes.
Things to Keep in Mind
- Indexes speed up reads but can slow down writes (INSERT, UPDATE, DELETE)
- Over-indexing can waste storage and affect performance
- Use indexing strategically for frequently queried columns
Summary
Type of Index | Description | Use Case |
---|---|---|
Primary Index | Automatically created on primary key | Uniquely identify rows |
Secondary Index | Manually created on other columns | Improve query speed |
Composite Index | On multiple columns | Queries involving multiple conditions |
Clustered Index | Sorts actual data rows | Faster range queries |
Non-Clustered Index | Stores pointers to rows | Multiple fast lookups on big tables |
Thanks For Reading 🤗
Top comments (0)