Aayush Singh

Posted on Jul 13

Diving into RDBMS & Data Integrity

#database #computerscience #beginners #sql

This blog is the second part of my series on learning Database Management Systems (DBMS). In the first blog, I covered the basics of DBMS, and in this post, we will delve into the details of Relational Database Management Systems (RDBMS). I highly encourage you to read my previous blog before continuing with this one. So let’s get started!😁

What is RDBMS?

A Relational Database Management System (RDBMS) is a type of database management system that stores data in the form of tables (also called relations). Each table consists of rows (records) and columns (attributes), and the data is organized so that relationships between different tables can be easily established using keys.

Real-life Analogy

Think of an RDBMS like an Excel workbook where each sheet (table) holds data about a specific topic (like students, courses, teachers). You can link different sheets using common columns like Student_ID or Course_ID — this linking is what makes it "relational."

Key Features of RDBMS

Uses tables to store structured data
Supports primary keys and foreign keys to enforce data integrity
Allows SQL querying for data manipulation
Supports normalization to remove redundancy
Ensures data consistency and integrity

Relational Terminologies

Before diving deeper into relational databases, it's important to understand some basic terms used in RDBMS. These terms describe how data is structured and organized inside a relational database.

Relation (Table)
A relation is simply a table in an RDBMS. It represents a collection of related data entries and consists of rows and columns. Each relation has a unique name and stores data about a particular type of object or entity.
Example: A Student table is a relation that holds data about students.
Tuple (Row / Record)
A tuple is a single row in a table. It represents one complete set of related data or a record.
Example: A single row in the Student table that stores data of one student is a tuple.
Cardinality
Cardinality refers to the number of tuples (rows) in a relation.
Example: If there are 50 students in the Student table, its cardinality is 50.
Attribute (Column /Field)
An attribute is a column in a table. It describes a specific property or characteristic of the data stored in each row. Attributes are also referred to as fields.
Example: In a Student table, columns like Student_ID, Name, and Age are attributes.
Domain
A domain refers to the set of valid values an attribute can take. For example, the domain of the Age attribute could be all integer values from 15 to 100.
Degree
Degree is the number of attributes (columns) in a relation.
Example: If the Student table has 4 columns (Student_ID, Name, Age, Email), then its degree is 4.

Properties of Relational tables

Cells contains atomic values
Values in a column are of the same kind
Each row is unique
No two tables can have the same name in a relational schema.
Each column has a unique name
The sequence of rows is insignificant
The sequence of columns is insignificant.

Data Anomalies

When a database is not properly structured, it can lead to data anomalies — unexpected issues that arise when inserting, updating, or deleting data. These problems often occur when all the data is stored in a single large table instead of being divided into related tables.

Why Do Data Anomalies Occur?

Data anomalies typically occur when:

A table has repeated data (redundancy)
Multiple types of data are stored together (like student + course + teacher in one table)
The database is not normalized, meaning it hasn't been structured properly into separate tables with relationships

Types of Data Anomalies

Insertion Anomaly
Occurs when you can’t insert data into a table without inserting something else unnecessarily.
Example: If a new course is being introduced but no students have enrolled yet, you can't insert the course without leaving student-related fields blank.
Update Anomaly
Occurs when the same piece of data is stored in multiple places, and not all copies are updated correctly.
Example: If a teacher's name is updated in one row but not in others, the database now holds inconsistent data.
Deletion Anomaly
Occurs when deleting one piece of data accidentally removes other useful information.
Example: If the last student enrolled in a course is deleted from the table, the course information may also get deleted.

How to Solve Data Anomalies?

The most effective way to solve data anomalies is by using Normalization. It is a process that involves:

Splitting large tables into smaller, related tables
Removing redundancy
Organizing data into logical groupings

We’ll explore normalization and normal forms (1NF, 2NF, 3NF, BCNF) in upcoming sections of this blog.

Functional Dependencies (FDs)

In relational database design, Functional Dependency (FD) is a very important concept that helps us understand how attributes (columns) relate to one another. It forms the foundation of normalization.

What is a Functional Dependency?

A Functional Dependency exists when the value of one attribute uniquely determines the value of another attribute in a relation (table).

Let’s say we have a relation R, and attributes A and B. We say:
A → B (A functionally determines B)
This means: if two rows have the same value for attribute A, they must have the same value for attribute B.

Simple Example

Consider a Student table:

Student_ID	Name	Course
101	Alice	DBMS
102	Bob	OS
101	Alice	DBMS

Here:

Student_ID → Name Because if Student_ID is the same, the Name must also be the same.
But Name → Student_ID is not valid, because two students can have the same name.

Why Functional Dependencies Matter

They help us find candidate keys
They guide table splitting in normalization
They help eliminate data redundancy and anomalies

Attribute Closure

Attribute closure of a set of attributes X, denoted as X⁺, is the set of all attributes that can be functionally determined from X using the given set of functional dependencies.

It helps you check if a set of attributes is a super key
It helps you determine the minimal cover of FDs
It is used in finding candidate keys during schema design

Step-by-Step Example

Let’s say we have a relation R(A, B, C, D)
And the functional dependencies (FDs) are:

A → B
B → C
A → D

Let’s find the closure of A (A⁺):

Step 1: Start with A⁺ = {A}
Step 2: A → B → add B ⇒ A⁺ = {A, B}
Step 3: B → C → add C ⇒ A⁺ = {A, B, C}
Step 4: A → D → add D ⇒ A⁺ = {A, B, C, D}

So, A⁺ = {A, B, C, D}

That means A is a super key, because it determines all attributes in the relation.

General Algorithm to Find Closure

Start with X⁺ = X (initial set)
Repeatedly apply all FDs where:

If Y → Z and Y ⊆ X⁺, then add Z to X⁺
1. Stop when no more attributes can be added

Armstrong’s Axioms

Armstrong’s Axioms are a set of rules or inference laws used to derive all valid functional dependencies from a given set. These rules are sound and complete, meaning they will only derive correct FDs and can derive all correct FDs.

They form the theoretical foundation for reasoning about functional dependencies in relational databases.

Why Do We Need Armstrong’s Axioms?

To generate new functional dependencies from a known set
To prove whether a particular FD holds
To simplify and minimize functional dependencies
To help in finding candidate keys and in normalization

Core Armstrong’s Axioms (3 Basic Rules)

Reflexivity
If Y is a subset of X, then X → Y
Example: If X = {A, B}, then X → A and X → B are valid
Augmentation
If X → Y, then XZ → YZ (add same attributes to both sides)
Example: If A → B, then AC → BC
Transitivity
If X → Y and Y → Z, then X → Z
Example: If A → B and B → C, then A → C

Additional (Derived) Rules

Using the core axioms, we can derive more useful rules:

Union: If X → Y and X → Z, then X → YZ
Decomposition: If X → YZ, then X → Y and X → Z
Pseudo-Transitivity: If X → Y and YZ → W, then XZ → W

These are not part of the original axioms but can be proved using them.

Summary Table

Axiom	Rule Description	Example
Reflexivity	If Y ⊆ X, then X → Y	AB → A
Augmentation	If X → Y, then XZ → YZ	A → B ⟹ AC → BC
Transitivity	If X → Y and Y → Z, then X → Z	A → B, B → C ⟹ A → C
Union	If X → Y and X → Z, then X → YZ	A → B, A → C ⟹ A → BC
Decomposition	If X → YZ, then X → Y and X → Z	A → BC ⟹ A → B, A → C
Pseudo-Transitivity	If X → Y and YZ → W, then XZ → W	A → B, BC → D ⟹ AC → D

When to Use Armstrong’s Axioms

During attribute closure calculation
To simplify or minimize a set of FDs
To prove whether an FD logically follows from a given set

Keys in RDBMS

In relational databases, keys are crucial because they help uniquely identify rows (tuples) and maintain data integrity. Without keys, it would be difficult to retrieve, relate, or update data accurately.

What is a Key?

A key is one or more attributes (columns) that can uniquely identify a record in a table.
Different types of keys serve different purposes in database design.

Types of Keys in RDBMS

Super Key
A super key is any set of attributes that can uniquely identify a tuple (row) in a table.
It may contain extra (redundant) attributes.
Example: {Student_ID}, {Student_ID, Email}
Candidate Key
A minimal super key — meaning it uniquely identifies a record, and if you remove any attribute, it no longer does.
A table can have multiple candidate keys.
Example: {Student_ID}, {Email} (if both are unique)
Primary Key
One of the candidate keys is selected as the primary key.
It cannot have NULL values and must be unique for each row.
Example: Student_ID is chosen as the primary key
Alternate Key
A candidate key not chosen as the primary key is called an alternate key.
Example: If Email is a candidate key but not the primary key, it’s an alternate key.
Composite Key
A key that consists of two or more attributes that together uniquely identify a record.
Example: In a Student_Course table, the combination of Student_ID and Course_ID can form a composite key.
Foreign Key
A foreign key is an attribute in one table that refers to the primary key of another table.
It is used to establish a relationship between two tables.
Example: Department_ID in the Employee table refers to the Department table.

Summary Table

Key Type	Description	Example
Super Key	Any set of attributes that uniquely identifies a row	`{Student_ID, Email}`
Candidate Key	Minimal super key (no redundant attributes)	`{Student_ID}`, `{Email}`
Primary Key	Chosen candidate key; cannot be null	`Student_ID`
Alternate Key	Candidate key not selected as primary key	`Email`
Composite Key	Key formed by combining two or more attributes	`{Student_ID, Course_ID}`
Foreign Key	Attribute linking to primary key of another table	`Department_ID` in `Employee`

Normalization & Normal Forms

When a database is poorly structured, it can suffer from problems like data redundancy, update anomalies, and inconsistency. This is where normalization comes in.

Normalization?

Normalization is a process in relational database design that organizes data into well-structured tables to minimize redundancy and avoid anomalies.
It involves dividing a large table into smaller, related tables and defining relationships between them.

Why Normalize?

To eliminate duplicate data
To reduce insertion, update, and deletion anomalies
To maintain data integrity
To make the database easier to maintain

Normal Forms

Normalization is done through a series of normal forms — each with specific rules. Let's understand the first three main normal forms (commonly used in practice):

First Normal Form (1NF)
- A table is in 1NF if:
- All values in each column are atomic (indivisible)
- There are no repeating groups or arrays
- Example: A student should not have multiple phone numbers stored in a single column like 12345, 67890. Instead, store them in separate rows.
Second Normal Form (2NF)
- A table is in 2NF if:
- It is already in 1NF
- Every non-prime attribute is fully functionally dependent on the entire primary key
- Mainly applies to composite keys
- Removes partial dependency
- Example: In a Student_Course table with primary key (Student_ID, Course_ID), Student_Name should not depend only on Student_ID.
Third Normal Form (3NF)
- A table is in 3NF if:
- It is already in 2NF
- There is no transitive dependency between non-key attributes
- Removes dependencies like: A → B, B → C, so A → C (which is transitive)
- Example: In a table where Student_ID → Department_ID and Department_ID → Department_Name, you should split the tables to avoid storing Department_Name repeatedly.
Boyce-Codd Normal Form (BCNF)
- A stricter version of 3NF
- Every determinant must be a candidate key

Summary

Normal Form	Key Rules	Purpose
1NF	Atomic values, no repeating groups	Basic structure
2NF	No partial dependency on composite keys	Eliminates redundancy
3NF	No transitive dependency among non-key attributes	Ensures clear relationships
BCNF	Every determinant is a candidate key	Advanced, strict 3NF

Indexing in RDBMS

As your database grows, searching for data can become slow. This is where indexing comes into play. It helps make data retrieval faster — just like how an index in a book helps you find topics quickly.

What is Indexing?

Indexing is a database technique used to speed up data retrieval operations on a table.
An index is a data structure (usually a B-tree or hash) that allows the database to find rows more quickly without scanning the entire table.

Why Use Indexing?

Speeds up SELECT queries significantly
Helps in sorting and searching large datasets
Improves performance when using WHERE, JOIN, ORDER BY, etc.

Real-Life Analogy

Think of a database table as a book and an index as the table of contents. Without an index, you would have to flip through every page to find what you need. With an index, you can go directly to the right page.

Types of Indexes

Primary Index
Created automatically on the primary key. Ensures uniqueness and fast lookup.
Secondary Index
Created manually on non-primary key columns to improve performance.
Composite Index
An index on multiple columns. Useful when queries filter on more than one column.
Clustered Index
Sorts the actual data rows according to the index. A table can have only one clustered index.
Non-Clustered Index
Stores pointers to the data instead of sorting it. A table can have multiple non-clustered indexes.

Things to Keep in Mind

Indexes speed up reads but can slow down writes (INSERT, UPDATE, DELETE)
Over-indexing can waste storage and affect performance
Use indexing strategically for frequently queried columns

Summary

Type of Index	Description	Use Case
Primary Index	Automatically created on primary key	Uniquely identify rows
Secondary Index	Manually created on other columns	Improve query speed
Composite Index	On multiple columns	Queries involving multiple conditions
Clustered Index	Sorts actual data rows	Faster range queries
Non-Clustered Index	Stores pointers to rows	Multiple fast lookups on big tables

Thanks For Reading 🤗

DEV Community

Diving into RDBMS & Data Integrity

What is RDBMS?

Relational Terminologies

Properties of Relational tables

Data Anomalies

Why Do Data Anomalies Occur?

Types of Data Anomalies

How to Solve Data Anomalies?

Functional Dependencies (FDs)

What is a Functional Dependency?

Simple Example

Why Functional Dependencies Matter

Attribute Closure

Step-by-Step Example

General Algorithm to Find Closure

Armstrong’s Axioms

Why Do We Need Armstrong’s Axioms?

Core Armstrong’s Axioms (3 Basic Rules)

Additional (Derived) Rules

Summary Table

When to Use Armstrong’s Axioms

Keys in RDBMS

What is a Key?

Types of Keys in RDBMS

Summary Table

Normalization & Normal Forms

Normalization?

Why Normalize?

Normal Forms

Summary

Indexing in RDBMS

What is Indexing?

Why Use Indexing?

Real-Life Analogy

Types of Indexes

Things to Keep in Mind

Summary

Top comments (0)