DEV Community

Aayush Singh
Aayush Singh

Posted on

Diving into RDBMS & Data Integrity

This blog is the second part of my series on learning Database Management Systems (DBMS). In the first blog, I covered the basics of DBMS, and in this post, we will delve into the details of Relational Database Management Systems (RDBMS). I highly encourage you to read my previous blog before continuing with this one. So let’s get started!😁


What is RDBMS?

A Relational Database Management System (RDBMS) is a type of database management system that stores data in the form of tables (also called relations). Each table consists of rows (records) and columns (attributes), and the data is organized so that relationships between different tables can be easily established using keys.

Real-life Analogy

Think of an RDBMS like an Excel workbook where each sheet (table) holds data about a specific topic (like students, courses, teachers). You can link different sheets using common columns like Student_ID or Course_ID — this linking is what makes it "relational."

Key Features of RDBMS

  • Uses tables to store structured data
  • Supports primary keys and foreign keys to enforce data integrity
  • Allows SQL querying for data manipulation
  • Supports normalization to remove redundancy
  • Ensures data consistency and integrity

Relational Terminologies

Before diving deeper into relational databases, it's important to understand some basic terms used in RDBMS. These terms describe how data is structured and organized inside a relational database.

  • Relation (Table)
    A relation is simply a table in an RDBMS. It represents a collection of related data entries and consists of rows and columns. Each relation has a unique name and stores data about a particular type of object or entity.
    Example: A Student table is a relation that holds data about students.

  • Tuple (Row / Record)
    A tuple is a single row in a table. It represents one complete set of related data or a record.
    Example: A single row in the Student table that stores data of one student is a tuple.

  • Cardinality
    Cardinality refers to the number of tuples (rows) in a relation.
    Example: If there are 50 students in the Student table, its cardinality is 50.

  • Attribute (Column /Field)
    An attribute is a column in a table. It describes a specific property or characteristic of the data stored in each row. Attributes are also referred to as fields.
    Example: In a Student table, columns like Student_ID, Name, and Age are attributes.

  • Domain
    A domain refers to the set of valid values an attribute can take. For example, the domain of the Age attribute could be all integer values from 15 to 100.

  • Degree
    Degree is the number of attributes (columns) in a relation.
    Example: If the Student table has 4 columns (Student_ID, Name, Age, Email), then its degree is 4.


Properties of Relational tables

  1. Cells contains atomic values
  2. Values in a column are of the same kind
  3. Each row is unique
  4. No two tables can have the same name in a relational schema.
  5. Each column has a unique name
  6. The sequence of rows is insignificant
  7. The sequence of columns is insignificant.

Data Anomalies

When a database is not properly structured, it can lead to data anomalies — unexpected issues that arise when inserting, updating, or deleting data. These problems often occur when all the data is stored in a single large table instead of being divided into related tables.

Why Do Data Anomalies Occur?

Data anomalies typically occur when:

  • A table has repeated data (redundancy)
  • Multiple types of data are stored together (like student + course + teacher in one table)
  • The database is not normalized, meaning it hasn't been structured properly into separate tables with relationships

Types of Data Anomalies

  • Insertion Anomaly
    Occurs when you can’t insert data into a table without inserting something else unnecessarily.
    Example: If a new course is being introduced but no students have enrolled yet, you can't insert the course without leaving student-related fields blank.

  • Update Anomaly
    Occurs when the same piece of data is stored in multiple places, and not all copies are updated correctly.
    Example: If a teacher's name is updated in one row but not in others, the database now holds inconsistent data.

  • Deletion Anomaly
    Occurs when deleting one piece of data accidentally removes other useful information.
    Example: If the last student enrolled in a course is deleted from the table, the course information may also get deleted.

How to Solve Data Anomalies?

The most effective way to solve data anomalies is by using Normalization. It is a process that involves:

  • Splitting large tables into smaller, related tables
  • Removing redundancy
  • Organizing data into logical groupings

We’ll explore normalization and normal forms (1NF, 2NF, 3NF, BCNF) in upcoming sections of this blog.


Functional Dependencies (FDs)

In relational database design, Functional Dependency (FD) is a very important concept that helps us understand how attributes (columns) relate to one another. It forms the foundation of normalization.

What is a Functional Dependency?

A Functional Dependency exists when the value of one attribute uniquely determines the value of another attribute in a relation (table).

Let’s say we have a relation R, and attributes A and B. We say:
A → B (A functionally determines B)
This means: if two rows have the same value for attribute A, they must have the same value for attribute B.

Simple Example

Consider a Student table:

Student_ID Name Course
101 Alice DBMS
102 Bob OS
101 Alice DBMS

Here:

  • Student_ID → Name Because if Student_ID is the same, the Name must also be the same.
  • But Name → Student_ID is not valid, because two students can have the same name.

Why Functional Dependencies Matter

  • They help us find candidate keys
  • They guide table splitting in normalization
  • They help eliminate data redundancy and anomalies

Attribute Closure

Attribute closure of a set of attributes X, denoted as X⁺, is the set of all attributes that can be functionally determined from X using the given set of functional dependencies.

  • It helps you check if a set of attributes is a super key
  • It helps you determine the minimal cover of FDs
  • It is used in finding candidate keys during schema design

Step-by-Step Example

Let’s say we have a relation R(A, B, C, D)
And the functional dependencies (FDs) are:

  1. A → B
  2. B → C
  3. A → D

Let’s find the closure of A (A⁺):

Step 1: Start with A⁺ = {A}
Step 2: A → B → add B ⇒ A⁺ = {A, B}
Step 3: B → C → add C ⇒ A⁺ = {A, B, C}
Step 4: A → D → add D ⇒ A⁺ = {A, B, C, D}

So, A⁺ = {A, B, C, D}

That means A is a super key, because it determines all attributes in the relation.

General Algorithm to Find Closure

  1. Start with X⁺ = X (initial set)
  2. Repeatedly apply all FDs where:
  • If Y → Z and Y ⊆ X⁺, then add Z to X⁺
    1. Stop when no more attributes can be added

Armstrong’s Axioms

Armstrong’s Axioms are a set of rules or inference laws used to derive all valid functional dependencies from a given set. These rules are sound and complete, meaning they will only derive correct FDs and can derive all correct FDs.

They form the theoretical foundation for reasoning about functional dependencies in relational databases.

Why Do We Need Armstrong’s Axioms?

  • To generate new functional dependencies from a known set
  • To prove whether a particular FD holds
  • To simplify and minimize functional dependencies
  • To help in finding candidate keys and in normalization

Core Armstrong’s Axioms (3 Basic Rules)

  1. Reflexivity
    If Y is a subset of X, then X → Y
    Example: If X = {A, B}, then X → A and X → B are valid

  2. Augmentation
    If X → Y, then XZ → YZ (add same attributes to both sides)
    Example: If A → B, then AC → BC

  3. Transitivity
    If X → Y and Y → Z, then X → Z
    Example: If A → B and B → C, then A → C

Additional (Derived) Rules

Using the core axioms, we can derive more useful rules:

  • Union: If X → Y and X → Z, then X → YZ
  • Decomposition: If X → YZ, then X → Y and X → Z
  • Pseudo-Transitivity: If X → Y and YZ → W, then XZ → W

These are not part of the original axioms but can be proved using them.

Summary Table

Axiom Rule Description Example
Reflexivity If Y ⊆ X, then X → Y AB → A
Augmentation If X → Y, then XZ → YZ A → B ⟹ AC → BC
Transitivity If X → Y and Y → Z, then X → Z A → B, B → C ⟹ A → C
Union If X → Y and X → Z, then X → YZ A → B, A → C ⟹ A → BC
Decomposition If X → YZ, then X → Y and X → Z A → BC ⟹ A → B, A → C
Pseudo-Transitivity If X → Y and YZ → W, then XZ → W A → B, BC → D ⟹ AC → D

When to Use Armstrong’s Axioms

  • During attribute closure calculation
  • To simplify or minimize a set of FDs
  • To prove whether an FD logically follows from a given set

Keys in RDBMS

In relational databases, keys are crucial because they help uniquely identify rows (tuples) and maintain data integrity. Without keys, it would be difficult to retrieve, relate, or update data accurately.

What is a Key?

A key is one or more attributes (columns) that can uniquely identify a record in a table.
Different types of keys serve different purposes in database design.

Types of Keys in RDBMS

  • Super Key
    A super key is any set of attributes that can uniquely identify a tuple (row) in a table.
    It may contain extra (redundant) attributes.
    Example: {Student_ID}, {Student_ID, Email}

  • Candidate Key
    A minimal super key — meaning it uniquely identifies a record, and if you remove any attribute, it no longer does.
    A table can have multiple candidate keys.
    Example: {Student_ID}, {Email} (if both are unique)

  • Primary Key
    One of the candidate keys is selected as the primary key.
    It cannot have NULL values and must be unique for each row.
    Example: Student_ID is chosen as the primary key

  • Alternate Key
    A candidate key not chosen as the primary key is called an alternate key.
    Example: If Email is a candidate key but not the primary key, it’s an alternate key.

  • Composite Key
    A key that consists of two or more attributes that together uniquely identify a record.
    Example: In a Student_Course table, the combination of Student_ID and Course_ID can form a composite key.

  • Foreign Key
    A foreign key is an attribute in one table that refers to the primary key of another table.
    It is used to establish a relationship between two tables.
    Example: Department_ID in the Employee table refers to the Department table.

Summary Table

Key Type Description Example
Super Key Any set of attributes that uniquely identifies a row {Student_ID, Email}
Candidate Key Minimal super key (no redundant attributes) {Student_ID}, {Email}
Primary Key Chosen candidate key; cannot be null Student_ID
Alternate Key Candidate key not selected as primary key Email
Composite Key Key formed by combining two or more attributes {Student_ID, Course_ID}
Foreign Key Attribute linking to primary key of another table Department_ID in Employee

Normalization & Normal Forms

When a database is poorly structured, it can suffer from problems like data redundancy, update anomalies, and inconsistency. This is where normalization comes in.

Normalization?

Normalization is a process in relational database design that organizes data into well-structured tables to minimize redundancy and avoid anomalies.
It involves dividing a large table into smaller, related tables and defining relationships between them.

Why Normalize?

  • To eliminate duplicate data
  • To reduce insertion, update, and deletion anomalies
  • To maintain data integrity
  • To make the database easier to maintain

Normal Forms

Normalization is done through a series of normal forms — each with specific rules. Let's understand the first three main normal forms (commonly used in practice):

  • First Normal Form (1NF)

    • A table is in 1NF if:
    • All values in each column are atomic (indivisible)
    • There are no repeating groups or arrays
    • Example: A student should not have multiple phone numbers stored in a single column like 12345, 67890. Instead, store them in separate rows.
  • Second Normal Form (2NF)

    • A table is in 2NF if:
    • It is already in 1NF
    • Every non-prime attribute is fully functionally dependent on the entire primary key
    • Mainly applies to composite keys
    • Removes partial dependency
    • Example: In a Student_Course table with primary key (Student_ID, Course_ID), Student_Name should not depend only on Student_ID.
  • Third Normal Form (3NF)

    • A table is in 3NF if:
    • It is already in 2NF
    • There is no transitive dependency between non-key attributes
    • Removes dependencies like: A → B, B → C, so A → C (which is transitive)
    • Example: In a table where Student_ID → Department_ID and Department_ID → Department_Name, you should split the tables to avoid storing Department_Name repeatedly.
  • Boyce-Codd Normal Form (BCNF)

    • A stricter version of 3NF
    • Every determinant must be a candidate key

Summary

Normal Form Key Rules Purpose
1NF Atomic values, no repeating groups Basic structure
2NF No partial dependency on composite keys Eliminates redundancy
3NF No transitive dependency among non-key attributes Ensures clear relationships
BCNF Every determinant is a candidate key Advanced, strict 3NF

Indexing in RDBMS

As your database grows, searching for data can become slow. This is where indexing comes into play. It helps make data retrieval faster — just like how an index in a book helps you find topics quickly.

What is Indexing?

Indexing is a database technique used to speed up data retrieval operations on a table.
An index is a data structure (usually a B-tree or hash) that allows the database to find rows more quickly without scanning the entire table.

Why Use Indexing?

  • Speeds up SELECT queries significantly
  • Helps in sorting and searching large datasets
  • Improves performance when using WHERE, JOIN, ORDER BY, etc.

Real-Life Analogy

Think of a database table as a book and an index as the table of contents. Without an index, you would have to flip through every page to find what you need. With an index, you can go directly to the right page.

Types of Indexes

  • Primary Index
    Created automatically on the primary key. Ensures uniqueness and fast lookup.

  • Secondary Index
    Created manually on non-primary key columns to improve performance.

  • Composite Index
    An index on multiple columns. Useful when queries filter on more than one column.

  • Clustered Index
    Sorts the actual data rows according to the index. A table can have only one clustered index.

  • Non-Clustered Index
    Stores pointers to the data instead of sorting it. A table can have multiple non-clustered indexes.

Things to Keep in Mind

  • Indexes speed up reads but can slow down writes (INSERT, UPDATE, DELETE)
  • Over-indexing can waste storage and affect performance
  • Use indexing strategically for frequently queried columns

Summary

Type of Index Description Use Case
Primary Index Automatically created on primary key Uniquely identify rows
Secondary Index Manually created on other columns Improve query speed
Composite Index On multiple columns Queries involving multiple conditions
Clustered Index Sorts actual data rows Faster range queries
Non-Clustered Index Stores pointers to rows Multiple fast lookups on big tables

Thanks For Reading 🤗

Top comments (0)