Siddharth Shanker

Posted on May 12, 2023 • Edited on May 15, 2023

SQL Concepts

Transaction

A transaction is a single logical unit of work that comprises one or more database operations.
Transaction typically consists of several operations that read or modify data in a database.

ACID Properties

ACID which stands for Atomicity, Consistency, Isolation and Durability are the key attributes that define a transaction.

Atomicity(A):

Entire transaction takes place at once or doesn't happen at all.
Transaction do not occur partially, i.e each transaction is considered as single unit.

There are two operations involved:
Abort :- Aborting a transaction leads to no changes being visible which were made to a database.
Commit:- Changes would be visible after commiting a transaction.

Example

Consider a transaction T which consists of T1 and T2: Task is to transfer 100 from account X to account Y

T1: Deduct the amount from account X
T2: Credit the amount to account Y

If the transaction fails after the completion of T1 and before completion of T2 then the amount will be deducted from account X but it will not be added to the account Y which ultimately results in an inconsistent database state.

Consistency( C ) :-

This property ensures that a transaction brings the database from one valid state to another valid state. In other words, the database must be consistent both before and after a transaction is executed.

Isolation (I):-

This property ensures that concurrent transactions do not interfere with each other. Each transaction must execute in isolation from other transactions, and the results of one transaction must not affect the results of another transaction.

Durability( D ):-

This property ensures that once a transaction is committed, its changes are permanently stored in the database, even in the event of a system failure or power outage. The changes made by a committed transaction should be durable and survive any system crash or failure.

CAP Theorem

CAP theorem suggests that it is impossible for a distributed database system to simultaneously provide consistency, availability, and partition tolerance.
In a distributed database system, there is always a trade-off between consistency and availability when it comes to handling network partitions.
CAP Theorem is a concept that a distributed database system can only have 2 of the 3: Consistency, Availability and Partition Tolerance.

Consistency :- Every read operation on the system should return the most recent value or an error. In other words, all nodes in the system see the same data at the same time.

A system has consistency if a transaction starts with the system in a consistent state, and ends with the system in a consistent state.

Availability :- Every request sent to a non-failing node in the system should receive a response, without any guarantee about the freshness of the data. In other words, the system should always be responsive and accessible.

Achieving availability in a distributed system requires that the system remains operational 100% of the time.

Partition Tolerance :- The system should continue to function even if there are network failures that cause communication delays or data loss between nodes in the system.

A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network.

Joins

This is a query operation that combines rows from two or more tables based on a related column or set of columns between them.

Inner Join :

This join returns only the matching rows from both tables based on a common column or set of columns.
It is like an intersection operation in Set Theory.

Left Join:

This join returns all the rows from the left table and matching rows from the right table. If there are no matching rows in the right table, then the result will contain NULL values.

Right Join:

Same as Left join but this join returns all rows from the right table and matching rows from the left table.

Full Outer Join:

This join returns all the rows from both tables and NULL values for the non-matching rows.

Cross Join:

This type of join returns the Cartesian product of both tables, which means every row from the left table is joined with every row from the right table.

Aggregations, Filters in queries:

Aggregations

Aggregation is a query operation
Computes a summary value or statistic for a group of records.
Used to generate reports from large datasets.

Types of aggregations:-

COUNT: Returns the number of rows in a table or a subset of a table that match a specific condition.
SUM: Returns the sum of a specific column in a table or a subset of a table that matches a specific condition.
AVG: Returns the average value of a specific column in a table or a subset of a table that matches a specific condition.
MAX: Returns the maximum value of a specific column in a table or a subset of a table that matches a specific condition.
MIN: Returns the minimum value of a specific column in a table or a subset of a table that matches a specific condition.

These are typically used with the GROUP BY clause.

Filters:

Used to retrieve relevant information from large datasets based on specific criteria.
Can be applied using the WHERE clause in queries.
Combined using logical operators such as AND, OR, and NOT to create more complex conditions.

SELECT * FROM orders
WHERE order_total > 100

Types of filters:

Comparison Filters: Compare a column or an expression with a specific value or another column.
Comparison operators include "=", "<", ">", "<=", ">=", and "<>".
Range Filters: Fetches records that fall within a specific range of values.
Range operators include "BETWEEN" and "NOT BETWEEN".
Pattern matching Filters: Matches a pattern or substring within a column.
Pattern matching operators include "LIKE", "NOT LIKE"
Null Filter: Retrieves records with NULL values in specific column.
IS NULL and IS NOT NULL are used.
Logical Filters : Combine multiple conditions using logical operators such as "AND", "OR", and "NOT".

Normalization

Process of organizing data in a database to reduce data redundancy and dependency.
Objective is to minimize data duplication and inconsistency.
Improves data integrity, reduces storage requirements, and improves database performance.

Normalization divides a database into two or more tables and defines relationships between them.

Each table represents a distinct entity or concept, and the relationships between the tables are based on the attributes that they share.

1NF: A relation is in 1NF if all its attributes have an atomic value.
2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully functional dependent on the candidate key in DBMS.
3NF: A relation is in 3NF if it is in 2NF and there is no transitive dependency.
BCNF: A relation is in BCNF if it is in 3NF and for every Functional Dependency, LHS is the super key.

Indexes

Indexing makes columns faster to query by creating pointers to where data is stored within a database.
An index is a structure that holds the field the index is sorting and a pointer from each record to their corresponding record in the original table where the data is actually stored.

Transactions

Can be defined as a group of tasks.
A single task is the minimum processing unit which cannot be divided further.

States of Transactions

A transaction in a database can be in one of the following states −

Active − In this state, the transaction is being executed. This is the initial state of every transaction.
Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed state.
Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system fails. A failed transaction can no longer proceed further.
Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery manager rolls back all its write operations on the database to bring the database back to its original state where it was prior to the execution of the transaction.
Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects are now permanently established on the database system.

Locking mechanism

Mechanism that ensures that the integrity of data is maintained. It locks the data while a transaction is running
Any transaction cannot read or write the data until it acquires the appropriate lock.

1. Shared Lock(S): Shared lock is placed when we are reading the data, multiple shared locks can be placed on the data but when a shared lock is placed no exclusive lock can be placed.

2. Exclusive Lock(X): Exclusive lock is placed when we want to read and write the data. This lock allows both the read and write operation, Once this lock is placed on the data no other lock (shared or Exclusive) can be placed on the data until Exclusive lock is released.

Database Isolation Levels

The SQL standard defines four isolation levels.

Read Uncommitted

The lowest isolation level allowed by SQL is read uncommitted. In this level, transactions can read data that is not committed by other transactions, permitting dirty read.

Read Committed

This isolation level allows for the reading of data after it is committed by a transaction. This means no dirty reads are possible. This isolation is unable to prevent non-repeatable reads.

Repeatable Read

This isolation tries to improve on the previous isolation level by preventing both dirty reads and non-repeatable reads. This prevention is done by applying locks on rows that are read and rows that perform write operations.

Serializable

The highest isolation level allowed is serializable. This isolation level looks like a serial execution, with almost no concurrency. All the three read phenomena are prevented in this isolation level, but it compromises concurrent execution.

Triggers

A trigger is a stored procedure in database which automatically invokes whenever a special event in the database occurs.
For example, a trigger can be invoked when a row is inserted into a specified table or when certain table columns are being updated. In Structured Query Language, triggers are called only either before or after the below events:

INSERT Event: This event is called when the new row is entered in the table.
UPDATE Event: This event is called when the existing record is changed or modified in the table.
DELETE Event: This event is called when the existing record is removed from the table.

DEV Community