AJITH D R

Posted on Feb 22, 2023

SQL Concepts

ACID Properties

ACID properties are a set of properties that ensures the consistent state before and after the transaction.

1. Atomicity:

Atomicity property says that, when an action is performed on the data, it should either be executed completely or not executed at all.

Suppose there's an application that transfers funds between accounts. In this case, the property of atomicity guarantees that when a withdrawal is made from one account, the corresponding deposit is also made to the other account, ensuring a successful transaction. If any part of the transaction fails due to insufficient funds, the entire transaction is rolled back to its initial state.

2. Consistency:

The consistency property of a transaction ensures that the data is in a valid state both before the transaction begins and after it completes.

For instance, when performing a fund transfer from one account to another, the consistency property guarantees that the total amount of funds in both accounts remains constant at the beginning and end of each transaction.

3. Isolation:

In a system where multiple transactions are running simultaneously, the intermediate phase of each transaction is not visible to other transactions. This implies that the concurrent transactions seem to be executed sequentially.

For instance, suppose an application that transfers funds from one account to another. In that case, the isolation property guarantees that any other transaction will only see the transferred funds in one account or the other, but not in both accounts or neither of them.

4: Durability:

Once a transaction has been executed successfully, any modifications made to the data are permanent and will not be reverted, even if there is a system malfunction.

For instance, consider an application that transfers money from one account to another. In such a scenario, the durability property ensures that any modifications made to the accounts involved in the transaction will be preserved and will not be rolled back.

CAP Theorem

CAP theorem states that, In a distributed systems, it is not possible to guarantee all three properties - consistency, availability, and partition tolerance at the same time.

Consistency means that all nodes will have same data at the same time. Every client will read the same data no matter what node they are connected to.
Availability ensures that even if one or more nodes are down, any client making a request to a node for data will gets a response.
Partition tolerance means that the system must continue working even if there is a communication break within the system.
Since the networks are not completely reliable, we must tolerate partitions in a distributed systems. According to CAP theorem, we can have only two properties at the same time, so with partition tolerance we can have consistency or availability.
CP: CP based system provides consistency and partition tolerance. When a communication break occurs between any two nodes, the system has to make the non-consistent node unavailable till the partition is resolved.
AP: AP based systems provides availability and partition tolerance.
When a communication break occurs, all nodes will be available to the clients but non-consistent nodes might return older data than other nodes.

Joins

Joins in SQL is used to combine columns of two or more tables based on the common columns between the tables being joined.
There are various types of joins:

Inner join
Left join
Right join
Full join
Self join
Cross join

The examples that I have used here are sourced from here.
Consider the following two tables:
Table 1 - CUSTOMERS Table.

Table 2 - ORDERS Table.

INNER JOIN:

INNER JOIN is also referred to as EQUIJOIN. It joins all rows from the intended tables if they meet the stated condition.
If we want to get all the records from two tables that have common customer id, we can get it using inner join.

SELECT  ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
INNER JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

This will produce the following output:

LEFT JOIN:

Left Join returns all the records from the Left Table and only the matching records from the right table.

SELECT  ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
LEFT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

RIGHT JOIN:

Right Join returns all the records from the Right table and only the matching records from the left table.

SELECT  ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

FULL JOIN:

The SQL FULL JOIN combines the results of both left and right joins.

SELECT  ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
FULL JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

SELF JOIN:

The SELF JOIN is used to join a table to itself. This means that each row in a table is joined to itself and every other row in that table.

SELECT  a.ID, b.NAME, a.SALARY
FROM CUSTOMERS a, CUSTOMERS b
WHERE a.SALARY < b.SALARY;

Cross Join:

Cross Join returns the cartesian product of the two tables. That means, it produces a paired combination of each row of the first table with each row of the second table.

SELECT  ID, NAME, AMOUNT, DATE
FROM CUSTOMERS, ORDERS;

Aggregations, Filters in queries:

Aggregate function operates on a set of values and returns a single value.

Consider the following table 'Random_numbers':

numbers
12
10
13
97
100

Commonly used SQL Aggregate functions are:

COUNT(): This function returns the number of items in a set.

SELECT COUNT(numbers) FROM Random_numbers;

-- output: 5

AVG(): This function returns the average of a set.

SELECT AVG(numbers) FROM Random_numbers;

-- output: 46.4

SUM(): This function returns the sum of all values in a set.

SELECT SUM(numbers) FROM Random_numbers;

-- output: 232

MIN(): This function returns the minimum value in a set.

SELECT MIN(numbers) FROM Random_numbers;

-- output: 10

MAX(): This function returns the maximum value in a set.

SELECT MAX(numbers) FROM Random_numbers;

-- output: 100

Filters:

AND:

The AND operator allows the existence of multiple conditions in an SQL statement WHERE clause

SELECT CustomerID, InvoiceId, Total
FROM Invoice
WHERE Total >= 10 AND Total =< 13;

OR:

The OR operator combines multiple conditions in an SQL statement’s WHERE clause.

SELECT FirstName, LastName, City 
FROM Customer 
WHERE City = 'London' OR City ='Paris' OR City='Milan';

NOT:

The NOT operator reverses the meaning of the logical operator with which it is used; eg, NOT BETWEEN, NOT IN.

SELECT FirstName, LastName, City
FROM Customer
WHERE City NOT IN ('London', 'Paris', 'Milan');

BETWEEN:

The BETWEEN operator searches for values that are within a set of values, given the minimum value and the maximum value.

SELECT CustomerID, InvoiceId, Total
FROM Invoice
WHERE Total BETWEEN 10 AND 13;

IN:

The IN operator compares a value with a list of literal values that have been specified.

SELECT FirstName, LastName, City 
FROM Customer 
WHERE City IN ('London', 'Paris', 'Milan');

LIKE:

The LIKE operator compares a value with similar values, using wildcard operators.

SELECT Composer, Name 
FROM Track 
WHERE Name LIKE '%Rock%';

Normalization

Normalization helps to reduce or remove data redundancy in a database. It is important to remove redundancy since it causes anomalies in a database which makes it hard to maintain it.

There are four types of normal forms:

First Normal Form (1NF)
For a table to be in the First Normal Form,

It should only have atomic valued attributes.
Values stored in a column should be of the same domain.
All the columns in a table should have unique names.
And the order in which data is stored, does not matter.

Second Normal Form (2NF)
For a table to be in the Second Normal Form,

It should be in the First Normal Form.
And, it should not have Partial Dependency(Non prime attribute(s) should not depend on part of the candidate key).

Third Normal Form (3NF)
For a table to be in the Third Normal Form,

It should be in the Second Normal Form.
It should not have Transitive Dependency(A functional dependency X → Y in a relation R is a transitive dependency if there is a set of attributes Z that is not a subset of any key of R, and both X → Z and Z → Y hold).

Boyce-Codd Normal Form (BCNF)
For a table to be in BCNF, it must satisfy the following rules:

The table must be in the third normal form.
For every non-trivial functional dependency X -> Y, X is the superkey of the table. That means X cannot be a non-prime attribute if Y is a prime attribute. A superkey is a set of one or more attributes that can uniquely identify a row in a database table.

Indexes

Indexing is a technique used to optimize the speed of retrieving data. Indexing helps to speed up queries by allowing the database to quickly find the relevant data.

For example, consider a table called 'Employees' which contains information about employees, such as their firstname, lastname, emailId, Salary, location. Now if we want to search for all the employees who live in a particular location, the DBMS will scan the entire table to find the required records which makes it slower.
But, if we create an index on the location, the DBMS use that index to quickly find the required records.
Example to create an index:

CREATE INDEX employees_by_location
ON Employees (location)

Transactions

A Transaction is a set of operations performed on a database, which is treated as a single operation.
A transaction can include multiple operations on the database, but if any one of those operations fails, the entire transaction is rolled back, and the database returns to its original state.
Transaction has four properties which maintains consistency in a database, before and after transaction: Atomicity, Consistency, Isolation and Durability.

Locking Mechanism

The locking mechanism allows the database to produce sequential output without sequential steps. Locks provide a way to protect data in use from anomalies, such as data loss or additional data that may be added due to a lost transaction.

Two-Phase Locking Protocol
Each transaction locks and unlocks items in two different steps.

Growth Phase - All locks are executed in this phase. Locks are not released after the second phase (compact phase) starts after all changes have been committed.

Compact Phase - In this phase no locks are issued, all changes to the data item are saved and then the locks are released.

In the grow phase, a transaction reaches a point where it has already acquired all the locks it needs. This point is called the LOCK POINT.

After reaching the blocking point, the transaction enters the compression phase.

Database Isolation Levels

Database isolation levels refer to the degree to which different transactions in a database are isolated from one another.

Read Uncommitted(Level 0): This isolation level allows transactions to read uncommitted data that has not yet been written to the database. This can lead to dirty reads, where transactions read data that has been modified by another transaction but has not yet been committed.
Read Committed (Level 1): This isolation level prevents other transactions from writing to or reading from rows that other transactions have written but have not yet committed. So it doesn't allow dirty reading. A transaction holds a read or write lock on the current row, so other transactions cannot read, update, or delete that row.
Repeatable Read (Level 2): This isolation level ensures that any transaction that reads data from a row blocks other write transactions accessing the same row. This is the most restrictive isolation level and holds read locks on all rows it references and holds write locks on all rows it inserts, updates, or deletes. No other transaction can read, update, or delete these rows, preventing non-repeatable reads.
Serializable (Level 3): This isolation level is the highest isolation level. A serializable isolation level requires much more than restricting access to a single row. Typically, this isolation mode locks the entire table to prevent other transactions from inserting or reading data from the table.

Triggers

Trigger are the SQL statements that runs automatically in response to certain events or changes to a table. Triggers can be used to perform actions such as updating a table or sending an email whenever specific data is inserted, updated, or deleted from a table.

Syntax:

create trigger Trigger_name
(before | after)
[insert | update | delete] 
on [table_name]  
[for each row]  
[trigger_body]

"CREATE TRIGGER" - Used to make a new trigger.
"TRIGGER_NAME" - Name of the trigger, and it has to be unique.
"BEFORE" or "AFTER" - Specifies when the trigger should happen, either before or after an event.
"INSERT", "UPDATE", or "DELETE" - Events that can activate the trigger.
"ON [TABLE_NAME]" - Specifies the table that the trigger is connected to.
"FOR EACH ROW" - means the trigger is triggered for each row in the table.
"TRIGGER_BODY" contains the SQL queries that the trigger will execute when it's activated.

Here's a simple SQL trigger example that automatically updates a "last_updated" column in a table called "orders" whenever a new row is inserted into the table:

CREATE TRIGGER update_last_updated
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
   UPDATE orders
   SET last_updated = NOW()
   WHERE id = NEW.id;
END;

DEV Community