SQL Concepts
ACID
- ACID refers to a set of principles that ensure reliable data transactions
- These are the properties of a transaction
- Transaction is a logical group of one or more SQL statements
Atomicity:
- Either all or none are applied to a database Example: In a banking system for a money transfer two operations have to be done like debit and credit in this scenario two operations or none should be done
Consistency:
- Transactions always leave the database in a consistent state
Isolation:
- When multiple transactions are being simultaneously executed and in parallel in a database system, then the carrying out and execution of the transaction would occur as if it is the only transaction that exists in the system.
- None of the transactions would affect any other transaction’s existence.
Durability:
- Once a transaction is committed, its changes are permanent and will survive any subsequent system failures, even power loss or crashes.
CAP Theorem: It states that in a distributed system, it is impossible to simultaneously achieve Consistency, Availability, and partition tolerance
- According to the above statement we have to give up something to gain something
Consistency:
Every read operation from the database returns the most recent write or an error. In other words, all nodes in the system see the same data at the same time.
Availability:
Every request to the database receives a response, without guaranteeing that it contains the most recent version of the data. In other words, the system is always up and running, even in the face of network partitions, hardware failures, or other forms of degradation.
Partition tolerance:
The system continues to operate even when network partitions occur, which means that some nodes in the system are unable to communicate with others.
Examples
- Trade-off between consistency and availability. Choosing to prioritize consistency may mean sacrificing some level of availability, while prioritizing availability may mean sacrificing some level of consistency.
- Another example of a trade-off in database management systems is between performance and data durability. Often, improving performance can come at the cost of reducing data durability. This is because some optimizations, such as buffering or caching data in memory, may increase performance but also increase the risk of data loss in the event of a system failure.
Joins:
JOIN is a clause to combine rows from two or more tables, based on a related column between them.
Types of joins:
Natural join:
NATURAL JOIN combines the tables based on the common columns.
SELECT *
FROM table1
NATURAL JOIN table2;
Inner Join:
An inner join returns only the rows that have matching values in both tables.
SELECT column1, column2, ...
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
CROSS JOIN:
In CROSS JOIN, each row from the first table is combined with all rows in the second table. Cross Join is also called CARTESIAN JOIN
SELECT *
FROM table1
CROSS JOIN table2;
Full Join:
- FULL JOIN or FULL OUTER JOIN is the result of both RIGHT JOIN and LEFT JOIN
SELECT *
FROM table1
FULL JOIN table2
ON c1 = c2;
Cross Join
Left join:
Returns each row in the left table, and matched rows from the right table are combined. If there is no match, NULL values are assigned to the right half of the rows in the temporary table.
SELECT *
FROM table1
LEFT JOIN table2
ON table1.c1 = tabl2.c2;
Right join:
Returns each row in the right table, and matched rows from the left table are combined. If there is no match, NULL values are assigned to the left half of the rows in the temporary table.
Syntax
SELECT *
FROM table1
RIGHT JOIN table2
ON table1.c1 = table2.c2;
Aggregations, Filters in queries:
These are the function that we can apply after grouping data.
sum():
- Calculates the sum of columns for a group of rows or entire data Example
SELECT category, SUM(sales) FROM products GROUP BY category;
Avg():
- Calculates the average value of a given column for a group of rows or to entire data Example
SELECT AVG(salary) from EMPLOYEES
count():
- Counts the number of rows in a table or the number of rows in a group of rows
SELECT customer, COUNT(order_id) FROM orders GROUP BY customer;
Min():
- Returns minimum value of a given column for entire data or grouped data
SELECT department, MIN(salary) FROM employees GROUP BY department;
Max():
- Returns the maximum value of a given column for entire data or grouped data
SELECT department, Max(salary) FROM employees GROUP BY department;
Filters:
We use a where clause and having clause to filter rows in a table
- Where clause is put before grouping and having clause is written after grouping
- Inside these clauses we insert conditions Example:
select batsman,sum(runs) as runs from batsman_runs
where team='rcb'
group by batsman
having runs>2000;
Normalization:
- Normalization is a process by which database and table structures are created or modified to address inefficiencies/complexities related to data storage, data modification, and querying database tables ### Normalization can be achieved by following techniques #### 1NF Databases: A 1NF database is an atomic database. In this case, atomic means that each cell contains one value and each row is unique
// This would be a non-atomic table
| Title | Length | Type |
|-----------------|------------|------|
| Example A, B, C | 125 inches | B, C |
| Example A, B, C | 125 inches | B, C |
// This would be an atomic table
| ID | Title | Length | Type |
|----|-----------|------------|------|
| 1 | Example A | 125 inches | B |
| 2 | Example A | 125 inches | C |
Second normal form:
- In a table every non-key value should be fully dependent on a primary key Violating the second normal form
- 2NF is intended to eliminate redundancy and anomalies that can occur when data is stored in a single table.
product_id | supplier_id | product_name | supplier_name | supplier_location
-----------+-------------+--------------+---------------+-----------------
1 | 1 | Widget A | Supplier X | Location 1
2 | 1 | Widget B | Supplier X | Location 1
3 | 2 | Widget C | Supplier Y | Location 2
4 | 2 | Widget D | Supplier Y | Location 2
To normalize the above table to 2nd nf modify the above table like below
Table: products
product_id | product_name
-----------+--------------
1 | Widget A
2 | Widget B
3 | Widget C
4 | Widget D
Table: suppliers
supplier_id | supplier_name | supplier_location
------------+---------------+------------------
1 | Supplier X | Location 1
2 | Supplier Y | Location 2
Third normal form:
- This ensures that every non-key column in a table is not dependent on any other non-key column
- Each column in a table must depend only on the primary key and not on any other columns in the table.
- 3NF is intended to eliminate data redundancy and anomalies that can occur when data is stored in a single table. Violating 3rd normal form:
book_id | author_name | author_email | book_title | book_isbn | book_publisher | publisher_location
--------+-------------+--------------+------------+-----------+----------------+--------------------
1 | John Smith | john@xyz.com | Book A | 123456789 | Publisher X | New York
2 | John Smith | john@xyz.com | Book B | 987654321 | Publisher Y | London
3 | Jane Doe | jane@xyz.com | Book C | 111111111 | Publisher Z | San Francisco
To normalize the above do following
Table: books
book_id | book_title | book_isbn
--------+------------+-----------
1 | Book A | 123456789
2 | Book B | 987654321
3 | Book C | 111111111
Table: authors
author_id | author_name | author_email
----------+-------------+--------------
1 | John Smith | john@xyz.com
2 | Jane Doe | jane@xyz.com
Table: publishers
publisher_id | publisher_name | publisher_location
-------------+----------------+--------------------
1 | Publisher X | New York
2 | Publisher Y | London
3 | Publisher Z | San Francisco
- Use foreign keys to establish a relationship between tables Indexes ### Boyce-Codd Normal Form (BCNF):
- It ensures every non-key column in a table should be functionally dependent only on the primary key and not on any other columns in the table ### Violating the above technique
student_id | student_name | course_id | course_name | instructor_name | instructor_email
-----------+--------------+-----------+-------------+----------------+------------------
1 | John Smith | 1 | Course A | Jane Doe | jane@xyz.com
1 | John Smith | 2 | Course B | John Doe | john@xyz.com
2 | Jane Doe | 2 | Course B | John Doe | john@xyz.com
3 | Bob Johnson | 3 | Course C | Jane Doe | jane@xyz.com
To normalize the above data split the table into three
Table: students
student_id | student_name
-----------+--------------
1 | John Smith
2 | Jane Doe
3 | Bob Johnson
Table: courses
course_id | course_name
----------+------------
1 | Course A
2 | Course B
3 | Course C
Table: instructors
instructor_id | instructor_name | instructor_email
--------------+----------------+------------------
1 | Jane Doe | jane@xyz.com
2 | John Doe | john@xyz.com
- To establish a relationship we can create another table that will hold all details of the enrollment
Table: enrollment
student_id | course_id | instructor_id
-----------+-----------+--------------
1 | 1 | 1
1 | 2 | 2
2 | 2 | 2
3 | 3 | 1
Transactions:
- Nothing but a Set of Queries that are treated as a single unit
- Transactions follow the principles of acid Acid properties are explained above
- In PostgreSQL, transactions are started with the BEGIN statement and are ended with either the COMMIT statement to apply the changes made within the transaction Example
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 123;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 456;
COMMIT;
In the above, if both UPDATE statements execute successfully, the COMMIT statement applies the changes to the database. If an error occurs during the transaction, the ROLLBACK statement will undo the changes made within the transaction, leaving the database in its original state.
Locking mechanism:
- Locks are used to restrict users from access to the database, modifying data to ensure that transactions do not interfere with each other
- Locking ensures that each transaction can access the data it needs without interfering with other transactions that may be accessing the same data. Type of locking mechanism
1. Shared Locks:
Shared locks allow multiple transactions to read a resource simultaneously without interfering with each other. A shared lock allows other transactions to acquire shared locks but prevents them from acquiring exclusive locks.
Example:
BEGIN;
SELECT * FROM accounts WHERE account_id = 5 FOR SHARE;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 5;
COMMIT;
- The SELECT statement acquires a shared lock on the row with account_id = 5 using the FOR SHARE clause
- This ensures that other transactions can also acquire shared locks on this row, but prevents them from acquiring exclusive locks until the current transaction has been completed. ### 2.Exclusive Locks: Exclusive locks are used to ensure that only one transaction at a time can modify a resource. An exclusive lock prevents other transactions from acquiring shared or exclusive locks on the same resource. Example:
BEGIN;
SELECT * FROM accounts WHERE account_id = 3 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 3;
COMMIT;
- The SELECT statement acquires an exclusive lock on the row with account_id = 3 using the FOR UPDATE clause
- This ensures that no other transactions can modify this row until the current transaction has been completed.
3.Row-level Locks:
Row-level locks are used to lock individual rows of data. Row-level locks are useful in situations where multiple transactions need to modify different rows of the same table at the same time.
Example
BEGIN;
SELECT * FROM accounts WHERE account_id = 673 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 673;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 906;
COMMIT;
- The first UPDATE statement acquires a row-level lock on the row with account_id = 673
- This ensures that no other transactions can modify this row until the current transaction has been completed ### 4. Table-level Locks: Table-level locks are used to lock an entire table. Table-level locks are useful in situations where a transaction needs to modify all rows of a table at the same time Example
BEGIN;
LOCK TABLE accounts IN EXCLUSIVE MODE;
UPDATE accounts SET balance = 0;
COMMIT;
5.Advisory Locks:
Advisory locks are user-defined locks that are not enforced by PostgreSQL itself
- These are useful in situations where a user needs to enforce locking based on some application-specific logic. Example:
BEGIN;
SELECT pg_advisory_xact_lock(1234);
UPDATE accounts SET balance = balance - 100 WHERE account_id = 56;
COMMIT;
- The SELECT statement acquires an advisory lock with key 1234
- This ensures that no other transactions can acquire a lock with the same key until the current transaction has been completed.
Database Isolation Levels:
Database isolation defines the degree to which a transaction must be isolated from the data changes made by another transaction (although in reality there may be a large number of transactions running concurrently). The overall goal is to prevent reads and writes of temporary, aborted, or otherwise incorrect data written by concurrent transactions.
Types of isolation levels
- Read uncommitted
- Read committed
- Repeatable read
- Serializable
1. Read uncommitted:
This kind of isolation level is used when we want even the uncommitted values of the rows. All updates and inserts that aren't even committed should be reflected in our transaction. This type of isolation level is commonly used in booking systems where we should be able to see these changes when another transaction tries to update a seat's availability even though that transaction is not committed.
2. Read committed:
It has 2 properties
- No dirty reads:
- This means values updated through a transaction will only be visible after the completion of a transaction
- No Dirty writes:
- Whenever a transaction is running on the database and updating particular values those values will not be overwritten by other transactions.
- Next transaction will only update values after the completion of the previous transaction #### Implementation of reading committed
- Row level locking for write operations
- Multiple versions of the doc #### 3. Repeatable read:
- Transactions started and not committed will not be visible
- Transactions executed later will not be visible
4. Serializable isolation level:
This is the strongest of all isolation levels and guarantees pure isolation. No other transaction can read and write values until that transaction type is executed in the database. This isolation level solves all the problems encountered in all 3 isolation levels discussed previously, but since this only allows one query to be run on the database, it limits the scalability of the system.
Triggers:
Triggers are SQL codes that are automatically executed in response to certain events on a particular table
- These are used to maintain the integrity of data #### Syntax
create trigger trigger_name
(before|after)[Insert | Update |Delete]
on [Table name]
[For each row | For each column]
[Trigger body]
- Nested triggers in SQL are the actions that are automatically executed when a certain database operation is performed #### operations in trigger
- Drop: To remove a trigger from the database
Drop trigger trigger_name
- Displaying triggers in a database
SHOW TRIGGERS
IN database_name;
- Insert Before inserting triggers:
- Used to update or validate record values before they are saved to the database
CREATE TRIGGER calculate
before insert
ON STUDENT
FOR EACH ROW
set new.marks=new.marks+100;
After inserting the trigger:
- Used to access field values that are set by the system and to effect changes in other records Example:
CREATE TRIGGER total_mark
after insert
ON STUDENT
FOR EACH ROW
insert into final_mark values(new.marks);
Top comments (0)