Manisha Kundrapu

Posted on Feb 21, 2023

SQL Concepts

#sql #beginners

The following are some of the concepts that are included to understand databases:

ACID
CAP Theorem
Joins
Aggregations and Filters in Queries
Normalization
Indexes
Transactions
Locking Mechanism
Database Isolation Levels
Triggers

ACID

ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability.

These are the key properties of a database transaction that ensure data integrity and reliability.

1. Atomicity:

Atomicity guarantees that a transaction is treated as a single, indivisible unit of work that either completes in its entirety or is rolled back (undone) completely in the event of any failure or error.

Example:

A bank transfer that involves deducting money from one account and adding it to another account is an example of an atomic transaction. If either the debit or credit transaction fails, both transactions should be rolled back to ensure data consistency.

2. Consistency:

Consistency ensures that a transaction takes the database from one valid state to another valid state. It ensures that the database is consistent both before and after the transaction.

Example:

If a transaction involves adding a new record to a database, it should ensure that the record meets all the constraints and rules set forth by the database schema before being added.

3. Isolation:

Isolation ensures that each transaction is executed independently of any other transaction, without interference or dependence on any other transaction occurring simultaneously.

Example:

If two transactions occur simultaneously, one transaction should not be able to access the data modified by another transaction until the transaction is completed and the data is committed to the database.

4. Durability:

Durability ensures that once a transaction is committed to the database, it will remain there permanently, even in the event of power failures, crashes, or other system failures.

Example:

If a database crashes after a transaction has been committed, the data from that transaction will still be available when the database is restored.

Overall, ACID provides a set of principles to ensure that database transactions are reliable, consistent, and accurate, making it an essential concept for building robust and scalable database systems.

CAP Theorem

The CAP theorem, also known as Brewer's theorem, is a fundamental concept in distributed database systems that describes the tradeoffs that must be made when building highly available and fault-tolerant systems.

The theorem states that in a distributed database system, you can only have two out of the following three guarantees.

1. Consistency:

All nodes in the system see the same data at the same time, regardless of which node receives the update request.

2. Availability:

Every request made to the system receives a response, without guarantee that it contains the most recent version of the information.

3. Partition tolerance:

The system continues to operate despite arbitrary message loss or failure of part of the system.

In practical terms, this means that when building a distributed database system, you must choose which two of these guarantees are most important for your application and optimize the system accordingly.

For example, consider an e-commerce website that relies on a distributed database to manage inventory and process transactions. In this scenario, consistency and availability are the two most important guarantees, as it is critical that the website always has accurate inventory information and can process transactions in a timely manner. Partition tolerance may be less important, as the website can temporarily go offline if there is a network issue or a node failure.

On the other hand, consider a social media platform where users are frequently posting updates, and the priority is to ensure that all updates are eventually replicated across all nodes in the system, even in the event of network partitions. In this scenario, partition tolerance and consistency are the two most important guarantees, while availability may be less critical.

Overall, the CAP theorem highlights the fundamental tradeoffs that must be made when building distributed database systems, and emphasizes the importance of choosing the appropriate tradeoffs based on the requirements of your specific application.

Joins

In SQL, a join is a way to combine data from two or more tables into a single result set based on a common column. There are several types of joins, including:

1. Inner join:

Returns only the rows that have matching values in both tables.

Example:

Suppose you have two tables - "orders" and "customers". To get a list of all orders along with the customer name, you can use an inner join on the "customer_id" column.

SELECT orders.order_id, customers.customer_name
FROM orders
INNER JOIN customers
ON orders.customer_id = customers.customer_id;

2. Left join:

Returns all rows from the left table and matching rows from the right table. If there are no matching rows in the right table, the result will contain NULL values for the right table columns.

Example:

Suppose you have two tables - "customers" and "orders". To get a list of all customers and their orders (if any), you can use a left join on the "customer_id" column.

SELECT customers.customer_name, orders.order_id
FROM customers
LEFT JOIN orders
ON customers.customer_id = orders.customer_id;

3. Right join:

Returns all rows from the right table and matching rows from the left table. If there are no matching rows in the left table, the result will contain NULL values for the left table columns.

Example:

Suppose you have two tables - "orders" and "customers". To get a list of all orders and the customer name (if available), you can use a right join on the "customer_id" column.

SELECT orders.order_id, customers.customer_name
FROM orders
RIGHT JOIN customers
ON orders.customer_id = customers.customer_id;

4. Full outer join:

Returns all rows from both tables, with NULL values in the columns where there are no matches.

Example:

Suppose you have two tables - "customers" and "orders". To get a list of all customers and all orders, you can use a full outer join on the "customer_id" column.

SELECT customers.customer_name, orders.order_id
FROM customers
FULL OUTER JOIN orders
ON customers.customer_id = orders.customer_id;

Joins are a powerful tool for combining data from multiple tables in SQL, and can be used in a variety of scenarios to retrieve useful information from a database.

Aggregations, Filters in queries

In SQL, aggregations and filters are powerful tools for querying and summarizing large datasets.

Aggregations are used to calculate summary statistics, such as averages, sums, counts, and maximum or minimum values, on one or more columns of data. The most commonly used aggregation functions in SQL include:

1. COUNT:

Returns the number of rows in a table or the number of rows that match a specified condition.

Example:

To count the number of orders in a table called "orders", you can use the following query:

SELECT COUNT(*)
FROM orders;

2. SUM:

Returns the sum of the values in a specified column.

Example:

To calculate the total revenue from a table called "sales", you can use the following query:

SELECT SUM(revenue)
FROM sales;

3. AVG:

Returns the average value of the values in a specified column.

Example:

To calculate the average price of a product from a table called "products", you can use the following query:

SELECT AVG(price)
FROM products;

Filters, on the other hand, are used to retrieve specific subsets of data from a table based on specified conditions.

The most commonly used filter clauses in SQL include:

1. WHERE:

Filters the data based on a specified condition.

Example:

To retrieve all orders with a total value greater than $1000 from a table called "orders", you can use the following query.

SELECT *
FROM orders
WHERE total_value > 1000;

2. GROUP BY:

Groups the data based on one or more columns, and allows for aggregations to be calculated for each group.

Example:

To calculate the total revenue for each region from a table called "sales", you can use the following query:

SELECT region, SUM(revenue)
FROM sales
GROUP BY region;

3. HAVING:

Filters the results of a GROUP BY clause based on a specified condition.

Example:

To retrieve all regions with a total revenue greater than $100,000 from a table called "sales", you can use the following query.

SELECT region, SUM(revenue)
FROM sales
GROUP BY region
HAVING SUM(revenue) > 100000;

Aggregations and filters are two essential tools for working with large datasets in SQL, and can be used together to generate powerful insights and reports.

Normalization

Normalization is a process of organizing data in a database to reduce redundancy and improve data integrity.

It involves breaking down a table into smaller tables and establishing relationships between them to eliminate data duplication and inconsistencies.

There are different levels of normalization, also known as normal forms.

The most commonly used are:

1. First Normal Form (1NF):

A table is in 1NF if it has no repeating groups or arrays. Each column in the table must contain atomic (indivisible) values.

For example, a table with a column for phone numbers should not store multiple phone numbers in a single field separated by commas.

2. Second Normal Form (2NF):

A table is in 2NF if it is in 1NF and all non-key columns are dependent on the entire primary key. In other words, all columns in a table must be related to the primary key, and not just a part of it.

For example, a table with customer orders should have separate tables for customers and orders, with the order table referencing the customer table through a foreign key.

3. Third Normal Form (3NF):

A table is in 3NF if it is in 2NF and has no transitive dependencies. In other words, all non-key columns in a table must be related only to the primary key, and not to other non-key columns.

For example, a table with information about customers and their orders should not have a column for the salesperson who made the sale, as this information is not directly related to the customer or the order.

Example:

Consider a table called "Customer_Order" with the following columns.

Customer_ID
Customer_Name
Customer_Phone
Order_ID
Order_Date
Order_Amount

This table is not normalized as it contains repeating groups (phone numbers) and has transitive dependencies (order amount is dependent on order ID and not directly on customer ID). To normalize this table, we can create two separate tables as follows.

Customer table:

Customer_ID (Primary Key)
Customer_Name
Customer_Phone
Order table:

Order_ID (Primary Key)
Order_Date
Order_Amount
Customer_ID (Foreign Key)

By breaking down the original table into two smaller tables, we have eliminated redundancy and established a relationship between the customer and order tables through the customer ID and order ID columns.

This ensures data consistency and reduces the likelihood of errors in the database.

Indexes

Indexes in SQL are used to improve the performance of queries by allowing faster data retrieval.

An index is a data structure that contains a copy of selected columns of a table arranged in a specific order that enables faster lookups, sorting, and filtering.

When a query is executed, the database engine uses the index to locate the data more quickly, rather than scanning the entire table.

Indexes are typically created on columns that are frequently used in queries, such as primary keys or columns used in WHERE clauses.

There are several types of indexes available in SQL, including:

1. Clustered index:

A clustered index determines the physical order of the rows in a table based on the values in one or more columns. Each table can have only one clustered index.

2. Non-clustered index:

A non-clustered index creates a separate data structure that contains the index key values and a pointer to the location of the actual data. Multiple non-clustered indexes can be created on a table.

3. Unique index:

A unique index ensures that the values in a column or combination of columns are unique. Each table can have multiple unique indexes.

Example:

Consider a table called "Employee" with the following columns.

Employee_ID (Primary Key)
Employee_Name
Employee_Age
Employee_Salary

To improve the performance of queries that frequently use the Employee_Name column, we can create a non-clustered index on that column as follows.

CREATE NONCLUSTERED INDEX ix_Employee_Name ON Employee (Employee_Name)

This index creates a separate data structure containing the Employee_Name column and a pointer to the location of the actual data in the Employee table. When a query includes the Employee_Name column in a WHERE clause, the database engine can use the index to quickly locate the relevant rows in the table.

In addition, we can create a unique index on the Employee_ID column to ensure that each employee has a unique identifier:

CREATE UNIQUE INDEX ix_Employee_ID ON Employee (Employee_ID)

This index prevents duplicate values in the Employee_ID column, ensuring data integrity and efficient data retrieval for queries that use the Employee_ID column.

Transactions

In SQL, a transaction is a sequence of database operations that are executed as a single unit of work.

A transaction ensures that all the operations within it are completed successfully, or none of them are, so that the database remains in a consistent state.

Transactions typically include multiple SQL statements, such as SELECT, INSERT, UPDATE, and DELETE, and they are used to ensure data integrity and consistency.

If any part of a transaction fails, the database will automatically roll back all the changes made during the transaction to ensure that the database remains consistent.

To ensure that a sequence of SQL statements is executed as a transaction, we use the following commands:

BEGIN TRANSACTION: Starts a new transaction.
COMMIT: Ends a transaction and saves all the changes made during it.
ROLLBACK: Ends a transaction and undoes all the changes made during it.

Example:

Consider a table called "Bank_Account" with the following columns:

Account_ID (Primary Key)
Account_Balance

To transfer funds from one account to another, we can use a transaction to ensure that the transfer is completed successfully, or none of it is. The transaction can be implemented using the following SQL statements.

BEGIN TRANSACTION
UPDATE Bank_Account SET Account_Balance = Account_Balance - 500 WHERE Account_ID = 12345
UPDATE Bank_Account SET Account_Balance = Account_Balance + 500 WHERE Account_ID = 67890
COMMIT

In this example, the transaction begins with the BEGIN TRANSACTION statement. The two UPDATE statements transfer $500 from the account with the ID 12345 to the account with the ID 67890. Finally, the COMMIT statement ends the transaction and saves the changes made to the database.

If any part of the transaction fails, such as if there are insufficient funds in the account with the ID 12345, the ROLLBACK statement can be used to undo all the changes made during the transaction and ensure that the database remains in a consistent state.

Locking mechanism

In SQL, a locking mechanism is used to control access to data in a database and ensure that multiple transactions can work with the same data without causing conflicts.

Locking can prevent data inconsistencies and ensure data integrity by preventing multiple transactions from accessing the same data at the same time.

There are several types of locks available in SQL, including:

1. Shared lock:

Allows multiple transactions to read the same data at the same time, but prevents any transaction from modifying the data.

2. Exclusive lock:

Prevents any other transaction from accessing or modifying the data while the lock is held.

3. Update lock:

Allows multiple transactions to read the same data, but only one transaction can update the data at a time.

4. Intent lock:

Indicates that a transaction intends to acquire a lock on a particular resource, such as a row or a page.

5. Schema lock:

Prevents any other transaction from modifying the schema of a database object while the lock is held.

Example:
Consider a table called "Inventory" with the following columns:

Item_ID (Primary Key)
Item_Name
Item_Quantity

To prevent two transactions from updating the same item quantity at the same time, we can use a locking mechanism.

The following example shows how to use a shared lock to allow multiple transactions to read the same data, but prevent any transaction from modifying the data:

Transaction 1:

BEGIN TRANSACTION
SELECT Item_Quantity FROM Inventory WITH (UPDLOCK) WHERE Item_ID = 12345
-- Process the data
COMMIT

Transaction 2:

BEGIN TRANSACTION
SELECT Item_Quantity FROM Inventory WITH (UPDLOCK) WHERE Item_ID = 12345
-- Process the data
COMMIT

In this example, both transactions acquire an update lock on the row with the Item_ID 12345 using the WITH (UPDLOCK) clause in the SELECT statement. This prevents any other transaction from acquiring an exclusive lock on the same row while the update lock is held, ensuring that only one transaction can modify the data at a time.

Note that the shared lock acquired by the transactions allows other transactions to read the same data, but not modify it. If a transaction needs to modify the data, it must acquire an exclusive lock on the row, which prevents any other transaction from accessing the data until the lock is released.

Database Isolation Levels

In SQL, database isolation levels define how transactions interact with each other and with the data in the database.

Isolation levels determine the level of concurrency that is allowed in a database system, which can affect the consistency and accuracy of the data.

There are four commonly used isolation levels in SQL:

1. Read Uncommitted:

The lowest level of isolation, which allows transactions to read data that has been modified by other transactions, but not yet committed.

2. Read Committed:

This level of isolation allows transactions to read only committed data, which prevents dirty reads.

However, it still allows non-repeatable reads, where a transaction reads the same data twice and gets different results because another transaction has modified the data in between.

3. Repeatable Read:

This level of isolation ensures that a transaction can read the same data multiple times and get the same result, even if another transaction modifies the data in between.

This level prevents non-repeatable reads, but still allows phantom reads, where a transaction reads a set of rows and another transaction inserts new rows that match the same criteria.

4. Serializable:

The highest level of isolation, which provides the strongest level of consistency.

It ensures that transactions are executed as if they were executed sequentially, which prevents dirty reads, non-repeatable reads, and phantom reads.

However, this level of isolation can lead to a high level of contention, which can affect the performance of the database system.

Example:

Consider a table called "Order_Details" with the following columns:

Order_ID (Primary Key)
Product_Name
Quantity

To demonstrate the different isolation levels, we can use the following example:

Transaction 1:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRANSACTION
SELECT SUM(Quantity) FROM Order_Details WHERE Product_Name = 'Product A'
-- Process the data
COMMIT

Transaction 2:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
SELECT SUM(Quantity) FROM Order_Details WHERE Product_Name = 'Product A'
-- Process the data
COMMIT

Transaction 3:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
SELECT SUM(Quantity) FROM Order_Details WHERE Product_Name = 'Product A'
-- Process the data
COMMIT

Transaction 4:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT SUM(Quantity) FROM Order_Details WHERE Product_Name = 'Product A'
-- Process the data
COMMIT

In this example, each transaction sets a different isolation level and retrieves the sum of the quantity for the product with the name "Product A" from the "Order_Details" table.

The different isolation levels affect how the transactions interact with each other and with the data in the database, which can lead to different results and different levels of consistency.

For example, the READ UNCOMMITTED level allows dirty reads, which means that Transaction 1 can read data that has been modified but not yet committed by other transactions.

On the other hand, the SERIALIZABLE level ensures the highest level of consistency by preventing dirty reads, non-repeatable reads, and phantom reads, but it can also lead to a high level of contention and reduced performance.

Triggers

In SQL, a trigger is a special type of stored procedure that is automatically executed in response to certain events or actions in a database.

Triggers can be used to enforce business rules, maintain data integrity, and automate certain database operations.

There are two types of triggers in SQL:

1. DML Triggers:

These triggers are fired in response to data manipulation language (DML) events, such as INSERT, UPDATE, and DELETE statements.

2. DDL Triggers:

These triggers are fired in response to data definition language (DDL) events, such as CREATE, ALTER, and DROP statements.

Triggers can be defined to execute either before or after the triggering event. A trigger can be defined to execute once for each statement, or once for each row that is affected by the statement.

Example:

Consider a table called "Employee" with the following columns:

Employee_ID (Primary Key)
Employee_Name
Salary

To demonstrate the use of triggers, we can create a trigger that automatically updates the "Salary" column of the "Employee" table whenever a new record is inserted into the table. The trigger will double the value of the "Salary" column for each new record.

CREATE TRIGGER trg_employee_insert
ON Employee
FOR INSERT
AS
BEGIN
UPDATE Employee SET Salary = Salary * 2 WHERE Employee_ID IN (SELECT Employee_ID FROM inserted)
END

In this example, the trigger is defined to execute after an insert statement is executed on the "Employee" table. The trigger updates the "Salary" column of the newly inserted record by multiplying it by 2. The "inserted" table is a special table that is available within a trigger and contains the rows that were inserted or modified by the triggering statement.

Now, if we insert a new record into the "Employee" table with the following statement:

INSERT INTO Employee (Employee_Name, Salary) VALUES ('John', 50000)

The trigger will automatically update the "Salary" column of the new record to 100000. Thus, the trigger has enforced a business rule that specifies that all new employees must have a salary that is double the original value.