DEV Community

Saravanan
Saravanan

Posted on

SQL Concepts

SQL(Structured Query Language) is a domain-specific language that is used for managing data in a relational database. We will look at some of the concepts of SQL in detail below.

ACID

ACID1 - Atomicity, Consistency, Isolation, and Durability.

Atomicity

Atomicity guarantees that a transaction, which consists of one or multiple SQL statements, is either entirely executed or not executed at all. A transaction can be committed, making all its changes permanent, or rolled back, undoing all of its changes. As a result, the database has only two states - one before the transaction is executed and one after it is completed - with no intermediate states.

Consistency

Consistency ensures that the database is in a uniform state both before and after the execution of a transaction. This encompasses all rules, constraints, and triggers, and any deviations from this state of uniformity will result in the transaction being aborted.

Isolation

In situations where multiple transactions are executed simultaneously in parallel, the final state of the database must be identical to the state that would result from sequentially executing those transactions.

Durability

Durability ensures that the modifications made to the database persist once a transaction is committed. Such changes should only result from SQL transactions, and external factors should not impact them.

CAP Theorem

Eric Brewer proposed the CAP theorem2, which states that any distributed data store can provide only two out of the three following properties: consistency, availability, and partition tolerance.

  • Consistency - At any given moment, all users view identical data, and any alterations made are instantaneously reflected everywhere.
  • Availability - The database responds to every request, regardless of whether or not the response is consistent.
  • Partition tolerance - A partition occurs when one or more nodes of a database become unavailable. Partition tolerance refers to the system's ability to continue functioning despite such partitions.

When a node fails, the system must choose whether to prioritize consistency or availability. If consistency is prioritized, transactions are rolled back to synchronize the non-failed nodes with the failed ones. On the other hand, if availability is prioritized, transactions proceed, but the nodes may become inconsistent.

Joins

A join is a keyword in SQL that joins two or more tables to form a single table based upon a condition on matching fields. Five common types of joins are used.

  • JOIN/INNER JOIN - returns rows from both tables where matching values are found in the joined columns of both tables.
  • LEFT JOIN - returns every row from the left table. When a row is found with a matching value in the right table, values from that row are included in the results.
  • RIGHT JOIN - returns every row from the right table. When a row is found with a matching value in the left table, values from that row are included in the results.
  • FULL OUTER JOIN - returns every row from both tables and joins the rows where values in the joined columns match. If there's no match for a value in either the left or right table, the query result contains no values for that table.
  • CROSS JOIN - return every possible combination of rows from both tables.

Aggregations

Aggregate functions combine values from multiple rows, perform an operation on those values, and return a single result. Some of the commonly used aggregate functions are given below.

  • count - returns a count of the column with values that are not null. To count all rows with null values use * instead of a column name.
  • max - returns the maximum value of a given column.
  • min - returns the minimum value of a given column.
  • sum - returns the sum value of a given column.
  • avg - returns the average of the given numeric column.

Aggregate functions are often used with GROUP BY to group the result by one or more columns. The aggregate function works on the set of values based on group by and not by the whole column.

Filter in queries

Filtering the result based on any condition is done using the WHERE clause in SQL. The filtering is done by comparison and matching of values from selected data.

  • = equal to(case sensitive)
  • != not equal to(case sensitive)
  • /> greater than
  • < lesser than
  • />= greater than or equal to
  • <= lesser than or equal to
  • ~ matches with a regular expression
  • BETWEEN - passes value only within the range
  • IN - if the column value in the given set
  • LIKE - match the column value with the pattern
  • ILIKE - same as LIKE with case insensitive match
  • NOT - prefix to a match to negate it

Data normalization3

Normalization involves organizing data in a database to increase its flexibility by removing redundancy and inconsistent dependency. There are several forms of normalization, but databases are typically designed up to the third normal form, despite the existence of higher levels of normalization.

  • First normal form
    • Eliminate repeating groups in individual tables.
    • Create a separate table for each set of related data.
    • Identify each set of related data with a primary key.
  • Second normal form
    • Create separate tables for sets of values that apply to multiple records.
    • Relate these tables with a foreign key.
  • Third normal form
    • Eliminate fields that do not depend on the key.

Indexes4

An index is a data structure that provides search results fast for a particular column in a database. Generating an index on a table's field produces another data structure that holds the field's value and a pointer to its corresponding record. This index structure is then sorted, enabling Binary Searches to be conducted on it.

The data of the database are stored as data blocks. These data blocks are accessed entirely, making them the atomic disk access operation. Sorting these data blocks helps the database to access the data using binary search, which has a significant performance improvement over searching for the data in unsorted and randomly places data blocks.

Transactions

A transaction in a database consists of a sequence of data manipulation statements that must execute either completely or not at all, resulting in the database being in a consistent state. The main purpose of a transaction is to group several steps into a single, all-or-nothing operation. The intermediate states of a transaction are not observable by other concurrent transactions, and if an error prevents the transaction from completing, none of its steps will affect the database. Transactions are designed to support the ACID principles and provide concurrent access to multiple users and systems that are resistant to failures.

Locking mechanism5

Locking enables concurrent access to a database by multiple users. To access a data item for reading or writing, a lock is placed on it. A user requests a lock on the part of the data they need to access, and no conflicting lock is granted on the same data until the current lock is released. Two types of locks are shared and exclusive.

A shared lock allows reading from the locked row or table, hence also called a read lock. Multiple transactions can hold a shared lock on the same resource and read from it. However, no transaction is permitted to update the resource while a shared lock is held on it.

An exclusive lock completely locks the row or table and enables the transaction to update the row in isolation. Only one transaction can obtain an exclusive lock on a particular resource at a time. Other processes that wish to acquire the lock on the same resource must wait until the lock is released. Once released, the remaining processes can acquire the lock and make modifications.

Database isolation6

The level of isolation in a database determines how much a transaction should be separated from the data changes made by other transactions. It is to avoid the retrieval or storage of temporary, aborted, or inconsistent data written by simultaneous transactions. There are four various levels of database isolation.

  • Read Uncommitted - no transaction can update a database row if another transaction has already updated it and not committed but, allows dirty reads.
  • Read Committed - does not allow any other transaction to write or read a row to which another transaction has written but not yet committed.
  • Repeatable Read - a transaction that reads data from a row blocks any other writing transactions from accessing the same row.
  • Serializable - locks the whole table to prevent any other transactions from inserting or reading data from it.
  • Snapshot - transaction for modifying data works on a copy of committed data with a lock. Parallel read transactions are given the snapshot of the committed data.

Triggers

A trigger is a special type of stored procedure that automatically runs when an event occurs in the database such as INSERT, UPDATE, DELETE, and TRUNCATE.

CREATE TRIGGER my_trigger
BEFORE INSERT ON my_table
FOR EACH ROW
EXECUTE FUNCTION my_function();
Enter fullscreen mode Exit fullscreen mode

The above example is to create a trigger in postgres named my_trigger which gets executes my_function for every insert in the my_table.


  1. https://www.ibm.com/docs/en/cics-ts/5.4?topic=processing-acid-properties-transactions 

  2. https://www.ibm.com/topics/cap-theorem 

  3. https://learn.microsoft.com/en-us/office/troubleshoot/access/database-normalization-description 

  4. https://stackoverflow.com/questions/1108/how-does-database-indexing-work 

  5. https://medium.com/inspiredbrilliance/what-are-database-locks-1aff9117c290 

  6. https://medium.com/nerd-for-tech/understanding-database-isolation-levels-c4ebcd55c6b9 

Top comments (0)