DEV Community

Ravi Shankar
Ravi Shankar

Posted on

SQL Concepts

ACID Properties

DBMS is the management of data that should remain integrated when any changes are done in it.

It is because if the integrity of the data is affected, whole data will get disturbed and corrupted.

Therefore, to maintain the integrity of the data, there are four properties described in the database management system, which are known as the ACID properties.

The ACID properties are meant for the transaction that goes through a different group of tasks, and there we come to see the role of the ACID properties.

1. Atomicity

The term atomicity defines that the data remains atomic.

It means if any operation is performed on the data, either it should be performed or executed completely or should not be executed at all.

  • Commit: In case a transaction commits, the changes made are visible to us. Thus, atomicity is also called the ‘All or nothing rule’.

  • Abort: In case a transaction aborts, the changes made to the database are not visible to us.

2. Consistency

Consistency means that we have to maintain the integrity constraints so that any given database stays consistent both before and after a transaction.

3. Isolation

Isolation ensures the occurrence of multiple transactions concurrently without a database state leading to a state of inconsistency.

A transaction occurs independently, i.e. without any interference.

Any changes that occur in any particular transaction would NOT be ever visible to the other transactions unless and until this particular change in this transaction has been committed or written to the memory.

4. Durability

The durability property states that once the execution of a transaction is completed, the modifications and updates on the database gets written on and stored in the disk.

These persist even after the occurrence of a system failure. Such updates become permanent and get stored in non-volatile memory. Thus, the effects of this transaction are never lost.

CAP Theorem

The CAP theorem, originally introduced as the CAP principle, can be used to explain some of the competing requirements in a distributed system with replication. It is a tool used to make system designers aware of the trade-offs while designing networked shared-data systems.

  • Consistency

Consistency means that all clients see the same data at the same time, no matter which node they connect to.

For this to happen, whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed successful.

  • Availability

Availability means that any client making a request for data gets a response, even if one or more nodes are down. Another way to state this all working nodes in the distributed system return a valid response for any request, without exception.

  • Partition tolerance

A partition is a communications break within a distributed system a lost or temporarily delayed connection between two nodes.

Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.

Joins

Joins clause is used to combine records from two or more tables in a database. A JOIN is a means for combining fields from two tables by using values common to each.

Join Types in SQL are:

  • The CROSS JOIN
  • The INNER JOIN
  • The LEFT OUTER JOIN
  • The RIGHT OUTER JOIN
  • The FULL OUTER JOIN

The CROSS JOIN

A CROSS JOIN matches every row of the first table with every row of the second table. If the input tables have m and n columns, respectively, the resulting table will have m + n columns. Because CROSS JOIN have the potential to generate extremely large tables, care must be taken to use them only when appropriate.

Syntax:

SELECT ... FROM TABLE1 CROSS JOIN TABLE2 ...

SELECT EMP_ID, NAME, DEPT 
FROM COMPANY 
CROSS JOIN DEPARTMENT;
Enter fullscreen mode Exit fullscreen mode
emp_id name dept
1 Paul IT Billing
1 Mark IT Billing
1 David IT Billing
1 Allen IT Billing
1 Paul IT Billing
2 Paul Engineering
2 Mark Engineering
2 David Engineering
2 Allen Engineering
2 Paul Engineering
7 Paul Finance
7 Mark Finance
7 David Finance
7 Allen Finance
7 Paul Finance

The INNER JOIN

A INNER JOIN creates a new result table by combining column values of two tables (table1 and table2) based upon the join-condition.

The query compares each row of table1 with each row of table2 to find all pairs of rows, which satisfy the join-condition.

When the join-condition is satisfied, column values for each matched pair of rows of table1 and table2 are combined into a result row.

Syntax:

SELECT TABLE1.COLUMN1,TABLE2.COLUMN1 
FROM TABLE1 
[INNER] JOIN TABLE2 
ON TABLE1.COMMON_COLUMN = TABLE2.COMMON_COLUMN ...

SELECT EMP_ID, NAME, DEPT 
FROM COMPANY 
INNER JOIN DEPARTMENT
ON COMPANY.ID = DEPARTMENT.EMP_ID;
Enter fullscreen mode Exit fullscreen mode
emp_id name dept
1 Paul IT Billing
2 Allen Engineering

The OUTER JOIN

The OUTER JOIN is an extension of the INNER JOIN. SQL standard defines three types of OUTER JOINs: LEFT, RIGHT, and FULL and PostgreSQL supports all of these.

The LEFT OUTER JOIN

In case of LEFT OUTER JOIN, an inner join is performed first.

Then, for each row in table T1 that does not satisfy the join condition with any row in table T2, a joined row is added with null values in columns of T2.

Thus, the joined table always has at least one row for each row in T1.

Syntax:

SELECT T1.COLUMN_1,T2.COLUMN1
FROM TABLE1 AS T1
LEFT [OUTER] JOIN TABLE2 AS T2
ON T1.COMMON_COLUMN = T2.COMMON_COLUMN ...

SELECT EMP_ID, NAME, DEPT 
FROM COMPANY 
LEFT OUTER JOIN DEPARTMENT
ON COMPANY.ID = DEPARTMENT.EMP_ID;
Enter fullscreen mode Exit fullscreen mode
emp_id name dept
1 Paul IT Billing
2 Allen Engineering
David
Paul
Mark

The RIGHT OUTER JOIN

In case of RIGHT OUTER JOIN, an inner join is performed first.

Then, for each row in table T2 that does not satisfy the join condition with any row in table T1, a joined row is added with null values in columns of T1.

Thus, the joined table always has at least one row for each row in T2.

Syntax:

SELECT T1.COLUMN_1,T2.COLUMN1
FROM TABLE1 AS T1
RIGHT [OUTER] JOIN TABLE2 AS T2
ON T1.COMMON_COLUMN = T2.COMMON_COLUMN ...

SELECT EMP_ID, NAME, DEPT 
FROM COMPANY 
RIGHT OUTER JOIN DEPARTMENT
ON COMPANY.ID = DEPARTMENT.EMP_ID;
Enter fullscreen mode Exit fullscreen mode
emp_id name dept
1 Paul IT Billing
2 Allen Engineering
7 Finance

The FULL OUTER JOIN

First, an inner join is performed.

Then, for each row in table T1 that does not satisfy the join condition with any row in table T2, a joined row is added with null values in columns of T2.

In addition, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.

Syntax:

SELECT T1.COLUMN_1,T2.COLUMN1
FROM TABLE1 AS T1
FULL [OUTER] JOIN TABLE2 AS T2
ON T1.COMMON_COLUMN = T2.COMMON_COLUMN ...

SELECT EMP_ID, NAME, DEPT 
FROM COMPANY 
FULL OUTER JOIN DEPARTMENT
ON COMPANY.ID = DEPARTMENT.EMP_ID;
Enter fullscreen mode Exit fullscreen mode
emp_id name dept
1 Paul IT Billing
2 Allen Engineering
7 Finance
David
Paul
Mark

Aggregation

An aggregation is the result returned from running an aggregate function in a query and is displayed in a new column in the results table.

An aggregate function is a function that results of a set of the data queried being condensed into an aggregation.

  • sum()

sum() function returns the sum of all numeric value in a specified column.

Syntax:

SELECT SUM(COLUMN) FROM TABLE;
Enter fullscreen mode Exit fullscreen mode
  • count(*)

count(*) function is used to count number of input rows in the table.

Syntax:

SELECT COUNT(*) FROM TABLE;
Enter fullscreen mode Exit fullscreen mode
  • count(Expression)

counts number of input rows for which the value of expression is not null.

Syntax:

SELECT COUNT(COLUMN) FROM TABLE;
Enter fullscreen mode Exit fullscreen mode
  • avg(expression)

The average (arithmetic mean) of all non-null input values.

Syntax:

SELECT AVG(COLUMN) FROM TABLE;
Enter fullscreen mode Exit fullscreen mode
  • min(expression)

returns minimum value of expression across all non-null input values.

Syntax:

SELECT MIN(COLUMN) FROM TABLE;
Enter fullscreen mode Exit fullscreen mode
  • max(expression)

returns maximum value of expression across all non-null input values.

Syntax:

SELECT MAX(COLUMN) FROM TABLE;
Enter fullscreen mode Exit fullscreen mode

Filters in queries

Filters in queries are used to reduce the data that is displayed on the output or to reduce the data in the calculation of an aggregate function.

  • WHERE clause:

WHERE clause is used to specify a condition while fetching the data from single table or joining with multiple tables.

If the given condition is satisfied, only then it returns specific value from the table. You can filter out rows that you do not want included in the result-set by using the WHERE clause.

The WHERE clause not only is used in SELECT statement, but it is also used in UPDATE, DELETE statement, etc., which we would examine in subsequent chapters.

Syntax:

SELECT column1, column2, columnN
FROM table_name
WHERE [search_condition]
Enter fullscreen mode Exit fullscreen mode
  • GROUP BY clause:

The GROUP BY clause divides the rows returned from the SELECT statement into groups. For each group, you can apply an aggregate function e.g., SUM() to calculate the sum of items or COUNT() to get the number of items in the groups.

Syntax:

SELECT column_1, column_2,..., aggregate_function(column_3)
FROM table_name
GROUP BY column_1,column_2,...;
Enter fullscreen mode Exit fullscreen mode
  • HAVING clause

The HAVING clause specifies a search condition for a group or an aggregate. The HAVING clause is often used with the GROUP BY clause to filter groups or aggregates based on a specified condition.

Syntax:

SELECT column1,aggregate_function (column2)
FROM table_name
GROUP BY column1
HAVING condition;
Enter fullscreen mode Exit fullscreen mode

Normalization

Normalization is the process to eliminate data redundancy and enhance data integrity in the table. Normalization also helps to organize the data in the database. It is a multi-step process that sets the data into tabular form and removes the duplicated data from the relational tables.

Normalization organizes the columns and tables of a database to ensure that database integrity constraints properly execute their dependencies. It is a systematic technique of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics like Insertion, Update, and Deletion anomalies.

1st Normal Form (1NF)

  • A table is referred to as being in its First Normal Form if atomicity of the table is 1.

  • Here, atomicity states that a single cell cannot hold multiple values. It must hold only a single-valued attribute.

  • The First normal form disallows the multi-valued attribute, composite attribute, and their combinations.

2nd Normal Form (2NF)

The first condition for the table to be in Second Normal Form is that the table has to be in First Normal Form. The table should not possess partial dependency. The partial dependency here means the proper subset of the candidate key should give a non-prime attribute.

3rd Normal Form (3NF)

  • The first condition for the table to be in Third Normal Form is that the table should be in the Second Normal Form.

  • The second condition is that there should be no transitive dependency for non-prime attributes, which indicates that non-prime attributes (which are not a part of the candidate key) should not depend on other non-prime attributes in a table. Therefore, a transitive dependency is a functional dependency in which A → C (A determines C) indirectly, because of A → B and B → C (where it is not the case that B → A).

  • The third Normal Form ensures the reduction of data duplication. It is also used to achieve data integrity.

Boyce Codd Normal Form (BCNF)

  • Boyce Codd Normal Form is also known as 3.5 NF. It is the superior version of 3NF and was developed by Raymond F.

  • Boyce and Edgar F. Codd to tackle certain types of anomalies which were not resolved with 3NF.

  • The first condition for the table to be in Boyce Codd Normal Form is that the table should be in the third normal form.

  • Secondly, every Right-Hand Side (RHS) attribute of the functional dependencies should depend on the super key of that particular table.

Indexes

Indexes are special lookup tables that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in the back of a book.

PostgreSQL provides several index types: B-tree, Hash, GiST, SP-GiST and GIN. Each Index type uses a different algorithm that is best suited to different types of queries. By default, the CREATE INDEX command creates B-tree indexes, which fit the most common situations.

Syntax:

CREATE INDEX index_name ON table_name;
Enter fullscreen mode Exit fullscreen mode

Transactions

Transactions are a fundamental concept of all database systems.

The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation.

The intermediate states between the steps are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction from completing, then none of the steps affect the database at all.

Locking mechanism

PostgreSQL locks often called “write locks” or “exclusive locks,” restrict users from modifying a row or a PostgreSQL table’s contents.

Rows that have undergone a DELETE or UPDATE operation will be locked solely until the transaction is finished.

Other users won’t be able to change the same rows until the transaction is rolled back or committed.

Users won’t have to wait to alter various rows because this locking mechanism only engages when they attempt to modify the same rows.

PostgreSQL serializes changes to vital portions of the database using many layers of locks to ensure that complex transactions can run safely in parallel.

Until they attempt to acquire a conflicting lock, such as when they update the same row, transactions are carried out concurrently.

The first transaction to obtain the lock may then continue, while the second transaction must wait until the first transaction commits or aborts before proceeding.

Although some locking in databases happens automatically, there are some circumstances when locking must be done manually. The PostgreSQL LOCK command enables manual locking.

Database Isolation Levels

In database systems, isolation determines how transaction integrity is visible to other users and systems.

A lower isolation level increases the ability of many users to access the same data at the same time, but increases the number of concurrency effects (such as dirty reads or lost updates) users might encounter.

Conversely, a higher isolation level reduces the types of concurrency effects that users may encounter, but requires more system resources and increases the chances that one transaction will block another.

  • Dirty Read : A Dirty read is a situation when a transaction reads data that has not yet been committed.

  • Non Repeatable read : Non Repeatable read occurs when a transaction reads the same row twice and gets a different value each time.

  • Phantom Read : Phantom Read occurs when two same queries are executed, but the rows retrieved by the two, are different.

Based on these phenomena, The SQL standard defines four isolation levels:

  • Read Uncommitted : Read Uncommitted is the lowest isolation level. In this level, one transaction may read not yet committed changes made by other transactions, thereby allowing dirty reads. At this level, transactions are not isolated from each other.

  • Read Committed : This isolation level guarantees that any data read is committed at the moment it is read. Thus it does not allow dirty read. The transaction holds a read or write lock on the current row, and thus prevents other transactions from reading, updating, or deleting it.

  • Repeatable Read : This is the most restrictive isolation level. The transaction holds read locks on all rows it references and writes locks on referenced rows for update and delete actions. Since other transactions cannot read, update or delete these rows, consequently it avoids non-repeatable read.

  • Serializable : This is the highest isolation level. A serializable execution is guaranteed to be serializable. It is defined to be an execution of operations in which concurrently executing transactions appears to be serially executing.

Triggers

A trigger is a stored procedure in database which automatically invokes whenever a special event in the database occurs.

For example, a trigger can be invoked when a row is inserted into a specified table or when certain table columns are being updated.

Syntax:

create trigger [trigger_name] 
[before | after]  
{insert | update | delete}  
on [table_name]  
[for each row]  
[trigger_body]  
Enter fullscreen mode Exit fullscreen mode

Listing Triggers

You can list down all the triggers in the current database from pg_trigger table as follows

postgres=# SELECT * FROM pg_trigger;
Enter fullscreen mode Exit fullscreen mode

Dropping Triggers

The following is the DROP command, which can be used to drop an existing trigger

postgres=# DROP TRIGGER trigger_name;
Enter fullscreen mode Exit fullscreen mode

Resources:

PostgreSQL Documentation
Locking Mechanism
Database isolation levels
Triggers

Top comments (0)