SQL CONCEPTS TECHNICAL PAPER

#sql

Tushar Singh (susst94@gmail.com)

Introduction:

During my starting days in Software Development, I always knew in the back of my mind that I would have to end up learning SQL at some point without knowing why it is that I would need to learn a language for defining, creating, managing, and even running queries for databases. Perhaps it was just word of mouth with my peers that I took it for granted, and didn't ask them the question, "What purpose does SQL serve in the day-to-day life of a Software Developer?" My moment of realization came after completing a Python project which worked on CSV files and later working on that same database but this time using SQL after learning all the basic tools and concepts and working with those ideas and writing queries. It was at that point I realized how powerful SQL can be in performing the same operations that I did with Python which required at times 30 to 40 lines of code but could easily be boiled down to 5 to 6 lines of simple queries. That was the beauty of SQL and its syntactical nature designed to perform your operation and present it in an ordered manner. And it was just the tip of the iceberg.

This paper is not designed to teach you the reader SQL, because the best teacher for learning SQL is you and nobody else. The purpose of this paper is to introduce the reader to a few tools that can express the unique nature and design aspects that make SQL the powerful tool it is for working with databases.

ACID:

Before we can begin explaining what ACID even is in the context of databases, we need to first understand what is a transaction. A transaction as we define it in the context of databases and management systems is an operation that has definite completion states i.e. complete or incomplete. An example would be your electric bill every month. You either pay your bill or you don't and get a notification from the local electric board that you are due on your bills. Now that we know what a transaction is, we can introduce the ACID properties for a transaction.
The following are the ACID properties:

Atomicity(A):

"Atomicity" is the property of a transaction that ensures that each statement that constitutes a transaction is treated like an individual unit. The statement is either going to end up being executed completely or the entirety of it isn't even bothered with in the first place.
Consistency(C):

As the word says, the "Consistency" property of a transaction ensures that any changes which will take place due to a transaction are going to be predictable and predetermined as well. This property is important for establishing robustness in our data. Any possible corruption in errors should not create inadvertent circumstances which harm the integrity of our table.
Isolation(I):

The "Isolation" property of a transaction allows multiple users to operate on the same table simultaneously without it affecting one another. The individual transaction request occurs as if they were executing in sequential order.
Durability(D):

The "Durability" property of a transaction allows our data to save any changes after the successful execution of a transaction, regardless of whether a system failure occurs.

CAP Theorem:

The "CAP Theorem", introduced by Professor Eric Brewer allows us in choosing a data management system that has characteristics that are essential for our application. The theorem refers to three distributed system characteristics, which are as follows:

Consistency(C):

Do not confuse this with "Consistency" in ACID. They may sound the same, but they aren't one bit. Consistency in this theorem implies that all users at any point in time see the same data regardless of which node they are connected to at that point in time. There is a small caveat though. For this to be achieved, whenever data is written via one node, it should be mimicked across all other nodes. Then and only then can the write operation be declared a success.
Availability(A):

Availability implies that regardless of a node being unresponsive, a client will always get a response for making a request for data, regardless of whether the response is a success or failure.
Partition Tolerance(P):

The word Parition here means that a communication error or message failure event has taken place in a system. Naturally, the word Tolerance in this context implies that regardless of these events, the system should not fail.

Joins:

Perhaps one of the most important operations in SQL is the JOIN operation involving multiple tables. JOIN is what allows us to work with multiple tables at the same time and perform any operation which requires fetching data from those tables simultaneously and working with them.
Before diving into the intricacies of JOIN, we must first understand the concept of PRIMARY KEY. In any table, depending upon the uniqueness of different table entries and in order to distinguish said entries, we make use of PRIMARY KEY. What this means is that we are going to choose a column and the data contained within the column is a unique identifier to uniquely all different data entries. This unique identifier becomes the basis for performing JOIN operation as it becomes the bridge for performing a relational map between tables and hence allows us to fetch data from those tables simultaneously.
Following are all the different types of JOIN functions available:

INNER JOIN
OUTER JOIN
LEFT JOIN
RIGHT JOIN

Aggregation:

More often than not, we are required to fetch either some form of summary or some form of inference data from a table which allows us to encapsulate it. SQL supports its users by providing specific expressions (or functions) called aggregates that help us in accomplishing that task. Some of the most commonly used aggregate functions are mentioned as follows:

MAX: This function returns the maximum value of the column in focus
MIN: This function returns the minimum value of the column in focus
SUM: This function returns the total sum value of the column in focus
AVG: This function returns the average value of the column in focus
COUNT: This function returns the count value (which aren't NULL) of the column in focus. To include NULL values in the total count, use the * operator in the aggregate function argument space.

Filters in queries:

In any scenario, we make use of the WHERE clause extensively, coupled with some condition(s) to filter data specific to our needs at the time. The filtering process takes place with the help of a comparison between our filtering clause and the data present in the table whose data needs to be filtered. Following are a few keywords that we make use of to formulate our filter queries in the WHERE clause:

KEYWORDS	OPERATIONS
=	Equal to
!=	Not Equal to
<	Less than
<=	Less than or Equal to
>	Greater than
>=	Greater than or Equal to
BETWEEN	Value between two closed limits
NOT BETWEEN	Value not between two closed limits
IN (conditional_set)	Value present in set provided
NOT	Value, not suffixed condition

Normalization:

Database Normalization is a series of steps that are used to limit and reduce data redundancy with the aim of improving data integrity and consistency. The process constitutes the organization of both columns and tables in a database with the goal of ensuring that the dependencies are enforced in a proper fashion via the integrity constraints of the underlying database. There are some underlying formal rules, mainly the process of synthesis or decomposition which help in accomplishing this task.

It is often good practice to carry out this process for every database that we design. The process by which we take a database, and apply the set of formal rules and conditions is called Normal Forms. Following are the various categories of Normal Forms:

First Normal Form
Second Normal Form
Third Normal Form
Boyce Codd or Fourth Normal Form
Fifth Normal Form
Sixth Normal Form

Indexes:

SQL employs a concept that is widely used by programmers spanning across almost all languages, which is Indexes. Indexes in SQL are used for performing faster search operations for specific columns. In a hypothetical scenario, if there were no Index, the SQL query must begin with the starting row and then the entire table is read to find the pertinent rows. An Index in this scenario helps to quickly determine the position to search from in the table, without caring about the rest of the data present in the table. Some of the commonly used Indexes are PRIMARY KEY, UNIQUE,INDEX, etc.

An important thing to note is, Indexes are deemed less important for queries involving small or big tables in which report queries to analyze most of, if not all the rows. Sequential reading is faster in these scenarios compared to using an Index when a query is required to read through most of the rows anyway.

Transactions:

As mentioned before during the starting section of this paper, a transaction as we define it in the context of databases and management systems is an operation that has definite completion states i.e. complete or incomplete. To go more into detail, Transactions can be comprised of either a single reading, writing, deletion, or updation process or a combination of either of these processes. The simplest example that one can easily relate to is that of a banking withdrawal transaction via an ATM. In general, there are always three stages present in this process:

Balance check of the account in question
Money deduction operation from the account
Saving withdrawal activity into the account/bank log.

After the successful completion of all processes during this transaction, permanent changes(modifications) are made to the database reflecting the occurrence of this transaction. However, in the event an error occurs during any of the stages of the transaction, all modifications done thus far are rolled-back to secure and maintain data integrity. All of these systems in place must remind you of the ACID properties discussed earlier, and it is exactly that. Transactions in almost all cases are designed to have ACID properties.

Locking mechanism:

Talking about SQL server transactions, another extremely crucial thing associated with said transactions is a Locking mechanism. This mechanism is essential for allowing SQL servers to work flawlessly within a multi-ser setting. This server locking is an essential part of the insulation of objects affected by a transaction. When objects are locked, SQL servers restrict any changes of data stored in objects affected by the underlying lock. Simply put, if and when a transaction imposes a lock on an object, the rest of the transactions that need access to that specific object are forced to wait until the lock gets released. A few of the lock modes which are used consistently are :

Exclusive(X)
Shared(S)
Update(U)
Intent(I)

Database Isolation Levels:

Continuing on our topic of transactions and them having ACID properties, we will now explore the concept of Isolation(I) a bit more. In SQL, there is something called Isolation levels, which is used to define the extent to which a transaction has to be isolated from the resource or from the data modifications due to other concurrent transactions. There are five different levels of isolation:

Read Uncommitted: Transactions at this level don't provide shared locks to block other transactions from modifying read operations via the current transaction.
Read Committed: At this level, exclusive locks are issued by transactions during data modification time, hence not allowing other transactions to read modified data that hasn't been committed yet.
Repeatable Read: At this level, statements can't read data that hasn't been committed yet and has been modified by other transactions.
Serializable: At this level, statements can't read data that has been modified yet not committed by other transactions. Until the transaction is completed, no other transactions can modify the data that has been read by the current one.
Snapshot: At this level, any statement in a transaction that reads data shall be the transactionally consistent version of the data that existed during the start of the transaction.

Triggers:

Lastly, we present to you the reader, the statement TRIGGER. A trigger is a piece of code that gets executed automatically as a response to a particular event occurrence on a table within the database. A trigger always has an association with a specific table. In the event a table gets deleted, all the triggers associated are deleted automatically.

A trigger is executed with the following events either before or after: