Hey DEV community! I would like to share with you a cheatsheet of some of the most common SQL commands. The guide is focuses on PostgreSQL.
In a later article I will show how to create tables and its commands. Hope you'll enjoy it, and more important, that it will be useful to you.
Let's start!
SELECT Statement
The SELECT statement allows us to retrieve information from a table.
General syntax:
SELECT column_name FROM table_name;
SELECT DISTINCT
Sometimes a table contains a column that has duplicate values, and perhaps we are in a situation where we only want to list the unique/distinct values.
To do this, we can use the DISTINCT keyword, which returns only the distinct values in a column. Therefore, it works in one column.
The syntax looks like this:
SELECT DISTINCT column_name FROM table_name;
SELECT COUNT
The COUNT function returns the number of input rows that match a specific condition of a query.
To make it easier, we can try to answer this question:
How many rows are there in the named column?
SELECT COUNT (column_name) FROM table_name;
COUNT is merely reporting back the number of rows returned which is going to be the same, regardless of the actual column we are looking at. This is why we can use just (*), which gives us back all the rows in a table.
SELECT COUNT (*) FROM table_name;
COUNT is much useful when combined with other commands, such as DISTINCT.
Let's say we want to know, how many unique names are on the table?
We would do something like this:
SELECT COUNT (DISTINCT name) FROM table_name;
The result will be a number, representing all the names that are not repeated in the table.
WHERE
General syntax:
SELECT column1, column2 FROM table
WHERE conditions;
Let's see it with an example:
How many film titles have an "imdb rate" bigger than 4, a "length" bigger or equal to 60 and a rating of 'Good'?
Assuming we have three columns, 'imdb_rate', 'film_length' and 'rating' in the table 'film', we would make a query that looks like this:
SELECT COUNT(title) FROM film
WHERE imdb_rate > 4 AND film_length >= 60
AND rating = 'Good';
ORDER BY
General syntax:
SELECT column_1, column_2, column_3
FROM table_name
ORDER BY column_1, column2 ASC/DESC; --> Here we are defining ascendant or descendant. ORDER BY uses ASC by default
LIMIT
The LIMIT command allows us to limit the number of rows returned for a query. It is useful if we want to know, for example, the 10 highest payments in a table. This also becomes useful in combination with ORDER BY.
LIMIT goes at the very end of a query request, being the last command executed.
General syntax:
Let's suppose that we have a 'payment' table with 2 columns 'amount' and 'payment_date', where we want to know the 'amounts' that are different from 0.00, and we want to sort/order them by 'payment_date', limiting the query to 10 results.
SELECT * FROM payment
WHERE amount != 0.00
ORDER BY payment_date
LIMIT 10;
BETWEEN
The BETWEEN operator can be used to match a value against a range of values:
value BETWEEN low AND high;
We can define the BETWEEN operator as INCLUSIVE or EXLUSIVE
INCLUSIVE:
value BETWEEN low AND high;
OR
value > = low AND value < = high;
EXCLUSIVE:
We can combine it with the NOT logical operator
value NOT BETWEEN low AND high;
OR
value < low OR value > high;
IMPORTANT!
When using the BETWEEN operator with dates that also include timestamp information, we should pay careful attention on using BETWEEN versus <=, >= (comparison operators), due to the fact that a date time starts at 0:00.
An example, combined with a WHERE statement, might be:
SELECT * FROM payment
WHERE payment_date BETWEEN '2007-02-01' AND '2007-02-15';
Let's say we receive a certain 20 results (all on the 2007-02-14). But if we do:
SELECT * FROM payment
WHERE payment_date BETWEEN '2007-02-01' AND '2007-02-14';
Now, we receive 0 results. This is because all of them happen on the '2007-02-14'. When we are dealing with this end date ('2007-02-14'), it's actually going up to the beginning of the day (00:00:00), not towards the end, which would be the 24-hour mark of the 14th day (23:59:59). This is why it is important to make a double check.
IN
We can use the IN operator to create a condition that checks to see if a value is included in a list of multiple options.
General syntax:
value IN (option1, option2, ... , option_n);
Example:
We want to get the colors red, blue and green from a table.
SELECT color FROM table_name
WHERE color IN ('red', 'blue', 'green');
Now, let's say that we want to get every color that is not red or blue.
SELECT color FROM table_name
WHERE color NOT IN ('red', 'blue'); --> every color that is not red or blue
As we can see, this gives us the advantage of not having to write a lot of OR statements.
LIKE and ILIKE
The LIKE operator allows us to perform pattern matching against string data with the use of wildcard characters:
- Percent %: Matches any sequence of characters
- Underscore _: Matches any single character
LIKE is case-sensitive, on the other hand, ILIKE is case-insensitive.
Percent:
Example:
Let's suppose that we want all the names that begin with an 'A'
SELECT * FROM table_name
WHERE name LIKE 'A%'; --> starts with 'A' and then it can have anything (numbers, letters, special characters, etc)
Or, all the names that end with an 'a'
SELECT * FROM table_name
WHERE name LIKE '%a'; --> it can be anything that ends with 'a'
Underscore:
It allows us to replace just a single character
Example:
We want to get all "Mission Impossible" films
SELECT * FROM table_name
WHERE title LIKE 'Mission Impossible _'; --> This query is going to give us all the 'Mission Impossible', 1, 2, 3, etc
Now, let's imagine that we have version string codes in the format 'Version#A4', 'Version#B7', etc... If we want to get all the different versions, we can use multiple underscores:
SELECT * FROM table_name
WHERE value LIKE 'Version#__'; --> Here, we are using 2 underscores (it looks like one '_' but there's two '__' :) ), which can be filled of any sort of characters we want (numbers, letters, special characters, etc)
Finally, we can combine pattern matching operators to create more complex patterns, like:
SELECT * FROM table_name
WHERE name LIKE '_her%';
This query is going to give us every name that has a 'her' in it, with the exception that can only have one character before it. We would receive results like these:
Cheryl
Theresa
Sherri
Aggregation Functions
The main idea behind an aggregate function is to take multiple inputs and return a single output.
Most common Aggregate Functions:
- AVG() - returns average or mean value
- COUNT() - returns number of values
- MAX() - returns maximum value
- MIN() - returns minimum value
- SUM() - returns the sum of all values
Example:
We want to know which film is the one with the shortest duration and the longest
SELECT MIN(film_length) FROM film;
SELECT MAX(film_length) FROM film;
SELECT MAX(film_length),MIN(film_length) FROM film;
Now, if we want to know the average duration between all the films, we can do:
SELECT AVG(film_length) FROM film;
Here, we are going to receive a bunch of 0s. To solve this you can use ROUND, which takes 2 parameters, (value, decimal_places):
SELECT ROUND (AVG(film_length), 2) FROM film;
GROUP BY
GROUP BY allows us to aggregate columns per some category.
General syntax:
AGG is a placeholder for some "Aggreate function" or data_column.
SELECT category_col, AGG(data_col)
FROM table_name
GROUP BY category_col;
So, the GROUP BY clause must appear right after a FROM or WHERE statement. Now, we can filter things out like this:
SELECT category_col, AGG(data_col)
FROM table_name
WHERE category_col != 'A'
GROUP BY category_col;
In the SELECT statement, columns must either have an Aggregate Function or be in the GROUP BY call. So the column that we pick after SELECT, must be in GROUP BY. The only exception to this is if we call some sort of AGG on the column itself, like in the example above.
If we want to GROUP BY date, we have to call the special DATE function to convert the "timestamp" to a date, and then we can GROUP BY that.
SELECT DATE(payment_date) FROM payment;
DATE choose the day, not the minutes or hour per transaction.
Query example:
SELECT DATE(category_col), SUM(data_col) FROM table_name
GROUP BY DATE(category_col)
ORDER BY SUM(data_col) ASC/DESC;
HAVING
The HAVING clause allows us to filter AFTER an aggregation has already taken place.
Let's see it with an example:
SELECT company, SUM(sales)
FROM finance_table
WHERE company != 'Google' --> we can use WHERE for 'company' but not for 'sales'
GROUP BY company
HAVING SUM(sales) > 1000; --> but here yes because of the HAVING clause
HAVING allows us to use the aggregate result as a filter, along with a GROUP BY statement.
In short:
- We can use WHERE for everything less aggregated statements.
- HAVING for ONLY aggregated statements.
AS statement
AS allows us to create an 'alias' for a column or result. We can use it to make a result column more readable or understandable for us or somebody else.
General syntax:
SELECT column_name AS new_name
FROM table_name;
SELECT SUM(column_name) AS new_name
FROM table_name;
SELECT SUM(amount) AS net_revenue
FROM payment;
Let's see this with a more real life example:
Let's suppose that we have a table 'payment' with a few customer ids, and the amount that they spent during a period of time. From this, we want to get the customers that spent more than 100.
SELECT customer_id, SUM(amount) AS total_spent
FROM payment
GROUP BY customer_id
HAVING SUM(amount) > 100;
Now, if we do:
SELECT customer_id, SUM(amount) AS total_spent
FROM payment
GROUP BY customer_id
HAVING SUM(total_spent) > 100;
This query is gonna give us an error, because we are using the alias total_spent for the SUM operation, instead we should use amount. AS is the last thing that gets assign.
INNER JOIN
JOINs allow us to combine multiple tables together. The main reason for the different JOIN types is to decide how to deal with information only present in one of the joined tables.
General syntax:
SELECT * FROM tableA
INNER JOIN TableB
ON TableA.col_match = TableB.col_match;
FULL OUTER JOIN
There are few different types of OUTER JOINs. They will allow us to specify how to deal with values that are only present in one of the tables being joined.
A FULL OUTER JOIN just grabs everything, whether it's present in both tables or present only in one table. The order of tables doesn't matter because they are symmetrical.
General syntax:
SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.col_match = TableB.col_match;
When one value is not in the other table, SQL fills it with "null", clarifying that there were no values.
FULL OUTER JOIN with WHERE:
We can do this in order to get unique rows to either table (rows not found in both tables). This is the exact opposite of an INNER JOIN.
SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.col_match = TableB.col_match
WHERE TableA.id IS null OR
TableB.id IS null;
This query is going to grab things that are unique either completely the TableA or unique to TableB. Remember, this is symmetrical, so we can switch them.
LEFT OUTER JOIN
A LEFT OUTER JOIN results in the set of records that are in the left table, if there is no match with the right table, the results are "null". Before we said that the JOINs were symmetrical, now, is when it starts to get asymmetrical.
General syntax:
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.col_match = TableB.col_match;
Here, we are saying that we are going to grab things that are EXCLUSIVELY to TableA or can be found in both.
IMPORTANT: It's critical to remember that table order does matter in SQL's syntax.
LEFT OUTER JOIN with WHERE: It gets rows unique to left table
Example:
What if we only want entries unique to Table A, rows found in Table A and not found in Table B?
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.col_match = TableB.col_match
WHERE TableB.id IS null;
RIGHT JOIN
A RIGHT JOIN is essentially the same as a LEFT JOIN, except that the tables are switched.
General syntax:
SELECT * FROM TableA
RIGHT OUTER JOIN TableB --> same as RIGHT JOIN TableB
ON TableA.col_match = TableB.col_match;
We can use WHERE, in order to clarify that we only want tables that can be found exclusively in TableB and are not found in TableA:
SELECT * FROM TableA
RIGHT OUTER JOIN TableB --> same as RIGHT JOIN TableB
ON TableA.col_match = TableB.col_match
WHERE TableA.id IS null;
UNION
The UNION operator is used to combine the result-set of two or more SELECT statements. It basically serves to directly concatenate two results together, like "pasting" them together.
** General syntax:**
SELECT column_name(s) FROM table_1
UNION
SELECT column_name(s) FROM table2
ORDER BY name; --> ORDER BY is not necessary
Something to keep in mind: It should be logical to make an UNION between the 2 tables, they should match up in a way that you can stack the results right on top of another.
Timestamps and Extract
PostreSQL can hold date and time information:
- TIME - Contains only time
- DATE - Contains only date
- TIMESTAMP - Contains date and time
- TIMESTAMPTZ - Contains date, time, and timezone
TO REMEMBER: You can always remove historical information, but you can't go back and add it! So, it's important to consider thinking long-term when choosing a particular time data type.
Some functions and operations:
- TIMEZONE
- NOW
- TIMEOFDAY
- CURRENT_TIME
- CURRENT_DATE
If we run the following queries:
SHOW ALL; --> It gives us around 300 parameters that we can use, like SHOW TIMEZONE, which gives us our time zone
SELECT NOW(); --> It grabs our current time zone that we're working in
SELECT TIMEOFDAY(); --> It gives us the same information as NOW() but as a string, a bit easier to read. Also, a little easier to store and grab components from, because it's just a text string with that information, it's no longer a specific time stamp style
SELECT CURRENT_TIME; --> Time with time zone
SELECT CURRENT_DATE;
How can we extract information from a time based data type? For this we can use:
- EXTRACT()
- AGE()
- TO_CHAR()
EXTRACT(): It allows to "extract" or obtain a sub-component of a date value, such as YEAR, MONTH, DAY, WEEK, QUARTER.
Example:
If we would have a full timestamp and we want only the year of it, we can do:
SELECT EXTRACT (YEAR FROM date_col)
AS year --> we can add an alias
FROM table_name;
AGE(): It calculates and returns the current age, given a timestamp. Basically, it calculates how old is that particular timestamp given when you are actually querying it.
Syntax:
SELECT AGE(date_col) --> date_col = timestamp column
FROM table_name;
TO_CHART(): General function to convert data types to text. Useful for timestamp formatting. It's not explicitly used just for timestamps, but it's really useful for those cases.
Syntax:
SELECT TO_CHART(date_col, 'mm-dd-yyyy') --> The last parameter is a string. We can format it as we want, like an integer into a string
FROM table_name;
Feel free to explore the documentation on Data Type Formatting Functions !
Mathematical Functions and Operators
Here, we have functions like ROUND that we saw before. The best way to learn them, is by exploring the documentation and finding the ones useful for your queries!
Mathematical Functions and Operators
String Functions and Operators
PostgreSQL also provides a variety of string functions and operators that allow us to edit, combine and alter text data columns. As I said before, the best way to learn it, is by diving into the documentation: String Functions and Operators
However, let's see some examples:
Let's suppose that we want to know the length of a first_name, we would do a query that looks like this:
SELECT LENGTH(first_name) FROM table_name;
Now, we want to add together first_name and last_name with a space in the middle:
SELECT first_name || ' ' || last_name --> ' ' adds a space in the middle
AS full_name
FROM table_name;
The same example, but using the upper method:
SELECT upper(first_name) || ' ' || upper(last_name) AS full_name
FROM customer;
Let's jump into a more realistic example:
Assuming that we want to create an email for a customer, we would the take first letter of the first name, then add their last name and finally mix it with something like "@company .com".
For this example, we can use a function called "left", which returns the first n characters in a string.
LEFT(str text, n int);
Now, the query:
SELECT LOWER(LEFT(first_name,1)) || LOWER(last_name) || '@gmail.com'
AS customer_email
FROM customer;
Sub-query
A sub-query allows us to build more complex queries, essentially performing a query on the results of another query. The general syntax involves two SELECT statements.
Let's see an example:
How can we get a list of students who scored better than the average grade?
SELECT student, grade
FROM test_scores
WHERE grade > (SELECT AVG(grade) --> This part of the query runs first
FROM test_scores);
The sub-query is performed first, since it's inside a parenthesis. We can also use the IN operator in conjunction with a sub-query to check against multiple results returned.
Example:
A sub-query can operate on a separate table:
SELECT student, grade
FROM test_scores
WHERE student IN
(SELECT student
FROM honor_roll_table);
In this case we are asking for a student that is inside another table. We could do this with a JOIN too, but for especially complex queries, it's easier to think of them as a sub-query.
The EXISTS operator is used to test the existence of rows in a sub-query. Typically, a sub-query is passed in the EXISTS() function to check if any rows are returned with the sub-query.
General Syntax:
SELECT column_name
FROM table_name
WHERE EXISTS
(SELECT column_name FROM
table_name WHERE condition);
SELF-JOIN
A self-join is a query in which a table is joined to itself. They are useful for comparing values in a column of rows within the same table.
They are not as common as just a normal INNER JOIN, because their use depends more on the structure of the table and the actual question we are trying to solve.
The self-join can be seen as a join of two copies of the same table. The table is not actually copied, but SQL performs the command as if it were. There is no special keyword for a self-join, it's simply the standard syntax of JOIN with the same table on both sides.
When using a self-join, as we're using the same table, it's necessary to use an alias for the table, otherwise the table names would be ambiguous.
General syntax:
SELECT tableA.col, tableB.col
FROM main_table AS tableA --> In this two lines, we are assigning two different "aliases" to the same table
JOIN main_table AS tableB ON --> In this two lines, we are assigning two different "aliases" to the same table
tableA.some_col = tableB.other_col;
All the "table"s are referring to the same table, but we're just using the aliases in order to organize the query.
Okay, that was long... If you've reached here, I want to thank you for reading it and any comments, advices, fixes in the text or queries are more than welcome! Feel free to use it and share it!
To finish, I would like to thank to Jose Portilla, I was able to extract all this information from his great course "Complete SQL Bootcamp" on Udemy once I finished it.
All GIFs taken from GIPHY
Top comments (1)
This is a great help for someone that doesn't work with SQL a lot but still needs some of it once in a while. Thanks for taking the time!