DEV Community

Adienge Moses
Adienge Moses

Posted on

ESSENTIAL SQL COMMANDS FOR DATA SCIENCE

SQL (Structured Query Language) is a popular programming language used for managing and manipulating data in relational databases. Data scientists often use SQL to query data from databases and extract useful insights. In this article, we'll cover some essential SQL commands that data scientists should be familiar with, along with examples of code.

1.SELECT

SELECT is a statement used to search through a table's contents. You can select which columns to retrieve and filter the data using various criteria.

Code Format
SELECT column1, column2
FROM table;

For instance;
SELECT id, account_id, total_amt_usd
FROM orders;

From the example, columns id, account_id, total_amt_usd are selected from the orders table.

2.WHERE

This statement simply filters data with it being a conditional.
Code Format
SELECT column1, column2
FROM table
WHERE condition;

For instance;
SELECT id, account_id, total_amt_usd
FROM orders
WHERE total_amt_usd<500;

3.ORDER BY

This is used to sort results of data in any column. The results can be sorted in ascending(default) or descending order as ASC or DESC which is added after the column in the order statement.
Code Format
SELECT column1, column2
FROM table
ORDER BY column2;

For instance;
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC;

From the above example, id, account_id, total_amt_usd are retrieved from the orders table and then total_amt_usd sorted by the ORDER BY clause in descending order. We can also ORDER BY more than one column at a time.

4.LIMIT

As the name suggests, the limit statement is used to limit the number of rows displayed to a specific number.
Code Format
SELECT column1, column2
FROM table
ORDER BY column2
LIMIT no. of rows to display;

Example;
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC
LIMIT 5;

On the above code, the LIMIT statement will display 5 rows of results as specified.

5.GROUP BY

The GROUP BY clause also is a type that is used to group data; it generally provides summary statistics.
Code Format
SELECT column1, column2
FROM table
GROUP BY column1;

A code example of this would be;
SELECT certification, title_count
FROM films
GROUP BY certification;

Results of the code would display as;
certification title_count
Unrated 62
M 5
G 112
NC-17 7
The GROUP BY statement displays certification categories by grouping them and displaying their total numbers.

6.JOIN

This command is used to combine two or more tables on a common column.
Code Format
SELECT column1, column2
FROM table1
JOIN table2
ON table1.column=table2.column;

For instance;
SELECT customers.customer_id, orders.order_date,
orders.order_total
FROM customers
JOIN orders
ON customers. customer_id=orders. customer_id;

The code above retrieves customer ID, order_date, order_total from the customers and order tables and joins both of them based on customer ID column.

7.DISTINCT

DISTINCT removes duplicates to return unique values.
Code Format
SELECT DISTINCT column1
FROM table;

For instance;
SELECT DISTINCT language
FROM films;

The above code results to;
language
Danish
Greek

The results display only unique languages that are in the column language if there are multiple languages of the same name then only one of them is displayed.

8.COUNT

COUNT () clause counts the number of records with a value in a field.
When using it we use an alias for code readability.
Code Format
SELECT COUNT(column1) AS count_column1
FROM table;

For instance;
SELECT COUNT(birthdates) AS count_birthdates
FROM people;

Results of the query would appears as;
count_birthdates
6152
In conclusion, those are some of the essential SQL commands data scientists should have at their grasp. Apart from that best practices such as capitalizing keywords, adding new lines, writing clear and readable code and not excluding a semi-colon at end of the code should always be observed.

Latest comments (0)