Kiongo-Bob

Essential SQL Commands for Data Science

SQL (Structured Query Language) is a crucial tool for data science. It lets you retrieve and manipulate data stored in relational databases in a robust and efficient way. Read along as I walk through some of the most important SQL commands for data science.

Statements
SELECT
The SELECT statement is used to retrieve data from one or more tables. It comes in handy when you need to extract specific features (columns) from a dataset.

Example:
SELECT col1, col2 FROM tab1;
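
If you want to try this query without setting up a database server, here is a minimal sketch using Python's built-in sqlite3 module. Using SQLite is my assumption for illustration, and the rows are made up; only the names tab1, col1 and col2 come from the example above.

```python
import sqlite3

# In-memory database; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER, col2 TEXT, col3 REAL)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [(1, "a", 0.5), (2, "b", 1.5), (3, "c", 2.5)],
)

# Ask for two of the three columns; col3 is left out of the result.
rows = conn.execute("SELECT col1, col2 FROM tab1").fetchall()
print(rows)
conn.close()
```

Each row comes back as a tuple holding only the requested columns.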

WHERE
The WHERE clause is used to filter rows based on one or more conditions. In data science, it is often used to extract only the rows that meet the desired criteria.

Example:
SELECT col1, col2 FROM tab1 WHERE col1 > 10;

In this example, the condition is col1 > 10, so only rows where col1 is greater than 10 are returned.
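
The filter can be sketched in runnable form as follows, assuming SQLite through Python's sqlite3 module and some invented sample rows. The threshold is passed as a query parameter instead of being pasted into the SQL string, and a second query shows two conditions combined with AND.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER, col2 TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [(5, "x"), (12, "y"), (30, "x"), (10, "y")],
)

threshold = 10
# Keep only rows where col1 is strictly greater than the threshold.
rows = conn.execute(
    "SELECT col1, col2 FROM tab1 WHERE col1 > ? ORDER BY col1",
    (threshold,),
).fetchall()

# Conditions can be combined with AND / OR.
combined = conn.execute(
    "SELECT col1, col2 FROM tab1 WHERE col1 > ? AND col2 = ? ORDER BY col1",
    (threshold, "x"),
).fetchall()

print(rows)
print(combined)
conn.close()
```

Note that the row with col1 equal to 10 is excluded, because > is a strict comparison.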

GROUP BY
The GROUP BY clause groups rows that share the same value in one or more columns, usually so that an aggregate function such as AVG or COUNT can be computed per group. In data science, it is often used to summarize data by specific features or categories.

Example:
SELECT col1, AVG(col2) FROM tab1 GROUP BY col1;
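
Here is a small sketch of what grouping produces, again assuming SQLite via Python's sqlite3 module and made-up rows. The table holds a category in col1 and a number in col2, and the query returns one row per category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 TEXT, col2 REAL)")
# Hypothetical sample rows: two in group "a", one in group "b".
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [("a", 10), ("a", 20), ("b", 5)],
)

# One output row per distinct value of col1, with the group's average.
rows = conn.execute(
    "SELECT col1, AVG(col2) FROM tab1 GROUP BY col1 ORDER BY col1"
).fetchall()
print(rows)
conn.close()
```

Three input rows collapse into two output rows, one for each value of col1.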

HAVING
The HAVING clause is used to filter groups produced by GROUP BY. Unlike WHERE, which filters individual rows before grouping, HAVING is applied after the aggregates are computed. In data science, it is used when a condition must apply to the groups rather than to single rows.

Example:
SELECT col1, AVG(col2) FROM tab1 GROUP BY col1 HAVING AVG(col2) > 100;
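
The difference between filtering rows and filtering groups is easiest to see side by side. The sketch below assumes SQLite via Python's sqlite3 module and uses invented numbers; the extra WHERE condition in the second query is my own addition for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 TEXT, col2 REAL)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [("a", 50), ("a", 250), ("b", 80), ("b", 90), ("c", 300)],
)

# HAVING only: averages are computed over all rows, then groups are filtered.
having_only = conn.execute(
    "SELECT col1, AVG(col2) FROM tab1 "
    "GROUP BY col1 HAVING AVG(col2) > 100 ORDER BY col1"
).fetchall()

# WHERE + HAVING: rows with col2 <= 60 are dropped before averaging.
where_then_having = conn.execute(
    "SELECT col1, AVG(col2) FROM tab1 WHERE col2 > 60 "
    "GROUP BY col1 HAVING AVG(col2) > 100 ORDER BY col1"
).fetchall()

print(having_only)
print(where_then_having)
conn.close()
```

Group "a" passes in both queries, but its average changes from 150 to 250 once the WHERE clause removes the row with value 50 before grouping.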

ORDER BY
The ORDER BY clause is used to sort the rows returned by a query by one or more columns, in ascending (ASC, the default) or descending (DESC) order. In data science, it is usually employed to rank observations or to sort data by specific features.

Example:
SELECT col1, col2 FROM tab1 ORDER BY col1 ASC;
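
Sorting by more than one column works by using the later columns as tie-breakers. A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER, col2 TEXT)")
# Hypothetical sample rows, inserted in no particular order.
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [(2, "b"), (1, "z"), (2, "a"), (3, "m")],
)

# Largest col1 first; rows with equal col1 are ordered by col2 ascending.
rows = conn.execute(
    "SELECT col1, col2 FROM tab1 ORDER BY col1 DESC, col2 ASC"
).fetchall()
print(rows)
conn.close()
```

The two rows with col1 equal to 2 appear together, ordered by col2.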

JOIN
The JOIN clause is used to combine rows from two or more tables based on a common column. In data science, it makes it possible to merge datasets that share a common attribute, such as an ID.

Example:
SELECT tab1.col1, tab2.col2 FROM tab1 JOIN tab2 ON tab1.common_col = tab2.common_col;
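
A plain JOIN is an inner join: it keeps only rows that have a match in both tables. The sketch below contrasts it with LEFT JOIN, which keeps every row of the first table. It assumes SQLite via Python's sqlite3 module; the shared key column named id and all the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is a hypothetical shared key column used only for this illustration.
conn.execute("CREATE TABLE tab1 (id INTEGER, col1 TEXT)")
conn.execute("CREATE TABLE tab2 (id INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.executemany(
    "INSERT INTO tab2 VALUES (?, ?)",
    [(1, 90), (2, 75), (4, 60)],
)

# Inner join: only ids present in both tables (1 and 2).
inner = conn.execute(
    "SELECT tab1.col1, tab2.col2 FROM tab1 "
    "JOIN tab2 ON tab1.id = tab2.id ORDER BY tab1.id"
).fetchall()

# Left join: every row of tab1; missing matches become NULL (None in Python).
left = conn.execute(
    "SELECT tab1.col1, tab2.col2 FROM tab1 "
    "LEFT JOIN tab2 ON tab1.id = tab2.id ORDER BY tab1.id"
).fetchall()

print(inner)
print(left)
conn.close()
```

Which variant to use depends on whether unmatched observations should be dropped or kept with missing values.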

UNION
The UNION operator combines the results of two or more queries into a single result set and removes duplicate rows. The queries must return the same number of columns with compatible types. In data science, it is used to stack datasets that have the same structure.

Example:
SELECT col1, col2 FROM tab1
UNION
SELECT col1, col2 FROM tab2;
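
One detail worth checking on real data is how duplicates are handled: UNION drops rows that appear more than once, while UNION ALL keeps them. A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE TABLE tab2 (col1 INTEGER, col2 TEXT)")
# Hypothetical sample rows; (2, "b") appears in both tables.
conn.executemany("INSERT INTO tab1 VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO tab2 VALUES (?, ?)", [(2, "b"), (3, "c")])

# UNION removes the duplicate row.
union_rows = conn.execute(
    "SELECT col1, col2 FROM tab1 UNION "
    "SELECT col1, col2 FROM tab2 ORDER BY col1"
).fetchall()

# UNION ALL keeps every row from both queries.
union_all_rows = conn.execute(
    "SELECT col1, col2 FROM tab1 UNION ALL "
    "SELECT col1, col2 FROM tab2 ORDER BY col1"
).fetchall()

print(len(union_rows), len(union_all_rows))
conn.close()
```

When the duplicates are meaningful observations, UNION ALL is the variant that preserves them.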

Functions and Operators
COUNT
The COUNT function returns the number of rows that match a query. COUNT(*) counts every row, while COUNT(col) counts only the rows where col is not NULL. In data science, it is often used to count the number of observations or instances in a dataset.

Example:
SELECT COUNT(*) FROM tab1 WHERE col1 > 10;
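
Missing values change what COUNT reports, so it helps to compare the variants on a column that contains a NULL. This is a sketch assuming SQLite via Python's sqlite3 module, with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER)")
# Hypothetical sample rows, including one missing value and one repeat.
conn.executemany(
    "INSERT INTO tab1 VALUES (?)",
    [(5,), (12,), (12,), (None,), (30,)],
)

# All rows, non-NULL values, and distinct non-NULL values.
total, non_null, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(col1), COUNT(DISTINCT col1) FROM tab1"
).fetchone()

# A NULL never satisfies col1 > 10, so it is not counted here.
above_10 = conn.execute(
    "SELECT COUNT(*) FROM tab1 WHERE col1 > 10"
).fetchone()[0]

print(total, non_null, distinct, above_10)
conn.close()
```

Comparing COUNT(*) with COUNT(col1) is a quick way to see how many values are missing in a column.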

MAX and MIN
The MAX and MIN functions return the maximum and minimum values in a column, respectively. In data science, they are often used to check the range of a feature, which is a quick first step in spotting outliers.

Example:
SELECT MAX(col1), MIN(col2) FROM tab1;
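
MAX and MIN return only the extreme values, not the rows they belong to. One way to recover those rows is a scalar subquery, sketched below with SQLite via Python's sqlite3 module. The table contents are hypothetical, with one deliberately extreme value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER, col2 REAL)")
# Hypothetical sample rows; 95.0 stands out from the rest.
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [(1, 10.0), (2, 12.5), (3, 11.0), (4, 95.0)],
)

# Range of col2 in a single query.
max_val, min_val = conn.execute(
    "SELECT MAX(col2), MIN(col2) FROM tab1"
).fetchone()

# The row (or rows) holding the maximum, via a scalar subquery.
top_rows = conn.execute(
    "SELECT col1, col2 FROM tab1 "
    "WHERE col2 = (SELECT MAX(col2) FROM tab1)"
).fetchall()

print(min_val, max_val, top_rows)
conn.close()
```

Whether such a value is really an outlier still needs a proper statistical check; the query only locates the candidate.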

SUM and AVG
The SUM and AVG functions calculate the total and the average (mean) of a numeric column, respectively. Both ignore NULL values. In data science, they are often used to compute totals and the mean as a measure of central tendency.

Example:
SELECT SUM(col1), AVG(col2) FROM tab1;
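
Because NULLs are skipped, AVG divides by the number of non-NULL values, not by the number of rows. The sketch below shows the effect, assuming SQLite via Python's sqlite3 module and made-up rows. Treating a missing value as zero with COALESCE is shown only for contrast, not as a recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER, col2 REAL)")
# Hypothetical sample rows; one col2 value is missing.
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [(10, 1.0), (20, None), (30, 3.0)],
)

# SUM over col1, AVG over the two non-NULL values of col2.
total, mean = conn.execute(
    "SELECT SUM(col1), AVG(col2) FROM tab1"
).fetchone()

# Replacing NULL with 0 first changes the denominator from 2 to 3.
mean_with_zero = conn.execute(
    "SELECT AVG(COALESCE(col2, 0)) FROM tab1"
).fetchone()[0]

print(total, mean, mean_with_zero)
conn.close()
```

The two averages differ, so it is worth deciding explicitly how missing values should be treated before reporting a mean.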

LIKE
The LIKE operator searches for a pattern within a text column. In a pattern, % matches any sequence of characters (including none) and _ matches exactly one character. In data science, it is often used to search for specific keywords or phrases in text data.

Example:
SELECT col1, col2 FROM tab1 WHERE col3 LIKE '%enter_keyword_here%';
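
Here is a short sketch of a few patterns, assuming SQLite via Python's sqlite3 module and invented text values. For brevity the query selects only col3. Case sensitivity of LIKE differs between database systems (SQLite ignores case for ASCII letters by default), so it is worth checking on the system you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER, col3 TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [
        (1, "machine learning"),
        (2, "deep learning"),
        (3, "statistics"),
        (4, "learn"),
    ],
)

# "%" matches any run of characters, "_" matches exactly one character.
patterns = ["%learning%", "deep%", "_tatistics"]
matches = {}
for pattern in patterns:
    cursor = conn.execute(
        "SELECT col3 FROM tab1 WHERE col3 LIKE ? ORDER BY col3",
        (pattern,),
    )
    matches[pattern] = [row[0] for row in cursor]

print(matches)
conn.close()
```

The value "learn" matches none of the three patterns, because "%learning%" requires the full word to appear.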

DISTINCT
The DISTINCT keyword removes duplicate rows from a result set. In data science, it is often used to list the unique values or categories of a feature.

Example:
SELECT DISTINCT col1 FROM tab1;
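
DISTINCT applies to the whole selected row, so adding a second column changes what counts as a duplicate. A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 TEXT, col2 INTEGER)")
# Hypothetical sample rows; ("a", 1) appears twice.
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [("a", 1), ("a", 1), ("a", 2), ("b", 1)],
)

# Unique values of a single column.
unique_col1 = conn.execute(
    "SELECT DISTINCT col1 FROM tab1 ORDER BY col1"
).fetchall()

# Unique combinations of two columns.
unique_pairs = conn.execute(
    "SELECT DISTINCT col1, col2 FROM tab1 ORDER BY col1, col2"
).fetchall()

print(unique_col1)
print(unique_pairs)
conn.close()
```

The first query returns two categories, while the second returns three distinct pairs from the four stored rows.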

These are some of the most useful SQL commands available for data science. By combining them with others not covered here, you can gain valuable insights from your data and make better-informed decisions.
