DEV Community

Dawit Tadesse Hailu
Dawit Tadesse Hailu

Posted on

Essential SQL Commands for Data Science

Structured Query Language (SQL) is a programming language used to manage and manipulate data stored in a relational database. SQL is essential for data science, as it provides a way to extract and manipulate data from databases, which is crucial for data analysis and visualization. In this article, we will see some essential SQL commands that are crucial in data science.

SELECT

SELECT is the most basic and essential SQL command, used to retrieve data from a database. The SELECT statement is used to specify the columns from which data should be retrieved. The syntax of the SELECT statement is as follows:

SELECT column1, column2, …
FROM table_name;
Enter fullscreen mode Exit fullscreen mode

For example, to retrieve all the data from a table called "employees," we can use the following command:

SELECT * FROM employees;

This command retrieves all the data from the "employees" table.

WHERE

The WHERE clause is used to specify a condition that must be met in order for data to be retrieved. The WHERE clause is used to filter data based on specific criteria. The syntax of the WHERE clause is as follows:

SELECT column1, column2, …
FROM table_name
WHERE condition;
Enter fullscreen mode Exit fullscreen mode

For example, to retrieve all the data from the "employees" table where the salary is greater than $50,000, we can use the following command:

SELECT * FROM employees
WHERE salary > 50000;
Enter fullscreen mode Exit fullscreen mode

This command retrieves all the data from the "employees" table where the salary is greater than $50,000.

GROUP BY

The GROUP BY clause is used to group data based on specific columns. The GROUP BY clause is used in conjunction with aggregate functions such as SUM, AVG, MIN, MAX, and COUNT. The syntax of the GROUP BY clause is as follows:

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;
Enter fullscreen mode Exit fullscreen mode

For example, to retrieve the total salary for each department in the "employees" table, we can use the following command:

SELECT department, SUM(salary)
FROM employees
GROUP BY department;
Enter fullscreen mode Exit fullscreen mode

This command groups the data by department and calculates the total salary for each department.

ORDER BY

The ORDER BY clause is used to sort data in ascending or descending order based on specific columns. The syntax of the ORDER BY clause is as follows:

SELECT column1, column2, …
FROM table_name
ORDER BY column1 ASC/DESC;
Enter fullscreen mode Exit fullscreen mode

For example, to retrieve the data from the "employees" table sorted by salary in descending order, we can use the following command:

SELECT * FROM employees
ORDER BY salary DESC;
Enter fullscreen mode Exit fullscreen mode

This command retrieves the data from the "employees" table sorted by salary in descending order.

JOIN

The JOIN clause is used to combine data from two or more tables based on a common column. The JOIN clause is used to retrieve data from related tables. The syntax of the JOIN clause is as follows:

SELECT column1, column2, …
FROM table1
JOIN table2
ON table1.column = table2.column;
Enter fullscreen mode Exit fullscreen mode

For example, to retrieve data from the "employees" table and the "departments" table where the department ID is the same in both tables, we can use the following command:

SELECT employees.employee_name, departments.department_name
FROM employees
JOIN departments
ON employees.department_id = departments.department_id;
Enter fullscreen mode Exit fullscreen mode

These are some of the most important SQL commands to master for data science. Hope you had a good read and until my next article, cheers.

Top comments (0)