DEV Community

Cover image for Essential SQL Commands for Data Science
Belinda Florence
Belinda Florence

Posted on • Edited on

Essential SQL Commands for Data Science

Effective and essential tools are critical when it comes to good performance. This is not an exemption to data scientists, they too seek tools and ways of making their work enjoyable and fast despite the challenges that come with the tasks. SQL is helpful in extracting important information from a database. In this article, we shall look at SQL commands that are essential for data scientists to know and use in their tasks. The commands vary from the most basic one to the most advanced with examples to make it possible to follow when reading the article.

a) SELECT...FROM...
This command is used to retrieve data from one or more tables in the database. SELECT command retrieves all data, sorts, and filters using accompanying different functions as illustrated.

To get all data in a given table:

SELECT *
FROM _table_name_
Enter fullscreen mode Exit fullscreen mode

To get retrieved data sorted we can use the group by function as illustrated:

SELECT *
FROM students
GROUP BY class
Enter fullscreen mode Exit fullscreen mode

However, it is advisable that the GROUPBY function be accompanied by COUNT function to make the command sensible. For instance

SELECT count(_Column_name_) from _table_name_
group by _column_name2_;
Enter fullscreen mode Exit fullscreen mode

We can also select specific columns from a table and view them using the following SELECT command:

select column1,colum3
from _table_name_;
Enter fullscreen mode Exit fullscreen mode

ii)Distinct
It is used to select only unique values in the tables for the specified column. For many tables, there is duplication of certain values and this brings redundancy when it comes to viewing. This function help solve this problem.

select distinct (_column_name_)
from _table_name_;
Enter fullscreen mode Exit fullscreen mode

iii) Where
This is an additional function that is used to filter data in the database. Here we specify the desired conditions to be met for the records we want to be displayed.

select _column_name_ 
from _table_name_
where _condition_;
Enter fullscreen mode Exit fullscreen mode

The only data that will be returned is the one that meets the conditions only and non-other Otherwise, no record will be displayed. We can also return more than one column, for instance in a class a school table where we want to see students who have more than 10 years, we can use the following command:

select Age, First_Name
from School
where Age>10;
Enter fullscreen mode Exit fullscreen mode

This command returns the column of age and First names of the students in the table who have an age greater than 10 years.
When it comes to the conditions, we use '>' for greater than,'>=' for greater or equal to,'<=' for less or equal to and '=' for equal to. It is awesome to filter data, isn't it?

Where can also be combined by logic operators to get specific results of records. Foer instance, where the condition should meet more than one criteria, we use and operator to achieve this as shown in the syntax below:

select Age, First_Name
from School
where Age>10 and First_Name!= James;
Enter fullscreen mode Exit fullscreen mode

or can also be used to get records that meet either of the condition or both of them as in the example below:

select Age, First_Name
from School
where Age>10 or First_Name = James;
Enter fullscreen mode Exit fullscreen mode

in addition to that, not can also be used to show the condition given, if met, the records that meet that condition should not be returned as shown in the example:

select Age, First_Name
from School
where not Age>10;
Enter fullscreen mode Exit fullscreen mode

This returns records of any student less than 9 years old.

b) Order By
By default, this command sorts desired data in ascending order. In the case that we want our data order by in descending order, we use the DESC function to achieve a descending order as in the sample:

select _column_name_
from _table_name_
order by _column_name1_;
Enter fullscreen mode Exit fullscreen mode

to order in descending order we specify within the command as follows:

select _column_name_
from _table_name_
order by _column_name1_ desc;
Enter fullscreen mode Exit fullscreen mode

ii)Order by more than one column
So let us say we want to use more than one column to order our data, we may ask ourselves if it is possible. The answer is yes. It is very possible to use more than one column as follows:

select _column_name_
from _table_name_
order by _column_name1_asc,column_name2 desc;
Enter fullscreen mode Exit fullscreen mode

Using more than two columns to specify order by is mostly efficient when we are dealing with data that has duplication in some columns. Let us say, in a school table, we might find several students sharing first names and just to create some order in it, we might query the table to give us the table with the column of age and class ordered for neatness purposes which results to easy view.

c)Group By
This function is used to summarize rows that have same values into groups. In most cases, it is used with aggregated functions like sum, average, max, min and count. For example, given a school, we can find the total number of students, return their first names and group by age in a given class as follows:

select count(_column_name_)
from _table_name_
group by column_name1;
Enter fullscreen mode Exit fullscreen mode

It is also acceptable to use other aggregation like minimum for example:

i) Use count

select count(_column_name_)
from _table_name_
group by column_name1;
Enter fullscreen mode Exit fullscreen mode

ii) Use max

select max(_column_name_)
from _table_name_
group by column_name1;
Enter fullscreen mode Exit fullscreen mode

iii) Use average

select avg(_column_name_)
from _table_name_
group by column_name1;
Enter fullscreen mode Exit fullscreen mode

iv) Use min

select min (_column_name_)
from _table_name_
group by _column_name1;_
Enter fullscreen mode Exit fullscreen mode

The trick with the group, is we categorize data according to a certain value we require and by default is in ascending order which we can specify by using the desc function. Priority of order is given to the column name described in the group by function rather than select function.

d) Join
This command joins rows from more than one table, based on related columns between them. The joins are in four ways;

i)inner
Gives records with matching values in both tables. For instance, if we are querying a number of columns from two different tables that have a similar column name with the same values, then the rest of the columns from different tables will be returned on basis of the common column name and values. Example

Our database has a table of customers_information and credit

CUSTOMER_ID DATE COUNTRY NAME BUSINESS
4 10/12/22 Kenya Israel Hairdressing
7 14/12/22 Tanzania Tesh Hotel
1 20/12/22 Uganda Amor School

credit table
CUSTOMER_ID CREDIT STATUS
7 Poor Denied
6 Good In_process
4 Excellent Complete

when we use inner join, the results will be as follows:

select customers_information.DATE, credit.STATUS
from credit
inner join customers_information on credit.CUSTOMER_ID=customers_information.CUSTOMER_ID;
Enter fullscreen mode Exit fullscreen mode

The syntax is as follows:

select _column_name,_ _column_name2_
from _table_1
_inner join _table_2_
on _table_1.column_name = table_2.column_name2_;
Enter fullscreen mode Exit fullscreen mode

CUSTOMER_ID DATE STATUS
7 14/12/22 Denied
4 10/12/22 Complete

It is possible to also join 3 tables, the more reason to like sql 😂 right?
We use the same syntax as above but with a slight modification to accommodate the third table as demonstrated:

select _column_name,_ _column_name2_,column_name3
from _table_1
_inner join _table_2_
on _table_1.column_name = table_2.column_name2_;
on _table_1.column_name = table_3.column_name3_;
Enter fullscreen mode Exit fullscreen mode

ii) Right Join
This command returns records of the second table (or right table). In case there are matching records from left table, then they are also returned together with those in table1.The syntax is as follows:

select _column_name,_ _column_name2_
from _table_1
right join _table_2_
on _table_1.column_name = table_2.column_name2_;
Enter fullscreen mode Exit fullscreen mode

iii) Left Join
This is similar to right join only that it returns records of table 1 and any similar records found in table 2. The syntax is also similar to that of right join as shown

select _column_name(s)
from _table_1
left join _table_2_
on _table_1.column_name = table_2.column_name2_;
Enter fullscreen mode Exit fullscreen mode

iv)Cross Join
This returns all records from all tables selected. It returns all matching records from the involved tables whether the rows in either table match or not When the conditional keyword is given(where) the results will be similar to those of inner join command query
Its syntax is as follows:

select _column_name,_ _column_name2_
from _table_1
cross join _table_2_;
Enter fullscreen mode Exit fullscreen mode

e) Aggregate
In this part, we shall take a look at the syntax that goes in the aggregate commands. They include, sum,avarage, count, minimum and maximum commands. Let us look at each of them briefly

i) Sum
Now let us say we have a numeric column, like that of age or number of children one has in a given city. It is possible to get the total number of children in that city using the sum command that follows the following syntax:

select sum( _column_name)
from _table_name_;

Enter fullscreen mode Exit fullscreen mode

In addition to this, we can also give conditions to the command of sum to get sum of specific data as shown below:

select sum( _column_name_)
from _table_name_
where condition;
Enter fullscreen mode Exit fullscreen mode

ii)count
If we are not sure by the number of rows, or just need to confirm for safety purposes, then this is the command to use. count() returns the number of rows in a given column using the following syntax:

select count( _column_name_)
from _table_name_;

Enter fullscreen mode Exit fullscreen mode

As also in the sum syntax, conditions can be added to return a specific number of desired rows in a column.

iii) Min
As from the word itself, min() function returns the least value in a given column. Additional conditions are also accepted using the where function. It follows the following syntax.:

select min( _column_name_)
from _table_name_;
Enter fullscreen mode Exit fullscreen mode

iv) avg
Given a numeric column, lets say the age of students in a class, we can get their average age by using the following syntax:

select avg( _column_name_)
from _table_name_;
Enter fullscreen mode Exit fullscreen mode

v) Max
As the name suggest, this function can be used to get the maximum value of a numeric column as shown in the syntax below:

select max( _column_name_)
from _table_name_;
Enter fullscreen mode Exit fullscreen mode

f)Null Values
If we want to check if a record has null values( no value) we use is null or is not null functions. It can be illustrated using the following example:

select Age, First_Name
from School
where Age is null;

Enter fullscreen mode Exit fullscreen mode

If there are empty records in the Age column, the results returns the number of null records and their count. If not the null table is return empty and count is zero.

ii) Is not null
It is used to show all records that do not have empty values. It is illustrated in the following example:

select Age, First_Name
from School
where Age is not null;

Enter fullscreen mode Exit fullscreen mode

g) Regulatory clauses

i) Limit
Given that sometimes we work with large data and sometimes not all the data is necessary to be viewed, we can use limit clause to specify the number like in the example only 5 rows will be returned:

select *
from School
limit 5;

Enter fullscreen mode Exit fullscreen mode

ii) Between
This clause gives a range of records that should be returned. An example is given below:

select *
from School
where Age between 10 and 14;

Enter fullscreen mode Exit fullscreen mode

iii) in
It is used to select multiple values from records using the where statement as shown in the example:

select *
from School
where Age in (3,5);

Enter fullscreen mode Exit fullscreen mode

the syntax is:

select _column_name_
from School
where column_name1 in (value1_in_column_name1,value5_in_column_name1);
Enter fullscreen mode Exit fullscreen mode

h) Manipulation
There are several commands that enable us to manipulate the tables in the databases to achieve a different thing. For instance, let us say the name given to a certain column is too technical for some people who need to view the table. In this case, the clause AS is used to achieve a change of name. It is also called the alias command as illustrated in the example:

select Last_Name as Surname
from School
where Age in (3,5);
Enter fullscreen mode Exit fullscreen mode

The record will return the column name 'Last_Name' but will be recorded as 'Surname'.
It is important to note that the name is only durable during the query time and is not permanent on the table.

ii) Update statement
If we want to change some things in the records that we have, we use the update syntax to achieve this. Where clause is used to specify the records that are being changed. Where is very important to use, otherwise if not specified, all records will be updated. The example below shows how this is possible:

update _table_name_
set column_name =(val_1,val_2,val_3)
where condition;
Enter fullscreen mode Exit fullscreen mode

for instance

update School
set First_Name = ('Julius'),Age = ('3')
where Residence = Nairobi;
Enter fullscreen mode Exit fullscreen mode

iii)Delete statement
For instance, we have been recording values that are wrong to a given table and we have just realized. Is there a way to delete these records until we find the right value to update the records? Of course, using the delete statement makes this possible. Using the following example, we can delete values of age and school for a certain student:

delete from school
where Age = 3;
Enter fullscreen mode Exit fullscreen mode

Now if we are frustrated and have just learnt that all our records are wrong, we can delete the records and remain with the table. What a fresh start, right? This can be achieved using the following syntax:

delete from School;
Enter fullscreen mode Exit fullscreen mode

All records in the 'School' table will be deleted

I) Union
Just like in mathematics, union statements return values of two tables which should be of the same data type and have the same order. They return only distinct values from the two column queried using the select statement as in the syntax below:

select column_name from table1
union
select column_name(n) from table2;
Enter fullscreen mode Exit fullscreen mode

To get duplicates, as in all the values from the two columns from the two tables, use union all as in the syntax:

select column_name from table1
union all
select column_name(n) from table2;
Enter fullscreen mode Exit fullscreen mode

Database **
**i)Creation of Database

It is a simple and straight command that will create a database of the desired name. It is as in the example below:

create database _database_name_;

ii) Deletion
We have been discussing the commands that affect tables and columns but for this section, the discussion will be on commands that affect the database generally;

If we have been recording very confidential information for a given NGO concerning medical reports of individuals in a given area, and the task that the NGO desired is achieved, we can delete the database, in case that the records are no longer important. For this case, we use the drop 'database' command to achieve this as illustrated:

drop database _databse_name_

Tables
i) Creation
We can create a table from scratch or using an existing table. All this is illustrated in the following examples:

create table _table_name_(
      _column_name datatype,
      column_name1 datatype,
      column_name2 datatype_
     );

Enter fullscreen mode Exit fullscreen mode

--creation from scratch

creation from an existing table

create table _table_name_ as 
   select column_name,column_name1,column_name2
   from _existing_table_name_;

Enter fullscreen mode Exit fullscreen mode

ii) Modification
To add and delete or change columns of a given table permanently,we use the 'Alter Table' Command.
To add column we use the following command:

alter table _table_name_
add _column_name datatype_;
Enter fullscreen mode Exit fullscreen mode

We can also change something in the column, like change of datatype of a given column. It is as illustrated:

alter table _table_name_
modify column _column_name datatype_;
Enter fullscreen mode Exit fullscreen mode

There is also a possibility of dropping a column all together, in case that is not really helpful. For instance, we can drop the column of the year of birth if we have the age column of a given record. We use the following syntax:

alter table _table_name_
drop _column_name;
Enter fullscreen mode Exit fullscreen mode

These are the ABC's of the important command for a data scientist that when fully practiced, aid efficient and easy analysis of data

Top comments (0)