DEV Community

Cover image for SQL101: Introduction to SQL for Data Analysis
Kimani Kanyutu
Kimani Kanyutu

Posted on

SQL101: Introduction to SQL for Data Analysis

Introduction

A database is a collection of related data. A database can be used alone or collaboratively with other databases. A database management system(DBMS) is a software consisting of a set of programs that facilitates the storage, modification and extraction of information from a database. DBMS were developed to address the deficiencies in traditional file systems such as;

  1. Data integrity
  2. Concurrent access by users
  3. Data redundancy and inconsistency
  4. Difficulty accessing the data
  5. Atomicity, consistency, isolation, Durability(ACID) properties

Database Models

A database model shows the structure of a database. It shows the relationships, constraints of how the data is processed and stored inside a database.

1. Hierarchical database management system

In this model, data is organized in a tree-like format. It allows for a parent-child like relationship where a parent can have many children but a child can have only one parent.

2. Network based database management system

The network based model is an advancement on the Hierarchical model. Its creation was to address the lack of flexibility in the hierarchical model.
As a result, it allows for children to have multiple parents, creating room for more complex relationships.

3. Relational database management system

The relational model allows for the storage of data in data tables(relations). The data tables have different rows(records) and different columns (attributes).
Relationships between data tables can be developed. The data tables can have a variety of relationships that range from one-to-one, one-to-many, and many-to-many. This creates room for efficient storage and retrieval of data.

4. Object oriented database management system

In object oriented database management systems also known as OODM, data and data relationships are stored together in single entities known as objects.

Database Languages

Database components are usually consisted of three main components;

I. Data Definition Language

Data definition Language is used to define structures to hold data of specific record types or object types, relationships among them any integrity constraints that needs to be met.
Some of the DDL statements include;
DROP
CREATE
ALTER
TRUNCATE
RENAME
COMMENT

II. Data Manipulation Language

Data Manipulation Language is used to select, retrieve, store, modify, delete, insert and update entries. Part of DML used to retrieve data is called Query language.
Some of the basic DML statements include; INSERT, DELETE, SELECT, UPDATE
INSERT
DELETE
SELECT
UPDATE

III. Data Control Language

Data Control Language is used to control access to data in a database. This involves giving specific privileges to the users to access data items. An example of DCL statement includes;
GRANT
REVOKE

Structured Query Language (SQL)

Structured query language is one of the widely implemented language for relational databases..

Concepts in SQL

  1. Tables. Tables are used to store data. It is a combination of several rows and columns
  2. Rows. A row is a single record in a table
  3. Columns. Columns are used to represent different attributes of the data.

Datatypes

Data types are attributes that specify the type of data the object will hold. Such as integers, character, date and time data and binary strings. Some of the common datatypes can be categorized into;

  1. String data types such as; VARCHAR() CHAR() TEXT
  2. Numeric data types such as; INT() FLOAT() BOOL()
  3. Date and time data types such as; DATE TIME TIMESTAMP DATETIME

Basic SQL commands

Creating databases

To create a database, one can use basic SQL commands. It is extremely important to first understand the type of data that will be stored and the various relationships that need to exist. Plans should be put in place for long term storage. This statements can be used to either create a new database or drop an existing database;
Database creation
CREATE DATABASE _databasename_;
Dropping and existing database;
DROP DATABASE _databasename_

Creating Tables

After creating a database, the next thing is to create tables. Here, the name of the table, the column names and column parameters as well as the datatype in the column are specified.

CREATE TABLE depts(
   first_name VARCHAR(50),
   department VARCHAR(50)
)
Enter fullscreen mode Exit fullscreen mode

Considerations in creating tables
Primary keys. This are unique integer based row identifiers in a table. They are crucial when performing joins on tables.
Foreign key. A foreign key is a reference field for a primary key in another table. The table containing the foreign key is the referencing/child table while the table to which the foreign key references is the parent table.
When creating tables, constraints can be used to define a primary key or attaching a foreign key relationship to anther table.
Constraints
They are used to enforce certain conditions for the data that is entered in columns or table. This is so as to ensure accuracy and consistence of column data. Constraints can be categorized into;
Column constraints That constraints data in a column
Table constraintsThat constraints data in the entire table.
Some examples of common constraints include;
NOT NULL , UNIQUE

CREATE TABLE employees(
        emp_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    birthday DATE CHECK (birthdate > '1900-01-01'),
    hire_date DATE CHECK (hire_date > birthdate),
    salary INTEGER CHECK (salary > 0)
)
Enter fullscreen mode Exit fullscreen mode

INSERT
After creating a table and defining constraints, the next step involves the entry of data into the table. The insert command is used. The name of the table is identified as well as the column names.

INSERT INTO employees(
first_name,
last_name,
birthdate,
hire_date,
salary
)
VALUES
('Sammy',
 'Smith',
 '1990-11-03',
 '2010-01-01',
 100
)
Enter fullscreen mode Exit fullscreen mode

UPDATE
Allows for the changing of values in a table.

UPDATE account
SET last_login   =  CURRENT_TIMESTAMP;
Enter fullscreen mode Exit fullscreen mode

DELETE
The delete statement is used to remove rows from a table. It can be used together with a conditional statement to delete all that meet/do not meet that condition

DELETE FROM job
WHERE job_name = 'Farmer'
Enter fullscreen mode Exit fullscreen mode

ALTER
The alter table command is used to add, delete or modify columns in an existing table. It can also be used to add or drop constraints as shown;

ALTER VIEW customer_info RENAME to c_info;
Enter fullscreen mode Exit fullscreen mode

DROP
The drop table statement is used to drop an existing table in a database. A condition can also be added to the drop table statement.

DROP IF EXISTS customer_info;
Enter fullscreen mode Exit fullscreen mode

Querying from a table

SELECT
The select statement is used to get data from a database. In general, SELECT * FROM table_name; can be used to get all the data in the particular table. On the other hand, to get specific columns from the table, use;

SELECT column1, column2
FROM table_name;
Enter fullscreen mode Exit fullscreen mode

SELECT DISTINCT
The select distinct statement is used to return unique values only. Eliminates duplicate values in the result.

SELECT DISTINCT column1, column2
FROM table_name;
Enter fullscreen mode Exit fullscreen mode

WHERE
When you only need data that meets certain conditions, the where statement is used. The specified column that meet the condition are returned.

SELECT memid,surname, firstname, joindate 
FROM members
WHERE joindate >= '2012-09-01';
Enter fullscreen mode Exit fullscreen mode

AND, OR, NOT
The AND, OR and NOT are used together with the WHERE statement to curate the output of a table to meet certain conditions.
AND operator is used where all conditions need to be True

SELECT column1, column2
FROM table_name
WHERE condition1 AND condition2 AND condition3;
Enter fullscreen mode Exit fullscreen mode

OR operator is used to separate conditions where any of them only need to be True

SELECT column1, column2
FROM table_name
WHERE condition1 OR condition2 OR condition3;
Enter fullscreen mode Exit fullscreen mode

NOT operator is used to return values when the condition is not True

SELECT column1, column2
FROM table_name
WHERE NOT condition;
Enter fullscreen mode Exit fullscreen mode

ORDER BY
The order by statement is used to sort the query result in either ascending or descending order.
ASC Ascending order
DESC Descending order

SELECT DISTINCT(surname) 
FROM members
ORDER BY surname ASC;
Enter fullscreen mode Exit fullscreen mode

LIMIT
The limit statement is used to specify the number of rows to be obtained in the results. It more of sets an upper limit.

SELECT DISTINCT(first_name), last_name 
FROM students
ORDER BY first_name ASC
limit 5;
Enter fullscreen mode Exit fullscreen mode

BETWEEN
The between statement is used to give values with in a certain specified range. The first and last value are specified.

SELECT * FROM Products
WHERE Product_name BETWEEN 'Cussons' AND 'Dettol'
Enter fullscreen mode Exit fullscreen mode

IN, NOT IN
The In and not in statement are used similar to the or statement. They are for specification of a number of conditions when using the where operator.

SELECT * 
FROM Customers
WHERE Country IN ('Germany', 'France', 'UK');
Enter fullscreen mode Exit fullscreen mode
SELECT * 
FROM Customers
WHERE Country NOT IN ('Germany', 'France', 'UK');
Enter fullscreen mode Exit fullscreen mode

LIKE, ILIKE
The like and ilike are used with the where statement to search for patterns in a column. The Like operator assess both pattern and casing of the individual characters while ilike assess only the pattern.
There are wildcards that are used in conjunction with this two. They include;
% - One or several characters
_ - One single character.

DELETE FROM parent
WHERE last_name ILIKE 'l%';
Enter fullscreen mode Exit fullscreen mode
SELECT actor_id, first_name, last_name
FROM actor
WHERE last_name LIKE 'B%';
Enter fullscreen mode Exit fullscreen mode

Groupby
The group by statement is used to group rows that have similar values under certain columns. Most at time, the group by function is used together with aggregate functions such as COUNT(), MAX(), MIN(), SUM()

SELECT staff_id,customer_id, sum(amount)
FROM payment
GROUP BY staff_id,customer_id
Enter fullscreen mode Exit fullscreen mode

SQL provides a wide range of commands that can be used to query, add and manipulate data in tables and database. This is not an exhaustive list of the command but a summary of the few basic SQL commands that a beginner is most likely frequently going to interact with.

Top comments (1)

Collapse
 
njerigitome profile image
Njeri Gitome

The explanation is simple and precise👍