Data Lanes

Posted on Sep 1

SQL for Data Professionals: From Basics to Advanced Analytics

#sql #datascience #programming #database

SQL (Structured Query Language) remains the backbone of data retrieval and manipulation in the world of relational databases.

Whether you’re a data analyst, data engineer, or business intelligence professional, learning SQL unlocks the ability to explore data, uncover insights, and build analytics pipelines with ease.

In this article, part of our SQL Learning Series, you’ll go from SQL basics to advanced SQL techniques using real-world examples.

What is a Relational Database Management System (RDBMS)?

A relational database management system (RDBMS) is a program that allows you to create, update, and administer a relational database.

Relational databases store data in tables(rows and columns).

Relationships between tables are established using keys:

Primary Key (PK): Uniquely identifies each row in a table.
Foreign Key (FK): Refers to the primary key in another table to maintain relationships.

What the heck is SQL?

SQL stands for Structured Query Language.

It’s declarative, meaning you tell the database what you want, and it figures out how to get it.

Three core categories of SQL commands:

DDL (Data Definition Language) → Create or modify database objects like tables(CREATE, ALTER).
DML (Data Manipulation Language) → Insert, update, or delete data.
DQL (Data Query Language) → Retrieve data with SELECT queries.

These form the foundation for both data exploration and advanced analytics tasks.

Writing Your First SQL Queries

Let’s look at some beginner-friendly SQL commands:

CREATE TABLE friends (
  id INTEGER,
  name TEXT,
  birthday DATE,
)
INSERT INTO friends (id, name, birthday) 
VALUES (1, 'Ororo Munroe', '1940-05-30');

SELECT * 
FROM friends;

UPDATE friends
SET name = 'Storm'
WHERE id = 1;

ALTER TABLE friends
ADD COLUMN email TEXT;

UPDATE friends
SET email = storm@codecademy.com
WHERE id = 1;

DELETE FROM friends
WHERE id = 1;

SELECT * 
FROM friends;

Ah, don't worry, I'm going to explain it all like you're five:

CREATE TABLE is a clause that tells SQL you want to create a new table.
friends is the name of the table.
(id INTEGER, name TEXT, birthday DATE) is a list of parameters defining each column, or attribute in the table and its data type.
The INSERT statement inserts a new row into a table:
- (id, name, age) is a parameter identifying the columns that data will be inserted into.
- VALUES is a clause that indicates the data being inserted.
SELECT statements are used to fetch data from a database.
- * is a special wildcard character that allows you to select every column in a table without having to name each one individually.
The ALTER TABLE statement is used to add or modify table columns.
- ADD COLUMN is a clause that lets you add a new column to a table.
The UPDATE statement edits a row in a table.
- You can use the UPDATE statement when you want to change existing records.
- WHERE is a clause that indicates which row(s) to update with the new column value.
The DELETE FROM statement deletes one or more rows from a table.
- WHERE is a clause that lets you select which rows you want to delete.

Pro tip: SQL commands are not case-sensitive. select = SELECT.

Adding Constraints for Data Integrity

Constraints define rules for your data.

Example: The statement below sets constraints on the awards table.

CREATE TABLE awards (
   id INTEGER PRIMARY KEY,
   recipient TEXT NOT NULL,
   award_name TEXT DEFAULT 'Grammy'
);

PRIMARY KEY → Uniquely identifies each record.
NOT NULL → Prevents empty values.
DEFAULT → Sets a fallback value if none is provided.

This ensures data stays accurate and consistent.

Database Indexes: Speeding Things Up

Indexes are a powerful tool used in the background of a database to speed up querying. Simply put, an index is a pointer to data in a table.

How are indexes created?

In a database, data is stored in rows which are organized into tables. Each row has a unique key which distinguishes it from all other rows and those keys are stored in an index for quick retrieval.

When a new row with a unique key is added, the index updates automatically.

However, sometimes we need to quickly look up data that isn’t stored as a key—for example, finding customers by telephone number. Using a unique constraint isn’t suitable in such cases because multiple rows can share the same value. Instead, custom indexes can be created.

Creating an index:

CREATE INDEX <index_name>
ON <table_name> (column1, column2, ...)

This allows faster access to frequently queried columns even if they are not unique keys.

QUERIES

One of the core purposes of the SQL language is to retrieve information stored in a database. This is commonly referred to as querying.

We'll be using SQL commands to query a table of restaurants called 'quench' and generate insights out of it.

Start by getting a feel for the quench table:

SELECT *
FROM quench;

Find Distinct Neighborhoods

SELECT DISTINCT neighborhood
FROM quench;

DISTINCT removes duplicates, so each neighborhood appears only once.

Let’s say you’re craving Chinese food. To see all restaurants serving Chinese cuisine:

SELECT *
FROM quench
WHERE cuisine = 'Chinese';

The WHERE clause filters rows based on a condition.

Filter by Review Score

Want restaurants rated 4 or higher?

SELECT *
FROM quench
WHERE review >= 4;

Conditions can use comparison operators like =, >=, <, or <> (not equal).

If you want to find Italian restaurants with exactly three dollar signs:

SELECT *
FROM quench
WHERE cuisine = 'Italian'
   AND price = '$$$';

Both conditions must be true for rows to be included.

Search by Partial Name

Let's say you can’t remember the exact name of a restaurant you went to earlier but knows it contains the word ‘meatball’ in it.

How will you find it?

SELECT *
FROM quench
WHERE name LIKE '%meatball%';

LIKE is a special operator used with the WHERE clause to search for a specific pattern in a column.

%is a wildcard character that matches zero or more missing characters in the pattern.

Filter by Multiple Locations

Find all the close by spots in Midtown, Downtown or Chinatown.

SELECT name, cuisine, neighborhood
FROM quench
WHERE neighborhood = 'Midtown'
   OR neighborhood = 'Downtown'
   OR neighborhood = 'Chinatown';

Note: Operators like OR/AND can be used more than once.

We selected only name, cuisine, and neighborhood for a concise result set.

Find Missing Health Grades.

SELECT *
FROM quench
WHERE health IS NULL;

Note: IS NULL checks for empty or missing values.

Top 10 Restaurants by Reviews

SELECT *
FROM quench
ORDER BY review DESC
LIMIT 10;

ORDER BY sorts results.
DESC = descending order (highest first).
LIMIT restricts the number of rows returned.

Woah! That was a lot to digest—literally. Now I’m craving some actual food!

Aggregate Functions

When working with databases, you often need more than just raw data ;you need insights. That’s where SQL aggregate functions come in.

They allow you to perform calculations across multiple rows and return a single, meaningful result.

Some of the most commonly used aggregate functions include:

COUNT() : count the number of rows
SUM(): the sum of the values in a column
MAX()/ MIN(): the largest/smallest value
AVG(): the average of the values in a column
ROUND(): round the values in the column

In addition, you can use GROUP BY, ORDER BY, and HAVING clauses to organize and filter aggregated data effectively.

Example:

Imagine you own a chain of lemonade stands across different neighborhoods.

Every day, each stand sells some cups of lemonade. At the end of the month, you want answers to questions like:

How much lemonade did we sell each day?
Which days were really good (sold more than 100 cups)?
Which days made the most sales so we can study what went right?

In SQL terms:

Each stand’s daily sales = rows in our table
Total cups sold per day = aggregate function (SUM)
Filtering good days only = HAVING
Sorting best days on top = ORDER BY

Step 1: Group the Sales Per Day (GROUP BY)

First, let’s sum up cups sold per day:

SELECT date, SUM(cups_sold) AS total_cups
FROM sales
GROUP BY date;

Here’s what happens:

GROUP BY date → Puts all sales on the same day together
SUM(cups_sold) → Adds them up for each day

Step 2: Keep Only Good Days (HAVING)

Now we only want days with at least 100 cups sold:

SELECT 
    date,
    SUM(cups_sold) AS total_cups
FROM sales
GROUP BY date
HAVING SUM(cups_sold) >= 100;

HAVING is like a filter for groups.

WHERE filters rows before grouping, but HAVING filters after aggregates are calculated.

Step 3: Sort the Best Days (ORDER BY)

Finally, let’s sort days by total cups sold so the best days come first:

SELECT 
    date,
    SUM(cups_sold) AS total_cups
FROM sales
GROUP BY date
HAVING SUM(cups_sold) >= 100
ORDER BY total_cups DESC;

Wrapping Up

In this post, we started with simple queries to explore data, then moved on to aggregate functions and powerful clauses like GROUP BY, HAVING, and ORDER BY to summarize, filter, and sort results effectively.

Mastering these tools gives you the ability to:
Spot trends quickly
Summarize large datasets in seconds
Filter and organize information with precision

The next time you open a database, you’ll know exactly how to turn tables of numbers into actionable knowledge.

Top comments (1)

Christopher Wright • Sep 1

Great walkthrough! How would you incorporate JOINs, subqueries, and window functions into the examples here? Next, consider a post on query optimization—reading EXPLAIN plans, choosing indexes, and handling NULLs and date/time quirks.