DEV Community

Cover image for Introduction to SQL for Data Science and Data Analysis
Acar Emmanuel
Acar Emmanuel

Posted on • Edited on

Introduction to SQL for Data Science and Data Analysis

Whether you're diving into data science or data analysis, SQL (Structured Query Language) is a must-have skill. It’s the language of databases—powerful, efficient, and everywhere.

In this article, we’ll walk through the basics of SQL, including the types of statements used to create, manage, and retrieve data. By the end, you'll be comfortable working with SQL queries and understanding how they fit into your data workflow.


📁 What is a Database?

A database is an organised collection of data stored electronically. Think of it as a digital filing system where information is organized for easy access and management.

For example, a retail business might have a database with tables for customers, orders, and products.


🧱 What is a Schema?

A schema is the blueprint of a database. It defines how data is organized—what tables exist, what fields they have, and how they relate to each other.

Imagine a schema as the floor plan of your data house.


💽 What is a Relational Database Management System (RDBMS)?

A Relational Database Management System (RDBMS) is software that manages databases based on a relational model—data is stored in tables (also called relations), and tables can be linked using keys.

Popular RDBMSs include MySQL, PostgreSQL, SQLite, SQL Server, and Oracle.


🧪 Example Database: sales_db

To demonstrate SQL concepts, we’ll use a simple example database called sales_db, with the following tables:

-- customers table
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR(100),
  country VARCHAR(50)
);

-- products table
CREATE TABLE products (
  product_id INT PRIMARY KEY,
  name VARCHAR(100),
  price DECIMAL(10,2)
);

-- orders table
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  product_id INT,
  quantity INT,
  order_date DATE,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
  FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Enter fullscreen mode Exit fullscreen mode

🔨 SQL Categories: DDL, DML, DQL

SQL commands are grouped into several categories. Let’s break them down:

📐 1. Data Definition Language (DDL)

Used to define and modify database structure:

  • CREATE: Creates tables or databases.
  • ALTER: Modifies existing tables.
  • DROP: Deletes tables or databases.
-- Add a column to products table
ALTER TABLE products ADD COLUMN stock INT;
Enter fullscreen mode Exit fullscreen mode

✍️ 2. Data Manipulation Language (DML)

Used to insert, update, or delete data:

  • INSERT: Adds data.
  • UPDATE: Modifies data.
  • DELETE: Removes data.
-- Insert new customer
INSERT INTO customers (customer_id, name, country)
VALUES (1, 'Alice', 'Uganda');

-- Update a product's price
UPDATE products SET price = 15.99 WHERE product_id = 2;

-- Delete an order
DELETE FROM orders WHERE order_id = 10;
Enter fullscreen mode Exit fullscreen mode

🔍 3. Data Query Language (DQL)

Used to fetch data—this is the bread and butter of data analysis:

  • SELECT: Retrieves data from one or more tables.
-- Get all products
SELECT * FROM products;

-- Get customer names from Kenya
SELECT name FROM customers WHERE country = 'Kenya';
Enter fullscreen mode Exit fullscreen mode

🧮 Aggregate Functions

Aggregate functions summarize data across multiple rows:

  • COUNT(): Number of rows
  • SUM(): Total value
  • AVG(): Average value
  • MIN() / MAX(): Lowest / highest value
-- Total quantity sold
SELECT SUM(quantity) AS total_units_sold FROM orders;

-- Average product price
SELECT AVG(price) AS average_price FROM products;
Enter fullscreen mode Exit fullscreen mode

🔤 ORDER BY and HAVING

🗂 ORDER BY: Sort results

-- List products by price, highest first
SELECT * FROM products
ORDER BY price DESC;
Enter fullscreen mode Exit fullscreen mode

🧾 HAVING: Filter groups (used with GROUP BY)

-- Find customers with more than 3 orders
SELECT customer_id, COUNT(*) AS num_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 3;
Enter fullscreen mode Exit fullscreen mode

🎯 Final Thoughts

SQL is a foundational tool in the data world. By mastering its basic commands and understanding how databases work, you unlock the power to retrieve insights from raw data.

Next up, you can explore JOINs, subqueries, window functions, and performance optimization.

🚀 Tip: Practice using free tools like SQLite Online, DB Fiddle, or Mode SQL to build and query databases.

Let me know in the comments if you'd like a Part 2!

Top comments (2)

Collapse
 
njihia profile image
Wairimu NJihia

A database is an organised collection of data - databases can also consist of unstructured data

Collapse
 
manuel_a profile image
Acar Emmanuel

Thank you for your input Njihia, considering NoSQL databases that works with semi-structured and unstructured data.