DEV Community

Cover image for Understanding Databases: SQL, NoSQL, Schemas,DDL, and DML
De' Clerke
De' Clerke

Posted on • Edited on

Understanding Databases: SQL, NoSQL, Schemas,DDL, and DML

🗄️ Database Essentials

Databases sit at the core of every modern application. Whether you're building a social media platform, an online store, or a data pipeline, you need a reliable way to store, organize, and access information. This guide covers the core concepts every developer and data engineer should have solid before going deeper.


What is a Database?

A database is an organized collection of data that can be stored, managed, and retrieved efficiently. Instead of scattering data across files or spreadsheets, databases provide a structured system where data can be queried, updated, and maintained consistently.


Types of Databases

Databases come in many flavors, but the two most common categories are SQL (relational) and NoSQL (non-relational).

SQL Databases

  • Structure: Tables with rows and columns
  • Examples: PostgreSQL, MySQL, Oracle, SQL Server
  • Strengths:
    • Strong consistency and reliability
    • Support for complex queries and relationships (joins)
    • Schema-based design enforces data integrity

NoSQL Databases

  • Structure: Can be document-based, key-value pairs, wide-column stores, or graph databases
  • Examples: MongoDB (document), Redis (key-value), Cassandra (wide-column), Neo4j (graph)
  • Strengths:
    • Flexible schema; data doesn't need to fit into fixed tables
    • Handles unstructured or semi-structured data well
    • Scales horizontally, often preferred for big data and real-time apps

Schemas

A schema is the blueprint of a database. It defines how data is organized, what data types are allowed, and how different entities relate to each other.

In SQL databases, schemas are strict and defined before data is inserted. In NoSQL databases, schemas are flexible and each document or record can have different fields.

SQL Schema Example

Real pipelines often separate raw ingestion from cleaned data. A two-schema setup looks like this:

-- Stock price table (raw ingestion layer)
CREATE TABLE IF NOT EXISTS stock_prices (
    id          SERIAL PRIMARY KEY,
    symbol      VARCHAR(20)   NOT NULL,
    close_price NUMERIC(12, 4),
    volume      BIGINT,
    timestamp   TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
    source      VARCHAR(50),

    CONSTRAINT uq_symbol_ts UNIQUE (symbol, timestamp),
    CONSTRAINT chk_price    CHECK (close_price > 0)
);

-- Listings table (with foreign key reference)
CREATE TABLE IF NOT EXISTS listing_images (
    id         BIGSERIAL PRIMARY KEY,
    listing_id BIGINT NOT NULL REFERENCES listings(id) ON DELETE CASCADE,
    url        TEXT NOT NULL,
    uploaded   TIMESTAMPTZ DEFAULT NOW()
);
Enter fullscreen mode Exit fullscreen mode

Notice the CHECK and UNIQUE constraints. These aren't optional extras in a production schema. They're what keeps bad data out without writing application logic to handle it.

NoSQL Schema Example

{
  "id": "listing_001",
  "location": "Westlands, Nairobi",
  "price": 4500000,
  "bedrooms": 3,
  "images": [
    {"url": "https://cdn.example.com/img1.jpg"},
    {"url": "https://cdn.example.com/img2.jpg"}
  ]
}
Enter fullscreen mode Exit fullscreen mode

The same listing in a relational database would require a listings table and a separate listing_images table joined by a foreign key. NoSQL embeds related data directly in one document.


When to Use SQL vs NoSQL

Use SQL when:

  • Data is highly structured with clear relationships
  • You need ACID transactions (banking, financial pipelines, e-commerce checkout)
  • Queries involve complex joins and aggregations

Use NoSQL when:

  • Data is semi-structured, rapidly changing, or unstructured
  • Applications need high scalability and throughput at massive scale
  • You're working with big data, caching, or real-time event streams

SQL favors consistency and structure. NoSQL favors flexibility and speed. Most serious data platforms end up using both.


DDL vs DML

Two key categories of SQL commands you'll work with constantly:

DDL (Data Definition Language) defines and manages database structures.

Command What it does
CREATE Create tables, schemas, indexes
ALTER Modify an existing table (add column, rename, change type)
DROP Delete a table or schema permanently
TRUNCATE Remove all rows from a table (faster than DELETE)

DML (Data Manipulation Language) works with the actual data inside those structures.

Command What it does
INSERT Add new rows
UPDATE Modify existing rows
DELETE Remove rows
SELECT Query and return data

DDL and DML in Practice

Here is a realistic sequence showing both in action:

-- DDL: define the structure
CREATE TABLE IF NOT EXISTS forex_rates (
    id         SERIAL PRIMARY KEY,
    pair       VARCHAR(10) NOT NULL,
    rate       NUMERIC(12, 6),
    fetched_at TIMESTAMPTZ NOT NULL
);

-- DML: insert data
INSERT INTO forex_rates (pair, rate, fetched_at)
VALUES ('USD/KES', 129.5, NOW());

-- DML: query it back
SELECT pair, rate, fetched_at
FROM forex_rates
WHERE pair = 'USD/KES'
ORDER BY fetched_at DESC
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

One DML pattern worth knowing early is UPSERT. In a data pipeline you often don't know whether a record already exists, so you insert it and handle conflicts in the same statement:

INSERT INTO forex_rates (pair, rate, fetched_at)
VALUES ('USD/KES', 130.1, NOW())
ON CONFLICT (pair)
DO UPDATE SET
    rate      = EXCLUDED.rate,
    fetched_at = EXCLUDED.fetched_at;
Enter fullscreen mode Exit fullscreen mode

This runs every pipeline run without failing or creating duplicates. It's one of the most useful patterns in production SQL.


Transactions and ACID

SQL databases guarantee ACID properties:

  • Atomicity: All operations in a transaction succeed or none do
  • Consistency: The database always moves from one valid state to another
  • Isolation: Concurrent transactions don't interfere with each other
  • Durability: Committed data survives crashes

A transaction in practice:

BEGIN;
  UPDATE accounts SET balance = balance - 500 WHERE id = 1;
  UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;
Enter fullscreen mode Exit fullscreen mode

If the second UPDATE fails, the whole transaction rolls back. Neither account changes. This is why SQL databases are the default choice for financial systems, payment pipelines, and anything where partial writes are unacceptable.


Quick Summary

SQL NoSQL
Structure Fixed schema, tables Flexible schema
Best for Relationships, ACID transactions Scale, unstructured data
Examples PostgreSQL, MySQL MongoDB, Redis, Cassandra
Query language SQL Varies (JSON queries, key lookups)

These fundamentals apply whether you're building a web app, a data pipeline, or an analytics warehouse. The tools change but the tradeoffs don't.


Follow me on dev.to for more data engineering content, or browse the project code at github.com/declerke.

Top comments (0)