The Night Everything Broke
It was 2 AM on a Friday when my phone exploded with alerts. Our e-commerce platform was dying. Page load times had gone from 200ms to 40+ seconds. Customers were abandoning carts. My hands shook as I SSH'd into the database server.
The culprit? A seemingly innocent query:
SELECT * FROM orders
WHERE customer_id = 12847
AND created_at > '2023-01-01'
ORDER BY created_at DESC
LIMIT 20;
This query was taking 38 seconds to execute. Our orders table had grown to 50 million rows. No indexes. No partitions. Just pure, unoptimized chaos.
As I sat there watching EXPLAIN ANALYZE output scroll by, my roommate walked in from his late shift as a line cook. "Dude, why do you look like you're about to cry?"
I explained the problem. He looked at my screen and said something that changed how I think about databases forever:
"Your database is like a kitchen with no organization. You're asking the chef to find one specific tomato in a warehouse full of every ingredient ever used."
That night, sitting in our tiny apartment, a chef taught me more about database optimization than any textbook ever did.
The Restaurant Kitchen Model: How Professional Kitchens Actually Work
Before we dive into SQL, let me paint you a picture of how a professional kitchen operates during dinner rush.
The Unorganized Kitchen (No Indexes)
Imagine a restaurant where everything is in giant boxes:
- Box A: All vegetables (500 items mixed together)
- Box B: All proteins (300 items mixed together)
- Box C: All spices (200 items mixed together)
When the chef needs "fresh basil," someone has to:
- Open Box C
- Look through ALL 200 spices
- Check each container one by one
- Finally find the basil
This is a full table scan. It works, but it's painfully slow.
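You don't have to imagine it; the query planner will tell you when it's rummaging through boxes. A minimal sketch (the plan text below is representative, not output from a real server):
-- Spotting a full table scan
EXPLAIN SELECT * FROM orders WHERE customer_id = 12847;
-- Seq Scan on orders  (cost=0.00..1250000.00 rows=1240 width=120)
--   Filter: (customer_id = 12847)
-- "Seq Scan" = the chef opening every box.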
The Organized Kitchen (With Indexes)
Now imagine the same kitchen with a proper system:
- A labeled spice rack organized alphabetically
- A card catalog showing: "Basil → Rack 3, Shelf 2, Position 5"
- Color-coded containers
- A floor plan with marked zones
The chef needs basil? They:
- Check the catalog (the index)
- Go directly to Rack 3, Shelf 2, Position 5
- Grab the basil
- Continue cooking
This is an indexed query. The lookup drops from scanning every item to a handful of steps: B-tree lookups grow logarithmically with table size, not linearly.
The Multi-Station Kitchen (Partitioned Tables)
But here's where it gets interesting. Large restaurants don't have one kitchen—they have stations:
┌─────────────────────────────────────────────┐
│ RESTAURANT KITCHEN │
├──────────────┬──────────────┬───────────────┤
│ Salad │ Grill │ Pastry │
│ Station │ Station │ Station │
│ │ │ │
│ • Lettuce │ • Steaks │ • Flour │
│ • Tomatoes │ • Chicken │ • Sugar │
│ • Dressings │ • Fish │ • Chocolate │
└──────────────┴──────────────┴───────────────┘
When an order comes in for "Caesar Salad," the system:
- Routes it to the Salad Station only
- That station has its own organized tools and ingredients
- Other stations aren't even disturbed
This is table partitioning. You're dividing your data into logical segments so queries only search where data actually lives.
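Curious what the "stations" look like in SQL? Here's a minimal sketch using a hypothetical kitchen_orders table; LIST partitioning routes each row to exactly one station:
-- Hypothetical example: one partition per kitchen station
CREATE TABLE kitchen_orders (
id BIGSERIAL,
station TEXT NOT NULL,
dish TEXT
) PARTITION BY LIST (station);
CREATE TABLE kitchen_orders_salad PARTITION OF kitchen_orders FOR VALUES IN ('salad');
CREATE TABLE kitchen_orders_grill PARTITION OF kitchen_orders FOR VALUES IN ('grill');
CREATE TABLE kitchen_orders_pastry PARTITION OF kitchen_orders FOR VALUES IN ('pastry');
-- A query with WHERE station = 'salad' touches only kitchen_orders_salad.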
The Technical Translation: From Kitchen to Database
Act 1: Adding Your First Index (The Spice Rack)
Let's go back to our disaster query:
-- THE SLOW QUERY (Full Kitchen Search)
SELECT * FROM orders
WHERE customer_id = 12847
AND created_at > '2023-01-01'
ORDER BY created_at DESC
LIMIT 20;
-- Execution time: 38 seconds 😱
-- Rows scanned: 50,000,000
The database is literally checking all 50 million orders, one by one. Let's add an index:
-- CREATE THE SPICE RACK (Index)
CREATE INDEX idx_orders_customer_created
ON orders(customer_id, created_at DESC);
What just happened?
The database created a separate data structure that looks like this:
Index Structure (Simplified):
┌────────────────┬──────────────┬─────────────────────┐
│ customer_id │ created_at │ row_pointer │
├────────────────┼──────────────┼─────────────────────┤
│ 12847 │ 2024-01-15 │ → Row at block 423 │
│ 12847 │ 2024-01-10 │ → Row at block 421 │
│ 12847 │ 2024-01-05 │ → Row at block 418 │
│ ... │ ... │ ... │
└────────────────┴──────────────┴─────────────────────┘
Now when you run the query:
-- SAME QUERY, NOW WITH INDEX
SELECT * FROM orders
WHERE customer_id = 12847
AND created_at > '2023-01-01'
ORDER BY created_at DESC
LIMIT 20;
-- Execution time: 12ms 🚀
-- Rows scanned: 20 (exactly what we needed!)
From 38 seconds to 12 milliseconds. That's a 3,166x improvement.
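Don't take the timing on faith; confirm the planner is actually using the index. This is the plan shape you want to see (illustrative output):
EXPLAIN (ANALYZE)
SELECT * FROM orders
WHERE customer_id = 12847
AND created_at > '2023-01-01'
ORDER BY created_at DESC
LIMIT 20;
-- Limit (actual time=0.051..11.902 rows=20 loops=1)
--   ->  Index Scan using idx_orders_customer_created on orders
--         Index Cond: ((customer_id = 12847) AND (created_at > '2023-01-01'))
-- "Index Scan using idx_..." = the chef reading the card catalog.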
The Index Decision Tree (When to Add the Spice Rack)
Do you frequently query by this column?
│
├─ YES → Is it in a WHERE clause?
│ │
│ ├─ YES → CREATE INDEX
│ │
│ └─ NO → Is it in a JOIN?
│ │
│ ├─ YES → CREATE INDEX
│ │
│ └─ NO → Is it in ORDER BY?
│ │
│ └─ YES → Consider INDEX
│
└─ NO → Don't waste space on an index
Act 2: The Multi-Column Index Gotcha
Here's where most developers trip up. Watch this:
-- Creating individual indexes (WRONG)
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_created ON orders(created_at);
-- Query performance: STILL SLOW
-- Why? The planner can sometimes combine them (a bitmap AND),
-- but that's far less efficient than one composite index
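Here's roughly what the planner does with two single-column indexes. It may combine them with a bitmap AND, but it still scans two structures and merges the results (illustrative plan):
EXPLAIN SELECT * FROM orders
WHERE customer_id = 12847
AND created_at > '2023-01-01';
-- Bitmap Heap Scan on orders
--   ->  BitmapAnd
--         ->  Bitmap Index Scan on idx_customer
--         ->  Bitmap Index Scan on idx_created
-- Two index scans plus a merge step: workable, but slower than one composite index.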
You need a composite index (multi-column):
-- The RIGHT way (Order matters!)
CREATE INDEX idx_orders_customer_created
ON orders(customer_id, created_at DESC);
The order matters because of the "Leftmost Prefix Rule":
This index helps queries with:
- ✅ WHERE customer_id = 123
- ✅ WHERE customer_id = 123 AND created_at > '2024-01-01'
- ❌ WHERE created_at > '2024-01-01' (customer_id, the leading column, must be specified first!)
Think of it like a phone book: You can find "Smith, John" but you can't efficiently find "everyone named John" because it's sorted by last name first.
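If you genuinely need the "everyone named John" lookup too (date-range queries with no customer filter), give it its own phone book. A sketch, assuming that access pattern exists in your workload:
-- Assumption: you also run date-only range queries
CREATE INDEX idx_orders_created ON orders(created_at DESC);
-- Now WHERE created_at > '2024-01-01' has its own index,
-- while the composite index keeps serving customer-first queries.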
Act 3: When One Kitchen Isn't Enough (Table Partitioning)
Six months after the index fix, we hit another wall. Our orders table was now 200 million rows. Even with indexes, some queries were slowing down, and table maintenance (VACUUM, backups) was taking hours.
My chef roommate: "You need to split the kitchen. You're trying to run a single kitchen for 50 different cuisines."
The Partitioning Strategy
We decided to partition by time (the most common pattern):
-- Step 1: Create the partitioned table
CREATE TABLE orders (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
total_amount DECIMAL(10,2),
created_at TIMESTAMP NOT NULL,
status VARCHAR(50)
) PARTITION BY RANGE (created_at);
-- Step 2: Create partitions (the "stations")
CREATE TABLE orders_2023_q1 PARTITION OF orders
FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
CREATE TABLE orders_2023_q2 PARTITION OF orders
FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');
CREATE TABLE orders_2023_q3 PARTITION OF orders
FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');
CREATE TABLE orders_2023_q4 PARTITION OF orders
FOR VALUES FROM ('2023-10-01') TO ('2024-01-01');
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
-- Step 3: Add indexes to EACH partition (PostgreSQL 10 style;
-- on 11+ you can create one index on the parent and it cascades)
CREATE INDEX idx_orders_2023_q1_customer
ON orders_2023_q1(customer_id, created_at);
CREATE INDEX idx_orders_2023_q2_customer
ON orders_2023_q2(customer_id, created_at);
-- ... (repeat for each partition)
What changed?
-- Query for recent orders
SELECT * FROM orders
WHERE created_at > '2024-01-15'
AND customer_id = 12847;
-- Before partitioning:
-- Scans: 200 million rows → finds 20
-- Time: 5 seconds
-- After partitioning:
-- Scans: ONLY orders_2024_q1 (15 million rows) → finds 20
-- Time: 80ms
The database is smart enough to know: "This query only needs Q1 2024 data. I'll skip the other 185 million rows entirely."
Partition Pruning in Action
Here's the magic visualized:
Query: WHERE created_at BETWEEN '2024-01-01' AND '2024-02-01'
┌──────────────────────────────────────────────────────────────────────────┐
│                         ORDERS TABLE (Partitioned)                        │
├──────────────┬──────────────┬──────────────┬──────────────┬──────────────┤
│   2023_Q1    │   2023_Q2    │   2023_Q3    │   2023_Q4    │   2024_Q1    │
│  ❌ SKIPPED  │  ❌ SKIPPED  │  ❌ SKIPPED  │  ❌ SKIPPED  │  ✅ SCANNED  │
└──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
                                                             ↑ Only this one!
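You can watch the pruning happen in EXPLAIN; only the matching partition should appear in the plan (illustrative output):
EXPLAIN SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-02-01';
-- Append
--   ->  Seq Scan on orders_2024_q1
--         Filter: ((created_at >= '2024-01-01') AND (created_at <= '2024-02-01'))
-- The four 2023 partitions never show up in the plan at all.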
The Recipe: A Step-by-Step Guide to Implementing This
Scenario: You have a slow, growing table
1. Identify Slow Queries
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 12847
AND created_at > '2023-01-01'
ORDER BY created_at DESC;
-- Look for: "Seq Scan on orders" (BAD!)
-- Want to see: "Index Scan using idx_..." (GOOD!)
2. Add Strategic Indexes
-- For WHERE clauses
CREATE INDEX idx_orders_customer_id
ON orders(customer_id);
-- For multi-condition queries (BETTER)
CREATE INDEX idx_orders_customer_created
ON orders(customer_id, created_at DESC);
-- For foreign keys (often forgotten!), e.g. a user_id column referencing users(id)
CREATE INDEX idx_orders_user_id
ON orders(user_id);
3. Test Performance
-- Before
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 12847;
-- Output:
-- Seq Scan on orders (cost=0.00..892341.00 rows=1240 width=120)
-- (actual time=0.034..5234.123 rows=1240 loops=1)
-- Planning Time: 0.123 ms
-- Execution Time: 5234.567 ms
-- After adding index:
-- Index Scan using idx_orders_customer_id on orders
-- (cost=0.56..1234.00 rows=1240 width=120)
-- (actual time=0.034..12.456 rows=1240 loops=1)
-- Planning Time: 0.098 ms
-- Execution Time: 12.678 ms
4. Consider Partitioning When...
- ✅ Table has 50M+ rows
- ✅ Queries often filter by date/time
- ✅ You regularly archive/delete old data
- ✅ Maintenance operations are slow
- ❌ Most queries need data across all partitions
- ❌ Partitioning column changes frequently
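Before committing to a migration, it's worth checking whether you're actually in partition territory. A quick sketch against PostgreSQL's statistics views:
-- Biggest tables by (estimated) live row count
SELECT relname,
n_live_tup,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 10;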
5. Implement Partitioning (PostgreSQL)
-- Create new partitioned table
CREATE TABLE orders_new (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
created_at TIMESTAMP NOT NULL,
-- ... other columns
) PARTITION BY RANGE (created_at);
-- Create partitions for existing data
CREATE TABLE orders_2023 PARTITION OF orders_new
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders_new
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- Migrate data (during a low-traffic window; pause writes first,
-- or rows inserted mid-copy will be silently lost)
INSERT INTO orders_new SELECT * FROM orders;
-- Atomic swap (requires careful planning)
BEGIN;
ALTER TABLE orders RENAME TO orders_old;
ALTER TABLE orders_new RENAME TO orders;
COMMIT;
-- Add indexes to each partition (or once on the parent, on PostgreSQL 11+)
CREATE INDEX idx_orders_2023_customer
ON orders_2023(customer_id, created_at);
CREATE INDEX idx_orders_2024_customer
ON orders_2024(customer_id, created_at);
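One sanity check I'd run before that atomic swap: make sure nothing went missing during the copy. (A sketch; on a 200M-row table you may prefer cheaper per-partition comparisons.)
-- Compare row counts before renaming anything
SELECT
(SELECT COUNT(*) FROM orders) AS old_table,
(SELECT COUNT(*) FROM orders_new) AS new_table;
-- If they differ, rows were written during the copy; reconcile before swapping.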
The Gotchas (What They Don't Tell You in Tutorials)
1. Indexes Aren't Free
Every index slows down INSERT, UPDATE, and DELETE operations:
-- Without indexes
INSERT INTO orders VALUES (...); -- 2ms
-- With 5 indexes
INSERT INTO orders VALUES (...); -- 12ms
-- The database must update all 5 indexes!
Rule of thumb: Only index columns you query frequently.
2. Over-Indexing Is Real
I once worked with a table that had 23 indexes. Write performance was abysmal. We removed 17 of them. Nothing broke. Writes became 4x faster.
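If you prune indexes on a live system, drop them without blocking writers. The index name here is hypothetical:
-- CONCURRENTLY avoids the lock that blocks writes
-- (note: it cannot run inside a transaction block)
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy_status; -- hypothetical name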
3. Partial Indexes Are Your Secret Weapon
-- Instead of indexing ALL orders
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Only index PENDING orders (if that's what you query)
CREATE INDEX idx_orders_pending
ON orders(customer_id)
WHERE status = 'pending';
-- If only a small slice of orders is pending, this index is a
-- fraction of the size and much faster to scan and maintain
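The catch with partial indexes: the query's WHERE clause has to imply the index predicate, or the planner won't touch it. For example:
-- Uses idx_orders_pending (predicate matches)
SELECT * FROM orders
WHERE status = 'pending' AND customer_id = 12847;
-- Cannot use it (no status filter); falls back to other indexes
SELECT * FROM orders
WHERE customer_id = 12847;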
4. Indexes Don't Migrate With Your Data
This mistake cost me 3 hours of debugging:
-- Indexes on the old, unpartitioned table do NOT carry over
-- when you migrate to a partitioned one. On PostgreSQL 10,
-- you must recreate them on EACH partition:
CREATE INDEX idx_orders_2023_customer
ON orders_2023(customer_id);
CREATE INDEX idx_orders_2024_customer
ON orders_2024(customer_id);
-- ... etc
-- (On PostgreSQL 11+, create the index once on the parent table
-- and it cascades to every partition, including future ones.)
5. Partitioning Complicates Queries
-- This query CAN'T use partition pruning (must check all partitions)
SELECT COUNT(*) FROM orders WHERE customer_id = 12847;
-- This query DOES use pruning (filters by partition key)
SELECT COUNT(*) FROM orders
WHERE customer_id = 12847
AND created_at > '2024-01-01';
Real-World Results: The Numbers
After implementing indexes and partitioning on our production database:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Avg Query Time | 5.2s | 48ms | 108x faster |
| P95 Query Time | 38s | 340ms | 111x faster |
| Peak QPS | 120 | 2,400 | 20x increase |
| Table Size | 180GB | 180GB (same) | Data unchanged |
| Index Size | 0GB | 24GB | Worth it |
| Backup Time | 4 hours | 45 min | Parallel dumps |
| Customer Complaints | Daily | Zero | Priceless |
Cost to implement: 2 days of development + 4 hours of maintenance window
Annual cost savings: $120K in server costs alone (fewer resources needed)
The Automatic Partition Manager (Bonus Script)
Manually creating partitions is tedious. Here's a script that auto-creates future partitions:
-- PostgreSQL function to auto-create monthly partitions
CREATE OR REPLACE FUNCTION create_future_partitions()
RETURNS void AS $$
DECLARE
partition_date DATE;
partition_name TEXT;
start_date TEXT;
end_date TEXT;
BEGIN
-- Create partitions for next 12 months
FOR i IN 0..11 LOOP
partition_date := DATE_TRUNC('month', CURRENT_DATE) + (i || ' months')::INTERVAL;
partition_name := 'orders_' || TO_CHAR(partition_date, 'YYYY_MM');
start_date := TO_CHAR(partition_date, 'YYYY-MM-DD');
end_date := TO_CHAR(partition_date + INTERVAL '1 month', 'YYYY-MM-DD');
-- Check if partition exists
IF NOT EXISTS (
SELECT 1 FROM pg_tables
WHERE tablename = partition_name
) THEN
-- Create partition
EXECUTE format(
'CREATE TABLE %I PARTITION OF orders FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date
);
-- Add indexes
EXECUTE format(
'CREATE INDEX idx_%I_customer ON %I(customer_id, created_at)',
partition_name, partition_name
);
RAISE NOTICE 'Created partition: %', partition_name;
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Schedule this to run monthly (using pg_cron or external scheduler)
SELECT create_future_partitions();
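To schedule it from inside the database, assuming a reasonably recent pg_cron extension is installed (that's an assumption; any external scheduler works just as well):
-- Run at 03:00 on the 1st of every month
SELECT cron.schedule(
'create-partitions', -- job name
'0 3 1 * *', -- cron expression: minute hour day-of-month month day-of-week
'SELECT create_future_partitions()'
);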
The Lessons I Learned (That Night and Beyond)
1. Indexes Are Not Magic
They're a trade-off. More indexes = faster reads but slower writes. Find the balance.
2. Measure, Don't Guess
Always use EXPLAIN ANALYZE. Your intuition about what's slow is probably wrong.
3. Start Simple
Don't partition until you need to. Indexes often solve 95% of performance problems.
4. Partitioning Is Not Sharding
Partitions are in the same database. Sharding is splitting across multiple databases/servers. Know the difference.
5. The Kitchen Metaphor Really Works
When explaining database concepts to non-technical stakeholders, I still use the kitchen model. It clicks instantly.
Your Turn: The 5-Minute Performance Audit
Try this on your own database RIGHT NOW:
-- Find your slowest queries (requires the pg_stat_statements
-- extension; column names shown are the PostgreSQL 13+ ones)
SELECT
query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Find tables with no indexes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
AND (schemaname, tablename) NOT IN (
SELECT schemaname, tablename FROM pg_indexes
)
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Find unused indexes (wasting space)
SELECT
schemaname,
relname AS tablename,
indexrelname AS indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;
Final Thoughts
That 2 AM disaster was the best thing that happened to my career. It forced me to understand databases at a fundamental level. And weirdly, it deepened my appreciation for professional kitchens.
Every time I create an index now, I think: I'm organizing the spice rack.
Every time I partition a table, I think: I'm creating specialized stations.
Every time I optimize a query, I think: I'm improving the restaurant flow.
My roommate quit cooking last year and became a database administrator. He says I ruined restaurants for him—now he can't stop seeing them as distributed systems.
Go Forth and Optimize
Challenge: This week, audit one slow query in your production database. Create an index. Measure the improvement. Share your results in the comments—I want to hear your before/after numbers!
Questions I'll answer in comments:
- MySQL vs PostgreSQL partitioning differences
- When to use HASH vs RANGE vs LIST partitioning
- Index bloat and maintenance strategies
- Partitioning strategies for multi-tenant apps
Resources
- [PostgreSQL Partitioning Docs](https://www.postgresql.org/docs/current/ddl-partitioning.html)
- [Use The Index, Luke](https://use-the-index-luke.com/) - Best indexing guide ever
- [Explain.depesz.com](https://explain.depesz.com/) - Visualize EXPLAIN output
Have a database disaster story? Share it below! The best ones often teach the most valuable lessons. 🚀
Tags: #sql #database #performance #postgresql
Meta Description: A production disaster story that teaches SQL indexing and partitioning through restaurant kitchen metaphors. Includes real code examples and a 3,166x performance improvement.
Cover Image Suggestion: A split image showing a chaotic kitchen (left) vs. an organized kitchen with labeled stations (right), metaphorically representing unoptimized vs. optimized databases.