archi-jain

Posted on Mar 8

Day 4/100: Advanced Query Features - From Basic CRUD to Production API

#100daysofcode #python #backend #fastapi

Part of my 100 Days of Code journey. Today we make the API actually usable at scale.

The Challenge

Add advanced query features to the Task Management API: filtering, sorting, pagination, search, and statistics.

The Problem: Yesterday's API works, but imagine having 1,000 tasks:

Can't find specific tasks (no search)
Loading all 1,000 at once (slow, wasteful)
Can't sort by priority (no sorting)
No insights into completion rates (no analytics)

The Solution: Implement production-grade query features with proper indexing.

Why Advanced Queries Matter

Real scenario: You have 5,000 tasks in your database.

Without these features:

# User clicks "View Tasks"
tasks = db.query(Task).all()  # Loads ALL 5,000 tasks!
# Result: 
# - 3 second page load
# - 5MB data transfer
# - Angry users

With these features:

# User searches "meeting", page 1
tasks = db.query(Task)\
    .filter(Task.title.ilike("%meeting%"))\
    .order_by(desc(Task.created_at))\
    .limit(20)
# Result:
# - 0.05 second response
# - 50KB data transfer
# - Happy users

This is the difference between a toy and a product.

What We're Building

Query Features:

Filtering - Get only completed or pending tasks
Sorting - Order by date, title, or status
Pagination - Load 20 tasks at a time
Search - Find tasks by keyword
Combined - Use all features together

Statistics:

Total task count
Completion rate
Tasks created today/this week

Performance:

Database indexes for speed
Query constraints for safety

Understanding Query Parameters

Query parameters are the ?key=value parts of URLs:

https://api.example.com/tasks?completed=false&sort_by=created_at&limit=20
                              ↑              ↑                 ↑
                           Filter          Sort            Paginate

In FastAPI:

from fastapi import Query

@app.get("/tasks")
def get_tasks(
    completed: Optional[bool] = Query(None),
    sort_by: str = Query("created_at"),
    skip: int = Query(0, ge=0),
    limit: int = Query(20, ge=1, le=100)
):
    # FastAPI auto-extracts and validates!

What FastAPI does automatically:

Extracts completed from URL
Converts "true" string → Python True boolean
Validates skip >= 0 and limit between 1-100
Returns 422 error if validation fails

Beautiful!

Step-by-Step Implementation

Step 1: Query Parameter Setup

from fastapi import Query
from typing import Optional

@router.get("/", response_model=List[schemas.TaskResponse])
def get_tasks(
    # Filtering
    completed: Optional[bool] = Query(None, description="Filter by status"),

    # Sorting
    sort_by: str = Query("created_at", description="Field to sort by"),
    order: str = Query("desc", description="asc or desc"),

    # Pagination
    skip: int = Query(0, ge=0, description="Records to skip"),
    limit: int = Query(20, ge=1, le=100, description="Max records"),

    # Search
    search: Optional[str] = Query(None, description="Search term"),

    db: Session = Depends(get_db),
    current_user: models.User = Depends(get_current_user)
):
    # Implementation below...

Query() parameters explained:

Query(None) - Optional, defaults to None
Query(0, ge=0) - Defaults to 0, must be >= 0
Query(20, ge=1, le=100) - Defaults to 20, must be 1-100

Why constrain limit to 100?

Without constraints:

GET /tasks?limit=1000000  # User requests 1 million records!
# Server crashes, database overloaded

With constraints:

limit: int = Query(20, ge=1, le=100)
GET /tasks?limit=1000000  # FastAPI returns 422 error
# Server protected!

Step 2: Building the Query

The magic of query chaining:

# Start with base query
query = db.query(models.Task).filter(models.Task.owner_id == current_user.id)

# Add filter if provided
if completed is not None:
    query = query.filter(models.Task.completed == completed)

# Add search if provided
if search:
    search_term = f"%{search}%"
    query = query.filter(
        or_(
            models.Task.title.ilike(search_term),
            models.Task.description.ilike(search_term)
        )
    )

# Add sorting
valid_sort_fields = ["created_at", "updated_at", "title", "completed"]
if sort_by not in valid_sort_fields:
    sort_by = "created_at"

column = getattr(models.Task, sort_by)

if order == "desc":
    query = query.order_by(desc(column))
else:
    query = query.order_by(asc(column))

# Add pagination
query = query.offset(skip).limit(limit)

# Execute query (only NOW does it hit database!)
return query.all()

Key insight: Query isn't executed until .all() is called!

This means:

query = db.query(Task)           # No SQL yet
query = query.filter(...)        # Still no SQL
query = query.order_by(...)      # Still building
tasks = query.all()              # NOW executes SQL!

Benefit: SQLAlchemy generates one optimized SQL query, not multiple.

Step 3: Understanding Filtering

Basic filtering:

if completed is not None:
    query = query.filter(models.Task.completed == completed)

Why is not None?

Consider these requests:

GET /tasks                    # completed = None (don't filter)
GET /tasks?completed=true     # completed = True (filter completed)
GET /tasks?completed=false    # completed = False (filter pending)

Without is not None:

# Bad
if completed:
    query = query.filter(Task.completed == completed)

# GET /tasks?completed=false
# completed = False
# if False: -> doesn't execute!
# Filter not applied! Bug!

With is not None:

# Good
if completed is not None:
    query = query.filter(Task.completed == completed)

# GET /tasks?completed=false
# completed = False
# if False is not None: -> True
# Filter applied correctly!

Step 4: Implementing Search

Case-insensitive search:

if search:
    search_term = f"%{search}%"
    query = query.filter(
        or_(
            models.Task.title.ilike(search_term),
            models.Task.description.ilike(search_term)
        )
    )

Breaking it down:

ILIKE (case-insensitive LIKE):

Task.title.ilike("%meeting%")

Matches:

"Team Meeting" ✅
"meeting notes" ✅
"MEETING" ✅
"Status meeting tomorrow" ✅

Doesn't match:

"Project discussion" ❌

The % wildcards:

meeting% - Starts with "meeting"
%meeting - Ends with "meeting"
%meeting% - Contains "meeting" anywhere

OR condition:

or_(
    Task.title.ilike("%meeting%"),
    Task.description.ilike("%meeting%")
)

SQL generated:

WHERE (title ILIKE '%meeting%' OR description ILIKE '%meeting%')

Searches both title AND description!

Step 5: Dynamic Sorting

valid_sort_fields = ["created_at", "updated_at", "title", "completed"]

if sort_by not in valid_sort_fields:
    sort_by = "created_at"  # Fallback to safe default

column = getattr(models.Task, sort_by)

if order == "desc":
    query = query.order_by(desc(column))
else:
    query = query.order_by(asc(column))

Why validate sort fields?

Without validation (DANGEROUS):

# User sends malicious request
GET /tasks?sort_by=hashed_password

column = getattr(Task, "hashed_password")  # Leaks password hashes!
query.order_by(column)

With validation (SAFE):

valid_fields = ["created_at", "title", "completed"]

if sort_by not in valid_fields:
    sort_by = "created_at"  # Use safe default

# User sends malicious request
GET /tasks?sort_by=hashed_password

# sort_by not in valid_fields
# Defaults to "created_at"
# Attack prevented!

Using getattr dynamically:

# Static (hardcoded)
if sort_by == "title":
    query.order_by(Task.title)
elif sort_by == "created_at":
    query.order_by(Task.created_at)
# ... many elif statements

# Dynamic (elegant)
column = getattr(Task, sort_by)
query.order_by(column)

Step 6: Pagination

query = query.offset(skip).limit(limit)

How pagination works:

# Page 1 (records 1-20)
skip=0, limit=20  # SELECT * FROM tasks LIMIT 20 OFFSET 0

# Page 2 (records 21-40)
skip=20, limit=20  # SELECT * FROM tasks LIMIT 20 OFFSET 20

# Page 3 (records 41-60)
skip=40, limit=20  # SELECT * FROM tasks LIMIT 20 OFFSET 40

Calculating skip:

page_number = 3
page_size = 20
skip = (page_number - 1) * page_size  # (3-1) * 20 = 40

Frontend implementation:

// React example
const [page, setPage] = useState(1);
const pageSize = 20;

fetch(`/tasks?skip=${(page - 1) * pageSize}&limit=${pageSize}`)

Step 7: Statistics Endpoint

@router.get("/stats", response_model=schemas.TaskStatsResponse)
def get_task_stats(
    db: Session = Depends(get_db),
    current_user: models.User = Depends(get_current_user)
):
    # Base query
    user_tasks = db.query(models.Task).filter(
        models.Task.owner_id == current_user.id
    )

    # Total count
    total = user_tasks.count()

    # Completed count
    completed = user_tasks.filter(models.Task.completed == True).count()

    # Pending count
    pending = total - completed

    # Completion rate (prevent division by zero!)
    completion_rate = (completed / total * 100) if total else 0

    # Tasks created today
    today_start = datetime.utcnow().replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    created_today = user_tasks.filter(
        models.Task.created_at >= today_start
    ).count()

    # Tasks created this week
    week_start = today_start - timedelta(days=today_start.weekday())
    created_this_week = user_tasks.filter(
        models.Task.created_at >= week_start
    ).count()

    return {
        "total": total,
        "completed": completed,
        "pending": pending,
        "completion_rate": round(completion_rate, 2),
        "created_today": created_today,
        "created_this_week": created_this_week
    }

Understanding the date calculations:

Today start (midnight):

today_start = datetime.utcnow().replace(hour=0, minute=0, second=0)

# If now is: 2026-03-05 14:30:00
# today_start: 2026-03-05 00:00:00

Week start (Monday):

# weekday() returns 0=Monday, 6=Sunday
today_start.weekday()  # If Tuesday: returns 1

week_start = today_start - timedelta(days=1)
# Tuesday - 1 day = Monday

Edge case handling:

# Bad (crashes on empty task list)
completion_rate = (completed / total * 100)  # ZeroDivisionError!

# Good (handles zero total)
completion_rate = (completed / total * 100) if total else 0

Step 8: Database Indexing

Update models.py:

class Task(Base):
    __tablename__ = "tasks"

    # Add index=True to frequently queried columns
    title = Column(String(100), nullable=False, index=True)
    completed = Column(Boolean, default=False, index=True)
    created_at = Column(DateTime, default=datetime.utcnow, index=True)
    owner_id = Column(UUID(as_uuid=True), ForeignKey("users.id"), index=True)

    # Composite indexes for common query patterns
    __table_args__ = (
        Index('idx_owner_completed', 'owner_id', 'completed'),
        Index('idx_owner_created', 'owner_id', 'created_at'),
    )

Why indexes matter:

Without index:

SELECT * FROM tasks WHERE completed = false;
-- Database scans every row (full table scan)
-- 1,000 rows: ~50ms
-- 100,000 rows: ~5,000ms (5 seconds!)

With index:

SELECT * FROM tasks WHERE completed = false;
-- Database uses index (binary search)
-- 1,000 rows: ~5ms
-- 100,000 rows: ~50ms

~100x faster with indexes!

Composite indexes:

Index('idx_owner_completed', 'owner_id', 'completed')

Optimizes queries like:

WHERE owner_id = 'uuid' AND completed = false

Common in our API:

query.filter(Task.owner_id == current_user.id, Task.completed == False)

Index trade-offs:

Pros:

✅ Much faster queries
✅ Better user experience

Cons:

❌ Slightly slower writes (index must be updated)
❌ More disk space (~10-30% overhead)

Worth it for read-heavy applications!

Testing the Advanced Features

Test 1: Filtering

# Get pending tasks
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?completed=false"

# Get completed tasks
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?completed=true"

Test 2: Sorting

# Newest first
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?sort_by=created_at&order=desc"

# Alphabetical by title
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?sort_by=title&order=asc"

Test 3: Search

# Find tasks with "meeting"
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?search=meeting"

# Case doesn't matter!
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?search=MEETING"  # Same results

Test 4: Pagination

# First page
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?skip=0&limit=20"

# Second page
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?skip=20&limit=20"

Test 5: Combined Query

# Pending tasks with "project", sorted by date, page 1
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?completed=false&search=project&sort_by=created_at&order=desc&limit=10"

Test 6: Statistics

curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks/stats"

Response:

{
  "total": 25,
  "completed": 10,
  "pending": 15,
  "completion_rate": 40.0,
  "created_today": 3,
  "created_this_week": 8
}

Performance Comparison

Before (Day 3):

# Get all tasks (no pagination)
SELECT * FROM tasks WHERE owner_id = 'uuid'
# Returns: 1,000 tasks
# Time: 500ms
# Data transfer: 500KB

After (Day 4):

# Get filtered, sorted, paginated tasks
SELECT * FROM tasks 
WHERE owner_id = 'uuid' AND completed = false
ORDER BY created_at DESC
LIMIT 20 OFFSET 0
# Returns: 20 tasks
# Time: 50ms (with indexes)
# Data transfer: 10KB

Improvement: 10x faster, 50x less data!

Real-World Impact

Before advanced queries:

User has 5,000 tasks
Page loads all 5,000 (slow, crashes mobile apps)
Can't find specific tasks
No insights into productivity

After advanced queries:

Loads 20 tasks at a time (fast on any device)
Search finds tasks instantly
Filter shows only what's needed
Statistics provide insights

This transforms the API from proof-of-concept to product.

Common Mistakes I Avoided

Mistake 1: Not Checking completed is not None

# Bad - breaks for completed=false
if completed:
    query.filter(Task.completed == completed)

# Good - works for all cases
if completed is not None:
    query.filter(Task.completed == completed)

Mistake 2: No Sort Field Validation

# Bad - SQL injection risk
column = getattr(Task, sort_by)  # User could send "password"

# Good - whitelist validation
valid_fields = ["created_at", "title"]
if sort_by not in valid_fields:
    sort_by = "created_at"
column = getattr(Task, sort_by)

Mistake 3: Unlimited Pagination

# Bad - user can request millions of records
limit: int = Query(20)

# Good - constrain to reasonable max
limit: int = Query(20, ge=1, le=100)

Mistake 4: Case-Sensitive Search

# Bad - won't match "Meeting" when searching "meeting"
Task.title.like(f"%{search}%")

# Good - case-insensitive
Task.title.ilike(f"%{search}%")

Mistake 5: Division by Zero

# Bad - crashes on empty task list
completion_rate = (completed / total * 100)

# Good - handle zero case
completion_rate = (completed / total * 100) if total else 0

What I Learned

Technical Skills

✅ Advanced SQLAlchemy query composition

✅ FastAPI Query parameter validation

✅ Database indexing strategies

✅ Pagination implementation

✅ Case-insensitive search (ILIKE)

✅ Aggregation queries (COUNT, date filters)

✅ Dynamic column access with getattr

Conceptual Understanding

✅ Query chaining and lazy evaluation

✅ Index trade-offs (speed vs space)

✅ Security via input validation

✅ Edge case handling (zero division)

✅ Performance optimization strategies

API Design Patterns

✅ Standard query parameter conventions

✅ Pagination with skip/limit

✅ Multi-field sorting

✅ Combined query parameters

✅ Statistics endpoints for analytics

The "Aha!" Moments

1. Queries aren't executed until .all()
Building queries is cheap. Executing is expensive. Chain operations freely!

2. Indexes are like book indexes
Without index: Read every page to find "PostgreSQL"

With index: Jump directly to pages with "PostgreSQL"

3. Validation prevents attacks
Whitelisting sort fields isn't just best practice - it's security!

4. Query parameters compose beautifully
?completed=false&search=meeting&sort_by=created_at

Each parameter is independent but they work together!

Tomorrow's Plans

Day 5 will add:

Task categories/tags (many-to-many relationship)
Tag-based filtering
Bulk operations (delete multiple, update multiple)
Export functionality (CSV download)

But today? Today I celebrate building a production-ready query system that handles scale! 🎉

Resources That Helped

Full Code

Complete code on GitHub:
100-days-of-python/days/004

Time Investment: 2.5 hours

Knowledge Gained: Production query optimization skills

Feeling: Like I built something that actually scales 📊

Day 4/100 complete! ✅

Tomorrow: Many-to-many relationships and bulk operations

Following my journey?

📝 Blog | 🐦 Twitter | 💼 LinkedIn

Tags: #100DaysOfCode #Python #FastAPI #PostgreSQL #SQLAlchemy #BackendDevelopment #DatabaseOptimization #APIDesign #PerformanceTuning #LearningInPublic