DEV Community

archi-jain
archi-jain

Posted on

Day 4/100: Advanced Query Features - From Basic CRUD to Production API

Part of my 100 Days of Code journey. Today we make the API actually usable at scale.

The Challenge

Add advanced query features to the Task Management API: filtering, sorting, pagination, search, and statistics.

The Problem: Yesterday's API works, but imagine having 1,000 tasks:

  • Can't find specific tasks (no search)
  • Loading all 1,000 at once (slow, wasteful)
  • Can't sort by priority (no sorting)
  • No insights into completion rates (no analytics)

The Solution: Implement production-grade query features with proper indexing.

Why Advanced Queries Matter

Real scenario: You have 5,000 tasks in your database.

Without these features:

# User clicks "View Tasks"
tasks = db.query(Task).all()  # Loads ALL 5,000 tasks!
# Result: 
# - 3 second page load
# - 5MB data transfer
# - Angry users
Enter fullscreen mode Exit fullscreen mode

With these features:

# User searches "meeting", page 1
tasks = db.query(Task)\
    .filter(Task.title.ilike("%meeting%"))\
    .order_by(desc(Task.created_at))\
    .limit(20)
# Result:
# - 0.05 second response
# - 50KB data transfer
# - Happy users
Enter fullscreen mode Exit fullscreen mode

This is the difference between a toy and a product.

What We're Building

Query Features:

  1. Filtering - Get only completed or pending tasks
  2. Sorting - Order by date, title, or status
  3. Pagination - Load 20 tasks at a time
  4. Search - Find tasks by keyword
  5. Combined - Use all features together

Statistics:

  • Total task count
  • Completion rate
  • Tasks created today/this week

Performance:

  • Database indexes for speed
  • Query constraints for safety

Understanding Query Parameters

Query parameters are the ?key=value parts of URLs:

https://api.example.com/tasks?completed=false&sort_by=created_at&limit=20
                              ↑              ↑                 ↑
                           Filter          Sort            Paginate
Enter fullscreen mode Exit fullscreen mode

In FastAPI:

from fastapi import Query

@app.get("/tasks")
def get_tasks(
    completed: Optional[bool] = Query(None),
    sort_by: str = Query("created_at"),
    skip: int = Query(0, ge=0),
    limit: int = Query(20, ge=1, le=100)
):
    # FastAPI auto-extracts and validates!
Enter fullscreen mode Exit fullscreen mode

What FastAPI does automatically:

  1. Extracts completed from URL
  2. Converts "true" string → Python True boolean
  3. Validates skip >= 0 and limit between 1-100
  4. Returns 422 error if validation fails

Beautiful!

Step-by-Step Implementation

Step 1: Query Parameter Setup

from fastapi import Query
from typing import Optional

@router.get("/", response_model=List[schemas.TaskResponse])
def get_tasks(
    # Filtering
    completed: Optional[bool] = Query(None, description="Filter by status"),

    # Sorting
    sort_by: str = Query("created_at", description="Field to sort by"),
    order: str = Query("desc", description="asc or desc"),

    # Pagination
    skip: int = Query(0, ge=0, description="Records to skip"),
    limit: int = Query(20, ge=1, le=100, description="Max records"),

    # Search
    search: Optional[str] = Query(None, description="Search term"),

    db: Session = Depends(get_db),
    current_user: models.User = Depends(get_current_user)
):
    # Implementation below...
Enter fullscreen mode Exit fullscreen mode

Query() parameters explained:

  • Query(None) - Optional, defaults to None
  • Query(0, ge=0) - Defaults to 0, must be >= 0
  • Query(20, ge=1, le=100) - Defaults to 20, must be 1-100

Why constrain limit to 100?

Without constraints:

GET /tasks?limit=1000000  # User requests 1 million records!
# Server crashes, database overloaded
Enter fullscreen mode Exit fullscreen mode

With constraints:

limit: int = Query(20, ge=1, le=100)
GET /tasks?limit=1000000  # FastAPI returns 422 error
# Server protected!
Enter fullscreen mode Exit fullscreen mode

Step 2: Building the Query

The magic of query chaining:

# Start with base query
query = db.query(models.Task).filter(models.Task.owner_id == current_user.id)

# Add filter if provided
if completed is not None:
    query = query.filter(models.Task.completed == completed)

# Add search if provided
if search:
    search_term = f"%{search}%"
    query = query.filter(
        or_(
            models.Task.title.ilike(search_term),
            models.Task.description.ilike(search_term)
        )
    )

# Add sorting
valid_sort_fields = ["created_at", "updated_at", "title", "completed"]
if sort_by not in valid_sort_fields:
    sort_by = "created_at"

column = getattr(models.Task, sort_by)

if order == "desc":
    query = query.order_by(desc(column))
else:
    query = query.order_by(asc(column))

# Add pagination
query = query.offset(skip).limit(limit)

# Execute query (only NOW does it hit database!)
return query.all()
Enter fullscreen mode Exit fullscreen mode

Key insight: Query isn't executed until .all() is called!

This means:

query = db.query(Task)           # No SQL yet
query = query.filter(...)        # Still no SQL
query = query.order_by(...)      # Still building
tasks = query.all()              # NOW executes SQL!
Enter fullscreen mode Exit fullscreen mode

Benefit: SQLAlchemy generates one optimized SQL query, not multiple.

Step 3: Understanding Filtering

Basic filtering:

if completed is not None:
    query = query.filter(models.Task.completed == completed)
Enter fullscreen mode Exit fullscreen mode

Why is not None?

Consider these requests:

GET /tasks                    # completed = None (don't filter)
GET /tasks?completed=true     # completed = True (filter completed)
GET /tasks?completed=false    # completed = False (filter pending)
Enter fullscreen mode Exit fullscreen mode

Without is not None:

# Bad
if completed:
    query = query.filter(Task.completed == completed)

# GET /tasks?completed=false
# completed = False
# if False: -> doesn't execute!
# Filter not applied! Bug!
Enter fullscreen mode Exit fullscreen mode

With is not None:

# Good
if completed is not None:
    query = query.filter(Task.completed == completed)

# GET /tasks?completed=false
# completed = False
# if False is not None: -> True
# Filter applied correctly!
Enter fullscreen mode Exit fullscreen mode

Step 4: Implementing Search

Case-insensitive search:

if search:
    search_term = f"%{search}%"
    query = query.filter(
        or_(
            models.Task.title.ilike(search_term),
            models.Task.description.ilike(search_term)
        )
    )
Enter fullscreen mode Exit fullscreen mode

Breaking it down:

ILIKE (case-insensitive LIKE):

Task.title.ilike("%meeting%")
Enter fullscreen mode Exit fullscreen mode

Matches:

  • "Team Meeting" ✅
  • "meeting notes" ✅
  • "MEETING" ✅
  • "Status meeting tomorrow" ✅

Doesn't match:

  • "Project discussion" ❌

The % wildcards:

  • meeting% - Starts with "meeting"
  • %meeting - Ends with "meeting"
  • %meeting% - Contains "meeting" anywhere

OR condition:

or_(
    Task.title.ilike("%meeting%"),
    Task.description.ilike("%meeting%")
)
Enter fullscreen mode Exit fullscreen mode

SQL generated:

WHERE (title ILIKE '%meeting%' OR description ILIKE '%meeting%')
Enter fullscreen mode Exit fullscreen mode

Searches both title AND description!

Step 5: Dynamic Sorting

valid_sort_fields = ["created_at", "updated_at", "title", "completed"]

if sort_by not in valid_sort_fields:
    sort_by = "created_at"  # Fallback to safe default

column = getattr(models.Task, sort_by)

if order == "desc":
    query = query.order_by(desc(column))
else:
    query = query.order_by(asc(column))
Enter fullscreen mode Exit fullscreen mode

Why validate sort fields?

Without validation (DANGEROUS):

# User sends malicious request
GET /tasks?sort_by=hashed_password

column = getattr(Task, "hashed_password")  # Leaks password hashes!
query.order_by(column)
Enter fullscreen mode Exit fullscreen mode

With validation (SAFE):

valid_fields = ["created_at", "title", "completed"]

if sort_by not in valid_fields:
    sort_by = "created_at"  # Use safe default

# User sends malicious request
GET /tasks?sort_by=hashed_password

# sort_by not in valid_fields
# Defaults to "created_at"
# Attack prevented!
Enter fullscreen mode Exit fullscreen mode

Using getattr dynamically:

# Static (hardcoded)
if sort_by == "title":
    query.order_by(Task.title)
elif sort_by == "created_at":
    query.order_by(Task.created_at)
# ... many elif statements

# Dynamic (elegant)
column = getattr(Task, sort_by)
query.order_by(column)
Enter fullscreen mode Exit fullscreen mode

Step 6: Pagination

query = query.offset(skip).limit(limit)
Enter fullscreen mode Exit fullscreen mode

How pagination works:

# Page 1 (records 1-20)
skip=0, limit=20  # SELECT * FROM tasks LIMIT 20 OFFSET 0

# Page 2 (records 21-40)
skip=20, limit=20  # SELECT * FROM tasks LIMIT 20 OFFSET 20

# Page 3 (records 41-60)
skip=40, limit=20  # SELECT * FROM tasks LIMIT 20 OFFSET 40
Enter fullscreen mode Exit fullscreen mode

Calculating skip:

page_number = 3
page_size = 20
skip = (page_number - 1) * page_size  # (3-1) * 20 = 40
Enter fullscreen mode Exit fullscreen mode

Frontend implementation:

// React example
const [page, setPage] = useState(1);
const pageSize = 20;

fetch(`/tasks?skip=${(page - 1) * pageSize}&limit=${pageSize}`)
Enter fullscreen mode Exit fullscreen mode

Step 7: Statistics Endpoint

@router.get("/stats", response_model=schemas.TaskStatsResponse)
def get_task_stats(
    db: Session = Depends(get_db),
    current_user: models.User = Depends(get_current_user)
):
    # Base query
    user_tasks = db.query(models.Task).filter(
        models.Task.owner_id == current_user.id
    )

    # Total count
    total = user_tasks.count()

    # Completed count
    completed = user_tasks.filter(models.Task.completed == True).count()

    # Pending count
    pending = total - completed

    # Completion rate (prevent division by zero!)
    completion_rate = (completed / total * 100) if total else 0

    # Tasks created today
    today_start = datetime.utcnow().replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    created_today = user_tasks.filter(
        models.Task.created_at >= today_start
    ).count()

    # Tasks created this week
    week_start = today_start - timedelta(days=today_start.weekday())
    created_this_week = user_tasks.filter(
        models.Task.created_at >= week_start
    ).count()

    return {
        "total": total,
        "completed": completed,
        "pending": pending,
        "completion_rate": round(completion_rate, 2),
        "created_today": created_today,
        "created_this_week": created_this_week
    }
Enter fullscreen mode Exit fullscreen mode

Understanding the date calculations:

Today start (midnight):

today_start = datetime.utcnow().replace(hour=0, minute=0, second=0)

# If now is: 2026-03-05 14:30:00
# today_start: 2026-03-05 00:00:00
Enter fullscreen mode Exit fullscreen mode

Week start (Monday):

# weekday() returns 0=Monday, 6=Sunday
today_start.weekday()  # If Tuesday: returns 1

week_start = today_start - timedelta(days=1)
# Tuesday - 1 day = Monday
Enter fullscreen mode Exit fullscreen mode

Edge case handling:

# Bad (crashes on empty task list)
completion_rate = (completed / total * 100)  # ZeroDivisionError!

# Good (handles zero total)
completion_rate = (completed / total * 100) if total else 0
Enter fullscreen mode Exit fullscreen mode

Step 8: Database Indexing

Update models.py:

class Task(Base):
    __tablename__ = "tasks"

    # Add index=True to frequently queried columns
    title = Column(String(100), nullable=False, index=True)
    completed = Column(Boolean, default=False, index=True)
    created_at = Column(DateTime, default=datetime.utcnow, index=True)
    owner_id = Column(UUID(as_uuid=True), ForeignKey("users.id"), index=True)

    # Composite indexes for common query patterns
    __table_args__ = (
        Index('idx_owner_completed', 'owner_id', 'completed'),
        Index('idx_owner_created', 'owner_id', 'created_at'),
    )
Enter fullscreen mode Exit fullscreen mode

Why indexes matter:

Without index:

SELECT * FROM tasks WHERE completed = false;
-- Database scans every row (full table scan)
-- 1,000 rows: ~50ms
-- 100,000 rows: ~5,000ms (5 seconds!)
Enter fullscreen mode Exit fullscreen mode

With index:

SELECT * FROM tasks WHERE completed = false;
-- Database uses index (binary search)
-- 1,000 rows: ~5ms
-- 100,000 rows: ~50ms
Enter fullscreen mode Exit fullscreen mode

~100x faster with indexes!

Composite indexes:

Index('idx_owner_completed', 'owner_id', 'completed')
Enter fullscreen mode Exit fullscreen mode

Optimizes queries like:

WHERE owner_id = 'uuid' AND completed = false
Enter fullscreen mode Exit fullscreen mode

Common in our API:

query.filter(Task.owner_id == current_user.id, Task.completed == False)
Enter fullscreen mode Exit fullscreen mode

Index trade-offs:

Pros:

  • ✅ Much faster queries
  • ✅ Better user experience

Cons:

  • ❌ Slightly slower writes (index must be updated)
  • ❌ More disk space (~10-30% overhead)

Worth it for read-heavy applications!

Testing the Advanced Features

Test 1: Filtering

# Get pending tasks
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?completed=false"

# Get completed tasks
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?completed=true"
Enter fullscreen mode Exit fullscreen mode

Test 2: Sorting

# Newest first
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?sort_by=created_at&order=desc"

# Alphabetical by title
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?sort_by=title&order=asc"
Enter fullscreen mode Exit fullscreen mode

Test 3: Search

# Find tasks with "meeting"
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?search=meeting"

# Case doesn't matter!
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?search=MEETING"  # Same results
Enter fullscreen mode Exit fullscreen mode

Test 4: Pagination

# First page
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?skip=0&limit=20"

# Second page
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?skip=20&limit=20"
Enter fullscreen mode Exit fullscreen mode

Test 5: Combined Query

# Pending tasks with "project", sorted by date, page 1
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks?completed=false&search=project&sort_by=created_at&order=desc&limit=10"
Enter fullscreen mode Exit fullscreen mode

Test 6: Statistics

curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/tasks/stats"
Enter fullscreen mode Exit fullscreen mode

Response:

{
  "total": 25,
  "completed": 10,
  "pending": 15,
  "completion_rate": 40.0,
  "created_today": 3,
  "created_this_week": 8
}
Enter fullscreen mode Exit fullscreen mode

Performance Comparison

Before (Day 3):

# Get all tasks (no pagination)
SELECT * FROM tasks WHERE owner_id = 'uuid'
# Returns: 1,000 tasks
# Time: 500ms
# Data transfer: 500KB
Enter fullscreen mode Exit fullscreen mode

After (Day 4):

# Get filtered, sorted, paginated tasks
SELECT * FROM tasks 
WHERE owner_id = 'uuid' AND completed = false
ORDER BY created_at DESC
LIMIT 20 OFFSET 0
# Returns: 20 tasks
# Time: 50ms (with indexes)
# Data transfer: 10KB
Enter fullscreen mode Exit fullscreen mode

Improvement: 10x faster, 50x less data!

Real-World Impact

Before advanced queries:

  • User has 5,000 tasks
  • Page loads all 5,000 (slow, crashes mobile apps)
  • Can't find specific tasks
  • No insights into productivity

After advanced queries:

  • Loads 20 tasks at a time (fast on any device)
  • Search finds tasks instantly
  • Filter shows only what's needed
  • Statistics provide insights

This transforms the API from proof-of-concept to product.

Common Mistakes I Avoided

Mistake 1: Not Checking completed is not None

# Bad - breaks for completed=false
if completed:
    query.filter(Task.completed == completed)

# Good - works for all cases
if completed is not None:
    query.filter(Task.completed == completed)
Enter fullscreen mode Exit fullscreen mode

Mistake 2: No Sort Field Validation

# Bad - SQL injection risk
column = getattr(Task, sort_by)  # User could send "password"

# Good - whitelist validation
valid_fields = ["created_at", "title"]
if sort_by not in valid_fields:
    sort_by = "created_at"
column = getattr(Task, sort_by)
Enter fullscreen mode Exit fullscreen mode

Mistake 3: Unlimited Pagination

# Bad - user can request millions of records
limit: int = Query(20)

# Good - constrain to reasonable max
limit: int = Query(20, ge=1, le=100)
Enter fullscreen mode Exit fullscreen mode

Mistake 4: Case-Sensitive Search

# Bad - won't match "Meeting" when searching "meeting"
Task.title.like(f"%{search}%")

# Good - case-insensitive
Task.title.ilike(f"%{search}%")
Enter fullscreen mode Exit fullscreen mode

Mistake 5: Division by Zero

# Bad - crashes on empty task list
completion_rate = (completed / total * 100)

# Good - handle zero case
completion_rate = (completed / total * 100) if total else 0
Enter fullscreen mode Exit fullscreen mode

What I Learned

Technical Skills

✅ Advanced SQLAlchemy query composition

✅ FastAPI Query parameter validation

✅ Database indexing strategies

✅ Pagination implementation

✅ Case-insensitive search (ILIKE)

✅ Aggregation queries (COUNT, date filters)

✅ Dynamic column access with getattr

Conceptual Understanding

✅ Query chaining and lazy evaluation

✅ Index trade-offs (speed vs space)

✅ Security via input validation

✅ Edge case handling (zero division)

✅ Performance optimization strategies

API Design Patterns

✅ Standard query parameter conventions

✅ Pagination with skip/limit

✅ Multi-field sorting

✅ Combined query parameters

✅ Statistics endpoints for analytics

The "Aha!" Moments

1. Queries aren't executed until .all()
Building queries is cheap. Executing is expensive. Chain operations freely!

2. Indexes are like book indexes
Without index: Read every page to find "PostgreSQL"

With index: Jump directly to pages with "PostgreSQL"

3. Validation prevents attacks
Whitelisting sort fields isn't just best practice - it's security!

4. Query parameters compose beautifully
?completed=false&search=meeting&sort_by=created_at

Each parameter is independent but they work together!

Tomorrow's Plans

Day 5 will add:

  • Task categories/tags (many-to-many relationship)
  • Tag-based filtering
  • Bulk operations (delete multiple, update multiple)
  • Export functionality (CSV download)

But today? Today I celebrate building a production-ready query system that handles scale! 🎉

Resources That Helped

Full Code

Complete code on GitHub:
100-days-of-python/days/004


Time Investment: 2.5 hours

Knowledge Gained: Production query optimization skills

Feeling: Like I built something that actually scales 📊

Day 4/100 complete!

Tomorrow: Many-to-many relationships and bulk operations


Following my journey?

📝 Blog | 🐦 Twitter | 💼 LinkedIn

Tags: #100DaysOfCode #Python #FastAPI #PostgreSQL #SQLAlchemy #BackendDevelopment #DatabaseOptimization #APIDesign #PerformanceTuning #LearningInPublic

Top comments (0)