DEV Community

stackOverflowed
stackOverflowed

Posted on

How to Format SQL Queries in Python: Best Practices, Gotchas, and Real-World Examples

Stop writing SQL strings that look like a ransom note. Here's how to write queries that are readable, safe, and maintainable.


The Problem With "Good Enough" SQL Formatting

Most Python developers start here:

user_id = 5
query = "SELECT * FROM users WHERE id = " + str(user_id)
cursor.execute(query)
Enter fullscreen mode Exit fullscreen mode

It works. Until it doesn't — and when it breaks, it breaks badly: SQL injection, cryptic errors from mismatched types, and queries that take 45 minutes to debug at 2am. Let's fix that, permanently.


1. Never Concatenate User Input — Use Parameterized Queries

This is rule #1 and it's non-negotiable.

❌ The Wrong Way (SQL Injection Waiting to Happen)

username = request.args.get("username")  # Could be: ' OR '1'='1
query = f"SELECT * FROM users WHERE username = '{username}'"
cursor.execute(query)
Enter fullscreen mode Exit fullscreen mode

If username is ' OR '1'='1, your entire users table just got exposed.

✅ The Right Way: Parameterized Queries

username = request.args.get("username")

# psycopg2 (PostgreSQL)
cursor.execute("SELECT * FROM users WHERE username = %s", (username,))

# sqlite3
cursor.execute("SELECT * FROM users WHERE username = ?", (username,))

# SQLAlchemy Core
from sqlalchemy import text
result = conn.execute(text("SELECT * FROM users WHERE username = :name"), {"name": username})
Enter fullscreen mode Exit fullscreen mode

The database driver handles escaping. You never touch it. This pattern is immune to SQL injection by design.

Gotcha: Note the trailing comma in (username,). Without it, Python treats the string as an iterable and passes each character as a separate parameter. This is one of the most common beginner bugs.

# 💥 Bug: passes ('a', 'l', 'i', 'c', 'e') instead of ('alice',)
cursor.execute("SELECT * FROM users WHERE username = %s", (username))

# ✅ Correct: single-element tuple
cursor.execute("SELECT * FROM users WHERE username = %s", (username,))
Enter fullscreen mode Exit fullscreen mode

2. Multi-Line Queries: Triple Quotes + Consistent Indentation

For anything longer than one clause, use triple-quoted strings. They're readable, diffable, and don't require escape characters.

❌ String Concatenation Chaos

query = "SELECT u.id, u.name, o.total " + \
        "FROM users u " + \
        "JOIN orders o ON u.id = o.user_id " + \
        "WHERE o.total > 100"
Enter fullscreen mode Exit fullscreen mode

✅ Triple-Quoted SQL

query = """
    SELECT
        u.id,
        u.name,
        u.email,
        o.total,
        o.created_at
    FROM users u
    JOIN orders o
        ON u.id = o.user_id
    WHERE
        o.total > 100
        AND u.active = TRUE
    ORDER BY o.created_at DESC
    LIMIT 50
"""

cursor.execute(query)
Enter fullscreen mode Exit fullscreen mode

Style conventions that matter:

  • SQL keywords in UPPERCASE
  • One clause per line (SELECT, FROM, WHERE, ORDER BY)
  • Indent column lists by 4 spaces
  • Put ON conditions on their own line, indented under JOIN

3. Dynamic Queries: Building Them Safely

Real applications need dynamic filters. Here's how to do it without string concatenation.

Scenario: A search endpoint with optional filters

def get_orders(user_id: int, status: str = None, min_total: float = None):
    base_query = """
        SELECT
            o.id,
            o.status,
            o.total,
            o.created_at
        FROM orders o
        WHERE o.user_id = %s
    """
    params = [user_id]
    conditions = []

    if status:
        conditions.append("AND o.status = %s")
        params.append(status)

    if min_total is not None:
        conditions.append("AND o.total >= %s")
        params.append(min_total)

    full_query = base_query + "\n        ".join(conditions)

    cursor.execute(full_query, params)
    return cursor.fetchall()
Enter fullscreen mode Exit fullscreen mode

This is the accumulator pattern: build a list of clauses and params in parallel, then join them. The number of %s placeholders always matches len(params).

Gotcha: Never dynamically interpolate column names or table names with parameters — the DB driver only parameterizes values, not identifiers. If you need a dynamic column name, whitelist it explicitly:

ALLOWED_SORT_COLUMNS = {"created_at", "total", "status"}

def get_orders_sorted(sort_by: str):
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"Invalid sort column: {sort_by}")

    # Now safe to interpolate — it came from our whitelist, not user input
    query = f"""
        SELECT id, status, total, created_at
        FROM orders
        ORDER BY {sort_by} DESC
    """
    cursor.execute(query)
Enter fullscreen mode Exit fullscreen mode

4. IN Clauses: The Tricky One

Parameterizing IN (...) trips up almost everyone the first time.

❌ This Won't Work

ids = [1, 2, 3, 4]
cursor.execute("SELECT * FROM users WHERE id IN (%s)", (ids,))
# Error: query expects 1 placeholder, got a list
Enter fullscreen mode Exit fullscreen mode

✅ Generate Placeholders Dynamically

ids = [1, 2, 3, 4]

# sqlite3
placeholders = ", ".join("?" * len(ids))
query = f"SELECT * FROM users WHERE id IN ({placeholders})"
cursor.execute(query, ids)

# psycopg2 — uses %s for each
placeholders = ", ".join(["%s"] * len(ids))
query = f"SELECT * FROM users WHERE id IN ({placeholders})"
cursor.execute(query, ids)
Enter fullscreen mode Exit fullscreen mode

✅ SQLAlchemy makes this trivial

from sqlalchemy import select, bindparam

stmt = select(users_table).where(users_table.c.id.in_([1, 2, 3, 4]))
result = conn.execute(stmt)
Enter fullscreen mode Exit fullscreen mode

5. Use SQLAlchemy Core for Complex Queries

When your raw SQL starts looking like a maze, SQLAlchemy Core gives you composable, Pythonic query building without the full ORM overhead.

from sqlalchemy import create_engine, select, func, and_, desc
from sqlalchemy import Table, MetaData

engine = create_engine("postgresql://user:pass@localhost/mydb")
meta = MetaData()
meta.reflect(bind=engine)

orders = meta.tables["orders"]
users = meta.tables["users"]

stmt = (
    select(
        users.c.id,
        users.c.name,
        func.sum(orders.c.total).label("lifetime_value"),
        func.count(orders.c.id).label("order_count"),
    )
    .join(orders, users.c.id == orders.c.user_id)
    .where(
        and_(
            orders.c.status == "completed",
            orders.c.created_at >= "2024-01-01",
        )
    )
    .group_by(users.c.id, users.c.name)
    .having(func.sum(orders.c.total) > 500)
    .order_by(desc("lifetime_value"))
    .limit(20)
)

with engine.connect() as conn:
    results = conn.execute(stmt).fetchall()
Enter fullscreen mode Exit fullscreen mode

This compiles to safe, parameterized SQL automatically. You can also print the generated SQL for debugging:

print(stmt.compile(dialect=engine.dialect, compile_kwargs={"literal_binds": True}))
Enter fullscreen mode Exit fullscreen mode

6. Formatting for Logging and Debugging

When you need to log a query (for debugging, audit trails, etc.), format it cleanly — but never log with real user data embedded.

import logging
import re

logger = logging.getLogger(__name__)

def log_query(query: str, params: tuple):
    # Normalize whitespace for clean log output
    clean_query = re.sub(r'\s+', ' ', query.strip())
    logger.debug("Executing SQL: %s | Params: %s", clean_query, params)

log_query("""
    SELECT id, name
    FROM users
    WHERE active = %s
""", (True,))

# Output: Executing SQL: SELECT id, name FROM users WHERE active = %s | Params: (True,)
Enter fullscreen mode Exit fullscreen mode

Gotcha: Don't use cursor.mogrify() (psycopg2) to log queries with real values in production — it defeats the purpose of parameterization from an audit standpoint, and if logs are exfiltrated, you've exposed real data.


7. Query Constants: SQL as Module-Level Variables

For queries used across your codebase, define them as constants at the top of your module. This makes them findable, testable, and easy to update.

# queries.py

GET_USER_BY_ID = """
    SELECT id, name, email, created_at
    FROM users
    WHERE id = %s
      AND active = TRUE
"""

GET_USER_ORDERS = """
    SELECT
        o.id,
        o.status,
        o.total,
        o.created_at
    FROM orders o
    WHERE o.user_id = %s
    ORDER BY o.created_at DESC
"""

INSERT_ORDER = """
    INSERT INTO orders (user_id, total, status, created_at)
    VALUES (%s, %s, %s, NOW())
    RETURNING id
"""
Enter fullscreen mode Exit fullscreen mode
# Usage in your service layer
from queries import GET_USER_BY_ID, GET_USER_ORDERS

def get_user_with_orders(user_id: int):
    cursor.execute(GET_USER_BY_ID, (user_id,))
    user = cursor.fetchone()

    cursor.execute(GET_USER_ORDERS, (user_id,))
    orders = cursor.fetchall()

    return user, orders
Enter fullscreen mode Exit fullscreen mode

This makes SQL grep-able, version-controllable, and reviewable in PRs.


8. Gotchas Quick Reference

Gotcha What Goes Wrong Fix
(value) vs (value,) String unpacked as characters Always use trailing comma for single-element tuples
f-string with user input SQL injection Use parameterized queries
Dynamic IN (...) Wrong number of placeholders Generate placeholders programmatically
Dynamic column names Parameters don't work for identifiers Whitelist allowed column names
Logging with real values Data exposure in logs Log query template + params separately
None in params NULL comparison breaks Use IS NULL in SQL, not = NULL
# None/NULL gotcha
user_role = None

# ❌ Won't match any rows — SQL NULL != NULL
cursor.execute("SELECT * FROM users WHERE role = %s", (user_role,))

# ✅ Use IS NULL when the value might be None
if user_role is None:
    cursor.execute("SELECT * FROM users WHERE role IS NULL")
else:
    cursor.execute("SELECT * FROM users WHERE role = %s", (user_role,))
Enter fullscreen mode Exit fullscreen mode

Summary

Situation Recommended Approach
Simple parameterized query cursor.execute(sql, (val,))
Multi-line query Triple-quoted string with clause-per-line formatting
Dynamic filters Accumulator pattern (list of clauses + params)
IN clause Generate placeholders: ", ".join(["%s"] * len(ids))
Complex joins/aggregates SQLAlchemy Core
Reusable queries Module-level string constants
Dynamic column names Whitelist validation + f-string

The core rule is simple: SQL structure is yours to format; SQL values are the driver's job to escape. Internalize that boundary and most of these best practices follow naturally.


Found a gotcha I missed? Drop it in the comments — SQL has a way of surprising everyone eventually.


Top comments (0)