stackOverflowed

Posted on Jun 10

How to Format SQL Queries in Python: Best Practices, Gotchas, and Real-World Examples

#python #sql #database #bestpractices

Stop writing SQL strings that look like a ransom note. Here's how to write queries that are readable, safe, and maintainable.

The Problem With "Good Enough" SQL Formatting

Most Python developers start here:

user_id = 5
query = "SELECT * FROM users WHERE id = " + str(user_id)
cursor.execute(query)

It works. Until it doesn't — and when it breaks, it breaks badly: SQL injection, cryptic errors from mismatched types, and queries that take 45 minutes to debug at 2am. Let's fix that, permanently.

1. Never Concatenate User Input — Use Parameterized Queries

This is rule #1 and it's non-negotiable.

❌ The Wrong Way (SQL Injection Waiting to Happen)

username = request.args.get("username")  # Could be: ' OR '1'='1
query = f"SELECT * FROM users WHERE username = '{username}'"
cursor.execute(query)

If username is ' OR '1'='1, your entire users table just got exposed.

✅ The Right Way: Parameterized Queries

username = request.args.get("username")

# psycopg2 (PostgreSQL)
cursor.execute("SELECT * FROM users WHERE username = %s", (username,))

# sqlite3
cursor.execute("SELECT * FROM users WHERE username = ?", (username,))

# SQLAlchemy Core
from sqlalchemy import text
result = conn.execute(text("SELECT * FROM users WHERE username = :name"), {"name": username})

The database driver handles escaping. You never touch it. This pattern is immune to SQL injection by design.

Gotcha: Note the trailing comma in (username,). Without it, Python treats the string as an iterable and passes each character as a separate parameter. This is one of the most common beginner bugs.

# 💥 Bug: passes ('a', 'l', 'i', 'c', 'e') instead of ('alice',)
cursor.execute("SELECT * FROM users WHERE username = %s", (username))

# ✅ Correct: single-element tuple
cursor.execute("SELECT * FROM users WHERE username = %s", (username,))

2. Multi-Line Queries: Triple Quotes + Consistent Indentation

For anything longer than one clause, use triple-quoted strings. They're readable, diffable, and don't require escape characters.

❌ String Concatenation Chaos

query = "SELECT u.id, u.name, o.total " + \
        "FROM users u " + \
        "JOIN orders o ON u.id = o.user_id " + \
        "WHERE o.total > 100"

✅ Triple-Quoted SQL

query = """
    SELECT
        u.id,
        u.name,
        u.email,
        o.total,
        o.created_at
    FROM users u
    JOIN orders o
        ON u.id = o.user_id
    WHERE
        o.total > 100
        AND u.active = TRUE
    ORDER BY o.created_at DESC
    LIMIT 50
"""

cursor.execute(query)

Style conventions that matter:

SQL keywords in UPPERCASE
One clause per line (SELECT, FROM, WHERE, ORDER BY)
Indent column lists by 4 spaces
Put ON conditions on their own line, indented under JOIN

3. Dynamic Queries: Building Them Safely

Real applications need dynamic filters. Here's how to do it without string concatenation.

Scenario: A search endpoint with optional filters

def get_orders(user_id: int, status: str = None, min_total: float = None):
    base_query = """
        SELECT
            o.id,
            o.status,
            o.total,
            o.created_at
        FROM orders o
        WHERE o.user_id = %s
    """
    params = [user_id]
    conditions = []

    if status:
        conditions.append("AND o.status = %s")
        params.append(status)

    if min_total is not None:
        conditions.append("AND o.total >= %s")
        params.append(min_total)

    full_query = base_query + "\n        ".join(conditions)

    cursor.execute(full_query, params)
    return cursor.fetchall()

This is the accumulator pattern: build a list of clauses and params in parallel, then join them. The number of %s placeholders always matches len(params).

Gotcha: Never dynamically interpolate column names or table names with parameters — the DB driver only parameterizes values, not identifiers. If you need a dynamic column name, whitelist it explicitly:

ALLOWED_SORT_COLUMNS = {"created_at", "total", "status"}

def get_orders_sorted(sort_by: str):
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"Invalid sort column: {sort_by}")

    # Now safe to interpolate — it came from our whitelist, not user input
    query = f"""
        SELECT id, status, total, created_at
        FROM orders
        ORDER BY {sort_by} DESC
    """
    cursor.execute(query)

4. IN Clauses: The Tricky One

Parameterizing IN (...) trips up almost everyone the first time.

❌ This Won't Work

ids = [1, 2, 3, 4]
cursor.execute("SELECT * FROM users WHERE id IN (%s)", (ids,))
# Error: query expects 1 placeholder, got a list

✅ Generate Placeholders Dynamically

ids = [1, 2, 3, 4]

# sqlite3
placeholders = ", ".join("?" * len(ids))
query = f"SELECT * FROM users WHERE id IN ({placeholders})"
cursor.execute(query, ids)

# psycopg2 — uses %s for each
placeholders = ", ".join(["%s"] * len(ids))
query = f"SELECT * FROM users WHERE id IN ({placeholders})"
cursor.execute(query, ids)

✅ SQLAlchemy makes this trivial

from sqlalchemy import select, bindparam

stmt = select(users_table).where(users_table.c.id.in_([1, 2, 3, 4]))
result = conn.execute(stmt)

5. Use SQLAlchemy Core for Complex Queries

When your raw SQL starts looking like a maze, SQLAlchemy Core gives you composable, Pythonic query building without the full ORM overhead.

from sqlalchemy import create_engine, select, func, and_, desc
from sqlalchemy import Table, MetaData

engine = create_engine("postgresql://user:pass@localhost/mydb")
meta = MetaData()
meta.reflect(bind=engine)

orders = meta.tables["orders"]
users = meta.tables["users"]

stmt = (
    select(
        users.c.id,
        users.c.name,
        func.sum(orders.c.total).label("lifetime_value"),
        func.count(orders.c.id).label("order_count"),
    )
    .join(orders, users.c.id == orders.c.user_id)
    .where(
        and_(
            orders.c.status == "completed",
            orders.c.created_at >= "2024-01-01",
        )
    )
    .group_by(users.c.id, users.c.name)
    .having(func.sum(orders.c.total) > 500)
    .order_by(desc("lifetime_value"))
    .limit(20)
)

with engine.connect() as conn:
    results = conn.execute(stmt).fetchall()

This compiles to safe, parameterized SQL automatically. You can also print the generated SQL for debugging:

print(stmt.compile(dialect=engine.dialect, compile_kwargs={"literal_binds": True}))

6. Formatting for Logging and Debugging

When you need to log a query (for debugging, audit trails, etc.), format it cleanly — but never log with real user data embedded.

import logging
import re

logger = logging.getLogger(__name__)

def log_query(query: str, params: tuple):
    # Normalize whitespace for clean log output
    clean_query = re.sub(r'\s+', ' ', query.strip())
    logger.debug("Executing SQL: %s | Params: %s", clean_query, params)

log_query("""
    SELECT id, name
    FROM users
    WHERE active = %s
""", (True,))

# Output: Executing SQL: SELECT id, name FROM users WHERE active = %s | Params: (True,)

Gotcha: Don't use cursor.mogrify() (psycopg2) to log queries with real values in production — it defeats the purpose of parameterization from an audit standpoint, and if logs are exfiltrated, you've exposed real data.

7. Query Constants: SQL as Module-Level Variables

For queries used across your codebase, define them as constants at the top of your module. This makes them findable, testable, and easy to update.

# queries.py

GET_USER_BY_ID = """
    SELECT id, name, email, created_at
    FROM users
    WHERE id = %s
      AND active = TRUE
"""

GET_USER_ORDERS = """
    SELECT
        o.id,
        o.status,
        o.total,
        o.created_at
    FROM orders o
    WHERE o.user_id = %s
    ORDER BY o.created_at DESC
"""

INSERT_ORDER = """
    INSERT INTO orders (user_id, total, status, created_at)
    VALUES (%s, %s, %s, NOW())
    RETURNING id
"""

# Usage in your service layer
from queries import GET_USER_BY_ID, GET_USER_ORDERS

def get_user_with_orders(user_id: int):
    cursor.execute(GET_USER_BY_ID, (user_id,))
    user = cursor.fetchone()

    cursor.execute(GET_USER_ORDERS, (user_id,))
    orders = cursor.fetchall()

    return user, orders

This makes SQL grep-able, version-controllable, and reviewable in PRs.

8. Gotchas Quick Reference

Gotcha	What Goes Wrong	Fix
`(value)` vs `(value,)`	String unpacked as characters	Always use trailing comma for single-element tuples
f-string with user input	SQL injection	Use parameterized queries
Dynamic `IN (...)`	Wrong number of placeholders	Generate placeholders programmatically
Dynamic column names	Parameters don't work for identifiers	Whitelist allowed column names
Logging with real values	Data exposure in logs	Log query template + params separately
`None` in params	NULL comparison breaks	Use `IS NULL` in SQL, not `= NULL`

# None/NULL gotcha
user_role = None

# ❌ Won't match any rows — SQL NULL != NULL
cursor.execute("SELECT * FROM users WHERE role = %s", (user_role,))

# ✅ Use IS NULL when the value might be None
if user_role is None:
    cursor.execute("SELECT * FROM users WHERE role IS NULL")
else:
    cursor.execute("SELECT * FROM users WHERE role = %s", (user_role,))

Summary

Situation	Recommended Approach
Simple parameterized query	`cursor.execute(sql, (val,))`
Multi-line query	Triple-quoted string with clause-per-line formatting
Dynamic filters	Accumulator pattern (list of clauses + params)
`IN` clause	Generate placeholders: `", ".join(["%s"] * len(ids))`
Complex joins/aggregates	SQLAlchemy Core
Reusable queries	Module-level string constants
Dynamic column names	Whitelist validation + f-string

The core rule is simple: SQL structure is yours to format; SQL values are the driver's job to escape. Internalize that boundary and most of these best practices follow naturally.

Found a gotcha I missed? Drop it in the comments — SQL has a way of surprising everyone eventually.

DEV Community