The data isolation audit: every endpoint must be scoped to the requesting user

#learning #backend

The bug that's easy to miss in review

You're building a multi-tenant application. A user can see their own data, not anyone else's.

You add a new endpoint:

@router.get("/api/searches")
async def get_searches(db = Depends(get_db)):
    return db.query("SELECT * FROM searches WHERE active = 1")

You test it. It returns your searches. Works correctly — for you, because you're the only user in development.

In production, with multiple users, it returns everyone's searches to everyone. The user_id filter is missing. You have a data isolation breach.

Why this happens

Data isolation failures are rarely intentional. They happen because:

The developer tested with a single user account and didn't observe the failure
The filter was present in some query functions and assumed in others
A refactor removed the filter accidentally
The endpoint was copied from a public-data endpoint and the filter wasn't added

Code review catches many bugs, but data isolation failures are hard to spot unless the reviewer is specifically looking for missing user_id clauses on every query.

The systematic audit

Before shipping any feature, audit every endpoint:

For each endpoint that returns data:
  □ Does the query include WHERE user_id = current_user.id?
  □ OR is this data intentionally public? (document why)
  □ OR is this an admin endpoint? (require admin role check, not just auth)

For admin bypass, the filter must be explicit, not omitted:

# Wrong: admin bypass by omitting the filter
if user.is_admin:
    return db.query("SELECT * FROM searches")
else:
    return db.query("SELECT * FROM searches WHERE user_id = ?", [user.id])

# Right: explicit bypass with documentation
if user.is_admin:
    # Admin can see all searches for support purposes
    return db.query("SELECT * FROM searches ORDER BY created_at DESC")

The distinction matters for auditing. An omitted filter is invisible. An explicit # Admin can see all comment is visible and intentional.

Writing negative tests

For each scoped endpoint, write a test that verifies cross-user access is blocked:

def test_user_cannot_see_other_users_searches(client, user_a, user_b):
    # user_b creates a search
    client.post("/api/searches", json={"keyword": "test"}, headers=auth(user_b))

    # user_a fetches searches — should not see user_b's search
    res = client.get("/api/searches", headers=auth(user_a))
    slugs = [s["keyword"] for s in res.json()]
    assert "test" not in slugs

This test fails immediately if the user_id filter is missing. It cannot be accidentally removed during a refactor without breaking the test.

Geolocation and sensitive fields: negative tests for field presence

Some data isolation isn't about users — it's about which fields should never appear in any response:

def test_api_response_contains_no_location_data(client, user):
    res = client.get("/api/products", headers=auth(user))
    for product in res.json():
        assert "location_lat" not in product
        assert "location_lng" not in product
        assert "seller_address" not in product

These negative tests verify that sensitive fields are absent from the response. They're easy to write and catch the "I added a new field to the model and forgot to exclude it from the serialiser" class of bug.

The rule

Every query that touches user data has user_id = current_user.id in its WHERE clause, or a documented, tested reason why it doesn't. There is no middle ground.