Tomoki Ikeda

Posted on Mar 20

PostgreSQL Row-Level Security Saved My SaaS From Bugs I Didn't Know I Had

#postgres #security #webdev #saas

PostgreSQL Row-Level Security Saved My SaaS From Bugs I Didn't Know I Had

I build Nokos, an AI note-taking app. Every user's memos, diaries, and coding sessions are stored in one PostgreSQL database. One authorization bug = one user sees another's private data.

Most apps have one layer of defense: application-level auth checks. We have two. The second layer — PostgreSQL Row-Level Security — has already caught bugs that our application code missed.

The Setup: One Function, Total Isolation

Our entire RLS system hinges on one PostgreSQL function:

CREATE OR REPLACE FUNCTION current_app_user_id() RETURNS UUID AS $$
  SELECT NULLIF(current_setting('app.current_user_id', true), '')::UUID;
$$ LANGUAGE SQL STABLE SECURITY DEFINER;

Every table policy checks: WHERE user_id = current_app_user_id().

On every API request, we set the session variable inside a transaction:

export async function withRLS<T>(
  userId: string,
  callback: (tx: TransactionClient) => Promise<T>,
): Promise<T> {
  return prisma.$transaction(async (tx) => {
    await tx.$executeRaw`SELECT set_config('app.current_user_id', ${userId}, true)`;
    return callback(tx);
  });
}

The true in set_config makes it transaction-local. When the transaction ends, the variable resets. No leakage between requests.

The Bug That RLS Caught

We use fire-and-forget patterns for non-critical async work — like generating embeddings after a memo is saved:

// After memo is created, generate embedding async
generateEmbedding(contentText)
  .then(async (embedding) => {
    // BUG: This runs OUTSIDE the withRLS transaction
    await prisma.$executeRawUnsafe(
      `UPDATE memos SET embedding = $1::vector WHERE id = $2::uuid`,
      embedding, memoId,
    );
  });

This looks correct. The UPDATE has the right memo ID. What could go wrong?

Everything. Because memos has FORCE ROW LEVEL SECURITY, and this code runs outside withRLS(). There's no app.current_user_id set. The function returns NULL. The policy evaluates user_id = NULL, which is always false.

Result: 0 rows updated. No error. No warning. Complete silence.

The embedding was generated, the API call to Gemini was paid for, and the UPDATE ran successfully — it just matched zero rows. PostgreSQL doesn't consider "zero rows updated" an error.

The fix:

generateEmbedding(contentText)
  .then(async (embedding) => {
    // FIXED: wrap in withRLS so the policy can match
    await withRLS(userId, async (tx) => {
      await tx.$executeRawUnsafe(
        `UPDATE memos SET embedding = $1::vector WHERE id = $2::uuid`,
        embedding, memoId,
      );
    });
  });

Without RLS, this bug would have silently worked in dev and production. The UPDATE would hit the row directly, no policy check. We'd never know the auth boundary was missing — until someone found a way to exploit it.

FORCE vs NO FORCE: The Decision That Matters

PostgreSQL RLS has a subtle setting: FORCE ROW LEVEL SECURITY applies policies even to the table owner. Without FORCE, the owner role bypasses all policies.

We use FORCE on every table except five:

Table	NO FORCE	Why
`users`	✓	Auth middleware looks up users by `firebase_uid` before user_id is known
`user_usage`	✓	Usage records are created during first login, before RLS context exists
`api_keys`	✓	API key auth looks up keys by SHA-256 hash before user_id is known
`device_refresh_tokens`	✓	Token refresh happens before user auth
`device_codes`	✓	Device Authorization Flow — codes exist before any user is authenticated

The pattern: tables that are accessed before authentication completes need NO FORCE. Everything else — memos, books, tags, chat sessions, media — uses FORCE.

This means if we accidentally run a query outside withRLS() on a FORCE table, it returns zero rows instead of leaking data. Default deny.

Two Roles: App vs Batch

We use two PostgreSQL roles:

nokos_app — Used by the API. Subject to RLS.
nokos_batch — Used by batch jobs (diary generation, embedding backfill). Has row_security = off, completely bypassing RLS.

Why? Batch jobs need to process data across all users. The daily diary generator reads all users' memos to create personalized diaries. Running this through RLS would require setting app.current_user_id for each user in a loop — technically possible, but fragile and slow.

The trade-off: nokos_batch has no authorization boundary. We accept this because batch code runs in a controlled environment (Cloud Scheduler → Cloud Run endpoint with secret-based auth), never exposed to user input.

Policy Patterns

Direct ownership (most tables):

CREATE POLICY memos_select ON memos
  FOR SELECT USING (user_id = current_app_user_id());

JOIN-based (junction tables without user_id):

CREATE POLICY memo_tags_select ON memo_tags
  FOR SELECT USING (
    EXISTS (SELECT 1 FROM memos
            WHERE memos.id = memo_tags.memo_id
            AND memos.user_id = current_app_user_id())
  );

Open insert, restricted read (users table — anyone can sign up):

CREATE POLICY users_insert ON users
  FOR INSERT WITH CHECK (true);
CREATE POLICY users_select ON users
  FOR SELECT USING (id = current_app_user_id());

Each policy is idempotent (DROP POLICY IF EXISTS before CREATE POLICY), so we safely reapply them on every deployment.

What I'd Do Differently

Start with FORCE on every table. We initially had some tables without it and had to migrate. Starting strict is easier than tightening later.
Test the silent failure. We now have integration tests that verify: without set_config, a query on a FORCE table returns zero rows. This catches the exact bug class described above.

it("Without set_config, non-superuser sees no data", async () => {
  const result = await appPrisma.$transaction(async (tx) => {
    // No set_config call — should see nothing
    return tx.memo.findMany();
  });
  expect(result).toHaveLength(0);
});

Document the NO FORCE exceptions. Every NO FORCE table should have a comment explaining why. Future developers (or future AI agents writing your code) need to know the reasoning.

Five Takeaways

RLS is your second line of defense, not a replacement for application auth. Both layers should exist.
FORCE ROW LEVEL SECURITY is the critical setting. Without it, your table owner bypasses all policies — which is most ORM connections.
Silent zero-row updates are the real danger. RLS doesn't throw errors; it just filters. Test for this explicitly.
Pre-auth tables need NO FORCE. Any table accessed before you know the user_id must opt out of FORCE.
Separate roles for app vs batch. Don't hack around RLS in batch jobs — give them a role that bypasses it cleanly.

Try It

Nokos is protected by this exact RLS setup across 20+ tables. Free plan available — your data is isolated at the database level, not just the application level.

Have you implemented RLS in your project? What patterns worked for you?

This is article 3 in my series about building a SaaS with AI. Article 1: Zero Lines of Code. Article 2: Claude + Gemini Cost Split.

Launching on Product Hunt March 31st — follow for updates!

DEV Community

PostgreSQL Row-Level Security Saved My SaaS From Bugs I Didn't Know I Had

PostgreSQL Row-Level Security Saved My SaaS From Bugs I Didn't Know I Had

The Setup: One Function, Total Isolation

The Bug That RLS Caught

FORCE vs NO FORCE: The Decision That Matters

Two Roles: App vs Batch

Policy Patterns

What I'd Do Differently

Five Takeaways

Try It

Top comments (0)