Jackson Kasi

Posted on Apr 3

Building a Cross-Relational Search Engine in Drizzle ORM (No Hardcoded WHERE Clauses)

#typescript #database #architecture #opensource

(This is Part 4 of my series on building scalable infrastructure. If you missed them, check out Part 1: Bridging Drizzle & TanStack, Part 2: The Engine-Adapter Pattern, and Part 3: The Dynamic Query Compiler).

📖 The Librarian, The Boxes, and The Database

Imagine you walk into a massive library. You ask the librarian: "Can you give me a list of authors who have written a book with the word 'Magic' in the title?"

Approach A (The Bad Way):
The librarian walks into the back room, finds every single author who wrote a book with 'Magic', packs up every single book that author has ever written, and drops 50 heavy boxes on your desk. You now have to manually dig through the boxes, throwing away duplicates, just to write down the author's name.

Approach B (The Smart Way):
The librarian walks into the aisle, glances at the shelves, sees at least one 'Magic' book for an author, writes the author's name on a clean sheet of paper, and hands you a single piece of paper.

🛑 The Problem: We Are Dropping Boxes on Desks

If you have ever built an admin dashboard, you know the nightmare of the "Global Search Box".

The PM asks for a single input field that searches a user's name, their email, and the titles of their posts.

If you are writing raw Drizzle ORM or SQL, your query usually mutates into a terrifying monster of LEFT JOINs and OR clauses:

// ❌ The "Approach A" Nightmare (Dropping Boxes)
const query = "magic";
db.select().from(users)
  .leftJoin(posts, eq(users.id, posts.userId))
  .where(
    or(
      ilike(users.name, `%${query}%`),
      ilike(posts.title, `%${query}%`)
    )
  )

This code is fundamentally broken.
The LEFT JOIN is "Approach A". It duplicates the user row for every single post they have written. If a user has 1,000 posts, your database sends 1,000 identical user records over the network. You then have to write complex GROUP BY logic in your code just to get a clean list of users.

When I built the engine for TableCraft, I needed to solve this permanently. I needed Approach B. I needed a Cross-Relational Search Engine that generates dynamic SQL vectors using EXISTS instead of LEFT JOIN.

Here is the exact architectural pattern to build it.

🧠 The Core Concept: The SQL Vector Array

Instead of writing a massive where(or(...)) block, we treat conditions as Vectors (an array of SQL AST nodes).

We dynamically build this array based on a configuration object, and then use the spread operator to evaluate it.

In TableCraft, the developer defines the search boundaries at the schema level:

// Define exactly what the global search box is allowed to hit
const usersTable = defineTable(schema.users)
  .search(
    "name", 
    "email", 
    "posts.title" // Nested relation!
  );

When a request comes in (?search=magic), the engine needs to compile "posts.title" into a valid SQL subquery.

🏗️ Algorithm: The Cross-Relational Subquery Builder

To avoid the LEFT JOIN duplication problem, we do not join the tables at the root level. Instead, we use SQL EXISTS subqueries (The Librarian's "glance at the shelf" approach).

If the search target is posts.title, the compiler generates this SQL under the hood:
EXISTS (SELECT 1 FROM posts WHERE posts.user_id = users.id AND posts.title ILIKE '%magic%')

Here is the simplified algorithm to compile this dynamically in Drizzle:

1. The Search Dispatcher

We iterate over the allowed search fields. If it's a local column (name), we build a simple ilike. If it has a dot (posts.title), we hand it off to the Subquery Compiler.

import { ilike, or, SQL } from "drizzle-orm";

function buildGlobalSearch(
  table: AnyPgTable, 
  searchFields: string[], 
  query: string
): SQL | undefined {
  const conditions: SQL[] = [];
  const searchTerm = `%${query}%`;

  for (const field of searchFields) {
    if (field.includes(".")) {
      // It's a relational search! Hand off to the compiler.
      conditions.push(buildExistsSubquery(table, field, searchTerm));
    } else {
      // It's a local search.
      const dbColumn = getColumnMap(table).get(field);
      if (dbColumn) conditions.push(ilike(dbColumn, searchTerm));
    }
  }

  return conditions.length > 0 ? or(...conditions) : undefined;
}

2. The Exists Subquery Compiler

This is the magic. We use the Drizzle getTableRelations introspection to find the foreign keys, then we generate the EXISTS clause dynamically.

import { sql } from "drizzle-orm";

function buildExistsSubquery(
  rootTable: AnyPgTable, 
  relationPath: string, 
  searchTerm: string
): SQL {
  const [relationName, columnName] = relationPath.split(".");

  // Introspect schema metadata
  const relations = getTableRelations(rootTable);
  const relationDef = relations[relationName];

  // Get the target table (e.g., 'posts')
  const targetTable = schema[relationDef.referencedTableName];
  const targetColumn = getColumnMap(targetTable).get(columnName);

  // Get the foreign key columns mapping
  const sourceFk = rootTable[relationDef.fields[0]];
  const targetFk = targetTable[relationDef.references[0]];

  // Compile the SQL EXISTS AST dynamically!
  return sql`EXISTS (
    SELECT 1 FROM ${targetTable} 
    WHERE ${targetFk} = ${sourceFk} 
    AND ${targetColumn} ILIKE ${searchTerm}
  )`;
}

🚀 The Power of the Dynamic Engine

Because we abstracted the search logic into a compiler, look at what happens to our API endpoint:

// 1. Parse the URL
const searchQuery = req.query.search; // "magic"

// 2. Compile the Dynamic SQL Vectors
const searchAST = buildGlobalSearch(schema.users, ["name", "email", "posts.title"], searchQuery);

// 3. Execute
const results = await db.select().from(schema.users).where(searchAST);

Zero hardcoded joins. Zero duplicated rows.

If the product manager says, "Hey, can we also search by the user's company name?", you don't touch the backend API. You don't write another LEFT JOIN. You just update the definition array:

.search("name", "email", "posts.title", "company.name")

The engine resolves the relation, builds the subquery, and executes perfectly.

🛑 Stop Hardcoding Your Backends

If you are building SaaS platforms, internal tools, or SDKs, this is the architecture you need to adopt.

Writing one-off SQL queries for every dashboard table is a massive waste of human capital. Build the engine once, and let it compile your logic for you.

If you want to see how this EXISTS subquery algorithm is recursively implemented for infinitely deep relations (e.g., posts.comments.author.name), I open-sourced the entire engine core:

👉 Study the Search Compiler Architecture in TableCraft (GitHub) (jacksonkasi1/TableCraft)

🔄 What's Next in the Series?

We have successfully built the backend engine: It parses URLs, compiles dynamic relational filters, handles cursor pagination, and executes cross-relational global searches.

But a backend API is useless without a frontend.

How do you make the React frontend (TanStack Table) automatically know what columns exist, what filters are allowed, and how to render the UI without writing duplicate TypeScript interfaces?

In Part 5, we will dive into The Metadata Builder Protocol—how to generate a semantic JSON schema from the Drizzle backend so your frontend builds itself.

Hit the Follow button to catch the next masterclass.

Question for the builders: Do you prefer EXISTS subqueries or LEFT JOIN + GROUP BY when handling 1-to-many searches? Let's argue about query planners in the comments.

DEV Community