DEV Community

Sathish

Next.js job board search: Postgres FTS + trgm

  • My job board has 8,000+ listings. Basic ILIKE died fast.
  • I combined Postgres full-text search with pg_trgm for typos.
  • I rank results with a single SQL function. No app-side scoring.
  • I ship it via Supabase RPC + a simple Next.js route.

Context

I’m building a job board for Psychiatric Mental Health Nurse Practitioners.

Search is the whole product.
If search feels dumb, users bounce.

I started with ILIKE '%term%'.
It worked for 200 listings.
Then I crossed ~8,000 active listings and it got brutal.
Slow queries. Weird ranking. Typos returning nothing.

I didn’t want Elasticsearch.
Not for this.
I already have Postgres (Supabase). So I leaned into it.

Goal: fast search, decent ranking, typo tolerance.
And something I can maintain solo.

1) I stopped doing ILIKE. It’s a trap.

ILIKE is fine for admin screens.
Not for user-facing search.

Two problems hit me.

First: performance.
ILIKE '%remote%' can’t use a normal btree index.
It scans.

Second: ranking.
Everything matches the same.
A job that mentions “Remote” once ranks the same as a job titled “Remote PMHNP” that repeats it everywhere — there’s no scoring at all.

So I moved to two signals:

  • Full-text search for “real” matching.
  • Trigram similarity for typos and partials.

Here’s the schema change I actually run.
Generated columns. Indexed. Boring. Reliable.

-- 1) Extensions for search
create extension if not exists pg_trgm;

-- (optional but common)
create extension if not exists unaccent;

-- 2) Columns used by search
alter table public.jobs
  add column if not exists search_tsv tsvector
  generated always as (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(company_name, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(location, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'D')
  ) stored;

-- 3) Indexes
create index if not exists jobs_search_tsv_gin
  on public.jobs using gin (search_tsv);

create index if not exists jobs_title_trgm_gin
  on public.jobs using gin (title gin_trgm_ops);

create index if not exists jobs_company_trgm_gin
  on public.jobs using gin (company_name gin_trgm_ops);

One thing that bit me — generated columns are only as good as your inputs.
I had a scraper bug that wrote empty company_name for 312 rows.
Search looked “random” until I fixed data quality.
Not glamorous.
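The fix was boring too: validate rows before they hit the table. This is a minimal sketch of the kind of guard I mean — `JobRow` and `validateJob` are hypothetical names, not code from my scraper.

```typescript
// Hypothetical scraper-side guard: reject rows with empty fields that
// feed the generated tsvector, instead of letting them poison search.
type JobRow = {
  title: string;
  company_name: string;
  location: string;
  description: string;
};

function validateJob(row: JobRow): string[] {
  const errors: string[] = [];
  // title (weight A) and company_name (weight B) matter most for ranking
  for (const field of ["title", "company_name"] as const) {
    if (!row[field]?.trim()) errors.push(`${field} is empty`);
  }
  return errors;
}
```

Anything `validateJob` flags goes to a dead-letter table instead of `public.jobs`.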

2) I rank with one SQL function. No client math.

I want one call.
Query in. Rows out.

Also: consistent ranking.
If I score in TypeScript, I’ll change it three times and forget why.
SQL keeps me honest.

This is the function.
It returns a score.
It blends:

  • ts_rank_cd for FTS ranking.
  • similarity() for typo-ish matching.

And I clamp it.
Because trigram scores can dominate when the query is short.
“ny” matched half the planet.

create or replace function public.search_jobs(
  q text,
  limit_count int default 25,
  offset_count int default 0
)
returns table (
  id uuid,
  title text,
  company_name text,
  location text,
  url text,
  posted_at timestamptz,
  score real
)
language sql
stable
as $$
  with params as (
    select
      websearch_to_tsquery('english', q) as tsq,
      greatest(length(trim(q)), 0) as qlen
  )
  select
    j.id,
    j.title,
    j.company_name,
    j.location,
    j.url,
    j.posted_at,
    (
      -- full-text score
      (ts_rank_cd(j.search_tsv, p.tsq) * 1.0)
      +
      -- trigram score (down-weighted for short queries)
      (
        least(
          greatest(similarity(j.title, q), similarity(j.company_name, q)),
          0.60
        )
        * case when p.qlen >= 5 then 0.6 else 0.2 end
      )
    )::real as score
  from public.jobs j
  cross join params p
  where
    -- must match either full-text or be "close enough"
    (j.search_tsv @@ p.tsq)
    or (similarity(j.title, q) > 0.25)
    or (similarity(j.company_name, q) > 0.25)
  order by score desc, j.posted_at desc
  limit limit_count
  offset offset_count;
$$;

Yes, those numbers are arbitrary.
I spent 4 hours tuning them.
Most of it was wrong.

What finally worked was testing with real searches:

  • “pmhnp remote”
  • “psych np telehealth”
  • “kaiser pmhnp”
  • “new yrok” (typo)

If you don’t test typos, you’re lying to yourself.
Users type like raccoons.

3) I call it via Supabase RPC. Clean boundary.

I don’t want raw SQL in my Next.js server code.
Not because it’s “bad”.
Because I’ll copy/paste it into 3 routes.
Then I’ll forget one.

RPC keeps it centralized.
So my app just does:

  • take query
  • call function
  • return JSON

This is the route handler.
App Router.
Server-only.

// app/api/search/route.ts
import { NextResponse } from "next/server";
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // server-only
);

export async function GET(req: Request) {
  const { searchParams } = new URL(req.url);

  const q = (searchParams.get("q") ?? "").trim();
  // guard against non-numeric params like ?limit=abc (Number() would yield NaN)
  const limitRaw = Number(searchParams.get("limit") ?? 25);
  const limit = Math.min(Number.isFinite(limitRaw) ? limitRaw : 25, 50);
  const offsetRaw = Number(searchParams.get("offset") ?? 0);
  const offset = Math.max(Number.isFinite(offsetRaw) ? offsetRaw : 0, 0);

  if (q.length < 2) {
    return NextResponse.json({ items: [], count: 0 });
  }

  const { data, error } = await supabase.rpc("search_jobs", {
    q,
    limit_count: limit,
    offset_count: offset,
  });

  if (error) {
    // I’ve seen: "canceling statement due to statement timeout"
    return NextResponse.json({ error: error.message }, { status: 500 });
  }

  return NextResponse.json({ items: data ?? [], count: data?.length ?? 0 });
}

Service role key is touchy.
Don’t leak it.
I keep this route server-side only.
No client components calling Supabase directly for search.
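On the client, components hit that route over HTTP. A minimal sketch of the caller — `buildSearchUrl` and `searchJobs` are hypothetical helper names, split so the query-string logic is testable without a running server:

```typescript
// Hypothetical client helper for the /api/search route above.
function buildSearchUrl(q: string, limit = 25, offset = 0): string {
  const params = new URLSearchParams({
    q: q.trim(),
    limit: String(limit),
    offset: String(offset),
  });
  return `/api/search?${params.toString()}`;
}

async function searchJobs(q: string): Promise<{ items: unknown[]; count: number }> {
  const res = await fetch(buildSearchUrl(q));
  if (!res.ok) throw new Error(`search failed: ${res.status}`);
  return res.json();
}
```

The Supabase client, and the service role key with it, never leaves the server.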

Also.
I set a statement timeout on Supabase.
Because one runaway query can eat your day.

4) I debug search with EXPLAIN ANALYZE. Every time.

Search “feels” slow.
Until you measure it.

I run the exact query shape.
Then I read the plan.
Not the vibes.

Here’s the kind of thing I run in the SQL editor.
Same function logic, simplified.

explain (analyze, buffers)
select id, title
from public.jobs
where
  search_tsv @@ websearch_to_tsquery('english', 'pmhnp remote')
  or similarity(title, 'pmhnp remote') > 0.25
order by
  (ts_rank_cd(search_tsv, websearch_to_tsquery('english', 'pmhnp remote'))
   + least(similarity(title, 'pmhnp remote'), 0.60) * 0.6) desc
limit 25;

If you don’t see your GIN indexes used, stop.
Fix that first.

One dumb mistake I made.
I originally used to_tsquery().
It blew up on user input like c++ or remote (west).
Error was:

ERROR: syntax error in tsquery: "remote (west)"

websearch_to_tsquery() fixed it.
It handles normal human text.
That’s what I needed.

Results

Before: ILIKE search regularly spiked to 900–1400ms on my Vercel API route when the query was common (like “remote”). It also returned “meh” ordering. Typos like “psychiatrc” returned zero rows.

After: Postgres FTS + trigram search stays under ~120–220ms for my top queries, returning 25 rows with stable ranking. Typos now return usable results (usually 10–25 rows) instead of nothing. On the data side, this setup handles my current volume: 8,000+ active listings across 2,000+ companies, with 200+ new jobs scraped daily.

Key takeaways

  • Drop ILIKE '%term%' for user search. It doesn’t scale.
  • Store a weighted tsvector column and GIN-index it. Don’t compute it per query.
  • Use websearch_to_tsquery() for real user input. to_tsquery() will betray you.
  • Add trigram similarity for typos, but cap and down-weight it for short queries.
  • Keep scoring in SQL (RPC). Your Next.js code stays tiny.

Closing

I’m still not fully happy with ranking for super-short queries like “ny” and “la”. It matches too much, even with down-weighting.

How are you handling 2–3 character queries in Postgres search—hard minimum length, prefix-only, or a separate “location autocomplete” table?
