- I blocked `/jobs?*` pages from getting indexed.
- I kept filters usable for users (and shareable).
- I generated canonicals + `noindex` consistently in App Router.
- I wrote a tiny script to catch regressions before deploy.
Context
I’m building a job board for Psychiatric Mental Health Nurse Practitioners. Next.js 14. App Router. Supabase Postgres.
The data’s big enough to hurt. 8,000+ active listings. 2,000+ companies. I scrape 200+ jobs daily. The UI needs filters. Location, remote, salary, posted date.
And Google loved my filters. Too much.
I ended up with thousands of crawlable URLs like /jobs?remote=true&state=CA&salaryMin=160000. Same content. Slightly different. Crawl budget drain. Index bloat. Brutal.
I didn’t want to kill filters. Users need them. I just needed Google to index the right pages.
1) I decide what’s indexable. Everything else is noindex.
I keep it simple.
- `/jobs` (no filters): indexable.
- `/jobs?remote=true` (one “primary” filter): indexable.
- Anything else: `noindex, follow`.
That last part matters. I still want Google to crawl job detail pages linked from filtered results.
I centralize the rule. One function. No guessing across components.
```ts
// lib/seo/indexing.ts
export type JobsSearchParams = Record<string, string | string[] | undefined>;

const INDEXABLE_KEYS = new Set(["remote", "state", "q"]);

export function shouldIndexJobsListing(searchParams: JobsSearchParams): boolean {
  // No params => indexable
  const keys = Object.keys(searchParams).filter((k) => {
    const v = searchParams[k];
    if (v === undefined) return false;
    if (Array.isArray(v)) return v.length > 0;
    return String(v).length > 0;
  });

  if (keys.length === 0) return true;

  // Only one allowed key => indexable
  if (keys.length === 1 && INDEXABLE_KEYS.has(keys[0])) return true;

  return false;
}
```
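Concretely, that rule shakes out like this:

```ts
shouldIndexJobsListing({});                              // true  (bare /jobs)
shouldIndexJobsListing({ remote: "true" });              // true  (one allowed filter)
shouldIndexJobsListing({ remote: "true", state: "CA" }); // false (filter combo)
shouldIndexJobsListing({ salaryMin: "160000" });         // false (key not in the allowlist)
```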
I spent 2 hours trying to make “smart rules.”
Like “index salary filters but only if salaryMin >= 140000.”
Don’t.
You’ll ship a bug. And you’ll forget it exists.
2) I generate canonical URLs that ignore junk params
The canonical is the other half.
Even with noindex, canonicals keep things tidy. And they prevent weird duplication when Google ignores your noindex for a while.
I canonicalize /jobs pages to a clean version:
- Keep only the one indexable key.
- Drop everything else.
- Sort params so the URL is stable.
```ts
// lib/seo/canonical.ts
import type { JobsSearchParams } from "./indexing";

const CANONICAL_ALLOWLIST = new Set(["remote", "state", "q"]);

export function canonicalJobsUrl(baseUrl: string, searchParams: JobsSearchParams): string {
  const url = new URL("/jobs", baseUrl);
  const entries: Array<[string, string]> = [];

  for (const [k, v] of Object.entries(searchParams)) {
    if (!CANONICAL_ALLOWLIST.has(k)) continue;
    if (v === undefined) continue;
    // Next.js can give string | string[]
    const value = Array.isArray(v) ? v[0] : v;
    if (!value) continue;
    entries.push([k, value]);
  }

  // Only keep the first canonical param.
  // This matches my indexing rule: 0 or 1 param.
  entries.sort(([a], [b]) => a.localeCompare(b));
  const first = entries[0];
  if (first) url.searchParams.set(first[0], first[1]);

  return url.toString();
}
```
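So a messy filter URL collapses to one stable canonical (example.com standing in for my domain):

```ts
canonicalJobsUrl("https://example.com", {
  remote: "true",
  state: "CA",
  salaryMin: "160000",
});
// => "https://example.com/jobs?remote=true"
```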
One thing that bit me — q (search query) can explode into infinite combinations.
I still allow it as indexable for one term. Because PMHNP queries are usually specific.
If your site gets spammed with q=asdf123, don’t index q. Period.
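If I ever flip that switch, it's one line in each module, and both allowlists have to move together (sketch of the edit):

```ts
// lib/seo/indexing.ts: drop "q" if search queries get spammy
const INDEXABLE_KEYS = new Set(["remote", "state"]);

// lib/seo/canonical.ts: keep this in sync with the indexing rule
const CANONICAL_ALLOWLIST = new Set(["remote", "state"]);
```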
3) I wire it into Next.js metadata. No custom `<head>` hacks.
App Router makes this clean.
I compute robots and alternates.canonical in generateMetadata.
And I do it per request. Using searchParams.
```ts
// app/jobs/page.tsx
import type { Metadata } from "next";
import { shouldIndexJobsListing } from "@/lib/seo/indexing";
import { canonicalJobsUrl } from "@/lib/seo/canonical";

export async function generateMetadata(
  props: { searchParams: Promise<Record<string, string | string[] | undefined>> }
): Promise<Metadata> {
  const searchParams = await props.searchParams;
  const baseUrl = process.env.NEXT_PUBLIC_SITE_URL!;

  const indexable = shouldIndexJobsListing(searchParams);
  const canonical = canonicalJobsUrl(baseUrl, searchParams);

  return {
    title: "PMHNP Jobs",
    alternates: { canonical },
    robots: indexable
      ? { index: true, follow: true }
      : { index: false, follow: true },
  };
}

export default async function JobsPage() {
  // ...normal page code
  return null;
}
```
Yes, I’m using NEXT_PUBLIC_SITE_URL.
I tried to infer the host from headers.
That got messy behind Vercel previews.
So I set it explicitly per environment.
- Preview: the `*.vercel.app` preview URL
- Prod: my real domain
No surprises.
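If you're paranoid about the env var going missing, a tiny guard works (minimal sketch, not wired into the pages above):

```ts
// lib/seo/site-url.ts (sketch): fail loudly if the env var is missing,
// and strip trailing slashes so canonicals stay stable.
export function getSiteUrl(): string {
  const url = process.env.NEXT_PUBLIC_SITE_URL;
  if (!url) throw new Error("NEXT_PUBLIC_SITE_URL is not set");
  return url.replace(/\/+$/, "");
}
```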
4) I add a robots.txt rule. Because crawlers are stubborn.
Metadata helps.
But crawlers still hammer filtered URLs. Especially if they find them in internal links.
So I also disallow query-string crawling for the listing route.
This doesn’t replace noindex. It reduces noise.
```ts
// app/robots.ts
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  const siteUrl = process.env.NEXT_PUBLIC_SITE_URL!;

  return {
    rules: [
      {
        userAgent: "*",
        allow: ["/"],
        disallow: [
          "/jobs?", // stop crawling filter combos
          "/api/", // obvious
        ],
      },
    ],
    sitemap: `${siteUrl}/sitemap.xml`,
  };
}
```
I know. robots.txt is advisory.
But it still cuts crawling.
And it’s cheap.
Also, don’t disallow your job detail pages.
I almost did /jobs/* out of frustration.
That would’ve been a full self-own.
5) I test it with a script. Not with “I’ll remember.”
This is where I usually mess up.
I change filter UI. Add a param. Forget to update canonical rules.
Two weeks later, GSC is screaming.
So I wrote a tiny Node script.
It fetches a handful of URLs and asserts:
- Filter combos => `noindex`
- Canonical is clean
```js
// scripts/check-jobs-indexing.mjs
const SITE = process.env.SITE_URL || "http://localhost:3000";

// [path, expected indexable?]
const cases = [
  ["/jobs", true],
  ["/jobs?remote=true", true],
  ["/jobs?remote=true&state=CA", false],
  ["/jobs?state=CA&salaryMin=160000", false],
  ["/jobs?q=pmhnp", true],
];

function assert(cond, msg) {
  if (!cond) throw new Error(msg);
}

for (const [path, indexable] of cases) {
  const url = `${SITE}${path}`;
  const res = await fetch(url, { redirect: "manual" });
  const html = await res.text();

  // Pull robots + canonical out of the rendered head.
  const robots = html.match(/<meta name="robots" content="([^"]*)"/i)?.[1] ?? "";
  const canonical = html.match(/<link rel="canonical" href="([^"]*)"/i)?.[1] ?? "";

  if (indexable) assert(!robots.includes("noindex"), `${path} should be indexable`);
  else assert(robots.includes("noindex"), `${path} should be noindex`);
  assert(!canonical.includes("salaryMin"), `${path} canonical should be clean`);

  console.log(`ok ${path}`);
}
```
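I run it against localhost before deploys. `SITE_URL` lets me point it at a preview build too.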