- I blocked `/jobs?*` pages from getting indexed.
- I kept filters usable for users (and shareable).
- I generated canonicals + `noindex` consistently in App Router.
- I wrote a tiny script to catch regressions before deploy.
Context
I’m building a job board for Psychiatric Mental Health Nurse Practitioners. Next.js 14. App Router. Supabase Postgres.
The data’s big enough to hurt. 8,000+ active listings. 2,000+ companies. I scrape 200+ jobs daily. The UI needs filters. Location, remote, salary, posted date.
And Google loved my filters. Too much.
I ended up with thousands of crawlable URLs like /jobs?remote=true&state=CA&salaryMin=160000. Same content. Slightly different. Crawl budget drain. Index bloat. Brutal.
I didn’t want to kill filters. Users need them. I just needed Google to index the right pages.
1) I decide what’s indexable. Everything else is noindex.
I keep it simple.
- `/jobs` (no filters): indexable.
- `/jobs?remote=true` (one “primary” filter): indexable.
- Anything else: `noindex, follow`.
That last part matters. I still want Google to crawl job detail pages linked from filtered results.
I centralize the rule. One function. No guessing across components.
```ts
// lib/seo/indexing.ts
export type JobsSearchParams = Record<string, string | string[] | undefined>;

const INDEXABLE_KEYS = new Set(["remote", "state", "q"]);

export function shouldIndexJobsListing(searchParams: JobsSearchParams): boolean {
  // No params => indexable
  const keys = Object.keys(searchParams).filter((k) => {
    const v = searchParams[k];
    if (v === undefined) return false;
    if (Array.isArray(v)) return v.length > 0;
    return String(v).length > 0;
  });

  if (keys.length === 0) return true;

  // Only one allowed key => indexable
  if (keys.length === 1 && INDEXABLE_KEYS.has(keys[0])) return true;

  return false;
}
```
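Concretely, that rule shakes out like this:

```ts
shouldIndexJobsListing({});                              // true  (bare /jobs)
shouldIndexJobsListing({ remote: "true" });              // true  (one allowed filter)
shouldIndexJobsListing({ remote: "true", state: "CA" }); // false (filter combo)
shouldIndexJobsListing({ salaryMin: "160000" });         // false (key not in the allowlist)
```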
I spent 2 hours trying to make “smart rules.”
Like “index salary filters but only if salaryMin >= 140000.”
Don’t.
You’ll ship a bug. And you’ll forget it exists.
2) I generate canonical URLs that ignore junk params
The canonical is the other half.
Even with noindex, canonicals keep things tidy. And they prevent weird duplication when Google ignores your noindex for a while.
I canonicalize /jobs pages to a clean version:
- Keep only the one indexable key.
- Drop everything else.
- Sort params so the URL is stable.
```ts
// lib/seo/canonical.ts
import type { JobsSearchParams } from "./indexing";

const CANONICAL_ALLOWLIST = new Set(["remote", "state", "q"]);

export function canonicalJobsUrl(baseUrl: string, searchParams: JobsSearchParams): string {
  const url = new URL("/jobs", baseUrl);
  const entries: Array<[string, string]> = [];

  for (const [k, v] of Object.entries(searchParams)) {
    if (!CANONICAL_ALLOWLIST.has(k)) continue;
    if (v === undefined) continue;
    // Next.js can give string | string[]
    const value = Array.isArray(v) ? v[0] : v;
    if (!value) continue;
    entries.push([k, value]);
  }

  // Only keep the first canonical param.
  // This matches my indexing rule: 0 or 1 param.
  entries.sort(([a], [b]) => a.localeCompare(b));
  const first = entries[0];
  if (first) url.searchParams.set(first[0], first[1]);

  return url.toString();
}
```
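So a messy filter URL collapses to one stable canonical (example.com standing in for my domain):

```ts
canonicalJobsUrl("https://example.com", {
  remote: "true",
  state: "CA",
  salaryMin: "160000",
});
// => "https://example.com/jobs?remote=true"
```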
One thing that bit me — q (search query) can explode into infinite combinations.
I still allow it as indexable for one term. Because PMHNP queries are usually specific.
If your site gets spammed with q=asdf123, don’t index q. Period.
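If I ever flip that switch, it's one line in each module, and both allowlists have to move together (sketch of the edit):

```ts
// lib/seo/indexing.ts: drop "q" if search queries get spammy
const INDEXABLE_KEYS = new Set(["remote", "state"]);

// lib/seo/canonical.ts: keep this in sync with the indexing rule
const CANONICAL_ALLOWLIST = new Set(["remote", "state"]);
```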
3) I wire it into Next.js metadata. No custom `<head>` hacks.
App Router makes this clean.
I compute robots and alternates.canonical in generateMetadata.
And I do it per request. Using searchParams.
```ts
// app/jobs/page.tsx
import type { Metadata } from "next";
import { shouldIndexJobsListing } from "@/lib/seo/indexing";
import { canonicalJobsUrl } from "@/lib/seo/canonical";

export async function generateMetadata(
  props: { searchParams: Promise<Record<string, string | string[] | undefined>> }
): Promise<Metadata> {
  const searchParams = await props.searchParams;
  const baseUrl = process.env.NEXT_PUBLIC_SITE_URL!;

  const indexable = shouldIndexJobsListing(searchParams);
  const canonical = canonicalJobsUrl(baseUrl, searchParams);

  return {
    title: "PMHNP Jobs",
    alternates: { canonical },
    robots: indexable
      ? { index: true, follow: true }
      : { index: false, follow: true },
  };
}

export default async function JobsPage() {
  // ...normal page code
  return null;
}
```
Yes, I’m using NEXT_PUBLIC_SITE_URL.
I tried to infer the host from headers.
That got messy behind Vercel previews.
So I set it explicitly per environment.
- Preview: the `*.vercel.app` preview URL
- Prod: my real domain
No surprises.
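If you're paranoid about the env var going missing, a tiny guard works (minimal sketch, not wired into the pages above):

```ts
// lib/seo/site-url.ts (sketch): fail loudly if the env var is missing,
// and strip trailing slashes so canonicals stay stable.
export function getSiteUrl(): string {
  const url = process.env.NEXT_PUBLIC_SITE_URL;
  if (!url) throw new Error("NEXT_PUBLIC_SITE_URL is not set");
  return url.replace(/\/+$/, "");
}
```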
4) I add a robots.txt rule. Because crawlers are stubborn.
Metadata helps.
But crawlers still hammer filtered URLs. Especially if they find them in internal links.
So I also disallow query-string crawling for the listing route.
This doesn’t replace noindex. It reduces noise.
```ts
// app/robots.ts
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  const siteUrl = process.env.NEXT_PUBLIC_SITE_URL!;

  return {
    rules: [
      {
        userAgent: "*",
        allow: ["/"],
        disallow: [
          "/jobs?", // stop crawling filter combos
          "/api/", // obvious
        ],
      },
    ],
    sitemap: `${siteUrl}/sitemap.xml`,
  };
}
```
I know. robots.txt is advisory.
But it still cuts crawling.
And it’s cheap.
Also, don’t disallow your job detail pages.
I almost did /jobs/* out of frustration.
That would’ve been a full self-own.
5) I test it with a script. Not with “I’ll remember.”
This is where I usually mess up.
I change filter UI. Add a param. Forget to update canonical rules.
Two weeks later, GSC is screaming.
So I wrote a tiny Node script.
It fetches a handful of URLs and asserts:
- Filter combos => `noindex`
- Canonical is clean
```js
// scripts/check-jobs-indexing.mjs
const SITE = process.env.SITE_URL || "http://localhost:3000";

// [path, expected indexable?]
const cases = [
  ["/jobs", true],
  ["/jobs?remote=true", true],
  ["/jobs?remote=true&state=CA", false],
  ["/jobs?state=CA&salaryMin=160000", false],
  ["/jobs?q=pmhnp", true],
];

function assert(cond, msg) {
  if (!cond) throw new Error(msg);
}

for (const [path, indexable] of cases) {
  const url = `${SITE}${path}`;
  const res = await fetch(url, { redirect: "manual" });
  const html = await res.text();

  // Pull robots + canonical out of the rendered head.
  const robots = html.match(/<meta name="robots" content="([^"]*)"/i)?.[1] ?? "";
  const canonical = html.match(/<link rel="canonical" href="([^"]*)"/i)?.[1] ?? "";

  if (indexable) assert(!robots.includes("noindex"), `${path} should be indexable`);
  else assert(robots.includes("noindex"), `${path} should be noindex`);
  assert(!canonical.includes("salaryMin"), `${path} canonical should be clean`);

  console.log(`ok ${path}`);
}
```
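I run it against localhost before deploys. `SITE_URL` lets me point it at a preview build too.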