- I stopped Google indexing expired job pages. Without deleting them.
- I used Next.js 14 + Supabase to return 410/404 correctly.
- I added a tiny robots guardrail to prevent “helpful” bugs.
- I shipped a sitemap that only includes active listings.
## Context
I’m building a job board for Psychiatric Mental Health Nurse Practitioners.
It has 8,000+ active listings and ~2,000 companies. New jobs arrive daily (200+ scraped/day). Old jobs expire daily too.
My first version treated “expired” as a UI state. The route still returned 200 OK. Google loved that. Brutal.
I ended up with thousands of low-value pages indexed. Many were “Apply closed” pages. Some were thin because the source removed details.
The fix wasn’t “better SEO.” It was HTTP correctness. And a sitemap that doesn’t lie.
## 1) I model job lifecycle in Postgres. Not in UI
I needed a single truth: is this job indexable?
So I made it explicit in the table. Status. Timestamps. And an immutable `first_seen_at` so my scraper doesn’t rewrite history.
Here’s the migration I used (Supabase Postgres).
```sql
-- 001_jobs_lifecycle.sql
-- Jobs can be active, expired, or removed.
-- removed = source page is gone, or legal takedown, etc.
-- expired = time-based expiration or "closed" detected
create type job_status as enum ('active', 'expired', 'removed');

alter table public.jobs
  add column if not exists status job_status not null default 'active',
  add column if not exists first_seen_at timestamptz not null default now(),
  add column if not exists last_seen_at timestamptz not null default now(),
  add column if not exists expires_at timestamptz,
  add column if not exists removed_at timestamptz;

-- Helps query "what's indexable" fast.
create index if not exists jobs_status_expires_at_idx
  on public.jobs (status, expires_at);

create index if not exists jobs_last_seen_at_idx
  on public.jobs (last_seen_at desc);
```
One thing that bit me: I originally used a boolean `is_active`.
Then I needed “removed” vs “expired” behavior. Different HTTP codes. Different sitemap behavior.
Enums saved me from the boolean mess.
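The same lesson applies on the TypeScript side: the three states become a string union, and an exhaustiveness check makes the compiler complain if a fourth status ever appears. A sketch (hypothetical helper, not from my actual code):

```typescript
type JobStatus = "active" | "expired" | "removed";

// Only active jobs are indexable. The `never` assignment in the default
// branch turns into a compile error if JobStatus ever grows a new member
// that this switch doesn't handle.
function isIndexable(status: JobStatus): boolean {
  switch (status) {
    case "active":
      return true;
    case "expired":
    case "removed":
      return false;
    default: {
      const unreachable: never = status;
      throw new Error(`unhandled status: ${unreachable}`);
    }
  }
}
```

A boolean can answer "indexable?", but it can't tell you *which* non-indexable behavior (410 vs 404) a page needs.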
## 2) I return 410 for expired. 404 for removed
This is the whole point.
If a job is expired, I want it to disappear from the index fast. `410 Gone` is explicit.
If the job was removed (source vanished, or I removed it), I return `404`.
Next.js `notFound()` always returns 404. That’s not enough.
So I return a NextResponse with the right status.
```tsx
// app/jobs/[slug]/page.tsx
import { createClient } from "@supabase/supabase-js";
import { notFound } from "next/navigation";
import { NextResponse } from "next/server";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // server only
);

type JobRow = {
  id: string;
  slug: string;
  title: string;
  status: "active" | "expired" | "removed";
  expires_at: string | null;
};

export default async function JobPage({
  params,
}: {
  params: Promise<{ slug: string }>;
}) {
  const { slug } = await params;

  const { data: job, error } = await supabase
    .from("jobs")
    .select("id, slug, title, status, expires_at")
    .eq("slug", slug)
    .maybeSingle();

  if (error) throw error;
  if (!job) return notFound();

  // Removed: behave like a missing resource.
  if (job.status === "removed") return notFound();

  // Expired: return 410. Still render a human page.
  if (job.status === "expired") {
    return new NextResponse(
      `<!doctype html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>${escapeHtml(job.title)} (Closed)</title>
    <meta name="robots" content="noindex, nofollow" />
  </head>
  <body>
    <h1>Job closed</h1>
    <p>This listing is no longer accepting applications.</p>
  </body>
</html>`,
      {
        status: 410,
        headers: {
          "content-type": "text/html; charset=utf-8",
          "cache-control": "public, max-age=300",
        },
      }
    );
  }

  // Active: normal render.
  return (
    <main>
      <h1>{job.title}</h1>
      {/* rest of the job page */}
    </main>
  );
}

function escapeHtml(input: string) {
  return input
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;")
    .replaceAll("'", "&#39;");
}
```
Yeah. It’s weird returning raw HTML in a page.
I tried to do it “the React way” for 410. Spent 4 hours. Most of it was wrong.
In App Router, you don’t get an easy “render UI but with status 410” primitive. So I went direct.
And I keep it intentionally minimal. No indexing. No internal links. Just a polite dead-end.
## 3) I enforce `noindex` for anything not active
Status codes are good.
But bugs happen. Scrapers misclassify. Admin toggles go wrong. Somebody changes code and accidentally returns 200 for expired.
So I added a second guardrail: robots meta.
If a job isn’t active, it gets `noindex, nofollow`.
```tsx
// app/jobs/[slug]/page.tsx
// (generateMetadata must be exported from the page or layout file itself,
// so this lives alongside the page component above.)
import type { Metadata } from "next";
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

type JobMetaRow = {
  title: string;
  status: "active" | "expired" | "removed";
};

export async function generateMetadata({
  params,
}: {
  params: Promise<{ slug: string }>;
}): Promise<Metadata> {
  const { slug } = await params;

  const { data: job } = await supabase
    .from("jobs")
    .select("title, status")
    .eq("slug", slug)
    .maybeSingle();

  if (!job) return { title: "Not found" };

  const isIndexable = job.status === "active";

  return {
    title: job.title,
    robots: isIndexable
      ? { index: true, follow: true }
      : { index: false, follow: false },
  };
}
```
Two layers now.

- Active: `200` + `index, follow`
- Expired: `410` + `noindex, nofollow`
- Removed: `404`

I sleep better.
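The whole contract fits in one small lookup. This is a sketch of the mapping, not code from the app (the robots value for removed follows from the metadata guardrail, since removed is not active):

```typescript
type JobStatus = "active" | "expired" | "removed";

// One place that states the full status -> HTTP + robots contract.
function behaviorFor(status: JobStatus): { http: number; robots: string } {
  switch (status) {
    case "active":
      return { http: 200, robots: "index, follow" };
    case "expired":
      return { http: 410, robots: "noindex, nofollow" };
    case "removed":
      return { http: 404, robots: "noindex, nofollow" };
  }
}
```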
## 4) I generate a sitemap from the DB. Only active jobs
My original sitemap was route-based.
`/jobs/[slug]` exists, so it got listed. That’s how I fed crawlers a buffet of expired pages.
Now I generate the sitemap from Postgres.
Only active jobs. And I cap it. Because 8,000 URLs is fine, but it keeps growing.
```ts
// app/sitemap.ts
import type { MetadataRoute } from "next";
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const siteUrl = process.env.NEXT_PUBLIC_SITE_URL!; // e.g. https://pmhnp.example

  // Only active jobs get into the sitemap.
  // I keep it under 45,000 to stay safely below the 50,000-URL limit.
  const { data: jobs, error } = await supabase
    .from("jobs")
    .select("slug, last_seen_at")
    .eq("status", "active")
    .order("last_seen_at", { ascending: false })
    .limit(45000);

  if (error) throw error;

  const urls: MetadataRoute.Sitemap = (jobs ?? []).map((j) => ({
    url: `${siteUrl}/jobs/${j.slug}`,
    lastModified: new Date(j.last_seen_at),
    changeFrequency: "daily",
    priority: 0.6,
  }));

  // Add a couple of stable pages.
  urls.push(
    {
      url: `${siteUrl}/`,
      lastModified: new Date(),
      changeFrequency: "daily",
      priority: 1,
    },
    {
      url: `${siteUrl}/jobs`,
      lastModified: new Date(),
      changeFrequency: "hourly",
      priority: 0.8,
    }
  );

  return urls;
}
```
This also forced me to keep status accurate.
If my scraper forgets to expire something, it stays in the sitemap. That’s a loud failure. Good.
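One thing I’m watching: a single sitemap file tops out at 50,000 URLs. Next.js can shard via `generateSitemaps()`, and the shard math is tiny. A sketch, assuming a 40,000-URL page size (my number, chosen for headroom, not a spec limit):

```typescript
// Stay under the 50,000-URL-per-file sitemap limit with some headroom.
const PAGE_SIZE = 40_000;

// Returns the { id } array that Next.js's generateSitemaps() expects,
// one entry per shard of the active-job set.
function shardIds(totalActiveJobs: number): { id: number }[] {
  const shards = Math.max(1, Math.ceil(totalActiveJobs / PAGE_SIZE));
  return Array.from({ length: shards }, (_, id) => ({ id }));
}
```

Each shard’s query then pages with `.range(id * PAGE_SIZE, (id + 1) * PAGE_SIZE - 1)` on the same `status = 'active'` filter.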
## 5) I expire jobs automatically with one SQL update
I didn’t want a complicated workflow.
My scraper sets `last_seen_at = now()` whenever it still sees the job. If a job hasn’t been seen in 30 days, it’s probably closed.
So once per day, I run one SQL statement.
This is intentionally boring.
```sql
-- 002_expire_jobs.sql
-- Expire jobs not seen in 30 days.
-- Keeps the "active" set clean for sitemap + indexing.
update public.jobs
set
  status = 'expired',
  expires_at = coalesce(expires_at, now())
where
  status = 'active'
  and last_seen_at < now() - interval '30 days';
```
In Supabase, I run this via a scheduled job (pg_cron if you have it, or a Vercel Cron hitting an API route that executes the SQL).
I used Vercel Cron.
Because I already have it. And I didn’t want to fight extension permissions.
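For reference, the Vercel Cron schedule is a few lines of `vercel.json`. A sketch: the route path and the 3 a.m. UTC time are my assumptions, and the route itself just runs the UPDATE above with the service-role key.

```json
{
  "crons": [
    {
      "path": "/api/cron/expire-jobs",
      "schedule": "0 3 * * *"
    }
  ]
}
```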
## Results
After this change, my sitemap dropped from 12,418 URLs to 8,091 URLs. That’s just “active” jobs now.
I also stopped serving 200 for closed listings. Expired jobs now return 410 consistently. Removed jobs return 404.
In Google Search Console, “Indexed, not submitted in sitemap” for job URLs went down from 3,204 to 611 over 10 days. Not zero yet, but it’s finally moving the right direction.
And, per the crawl stats, Googlebot stopped wasting time on dead pages.
## Key takeaways
- Don’t encode lifecycle in UI state. Put it in Postgres.
- Use `410 Gone` for expired content. It clears faster than a soft 404.
- Add `noindex` as a second guardrail. Bugs happen.
- Generate sitemaps from your database, not your route tree.
- One daily SQL update beats a “smart” expiration pipeline.
## Closing
If you’re running a job board (or any scraped directory), how are you deciding between 404 and 410?
Do you return 410 immediately on “closed”, or do you keep a grace period so the page can still help users for a week?