Aulvem

Posted on May 29

Reading sitemap lastmod from MDX frontmatter in Astro

#astro #sitemap #seo #webdev

@astrojs/sitemap won't read updatedDate from MDX frontmatter on its own.

Without a custom serialize hook, the only date the integration can emit is its global lastmod option, which stamps every entry in the generated sitemap.xml with the same value, typically the build time. Search engines and AI search treat that as "everything was updated, all the time", which is worse than not setting a freshness signal at all.

This post walks through the minimum implementation: walk MDX inside astro.config.mjs, build a path-to-date map, and feed it into serialize on @astrojs/sitemap. Paginated noindex pages get filtered out in the same pass.

What we're building

Three steps inside astro.config.mjs:

Build a lastmod map: read every blog MDX with fs.readdir, extract updatedDate ?? pubDate, key by URL path
Feed it into the sitemap: use serialize on @astrojs/sitemap to look up each URL and set item.lastmod
Drop noindex paginated pages: use filter to skip /blog/<cat>/<N>/ so the sitemap doesn't contradict the meta robots tag

Step 1: build the lastmod map

Put an async function at the top of astro.config.mjs and await it into a constant:

import { readdir, readFile } from "node:fs/promises";
import { join } from "node:path";

async function buildBlogLastmodMap() {
  const map = new Map();
  for (const lang of ["en", "ja"]) {
    const dir = join(process.cwd(), "src", "content", "blog", lang);
    let files = [];
    try {
      files = await readdir(dir);
    } catch {
      continue;
    }
    for (const file of files) {
      if (!file.endsWith(".mdx") && !file.endsWith(".md")) continue;
      const slug = file.replace(/\.(mdx|md)$/, "");
      const raw = await readFile(join(dir, file), "utf8");
      // \r? keeps the match working on files saved with CRLF line endings
      const fm = /^---\r?\n([\s\S]*?)\r?\n---/.exec(raw);
      if (!fm) continue;
      const front = fm[1];
      if (/^draft:\s*true/m.test(front)) continue; // drop drafts
      const updated = /^updatedDate:\s*(\S+)/m.exec(front);
      const pub = /^pubDate:\s*(\S+)/m.exec(front);
      const dateStr = (updated && updated[1]) || (pub && pub[1]);
      if (!dateStr) continue;
      const d = new Date(dateStr);
      if (Number.isNaN(d.getTime())) continue;
      const path = lang === "ja" ? `/ja/blog/${slug}/` : `/blog/${slug}/`;
      map.set(path, d.toISOString());
    }
  }
  return map;
}

const blogLastmod = await buildBlogLastmodMap();

Why not getCollection("blog")? Because astro.config.mjs is evaluated before the content loader is initialised — the Content Collections API isn't available yet.

The only fields the map needs are updatedDate and pubDate, so a light regex covers it. No YAML parser dependency for two fields.
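The regex logic can also be pulled out into a pure function, which makes it testable without touching the filesystem. This is a minimal sketch assuming the same frontmatter shape as above; extractLastmod is a hypothetical name, not part of Astro or @astrojs/sitemap:

```javascript
// Hypothetical helper: returns an ISO timestamp, or null when the file
// should not contribute a lastmod (no frontmatter, draft, missing or
// invalid date).
function extractLastmod(raw) {
  // \r? also accepts CRLF line endings.
  const fm = /^---\r?\n([\s\S]*?)\r?\n---/.exec(raw);
  if (!fm) return null;
  const front = fm[1];
  if (/^draft:\s*true/m.test(front)) return null;
  const updated = /^updatedDate:\s*(\S+)/m.exec(front);
  const pub = /^pubDate:\s*(\S+)/m.exec(front);
  // updatedDate wins; pubDate is the fallback.
  const dateStr = (updated && updated[1]) || (pub && pub[1]);
  if (!dateStr) return null;
  const d = new Date(dateStr);
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}

const sample = [
  "---",
  "title: Example post",
  "pubDate: 2026-05-01",
  "updatedDate: 2026-05-25",
  "---",
  "Body text.",
].join("\n");

console.log(extractLastmod(sample)); // 2026-05-25T00:00:00.000Z
```

A date-only ISO string is parsed as UTC midnight, which is why the output carries no timezone drift.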

Step 2: feed it into serialize

@astrojs/sitemap exposes a serialize hook that lets you rewrite each emitted URL entry:

import { defineConfig } from "astro/config";
import sitemap from "@astrojs/sitemap";

export default defineConfig({
  // ...
  integrations: [
    sitemap({
      i18n: {
        defaultLocale: "en",
        locales: { en: "en", ja: "ja" },
      },
      serialize(item) {
        const url = new URL(item.url);
        // Strip /ja/ for the branch decision so both EN and JA hit
        // the same changefreq / priority. lastmod lookup uses the
        // original pathname because the map keys keep /ja/.
        const pathname = url.pathname.replace(/^\/ja\//, "/").replace(/^\/ja$/, "/");
        if (pathname === "/") {
          item.changefreq = "daily";
          item.priority = 1.0;
        } else if (pathname.startsWith("/blog/")) {
          item.changefreq = "monthly";
          item.priority = 0.7;
          const lastmod = blogLastmod.get(url.pathname);
          if (lastmod) item.lastmod = lastmod;
        } else {
          item.changefreq = "monthly";
          item.priority = 0.5;
        }
        return item;
      },
    }),
  ],
});

changefreq and priority are set in the same hook so each path category stays consistent. Google documents that it ignores both values these days, but other crawlers may still read them, so keeping them consistent is the cheap default.
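To check the branch logic without running a build, the body of serialize can be lifted into a plain function. The sketch below assumes the same map shape as Step 1; decorateItem, the sample map entries, and the example.com URLs are hypothetical:

```javascript
// Hypothetical stand-in for the map built in Step 1.
const blogLastmod = new Map([
  ["/blog/hello/", "2026-05-25T00:00:00.000Z"],
  ["/ja/blog/hello/", "2026-05-20T00:00:00.000Z"],
]);

// Same branching as the serialize hook, with the map passed in.
function decorateItem(item, lastmodMap) {
  const url = new URL(item.url);
  // The locale-stripped path decides changefreq / priority...
  const pathname = url.pathname.replace(/^\/ja\//, "/").replace(/^\/ja$/, "/");
  if (pathname === "/") {
    item.changefreq = "daily";
    item.priority = 1.0;
  } else if (pathname.startsWith("/blog/")) {
    item.changefreq = "monthly";
    item.priority = 0.7;
    // ...while the lastmod lookup keeps the locale prefix.
    const lastmod = lastmodMap.get(url.pathname);
    if (lastmod) item.lastmod = lastmod;
  } else {
    item.changefreq = "monthly";
    item.priority = 0.5;
  }
  return item;
}

const ja = decorateItem({ url: "https://example.com/ja/blog/hello/" }, blogLastmod);
console.log(ja.priority, ja.lastmod); // 0.7 2026-05-20T00:00:00.000Z
```

A blog URL that is missing from the map, such as a category index, keeps its changefreq and priority and simply gets no lastmod.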

Step 3: filter paginated noindex pages

If your blog category pages return <meta name="robots" content="noindex, follow"> from page 2 onwards (only page 1 is index-eligible), shipping page 2+ URLs in the sitemap is a contradiction.

"Listed in sitemap" reads as "please index this". "Meta robots: noindex" reads as "don't index this". Both at once is treated as a quality smell by Google and Bing.

sitemap({
  filter: (page) => {
    if (page.endsWith("/404/") || page.endsWith("/404")) return false;
    // Paginated category pages (/blog/build/2/, /blog/reviews/3/ ...)
    // are noindex — drop them so the sitemap doesn't conflict with
    // the meta robots tag.
    if (/\/blog\/(build|reviews)\/\d+\/?$/.test(new URL(page).pathname)) return false;
    return true;
  },
  // ...
});

The noindex decision and the sitemap filter are two halves of one change.
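The filter predicate is easy to exercise on its own before wiring it into the config. A sketch with the same two rules follows; isSitemapEligible and the example.com URLs are hypothetical, the category names (build, reviews) are the ones from the snippet above, and both rules are checked against the pathname rather than the full URL string:

```javascript
// The filter receives a full URL string for each page.
function isSitemapEligible(page) {
  const { pathname } = new URL(page);
  // Rule 1: never list the 404 page.
  if (pathname.endsWith("/404/") || pathname.endsWith("/404")) return false;
  // Rule 2: drop paginated category pages, which carry noindex.
  if (/\/blog\/(build|reviews)\/\d+\/?$/.test(pathname)) return false;
  return true;
}

const pages = [
  "https://example.com/blog/build/",       // kept: page 1 of a category
  "https://example.com/blog/build/2/",     // dropped: paginated
  "https://example.com/ja/blog/reviews/3", // dropped: paginated, no trailing slash
  "https://example.com/blog/my-post/",     // kept: regular post
  "https://example.com/404/",              // dropped: error page
];

for (const page of pages) {
  console.log(isSitemapEligible(page) ? "keep" : "drop", page);
}
```

Because the pattern is not anchored to the start of the path, the /ja/ variants are covered without a second rule.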

Pitfalls

A short list of things that almost broke:

Top-level await: works because astro.config.mjs is evaluated as ESM. .cjs configs won't accept it
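draft: true filtering: skipping drafts during map construction keeps the map limited to published posts. The map only supplies lastmod values, so drafts still have to be excluded from the build itself to stay out of the sitemap
Regex tightness: /^updatedDate:\s*(\S+)/m reads updatedDate: 2026-05-25. Quoted strings are captured whole, quotes included ("2026-05-25"), and then rely on new Date() tolerating the quotes, which is engine-specific behaviour for non-ISO input. Multi-line YAML values won't survive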
Language-folder merge: en and ja are walked separately and joined into one map. Keys stay distinct (/blog/<slug>/ vs /ja/blog/<slug>/) so lookups during serialize resolve
updatedDate policy: the implementation only works if updatedDate is updated honestly. Bumping it for trivial edits poisons the signal, so pair this with an "only update on substantive revision" rule
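Two of the regex edge cases can be closed with small changes. The following is a sketch rather than the implementation used in this post: readDateField is a hypothetical helper that limits the gap after the colon to spaces and tabs, so an empty updatedDate: cannot swallow the next line, and strips one pair of surrounding quotes before the value reaches new Date():

```javascript
// Hypothetical helper. `key` is assumed to be a plain identifier such
// as "updatedDate", so it is not regex-escaped here.
function readDateField(front, key) {
  // [ \t]* instead of \s* keeps the match on a single line.
  const m = new RegExp(`^${key}:[ \\t]*(\\S+)`, "m").exec(front);
  if (!m) return null;
  // Remove one matching pair of single or double quotes, if present.
  const value = m[1].replace(/^(["'])(.*)\1$/, "$2");
  const d = new Date(value);
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}

// updatedDate is present but empty; pubDate is quoted.
const front = ["updatedDate:", 'pubDate: "2026-05-01"'].join("\n");
const lastmod =
  readDateField(front, "updatedDate") ?? readDateField(front, "pubDate");

console.log(lastmod); // 2026-05-01T00:00:00.000Z
```

Multi-line YAML values are still out of scope; at that point a real YAML parser is the better trade.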

The longer write-up on the Aulvem site covers the updatedDate policy, alternatives I rejected, and the threshold for moving this code back into the official integration → Reading sitemap lastmod from MDX frontmatter — customising Astro's sitemap integration

DEV Community
