@astrojs/sitemap won't read updatedDate from MDX frontmatter on its own.
The integration's only built-in control is a single lastmod option applied to every entry, so the usual lastmod: new Date() setting stamps every URL in the generated sitemap.xml with the build time. Search engines and AI search read that as "everything was updated, all the time", which is worse than sending no freshness signal at all.
This post walks through the minimum implementation: walk MDX inside astro.config.mjs, build a path-to-date map, and feed it into serialize on @astrojs/sitemap. Paginated noindex pages get filtered out in the same pass.
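A quick way to see whether a site has this problem is to count the distinct lastmod dates in the generated file. The helper below is a minimal sketch: summariseLastmods is a name I made up for illustration, and the sample XML is invented, not real output.

```javascript
// Hypothetical helper: count how many sitemap entries share each lastmod day.
function summariseLastmods(xml) {
  const counts = new Map();
  for (const match of xml.matchAll(/<lastmod>([^<]+)<\/lastmod>/g)) {
    const day = match[1].slice(0, 10); // keep the YYYY-MM-DD part only
    counts.set(day, (counts.get(day) ?? 0) + 1);
  }
  return counts;
}

// Invented sample: three URLs, all stamped with the same build time.
const sample = `
<url><loc>https://example.com/</loc><lastmod>2026-05-25T03:00:00.000Z</lastmod></url>
<url><loc>https://example.com/blog/a/</loc><lastmod>2026-05-25T03:00:00.000Z</lastmod></url>
<url><loc>https://example.com/blog/b/</loc><lastmod>2026-05-25T03:00:00.000Z</lastmod></url>
`;

const summary = summariseLastmods(sample);
console.log(summary.size === 1 ? "every entry shares one date" : "dates vary");
```

If the map comes back with a single key on a site with years of posts, the freshness signal carries no information.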
What we're building
Three steps inside astro.config.mjs:
- Build a lastmod map: read every blog MDX with fs.readdir, extract updatedDate ?? pubDate, key by URL path
- Feed it into the sitemap: use serialize on @astrojs/sitemap to look up each URL and set item.lastmod
- Drop noindex paginated pages: use filter to skip /blog/<cat>/<N>/ so the sitemap doesn't contradict the meta robots tag
Step 1: build the lastmod map
Put an async function at the top of astro.config.mjs and await it into a constant:
import { readdir, readFile } from "node:fs/promises";
import { join } from "node:path";

async function buildBlogLastmodMap() {
  const map = new Map();
  for (const lang of ["en", "ja"]) {
    const dir = join(process.cwd(), "src", "content", "blog", lang);
    let files = [];
    try {
      files = await readdir(dir);
    } catch {
      continue;
    }
    for (const file of files) {
      if (!file.endsWith(".mdx") && !file.endsWith(".md")) continue;
      const slug = file.replace(/\.(mdx|md)$/, "");
      const raw = await readFile(join(dir, file), "utf8");
      const fm = /^---\n([\s\S]*?)\n---/.exec(raw);
      if (!fm) continue;
      const front = fm[1];
      if (/^draft:\s*true/m.test(front)) continue; // drop drafts
      const updated = /^updatedDate:\s*(\S+)/m.exec(front);
      const pub = /^pubDate:\s*(\S+)/m.exec(front);
      const dateStr = (updated && updated[1]) || (pub && pub[1]);
      if (!dateStr) continue;
      const d = new Date(dateStr);
      if (Number.isNaN(d.getTime())) continue;
      const path = lang === "ja" ? `/ja/blog/${slug}/` : `/blog/${slug}/`;
      map.set(path, d.toISOString());
    }
  }
  return map;
}

const blogLastmod = await buildBlogLastmodMap();
Why not getCollection("blog")? Because astro.config.mjs is evaluated before the content loader is initialised — the Content Collections API isn't available yet.
The only fields the map needs are updatedDate and pubDate, so a light regex covers it. No YAML parser dependency for two fields.
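To make the regex step easier to test in isolation, here is the same per-file logic pulled out as a pure function that takes raw file text instead of reading from disk. extractLastmod is a hypothetical name and the sample posts are made up; the regexes are the ones from the function above.

```javascript
// Hypothetical helper: raw file text in, ISO timestamp (or null) out.
function extractLastmod(raw) {
  const fm = /^---\n([\s\S]*?)\n---/.exec(raw);
  if (!fm) return null; // no frontmatter block
  const front = fm[1];
  if (/^draft:\s*true/m.test(front)) return null; // drafts are skipped
  const updated = /^updatedDate:\s*(\S+)/m.exec(front);
  const pub = /^pubDate:\s*(\S+)/m.exec(front);
  const dateStr = updated?.[1] ?? pub?.[1]; // updatedDate wins over pubDate
  if (!dateStr) return null;
  const d = new Date(dateStr);
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}

// Made-up sample posts.
const revised = "---\ntitle: Demo\npubDate: 2026-01-10\nupdatedDate: 2026-05-25\n---\nBody";
const original = "---\ntitle: Demo\npubDate: 2026-01-10\n---\nBody";
const draft = "---\ndraft: true\npubDate: 2026-01-10\n---\nBody";

console.log(extractLastmod(revised));  // 2026-05-25T00:00:00.000Z
console.log(extractLastmod(original)); // 2026-01-10T00:00:00.000Z
console.log(extractLastmod(draft));    // null
```

Date-only strings in the YYYY-MM-DD form are parsed as UTC midnight, which is why the output is stable across build machines in different time zones.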
Step 2: feed it into serialize
@astrojs/sitemap exposes a serialize hook that lets you rewrite each emitted URL entry:
import { defineConfig } from "astro/config";
import sitemap from "@astrojs/sitemap";

export default defineConfig({
  // ...
  integrations: [
    sitemap({
      i18n: {
        defaultLocale: "en",
        locales: { en: "en", ja: "ja" },
      },
      serialize(item) {
        const url = new URL(item.url);
        // Strip /ja/ for the branch decision so both EN and JA hit
        // the same changefreq / priority. lastmod lookup uses the
        // original pathname because the map keys keep /ja/.
        const pathname = url.pathname.replace(/^\/ja\//, "/").replace(/^\/ja$/, "/");
        if (pathname === "/") {
          item.changefreq = "daily";
          item.priority = 1.0;
        } else if (pathname.startsWith("/blog/")) {
          item.changefreq = "monthly";
          item.priority = 0.7;
          const lastmod = blogLastmod.get(url.pathname);
          if (lastmod) item.lastmod = lastmod;
        } else {
          item.changefreq = "monthly";
          item.priority = 0.5;
        }
        return item;
      },
    }),
  ],
});
changefreq and priority are set in the same hook so each path category stays consistent. Google documents that it ignores both values; other crawlers may still read them, so keeping them consistent is the cheap default.
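If the hook grows, one option is to move the branch decision into a pure function so it can be checked without running a build. This is a sketch rather than what the config above does verbatim: classifyPath and serializeWith are hypothetical names, and the lookup here runs for every path, which is equivalent only because the map holds blog keys alone.

```javascript
// Hypothetical helper: the changefreq / priority decision as a pure function.
function classifyPath(pathname) {
  // Fold /ja/... onto the English path so both locales share one rule.
  const normalised = pathname.replace(/^\/ja\//, "/").replace(/^\/ja$/, "/");
  if (normalised === "/") return { changefreq: "daily", priority: 1.0 };
  if (normalised.startsWith("/blog/")) return { changefreq: "monthly", priority: 0.7 };
  return { changefreq: "monthly", priority: 0.5 };
}

// Hypothetical wrapper: serialize would delegate to this with the Step 1 map.
function serializeWith(lastmodMap, item) {
  const { pathname } = new URL(item.url);
  Object.assign(item, classifyPath(pathname));
  // Look up with the original pathname, because map keys keep the /ja/ prefix.
  const lastmod = lastmodMap.get(pathname);
  if (lastmod) item.lastmod = lastmod;
  return item;
}

// Made-up data to exercise it.
const demoMap = new Map([["/ja/blog/hello/", "2026-05-25T00:00:00.000Z"]]);
const entry = serializeWith(demoMap, { url: "https://example.com/ja/blog/hello/" });
console.log(entry.priority, entry.lastmod); // 0.7 2026-05-25T00:00:00.000Z
```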
Step 3: filter paginated noindex pages
If your blog category pages return <meta name="robots" content="noindex, follow"> from page 2 onwards (only page 1 is index-eligible), shipping page 2+ URLs in the sitemap is a contradiction.
"Listed in sitemap" reads as "please index this". "Meta robots: noindex" reads as "don't index this". Both at once is treated as a quality smell by Google and Bing.
sitemap({
  filter: (page) => {
    if (page.endsWith("/404/") || page.endsWith("/404")) return false;
    // Paginated category pages (/blog/build/2/, /blog/reviews/3/ ...)
    // are noindex, so drop them to keep the sitemap from conflicting
    // with the meta robots tag.
    if (/\/blog\/(build|reviews)\/\d+\/?$/.test(new URL(page).pathname)) return false;
    return true;
  },
  // ...
});
The noindex decision and the sitemap filter are two halves of one change.
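The regex above hardcodes the two category names. A sketch of a variant that builds the pattern from a list, so adding a category is a one-word change, could look like this. makeSitemapFilter is a hypothetical name, and it assumes category slugs are plain words with no regex metacharacters.

```javascript
// Hypothetical factory: builds a sitemap filter for /blog/<cat>/<N>/ pages.
// Assumes each category slug is a plain word (no regex metacharacters).
function makeSitemapFilter(categories) {
  // Left unanchored at the start so /ja/blog/<cat>/<N>/ matches as well.
  const paginated = new RegExp(`/blog/(${categories.join("|")})/\\d+/?$`);
  return (page) => {
    const { pathname } = new URL(page);
    if (/\/404\/?$/.test(pathname)) return false; // error page
    if (paginated.test(pathname)) return false;   // noindex page 2+
    return true;
  };
}

const keep = makeSitemapFilter(["build", "reviews"]);
console.log(keep("https://example.com/blog/build/"));   // true  (page 1 stays)
console.log(keep("https://example.com/blog/build/2/")); // false (page 2 dropped)
```

The returned function has the same shape as the filter callback, so it can be passed as filter: keep.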
Pitfalls
A short list of things that almost broke:
- Top-level await: works because astro.config.mjs is evaluated as ESM. .cjs configs won't accept it
- draft: true filtering: skipping drafts keeps the map limited to published posts. It does not decide which URLs are emitted, because the map is only consulted for lastmod; drafts stay out of the sitemap only if their pages are not built in the first place
- Regex tightness: /^updatedDate:\s*(\S+)/m reads updatedDate: 2026-05-25. For a quoted value, \S+ captures "2026-05-25" with the quotes included, so parsing then depends on the engine's implementation-defined fallback for non-ISO strings in new Date(); strip the quotes first if your frontmatter uses them. Multi-line YAML values won't survive
- Language-folder merge: en and ja are walked separately and joined into one map. Keys stay distinct (/blog/<slug>/ vs /ja/blog/<slug>/) so lookups during serialize resolve
- updatedDate policy: the implementation only works if updatedDate is updated honestly. Bumping it for trivial edits poisons the signal, so pair this with an "only update on substantive revision" rule
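For the regex pitfall, a slightly stricter reader removes the dependence on how the engine treats quoted strings. This is a sketch under my own assumptions, not the code used above: readFrontmatterDate is a hypothetical name, the key is assumed to be a plain identifier, and only ISO-style values are accepted.

```javascript
// Hypothetical stricter reader: accepts bare or quoted ISO dates, rejects the rest.
function readFrontmatterDate(front, key) {
  // Skip an optional opening quote, then capture up to a quote, space or comment.
  const line = new RegExp(`^${key}:\\s*["']?([^"'\\s#]+)`, "m").exec(front);
  if (!line) return null;
  const value = line[1];
  // Accept YYYY-MM-DD, optionally followed by a time and offset.
  if (!/^\d{4}-\d{2}-\d{2}(T[\d:.]+(Z|[+-]\d{2}:\d{2})?)?$/.test(value)) return null;
  const d = new Date(value);
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}

// Made-up frontmatter covering the quoted, bare and unsupported cases.
const quoted = "pubDate: \"2026-01-10\"\nupdatedDate: '2026-05-25'";
console.log(readFrontmatterDate(quoted, "updatedDate"));                  // 2026-05-25T00:00:00.000Z
console.log(readFrontmatterDate("updatedDate: May 25, 2026", "updatedDate")); // null
console.log(readFrontmatterDate("updatedDate: >\n  2026-05-25", "updatedDate")); // null
```

Returning null for anything unexpected means a malformed date leaves the entry without lastmod instead of emitting a wrong one.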
The longer write-up on the Aulvem site covers the updatedDate policy, alternatives I rejected, and the threshold for moving this code back into the official integration → Reading sitemap lastmod from MDX frontmatter — customising Astro's sitemap integration