DEV Community

Aulvem

Catching content rule violations at build time with Astro Content Collections + Zod

If you run a Markdown-based blog long enough, the frontmatter starts accumulating rules. "A reviews post must carry an ad disclosure." "FAQ questions in JSON-LD must also appear in the body." Eventually a README checklist isn't enough, because you forget.

Astro Content Collections plus Zod lets you turn most of those rules into build failures: .refine() couples two fields, nested z.object schemas type your structured data, and violations are caught at astro build.

This post is the code-first version of the setup I use on aulvem.com. The longer version with operational notes is linked at the end.

Minimal setup: defineCollection + z.object

// src/content.config.ts
import { defineCollection, z } from "astro:content";
import { glob } from "astro/loaders";

const blog = defineCollection({
  loader: glob({
    pattern: "**/[^_]*.{md,mdx}",
    base: "./src/content/blog",
  }),
  schema: z.object({
    title: z.string(),
    description: z.string(),
    pubDate: z.coerce.date(),
    category: z.enum(["build", "reviews"]),
    tags: z.array(z.string()).default([]),
    draft: z.boolean().default(false),
    affiliate: z.boolean().default(false),
  }),
});

export const collections = { blog };

Four moves cover most of the surface:

  • z.enum pins the category to a fixed set — typos break the build
  • z.coerce.date reads 2026-05-23 as a Date
  • .default(false) lets you omit the field in the YAML frontmatter
  • z.array(z.string()) and other composites work as-is

This is straight out of the Astro 5 docs. The interesting work starts with .refine().
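The four moves can be checked outside Astro by running the same schema against plain objects. This is a minimal sketch, assuming the standalone zod package is installed (Astro re-exports the same z from astro:content); the sample frontmatter values are made up for illustration.

```typescript
// Assumption: "zod" is installed directly so the schema can run outside an Astro build.
import { z } from "zod";

// Same fields as the collection schema above.
const postSchema = z.object({
  title: z.string(),
  description: z.string(),
  pubDate: z.coerce.date(),
  category: z.enum(["build", "reviews"]),
  tags: z.array(z.string()).default([]),
  draft: z.boolean().default(false),
  affiliate: z.boolean().default(false),
});

// Hypothetical frontmatter with the date still a string and the defaulted fields omitted.
const ok = postSchema.safeParse({
  title: "Hello",
  description: "First post",
  pubDate: "2026-05-23",
  category: "build",
});

// "review" is not in the enum, so validation fails on `category`.
const typo = postSchema.safeParse({
  title: "Hello",
  description: "First post",
  pubDate: "2026-05-23",
  category: "review",
});

if (ok.success) {
  // pubDate is now a Date; tags, draft, and affiliate were filled from defaults.
  console.log(ok.data.pubDate instanceof Date, ok.data.tags, ok.data.draft, ok.data.affiliate);
}
if (!typo.success) {
  // Each issue carries the path of the offending field.
  console.log(typo.error.issues.map((issue) => issue.path.join(".")));
}
```

safeParse returns a result object instead of throwing, which makes it convenient for experiments; inside astro build the same failure surfaces as a build error.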

.refine() for "two fields must move together"

When two fields are coupled — change one, the other must follow — .refine() at the end of the schema is the right shape. Aulvem's case: category: reviews posts must have affiliate: true so the disclosure banner and rel="sponsored" injection both kick in.

const blog = defineCollection({
  loader: glob({ /* ... */ }),
  schema: z
    .object({
      title: z.string(),
      category: z.enum(["build", "reviews"]),
      affiliate: z.boolean().default(false),
      // ...
    })
    .refine((data) => (data.category === "reviews") === data.affiliate, {
      message: "affiliate must be true iff category is 'reviews'",
      path: ["affiliate"],
    }),
});

(category === "reviews") === affiliate reads as "these two are always equal". That is the negation of XOR (an XNOR), and it is easier to scan months later.
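To see which combinations the check accepts, a small truth table helps. The sketch below uses a hypothetical helper, affiliateMatchesCategory, that mirrors the predicate passed to .refine(); it needs no Zod at all.

```typescript
type Category = "build" | "reviews";

// Hypothetical helper mirroring the .refine() predicate: true when the post is valid.
function affiliateMatchesCategory(category: Category, affiliate: boolean): boolean {
  return (category === "reviews") === affiliate;
}

// All four combinations of the two fields.
const cases: Array<[Category, boolean]> = [
  ["build", false],
  ["build", true],
  ["reviews", false],
  ["reviews", true],
];

const truthTable = cases.map(([category, affiliate]) => ({
  category,
  affiliate,
  valid: affiliateMatchesCategory(category, affiliate),
}));

console.table(truthTable);
```

Only build/false and reviews/true are valid. Because the rule is an "iff", a build post that sets affiliate: true is rejected as well, not just a reviews post that forgot it.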

Build error from a reviews post that forgot affiliate: true:

[ContentEntryInvalidError] Content config error in `blog → 2026-05-...`:
affiliate must be true iff category is 'reviews'
  at affiliate

message lands in the output verbatim, so it's worth writing it as instructions for future-you.
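The same message and path can be inspected without a full build. A minimal sketch, assuming the standalone zod package and a made-up post title:

```typescript
// Assumption: "zod" is installed directly; Astro re-exports the same `z`.
import { z } from "zod";

const postSchema = z
  .object({
    title: z.string(),
    category: z.enum(["build", "reviews"]),
    affiliate: z.boolean().default(false),
  })
  .refine((data) => (data.category === "reviews") === data.affiliate, {
    message: "affiliate must be true iff category is 'reviews'",
    path: ["affiliate"],
  });

// A reviews post that forgot `affiliate: true` (the default fills in false).
const result = postSchema.safeParse({ title: "Keyboard review", category: "reviews" });

// Flatten the issues to the two parts that show up in the error output.
const issues = result.success
  ? []
  : result.error.issues.map((issue) => ({
      path: issue.path.join("."),
      message: issue.message,
    }));

console.log(issues);
```

The path option is what attaches the issue to affiliate rather than to the object as a whole.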

.refine vs .superRefine

When you need more than one independent constraint on an object — or per-field error messages — .superRefine is easier:

.superRefine((data, ctx) => {
  if (data.category === "reviews" && !data.affiliate) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: "reviews posts must set affiliate: true",
      path: ["affiliate"],
    });
  }
  if (data.draft && data.updatedDate) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: "draft posts should not carry updatedDate",
      path: ["updatedDate"],
    });
  }
})

For a single relationship between two fields, .refine() stays lighter.
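A runnable version of the .superRefine variant shows both issues being reported in one pass. This is a sketch under assumptions: the standalone zod package is installed, and updatedDate is declared as an optional coerced date, which the schema excerpts above do not show.

```typescript
// Assumption: "zod" is installed directly; Astro re-exports the same `z`.
import { z } from "zod";

const postSchema = z
  .object({
    category: z.enum(["build", "reviews"]),
    affiliate: z.boolean().default(false),
    draft: z.boolean().default(false),
    // Assumption: the real schema's updatedDate definition is not shown in the post.
    updatedDate: z.coerce.date().optional(),
  })
  .superRefine((data, ctx) => {
    if (data.category === "reviews" && !data.affiliate) {
      ctx.addIssue({
        code: "custom", // the string value of z.ZodIssueCode.custom
        message: "reviews posts must set affiliate: true",
        path: ["affiliate"],
      });
    }
    if (data.draft && data.updatedDate) {
      ctx.addIssue({
        code: "custom",
        message: "draft posts should not carry updatedDate",
        path: ["updatedDate"],
      });
    }
  });

// Breaks both rules at once.
const result = postSchema.safeParse({
  category: "reviews",
  draft: true,
  updatedDate: "2026-05-23",
});

const failedPaths = result.success
  ? []
  : result.error.issues.map((issue) => issue.path.join("."));

console.log(failedPaths);
```

Each addIssue call carries its own message and path, so one build run lists every broken rule instead of stopping at the first.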

Typed structured data in frontmatter

HowTo and FAQPage JSON-LD blocks pull their data from frontmatter rather than from parsed body text. The reasons:

  • Frontmatter is what Zod validates, so the shape is enforced for free
  • A heading rename doesn't quietly break JSON-LD
  • The JSON-LD generator can trust frontmatter without re-parsing MDX
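The last point can be sketched as a generator that maps validated frontmatter straight to a schema.org FAQPage object. buildFaqJsonLd and the sample entry are hypothetical; the only assumption about the frontmatter is that each faq entry has a question and an answer string.

```typescript
interface FaqEntry {
  question: string;
  answer: string;
}

// Hypothetical helper: builds a schema.org FAQPage object from frontmatter entries.
function buildFaqJsonLd(faq: FaqEntry[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faq.map(({ question, answer }) => ({
      "@type": "Question",
      name: question,
      acceptedAnswer: { "@type": "Answer", text: answer },
    })),
  };
}

const jsonLd = buildFaqJsonLd([
  { question: "What breaks when the schema changes?", answer: "Every existing post." },
]);

// The serialized string is what would go inside a JSON-LD script tag.
const serialized = JSON.stringify(jsonLd);
console.log(serialized);
```

Because the input type is guaranteed by the schema, the generator needs no defensive checks and never touches the MDX body.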

Schema:

howto: z
  .object({
    name: z.string().optional(),
    description: z.string().optional(),
    totalTime: z.string().optional(),
    steps: z.array(
      z.object({
        name: z.string(),
        text: z.string(),
        image: z.string().optional(),
      }),
    ),
  })
  .optional(),
faq: z
  .array(
    z.object({
      question: z.string(),
      answer: z.string(),
    }),
  )
  .optional(),

YAML side:

---
title: "Astro Content Collections tips"
faq:
  - question: "When do you reach for .superRefine over .refine?"
    answer: "When one object needs more than one independent constraint..."
  - question: "What breaks when the schema changes?"
    answer: "Every existing post, by design..."
---

A howto that omits steps, or a faq entry missing answer, fails the build.
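One edge case is worth checking by hand: with the schema as written, a missing field fails, but an empty steps: [] array still passes. The sketch below assumes the standalone zod package and trims the schema to the relevant fields; the .min(1) variant is a suggested tightening, not part of the schema above.

```typescript
// Assumption: "zod" is installed directly; Astro re-exports the same `z`.
import { z } from "zod";

const step = z.object({
  name: z.string(),
  text: z.string(),
  image: z.string().optional(),
});

// Trimmed copy of the structured-data fields from the schema above.
const structured = z.object({
  howto: z.object({ steps: z.array(step) }).optional(),
  faq: z.array(z.object({ question: z.string(), answer: z.string() })).optional(),
});

// Suggested variant (not in the original schema): require at least one step.
const strictHowto = z.object({ steps: z.array(step).min(1) });

const missingAnswer = structured.safeParse({ faq: [{ question: "What breaks?" }] });
const emptySteps = structured.safeParse({ howto: { steps: [] } });
const emptyStepsStrict = strictHowto.safeParse({ steps: [] });

console.log(missingAnswer.success, emptySteps.success, emptyStepsStrict.success);
```

If an empty how-to should break the build, chaining .min(1) onto the steps array is one way to get there.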

What Zod can't reach

Zod only inspects frontmatter — the body MDX is outside its scope.

Google's structured data quality guidelines treat JSON-LD without a visible body counterpart as a mismatch, which can cost the page its rich-result eligibility. A post with frontmatter FAQs that never appear in the body passes the schema and silently disqualifies itself.

The fix is a separate layer. A small substring-matching validator script covers it:

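// Usage (hypothetical script name): node check-faq.mjs path/to/post.mdx
import { readFile } from "node:fs/promises";
import { parse as parseYaml } from "yaml";

// Assumption: the post's path arrives as the first CLI argument.
const path = process.argv[2];
if (!path) {
  console.error("usage: node check-faq.mjs <post.md|post.mdx>");
  process.exit(2);
}

const raw = await readFile(path, "utf8");
// Split the file into frontmatter (group 1) and body (group 2).
const m = /^---\r?\n([\s\S]*?)\r?\n---\r?\n([\s\S]*)$/.exec(raw);
if (!m) process.exit(0); // no frontmatter, nothing to check

// Empty frontmatter parses to null, so fall back to an empty object.
const data = parseYaml(m[1]) ?? {};
// Collapse whitespace and lowercase so line wraps and casing don't matter.
const body = m[2].replace(/\s+/g, " ").toLowerCase();

const mismatches = [];
if (Array.isArray(data.faq)) {
  for (const [i, q] of data.faq.entries()) {
    // Normalize the question the same way as the body before comparing.
    const needle = String(q.question).replace(/\s+/g, " ").trim().toLowerCase();
    if (!body.includes(needle)) {
      mismatches.push(`faq[${i}].question not in body: "${q.question}"`);
    }
  }
}

// A non-zero exit code fails the pre-commit hook or CI step.
if (mismatches.length) {
  for (const e of mismatches) console.error(e);
  process.exit(1);
}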

It's substring presence only. The script doesn't catch a wrong answer under the right question — that's a review-time concern.
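The matching step is easy to try in isolation. Below is a sketch of the same normalization as a pure function, with hypothetical names (normalize, findMissingQuestions) and an inline body string standing in for a real post.

```typescript
interface FaqEntry {
  question: string;
  answer: string;
}

// Collapse runs of whitespace and lowercase, as the validator script does.
const normalize = (text: string): string =>
  text.replace(/\s+/g, " ").trim().toLowerCase();

// Hypothetical helper: returns the questions whose text never appears in the body.
function findMissingQuestions(faq: FaqEntry[], body: string): string[] {
  const haystack = normalize(body);
  return faq
    .map((entry) => entry.question)
    .filter((question) => !haystack.includes(normalize(question)));
}

// Sample body: the first question appears with odd spacing, the second not at all.
const sampleBody = [
  "## When do you reach for   .superRefine over .refine?",
  "",
  "When one object needs more than one independent constraint.",
].join("\n");

const missing = findMissingQuestions(
  [
    { question: "When do you reach for .superRefine over .refine?", answer: "..." },
    { question: "What breaks when the schema changes?", answer: "..." },
  ],
  sampleBody,
);

console.log(missing);
```

The extra spaces in the heading do not cause a false alarm, but a reworded question would, which is the trade-off of substring matching.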

The three-layer split

Once you split rules across three layers, "where should this rule live?" becomes answerable:

| Layer | Fires at | Catches | Misses |
| --- | --- | --- | --- |
| Zod schema | astro build | types, enums, required/optional, field relations | meaning, body parity |
| Lint script | pre-commit, CI | banned phrases, substring parity | meaning |
| Review | pre-publish | meaning, judgment calls | not automatable |

Rule of thumb: if a layer higher in the table can catch a rule, don't push it down to a later one.
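The lint-script layer's banned-phrase check can follow the same pattern as the FAQ parity check. This is a sketch only: the post does not list the actual phrases, so the list and sample sentence below are placeholders, and findBannedPhrases is a hypothetical helper.

```typescript
// Placeholder list: the real banned phrases are not given in the post.
const bannedPhrases = ["guaranteed", "best in the world"];

// Hypothetical helper: returns every banned phrase found in the body, case-insensitively.
function findBannedPhrases(body: string, banned: string[]): string[] {
  const haystack = body.toLowerCase();
  return banned.filter((phrase) => haystack.includes(phrase.toLowerCase()));
}

const hits = findBannedPhrases("This keyboard is GUARANTEED to last.", bannedPhrases);

// In a pre-commit hook or CI step, a non-empty result would map to a non-zero exit code.
const exitCode = hits.length > 0 ? 1 : 0;
console.log(hits, exitCode);
```

Like the parity check, this only detects presence; whether a flagged phrase is actually a problem in context stays with the review layer.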


The full operational notes — the failure modes I keep an eye on, the disclosure-strength judgments, the decision history of why some rules stay in review — live on Aulvem → Pushing operational rules into Astro Content Collections with Zod
