I built a GitHub Action that fails your CI when your Schema.org markup is broken

#seo #webdev #github #python

Google quietly retired the Structured Data Testing Tool a while back, and its replacement, the Rich Results Test, is a manual web form: it is rate-limited, it checks one page at a time, and it needs either a URL that is already live or a snippet you paste in by hand. So the usual workflow is "ship the page, then go find out the JSON-LD was broken." That is backwards.

I wanted the check to live where every other quality gate lives: in CI, on every push, failing the build before the broken markup reaches production. I couldn't find a clean zero-dependency tool that did this, so I wrote one and open-sourced it.

What it does

structured-data-action validates the JSON-LD / Schema.org structured data on a page (or in a file, or a raw snippet) and fails the job when a required rich-result property is missing.

One Python file, standard library only (3.8+). No pip install, no node_modules.
Validates 20+ types Google uses for rich results: Article, FAQPage, Product, Offer, Organization, LocalBusiness, BreadcrumbList, HowTo, Review, Recipe, Event, VideoObject, JobPosting, WebSite, and more.
Errors vs. warnings. Missing a required prop = error (the rich result is broken). Missing a recommended prop = warning (eligible, but weaker).
Extracts every <script type="application/ld+json"> block on the page, including objects nested inside @graph.
CI-native: GitHub annotations on the exact file, a job-summary table, and meaningful exit codes.
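
To make the extraction step concrete, here is a minimal standard-library sketch of how JSON-LD blocks and @graph members could be collected. It is not the action's actual code: JsonLdExtractor, iter_typed_objects, and extract_objects are hypothetical names, and silently skipping invalid JSON is a simplification.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            type_attr = dict(attrs).get("type") or ""
            self._in_jsonld = type_attr.strip().lower() == "application/ld+json"
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def iter_typed_objects(node):
    """Yield every dict that has an @type, descending into @graph and nested values."""
    if isinstance(node, list):
        for item in node:
            yield from iter_typed_objects(item)
    elif isinstance(node, dict):
        if "@type" in node:
            yield node
        for value in node.values():
            yield from iter_typed_objects(value)


def extract_objects(html):
    """Return all typed JSON-LD objects found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    objects = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # simplification: a real validator would report this
        objects.extend(iter_typed_objects(data))
    return objects
```

Because the walk is recursive, an Offer nested inside a Product comes out as its own object, which is what lets each type be checked against its own rules.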

Drop it in a workflow

name: SEO checks
on: [push, pull_request]
jobs:
  structured-data:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: atlashey-collab/structured-data-action@v1
        with:
          target: 'dist/**/*.html'   # a URL, an HTML file, or a glob
          fail-on-error: 'true'      # break the build if required props are missing
          fail-on-warning: 'false'   # flip to true to also fail on missing recommended props

Point target at a live URL instead of a glob and it'll fetch and check that page directly.
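
If you are curious how the two fail-on flags and the annotations could fit together, here is a rough sketch. The `::error file=...::message` line format is GitHub's documented workflow-command syntax; the function names and the 0/1 exit codes are assumptions for illustration, not necessarily what the action does.

```python
def _escape_data(text):
    """Escape a workflow-command message (%, CR, LF)."""
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def _escape_property(text):
    """Escape a workflow-command property value (also ':' and ',')."""
    return _escape_data(text).replace(":", "%3A").replace(",", "%2C")


def annotation(level, path, message):
    """Format a GitHub Actions annotation; level is 'error' or 'warning'."""
    return f"::{level} file={_escape_property(path)}::{_escape_data(message)}"


def exit_code(errors, warnings, fail_on_error=True, fail_on_warning=False):
    """Map finding counts to a process exit code (assumed: 0 = pass, 1 = fail)."""
    if errors and fail_on_error:
        return 1
    if warnings and fail_on_warning:
        return 1
    return 0
```

Printing the string returned by annotation to stdout inside a job is what makes GitHub attach the message to the named file in the pull request view.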

Or just run it locally

python3 validate_schema.py https://example.com
python3 validate_schema.py "dist/**/*.html" --fail-on-error
python3 validate_schema.py snippet.json --json

• https://example.com/product
  ✓ Product / merchant listing — complete
  • Offer
      ✗ missing required: priceCurrency
      ⚠ missing recommended: availability
  ✓ FAQ rich result — complete

Summary: 1 required (error) and 1 recommended (warning) properties missing.

How it decides

For each typed object, it checks the properties Google documents for that type. A missing required property is an error: the object is ineligible for the rich result. A missing recommended property is a warning: the object still qualifies, but the result is shown with less detail. It also enforces a few high-value extras. For example, a Product must carry at least one of offers, review, or aggregateRating, and leaving out all three is the single most common reason a product snippet silently doesn't show.
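
The decision logic above can be sketched in a few lines. The RULES table here is a tiny illustrative subset that I am assuming for the example (the real rule set covers 20+ types and follows Google's documentation), and check and is_missing are hypothetical names.

```python
# Illustrative subset only; not the action's real rule table.
RULES = {
    "Offer": {
        "required": ["price", "priceCurrency"],
        "recommended": ["availability"],
    },
    "Product": {
        "required": ["name"],
        "recommended": [],
        # At least one of these must be present.
        "any_of": ["offers", "review", "aggregateRating"],
    },
}


def is_missing(obj, prop):
    """Treat absent, null, and empty values as missing; 0 and False count as present."""
    value = obj.get(prop)
    return value is None or value == "" or value == [] or value == {}


def check(obj):
    """Return (errors, warnings) for one typed JSON-LD object."""
    types = obj.get("@type", [])
    if isinstance(types, str):
        types = [types]  # @type may be a single string or a list
    errors, warnings = [], []
    for type_name in types:
        rule = RULES.get(type_name)
        if rule is None:
            continue  # unknown types are skipped in this sketch
        for prop in rule.get("required", []):
            if is_missing(obj, prop):
                errors.append(f"{type_name}: missing required: {prop}")
        any_of = rule.get("any_of", [])
        if any_of and all(is_missing(obj, p) for p in any_of):
            errors.append(f"{type_name}: needs at least one of: {', '.join(any_of)}")
        for prop in rule.get("recommended", []):
            if is_missing(obj, prop):
                warnings.append(f"{type_name}: missing recommended: {prop}")
    return errors, warnings
```

Keeping the rules in a plain dictionary means adding a type is a data change rather than a code change, which is one reasonable way to keep a single-file validator maintainable.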

It's a fast structural pre-flight, not a renderer. For final "will Google actually draw the rich result on this live URL" confirmation, still run Google's Rich Results Test once before launch. Use this on every commit; use Google's tool for the final sign-off.

One more reason it matters in 2026

The same JSON-LD that earns Google rich results can also help AI answer engines (ChatGPT, Perplexity, Gemini) parse and cite your pages. Broken schema is a silent tax on both classic SEO and AI-search visibility. A CI gate keeps it honest.

It's MIT licensed, and issues and PRs are welcome: https://github.com/atlashey-collab/structured-data-action

There's also a free in-browser version if you just want to paste a snippet and eyeball it: schema-validator. The Action keeps the same rule set, just wired into your pipeline.