LyricalString

Posted on Apr 6

How I serve 12,237 law pages in 0.3 seconds with Astro and zero client JavaScript

#astro #webdev #opensource #performance

Spanish law is public. Reading it shouldn't cost €200/month.

That's why I built Ley Abierta, an open source platform indexing every Spanish law from 1835 to today. 12,237 laws. 42,000 Git commits tracking every reform. Lighthouse score: 100 Performance.

Here's how it works.

The problem

Spain's official gazette (BOE) publishes legislation as XML. If you want the consolidated text of a law with all reforms applied, you either:

Read the BOE website (good luck navigating it)
Pay for Westlaw, Aranzadi, or similar services

There's no free, searchable, version-controlled source of Spanish law. So I built one.

Architecture overview

BOE API → Pipeline (Bun) → Git repo (Markdown) → Astro (SSG) → Cloudflare Pages
                         → SQLite + FTS5       → Elysia API  → Hetzner Docker

The same data lives in three places, each for a different reason:

Layer	Format	Why
JSON cache	12,231 JSON files	Pipeline source of truth
Git repo	Markdown + YAML frontmatter	Human readable, version control
SQLite	14 tables + FTS5 index	Fast queries, full text search

The website is fully static. The API handles search and the dynamic stuff (email alerts, omnibus detection). They deploy independently.

Astro content collections: 12K pages, one build

Each law is a Markdown file in a public Git repo (leyabierta/leyes). At build time, Astro checks out this repo and treats every file as a content collection entry.

The frontmatter carries all the metadata:

---
title: "Ley 35/2006, de 28 de noviembre, del Impuesto sobre la Renta de las Personas Físicas"
rank: "ley"
status: "vigente"
published_at: "2006-11-29"
jurisdiction: "es"
materias: ["IRPF", "Hacienda Pública", "Impuestos"]
reforms_count: 47
---

Astro generates a static HTML page for each entry. The build takes ~45-60 seconds for all 12,231 pages using Astro 6.1.1's queued rendering with 4-worker concurrency.

What comes out the other side is pure HTML on Cloudflare's CDN. There's nothing to hydrate, nothing to parse on the client. Load time is basically TCP overhead.

Performance:  100
FCP:          0.3s
LCP:          0.7s
TBT:          0ms

The daily pipeline

Every morning, a GitHub Actions workflow:

Discovers new laws from the BOE API
Fetches XML in parallel (6 workers, rate limited)
Parses metadata, articles, reform history
Commits each law as a Markdown file with the real publication date as the commit date
Pushes to the leyes repo
Triggers an Astro rebuild if anything changed

On Sundays, a full re-check catches updates to existing laws. Weekdays are incremental, only new publications.

The pre-1970 problem

Git stores dates as Unix timestamps. The oldest law in the database is from 1835. That's before Unix.

My workaround: commit date is set to 1970-01-02 (earliest safe date), but the real publication date lives in YAML frontmatter and a custom Git trailer (Source-Date: 1835-05-24). The web and API always use the real date. Git history shows the placeholder.

This affects ~334 laws. Not ideal, but it preserves the commit-per-reform model that makes git diff work across the entire corpus.

BOE API quirks (hard won knowledge)

A few things the documentation won't tell you:

Accept: application/json returns 400 on the /texto endpoint. You must parse XML.
limit=-1 silently caps at 10,000 results. Always paginate with explicit offsets.
The /analisis endpoint returns a subset of subject categories. For the full list, you need to scrape ELI meta tags from the HTML version.
Regional laws use IDs from regional bulletins (BOA, BOJA, DOGV), not BOE. Jurisdiction must be extracted from the ELI URL pattern (/eli/es-pv/ → Basque Country).

Full text search with SQLite FTS5

Search needs to be fast and accent insensitive ("politica" should match "política"). SQLite's FTS5 extension handles this natively.

The search index covers law titles, full text, and citizen friendly tags:

CREATE VIRTUAL TABLE norms_fts USING fts5(
  norm_id UNINDEXED,
  title,
  content,
  citizen_tags
);

Queries use a two-pass approach: title matches rank higher than content matches. Results are paginated with chunked ID filtering to avoid SQLite's variable limit on large result sets (splitting into 5K-item chunks).

The API (Elysia on Bun) exposes this as a REST endpoint with filters for rank, status, jurisdiction, and subject category. Swagger docs at api.leyabierta.es/swagger.

Omnibus law detection

An "omnibus" law bundles unrelated topics into a single piece of legislation. Governments use them to slip unpopular measures past public scrutiny, and in Spain it happens all the time. A tax reform hidden inside a natural disaster decree, that kind of thing. Nobody was tracking it, so I built a detector.

How detection works

If a law touches 15+ distinct subject categories (after filtering generic ones like "Public Administration"), flag it as omnibus
Extract the law's structure (titles, chapters, articles) and send to Gemini Flash
The model generates a label, headline, summary, article count, and a sneaked_in boolean for each topic

The sneaked_in flag is the interesting part. It catches topics that have nothing to do with the law's official title. Energy regulation buried in a social security update, that sort of thing.

{
  "topic_label": "Energía (medida encubierta)",
  "headline": "New renewable energy requirements",
  "article_count": 8,
  "sneaked_in": true
}

Cost: ~$0.01/day using Gemini Flash through OpenRouter.

Results are served via API, rendered on the /omnibus page, and available as an RSS feed.

Email notifications

Citizens can subscribe to topics they care about. When a law affecting those topics gets reformed, they get an email with a plain language summary.

The system is event driven:

Daily cron generates AI summaries for new reforms
Match subscriber topics against reform subjects
Send via Resend (transactional email)
Track in notified_reforms to prevent duplicates

Double opt-in uses HMAC-signed confirmation links. No authentication needed, subscriptions are managed by email token.

Things I'd do differently

SQLite from day one. I spent weeks querying the Git repo directly before accepting that Git is not a database. git log --grep is not a substitute for WHERE materia = 'IRPF' AND status = 'vigente'.

I also shouldn't have trusted the BOE API docs. They're incomplete and in some places just wrong. Would have saved time starting by scraping the endpoints and figuring out the actual behavior.

One Astro gotcha: content collections with 12K+ files will eat your memory during builds if you're not careful. Queued rendering in Astro 6 fixed this but I burned a few afternoons on OOM crashes before finding it.

Try it

Web: leyabierta.es
API: api.leyabierta.es/swagger
Code: github.com/leyabierta/leyabierta (MIT)
Laws: github.com/leyabierta/leyes (Public domain)

If you have ideas, spot bugs, or want to adapt this for your country's legislation, issues and PRs are welcome.

I'm Alex, a solo developer from Spain. You can find me on LinkedIn.

DEV Community