Spanish law is public. Reading it shouldn't cost €200/month.
That's why I built Ley Abierta, an open source platform indexing every Spanish law from 1835 to today. 12,237 laws. 42,000 Git commits tracking every reform. Lighthouse score: 100 Performance.
Here's how it works.
The problem
Spain's official gazette (BOE) publishes legislation as XML. If you want the consolidated text of a law with all reforms applied, you either:
- Read the BOE website (good luck navigating it)
- Pay for Westlaw, Aranzadi, or similar services
There's no free, searchable, version-controlled source of Spanish law. So I built one.
Architecture overview
BOE API → Pipeline (Bun) → Git repo (Markdown) → Astro (SSG) → Cloudflare Pages
→ SQLite + FTS5 → Elysia API → Hetzner Docker
The same data lives in three places, each for a different reason:
| Layer | Format | Why |
|---|---|---|
| JSON cache | 12,231 JSON files | Pipeline source of truth |
| Git repo | Markdown + YAML frontmatter | Human readable, version control |
| SQLite | 14 tables + FTS5 index | Fast queries, full text search |
The website is fully static. The API handles search and the dynamic stuff (email alerts, omnibus detection). They deploy independently.
Astro content collections: 12K pages, one build
Each law is a Markdown file in a public Git repo (leyabierta/leyes). At build time, Astro checks out this repo and treats every file as a content collection entry.
The frontmatter carries all the metadata:
---
title: "Ley 35/2006, de 28 de noviembre, del Impuesto sobre la Renta de las Personas Físicas"
rank: "ley"
status: "vigente"
published_at: "2006-11-29"
jurisdiction: "es"
materias: ["IRPF", "Hacienda Pública", "Impuestos"]
reforms_count: 47
---
Astro generates a static HTML page for each entry. The build takes ~45-60 seconds for all 12,231 pages using Astro 6.1.1's queued rendering with 4-worker concurrency.
What comes out the other side is pure HTML on Cloudflare's CDN. There's nothing to hydrate, nothing to parse on the client. Load time is basically TCP overhead.
Performance: 100
FCP: 0.3s
LCP: 0.7s
TBT: 0ms
The daily pipeline
Every morning, a GitHub Actions workflow:
- Discovers new laws from the BOE API
- Fetches XML in parallel (6 workers, rate limited)
- Parses metadata, articles, reform history
- Commits each law as a Markdown file with the real publication date as the commit date
- Pushes to the leyes repo
- Triggers an Astro rebuild if anything changed
On Sundays, a full re-check catches updates to existing laws. Weekdays are incremental, only new publications.
The pre-1970 problem
Git stores dates as Unix timestamps. The oldest law in the database is from 1835. That's before Unix.
My workaround: commit date is set to 1970-01-02 (earliest safe date), but the real publication date lives in YAML frontmatter and a custom Git trailer (Source-Date: 1835-05-24). The web and API always use the real date. Git history shows the placeholder.
This affects ~334 laws. Not ideal, but it preserves the commit-per-reform model that makes git diff work across the entire corpus.
BOE API quirks (hard won knowledge)
A few things the documentation won't tell you:
-
Accept: application/jsonreturns 400 on the/textoendpoint. You must parse XML. -
limit=-1silently caps at 10,000 results. Always paginate with explicit offsets. - The
/analisisendpoint returns a subset of subject categories. For the full list, you need to scrape ELI meta tags from the HTML version. - Regional laws use IDs from regional bulletins (BOA, BOJA, DOGV), not BOE. Jurisdiction must be extracted from the ELI URL pattern (
/eli/es-pv/→ Basque Country).
Full text search with SQLite FTS5
Search needs to be fast and accent insensitive ("politica" should match "política"). SQLite's FTS5 extension handles this natively.
The search index covers law titles, full text, and citizen friendly tags:
CREATE VIRTUAL TABLE norms_fts USING fts5(
norm_id UNINDEXED,
title,
content,
citizen_tags
);
Queries use a two-pass approach: title matches rank higher than content matches. Results are paginated with chunked ID filtering to avoid SQLite's variable limit on large result sets (splitting into 5K-item chunks).
The API (Elysia on Bun) exposes this as a REST endpoint with filters for rank, status, jurisdiction, and subject category. Swagger docs at api.leyabierta.es/swagger.
Omnibus law detection
An "omnibus" law bundles unrelated topics into a single piece of legislation. Governments use them to slip unpopular measures past public scrutiny, and in Spain it happens all the time. A tax reform hidden inside a natural disaster decree, that kind of thing. Nobody was tracking it, so I built a detector.
How detection works
- If a law touches 15+ distinct subject categories (after filtering generic ones like "Public Administration"), flag it as omnibus
- Extract the law's structure (titles, chapters, articles) and send to Gemini Flash
- The model generates a label, headline, summary, article count, and a
sneaked_inboolean for each topic
The sneaked_in flag is the interesting part. It catches topics that have nothing to do with the law's official title. Energy regulation buried in a social security update, that sort of thing.
{
"topic_label": "Energía (medida encubierta)",
"headline": "New renewable energy requirements",
"article_count": 8,
"sneaked_in": true
}
Cost: ~$0.01/day using Gemini Flash through OpenRouter.
Results are served via API, rendered on the /omnibus page, and available as an RSS feed.
Email notifications
Citizens can subscribe to topics they care about. When a law affecting those topics gets reformed, they get an email with a plain language summary.
The system is event driven:
- Daily cron generates AI summaries for new reforms
- Match subscriber topics against reform subjects
- Send via Resend (transactional email)
- Track in
notified_reformsto prevent duplicates
Double opt-in uses HMAC-signed confirmation links. No authentication needed, subscriptions are managed by email token.
Things I'd do differently
SQLite from day one. I spent weeks querying the Git repo directly before accepting that Git is not a database. git log --grep is not a substitute for WHERE materia = 'IRPF' AND status = 'vigente'.
I also shouldn't have trusted the BOE API docs. They're incomplete and in some places just wrong. Would have saved time starting by scraping the endpoints and figuring out the actual behavior.
One Astro gotcha: content collections with 12K+ files will eat your memory during builds if you're not careful. Queued rendering in Astro 6 fixed this but I burned a few afternoons on OOM crashes before finding it.
Try it
- Web: leyabierta.es
- API: api.leyabierta.es/swagger
- Code: github.com/leyabierta/leyabierta (MIT)
- Laws: github.com/leyabierta/leyes (Public domain)
If you have ideas, spot bugs, or want to adapt this for your country's legislation, issues and PRs are welcome.
I'm Alex, a solo developer from Spain. You can find me on LinkedIn.
Top comments (0)