Most scraping projects start by finding a website to scrape. This one started from the opposite direction: I knew the data existed as official government downloads, and my job was to make it accessible via a clean API.
The data source
Ofsted (the UK school inspections body) publishes monthly management information as CSV files on GOV.UK. The file covers all 22,000+ state-funded schools in England with their latest inspection grades, local authority, postcode, phase, and size data. It's 16 MB, published under the Open Government Licence v3.0 — explicitly permitting commercial use.
No scraping needed. No authentication. Just a CSV download and some parsing logic.
The architecture
The actor is deliberately simple:
- Fetch the GOV.UK stats page to find the current month's CSV URL (the URL hash changes with each release)
- Download the CSV (~16 MB from assets.publishing.service.gov.uk)
- Parse it with csv-parse
- Apply the user's filters (name, local authority, region, postcode prefix)
- Push matching records to the Apify dataset
No Crawlee. No browser. No proxy. Just fetch() and a CSV parser.
const match = html.match(
/href="(https:\/\/assets\.publishing\.service\.gov\.uk\/[^"]+latest_inspections_as_at[^"]+\.csv)"/
);
That one regex does the URL discovery. The GOV.UK page lists files in reverse chronological order, so the first match is always the latest release.
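The filter step from the list above can be sketched as a pure predicate over parsed rows. This is a minimal sketch, not the actor's actual code: the School and Filters shapes, their field names, and the matching rules (substring match on name, case-insensitive equality on local authority and region) are all assumptions made for illustration.

```typescript
// Hypothetical record and filter shapes; field names are illustrative,
// not the actor's real schema.
interface School {
  name: string;
  localAuthority: string;
  region: string;
  postcode: string;
}

interface Filters {
  name?: string;           // case-insensitive substring match
  localAuthority?: string; // case-insensitive exact match
  region?: string;         // case-insensitive exact match
  postcodePrefix?: string; // e.g. "LS6" or "ls6 2"
}

// Lower-case and strip all whitespace so "ls6 2" matches "LS6 2AB".
const squash = (s: string): string => s.toLowerCase().replace(/\s+/g, "");

// A school matches when every filter that was supplied matches.
function matchesFilters(school: School, f: Filters): boolean {
  if (f.name && !school.name.toLowerCase().includes(f.name.trim().toLowerCase())) return false;
  if (f.localAuthority && squash(school.localAuthority) !== squash(f.localAuthority)) return false;
  if (f.region && squash(school.region) !== squash(f.region)) return false;
  if (f.postcodePrefix && !squash(school.postcode).startsWith(squash(f.postcodePrefix))) return false;
  return true;
}

// Example with made-up rows.
const sample: School[] = [
  { name: "Example Primary School", localAuthority: "Leeds", region: "Yorkshire and The Humber", postcode: "LS6 2AB" },
  { name: "Sample Academy", localAuthority: "Camden", region: "London", postcode: "NW1 8XY" },
];
const leedsMatches = sample.filter((s) =>
  matchesFilters(s, { localAuthority: "leeds", postcodePrefix: "ls6" })
);
```

An empty filter object matches every school, so the same function covers the "return everything" case. One caveat of plain prefix matching: a prefix like "LS1" also matches "LS10" postcodes.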
The interesting part: Ofsted changed their grading system mid-build
I built this in May 2026. In November 2025, Ofsted scrapped their 20-year-old four-word judgement system (Outstanding / Good / Requires Improvement / Inadequate) and replaced it with a report card format — six separate grade areas, each on a five-point scale:
- Exceptional
- Strong
- Expected standard
- Needs attention
- Urgent improvement
Plus a standalone Safeguarding verdict (Met / Not met).
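For consumers who want type safety on top of the raw strings, the scale above maps naturally onto TypeScript union types. The sketch below is one possible approach, not part of the actor as described; the parseGrade helper is hypothetical, and it assumes ungraded cells arrive as an empty string or the literal text "NULL".

```typescript
// The five-point scale and the safeguarding verdict, as listed above.
const GRADES = [
  "Exceptional",
  "Strong",
  "Expected standard",
  "Needs attention",
  "Urgent improvement",
] as const;
type Grade = (typeof GRADES)[number];

const SAFEGUARDING = ["Met", "Not met"] as const;
type Safeguarding = (typeof SAFEGUARDING)[number];

// Hypothetical helper. Assumption: ungraded cells are "" or "NULL".
// Returns null for ungraded cells and throws on values outside the scale.
function parseGrade(cell: string | undefined): Grade | null {
  const value = (cell ?? "").trim();
  if (value === "" || value.toUpperCase() === "NULL") return null;
  const found = GRADES.find((g) => g.toLowerCase() === value.toLowerCase());
  if (found === undefined) throw new Error(`Unrecognised grade: ${value}`);
  return found;
}
```

Throwing on unknown values is a deliberate choice in this sketch: an old-format word such as "Good" fails loudly instead of slipping through as a new-format grade.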
The April 2026 CSV reflects this change entirely. There's no "Overall effectiveness" column. Schools inspected before November 2025 have null grades in the new columns, and schools inspected after have the new-format grades.
The solution was simple: expose all six grade columns as-is, with null for ungraded fields. Users can filter in their own tools. The schema is forward-compatible: if Ofsted adds new categories as new columns, the [key: string]: string index signature on the raw row interface handles the extras.
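A rough sketch of that pass-through idea follows. Only the [key: string]: string index signature comes from the description above; the SchoolRecord shape, the toRecord helper, and the column names ("URN", "School name", "Grade area A", and so on) are placeholders, since the real CSV headers are not shown here. It also assumes ungraded cells are empty or the literal text "NULL".

```typescript
// Raw CSV row: every known or future column is a string keyed by its header.
interface RawRow {
  [key: string]: string;
}

// Hypothetical output shape: a few named fields plus nullable grade columns.
interface SchoolRecord {
  urn: string;
  name: string;
  grades: Record<string, string | null>;
}

// Placeholder header names; the real CSV headers may differ.
const URN_COLUMN = "URN";
const NAME_COLUMN = "School name";

// Copy the requested grade columns through unchanged, mapping blanks to null.
function toRecord(row: RawRow, gradeColumns: readonly string[]): SchoolRecord {
  const grades: Record<string, string | null> = {};
  for (const column of gradeColumns) {
    const value = (row[column] ?? "").trim();
    grades[column] = value === "" || value.toUpperCase() === "NULL" ? null : value;
  }
  return { urn: row[URN_COLUMN] ?? "", name: row[NAME_COLUMN] ?? "", grades };
}

// Made-up row, including a column the code has never heard of.
const row: RawRow = {
  URN: "100000",
  "School name": "Example Primary School",
  "Grade area A": "Strong",
  "Grade area B": "",
  "Some future column": "anything",
};
const record = toRecord(row, ["Grade area A", "Grade area B"]);
```

Because the grade column list is a parameter, picking up a new category in this sketch means adding one header name to that list; unknown columns in the raw row are simply ignored until then.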
The result
A working actor that:
- Returns clean JSON for any school in England, filtered by LA, region, postcode, or name
- Supports bulk lookups by a list of URNs (Unique Reference Numbers)
- Runs in under 60 seconds on Apify infrastructure
- Costs $1.00 per run + $0.05 per school returned
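To make the pricing concrete, here is a small worked calculation using the two figures above. The runCostUsd function is a hypothetical helper for illustration; it keeps amounts in whole cents to avoid floating-point drift.

```typescript
// Prices from the list above, held in whole cents.
const PER_RUN_CENTS = 100;  // $1.00 per run
const PER_SCHOOL_CENTS = 5; // $0.05 per school returned

// Total cost of one run in US dollars for a given number of returned schools.
function runCostUsd(schoolsReturned: number): number {
  if (!Number.isInteger(schoolsReturned) || schoolsReturned < 0) {
    throw new RangeError("schoolsReturned must be a non-negative integer");
  }
  return (PER_RUN_CENTS + PER_SCHOOL_CENTS * schoolsReturned) / 100;
}

const smallRun = runCostUsd(100);   // 100 schools: (100 + 500) / 100 = 6
const fullRun = runCostUsd(22000);  // 22,000 schools: (100 + 110000) / 100 = 1101
```

So a filtered run returning 100 schools comes to $6.00, while pulling roughly the whole register in one run would be about $1,101.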
No session tokens, no anti-bot cat-and-mouse, no fragile HTML selectors. Sometimes the best scraper is the one that doesn't scrape anything.
The actor is live on the Apify Store: Ofsted School Register Scraper
Data sourced from GOV.UK under the Open Government Licence v3.0.