How I built an Ofsted school data API on Apify (without scraping a single webpage)

#webdev #typescript #opensource #datascience

Most scraping projects start by finding a website to scrape. This one started from the opposite direction: I knew the data existed as official government downloads, and my job was to make it accessible via a clean API.

The data source

Ofsted (the school inspections body for England) publishes monthly management information as a CSV file on GOV.UK. The file covers all 22,000+ state-funded schools in England with their latest inspection grades, local authority, postcode, phase, and size data. It's about 16 MB and is published under the Open Government Licence v3.0, which explicitly permits commercial use.

No scraping needed. No authentication. Just a CSV download and some parsing logic.

The architecture

The actor is deliberately simple:

1. Fetch the GOV.UK stats page to find the current month's CSV URL (the URL hash changes with each release)
2. Download the CSV (~16 MB from assets.publishing.service.gov.uk)
3. Parse it with csv-parse
4. Apply the user's filters (name, local authority, region, postcode prefix)
5. Push matching records to the Apify dataset

No Crawlee. No browser. No proxy. Just fetch() and a CSV parser.
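
To make the filtering step concrete, here is a minimal sketch of how the four filters could be applied to parsed rows. The `SchoolRecord` and `SchoolFilters` shapes and the `matchesFilters` helper are hypothetical names for illustration, not the actor's real code, and the real CSV column names may differ.

```typescript
// Hypothetical, simplified record shape; the real CSV column names may differ.
interface SchoolRecord {
    name: string;
    localAuthority: string;
    region: string;
    postcode: string;
}

// Hypothetical filter input; every field is optional.
interface SchoolFilters {
    name?: string;
    localAuthority?: string;
    region?: string;
    postcodePrefix?: string;
}

// Case-insensitive, whitespace-tolerant comparison key.
const normalise = (value: string): string => value.trim().toLowerCase();

// Returns true when the school satisfies every filter that was supplied.
function matchesFilters(school: SchoolRecord, filters: SchoolFilters): boolean {
    // Name is a substring match; local authority and region are exact matches.
    if (filters.name && !normalise(school.name).includes(normalise(filters.name))) {
        return false;
    }
    if (filters.localAuthority && normalise(school.localAuthority) !== normalise(filters.localAuthority)) {
        return false;
    }
    if (filters.region && normalise(school.region) !== normalise(filters.region)) {
        return false;
    }
    if (filters.postcodePrefix) {
        // Compare postcodes without spaces so "LS6" matches "LS6 2QB".
        const postcode = normalise(school.postcode).replace(/\s+/g, "");
        const prefix = normalise(filters.postcodePrefix).replace(/\s+/g, "");
        if (!postcode.startsWith(prefix)) {
            return false;
        }
    }
    return true;
}
```

With rows already parsed into this shape, `rows.filter((row) => matchesFilters(row, filters))` would give the records to push to the dataset. An empty filter object matches every school.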

const match = html.match(
    /href="(https:\/\/assets\.publishing\.service\.gov\.uk\/[^"]+latest_inspections_as_at[^"]+\.csv)"/
);

That one regex does the URL discovery. The GOV.UK page lists files in reverse chronological order, so the first match is the latest release, as long as that ordering holds.
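
Because the regex is the only thing tying the actor to the page markup, it is worth failing loudly when it stops matching. A minimal sketch, assuming the same pattern as above; `findLatestCsvUrl` is a hypothetical helper name, not necessarily what the actor uses:

```typescript
// Same pattern as above: an assets.publishing.service.gov.uk link whose path
// contains "latest_inspections_as_at" and ends in ".csv".
const CSV_LINK_PATTERN =
    /href="(https:\/\/assets\.publishing\.service\.gov\.uk\/[^"]+latest_inspections_as_at[^"]+\.csv)"/;

// Hypothetical helper: returns the first matching CSV URL in the page HTML,
// or throws so a markup change surfaces as a clear error instead of an empty run.
function findLatestCsvUrl(html: string): string {
    const url = html.match(CSV_LINK_PATTERN)?.[1];
    if (!url) {
        throw new Error("No latest_inspections CSV link found; the page markup may have changed");
    }
    return url;
}
```

The helper takes the HTML as a string, so it can be checked against a saved copy of the page without any network access.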

The interesting part: Ofsted had just changed their grading system

I built this in May 2026. In November 2025, Ofsted scrapped their 20-year-old four-grade judgement system (Outstanding / Good / Requires Improvement / Inadequate) and replaced it with a report card format with six separate grade areas, each on a five-point scale:

Exceptional
Strong
Expected standard
Needs attention
Urgent improvement

Plus a standalone Safeguarding verdict (Met / Not met).

The April 2026 CSV reflects this change entirely. There's no "Overall effectiveness" column. Schools inspected before November 2025 have null grades in the new columns, and schools inspected after have the new-format grades.

The solution was simple: expose all six grade columns as-is, with null for ungraded fields. Users can filter in their own tools. The schema is also forward-compatible. If Ofsted adds new categories, they arrive as new columns, and the [key: string]: string index signature on the raw row interface accepts any extras without a code change.
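
A minimal sketch of that pass-through idea follows. It assumes ungraded cells arrive as blank strings in the CSV; `RawRow`, `OutputRecord` and `toOutputRecord` are illustrative names, and the column names in the usage note are made up.

```typescript
// Raw CSV row: every column arrives as a string, including any columns
// added in future releases.
interface RawRow {
    [key: string]: string;
}

// Output record: same keys, but blank cells become null.
type OutputRecord = Record<string, string | null>;

// Copies every column through unchanged apart from trimming,
// mapping blank cells to null (assumption: ungraded fields are blank).
function toOutputRecord(raw: RawRow): OutputRecord {
    const output: OutputRecord = {};
    for (const column of Object.keys(raw)) {
        const trimmed = (raw[column] ?? "").trim();
        output[column] = trimmed === "" ? null : trimmed;
    }
    return output;
}
```

For example, `toOutputRecord({ URN: "100000", "Some grade area": "" })` yields a record where `URN` is `"100000"` and `"Some grade area"` is `null`. A column the code has never seen before passes through the same way.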

The result

A working actor that:

Returns clean JSON for any state-funded school in England, filtered by local authority, region, postcode prefix, or name
Supports bulk lookups by URN list
Runs in under 60 seconds on Apify infrastructure
Costs $1.00 per run + $0.05 per school returned
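
To estimate what a run will cost before starting it, the pricing above reduces to one line of arithmetic. A small sketch, working in cents to avoid floating-point drift; the function name is made up for illustration:

```typescript
// Pricing from the list above: $1.00 per run plus $0.05 per school returned.
const RUN_FEE_CENTS = 100;
const PER_SCHOOL_FEE_CENTS = 5;

// Hypothetical helper: estimated cost in US dollars for one run.
function estimateRunCostUsd(schoolsReturned: number): number {
    if (!Number.isInteger(schoolsReturned) || schoolsReturned < 0) {
        throw new RangeError("schoolsReturned must be a non-negative integer");
    }
    return (RUN_FEE_CENTS + PER_SCHOOL_FEE_CENTS * schoolsReturned) / 100;
}
```

A run that returns 200 schools comes to $1.00 + 200 × $0.05 = $11.00, so narrow filters keep the bill small.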

No session tokens, no anti-bot cat-and-mouse, no fragile HTML selectors. Sometimes the best scraper is the one that doesn't scrape anything.

The actor is live on the Apify Store: Ofsted School Register Scraper

Data sourced from GOV.UK under the Open Government Licence v3.0.