TL;DR — We run uvi.today (UV index), pollen.today (pollen forecast) and airindex.today (air quality). All three pull from Open-Meteo. AI crawlers pushed us past the free tier, the paid tier worked but still wasn't a fit for crawler-heavy traffic, so we ended up self-hosting Open-Meteo on a single VPS. Disk: ~50 GB. Latency: 90–100 ms faster per request. Cost: less than the paid plan.
This is a short write-up of how we got there, what the migration actually looked like, and a couple of things we'd flag for anyone thinking about doing the same.
The setup
The three sites are simple Next.js apps. Each city page renders server-side and calls Open-Meteo for two datasets:
- The forecast API for temperature, weather code, humidity, etc.
- The CAMS air quality API for UV index, AQI components, and pollen.
We cache responses in an LRU on the Node side (coord-keyed, 60 min TTL) and that's it. No queue, no warm cache jobs, no background workers.
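For a sense of scale, the whole caching layer is on the order of the sketch below — a plain TTL map keyed by rounded coordinates. The names (`coordKey`, `cachedFetchJson`) and details are simplified for this post; the production module is a proper size-capped LRU.

```ts
// Illustrative sketch of the cache layer described above — a TTL map keyed by
// rounded coordinates. Names are ours for this post, not the production code.
const TTL_MS = 60 * 60 * 1000; // 60 min, matching the TTL mentioned above

type Entry = { value: unknown; expires: number };
const cache = new Map<string, Entry>();

// Clamp coordinates to a coarse grid so nearby variants of the same point
// share one cache entry (e.g. coordKey(52.5201, 13.4049) -> "52.52,13.4").
export function coordKey(lat: number, lon: number, decimals = 2): string {
  const f = 10 ** decimals;
  return `${Math.round(lat * f) / f},${Math.round(lon * f) / f}`;
}

export async function cachedFetchJson<T>(key: string, url: string): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value as T;

  const res = await fetch(url);
  if (!res.ok) throw new Error(`Upstream ${res.status} for ${url}`);
  const value = (await res.json()) as T;

  cache.set(key, { value, expires: Date.now() + TTL_MS });
  return value;
}
```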
For the first months, the free public Open-Meteo API was perfect. The free tier is generous:
| Tier | Per minute | Per hour | Per day | Per month |
|---|---|---|---|---|
| Free | 600 | 5,000 | 10,000 | 300,000 |
| Standard | unlimited | unlimited | unlimited | 1,000,000 |
Source: open-meteo.com/en/pricing. The free tier is non-commercial; the Standard plan is what you upgrade to once you put ads on the site.
Then the crawlers showed up
Search Console traffic was modest. Logs were not. Once the sites started ranking on long-tail queries, every AI crawler on the planet decided that a city-by-locale URL grid was an irresistible buffet.
In our access logs we kept seeing the same User-Agents on tight loops:
`GPTBot`, `ClaudeBot`, `Bytespider`, `Amazonbot`, `Meta-ExternalAgent`
The peak we measured before blocking them was around 15,000 requests per hour from a single bot. On a sister project that took longer to get blocking right, we saw bursts close to 200k/day. None of these bots respect any kind of "please slow down". They either get a 200, a 429, or a 403 — pick one.
We picked 403, eventually. But before we did, the public Open-Meteo API started returning 429s during peaks, and our pages started erroring out for real users.
The math is brutal: 100 cities × 9 locales = 900 cacheable URLs. With a 60 minute cache TTL that's 900 origin requests per hour worst case for our own users. Add a single misbehaving crawler that ignores cache headers and asks for `?lat=…&lon=…` with random rounding, and the cache hit rate collapses. We were burning through the 10,000-call daily allowance in a few hours.
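For reference, the block we eventually shipped is nothing exotic — roughly this shape of Next.js middleware (a sketch; the real allow/deny list is longer and lives in config):

```ts
// middleware.ts — sketch of the user-agent block, not the exact production rules.
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const BLOCKED_UA = /GPTBot|ClaudeBot|Bytespider|Amazonbot|Meta-ExternalAgent/i;

export function middleware(req: NextRequest) {
  const ua = req.headers.get('user-agent') ?? '';
  if (BLOCKED_UA.test(ua)) {
    // 403, not 429: these crawlers don't slow down, so there is nothing to negotiate.
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}

export const config = {
  matcher: '/:path*', // every route; tune this if you want to spare static assets
};
```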
We tried the paid tier first
The Standard plan removes the per-minute, per-hour and per-day caps and gives you 1,000,000 calls per month on a customer-api.open-meteo.com host. Switching is one env var:
```
# .env
OPENMETEO_API_KEY=...
```
```ts
// src/lib/uv-api.ts (excerpt)
// Prefer the self-hosted instance, then the customer API, then the public API.
const omHost = process.env.OPENMETEO_HOST;
const apiKey = !omHost ? process.env.OPENMETEO_API_KEY : undefined;

const CAMS_BASE = omHost
  ? `${omHost}/v1/air-quality`
  : apiKey
    ? 'https://customer-air-quality-api.open-meteo.com/v1/air-quality'
    : 'https://air-quality-api.open-meteo.com/v1/air-quality';
```
Open-Meteo are upfront in their FAQ that monthly limits aren't being enforced yet — they're still building the usage portal — so in practice the Standard plan is "soft 1M/month, dedicated servers, commercial use OK". This solved the rate-limit problem immediately.
It did not solve two other things:
- Crawler load is wasteful spend. Even if the limit isn't enforced, paying for traffic that produces no revenue (and no useful index entry on most platforms) is irritating.
- Latency. Every page render fans out to two API hosts in another datacenter. We were measuring p50 around 180–220 ms per upstream call from our box. CAMS pollen + forecast = two of those, mostly serial.
Self-hosting Open-Meteo
This is the part that surprised us: it is genuinely easy.
Open-Meteo publishes a single Docker image (ghcr.io/open-meteo/open-meteo) that does both jobs:
- `serve` — runs the API on port 8080. Same query syntax as the public API.
- `sync <model> <variables>` — pulls the latest model run from the upstream provider (DWD, NOAA, ECMWF, MET Norway, …) and writes it to a shared volume.
You run one serve and as many sync workers as you have models you care about. Each sync job re-runs on an interval (--repeat-interval 5 = every 5 minutes) and stores the last N days of past data (--past-days 3).
For us, the relevant compose file looks roughly like this:
```yaml
services:
  open-meteo:
    image: ghcr.io/open-meteo/open-meteo
    volumes: [open-meteo-data:/app/data]
    expose: ["8080"]
    command: ["serve"]

  sync-dwd:
    image: ghcr.io/open-meteo/open-meteo
    volumes: [open-meteo-data:/app/data]
    command:
      - sync
      - dwd_icon,dwd_icon_eu,dwd_icon_d2
      - temperature_2m,relative_humidity_2m,weather_code,cloud_cover,precipitation,shortwave_radiation
      - --past-days=3
      - --repeat-interval=5

  sync-cams:
    image: ghcr.io/open-meteo/open-meteo
    volumes: [open-meteo-data:/app/data]
    command:
      - sync
      - cams_global,cams_europe
      - uv_index,uv_index_clear_sky,pm10,pm2_5,ozone,alder_pollen,birch_pollen,grass_pollen,ragweed_pollen
      - --past-days=3
      - --repeat-interval=5

volumes:
  open-meteo-data:
```
We sync six model groups in total:
- DWD ICON (11 km global, 7 km EU, 2 km Central EU)
- NCEP (GFS 13 km global, HRRR 3 km CONUS)
- ECMWF IFS 25 km — long-range
- MET Norway / UKMO / BOM / CMC for regional accuracy
- CAMS global + Europe — UV, AQI, pollen
- A one-off `copernicus_dem90` sync for elevation data (~10 GB, runs once)
The application-side change is one line: set `OPENMETEO_HOST=http://open-meteo:8080` and the existing client code routes there instead of to the public API. No query rewriting needed — that's the nice part of Open-Meteo's design.
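To make "no query rewriting" concrete, here's the shape of a call against the local instance — a sketch with hypothetical variable names; the path and parameters are exactly what the public air-quality API already takes:

```ts
// The same /v1/air-quality request works against the public API, the customer
// API, or the self-hosted container — only the base URL changes. Sketch only.
const base = process.env.OPENMETEO_HOST ?? 'https://air-quality-api.open-meteo.com';

const url = new URL('/v1/air-quality', base);
url.searchParams.set('latitude', '52.52');
url.searchParams.set('longitude', '13.405');
url.searchParams.set('hourly', 'uv_index,pm10,pm2_5,birch_pollen,grass_pollen');

const data = await fetch(url).then((r) => r.json());
console.log(data.hourly?.uv_index?.[0]);
```

One caveat: every variable in that `hourly` list has to appear in a sync worker's variable list, or the local instance has nothing to serve for it.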
What the box actually looks like
Real numbers from a single VPS (8 vCPU, 16 GB RAM, 150 GB disk) running everything — three sites, Caddy, an IP-geo service, monitoring, and the full Open-Meteo stack:
- Disk used by Open-Meteo data: ~50 GB and stable. The DEM is the largest one-time cost (~10 GB). The rolling weather data stays bounded by `--past-days`.
- The `open-meteo` serve container at steady state: ~1.1 GiB RAM, ~4 % of one core. Model files are mmapped, so the kernel page cache does most of the work — it's why `free -h` shows ~13 GiB sitting in `buff/cache`.
- Sync workers: burst CPU when a new model run lands (every 1–6 hours depending on model), idle the rest of the time.
- Initial sync: 1–2 hours for the first run. This is the only painful step.
This is a quieter footprint than we expected. Open-Meteo's storage format (here's their write-up) is a custom layout designed for exactly this kind of mmap-friendly point lookup, and you can feel that in the metrics.
The latency win
We weren't optimising for this — we just wanted the rate limits gone — but it turned into the most visible result.
Measured per-call upstream latency from our app container to Open-Meteo:
- Public API (`api.open-meteo.com`): ~100–110 ms
- Customer API (`customer-api.open-meteo.com`): comparable, slightly more consistent
- Local container (`http://open-meteo:8080`): ~10 ms
Per page render that's roughly 90–100 ms shaved off, twice (forecast + CAMS), most of it serial. For a server-rendered Next.js page that has to land HTML before the browser can paint, this is meaningful — we saw it directly in our TTFB numbers.
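Numbers like these need nothing fancier than a timer around the fetch — a sketch of the sort of wrapper that produces them (illustrative; the real output goes to the monitoring mentioned above, not the console):

```ts
// Rough per-call timing around an upstream fetch (illustrative).
async function timedFetch(url: string): Promise<Response> {
  const start = performance.now();
  try {
    return await fetch(url);
  } finally {
    const ms = Math.round(performance.now() - start);
    console.log(`[upstream] ${new URL(url).host} ${ms} ms`);
  }
}
```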
What we didn't migrate
Not everything makes sense to host yourself:
- Geocoding (`geocoding-api.open-meteo.com`). It's a separate service with its own dataset; we kept it on the public API and put a 1-hour LRU cache in front of it.
- Historical / climate / ensemble APIs. We don't use them. If you do, note that the Standard plan also doesn't include them — that's a Professional-tier thing.
- Marine / flood APIs. Same — out of scope for us.
The gotchas
A few things to know before you copy the docker-compose:
- Pick variables deliberately. Each `sync` command takes an explicit list of variables. Adding a variable later means re-syncing — don't be too minimal at first.
- Disk growth is mostly the DEM. The rolling weather data stays small if `--past-days` is small. Set this honestly — we use 3.
- There is no built-in API key / rate limit on the local instance. It binds to a private Docker network in our case; if you expose it to the internet, put a reverse proxy with auth or a rate limiter in front.
- Crawlers will still hit your app. Self-hosting Open-Meteo doesn't solve the crawler problem — it just stops the crawler problem from cascading into a third-party rate-limit problem.
- Attribution still applies. Open-Meteo's data is CC BY 4.0; you keep crediting the underlying data sources (DWD, NOAA, ECMWF, CAMS, …) regardless of how you host it.
Was it worth it?
For our shape of traffic — small site, three domains sharing the same upstream, lots of automated traffic — yes, comfortably:
- Capacity: effectively unbounded for our scale. We can let crawlers through if we ever change our mind without watching a meter.
- Latency: ~10 ms per upstream call, twice per render.
- Cost: one VPS that we already had, instead of a per-domain subscription.
- Operational risk: lower than expected. The image is one container, the syncs are independent, and a failed sync just means stale-but-still-served data for that model.
Written by the team behind uvi.today, pollen.today, SimpleMeteo and airindex.today. We post the engineering side on X — @SimpleMeteo.