Juan Torchia

Posted on • Originally published at juanchi.dev
Open Data and Creativity: How I Made Buenos Aires Trains Play Music


Live demo: amba-trenes-sonoros.vercel.app
Open source: github.com/JuanTorchia/amba-trenes-sonoros


The idea that stole my weekend

I've had this tab pinned for years: Conductor, by Alexander Chen. It's a visualization of the New York subway where every train passing through a station plucks a string. The MTA publishes a live GPS feed, and Chen wired it up to a synthesizer. The result is a piece that composes itself from the city's real traffic.

I always wanted to build something like that for Buenos Aires. The city that never sleeps, actually sounding. Eight train lines, hundreds of thousands of trips a day, all converted into collective music.

Last weekend I finally sat down to try. And I ran into something that happens to me constantly when working with Argentine public data: what I needed doesn't exist.


The dead end (and why it's not the end)

The NY project runs on GTFS-RT: a real-time extension of the GTFS standard that publishes each vehicle's position every few seconds.

I spent a couple of hours searching for the Argentine equivalent. Here's the inventory I ended up with:

| Source | What it has | Why it doesn't work |
| --- | --- | --- |
| Trenes Argentinos API | Real GPS positions | OAuth2 + signed agreement with the Ministry |
| SUBE API | Personal card balances and movements | No aggregate flow data |
| SOFSE GTFS-RT | — | No public version exists |
| Scraping official apps | Could yield partial data | Legal gray area, ethically sketchy |
| Static GTFS on datos.gob.ar | Scheduled timetables | ✅ Open, free, immaculate |

That last row is the one that changes everything. The scheduled timetable is open data, published by the state, no friction. It's not a live picture of the AMBA — it's the picture of the AMBA as it promises to be.

That's where the first architectural decision of the project happens, and it has nothing to do with code:

Accept what the available data allows, and say so out loud.

The entire project — the code, the UI, this post — is written with that honesty baked in. It's not real-time. It's the best you can do with data anyone can download.

And it turns out that's enough.


Thinking like an architect under constraints

If you work professionally with systems, this dilemma is daily bread: the ideal API doesn't exist, the budget falls short, the permission never arrives. The architect's craft isn't picking the ideal solution in a vacuum — it's picking the most honest solution within what's actually there.

What we did:

  1. Accept the constraint: we're not going to have real-time data.
  2. Reframe the problem: sonify the schedule, not the movement.
  3. Design so the constraint is visible: make sure the user understands what they're listening to.

With that settled, the rest of the project becomes possible.


The architecture in a diagram

```
┌────────────────────────┐
│  datos.gob.ar          │  Static GTFS (zip with CSVs)
│  (Ministry of          │
│  Transportation)       │
└────────────┬───────────┘
             │ fetch (once, at build-time)
             ▼
┌────────────────────────┐
│  scripts/build-gtfs.ts │  Parses routes.txt + trips.txt
│                        │  + stop_times.txt
└────────────┬───────────┘
             │ writes flat JSON
             ▼
┌────────────────────────┐
│  data/schedule.json    │  ~1-2MB, committed to the repo
└────────────┬───────────┘
             │ static import (Next.js bundler)
             ▼
┌────────────────────────┐
│  lib/schedule.ts       │  getActiveTrainsAt(minute)
│  (pure runtime)        │  no network, no filesystem
└────────────┬───────────┘
             │
             ├────────────────┐
             ▼                ▼
      UI (React)        Tone.js (audio in browser)
```

Four design decisions worth unpacking.

Decision 1: process the GTFS at build time, not at runtime

The zip is 10–20MB and every CSV needs parsing. We could do it on-demand when a user lands on the page, but that means:

  • High latency on every request.
  • A hard dependency on datos.gob.ar being up whenever someone visits.
  • The parser running in every serverless instance.

The alternative: run the parser once, at next build time, and produce a pre-chewed data/schedule.json that's much smaller and gets bundled right into the deploy. The resulting site is completely static: no backend, no database, no serverless functions. Vercel serves it straight from CDN.

The cost: the data "freezes" at build time. If timetables change tomorrow, you redeploy. For an art project, redeploying once a month is perfectly fine.
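One common way to wire this up is an npm `prebuild` hook, so the JSON is regenerated on every deploy. This is a sketch assuming a `tsx`-style TypeScript runner; the repo's actual scripts may differ:

```json
{
  "scripts": {
    "prebuild": "tsx scripts/build-gtfs.ts",
    "build": "next build"
  }
}
```

With this, `npm run build` always re-fetches and re-parses the GTFS first, so "redeploy to refresh the timetables" is literally one command.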

Decision 2: sonify in the browser, not on the server

We could generate WAVs or MP3s server-side and serve them. We could stream audio over WebSocket. We could use the raw Web Audio API.

We went with Tone.js on the client because:

  • Each user generates their own piece locally. Zero audio bandwidth cost.
  • Interactivity becomes trivial: muting a line, sliding the time, adjusting volume → all instant, nothing crosses the wire.
  • Tone.js abstracts ADSR, polyphony, and scheduling with a clean musical API.

The downside is autoplay policy: browsers won't play audio without a user gesture. We handle it with a big button that says "Listen to the AMBA". The gesture is part of the ritual.

Decision 3: one synth per line, not per train

First prototype: each train instantiated its own Tone.Synth. It worked fine with 20 active trains. It completely locked up with 200.

The fix: each line gets a single PolySynth that receives a chord per tick. If five Sarmiento trains are sounding simultaneously, the Sarmiento PolySynth receives five notes at once. Tone.js handles the polyphony internally.

It's a common pattern: group by identity instead of by individual. An architect recognizes it in a thousand contexts — rate limiting by user, connections pooled by host, etc.

Decision 4: major pentatonic, not chromatic

This one's a musical decision with architectural consequences.

Trains don't coordinate with each other. Each line fires notes independently. If we used a scale with semitones (any "normal" Western scale), two trains playing at the same moment could produce harsh dissonances — minor seconds, tritones.

The major pentatonic — C D E G A — has zero semitone intervals. Any simultaneous combination sounds consonant. It's the same trick used in kindergarten xylophones: no matter which bars you hit, it never sounds wrong.

By choosing the scale, you eliminate an entire category of musical bugs by design. It's a data-level decision, not a code-level one — in a distributed system where producers are independent, you tune the protocol so any combination is valid.
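The runtime code above maps each trip's start minute to a pitch via `noteForIndex`. A hypothetical sketch of such a mapping (the repo's actual function may differ), which guarantees every output lands on the C major pentatonic:

```typescript
// Hypothetical sketch of noteForIndex — not the repo's exact mapping.
// Every possible output is a degree of the C major pentatonic, so any
// simultaneous combination of trains is consonant by construction.
const PENTATONIC = ["C", "D", "E", "G", "A"] as const;

function noteForIndex(index: number): string {
  const degree = PENTATONIC[index % PENTATONIC.length];
  // Spread notes across octaves 3–5 so dense moments don't pile up
  // in a single register
  const octave = 3 + (Math.floor(index / PENTATONIC.length) % 3);
  return `${degree}${octave}`;
}
```

Note that the dissonance guarantee lives entirely in the `PENTATONIC` constant: swap in a chromatic scale and nothing crashes, it just stops sounding good.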


The full flow, looking at code

GTFS Parser (simplified):

```typescript
// scripts/build-gtfs.ts
const routes = parseCsv(zip.readTxt("routes.txt"));
const trips = parseCsv(zip.readTxt("trips.txt"));
const stopTimes = parseCsv(zip.readTxt("stop_times.txt"));

// For each trip, we calculate its start and duration
const tripTimes = new Map<string, { start: number; end: number }>();
for (const st of stopTimes) {
  const minute = hhmmssToMinutes(st.departure_time);
  const current = tripTimes.get(st.trip_id);
  if (!current) tripTimes.set(st.trip_id, { start: minute, end: minute });
  else tripTimes.set(st.trip_id, {
    start: Math.min(current.start, minute),
    end: Math.max(current.end, minute),
  });
}
```

This collapses stop_times.txt, which runs to millions of rows, into a Map with a few thousand entries: the start and end of each trip, in minutes of the day. Everything else is thrown away.
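The `hhmmssToMinutes` helper referenced above is worth a note: GTFS allows departure times past 24:00:00 for trips that spill over midnight, so you keep the raw minute count instead of wrapping it modulo 1440. A sketch of an assumed implementation (the repo's may differ):

```typescript
// Assumed implementation of hhmmssToMinutes.
// GTFS departure_time can exceed "24:00:00" for services that run past
// midnight (e.g. "25:15:00"), so we deliberately don't wrap at 1440.
function hhmmssToMinutes(hhmmss: string): number {
  const [h, m] = hhmmss.split(":").map(Number);
  return h * 60 + m; // seconds are dropped; minute resolution is enough here
}
```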

Runtime query:

```typescript
// lib/schedule.ts
export function getActiveTrainsAt(minute: number): ActiveTrain[] {
  const out: ActiveTrain[] = [];
  for (const trip of schedule.trips) {
    const end = trip.startsAtMinute + trip.durationMinutes;
    if (minute >= trip.startsAtMinute && minute < end) {
      out.push({
        tripId: trip.tripId,
        lineId: trip.lineId,
        note: noteForIndex(trip.startsAtMinute),
        progress: (minute - trip.startsAtMinute) / trip.durationMinutes,
      });
    }
  }
  return out;
}
```

Pure loop. No indexes, no cache. With ~5–10K trips the browser runs this in under 1ms. Optimizing before measuring is a trap.
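Feeding that function is equally simple: all it needs is the current minute of the day. A hypothetical bit of glue (not from the repo) that connects the wall clock to the query:

```typescript
// Hypothetical helper: turn "now" into the minute-of-day index
// that getActiveTrainsAt expects (0–1439).
function minuteOfDay(date: Date): number {
  return date.getHours() * 60 + date.getMinutes();
}

// Usage: const active = getActiveTrainsAt(minuteOfDay(new Date()));
```

The UI's time slider works the same way: it just overrides the minute instead of deriving it from `Date`.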

Sonification:

```typescript
// lib/sonify.ts (simplified)
playTick(active: ActiveTrain[]): void {
  // Group the notes of all active trains by line
  const byLine = new Map<string, string[]>();
  for (const t of active) {
    const notes = byLine.get(t.lineId) ?? [];
    notes.push(t.note);
    byLine.set(t.lineId, notes);
  }
  // One chord per line, with duplicate notes deduplicated
  for (const [lineId, notes] of byLine) {
    const voice = this.voices.get(lineId);
    if (!voice) continue; // line muted or not configured
    voice.synth.triggerAttackRelease(Array.from(new Set(notes)), "2n");
  }
}
```

triggerAttackRelease with an array of notes is Tone.js's mechanism for firing a chord. "2n" is the duration in musical notation (half note) — independent of BPM, flexible.


What actually came out

The demo lives at amba-trenes-sonoros.vercel.app and the full repo is on GitHub.

The patterns you hear are real:

  • 05:00–07:00: few trains, isolated notes, long silences. The system waking up.
  • 07:30–09:30: morning rush hour. Maximum density. All eight lines playing simultaneously, 15–20 notes in parallel.
  • 11:00–14:00: medium frequency. Each timbre comes through more clearly.
  • 17:30–20:00: evening rush hour. Just as dense as the morning, but psychologically different — people heading home.
  • 23:00–04:00: near silence. A midnight Sarmiento, sometimes.

It's, in the most literal sense, a city listening to itself move.


What's still open

  • v2 with a map: pull in stops.txt and draw animated dots with each train's approximate position.
  • Time-of-day modulation: lower tonic in the dead of night, brighter at noon.
  • Other cities: the code is completely dataset-agnostic. Fork it, swap in a new LINES config, and you've got Córdoba, Rosario, or Mendoza sounding.
  • GTFS-RT when it exists: if the Ministry ever opens the real feed, swapping the data source is 10 lines of code.
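For the "other cities" point, the per-line configuration might look something like this. The shape below is hypothetical — the repo's actual `LINES` type may differ — but it shows how little city-specific knowledge the system needs:

```typescript
// Hypothetical shape of a per-city LINES config; the repo's real type may differ.
interface LineConfig {
  id: string;         // GTFS route_id to match against routes.txt
  name: string;       // display name in the UI
  color: string;      // hex color for the UI
  baseOctave: number; // register this line's PolySynth plays in
}

// Example: a single-line config for Córdoba's Tren de las Sierras
const CORDOBA_LINES: LineConfig[] = [
  { id: "sierras", name: "Tren de las Sierras", color: "#2e7d32", baseOctave: 4 },
];
```

Everything downstream (the parser, the scheduler, the synth pool) only ever iterates over this array, which is what makes the fork-and-swap workflow realistic.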

Why I'm publishing this

Because I think small, weird projects are where you learn the most. Because public data is a gift sitting there waiting for someone to use it. Because I wanted to talk about not just what I built but why I built it this way: the tradeoffs, the constraints, the honest decisions.

If you code, grab the repo, change the scale, add a line, build your own version of your own city. The code is MIT, the data belongs to the Argentine state, and the music was always ours.


Useful links

  • Live demo: amba-trenes-sonoros.vercel.app
  • Source: github.com/JuanTorchia/amba-trenes-sonoros
