It started because I wanted to hear a pipe organ. There are six in all of Egypt. Six. For a country of 100 million people.
So I built one.
Three weeks later I have a fully polyphonic synthesizer that generates every sound from pure math in real-time, and a genuine question about why commercial organ plugins cost $400.
Why not just use samples
Every digital organ you've heard, whether in a plugin, a keyboard workstation, or a movie soundtrack, is almost certainly sample-based. Someone recorded a real pipe organ in a cathedral, captured every note on every stop, and packaged it into a library. Press a key, play back the matching recording.
I didn't want to do that because it felt like cheating on the learning objective (and basically impossible in Egypt). But the more I thought about it, the more I realized samples have a real limitation: they capture what a pipe sounds like in steady state. They miss everything that happens in the first 100ms when a note starts.
Real pipe organs are mechanical. When you press a key, a physical valve opens, wind rushes into the pipe before the pitch stabilizes, and there's a transient called chiff as the pipe "speaks." There's also a tiny mechanical click from the valve mechanism at note onset. These aren't recording artifacts. They're part of what makes an organ sound like an organ.
Sample libraries try to capture this by recording the attack too, but you can't reshape it afterward. With synthesis, it's just a parameter.
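One common way to make chiff a parameter is to mix a short, decaying noise burst into the attack. This is a sketch of that general idea, not Organum's actual implementation; the durations, filter, and decay rate are illustrative guesses.

```python
import numpy as np

SAMPLE_RATE = 44100

def chiff(duration_ms=60.0, seed=0):
    """Sketch of a chiff transient: a short noise burst with a fast decay.

    Real engines typically band-pass the noise around the pipe's pitch; a
    simple one-pole low-pass stands in here, purely for illustration.
    """
    rng = np.random.default_rng(seed)
    n = int(SAMPLE_RATE * duration_ms / 1000.0)
    noise = rng.standard_normal(n)
    # One-pole low-pass to take the harsh edge off the raw noise
    smoothed = np.empty(n)
    acc = 0.0
    for i in range(n):
        acc += 0.2 * (noise[i] - acc)
        smoothed[i] = acc
    decay = np.exp(-np.arange(n) / (n / 5.0))  # fast exponential fade-out
    return smoothed * decay
```

Because the burst is generated, not recorded, its length, brightness, and decay are all knobs you can turn per stop.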
How a pipe organ actually makes sound
Every stop is a different harmonic recipe. When air vibrates inside a pipe it produces a fundamental frequency plus harmonics at integer multiples, each at different amplitudes depending on the pipe's shape. What makes a Flute stop sound different from a Principal isn't some mysterious tonal quality. The Flute has almost no overtones because the stopped pipe physically suppresses them. The Principal has a balanced mix of harmonics. A Reed stop has a dense harmonic series that approaches a sawtooth wave, which is why it sounds buzzy.
So synthesizing these is mostly just: figure out the right harmonic amplitudes for whichever stops are active, spin up oscillators at those frequencies, add them together. That's the whole engine.
Organum runs up to 18 oscillators per voice, two per drawbar position (9 positions, plus a slightly detuned chorus partner for each). The chorus partner creates the beating pattern characteristic of string stops and adds subtle width to everything else. At 8 simultaneous voices that's potentially 144 oscillators running. On my machine it takes about 45ms to render a 4096-sample block. The budget is 93ms. Fine.
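The core of that engine fits in a few lines. This is a minimal sketch, not Organum's code: the harmonic recipes and the 3-cent detune are made-up illustrative values, and only five partials are shown instead of nine drawbar positions.

```python
import numpy as np

SAMPLE_RATE = 44100

# Hypothetical harmonic recipes: relative amplitude of each partial.
# These numbers are illustrative, not the actual Organum presets.
STOPS = {
    "flute":     [1.0, 0.05, 0.02, 0.0, 0.0],  # fundamental dominates
    "principal": [1.0, 0.5, 0.3, 0.2, 0.1],    # balanced harmonic mix
    "reed":      [1.0, 0.8, 0.65, 0.5, 0.4],   # dense, sawtooth-like series
}

def render_voice(freq, amps, n_samples, detune_cents=3.0):
    """One sine per harmonic, plus a slightly detuned chorus partner each."""
    t = np.arange(n_samples) / SAMPLE_RATE
    out = np.zeros(n_samples)
    detune = 2.0 ** (detune_cents / 1200.0)  # cents -> frequency ratio
    for k, a in enumerate(amps, start=1):
        if a == 0.0:
            continue
        out += a * np.sin(2 * np.pi * freq * k * t)
        out += 0.3 * a * np.sin(2 * np.pi * freq * k * detune * t)  # chorus
    return out / (sum(amps) * 1.3)  # rough normalization
```

The detuned partner beats against the main oscillator at a rate of a few hertz, which is where the shimmer comes from.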
The envelope rabbit hole
ADSR envelopes seem like the boring part until you try to implement them without causing clicks.
A linear attack has a discontinuous derivative at note onset. The slope goes from 0 to some positive value instantaneously, and the ear hears that as a click. The fix is a smoothstep curve, 3t² − 2t³, which has zero slope at both endpoints. The level eases in from silence with no abrupt slope change. No click.
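The curve itself is two lines of numpy. A sketch:

```python
import numpy as np

def smoothstep_attack(n_samples):
    """Smoothstep ramp 3t^2 - 2t^3: zero slope at both ends, so no click."""
    t = np.linspace(0.0, 1.0, n_samples)
    return 3 * t**2 - 2 * t**3
```

Compared with a linear ramp, the first few samples barely move, so there is no instantaneous slope change for the ear to latch onto.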
The more interesting problem is retrigger. What happens when you re-press a key that's already in release?
If you restart the attack from 0, the level jumps immediately from wherever the release was down to 0, then the attack begins. That's a click. I added a dedicated retrigger stage: a 15ms crossfade using the same smoothstep curve, going from the current release level back up to sustain. The level transition is smooth and the slope is continuous. The voice doesn't restart, oscillator phases are preserved, and the note just resurfaces from wherever it was.
Completely inaudible when it works. That's the goal.
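The retrigger stage reduces to a short interpolation between two levels. This is a sketch of the idea under the stated 15ms figure; the function name and signature are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 44100

def retrigger_ramp(current_level, sustain_level, ms=15.0):
    """Smoothstep crossfade from the release level back up to sustain.

    The voice is not restarted, so oscillator phases are untouched; only
    the envelope level moves, along a curve with continuous slope.
    """
    n = int(SAMPLE_RATE * ms / 1000.0)
    t = np.linspace(0.0, 1.0, n)
    s = 3 * t**2 - 2 * t**3
    return current_level + (sustain_level - current_level) * s
```

Because the ramp starts exactly at the current release level, there is no jump in level and no jump in slope at either end.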
The reverb
A dry pipe organ sounds like a keyboard in a recording booth. The room is doing a huge amount of perceptual work in making it feel like a real instrument.
The reverb is a stereo pair of Schroeder networks. Each channel runs a pre-delay, then 7 parallel comb filters whose outputs get summed, then 4 allpass filters to smooth the tail. Left and right channels use different prime-offset delay times so their reflections never coincide. If they lined up the stereo image would collapse to mono.
There are also three tuned sine oscillators simulating room resonance, the standing waves that build up in a large stone building at specific low frequencies. Without them, even a long reverb tail doesn't convey the physical weight of a cathedral. With them, the Gothic Basilica preset is unplayable at fast tempos because the decay from one chord bleeds into the next. Which is historically accurate, actually. Fast repertoire in large cathedrals wasn't common for exactly this reason.
The gain problem
When you play one note, gain is 1.0. Two simultaneous notes summed directly have double the amplitude. A full chord clips hard.
The fix is scaling the output by 1/√N, where N is the number of active voices. Perceived loudness stays roughly constant as you add notes.
The naive implementation applies this as a hard scalar at the start of each buffer. The problem: going from 1 voice to 2 voices drops the gain by 30% in a single sample. Audible. Every time you add a note there's a pop.
The actual fix is ramping the gain linearly across the entire buffer. Going from 1.0 to 0.707 over 4096 samples is completely inaudible. The ear can't track a gradual change that slow. Going from 1.0 to 0.707 in one sample is immediately obvious. So the mixer tracks the previous gain value and interpolates to the new target across every block.
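Both pieces, the 1/√N target and the per-block ramp, fit in a small mixer. A sketch, with a hypothetical class name:

```python
import numpy as np

class VoiceMixer:
    """Scale by 1/sqrt(active voices), ramped across the block, no pops."""

    def __init__(self):
        self.prev_gain = 1.0

    def mix(self, voice_buffers):
        mixed = np.sum(voice_buffers, axis=0)
        target = 1.0 / np.sqrt(len(voice_buffers))
        # Linear ramp from last block's gain to the new target; spread
        # over the whole buffer, the 1.0 -> 0.707 step is inaudible.
        ramp = np.linspace(self.prev_gain, target, mixed.shape[0])
        self.prev_gain = target
        return mixed * ramp
```

The only state the mixer carries between blocks is the previous gain, which is what makes the transition seamless across buffer boundaries.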
The real-time stuff
The audio callback has a hard deadline: produce 4096 samples every 93ms or the sound card glitches. Everything is organized around not missing that.
The GUI thread never touches audio state directly. All communication goes through a lock-free queue of event objects: note on/off, drawbar changes, stop toggles, room presets. The queue gets drained at the start of each render.
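The shape of that handoff looks roughly like this. It is a sketch: the event type and handler are hypothetical, and `queue.SimpleQueue` is a thread-safe stand-in rather than a true lock-free structure.

```python
import queue
from dataclasses import dataclass

@dataclass
class NoteEvent:
    kind: str    # e.g. "note_on", "note_off", "drawbar", "stop", "preset"
    data: tuple

events = queue.SimpleQueue()  # GUI thread is the producer

def drain_events(handle):
    """Called at the top of each audio callback, before rendering."""
    while True:
        try:
            ev = events.get_nowait()
        except queue.Empty:
            break
        handle(ev)
```

The key property is that the audio thread only ever does non-blocking gets; it can never stall waiting on the GUI.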
Everything in the audio path is vectorized numpy. No per-sample Python loops anywhere hot. The thing that made the biggest performance difference was pre-allocating work buffers in the oscillator. A naive implementation was doing roughly 1,000 numpy array allocations per callback. Pre-allocated arrays reused each call dropped that to near-zero.
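The pre-allocation pattern is worth showing, since it is the difference between hundreds of allocations per callback and none. A sketch of the idea, not Organum's actual oscillator, using numpy's `out=` parameter to write into buffers allocated once at construction:

```python
import numpy as np

BLOCK = 4096
SAMPLE_RATE = 44100

class Oscillator:
    """Sine oscillator that reuses pre-allocated buffers every block."""

    def __init__(self):
        self.phase = 0.0
        self._ramp = np.arange(BLOCK, dtype=np.float64)  # allocated once
        self._phases = np.empty(BLOCK)                   # reused each call
        self._buf = np.empty(BLOCK)                      # reused each call

    def render(self, freq):
        step = 2 * np.pi * freq / SAMPLE_RATE
        # out= writes in place instead of allocating fresh arrays
        np.multiply(self._ramp, step, out=self._phases)
        self._phases += self.phase
        np.sin(self._phases, out=self._buf)
        self.phase = (self.phase + step * BLOCK) % (2 * np.pi)
        return self._buf
```

The caller gets the same array object back every block, so nothing ever hits the allocator or the garbage collector on the hot path.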
The harmonic amplitude calculation is also computed once per block in the mixer and shared across all voices, rather than each voice redundantly recalculating it.
What it actually sounds like
Good. Genuinely good. The chiff on the reed stops has that breathy quality of real pipes speaking. A low chord through the cathedral reverb hits with physical weight. The Voix Humaine stop with tremulant active sounds like a bad impression of a human voice, which is exactly what the real pipe does too.
I keep loading the Gothic Basilica preset and playing Bach chorales at 2am. Not ideal for anyone sleeping nearby. Very good for me personally.
The code is on GitHub. Run uv sync, then uv run python main.py. If you want to hear what this actually does, pull all the stops with numpad 0, set the room to Gothic Basilica, and play something slow.