Arlej

What happens when AI tells you the code is fine but your gut says it isn't?

Halfway through building my audio synthesis engine, I hit a specific wall. Something sounded wrong. Not broken — just slightly off in a way I couldn't immediately point at. I asked AI to review the logic. It told me everything looked correct. I almost moved on.
I didn't. I kept pulling at it. Turned out the logic was wrong in a way that only becomes obvious when you understand what the code is supposed to be physically modelling. No test was going to catch it. AI had no way to hear what I was hearing. And if I'd trusted the output over my instinct, the whole synthesis engine would have been quietly wrong from that point forward.
That experience shaped how I finished the build. But I'm getting ahead of myself.

What I was actually building
I've been tracking HRV and sleep for a couple of years. Every morning I wake up to five numbers: heart rate variability, resting heart rate, deep sleep %, REM %, stress score. I got pretty good at reading them. What I couldn't figure out was what to actually do with them before sitting down to work or trying to sleep.
The obvious answer was soundscapes. Binaural beats. That whole world. But every app I tried was the same — you pick "focus" or "relax" and it plays the same thing every single time. That started bothering me more than it probably should have.
HRV 22 ms with stress at 78 is not the same nervous system state as HRV 61 ms after a clean night of sleep. They're completely different. Why would the same audio session help both of them? Nobody seemed to be asking that question.
So I spent a few months building something that does.
It's called Neurova. C#/.NET 8, WPF, runs fully offline. No account, no subscription, your data stays on your machine. You feed in your five daily metrics and it generates a unique audio session from them — physically modelled, every sound synthesised in real time from scratch. No samples. No DAW. Pure code.
Each metric drives a different part of the synthesis (there's a rough code sketch after this list):

HRV → primary binaural beat frequency. 1 ms change moves the beat by about 0.08 Hz.
Resting HR → carrier frequency and breath guide pace
Deep sleep % → sub-bass resonance and delta brainwave layer weight
REM % → theta layer frequency and which Solfeggio tone gets selected
Stress score → noise color ratio, ambient layer intensity, secondary beat frequency
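
For a feel of what that mapping could look like in code: the 0.08 Hz-per-ms slope is from the list above, but the base beat and the carrier rule below are placeholders I made up, not the engine's actual numbers.

```csharp
// Illustrative sketch only. The 0.08 Hz-per-ms slope is from the post;
// baseBeatHz and the carrier rule are invented placeholders.
double hrvMs = 42.0, baselineHrvMs = 50.0, restingHr = 58.0;

double baseBeatHz = 6.0;                                      // invented anchor beat
double beatHz = baseBeatHz + (hrvMs - baselineHrvMs) * 0.08;  // 1 ms HRV ≈ 0.08 Hz
double carrierHz = 100.0 + restingHr;                         // invented carrier rule

// A binaural beat is the difference between the tones in each ear.
double leftHz = carrierHz - beatHz / 2;
double rightHz = carrierHz + beatHz / 2;
```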

The engine also builds a baseline from your own historical data — so the fingerprint is relative to your norms, not some population average that has nothing to do with you.
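
Something like a per-user z-score would do it. This sketch is my guess at the shape, not the app's implementation, and LoadMetricHistory is a made-up stand-in:

```csharp
using System;
using System.Linq;

// Hypothetical baseline: score today against your own rolling history,
// not a population average. LoadMetricHistory is an invented helper.
double todayHrv = 42.0;
double[] history = LoadMetricHistory("hrv", days: 60);
double mean = history.Average();
double std = Math.Sqrt(history.Average(v => (v - mean) * (v - mean)));
double todayZ = std > 0 ? (todayHrv - mean) / std : 0.0;

static double[] LoadMetricHistory(string metric, int days) =>
    new double[] { 48, 51, 55, 47, 52, 50 };  // placeholder data for the sketch
```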

The synthesis part — which is the part I'm most nervous about
I'll be honest. Every sound being synthesised in real time from scratch is the thing I'm most proud of and also the thing I'm most uncertain about when I tell people.
Guitar uses Karplus-Strong. The delay line length is the sample rate divided by the frequency. Each output sample is the average of the previous two samples in the delay line; that averaging acts as the low-pass filter that models string damping. You seed the line with an attack noise burst and it genuinely sounds like a plucked string. Getting it stable across all frequencies without aliasing took way longer than I expected. There's a point where the math is obvious and the implementation still isn't working and you just have to sit with it.
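
Here's a minimal sketch of that loop — the textbook version, not my engine code, so the constants are illustrative:

```csharp
using System;

var samples = Pluck(220.0, sampleRate: 48000, seconds: 2.0);

// Textbook Karplus-Strong, stripped down; not the engine's actual code.
static float[] Pluck(double freq, double sampleRate, double seconds)
{
    int length = (int)Math.Round(sampleRate / freq);  // delay line = fs / f0
    var delay = new float[length];
    var rng = new Random(1234);

    // Seed the delay line with a noise burst: the pluck's attack transient.
    for (int i = 0; i < length; i++)
        delay[i] = (float)(rng.NextDouble() * 2.0 - 1.0);

    var output = new float[(int)(sampleRate * seconds)];
    int pos = 0;
    for (int n = 0; n < output.Length; n++)
    {
        output[n] = delay[pos];
        int next = (pos + 1) % length;
        // Two-point average = gentle low-pass: this is the string damping.
        delay[pos] = 0.5f * (delay[pos] + delay[next]);
        pos = next;
    }
    return output;
}
```
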
Piano has 7 inharmonic partials with stretch tuning. I added dual-string detuning at ±0.0012 semitones for the natural chorus effect real pianos get from their doubled strings. Hammer noise on attack is a white noise burst, under 10 ms, low-passed at 2 kHz.
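
The stretch comes out of the standard stiff-string formula. In sketch form (the detune figure is the real one from above; the inharmonicity coefficient B and the 1/k amplitude rolloff are placeholders):

```csharp
using System;

double sample = PianoSample(t: 0.01, f0: 220.0);

// 7 inharmonic partials, each played by two slightly detuned "strings".
// B and the 1/k rolloff are placeholders, not the app's actual tuning.
static double PianoSample(double t, double f0)
{
    const double B = 0.0004;            // inharmonicity coefficient (placeholder)
    const double detuneSemis = 0.0012;  // per-string detune, from the post
    double s = 0;
    for (int k = 1; k <= 7; k++)
    {
        // Stiff strings stretch the partials: f_k = k * f0 * sqrt(1 + B * k^2)
        double fk = k * f0 * Math.Sqrt(1 + B * k * k);
        double up = fk * Math.Pow(2, detuneSemis / 12.0);   // string 1
        double dn = fk * Math.Pow(2, -detuneSemis / 12.0);  // string 2
        s += (1.0 / k) * (Math.Sin(2 * Math.PI * up * t) + Math.Sin(2 * Math.PI * dn * t));
    }
    return s;
}
```
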
Rain is dual-band bandpass filtering of pink noise. A low band (300–800 Hz) carries the body of the rain; a high band (3–12 kHz) carries individual droplets. There's no LFO swell, and I actually spent a while on that decision. Real rain doesn't pulsate. A lot of rain ambience in apps pulses rhythmically, and it drives me crazy now that I've noticed it.
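
In sketch form, that's pink noise through two bandpass filters. The biquad below follows the RBJ cookbook bandpass and the pink noise is Paul Kellet's economy approximation; the band centers and Q values are my guesses, not the app's tuning:

```csharp
using System;

var rain = new Rain(48000);
double s = rain.NextSample();  // call once per audio frame

// Pink noise split into a "body" band and a "droplet" band.
class Rain
{
    readonly Random rng = new();
    double p0, p1, p2;  // Paul Kellet economy pink-noise filter state
    readonly Biquad body, drops;

    public Rain(double fs)
    {
        body = Biquad.BandPass(500, q: 0.8, fs);    // body of the rain (guess)
        drops = Biquad.BandPass(6000, q: 0.7, fs);  // individual droplets (guess)
    }

    public double NextSample()
    {
        double white = rng.NextDouble() * 2 - 1;
        p0 = 0.99765 * p0 + white * 0.0990460;
        p1 = 0.96300 * p1 + white * 0.2965164;
        p2 = 0.57000 * p2 + white * 1.0526913;
        double pink = (p0 + p1 + p2 + white * 0.1848) * 0.2;
        return body.Process(pink) + 0.4 * drops.Process(pink);
    }
}

// Standard RBJ cookbook bandpass (constant 0 dB peak gain).
class Biquad
{
    double b0, b1, b2, a1, a2, x1, x2, y1, y2;

    public static Biquad BandPass(double fc, double q, double fs)
    {
        double w = 2 * Math.PI * fc / fs, alpha = Math.Sin(w) / (2 * q);
        double a0 = 1 + alpha;
        return new Biquad
        {
            b0 = alpha / a0, b1 = 0, b2 = -alpha / a0,
            a1 = -2 * Math.Cos(w) / a0, a2 = (1 - alpha) / a0,
        };
    }

    public double Process(double x)
    {
        double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x; y2 = y1; y1 = y;
        return y;
    }
}
```

Note there's no LFO anywhere in that path: the texture comes entirely from the noise itself.
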
Ocean is three layers: shore surge as a low frequency swell, mid-frequency wash, and high spray transients on wave crests. Wave period comes from your resting heart rate. Slower heart rate, longer swells.
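
The heart-rate-to-period mapping might look something like this; the 8x multiplier is invented, not the real ratio:

```csharp
using System;

// Hypothetical HR-to-period mapping; the 8x multiplier is invented.
double swellGain = Swell(t: 3.0, restingHr: 52);  // multiply the low-passed noise layer by this

static double Swell(double t, double restingHr)
{
    double period = (60.0 / restingHr) * 8.0;  // HR 60 -> 8 s swells, HR 50 -> 9.6 s
    // Raised sine keeps the envelope in [0, 1]: surge in, surge out.
    return 0.5 * (1.0 + Math.Sin(2 * Math.PI * t / period));
}
```
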
The reverb is a convolution implementation using partitioned overlap-save FFT. That was probably the most complex thing I've written in a long time.
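
For anyone who hasn't met it: you chop the impulse response into equal partitions, keep a frequency-domain delay line of recent input spectra, and multiply-accumulate, so a seconds-long reverb tail runs at block latency instead of one giant FFT. Below is a stripped-down sketch of the uniform-partition scheme as I understand the textbook version; it isn't the engine's code, and a real implementation would use an optimized FFT and handle buffering more carefully.

```csharp
using System;
using System.Numerics;

var ir = new double[48000];  // 1 s impulse response at 48 kHz
ir[0] = 1.0;                 // unit impulse: output should equal the input
var conv = new PartitionedConvolver(ir, blockSize: 512);
double[] wet = conv.ProcessBlock(new double[512]);

// Uniformly partitioned overlap-save convolution, stripped to the core idea.
class PartitionedConvolver
{
    readonly int B, N;               // hop size and FFT size (N = 2B)
    readonly Complex[][] irSpectra;  // one spectrum per IR partition
    readonly Complex[][] fdl;        // frequency-domain delay line of input spectra
    readonly double[] window;        // sliding window of the last N input samples
    int head;                        // fdl slot holding the newest input spectrum

    public PartitionedConvolver(double[] ir, int blockSize)
    {
        B = blockSize; N = 2 * B;    // blockSize must be a power of two
        int parts = (ir.Length + B - 1) / B;
        irSpectra = new Complex[parts][];
        fdl = new Complex[parts][];
        for (int p = 0; p < parts; p++)
        {
            var buf = new Complex[N];  // one partition, zero-padded to N
            for (int i = 0; i < B && p * B + i < ir.Length; i++)
                buf[i] = ir[p * B + i];
            Fft(buf, inverse: false);
            irSpectra[p] = buf;
            fdl[p] = new Complex[N];
        }
        window = new double[N];
    }

    // Feed B dry samples, get B convolved samples back.
    public double[] ProcessBlock(double[] block)
    {
        // Slide the window: keep the previous B samples (the "save"), append the new block.
        Array.Copy(window, B, window, 0, B);
        Array.Copy(block, 0, window, B, B);

        var spec = new Complex[N];
        for (int i = 0; i < N; i++) spec[i] = window[i];
        Fft(spec, inverse: false);
        head = (head + fdl.Length - 1) % fdl.Length;
        fdl[head] = spec;

        // Multiply-accumulate: partition p meets the input spectrum from p blocks ago.
        var acc = new Complex[N];
        for (int p = 0; p < irSpectra.Length; p++)
        {
            Complex[] x = fdl[(head + p) % fdl.Length], h = irSpectra[p];
            for (int i = 0; i < N; i++) acc[i] += x[i] * h[i];
        }
        Fft(acc, inverse: true);

        // Overlap-save: the first B output samples are time-aliased, so discard them.
        var outBlock = new double[B];
        for (int i = 0; i < B; i++) outBlock[i] = acc[B + i].Real;
        return outBlock;
    }

    // Minimal iterative radix-2 FFT; a real engine would use an optimized library.
    static void Fft(Complex[] a, bool inverse)
    {
        int n = a.Length;
        for (int i = 1, j = 0; i < n; i++)
        {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) (a[i], a[j]) = (a[j], a[i]);
        }
        for (int len = 2; len <= n; len <<= 1)
        {
            double ang = 2 * Math.PI / len * (inverse ? 1 : -1);
            var wLen = new Complex(Math.Cos(ang), Math.Sin(ang));
            for (int i = 0; i < n; i += len)
            {
                var w = Complex.One;
                for (int k = 0; k < len / 2; k++)
                {
                    Complex u = a[i + k], v = a[i + k + len / 2] * w;
                    a[i + k] = u + v;
                    a[i + k + len / 2] = u - v;
                    w *= wLen;
                }
            }
        }
        if (inverse)
            for (int i = 0; i < n; i++) a[i] /= n;
    }
}
```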

Back to the AI problem
I tried using AI for the synthesis code. I kept trying, actually, because it would have been faster.
The problem is specific to this kind of work. When code is modelling something physically precise — string damping, wave periods derived from heart rate, inharmonic partials — wrong code doesn't fail. It just sounds slightly off. And if you don't own every line, you won't hear what's wrong. You'll hear something that sounds close enough and move on.
Beyond that, I kept running into the same frustration. AI would validate broken logic just as confidently as working logic. Something would feel wrong to me before any test caught it, and the response was always some version of "this looks correct." I had to learn to trust that instinct over the confidence of the output. Which, honestly, is a weird thing to have to learn.
I did use AI for other parts of the build — data handling, UI, export logic. That worked fine. But the synthesis engine I wrote myself, line by line, and I think the output is better for it even if I can't fully prove that.

The Story Engine — which made the whole thing weirder and more interesting
For a long time all sessions had the same shape. Quiet open, build, peak, resolve, dissolve. Only the instruments changed. The arc was identical every time.
I eventually built what I called the Story Engine, which turns your biometric fingerprint into a five-act dramatic structure. Act 1 is the arrival, character shaped by your recovery score. Act 2 is the build, density driven by HRV. Act 3 is the climax, and its shape comes from your stress score — it can be a sharp peak, a double wave, or a plateau depending on where you are. Act 4 is descent. Act 5 is resolution, and there are three kinds: earned resolution, graceful surrender, or open horizon.
The tension curve and the volume master curve are generated separately, which means emotional complexity and acoustic presence can move independently. Every instrument also has its own register arc — piano doesn't just get louder or quieter, it moves through its range. It might start mid-register, drop low during the deep phase, climb to its ceiling at the climax, then come back down. Guitar and strings have their own arcs. They're not all doing the same thing at the same time.
Same health data on two different days still produces a different structure because the date rotates the arc shape.
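
As a rough picture of the data structure (every name here is my invention for illustration, not the actual API):

```csharp
using System;

int seed = ArcSeed.For(metricsFingerprint: 12345);  // placeholder fingerprint value

// My guess at the Story Engine's output shape; all names are invented.
enum ClimaxShape { SharpPeak, DoubleWave, Plateau }            // Act 3, from stress
enum ResolutionKind { Earned, GracefulSurrender, OpenHorizon } // Act 5 endings

record StoryArc(
    double ArrivalCharacter,  // Act 1: shaped by recovery score
    double BuildDensity,      // Act 2: driven by HRV
    ClimaxShape Climax,       // Act 3 (Act 4, the descent, omitted for brevity)
    ResolutionKind Ending,    // Act 5
    double[] TensionCurve,    // emotional arc...
    double[] VolumeCurve);    // ...generated separately, so the two can diverge

static class ArcSeed
{
    // Date-seeded rotation: same metrics on a different day pick a different shape.
    public static int For(int metricsFingerprint) =>
        HashCode.Combine(metricsFingerprint, DateOnly.FromDateTime(DateTime.Today));
}
```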

What I actually learned
The physical modelling was the hardest part technically. The Story Engine was the most interesting problem — getting five completely different biometric dimensions to each shape a different aspect of a dramatic arc, without any of them stepping on each other, took a lot of iteration.
But the thing I didn't expect to learn was about trusting my own judgment mid-build. Every time I felt like something was wrong and actually went looking, I found a real problem. Every time I accepted the AI's confidence over my instinct, I paid for it somewhere down the line.
I don't think that's an argument against using AI. I think it's an argument for knowing when you're in territory where you can't outsource the feeling that something is off.

If you've built something where the output had to be physically precise — audio synthesis, simulation, hardware interfacing, anything like that — I'm curious how you handled that gap. The place where AI says it looks fine and your gut says it doesn't. Is that specific to this kind of work or have you hit it somewhere else too?
