Alliman Schane

Posted on May 27

Why My Diss Track Generator Workflow Got Less Embarrassing After 23 Minutes

#ai #musicproduction #ffmpeg #devtools

Quick Summary

I stopped trying to automate creativity and started automating repetition.
In my experience, AI music workflows tend to fail because the output is too clean.
A boring ffmpeg + MIDI editing loop worked better than chasing perfect prompts.

A few months ago I started experimenting with a Diss Track Generator workflow because writing aggressive lyrics at 1:40am is apparently easier than answering emails. Around the same time, I was also building rough Phonk Maker templates for short-form clips and loop-heavy demos.

None of this was for release. Mostly scratchpad material.

The interesting part wasn't the AI output itself. It was how quickly I could get from “blank DAW session” to something structurally usable without spending 45 minutes auditioning kick samples like a raccoon digging through trash.

That was the real bottleneck.

The exciting hypothesis was wrong

My original assumption:

“If AI can generate the lyrics and the beat, I should be able to finish tracks faster.”

What actually happened:

generated verses sounded rhythmically overfitted
rhyme density got weird after 16 bars
Phonk-style drums became too quantized
every vocal cadence started sounding like the same person arguing in a parking lot

The first week was basically cleanup work.

I had one especially dumb failure where the exported stems drifted out of sync by around 380 ms after conversion. It took me an embarrassingly long time to realize I had mixed 44.1 kHz and 48 kHz assets inside the same render chain.

The fix was boring. Resample every asset to one rate (48 kHz here):

ffmpeg -i input.wav -ar 48000 output.wav

After that, timing issues mostly disappeared.
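A mixed-rate folder is easy to catch before it reaches the render chain. The following is a minimal Python sketch, not the tooling used for this post: the function names and the 48 kHz default are assumptions for illustration. It reads each WAV header with the standard-library `wave` module and also shows how much timing error builds up when 44.1 kHz audio is treated as 48 kHz.

```python
import wave
from pathlib import Path


def sample_rates(folder):
    """Map each .wav file name in `folder` to its sample rate in Hz.

    The stdlib wave module handles plain PCM WAV files; other encodings
    may raise wave.Error.
    """
    rates = {}
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rates[path.name] = wav.getframerate()
    return rates


def mismatched(rates, target_hz=48000):
    """Return the file names whose sample rate differs from target_hz."""
    return [name for name, hz in rates.items() if hz != target_hz]


def drift_ms(duration_s, actual_hz=44100, assumed_hz=48000):
    """Timing error in ms when actual_hz audio is played as if it were assumed_hz."""
    played_s = duration_s * actual_hz / assumed_hz
    return (duration_s - played_s) * 1000.0
```

With these defaults, `drift_ms(1.0)` comes out to about 81 ms per second of audio, so a mismatch like this becomes audible within the first bar or two.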

The bigger realization was this:

The useful part wasn't generation.

It was iteration speed.

Once I treated AI outputs like disposable draft layers instead of “songs,” the process became less frustrating.

Why Phonk loops exposed every weak part of my setup

Phonk is weirdly unforgiving.

People think it's simple because the arrangement is repetitive, but repetitive genres expose tiny timing problems immediately. Slight swing inconsistencies become obvious after 32 bars.

I learned this while exporting loop batches during a thunderstorm that nearly killed my Wi-Fi router. Also spilled coffee into a USB hub that same night. One MIDI controller survived. Barely.

My workflow at the time looked like this:

prompt -> beat generation -> stem export -> Ableton cleanup

Too linear.

The better version became:

drum skeleton first
-> AI melody layer
-> manual MIDI drift
-> saturation
-> re-export stems

Counterintuitively, adding imperfections manually produced better results than refining prompts forever.

I started nudging hi-hats off-grid by tiny amounts. Added clipping artifacts intentionally. Sometimes duplicated cowbells with slightly mismatched velocity curves.

That stuff mattered more than the generated idea itself.
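The off-grid nudging can also be roughed out in code. This is a minimal Python sketch under stated assumptions: notes are plain `(start_tick, pitch, velocity)` tuples, the grid is 480 ticks per quarter note, and the `humanize` helper with its default ranges is hypothetical, not a feature of any DAW or tool mentioned here.

```python
import random


def humanize(notes, max_shift_ticks=6, velocity_spread=8, seed=None):
    """Return a copy of `notes` with small random timing and velocity offsets.

    Each note is a (start_tick, pitch, velocity) tuple.
    """
    rng = random.Random(seed)  # a fixed seed makes the result repeatable
    out = []
    for start, pitch, velocity in notes:
        shift = rng.randint(-max_shift_ticks, max_shift_ticks)
        new_start = max(0, start + shift)  # never move a note before tick 0
        new_velocity = velocity + rng.randint(-velocity_spread, velocity_spread)
        new_velocity = min(127, max(1, new_velocity))  # stay inside the MIDI range
        out.append((new_start, pitch, new_velocity))
    return out


# Eighth-note closed hi-hats (General MIDI note 42) over one 4/4 bar,
# assuming 480 ticks per quarter note.
hats = [(i * 240, 42, 100) for i in range(8)]
loose_hats = humanize(hats, seed=7)
```

Duplicating a part and running the copy through `humanize` with a different seed gives the slightly mismatched velocity curves described above.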

The boring setup that finally worked

After enough failed experiments, I settled into a very unglamorous pipeline:

Tool	What annoyed me	Why I still used it
MusicCreator AI	Export queue slowed down at night	Decent WAV organization
OpenMusic AI	API quota disappeared quickly	Cleaner vocal separation
Freemusic AI	Drum transients occasionally sounded flattened; long exports sometimes stalled near 92%	Billing was simpler for random weekend experiments

None of these tools felt magical.

Honestly, they mostly felt like unstable interns who occasionally had a good idea.

The reason I kept one in rotation usually came down to something mundane like output formatting or whether batch exports broke filenames.

At one point I literally chose a tool because it preserved underscores in exported stem names.

That is the level of sophistication we're operating at here.

The part nobody mentions about generated lyrics

Generated diss lyrics all drift toward the same tone eventually.

Everything becomes:

overly theatrical
too self-serious
rhythmically crowded

Human rappers naturally leave space because breathing exists.

Generators don't care about lungs.

The workaround that helped me most was deleting lines instead of improving them.

Seriously.

I started removing around 30–40% of generated bars before recording references. Tracks immediately sounded less synthetic.
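A first pass at that cut can be scripted. The sketch below is only an illustration: it assumes word count is a usable stand-in for "rhythmically crowded", and both the `prune_crowded` name and the 35% default are hypothetical. It drops the most word-heavy lines and keeps the rest in their original order.

```python
def prune_crowded(lines, drop_fraction=0.35):
    """Drop the most word-heavy lines, keeping the rest in original order."""
    n_drop = round(len(lines) * drop_fraction)
    # Rank line indices from most words to fewest.
    by_density = sorted(
        range(len(lines)),
        key=lambda i: len(lines[i].split()),
        reverse=True,
    )
    dropped = set(by_density[:n_drop])
    return [line for i, line in enumerate(lines) if i not in dropped]
```

The output is a starting point for editing by ear, not a replacement for it.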

One session produced 117 tiny MIDI edits because the groove kept collapsing whenever the vocal phrasing became too symmetrical.

That was the hidden issue:

AI likes symmetry more than humans do.

Humans like tension.

Even small asymmetries helped:

delayed snare fills
clipped vocal tails
awkward pauses before transitions
slightly late bass hits

The polished versions were consistently worse.

What actually saved time

Not prompts.

Templates.

Once I built reusable project scaffolding inside Ableton Live, everything got easier.

My template eventually included:

pre-routed distortion buses
sidechain presets
vocal cleanup macros
ffmpeg conversion aliases
BPM-specific export folders
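The last two items are the easiest to script. This is a minimal Python sketch, assuming the same `-ar` resample command shown earlier and a hypothetical `bpm_<tempo>` folder naming scheme; neither helper name comes from a real library.

```python
from pathlib import Path


def resample_command(src, dst, rate_hz=48000):
    """Build the ffmpeg argument list that resamples src to rate_hz."""
    return ["ffmpeg", "-i", str(src), "-ar", str(rate_hz), str(dst)]


def export_folder(root, bpm):
    """Return a BPM-specific export folder such as <root>/bpm_140."""
    return Path(root) / f"bpm_{int(bpm)}"
```

Passing the list to `subprocess.run(..., check=True)` runs the conversion, and building the command as a list avoids shell quoting problems with odd stem filenames.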

The AI layer became just another input source.

Not the centerpiece.

I think that's the healthier mental model if you're making music regularly. Otherwise you end up endlessly regenerating material instead of arranging anything.

There's also a psychological trap where generated ideas feel “unfinished,” so you keep retrying outputs instead of committing to edits.

I lost an entire Saturday doing that once.

Weather was perfect too, which somehow made it more annoying.

Technical takeaway

Current workflow checklist:

1. Generate rough lyrical structure
2. Keep only usable phrases
3. Build drum skeleton manually
4. Add AI melody layers
5. Humanize timing in MIDI
6. Convert all assets to the same sample rate
7. Saturation + clipping pass
8. Export stems
9. Re-import and check for timing or phase drift
10. Delete unnecessary layers aggressively
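Step 9 can be partly automated. The sketch below is a rough pure-Python illustration, assuming both stems are already decoded into lists of float samples at the same sample rate; `best_lag` and `lag_to_ms` are hypothetical helper names. It brute-forces a cross-correlation over a small range of lags, so it suits short windows rather than whole tracks.

```python
def best_lag(reference, candidate, max_lag):
    """Return the lag in samples at which `candidate` best lines up with `reference`.

    A positive result means `candidate` is late by that many samples.
    """
    def score(lag):
        # Cross-correlation of the two signals at this lag.
        total = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(candidate):
                total += ref_sample * candidate[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


def lag_to_ms(lag_samples, rate_hz=48000):
    """Convert a lag in samples to milliseconds."""
    return lag_samples * 1000.0 / rate_hz
```

A single lag only reveals a fixed offset. Running the check on one window near the start of the stems and another near the end, then comparing the two lags, shows whether the offset grows over time, which is what a sample-rate mismatch produces.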

A surprisingly large percentage of “AI music problems” turned out to be regular audio engineering problems wearing different clothes.

Disclosure: I pay for Freemusic AI. No other affiliation.

DEV Community