DEV Community

Thi Ngoc Nguyen

I Tried to Rebuild My Music Workflow Around AI. Here's Where It Actually Broke.

Weekday evening, co-working space, someone's Zoom call bleeding through the partition.


Last month I went to one of those "AI for Creatives" meetups — the kind where everyone is either very excited or very defensive, and the snacks are always slightly worse than the venue suggests.

A producer I'd never met before gave a short talk about how he'd rebuilt his entire music production workflow around AI generation tools. Hyperpop references fed into a neural style transfer pipeline, stems exported, layered in Ableton. He made it sound frictionless. He showed a demo track. It was genuinely good.

I went home and immediately tried to replicate his process.

That was four weeks ago. I'm still untangling the parts that didn't work.


What I was trying to do

My existing workflow is messy in the way that most independent creators' workflows are messy: accumulated over years, full of workarounds that made sense at the time, resistant to change because changing one thing tends to break three others.

I make music for short-form video content — mostly lo-fi adjacent stuff, occasionally something with more edge when the brief calls for it. The style fusion problem is one I hit constantly: a client wants something that feels like lo-fi study beats but with a little aggression, or hyperpop energy but not alienating, or — my personal favorite — like a rap track but without the rap.

These briefs are genuinely hard to execute manually. You're trying to hold two aesthetic registers simultaneously and find the seam where they meet without either collapsing into the other.

So when the producer at the meetup talked about using an AI Hyperpop Music Generator as a starting point for style fusion experiments, I was paying attention.


Week one: the setup

I started with a tool I'd been meaning to try properly. Fed it a style prompt: lo-fi hip hop × hyperpop, 90 BPM, melancholic but kinetic. Generated six variations.

Three were unusable — the fusion hadn't happened so much as the two styles were taking turns, like a conversation between people who aren't really listening to each other. Four bars of lo-fi, four bars of hyperpop, repeat.

Two were interesting. One was actually close to what I'd imagined.

I took the interesting ones into Logic and started working.

Then I hit the first wall.


The Slack thread I did not want to have

My collaborator — a sound designer I work with occasionally — pinged me mid-week:

Dara: hey did you export stems from that AI thing or just the full mix

Me: full mix, why

Dara: because I can't isolate the bass. it's baked in with something else. what even is that frequency

Me: I think it's a pitched-down vocal sample? the AI generated it

Dara: ok so we can't separate it

Me: no

Dara: cool cool cool. so what exactly are we working with here

Me: vibes, mostly

This is the part nobody talks about at meetups.

The AI Hyperpop Music Generator I was using didn't export stems. It exported a stereo mix. Which meant that everything interesting about the fusion — the specific way the lo-fi texture was sitting against the hyperpop percussion — was locked into a single file I couldn't meaningfully edit without degrading the audio.

I could layer on top of it. I could EQ it. I could not work inside it.
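
To make the stem problem concrete, here is a toy sketch in Python. It is an illustration only: the sample values and the names `mix_down`, `bass_a`, `vocal_a`, `bass_b`, and `vocal_b` are made up, and it uses a single channel of four samples rather than real stereo audio. It shows that mixing is a sum, and that two different pairs of stems can sum to exactly the same mix, so the mix alone does not tell you which stems produced it.

```python
# Toy illustration (hypothetical values): why a bounced mix cannot simply
# be split back into the stems that made it.

def mix_down(*stems):
    """Sum stems sample by sample into a single-channel mix."""
    return [sum(samples) for samples in zip(*stems)]

# Hypothetical split A: a bass line and a pitched-down vocal.
bass_a = [4, -2, 6, 0]
vocal_a = [1, 3, -1, 2]

# Hypothetical split B: two entirely different stems.
bass_b = [5, 0, 2, 1]
vocal_b = [0, 1, 3, 1]

mix_a = mix_down(bass_a, vocal_a)
mix_b = mix_down(bass_b, vocal_b)

print(mix_a)           # [5, 1, 5, 2]
print(mix_a == mix_b)  # True: same mix, different stems
```

A stereo file gives you two such sums instead of one, which is still far fewer than the number of layers inside it. Separation tools can estimate the parts, but they are working backwards from the sum.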


Week two: trying a different approach

I switched tools. Tried MusicArt this time, specifically because someone in the meetup chat had mentioned it handled stem-adjacent exports better for certain genre combinations.

It did, somewhat. The style fusion outputs were less coherent — the lo-fi × hyperpop blend felt more accidental, less intentional — but I could at least pull apart the percussion layer from the melodic layer and work with them separately.

The problem now was different: the fusion had happened, but it had happened generically. It sounded like what an algorithm thinks lo-fi × hyperpop sounds like, which is to say it sounded like a reference to the genre rather than an instance of it. Technically correct. Emotionally inert.

I kept working anyway.


Week three: where the AI rap generator came in unexpectedly

The brief I was actually trying to finish — the one that had started all of this — was for a brand video. The client had said, in the initial call, that they wanted something with energy and rhythm, almost like a rap track but instrumental.

I'd been ignoring the "rap track" part because I didn't think it was literal. Then I tried running the brief through an AI Rap Generator with the instrumental flag on, just to see what came out.

What came out was a beat structure — kick pattern, hi-hat rhythm, the implied space where a vocal would sit — that was more useful than anything I'd generated with the style-fusion approach. Not because it was better music. Because it gave me a skeleton to work against.

I took that skeleton, stripped it back, layered the lo-fi texture from my earlier experiments on top, and added the hyperpop elements as accents rather than as a competing layer.

It worked. Not perfectly, but it worked.

The client approved it on the first revision, which has never happened before and will probably never happen again.


What I actually learned, sitting in a co-working space at 6pm while someone nearby was still on a call about Q3 projections

The style fusion problem isn't really a generation problem. The AI can generate fusions. It generates them constantly, on request, with varying degrees of coherence.

The problem is integration. How do you take a generated fusion and make it editable? How do you preserve what's interesting about it while giving yourself room to push it further? How do you use the output as a starting point rather than a finished product?

These are workflow questions, not AI questions. And the meetup — like most meetups — had a lot to say about the generation step and almost nothing to say about what comes after.

I don't blame the producer who gave the talk. His workflow works for him. But it depends on a level of stem control and post-processing infrastructure that took him years to build, and he mentioned that only in passing, on a single slide.

I'd spent four weeks trying to shortcut to the interesting part.

You can't, really. Anyway.


The part I'm still figuring out

Style fusion with AI is genuinely useful as a reference generation tool. You can use it to hear what a combination might sound like before committing to building it manually. You can use it to show a client a direction without spending three days on a demo.

But the gap between "this is the direction" and "this is the finished track" is still mostly manual work. The AI gets you to the sketch faster. It doesn't close the distance between the sketch and the thing.

Maybe that distance is the job. I keep coming back to that.


The co-working space empties out around 6:30. The person on the Zoom call finally wraps up — I catch the words "circle back" and "bandwidth" before they pack up and leave. The room goes quiet in that specific way shared spaces do when the last person exits: suddenly aware of its own emptiness.

I save the project file. I close the laptop. Outside, the street is doing its ordinary evening thing — people walking somewhere, the particular amber of late-day light on concrete.

The track is done. It's fine. It might even be good.

I still don't fully understand how I got there.
