DEV Community

Melogen

I Asked ChatGPT, Claude, Gemini, and Grok to Write a Pop Song. Here's What Actually Happened.

Same prompt. Same key. Same tempo. Four completely different ideas about what music should feel like.

Last month I built Melogen — a web editor that lets you import AI-generated melodies, play them back on a piano roll, edit them, and export to MIDI. The core idea is simple: LLMs can already write structured music data. What's been missing is a decent way to actually hear what they produced.
Once the editor was working, the obvious question became: which AI writes the best music?
So I ran the experiment properly. Same prompt, all four major models, zero human edits, imported straight into Melogen. What I found wasn't just a ranking — it was a surprisingly clear window into how each model thinks about creativity.

The Setup
The prompt was deliberately specific:
```
Compose a 16-bar melody in G major, 120 BPM, 4/4 time signature,
pop style, uplifting mood. Output as JSON with fields: tempo, key,
bars, notes array with bar, offset_beats, dur_beats, midi, vel.
```
Every model received this exact text. No system prompt. No examples. No follow-up nudges.
The output format — a JSON array of notes with bar position, beat offset, duration, MIDI pitch, and velocity — is precise enough to be musically unambiguous. There's nowhere to hide vague intent. Either the notes work or they don't.
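For reference, a response that satisfies that schema has roughly this overall shape (the values here are illustrative, not any model's actual output):

```json
{
  "tempo": 120,
  "key": "G",
  "bars": 16,
  "notes": [
    { "bar": 1, "offset_beats": 0, "dur_beats": 1, "midi": 67, "vel": 96 }
  ]
}
```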
I imported each response directly into Melogen without touching a single value. What you hear is the raw machine output.

The Results
ChatGPT — "The Textbook"
ChatGPT produced the most technically correct output of the four. The JSON was clean, the music theory was flawless, and there was a clear ascending arc from G4 all the way up to D6 across the 16 bars — a logical, well-structured build.
Here's a representative excerpt from its output:
```json
{ "bar": 1, "offset_beats": 0, "dur_beats": 1, "midi": 67, "vel": 96 },
{ "bar": 1, "offset_beats": 1, "dur_beats": 1, "midi": 71, "vel": 98 },
{ "bar": 1, "offset_beats": 2, "dur_beats": 1, "midi": 74, "vel": 102 },
{ "bar": 1, "offset_beats": 3, "dur_beats": 1, "midi": 76, "vel": 104 },
```
Notice anything? Every single note is dur_beats: 1 — a quarter note. This pattern holds for almost the entire melody. Occasionally you get a half note. Eighth notes — the engine of pop rhythm — are nearly absent.
The result sounds like a music theory exercise. Correct, logical, and rhythmically inert. Pop music lives in the spaces between the beats. ChatGPT stayed rigidly on them.
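You can quantify this rhythmic flatness directly. Here's a small sketch that counts how often each duration value appears — a quick proxy for rhythmic variety (the sample notes are illustrative, not ChatGPT's actual output):

```javascript
// Count how often each note duration appears in a melody.
// A healthy pop melody shows a mix of 0.5, 1, and 2 beat values;
// a histogram dominated by a single duration is rhythmically flat.
function durationHistogram(notes) {
  const counts = {};
  for (const { dur_beats } of notes) {
    counts[dur_beats] = (counts[dur_beats] || 0) + 1;
  }
  return counts;
}

// Illustrative sample: three quarter notes and one half note.
const sample = [
  { dur_beats: 1 }, { dur_beats: 1 }, { dur_beats: 1 }, { dur_beats: 2 },
];
console.log(durationHistogram(sample)); // → { '1': 3, '2': 1 }
```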
Verdict: Perfect for a schema validator. Not great for a dancefloor.

Claude — "The Songwriter"
Claude was the most surprising. Where ChatGPT treated rhythm as a formality, Claude leaned into it:
```json
{ "bar": 1, "offset_beats": 0, "dur_beats": 1, "midi": 74, "vel": 82 },
{ "bar": 1, "offset_beats": 0.5, "dur_beats": 0.5, "midi": 76, "vel": 80 },
{ "bar": 1, "offset_beats": 1, "dur_beats": 1, "midi": 79, "vel": 90 },
{ "bar": 1, "offset_beats": 2, "dur_beats": 0.5, "midi": 78, "vel": 84 },
{ "bar": 1, "offset_beats": 2.5, "dur_beats": 0.5, "midi": 76, "vel": 80 },
{ "bar": 1, "offset_beats": 3, "dur_beats": 1, "midi": 74, "vel": 86 },
```
Half-beat offsets. Syncopated pickups. Long-short-long phrasing. This is how a vocalist actually breathes. You can almost map words onto these notes without changing anything.
Claude also had the most natural velocity shaping — verse notes sitting around 78–88, chorus pushing into 90–100, with smooth transitions rather than sudden jumps.
The problem? The melodic ceiling is too low. The entire melody peaks at G5 (MIDI 79), just one octave above the tonic. For a piece described as "uplifting," you need that moment where the pitch opens up and the chorus lifts. Claude never goes there.
There was also a format issue: Claude wrapped its JSON in inline // comments. Technically invalid JSON. If you're piping this into a strict parser, it fails immediately.
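If you do want to salvage comment-laden output like this, a naive pre-processing pass is enough — a sketch, assuming no string value in the payload ever contains `//` (true of these melody JSONs, where every value is a number):

```javascript
// Strip "//" line comments so the text becomes valid JSON again.
// Naive on purpose: it would also mangle "//" inside string values,
// which these numeric melody payloads never contain.
function stripLineComments(text) {
  return text.replace(/\/\/[^\n]*/g, '');
}

const raw = '{ "midi": 74, "vel": 82 } // pickup note';
console.log(JSON.parse(stripLineComments(raw)).midi); // → 74
```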
Verdict: The most musical output. Wrong style for the brief. Would excel on a different prompt.

Gemini — "The Architect"
Gemini produced the most structurally disciplined melody. Bars 4, 8, 12, and 16 all land on whole notes — clear cadence points that divide the piece into four balanced phrases. This is exactly what a human arranger would do to give a piece shape.
```json
{ "bar": 4, "offset_beats": 0, "dur_beats": 4, "midi": 74, "vel": 90 },
{ "bar": 8, "offset_beats": 0, "dur_beats": 4, "midi": 67, "vel": 100 },
{ "bar": 12, "offset_beats": 0, "dur_beats": 4, "midi": 74, "vel": 95 },
{ "bar": 16, "offset_beats": 0, "dur_beats": 4, "midi": 67, "vel": 110 },
```
The problem is that structural correctness isn't the same as emotional impact. The melody spends most of its time in the mid-to-low register, makes a single push upward in bar 11, and then retreats. There's no sustained climax, no energy arc. The velocity range is wide (80–120) but not attached to any meaningful musical shape — the loud moments don't land where you'd expect them.
Gemini also introduced a barline overflow bug in bar 13: offset_beats: 3.0 + dur_beats: 2 = 5 beats in a 4/4 bar. This breaks timeline-based editors. A small oversight, but a real one.
Verdict: The best skeleton. Needs a composer to fill it in.

Grok — "The Pop Star"
Grok's output was the most immediately engaging. From bar 9, the energy shifts register entirely — a clear verse-to-chorus transition that sounds deliberate, not accidental:
```json
{ "bar": 9, "offset_beats": 0, "dur_beats": 1, "midi": 81, "vel": 104 },
{ "bar": 9, "offset_beats": 1, "dur_beats": 1, "midi": 83, "vel": 105 },
{ "bar": 9, "offset_beats": 2, "dur_beats": 1, "midi": 86, "vel": 106 },
{ "bar": 9, "offset_beats": 3, "dur_beats": 1, "midi": 84, "vel": 104 },
```
MIDI 86 is D6 — an octave and a fifth above the tonic G4. The arc from G4 in bar 1 up to D6 in bar 9, sustained across bars 9–12, and then resolved back down is textbook pop structure. Verse establishes, chorus explodes, outro resolves.
The velocity progression tells the same story: 96 in bar 1, climbing to 108 by the chorus, never abruptly. The model understood that an "uplifting" piece should build.
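(If you're translating these MIDI numbers by hand, a tiny helper makes the pitch arc easy to read — middle C is MIDI 60, written C4 in scientific pitch notation:)

```javascript
// Convert a MIDI note number to scientific pitch notation.
// Middle C = MIDI 60 = "C4"; each octave spans 12 semitones.
const NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];
function midiToName(midi) {
  return NAMES[midi % 12] + (Math.floor(midi / 12) - 1);
}

console.log(midiToName(67)); // → "G4" (the tonic)
console.log(midiToName(86)); // → "D6" (Grok's chorus peak)
```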
There's one small issue: bar 13 has a note that exceeds the barline by a beat. In practice, this is a trivial fix in any editor — drag the note endpoint, adjust the duration, done. For a musician, it's a three-second correction. The underlying creative idea remains completely intact.
Verdict: The melody you'd actually want to finish producing.

What Each Model Gets Right
The most interesting thing about this experiment isn't the ranking — it's what each model's choices reveal about its internal model of "music."
ChatGPT treats music as a structured sequence with correct harmonic relationships. It optimizes for theoretical correctness. The output is exactly what you'd expect from a model trained on music theory textbooks.
Claude treats music as language — as something spoken by a human voice with natural rhythm and breath. It optimizes for phrasing. The result sounds singable because it was generated with the same attention to micro-timing that a vocalist would use.
Gemini treats music as architecture — as a system of phrases and cadences with clear macro-level organization. It optimizes for form. The output has a skeleton any composer would recognize, but no flesh.
Grok treats music as energy and emotional arc — as something that should feel like something. It optimizes for impact. The result is commercially viable in a way the others aren't.
None of these is wrong. A complete piece of music needs all four. The interesting question is which one you reach for first depending on what you're trying to make.

The Engineering Reality
Beyond musicality, there are practical concerns for anyone trying to use LLM-generated music in a real pipeline.
JSON validity: Claude's inline comments break standard parsers. A simple regex pre-processing step fixes this, but it's an extra step. The other three produce clean JSON.
Barline integrity: Both Gemini and Grok produced notes that overflow their bar boundaries (offset + duration > beats_per_bar). Any bar-based editor will either throw an error or silently misalign the timeline. Validation at import time is essential.
Velocity range: MIDI velocity maxes at 127. ChatGPT's output peaks at 115, which is technically fine but leaves little headroom. If you're layering multiple instruments, you'll clip.
Field naming: Claude used "key": "G major" where the others used "key": "G". Minor, but worth normalizing if you're building a parser that handles multiple models.
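A normalizer for that field is a one-liner — a sketch, assuming keys arrive as either a bare tonic ("G") or tonic plus mode ("G major", "A minor"):

```javascript
// Normalize key strings from different models into a tonic + mode flag.
// "G" is treated as major by default, matching the other models' convention.
function normalizeKey(key) {
  const [tonic, mode] = key.trim().split(/\s+/);
  return { tonic, minor: /min/i.test(mode || '') };
}

console.log(normalizeKey('G major')); // → { tonic: 'G', minor: false }
console.log(normalizeKey('G'));       // → { tonic: 'G', minor: false }
console.log(normalizeKey('A minor')); // → { tonic: 'A', minor: true }
```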
A minimal validation function before import saves a lot of downstream debugging:
```javascript
function validateNotes(notes, beatsPerBar = 4) {
  return notes.map(note => {
    // Clamp velocity to the MIDI spec range (0–127)
    note.vel = Math.min(127, Math.max(0, note.vel));

    // Flag and truncate barline overflows
    if (note.offset_beats + note.dur_beats > beatsPerBar) {
      console.warn(`Bar ${note.bar}: note overflows barline`);
      note.dur_beats = beatsPerBar - note.offset_beats;
    }

    return note;
  });
}
```

Final Rankings
For this specific task — uplifting pop, 16 bars, G major — here's how the models rank across two dimensions:
Musical quality (does it sound like a real pop song?):
Grok > Claude > ChatGPT > Gemini
Engineering reliability (can you pipe it directly into a parser?):
ChatGPT > Gemini > Claude > Grok
The ideal workflow, if you're building something production-ready, is to use Grok or Claude for the creative output and ChatGPT's format discipline as a validation target — or just run everything through a validation layer before import.

What This Means for AI Music Tools
We're at an interesting inflection point. LLMs can clearly generate musically coherent structured data. The gap isn't in the generation quality — it's in the tooling around it.
Most people who ask an AI to write music see a wall of JSON and stop there. They have no way to hear it, no way to edit it, no way to understand whether it's actually good. The creative output is trapped in a format that only makes sense to a parser.
That's the problem Melogen is trying to solve — not to replace the AI, but to make its output audible and editable. The models generate. You shape. The editor bridges the gap.
If you want to try any of the melodies from this article, they're all importable directly at melogen.app. Paste the JSON, hit play, and see what your ears think.

The full JSON for all four melodies is available on request. If you run the same experiment and get different results, I'd genuinely like to see them — the models update frequently and the outputs drift.

Tags: ai music chatgpt claude llm midi music-production generative-ai javascript web-audio
