Suno Pro costs $10/month for 500 songs. Hit your limit on day 12, and you wait until the billing cycle resets. Udio charges $30/month with credit walls that kick in during peak hours.
ACE-Step takes a different route entirely. It's an open-source model (Apache 2.0) that you can call through an API and pay per track. Through deAPI, a 3-minute track costs $0.002 on the fastest variant - and every output is yours commercially.
This guide covers how to prompt ACE-Step, what each model variant is good at, and includes copy-paste Python code to generate your first track.
What Makes ACE-Step Different
Suno's Custom Mode and Udio both offer style tags, lyrics with structure markers, and post-generation editing. They're capable tools for making music in a browser.
ACE-Step solves a different problem. It's an open-source model you call through a REST API - which means it plugs into your code, your pipeline, your product. A game studio can generate 50 level soundtracks in a batch script. A SaaS can offer "generate background music" as a feature without licensing a third-party UI.
The prompting model splits input into two fields:
Tags shape the sound: genre, instruments, BPM, vocal style, and production era, written as comma-separated keywords. Think of this as the mixing board.
Lyrics shape the structure: you write the words with section markers like [verse], [chorus], [bridge]. Leave the lyrics field empty (or use [Instrumental]) for instrumental tracks.
Changing tags doesn't affect lyrics, and vice versa. Keep the same words, swap "lo-fi hip-hop" for "progressive techno," and the model rebuilds the entire arrangement around the new genre.
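To make the separation concrete, here is a minimal sketch that builds two request payloads sharing one set of lyrics but carrying different tags. The field names `caption` and `lyrics` follow the request in the full example later in this guide; the helper `make_payload` and the sample lyrics are hypothetical and exist only for illustration.

```python
# Illustrative lyrics; any text with section markers would do.
LYRICS = """[verse]
City lights are fading
Morning on the way
[chorus]
Hold on to the night
"""


def make_payload(tags: str, lyrics: str) -> dict:
    """Return the two prompt fields; other request parameters are omitted."""
    return {"caption": tags, "lyrics": lyrics}


lofi = make_payload("lo-fi hip-hop, rhodes piano, laid-back, 88 bpm", LYRICS)
techno = make_payload("progressive techno, 909 drums, acid bass, 134 bpm", LYRICS)

# Same words, different sound: only the caption field changes.
print(lofi["lyrics"] == techno["lyrics"])    # True
print(lofi["caption"] == techno["caption"])  # False
```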
Three Models, One Endpoint
deAPI exposes three ACE-Step 1.5 variants through the same /api/v1/client/txt2music endpoint:
| Model | Slug | Speed | Best for |
|---|---|---|---|
| Turbo (3.5B) | AceStep_1_5_Turbo | Fastest | Prototyping, A/B testing 10 variants in a minute |
| XL Turbo (~10B, INT8) | AceStep_1_5_XL_Turbo_INT8 | Fast | Production-ready tracks with richer instrumentation |
| Base (3.5B) | AceStep_1_5_Base | Slowest | Final renders, full control over steps and guidance |
Turbo and XL Turbo run at fixed 8 steps with CFG=1. Base lets you dial steps up to 100 and guidance from 3 to 20 - slower generation, but noticeably richer output.
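One way to keep these limits out of your head is a small client-side check, sketched below. The ranges come from the paragraph above; the function name, the default values for Base (40 steps, guidance 7), and the lower bound of 1 step are assumptions, and whether the API itself rejects out-of-range values is not something this guide confirms.

```python
# Turbo variants run at a fixed 8 steps with CFG=1 (per the text above).
FIXED = {"inference_steps": 8, "guidance_scale": 1}
TURBO_MODELS = {"AceStep_1_5_Turbo", "AceStep_1_5_XL_Turbo_INT8"}
BASE_MODEL = "AceStep_1_5_Base"


def sampling_params(model: str, steps: int = 40, guidance: float = 7.0) -> dict:
    """Return inference_steps and guidance_scale that fit the chosen variant."""
    if model in TURBO_MODELS:
        # Caller preferences are ignored: the Turbo settings are fixed.
        return dict(FIXED)
    if model != BASE_MODEL:
        raise ValueError(f"unknown model slug: {model}")
    if not 1 <= steps <= 100:
        raise ValueError("Base accepts at most 100 steps")
    if not 3 <= guidance <= 20:
        raise ValueError("Base guidance should be between 3 and 20")
    return {"inference_steps": steps, "guidance_scale": guidance}


print(sampling_params("AceStep_1_5_Turbo", steps=60))
print(sampling_params("AceStep_1_5_Base", steps=50, guidance=5))
```

Merging the returned dict into the request payload keeps model-specific settings in one place when you switch variants.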
What It Costs
We pulled these numbers directly from deAPI's pricing API on June 18, 2026:
| Model | 1-min track | 3-min track | 5-min track |
|---|---|---|---|
| Turbo (8 steps) | $0.0009 | $0.002 | $0.003 |
| XL Turbo (8 steps) | $0.001 | $0.0024 | $0.0037 |
| Base (40 steps) | $0.0037 | $0.0104 | $0.017 |
For context, here's what subscription platforms charge:
| | Suno Pro | Udio Standard | deAPI (XL Turbo) |
|---|---|---|---|
| Monthly cost | $10/mo | $30/mo | Pay per track |
| Tracks included | ~500 | ~1,200 | Unlimited |
| 3-min track cost | ~$0.02 (within limit) | ~$0.025 | $0.0024 |
| After limit | Wait or upgrade | Buy credits | Same price, always |
| Commercial rights | Pro plan only | Standard plan only | Always (Apache 2.0) |
| Source model | Closed | Closed | Open-source |
A 3-minute track on XL Turbo costs roughly 8× less than the per-song equivalent on Suno Pro. The gap widens in production workflows - a game studio generating 50 background tracks pays $0.12 on deAPI. On Suno, that's either one month of Pro (if you have songs left) or $0.50/track on top-up credits.
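The arithmetic behind those figures is short enough to check yourself. This sketch uses only the numbers from the tables above; the variable names are illustrative.

```python
# Figures copied from the pricing and comparison tables above.
SUNO_PRO_MONTHLY = 10.00   # USD per month
SUNO_PRO_SONGS = 500       # songs included per month
XL_TURBO_3MIN = 0.0024     # USD per 3-minute track

suno_per_song = SUNO_PRO_MONTHLY / SUNO_PRO_SONGS   # 0.02
ratio = suno_per_song / XL_TURBO_3MIN               # about 8.3
batch_cost = 50 * XL_TURBO_3MIN                     # about 0.12

print(f"Suno Pro per song:     ${suno_per_song:.3f}")
print(f"Price ratio:           {ratio:.1f}x")
print(f"50 tracks on XL Turbo: ${batch_cost:.2f}")
```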
Writing Tags That Actually Work
A solid tag set runs 5-12 keywords. Start broad, get specific.
The formula
[genre], [sub-genre], [mood], [2-3 instruments],
[vocal type], [production style], [era], [BPM] bpm
Examples that produce good results
Lo-fi hip-hop:
lo-fi hip-hop, boom bap, dusty drums, vinyl crackle,
jazz sample, rhodes piano, male vocals, laid-back, warm, 88 bpm
Dark progressive techno:
progressive techno, dark, hypnotic, driving,
analog modular synth, 909 drums, acid bass,
atmospheric pad, no vocals, wide stereo, 134 bpm
Chamber folk ballad:
chamber folk, acoustic ballad, intimate, melancholic,
fingerpicked classical guitar, cello, soft female vocals,
warm analog, 68 bpm
Three rules
Instruments beat adjectives. "Grand piano, upright bass, brushed drums" gives the model concrete targets. "Sophisticated, elegant, refined" gives it nothing to work with.
Cap your tags at 12. Beyond that threshold, keywords dilute each other - eight precise tags outperform twenty vague ones.
Include BPM in the tags too, even when you set the bpm parameter separately. The model reads both inputs, and consistency between them locks the rhythm tighter.
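If you assemble tags in code, a small helper can enforce the cap and keep the BPM tag consistent with the bpm parameter. The sketch below is hypothetical: `build_caption` is not part of any SDK, and counting the BPM tag toward the 12-tag cap is an assumption.

```python
def build_caption(keywords: list[str], bpm: int, max_tags: int = 12) -> str:
    """Join keywords into a comma-separated tag string ending in '<bpm> bpm'."""
    seen, tags = set(), []
    for word in keywords:
        # Normalise spacing and case; drop empties and duplicates, keep order.
        tag = word.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    tags.append(f"{bpm} bpm")
    if len(tags) > max_tags:
        raise ValueError(f"{len(tags)} tags exceeds the cap of {max_tags}")
    return ", ".join(tags)


caption = build_caption(
    ["chamber folk", "acoustic ballad", "intimate", "melancholic",
     "fingerpicked classical guitar", "cello", "soft female vocals",
     "warm analog"],
    bpm=68,
)
print(caption)
```

Passing the same `bpm` value to both this helper and the request's bpm parameter keeps the two inputs from drifting apart.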
Structuring Lyrics
Every section sits inside square brackets:
[verse] - verse
[chorus] - hook / refrain
[bridge] - contrasting section
[inst] - instrumental break or solo
[intro] - instrumental opening
[outro] - ending
An empty section (marker with no text after it) becomes an instrumental passage. The [inst] marker between sections triggers a solo - particularly effective in jazz and rock prompts.
Keep lines short: 4-8 syllables. The model maps syllables to beats, and lines longer than 12 syllables cause the vocal timing to break apart. Short lines read worse on paper but sing better in practice.
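A quick pre-flight check can flag lines that are likely too long before you spend a generation on them. The sketch below estimates syllables by counting runs of vowels, which is a rough English-only heuristic (it miscounts silent "e" and some vowel pairs), so treat the result as a hint rather than a measurement. Both function names are hypothetical.

```python
import re


def estimate_syllables(line: str) -> int:
    """Very rough English estimate: one syllable per run of vowels."""
    return len(re.findall(r"[aeiouy]+", line.lower()))


def long_lines(lyrics: str, limit: int = 12) -> list[str]:
    """Return lyric lines whose estimated syllable count exceeds the limit."""
    flagged = []
    for line in lyrics.splitlines():
        text = line.strip()
        # Skip blank lines and section markers such as [verse] or [chorus].
        if not text or re.fullmatch(r"\[[^\]]+\]", text):
            continue
        if estimate_syllables(text) > limit:
            flagged.append(text)
    return flagged


draft = "[verse]\nla la la la\n" + "la " * 14
for line in long_lines(draft):
    print(f"Too long ({estimate_syllables(line)} est. syllables): {line}")
```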
ACE-Step supports 19 languages. English, Chinese, Japanese, Spanish, German, and French perform best. You can mix languages mid-song - XL Turbo handles English verses with a Japanese chorus in j-rock style without stumbling.
Full Working Example (Python)
Here's a complete script that generates a lo-fi hip-hop track:
import requests
import time

API_KEY = "your_api_key_here"
BASE = "https://api.deapi.ai/api/v1/client"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Accept": "application/json"
}

# Submit the job: tags go in "caption", structure goes in "lyrics".
response = requests.post(f"{BASE}/txt2music", headers=HEADERS, data={
    "model": "AceStep_1_5_XL_Turbo_INT8",
    "caption": (
        "lo-fi hip-hop, boom bap, dusty drums, vinyl crackle, "
        "jazz sample, rhodes piano, male vocals, rap vocals, "
        "laid-back, warm, 88 bpm"
    ),
    "lyrics": """[intro]
[verse]
Smoke above the rooftops as the sun begins to fall
Every neon sign reflecting off the barber shop wall
[chorus]
This is how it goes, this is how it always been
Writing every story down from where I'm standing in
[outro]""",
    "duration": 120,
    "bpm": 88,
    "inference_steps": 8,
    "guidance_scale": 1,
    "seed": -1,
    "format": "mp3"
})
# Fail early on HTTP errors (bad key, invalid parameters) instead of a KeyError below.
response.raise_for_status()
request_id = response.json()["data"]["request_id"]
print(f"Submitted: {request_id}")

# Poll every 5 seconds until the job finishes or fails.
while True:
    status = requests.get(
        f"{BASE}/request-status/{request_id}", headers=HEADERS
    ).json()
    if status["data"]["status"] == "done":
        print(f"Download: {status['data']['result_url']}")
        break
    if status["data"]["status"] == "error":
        print(f"Error: {status['data'].get('error')}")
        break
    print(f"Status: {status['data']['status']}")
    time.sleep(5)
Switching models is one parameter change:
"model": "AceStep_1_5_Turbo" # fastest, draft quality
"model": "AceStep_1_5_XL_Turbo_INT8" # balanced (recommended)
"model": "AceStep_1_5_Base" # highest fidelity
For Base, increase inference_steps to 40-60 and set guidance_scale between 3 and 15 for more controlled output.
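The polling loop in the script above runs until the job reports "done" or "error". For longer Base renders, you may want a deadline as well. The sketch below wraps the same status check in a reusable function; `wait_for_result` is a hypothetical helper, the 600-second default is an arbitrary assumption, and the status fetcher is passed in as a callable so the logic can be exercised without network access.

```python
import time
from typing import Callable


def wait_for_result(fetch_status: Callable[[], dict],
                    interval: float = 5.0,
                    timeout: float = 600.0) -> str:
    """Poll until the job reports 'done' and return its result URL."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()["data"]
        if data["status"] == "done":
            return data["result_url"]
        if data["status"] == "error":
            raise RuntimeError(f"generation failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        time.sleep(interval)


# Usage with the names from the script above (requires the requests package):
# url = wait_for_result(lambda: requests.get(
#     f"{BASE}/request-status/{request_id}", headers=HEADERS).json())
```

Raising exceptions instead of printing lets a batch script decide whether to retry, skip, or stop when one track fails.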
The Prototyping Workflow
The three variants form a natural pipeline that saves both time and money:
Sketch with Turbo. Generate 10 variations of the same prompt in under a minute. Listen for the arrangement and melody that clicks - ten 3-minute attempts cost about two cents (10 × $0.002).
Refine with XL Turbo. Take the winning tags and lyrics, run them on XL Turbo. Richer instrumentation, cleaner vocals. Most tracks are production-ready at this stage.
Polish with Base when the track is headed to Spotify or a client deliverable. Switch to steps=40-60 and let the extra sampling time fill out the harmonic detail, especially on acoustic material.
The full pipeline - 10 Turbo sketches + 2 XL Turbo refinements + 1 Base polish - costs $0.035 at 3-minute track prices. On Suno, the same iterative workflow eats 13 songs from your monthly 500, and you can't choose which parameter to change between iterations.
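That total can be reproduced from the 3-minute column of the pricing table. The sketch assumes every track is 3 minutes long and that the Base polish runs at 40 steps, the only Base configuration the table prices; a 60-step render would cost more.

```python
# 3-minute prices in USD from the pricing table; Base is priced at 40 steps.
PRICE_3MIN = {"turbo": 0.002, "xl_turbo": 0.0024, "base": 0.0104}

# Number of generations at each stage of the pipeline.
PIPELINE = {"turbo": 10, "xl_turbo": 2, "base": 1}

total = sum(PRICE_3MIN[model] * count for model, count in PIPELINE.items())
print(f"Pipeline total: ${total:.4f}")  # Pipeline total: $0.0352
```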
Common Mistakes
Writing prose instead of tags. "A melancholic piano ballad about lost love in Paris" produces muddy output. Break it into keywords: piano ballad, melancholic, intimate, female vocals, 68 bpm.
Long lyric lines kill vocal timing. The model maps syllables to beats, and anything past 12 syllables per line falls apart. Aim for 4-8.
Stacking five genres at once. XL Turbo handles rare genres individually, but a single prompt with "darkwave, witch house, phonk, vaporwave, drill" cancels itself out.
Expecting Turbo to match Base quality. Eight fixed steps is a deliberate trade-off. The mix will be thinner, vocals less articulate - Turbo exists for iteration, not final delivery.
Try It
Sign up at app.deapi.ai for $5 in free credits - enough to generate over 2,000 tracks on XL Turbo. The playground lets you test all three ACE-Step models without writing code. When you're ready to integrate, the API docs cover every parameter.
All generated music is yours commercially. ACE-Step ships under Apache 2.0, and deAPI adds no licensing restrictions on the output.