deAPI

Posted on Jun 22

How to Generate AI Music with ACE-Step API (Cheaper Than Suno)

#ai #music #api

Suno Pro costs $10/month for 500 songs. Hit your limit on day 12, and you wait until the billing cycle resets. Udio charges $30/month with credit walls that kick in during peak hours.

ACE-Step takes a different route entirely. It's an open-source model (Apache 2.0) that you can call through an API and pay per track. Through deAPI, a 3-minute track costs $0.002 on the fastest variant - and every output is yours commercially.

This guide covers how to prompt ACE-Step, what each model variant is good at, and includes copy-paste Python code to generate your first track.

What Makes ACE-Step Different

Suno's Custom Mode and Udio both offer style tags, lyrics with structure markers, and post-generation editing. They're capable tools for making music in a browser.

ACE-Step solves a different problem. It's an open-source model you call through a REST API - which means it plugs into your code, your pipeline, your product. A game studio can generate 50 level soundtracks in a batch script. A SaaS can offer "generate background music" as a feature without licensing a third-party UI.

The prompting model splits input into two fields:

Tags shape the sound: genre, instruments, BPM, vocal style, and production era, written as comma-separated keywords. Think of this as the mixing board. In the API, tags go in the caption field.

Lyrics shape the structure. You write them with section markers like [verse], [chorus], [bridge]. Leave the lyrics field empty (or use [Instrumental]) for instrumental tracks.

Changing tags doesn't affect lyrics, and vice versa. Keep the same words, swap "lo-fi hip-hop" for "progressive techno," and the model rebuilds the entire arrangement around the new genre.
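A minimal sketch of that independence, assuming the request fields used in the full example later in this guide (caption for tags, lyrics for structure). build_payload is a hypothetical helper, not part of any deAPI SDK, and the lyrics are placeholder text. It only builds the request body and makes no network call.

```python
# Hypothetical helper: builds the request body only, no API call is made.
LYRICS = """[verse]
City lights are fading
Morning on the way

[chorus]
Hold on to the moment
Let the record play"""


def build_payload(tags: str, lyrics: str, model: str = "AceStep_1_5_XL_Turbo_INT8") -> dict:
    """Combine a tag string and a lyrics string into one request body."""
    return {"model": model, "caption": tags, "lyrics": lyrics}


lofi = build_payload("lo-fi hip-hop, dusty drums, rhodes piano, 88 bpm", LYRICS)
techno = build_payload("progressive techno, 909 drums, acid bass, 134 bpm", LYRICS)

# Same words, different sound: only the caption field differs.
changed = [key for key in lofi if lofi[key] != techno[key]]
print(changed)  # prints ['caption']
```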

Three Models, One Endpoint

deAPI exposes three ACE-Step 1.5 variants through the same /api/v1/client/txt2music endpoint:

Model	Slug	Speed	Best for
Turbo (3.5B)	`AceStep_1_5_Turbo`	Fastest	Prototyping, A/B testing 10 variants in a minute
XL Turbo (~10B, INT8)	`AceStep_1_5_XL_Turbo_INT8`	Fast	Production-ready tracks with richer instrumentation
Base (3.5B)	`AceStep_1_5_Base`	Slowest	Final renders, full control over steps and guidance

Turbo and XL Turbo run at a fixed 8 steps with CFG=1 (guidance_scale in the API). Base lets you dial steps up to 100 and guidance from 3 to 20 - slower generation, but noticeably richer output.
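One way to keep those limits straight in client code is a small guard that picks valid sampling settings per variant. This is a sketch built only from the limits stated above. sampling_params is a hypothetical name, the default of 40 steps and guidance 7 is an arbitrary assumption, and the lower bound of 1 step is also assumed. The live API may validate differently.

```python
# Limits as described in this guide; the live API may enforce different ones.
FIXED_STEP_MODELS = {"AceStep_1_5_Turbo", "AceStep_1_5_XL_Turbo_INT8"}
BASE_MODEL = "AceStep_1_5_Base"


def sampling_params(model: str, steps: int = 40, guidance: float = 7.0) -> dict:
    """Return inference_steps and guidance_scale that fit the chosen variant."""
    if model in FIXED_STEP_MODELS:
        # Turbo variants ignore the arguments: fixed 8 steps, CFG=1.
        return {"inference_steps": 8, "guidance_scale": 1}
    if model != BASE_MODEL:
        raise ValueError(f"unknown model slug: {model}")
    if not 1 <= steps <= 100:
        raise ValueError("Base accepts at most 100 steps")
    if not 3 <= guidance <= 20:
        raise ValueError("Base guidance must be between 3 and 20")
    return {"inference_steps": steps, "guidance_scale": guidance}


print(sampling_params("AceStep_1_5_Turbo", steps=50))
print(sampling_params("AceStep_1_5_Base", steps=60, guidance=5))
```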

What It Costs

We pulled these numbers directly from deAPI's pricing API on June 18, 2026:

Model	1-min track	3-min track	5-min track
Turbo (8 steps)	$0.0009	$0.002	$0.003
XL Turbo (8 steps)	$0.001	$0.0024	$0.0037
Base (40 steps)	$0.0037	$0.0104	$0.017

For context, here's what subscription platforms charge:

	Suno Pro	Udio Standard	deAPI (XL Turbo)
Monthly cost	$10/mo	$30/mo	Pay per track
Tracks included	~500	~1,200	Unlimited
3-min track cost	~$0.02 (within limit)	~$0.025	$0.0024
After limit	Wait or upgrade	Buy credits	Same price, always
Commercial rights	Pro plan only	Standard plan only	Always (Apache 2.0)
Source model	Closed	Closed	Open-source

A 3-minute track on XL Turbo costs roughly 8× less than the per-song equivalent on Suno Pro. The gap widens in production workflows - a game studio generating 50 background tracks pays $0.12 on deAPI. On Suno, that's either one month of Pro (if you have songs left) or $0.50/track on top-up credits.
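The arithmetic behind those figures, using only the prices quoted in the tables above. The variable names are illustrative.

```python
# Prices copied from the tables above (USD).
SUNO_PRO_MONTHLY = 10.00
SUNO_PRO_SONGS = 500
XL_TURBO_3MIN = 0.0024  # per 3-minute track

suno_per_song = SUNO_PRO_MONTHLY / SUNO_PRO_SONGS  # effective price inside the quota
ratio = suno_per_song / XL_TURBO_3MIN              # how many times cheaper XL Turbo is
batch_cost = 50 * XL_TURBO_3MIN                    # the 50-track game studio batch

print(f"Suno Pro per song: ${suno_per_song:.3f}")
print(f"XL Turbo is {ratio:.1f}x cheaper per track")
print(f"50 tracks on XL Turbo: ${batch_cost:.2f}")
```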

Writing Tags That Actually Work

A solid tag set runs 5-12 keywords. Start broad, get specific.

The formula

[genre], [sub-genre], [mood], [2-3 instruments],
[vocal type], [production style], [era], [BPM] bpm

Examples that produce good results

Lo-fi hip-hop:

lo-fi hip-hop, boom bap, dusty drums, vinyl crackle,
jazz sample, rhodes piano, male vocals, laid-back, warm, 88 bpm

Dark progressive techno:

progressive techno, dark, hypnotic, driving,
analog modular synth, 909 drums, acid bass,
atmospheric pad, no vocals, wide stereo, 134 bpm

Chamber folk ballad:

chamber folk, acoustic ballad, intimate, melancholic,
fingerpicked classical guitar, cello, soft female vocals,
warm analog, 68 bpm

Three rules

Instruments beat adjectives. "Grand piano, upright bass, brushed drums" gives the model concrete targets. "Sophisticated, elegant, refined" gives it nothing to work with.

Cap your tags at 12. Beyond that threshold, keywords dilute each other - eight precise tags outperform twenty vague ones.

Include BPM in the tags too, even when you set the bpm parameter separately. The model reads both inputs, and consistency between them locks the rhythm tighter.
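The rules above can be folded into a small caption builder. This is a sketch, not an official helper: build_caption is a hypothetical name, the 12-tag cap is this guide's rule of thumb rather than an API limit, and lowercasing and deduplication are added assumptions.

```python
from typing import Optional

MAX_TAGS = 12  # cap suggested in this guide, not an API limit


def build_caption(tags: list[str], bpm: Optional[int] = None) -> str:
    """Join tags into a comma-separated caption, deduplicated and capped."""
    seen = set()
    cleaned = []
    for tag in tags:
        tag = tag.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            cleaned.append(tag)
    if bpm is not None:
        # Drop any existing BPM tag and reserve the last slot for the new one.
        cleaned = [t for t in cleaned if not t.endswith(" bpm")][: MAX_TAGS - 1]
        cleaned.append(f"{bpm} bpm")
    return ", ".join(cleaned[:MAX_TAGS])


caption = build_caption(
    ["chamber folk", "acoustic ballad", "intimate", "melancholic",
     "fingerpicked classical guitar", "cello", "soft female vocals", "warm analog"],
    bpm=68,
)
print(caption)
```

Passing the same bpm value to both build_caption and the request's bpm parameter keeps the two inputs consistent.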

Structuring Lyrics

Every section starts with a marker in square brackets:

[verse]   - verse
[chorus]  - hook / refrain
[bridge]  - contrasting section
[inst]    - instrumental break or solo
[intro]   - instrumental opening
[outro]   - ending

An empty section (marker with no text after it) becomes an instrumental passage. The [inst] marker between sections triggers a solo - particularly effective in jazz and rock prompts.
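If lyrics are assembled in code, the markers can be generated rather than typed by hand. A minimal sketch, assuming sections are held as (marker, lines) pairs. build_lyrics is a hypothetical helper and the lyric lines are placeholders.

```python
def build_lyrics(sections: list[tuple[str, list[str]]]) -> str:
    """Render (marker, lines) pairs as a lyrics string with bracketed markers."""
    blocks = []
    for marker, lines in sections:
        # An empty list of lines yields a bare marker: an instrumental passage.
        blocks.append("\n".join([f"[{marker}]", *lines]))
    return "\n\n".join(blocks)


lyrics = build_lyrics([
    ("intro", []),
    ("verse", ["Rain on the window", "Slow train going home"]),
    ("inst", []),  # solo between verse and chorus
    ("chorus", ["Carry me over", "Carry me on"]),
    ("outro", []),
])
print(lyrics)
```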

Keep lines short: 4-8 syllables. The model maps syllables to beats, and lines longer than 12 syllables cause the vocal timing to break apart. Short lines read worse on paper but sing better in practice.
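A rough pre-flight check can flag lines that are likely too long before you spend a generation on them. The syllable counter below is a crude English-only heuristic (vowel groups, minus a silent trailing "e"), not how the model counts. It undercounts words like "neon", so treat the result as a hint. Both function names are hypothetical.

```python
import re


def estimate_syllables(line: str) -> int:
    """Rough English syllable estimate: count vowel groups per word."""
    total = 0
    for word in re.findall(r"[a-z']+", line.lower()):
        groups = len(re.findall(r"[aeiouy]+", word))
        # Drop a typical silent trailing "e" ("smoke", "above"), but not "-le".
        if word.endswith("e") and not word.endswith("le") and groups > 1:
            groups -= 1
        total += max(groups, 1)
    return total


def long_lines(lyrics: str, limit: int = 12) -> list[str]:
    """Return lyric lines whose estimated syllable count exceeds the limit."""
    flagged = []
    for line in lyrics.splitlines():
        line = line.strip()
        # Skip blank lines and section markers such as [verse].
        if line and not line.startswith("["):
            if estimate_syllables(line) > limit:
                flagged.append(line)
    return flagged


sample = """[verse]
Smoke above the rooftops as the sun begins to fall
Smoke above the rooftops"""
print(long_lines(sample))
```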

ACE-Step supports 19 languages. English, Chinese, Japanese, Spanish, German, and French perform best. You can mix languages mid-song - XL Turbo handles English verses with a Japanese chorus in j-rock style without stumbling.

Full Working Example (Python)

Here's a complete script that generates a lo-fi hip-hop track:

import requests
import time

API_KEY = "your_api_key_here"
BASE = "https://api.deapi.ai/api/v1/client"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Accept": "application/json"
}

response = requests.post(f"{BASE}/txt2music", headers=HEADERS, data={
    "model": "AceStep_1_5_XL_Turbo_INT8",
    "caption": (
        "lo-fi hip-hop, boom bap, dusty drums, vinyl crackle, "
        "jazz sample, rhodes piano, male vocals, rap vocals, "
        "laid-back, warm, 88 bpm"
    ),
    "lyrics": """[intro]

[verse]
Smoke above the rooftops
As the sun begins to fall
Every neon sign reflecting
Off the barber shop wall

[chorus]
This is how it goes
This is how it always been
Writing every story down
From where I'm standing in

[outro]""",
    "duration": 120,
    "bpm": 88,
    "inference_steps": 8,
    "guidance_scale": 1,
    "seed": -1,
    "format": "mp3"
})

response.raise_for_status()  # stop early on HTTP errors (bad key, bad parameters)
request_id = response.json()["data"]["request_id"]
print(f"Submitted: {request_id}")

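# Poll until the job finishes, but give up after 10 minutes
deadline = time.monotonic() + 600
while time.monotonic() < deadline:
    status = requests.get(
        f"{BASE}/request-status/{request_id}", headers=HEADERS, timeout=30
    ).json()
    state = status["data"]["status"]

    if state == "done":
        print(f"Download: {status['data']['result_url']}")
        break
    if state == "error":
        print(f"Error: {status['data'].get('error')}")
        break

    print(f"Status: {state}")
    time.sleep(5)
else:
    # Runs only if the loop ended without a break
    print("Timed out waiting for the track")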

Switching models is one parameter change:

"model": "AceStep_1_5_Turbo"          # fastest, draft quality
"model": "AceStep_1_5_XL_Turbo_INT8"  # balanced (recommended)
"model": "AceStep_1_5_Base"           # highest fidelity

For Base, raise inference_steps to 40-60 and set guidance_scale between 3 and 15 for more controlled output.

The Prototyping Workflow

The three variants form a natural pipeline that saves both time and money:

Sketch with Turbo. Generate 10 variations of the same prompt in under a minute. Listen for the arrangement and melody that clicks - ten 3-minute attempts cost about two cents.

Refine with XL Turbo. Take the winning tags and lyrics, run them on XL Turbo. Richer instrumentation, cleaner vocals. Most tracks are production-ready at this stage.

Polish with Base when the track is headed to Spotify or a client deliverable. Set inference_steps to 40-60 and let the extra sampling time fill out the harmonic detail, especially on acoustic material.
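The sketch stage can be set up as a batch of request bodies that differ only in their seed. This is a sketch under assumptions: sketch_payloads is a hypothetical helper, and it assumes the API accepts explicit positive seeds (the full example above only shows seed set to -1). Recording the seed for each sketch makes it easier to tell the candidates apart.

```python
import random


def sketch_payloads(caption: str, lyrics: str, count: int = 10, rng_seed: int = 0) -> list[dict]:
    """Build `count` Turbo request bodies that differ only in their seed."""
    rng = random.Random(rng_seed)  # local RNG so the batch itself is repeatable
    seeds = rng.sample(range(1, 2**31), count)  # distinct seeds; range is an assumption
    return [
        {
            "model": "AceStep_1_5_Turbo",
            "caption": caption,
            "lyrics": lyrics,
            "duration": 180,
            "inference_steps": 8,
            "guidance_scale": 1,
            "seed": seed,
            "format": "mp3",
        }
        for seed in seeds
    ]


batch = sketch_payloads("lo-fi hip-hop, dusty drums, rhodes piano, 88 bpm", "[Instrumental]")
print(len(batch), "payloads,", len({p["seed"] for p in batch}), "distinct seeds")
```

Each dict can be sent as the data argument of the POST request shown in the full example.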

The full pipeline - 10 Turbo sketches + 2 XL Turbo refinements + 1 Base polish - costs about $0.035 for 3-minute tracks. On Suno, the same iterative workflow eats 13 songs from your monthly 500, and you can't choose which parameter to change between iterations.
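The pipeline total follows from the 3-minute prices in the pricing table, with Base priced at 40 steps. The dictionary names are illustrative.

```python
# USD per 3-minute track, from the pricing table in this guide.
PRICE_3MIN = {
    "AceStep_1_5_Turbo": 0.002,
    "AceStep_1_5_XL_Turbo_INT8": 0.0024,
    "AceStep_1_5_Base": 0.0104,  # at 40 steps
}

# Number of generations at each stage of the pipeline.
PIPELINE = {
    "AceStep_1_5_Turbo": 10,
    "AceStep_1_5_XL_Turbo_INT8": 2,
    "AceStep_1_5_Base": 1,
}

total = sum(PRICE_3MIN[model] * runs for model, runs in PIPELINE.items())
print(f"${total:.4f} for {sum(PIPELINE.values())} generations")  # prints $0.0352 for 13 generations
```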

Common Mistakes

Writing prose instead of tags. "A melancholic piano ballad about lost love in Paris" produces muddy output. Break it into keywords: piano ballad, melancholic, intimate, female vocals, 68 bpm.

Long lyric lines kill vocal timing. The model maps syllables to beats, and anything past 12 syllables per line falls apart. Aim for 4-8.

Stacking five genres at once. XL Turbo handles rare genres individually, but putting darkwave, witch house, phonk, vaporwave, and drill in a single prompt makes them cancel each other out.

Expecting Turbo to match Base quality. Eight fixed steps is a deliberate trade-off. The mix will be thinner, vocals less articulate - Turbo exists for iteration, not final delivery.
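The first mistake is easy to catch automatically. The check below is a heuristic sketch: looks_like_prose is a hypothetical name, and the thresholds (at least 3 comma-separated parts, at most 4 words per part) are assumptions chosen to fit the examples in this guide.

```python
def looks_like_prose(caption: str, max_words_per_tag: int = 4) -> bool:
    """Heuristic: True when a caption reads like a sentence rather than tags."""
    parts = [p.strip() for p in caption.split(",") if p.strip()]
    if len(parts) < 3:
        return True  # too few comma-separated keywords
    # A single long part usually means a sentence slipped in.
    return any(len(p.split()) > max_words_per_tag for p in parts)


print(looks_like_prose("A melancholic piano ballad about lost love in Paris"))        # prints True
print(looks_like_prose("piano ballad, melancholic, intimate, female vocals, 68 bpm"))  # prints False
```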

Try It

Sign up at app.deapi.ai for $5 in free credits - enough to generate over 2,000 three-minute tracks on XL Turbo. The playground lets you test all three ACE-Step models without writing code. When you're ready to integrate, the API docs cover every parameter.

All generated music is yours commercially. ACE-Step ships under Apache 2.0, and deAPI adds no licensing restrictions on the output.

Top comments (1)

Piotr • Jun 22

Thx!
AI rappers duel - I have to do something like this: LLM generates the text -> deapi creates the music (the whole hip-hop track) -> deapi makes transcriptions for the second "Rapper" -> LLM analyzes and prepares a riposte!🤔