DEV Community

GeneLab_999

ComfyUI-AceMusic: The First Full Implementation of ACE-Step 1.5 Features That "Weren't Yet Supported"

TL;DR

On February 3rd, 2026, the official ComfyUI blog announced ACE-Step 1.5 support with a notable caveat: "Cover, Repaint, and other features aren't yet supported in ComfyUI."

The next day, I released ComfyUI-AceMusic — a complete implementation of all 15 ACE-Step 1.5 features as ComfyUI nodes.

Key highlights:

  • World-first: Full Cover, Repaint, Edit, Retake, Extend support in ComfyUI
  • 15 nodes covering every ACE-Step 1.5 capability
  • Modular architecture that eliminates widget ordering issues
  • Windows + Python 3.13+ compatible using soundfile/scipy instead of problematic torchaudio backends
  • HeartMuLa interoperability for hybrid AI music workflows

GitHub: github.com/hiroki-abe-58/ComfyUI-AceMusic


The Problem: Official Support Was Incomplete

ACE-Step 1.5 is a game-changer for open-source music generation. It outperforms most commercial alternatives, runs on consumer hardware (4GB VRAM), and generates full songs in under 10 seconds on an RTX 3090.

When ComfyUI announced native support, the community was excited. But there was a catch.

From the official ComfyUI blog (February 3rd, 2026):

"ACE-Step 1.5 has a few more tricks up its sleeve. These aren't yet supported in ComfyUI, but we have no doubt the community will figure it out."

The "tricks" they mentioned? Only the most powerful features of ACE-Step 1.5:

| Feature | Description | Official Support |
| --- | --- | --- |
| Cover | Transform any song into a different style | ❌ Not supported |
| Repaint | Regenerate specific sections of audio | ❌ Not supported |
| Edit | Change tags/lyrics while preserving melody | ❌ Not supported |
| Retake | Create variations of existing audio | ❌ Not supported |
| Extend | Add new content before/after audio | ❌ Not supported |

So I built them.


What ComfyUI-AceMusic Offers

Complete Feature Coverage

| Node | Function |
| --- | --- |
| Model Loader | Downloads and caches ACE-Step 1.5 models |
| Settings | Configure generation parameters |
| Generator | Text-to-Music generation |
| Lyrics Input | Dedicated lyrics input with section markers |
| Caption Input | Style/genre description input |
| Cover | Transform existing audio into different styles |
| Repaint | Regenerate specific time ranges |
| Retake | Create variations with same settings |
| Extend | Add content to beginning or end |
| Edit | Change tags/lyrics, preserve melody (FlowEdit) |
| Conditioning | Combine parameters into conditioning object |
| Generator (from Cond) | Generate from conditioning |
| Load LoRA | Load fine-tuned adapters |
| Understand | Extract metadata from audio |
| Create Sample | Generate params from natural language |

Comparison with Existing Implementations

| Implementation | ACE-Step Version | Cover | Repaint | Edit | Retake | Extend | Win + Py 3.13+ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ComfyUI Native | 1.5 | ❌ | ❌ | ❌ | ❌ | ❌ | Untested |
| billwuhao | 1.0 | Partial | Partial | Partial | Partial | Partial | Untested |
| ryanontheinside | 1.0 | ❌ | ❌ | ❌ | ❌ | ❌ | Untested |
| ComfyUI-AceMusic | 1.5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

Technical Deep Dive

1. Modular Architecture

Previous implementations crammed 30+ parameters into a single node, causing widget ordering issues — a known ComfyUI quirk where input field order can cause unexpected behavior.

ComfyUI-AceMusic separates concerns:

```
[Model Loader]  → Model loading only
[Settings]      → Generation parameters only
[Lyrics Input]  → Lyrics entry only
[Caption Input] → Style description only
[Generator]     → Generation execution only
```

This separation:

  • Eliminates widget ordering bugs
  • Improves workflow readability
  • Makes nodes reusable across different workflows
  • Follows single-responsibility principle

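To make the separation concrete, here is a minimal sketch of what a single-responsibility node looks like in ComfyUI's custom-node API (class name, field names, and the `ACE_SETTINGS` type are illustrative, not the actual code in ComfyUI-AceMusic):

```python
# Sketch of a single-responsibility ComfyUI node.
# Names are illustrative, not the real classes in ComfyUI-AceMusic.
class AceMusicSettings:
    """Packages generation parameters only; model loading, lyrics, and
    generation each live in their own node."""

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI builds the widget UI from this declaration; keeping it
        # small avoids the widget-ordering problems of 30-parameter nodes
        return {
            "required": {
                "duration": ("FLOAT", {"default": 60.0, "min": 1.0, "max": 600.0}),
                "language": (["en", "ja", "zh"],),
            }
        }

    RETURN_TYPES = ("ACE_SETTINGS",)  # custom type consumed downstream
    FUNCTION = "build"
    CATEGORY = "AceMusic"

    def build(self, duration, language):
        # No model loading, no lyrics parsing: just parameter packaging
        return ({"duration": duration, "language": language},)
```

Downstream nodes then declare an `ACE_SETTINGS` input and receive the dict unchanged, so each node's widget list stays short and stable.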
2. Cross-Platform Compatibility

The Problem: torchaudio backends can fail on Windows + Python 3.13+.

The Solution: Use soundfile and scipy instead.

```python
# Problematic approach: torchaudio backend selection can fail
import torchaudio
audio, sr = torchaudio.load("file.wav")  # fails on Windows + Python 3.13+

# ComfyUI-AceMusic approach: no backend configuration needed
import soundfile as sf
audio, sr = sf.read("file.wav")  # works everywhere
```

This isn't just a workaround — it's a more robust solution that works across all platforms without requiring specific backend configurations.
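To illustrate the same idea, here is a sketch of a loader that prefers soundfile and, as a last resort, falls back to the stdlib `wave` module for 16-bit PCM files. The helper names are mine, and the stdlib fallback is my addition, not something the repo ships:

```python
import struct
import wave

def read_wav_fallback(path):
    """Read a 16-bit PCM WAV using only the stdlib, returning (samples, sr)."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "fallback handles 16-bit PCM only"
        sr = w.getframerate()
        raw = w.readframes(w.getnframes())
        # signed little-endian 16-bit integers -> floats in [-1, 1]
        ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
        samples = [s / 32768.0 for s in ints]
    return samples, sr

def read_audio(path):
    """Prefer soundfile (robust on Windows + Python 3.13+), else stdlib wave."""
    try:
        import soundfile as sf
        data, sr = sf.read(path)
        return data, sr
    except ImportError:
        return read_wav_fallback(path)
```

The point is that neither path depends on a native torchaudio backend being configured, which is exactly what breaks on newer Windows/Python combinations.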

3. HeartMuLa Interoperability

The AUDIO type in ComfyUI-AceMusic is compatible with HeartMuLa outputs, enabling hybrid workflows:

```
[HeartMuLa Generator] → [AceMusic Cover] → [AceMusic Extend] → [Output]
```

This lets you combine the strengths of different music generation models in a single workflow.
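Concretely, the AUDIO type in recent ComfyUI builds is a dict carrying a `[batch, channels, samples]` tensor plus a sample rate; a sketch of wrapping raw samples into that shape (the function name is mine):

```python
import torch

def as_comfy_audio(waveform, sample_rate):
    """Wrap raw samples into ComfyUI's AUDIO dict so any AUDIO-typed node
    (a HeartMuLa output, an AceMusic Cover input, ...) can consume it."""
    t = torch.as_tensor(waveform, dtype=torch.float32)
    while t.dim() < 3:  # [samples] -> [channels, samples] -> [batch, channels, samples]
        t = t.unsqueeze(0)
    return {"waveform": t, "sample_rate": int(sample_rate)}
```

Because both node packs speak this one convention, no conversion node is needed between them.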


Quick Start

Installation

Via ComfyUI Manager (Recommended):
Search for "ComfyUI-AceMusic" and install.

Manual:

```shell
cd ComfyUI/custom_nodes
git clone https://github.com/hiroki-abe-58/ComfyUI-AceMusic.git
cd ComfyUI-AceMusic
pip install -r requirements.txt

# Install ACE-Step 1.5
pip install git+https://github.com/ace-step/ACE-Step.git
```

Models auto-download from Hugging Face on first use.

Basic Workflow (Text-to-Music)

  1. Add AceMusic Model Loader → set device to cuda
  2. Add AceMusic Settings → configure duration, language, etc.
  3. Add AceMusic Lyrics Input:

```
[Verse]
Walking down the empty street
Thinking about you and me

[Chorus]
We belong together
Now and forever
```

  4. Add AceMusic Caption Input: pop, female vocal, energetic
  5. Connect all to AceMusic Generator → Preview Audio

Load the example workflow: workflow/AceMusic_Lyrics_v3.json
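The bracketed section markers above are plain text, so they are easy to inspect before generation. A hypothetical helper (not part of the node pack) that splits lyrics into sections:

```python
import re

# Matches a section header line such as "[Verse]" or "[Chorus]"
SECTION = re.compile(r"^\[(?P<name>[A-Za-z ]+)\]$")

def split_sections(lyrics):
    """Split bracketed-section lyrics into {section_name: [lines]}."""
    sections, current = {}, None
    for line in lyrics.splitlines():
        line = line.strip()
        if not line:
            continue
        m = SECTION.match(line)
        if m:
            current = m.group("name")
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections
```

A quick check like `assert "Chorus" in split_sections(lyrics)` catches a missing or typoed marker before you spend a generation run on it.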

Cover Workflow (Style Transfer)

```
[Load Audio] ──────────────────┐
                               ↓
[Model Loader] → [Settings] → [AceMusic Cover] → [Preview Audio]
                               ↑
[Caption Input] ───────────────┘
"jazz piano trio, smooth, relaxed"
```

Use cases:

  • Pop → Jazz arrangement
  • Rock → Acoustic version
  • EDM → Orchestral arrangement

Repaint Workflow (Section Regeneration)

```
[Load Audio] ──────────────────┐
                               ↓
[Model Loader] → [Settings] → [AceMusic Repaint] → [Preview Audio]
                               ↑
[Time Range: 30-45s] ──────────┘
```

Use cases:

  • Fix a problematic chorus
  • Improve the intro
  • Regenerate specific vocal sections
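Repaint takes a time range, and under the hood a range like 30-45s has to be mapped to sample indices in the waveform. A hypothetical sketch of that bookkeeping (the node handles this internally; names are mine):

```python
def repaint_range_to_samples(start_s, end_s, sample_rate, total_samples):
    """Convert a repaint time range in seconds to clamped sample indices."""
    start = max(0, int(round(start_s * sample_rate)))
    end = min(total_samples, int(round(end_s * sample_rate)))
    if start >= end:
        raise ValueError("empty repaint range")
    return start, end
```

Clamping to the audio length means a range like 50-70s on a 60s clip simply repaints the final 10 seconds instead of erroring out.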

Performance

Generation Speed

| Device | RTF (27 steps) | Time for 1 min audio |
| --- | --- | --- |
| RTX 5090 | ~50x | ~1.2s |
| RTX 4090 | 34.48x | 1.74s |
| A100 | 27.27x | 2.20s |
| RTX 3090 | 12.76x | 4.70s |
| M2 Max | 2.27x | 26.43s |
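RTF (real-time factor) is the audio duration divided by wall-clock generation time, so the time column follows directly from the RTF column:

```python
def generation_time(audio_seconds, rtf):
    """Wall-clock seconds to generate `audio_seconds` of audio at rtf x real time."""
    return audio_seconds / rtf

# Reproducing rows from the table above (1 minute of audio)
for device, rtf in [("RTX 4090", 34.48), ("A100", 27.27), ("RTX 3090", 12.76)]:
    print(f"{device}: {generation_time(60, rtf):.2f}s")
```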

VRAM Requirements

| Mode | VRAM | Notes |
| --- | --- | --- |
| Normal | 8GB+ | Full speed |
| CPU Offload | ~4GB | Slower but works on limited VRAM |

Troubleshooting

| Error | Cause | Solution |
| --- | --- | --- |
| `CUDA out of memory` | Insufficient GPU memory | Enable `cpu_offload` or reduce duration |
| `ModuleNotFoundError: acestep` | ACE-Step not installed | `pip install git+https://github.com/ace-step/ACE-Step.git` |
| `soundfile` not found | Missing dependency | `pip install soundfile scipy` |
| Model download failed | Network issue | Check Hugging Face access |
| torchaudio backend error | Windows + Python 3.13+ issue | Ensure `soundfile` is properly installed |

Environment Check Script

```python
#!/usr/bin/env python3
"""ComfyUI-AceMusic Environment Checker"""
import sys

def check():
    issues = []

    # Python version
    print(f"Python: {sys.version}")
    if sys.version_info < (3, 10):
        issues.append("Python 3.10+ required")

    # PyTorch + CUDA
    try:
        import torch
        print(f"✅ PyTorch: {torch.__version__}")
        if torch.cuda.is_available():
            print(f"✅ CUDA: {torch.version.cuda}")
            vram = torch.cuda.get_device_properties(0).total_memory / 1e9
            print(f"✅ GPU VRAM: {vram:.1f} GB")
        else:
            issues.append("CUDA not available")
    except ImportError:
        issues.append("PyTorch not installed")

    # ACE-Step
    try:
        import acestep  # noqa: F401
        print("✅ ACE-Step: installed")
    except ImportError:
        issues.append("ACE-Step not installed")

    # Audio libraries (used instead of torchaudio backends)
    for mod in ("soundfile", "scipy"):
        try:
            __import__(mod)
            print(f"✅ {mod}: installed")
        except ImportError:
            issues.append(f"{mod} not installed")

    # Results
    print("\n" + "=" * 50)
    if issues:
        print("❌ Issues found:")
        for issue in issues:
            print(f"  - {issue}")
    else:
        print("✅ Environment OK!")
    return not issues

if __name__ == "__main__":
    sys.exit(0 if check() else 1)
```

Why I Built This

When I saw the official announcement saying "these features aren't yet supported," I knew exactly what needed to be done. The ACE-Step team built an incredible model with Cover, Repaint, Edit, and other powerful features — but without ComfyUI support, most users couldn't access them.

The hardest part was the torchaudio issue. On Windows with Python 3.13+, the audio backends just don't work reliably. The solution was to bypass torchaudio entirely and use soundfile/scipy for all audio I/O. It's a more robust approach that should work on any platform.

The modular architecture came from frustration with existing implementations. Stuffing 30+ parameters into one node isn't just ugly — it causes real bugs. Separating concerns made the nodes more reliable and the workflows more readable.

This is what open source is about. The official team sets the direction, and the community fills in the gaps. I'm proud to contribute to the music generation ecosystem.


Links

  • GitHub: github.com/hiroki-abe-58/ComfyUI-AceMusic
License

Apache 2.0


If you find this useful, consider starring the repo. And if you build something cool with it, I'd love to see it!
