TL;DR
On February 3rd, 2026, the official ComfyUI blog announced ACE-Step 1.5 support with a notable caveat: "Cover, Repaint, and other features aren't yet supported in ComfyUI."
The next day, I released ComfyUI-AceMusic, a set of 15 ComfyUI nodes covering every ACE-Step 1.5 capability, including the ones the official integration left out.
Key highlights:
- World-first: Full Cover, Repaint, Edit, Retake, Extend support in ComfyUI
- 15 nodes covering every ACE-Step 1.5 capability
- Modular architecture that eliminates widget ordering issues
- Windows + Python 3.13+ compatible using soundfile/scipy instead of problematic torchaudio backends
- HeartMuLa interoperability for hybrid AI music workflows
GitHub: github.com/hiroki-abe-58/ComfyUI-AceMusic
The Problem: Official Support Was Incomplete
ACE-Step 1.5 is a game-changer for open-source music generation. It outperforms most commercial alternatives, runs on consumer hardware (4GB VRAM), and generates full songs in under 10 seconds on an RTX 3090.
When ComfyUI announced native support, the community was excited. But there was a catch.
From the official ComfyUI blog (February 3rd, 2026):
"ACE-Step 1.5 has a few more tricks up its sleeve. These aren't yet supported in ComfyUI, but we have no doubt the community will figure it out."
The "tricks" they mentioned? Only the most powerful features of ACE-Step 1.5:
| Feature | Description | Official Support |
|---|---|---|
| Cover | Transform any song into a different style | ❌ Not supported |
| Repaint | Regenerate specific sections of audio | ❌ Not supported |
| Edit | Change tags/lyrics while preserving melody | ❌ Not supported |
| Retake | Create variations of existing audio | ❌ Not supported |
| Extend | Add new content before/after audio | ❌ Not supported |
So I built them.
What ComfyUI-AceMusic Offers
Complete Feature Coverage
| Node | Function |
|---|---|
| Model Loader | Downloads and caches ACE-Step 1.5 models |
| Settings | Configure generation parameters |
| Generator | Text-to-Music generation |
| Lyrics Input | Dedicated lyrics input with section markers |
| Caption Input | Style/genre description input |
| Cover | Transform existing audio into different styles |
| Repaint | Regenerate specific time ranges |
| Retake | Create variations with same settings |
| Extend | Add content to beginning or end |
| Edit | Change tags/lyrics, preserve melody (FlowEdit) |
| Conditioning | Combine parameters into conditioning object |
| Generator (from Cond) | Generate from conditioning |
| Load LoRA | Load fine-tuned adapters |
| Understand | Extract metadata from audio |
| Create Sample | Generate params from natural language |
Comparison with Existing Implementations
| Implementation | ACE-Step Version | Cover | Repaint | Edit | Retake | Extend | Windows + Py 3.13+ |
|---|---|---|---|---|---|---|---|
| ComfyUI Native | 1.5 | ❌ | ❌ | ❌ | ❌ | ❌ | Untested |
| billwuhao | 1.0 | Partial | ✅ | ❌ | ❌ | ✅ | Untested |
| ryanontheinside | 1.0 | ❌ | ✅ | ❌ | ❌ | ✅ | Untested |
| ComfyUI-AceMusic | 1.5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Technical Deep Dive
1. Modular Architecture
Previous implementations crammed 30+ parameters into a single node, causing widget ordering issues — a known ComfyUI quirk where input field order can cause unexpected behavior.
ComfyUI-AceMusic separates concerns:
```
[Model Loader]  → Model loading only
[Settings]      → Generation parameters only
[Lyrics Input]  → Lyrics entry only
[Caption Input] → Style description only
[Generator]     → Generation execution only
```
This separation:
- Eliminates widget ordering bugs
- Improves workflow readability
- Makes nodes reusable across different workflows
- Follows single-responsibility principle
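To make the separation concrete, here is a minimal sketch of what a standalone settings node looks like in ComfyUI's custom-node API. The class name, the `ACE_SETTINGS` type string, and the parameter set are illustrative only, not the actual ComfyUI-AceMusic code:

```python
class AceMusicSettingsSketch:
    """Illustrative settings-only node: bundles parameters into one object."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "duration": ("FLOAT", {"default": 60.0, "min": 5.0, "max": 240.0}),
            "steps": ("INT", {"default": 27, "min": 1, "max": 200}),
            "cfg_scale": ("FLOAT", {"default": 4.0, "min": 0.0, "max": 20.0}),
        }}

    RETURN_TYPES = ("ACE_SETTINGS",)  # custom type consumed by the generator node
    FUNCTION = "build"
    CATEGORY = "AceMusic"

    def build(self, duration, steps, cfg_scale):
        # The generator node receives one typed input instead of 30+ widgets,
        # so widget order inside this node can't corrupt the generator's inputs.
        return ({"duration": duration, "steps": steps, "cfg_scale": cfg_scale},)
```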
2. Cross-Platform Compatibility
The Problem: torchaudio backends can fail on Windows + Python 3.13+.
The Solution: Use soundfile and scipy instead.
```python
# Problematic approach
import torchaudio
audio, sr = torchaudio.load("file.wav")  # fails on Windows + Python 3.13+

# ComfyUI-AceMusic approach
import soundfile as sf
audio, sr = sf.read("file.wav")  # works everywhere
```
This isn't just a workaround — it's a more robust solution that works across all platforms without requiring specific backend configurations.
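The loaded samples still have to end up in ComfyUI's AUDIO format, which is a dict holding a `(batch, channels, samples)` waveform tensor plus a sample rate. Here's a rough sketch of how soundfile output can be adapted; `load_audio_for_comfy` is a hypothetical helper name, not a function from the repo:

```python
import soundfile as sf
import torch

def load_audio_for_comfy(path: str) -> dict:
    # always_2d=True returns (frames, channels) even for mono files
    data, sr = sf.read(path, dtype="float32", always_2d=True)
    # ComfyUI's AUDIO type expects a (batch, channels, samples) tensor
    waveform = torch.from_numpy(data.T).unsqueeze(0)
    return {"waveform": waveform, "sample_rate": sr}
```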
3. HeartMuLa Interoperability
The AUDIO type in ComfyUI-AceMusic is compatible with HeartMuLa outputs, enabling hybrid workflows:
[HeartMuLa Generator] → [AceMusic Cover] → [AceMusic Extend] → [Output]
This lets you combine the strengths of different music generation models in a single workflow.
Quick Start
Installation
Via ComfyUI Manager (Recommended):
Search for "ComfyUI-AceMusic" and install.
Manual:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/hiroki-abe-58/ComfyUI-AceMusic.git
cd ComfyUI-AceMusic
pip install -r requirements.txt

# Install ACE-Step 1.5
pip install git+https://github.com/ace-step/ACE-Step.git
```
Models auto-download from Hugging Face on first use.
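If you'd rather pre-fetch the weights (for example on a machine with restricted network access at runtime), the usual `huggingface_hub` pattern works. The repo id below is a placeholder, so check the project README for the real one, and note this may differ from what the node does internally:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id for illustration only; see the ComfyUI-AceMusic README.
ACE_STEP_REPO = "ACE-Step/ACE-Step-1.5"

# Downloads once, then later runs reuse the local Hugging Face cache.
local_dir = snapshot_download(repo_id=ACE_STEP_REPO)
print(f"Model files cached at: {local_dir}")
```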
Basic Workflow (Text-to-Music)
- Add AceMusic Model Loader → set device to `cuda`
- Add AceMusic Settings → configure duration, language, etc.
- Add AceMusic Lyrics Input:

  ```
  [Verse]
  Walking down the empty street
  Thinking about you and me
  [Chorus]
  We belong together
  Now and forever
  ```

- Add AceMusic Caption Input: `pop, female vocal, energetic`
- Connect all to AceMusic Generator → Preview Audio
Load the example workflow: workflow/AceMusic_Lyrics_v3.json
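If you prefer to queue runs from a script instead of the UI, ComfyUI's built-in HTTP API accepts workflows exported in API format (Save (API Format) in the UI). A minimal sketch, assuming a default local ComfyUI instance on port 8188 and an API-format export of the example workflow saved under a hypothetical filename:

```python
import json
import urllib.request

# Assumes the example workflow was re-exported via "Save (API Format)"
with open("AceMusic_Lyrics_v3_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # response includes the queued prompt_id
```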
Cover Workflow (Style Transfer)
```
[Load Audio] ──────────────────┐
                               ↓
[Model Loader] → [Settings] → [AceMusic Cover] → [Preview Audio]
                               ↑
[Caption Input] ───────────────┘
"jazz piano trio, smooth, relaxed"
```
Use cases:
- Pop → Jazz arrangement
- Rock → Acoustic version
- EDM → Orchestral arrangement
Repaint Workflow (Section Regeneration)
```
[Load Audio] ──────────────────┐
                               ↓
[Model Loader] → [Settings] → [AceMusic Repaint] → [Preview Audio]
                               ↑
[Time Range: 30-45s] ──────────┘
```
Use cases:
- Fix a problematic chorus
- Improve the intro
- Regenerate specific vocal sections
Performance
Generation Speed
| Device | RTF (27 steps) | Time for 1 min audio |
|---|---|---|
| RTX 5090 | ~50x | ~1.2s |
| RTX 4090 | 34.48x | 1.74s |
| A100 | 27.27x | 2.20s |
| RTX 3090 | 12.76x | 4.70s |
| M2 Max | 2.27x | 26.43s |
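RTF is the real-time factor: seconds of audio produced per second of compute, so the time column is simply duration divided by RTF. A quick sanity check using the table's numbers:

```python
def generation_time(audio_seconds: float, rtf: float) -> float:
    """Wall-clock time to generate a clip at a given real-time factor."""
    return audio_seconds / rtf

print(f"{generation_time(60, 34.48):.2f} s")  # RTX 4090: ~1.74 s
print(f"{generation_time(60, 12.76):.2f} s")  # RTX 3090: ~4.70 s
```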
VRAM Requirements
| Mode | VRAM | Notes |
|---|---|---|
| Normal | 8GB+ | Full speed |
| CPU Offload | ~4GB | Slower but works on limited VRAM |
Troubleshooting
| Error | Cause | Solution |
|---|---|---|
| `CUDA out of memory` | Insufficient GPU memory | Enable `cpu_offload` or reduce duration |
| `ModuleNotFoundError: acestep` | ACE-Step not installed | `pip install git+https://github.com/ace-step/ACE-Step.git` |
| `soundfile not found` | Missing dependency | `pip install soundfile scipy` |
| Model download failed | Network issue | Check Hugging Face access |
| torchaudio backend error | Windows + Python 3.13+ issue | Ensure soundfile is properly installed |
Environment Check Script
```python
#!/usr/bin/env python3
"""ComfyUI-AceMusic Environment Checker"""
import sys

def check():
    issues = []

    # Python version
    print(f"Python: {sys.version}")
    if sys.version_info < (3, 10):
        issues.append("Python 3.10+ required")

    # PyTorch + CUDA
    try:
        import torch
        print(f"✅ PyTorch: {torch.__version__}")
        if torch.cuda.is_available():
            print(f"✅ CUDA: {torch.version.cuda}")
            vram = torch.cuda.get_device_properties(0).total_memory / 1e9
            print(f"✅ GPU VRAM: {vram:.1f} GB")
        else:
            issues.append("CUDA not available")
    except ImportError:
        issues.append("PyTorch not installed")

    # ACE-Step
    try:
        import acestep
        print("✅ ACE-Step: installed")
    except ImportError:
        issues.append("ACE-Step not installed")

    # Audio libraries
    try:
        import soundfile
        print("✅ soundfile: installed")
    except ImportError:
        issues.append("soundfile not installed")

    # Results
    print("\n" + "=" * 50)
    if issues:
        print("❌ Issues found:")
        for issue in issues:
            print(f"  - {issue}")
    else:
        print("✅ Environment OK!")

if __name__ == "__main__":
    check()
```
Why I Built This
When I saw the official announcement saying "these features aren't yet supported," I knew exactly what needed to be done. The ACE-Step team built an incredible model with Cover, Repaint, Edit, and other powerful features — but without ComfyUI support, most users couldn't access them.
The hardest part was the torchaudio issue. On Windows with Python 3.13+, the audio backends just don't work reliably. The solution was to bypass torchaudio entirely and use soundfile/scipy for all audio I/O. It's a more robust approach that should work on any platform.
The modular architecture came from frustration with existing implementations. Stuffing 30+ parameters into one node isn't just ugly — it causes real bugs. Separating concerns made the nodes more reliable and the workflows more readable.
This is what open source is about. The official team sets the direction, and the community fills in the gaps. I'm proud to contribute to the music generation ecosystem.
Links
- GitHub: github.com/hiroki-abe-58/ComfyUI-AceMusic
- ACE-Step 1.5: github.com/ace-step/ACE-Step-1.5
- ComfyUI Official Blog: ACE-Step 1.5 Announcement
- HeartMuLa (compatible): github.com/filliptm/ComfyUI_FL-HeartMuLa
License
Apache 2.0
If you find this useful, consider starring the repo. And if you build something cool with it, I'd love to see it!