DEV Community

GeneLab_999

ComfyUI-AceMusic: The First Full Implementation of ACE-Step 1.5 Features That "Weren't Yet Supported"

TL;DR

On February 3rd, 2026, the official ComfyUI blog announced ACE-Step 1.5 support with a notable caveat: "Cover, Repaint, and other features aren't yet supported in ComfyUI."

The next day, I released ComfyUI-AceMusic — a complete implementation of all 15 ACE-Step 1.5 features as ComfyUI nodes.

Key highlights:

  • World-first: Full Cover, Repaint, Edit, Retake, Extend support in ComfyUI
  • 15 nodes covering every ACE-Step 1.5 capability
  • Modular architecture that eliminates widget ordering issues
  • Windows + Python 3.13+ compatible using soundfile/scipy instead of problematic torchaudio backends
  • HeartMuLa interoperability for hybrid AI music workflows

GitHub: github.com/hiroki-abe-58/ComfyUI-AceMusic


The Problem: Official Support Was Incomplete

ACE-Step 1.5 is a game-changer for open-source music generation. It outperforms most commercial alternatives, runs on consumer hardware (4GB VRAM), and generates full songs in under 10 seconds on an RTX 3090.

When ComfyUI announced native support, the community was excited. But there was a catch.

From the official ComfyUI blog (February 3rd, 2026):

"ACE-Step 1.5 has a few more tricks up its sleeve. These aren't yet supported in ComfyUI, but we have no doubt the community will figure it out."

The "tricks" they mentioned? Only the most powerful features of ACE-Step 1.5:

| Feature | Description | Official Support |
| --- | --- | --- |
| Cover | Transform any song into a different style | ❌ Not supported |
| Repaint | Regenerate specific sections of audio | ❌ Not supported |
| Edit | Change tags/lyrics while preserving melody | ❌ Not supported |
| Retake | Create variations of existing audio | ❌ Not supported |
| Extend | Add new content before/after audio | ❌ Not supported |

So I built them.


What ComfyUI-AceMusic Offers

Complete Feature Coverage

| Node | Function |
| --- | --- |
| Model Loader | Downloads and caches ACE-Step 1.5 models |
| Settings | Configure generation parameters |
| Generator | Text-to-Music generation |
| Lyrics Input | Dedicated lyrics input with section markers |
| Caption Input | Style/genre description input |
| Cover | Transform existing audio into different styles |
| Repaint | Regenerate specific time ranges |
| Retake | Create variations with same settings |
| Extend | Add content to beginning or end |
| Edit | Change tags/lyrics, preserve melody (FlowEdit) |
| Conditioning | Combine parameters into conditioning object |
| Generator (from Cond) | Generate from conditioning |
| Load LoRA | Load fine-tuned adapters |
| Understand | Extract metadata from audio |
| Create Sample | Generate params from natural language |

Comparison with Existing Implementations

| Implementation | ACE-Step Version | Cover | Repaint | Edit | Retake | Extend | Win + Py 3.13+ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ComfyUI Native | 1.5 | ❌ | ❌ | ❌ | ❌ | ❌ | Untested |
| billwuhao | 1.0 | Partial | Partial | Partial | Partial | Partial | Untested |
| ryanontheinside | 1.0 | ❌ | ❌ | ❌ | ❌ | ❌ | Untested |
| ComfyUI-AceMusic | 1.5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

Technical Deep Dive

1. Modular Architecture

Previous implementations crammed 30+ parameters into a single node, causing widget ordering issues — a known ComfyUI quirk where input field order can cause unexpected behavior.

ComfyUI-AceMusic separates concerns:

```
[Model Loader]  → Model loading only
[Settings]      → Generation parameters only
[Lyrics Input]  → Lyrics entry only
[Caption Input] → Style description only
[Generator]     → Generation execution only
```

This separation:

  • Eliminates widget ordering bugs
  • Improves workflow readability
  • Makes nodes reusable across different workflows
  • Follows single-responsibility principle

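To make the separation concrete, here is a minimal sketch of what a single-responsibility node looks like in ComfyUI's custom-node API (class name, field names, and the `ACE_SETTINGS` type are illustrative, not the actual code in ComfyUI-AceMusic):

```python
# Sketch of a single-responsibility ComfyUI node.
# Names are illustrative, not the real classes in ComfyUI-AceMusic.
class AceMusicSettings:
    """Packages generation parameters only; model loading, lyrics, and
    generation each live in their own node."""

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI builds the widget UI from this declaration; keeping it
        # small avoids the widget-ordering problems of 30-parameter nodes
        return {
            "required": {
                "duration": ("FLOAT", {"default": 60.0, "min": 1.0, "max": 600.0}),
                "language": (["en", "ja", "zh"],),
            }
        }

    RETURN_TYPES = ("ACE_SETTINGS",)  # custom type consumed downstream
    FUNCTION = "build"
    CATEGORY = "AceMusic"

    def build(self, duration, language):
        # No model loading, no lyrics parsing: just parameter packaging
        return ({"duration": duration, "language": language},)
```

Downstream nodes then declare an `ACE_SETTINGS` input and receive the dict unchanged, so each node's widget list stays short and stable.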
2. Cross-Platform Compatibility

The Problem: torchaudio backends can fail on Windows + Python 3.13+.

The Solution: Use soundfile and scipy instead.

```python
# Problematic approach: torchaudio backend selection can fail
import torchaudio
audio, sr = torchaudio.load("file.wav")  # fails on Windows + Python 3.13+

# ComfyUI-AceMusic approach: no backend configuration needed
import soundfile as sf
audio, sr = sf.read("file.wav")  # works everywhere
```

This isn't just a workaround — it's a more robust solution that works across all platforms without requiring specific backend configurations.
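To illustrate the same idea, here is a sketch of a loader that prefers soundfile and, as a last resort, falls back to the stdlib `wave` module for 16-bit PCM files. The helper names are mine, and the stdlib fallback is my addition, not something the repo ships:

```python
import struct
import wave

def read_wav_fallback(path):
    """Read a 16-bit PCM WAV using only the stdlib, returning (samples, sr)."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "fallback handles 16-bit PCM only"
        sr = w.getframerate()
        raw = w.readframes(w.getnframes())
        # signed little-endian 16-bit integers -> floats in [-1, 1]
        ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
        samples = [s / 32768.0 for s in ints]
    return samples, sr

def read_audio(path):
    """Prefer soundfile (robust on Windows + Python 3.13+), else stdlib wave."""
    try:
        import soundfile as sf
        data, sr = sf.read(path)
        return data, sr
    except ImportError:
        return read_wav_fallback(path)
```

The point is that neither path depends on a native torchaudio backend being configured, which is exactly what breaks on newer Windows/Python combinations.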

3. HeartMuLa Interoperability

The AUDIO type in ComfyUI-AceMusic is compatible with HeartMuLa outputs, enabling hybrid workflows:

```
[HeartMuLa Generator] → [AceMusic Cover] → [AceMusic Extend] → [Output]
```

This lets you combine the strengths of different music generation models in a single workflow.
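Concretely, the AUDIO type in recent ComfyUI builds is a dict carrying a `[batch, channels, samples]` tensor plus a sample rate; a sketch of wrapping raw samples into that shape (the function name is mine):

```python
import torch

def as_comfy_audio(waveform, sample_rate):
    """Wrap raw samples into ComfyUI's AUDIO dict so any AUDIO-typed node
    (a HeartMuLa output, an AceMusic Cover input, ...) can consume it."""
    t = torch.as_tensor(waveform, dtype=torch.float32)
    while t.dim() < 3:  # [samples] -> [channels, samples] -> [batch, channels, samples]
        t = t.unsqueeze(0)
    return {"waveform": t, "sample_rate": int(sample_rate)}
```

Because both node packs speak this one convention, no conversion node is needed between them.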


Quick Start

Installation

Via ComfyUI Manager (Recommended):
Search for "ComfyUI-AceMusic" and install.

Manual:

```shell
cd ComfyUI/custom_nodes
git clone https://github.com/hiroki-abe-58/ComfyUI-AceMusic.git
cd ComfyUI-AceMusic
pip install -r requirements.txt

# Install ACE-Step 1.5
pip install git+https://github.com/ace-step/ACE-Step.git
```

Models auto-download from Hugging Face on first use.

Basic Workflow (Text-to-Music)

  1. Add AceMusic Model Loader → set device to cuda
  2. Add AceMusic Settings → configure duration, language, etc.
  3. Add AceMusic Lyrics Input:

```
[Verse]
Walking down the empty street
Thinking about you and me

[Chorus]
We belong together
Now and forever
```

  4. Add AceMusic Caption Input: pop, female vocal, energetic
  5. Connect all to AceMusic Generator → Preview Audio

Load the example workflow: workflow/AceMusic_Lyrics_v3.json
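The bracketed section markers above are plain text, so they are easy to inspect before generation. A hypothetical helper (not part of the node pack) that splits lyrics into sections:

```python
import re

# Matches a section header line such as "[Verse]" or "[Chorus]"
SECTION = re.compile(r"^\[(?P<name>[A-Za-z ]+)\]$")

def split_sections(lyrics):
    """Split bracketed-section lyrics into {section_name: [lines]}."""
    sections, current = {}, None
    for line in lyrics.splitlines():
        line = line.strip()
        if not line:
            continue
        m = SECTION.match(line)
        if m:
            current = m.group("name")
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections
```

A quick check like `assert "Chorus" in split_sections(lyrics)` catches a missing or typoed marker before you spend a generation run on it.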

Cover Workflow (Style Transfer)

```
[Load Audio] ──────────────────┐
                               ↓
[Model Loader] → [Settings] → [AceMusic Cover] → [Preview Audio]
                               ↑
[Caption Input] ───────────────┘
"jazz piano trio, smooth, relaxed"
```

Use cases:

  • Pop → Jazz arrangement
  • Rock → Acoustic version
  • EDM → Orchestral arrangement

Repaint Workflow (Section Regeneration)

```
[Load Audio] ──────────────────┐
                               ↓
[Model Loader] → [Settings] → [AceMusic Repaint] → [Preview Audio]
                               ↑
[Time Range: 30-45s] ──────────┘
```

Use cases:

  • Fix a problematic chorus
  • Improve the intro
  • Regenerate specific vocal sections
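Repaint takes a time range, and under the hood a range like 30-45s has to be mapped to sample indices in the waveform. A hypothetical sketch of that bookkeeping (the node handles this internally; names are mine):

```python
def repaint_range_to_samples(start_s, end_s, sample_rate, total_samples):
    """Convert a repaint time range in seconds to clamped sample indices."""
    start = max(0, int(round(start_s * sample_rate)))
    end = min(total_samples, int(round(end_s * sample_rate)))
    if start >= end:
        raise ValueError("empty repaint range")
    return start, end
```

Clamping to the audio length means a range like 50-70s on a 60s clip simply repaints the final 10 seconds instead of erroring out.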

Performance

Generation Speed

| Device | RTF (27 steps) | Time for 1 min audio |
| --- | --- | --- |
| RTX 5090 | ~50x | ~1.2s |
| RTX 4090 | 34.48x | 1.74s |
| A100 | 27.27x | 2.20s |
| RTX 3090 | 12.76x | 4.70s |
| M2 Max | 2.27x | 26.43s |
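RTF (real-time factor) is the audio duration divided by wall-clock generation time, so the time column follows directly from the RTF column:

```python
def generation_time(audio_seconds, rtf):
    """Wall-clock seconds to generate `audio_seconds` of audio at rtf x real time."""
    return audio_seconds / rtf

# Reproducing rows from the table above (1 minute of audio)
for device, rtf in [("RTX 4090", 34.48), ("A100", 27.27), ("RTX 3090", 12.76)]:
    print(f"{device}: {generation_time(60, rtf):.2f}s")
```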

VRAM Requirements

| Mode | VRAM | Notes |
| --- | --- | --- |
| Normal | 8GB+ | Full speed |
| CPU Offload | ~4GB | Slower but works on limited VRAM |

Troubleshooting

| Error | Cause | Solution |
| --- | --- | --- |
| `CUDA out of memory` | Insufficient GPU memory | Enable `cpu_offload` or reduce duration |
| `ModuleNotFoundError: acestep` | ACE-Step not installed | `pip install git+https://github.com/ace-step/ACE-Step.git` |
| `soundfile` not found | Missing dependency | `pip install soundfile scipy` |
| Model download failed | Network issue | Check Hugging Face access |
| torchaudio backend error | Windows + Python 3.13+ issue | Ensure `soundfile` is properly installed |

Environment Check Script

```python
#!/usr/bin/env python3
"""ComfyUI-AceMusic Environment Checker"""
import sys

def check():
    issues = []

    # Python version
    print(f"Python: {sys.version}")
    if sys.version_info < (3, 10):
        issues.append("Python 3.10+ required")

    # PyTorch + CUDA
    try:
        import torch
        print(f"✅ PyTorch: {torch.__version__}")
        if torch.cuda.is_available():
            print(f"✅ CUDA: {torch.version.cuda}")
            vram = torch.cuda.get_device_properties(0).total_memory / 1e9
            print(f"✅ GPU VRAM: {vram:.1f} GB")
        else:
            issues.append("CUDA not available")
    except ImportError:
        issues.append("PyTorch not installed")

    # ACE-Step
    try:
        import acestep  # noqa: F401
        print("✅ ACE-Step: installed")
    except ImportError:
        issues.append("ACE-Step not installed")

    # Audio libraries (used instead of torchaudio backends)
    for mod in ("soundfile", "scipy"):
        try:
            __import__(mod)
            print(f"✅ {mod}: installed")
        except ImportError:
            issues.append(f"{mod} not installed")

    # Results
    print("\n" + "=" * 50)
    if issues:
        print("❌ Issues found:")
        for issue in issues:
            print(f"  - {issue}")
    else:
        print("✅ Environment OK!")
    return not issues

if __name__ == "__main__":
    sys.exit(0 if check() else 1)
```

Why I Built This

When I saw the official announcement saying "these features aren't yet supported," I knew exactly what needed to be done. The ACE-Step team built an incredible model with Cover, Repaint, Edit, and other powerful features — but without ComfyUI support, most users couldn't access them.

The hardest part was the torchaudio issue. On Windows with Python 3.13+, the audio backends just don't work reliably. The solution was to bypass torchaudio entirely and use soundfile/scipy for all audio I/O. It's a more robust approach that should work on any platform.

The modular architecture came from frustration with existing implementations. Stuffing 30+ parameters into one node isn't just ugly — it causes real bugs. Separating concerns made the nodes more reliable and the workflows more readable.

This is what open source is about. The official team sets the direction, and the community fills in the gaps. I'm proud to contribute to the music generation ecosystem.


Links

  • GitHub: github.com/hiroki-abe-58/ComfyUI-AceMusic
License

Apache 2.0


If you find this useful, consider starring the repo. And if you build something cool with it, I'd love to see it!
