<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: yha9806</title>
    <description>The latest articles on DEV Community by yha9806 (@yha9806).</description>
    <link>https://dev.to/yha9806</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3876647%2F912bd6a6-dd6f-421c-a7f5-23f7c93f90d7.jpeg</url>
      <title>DEV Community: yha9806</title>
      <link>https://dev.to/yha9806</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yha9806"/>
    <language>en</language>
    <item>
      <title>I Built a Free Local AI Art Pipeline on My Mac — Here's What Broke</title>
      <dc:creator>yha9806</dc:creator>
      <pubDate>Mon, 13 Apr 2026 13:09:33 +0000</pubDate>
      <link>https://dev.to/yha9806/i-built-a-free-local-ai-art-pipeline-on-my-mac-heres-what-broke-3cip</link>
      <guid>https://dev.to/yha9806/i-built-a-free-local-ai-art-pipeline-on-my-mac-heres-what-broke-3cip</guid>
      <description>&lt;p&gt;What if you could run a complete AI art creation pipeline — 13 cultural traditions, 5-dimension scoring, structured layer generation — entirely on your MacBook, for free?&lt;/p&gt;

&lt;p&gt;No cloud API key. No GPU server. Just &lt;code&gt;pip install vulca&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkniad86le2jqhoz3z9as.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkniad86le2jqhoz3z9as.png" alt="Chinese Xieyi ink wash landscape" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9vcc3i7wi2ff9x8chph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9vcc3i7wi2ff9x8chph.png" alt="Japanese traditional snow temple" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ugmb70r1fiqy1buwx91.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ugmb70r1fiqy1buwx91.png" alt="Brand design tea packaging" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Three traditions, one SDK — generated locally via ComfyUI/SDXL on Apple Silicon, zero cloud API cost.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These images were generated on an Apple Silicon Mac running ComfyUI locally. No Midjourney subscription. No Replicate credits. No DALL-E API calls. The evaluation scores below come from a VLM (Gemma 4 via Ollama) running on the same machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;vulca evaluate art.png &lt;span class="nt"&gt;-t&lt;/span&gt; chinese_xieyi &lt;span class="nt"&gt;--mode&lt;/span&gt; reference
&lt;span class="go"&gt;
  Score:     90%    Tradition: chinese_xieyi    Risk: low

    L1 Visual Perception         ██████████████████░░ 90%  ✓
    L2 Technical Execution       █████████████████░░░ 85%  ✓
    L3 Cultural Context          ██████████████████░░ 90%  ✓
    L4 Critical Interpretation   ████████████████████ 100% ✓
    L5 Philosophical Aesthetics  ██████████████████░░ 90%  ✓
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This post is not a product announcement. It is a technical deep dive into what it took to build &lt;a href="https://github.com/vulca-org/vulca" rel="noopener noreferrer"&gt;VULCA&lt;/a&gt; — the bugs we hit, the architectural decisions we made, and the code that holds it together.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. What is VULCA + The Local Stack
&lt;/h2&gt;

&lt;p&gt;VULCA is an AI-native cultural art creation SDK. It generates, evaluates, decomposes, and evolves visual art across 13 cultural traditions. It runs locally (ComfyUI + Ollama) or in the cloud (Gemini).&lt;/p&gt;

&lt;p&gt;It is not another Midjourney wrapper or ComfyUI plugin; it is a standalone SDK for cultural art intelligence.&lt;/p&gt;

&lt;p&gt;The project started as academic research. The &lt;a href="https://aclanthology.org/2025.findings-emnlp/" rel="noopener noreferrer"&gt;VULCA Framework&lt;/a&gt; was published at EMNLP 2025 Findings, and &lt;a href="https://arxiv.org/abs/2601.07986" rel="noopener noreferrer"&gt;VULCA-Bench&lt;/a&gt; provides 7,410 annotated samples with L1-L5 cultural scoring definitions. The SDK implements this research as a production tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────┐
│                  vulca CLI                   │
├─────────────┬──────────┬────────────────────┤
│   create    │ evaluate │  layers / studio   │
├─────────────┴──────────┴────────────────────┤
│              Cultural Engine                 │
│   13 traditions × L1-L5 scoring rubrics     │
├──────────────────┬──────────────────────────┤
│  Image Providers │      VLM Providers       │
│  ┌────────────┐  │  ┌────────────────────┐  │
│  │  ComfyUI   │  │  │  Ollama (Gemma 4)  │  │
│  │  (local)   │  │  │  (local)           │  │
│  ├────────────┤  │  ├────────────────────┤  │
│  │  Gemini    │  │  │  Gemini            │  │
│  │  (cloud)   │  │  │  (cloud)           │  │
│  └────────────┘  │  └────────────────────┘  │
└──────────────────┴──────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quickstart
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;vulca

&lt;span class="c"&gt;# Point at your local ComfyUI + Ollama&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VULCA_IMAGE_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8188
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VULCA_VLM_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama_chat/gemma4

&lt;span class="c"&gt;# Generate&lt;/span&gt;
vulca create &lt;span class="s2"&gt;"Misty mountains after spring rain"&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; chinese_xieyi &lt;span class="nt"&gt;--provider&lt;/span&gt; comfyui &lt;span class="nt"&gt;-o&lt;/span&gt; art.png

&lt;span class="c"&gt;# Evaluate&lt;/span&gt;
vulca evaluate art.png &lt;span class="nt"&gt;-t&lt;/span&gt; chinese_xieyi &lt;span class="nt"&gt;--mode&lt;/span&gt; reference
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Provider Architecture: Pluggable, Not Locked In
&lt;/h3&gt;

&lt;p&gt;VULCA does not depend on any single backend. Image providers are pluggable classes. ComfyUI is one provider. Gemini is another. You can add your own.&lt;/p&gt;
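&lt;p&gt;A minimal sketch of what such a provider contract might look like (the names here are illustrative; the SDK's actual interface may differ):&lt;/p&gt;

```python
from typing import Protocol


class ImageProvider(Protocol):
    """Hypothetical provider contract: a capability set plus one generate method."""

    capabilities: frozenset  # e.g. {"raw_rgba", "multilingual_prompt"}

    async def generate(self, prompt: str, width: int, height: int, **kwargs) -> bytes:
        """Return the generated image as raw bytes."""
        ...
```

&lt;p&gt;Any class with that shape can be dropped in: no registration with a central registry, no inheritance required.&lt;/p&gt;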

&lt;p&gt;The key design insight: providers declare their capabilities as a frozen set. VULCA uses these capabilities to decide how to format prompts, whether to pass CJK text directly, and whether RGBA output is available.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ComfyUI: CLIP-based encoder, English-only, returns raw RGBA
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ComfyUIImageProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;frozenset&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;frozenset&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_rgba&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Gemini: LLM-based encoder, understands CJK natively, returns raw RGBA
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GeminiImageProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;frozenset&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;frozenset&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_rgba&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multilingual_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;multilingual_prompt&lt;/code&gt; capability is the difference between a 120-token structured prompt (Gemini can handle it) and a compressed 60-token flat prompt (CLIP will truncate anything beyond 77 tokens). More on this in section 5.&lt;/p&gt;
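&lt;p&gt;In code, that decision can be as small as one membership test. A hedged sketch (the function name and the flattening heuristic are ours, not the SDK's actual logic):&lt;/p&gt;

```python
def build_prompt(structured: dict, capabilities: frozenset) -> str:
    """Pick a prompt format based on what the provider's encoder can handle."""
    if "multilingual_prompt" in capabilities:
        # LLM-based encoders (e.g. Gemini) accept a long structured prompt,
        # including CJK terminology, without truncation.
        return "\n".join(f"{key}: {value}" for key, value in structured.items())
    # CLIP tokenizers hard-truncate at 77 tokens, so flatten to a short
    # comma-separated list of English keywords instead.
    keywords = [value for value in structured.values() if value.isascii()]
    return ", ".join(keywords)


structured = {"subject": "misty mountains", "style": "ink wash", "motto": "气韵生动"}
print(build_prompt(structured, frozenset({"multilingual_prompt"})))  # three-line structured prompt
print(build_prompt(structured, frozenset()))  # "misty mountains, ink wash"
```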

&lt;p&gt;When you ask ComfyUI to generate an image, VULCA constructs a complete ComfyUI workflow as a JSON dict and submits it via the REST API. No ComfyUI nodes to install. No custom workflows to import. The entire workflow is built programmatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KSampler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randbelow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cfg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;7.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sampler_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;euler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scheduler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;denoise&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;positive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latent_image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]}},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CheckpointLoaderSimple&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ckpt_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checkpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sd_xl_base_1.0.safetensors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EmptyLatentImage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;width&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;height&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batch_size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLIPTextEncode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;full_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]}},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLIPTextEncode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;negative_prompt&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]}},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VAEDecode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;samples&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vae&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]}},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SaveImage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename_prefix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vulca&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]}},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is from &lt;a href="https://github.com/vulca-org/vulca/blob/master/src/vulca/providers/comfyui.py#L42" rel="noopener noreferrer"&gt;&lt;code&gt;src/vulca/providers/comfyui.py&lt;/code&gt; lines 42-62&lt;/a&gt;. It constructs a standard SDXL pipeline: checkpoint loader, empty latent, two CLIP text encoders (positive + negative), KSampler, VAE decode, save. The workflow is submitted as a single POST to &lt;code&gt;/prompt&lt;/code&gt;, and VULCA polls &lt;code&gt;/history/{prompt_id}&lt;/code&gt; until the image is ready.&lt;/p&gt;
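&lt;p&gt;The submit-and-poll loop is plain HTTP against ComfyUI's REST API. A condensed sketch (error handling trimmed; the helper names are ours, not the SDK's):&lt;/p&gt;

```python
import json
import time
import urllib.request


def job_finished(history: dict, prompt_id: str) -> bool:
    """A job is done once its prompt_id shows up in the /history response."""
    return prompt_id in history


def run_workflow(base_url: str, workflow: dict, timeout: float = 300.0) -> dict:
    # Submit the whole workflow graph in one POST to /prompt.
    req = urllib.request.Request(
        f"{base_url}/prompt",
        data=json.dumps(workflow).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        prompt_id = json.load(resp)["prompt_id"]

    # Poll /history/{prompt_id} once a second until the job appears or we give up.
    for _ in range(int(timeout)):
        with urllib.request.urlopen(f"{base_url}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if job_finished(history, prompt_id):
            return history[prompt_id]
        time.sleep(1.0)
    raise TimeoutError(f"ComfyUI job {prompt_id} did not finish in {timeout}s")
```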

&lt;p&gt;After the image comes back, VULCA checks that the bytes are actually a valid PNG before accepting them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;raw_bytes&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\x89&lt;/span&gt;&lt;span class="s"&gt;PNG&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ComfyUI returned invalid image &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; bytes, header=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;raw_bytes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That validation was added in commit &lt;a href="https://github.com/vulca-org/vulca/commit/fdc0e45" rel="noopener noreferrer"&gt;&lt;code&gt;fdc0e45&lt;/code&gt;&lt;/a&gt; after we discovered that certain PyTorch MPS bugs cause ComfyUI to return 4KB files with valid PNG headers but all-zero pixel data.&lt;/p&gt;
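&lt;p&gt;Note that a 4KB file with a valid PNG header sails past a size-and-header check. One way to also catch the all-zero case is to decode the pixels and inspect their range (illustrative only, using Pillow; this is not what the SDK ships):&lt;/p&gt;

```python
import io

from PIL import Image  # Pillow


def looks_blank(raw_bytes: bytes) -> bool:
    """True if the image decodes but every pixel is zero (all black)."""
    img = Image.open(io.BytesIO(raw_bytes)).convert("L")
    lo, hi = img.getextrema()  # (min, max) pixel value in the grayscale image
    return lo == 0 and hi == 0
```

&lt;p&gt;The trade-off is a full decode per image, which is why a cheap header check is a reasonable first line of defense.&lt;/p&gt;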




&lt;h2&gt;
  
  
  2. L1-L5 Cultural Evaluation
&lt;/h2&gt;

&lt;p&gt;Most AI art tools generate. VULCA evaluates.&lt;/p&gt;

&lt;p&gt;The evaluation framework scores artwork across five dimensions, each measuring a different aspect of cultural and artistic quality:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;What It Measures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;L1&lt;/strong&gt; Visual Perception&lt;/td&gt;
&lt;td&gt;Composition, color harmony, spatial arrangement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;L2&lt;/strong&gt; Technical Execution&lt;/td&gt;
&lt;td&gt;Rendering quality, technique fidelity, craftsmanship&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;L3&lt;/strong&gt; Cultural Context&lt;/td&gt;
&lt;td&gt;Tradition-specific motifs, canonical conventions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;L4&lt;/strong&gt; Critical Interpretation&lt;/td&gt;
&lt;td&gt;Cultural sensitivity, contextual framing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;L5&lt;/strong&gt; Philosophical Aesthetics&lt;/td&gt;
&lt;td&gt;Artistic depth, emotional resonance, spiritual qualities&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are not arbitrary categories. They come from the &lt;a href="https://arxiv.org/abs/2601.07986" rel="noopener noreferrer"&gt;VULCA-Bench paper&lt;/a&gt;, which defines L1-L5 across 7,410 annotated samples.&lt;/p&gt;

&lt;h3&gt;
  
  
  13 Traditions, Custom Weights
&lt;/h3&gt;

&lt;p&gt;Each tradition is defined as a YAML file with its own L1-L5 weight distribution. Chinese freehand ink painting (xieyi) weights philosophical aesthetics (L5) at 30% and cultural context (L3) at 25%, because the tradition values spiritual resonance and canonical motifs above raw technical rendering. A brand design tradition would weight L2 (technical execution) much higher.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# src/vulca/cultural/data/traditions/chinese_xieyi.yaml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;chinese_xieyi&lt;/span&gt;
&lt;span class="na"&gt;display_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;en&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chinese&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Freehand&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Ink&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(Xieyi)"&lt;/span&gt;
  &lt;span class="na"&gt;zh&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;中国写意"&lt;/span&gt;

&lt;span class="na"&gt;weights&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;L1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.10&lt;/span&gt;
  &lt;span class="na"&gt;L2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.15&lt;/span&gt;
  &lt;span class="na"&gt;L3&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.25&lt;/span&gt;
  &lt;span class="na"&gt;L4&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.20&lt;/span&gt;
  &lt;span class="na"&gt;L5&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.30&lt;/span&gt;

&lt;span class="na"&gt;terminology&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;term&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spirit resonance and vitality&lt;/span&gt;
    &lt;span class="na"&gt;term_zh&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;气韵生动"&lt;/span&gt;
    &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;en&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;of&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Xie&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;He's&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Six&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Principles&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;of&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Chinese&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Painting..."&lt;/span&gt;
    &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aesthetics&lt;/span&gt;
    &lt;span class="na"&gt;l_levels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;L4&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;L5&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 13 supported traditions are: &lt;code&gt;chinese_xieyi&lt;/code&gt;, &lt;code&gt;chinese_gongbi&lt;/code&gt;, &lt;code&gt;japanese_traditional&lt;/code&gt;, &lt;code&gt;western_academic&lt;/code&gt;, &lt;code&gt;islamic_geometric&lt;/code&gt;, &lt;code&gt;watercolor&lt;/code&gt;, &lt;code&gt;african_traditional&lt;/code&gt;, &lt;code&gt;south_asian&lt;/code&gt;, &lt;code&gt;brand_design&lt;/code&gt;, &lt;code&gt;photography&lt;/code&gt;, &lt;code&gt;contemporary_art&lt;/code&gt;, &lt;code&gt;ui_ux_design&lt;/code&gt;, and &lt;code&gt;default&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Evaluation Modes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strict&lt;/strong&gt; (judge): Conformance scoring. How well does the art meet the tradition's standards?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reference&lt;/strong&gt; (mentor): Cultural guidance with professional terminology. Not a judge, a mentor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fusion&lt;/strong&gt;: Multi-tradition comparison. Pass comma-separated traditions and get cross-cultural analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The API: Three Lines to Score Any Image
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;vulca&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;vulca&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aevaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;artwork.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tradition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chinese_xieyi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
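
&lt;p&gt;One snag if you paste that into a plain script: top-level &lt;code&gt;await&lt;/code&gt; only works in a notebook or the &lt;code&gt;asyncio&lt;/code&gt; REPL. In a script you wrap the call in &lt;code&gt;asyncio.run()&lt;/code&gt;. A minimal sketch of the pattern; the stub coroutine here stands in for &lt;code&gt;vulca.aevaluate&lt;/code&gt;, which is not being called:&lt;/p&gt;

```python
import asyncio

# Stand-in for vulca.aevaluate: same call shape, no model or network access.
async def aevaluate_stub(image: str, *, tradition: str = "", mode: str = "strict") -> dict:
    return {"image": image, "tradition": tradition, "mode": mode, "score": 0.0}

def main() -> dict:
    # With the real package this would be:
    #   asyncio.run(vulca.aevaluate("artwork.png", tradition="chinese_xieyi", mode="reference"))
    return asyncio.run(
        aevaluate_stub("artwork.png", tradition="chinese_xieyi", mode="reference")
    )
```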



&lt;p&gt;The full &lt;code&gt;aevaluate()&lt;/code&gt; signature from &lt;a href="https://github.com/vulca-org/vulca/blob/master/src/vulca/evaluate.py#L12" rel="noopener noreferrer"&gt;&lt;code&gt;src/vulca/evaluate.py&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;aevaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tradition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sparse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;EvalResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;sparse&lt;/code&gt; parameter is worth calling out. When &lt;code&gt;sparse=True&lt;/code&gt;, VULCA runs a &lt;code&gt;BriefIndexer&lt;/code&gt; that determines which L1-L5 dimensions are most relevant to the given intent. All five dimensions are still scored (consistency matters), but the &lt;code&gt;sparse_activation&lt;/code&gt; metadata tells callers which dimensions were most salient. This is useful in pipeline mode where you want to focus review on the dimensions that matter for a specific prompt.&lt;/p&gt;
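
&lt;p&gt;Conceptually, the salience selection is just a ranking over the per-dimension weights. A toy sketch of the idea; the function name and weights are illustrative, not VULCA's internals:&lt;/p&gt;

```python
def select_salient(weights: dict[str, float], top_k: int = 2) -> list[str]:
    """Rank L1-L5 dimensions by weight and keep the top_k most salient.
    Toy stand-in for what the sparse_activation metadata conveys."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:top_k]

# Example weights, shaped like a tradition config's L1-L5 block.
weights = {"L1": 0.10, "L2": 0.15, "L3": 0.25, "L4": 0.20, "L5": 0.30}
```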




&lt;h2&gt;
  
  
  3. Deep Dive: Structured Layer Generation
&lt;/h2&gt;

&lt;p&gt;VULCA does not generate images. It generates layers.&lt;/p&gt;

&lt;p&gt;The pipeline works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Intent parsing&lt;/strong&gt; — user prompt is analyzed for tradition, subject, and composition intent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VLM planning&lt;/strong&gt; — Gemma 4 (via Ollama) decomposes the prompt into a layer plan: background, mid-ground elements, foreground, calligraphy/text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-layer generation&lt;/strong&gt; — each layer is generated as a separate image with transparent background&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Luminance keying&lt;/strong&gt; — non-background layers are keyed to remove canvas color, producing clean alpha&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alpha composite&lt;/strong&gt; — layers are composited in order to produce the final artwork&lt;/li&gt;
&lt;/ol&gt;
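
&lt;p&gt;Steps 4 and 5 fit in a few lines of numpy. This is an illustration of the idea, not VULCA's implementation, and the tolerance value is arbitrary:&lt;/p&gt;

```python
import numpy as np

def luminance_key(layer: np.ndarray, canvas_rgb: np.ndarray, tol: float = 30.0) -> np.ndarray:
    """Add an alpha channel: pixels close to the canvas color become transparent."""
    dist = np.linalg.norm(layer.astype(float) - canvas_rgb, axis=-1)
    alpha = (dist > tol).astype(float)[..., None]
    return np.concatenate([layer.astype(float), alpha * 255.0], axis=-1)

def composite(base: np.ndarray, layers: list[np.ndarray]) -> np.ndarray:
    """Alpha-over compositing in order, bottom to top."""
    out = base.astype(float)
    for layer in layers:
        rgb, a = layer[..., :3], layer[..., 3:] / 255.0
        out = rgb * a + out * (1.0 - a)
    return out.astype(np.uint8)
```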

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5qkfb9y14nkg7l36whh1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5qkfb9y14nkg7l36whh1.png" alt="Layered exploded view" width="800" height="164"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Layer decomposition: paper, distant mountains, forest, calligraphy, composite&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Serial-First Style Anchoring
&lt;/h3&gt;

&lt;p&gt;The first layer generates serially as a style anchor. Its raw RGB output becomes the visual reference (&lt;code&gt;style_ref&lt;/code&gt;) for all subsequent layers, which generate in parallel. This is Defense 3 from v0.14 — without it, each layer would independently interpret "Chinese xieyi style" and you would get five different visual interpretations in the same artwork.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Prompt Builder
&lt;/h3&gt;

&lt;p&gt;The core of layer generation is &lt;code&gt;build_anchored_layer_prompt()&lt;/code&gt; in &lt;a href="https://github.com/vulca-org/vulca/blob/master/src/vulca/layers/layered_prompt.py#L47" rel="noopener noreferrer"&gt;&lt;code&gt;src/vulca/layers/layered_prompt.py&lt;/code&gt;&lt;/a&gt;. This function wraps the plan's regeneration prompt in four mandatory anchor blocks: canvas, content (with negative list), spatial, style.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frozen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LayerPromptResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Prompt + negative prompt pair for a layer.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;negative_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_anchored_layer_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;layer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LayerInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TraditionAnchor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sibling_roles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;coverage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;english_only&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;LayerPromptResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function has two code paths, controlled by &lt;code&gt;english_only&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When &lt;code&gt;english_only=False&lt;/code&gt;&lt;/strong&gt; (Gemini path): Returns a structured multi-section string with &lt;code&gt;[CANVAS]&lt;/code&gt;, &lt;code&gt;[CONTENT]&lt;/code&gt;, &lt;code&gt;[SPATIAL]&lt;/code&gt;, &lt;code&gt;[STYLE]&lt;/code&gt;, and &lt;code&gt;[USER INTENT]&lt;/code&gt; blocks. Gemini's LLM-based encoder can parse these sections and follow the instructions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;blocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[CANVAS]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The image MUST be drawn on &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;canvas_description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The background MUST be the pure canvas color &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;canvas_color_hex&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;with absolutely no other elements, textures, shading, or borders.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[CONTENT — exclusivity]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This image ONLY contains the element specified in USER INTENT.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Do NOT include any of: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;others_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[SPATIAL]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MUST occupy &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, covering approximately &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cov&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; of the canvas area.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[STYLE]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;style_keywords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[USER INTENT]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When &lt;code&gt;english_only=True&lt;/code&gt;&lt;/strong&gt; (ComfyUI/SDXL path): Returns a &lt;code&gt;LayerPromptResult&lt;/code&gt; with a flat, CLIP-friendly prompt under 70 tokens and a separate &lt;code&gt;negative_prompt&lt;/code&gt;. This is the path that took the most engineering to get right. More on why in section 4.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;english_only&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user_intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;style_keywords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;on &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;canvas_description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;negative&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;others&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;others&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;LayerPromptResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;negative_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;negative&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  CJK-Aware Prompt Handling
&lt;/h3&gt;

&lt;p&gt;VULCA accepts prompts in Chinese, Japanese, and Korean. When the target provider has the &lt;code&gt;multilingual_prompt&lt;/code&gt; capability (Gemini), CJK text passes through natively. When the provider does not have that capability (ComfyUI/SDXL with CLIP), VULCA strips CJK characters and falls back to English equivalents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_CJK_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_strip_cjk_parenthetical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Strip CJK parenthetical annotations, e.g. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cooked silk (熟绢)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; -&amp;gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cooked silk&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_CJK_PAREN_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So &lt;code&gt;vulca create "水墨山水" -t chinese_xieyi --provider comfyui&lt;/code&gt; works — VULCA translates the prompt for CLIP behind the scenes.&lt;/p&gt;
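
&lt;p&gt;The routing decision itself is small. A sketch using the same Unicode ranges as &lt;code&gt;_CJK_RE&lt;/code&gt;; the &lt;code&gt;multilingual_prompt&lt;/code&gt; flag mirrors the capability check described above, and the English fallback text is supplied by the caller:&lt;/p&gt;

```python
import re

# Same ranges as VULCA's _CJK_RE: CJK ideographs, kana, hangul.
CJK_RE = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")

def route_prompt(prompt: str, multilingual_prompt: bool, fallback_en: str) -> str:
    """Pass CJK through when the provider supports it; otherwise use English."""
    if CJK_RE.search(prompt) and not multilingual_prompt:
        return fallback_en
    return prompt
```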




&lt;h2&gt;
  
  
  4. Deep Dive: Making SDXL Work Locally
&lt;/h2&gt;

&lt;p&gt;This is where things got interesting. Two traps nearly derailed the local ComfyUI path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trap 1: The ANCHOR Hallucination
&lt;/h3&gt;

&lt;p&gt;Our structured layer prompts originally used section headers like &lt;code&gt;[CANVAS ANCHOR]&lt;/code&gt;, &lt;code&gt;[STYLE ANCHOR]&lt;/code&gt;, and &lt;code&gt;[CONTENT ANCHOR]&lt;/code&gt;. The word "ANCHOR" was there to signal to the LLM that these were fixed constraints, not suggestions.&lt;/p&gt;

&lt;p&gt;SDXL's CLIP encoder is not an LLM. It is a text encoder that treats every token as content. When it saw "ANCHOR", it interpreted it as a request to paint an anchor — the nautical kind.&lt;/p&gt;

&lt;p&gt;The result: literal ship anchors appearing on rice paper backgrounds in Chinese ink wash paintings. Misty mountains with a ship anchor in the corner. Bamboo forests with an anchor hovering over them.&lt;/p&gt;

&lt;p&gt;The fix was trivial once diagnosed. Rename the headers to &lt;code&gt;[CANVAS]&lt;/code&gt;, &lt;code&gt;[STYLE]&lt;/code&gt;, &lt;code&gt;[CONTENT]&lt;/code&gt;, &lt;code&gt;[SPATIAL]&lt;/code&gt;. No word that could be interpreted as visual content.&lt;/p&gt;

&lt;p&gt;Commit: &lt;a href="https://github.com/vulca-org/vulca/commit/b168178" rel="noopener noreferrer"&gt;&lt;code&gt;b168178&lt;/code&gt;&lt;/a&gt; — &lt;code&gt;fix(layers): remove ANCHOR from prompt headers — SDXL paints literal anchors&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The lesson: CLIP-based models do not have a concept of "metadata" or "instructions" in a prompt. Every token is content. If your prompt engineering uses structured headers, every header word will influence the generated image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trap 1b: The 77-Token CLIP Ceiling
&lt;/h3&gt;

&lt;p&gt;Fixing the anchor hallucination revealed a second, subtler problem. Our structured prompt — even without "ANCHOR" — was 120+ tokens. CLIP truncates at 77 tokens. The actual subject description ("misty mountains after spring rain") was buried past the 77-token boundary and never reached the encoder.&lt;/p&gt;

&lt;p&gt;Gallery images (simple prompts, ~30 tokens) worked perfectly, while layered generation (structured prompts, 120+ tokens) produced generic, unfocused results. Debugging was confusing because the same code path succeeded for simple creates and failed for layered ones.&lt;/p&gt;

&lt;p&gt;The fix: the &lt;code&gt;english_only&lt;/code&gt; branch in &lt;code&gt;build_anchored_layer_prompt()&lt;/code&gt;. Instead of a structured multi-section prompt, VULCA builds a flat, subject-first prompt under 70 tokens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;misty mountains after spring rain, traditional brushwork, ink wash, on aged xuan paper
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plus a separate &lt;code&gt;negative_prompt&lt;/code&gt; field (other layer roles to avoid). The subject comes first so it is guaranteed to be within CLIP's 77-token window.&lt;/p&gt;
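
&lt;p&gt;The principle generalizes: put the subject first and spend whatever budget remains on modifiers. A rough sketch that uses a whitespace word count as a cheap proxy for CLIP BPE tokens (a faithful version would count real tokens):&lt;/p&gt;

```python
def subject_first_prompt(subject: str, modifiers: list[str], max_words: int = 60) -> str:
    """Build a flat, comma-separated prompt with the subject guaranteed first.
    Modifiers that would push past the word budget are dropped, never the subject."""
    parts = [subject]
    budget = max_words - len(subject.split())
    for m in modifiers:
        n = len(m.split())
        if n <= budget:
            parts.append(m)
            budget -= n
    return ", ".join(parts)
```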

&lt;p&gt;Commit: &lt;a href="https://github.com/vulca-org/vulca/commit/74f9952" rel="noopener noreferrer"&gt;&lt;code&gt;74f9952&lt;/code&gt;&lt;/a&gt; — &lt;code&gt;fix(layers): CLIP-aware prompt compression for SDXL — flat &amp;lt;70 token prompt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;LayerPromptResult&lt;/code&gt; dataclass was added specifically for this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frozen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LayerPromptResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Prompt + negative prompt pair for a layer.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;negative_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The structured string (Gemini path) returns a single &lt;code&gt;str&lt;/code&gt;. The CLIP path returns a &lt;code&gt;LayerPromptResult&lt;/code&gt; with both positive and negative prompts separated. The caller checks &lt;code&gt;isinstance(result, LayerPromptResult)&lt;/code&gt; to decide which ComfyUI workflow nodes to populate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trap 2: PyTorch MPS — A Version Minefield
&lt;/h3&gt;

&lt;p&gt;With prompt engineering fixed, we hit the hardware layer. SDXL generation via ComfyUI on Apple Silicon (MPS backend) with PyTorch 2.11.0 produces either black images (all-zero, ~4KB files) or pure noise (~2MB of random pixels).&lt;/p&gt;

&lt;p&gt;Key observations that made this hard to diagnose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KSampler diffusion runs to completion — 20 steps, progress bars, no errors&lt;/li&gt;
&lt;li&gt;VAEDecode output is corrupt despite successful sampling&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--force-fp32&lt;/code&gt; does NOT fix it — this is a correctness bug, not a precision issue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three compounding PyTorch MPS bugs cause the failure:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 1: SDPA Non-Contiguous Tensor Regression&lt;/strong&gt; (&lt;a href="https://github.com/pytorch/pytorch/issues/163597" rel="noopener noreferrer"&gt;pytorch/pytorch#163597&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Introduced in PyTorch 2.8.0. MPS SDPA kernels produce wildly incorrect results when given non-contiguous tensors. SDXL's cross-attention performs transpose operations that create non-contiguous views, feeding garbage embeddings into the U-Net. Error magnitude: ~34.0 vs normal ~0.000006.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 2: Conv2d Chunk Correctness Bug&lt;/strong&gt; (&lt;a href="https://github.com/pytorch/pytorch/issues/169342" rel="noopener noreferrer"&gt;pytorch/pytorch#169342&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Affects PyTorch 2.9.0+. The &lt;code&gt;chunk() -&amp;gt; conv()&lt;/code&gt; pattern produces correct results only for the first batch element. Single-image generation (batch=1) is unaffected. Multi-image batch workflows will hit it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 3: Metal Kernel Migration Regressions&lt;/strong&gt; (&lt;a href="https://github.com/pytorch/pytorch/issues/155797" rel="noopener noreferrer"&gt;pytorch/pytorch#155797&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;PyTorch 2.10-2.11 introduced additional MPS regressions during internal operator migrations. Identical symptoms reported on M3 Ultra via &lt;a href="https://github.com/Comfy-Org/ComfyUI/issues/10681" rel="noopener noreferrer"&gt;ComfyUI#10681&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why VAEDecode Is the Failure Point
&lt;/h3&gt;

&lt;p&gt;The VAE decoder is uniquely vulnerable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses Conv2d with large channel counts (hit by Bug 2)&lt;/li&gt;
&lt;li&gt;Uses GroupNorm with float16 inputs (NaN propagation)&lt;/li&gt;
&lt;li&gt;Runs in a single pass, with none of the self-correction of the iterative KSampler loop&lt;/li&gt;
&lt;li&gt;Intermediate values explode to ~9.5e+25; GroupNorm cannot recover, and the output collapses to all-zero or random pixels&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Version Matrix
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;PyTorch Version&lt;/th&gt;
&lt;th&gt;SDXL on MPS&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2.4.1&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Last fully validated version&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.5.x&lt;/td&gt;
&lt;td&gt;Degraded&lt;/td&gt;
&lt;td&gt;Memory +50%, speed -60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.6.x&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Some SDPA issues, &lt;code&gt;--force-fp32&lt;/code&gt; can help&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.7.x&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Similar to 2.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.8.0&lt;/td&gt;
&lt;td&gt;Broken&lt;/td&gt;
&lt;td&gt;SDPA non-contiguous bug introduced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2.9.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Working&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Sweet spot&lt;/strong&gt;: pre-Metal migration, SDPA bug masked by ComfyUI's attention slicing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.10.0&lt;/td&gt;
&lt;td&gt;Broken&lt;/td&gt;
&lt;td&gt;Black images on M3 Ultra&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.11.0&lt;/td&gt;
&lt;td&gt;Broken&lt;/td&gt;
&lt;td&gt;Black/noise on Apple Silicon&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
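
&lt;p&gt;The matrix reduces to a lookup you can run before launching ComfyUI. The statuses below are transcribed from the table; the helper only parses major.minor:&lt;/p&gt;

```python
# Status per PyTorch minor version on Apple Silicon MPS, per the table above.
MPS_SDXL_STATUS = {
    (2, 4): "working",
    (2, 5): "degraded",
    (2, 6): "partial",
    (2, 7): "partial",
    (2, 8): "broken",
    (2, 9): "working",   # the sweet spot
    (2, 10): "broken",
    (2, 11): "broken",
}

def mps_sdxl_status(torch_version: str) -> str:
    """Look up SDXL-on-MPS status for a version string like '2.9.0'."""
    major, minor = (int(x) for x in torch_version.split(".")[:2])
    return MPS_SDXL_STATUS.get((major, minor), "unknown")
```

&lt;p&gt;In real use you would pass &lt;code&gt;torch.__version__&lt;/code&gt;; splitting on dots handles common suffixed forms like &lt;code&gt;2.9.0.post1&lt;/code&gt;.&lt;/p&gt;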

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In ComfyUI venv&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/dev/ComfyUI
./venv/bin/pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.9.0 &lt;span class="nv"&gt;torchvision&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;0.24.0 &lt;span class="nv"&gt;torchaudio&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.9.0
./venv/bin/python main.py &lt;span class="nt"&gt;--listen&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 8188
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pin &lt;code&gt;torch==2.9.0&lt;/code&gt;. That is the entire fix. We wrote a &lt;a href="https://github.com/vulca-org/vulca/blob/master/docs/apple-silicon-mps-comfyui-guide.md" rel="noopener noreferrer"&gt;complete Apple Silicon MPS + ComfyUI/SDXL Compatibility Guide&lt;/a&gt; that covers diagnosis, workarounds (CPU VAE, force-fp32), environment variables, and verification steps.&lt;/p&gt;
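&lt;p&gt;To confirm the pin took, here is a minimal version check. This helper is our own sketch, not part of VULCA; it only assumes &lt;code&gt;torch.__version__&lt;/code&gt; follows the usual &lt;code&gt;MAJOR.MINOR.PATCH[+local]&lt;/code&gt; shape:&lt;/p&gt;

```python
# Hypothetical helper to verify the pinned PyTorch version -- not VULCA code.
def is_pinned(version: str, pin: str = "2.9.0") -> bool:
    # Drop any local build suffix like "+cpu" before comparing.
    return version.split("+")[0] == pin

# Usage (inside the ComfyUI venv):
#   import torch
#   assert is_pinned(torch.__version__), torch.__version__
```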





&lt;h2&gt;
  
  
  5. Inpainting and Layer Editing
&lt;/h2&gt;

&lt;p&gt;Once you have layers, you can edit them individually without regenerating the entire artwork.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnuajrw0821aqdmdc6bf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnuajrw0821aqdmdc6bf.png" alt="Inpaint comparison" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Redraw a specific layer with a new instruction&lt;/span&gt;
vulca layers redraw ./layers/ &lt;span class="nt"&gt;--layer&lt;/span&gt; sky &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"warm golden sunset"&lt;/span&gt;

&lt;span class="c"&gt;# Region-based inpaint on the composite&lt;/span&gt;
vulca inpaint art.png &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="s2"&gt;"the sky"&lt;/span&gt; &lt;span class="nt"&gt;--instruction&lt;/span&gt; &lt;span class="s2"&gt;"stormy clouds"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The inpaint path uses the same provider architecture: ComfyUI receives an inpaint workflow with a mask, while Gemini receives the image + mask + instruction as a multipart prompt. The same &lt;code&gt;capabilities&lt;/code&gt; system determines prompt formatting.&lt;/p&gt;
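&lt;p&gt;As a rough sketch of what capability-driven dispatch looks like (the class and field names here are illustrative assumptions, not VULCA's actual internals):&lt;/p&gt;

```python
# Illustrative capability-based request formatting; names are assumed.
from dataclasses import dataclass

@dataclass
class Capabilities:
    multipart_prompt: bool  # Gemini-style: image + mask + text in one request

def format_inpaint_request(caps: Capabilities, image: bytes, mask: bytes,
                           instruction: str) -> dict:
    if caps.multipart_prompt:
        # Cloud VLM path: everything rides in a single multipart payload.
        return {"parts": [image, mask, instruction]}
    # Local path: the mask becomes a separate input on an inpaint workflow.
    return {"workflow": "inpaint", "image": image,
            "mask": mask, "prompt": instruction}
```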




&lt;h2&gt;
  
  
  6. What's Working, What's Next
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Current State (v0.15.1)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;All 13 traditions generating locally on Apple Silicon via ComfyUI + SDXL&lt;/li&gt;
&lt;li&gt;Full E2E pipeline validated: intent parsing, VLM planning, per-layer generation, keying, composite&lt;/li&gt;
&lt;li&gt;8 E2E phase tests passing in 2.4 seconds (mock mode)&lt;/li&gt;
&lt;li&gt;CJK prompts working end-to-end with automatic CLIP compression&lt;/li&gt;
&lt;li&gt;PNG response validation catches corrupt MPS output&lt;/li&gt;
&lt;li&gt;Structured layer generation with serial-first style anchoring&lt;/li&gt;
&lt;/ul&gt;
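&lt;p&gt;The PNG validation mentioned above can be as simple as a signature check; this sketch is ours, not VULCA's actual validator. A corrupt MPS render typically comes back truncated or as a non-PNG payload, and both fail this test:&lt;/p&gt;

```python
# Minimal PNG sanity check (our sketch, not VULCA's validator).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
IEND_TAIL = b"IEND\xaeB\x60\x82"  # IEND chunk type followed by its fixed CRC

def looks_like_png(data: bytes) -> bool:
    # A well-formed PNG starts with the 8-byte signature and ends with IEND.
    return data.startswith(PNG_SIGNATURE) and data.endswith(IEND_TAIL)
```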

&lt;h3&gt;
  
  
  The Commit Trail
&lt;/h3&gt;

&lt;p&gt;The local provider path was stabilized across these commits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/vulca-org/vulca/commit/b168178" rel="noopener noreferrer"&gt;&lt;code&gt;b168178&lt;/code&gt;&lt;/a&gt; — remove ANCHOR from prompt headers&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vulca-org/vulca/commit/42e0e3d" rel="noopener noreferrer"&gt;&lt;code&gt;42e0e3d&lt;/code&gt;&lt;/a&gt; — skip keying for background layers&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vulca-org/vulca/commit/fdc0e45" rel="noopener noreferrer"&gt;&lt;code&gt;fdc0e45&lt;/code&gt;&lt;/a&gt; — validate ComfyUI PNG response&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vulca-org/vulca/commit/74f9952" rel="noopener noreferrer"&gt;&lt;code&gt;74f9952&lt;/code&gt;&lt;/a&gt; — CLIP-aware prompt compression&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vulca-org/vulca/commit/e840496" rel="noopener noreferrer"&gt;&lt;code&gt;e840496&lt;/code&gt;&lt;/a&gt; — MPS compatibility guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vulca-org/vulca/commit/485067e" rel="noopener noreferrer"&gt;&lt;code&gt;485067e&lt;/code&gt;&lt;/a&gt; — v0.15.1 release&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Roadmap
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini cloud path&lt;/strong&gt;: Currently blocked on free-tier billing limits (image generation returns &lt;code&gt;limit: 0&lt;/code&gt;). Text + VLM vision work. Once billing is enabled, Gemini becomes the zero-setup cloud alternative.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SAM3 text-prompted segmentation&lt;/strong&gt;: Replace luminance keying with SAM3 for cleaner layer extraction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web UI / Gradio demo&lt;/strong&gt;: A browser-based interface for non-CLI users.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  7. Get Started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5-Minute Local Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Install VULCA&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;vulca

&lt;span class="c"&gt;# 2. Install ComfyUI (if you don't have it)&lt;/span&gt;
git clone https://github.com/comfyanonymous/ComfyUI
&lt;span class="nb"&gt;cd &lt;/span&gt;ComfyUI
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
./venv/bin/pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="c"&gt;# CRITICAL: pin PyTorch for Apple Silicon&lt;/span&gt;
./venv/bin/pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.9.0 &lt;span class="nv"&gt;torchvision&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;0.24.0 &lt;span class="nv"&gt;torchaudio&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.9.0

&lt;span class="c"&gt;# 3. Download SDXL checkpoint&lt;/span&gt;
&lt;span class="c"&gt;# Place sd_xl_base_1.0.safetensors in ComfyUI/models/checkpoints/&lt;/span&gt;

&lt;span class="c"&gt;# 4. Start ComfyUI&lt;/span&gt;
./venv/bin/python main.py &lt;span class="nt"&gt;--listen&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 8188

&lt;span class="c"&gt;# 5. Install Ollama + Gemma 4 (for VLM evaluation)&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama
ollama pull gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 6. Generate and evaluate&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VULCA_IMAGE_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8188
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VULCA_VLM_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama_chat/gemma4

vulca create &lt;span class="s2"&gt;"Misty mountains after spring rain"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; chinese_xieyi &lt;span class="nt"&gt;--provider&lt;/span&gt; comfyui &lt;span class="nt"&gt;-o&lt;/span&gt; art.png

vulca evaluate art.png &lt;span class="nt"&gt;-t&lt;/span&gt; chinese_xieyi &lt;span class="nt"&gt;--mode&lt;/span&gt; reference
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python API
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;vulca&lt;/span&gt;

&lt;span class="c1"&gt;# Evaluate any image
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;vulca&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aevaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;artwork.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tradition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chinese_xieyi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Access individual dimension scores
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;L1 Visual: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;L5 Philosophy: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;l5&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Overall: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What VULCA Is
&lt;/h3&gt;

&lt;p&gt;VULCA is an open-source SDK for AI-native cultural art creation. It brings cultural intelligence to AI art generation: 13 traditions, each with its own L1-L5 scoring rubric, terminology, and taboos.&lt;/p&gt;

&lt;p&gt;It is built on peer-reviewed research (EMNLP 2025 Findings), tested against 7,410 annotated samples (VULCA-Bench), and runs entirely on your local machine if you want it to.&lt;/p&gt;

&lt;h3&gt;
  
  
  What VULCA Is Not
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Not a ComfyUI plugin. ComfyUI is one of several image providers.&lt;/li&gt;
&lt;li&gt;Not a Midjourney alternative. VULCA does not host image generation — it orchestrates it.&lt;/li&gt;
&lt;li&gt;Not a wrapper around any single model. Swap ComfyUI for Gemini (or your own provider) with one config change.&lt;/li&gt;
&lt;/ul&gt;
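&lt;p&gt;That swap, using the &lt;code&gt;vulca create&lt;/code&gt; invocation from the setup above (the &lt;code&gt;gemini&lt;/code&gt; provider value is our assumption; check &lt;code&gt;vulca --help&lt;/code&gt; for the exact name):&lt;/p&gt;

```shell
# Same prompt and tradition; only the provider flag changes.
vulca create "Misty mountains after spring rain" \
  -t chinese_xieyi --provider comfyui -o art_local.png

# Hypothetical cloud swap (provider name assumed, not verified):
vulca create "Misty mountains after spring rain" \
  -t chinese_xieyi --provider gemini -o art_cloud.png
```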

&lt;h3&gt;
  
  
  Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/vulca-org/vulca" rel="noopener noreferrer"&gt;https://github.com/vulca-org/vulca&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href="https://pypi.org/project/vulca/" rel="noopener noreferrer"&gt;https://pypi.org/project/vulca/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MPS Guide&lt;/strong&gt;: &lt;a href="https://github.com/vulca-org/vulca/blob/master/docs/apple-silicon-mps-comfyui-guide.md" rel="noopener noreferrer"&gt;&lt;code&gt;docs/apple-silicon-mps-comfyui-guide.md&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research&lt;/strong&gt;: &lt;a href="https://aclanthology.org/2025.findings-emnlp/" rel="noopener noreferrer"&gt;VULCA Framework (EMNLP 2025 Findings)&lt;/a&gt; | &lt;a href="https://arxiv.org/abs/2601.07986" rel="noopener noreferrer"&gt;VULCA-Bench (arXiv)&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://pypi.org/project/vulca/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fpypi%2Fv%2Fvulca.svg" alt="PyPI" width="86" height="20"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://pypi.org/project/vulca/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fbadge%2Fpython-3.10%2B-blue.svg" alt="Python 3.10+" width="92" height="20"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/vulca-org/vulca/blob/master/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fbadge%2Flicense-Apache%25202.0-green.svg" alt="License: Apache 2.0" width="120" height="20"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If this resonates, &lt;a href="https://github.com/vulca-org/vulca" rel="noopener noreferrer"&gt;star us on GitHub&lt;/a&gt;. Try it, break it, tell us what failed — &lt;a href="https://github.com/vulca-org/vulca/issues" rel="noopener noreferrer"&gt;issues welcome&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you use VULCA in research, please cite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight bibtex"&gt;&lt;code&gt;&lt;span class="nc"&gt;@inproceedings&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;vulca2025&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{VULCA: A Framework for Cultural Art Evaluation}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;booktitle&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{Findings of the Association for Computational Linguistics: EMNLP 2025}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;year&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{2025}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj032vjdb2m6n19gmevmc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj032vjdb2m6n19gmevmc.png" alt="Tradition grid" width="800" height="532"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;13 traditions. One SDK. Your machine.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
