<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Technology at Worldline </title>
    <description>The latest articles on DEV Community by Technology at Worldline  (@worldlinetech).</description>
    <link>https://dev.to/worldlinetech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F6396%2Fae09e152-87fc-4390-b5d7-cb90729471d2.png</url>
      <title>DEV Community: Technology at Worldline </title>
      <link>https://dev.to/worldlinetech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/worldlinetech"/>
    <language>en</language>
    <item>
      <title>Modern concurrency on the JVM: Coroutines and Loom</title>
      <dc:creator>Yassine Benabbas </dc:creator>
      <pubDate>Tue, 02 Jun 2026 10:25:22 +0000</pubDate>
      <link>https://dev.to/worldlinetech/modern-concurrency-approaches-on-the-jvm-coroutines-and-loom-3bd4</link>
      <guid>https://dev.to/worldlinetech/modern-concurrency-approaches-on-the-jvm-coroutines-and-loom-3bd4</guid>
      <description>&lt;p&gt;Concurrent programming allows running multiple tasks simultaneously, and was historically achieved mostly through threads.&lt;br&gt;
However, they are harder to work with and have some limitations.&lt;br&gt;
Modern concurrent programming brings new concepts that make concurrent programming easier and more efficient.&lt;br&gt;
Two of those concepts are structured concurrency and lightweight threads, which are implemented in the JVM ecosystem through Kotlin coroutines and Project Loom.&lt;br&gt;
Let's explore these two approaches.&lt;/p&gt;
&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Traditional (thread-based) concurrency has two notable issues, which are the famous &lt;a href="https://callbackhell.com/" rel="noopener noreferrer"&gt;callback hell&lt;/a&gt; and high consumption of system resources due to extensive thread creation.&lt;br&gt;
The following example illustrates the extensive use of threads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;uniqueThreadNames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConcurrentHashMap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newKeySet&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1_000&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Simulate IO request (database, HTTP call, ...)&lt;/span&gt;
        &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;uniqueThreadNames&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;threadId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uniqueThreadNames&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printStackTrace&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}).&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above code creates 1000 threads that each sleep for 1 second and print the current number of unique thread IDs.&lt;br&gt;
In the example, we use a &lt;code&gt;ConcurrentHashMap&lt;/code&gt; to store the unique thread IDs while remaining thread-safe.&lt;br&gt;
The output is non-deterministic because threads run concurrently, but it should show growth toward 1000.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1
...
...
999
1000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means up to 1000 unique threads can be created, which is expected since we create 1000 threads in the code.&lt;/p&gt;

&lt;p&gt;By analyzing both the code and the output, we can note two above-mentioned problems.&lt;br&gt;
The first one is that the code is susceptible to callback hell: since threads are created with a lambda, they require a callback style of programming if we want to perform actions after the thread completes, which can lead to deeply nested code that is hard to read, predict and maintain.&lt;br&gt;
The second problem is that system resources are not optimized: creating 1000 threads can be resource intensive, especially if the tasks are I/O-bound and spend much of their time waiting.&lt;br&gt;
This can lead to high memory consumption and context switching overhead.&lt;br&gt;
In addition to that, there is a limit to the number of threads that can be created by the OS.&lt;br&gt;
We can verify this by increasing the number of threads to 1 million, which will throw an &lt;code&gt;OutOfMemoryError&lt;/code&gt; or &lt;code&gt;Too many threads&lt;/code&gt; error depending on the OS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printStackTrace&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}).&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the above code will throw an error similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[0.536s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 16k, detached.
[0.536s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Thread-4068"
Exception in thread "main" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
        at java.base/java.lang.Thread.start0(Native Method)
        at java.base/java.lang.Thread.start(Thread.java:1417)
        at MillionThreads.main(MillionThreads.java:9)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Modern concurrency concepts solve these issues as we'll see in the next sections.&lt;br&gt;
Let's start by defining some concepts before delving into concrete implementations.&lt;/p&gt;
&lt;h2&gt;
  
  
  Modern concurrency concepts
&lt;/h2&gt;

&lt;p&gt;There are two main concepts in modern concurrency that we will explore in this post: lightweight threads and structured concurrency.&lt;/p&gt;
&lt;h3&gt;
  
  
  Lightweight threads
&lt;/h3&gt;

&lt;p&gt;They are threads that are managed by the runtime (like the JVM or Kotlin runtime) instead of the operating system.&lt;br&gt;
They still run on top of OS threads, also called platform threads or carrier threads (because they carry the lightweight threads).&lt;br&gt;
However, lightweight threads can reuse platform threads, which allows bypassing the limit of OS threads.&lt;br&gt;
In addition to that, the usage of system resources is optimized, since platform threads are more expensive to create and maintain than lightweight ones.&lt;/p&gt;

&lt;p&gt;Lightweight threads are very efficient for tasks that spend most of their time waiting, such as I/O-bound (network, file I/O, etc.).&lt;br&gt;
This means that we can spawn a large number of lightweight threads that download files or communicate with printers without worrying about system resources, which is not the case with traditional threads.&lt;br&gt;
It is important to note that compute-intensive tasks are still bound to the raw CPU and GPU cores and power. Thus, running a large number of compute-intensive tasks efficiently in parallel is not possible on low-end hardware, even with lightweight threads.&lt;/p&gt;

&lt;p&gt;The JVM has two implementations of lightweight threads: coroutines (introduced by Kotlin) and virtual threads (introduced by Project Loom).&lt;/p&gt;
&lt;h3&gt;
  
  
  Structured concurrency
&lt;/h3&gt;

&lt;p&gt;Structured concurrency is a programming paradigm that aims to make concurrent code similar to a sequential one.&lt;br&gt;
It is achieved by providing APIs that replace traditional callback-based code with constructs that enforce a sequential structure for concurrent tasks.&lt;br&gt;
We can also see related structured-concurrency patterns in JavaScript, C# and Swift with the &lt;code&gt;async&lt;/code&gt;/&lt;code&gt;await&lt;/code&gt; model.&lt;/p&gt;

&lt;p&gt;In the JVM ecosystem, structured concurrency is implemented by Kotlin coroutines and Project Loom.&lt;/p&gt;
&lt;h2&gt;
  
  
  Kotlin coroutines
&lt;/h2&gt;

&lt;p&gt;A coroutine is a lightweight thread that is managed by the Kotlin runtime.&lt;br&gt;
Coroutines were initially proposed in &lt;a href="https://github.com/JetBrains/kotlin/releases/tag/build-1.1-M01" rel="noopener noreferrer"&gt;Kotlin version 1.1 M01&lt;/a&gt;, released in 2017.&lt;br&gt;
The first &lt;a href="https://github.com/Kotlin/kotlinx.coroutines/releases/tag/1.0.0" rel="noopener noreferrer"&gt;stable version&lt;/a&gt; was released with Kotlin 1.3 one year later.&lt;/p&gt;

&lt;p&gt;Two concepts are essential to understand coroutines: &lt;strong&gt;suspending functions&lt;/strong&gt; and &lt;strong&gt;CoroutineScope&lt;/strong&gt;.&lt;br&gt;
Coroutines run inside a &lt;strong&gt;CoroutineScope&lt;/strong&gt;, which is a context that defines the lifecycle of the coroutines.&lt;br&gt;
A &lt;strong&gt;suspending function&lt;/strong&gt; is a function that is marked with the &lt;code&gt;suspend&lt;/code&gt; keyword.&lt;br&gt;
Any function that calls suspending functions must be marked as &lt;code&gt;suspend&lt;/code&gt; as well (similar to the &lt;code&gt;async&lt;/code&gt; keyword in other languages).&lt;/p&gt;

&lt;p&gt;Regarding the lightweight aspect of coroutines, we can illustrate this by creating thousands of coroutines without worrying about system resources.&lt;br&gt;
Let's illustrate this with a program that creates 1 million coroutines that each sleep for 1 second and then prints the number of unique coroutines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="nn"&gt;org.example&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlinx.coroutines.*&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlin.time.Duration.Companion.seconds&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.concurrent.ConcurrentHashMap&lt;/span&gt;

&lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
  &lt;span class="c1"&gt;// Thread safe set&lt;/span&gt;
  &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;uniqueThreadNames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConcurrentHashMap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;newKeySet&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
  &lt;span class="nf"&gt;coroutineScope&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;launch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;uniqueThreadNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;currentThread&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;threadId&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Unique threads used: ${uniqueThreadNames.size}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output depends on the number of CPU cores.&lt;br&gt;
It should be something like this on an M1 Mac with 8 cores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Unique threads used: 8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means that the coroutines are efficiently scheduled on the available hardware cores, without the overhead of creating a large number of OS threads.&lt;br&gt;
In fact, we can even increase the number of coroutines to more than 1 million without any issue, which is not possible with traditional (OS) threads.&lt;/p&gt;

&lt;p&gt;The second aspect of coroutines is structured concurrency, which allows writing concurrent code that looks like sequential code.&lt;br&gt;
Let's see an example of how to create a coroutine scope that launches two coroutines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="nn"&gt;org.example&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlinx.coroutines.*&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlin.time.Duration.Companion.seconds&lt;/span&gt;

&lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
  &lt;span class="nf"&gt;coroutineScope&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// First coroutine&lt;/span&gt;
    &lt;span class="nf"&gt;launch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
      &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Start of coroutine 1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"End of coroutine 1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// Second coroutine&lt;/span&gt;
    &lt;span class="nf"&gt;launch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I am another coroutine"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Coroutine scope completed"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The coroutine scope is created with the &lt;code&gt;coroutineScope&lt;/code&gt; suspending function (defined with the &lt;code&gt;suspend&lt;/code&gt; qualifier).&lt;br&gt;
Since it is a suspending function, then the &lt;code&gt;main&lt;/code&gt; function that calls it must be marked as &lt;code&gt;suspend&lt;/code&gt; as well.&lt;br&gt;
That's why the &lt;code&gt;main&lt;/code&gt; function is defined with &lt;code&gt;suspend fun main()&lt;/code&gt;.&lt;br&gt;
The coroutine scope launches two coroutines with the &lt;code&gt;launch&lt;/code&gt; function (&lt;code&gt;launch&lt;/code&gt; creates a coroutine and runs it).&lt;br&gt;
The first one prints a message, waits for 1 second and then prints another message.&lt;br&gt;
The second one simply prints a message.&lt;/p&gt;

&lt;p&gt;Can you guess the output of this code? Here is the answer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Start of coroutine 1
I am another coroutine
End of coroutine 1
Coroutine scope completed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the first coroutine waits for 1 second, the second coroutine is executed while the first one is suspended.&lt;br&gt;
What if we want to start the second one only after the first one completes?&lt;br&gt;
That can be achieved with the &lt;code&gt;join&lt;/code&gt; function, which waits for the completion of a coroutine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="nn"&gt;org.example&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlinx.coroutines.*&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;kotlin.time.Duration.Companion.seconds&lt;/span&gt;

&lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
  &lt;span class="nf"&gt;coroutineScope&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Keep a reference to the first coroutine&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;coroutine&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;launch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Start of coroutine 1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"End of coroutine 1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// Wait for the first coroutine to complete&lt;/span&gt;
    &lt;span class="n"&gt;coroutine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;// Second coroutine&lt;/span&gt;
    &lt;span class="nf"&gt;launch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I am another coroutine"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nf"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Coroutine scope completed"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of this code is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Start of coroutine 1
End of coroutine 1
I am another coroutine
Coroutine scope completed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By getting a reference to the first coroutine with &lt;code&gt;val job1 = launch { ... }&lt;/code&gt;, we call &lt;code&gt;job1.join()&lt;/code&gt; to wait for its completion before starting the second one.&lt;/p&gt;

&lt;p&gt;The above two examples show the essence of structured concurrency: the code looks like sequential code, but it is actually concurrent code.&lt;br&gt;
In practice, this removes callback hell while keeping the code easy to follow.&lt;/p&gt;

&lt;p&gt;Let's now see how the JDK implements modern concurrency.&lt;/p&gt;
&lt;h2&gt;
  
  
  Java's virtual threads and structured concurrency
&lt;/h2&gt;

&lt;p&gt;In addition to Kotlin coroutines, the JDK natively achieves modern concurrency through two APIs: virtual threads and structured concurrency.&lt;/p&gt;

&lt;p&gt;Virtual threads are the JDK implementation of lightweight threads, and are introduced by Project Loom.&lt;br&gt;
They share a similar API with platform threads because both implement the &lt;code&gt;Thread&lt;/code&gt; interface.&lt;br&gt;
They can be created using the static method &lt;code&gt;Thread.ofVirtual()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The following code snippet illustrates the creation of a platform thread and a virtual thread.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;///usr/bin/env jbang "$0" "$@" ; exit $?&lt;/span&gt;
&lt;span class="c1"&gt;//JAVA 25+&lt;/span&gt;
&lt;span class="c1"&gt;//PREVIEW&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Platform (or OS) thread&lt;/span&gt;
  &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofPlatform&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
  &lt;span class="o"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Virtual thread&lt;/span&gt;
  &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofVirtual&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
  &lt;span class="o"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of this code is similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Thread[#25,Thread-0,5,main]
VirtualThread[#27]/runnable@ForkJoinPool-1-worker-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above logs, we can confirm that the platform thread is spawned from the main thread, while the virtual thread runs on a worker thread in a &lt;a href="https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html" rel="noopener noreferrer"&gt;ForkJoinPool&lt;/a&gt;.&lt;br&gt;
The ForkJoinPool is an &lt;a href="https://docs.oracle.com/en/java/javase/25/docs/api/java.base/java/util/concurrent/Executor.html" rel="noopener noreferrer"&gt;executor&lt;/a&gt; that is specialized in running tasks that can be broken down into smaller tasks.&lt;br&gt;
The worker thread is the platform thread that runs the virtual thread.&lt;/p&gt;

&lt;p&gt;Thanks to their nature, and similarly to Kotlin coroutines, we can create a large number of virtual threads without worrying about system resources.&lt;br&gt;
The following code snippet creates 1 million virtual threads and prints additional information about them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;uniqueWorkers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConcurrentHashMap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newKeySet&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;uniqueThreadPools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConcurrentHashMap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newKeySet&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofVirtual&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;threadInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threadInfo&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// threadInfo will be something like VirtualThread[#(id)]/runnable@ForkJoinPool-(id)-worker-(id)&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;workerStartIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threadInfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;indexOf&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ForkJoinPool"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;workerName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threadInfo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;substring&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workerStartIndex&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;uniqueWorkers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workerName&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printStackTrace&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;});&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Wait for all the virtual threads to finish&lt;/span&gt;
    &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printStackTrace&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\n"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uniqueWorkers&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's explain some parts.&lt;br&gt;
The line &lt;code&gt;var threadInfo = Thread.currentThread().toString();&lt;/code&gt;, generates a string that looks like this: &lt;code&gt;VirtualThread[#(id)]/runnable@ForkJoinPool-(id)-worker-(id)&lt;/code&gt;.&lt;br&gt;
The next lines extract the &lt;code&gt;ForkJoinPool&lt;/code&gt; id and the &lt;code&gt;worker&lt;/code&gt; id, where the worker thread is the platform thread that runs the virtual thread.&lt;br&gt;
So, at the end of the execution, we can see how many worker threads have been associated with virtual-thread execution.&lt;/p&gt;

&lt;p&gt;The log output of the above code is similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
VirtualThread[#1000029]/runnable@ForkJoinPool-1-worker-4
VirtualThread[#1000031]/runnable@ForkJoinPool-1-worker-1
VirtualThread[#1000032]/runnable@ForkJoinPool-1-worker-7
VirtualThread[#1000034]/runnable@ForkJoinPool-1-worker-4
...
ForkJoinPool-1-worker-8
ForkJoinPool-1-worker-6
ForkJoinPool-1-worker-7
ForkJoinPool-1-worker-1
ForkJoinPool-1-worker-4
ForkJoinPool-1-worker-5
ForkJoinPool-1-worker-2
ForkJoinPool-1-worker-3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can divide the log output into two parts.&lt;br&gt;
The first part prints 1 million lines in this format: &lt;code&gt;VirtualThread[#(id)]/runnable@ForkJoinPool-(id)-worker-(id)&lt;/code&gt;.&lt;br&gt;
The second part consists of eight lines with this format: &lt;code&gt;ForkJoinPool-1-worker-(1 to 8)&lt;/code&gt;.&lt;br&gt;
In this run, all virtual threads appear to be scheduled on the same &lt;code&gt;ForkJoinPool&lt;/code&gt; instance.&lt;br&gt;
The second part, with only eight lines, shows that virtual threads are scheduled on 8 unique worker threads, which is also the number of CPU cores on the machine where the code is executed.&lt;br&gt;
This means that the virtual threads are efficiently scheduled on the available hardware cores, without the overhead of creating a large number of OS threads.&lt;/p&gt;

&lt;p&gt;The other aspect of Java's modern concurrency is structured concurrency.&lt;br&gt;
The class that provides this feature is &lt;a href="https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/StructuredTaskScope.html" rel="noopener noreferrer"&gt;&lt;code&gt;StructuredTaskScope&lt;/code&gt;&lt;/a&gt;.&lt;br&gt;
It is available in Java 21 as a preview feature (it is still the case in Java 25).&lt;br&gt;
It returns an object, usually created with a try-with-resources block, that we'll call a &lt;strong&gt;scope&lt;/strong&gt;.&lt;br&gt;
That scope is used to launch concurrent tasks with the &lt;code&gt;fork&lt;/code&gt; method, and to wait for their completion with the &lt;code&gt;join&lt;/code&gt; method.&lt;br&gt;
Chaining the &lt;code&gt;join&lt;/code&gt; method with the &lt;code&gt;fork&lt;/code&gt; method allows creating a sequential structure for concurrent tasks, which is the essence of structured concurrency.&lt;br&gt;
The following code snippet illustrates the use of structured concurrency.&lt;br&gt;
It launches two tasks concurrently and then launches a third one after the first two complete.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;///usr/bin/env jbang "$0" "$@" ; exit $?&lt;/span&gt;
&lt;span class="c1"&gt;//JAVA 25+&lt;/span&gt;
&lt;span class="c1"&gt;//PREVIEW&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StructuredTaskScope&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;task1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fork&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Task 1"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
      &lt;span class="c1"&gt;//long running task such as a network call or a database query&lt;/span&gt;
      &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;});&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;task2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fork&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Task 2"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
      &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;});&lt;/span&gt;
    &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Sum: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;task2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;task3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fork&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Task 3 runs after task 1 and task 2"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;});&lt;/span&gt;
    &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Sum: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;task2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;task3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;printStackTrace&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the above code will produce an output similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task 1
Task 2
Sum: 3
Task 3 runs after task 1 and task 2
Sum: 6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see that the first two tasks are launched concurrently, and their results are printed after their completion.&lt;br&gt;
Then, the third task is launched after the completion of the first two tasks, which shows the sequential structure of the concurrent code.&lt;br&gt;
This, again, is the essence of structured concurrency: the code looks like sequential code, but it is actually concurrent code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Coroutines vs Java modern concurrency
&lt;/h2&gt;

&lt;p&gt;While coroutines and Java's modern concurrency are two implementations of the same concepts, lightweight threads and structured concurrency, they have some differences and synergies that are worth mentioning.&lt;/p&gt;

&lt;p&gt;Coroutines have been stable since 2017, while Java's modern concurrency is still experimental as a whole (virtual threads are stable, but structured concurrency is not).&lt;br&gt;
This means that coroutines are currently more mature and widely adopted.&lt;/p&gt;

&lt;p&gt;Since virtual threads use the &lt;code&gt;Thread&lt;/code&gt; API, they can be used in Java and Kotlin, while coroutines can only be used in Kotlin.&lt;br&gt;
Also, virtual threads can be used by frameworks behind the scenes, since it is sometimes just a matter of changing the thread factory.&lt;br&gt;
This means that developers might not be aware of them or need to change their code to use them, while coroutines require explicit usage of the API.&lt;/p&gt;

&lt;p&gt;A synergy between the two APIs is possible when Kotlin runs on JVM 21+.&lt;br&gt;
In fact, coroutines can be dispatched on Java virtual threads by backing a coroutine dispatcher with &lt;code&gt;Executors.newVirtualThreadPerTaskExecutor().asCoroutineDispatcher()&lt;/code&gt;.&lt;br&gt;
This can be useful for blocking I/O operations, but it is best to benchmark your workload before replacing &lt;code&gt;Dispatchers.IO&lt;/code&gt;.&lt;br&gt;
Because Kotlin runs on the JVM, it can call Java's structured concurrency API directly, which is useful when working with existing Java code or libraries.&lt;/p&gt;

&lt;p&gt;In conclusion, which one should you choose?&lt;br&gt;
The short answer is to use the one that is available in your language and framework.&lt;br&gt;
So, if you are using Java, then you can use Java's modern concurrency.&lt;br&gt;
And, if you are using Kotlin, then you can use Kotlin coroutines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Modern concurrency is a powerful tool that allows writing concurrent code that is efficient and easy to read.&lt;br&gt;
We have seen two ways to run structured concurrent code that optimizes I/O performance and readability.&lt;br&gt;
On the JDK/JVM side, we have virtual threads and structured concurrency with &lt;code&gt;StructuredTaskScope&lt;/code&gt;.&lt;br&gt;
In Kotlin, coroutines are a high-level API that provides lightweight threads and structured concurrency.&lt;br&gt;
Both APIs are efficient and easy to use, and can be used in synergy when running Kotlin on the JVM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links and references
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.thedevtavern.com/blog/posts/structured-concurrency-explained/" rel="noopener noreferrer"&gt;Structured concurrency explained - Part 1: Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoworld.com/article/3652596/project-loom-understand-the-new-java-concurrency-model.html" rel="noopener noreferrer"&gt;Project Loom: Understand the new Java concurrency model&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>java</category>
      <category>kotlin</category>
    </item>
    <item>
      <title>Build a SciFi Novel with AI Spec-Driven Development</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Fri, 15 May 2026 17:48:20 +0000</pubDate>
      <link>https://dev.to/worldlinetech/i-vibe-coded-a-novel-3bfa</link>
      <guid>https://dev.to/worldlinetech/i-vibe-coded-a-novel-3bfa</guid>
      <description>&lt;p&gt;&lt;em&gt;Software Engineering in Service of Transmedia Storytelling&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Generative artificial intelligence fascinates the publishing world as much as it frightens it. But what happens when we stop treating AI as a simple "text generator" and start using it as the compiler for a complex narrative system?&lt;/p&gt;

&lt;p&gt;Driven by the geopolitical and societal impacts of AI, I set out to write a dystopian, cyberpunk techno-thriller, &lt;a href="https://www.amazon.com/dp/B0GX347M5C" rel="noopener noreferrer"&gt;&lt;strong&gt;The Human Protocol&lt;/strong&gt;&lt;/a&gt; (written in English). In this novel, a planetary AI called the "Synthesis" attempts to erase human friction by "derendering" physical reality itself in order to optimize its computing power.&lt;/p&gt;

&lt;p&gt;To tell this story, I adopted a foundational premise: AI is not the author, it is the executor of a rigorous specification. I therefore treated each chapter as source code, using an advanced software development workflow.&lt;/p&gt;

&lt;p&gt;Here is how I designed, wrote, and expanded this universe.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Design Phase: Forging "Lore as Code"
&lt;/h2&gt;

&lt;p&gt;The first step was not writing, but designing the universe database: the world building. A Large Language Model (LLM) has a limited context window and tends to hallucinate or forget crucial details over the length of a novel.&lt;/p&gt;

&lt;p&gt;To work around this amnesia "bug," I organized the project like a structured Git repository.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsj21z8xpgl9ry4excmyo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsj21z8xpgl9ry4excmyo.png" alt="Preview of the private GitHub project" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Preview of the private GitHub project&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I broke the traditional design bible into narrative micro-services. The Git project's &lt;code&gt;context/&lt;/code&gt; folder was split as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;characters/&lt;/code&gt;: files containing the psychological profiles and behavioral signatures of each protagonist, such as Elara the diplomat, Kaelen the monk, or Silas the smuggler.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;factions/&lt;/code&gt;: rules governing political entities, such as the Market-Grid (United States) or the Harmony-Loom (Asia), which merged to create the "Synthesis."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;world/&lt;/code&gt;: geography, lexicon, and the technological stack - the physics of this universe.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally, a &lt;code&gt;PLAN.md&lt;/code&gt; file acted as the global roadmap, breaking the narrative arc into 4 acts and 30 chapters. This structure made it possible to inject only the context the AI needed when drafting a specific scene.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The Harness: Framing AI with a Strict Operating System
&lt;/h2&gt;

&lt;p&gt;To avoid the flat, expected style often produced by generative AI, I had to build a harness - a control rig. That was the role of the &lt;code&gt;RULES.md&lt;/code&gt; file, the true operating system of my writing process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ogxy31g3r0o76xfpvax.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ogxy31g3r0o76xfpvax.png" alt="Excerpt from the RULES.md file" width="799" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Excerpt from the &lt;code&gt;RULES.md&lt;/code&gt; file&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This specification file dictated absolute technical and stylistic constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Time&lt;/strong&gt;: strict use of the present tense to maximize immersion and tension.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cyber-realistic style&lt;/strong&gt;: a requirement for assertive descriptions and a strict ban on passive or negative forms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Noise and sensory dissonance&lt;/strong&gt;: I forced the algorithm to use violent contrasts, such as the smell of molten lead colliding with the void of spatial cold, in order to break the machine's overly perfect linearity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Thematic reframing&lt;/strong&gt;: AI naturally tends to crush the human element under technical descriptions of hard science fiction, such as magnetic fields and frequencies. The rules file required emotional motivations - grief, friendship - to be hard-coded as priority variables ahead of technique.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By forcing the AI to read and approve these rules before writing a single word of fiction, I ensured that the tone remained coherent.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Agile Writing: Sprints, Generation, and Pivots
&lt;/h2&gt;

&lt;p&gt;The chapters were written through a spec-driven workflow. Rather than generating an entire chapter in one pass, the process was iterative:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The structural draft&lt;/strong&gt;: generation of a first rough outline, focused exclusively on action and pacing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Expansion&lt;/strong&gt;: successive passes in which I instructed the AI to inject sensory depth and psychological tension into the scene.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agility brought by AI and Git: treating the text (&lt;code&gt;.md&lt;/code&gt;) as code offers formidable flexibility. If, during a reread, I realized that a character's emotional transition was too abrupt between two events, all I had to do was update my &lt;code&gt;PLAN.md&lt;/code&gt; to insert a new chapter.&lt;/p&gt;

&lt;p&gt;Fed by the updated Git context, the AI generated that narrative bridge while respecting the continuity of the preceding and following files. Git versioning made it possible to test narrative pivots - story "branches" - and roll back without ever breaking the manuscript's integrity.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Multi-Model Review and Quality Control
&lt;/h2&gt;

&lt;p&gt;One of the major challenges of AI-assisted writing is stylistic collapse. To address it, I set up a multi-model critical analysis workflow, where different AIs audited the text according to precise roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gemini CLI&lt;/strong&gt; (lore keeper): its role was to algorithmically verify that the chapter respected the bible and did not contradict the physical rules of my universe.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ChatGPT&lt;/strong&gt; (dramatic analyst): it audited narrative rhythm, relational tension, and the characters' transformation arcs. It was the one that flagged when a conflict felt too artificial.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mistral LeChat&lt;/strong&gt; (stylistic editor): it provided a critical eye on fluidity, phrasing, and elegance of language.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Never relying on a single voice made it possible to obtain a text that was polished, critiqued, and reworked from every angle, while I remained the "showrunner" validating each commit in the repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Build Pipeline: From IDE to Physical Book
&lt;/h2&gt;

&lt;p&gt;Since the novel was code, its publication had to be a software compilation. I created an automated script, &lt;code&gt;build_book.sh&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;From my terminal, running this script converted all the Markdown files in the &lt;code&gt;chapters/&lt;/code&gt; folder via Pandoc, applied a professional typographic layout with LaTeX, and generated the final deliverables in EPUB and PDF formats.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Transmedia Extension: Multimodality, Cover Art, and Vibe Coding the ARG
&lt;/h2&gt;

&lt;p&gt;The universe of &lt;em&gt;The Human Protocol&lt;/em&gt; lends itself perfectly to immersion, so I wanted to break the fourth wall. On page 175 of the physical book, a QR code invites readers to scan it and access &lt;a href="https://the-human-protocol.com" rel="noopener noreferrer"&gt;the-human-protocol.com&lt;/a&gt;. This is not a showcase website. It is an in-universe clandestine archive node, the entry point to an Alternate Reality Game (ARG).&lt;/p&gt;

&lt;p&gt;Here, multimodal AI brings all its power and creativity beyond text. In fact, the project's visual design, anchored consistently in the shared lore, began with the book cover.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bmlyif6ebc16ihnd5g5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bmlyif6ebc16ihnd5g5.png" alt="Cover image generated with AI" width="552" height="828"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cover image generated with AI&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The AI generated a strong visual aesthetic suited to the theme and universe of the novel: a pixelated silhouette against a geometric mountain background, crossed by a printed-circuit pattern.&lt;/p&gt;

&lt;p&gt;This same visual identity then served as the foundation for the creation of the ARG website, entirely "vibe-coded" by Gemini CLI in a declarative way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foordjcl5f777ojj1esx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foordjcl5f777ojj1esx3.png" alt="Homepage of the website https://the-human-protocol.com" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Homepage of the website &lt;a href="https://the-human-protocol.com" rel="noopener noreferrer"&gt;the-human-protocol.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To direct the developer AI, I provided it with the book PDF and the cover image as reference context, along with three strict Markdown specification files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;WHY.md&lt;/code&gt; (strategy): it defined the psychological goals: curiosity, exclusivity, and a feeling of belonging. It formally banned conventional marketing vocabulary ("Buy now," "Newsletter") in favor of an in-universe lexicon ("ACCESS," "SIGNAL," "FRICTION").&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;WHAT.md&lt;/code&gt; (UX/UI): this file concretely translated the aesthetic of the book cover into an interface. It imposed a "Deep Void" blue-black background for depth, a "Protocol Cyan" accent color derived from the printed circuits and reserved for interactions, a technical typeface, and subtle animations to heighten immersion.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;HOW.md&lt;/code&gt; (technical architecture): the engineering brief imposed a modern stack to support server logic: Next.js 14 (App Router) in TypeScript, Tailwind CSS, and Prisma ORM for persistent database storage.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The site manages a true clearance mechanic, with authorization levels from 1 to 5. The reader progresses by solving puzzles based on the book, unlocking extended lore, hidden files, and access to a community of "Unlinked" readers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksxut8y735raljaqxt0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksxut8y735raljaqxt0z.png" alt="ARG dashboard" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ARG dashboard&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The stack even includes an "Overseer Terminal" for administration: a secure dashboard used to audit user signals, adjust the campaign's global clearance level, and track in real time the number of scans of the physical QR code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: The Author-Architect Paradigm
&lt;/h2&gt;

&lt;p&gt;Writing &lt;a href="https://www.amazon.com/dp/B0GX347M5C" rel="noopener noreferrer"&gt;&lt;strong&gt;The Human Protocol&lt;/strong&gt;&lt;/a&gt; proved to me that AI does not replace the writer: it reduces the barriers to production. The true value of a co-created work lies in the architectural rigor of its preparation.&lt;/p&gt;

&lt;p&gt;By separating design (the lore), execution (the rules and prompts), and validation (multi-model review and Git), the creator becomes a true conductor.&lt;/p&gt;

&lt;p&gt;Multimodality also opens the door to even broader transmedia horizons, such as a comic-book adaptation of the novel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8oighpg7g18ql9bcoa0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8oighpg7g18ql9bcoa0.png" alt="Excerpt from the comic book in progress" width="800" height="594"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Excerpt from the comic book in progress&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;By applying similar engineering principles - namely, the explicit description of the drawing style in system prompts, as well as the creation of strict visual reference sheets, or character sheets, for the characters and technological elements - it becomes possible to extend the coherence and homogeneity of this universe into its graphic variations.&lt;/p&gt;

&lt;p&gt;To go further technically, I am also considering creating specific AI "skills," or algorithmic capabilities, to further augment the design of the story by drawing on documented principles of dramaturgy and storytelling, and to refine the writing style by making it ever more explicit and controlled.&lt;/p&gt;

&lt;p&gt;And ironically, it was by applying extreme software optimization processes that I was able to write a novel denouncing the loss of humanity in the face of algorithms.&lt;/p&gt;

&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;A writer and software architect who fully embraces his identity as a "Yogeek" - a point of balance between Yogi and Geek - Raphiki explores, across his work, the complex intersections between technology, consciousness, and humanity.&lt;/p&gt;

&lt;p&gt;Writing under a pseudonym that reflects his dual nature as a playful seeker and an expert in cutting-edge technologies, he designs high-stakes thrillers that challenge our understanding of reality. His creative work often bridges the digital and the organic, drawing on his strong experience in open source innovation and emerging technologies.&lt;/p&gt;

&lt;p&gt;When he is not deconstructing the fabric of dystopian realities in his manuscripts (or "vibe coding" them in his terminal), he can be found exploring the open source ecosystem or on a yoga mat.&lt;/p&gt;

&lt;p&gt;Find his work, transmedia projects, and reflections at &lt;a href="https://raphiki.github.io" rel="noopener noreferrer"&gt;raphiki.github.io&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>books</category>
      <category>ai</category>
      <category>transmedia</category>
      <category>writing</category>
    </item>
    <item>
      <title>Vibe coding level UP</title>
      <dc:creator>Pietro Maffi</dc:creator>
      <pubDate>Mon, 26 Jan 2026 14:13:57 +0000</pubDate>
      <link>https://dev.to/worldlinetech/vibe-coding-level-up-3klc</link>
      <guid>https://dev.to/worldlinetech/vibe-coding-level-up-3klc</guid>
      <description>&lt;p&gt;Last week my colleague Pieter  suggested me a nice article about "Long-term DORA data reveals systemic performance plateaus and shifts, allowing organizations to connect improvements to architectural, cultural, and process changes."&lt;/p&gt;

&lt;p&gt;So starting from my current level of Jira Lead Time Analyzer i developed in vibe coding, I tried to improve my report accordingly again using vibe coding.&lt;/p&gt;

&lt;p&gt;I tried to ask JitHub Copilot (yes at this time not anymore AmazonQ because of the limitation I got) to create PBC report for the Jira projects we are managing in our Large Scaled Organization.&lt;/p&gt;

&lt;p&gt;The powerfull result was that I provided some request to improve for example the duration of the analysis, to reuse the same approach (Flask app in python) accessing jira with JTA but not much about the graph and the rusul was really amazing... &lt;/p&gt;

&lt;h2&gt;
  
  
  First prompt:
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Considering the Lead Time Analyzer I would like to create a new Flask&lt;br&gt;
 Application which calculate the Process Behavior Charts for each &lt;br&gt;
projects identified during the retrieval of project connected to the &lt;br&gt;
first input query.&lt;br&gt;
I would like to have a chart which represent for each projects the Medium&lt;br&gt;
 Lead time and the 85% lead time threshold calculated.&lt;br&gt;
I would like than to see the real lead time calculated each month &lt;br&gt;
depicted in the diagram starting from august 2024.&lt;br&gt;
My intention like per the article &lt;br&gt;
&lt;a href="https://www.infoq.com/articles/DORA-metrics-PBCs/" rel="noopener noreferrer"&gt;https://www.infoq.com/articles/DORA-metrics-PBCs/&lt;/a&gt; is to identify trends&lt;br&gt;
 and spike so that we can identify and address issues. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Response
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;It was shared the code:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;📦 Created Files:&lt;br&gt;
&lt;strong&gt;pbc_analyzer.py&lt;/strong&gt; - Core analysis engine that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calculates median and 85th percentile lead times per project&lt;/li&gt;
&lt;li&gt;Computes control limits (UCL/LCL) using ±3σ methodology&lt;/li&gt;
&lt;li&gt;Groups data by month starting from August 2024&lt;/li&gt;
&lt;li&gt;Detects special causes (points outside control limits)&lt;/li&gt;
&lt;li&gt;Identifies trends (7+ consecutive increases/decreases)&lt;/li&gt;
&lt;li&gt;Assesses process stability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;pbc_web_app.py&lt;/strong&gt; - Flask application (standalone on port 5500)&lt;/p&gt;

&lt;p&gt;templates/pbc_analyzer.html - Beautiful web interface with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interactive Chart.js visualizations&lt;/li&gt;
&lt;li&gt;Monthly trend lines for mean, median, and P85&lt;/li&gt;
&lt;li&gt;Control limit boundaries displayed&lt;/li&gt;
&lt;li&gt;Color-coded stability status&lt;/li&gt;
&lt;li&gt;Special cause and trend alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;pbc_config.json&lt;/strong&gt; - Configuration file with defaults&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PBC_README.md&lt;/strong&gt; - Complete documentation&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;and than this nice report:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;🎯 Key Features:&lt;br&gt;
✅ Median &amp;amp; P85 calculations per project&lt;br&gt;
✅ Monthly aggregation starting August 2024&lt;br&gt;
✅ Control charts with UCL/LCL (±3σ)&lt;br&gt;
✅ Special cause detection (outliers)&lt;br&gt;
✅ Trend detection (7+ consecutive points)&lt;br&gt;
✅ Process stability assessment&lt;br&gt;
✅ Multi-project analysis from single JQL query&lt;br&gt;
✅ Interactive visualizations showing all metrics on one chart&lt;/p&gt;

&lt;h2&gt;
  
  
  Second prompt
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;In the log I do not see the query executed, can you add some debugging level logs which I can disable to check what query are executed? I know the input query is based on a project which started after the august 2024, but I would like to analyze the projects related to the input one which were existing before. Than I need the calculation for the following months till today&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Second response:
&lt;/h2&gt;

&lt;p&gt;**Summary of Changes&lt;/p&gt;

&lt;p&gt;I've enhanced the PBC Analyzer with the following capabilities:&lt;/p&gt;

&lt;p&gt;🐛 Debug Logging&lt;br&gt;
New checkbox in UI: "Enable Debug Logging" to see all JQL queries being executed&lt;br&gt;
Detailed query logging: Shows every JQL query executed against Jira&lt;br&gt;
Progress tracking: See which projects are being fetched and how many issues found&lt;br&gt;
🔍 Related Project Discovery&lt;br&gt;
The analyzer now works like the PI analyzer:&lt;/p&gt;

&lt;p&gt;Executes your initial query (e.g., a project that started after Aug 2024)&lt;br&gt;
Discovers related projects through:&lt;br&gt;
Parent/child relationships&lt;br&gt;
Epic links&lt;br&gt;
Issue links (blocks, relates, etc.)&lt;br&gt;
Fetches data from ALL related projects starting from August 2024, even if those projects existed before&lt;br&gt;
📊 Enhanced Analysis&lt;br&gt;
Monthly calculations from Aug 2024 to today for all discovered projects&lt;br&gt;
Each project gets its own PBC with control limits&lt;br&gt;
Special causes and trends detected per project&lt;/p&gt;

&lt;h2&gt;
  
  
  Third prompt
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;I see the analysis is done only on initial project but I would like to see the analysis also to all the connected Jira projects following the whole hierarchy (for each Business initiatives we should collect the related feature and sub-feature and than epics in each other projects like we did in original Lead time analyzer)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Third response
&lt;/h2&gt;

&lt;p&gt;Final graphs&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ysdqkqt9aiwbtiqsh3a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ysdqkqt9aiwbtiqsh3a.png" alt="Console and whole analysis report" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmj1xemts9xbh4apd5zm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmj1xemts9xbh4apd5zm.png" alt="Project 1" width="800" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn32rxqfdlfo7km4xiy9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn32rxqfdlfo7km4xiy9.png" alt="Project 2" width="800" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmk3ydaoixfddycntgxt0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmk3ydaoixfddycntgxt0.png" alt="Project 3" width="800" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My comments
&lt;/h2&gt;

&lt;p&gt;"Vibe coding" proved to be an excellent way to quickly transform an initial idea into a functional application that can be further refined. The most impressive takeaway was the AI's ability to extract additional information directly from the article without requiring manual data definitions or graph formatting. Achieving such a great result in just a few minutes was truly remarkable.&lt;/p&gt;

</description>
      <category>python</category>
      <category>leadership</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>Vibe Coding One Slice at a Time</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Sat, 24 Jan 2026 18:33:51 +0000</pubDate>
      <link>https://dev.to/worldlinetech/vibe-coding-one-slice-at-a-time-4n3p</link>
      <guid>https://dev.to/worldlinetech/vibe-coding-one-slice-at-a-time-4n3p</guid>
      <description>&lt;p&gt;&lt;em&gt;How I built a Modular Monolith by treating Generative AI as a junior developer who needs a firm hand (and a Constitution).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In &lt;a href="https://dev.to/worldlinetech/vibe-coding-one-page-at-a-time-265j"&gt;Part 1&lt;/a&gt;&lt;/strong&gt;, we vibed a Python script. It was linear, messy, and fun. It proved that you can solve immediate problems by just asking nicely.&lt;br&gt;
&lt;strong&gt;In &lt;a href="https://dev.to/worldlinetech/vibe-coding-one-pixel-at-a-time-22pc"&gt;Part 2&lt;/a&gt;&lt;/strong&gt;, we vibed a UI. It was chaotic, visual, and surprisingly effective. We learned that "vibe" works for pixels if you iterate fast enough.&lt;/p&gt;

&lt;p&gt;But let’s be honest: those were skirmishes. The real "Boss Fight" in software engineering isn't writing a script or centering a &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt;. It's building a &lt;strong&gt;System&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I’m talking about the kind of project that doesn’t fit in one file. The kind where "Vibing" usually leads to "Spaghetti Code," hallucinated imports, and a repo you want to burn down after three days because you have 15 circular dependencies and a database schema that makes no sense.&lt;/p&gt;

&lt;p&gt;So for Part 3, I put away the "Hacker" hoodie and put on the "Enterprise Architect" blazer. My goal? To build &lt;strong&gt;YogĀrkana Codex&lt;/strong&gt;—a full-stack, offline-first, polymorphic Yoga management platform—without writing a single line of code myself.&lt;/p&gt;

&lt;p&gt;My strategy was simple but radical: &lt;strong&gt;I design, the AI implements.&lt;/strong&gt; I am the Architect; Gemini Chat is my Consultant; Gemini CLI is my Dev Team.&lt;/p&gt;

&lt;p&gt;Here is how we vibed a Monolith into existence, one slice at a time.&lt;/p&gt;


&lt;h2&gt;
  
  
  1. The Mission: Complexity Check (The Boss Level)
&lt;/h2&gt;

&lt;p&gt;To understand why "just chatting" wouldn't work, you need to see the scope. This wasn't a To-Do list app. I wanted to build a "Yoga Operating System" with four distinct domains that usually don't play nice together. I've been an architect for years, and I know exactly where these things break.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Four Domains of Pain
&lt;/h3&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaid8fnfrk4xpdz7wo96.png" width="800" height="543"&gt;Screenshot of the final application (Grimoire View)
  


&lt;p&gt;&lt;strong&gt;The Business Analyst's Note&lt;/strong&gt;: Unlike the project in Part 2, this application is not internationalized—by design. As a result, the screenshots are in French. I have kept them raw to visually illustrate the functional depth and complexity of the system without the abstraction of translation keys.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Grimoire (Knowledge Base):&lt;/strong&gt; A searchable library of yoga cards. But here’s the kicker: it uses a &lt;strong&gt;Polymorphic Data Model&lt;/strong&gt;. An &lt;em&gt;Asana&lt;/em&gt; (posture) has biomechanical attributes like "spinal extension" and "anatomy targets," while a &lt;em&gt;Mantra&lt;/em&gt; has Sanskrit text, translations, and audio assets. They are chemically different data structures, but they need to live in the same database table to be searchable together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Weaver (Sequencer):&lt;/strong&gt; A drag-and-drop studio to build classes. It’s not just a playlist; it has a &lt;strong&gt;Logical Engine&lt;/strong&gt; (Phase 4) that acts like a "Digital Yoga Teacher." It screams at you if you sequence a "Peak Pose" before a "Warm-up" or forget &lt;em&gt;Savasana&lt;/em&gt; at the end. That means heavy validation logic running on both the client and the server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Atelier (Print Studio):&lt;/strong&gt; A client-side PDF engine. We needed to generate high-res, vector-quality handouts for teachers to print. We couldn't just "print screen"; we needed a real PDF renderer (&lt;code&gt;@react-pdf/renderer&lt;/code&gt;) running entirely in the browser.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Constraint (Offline First):&lt;/strong&gt; Yoga studios are notorious for having no signal (often intentionally). The app needed to persist the entire library and PDF engine in the browser cache (IndexedDB + Service Workers) so it works perfectly in "Airplane Mode".&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Architect's Note:&lt;/strong&gt; If I had just prompted &lt;em&gt;"Build me a yoga app,"&lt;/em&gt; the AI would have hallucinated a generic CRUD app. It would have made 5 different tables for the cards, making search impossible. It would have used a server-side PDF library that breaks offline. I needed a blueprint.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. The Blueprint: Architecture &amp;amp; Tech Stack
&lt;/h2&gt;

&lt;p&gt;Before letting the AI write a single line of code, I spent around 2 hours and a half just talking Architecture and formalizing it with Gemini Chat. I treated the AI as a "Sparring Partner," debating the trade-offs of different stacks.&lt;/p&gt;

&lt;p&gt;We settled on a &lt;strong&gt;Modular Monolith&lt;/strong&gt; architecture. Why? Because Microservices are overkill for a team of one, but a messy Monolith is a nightmare. We defined strict boundaries: code in &lt;code&gt;modules/grimoire&lt;/code&gt; can never import from &lt;code&gt;modules/weaver&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Tech Stack (The "No-Regrets" List):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monorepo:&lt;/strong&gt; &lt;code&gt;Turborepo&lt;/code&gt; managing &lt;code&gt;apps/api&lt;/code&gt; and &lt;code&gt;apps/web&lt;/code&gt;. This keeps the full stack in one context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; &lt;code&gt;NestJS&lt;/code&gt; (for rigid structure) + &lt;code&gt;Drizzle ORM&lt;/code&gt; (for type safety). NestJS forces you to organize code into Modules, which helps the AI stay organized.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; &lt;code&gt;React&lt;/code&gt; + &lt;code&gt;Vite&lt;/code&gt; + &lt;code&gt;Tailwind CSS&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State:&lt;/strong&gt; &lt;code&gt;TanStack Query&lt;/code&gt; (Server state) + &lt;code&gt;Zustand&lt;/code&gt; (UI state).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The "Secret Sauce": Hybrid Data Storage&lt;/strong&gt;&lt;br&gt;
This was our smartest move. We chose &lt;strong&gt;PostgreSQL&lt;/strong&gt; but used a &lt;code&gt;JSONB&lt;/code&gt; column for the card data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SQL Core:&lt;/strong&gt; Columns like &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;element&lt;/code&gt;, and &lt;code&gt;tags&lt;/code&gt; are standard SQL for fast indexing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON Payload:&lt;/strong&gt; The specific attributes (biomechanics vs. sanskrit) live in a JSON blob.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why?&lt;/strong&gt; It gave us the flexibility of NoSQL (for the polymorphic cards) with the relational integrity of SQL (for users and sequences).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Rule #1 of Vibe Coding a System: If it’s not in the Spec, it doesn’t exist.&lt;/strong&gt;&lt;br&gt;
This brings us to the most critical tool in our arsenal: the &lt;strong&gt;ADR&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  The "ADR": The Architect's Save Game
&lt;/h3&gt;

&lt;p&gt;ADR stands for &lt;strong&gt;Architecture Decision Record&lt;/strong&gt;. In a human team, it's a document you write to explain why you chose PostgreSQL over MongoDB so that 6 months later, nobody asks "Why did we do this?".&lt;/p&gt;

&lt;p&gt;In Vibe Coding, ADRs are not just documentation—they are &lt;strong&gt;legislation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When working with an AI, "Context Drift" is the enemy. The AI forgets why we made a decision 300 tokens ago. It acts like a teenager who wants to re-litigate every rule: &lt;em&gt;"Why can't I use Prisma? It's easier!"&lt;/em&gt; or &lt;em&gt;"Let's just use window.print() instead of a PDF engine!"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To counter this, we established a &lt;strong&gt;Constitutional Architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Law:&lt;/strong&gt; We wrote our decisions into immutable markdown files (e.g., &lt;code&gt;Docs/ADR/006-pwa-offline-strategy.md&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Enforcement:&lt;/strong&gt; We didn't just hope the AI would remember. We &lt;strong&gt;forced&lt;/strong&gt; the tracing of these decisions in two ways:&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input Traceability:&lt;/strong&gt; In our "Bootstrap Prompt" (see Section 3), we explicitly force the AI to read the relevant ADRs before writing code. It cannot code if it hasn't read the law.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Traceability:&lt;/strong&gt; When the AI suggests a major pivot (like switching to Client-Side PDF generation), we forced it to &lt;em&gt;write a new ADR first&lt;/em&gt;. In Session 003, before touching the code, the AI generated &lt;code&gt;Docs/ADR/005-client-side-pdf-generation.md&lt;/code&gt; to justify the change from server-side to client-side.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This ensured that our architecture didn't "drift" based on the AI's mood, but evolved based on documented consensus.&lt;/p&gt;

&lt;p&gt;My final /docs/ADR/ folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;├── 001-hybrid-data-storage-strategy.md
├── 002-modular-monolith-and-vertical-slicing.md
├── 003-data-model-specification.md
├── 004-tech-stack-definition.md
├── 005-client-side-pdf-generation.md
├── 006-pwa-offline-strategy.md
├── 007-architecture-documentation-maintenance.md
└── README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. The Methodology: Governance-Driven Development (GDD)
&lt;/h2&gt;

&lt;p&gt;I’ve coined a term for this workflow: &lt;strong&gt;Governance-Driven Development (GDD)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We are used to TDD (Test-Driven Development) or DDD (Domain-Driven Development). GDD is the layer above that. In the age of AI, &lt;strong&gt;Governance is the new Syntax&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is the dirty truth about AI Developers: &lt;strong&gt;They behave like talented teenagers.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They are brilliant and fast. They can write a regex to validate an email in 2 seconds. But they also:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Rush to the cool part&lt;/strong&gt; (UI) and skip the boring part (Error Handling, Folder Structure).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want you to love them&lt;/strong&gt;, so they say "Yes" to everything—even bad ideas.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Have the memory of a goldfish&lt;/strong&gt; (Context Drift). 10 minutes in, they forget you wanted &lt;code&gt;kebab-case&lt;/code&gt; filenames and start using &lt;code&gt;camelCase&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To enforce GDD, I created a Constitution: &lt;code&gt;Docs/RULES.md&lt;/code&gt;. I didn't just suggest these rules; I forced the Gemini CLI to read them before every session. I also sometimes mentioned certain specification files stored in my &lt;code&gt;Docs/Features/&lt;/code&gt; folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;├── 001-global-functional-overview.md
├── 002-global-implementation-plan.md
├── 003-card-classification-and-kosha-alignment.md
├── 004-user-features.md
├── 005-logical-engine-specification.md
├── 006-pdf-generation-and-print-studio.md
└── 007-pwa-and-offline-capabilities.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The "Bootstrap Prompt":&lt;/strong&gt;&lt;br&gt;
Here is the exact prompt I used to "upload" my Architect persona into the machine at the start of our 4th session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I am the Lead Architect. You are the Senior Developer.

Context Loading:
1. Read Docs/RULES.md (The Law).
2. Read Docs/TECH_CONTEXT.md (The Stack).
3. Read Docs/ADR/002-modular-monolith.md (The Blueprint).
4. Read Docs/Features/002-global-implementation-plan.md (The Plan).

Current State:
We are in Phase 4. Previous phases are frozen.

Task:
Implement the Logic Engine defined in Docs/Features/005-logical-engine-specification.md
Constraint:
Do not touch /apps/web yet. Focus on /packages/shared.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This changed everything. Instead of guessing my vibe, the AI had to follow the law. It stopped trying to use &lt;code&gt;Prisma&lt;/code&gt; because &lt;code&gt;TECH_CONTEXT.md&lt;/code&gt; clearly said &lt;code&gt;Drizzle&lt;/code&gt;. It stopped putting logic in components because &lt;code&gt;RULES.md&lt;/code&gt; said logic goes in &lt;code&gt;hooks&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The Execution: A high-level Overview
&lt;/h2&gt;

&lt;p&gt;We built the app using &lt;strong&gt;Vertical Slicing&lt;/strong&gt;. Instead of building the whole Database, then the whole API, we built &lt;em&gt;one feature&lt;/em&gt; top-to-bottom. Here is the play-by-play from the logs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52xjc2abp994ozigmr0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52xjc2abp994ozigmr0s.png" width="800" height="569"&gt;&lt;/a&gt;&lt;/p&gt;&lt;br&gt;Excerpt from the initial Design Phase with Gemini Chat
  &lt;p&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Slice 1: The "Polymorphic" Database
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56yxpr5ot5w68ca4i57g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56yxpr5ot5w68ca4i57g.png" width="800" height="574"&gt;&lt;/a&gt;&lt;/p&gt;&lt;br&gt;Card creation/edition mixes relational and document data
  &lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Storing Asanas (Biomechanics) and Mantras (Text) in one table without creating 50 &lt;code&gt;NULL&lt;/code&gt; columns or separate tables that make search a nightmare.&lt;br&gt;
&lt;strong&gt;The AI's First Impulse:&lt;/strong&gt; "Let's create an &lt;code&gt;asanas&lt;/code&gt; table and a &lt;code&gt;mantras&lt;/code&gt; table." (The classic relational trap).&lt;br&gt;
&lt;strong&gt;The Architect's Intervention:&lt;/strong&gt; "Read &lt;code&gt;Docs/ADR/001-hybrid-data-storage.md&lt;/code&gt;. We use a single &lt;code&gt;cards&lt;/code&gt; table with a &lt;code&gt;data&lt;/code&gt; JSONB column."&lt;br&gt;
&lt;strong&gt;The Result:&lt;/strong&gt; The AI implemented a Drizzle schema using PostgreSQL's &lt;code&gt;jsonb&lt;/code&gt; type. Crucially, it added Zod discriminators to validate the JSON shape before insertion.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Verbatim Log:&lt;/em&gt; "Implemented Drizzle schema with &lt;code&gt;jsonb&lt;/code&gt; column 'data'. Added Zod discriminators for &lt;code&gt;asana&lt;/code&gt; vs &lt;code&gt;mantra&lt;/code&gt;. Migration successful."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  Slice 2: The "Hybrid Brain"
&lt;/h3&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxm3bf43hndd6zn5ihhf.png" width="800" height="433"&gt;Sequences are validated by a powerful, hybrid, and extensible Rule Engine
  



  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs24ek0zio34lli2b0s01.png" width="800" height="653"&gt;Admin users can craft new JSON-logic rules
  


&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; The Logic Engine needed to validate sequences (e.g., "Must end with Savasana"). This logic had to run on the &lt;strong&gt;Backend&lt;/strong&gt; (before saving) AND the &lt;strong&gt;Frontend&lt;/strong&gt; (to give real-time red borders).&lt;br&gt;
&lt;strong&gt;The AI's First Impulse:&lt;/strong&gt; Duplicate the code. Write a TypeScript function in React and a Service in NestJS.&lt;br&gt;
&lt;strong&gt;The Architect's Intervention:&lt;/strong&gt; "No. Create a &lt;code&gt;packages/shared&lt;/code&gt; workspace. Put the &lt;code&gt;validateSequence&lt;/code&gt; function there. Import it in both apps."&lt;br&gt;
&lt;strong&gt;The Result:&lt;/strong&gt; The AI created the shared package, configured the &lt;code&gt;tsconfig.json&lt;/code&gt; paths, and wired it up. It even built a &lt;code&gt;HealthBar&lt;/code&gt; component that consumes this shared logic to show a live "Health Score" for the sequence.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Verbatim Log:&lt;/em&gt; "Refactored &lt;code&gt;ValidationConfig&lt;/code&gt; to &lt;code&gt;packages/shared&lt;/code&gt;. Updated &lt;code&gt;useSequenceStore&lt;/code&gt; (Frontend) and &lt;code&gt;SequenceService&lt;/code&gt; (Backend) to consume the same Zod schema."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  Slice 3: The "Offline Printer"
&lt;/h3&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fit9saw28u6t9za1nxu59.png" width="800" height="512"&gt;Synthetic or complete printed handout
  


&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Users need to print PDF handouts in a yoga studio with no Wi-Fi.&lt;br&gt;
&lt;strong&gt;The AI's First Impulse:&lt;/strong&gt; "Use a server-side PDF library like PDFKit." (Standard web dev practice).&lt;br&gt;
&lt;strong&gt;The Architect's Intervention:&lt;/strong&gt; "Read &lt;code&gt;Docs/ADR/006-pwa-offline-strategy.md&lt;/code&gt;. We must generate PDFs client-side using &lt;code&gt;@react-pdf/renderer&lt;/code&gt;."&lt;br&gt;
&lt;strong&gt;The Result:&lt;/strong&gt; The AI implemented a beautiful client-side renderer. It handled the tricky part of loading fonts (Noto Sans) into the browser's virtual file system so the PDF engine could "see" them without a network request.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Verbatim Log:&lt;/em&gt; "Implemented &lt;code&gt;SequencePdf&lt;/code&gt; component. Configured &lt;code&gt;vite-plugin-pwa&lt;/code&gt; to cache &lt;code&gt;NotoSans&lt;/code&gt; fonts. PDF generation now works without network."&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  5. The Architect's Flex: Automated C4 Verification
&lt;/h2&gt;

&lt;p&gt;How do you know the AI actually respected the Modular Monolith architecture? Did it secretly import the &lt;code&gt;Weaver&lt;/code&gt; module into the &lt;code&gt;Grimoire&lt;/code&gt; when I wasn't looking?&lt;/p&gt;

&lt;p&gt;I didn't want to audit 50 files manually. And I definitely didn't want to draw diagrams by hand.&lt;/p&gt;

&lt;p&gt;So, I added a rule to my Constitution (ADR 007): &lt;strong&gt;"The Code is the Source of Truth for Documentation."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the end of session, I enforce Gemini CLI to &lt;strong&gt;reverse-engineer its own work&lt;/strong&gt;. I gave it this prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Update the RULES.md file to enforce the (re)generation of C4 diagrams when finishing an implementation session
[...] 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also created a specific ADR (007: Architecture Documentation Maintenance Protocol) establishing Mermaid.js as the standard and defining the maintenance lifecycle.&lt;/p&gt;

&lt;p&gt;The result wasn't a hallucination. It was a perfect map of the code it had just written.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhnwz9mkgwq8u6rr03oo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhnwz9mkgwq8u6rr03oo.png" alt="C4 Models" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the ultimate "Trust but Verify." If the generated diagram looks like spaghetti, the code is spaghetti. If the diagram is clean, the architecture holds.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. The AIOps Protocol: Monitoring the Machine
&lt;/h2&gt;

&lt;p&gt;Now, here is the secret weapon: &lt;strong&gt;The Session Log.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of my strictest rules in &lt;code&gt;RULES.md&lt;/code&gt; was that the AI had to "punch out" at the end of every session. I forced it to append a line to &lt;code&gt;docs/ai_session_log.csv&lt;/code&gt; with the Date, Tool (Chat or CLI), Goal, and &lt;strong&gt;Token Usage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For me this isn't about money ("FinOps"). It's about &lt;strong&gt;AIOps&lt;/strong&gt;, monitoring the operational health of your intelligence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why we log everything (Chat &amp;amp; CLI):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context Monitoring:&lt;/strong&gt; As a session drags on, the "Tokens In" (Context Window) grows exponentially. The AI starts reading 30,000 tokens of history just to write one line of code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Sawtooth" Pattern:&lt;/strong&gt; By visualizing the log, I discovered a crucial pattern. Efficiency drops as context grows. The solution? &lt;strong&gt;The Hard Reset.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36kzvkxpp8nvvea3lx29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36kzvkxpp8nvvea3lx29.png" alt="AI Usage Minitoring" width="800" height="518"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This chart visualizes the high-level "Vibe Coding Lifecycle." You see the context bloat as we iterate on implementing phases 3 and 4. Then, you see the sharp drop when we switch back to the Architect (Chat) or reset the CLI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Lesson:&lt;/strong&gt; A "Tired" AI (high context) makes mistakes. A "Fresh" AI (reset context + Snapshot) is precise.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. The "Oh S**t" Moment: The Hallucination Trap
&lt;/h2&gt;

&lt;p&gt;This brings us to the specific incident that proved &lt;em&gt;why&lt;/em&gt; that Reset is mandatory.&lt;/p&gt;

&lt;p&gt;Halfway through Phase 3, the CLI started getting slow (too much history). I ran a &lt;code&gt;/reset&lt;/code&gt; command to clear its memory. &lt;strong&gt;Disaster.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It suddenly forgot we were building a "Yoga" app. It tried to invent a new database column &lt;code&gt;duration_minutes&lt;/code&gt; for the cards. But my Spec (ADR 003) explicitly said that &lt;code&gt;duration&lt;/code&gt; lives inside the JSONB payload and is measured in seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Hallucination:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;UPDATE cards SET duration_minutes = 60;&lt;/code&gt; &lt;em&gt;(AI guessing)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Correction (Me):&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;"Read Docs/003-data-model.md. 'Duration' is a JSONB field inside the 'metadata' column, and it's in seconds."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;UPDATE cards SET data = jsonb_set(data, '{duration}', '3600');&lt;/code&gt; &lt;em&gt;(AI complying)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To prevent this in the future, we implemented a &lt;strong&gt;"Session Handover"&lt;/strong&gt; protocol. Before resetting, I now force the AI to write a &lt;code&gt;TECH_STATE_SNAPSHOT.md&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Where are we?" (Phases 1-3 Complete)&lt;/li&gt;
&lt;li&gt;"What is the active stack?" (NestJS, React, PostgreSQL)&lt;/li&gt;
&lt;li&gt;"What is the next step?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I start a new session, I feed this snapshot back in. It’s like a save game for your developer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion: The Architect's Verdict
&lt;/h2&gt;

&lt;p&gt;So, can you Vibe Code a complex system?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maybe.&lt;/strong&gt; I mean, it depends on how complex the system is (in this example we didn't build an enterprise-wide distributed system). But for sure you can't just "Vibe" it. You have to &lt;strong&gt;Architect&lt;/strong&gt; it.&lt;/p&gt;

&lt;p&gt;If I had touched the code, I would have been bogged down in syntax errors and import paths. By staying in the Architect role, I focused on &lt;em&gt;Data Models&lt;/em&gt;, &lt;em&gt;User Flows&lt;/em&gt;, and &lt;em&gt;Business Logic&lt;/em&gt;. The AI handled the implementation, but I provided the &lt;strong&gt;Guardrails&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Docs are Prompts:&lt;/strong&gt; The&lt;code&gt;RULES.md&lt;/code&gt;, &lt;code&gt;Docs/Features/&lt;/code&gt; and &lt;code&gt;Docs/ADR/&lt;/code&gt; folders (or your own equivalents) are the most important files in your repo. They are the AI's long-term memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constraint is Clarity:&lt;/strong&gt; The more rules you give the AI (versions, naming, structure), the better code it writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review Everything:&lt;/strong&gt; The AI is a junior dev. It &lt;em&gt;will&lt;/em&gt; introduce security holes or n+1 query problems if you don't catch them in the spec.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Vibe Coding didn't replace the Architect. It just gave the Architect a team of infinite interns. And honestly? They’re pretty good once you give them a Constitution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frkujzia1p39hjwzqxxp4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frkujzia1p39hjwzqxxp4.png" width="800" height="187"&gt;&lt;/a&gt;&lt;/p&gt;&lt;br&gt;Last message from Gemini CLI
  &lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Next up: The application could do with AI features... Or maybe I'll now explore other aspect of Vibe Coding. Stay tuned.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>architecture</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Vibe Coding One Pixel at a Time</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Fri, 23 Jan 2026 22:21:39 +0000</pubDate>
      <link>https://dev.to/worldlinetech/vibe-coding-one-pixel-at-a-time-22pc</link>
      <guid>https://dev.to/worldlinetech/vibe-coding-one-pixel-at-a-time-22pc</guid>
      <description>&lt;p&gt;&lt;em&gt;Editing "stick figure" Yoga poses&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/worldlinetech/vibe-coding-one-page-at-a-time-265j"&gt;Part 1&lt;/a&gt;, we dipped our toes into "Vibe Coding" by building a Python script. It was linear, logical, and frankly, a bit safe. Text in, text out.&lt;/p&gt;

&lt;p&gt;But let’s be real: backend scripts are the "easy mode" of LLM-assisted coding. The logic is contained. The state is ephemeral.&lt;/p&gt;

&lt;p&gt;The real boss fight is the &lt;strong&gt;Frontend&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Can you "vibe" a UI? Can you talk a chaotic mess of DOM elements, event listeners, and CSS pixels into a functional application without losing your mind (or the AI losing the context)?&lt;/p&gt;

&lt;p&gt;I decided to find out. My goal: Build &lt;strong&gt;Yoga Pose Builder&lt;/strong&gt;, a browser-based tool to edit "stick figure" yoga poses, drag limbs around, and export vector SVGs.&lt;/p&gt;

&lt;p&gt;I had no design, no stack picked out, and—crucially—I had never used a Canvas library in my life.&lt;/p&gt;

&lt;p&gt;Here is how we vibed it into existence.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Context is King (The &lt;code&gt;.md&lt;/code&gt; Anchors)
&lt;/h2&gt;

&lt;p&gt;The biggest enemy of Vibe Coding is the LLM’s "Goldfish Memory." You’re 40 turns into a chat, you ask for a button change, and suddenly the AI forgets you’re building a yoga app and tries to sell you a subscription to a SaaS platform.&lt;/p&gt;

&lt;p&gt;In Part 1, we just chatted. For a full UI application, that doesn't fly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Strategy: Documentation as Prompt Anchoring.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before I let the AI write a single line of JavaScript, I made it write Markdown.&lt;br&gt;
We created a &lt;code&gt;Docs/&lt;/code&gt; folder with two files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;spec.md&lt;/code&gt;: The high-level architecture.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;features.md&lt;/code&gt;: A checklist of what we wanted to do.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I didn't write these because I love administrative work. I wrote them so that when the AI inevitably got confused, I didn't have to re-explain the project. I just said: &lt;em&gt;"Read &lt;code&gt;Docs/spec.md&lt;/code&gt; and try again."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vibe Tip:&lt;/strong&gt; Think of your documentation not as a manual for humans, but as "Long-Term Memory" for your AI pair programmer.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. The Architecture: Letting the AI be CTO
&lt;/h2&gt;

&lt;p&gt;I knew I needed a canvas where I could drag "joints" (knees, elbows) and have "bones" (lines) follow them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; "I want to do this in the browser. Should I use React? Raw Canvas API?"&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; "React might be overkill. Raw Canvas is painful. Use &lt;strong&gt;Fabric.js&lt;/strong&gt;."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; "Never heard of it. Let's do it."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4sv3xujye0amao2fznp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4sv3xujye0amao2fznp.png" alt="Fabric.js Logo" width="300" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the beauty of Vibe Coding. I didn't spend 3 hours reading "Top 10 JS Canvas Libraries 2025" Medium articles. I trusted the vibe.&lt;/p&gt;

&lt;p&gt;We settled on a &lt;strong&gt;Build-less Architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Backend:&lt;/strong&gt; Node.js + Express (just to serve files and save JSON).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Frontend:&lt;/strong&gt; Vanilla JS + Fabric.js (loaded via CDN).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Build Tool:&lt;/strong&gt; None. No Webpack, no Vite, no &lt;code&gt;npm run eject&lt;/code&gt; nightmares.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why? Because Vibe Coding thrives on speed. I wanted to change a line of code, hit F5, and see the result.&lt;/p&gt;

&lt;p&gt;Application folder structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
├── Docs
│&amp;nbsp;&amp;nbsp; ├── features.md
│&amp;nbsp;&amp;nbsp; └── spec.md
├── package.json
├── public
│&amp;nbsp;&amp;nbsp; ├── index.html
│&amp;nbsp;&amp;nbsp; └── poses
└── server.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. The "Rig": Math is for Machines
&lt;/h2&gt;

&lt;p&gt;Here is where I expected to get stuck. Creating a "rig" where moving a hand automatically updates the angle of the arm involves trigonometry and vector math.&lt;/p&gt;

&lt;p&gt;Usually, this is where I’d open 15 StackOverflow tabs and copy-paste code I don't understand.&lt;/p&gt;

&lt;p&gt;Instead, I just described the &lt;em&gt;behavior&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Create a &lt;code&gt;Mannequin&lt;/code&gt; class. It has Nodes (circles) and Links (lines). When a Node moves, the Links connected to it should update their coordinates."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI wrote the entire class. It hooked into Fabric.js’s &lt;code&gt;object:moving&lt;/code&gt; event and handled the coordinate updates. It worked on the first try.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh7uciqdosv50962n8z3t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh7uciqdosv50962n8z3t.png" alt="Pose Builder Mannequin" width="250" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I still barely know how &lt;code&gt;fabric.Line&lt;/code&gt; works under the hood. And I don't care. It works.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Iteration: The "Yes, And..." Technique
&lt;/h2&gt;

&lt;p&gt;UI Vibe Coding isn't about getting it right instantly; it's about sculpting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Ugly Phase:&lt;/strong&gt;&lt;br&gt;
The first version looked like a programmer made it (because a programmer &lt;em&gt;did&lt;/em&gt; make it). The stick figure looked like a dead bug. The background was gray.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Vibe" Phase:&lt;/strong&gt;&lt;br&gt;
Me: &lt;em&gt;"This looks depressing. Make it 'Zen'. Use soft colors, rounded buttons, and a clean layout."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The AI generated the CSS variables (&lt;code&gt;--highlight-color: #88b04b&lt;/code&gt;), added a "Save As" modal, and cleaned up the toolbar.&lt;/p&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwiaddnxyt14usfea2l1p.png" width="800" height="495"&gt;Yoga Pose Builder GUI
  


&lt;p&gt;&lt;strong&gt;The "Feature Creep" Phase:&lt;/strong&gt;&lt;br&gt;
Me: &lt;em&gt;"I want to save my poses."&lt;/em&gt;&lt;br&gt;
AI: &lt;em&gt;"We have no database."&lt;/em&gt;&lt;br&gt;
Me: &lt;em&gt;"Just write JSON files to a folder on the server."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In 5 minutes, we had a fully working persistence layer. No database migrations, just &lt;code&gt;fs.writeFile&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here is a example of such a Pose JSON file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"nameFR"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Demi-Pont"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"nameSK"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Setu Bandhasana"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"joints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"head"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"neck"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"chest"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hips"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lShoulder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lElbow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lHand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rShoulder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rElbow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rHand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lHip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lKnee"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lFoot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rHip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rKnee"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rFoot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. The Pivot: Language as a Feature
&lt;/h2&gt;

&lt;p&gt;At the end of the session, I realized a problem: the app was vibing in French (my native tongue), but I wanted screenshots in English for this article. &lt;/p&gt;

&lt;p&gt;Instead of manually editing labels, I asked the AI to "make the whole app i18n." In one single refactor, we added a translation dictionary, a language switcher, and logic to dynamically swap every label, tooltip, and even the pose names in the library. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vpg0r2qlr1xkx2qsfna.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vpg0r2qlr1xkx2qsfna.png" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;&lt;br&gt;GUI (and data) in French
  &lt;p&gt;&lt;/p&gt;

&lt;p&gt;This turned a linguistic hurdle into a core feature, proving that with Vibe Coding, "changing your mind" is just a prompt away.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. The "Traceability" Hack
&lt;/h2&gt;

&lt;p&gt;We spent about 90 minutes building this. We added features, fixed bugs, and refactored code. By the end, the chat context was massive and messy.&lt;/p&gt;

&lt;p&gt;If I came back to this project in a week, I’d be lost.&lt;/p&gt;

&lt;p&gt;So, I ran one final "Meta-Prompt":&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Read all the code we wrote and the docs in &lt;code&gt;Docs/&lt;/code&gt;, and generate a &lt;code&gt;Docs/session_summary.md&lt;/code&gt;. Explain what we built, why we made these choices, and the current state of the app."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI analyzed &lt;em&gt;its own work&lt;/em&gt; and wrote a summary file. This is my "Save Game" point. When I want to work on this again, I’ll feed that summary to the AI to restore its context instantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We went from a blank folder to a functional, vector-based SVG editor with a backend in one session.&lt;/p&gt;

&lt;p&gt;Vibe Coding a UI is possible, but you have to change your approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Anchor the Context:&lt;/strong&gt; Write specs so the AI has a "North Star."&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Delegate the Heavy Lifting:&lt;/strong&gt; Let the AI choose the libraries and do the math.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Iterate Visually:&lt;/strong&gt; Don't try to prompt the perfect UI. Prompt the &lt;em&gt;skeleton&lt;/em&gt;, then prompt the &lt;em&gt;paint&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Next we'll try to &lt;a href="https://dev.to/worldlinetech/vibe-coding-one-slice-at-a-time-4n3p"&gt;Vibe Code a real full stack app&lt;/a&gt;. Or a game. Who knows? The prompt is the limit.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgj2vy5ypv410dvds2d8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgj2vy5ypv410dvds2d8.png" width="450" height="423"&gt;&lt;/a&gt;&lt;/p&gt;&lt;br&gt;SVG exported by Yoga Pose Builder (opened in Inkscape)
  &lt;p&gt;&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>uidesign</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Vibe Coding One Page at a Time</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Fri, 23 Jan 2026 14:45:20 +0000</pubDate>
      <link>https://dev.to/worldlinetech/vibe-coding-one-page-at-a-time-265j</link>
      <guid>https://dev.to/worldlinetech/vibe-coding-one-page-at-a-time-265j</guid>
      <description>&lt;p&gt;&lt;em&gt;Building a Smart Magazine Archiver&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I’m starting a new series called &lt;strong&gt;"Vibe Coding one Step at a Time."&lt;/strong&gt; The goal? To document the raw, messy, and surprisingly efficient process of building software in the age of AI. We’re not here to write perfect specs or obsess over UML diagrams (well, not yet). We’re here to vibe with the code, iterating on pure intent until the machine does exactly what we want.&lt;/p&gt;

&lt;p&gt;In this first edition, I’m sharing how I used the &lt;strong&gt;Gemini CLI&lt;/strong&gt; to build a tool I actually needed, learning some pretty cool image processing tricks along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is "Vibe Coding"?
&lt;/h2&gt;

&lt;p&gt;I’m going to claim this term right here: &lt;strong&gt;Vibe Coding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It’s not "lazy coding." It’s &lt;strong&gt;intent-driven development&lt;/strong&gt;. In the old days, if you wanted to build a script, you had to know the syntax, the libraries, and the edge cases before you even opened your editor. You had to &lt;em&gt;think in code&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Vibe Coding flips that. You &lt;em&gt;think in outcomes&lt;/em&gt;. You describe the behavior, the "vibe" of the feature, and the AI handles the implementation details. You act less like a bricklayer and more like a conductor. The feedback loop isn't "Write -&amp;gt; Compile -&amp;gt; Error," it's "Ask -&amp;gt; Observe -&amp;gt; Tweak."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Use Case: "I Just Want to Read Offline"
&lt;/h2&gt;

&lt;p&gt;Here’s the situation: I subscribe to a fantastic niche magazine (which shall remain nameless to protect the innocent). It’s great, but their "digital reader" is a nightmare. It’s one of those web-based page-turners that requires an active internet connection.&lt;/p&gt;

&lt;p&gt;I wanted to read it on my tablet, offline, on a plane, without waiting for high-res JPEGs to buffer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; There was no "Download PDF" button.&lt;br&gt;
&lt;strong&gt;The Clue:&lt;/strong&gt; Inspecting the network traffic revealed that the magazine was just serving a sequence of high-quality images, one URL per page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Mission:&lt;/strong&gt; Write a script to fetch these pages and stitch them into a single, high-quality, searchable PDF.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Process: Galloping Toward Complexity
&lt;/h2&gt;

&lt;p&gt;We didn't sit down and architect a solution. We started small and let the script evolve.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: The Naive Loop
&lt;/h3&gt;

&lt;p&gt;We started with a simple hypothesis: "The URLs probably just have a page number in them."&lt;br&gt;
I asked Gemini to write a script using &lt;code&gt;requests&lt;/code&gt; to hit the URL for page 1, then page 2.&lt;br&gt;
&lt;em&gt;Boom.&lt;/em&gt; It worked. We had a directory full of 100 separate JPGs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: The Picture Book
&lt;/h3&gt;

&lt;p&gt;Having 100 files is annoying. I wanted a book.&lt;br&gt;
We asked Gemini to "glue these together." It pulled in the &lt;code&gt;PIL&lt;/code&gt; (&lt;a href="https://pillow.readthedocs.io" rel="noopener noreferrer"&gt;Pillow&lt;/a&gt;) library.&lt;br&gt;
&lt;strong&gt;Result:&lt;/strong&gt; A massive PDF. It looked great, but it was dumb. It was just a container of pictures. You couldn't highlight text, search for keywords, or copy-paste quotes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: The Search for Meaning (OCR)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknav94wd05wdhh5nojvv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknav94wd05wdhh5nojvv.png" alt="Tesseract OCR" width="330" height="146"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is where the "vibe" got technical. I realized a "picture book" wasn't enough. I needed &lt;strong&gt;Optical Character Recognition (OCR)&lt;/strong&gt;.&lt;br&gt;
We decided to use &lt;a href="https://github.com/tesseract-ocr" rel="noopener noreferrer"&gt;Tesseract&lt;/a&gt;. But here’s the catch we discovered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Human Eyes&lt;/strong&gt; like soft colors and smooth anti-aliasing.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;OCR Engines&lt;/strong&gt; like harsh contrast, jagged edges, and black-and-white binary inputs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we optimized the images for the machine, the magazine looked ugly. If we kept them pretty, the machine couldn't read the text.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Deep Dive: The "PDF Sandwich"
&lt;/h2&gt;

&lt;p&gt;This is where the magic happened. We ended up building a &lt;strong&gt;PDF Sandwich&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fw2zzt7ebjs57pp9qpa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fw2zzt7ebjs57pp9qpa.png" alt="Me asking Gemini CLI for a sandwich" width="800" height="129"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of choosing between beauty and brains, we chose both.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;The Visual Layer:&lt;/strong&gt; We keep the original high-res color JPEGs. This is what you see.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Data Layer:&lt;/strong&gt; Behind the scenes, we create a "Frankenstein" version of the page—converted to grayscale, contrast cranked up to 2.0, and upscaled 2x using &lt;code&gt;LANCZOS&lt;/code&gt; resampling (a fancy &lt;a href="https://en.wikipedia.org/wiki/Lanczos_resampling" rel="noopener noreferrer"&gt;algorithm&lt;/a&gt; that keeps edges sharp).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Merge:&lt;/strong&gt; We feed the Frankenstein images to Tesseract to generate an invisible text layer, then use &lt;code&gt;pypdf&lt;/code&gt; to overlay that text exactly on top of the pretty images.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The trickiest part? &lt;strong&gt;Math.&lt;/strong&gt;&lt;br&gt;
Because we upscaled the OCR images by 2x to help Tesseract read small fonts, the invisible text layer was twice as big as the visual page. We had to calculate scale factors to shrink the text back down so that when you highlight a sentence, the highlight actually lines up with the words.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Vibe coding this script taught me more in an hour than I’d usually learn in a weekend of reading docs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Image Optimization:&lt;/strong&gt; OCR is picky. Simply resizing an image isn't enough; the &lt;em&gt;method&lt;/em&gt; of resizing (resampling filter) matters.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Library Specialization:&lt;/strong&gt; &lt;code&gt;PIL&lt;/code&gt; is for pixels; &lt;code&gt;pypdf&lt;/code&gt; is for structure. Trying to do everything in one library is a trap.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Power of the CLI:&lt;/strong&gt; Using the Gemini CLI meant I didn't have to context-switch. I stayed in my terminal, describing what I wanted, and the code appeared.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy2dgv0fe30dhuu4zf12u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy2dgv0fe30dhuu4zf12u.png" alt="Use of the script (for 2 pages)" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We ended up with a ~100-line Python script that solves a genuine daily frustration. I didn't have to memorize the &lt;code&gt;pypdf&lt;/code&gt; documentation or look up the Tesseract CLI flags. I just focused on the goal: "Make it searchable, make it pretty."&lt;/p&gt;

&lt;p&gt;That’s Vibe Coding. You bring the vision, the AI brings the syntax, and together you build something cool. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;We'll discover in the &lt;a href="https://dev.to/worldlinetech/vibe-coding-one-pixel-at-a-time-22pc"&gt;next episode&lt;/a&gt; if this is still true with a more complex use case and a GUI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>gemini</category>
      <category>pdf</category>
      <category>ocr</category>
    </item>
    <item>
      <title>The Ultimate LLM Inference Battle: vLLM vs. Ollama vs. ZML</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Mon, 29 Dec 2025 09:12:46 +0000</pubDate>
      <link>https://dev.to/worldlinetech/the-ultimate-llm-inference-battle-vllm-vs-ollama-vs-zml-m97</link>
      <guid>https://dev.to/worldlinetech/the-ultimate-llm-inference-battle-vllm-vs-ollama-vs-zml-m97</guid>
      <description>&lt;p&gt;&lt;em&gt;A structured, data-driven comparison of today's leading open-source engines for serving AI models.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Runtime Wars"
&lt;/h3&gt;

&lt;p&gt;The open-source AI community has achieved an incredible milestone: models like Meta's Llama 3 and Mistral AI's Mixtral now rival proprietary giants like GPT-4. But having the weights is only half the battle. To actually &lt;em&gt;use&lt;/em&gt; these models—to build a chatbot, an agent, or an API, you need an inference engine.&lt;/p&gt;

&lt;p&gt;The landscape of inference servers is exploding. A year ago, options were scarce. Today, developers are faced with a paralyzing array of choices. Should you use the industry darling &lt;strong&gt;vLLM&lt;/strong&gt;? The local developer's favorite, &lt;strong&gt;Ollama&lt;/strong&gt;? Or perhaps a radical newcomer like &lt;strong&gt;ZML&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;Choosing the wrong engine can lead to massive infrastructure bills, slow user experiences, or vendor lock-in.&lt;/p&gt;

&lt;p&gt;To cut through the hype, we are applying the &lt;strong&gt;QSOS (Qualification and Selection of Open Source software)&lt;/strong&gt; method. This isn't a casual review; it's a structured evaluation comparing these three contenders against the state-of-the-art features required for modern AI production.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Methodology: Why QSOS?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkge5m9dy66je4atphit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkge5m9dy66je4atphit.png" alt="QSOS Logo" width="257" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.qsos.org" rel="noopener noreferrer"&gt;QSOS&lt;/a&gt; is a standardized methodology designed to reduce the risks associated with adopting open-source technologies. Unlike ad-hoc selection processes based on Medium articles or GitHub stars, QSOS treats open-source evaluation with the same rigor used for proprietary software.&lt;/p&gt;

&lt;p&gt;The core philosophy of QSOS is separating &lt;strong&gt;Evaluation&lt;/strong&gt; (the intrinsic, objective quality of the software) from &lt;strong&gt;Qualification&lt;/strong&gt; (how well it fits your specific business needs).&lt;/p&gt;

&lt;p&gt;For this comparison, we used a "Best of Breed" evaluation grid, scoring features on a simple 0-to-2 scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;0:&lt;/strong&gt; Not covered / Non-existent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1:&lt;/strong&gt; Partially covered / Complex implementation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2:&lt;/strong&gt; Fully covered / Best-in-class standard.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We assessed four key axes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Maturity &amp;amp; Community:&lt;/strong&gt; Is the project stable and likely to survive?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functional Features:&lt;/strong&gt; Does it support modern requirements like LoRA adapters and quantization?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance &amp;amp; Scale:&lt;/strong&gt; Can it handle high throughput and utilize hardware efficiently?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations (Day 2):&lt;/strong&gt; How easy is it to deploy, monitor, and maintain?&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Contenders
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. vLLM: The Data Center Standard
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8pv4ovqry11gr3xumeg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8pv4ovqry11gr3xumeg.png" alt="vLLM Logo" width="239" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://vllm.ai" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;&lt;/strong&gt; burst onto the scene in 2023 from UC Berkeley, solving a critical bottleneck in serving LLMs: memory fragmentation. Its core innovation, &lt;strong&gt;PagedAttention&lt;/strong&gt;, allows it to manage GPU memory like an operating system manages virtual memory, dramatically increasing batch sizes and throughput.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Focus:&lt;/strong&gt; High-throughput production serving in the data center.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positioning:&lt;/strong&gt; vLLM is the currently the &lt;strong&gt;De Facto Standard&lt;/strong&gt; for enterprise deployment. It excels on server-grade hardware (NVIDIA H100s/A100s) and offers the richest feature set for scaling.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Ollama: The Developer's Best Friend
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8i7p1zgrhls5qacwqki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8i7p1zgrhls5qacwqki.png" alt="Ollama Logo" width="344" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;&lt;/strong&gt; took a different approach. It focused entirely on removing friction. By wrapping the powerful &lt;code&gt;llama.cpp&lt;/code&gt; engine in a sleek, Docker-style Go binary, it made running a 70B parameter model on a MacBook as easy as typing &lt;code&gt;ollama run llama3&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Focus:&lt;/strong&gt; Local development, edge devices, and consumer hardware (Mac/PC).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positioning:&lt;/strong&gt; Ollama is the king of &lt;strong&gt;usability&lt;/strong&gt;. It is unbeaten for local testing and running models on consumer hardware, but it lacks the advanced scheduling required for high-traffic enterprise production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. ZML (Zig Machine Learning): The Radical Challenger
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ddw41ql12g8k4ekellb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ddw41ql12g8k4ekellb.png" alt="ZML Logo" width="200" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://zml.ai" rel="noopener noreferrer"&gt;ZML&lt;/a&gt;&lt;/strong&gt; is the new kid on the block. It is less of a "server" product and more of a compiler stack aimed at engineers. Written in Zig, it utilizes OpenXLA/MLIR to compile model graphs directly into standalone binaries, aiming to eliminate the heavy Python/PyTorch dependency chain entirely.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Focus:&lt;/strong&gt; High-performance, cross-platform runtime (TPUs, AMD, NVIDIA) without dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positioning:&lt;/strong&gt; ZML is an &lt;strong&gt;Alpha-stage visionary&lt;/strong&gt;. It offers incredible potential for hardware portability and efficiency but is currently a complex "build-your-own-stack" tool rather than a drop-in product.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Visualizing the Results
&lt;/h3&gt;

&lt;p&gt;To understand how these tools differ, we visualize our QSOS scores using two different schemas.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Radar Chart: Feature Balance
&lt;/h4&gt;

&lt;p&gt;This chart shows the balance of strengths across the four evaluation axes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdlvb6g592d0kydxj0try.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdlvb6g592d0kydxj0try.png" alt="QSOS Radar" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Caption: The QSOS Radar Chart highlights the distinct profiles of the three engines. vLLM shows the broadest coverage across features and performance. Ollama spikes toward Operational Ease. ZML shows potential in features but lacks maturity.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vLLM (Blue):&lt;/strong&gt; The largest, most balanced area, indicating strength across maturity, features, and performance, with moderate operational complexity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama (Green):&lt;/strong&gt; A massive spike toward "Operational Ease," reflecting its zero-friction user experience, but pulling back on raw performance metrics like continuous batching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ZML (Red):&lt;/strong&gt; A smaller footprint overall, reflecting its early stage (low maturity), but showing strong potential in functional features due to its compiler-based architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The QSOS Quadrant: Market Position
&lt;/h4&gt;

&lt;p&gt;This schema maps the tools based on their market adoption versus their raw production capabilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frm90qnkq9c5tf3hld553.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frm90qnkq9c5tf3hld553.png" alt="QSOS Quadrant" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Caption: The QSOS Quadrant positions the tools based on Market Maturity vs. Production Power.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vLLM (The Leader):&lt;/strong&gt; High Maturity, High Power. The safe, scalable choice for the enterprise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama (The Specialist):&lt;/strong&gt; High Maturity, Lower Production Power. The standard for a specific niche (local/consumer hardware), prioritizing usability over scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ZML (The Visionary):&lt;/strong&gt; Low Maturity, High Potential Power. An innovative approach that hasn't yet proven itself in the broad market.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Consolidated Score Sheet
&lt;/h3&gt;

&lt;p&gt;Below is the detailed breakdown of the evaluation scores that feed the charts above.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Section / Criteria&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;vLLM&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Ollama&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;ZML (Zig ML)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;A. MATURITY&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;History &amp;amp; Age&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Standard)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Standard)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Very New)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Activity&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Hyper-Active)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Viral)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (High Velocity)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ecosystem&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Dominant)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Ubiquitous)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Niche)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Governance&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Community)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Company Led)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Small Team)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;B. FEATURES&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model Support&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Universal)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Curated Lib)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Compiler based)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quantization&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Server: AWQ/FP8)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Edge: GGUF)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Implicit XLA)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LoRA Adapters&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Dynamic Multi-LoRA)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Static Modelfile)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Not standard)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Compat.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (OpenAI Native)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (OpenAI Native)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Runtime only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;C. PERFORMANCE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cont. Batching&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Gold Standard)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (FIFO)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Arch. support)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Maximum SOTA)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Low/Single User)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (High Potential)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallelism&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Tensor &amp;amp; Pipeline)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Single Node)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Compiler Config)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware Agnosticism&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (NVIDIA Centric)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Apple/Consumer)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Any: TPU/AMD)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;D. OPERATIONS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ease of Setup&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Python/Docker)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Magic 1-Click)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Hard: Bazel)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Heavy Torch)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Zero: Go Binary)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Zero: Zig Binary)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Prometheus Native)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Logs only)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Manual metrics)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;There is no single "best" inference engine. The right choice depends entirely on your specific context (the Qualification phase of QSOS).&lt;/p&gt;

&lt;h4&gt;
  
  
  Choose vLLM if:
&lt;/h4&gt;

&lt;p&gt;You are building a production application that needs to serve many concurrent users. You have access to server-grade GPUs (NVIDIA A10G, A100, H100) and need features like dynamic LoRA adapters for multi-tenancy.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you are deploying to Kubernetes to serve customers, start here.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Choose Ollama if:
&lt;/h4&gt;

&lt;p&gt;You are a developer building locally on a Mac or Windows PC. You need a zero-friction way to test models, or you are deploying to edge devices where resources are constrained, and concurrency is low.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you just want to run Llama 3 on your laptop right now, download Ollama.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Choose ZML if:
&lt;/h4&gt;

&lt;p&gt;You are an ML systems engineer building a specialized hardware appliance (e.g., using TPUs or AMD chips) and need a runtime with absolutely zero Python dependencies and a tiny footprint. You are willing to build the server infrastucture around it yourself.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you are frustrated by PyTorch bloat and want a "build your own" adventure, look at ZML.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Note on Methodology
&lt;/h3&gt;

&lt;p&gt;For the purpose of this article, we utilized a &lt;strong&gt;simplified QSOS evaluation grid&lt;/strong&gt;. We intentionally zoomed in on the "Best of Breed" criteria, the critical differentiators driving the current "Inference Wars", to keep the comparison readable and actionable.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;full-fledged QSOS evaluation&lt;/strong&gt; is significantly more exhaustive. It is structured as a hierarchical &lt;strong&gt;tree of criteria&lt;/strong&gt; containing more data points, covering deep operational details such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Generic Attributes:&lt;/strong&gt; Intellectual property management, roadmap visibility, bug tracking efficiency, and internationalization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specific Sub-sections:&lt;/strong&gt; Detailed granularity on security compliance (SOC2/GDPR), exact memory footprints, and specific driver version compatibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While this article provides a strategic overview, a complete QSOS audit would involve drilling down from high-level "Sections" into specific "Leaves" to calculate a precise, weighted score for every possible business constraint.&lt;/p&gt;

</description>
      <category>qsos</category>
      <category>zml</category>
      <category>ollama</category>
      <category>vllm</category>
    </item>
    <item>
      <title>Automating Image Generation with n8n and ComfyUI</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Sun, 07 Sep 2025 15:51:34 +0000</pubDate>
      <link>https://dev.to/worldlinetech/automating-image-generation-with-n8n-and-comfyui-521p</link>
      <guid>https://dev.to/worldlinetech/automating-image-generation-with-n8n-and-comfyui-521p</guid>
      <description>&lt;p&gt;This is the third article of a series about how to integrate ComfyUI with other tools to build more complex workflows. We'll move beyond the familiar node-based interface to explore how to connect ComfyUI from code and no-code solutions, using API calls or MCP Servers.&lt;/p&gt;

&lt;p&gt;You'll learn &lt;strong&gt;how to use ComfyUI's API to build custom applications&lt;/strong&gt; and automate tasks, creating powerful and automated systems for generative AI.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://n8n.io" rel="noopener noreferrer"&gt;&lt;strong&gt;n8n&lt;/strong&gt;&lt;/a&gt; is a workflow automation tool that connects applications, APIs, and services without requiring deep technical expertise. It allows users to create &lt;strong&gt;complex, multi-step workflows using a visual, node-based editor&lt;/strong&gt;. With n8n, you can automate tasks across thousands of integrations, from CRMs and databases to messaging apps and cloud services.&lt;/p&gt;

&lt;p&gt;It's a &lt;a href="https://docs.n8n.io/sustainable-use-license/" rel="noopener noreferrer"&gt;&lt;strong&gt;fair-code&lt;/strong&gt;&lt;/a&gt; and &lt;strong&gt;open-core&lt;/strong&gt; solution. You can self-host and modify the software freely, but SaaS providers must contribute back to the project if they offer n8n as a service. Furthermore, some advanced features like global variables, multiple environments (dev, staging, prod, etc.), version control using Git, or controlling n8n via API are not available in the community and open-source version of the product.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore how to call ComfyUI from an n8n &lt;strong&gt;agent-based workflow with human interaction and LLM use&lt;/strong&gt;. The agent is instructed to transform a simple prompt from the user into a super-charged JSON Prompt Guide, which is then injected into ComfyUI. For more context, you can read my previous article on &lt;a href="https://dev.to/worldlinetech/json-style-guides-for-controlled-image-generation-with-gpt-4o-and-gpt-image-1-36p"&gt;&lt;strong&gt;JSON Prompt Style Guides&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;n8n is a Vue/TypeScript web application that's simple to install whether you prefer to run it on a Node.js installation or inside a Docker container.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node.js&lt;/strong&gt;: &lt;code&gt;npx n8n&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt;: &lt;code&gt;docker volume create n8n_data&lt;/code&gt; and then &lt;code&gt;docker run -it --rm --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n docker.n8n.io/n8nio/n8n&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After all dependencies are installed, the n8n Editor web UI is accessible at &lt;code&gt;http://localhost:5678&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Text-to-Image Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case
&lt;/h3&gt;

&lt;p&gt;Workflow design is done in the Editor web UI, and it's a highly visual process that doesn't require any coding knowledge, as long as you use predefined nodes for a standard use case. That's our approach here, as we'll create a very simple 3-step workflow with 4 nodes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcurios5i266b3abbhamr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcurios5i266b3abbhamr.png" alt="T2I Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Chat Trigger&lt;/strong&gt; node to start the workflow with a message from the user to capture their initial prompt for the images to be generated by ComfyUI.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;AI Agent&lt;/strong&gt; node to call an OpenAI model (though it could be other SaaS solutions like Mistral, Anthropic, or Google Gemini, or local models provided through Ollama or directly by Hugging Face). The agent has instructions on how to expand the initial prompt from the previous node into a &lt;strong&gt;JSON Prompt Style Guide&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;OpenAI Chat Model&lt;/strong&gt; node to connect to OpenAI's GPT.&lt;/li&gt;
&lt;li&gt; &lt;a href="https://github.com/mason276752/n8n-nodes-comfyui" rel="noopener noreferrer"&gt;&lt;strong&gt;n8n-nodes-comfyui&lt;/strong&gt;&lt;/a&gt; community node to connect to a running ComfyUI instance. To install it, go to the "&lt;em&gt;Settings / Community nodes&lt;/em&gt;" menu.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b6d063cqdad5pg6ldnd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b6d063cqdad5pg6ldnd.png" alt="n8n-nodes-comfyui installation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're making a simple use of this standard &lt;strong&gt;AI Agent&lt;/strong&gt; node and don't require memory or external tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nwqgr4ljxe28ioaagy4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nwqgr4ljxe28ioaagy4.png" alt="AI Agent node"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most important parameter is the &lt;strong&gt;system message&lt;/strong&gt; given to the LLM to expand the initial user prompt. The &lt;strong&gt;OpenAI Chat Model&lt;/strong&gt; node handles the credentials to connect to OpenAI and allows us to select the GPT 4.1 mini model.&lt;/p&gt;

&lt;p&gt;The LLM response is then sent to the final node, which is interconnected with ComfyUI.&lt;/p&gt;

&lt;h3&gt;
  
  
  ComfyUI Community Node
&lt;/h3&gt;

&lt;p&gt;Once installed, this community node is quite straightforward to use.&lt;/p&gt;

&lt;p&gt;First, we configure the credentials to connect to ComfyUI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API URL&lt;/strong&gt;: In this example, it's &lt;code&gt;http://127.0.0.1:8188&lt;/code&gt;, but it could also be a remote instance of ComfyUI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Key&lt;/strong&gt;: This is used if you have configured one on the ComfyUI side.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71s9f9ybqx03fuf6hro9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71s9f9ybqx03fuf6hro9.png" alt="ComfyUI node"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we specify the output format (&lt;strong&gt;PNG&lt;/strong&gt; or &lt;strong&gt;JPEG&lt;/strong&gt;) and the timeout for communication with ComfyUI. In the &lt;strong&gt;Workflow JSON&lt;/strong&gt; textarea, we copy the content of the workflow exported from ComfyUI (by using the "&lt;em&gt;File / Export (API)&lt;/em&gt;" menu).&lt;/p&gt;

&lt;p&gt;This means that n8n will send the workflow to be executed to the ComfyUI API in JSON format. We need to modify the ComfyUI workflow by using an expression containing the &lt;em&gt;$node["AI Agent"].data&lt;/em&gt; variable. Its value is dynamically set to the prompt provided by the previous node during n8n execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5ebjyjsljgxmnzux3zz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5ebjyjsljgxmnzux3zz.png" alt="Prompt insertion"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The exact location to inject the prompt depends on the JSON workflow exported from ComfyUI. Here, it's inside the &lt;strong&gt;"39.6"&lt;/strong&gt; node of type &lt;strong&gt;CLIP Text Encode&lt;/strong&gt;, but it might have a different name in your own workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution
&lt;/h3&gt;

&lt;p&gt;We're all set! We check that ComfyUI is running and ready to launch the workflow from the n8n UI by entering a prompt in the chat box.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwti8h4he4btgoug62kp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwti8h4he4btgoug62kp.png" alt="User Chat"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a short video of the workflow execution. n8n displays real-time progress, and the generated images can be visualized inside the ComfyUI node.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/sBpbzYwr8Y4"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Here are two images generated by from this prompt: "&lt;em&gt;A dramatic, cinematic shot of an ancient library at night, where the books are alive and their pages flutter like birds, forming constellations in the air.&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxy3p49bdwwkbsrk3jui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxy3p49bdwwkbsrk3jui.png" alt="1st image generated"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm10o7wju67n4zhfq5gzg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm10o7wju67n4zhfq5gzg.png" alt="2nd image generated"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, this 3-step workflow is very simple. The true power of coupling n8n and ComfyUI will become apparent with more complex use cases, leveraging n8n's extensive integration capabilities with many other components and solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Image-to-Image Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case
&lt;/h3&gt;

&lt;p&gt;Let's now create another workflow to transform an existing image based on user instructions. We'll intentionally keep this example super simple for clarity, but your use case might include a more complex workflow leveraging n8n's power. &lt;/p&gt;

&lt;p&gt;Here, we'll use only three nodes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdv9t72y5kmgoxtvdphr9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdv9t72y5kmgoxtvdphr9.png" alt="I2I workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;n8n Form / n8n Form trigger&lt;/strong&gt; node to start the workflow by displaying an HTML form for the user to upload the image to modify and specify what changes to apply.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;ComfyUI Image Transformer&lt;/strong&gt; community node to connect to a running ComfyUI instance. To install it, go to the "&lt;strong&gt;Settings / Community nodes&lt;/strong&gt;" menu and search for &lt;a href="https://www.npmjs.com/package/n8n-nodes-comfyui-image-to-image" rel="noopener noreferrer"&gt;&lt;strong&gt;n8n-nodes-comfyui-image-to-image&lt;/strong&gt;&lt;/a&gt;. The example workflow exported from ComfyUI uses the Kontext Edit model to modify an existing image.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;n8n Form / Form Ending&lt;/strong&gt; node to notify the user when the image is generated and offer it for download.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ComfyUI Image Transformer Node
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7dx2rdhqs96an3r5xyw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7dx2rdhqs96an3r5xyw.png" alt="ComfyUI Image Transformer Node"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This node is quite similar to the &lt;strong&gt;n8n-nodes-comfyui&lt;/strong&gt; node we used before, with the insertion of the &lt;em&gt;$json.Promt&lt;/em&gt; expression into the exported ComfyUI JSON workflow to inject instructions from the user.&lt;/p&gt;

&lt;p&gt;The main difference concerns how the input image to be modified is handled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Type&lt;/strong&gt; defines how the image is obtained from the previous form node; we'll choose &lt;strong&gt;Binary&lt;/strong&gt; instead of &lt;strong&gt;URL&lt;/strong&gt; or &lt;strong&gt;Base64&lt;/strong&gt; text.&lt;/li&gt;
&lt;li&gt;The property containing the binary file must be specified, which is the &lt;strong&gt;data&lt;/strong&gt; field here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image Node ID&lt;/strong&gt; is used to identify—within the exported ComfyUI JSON workflow — the node in charge of loading the input image (it must be of type &lt;strong&gt;LoadImage&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We've added the last node to finalize the form management started with the first node, retrieve the modified image, return it in binary format, and offer the user the option to save it locally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0am0bq9jsxflv1i5hz92.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0am0bq9jsxflv1i5hz92.png" alt="Form Ending"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution
&lt;/h3&gt;

&lt;p&gt;Let's execute the workflow. n8n displays a form for us to enter both the image and the associated instructions for its modification.&lt;/p&gt;

&lt;p&gt;Here is a short video of the workflow execution.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/Os7Fp7jop7w"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initial Image&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1ih2bnwq5v1klxyyxb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1ih2bnwq5v1klxyyxb8.png" alt="Initial Image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modified Image&lt;/strong&gt; with the prompt "&lt;em&gt;Make the scene at night with full moon and moonlight&lt;/em&gt;":&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2icl18zbcsfe1n3i3t8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2icl18zbcsfe1n3i3t8.png" alt="Modified Image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This second example workflow is so simple that we could do the exact same thing directly using the ComfyUI UI. It's here simply to illustrate how integration with n8n can be achieved. A more value-added workflow might, for instance, include a loop that allows the user to keep modifying the image outputs until they are satisfied.&lt;/p&gt;

&lt;p&gt;Also, note that the &lt;strong&gt;n8n-nodes-comfyui&lt;/strong&gt; package offers other custom nodes for integration into your workflows, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dual Image Transformer&lt;/li&gt;
&lt;li&gt;Single Image to Video&lt;/li&gt;
&lt;li&gt;Dual Image Video Generator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's also worth noting that even though n8n offers Form nodes, it's primarily intended to be used in the backend through API calls. This feature, however, is limited to Enterprise licensees. &lt;/p&gt;




&lt;p&gt;With these two workflows, we've demonstrated how n8n can serve as a powerful orchestrator for ComfyUI. By leveraging its visual editor and extensive library of integrations, we transformed a simple user prompt into a rich, structured guide for image generation and created a seamless image-to-image transformation process.&lt;/p&gt;

&lt;p&gt;While our examples were simple to illustrate the concepts, the true value of n8n lies in its ability to connect ComfyUI with a vast ecosystem of tools, from databases and CRMs to messaging services and other AI models. This opens up new possibilities for building sophisticated, end-to-end applications that go far beyond what a standalone ComfyUI interface can offer.&lt;/p&gt;

&lt;p&gt;In the next article of this series, we'll explore another paradigm for connecting ComfyUI with agent-based solutions. We will delve into the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;, designed to streamline and standardize the way AI models communicate and share contextual information. This will offer a new, more efficient method for agents to interact with and control ComfyUI.&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>n8n</category>
      <category>genai</category>
      <category>agents</category>
    </item>
    <item>
      <title>WebSockets &amp; ComfyUI: Building Interactive AI Applications</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Fri, 05 Sep 2025 09:17:07 +0000</pubDate>
      <link>https://dev.to/worldlinetech/websockets-comfyui-building-interactive-ai-applications-1j1g</link>
      <guid>https://dev.to/worldlinetech/websockets-comfyui-building-interactive-ai-applications-1j1g</guid>
      <description>&lt;p&gt;This is the second article of a series about how to integrate ComfyUI with other tools to build more complex workflows. We'll move beyond the familiar node-based interface to explore how to connect ComfyUI from code and no-code solutions, using API calls or MCP Servers.&lt;/p&gt;

&lt;p&gt;You'll learn &lt;strong&gt;how to use ComfyUI's API to build custom applications&lt;/strong&gt; and automate tasks, creating powerful and automated systems for generative AI.&lt;/p&gt;




&lt;p&gt;In the &lt;a href="https://dev.to/worldlinetech/unlocking-comfyuis-power-a-guide-to-the-http-api-in-jupyter-1mpi"&gt;previous article&lt;/a&gt; of the &lt;em&gt;Beyond the ComfyUI Canvas&lt;/em&gt; series, we demonstrated how to connect ComfyUI with Jupyter Notebook using basic HTTP API calls. While functional, this approach had a significant limitation: it relied on a time.sleep() function to wait for workflow completion, requiring manual adjustments based on the complexity of each workflow, a far from ideal solution.&lt;/p&gt;

&lt;p&gt;To overcome this inefficiency, we’ll &lt;strong&gt;leverage ComfyUI’s WebSocket API&lt;/strong&gt; (/ws endpoint), which enables real-time, bidirectional communication between Jupyter and ComfyUI. This upgrade unlocks a seamless experience by providing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instant execution progress updates to track workflow status,&lt;/li&gt;
&lt;li&gt;Live node execution feedback for monitoring each step,&lt;/li&gt;
&lt;li&gt;Immediate error messages and debugging insights for troubleshooting,&lt;/li&gt;
&lt;li&gt;Dynamic queue status updates to respond to changes on the fly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By adopting WebSockets, we eliminate guesswork and create a responsive, interactive workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Use Case
&lt;/h2&gt;

&lt;p&gt;Let's simplify our previous use-case by dropping the OpenAI Assistant and focusing on how to eliminate manual polling or delays. The process is designed to be both intuitive and efficient:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Setup&lt;/strong&gt;: A pre-defined ComfyUI workflow (loaded from a JSON file) serves as the foundation for image generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Customization&lt;/strong&gt;: The user provides a text prompt which is dynamically inserted into the workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Execution&lt;/strong&gt;: Using ComfyUI’s WebSocket API, the notebook sends the workflow to the server and monitors its progress in real time—receiving live updates on execution status, node activity, and completion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result Retrieval&lt;/strong&gt;: Once generation finishes, the resulting images are automatically fetched and displayed directly in the notebook, creating a seamless end-to-end experience.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s dive into the implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get prompt from user
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please enter your prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;user_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please enter your prompt
A penguin in a tuxedo, DJing at a club for dancing jellyfish
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Trigger the Workflow from Jupyter Notebook
&lt;/h2&gt;

&lt;p&gt;Below, you’ll find a detailed breakdown of the code designed for use in a Jupyter Notebook, complete with helpful comments to guide you through each step and explain its functionality&lt;/p&gt;
&lt;h3&gt;
  
  
  Imports and main functions
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;  &lt;span class="c1"&gt;# For WebSocket communication
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;       &lt;span class="c1"&gt;# For generating unique client IDs
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;       &lt;span class="c1"&gt;# For JSON data handling
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;   &lt;span class="c1"&gt;# For HTTP requests (replaces urllib)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;  &lt;span class="c1"&gt;# For image processing
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;         &lt;span class="c1"&gt;# For handling binary data streams
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IPython.display&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;display&lt;/span&gt;  &lt;span class="c1"&gt;# For displaying images in Jupyter
&lt;/span&gt;
&lt;span class="c1"&gt;# Server configuration
&lt;/span&gt;&lt;span class="n"&gt;server_address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1:8188&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Local server address and port
&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;      &lt;span class="c1"&gt;# Unique client ID for this session
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;queue_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Send a prompt to the server for execution.

    Args:
        prompt (dict): The workflow/prompt to execute.
        prompt_id (str): Unique ID for tracking the prompt.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;client_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server_address&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subfolder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;folder_type&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Fetch an image from the server.

    Args:
        filename (str): Name of the image file.
        subfolder (str): Subfolder where the image is stored.
        folder_type (str): Type of folder (e.g., &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;).

    Returns:
        bytes: Binary image data.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subfolder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;subfolder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;folder_type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server_address&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/view&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Retrieve the execution history for a given prompt ID.

    Args:
        prompt_id (str): ID of the prompt whose history is requested.

    Returns:
        dict: History data for the prompt.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server_address&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/history/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_images&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Execute a prompt and collect the resulting images.

    Args:
        ws (websocket.WebSocket): Active WebSocket connection.
        prompt (dict): The workflow/prompt to execute.

    Returns:
        dict: Dictionary of node IDs and their output images.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nf"&gt;queue_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="c1"&gt;# Listen for WebSocket messages until execution is complete
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;executing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;node&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# Execution is done
&lt;/span&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Binary previews are ignored here
&lt;/span&gt;            &lt;span class="k"&gt;continue&lt;/span&gt;

    &lt;span class="c1"&gt;# Retrieve and organize output images
&lt;/span&gt;    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;outputs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;node_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;outputs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;images_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;node_output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;node_output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;subfolder&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="n"&gt;images_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;output_images&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;images_output&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output_images&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Load the workflow and inject the user prompt
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;t2i-krea.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Update the prompt text in the workflow
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;39:6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_prompt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Communication with ComfyUI through WebSockets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Establish WebSocket connection
&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ws://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server_address&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/ws?clientId=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Execute the workflow and collect images
&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_images&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Display the output images in Jupyter
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="c1"&gt;# Display each image in the notebook
&lt;/span&gt;        &lt;span class="n"&gt;display&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;display&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcggv6lxyx0blcavqo285.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcggv6lxyx0blcavqo285.jpg" alt="1st Generated Image" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z5yh0qq85an74w72q24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z5yh0qq85an74w72q24.png" alt="2nd Generated Image" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;This article demonstrated the power of using &lt;strong&gt;WebSockets&lt;/strong&gt; for real-time, bidirectional communication with ComfyUI. By moving beyond &lt;strong&gt;simple HTTP requests&lt;/strong&gt;, we eliminated the need for manual time delays and created a truly dynamic, responsive workflow. This allowed us to monitor the execution of our AI pipeline in real-time, ensuring a more reliable and efficient integration. The result is a seamless experience where we can send a prompt and watch as the generated images appear automatically in our notebook.&lt;/p&gt;

&lt;p&gt;Having now explored two different ways to integrate ComfyUI with Python code executed in Jupyter, we've laid a strong foundation for building custom, high-level generative AI applications. But what if you're not a developer, or you simply prefer a visual, no-code approach to orchestration? In the next article of the series, we'll shift our focus from code to a &lt;strong&gt;no-code solution like n8n&lt;/strong&gt; to show you how to build powerful ComfyUI workflows without writing a single line of code. &lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>jupyter</category>
      <category>websockets</category>
      <category>genai</category>
    </item>
    <item>
      <title>Unlocking ComfyUI's Power: A Guide to the HTTP API in Jupyter</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Thu, 04 Sep 2025 15:28:06 +0000</pubDate>
      <link>https://dev.to/worldlinetech/unlocking-comfyuis-power-a-guide-to-the-http-api-in-jupyter-1mpi</link>
      <guid>https://dev.to/worldlinetech/unlocking-comfyuis-power-a-guide-to-the-http-api-in-jupyter-1mpi</guid>
      <description>&lt;p&gt;This is the first article of a series about how to integrate ComfyUI with other tools to build more complex workflows. We'll move beyond the familiar node-based interface to explore how to connect ComfyUI from code and no-code solutions, using API calls or MCP Servers. &lt;/p&gt;

&lt;p&gt;You'll learn how to use ComfyUI's API to build custom applications and automate tasks, creating powerful and automated systems for generative AI.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://github.com/comfyanonymous/ComfyUI" rel="noopener noreferrer"&gt;ComfyUI&lt;/a&gt; is a powerful, modular interface for generative models, allowing users to create complex AI image, video and sound generation workflows with a node-based editor. &lt;a href="https://jupyter.org/" rel="noopener noreferrer"&gt;Jupyter Notebook&lt;/a&gt;, on the other hand, is a popular interactive environment for data analysis, visualization, and prototyping.&lt;/p&gt;

&lt;p&gt;By integrating ComfyUI with Jupyter Notebook, you can leverage the flexibility of ComfyUI’s workflows directly within your Python scripts or data science pipelines. This first article focuses on a simple approach using Basic HTTP API calls.&lt;/p&gt;

&lt;p&gt;Most of this article is exported from an actual Jupyter Notebook. Both content, Python code and execution results are displayed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Use Case
&lt;/h2&gt;

&lt;p&gt;Our goal is to build a high-level generative AI workflow that combines the power of an intelligent agent with the robust image generation capabilities of ComfyUI. The process unfolds in a few simple steps, all orchestrated within a Jupyter Notebook:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;User Input:&lt;/strong&gt; The workflow begins with a simple, high-level prompt entered directly into the notebook.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Agent-Powered Expansion:&lt;/strong&gt; An &lt;strong&gt;OpenAI Assistant&lt;/strong&gt; then takes this basic prompt and transforms it into a detailed, structured &lt;strong&gt;JSON Prompt Style Guide&lt;/strong&gt;. This process enriches the initial idea with specific creative instructions, such as style, composition, and lighting.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Initiating Generation:&lt;/strong&gt; This expanded JSON guide is automatically injected into a pre-defined ComfyUI workflow. A single API call to the &lt;strong&gt;ComfyUI server&lt;/strong&gt; starts the image generation process.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Displaying the Result:&lt;/strong&gt; Once the generation is complete, we make a second API call to fetch the resulting images. The images are then displayed directly within the Jupyter Notebook, completing our automated pipeline. &lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Prepare a ComfyUI Workflow
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Create or load a workflow in ComfyUI.&lt;/li&gt;
&lt;li&gt;Save the workflow as a .json file from the "&lt;em&gt;File / Export (API)&lt;/em&gt;" menu (e.g., &lt;code&gt;t2i-krea.json&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get initial prompt from user
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please enter your prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;user_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please enter your prompt
Hanuman flying over a modern city at night
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Generate JSON Prompt Style Guide with an Assistant
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Create a thread
&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Send a message
&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run the assistant
&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;assistant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;asst_Uj0Qr0rG0bz8NVk1LWiS9UKv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Wait for completion and retrieve the response
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get the response
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;json_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"style_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Urban Deus Ex Hanuman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inspiration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Modern Urban Aesthetics"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Hindu Mythology"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Superhero Comics"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Cyberpunk Lighting"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scene"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hanuman, the Hindu god, flying over a bustling modern city radiating bright lights under the cloak of night sky"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subjects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hanuman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Strong, muscular figure with a monkey face, holding a gada(mace)."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"midground"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pose"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"flying with one hand extended"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"large"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"determined"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"interaction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"flying over the city"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"modern urban skyline with skyscrapers, neon billboards, and busy traffic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"background"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"expansive"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"style"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"comic-realistic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"color_palette"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#202020"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"secondary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#505050"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"highlight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#ff6a00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"shadow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#0d0d0d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"background_gradient"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"#0d0d0d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"#303030"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lighting"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Glistening city lights with diffused neon glow and soft moonlight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mood"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"powerful and captivating"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"background"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"scenery"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"details"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Modern urban cityscape with skyscrapers, roads, traffic and massive billboards with neon signs"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"composition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Slightly off-center focus with Hanuman taking up prominent space"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"camera"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"angle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"low angle"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"distance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium shot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wide-angle"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"focus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sharp subject, blurred background"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"medium"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Digital Painting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"textures"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"smooth skin of Hanuman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"rough concrete of buildings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"glossy glass of skyscrapers"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resolution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4K"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"details"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"clothing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hanuman is dressed in traditional golden and red garment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"weather"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Night with clear sky and a soft moonlight"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"effects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Bokeh effect for city lights"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Glow effect for neon lights"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"themes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Divinity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Strength"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Modernization"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Contrast"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Juxtaposition of Tradition with Modernity"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage_notes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The style is effective in creating a surprising juxtaposition of traditional divinity with modern landscapes. Use this style for high impact illustrations where contrasts need to be highlighted."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Trigger the Workflow from Jupyter Notebook
&lt;/h2&gt;

&lt;p&gt;Use the requests library to send a POST request to the ComfyUI API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="c1"&gt;# ComfyUI server URL
&lt;/span&gt;&lt;span class="n"&gt;comfy_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:8188&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;prompt_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;comfy_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Load your workflow JSON
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;t2i-krea.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Replace the prompt
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;39:6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json_prompt&lt;/span&gt;

&lt;span class="c1"&gt;# Define the payload
&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;client_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jupyter_notebook&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Send the request
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get the prompt_id
&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;c1a2ced4-772c-4aeb-ac45-bfa183d03a88
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Retrieve the generated images
&lt;/h2&gt;

&lt;p&gt;ComfyUI processes the workflow asynchronously. &lt;/p&gt;

&lt;p&gt;To fetch the result, poll the /history endpoint:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;IPython.display&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;display&lt;/span&gt;    

&lt;span class="c1"&gt;# Wait for the workflow to complete
&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Adjust based on workflow complexity
&lt;/span&gt;
&lt;span class="c1"&gt;# Fetch the latest result for our prompt
&lt;/span&gt;&lt;span class="n"&gt;history_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;comfy_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/history/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history_url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Navigate to the list of image outputs and display them
&lt;/span&gt;&lt;span class="n"&gt;image_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;image_outputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;image_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;comfy_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/view?filename=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;display&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4g9e4aq1u1ilozs2xj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4g9e4aq1u1ilozs2xj0.png" alt="First Generated Image" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0l7hxignffemzrvzy6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0l7hxignffemzrvzy6k.png" alt="Second Generated Image" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;In this article, we've seen how to leverage the power of ComfyUI directly from a Jupyter Notebook. By making simple API calls, we were able to transform a user's basic text prompt into a rich, detailed JSON guide using an OpenAI Assistant, and then feed that guide into a ComfyUI workflow to generate images. This approach demonstrates how you can move beyond the graphical interface to build automated, intelligent systems for creative tasks. The combination of Python's flexibility and ComfyUI's robust backend opens up a world of possibilities for custom, high-level generative AI workflows.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/worldlinetech/websockets-comfyui-building-interactive-ai-applications-1j1g"&gt;next article&lt;/a&gt;, we'll take our integration a step further by exploring how to use &lt;strong&gt;WebSockets&lt;/strong&gt; for Real-Time Interaction with ComfyUI.&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>api</category>
      <category>jupyter</category>
      <category>genai</category>
    </item>
    <item>
      <title>Enhancing QR Codes in the Age of GenAI</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Fri, 23 May 2025 09:46:49 +0000</pubDate>
      <link>https://dev.to/worldlinetech/enhancing-qr-codes-in-the-age-of-genai-4fa6</link>
      <guid>https://dev.to/worldlinetech/enhancing-qr-codes-in-the-age-of-genai-4fa6</guid>
      <description>&lt;h2&gt;
  
  
  Traditional QR Codes
&lt;/h2&gt;

&lt;p&gt;Quick Response (QR) codes were developed in 1994 by Masahiro Hara and are now recognized as an ISO/IEC standard. They represent an evolution of 2D barcodes, capable of encoding numeric, alphanumeric, binary, or Kanji data in the form of a pattern of black squares on a white background. These codes are available in various sizes (or versions), ranging from version 1 (21 x 21 squares) to version 40 (177 x 177 squares).&lt;/p&gt;

&lt;p&gt;Numerous libraries and tools exist for generating QR codes. My preferred open-source library is &lt;a href="https://nayuki.io/page/qr-code-generator-library" rel="noopener noreferrer"&gt;QR Code Generator&lt;/a&gt;, which supports all standard features and is available in Java, TypeScript/JavaScript, Python, Rust, C++, and C. Additionally, my favorite all-in-one open-source tool is &lt;a href="https://qrcode.antfu.me" rel="noopener noreferrer"&gt;QR Toolkit&lt;/a&gt;, a Vue/Nuxt application offering marker and module customization, along with verification and comparison mechanisms, an invaluable resource when tweaking QR codes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0gic8761tmz58m94fum.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0gic8761tmz58m94fum.png" alt="QR Toolkit"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;QR codes comprise several critical components to ensure readability by scanners, including three positional markers, alignment and timing patterns, and a masking system. While I will not delve into these details now, I will instead focus on the built-in error correction mechanism. This employs Reed-Solomon codes - also used in storage media (CD/DVD, RAID6) and network technologies (DSL, satellite) — by adding extra codewords to the QR grid for error correction. The standard defines four levels of error correction, each associated with a different tolerance percentage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Approximate Error Tolerance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;~7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;~15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quartile&lt;/td&gt;
&lt;td&gt;~25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;~30%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This means a QR code with High error correction can still be scanned if up to 30% of the image becomes unreadable. This feature is often utilized to embed images within QR codes: the embedded image is treated as errors during scanning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmg2g8xovbspfxv5wha6y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmg2g8xovbspfxv5wha6y.png" alt="Image embedded in QR Code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For years, this technique has been used for personalizing QR codes. This article explores an innovative approach to customizing QR codes by leveraging Generative AI instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Harnessing Generative AI
&lt;/h2&gt;

&lt;p&gt;My proposal involves using a Stable Diffusion model integrated within the ComfyUI graphical interface to design and execute local generation workflows on a GPU-equipped PC. For detailed guidance on these components, refer to this &lt;a href="https://dev.to/worldlinetech/the-yoga-of-image-generation-part-1-1gan"&gt;article&lt;/a&gt; or this &lt;a href="https://www.youtube.com/watch?v=kXraePyAT-c" rel="noopener noreferrer"&gt;video&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To modify and refine existing QR codes while maintaining their scannability, we will use a specialized ControlNet called &lt;a href="https://huggingface.co/monster-labs/control_v1p_sd15_qrcode_monster" rel="noopener noreferrer"&gt;QR Code Monster&lt;/a&gt;. ControlNets are auxiliary neural network models that inject targeted guidance into the generation process by focusing on specific features of an input image. Each ControlNet emphasizes particular aspects, such as structure (pose, edges, segmentation, depth), texture, content layout (bounding boxes, masks), or style (color maps, textures). In our scenario, we’ll focus on maintaining or modifying QR code contrast features.&lt;/p&gt;

&lt;p&gt;Let’s proceed to create a workflow in ComfyUI, employing Stable Diffusion 1.5, the QR Code Monster ControlNet, and a QR code generated via QR Toolkit.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/uAvAZFG9sWY"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Adjusting parameters such as the ControlNet’s strength and start/end positions, along with the sampling process (e.g., 50 steps), I obtained a result that remains scannable and aligns with my input prompt: &lt;em&gt;“A beautiful landscape, blue sky, grass, flowers.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dkcvkdk6ubustifznx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dkcvkdk6ubustifznx0.png" alt="A beautiful landscape, blue sky, grass, flowers"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This demonstrates how Stable Diffusion combined with ControlNet preserved the original pattern while injecting desired visual elements. Using QR Toolkit’s comparison feature, we can assess the QR code’s readability by examining the difference markers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0u5qapa6debvb97b2shf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0u5qapa6debvb97b2shf.png" alt="QR Toolkit - Comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we can modify the prompt to produce multiple variants of our QR code. For example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt1fl294wsircszdkaur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt1fl294wsircszdkaur.png" alt="Different Prompts"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While changing the overall style is straightforward (first example), embedding specific content within the QR code remains more challenging than with traditional tools (second example). To explore this further, we'll examine two axes separately: Style and Content, before combining them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Customizing Style
&lt;/h2&gt;

&lt;p&gt;Enhancing the prompt allows for more precise control over the QR code’s aesthetic. For instance, leveraging a large language model (LLM) to generate detailed prompts:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“A pattern forged from molten lava, glowing with an intense fiery orange and red hue. Cracks in the surface reveal volcanic heat, with small embers rising around it.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhm5r4e6g34iujqg1fqt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhm5r4e6g34iujqg1fqt.png" alt="A pattern forged from molten lava"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Similarly, for a more intricate and mystical style:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“An elegant, glowing elven door adorned with intricate, nature-inspired patterns and shimmering silver runes. Delicate vines and luminescent flowers intertwine with the carvings, pulsating with soft emerald and sapphire light. The archway, crafted from ethereal white stone, radiates a mystical aura, with faint golden mist swirling at its base, hinting at an ancient portal to a hidden realm.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3yau7ofwv4o3l7ndn48v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3yau7ofwv4o3l7ndn48v.png" alt="An elegant, glowing elven door"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Predefined styles can also be injected into prompts using the &lt;a href="https://github.com/MohammadAboulEla/ComfyUI-iTools" rel="noopener noreferrer"&gt;iTools Prompt Styler Extra&lt;/a&gt; node in ComfyUI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tanzmn396uo87bmm10j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tanzmn396uo87bmm10j.png" alt="iTools"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This node offers reusable prompts categorized by various artistic styles: 3D, Art, Craft, Design, Drawing, Illustration, Painting, Sculpture, Vector, and more. Incorporating it into our workflow makes testing different styles effortless without altering other parameters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweute0t5tv1pv5cs9qwb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweute0t5tv1pv5cs9qwb.png" alt="iTools Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below are examples of QR codes generated with different styles:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfx1jkmc3aace0acwq81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfx1jkmc3aace0acwq81.png" alt="iTools Examples"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Additionally, combining styles with custom prompts allows for highly personalized designs, enabling limitless customization of your QR codes’ appearance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Embedding Content
&lt;/h2&gt;

&lt;p&gt;Having mastered style adjustments, the next step is to embed specific generated content into QR codes. For example, I wish to insert an image of a yoga pose. If you’ve read my previous articles on AI image generation, you’ll understand the transfer of poses through workflows. Details are available &lt;a href="https://dev.to/worldlinetech/the-yoga-of-image-generation-part-2-42c"&gt;here&lt;/a&gt; for further reference.&lt;/p&gt;

&lt;p&gt;We’ll start with an abstract image of the target pose, add Depth and Canny Edge ControlNets to our workflow, and specify in the prompt: &lt;em&gt;“man, mixed race, short curly hair, black hair, 40 years old, white T-shirt, black yoga pants, short sleeves, smiling, viewing glasses, white background, barefoot.”&lt;/em&gt; Essentially, I aim to generate an image resembling myself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gab0lxpdd4znj8n0tpl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gab0lxpdd4znj8n0tpl.png" alt="Pose Transfer Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To ensure a realistic likeness, additional steps include incorporating the FaceID IP Adapter and the FaceDetailer post-processing model into the workflow. Refer to this &lt;a href="https://dev.to/worldlinetech/the-yoga-of-image-generation-part-3-5517"&gt;article&lt;/a&gt; for comprehensive guidance on implementing face transfer. The outcome preserves scannability and creates a QR code embedding the desired pose and identity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fom9y6qblxw6npixsrl4i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fom9y6qblxw6npixsrl4i.png" alt="Pose and Face Transfer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using QR Toolkit again, the comparison displays about 26 mismatch nodes, primarily around the facial features and body.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frg9a0ozszeet7h0huzee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frg9a0ozszeet7h0huzee.png" alt="QR Toolkit Comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Integrating Style and Content
&lt;/h2&gt;

&lt;p&gt;All previous steps can be combined by adding the iTools node to the final workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xi18981onenilcvxojs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xi18981onenilcvxojs.png" alt="Combined Example Outputs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Making the QR Code Animate
&lt;/h2&gt;

&lt;p&gt;Given that I can embed a face into the QR code, I can also animate facial expressions using specialized nodes. The &lt;a href="https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait" rel="noopener noreferrer"&gt;Advanced Live Portrait&lt;/a&gt; tool is designed for editing, inserting, and animating facial expressions in images. By inputting our generated QR code, we can animate my face to produce a smiling expression or nodding motion.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fft384rduzqeufd8f0lnc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fft384rduzqeufd8f0lnc.png" alt="Advanced Live Portrait"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The resulting animation can be exported as an animated GIF or video:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpes98vmdcx08j3w5uts.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpes98vmdcx08j3w5uts.gif" alt="Animated GIF"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This short tutorial has demonstrated how to significantly enhance both the stylistic and content-related aspects of a QR code. You are now equipped to craft engaging, customized QR codes that align with your personal or branding style. &lt;/p&gt;

&lt;p&gt;The only limits are your patience and imagination, so have fun experimenting!&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/STYLfK_xeEo"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>stablediffusion</category>
      <category>qrcode</category>
      <category>genai</category>
    </item>
    <item>
      <title>The Yoga of Image Generation – Part 3</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Mon, 19 May 2025 14:16:11 +0000</pubDate>
      <link>https://dev.to/worldlinetech/the-yoga-of-image-generation-part-3-5517</link>
      <guid>https://dev.to/worldlinetech/the-yoga-of-image-generation-part-3-5517</guid>
      <description>&lt;p&gt;In the first two parts of this series, we explored Stable Diffusion, ComfyUI, and how to build Text-to-Image and Image-to-Image workflows to generate images of Yoga poses. With the help of ControlNets, we learned how to transfer a pose from an abstract reference image to our final generated image.&lt;/p&gt;

&lt;p&gt;A Yoga sequence consists of several connected poses, which means we need visual consistency across all generated images in the sequence. This consistency must first cover the &lt;em&gt;style&lt;/em&gt; which we addressed in the previous part of the series but also the &lt;em&gt;facial features&lt;/em&gt; of the person depicted.&lt;/p&gt;

&lt;h2&gt;
  
  
  LoRAs (Low-Rank Adapters)
&lt;/h2&gt;

&lt;p&gt;Let’s now introduce a new component into our workflow to tackle this challenge: Low-Rank Adapters (LoRAs). LoRAs make slight adaptations to the base model they are trained on by modifying only a small subset of neural network parameters. This is a highly efficient technique, as it enables faster training, smaller file sizes, and lower memory usage. You can think of a LoRA as a patch applied at runtime to the base model. Multiple LoRAs can be chained together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9r13kwq1q4qn21y64fwg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9r13kwq1q4qn21y64fwg.png" alt="LoRA nodes in ComfyUI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LoRAs are typically used to specialize an existing model with certain image features such as style, poses, concepts, or characters. They are triggered in prompts using specific keywords defined by the LoRA creator during training. The community offers numerous LoRAs available for download from sites like civitai.com, which can be integrated into your local ComfyUI workflows.&lt;/p&gt;

&lt;p&gt;Here are two examples of images generated using a "Pencil drawing" LoRA, with two different keywords and all other parameters unchanged:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjg4x7on42hv6w2zk1lkl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjg4x7on42hv6w2zk1lkl.png" alt="LoRA for Style"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The community also offers countless LoRAs for generating images resembling celebrities. Let’s try using some of these to achieve facial consistency. We’ll start by testing Celebrity LoRAs with very light pose transfer (ControlNet strength set to 10%) to see how closely the generated faces match.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc8k8p6y5izmqphabpz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc8k8p6y5izmqphabpz7.png" alt="Testing a Celebrity LoRA"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Promising results! Note that the poses aren’t identical across images, this is due to the low ControlNet strength we used.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrh55yd5mcbidrr6958g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrh55yd5mcbidrr6958g.png" alt="LoRA for Celebrities"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, let’s incorporate these LoRAs into our previous pose generation workflow. I stacked two LoRAs: one for facial identity and another for a graphite drawing style. I also kept the two ControlNets we introduced earlier for pose transfer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd425jemmxfdrqlq8j3n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd425jemmxfdrqlq8j3n.png" alt="Workflow with LoRAs and ControlNets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this setup, we can generate sequences that are consistent in both style and facial identity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8g6tei2gi419rx4a80v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8g6tei2gi419rx4a80v.png" alt="Sequence with LoRAs and ControlNets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, we can change the celebrity reference or even chain multiple LoRAs together, adjusting their strengths to blend features of different identities. However, using public figures still feels a bit uncomfortable, potentially raising ethical concerns around deepfakes.&lt;/p&gt;

&lt;p&gt;A better approach is to create your own LoRA, avoiding such issues. So I decided to train a LoRA using images of my wife. I first experimented with the DreamBooth method, using a &lt;a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/SDXL_DreamBooth_LoRA_.ipynb" rel="noopener noreferrer"&gt;Colab Notebook&lt;/a&gt; and Google GPUs. I trained the model on 28 images of her, using an SDXL base model, over 2 epochs, taking around 1.5 hours.&lt;/p&gt;

&lt;p&gt;The results were... promising 😉&lt;br&gt;
Here are some of the best images generated with my first custom LoRA:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52ee92h7esq473587ncz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52ee92h7esq473587ncz.png" alt="Using my very first LoRA (Dreambooth)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The resemblance is there, but not quite enough, and the image quality was lacking. So I tried again, this time training the LoRA locally on my PC using the &lt;a href="https://github.com/bmaltais/kohya_ss" rel="noopener noreferrer"&gt;Kohya_ss&lt;/a&gt; open source tool. I selected the PowerPuffMix model (a fine-tuned of SDXL), trained on just 15 images but for 20 epochs. The process took about 3.5 hours and yielded better results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nbc304ijk0nnduhxpnm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nbc304ijk0nnduhxpnm.png" alt="Using my second LoRA (Kohya_ss)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This time, both image quality and facial identity were strong enough to integrate into our generation workflow.&lt;/p&gt;

&lt;p&gt;Here are some outputs using the new LoRA. While the face doesn’t perfectly resemble my wife (likely due to the influence of ControlNets) the identity consistency we needed is clearly present.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls40mbheeq3l4pxbhwww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls40mbheeq3l4pxbhwww.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The lighting is still a bit unstable, and overall image quality remains imperfect. I could improve this by training on more images and increasing the number of epochs. However, the final LoRA is still fundamentally linked to the base model and can't be applied to another one.&lt;/p&gt;
&lt;h2&gt;
  
  
  Image Prompt Adapters (IP Adapters)
&lt;/h2&gt;

&lt;p&gt;Let’s now try another technique: Image Prompt Adaptation, which is more decoupled from the base model. It functions similarly to a ControlNet but alters the model directly. Think of an IP Adapter as a one-image LoRA.&lt;/p&gt;

&lt;p&gt;The FaceID IP Adapter, specialized in facial recognition and feature extraction, is a perfect fit for our needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrafai3wpql1n74jnqw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrafai3wpql1n74jnqw9.png" alt="FaceDetailer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While exploring facial enhancement tools, I also discovered FaceDetailer, which improves facial features (eyes, nose, lips, expression) after image generation. I decided to integrate both of these components into our workflow. FaceDetailer’s enhancements are based on the FaceID input, so they remain faithful to the original facial reference.&lt;/p&gt;

&lt;p&gt;Here is the complete workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtwwnt7mbvvs0ycnm0no.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtwwnt7mbvvs0ycnm0no.png" alt="Final workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We now finally achieve our desired outcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Control over &lt;em&gt;style&lt;/em&gt; via prompts and embeddings&lt;/li&gt;
&lt;li&gt;Control over &lt;em&gt;pose&lt;/em&gt; via ControlNets&lt;/li&gt;
&lt;li&gt;Control over &lt;em&gt;identity&lt;/em&gt; via the FaceID IP Adapter and FaceDetailer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup allows us to generate precise and coherent Yoga sequences.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fon7rt9aqx4fy76u9jzt2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fon7rt9aqx4fy76u9jzt2.png" alt="Sequence with FaceID, FaceDetailer and ControlNets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another advantage of this workflow is how easily we can switch the base model. For instance, here’s an example using the &lt;a href="https://civitai.com/models/198051/cheyenne" rel="noopener noreferrer"&gt;Cheyenne&lt;/a&gt; model, which specializes in cartoon and graphic novel styles:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczdzm232i2ij47gr9ezm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczdzm232i2ij47gr9ezm.png" alt="Changing the Base Model"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s also incredibly easy to change the subject’s identity. Since FaceID only requires a single image and no training phase, here are examples generated with the exact same workflow, using my own face as input for facial identity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbs82a97l89pvqkkvsjh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbs82a97l89pvqkkvsjh.png" alt="Changing the Persona"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;This concludes our three-part series. My initial goal — generating accurate yoga poses and full sequences using only a local machine — has been achieved. &lt;/p&gt;

&lt;p&gt;In Part 1, we introduced Stable Diffusion and ComfyUI to build simple Text-to-Image workflows using prompts and embeddings. In Part 2, we explored pose transfer using Image-to-Image workflows and ControlNets. In this final installment, we addressed facial consistency, first with LoRAs, then with the FaceID IP Adapter and the post-processing FaceDetailer.&lt;/p&gt;

&lt;p&gt;You’re now ready to create custom workflows tailored to your specific visual goals. Enjoy experimenting with generative AI to express your creativity with precision!&lt;/p&gt;

&lt;p&gt;Stay tuned for more image generation tutorials and in the meantime, feel free to explore my YouTube channel for more content.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/9QRz5cKQCUg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>stablediffusion</category>
      <category>lora</category>
      <category>ipadapter</category>
    </item>
  </channel>
</rss>
