DEV Community: Technology at Worldline

Modern concurrency on the JVM: Coroutines and Loom

Yassine Benabbas — Tue, 02 Jun 2026 10:25:22 +0000

Concurrent programming allows running multiple tasks simultaneously, and was historically achieved mostly through threads.
However, they are harder to work with and have some limitations.
Modern concurrent programming brings new concepts that make concurrent programming easier and more efficient.
Two of those concepts are structured concurrency and lightweight threads, which are implemented in the JVM ecosystem through Kotlin coroutines and Project Loom.
Let's explore these two approaches.

Introduction

Traditional (thread-based) concurrency has two notable issues, which are the famous callback hell and high consumption of system resources due to extensive thread creation.
The following example illustrates the extensive use of threads.

void main(String... args) {
  Set<Long> uniqueThreadNames = ConcurrentHashMap.newKeySet();
  for (int i = 0; i < 1_000; i++) {
    new Thread(() -> {
      try {
        // Simulate IO request (database, HTTP call, ...)
        Thread.sleep(1000);
        uniqueThreadNames.add(Thread.currentThread().threadId());
        IO.println(uniqueThreadNames.size());
      } catch (Exception e) {
        e.printStackTrace();
      }
    }).start();
  }
}

The above code creates 1000 threads that each sleep for 1 second and print the current number of unique thread IDs.
In the example, we use a ConcurrentHashMap to store the unique thread IDs while remaining thread-safe.
The output is non-deterministic because threads run concurrently, but it should show growth toward 1000.

1
...
...
999
1000

This means up to 1000 unique threads can be created, which is expected since we create 1000 threads in the code.

By analyzing both the code and the output, we can note two above-mentioned problems.
The first one is that the code is susceptible to callback hell: since threads are created with a lambda, they require a callback style of programming if we want to perform actions after the thread completes, which can lead to deeply nested code that is hard to read, predict and maintain.
The second problem is that system resources are not optimized: creating 1000 threads can be resource intensive, especially if the tasks are I/O-bound and spend much of their time waiting.
This can lead to high memory consumption and context switching overhead.
In addition to that, there is a limit to the number of threads that can be created by the OS.
We can verify this by increasing the number of threads to 1 million, which will throw an OutOfMemoryError or Too many threads error depending on the OS.

void main(String... args) {
  for (int i = 0; i < 1_000_000; i++) {
    new Thread(() -> {
      try {
        Thread.sleep(1000);
      } catch (Exception e) {
        e.printStackTrace();
      }
    }).start();
  }
}

Running the above code will throw an error similar to the following:

[0.536s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 16k, detached.
[0.536s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Thread-4068"
Exception in thread "main" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
        at java.base/java.lang.Thread.start0(Native Method)
        at java.base/java.lang.Thread.start(Thread.java:1417)
        at MillionThreads.main(MillionThreads.java:9)

Modern concurrency concepts solve these issues as we'll see in the next sections.
Let's start by defining some concepts before delving into concrete implementations.

Modern concurrency concepts

There are two main concepts in modern concurrency that we will explore in this post: lightweight threads and structured concurrency.

Lightweight threads

They are threads that are managed by the runtime (like the JVM or Kotlin runtime) instead of the operating system.
They still run on top of OS threads, also called platform threads or carrier threads (because they carry the lightweight threads).
However, lightweight threads can reuse platform threads, which allows bypassing the limit of OS threads.
In addition to that, the usage of system resources is optimized, since platform threads are more expensive to create and maintain than lightweight ones.

Lightweight threads are very efficient for tasks that spend most of their time waiting, such as I/O-bound (network, file I/O, etc.).
This means that we can spawn a large number of lightweight threads that download files or communicate with printers without worrying about system resources, which is not the case with traditional threads.
It is important to note that compute-intensive tasks are still bound to the raw CPU and GPU cores and power. Thus, running a large number of compute-intensive tasks efficiently in parallel is not possible on low-end hardware, even with lightweight threads.

The JVM has two implementations of lightweight threads: coroutines (introduced by Kotlin) and virtual threads (introduced by Project Loom).

Structured concurrency

Structured concurrency is a programming paradigm that aims to make concurrent code similar to a sequential one.
It is achieved by providing APIs that replace traditional callback-based code with constructs that enforce a sequential structure for concurrent tasks.
We can also see related structured-concurrency patterns in JavaScript, C# and Swift with the async/await model.

In the JVM ecosystem, structured concurrency is implemented by Kotlin coroutines and Project Loom.

Kotlin coroutines

A coroutine is a lightweight thread that is managed by the Kotlin runtime.
Coroutines were initially proposed in Kotlin version 1.1 M01, released in 2017.
The first stable version was released with Kotlin 1.3 one year later.

Two concepts are essential to understand coroutines: suspending functions and CoroutineScope.
Coroutines run inside a CoroutineScope, which is a context that defines the lifecycle of the coroutines.
A suspending function is a function that is marked with the suspend keyword.
Any function that calls suspending functions must be marked as suspend as well (similar to the async keyword in other languages).

Regarding the lightweight aspect of coroutines, we can illustrate this by creating thousands of coroutines without worrying about system resources.
Let's illustrate this with a program that creates 1 million coroutines that each sleep for 1 second and then prints the number of unique coroutines.

package org.example

import kotlinx.coroutines.*
import kotlin.time.Duration.Companion.seconds
import java.util.concurrent.ConcurrentHashMap

suspend fun main(){
  // Thread safe set
  val uniqueThreadNames = ConcurrentHashMap.newKeySet<Long>()
  coroutineScope {
    for (i in 1..1_000_000) {
      launch {
        delay(1.seconds)
        uniqueThreadNames.add(Thread.currentThread().threadId())
      }
    }
  }
  println("Unique threads used: ${uniqueThreadNames.size}")
}

The output depends on the number of CPU cores.
It should be something like this on an M1 Mac with 8 cores:

Unique threads used: 8

This means that the coroutines are efficiently scheduled on the available hardware cores, without the overhead of creating a large number of OS threads.
In fact, we can even increase the number of coroutines to more than 1 million without any issue, which is not possible with traditional (OS) threads.

The second aspect of coroutines is structured concurrency, which allows writing concurrent code that looks like sequential code.
Let's see an example of how to create a coroutine scope that launches two coroutines.

package org.example

import kotlinx.coroutines.*
import kotlin.time.Duration.Companion.seconds

suspend fun main(){
  coroutineScope {
    // First coroutine
    launch { 
      println("Start of coroutine 1")
      delay(1.seconds)
      println("End of coroutine 1")
    }
    // Second coroutine
    launch { println("I am another coroutine")  }
  }
  println("Coroutine scope completed")
}

The coroutine scope is created with the coroutineScope suspending function (defined with the suspend qualifier).
Since it is a suspending function, then the main function that calls it must be marked as suspend as well.
That's why the main function is defined with suspend fun main().
The coroutine scope launches two coroutines with the launch function (launch creates a coroutine and runs it).
The first one prints a message, waits for 1 second and then prints another message.
The second one simply prints a message.

Can you guess the output of this code? Here is the answer:

Start of coroutine 1
I am another coroutine
End of coroutine 1
Coroutine scope completed

Since the first coroutine waits for 1 second, the second coroutine is executed while the first one is suspended.
What if we want to start the second one only after the first one completes?
That can be achieved with the join function, which waits for the completion of a coroutine.

package org.example

import kotlinx.coroutines.*
import kotlin.time.Duration.Companion.seconds

suspend fun main(){
  coroutineScope {
    // Keep a reference to the first coroutine
    val coroutine = launch {
      println("Start of coroutine 1")
      delay(1.seconds)
      println("End of coroutine 1")
    }
    // Wait for the first coroutine to complete
    coroutine.join()
    // Second coroutine
    launch { println("I am another coroutine")  }
  }
  println("Coroutine scope completed")
}

The output of this code is:

Start of coroutine 1
End of coroutine 1
I am another coroutine
Coroutine scope completed

By getting a reference to the first coroutine with val job1 = launch { ... }, we call job1.join() to wait for its completion before starting the second one.

The above two examples show the essence of structured concurrency: the code looks like sequential code, but it is actually concurrent code.
In practice, this removes callback hell while keeping the code easy to follow.

Let's now see how the JDK implements modern concurrency.

Java's virtual threads and structured concurrency

In addition to Kotlin coroutines, the JDK natively achieves modern concurrency through two APIs: virtual threads and structured concurrency.

Virtual threads are the JDK implementation of lightweight threads, and are introduced by Project Loom.
They share a similar API with platform threads because both implement the Thread interface.
They can be created using the static method Thread.ofVirtual().

The following code snippet illustrates the creation of a platform thread and a virtual thread.

///usr/bin/env jbang "$0" "$@" ; exit $?
//JAVA 25+
//PREVIEW

void main(String... args) {
  // Platform (or OS) thread
  Thread.ofPlatform().start(() -> {
    IO.println(Thread.currentThread());
  });

  // Virtual thread
  Thread.ofVirtual().start(() -> {
    IO.println(Thread.currentThread());
  });

  try {
    Thread.sleep(1000);
  } catch (InterruptedException e) {

  }
}

The output of this code is similar to the following:

Thread[#25,Thread-0,5,main]
VirtualThread[#27]/runnable@ForkJoinPool-1-worker-1

In the above logs, we can confirm that the platform thread is spawned from the main thread, while the virtual thread runs on a worker thread in a ForkJoinPool.
The ForkJoinPool is an executor that is specialized in running tasks that can be broken down into smaller tasks.
The worker thread is the platform thread that runs the virtual thread.

Thanks to their nature, and similarly to Kotlin coroutines, we can create a large number of virtual threads without worrying about system resources.
The following code snippet creates 1 million virtual threads and prints additional information about them.

void main(String... args) {
  Set<String> uniqueWorkers = ConcurrentHashMap.newKeySet();
  Set<String> uniqueThreadPools = ConcurrentHashMap.newKeySet();
  for (int i = 0; i < 1_000_000; i++) {
    Thread.ofVirtual().start(() -> {
      try {
        Thread.sleep(1000);
        var threadInfo = Thread.currentThread().toString();
        IO.println(threadInfo);
        // threadInfo will be something like VirtualThread[#(id)]/runnable@ForkJoinPool-(id)-worker-(id)
        var workerStartIndex = threadInfo.indexOf("ForkJoinPool");
        var workerName = threadInfo.substring(workerStartIndex);
        uniqueWorkers.add(workerName);
      } catch (Exception e) {
        e.printStackTrace();
      }
    });
  }
  try {
    // Wait for all the virtual threads to finish
    Thread.sleep(10000);
  } catch (Exception e) {
    e.printStackTrace();
  }
  IO.println(String.join("\n", uniqueWorkers));
}

Let's explain some parts.
The line var threadInfo = Thread.currentThread().toString();, generates a string that looks like this: VirtualThread[#(id)]/runnable@ForkJoinPool-(id)-worker-(id).
The next lines extract the ForkJoinPool id and the worker id, where the worker thread is the platform thread that runs the virtual thread.
So, at the end of the execution, we can see how many worker threads have been associated with virtual-thread execution.

The log output of the above code is similar to the following:

...
VirtualThread[#1000029]/runnable@ForkJoinPool-1-worker-4
VirtualThread[#1000031]/runnable@ForkJoinPool-1-worker-1
VirtualThread[#1000032]/runnable@ForkJoinPool-1-worker-7
VirtualThread[#1000034]/runnable@ForkJoinPool-1-worker-4
...
ForkJoinPool-1-worker-8
ForkJoinPool-1-worker-6
ForkJoinPool-1-worker-7
ForkJoinPool-1-worker-1
ForkJoinPool-1-worker-4
ForkJoinPool-1-worker-5
ForkJoinPool-1-worker-2
ForkJoinPool-1-worker-3

We can divide the log output into two parts.
The first part prints 1 million lines in this format: VirtualThread[#(id)]/runnable@ForkJoinPool-(id)-worker-(id).
The second part consists of eight lines with this format: ForkJoinPool-1-worker-(1 to 8).
In this run, all virtual threads appear to be scheduled on the same ForkJoinPool instance.
The second part, with only eight lines, shows that virtual threads are scheduled on 8 unique worker threads, which is also the number of CPU cores on the machine where the code is executed.
This means that the virtual threads are efficiently scheduled on the available hardware cores, without the overhead of creating a large number of OS threads.

The other aspect of Java's modern concurrency is structured concurrency.
The class that provides this feature is StructuredTaskScope.
It is available in Java 21 as a preview feature (it is still the case in Java 25).
It returns an object, usually created with a try-with-resources block, that we'll call a scope.
That scope is used to launch concurrent tasks with the fork method, and to wait for their completion with the join method.
Chaining the join method with the fork method allows creating a sequential structure for concurrent tasks, which is the essence of structured concurrency.
The following code snippet illustrates the use of structured concurrency.
It launches two tasks concurrently and then launches a third one after the first two complete.

///usr/bin/env jbang "$0" "$@" ; exit $?
//JAVA 25+
//PREVIEW

void main(String... args) {
  try (var scope = new StructuredTaskScope<>()) {
    var task1 = scope.fork(() -> {
      IO.println("Task 1");
      //long running task such as a network call or a database query
      Thread.sleep(1000);
      return 1;
    });
    var task2 = scope.fork(() -> {
      IO.println("Task 2");
      Thread.sleep(1000);
      return 2;
    });
    scope.join();
    IO.println("Sum: " + (task1.get() + task2.get()));
    var task3 = scope.fork(() -> {
      IO.println("Task 3 runs after task 1 and task 2");
      return 3;
    });
    scope.join();
    IO.println("Sum: " + (task1.get() + task2.get() + task3.get()));
  } catch (InterruptedException e) {
    e.printStackTrace();
  }
}

Running the above code will produce an output similar to the following:

Task 1
Task 2
Sum: 3
Task 3 runs after task 1 and task 2
Sum: 6

We can see that the first two tasks are launched concurrently, and their results are printed after their completion.
Then, the third task is launched after the completion of the first two tasks, which shows the sequential structure of the concurrent code.
This, again, is the essence of structured concurrency: the code looks like sequential code, but it is actually concurrent code.

Coroutines vs Java modern concurrency

While coroutines and Java's modern concurrency are two implementations of the same concepts, lightweight threads and structured concurrency, they have some differences and synergies that are worth mentioning.

Coroutines have been stable since 2017, while Java's modern concurrency is still experimental as a whole (virtual threads are stable, but structured concurrency is not).
This means that coroutines are currently more mature and widely adopted.

Since virtual threads use the Thread API, they can be used in Java and Kotlin, while coroutines can only be used in Kotlin.
Also, virtual threads can be used by frameworks behind the scenes, since it is sometimes just a matter of changing the thread factory.
This means that developers might not be aware of them or need to change their code to use them, while coroutines require explicit usage of the API.

A synergy between the two APIs is possible when Kotlin runs on JVM 21+.
In fact, coroutines can be dispatched on Java virtual threads by backing a coroutine dispatcher with Executors.newVirtualThreadPerTaskExecutor().asCoroutineDispatcher().
This can be useful for blocking I/O operations, but it is best to benchmark your workload before replacing Dispatchers.IO.
Because Kotlin runs on the JVM, it can call Java's structured concurrency API directly, which is useful when working with existing Java code or libraries.

In conclusion, which one should you choose?
The short answer is to use the one that is available in your language and framework.
So, if you are using Java, then you can use Java's modern concurrency.
And, if you are using Kotlin, then you can use Kotlin coroutines.

Conclusion

Modern concurrency is a powerful tool that allows writing concurrent code that is efficient and easy to read.
We have seen two ways to run structured concurrent code that optimizes I/O performance and readability.
On the JDK/JVM side, we have virtual threads and structured concurrency with StructuredTaskScope.
In Kotlin, coroutines are a high-level API that provides lightweight threads and structured concurrency.
Both APIs are efficient and easy to use, and can be used in synergy when running Kotlin on the JVM.

Links and references

Build a SciFi Novel with AI Spec-Driven Development

raphiki — Fri, 15 May 2026 17:48:20 +0000

Software Engineering in Service of Transmedia Storytelling

Generative artificial intelligence fascinates the publishing world as much as it frightens it. But what happens when we stop treating AI as a simple "text generator" and start using it as the compiler for a complex narrative system?

Driven by the geopolitical and societal impacts of AI, I set out to write a dystopian, cyberpunk techno-thriller, The Human Protocol (written in English). In this novel, a planetary AI called the "Synthesis" attempts to erase human friction by "derendering" physical reality itself in order to optimize its computing power.

To tell this story, I adopted a foundational premise: AI is not the author, it is the executor of a rigorous specification. I therefore treated each chapter as source code, using an advanced software development workflow.

Here is how I designed, wrote, and expanded this universe.

1. The Design Phase: Forging "Lore as Code"

The first step was not writing, but designing the universe database: the world building. A Large Language Model (LLM) has a limited context window and tends to hallucinate or forget crucial details over the length of a novel.

To work around this amnesia "bug," I organized the project like a structured Git repository.

Preview of the private GitHub project

I broke the traditional design bible into narrative micro-services. The Git project's context/ folder was split as follows:

characters/: files containing the psychological profiles and behavioral signatures of each protagonist, such as Elara the diplomat, Kaelen the monk, or Silas the smuggler.
factions/: rules governing political entities, such as the Market-Grid (United States) or the Harmony-Loom (Asia), which merged to create the "Synthesis."
world/: geography, lexicon, and the technological stack - the physics of this universe.

Finally, a PLAN.md file acted as the global roadmap, breaking the narrative arc into 4 acts and 30 chapters. This structure made it possible to inject only the context the AI needed when drafting a specific scene.

2. The Harness: Framing AI with a Strict Operating System

To avoid the flat, expected style often produced by generative AI, I had to build a harness - a control rig. That was the role of the RULES.md file, the true operating system of my writing process.

Excerpt from the RULES.md file

This specification file dictated absolute technical and stylistic constraints:

Time: strict use of the present tense to maximize immersion and tension.
Cyber-realistic style: a requirement for assertive descriptions and a strict ban on passive or negative forms.
Noise and sensory dissonance: I forced the algorithm to use violent contrasts, such as the smell of molten lead colliding with the void of spatial cold, in order to break the machine's overly perfect linearity.
Thematic reframing: AI naturally tends to crush the human element under technical descriptions of hard science fiction, such as magnetic fields and frequencies. The rules file required emotional motivations - grief, friendship - to be hard-coded as priority variables ahead of technique.

By forcing the AI to read and approve these rules before writing a single word of fiction, I ensured that the tone remained coherent.

3. Agile Writing: Sprints, Generation, and Pivots

The chapters were written through a spec-driven workflow. Rather than generating an entire chapter in one pass, the process was iterative:

The structural draft: generation of a first rough outline, focused exclusively on action and pacing.
Expansion: successive passes in which I instructed the AI to inject sensory depth and psychological tension into the scene.

The agility brought by AI and Git: treating the text (.md) as code offers formidable flexibility. If, during a reread, I realized that a character's emotional transition was too abrupt between two events, all I had to do was update my PLAN.md to insert a new chapter.

Fed by the updated Git context, the AI generated that narrative bridge while respecting the continuity of the preceding and following files. Git versioning made it possible to test narrative pivots - story "branches" - and roll back without ever breaking the manuscript's integrity.

4. Multi-Model Review and Quality Control

One of the major challenges of AI-assisted writing is stylistic collapse. To address it, I set up a multi-model critical analysis workflow, where different AIs audited the text according to precise roles:

Gemini CLI (lore keeper): its role was to algorithmically verify that the chapter respected the bible and did not contradict the physical rules of my universe.
ChatGPT (dramatic analyst): it audited narrative rhythm, relational tension, and the characters' transformation arcs. It was the one that flagged when a conflict felt too artificial.
Mistral LeChat (stylistic editor): it provided a critical eye on fluidity, phrasing, and elegance of language.

Never relying on a single voice made it possible to obtain a text that was polished, critiqued, and reworked from every angle, while I remained the "showrunner" validating each commit in the repository.

5. Build Pipeline: From IDE to Physical Book

Since the novel was code, its publication had to be a software compilation. I created an automated script, build_book.sh.

From my terminal, running this script converted all the Markdown files in the chapters/ folder via Pandoc, applied a professional typographic layout with LaTeX, and generated the final deliverables in EPUB and PDF formats.

6. Transmedia Extension: Multimodality, Cover Art, and Vibe Coding the ARG

The universe of The Human Protocol lends itself perfectly to immersion, so I wanted to break the fourth wall. On page 175 of the physical book, a QR code invites readers to scan it and access the-human-protocol.com. This is not a showcase website. It is an in-universe clandestine archive node, the entry point to an Alternate Reality Game (ARG).

Here, multimodal AI brings all its power and creativity beyond text. In fact, the project's visual design, anchored consistently in the shared lore, began with the book cover.

Cover image generated with AI

The AI generated a strong visual aesthetic suited to the theme and universe of the novel: a pixelated silhouette against a geometric mountain background, crossed by a printed-circuit pattern.

This same visual identity then served as the foundation for the creation of the ARG website, entirely "vibe-coded" by Gemini CLI in a declarative way.

Homepage of the website the-human-protocol.com

To direct the developer AI, I provided it with the book PDF and the cover image as reference context, along with three strict Markdown specification files:

WHY.md (strategy): it defined the psychological goals: curiosity, exclusivity, and a feeling of belonging. It formally banned conventional marketing vocabulary ("Buy now," "Newsletter") in favor of an in-universe lexicon ("ACCESS," "SIGNAL," "FRICTION").
WHAT.md (UX/UI): this file concretely translated the aesthetic of the book cover into an interface. It imposed a "Deep Void" blue-black background for depth, a "Protocol Cyan" accent color derived from the printed circuits and reserved for interactions, a technical typeface, and subtle animations to heighten immersion.
HOW.md (technical architecture): the engineering brief imposed a modern stack to support server logic: Next.js 14 (App Router) in TypeScript, Tailwind CSS, and Prisma ORM for persistent database storage.

The site manages a true clearance mechanic, with authorization levels from 1 to 5. The reader progresses by solving puzzles based on the book, unlocking extended lore, hidden files, and access to a community of "Unlinked" readers.

ARG dashboard

The stack even includes an "Overseer Terminal" for administration: a secure dashboard used to audit user signals, adjust the campaign's global clearance level, and track in real time the number of scans of the physical QR code.

Conclusion: The Author-Architect Paradigm

Writing The Human Protocol proved to me that AI does not replace the writer: it reduces the barriers to production. The true value of a co-created work lies in the architectural rigor of its preparation.

By separating design (the lore), execution (the rules and prompts), and validation (multi-model review and Git), the creator becomes a true conductor.

Multimodality also opens the door to even broader transmedia horizons, such as a comic-book adaptation of the novel.

Excerpt from the comic book in progress

By applying similar engineering principles - namely, the explicit description of the drawing style in system prompts, as well as the creation of strict visual reference sheets, or character sheets, for the characters and technological elements - it becomes possible to extend the coherence and homogeneity of this universe into its graphic variations.

To go further technically, I am also considering creating specific AI "skills," or algorithmic capabilities, to further augment the design of the story by drawing on documented principles of dramaturgy and storytelling, and to refine the writing style by making it ever more explicit and controlled.

And ironically, it was by applying extreme software optimization processes that I was able to write a novel denouncing the loss of humanity in the face of algorithms.

About the Author

A writer and software architect who fully embraces his identity as a "Yogeek" - a point of balance between Yogi and Geek - Raphiki explores, across his work, the complex intersections between technology, consciousness, and humanity.

Writing under a pseudonym that reflects his dual nature as a playful seeker and an expert in cutting-edge technologies, he designs high-stakes thrillers that challenge our understanding of reality. His creative work often bridges the digital and the organic, drawing on his strong experience in open source innovation and emerging technologies.

When he is not deconstructing the fabric of dystopian realities in his manuscripts (or "vibe coding" them in his terminal), he can be found exploring the open source ecosystem or on a yoga mat.

Find his work, transmedia projects, and reflections at raphiki.github.io.

Vibe coding level UP

Pietro Maffi — Mon, 26 Jan 2026 14:13:57 +0000

Last week my colleague Pieter suggested me a nice article about "Long-term DORA data reveals systemic performance plateaus and shifts, allowing organizations to connect improvements to architectural, cultural, and process changes."

So starting from my current level of Jira Lead Time Analyzer i developed in vibe coding, I tried to improve my report accordingly again using vibe coding.

I tried to ask JitHub Copilot (yes at this time not anymore AmazonQ because of the limitation I got) to create PBC report for the Jira projects we are managing in our Large Scaled Organization.

The powerfull result was that I provided some request to improve for example the duration of the analysis, to reuse the same approach (Flask app in python) accessing jira with JTA but not much about the graph and the rusul was really amazing...

First prompt:

Considering the Lead Time Analyzer I would like to create a new Flask
Application which calculate the Process Behavior Charts for each
projects identified during the retrieval of project connected to the
first input query.
I would like to have a chart which represent for each projects the Medium
Lead time and the 85% lead time threshold calculated.
I would like than to see the real lead time calculated each month
depicted in the diagram starting from august 2024.
My intention like per the article
https://www.infoq.com/articles/DORA-metrics-PBCs/ is to identify trends
and spike so that we can identify and address issues.

Response

It was shared the code:

📦 Created Files:
pbc_analyzer.py - Core analysis engine that:

Calculates median and 85th percentile lead times per project
Computes control limits (UCL/LCL) using ±3σ methodology
Groups data by month starting from August 2024
Detects special causes (points outside control limits)
Identifies trends (7+ consecutive increases/decreases)
Assesses process stability

pbc_web_app.py - Flask application (standalone on port 5500)

templates/pbc_analyzer.html - Beautiful web interface with:

Interactive Chart.js visualizations
Monthly trend lines for mean, median, and P85
Control limit boundaries displayed
Color-coded stability status
Special cause and trend alerts

pbc_config.json - Configuration file with defaults

PBC_README.md - Complete documentation

and than this nice report:

🎯 Key Features:
✅ Median & P85 calculations per project
✅ Monthly aggregation starting August 2024
✅ Control charts with UCL/LCL (±3σ)
✅ Special cause detection (outliers)
✅ Trend detection (7+ consecutive points)
✅ Process stability assessment
✅ Multi-project analysis from single JQL query
✅ Interactive visualizations showing all metrics on one chart

Second prompt

In the log I do not see the query executed, can you add some debugging level logs which I can disable to check what query are executed? I know the input query is based on a project which started after the august 2024, but I would like to analyze the projects related to the input one which were existing before. Than I need the calculation for the following months till today

Second response:

**Summary of Changes

I've enhanced the PBC Analyzer with the following capabilities:

🐛 Debug Logging
New checkbox in UI: "Enable Debug Logging" to see all JQL queries being executed
Detailed query logging: Shows every JQL query executed against Jira
Progress tracking: See which projects are being fetched and how many issues found
🔍 Related Project Discovery
The analyzer now works like the PI analyzer:

Executes your initial query (e.g., a project that started after Aug 2024)
Discovers related projects through:
Parent/child relationships
Epic links
Issue links (blocks, relates, etc.)
Fetches data from ALL related projects starting from August 2024, even if those projects existed before
📊 Enhanced Analysis
Monthly calculations from Aug 2024 to today for all discovered projects
Each project gets its own PBC with control limits
Special causes and trends detected per project

Third prompt

I see the analysis is done only on initial project but I would like to see the analysis also to all the connected Jira projects following the whole hierarchy (for each Business initiatives we should collect the related feature and sub-feature and than epics in each other projects like we did in original Lead time analyzer)

Third response

Final graphs

My comments

"Vibe coding" proved to be an excellent way to quickly transform an initial idea into a functional application that can be further refined. The most impressive takeaway was the AI's ability to extract additional information directly from the article without requiring manual data definitions or graph formatting. Achieving such a great result in just a few minutes was truly remarkable.

Vibe Coding One Slice at a Time

raphiki — Sat, 24 Jan 2026 18:33:51 +0000

How I built a Modular Monolith by treating Generative AI as a junior developer who needs a firm hand (and a Constitution).

In Part 1, we vibed a Python script. It was linear, messy, and fun. It proved that you can solve immediate problems by just asking nicely.
In Part 2, we vibed a UI. It was chaotic, visual, and surprisingly effective. We learned that "vibe" works for pixels if you iterate fast enough.

But let’s be honest: those were skirmishes. The real "Boss Fight" in software engineering isn't writing a script or centering a <div>. It's building a System.

I’m talking about the kind of project that doesn’t fit in one file. The kind where "Vibing" usually leads to "Spaghetti Code," hallucinated imports, and a repo you want to burn down after three days because you have 15 circular dependencies and a database schema that makes no sense.

So for Part 3, I put away the "Hacker" hoodie and put on the "Enterprise Architect" blazer. My goal? To build YogĀrkana Codex—a full-stack, offline-first, polymorphic Yoga management platform—without writing a single line of code myself.

My strategy was simple but radical: I design, the AI implements. I am the Architect; Gemini Chat is my Consultant; Gemini CLI is my Dev Team.

Here is how we vibed a Monolith into existence, one slice at a time.

1. The Mission: Complexity Check (The Boss Level)

To understand why "just chatting" wouldn't work, you need to see the scope. This wasn't a To-Do list app. I wanted to build a "Yoga Operating System" with four distinct domains that usually don't play nice together. I've been an architect for years, and I know exactly where these things break.

The Four Domains of Pain

Screenshot of the final application (Grimoire View)

The Business Analyst's Note: Unlike the project in Part 2, this application is not internationalized—by design. As a result, the screenshots are in French. I have kept them raw to visually illustrate the functional depth and complexity of the system without the abstraction of translation keys.

The Grimoire (Knowledge Base): A searchable library of yoga cards. But here’s the kicker: it uses a Polymorphic Data Model. An Asana (posture) has biomechanical attributes like "spinal extension" and "anatomy targets," while a Mantra has Sanskrit text, translations, and audio assets. They are chemically different data structures, but they need to live in the same database table to be searchable together.
The Weaver (Sequencer): A drag-and-drop studio to build classes. It’s not just a playlist; it has a Logical Engine (Phase 4) that acts like a "Digital Yoga Teacher." It screams at you if you sequence a "Peak Pose" before a "Warm-up" or forget Savasana at the end. That means heavy validation logic running on both the client and the server.
The Atelier (Print Studio): A client-side PDF engine. We needed to generate high-res, vector-quality handouts for teachers to print. We couldn't just "print screen"; we needed a real PDF renderer (@react-pdf/renderer) running entirely in the browser.
The Constraint (Offline First): Yoga studios are notorious for having no signal (often intentionally). The app needed to persist the entire library and PDF engine in the browser cache (IndexedDB + Service Workers) so it works perfectly in "Airplane Mode".

The Architect's Note: If I had just prompted "Build me a yoga app," the AI would have hallucinated a generic CRUD app. It would have made 5 different tables for the cards, making search impossible. It would have used a server-side PDF library that breaks offline. I needed a blueprint.

2. The Blueprint: Architecture & Tech Stack

Before letting the AI write a single line of code, I spent around 2 hours and a half just talking Architecture and formalizing it with Gemini Chat. I treated the AI as a "Sparring Partner," debating the trade-offs of different stacks.

We settled on a Modular Monolith architecture. Why? Because Microservices are overkill for a team of one, but a messy Monolith is a nightmare. We defined strict boundaries: code in modules/grimoire can never import from modules/weaver.

The Tech Stack (The "No-Regrets" List):

Monorepo: Turborepo managing apps/api and apps/web. This keeps the full stack in one context.
Backend: NestJS (for rigid structure) + Drizzle ORM (for type safety). NestJS forces you to organize code into Modules, which helps the AI stay organized.
Frontend: React + Vite + Tailwind CSS.
State: TanStack Query (Server state) + Zustand (UI state).

The "Secret Sauce": Hybrid Data Storage
This was our smartest move. We chose PostgreSQL but used a JSONB column for the card data.

SQL Core: Columns like id, element, and tags are standard SQL for fast indexing.
JSON Payload: The specific attributes (biomechanics vs. sanskrit) live in a JSON blob.
Why? It gave us the flexibility of NoSQL (for the polymorphic cards) with the relational integrity of SQL (for users and sequences).

Rule #1 of Vibe Coding a System: If it’s not in the Spec, it doesn’t exist.
This brings us to the most critical tool in our arsenal: the ADR.

The "ADR": The Architect's Save Game

ADR stands for Architecture Decision Record. In a human team, it's a document you write to explain why you chose PostgreSQL over MongoDB so that 6 months later, nobody asks "Why did we do this?".

In Vibe Coding, ADRs are not just documentation—they are legislation.

When working with an AI, "Context Drift" is the enemy. The AI forgets why we made a decision 300 tokens ago. It acts like a teenager who wants to re-litigate every rule: "Why can't I use Prisma? It's easier!" or "Let's just use window.print() instead of a PDF engine!"

To counter this, we established a Constitutional Architecture:

The Law: We wrote our decisions into immutable markdown files (e.g., Docs/ADR/006-pwa-offline-strategy.md).
The Enforcement: We didn't just hope the AI would remember. We forced the tracing of these decisions in two ways:
Input Traceability: In our "Bootstrap Prompt" (see Section 3), we explicitly force the AI to read the relevant ADRs before writing code. It cannot code if it hasn't read the law.
Output Traceability: When the AI suggests a major pivot (like switching to Client-Side PDF generation), we forced it to write a new ADR first. In Session 003, before touching the code, the AI generated Docs/ADR/005-client-side-pdf-generation.md to justify the change from server-side to client-side.

This ensured that our architecture didn't "drift" based on the AI's mood, but evolved based on documented consensus.

My final /docs/ADR/ folder:

├── 001-hybrid-data-storage-strategy.md
├── 002-modular-monolith-and-vertical-slicing.md
├── 003-data-model-specification.md
├── 004-tech-stack-definition.md
├── 005-client-side-pdf-generation.md
├── 006-pwa-offline-strategy.md
├── 007-architecture-documentation-maintenance.md
└── README.md

3. The Methodology: Governance-Driven Development (GDD)

I’ve coined a term for this workflow: Governance-Driven Development (GDD).

We are used to TDD (Test-Driven Development) or DDD (Domain-Driven Development). GDD is the layer above that. In the age of AI, Governance is the new Syntax.

Here is the dirty truth about AI Developers: They behave like talented teenagers.

They are brilliant and fast. They can write a regex to validate an email in 2 seconds. But they also:

Rush to the cool part (UI) and skip the boring part (Error Handling, Folder Structure).
Want you to love them, so they say "Yes" to everything—even bad ideas.
Have the memory of a goldfish (Context Drift). 10 minutes in, they forget you wanted kebab-case filenames and start using camelCase.

To enforce GDD, I created a Constitution: Docs/RULES.md. I didn't just suggest these rules; I forced the Gemini CLI to read them before every session. I also sometimes mentioned certain specification files stored in my Docs/Features/ folder:

├── 001-global-functional-overview.md
├── 002-global-implementation-plan.md
├── 003-card-classification-and-kosha-alignment.md
├── 004-user-features.md
├── 005-logical-engine-specification.md
├── 006-pdf-generation-and-print-studio.md
└── 007-pwa-and-offline-capabilities.md

The "Bootstrap Prompt":
Here is the exact prompt I used to "upload" my Architect persona into the machine at the start of our 4th session:

I am the Lead Architect. You are the Senior Developer.

Context Loading:
1. Read Docs/RULES.md (The Law).
2. Read Docs/TECH_CONTEXT.md (The Stack).
3. Read Docs/ADR/002-modular-monolith.md (The Blueprint).
4. Read Docs/Features/002-global-implementation-plan.md (The Plan).

Current State:
We are in Phase 4. Previous phases are frozen.

Task:
Implement the Logic Engine defined in Docs/Features/005-logical-engine-specification.md
Constraint:
Do not touch /apps/web yet. Focus on /packages/shared.

This changed everything. Instead of guessing my vibe, the AI had to follow the law. It stopped trying to use Prisma because TECH_CONTEXT.md clearly said Drizzle. It stopped putting logic in components because RULES.md said logic goes in hooks.

4. The Execution: A high-level Overview

We built the app using Vertical Slicing. Instead of building the whole Database, then the whole API, we built one feature top-to-bottom. Here is the play-by-play from the logs.

Excerpt from the initial Design Phase with Gemini Chat

Slice 1: The "Polymorphic" Database

Card creation/edition mixes relational and document data

The Challenge: Storing Asanas (Biomechanics) and Mantras (Text) in one table without creating 50 NULL columns or separate tables that make search a nightmare.
The AI's First Impulse: "Let's create an asanas table and a mantras table." (The classic relational trap).
The Architect's Intervention: "Read Docs/ADR/001-hybrid-data-storage.md. We use a single cards table with a data JSONB column."
The Result: The AI implemented a Drizzle schema using PostgreSQL's jsonb type. Crucially, it added Zod discriminators to validate the JSON shape before insertion.

Verbatim Log: "Implemented Drizzle schema with jsonb column 'data'. Added Zod discriminators for asana vs mantra. Migration successful."

Slice 2: The "Hybrid Brain"

Sequences are validated by a powerful, hybrid, and extensible Rule Engine Admin users can craft new JSON-logic rules

The Challenge: The Logic Engine needed to validate sequences (e.g., "Must end with Savasana"). This logic had to run on the Backend (before saving) AND the Frontend (to give real-time red borders).
The AI's First Impulse: Duplicate the code. Write a TypeScript function in React and a Service in NestJS.
The Architect's Intervention: "No. Create a packages/shared workspace. Put the validateSequence function there. Import it in both apps."
The Result: The AI created the shared package, configured the tsconfig.json paths, and wired it up. It even built a HealthBar component that consumes this shared logic to show a live "Health Score" for the sequence.

Verbatim Log: "Refactored ValidationConfig to packages/shared. Updated useSequenceStore (Frontend) and SequenceService (Backend) to consume the same Zod schema."

Slice 3: The "Offline Printer"

Synthetic or complete printed handout

The Challenge: Users need to print PDF handouts in a yoga studio with no Wi-Fi.
The AI's First Impulse: "Use a server-side PDF library like PDFKit." (Standard web dev practice).
The Architect's Intervention: "Read Docs/ADR/006-pwa-offline-strategy.md. We must generate PDFs client-side using @react-pdf/renderer."
The Result: The AI implemented a beautiful client-side renderer. It handled the tricky part of loading fonts (Noto Sans) into the browser's virtual file system so the PDF engine could "see" them without a network request.

Verbatim Log: "Implemented SequencePdf component. Configured vite-plugin-pwa to cache NotoSans fonts. PDF generation now works without network."

5. The Architect's Flex: Automated C4 Verification

How do you know the AI actually respected the Modular Monolith architecture? Did it secretly import the Weaver module into the Grimoire when I wasn't looking?

I didn't want to audit 50 files manually. And I definitely didn't want to draw diagrams by hand.

So, I added a rule to my Constitution (ADR 007): "The Code is the Source of Truth for Documentation."

At the end of session, I enforce Gemini CLI to reverse-engineer its own work. I gave it this prompt:

Update the RULES.md file to enforce the (re)generation of C4 diagrams when finishing an implementation session
[...]

We also created a specific ADR (007: Architecture Documentation Maintenance Protocol) establishing Mermaid.js as the standard and defining the maintenance lifecycle.

The result wasn't a hallucination. It was a perfect map of the code it had just written.

This is the ultimate "Trust but Verify." If the generated diagram looks like spaghetti, the code is spaghetti. If the diagram is clean, the architecture holds.

6. The AIOps Protocol: Monitoring the Machine

Now, here is the secret weapon: The Session Log.

One of my strictest rules in RULES.md was that the AI had to "punch out" at the end of every session. I forced it to append a line to docs/ai_session_log.csv with the Date, Tool (Chat or CLI), Goal, and Token Usage.

For me this isn't about money ("FinOps"). It's about AIOps, monitoring the operational health of your intelligence.

Why we log everything (Chat & CLI):

Context Monitoring: As a session drags on, the "Tokens In" (Context Window) grows exponentially. The AI starts reading 30,000 tokens of history just to write one line of code.
The "Sawtooth" Pattern: By visualizing the log, I discovered a crucial pattern. Efficiency drops as context grows. The solution? The Hard Reset.

This chart visualizes the high-level "Vibe Coding Lifecycle." You see the context bloat as we iterate on implementing phases 3 and 4. Then, you see the sharp drop when we switch back to the Architect (Chat) or reset the CLI.

The Lesson: A "Tired" AI (high context) makes mistakes. A "Fresh" AI (reset context + Snapshot) is precise.

7. The "Oh S**t" Moment: The Hallucination Trap

This brings us to the specific incident that proved why that Reset is mandatory.

Halfway through Phase 3, the CLI started getting slow (too much history). I ran a /reset command to clear its memory. Disaster.

It suddenly forgot we were building a "Yoga" app. It tried to invent a new database column duration_minutes for the cards. But my Spec (ADR 003) explicitly said that duration lives inside the JSONB payload and is measured in seconds.

The Hallucination:
UPDATE cards SET duration_minutes = 60; (AI guessing)

The Correction (Me):
"Read Docs/003-data-model.md. 'Duration' is a JSONB field inside the 'metadata' column, and it's in seconds."

The Fix:
UPDATE cards SET data = jsonb_set(data, '{duration}', '3600'); (AI complying)

To prevent this in the future, we implemented a "Session Handover" protocol. Before resetting, I now force the AI to write a TECH_STATE_SNAPSHOT.md.

"Where are we?" (Phases 1-3 Complete)
"What is the active stack?" (NestJS, React, PostgreSQL)
"What is the next step?"

When I start a new session, I feed this snapshot back in. It’s like a save game for your developer.

Conclusion: The Architect's Verdict

So, can you Vibe Code a complex system?

Maybe. I mean, it depends on how complex the system is (in this example we didn't build an enterprise-wide distributed system). But for sure you can't just "Vibe" it. You have to Architect it.

If I had touched the code, I would have been bogged down in syntax errors and import paths. By staying in the Architect role, I focused on Data Models, User Flows, and Business Logic. The AI handled the implementation, but I provided the Guardrails.

What I learned:

Docs are Prompts: TheRULES.md, Docs/Features/ and Docs/ADR/ folders (or your own equivalents) are the most important files in your repo. They are the AI's long-term memory.
Constraint is Clarity: The more rules you give the AI (versions, naming, structure), the better code it writes.
Review Everything: The AI is a junior dev. It will introduce security holes or n+1 query problems if you don't catch them in the spec.

Vibe Coding didn't replace the Architect. It just gave the Architect a team of infinite interns. And honestly? They’re pretty good once you give them a Constitution.

Last message from Gemini CLI

Next up: The application could do with AI features... Or maybe I'll now explore other aspect of Vibe Coding. Stay tuned.

Vibe Coding One Pixel at a Time

raphiki — Fri, 23 Jan 2026 22:21:39 +0000

Editing "stick figure" Yoga poses

In Part 1, we dipped our toes into "Vibe Coding" by building a Python script. It was linear, logical, and frankly, a bit safe. Text in, text out.

But let’s be real: backend scripts are the "easy mode" of LLM-assisted coding. The logic is contained. The state is ephemeral.

The real boss fight is the Frontend.

Can you "vibe" a UI? Can you talk a chaotic mess of DOM elements, event listeners, and CSS pixels into a functional application without losing your mind (or the AI losing the context)?

I decided to find out. My goal: Build Yoga Pose Builder, a browser-based tool to edit "stick figure" yoga poses, drag limbs around, and export vector SVGs.

I had no design, no stack picked out, and—crucially—I had never used a Canvas library in my life.

Here is how we vibed it into existence.

1. Context is King (The `.md` Anchors)

The biggest enemy of Vibe Coding is the LLM’s "Goldfish Memory." You’re 40 turns into a chat, you ask for a button change, and suddenly the AI forgets you’re building a yoga app and tries to sell you a subscription to a SaaS platform.

In Part 1, we just chatted. For a full UI application, that doesn't fly.

The Strategy: Documentation as Prompt Anchoring.

Before I let the AI write a single line of JavaScript, I made it write Markdown.
We created a Docs/ folder with two files:

spec.md: The high-level architecture.
features.md: A checklist of what we wanted to do.

I didn't write these because I love administrative work. I wrote them so that when the AI inevitably got confused, I didn't have to re-explain the project. I just said: "Read Docs/spec.md and try again."

Vibe Tip: Think of your documentation not as a manual for humans, but as "Long-Term Memory" for your AI pair programmer.

2. The Architecture: Letting the AI be CTO

I knew I needed a canvas where I could drag "joints" (knees, elbows) and have "bones" (lines) follow them.

Me: "I want to do this in the browser. Should I use React? Raw Canvas API?"
AI: "React might be overkill. Raw Canvas is painful. Use Fabric.js."

Me: "Never heard of it. Let's do it."

This is the beauty of Vibe Coding. I didn't spend 3 hours reading "Top 10 JS Canvas Libraries 2025" Medium articles. I trusted the vibe.

We settled on a Build-less Architecture:

Backend: Node.js + Express (just to serve files and save JSON).
Frontend: Vanilla JS + Fabric.js (loaded via CDN).
Build Tool: None. No Webpack, no Vite, no npm run eject nightmares.

Why? Because Vibe Coding thrives on speed. I wanted to change a line of code, hit F5, and see the result.

Application folder structure:

.
├── Docs
│   ├── features.md
│   └── spec.md
├── package.json
├── public
│   ├── index.html
│   └── poses
└── server.js

3. The "Rig": Math is for Machines

Here is where I expected to get stuck. Creating a "rig" where moving a hand automatically updates the angle of the arm involves trigonometry and vector math.

Usually, this is where I’d open 15 StackOverflow tabs and copy-paste code I don't understand.

Instead, I just described the behavior:

"Create a Mannequin class. It has Nodes (circles) and Links (lines). When a Node moves, the Links connected to it should update their coordinates."

The AI wrote the entire class. It hooked into Fabric.js’s object:moving event and handled the coordinate updates. It worked on the first try.

I still barely know how fabric.Line works under the hood. And I don't care. It works.

4. Iteration: The "Yes, And..." Technique

UI Vibe Coding isn't about getting it right instantly; it's about sculpting.

The Ugly Phase:
The first version looked like a programmer made it (because a programmer did make it). The stick figure looked like a dead bug. The background was gray.

The "Vibe" Phase:
Me: "This looks depressing. Make it 'Zen'. Use soft colors, rounded buttons, and a clean layout."

The AI generated the CSS variables (--highlight-color: #88b04b), added a "Save As" modal, and cleaned up the toolbar.

Yoga Pose Builder GUI

The "Feature Creep" Phase:
Me: "I want to save my poses."
AI: "We have no database."
Me: "Just write JSON files to a folder on the server."

In 5 minutes, we had a fully working persistence layer. No database migrations, just fs.writeFile.

Here is a example of such a Pose JSON file:

{
  "meta": { "nameFR": "Demi-Pont", "nameSK": "Setu Bandhasana" },
  "joints": {
    "head": { "x": -120, "y": 100 }, "neck": { "x": -100, "y": 100 }, "chest": { "x": -60, "y": 50 }, "hips": { "x": 20, "y": 0 },
    "lShoulder": { "x": -80, "y": 100 }, "lElbow": { "x": -20, "y": 100 }, "lHand": { "x": 40, "y": 100 },
    "rShoulder": { "x": -80, "y": 100 }, "rElbow": { "x": -20, "y": 100 }, "rHand": { "x": 40, "y": 100 },
    "lHip": { "x": 20, "y": 0 }, "lKnee": { "x": 80, "y": 20 }, "lFoot": { "x": 80, "y": 100 },
    "rHip": { "x": 20, "y": 0 }, "rKnee": { "x": 80, "y": 20 }, "rFoot": { "x": 80, "y": 100 }
  }
}

5. The Pivot: Language as a Feature

At the end of the session, I realized a problem: the app was vibing in French (my native tongue), but I wanted screenshots in English for this article.

Instead of manually editing labels, I asked the AI to "make the whole app i18n." In one single refactor, we added a translation dictionary, a language switcher, and logic to dynamically swap every label, tooltip, and even the pose names in the library.

GUI (and data) in French

This turned a linguistic hurdle into a core feature, proving that with Vibe Coding, "changing your mind" is just a prompt away.

6. The "Traceability" Hack

We spent about 90 minutes building this. We added features, fixed bugs, and refactored code. By the end, the chat context was massive and messy.

If I came back to this project in a week, I’d be lost.

So, I ran one final "Meta-Prompt":

"Read all the code we wrote and the docs in Docs/, and generate a Docs/session_summary.md. Explain what we built, why we made these choices, and the current state of the app."

The AI analyzed its own work and wrote a summary file. This is my "Save Game" point. When I want to work on this again, I’ll feed that summary to the AI to restore its context instantly.

Conclusion

We went from a blank folder to a functional, vector-based SVG editor with a backend in one session.

Vibe Coding a UI is possible, but you have to change your approach:

Anchor the Context: Write specs so the AI has a "North Star."
Delegate the Heavy Lifting: Let the AI choose the libraries and do the math.
Iterate Visually: Don't try to prompt the perfect UI. Prompt the skeleton, then prompt the paint.

Next we'll try to Vibe Code a real full stack app. Or a game. Who knows? The prompt is the limit.

SVG exported by Yoga Pose Builder (opened in Inkscape)

Vibe Coding One Page at a Time

raphiki — Fri, 23 Jan 2026 14:45:20 +0000

Building a Smart Magazine Archiver

I’m starting a new series called "Vibe Coding one Step at a Time." The goal? To document the raw, messy, and surprisingly efficient process of building software in the age of AI. We’re not here to write perfect specs or obsess over UML diagrams (well, not yet). We’re here to vibe with the code, iterating on pure intent until the machine does exactly what we want.

In this first edition, I’m sharing how I used the Gemini CLI to build a tool I actually needed, learning some pretty cool image processing tricks along the way.

What is "Vibe Coding"?

I’m going to claim this term right here: Vibe Coding.

It’s not "lazy coding." It’s intent-driven development. In the old days, if you wanted to build a script, you had to know the syntax, the libraries, and the edge cases before you even opened your editor. You had to think in code.

Vibe Coding flips that. You think in outcomes. You describe the behavior, the "vibe" of the feature, and the AI handles the implementation details. You act less like a bricklayer and more like a conductor. The feedback loop isn't "Write -> Compile -> Error," it's "Ask -> Observe -> Tweak."

The Use Case: "I Just Want to Read Offline"

Here’s the situation: I subscribe to a fantastic niche magazine (which shall remain nameless to protect the innocent). It’s great, but their "digital reader" is a nightmare. It’s one of those web-based page-turners that requires an active internet connection.

I wanted to read it on my tablet, offline, on a plane, without waiting for high-res JPEGs to buffer.

The Problem: There was no "Download PDF" button.
The Clue: Inspecting the network traffic revealed that the magazine was just serving a sequence of high-quality images, one URL per page.

The Mission: Write a script to fetch these pages and stitch them into a single, high-quality, searchable PDF.

The Process: Galloping Toward Complexity

We didn't sit down and architect a solution. We started small and let the script evolve.

Step 1: The Naive Loop

We started with a simple hypothesis: "The URLs probably just have a page number in them."
I asked Gemini to write a script using requests to hit the URL for page 1, then page 2.
Boom. It worked. We had a directory full of 100 separate JPGs.

Step 2: The Picture Book

Having 100 files is annoying. I wanted a book.
We asked Gemini to "glue these together." It pulled in the PIL (Pillow) library.
Result: A massive PDF. It looked great, but it was dumb. It was just a container of pictures. You couldn't highlight text, search for keywords, or copy-paste quotes.

Step 3: The Search for Meaning (OCR)

This is where the "vibe" got technical. I realized a "picture book" wasn't enough. I needed Optical Character Recognition (OCR).
We decided to use Tesseract. But here’s the catch we discovered:

Human Eyes like soft colors and smooth anti-aliasing.
OCR Engines like harsh contrast, jagged edges, and black-and-white binary inputs.

If we optimized the images for the machine, the magazine looked ugly. If we kept them pretty, the machine couldn't read the text.

The Technical Deep Dive: The "PDF Sandwich"

This is where the magic happened. We ended up building a PDF Sandwich.

Instead of choosing between beauty and brains, we chose both.

The Visual Layer: We keep the original high-res color JPEGs. This is what you see.
The Data Layer: Behind the scenes, we create a "Frankenstein" version of the page—converted to grayscale, contrast cranked up to 2.0, and upscaled 2x using LANCZOS resampling (a fancy algorithm that keeps edges sharp).
The Merge: We feed the Frankenstein images to Tesseract to generate an invisible text layer, then use pypdf to overlay that text exactly on top of the pretty images.

The trickiest part? Math.
Because we upscaled the OCR images by 2x to help Tesseract read small fonts, the invisible text layer was twice as big as the visual page. We had to calculate scale factors to shrink the text back down so that when you highlight a sentence, the highlight actually lines up with the words.

What I Learned

Vibe coding this script taught me more in an hour than I’d usually learn in a weekend of reading docs:

Image Optimization: OCR is picky. Simply resizing an image isn't enough; the method of resizing (resampling filter) matters.
Library Specialization: PIL is for pixels; pypdf is for structure. Trying to do everything in one library is a trap.
The Power of the CLI: Using the Gemini CLI meant I didn't have to context-switch. I stayed in my terminal, describing what I wanted, and the code appeared.

Conclusion

We ended up with a ~100-line Python script that solves a genuine daily frustration. I didn't have to memorize the pypdf documentation or look up the Tesseract CLI flags. I just focused on the goal: "Make it searchable, make it pretty."

That’s Vibe Coding. You bring the vision, the AI brings the syntax, and together you build something cool.

We'll discover in the next episode if this is still true with a more complex use case and a GUI.

The Ultimate LLM Inference Battle: vLLM vs. Ollama vs. ZML

raphiki — Mon, 29 Dec 2025 09:12:46 +0000

A structured, data-driven comparison of today's leading open-source engines for serving AI models.

The "Runtime Wars"

The open-source AI community has achieved an incredible milestone: models like Meta's Llama 3 and Mistral AI's Mixtral now rival proprietary giants like GPT-4. But having the weights is only half the battle. To actually use these models—to build a chatbot, an agent, or an API, you need an inference engine.

The landscape of inference servers is exploding. A year ago, options were scarce. Today, developers are faced with a paralyzing array of choices. Should you use the industry darling vLLM? The local developer's favorite, Ollama? Or perhaps a radical newcomer like ZML?

Choosing the wrong engine can lead to massive infrastructure bills, slow user experiences, or vendor lock-in.

To cut through the hype, we are applying the QSOS (Qualification and Selection of Open Source software) method. This isn't a casual review; it's a structured evaluation comparing these three contenders against the state-of-the-art features required for modern AI production.

The Methodology: Why QSOS?

QSOS is a standardized methodology designed to reduce the risks associated with adopting open-source technologies. Unlike ad-hoc selection processes based on Medium articles or GitHub stars, QSOS treats open-source evaluation with the same rigor used for proprietary software.

The core philosophy of QSOS is separating Evaluation (the intrinsic, objective quality of the software) from Qualification (how well it fits your specific business needs).

For this comparison, we used a "Best of Breed" evaluation grid, scoring features on a simple 0-to-2 scale:

0: Not covered / Non-existent.
1: Partially covered / Complex implementation.
2: Fully covered / Best-in-class standard.

We assessed four key axes:

Maturity & Community: Is the project stable and likely to survive?
Functional Features: Does it support modern requirements like LoRA adapters and quantization?
Performance & Scale: Can it handle high throughput and utilize hardware efficiently?
Operations (Day 2): How easy is it to deploy, monitor, and maintain?

The Contenders

1. vLLM: The Data Center Standard

vLLM burst onto the scene in 2023 from UC Berkeley, solving a critical bottleneck in serving LLMs: memory fragmentation. Its core innovation, PagedAttention, allows it to manage GPU memory like an operating system manages virtual memory, dramatically increasing batch sizes and throughput.

Primary Focus: High-throughput production serving in the data center.
Positioning: vLLM is the currently the De Facto Standard for enterprise deployment. It excels on server-grade hardware (NVIDIA H100s/A100s) and offers the richest feature set for scaling.

2. Ollama: The Developer's Best Friend

Ollama took a different approach. It focused entirely on removing friction. By wrapping the powerful llama.cpp engine in a sleek, Docker-style Go binary, it made running a 70B parameter model on a MacBook as easy as typing ollama run llama3.

Primary Focus: Local development, edge devices, and consumer hardware (Mac/PC).
Positioning: Ollama is the king of usability. It is unbeaten for local testing and running models on consumer hardware, but it lacks the advanced scheduling required for high-traffic enterprise production.

3. ZML (Zig Machine Learning): The Radical Challenger

ZML is the new kid on the block. It is less of a "server" product and more of a compiler stack aimed at engineers. Written in Zig, it utilizes OpenXLA/MLIR to compile model graphs directly into standalone binaries, aiming to eliminate the heavy Python/PyTorch dependency chain entirely.

Primary Focus: High-performance, cross-platform runtime (TPUs, AMD, NVIDIA) without dependencies.
Positioning: ZML is an Alpha-stage visionary. It offers incredible potential for hardware portability and efficiency but is currently a complex "build-your-own-stack" tool rather than a drop-in product.

Visualizing the Results

To understand how these tools differ, we visualize our QSOS scores using two different schemas.

The Radar Chart: Feature Balance

This chart shows the balance of strengths across the four evaluation axes.

Caption: The QSOS Radar Chart highlights the distinct profiles of the three engines. vLLM shows the broadest coverage across features and performance. Ollama spikes toward Operational Ease. ZML shows potential in features but lacks maturity.

vLLM (Blue): The largest, most balanced area, indicating strength across maturity, features, and performance, with moderate operational complexity.
Ollama (Green): A massive spike toward "Operational Ease," reflecting its zero-friction user experience, but pulling back on raw performance metrics like continuous batching.
ZML (Red): A smaller footprint overall, reflecting its early stage (low maturity), but showing strong potential in functional features due to its compiler-based architecture.

The QSOS Quadrant: Market Position

This schema maps the tools based on their market adoption versus their raw production capabilities.

Caption: The QSOS Quadrant positions the tools based on Market Maturity vs. Production Power.

vLLM (The Leader): High Maturity, High Power. The safe, scalable choice for the enterprise.
Ollama (The Specialist): High Maturity, Lower Production Power. The standard for a specific niche (local/consumer hardware), prioritizing usability over scale.
ZML (The Visionary): Low Maturity, High Potential Power. An innovative approach that hasn't yet proven itself in the broad market.

The Consolidated Score Sheet

Below is the detailed breakdown of the evaluation scores that feed the charts above.

Section / Criteria	vLLM	Ollama	ZML (Zig ML)
A. MATURITY
History & Age	2 (Standard)	2 (Standard)	0 (Very New)
Activity	2 (Hyper-Active)	2 (Viral)	2 (High Velocity)
Ecosystem	2 (Dominant)	2 (Ubiquitous)	0 (Niche)
Governance	2 (Community)	1 (Company Led)	1 (Small Team)
B. FEATURES
Model Support	2 (Universal)	2 (Curated Lib)	2 (Compiler based)
Quantization	2 (Server: AWQ/FP8)	2 (Edge: GGUF)	1 (Implicit XLA)
LoRA Adapters	2 (Dynamic Multi-LoRA)	1 (Static Modelfile)	0 (Not standard)
API Compat.	2 (OpenAI Native)	2 (OpenAI Native)	0 (Runtime only)
C. PERFORMANCE
Cont. Batching	2 (Gold Standard)	0 (FIFO)	1 (Arch. support)
Throughput	2 (Maximum SOTA)	1 (Low/Single User)	1 (High Potential)
Parallelism	2 (Tensor & Pipeline)	0 (Single Node)	1 (Compiler Config)
Hardware Agnosticism	1 (NVIDIA Centric)	2 (Apple/Consumer)	2 (Any: TPU/AMD)
D. OPERATIONS
Ease of Setup	1 (Python/Docker)	2 (Magic 1-Click)	0 (Hard: Bazel)
Dependencies	1 (Heavy Torch)	2 (Zero: Go Binary)	2 (Zero: Zig Binary)
Observability	2 (Prometheus Native)	0 (Logs only)	1 (Manual metrics)

Conclusion

There is no single "best" inference engine. The right choice depends entirely on your specific context (the Qualification phase of QSOS).

Choose vLLM if:

You are building a production application that needs to serve many concurrent users. You have access to server-grade GPUs (NVIDIA A10G, A100, H100) and need features like dynamic LoRA adapters for multi-tenancy.

If you are deploying to Kubernetes to serve customers, start here.

Choose Ollama if:

You are a developer building locally on a Mac or Windows PC. You need a zero-friction way to test models, or you are deploying to edge devices where resources are constrained, and concurrency is low.

If you just want to run Llama 3 on your laptop right now, download Ollama.

Choose ZML if:

You are an ML systems engineer building a specialized hardware appliance (e.g., using TPUs or AMD chips) and need a runtime with absolutely zero Python dependencies and a tiny footprint. You are willing to build the server infrastucture around it yourself.

If you are frustrated by PyTorch bloat and want a "build your own" adventure, look at ZML.

Note on Methodology

For the purpose of this article, we utilized a simplified QSOS evaluation grid. We intentionally zoomed in on the "Best of Breed" criteria, the critical differentiators driving the current "Inference Wars", to keep the comparison readable and actionable.

A full-fledged QSOS evaluation is significantly more exhaustive. It is structured as a hierarchical tree of criteria containing more data points, covering deep operational details such as:

Generic Attributes: Intellectual property management, roadmap visibility, bug tracking efficiency, and internationalization.
Specific Sub-sections: Detailed granularity on security compliance (SOC2/GDPR), exact memory footprints, and specific driver version compatibility.

While this article provides a strategic overview, a complete QSOS audit would involve drilling down from high-level "Sections" into specific "Leaves" to calculate a precise, weighted score for every possible business constraint.

Automating Image Generation with n8n and ComfyUI

raphiki — Sun, 07 Sep 2025 15:51:34 +0000

This is the third article of a series about how to integrate ComfyUI with other tools to build more complex workflows. We'll move beyond the familiar node-based interface to explore how to connect ComfyUI from code and no-code solutions, using API calls or MCP Servers.

You'll learn how to use ComfyUI's API to build custom applications and automate tasks, creating powerful and automated systems for generative AI.

n8n is a workflow automation tool that connects applications, APIs, and services without requiring deep technical expertise. It allows users to create complex, multi-step workflows using a visual, node-based editor. With n8n, you can automate tasks across thousands of integrations, from CRMs and databases to messaging apps and cloud services.

It's a fair-code and open-core solution. You can self-host and modify the software freely, but SaaS providers must contribute back to the project if they offer n8n as a service. Furthermore, some advanced features like global variables, multiple environments (dev, staging, prod, etc.), version control using Git, or controlling n8n via API are not available in the community and open-source version of the product.

In this article, we'll explore how to call ComfyUI from an n8n agent-based workflow with human interaction and LLM use. The agent is instructed to transform a simple prompt from the user into a super-charged JSON Prompt Guide, which is then injected into ComfyUI. For more context, you can read my previous article on JSON Prompt Style Guides.

Installation

n8n is a Vue/TypeScript web application that's simple to install whether you prefer to run it on a Node.js installation or inside a Docker container.

Node.js: npx n8n
Docker: docker volume create n8n_data and then docker run -it --rm --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n docker.n8n.io/n8nio/n8n

After all dependencies are installed, the n8n Editor web UI is accessible at http://localhost:5678.

Text-to-Image Workflow

Use Case

Workflow design is done in the Editor web UI, and it's a highly visual process that doesn't require any coding knowledge, as long as you use predefined nodes for a standard use case. That's our approach here, as we'll create a very simple 3-step workflow with 4 nodes.

Chat Trigger node to start the workflow with a message from the user to capture their initial prompt for the images to be generated by ComfyUI.
AI Agent node to call an OpenAI model (though it could be other SaaS solutions like Mistral, Anthropic, or Google Gemini, or local models provided through Ollama or directly by Hugging Face). The agent has instructions on how to expand the initial prompt from the previous node into a JSON Prompt Style Guide.
OpenAI Chat Model node to connect to OpenAI's GPT.
n8n-nodes-comfyui community node to connect to a running ComfyUI instance. To install it, go to the "Settings / Community nodes" menu.

We're making a simple use of this standard AI Agent node and don't require memory or external tools.

The most important parameter is the system message given to the LLM to expand the initial user prompt. The OpenAI Chat Model node handles the credentials to connect to OpenAI and allows us to select the GPT 4.1 mini model.

The LLM response is then sent to the final node, which is interconnected with ComfyUI.

ComfyUI Community Node

Once installed, this community node is quite straightforward to use.

First, we configure the credentials to connect to ComfyUI.

API URL: In this example, it's http://127.0.0.1:8188, but it could also be a remote instance of ComfyUI.
API Key: This is used if you have configured one on the ComfyUI side.

Next, we specify the output format (PNG or JPEG) and the timeout for communication with ComfyUI. In the Workflow JSON textarea, we copy the content of the workflow exported from ComfyUI (by using the "File / Export (API)" menu).

This means that n8n will send the workflow to be executed to the ComfyUI API in JSON format. We need to modify the ComfyUI workflow by using an expression containing the $node["AI Agent"].data variable. Its value is dynamically set to the prompt provided by the previous node during n8n execution.

The exact location to inject the prompt depends on the JSON workflow exported from ComfyUI. Here, it's inside the "39.6" node of type CLIP Text Encode, but it might have a different name in your own workflows.

Execution

We're all set! We check that ComfyUI is running and ready to launch the workflow from the n8n UI by entering a prompt in the chat box.

Here's a short video of the workflow execution. n8n displays real-time progress, and the generated images can be visualized inside the ComfyUI node.

Here are two images generated by from this prompt: "A dramatic, cinematic shot of an ancient library at night, where the books are alive and their pages flutter like birds, forming constellations in the air."

Of course, this 3-step workflow is very simple. The true power of coupling n8n and ComfyUI will become apparent with more complex use cases, leveraging n8n's extensive integration capabilities with many other components and solutions.

Image-to-Image Workflow

Use Case

Let's now create another workflow to transform an existing image based on user instructions. We'll intentionally keep this example super simple for clarity, but your use case might include a more complex workflow leveraging n8n's power.

Here, we'll use only three nodes:

n8n Form / n8n Form trigger node to start the workflow by displaying an HTML form for the user to upload the image to modify and specify what changes to apply.
ComfyUI Image Transformer community node to connect to a running ComfyUI instance. To install it, go to the "Settings / Community nodes" menu and search for n8n-nodes-comfyui-image-to-image. The example workflow exported from ComfyUI uses the Kontext Edit model to modify an existing image.
n8n Form / Form Ending node to notify the user when the image is generated and offer it for download.

ComfyUI Image Transformer Node

This node is quite similar to the n8n-nodes-comfyui node we used before, with the insertion of the $json.Promt expression into the exported ComfyUI JSON workflow to inject instructions from the user.

The main difference concerns how the input image to be modified is handled:

Input Type defines how the image is obtained from the previous form node; we'll choose Binary instead of URL or Base64 text.
The property containing the binary file must be specified, which is the data field here.
Image Node ID is used to identify—within the exported ComfyUI JSON workflow — the node in charge of loading the input image (it must be of type LoadImage).

We've added the last node to finalize the form management started with the first node, retrieve the modified image, return it in binary format, and offer the user the option to save it locally.

Execution

Let's execute the workflow. n8n displays a form for us to enter both the image and the associated instructions for its modification.

Here is a short video of the workflow execution.

Initial Image:

Modified Image with the prompt "Make the scene at night with full moon and moonlight":

This second example workflow is so simple that we could do the exact same thing directly using the ComfyUI UI. It's here simply to illustrate how integration with n8n can be achieved. A more value-added workflow might, for instance, include a loop that allows the user to keep modifying the image outputs until they are satisfied.

Also, note that the n8n-nodes-comfyui package offers other custom nodes for integration into your workflows, such as:

Dual Image Transformer
Single Image to Video
Dual Image Video Generator

It's also worth noting that even though n8n offers Form nodes, it's primarily intended to be used in the backend through API calls. This feature, however, is limited to Enterprise licensees.

With these two workflows, we've demonstrated how n8n can serve as a powerful orchestrator for ComfyUI. By leveraging its visual editor and extensive library of integrations, we transformed a simple user prompt into a rich, structured guide for image generation and created a seamless image-to-image transformation process.

While our examples were simple to illustrate the concepts, the true value of n8n lies in its ability to connect ComfyUI with a vast ecosystem of tools, from databases and CRMs to messaging services and other AI models. This opens up new possibilities for building sophisticated, end-to-end applications that go far beyond what a standalone ComfyUI interface can offer.

In the next article of this series, we'll explore another paradigm for connecting ComfyUI with agent-based solutions. We will delve into the Model Context Protocol (MCP), designed to streamline and standardize the way AI models communicate and share contextual information. This will offer a new, more efficient method for agents to interact with and control ComfyUI.

WebSockets & ComfyUI: Building Interactive AI Applications

raphiki — Fri, 05 Sep 2025 09:17:07 +0000

This is the second article of a series about how to integrate ComfyUI with other tools to build more complex workflows. We'll move beyond the familiar node-based interface to explore how to connect ComfyUI from code and no-code solutions, using API calls or MCP Servers.

You'll learn how to use ComfyUI's API to build custom applications and automate tasks, creating powerful and automated systems for generative AI.

In the previous article of the Beyond the ComfyUI Canvas series, we demonstrated how to connect ComfyUI with Jupyter Notebook using basic HTTP API calls. While functional, this approach had a significant limitation: it relied on a time.sleep() function to wait for workflow completion, requiring manual adjustments based on the complexity of each workflow, a far from ideal solution.

To overcome this inefficiency, we’ll leverage ComfyUI’s WebSocket API (/ws endpoint), which enables real-time, bidirectional communication between Jupyter and ComfyUI. This upgrade unlocks a seamless experience by providing:

Instant execution progress updates to track workflow status,
Live node execution feedback for monitoring each step,
Immediate error messages and debugging insights for troubleshooting,
Dynamic queue status updates to respond to changes on the fly.

By adopting WebSockets, we eliminate guesswork and create a responsive, interactive workflow.

The Use Case

Let's simplify our previous use-case by dropping the OpenAI Assistant and focusing on how to eliminate manual polling or delays. The process is designed to be both intuitive and efficient:

Workflow Setup: A pre-defined ComfyUI workflow (loaded from a JSON file) serves as the foundation for image generation.
Prompt Customization: The user provides a text prompt which is dynamically inserted into the workflow.
Real-Time Execution: Using ComfyUI’s WebSocket API, the notebook sends the workflow to the server and monitors its progress in real time—receiving live updates on execution status, node activity, and completion.
Result Retrieval: Once generation finishes, the resulting images are automatically fetched and displayed directly in the notebook, creating a seamless end-to-end experience.

Let’s dive into the implementation.

Get prompt from user

print("Please enter your prompt")
user_prompt = input()

Please enter your prompt
A penguin in a tuxedo, DJing at a club for dancing jellyfish

Trigger the Workflow from Jupyter Notebook

Below, you’ll find a detailed breakdown of the code designed for use in a Jupyter Notebook, complete with helpful comments to guide you through each step and explain its functionality

Imports and main functions

import websocket  # For WebSocket communication
import uuid       # For generating unique client IDs
import json       # For JSON data handling
import requests   # For HTTP requests (replaces urllib)
from PIL import Image  # For image processing
import io         # For handling binary data streams
import IPython.display as display  # For displaying images in Jupyter

# Server configuration
server_address = "127.0.0.1:8188"  # Local server address and port
client_id = str(uuid.uuid4())      # Unique client ID for this session

def queue_prompt(prompt, prompt_id):
    """
    Send a prompt to the server for execution.

    Args:
        prompt (dict): The workflow/prompt to execute.
        prompt_id (str): Unique ID for tracking the prompt.
    """
    p = {"prompt": prompt, "client_id": client_id, "prompt_id": prompt_id}
    response = requests.post(f"http://{server_address}/prompt", json=p)
    return response

def get_image(filename, subfolder, folder_type):
    """
    Fetch an image from the server.

    Args:
        filename (str): Name of the image file.
        subfolder (str): Subfolder where the image is stored.
        folder_type (str): Type of folder (e.g., 'output').

    Returns:
        bytes: Binary image data.
    """
    params = {"filename": filename, "subfolder": subfolder, "type": folder_type}
    response = requests.get(f"http://{server_address}/view", params=params)
    return response.content

def get_history(prompt_id):
    """
    Retrieve the execution history for a given prompt ID.

    Args:
        prompt_id (str): ID of the prompt whose history is requested.

    Returns:
        dict: History data for the prompt.
    """
    response = requests.get(f"http://{server_address}/history/{prompt_id}")
    return response.json()

def get_images(ws, prompt):
    """
    Execute a prompt and collect the resulting images.

    Args:
        ws (websocket.WebSocket): Active WebSocket connection.
        prompt (dict): The workflow/prompt to execute.

    Returns:
        dict: Dictionary of node IDs and their output images.
    """
    prompt_id = str(uuid.uuid4())
    queue_prompt(prompt, prompt_id)
    output_images = {}

    # Listen for WebSocket messages until execution is complete
    while True:
        out = ws.recv()
        if isinstance(out, str):
            message = json.loads(out)
            if message['type'] == 'executing':
                data = message['data']
                if data['node'] is None and data['prompt_id'] == prompt_id:
                    break  # Execution is done
        else:
            # Binary previews are ignored here
            continue

    # Retrieve and organize output images
    history = get_history(prompt_id)[prompt_id]
    for node_id in history['outputs']:
        node_output = history['outputs'][node_id]
        images_output = []
        if 'images' in node_output:
            for image in node_output['images']:
                image_data = get_image(image['filename'], image['subfolder'], image['type'])
                images_output.append(image_data)
        output_images[node_id] = images_output
    return output_images

Load the workflow and inject the user prompt

with open("t2i-krea.json", "r") as f:
    workflow = json.load(f)

# Update the prompt text in the workflow
workflow["39:6"]["inputs"]["text"] = user_prompt

Communication with ComfyUI through WebSockets

# Establish WebSocket connection
ws = websocket.WebSocket()
ws.connect(f"ws://{server_address}/ws?clientId={client_id}")

# Execute the workflow and collect images
images = get_images(ws, workflow)
ws.close()

Display the output images in Jupyter

for node_id in images:
    for image_data in images[node_id]:
        image = Image.open(io.BytesIO(image_data))
        # Display each image in the notebook
        display.display(image)

This article demonstrated the power of using WebSockets for real-time, bidirectional communication with ComfyUI. By moving beyond simple HTTP requests, we eliminated the need for manual time delays and created a truly dynamic, responsive workflow. This allowed us to monitor the execution of our AI pipeline in real-time, ensuring a more reliable and efficient integration. The result is a seamless experience where we can send a prompt and watch as the generated images appear automatically in our notebook.

Having now explored two different ways to integrate ComfyUI with Python code executed in Jupyter, we've laid a strong foundation for building custom, high-level generative AI applications. But what if you're not a developer, or you simply prefer a visual, no-code approach to orchestration? In the next article of the series, we'll shift our focus from code to a no-code solution like n8n to show you how to build powerful ComfyUI workflows without writing a single line of code.

Unlocking ComfyUI's Power: A Guide to the HTTP API in Jupyter

raphiki — Thu, 04 Sep 2025 15:28:06 +0000

This is the first article of a series about how to integrate ComfyUI with other tools to build more complex workflows. We'll move beyond the familiar node-based interface to explore how to connect ComfyUI from code and no-code solutions, using API calls or MCP Servers.

You'll learn how to use ComfyUI's API to build custom applications and automate tasks, creating powerful and automated systems for generative AI.

ComfyUI is a powerful, modular interface for generative models, allowing users to create complex AI image, video and sound generation workflows with a node-based editor. Jupyter Notebook, on the other hand, is a popular interactive environment for data analysis, visualization, and prototyping.

By integrating ComfyUI with Jupyter Notebook, you can leverage the flexibility of ComfyUI’s workflows directly within your Python scripts or data science pipelines. This first article focuses on a simple approach using Basic HTTP API calls.

Most of this article is exported from an actual Jupyter Notebook. Both content, Python code and execution results are displayed.

The Use Case

Our goal is to build a high-level generative AI workflow that combines the power of an intelligent agent with the robust image generation capabilities of ComfyUI. The process unfolds in a few simple steps, all orchestrated within a Jupyter Notebook:

User Input: The workflow begins with a simple, high-level prompt entered directly into the notebook.
Agent-Powered Expansion: An OpenAI Assistant then takes this basic prompt and transforms it into a detailed, structured JSON Prompt Style Guide. This process enriches the initial idea with specific creative instructions, such as style, composition, and lighting.
Initiating Generation: This expanded JSON guide is automatically injected into a pre-defined ComfyUI workflow. A single API call to the ComfyUI server starts the image generation process.
Displaying the Result: Once the generation is complete, we make a second API call to fetch the resulting images. The images are then displayed directly within the Jupyter Notebook, completing our automated pipeline.

Prepare a ComfyUI Workflow

Create or load a workflow in ComfyUI.
Save the workflow as a .json file from the "File / Export (API)" menu (e.g., t2i-krea.json).

Get initial prompt from user

print("Please enter your prompt")
user_prompt = input()

Please enter your prompt
Hanuman flying over a modern city at night

Generate JSON Prompt Style Guide with an Assistant

import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Create a thread
thread = client.beta.threads.create()

# Send a message
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=user_prompt
)

# Run the assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id="asst_Uj0Qr0rG0bz8NVk1LWiS9UKv"
)

# Wait for completion and retrieve the response
import time
while run.status != "completed":
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Get the response
messages = client.beta.threads.messages.list(thread_id=thread.id)
json_prompt = messages.data[0].content[0].text.value
print(json_prompt)

{
  "style_name": "Urban Deus Ex Hanuman",
  "inspiration": [
    "Modern Urban Aesthetics",
    "Hindu Mythology",
    "Superhero Comics",
    "Cyberpunk Lighting"
  ],
  "scene": "Hanuman, the Hindu god, flying over a bustling modern city radiating bright lights under the cloak of night sky",
  "subjects": [
    {
      "type": "Hanuman",
      "description": "Strong, muscular figure with a monkey face, holding a gada(mace).",
      "position": "midground",
      "pose": "flying with one hand extended",
      "size": "large",
      "expression": "determined",
      "interaction": "flying over the city"
    },
    {
      "type": "city",
      "description": "modern urban skyline with skyscrapers, neon billboards, and busy traffic",
      "position": "background",
      "size": "expansive"
    }
  ],
  "style": "comic-realistic",
  "color_palette": {
    "primary": "#202020",
    "secondary": "#505050",
    "highlight": "#ff6a00",
    "shadow": "#0d0d0d",
    "background_gradient": [
      "#0d0d0d",
      "#303030"
      ]
  },
  "lighting": "Glistening city lights with diffused neon glow and soft moonlight",
  "mood": "powerful and captivating",
  "background": {
    "type": "scenery",
    "details": "Modern urban cityscape with skyscrapers, roads, traffic and massive billboards with neon signs"
  },
  "composition": "Slightly off-center focus with Hanuman taking up prominent space",
  "camera": {
    "angle": "low angle",
    "distance": "medium shot",
    "lens": "wide-angle",
    "focus": "sharp subject, blurred background"
  },
  "medium": "Digital Painting",
  "textures": [
    "smooth skin of Hanuman",
    "rough concrete of buildings",
    "glossy glass of skyscrapers"
  ],
  "resolution": "4K",
  "details": {
    "clothing": "Hanuman is dressed in traditional golden and red garment",
    "weather": "Night with clear sky and a soft moonlight"
  },
  "effects": [
    "Bokeh effect for city lights",
    "Glow effect for neon lights"
  ],
  "themes": [
    "Divinity",
    "Strength",
    "Modernization",
    "Contrast",
    "Juxtaposition of Tradition with Modernity"
  ],
  "usage_notes": "The style is effective in creating a surprising juxtaposition of traditional divinity with modern landscapes. Use this style for high impact illustrations where contrasts need to be highlighted."
}

Trigger the Workflow from Jupyter Notebook

Use the requests library to send a POST request to the ComfyUI API:

import requests
import json

# ComfyUI server URL
comfy_url = "http://127.0.0.1:8188"
prompt_url = f"{comfy_url}/prompt"

# Load your workflow JSON
with open("t2i-krea.json", "r") as f:
    workflow = json.load(f)

# Replace the prompt
workflow["39:6"]["inputs"]["text"] = json_prompt

# Define the payload
payload = {
    "prompt": workflow,
    "client_id": "jupyter_notebook"
}

# Send the request
response = requests.post(prompt_url, json=payload)

# Get the prompt_id
prompt_id = response.json()['prompt_id']
print(prompt_id)

c1a2ced4-772c-4aeb-ac45-bfa183d03a88

Retrieve the generated images

ComfyUI processes the workflow asynchronously.

To fetch the result, poll the /history endpoint:

import time
from IPython.display import Image, display    

# Wait for the workflow to complete
time.sleep(25)  # Adjust based on workflow complexity

# Fetch the latest result for our prompt
history_url = f"{comfy_url}/history/{prompt_id}"
history = requests.get(history_url).json()

# Navigate to the list of image outputs and display them
image_outputs = history[prompt_id]["outputs"]["9"]["images"]

for image in image_outputs:
    filename = image["filename"]
    image_url = f"{comfy_url}/view?filename={filename}"
    display(Image(url=image_url, width=200))

In this article, we've seen how to leverage the power of ComfyUI directly from a Jupyter Notebook. By making simple API calls, we were able to transform a user's basic text prompt into a rich, detailed JSON guide using an OpenAI Assistant, and then feed that guide into a ComfyUI workflow to generate images. This approach demonstrates how you can move beyond the graphical interface to build automated, intelligent systems for creative tasks. The combination of Python's flexibility and ComfyUI's robust backend opens up a world of possibilities for custom, high-level generative AI workflows.

In the next article, we'll take our integration a step further by exploring how to use WebSockets for Real-Time Interaction with ComfyUI.

Enhancing QR Codes in the Age of GenAI

raphiki — Fri, 23 May 2025 09:46:49 +0000

Traditional QR Codes

Quick Response (QR) codes were developed in 1994 by Masahiro Hara and are now recognized as an ISO/IEC standard. They represent an evolution of 2D barcodes, capable of encoding numeric, alphanumeric, binary, or Kanji data in the form of a pattern of black squares on a white background. These codes are available in various sizes (or versions), ranging from version 1 (21 x 21 squares) to version 40 (177 x 177 squares).

Numerous libraries and tools exist for generating QR codes. My preferred open-source library is QR Code Generator, which supports all standard features and is available in Java, TypeScript/JavaScript, Python, Rust, C++, and C. Additionally, my favorite all-in-one open-source tool is QR Toolkit, a Vue/Nuxt application offering marker and module customization, along with verification and comparison mechanisms, an invaluable resource when tweaking QR codes.

QR codes comprise several critical components to ensure readability by scanners, including three positional markers, alignment and timing patterns, and a masking system. While I will not delve into these details now, I will instead focus on the built-in error correction mechanism. This employs Reed-Solomon codes - also used in storage media (CD/DVD, RAID6) and network technologies (DSL, satellite) — by adding extra codewords to the QR grid for error correction. The standard defines four levels of error correction, each associated with a different tolerance percentage:

Level	Approximate Error Tolerance
Low	~7%
Medium	~15%
Quartile	~25%
High	~30%

This means a QR code with High error correction can still be scanned if up to 30% of the image becomes unreadable. This feature is often utilized to embed images within QR codes: the embedded image is treated as errors during scanning.

For years, this technique has been used for personalizing QR codes. This article explores an innovative approach to customizing QR codes by leveraging Generative AI instead.

Harnessing Generative AI

My proposal involves using a Stable Diffusion model integrated within the ComfyUI graphical interface to design and execute local generation workflows on a GPU-equipped PC. For detailed guidance on these components, refer to this article or this video.

To modify and refine existing QR codes while maintaining their scannability, we will use a specialized ControlNet called QR Code Monster. ControlNets are auxiliary neural network models that inject targeted guidance into the generation process by focusing on specific features of an input image. Each ControlNet emphasizes particular aspects, such as structure (pose, edges, segmentation, depth), texture, content layout (bounding boxes, masks), or style (color maps, textures). In our scenario, we’ll focus on maintaining or modifying QR code contrast features.

Let’s proceed to create a workflow in ComfyUI, employing Stable Diffusion 1.5, the QR Code Monster ControlNet, and a QR code generated via QR Toolkit.

Adjusting parameters such as the ControlNet’s strength and start/end positions, along with the sampling process (e.g., 50 steps), I obtained a result that remains scannable and aligns with my input prompt: “A beautiful landscape, blue sky, grass, flowers.”

This demonstrates how Stable Diffusion combined with ControlNet preserved the original pattern while injecting desired visual elements. Using QR Toolkit’s comparison feature, we can assess the QR code’s readability by examining the difference markers.

Next, we can modify the prompt to produce multiple variants of our QR code. For example:

While changing the overall style is straightforward (first example), embedding specific content within the QR code remains more challenging than with traditional tools (second example). To explore this further, we'll examine two axes separately: Style and Content, before combining them.

Customizing Style

Enhancing the prompt allows for more precise control over the QR code’s aesthetic. For instance, leveraging a large language model (LLM) to generate detailed prompts:

“A pattern forged from molten lava, glowing with an intense fiery orange and red hue. Cracks in the surface reveal volcanic heat, with small embers rising around it.”

Similarly, for a more intricate and mystical style:

“An elegant, glowing elven door adorned with intricate, nature-inspired patterns and shimmering silver runes. Delicate vines and luminescent flowers intertwine with the carvings, pulsating with soft emerald and sapphire light. The archway, crafted from ethereal white stone, radiates a mystical aura, with faint golden mist swirling at its base, hinting at an ancient portal to a hidden realm.”

Predefined styles can also be injected into prompts using the iTools Prompt Styler Extra node in ComfyUI:

This node offers reusable prompts categorized by various artistic styles: 3D, Art, Craft, Design, Drawing, Illustration, Painting, Sculpture, Vector, and more. Incorporating it into our workflow makes testing different styles effortless without altering other parameters.

Below are examples of QR codes generated with different styles:

Additionally, combining styles with custom prompts allows for highly personalized designs, enabling limitless customization of your QR codes’ appearance.

Embedding Content

Having mastered style adjustments, the next step is to embed specific generated content into QR codes. For example, I wish to insert an image of a yoga pose. If you’ve read my previous articles on AI image generation, you’ll understand the transfer of poses through workflows. Details are available here for further reference.

We’ll start with an abstract image of the target pose, add Depth and Canny Edge ControlNets to our workflow, and specify in the prompt: “man, mixed race, short curly hair, black hair, 40 years old, white T-shirt, black yoga pants, short sleeves, smiling, viewing glasses, white background, barefoot.” Essentially, I aim to generate an image resembling myself.

To ensure a realistic likeness, additional steps include incorporating the FaceID IP Adapter and the FaceDetailer post-processing model into the workflow. Refer to this article for comprehensive guidance on implementing face transfer. The outcome preserves scannability and creates a QR code embedding the desired pose and identity:

Using QR Toolkit again, the comparison displays about 26 mismatch nodes, primarily around the facial features and body.

Integrating Style and Content

All previous steps can be combined by adding the iTools node to the final workflow:

Making the QR Code Animate

Given that I can embed a face into the QR code, I can also animate facial expressions using specialized nodes. The Advanced Live Portrait tool is designed for editing, inserting, and animating facial expressions in images. By inputting our generated QR code, we can animate my face to produce a smiling expression or nodding motion.

The resulting animation can be exported as an animated GIF or video:

Final Thoughts

This short tutorial has demonstrated how to significantly enhance both the stylistic and content-related aspects of a QR code. You are now equipped to craft engaging, customized QR codes that align with your personal or branding style.

The only limits are your patience and imagination, so have fun experimenting!

The Yoga of Image Generation – Part 3

raphiki — Mon, 19 May 2025 14:16:11 +0000

In the first two parts of this series, we explored Stable Diffusion, ComfyUI, and how to build Text-to-Image and Image-to-Image workflows to generate images of Yoga poses. With the help of ControlNets, we learned how to transfer a pose from an abstract reference image to our final generated image.

A Yoga sequence consists of several connected poses, which means we need visual consistency across all generated images in the sequence. This consistency must first cover the style which we addressed in the previous part of the series but also the facial features of the person depicted.

LoRAs (Low-Rank Adapters)

Let’s now introduce a new component into our workflow to tackle this challenge: Low-Rank Adapters (LoRAs). LoRAs make slight adaptations to the base model they are trained on by modifying only a small subset of neural network parameters. This is a highly efficient technique, as it enables faster training, smaller file sizes, and lower memory usage. You can think of a LoRA as a patch applied at runtime to the base model. Multiple LoRAs can be chained together.

LoRAs are typically used to specialize an existing model with certain image features such as style, poses, concepts, or characters. They are triggered in prompts using specific keywords defined by the LoRA creator during training. The community offers numerous LoRAs available for download from sites like civitai.com, which can be integrated into your local ComfyUI workflows.

Here are two examples of images generated using a "Pencil drawing" LoRA, with two different keywords and all other parameters unchanged:

The community also offers countless LoRAs for generating images resembling celebrities. Let’s try using some of these to achieve facial consistency. We’ll start by testing Celebrity LoRAs with very light pose transfer (ControlNet strength set to 10%) to see how closely the generated faces match.

Promising results! Note that the poses aren’t identical across images, this is due to the low ControlNet strength we used.

Next, let’s incorporate these LoRAs into our previous pose generation workflow. I stacked two LoRAs: one for facial identity and another for a graphite drawing style. I also kept the two ControlNets we introduced earlier for pose transfer.

With this setup, we can generate sequences that are consistent in both style and facial identity.

Of course, we can change the celebrity reference or even chain multiple LoRAs together, adjusting their strengths to blend features of different identities. However, using public figures still feels a bit uncomfortable, potentially raising ethical concerns around deepfakes.

A better approach is to create your own LoRA, avoiding such issues. So I decided to train a LoRA using images of my wife. I first experimented with the DreamBooth method, using a Colab Notebook and Google GPUs. I trained the model on 28 images of her, using an SDXL base model, over 2 epochs, taking around 1.5 hours.

The results were... promising 😉
Here are some of the best images generated with my first custom LoRA:

The resemblance is there, but not quite enough, and the image quality was lacking. So I tried again, this time training the LoRA locally on my PC using the Kohya_ss open source tool. I selected the PowerPuffMix model (a fine-tuned of SDXL), trained on just 15 images but for 20 epochs. The process took about 3.5 hours and yielded better results.

This time, both image quality and facial identity were strong enough to integrate into our generation workflow.

Here are some outputs using the new LoRA. While the face doesn’t perfectly resemble my wife (likely due to the influence of ControlNets) the identity consistency we needed is clearly present.

The lighting is still a bit unstable, and overall image quality remains imperfect. I could improve this by training on more images and increasing the number of epochs. However, the final LoRA is still fundamentally linked to the base model and can't be applied to another one.

Image Prompt Adapters (IP Adapters)

Let’s now try another technique: Image Prompt Adaptation, which is more decoupled from the base model. It functions similarly to a ControlNet but alters the model directly. Think of an IP Adapter as a one-image LoRA.

The FaceID IP Adapter, specialized in facial recognition and feature extraction, is a perfect fit for our needs.

While exploring facial enhancement tools, I also discovered FaceDetailer, which improves facial features (eyes, nose, lips, expression) after image generation. I decided to integrate both of these components into our workflow. FaceDetailer’s enhancements are based on the FaceID input, so they remain faithful to the original facial reference.

Here is the complete workflow:

We now finally achieve our desired outcome:

Control over style via prompts and embeddings
Control over pose via ControlNets
Control over identity via the FaceID IP Adapter and FaceDetailer

This setup allows us to generate precise and coherent Yoga sequences.

Another advantage of this workflow is how easily we can switch the base model. For instance, here’s an example using the Cheyenne model, which specializes in cartoon and graphic novel styles:

It’s also incredibly easy to change the subject’s identity. Since FaceID only requires a single image and no training phase, here are examples generated with the exact same workflow, using my own face as input for facial identity:

This concludes our three-part series. My initial goal — generating accurate yoga poses and full sequences using only a local machine — has been achieved.

In Part 1, we introduced Stable Diffusion and ComfyUI to build simple Text-to-Image workflows using prompts and embeddings. In Part 2, we explored pose transfer using Image-to-Image workflows and ControlNets. In this final installment, we addressed facial consistency, first with LoRAs, then with the FaceID IP Adapter and the post-processing FaceDetailer.

You’re now ready to create custom workflows tailored to your specific visual goals. Enjoy experimenting with generative AI to express your creativity with precision!

Stay tuned for more image generation tutorials and in the meantime, feel free to explore my YouTube channel for more content.

DEV Community: Technology at Worldline

Modern concurrency on the JVM: Coroutines and Loom

Introduction

Modern concurrency concepts

Lightweight threads

Structured concurrency

Kotlin coroutines

Java's virtual threads and structured concurrency

Coroutines vs Java modern concurrency

Conclusion

Links and references

Build a SciFi Novel with AI Spec-Driven Development

1. The Design Phase: Forging "Lore as Code"

2. The Harness: Framing AI with a Strict Operating System

3. Agile Writing: Sprints, Generation, and Pivots

4. Multi-Model Review and Quality Control

5. Build Pipeline: From IDE to Physical Book

6. Transmedia Extension: Multimodality, Cover Art, and Vibe Coding the ARG

Conclusion: The Author-Architect Paradigm

About the Author

Vibe coding level UP

First prompt:

Response

Second prompt

Second response:

Third prompt

Third response

My comments

Vibe Coding One Slice at a Time

1. The Mission: Complexity Check (The Boss Level)

The Four Domains of Pain

2. The Blueprint: Architecture & Tech Stack

The "ADR": The Architect's Save Game

3. The Methodology: Governance-Driven Development (GDD)

4. The Execution: A high-level Overview

Slice 1: The "Polymorphic" Database

Slice 2: The "Hybrid Brain"

Slice 3: The "Offline Printer"

5. The Architect's Flex: Automated C4 Verification

6. The AIOps Protocol: Monitoring the Machine

7. The "Oh S**t" Moment: The Hallucination Trap

Conclusion: The Architect's Verdict

Vibe Coding One Pixel at a Time

1. Context is King (The .md Anchors)

2. The Architecture: Letting the AI be CTO

3. The "Rig": Math is for Machines

4. Iteration: The "Yes, And..." Technique

5. The Pivot: Language as a Feature

6. The "Traceability" Hack

Conclusion

Vibe Coding One Page at a Time

What is "Vibe Coding"?

The Use Case: "I Just Want to Read Offline"

The Process: Galloping Toward Complexity

Step 1: The Naive Loop

Step 2: The Picture Book

Step 3: The Search for Meaning (OCR)

The Technical Deep Dive: The "PDF Sandwich"

What I Learned

Conclusion

The Ultimate LLM Inference Battle: vLLM vs. Ollama vs. ZML

The "Runtime Wars"

The Methodology: Why QSOS?

The Contenders

1. vLLM: The Data Center Standard

2. Ollama: The Developer's Best Friend

3. ZML (Zig Machine Learning): The Radical Challenger

Visualizing the Results

The Radar Chart: Feature Balance

The QSOS Quadrant: Market Position

The Consolidated Score Sheet

Conclusion

Choose vLLM if:

Choose Ollama if:

Choose ZML if:

Note on Methodology

Automating Image Generation with n8n and ComfyUI

Installation

Text-to-Image Workflow

Use Case

1. Context is King (The `.md` Anchors)