<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stepan Kukharskiy</title>
    <description>The latest articles on DEV Community by Stepan Kukharskiy (@stepankukharskiy).</description>
    <link>https://dev.to/stepankukharskiy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F962116%2Ff7699320-fd1a-4e29-9698-aebe387d32f0.jpeg</url>
      <title>DEV Community: Stepan Kukharskiy</title>
      <link>https://dev.to/stepankukharskiy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stepankukharskiy"/>
    <language>en</language>
    <item>
      <title>So ComfyUI started as a solo side project 👀</title>
      <dc:creator>Stepan Kukharskiy</dc:creator>
      <pubDate>Thu, 30 Apr 2026 04:39:19 +0000</pubDate>
      <link>https://dev.to/stepankukharskiy/so-comfyui-started-as-a-solo-side-project-9bc</link>
      <guid>https://dev.to/stepankukharskiy/so-comfyui-started-as-a-solo-side-project-9bc</guid>
      <description>&lt;p&gt;In early 2023, a developer known as "comfyanonymous" was experimenting with Stable Diffusion. Existing tools were fine for simple text-to-image generation, but frustrating if you wanted to chain models together, mix different passes, or control the exact steps of the pipeline.&lt;/p&gt;

&lt;p&gt;So he built a completely new interface from scratch.&lt;br&gt;
He took a totally different direction:&lt;/p&gt;

&lt;p&gt;Instead of simple sliders, he built a visual, node-based graph.&lt;br&gt;
You didn't just type text; you explicitly wired together the math - models, noise, latents, and samplers.&lt;/p&gt;
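
&lt;p&gt;Under the hood, a workflow is just a typed graph. Here is a minimal sketch in the spirit of ComfyUI's JSON workflow format - node names and fields are simplified, not the exact schema:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Simplified sketch of a ComfyUI-style workflow graph.
# Node names and fields are illustrative, not the exact schema.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd15.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a lighthouse at dawn", "clip": ["1", 1]}},
    "3": {"class_type": "KSampler",  # edges are ["source_node", output_index]
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "seed": 42, "steps": 20}},
    "4": {"class_type": "VAEDecode",
          "inputs": {"samples": ["3", 0], "vae": ["1", 2]}},
}
&lt;/code&gt;&lt;/pre&gt;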

&lt;p&gt;It was harder to learn, but gave users unprecedented, repeatable control over their workflows.&lt;/p&gt;

&lt;p&gt;Gradually, through GitHub, Discord, Reddit, and YouTube, technical artists realized they finally had a true "node editor" for AI.&lt;/p&gt;

&lt;p&gt;The real magic was its modularity. Because the architecture was open and flexible, the community started building custom nodes. Today, whenever a new AI model or technique drops, someone builds a ComfyUI node for it within days.&lt;/p&gt;
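
&lt;p&gt;A custom node is a small Python class that ComfyUI picks up at startup. A minimal sketch following the documented pattern - the real input-type vocabulary is much richer than shown here:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal custom node sketch - a LATENT in ComfyUI is a dict holding a tensor.
class InvertLatent:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"samples": ("LATENT",)}}

    RETURN_TYPES = ("LATENT",)
    FUNCTION = "invert"
    CATEGORY = "latent/custom"

    def invert(self, samples):
        out = dict(samples)
        out["samples"] = -samples["samples"]  # flip the latent's sign
        return (out,)

# Registering the mapping is how ComfyUI discovers the node at startup.
NODE_CLASS_MAPPINGS = {"InvertLatent": InvertLatent}
&lt;/code&gt;&lt;/pre&gt;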

&lt;p&gt;As the tool became a staple for power users, researchers, and professional pipelines, it grew too big for a solo maintainer.&lt;/p&gt;

&lt;p&gt;But instead of closing the source code to make money, the maintainer took a different path. The project evolved into a formal company to support the ecosystem, keeping the core engine open-source while building cloud, enterprise, and collaboration services around it to ensure its future.&lt;/p&gt;

&lt;p&gt;It seems you don't always need to start with a pitch deck and a SaaS pricing tier. Sometimes the best path is to solve a deep technical pain point for power users, let an obsessed open-source community build an ecosystem around it, and figure out the business structure later.&lt;/p&gt;

&lt;p&gt;What are your thoughts on this approach?&lt;/p&gt;

&lt;p&gt;Image - Flux Schnell&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>beginners</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I never knew that, but Cursor was initially a CAD tool 🤪</title>
      <dc:creator>Stepan Kukharskiy</dc:creator>
      <pubDate>Thu, 30 Apr 2026 04:37:25 +0000</pubDate>
      <link>https://dev.to/stepankukharskiy/i-never-new-that-but-cursor-was-initially-a-cad-tool-5bof</link>
      <guid>https://dev.to/stepankukharskiy/i-never-new-that-but-cursor-was-initially-a-cad-tool-5bof</guid>
      <description>&lt;p&gt;Before building an AI code editor, the Cursor founders were working on a copilot for CAD. The idea was to help mechanical engineers in tools like SolidWorks and Fusion 360 by predicting the next geometry change while designing a part.&lt;/p&gt;

&lt;p&gt;They explored two directions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pure 3D approach.&lt;/li&gt;
&lt;li&gt;A text-based approach, where CAD actions were converted into sequences of method calls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That second path sounded clever, but it was very hard in practice. The model had to do more than predict the next action. It also had to mentally reconstruct the geometry from a sequence of operations, which is difficult because CAD kernels and 3D geometry are complex.&lt;/p&gt;
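
&lt;p&gt;To make that concrete, here is a toy version of such an action-to-text encoding. Every name is invented for illustration - this is not a real SolidWorks or Fusion 360 API:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy encoding of CAD actions as text - every name here is hypothetical.
history = [
    'sketch = create_sketch(plane="XY")',
    'sketch.rectangle(x=0, y=0, w=40, h=20)',
    'body = extrude(sketch, depth=5)',
    'fillet(body.top_edges(), radius=2)',
]

def predict_next(history):
    # Stand-in for the trained model: given the call sequence, propose the
    # next action. The hard part is that scoring a candidate well requires
    # knowing what geometry the sequence has implicitly built.
    return 'shell(body, thickness=1.5)'

print(predict_next(history))
&lt;/code&gt;&lt;/pre&gt;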

&lt;p&gt;The bigger issue was data. There was far less CAD data on the open internet than code, so training useful models was much harder. The science was also not ready yet: pretrained models were still weak for 3D tasks, and the team had to do a lot of scraping and data work just to improve performance.&lt;/p&gt;

&lt;p&gt;They did many interviews with CAD users, but later realized that interviews alone were not enough.&lt;/p&gt;

&lt;p&gt;What is interesting is that the project was not wasted. While working on CAD AI, they learned how to train large models, run inference at scale, and build infrastructure around behavior cloning and model deployment. Those lessons became very useful later.&lt;/p&gt;

&lt;p&gt;The real reason for the pivot was simple: they were more excited about coding than mechanical engineering. They were programmers themselves, believed AI would reshape software development, and decided to work on the domain they understood best.&lt;/p&gt;

&lt;p&gt;Well, sometimes the first idea (2, 3, 4, 5, 6, 7, 8, ha-ha) fails, but the skills, infrastructure, and clarity you gain from it become the foundation for the future. Yes?&lt;/p&gt;

&lt;p&gt;Image - Flux Schnell&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>beginners</category>
    </item>
    <item>
      <title>World Models?</title>
      <dc:creator>Stepan Kukharskiy</dc:creator>
      <pubDate>Fri, 24 Apr 2026 04:37:25 +0000</pubDate>
      <link>https://dev.to/stepankukharskiy/world-models-15ed</link>
      <guid>https://dev.to/stepankukharskiy/world-models-15ed</guid>
      <description>&lt;p&gt;World models are becoming the next real battleground in AI - not just chat, not just image generation, but systems that can simulate how environments behave, change, and respond.&lt;/p&gt;

&lt;p&gt;And the field is not converging on one idea; it is splitting into competing philosophies.&lt;/p&gt;

&lt;p&gt;LeCun's AMI Labs is betting that true intelligence will come from latent world models like JEPA, where the system predicts abstract structure instead of reconstructing every pixel, and AMI has already raised more than $1B to pursue that path.&lt;/p&gt;
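
&lt;p&gt;In code, the latent bet is easy to state: the prediction loss lives in embedding space rather than pixel space. A toy sketch with stand-in modules - not Meta's actual architecture:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch
import torch.nn as nn

enc = nn.Linear(784, 128)         # context encoder (toy stand-in)
target_enc = nn.Linear(784, 128)  # target encoder, often an EMA copy
predictor = nn.Linear(128, 128)

x_ctx, x_tgt = torch.randn(8, 784), torch.randn(8, 784)
z_pred = predictor(enc(x_ctx))           # predict the target's embedding
with torch.no_grad():
    z_tgt = target_enc(x_tgt)            # no gradients through the target
loss = ((z_pred - z_tgt) ** 2).mean()    # compare structure, not pixels
&lt;/code&gt;&lt;/pre&gt;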

&lt;p&gt;Runway and OpenAI Sora (RIP) represent another camp: generative world models that learn by predicting and rendering the world itself, with Runway now shipping GWM-1 variants for explorable worlds, robotics simulation, and avatars.&lt;/p&gt;

&lt;p&gt;Google DeepMind's Genie 3 pushes this even further toward real-time interactivity, generating navigable environments at 24 fps and letting users modify the world live with new prompts.&lt;/p&gt;

&lt;p&gt;World Labs, founded by Fei-Fei Li, is especially interesting because it is aiming at spatial intelligence more directly: generating full 3D scenes from a single image or prompt, with geometry, depth, and navigation built in.&lt;/p&gt;

&lt;p&gt;Then there is the code-based world model direction, where LLMs generate executable programs that simulate environments; in formal domains, research shows this can make planning 4–6 orders of magnitude faster than relying on neural rollouts alone.&lt;/p&gt;
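
&lt;p&gt;The appeal is easy to see in a toy example: once a model has emitted an explicit transition function, a planner can call it thousands of times for the cost of plain code instead of a neural forward pass per step. Everything below is invented for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from itertools import product

# Toy "code as world model": an explicit transition function a planner can
# call cheaply, instead of rolling out a large neural model per step.
def step(state, action):
    x, y = state
    dx, dy = {"up": (0, 1), "down": (0, -1),
              "left": (-1, 0), "right": (1, 0)}[action]
    return (x + dx, y + dy)

def plan(start, goal, depth=3):
    # Brute-force search over the simulated world - cheap because step()
    # is plain code.
    for actions in product(["up", "down", "left", "right"], repeat=depth):
        s = start
        for a in actions:
            s = step(s, a)
        if s == goal:
            return actions
    return None

print(plan((0, 0), (2, 1)))  # prints ('up', 'right', 'right')
&lt;/code&gt;&lt;/pre&gt;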

&lt;p&gt;To me, this is the important shift: AI is moving from describing the world to modeling the world.&lt;/p&gt;

&lt;p&gt;And once a system can model a world, it can do much more than generate media - it can plan, test actions, reason about consequences, and eventually become a real design or robotics engine.&lt;/p&gt;

&lt;p&gt;My bet is that there will not be one dominant world model architecture.&lt;/p&gt;

&lt;p&gt;We’ll likely end up with different stacks for different needs: latent models for abstraction and planning, video models for realism, 3D models for spatial interaction, and code-based models for precision and control.&lt;/p&gt;

&lt;p&gt;For anyone building in design, robotics, games, or spatial computing, this feels like the beginning of a new foundational layer - not just models that generate outputs, but models that can simulate possibility.&lt;/p&gt;

&lt;p&gt;The companies that matter in the next wave of AI may not be the ones with the best chatbot. They may be the ones that build the best simulation layer for reality.&lt;/p&gt;

&lt;p&gt;It also makes me wonder whether systems like Spellshape - which turn intent into structured modeling briefs, executable spatial actions, and editable 3D outcomes - are an early form of a design world model.  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>gamedev</category>
    </item>
    <item>
      <title>Style as a Look vs Style as a Way of Knowing 🤔</title>
      <dc:creator>Stepan Kukharskiy</dc:creator>
      <pubDate>Thu, 23 Apr 2026 13:12:28 +0000</pubDate>
      <link>https://dev.to/stepankukharskiy/style-as-a-look-vs-style-as-a-way-of-knowing-14mn</link>
      <guid>https://dev.to/stepankukharskiy/style-as-a-look-vs-style-as-a-way-of-knowing-14mn</guid>
      <description>&lt;p&gt;Most conversations about AI and art treat style as something visible. Kazuo Iwamura reminds us that style can also be a way of knowing. &lt;/p&gt;

&lt;p&gt;I’ve been thinking about Iwamura, the Japanese picture-book author and illustrator best known for the 14 Forest Mice books, and what makes his work feel so enduring. For me it is the method underneath them. &lt;/p&gt;

&lt;p&gt;In an interview, Iwamura said that even after art school he continued to study plants and animals very closely, trying to depict the inner “life” that cannot be seen from the outside. That idea explains a lot about his visual language. His images are simplified, but they do not feel generic. They are gentle, but not vague. The calm in them seems to come from observation, selection, and restraint rather than from decorative sweetness alone.&lt;/p&gt;

&lt;p&gt;You can feel this in the way he places small creatures inside larger living environments: trees, grasses, weather, nests, paths, and seasonal change. The animals are anthropomorphic, but the world around them still feels attentively seen and ecologically grounded.&lt;/p&gt;

&lt;p&gt;His process also appears deeply temporal. In the same interview, he spoke about ideas, sketches, and finished illustrations unfolding across the seasons, and his books repeatedly use seasonal transition as part of their emotional structure. He also named artists such as Leo Lionni, Marie Hall Ets, Felix Hoffmann, and Beatrix Potter as influences, especially books in which pictures carry the story. That matters, because he was not merely illustrating narratives about nature. He was building meaning through composition, pacing, gesture, and environment. &lt;/p&gt;

&lt;p&gt;This is why I think Iwamura matters so much now. In the AI age, style is often reduced to a visual signature: palette, texture, softness, atmosphere. But Iwamura points toward something deeper: not “how do I make this look natural?” but “how do I observe the world closely enough that form and feeling emerge from that relationship?” He also believed children need not only good picture books, but direct experience of the natural world itself, a belief he carried into the museum he opened in Tochigi in 1998. &lt;/p&gt;

&lt;p&gt;That is a powerful lens for design, architecture, and generative systems too. AI can already imitate style as a visual signature. What it still struggles with is style as compressed perception - an approach to the world built from attention, selection, and lived observation. That’s why Iwamura still matters.&lt;/p&gt;

&lt;p&gt;Image: Spellshape - an AI agent that generates 3D models you can edit later.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gamedev</category>
      <category>showdev</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
