I got into image generation the normal way: Midjourney → “wow” → “how does this work?”
And then you try to run something locally and reality gently taps you on the shoulder with a brick:
- Python-first stacks
- CUDA assumptions
- dependency graphs shaped like spaghetti
- “works on my machine” energy… on someone else’s machine
If you’re on Apple Silicon (or any ARM box), you get the bonus level: “second-class citizen mode.”
Now add Java to the mix and the typical solution becomes:
“Run a Python server next to the JVM and pretend that’s fine.”
I’ve done it. It works. It also feels like you’re building a tiny distributed system just to generate a PNG.
So I tried a different approach:
Load the model into the JVM process. Call native inference directly. Keep lifecycle and memory under Java’s control.
No REST sidecar. No subprocess. No “hope the port is free.”
Just Quarkus + FLUX.2-klein-4B + Java FFM (Project Panama).
This is the teaser. The full tutorial (with all the sharp edges and the “why does this crash when it can’t find a shader file?” moments) is linked at the end.
The pitch: native inference as a first-class Java citizen
Most local-gen setups treat the model like an external service.
That’s fine until you care about:
- startup time (loading weights every request… nope)
- memory ownership
- concurrency limits
- predictable failure behavior
- “what happens when it segfaults?”
With FFM, the JVM can load a shared library and call C functions directly in-process.
That means the Java app owns:
- model lifecycle (load once, reuse)
- native memory boundaries (explicit arenas, explicit lifetimes)
- concurrency policy (single context, pooled contexts, whatever you decide)
Basically: the JVM stops being a client and becomes the host.
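To make "call C functions directly in-process" concrete, here's a minimal FFM sketch (not from the tutorial) that calls libc's `strlen` with no native code of our own. The same `Linker` / `downcallHandle` / `Arena` machinery is what the FLUX wrapper calls go through. Requires JDK 22+, where the FFM API is final:

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

// Minimal FFM demo: call a C function (libc's strlen) in-process, no JNI.
public class FfmDemo {
    public static long strlen(String s) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // Bind a Java MethodHandle to the native strlen symbol.
        MethodHandle strlen = linker.downcallHandle(
            linker.defaultLookup().find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        // Confined arena: native memory lives exactly as long as this call.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8 copy
            return (long) strlen.invoke(cString);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(strlen("hello")); // prints 5
    }
}
```

Same shape, bigger payload: swap `strlen` for a generation function and the JVM is hosting the model, not calling out to it.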
The core trick: don’t bind the whole C project
If you point jextract at a large C codebase and hope for the best, it will do what all great tools do:
It will teach you humility.
So the tutorial uses a pattern that keeps you sane:
- Compile FLUX into a shared library (`.dylib`/`.so`)
- Define a tiny wrapper header that exposes only what Java needs
- Generate bindings for that header
Your wrapper becomes the stable “native boundary”:
- `init(model_path)`
- `generate(ctx, prompt, output_path, width, height, steps, guidance, seed)`
- `free(ctx)`
No giant structs crossing the boundary. No leaking internal headers into Java-land. No “FFM archaeology.”
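As a sketch of what that boundary looks like from the Java side, here are hand-rolled downcall descriptors for the three functions. The names and signatures below are illustrative assumptions, not the tutorial's actual jextract output:

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

// Hypothetical bindings for the tiny wrapper boundary: three functions, flat arguments.
public final class FluxWrapper {
    // void* flux_wrapper_init(const char* model_path)
    static final FunctionDescriptor INIT =
        FunctionDescriptor.of(ValueLayout.ADDRESS, ValueLayout.ADDRESS);

    // int flux_wrapper_generate(void* ctx, const char* prompt, const char* output_path,
    //                           int width, int height, int steps, float guidance, long seed)
    static final FunctionDescriptor GENERATE =
        FunctionDescriptor.of(ValueLayout.JAVA_INT,
            ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.ADDRESS,
            ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.JAVA_INT,
            ValueLayout.JAVA_FLOAT, ValueLayout.JAVA_LONG);

    // void flux_wrapper_free(void* ctx)
    static final FunctionDescriptor FREE =
        FunctionDescriptor.ofVoid(ValueLayout.ADDRESS);

    final MethodHandle init, generate, free;

    FluxWrapper(SymbolLookup lib) { // e.g. SymbolLookup.libraryLookup(path, arena)
        Linker linker = Linker.nativeLinker();
        init     = linker.downcallHandle(lib.find("flux_wrapper_init").orElseThrow(), INIT);
        generate = linker.downcallHandle(lib.find("flux_wrapper_generate").orElseThrow(), GENERATE);
        free     = linker.downcallHandle(lib.find("flux_wrapper_free").orElseThrow(), FREE);
    }
}
```

Everything crossing the boundary is a pointer or a primitive, which is exactly what keeps the bindings boring.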
The other trick: the model is… large
The model download is about 16 GB.
That’s not a typo. That’s Tuesday.
Which is why the tutorial is very explicit about:
- loading once at startup
- keeping weights resident
- exposing a simple REST endpoint that queues work safely
Also: yes, CPU inference can be viable if you pick the right model and resolution. The point is not “make it instant.”
The point is “make it predictable.”
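One way to make "queues work safely" concrete: a single fair permit in front of the one resident native context, with a bounded wait so callers fail fast instead of stacking up. A sketch with illustrative names, not the tutorial's code:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Serializes access to a single native context: one generation at a time,
// bounded wait, everything else gets a fast, predictable failure.
public class GenerationGuard {
    private final Semaphore permit = new Semaphore(1, true); // fair: FIFO callers

    public String generate(String prompt, GeneratorFn fn) throws Exception {
        if (!permit.tryAcquire(30, TimeUnit.SECONDS)) {
            throw new TimeoutException("generator busy, try again later");
        }
        try {
            return fn.run(prompt); // the actual FFM downcall goes here
        } finally {
            permit.release();
        }
    }

    @FunctionalInterface
    public interface GeneratorFn { String run(String prompt) throws Exception; }
}
```

Swap the semaphore for a pool of contexts later if you want parallelism; the point is that the policy lives in Java, not in "whatever the native code tolerates."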
A tiny taste: what the Java side looks like
The Java binding layer is intentionally boring (which is the highest compliment in this area):
- allocate prompt + paths in a confined `Arena`
- call `flux_wrapper_generate(...)`
- return a file path
- keep the native context alive until shutdown
And before we touch native code, we validate everything on the Java side, because:
Native code doesn’t throw exceptions. It throws your JVM out the window.
So the record that validates request parameters is not “nice to have.” It’s protective gear.
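A sketch of such a record, with hypothetical limits; the compact constructor rejects anything suspicious before it gets near native memory:

```java
// Hypothetical request record: validation happens at construction,
// so an invalid request can never reach the native boundary.
public record GenerateRequest(
        String prompt, int width, int height, int steps, float guidance, long seed) {

    public GenerateRequest {
        if (prompt == null || prompt.isBlank())
            throw new IllegalArgumentException("prompt required");
        if (width < 64 || width > 2048 || width % 16 != 0)
            throw new IllegalArgumentException("width must be 64-2048, multiple of 16");
        if (height < 64 || height > 2048 || height % 16 != 0)
            throw new IllegalArgumentException("height must be 64-2048, multiple of 16");
        if (steps < 1 || steps > 100)
            throw new IllegalArgumentException("steps must be 1-100");
    }
}
```

An `IllegalArgumentException` is a stack trace. A bad pointer handed to native code is a core dump. Pick the exception.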
Why this is fun (and useful)
This isn’t about building a Midjourney competitor.
It’s about proving a bigger point:
- Java can host modern native AI workloads locally
- FFM is practical for real integrations
- Quarkus is a great “container” for native inference (startup/shutdown, config, REST)
And once you have a local image generator in Java, a lot of other ideas get… tempting.
The full tutorial (all commands + code)
The teaser ends here. The full build includes:
- compiling FLUX into a shared library for FFM
- creating a minimal `flux_wrapper.h`/`.c`
- generating bindings with `jextract`
- patching library loading for macOS quirks
- Quarkus service lifecycle (load once, reuse)
- REST API to generate + serve the image
- logs, timings, and “it finally worked!” payoff
👉 https://www.the-main-thread.com/p/java-local-image-generation-quarkus-ffm-flux
Warning: you may end up explaining to your team why your “Java service” now makes art.