DEV Community

Vinicius Carvalho

How I started Inference4j - an open source ML project in Java with the help of Claude

Introduction

A few weeks ago, I was researching how to run model inference in Java without integrating with an external API or loading a Python sidecar.

I stumbled upon ONNX Runtime and was immediately hooked: a portable binary format for executing a self-contained neural network model.

Looking at benchmarks, I was even more impressed — ONNX Runtime inference seems to outperform PyTorch on most models I've tested.

There's even a nice Java wrapper.

However, it still requires users to understand tensor shapes, pre-processing steps, and post-processing steps.
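To make that concrete, here is roughly what the manual pre-processing for a typical image model looks like — a minimal sketch of my own, assuming a 224×224 input, NCHW layout, and ImageNet normalization (all assumptions for illustration, not Inference4j code):

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class ManualPreprocess {
    // Typical ImageNet mean/std; the correct values depend on the model.
    static final float[] MEAN = {0.485f, 0.456f, 0.406f};
    static final float[] STD  = {0.229f, 0.224f, 0.225f};

    /** Resize to 224x224 and lay pixels out as a [1, 3, 224, 224] float tensor (NCHW). */
    public static float[] toTensor(BufferedImage src) {
        BufferedImage img = new BufferedImage(224, 224, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        // Picking the wrong interpolation here silently degrades accuracy.
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, 224, 224, null);
        g.dispose();

        float[] tensor = new float[3 * 224 * 224];
        for (int y = 0; y < 224; y++) {
            for (int x = 0; x < 224; x++) {
                int rgb = img.getRGB(x, y);
                for (int c = 0; c < 3; c++) {
                    int value = (rgb >> (16 - 8 * c)) & 0xFF;  // R, G, B
                    tensor[c * 224 * 224 + y * 224 + x] =
                        (value / 255f - MEAN[c]) / STD[c];     // channel-first layout
                }
            }
        }
        return tensor;
    }
}
```

That's a lot of ceremony before the model even runs — and every model family wants its own variant of it.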

Inference4j — Three Lines of Code Inference

So I thought, why not make it simple for Java developers and let them approach it from a use-case perspective?

Inference4j was born.

I'll spare you the details on how it works — I hope the docs make that clear enough.

The goal was simple:

  • Make inference a user-driven task. ImageClassifier, TextGenerator, ObjectDetector — these are the contracts users should interface with.
  • Run locally, not as a wrapper for API services.
  • Support ONNX's vast library of models.
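As a sketch of what a use-case-first contract could look like (my own illustration — the method names and types here are hypothetical, not Inference4j's actual API):

```java
import java.awt.image.BufferedImage;
import java.util.List;

/** A single prediction: a human-readable label plus a confidence score. */
record Classification(String label, float confidence) {}

/** The use-case contract: callers think in images and labels, not tensors. */
interface ImageClassifier {
    /** Returns the top-k most likely classes for the image. */
    List<Classification> classify(BufferedImage image, int topK);
}
```

The point is that nothing in the contract mentions tensors, shapes, or scaling algorithms — those stay behind the interface.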

Leveraging AI to Write an ML Framework

With this task at hand, I started writing some of the interfaces. I knew I wanted operations to live at a use-case level, so I crafted the main interfaces first.

Next, I figured the lib should provide a pipeline interface that wraps things for users: (pre-process) → Inference → (post-process). Users shouldn't really care about what's under the hood — like how you need to pick the right scaling algorithm (Bicubic or Bilinear) for an image model. But at the same time, it had to be flexible and structured enough so that contributors could add more use cases via new models and pre/post processors.
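The shape of that pipeline can be sketched with generics — again my own minimal illustration of (pre-process) → Inference → (post-process), not the library's actual types:

```java
import java.util.function.Function;

/** Wraps pre-processing, model inference, and post-processing into one call. */
final class Pipeline<IN, TIN, TOUT, OUT> {
    private final Function<IN, TIN> preprocess;    // e.g. image -> float tensor
    private final Function<TIN, TOUT> infer;       // e.g. an ONNX session run
    private final Function<TOUT, OUT> postprocess; // e.g. logits -> labels

    Pipeline(Function<IN, TIN> pre, Function<TIN, TOUT> infer, Function<TOUT, OUT> post) {
        this.preprocess = pre;
        this.infer = infer;
        this.postprocess = post;
    }

    OUT run(IN input) {
        return postprocess.apply(infer.apply(preprocess.apply(input)));
    }
}
```

Under a design like this, a contributor adds a new use case by supplying a new pre/post-processor pair around an existing inference step, without touching the rest of the machinery.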

When it came to actually implementing all of this, well, my time is really limited these days. That task would have probably set me back a few weeks, so instead I decided to use Claude to help me out.

Not hiding it, not trying to pretend I didn't use it — it was a massive accelerator for me, especially on tasks like configuring GitHub Actions or writing Python scripts (not my main language of choice). Those were created in minutes, not days.

Now, it made a ton of mistakes and needed a lot of gentle reminders not to violate good design principles or skip good software practices, but in most cases it was a lifesaver. The project would not have started without it.

I'd say I probably wrote about 35% of the code. All interfaces, design choices, and model selection were my inputs, but the bulk of implementation was Claude Code.

Don't Vibe Code, Learn From It

Now this is my biggest take, and also the most important part of this blog post.

I relied on Claude to help me write something like 90% of the tokenizer code. But in doing so, I also debugged each one of the tokenizers, learned about different strategies, and understood why each model chooses a different tokenizer for its encoder/decoder phase.
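To give a flavor of what those strategies differ on, here is a toy greedy longest-match tokenizer in the WordPiece style — a deliberately simplified sketch with a made-up vocabulary (real tokenizers add normalization, byte-level fallbacks, and special tokens):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ToyWordPiece {
    /** Greedy longest-match: repeatedly take the longest known piece from the left. */
    static List<String> tokenize(String word, Set<String> vocab) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            while (end > start) {
                String piece = word.substring(start, end);
                if (start > 0) piece = "##" + piece;        // continuation marker
                if (vocab.contains(piece)) { match = piece; break; }
                end--;                                       // try a shorter piece
            }
            if (match == null) return List.of("[UNK]");      // no sub-piece found
            tokens.add(match);
            start = end;
        }
        return tokens;
    }
}
```

A BPE tokenizer would instead merge character pairs by learned rank, and can produce different IDs for the exact same string — which is precisely the kind of mismatch I was chasing when comparing against the Python output.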

This was a gigantic learning experience — much richer than any Coursera course (and I've taken several courses in ML at this point), and much more insightful than just trying some Hugging Face model off the shelf. I was literally, at some points, debugging token IDs against the Python version to see if my tokenizer logic was correct.

Inverted Prompt Engineering

Early on I was using onnx-gen-ai for generative AI, but that became challenging: the runtime is still experimental, and I was maintaining a Java build that Microsoft still doesn't publish.

So instead of asking Claude to write the auto-regressive loop that generative models need in order to produce new tokens, I asked it to explain to me, step by step, how the generative loop works — the KV cache, the logits, the sampling — and I wrote it myself.
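For the curious, this is roughly the loop Claude walked me through — a toy greedy version where the "model" is stubbed out as a function and the KV cache appears only as a comment (a real implementation feeds cached key/value tensors back into the session each step):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class GenerativeLoop {
    static final int EOS = 2;  // end-of-sequence token id (toy value)

    /**
     * Greedy auto-regressive decoding: run the model on the sequence so far,
     * take the argmax of the logits, append it, repeat until EOS.
     * With a KV cache you would feed only the newest token each step and
     * reuse the attention keys/values computed on previous steps.
     */
    static List<Integer> generate(Function<List<Integer>, float[]> model,
                                  List<Integer> prompt, int maxNewTokens) {
        List<Integer> tokens = new ArrayList<>(prompt);
        for (int step = 0; step < maxNewTokens; step++) {
            float[] logits = model.apply(tokens);    // scores over the vocabulary
            int next = 0;
            for (int i = 1; i < logits.length; i++)  // argmax = greedy sampling
                if (logits[i] > logits[next]) next = i;
            if (next == EOS) break;                  // model chose to stop
            tokens.add(next);
        }
        return tokens;
    }
}
```

Swapping argmax for temperature or top-k sampling is a one-function change, which is what made writing this loop by hand such a good exercise.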

And that was not only very insightful, it was fun. It gave me a sense of fulfillment that sometimes I feel AI agents take away from us. The agent was telling me what to do, not the other way around, and in the process I was learning what happens inside a generative loop.

This post was not about the library — although I hope you give it a try, open an issue, fix a bug, or contribute with a new model — but about the journey so far. And I'm excited about how much more can be done in this new era of software engineering we live in.
