DEV Community: Pablo Werlang

Why I Built an Open-Source Online Judge Instead of Maintaining a Legacy One

Pablo Werlang — Thu, 04 Jun 2026 02:33:44 +0000

For years, IFSul – Campus Charqueadas, a federal institute in southern Brazil, has hosted programming marathons for both high school and undergraduate students.

Like many educational institutions, our goal has always been straightforward: create opportunities for students to develop problem-solving skills, learn programming, and experience the challenge and excitement of competitive programming.

What most participants never see, however, is everything happening behind the scenes.

This is the story of how our contests evolved from manual grading to automated judging, why I decided to build a new platform instead of maintaining a legacy one, and how that project eventually became both an open-source judge system and an online competitive programming platform.

The Era of Manual Grading

When our programming marathons began, there was no automated judging system.

Teams would solve problems, submit their source code, and wait for organizers to evaluate the solutions manually.

The process was exactly as tedious as it sounds.

An organizer would collect the source file, compile it, execute it against test cases, compare the results with the expected output, and then determine whether the solution should be accepted.

For small competitions, this approach was manageable.

As participation grew, it quickly became clear that it would not scale.

If we wanted our competitions to continue growing, we needed automation.

Enter BOCA

To solve this problem, a colleague introduced BOCA into our competitions.

For those unfamiliar with it, BOCA is a contest management and judging system widely used in Brazil, including in competitions associated with the Brazilian Computer Society (SBC). It has been around for many years and has played an important role in the Brazilian competitive programming ecosystem.

For us, it represented a major improvement.

Instead of managing teams, submissions, and scoreboards manually, we now had a dedicated platform built specifically for programming contests.

The administrative side became significantly easier.

The judging side, however, was a different story.

While BOCA supports automatic judging, configuring and maintaining that infrastructure was never particularly straightforward. The platform reflects design decisions from a different era of software development, and getting everything configured correctly required navigating a considerable amount of documentation, scripts, permissions, and server-specific configuration.

In practice, automatic judging frequently failed at exactly the wrong moment: during the contest itself.

As a result, we often used BOCA primarily as a contest management tool while much of the actual judging continued to happen manually.

It worked.

But it wasn't the experience we wanted.

The Moment Everything Changed

Eventually, responsibility for running the programming marathon landed on my desk.

There was no crisis.

No catastrophic outage.

No dramatic handover.

The contest was still running, and BOCA was still there.

The problem was much simpler.

I knew almost nothing about administering BOCA.

So I did what any developer would do: I started reading documentation, installation guides, configuration files, and forum discussions.

The deeper I went, the less I wanted to become a BOCA expert.

To be clear, BOCA has served the competitive programming community well for decades.

But I had little desire to spend my time learning the intricacies of a large legacy system with a configuration process that felt unnecessarily complicated and difficult to maintain.

At some point, I stopped asking myself:

"How do I configure BOCA correctly?"

And started asking:

"If I were building a contest platform today, what would it look like?"

That question eventually became AutoJudge.

Starting From Scratch

When I decided to build AutoJudge, my objective wasn't simply to replace BOCA.

I wanted to solve the problems I personally experienced while preparing competitions.

Before writing a single line of code, I established a few principles:

Deployment should be simple and reproducible.
Contest setup should not require extensive system administration knowledge.
The interface should be friendly for both organizers and competitors.
The judging infrastructure should be reliable enough that organizers can trust it during live events.
Educational institutions should be able to self-host the platform without fighting the infrastructure.

These principles guided every decision that followed.

AutoJudge began as a side project aimed at improving our local competitions.

What started as a tool built for a specific need gradually evolved into an open-source project that other institutions can also use and adapt to their own contests.

Building the Judge

At its core, AutoJudge consists of three major components:

A web application used by participants and administrators.
An API responsible for contest management.
A judging system responsible for evaluating submissions.

One of the most important parts of any judge system is executing untrusted code safely.

Every submission received by the platform is code written by a participant. By definition, the system must execute programs that it did not create and cannot fully trust.

To address this, AutoJudge executes submissions inside isolated containerized environments.

When a solution is submitted:

The submission enters the judging queue.
The source code is prepared for execution.
An isolated execution environment is created.
Resource limits are applied.
The solution is compiled and executed.
The output is compared against the expected results.
The execution environment is discarded.

Containerization is not a guarantee of perfect security, nor should it be considered one.

Its purpose is to provide a practical isolation boundary that reduces risk while remaining manageable for educational institutions.

For our use case, it offered a good balance between safety, reliability, and operational simplicity.
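
To make the steps above more concrete, here is a minimal sketch of the "apply limits, run, compare" part. It is not AutoJudge's actual code: it assumes Docker as the container runtime, and the image name, the default limits, the `/sandbox/run.sh` entry script, and the helper names `build_run_command` and `verdict` are all hypothetical. The `docker run` flags themselves are standard.

```python
from pathlib import Path


def build_run_command(workdir: Path, image: str, memory_mb: int = 256,
                      cpus: float = 1.0, pids: int = 64) -> list[str]:
    """Build a `docker run` invocation for one submission (hypothetical helper)."""
    return [
        "docker", "run",
        "--rm",                           # discard the container when it exits
        "--network", "none",              # untrusted code gets no network access
        "--memory", f"{memory_mb}m",      # hard memory cap
        "--cpus", str(cpus),              # CPU cap
        "--pids-limit", str(pids),        # guards against fork bombs
        "-v", f"{workdir}:/sandbox:ro",   # prepared submission, mounted read-only
        "-i",                             # keep stdin open so test input can be piped in
        image,
        "/sandbox/run.sh",                # assumed entry script inside the workdir
    ]


def verdict(actual: str, expected: str) -> str:
    """Compare outputs, ignoring trailing whitespace per line and at the end."""
    def normalize(text: str) -> list[str]:
        return [line.rstrip() for line in text.rstrip().splitlines()]

    return "ACCEPTED" if normalize(actual) == normalize(expected) else "WRONG_ANSWER"


cmd = build_run_command(Path("/tmp/submission-1"), "judge-python:latest")
print(" ".join(cmd))
print(verdict("42 \n", "42"))
```

In a real judge the command would be passed to something like `subprocess.run(cmd, input=test_input, capture_output=True, text=True, timeout=...)`. A wall-clock timeout there stops the client process, so the container itself may still need to be stopped separately.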

Adopting AutoJudge

After enough development and testing, we decided as a group to adopt AutoJudge in our programming marathons.

The difference was immediate.

Instead of worrying about fragile contest infrastructure, we could focus on creating better competitions.

Creating and managing problems became considerably easier thanks to the web-based interface.

The judging process became more reliable.

The interface was significantly more approachable for students.

Most importantly, organizers could spend less time fighting the platform and more time running the event.

That was the entire point.

Why I Open-Sourced It

As we continued using AutoJudge, I started thinking about a simple question.

If we had struggled with these problems, how many other schools and universities were facing the same situation?

Many educational institutions want to host programming contests.

Many instructors want to introduce competitive programming to their students.

Many communities want to organize local events.

What they often lack is the time and infrastructure expertise required to maintain complex systems.

That realization is what motivated me to open-source the project.

Today, the judge, API, and web platform are all available publicly.

My hope is that other institutions can benefit from the same tools we built for ourselves.

Repository:

github.com/werlang/autojudge

Beyond Self-Hosting

The open-source platform solved our institutional problem.

But another idea kept coming back.

What about everyone who doesn't want to host anything?

What about students who simply want to practice?

What about teachers who want to create contests without provisioning servers, configuring databases, managing updates, or maintaining infrastructure?

Those questions eventually led to autojudge.io.

Introducing autojudge.io

autojudge.io builds on the same ideas behind AutoJudge while removing the need for self-hosted infrastructure.

Instead of downloading, deploying, configuring, and maintaining the platform yourself, you can simply create an account and start using it.

Students can:

Solve public problems.
Track their progress.
Compare themselves against other competitors.
Participate in contests hosted by institutions and communities.

Educators and organizers can:

Create contests.
Manage participants.
Publish problem sets.
Run competitions without worrying about servers or deployment.

One feature that quickly became central to the platform is the integrated web-based code editor.

Participants can read a problem statement, write code directly in the browser, execute their solution, test it against custom inputs, and submit it to the official judging queue without ever leaving the platform.

The editor provides a complete workflow for competitive programming:

Read the problem statement.
Write code.
Run and debug solutions.
Test against sample or custom cases.
Submit to the judge.
Review verdicts and execution results.

This makes the platform particularly accessible for newcomers, who can start solving problems immediately without installing compilers, configuring IDEs, or setting up local development environments.

For experienced competitors, it provides a convenient environment for quick practice sessions and online contests.

The goal is not to replace traditional development tools.

The goal is to reduce friction and make participation as straightforward as possible.

Some institutions prefer complete control and self-hosting through the open-source AutoJudge platform.

Others prefer a zero-configuration solution that is ready to use immediately.

I wanted to support both.

What's Next?

AutoJudge started because I didn't want to spend my time learning how to maintain a legacy contest platform.

I wanted something simpler.

Something easier to deploy.

Something easier to understand.

Something designed around the needs of students and educators rather than the constraints of decades-old infrastructure.

What began as a practical solution for a local programming marathon eventually evolved into an open-source project and an online platform used beyond our campus.

There is still a lot to improve.

The judging infrastructure continues to evolve.

New features continue to be developed.

And there are countless ideas still waiting to be implemented.

But the core mission remains unchanged:

Make programming contests easier to organize and more accessible to everyone.

If you're involved in programming education, competitive programming, or educational technology, I'd love to hear your experiences.

How does your institution handle programming contests?

What tools are you using today?

And have you ever looked at a legacy system and thought:

"I could probably build this myself."

Because that's exactly how AutoJudge started.

About Sharing Local Inference: A Marketplace for Renting Idle GPUs with an OpenAI-Compatible Backend

Pablo Werlang — Mon, 04 May 2026 20:42:09 +0000

The last year of AI tooling has felt weirdly split in two.

On one side, frontier cloud models are still impressive, still useful, and still setting the pace for a lot of the industry. On the other side, they are getting harder to treat like stable infrastructure. Prices move up, limits get tighter, availability gets noisier, and the feeling of building on top of someone else's quota policy keeps getting stronger.

At the same time, the supply side is changing fast.

Chinese model labs and open-weight ecosystems are shipping at a pace that would have felt unrealistic not long ago. The gap with the biggest frontier models is still real, but for a lot of practical tasks it is getting smaller, and sometimes smaller much faster than the market narrative suggests. That matters because once the quality floor rises enough, the whole question changes from “who owns the smartest model?” to “who can serve good-enough intelligence cheaply, reliably, and close to the user?”

That shift is why more people are buying GPUs for local LLM use.

Some are doing it for privacy. Some want predictable costs. Some care about latency. Some just want control over the stack instead of depending on a remote platform that can change the rules overnight. And once those GPUs exist, a second question shows up almost immediately: what happens when they sit idle?

That question is what pulled me into this project.

I was not trying to make a polished product pitch out of it. I wanted to see what happened if I treated that question as an actual backend design problem.

Why this started to feel worth building

Three things seem to be converging at once.

First, frontier cloud APIs are becoming harder to treat like boring infrastructure. Prices move, limits tighten, regional availability changes, and a lot of teams are discovering that “just call the best hosted model” is not as stable a default as it looked a year ago.

Second, the supply side is changing. Chinese labs and open-weight ecosystems are shipping fast, and the quality curve is rising quickly enough that for many practical tasks the question is no longer only “which model is smartest?” but also “which model is good enough at the best operational cost?”

Third, a lot more people now own GPUs than they used to. Some bought them for privacy. Some for latency. Some for predictable cost. Some because they want to run agents and workflows locally without asking permission from a remote platform every five minutes. Once those GPUs exist, one obvious systems question appears:

How do we coordinate idle capacity?

The idea

I wanted to explore a simple premise: if people already own GPUs for local inference, why not let them rent out idle capacity to other developers through a marketplace?

Not a vague “decentralized AI” slogan. A concrete backend structure:

workers connect and advertise model capacity
consumers send requests through an OpenAI-compatible API
the platform matches demand to supply
responses stream back in real time
usage gets settled after the job completes

That became LocalLMarket: a peer-to-peer marketplace for LLM compute where GPU owners can publish an offer and API users can buy inference from the available pool.

The goal was not to pretend this is solved. The goal was to build a working backend structure that lets me experiment with the idea in a serious way, and show the developer community both the possibilities and the obstacles.

This repository is exactly that: a working backend for testing the concept, not a production-ready marketplace.

The way I ended up thinking about it

Once I got past the vague “decentralized AI” framing, the problem became much easier to reason about.

It stopped looking like a grand vision and started looking like a handful of pretty ordinary backend concerns:

discovery: which workers exist and what do they offer?
matching: which worker should handle this request?
relay: how do tokens get streamed back to the caller?
settlement: who pays whom, and when?
trust: how do you stop the whole thing from becoming nonsense?

That framing is what led me to build LocalLMarket.

Not as a finished startup. Not as “Uber for GPUs,” which is the kind of phrase that should make everyone a little nervous. As a working backend structure for experimenting with the concept and seeing where the real engineering friction actually is.

A minimal architecture for this kind of system

The current repo implements a pretty opinionated split:

an API service owns the public HTTP surface, authentication, worker selection, stream relay, and settlement
worker processes connect outward over WebSocket, advertise model capacity, receive jobs, run inference, and stream results back

That shape matters.

Instead of exposing a public HTTP server on every worker node, the system keeps the control plane in one place and treats workers more like queue consumers. That simplifies the first version of the problem: auth, accounting, and routing stay centralized while compute stays distributed.

Here is the abstract flow:

Consumer app / agent
        |
        | OpenAI-compatible request
        v
API service
  - authenticate user
  - apply pricing/throughput constraints
  - choose worker
  - create order record
        |
        | WebSocket job dispatch
        v
Worker node
  - run model
  - stream chunks back
        |
        | SSE relay
        v
Consumer app / agent

Why the OpenAI-compatible part matters more than it sounds

One decision I like here is using an OpenAI-compatible API surface.

This is not just about convenience. It is about lowering integration resistance.

If a local compute marketplace speaks the same language most tooling already expects, it can drop into existing applications and almost any agentic workflow with very little ceremony. You are not asking developers to rebuild orchestration just to try a different supply layer.

In practice, the mental shift becomes:

"What if I changed the base URL and the backend supply model, but kept the rest of my app or agent stack basically the same?"

That mattered a lot to me while building this.

An agent loop, internal tool runner, or multi-step workflow can keep using the same chat completion pattern while routing requests through a marketplace-backed control plane instead of a single centralized vendor.

For example:

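from openai import OpenAI

# Same client as for any OpenAI-compatible backend; only the base URL changes.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="http://localhost",  # the marketplace API service; adjust to your deployment
)

response = client.chat.completions.create(
    model="qwen-or-llama-worker-pool",
    messages=[
        {"role": "system", "content": "You are an agent planner."},
        {"role": "user", "content": "Summarize this incident report."},
    ],
    stream=True,
)

# With stream=True the call returns an iterator of chunks, not a full message.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)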

That same pattern works whether the caller is a chatbot, an internal automation service, or an agentic workflow coordinating multiple model calls. The point is not that OpenAI is the only interface that matters. The point is that compatibility turns a weird infrastructure experiment into a very small application change.

What the project turned into, technically

What fell out of this pretty quickly was that the idea is less about “building a marketplace” and more about stitching together four backend problems, with trust as a fifth that I come back to in the limitations section.

1. Worker registration

Workers need to identify themselves, declare which model they can serve, expose pricing and throughput information, and maintain a live session with the control plane.

Conceptually, the worker advertises an offer like this:

{
  "workerId": "gpu-node-17",
  "model": "qwen2.5-32b",
  "price": 0.40,
  "tps": 52,
  "status": "available"
}

The exact fields are less important than the shape: you need a registry of who can do what, at what cost, and whether they are actually online.
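
As a rough illustration of that shape, here is a minimal in-memory registry sketch. It is an assumption about how such a registry could look, not the repository's implementation: the `WorkerOffer` and `WorkerRegistry` names are hypothetical, the unit of `price` is not specified in the offer above, and `tps` is presumed to mean tokens per second.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerOffer:
    worker_id: str
    model: str
    price: float  # pricing unit is not specified in the post
    tps: int      # presumably tokens per second
    status: str

    @classmethod
    def from_json(cls, raw: str) -> "WorkerOffer":
        """Parse the offer payload a worker sends when it connects."""
        data = json.loads(raw)
        return cls(
            worker_id=data["workerId"],
            model=data["model"],
            price=float(data["price"]),
            tps=int(data["tps"]),
            status=data["status"],
        )


class WorkerRegistry:
    """In-memory registry keyed by worker id (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._offers: dict[str, WorkerOffer] = {}

    def register(self, offer: WorkerOffer) -> None:
        # A reconnecting worker simply overwrites its previous offer.
        self._offers[offer.worker_id] = offer

    def unregister(self, worker_id: str) -> None:
        # Would be called when the worker's WebSocket session closes.
        self._offers.pop(worker_id, None)

    def available(self, model: str) -> list[WorkerOffer]:
        return [offer for offer in self._offers.values()
                if offer.model == model and offer.status == "available"]


registry = WorkerRegistry()
registry.register(WorkerOffer.from_json(
    '{"workerId": "gpu-node-17", "model": "qwen2.5-32b", '
    '"price": 0.40, "tps": 52, "status": "available"}'
))
print([offer.worker_id for offer in registry.available("qwen2.5-32b")])
```

Tying removal to the session closing is what keeps "whether they are actually online" honest: an offer only exists while its connection does.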

2. Matching logic

Once a request arrives, the system needs to choose a worker. The current project keeps that intentionally simple: respect consumer constraints such as max price and minimum throughput, then prefer the cheapest suitable worker with throughput as a tie-breaker.

In pseudocode, the idea is basically:

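// Illustrative helper: returns the best worker, or undefined if none qualifies.
function selectWorker(workers, requestedModel, user) {
  const candidates = workers
    .filter((worker) => worker.model === requestedModel)
    .filter((worker) => worker.price <= user.maxPrice)
    .filter((worker) => worker.tps >= user.minTps)
    // Cheapest first; on equal price, higher throughput wins.
    .sort((left, right) => left.price - right.price || right.tps - left.tps);

  return candidates[0];
}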

That is enough to make the market legible before you start adding more sophisticated routing, reputation weighting, or dynamic pricing.

3. Stream relay

Once the worker starts generating output, the system has to relay chunks back to the caller in real time. While the worker is connected over WebSocket to the API service, the caller is usually expecting an HTTP response with a streaming body. That means the API service has to be a middleman for the token stream, which adds some complexity around backpressure, error handling, and connection management.
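
The relay idea can be sketched without any network code. In the example below, which is an assumption-laden illustration rather than the project's implementation, a bounded `asyncio.Queue` stands in for the bridge between the worker's WebSocket and the caller's HTTP response. The names `worker_feed`, `sse_relay`, and `demo` are hypothetical, and the payload is a trimmed-down version of an OpenAI-style chunk.

```python
import asyncio
import json

DONE = object()  # sentinel marking the end of a worker's stream


async def worker_feed(queue: asyncio.Queue, chunks: list[str]) -> None:
    """Stand-in for the WebSocket side: pushes chunks as the worker produces them."""
    for text in chunks:
        # put() waits while the queue is full. That wait is the backpressure:
        # a slow HTTP client limits how fast worker output is accepted.
        await queue.put(text)
    await queue.put(DONE)


async def sse_relay(queue: asyncio.Queue):
    """Yield Server-Sent Events frames for the streaming HTTP response body."""
    while True:
        item = await queue.get()
        if item is DONE:
            yield "data: [DONE]\n\n"  # end marker used by OpenAI-style streams
            return
        payload = {"choices": [{"delta": {"content": item}}]}
        yield f"data: {json.dumps(payload)}\n\n"


async def demo() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)  # small bound to exercise backpressure
    producer = asyncio.create_task(worker_feed(queue, ["Hel", "lo", "!"]))
    frames = [frame async for frame in sse_relay(queue)]
    await producer
    return frames


frames = asyncio.run(demo())
print("".join(frames), end="")
```

A real relay would also have to handle the cases this sketch ignores: the worker disconnecting mid-stream, and the caller closing the HTTP connection before generation finishes.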

4. Settlement

Billing in LLM systems is awkward because you often do not know the exact final cost until generation is complete. So the cleaner model is usually:

create an execution record when dispatch starts
compute actual cost when usage is known
debit the requester
credit the worker owner
keep platform fee logic explicit instead of magical

That is the pattern this backend uses.
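
A minimal sketch of that pattern, not the repository's code. Several things here are assumptions made for illustration: pricing per 1,000 tokens, a 10% platform fee, four-decimal ledger precision, and the `Order` and `settle` names.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

PRECISION = Decimal("0.0001")  # assumed ledger precision


@dataclass
class Order:
    """Execution record created when the job is dispatched."""
    requester: str
    worker_owner: str
    price_per_1k_tokens: Decimal  # assumed pricing unit
    status: str = "dispatched"
    cost: Decimal = Decimal("0")


def settle(order: Order, tokens_used: int, balances: dict[str, Decimal],
           fee_rate: Decimal = Decimal("0.10")) -> None:
    """Compute the actual cost once usage is known, then move the credits."""
    cost = (order.price_per_1k_tokens * tokens_used / 1000).quantize(
        PRECISION, rounding=ROUND_HALF_UP)
    fee = (cost * fee_rate).quantize(PRECISION, rounding=ROUND_HALF_UP)

    balances[order.requester] -= cost            # debit the requester
    balances[order.worker_owner] += cost - fee   # credit the worker owner
    balances["platform"] += fee                  # explicit platform fee
    order.cost = cost
    order.status = "settled"


balances = {"alice": Decimal("5"), "bob": Decimal("0"), "platform": Decimal("0")}
order = Order(requester="alice", worker_owner="bob",
              price_per_1k_tokens=Decimal("0.40"))
settle(order, tokens_used=2500, balances=balances)
print(order.status, order.cost, balances)
```

Using `Decimal` rather than floats keeps the three balance changes summing to exactly zero. The sketch leaves out what a real ledger needs: an insufficient-balance check before dispatch and a transaction around the three updates.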

The limitations are not side notes

This is the part that is still the hardest. If you are paying for remote inference, how do you know the results are real?

A marketplace for remote model execution is not just a routing problem. It is a trust problem wearing a routing costume.

You cannot robustly verify worker execution yet

If a worker says it ran a given model, there is no built-in proof that it actually did. No trusted execution environment. No strong attestation. No cryptographic proof of faithful execution. That is a major unresolved problem, not an implementation detail.

When the worker gives you back tokens, you have no strong guarantee they came from a real model running on a GPU instead of a different and cheaper model, a local cache, or even a random token generator. That is a fundamental trust issue that any open marketplace has to grapple with.

If the workers are all running in a shared physical environment you control, that is less of an issue. But if the whole point is to let anyone rent out their GPU, it becomes a real problem.

Reputation is still weak

Uptime and request count are better than nothing, but only slightly. A real market would need stronger feedback loops, better failure accounting, dispute handling, and probably model-specific trust signals.

This is not production-ready

That is deliberate. The repository is a working backend structure for experimenting with the idea, sharing the tradeoffs, and making the constraints visible to other developers. It is not pretending to be a finished marketplace product.

Why I think this is worth discussing now

I do not think the interesting future is just “everyone uses one hosted frontier API forever.”

Model capability is diffusing. Hardware ownership is diffusing. Agentic workflows are increasing demand for repeated, composable model calls. And once teams start caring more about cost control, locality, and infrastructure independence, alternative supply layers become much more interesting.

A local LLM marketplace is one possible response to that shift.

Maybe it becomes a serious category. Maybe it stays niche. Maybe the trust problem is harder than the market opportunity. All of those are plausible outcomes. But I think it is worth exploring in code rather than only in threads and hot takes.

Why I am sharing the repo anyway

Part of the fun of building something like this is that it forces a bunch of fuzzy industry arguments to become concrete.

You stop saying “decentralized inference” and start asking much more useful questions:

where should the control plane live?
how will workers register and stay authenticated?
how will you choose between price, latency, and throughput?
how will you make streaming reliable?
what trust model are you actually offering users?

Those questions are more valuable than the slogan.

If you want to see the implementation I used to explore them, the repo is here:

werlang / locallmarket

If you have built something adjacent, or if you think this architecture breaks in an important way, I would genuinely like feedback. The point of publishing this is not to advertise a product. It is to compare notes with other developers while this design space is still open.