DEV Community

Adedoyinsola Ogungbesan


Scoring Goals Beyond the World Cup

The World Cup is heating up, with new stars emerging from every corner of the globe. Japan, Morocco and Cape Verde have been playing some of the best football we've seen in a long time.

While the countdown to the final in July continues, something else is quietly coming to an end in the coding world.

Gemini Code Assist's free GitHub code reviews are being sunset.

For those of us who leaned on the free tier, that's a bit of a shame.

My first thought wasn't, "What paid service do I replace it with?"

Instead it was:

How much of my engineering workflow can I push to open-source models instead?

The Problem

I've noticed something about myself.

Whenever Codex, Antigravity or another coding assistant generates code, I'm always too impatient to review it properly.

The code works.

The tests pass.

I have another idea waiting.

So I push.

That's usually when mistakes sneak in.

Rather than depending on my own discipline, I wanted the review to happen automatically inside my CI before I even opened a pull request.

Not because AI writes bad code.

Because software deserves another pair of eyes.

Dave Farley's Influence

I've spent a lot of time listening to Dave Farley over the last year.

One thing he keeps repeating is that quality shouldn't be inspected at the end.

It should be built into the process.

That idea stayed with me.

Rather than trying to review everything manually, I wondered if a lightweight reviewer could become just another part of my development pipeline.

Bigger Models Aren't Always the Answer

Around the same time I was working through Kaggle's five-day course on AI Agents.

One lesson that really stuck with me was that not every task needs the biggest model available.

Planning.

Architecture.

Design.

Those probably deserve stronger reasoning models.

Reviewing a pull request?

Checking test quality?

Looking for maintainability issues?

Maybe those can be delegated to much smaller models.

That completely changed how I started thinking about AI workflows.

Instead of asking:

What's the smartest model?

I started asking:

What's the smallest model that can do this job well enough?
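One way to make that question concrete is to sort candidate models by size and take the first one that clears a quality bar. This is only a sketch: the `Candidate` type, the model labels, the scores and the threshold are all made up for illustration, and a real `score` function would have to run an actual evaluation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str      # hypothetical label, not a real model ID
    size_b: float  # parameter count in billions (illustrative)


def smallest_adequate(
    candidates: Sequence[Candidate],
    score: Callable[[Candidate], float],
    threshold: float,
) -> Optional[Candidate]:
    """Return the smallest candidate whose score meets the threshold."""
    for candidate in sorted(candidates, key=lambda c: c.size_b):
        if score(candidate) >= threshold:
            return candidate
    return None  # nothing was good enough for this task


if __name__ == "__main__":
    models = [
        Candidate("large", 70.0),
        Candidate("small", 1.5),
        Candidate("medium", 7.0),
    ]
    # Made-up scores standing in for a real evaluation run.
    fake_scores = {"small": 0.62, "medium": 0.81, "large": 0.93}
    pick = smallest_adequate(models, lambda c: fake_scores[c.name], threshold=0.8)
    print(pick.name if pick else "none")
```

The threshold is where "well enough" gets decided, and it would likely differ per task: review comments can tolerate a lower bar than architecture decisions.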

Building the Reviewer

That question slowly became Silver-One.

The goal wasn't to replace frontier models.

The goal was to move as much repetitive engineering work as possible onto inexpensive open-source models, leaving the larger models for the work that genuinely benefits from them.

Using our replayable LLM cassette design, I started experimenting with small models like Qwen running directly inside CI.
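For readers who haven't met the cassette idea, here is a minimal sketch of what a record-and-replay layer could look like. It is not the actual Silver-One code: the `Cassette` class, the JSON file layout and the `call_model` callable are assumptions for illustration. The first time a prompt is seen, the real model is called and the answer is saved; after that, the saved answer is replayed.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class Cassette:
    """Record model responses once, then replay them from disk."""

    def __init__(self, path: Path, call_model: Callable[[str, str], str]):
        self.path = path
        # Hypothetical callable: (model name, prompt) -> response text.
        self.call_model = call_model
        # Load an existing tape if there is one, else start empty.
        self.tape = json.loads(path.read_text()) if path.exists() else {}

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash model + prompt so the same request always maps to one entry.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key not in self.tape:
            # Not recorded yet: call the real model and persist the answer.
            self.tape[key] = self.call_model(model, prompt)
            self.path.write_text(json.dumps(self.tape, indent=2, sort_keys=True))
        return self.tape[key]
```

With something like this in place, tests of the review pipeline can run against a recorded tape, so they stay deterministic and don't need a model at all.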

They're tiny.

They're imperfect.

But because they're inexpensive, I no longer hesitate to let them review every pull request.

That completely changes the economics.

Instead of worrying about token costs every time I push code, I can reserve my paid credits for planning, architecture and difficult reasoning while letting smaller models handle continuous review.

A Humbling Experience

After a few pull requests the feedback became surprisingly humbling.

Functions I thought were finished kept coming back with comments about maintainability, correctness and test quality.

It reminded me that generating code quickly isn't the same thing as building software that's easy to evolve.

Fast code generation is impressive.

Maintainable software is still hard.

Comparing Against Gemini

Before Gemini's free reviews disappear, I've had the chance to compare them against the reviewer I'm currently building.

Gemini is genuinely impressive.

It often finds the two or three issues that actually matter instead of producing a long list of generic advice.

My reviewer still has a long way to go.

But that's also the exciting part.

Because it's open source, I control the prompts, the scoring, the workflow and the evaluation.

Improving it becomes an engineering problem instead of waiting for another hosted service.
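As a rough illustration of what owning the scoring could mean, one option is to treat a trusted review as the reference and measure precision and recall of the reviewer's findings against it. This is an assumed approach, not how the comparison above was actually done, and it presumes each finding has already been normalized to a comparable label such as a (file, issue type) pair, which is the hard part in practice.

```python
from typing import Hashable, Set, Tuple


def precision_recall(
    found: Set[Hashable], reference: Set[Hashable]
) -> Tuple[float, float]:
    """Precision and recall of `found` measured against `reference`."""
    hits = len(found & reference)
    # Precision drops when the reviewer pads its output with extra findings.
    precision = hits / len(found) if found else 0.0
    # Recall drops when the reviewer misses issues the reference caught.
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

A reviewer that produces a long list of generic advice would show up here as high recall with low precision, which gives a number to improve instead of a feeling.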

The goal was never to replace Gemini.

It was to learn from it.

One Lesson I'll Probably Keep

The biggest thing I've learned isn't about models.

It's about workflows.

Sometimes the biggest improvement doesn't come from using a larger model.

Sometimes it comes from designing a better engineering process.

If I can push repetitive review work onto small open-source models while saving larger models for planning and architecture, then I think that's a workflow worth building.

Hopefully by the time the World Cup reaches its final whistle, I'll have scored a few goals of my own—not on the football pitch, but inside my CI pipeline.

Thanks to Dave Farley and everyone working to make software engineering a little more disciplined.

It's been a fun experiment so far, and I think it's only getting started.
