How a Single IntelliJ Plugin Cut Our Code Review Rework by 60% — A 6-Month Honest Review
Developer's IntelliJ IDEA workspace with CodeRef's analysis report open in the bottom panel, showing zero critical issues on a Spring Boot service class after automatic refactoring.
I'm a backend engineer on a team of eight. We build microservices in Spring Boot, and like most Java teams, we use SonarQube in our CI pipeline to enforce code quality gates. It's a solid tool and we rely on it.
But there was always a gap in our workflow — the feedback only arrived after pushing code. I'd get a SonarQube report 12 minutes later telling me I'd introduced a cognitive complexity violation on line 47. By then I'd already moved on to the next ticket. The quality gate was working, but the feedback loop was slow.
Six months ago, a colleague dropped a link to CodeRef in our team Slack channel. "Try this, it catches stuff before you even commit." I installed it expecting another linter that I'd disable within a week.
I haven't disabled it. Here's why.
Week 1: The Instant Feedback Loop Changes Everything
The first thing that hit me was speed. I opened a service class I'd been working on, and within seconds the Report tab at the bottom of my IDE lit up with findings — not after a CI pipeline, not after a PR review, but right there while I was still writing the method.
CodeRef analysis appearing within seconds of opening a Java file, with the Report tab showing three findings: a cognitive complexity violation, a missing @Transactional annotation, and field injection flagged for constructor migration.
That first day, it caught three things in a class I was about to push:
- A @Transactional annotation on a private method — Spring proxies don't intercept private methods, so the annotation was doing absolutely nothing. This had been in production for two sprints. Traditional static analysis tools don't typically flag this because it's a framework-specific pattern that requires understanding Spring's proxy mechanism.
- An Optional.get() without an isPresent() check — I knew better, but I was moving fast and missed it. Classic.
- Field injection on three @Autowired fields — not a bug, but CodeRef flagged it with a clear explanation of why constructor injection is preferred for testability.
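That Optional.get() slip is worth a concrete picture. Here's a minimal sketch, with invented names rather than our actual service code, of the unsafe call and the fix I applied:

```java
import java.util.Optional;

public class OptionalDemo {
    // Unsafe: Optional.get() throws NoSuchElementException when the Optional is empty.
    static String unsafe(Optional<String> username) {
        return username.get();
    }

    // Safe: supply an explicit fallback instead of assuming a value is present.
    static String safe(Optional<String> username) {
        return username.orElse("anonymous");
    }

    public static void main(String[] args) {
        System.out.println(safe(Optional.of("alice"))); // alice
        System.out.println(safe(Optional.empty()));     // anonymous
    }
}
```

The orElse() version makes the empty case explicit instead of leaving it as a runtime surprise.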
What made this powerful was the timing. Having analysis run inside the IDE meant I fixed all three before committing. No CI round-trip. No PR comment. No context-switching back to code I wrote an hour ago. By the time our CI pipeline ran its quality gate, the code was already clean.
Week 3: The Auto-Fixers Saved Me Hours
I'll be honest — I almost ignored the "Refactored Code" tab for the first two weeks. I assumed it would be naive find-and-replace suggestions.
Then I had a 45-line method that CodeRef flagged for cognitive complexity (S3776). Out of curiosity, I clicked the tab. It had extracted two nested blocks into well-named private methods, preserved the logic perfectly, and presented the result as a clean diff.
I copied it over. It compiled. Tests passed. What would have been a 15-minute manual refactor took 30 seconds.
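To give a feel for the transformation, here's the shape of that extract-method refactor on a deliberately tiny example. The method names and thresholds are invented, not from our codebase: nested conditionals pulled into helpers named after the business rule they implement.

```java
public class DiscountService {
    // After the refactor: the top-level method reads as two named steps
    // instead of one block of nested conditionals.
    static int discountPercent(int orderTotal, boolean loyaltyMember) {
        int discount = baseDiscountPercent(orderTotal);
        return applyLoyaltyBonus(discount, loyaltyMember);
    }

    // Extracted helper: the tiered-discount conditional that used to be inlined.
    private static int baseDiscountPercent(int orderTotal) {
        if (orderTotal >= 500) return 10;
        if (orderTotal >= 100) return 5;
        return 0;
    }

    // Extracted helper: the loyalty branch that used to be nested inside it.
    private static int applyLoyaltyBonus(int discount, boolean loyaltyMember) {
        return loyaltyMember ? discount + 2 : discount;
    }
}
```

Each helper carries one decision, which is exactly what drops the cognitive complexity score.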
CodeRef's Refactored Code tab showing a side-by-side diff where a 45-line method with nested conditionals has been split into three focused methods, with the extracted methods named after their business logic.
Since then, the fixers I use constantly:
- Try-with-resources conversion — I have a codebase with legacy try-finally blocks everywhere. CodeRef converts them one click at a time. I've cleaned up about 30 so far during regular feature work, no dedicated refactoring sprint needed.
- Constructor injection migration — We decided as a team to move away from @Autowired field injection. Instead of a bulk find-and-replace that would break things, I let CodeRef migrate each class as I touch it. It adds the final field, creates the constructor parameter, and removes the annotation.
- String concatenation in loops — It found three StringBuilder opportunities in our batch processing code. The performance improvement was measurable in our metrics dashboard.
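The try-with-resources conversion is easiest to show side by side. A minimal sketch, with a StringReader standing in for our real I/O:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ResourceDemo {
    // Legacy shape: explicit finally-close, as in our older code.
    static String firstLineLegacy(String text) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            return reader.readLine();
        } finally {
            reader.close();
        }
    }

    // After the one-click conversion: the reader is closed automatically,
    // even if readLine() throws.
    static String firstLine(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        }
    }
}
```

Behavior is identical, but the try-with-resources form can't forget the close path.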
The key insight: the fixes aren't suggestions, they're working code. I review the diff, apply it, and move on. The plugin does the mechanical refactoring so I can focus on the logic.
Month 2: Test Generation Accelerated Our Coverage Push
Our team had a quarterly goal to get test coverage from 54% to 75%. Everyone was dreading the "write tests for existing code" phase.
CodeRef's test generation changed the math on that effort entirely.
For a @RestController with five endpoints, it generated a complete @WebMvcTest class with MockMvc setup, mock dependencies, and test methods for each mapping — including exception cases. Was it perfect? No. I adjusted assertions and added edge cases specific to our business logic. But the scaffolding was correct and the boilerplate was done.
CodeRef's Test Cases tab showing a generated @WebMvcTest class for an OrderController, with MockMvc injection, mocked OrderService, and test methods for GET /orders, POST /orders, and GET /orders/{id} including 404 handling.
For @Service classes, it set up MockitoExtension with the right @Mock and @InjectMocks fields by reading the constructor parameters. For our @Repository classes, it generated @DataJpaTest with TestEntityManager.
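The "reading the constructor parameters" part is less magic than it sounds. Here's a rough plain-reflection sketch of the idea; the classes are hypothetical and this is my guess at the general approach, not CodeRef's actual implementation:

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MockPlanner {
    // Hypothetical service with constructor-injected collaborators.
    static class OrderRepository {}
    static class PaymentClient {}
    static class OrderService {
        OrderService(OrderRepository repo, PaymentClient payments) {}
    }

    // The step a generator automates: inspect the constructor parameters
    // to decide which collaborators need @Mock fields in the test class.
    static List<String> mockCandidates(Class<?> serviceClass) {
        Constructor<?> ctor = serviceClass.getDeclaredConstructors()[0];
        return Arrays.stream(ctor.getParameterTypes())
                     .map(Class::getSimpleName)
                     .collect(Collectors.toList());
    }
}
```

For OrderService, this yields OrderRepository and PaymentClient, which is exactly the list of @Mock fields a MockitoExtension test needs.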
What used to take 20–30 minutes per class (setting up the test class, figuring out which mocks to wire, writing the first few test methods) now takes 5 minutes of review and customization.
We hit 78% coverage three weeks ahead of schedule. I'm not going to attribute that entirely to CodeRef — the team put in real work on the complex test scenarios. But eliminating the boilerplate setup meant we spent our time on the tests that actually matter.
Month 3: The ML Engine Started Earning Its Keep
For the first two months, I occasionally dismissed findings that weren't relevant to our codebase. A TODO comment warning on a ticket-tracked TODO. A magic number flag on a well-known HTTP status code. The usual noise.
Around week 10, I noticed something: those warnings stopped appearing. CodeRef's ML engine had been quietly learning from my dismissals and started suppressing similar false positives.
I checked the ML insights panel — it had suppressed 23 findings that week that matched patterns I'd previously dismissed. Every single suppression was correct.
This is the feature that turned CodeRef from "good tool" to "tool I'd fight to keep." Every other linter I've used has a static configuration — you either suppress a rule globally or you deal with the noise. CodeRef learns what matters to you and adjusts. The signal-to-noise ratio gets better every week.
The severity re-ranking was a subtler benefit. Our team cares deeply about resource leaks (we had a production incident caused by an unclosed database connection), so I always prioritized those fixes. After a few weeks, CodeRef started bumping resource-related findings to Critical automatically. It understood our priorities without me writing a config file.
Month 5: Spring-Specific Rules Caught Two Production Bugs Before They Shipped
This is the story I tell when people ask if CodeRef is worth the Pro license.
Bug #1: Self-invocation bypassing @Transactional
We had a service method that called another method in the same class, both annotated with @Transactional. The inner call wasn't going through the Spring proxy, so it was running without a transaction boundary. In our test environment with small datasets, this was invisible. In production with concurrent writes, it would have caused data inconsistency.
CodeRef flagged it as S5962 (Spring proxy self-invocation bypass) with a clear explanation of why the proxy doesn't intercept internal calls. I refactored it to use a separate service class. Total time to fix: 10 minutes.
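You can reproduce the mechanics without Spring at all. This sketch uses a plain JDK dynamic proxy, with an "intercepted" list standing in for opening a transaction, to show why an internal call never goes through the proxy (the interface and class names are invented):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class SelfInvocationDemo {
    interface OrderOps {
        void outer();
        void inner();
    }

    static class OrderOpsImpl implements OrderOps {
        public void outer() {
            // Self-invocation: this call goes straight to the target object,
            // never through the proxy, so no "transaction" wraps it.
            inner();
        }
        public void inner() {}
    }

    // Records which methods the proxy actually intercepted.
    static List<String> intercepted = new ArrayList<>();

    static OrderOps proxyOf(OrderOps target) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName()); // stand-in for "begin transaction"
            return method.invoke(target, args);
        };
        return (OrderOps) Proxy.newProxyInstance(
                OrderOps.class.getClassLoader(),
                new Class<?>[]{OrderOps.class}, handler);
    }

    public static void main(String[] args) {
        proxyOf(new OrderOpsImpl()).outer();
        // Only "outer" was intercepted; the internal call to inner() bypassed the proxy.
        System.out.println(intercepted); // [outer]
    }
}
```

Spring's @Transactional works the same way: the transaction advice lives on the proxy, so only calls that arrive from outside the class get it. Moving the inner method to a separate injected bean puts the call back through a proxy.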
Bug #2: @ConfigurationProperties without @Validated
A configuration class was binding external properties without validation. One of the properties was a connection timeout that defaulted to zero when not set. In our staging environment, the property was always present. In a new deployment environment that was being provisioned, it wasn't — and zero meant "no timeout," which meant threads hanging indefinitely.
CodeRef flagged the missing @Validated annotation (S5975). I added it along with @NotNull and @Positive constraints. The new environment launched without issues.
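In plain Java terms, the failure mode and the startup check that @Validated with @Positive would have given us look roughly like this (the class and property names are invented for illustration):

```java
import java.time.Duration;

public class ClientTimeoutConfig {
    // Unvalidated binding: a missing property silently becomes zero,
    // and zero means "no timeout" to most HTTP clients.
    private Duration connectTimeout = Duration.ZERO;

    // The check a @Positive constraint enforces at application startup,
    // failing fast instead of hanging threads in production.
    void validate() {
        if (connectTimeout.isZero() || connectTimeout.isNegative()) {
            throw new IllegalStateException(
                "connectTimeout must be positive, was " + connectTimeout);
        }
    }

    void setConnectTimeout(Duration timeout) {
        connectTimeout = timeout;
    }
}
```

With validation in place, the misconfigured environment fails at deploy time with a clear message rather than hanging indefinitely at runtime.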
Neither of these would have been caught by PMD or SpotBugs alone. They require understanding Spring's proxy mechanism and configuration binding behavior. This is what framework-aware analysis means in practice.
Month 6: Project-Wide Analysis for Sprint Planning
We recently started using CodeRef's project-wide scan before sprint planning. The scan runs across the entire Maven project and produces an aggregate report with per-file severity distribution.
What makes this useful alongside our CI quality gates is the developer experience:
- Runs locally in the IDE — no context-switching to a browser dashboard, results in about 90 seconds for our 200-file project
- ML-enhanced results — the bug risk score highlights files that are structurally complex and frequently modified, so we know where to focus review effort
- Actionable next steps — every finding has an auto-fixer or test generation strategy attached, so the scan feeds directly into hands-on-keyboard work
We've started allocating 10% of each sprint to "CodeRef hygiene" — picking the highest-risk files from the scan and applying auto-fixes. It's not glamorous work, but the trend lines on our defect rate speak for themselves.
CodeRef project-wide analysis showing a file list sorted by bug risk score, with the top three files highlighted in red, severity distribution chart showing a 40% reduction in Critical findings over the past three sprints, and per-file issue counts with auto-fixable percentages.
The Numbers After 6 Months
I tracked some metrics because I knew people would ask:
| Metric | Before | After 6 months |
| --- | --- | --- |
| Code review rework (items per sprint) | ~12 | ~5 (-60%) |
| Time to detect an issue | 47 minutes | under 10 seconds |
| Test coverage | 54% | 81% |
| Production bugs | 2–3 per quarter | 0 |
| Manual refactoring time per sprint | ~6 hours | ~1.5 hours |
| Issues caught before CI | ~0% | ~85% |
The ROI calculation was straightforward enough that our engineering manager approved Pro licenses for the entire team without a formal business case.
What I Wish Were Better
It's not a perfect tool, and I'd rather give an honest review than a sales pitch:
- Large files (800+ lines) take a noticeable pause — the three-engine parallel analysis is fast, but on a god class it can take 5–8 seconds. Not a dealbreaker, but noticeable.
- The ML engine needs 50 interactions to activate — for the first few weeks, you're getting raw unfiltered results. I wish there were a way to bootstrap it with team-level patterns from day one.
- No Kotlin support yet — we have a few Kotlin modules and those don't get analyzed. The roadmap mentions it, so I'm hopeful.
- Test generation is scaffolding, not magic — the generated tests are structurally correct and save significant time, but you still need to write the meaningful assertions. That's probably the right tradeoff, but set your expectations accordingly.
Who Should Consider This
If you're on a Java team that:
- Uses Spring Boot, JPA, or Apache Camel
- Wants code quality feedback at write-time to complement your CI quality gates
- Has a test coverage goal and needs to eliminate boilerplate
- Wants framework-aware analysis that understands Spring proxies, JPA lifecycle, and Camel routing
- Wants a tool that gets smarter over time instead of requiring more configuration
Then CodeRef is worth a serious evaluation. It works well on its own, and it works even better as an early-feedback layer alongside your existing CI pipeline. Install the free tier, run it on your most problematic service class, and see what it finds. That's what convinced me.
Diagram showing CodeRef in the developer workflow: code is analyzed instantly in the IDE at write-time, issues are fixed before commit, and the CI pipeline quality gate sees cleaner code with fewer failures.
Six months in, CodeRef has become an essential part of my daily workflow. The earlier I catch an issue, the cheaper it is to fix — and catching it while I'm still in the method is about as early as it gets. If you've tried CodeRef, I'd love to hear how it's working for your team.