DEV Community

PatilSpeaks

Posted on • Originally published at patilspeaks.hashnode.dev

My First Open Source Contribution: The Hardest Part Wasn't Writing the Code

Add process CPU utilization metrics to HardwareMetricsEngine #1

Summary

Add process CPU utilization metrics to HardwareMetricsEngine.

Changes

  • Add CpuSampler based on Process.getElapsedCpuTime() and SystemClock.elapsedRealtime()
  • Add CpuMetrics to benchmark results
  • Include CPU statistics in MetricsResult
  • Generalize PowerStats to FloatMetricStats so the same structure can be reused for power and CPU metrics
  • Stop cancelling the caller-owned benchmark scope when stopping HardwareMetricsEngine

Validation

Tested on an Android device.

Verified that benchmark output now includes CPU metrics in hardware_metrics.json, for example:

"cpu": { "processUsagePercent": { "mean": 356.14523, "peak": 442.85712 } }

Build verification: ./gradlew :cli:assembleDebug

My first contribution to MoRAGBench, which eventually became PR #1


Why CPU Metrics Mattered

When I started exploring MoRAGBench, I wasn't looking for a difficult feature. Quite the opposite.

As a first-time contributor, I wanted something small enough that I wouldn't get lost in an unfamiliar codebase, but interesting enough that I'd learn something from it.

One thing immediately caught my attention.

The framework already collected useful hardware metrics such as memory usage and power consumption during benchmark execution.

At first glance, that seemed sufficient.

But the more I thought about it, the more I realized there was a missing piece.

Imagine two retrieval-augmented generation (RAG) pipelines producing identical accuracy.

One consumes:

  • 150 MB RAM

  • 120% CPU utilization

The other consumes:

  • 400 MB RAM

  • 700% CPU utilization

Without CPU metrics, a benchmark report would show the memory gap but miss the nearly sixfold difference in compute. Judged on accuracy alone, the two systems would look equivalent even though their computational cost is dramatically different.

MoRAGBench could already answer:

  • How much memory did the benchmark use?

  • How much power did it consume?

It couldn't answer:

How much CPU did this benchmark actually consume?


The First Question That Broke My Assumptions

My original hypothesis sounded almost embarrassingly simple:

Measure CPU utilization and expose it as another hardware metric.

I assumed the implementation would be the difficult part.

I was wrong.

Before writing a single line of Kotlin, I ran into a much bigger question:

What exactly should CPU utilization mean?

At first, the question sounded almost silly.

But the deeper I looked, the more definitions appeared:

  • Device-wide CPU utilization.

  • Process CPU utilization.

  • Per-core utilization.

  • Average utilization.

  • Peak utilization.

Each of these measures something different.

A technically correct implementation built on the wrong definition would still produce a misleading benchmark metric.
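To make the difference concrete, here is a small sketch of how two of these definitions, average and peak, diverge over the same samples. `UtilizationStats` and `summarize` are hypothetical names chosen for illustration, and the sample values are made up rather than taken from benchmark output.

```kotlin
// Hypothetical summary type: mean and peak over a series of utilization samples.
data class UtilizationStats(val mean: Float, val peak: Float)

// Returns null for an empty series, since neither statistic is defined then.
fun summarize(samples: List<Float>): UtilizationStats? {
    val peak = samples.maxOrNull() ?: return null
    val mean = samples.average().toFloat()
    return UtilizationStats(mean, peak)
}

fun main() {
    // Made-up samples: three quiet intervals and one short burst.
    val stats = summarize(listOf(50f, 50f, 50f, 450f))
    println(stats) // prints UtilizationStats(mean=150.0, peak=450.0)
}
```

The same run can honestly be described as "150%" or "450%", which is why the definition has to be settled before the number is published.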

This was the first lesson of the project:

Defining a metric is often harder than implementing it.

Before touching code, I spent time understanding how MoRAGBench already collected hardware metrics and where CPU utilization would fit into that architecture.

[Figure: Understanding the existing hardware metrics pipeline before introducing a new metric]


Technical Exploration

Android doesn't provide a convenient API that simply returns "CPU utilization."

Instead, it exposes lower-level building blocks:

  • Process CPU time.

  • Wall-clock time.

  • Number of available processors.

CPU utilization has to be derived from those values.

The implementation eventually centered around:

CPU Utilization = (Process CPU Time Delta / Wall Clock Time Delta) × 100

The idea is straightforward:

  • Measure how much CPU time the process consumed.

  • Measure how much real-world time passed.

  • Compare the two.
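The three steps above can be sketched in plain Kotlin. This is a minimal illustration and not the code from the pull request: `CpuSnapshot` and `utilizationPercent` are hypothetical names, and on Android the two readings would presumably come from `Process.getElapsedCpuTime()` and `SystemClock.elapsedRealtime()`, the calls named in the PR summary.

```kotlin
// Hypothetical snapshot of the two clocks, both in milliseconds.
// On Android these would presumably be read from Process.getElapsedCpuTime()
// and SystemClock.elapsedRealtime(); here they are plain values so the
// arithmetic can be run anywhere.
data class CpuSnapshot(val cpuTimeMs: Long, val wallTimeMs: Long)

// Returns process CPU utilization between two snapshots as a percentage,
// or null when no wall-clock time has passed (the ratio would be undefined).
fun utilizationPercent(start: CpuSnapshot, end: CpuSnapshot): Float? {
    val cpuDelta = end.cpuTimeMs - start.cpuTimeMs
    val wallDelta = end.wallTimeMs - start.wallTimeMs
    if (wallDelta <= 0L) return null
    return cpuDelta.toFloat() / wallDelta.toFloat() * 100f
}

fun main() {
    val start = CpuSnapshot(cpuTimeMs = 1_000, wallTimeMs = 10_000)
    val end = CpuSnapshot(cpuTimeMs = 1_250, wallTimeMs = 10_500)
    // 250 ms of CPU time consumed during 500 ms of wall-clock time.
    println(utilizationPercent(start, end)) // prints 50.0
}
```

Returning null for a zero-length interval is a design choice of this sketch; a sampler could equally skip the sample.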

Initially, this seemed like a solved problem.

Then I ran the benchmark.


The First Time I Thought My Implementation Was Broken

The numbers looked wrong.

Some runs reported:

  • 356%

  • 399%

  • 452%

My immediate reaction was:

Something must be wrong.

CPU utilization is a percentage. Percentages shouldn't exceed 100%. Right?

I spent quite a while questioning my implementation before I questioned my assumption.

The problem wasn't the code. It was my mental model.

Modern Android devices have multiple CPU cores. A process can execute work across several cores simultaneously.

On an eight-core device, the process can legitimately consume close to 800% CPU utilization because utilization is aggregated across all available cores.

For example:

Active Cores | Utilization
1 core       | 100%
4 cores      | 400%
8 cores      | 800%

The implementation wasn't wrong. My intuition was.

That realization completely changed how I interpreted every CPU number afterward.

A benchmark reporting 700% utilization wasn't necessarily broken. It simply meant the process was doing the equivalent of keeping roughly seven CPU cores fully busy.

CPU utilization can exceed 100% on multi-core systems because utilization is aggregated across cores.
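A small sketch shows where a value above 100% comes from. It assumes, purely for illustration, that the CPU time of each thread over one sampling window is known; `aggregateUtilizationPercent` is a hypothetical helper and the numbers are made up.

```kotlin
// Sums the CPU time used by every thread during one window and divides by the
// wall-clock length of that window. Threads running in parallel on different
// cores each contribute their own CPU time, so the total can exceed 100%.
fun aggregateUtilizationPercent(threadCpuMs: List<Long>, wallMs: Long): Float {
    require(wallMs > 0) { "wall-clock window must be positive" }
    return threadCpuMs.sum().toFloat() / wallMs * 100f
}

fun main() {
    // Four threads, each busy for the whole 1,000 ms window on its own core.
    println(aggregateUtilizationPercent(listOf(1000L, 1000L, 1000L, 1000L), 1000L)) // prints 400.0
}
```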


The Reviewer Found the Real Problem

Once I was reasonably confident in the implementation, I opened my pull request.

I expected comments about Kotlin style, Android APIs or code structure.

Instead, the reviewer asked a question I hadn't considered.

"This code reports CPU usage time for all cores. Can Android expose per-core or per-thread utilization? If not, can we expose the number of CPU cores and calculate average utilization per core?"

Reading that comment felt strange. The implementation worked. The reviewer agreed it worked. Yet the feature still wasn't finished.

That was probably the first time I realized that writing software and designing software are different activities.

The reviewer wasn't asking me to fix a bug. He was asking me to improve the meaning of the metric.

If someone later saw 650% CPU utilization, would they know whether that meant six busy CPU cores or a broken implementation?

Without additional context, the metric was technically correct but practically confusing.

That feedback fundamentally changed the feature.

Instead of reporting only aggregate utilization, the final implementation exposed:

  • Process CPU utilization

  • Average utilization per core

  • Available processor count

Nothing about the calculation changed. Everything about the interpretation improved.
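A result type along these lines could carry the three values together. This is a sketch under assumptions: `CpuReport` and `buildReport` are hypothetical names, only `processUsagePercent` matches a key shown in the PR's sample output, and reading the core count from `Runtime.getRuntime().availableProcessors()` is my guess at a reasonable source rather than something the PR description confirms.

```kotlin
// Hypothetical result type: the aggregate value plus the context needed to
// interpret it.
data class CpuReport(
    val processUsagePercent: Float,  // aggregated across cores, may exceed 100
    val availableProcessors: Int,
    val perCoreUsagePercent: Float   // processUsagePercent / availableProcessors
)

// Derives the per-core average from the aggregate value. The default core
// count comes from the standard JVM Runtime API (an assumption of this sketch).
fun buildReport(
    processUsagePercent: Float,
    availableProcessors: Int = Runtime.getRuntime().availableProcessors()
): CpuReport {
    require(availableProcessors > 0) { "processor count must be positive" }
    return CpuReport(
        processUsagePercent,
        availableProcessors,
        processUsagePercent / availableProcessors
    )
}

fun main() {
    println(buildReport(650f, availableProcessors = 8))
    // prints CpuReport(processUsagePercent=650.0, availableProcessors=8, perCoreUsagePercent=81.25)
}
```

With the processor count next to it, 650% reads as "about 81% of eight cores" instead of as a suspicious number.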

[Figure: A review comment that shifted the conversation from implementation details to metric semantics]

This became the most valuable part of the entire contribution.


Validation

One mistake I often see in engineering projects, including my own, is treating compilation as validation.

Compilation proves that the code is syntactically valid and type-checks.

It does not prove correctness.

Initially, I assumed that because the calculation looked correct, the implementation was probably correct.

The reviewer challenged that assumption too. He suggested comparing the benchmark output against an external monitoring application.

Although external tools report overall device utilization rather than benchmark process utilization, the numbers should still be in the same general range if the benchmark dominates the workload.

I compared the benchmark output with DevCheck. As expected, the values weren't identical, but they followed similar trends during execution.

Internally, the metrics were also consistent.

For example:

If a benchmark reported:

  • Process utilization = 640%

  • Available processors = 8

Then:

640 ÷ 8 = 80%

which matched the reported per-core utilization. This wasn't mathematical proof.

But it was strong evidence that the implementation behaved as intended.
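The internal check can be expressed as a few lines of Kotlin. `isInternallyConsistent` and its tolerance are hypothetical choices for this sketch, not part of the merged code.

```kotlin
import kotlin.math.abs

// Hypothetical sanity check: the reported per-core value should equal the
// aggregate value divided by the processor count, within a small tolerance
// that absorbs floating-point rounding.
fun isInternallyConsistent(
    processPercent: Float,
    processors: Int,
    perCorePercent: Float,
    tolerance: Float = 0.01f
): Boolean {
    if (processors <= 0) return false
    return abs(processPercent / processors - perCorePercent) <= tolerance
}

fun main() {
    println(isInternallyConsistent(640f, 8, 80f))  // prints true
    println(isInternallyConsistent(640f, 8, 100f)) // prints false
}
```

A check like this only catches arithmetic or reporting mistakes between the fields. It says nothing about whether the underlying measurement is right, which is why the comparison against an external tool still matters.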

[Figure: CPU utilization metrics collected during benchmark execution]


The Lesson That Changed How I View Code Reviews

Before this contribution, I viewed code reviews primarily as implementation reviews. Now I think differently.

A good reviewer does more than verify code quality. A good reviewer protects metric quality. The most important feedback I received had nothing to do with Android APIs. It had everything to do with interpretation.

Once a benchmark publishes a number, engineers will compare systems, make decisions and draw conclusions from it.

That means a performance metric isn't merely another value produced by code. It's a promise.

If that promise is easy to misunderstand, the implementation isn't really finished even if the code is technically correct.

That realization fundamentally changed how I think about performance metrics.


Final Outcome

The contribution ultimately introduced:

  • Process CPU utilization metrics

  • Per-core process CPU utilization metrics

  • Available processor count

The pull request was merged as:

Add process CPU utilization metrics to HardwareMetricsEngine #1

More importantly, it taught me something that extends far beyond Android development.

The hardest part of my first open source contribution wasn't implementing CPU utilization.

It was realizing that the real bug wasn't in my Kotlin code.

It was in the assumptions I brought with me before I wrote the first line.

Thanks for reading!

📖 Blog: patilspeaks.hashnode.dev

🐦 X: x.com/PatilSpeaksX

💻 GitHub: github.com/PatilSpeaks
