Add process CPU utilization metrics to HardwareMetricsEngine
#1
Summary
Add process CPU utilization metrics to HardwareMetricsEngine.
Changes
- Add CpuSampler based on Process.getElapsedCpuTime() and SystemClock.elapsedRealtime()
- Add CpuMetrics to benchmark results
- Include CPU statistics in MetricsResult
- Generalize PowerStats to FloatMetricStats so the same structure can be reused for power and CPU metrics
- Stop cancelling the caller-owned benchmark scope when stopping HardwareMetricsEngine
Validation
Tested on an Android device.
Verified that benchmark output now includes CPU metrics in hardware_metrics.json, for example:
"cpu": { "processUsagePercent": { "mean": 356.14523, "peak": 442.85712 } }
Build verification: ./gradlew :cli:assembleDebug
Why CPU Metrics Mattered
When I started exploring MoRAGBench, I wasn't looking for a difficult feature. Quite the opposite.
As a first-time contributor, I wanted something small enough that I wouldn't get lost in an unfamiliar codebase, but interesting enough that I'd learn something from it.
One thing immediately caught my attention.
The framework already collected useful hardware metrics such as memory usage and power consumption during benchmark execution.
At first glance, that seemed sufficient.
But the more I thought about it, the more I realized there was a missing piece.
Imagine two retrieval-augmented generation (RAG) pipelines producing identical accuracy.
One consumes:
150 MB RAM
120% CPU utilization
The other consumes:
400 MB RAM
700% CPU utilization
Without CPU metrics, both systems might appear equivalent from an accuracy perspective even though their computational cost is dramatically different.
MoRAGBench could already answer:
How much memory did the benchmark use?
How much power did it consume?
It couldn't answer:
How much CPU did this benchmark actually consume?
The First Question That Broke My Assumptions
My original hypothesis sounded almost embarrassingly simple:
Measure CPU utilization and expose it as another hardware metric.
I assumed the implementation would be the difficult part.
I was wrong.
Before writing a single line of Kotlin, I ran into a much bigger question:
What exactly should CPU utilization mean?
At first, the question sounded almost silly.
But the deeper I looked, the more definitions appeared:
Device-wide CPU utilization.
Process CPU utilization.
Per-core utilization.
Average utilization.
Peak utilization.
Each of these measures something different.
A technically correct implementation built on the wrong definition would still produce a misleading benchmark metric.
This was the first lesson of the project:
Defining a metric is often harder than implementing it.
Before touching code, I spent time understanding how MoRAGBench already collected hardware metrics and where CPU utilization would fit into that architecture.
Technical Exploration
Android doesn't provide a convenient API that simply returns "CPU utilization."
Instead, it exposes lower-level building blocks:
Process CPU time.
Wall-clock time.
Number of available processors.
CPU utilization has to be derived from those values.
The implementation eventually centered around:
CPU Utilization = (Process CPU Time Delta / Wall Clock Time Delta) × 100
The idea is straightforward:
Measure how much CPU time the process consumed.
Measure how much real-world time passed.
Compare the two.
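The three steps above can be sketched as a small pure Kotlin function. This is a minimal sketch, not the code from the pull request: `CpuSample` and `utilizationPercent` are hypothetical names, and the assumption is that on Android the two fields would be filled from `Process.getElapsedCpuTime()` and `SystemClock.elapsedRealtime()`, both of which report milliseconds.

```kotlin
// Hypothetical snapshot of the two counters, both in milliseconds.
// Assumption: on Android these would be read from
// Process.getElapsedCpuTime() and SystemClock.elapsedRealtime().
// They are plain values here so the sketch compiles on any JVM.
data class CpuSample(val cpuTimeMs: Long, val wallTimeMs: Long)

// Returns process CPU utilization between two samples as a percentage,
// or null when no wall-clock time has passed (avoids dividing by zero).
fun utilizationPercent(previous: CpuSample, current: CpuSample): Float? {
    val cpuDeltaMs = current.cpuTimeMs - previous.cpuTimeMs
    val wallDeltaMs = current.wallTimeMs - previous.wallTimeMs
    if (wallDeltaMs <= 0L) return null
    return cpuDeltaMs.toFloat() / wallDeltaMs.toFloat() * 100f
}
```

For example, 500 ms of CPU time over 1,000 ms of wall-clock time gives 50%. Nothing in the formula caps the result at 100, because CPU time is summed over every core the process runs on.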
Initially, this seemed like a solved problem.
Then I ran the benchmark.
The First Time I Thought My Implementation Was Broken
The numbers looked wrong.
Some runs reported:
356%
399%
452%
My immediate reaction was:
Something must be wrong.
CPU utilization is a percentage. Percentages shouldn't exceed 100%. Right?
I spent quite a while questioning my implementation before I questioned my assumption.
The problem wasn't the code. It was my mental model.
Modern Android devices have multiple CPU cores. A process can execute work across several cores simultaneously.
On an eight-core device, the process can legitimately consume close to 800% CPU utilization because utilization is aggregated across all available cores.
For example:
| Active Cores | Utilization |
|---|---|
| 1 core | 100% |
| 4 cores | 400% |
| 8 cores | 800% |
The implementation wasn't wrong. My intuition was.
That realization completely changed how I interpreted every CPU number afterward.
A benchmark reporting 700% utilization wasn't necessarily broken. It simply meant the process was using the equivalent of roughly seven fully busy CPU cores.
The Reviewer Found the Real Problem
Once I was reasonably confident in the implementation, I opened my pull request.
I expected comments about Kotlin style, Android APIs or code structure.
Instead, the reviewer asked a question I hadn't considered.
This code reports CPU usage time for all cores. Can Android expose per-core or per-thread utilization? If not, can we expose the number of CPU cores and calculate average utilization per core?
Reading that comment felt strange. The implementation worked. The reviewer agreed it worked. Yet the feature still wasn't finished.
That was probably the first time I realized that writing software and designing software are different activities.
The reviewer wasn't asking me to fix a bug. He was asking me to improve the meaning of the metric.
If someone later saw 650% CPU utilization, would they know whether that meant six busy CPU cores or a broken implementation?
Without additional context, the metric was technically correct but practically confusing.
That feedback fundamentally changed the feature.
Instead of reporting only aggregate utilization, the final implementation exposed:
Process CPU utilization
Average utilization per core
Available processor count
Nothing about the calculation changed. Everything about the interpretation improved.
This became the most valuable part of the entire contribution.
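One way the three reported values could fit together is sketched below. `CpuReport` and `buildCpuReport` are hypothetical names for illustration and are not taken from MoRAGBench. The processor count is assumed to come from the standard `Runtime.getRuntime().availableProcessors()` call, which may not be what the merged code uses.

```kotlin
// Hypothetical result type: the aggregate number plus the context
// a reader needs to interpret it.
data class CpuReport(
    val processUsagePercent: Float, // aggregated across cores, may exceed 100
    val availableProcessors: Int,
    val perCoreUsagePercent: Float, // processUsagePercent / availableProcessors
)

// Derives the per-core average from the aggregate value.
// Assumption: the default processor count comes from the JVM runtime.
fun buildCpuReport(
    processUsagePercent: Float,
    availableProcessors: Int = Runtime.getRuntime().availableProcessors(),
): CpuReport {
    require(availableProcessors > 0) { "availableProcessors must be positive" }
    return CpuReport(
        processUsagePercent = processUsagePercent,
        availableProcessors = availableProcessors,
        perCoreUsagePercent = processUsagePercent / availableProcessors,
    )
}
```

With 650% on an eight-processor device, this yields 81.25% per core, so a reader can tell at a glance that the aggregate figure reflects a busy multi-core run.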
Validation
One mistake I often see in engineering projects, including my own, is treating compilation as validation.
Compilation proves syntax.
It does not prove correctness.
Initially, I assumed that because the calculation looked correct, the implementation was probably correct.
The reviewer challenged that assumption too. He suggested comparing the benchmark output against an external monitoring application.
External tools report overall device utilization rather than the benchmark process's utilization, but the numbers should still fall in the same general range if the benchmark dominates the workload.
I compared the benchmark output with DevCheck. As expected, the values weren't identical, but they followed similar trends during execution.
Internally, the metrics were also consistent.
For example:
If a benchmark reported:
Process utilization = 640%
Available processors = 8
Then:
640 ÷ 8 = 80%
which matched the reported per-core utilization. This wasn't mathematical proof.
But it was strong evidence that the implementation behaved as intended.
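The internal consistency check above can also be written as a small helper. This is an illustrative sketch: `isPerCoreConsistent` and its default tolerance are assumptions, not part of the benchmark's code.

```kotlin
import kotlin.math.abs

// Returns true when the reported per-core value matches
// processUsagePercent / availableProcessors within a tolerance
// (in percentage points) that absorbs floating-point rounding.
fun isPerCoreConsistent(
    processUsagePercent: Float,
    availableProcessors: Int,
    reportedPerCorePercent: Float,
    tolerance: Float = 0.01f,
): Boolean {
    if (availableProcessors <= 0) return false
    val expected = processUsagePercent / availableProcessors
    return abs(expected - reportedPerCorePercent) <= tolerance
}
```

A check like this only confirms that the reported fields agree with each other. It says nothing about whether the underlying samples are accurate, which is why the comparison against an external tool still matters.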
The Lesson That Changed How I View Code Reviews
Before this contribution, I viewed code reviews primarily as implementation reviews. Now I think differently.
A good reviewer does more than verify code quality. A good reviewer protects metric quality. The most important feedback I received had nothing to do with Android APIs. It had everything to do with interpretation.
Once a benchmark publishes a number, engineers will compare systems, make decisions and draw conclusions from it.
That means a performance metric isn't merely another value produced by code. It's a promise.
If that promise is easy to misunderstand, the implementation isn't really finished even if the code is technically correct.
That realization fundamentally changed how I think about performance metrics.
Final Outcome
The contribution ultimately introduced:
Process CPU utilization metrics
Per-core process CPU utilization metrics
Available processor count
The pull request was merged as:
Add process CPU utilization metrics to HardwareMetricsEngine
More importantly, it taught me something that extends far beyond Android development.
The hardest part of my first open source contribution wasn't implementing CPU utilization.
It was realizing that the real bug wasn't in my Kotlin code.
It was in the assumptions I brought with me before I wrote the first line.
Thanks for reading!
Blog: patilspeaks.hashnode.dev
X: x.com/PatilSpeaksX
GitHub: github.com/PatilSpeaks




