<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Iñaki Villar</title>
    <description>The latest articles on DEV Community by Iñaki Villar (@cdsap).</description>
    <link>https://dev.to/cdsap</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F862068%2F6c130543-fb52-4163-ac01-f6dc79b8b01f.png</url>
      <title>DEV Community: Iñaki Villar</title>
      <link>https://dev.to/cdsap</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cdsap"/>
    <language>en</language>
    <item>
      <title>ImNotOkay, a GC experiment for Android CI builds</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Wed, 01 Apr 2026 14:11:21 +0000</pubDate>
      <link>https://dev.to/cdsap/imnotokay-a-gc-experiment-for-android-ci-builds-489o</link>
      <guid>https://dev.to/cdsap/imnotokay-a-gc-experiment-for-android-ci-builds-489o</guid>
      <description>&lt;p&gt;Inspired by a very specific early-2000s song, I’m presenting a new garbage collector policy for Android CI builds: &lt;code&gt;ImNotOkayGC&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;org.gradle.jvmargs&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;-Xmx5g -XX:+UseImNotOkayGC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This collector is meant for large Android builds, where memory pressure quietly builds up across tasks, variants, and modules. When memory finally runs out, instead of pretending everything is fine, it tells you exactly how bad things really are.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[GC] I’m fine.
:app:generateReleaseRFile FROM-CACHE    
:app:compileReleaseKotlin   
[GC] Actually I’m not fine.
[GC] Full GC.
[GC] It didn’t help.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/dhZTNgAs4Fc"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Funny, right? Consider this my small April Fools’ contribution.&lt;/p&gt;

&lt;p&gt;Jokes aside, I wanted to try something more serious.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ephemeral Android CI Builds
&lt;/h2&gt;

&lt;p&gt;This is an idea I had wanted to try for a long time: Gradle builds have a very well-defined lifecycle.&lt;/p&gt;

&lt;p&gt;Android CI builds are not long-lived services. They are ephemeral workloads with recognizable phases: startup, rising memory pressure, heavy execution, and then termination. Even if the exact task graph changes between &lt;code&gt;assembleDebug&lt;/code&gt; and &lt;code&gt;assembleRelease&lt;/code&gt;, the overall shape is still much more structured than a backend service running for months or supporting thousands of requests per minute.&lt;/p&gt;

&lt;p&gt;And yet we use the same general-purpose garbage collectors for both worlds.&lt;/p&gt;

&lt;p&gt;Of course, we can tune JVM arguments, but that usually turns into trial and error. What made this experiment different is that I used Codex to push the idea much further. Instead of stopping at flag tuning, I was able to build an iterative workflow: capture a baseline, profile the build with JFR, look for recurring allocation and GC patterns across projects, modify the JDK, rerun the scenarios, and compare the results.&lt;/p&gt;

&lt;p&gt;That process is probably the most interesting part of this article.&lt;/p&gt;

&lt;p&gt;The original idea was simple: if Android CI builds have such a specific workload shape, could we get something closer to a GC policy designed for that kind of execution? Not a completely different collector, but a G1-based policy that reacts better to the memory pressure patterns we see in ephemeral Android builds.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Experiment
&lt;/h2&gt;

&lt;p&gt;The first step was to create a stable baseline. I forked the JDK, built a custom JDK 23 distribution, and used standard G1 in both the Gradle and Kotlin daemons so I could compare later policy changes against a consistent starting point.&lt;/p&gt;

&lt;p&gt;From there, the workflow became iterative:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run a representative set of Android builds using the published JDK 23 baseline build with standard G1.&lt;/li&gt;
&lt;li&gt;Capture telemetry, especially JFR and GC data.&lt;/li&gt;
&lt;li&gt;Identify common patterns in memory pressure and pause behavior.&lt;/li&gt;
&lt;li&gt;Modify the GC policy in the JDK.&lt;/li&gt;
&lt;li&gt;Rerun the same scenarios with the updated build.&lt;/li&gt;
&lt;li&gt;Compare again, then go back to step 4.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For each project, I tested two tasks, &lt;code&gt;assembleDebug&lt;/code&gt; and &lt;code&gt;assembleRelease&lt;/code&gt;, under two execution modes: non-warm runs on fully clean agents, and warm runs that reused the Gradle user home while excluding task output cache artifacts.&lt;/p&gt;

&lt;p&gt;Each variant was executed 10 times, and every run generated the full set of profiling artifacts, including JFR recordings, GC data, and build metrics. That gave me something more reliable than a single build result. Instead of looking at isolated executions, I could compare repeated runs and see which patterns actually held.&lt;/p&gt;
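&lt;p&gt;Comparing repeated runs like this boils down to comparing distributions rather than single numbers. As a minimal sketch (the durations below are invented, and the helper name is mine, not part of the tooling), a median-based percent delta is one robust way to summarize ten runs per variant:&lt;/p&gt;

```python
from statistics import median

def pct_delta(baseline, candidate):
    """Median-based percent delta between two sets of repeated runs.

    Positive means the candidate regressed; negative means it improved.
    Using the median keeps a single noisy run from skewing the comparison.
    """
    base = median(baseline)
    cand = median(candidate)
    return (cand - base) / base * 100.0

# Ten hypothetical build durations (seconds) for one variant.
baseline_runs  = [392.0, 388.5, 401.2, 395.7, 390.1, 399.8, 393.4, 387.9, 396.6, 391.3]
candidate_runs = [385.0, 402.3, 389.9, 397.1, 383.6, 391.8, 388.2, 399.5, 386.7, 390.4]

print(f"Duration delta: {pct_delta(baseline_runs, candidate_runs):+.2f}%")
```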

&lt;p&gt;Codex played a major role in making that process practical. It helped me move faster through the mechanical parts of the experiment, from changing JDK behavior to wiring the runs and analyzing the outputs, so I could spend more time on the observations and less time on setup.&lt;/p&gt;

&lt;p&gt;If you are curious, here is an example of a warm baseline execution for one of the projects:&lt;br&gt;
&lt;a href="https://github.com/cdsap/im-not-ok-metrics/actions/runs/23775355907" rel="noopener noreferrer"&gt;https://github.com/cdsap/im-not-ok-metrics/actions/runs/23775355907&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each run produced a fairly detailed profiling summary. A single example looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Build Profiling Summary

- Build duration: 392.00s
- Build exit code: 0
- JDK: 23-internal-adhoc.runner.jdk
- Gradle: 9.4.0
- Build scan: https://gradle.com/s/vajkodzs7myja
- Declared GC profiles: gradle-daemon=openjdk-default, kotlin-daemon=repo-default, test-jvm=repo-default
- Reported collector labels: gradle-daemon=openjdk-default, kotlin-daemon=repo-default

## Per-process highlights

- gradle-daemon (pid 2901): max RSS 4304416.0 kB, avg RSS 2484696.2 kB, max CPU 285.0%, collector openjdk-default, runtime GC G1, alloc mode n/a, alloc rate n/a MB/s, GC p95 112.87 ms, GC max 380.61 ms, total GC 25887.48 ms

- kotlin-daemon (pid 13382): max RSS 2422960.0 kB, avg RSS 2040902.8 kB, max CPU 302.0%, collector repo-default, runtime GC G1, alloc mode n/a, alloc rate n/a MB/s, GC p95 193.85 ms, GC max 239.58 ms, total GC 8645.07 ms

## Correlated peaks

- gradle-daemon pid 2901 peaked near :app-scaffold:assembleDebug at 2026-03-31T02:34:15Z
- kotlin-daemon pid 13382 peaked near :app-scaffold:extractDebugAnnotations at 2026-03-31T02:34:06Z
- gradle-worker pid 14199 peaked near :libraries:util:checkKotlinGradlePluginConfigurationErrors at 2026-03-31T02:31:16Z
- gradle-worker pid 14222 peaked near :libraries:util:checkKotlinGradlePluginConfigurationErrors at 2026-03-31T02:31:16Z
- gradle-worker pid 16210 peaked near :app-scaffold:extractDebugAnnotations at 2026-03-31T02:34:00Z

## Artifacts

- Metadata: /home/runner/work/im-not-ok-metrics/im-not-ok-metrics/project/artifacts/20260331T022747Z/metadata.json
- GC logs: /home/runner/work/im-not-ok-metrics/im-not-ok-metrics/project/artifacts/20260331T022747Z/logs/gc
- JFR: /home/runner/work/im-not-ok-metrics/im-not-ok-metrics/project/artifacts/20260331T022747Z/logs/jfr
- OS metrics: /home/runner/work/im-not-ok-metrics/im-not-ok-metrics/project/artifacts/20260331T022747Z/logs/os/process_metrics.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Changed in ImNotOkay
&lt;/h2&gt;

&lt;p&gt;After analyzing the baseline data, Codex proposed a set of changes based on the idea that Android Gradle builds on ephemeral agents behave differently from long-running JVM applications.&lt;/p&gt;

&lt;p&gt;The idea behind ImNotOkay is to make G1 a bit more aware of the build phase instead of treating the whole build the same way from start to finish.&lt;/p&gt;

&lt;p&gt;At the beginning, during roughly the first minute, it stays out of the way and lets Gradle run normally so the configuration phase can move as fast as possible.&lt;/p&gt;

&lt;p&gt;It only starts intervening once the build shows real signs of memory pressure. At that point, it tries to keep memory behavior more controlled by preventing the young generation from growing too aggressively.&lt;/p&gt;

&lt;p&gt;The goal is not to make G1 more restrictive all the time, but to react when the build starts entering a heavier phase. And if that pressure continues for longer, the policy gradually relaxes again so the build can recover throughput instead of staying constrained for too long.&lt;/p&gt;
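&lt;p&gt;The real change is a C++ patch to HotSpot, but as a rough mental model of the phase logic described above, it can be sketched like this (the grace period, relax window, and halving factor here are illustrative placeholders, not the values in the actual diff):&lt;/p&gt;

```python
GRACE_PERIOD_S = 60.0   # phase 1: stay out of the way during configuration
RELAX_AFTER_S  = 120.0  # hypothetical: how long to constrain before relaxing

def young_gen_cap(elapsed_s, under_pressure, pressure_since_s, default_cap):
    """Toy model of the ImNotOkay phase logic.

    elapsed_s:        seconds since JVM start
    under_pressure:   True once the heuristics report real memory pressure
    pressure_since_s: seconds the build has spent under pressure
    default_cap:      G1's normal young-generation ceiling (in regions)

    Returns the young-generation cap the policy would apply.
    """
    if elapsed_s < GRACE_PERIOD_S or not under_pressure:
        return default_cap              # phase 1: let Gradle run normally
    if pressure_since_s < RELAX_AFTER_S:
        return default_cap // 2         # phase 2: constrain young-gen growth
    return default_cap                  # phase 3: relax to recover throughput
```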

&lt;p&gt;You can see the complete diff here:&lt;br&gt;
&lt;a href="https://github.com/cdsap/jdk/compare/imnotokay-jdk23-baseline...cdsap:jdk:im-not-okay" rel="noopener noreferrer"&gt;https://github.com/cdsap/jdk/compare/imnotokay-jdk23-baseline...cdsap:jdk:im-not-okay&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;I should also be honest about the outcome: at least for now, this was a failed attempt:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Config&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Duration Δ&lt;/th&gt;
&lt;th&gt;Gradle GC p95 Δ&lt;/th&gt;
&lt;th&gt;Gradle GC total Δ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;android-nowinandroid&lt;/td&gt;
&lt;td&gt;assemble-debug&lt;/td&gt;
&lt;td&gt;nonwarm&lt;/td&gt;
&lt;td&gt;🔴 +1.24%&lt;/td&gt;
&lt;td&gt;🔴 +5.07%&lt;/td&gt;
&lt;td&gt;🔴 +2.10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;android-nowinandroid&lt;/td&gt;
&lt;td&gt;assemble-debug&lt;/td&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;🟢 -2.45%&lt;/td&gt;
&lt;td&gt;🟢 -13.60%&lt;/td&gt;
&lt;td&gt;🔴 +3.21%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;android-nowinandroid&lt;/td&gt;
&lt;td&gt;assemble-release&lt;/td&gt;
&lt;td&gt;nonwarm&lt;/td&gt;
&lt;td&gt;🟢 -1.61%&lt;/td&gt;
&lt;td&gt;🔴 +21.51%&lt;/td&gt;
&lt;td&gt;🔴 +14.69%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;android-nowinandroid&lt;/td&gt;
&lt;td&gt;assemble-release&lt;/td&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;🔴 +2.71%&lt;/td&gt;
&lt;td&gt;🔴 +6.41%&lt;/td&gt;
&lt;td&gt;🔴 +4.24%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CatchUp&lt;/td&gt;
&lt;td&gt;assemble-debug&lt;/td&gt;
&lt;td&gt;nonwarm&lt;/td&gt;
&lt;td&gt;🟢 -1.20%&lt;/td&gt;
&lt;td&gt;🔴 +5.48%&lt;/td&gt;
&lt;td&gt;🔴 +9.06%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CatchUp&lt;/td&gt;
&lt;td&gt;assemble-debug&lt;/td&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;🔴 +2.32%&lt;/td&gt;
&lt;td&gt;🟢 -22.89%&lt;/td&gt;
&lt;td&gt;🔴 +10.96%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CatchUp&lt;/td&gt;
&lt;td&gt;assemble-release&lt;/td&gt;
&lt;td&gt;nonwarm&lt;/td&gt;
&lt;td&gt;🟢 -0.36%&lt;/td&gt;
&lt;td&gt;🔴 +7.43%&lt;/td&gt;
&lt;td&gt;🔴 +1.63%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CatchUp&lt;/td&gt;
&lt;td&gt;assemble-release&lt;/td&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;🔴 +1.38%&lt;/td&gt;
&lt;td&gt;🔴 +12.89%&lt;/td&gt;
&lt;td&gt;🔴 +24.01%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectMedium&lt;/td&gt;
&lt;td&gt;assemble-debug&lt;/td&gt;
&lt;td&gt;nonwarm&lt;/td&gt;
&lt;td&gt;🔴 +1.10%&lt;/td&gt;
&lt;td&gt;🔴 +3.67%&lt;/td&gt;
&lt;td&gt;🔴 +13.79%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectMedium&lt;/td&gt;
&lt;td&gt;assemble-debug&lt;/td&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;🔴 +0.79%&lt;/td&gt;
&lt;td&gt;🔴 +1.75%&lt;/td&gt;
&lt;td&gt;🟢 -7.34%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectMedium&lt;/td&gt;
&lt;td&gt;assemble-release&lt;/td&gt;
&lt;td&gt;nonwarm&lt;/td&gt;
&lt;td&gt;🟢 -1.01%&lt;/td&gt;
&lt;td&gt;🟢 -1.65%&lt;/td&gt;
&lt;td&gt;🟢 -1.14%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectMedium&lt;/td&gt;
&lt;td&gt;assemble-release&lt;/td&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;🟢 -1.60%&lt;/td&gt;
&lt;td&gt;🔴 +4.04%&lt;/td&gt;
&lt;td&gt;🔴 +6.93%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectSmall&lt;/td&gt;
&lt;td&gt;assemble-debug&lt;/td&gt;
&lt;td&gt;nonwarm&lt;/td&gt;
&lt;td&gt;🟢 -0.35%&lt;/td&gt;
&lt;td&gt;🔴 +16.19%&lt;/td&gt;
&lt;td&gt;🟢 -2.62%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectSmall&lt;/td&gt;
&lt;td&gt;assemble-debug&lt;/td&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;🔴 +2.60%&lt;/td&gt;
&lt;td&gt;🟢 -25.00%&lt;/td&gt;
&lt;td&gt;🟢 -11.13%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectSmall&lt;/td&gt;
&lt;td&gt;assemble-release&lt;/td&gt;
&lt;td&gt;nonwarm&lt;/td&gt;
&lt;td&gt;🔴 +2.78%&lt;/td&gt;
&lt;td&gt;🟢 -4.02%&lt;/td&gt;
&lt;td&gt;🟢 -0.79%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectSmall&lt;/td&gt;
&lt;td&gt;assemble-release&lt;/td&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;🔴 +0.15%&lt;/td&gt;
&lt;td&gt;🟢 -0.70%&lt;/td&gt;
&lt;td&gt;🟢 -0.67%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectLarge&lt;/td&gt;
&lt;td&gt;assemble-debug&lt;/td&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;🔴 +0.27%&lt;/td&gt;
&lt;td&gt;🟢 -13.34%&lt;/td&gt;
&lt;td&gt;🟢 -12.34%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectLarge&lt;/td&gt;
&lt;td&gt;assemble-release&lt;/td&gt;
&lt;td&gt;nonwarm&lt;/td&gt;
&lt;td&gt;🟢 -1.14%&lt;/td&gt;
&lt;td&gt;🔴 +7.50%&lt;/td&gt;
&lt;td&gt;🟢 -0.36%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeneratedProjectLarge&lt;/td&gt;
&lt;td&gt;assemble-release&lt;/td&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;🟢 -2.37%&lt;/td&gt;
&lt;td&gt;🔴 +11.73%&lt;/td&gt;
&lt;td&gt;🟢 -0.28%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The results were not strong enough to justify presenting this as “here is a better GC policy for Android CI.” The behavior was too mixed and inconsistent to call it a success. So I ended up closing the experiment earlier than I expected, partly because I wanted to publish something today and the data did not support a stronger conclusion.&lt;/p&gt;

&lt;p&gt;Even if the collector changes did not clearly win, the process itself was still valuable. Workload tracing, JFR profiling, iterative JDK changes, and fast experimentation with Codex let me push the idea much further than I normally would. So while this is not a success story in terms of results, it still feels like a good example of how far this kind of experimentation can go.&lt;/p&gt;
&lt;h2&gt;
  
  
  Using the new ImNotOkay Collector Policy
&lt;/h2&gt;

&lt;p&gt;If you made it this far and still want to try it, the funny thing is that it is totally possible. You can use the new policy with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;org.gradle.jvmargs&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;-XX:+UnlockExperimentalVMOptions -XX:+UseImNotOkayGC ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And you would also need to use the custom JDK in your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Download custom JDK artifact&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;GH_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.token }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;gh run download 23719062741 \&lt;/span&gt;
      &lt;span class="s"&gt;--repo cdsap/im-not-ok-metrics \&lt;/span&gt;
      &lt;span class="s"&gt;--name custom-jdk-linux-9ad2e63f1763 \&lt;/span&gt;
      &lt;span class="s"&gt;--dir "$GITHUB_WORKSPACE/custom-jdk"&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Unpack and activate custom JDK&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;mkdir -p "$GITHUB_WORKSPACE/custom-jdk/unpacked"&lt;/span&gt;
    &lt;span class="s"&gt;tar -xzf "$GITHUB_WORKSPACE/custom-jdk/custom-jdk-linux-9ad2e63f1763.tar.gz" \&lt;/span&gt;
      &lt;span class="s"&gt;-C "$GITHUB_WORKSPACE/custom-jdk/unpacked"&lt;/span&gt;
    &lt;span class="s"&gt;echo "JAVA_HOME=$GITHUB_WORKSPACE/custom-jdk/unpacked/jdk" &amp;gt;&amp;gt; "$GITHUB_ENV"&lt;/span&gt;
    &lt;span class="s"&gt;echo "$GITHUB_WORKSPACE/custom-jdk/unpacked/jdk/bin" &amp;gt;&amp;gt; "$GITHUB_PATH"&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./gradlew assembleDebug&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final words
&lt;/h2&gt;

&lt;p&gt;Even if the collector changes did not clearly win, the process itself was promising: workload tracing, JFR profiling, and iterative JDK changes turned out to be a practical loop for testing GC ideas against real builds.&lt;/p&gt;

&lt;p&gt;And yes, if you are wondering about the name, ImNotOkay came from the song that happened to be playing when I needed one. There is no deeper meaning behind it. The next songs in the playlist were &lt;em&gt;Lateralus&lt;/em&gt; by Tool and &lt;em&gt;People of the Sun&lt;/em&gt; by Rage Against the Machine, so this was probably the safest outcome.&lt;/p&gt;

</description>
      <category>gradle</category>
      <category>android</category>
    </item>
    <item>
      <title>Using RSS to Understand Memory Pressure in CI Builds</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Sat, 28 Mar 2026 23:44:46 +0000</pubDate>
      <link>https://dev.to/cdsap/using-rss-to-understand-memory-pressure-in-ci-builds-1358</link>
      <guid>https://dev.to/cdsap/using-rss-to-understand-memory-pressure-in-ci-builds-1358</guid>
      <description>&lt;p&gt;Once in a while, you may have wondered why builds running on CI agents can still hit OOM errors, even on machines with large amounts of memory. For example, how is it possible to hit an OOM on a 32 GB machine even after setting a 16 GB heap?&lt;/p&gt;

&lt;p&gt;The first and most immediate answer is that the value configured via &lt;code&gt;jvmargs&lt;/code&gt; in &lt;code&gt;gradle.properties&lt;/code&gt; applies only to the heap of the Gradle process. From the operating system’s point of view, a JVM process is composed of more than just the heap. Several additional components contribute to the total memory footprint, and these are often overlooked when sizing CI agents or tuning memory limits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metaspace&lt;/li&gt;
&lt;li&gt;Code cache&lt;/li&gt;
&lt;li&gt;Thread stacks&lt;/li&gt;
&lt;li&gt;Direct buffers&lt;/li&gt;
&lt;li&gt;GC native memory&lt;/li&gt;
&lt;li&gt;Native / OS memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these are grouped under the RSS (Resident Set Size) of the Java process on Unix-like systems.&lt;/p&gt;

&lt;p&gt;Another important reason is that the Gradle process is not the only JVM involved in a build. We also have the Kotlin daemon, test JVMs, and in Android builds, additional isolated processes such as Lint or R8. Each of these processes has its own heap and its own RSS footprint. Together, all of them contribute to the total memory pressure on the machine.&lt;/p&gt;

&lt;p&gt;In OOM scenarios, there is an additional problem: the host machine may kill the Gradle process before the build finishes. When that happens, we lose valuable diagnostic data. In CI environments, and especially in GitHub Actions, this is even worse because we usually cannot attach post-build steps to collect more information.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbxup9y4mzwzqodh79bty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbxup9y4mzwzqodh79bty.png" alt="Post setup actions" width="428" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since OOM scenarios are exactly the situations where visibility matters most, I ended up building a GitHub Action for that: &lt;a href="https://process-watcher.web.app/" rel="noopener noreferrer"&gt;Process Watcher&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we track memory behavior over time across JVM processes, combining RSS, heap usage, and GC activity. The goal is to move beyond static numbers and understand how memory pressure evolves during the build.&lt;/p&gt;
&lt;h3&gt;
  
  
  Capturing the RSS of a process
&lt;/h3&gt;

&lt;p&gt;To understand real memory usage during build execution, we need to analyze RSS, not just heap size. On Unix-like systems, RSS reflects the physical memory currently held by the process, which makes it a better signal for understanding memory pressure.&lt;/p&gt;

&lt;p&gt;To check the RSS of a process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ps &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;rss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command outputs values in kilobytes, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;654321
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means ~639 MB of physical RAM.&lt;/p&gt;

&lt;p&gt;At first glance, we could collect this data at the end of the build. But this has two problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not all processes live until the end&lt;/li&gt;
&lt;li&gt;The build can be killed or time out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the build is killed, we lose everything: no data, no diagnostics.&lt;/p&gt;

&lt;p&gt;Because of that, I decided to take a different approach: run a separate monitoring process during the build.&lt;/p&gt;

&lt;p&gt;Initially, the approach was simple: capture RSS and heap usage during the build and archive the data at the end of the execution. A typical output looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csvs"&gt;&lt;code&gt;&lt;span class="k"&gt;Elapsed&lt;/span&gt;&lt;span class="err"&gt;_&lt;/span&gt;&lt;span class="k"&gt;Time&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="k"&gt;PID&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="k"&gt;Name&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="k"&gt;Heap&lt;/span&gt;&lt;span class="err"&gt;_&lt;/span&gt;&lt;span class="k"&gt;Used&lt;/span&gt;&lt;span class="err"&gt;_&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="k"&gt;Heap&lt;/span&gt;&lt;span class="err"&gt;_&lt;/span&gt;&lt;span class="k"&gt;Capacity&lt;/span&gt;&lt;span class="err"&gt;_&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="k"&gt;RSS&lt;/span&gt;&lt;span class="err"&gt;_&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt;
&lt;span class="ld"&gt;00:00:05&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;149&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="k"&gt;GradleDaemon&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;29.7&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;86.0&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;241.0&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt;
&lt;span class="ld"&gt;00:00:10&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;149&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="k"&gt;GradleDaemon&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;191.7&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;338.0&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;560.1&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt;
&lt;span class="ld"&gt;00:00:16&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;149&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="k"&gt;GradleDaemon&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;113.1&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;198.0&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="mf"&gt;428.4&lt;/span&gt;&lt;span class="k"&gt;MB&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this, we can visualize the data and calculate the total RSS across all Gradle processes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiua5kn883qlv61qm7t4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiua5kn883qlv61qm7t4r.png" alt="Visualizing RSS monitoring" width="800" height="310"&gt;&lt;/a&gt;&lt;br&gt;
In addition to RSS and heap usage, we can also track cumulative GC time and better understand how memory behaves during the build.&lt;/p&gt;

&lt;p&gt;That worked for successful builds, but it had the same limitation: if the main Gradle process was killed, we lost all the data.&lt;/p&gt;

&lt;p&gt;To address that, I added a remote mode that publishes the data to a Firebase database, allowing live monitoring even when the build fails or is interrupted.&lt;/p&gt;

&lt;p&gt;With that in place, we can now look at some practical scenarios where this kind of visibility helps explain memory behavior in Android builds.&lt;/p&gt;

&lt;h3&gt;
  
  
  The case of misaligned Kotlin versions
&lt;/h3&gt;

&lt;p&gt;We start with a known suspect. As mentioned in previous articles, this scenario can happen when the Kotlin version embedded in the Gradle distribution is misaligned with the Kotlin version used by the project.&lt;/p&gt;

&lt;p&gt;If we run a typical &lt;code&gt;nowinandroid&lt;/code&gt; build (&lt;code&gt;:app:assembleProdDebug&lt;/code&gt;) and attach our GitHub Actions instrumentation tool, we observe the following memory profile:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikuf64qdczlc6llk6sr1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikuf64qdczlc6llk6sr1.png" alt="Nowinandroid" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The image clearly shows two Kotlin processes. The first one, PID 5133, is spawned during the compilation of the included builds and remains unused during the execution phase.&lt;/p&gt;

&lt;p&gt;Although its heap usage at the end of the build is only 429 MiB, its RSS footprint accounts for 8.4% of the total RSS memory of the build. In environments closer to the memory limit, for example on a free GitHub Actions runner, this alone can represent around 4% of the available memory.&lt;/p&gt;

&lt;p&gt;The key point is that the RSS of this first Kotlin process is never reclaimed, so that memory remains allocated for the entire build. In practice, this reduces the memory available for the rest of the build without providing any benefit during execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeouts
&lt;/h3&gt;

&lt;p&gt;In the second scenario, we analyze another common case. We have heard several times from users that some builds hit the timeout defined in the job configuration. &lt;/p&gt;

&lt;p&gt;These timeouts act as a safeguard against builds running indefinitely, but they also indicate that something is not behaving as expected. When the timeout kills the agent, we lose the Gradle process and any information that would normally be reported at the end of the build.&lt;/p&gt;

&lt;p&gt;In some cases, the issue is related to thread locks. In others, it is an unexpected memory situation that can be understood by analyzing memory metrics across the different JVM processes.&lt;/p&gt;

&lt;p&gt;For instance, let’s analyze this build:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwpo9fiuiwjlx3eqds1m9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwpo9fiuiwjlx3eqds1m9.png" alt="Timeout build" width="800" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The total RSS is not hitting the maximum, and the agent is not killed due to memory pressure. Instead, the build is terminated by a timeout.&lt;/p&gt;

&lt;p&gt;At first glance, this does not look like a typical OOM scenario. But if we look at how memory behaves over time, a different pattern appears.&lt;/p&gt;

&lt;p&gt;One interesting detail is that we observe an almost flat pattern in the later stages of both the Kotlin and Gradle processes. In this case, it is useful to review the GC graph:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59npiyv1m0r3b77wp4aw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59npiyv1m0r3b77wp4aw.png" alt="GC growth in timeout scenario" width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The GC activity of the Gradle process shows a clear linear growth over time, which indicates that the heap is under pressure and memory is not being reclaimed efficiently. This is the kind of pattern that may not immediately fail the build, but still keeps it alive in a degraded state until the timeout is reached.&lt;/p&gt;

&lt;p&gt;The key point is that if we detect this pattern early, we can stop the build sooner and avoid wasting time and resources. In this example, that could save up to 30% of the build time.&lt;/p&gt;

&lt;p&gt;To make this concrete, the following replay shows the moment the degraded pattern becomes visible, where cancelling the build would have avoided the overhead of letting it continue:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft06ma6dvautzxmdhh5bl.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft06ma6dvautzxmdhh5bl.gif" alt="Timeout gif" width="500" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  G1 vs Parallel GC
&lt;/h4&gt;

&lt;p&gt;Another common question is which GC is more suitable. Performance is important, but we should also consider the RSS footprint when comparing different GC strategies.&lt;/p&gt;

&lt;p&gt;In many cases, we focus only on build time, but memory behavior can vary significantly between GC implementations. Some may be faster, while others allow the OS to reclaim memory more efficiently.&lt;/p&gt;
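
&lt;p&gt;For reference, switching collectors is a one-line change in &lt;code&gt;gradle.properties&lt;/code&gt;. The flags below are standard HotSpot options; the heap size is illustrative:&lt;/p&gt;

```properties
# G1:
org.gradle.jvmargs=-Xmx5g -XX:+UseG1GC
# Parallel (comment the line above and use this instead):
# org.gradle.jvmargs=-Xmx5g -XX:+UseParallelGC
```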

&lt;p&gt;Let’s look at a G1 vs Parallel comparison of the build:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5fyzyyhw6s2cejqn6us.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5fyzyyhw6s2cejqn6us.gif" alt="Compare G1 Vs Parallel" width="720" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From these measurements, we can observe that, regardless of the performance outcome, the OS is able to reclaim memory more efficiently with G1. That tradeoff can matter in CI environments where staying below the memory threshold is more important than small differences in execution time.&lt;/p&gt;

&lt;p&gt;For completeness, here is the cumulative view (G1 on the left, Parallel on the right):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfv5r1yvltvp6h62h8sx.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfv5r1yvltvp6h62h8sx.gif" alt="G1 vs Parallel cumulative GC comparison" width="720" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  The OOM puzzle
&lt;/h4&gt;

&lt;p&gt;Finally, let’s look at perhaps the most valuable use case for this kind of monitoring: an OOM-killed build.&lt;br&gt;
It all starts with this discouraging message in GitHub Actions, where we don't get any additional feedback and the post steps haven’t been executed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F071vz7usacu0tqbyomnt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F071vz7usacu0tqbyomnt.png" alt="OOM failure in GitHub Actions" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We get no useful feedback, and as mentioned before, we also do not have the chance to run post steps to archive logs, measurements, or any other diagnostic data. &lt;/p&gt;

&lt;p&gt;In this case, if we enable the remote mode of the Build Process Watcher, we can at least preserve the latest snapshot of the Gradle processes before the agent kills the container or the build. What we get is this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb068ldt8vuy1aqgfdwkc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb068ldt8vuy1aqgfdwkc.png" alt="RSS before OOM kill" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We know that GitHub Actions free runners have a 16 GB memory limit, and in the image we can already see that, before the failure, the Gradle process was increasing its RSS in a clear high-memory-pressure scenario.&lt;/p&gt;

&lt;p&gt;In this case, the Gradle heap was configured to 10 GB. A common misconception is to assume that this is enough, or to leave the Kotlin daemon heap unspecified and trust the defaults. But the Kotlin process still contributes its own memory footprint: looking at the data, its peak RSS reached 6962.0 MB, adding more pressure on top of the Gradle process.&lt;/p&gt;

&lt;p&gt;So it is easy to see how the build gets dangerously close to the machine limit. But the point is not only to spot the problem; it is to make the build work.&lt;/p&gt;

&lt;p&gt;What I did here was try different memory splits between the Gradle and Kotlin processes and compare the runs with Process Watcher. Looking at RSS growth, heap usage, GC behavior, and whether the build completed or not gave me a better direction, but it still took some experimentation to find a stable configuration.&lt;/p&gt;

&lt;p&gt;In this case, the fix was not just increasing memory. After trying different combinations, the only stable one was 7 GB for Gradle and 3 GB for the Kotlin process. Other combinations still ended in OOM or timeouts:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7kxde4vosbagr7leker.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7kxde4vosbagr7leker.png" alt="Optimized build" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;
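
&lt;p&gt;For reference, a split like the stable one above can be expressed in &lt;code&gt;gradle.properties&lt;/code&gt;; the exact values are the ones that worked for this project and runner, not a universal recommendation:&lt;/p&gt;

```properties
# 7 GB for the Gradle daemon, 3 GB for the Kotlin daemon
org.gradle.jvmargs=-Xmx7g
kotlin.daemon.jvmargs=-Xmx3g
```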

&lt;p&gt;So for me, the interesting part here is not only the final numbers, but how we got there. By observing the RSS pressure and comparing the runs, we were able to move from an unstable memory profile to a configuration that was sustainable for the runner limit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Words
&lt;/h2&gt;

&lt;p&gt;In my case, for GitHub Actions, I published Process Watcher, but the general idea is simple and can be implemented in different ways. The important part is not the specific tool, but having a way to observe RSS, heap usage, and GC behavior while the build is running. That visibility makes it much easier to understand memory pressure and iterate toward more stable configurations.&lt;/p&gt;

&lt;p&gt;One note: to use the visualization tools in Process Watcher, you do not need to enable the remote option. The site provides a &lt;a href="https://process-watcher.web.app/replay.html" rel="noopener noreferrer"&gt;replay&lt;/a&gt; view and a &lt;a href="https://process-watcher.web.app/compare.html" rel="noopener noreferrer"&gt;compare&lt;/a&gt; view that can be used with the artifacts generated at the end of the build, without publishing data to Firebase. You can also just download the generated HTML files and open them locally.&lt;/p&gt;

&lt;p&gt;Happy Building!&lt;/p&gt;

</description>
      <category>githubactions</category>
      <category>gradle</category>
    </item>
    <item>
      <title>What Happens When You Kill the Kotlin Daemon Before R8?</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Tue, 30 Dec 2025 15:15:04 +0000</pubDate>
      <link>https://dev.to/cdsap/what-happens-when-you-kill-the-kotlin-daemon-before-r8-el7</link>
      <guid>https://dev.to/cdsap/what-happens-when-you-kill-the-kotlin-daemon-before-r8-el7</guid>
      <description>&lt;p&gt;Every Android build spins up more JVM processes than most developers realize. Beyond the Gradle wrapper, there’s the main Gradle process orchestrating the build and several child processes created for specific tasks. These include test processes that execute in isolation, optional separate JVMs for tools like Lint or R8 when configured to run out of process, individual Java tasks that may also run in their own JVM, and the Kotlin compiler process, which delegates all Kotlin compilation units to a dedicated Kotlin daemon:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hxzexyzk792iyls0t34.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hxzexyzk792iyls0t34.png" alt=" " width="518" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Understanding how these processes interact helps uncover hidden inefficiencies that affect both memory usage and build performance, especially in CI environments.&lt;/p&gt;

&lt;p&gt;In general, these child processes terminate once their associated task has completed. However, that’s not the case for the Kotlin process. If we inspect the running JVMs after a build finishes, we can still see both the main Gradle process and the Kotlin daemon alive — even after all compilation work has ended:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3564 GradleDaemon
15564 KotlinCompileDaemon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This behavior is intentional. The Kotlin daemon stays alive to speed up incremental builds by avoiding the startup overhead of a new compiler process. While that optimization is useful for local development, it provides little benefit in CI environments, where builds are typically clean and short-lived.&lt;/p&gt;

&lt;p&gt;In Android builds, which are often highly modularized, the Kotlin compiler is required across all modules that include Kotlin sources. Because of this, the Kotlin process remains active throughout the compilation of all modules. The last compilation unit is typically the main entry point or Android application module, and these modules tend to be heavier, often running additional demanding tasks after Kotlin compilation. This makes it a natural point where the Kotlin process is no longer needed, and releasing its memory can benefit the tasks that follow. This detail can be crucial in environments that are close to their available memory threshold, where freeing the Kotlin process at the right moment can prevent OOM errors and improve overall build stability.&lt;/p&gt;

&lt;p&gt;The main goal of this article is to experiment and measure the impact of changing this process behavior, based on the idea that a persisted process is not required once it has completed all the tasks associated with it.&lt;/p&gt;

&lt;h3&gt;
  
  
  R8 Tasks
&lt;/h3&gt;

&lt;p&gt;In this analysis, we focus specifically on release builds. These builds execute the R8 task, whose main responsibility is to shrink, obfuscate, and optimize the app components that will be packaged into the final binary. This task plays a critical role in Android builds and is often a major contributor to build duration due to its computational cost.&lt;/p&gt;

&lt;p&gt;An interesting aspect of R8 is that, by design, it runs at the very end of the build process, once all other compilation and linking phases have been completed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey0056u84l0m327mpswj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey0056u84l0m327mpswj.png" alt=" " width="800" height="78"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because of this sequencing, there’s no overlap between R8 and the Kotlin compiler within the same build variant. Once Kotlin compilation has finished, R8 operates independently, processing bytecode, resources, and dependencies. This means that the Kotlin process, still alive at this point, isn’t performing any work and can safely be terminated:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6ly6j7x3zwuyv772rpj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6ly6j7x3zwuyv772rpj.png" alt=" " width="501" height="181"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The goal is to free unnecessary memory before R8 executes, ensuring the system has as much available memory as possible for the final phase of the build. While this doesn’t directly address R8’s primary bottleneck (its CPU-bound processing), it introduces a secondary hypothesis worth testing:&lt;br&gt;
tasks may execute faster in more isolated process environments, where memory and CPU resources face less contention.&lt;/p&gt;
&lt;h3&gt;
  
  
  How to terminate the process
&lt;/h3&gt;

&lt;p&gt;The implementation is straightforward. We implemented a &lt;code&gt;ValueSource&lt;/code&gt; that, through an injected &lt;code&gt;ExecOperations&lt;/code&gt;, executes the command used to kill the existing Kotlin processes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;obtain&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;execOperations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nf"&gt;commandLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"-c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;commands&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ignored&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allowed us to create a new provider with the value source and the termination command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;of&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;KillKotlinCompileDaemonValueSource&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;java&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;commands&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DEFAULT_COMMAND&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Terminate Kotlin processes:&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;DEFAULT_COMMAND&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
    &lt;span class="s"&gt;"jps | grep -E \"KotlinCompileDaemon\" | awk '{print \$1}' | xargs -r kill -9"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we created a task that receives the provider as input and executes the command during the task action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;abstract&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;KillKotlinCompileDaemonTask&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;DefaultTask&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nc"&gt;Input&lt;/span&gt;
    &lt;span class="k"&gt;abstract&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;kotlinDaemonKillInfo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Property&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;

    &lt;span class="nd"&gt;@TaskAction&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;killDaemons&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;kotlinDaemonKillInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lifecycle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Kill Kotlin compile daemon command executed"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we wire up when we want to execute the termination process. We register our task and make it run after the Kotlin compile task in the main application module.&lt;/p&gt;
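
&lt;p&gt;As a sketch, the wiring in the application module’s &lt;code&gt;build.gradle.kts&lt;/code&gt; could look like this. The task and provider come from the snippets above; the compile task name (&lt;code&gt;compileReleaseKotlin&lt;/code&gt;) is an assumption and depends on your variants:&lt;/p&gt;

```kotlin
// Register the kill task with the ValueSource-backed provider as input...
val killKotlinDaemon = tasks.register<KillKotlinCompileDaemonTask>("killKotlinCompileDaemon") {
    kotlinDaemonKillInfo.set(provider)
}

// ...and run it right after the application module's Kotlin compilation,
// so the daemon's memory is returned to the OS before R8 starts.
tasks.named("compileReleaseKotlin") {
    finalizedBy(killKotlinDaemon)
}
```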

&lt;h3&gt;
  
  
  Simple demonstration
&lt;/h3&gt;

&lt;p&gt;How does this look in practice? If we analyze only the memory allocation of build child processes, we see that after our task runs and kills the Kotlin process, the impact is clear:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl430deq2s4cdntye69ir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl430deq2s4cdntye69ir.png" alt=" " width="672" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is also a bigger benefit. We are not just releasing the heap allocation of the Kotlin process, we are freeing the entire memory footprint of that process and giving it back to the operating system:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphj2r5rqczhgw9ijui7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphj2r5rqczhgw9ijui7n.png" alt=" " width="664" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same applies in environments where Kotlin versions are not aligned and multiple Kotlin processes exist:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14j9x83xue1owhynwd8h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14j9x83xue1owhynwd8h.png" alt=" " width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In summary, we are successfully returning the Kotlin process memory back to the OS and creating a lighter environment before R8 execution. Next, we analyze the results of the experiments. &lt;/p&gt;

&lt;h3&gt;
  
  
  Experiment environment
&lt;/h3&gt;

&lt;p&gt;Because we want to measure the impact in CI builds, we set the entire experiment in GitHub Actions.&lt;br&gt;
We selected three different projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;nowinandroid: &lt;a href="https://github.com/android/nowinandroid" rel="noopener noreferrer"&gt;https://github.com/android/nowinandroid&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A synthetic project with 120 modules created by &lt;a href="https://github.com/cdsap/ProjectGenerator" rel="noopener noreferrer"&gt;ProjectGenerator&lt;/a&gt;: &lt;a href="https://github.com/cdsap/androidRectangle120modules" rel="noopener noreferrer"&gt;https://github.com/cdsap/androidRectangle120modules&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Signal Android app: &lt;a href="https://github.com/signalapp/Signal-Android" rel="noopener noreferrer"&gt;https://github.com/signalapp/Signal-Android&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each project has two variants: the default main configuration, and a variant applying the &lt;a href="https://github.com/cdsap/R8Booster" rel="noopener noreferrer"&gt;R8Booster plugin&lt;/a&gt; in the Android application module. This plugin provides the task that terminates the Kotlin process right after the Kotlin compilation phase.&lt;br&gt;
For the iterations, we ran one warm-up build to download dependencies and initialize the Gradle User Home, followed by 100 iterations of the assembleRelease task on fresh agents, reusing only dependencies and transforms from the cache.&lt;/p&gt;

&lt;p&gt;The tasks under experiment were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;nowinandroid: &lt;code&gt;:app:assembleProdRelease&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;synthetic project: &lt;code&gt;assembleRelease&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Signal Android: &lt;code&gt;:Signal-Android:assemblePlayProdRelease&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment Results
&lt;/h3&gt;

&lt;p&gt;After preparing the environment and running 100 iterations per variant, we aggregated the results to compare build time, R8 task duration, and peak memory usage.&lt;/p&gt;

&lt;p&gt;The following table shows the results of the experiments for the main metrics across all projects, based on the mean values:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Build Time Improvement&lt;/th&gt;
&lt;th&gt;R8 Task Improvement&lt;/th&gt;
&lt;th&gt;Max Memory Reduction&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Now in Android&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.0%&lt;/td&gt;
&lt;td&gt;1.5%&lt;/td&gt;
&lt;td&gt;14.7%&lt;/td&gt;
&lt;td&gt;Minimal time change, strong memory reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Synthetic Project (120 modules)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.7%&lt;/td&gt;
&lt;td&gt;5.6%&lt;/td&gt;
&lt;td&gt;13.3%&lt;/td&gt;
&lt;td&gt;Moderate, consistent gains across all metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Signal Android App&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.1%&lt;/td&gt;
&lt;td&gt;7.0%&lt;/td&gt;
&lt;td&gt;14.5%&lt;/td&gt;
&lt;td&gt;Best overall results, closest to a real project scenario&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Terminating the Kotlin process after compilation had no negative impact on build performance and noticeably reduced memory usage across all projects.&lt;br&gt;
The Signal Android app, representing a real-world scenario, achieved the best overall gains, with build times 3% faster, R8 7% faster, and memory reduced by about 15%.&lt;/p&gt;

&lt;h4&gt;
  
  
  nowinandroid
&lt;/h4&gt;

&lt;p&gt;Build Scans:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ge.solutions-team.gradle.com/scans/?search.startTimeMax=1766616310305&amp;amp;search.startTimeMin=1766270710305&amp;amp;search.tags=cdsap-259_varianta_r8_with_kotlin_process&amp;amp;search.timeZoneId=America%2FSanto_Domingo" rel="noopener noreferrer"&gt;Main Branch&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans/?search.startTimeMax=1766616310305&amp;amp;search.startTimeMin=1766270710305&amp;amp;search.tags=cdsap-259_variantb_kill_kotlin_before_r8&amp;amp;search.timeZoneId=America%2FSanto_Domingo" rel="noopener noreferrer"&gt;Terminate Kotlin Process&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Build time
&lt;/h5&gt;

&lt;p&gt;Unit: seconds&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ficic4k23h7cri85f03k0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ficic4k23h7cri85f03k0.png" alt=" " width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;main&lt;/th&gt;
&lt;th&gt;Terminate Kotlin Process&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;th&gt;Diff %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;283&lt;/td&gt;
&lt;td&gt;283&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;median&lt;/td&gt;
&lt;td&gt;283&lt;/td&gt;
&lt;td&gt;283&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p90&lt;/td&gt;
&lt;td&gt;294&lt;/td&gt;
&lt;td&gt;292&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0.7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h5&gt;
  
  
  R8 Task
&lt;/h5&gt;

&lt;p&gt;Unit: seconds&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbwl11r8in4kl4oxlaenr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbwl11r8in4kl4oxlaenr.png" alt=" " width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;main&lt;/th&gt;
&lt;th&gt;Terminate Kotlin Process&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;th&gt;Diff %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;135&lt;/td&gt;
&lt;td&gt;133&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;median&lt;/td&gt;
&lt;td&gt;135&lt;/td&gt;
&lt;td&gt;133&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p90&lt;/td&gt;
&lt;td&gt;140&lt;/td&gt;
&lt;td&gt;137&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h5&gt;
  
  
  Peak Build Memory
&lt;/h5&gt;

&lt;p&gt;Unit: GiB&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wvrwv18msurhxynenus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wvrwv18msurhxynenus.png" alt=" " width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;main&lt;/th&gt;
&lt;th&gt;Terminate Kotlin Process&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;th&gt;Diff %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;8.69&lt;/td&gt;
&lt;td&gt;7.41&lt;/td&gt;
&lt;td&gt;1.28&lt;/td&gt;
&lt;td&gt;14.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;median&lt;/td&gt;
&lt;td&gt;8.69&lt;/td&gt;
&lt;td&gt;7.38&lt;/td&gt;
&lt;td&gt;1.31&lt;/td&gt;
&lt;td&gt;15.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p90&lt;/td&gt;
&lt;td&gt;8.81&lt;/td&gt;
&lt;td&gt;7.72&lt;/td&gt;
&lt;td&gt;1.09&lt;/td&gt;
&lt;td&gt;12.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Synthetic project (120 modules)
&lt;/h4&gt;

&lt;p&gt;Build Scans:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ge.solutions-team.gradle.com/scans/?search.startTimeMax=1766624555795&amp;amp;search.startTimeMin=1766278955795&amp;amp;search.tags=cdsap-260_varianta_r8&amp;amp;search.timeZoneId=America%2FSanto_Domingo" rel="noopener noreferrer"&gt;Main Branch&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans/?search.startTimeMax=1766624555795&amp;amp;search.startTimeMin=1766278955795&amp;amp;search.tags=cdsap-260_variantb_terminating_kotlin_before_r8&amp;amp;search.timeZoneId=America%2FSanto_Domingo" rel="noopener noreferrer"&gt;Terminate Kotlin Process&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Build time
&lt;/h5&gt;

&lt;p&gt;Unit: seconds&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzz31h0nsc008wq23vjo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzz31h0nsc008wq23vjo.png" alt=" " width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;main&lt;/th&gt;
&lt;th&gt;Terminate Kotlin Process&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;th&gt;Diff %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;654&lt;/td&gt;
&lt;td&gt;643&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;1.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;median&lt;/td&gt;
&lt;td&gt;652&lt;/td&gt;
&lt;td&gt;643&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;1.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p90&lt;/td&gt;
&lt;td&gt;676&lt;/td&gt;
&lt;td&gt;660&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;2.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h5&gt;
  
  
  R8 Task
&lt;/h5&gt;

&lt;p&gt;Unit: seconds&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkyv4et8du5k70kbnanqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkyv4et8du5k70kbnanqn.png" alt=" " width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;main&lt;/th&gt;
&lt;th&gt;Terminate Kotlin Process&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;th&gt;Diff %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;5.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;median&lt;/td&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;5.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p90&lt;/td&gt;
&lt;td&gt;95&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;6.3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h5&gt;
  
  
  Peak Build Memory
&lt;/h5&gt;

&lt;p&gt;Unit: GiB&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4adsgb1eb8rgypdeo82v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4adsgb1eb8rgypdeo82v.png" alt=" " width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;main&lt;/th&gt;
&lt;th&gt;Terminate Kotlin Process&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;th&gt;Diff %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;11.77&lt;/td&gt;
&lt;td&gt;10.20&lt;/td&gt;
&lt;td&gt;1.57&lt;/td&gt;
&lt;td&gt;13.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;median&lt;/td&gt;
&lt;td&gt;11.72&lt;/td&gt;
&lt;td&gt;10.19&lt;/td&gt;
&lt;td&gt;1.53&lt;/td&gt;
&lt;td&gt;13.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p90&lt;/td&gt;
&lt;td&gt;12.07&lt;/td&gt;
&lt;td&gt;10.54&lt;/td&gt;
&lt;td&gt;1.53&lt;/td&gt;
&lt;td&gt;12.7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Signal Android
&lt;/h4&gt;

&lt;p&gt;Build Scans:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ge.solutions-team.gradle.com/scans/?search.startTimeMax=1766715029526&amp;amp;search.startTimeMin=1766369429526&amp;amp;search.tags=cdsap-263_varianta_r8&amp;amp;search.timeZoneId=America%2FSanto_Domingo" rel="noopener noreferrer"&gt;Main Branch&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1766530799999&amp;amp;search.startTimeMin=1766271600000&amp;amp;search.tags=cdsap-259_variantb_kill_kotlin_before_r8&amp;amp;search.timeZoneId=Europe%2FMadrid" rel="noopener noreferrer"&gt;Terminate Kotlin Process&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Build time
&lt;/h5&gt;

&lt;p&gt;Unit: seconds&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgmqtvtmgar1s9upyqtb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjgmqtvtmgar1s9upyqtb.png" alt=" " width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;main&lt;/th&gt;
&lt;th&gt;Terminate Kotlin Process&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;th&gt;Diff %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;604&lt;/td&gt;
&lt;td&gt;585&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;3.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;median&lt;/td&gt;
&lt;td&gt;601&lt;/td&gt;
&lt;td&gt;583&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;3.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p90&lt;/td&gt;
&lt;td&gt;626&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;4.2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h5&gt;
  
  
  R8 Task
&lt;/h5&gt;

&lt;p&gt;Unit: seconds&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1afh2zj6eili6bjl02c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1afh2zj6eili6bjl02c.png" alt=" " width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;main&lt;/th&gt;
&lt;th&gt;Terminate Kotlin Process&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;th&gt;Diff %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;215&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;7.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;median&lt;/td&gt;
&lt;td&gt;215&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;7.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p90&lt;/td&gt;
&lt;td&gt;225&lt;/td&gt;
&lt;td&gt;207&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;8.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h5&gt;
  
  
  Peak Build Memory
&lt;/h5&gt;

&lt;p&gt;Unit: GiB&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F798zzndh3y0834jj2l8b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F798zzndh3y0834jj2l8b.png" alt=" " width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;main&lt;/th&gt;
&lt;th&gt;Terminate Kotlin Process&lt;/th&gt;
&lt;th&gt;Diff&lt;/th&gt;
&lt;th&gt;Diff %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mean&lt;/td&gt;
&lt;td&gt;15.38&lt;/td&gt;
&lt;td&gt;13.15&lt;/td&gt;
&lt;td&gt;2.23&lt;/td&gt;
&lt;td&gt;14.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;median&lt;/td&gt;
&lt;td&gt;15.39&lt;/td&gt;
&lt;td&gt;13.13&lt;/td&gt;
&lt;td&gt;2.26&lt;/td&gt;
&lt;td&gt;14.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p90&lt;/td&gt;
&lt;td&gt;15.45&lt;/td&gt;
&lt;td&gt;14.13&lt;/td&gt;
&lt;td&gt;1.32&lt;/td&gt;
&lt;td&gt;8.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
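&lt;p&gt;For reference, the statistics in these tables can be reproduced from the raw build-time samples. A minimal sketch, using illustrative numbers rather than the experiment’s actual series:&lt;/p&gt;

```python
import statistics

def summarize(main, variant):
    """Compare two build-time series the way the tables above do."""
    def p90(xs):
        xs = sorted(xs)
        # index-based 90th percentile (approximate for small samples)
        return xs[int(0.9 * (len(xs) - 1))]
    rows = []
    for name, fn in [("mean", statistics.mean), ("median", statistics.median), ("p90", p90)]:
        a, b = fn(main), fn(variant)
        rows.append((name, a, b, a - b, round(100 * (a - b) / a, 1)))
    return rows

# Illustrative numbers only
for row in summarize([654, 652, 676], [643, 643, 660]):
    print(row)
```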

&lt;h3&gt;
  
  
  Final Words
&lt;/h3&gt;

&lt;p&gt;The results show a positive impact. In the project closest to a real application, we saw the best gains in memory reduction, along with measurable decreases in R8 and overall build times. This confirms the expectation that terminating the Kotlin compiler process returns its memory to the OS in full and can help builds with heavier Kotlin phases. In practice, this translates into lower peak memory usage, which, when builds run close to system thresholds, can be the difference between a successful build and an out-of-memory error in CI environments.&lt;/p&gt;

&lt;p&gt;While we focused entirely on the R8 tasks, you may want to experiment with this approach in other scenarios where the Kotlin compiler is no longer required, but a heavier post-compilation task follows, such as Java compilation.&lt;/p&gt;

&lt;p&gt;Remember that this approach was tested only for single-variant release builds. If you want to apply the same idea elsewhere, you will need to adjust the orchestration for when the Kotlin process should be terminated based on your project’s requirements.&lt;/p&gt;
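&lt;p&gt;If you want to experiment yourself, terminating the Kotlin daemon outside Gradle usually comes down to finding its JVM processes and killing them. A minimal sketch of that idea, not the exact orchestration used in this experiment:&lt;/p&gt;

```python
import subprocess

def parse_daemon_pids(jps_output):
    # Keep only JVMs whose main class is the Kotlin compile daemon
    return [
        int(line.split()[0])
        for line in jps_output.splitlines()
        if "KotlinCompileDaemon" in line
    ]

def terminate_kotlin_daemons():
    # jps -l lists running JVMs as "pid main-class"
    out = subprocess.run(["jps", "-l"], capture_output=True, text=True).stdout
    for pid in parse_daemon_pids(out):
        subprocess.run(["kill", str(pid)])
```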

&lt;p&gt;Happy building!&lt;/p&gt;

</description>
      <category>android</category>
      <category>gradle</category>
      <category>kotlin</category>
    </item>
    <item>
      <title>Gradle Learning Day: Reinforcement Learning for Build Optimization</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Sat, 23 Aug 2025 18:07:55 +0000</pubDate>
      <link>https://dev.to/cdsap/gradle-learning-day-reinforcement-learning-for-build-optimization-2oh7</link>
      <guid>https://dev.to/cdsap/gradle-learning-day-reinforcement-learning-for-build-optimization-2oh7</guid>
      <description>&lt;p&gt;This month at Gradle, we had our Learning Day, a day dedicated to exploring new ideas and experimenting with technologies outside our usual work. The theme this time was AI.&lt;/p&gt;

&lt;p&gt;While brainstorming ideas, I remembered a video that completely blew my mind, one that I especially loved as a soccer fan:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/ta99S6Fh53c"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;That led me to think about reinforcement learning, a branch of machine learning where a system learns through rewards and penalties from its actions. &lt;/p&gt;

&lt;p&gt;In the build engineering world, there’s always a recurring question: what’s the optimal configuration, for example in terms of heap memory or workers, for a project? It’s a tough one because the answer depends on many factors, and often the most honest reply is simply, "It depends."&lt;/p&gt;

&lt;p&gt;So my idea was: why not use reinforcement learning to help calculate the best build configuration? During Learning Day, I started a small experiment, later expanded it, and that’s what I’m sharing here today.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Quick Look at Reinforcement Learning
&lt;/h3&gt;

&lt;p&gt;Reinforcement Learning is a type of machine learning where an agent makes decisions by interacting with an environment. Each action gives the agent a reward or a penalty, and over time, the agent learns which decisions lead to better results:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvtx401dadin1g7runyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvtx401dadin1g7runyx.png" alt="RL wikipedia"&gt;&lt;/a&gt;&lt;br&gt;
(Image: &lt;a href="https://en.wikipedia.org/wiki/Reinforcement_learning" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Reinforcement_learning&lt;/a&gt;)&lt;/p&gt;
&lt;h3&gt;
  
  
  Building an RL Framework for Gradle
&lt;/h3&gt;

&lt;p&gt;This was the initial idea: explore whether reinforcement learning could treat Gradle builds as its environment and use performance as the reward signal. Faster builds or lower memory usage would yield a positive reward, while slower or heavier builds would yield a negative reward. From there, the agent could learn which configurations produce the best outcomes.&lt;/p&gt;

&lt;p&gt;Beyond the RL approach, I also wanted to understand how to deploy the agent. The goal was to demonstrate the full cycle—from defining an experiment to orchestrating the build executions and collecting the resulting data. I’m happy to say I built a working proof of concept: an agent deployed on GCP, integrated with Cloud Functions, with GitHub Runners orchestrating the build executions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tx3c4mltcywxr2ryahv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tx3c4mltcywxr2ryahv.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, I simplified some parts to get a working POC without spending too much time. You’ll find more details in the next sections. Here’s a high-level diagram of the setup:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzroqogzfaxbuobj8b6yl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzroqogzfaxbuobj8b6yl.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s walk through the main parts.&lt;/p&gt;
&lt;h4&gt;
  
  
  The RL Agent
&lt;/h4&gt;

&lt;p&gt;The RL agent is the brain of the system, proposing which configurations to try. What do I consider “configurations”? Any setting that materially affects build performance. Today there are hundreds of such parameters across the JVM, Gradle, Kotlin, and even component-specific systems like AGP or Dagger. Initially, I targeted JVM parameters. A production-ready optimization system would expand to include individual JVM flags (e.g., -XX:NewRatio, -XX:MaxMetaspaceSize), garbage collector selection, and compiler optimizations. For this POC, I focused on just three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradle Workers&lt;/li&gt;
&lt;li&gt;Xmx for Gradle process&lt;/li&gt;
&lt;li&gt;Xmx for Kotlin process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I know it’s simple, but it’s a solid starting point. Even with just three parameters, the number of combinations grows quickly. Since we don’t have infinite resources or time to test every combination, I added guardrails to constrain the search space and define which options we’ll explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I’m constrained by the environment (GHA), which provides only 4 workers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All experiments were run on modularized Android projects, and these days it’s simply not realistic to build them with just 1 GB of memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To avoid OOMs, I limited the max heap to 8 GB in both processes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
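&lt;p&gt;With these guardrails, the action space stays small enough to enumerate. A sketch of the resulting search space, where the exact bounds are my reading of the constraints rather than the POC’s code:&lt;/p&gt;

```python
from itertools import product

# Guardrails: at most 4 workers (GHA), heaps between 2 GB and 8 GB
workers = [1, 2, 3, 4]
heaps_gb = [2, 3, 4, 5, 6, 7, 8]

actions = [
    {"max_workers": w, "gradle_heap_gb": g, "kotlin_heap_gb": k}
    for w, g, k in product(workers, heaps_gb, heaps_gb)
]
print(len(actions))  # 4 * 7 * 7 = 196 combinations
```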

&lt;p&gt;For this experiment, the reward was build time — yes, just build time. I initially started with a formula that included GC metrics for both processes and the mean Kotlin compile time, but the idea was to keep it simple and working first, so I can iterate later.&lt;/p&gt;

&lt;p&gt;Next, I’ll go over the different learning models I tested:&lt;/p&gt;
&lt;h4&gt;
  
  
  First Attempt with Q-tables
&lt;/h4&gt;

&lt;p&gt;In the initial iteration, I was relying entirely on a Q-table. The Q-table is a lookup table that stores the learned value (Q-value) for each action-state combination. In our Gradle build optimization context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Actions&lt;/strong&gt;: Parameter combinations (max_workers, gradle_heap_gb, kotlin_heap_gb)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Q-Values&lt;/strong&gt;: Learned rewards for each parameter combination&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is an example Q-Table entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"4_6_8"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.12524971&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="err"&gt;GB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;gradle&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="err"&gt;GB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;kotlin&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Q-value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.125&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"2_7_5"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.12335408&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="err"&gt;GB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;gradle&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="err"&gt;GB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;kotlin&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Q-value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.123&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"1_2_8"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.11712929&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="err"&gt;GB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;gradle&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="err"&gt;GB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;kotlin&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Q-value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.117&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the first experiment run, I observed the following behavior:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n1zbge4zk7uvcpehhqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n1zbge4zk7uvcpehhqg.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One action was repeated six times, and with such a small total (15 iterations), I missed the chance to explore further.&lt;/p&gt;

&lt;p&gt;In Q-tables, there are three different phases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initialization&lt;/li&gt;
&lt;li&gt;Exploration&lt;/li&gt;
&lt;li&gt;Exploitation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s critical to define appropriate values for exploration and exploitation when, as in our environments, the set of actions is finite. Too much exploitation too early could lead to the problem described earlier. As we’ll see in the GitHub Runner section, I parallelize N builds during action execution, which can increase exploration. I didn’t go deeper into Q-tables and instead moved on to a simpler approach.&lt;/p&gt;
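&lt;p&gt;For context, the exploration/exploitation trade-off in a Q-table setup is typically handled with an epsilon-greedy policy. A minimal sketch, with illustrative epsilon and learning-rate values:&lt;/p&gt;

```python
import random

def choose_action(q_table, actions, epsilon=0.3):
    # Explore with probability epsilon, otherwise exploit the best known action
    explore = random.choices([True, False], weights=[epsilon, 1 - epsilon])[0]
    if explore:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get(a, 0.0))

def update(q_table, action, reward, alpha=0.1):
    # Move the stored Q-value a fraction alpha toward the observed reward
    old = q_table.get(action, 0.0)
    q_table[action] = old + alpha * (reward - old)

q = {}
update(q, "4_6_8", 0.125)
update(q, "2_7_5", 0.123)
print(max(q, key=q.get))  # the action with the highest learned value
```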
&lt;h4&gt;
  
  
  Adaptive Exploration Strategy
&lt;/h4&gt;

&lt;p&gt;For the version described in this article, I chose an adaptive exploration strategy where the final “best action” is determined purely by observed build performance, not by learned Q-values. This makes the current implementation even simpler.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;best_variant = max(variants, key=lambda v: v.get('reward', 0))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And build performance here is measured purely by build time. I initially tried incorporating GC time from the processes into a distributed reward formula — that was the original idea — but I still need to better understand its implications.&lt;/p&gt;
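&lt;p&gt;As a rough sketch of what a time-only reward could look like, using a logarithmic scale relative to a baseline build time (the baseline constant here is an assumption, not the POC’s formula):&lt;/p&gt;

```python
import math

def reward_from_build_time(duration_s, baseline_s=600.0):
    # Faster than the baseline yields a positive reward, slower a negative one,
    # and the logarithm keeps large regressions from dominating the signal
    return math.log(baseline_s / duration_s)

print(round(reward_from_build_time(585.0), 3))  # positive: faster than baseline
```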

&lt;h4&gt;
  
  
  RL API Component
&lt;/h4&gt;

&lt;p&gt;The RL API is built with FastAPI, deployed on GCP, and acts as the communication layer between GitHub Actions and the reinforcement learning engine. It exposes endpoints that cover the full experiment lifecycle. The primary endpoint, &lt;code&gt;/get-action&lt;/code&gt;, receives experiment requests and returns Gradle configurations. Another key endpoint, &lt;code&gt;/send-feedback&lt;/code&gt;, ingests build results from GitHub Actions and computes rewards on a continuous logarithmic scale. We also use Firestore to persist action results and experiment metadata.&lt;/p&gt;
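&lt;p&gt;The behavior of the two endpoints can be sketched as plain functions; the payload fields below are assumptions based on the parameters described earlier, and the real service wraps equivalents of these in FastAPI routes backed by Firestore:&lt;/p&gt;

```python
import json
import random

STORE = {"experiments": {}}  # in-memory stand-in for Firestore

def get_action(experiment_id):
    # /get-action: propose a configuration for the next iteration
    action = {
        "max_workers": random.choice([1, 2, 3, 4]),
        "gradle_heap_gb": random.choice([2, 4, 6, 8]),
        "kotlin_heap_gb": random.choice([2, 4, 6, 8]),
    }
    STORE["experiments"].setdefault(experiment_id, []).append({"action": action})
    return json.dumps(action)

def send_feedback(experiment_id, build_time_s):
    # /send-feedback: attach the observed result to the most recent action
    runs = STORE["experiments"][experiment_id]
    runs[-1]["build_time_s"] = build_time_s
    return {"recorded": len(runs)}

action = json.loads(get_action("experiment-demo"))
print(send_feedback("experiment-demo", 643.0))
```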

&lt;h3&gt;
  
  
  Github Runners
&lt;/h3&gt;

&lt;p&gt;One might assume we could simply run this locally, serving the RL agent and measuring build executions. While technically possible, that would hijack our system’s resources for the duration of the experiment, and any other processes running at the same time could distort the results.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://github.com/cdsap/Telltale" rel="noopener noreferrer"&gt;Telltale&lt;/a&gt;, I’ve already demonstrated that it’s possible to orchestrate sequences of builds across different scenarios while maintaining both isolation and fairness. Following the same philosophy, I didn’t want to base our results on a single build — instead, we ran multiple builds to reduce noise and avoid the trap of regression to the mean.&lt;/p&gt;

&lt;p&gt;Inspired by Telltale, we adopted a similar approach: whatever the RL agent decides to execute, we delegate to GitHub Actions, ensuring it runs in an isolated and repeatable environment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwbwu0yd99y0labhgwcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwbwu0yd99y0labhgwcz.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Initially, we use a seed step with two main purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Populate the GHA runner cache with dependencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply the project modifications for the experiment and save the project state for the later builds.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After this, we execute the build n times, based on the number of iterations defined for the experiment. In this initial version, we limited the total to 150 builds to avoid overloading the GHA runners and impacting my teammates at Develocity.&lt;/p&gt;

&lt;p&gt;For each experiment, the parallel builds are distributed as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;15 iterations: 1 seed build + 10 builds per iteration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;30 iterations: 1 seed build + 5 builds per iteration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;50 iterations: 1 seed build + 3 builds per iteration&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
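&lt;p&gt;This distribution follows directly from the 150-build budget; a quick sanity check:&lt;/p&gt;

```python
TOTAL_BUDGET = 150  # cap to avoid overloading the shared GHA runners

for iterations in (15, 30, 50):
    per_iteration = TOTAL_BUDGET // iterations
    print(f"{iterations} iterations: 1 seed build + {per_iteration} builds per iteration")
```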

&lt;p&gt;Finally, it’s worth noting that the action proposed by the RL agent is passed through a workflow dispatch input defined as &lt;code&gt;rl-actions&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  rl-actions:
    description: 'RL-generated action parameters (JSON string)'
    required: false
    default: '{}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
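&lt;p&gt;On the runner side, that JSON input has to be translated into actual Gradle invocation flags. A hedged sketch of that mapping, using standard Gradle and Kotlin daemon properties (the exact mapping in the POC may differ):&lt;/p&gt;

```python
import json

def to_gradle_args(rl_actions_json):
    # Map the RL action onto Gradle and Kotlin daemon JVM properties
    a = json.loads(rl_actions_json)
    return [
        f"-Dorg.gradle.workers.max={a['max_workers']}",
        f"-Dorg.gradle.jvmargs=-Xmx{a['gradle_heap_gb']}g",
        f"-Dkotlin.daemon.jvmargs=-Xmx{a['kotlin_heap_gb']}g",
    ]

print(to_gradle_args('{"max_workers": 4, "gradle_heap_gb": 6, "kotlin_heap_gb": 8}'))
```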



&lt;p&gt;And we’ll be able to track the progress of the different actions executed in the experiment directly from GHA:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzv4gepj9ndm77hmxqi9q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzv4gepj9ndm77hmxqi9q.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Collecting data
&lt;/h3&gt;

&lt;p&gt;With the GitHub runners executing builds in parallel, we still need to collect the action data from each experiment and submit the feedback to the RL Agent.&lt;/p&gt;

&lt;p&gt;The projects under experimentation in this article are connected to &lt;a href="https://gradle.com/develocity/" rel="noopener noreferrer"&gt;Develocity&lt;/a&gt;. Each build publishes a Build Scan to Develocity, and to identify the non-seeding builds of each action we tag them with the experiment and action identifiers, such as &lt;code&gt;experiment-1755896319523_W2_G4_K8&lt;/code&gt;: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxypzeiqnf81t8o282rz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxypzeiqnf81t8o282rz.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
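&lt;p&gt;Since the tag encodes the action parameters, they can be recovered when aggregating the scans. A small parser for this naming scheme, assuming the W/G/K segments always appear in this order:&lt;/p&gt;

```python
def parse_experiment_tag(tag):
    # e.g. "experiment-1755896319523_W2_G4_K8"
    experiment, w, g, k = tag.split("_")
    return {
        "experiment_id": experiment.removeprefix("experiment-"),
        "max_workers": int(w[1:]),
        "gradle_heap_gb": int(g[1:]),
        "kotlin_heap_gb": int(k[1:]),
    }

print(parse_experiment_tag("experiment-1755896319523_W2_G4_K8"))
```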

&lt;p&gt;The &lt;a href="https://docs.gradle.com/develocity/api-manual/" rel="noopener noreferrer"&gt;Develocity API&lt;/a&gt; provides endpoints to retrieve the initial reward fields needed for calculation, such as build duration and task execution information. Additionally, you can extend the Build Scan data, as I’m doing, to report both the Kotlin process GC time and the Gradle process GC time with the plugins &lt;a href="https://github.com/cdsap/InfoKotlinProcess" rel="noopener noreferrer"&gt;InfoKotlinProcess&lt;/a&gt; and &lt;a href="https://github.com/cdsap/InfoGradleProcess" rel="noopener noreferrer"&gt;InfoGradleProcess&lt;/a&gt;, as custom values:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89hgay84lulhppqknr39.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89hgay84lulhppqknr39.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can aggregate the data using your preferred Develocity API client. In the scope of this article, we are using: &lt;a href="https://github.com/cdsap/BuildExperimentResults" rel="noopener noreferrer"&gt;BuildExperimentResults&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results in Action
&lt;/h3&gt;

&lt;p&gt;I built a working POC where you can trigger an experiment by providing the repository name, the task, and the number of iterations you want to run:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspxfkicqiftq3db7qnt1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspxfkicqiftq3db7qnt1.png" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
The URL is available at: &lt;a href="https://rlgradleld.web.app/" rel="noopener noreferrer"&gt;https://rlgradleld.web.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’ve disabled the creation of new experiments, but feel free to ping me if you’d like to see a live demo with one of your preferred projects that can run within the free tier of GitHub Actions.&lt;br&gt;
In the UI, you’ll find a set of experiments we’ve run for different projects:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkjhowk5d2ip92nnzdie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkjhowk5d2ip92nnzdie.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Additionally, I’ve published the repo that contains the RL Agent, Cloud Functions, UI, and GitHub Actions runners:&lt;br&gt;
&lt;a href="https://github.com/cdsap/RLGradleBuilds" rel="noopener noreferrer"&gt;https://github.com/cdsap/RLGradleBuilds&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please follow the instructions and let me know if you have any questions about the setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Working on this Learning Day project was very interesting. I still have some mixed feelings, since the rewards were purely performance-based, and I would have liked to explore more advanced RL mechanisms such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Learning from actions we already know will be bad, to avoid further exploration of those (for instance, running with 1 worker or a very low Gradle heap size).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using a composite reward formula that incorporates GC times and average Kotlin compiler task durations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;So far, all experiments have run in scenarios with only cached dependencies. I plan to extend this to other cases, such as best-case builds or incremental builds, with the ultimate goal of dynamically configuring memory per scenario, guided by the RL mechanism.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
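&lt;p&gt;As a rough illustration of the composite-reward idea, here is a hedged sketch in Kotlin. The choice of metrics, the comparison against a baseline build, and the 0.6/0.2/0.2 weights are all assumptions for illustration, not the formula used in these experiments.&lt;/p&gt;

```kotlin
// Hedged sketch of a composite reward: build time, GC time, and mean Kotlin
// compile duration are compared against a baseline build. The metric choice
// and the 0.6/0.2/0.2 weights are illustrative assumptions.
data class BuildMetrics(
    val buildTimeSec: Double,
    val gcTimeSec: Double,
    val meanKotlinCompileSec: Double,
)

fun reward(candidate: BuildMetrics, baseline: BuildMetrics): Double {
    // Relative improvement versus the baseline (positive = better).
    fun gain(base: Double, cand: Double) = (base - cand) / base
    return 0.6 * gain(baseline.buildTimeSec, candidate.buildTimeSec) +
        0.2 * gain(baseline.gcTimeSec, candidate.gcTimeSec) +
        0.2 * gain(baseline.meanKotlinCompileSec, candidate.meanKotlinCompileSec)
}

fun main() {
    val baseline = BuildMetrics(buildTimeSec = 600.0, gcTimeSec = 60.0, meanKotlinCompileSec = 20.0)
    val candidate = BuildMetrics(buildTimeSec = 540.0, gcTimeSec = 45.0, meanKotlinCompileSec = 18.0)
    println(reward(candidate, baseline)) // positive when the candidate improves on the baseline
}
```

&lt;p&gt;A reward of this shape would let the agent trade a small build-time regression for a large GC improvement, which a purely duration-based reward cannot express.&lt;/p&gt;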

&lt;p&gt;On the infrastructure side, all the core components are already in place: the RL agent is connected to Cloud Functions, Firestore stores actions and experiment data, and GitHub Actions runners orchestrate executions and process build information published to Develocity. &lt;/p&gt;

&lt;p&gt;Happy experimenting and happy building!&lt;/p&gt;

</description>
      <category>gradle</category>
      <category>android</category>
      <category>kotlin</category>
    </item>
    <item>
      <title>Results After 3 Months of Android Gradle Build Experiments with Telltale</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Mon, 23 Jun 2025 03:41:50 +0000</pubDate>
      <link>https://dev.to/cdsap/results-after-3-months-of-android-gradle-build-experiments-with-telltale-4b7l</link>
      <guid>https://dev.to/cdsap/results-after-3-months-of-android-gradle-build-experiments-with-telltale-4b7l</guid>
      <description>&lt;p&gt;It's been three months since I released the &lt;a href="https://cdsap.github.io/Telltale/" rel="noopener noreferrer"&gt;Telltale&lt;/a&gt; GitHub Pages site, automating the creation and analysis of different Gradle build experiments. Previously, running experiments required manual analysis and a companion article. This automation has added both flexibility and speed, enabling us to run more experiments more easily.&lt;/p&gt;

&lt;p&gt;As a quick reminder, Telltale is a framework that automates the infrastructure to run Gradle builds across different variants. It works alongside &lt;a href="https://github.com/cdsap/BuildExperimentResults" rel="noopener noreferrer"&gt;Build Experiment Results&lt;/a&gt; to aggregate the data published to &lt;a href="https://gradle.com/develocity/" rel="noopener noreferrer"&gt;Develocity&lt;/a&gt;. In the latest iteration, I’ve added integration with the OpenAI API to analyze and compare experiment results.&lt;/p&gt;

&lt;p&gt;While many experiments didn’t yield meaningful insights, this article highlights a few that did.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kotlin 2.1.20 vs 2.1.0
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Analysis: &lt;a href="https://cdsap.github.io/Telltale/posts/2025-03-21-cdsap-150-experiment/#summary" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/posts/2025-03-21-cdsap-150-experiment/#summary&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;Results: &lt;a href="https://cdsap.github.io/Telltale/reports/experiment_results_20250317175549.html" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/reports/experiment_results_20250317175549.html&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this routine experiment, updating to a patch version of the Kotlin Gradle Plugin, I didn’t observe significant differences in build duration, configuration time, or Kotlin compilation duration. However, one metric did stand out:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8dfz56ax8wgjixnfiya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8dfz56ax8wgjixnfiya.png" alt="Image description" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The IR translation phase of the Kotlin compiler increased significantly in 2.1.20. Although the overall compilation time remained similar, we reported the behavior in &lt;a href="https://youtrack.jetbrains.com/issue/KT-76697" rel="noopener noreferrer"&gt;JetBrains' YouTrack&lt;/a&gt;. The root cause remains unclear—it’s possible the compiler is shifting work between phases, keeping total time stable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparing &lt;code&gt;-Xms&lt;/code&gt; usage
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Analysis: &lt;a href="https://cdsap.github.io/Telltale/posts/2025-03-21-cdsap-149-experiment/summary" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/posts/2025-03-21-cdsap-149-experiment/summary&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;Results: &lt;a href="https://cdsap.github.io/Telltale/reports/experiment_results_20250321002356.html" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/reports/experiment_results_20250321002356.html&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;-Xms&lt;/code&gt; JVM flag defines the initial heap size. In Android builds, I initially assumed it wouldn't matter much, until I saw &lt;a href="https://www.jasonpearson.dev/" rel="noopener noreferrer"&gt;Jason Pearson&lt;/a&gt; submit a &lt;a href="https://github.com/android/nowinandroid/commit/15a6b583d44322d05cc7f1ac9458f77c69b68d8e" rel="noopener noreferrer"&gt;PR&lt;/a&gt; to nowinandroid setting this flag for the Gradle and Kotlin processes. It seemed like the perfect candidate for analysis.&lt;/p&gt;
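&lt;p&gt;For reference, the flag pair looks like this in &lt;code&gt;gradle.properties&lt;/code&gt; (the heap values here are illustrative, not the experiment's exact configuration):&lt;/p&gt;

```properties
# Illustrative values: pin the initial heap (-Xms) to the maximum (-Xmx)
# for both the Gradle daemon and the Kotlin daemon.
org.gradle.jvmargs=-Xmx4g -Xms4g
kotlin.daemon.jvmargs=-Xmx4g -Xms4g
```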

&lt;p&gt;The result: a 3.5% reduction in build time by setting the initial heap size.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0afxxl8n7zqsaf5lq3ia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0afxxl8n7zqsaf5lq3ia.png" alt="Image description" width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;More interestingly, the number of garbage collection operations dropped significantly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jkrrhpt8tb8d2tnqiik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jkrrhpt8tb8d2tnqiik.png" alt="Image description" width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And naturally, GC duration also improved:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmowull0xlwze5mdorql6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmowull0xlwze5mdorql6.png" alt="Image description" width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bottom line: setting &lt;code&gt;-Xms&lt;/code&gt; led to more efficient memory usage and GC behavior. Thanks, Jason!&lt;/p&gt;

&lt;h3&gt;
  
  
  Reducing worker parallelization of the Kotlin compiler
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Analysis: &lt;a href="https://cdsap.github.io/Telltale/posts/2025-03-18-cdsap-136-experiment/" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/posts/2025-03-18-cdsap-136-experiment/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Results: &lt;a href="https://cdsap.github.io/Telltale/reports/experiment_results_20250318005824.html" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/reports/experiment_results_20250318005824.html&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This experiment explored whether reducing Kotlin compiler parallelization, while keeping the overall Gradle worker pool unchanged, would impact build performance. In highly modularized projects, parallel compilation of many Kotlin modules can increase memory pressure on the system.&lt;/p&gt;
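&lt;p&gt;The article doesn't show the exact mechanism the experiment used; one hedged way to cap concurrent Kotlin compilations without touching the overall Gradle worker pool is a shared build service with a bounded &lt;code&gt;maxParallelUsages&lt;/code&gt;, sketched below in the root &lt;code&gt;build.gradle.kts&lt;/code&gt; (the service name and the cap of 2 are assumptions):&lt;/p&gt;

```kotlin
// Hedged sketch: limit how many KotlinCompile tasks run at once via a shared
// build service, while org.gradle.workers.max stays untouched. Requires the
// Kotlin Gradle plugin on the build classpath.
import org.gradle.api.services.BuildService
import org.gradle.api.services.BuildServiceParameters
import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

abstract class KotlinCompileLimiter : BuildService<BuildServiceParameters.None>

val kotlinLimiter = gradle.sharedServices.registerIfAbsent(
    "kotlinCompileLimiter", KotlinCompileLimiter::class
) {
    maxParallelUsages.set(2) // illustrative cap on concurrent Kotlin compilations
}

subprojects {
    tasks.withType<KotlinCompile>().configureEach {
        usesService(kotlinLimiter)
    }
}
```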

&lt;p&gt;Since our experiments run on GitHub Actions runners with only 4 available workers, we’re constrained in testing more vertically scaled scenarios. Still, this setup allowed us to observe meaningful differences.&lt;/p&gt;

&lt;p&gt;The overall build duration—in both mean and median—remained nearly identical between the two configurations. This indicated that reducing Kotlin parallelism did not negatively impact the total build time.&lt;/p&gt;

&lt;p&gt;However, two key observations stood out:&lt;/p&gt;

&lt;p&gt;The aggregated Kotlin compiler duration remained the same, showing no major speed-up in the Kotlin phase itself.&lt;/p&gt;

&lt;p&gt;The memory usage of the Gradle process decreased slightly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdglnj7705oltkepmjlrm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdglnj7705oltkepmjlrm.png" alt="Image description" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;More notably, the task &lt;code&gt;app:mergeExtDexDemoDebug&lt;/code&gt;, one of the most expensive in the build, improved by 10.3% when Kotlin compiler parallelization was reduced:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp8ekez3424oxhe37fifd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp8ekez3424oxhe37fifd.png" alt="Image description" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This suggests that reducing Kotlin parallelization can relieve memory pressure, enabling better performance for unrelated Gradle tasks. The Kotlin compiler might be competing for resources with other parts of the build, so reducing its concurrency can help other tasks run more efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parallel vs G1
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Analysis: &lt;a href="https://cdsap.github.io/Telltale/posts/2025-06-22-cdsap-194-experiment/" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/posts/2025-06-22-cdsap-194-experiment/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Results: &lt;a href="https://cdsap.github.io/Telltale/reports/experiment_results_20250622042015.html" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/reports/experiment_results_20250622042015.html&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a previous experiment, we compared Parallel GC and G1 GC in the context of the nowinandroid project and observed that Parallel GC consistently outperformed G1. However, nowinandroid is a relatively lightweight project, and we wanted to validate whether those results held in a large build.&lt;/p&gt;

&lt;p&gt;For this experiment, we used a project with 400 modules, which demands significantly more heap during the compilation phase. The results confirmed our hypothesis:&lt;br&gt;
Parallel GC was still faster, with a ~6% reduction in build time (around 55 seconds) compared to G1 GC:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7j9btqrj7idtv83v3jn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7j9btqrj7idtv83v3jn.png" alt="Image description" width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tasks like Kotlin compilation and DEX merging saw consistent performance gains when using Parallel GC:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frzfiim6mwx3gf4bc3ouv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frzfiim6mwx3gf4bc3ouv.png" alt="Image description" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even at the process level, the main Gradle process showed better performance and resource efficiency under Parallel GC:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy33vn0rjuyhdww2ro9j7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy33vn0rjuyhdww2ro9j7.png" alt="Image description" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This experiment reinforces that Parallel GC continues to perform better even in larger Android projects with high memory demands. However, it’s important to note that these results are specific to CI environments, where consistent throughput and short-lived processes are key.&lt;/p&gt;

&lt;p&gt;If you're tuning a local development environment, G1 GC may still be preferable, as it tends to reclaim OS memory more efficiently, which can improve system responsiveness.&lt;/p&gt;

&lt;p&gt;Ultimately, you should run these experiments in the context of your own project—the optimal GC strategy can vary based on project size, memory constraints, and whether you're building locally or in CI.&lt;/p&gt;
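&lt;p&gt;Switching collectors for such an experiment is a one-line change in &lt;code&gt;gradle.properties&lt;/code&gt; (the heap size shown is illustrative):&lt;/p&gt;

```properties
# Throughput-oriented collector, which performed best in our CI experiments:
org.gradle.jvmargs=-Xmx5g -XX:+UseParallelGC
# Alternative for local builds, where G1 tends to return memory to the OS sooner:
# org.gradle.jvmargs=-Xmx5g -XX:+UseG1GC
```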

&lt;h3&gt;
  
  
  Other experiments
&lt;/h3&gt;

&lt;p&gt;As we mentioned, not every experiment reveals a regression; many act as a safety net, verifying that build performance is not impacted. Some of them are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Comparing AGP 8.9 vs 8.10.1 &lt;a href="https://cdsap.github.io/Telltale/posts/2025-06-09-cdsap-189-experiment/" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/posts/2025-06-09-cdsap-189-experiment/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Gradle 9.0.0-RC1 vs 8.14.2: &lt;a href="https://cdsap.github.io/Telltale/posts/2025-06-19-cdsap-192-experiment/" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/posts/2025-06-19-cdsap-192-experiment/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other experiments didn't confirm our initial hypothesis that a different configuration would reduce build duration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using R8 in a separate process with 4 GB and G1: &lt;a href="https://cdsap.github.io/Telltale/posts/2025-03-26-cdsap-152-experiment/" rel="noopener noreferrer"&gt;https://cdsap.github.io/Telltale/posts/2025-03-26-cdsap-152-experiment/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final words
&lt;/h2&gt;

&lt;p&gt;With this, we have completed the first review of experiments from these three months of automated execution and analysis with Telltale. We will publish a new article in the coming months with the results of more experiments.&lt;/p&gt;

&lt;p&gt;Happy building!&lt;/p&gt;

</description>
      <category>gradle</category>
      <category>android</category>
    </item>
    <item>
      <title>Balancing Memory Heap and Performance in Gradle Builds</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Fri, 21 Mar 2025 22:00:06 +0000</pubDate>
      <link>https://dev.to/cdsap/balancing-memory-heap-and-performance-in-gradle-builds-454j</link>
      <guid>https://dev.to/cdsap/balancing-memory-heap-and-performance-in-gradle-builds-454j</guid>
      <description>&lt;p&gt;In this post, we analyze the impact of different memory heap configurations on the performance of a given project. A common assumption is that increasing memory heap allocation improves build performance. However, in this article, we evaluate various metrics to determine the actual effects of different memory settings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experiment Setup
&lt;/h2&gt;

&lt;p&gt;The setup for this experiment is straightforward. The free GitHub Actions runners provide a maximum of 16 GB of memory. Our build process, which involves running &lt;code&gt;assembleDebug&lt;/code&gt; on &lt;code&gt;nowinandroid&lt;/code&gt;, consists of two main components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Gradle process&lt;/li&gt;
&lt;li&gt;The Kotlin compiler process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To analyze the impact of memory allocation, we define several configurations within the range of 2.5 GB to 7 GB, increasing in 1 GB increments. The minimum of 2.5 GB was chosen because a 2 GB allocation resulted in an &lt;code&gt;OutOfMemoryError&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The tested configurations are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2.5 GB&lt;/li&gt;
&lt;li&gt;3 GB&lt;/li&gt;
&lt;li&gt;4 GB&lt;/li&gt;
&lt;li&gt;5 GB&lt;/li&gt;
&lt;li&gt;6 GB&lt;/li&gt;
&lt;li&gt;7 GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These values are set using the following JVM arguments in gradle.properties:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;org.gradle.jvmargs=-Xmx{$VARIANT}g -Xms{$VARIANT}g ...
kotlin.daemon.jvmargs=-Xmx{$VARIANT}g -Xms{$VARIANT}g ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use the same configuration as &lt;a href="https://github.com/android/nowinandroid" rel="noopener noreferrer"&gt;nowinandroid&lt;/a&gt;, where &lt;code&gt;Xmx&lt;/code&gt; and &lt;code&gt;Xms&lt;/code&gt; values are identical, and the G1 garbage collector is enabled.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology
&lt;/h2&gt;

&lt;p&gt;Each configuration is executed 20 times on clean agents, with dependencies preloaded in the Gradle user home. Each iteration generates a build scan, which is later analyzed using the Develocity API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Build Time
&lt;/h3&gt;

&lt;p&gt;The first metric we evaluate is the overall build time for each memory configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8bgdky35gqf5vqgobep.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8bgdky35gqf5vqgobep.png" alt="Image description" width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A key observation is that there is no clear correlation between increasing heap allocation and reducing build time. Additionally, the low standard deviation suggests that median build times are similar across configurations. Below is a breakdown of the median build times per configuration:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1z29zqd0e9bhejhq2svw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1z29zqd0e9bhejhq2svw.png" alt="Image description" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Maximum Memory Usage
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://gradle.com/develocity/" rel="noopener noreferrer"&gt;Develocity&lt;/a&gt; provides &lt;a href="https://docs.gradle.com/develocity/gradle-plugin/current/#capturing_resource_usage" rel="noopener noreferrer"&gt;resource usage&lt;/a&gt; data in the build scans, we analyze the maximum memory usage of the build process. The results align well with the allocated memory configurations:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2e9hwbwojopi003f3dl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2e9hwbwojopi003f3dl.png" alt="Image description" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We observe a linear increase in memory usage up to 4 GB. However, for larger allocations, variance increases, suggesting that the Gradle process may be over-provisioned in the 6 GB and 7 GB scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kotlin Compiler Memory Usage
&lt;/h3&gt;

&lt;p&gt;Next, we examine memory usage for the Kotlin compiler process:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8l3gjh19at9o2ymmc3k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8l3gjh19at9o2ymmc3k.png" alt="Image description" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data here is more scattered. We did not perform a detailed analysis of this variance. From this point onward, we focus on Gradle process data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Garbage Collection Time
&lt;/h3&gt;

&lt;p&gt;Now, we analyze Gradle's garbage collection (GC) time across all observations:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4rc9bjbk0ocdnil3lrr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4rc9bjbk0ocdnil3lrr.png" alt="Image description" width="677" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a clearer comparison, we examine the median GC times:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6k7rp3cyghsv5rj3bs3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6k7rp3cyghsv5rj3bs3.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configurations with 2.5 GB and 3 GB allocations exhibit significantly higher GC times. The differences between larger configurations are much smaller.&lt;/p&gt;

&lt;h3&gt;
  
  
  Garbage Collections events
&lt;/h3&gt;

&lt;p&gt;For additional insights, we use the &lt;a href="https://github.com/cdsap/GCReport" rel="noopener noreferrer"&gt;GC Report Plugin&lt;/a&gt;, which logs information about garbage collections during build execution. The first metric to analyze is the aggregated number of collections (excluding &lt;code&gt;Concurrent Mark Cycle&lt;/code&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw23pnmygbapfi7zp0dka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw23pnmygbapfi7zp0dka.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Variants with 2.5 GB and 3 GB experience a significantly higher number of GC events.&lt;/li&gt;
&lt;li&gt;As memory allocation increases, the number of collections decreases, but the reduction is not linear.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next, we analyze &lt;code&gt;Pause Young (Normal) (G1 Evacuation Pause)&lt;/code&gt; events. These pauses occur when application threads stop while objects in the young generation are collected and moved to either survivor spaces or the old generation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuu3kmr9lhvvc1vq668c0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuu3kmr9lhvvc1vq668c0.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When examining other factors, such as &lt;code&gt;Humongous objects (Normal)&lt;/code&gt; entries, we observe that only the 2.5 GB variant produces them, indicating that this variant is running short on memory.&lt;/p&gt;

&lt;p&gt;More interestingly, we analyze &lt;code&gt;Pause Young (Concurrent Start) (G1 Humongous Allocation)&lt;/code&gt; events. These occur when a humongous allocation pushes heap occupancy past the threshold at which G1 starts a concurrent marking cycle—an indicator that memory pressure is increasing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmi22rv91ve1nxshu8xdj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmi22rv91ve1nxshu8xdj.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, lower-memory configurations trigger more collections of this type. Starting at 5 GB, configurations show more stability, with the median value stabilizing at one event per iteration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final words
&lt;/h2&gt;

&lt;p&gt;This article analyzed the behavior of the Gradle process under different heap configurations. Key findings include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Increasing memory allocation does not significantly improve build duration in this project.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Maximum memory usage shows slight variance at higher allocations (6 GB and 7 GB).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Larger heap allocations reduce GC time, but the difference is not substantial.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The number of GC collections decreases with higher allocations, stabilizing around 5 GB.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;G1 Humongous Allocation events suggest that configurations with 5 GB or more are better optimized.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on these findings, CI environments should balance memory allocation to optimize performance while minimizing resource usage. A 4 GB or 5 GB allocation appears to offer the best trade-off between build performance and memory efficiency.&lt;/p&gt;

&lt;p&gt;Happy building!&lt;/p&gt;

</description>
      <category>gradle</category>
    </item>
    <item>
      <title>Gradle 8.11: Faster Configuration Cache and Improved Configuration Time</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Sun, 17 Nov 2024 03:03:07 +0000</pubDate>
      <link>https://dev.to/cdsap/gradle-811-faster-configuration-cache-and-improved-configuration-time-ja1</link>
      <guid>https://dev.to/cdsap/gradle-811-faster-configuration-cache-and-improved-configuration-time-ja1</guid>
      <description>&lt;p&gt;As modularization becomes increasingly common in Android projects, it increases the number of subprojects within a Gradle build. While modularization brings many benefits—such as improved software development practices and reduced build times by reusing tasks unaffected by code changes—it also has a side effect: the configuration time in Gradle projects increases as the project structure grows. For instance, the following graph represents a build executing the &lt;code&gt;:help&lt;/code&gt; task in a project containing between 100 and 1000 modules, incremented in steps of 100, during a fresh daemon build:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ficjbubwcl47x6dy0if2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ficjbubwcl47x6dy0if2m.png" alt="Image description" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.gradle.org/current/userguide/configuration_cache.html" rel="noopener noreferrer"&gt;Configuration Cache feature&lt;/a&gt;, introduced by the Gradle team, addresses this problem by caching the result of the configuration phase of the build, then reusing it in subsequent builds if no relevant changes have occurred. This feature made local development faster and easier to work with, enabling faster build cycles. However, as projects grow, new optimizations to the configuration cache are needed to continue providing the best possible developer experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.gradle.org/current/release-notes.html" rel="noopener noreferrer"&gt;Gradle 8.11&lt;/a&gt; introduces new &lt;a href="https://docs.gradle.org/current/release-notes.html#configuration-cache-improvements" rel="noopener noreferrer"&gt;improvements&lt;/a&gt; to the configuration cache process, including per-project serialization and string deduplication. Additionally, it introduces a new incubating feature that enables storing and loading the configuration cache in parallel, resulting in improved performance.&lt;br&gt;
To enable the feature, add the following flag in &lt;code&gt;gradle.properties&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;org.gradle.configuration-cache.parallel=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this article, we will share the results of our experiments with the new Gradle 8.11 parallel configuration cache feature in the &lt;code&gt;nowinandroid&lt;/code&gt; project and explore how both local and CI builds can benefit from decreased configuration time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology
&lt;/h2&gt;

&lt;p&gt;As always, before diving into the results, let's take a look at the experiment:&lt;/p&gt;

&lt;h3&gt;
  
  
  Project
&lt;/h3&gt;

&lt;p&gt;The experiment uses a project forked from &lt;a href="https://github.com/android/nowinandroid" rel="noopener noreferrer"&gt;nowinandroid&lt;/a&gt; (latest &lt;a href="https://github.com/android/nowinandroid/commit/d42262c9391ccd1d59a0c92476c2b349a5acc3af" rel="noopener noreferrer"&gt;commit&lt;/a&gt;).&lt;br&gt;
The task used for this experiment is &lt;code&gt;assembleDebug&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Variants Experiment
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/cdsap/Experiment_Gradle_8_11/tree/main" rel="noopener noreferrer"&gt;gradle_8_10&lt;/a&gt;, main branch using 8.10 and configuration cache.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/cdsap/Experiment_Gradle_8_11/tree/gradle_8_11" rel="noopener noreferrer"&gt;gradle_8_11&lt;/a&gt;, project using Gradle 8.11 and configuration cache. &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/cdsap/Experiment_Gradle_8_11/tree/gradle_8_11_parallel" rel="noopener noreferrer"&gt;gradle_8_11_parallel&lt;/a&gt;, project using Gradle 8.11 and parallel configuration cache.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenarios
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Build with configuration cache miss:

&lt;ul&gt;
&lt;li&gt;Dependencies prepopulated in the Gradle user home.&lt;/li&gt;
&lt;li&gt;Using clean agents.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Build with configuration cache hit.&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Environment
&lt;/h3&gt;

&lt;p&gt;GitHub Actions runner:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux 6.5.0-1025-azure &lt;/li&gt;
&lt;li&gt;4 cores&lt;/li&gt;
&lt;li&gt;JDK 17&lt;/li&gt;
&lt;li&gt;Xmx 4 GB (Gradle and Kotlin daemons)&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Scenario Configuration Cache Miss
&lt;/h5&gt;

&lt;p&gt;100 iterations for each variant using &lt;a href="https://github.com/cdsap/Telltale" rel="noopener noreferrer"&gt;Telltale&lt;/a&gt; to orchestrate the execution.&lt;/p&gt;

&lt;h5&gt;
  
  
  Scenario Configuration Cache Hit
&lt;/h5&gt;

&lt;p&gt;20 iterations for each variant using &lt;a href="https://github.com/gradle/gradle-profiler" rel="noopener noreferrer"&gt;Gradle Profiler&lt;/a&gt; and Telltale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The build metrics data is published to &lt;a href="https://gradle.com/develocity/" rel="noopener noreferrer"&gt;Develocity&lt;/a&gt; using build scans.&lt;/li&gt;
&lt;li&gt;With the &lt;a href="https://docs.gradle.com/develocity/api-manual/" rel="noopener noreferrer"&gt;Develocity API&lt;/a&gt;, experiment configuration cache metrics are now accessible via the new &lt;a href="https://docs.gradle.com/enterprise/api-manual/ref/2024.2.html#tag/Builds/operation/GetGradleConfigurationCache" rel="noopener noreferrer"&gt;endpoint&lt;/a&gt;: &lt;code&gt;/api/builds/{id}/gradle-configuration-cache&lt;/code&gt;, introduced in Develocity 2024.2. Example output:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "result": {
    "outcome": "HIT",
    "entrySize": 1254385,
    "load": {
      "duration": 500,
      "hasFailed": false
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
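&lt;p&gt;As an illustration (not part of the experiment tooling), a short Python sketch that aggregates a list of such JSON responses, computing the hit rate and the median load duration from the &lt;code&gt;result&lt;/code&gt; payload shown above:&lt;/p&gt;

```python
import json
import statistics

def summarize(responses):
    """Aggregate configuration-cache API results: hit rate and median load time."""
    results = [json.loads(r)["result"] for r in responses]
    hits = sum(1 for r in results if r["outcome"] == "HIT")
    load_ms = [r["load"]["duration"] for r in results if "load" in r]
    return {"hit_rate": hits / len(results),
            "median_load_ms": statistics.median(load_ms)}

# One response shaped like the example endpoint output above
sample = json.dumps({"result": {"outcome": "HIT", "entrySize": 1254385,
                                "load": {"duration": 500, "hasFailed": False}}})
print(summarize([sample, sample]))
```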



&lt;h1&gt;
  
  
  Results
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Configuration cache entry size
&lt;/h3&gt;

&lt;p&gt;Before diving into the results related to durations, we first analyze the impact of these optimizations on reducing the size of the cache artifact:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fupmlhrrucisctrjsr4av.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fupmlhrrucisctrjsr4av.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradle 8.11 reduced the size of the cache entry for the &lt;code&gt;assembleDebug&lt;/code&gt; task by 14.67%.&lt;/li&gt;
&lt;li&gt;Enabling the parallel configuration cache results in the same cache entry size as default Gradle 8.11.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Configuration time with cache miss (dependencies already downloaded)
&lt;/h3&gt;

&lt;p&gt;This scenario simulates a configuration cache miss, ensuring that all dependencies are pre-downloaded to eliminate the impact of network latency during dependency resolution. The median result for each variant is as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7nj845xb0tpdm1wvafo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7nj845xb0tpdm1wvafo.png" alt="Image description" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradle 8.11 reduced the configuration time by 4.26%.&lt;/li&gt;
&lt;li&gt;Gradle 8.11 with parallel configuration cache reduced the configuration time by 8.16%.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Configuration Cache Miss (using clean agents)
&lt;/h3&gt;

&lt;p&gt;In this scenario, we are working with clean agents that request dependencies. The median result for each variant is as follows:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7r44z5nsjdt226ep02fm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7r44z5nsjdt226ep02fm.png" alt="Image description" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradle 8.11 reduced the configuration time by 14.5%.&lt;/li&gt;
&lt;li&gt;Gradle 8.11 with parallel configuration cache reduced the configuration time further, by 31.72%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here, we have some very interesting results showing that using the parallel configuration cache reduces configuration time by &lt;strong&gt;85 seconds&lt;/strong&gt;. This highlights the significant benefits of enabling parallel configuration cache.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration cache operations (dependencies already downloaded)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Storing cache entry
&lt;/h4&gt;

&lt;p&gt;Using the Develocity API, we extracted the store operation duration for each variant in the scenario where the dependencies are already provided:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3dn9n83qhd0wbx72rga6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3dn9n83qhd0wbx72rga6.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Comparing 8.10 to 8.11 with parallel configuration cache shows a significant improvement of 29.58% in the median duration of storing the configuration cache entry.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Load cache entry
&lt;/h4&gt;

&lt;p&gt;Using the Develocity API, this time we extracted the load operation duration for each variant in the scenario where the dependencies are already provided:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdumohs2z5nlqtptrtwn3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdumohs2z5nlqtptrtwn3.png" alt="Image description" width="609" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradle 8.11 offers a significant improvement over 8.10 for this metric, reducing the value by over 20%.&lt;/li&gt;
&lt;li&gt;Parallel configuration cache in 8.11 increases the value slightly compared to default 8.11, but it still performs better than 8.10 overall.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Configuration cache operations (using clean agents)
&lt;/h3&gt;

&lt;p&gt;In the scenario with clean agents, we noticed high variability due to non-deterministic connectivity operations. For instance, in the case of the load operation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnrlxzieft3300fr20gd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnrlxzieft3300fr20gd.png" alt="Image description" width="694" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and for the store operation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl10e53f1btvpnwvya2bb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl10e53f1btvpnwvya2bb.png" alt="Image description" width="670" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We observed a slight improvement when using 8.11 parallel, but the visualization is noisy. For this reason, we chose to present the percentiles instead:&lt;/p&gt;

&lt;h4&gt;
  
  
  Storing cache entry
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57lafh0ku3awvub3a489.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57lafh0ku3awvub3a489.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Gradle 8.11 offers modest improvements over 8.10 across all percentiles, particularly at the median and upper quartile levels.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gradle 8.11 with parallel configuration cache dramatically reduces durations across all percentiles, with the median improving by 65%.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Loading cache entry
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnk70a08k8ni5dmg2davf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnk70a08k8ni5dmg2davf.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Gradle 8.11 median load time decreased by approximately 11.6% compared to Gradle 8.10.&lt;/li&gt;
&lt;li&gt;The Gradle 8.11 parallel configuration cache median load time decreased by approximately 14.6% compared to Gradle 8.10.&lt;/li&gt;
&lt;/ul&gt;
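&lt;p&gt;Percentile summaries like the ones above can be computed directly from the raw operation durations; a minimal Python sketch (illustrative, with made-up numbers rather than the experiment's data):&lt;/p&gt;

```python
import statistics

def percentiles(durations_ms):
    """Summarize build-operation durations (ms) at common percentiles."""
    # quantiles() with n=100 returns the cut points q1..q99,
    # so index k-1 holds the k-th percentile.
    qs = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {"p25": qs[24], "p50": qs[49], "p75": qs[74], "p95": qs[94]}

print(percentiles([120, 150, 180, 200, 260, 310, 450, 900]))
```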

&lt;h3&gt;
  
  
  Configuration Cache Hit
&lt;/h3&gt;

&lt;p&gt;For the second scenario, using Gradle Profiler, we iterated over the same runner executing the same build. In this case, the configuration cache was hit, and we measured the median configuration time for those builds. The results are as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fop4ajts09nwkxaclrr4z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fop4ajts09nwkxaclrr4z.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We noticed a slight improvement in the median configuration time when hitting the cache. However, given the size of the project, the reduction in duration is not significant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration Cache Hit - load time
&lt;/h3&gt;

&lt;p&gt;Finally, for the same scenario—hitting the configuration cache—we analyzed the output of the Develocity endpoint for the load operation:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pfli2a3x1unep3g8oyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pfli2a3x1unep3g8oyt.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradle 8.11 and Gradle 8.11 parallel offer significant improvements over 8.10, with reductions of 30.6% and 26%, respectively. However, in the context of this project, the absolute load times are small, so the practical impact is limited.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Experiment References
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Build Scans
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Configuration cache miss scenarios
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Dependencies Cache&lt;/th&gt;
&lt;th&gt;Clean Agents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.10&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1731484799999&amp;amp;search.startTimeMin=1731398400000&amp;amp;search.tags=cdsap-18,varianta_main&amp;amp;search.tasks=assembleDebug&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;Build Scans&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1731571199999&amp;amp;search.startTimeMin=1731484800000&amp;amp;search.tags=cdsap-20,varianta_main&amp;amp;search.tasks=assembleDebug&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;Build Scans&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.11&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1731484799999&amp;amp;search.startTimeMin=1731398400000&amp;amp;search.tags=cdsap-19%2Cvarianta_gradle_8_11&amp;amp;search.tasks=assembleDebug&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;Build Scans&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1731571199999&amp;amp;search.startTimeMin=1731484800000&amp;amp;search.tags=cdsap-21%2Cvarianta_gradle_8_11&amp;amp;search.tasks=assembleDebug&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;Build Scans&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.11 Parallel&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1731484799999&amp;amp;search.startTimeMin=1731398400000&amp;amp;search.tags=cdsap-19,variantb_gradle_8_11_parallel&amp;amp;search.tasks=assembleDebug&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;Build Scans&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1731571199999&amp;amp;search.startTimeMin=1731484800000&amp;amp;search.tags=cdsap-21,variantb_gradle_8_11_parallel&amp;amp;search.tasks=assembleDebug&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;Build Scans&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Configuration cache hit scenarios
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Gradle Profiler Builds&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.10&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1731743999999&amp;amp;search.startTimeMin=1731657600000&amp;amp;search.tags=profiler-cdsap-3,varianta_main&amp;amp;search.tasks=assembleDebug&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;Build Scans&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.11&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1731743999999&amp;amp;search.startTimeMin=1731657600000&amp;amp;search.tags=profiler-cdsap-4,varianta_gradle_8_11&amp;amp;search.tasks=assembleDebug&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;Build Scans&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.11 Parallel&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1731743999999&amp;amp;search.startTimeMin=1731657600000&amp;amp;search.tags=profiler-cdsap-4,variantb_gradle_8_11_parallel&amp;amp;search.tasks=assembleDebug&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;Build Scans&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Experiments
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Experiment&lt;/th&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Results&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.10 vs Gradle 8.11&lt;/td&gt;
&lt;td&gt;Dependencies Cache&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cdsap/Telltale/actions/runs/11809935825" rel="noopener noreferrer"&gt;Experiment&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.11 vs Gradle 8.11 Parallel&lt;/td&gt;
&lt;td&gt;Dependencies Cache&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cdsap/Telltale/actions/runs/11809944800" rel="noopener noreferrer"&gt;Experiment&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.10 vs Gradle 8.11&lt;/td&gt;
&lt;td&gt;Clean&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cdsap/Telltale/actions/runs/11827513348" rel="noopener noreferrer"&gt;Experiment&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.11 vs Gradle 8.11 Parallel&lt;/td&gt;
&lt;td&gt;Clean&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cdsap/Telltale/actions/runs/11828867614" rel="noopener noreferrer"&gt;Experiment&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.10 vs Gradle 8.11&lt;/td&gt;
&lt;td&gt;Gradle Profiler&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cdsap/Telltale/actions/runs/11864889165" rel="noopener noreferrer"&gt;Experiment&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradle 8.11 vs Gradle 8.11 Parallel&lt;/td&gt;
&lt;td&gt;Gradle Profiler&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cdsap/Telltale/actions/runs/11865094223" rel="noopener noreferrer"&gt;Experiment&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h1&gt;
  
  
  Final notes
&lt;/h1&gt;

&lt;p&gt;Analyzing the results, we've observed a significant improvement in configuration time when enabling &lt;code&gt;org.gradle.configuration-cache.parallel&lt;/code&gt;. In addition, Gradle 8.11 itself reduces the configuration cache entry size, meaning the Gradle model saved in the cache uses less space on disk. As a result, the cache is stored and loaded faster, which is especially helpful for big and complex projects.&lt;/p&gt;

&lt;p&gt;Of course, the results are based on the project under experiment and may vary depending on your project, but we strongly recommend enabling &lt;code&gt;org.gradle.configuration-cache.parallel&lt;/code&gt; to take advantage of these improvements.&lt;/p&gt;

&lt;p&gt;Happy Building!&lt;/p&gt;

</description>
      <category>gradle</category>
      <category>android</category>
    </item>
    <item>
      <title>Telltale: Automating Experimentation in Gradle Builds</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Sat, 28 Sep 2024 21:01:21 +0000</pubDate>
      <link>https://dev.to/cdsap/telltale-automating-experimentation-in-gradle-builds-1h9m</link>
      <guid>https://dev.to/cdsap/telltale-automating-experimentation-in-gradle-builds-1h9m</guid>
      <description>&lt;p&gt;In this article, I introduce the latest iteration of &lt;a href="https://github.com/cdsap/Telltale" rel="noopener noreferrer"&gt;Telltale&lt;/a&gt;, a framework designed to automate experimentation in Gradle builds. This new version extends the execution environment to include different caching modes and environment properties, offering more comprehensive testing capabilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5o9lyfusnhtuhfg0j6v1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5o9lyfusnhtuhfg0j6v1.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But before we explore these new features, let’s briefly revisit the core concept of Telltale to understand its foundation.&lt;/p&gt;

&lt;p&gt;The original idea behind Telltale was to create a framework that orchestrates experiments across Gradle builds to understand performance impacts by collecting data and providing insights. These experiments are based on comparing the results of executions between two variants. It supports two types of workflow experiments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradle Profiler (&lt;code&gt;experiment-with-gradle-profiler.yaml&lt;/code&gt;): The iterations of the variant experiments are executed on the same agent.&lt;/li&gt;
&lt;li&gt;Isolated Iterations (&lt;code&gt;experiment.yaml&lt;/code&gt;): Each iteration is executed on a different agent. This article explains these types of experiments in detail.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today, you can use Gradle Profiler to achieve similar results, and in fact, Telltale offers an experiment workflow mode that integrates with Gradle Profiler. It’s an excellent tool that provides flexibility in setting up the experimental environment and includes scenarios for applying incremental changes across iterations. However, with Telltale, my goal was to ensure that each iteration of the experiment runs in complete isolation by executing the builds on different agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But why is such a framework necessary for Gradle builds?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first reason is the nature of experimentation itself. Software projects are in constant flux, evolving with changes in modules, compilation unit sizes, and new tool updates. Additionally, as the infrastructure changes, such as updated JVM configurations, past performance settings can be obsolete. Experimenting with different configurations helps identify the optimal setup for a project’s current state. While we are increasingly familiar with performance factors, there’s always an element of trial and error to empirically understand how changes affect a project.&lt;/p&gt;

&lt;p&gt;The second reason is to create a safety net that helps prevent performance regressions. Once a change is merged into the main branch, it’s often too late to catch these regressions. To address this, a more conservative approach is needed, where the performance impact is evaluated before merging changes. Running regression tests on every pull request (PR), however, is costly and time-consuming. We assume that not all types of changes require regression test execution, so we can limit the scope to PRs that update critical components, such as Java/AGP/KGP/Gradle updates, convention plugins, or central build logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Experiment frameworks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An effective experimentation framework must orchestrate multiple iterations of experiment variants and ensure consistency in the environment for each build execution. It should enable parallel execution of the variants to reduce the overall duration of the experiment. The framework also needs to implement a seeding step to prepare the Gradle caching state for the experiments.&lt;/p&gt;

&lt;p&gt;Additionally, the framework should be flexible enough to allow multiple iterations for each variant, minimizing build variance. The number of iterations will depend on this variance and, of course, on the cost of the resources used by the experiment—you don't want to upset your infrastructure team. Afterward, you need to process the metrics generated by the builds, which should be published for each execution. Finally, the framework needs to analyze this data and provide the results of the experiment.&lt;/p&gt;

&lt;p&gt;The visualization of this process would look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmn65gly3pb16imj384v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmn65gly3pb16imj384v.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Given these requirements, how does Telltale provide a solution?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Telltale approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Telltale provides an opinionated solution to this challenge. It uses GitHub Actions to execute the experiments, relies on &lt;a href="https://gradle.com/develocity/" rel="noopener noreferrer"&gt;Develocity&lt;/a&gt; to publish the data, and utilizes a custom CLI that makes use of &lt;a href="https://docs.gradle.com/develocity/api-manual/" rel="noopener noreferrer"&gt;Develocity API&lt;/a&gt; to process the experiment results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initialization&lt;/strong&gt;&lt;br&gt;
At the initialization step, Telltale defines the parameters of the experiment. Those parameters are defined in the workflow experiment template:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftspy2ozsp9y9wy1yinoo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftspy2ozsp9y9wy1yinoo.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The parameters of the experiment are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;repository&lt;/code&gt;: The GitHub repository where the experiment will run.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;variantA&lt;/code&gt; and &lt;code&gt;variantB&lt;/code&gt;: Branch names for the experiment.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;task&lt;/code&gt;: The Gradle task to execute.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;iterations&lt;/code&gt;: Number of iterations for each experiment run.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mode&lt;/code&gt;: The type of caching to apply during the experiment. &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;os_args&lt;/code&gt;: OS for each variant.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;java_args&lt;/code&gt;: JDK versions and vendors for each variant.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;extra_build_args&lt;/code&gt;: Additional Gradle arguments for each variant.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;extra_report_args&lt;/code&gt;: Configuration for generating reports.&lt;/li&gt;
&lt;/ul&gt;
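&lt;p&gt;As a sketch of how these parameters surface in the workflow (hypothetical input names based on the list above; the actual workflow files in the Telltale repository may differ), they map naturally to GitHub Actions &lt;code&gt;workflow_dispatch&lt;/code&gt; inputs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;on:
  workflow_dispatch:
    inputs:
      repository:
        description: 'GitHub repository under experiment'
        required: true
      variantA:
        description: 'Branch for variant A'
        required: true
      variantB:
        description: 'Branch for variant B'
        required: true
      task:
        description: 'Gradle task to execute'
        required: true
      iterations:
        description: 'Iterations per variant'
        required: true
      mode:
        description: 'Caching mode to apply'
        required: false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;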

&lt;p&gt;In the new version, we have introduced a mechanism called cache mode. Previously, we executed the variants on fresh agents, which worked well, but in some cases, we want to reduce the interaction with external components—such as downloading dependencies or task caching—to focus on the specific aspects of the experiment. We are now using the &lt;a href="https://github.com/gradle/actions/tree/main/setup-gradle" rel="noopener noreferrer"&gt;Gradle setup action&lt;/a&gt;, and thanks to the flexibility of this GitHub action, we can offer different caching modes in the experiment. The supported modes are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Caching mode&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;dependencies cache&lt;/td&gt;
&lt;td&gt;Caches dependencies only, without caching task outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dependencies cache - transforms cache&lt;/td&gt;
&lt;td&gt;Caches dependencies, excluding transforms cache&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;local task cache&lt;/td&gt;
&lt;td&gt;Enables caching of task outputs locally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;local task cache + dependencies cache&lt;/td&gt;
&lt;td&gt;Combines local task caching with dependency caching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;local task cache - transforms cache&lt;/td&gt;
&lt;td&gt;Caches task outputs locally, excluding transforms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;local task cache + dependencies cache - transforms cache&lt;/td&gt;
&lt;td&gt;Combines local task, dependency caching, and excludes transforms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;remote task cache&lt;/td&gt;
&lt;td&gt;Uses a remote server to cache task outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;remote task cache + dependencies cache&lt;/td&gt;
&lt;td&gt;Combines remote task caching with dependency caching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;remote task cache - transforms cache&lt;/td&gt;
&lt;td&gt;Caches task outputs remotely, excluding transforms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;remote task cache + dependencies cache - transforms cache&lt;/td&gt;
&lt;td&gt;Combines remote task, dependency caching, and excludes transforms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;no caching&lt;/td&gt;
&lt;td&gt;Disables all forms of caching&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Seeding&lt;/strong&gt;&lt;br&gt;
As mentioned earlier, in this new version, we are implementing caching modes. Therefore, if the experiment involves caching, we are adding a new step to seed the cache. Thanks to the flexibility of the setup action, we can define how we want to populate the cache, which will later be used during execution. Each variant will execute one build to populate the cache with the elements required for the experiment. For example, if I'm using 'local task cache + dependencies cache,' the task build cache and dependencies used by the project will be provided during the execution of subsequent steps.&lt;/p&gt;

&lt;p&gt;In this step, it is important to mark those builds as seeders to exclude them from the final results. Since we are using Develocity, we add a prefix to the tags used in the build.&lt;br&gt;
Once the cache is seeded, the next step is executing the experiments.&lt;/p&gt;
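&lt;p&gt;For example, a seeding build can be tagged with a marker like the following (the &lt;code&gt;seed-&lt;/code&gt; prefix here is illustrative, not necessarily the exact tag Telltale uses):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./gradlew ${{ inputs.task }} \
     -Dscan.tag.seed-${{ inputs.variant-prefix }}${{ inputs.variant }} \
     -Dscan.tag.${{ inputs.experiment-id }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;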

&lt;p&gt;&lt;strong&gt;Execution&lt;/strong&gt;&lt;br&gt;
Each variant is executed for &lt;strong&gt;n&lt;/strong&gt; iterations, where the &lt;strong&gt;n&lt;/strong&gt; value is defined during the initialization of the experiment. This is achieved by defining a GitHub Actions matrix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;strategy:
   matrix:
      runs: ${{ fromJson(needs.seed.outputs.iterations) }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
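&lt;p&gt;The &lt;code&gt;iterations&lt;/code&gt; output consumed by this matrix is a JSON array of run indices. A minimal sketch of how the seed job could produce it (the step id and the use of &lt;code&gt;jq&lt;/code&gt; are assumptions, not Telltale's actual implementation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Emits e.g. [1,2,3] for inputs.iterations = 3
- id: iterations
  run: echo "iterations=$(seq 1 ${{ inputs.iterations }} | jq -cs .)" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;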



&lt;p&gt;The builds need to include the various aspects of the experiments. Similar to the seeding steps, we use Develocity tags to indicate the different properties of the experiment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./gradlew ${{ inputs.task }} ${{ inputs.extra-args }} \
     -Dscan.tag.${{ inputs.run-id }} \
     -Dscan.tag.${{ inputs.variant-prefix }}${{ inputs.variant }} \
     -Dscan.tag."${{ inputs.mode }}" \
     -Dscan.tag.experiment \
     -Dscan.tag.${{ inputs.experiment-id }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reporting&lt;/strong&gt;&lt;br&gt;
Reporting is an optional step, enabled through the &lt;code&gt;report_enabled&lt;/code&gt; property of the &lt;code&gt;extra_report_args&lt;/code&gt; input. In Telltale, reporting assumes that the platform processing the builds is Develocity, allowing the use of the Develocity API to process build information for each variant. Specifically, Telltale uses a CLI to process experiment results: &lt;a href="https://github.com/cdsap/BuildExperimentResults" rel="noopener noreferrer"&gt;https://github.com/cdsap/BuildExperimentResults&lt;/a&gt;. The CLI processes the experiment execution with a command like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./build-experiment-results --url=${{ inputs.url }}  \
   --api-key $DV_API \
   --variants $VARIANT_A  --variants $VARIANT_B \
   --experiment-id=${{ inputs.experiment-id }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;with an output like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv9i1dkc025p0qd9p0va8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv9i1dkc025p0qd9p0va8.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The reports to include are configurable:&lt;br&gt;
      - &lt;code&gt;tasktype_report&lt;/code&gt;: Include task type reports.&lt;br&gt;
      - &lt;code&gt;taskpath_report&lt;/code&gt;: Include task path reports.&lt;br&gt;
      - &lt;code&gt;kotlin_build_report&lt;/code&gt;: Include Kotlin build reports. Requires &lt;a href="https://blog.jetbrains.com/kotlin/2022/06/introducing-kotlin-build-reports/" rel="noopener noreferrer"&gt;Kotlin Build Reports&lt;/a&gt;.&lt;br&gt;
      - &lt;code&gt;process_report&lt;/code&gt;: Include process-related reports. Requires &lt;a href="https://github.com/cdsap/InfoKotlinProcess" rel="noopener noreferrer"&gt;InfoKotlinProcess&lt;/a&gt; and &lt;a href="https://github.com/cdsap/InfoGradleProcess" rel="noopener noreferrer"&gt;InfoGradleProcess&lt;/a&gt;.&lt;br&gt;
      - &lt;code&gt;resource_usage_report&lt;/code&gt;: Include resource usage reports. Requires builds using Develocity 2024.2.&lt;/p&gt;

&lt;p&gt;Enough talk, let's explore real implementations of Telltale in various scenarios. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case: Reducing number of workers&lt;/strong&gt; &lt;br&gt;
Let’s start with a simple experiment: verifying if reducing the number of workers impacts build duration and performance. In the first experiment, simulating a worst-case scenario, we are not providing task caching, and to reduce the noise from network interactions, we are providing the dependencies during execution. We will test the main branch using the default configuration with 4 workers, and for variant B, we are using 2 workers. Parameters for the experiment:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;repository&lt;/td&gt;
&lt;td&gt;cdsap/TelltaleExperiments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;variant a&lt;/td&gt;
&lt;td&gt;main&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;variant b&lt;/td&gt;
&lt;td&gt;main&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;task&lt;/td&gt;
&lt;td&gt;assembleDebug&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iterations&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cache mode&lt;/td&gt;
&lt;td&gt;dependencies cache&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;build arguments&lt;/td&gt;
&lt;td&gt;variant b: "-Dorg.gradle.workers.max=2"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;(&lt;a href="https://github.com/cdsap/TelltaleExperiments" rel="noopener noreferrer"&gt;cdsap/TelltaleExperiments&lt;/a&gt;, the repository used in all of the experiments in this article, it's a fork of the &lt;a href="https://github.com/android/nowinandroid" rel="noopener noreferrer"&gt;nowinandroid&lt;/a&gt; project)&lt;/p&gt;

&lt;p&gt;Experiment results: &lt;a href="https://github.com/cdsap/Telltale/actions/runs/11078433199" rel="noopener noreferrer"&gt;https://github.com/cdsap/Telltale/actions/runs/11078433199&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When comparing the build durations in seconds of both variants, we observe the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6r3s3fd9d0020hgc6s9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6r3s3fd9d0020hgc6s9e.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using all available workers is faster, with a median improvement of 3.30%. Next, we analyze the Kotlin compiler duration for all tasks in the iterations:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6abrgyjyzmovsntrozvr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6abrgyjyzmovsntrozvr.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The duration of the Kotlin compiler decreased when using two workers. From this, we infer that parallelization affects the performance of the Kotlin compilation. However, this decrease in Kotlin compiler duration does not translate into better overall build times.&lt;/p&gt;

&lt;p&gt;Wondering if this correlates with the Kotlin process max usage, we have the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhiux6qzmjyreuacu6ddz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhiux6qzmjyreuacu6ddz.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We observe better behavior in the variant that reduces the number of workers. This could be an interesting consideration when working in scenarios with high memory pressure, as reducing the process load might benefit build duration.&lt;/p&gt;

&lt;p&gt;The previous experiment was based on a worst-case scenario where all tasks are executed. However, reducing parallelization in this scenario could impact other types of builds. In the next experiment, we will apply the same parameters but add the build cache to simulate a best-case scenario where cache hits occur.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;repository&lt;/td&gt;
&lt;td&gt;cdsap/TelltaleExperiments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;variant a&lt;/td&gt;
&lt;td&gt;main&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;variant b&lt;/td&gt;
&lt;td&gt;main&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;task&lt;/td&gt;
&lt;td&gt;assembleDebug&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iterations&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cache mode&lt;/td&gt;
&lt;td&gt;local task cache + dependencies cache&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;build arguments&lt;/td&gt;
&lt;td&gt;variant b: "-Dorg.gradle.workers.max=2"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Experiment results: &lt;a href="https://github.com/cdsap/Telltale/actions/runs/11079181145" rel="noopener noreferrer"&gt;https://github.com/cdsap/Telltale/actions/runs/11079181145&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results of the build duration in seconds are:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jl7u2xxyvabqsr3hcxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jl7u2xxyvabqsr3hcxh.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The median duration shows better results when using all available workers with the local build cache; however, the difference is not significant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case: Reducing parallelization of the Kotlin Compiler&lt;/strong&gt; &lt;br&gt;
In the previous section, we verified that reducing the number of workers increases the build duration. At the same time, we observed an interesting insight regarding the Kotlin compiler duration and Kotlin process memory usage. In this experiment, instead of impacting all tasks, we will reduce the parallelization of Kotlin compiler tasks without affecting the other build tasks. By implementing the &lt;a href="https://cs.android.com/android-studio/platform/tools/base/+/mirror-goog-studio-main:build-system/gradle-core/src/main/java/com/android/build/gradle/internal/services/R8ParallelBuildService.kt;l=23?q=r8parallelb" rel="noopener noreferrer"&gt;same approach&lt;/a&gt; that AGP uses to reduce the parallelization of R8 tasks, we declare a Build service as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;abstract&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;KotlinCompileBuildService&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nc"&gt;BuildService&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;BuildServiceParameters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RegistrationAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Project&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maxParallelUsages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;?)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nc"&gt;ServiceRegistrationAction&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;KotlinCompileBuildService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;
            &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;KotlinCompileBuildService&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;java&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;maxParallelUsages&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;BuildServiceParameters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then update the convention plugin that defines the Android or Kotlin library with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nc"&gt;Project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configureKotlinWithBuildServices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxParallelUsage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;RegistrationAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;maxParallelUsage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;withType&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;KotlinCompile&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;().&lt;/span&gt;&lt;span class="nf"&gt;configureEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;usesService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;getBuildService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gradle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sharedServices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;KotlinCompileBuildService&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;java&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
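&lt;p&gt;A sketch of how a convention plugin might invoke this helper, reading the limit from a Gradle property (the property name &lt;code&gt;kotlin.compile.max.parallel&lt;/code&gt; is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Hypothetical wiring: read the parallelism limit from a Gradle property,
// defaulting to 1 when the property is absent or not a number.
val maxKotlinCompileParallelism =
    (project.findProperty("kotlin.compile.max.parallel") as String?)?.toIntOrNull() ?: 1
project.configureKotlinWithBuildServices(maxKotlinCompileParallelism)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;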



&lt;p&gt;The parameters of the experiment are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;repository&lt;/td&gt;
&lt;td&gt;cdsap/TelltaleExperiments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;variant a&lt;/td&gt;
&lt;td&gt;main&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;variant b&lt;/td&gt;
&lt;td&gt;kotlin_service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;task&lt;/td&gt;
&lt;td&gt;assembleDebug&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iterations&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cache mode&lt;/td&gt;
&lt;td&gt;dependencies cache&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Experiment results: &lt;a href="https://github.com/cdsap/Telltale/actions/runs/11079901282" rel="noopener noreferrer"&gt;https://github.com/cdsap/Telltale/actions/runs/11079901282&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Build duration:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcki7q1s7ssocsiwlogp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcki7q1s7ssocsiwlogp.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reducing the parallelization of the Kotlin compiler tasks is still slower than the main branch variant, but the build time improves compared to the previous experiment, where the number of build workers was reduced:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo66jomlinwvx3lahx14m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo66jomlinwvx3lahx14m.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another interesting insight is how we are reducing the Kotlin compiler's memory max usage when comparing the three variants:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnipkvvax90qpeat90g8o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnipkvvax90qpeat90g8o.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Given the nature of the project and the limited resources available in the GitHub Action runner (4 cores), the results are not impressive. However, in scenarios with a higher number of cores and larger compilation units, this could be an interesting experiment to perform, especially if you're experiencing high memory pressure in builds that heavily utilize the Kotlin compiler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case: Disabling Artifact transform cacheability&lt;/strong&gt; &lt;br&gt;
Since Develocity includes Artifact Transforms information in the build scans, we have found some cases where significant negative avoidance savings are observed when those transforms are requested from the remote cache:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo0ggt9za8hdzxw4p35li.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo0ggt9za8hdzxw4p35li.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Given the high volume of transforms requesting cache entries in some poor connectivity scenarios, this could create a performance impact on the build duration. Gradle 8.9 introduces a new 'internal' property that allows disabling the cacheability of the transforms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-Dorg.gradle.internal.transform-caching-disabled=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The usage of this internal property does not guarantee stability or continued support in future versions. As this is an internal feature, it may be subject to changes or removal without prior notice, and its behavior may not be consistent across different versions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this experiment, we will use the remote cache mode, providing the dependencies cache but excluding the transforms to force execution or cache requests. Experiment parameters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;repository&lt;/td&gt;
&lt;td&gt;cdsap/TelltaleExperiments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;variant a&lt;/td&gt;
&lt;td&gt;main_with_remote_cache&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;variant b&lt;/td&gt;
&lt;td&gt;main_with_remote_cache&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;task&lt;/td&gt;
&lt;td&gt;assembleDebug&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iterations&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cache mode&lt;/td&gt;
&lt;td&gt;remote task cache + dependencies cache - transforms cache&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;build arguments&lt;/td&gt;
&lt;td&gt;variant b: "-Dorg.gradle.internal.transform-caching-disabled=true"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Experiment results: &lt;a href="https://github.com/cdsap/Telltale/actions/runs/11080852114" rel="noopener noreferrer"&gt;https://github.com/cdsap/Telltale/actions/runs/11080852114&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Build Duration:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Few7y1hh206s73nlbie70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Few7y1hh206s73nlbie70.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The build duration increased when comparing the variants. Upon analyzing the reason, we observed that the &lt;code&gt;DexMergingTask&lt;/code&gt; tasks were &lt;a href="https://ge.solutions-team.gradle.com/c/jzrovceaixlqw/fq4wm3efpap46/task-inputs?expanded=WyIzaHhqN3N4eWgyejdjLWRleGRpcnMiXQ#3hxj7sxyh2z7c" rel="noopener noreferrer"&gt;executed&lt;/a&gt; in the variant that disables the artifact transforms cache. This is related to an &lt;a href="https://issuetracker.google.com/issues/359616078" rel="noopener noreferrer"&gt;issue&lt;/a&gt; where the dexing task/transform generates non-deterministic &lt;code&gt;classes.dex&lt;/code&gt; contents. Thanks to the Google team, this issue was fixed in Android Gradle Plugin 8.6.1. We repeated the experiment after updating the AGP version to 8.6.1. &lt;br&gt;
Experiment results: &lt;a href="https://github.com/cdsap/Telltale/actions/runs/11084537192" rel="noopener noreferrer"&gt;https://github.com/cdsap/Telltale/actions/runs/11084537192&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Build duration:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fki2jlebfiz3qmpv3cdnj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fki2jlebfiz3qmpv3cdnj.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Still, the build duration increases significantly even though the tasks have the same cache hit ratio. In this case, it is cheaper to retrieve the artifact transform outputs from the remote cache than to re-execute the transforms.&lt;/p&gt;

&lt;p&gt;To be fair, the experiment scenario is favored by the location of the remote cache node (us-central), which is closer to the location of the GitHub Action runners. This is not always the case in our CI environments, so in the final experiment, we created a new cache node farther from the location of the agents and repeated the experiment with the artifact transforms cache disabled. &lt;br&gt;
Experiment results: &lt;a href="https://github.com/cdsap/Telltale/actions/runs/11085054664" rel="noopener noreferrer"&gt;https://github.com/cdsap/Telltale/actions/runs/11085054664&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Build duration:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7gf7gjda3an168dudhp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7gf7gjda3an168dudhp.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The build duration improves when using the remote cache for artifact transforms despite the negative avoidance savings. However, in this case, the difference is much smaller compared to faster cache nodes. This data is interesting because, in scenarios with a high volume of transform requests and increased cache latency, disabling the transform cache might lead to better performance.&lt;/p&gt;

&lt;p&gt;Note:&lt;br&gt;
The internal Gradle property &lt;code&gt;org.gradle.internal.transform-caching-disabled&lt;/code&gt; allows disabling caching for specific artifact transform types. You can use the Develocity API or tools like &lt;a href="https://github.com/cdsap/ArtifactTransformReport" rel="noopener noreferrer"&gt;ArtifactTransformReport&lt;/a&gt; to collect data on negative avoidance savings by artifact type and disable cacheability for those with the highest values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final words&lt;/strong&gt; &lt;br&gt;
I want to emphasize that this is simply an opinionated approach I’m using to automate experiments. Of course, this approach is closely tied to the use of Develocity for consuming build data, but you can still use the experiment orchestration and opt for another component to collect job duration, such as the GitHub API.&lt;br&gt;
The key takeaway from this article is the importance of having a reliable framework to run experiments and make informed, data-driven decisions.&lt;/p&gt;

&lt;p&gt;Looking ahead, the future roadmap for Telltale includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support for more than two variants in experiments: Currently, we focus on comparing two variants, but in some cases, we’d like to extend this to test multiple variants, such as different heap sizes. This extension will require careful management of the number of jobs in the experiment to avoid hitting quota limits.&lt;/li&gt;
&lt;li&gt;Container argument configuration: While we currently provide variants by OS, some experiments need more flexibility. For example, when measuring builds with different native memory allocators, we require distinct OS environments. By introducing the option to use different container images, we can offer greater flexibility for more advanced experiments.&lt;/li&gt;
&lt;li&gt;Support for additional reporting tools: We plan to extend support to other reporting tools, such as Talaiot or the Gradle Analytics plugin, to provide richer data insights.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>gradle</category>
      <category>android</category>
    </item>
    <item>
      <title>Resource observability case study: jemalloc in Android builds</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Tue, 20 Aug 2024 05:22:15 +0000</pubDate>
      <link>https://dev.to/cdsap/resource-observability-case-study-jemalloc-in-android-builds-22og</link>
      <guid>https://dev.to/cdsap/resource-observability-case-study-jemalloc-in-android-builds-22og</guid>
      <description>&lt;p&gt;As build engineers, one of our biggest concerns is running out of memory during our Gradle builds. This issue has a significant impact on our developers. When memory runs out, it can cause the system to kill the Gradle daemon, resulting in failed CI builds, frequent garbage collection (GC) overhead leading to slow build times, and, most importantly, undermining the team's confidence in running builds on CI.&lt;/p&gt;

&lt;p&gt;We often spend time searching for magic formulas to configure the system optimally, but the reality is more complex, with multiple dimensions of problems that vary for each project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent resources: Hardware always matters—not just CPU/memory, but also network and disk.&lt;/li&gt;
&lt;li&gt;Type of build: For instance, executing costly test tasks in parallel or performing intensive R8 operations at the end of the build.&lt;/li&gt;
&lt;li&gt;Nature of the build: Is the build dominated by cache hits, or do we have builds that consistently apply memory pressure? Are we providing layers of caching, like dependencies or wrappers, in our scenarios?&lt;/li&gt;
&lt;li&gt;Project structure: Do we have a large legacy module with thousands of compilation units?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you can see, it's hard to find the perfect formula. That's why I'm opinionated and always try to design for the worst-case scenario, ensuring reliability in CI builds when running all tasks on a clean agent. However, once that state is achieved, we still need to consider multiple cases to continue making improvements.&lt;/p&gt;

&lt;p&gt;This underscores the importance of having the appropriate tools to monitor performance and automated tools to experiment under different scenarios. With these, we can systematically observe and analyze how different configurations behave, ensuring we make data-driven decisions.&lt;/p&gt;

&lt;p&gt;Fortunately, Develocity 2024.2 introduces &lt;a href="https://gradle.com/develocity/releases/2024.2#build-environment-resource-usage-observability" rel="noopener noreferrer"&gt;build resource usage observability&lt;/a&gt;, a feature that was previously missing in build scans:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fineyl1fsdb767o4zrqct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fineyl1fsdb767o4zrqct.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we have complete information on key metrics like memory usage, CPU, network, disk, and more. What's even better is the availability of new API endpoints that we can integrate with our monitoring systems to evaluate performance across these different metrics.&lt;/p&gt;

&lt;p&gt;As a demonstration, I want to measure something that caught my attention months ago, an interesting topic brought up by &lt;a href="https://www.jasonpearson.dev" rel="noopener noreferrer"&gt;Jason Pearson&lt;/a&gt;: the use of &lt;a href="https://jemalloc.net/" rel="noopener noreferrer"&gt;jemalloc&lt;/a&gt; as a native memory allocator for Android builds. The claim is that jemalloc reduces memory usage by optimizing how memory is allocated and deallocated. jemalloc is designed to minimize memory fragmentation and improve performance, particularly in multithreaded applications, making it well suited for resource-intensive builds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Experiment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To explore the impact of different native memory allocators on Android build performance, we designed an experiment with two variants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default native memory allocator&lt;/li&gt;
&lt;li&gt;jemalloc&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We conducted the experiment using GitHub Actions, where we created two distinct Docker images based on &lt;code&gt;amazoncorretto:17-al2023-jdk&lt;/code&gt;. For the jemalloc Docker image variant, we configured the allocator with the following Dockerfile instructions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

RUN curl -L "https://github.com/jemalloc/jemalloc/releases/download/5.3.0/jemalloc-5.3.0.tar.bz2" -o jemalloc.tar.bz2
RUN tar -xf jemalloc.tar.bz2
RUN cd jemalloc-5.3.0/ &amp;amp;&amp;amp; ./configure &amp;amp;&amp;amp; make &amp;amp;&amp;amp; make install
ENV LD_PRELOAD /usr/local/lib/libjemalloc.so


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Next, in the build.yaml file, we set up the configuration to test both variants:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

strategy:
    matrix:
        variant: ["cdsap/android-builder:0.5", "cdsap/android-builder-jemalloc:0.5"]
        runs: ${{ fromJson(needs.iterations.outputs.iterations) }}
runs-on: ubuntu-latest
container:
    image: ${{ matrix.variant }}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The experiment was executed with 100 iterations (fresh agent, no cache) for each variant, using the &lt;a href="https://github.com/android/nowinandroid" rel="noopener noreferrer"&gt;nowinandroid&lt;/a&gt; project and focusing on the &lt;code&gt;assembleRelease&lt;/code&gt; task. By comparing the results, we aimed to assess the effectiveness of jemalloc in reducing memory usage and improving build performance in a CI environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;br&gt;
To get a clear picture of the performance metrics during our experiment, we utilized one of the new endpoints provided by the Develocity API: &lt;code&gt;api/builds/$buildScanId/gradle-resource-usage&lt;/code&gt;. This endpoint delivers detailed insights into various resource usage metrics throughout the build process.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

{
    "totalMemory": 68719476736,
    "total": {
        "allProcessesCpu": {
            "max": 98,
            "average": 79,
            "median": 86,
            "p5": 40,
            "p25": 85,
            "p75": 89,
            "p95": 95
        },
        "buildProcessCpu": {},
        "buildChildProcessesCpu": {},
        "allProcessesMemory": {},
        "buildProcessMemory": {},
        "buildChildProcessesMemory": {},
        "diskReadThroughput": {},
        "diskWriteThroughput": {},
        "networkUploadThroughput": {},
        "networkDownloadThroughput": {}
    }
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One key insight from the metrics exposed by the new endpoint is that it provides not only memory metrics specific to the build process but also the total memory usage on the agent. This is perfect for the purpose of our experiment.&lt;/p&gt;

&lt;p&gt;Using the tags added to each variant execution, we then pulled the data and aggregated the results from all 100 iterations for each version of the experiment.&lt;/p&gt;
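&lt;p&gt;As a sketch of how such a query is composed (the server URL and build scan id below are placeholders, not the real experiment values):&lt;/p&gt;

```shell
# Hypothetical values: point these at your own Develocity instance and scan.
DEVELOCITY_URL="https://develocity.example.com"
BUILD_SCAN_ID="abcdefghijklm"
ENDPOINT="$DEVELOCITY_URL/api/builds/$BUILD_SCAN_ID/gradle-resource-usage"
echo "$ENDPOINT"
# With an access token exported as DEVELOCITY_TOKEN, the call would be:
# curl -s -H "Authorization: Bearer $DEVELOCITY_TOKEN" "$ENDPOINT"
```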

&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/cdsap/jemallocExperiment/actions/runs/10462832287" rel="noopener noreferrer"&gt;Experiment execution&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1724137199999&amp;amp;search.startTimeMin=1724050800000&amp;amp;search.tags=cdsap%2Fandroid-builder-jemalloc:0.5,experiment-resources&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;jemalloc build scans&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://ge.solutions-team.gradle.com/scans?search.startTimeMax=1724137199999&amp;amp;search.startTimeMin=1724050800000&amp;amp;search.tags=cdsap%2Fandroid-builder:0.5,experiment-resources&amp;amp;search.timeZoneId=America%2FLos_Angeles" rel="noopener noreferrer"&gt;malloc build scans&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All processes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubjy7ifagycxs9np8gcn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubjy7ifagycxs9np8gcn.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The time-series data shows that jemalloc has generally lower and more stable memory usage over time compared to malloc.&lt;/li&gt;
&lt;li&gt;There are fewer spikes in memory usage with jemalloc, indicating that it may provide a more consistent memory footprint.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Main build process:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjzmd57nrzkui7i1576k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjzmd57nrzkui7i1576k.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For the main build process, jemalloc again shows a more stable and lower memory usage pattern compared to malloc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;malloc has higher peaks and more variability, which can be less desirable in a memory-constrained environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Summary results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5tzq7w76dijtkplumjs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5tzq7w76dijtkplumjs.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final words&lt;/strong&gt;&lt;br&gt;
The results of this experiment are inherently influenced by the specific project and the environment in which the scenarios were executed. Nevertheless, it is evident that jemalloc offers slightly better performance in terms of memory usage.&lt;/p&gt;

&lt;p&gt;The primary focus of this article is to highlight the new opportunities introduced with the latest release of &lt;a href="https://gradle.com/develocity/releases/2024.2" rel="noopener noreferrer"&gt;Develocity 2024.2&lt;/a&gt;, particularly the enhanced build resource usage information now available in build scans and through the Develocity API. These new features provide deeper insights into memory usage, enabling more informed decision-making and optimization in your development workflows.&lt;/p&gt;

&lt;p&gt;Happy building!&lt;/p&gt;

</description>
      <category>gradle</category>
      <category>android</category>
    </item>
    <item>
      <title>Performance Impact Analysis of Gradle 8.7 in Android Projects</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Sat, 23 Mar 2024 16:57:58 +0000</pubDate>
      <link>https://dev.to/cdsap/performance-impact-analysis-of-gradle-87-in-android-projects-5288</link>
      <guid>https://dev.to/cdsap/performance-impact-analysis-of-gradle-87-in-android-projects-5288</guid>
      <description>&lt;p&gt;Yesterday was released &lt;a href="https://docs.gradle.org/current/release-notes.html"&gt;Gradle 8.7&lt;/a&gt;. Our repositories are roaring with bot-generated PRs helping us with the processes of updating with the latest and the greatest versions of our dependencies:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53qsdjjrbijpk84a94ba.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53qsdjjrbijpk84a94ba.png" alt="Image description" width="800" height="651"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While these tools are great for automating updates in small/sample repositories or for non-critical dependencies, a performance regression test is strongly recommended when updating critical build components in highly modularized projects with hundreds of developers. A change that hurts performance in this kind of repository affects the whole team's development cycle, so early detection is crucial: once the change is merged, it is too late to catch the issue.&lt;/p&gt;

&lt;p&gt;As a simple example, today we will run a performance test of the new Gradle version on the project &lt;a href="https://github.com/android/nowinandroid"&gt;nowinandroid&lt;/a&gt;. The goal is to verify that the update has no negative impact on our codebase.&lt;br&gt;
The experiment covers the worst-case scenario, where all tasks are executed and no build cache is available.&lt;/p&gt;

&lt;p&gt;The project under test, &lt;code&gt;nowinandroid&lt;/code&gt;, is still on Gradle 8.5, so we will also add a variant representing Gradle 8.6. The variants of the experiment are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradle 8.5 &lt;/li&gt;
&lt;li&gt;Gradle 8.6&lt;/li&gt;
&lt;li&gt;Gradle 8.7&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The execution environment is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux 6.5.0-1016-azure (amd64) (GHA runner)&lt;/li&gt;
&lt;li&gt;4 CPU cores&lt;/li&gt;
&lt;li&gt;4 Gradle workers&lt;/li&gt;
&lt;li&gt;JDK 17&lt;/li&gt;
&lt;li&gt;6 GB Gradle Process&lt;/li&gt;
&lt;li&gt;6 GB Kotlin process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We executed 100 iterations for each variant, each iteration executed the task &lt;code&gt;assembleRelease&lt;/code&gt; in a clean GHA runner.&lt;/p&gt;
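&lt;p&gt;The shape of one experiment run can be sketched as a simple loop (simplified; in reality each iteration was a separate job on a fresh GHA runner, so no local state survived between builds):&lt;/p&gt;

```shell
# Simplified sketch of the experiment loop. Each real iteration ran on a
# clean runner; here we only echo the command instead of invoking Gradle.
TASK="assembleRelease"
ITERATIONS=3   # 100 in the real experiment
for i in $(seq 1 "$ITERATIONS"); do
  echo "iteration $i: ./gradlew --no-build-cache $TASK"
done
```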

&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;br&gt;
The first obvious check is the overall build time (seconds):&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3babtwzmjvsolc8qvt3q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3babtwzmjvsolc8qvt3q.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;br&gt;
We obtained similar results, with a 1.78% improvement in the median using 8.7.&lt;/p&gt;

&lt;p&gt;Because of the nature of our experiment (fresh agent, no build/remote cache), we next analyzed the execution time exclusively, to reduce the noise caused by components like the network (seconds): &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksi72n7c97bcv6bs2820.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksi72n7c97bcv6bs2820.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;br&gt;
Everything looks good, with a decrease of 1.93% in the median for 8.7.&lt;/p&gt;

&lt;p&gt;Next, we will focus on the most expensive tasks by plugin. First, we will start with the AGP and the task &lt;code&gt;:app:minifyDemoReleaseWithR8&lt;/code&gt; (ms):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiqkmbjlqfcjdo2uyruyu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiqkmbjlqfcjdo2uyruyu.png" alt="Image description" width="640" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We don't observe any significant impact on the task duration; the overall change in the median is -0.6%.&lt;/p&gt;

&lt;p&gt;Another task that dominates the build times is the DexMergingTask. In &lt;code&gt;nowinandroid&lt;/code&gt;, the longest such task is &lt;code&gt;:app-nia-catalog:mergeExtDexRelease&lt;/code&gt;. Let's see the results (ms):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21yx1rmr48e7talf342s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21yx1rmr48e7talf342s.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All good. We don't observe any impact from the update.&lt;/p&gt;

&lt;p&gt;Let's move to the Kotlin Gradle Plugin. In the main branch, the task with the longest duration is &lt;code&gt;:core:model:compileKotlin&lt;/code&gt;:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjml346t6fkucxlfpz0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjml346t6fkucxlfpz0u.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What's going on? Why would the new Gradle version bring benefits to the Kotlin compiler tasks? Sadly, Gradle 8.7 doesn't hide magical optimizations for our Kotlin tasks. The reason is that the Kotlin compiler embedded in Gradle has been updated from 1.9.10 to 1.9.22, and is now aligned with the version used in the &lt;code&gt;nowinandroid&lt;/code&gt; repository.&lt;br&gt;
That means the Gradle build doesn't need to download the additional dependencies required for 1.9.22, because they are embedded. That's why we see an improvement in this task, which is the first Kotlin task executed outside the build logic in the project. &lt;br&gt;
We get a clearer picture by analyzing the build dependencies and network metrics for a build on each variant:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Gradle 8.5&lt;/th&gt;
&lt;th&gt;Gradle 8.6&lt;/th&gt;
&lt;th&gt;Gradle 8.7&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Build Dependencies&lt;/td&gt;
&lt;td&gt;241&lt;/td&gt;
&lt;td&gt;241&lt;/td&gt;
&lt;td&gt;218&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Files downloaded&lt;/td&gt;
&lt;td&gt;1654&lt;/td&gt;
&lt;td&gt;1654&lt;/td&gt;
&lt;td&gt;1601&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data downloaded&lt;/td&gt;
&lt;td&gt;726.6 MiB&lt;/td&gt;
&lt;td&gt;726.6 MiB&lt;/td&gt;
&lt;td&gt;651.5 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Number of network requests&lt;/td&gt;
&lt;td&gt;2138&lt;/td&gt;
&lt;td&gt;2138&lt;/td&gt;
&lt;td&gt;2117&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Finally, we analyzed the memory usage of the processes involved in the build, starting with the Gradle process (GB):&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8auua6d67axov2sxurha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8auua6d67axov2sxurha.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;br&gt;
And for the Kotlin process (GB):&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F814bg7z526ch78qkxe5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F814bg7z526ch78qkxe5d.png" alt="Image description" width="600" height="371"&gt;&lt;/a&gt;&lt;br&gt;
We noticed an increase in the memory usage of the Kotlin process, caused by the fact that the Kotlin versions are now aligned and only one process is required. In previous versions, two Kotlin processes were created during the build. We can verify this behavior by analyzing the Kotlin processes alive at the end of the build for Gradle 8.5/8.6:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39e87gwmgyhdpglk12ad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39e87gwmgyhdpglk12ad.png" alt="Image description" width="712" height="728"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Against the processes in Gradle 8.7:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F507ebo67ayg9kn06ik40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F507ebo67ayg9kn06ik40.png" alt="Image description" width="692" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final words&lt;/strong&gt;&lt;br&gt;
When updating components like the AGP, KGP, Gradle, or other critical build components, a performance regression test is recommended to verify the behavior of the new version. Even in a case like this one, where the update doesn't bring significant duration improvements, the test gives us an understanding of less visible changes, like the embedded Kotlin compiler update. &lt;br&gt;
This article was just an example covering a few metrics. Depending on the type of update and the processes involved in your development cycle, you may consider different tests.&lt;/p&gt;

&lt;p&gt;Happy Building!&lt;/p&gt;

</description>
      <category>gradle</category>
      <category>android</category>
    </item>
    <item>
      <title>nowinandroid builds with Gradle 8.5 and JDK 21</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Thu, 18 Jan 2024 02:44:07 +0000</pubDate>
      <link>https://dev.to/cdsap/nowinandroid-builds-with-gradle-85-and-jdk-21-7fp</link>
      <guid>https://dev.to/cdsap/nowinandroid-builds-with-gradle-85-and-jdk-21-7fp</guid>
      <description>&lt;p&gt;Gradle 8.5 fully supports compiling, testing and running on Java 21. Java updates frequently include optimizations that improve the performance of both the JVM and the Java applications running on it. &lt;br&gt;
At the Devoxx Belgium &lt;a href="https://www.youtube.com/watch?v=T6X2Yytrzyg"&gt;presentation&lt;/a&gt; "With Java 21, Your Code Runs Even Faster But How is that Possible?", Per Minborg explains some of the optimizations shipped in Java 21, like &lt;a href="https://bugs.openjdk.org/browse/JDK-8298639"&gt;Perform I/O operations in bulk for RandomAccessFile&lt;/a&gt; or &lt;a href="https://bugs.openjdk.org/browse/JDK-8282664"&gt;Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops&lt;/a&gt;. Even better, these optimizations have an immediate impact without any change to the Java application, whether through direct &lt;a href="https://github.com/search?q=repo%3Agradle%2Fgradle%20import%20java.io.RandomAccessFile&amp;amp;type=code"&gt;usage&lt;/a&gt; or through transitive dependencies.&lt;/p&gt;

&lt;p&gt;Following the previous &lt;a href="https://dev.to/cdsap/measuring-jdk-updates-for-local-builds-in-android-projects-3m4g"&gt;Java 17 article&lt;/a&gt;, in this article we share the results of measuring an Android project with Java 21.&lt;/p&gt;

&lt;h2&gt;
  
  
  Nowinandroid
&lt;/h2&gt;

&lt;p&gt;The project used is &lt;a href="https://github.com/android/nowinandroid"&gt;nowinandroid&lt;/a&gt;. The experiment is based on commit &lt;a href="https://github.com/android/nowinandroid/commit/f5b3ae56dcf0022456d061d0c4c121be5a144984"&gt;f5b3ae5&lt;/a&gt; of the main branch (12/22). At this point, the project was already using Gradle 8.5. &lt;br&gt;
The only change applied was updating the AGP to 8.2.1, because it includes the fix for the issue &lt;a href="https://issuetracker.google.com/issues/317235925"&gt;JdkImageTransform fails when using JDK 21&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experiment methodology
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Two variants 

&lt;ul&gt;
&lt;li&gt;JDK 17&lt;/li&gt;
&lt;li&gt;JDK 21&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Scenarios:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;assembleDebug&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;assembleRelease&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;testDemoDebug&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;lintDemoRelease&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Each variant/scenario runs 100 clean builds in GHA runners&lt;/li&gt;
&lt;li&gt;Memory configuration for all builds: &lt;code&gt;-Xmx6g -XX:+HeapDumpOnOutOfMemoryError -Dfile.encoding=UTF-8 -XX:+UseParallelGC -XX:MaxMetaspaceSize=1g&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
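&lt;p&gt;For reference, the memory configuration above corresponds to the following JVM arguments for the Gradle process (a sketch built from the flags listed; per the setup, the Kotlin process was sized to the same 6 GB):&lt;/p&gt;

```shell
# The experiment's JVM arguments, as they would appear in gradle.properties
# (org.gradle.jvmargs). Here we only compose and print the line.
GRADLE_JVM_ARGS="-Xmx6g -XX:+HeapDumpOnOutOfMemoryError -Dfile.encoding=UTF-8 -XX:+UseParallelGC -XX:MaxMetaspaceSize=1g"
echo "org.gradle.jvmargs=$GRADLE_JVM_ARGS"
```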

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;First, we explore the overall build time for the different scenarios:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;JDK 17 - Median (secs)&lt;/th&gt;
&lt;th&gt;JDK 21 - Median (secs)&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;assembleDebug&lt;/td&gt;
&lt;td&gt;448&lt;/td&gt;
&lt;td&gt;433&lt;/td&gt;
&lt;td&gt;3.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;assembleRelease&lt;/td&gt;
&lt;td&gt;659&lt;/td&gt;
&lt;td&gt;591&lt;/td&gt;
&lt;td&gt;10.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;testDemoDebug&lt;/td&gt;
&lt;td&gt;334&lt;/td&gt;
&lt;td&gt;320&lt;/td&gt;
&lt;td&gt;4.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lintDemoRelease&lt;/td&gt;
&lt;td&gt;306&lt;/td&gt;
&lt;td&gt;296&lt;/td&gt;
&lt;td&gt;3.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Across the four evaluated scenarios, we observe modest improvements in three of them, while the &lt;code&gt;assembleRelease&lt;/code&gt; scenario shows a significant reduction of 10% in build time.&lt;/p&gt;

&lt;p&gt;Next, we explore where the improvements came from at the task level:&lt;/p&gt;

&lt;h3&gt;
  
  
  assembleDebug
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Diff Median (seconds)&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;:app:mergeExtDexDemoDebug&lt;/td&gt;
&lt;td&gt;8.5&lt;/td&gt;
&lt;td&gt;5.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:app-nia-catalog:mergeExtDexDebug&lt;/td&gt;
&lt;td&gt;2.8&lt;/td&gt;
&lt;td&gt;3.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:app:mergeExtDexProdDebug&lt;/td&gt;
&lt;td&gt;2.3&lt;/td&gt;
&lt;td&gt;10.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:app:l8DexDesugarLibDemoDebug&lt;/td&gt;
&lt;td&gt;2.4&lt;/td&gt;
&lt;td&gt;8.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:app:hiltJavaCompileProdDebug&lt;/td&gt;
&lt;td&gt;1.7&lt;/td&gt;
&lt;td&gt;7.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  assembleRelease
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Diff Median (seconds)&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;:app:minifyDemoReleaseWithR8&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;18.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:app-nia-catalog:mergeReleaseGlobalSynthetics&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;29.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:app-nia-catalog:mergeExtDexRelease&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;19.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:app-nia-catalog:l8DexDesugarLibRelease&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;24.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:app:minifyProdReleaseWithR8&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;9.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  testDemoDebugUnitTest
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Diff Median (seconds)&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;:feature:foryou:testDemoDebugUnitTest&lt;/td&gt;
&lt;td&gt;3.4&lt;/td&gt;
&lt;td&gt;7.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:app:testDemoDebugUnitTest&lt;/td&gt;
&lt;td&gt;2.5&lt;/td&gt;
&lt;td&gt;6.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:core:designsystem:testDemoDebugUnitTest&lt;/td&gt;
&lt;td&gt;2.4&lt;/td&gt;
&lt;td&gt;4.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:core:data:testDemoDebugUnitTest&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;10.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:app:hiltJavaCompileDemoDebug&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;9.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  lintDemoRelease
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Diff Median (seconds)&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;:core:data:lintAnalyzeDemoRelease&lt;/td&gt;
&lt;td&gt;1.9&lt;/td&gt;
&lt;td&gt;9.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:core:designsystem:compileDemoReleaseKotlin&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;td&gt;6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:core:designsystem:lintAnalyzeDemoRelease&lt;/td&gt;
&lt;td&gt;1.2&lt;/td&gt;
&lt;td&gt;6.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:core:analytics:lintAnalyzeDemoRelease&lt;/td&gt;
&lt;td&gt;1.1&lt;/td&gt;
&lt;td&gt;8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;:core:datastore:lintAnalyzeDemoRelease&lt;/td&gt;
&lt;td&gt;0.9&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Again, the release scenario shows a significant improvement in the expensive R8 tasks. &lt;/p&gt;

&lt;p&gt;Finally, we aggregated the absolute diffs of the task medians in each scenario, providing the "serial" improvement using JDK 21:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Aggregated diff  (seconds)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;assembleDebug&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;assembleRelease&lt;/td&gt;
&lt;td&gt;158&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;testDebugUnitTest&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lintDemoRelease&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Data
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Experiment scenario&lt;/th&gt;
&lt;th&gt;Results&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;assembleDebug&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cdsap/Pagan/actions/runs/7509423908"&gt;https://github.com/cdsap/Pagan/actions/runs/7509423908&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;assembleRelease&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cdsap/Pagan/actions/runs/7513465432"&gt;https://github.com/cdsap/Pagan/actions/runs/7513465432&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;testDemoDebug&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cdsap/Pagan/actions/runs/7520610718"&gt;https://github.com/cdsap/Pagan/actions/runs/7520610718&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lintDemoRelease&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cdsap/Pagan/actions/runs/7516005997"&gt;https://github.com/cdsap/Pagan/actions/runs/7516005997&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Experiments spreadsheet: &lt;a href="https://docs.google.com/spreadsheets/d/1wdXYp4ri5XUcBSGNc-ssUpjpcl1e-Pdns4-byNMyvwg/edit?usp=sharing"&gt;https://docs.google.com/spreadsheets/d/1wdXYp4ri5XUcBSGNc-ssUpjpcl1e-Pdns4-byNMyvwg/edit?usp=sharing&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final words
&lt;/h2&gt;

&lt;p&gt;In this article, we explored how JDK 21 improves build efficiency in an Android project, particularly through reduced build times. While some improvements were moderate, the assembleRelease tasks showed notably larger gains.&lt;br&gt;
Outcomes may differ based on the specific project, but considering the minimal adjustments needed, it's certainly worthwhile to experiment with this approach if you're already using Gradle 8.5.&lt;/p&gt;

&lt;p&gt;Happy building!&lt;/p&gt;

</description>
      <category>gradle</category>
      <category>java</category>
      <category>android</category>
    </item>
    <item>
      <title>KSP in Android projects</title>
      <dc:creator>Iñaki Villar</dc:creator>
      <pubDate>Fri, 01 Sep 2023 03:22:57 +0000</pubDate>
      <link>https://dev.to/cdsap/ksp-in-android-projects-5cj3</link>
      <guid>https://dev.to/cdsap/ksp-in-android-projects-5cj3</guid>
      <description>&lt;p&gt;The Android community had awesome news this week: Dagger and Hilt KSP processors are now available in the latest release, v2.48:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqg8npfx21x4nkr6g0uqb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqg8npfx21x4nkr6g0uqb.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The benefits of using KSP over kapt, in terms of build performance, are explained in the &lt;a href="https://kotlinlang.org/docs/ksp-why-ksp.html#comparison-to-kapt" rel="noopener noreferrer"&gt;official doc&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The major advantages of KSP over kapt are improved build performance&lt;br&gt;
...&lt;br&gt;
To run Java annotation processors unmodified, kapt compiles Kotlin code into Java stubs that retain information that Java annotation processors care about. To create these stubs, kapt needs to resolve all symbols in the Kotlin program. The stub generation costs roughly 1/3 of a full kotlinc analysis and the same order of kotlinc code-generation&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Regarding Android projects, KSP was already used by several libraries, but Dagger/Hilt, one of the main DI frameworks in Android projects, still required kapt. Mixing KSP and kapt brings no build-performance benefit, so we were eagerly waiting to test a project using KSP exclusively.&lt;/p&gt;

&lt;p&gt;This article compares the results of building nowinandroid with KSP, using Dagger/Hilt 2.48, against the current configuration with kapt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Project
&lt;/h2&gt;

&lt;p&gt;As usual in these articles, the project under experimentation is &lt;a href="https://github.com/android/nowinandroid" rel="noopener noreferrer"&gt;nowinandroid&lt;/a&gt;. The experiment is based on the main branch at this &lt;a href="https://github.com/android/nowinandroid/commit/d0909a9c8f91153f0171d9f89e515a2f7df95202" rel="noopener noreferrer"&gt;commit&lt;/a&gt;. Build stack components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradle 8.2&lt;/li&gt;
&lt;li&gt;AGP 8.1.0&lt;/li&gt;
&lt;li&gt;KGP 1.9.0&lt;/li&gt;
&lt;li&gt;Hilt 2.47&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main branch is one of the two variants in this experiment, representing the kapt build. Currently, the kapt configuration is used in 21 projects that define the dependency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;com.google.dagger:hilt-android-compiler:2.47&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, we need a KSP branch acting as the other variant in the experiment. We updated the relevant convention plugin to use KSP instead of kapt and applied a few small additional changes. The complete list of changes: &lt;a href="https://github.com/cdsap/KspVsKapt/commit/5c7bab7b0241142f71caabde1d3558782db0bef4" rel="noopener noreferrer"&gt;https://github.com/cdsap/KspVsKapt/commit/5c7bab7b0241142f71caabde1d3558782db0bef4&lt;/a&gt;&lt;/p&gt;
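The core of the change is swapping the kapt plugin and configuration for KSP in the convention plugin. A hedged sketch of what that swap looks like (the plugin id and dependency coordinates are real, but this is illustrative, not the actual diff linked above):

```kotlin
// Convention plugin sketch: replace kapt with KSP for Hilt.
// Before (kapt variant):
//   plugins { id("org.jetbrains.kotlin.kapt") }
//   dependencies { "kapt"("com.google.dagger:hilt-android-compiler:2.47") }

plugins {
    id("com.google.devtools.ksp")
}

dependencies {
    // Hilt 2.48 is the first release shipping KSP support for its processors
    "ksp"("com.google.dagger:hilt-android-compiler:2.48")
}
```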

&lt;h2&gt;
  
  
  &lt;strong&gt;Methodology&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We created two different scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean builds: 50 builds per variant executed in parallel in GitHub Action runners. Task: &lt;code&gt;:app:assembleProdDebug&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Incremental change: 20 builds per variant executed in GitHub Action runners, applying an incremental change to &lt;code&gt;core/data/src/main/java/com/google/samples/apps/nowinandroid/core/data/repository/NewsRepository.kt&lt;/code&gt; using Gradle Profiler.&lt;/li&gt;
&lt;/ul&gt;
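For reference, the incremental scenario can be described with a Gradle Profiler scenario file along these lines (a sketch; the scenario name and warm-up count are assumptions, not taken from the actual setup):

```
// incremental.scenarios — hypothetical Gradle Profiler scenario file
incremental_change {
    tasks = [":app:assembleProdDebug"]
    // Apply a non-ABI change to this file before each measured build
    apply-non-abi-change-to = "core/data/src/main/java/com/google/samples/apps/nowinandroid/core/data/repository/NewsRepository.kt"
    warm-ups = 3
}
```

It would then be run with `gradle-profiler --benchmark --scenario-file incremental.scenarios incremental_change`.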

&lt;p&gt;We retrieved the build information with the Gradle Enterprise API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Before reviewing the build results, it's important to mention that replacing kapt with KSP changes the shape of the build. Applying the kapt plugin adds two task types to each module using it: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;org.jetbrains.kotlin.gradle.internal.KaptWithoutKotlincTask&lt;/code&gt;&lt;/li&gt;
&lt;li&gt; &lt;code&gt;org.jetbrains.kotlin.gradle.internal.KaptGenerateStubsTask&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, KSP adds only:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;com.google.devtools.ksp.gradle.KspTaskJvm&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We therefore reduce the number of tasks in the KSP variant:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Tasks executed (kapt variant)&lt;/th&gt;
&lt;th&gt;Tasks executed (KSP variant)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Clean Build&lt;/td&gt;
&lt;td&gt;310&lt;/td&gt;
&lt;td&gt;292&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Incremental Change&lt;/td&gt;
&lt;td&gt;164&lt;/td&gt;
&lt;td&gt;154&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Clean Builds&lt;/strong&gt;&lt;br&gt;
The build time, in seconds, for both variants was:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6mfzq7g0wthztpc497m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6mfzq7g0wthztpc497m.png" alt="Image description"&gt;&lt;/a&gt;&lt;br&gt;
We noticed that the configuration time took around 30% of the build time. Because the builds are executed on clean runners, we excluded the configuration time to reduce noise from the configuration phase. The results were:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2hk85lv5jkvn7tteo7f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2hk85lv5jkvn7tteo7f.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Still, this data included tasks unrelated to kapt/KSP. We picked the modules applying kapt/KSP and measured the processor task duration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the kapt variant, it is the sum of &lt;code&gt;KaptGenerateStubsTask&lt;/code&gt; + &lt;code&gt;KaptWithoutKotlincTask&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;For the KSP variant, it is the duration of &lt;code&gt;KspTaskJvm&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following view shows the median task duration, grouped by module, across the modules using the plugins under investigation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09hmpsk1gf3tzjvvq639.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09hmpsk1gf3tzjvvq639.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We noticed a general improvement in processor duration in every module, with the &lt;code&gt;app&lt;/code&gt; module showing a 60% decrease.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incremental Builds&lt;/strong&gt;&lt;br&gt;
Because an incremental change only affects a specific subtree of the task graph, fewer modules are involved than in the previous scenario.&lt;br&gt;
Configuration time is not a concern here because the builds are incremental. The results, after removing the warm-up builds:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff47l8u7w9p3pwt957il4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff47l8u7w9p3pwt957il4.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Analyzing the median duration of the processor execution by module:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna5g7jp0etzcoqvx6rte.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna5g7jp0etzcoqvx6rte.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All modules reduced their processing time when using KSP. We didn't see gains as large as in the clean-build scenario, but the wins were still significant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final words
&lt;/h2&gt;

&lt;p&gt;We showed that KSP reduces the processing build time in nowinandroid. The project is small, and the task durations in both variants are measured in seconds, far from the expensive kapt executions we are used to in real projects, where large modules can take minutes. This first version is just the beginning, as the release notes mention:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Dagger’s KSP processors are still in the alpha stage. So far we’ve focused mainly on trying to ensure correctness rather than optimize performance. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Exciting times and kudos to the Dagger/Hilt team.&lt;/p&gt;

&lt;p&gt;Happy building!&lt;/p&gt;

</description>
      <category>android</category>
      <category>kotlin</category>
      <category>gradle</category>
    </item>
  </channel>
</rss>
