
SoftwareDevs mvpfactory.io

Originally published at mvpfactory.io

Gradle Build Cache Poisoning in CI

---
title: "Detecting Gradle Build Cache Poisoning in CI Pipelines"
published: true
description: "Build a CI verification pipeline that catches corrupted or stale Gradle remote build cache entries before they silently break your Android/KMP builds."
tags: kotlin, android, devops, architecture
canonical_url: https://blog.mvp-factory.com/detecting-gradle-build-cache-poisoning-in-ci-pipelines
---

## What We Will Build

Let me show you a pattern I use in every project with a non-trivial module count: a three-stage CI pipeline that detects and evicts poisoned Gradle build cache entries before they propagate across your team. By the end, you'll have determinism checks, relocatability audits, and automated cache eviction wired into your CI.

## Prerequisites

- Gradle 8.0+ with a remote build cache enabled
- Gradle Enterprise (Develocity) 2023.1+ (for cache eviction API endpoints)
- KSP 1.9+ if you're running KSP symbol processors
- A CI environment where you can run duplicate builds (GitHub Actions, TeamCity, Jenkins — any will do)

## Step 1: Understand the Failure Modes

Gradle computes cache keys from task inputs — source files, compiler arguments, dependency versions, classpath snapshots. A cache "hit" means the key matched. Here's the gotcha that will save you hours: **a valid cache key does not guarantee valid output.**

Three things go wrong:

| Failure mode | Root cause | Symptom |
|---|---|---|
| Non-deterministic output | Compiler or processor output varies between runs (timestamps, ordering) | Intermittent test failures |
| KSP source leakage | Generated sources not fully captured in cache key inputs | Wrong generated code served |
| Relocatability violation | Absolute paths baked into task outputs | Works on CI, fails locally (or vice versa) |
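A minimal Kotlin sketch of that gotcha, assuming a toy model rather than Gradle's real key algorithm: the hypothetical `cacheKey` hashes only the declared inputs, while the hypothetical `fakeCompile` also embeds a build timestamp that nobody declared. Two runs agree on the key but not on the bytes, which is the first row of the table in miniature.

```kotlin
import java.security.MessageDigest

// Toy stand-in for a cache key: SHA-256 over the declared inputs only.
// (Gradle's real key also covers the task type, classpath, and more.)
fun cacheKey(declaredInputs: Map<String, String>): String {
    val digest = MessageDigest.getInstance("SHA-256")
    // Sort so that map iteration order cannot change the key
    for ((name, value) in declaredInputs.toSortedMap()) {
        digest.update("$name=$value\n".toByteArray(Charsets.UTF_8))
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

// Toy "compiler" whose output depends on something that is NOT a declared input
fun fakeCompile(source: String, buildTimestamp: Long): ByteArray =
    "compiled($source) @ $buildTimestamp".toByteArray(Charsets.UTF_8)

fun main() {
    val inputs = mapOf("source" to "class A", "jvmTarget" to "17")
    val sameKey = cacheKey(inputs) == cacheKey(inputs)
    val sameOutput = fakeCompile("class A", buildTimestamp = 1_000L)
        .contentEquals(fakeCompile("class A", buildTimestamp = 2_000L))
    println("same key: $sameKey")       // same key: true
    println("same output: $sameOutput") // same output: false
}
```

Whichever run uploads first "wins" the cache entry, and every later hit serves those bytes, even though a fresh compile would have produced different ones.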

## Step 2: Fix KSP Cache Key Gaps

The docs don't mention this, but KSP processors introduce implicit inputs that Gradle's cache key computation misses. Here's the problem:



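```kotlin
import com.google.devtools.ksp.processing.SymbolProcessor
import com.google.devtools.ksp.processing.SymbolProcessorEnvironment
import com.google.devtools.ksp.processing.SymbolProcessorProvider

class ApiClientProcessor : SymbolProcessorProvider {
    override fun create(environment: SymbolProcessorEnvironment): SymbolProcessor {
        // Processor option supplied by the build
        val version = environment.options["api.version"]
        // Generated code varies by version, but the cache key won't change
        // (ApiClientGenerator is your SymbolProcessor implementation, not shown)
        return ApiClientGenerator(version)
    }
}
```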


Here's the minimal setup to get this working — register the implicit input at the Gradle task level:



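```kotlin
import com.google.devtools.ksp.gradle.KspTask
import org.gradle.api.Plugin
import org.gradle.api.Project

abstract class KspRegistrationPlugin : Plugin<Project> {
    override fun apply(project: Project) {
        // Declare api.version as an input of every KSP task so it becomes part of the cache key
        // (receiver-style lambda assumes the plugin is compiled with the `kotlin-dsl` plugin)
        project.tasks.withType(KspTask::class.java).configureEach {
            inputs.property(
                "api.version",
                // orElse avoids an input validation failure when the property isn't set
                project.providers.gradleProperty("api.version").orElse("unset")
            )
        }
    }
}
```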


Now when `api.version` changes, Gradle computes a fresh cache key instead of serving stale output.
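To see why the registration matters, here is a toy model (hypothetical `keyOf` and `kspTaskInputs`, not Gradle internals) in which the key is a hash of whatever the task declares. Bumping `api.version` only changes the key once it is part of the declared inputs, which is what the `inputs.property` call in the plugin achieves.

```kotlin
import java.security.MessageDigest

// Toy cache key: SHA-256 of the declared inputs, sorted by name for stability
fun keyOf(declared: Map<String, String>): String =
    MessageDigest.getInstance("SHA-256")
        .digest(declared.toSortedMap().entries.joinToString("\n").toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

// Declared inputs of a KSP-like task; registerApiVersion mirrors the plugin's inputs.property call
fun kspTaskInputs(sources: String, apiVersion: String, registerApiVersion: Boolean): Map<String, String> =
    buildMap {
        put("sources", sources)
        if (registerApiVersion) put("api.version", apiVersion)
    }

fun main() {
    // Not registered: v1 and v2 share a key, so v2 builds get v1's generated code
    println(keyOf(kspTaskInputs("Api.kt", "v1", false)) == keyOf(kspTaskInputs("Api.kt", "v2", false))) // true
    // Registered: the keys differ, so v2 gets its own cache entry
    println(keyOf(kspTaskInputs("Api.kt", "v1", true)) == keyOf(kspTaskInputs("Api.kt", "v2", true)))   // false
}
```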

## Step 3: Add a Determinism Check

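Run the same build twice from scratch, with the build cache disabled so every task actually executes, and diff the outputs. If the cache is enabled, the second run simply restores the first run's outputs and the diff proves nothing: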



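```bash
# Force every task to execute (nothing restored from the cache), then record class hashes
./gradlew clean assembleRelease --no-build-cache --rerun-tasks
find . -path '*/build/*' -name '*.class' -exec md5sum {} \; | sort -k 2 > build_a.manifest

# Repeat the identical build from scratch
./gradlew clean assembleRelease --no-build-cache --rerun-tasks
find . -path '*/build/*' -name '*.class' -exec md5sum {} \; | sort -k 2 > build_b.manifest

# Any output here (and a non-zero exit code) points to non-deterministic task outputs
diff build_a.manifest build_b.manifest
```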


Any diff means non-determinism — a direct cache poisoning vector. Run this weekly on CI. It costs about 15 minutes. That's cheap insurance.
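A plain `diff` tells you *that* something changed; when it fires, a per-file report is easier to triage. A small sketch, assuming the manifests were produced by GNU `md5sum` as above (`parseManifest` and `nonDeterministicFiles` are hypothetical helper names):

```kotlin
import java.io.File
import kotlin.system.exitProcess

// Parses md5sum lines ("<hash>  <path>" or "<hash> *<path>") into a path -> hash map
fun parseManifest(lines: List<String>): Map<String, String> =
    lines.filter { it.isNotBlank() }.associate { line ->
        val hash = line.substringBefore(' ')
        val path = line.substringAfter(' ').removePrefix(" ").removePrefix("*")
        path to hash
    }

// Paths whose hash differs between builds, or that exist in only one of them
fun nonDeterministicFiles(a: Map<String, String>, b: Map<String, String>): List<String> =
    (a.keys + b.keys).filter { a[it] != b[it] }.sorted()

fun main(args: Array<String>) {
    val diffs = nonDeterministicFiles(
        parseManifest(File(args[0]).readLines()),
        parseManifest(File(args[1]).readLines()),
    )
    diffs.forEach { println("NON-DETERMINISTIC: $it") }
    if (diffs.isNotEmpty()) exitProcess(1)
}
```

Pass `build_a.manifest` and `build_b.manifest` as arguments. Like `diff`, it exits with status 1 on any mismatch, so it can gate the same CI step while naming the offending modules.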

## Step 4: Audit Relocatability via Develocity API

Query your build scans programmatically to catch absolute path leakage:



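```kotlin
import java.time.Duration
import java.time.Instant

// `develocityApi` and `logger` come from your own Develocity API client setup (not shown)
val response = develocityApi.getBuilds(
    query = "tag:ci AND buildCacheWarning:relocatability",
    since = Instant.now().minus(Duration.ofHours(24))
)
response.builds.forEach { build ->
    val violations = develocityApi.getBuildCachePerformance(build.id)
        .taskExecutions
        .filter { it.cachingDisabledReasonCategory == "NON_CACHEABLE" }
    // Only report builds that actually have offending tasks
    if (violations.isNotEmpty()) {
        logger.warn("Build ${build.id}: relocatability violations in ${violations.map { it.taskPath }}")
    }
}
```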
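As a second signal that doesn't depend on build scan queries, you can search task outputs for the absolute checkout path directly. This is a sketch under stated assumptions: the path is ASCII, and you point it at unpacked outputs such as `build/classes` or generated-source directories, because compressed `.jar`/`.aar` entries are not inspected. `filesEmbeddingPath` is a hypothetical helper.

```kotlin
import java.io.File
import kotlin.system.exitProcess

// Lists files under outputsRoot whose bytes contain absolutePath. A hit suggests a task
// baked a machine-specific path into its output, so the cached entry is not relocatable.
fun filesEmbeddingPath(outputsRoot: File, absolutePath: String): List<File> =
    outputsRoot.walkTopDown()
        .filter { it.isFile }
        // ISO-8859-1 maps each byte to one char, so binary .class files can be searched too
        .filter { String(it.readBytes(), Charsets.ISO_8859_1).contains(absolutePath) }
        .toList()

fun main(args: Array<String>) {
    // args[0]: an unpacked output directory; args[1]: the checkout path to look for
    val hits = filesEmbeddingPath(File(args[0]), args[1])
    hits.forEach { println("ABSOLUTE PATH in ${it.path}") }
    if (hits.isNotEmpty()) exitProcess(1)
}
```

In CI, `args[1]` would typically be the workspace directory of the runner, since that is the path most likely to leak into outputs.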


## Step 5: Automate Eviction

When a poisoned entry is detected, evict it immediately:



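```bash
# --fail makes curl exit non-zero on an HTTP error, so a failed eviction fails the CI step
curl --fail -X DELETE \
  "https://ge.yourcompany.com/api/build-cache/entries/${CACHE_KEY}" \
  -H "Authorization: Bearer ${GE_TOKEN}"
```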


Don't wait for developers to report "weird build issues." Automate detection and purge within minutes instead of days.
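If detection runs in a Kotlin-based CI tool rather than a shell step, the same eviction can be issued with the JDK's `java.net.http` client (Java 11+). This sketch reuses the endpoint path from the `curl` example (confirm it against your server's API documentation); `evictionRequest` and `evictAll` are hypothetical names, and the hex check is an assumption about your key format that also keeps arbitrary input out of the URL path.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import kotlin.system.exitProcess

private val HEX_KEY = Regex("^[0-9a-fA-F]+$")

// Builds the same DELETE request as the curl call, for one cache key
fun evictionRequest(baseUrl: String, cacheKey: String, token: String): HttpRequest {
    require(HEX_KEY.matches(cacheKey)) { "Not a hex cache key: $cacheKey" }
    return HttpRequest.newBuilder(URI.create("${baseUrl.trimEnd('/')}/api/build-cache/entries/$cacheKey"))
        .header("Authorization", "Bearer $token")
        .DELETE()
        .build()
}

// Sends one DELETE per key and returns the keys the server did not accept (non-2xx)
fun evictAll(baseUrl: String, keys: List<String>, token: String): List<String> {
    val client = HttpClient.newHttpClient()
    return keys.filter { key ->
        val response = client.send(evictionRequest(baseUrl, key, token), HttpResponse.BodyHandlers.discarding())
        response.statusCode() !in 200..299
    }
}

fun main(args: Array<String>) {
    val token = System.getenv("GE_TOKEN") ?: error("GE_TOKEN is not set")
    val failed = evictAll("https://ge.yourcompany.com", args.toList(), token)
    if (failed.isNotEmpty()) {
        System.err.println("Eviction failed for: $failed")
        exitProcess(1)
    }
}
```

Returning the failed keys instead of throwing on the first error lets one run evict every poisoned entry it can and still fail the job for the rest.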

## Results

After deploying this pipeline on a KMP project (54 modules, ~180K LOC, 12-person Android/backend team, 8 CI runners on Linux), we measured over 90 days:

| Metric | Before | After |
|---|---|---|
| Silent miscompilation incidents/month | 3–5 | 0 |
| Cache hit rate | 78% | 72%* |
| Mean time to detect cache issue | 2.3 days | 14 minutes |
| Developer hours lost to "works on my machine" | ~40/month | ~2/month |

\* Hit rate dropped because we now correctly invalidate entries that were previously false positives. Those "hits" were lies. Gradle's own case studies report 70–85% for comparable module counts, so 72% is healthy. And roughly 38 engineering hours per month recovered? That's a trade I'd make every time.

## Gotchas

- **Every implicit input must be declared.** Every file read, environment variable, or classpath resource your KSP/KAPT processor touches needs a formal declaration: `inputs.property()` for values, `inputs.file()`/`inputs.files()` for files, or the matching `@Input`/`@InputFile` annotation on a custom task. Skip this and your cache key is incomplete. You will get burned.
- **Task input snapshotting changed between Gradle 7.x and 8.x.** Relocatability checks behave differently. Make sure you're on Gradle 8.0+ before relying on the patterns above.
- **The Develocity eviction API requires Enterprise 2023.1+.** Older versions don't expose cache entry deletion endpoints.
- **A hit rate drop is not a regression.** If your rate drops after deploying verification, you were previously serving false positives. That's the pipeline working as intended.

## Wrapping Up

Cache poisoning scales with module count and annotation processor complexity. The verification pipeline itself isn't complicated — the value is in running it automatically and acting on results without waiting for a human to notice something feels off. Audit your processors, run determinism checks weekly, and query your build scans programmatically. Your team will stop chasing phantom build failures.
