<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SoftwareDevs mvpfactory.io</title>
    <description>The latest articles on DEV Community by SoftwareDevs mvpfactory.io (@software_mvp-factory).</description>
    <link>https://dev.to/software_mvp-factory</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3790305%2F141f30ba-972f-4b17-9b03-c77343f2747d.png</url>
      <title>DEV Community: SoftwareDevs mvpfactory.io</title>
      <link>https://dev.to/software_mvp-factory</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/software_mvp-factory"/>
    <language>en</language>
    <item>
      <title>Adaptive Bitrate Log Streaming in CI/CD</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Mon, 08 Jun 2026 07:29:17 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/adaptive-bitrate-log-streaming-in-cicd-2l1m</link>
      <guid>https://dev.to/software_mvp-factory/adaptive-bitrate-log-streaming-in-cicd-2l1m</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Adaptive&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Bitrate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Log&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Streaming&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CI/CD&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Pipelines"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;architect&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;real-time&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CI/CD&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;log&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;streaming&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;handles&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10GB&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;builds&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;without&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;OOM&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;crashes&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;chunked&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;encoding,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SSE,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;backpressure,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;virtual&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;scrolling&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;explained."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;devops, architecture, cloud, performance&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/adaptive-bitrate-log-streaming-cicd&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We're Building&lt;/span&gt;

Let me show you a pattern I use in every project that involves real-time build output: an adaptive log streaming architecture that browses 10GB CI/CD build logs without your viewer ever exceeding 150MB of memory. We'll wire up Server-Sent Events, a ring buffer for bounded server memory, backpressure signaling, and client-side virtual scrolling with anchor-based seek.

By the end, you'll understand exactly why "just show me the logs" is a distributed systems problem — and how to solve it.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Familiarity with HTTP streaming (chunked transfer encoding basics)
&lt;span class="p"&gt;-&lt;/span&gt; A server runtime (Kotlin/JVM examples here, but the patterns are portable)
&lt;span class="p"&gt;-&lt;/span&gt; A frontend that renders log output (TypeScript examples for the client)

&lt;span class="gu"&gt;## Step 1: Choose Your Delivery Protocol&lt;/span&gt;

Here's the gotcha that will save you hours: most teams jump straight to WebSockets without considering the operational cost. For log streaming — a unidirectional, append-only data flow — SSE is the correct default.

| Criteria | SSE | WebSocket |
|---|---|---|
| Auto-reconnect | Yes (built-in) | Manual |
| HTTP/2 multiplexing | Yes | No (upgrade required) |
| Load balancer support | Excellent | Requires sticky sessions |
| Memory overhead per conn | ~4KB | ~8KB |

You get automatic reconnection with &lt;span class="sb"&gt;`Last-Event-ID`&lt;/span&gt;, native HTTP/2 multiplexing, and zero load balancer headaches. Only introduce WebSockets if you have a proven bidirectional requirement.

&lt;span class="gu"&gt;## Step 2: Server-Side Ring Buffer&lt;/span&gt;

The docs don't mention this, but a typical Kotlin multiplatform CI build generates 50,000–200,000 lines of output. Monorepo Gradle builds push past 500,000 lines. Your log ingestion server should never hold the full log in memory. Here's the minimal setup to get this working:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
class LogRingBuffer(private val capacity: Int = 50_000) {&lt;br&gt;
    private val buffer = ArrayDeque(capacity)&lt;br&gt;
    private var globalOffset: Long = 0&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Synchronized
fun append(line: LogLine) {
    if (buffer.size &amp;gt;= capacity) {
        buffer.removeFirst()
        globalOffset++
    }
    buffer.addLast(line)
}

fun slice(from: Long, count: Int): List&amp;lt;LogLine&amp;gt; {
    val start = (from - globalOffset).coerceAtLeast(0).toInt()
    return buffer.drop(start).take(count)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
The ring buffer keeps the last N lines in memory while the full log streams to object storage (S3, GCS) in compressed chunks. Clients connecting mid-build get the tail from the ring buffer and seek historically via range requests against stored chunks — similar to how adaptive bitrate video streaming works with segment-based seeking.

## Step 3: Backpressure Signaling

This is where most implementations quietly fall apart. Without backpressure, a fast build and a slow client become a cascading failure. The protocol:

1. Server tracks per-client send queue depth. If the queue exceeds 1,000 unsent lines, switch that client to **summary mode** — one compacted message per N lines.
2. Client signals readiness via SSE reconnection with a `Last-Event-ID` encoding the last consumed offset.
3. When the client falls behind, show a "streaming paused, click to catch up" indicator. Honesty beats invisible data loss.

This is the same pattern used in reactive streams (`Publisher`/`Subscriber` with demand signaling), adapted for HTTP.

## Step 4: Client-Side Virtual Scrolling

Rendering 500K DOM nodes is not viable. Virtual scrolling renders only the visible viewport (50–100 lines) plus a small overscan buffer:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
typescript&lt;br&gt;
interface LogViewport {&lt;br&gt;
  visibleStart: number;&lt;br&gt;
  visibleEnd: number;&lt;br&gt;
  anchorOffset: number | null; // jump target for failure line&lt;br&gt;
  totalLines: number;&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
The key addition: **anchor-based seek**. When a build fails, the server sends a failure-line offset and the client jumps directly to it. Client memory stays under 150MB regardless of total log size.

## Gotchas

- **Don't skip backpressure.** Without it, a slow consumer forces the server to buffer indefinitely, eventually OOM-killing the process. Budget 50K lines in the ring buffer and stream the rest to storage.
- **GitHub Actions truncates at ~500K lines** and renders the full DOM with no virtual scrolling. GitLab archives beyond ~1M lines. Buildkite handles ~10M lines with adaptive batching and virtual scrolling — their approach most closely mirrors this architecture.
- **Chunked Transfer Encoding lacks auto-reconnect.** SSE gives you that for free. Only use raw chunked encoding if you're building a download-style flow with no reconnection needs.
- **HTTP/2 multiplexing breaks with WebSockets.** Each WebSocket connection requires an upgrade, bypassing multiplexing entirely. With SSE, hundreds of log streams share a single TCP connection.

## Wrapping Up

Default to SSE. Implement explicit backpressure. Use virtual scrolling with anchor-based seek. These three decisions turn a "just show me the logs" feature from a distributed systems landmine into a reliable, memory-bounded streaming architecture. The sooner you treat log viewing as a backpressure problem rather than a rendering problem, the less time you'll spend debugging your debugging tools.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Structured Concurrency in Ktor 3 with Kotlin Coroutines</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Fri, 05 Jun 2026 13:42:56 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/structured-concurrency-in-ktor-3-with-kotlin-coroutines-2d7e</link>
      <guid>https://dev.to/software_mvp-factory/structured-concurrency-in-ktor-3-with-kotlin-coroutines-2d7e</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Structured&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Concurrency&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Ktor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;3:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Failure&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Isolation&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Done&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Right"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;resilient&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Ktor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coroutine&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hierarchy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SupervisorJob,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;scoped&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;background&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;jobs,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;failure&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;isolation&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;prevents&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;one&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;slow&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;upstream&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cascading&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;across&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;your&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;service."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, architecture, api, backend&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/structured-concurrency-in-ktor-3-failure-isolation-done-right&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

In this workshop we will wire up a &lt;span class="gs"&gt;**coroutine supervision tree**&lt;/span&gt; for a Ktor 3 service that handles parallel upstream calls per-request, runs background jobs that outlive requests but respect SIGTERM, and isolates third-party SDK failures behind blast-radius boundaries. By the end you will have the exact &lt;span class="sb"&gt;`SupervisorJob`&lt;/span&gt; + &lt;span class="sb"&gt;`CoroutineExceptionHandler`&lt;/span&gt; hierarchy you can drop into production.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Kotlin 1.9+ and Ktor 3.x on your classpath
&lt;span class="p"&gt;-&lt;/span&gt; Familiarity with &lt;span class="sb"&gt;`async`&lt;/span&gt;/&lt;span class="sb"&gt;`launch`&lt;/span&gt; and &lt;span class="sb"&gt;`coroutineScope`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Micrometer on your dependency list for the metrics section

&lt;span class="gu"&gt;## Step 1: Stop Default Scoping From Killing Siblings&lt;/span&gt;

Here is the minimal setup to get this working. When you fan out parallel calls inside a route handler, the default &lt;span class="sb"&gt;`coroutineScope`&lt;/span&gt; uses a regular &lt;span class="sb"&gt;`Job`&lt;/span&gt;. One timeout cancels everything — database, cache, the lot.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// DANGEROUS: one failure cancels everything&lt;br&gt;
get("/dashboard") {&lt;br&gt;
    coroutineScope {&lt;br&gt;
        val user = async { userService.fetch(id) }       // DB call&lt;br&gt;
        val prefs = async { cacheService.getPrefs(id) }  // Redis&lt;br&gt;
        val recs = async { recoApi.fetch(id) }            // External API, slow&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    respond(DashboardResponse(user.await(), prefs.await(), recs.await()))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
If `recoApi.fetch()` throws a `TimeoutCancellationException`, both `user` and `prefs` are cancelled. At 2,000 req/s, one flaky upstream turns your p99 latency into a p50 error rate.

**The fix:** replace `coroutineScope` with `supervisorScope`. Child failures stop propagating sideways:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
get("/dashboard") {&lt;br&gt;
    supervisorScope {&lt;br&gt;
        val user = async { userService.fetch(id) }&lt;br&gt;
        val prefs = async { cacheService.getPrefs(id) }&lt;br&gt;
        val recs = async {&lt;br&gt;
            withTimeout(500.milliseconds) { recoApi.fetch(id) }&lt;br&gt;
        }&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    val recsResult = runCatching { recs.await() }.getOrDefault(emptyList())
    respond(DashboardResponse(user.await(), prefs.await(), recsResult))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Now a recommendation API timeout returns a degraded response instead of a 500. The critical path completes independently.

| Strategy | Child failure behavior | Use case |
|---|---|---|
| `coroutineScope` (regular `Job`) | Cancels all siblings | All-or-nothing transactions |
| `supervisorScope` (`SupervisorJob`) | Siblings continue | Parallel independent fetches |
| Custom `SupervisorJob` + `CoroutineExceptionHandler` | Siblings continue, errors logged | Background job pools |

## Step 2: Background Jobs That Survive Requests

Webhook retries and cache warming should outlive the request but respect graceful shutdown. Let me show you a pattern I use in every project — an application-scoped supervisor tied to Ktor's lifecycle:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
fun Application.configureBackgroundJobs() {&lt;br&gt;
    val handler = CoroutineExceptionHandler { _, throwable -&amp;gt;&lt;br&gt;
        log.error("Background job failed", throwable)&lt;br&gt;
        meterRegistry.counter("bg.job.failure", "type", throwable.javaClass.simpleName).increment()&lt;br&gt;
    }&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val bgScope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)

environment.monitor.subscribe(ApplicationStopping) {
    bgScope.cancel()
    runBlocking { bgScope.coroutineContext.job.children.forEach { it.join() } }
}

routing {
    post("/webhook") {
        val payload = call.receive&amp;lt;WebhookPayload&amp;gt;()
        call.respond(HttpStatusCode.Accepted)
        bgScope.launch {
            retryWithBackoff(maxAttempts = 3) { webhookProcessor.deliver(payload) }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
The `SupervisorJob` means one failing delivery does not cancel other in-flight jobs. The shutdown hook ensures all active jobs complete during SIGTERM. No orphaned coroutines, no lost deliveries.

## Step 3: Isolate Rogue SDK Coroutines

The docs do not mention this, but third-party SDKs that launch coroutines into your scope are the ones that get you at 3 AM. One unhandled exception propagates up and cancels your application scope.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
val sdkScope = CoroutineScope(&lt;br&gt;
    SupervisorJob() + Dispatchers.IO + CoroutineExceptionHandler { _, ex -&amp;gt;&lt;br&gt;
        log.warn("SDK failure isolated", ex)&lt;br&gt;
        meterRegistry.counter("sdk.failure.isolated").increment()&lt;br&gt;
    }&lt;br&gt;
)&lt;/p&gt;

&lt;p&gt;suspend fun safeSdkCall(): SdkResult = withContext(sdkScope.coroutineContext) {&lt;br&gt;
    withTimeout(2.seconds) { thirdPartySdk.riskyOperation() }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
This creates a blast radius boundary. The SDK throws whatever it wants — your request pipeline and background jobs are untouched.

## Step 4: Wire Micrometer Into Job States

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
fun CoroutineScope.launchTracked(&lt;br&gt;
    name: String, registry: MeterRegistry,&lt;br&gt;
    block: suspend CoroutineScope.() -&amp;gt; Unit&lt;br&gt;
): Job {&lt;br&gt;
    registry.gauge("jobs.active", Tags.of("name", name), this) {&lt;br&gt;
        coroutineContext.job.children.count().toDouble()&lt;br&gt;
    }&lt;br&gt;
    return launch {&lt;br&gt;
        registry.timer("job.duration", "name", name).recordSuspend { block() }&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
This gives you Grafana dashboards with active job counts, duration percentiles, and failure rates by job type.

## The Full Scope Hierarchy

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
plaintext&lt;br&gt;
Application (SupervisorJob + CEH → logs &amp;amp; metrics)&lt;br&gt;
├── RequestScope (supervisorScope per-request)&lt;br&gt;
│   ├── async { dbCall }&lt;br&gt;
│   ├── async { cacheCall }&lt;br&gt;
│   └── async { apiCall }  ← timeout doesn't kill siblings&lt;br&gt;
├── BackgroundJobScope (SupervisorJob + CEH)&lt;br&gt;
│   ├── launch { webhookRetry }  ← failure isolated&lt;br&gt;
│   └── launch { cacheWarming }&lt;br&gt;
└── SdkIsolationScope (SupervisorJob + CEH)&lt;br&gt;
    └── thirdPartySdk calls  ← blast radius contained&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
## Gotchas

- **`supervisorScope` does not swallow exceptions.** You still must `runCatching` on each `await()` — the supervisor only prevents *sibling cancellation*, not propagation to the caller.
- **Never use `GlobalScope` for background work.** You lose all lifecycle control. Tie scopes to `ApplicationStopping` so in-flight jobs complete during graceful shutdown.
- **`withContext(sdkScope.coroutineContext)` inherits the scope's job.** This is how you get isolation. A bare `withContext(Dispatchers.IO)` without the dedicated scope's `SupervisorJob` will not protect you.
- **Gauge registration is not idempotent in all Micrometer registries.** Register `jobs.active` gauges once at startup, not per-request.

## Wrapping Up

Here is the gotcha that will save you hours: the difference between a resilient Ktor service and a 3 AM page is three scopes with `SupervisorJob`, a `CoroutineExceptionHandler` on each, and Micrometer wired into job states. Default `coroutineScope` cascades single failures into full request failures. Dedicated background and SDK isolation scopes contain the blast radius. Get this hierarchy right once and your on-call rotation gets a lot quieter.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>PostgreSQL Partial Replication with Logical Decoding</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Fri, 05 Jun 2026 07:36:31 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/postgresql-partial-replication-with-logical-decoding-3kph</link>
      <guid>https://dev.to/software_mvp-factory/postgresql-partial-replication-with-logical-decoding-3kph</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PostgreSQL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Partial&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Replication:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Skip&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CDC&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Pipeline"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Learn&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;how&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PostgreSQL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;15+&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;logical&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;replication&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;row&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;filters&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;column&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;lists&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;can&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Debezium/Kafka&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CDC&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;selective&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;microservice&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;streaming."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresql, architecture, devops, api&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/postgresql-partial-replication-skip-the-cdc-pipeline&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

In this workshop, I will walk you through setting up PostgreSQL 15+ logical replication with row filters and column lists so each downstream microservice receives only the data it actually needs. No Debezium. No Kafka Connect. No schema registry. Just PostgreSQL infrastructure you already run.

By the end, you will have per-service publications with filtered replication, proper WAL bloat prevention, and a monitoring query you can drop straight into your observability stack.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; PostgreSQL 15 or later (row filters and column lists require PG15+)
&lt;span class="p"&gt;-&lt;/span&gt; At least two PostgreSQL instances (one primary, one subscriber)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`wal_level = logical`&lt;/span&gt; set on the primary
&lt;span class="p"&gt;-&lt;/span&gt; Basic familiarity with SQL DDL and replication concepts

&lt;span class="gu"&gt;## Step 1: Understand Why This Exists&lt;/span&gt;

Most teams reach for Debezium + Kafka the moment two services need shared data. That stack costs a lot to operate: ZooKeeper or KRaft clusters, Connect workers, schema registries, offset management, connector configs that silently break on DDL changes.

If your architecture is 3–8 services that each need a materialized read model from a shared PostgreSQL primary, built-in logical replication may be everything you need.

&lt;span class="gu"&gt;## Step 2: Create Filtered Publications&lt;/span&gt;

Here is the minimal setup to get this working. PostgreSQL's &lt;span class="sb"&gt;`CREATE PUBLICATION`&lt;/span&gt; now supports &lt;span class="sb"&gt;`WHERE`&lt;/span&gt; clauses and column lists directly:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
-- Service: order-fulfillment&lt;br&gt;
-- Only replicate orders that are paid and awaiting shipment&lt;br&gt;
CREATE PUBLICATION pub_fulfillment&lt;br&gt;
  FOR TABLE orders (id, customer_id, status, shipping_address, created_at)&lt;br&gt;
  WHERE (status IN ('paid', 'processing'));&lt;/p&gt;

&lt;p&gt;-- Service: analytics-ingest&lt;br&gt;
-- Replicate all orders but only the columns needed for reporting&lt;br&gt;
CREATE PUBLICATION pub_analytics&lt;br&gt;
  FOR TABLE orders (id, total_cents, currency, region, created_at),&lt;br&gt;
        TABLE line_items (id, order_id, sku, quantity, unit_price_cents);&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Each subscriber connects to its own publication and receives only the filtered subset. No application-level filtering, no wasted bandwidth. The important detail: row filtering happens on the publisher side. The primary does the work, but avoids serializing and transmitting data the subscriber would discard.

## Step 3: Know the Trade-offs

| Dimension | PG Logical Replication | Debezium + Kafka |
|---|---|---|
| Additional infrastructure | None | Kafka cluster, Connect workers, schema registry |
| Row-level filtering | Native (`WHERE`) | SMT or consumer-side |
| Column filtering | Native (column lists) | SMT `ReplaceField` or downstream |
| Throughput ceiling | ~5K–15K TPS per slot | Horizontally scalable via partitions |
| Schema evolution | Manual `ALTER SUBSCRIPTION REFRESH` | Schema registry handles most cases |
| Fan-out to non-PG consumers | Not supported | Any Kafka consumer |

If your write throughput is under 10K TPS and all consumers are PostgreSQL databases, native replication wins on operational simplicity. It is not close.

## Step 4: Prevent WAL Bloat on Day One

Let me show you a pattern I use in every project. Every replication slot tells PostgreSQL: "Do not recycle WAL segments past this point." If a subscriber goes down, WAL accumulates on the primary. This is the single most common way logical replication causes outages.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';&lt;br&gt;
SELECT pg_reload_conf();&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Then monitor continuously:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
SELECT&lt;br&gt;
  slot_name,&lt;br&gt;
  active,&lt;br&gt;
  pg_size_pretty(&lt;br&gt;
    pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)&lt;br&gt;
  ) AS retained_wal,&lt;br&gt;
  pg_size_pretty(&lt;br&gt;
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)&lt;br&gt;
  ) AS total_retained&lt;br&gt;
FROM pg_replication_slots&lt;br&gt;
WHERE slot_type = 'logical';&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Set alerts on `retained_wal` exceeding 20% of your available disk. When `max_slot_wal_keep_size` is breached, PostgreSQL invalidates the slot. The subscriber must be re-initialized, but your primary survives.

## Gotchas

- **Unmonitored slots will take down your primary.** An inactive replication slot is a disk-full outage waiting to happen. Drop inactive slots immediately when decommissioning a service: `SELECT pg_drop_replication_slot('slot_name');`
- **Name slots descriptively.** Use `sub_fulfillment_v1`, not `sub1`. Future you will thank present you.
- **DDL changes require manual intervention.** Schema evolution means running `ALTER SUBSCRIPTION REFRESH PUBLICATION` on the subscriber. The docs do not mention this prominently, but skipping it silently breaks replication.
- **Logical slots do not replicate to standbys.** If you use physical replication for HA, test failover carefully. PG17 introduces `sync_replication_slots`, but on earlier versions you must recreate slots after promotion.
- **Know when to graduate.** Move to Debezium/Kafka when you need non-PostgreSQL consumers (Elasticsearch, Redis, data lakes), write throughput exceeds what a single slot handles, or you need exactly-once delivery semantics beyond `pg_replication_origin`.

## Wrapping Up

Design your publications around domain boundaries. When you eventually need Debezium, the table-to-service mapping already exists — your publication definitions become your CDC connector config blueprint. Start with what PostgreSQL gives you. You probably already have the infrastructure.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Systematic ANR Diagnosis in Jetpack Compose Apps</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Thu, 04 Jun 2026 14:38:21 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/systematic-anr-diagnosis-in-jetpack-compose-apps-59j0</link>
      <guid>https://dev.to/software_mvp-factory/systematic-anr-diagnosis-in-jetpack-compose-apps-59j0</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Diagnose&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Hidden&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ANR&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Patterns&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Jetpack&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Compose&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Apps"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Learn&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;how&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Dispatchers.Main.immediate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;synchronized&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Room&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;DAO&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;callbacks&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;create&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ANR-risk&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;blocking&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;StrictMode&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;misses.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Perfetto&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;traces&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CI&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;gates&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;catch&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;them."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;android, kotlin, architecture, performance&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/diagnose-hidden-anr-patterns-jetpack-compose&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

In this workshop, I will walk you through a systematic workflow for diagnosing ANR patterns that StrictMode completely misses. By the end, you will have:
&lt;span class="p"&gt;
-&lt;/span&gt; A clear mental model of how &lt;span class="sb"&gt;`Dispatchers.Main.immediate`&lt;/span&gt; plus &lt;span class="sb"&gt;`synchronized`&lt;/span&gt; Room DAO callbacks silently block the main thread
&lt;span class="p"&gt;-&lt;/span&gt; A working Perfetto SQL query to surface lock contention across threads
&lt;span class="p"&gt;-&lt;/span&gt; A three-layer CI gate architecture (lint, macro-benchmark, trace analysis) that catches these patterns before they ship

Let me show you a pattern I use in every project — and the one that kept slipping through our tooling for months.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Android Studio Hedgehog or later
&lt;span class="p"&gt;-&lt;/span&gt; A Jetpack Compose app with Room database access
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`adb`&lt;/span&gt; and Perfetto CLI available on your PATH
&lt;span class="p"&gt;-&lt;/span&gt; Familiarity with Kotlin coroutines and dispatchers

&lt;span class="gu"&gt;## Step 1: Understand the Gap StrictMode Leaves Open&lt;/span&gt;

StrictMode catches disk reads, network calls, and untagged sockets on the main thread. Here is the gotcha that will save you hours: &lt;span class="gs"&gt;**it does not instrument lock acquisition**&lt;/span&gt;. If your main thread calls a &lt;span class="sb"&gt;`suspend`&lt;/span&gt; function that internally acquires a &lt;span class="sb"&gt;`synchronized`&lt;/span&gt; lock held by a Room write transaction on &lt;span class="sb"&gt;`Dispatchers.IO`&lt;/span&gt;, StrictMode reports nothing. The main thread is technically not doing I/O — it is waiting on a monitor.

In production Compose-heavy apps that have already eliminated obvious StrictMode violations, this pattern accounts for roughly 15–30% of ANR clusters.

&lt;span class="gu"&gt;### The Invisible Call Chain&lt;/span&gt;

Here is the typical sequence:
&lt;span class="p"&gt;
1.&lt;/span&gt; A &lt;span class="sb"&gt;`LaunchedEffect`&lt;/span&gt; calls a repository method on &lt;span class="sb"&gt;`Dispatchers.Main.immediate`&lt;/span&gt;
&lt;span class="p"&gt;2.&lt;/span&gt; The repository calls a Room DAO method annotated with &lt;span class="sb"&gt;`@Transaction`&lt;/span&gt;
&lt;span class="p"&gt;3.&lt;/span&gt; Room's generated code acquires a &lt;span class="sb"&gt;`synchronized`&lt;/span&gt; lock on the &lt;span class="sb"&gt;`RoomDatabase`&lt;/span&gt; instance
&lt;span class="p"&gt;4.&lt;/span&gt; A background &lt;span class="sb"&gt;`Dispatchers.IO`&lt;/span&gt; coroutine is already holding that lock (bulk insert, migration, or WAL checkpoint)
&lt;span class="p"&gt;5.&lt;/span&gt; The main thread blocks on monitor entry. Zero StrictMode output.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// Looks safe. It is not.&lt;br&gt;
@Composable&lt;br&gt;
fun DashboardScreen(viewModel: DashboardViewModel) {&lt;br&gt;
    LaunchedEffect(Unit) {&lt;br&gt;
        // Dispatchers.Main.immediate by default&lt;br&gt;
        viewModel.refreshStats() // suspend, calls Room @Transaction&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
The `suspend` keyword lulls teams into thinking this is non-blocking. But Room's internal `synchronized` block does not suspend. It blocks the calling thread.

## Step 2: Capture and Analyze with Perfetto

Perfetto captures thread state transitions that Systrace and StrictMode cannot. Here is the step-by-step workflow:

| Step | Tool | What You Find |
|------|------|---------------|
| 1. Capture trace | `adb shell perfetto` with `sched` + `lock_contention` data sources | Raw thread scheduling data |
| 2. Find ANR window | Perfetto UI, search for `SIG_ANR` or `Input dispatching timed out` | Exact timestamp of ANR trigger |
| 3. Inspect main thread | Slice track, look for `monitor contention` slices | Lock address + blocked duration |
| 4. Cross-reference holder | Filter by lock address across all thread tracks | Background thread holding the lock |
| 5. Read holder stack | Holder thread's slice track at same timestamp | Exact call chain (e.g., Room `beginTransaction`) |

### The Perfetto Query That Matters

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
SELECT ts, dur, thread.name, args.display_value&lt;br&gt;
FROM slice&lt;br&gt;
JOIN thread_track ON slice.track_id = thread_track.id&lt;br&gt;
JOIN thread USING (utid)&lt;br&gt;
WHERE slice.name LIKE '%monitor contention%'&lt;br&gt;
  AND thread.name = 'main'&lt;br&gt;
  AND dur &amp;gt; 100000000  -- &amp;gt;100ms, ANR-risk threshold&lt;br&gt;
ORDER BY dur DESC&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
This query surfaces every main-thread lock wait exceeding 100ms. In one production audit, I found 11 distinct lock-contention sites that had passed StrictMode checks for months. Eleven. All invisible to the existing tooling.

## Step 3: Build a CI Gate for ANR-Risk Chains

Waiting for production ANRs is expensive and demoralizing. Here is the minimal setup to get this working as a CI gate.

### Static Analysis with Custom Lint Rules

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// Custom Lint detector: flag @Transaction calls reachable from Main dispatcher&lt;br&gt;
class MainThreadTransactionDetector : Detector(), SourceCodeScanner {&lt;br&gt;
    override fun getApplicableMethodNames() = listOf("withTransaction")&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;override fun visitMethodCall(
    context: JavaContext,
    node: UCallExpression,
    method: PsiMethod
) {
    if (isReachableFromMainDispatcher(context, node)) {
        context.report(
            ANR_RISK_ISSUE, node, context.getLocation(node),
            "Room @Transaction reachable from Dispatchers.Main"
        )
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
### The Three-Layer Pipeline

| Stage | Check | Threshold |
|-------|-------|-----------|
| Lint | Custom `MainThreadTransactionDetector` | 0 warnings |
| Instrumented test | Macro-benchmark with Perfetto trace capture | Main-thread lock wait &amp;lt; 50ms |
| Trace analysis | Automated Perfetto SQL query on CI traces | 0 slices &amp;gt; 100ms |

The macro-benchmark stage matters most. Run realistic user flows (app cold start, navigation between Compose screens, data sync) while capturing Perfetto traces. Parse the traces with the SQL query above and fail the build if any main-thread lock contention exceeds your threshold.

## Step 4: Apply the Fix

Once you identify a lock-contention site, the fix is straightforward — never acquire Room's database lock from a main-thread coroutine.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// Before: ANR risk&lt;br&gt;
suspend fun refreshStats() {&lt;br&gt;
    val stats = dao.getStatsInTransaction() // blocks main thread on lock&lt;br&gt;
    _state.value = stats&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// After: explicit dispatcher switch before lock acquisition&lt;br&gt;
suspend fun refreshStats() {&lt;br&gt;
    val stats = withContext(Dispatchers.IO) {&lt;br&gt;
        dao.getStatsInTransaction() // lock acquired on IO thread&lt;br&gt;
    }&lt;br&gt;
    _state.value = stats&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
`withContext(Dispatchers.IO)` ensures the `synchronized` block executes on a thread that can safely block without causing ANRs.

## Gotchas

- **The `suspend` modifier lies (sometimes).** A `suspend` DAO method does not prevent the underlying `synchronized` block from blocking the calling thread. Be explicit about which dispatcher acquires locks.
- **StrictMode green does not mean ANR-safe.** Lock contention is an entirely separate failure mode. Do not treat passing StrictMode as full coverage.
- **Perfetto `lock_contention` requires kernel support.** Some emulator images and older devices do not emit these events. Test on physical hardware or recent API-level emulators.
- **Threshold tuning matters.** Start with 100ms as a hard failure and 50ms as a warning. Tighten as your codebase matures.

I keep long debugging sessions like these healthy with [HealthyDesk](https://play.google.com/store/apps/details?id=com.healthydesk) for break reminders, because staring at Perfetto traces for hours without moving is its own kind of system failure.

## Wrapping Up

Stop trusting StrictMode alone for ANR prevention. Add Perfetto trace analysis to your instrumented test suite, wrap every Room `@Transaction` call in `withContext(Dispatchers.IO)`, and build your CI gate in three layers: static lint, macro-benchmark traces, and a zero-tolerance threshold for main-thread lock waits. Catch the pattern before your users do.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Gradle Build Cache Poisoning in CI</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Thu, 04 Jun 2026 09:00:54 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/gradle-build-cache-poisoning-in-ci-1mno</link>
      <guid>https://dev.to/software_mvp-factory/gradle-build-cache-poisoning-in-ci-1mno</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Detecting&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Gradle&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Cache&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Poisoning&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CI&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Pipelines"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CI&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;verification&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pipeline&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;catches&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;corrupted&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;stale&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Gradle&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;remote&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cache&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;entries&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;before&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;they&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;silently&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;break&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;your&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Android/KMP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;builds."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, android, devops, architecture&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvp-factory.com/detecting-gradle-build-cache-poisoning-in-ci-pipelines&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

Let me show you a pattern I use in every project with a non-trivial module count: a three-stage CI pipeline that detects and evicts poisoned Gradle build cache entries before they propagate across your team. By the end, you'll have determinism checks, relocatability audits, and automated cache eviction wired into your CI.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Gradle 8.0+ with a remote build cache enabled
&lt;span class="p"&gt;-&lt;/span&gt; Gradle Enterprise (Develocity) 2023.1+ (for cache eviction API endpoints)
&lt;span class="p"&gt;-&lt;/span&gt; KSP 1.9+ if you're running annotation processors
&lt;span class="p"&gt;-&lt;/span&gt; A CI environment where you can run duplicate builds (GitHub Actions, TeamCity, Jenkins — any will do)

&lt;span class="gu"&gt;## Step 1: Understand the Failure Modes&lt;/span&gt;

Gradle computes cache keys from task inputs — source files, compiler arguments, dependency versions, classpath snapshots. A cache "hit" means the key matched. Here's the gotcha that will save you hours: &lt;span class="gs"&gt;**a valid cache key does not guarantee valid output.**&lt;/span&gt;

Three things go wrong:

| Failure mode | Root cause | Symptom |
|---|---|---|
| Content hash collision | Non-deterministic compiler output (timestamps, ordering) | Intermittent test failures |
| KSP source leakage | Generated sources not fully captured in cache key inputs | Wrong generated code served |
| Relocatability violation | Absolute paths baked into task outputs | Works on CI, fails locally (or vice versa) |

&lt;span class="gu"&gt;## Step 2: Fix KSP Cache Key Gaps&lt;/span&gt;

The docs don't mention this, but KSP processors introduce implicit inputs that Gradle's cache key computation misses. Here's the problem:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
class ApiClientProcessor : SymbolProcessorProvider {&lt;br&gt;
    override fun create(environment: SymbolProcessorEnvironment): SymbolProcessor {&lt;br&gt;
        val version = environment.options["api.version"]&lt;br&gt;
        // Generated code varies by version, but the cache key won't change&lt;br&gt;
        return ApiClientGenerator(version)&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Here's the minimal setup to get this working — register the implicit input at the Gradle task level:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
abstract class KspRegistrationPlugin : Plugin {&lt;br&gt;
    override fun apply(project: Project) {&lt;br&gt;
        project.tasks.withType().configureEach {&lt;br&gt;
            inputs.property("api.version", project.providers.gradleProperty("api.version"))&lt;br&gt;
        }&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Now when `api.version` changes, Gradle computes a fresh cache key instead of serving stale output.

## Step 3: Add a Determinism Check

Run the same build twice on clean CI workers and diff the outputs:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
bash&lt;br&gt;
./gradlew assembleRelease --build-cache -Dorg.gradle.caching.debug=true&lt;br&gt;
find build/ -name "*.class" -exec md5sum {} \; | sort &amp;gt; build_a.manifest&lt;/p&gt;

&lt;p&gt;./gradlew clean assembleRelease --build-cache&lt;br&gt;
find build/ -name "*.class" -exec md5sum {} \; | sort &amp;gt; build_b.manifest&lt;/p&gt;

&lt;p&gt;diff build_a.manifest build_b.manifest&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Any diff means non-determinism — a direct cache poisoning vector. Run this weekly on CI. It costs about 15 minutes. That's cheap insurance.

## Step 4: Audit Relocatability via Develocity API

Query your build scans programmatically to catch absolute path leakage:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
val response = develocityApi.getBuilds(&lt;br&gt;
    query = "tag:ci AND buildCacheWarning:relocatability",&lt;br&gt;
    since = Instant.now().minus(Duration.ofHours(24))&lt;br&gt;
)&lt;br&gt;
response.builds.forEach { build -&amp;gt;&lt;br&gt;
    val violations = develocityApi.getBuildCachePerformance(build.id)&lt;br&gt;
        .taskExecutions&lt;br&gt;
        .filter { it.cachingDisabledReasonCategory == "NON_CACHEABLE" }&lt;br&gt;
    logger.warn("Relocatability violations: ${violations.map { it.taskPath }}")&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
## Step 5: Automate Eviction

When a poisoned entry is detected, evict it immediately:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
bash&lt;br&gt;
curl -X DELETE \&lt;br&gt;
  "&lt;a href="https://ge.yourcompany.com/api/build-cache/entries/$%7BCACHE_KEY%7D" rel="noopener noreferrer"&gt;https://ge.yourcompany.com/api/build-cache/entries/${CACHE_KEY}&lt;/a&gt;" \&lt;br&gt;
  -H "Authorization: Bearer ${GE_TOKEN}"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Don't wait for developers to report "weird build issues." Automate detection and purge within minutes instead of days.

## Results

After deploying this pipeline on a KMP project (54 modules, ~180K LOC, 12-person Android/backend team, 8 CI runners on Linux), we measured over 90 days:

| Metric | Before | After |
|---|---|---|
| Silent miscompilation incidents/month | 3–5 | 0 |
| Cache hit rate | 78% | 72%* |
| Mean time to detect cache issue | 2.3 days | 14 minutes |
| Developer hours lost to "works on my machine" | ~40/month | ~2/month |

*Hit rate dropped because we now correctly invalidate entries that were previously false positives.* Those "hits" were lies. Gradle's own case studies report 70–85% for comparable module counts, so 72% is healthy. And 40 engineering hours per month recovered? That's a trade I'd make every time.

## Gotchas

- **Every implicit input must be declared.** Every file read, environment variable, or classpath resource your KSP/KAPT processor touches needs a formal `inputs.property()` or `@Input` annotation. Skip this and your cache key is incomplete — you will get burned.
- **Task input snapshotting changed between Gradle 7.x and 8.x.** Relocatability checks behave differently. Make sure you're on Gradle 8.0+ before relying on the patterns above.
- **The Develocity eviction API requires Enterprise 2023.1+.** Older versions don't expose cache entry deletion endpoints.
- **A hit rate drop is not a regression.** If your rate drops after deploying verification, you were previously serving false positives. That's the pipeline working as intended.

## Wrapping Up

Cache poisoning scales with module count and annotation processor complexity. The verification pipeline itself isn't complicated — the value is in running it automatically and acting on results without waiting for a human to notice something feels off. Audit your processors, run determinism checks weekly, and query your build scans programmatically. Your team will stop chasing phantom build failures.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Custom Vulkan Compute Kernels for On-Device LLM Inference on Android</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Wed, 03 Jun 2026 13:33:20 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/custom-vulkan-compute-kernels-for-on-device-llm-inference-on-android-566f</link>
      <guid>https://dev.to/software_mvp-factory/custom-vulkan-compute-kernels-for-on-device-llm-inference-on-android-566f</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Custom&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Vulkan&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Compute&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Kernels&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;On-Device&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;LLM&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Inference&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Android"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Writing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;custom&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Vulkan&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;compute&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;shaders—tiled&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;matmul,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fused&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;softmax&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;attention,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;memory-mapped&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;weight&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;loading—that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;bypass&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;NNAPI/TFLite&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;overhead&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;double&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;throughput&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Android."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;android, kotlin, architecture, performance&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.themvpfactory.com/custom-vulkan-compute-kernels-for-on-device-llm-inference-on-android&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We're Building&lt;/span&gt;

In this workshop, I'll walk you through the architecture behind a custom Vulkan compute pipeline for on-device LLM inference on Android. You'll learn how to replace NNAPI and TFLite delegates with three GPU-native kernels—tiled matrix multiplication, fused softmax-attention, and memory-mapped weight loading—and how to tune dispatch parameters per GPU architecture. By the end, you'll understand the exact approach that produces a 2x tokens/s improvement over framework-based inference on Snapdragon 8 Gen 4 hardware.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Familiarity with Android NDK and native code integration
&lt;span class="p"&gt;-&lt;/span&gt; Basic understanding of GPU compute concepts (workgroups, shared memory, dispatch)
&lt;span class="p"&gt;-&lt;/span&gt; A Vulkan-capable Android device for testing (Adreno 750 or Mali-G720 ideally)
&lt;span class="p"&gt;-&lt;/span&gt; Android Studio with NDK r26+ and the Vulkan validation layers enabled

&lt;span class="gu"&gt;## Step-by-Step&lt;/span&gt;

&lt;span class="gu"&gt;### Step 1: Understand Why Frameworks Fall Short&lt;/span&gt;

Let me show you a pattern I use in every project: before writing a single shader, profile the dispatch overhead. Here's what the numbers look like:

| Factor | TFLite GPU Delegate | Custom Vulkan Kernels |
|---|---|---|
| Operator fusion | Limited, predefined patterns | Fully custom fused ops |
| Memory management | Framework-controlled allocations | Explicit VkBuffer with memory-mapped weights |
| Workgroup tuning | Generic, one-size-fits-all | Per-GPU architecture dispatch |
| Attention implementation | Decomposed into separate ops | Fused flash-attention-style kernel |
| Dispatch overhead per token | ~2.1 ms (measured on Adreno 750) | ~0.3 ms |

The delegate model means every operation passes through an abstraction layer that decides how to map your graph to GPU commands. For LLM decode steps—where you're dispatching kernels thousands of times per generation—that overhead compounds fast.

&lt;span class="gu"&gt;### Step 2: Write the Tiled Matrix Multiplication Kernel&lt;/span&gt;

This is the backbone of every transformer layer. A tiled approach using shared memory keeps data local to the workgroup:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
glsl&lt;/p&gt;
&lt;h1&gt;
  
  
  version 450
&lt;/h1&gt;

&lt;p&gt;layout(local_size_x = 16, local_size_y = 16) in;&lt;br&gt;
layout(set = 0, binding = 0) readonly buffer A { float a[]; };&lt;br&gt;
layout(set = 0, binding = 1) readonly buffer B { float b[]; };&lt;br&gt;
layout(set = 0, binding = 2) writeonly buffer C { float c[]; };&lt;br&gt;
shared float tileA[16][16];&lt;br&gt;
shared float tileB[16][16];&lt;br&gt;
// Tile loop with barrier sync between loads&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Here's the gotcha that will save you hours: tile size must match the GPU's wavefront/warp width. Adreno and Mali diverge sharply here, and getting this wrong negates the entire benefit.

### Step 3: Fuse the Softmax-Attention Kernel

Instead of dispatching separate softmax, scaling, and matmul operations, write a flash-attention-style fused kernel that performs full QKV attention in a single dispatch. This eliminates three round-trips to global memory per attention head. If you only write one custom shader, make it this one—it recovers roughly 40% of the framework overhead on its own.

### Step 4: Memory-Map Your Weights

Rather than deserializing weights through a framework, map the weight file directly into a `VkBuffer` using `AHardwareBuffer` or file-backed `mmap`. On Snapdragon 8 Gen 4, this cuts model load time from ~4 seconds to under 800 ms for a 2B parameter model at FP16.

### Step 5: Tune Dispatch Per GPU Architecture

The docs don't mention this, but a single "universal" workgroup configuration leaves 30-50% of performance on the table. Here's what you need per architecture:

| Parameter | Adreno 750 (Snapdragon 8 Gen 4) | Mali-G720 (Dimensity 9400) |
|---|---|---|
| Optimal workgroup size | 256 (16x16) | 64 (8x8) |
| Shared memory per workgroup | 32 KB | 16 KB |
| Wave width | 64 threads | 16 threads |
| Preferred tile size (matmul) | 16x16 | 8x8 |
| Max concurrent dispatches | 4 compute queues | 1 compute queue |

Here's the minimal setup to get this working—runtime GPU detection with pre-compiled SPIR-V shader variants:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
val workgroupSize = when {&lt;br&gt;
    gpuName.contains("Adreno 7") -&amp;gt; 256&lt;br&gt;
    gpuName.contains("Mali-G7")  -&amp;gt; 64&lt;br&gt;
    else -&amp;gt; 128 // conservative fallback&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
You detect the GPU via `vkGetPhysicalDeviceProperties` and select the appropriate SPIR-V variant at startup.

### Step 6: Benchmark and Validate

Here are results from Snapdragon 8 Gen 4 reference hardware running a 2B parameter LLaMA-style model at FP16, generating 128 tokens:

| Engine | Tokens/s | Peak Memory | Time to First Token |
|---|---|---|---|
| TFLite GPU delegate | 11.2 | 2.8 GB | 380 ms |
| NNAPI (GPU path) | 9.7 | 3.1 GB | 420 ms |
| Custom Vulkan kernels | 22.8 | 2.1 GB | 190 ms |

The 2x improvement breaks down: eliminated dispatch overhead accounts for ~35%, fused attention kernels contribute ~40%, and memory-mapped weight loading covers the remaining ~25%.

## Gotchas

- **Mali shared memory spill.** On Mali-G720, using 16x16 tiles with its 16 KB shared memory limit will spill to global memory. Drop to 8x8 tiles or you negate the entire benefit.
- **Profile dispatch, not compute.** Before you optimize compute, use `VK_EXT_debug_utils` timestamps to measure per-dispatch cost. On most Android devices, the bottleneck isn't slow math—it's slow dispatch. That surprised me the first time I profiled a decode loop.
- **Shader variant maintenance is worth it.** Yes, maintaining multiple SPIR-V builds per GPU is annoying. But a single universal config leaves 30-50% on the table. Runtime detection with pre-compiled variants is the minimum viable approach for production.
- **Don't skip memory mapping.** Teams often focus on kernel optimization first. Memory-mapped weight loading via `AHardwareBuffer` or `mmap` into `VkBuffer` contributes ~25% of the total improvement through reduced memory pressure translating to sustained throughput.

## Conclusion

GPU-native AI workloads are moving from cloud to device—Microsoft just announced the Surface Laptop Ultra and Surface RTX Spark Dev Box at Build, both powered by Nvidia's RTX Spark chips. On Android, we have the same opportunity, but the framework tooling hasn't caught up. NNAPI was designed for delegate-based dispatch, not the fine-grained kernel control LLM inference demands.

Start with the fused attention kernel for the best return on effort, profile dispatch overhead before optimizing compute, and ship per-GPU SPIR-V variants. Those three steps will get you from framework-limited throughput to GPU-native performance.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>eBPF-Based APM for Kotlin Backend Services</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Wed, 03 Jun 2026 07:43:33 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/ebpf-based-apm-for-kotlin-backend-services-bd5</link>
      <guid>https://dev.to/software_mvp-factory/ebpf-based-apm-for-kotlin-backend-services-bd5</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eBPF-Based&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;APM&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Kotlin:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Zero-Code&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Latency&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Profiling"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;continuous&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;profiling&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pipeline&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Kotlin/JVM&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;services&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;using&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;eBPF&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SDK&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;dependencies,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;changes,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;60-70%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;less&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CPU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;overhead&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;than&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;OpenTelemetry&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;agents."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, devops, architecture, cloud&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/ebpf-based-apm-for-kotlin-zero-code-latency-profiling&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

Let me show you how to set up eBPF-based continuous profiling for your Kotlin/JVM backend services. By the end of this tutorial, you will have a pipeline that produces CPU flame graphs with real Kotlin method names — no agent attached to your JVM, no SDK dependencies, no restarts.

Running this on production Kotlin services, we cut observability-related CPU overhead by 60-70% while catching tail-latency regressions that our old OpenTelemetry setup missed entirely.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; A Kotlin/JVM service running on Linux (eBPF is a kernel feature)
&lt;span class="p"&gt;-&lt;/span&gt; JDK 17+ (JDK 20+ recommended for built-in perf-map support)
&lt;span class="p"&gt;-&lt;/span&gt; Docker or Kubernetes for deploying the eBPF agent sidecar
&lt;span class="p"&gt;-&lt;/span&gt; Grafana Cloud or a self-hosted Pyroscope instance

&lt;span class="gu"&gt;## Step 1: Understand Why Agent-Based Instrumentation Falls Short&lt;/span&gt;

The OpenTelemetry Java agent is a &lt;span class="sb"&gt;`-javaagent`&lt;/span&gt; bytecode transformer running inside your JVM. It shares your heap, your GC pauses, and your thread pool. For Kotlin services on coroutines, the OTel agent's context propagation was designed around threads, not structured concurrency. You end up fighting the instrumentation library instead of observing your application.

eBPF sidesteps this entirely. It runs in kernel space, attached to syscall tracepoints and kprobes, completely outside your JVM process.

&lt;span class="gu"&gt;## Step 2: Configure JVM Flags&lt;/span&gt;

Here is the minimal setup to get this working. Add these flags to your Kotlin service:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
bash&lt;br&gt;
-XX:+PreserveFramePointer&lt;br&gt;
-XX:+UnlockDiagnosticVMOptions&lt;br&gt;
-XX:+DebugNonSafepoints&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
`PreserveFramePointer` costs roughly 1-2% CPU on modern JVMs — a well-documented tradeoff. `DebugNonSafepoints` ensures profiling samples resolve to the actual executing line, not the nearest safepoint. On JDK 20+, add `-XX:+DumpPerfMapAtExit` for built-in perf-map generation. For earlier JDKs, use `perf-map-agent`.

This is what gives you `com.myapp.service.OrderService.processPayment` in your flame graphs instead of `0x7f3a2b1c4d50`.

## Step 3: Deploy the eBPF Agent

The pipeline has three layers:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
plaintext&lt;br&gt;
┌─────────────┐     ┌──────────────┐     ┌───────────────┐&lt;br&gt;
│ Kotlin/JVM  │────▶│ eBPF Agent   │────▶│ Pyroscope /   │&lt;br&gt;
│ Service     │     │ (kernel)     │     │ Grafana Cloud │&lt;br&gt;
│ + perf-map  │     │              │     │               │&lt;br&gt;
└─────────────┘     └──────────────┘     └───────────────┘&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Start with Grafana Beyla for HTTP/gRPC auto-instrumentation. It runs as a sidecar or DaemonSet, requires zero application changes, and gives you request-level latency metrics from kernel space. I had it running in under an hour.

## Step 4: Build Differential Flame Graph Comparisons

Here is the pattern I use in every project. When a new deployment rolls out, you automatically compare the flame graph profile of the canary against the baseline. If P99 latency shifts or a new hot path appears in your Kotlin coroutine dispatchers, the alert fires before the rollout completes.

I saw this pay off firsthand when it caught a Kotlin serialization regression: a single `kotlinx.serialization` codec change that added 12ms at P99. The alert fired within the first 5% of a canary rollout. Traditional metrics-based alerting would not have flagged it until the full deployment was live.

## How the Numbers Compare

| Dimension | OpenTelemetry Java Agent | Grafana Beyla (eBPF) | Pyroscope (eBPF) |
|---|---|---|---|
| JVM restart required | Yes | No | No |
| CPU overhead | 3-8% | &amp;lt;1% | 1-2% |
| Memory overhead | 50-150 MB heap | ~10 MB (kernel) | ~20 MB |
| Coroutine-aware | Partial | N/A (kernel-level) | N/A (kernel-level) |
| Continuous profiling | Requires additional setup | Built-in | Built-in |

The overhead difference is not marginal. It is the difference between profiling being "something we turn on during incidents" and "something that runs continuously in production."

## Gotchas

- **eBPF does not replace distributed tracing.** It does not give you trace context propagation, custom business metrics, or structured log correlation. If you need to trace a request across 15 microservices, you still need distributed tracing. The right architecture is layered: eBPF for continuous profiling, lightweight OTel SDK (not the full agent) for distributed tracing where you actually need it.
- **The docs do not mention this, but** `PreserveFramePointer` must be set before any profiling tool can walk your JVM stacks. Deploy these flags now so the data is ready when you need it.
- **Do not skip `DebugNonSafepoints`.** Without it, your flame graphs will attribute time to safepoint locations instead of the actual hot code. This leads to misleading profiles.
- **Trying to pick eBPF or OTel is a false choice.** Layer them.

## Conclusion

Add `-XX:+PreserveFramePointer` and `-XX:+DebugNonSafepoints` to your JVM flags today. Deploy Grafana Beyla as a sidecar. Then wire canary profile diffs into your deployment gates so regressions get caught before they reach production traffic — not after. That shift in timing changes everything about how you think about performance work.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Incremental Annotation Processing in KSP2</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Tue, 02 Jun 2026 14:59:14 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/incremental-annotation-processing-in-ksp2-18p3</link>
      <guid>https://dev.to/software_mvp-factory/incremental-annotation-processing-in-ksp2-18p3</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KSP2&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Incremental&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Processing:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Get&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;500-Module&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;KMP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Builds&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Under&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;90&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Seconds"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hands-on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;guide&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;KSP2's&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;dirty-set&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;propagation,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Gradle&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;convention&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;plugin&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;isolation,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;specific&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;APIs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;stop&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;annotation&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;processing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;killing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;your&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;times."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, android, architecture, performance&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.nickel.is/ksp2-incremental-processing-500-module-kmp-builds-under-90s&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

By the end of this tutorial, you will have a Gradle convention plugin that wires KSP2 processors per source set with full classpath isolation, and you will understand exactly which KSP2 APIs to call — and which to avoid — to keep incremental annotation processing fast in a large KMP monorepo.

Let me show you a pattern I use in every project with more than a handful of modules. It is the difference between waiting 112 seconds and waiting 33 seconds on every incremental build.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Kotlin 1.9+ with KSP2 enabled
&lt;span class="p"&gt;-&lt;/span&gt; A KMP project (even a small multi-module setup works for following along)
&lt;span class="p"&gt;-&lt;/span&gt; Gradle 8.x with configuration cache support
&lt;span class="p"&gt;-&lt;/span&gt; Familiarity with writing a basic &lt;span class="sb"&gt;`SymbolProcessor`&lt;/span&gt;

&lt;span class="gu"&gt;## Step 1: Understand Where Your Time Actually Goes&lt;/span&gt;

Before optimizing anything, run &lt;span class="sb"&gt;`--scan`&lt;/span&gt; on your build. Here is what a typical 500-module KMP monorepo looks like:

| Build Phase | Full Build | Incremental (Naive KSP) | Incremental (Optimized KSP2) |
|---|---|---|---|
| Configuration | 8s | 8s | 3.2s (cache-hit) |
| Dependency Resolution | 12s | 4s | 4s |
| Kotlin Compilation | 45s | 9s | 9s |
| Annotation Processing (KSP) | 110s | 85s | 11s |
| Linking / Packaging | 15s | 6s | 6s |
| &lt;span class="gs"&gt;**Total**&lt;/span&gt; | &lt;span class="gs"&gt;**190s**&lt;/span&gt; | &lt;span class="gs"&gt;**112s**&lt;/span&gt; | &lt;span class="gs"&gt;**33.2s**&lt;/span&gt; |

Look at that annotation processing row. Naive KSP drops from 110s to only 85s on incremental builds. Optimized KSP2 drops to 11s. That gap is everything. Most teams obsess over Kotlin compilation speed while ignoring what is actually slow.

&lt;span class="gu"&gt;## Step 2: Use the Right Resolver API&lt;/span&gt;

Here is the gotcha that will save you hours. KSP1 called your processor with &lt;span class="gs"&gt;**all**&lt;/span&gt; symbols on every round. KSP2 gives you a dirty set — only symbols whose source files changed or whose dependencies changed. But it only works if your processor declares its inputs correctly.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
class MyProcessorProvider : SymbolProcessorProvider {&lt;br&gt;
    override fun create(environment: SymbolProcessorEnvironment): SymbolProcessor {&lt;br&gt;
        return MyProcessor(environment)&lt;br&gt;
    }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;class MyProcessor(private val env: SymbolProcessorEnvironment) : SymbolProcessor {&lt;br&gt;
    override fun process(resolver: Resolver): List {&lt;br&gt;
        // CORRECT: Only process new/changed files&lt;br&gt;
        val newFiles = resolver.getNewFiles()&lt;br&gt;
        val symbols = newFiles&lt;br&gt;
            .flatMap { it.declarations }&lt;br&gt;
            .filterIsInstance()&lt;br&gt;
            .filter { it.annotations.any { a -&amp;gt; a.shortName.asString() == "MyAnnotation" } }&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    // Process only dirty symbols
    symbols.forEach { generateCode(it) }

    return emptyList() // No deferrals
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
The critical mistake: calling `resolver.getAllFiles()` instead of `resolver.getNewFiles()`. That single API choice is the difference between 11s and 85s. Every call to `getAllFiles()` tells KSP2 your processor depends on the entire source set, and incrementality dies.

## Step 3: Avoid Multi-Round Invalidation Traps

Multi-round processing introduces another invalidation vector. When Round 1 generates code that Round 2's processor depends on, KSP2 tracks cross-round dependencies. If your Round 1 output changes, Round 2 re-runs — but **only** for the affected outputs.

The pattern that breaks this: generating files with content derived from aggregated state across all symbols. That creates an implicit dependency on every source file. Split into per-file generators instead.

## Step 4: Wire Classpath Isolation With a Convention Plugin

In a 500-module KMP monorepo, KSP processors often conflict at the classpath level. Module A uses your custom processor v2, Module B still needs v1. Without isolation, Gradle merges these onto a single classpath and you get mysterious symbol resolution failures.

Here is the minimal setup to get this working:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// build-logic/convention/src/main/kotlin/kmp-ksp-convention.gradle.kts&lt;br&gt;
plugins {&lt;br&gt;
    id("com.google.devtools.ksp")&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;kotlin {&lt;br&gt;
    sourceSets {&lt;br&gt;
        commonMain {&lt;br&gt;
            dependencies {&lt;br&gt;
                // Scoped to commonMain only&lt;br&gt;
            }&lt;br&gt;
        }&lt;br&gt;
    }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;dependencies {&lt;br&gt;
    // Per-target processor wiring&lt;br&gt;
    add("kspAndroid", project(":processors:android-specific"))&lt;br&gt;
    add("kspIosArm64", project(":processors:ios-specific"))&lt;br&gt;
    add("kspCommonMainMetadata", project(":processors:shared"))&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;ksp {&lt;br&gt;
    arg("ksp.incremental", "true")&lt;br&gt;
    arg("ksp.incremental.log", "true") // You'll want this for debugging&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
This keeps each processor's classpath isolated per source set, preventing conflicts where `com.example.Generated` from one processor shadows another.

KSP2 supports Gradle configuration cache, but only if your `SymbolProcessorProvider` captures no Project references. Store configuration in KSP arguments (`environment.options`), never in captured lambdas. The docs do not mention this, but this alone cut the configuration phase from 8s to 3.2s on cache-hit builds across 500 modules.

## Gotchas

| Anti-pattern | Time Cost Per Build | Fix |
|---|---|---|
| `getAllFiles()` in processor | +60–80s | Switch to `getNewFiles()` |
| Aggregating processors | +30–50s | Split into per-file generators |
| Missing classpath isolation | +15–25s | Convention plugin per source set |
| Configuration cache miss | +5–8s | Remove Project captures from providers |
| Deferred symbol re-resolution | +10–20s | Return empty list from `process()` when possible |

The biggest one: **audit every `getAllFiles()` call first.** Replace them with `getNewFiles()` in every processor. This single change typically delivers 60–80% of incremental build time savings in KSP-heavy projects.

Also, enable `ksp.incremental.log` and actually read it. The log tells you exactly which files triggered reprocessing and why. Without it, you are optimizing blind. Run `--scan` alongside it to correlate KSP rounds with actual wall-clock time.

On a related note, long build cycles mean long stretches at your desk. I keep [HealthyDesk](https://play.google.com/store/apps/details?id=com.healthydesk) running during builds to remind me to stretch — those 90-second compile windows are perfect for a quick guided exercise.

## Conclusion

Getting 500-module KMP projects under 90 seconds is not magic. It is disciplined use of KSP2's incremental APIs, strict classpath boundaries, and convention plugins that enforce both. Start with the build scan, follow the data, and fix the processors that lie about their inputs. The payoff is immediate and compounding — every developer on your team gets those minutes back on every single build.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Connection Pool Exhaustion in Spring Boot Under Kotlin Coroutines</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Mon, 01 Jun 2026 14:21:15 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/connection-pool-exhaustion-in-spring-boot-under-kotlin-coroutines-4n7b</link>
      <guid>https://dev.to/software_mvp-factory/connection-pool-exhaustion-in-spring-boot-under-kotlin-coroutines-4n7b</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fix&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Connection&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Pool&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Exhaustion:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Kotlin&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Coroutines&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;+&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;HikariCP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Under&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Load"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Kotlin&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coroutines&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;exhaust&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;HikariCP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pools&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;because&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;JDBC&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;blocks&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;threads&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coroutines&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;can't&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;reclaim.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Here's&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;dispatcher&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sizing,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;R2DBC&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;migration,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;circuit-breaker&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;setup&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;handles&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10x&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;spikes."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, architecture, api, cloud&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvp-factory.com/connection-pool-exhaustion-kotlin-coroutines-hikaricp&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

In this workshop, I'll walk you through the exact problem that silently kills Kotlin coroutine backends under load — connection pool exhaustion — and three production-tested fixes. By the end, you will have:
&lt;span class="p"&gt;
-&lt;/span&gt; A clear mental model of &lt;span class="ge"&gt;*why*&lt;/span&gt; &lt;span class="sb"&gt;`suspend fun`&lt;/span&gt; + JDBC is a trap
&lt;span class="p"&gt;-&lt;/span&gt; A dedicated dispatcher configuration sized to your pool
&lt;span class="p"&gt;-&lt;/span&gt; A circuit-breaker wrapper using Resilience4j
&lt;span class="p"&gt;-&lt;/span&gt; An understanding of when R2DBC eliminates the problem entirely

Let me show you a pattern I use in every project that runs coroutines against a relational database.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Kotlin 1.7+ with coroutines
&lt;span class="p"&gt;-&lt;/span&gt; Spring Boot 3.x (WebFlux or MVC with coroutine support)
&lt;span class="p"&gt;-&lt;/span&gt; HikariCP (ships with Spring Boot by default)
&lt;span class="p"&gt;-&lt;/span&gt; Resilience4j for circuit breakers
&lt;span class="p"&gt;-&lt;/span&gt; Familiarity with &lt;span class="sb"&gt;`suspend fun`&lt;/span&gt; and &lt;span class="sb"&gt;`Dispatchers.IO`&lt;/span&gt;

&lt;span class="gu"&gt;## Step 1: Understand the Mismatch&lt;/span&gt;

Here is the gotcha that will save you hours. When a coroutine calls a JDBC driver through HikariCP, it &lt;span class="gs"&gt;**blocks the underlying thread**&lt;/span&gt; until the query completes. The coroutine runtime cannot see this. It cannot reclaim that thread. Your &lt;span class="sb"&gt;`suspend fun`&lt;/span&gt; is suspended in name only.

Under normal load, everything looks fine. Under a 3–5x traffic spike, your connection pool drains within seconds, requests queue behind &lt;span class="sb"&gt;`getConnection()`&lt;/span&gt; timeouts, and latency cascades through every downstream service.

Consider a service on &lt;span class="sb"&gt;`Dispatchers.IO`&lt;/span&gt; (default 64 threads) with a HikariCP pool of 10 connections and queries averaging 50ms:

| Metric | Normal load (200 rps) | Spike (2,000 rps) |
|---|---|---|
| Concurrent DB calls | ~10 | ~100 |
| Pool wait time (p99) | &lt;span class="nt"&gt;&amp;lt;&lt;/span&gt; &lt;span class="err"&gt;5&lt;/span&gt;&lt;span class="na"&gt;ms&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="nt"&gt;&amp;gt;&lt;/span&gt; 30,000ms (timeout) |
| Thread utilization | 15% | 100% (starvation) |
| Dropped connections | 0 | cascading failures |

The formula that exposes the problem:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;max_concurrent_db_calls = request_rate × avg_query_duration&lt;br&gt;
2000 rps × 0.05s = 100 concurrent calls vs. 10 pool connections&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;
&lt;span class="nc"&gt;Those&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="n"&gt;excess&lt;/span&gt; &lt;span class="n"&gt;coroutines&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="n"&gt;pin&lt;/span&gt; &lt;span class="n"&gt;an&lt;/span&gt; &lt;span class="nc"&gt;`IO`&lt;/span&gt; &lt;span class="n"&gt;dispatcher&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;waiting&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;not&lt;/span&gt; &lt;span class="n"&gt;arrive&lt;/span&gt; &lt;span class="n"&gt;before&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="nc"&gt;This&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="n"&gt;dispatcher&lt;/span&gt; &lt;span class="n"&gt;starvation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="err"&gt;##&lt;/span&gt; &lt;span class="nc"&gt;Step&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Isolate&lt;/span&gt; &lt;span class="nc"&gt;JDBC&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="nc"&gt;Dedicated&lt;/span&gt; &lt;span class="nc"&gt;Dispatcher&lt;/span&gt;

&lt;span class="nc"&gt;The&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="n"&gt;not&lt;/span&gt; &lt;span class="n"&gt;mention&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;single&lt;/span&gt; &lt;span class="n"&gt;most&lt;/span&gt; &lt;span class="n"&gt;impactful&lt;/span&gt; &lt;span class="n"&gt;fix&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;staying&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="nc"&gt;JDBC&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="n"&gt;creating&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;**&lt;/span&gt;&lt;span class="n"&gt;bounded&lt;/span&gt; &lt;span class="n"&gt;dispatcher&lt;/span&gt; &lt;span class="n"&gt;sized&lt;/span&gt; &lt;span class="n"&gt;exactly&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;**:&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// Dedicated dispatcher — sized to connection pool&lt;br&gt;
val dbDispatcher = Dispatchers.IO.limitedParallelism(10)&lt;/p&gt;

&lt;p&gt;// Circuit breaker via Resilience4j&lt;br&gt;
val circuitBreaker = CircuitBreaker.of("db", CircuitBreakerConfig.custom()&lt;br&gt;
    .failureRateThreshold(50f)&lt;br&gt;
    .waitDurationInOpenState(Duration.ofSeconds(5))&lt;br&gt;
    .slidingWindowSize(20)&lt;br&gt;
    .build())&lt;/p&gt;

&lt;p&gt;suspend fun getUser(id: Long): User = withContext(dbDispatcher) {&lt;br&gt;
    circuitBreaker.executeSuspendFunction {&lt;br&gt;
        jdbcTemplate.queryForObject("SELECT * FROM users WHERE id = ?", id)&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Here is the minimal setup to get this working. The sizing formula I use in production:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
plaintext&lt;br&gt;
dispatcher_parallelism = hikari_max_pool_size&lt;br&gt;
hikari_max_pool_size = (core_count * 2) + effective_spindle_count&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
This caps the number of coroutines entering the JDBC path at the number of connections available. The circuit breaker handles sustained pressure: when the pool is overwhelmed, it trips open and fails fast. A 2ms failure is better than 100 coroutines each waiting 30 seconds for a connection that never comes.

## Step 3: Evaluate R2DBC for New Services

R2DBC drivers are non-blocking at the protocol level. When a coroutine awaits an R2DBC query, it **truly suspends**. No thread held. Compare:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// Blocking JDBC — holds thread for entire query lifecycle&lt;br&gt;
suspend fun getUser(id: Long): User = withContext(Dispatchers.IO) {&lt;br&gt;
    jdbcTemplate.queryForObject("SELECT * FROM users WHERE id = ?", id)&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// R2DBC — true suspension, no thread pinned&lt;br&gt;
suspend fun getUser(id: Long): User {&lt;br&gt;
    return databaseClient.sql("SELECT * FROM users WHERE id = ?")&lt;br&gt;
        .bind(0, id)&lt;br&gt;
        .map { row -&amp;gt; row.toUser() }&lt;br&gt;
        .awaitSingle()&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
| Factor | HikariCP + JDBC | R2DBC |
|---|---|---|
| Thread held during query | Yes | No |
| Coroutine suspension | Fake (thread blocked) | Real (non-blocking) |
| Pool exhaustion under 10x spike | Likely | Manageable |
| Backpressure support | None | Built-in (Reactive Streams) |
| Driver maturity | Excellent | Good (PostgreSQL, MySQL stable) |
| ORM support | Full (Hibernate, jOOQ) | Limited (Spring Data R2DBC) |

R2DBC will not magically give you infinite connections, but it removes thread starvation as a failure amplifier. That difference matters more than it sounds.

## Step 4: Monitor Aggressively

A healthy production setup needs dedicated dispatchers per resource, circuit breakers at each boundary, and dashboards tracking these metrics:

- `hikaricp_connections_pending` — early warning for exhaustion
- `r2dbc_pool_acquired` vs `r2dbc_pool_max_allocated` — pool pressure
- Coroutine dispatcher queue depth — starvation indicator

During the long debugging sessions that come with tuning these systems, I keep [HealthyDesk](https://play.google.com/store/apps/details?id=com.healthydesk) running in the background. When you are deep in connection pool traces for hours, those break reminders are the difference between finding the bug and becoming one.

## Gotchas

1. **`Dispatchers.IO` is shared.** If your JDBC calls saturate it, every other coroutine using `IO` — file reads, HTTP calls, logging — starves too. Always isolate with `limitedParallelism`.

2. **Pool size ≠ dispatcher size is a bug.** If your dispatcher allows 64 concurrent coroutines but your pool only has 10 connections, 54 coroutines block on `getConnection()` and hold threads hostage. Match them exactly.

3. **R2DBC does not replace your ORM.** If you rely on Hibernate or complex jOOQ queries, R2DBC with Spring Data is not a drop-in replacement. Evaluate the trade-off honestly before migrating.

4. **Circuit breakers need tuning, not defaults.** A `failureRateThreshold` of 50% with a sliding window of 20 is a starting point. Profile your actual failure patterns and adjust.

## Conclusion

Stop using `Dispatchers.IO` directly for JDBC calls — create a `limitedParallelism` dispatcher sized exactly to your connection pool. For new services with simple data access patterns, use R2DBC to eliminate the blocking mismatch entirely. Put circuit breakers at the connection pool boundary so a tripped breaker that fails in 2ms beats 100 coroutines timing out at 30 seconds each. These three changes turn a system that collapses under 3x load into one that handles 10x spikes without dropping a single connection.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Compile-Time Memory Layout Optimization for On-Device ML Models</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Mon, 01 Jun 2026 08:35:34 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/compile-time-memory-layout-optimization-for-on-device-ml-models-53d8</link>
      <guid>https://dev.to/software_mvp-factory/compile-time-memory-layout-optimization-for-on-device-ml-models-53d8</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ART&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Memory&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Tuning:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Cut&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;On-Device&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ML&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;GC&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Pauses&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;90%"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Profile-guided&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;allocation,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pinning,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;RegionSpace&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tuning&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;eliminate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;GC&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;stalls&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;during&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on-device&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ML&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;inference.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Practical&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ART&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;strategies&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;actually&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;work."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;android, kotlin, architecture, mobile&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://mvpfactory.co/blog/art-memory-tuning-cut-on-device-ml-gc-pauses-90&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We're Building&lt;/span&gt;

Let me show you a pattern I use in every project that runs ML inference on Android. By the end of this tutorial, you'll know how to eliminate up to 90% of GC-induced frame drops during on-device inference using three concrete strategies: ART profile-guided compilation hints, large object space pinning, and JNI boundary isolation.

Most teams misdiagnose inference jank as a model performance problem. It's not. It's an allocation pattern problem — and I'll walk you through fixing it step by step.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Familiarity with Android development in Kotlin
&lt;span class="p"&gt;-&lt;/span&gt; An on-device ML pipeline (TFLite, ONNX Runtime, or MediaPipe)
&lt;span class="p"&gt;-&lt;/span&gt; Android Studio with a debuggable build variant
&lt;span class="p"&gt;-&lt;/span&gt; ADB access for profiling GC stats

&lt;span class="gu"&gt;## Step 1: Understand Where Your Allocations Land&lt;/span&gt;

Before changing anything, you need to know what ART's Concurrent Copying (CC) collector does with your tensors. Here is the minimal mental model to get this working:

| Allocation event | Where it lands | GC risk |
|---|---|---|
| Small tensors (&amp;lt;12KB) | RegionSpace TLAB | Low — thread-local, fast |
| Medium tensors (12KB-128KB) | RegionSpace shared regions | Medium — contention + region exhaustion |
| Large tensors (&amp;gt;128KB) | Large Object Space (LOS) | High — LOS collections are expensive |
| JNI native buffers | Native heap (outside ART) | None — invisible to GC |

The docs don't mention this, but most inference frameworks allocate intermediate buffers in the 16KB-256KB range. That's the danger zone where RegionSpace fills quickly and LOS triggers costly collections. I've seen blocking pauses from 5ms to 40ms here — enough to blow a 16ms frame budget.

Profile first: &lt;span class="sb"&gt;`adb shell setprop dalvik.vm.gcstats 1`&lt;/span&gt; captures allocation rates during inference. Target that 12KB-256KB range.

&lt;span class="gu"&gt;## Step 2: Add Profile-Guided Allocation Hints&lt;/span&gt;

This is the lowest-effort, highest-impact change you can make. Since Android 9, baseline profiles influence allocation behavior by marking hot allocation sites for pre-tenuring or region pre-sizing.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// In your baseline profile rules (baseline-prof.txt)&lt;br&gt;
// Mark inference-heavy classes for optimized allocation&lt;br&gt;
HSPLcom/myapp/ml/InferenceSession;-&amp;gt;runInference([F)[F&lt;br&gt;
HSPLcom/myapp/ml/TensorBuffer;-&amp;gt;(I)V&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
ART compiles profiled methods with optimized allocation sequences that reduce TLAB overflow and region contention. This alone cuts minor GC events by 30-40% during inference bursts. Most teams simply forget to include ML pipeline classes in their profile rules.

## Step 3: Pin Large Objects with Direct ByteBuffers

For tensor I/O, use direct `ByteBuffer` allocations that bypass RegionSpace entirely:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// Use direct ByteBuffers for large tensor I/O&lt;br&gt;
val inputBuffer = ByteBuffer.allocateDirect(modelInputSize * 4)&lt;br&gt;
    .order(ByteOrder.nativeOrder())&lt;/p&gt;

&lt;p&gt;// These live in native memory, completely outside ART's GC&lt;br&gt;
val outputBuffer = ByteBuffer.allocateDirect(modelOutputSize * 4)&lt;br&gt;
    .order(ByteOrder.nativeOrder())&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
This eliminates the CC collector's copy overhead for large, short-lived buffers. For buffers that must remain as managed objects, `sun.misc.Unsafe`-based pinning APIs available through ART internals prevent relocation during CC phases. Expect a 50-60% GC pause reduction from this step alone.

## Step 4: Push the Pipeline Below the JNI Boundary

Here is the gotcha that will save you hours: most teams run inference through managed Kotlin wrappers that create dozens of intermediate managed objects per frame. The real fix is making the JNI boundary your GC firewall.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
class NativeInferenceEngine {&lt;br&gt;
    // All tensor allocation happens in native heap&lt;br&gt;
    external fun initModel(modelPath: String): Long  // returns native handle&lt;br&gt;
    external fun runInference(handle: Long, input: FloatArray): FloatArray&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Only crossing JNI for input/output —
// intermediate tensors never touch managed heap
external fun releaseModel(handle: Long)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Every tensor you keep off the managed heap is a GC pause you'll never see. This is the highest-effort strategy, but it delivers 80-90% pause reduction.

| Strategy | GC pause reduction | Implementation effort |
|---|---|---|
| Baseline profile hints | 30-40% | Low — profile rules only |
| Direct ByteBuffer for I/O | 50-60% | Medium — buffer management |
| Full JNI-boundary isolation | 80-90% | High — native pipeline |
| All three combined | ~90% | High — but worth it for real-time inference |

## Step 5: Tune RegionSpace for Remaining Managed Allocations

For managed allocations you can't eliminate, tune RegionSpace behavior through system properties on debug builds or ART runtime flags:

- Larger regions (512KB vs default 256KB) reduce region exhaustion during bursts
- Increasing thread-local allocation buffer size absorbs more burst allocations before falling back to shared regions
- Adjusting the CC collector urgency threshold prevents premature blocking collections

## Gotchas

- **The 12KB-256KB danger zone**: This is where GC pressure concentrates during inference. Profile this range specifically before optimizing anything else.
- **Forgetting baseline profiles for ML classes**: Your inference pipeline classes need to appear in `baseline-prof.txt`. ART can't optimize what it doesn't know about.
- **Managed wrappers creating hidden allocations**: A single Kotlin convenience layer around your inference engine can generate dozens of managed objects per frame. Audit the allocation path, not just the inference call.
- **Misdiagnosing model performance**: If you see 5-40ms stalls during inference, check GC logs before reaching for a smaller model. The managed heap isn't your enemy — uncontrolled allocation patterns are.

## Wrapping Up

Start with baseline profiles (Step 2) — it's a few lines in a text file and delivers 30-40% improvement. Then move to direct `ByteBuffer` for I/O. Only invest in full JNI isolation when you need real-time inference alongside UI rendering. Control the allocation pattern, and GC pauses during inference stop being a problem.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Xcode Build System Internals</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Fri, 29 May 2026 13:49:21 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/xcode-build-system-internals-3b3l</link>
      <guid>https://dev.to/software_mvp-factory/xcode-build-system-internals-3b3l</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Xcode&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Internals:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Settings&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;That&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Cut&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Swift&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Compile&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Times&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;50%"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;workshop-style&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;walkthrough&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;of&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Xcode's&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;llbuild&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;dependency&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;graph,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;explicit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;module&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;builds,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;overlooked&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;settings&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;halved&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;our&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Swift&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;compilation&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;times."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ios, swift, architecture, devops&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/xcode-build-internals-settings-that-cut-swift-compile-times-50&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We're Building&lt;/span&gt;

Today we're going to profile an Xcode build, identify what's actually slowing it down, and apply three build settings that cut clean build times by roughly 50% in a production Swift codebase with 400+ source files.

No new tools to learn. No migration to Bazel. Just settings that already exist in your project file.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Xcode 16+
&lt;span class="p"&gt;-&lt;/span&gt; A Swift project you'd like to speed up
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`xclogparser`&lt;/span&gt; installed (&lt;span class="sb"&gt;`brew install xclogparser`&lt;/span&gt;)
&lt;span class="p"&gt;-&lt;/span&gt; A few minutes of patience while clean builds run

&lt;span class="gu"&gt;## Step 1: Understand What llbuild Is Doing&lt;/span&gt;

Xcode delegates to &lt;span class="sb"&gt;`llbuild`&lt;/span&gt;, a build engine that models your project as a directed acyclic graph (DAG). Each node is a unit of work — compiling a &lt;span class="sb"&gt;`.swift`&lt;/span&gt; file, linking a framework, copying a resource bundle.

Here is the key insight most teams miss: parallelism is constrained by the longest critical path through this graph, not by your core count. You can have a 16-core M4 Max and still bottleneck on a single serial chain of module dependencies.

Let me show you how to see this for yourself.

&lt;span class="gu"&gt;## Step 2: Profile Your Current Build&lt;/span&gt;

Before changing anything, capture a baseline. Run a clean build, then parse the log:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
bash&lt;br&gt;
brew install xclogparser&lt;/p&gt;

&lt;p&gt;xclogparser parse --project MyApp.xcodeproj --reporter html&lt;/p&gt;

&lt;p&gt;xclogparser parse --project MyApp.xcodeproj --reporter json \&lt;br&gt;
  | jq '.targets[].steps | sort_by(-.duration) | .[0:10]'&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
The HTML report gives you a Gantt-chart-style build timeline. Look for long serial chains where modules build sequentially, wide gaps where cores sit idle, and repeated module builds — the hallmark of implicit module thrashing.

## Step 3: Enable Explicit Modules

In implicit module builds (the default prior to Xcode 16), the compiler discovers and builds Clang modules on-demand. If two Swift files both import `UIKit`, the compiler may redundantly build the `UIKit` module map or block waiting on a shared module cache lock. Hidden serialization.

With `SWIFT_ENABLE_EXPLICIT_MODULES = YES`, Xcode scans all source files for imports up front, builds each module exactly once as a discrete graph node, and exposes the full dependency structure to the scheduler.

Here is the minimal setup to get this working. Add to your build settings:

| Setting | Recommended Value | Why |
|---|---|---|
| `SWIFT_ENABLE_EXPLICIT_MODULES` | `YES` | Eliminates implicit module rebuilds, improves parallelism |
| `EAGER_LINKING` | `YES` | Starts linking before all compile tasks finish |
| `SWIFT_ENABLE_BATCH_MODE` | `YES` | Groups files into batches per core (keep enabled) |
| `SWIFT_WHOLE_MODULE_OPTIMIZATION` | `YES` (Release only) | Better codegen but serializes compilation |
| `ENABLE_MODULE_VERIFIER` | `YES` | Catches module map issues that cause silent rebuilds |

## Step 4: Enable Eager Linking

`EAGER_LINKING` is a setting most teams have never touched. By default, the linker waits for every object file before starting. With eager linking, `llbuild` begins the link phase as soon as enough object files are available, overlapping link prep with the tail end of compilation.

In a 400-file target, this shaves real time off the critical path because your last few files to compile are rarely the ones the linker needs first.

## Step 5: Flatten Your Module Graph

Even with explicit modules enabled, a poorly structured module map can reintroduce serialization:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
plaintext&lt;/p&gt;
&lt;h1&gt;
  
  
  Before: Serial chain
&lt;/h1&gt;

&lt;p&gt;ModuleC → ModuleB → ModuleA → YourTarget&lt;/p&gt;
&lt;h1&gt;
  
  
  After: Flattened imports
&lt;/h1&gt;

&lt;p&gt;ModuleC ─┐&lt;br&gt;
ModuleB ─┼→ YourTarget&lt;br&gt;
ModuleA ─┘&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Set `ENABLE_MODULE_VERIFIER = YES` to surface circular dependencies and unnecessary transitive imports that silently kill parallelism. The docs do not mention this, but if Module A's umbrella header transitionally imports Module B, which imports Module C, you've created a three-deep serial chain that `llbuild` cannot parallelize.

## Gotchas

- **Don't enable `SWIFT_WHOLE_MODULE_OPTIMIZATION` in Debug.** It produces better codegen but serializes compilation — the opposite of what you want during development.
- **Explicit modules can surface hidden dependency issues.** If your builds relied on implicit module discovery papering over missing imports, expect some initial compiler errors. Fix them — they were bugs all along.
- **Profile before AND after.** Run `xclogparser` on both builds. Before we enabled explicit modules, our build timeline showed 6 cores idle while waiting on a chain of implicitly-built Objective-C modules. After the switch, those modules built as parallel leaf nodes. The idle gaps disappeared.
- **Hardware won't save you.** Teams throw cores at this problem when they should be shortening the critical path. A flat, wide dependency graph will always outperform a deep, narrow one regardless of core count.

## Wrapping Up

Here is a pattern I use in every project: treat the build graph as code. Enable explicit modules, turn on eager linking, profile with `xclogparser`, and flatten your module dependencies. These are low-risk changes that give `llbuild` the visibility it needs to schedule work well.

The teams that treat build time as an architecture problem — not a hardware problem — are the ones who actually fix it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>WebSocket Connection Lifecycle in Mobile Apps</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Fri, 29 May 2026 08:41:41 +0000</pubDate>
      <link>https://dev.to/software_mvp-factory/websocket-connection-lifecycle-in-mobile-apps-5bep</link>
      <guid>https://dev.to/software_mvp-factory/websocket-connection-lifecycle-in-mobile-apps-5bep</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mobile&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;WebSocket&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Tuning&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;That&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Stops&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Silent&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Message&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Loss"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;deep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;dive&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;into&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;building&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;reconnection&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;machine&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Kotlin&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;took&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;delivery&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rates&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;~94%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;99.97%&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;lossy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mobile&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;networks,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;real&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;production&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;numbers."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, android, mobile, architecture&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/mobile-websocket-tuning-that-stops-silent-message-loss&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We're Building&lt;/span&gt;

In this workshop, we're going to build a &lt;span class="gs"&gt;**deterministic WebSocket reconnection state machine in Kotlin**&lt;/span&gt; that handles the ugly realities of mobile networks — doze mode, cell handoffs, carrier NAT expiration, and app backgrounding.

By the end, you'll understand the three-layer keepalive mismatch that silently kills mobile WebSocket connections, and you'll have working code for exponential backoff with jitter and a proper message drain queue. These patterns took our measured delivery rate from ~94% to 99.97% on lossy networks. Let me show you how.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Familiarity with Kotlin coroutines
&lt;span class="p"&gt;-&lt;/span&gt; Basic understanding of WebSocket connections (OkHttp or Ktor)
&lt;span class="p"&gt;-&lt;/span&gt; An Android project where you're maintaining a persistent WebSocket connection

&lt;span class="gu"&gt;## Step 1: Understand the Three-Layer Keepalive Mismatch&lt;/span&gt;

Here is the gotcha that will save you hours. There are &lt;span class="gs"&gt;**three distinct keepalive mechanisms**&lt;/span&gt; at play, and their defaults are wildly mismatched for mobile:

| Mechanism | Layer | Default Interval | Mobile OS Behavior |
|---|---|---|---|
| TCP Keep-Alive | Transport | 2 hours (Linux) | Suspended in Doze mode |
| WebSocket Ping/Pong | Application | None (optional) | Suspended when app backgrounded |
| HTTP/Proxy Timeout | Infrastructure | 60-120s | Unaware of mobile state |

TCP keep-alive defaults to two hours — effectively useless. Your load balancer kills idle connections in 60 seconds. And both app-level pings and TCP keepalives get suspended in Android Doze mode. The result: your app thinks it's connected, the server has already cleaned up the session, and messages land in a void.

&lt;span class="gu"&gt;## Step 2: Define the State Machine&lt;/span&gt;

Naive retry logic (&lt;span class="sb"&gt;`while(true) { connect(); delay(5000); }`&lt;/span&gt;) gives you thundering herds after outages and duplicate delivery during partial failures. Here is the minimal setup to get this working — a deterministic state machine:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
enum class ConnectionState {&lt;br&gt;
    DISCONNECTED,&lt;br&gt;
    CONNECTING,&lt;br&gt;
    CONNECTED,&lt;br&gt;
    WAITING_FOR_RETRY,&lt;br&gt;
    BACKING_OFF,&lt;br&gt;
    DRAINING_QUEUE&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
The states most implementations miss are `BACKING_OFF` and `DRAINING_QUEUE`. When a reconnection succeeds, you **cannot** immediately resume normal operation. You must first drain any queued messages in order, confirming delivery of each before sending the next. Skipping this step is where that 3-8% silent message loss hides.

## Step 3: Tune Your Heartbeat Intervals

Through production testing across ~200K daily active mobile connections, we converged on these values:

| Parameter | Value | Rationale |
|---|---|---|
| App-level ping interval | 25s | Below typical LB idle timeout (60s) |
| Ping timeout (pong expected) | 10s | Aggressive enough to detect dead connections |
| TCP keep-alive interval | 30s | Overridden from 2h default via socket options |
| Initial reconnect delay | 500ms | Fast enough for transient drops |
| Max backoff ceiling | 30s | Prevents multi-minute gaps |
| Jitter range | 0-50% of delay | Prevents thundering herd |

The docs do not mention this, but the 25-second ping interval is deliberate. We measured one major US carrier expiring NAT mappings at 28 seconds on their LTE network. Many teams set pings to 30 or 60 seconds and wonder why connections drop on cellular.

## Step 4: Implement Exponential Backoff with Jitter

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
fun nextDelay(attempt: Int): Long {&lt;br&gt;
    val exponential = minOf(&lt;br&gt;
        MAX_BACKOFF_MS,&lt;br&gt;
        INITIAL_DELAY_MS * 2.0.pow(attempt).toLong()&lt;br&gt;
    )&lt;br&gt;
    val jitter = (exponential * Random.nextDouble(0.0, 0.5)).toLong()&lt;br&gt;
    return exponential + jitter&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Without jitter, a server restart causes every client to reconnect at exactly the same intervals, creating predictable load spikes. In our load tests, removing jitter turned a 12-second recovery into a 45-second cascading failure. Don't skip the jitter.

## Step 5: Handle Android Doze Mode as a Network Event

Here is a pattern I use in every project that maintains a persistent connection on Android. When Android enters Doze mode, network access is batched into maintenance windows. Your ping timer fires, but the packet doesn't leave the device. When the maintenance window opens, a stale ping goes out, the server has already timed you out, and you get a close frame — or worse, nothing at all.

The fix: listen for `ACTION_DEVICE_IDLE_MODE_CHANGED` broadcasts and treat Doze entry as a **controlled disconnect**. Preemptively move to `DISCONNECTED` state, queue outbound messages, and reconnect immediately on Doze exit. This single change moved our measured delivery rate from 94.2% to 99.6%.

The remaining 0.37% came from proper `DRAINING_QUEUE` handling and server-side message deduplication using idempotency keys.

## Gotchas

- **Most WebSocket libraries leave keepalive at OS defaults.** OkHttp and Ktor wrappers give you a clean API but don't configure socket-level options. Those defaults were designed for servers, not a phone riding the subway. Always configure explicitly.
- **Testing on WiFi in the foreground on a charged device tells you nothing.** Production users are on congested LTE, walking into elevators, with battery saver enabled. The gap between lab and production is enormous.
- **The dead connection problem is about awareness, not connectivity.** A TCP socket can appear open for minutes after the actual network path has failed. Your first priority is detecting death fast, not preventing it.
- **The [jqwik incident](https://news.ycombinator.com/item?id=48319968)** — where a developer embedded a prompt injection in their library that instructed AI coding agents to delete application output — is a reminder that hidden behaviors in dependencies cause real damage. Audit what your WebSocket library actually does under the hood.

## Wrapping Up

The three changes that matter most:

1. **Override TCP keep-alive at the socket level.** Set it to 30 seconds and pair it with a 25-second application-level ping.
2. **Build a state machine, not a retry loop.** Include `DRAINING_QUEUE` as a first-class state and confirm delivery of buffered messages before resuming normal flow.
3. **Treat OS power states as network events.** Proactively disconnect on Doze entry and reconnect on exit instead of waiting for timeout detection, which can take 30+ seconds and silently drop messages.

These patterns are framework-agnostic — whether you're on OkHttp, Ktor, or rolling your own, the principles hold. Start by measuring your actual delivery rate on cellular networks. You might be surprised by how much you're silently losing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
