Hitesh Sachdeva

Posted on Dec 6, 2025

When a Bug Isn't Actually a Bug: My Kestra Contribution Story

#java #opensource #backend #learning

Overview

This week, I focused on a bug in Kestra, a workflow orchestration platform. Among my various open-source contributions, this one taught me an important lesson: sometimes what looks like a bug is actually a carefully considered design decision.

The Problem

I found an issue where the followExecution API endpoint was sending empty execution objects to users. When someone called this endpoint to track a workflow execution in real time using Server-Sent Events (SSE), the first event they received was produced by code like this:

// First event sent - almost empty
emitter.next(Event.of(Execution.builder()
    .id(executionId)
    .build()
).id("start"));

The object only had an ID with no actual execution data. This confused people using the SDK because they'd receive this empty object and wonder what went wrong. The issue also mentioned that if someone sent an invalid execution ID, they'd still get this fake object instead of a proper error.

My Solution

I spent time working on a fix that seemed straightforward: validate that the execution exists first, then send complete data. My approach was to call Await.until() to poll the database every 500 ms, for up to 10 seconds, until the execution showed up:

// My approach - validate first
Execution execution = Await.until(
    () -> executionRepository.findById(tenantService.resolveTenant(), executionId).orElse(null),
    Duration.ofMillis(500),
    Duration.ofSeconds(10)
);

// Send complete execution data
emitter.next(Event.of(execution).id("start"));

I also added error handling that returned a 404 for invalid execution IDs, and wrote unit tests to validate the behavior. Everything worked: the first event now contained real data instead of an empty shell.
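
The wait-then-send idea boils down to a poll loop with a timeout. Below is a minimal sketch of that loop in plain Java. Poll.until is a hypothetical stand-in written for this post, not Kestra's Await class, and the empty result is the point where a caller could answer with a 404.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stand-in for a polling helper; this is NOT Kestra's Await class.
final class Poll {
    /**
     * Calls lookup every `interval` until it returns a non-null value
     * or `timeout` has elapsed. Returns Optional.empty() on timeout.
     */
    static <T> Optional<T> until(Supplier<T> lookup, Duration interval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T value = lookup.get();
            if (value != null) {
                return Optional.of(value);
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty(); // a caller could map this to a 404
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return Optional.empty();
            }
        }
    }
}
```

With an interval of 500 ms and a timeout of 10 seconds, the lookup runs roughly 20 times before giving up.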

Why It Got Rejected

The maintainer reviewed my PR and explained it didn't actually solve the problem. I was confused at first because my code clearly sent complete data. Then he pointed me to an earlier discussion where another maintainer had explained a critical constraint I missed.

The empty object was intentional. It exists to prevent resource leaks.

Here's the issue: Server-Sent Events work by keeping a connection open between server and client. If someone closes their browser tab before any data has been sent, the server may never notice that the client disconnected. The connection just hangs there indefinitely, wasting memory and resources. These "phantom" connections can pile up and cause serious problems.

The solution was to send something, anything, immediately when the connection opens. This way, the response stream is actually started and the server can detect when clients disconnect. Even though it's just an empty object with an ID, it serves this important technical purpose.
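
To make the mechanism concrete, here is a minimal sketch using plain java.io streams instead of Kestra's actual reactive stack. SseWriter and ClosedStream are hypothetical names for this example. It illustrates one common way a server learns that a client is gone: a write to the connection fails. A handler that has not written anything yet has had no such opportunity.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Kestra code: writes SSE frames to a raw stream.
final class SseWriter {
    private final OutputStream out;
    private boolean clientGone = false;

    SseWriter(OutputStream out) {
        this.out = out;
    }

    /** Formats one event in the text/event-stream wire format (single-line data only). */
    static String frame(String id, String data) {
        return "id: " + id + "\n" + "data: " + data + "\n\n";
    }

    /** Returns false if the write failed, which is how a closed client surfaces here. */
    boolean send(String id, String data) {
        if (clientGone) {
            return false;
        }
        try {
            out.write(frame(id, data).getBytes(StandardCharsets.UTF_8));
            out.flush();
            return true;
        } catch (IOException e) {
            clientGone = true; // from now on the handler can release its resources
            return false;
        }
    }

    boolean isClientGone() {
        return clientGone;
    }
}

// Stand-in for a socket whose peer has already closed the connection.
final class ClosedStream extends OutputStream {
    @Override
    public void write(int b) throws IOException {
        throw new IOException("Broken pipe");
    }
}
```

Calling send("start", ...) right after the connection opens means the very first thing the handler does is a write, so a dead connection has a chance to show up before any waiting begins.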

Understanding the Trade-off

This was a classic engineering trade-off:

Option A: Send empty data immediately

Pro: Prevents resource leaks
Pro: Server can detect disconnections
Con: Confuses API users

Option B: Wait and send complete data

Pro: Better developer experience
Pro: Clear, complete information
Con: Risk of phantom connections and memory leaks

For a workflow orchestration platform that might handle thousands of concurrent executions, preventing resource leaks was more important than having perfect API responses. The team chose stability over convenience.

What Happened Next

After internal discussion, the team decided to close the issue. The empty object behavior would stay, and they'd just document this quirk in the SDK so developers would understand why the first event is incomplete.
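
On the consuming side, that documentation could boil down to "treat the first event as a handshake". Here is a rough sketch of the idea, assuming hypothetical FollowEvent and FollowFilter types rather than the real SDK classes, and relying only on the fact that the placeholder is sent with the SSE id "start":

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side types; the real SDK's classes may look different.
final class FollowEvent {
    final String sseId;       // value of the SSE "id" field, e.g. "start"
    final String executionId; // the only field set on the placeholder
    final String state;       // null on the placeholder event

    FollowEvent(String sseId, String executionId, String state) {
        this.sseId = sseId;
        this.executionId = executionId;
        this.state = state;
    }
}

final class FollowFilter {
    /** Drops the initial placeholder so callers only see populated executions. */
    static List<FollowEvent> withoutPlaceholder(List<FollowEvent> events) {
        List<FollowEvent> result = new ArrayList<>();
        for (FollowEvent event : events) {
            if ("start".equals(event.sseId)) {
                continue; // connection handshake, carries no execution data
            }
            result.add(event);
        }
        return result;
    }
}
```

Which ids the later events carry is not something this sketch assumes; it only skips the one id shown in the server snippet.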

My PR was closed without being merged, but the maintainer took time to explain why. He could have just rejected it immediately, but instead he helped me understand the architectural reasoning.

What I Actually Learned

About Server-Sent Events:

I learned how SSE connections work and why connection lifecycle management matters. The immediate response pattern prevents resource leaks - something I hadn't considered when I first read the issue.

About Trade-offs:

Not every problem has a perfect solution. Sometimes you choose the "less bad" option. In this case:

System stability vs perfect API responses
Resource management vs user confusion
Code fixes vs documentation

The team chose stability because Kestra needs to handle production workloads reliably.

About Reading Code:

When I saw "fake empty object" in the issue description, I assumed it was obviously wrong and jumped straight to fixing it. I should have asked myself: why was it implemented this way in the first place? There's usually a reason.

About Communication:

I asked the maintainer to clarify what I was missing, and he explained the internal discussion and decision. Sometimes issues get closed not because your code is bad, but because the team decides the current behavior is actually acceptable given the constraints.

Key Takeaways

Not all "bugs" are actually bugs. Sometimes weird behavior exists for good technical reasons. Ask why before assuming something is wrong.

Trade-offs are everywhere. Perfect solutions rarely exist. Teams make choices based on priorities - in this case, system stability over API convenience.

Closed PRs still teach you things. I learned about SSE, resource management, and architectural trade-offs even though nothing merged.

Context matters. Maintainers understand the bigger picture - system requirements, production constraints, past problems. They might reject your fix because they know something you don't.

Moving Forward

This experience changed how I approach issues:

Read the existing implementation carefully and ask why it works that way
Look for comments explaining unusual patterns
Ask about constraints before proposing solutions
Understand that maintainers aren't being difficult - they're protecting the system

Not every contribution makes it to production, but every attempt teaches you something about real-world software development.
