DEV Community

Steven Hur


Continuous Journey through Dagster - bugs and testing

Lately, I've been diving deep into open-source contributions for Dagster. I think I'm getting a bit more comfortable with the codebase, which has sped up my workflow (or maybe that's placebo?). Today, I want to share the issues I've tackled recently and talk about a significant roadblock I'm currently facing.

My Recent Contributions
I focused on fixing several bugs and improving stability across different parts of Dagster. Here is a breakdown of the issues I worked on:

  1. Fixing ECS Pipes Client Execution

The Issue: Users were encountering an IndexError when launching tasks using the PipesECSClient. This caused pipelines to crash unexpectedly in ECS environments.

The Fix: I added proper exception handling and bounds checking to ensure the client launches tasks smoothly without crashing on index errors.
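To illustrate the kind of bounds check I mean, here is a minimal sketch. It assumes the IndexError came from indexing into an empty `tasks` list in an ECS `run_task`-style response; `first_task_or_raise` and `TaskLaunchError` are hypothetical names for this post, not the actual Dagster code.

```python
from typing import Any, Mapping


class TaskLaunchError(RuntimeError):
    """Hypothetical error type used only for this illustration."""


def first_task_or_raise(response: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the first launched task, or raise a readable error.

    Assumes a response shaped like {"tasks": [...], "failures": [...]}.
    """
    tasks = response.get("tasks") or []
    if tasks:
        return tasks[0]

    # No task was launched: surface the reported failure reasons
    # instead of letting `tasks[0]` raise a bare IndexError.
    failures = response.get("failures") or []
    reasons = ", ".join(f.get("reason", "unknown reason") for f in failures)
    raise TaskLaunchError(
        f"ECS returned no tasks: {reasons or 'no failure details returned'}"
    )
```

The point is simply that an empty list becomes an error message that explains why the launch failed, rather than a crash with no context.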

Issue #32936

  2. Resolving Asset Specs Mapping Dependencies

The Issue: There was a logic error in AssetsDefinition.map_asset_specs that caused failures when attempting to add dependencies while input definitions were already set.

The Fix: I adjusted the core logic to correctly handle the mapping of asset specs even when inputs are pre-configured.

Issue #32913

  3. [WIP] Correcting Asset Sensor Event Processing

The Issue: The asset_sensor had a critical bug where it would only process the last materialization event if multiple partitions materialized simultaneously. This issue stems from a race condition, making it notoriously difficult to reproduce and debug in a local environment.

The Fix: This is still a work in progress. As a first step, I modified the sensor logic so that every materialization event is captured and processed, regardless of concurrency. Further progress will need a precise approach and careful testing.
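The idea of processing every event rather than only the latest one can be sketched in plain Python. This is not the real asset_sensor code; `MaterializationEvent`, `storage_id`, and `events_since` are hypothetical names, and I'm assuming a cursor that stores the ID of the last processed event.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MaterializationEvent:
    storage_id: int  # hypothetical monotonically increasing event ID
    partition: str


def events_since(
    events: Iterable[MaterializationEvent], cursor: int
) -> Tuple[List[MaterializationEvent], int]:
    """Return ALL events newer than the cursor, plus the advanced cursor.

    Looking only at the single latest event would skip partitions that
    materialized between two sensor ticks.
    """
    new_events = sorted(
        (e for e in events if e.storage_id > cursor),
        key=lambda e: e.storage_id,
    )
    new_cursor = new_events[-1].storage_id if new_events else cursor
    return new_events, new_cursor
```

If three partitions materialize between two ticks, all three come back in order, and the cursor moves to the newest one so nothing is processed twice.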

Issue #32853

  4. [WIP] Implementing Merge Support for Polars & Delta Lake

The Use Case: Currently, the dagster-deltalake I/O manager allows writing data, but it lacks out-of-the-box support for the merge operation when using Polars.

The Implementation: I am working on updating the dagster_deltalake/handler.py to support merge mode. The logic involves checking if the write mode is set to merge. If so, instead of calling the standard write_deltalake() function, it creates a DeltaTable object and executes the merge operation.
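Roughly, the branching looks like the sketch below. It is not the code in handler.py: `write_table`, `build_merge_predicate`, and `merge_keys` are hypothetical names, and the upsert behavior (update matched rows, insert the rest) is my assumption about what a merge should do here. It relies on the `deltalake` package's `write_deltalake` and `DeltaTable.merge`.

```python
from typing import Any, Sequence


def build_merge_predicate(
    keys: Sequence[str], source_alias: str = "s", target_alias: str = "t"
) -> str:
    """Build a join predicate such as 't.id = s.id AND t.day = s.day'."""
    if not keys:
        raise ValueError("merge mode needs at least one key column")
    return " AND ".join(f"{target_alias}.{k} = {source_alias}.{k}" for k in keys)


def write_table(
    table_uri: str,
    data: Any,  # e.g. a pyarrow Table from a Polars DataFrame's .to_arrow()
    mode: str = "overwrite",
    merge_keys: Sequence[str] = (),
) -> Any:
    # Imported lazily so the helper above works without deltalake installed.
    from deltalake import DeltaTable, write_deltalake

    if mode != "merge":
        # Standard path: append, overwrite, etc.
        write_deltalake(table_uri, data, mode=mode)
        return None

    # Merge path: open the existing table and upsert on the key columns.
    return (
        DeltaTable(table_uri)
        .merge(
            source=data,
            predicate=build_merge_predicate(merge_keys),
            source_alias="s",
            target_alias="t",
        )
        .when_matched_update_all()
        .when_not_matched_insert_all()
        .execute()
    )
```

With a Polars DataFrame `df`, a call might look like `write_table("path/to/table", df.to_arrow(), mode="merge", merge_keys=["id"])`, assuming the table already exists at that path.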

Issue #32644

The CI
While fixing the code was satisfying, getting the Pull Requests (PRs) merged has been a different story. I am currently stuck in a loop regarding CI tests.

The Situation:

  1. I run the unit tests in my local environment, and everything passes.
  2. I push the code to GitHub, and the CI pipeline fails.
  3. Because of this, I can't get a proper code review from the maintainers.

It is frustrating because I cannot reproduce the errors locally. It could be an environment configuration mismatch, a linting rule that strictly applies in CI, or a hidden dependency issue.
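One cheap way to check the environment-mismatch theory is to print the same "fingerprint" locally and in CI and diff the two. Here is a small sketch using only the standard library; `environment_fingerprint` is a hypothetical helper, and the package names passed in are just examples.

```python
import json
import platform
from importlib import metadata
from typing import Dict, Iterable


def environment_fingerprint(packages: Iterable[str]) -> Dict[str, str]:
    """Collect the Python version, platform, and selected package versions."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    # Example package list; swap in whatever the failing tests depend on.
    print(json.dumps(environment_fingerprint(["dagster", "pytest"]), indent=2))
```

Running this in both places will not explain a failure on its own, but a different Python or dependency version is a good first lead.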

Next Steps
I plan to reach out to the Dagster team and the community for guidance. I need to understand how their CI environment differs from a standard local setup so I can replicate the failure and fix it. Sometimes reading thousands of lines of code and fixing errors is easier than testing.
