DEV Community

Kavya
Kavya

Posted on

I opened a pull request on Netflix's codebase. It didn't get merged. Here's why that still mattered.

Let me be honest with you. I was not sitting at a desk with a plan when this happened. I was looking for open source repositories to contribute to, mostly because I needed to build a portfolio and everyone online keeps saying "contribute to open source" without really telling you what that means when you are just starting out.
So I started applying.Metaflow caught my attention because it is a Python ML pipeline framework built by Netflix engineers and used in actual production machine learning workflows. Not a toy project. Not a tutorial repo. Something real people depend on.
I found issue #3144.

The bug
timeout_decorator.py had a problem. The file used signal.SIGALRM — a POSIX signal that lets you set an alarm for a process and interrupt it after a timeout. Clean concept. One problem: SIGALRM does not exist on Windows. It is a Unix-only feature.
So if you were a Windows user trying to use Metaflow's @timeout decorator, Python would throw an AttributeError or ImportError when it hit the signal.SIGALRM line. No helpful message. No graceful exit. Just a crash that tells you nothing useful about why it happened.
This is one of those bugs that is easy to miss when you are building on a Mac or Linux machine, which most ML engineers are. But for anyone on Windows trying to use the framework, it was a silent wall.

What I did
The fix itself was not complicated. I added import sys at the top of the file and wrapped the SIGALRM-dependent code in platform checks.
In task_pre_step, before any SIGALRM logic runs, I added:
pythonif sys.platform == "win32":
raise RuntimeError(
"The @timeout decorator is not supported on Windows "
"(SIGALRM is POSIX-only)."
)
In task_post_step, I added a guard so the cleanup code — signal.alarm(0) — would also be skipped on Windows:
pythonif sys.platform != "win32":
signal.alarm(0)
Eight lines changed. Five removed, eight added. The idea was simple: if you are on Windows, fail immediately with a clear message instead of crashing with a confusing error somewhere deeper in the stack.

The review
This is where it got interesting.
An automated reviewer called Greptile analyzed the PR and gave it a confidence score of 4 out of 5. Safe to merge. The logic was sound. The platform guard was correctly placed before any SIGALRM access could be reached, so the module-level import signal was harmless even on Windows.
But then it flagged something I had missed.

"The error raised here is RuntimeError, but the rest of this decorator uses MetaflowException for user-facing configuration errors."

It was right. I had used a generic Python exception. But everywhere else in that file — and across Metaflow's decorator system generally — they raise MetaflowException, which is the project's own exception class. It routes through Metaflow's formatting pipeline, gets caught cleanly by @catch, and produces a properly formatted error message for the user.
My RuntimeError bypassed all of that. It would work. But it was not how Metaflow does things. It was like writing a fix that solved the problem but ignored the style guide of the codebase you were writing it in.
That is a real thing you only learn by actually reading a large production codebase carefully. Not from tutorials. Not from LeetCode. From opening files and understanding how the pieces fit together.

Why it was closed
The PR was closed without merging. The maintainer closed it, which means either the fix needed changes, or there was something else going on with the issue. My best guess: they wanted MetaflowException instead of RuntimeError before merging, and I had not yet updated the PR when it was closed.
I am not going to pretend that does not sting a little. You spend time reading a codebase used in Netflix's production ML systems, you find a real bug, you write a fix that a bot rates 4 out of 5, and it still does not get merged.
But here is what actually happened. A bot trained on software engineering standards reviewed my code and gave it a near-passing score. A real maintainer looked at it. I learned what MetaflowException is, why it exists, and why consistency in exception handling matters at scale. I learned that production codebases have conventions that go beyond "does the code work" — they care about "does this code fit the system it lives in."
That is not a lesson you get from building your own projects. You only get it by reading someone else's.

What I would do differently
Change RuntimeError to MetaflowException. That is it. One word. And honestly I learned more from that one bot comment than from most courses I have taken, because it was specific, it was about real code I wrote, and it pointed to something I had to go understand before I could fix it. go understand before I could fix it.
If you are a student trying to break into engineering and you have been told to "just contribute to open source" without being told how — here is what actually helped me:
Find a bug that is small enough that you can understand the whole fix but real enough that it matters. Read the files around the bug, not just the bug itself. Understand the conventions of the codebase before you write a single line. And submit the PR even if you are not sure. The review will teach you more than the research.
The PR is closed. The branch is still sitting there with unmerged commits. And I now know exactly what SIGALRM is, why it does not exist on Windows, what MetaflowException is and why Netflix engineers created it, and how production-grade exception handling differs from "just throw an error and see what happens."
That is worth a lot more than a merged commit.

The PR is #3185 on Netflix/metaflow if you want to read the review yourself.

Top comments (0)