Lessons learned from contributing to Apache Airflow after getting my first PR merged - complete rewrite, 7 CI failures, and persistence
After getting my first Apache Airflow PR merged (#58587), I felt pretty confident about the contribution process. So when I found another bug, I jumped right in with what seemed like a perfect solution.
Two weeks and a complete rewrite later, my second PR (#59938) is now merged into Apache Airflow. Here's the real story: the good, the messy, and what changed.
Finding the Bug
It started with a production issue I encountered. Our Airflow scheduler kept crashing with a cryptic error:
InvalidStatsNameException: The stat name (pool.running_slots.data engineering pool π)
has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
Someone had created a pool with spaces and an emoji. Airflow accepted it, but when trying to report metrics, everything broke.
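Here's a minimal sketch of why that name trips the check. It assumes only the rule quoted in the error message (ASCII letters, digits, underscore, dot, dash). `is_valid_stat_name` is a hypothetical helper I wrote for illustration, not Airflow's actual validator, and the rocket emoji is a stand-in for the one in our pool name.

```python
import string

# Characters the error message says a stat name may contain.
ALLOWED_STAT_CHARS = frozenset(string.ascii_letters + string.digits + "_.-")


def is_valid_stat_name(stat_name: str) -> bool:
    """Hypothetical check mirroring the rule quoted in the error message."""
    return bool(stat_name) and all(ch in ALLOWED_STAT_CHARS for ch in stat_name)


# "\U0001F680" (a rocket) is a stand-in emoji for illustration.
pool_name = "data engineering pool \U0001F680"
stat_name = f"pool.running_slots.{pool_name}"

print(is_valid_stat_name("pool.running_slots.default_pool"))  # True
print(is_valid_stat_name(stat_name))  # False
```

The spaces alone are enough to fail the check; the emoji just makes it fail twice.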
Having just gotten my first PR merged, I thought: "I know how this works now. I can fix this quickly."
Spoiler: It wasn't quick.
My "Perfect" Solution: Validation
My approach seemed obvious:
- Add validation when creating pools
- Only allow ASCII letters, numbers, underscores, dots, and dashes
- Reject invalid pool names with a clear error
import re


def validate_pool_name(name: str) -> None:
    # Reject any name containing characters outside the stats-safe set.
    if not re.match(r"^[a-zA-Z0-9_.-]+$", name):
        raise ValueError(
            f"Pool name '{name}' is invalid. Pool names must only contain "
            "ASCII alphabets, numbers, underscores, dots, and dashes."
        )
I wrote tests, updated the news fragment, and submitted the PR with confidence.
Problem solved! Or so I thought.
The Feedback That Humbled Me
@potiuk (Apache Airflow PMC member) reviewed my PR:
"I do not think it's a good idea to raise issue at pool creation time. This will mean that when you create an invalid pool, things will start crashing soon after. That's quite wrong behaviour."
He suggested a completely different approach: normalize the pool names for stats reporting instead of preventing them.
My heart sank. I'd spent hours on this validation approach, written tests, updated docs. But he was right:
The Problems With My Solution:
- Users with existing "invalid" pools would be stuck
- Migration would be complex and painful
- It would break backward compatibility
- Pools would be created but then crash the scheduler
The Better Approach:
- Keep existing pools working
- Normalize names only for stats reporting
- Warn users, but don't break their systems
- Graceful degradation instead of failures
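To make the trade-off concrete, here's a small sketch that runs the creation-time rule over a few pool names that might already exist in a deployment. The names in `existing_pools` and the helper `split_by_validity` are hypothetical, made up for illustration.

```python
import re

# Same character rule as the validation approach.
VALID_POOL_NAME = re.compile(r"^[a-zA-Z0-9_.-]+$")

# Hypothetical pool names that might already exist in a deployment.
existing_pools = ["default_pool", "etl-pool.v2", "data engineering pool", "team#1"]


def split_by_validity(names):
    """Partition names into those the rule accepts and those it rejects."""
    valid = [n for n in names if VALID_POOL_NAME.match(n)]
    rejected = [n for n in names if not VALID_POOL_NAME.match(n)]
    return valid, rejected


valid, rejected = split_by_validity(existing_pools)
print(valid)     # ['default_pool', 'etl-pool.v2']
print(rejected)  # ['data engineering pool', 'team#1']
```

Every name in `rejected` is a pool someone is already using. A strict rule gives those users nothing but an error, while normalizing only at reporting time leaves their pools alone.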
Lesson 1: "Working" code isn't the same as "right" code.
My first PR was accepted with minor tweaks. This time, I needed to completely rethink the solution.
The Rewrite: Normalization
I threw away my validation code and started fresh:
def normalize_pool_name_for_stats(name: str) -> str:
    """
    Normalize pool name for stats reporting by replacing invalid characters.

    Stats names must only contain ASCII alphabets, numbers, underscores,
    dots, and dashes. Invalid characters are replaced with underscores.
    """
    # Check if normalization is needed
    if re.match(r"^[a-zA-Z0-9_.-]+$", name):
        return name

    # Replace invalid characters with underscores
    normalized = re.sub(r"[^a-zA-Z0-9_.-]", "_", name)

    # Log warning
    logger.warning(
        "Pool name '%s' contains invalid characters for stats reporting. "
        "Reporting stats with normalized name '%s'. "
        "Consider renaming the pool to avoid this warning.",
        name,
        normalized,
    )
    return normalized
Instead of preventing "bad" pool names, we:
- Accept any pool name (backward compatible)
- Normalize it when reporting metrics (fixes the crash)
- Log a warning (educates users)
- Suggest renaming (guides to best practices)
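Here's what that looks like when you actually call it. This is a self-contained sketch: the function body is the one from this post, while the `logger` setup and the sample pool names are assumptions for illustration (the rocket emoji stands in for the one in our pool name).

```python
import logging
import re

logger = logging.getLogger(__name__)


def normalize_pool_name_for_stats(name: str) -> str:
    """Replace characters that are not valid in a stat name with underscores."""
    if re.match(r"^[a-zA-Z0-9_.-]+$", name):
        return name  # Already safe: no change, no warning.
    normalized = re.sub(r"[^a-zA-Z0-9_.-]", "_", name)
    logger.warning(
        "Pool name '%s' contains invalid characters for stats reporting. "
        "Reporting stats with normalized name '%s'. "
        "Consider renaming the pool to avoid this warning.",
        name,
        normalized,
    )
    return normalized


# A valid name passes through untouched.
print(normalize_pool_name_for_stats("default_pool"))  # default_pool

# Each space and the emoji become one underscore each; a warning is logged.
print(normalize_pool_name_for_stats("data engineering pool \U0001F680"))
# data_engineering_pool__
```

One property worth knowing about this approach: two different pool names can normalize to the same stat name (for example, "a b" and "a_b" both become "a_b"), which is another reason the warning suggests renaming.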
This was objectively better. And I would never have thought of it without the feedback.
Lesson 2: Maintainers see the bigger picture. Listen to them.
The Static Check Marathon
I pushed my rewritten code. CI failed:
- Missing blank lines
- Import order wrong
- LoggingMixin usage incorrect
- Missing 're' module import
I fixed them. Pushed again. CI failed again with different formatting issues.
This happened SEVEN TIMES.
Each time:
- CI would auto-format and show what it wanted
- I'd try to apply the fixes manually (Windows, no local pre-commit)
- I'd push
- New formatting errors would appear
By attempt #5, I was frustrated. By attempt #7, I was questioning my career choices.
Lesson 3: Set up your local environment properly BEFORE you start coding.
The Breakthrough
On attempt #8, I finally:
- Used PowerShell to apply exact formatting from CI diffs
- Added proper blank lines (2 after logger, 2 after functions)
- Fixed import order alphabetically
- Replaced LoggingMixin with module-level logger
import logging
import re

logger = logging.getLogger(__name__)


def normalize_pool_name_for_stats(name: str) -> str:
    # Function code...
    return normalized


class Pool(Base):
    # Class code...
All checks passed!
@potiuk approved with "Two nits" (which I quickly fixed). Minutes later, the PR was merged into Apache Airflow's main branch.
Lesson 4: Persistence beats perfection. Keep going.
First PR vs Second PR: The Differences
My First PR (#58587):
- Time: 3 days
- Major rewrites: 0
- CI failures: 2
- Commits: 4
- Learned: The contribution process
- Status: Merged
My Second PR (#59938):
- Time: 2 weeks
- Major rewrites: 1 (complete approach change)
- CI failures: 7
- Commits: 16
- Learned: How to handle feedback, rewrites, and persistence
- Status: Merged into Apache Airflow
The second one taught me WAY more.
What I Learned (That My First PR Didn't Teach Me)
Technical Lessons:
1. Think About Backward Compatibility
My first solution would have broken existing users. Always ask: "What happens to people already using this?"
2. Graceful Degradation > Hard Failures
Warn users and normalize data instead of crashing. Systems should be resilient.
3. Pre-commit Hooks Are Non-Negotiable
Don't use CI as your formatter. Set up pre-commit locally FIRST.
4. Read Diffs Carefully
CI was telling me exactly what it wanted. I just needed to pay attention.
Soft Skills Lessons:
1. Be Ready to Throw Away Your Work
I spent hours on validation code. All of it went in the trash. That's okay. It's part of learning.
2. Feedback Isn't Personal
Potiuk wasn't criticizing me. He was helping me build something better. There's a huge difference.
3. Persistence Matters More Than Talent
7 CI failures felt embarrassing. But I kept going, and eventually it worked.
4. Document Your Thinking
In my PR description, I explained WHY I chose normalization after feedback. This helped reviewers understand my thought process.
Advice for Your Second (or Third, or Tenth) PR
When You Get Critical Feedback:
Don't:
- Defend your solution immediately
- Make minimal changes hoping they'll accept it
- Take it personally
Do:
- Read the feedback carefully (twice!)
- Ask questions if you don't understand
- Be willing to start over if needed
- Thank reviewers for their time
When CI Keeps Failing:
Don't:
- Push 10 commits trying to guess the fix
- Ignore error messages
- Give up after 3 failures
Do:
- Set up local pre-commit hooks
- Read the CI diff output carefully
- Apply fixes locally and test before pushing
- Ask for help if you're stuck after 3-4 attempts
When You Need to Rewrite:
Don't:
- Try to salvage the old approach
- Rush the rewrite
- Skip tests because you're frustrated
Do:
- Start with a clean slate
- Think through the new approach carefully
- Write better tests based on what you learned
- Update documentation to match
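For the "write better tests" point, here's roughly the kind of test I mean after a rewrite like mine. This is a sketch using the standard library's `unittest`, not the tests from the actual PR. The logger name `pool_stats_example` is hypothetical, and the function is a copy of the one from this post with a shortened warning message.

```python
import logging
import re
import unittest

logger = logging.getLogger("pool_stats_example")  # hypothetical logger name


def normalize_pool_name_for_stats(name: str) -> str:
    """Copy of the normalization function, with a shortened warning message."""
    if re.match(r"^[a-zA-Z0-9_.-]+$", name):
        return name
    normalized = re.sub(r"[^a-zA-Z0-9_.-]", "_", name)
    logger.warning("Pool name '%s' reported as '%s'.", name, normalized)
    return normalized


class NormalizePoolNameTest(unittest.TestCase):
    def test_valid_name_is_returned_unchanged(self):
        self.assertEqual(normalize_pool_name_for_stats("etl-pool.v2"), "etl-pool.v2")

    def test_invalid_characters_become_underscores(self):
        # The new behavior: normalize and warn exactly once, never raise.
        with self.assertLogs(logger, level="WARNING") as captured:
            normalized = normalize_pool_name_for_stats("data engineering pool")
        self.assertEqual(normalized, "data_engineering_pool")
        self.assertEqual(len(captured.records), 1)

    def test_result_only_contains_stats_safe_characters(self):
        with self.assertLogs(logger, level="WARNING"):
            normalized = normalize_pool_name_for_stats("pool \U0001F680 #1")
        self.assertRegex(normalized, r"^[a-zA-Z0-9_.-]+$")


# Run the suite directly so the sketch works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizePoolNameTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The tests changed shape along with the approach: the old ones would have asserted that bad names raise `ValueError`, while these assert that bad names still work and produce a warning.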
Why You Should Keep Contributing
My first PR was smooth sailing. My second was rough waters.
That's exactly how learning works.
Each contribution teaches you something new:
- First PR: The basics (fork, commit, PR, review)
- Second PR: Handling feedback and major rewrites
- Third PR: You'll discover this next!
For your career:
- Real-world experience with design decisions
- Proof you can handle feedback and pivot
- Stories to tell in interviews ("I once had to completely rewrite my approach...")
- Connections with senior engineers
For your skills:
- Understanding trade-offs (validation vs normalization)
- Production-quality code standards
- Communication under pressure
- Resilience and persistence
Your Turn
If you've contributed once and it went well, do it again.
The second one will probably be harder. You might get asked to rewrite. CI might fail repeatedly. Reviewers might question your approach.
That's when the real learning happens.
My second PR took 4x longer than my first. It also taught me 10x more.
What will your next contribution teach you?
Resources
My Merged PRs:
- First PR (Merged): apache/airflow#58587
- Second PR (Merged): apache/airflow#59938
- Issue: apache/airflow#59935
Originally published on Medium: https://medium.com/@kalluripradeep99/rewriting-my-apache-airflow-pr-when-your-first-solution-isnt-the-right-one-8c4243ca9daf
