Every October, the developer community buzzes with Hacktoberfest energy. PRs fly, t-shirts are earned, and GitHub turns green. But here's what nobody talks about: what happens in November?
For most contributors, the answer is simple: nothing. The repositories they contributed to become distant memories, the architecture they briefly touched remains unexplored, and the relationships they could have built with maintainers never formed.
I decided to do things differently this year. Let me tell you how one Hacktoberfest discovery turned into a masterclass in production AI systems.
The Problem: October-Only Open Source
Let's be honest about the Hacktoberfest paradox:
The Surface-Level Contribution Trap: Quick documentation fixes and typo corrections are valuable, don't get me wrong. But if that's ALL you do, you're missing the forest for the trees. The real learning happens when you understand why the code is structured a certain way, not just that a semicolon was missing.
Chasing Swag Over Growth: T-shirts and badges are nice, but they don't teach you how to build production-grade systems. The developers who grow fastest are the ones who stick around after October.
Missed Opportunities: The best open source contributions come from contributors who understand the codebase deeply. That takes time, more than one month.
The Discovery: Finding Skyflo.ai
While searching for interesting projects to contribute to, I stumbled upon Skyflo.ai, an open-source AI agent for DevOps and cloud-native operations.
As someone actively learning AI engineering and building agent architectures, this wasn't just another contribution opportunity. It was exactly the stack I wanted to learn:
LangGraph for stateful agent orchestration
MCP (Model Context Protocol) for standardized tool execution
Human-in-the-loop safety patterns
Kubernetes-native deployment
Instead of submitting a quick PR and moving on, I decided to dive deep.
What is Skyflo.ai?
Skyflo.ai is an AI copilot that unifies Kubernetes operations and CI/CD systems behind a natural-language interface. Instead of memorizing CLI commands or clicking through UIs, you just tell Skyflo.ai what you want:
Show me the last 200 lines of logs for checkout in production.
If there are errors, summarize them.
Or:
Progressively canary rollout auth-backend in dev through 10/25/50/100 steps
The magic is in how it does this safely, with human approval required for any mutating operation.
Understanding the Architecture
This is where the real learning happened. Skyflo's architecture is a textbook example of how to build production AI agent systems.
The Three-Layer Design
1. Frontend Layer: Command Center
Built with Next.js, TypeScript, and Tailwind
Real-time streaming: SSE to frontend, Streamable HTTP for MCP
Shows every action the agent takes in real-time
2. Intelligence Layer: The Engine
FastAPI backend with LangGraph workflows
Manages the Plan → Execute → Verify loop
Handles approvals and checkpoints
Real-time SSE updates to UI
3. Tool Layer: MCP Server
FastMCP implementation for tool execution
Standardized tools for kubectl, Helm, Jenkins, Argo Rollouts
Safe, consistent actions across all integrations
Why This Architecture Works
The separation of concerns is beautiful:
UI changes don't affect the agent logic
New tools can be added without touching the intelligence layer
Each component is independently deployable and testable
Key Learnings from Production AI Systems
1. LangGraph for Stateful Agents
Traditional LLM chains are stateless—you send a prompt, get a response, done. But real-world AI agents need to:
Remember context across multiple steps
Handle failures gracefully with checkpoints
Support human intervention at any point
LangGraph provides graph-based orchestration that enables all of this. The agent's workflow is defined as nodes and edges, with state persisted at each step.
Here's how Skyflo.ai implements this workflow in engine/src/api/agent/graph.py:
from langgraph.graph import StateGraph, START, END

def _build_graph(self) -> StateGraph:
    workflow = StateGraph(AgentState)

    # Define the workflow nodes
    workflow.add_node("entry", self._entry_node)
    workflow.add_node("model", self._model_node)
    workflow.add_node("gate", self._gate_node)
    workflow.add_node("final", self._final_node)

    # Define the flow
    workflow.add_edge(START, "entry")
    workflow.add_conditional_edges(
        "entry", route_from_entry,
        {"gate": "gate", "model": "model"},
    )
    workflow.add_conditional_edges(
        "model", route_after_model,
        {"gate": "gate", "model": "model", "final": "final"},
    )
    workflow.add_conditional_edges(
        "gate", route_after_gate,
        {"model": "model", "final": "final"},
    )
    workflow.add_edge("final", END)
    return workflow
This creates a stateful workflow in which the agent loops between the model node (planning and tool calls) and the gate node (approval and execution checkpoints) until it reaches the final node and the task is complete.
2. MCP: The USB-C for AI
Model Context Protocol is becoming the standard for how AI agents interact with tools. Instead of building custom integrations for each tool (the "M x N nightmare"), MCP provides:
A universal interface for tool discovery
Standardized invocation patterns
Clean separation between agent logic and tool implementation
Think of it as "OpenAPI for AI agents."
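To make the "M x N nightmare" concrete, here is a toy sketch in plain Python (not the real MCP SDK; every name here is hypothetical): each tool registers once against a single interface, and any agent can discover and invoke it without a bespoke integration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class ToolSpec:
    """Describes a tool the same way for every agent (discovery)."""
    name: str
    description: str
    handler: Callable[..., Any]

class ToolRegistry:
    """One shared interface between M agents and N tools, instead of M x N adapters."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def list_tools(self) -> List[Tuple[str, str]]:
        # Agents discover capabilities at runtime instead of hard-coding them
        return [(t.name, t.description) for t in self._tools.values()]

    def invoke(self, name: str, **kwargs: Any) -> Any:
        # Standardized invocation: the same call shape for every tool
        return self._tools[name].handler(**kwargs)

registry = ToolRegistry()
registry.register(ToolSpec(
    name="k8s_get",
    description="Fetch Kubernetes resources by kind and namespace",
    handler=lambda kind, namespace: f"{kind} in {namespace}",
))

print(registry.invoke("k8s_get", kind="pods", namespace="production"))
```

The real protocol adds JSON schemas, transports, and capability negotiation, but the core idea is this registry pattern: discovery plus a uniform invocation contract.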
3. Human-in-the-Loop is Non-Negotiable
For DevOps operations, automatic execution without approval is dangerous. Skyflo's pattern:
Agent creates a plan
User reviews and approves
Agent executes
Agent verifies the outcome
Repeat until complete
This Plan → Execute → Verify loop with human approval gates is a pattern every production AI system should adopt.
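The loop above can be sketched in a few lines. This is a simplified illustration, not Skyflo.ai's implementation; all four callables are hypothetical stand-ins.

```python
def run_task(plan_fn, execute_fn, verify_fn, approve_fn, max_iters=5):
    """Plan -> (human approval) -> Execute -> Verify loop.

    plan_fn proposes an action, approve_fn asks a human to confirm it,
    execute_fn performs it, and verify_fn reports whether the goal is met.
    """
    for _ in range(max_iters):
        action = plan_fn()
        if not approve_fn(action):          # human gate before any mutation
            return "aborted by user"
        result = execute_fn(action)
        if verify_fn(result):               # only stop once the outcome is verified
            return "done"
    return "gave up after max_iters"

# Toy usage: verification succeeds on the second iteration
attempts = []
status = run_task(
    plan_fn=lambda: "scale deployment to 3 replicas",
    approve_fn=lambda action: True,          # auto-approve for the demo
    execute_fn=lambda action: attempts.append(action) or len(attempts),
    verify_fn=lambda result: result >= 2,
)
print(status)  # done
```

The key design choice is that approval sits between planning and execution, so nothing mutating runs without a human seeing the plan first.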
4. Real-Time Streaming Builds Trust
Users need to see what the agent is doing. Skyflo.ai streams every action in real-time:
Tool invocations
Intermediate reasoning
Execution results
Verification steps
This transparency is critical when your agent is touching production infrastructure. Skyflo.ai streams events via SSE from the Engine to the UI, while the Engine communicates with the MCP server over Streamable HTTP transport for efficient tool execution.
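For anyone unfamiliar with SSE, the wire format is simple text frames. Here is a minimal sketch of how an engine might emit them (the event names and payloads are illustrative, not Skyflo.ai's actual schema):

```python
import json

def sse_event(event_type: str, payload: dict) -> str:
    """Format one Server-Sent Event frame.

    SSE is just text/event-stream: an 'event:' line, a 'data:' line,
    and a blank line terminating the frame.
    """
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"

def stream_agent_actions(actions):
    # Yield one frame per agent action so the UI can render progress live
    for action in actions:
        yield sse_event("tool_call", action)
    yield sse_event("done", {"status": "complete"})

frames = list(stream_agent_actions([{"tool": "kubectl_logs", "args": {"tail": 200}}]))
print("".join(frames))
```

Because each frame is flushed as it happens, the browser's EventSource API can show tool calls the moment they fire rather than after the whole run finishes.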
5. Safety-First Design
Key safety patterns I observed:
Dry-run by default for Helm operations
Diff-first before any apply
Approval gates for all mutations
Audit logging of every action
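"Dry-run by default" is worth pausing on, because it inverts the usual CLI default. A hypothetical helper (not Skyflo.ai's code) makes the pattern clear: mutations require an explicit opt-in, not an opt-out.

```python
def build_helm_upgrade(release: str, chart: str, *, approved: bool = False) -> list:
    """Build a helm upgrade command that is a dry run unless explicitly approved."""
    cmd = ["helm", "upgrade", release, chart]
    if not approved:
        cmd += ["--dry-run"]   # preview rendered manifests without applying them
    return cmd

print(build_helm_upgrade("auth-backend", "./charts/auth"))
print(build_helm_upgrade("auth-backend", "./charts/auth", approved=True))
```

The safe path is the default path: forgetting a flag gives you a preview, never a mutation.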
Architecture Note: Communication between components uses different protocols, each optimized for its use case:
Engine → UI: Server-Sent Events (SSE) for real-time user feedback
Engine → MCP Server: Streamable HTTP transport for tool execution
My Contributions to Skyflo.ai
Over the past few months, I've contributed multiple features and fixes to Skyflo.ai:
Features Shipped
Jenkins Build Control: Added tools to stop/cancel running builds, enabling full CI/CD lifecycle management
Kubernetes Rollout Management: Implemented rollout history and rollback tools for safer deployments
Helm Template Rendering: Added helm_template tool to preview manifests before deployment
Label Selector for K8s Resources: Enhanced k8s_get tool with label selectors for more precise resource queries
Chat History Search: Implemented debounced server-side search for better conversation management
Bug Fixes & UX Improvements
Message Continuity Fix: Resolved critical issue where chat messages disappeared after tool approval/denial
Approval Flow Refinement: Streamlined approval action handling and message finalization in the UI
Navigation Enhancements: Made logo clickable and added GitHub project link to navbar
Profile Management: Fixed button state management for profile updates
SSE Timeout Fix: Increased Nginx proxy timeouts to prevent 60-second SSE connection cutoffs
Documentation
Fixed architecture guide link in CONTRIBUTING.md to help new contributors
Each contribution taught me something new about production AI systems, from SSE streaming patterns to Kubernetes operations safety.
My Journey: Challenges and Breakthroughs
Contributing to Skyflo.ai wasn't always smooth sailing. Here are the challenges I faced and what I learned from them:
Understanding LangGraph State Management
The Challenge: At first, I didn't understand how state flows through the workflow nodes. The conditional edges and state updates seemed complex.
The Breakthrough: After reading through engine/src/api/agent/graph.py and experimenting locally, I realized that LangGraph's state is additive: each node returns updates that merge with the existing state. This pattern makes it easy to maintain conversation context while keeping nodes independent.
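That additive merge can be imitated in a few lines of plain Python. This mirrors the spirit of LangGraph's reducer mechanism rather than its actual API, and the field names are made up:

```python
import operator

# Per-field reducers: 'messages' accumulates, everything else overwrites
REDUCERS = {"messages": operator.add}

def merge_state(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the current state."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer and key in merged:
            merged[key] = reducer(merged[key], value)  # append, don't replace
        else:
            merged[key] = value
    return merged

state = {"messages": ["user: show logs"], "phase": "entry"}
state = merge_state(state, {"messages": ["agent: fetching logs"], "phase": "model"})
print(state)
```

Each node only has to return the fields it changed; the framework handles combining them, which is why nodes stay independent.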
Decoding the MCP Abstraction
The Challenge: The abstraction between the Engine, MCP Client, and MCP Server initially confused me. I couldn't understand why we needed three separate components.
The Breakthrough: Once I traced a tool call through the entire flow, it clicked:
Engine receives user intent via LLM
MCP Client (engine/src/api/services/mcp_client.py) acts as a bridge
MCP Server (mcp/tools/) executes actual kubectl/helm commands
This separation means you can swap out tools without touching the agent logic—brilliant architecture.
Grasping the Approval Flow
The Challenge: Understanding when and how approval gates trigger took time. The interaction between approval_decisions state and ApprovalPending exceptions was not immediately obvious.
The Breakthrough: I discovered that the gate node checks if a tool requires approval, then raises ApprovalPending which halts execution. The state is checkpointed, and when the user approves/denies, the workflow resumes from exactly where it stopped. This is production-grade error handling.
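The pause-and-resume mechanics can be sketched with an exception and a saved state dict. This is a deliberately simplified model of the pattern, not Skyflo.ai's gate node:

```python
class ApprovalPending(Exception):
    """Raised by the gate to halt the workflow until a human decides."""

def gate_node(state: dict) -> dict:
    """Hypothetical gate: block mutating tools that lack an approval decision."""
    tool = state["pending_tool"]
    decision = state.get("approval_decisions", {}).get(tool)
    if state["tool_is_mutating"] and decision is None:
        raise ApprovalPending(tool)        # halt; state stays checkpointed
    if decision is False:
        return {**state, "status": "denied"}
    return {**state, "status": "executed"}

checkpoint = {"pending_tool": "kubectl_delete", "tool_is_mutating": True}
try:
    gate_node(checkpoint)
except ApprovalPending:
    # The checkpoint survives; resume later with the user's decision merged in
    checkpoint["approval_decisions"] = {"kubectl_delete": True}

print(gate_node(checkpoint)["status"])  # executed
```

Because the state is checkpointed before the exception propagates, the resumed call re-enters the gate with the decision present and continues from exactly where it stopped.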
Beyond Hacktoberfest: The Year-Round Journey
October: The Starting Point
Hacktoberfest is a fantastic catalyst. It lowers the barrier to entry and introduces you to projects you'd never discover otherwise. Use it as your launchpad, not your destination.
November-December: Go Deeper
This is where the real learning happens:
Read the entire codebase, not just the file you're changing
Join Discord/Slack discussions to understand the roadmap
Pick up complex issues that intimidate you
Ask questions about architectural decisions
Year-Round: Become a Maintainer
Consistent contributions build trust. Over time:
You start reviewing other contributors' PRs
Maintainers ask for your input on design decisions
You shape the project's future direction
You build lasting professional relationships
The Contributor Mindset
Here's what separates occasional contributors from impactful ones:
1. Choose projects you want to learn from
Don't just pick easy projects to farm PRs. Pick projects that use technologies you want to master. My contributions to Skyflo.ai taught me more about production AI systems than any tutorial could.
2. Contribute in multiple ways
Code features and bug fixes
Documentation improvements
Test coverage
Code reviews
Community support
All contributions are valuable. Mix them up.
3. Build relationships
Open source is as much about people as it is about code. The maintainers and contributors you meet today become your professional network tomorrow.
4. Track your growth, not just your PRs
The real metric isn't merged PRs; it's skills gained, patterns learned, and confidence built.
Your Turn: Start Now
Whether it's Skyflo.ai or another project that excites you, the best time to start contributing is now. Not next October, now.
Here's your action plan:
Find a project that aligns with what you want to learn
Read the contributing guidelines and code of conduct
Set up the development environment locally
Start with the codebase, not the issues: understand before you contribute
Join the community (Discord, Slack, Discussions)
Pick your first issue and dive in
Stay around after your first PR merges
Resources
Skyflo.ai GitHub: github.com/skyflo-ai/skyflo
Skyflo.ai Discord: discord.gg/kCFNavMund
LangGraph Docs: docs.langchain.com/langgraph
Model Context Protocol: modelcontextprotocol.io
Connect With Me
I'm sharing my AI engineering journey, open source contributions, and developer productivity tips:
YouTube: @sachin-chaurasiya
LinkedIn: in/sachin-chaurasiya
Dev.to: sachinchaurasiya
What's a project that taught you something unexpected? Drop a comment, I'd love to hear your open source stories.