DEV Community: Kunal

Holafly vs Airalo: I Tested Both Travel eSIMs Across 3 Continents — One Clear Winner for Developers [2026]

Kunal — Sun, 10 May 2026 12:49:49 +0000

Last November, I landed in Lisbon with two phones, two eSIM providers, and a spreadsheet. Over the next eleven weeks — across Portugal, Japan, and Canada — I ran 47 speed tests comparing Holafly and Airalo, the two biggest names in travel eSIMs. If you're a developer who works remotely while traveling, the Holafly vs Airalo question isn't academic. It's the difference between shipping code from a café in Porto and staring at a loading spinner during a standup.

Here's what I found.

Why Travel eSIMs Matter for Tech Nomads

The old playbook for international connectivity — buy a local SIM at the airport, fumble with a tray ejector tool, pray the APN settings work — is dead. eSIMs let you activate a data plan before you even board your flight. No physical card. No sketchy kiosk.

But here's the thing nobody's saying about travel eSIMs: not all "unlimited" plans are actually unlimited in the ways that matter to developers. If you can't tether your laptop to your phone's connection, an eSIM with infinite mobile data is basically useless for real work. That single distinction separates Holafly and Airalo more than anything else.

I've shipped features from hotel lobbies in four countries this year alone. When I started planning a longer trip spanning three continents, I decided to stop guessing and actually test both providers head-to-head. Same cities. Same times of day. Real speed tests with Ookla's Speedtest app.

Holafly vs Airalo: The Pricing Model Difference

Before getting into performance, you need to understand the fundamental pricing difference because it shapes everything.

Airalo sells fixed data packages — 1GB, 3GB, 5GB, 10GB, 20GB — for specific countries or regions. You pick your destination, choose a package, and when it's gone, it's gone. You can top up through the app. I paid $4.50 for 1GB in Portugal (7 days) and $11 for 3GB in Japan (30 days). For Canada, a 5GB package ran me $16.

Holafly sells unlimited data for specific durations — 5 days, 7 days, 15 days, 30 days. I paid $19 for 7 days of unlimited data in Portugal, $47 for 15 days in Japan, and $19 for 7 days in Canada.

The math seems obvious: if you're a heavy data user, Holafly's unlimited model wins. Light user checking email and maps? Airalo's granular packages save money. But that math falls apart when you factor in one critical limitation.

Feature	Airalo	Holafly
Pricing model	Pay-per-GB packages	Unlimited by duration
Cost (7-day, Portugal)	$4.50 (1GB)	$19 (unlimited)
Cost (15-day, Japan)	$11 (3GB)	$47 (unlimited)
Tethering/hotspot	Yes (most plans)	No (most plans)
Regional plans	Yes	Yes
App quality	Clean, functional	Clean, functional
Top-up mid-trip	Yes, in-app	Buy new plan

Prices reflect what I paid in late 2025. Both providers adjust pricing regularly — check their apps for current rates.

Can You Tether With Holafly? The Deal-Breaker for Developers

This is the section that matters most if you're reading this as a working developer.

Most of Holafly's unlimited data plans do not allow tethering or personal hotspot. You cannot share your phone's data connection with your laptop. For a tourist scrolling Instagram, irrelevant. For someone who needs to SSH into a server, push to GitHub, or join a video call on their MacBook, it's a dealbreaker.

I confirmed this firsthand in Lisbon. I activated Holafly, ran a speed test on my phone (solid 42 Mbps down), then tried to enable personal hotspot. Nothing. The option was grayed out for the Holafly eSIM line. Switched to Airalo, enabled hotspot, had my laptop online within seconds.

Airalo's plans almost universally allow tethering. I used hotspot in all three countries without restriction. In Japan, I tethered my MacBook for six straight hours at a coffee shop in Shibuya, ran a deploy, pair-programmed over Zoom, and still had data to spare.

If you've read how I turned a MacBook into a Linux home server, you know I care about squeezing the most out of hardware. Same principle applies here: your phone's data connection is only as useful as the devices it can reach.

If you can't tether, you don't have a work connection. You have a phone plan.

Real Speed Tests: Holafly and Airalo Across 3 Continents

I ran tests at different times of day in each city, using both providers on separate phones connected to the same Speedtest server when possible.

Lisbon, Portugal (November 2025)

Airalo: Average 38 Mbps down / 12 Mbps up. Connected to NOS network. Latency averaged 24ms.
Holafly: Average 42 Mbps down / 9 Mbps up. Connected to Vodafone PT. Latency averaged 31ms.

Both perfectly usable for work. Holafly had slightly better download speeds but noticeably higher latency. For browsing and streaming, you wouldn't feel the difference. For SSH sessions and real-time collaboration tools, those extra 7ms added up to a slightly laggier feel.

Tokyo & Osaka, Japan (December 2025)

Airalo: Average 51 Mbps down / 18 Mbps up. Connected to SoftBank. Latency averaged 19ms.
Holafly: Average 67 Mbps down / 14 Mbps up. Connected to NTT Docomo. Latency averaged 28ms.

Japan is where both services shine because the underlying infrastructure is world-class. Holafly again had higher raw download speeds — likely because NTT Docomo has excellent coverage — but also higher latency. I noticed this pattern consistently across all my tests: Holafly's carrier partnerships seem to prioritize throughput over responsiveness.

If you're curious about why local infrastructure matters so much, Switzerland's 25 Gbit internet story is a good illustration of how network investment determines what you actually experience day-to-day.

Toronto, Canada (January 2026)

Airalo: Average 29 Mbps down / 8 Mbps up. Connected to Bell. Latency averaged 22ms.
Holafly: Average 34 Mbps down / 7 Mbps up. Connected to Rogers. Latency averaged 35ms.

Canada was the weakest performance for both. No surprise there. Canadian carrier infrastructure is expensive and coverage outside major cities drops fast. Both were adequate for work in downtown Toronto, but I wouldn't count on either for a video call from rural Ontario.

Here's a video comparison that aligns with what I experienced — covers pricing, plans, and real-world performance:

[YOUTUBE:xcvj6JxdyPU|Airalo Vs Holafly: eSIM Comparison & Review (Prices, plans & support)]

Setup and App Experience: Both Are Fine, With One Caveat

The setup experience for both Airalo and Holafly is good enough that it's not a differentiator anymore. Both follow the same flow: download the app, buy a plan, scan a QR code or install directly, toggle data roaming on. First-time users might stumble on the "add cellular plan" step in iOS settings, but both apps walk you through it.

One real difference though: Airalo lets you top up data mid-trip without installing a new eSIM profile. Your 3GB runs out on day four of a week-long trip? Just buy another package. It adds to your existing plan. With Holafly, since plans are duration-based, you'd need to purchase a new plan if yours expires early — though since it's unlimited data, the more common scenario is your time running out, not your data.

Airalo's app also shows real-time data usage, which I found genuinely useful for budgeting. As someone who cares about evaluating tools on honest terms, I appreciate when a product gives me the data to make my own decisions instead of hiding behind an "unlimited" label.

Which eSIM Should You Actually Buy?

After 47 speed tests across three continents, here's my take.

If you're a developer or tech professional who needs to work while traveling, buy Airalo. Tethering support alone makes it the only serious option. Holafly's slightly better download speeds are meaningless if you can't get that connection to your laptop. Airalo's pay-per-GB model also forces you to be intentional about usage, which — counterintuitively — means you're more likely to hop on Wi-Fi when it's available and save your eSIM data for when you actually need it.

If you're a tourist who only uses your phone and wants zero data anxiety, Holafly is solid. The unlimited model means you never think about caps. You stream, scroll, navigate, and post without worrying. For non-work travel, that peace of mind has real value.

But here's my prediction: this distinction won't last. Holafly is under massive pressure to enable tethering across all plans. Every competitor comparison highlights it as their biggest weakness. I'd bet that within 12 months, they'll either enable tethering universally or create a "Pro" tier specifically for remote workers. The travel eSIM market is growing too fast for any provider to leave the professional segment on the table.

For now, though, the answer is clear. If your phone is your office's lifeline, Airalo wins. Not even close.

Originally published on kunalganglani.com

Forward Deployed Engineer: The Hottest Role in AI-First Tech and Why It Pays So Well [2026]

Kunal — Sat, 09 May 2026 16:07:33 +0000

Forward Deployed Engineer: The Hottest Role in AI-First Tech and Why It Pays So Well [2026]

Palantir coined the term "Forward Deployed Software Engineer" over a decade ago, and for most of that time, nobody outside their orbit paid much attention. Now the forward deployed engineer role is showing up in job postings at AI-first companies across the industry, and total comp packages are pushing well past $250K. Something shifted.

I've spent 14+ years in software engineering, and I've watched dozens of role titles come and go. Most are rebranding exercises. This one isn't. The forward deployed engineer represents a fundamentally different model for how software gets delivered. And it's becoming one of the most interesting career paths for senior developers who are tired of being three layers removed from the actual problem.

What Is a Forward Deployed Engineer?

A forward deployed engineer (FDE) is a software engineer who works directly with customers to build, customize, and deliver technical solutions. Unlike a traditional backend engineer shipping features to an anonymous user base, an FDE sits at the intersection of engineering, consulting, and solutions architecture. They write production code, but they do it while embedded with the customer, deeply understanding their operational challenges.

At Palantir, where the role originated, Forward Deployed Software Engineers (FDSEs) own the technical delivery of Palantir's platforms. They're the primary technical point of contact for customers, which means they need to understand both the platform's capabilities and the customer's specific technical and operational problems inside out. It's not support engineering. It's not pre-sales. It's building real software in the field.

The forward deployed engineer is what happens when you take a strong software engineer and give them the context that usually only product managers and consultants have.

Here's Palantir's own explanation of the FDSE role:

[YOUTUBE:5OYy_UtINo4|The Role of a Forward Deployed Software Engineer]

If you're building AI products, you've probably already noticed the pattern: the gap between "working demo" and "deployed in production at a customer site" is enormous. That gap is exactly where forward deployed engineers live.

Why Forward Deployed Engineers Are Suddenly Everywhere

AI-first companies created a problem that traditional engineering orgs weren't built to solve. You can't just ship an AI platform and expect customers to figure it out. The integration work is too complex, too domain-specific, and the stakes are too high.

Palantir figured this out early. Their entire go-to-market motion depends on FDSEs who can take the platform and mold it to a customer's specific workflow — whether that's a government intelligence agency, a hospital system, or an oil and gas operation. The technology is powerful, but it's the deployment that creates value.

Now other companies are catching on. Databricks, Scale AI, and Anduril have all posted similar forward-deployed-style positions. The titles vary — "Field Engineer," "Solutions Engineer," "Applied AI Engineer" — but the core job is the same: a technically strong engineer who ships code in the customer's environment, not in an ivory tower.

I've seen this dynamic play out firsthand. Having built systems that serve enterprise customers, I know the hardest engineering problems often aren't in the core product. They're in the last mile. Integrating with legacy systems, handling edge cases unique to a specific industry, translating business requirements into technical architecture on the fly. That's the forward deployed engineer's entire job.

This trend also maps to something broader happening in the tech job market: companies increasingly value engineers who can operate across the full stack of a business problem, not just the technical stack.

What Does a Forward Deployed Engineer Actually Do Day-to-Day?

The daily work of a forward deployed engineer looks nothing like a typical SWE role. Here's what it actually involves:

Customer discovery and scoping. Understanding the client's domain deeply enough to identify where the platform solves their most painful problems. This isn't a PM's job here — the FDE does it.
Rapid prototyping and integration. Building custom workflows, data pipelines, and interfaces that connect the product to the customer's existing systems. Speed matters more than perfection.
Production deployment. Getting the solution live in the customer's environment, which often means navigating security requirements, compliance constraints, and infrastructure limitations that nobody warned you about.
Feedback loops back to product. FDEs are the company's eyes and ears in the field. They funnel insights about what's broken, what's missing, and what customers actually need back to the core engineering team.
Stakeholder management. Presenting technical progress to non-technical executives, managing expectations, and building trust. If you can't communicate clearly to a room full of C-suite leaders, this role will eat you alive.

The closest analog I can think of is a senior engineer who also happens to be an excellent consultant. As I wrote about in the hidden roles that senior engineers play, the best engineers already do a version of this internally — translating between business needs and technical execution. The FDE role just makes it explicit and customer-facing.

Is a Forward Deployed Engineer the Same as a Solutions Engineer?

This is the question I see most often, and the answer is: not really.

A solutions engineer typically operates in the pre-sales cycle. They build demos, answer technical questions during the sales process, and hand off to an implementation team once the deal closes. Their primary metric is helping close deals.

A forward deployed engineer operates after the deal is signed (and often stays through the entire customer lifecycle). They write production code. They own technical outcomes. Their primary metric is whether the customer actually succeeds with the product.

	Solutions Engineer	Forward Deployed Engineer
Primary phase	Pre-sales	Post-sales / ongoing
Core output	Demos, POCs, technical proposals	Production code, deployed systems
Success metric	Deal closed	Customer outcome achieved
Code depth	Prototype-level	Production-grade
Customer relationship	Transactional	Embedded, long-term
Reporting line	Usually Sales	Usually Engineering

The distinction matters because it determines what kind of engineer thrives in each role. Solutions engineers need charisma and breadth. Forward deployed engineers need depth, resilience, and the ability to ship under pressure in environments they've never seen before.

That said, the lines blur at some companies. Smaller startups sometimes combine both functions into a single role. If you're evaluating a job posting, look at who the role reports to. If it's the engineering org, it's likely a true FDE. If it's sales, it's probably solutions engineering with a fancy title.

Forward Deployed Engineer Compensation: What the Numbers Say

Let's talk money, because it's a big part of why this role is pulling in top talent.

According to Levels.fyi, Palantir's software engineering roles (including FDSEs) show median total compensation around $257K, with the overall range spanning mid-level to senior positions. The numbers vary significantly by level and location — senior FDSEs in New York or the Bay Area can push well above that median. Compensation data changes frequently, so check Levels.fyi directly for current figures.

What's interesting is that FDE compensation often outpaces equivalent-level product engineers at the same company. The reason is straightforward: FDEs are revenue-generating. They're directly tied to customer success and retention, which makes them easier to justify to finance teams. In my experience, roles that sit closer to revenue always command a premium. Always.

The role is also just hard to fill. You need someone who can write solid production code and hold their own in a room with a customer's CTO. That combination is rare. Scarcity drives compensation up.

The Skills That Actually Matter for Forward Deployed Engineers

If you're considering this path, here's what I'd focus on based on what I've observed in engineers who excel in customer-facing technical roles:

Technical breadth over depth in any single area. You'll need to work across databases, APIs, cloud infrastructure, frontend, and data pipelines — sometimes all in the same week. You don't need to be the world's best React developer, but you need to be competent across the stack.

Communication as a first-class skill. I've shipped enough features to know that the technical solution is maybe 40% of the challenge. The rest is understanding what the customer actually needs (which they often can't articulate cleanly) and explaining your approach in terms they trust. This mirrors the broader shift I've written about in how software engineering is evolving — the "plan and review" skills are becoming as important as the coding itself.

Comfort with ambiguity. There's no JIRA ticket waiting for you with clear acceptance criteria. The customer has a problem. You figure out the solution. That level of autonomy is thrilling for some engineers and terrifying for others.

Domain curiosity. The best FDEs I've encountered actually enjoy learning about industries they know nothing about. One week you're learning about supply chain logistics, the next about hospital patient flow optimization. If you only want to think about software, this role will burn you out fast.

Is the Forward Deployed Engineer Role Just Consulting?

I hear this one a lot, and I get the comparison. But there's a critical difference.

Consultants advise. Forward deployed engineers build. A McKinsey consultant might produce a 60-page deck recommending a data strategy. An FDE builds the data pipeline, deploys it in production, and iterates based on real results. The accountability is completely different.

There's also the product dimension. FDEs aren't building bespoke software from scratch for every customer. They're deploying and extending a specific platform. The best FDEs become deep product experts and shape the platform's roadmap through their field experience.

This is one of those things where the boring answer is actually the right one: the FDE role is something new. It's not consulting rebranded. It's not solutions engineering with a cooler title. It's a distinct function that emerged because AI products are too complex to throw over the wall and hope customers figure them out.

Where This Role Goes From Here

Here's my prediction: within two years, "forward deployed engineer" (or close variants) will be a standard role at every serious AI company. The deployment gap — the chasm between a working model and a production system that actually delivers business value — is the single biggest bottleneck in enterprise AI adoption right now.

Traditional sales cycles and support teams can't bridge that gap. You need engineers in the field. Engineers who understand the product deeply, who can write code under pressure, and who speak both the language of technology and the language of business outcomes.

If you're a senior engineer who's ever felt frustrated by being disconnected from the actual impact of your work, this role is worth a serious look. And if you're an engineering leader wondering why your enterprise AI platform isn't getting the adoption you expected, the answer might not be a better product. It might be deploying your engineers forward.

Originally published on kunalganglani.com

Senior Engineers Don't Just Write Code: The 4 Hidden Roles That Drive Real Impact [2026]

Kunal — Sat, 09 May 2026 12:49:40 +0000

A few months ago, a developer I was mentoring asked me a question that stopped me cold: "What do you actually do all day? You barely commit any code."

He wasn't being disrespectful. He was genuinely confused. He'd been heads-down shipping features for three years and assumed the path to senior engineer was just… more of that. Faster code. Harder problems. More pull requests.

He's not alone. Most mid-level engineers have a dangerously incomplete picture of what senior engineers actually do. And that misunderstanding is the single biggest reason people stall in their careers. Senior engineers don't just write code. In my experience, they spend 50-70% of their time on activities that never show up in a git log. System design. Mentorship. Managing technical debt. Cross-functional leadership. These are the hidden roles that separate people who have a senior title from people who actually drive senior-level impact.

Here's what nobody tells you about those roles.

What Do Senior Engineers Actually Do Beyond Writing Code?

A senior engineer's job isn't to produce the most code. It's to produce the most leverage. Sarah Drasner, who served as VP of Engineering at Google, has described senior engineers as "force multipliers." Their architectural decisions and guidance can make a team of five as productive as a team of ten.

That multiplier effect is the thing most people miss. Early in your career, your impact is roughly proportional to your hours. You write code, it ships, done. But that model breaks at scale. A senior engineer who spends a week designing the right abstraction might save their team months of rework. A senior engineer who mentors a struggling junior might turn them into a reliable contributor within a quarter.

I used to roll my eyes at the "they just go to meetings" stereotype. Then I became one of those people. Yes, senior engineers spend more time in meetings. But the good ones are using those meetings to shape decisions, resolve ambiguity, and keep the team pointed in the right direction. The meetings are the work.

As Gergely Orosz writes in The Pragmatic Engineer, a primary role of a senior engineer is to reduce complexity. Taking ambiguous, messy business problems and turning them into simple, maintainable, scalable technical solutions. That's not something you do in a code editor. It's something you do in conversations, on whiteboards, and in design documents.

If you've been wondering where this shift fits in the broader evolution of engineering work, I wrote about it in how software engineering is becoming 'plan and review'. The trend is accelerating, and senior engineers are at the center of it.

Role 1: System Design — The Decisions That Outlive Your Code

The first hidden role is system designer. Not the whiteboard-exercise-for-interviews kind. I mean making the architectural decisions that your team will live with for years.

I've shipped systems that handled tens of millions of requests, and the hardest part was never writing the code. It was choosing the right boundaries. Where do you split services? What data model supports the business requirements you know about and the ones you can see coming? When do you build for scale and when do you intentionally keep things simple?

These decisions compound. A good architectural choice made early saves hundreds of engineering hours downstream. A bad one creates the kind of accidental complexity that slowly grinds a team to a halt. I've lived through both.

What makes system design a senior-level skill: it requires holding multiple competing concerns in your head simultaneously. Performance versus simplicity. Consistency versus availability. Ship-it-now versus build-it-right. Mid-level engineers tend to optimize for one dimension. Senior engineers navigate the tradeoffs.

Will Larson, author of Staff Engineer: Leadership Beyond the Management Track, describes several archetypes for senior+ engineers, including the "Architect" who designs broad systems and the "Tech Lead" who directs a team's technical approach. Both roles center on decisions that shape technical direction far beyond any single feature.

If you want to grow into system design, start by asking why your current architecture looks the way it does. Trace the decisions backward. Understand the constraints that existed when those choices were made. That historical context is what separates someone who can design a system from someone who can only implement one.

Role 2: Mentorship — Your Biggest Leverage as a Senior Engineer

I spent years thinking mentoring meant occasionally reviewing a junior's code and leaving polite comments. That's not mentorship. Real mentorship is teaching someone how to think about problems. Not just how to solve the one in front of them.

After leading teams for over a decade, I've seen this pattern play out dozens of times: a strong senior engineer who mentors well doesn't just make their mentee better. They raise the bar for the entire team. The mentee starts writing better code, which means fewer bugs in review, which means the whole team ships faster. It compounds.

What effective senior-level mentorship actually looks like:

Pairing on architectural decisions, not just code
Asking questions that force the mentee to work through tradeoffs themselves (this is harder than just giving the answer, and slower, and worth it)
Giving feedback on communication and documentation, not just technical output
Creating enough psychological safety that people admit what they don't know
Knowing when to let someone struggle versus when to step in

This matters even more in remote environments. Joel Worrall, CTO at Over, has pointed out that senior engineers in remote settings have to proactively create clarity through documentation, set standards for async communication, and catch ambiguities before they slow everyone down. In a distributed world, mentorship isn't a nice-to-have. It's infrastructure.

And here's something I tell every mid-level engineer I work with: you don't need the title to start. Explain your PR decisions more thoroughly. Write better commit messages. Document the why behind your technical choices. That's mentorship in action, and people notice.

[YOUTUBE:CHVtYzg3cfQ|4 REGRETS from my career as a Senior Software Engineer]

Role 3: Managing Technical Debt — The Boring Work That Keeps Everything Running

This is one of those things where the boring answer is actually the right one. Technical debt management is unglamorous. Nobody gets a standing ovation for paying it down. But it's one of the most important things senior engineers do.

Ward Cunningham, who coined the technical debt metaphor in 1992, originally described it as the cost of shipping code built on an incomplete understanding of the problem domain. You build what you know, ship it, learn from real usage, then refactor as your understanding deepens. It was about learning, not about cutting corners. The popular interpretation — "we chose the quick-and-dirty approach" — is actually a misreading of Cunningham's original idea.

Either way, the debt accumulates. And senior engineers are the ones who see it clearly enough to manage it.

I've been on teams where technical debt was treated as someone else's problem. "We'll fix it later." Later never comes. What comes instead is a codebase that takes three days to add a feature that should take three hours. Deployment pipelines that break every other week. Tests that nobody trusts. I've watched good engineers quit over this.

Senior engineers manage technical debt by doing three things:

Making it visible. You can't fix what nobody acknowledges. This means quantifying the cost: "this legacy service costs us 15 hours of engineering time per sprint in workarounds."
Prioritizing strategically. Not all debt is equal. Some debt is fine to carry for years. Some will kill your velocity in six months if left untreated. Knowing which is which is the skill.
Negotiating with product. This is the cross-functional part. You translate technical debt into business language: "If we invest two sprints in this migration, we cut our incident rate by 60% and ship features 30% faster."

If you've dealt with the specific challenge of AI-generated code introducing debt, I broke down practical strategies in how to audit and refactor vibe-coded applications. Same principles, different scale.

Role 4: Cross-Functional Leadership — Speaking the Language of the Business

The fourth hidden role is the one that surprises most engineers: becoming the technical voice in non-technical conversations.

Senior engineers sit at the intersection of engineering, product, and business. They translate. They explain why a feature that "sounds simple" requires rethinking the data model. They push back when a product manager's timeline doesn't account for infrastructure work. They advocate for engineering investments that don't have obvious customer-facing value.

Tanya Reilly, author of The Staff Engineer's Path, calls this "managing up" — providing clear technical context to product managers and leadership so they can make informed decisions. It's not about saying no. It's about giving decision-makers the information they need to say yes to the right things.

I've seen this make or break teams. One of the most impactful things I've done in my career wasn't writing code at all. It was building a one-page technical summary that helped a VP understand why we needed to delay a launch by three weeks to fix a reliability issue. That single document prevented an outage that would have cost far more than three weeks of delay.

This role requires skills that most engineering curricula don't teach: writing clearly for non-technical audiences, presenting tradeoffs without drowning people in implementation details, building trust with stakeholders who don't speak your language. None of this is glamorous. All of it matters.

Want to practice? Next time your team makes a technical decision, write a one-paragraph summary that a product manager could understand. If you can explain why without using jargon, you're building the muscle.

The engineers who shape their companies aren't the ones who write the most code. They're the ones who make sure the right code gets written.

How Do You Grow Into These Senior Engineer Roles?

Here's the thing nobody's saying about the senior engineer transition: it's not a promotion. It's a career change.

The skills that got you from junior to mid-level — learning frameworks, writing clean code, debugging efficiently — are table stakes at the senior level. They won't get you further. What will is developing real competence in the four roles I've described: system design, mentorship, technical debt management, and cross-functional leadership.

As I wrote about in what's left for software engineers as AI writes more code, the engineers who thrive aren't the ones who type the fastest. They're the ones who think the clearest.

So here's my challenge: for the next two weeks, track how you spend your time. If 90% of it is writing new code, you're building mid-level skills. Start intentionally carving out time for design reviews, mentoring conversations, and stakeholder communication. It will feel uncomfortable. It will feel less productive. Do it anyway.

The engineers who drive real impact at senior and staff levels aren't coding machines. They're systems thinkers, force multipliers, and translators between the technical and the human. Start building those muscles now. The title will follow. Or it won't, and you'll realize you stopped caring about the title because the impact became obvious on its own.

Originally published on kunalganglani.com

AI Agent Control Flow: Why Better Prompts Won't Fix Your Broken Agent Architecture [2026]

Kunal — Fri, 08 May 2026 12:47:13 +0000

The Prompt Isn't the Problem. The Architecture Is.

Last year, I watched a team spend three months crafting an increasingly elaborate prompt for an AI agent that was supposed to handle customer support escalations. The prompt grew to over 4,000 tokens. Nested instructions, edge case handling, persona definitions, a small novel's worth of "if the user says X, do Y" rules. It still failed unpredictably in production. The issue wasn't the prompt. It was that the team was trying to encode AI agent control flow into natural language. And natural language is a terrible programming language.

This is the wall most teams building AI agents are hitting right now. Not a model intelligence wall. Not a context window wall. An architecture wall. The fix isn't a better prompt. It's a fundamentally different way of structuring how agents make decisions.

The teams shipping reliable AI agents in 2026 aren't prompt engineering their way out of complexity. They're building control flow the way software engineers have always built control flow: with code.

What Is Control Flow in AI Agents?

Control flow in AI agents refers to the explicit, programmatic structure that governs how an agent moves through tasks: what it does first, what it does next, how it handles failures, when it loops, and when it stops. It's the same concept you learned in your first CS class — conditionals, loops, state machines, error handling — applied to orchestrating LLM calls instead of database queries.

In a prompt-only agent, the LLM decides everything. It interprets the task, chooses its next action, evaluates whether it succeeded, and determines when it's done. All of that decision-making lives inside a single inference call (or a chain of them), guided only by text. In a control-flow-first agent, a structured program makes those decisions. The LLM gets called for specific, bounded tasks — summarize this document, extract these fields, generate this response — but the when, why, and what-happens-if-it-fails logic lives in actual code.

As Alessio Fanelli and Swyx of Latent Space have argued, prompts are not a programming language. When we build agents, we are building systems, and we should reach for the tools of systems building: modularity, abstraction, and control flow. The most advanced agent teams have moved to a "code-first" approach where the agent's logic is defined in a robust programming language, and the LLM is called as a tool for specific, well-defined tasks.

If you've worked with multi-agent AI systems in production, you know this already. The agents that survive contact with real users aren't the ones with the cleverest prompts. They're the ones with the most deliberate architecture.

Why Prompt-Only AI Agents Fail in Production

I've shipped enough AI-powered features to know that the demo-to-production gap is where most agent architectures go to die. Prompt-only agents die the hardest. Here's why.

No real error handling. When a prompt-only agent gets an unexpected response from a tool call, or receives malformed JSON, or hits a rate limit, it has no structured way to recover. It either hallucinates a workaround, retries blindly, or stops dead. In code, you'd write a try-catch with exponential backoff and a fallback strategy. In a prompt, you write "If something goes wrong, try again" and cross your fingers.

State management is a nightmare. Complex tasks require maintaining state across multiple steps. A prompt-only agent carries state in its context window, which means it's subject to all the fragility of long-context inference: attention degradation, lost details, the ever-present risk of the model "forgetting" a critical piece of information from step 3 when it's executing step 17. I've debugged these failures. They're maddening because they're intermittent.

Non-determinism compounds. Every LLM call introduces variance. In a single call, that variance is manageable. In a 20-step agent workflow where each step's output feeds the next, variance compounds into chaos. I've seen agents that work perfectly 8 out of 10 times in testing fail 4 out of 10 times in production because the distribution of real-world inputs is always wider than your test suite assumes.

Cost spirals. Without explicit control over when and how the LLM is called, prompt-only agents are wildly inefficient. They'll call GPT-4-class models for tasks that a regex could handle. They'll re-process entire conversation histories instead of caching intermediate results. I've watched token costs for a single agent task go from $0.03 to $2.40 because the agent decided to "think through" a problem that had a deterministic answer. That's an 80x cost multiplier for zero added value.

The a16z AI infrastructure team has documented this pattern: early AI agents relied on a single, monolithic prompt to guide behavior, and the approach is brittle. The new architectural pattern involves a central "agent kernel" or runtime that manages state, executes a control flow loop, and calls on LLMs as one of many tools to accomplish sub-tasks.

If you've read about AI agent failures in production, the patterns are familiar. The root cause is almost always architectural.

The Control-Flow-First Architecture

So what does a control-flow-first agent actually look like? It looks a lot more like traditional software than most people expect. And that's the point.

The core idea is separation of concerns. Your program defines what happens. The LLM defines how specific subtasks get done. Think of it like a project manager (the code) delegating to a specialist (the LLM). The project manager decides order of operations, handles dependencies, manages failures, tracks progress. The specialist focuses on doing one thing well.

Omar Khattab, Stanford researcher and creator of DSPy, has been one of the clearest voices here. DSPy reframes interaction with language models from "prompting" to "programming." It separates the logic of a program — the control flow — from the parameters, meaning the prompts and model weights. This lets you build reliable, complex systems on top of language models without manually tuning every prompt.

In practice, a control-flow-first architecture has a few key properties:

Explicit state machines. The agent's possible states and transitions are defined in code. No ambiguity about what happens after a task completes or an error fires.
Typed inputs and outputs. Each LLM call has a defined schema for what goes in and what comes out. No more parsing free-text responses and hoping the model formatted things correctly.
Deterministic routing. Decisions that can be made without an LLM are made without an LLM. If the next step depends on whether a value is above or below a threshold, that's an if-statement. Not a prompt.
Structured retry and fallback logic. When a step fails, the system knows what to do: retry with different parameters, fall back to a simpler model, escalate to a human, or gracefully degrade. No guessing.
Observable execution traces. Because the control flow is explicit, you can log every decision point, every LLM call, every state transition. Debugging goes from "why did the agent do that?" to "it entered this state because this condition was true."

Sequoia Capital's analysis of LLM system evolution captures this well: simple, one-step LLM calls are being replaced by compound, agentic systems that reason, plan, and use tools. These systems require explicit control flow to manage multi-step tasks, handle errors, and decide when to use different tools or models.

None of this is new. It's how we've built reliable distributed systems for decades. The insight is that AI agents are distributed systems. They just happen to have a non-deterministic component.

What Frameworks Support AI Agent Control Flow?

The tooling ecosystem is catching up fast. Several frameworks now explicitly support control-flow-first agent design.

LangGraph takes a graph-based approach, letting you define agent workflows as nodes and edges with explicit state management. It's opinionated about control flow in a way that vanilla LangChain never was. That's a good thing.

DSPy goes furthest in the "programming, not prompting" direction. You define modules with typed signatures, compose them into pipelines, and let the framework optimize the prompts automatically. You never write a prompt. You write a program.

Temporal and similar workflow engines are being adopted by teams that want battle-tested durability guarantees for long-running agent tasks. If you're already familiar with Temporal's workflow engine, the mental model transfers directly: an agent workflow is just a workflow with LLM calls as activities.

Prefect and Dagster, originally built for data pipelines, are being repurposed for agent orchestration. They already solve the hard problems: state management, retry logic, observability. Why rebuild all of that from scratch?

Matei Zaharia, co-founder of Databricks, has been advocating for what he calls "compound AI systems" — architectures where multiple models, retrievers, and tools are composed together with explicit program logic rather than relying on a single model to figure everything out. I like this framing because it makes AI agents a software engineering problem, not just a machine learning problem. And software engineering problems, we know how to solve.

The common thread: stop asking the LLM to be the operating system. Let it be a function call.

Do You Still Need Prompt Engineering?

Here's the thing nobody's saying about this shift: prompt engineering doesn't disappear. It gets smaller and more focused.

In a control-flow-first architecture, you still write prompts. But instead of one massive, fragile prompt that tries to govern the entire agent's behavior, you write small, specific prompts for individual tasks. "Extract the customer's intent from this message" is a much easier prompt to get right than "You are a customer support agent. Handle all incoming messages. If the customer is angry, de-escalate. If they need a refund, check the policy..."

Smaller prompts are easier to test, easier to optimize, and easier to replace. If a new model handles entity extraction better, you swap it in for that one step without touching the rest of your system. Having built a 100+ prompt playbook, I can tell you the best prompts I've ever written are surgical. They do one thing and do it well. That's exactly what control-flow-first design demands.

The analogy I keep coming back to is microservices. Monolithic prompts have the same problems as monolithic applications: hard to test, hard to debug, hard to scale. A change in one area breaks something in another. Decomposing the prompt into small, composable units with explicit interfaces between them is the same architectural instinct that drove the industry from monoliths to services. We've been here before. We know how this plays out.

The Boring Answer Is the Right One

This is one of those things where the boring answer is actually the right one. The next leap in AI agent reliability isn't coming from a model that's 10% smarter. It's coming from treating agent design as a proper software engineering discipline.

Loops, conditionals, state machines, error handling, observability, typed interfaces. None of this is new. None of it is exciting. All of it is necessary.

After building and shipping AI systems for the past several years, I've learned something that keeps proving true: the gap between a compelling demo and a production system is almost never about model capability. It's about all the engineering that surrounds the model. The teams winning right now figured this out early. The LLM is a component, not the architecture.

If you're building AI agents and you're still fighting prompt brittleness, stop tuning the prompt. Zoom out. Draw the state machine. Define the error paths. Make the control flow explicit. Your agent will thank you by actually working when a real user touches it.

The future of AI agents isn't smarter models with longer prompts. It's smarter architecture with smaller, sharper prompts embedded in real code. And for anyone who's been building software for more than a few years, that should feel like coming home.

Originally published on kunalganglani.com

Android 17 QPR1 Beta 2: The Developer Features Nobody Is Talking About [2026]

Kunal — Thu, 07 May 2026 16:08:28 +0000

Android 17 QPR1 Beta 2: The Developer Features Nobody Is Talking About [2026]

Google dropped Android 17 QPR1 Beta 2 and tech media did what it always does: focused on the wallpaper refresh and notification shade tweaks. Meanwhile, the actual developer-facing changes buried in the release notes will break production apps, force privacy migrations, and change how multi-device layouts work on Android. I've been running the beta on a Pixel 9 Pro for a week. The surface-level coverage completely misses what matters.

If you're shipping Android apps professionally, the Android 17 QPR1 Beta 2 developer changes are the ones that should have your attention right now. Not the UI polish. Not the emoji updates. The API deprecations, the privacy enforcement deadlines, and the adaptive layout requirements that Google is finally making non-optional.

What's Actually New in Android 17 QPR1 Beta 2 for Developers?

Every Android QPR (Quarterly Platform Release) beta ships a mix of user-facing polish and developer-critical changes. The QPR1 betas have historically been where Google tightens enforcement on APIs they soft-launched in the main release. Android 17 QPR1 Beta 2 follows this pattern hard.

The changes that matter fall into three buckets:

Privacy enforcement escalation. Scoped storage exceptions that survived through Android 16 are now deprecated with hard removal timelines. The photo picker API is no longer a suggestion. Apps targeting API level 36 that bypass it will see Play Store review flags.
Adaptive layout compliance. Google has been pushing responsive, multi-form-factor layouts since foldables went mainstream. With QPR1 Beta 2, the large screen compatibility guidelines are getting enforced through new Play Console warnings.
Background process restrictions. The foreground service type requirements introduced in Android 14 have been further tightened. Several previously exempt service types now require explicit user consent flows.

I've shipped apps that had to scramble through previous Android privacy changes. The pattern is always the same: Google announces it softly, developers ignore it, enforcement hits, everyone panics. We're in the "soft announcement" phase right now. Don't be the team that panics later.

How Will Android 17's Privacy Changes Affect Existing Apps?

This is the big one. Google's privacy trajectory across Android versions has been dead consistent: introduce an optional API, make it recommended, make it mandatory. With Android 17 QPR1, several privacy mechanisms are crossing from "recommended" to "enforced."

The photo picker enforcement is the most impactful change for consumer-facing apps. If your app currently uses READ_MEDIA_IMAGES or READ_MEDIA_VIDEO permissions directly, you need to migrate to the system photo picker or the MediaStore API with filtered access. This isn't new — the photo picker launched in Android 13 — but QPR1 Beta 2 signals that apps bypassing it will face tangible consequences in Play Store review.

Health Connect integration is another area where enforcement is getting real. Health data APIs that were introduced as opt-in are becoming the only sanctioned path for fitness and wellness apps. If you've been reading health data through custom providers or direct sensor access, the migration clock is ticking.

I've worked with Android's permission model evolution since the Marshmallow days. Here's what I know for certain: Google moves slowly on enforcement, but they never reverse course. Every single privacy restriction that started as a soft warning eventually became a hard gate. Plan your migration now, not when the stable release ships.

How this hits you depends on what you're building. Social media and photo-heavy apps will feel the photo picker enforcement most. Health and fitness apps need to audit their Health Connect integration. And any app doing background location tracking should re-examine whether their use case survives the new foreground service consent requirements. If you've explored how dark patterns in tech affect user trust, you'll recognize Google's broader push: making implicit data access explicit and user-controlled.

Adaptive Layouts Are No Longer Optional

Google has been talking about adaptive layouts for years. Foldables, tablets, ChromeOS, desktop mode, Android XR headsets. The form factor explosion has made single-layout apps look increasingly broken. With Android 17 QPR1 Beta 2, Google is finally backing the talk with enforcement.

The new Play Console quality checks evaluate how apps behave across screen configurations. This isn't a binary pass/fail yet, but the direction is obvious. Apps that don't properly handle configuration changes for different screen sizes and densities will see warnings. Based on Google's track record, those warnings become requirements. Every time.

Sameer Samat, who leads Google's Android and Pixel ecosystem, has repeatedly emphasized the multi-form-factor future of Android in recent keynotes. The platform is no longer just phones. Google's desktop mode for Pixel, which I covered when it first appeared, was an early signal. Android 17 makes it louder.

Building an Android app for a single screen size and calling it done? That's over. QPR1 Beta 2 is Google saying "we're not asking anymore."

Practically, this means adopting WindowSizeClass from the Jetpack library, handling configuration changes gracefully, and testing your app across at least three form factors: compact phone, medium tablet/foldable inner display, and expanded desktop. If you're still using fixed-dp layouts anywhere, now is the time to refactor.

I've watched teams burn weeks debugging layout issues on foldables that could have been avoided with proper adaptive architecture from day one. The cost of retrofitting adaptive layouts into a mature app is roughly 3-5x the cost of building them in from the start. Most production apps with more than 50 screens? Budget 2-4 sprint cycles to properly adopt the new layout expectations.

Background Process and Foreground Service Changes

Android's war on background processes has been going on since Doze mode in Marshmallow. Each version ratchets the restrictions tighter. Android 17 QPR1 Beta 2 continues the trend, and it's precise about what it targets.

The foreground service type system, which Android 14 introduced to force developers to declare why their app needs a foreground service, is getting more granular. Several service types that were broadly permitted now require additional justification through a user-facing consent dialog. This particularly affects apps using dataSync, mediaPlayback, and location foreground service types.

The battery optimization exemption list is also getting harder to escape. Apps that previously requested exemption via REQUEST_IGNORE_BATTERY_OPTIMIZATIONS will find that Google Play's review process flags these requests more aggressively. The guidance is clear: use WorkManager for deferrable work, use the exact alarm API for time-critical tasks, and stop trying to keep your app alive in the background.

For anyone building real-time features — chat apps, navigation, music players — the new consent flows add friction. But the alternative was the Android 4.x era where every app fought for background CPU time and battery life was a joke. This is one of those things where the boring answer is actually the right one: proper job scheduling beats background hacks every time.

If you're building apps that rely heavily on background processing, the architectural thinking here overlaps with what I discussed in the evolving role of software engineers. Less about clever hacks, more about proper system design.

What Developers Should Do Right Now

Don't wait for the stable release.

Install the beta on a test device. A real Pixel, not an emulator. Emulators miss thermal throttling behavior and real-world permission dialog flows.
Run your full test suite against API level 36. Focus on storage access, background services, and layout behavior on non-standard screen sizes.
Audit your foreground services. List every foreground service type your app declares and check each against the updated foreground service requirements. If any are newly restricted, start planning the migration.
Test adaptive layouts across three screen sizes. Use a foldable (physical or emulator), a standard phone, and a tablet or desktop mode configuration. Write down every layout that breaks.
Check your targetSdkVersion timeline. Google Play typically gives 12 months after a new API level before requiring apps to target it. Start planning now.

I've shipped enough Android updates to know this: the gap between "it works on the stable release" and "it passes Play Store review" is where most schedule slippage happens. Teams that treat QPR betas as early warning systems consistently have smoother launches.

Where Android Is Heading

Android 17 QPR1 Beta 2 isn't just a point release. It's a clear signal of where the platform goes in the next 18 months. Google is converging on a vision where Android apps work seamlessly across phones, foldables, tablets, desktops, cars, and XR headsets. The privacy model is moving toward explicit, granular user consent for everything. Background processing is being funneled into a small set of well-defined patterns.

My prediction: by the time Android 17's first QPR ships to stable devices, Google will surface adaptive layout compliance as a visible metric in Play Console. Similar to how they surface Core Web Vitals for web. This is my speculation, not insider knowledge, but the trajectory is unmistakable.

The apps that thrive on Android in 2027 will be the ones that treat privacy, adaptability, and battery efficiency as first-class architectural concerns. Not checkboxes. Not last-minute fixes before a Play Store submission. Foundational design decisions baked in from day one.

The beta is available now. Go break your app on purpose. It's cheaper than having your users do it for you.

Originally published on kunalganglani.com

Gemini Flash vs Pro for Developers: Which Google AI Model Actually Fits Your Use Case [2026]

Kunal — Thu, 07 May 2026 12:50:34 +0000

Google's Gemini model family now spans four generations, three tiers, and more naming confusion than a Java enterprise framework. If you're a developer trying to figure out whether to use Gemini Flash vs Pro for your next project — or wondering what the 1-million-plus token context window actually means in practice — you're not alone. I've spent the last several months building with these models, and the gap between what Google announces on stage and what actually works in production is worth talking about.

Let me cut through the marketing and tell you what I've learned.

The Gemini Model Lineup: What Actually Exists Right Now

Before we compare anything, let's clear up the naming chaos. Google's current Gemini API model page lists three main models in the latest generation:

Gemini 3.1 Pro — The flagship reasoning model, currently in preview. Strong agentic capabilities, and the model you reach for when accuracy matters more than speed.
Gemini 3 Flash — The cost-optimized workhorse. Competitive performance with larger models at a fraction of the cost and latency.
Gemini 3 Nano — The on-device model for mobile and edge deployments.

If you've heard the term "Gemini Omni" floating around, that's not a real product. The confusion likely stems from Google's December 2023 announcement of Gemini Ultra (the original top-tier model), which has since been superseded by the Pro line. As Sundar Pichai explained when first introducing the Gemini family, the hierarchy is Ultra/Pro/Nano. Not Omni.

The lineage matters because each generation has brought real improvements. The 1.5 era introduced Flash as a concept — a distilled, lighter model trained from Pro's knowledge. That distillation approach carried forward into every subsequent generation, and it's why Flash models punch well above their weight class.

Is Gemini Flash Good Enough for Production?

This is the question I get asked most. The answer is yes. For most use cases, emphatically yes.

When Google first introduced Gemini 1.5 Flash at I/O 2024, the pitch was simple: take the intelligence of Pro, distill it into something faster and cheaper. As Kyle Wiggers reported in TechCrunch, Flash was designed specifically for "lower-latency, cheaper AI responses for applications like live chatbots and real-time analysis." That design philosophy has only sharpened since.

To put historical benchmarks in context: when the 1.5 generation launched, Pro scored 84% on MMLU while Flash hit 79%. A 5-point gap for a model that was dramatically cheaper and faster. As Emilia David noted at The Verge, Flash retained multimodal reasoning capabilities despite being "distilled" from the larger model. Each generation since has narrowed that gap further.

I've shipped features using both tiers, and here's the pattern I've settled on: Flash handles 80-90% of my production workloads. Summarization, classification, extraction, conversational interfaces, code explanation — Flash eats these for breakfast. I only escalate to Pro when I need complex multi-step reasoning, when I'm dealing with genuinely ambiguous inputs that need careful analysis, or when I'm building agentic workflows that require the model to plan and execute autonomously.

The boring answer is the right one: start with Flash, measure where it falls short, and upgrade to Pro only for those specific tasks.

How the Massive Context Window Changes What You Can Build

Okay, the context window story is where I actually get excited. Google first broke the context window ceiling with the 1.5 generation, when Josh Haftel, then Group Product Manager at Google, announced a 2-million-token context window for Gemini 1.5 Pro. That capacity has only grown. The current generation can handle entire codebases, full video transcripts, and massive document collections in a single prompt.

But here's the thing nobody's saying about context windows: having a million tokens available doesn't mean you should use a million tokens every time.

I've been building document-heavy applications for the past year, and context window size matters most for two specific patterns. First, whole-codebase analysis. Being able to drop an entire repository into context and ask architectural questions without chunking. That's transformative. If you've been doing RAG gymnastics to analyze code spread across dozens of files, a massive context window lets you just... not do that. Second, long-form media understanding. Processing entire meeting transcripts, video content, or multi-hundred-page documents without the lossy summarization that chunking requires.

Where it matters less than you'd think: most chatbot and assistant use cases. Your users are sending 50-200 token messages. A 10,000-token conversation history covers 95% of sessions. You're paying for context you'll never touch.

The cost math is critical. During the 1.5 era, Google priced Flash at $0.35 per million input tokens (up to 128K context) versus $3.50 per million for Pro. That's a 10x difference. Multiply that by millions of API calls and you're not looking at an academic distinction. You're looking at your cloud bill.

If you're weighing these costs against other providers, I compared LLM API latency across providers earlier this year. Google's Flash tier consistently wins on speed-per-dollar.

What Is Project Astra and Can Developers Use It?

Project Astra is Google DeepMind's vision for real-time, multimodal AI agents — the kind that can see through your camera, hear your voice, understand spatial context, and respond conversationally. The demos are impressive. An AI that watches a video feed, remembers where you left your glasses, and answers questions about what it's seeing in real time.

Developers need to know one thing about it though: Project Astra is a research initiative, not an API you can call today.

As Ryan Morrison, Senior Editor at Tom's Guide explained, Astra "is not a product but a demonstration of what's possible with Google's AI models." The underlying technology — continuous video frame encoding, real-time audio processing, persistent information caching — is technically fascinating. Demis Hassabis, CEO of Google DeepMind, has framed it as the foundation for a future generation of AI assistants.

So what does trickle down to developers today? The Live API. Google's real-time streaming API lets you build applications that process audio and video in near-real-time using Gemini models. It's not the full Astra vision, but it's the productized slice of that research. If you're building anything involving live camera feeds, voice interaction, or real-time multimodal understanding, the Live API is where the action is.

I've been experimenting with it for a developer tool that analyzes whiteboard diagrams during meetings. The multimodal capability is real. The model can parse handwritten architecture diagrams and convert them to structured descriptions. It's not perfect, but it's far better than anything I could have built even a year ago. And it runs on Flash, not Pro, which keeps costs reasonable even with continuous video input.

Gemini Flash vs Pro: The Decision Framework

After shipping multiple features on both tiers, here's how I think about the choice:

Use Gemini 3 Flash when:

You need low latency for user-facing features
Your task is well-defined: summarization, classification, extraction, translation
You're processing high volumes and cost matters (it always matters)
You're building conversational interfaces where response time shapes the user experience
Real-time multimodal processing where speed beats perfection

Use Gemini 3.1 Pro when:

Complex, multi-step reasoning is required — chain-of-thought over ambiguous inputs
You're building autonomous agents that need to plan, reason, and use tools
Accuracy on hard problems is non-negotiable (legal analysis, medical, financial)
You need the largest context window for whole-codebase or whole-document analysis
You're doing deep analysis tasks where you'll wait for quality

Use Gemini 3 Nano when:

You're building mobile or edge applications
Privacy requires on-device processing
Offline capability is a requirement
Latency needs to be sub-100ms

The mistake I see most teams make is defaulting to Pro for everything because it's "better." It is better at hard reasoning tasks. For the other 85% of what you're building, Flash isn't just cheaper — it's actually preferable because faster responses create better user experiences. As I've written about when discussing how AI agents are reshaping software engineering, the bottleneck in most AI applications isn't model intelligence. It's latency, cost, and reliability at scale.

What This Means for What You Build Next

Google's Gemini strategy is getting clearer with each generation: make the Flash tier so good that most developers never need Pro, then make Pro so capable it handles tasks that previously required custom pipelines of multiple models stitched together.

If you've been waiting for "the right model" to start building AI features, stop waiting. Flash is production-ready, affordable, and fast enough for real-time applications. Pro is there when you need serious reasoning power thrown at genuinely hard problems. The massive context windows mean you can stop engineering around chunking limitations for a lot of use cases.

Project Astra remains a glimpse of where this is heading — persistent, multimodal agents that understand the world in real time. We're not there yet as an API, but the building blocks (Live API, massive context, multimodal understanding) are available today.

My prediction: within 12 months, the Flash tier will match what Pro can do today, and Pro will be doing things we currently need multi-agent orchestration to achieve. The distillation pipeline Google has built isn't just a cost optimization trick. It's a compounding advantage. Every generation of Pro trains the next generation of Flash, and every generation of Flash makes it cheaper for developers to build things that were impossible a year ago.

Stop waiting. Start building with Flash. Upgrade to Pro where the benchmarks tell you to. That's it. That's the whole strategy.

Originally published on kunalganglani.com

CVE-2024-3400 and the AI Security Crisis: Palo Alto's CEO Warned Us While His Own Firewalls Burned [2026]

Kunal — Wed, 06 May 2026 12:49:24 +0000

CVE-2024-3400 and the AI Security Crisis: Palo Alto's CEO Warned Us While His Own Firewalls Burned

Nikesh Arora, CEO of Palo Alto Networks, stood on stage at RSA Conference 2024 and told every security team on the planet to be afraid: nation-state attackers are using AI to find vulnerabilities faster than defenders can patch them. The industry, he said, has a "24-to-36-month window" to get ahead of AI-driven threats before attackers gain a serious upper hand. Weeks earlier, his own company had disclosed CVE-2024-3400, a command injection flaw in PAN-OS that scored a perfect 10.0 on the CVSS scale. An unauthenticated attacker could execute arbitrary code with root privileges on Palo Alto's own firewalls. The irony isn't just poetic. It's a signal.

This isn't a story about one company's bad week. It's about what happens when the tools defenders built become the attack surface, and AI is accelerating the offense faster than anyone predicted.

What Is CVE-2024-3400 and Why Does It Matter?

CVE-2024-3400 is a command injection vulnerability in the GlobalProtect feature of Palo Alto Networks' PAN-OS software. It affects PAN-OS versions 10.2, 11.0, and 11.1 when configured with a GlobalProtect gateway or portal. The flaw allows an unauthenticated attacker to execute arbitrary code with root privileges on the firewall itself. No credentials needed. No prior access required.

Think about what that means. The device your organization trusts to be the barrier between your network and the internet can be completely owned by someone who has never touched your systems before. No phishing email. No stolen password. Just a crafted request to a publicly exposed endpoint.

Palo Alto Networks' own Unit 42 threat research team assigned the vulnerability the maximum CVSS score of 10.0. The Cybersecurity and Infrastructure Security Agency (CISA) confirmed active exploitation in the wild and immediately added CVE-2024-3400 to its Known Exploited Vulnerabilities catalog. Federal agencies were ordered to patch immediately under Binding Operational Directive 22-01.

The threat actor behind the initial exploitation, tracked as UTA0218 by Volexity and later analyzed by Varonis Threat Labs, wasn't some script kiddie. They built a custom backdoor called UPSTYLE. Purpose-built persistence designed to survive reboots and maintain access to compromised firewalls. This is tooling that takes significant resources and intent to develop. It screams nation-state.

I've managed infrastructure that sat behind Palo Alto firewalls. When I saw this CVE drop, my first reaction wasn't surprise. It was that familiar dread of knowing the thing you trusted most just became your biggest liability. If you've ever had to coordinate an emergency patching cycle across dozens of firewalls on a Friday evening, you know exactly what I mean.

How AI Is Helping Hackers Find Zero-Days Faster

This is where things get really uncomfortable. At RSA 2024, Arora didn't mince words. He stated plainly that nation-state actors are using AI and large language models to "find vulnerabilities faster" and to "train their malware to be more effective." This isn't conference speculation. It's already happening.

The old model of vulnerability discovery involved painstaking manual reverse engineering. A skilled researcher might spend weeks or months fuzzing a target, reading disassembled code, and crafting a working exploit. AI compresses that timeline dramatically. LLMs can analyze codebases at scale, identify patterns that correlate with known vulnerability classes, and suggest exploitation paths. The barrier to entry for sophisticated attacks is dropping fast.

Arora's 24-to-36-month window isn't arbitrary. It reflects a calculation about how quickly AI tooling matures versus how quickly defensive architectures can adapt. And honestly? Having spent years watching organizations struggle to implement basic patch management, I think 24 months is generous.

[YOUTUBE:qgSv8StOZxA|Palo Alto Networks CEO Nikesh Arora on the cyber threat landscape, impact of AI on cybersecurity]

We don't know for certain that AI was used to discover CVE-2024-3400 specifically. But the sophistication of the UPSTYLE backdoor and the speed of exploitation suggest a threat actor with advanced capabilities and serious tooling. The exploit chain involved arbitrary file creation leading to command injection. That's exactly the kind of multi-step vulnerability that AI-assisted analysis is particularly good at identifying.

The security vendors building the walls are themselves targets, and the attackers have access to the same foundational AI models that defenders do. I wrote about similar dynamics in how AI pentesting agents are learning to hack with DARPA's support. The offense-defense gap is widening, not shrinking.

The Defender's Dilemma: Your Firewall Is Now an Attack Surface

Here's what makes CVE-2024-3400 sting beyond the timing of Arora's warnings. Firewalls are supposed to be the most hardened, most trusted components in your network. They sit at the perimeter. They see all traffic. They have root-level access to everything flowing through them. When the firewall itself is compromised, the attacker doesn't just bypass your defenses. They become your defenses.

And this isn't unique to Palo Alto. We've seen similar critical vulnerabilities in Fortinet's FortiOS, Cisco's IOS XE, and Ivanti's Connect Secure VPN appliances. Network security appliances, by their nature, present a massive attack surface because they must be internet-facing and they process untrusted input at scale.

In my experience building and reviewing security architectures, this is where most organizations have a blind spot. They invest heavily in next-gen firewalls, intrusion detection systems, and endpoint protection. But the implicit assumption is that these devices themselves are trustworthy. CVE-2024-3400 shatters that assumption.

The device you trust to protect your network is the device an attacker trusts to give them root access.

The UPSTYLE backdoor is particularly alarming because it demonstrates operational maturity. UTA0218 didn't just exploit the vulnerability and grab some data. They built persistence. They planned to stay. That's the hallmark of a threat actor with strategic objectives, not an opportunistic smash-and-grab. And it's the kind of sophisticated tradecraft that, as Arora warned, AI is helping to accelerate.

I've written about how supply chain attacks targeting developer tools and infrastructure exploit the same basic weakness. The common thread is trust: we implicitly trust our tools, our dependencies, and our security appliances. Attackers know this, and they're systematically going after that trust.

What Zero Trust Actually Means After CVE-2024-3400

Every security vendor talks about "zero trust." It's become so overused it's practically meaningless as a marketing term. But CVE-2024-3400 is a case study in why the underlying principle actually matters.

Zero trust, stripped of the marketing, means this: no component in your architecture gets implicit trust based on its position in the network. Not your firewall. Not your VPN concentrator. Not your identity provider. Every component must continuously prove it deserves the access it has.

After seeing vulnerabilities like this hit production environments, I've become convinced that the practical implementation of zero trust requires three things most organizations aren't doing:

Assume breach of perimeter devices. Your incident response plan should include scenarios where the firewall itself is the compromised asset. If your IR playbook starts with "check the firewall logs," you've got a serious problem when the firewall is the adversary.
Segment aggressively behind the perimeter. East-west traffic controls matter more than ever. A compromised firewall with visibility into a flat network is catastrophic. A compromised firewall facing microsegmented workloads is bad but survivable.
Monitor your security appliances with the same rigor you monitor your servers. If you're running EDR on every endpoint but not watching the integrity of your firewall's operating system, you've got exactly the gap that threat actors like UTA0218 will find.

CISA's rapid addition of CVE-2024-3400 to the Known Exploited Vulnerabilities catalog and the mandatory patch directive for federal agencies was the right call. But it also shows how reactive the current model is. Palo Alto Networks released patches for affected PAN-OS versions, but the gap between disclosure and patching across enterprise environments is exactly the window attackers exploit.

The 24-Month Clock Is Already Ticking

Arora's warning about a 24-to-36-month window wasn't just conference keynote rhetoric. It was a candid acknowledgment from the CEO of a $100+ billion security company that the industry is losing ground.

The dynamics are brutal. AI-assisted vulnerability discovery reduces the time from "unknown flaw" to "weaponized exploit." AI-assisted malware development reduces the time from "proof of concept" to "operational capability." Meanwhile, enterprise patching cycles haven't gotten meaningfully faster in a decade. The average time to patch a critical vulnerability in enterprise environments still hovers around 60 days, according to industry reports from organizations like Qualys. Two months of exposure for every critical flaw. That's insane.

When I look at CVE-2024-3400 through this lens, the timeline is terrifying. The vulnerability was being exploited in the wild before a patch was available. This is the zero-day scenario that every security team dreads, and AI is going to make it more common, not less.

The question isn't whether AI will make attackers more effective. It already has. The question is whether defenders can use the same technology to close the gap. I'm cautiously optimistic about AI-driven detection and response, but I've also seen enough AI agent failures in production to know that deploying AI defensively comes with its own risks.

Here's what I think happens next: the security industry will consolidate aggressively around AI-native platforms. The point-solution era is ending because no human team can correlate signals across dozens of tools fast enough to catch AI-accelerated attacks. Arora himself has been pushing this platformization narrative at Palo Alto Networks, and whatever you think of his motives, the technical argument is sound.

But platformization won't matter if the platforms themselves have 10.0 CVSS vulnerabilities. That's the real lesson of CVE-2024-3400. The companies building the future of cybersecurity defense need to be dramatically better at securing their own code first. The attackers now have AI helping them check your homework. And right now, they're finding the mistakes faster than you can fix them.

Originally published on kunalganglani.com

Linux Copy-Primitive Bugs Keep Breaking Container Security: From Dirty COW to Leaky Vessels [2026]

Kunal — Tue, 05 May 2026 12:49:43 +0000

Three times in a decade. That's how often a Linux copy-primitive bug has blown a hole through container isolation. In 2016 it was Dirty COW. In 2024 it was Leaky Vessels. In 2026, a new class of Linux copy-primitive bugs is proving, again, that containers share a kernel. And that kernel keeps betraying them.

The pattern is hard to ignore. Bugs in how the Linux kernel copies, references, or manages data at the lowest level keep punching through container isolation boundaries. If you're running Docker or Podman in production, rootless or not, this should be on your radar. The next copy-primitive container escape isn't a question of if. It's when.

Why Linux Copy-Primitive Bugs Keep Breaking Containers

Containers aren't virtual machines. They don't have their own kernel. Every container on a host shares the same Linux kernel, separated only by namespaces, cgroups, and a handful of security mechanisms like seccomp and AppArmor.

That's the fundamental bargain: lightweight, fast isolation in exchange for sharing the most privileged piece of software on the machine. When a bug exists in the kernel's handling of copy operations — whether it's copying memory pages, file descriptors, or data between user and kernel space — it cuts across every isolation boundary containers rely on.

I learned this the hard way. After migrating production workloads to rootless Podman containers in 2022, I thought we'd significantly reduced our attack surface. We had. But the kernel was still the kernel. When Leaky Vessels dropped in early 2024, it was a cold reminder that our "rootless" setup was only as strong as the syscall layer sitting underneath it.

The copy-primitive pattern is consistent: the kernel needs to move or reference data — a memory page, a file descriptor, a buffer. The operation has a race condition, a leaked reference, or a missing permission check. An attacker inside a container exploits that flaw to read or write data they shouldn't touch, punching through the namespace boundary. Three times in ten years. That's not a coincidence. That's a systemic weakness in how Linux manages data at the lowest level.

Dirty COW: The Bug That Started the Pattern

Dirty COW (CVE-2016-5195) was a race condition in the Linux kernel's memory subsystem. It exploited how the kernel handles Copy-on-Write (COW) memory mappings. When a process tries to write to a read-only memory-mapped file, the kernel is supposed to create a private copy. Dirty COW exploited a race condition in that copy operation, allowing a local user to gain write access to read-only memory mappings.

The bug had existed in the kernel for nearly nine years before anyone found it. Nine years. In a component so fundamental that virtually every Linux system was affected.

For containers, Dirty COW was devastating. Because containers share the host kernel, any process inside a container could exploit the race condition to escalate privileges on the host. The isolation that namespaces and cgroups provided was irrelevant. The bug was beneath all of it.

Dirty COW proved something the container community didn't want to hear: if the kernel's copy mechanism is broken, your container boundary doesn't exist.

The fix was a kernel patch. But the lesson was bigger than one CVE. The kernel's memory management code is ancient, complex, and handles billions of operations per second. Copy-on-Write is not a feature you can rip out. It's foundational to how Linux works. And foundational code is where the worst bugs hide.

Leaky Vessels: Same Pattern, Different Layer

Fast forward to January 2024. Snyk's security research team disclosed Leaky Vessels, a set of vulnerabilities in runc, the container runtime used by both Docker and Podman. The most critical was CVE-2024-21626, which exploited a file descriptor leak during container initialization.

Different mechanism than Dirty COW. Identical pattern: a low-level operation that copies or references data across a trust boundary had a flaw. In this case, runc leaked a file descriptor pointing to the host filesystem into the container's process space. An attacker who controlled the container's working directory could use that leaked descriptor to escape the container and access the host filesystem.

This is a copy-primitive bug in spirit. The kernel and runtime are supposed to carefully manage which file descriptors are visible to which namespaces. A file descriptor is just a reference — a pointer to data. When that reference leaks across the container boundary, it's functionally the same as Dirty COW's memory page write: data that should be isolated isn't.

Having worked with container runtimes in production, I can tell you what made Leaky Vessels particularly terrifying wasn't just the escape. It was that the attack could be embedded in a malicious container image. Pull the wrong image, run it, and the container breaks out during initialization — before your runtime security tools even start monitoring. The attack surface was the docker run command itself.

The affected runc versions were patched quickly. But the incident reinforced a point that Adrian Mouat, author of Using Docker, has written about extensively: rootless containers aren't a magic bullet. If a kernel or runtime exploit exists, an attacker can still escalate privileges after breaking out.

Do Rootless Containers Actually Protect You From Copy-Primitive Bugs?

Rootless containers are the single best security improvement most teams can make to their container infrastructure. That's not the debate. The debate is whether they're sufficient.

Rootless containers operate within a distinct user namespace, mapping the container's internal root user to an unprivileged user ID on the host. As Red Hat has documented, the core benefit is straightforward: if there's a container breakout, the attacker only has the privileges of the unprivileged host user, not root.

That matters. A Dirty COW-style exploit inside a rootless container would land the attacker as an unprivileged user on the host rather than root. Massive reduction in blast radius.

But here's where teams get into trouble: they treat rootless mode as the finish line for container security rather than one layer of it. The most severe attacks chain a container escape with a separate kernel privilege escalation. You break out of the container as an unprivileged user, then use a second kernel bug to escalate to root. When Dirty COW was unpatched, that second step was trivial — the same bug that got you out of the container could also get you to root.

This chaining is exactly why copy-primitive bugs are so dangerous. They tend to affect the kernel at a level that's useful for both container escape and privilege escalation. A single bug gives you two steps of the kill chain. I wrote about similar defense-in-depth thinking for AI agents in production — the principle is the same: no single safeguard survives a determined, multi-step attack.

[YOUTUBE:x1npPrzyKfs|Linux Container Primitives: cgroups, namespaces, and more!]

How to Actually Defend Against Linux Copy-Primitive Container Escapes

I've spent the last two years hardening container deployments, and the boring answer is the right one: no single tool solves this. You need layers. Here's what I've seen actually work in production:

Patch aggressively and automatically. Copy-primitive bugs get patched in the kernel within days of disclosure. The problem is most organizations take weeks or months to roll out kernel updates. If you're running Kubernetes, tools like kured (Kubernetes Reboot Daemon) can automate node reboots after kernel updates. If you're running standalone Docker or Podman hosts, unattended-upgrades for the kernel package is table stakes. The window between disclosure and patch is where these bugs get exploited.

Run rootless by default. Yes, I just spent a section explaining why rootless isn't sufficient. It's still essential. Rootless mode in Podman is mature and production-ready. Docker's rootless mode has improved significantly since 2023. If you're still running containers as root in 2026, you're handing attackers a free privilege escalation on every container escape. Stop. Seriously.

Deploy syscall filtering with seccomp profiles. Copy-primitive bugs require specific syscalls to exploit. Dirty COW needed madvise and write. Leaky Vessels exploited WORKDIR processing during container init. Custom seccomp profiles that restrict unnecessary syscalls reduce the exploitability of kernel bugs you haven't even heard about yet. The default Docker seccomp profile blocks about 44 syscalls. For sensitive workloads, you should be blocking far more.

Consider gVisor for high-value workloads. Google's gVisor interposes a userspace kernel between your container and the host kernel. Your container's syscalls don't hit the real Linux kernel directly — they're intercepted by gVisor's Sentry process, which reimplements a subset of Linux syscalls in a sandboxed environment. A copy-primitive bug in the host kernel becomes unexploitable from inside the container because the container never makes the vulnerable syscall directly. The tradeoff is performance overhead and compatibility limitations. For multi-tenant or security-critical workloads, it's the strongest isolation you can get without a full VM.

Monitor for anomalous file descriptor and memory behavior. Tools like Falco can detect runtime behaviors associated with container escapes — unexpected file descriptor access patterns, attempts to access /proc/self/fd entries pointing outside the container's filesystem, or memory mapping operations that shouldn't be happening in your workload. This won't prevent the exploit, but it catches it in progress. Having worked through incident response on container escapes, I can tell you that detection at the early stages of exploit chains matters more than most teams realize.

The Pattern Isn't Going Away

Here's my prediction: we will see another major copy-primitive container escape within the next 18 months. The Linux kernel's memory management, file descriptor handling, and data copying paths are some of the oldest and most complex code in the entire operating system. They're also some of the most security-critical. Ancient + complex + security-critical = more bugs. Count on it.

The container model's fundamental architecture — shared kernel, namespace isolation — means every one of these bugs is a potential container escape. This isn't a flaw in Docker or Podman. It's a structural property of how Linux containers work.

The teams that survive the next copy-primitive bug won't be the ones who picked the right container runtime or checked the right compliance box. They'll be the ones who treated container isolation as one layer in a stack, patched their kernels in hours instead of weeks, and ran their most sensitive workloads behind gVisor or equivalent sandboxing. Rootless mode buys you time. Syscall filtering reduces your surface area. Runtime monitoring catches what slips through. But the kernel is still the kernel. And until containers stop sharing it, copy-primitive bugs will keep breaking the boundaries we trust them to enforce.

The only question is whether you'll be patched when the next one drops.

Frequently Asked Questions

Can Dirty COW still affect containers in 2026?

Dirty COW (CVE-2016-5195) was patched in the Linux kernel in October 2016. Any kernel version from 4.8.3 onward includes the fix. If you're running a supported, updated Linux distribution in 2026, Dirty COW itself is not a direct threat. However, the class of vulnerability it represents — race conditions in copy-on-write memory handling — continues to produce new bugs.

What is the difference between a container escape and a privilege escalation?

A container escape is when code running inside a container gains access to resources outside the container's namespace boundary — such as the host filesystem or another container's processes. A privilege escalation is when a process gains higher permissions than it was originally given, such as going from an unprivileged user to root. These are different attack steps, but they're often chained together: escape the container first, then escalate to root on the host.

Do rootless containers prevent all container escape attacks?

No. Rootless containers ensure that a breakout lands the attacker as an unprivileged host user instead of root, which significantly limits damage. But they don't prevent the escape itself. A kernel-level bug can still allow code inside a rootless container to access host resources. The attacker just has fewer permissions once they get there. For full protection, rootless mode should be combined with seccomp filtering, regular kernel patching, and runtime monitoring.

How does gVisor protect against kernel vulnerabilities?

gVisor runs a userspace kernel called Sentry that intercepts your container's system calls before they reach the host Linux kernel. Instead of your container code directly invoking kernel syscalls, gVisor reimplements a subset of those syscalls in a sandboxed Go process. This means a vulnerability in the host kernel's copy-on-write handling or file descriptor management can't be triggered from inside the container, because those calls never reach the vulnerable host code.

Was Leaky Vessels (CVE-2024-21626) exploited in the wild?

As of early 2024, there was no confirmed evidence of active exploitation before the Leaky Vessels disclosure. Snyk coordinated disclosure with the runc maintainers, and patches were released before proof-of-concept exploits became widely available. However, working exploits were developed quickly after disclosure, making rapid patching essential for any organization running affected runc versions (1.0.0 through 1.1.11).

Why do copy-related bugs keep appearing in the Linux kernel?

The Linux kernel's memory management and data-copying code paths are among the oldest and most complex in the entire codebase. Copy-on-Write, file descriptor passing, and buffer management involve intricate concurrency logic with millions of possible execution paths. These operations are also performance-critical, so they're heavily optimized in ways that can introduce subtle race conditions. The combination of complexity, age, and performance pressure makes these code paths a recurring source of security bugs.

Originally published on kunalganglani.com

Software Engineering Isn't Dead — It's Becoming 'Plan and Review' [2026]

Kunal — Mon, 04 May 2026 16:06:35 +0000

Every week, another breathless headline declares software engineering dead. Another AI demo shows a chatbot building a full-stack app in 90 seconds. Another LinkedIn thought leader posts a funeral wreath emoji next to the words "traditional coding."

And every week, I watch senior engineers at real companies quietly doing something that looks nothing like those demos. They're not typing code line by line. But they're not being replaced, either. They're doing something I've started calling plan-and-review software engineering. And honestly, it's the biggest change in how software gets built since the move from waterfall to agile.

What Is Plan-and-Review Software Engineering?

Plan-and-review software engineering is a workflow where engineers spend most of their time designing systems, writing specifications, orchestrating AI coding tools, and reviewing the output — rather than writing code by hand. The engineer becomes a director. The AI becomes the production crew.

This isn't theoretical. It's already happening. Sundar Pichai disclosed on an earnings call that more than 25% of new code at Google is now generated by AI, then reviewed and accepted by engineers. GitHub's own research shows Copilot users accept roughly 30% of code suggestions, and that number keeps climbing as models improve. Tools like Cursor, Claude Code, and Aider are pushing the boundary further every month.

I've been building software for over 14 years. The shift happening right now is real. Two years ago, I used AI assistants as glorified autocomplete. Today, I routinely describe an entire feature's architecture in natural language, let an AI agent scaffold the implementation, then spend my time reviewing, adjusting, and stress-testing the result. My job didn't disappear. It changed shape.

How Is the Software Engineering Role Changing Because of AI?

Here's the thing nobody's saying about this shift: it doesn't make the job easier. It makes it different. In some ways, harder.

When I was writing every line myself, I had intimate knowledge of what the system was doing because I'd typed it into existence. Now, when an AI generates 200 lines of a service layer in seconds, I need to understand that code just as deeply without having written it. That's a genuinely different kind of expertise.

The engineers I see thriving in plan-and-review workflows share a specific set of skills:

System design thinking. If you can't articulate what needs to be built at an architectural level, you can't direct an AI to build it well. Vague prompts produce vague code. Every time.
Specification writing. The prompt is the spec now. Engineers who write precise, unambiguous descriptions of behavior get dramatically better results than those who wing it.
AI orchestration. Knowing which tool to use for which task, how to chain agents together, when to break a problem into sub-problems the AI can handle independently. I've written about how AI coding agents are reshaping the way we think about code, and this orchestration layer is where the real leverage lives.
Critical code review. Not just "does this compile" review. Deep review that catches subtle logic errors, security holes, and architectural drift. AI-generated code looks confident even when it's dead wrong.
Domain expertise. The AI doesn't know your business rules, your compliance requirements, or why that edge case from three years ago almost took down production at 2 AM. You do.

Addy Osmani, Engineering Lead at Google, has written extensively about this. He's argued the developer's role is moving toward being a "reviewer-in-chief" — someone whose primary value comes from judgment, not keystrokes. That framing tracks with what I'm seeing on the ground.

The engineers who will be most valuable in 2026 aren't the ones who type the fastest. They're the ones who think the clearest.

What's the Difference Between Vibe Coding and Plan-and-Review Engineering?

Most people are conflating these two things. That's a mistake.

Vibe coding is what happens when someone opens an AI tool, types "build me a task management app," and ships whatever comes out. It's fast. It's fun. And it produces code that, in my experience auditing AI-generated projects, creates serious technical debt within weeks. I've personally seen vibe-coded applications with hardcoded secrets, SQL injection vulnerabilities, and architectural patterns that make future changes nearly impossible.

Plan-and-review engineering is the professional version of the same technology stack. The difference isn't the tools. It's the process.

A plan-and-review engineer starts with architecture. They define the data model, the API contracts, the error handling strategy, and the testing approach before the AI writes a single line. Then they use AI to accelerate implementation of a well-defined plan. Then they review the output with the same rigor they'd apply to a junior developer's pull request. Probably more rigor, honestly, because AI makes confident mistakes that a junior would at least flag with a comment saying "not sure about this."

Same equipment. Wildly different outcomes.

This is why I push back hard when people say AI will eliminate the need for engineering skill. It's the opposite. AI amplifies the gap between engineers who understand systems deeply and those who don't. A strong engineer with AI tools is 10x more productive. A weak engineer with AI tools produces 10x more bugs.

Will AI Replace Software Engineers?

Short answer: no. Longer answer: it will replace software engineers who refuse to adapt.

The data tells a clear story. Stack Overflow's 2024 Developer Survey found that 76% of developers are using or planning to use AI tools, but only 43% trust the accuracy of AI-generated code. That trust gap is exactly where human engineers live. Someone has to close it.

I've shipped enough features to know that the hard part of software engineering was never typing. It was figuring out what to type. It was debugging the interaction between three microservices at 11 PM when the monitoring dashboard lit up red. It was sitting in a room with a product manager and translating "we need it to be faster" into a concrete set of database indexes and caching strategies.

AI can't do that yet. And even when it gets closer, someone will still need to validate that it did it correctly. That's the plan-and-review loop.

What is disappearing is the junior developer task of implementing well-specified, straightforward features from scratch. If the task is "add a CRUD endpoint for this data model," an AI can do that in seconds. This means the entry path into software engineering is shifting. New engineers need to develop system-level thinking faster than previous generations did. I've written about how the state of software engineering is evolving in 2026, and the through-line is clear: the floor for what counts as "engineering work" is rising. Fast.

What Skills Do Software Engineers Need in the Age of AI?

If I were starting my career today, here's where I'd put my time:

Architecture and system design. This is the highest-leverage skill in a plan-and-review world. If you can design the system correctly, AI can build it. If you can't, no amount of tooling saves you.
Reading code faster than writing it. Most engineering education optimizes for writing. The future optimizes for reading, understanding, and evaluating code you didn't write. Get comfortable reviewing large diffs quickly.
Prompt engineering as specification. Not the gimmicky "10 magic prompts" stuff. Real specification writing. The kind where you define constraints, edge cases, and acceptance criteria in natural language so precisely that an AI produces correct code on the first try.
Testing and validation. If AI writes the code, humans validate the behavior. Property-based testing, integration testing, adversarial testing. These become even more critical when the code wasn't written by someone who understands the business context.
Domain knowledge. The deepest moat any engineer can build. AI is generic. Your understanding of healthcare compliance, financial reconciliation, or real-time bidding systems is specific and irreplaceable.

Having worked on teams that adopted AI-assisted development early, I can tell you: the engineers who struggled weren't the ones with fewer years of experience. They were the ones who had spent their careers optimizing for code output rather than system understanding. The fast typists suddenly had less of an edge. The careful thinkers had more of one.

The Director's Cut

Here's my prediction: by the end of 2027, the majority of professional software will be built using some version of plan-and-review. Not because it's trendy, but because the economics are brutal. A team of three senior engineers using AI-assisted plan-and-review workflows can match the output of a team of ten working the old way. Companies that don't adopt this will lose on speed and cost. Period.

But that prediction comes with a warning. The quality of software built this way depends entirely on the quality of the humans doing the planning and reviewing. We've already seen what happens when organizations treat AI coding as a shortcut to eliminate engineering judgment — they get code quality crises and maintenance nightmares.

Software engineering isn't dying. The craft of writing code by hand is becoming a smaller part of a much larger discipline. The engineers who recognize this and invest in the skills that actually matter — architecture, orchestration, validation, domain expertise — won't just survive the AI era. They'll define it.

Stop mourning the old job. Start mastering the new one.

Originally published on kunalganglani.com

Khan Academy Khanmigo AI Tutor: The 'AI Degree' That Doesn't Exist and What Actually Does [2026]

Kunal — Mon, 04 May 2026 12:48:38 +0000

Khan Academy Khanmigo AI Tutor: The 'AI Degree' That Doesn't Exist and What Actually Does [2026]

Every few weeks, a headline floats through my feed claiming Khan Academy has launched some kind of AI degree for developers. It hasn't. There is no Khan Academy AI degree. But the thing Khan Academy has built — an AI tutor called Khanmigo — deserves a more honest conversation than the clickbait it usually gets. What's actually happening in AI-powered education is more interesting, and more complicated, than a certificate you can slap on your LinkedIn.

I've spent 14+ years in software engineering, and I've watched the "how developers learn" question get reshaped by every wave: MOOCs, bootcamps, YouTube tutorials, and now AI tutors. Khanmigo is the latest entrant, and it represents a genuinely different philosophy. Here's what it actually is, who it's for, and what it means for developer education in 2026.

What Is Khanmigo and How Does It Actually Work?

Khanmigo is Khan Academy's AI-powered tutoring assistant, built on top of Google's Gemini model. It launched in 2023 and has been expanding steadily since. The core idea, as Sal Khan, Founder and CEO of Khan Academy, describes it, is a "Socratic tutor" — an AI that guides students through problems without simply handing them answers.

This distinction matters more than it sounds. If you've used ChatGPT to learn something, you know the default behavior: you ask a question, you get a complete answer. Useful for looking things up. Terrible for actual learning. Khanmigo deliberately refuses to do this. It asks follow-up questions, nudges you toward the next logical step, and makes you do the cognitive work yourself.

Having built onboarding systems for engineering teams, I can tell you the gap between "giving someone the answer" and "guiding someone to the answer" is enormous. The first creates dependency. The second builds problem-solving instincts. Khan Academy is betting that AI can finally make the second approach scalable.

Khanmigo also doubles as a teaching assistant. Kristen DiCerbo, Chief Learning Officer at Khan Academy, has shared data from early pilots showing the tool helps teachers save 30-60 minutes per day on administrative tasks like lesson planning. That's not a trivial number. It's the difference between a teacher who has time to give individual feedback and one who doesn't.

Is There Really a Khan Academy AI Degree?

No. Full stop.

Khan Academy has launched AI literacy courses — most notably an "AI for Education" course built in partnership with Google DeepMind, as reported by Anya Kamenetz at Fast Company. But this course targets educators and parents who want to understand what AI is and how it works. It's not a technical certification. It's not a degree. It will not teach you to fine-tune models or build retrieval-augmented generation pipelines.

The confusion likely comes from the sheer volume of "AI degree" and "AI certification" content flooding search results right now. Everyone from Coursera to Google to random Udemy instructors is marketing some form of AI credential, and Khan Academy's name gets swept into that current because it's the most recognizable brand in free online education.

Here's the thing nobody's saying about this: Khan Academy isn't trying to compete with formal AI certificate programs. They're doing something fundamentally different. Rather than creating a new credential, they're embedding AI into the learning process itself. The product isn't an AI course. The product is an AI tutor that helps you learn anything on their platform more effectively.

If you're a developer looking for an actual AI credential that'll move the needle on your career, I wrote about the skills that actually matter in the full-stack developer roadmap for 2026. Spoiler: it's less about certificates and more about what you can demonstrably build.

Who Is Khanmigo Actually For?

Most coverage of Khanmigo gets this wrong, so let me be direct.

Khanmigo is primarily designed for K-12 students and their teachers. It's not a developer tool. It's not competing with GitHub Copilot or Cursor or any of the AI coding assistants that working engineers use daily. The target user is a high school student struggling with algebra, or a teacher who needs help creating a lesson plan for AP Computer Science.

Khan Academy has made significant moves to broaden access. In 2023, Microsoft announced a partnership to provide Khanmigo for Teachers free to all K-12 educators in the United States, backed by Azure AI infrastructure, as reported by Daniel M. Filler at Forbes. Since then, Khan Academy has been steadily expanding free access to Khanmigo for learners as well — moving away from its initial $9/month or $99/year pricing model. With backing from both Microsoft and Google, the trajectory is clearly toward making the AI tutor freely available to as many students as possible.

So if you're a mid-career developer wondering whether Khanmigo will help you learn transformer architectures or master Kubernetes, the honest answer is no. Khan Academy's content library is deep in math, science, and introductory computing — not the kind of advanced technical material senior engineers need. I've seen too many experienced developers waste time on learning resources pitched two levels below where they actually are. Know your level. Pick your tools accordingly.

But I'd push back on pure dismissal. If you're mentoring junior developers, or involved in hiring and onboarding, Khanmigo's Socratic tutoring approach is worth studying. The model of "don't give the answer, guide toward it" is exactly what good engineering mentorship looks like. As I discussed in how AI is reshaping the role of software engineers, the ability to think through problems systematically is becoming more valuable as AI handles more routine code generation.

How Khanmigo Compares to Formal AI Certificate Programs

The real comparison isn't Khanmigo vs. other AI tutors. It's Khanmigo's philosophy vs. the credentialing industry.

On one side, you have companies like Google (with their AI Essentials certificate), Coursera (partnered with DeepLearning.AI), and AWS (with their machine learning specializations). These are structured programs with defined curricula, assessments, and certificates you can add to your resume. They cost anywhere from free to several hundred dollars, and they take weeks to months to complete.

On the other side, Khan Academy is saying: "We're not going to give you a certificate. We're going to give you an AI tutor that helps you learn better." Different game entirely.

Feature	Formal AI Certificates	Khanmigo AI Tutor
Credential on completion	Yes	No
Target audience	Working professionals	K-12 students, educators
AI-specific content depth	High (ML, deep learning)	Low (AI literacy basics)
Learning methodology	Structured courses	Socratic, adaptive tutoring
Cost	$0–$500+	Free (expanding access)
Resume signal	Direct	None

For developers, formal certificates are the more pragmatic choice right now. If you need to demonstrate AI competency to a hiring manager, a Google or AWS certificate does that. Khanmigo doesn't.

But Khan Academy is playing a longer game. If Khanmigo proves that AI-guided Socratic tutoring genuinely produces better learning outcomes than passive video courses, every corporate learning platform, every bootcamp, every university will need to reconsider how they deliver content. The credential matters less if the learning is demonstrably deeper.

What This Means for Developer Education

Developer education in 2026 is broken in a specific way: there's infinite content and almost no effective learning. You can find a tutorial on literally anything. But most tutorials teach you to copy patterns, not to think. The tech job market bifurcation I've written about is partly a learning problem. Developers who can regurgitate framework syntax are struggling. Developers who can reason about systems are thriving.

Khanmigo, for all its limitations in scope, is pointed at the right problem. The Socratic method forces active reasoning. You can't passively sit through a Khanmigo session the way you can passively watch a 4-hour YouTube tutorial at 2x speed. That friction is the point.

I've shipped enough features and mentored enough junior engineers to know this firsthand: the developers who grow fastest are the ones who struggle productively. Not the ones who copy-paste from Stack Overflow or ask ChatGPT to write their code. Struggle, when guided properly, is the actual learning mechanism. This is one of those things where the boring answer is actually the right one.

The question worth asking isn't whether Khan Academy will launch a developer-focused AI degree. They won't. It's whether Khanmigo's model — AI as Socratic tutor rather than answer machine — becomes the default approach for technical education over the next five years.

The future of developer education isn't AI that writes your code for you. It's AI that makes you a better thinker by refusing to write it for you.

If you're a working developer, Khanmigo isn't your next learning tool. But if you care about how the next generation of engineers will learn to think — and eventually join your team — pay attention to what Khan Academy is building. The AI tutor that asks questions instead of giving answers might be the most counterintuitive and most important bet in education right now.

The developers who'll dominate the next decade won't be the ones with the most certificates. They'll be the ones who learned how to reason. That's the game Khan Academy is actually playing.

Originally published on kunalganglani.com

State of Software Engineering in 2026: A Reality Check Beyond the AI Hype

Kunal — Sun, 03 May 2026 16:09:26 +0000

State of Software Engineering in 2026: A Reality Check Beyond the AI Hype

Three and a half years ago, Matt Welsh, PhD and former Google engineer, published "The End of Programming" in Communications of the ACM and declared that classical computer science was over. The meteor had hit. Engineers were the dinosaurs. The state of software engineering in 2026, he implied, would look nothing like what came before.

He was half right.

I've spent 14+ years building software systems, leading engineering teams, and shipping products. What I see in mid-2026 is messier than any of the hot takes predicted. AI didn't kill software engineering. But it did reshape what "being a good engineer" means in ways that matter. The developers who ignored this shift are struggling. The ones who leaned into it thoughtfully are doing the best work of their careers.

Here's what actually happened.

How Has AI Actually Changed Day-to-Day Coding?

McKinsey estimates that generative AI can accelerate coding by 35 to 45 percent, documentation by 45 to 50 percent, and testing by 30 to 45 percent. Thomas Dohmke, CEO of GitHub, published research showing developers using Copilot completed tasks 55% faster than those without it.

Those numbers are real. I've seen them play out on my own teams. But here's the thing nobody's saying about those productivity gains: they're concentrated almost entirely in the boring parts of the job.

Boilerplate CRUD endpoints? AI crushes that. Generating test scaffolding? Fantastic. Writing documentation that nobody wanted to write anyway? AI is genuinely great at this. I've watched junior developers produce first drafts of API docs in minutes that would have taken hours.

But the moment you move into ambiguous territory — figuring out the right data model for a system that needs to serve three different teams with conflicting requirements, or debugging a race condition that only shows up under specific load patterns — AI assistants become expensive rubber ducks. They'll confidently suggest solutions that sound plausible and are completely wrong.

Dohmke describes AI as a "thought partner" that helps developers reduce cognitive load and maintain flow state. I agree with that framing, but only when you already know roughly what you're building. AI accelerates execution. It does not accelerate understanding.

The engineers who got faster are the ones who were already good. The ones who were struggling didn't get rescued by AI — they got faster at producing code that still needed to be rewritten.

In my experience, roughly 40% of AI-generated code gets rewritten within two weeks. Not because the AI wrote "bad" code in the syntactic sense, but because it wrote the wrong abstraction, missed an edge case in the business logic, or created something that didn't compose well with the existing system. If you want the deep dive on why, I wrote about the maintainability crisis of AI-generated code earlier this year.

What Skills Do Software Engineers Need in 2026?

This is where it gets interesting. The skills that matter most in 2026 aren't the ones you'd learn from a bootcamp or a "10x developer" YouTube tutorial. They're the skills that were always valuable but are now non-negotiable.

System design is the new literacy. When AI can generate individual components quickly, the bottleneck shifts to the person who decides what components should exist, how they talk to each other, and what happens when one of them fails at 3am. Conor Bronsdon of LinearB put it well: the shift is from "code monkeys" to "problem solvers." I'd go further. It's from people who write code to people who design systems.

Debugging skills matter more, not less. This sounds counterintuitive. If AI writes more code, shouldn't there be less debugging? Nope. There's more. Because now you're debugging code you didn't write, that follows patterns you didn't choose, with assumptions you might not share. It's closer to debugging a colleague's code than your own — you have to read with skepticism. I've written about how AI coding agents are changing the way we think about code, and debugging is the skill that keeps coming up in those conversations.

Business context is your moat. As Harvard Business Review highlighted, AI can generate the "how" but it struggles with the "what" and "why." The engineer who understands why the billing system needs to handle partial refunds differently for enterprise customers versus consumers — that's someone AI can't replace. Gunnar Griese, VP of Engineering at Wayfair, calls this the evolution into a "techno-sociologist" who understands both the technology and the business deeply. I think that's exactly right.

Communication is a force multiplier. The best code in the world is worthless if you can't explain the tradeoffs to a product manager, write a clear RFC, or document your decisions for the engineer who maintains the system two years from now. AI has actually made good documentation even more critical, because AI-generated systems need more context to be maintainable.

Is Prompt Engineering a Real Skill for Developers?

Let me be direct: prompt engineering as a standalone discipline is mostly dead. But prompt literacy as a core developer competency is very much alive.

The difference matters. In 2023 and 2024, people were building entire careers around "prompt engineering" as if crafting the perfect system prompt was a durable skill. It wasn't. Models got better at understanding intent. The gap between a mediocre prompt and a great one narrowed significantly.

What didn't go away is the meta-skill: knowing how to decompose a problem so that an AI tool can actually help you solve it. This is really just good engineering thinking applied to a new tool. You need to know what to ask for, how to evaluate the output, and when to throw it away and do the thing yourself.

I've shipped enough features alongside AI tools to know that the developers who use them best treat them like a very fast, very confident intern. You wouldn't hand an intern a vague requirement and expect production-ready code back. You'd break the problem down, give clear context, review the output carefully, and iterate. Same thing.

[YOUTUBE:PEFso88LkC4|My Honest Thoughts on AI and the Job Market in 2026 (No Hype)]

What Parts of Software Engineering Can AI Not Do?

Here's my honest list of things AI is genuinely bad at in mid-2026, despite years of rapid improvement:

Cross-system reasoning. AI can work within a single file or module brilliantly. Ask it to reason about how a change in the authentication service will cascade through the event bus to affect the billing pipeline, and it falls apart. Real systems are messy graphs, not clean trees.
Organizational context. Why did we choose Postgres over DynamoDB for this service? Because the team that owns it has three Postgres experts and zero DynamoDB experience. AI doesn't know this. It will happily recommend the "optimal" solution that your team can't actually operate.
Saying no. This one doesn't get talked about enough. AI will build whatever you ask for. It won't push back and say "this feature is a bad idea because it conflicts with what we shipped last quarter." It won't tell you the complexity isn't justified by the user value. That judgment is still entirely human.
Debugging production under pressure. When your system is down at 2am and you're staring at a graph that shows p99 latency spiking while CPU is flat, you need pattern recognition built from years of being in that seat. AI can suggest possibilities. It can't feel the system. It can't say "this smells like a connection pool leak" the way a senior engineer who's been burned by one before can.
Navigating ambiguity. The hardest part of most engineering projects isn't writing the code. It's figuring out what to build when the requirements are contradictory, the stakeholders disagree, and the timeline is unrealistic. No model solves that for you.

Matt Welsh was right that the role is changing. But the direction of change is toward more human judgment, not less. The mechanical parts got automated. The parts that require wisdom, context, and taste became more valuable.

Will AI Replace Software Engineers?

No. But it will replace software engineers who refuse to adapt.

This is one of those things where the boring answer is actually the right one. The state of software engineering in 2026 isn't a dystopia where engineers are obsolete, and it isn't a utopia where AI handles everything while we sip coffee. It's a messy middle where the tools got dramatically better and the expectations rose to match.

Here's what I've seen across the teams I've worked with: the engineers who are thriving share three traits.

First, they use AI tools aggressively for the tasks those tools are good at. They don't resist out of pride. They don't waste time hand-writing boilerplate.

Second, they invest heavily in the skills AI can't replicate. System design, stakeholder communication, debugging under ambiguity, deep domain knowledge.

Third, they maintain strong opinions about code quality and architecture. They don't accept AI output uncritically. They treat it as a starting point, not a finished product. As I wrote in my piece on vibe coding tech debt, the teams that skip this review step pay for it within weeks.

The engineers who are struggling fall into two camps: the ones who rejected AI tools entirely and fell behind on velocity, or the ones who embraced them uncritically and are now drowning in tech debt they don't understand. Both extremes lose.

The Craft Isn't Dead. The Bar Just Moved.

Software engineering in 2026 demands more from practitioners, not less. The floor got raised — anyone can scaffold an app with an AI assistant now. But the ceiling got raised too. The best engineers are building more ambitious systems, faster, because they've integrated AI into their workflow without surrendering their judgment.

My prediction for the next two years: the gap between engineers who can design systems and engineers who can only write code will widen dramatically. Companies will stop hiring for "coding ability" and start hiring for "systems thinking with AI fluency." The job title stays the same. The job description is already unrecognizable.

If you're a software engineer reading this, here's what I'd do today: get uncomfortable with AI tools if you haven't already. But spend twice as much time on system design, on understanding your business domain, and on learning to communicate technical decisions clearly. Those are the skills that compound. Those are the ones no model can automate away.

The craft of software engineering isn't dying. It's being distilled down to the thing it was always actually about: thinking clearly about hard problems. The typing was never the point.

Originally published on kunalganglani.com

How to Write a Good README: Your Project's Most Important File [2026 Guide]

Kunal — Fri, 01 May 2026 12:47:29 +0000

Most open-source projects die in silence. Not because the code is bad, but because the README is.

I've evaluated hundreds of GitHub repositories over fourteen years of engineering work. Reviewing them for adoption at companies, scanning them for open-source contributions, auditing them for internal tooling decisions. Every single time, the first thing I look at is the README. Not the source code. Not the issues. The README. If it's empty, vague, or a wall of unformatted text, I close the tab. So does everyone else. Learning how to write a good README is one of the highest-leverage skills a developer can build, and most of us never bother.

A README is a text file that introduces, explains, and sells your project. It lives at the root of your repository, and platforms like GitHub automatically surface it to every visitor. It answers three questions: what does this do, why should I care, and how do I use it. Get those right, and you've built the foundation for everything else — contributors, users, trust.

Your README is the beginning of your project's user experience. A bad one can be a major turn-off for potential users and contributors.
— David Oglesby, Principal Engineer at Heptio

Why Your README Matters More Than Your Code

Here's the thing nobody says about open source: your code quality is invisible until someone actually clones the repo and starts reading it. Your README quality is visible in under three seconds.

Daniel Beck, a Software Engineer who has spoken extensively on documentation practices, makes the argument that a README is an opportunity to demonstrate professionalism and empathy. It shows you care about the people who might use your work, not just the work itself. That signal matters. When I'm evaluating two libraries that solve the same problem, the one with the clear, well-structured README wins almost every time. It's not rational. It's human. If you can't explain your project clearly, why would I trust your architecture decisions?

Tom Preston-Werner, co-founder of GitHub, coined the term "Readme Driven Development" back in 2010. His argument was simple: write the README before you write any code. The act of explaining your software forces you to think through what you're actually building. More than a decade later, this advice is still underrated. I've shipped features that would have been scoped completely differently if I'd written the README first. The README isn't documentation you bolt on at the end. It's a design document that happens to also be user-facing.

For those maintaining open-source projects, this matters doubly. The open source sustainability crisis is real, and a well-crafted README is one of the few zero-cost tools that directly increases contributor retention. People contribute to projects they understand.

What Should a Good README Include?

The Make a README project provides a solid starting template covering the essentials: Installation, Usage, Contributing, and License. That's a good floor. But a README that only covers the basics is like a landing page with no value proposition. You need to go further.

Here's what a strong README actually contains, and more importantly, why each section earns its place:

Project title and one-line description. This sounds obvious, but I've seen repos where I genuinely couldn't figure out what the project did after reading the first three paragraphs. Your first line should tell me what this is and who it's for. "A lightweight CLI tool for converting Markdown to PDF" beats "Welcome to ProjectX!" every time.

The Why. Most READMEs skip straight to installation. That's a mistake. Before I install anything, I need to know why this exists. What problem does it solve? What alternatives did you consider? Two sentences here save your users twenty minutes of research.

Installation instructions. Be specific. Include the package manager command, the minimum runtime version, and any system dependencies. Don't assume everyone is on your OS. If there's a Docker option, mention it.

Usage examples. Show me the two or three most common things someone would do with your project. Keep it concrete. A single clear example is worth more than a link to full API docs.

Contributing guidelines. Even a short section signals that you welcome contributions. Link to a CONTRIBUTING.md if you have one, but put the basics in the README itself. How to run tests, how to submit a PR, what the code style expectations are.

License. Always include it. A project without a license is legally unusable in most corporate environments. MIT, Apache 2.0, GPL — pick one and state it clearly.

Here's a video walkthrough that covers these fundamentals well:

[YOUTUBE:E6NO0rgFub4|How To Write a USEFUL README On Github]

How to Write a README That Builds Trust Instantly

Beyond the essentials, there's a layer of README craft that separates good projects from the ones that actually get adopted. These are the details that build trust before someone writes a single line of integration code.

Badges. Those little colored shields at the top of a README — build passing, coverage percentage, npm version, license type — aren't decoration. According to shields.io, which powers most of them, badges provide live, at-a-glance information about a project's health. A green "build passing" badge tells me this project has CI. A coverage badge tells me someone cares about testing. I've worked on teams where the presence or absence of badges was literally part of the library evaluation checklist.

Screenshots or GIFs. If your project has any visual component — a CLI with colored output, a web UI, a mobile app — show it. A three-second GIF communicates more than ten paragraphs of description. Tools like asciinema for terminal recordings or simple screen captures go a long way.

Architecture diagrams. For anything more complex than a single-purpose utility, a high-level architecture diagram pays for itself immediately. GitHub now natively renders Mermaid.js syntax in Markdown files. That means you can embed flowcharts, sequence diagrams, and entity-relationship diagrams directly in your README without generating external images. No extra build step, no stale PNGs. Just write the Mermaid syntax in a fenced code block and GitHub renders it. I've started using this for every project with more than two components. The drop in "how does this work?" questions has been noticeable.

A table of contents. If your README is longer than a few screen heights, add one. Markdown doesn't have native TOC support, but GitHub auto-generates anchor links for headers. A simple bulleted list at the top linking to each section makes your README navigable instead of scrollable.

If you've dealt with AI-generated code quality issues, you know how important clear documentation is for codebases that may have been partially generated. A strong README is your first line of defense against confusion.

The 5 README Mistakes That Kill Open-Source Projects

After reviewing hundreds of repositories, both professionally and as an open-source contributor, I see the same mistakes constantly. These are the ones that actually cost projects users and contributors:

The empty README. Just a project name and nothing else. This is the equivalent of opening a restaurant with no sign, no menu, and the lights off. GitHub will still surface this file. It'll just surface your indifference.
The "obvious to me" README. Installation instructions that assume you already know the project's ecosystem. "Run make install" with no mention of dependencies, build tools, or supported platforms. What's obvious to you after six months of development is completely opaque to a first-time visitor.
The novel. Ten thousand words, no headers, no structure, no visual hierarchy. The Make a README project makes this point well: a README should be scannable. Engineers don't read documentation linearly. They scan for the section they need. If your README is a single block of prose, nobody is reading it.
The outdated README. Installation instructions referencing a deprecated API. Screenshots of a UI redesigned two versions ago. Config examples that throw errors on run. An outdated README is worse than a missing one because it actively misleads people. I once spent forty minutes debugging a setup issue that turned out to be a README pointing to a config format the project had abandoned months earlier. Forty minutes I'll never get back.
The "see the docs" README. A single line: "For documentation, visit our wiki." And the wiki is either empty, disorganized, or requires authentication. Your README is the documentation entry point. If someone has to leave the repository to understand your project, you've already lost most of them.

Advanced README Patterns Worth Stealing

Once you've nailed the basics, here are patterns I've seen in the best READMEs across the ecosystem.

The comparison table. If your project exists in a competitive space (and most do), a brief comparison table showing how you differ from alternatives is worth the effort. Columns for features, performance characteristics, or philosophy. This isn't about trashing competitors. It's about helping users make informed decisions quickly.

The "non-goals" section. Explicitly stating what your project doesn't do is almost as valuable as stating what it does. It saves users from evaluating your tool for a use case you'll never support, and it signals architectural maturity. I've started including this in internal project docs too, and it dramatically reduces scope creep conversations.

The quick-start vs. full guide split. Give impatient users (which is all of us) a five-line quick-start at the top, then provide the detailed guide below. This respects both the "I just want to try it" user and the "I need to understand everything" user.

Versioned compatibility matrices. A table showing which versions of your project work with which versions of its dependencies. Especially critical for libraries. Nothing wastes developer time like version incompatibility surprises, and a simple table prevents hours of debugging. If you've ever dealt with vibe-coded tech debt, you know that unclear dependency documentation makes a bad situation catastrophic.

The README Is the Product

Here's my actual take: the line between documentation and product is gone. In a world where developers evaluate tools in minutes, not days, your README is the product experience. The landing page, the sales pitch, the onboarding flow, and the support document. All compressed into a single Markdown file.

I've watched projects with mediocre code but excellent READMEs outperform technically superior projects with terrible documentation. That's not a fluke. That's the market telling you something. The projects that win aren't always the best-engineered. They're the most accessible.

So here's my challenge: go look at the README of your most important project right now. Read it as if you've never seen the codebase. Does it answer what this is, why it exists, and how to use it in under sixty seconds? If not, that's your highest-impact commit this week. Not a new feature. Not a refactor. A README rewrite.

The best code in the world is worthless if nobody can figure out what it does.

Originally published on kunalganglani.com