<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Zak Mandhro</title>
    <description>The latest articles on DEV Community by Zak Mandhro (@zak_mandhro).</description>
    <link>https://dev.to/zak_mandhro</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2603892%2F85f00279-6838-4593-97b5-f3467d5d16ca.png</url>
      <title>DEV Community: Zak Mandhro</title>
      <link>https://dev.to/zak_mandhro</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zak_mandhro"/>
    <language>en</language>
    <item>
      <title>GitHub Copilot just crushed every AI review startup (40.3M PR analysis)</title>
      <dc:creator>Zak Mandhro</dc:creator>
      <pubDate>Fri, 19 Dec 2025 16:00:00 +0000</pubDate>
      <link>https://dev.to/zak_mandhro/github-copilot-crushed-every-code-review-startup-40m-pr-analysis-2no6</link>
      <guid>https://dev.to/zak_mandhro/github-copilot-crushed-every-code-review-startup-40m-pr-analysis-2no6</guid>
      <description>&lt;p&gt;I analyzed 40.3 million public pull requests from 2022-2025.&lt;/p&gt;

&lt;p&gt;The data is brutal: GitHub Copilot now dominates organizational adoption despite CodeRabbit processing more total PRs.&lt;/p&gt;

&lt;p&gt;This isn't a hot take. It's what 40M PRs told us.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;CodeRabbit was built from the ground up for AI code review. It's genuinely good at what it does.&lt;/p&gt;

&lt;p&gt;👑 &lt;strong&gt;CodeRabbit: #1 in PR volume for 2025.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But that crown is slipping. Copilot overtook them in monthly PRs in November, and already leads in org adoption.&lt;/p&gt;

&lt;p&gt;Copilot wasn't built for code review. It started as autocomplete. Yet here we are.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t697rioxwmniv9zko0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t697rioxwmniv9zko0w.png" alt="Cumulative Org Adoption by Agent - shows Copilot's purple line at 29K+ crushing everyone else" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Platform Usually Wins
&lt;/h2&gt;

&lt;p&gt;Here's what the data actually shows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copilot's advantage isn't the model. It's the distribution.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-installed in every GitHub org that pays for it&lt;/li&gt;
&lt;li&gt;Zero friction to enable&lt;/li&gt;
&lt;li&gt;Shows up in the workflow developers already use&lt;/li&gt;
&lt;li&gt;No new vendor to approve, no new tool to learn&lt;/li&gt;
&lt;li&gt;Can bundle pricing into existing subscriptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CodeRabbit requires you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find and evaluate it&lt;/li&gt;
&lt;li&gt;Get approval and install it&lt;/li&gt;
&lt;li&gt;Configure and maintain it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That friction compounds. Copilot wins by being &lt;em&gt;there&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Consolidation Already Happened
&lt;/h2&gt;

&lt;p&gt;Look at this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Top 3 AI review agents control 72% of all activity.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;CodeRabbit&lt;/td&gt;
&lt;td&gt;~33%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;td&gt;~29%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;~10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Everyone else&lt;/td&gt;
&lt;td&gt;~28%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And it's getting worse for the long tail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Korbit&lt;/strong&gt;, which raised money, had traction, and was purpose-built for code review, shut down this year.&lt;/p&gt;

&lt;p&gt;The market consolidated before most people realized there was a market.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Long Tail Is Crowded
&lt;/h2&gt;

&lt;p&gt;Beyond the top 10, dozens of AI review agents are fighting for what's left:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Orgs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cubic AI&lt;/td&gt;
&lt;td&gt;~197&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ellipsis&lt;/td&gt;
&lt;td&gt;~160&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenHands&lt;/td&gt;
&lt;td&gt;~160&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bolt AI&lt;/td&gt;
&lt;td&gt;~114&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Based on observable public repo installs&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And there are dozens more. All competing for the 28% not owned by the top 3.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Platform Giants Are Coming
&lt;/h2&gt;

&lt;p&gt;What happens when OpenAI and Google actually start pushing their code review features?&lt;/p&gt;

&lt;p&gt;Look at org adoption growth from October to November alone:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Oct 2025&lt;/th&gt;
&lt;th&gt;Nov 2025&lt;/th&gt;
&lt;th&gt;Growth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;td&gt;4,875&lt;/td&gt;
&lt;td&gt;5,911&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+21%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;1,220&lt;/td&gt;
&lt;td&gt;2,788&lt;/td&gt;
&lt;td&gt;+128%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both are accelerating: ChatGPT added over 1,000 orgs in a single month, while Gemini more than doubled.&lt;/p&gt;

&lt;p&gt;And &lt;strong&gt;Gemini still grew 43x this year&lt;/strong&gt; - from basically nothing to #3 in PR volume globally.&lt;/p&gt;

&lt;p&gt;Neither OpenAI nor Google is even trying yet. No big marketing push. No deep IDE integration. No bundling with Workspace or Cloud.&lt;/p&gt;

&lt;p&gt;And they have the same platform leverage as Microsoft. Millions already subscribe to ChatGPT and Google Workspace for other use cases. Code review becomes a free add-on, not a new line item.&lt;/p&gt;

&lt;p&gt;When they flip that switch, what happens to Sourcery, Greptile, Ellipsis, and the rest of the long tail?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Adoption Curve Is Insane
&lt;/h2&gt;

&lt;p&gt;Let me show you how fast this moved:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubtqow0pelhfonrju6zy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubtqow0pelhfonrju6zy.png" alt="AI Agent Participation - Feb 2024: 1.1% to Nov 2025: 14.9%" width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Roughly 14x in 21 months.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;1 in 7 PRs now has an AI reviewer participating. Not a prediction. Already happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means If You're Building Dev Tools
&lt;/h2&gt;

&lt;p&gt;I'll be direct:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Distribution &amp;gt; Features.&lt;/strong&gt; Copilot proved it. Being native to the platform beats being best-in-class at the feature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The platform tax is real.&lt;/strong&gt; If GitHub/Microsoft decides your feature is worth building, you're competing against free + pre-installed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consolidation is faster than you think.&lt;/strong&gt; Some players won't survive 2026.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI and Google are the wildcards.&lt;/strong&gt; Both growing fast with virtually zero marketing effort for code reviews.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  My 2026 Prediction
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Copilot, ChatGPT, and Gemini lock in the top 3. Claude and Cursor climb to #4 and #5.&lt;/li&gt;
&lt;li&gt;AI agents author 2M+ PRs (up from 335K in 2025).&lt;/li&gt;
&lt;li&gt;AI-reviewing-AI becomes a thing - agents reviewing agent-authored code grows 10x.&lt;/li&gt;
&lt;li&gt;At least 2 long-tail players shut down or get acqui-hired.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Full Data
&lt;/h2&gt;

&lt;p&gt;Everything I cited is from our State of AI Code Review 2025 report:&lt;/p&gt;

&lt;p&gt;📊 &lt;strong&gt;&lt;a href="https://pullflow.com/state-of-ai-code-review-2025?utm_source=social&amp;amp;utm_medium=dev-to&amp;amp;utm_campaign=soacr-2025" rel="noopener noreferrer"&gt;Full report + methodology&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Includes monthly breakdowns, full agent rankings, and the methodology for how we identified and classified AI agents.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Methodology note: We filtered for active repos only - at least 10 PRs per month and at least 0.3 feedback events per PR. This removes noise and surfaces repos with real development activity.&lt;/em&gt;&lt;/p&gt;
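&lt;p&gt;&lt;em&gt;For the curious, that activity threshold is simple enough to sketch in a few lines of Python (illustrative only - the function and argument names here are not from the report's actual pipeline):&lt;/em&gt;&lt;/p&gt;

```python
def is_active_repo(prs_per_month, feedback_events, total_prs):
    """Keep repos with at least 10 PRs/month and at least
    0.3 feedback events per PR, per the methodology note."""
    if total_prs == 0:
        return False
    return prs_per_month >= 10 and feedback_events / total_prs >= 0.3

print(is_active_repo(12, 30, 60))   # 0.5 events/PR: kept
print(is_active_repo(8, 100, 100))  # too few PRs/month: filtered out
```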




&lt;h2&gt;
  
  
  I Want the Counterarguments
&lt;/h2&gt;

&lt;p&gt;Seriously. Tell me where I'm wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is "platform beats product" too simplistic?&lt;/li&gt;
&lt;li&gt;Are there code review startups that can survive the consolidation?&lt;/li&gt;
&lt;li&gt;Is the Copilot dominance overstated because of GitHub's visibility bias?&lt;/li&gt;
&lt;li&gt;Am I underestimating how much enterprises care about best-in-class vs. good-enough-and-integrated?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've been staring at this data for weeks. I want someone to challenge it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I work on &lt;a href="https://pullflow.com?utm_source=social&amp;amp;utm_medium=dev-to" rel="noopener noreferrer"&gt;PullFlow&lt;/a&gt; - we're agent-agnostic, building unified code review across GitHub, Slack, and AI agents. This research came from trying to understand where the market is actually going.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>discuss</category>
      <category>startup</category>
    </item>
    <item>
      <title>How to Run Your Own OpenAI GPT OSS Server for Fun and Profit</title>
      <dc:creator>Zak Mandhro</dc:creator>
      <pubDate>Thu, 07 Aug 2025 16:40:20 +0000</pubDate>
      <link>https://dev.to/pullflow/how-to-run-your-own-openai-gpt-oss-server-for-fun-and-profit-1amj</link>
      <guid>https://dev.to/pullflow/how-to-run-your-own-openai-gpt-oss-server-for-fun-and-profit-1amj</guid>
      <description>&lt;p&gt;&lt;em&gt;Deploy GPT-OSS locally on a commodity gaming PC and watch your API bills disappear while your team's productivity soars&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The game changed in August 2025 when OpenAI dropped GPT-OSS—their first open-weight models since GPT-2. These aren't toy models; gpt-oss-120b matches OpenAI's proprietary o4-mini on reasoning benchmarks while gpt-oss-20b rivals o3-mini, and both can run on hardware you can order from Amazon today.&lt;/p&gt;

&lt;p&gt;This isn't just about having cool tech on your desk. This is about fundamentally changing the economics of AI for your team, gaining complete control over your models, and having unlimited access to enterprise-grade reasoning capabilities without the monthly subscription anxiety.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Your Team Needs Local AI (Spoiler: It Pays for Itself in Months)
&lt;/h2&gt;

&lt;p&gt;Let's talk numbers that matter to your bottom line. If you have 10 team members using AI tools at an average of $30 per month each (ChatGPT Plus, Claude Pro, or API costs), you're spending $3,600 annually. A capable GPT-OSS server costs $2,700-$3,100 upfront, meaning &lt;strong&gt;your hardware investment pays for itself in 9-11 months&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But the economics get even better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Year 2 savings&lt;/strong&gt;: $3,600 &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Year 3 savings&lt;/strong&gt;: Another $3,600&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost per additional user&lt;/strong&gt;: Nearly zero&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Business Case Beyond Cost Savings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPT-OSS comes with Apache 2.0 licensing, which means you can fine-tune these models on your proprietary data, customize behavior for your industry, and create competitive advantages that API-based solutions simply can't offer. Your legal team processes contracts differently than your marketing team writes copy—local AI lets you optimize for both without compromise.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fympwtdd727v7gjaey49v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fympwtdd727v7gjaey49v.png" alt="Network Setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data sovereignty&lt;/strong&gt; becomes critical when you're processing sensitive information. Client data, internal strategies, and proprietary code never leave your network. Compare this to cloud APIs where your data travels through external servers, potentially triggering compliance headaches in regulated industries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No rate limits&lt;/strong&gt; means your developers can iterate freely, your content team can brainstorm without throttling, and your data analysts can process large datasets without worrying about quota exhaustion. The psychological shift from "conserving API calls" to "unlimited experimentation" unlocks creativity and productivity gains that are difficult to quantify but impossible to ignore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shopping Made Easy: Your GPT-OSS Powerhouse from Amazon
&lt;/h2&gt;

&lt;p&gt;The sweet spot for GPT-OSS deployment centers around NVIDIA RTX 4080/4080 Super GPUs with 16GB+ VRAM, capable of delivering up to 250 tokens per second for the gpt-oss-20b model. These systems handle the computational demands while remaining accessible to small teams and growing businesses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Proven Amazon Options
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dbl5w2ig26a8tvrb4zw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dbl5w2ig26a8tvrb4zw.png" alt="Example Desktop Server - iBUYPOWER Y40"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget Leader: iBUYPOWER Y40 (~$2,400-2,700)&lt;/strong&gt;&lt;br&gt;
Intel Core i7-14700KF, RTX 4080 Super 16GB, 32GB DDR5, 2TB NVMe SSD&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perfect for teams of 5-15 users&lt;/li&gt;
&lt;li&gt;Handles gpt-oss-20b with room for growth&lt;/li&gt;
&lt;li&gt;Professional build quality with warranty support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Performance Pick: Skytech Legacy (~$2,700-3,100)&lt;/strong&gt;&lt;br&gt;
Intel i7-14700K, RTX 4080 Super, 32GB DDR5 RGB, 2TB Gen4 NVMe&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimized cooling for sustained workloads&lt;/li&gt;
&lt;li&gt;Premium components for reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current Deal: Skytech O11 (~$2,699, down from $3,099)&lt;/strong&gt;&lt;br&gt;
Intel i7-14700K, RTX 4080 Super, 32GB DDR5, 2TB Gen4 SSD, 1000W Gold PSU&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;13% discount makes this exceptional value&lt;/li&gt;
&lt;li&gt;Enterprise-grade power supply&lt;/li&gt;
&lt;li&gt;Excellent thermal design&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Future-Proofing Considerations
&lt;/h3&gt;

&lt;p&gt;These systems are architected for growth. While they handle the gpt-oss-20b model well, running the larger gpt-oss-120b model requires about 80GB of VRAM. This is a significant step up, typically requiring a multi-GPU configuration. The robust power supplies and cooling in the recommended builds can support adding a second GPU, but always verify component compatibility and physical space before upgrading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware ROI Calculator&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5 users: Break-even at 18 months&lt;/li&gt;
&lt;li&gt;10 users: Break-even at 9 months&lt;/li&gt;
&lt;li&gt;15 users: Break-even at 6 months&lt;/li&gt;
&lt;li&gt;20+ users: Break-even in under 6 months&lt;/li&gt;
&lt;/ul&gt;
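&lt;p&gt;Those break-even figures follow from simple arithmetic - monthly savings are the user count times the ~$30 subscription each replaces. A quick sketch, assuming a ~$2,700 server as in the budget build above:&lt;/p&gt;

```python
import math

def break_even_months(hardware_cost, users, monthly_cost_per_user=30):
    """Months until the one-time hardware cost is covered by
    the subscription spend it replaces."""
    return math.ceil(hardware_cost / (users * monthly_cost_per_user))

for users in (5, 10, 15, 20):
    print(f"{users} users: break even in {break_even_months(2700, users)} months")
```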
&lt;h2&gt;
  
  
  Server Setup: Install and Run Your First Model
&lt;/h2&gt;

&lt;p&gt;Installing Ollama on Windows transforms complex LLM deployment into a streamlined process. The entire setup takes less than 30 minutes from download to first inference.&lt;/p&gt;
&lt;h3&gt;
  
  
  Download and Install Ollama
&lt;/h3&gt;

&lt;p&gt;Navigate to &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt; and download the Windows installer. The installation process automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates a system service for background operation&lt;/li&gt;
&lt;li&gt;Configures the API server on &lt;code&gt;localhost:11434&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Installs both GUI and command-line tools&lt;/li&gt;
&lt;li&gt;Sets up automatic startup with Windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9gdah1khv96w2jfpdrk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9gdah1khv96w2jfpdrk.png" alt="Download Ollama"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the installer with administrator privileges. The setup wizard handles service registration and initial configuration without requiring manual intervention.&lt;/p&gt;
&lt;h3&gt;
  
  
  Launch Ollama Desktop App
&lt;/h3&gt;

&lt;p&gt;After installation, launch the Ollama desktop application from your Start menu or desktop shortcut. The app provides a clean, user-friendly interface for managing models and server settings.&lt;/p&gt;

&lt;p&gt;The application automatically starts the Ollama service in the background and displays available models. The service runs on port 11434 and starts automatically with Windows.&lt;/p&gt;
&lt;h3&gt;
  
  
  Download GPT-OSS Models
&lt;/h3&gt;

&lt;p&gt;GPT-OSS models appear in Ollama's model library with native MXFP4 support. Click on the "Library" or "Models" tab to browse available models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Download gpt-oss:20b:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search for "gpt-oss" in the model library&lt;/li&gt;
&lt;li&gt;Click on "gpt-oss:20b" &lt;/li&gt;
&lt;li&gt;Click "Download" or "Pull"&lt;/li&gt;
&lt;li&gt;Monitor download progress in the app&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The download retrieves approximately 16GB of model weights. Ollama's native MXFP4 support eliminates additional quantization overhead, ensuring optimal performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Higher Performance (Optional):&lt;/strong&gt;&lt;br&gt;
Similarly download "gpt-oss:120b" if your system has sufficient resources (roughly 80GB of VRAM, typically a multi-GPU setup). This larger model delivers enhanced reasoning capabilities for complex tasks.&lt;/p&gt;
&lt;h3&gt;
  
  
  First Chat Test
&lt;/h3&gt;

&lt;p&gt;Once download completes, test the model directly in the Ollama app:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on "gpt-oss:20b" in your downloaded models list&lt;/li&gt;
&lt;li&gt;Click "Chat" or "Run" to start a conversation&lt;/li&gt;
&lt;li&gt;Type a test prompt in the chat interface&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ub410cuwtvqh8sc4x6x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ub410cuwtvqh8sc4x6x.png" alt="Ollama Interface"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Test prompt example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Explain the computational complexity of merge sort and why it's preferred for external sorting algorithms.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model should respond with detailed, technically accurate explanations, confirming successful deployment. The chat interface provides an easy way to validate model functionality before connecting other applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open It Up: Connect Your Whole Office in Minutes
&lt;/h2&gt;

&lt;p&gt;Sharing your AI server across the office requires three configuration steps: enabling network access, configuring Windows Firewall, and setting a fixed IP address for reliable connectivity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enable Network Access in Ollama
&lt;/h3&gt;

&lt;p&gt;Open the Ollama desktop application and navigate to Settings. Locate the "Expose to Network" option and enable it. This configuration change allows Ollama to accept connections from other devices on your local network, not just localhost requests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84918tqmi67ki9gs0aq5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84918tqmi67ki9gs0aq5.png" alt="Ollama network settings"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The setting takes effect immediately—no service restart required. Ollama now listens on all network interfaces (0.0.0.0:11434) instead of just localhost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure Windows Defender Firewall
&lt;/h3&gt;

&lt;p&gt;Windows Defender blocks inbound connections to port 11434 by default. Add a firewall exception to allow team access:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9crdat4on5x9yolpse9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9crdat4on5x9yolpse9u.png" alt="Windows Defender 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpz4piwom15a14pyk6s9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpz4piwom15a14pyk6s9.png" alt="Windows Defender 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Windows Security → Firewall &amp;amp; network protection&lt;/li&gt;
&lt;li&gt;Click "Advanced settings" to open Windows Defender Firewall&lt;/li&gt;
&lt;li&gt;Select "Inbound Rules" in the left panel&lt;/li&gt;
&lt;li&gt;Click "New Rule..." in the right panel&lt;/li&gt;
&lt;li&gt;Choose "Port" → Next&lt;/li&gt;
&lt;li&gt;Select "TCP" and enter "11434" in Specific local ports&lt;/li&gt;
&lt;li&gt;Choose "Allow the connection" → Next&lt;/li&gt;
&lt;li&gt;Apply to Domain, Private, and Public networks → Next
&lt;/li&gt;
&lt;li&gt;Name the rule "Ollama AI Server" → Finish&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Set Fixed IP Address
&lt;/h3&gt;

&lt;p&gt;Configure your router to assign a consistent IP address to your AI server, ensuring team members can rely on the same connection string daily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Router Configuration (varies by manufacturer):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Access your router's admin panel (typically 192.168.1.1 or 192.168.0.1)&lt;/li&gt;
&lt;li&gt;Navigate to DHCP settings or LAN configuration&lt;/li&gt;
&lt;li&gt;Locate "DHCP Reservation" or "Static IP Assignment"
&lt;/li&gt;
&lt;li&gt;Find your AI server by hostname or MAC address&lt;/li&gt;
&lt;li&gt;Assign a static IP (e.g., 192.168.86.24)&lt;/li&gt;
&lt;li&gt;Save configuration and restart router if required&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Alternative: Windows Static IP&lt;/strong&gt;&lt;br&gt;
Configure static IP directly on the server:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Network Settings → Change adapter options&lt;/li&gt;
&lt;li&gt;Right-click your network adapter → Properties&lt;/li&gt;
&lt;li&gt;Select "Internet Protocol Version 4 (TCP/IPv4)" → Properties&lt;/li&gt;
&lt;li&gt;Choose "Use the following IP address"&lt;/li&gt;
&lt;li&gt;Enter IP: 192.168.86.24, Subnet: 255.255.255.0, Gateway: 192.168.86.1&lt;/li&gt;
&lt;li&gt;DNS servers: 8.8.8.8, 8.8.4.4&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Test Network Connectivity
&lt;/h3&gt;

&lt;p&gt;From another device on your network, verify connectivity by opening a web browser and navigating to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://192.168.86.24:11434
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a simple response indicating the Ollama server is running and accessible. Alternatively, any of the client applications (WaveTerm, Sidekick, etc.) can test the connection when you configure them.&lt;/p&gt;
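&lt;p&gt;If you'd rather script the check, a small Python probe works from any machine on the network (the 192.168.86.24 address is this guide's example static IP - substitute your own):&lt;/p&gt;

```python
import socket

def ollama_reachable(host, port=11434, timeout=2.0):
    """Return True if a TCP connection to the Ollama port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(ollama_reachable("192.168.86.24"))  # True once the firewall rule is in place
```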

&lt;p&gt;Your AI server is now ready for team-wide deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use It: WaveTerm, Sidekick (macOS), Open WebUI, or any app that lets you override the OpenAI endpoint
&lt;/h2&gt;

&lt;p&gt;The beauty of Ollama's OpenAI-compatible API lies in its universal compatibility. Any application supporting custom OpenAI endpoints can immediately leverage your local GPT-OSS deployment.&lt;/p&gt;
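&lt;p&gt;To see what "OpenAI-compatible" means in practice, here's a minimal stdlib-only sketch of a chat request against the local server (the IP is this guide's example; the API key is a placeholder, since Ollama ignores it by default):&lt;/p&gt;

```python
import json
from urllib import request

BASE_URL = "http://192.168.86.24:11434/v1"  # your server's static IP

def build_payload(prompt, model="gpt-oss:20b"):
    """Standard OpenAI chat-completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt):
    """Send one chat request to the local Ollama server and return the reply text."""
    req = request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # placeholder; ignored by Ollama
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

&lt;p&gt;Calling &lt;code&gt;chat("Summarize this PR")&lt;/code&gt; returns the assistant's reply as a string; the same request shape works with the official OpenAI client libraries by overriding the base URL.&lt;/p&gt;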

&lt;h3&gt;
  
  
  WaveTerm: Cross-Platform Excellence
&lt;/h3&gt;

&lt;p&gt;WaveTerm (&lt;a href="https://waveterm.dev" rel="noopener noreferrer"&gt;waveterm.dev&lt;/a&gt;) provides a sophisticated terminal interface with built-in AI integration across Windows, macOS, and Linux.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ferip1c98unia6vd959p0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ferip1c98unia6vd959p0.png" alt="Waveterm app"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installation:&lt;/strong&gt;&lt;br&gt;
Download and install WaveTerm for your operating system. The application includes native AI configuration options designed for local LLM deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration:&lt;/strong&gt;&lt;br&gt;
Create or edit your &lt;code&gt;ai.json&lt;/code&gt; configuration file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ai@gpt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"display:name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GPT-OSS 20B (Ollama)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"display:order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ai:*"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ai:name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-oss"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ai:model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-oss:20b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ai:baseurl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://192.168.86.24:11434/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ai:apitoken"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ai:temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ai:top_p"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ai:max_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ai:presence_penalty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ai:frequency_penalty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The configuration enables seamless AI interaction within your terminal environment, perfect for developers who live in command-line interfaces.&lt;/p&gt;
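&lt;p&gt;Before pointing a client at it, the settings above can be sanity-checked with a few lines of Python. A minimal sketch (the key names mirror the config shown; the validation rules themselves are assumptions, not part of any tool):&lt;/p&gt;

```python
# Sanity-check the "ai:*" settings shown above before wiring up a client.
# Key names mirror the config; the validation rules are assumptions.
REQUIRED_KEYS = {"ai:name", "ai:model", "ai:baseurl", "ai:apitoken"}

def validate_ai_settings(cfg: dict) -> list:
    """Return a list of problems found (an empty list means the config looks OK)."""
    problems = ["missing " + k for k in sorted(REQUIRED_KEYS - cfg.keys())]
    if not str(cfg.get("ai:baseurl", "")).startswith(("http://", "https://")):
        problems.append("ai:baseurl should be an http(s) URL ending in /v1")
    if not 0.0 <= float(cfg.get("ai:temperature", 0.7)) <= 2.0:
        problems.append("ai:temperature should be between 0.0 and 2.0")
    return problems

settings = {
    "ai:name": "gpt-oss",
    "ai:model": "gpt-oss:20b",
    "ai:baseurl": "http://192.168.86.24:11434/v1",
    "ai:apitoken": "ollama",
    "ai:temperature": 0.7,
}
print(validate_ai_settings(settings))  # []
```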

&lt;h3&gt;
  
  
  Sidekick: Native macOS Integration
&lt;/h3&gt;

&lt;p&gt;Mac users benefit from Sidekick's native integration and optimized user experience. Download from &lt;a href="https://github.com/johnbean393/Sidekick/releases" rel="noopener noreferrer"&gt;github.com/johnbean393/Sidekick/releases&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup Process:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Sidekick from the GitHub releases page&lt;/li&gt;
&lt;li&gt;Open preferences and navigate to AI settings&lt;/li&gt;
&lt;li&gt;Add custom provider with your Ollama endpoint&lt;/li&gt;
&lt;li&gt;Configure model name and API key (use "ollama" as placeholder)&lt;/li&gt;
&lt;li&gt;Test connection to verify functionality&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sidekick's macOS-native interface provides excellent integration with system services, notifications, and keyboard shortcuts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open WebUI: Browser-Based Access
&lt;/h3&gt;

&lt;p&gt;For teams preferring web interfaces, Open WebUI delivers a ChatGPT-like experience through your browser.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Installation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:8080 &lt;span class="nt"&gt;--add-host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.docker.internal:host-gateway &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; open-webui:/app/backend/data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; open-webui ghcr.io/open-webui/open-webui:main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to &lt;code&gt;http://localhost:3000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Complete initial setup and create admin account&lt;/li&gt;
&lt;li&gt;Access Settings → Connections&lt;/li&gt;
&lt;li&gt;Add Ollama server: &lt;code&gt;http://192.168.86.24:11434&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Verify model detection and availability&lt;/li&gt;
&lt;/ol&gt;
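&lt;p&gt;Step 5 can also be done from a script: Ollama's OpenAI-compatible API exposes &lt;code&gt;GET /v1/models&lt;/code&gt;, which returns an OpenAI-style model list. A stdlib-only sketch (host and port from this article's setup; adjust to yours):&lt;/p&gt;

```python
import json
import urllib.request

BASE_URL = "http://192.168.86.24:11434/v1"  # your Ollama server

def model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style /models response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str = BASE_URL) -> list:
    with urllib.request.urlopen(base_url + "/models", timeout=5) as resp:
        return model_ids(json.load(resp))

# list_models() -> e.g. ['gpt-oss:20b', ...] when the server is reachable.
```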

&lt;h3&gt;
  
  
  Universal Compatibility
&lt;/h3&gt;

&lt;p&gt;The OpenAI-compatible API means virtually any AI-enabled application can connect to your server:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Development Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cursor IDE (detailed in next section)&lt;/li&gt;
&lt;li&gt;GitHub Copilot alternatives&lt;/li&gt;
&lt;li&gt;VS Code extensions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Productivity Apps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raycast (macOS)&lt;/li&gt;
&lt;li&gt;Alfred workflows&lt;/li&gt;
&lt;li&gt;Custom business applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;API Integration Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://192.168.86.24:11434/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Placeholder for compatibility
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-oss:20b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze our Q3 sales data trends.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Code With It: Supercharge Cursor IDE with Your Local AI
&lt;/h2&gt;

&lt;p&gt;Cursor IDE supports custom API endpoints through its Models settings, enabling developers to leverage local GPT-OSS for unlimited coding assistance without subscription costs or usage anxiety.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure Custom Endpoint
&lt;/h3&gt;

&lt;p&gt;Navigate to Cursor Settings → Models to access API configuration options. Cursor requires OpenAI-compatible providers, making Ollama integration straightforward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step-by-Step Configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Disable Default Models&lt;/strong&gt;&lt;br&gt;
Uncheck existing models (GPT-4, Claude, etc.) to avoid confusion during setup and prevent accidental cloud API usage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Add Custom Model&lt;/strong&gt;&lt;br&gt;
Click "+ Add model" to create a new model configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model name: &lt;code&gt;gpt-oss:20b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Display name: &lt;code&gt;GPT-OSS 20B Local&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Override Base URL&lt;/strong&gt;&lt;br&gt;
Enable "Override OpenAI Base URL" and enter:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://192.168.86.24:11434/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set API Key&lt;/strong&gt;&lt;br&gt;
Enter &lt;code&gt;ollama&lt;/code&gt; as the API key (required for compatibility, but not validated by the local server).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Verify Connection&lt;/strong&gt;&lt;br&gt;
Click "Verify" to test the connection. Cursor sends a test request to validate the endpoint and model availability.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
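&lt;p&gt;Before clicking "Verify," you can confirm the endpoint answers chat completions at all. A minimal stdlib sketch that posts one tiny request (host and model from this article's setup; the &lt;code&gt;Bearer ollama&lt;/code&gt; header is the same placeholder key):&lt;/p&gt;

```python
import json
import urllib.request

BASE_URL = "http://192.168.86.24:11434/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Payload for a one-shot chat completion against an OpenAI-style API."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16,  # keep the smoke test cheap
    }

def smoke_test(model: str = "gpt-oss:20b") -> str:
    body = json.dumps(build_chat_request(model, "Say OK.")).encode()
    req = urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer ollama"},  # placeholder key
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# smoke_test() returns a short reply when the endpoint is healthy.
```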

&lt;h3&gt;
  
  
  Development Workflow Integration
&lt;/h3&gt;

&lt;p&gt;Once configured, GPT-OSS becomes available throughout Cursor's interface:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chat Interface (Cmd/Ctrl+L):&lt;/strong&gt;&lt;br&gt;
Access the AI chat sidebar for code discussions, architecture questions, and debugging assistance. The local model provides unlimited conversations without rate limiting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inline Assistance (Cmd/Ctrl+K):&lt;/strong&gt;&lt;br&gt;
Highlight code and invoke AI assistance for refactoring, optimization, or explanation. The model understands context and provides relevant suggestions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Generation:&lt;/strong&gt;&lt;br&gt;
Describe functionality in comments, then request implementation. GPT-OSS generates code based on your specific patterns and preferences.&lt;/p&gt;
&lt;h3&gt;
  
  
  Benefits and Limitations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unlimited usage&lt;/strong&gt;: No token limits or monthly quotas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt;: Code never leaves your network&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost control&lt;/strong&gt;: No per-request charges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customization&lt;/strong&gt;: Fine-tune models for your coding style&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline capability&lt;/strong&gt;: Work without internet connectivity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current Limitations:&lt;/strong&gt;&lt;br&gt;
Tab completion still routes through Cursor's built-in models. This feature depends on specialized fill-in-the-middle completion models that are not yet available in the GPT-OSS release.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance Optimization:&lt;/strong&gt;&lt;br&gt;
For optimal coding assistance, configure the model with appropriate parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"top_p"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lower temperature ensures more deterministic code generation, while higher token limits accommodate larger code blocks and detailed explanations.&lt;/p&gt;
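&lt;p&gt;One way to keep those values organized is a small preset table that pairs low-temperature settings with code tasks and looser defaults with chat. A sketch (the preset names and pairings are illustrative, not a Cursor or Ollama convention):&lt;/p&gt;

```python
# Task presets for the sampling parameters discussed above.
# The preset names and pairings are illustrative, not a Cursor/Ollama convention.
PRESETS = {
    "code": {"temperature": 0.1, "max_tokens": 2048, "top_p": 0.9},
    "chat": {"temperature": 0.7, "max_tokens": 800, "top_p": 1.0},
}

def params_for(task: str) -> dict:
    """Return a copy of the sampling parameters for a task type."""
    return dict(PRESETS.get(task, PRESETS["chat"]))

# Usage with the OpenAI client shown earlier:
#   client.chat.completions.create(model="gpt-oss:20b",
#                                  messages=messages, **params_for("code"))
print(params_for("code")["temperature"])  # 0.1
```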

&lt;h3&gt;
  
  
  Team Development Scenarios
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Code Reviews:&lt;/strong&gt;&lt;br&gt;
Paste code snippets into Cursor's chat interface for automated review, security analysis, and optimization suggestions. The AI identifies potential issues and suggests improvements without external dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation Generation:&lt;/strong&gt;&lt;br&gt;
Select functions or classes and request documentation generation. GPT-OSS analyzes code structure and creates comprehensive documentation matching your existing style.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging Assistance:&lt;/strong&gt;&lt;br&gt;
Describe error messages or unexpected behavior to receive debugging guidance. The model suggests investigation approaches and potential solutions based on code context.&lt;/p&gt;
&lt;h2&gt;
  
  
  Win: GPT Without Limits, Other Models, Fine-Tuning, and More
&lt;/h2&gt;

&lt;p&gt;Local GPT-OSS deployment transcends simple cost savings—it unlocks capabilities and flexibility impossible with cloud-based solutions.&lt;/p&gt;
&lt;h3&gt;
  
  
  Unlimited Experimentation
&lt;/h3&gt;

&lt;p&gt;Without rate limits or usage charges, your team can explore AI capabilities without financial constraints:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Developers&lt;/strong&gt; iterate freely on code generation, testing multiple approaches without quota anxiety. Complex refactoring tasks that might require dozens of API calls become economically feasible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content Teams&lt;/strong&gt; brainstorm extensively, generate multiple variations, and refine messaging through iterative AI collaboration. The psychological shift from "conserving tokens" to "unlimited exploration" fundamentally changes creative workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Scientists&lt;/strong&gt; process large datasets, generate synthetic data, and experiment with different analysis approaches without worrying about API costs scaling with data volume.&lt;/p&gt;
&lt;h3&gt;
  
  
  Model Diversity and Customization
&lt;/h3&gt;

&lt;p&gt;Apache 2.0 licensing enables unrestricted fine-tuning and customization. Your GPT-OSS deployment becomes the foundation for specialized AI systems tailored to your business requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Industry-Specific Fine-Tuning:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Legal firms: Train on case law and legal precedents&lt;/li&gt;
&lt;li&gt;Healthcare: Customize for medical terminology and protocols&lt;/li&gt;
&lt;li&gt;Finance: Optimize for regulatory compliance and analysis&lt;/li&gt;
&lt;li&gt;Education: Adapt for curriculum and pedagogical approaches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Domain Expertise Development:&lt;/strong&gt;&lt;br&gt;
Fine-tune models on your proprietary documentation, code repositories, and institutional knowledge. Create AI assistants that understand your specific terminology, processes, and quality standards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-Model Ecosystem:&lt;/strong&gt;&lt;br&gt;
Ollama supports dozens of open-source models beyond GPT-OSS. Deploy specialized models for different tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code generation: CodeLlama, StarCoder&lt;/li&gt;
&lt;li&gt;Creative writing: Mistral, Llama&lt;/li&gt;
&lt;li&gt;Analysis: Specialized reasoning models&lt;/li&gt;
&lt;/ul&gt;
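&lt;p&gt;With several models pulled onto one server, a tiny routing table keeps task-to-model choices in one place. A sketch (the pairings are examples, not recommendations; pull each model first, e.g. &lt;code&gt;ollama pull mistral&lt;/code&gt;):&lt;/p&gt;

```python
# Route tasks to specialist models on the same Ollama server.
# The pairings are examples; pull each model first (e.g. `ollama pull mistral`).
MODEL_FOR_TASK = {
    "code": "codellama",
    "writing": "mistral",
    "general": "gpt-oss:20b",
}

def pick_model(task: str) -> str:
    """Return the model name for a task, defaulting to the general model."""
    return MODEL_FOR_TASK.get(task, MODEL_FOR_TASK["general"])

print(pick_model("code"))      # codellama
print(pick_model("analysis"))  # falls back to the general model
```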
&lt;h3&gt;
  
  
  Scaling Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Horizontal Scaling:&lt;/strong&gt;&lt;br&gt;
Deploy multiple servers as team size grows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50+ users: 2-3 servers with load balancing&lt;/li&gt;
&lt;li&gt;100+ users: Dedicated servers by department&lt;/li&gt;
&lt;li&gt;Enterprise: Multi-location deployment with model synchronization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Vertical Scaling:&lt;/strong&gt;&lt;br&gt;
Upgrade hardware for enhanced performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU upgrades: RTX 4090, RTX 5080 for increased throughput&lt;/li&gt;
&lt;li&gt;Memory expansion: Support larger models and batch processing&lt;/li&gt;
&lt;li&gt;Storage optimization: NVMe RAID for faster model loading&lt;/li&gt;
&lt;/ul&gt;
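&lt;p&gt;At the two-to-three server stage, even client-side round-robin goes a long way before a dedicated load balancer is warranted. A minimal sketch (the second address is hypothetical; a production setup would more likely put nginx or HAProxy in front):&lt;/p&gt;

```python
import itertools

# Client-side round-robin over several Ollama servers.
# The second address is hypothetical; production setups would more likely
# use a reverse proxy (nginx, HAProxy) as the load balancer.
SERVERS = [
    "http://192.168.86.24:11434/v1",
    "http://192.168.86.25:11434/v1",  # hypothetical second server
]
_rotation = itertools.cycle(SERVERS)

def next_base_url() -> str:
    """Return the next server to send a request to."""
    return next(_rotation)

print(next_base_url())  # first server
print(next_base_url())  # second server
```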
&lt;h3&gt;
  
  
  Advanced Features Roadmap
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Function Calling and Tool Integration:&lt;/strong&gt;&lt;br&gt;
GPT-OSS supports native function calling capabilities, enabling integration with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Internal APIs and databases&lt;/li&gt;
&lt;li&gt;Business intelligence tools&lt;/li&gt;
&lt;li&gt;Custom automation workflows&lt;/li&gt;
&lt;li&gt;External service integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reasoning Effort Configuration:&lt;/strong&gt;&lt;br&gt;
Configurable reasoning effort levels allow optimization for different use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low effort: Quick responses for simple queries&lt;/li&gt;
&lt;li&gt;Medium effort: Balanced performance for general use&lt;/li&gt;
&lt;li&gt;High effort: Maximum quality for complex problem-solving&lt;/li&gt;
&lt;/ul&gt;
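&lt;p&gt;For gpt-oss, the effort level is conveyed through the system prompt (a &lt;code&gt;Reasoning: low|medium|high&lt;/code&gt; directive per the model card; confirm the exact mechanism against your gpt-oss and Ollama versions). A small helper sketch:&lt;/p&gt;

```python
# Prepend a reasoning-effort directive to a conversation.
# gpt-oss reads effort from the system prompt ("Reasoning: low|medium|high");
# verify this mechanism against the model card for your gpt-oss release.
VALID_EFFORTS = ("low", "medium", "high")

def with_reasoning(messages: list, effort: str = "medium") -> list:
    if effort not in VALID_EFFORTS:
        raise ValueError("effort must be one of " + ", ".join(VALID_EFFORTS))
    system = {"role": "system", "content": "Reasoning: " + effort}
    return [system] + list(messages)

msgs = with_reasoning([{"role": "user", "content": "Prove 2+2=4."}], "high")
print(msgs[0]["content"])  # Reasoning: high
```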
&lt;h3&gt;
  
  
  ROI Calculation Framework
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Break-Even Analysis by Team Size:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Team Size&lt;/th&gt;
&lt;th&gt;Monthly API Cost&lt;/th&gt;
&lt;th&gt;Hardware Investment&lt;/th&gt;
&lt;th&gt;Break-Even Period&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5 users&lt;/td&gt;
&lt;td&gt;$150/month&lt;/td&gt;
&lt;td&gt;$2,700&lt;/td&gt;
&lt;td&gt;18 months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10 users&lt;/td&gt;
&lt;td&gt;$300/month&lt;/td&gt;
&lt;td&gt;$2,700&lt;/td&gt;
&lt;td&gt;9 months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15 users&lt;/td&gt;
&lt;td&gt;$450/month&lt;/td&gt;
&lt;td&gt;$2,700&lt;/td&gt;
&lt;td&gt;6 months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25 users&lt;/td&gt;
&lt;td&gt;$750/month&lt;/td&gt;
&lt;td&gt;$5,400 (2 servers)&lt;/td&gt;
&lt;td&gt;7.2 months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50 users&lt;/td&gt;
&lt;td&gt;$1,500/month&lt;/td&gt;
&lt;td&gt;$8,100 (3 servers)&lt;/td&gt;
&lt;td&gt;5.4 months&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
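&lt;p&gt;The break-even column is simply hardware investment divided by the monthly cloud bill it replaces. A quick check of the table's arithmetic (figures from the table, which assumes roughly $30 per user per month in API spend):&lt;/p&gt;

```python
# Break-even = hardware investment / monthly API spend it replaces.
# Figures from the table above (~$30 per user per month assumed).
def break_even_months(hardware_cost: float, monthly_api_cost: float) -> float:
    return round(hardware_cost / monthly_api_cost, 1)

print(break_even_months(2700, 150))   # 18.0 months (5 users)
print(break_even_months(2700, 450))   # 6.0 months (15 users)
print(break_even_months(8100, 1500))  # 5.4 months (50 users)
```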

&lt;p&gt;&lt;strong&gt;Total Cost of Ownership (3 Years):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud APIs (10 users): $10,800&lt;/li&gt;
&lt;li&gt;Local deployment: $3,500 (hardware + maintenance)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Net savings: $7,300&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Future-Proofing Considerations
&lt;/h3&gt;

&lt;p&gt;The AI landscape evolves rapidly, but local deployment provides stability and control:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Updates:&lt;/strong&gt; &lt;br&gt;
Download and test new models without disrupting existing workflows. Rollback capabilities ensure stability during transitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compliance Evolution:&lt;/strong&gt;&lt;br&gt;
Maintain control over data handling as regulations evolve. Local deployment simplifies compliance audits and documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technology Independence:&lt;/strong&gt;&lt;br&gt;
Reduce dependency on external providers and their policy changes. Your AI infrastructure remains under your control regardless of market dynamics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Innovation Platform:&lt;/strong&gt;&lt;br&gt;
Local deployment becomes a platform for AI innovation within your organization. Experiment with emerging techniques, develop proprietary capabilities, and maintain competitive advantages.&lt;/p&gt;


&lt;h2&gt;
  
  
  Getting Started Today
&lt;/h2&gt;

&lt;p&gt;Your journey to AI independence begins with a single order on Amazon. Choose a system that fits your budget and team size, knowing that the investment pays for itself within months while unlocking capabilities that cloud APIs simply cannot provide.&lt;/p&gt;

&lt;p&gt;The future of AI belongs to organizations that control their own destiny. GPT-OSS and Ollama make that future accessible today, transforming expensive cloud dependencies into owned infrastructure that grows stronger and more valuable over time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Ready to deploy? Share your experience in the comments below and join the growing community of developers running their own AI infrastructure.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;--&lt;br&gt;
Tired of &lt;strong&gt;fragmented workflows&lt;/strong&gt; breaking your &lt;em&gt;flow state&lt;/em&gt;? &lt;strong&gt;PullFlow&lt;/strong&gt; bridges the gap, enabling seamless code review collaboration across &lt;strong&gt;GitHub&lt;/strong&gt;, &lt;strong&gt;Slack&lt;/strong&gt;, and &lt;strong&gt;VS Code&lt;/strong&gt; (plus Cursor, Windsurf, and more).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pullflow.com" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Try PullFlow - Unified Code-Review Collaboration&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>beginners</category>
      <category>tutorial</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>Forked by Cursor: The Hidden Cost of VS Code Fragmentation</title>
      <dc:creator>Zak Mandhro</dc:creator>
      <pubDate>Thu, 24 Jul 2025 15:58:00 +0000</pubDate>
      <link>https://dev.to/pullflow/forked-by-cursor-the-hidden-cost-of-vs-code-fragmentation-4p1</link>
      <guid>https://dev.to/pullflow/forked-by-cursor-the-hidden-cost-of-vs-code-fragmentation-4p1</guid>
      <description>&lt;p&gt;It's the story of the year in developer tooling. &lt;strong&gt;Visual Studio Code&lt;/strong&gt;, the open-source editor that became a &lt;em&gt;unifying force&lt;/em&gt; for millions of developers, is now the foundation for a new, revolutionary wave of &lt;strong&gt;AI-powered tools&lt;/strong&gt;. At the forefront is &lt;strong&gt;Cursor&lt;/strong&gt;, a VS Code fork that has taken the world by storm with its promise of a faster, smarter, &lt;strong&gt;AI-first workflow&lt;/strong&gt;. Along with Windsurf, Firebase Studio, and the just released &lt;strong&gt;Kiro IDE by AWS&lt;/strong&gt;, these VS Code forks provide a tantalizing glimpse into the &lt;em&gt;future of coding&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But for every developer celebrating this new era, there's another who feels the growing pains. The very ecosystem that made VS Code a beloved and reliable standard is now &lt;strong&gt;fracturing&lt;/strong&gt;, leaving developers caught between a &lt;em&gt;familiar present&lt;/em&gt; and a &lt;em&gt;promising but chaotic future&lt;/em&gt;. This isn't just a story about competing tools; it's about the difficult, necessary, and often messy process of &lt;strong&gt;progress&lt;/strong&gt;, and the hidden cost of &lt;strong&gt;fragmentation&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Open-Source Rocket Ship
&lt;/h2&gt;

&lt;p&gt;Before we dive into the growing pains, it's crucial to remember how &lt;strong&gt;VS Code&lt;/strong&gt; became such a phenomenon. When Microsoft launched it in 2015, few could have predicted its trajectory. For the next decade, it became the &lt;em&gt;undisputed editor of choice&lt;/em&gt;, dominating the development landscape. This success wasn't an accident; it was a &lt;strong&gt;masterclass in open-source strategy&lt;/strong&gt;. Microsoft made a brilliant architectural bet by building the editor on a foundation of web technologies: &lt;strong&gt;Monaco&lt;/strong&gt; for the core editor, and &lt;strong&gt;Electron&lt;/strong&gt; to bring it to the desktop.&lt;/p&gt;

&lt;p&gt;This wasn't just a technical choice; it was a &lt;em&gt;community-building&lt;/em&gt; one. By using familiar technologies like &lt;strong&gt;TypeScript&lt;/strong&gt; and &lt;strong&gt;Node.js&lt;/strong&gt;, Microsoft lowered the &lt;em&gt;barrier to entry&lt;/em&gt; to almost zero. Any web developer could jump in and contribute. The advent of the &lt;strong&gt;Language Server Protocol&lt;/strong&gt; and a vibrant ecosystem of over &lt;strong&gt;70,000 extensions&lt;/strong&gt; transformed VS Code from a simple editor into a powerful, universal workbench—a &lt;em&gt;full-fledged IDE&lt;/em&gt; for any language. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VS Code didn't just win; it won by &lt;em&gt;empowering its community&lt;/em&gt;. And that's what makes the current situation so complex.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Price of a Bolder Vision
&lt;/h2&gt;

&lt;p&gt;Let's be clear: the frustrations are &lt;em&gt;real and deeply felt&lt;/em&gt;. When you switch from the stable ground of &lt;strong&gt;VS Code&lt;/strong&gt; to a fork like &lt;strong&gt;Cursor&lt;/strong&gt; or &lt;strong&gt;Windsurf&lt;/strong&gt;, you're not just getting new features; you're also leaving a &lt;em&gt;carefully curated environment&lt;/em&gt; behind. The consequences of this divergence are &lt;strong&gt;tangible and disruptive&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Broken Workflows:&lt;/strong&gt; It's more than just a few extensions not working. It's the sudden inability to use a critical tool like &lt;strong&gt;Live Share&lt;/strong&gt; for a pair programming session, forcing you to switch back to VS Code mid-task. This constant &lt;em&gt;context-switching&lt;/em&gt; shatters the &lt;strong&gt;"flow state"&lt;/strong&gt; that is so crucial for productive development.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrained Muscle Memory:&lt;/strong&gt; Core keyboard shortcuts—the ones &lt;em&gt;burned into your brain&lt;/em&gt; for deleting lines, navigating files, or running tests—are often repurposed for AI features. This isn't just a minor annoyance; it's a constant &lt;strong&gt;tax on your cognitive load&lt;/strong&gt;, forcing you to think about the &lt;em&gt;mechanics of editing&lt;/em&gt; instead of the &lt;em&gt;logic of your code&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A Loss of Flexibility:&lt;/strong&gt; The highly configurable UI was one of &lt;strong&gt;VS Code's greatest strengths&lt;/strong&gt;. It allowed developers from all backgrounds to feel at home. Now, that flexibility is often gone, replaced by a &lt;em&gt;locked-down interface&lt;/em&gt; that prioritizes a specific, opinionated &lt;strong&gt;AI-centric layout&lt;/strong&gt;. The editor no longer bends to you; &lt;em&gt;you have to bend to it&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't minor inconveniences to be brushed aside. They are &lt;strong&gt;significant disruptions&lt;/strong&gt; that can make it feel like you're taking &lt;em&gt;one step forward and two steps back&lt;/em&gt;. It's fair to ask if the promise of tomorrow's &lt;strong&gt;AI-powered productivity&lt;/strong&gt; is worth the &lt;em&gt;friction of today&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  But What If Fragmentation is a Feature, Not a Bug?
&lt;/h2&gt;

&lt;p&gt;It's easy to view this situation as a &lt;em&gt;problem to be solved&lt;/em&gt;, a flaw in the open-source model. But what if it's not? What if this &lt;strong&gt;messy, divergent phase&lt;/strong&gt; is a &lt;em&gt;necessary and even healthy&lt;/em&gt; part of a much larger evolution?&lt;/p&gt;

&lt;p&gt;The teams behind &lt;strong&gt;Cursor&lt;/strong&gt; and other forks didn't set out to break your favorite extensions. They ran into the &lt;strong&gt;natural limits of a mature platform&lt;/strong&gt;. A platform like &lt;strong&gt;VS Code&lt;/strong&gt;, celebrated for its &lt;em&gt;stability and backward compatibility&lt;/em&gt;, cannot make &lt;strong&gt;radical architectural changes&lt;/strong&gt; without alienating its massive user base. Its &lt;strong&gt;extension API&lt;/strong&gt;, while powerful, was never designed for the kind of &lt;strong&gt;deep, system-level AI integration&lt;/strong&gt; these new tools envision—an AI that is &lt;em&gt;aware of the entire codebase&lt;/em&gt;, can &lt;em&gt;interact with the terminal&lt;/em&gt;, and can perform &lt;em&gt;complex, multi-file refactors&lt;/em&gt;. To build that truly &lt;strong&gt;AI-native experience&lt;/strong&gt;, they had to go beyond the existing APIs and modify the core architecture.&lt;/p&gt;

&lt;p&gt;They chose to &lt;strong&gt;innovate&lt;/strong&gt;, even if it meant breaking compatibility. This is the classic &lt;strong&gt;innovator's dilemma&lt;/strong&gt;: do you stay within the safe confines of the existing system and make &lt;em&gt;incremental improvements&lt;/em&gt;, or do you push beyond it to create something &lt;em&gt;radically new&lt;/em&gt;, even if it means leaving the old world behind?&lt;/p&gt;

&lt;h2&gt;
  
  
  A Glimpse Into the Future
&lt;/h2&gt;

&lt;p&gt;Viewed through this lens, these forks are not just competing products; they are &lt;strong&gt;live-action prototypes of the future&lt;/strong&gt;. They are like the &lt;em&gt;concept cars of the automotive world&lt;/em&gt;—bold, exciting, and not always practical for today's roads, but they show us exactly &lt;strong&gt;where the industry is headed&lt;/strong&gt;. They are &lt;em&gt;stress-testing new ideas&lt;/em&gt; in the real world, revealing what's possible when an editor is &lt;strong&gt;built around AI from the ground up&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This period of &lt;strong&gt;fragmentation&lt;/strong&gt;, while painful for some, is &lt;em&gt;incredibly valuable&lt;/em&gt;. It provides &lt;strong&gt;crucial data points&lt;/strong&gt; for the entire community. The successes of these forks are a &lt;em&gt;powerful signal&lt;/em&gt; to the &lt;strong&gt;VS Code team&lt;/strong&gt;, highlighting exactly where the demand is and what kinds of &lt;strong&gt;deeper integrations&lt;/strong&gt; developers are hungry for. At the same time, their failures and compatibility issues create a &lt;strong&gt;clear roadmap&lt;/strong&gt; for the future of VS Code's own extension APIs, showing what needs to be built to accommodate this &lt;em&gt;next generation of tooling&lt;/em&gt; without requiring a full fork.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Path to Convergence
&lt;/h2&gt;

&lt;p&gt;This doesn't have to be a &lt;strong&gt;zero-sum game&lt;/strong&gt; where one side wins at the other's expense. The current divergence should not be the destination; it should be a &lt;em&gt;temporary, albeit necessary&lt;/em&gt;, detour on the road to a more powerful future. The ideal path forward is one of &lt;strong&gt;collaboration and eventual convergence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The innovations pioneered by forks like &lt;strong&gt;Cursor&lt;/strong&gt; and &lt;strong&gt;Windsurf&lt;/strong&gt; should serve as a &lt;em&gt;clear guide&lt;/em&gt; for the evolution of &lt;strong&gt;VS Code&lt;/strong&gt; itself. As these new &lt;strong&gt;AI patterns and workflows&lt;/strong&gt; become standardized, the VS Code team can build the &lt;strong&gt;next-generation hooks and APIs&lt;/strong&gt; that allow these powerful features to be implemented as &lt;em&gt;extensions, not forks&lt;/em&gt;. This would allow &lt;strong&gt;innovation to flourish&lt;/strong&gt; within the stable, unified ecosystem that everyone values.&lt;/p&gt;

&lt;p&gt;We are already seeing &lt;strong&gt;VS Code embracing this innovation&lt;/strong&gt; by open-sourcing the &lt;a href="https://github.com/microsoft/vscode-copilot-chat" rel="noopener noreferrer"&gt;&lt;strong&gt;VS Code Copilot Chat Extension&lt;/strong&gt;&lt;/a&gt;. This move signals exactly the kind of &lt;strong&gt;convergence&lt;/strong&gt; we need: taking &lt;em&gt;AI-first features&lt;/em&gt; that were previously exclusive to specialized tools and making them available as extensions within the core VS Code ecosystem.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;true spirit of open source&lt;/strong&gt; in action. It's a &lt;em&gt;dynamic cycle&lt;/em&gt;: a stable platform enables &lt;strong&gt;radical experimentation&lt;/strong&gt; on its fringes, and that experimentation, in turn, &lt;em&gt;informs and strengthens&lt;/em&gt; the evolution of the core platform. The future doesn't have to be a choice between a &lt;em&gt;stable, universal editor&lt;/em&gt; and a &lt;em&gt;fragmented landscape&lt;/em&gt; of powerful but incompatible tools. &lt;strong&gt;We can have both&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's innovate with speed but build towards convergence.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But what do you think? Is this &lt;strong&gt;fragmentation&lt;/strong&gt; a &lt;em&gt;necessary price for progress&lt;/em&gt;, or a &lt;em&gt;threat to the open-source community&lt;/em&gt; that &lt;strong&gt;VS Code&lt;/strong&gt; helped build? &lt;/p&gt;

&lt;p&gt;--&lt;br&gt;
Tired of &lt;strong&gt;fragmented workflows&lt;/strong&gt; breaking your &lt;em&gt;flow state&lt;/em&gt;? &lt;strong&gt;PullFlow&lt;/strong&gt; bridges the gap, enabling seamless code review collaboration across &lt;strong&gt;GitHub&lt;/strong&gt;, &lt;strong&gt;Slack&lt;/strong&gt;, and any editor—whether you're team &lt;strong&gt;VS Code&lt;/strong&gt; or team &lt;strong&gt;Cursor&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Learn more at &lt;a href="https://pullflow.com" rel="noopener noreferrer"&gt;&lt;strong&gt;PullFlow.com&lt;/strong&gt;&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://pullflow.com" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Try PullFlow - Unified Code Review Collaboration&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>programming</category>
      <category>vscode</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Perplexity Comet, Dia Browser, and Opera Neon - How Agentic Browsers Will Change The Web</title>
      <dc:creator>Zak Mandhro</dc:creator>
      <pubDate>Thu, 10 Jul 2025 16:00:00 +0000</pubDate>
      <link>https://dev.to/pullflow/perplexity-comet-dia-browser-and-opera-neon-how-agentic-browsers-will-change-the-web-3cbc</link>
      <guid>https://dev.to/pullflow/perplexity-comet-dia-browser-and-opera-neon-how-agentic-browsers-will-change-the-web-3cbc</guid>
      <description>&lt;p&gt;The web browser is evolving from a document viewer into an intelligent agent that acts on your behalf. This shift from passive browsing to active assistance represents one of the most significant changes in how we interact with the internet since the 1990s.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Agentic Browsing?
&lt;/h2&gt;

&lt;p&gt;Agentic browsing transforms your browser from a passive tool into an intelligent assistant that can understand context, perform tasks, and make decisions. Instead of just displaying web pages, agentic browsers use AI to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Understand user intent&lt;/strong&gt; beyond simple keyword searches.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Perform automated tasks&lt;/strong&gt; like filling forms, making bookings, and shopping.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Provide contextual assistance&lt;/strong&gt; with writing, learning, and research.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Synthesize information&lt;/strong&gt; across multiple sources in real-time.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Adapt to user preferences&lt;/strong&gt; and work patterns over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They serve as intelligent intermediaries between users and the web, capable of reasoning about content and taking action based on natural language commands.&lt;/p&gt;

&lt;h2&gt;
  
  
  The New Generation of Agentic Browsers
&lt;/h2&gt;

&lt;p&gt;Several companies are leading the charge in reimagining the browser experience:&lt;/p&gt;

&lt;h3&gt;
  
  
  Opera: From Aria to Neon
&lt;/h3&gt;

&lt;p&gt;Opera’s &lt;strong&gt;Aria&lt;/strong&gt; provides AI-powered tab management and content generation in its existing browser. &lt;strong&gt;Opera Neon&lt;/strong&gt;, by contrast, is a premium, standalone agentic browser designed to "turn user intent into action." Neon features a native AI that automates web tasks locally for privacy and includes a powerful AI engine that can create reports, code, and even websites from user requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Browser Company's Dia
&lt;/h3&gt;

&lt;p&gt;Dia, from the creators of the Arc browser, focuses on contextual AI assistance. It acts as a writing partner, learning tutor, planning assistant, and shopping concierge, allowing users to "chat with their tabs" to get information about any open webpage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Perplexity's Comet
&lt;/h3&gt;

&lt;p&gt;Positioned as a "thought partner," Comet aims to transform browsing into active collaboration. It learns user thinking patterns to surface relevant content proactively and can acquire knowledge from one website to apply it to actions on another—for instance, gathering product specs from a manufacturer's site and using them to complete a purchase on a retailer's site.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenAI's Potential Entry into the Browser Market
&lt;/h3&gt;

&lt;p&gt;In a significant but still speculative development, OpenAI is reportedly building its own web browser. This move, if confirmed, would be based on the open-source Chromium project and could represent a major challenge to Google Chrome's dominance. Developing a native solution would allow OpenAI to design the browser architecture specifically for AI-powered features from the ground up. However, unlike the other browsers mentioned, OpenAI has not officially announced this product, so it remains a development to watch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Cases and Benefits
&lt;/h2&gt;

&lt;p&gt;Agentic browsers excel in scenarios where traditional browsing is cumbersome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Productivity&lt;/strong&gt;: Accelerate research by automatically gathering and synthesizing findings from multiple sources.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Learning&lt;/strong&gt;: Get real-time explanations of complex topics and see different viewpoints on controversial subjects.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Commerce&lt;/strong&gt;: Streamline shopping by comparing products, analyzing reviews, and handling complex bookings automatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Publisher and Advertiser Disruption
&lt;/h2&gt;

&lt;p&gt;The rise of agentic browsers creates a significant disruption to web economics, fundamentally changing how content is consumed and monetized.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Paradigm Shift for Publishers
&lt;/h3&gt;

&lt;p&gt;Web publishers who rely on page views and on-site engagement are confronting a significant challenge to their business models. Recent data shows the impact is already being felt from existing AI-powered search features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traffic Reduction is No Longer Hypothetical&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI-Powered Summaries&lt;/strong&gt;: Agentic browsers and AI search tools provide comprehensive summaries, reducing the need for users to click through to the original source. According to a June 2025 report, some publishers have already seen traffic from Google search fall by as much as 60% since the rollout of its AI Overviews.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct Answers Kill Clicks&lt;/strong&gt;: Another analysis found that when AI Overviews appear, the first organic link loses an average of 34.5% of its clicks. As users get their questions answered directly on the results page, traditional traffic metrics plummet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adapting to Survive&lt;/strong&gt;: Publishers are being forced to explore new strategies, such as building paywalls, forming direct partnerships with AI companies, or creating interactive experiences that an AI cannot easily replicate.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Advertising Models Under Pressure
&lt;/h3&gt;

&lt;p&gt;Digital advertising models are also facing fundamental changes, as AI agents may filter out display ads and reduce the "dwell time" that generates impressions. New opportunities may emerge in AI-mediated recommendations and intent-based targeting, but the transition will be challenging.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Web Design Will Adapt to Agentic Browsing
&lt;/h2&gt;

&lt;p&gt;The rise of agentic browsers will reshape web development. The focus will shift from traditional Search Engine Optimization (SEO) to &lt;strong&gt;Agent Optimization&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Structured Data and APIs&lt;/strong&gt;: Websites will need machine-readable markup (like Schema.org) and robust APIs to help AI agents understand content and context.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The UI Density Revolution&lt;/strong&gt;: Interfaces may become more information-dense. AI agents don't need large, finger-friendly buttons, allowing for more functionality to be packed into smaller spaces. We may see a rise in hybrid interfaces that can switch between human-friendly and agent-optimized views.&lt;/li&gt;
&lt;/ul&gt;
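&lt;p&gt;As a concrete sketch of that first point, here is what Schema.org markup can look like when generated programmatically. This is a minimal, hypothetical example (the product and price are invented); a real page would embed the resulting JSON-LD in a script tag of type &lt;code&gt;application/ld+json&lt;/code&gt;:&lt;/p&gt;

```python
import json

# Hypothetical product data -- the names and values here are illustrative only.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Mechanical Keyboard",
    "description": "Compact 75% keyboard with hot-swappable switches.",
    "offers": {
        "@type": "Offer",
        "price": "129.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

# Serialize to JSON-LD; embedded in a page, this gives an AI agent an
# unambiguous, machine-readable view of the offer.
json_ld = json.dumps(product, indent=2)
print(json_ld)
```

&lt;p&gt;An agent that finds this block doesn't have to scrape prices out of styled markup; the offer is already machine-readable.&lt;/p&gt;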

&lt;h2&gt;
  
  
  Limitations and Challenges
&lt;/h2&gt;

&lt;p&gt;Despite their promise, agentic browsers face significant hurdles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Technical&lt;/strong&gt;: Real-time AI assistance requires substantial processing power and network connectivity.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Privacy and Security&lt;/strong&gt;: Agentic browsers need access to browsing behavior to provide personalized assistance, raising concerns about data collection and security.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;User Experience&lt;/strong&gt;: Users must adapt to new interaction patterns and learn to trust AI recommendations without becoming over-reliant on them.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: Preparing for the Agentic Web
&lt;/h2&gt;

&lt;p&gt;Agentic browsing is not a distant concept—it's happening now. With real products from Opera, The Browser Company, and Perplexity, the transformation of the web is accelerating.&lt;/p&gt;

&lt;p&gt;This transition requires new approaches to web design, a focus on structured data, and a search for sustainable monetization models. As we navigate this shift, the goal must be to enhance human agency, not replace it. The web is becoming more intelligent, and those who adapt will be best positioned for success.&lt;/p&gt;

&lt;p&gt;How are you preparing your web projects for an agentic future?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>productivity</category>
      <category>news</category>
    </item>
    <item>
      <title>Vibe Coding: Why Microservices Are Cool Again</title>
      <dc:creator>Zak Mandhro</dc:creator>
      <pubDate>Thu, 26 Jun 2025 16:00:00 +0000</pubDate>
      <link>https://dev.to/pullflow/vibe-coding-why-microservices-are-cool-again-12p2</link>
      <guid>https://dev.to/pullflow/vibe-coding-why-microservices-are-cool-again-12p2</guid>
      <description>&lt;p&gt;&lt;em&gt;The surprising synergy between LLM code-generation and modular architecture&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rise of Vibe Coding
&lt;/h2&gt;

&lt;p&gt;Somewhere between autocomplete and AGI, a new term entered the developer lexicon: &lt;strong&gt;vibe coding&lt;/strong&gt; — the act of building software by prompting an LLM and iterating in flow.&lt;/p&gt;

&lt;p&gt;Coined by Andrej Karpathy, it evokes that jazz-like rhythm of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You prompt, it codes, you tweak, it gets better — you vibe."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But not everyone's vibing.&lt;/p&gt;

&lt;p&gt;Andrew Ng recently called the term "unfortunate," warning it trivializes the deep, focused labor of AI-assisted engineering. Hacker News lit up with takes ranging from "this is the future" to "this is how the future explodes in prod."&lt;/p&gt;

&lt;p&gt;So… is vibe coding real? Yes. But it only works when the architecture supports it. And that's where &lt;strong&gt;microservices&lt;/strong&gt; make a surprise comeback.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Monoliths Kill the Vibe
&lt;/h2&gt;

&lt;p&gt;Large monolithic codebases — whether human-crafted or LLM-generated — are notoriously hard to work with. Not because they're morally wrong. Because they're cognitively dense.&lt;/p&gt;

&lt;p&gt;As monoliths grow, they create a degradation spiral. The cognitive overhead of understanding the entire system becomes untenable, so developers resort to localized fixes: "I'll just make this part work." These tactical shortcuts accumulate as technical debt, introducing inconsistent patterns, tighter coupling, and architectural drift. The codebase becomes progressively harder to reason about, encouraging more shortcuts — a feedback loop that compounds complexity exponentially.&lt;/p&gt;

&lt;p&gt;For humans, monoliths require tribal knowledge and cautious refactoring.&lt;br&gt;
For LLMs, they stretch the limits of &lt;strong&gt;context windows&lt;/strong&gt; and dilute the model's ability to make accurate predictions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why LLMs Struggle With Monoliths
&lt;/h3&gt;

&lt;p&gt;LLMs process input as &lt;strong&gt;tokens&lt;/strong&gt;, and use attention mechanisms to assign weight across these tokens. As your codebase grows, so does the number of tokens — often beyond what the model can "meaningfully attend to."&lt;/p&gt;

&lt;p&gt;What happens?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dependencies become too distant in token space.&lt;/li&gt;
&lt;li&gt;Signals get lost in noise.&lt;/li&gt;
&lt;li&gt;The model's attention gets diffused, weakening its ability to recognize relevant context.&lt;/li&gt;
&lt;li&gt;Outputs become blurrier, less confident, and more error-prone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Long story short: even the smartest model struggles when asked to make changes across a 100k-line codebase full of deeply coupled logic. It's like trying to debug a complex system while only being able to see a small window of code at a time — critical dependencies and context get lost outside your field of view.&lt;/p&gt;
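&lt;p&gt;A back-of-the-envelope sketch makes the mismatch concrete. This assumes the rough four-characters-per-token heuristic and a hypothetical 128k-token limit; real tokenizers and limits vary by model:&lt;/p&gt;

```python
# Rough sketch: why a monolith overflows a context window.
# Assumes ~4 characters per token and a hypothetical 128k-token budget.

CONTEXT_BUDGET_TOKENS = 128_000  # hypothetical model limit

def estimate_tokens(source: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return len(source) // 4

def fits_in_context(files: dict) -> bool:
    """True when the estimated tokens for all files fit the budget together."""
    total = sum(estimate_tokens(src) for src in files.values())
    return CONTEXT_BUDGET_TOKENS > total

# A focused microservice easily fits...
service = {"notify.py": "x" * 40_000}       # ~10k tokens
# ...while a 100k-line monolith (~40 chars/line) does not.
monolith = {"app.py": "x" * 4_000_000}      # ~1M tokens

print(fits_in_context(service), fits_in_context(monolith))
```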

&lt;h2&gt;
  
  
  Why Microservices Make LLMs Shine
&lt;/h2&gt;

&lt;p&gt;Microservices break complex systems into small, purpose-built modules. For LLMs, that's gold.&lt;/p&gt;

&lt;p&gt;Each service becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A promptable unit&lt;/li&gt;
&lt;li&gt;A testable target&lt;/li&gt;
&lt;li&gt;A contained context window&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can tell an LLM:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Build a notification service that sends Slack alerts on deploy failures."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;...and it can generate a working service with routes, tests, and infrastructure glue — all without dragging in your entire backend.&lt;/p&gt;
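&lt;p&gt;For illustration, here is a minimal sketch of the core of the service that prompt might produce. The webhook URL is a placeholder and the event fields are hypothetical; the payload shape follows Slack's incoming-webhook convention:&lt;/p&gt;

```python
import json

# Placeholder URL -- a real service would read this from configuration.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

def build_alert(event: dict) -> dict:
    """Turn a deploy-failure event into a Slack webhook payload."""
    text = (
        f":rotating_light: Deploy failed for {event['repo']} "
        f"({event['sha'][:7]}): {event['reason']}"
    )
    return {"text": text}

def handle_deploy_event(event: dict, post=None) -> dict:
    """Alert only on failures; the 'post' callable is injected for testability."""
    if event.get("status") == "failure":
        payload = build_alert(event)
        if post is not None:
            post(SLACK_WEBHOOK_URL, json.dumps(payload))
        return payload
    return {}
```

&lt;p&gt;In a real deployment this would sit behind a small HTTP route that receives CI webhooks, with the poster injected so the logic stays unit-testable.&lt;/p&gt;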

&lt;h3&gt;
  
  
  Microservices = Better Prompts
&lt;/h3&gt;

&lt;p&gt;The APIs between services act as semantic boundaries that make reasoning easier — for both humans and LLMs. Instead of fuzzy internal function calls, you get explicit interfaces and contracts.&lt;/p&gt;

&lt;p&gt;For LLMs, this clarity improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Planning (less ambiguity)&lt;/li&gt;
&lt;li&gt;Generation (cleaner prompts)&lt;/li&gt;
&lt;li&gt;Debugging (smaller scope)&lt;/li&gt;
&lt;/ul&gt;
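&lt;p&gt;The same idea in miniature: a typed request/response pair is a contract an LLM can be pointed at directly. This is a hypothetical sketch (the field names are invented), not a prescribed shape:&lt;/p&gt;

```python
from dataclasses import dataclass

# Sketch of an explicit service contract. A typed request/response pair
# like this is the "semantic boundary" a model can reason about, instead
# of fuzzy internal function calls.

@dataclass(frozen=True)
class NotifyRequest:
    channel: str
    message: str

@dataclass(frozen=True)
class NotifyResponse:
    delivered: bool
    message_id: str

def notify(req: NotifyRequest) -> NotifyResponse:
    """Stub implementation honoring the contract."""
    # Real transport (Slack, email, ...) would go here.
    return NotifyResponse(delivered=True, message_id=f"msg-{len(req.message)}")
```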

&lt;h2&gt;
  
  
  But Didn't Microservices Burn Us Already?
&lt;/h2&gt;

&lt;p&gt;Yes. Microservices once promised engineering nirvana — and often delivered chaos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI/CD pipelines everywhere&lt;/li&gt;
&lt;li&gt;Observability fatigue&lt;/li&gt;
&lt;li&gt;Three NPM packages to change a button color&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But with AI in the mix, microservices are cool again — not because they scale, but because they de-risk co-creation with machines.&lt;/p&gt;

&lt;p&gt;LLMs don't need the entire application — they need well-defined pieces. And microservices deliver exactly that.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Vibe Stack: How We Make Microservices Not Suck
&lt;/h2&gt;

&lt;p&gt;At PullFlow, we build AI-augmented microservices daily — and we do it without a DevOps nightmare or Kubernetes in local dev. Here's our actual setup that keeps developer experience smooth and LLMs productive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Colima&lt;/strong&gt;: Lightweight Lima-based VM that provides Docker and containerd runtimes for fast, reliable local containers (especially friendly on macOS).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caddy&lt;/strong&gt;: Smart reverse proxy with automatic HTTPS and per-service routing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cloudflared&lt;/strong&gt;: Secure tunneling to expose local services — ideal for testing webhooks, LLM endpoints, and external integrations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NATS + JetStream&lt;/strong&gt;: High-performance pub/sub system that powers inter-service messaging and async workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TurboRepo + pnpm&lt;/strong&gt;: Monorepo tooling with shared packages managed through workspaces and fast, dependency-deduplicated builds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt;: Primary relational database for most service persistence needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Valkey&lt;/strong&gt;: High-performance shared cache layer for cross-service data sharing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TimescaleDB&lt;/strong&gt;: Time-series database extension for PostgreSQL, handling metrics and event data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolated service scaffolding&lt;/strong&gt;: Each microservice lives in its own code path with dedicated persistent stores.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup gives us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predictable, testable dev environments that scale with the team&lt;/li&gt;
&lt;li&gt;Clear interfaces that LLMs can reason about&lt;/li&gt;
&lt;li&gt;Service boundaries that preserve human sanity and AI promptability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Microservices can be fast, modular, and developer-first — if you design for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The LLM-Native Stack Is Coming
&lt;/h2&gt;

&lt;p&gt;We're already seeing a shift:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLMs spinning up CRUD APIs from OpenAPI schemas&lt;/li&gt;
&lt;li&gt;Agents orchestrating services via message buses&lt;/li&gt;
&lt;li&gt;Prompts as the new CLI&lt;/li&gt;
&lt;/ul&gt;
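&lt;p&gt;The first of these reduces to a simple idea: an OpenAPI document already enumerates every operation an agent needs to implement. A minimal, hypothetical sketch:&lt;/p&gt;

```python
# Sketch: "CRUD APIs from OpenAPI schemas" reduced to its core.
# The document below is a minimal, invented example.

openapi_doc = {
    "openapi": "3.0.0",
    "paths": {
        "/notes": {
            "get": {"operationId": "listNotes"},
            "post": {"operationId": "createNote"},
        },
        "/notes/{id}": {
            "get": {"operationId": "getNote"},
            "delete": {"operationId": "deleteNote"},
        },
    },
}

def list_operations(doc: dict) -> list:
    """Flatten an OpenAPI 'paths' object into (method, path, operationId) tuples."""
    ops = []
    for path, methods in doc.get("paths", {}).items():
        for method, spec in methods.items():
            ops.append((method.upper(), path, spec.get("operationId")))
    return sorted(ops)

for op in list_operations(openapi_doc):
    print(op)
```

&lt;p&gt;Each tuple is a self-contained work item: small enough to prompt for, test, and ship independently.&lt;/p&gt;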

&lt;p&gt;As AI takes a bigger role in software development, we need architectures that support modularity, autonomy, and safety. Microservices aren't just back — they might be foundational to the LLM-native dev stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  At PullFlow...
&lt;/h2&gt;

&lt;p&gt;We’re building for a future where humans and AI agents don’t just coexist — they &lt;strong&gt;collaborate&lt;/strong&gt; to ship better software, faster. That’s why we’ve embraced microservices — not for scale, but as a &lt;strong&gt;protocol for co-creation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Want to vibe-code with confidence? &lt;strong&gt;Start with strong boundaries.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Want Human + AI collaboration that actually works in code reviews? &lt;strong&gt;&lt;a href="https://pullflow.com" rel="noopener noreferrer"&gt;Try PullFlow&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>microservices</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Code Review Agent Adoption in PullFlow</title>
      <dc:creator>Zak Mandhro</dc:creator>
      <pubDate>Tue, 03 Jun 2025 15:00:00 +0000</pubDate>
      <link>https://dev.to/pullflow/code-review-agent-adoption-in-pullflow-25f7</link>
      <guid>https://dev.to/pullflow/code-review-agent-adoption-in-pullflow-25f7</guid>
      <description>&lt;p&gt;As a leading code review collaboration platform, PullFlow has been at the forefront of the AI agent revolution in software development. Over the past year, we've integrated with popular AI agents like &lt;a href="https://pullflow.com/blog/copilot" rel="noopener noreferrer"&gt;GitHub Copilot&lt;/a&gt;, &lt;a href="https://pullflow.com/blog/coderabbit" rel="noopener noreferrer"&gt;CodeRabbit&lt;/a&gt;, and &lt;a href="https://pullflow.com/blog/greptile" rel="noopener noreferrer"&gt;Greptile&lt;/a&gt;, giving us unprecedented visibility into how development teams are adopting and using these tools.&lt;/p&gt;

&lt;p&gt;The insights we've gathered have been remarkable. Today, &lt;strong&gt;85% of our paid customers&lt;/strong&gt; actively use AI agents for code review, representing a fundamental shift in how development teams approach collaboration and quality assurance. But the real story isn't just in the adoption numbers—it's in what we've learned about how these tools are reshaping development workflows in ways we didn't anticipate.&lt;/p&gt;

&lt;p&gt;This isn't simply about automation replacing manual processes. What we're observing through our platform is a sophisticated evolution in human-AI collaboration that's transforming how teams work together.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Current Landscape
&lt;/h2&gt;

&lt;p&gt;The adoption patterns tell a compelling story. &lt;strong&gt;30% of our paid customers&lt;/strong&gt; now use multiple AI agents simultaneously, with GitHub Copilot leading overall adoption, followed by specialized tools like CodeRabbit and Greptile for targeted review tasks. Perhaps most striking is the &lt;strong&gt;near-universal adoption of automatic PR description generation&lt;/strong&gt;, which has become so integral to teams' workflows that many describe it as being as indispensable as syntax highlighting or version control.&lt;/p&gt;

&lt;p&gt;The integration of these tools represents more than convenience: it's enabling teams to scale their review processes without proportionally scaling their time investment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Developer Experience Patterns
&lt;/h2&gt;

&lt;p&gt;One of the most interesting trends we've observed is how different experience levels approach AI agents. &lt;strong&gt;Junior developers&lt;/strong&gt; tend to embrace AI agents as comprehensive learning tools, using them for guidance on best practices, code patterns, and quality standards. The immediate feedback loop helps accelerate their learning curve significantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior developers&lt;/strong&gt; take a more strategic approach, leveraging AI agents for routine quality checks while reserving their expertise for architectural decisions, design patterns, and mentoring responsibilities. This division isn't a limitation—it's an optimization that allows teams to distribute cognitive load more effectively across both human and artificial intelligence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managing the Signal-to-Noise Challenge
&lt;/h2&gt;

&lt;p&gt;One reality every team faces: approximately &lt;strong&gt;70% of AI agent comments are resolved without action&lt;/strong&gt;, indicating they weren't actionable or relevant to the specific context. This signal-to-noise ratio can create notification fatigue and undermine trust in automated systems. However, successful teams have developed strategies to address this challenge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unified conversation management&lt;/strong&gt; through PullFlow's threading system allows teams to centralize AI feedback alongside human discussions. Senior developers can quickly validate useful suggestions with reactions while filtering out noise. &lt;strong&gt;Direct agent interaction&lt;/strong&gt; via Slack integration enables teams to clarify AI feedback contextually, allowing developers to ask &lt;code&gt;@coderabbit&lt;/code&gt; for clarification without leaving their workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customized agent settings&lt;/strong&gt; help teams tune their review focus through PullFlow's Agents page, emphasizing feedback types most relevant to their codebase and development standards.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow Evolution
&lt;/h2&gt;

&lt;p&gt;The most effective implementations treat AI agents as specialized team members with distinct strengths. Teams are developing sophisticated workflows that leverage both human insight and AI capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agents handle consistent quality checks: syntax errors, security patterns, style compliance&lt;/li&gt;
&lt;li&gt;Human reviewers focus on business logic, architectural decisions, and knowledge transfer&lt;/li&gt;
&lt;li&gt;Reactions and threading systems create feedback loops that help teams learn which AI suggestions provide value&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Emerging Trends
&lt;/h2&gt;

&lt;p&gt;Several developments are reshaping the code review landscape. &lt;strong&gt;Shift-left integration&lt;/strong&gt; is moving review capabilities directly into development environments, enabling real-time feedback before code reaches the PR stage. &lt;strong&gt;Role reversal scenarios&lt;/strong&gt; are becoming more common, where human reviewers evaluate AI-generated code against business requirements and architectural standards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent orchestration&lt;/strong&gt; is emerging, with specialized agents handling different aspects of code review, testing, and documentation in coordinated workflows. &lt;strong&gt;Self-improving systems&lt;/strong&gt; are beginning to update their own instruction files based on team acceptance patterns, creating more targeted feedback over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Human Element
&lt;/h2&gt;

&lt;p&gt;Despite increasing automation, the most successful teams maintain strong human oversight and decision-making. AI agents excel at identifying technical issues and enforcing consistency, but human reviewers provide essential context around business requirements, user impact, and strategic technical decisions. The most effective implementations don't replace human judgment—they amplify it by handling routine tasks and highlighting areas that require human expertise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Forward
&lt;/h2&gt;

&lt;p&gt;The 85% adoption rate reflects a broader shift toward co-intelligent development teams. Success isn't measured simply by speed improvements, but by the quality of collaboration between human expertise and AI capabilities. Teams achieving optimal results focus on orchestrating these tools thoughtfully, customizing their behavior to team-specific needs, and maintaining the collaborative learning aspects that make code review valuable beyond quality assurance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pullflow.com/blog/agents-launch-blogpost" rel="noopener noreferrer"&gt;PullFlow's Agent Experience&lt;/a&gt; continues evolving to support this transformation, providing centralized management, intelligent filtering, and seamless integration that adapts to how teams actually work. The future of code review lies in thoughtful human-AI collaboration: not replacement, but strategic partnership that enhances both efficiency and quality.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Learn more about optimizing your team's code review workflow with &lt;a href="https://pullflow.com" rel="noopener noreferrer"&gt;PullFlow's Agent Experience&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwaredevelopment</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
