DEV Community

Eber Cruz Fararoni

The Truth Nobody Tells You About AI in 2026: Why Microsoft and Uber Are Pulling Back, and Why Your Strategy Matters More Than Your Speed

By Eber Cruz Fararoni | ebercruz.com | Software Architect & Builder of Intelligent Systems


TL;DR: 80.3% of enterprise AI projects fail without a trace of ROI. Fewer than 4.5% of Microsoft's 450 million M365 customers pay for Copilot, and its stock fell 34% from its peak. Amid this shakeout, DeepSeek has made a 75% API discount permanent, and Moonshot AI's Kimi K2.6 matches or surpasses Claude Opus 4.6 on several coding benchmarks at a fraction of the cost. I've spent the past few months building Fararoni Flow, a multi-purpose agent orchestrator built on Java 25 and NATS, with a hexagonal architecture and the sidecar pattern. This article collects what I've learned about why most AI projects fail, and why orchestration strategy matters more than implementation speed.


1. The Landscape: A Silent Crisis in Enterprise AI Adoption

Generative artificial intelligence arrived promising to revolutionize every aspect of the modern enterprise. In 2025, organizations invested $684 billion in AI worldwide. By December of that year, more than $547 billion of that investment had produced no measurable results at all. Not low returns: zero. This is not a hypothetical scenario or a pessimistic projection; it is the conclusion of a RAND Corporation analysis of more than 2,400 enterprise AI initiatives.

The reality we face in 2026 is radically different from the marketing narrative sold by the big AI labs. While headlines celebrate each new model launch with ever-grander superlatives, the trenches of enterprise implementation tell another story. According to S&P Global Market Intelligence, 42% of companies abandoned at least one AI initiative in 2025, up sharply from 17% the previous year. Organizations aren't getting better at AI; they're simply getting faster at recognizing failure.

The problem isn't the technology itself. The models of 2026 are undeniably superior to those of 2024. GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, and the new Chinese competitors like DeepSeek V4-Pro and Kimi K2.6 represent quantum leaps in reasoning capability, code generation, and agentic task execution. The problem is how companies are trying to leverage these capabilities.

Most organizations made the fundamental mistake of treating AI as a speed race rather than what it truly is: a strategy discipline. They wanted to build everything with prompts, automate complex processes in weeks, and transform their operations without the architectural rigor that any mission-critical enterprise system demands. When these efforts collapsed — as was inevitable — the easy conclusion was "AI is too expensive" or "AI doesn't work for us." The correct conclusion, however, is that orchestration architecture matters more than the language model you choose.

Metric | Data | Source
--- | --- | ---
Global AI investment, 2025 | $684 billion | RAND Corporation (2025)
AI investment with no measurable ROI | $547 billion (80%) | RAND Corporation (2025)
Companies that abandoned ≥1 AI initiative | 42% (vs. 17% in 2024) | S&P Global (2025)
GenAI projects with no P&L impact | 95% | MIT Project NANDA (2025)
AI projects that never reach production | 52% | Gartner (2024)
Average cost of a failed AI project | $7.2 million | S&P Global (2025)
Cost overruns in RAG projects at scale | 380% vs. pilot projection | MIT Sloan

These numbers shouldn't discourage us. They should focus us. AI isn't failing; careless approaches are failing. And in that recognition lies a massive opportunity for those who understand that the difference between success and failure isn't the model you use, but the architecture with which you orchestrate it.

Enterprise AI Project Failure Rate

Figure 1: Hard data from multiple authoritative sources confirms that most enterprise AI projects fail. Source: RAND Corp, MIT NANDA, Gartner, S&P Global, BCG (2024-2025).


2. The Giants Stumble: Microsoft, Uber, and the Real Cost of AI

Microsoft: The 4.5% That Exposed a Structural Fragility

In May 2026, Fortune published a devastating analysis of Microsoft's position in the AI race. The company that had bet the hardest on OpenAI — with an investment exceeding $13 billion — faced an uncomfortable reality: less than 4.5% of its 450 million Microsoft 365 customers were paying for Copilot features. Meanwhile, its Copilot consumer chatbot had reached approximately 20 million weekly active users, a figure that sounds impressive until you compare it to ChatGPT's 900 million users.

GitHub Copilot, the first major commercial success in AI coding, illustrates the trend even more clearly. Once the undisputed leader in AI coding tools, it has been overtaken in developer preference, first by Cursor ($2 billion ARR, the fastest-growing SaaS ever recorded) and then by Claude Code (46% of senior engineers name it their "most loved" tool, and it posts an NPS of 54).

Microsoft's stock fell 34% from its all-time high in October 2025 to March 2026, despite its AI-related revenues in Azure having more than doubled. Investors realized something that marketing executives still don't want to admit: having the most famous model doesn't guarantee a sustainable business platform. The company announced $190 billion in capital expenditures for 2026 — more than triple what it spent in 2024 — in a desperate bid to recover lost ground.

Microsoft's Chief Commercial Officer, Judson Althoff, publicly acknowledged several errors: calling the product "Copilot" for both consumer and enterprise versions created massive confusion; incentivizing sales representatives to promote the free version when only the premium version delivered real value; and underestimating the speed at which AI technology was evolving. When Anthropic launched Claude Code in 2025 — capable of writing complete programs autonomously from a description — and then Claude Cowork in January 2026, the "copilot" model that Microsoft had built suddenly felt like a generation behind.

Uber: Closing Labs, Opening the Door to Learning

Uber's case is different but equally instructive. In 2020, during the COVID-19 pandemic, Uber closed Uber AI Labs as part of a strategic cost reduction. Although the decision was driven by the need to preserve capital during an unprecedented crisis for the ride-sharing industry, it illustrates a pattern we've seen repeat: when AI costs meet financial reality, experimental projects are the first to fall.

What makes these cases particularly revealing is not that companies are abandoning AI altogether — neither of them is doing so — but that they are learning an expensive lesson: AI is not a product you buy, it is a capability you build. Microsoft is not abandoning AI; it is reconfiguring its strategy to be model-agnostic, allowing its customers to choose between GPT, Claude, Gemini, or any other model within its platform. Uber did not abandon AI; it redirected resources toward AI applications more directly tied to its core business.

The conclusion is not that "AI is too expensive." The conclusion is that without a well-designed orchestration architecture, without a gradual implementation strategy, and without a deep understanding of where AI generates real value versus where it only generates impressive demos, costs skyrocket and results evaporate.


3. The Price War Nobody Saw Coming: DeepSeek, Kimi, and the New Geography of Cost

DeepSeek V4-Pro: The 75% That Changed Everything

In May 2026, DeepSeek, the Chinese lab that had already shaken the industry's foundations with its R1 model in January 2025, announced something many considered impossible: a 75% discount on its V4-Pro API, made permanent. This is not a temporary promotion or a marketing trick; it is a structural cost reduction based on real gains in computational efficiency.

The numbers are staggering. DeepSeek's V4-Pro model, with 1.6 trillion parameters and a 1-million-token context window, now costs the following, shown next to its main competitors:

Model | Input / 1M tokens | Output / 1M tokens | Cache hit / 1M tokens
--- | --- | --- | ---
DeepSeek V4-Pro (after 75% discount) | $0.435 | $0.87 | $0.003625
DeepSeek V3.2 | $0.28 | $0.42 | $0.028
GPT-5.4 | $2.50 | $15.00 | $0.25
Claude Opus 4.6 | $5.00 | $25.00 | $0.50
Kimi K2.6 | $0.60 | $2.50 | $0.10

To put it in perspective: one million output tokens cost $0.87 with DeepSeek V4-Pro, $15.00 with GPT-5.4, and $25.00 with Claude Opus 4.6. That is a saving of roughly 96.5% versus Claude Opus 4.6 and 94% versus GPT-5.4. At $0.003625 per million tokens, the V4-Pro cache hit is practically free for workloads with repetitive system prompts.
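The savings percentages follow directly from the output prices in the table above. A minimal Java sketch of that arithmetic (the class and method names are illustrative, not taken from any library):

```java
// Percentage saved when moving output tokens from a pricier model to a cheaper one.
// Prices are USD per 1M output tokens, copied from the pricing table above.
final class ApiSavings {

    static double savingsPercent(double cheapPrice, double expensivePrice) {
        if (expensivePrice <= 0) {
            throw new IllegalArgumentException("expensivePrice must be positive");
        }
        return (1.0 - cheapPrice / expensivePrice) * 100.0;
    }

    public static void main(String[] args) {
        double deepSeekV4Pro = 0.87;
        double gpt54 = 15.00;
        double opus46 = 25.00;
        System.out.printf("vs Claude Opus 4.6: %.1f%%%n", savingsPercent(deepSeekV4Pro, opus46));
        System.out.printf("vs GPT-5.4:         %.1f%%%n", savingsPercent(deepSeekV4Pro, gpt54));
    }
}
```

Run as written, this prints 96.5% against Claude Opus 4.6 and 94.2% against GPT-5.4, counting output tokens only.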

Sanchit Vir Gogia, CEO of Greyhound Research, explained the logic behind this reduction: "V4-Pro was designed to reduce the cost of long-context inference, operating at approximately a quarter of the compute per token and a tenth of the memory footprint of its predecessor at very long contexts. That is why the price reduction is permanent and not promotional. It is not a discount. It is an efficiency gain that is passed on to the customer."

API Cost Comparison

Figure 2: DeepSeek V4-Pro and Kimi K2.6 offer prices 8-30x lower than equivalent Western models, with comparable performance on coding benchmarks. Source: Official API prices, May 2026.

Kimi K2.6: The Open-Source Model That Wins on Benchmarks and Loses on the Bill

On April 20, 2026, Moonshot AI launched Kimi K2.6, an open-source 1-trillion-parameter model with a Mixture-of-Experts (MoE) architecture that activates approximately 32 billion parameters per token. The numbers it presented are as impressive as the prices:

  • SWE-Bench Pro: 58.6% — above GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%)
  • HLE-Full (with tools): 54.0% — leading among all compared models
  • BrowseComp (Agent Swarm): 86.3% — dominating in multi-agent agentic tasks
  • DeepSearchQA F1: 92.5% — superior to GPT-5.4 (78.6%) and Claude Opus 4.6 (91.3%)

On independent benchmarks such as SWE-Bench Verified, K2.6 reached 80.2%, essentially on par with Claude Opus 4.6 (80.8%), although Opus 4.7 has since reached 87.6%.

The price: $0.60 per million input tokens and $2.50 per million output tokens. At the list prices above, that makes it 8.3x cheaper on input and 10x cheaper on output than Claude Opus 4.6. When Cursor, the fastest-growing AI coding company in history with $2 billion ARR, built its Composer 2 feature on Kimi K2.5 (the previous version), it sent a clear message: performance does not require paying premium prices.

DeepSeek V4-Pro Pricing and Routing ROI

Figure 3: Left: Evolution of DeepSeek V4-Pro price with permanent 75% discount. Right: Real monthly cost using intelligent routing versus a single model. Source: API data and public benchmarks, May 2026.

The Reality That Costs Reveal

The price difference is not a minor accounting detail. It is a strategic transformation of the landscape. When Ideas2IT ran a controlled test — building the same Flask application with SQLite, HTML frontend, CRUD operations, unit tests, and Git configuration — the results were revealing:

Model | Cost per execution | Output quality
--- | --- | ---
DeepSeek V3.2 | ~$0.15 | Good (better UI)
Kimi K2.5 | ~$0.33 | Production-ready
Claude Sonnet 4.6 | ~$1.66 | Production-ready
Claude Opus 4.6 | ~$75.64/month | Production-ready (superior on complex architecture)

All of the models completed the task. Engineers who reviewed the results blind could not consistently identify which model produced which output. The 11x cost difference between DeepSeek V3.2 and Claude Sonnet 4.6 did not translate into an 11x quality difference. It translated into an 11x difference on the bill.

At team scale, the implications are enormous. A team of 10 engineers using Claude Code with Claude Sonnet 4.6 spends approximately $444.40 per month on API tokens alone. The same team using Kimi K2.5 would spend $78.60. With DeepSeek V3.2, barely $24.00. And that doesn't even consider that 82% of a developer's daily work — PR review, refactoring, testing, standard debugging — does not require the maximum reasoning capability of a premium model.

Tyler Folkman, an independent developer who built a model router for his personal workflow, documented the most extreme case: in 2,415 real AI turns, he spent $76.77 using a routing system that sent each task to the appropriate model. The same volume of work would have cost $1,272.77 if he had used GPT-5.5 for everything. A 94% savings, achieved simply by not "pretending that every task is the same task."

Build vs. Buy and Cost per Engineer

Figure 4: Left: The dramatic shift from companies building AI in-house to buying solutions (Menlo Ventures). Right: Real monthly cost per engineer using different models within Claude Code. Source: Ideas2IT, JetBrains AI Pulse Survey, API data.


4. Claude Code 4.6 Is Still King... But the Throne Is Wobbling

I don't want my argument to be misinterpreted. Claude Code 4.6, especially in its Opus tier, remains the gold standard for complex coding tasks. Its 1-million-token context allows loading entire monolithic repositories in a single session. Its 91.3% score on GPQA Diamond (graduate-level science questions validated by domain experts) is unmatched for deep scientific reasoning. Its hallucination rate is the lowest in the industry, with an AA-Omniscience index of +10 versus Kimi K2.5's -11.

For complex architecture work, large-scale refactorings, legacy code comprehension, and truly novel problems where the output cannot be easily verified with automated tests, Claude Opus 4.6 justifies its premium price. There is no substitute for the peace of mind of knowing that the model has the highest probability of generating a correct answer when "correct" cannot be verified with a unit test.

However, here is the truth that many don't want to hear: 80% of a software engineer's work is not complex architecture or novel problems. It is REST API development, unit test generation, frontend scaffolding, standard error debugging, and pull request review. For that 80%, Kimi K2.6 and DeepSeek V4 are not just "good enough" — on many coding benchmarks, they are better.

The Pragmatic Engineer survey from February 2026, which consulted approximately 906 senior engineers with a median of 11-15 years of experience, revealed a fascinating pattern: 46% of senior engineers named Claude Code as their "most loved" tool, versus 19% for Cursor and 9% for GitHub Copilot. JetBrains confirmed these findings with hard loyalty data: a CSAT of 91% and an NPS of 54 for Claude Code, the highest in the category.

Market Share and Satisfaction

Figure 5: Left: Workplace adoption market share (JetBrains, Jan 2026). Right: Satisfaction among senior engineers; Claude Code leads by a wide margin despite its lower market share. Source: JetBrains AI Pulse Survey, Pragmatic Engineer Survey.

But here is the critical nuance: although Claude Code is the most loved tool, it has only 18% workplace adoption, versus 29% for GitHub Copilot. Among small startups (fewer than 50 people), Claude Code reaches 75% adoption, while in enterprises with 10,000+ employees, Copilot dominates with 56%. This split suggests that startups choose based on technical capability, while large enterprises choose based on ease of procurement. As today's startups grow into tomorrow's enterprises, they are likely to bring those preferences with them.

My personal approach, after months of using Claude Code and Gemini, has evolved into what I call "conscious routing": I use Claude Code as my work interface (its agentic loop, its terminal integration, its ability to maintain context across long sessions), but I route model calls based on task complexity. For routine work, Kimi K2.6 or DeepSeek V4. For high-complexity tasks where I cannot tolerate errors, Claude Opus 4.6. This hybrid approach gives me 90% of Opus quality at 15% of the cost.
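Conscious routing can be sketched in a few lines. This is a minimal illustration, not Fararoni Flow's code: it assumes each task already carries a two-level complexity label (how that label is assigned, whether by heuristics, a classifier, or a person, is the hard part and is left out), and the model names are plain strings chosen for readability rather than official API model IDs:

```java
import java.util.List;

// Two tiers, mirroring the split described above. Labelling is deliberately out of scope.
enum Complexity { ROUTINE, HIGH_RISK }

record Task(String description, Complexity complexity) {}

// Model names are illustrative strings, not official API model identifiers.
record ConsciousRouter(String routineModel, String premiumModel) {

    String route(Task task) {
        return switch (task.complexity()) {
            case ROUTINE -> routineModel;     // tests, scaffolding, standard debugging
            case HIGH_RISK -> premiumModel;   // architecture, output hard to verify
        };
    }

    public static void main(String[] args) {
        ConsciousRouter router = new ConsciousRouter("kimi-k2.6", "claude-opus-4.6");
        List<Task> tasks = List.of(
                new Task("generate unit tests", Complexity.ROUTINE),
                new Task("redesign billing architecture", Complexity.HIGH_RISK));
        for (Task t : tasks) {
            System.out.println(t.description() + " -> " + router.route(t));
        }
    }
}
```

Keeping both model names as constructor arguments means that swapping Kimi K2.6 for DeepSeek V4-Pro on routine work is a configuration change rather than a code change.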


5. Why Strategy Beats Speed: Lessons from a System Builder

The Error of Prompts as Architecture

There's a phrase I've been repeating in conversations with fellow architects: "If you go too fast trying to build everything with prompts, you will surely fail. If you don't use AI, you will go slow and steady — very, very slow — but you will lose the ability to fail fast and correct."

This false dichotomy — between "moving fast and breaking things with AI" and "not using AI at all" — is the root of many failures. Teams that "go fast with prompts" build impressive demos that collapse when they face real data, edge cases, and compliance requirements. Teams that reject AI entirely lose the competitive advantage of rapid iteration that the technology provides.

The solution is not to choose an extreme. The solution is to understand that AI is a capability that is orchestrated, not a product that is consumed. An influential article in the OpenAI developer community from July 2025 titled "Prompt Engineering Is Dead, and Context Engineering Is Already Obsolete" captured this transition perfectly. Pure prompt engineering suffers from intrinsic fragility: minor changes in input, model versions, or even random drift can destroy the effectiveness of a carefully tuned prompt. It doesn't scale. It isn't maintainable. It doesn't provide consistent reasoning in complex workflows.

The future, the article argues, is not in more elaborate prompts or more extensive context. It is in automated workflow architectures where language models are components in a larger system, not the system itself. This is exactly what my orchestrator Fararoni Flow is designed to do.

The Five Failure Patterns I've Observed

After years of building enterprise systems and the past few months working intensively with AI agents, I've identified five recurring failure patterns:

1. Obsession with Model Selection: Teams that spend weeks comparing Claude vs GPT vs Gemini, optimizing for minor differences in output quality, while their evaluation coverage remains weak and their input/output specifications remain vague. Models improve faster than comparison cycles run. By the time you finish evaluating, a new version has been released that invalidates your results.

2. Cost Blindness: Running the most expensive model for every request regardless of complexity, without unit economics tracking, and without monitoring token usage patterns. This leads to surprise bills that can derail projects and kill ROI. The cost of AI is never just the model call: it includes retrieval, orchestration, retries, and more.

3. Chatbots Without Differentiation: Building generic chat interfaces without domain-specific context, specialized workflows, or unique capabilities. These solutions compete directly with ChatGPT, Claude, and other generic tools that users already have. If your competitive advantage is "we have a chatbot too," prepare for disappointment.

4. Over-Engineering of Tool Calling: Creating elaborate tool schemas for simple operations, defining tools for basic computation or data formatting, and building complex orchestration when simple prompt engineering would work. Every tool call adds latency and potential failure points.

5. Ignoring End-User Constraints: Voice interfaces for noisy environments, high-resolution video processing for users with limited bandwidth, and complex multi-step workflows for users with limited time. Technical capability does not equal user value.


6. Fararoni Flow: Why I'm Building an Orchestrator in 2026

The Vision: Sovereignty Over Your Agents

In the midst of this chaotic landscape — where costs vary by orders of magnitude, where models improve every month, where platforms close walled gardens and open others — I decided to build something different. Fararoni Flow is a multi-purpose agent orchestrator born from a simple premise: in a world where everything is constantly changing, the only sustainable advantage is the ability to adapt quickly.

The interface I share in the opening image shows the main dashboard: 7,242,574 tokens processed, 395 executions, 277 completed, 1,247 LLM calls, with 31 active agents in the system. These are not demo numbers. They are real numbers from a system I use daily to automate technology intelligence workflows, email processing, briefing generation, and complex multi-step task orchestration.

Why Java 25, NATS, and Hexagonal Architecture with Sidecar

The technology stack choice is not accidental. Each decision responds to a specific requirement of agent systems at scale:

Java 25 (LTS): The latest Long-Term Support version of Java brings critical improvements for agent systems:

  • Virtual Threads: final since Java 21 and, since Java 24 (JEP 491), no longer pinned by synchronized blocks; in recent benchmarks they handle 1.2 million requests per second, surpassing WebFlux's 900K. For an orchestrator that coordinates dozens of concurrent agents, this is fundamental.
  • AOT Method Profiling (JEP 515): reduces warm-up time by 15-25%, critical for microservices and serverless functions where cold-start matters.
  • Compact Object Headers (JEP 519): reduces memory usage by up to 20% according to Oracle and Amazon tests on hundreds of production services.
  • Scoped Values (JEP 506): allows sharing data across concurrent tasks without ThreadLocal, essential for shared context between agents.
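The virtual-thread point is easiest to see in code. A minimal sketch (Java 21+, standard library only) that fans out one blocking call per agent, each on its own virtual thread; `callModel` is a hypothetical stand-in that sleeps to simulate network latency to a model API:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class AgentFanOut {

    // Hypothetical blocking call; sleeping stands in for an HTTP request to a model API.
    static String callModel(String agent) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(100));
        return agent + ":done";
    }

    static List<String> runAll(List<String> agents) throws Exception {
        // One virtual thread per task; a blocked task parks its virtual thread, not an OS thread.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (String agent : agents) {
                futures.add(executor.submit(() -> callModel(agent)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // preserves submission order
            }
            return results;
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> agents = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            agents.add("agent-" + i);
        }
        long start = System.nanoTime();
        List<String> results = runAll(agents);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " calls in " + millis + " ms");
    }
}
```

The sketch does not reproduce the benchmark figures above; it only shows the programming model: plain blocking code, with thousands of concurrent calls and no thread-pool tuning.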

NATS: The messaging system I chose as the backbone of communication between agents. NATS handles millions of messages per second with sub-millisecond latency. Its pub/sub model allows agents to communicate in a decoupled way: an agent publishes an event, interested subscribers process it. There is no direct coupling, no complex message queues. It is the messaging system used by companies like VMware, Ericsson, and SAP in production at massive scale.

Hexagonal Architecture (Ports & Adapters): This pattern is the backbone of Fararoni Flow. Each agent's business logic is completely isolated from external concerns: model API calls, data persistence, authentication, logging. If tomorrow I want to switch from Claude to Kimi, from PostgreSQL to MongoDB, or from REST to gRPC, the agent's logic doesn't change. Only the adapter changes. In a field where the underlying technology evolves weekly, this decoupling is not a luxury: it is a survival necessity.
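A minimal sketch of the ports-and-adapters idea, assuming a single hypothetical `ModelPort` interface. The adapters here are fakes that return canned strings; real adapters would wrap each vendor's HTTP API, and none of these names come from Fararoni Flow itself:

```java
// Port: the only thing the agent's business logic knows about language models.
interface ModelPort {
    String complete(String prompt);
}

// Adapters: one per provider. These are fakes; real ones would call the vendor API.
final class FakeClaudeAdapter implements ModelPort {
    @Override
    public String complete(String prompt) {
        return "[claude] " + prompt;
    }
}

final class FakeKimiAdapter implements ModelPort {
    @Override
    public String complete(String prompt) {
        return "[kimi] " + prompt;
    }
}

// Domain logic: depends on the port, never on a concrete adapter.
final class EmailSummaryAgent {
    private final ModelPort model;

    EmailSummaryAgent(ModelPort model) {
        this.model = model;
    }

    String summarize(String emailBody) {
        return model.complete("Summarize: " + emailBody.strip());
    }

    public static void main(String[] args) {
        // Switching providers is a wiring change; summarize() is untouched.
        System.out.println(new EmailSummaryAgent(new FakeClaudeAdapter()).summarize(" Q3 report attached "));
        System.out.println(new EmailSummaryAgent(new FakeKimiAdapter()).summarize(" Q3 report attached "));
    }
}
```

Because `ModelPort` has a single method, tests can also pass a lambda as a stub adapter, which keeps the domain logic testable without any network access.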

Sidecar Pattern: Each main agent in Fararoni Flow travels accompanied by sidecar containers that handle cross-cutting concerns: structured logging, telemetry, health checks, and secure communication. This pattern, popularized by Kubernetes and used by Google, Uber, Airbnb, and eBay, allows the main container to focus exclusively on its business logic while the sidecars handle "how" it runs. I can update the logging system without touching an agent. I can change the communication protocol without affecting mission logic.

MCP (Model Context Protocol): With 110 million monthly SDK downloads and adoption by Anthropic, OpenAI, Google, and Microsoft, MCP has become the de facto standard for agent-tool integration. Fararoni Flow uses MCP to connect agents with external tools: email reading (IMAP), web search, command execution, and database access. MCP collapses the N×M integration problem (N tools × M AI platforms) to N+M. You build one MCP server for your tool, and any compatible agent can use it.
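For readers who have not looked at the wire format: MCP messages are JSON-RPC 2.0, and invoking a tool is a `tools/call` request carrying the tool `name` and its `arguments`. A minimal sketch that builds such a request by hand; the tool name `search_inbox` and its `query` argument are hypothetical, and real code would use a JSON library or an MCP SDK rather than string formatting:

```java
final class McpRequests {

    // Builds a JSON-RPC 2.0 "tools/call" request as a single-line string.
    // argumentsJson must already be a valid JSON object. Nothing is escaped here,
    // which is exactly why production code should use a JSON library or an MCP SDK.
    static String toolCall(int id, String toolName, String argumentsJson) {
        return """
                {"jsonrpc":"2.0","id":%d,"method":"tools/call","params":{"name":"%s","arguments":%s}}"""
                .formatted(id, toolName, argumentsJson);
    }

    public static void main(String[] args) {
        System.out.println(toolCall(1, "search_inbox", "{\"query\":\"unread\"}"));
    }
}
```

The same message shape works for any MCP server, which is what turns N×M custom integrations into N servers plus M clients.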

DEFCON Levels: Resilience by Design

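A distinctive feature of Fararoni Flow is its DEFCON level system (0-5), applied to each mission. Borrowed from the U.S. military's DEFCON readiness scale, though numbered in the opposite direction (there, DEFCON 1 is the highest alert; here, higher numbers mean tighter restrictions), these levels define the alert state and the resources assigned to each task: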

  • DEFCON 0: Normal mission. Standard execution with automatic retries.
  • DEFCON 1: Intensified monitoring. Each step is verified before continuing.
  • DEFCON 2: Human escalation required. The agent can suggest but cannot execute critical changes.
  • DEFCON 3: Read-only. The agent can investigate but cannot modify.
  • DEFCON 4: Passive observation mode. Monitoring only, no action.
  • DEFCON 5: Mission aborted. All operations stopped.

This system addresses one of the fundamental problems of autonomous agents: how do you delegate authority without losing control? Not all tasks require the same level of supervision. A "generate an email summary" mission can run at DEFCON 0; a "modify production configuration" mission should require at least DEFCON 2.
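One way to encode this policy, as a minimal sketch: the level names and semantics follow the list above, while the permission methods (`canInvestigate`, `canExecute`, `isActive`) are hypothetical names for illustration, not Fararoni Flow's actual API:

```java
enum DefconLevel {
    DEFCON_0, // normal: autonomous execution with retries
    DEFCON_1, // intensified monitoring: every step verified (verification not modelled here)
    DEFCON_2, // human escalation: only non-critical changes run unattended
    DEFCON_3, // read-only investigation
    DEFCON_4, // passive observation: monitoring only
    DEFCON_5; // aborted

    int level() {
        return ordinal();
    }

    // Reading and investigating are allowed up to read-only mode (level 3).
    boolean canInvestigate() {
        return level() <= 3;
    }

    // Writes: levels 0-1 may change anything; level 2 only non-critical changes.
    boolean canExecute(boolean criticalChange) {
        if (level() <= 1) return true;
        if (level() == 2) return !criticalChange;
        return false;
    }

    // Everything short of "aborted" keeps running, if only to observe.
    boolean isActive() {
        return this != DEFCON_5;
    }

    public static void main(String[] args) {
        for (DefconLevel d : values()) {
            System.out.printf("%s investigate=%b critical=%b nonCritical=%b active=%b%n",
                    d, d.canInvestigate(), d.canExecute(true), d.canExecute(false), d.isActive());
        }
    }
}
```

Attaching permissions to the mission's level rather than to each agent means a mission can be escalated, say from DEFCON 0 to DEFCON 3, without touching agent code.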

The System in Numbers

Metric | Value | Context
--- | --- | ---
Tokens processed | 7,242,574 | Cumulative since launch
Total executions | 395 | Missions initiated
Successfully completed | 277 (70.1%) | Current success rate
LLM calls | 1,247 | Language model calls
Active agents | 31 | Specialized agents available
Missions created | 369 | Mission library
Backend | WebFlux/Netty | Reactive stack on Java 25
Native image | GraalVM Native Image | AOT compilation for sub-50 ms cold start

These numbers reflect a system in active use, not a prototype. The 70% success rate sounds modest until you consider that it includes experimental missions, agents in development, and tasks that deliberately explore the limits of what is possible. In agentic AI, 70% success with rapid iteration is more valuable than 95% with months-long development cycles.


7. What I've Learned: Five Principles for Building with AI in 2026

After months of building Fararoni Flow and observing the enterprise AI landscape, these are the five principles that would guide my approach if I were starting today:

Principle 1: Don't Buy the Hype, Buy Flexibility

The AI model market changes weekly. What is "the best model" today will be surpassed in three months. Invest in architecture that allows you to change models without rewriting your application. A model routing system is not a luxury: it is insurance against obsolescence. Tyler Folkman's router cut his own costs by 94% without sacrificing quality on his workload. That is not optimization: it is financial survival.

Principle 2: 80% of Your Work Doesn't Need the 99% Model

For most daily development tasks (APIs, tests, scaffolding, standard debugging), Kimi K2.6 and DeepSeek V4-Pro offer equal or superior performance to Claude Sonnet 4.6 at a fraction of the cost. Reserve Opus 4.6 for complex architecture, large-scale refactorings, and problems where an error is costly. In my experience, this hybrid approach delivers 90% of Opus quality at 15% of the cost.
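The blended cost of a hybrid setup is a weighted average. A minimal sketch, assuming cost is driven only by output-token prices and that routine and premium tasks consume similar token volumes (both simplifications): with a share p of the work on the premium model, the bill relative to an all-premium setup is p + (1 - p) × (routine price / premium price). The class and method names are illustrative:

```java
final class BlendedCost {

    // Hybrid cost as a fraction of running everything on the premium model.
    // premiumShare: fraction of work (0..1) routed to the premium model.
    static double costRatio(double premiumShare, double routinePrice, double premiumPrice) {
        double priceRatio = routinePrice / premiumPrice;
        return premiumShare + (1.0 - premiumShare) * priceRatio;
    }

    // Inverse: the premium share that yields a target cost ratio.
    static double premiumShareFor(double targetRatio, double routinePrice, double premiumPrice) {
        double priceRatio = routinePrice / premiumPrice;
        return (targetRatio - priceRatio) / (1.0 - priceRatio);
    }

    public static void main(String[] args) {
        // Output prices per 1M tokens from the pricing table in section 3.
        double opus46 = 25.00, kimiK26 = 2.50, deepSeekV4Pro = 0.87;
        System.out.printf("80/20 split with Kimi K2.6: %.2f of all-Opus cost%n",
                costRatio(0.20, kimiK26, opus46));
        System.out.printf("Premium share for 15%% of cost with V4-Pro: %.3f%n",
                premiumShareFor(0.15, deepSeekV4Pro, opus46));
    }
}
```

Under these assumptions, an 80/20 split with Kimi K2.6 lands at 28% of the all-Opus bill, and reaching 15% with DeepSeek V4-Pro as the routine model implies sending roughly 12% of the work to Opus. The exact figure depends on your own task mix.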

Principle 3: Observability Before Autonomy

You cannot improve what you cannot measure. Fararoni Flow records every token consumed, every tool call, every DEFCON state change, and every mission outcome. Without this telemetry, you would be flying blind. Agent autonomy is powerful but dangerous without complete observability. Start with granular monitoring before adding autonomy.

Principle 4: Start with Missions, Not with Agents

The most common mistake I see is building "cool agents" and then looking for a purpose for them. The correct approach is to identify a specific business mission — "I need a daily technology briefing based on my emails and RSS feeds" — and then design the minimum agent necessary to fulfill it. Fararoni Flow started with a single mission: process emails and generate summaries. The current 31 agents and 369 missions are the result of organic iteration, not centralized planning.

Principle 5: Persistence > Speed

Building AI systems is hard. It is. There are moments when a model that worked perfectly yesterday starts behaving erratically today. There are pipelines that break due to changes in external APIs. There are costs that spike because you forgot a rate limit in a loop. The difference between those who succeed and those who abandon is not intelligence or resources. It is the persistence of continuing to iterate when everything seems broken. If you persist, the results come. Not always on the timeline you expect, but they come.


8. The Technical Landscape: How to Build an Orchestrator That Scales

Event Architecture with NATS

Fararoni Flow is built on an event architecture where NATS acts as the central nervous system. When a user creates a mission, a mission.created event is published. Subscribed agents evaluate whether they can handle that mission based on their declared capabilities. If an agent accepts, it publishes mission.claimed and begins execution. Each step within the mission generates events: step.started, step.completed, step.failed, step.retried.

This model has several critical advantages:

  • Decoupling: Agents don't know about each other. They only know how to respond to events.
  • Scalability: I can add more instances of any agent without changing code.
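  • Resilience: If an agent fails, pending events can be redelivered to another instance, provided they flow through NATS JetStream with acknowledgments (core NATS pub/sub on its own is at-most-once).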
  • Observability: Every event is logged, enabling complete replay of any execution.
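The claim-and-execute flow can be sketched without a broker. A minimal in-memory model, assuming each agent declares a set of capability strings and each mission names the one capability it needs. The subjects match those listed above, but the classes, the first-match claim policy, and the retry limit are hypothetical simplifications; a real deployment would publish these subjects over NATS and resolve races between competing claimers:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

record Mission(String id, String requiredCapability, List<String> steps) {}

record Agent(String name, Set<String> capabilities) {
    boolean canHandle(Mission m) {
        return capabilities.contains(m.requiredCapability());
    }
}

final class MissionLifecycle {
    final List<String> events = new ArrayList<>();

    private void publish(String subject, String detail) {
        events.add(subject + " " + detail); // stand-in for publishing on a NATS subject
    }

    // runStep returns true on success; a failed step is retried up to maxRetries times.
    Optional<Agent> dispatch(Mission mission, List<Agent> agents,
                             Predicate<String> runStep, int maxRetries) {
        publish("mission.created", mission.id());
        Optional<Agent> claimer = agents.stream().filter(a -> a.canHandle(mission)).findFirst();
        claimer.ifPresent(agent -> {
            publish("mission.claimed", mission.id() + " by " + agent.name());
            for (String step : mission.steps()) {
                publish("step.started", step);
                int attempt = 0;
                while (!runStep.test(step)) {
                    publish("step.failed", step);
                    if (++attempt > maxRetries) return; // abandon the remaining steps
                    publish("step.retried", step);
                }
                publish("step.completed", step);
            }
        });
        return claimer;
    }

    public static void main(String[] args) {
        List<Agent> agents = List.of(
                new Agent("briefing-agent", Set.of("briefing")),
                new Agent("email-agent", Set.of("email", "summary")));
        Set<String> failOnce = new HashSet<>(Set.of("summarize")); // simulate one transient failure
        MissionLifecycle lifecycle = new MissionLifecycle();
        lifecycle.dispatch(new Mission("m-1", "summary", List.of("fetch", "summarize")),
                agents, step -> !failOnce.remove(step), 2);
        lifecycle.events.forEach(System.out::println);
    }
}
```

Because every transition is an event, the `events` list doubles as the audit trail that makes a complete replay possible.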

GraalVM Native: Speed That Matters

Native image compilation with GraalVM reduces the application's cold start to less than 50 milliseconds. In an agent system where functions can scale to zero when there is no work and activate on demand, this is the difference between an instant response and a frustrating experience. Spring Boot 3.x includes built-in support for GraalVM native images, and for services that run on the JVM instead, Java 25's AOT method profiling (JEP 515) helps the application reach peak performance soon after startup.

WebFlux/Netty: Concurrency Without Compromise

Choosing Spring WebFlux on Netty over the traditional thread-per-request model was deliberate. Benchmarks from 2025-2026 suggest the gap between the two approaches is narrow: virtual threads on Netty outperform pure WebFlux in approximately 45% of scenarios, especially under high concurrent load. For an orchestrator that runs multiple agents simultaneously, each with its own external API calls, the ability to handle tens of thousands of concurrent connections at low latency is essential.

MCP: The Universal Integration Layer

Fararoni Flow implements MCP servers for all its external integrations: IMAP for email, connectors for databases, clients for language model APIs, and adapters for file systems. This means that any MCP-compatible tool — and in 2026 that includes Claude Desktop, VS Code Copilot, Cursor, and dozens of IDEs and platforms — can use Fararoni Flow agents as tools.

The image I shared at the beginning shows the "MCP Connections" interface in the sidebar, with IMAP and Gmail connectors active. These connections are the bridge between the world of autonomous agents and the existing information systems that enterprises already use.

The Sidecar Pattern in Detail

The sidecar pattern is one of Fararoni Flow's most important architectural components and deserves a detailed explanation. Inspired by the Kubernetes model where each pod can contain multiple containers sharing resources, the sidecar pattern in our context means that each main agent travels accompanied by auxiliary containers that handle cross-cutting functions.

Imagine an agent specialized in email analysis. Its main container contains exclusively the business logic: how to parse emails, how to identify important topics, how to generate summaries. But it travels with three sidecars:

Logging Sidecar: Captures every event from the agent — task start, API call, result, error — and sends it to a centralized structured logging system. The main agent doesn't know it exists. It just does its job.

Telemetry Sidecar: Collects performance metrics — response time, tokens consumed, success rate — and exposes them in Prometheus format for scraping. If the agent starts consuming more tokens than normal, the alarm triggers.

Secure Communication Sidecar: Handles TLS encryption, mutual authentication, and connection retries. The main agent speaks HTTP without thinking about security; the sidecar ensures that communication is secure.
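To make the telemetry sidecar concrete: Prometheus scrapes a plain-text exposition format made of `# HELP` and `# TYPE` lines followed by samples. A minimal sketch that renders per-agent token counters in that format using only the standard library; the metric name `agent_tokens_consumed_total` and the `agent` label are hypothetical, and a production sidecar would more likely use an official Prometheus client library:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class TokenMetrics {
    private final Map<String, LongAdder> tokensByAgent = new ConcurrentHashMap<>();

    // Called for every model response the sidecar observes; safe under concurrent use.
    void record(String agent, long tokens) {
        tokensByAgent.computeIfAbsent(agent, k -> new LongAdder()).add(tokens);
    }

    // Renders the counters in Prometheus text format, sorted by agent for stable output.
    // Label values are not escaped, so agent names must not contain quotes or backslashes.
    String scrape() {
        StringBuilder out = new StringBuilder();
        out.append("# HELP agent_tokens_consumed_total Tokens consumed per agent.\n");
        out.append("# TYPE agent_tokens_consumed_total counter\n");
        new TreeMap<>(tokensByAgent).forEach((agent, count) ->
                out.append("agent_tokens_consumed_total{agent=\"")
                   .append(agent)
                   .append("\"} ")
                   .append(count.sum())
                   .append('\n'));
        return out.toString();
    }

    public static void main(String[] args) {
        TokenMetrics metrics = new TokenMetrics();
        metrics.record("email-agent", 1200);
        metrics.record("email-agent", 800);
        metrics.record("briefing-agent", 450);
        System.out.print(metrics.scrape());
    }
}
```

The "more tokens than normal" alarm would then live in a Prometheus alerting rule over this counter, outside the agent entirely.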

This separation has massive benefits for operations. I can update the logging system for the entire cluster without touching a single agent. I can change the telemetry protocol without affecting business logic. I can rotate security certificates centrally. In an ecosystem where I expect to have dozens of different agents, each specialized in a domain, this operational consistency is not optional: it is the foundation on which reliability is built.

The sidecar pattern also solves a practical problem for multidisciplinary teams. The main agent can be written in Java (my preferred language for complex business logic), while a natural language processing sidecar can be in Python, leveraging the rich NLP libraries of the Python ecosystem. The logging sidecar could be in Rust, maximizing resource efficiency. Each component uses the appropriate language for its purpose, communicating through well-defined interfaces.

NATS as Nervous System: Beyond Simple Pub/Sub

The choice of NATS as the messaging backbone was not the most obvious one. Many teams would have chosen Apache Kafka (the de facto standard for event streaming at scale) or RabbitMQ (the reliable choice for nearly two decades). But NATS offers something these systems don't have to the same degree: radical simplicity with extraordinary performance.

NATS handles millions of messages per second with latencies below one millisecond. In a comparative benchmark by the Cloud Native Computing Foundation, NATS proved to be 10-100x faster than Kafka in low-latency messaging scenarios with small messages. That profile matches an agent orchestrator, where most messages are state events ("step X completed", "agent Y failed", "mission Z requires escalation") that are inherently small and frequent.

But the real reason NATS is perfect for Fararoni Flow is its flexible subscription model. NATS supports multiple messaging patterns in a single system:

Classic Pub/Sub: An agent publishes an event, all interested subscribers receive it. Perfect for broadcast notifications.

Request/Reply: An agent sends a request and waits for a response. The pattern automatically handles routing responses to the correct requester, even in systems with multiple instances. This is fundamental for agent-to-agent coordination.

Queue Groups: Multiple instances of the same agent subscribe as a group. NATS delivers each message to exactly one instance of the group, enabling automatic load balancing. If I have three instances of an "email processing" agent, each email goes to exactly one instance.

JetStream: NATS's persistence layer, which adds durability, message replay, and durable consumers that can each process at their own pace. If an agent fails and restarts, it can resume processing from where it left off.

Key-Value Store: A distributed key-value store integrated into NATS that I use for shared state between agents. The state of a mission in progress is stored here, allowing any instance of an agent to continue another's work.
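To make the difference between plain subscriptions and queue groups concrete, here is a toy, single-process model of the two delivery rules. It is an illustration, not the NATS Java client: a real deployment subscribes through the client library against a NATS server, and the subject and group names (`email.incoming`, `email-workers`) are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Consumer;

// Toy model of two NATS delivery rules. This is NOT the NATS client API.
class ToyBus {
    // subject -> plain subscribers (each one receives every message)
    private final Map<String, List<Consumer<String>>> plain = new HashMap<>();
    // subject -> queue group name -> members (one member per group receives each message)
    private final Map<String, Map<String, List<Consumer<String>>>> groups = new HashMap<>();

    void subscribe(String subject, Consumer<String> handler) {
        plain.computeIfAbsent(subject, s -> new ArrayList<>()).add(handler);
    }

    void subscribe(String subject, String queueGroup, Consumer<String> handler) {
        groups.computeIfAbsent(subject, s -> new HashMap<>())
              .computeIfAbsent(queueGroup, g -> new ArrayList<>())
              .add(handler);
    }

    void publish(String subject, String payload) {
        // Pub/sub fan-out: every plain subscriber gets its own copy.
        plain.getOrDefault(subject, List.of()).forEach(h -> h.accept(payload));
        // Queue groups: exactly one member of each group gets the message
        // (picked at random here, which spreads load across instances).
        groups.getOrDefault(subject, Map.of()).values().forEach(members ->
                members.get(ThreadLocalRandom.current().nextInt(members.size())).accept(payload));
    }
}

class QueueGroupDemo {
    public static void main(String[] args) {
        ToyBus bus = new ToyBus();
        int[] audited = {0};
        int[] perWorker = new int[3];
        bus.subscribe("email.incoming", msg -> audited[0]++); // e.g. an audit listener
        for (int w = 0; w < perWorker.length; w++) {
            final int id = w;
            bus.subscribe("email.incoming", "email-workers", msg -> perWorker[id]++);
        }
        for (int n = 0; n < 30; n++) {
            bus.publish("email.incoming", "mail-" + n);
        }
        System.out.println("audited=" + audited[0] + " per worker=" + Arrays.toString(perWorker));
    }
}
```

In the demo, the audit listener receives all 30 messages while the three "email-workers" instances split them among themselves, which is the load-balancing behavior described for queue groups.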

This versatility means I don't need to maintain multiple messaging systems. A single NATS cluster handles all the communication patterns that Fararoni Flow requires. That drastically simplifies operations: one system to monitor, one system to back up, one system to scale.

NATS's hub-and-spoke topology is also ideal for microservices architectures. Instead of connecting services point-to-point (which creates an unmaintainable tangle of connections), all agents connect to the NATS cluster. When I add a new agent, it only needs to know the cluster address, not anything about the other agents. This decoupling is what allows Fararoni Flow to scale from 5 agents to 50 without a massive re-architecture.


9. Java 25 in 2026: Why I Chose the Elephant for a Gazelle Race

One of the questions I've been asked most since sharing Fararoni Flow is: "Why Java? Python is the language of AI." It's a valid question, and the answer reveals a lot about the philosophy behind the system.

Python for Prototypes, Java for Production

There is no doubt that Python dominates the AI research ecosystem. PyTorch, TensorFlow, Hugging Face, LangChain, CrewAI: most of the frameworks that AI developers use daily are written in Python or treat Python as their first-class language. For prototypes, research, and rapid experimentation, Python is unbeatable.

But Fararoni Flow is not a prototype. It is an orchestration system designed to run 24/7, coordinating dozens of agents, processing millions of tokens, and maintaining the state of hundreds of concurrent missions. For that type of system, you need something Python cannot easily offer: predictable performance at scale, static typing that prevents errors at compile time, and a mature observability and operations ecosystem.

Virtual Threads: The Silent Game Changer

The feature that excited me most about Java 21 (and which has been perfected in Java 25) is Virtual Threads from Project Loom. The 2025-2026 benchmarks are conclusive: a server based on Virtual Threads over Netty handles 1.2 million requests per second on a 16-core machine, surpassing WebFlux with Project Reactor's 900K. In high-concurrency load scenarios, Virtual Threads win in approximately 45% of cases.

For an agent orchestrator, this is transformational. Each running agent can have its own virtual thread without tying up an operating-system thread. I can have hundreds of agents "running simultaneously" without the system feeling loaded. When an agent makes an external API call (which accounts for most of an agent's execution time), the virtual thread "parks" automatically, freeing its carrier thread for other agents.
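The park-while-waiting behavior is easy to observe with the standard JDK API (Java 21 and later). In this sketch, `Thread.sleep` stands in for a slow LLM API call (an assumption for illustration), and each of 1,000 simulated agents gets its own virtual thread via `Executors.newVirtualThreadPerTaskExecutor()`.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadAgents {
    // Simulated agent step: the sleep stands in for waiting on an external API.
    // While a virtual thread sleeps, it is unmounted and its carrier thread is free.
    static String runAgent(int id) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(200));
        return "agent-" + id + " done";
    }

    static List<String> runAll(int agents) {
        List<String> results = new ArrayList<>();
        // One virtual thread per task; close() at the end of the try block waits for all tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < agents; i++) {
                final int id = i;
                futures.add(executor.submit(() -> runAgent(id)));
            }
            for (Future<String> f : futures) {
                results.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for agents", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an agent failed", e.getCause());
        }
        return results;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results = runAll(1_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " agents finished in ~" + ms + " ms");
    }
}
```

Because sleeping virtual threads do not hold carrier threads, the batch should finish in roughly the time of one call plus scheduling overhead rather than 1,000 x 200 ms; exact timings depend on the machine.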

The Spring Boot 3.x Ecosystem

Spring Boot 3.x brings native support for Virtual Threads, GraalVM Native Image, and the reactive WebFlux stack. The Spring ecosystem is massive: Spring Security for authentication and authorization, Spring Data for persistence, Spring Cloud for microservices patterns, Spring Batch for batch processing. Each of these projects has matured through years of enterprise production use.

When you build an agent system that needs OAuth2 authentication, rate limiting, circuit breakers, and distributed tracing, you don't want to build that from scratch. You want a framework that does it well, that has done it well for years, and that has the documentation and community to solve problems quickly.
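A circuit breaker is exactly the kind of component you would rather not build yourself, but a minimal version shows what the library does for you. This hand-rolled sketch is for illustration only (in a Spring application you would typically use a maintained library such as Resilience4j). The failure threshold, cool-down, and injectable millisecond clock are assumptions chosen to keep it testable.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal circuit breaker for illustration only (not a library API).
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;      // consecutive failures before opening
    private final long openMillis;           // cool-down before a trial call is allowed
    private final LongSupplier clockMillis;  // injectable time source for testing
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clockMillis = clockMillis;
    }

    // synchronized keeps the sketch simple (and allows a single half-open trial),
    // at the cost of serializing calls through one breaker.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clockMillis.getAsLong() - openedAt < openMillis) {
                return fallback.get(); // fail fast: don't hit the struggling provider
            }
            state = State.HALF_OPEN; // cool-down over: let one trial call through
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clockMillis.getAsLong();
            }
            return fallback.get();
        }
    }

    synchronized State state() {
        return state;
    }
}

class CircuitBreakerDemo {
    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3, 30_000, System::currentTimeMillis);
        for (int i = 0; i < 4; i++) {
            String answer = breaker.call(
                    () -> { throw new IllegalStateException("LLM provider unavailable"); },
                    () -> "cached answer");
            System.out.println(answer + " (breaker " + breaker.state() + ")");
        }
    }
}
```

After three consecutive failures the breaker opens and later calls return the fallback immediately; once the cool-down passes, a single successful trial call closes it again.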

GraalVM Native: Cold Starts That Don't Hurt

One of the historical criticisms of Java has been startup time. "Write once, run everywhere" felt more like "Write once, wait everywhere" for serverless applications. GraalVM Native Image changes that equation. Ahead-of-time compilation produces native binaries that start in less than 50 milliseconds.

For Fararoni Flow, this means I can scale agents to zero when there is no work and activate them on demand without users noticing the delay. In an agent system where different types of agents have different usage patterns — some constantly active, others sporadic — this "scale to zero" capability has direct implications for infrastructure costs.

Project Leyden and AOT Caching

Java 25 brings significant improvements from Project Leyden, especially Ahead-of-Time Method Profiling (JEP 515). During a training run, the JVM records which methods the application executes and how, and stores those profiles in the AOT cache for future runs. The result: the JIT compiler can start generating optimized native code immediately at startup, without first having to wait to collect hot profiles.

Benchmarks show 15-25% improvements in startup time and warm-up for applications that use this feature. For an agent system that restarts frequently — whether from deployments, failure recovery, or elastic scaling — every millisecond of startup counts.

Compact Object Headers: Memory That Matters

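JEP 519 in Java 25 promotes compact object headers, experimental in JDK 24, to a product feature (enabled with -XX:+UseCompactObjectHeaders). By shrinking object headers from 96-128 bits to 64 bits on 64-bit platforms, they reduce heap memory usage by up to 20%. Oracle and Amazon tested this feature on hundreds of production services and reported not only memory reduction, but also performance improvements of up to 10% and reduction in garbage collection frequency of up to 15%.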

In an orchestration system where each agent maintains state, conversation context, and message buffers, memory efficiency is not a minor detail. It is the difference between being able to run 30 agents on a single instance and having to scale horizontally ahead of time.
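The "up to" in that figure matters, because object alignment decides whether the bytes saved in each header actually turn into smaller objects. A rough model makes this visible. It assumes a 64-bit JVM with compressed class pointers (a 12-byte legacy header), 8-byte compact headers, the default 8-byte object alignment, and no padding between fields; real layouts can differ.

```java
// Rough model only: 12-byte legacy header vs 8-byte compact header,
// 8-byte alignment, and no padding between fields (assumptions).
class HeaderEstimate {
    static long aligned(long bytes) {
        return (bytes + 7) & ~7L; // round up to the next multiple of 8
    }

    static long sizeLegacy(long fieldBytes) {
        return aligned(12 + fieldBytes);
    }

    static long sizeCompact(long fieldBytes) {
        return aligned(8 + fieldBytes);
    }

    public static void main(String[] args) {
        for (long fields : new long[] {4, 8, 16, 24}) {
            long before = sizeLegacy(fields);
            long after = sizeCompact(fields);
            System.out.printf("%2d bytes of fields: %d -> %d bytes (%.0f%% saved)%n",
                    fields, before, after, 100.0 * (before - after) / before);
        }
    }
}
```

Under these assumptions, an object with 4 bytes of fields stays at 16 bytes, while objects with 8, 16, or 24 bytes of fields shrink by about 33%, 25%, and 20%, which is why heap savings vary from one workload to another.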

The Truth About Language Choice

In the end, the choice of Java 25 is not about rejecting Python or declaring that Java is "better." It is about choosing the right tool for the right problem. Python is unbeatable for research, model experimentation, and prototype building. Java is unbeatable for high-concurrency distributed systems that need predictable performance, complete observability, and drama-free operations.

Fararoni Flow has components in Python — especially those that interact directly with language models and ML tools — but the core orchestration, messaging system, state management, and API layer are in Java 25 because that is where Java shines. A hybrid system that uses each language for what it does best is not a weakness: it is mature architecture.


10. The Future of Agent Orchestration: Where We're Headed

From Isolated Agents to Distributed Cognitive Systems

Agent orchestration in 2026 is where container orchestration was in 2014: on the edge of a maturity explosion. Kubernetes became popular because it solved a real problem — how to manage hundreds of containers in production — and it did so with a powerful abstraction: the pod, the service, the deployment. Agent orchestration needs its own equivalent abstractions.

I believe we are seeing three fundamental abstractions emerge:

1. The Mission as the Unit of Work: In Fararoni Flow, a mission is a complete unit of work that can involve multiple agents, tools, and steps. It is the equivalent of a "job" in batch systems or a "workflow" in integration systems. Missions have state, history, and can be replayed, audited, and optimized. The mission is the fundamental unit of orchestration because it reflects how humans think about work: as objectives to fulfill, not as isolated tasks.

2. The Agent as a Specialized Service: The agents of the future will not be generalists trying to do everything. They will be deep specialists in a specific domain, communicating with other agents through standardized protocols. Anthropic's Model Context Protocol (MCP) and Google's Agent-to-Agent Protocol (A2A) are the first steps toward this standardization. When a "data analysis" agent can communicate with a "report generation" agent and a "quality validation" agent through shared protocols, the complete system becomes more than the sum of its parts.

3. The Orchestrator as Operating System: The orchestrator is not just a coordinator: it is the operating system of the agent ecosystem. It handles agent lifecycle, resource allocation, failure recovery, security, and observability. In Fararoni Flow, the orchestrator decides which agent executes which mission based on declared capabilities, current state, and priority policies. It is the kernel of the system.
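A toy version of these three abstractions shows how capability-based dispatch and priority ordering fit together. The `AgentInfo` and `Mission` records and the `MiniOrchestrator` class are hypothetical names for this sketch, not Fararoni Flow's real API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Agent as a specialized service: a name plus the capabilities it declares.
record AgentInfo(String name, Set<String> capabilities, boolean busy) {}

// Mission as the unit of work: what it needs and how urgent it is.
record Mission(String id, String requiredCapability, int priority) {}

// Orchestrator as "kernel": owns agent state and decides who runs what.
class MiniOrchestrator {
    private final List<AgentInfo> agents = new ArrayList<>();
    private final List<String> history = new ArrayList<>(); // auditable assignment log

    void register(AgentInfo agent) {
        agents.add(agent);
    }

    // Assign the first idle agent that declares the required capability.
    Optional<AgentInfo> assign(Mission mission) {
        for (int i = 0; i < agents.size(); i++) {
            AgentInfo a = agents.get(i);
            if (!a.busy() && a.capabilities().contains(mission.requiredCapability())) {
                agents.set(i, new AgentInfo(a.name(), a.capabilities(), true)); // mark busy
                history.add(mission.id() + " -> " + a.name());
                return Optional.of(a);
            }
        }
        history.add(mission.id() + " -> UNASSIGNED"); // a real system would queue or escalate
        return Optional.empty();
    }

    // Highest priority first; List.sort is stable, so ties keep submission order.
    List<String> dispatchAll(List<Mission> missions) {
        List<Mission> ordered = new ArrayList<>(missions);
        ordered.sort(Comparator.comparingInt(Mission::priority).reversed());
        ordered.forEach(this::assign);
        return List.copyOf(history);
    }

    public static void main(String[] args) {
        MiniOrchestrator orchestrator = new MiniOrchestrator();
        orchestrator.register(new AgentInfo("mail-1", Set.of("email"), false));
        orchestrator.register(new AgentInfo("report-1", Set.of("report", "email"), false));
        List<String> log = orchestrator.dispatchAll(List.of(
                new Mission("m1", "email", 1),
                new Mission("m2", "report", 5),
                new Mission("m3", "email", 3)));
        log.forEach(System.out::println);
    }
}
```

The orchestrator, not the agents, decides who runs what: the high-priority report mission claims the only report-capable agent first, and lower-priority missions take whatever capable agents remain.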

The Convergence of Protocols: MCP, A2A, and Beyond

The agent protocol ecosystem in 2026 is fragmented but converging rapidly:

| Protocol | Purpose | Adoption | Status |
|---|---|---|---|
| MCP (Anthropic) | Agent → Tools | 110M+ downloads/month | Dominant in integration |
| A2A (Google) | Agent → Agent | 50+ launch partners | Emerging but growing |
| ACP (IBM/Linux) | Agent → Agent (commerce) | Limited | Early standardization |
| Open GAP | Framework-agnostic | OSS community | In development |

MCP has won the agent-tool integration battle. With 110 million monthly SDK downloads and adoption by the big four (Anthropic, OpenAI, Google, Microsoft), it is the de facto standard. A2A is emerging as the protocol for agent-to-agent communication, with Google leading and gaining backing from AWS and other majors.

The long-term vision is an ecosystem where these protocols complement each other: MCP for each agent to access tools, A2A for agents to communicate with each other, and possibly a third protocol for system-level orchestration. Fararoni Flow is architecturally prepared for this convergence: our inter-agent communication layer can adapt to new protocols without changing business logic.
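In hexagonal terms, "adapting to new protocols without changing business logic" means the domain talks to a port and each protocol lives in an adapter. This sketch assumes a hypothetical `AgentMessenger` port; its two adapters only format strings, standing in for real NATS or A2A client code whose APIs are deliberately not shown.

```java
// Port: the only messaging contract the domain code depends on.
interface AgentMessenger {
    String send(String targetAgent, String payload);
}

// Adapter placeholder for NATS: a real one would publish on a subject via a NATS client.
class NatsMessenger implements AgentMessenger {
    @Override
    public String send(String targetAgent, String payload) {
        return "nats:agents." + targetAgent + ":" + payload;
    }
}

// Adapter placeholder for A2A: a real one would call the target agent's A2A endpoint.
class A2aMessenger implements AgentMessenger {
    @Override
    public String send(String targetAgent, String payload) {
        return "a2a:" + targetAgent + ":" + payload;
    }
}

// Domain service: identical code regardless of which adapter is injected.
class ReportRequester {
    private final AgentMessenger messenger;

    ReportRequester(AgentMessenger messenger) {
        this.messenger = messenger;
    }

    String requestReport(String topic) {
        return messenger.send("report-generator", "topic=" + topic);
    }

    public static void main(String[] args) {
        System.out.println(new ReportRequester(new NatsMessenger()).requestReport("q3-sales"));
        System.out.println(new ReportRequester(new A2aMessenger()).requestReport("q3-sales"));
    }
}
```

Supporting a third protocol means writing one more adapter class; `ReportRequester` does not change.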

The Democratization of Enterprise AI

A trend that deeply excites me is the democratization that these reduced costs and open standards are enabling. When DeepSeek offers V4-Pro at $0.003625 per million tokens on cache hit, and Kimi K2.6 offers frontier-level performance at mid-tier prices, the barrier to entry for building intelligent agent systems collapses.

A startup with a modest budget can today build an agent system that a year ago would have cost hundreds of thousands of dollars. An individual developer can orchestrate multiple specialized agents for less than a Netflix subscription. This is not just a cost reduction: it is a transfer of power from the big AI labs toward individual builders and small teams.

The Risks That Persist

But it is not all optimism. There are real risks that the field has not yet resolved:

Security of Autonomous Agents: When an agent has access to your email, your databases, and your production systems, the attack surface expands dramatically. A well-designed prompt injection could theoretically make an agent execute unauthorized actions. DEFCON-level systems like Fararoni Flow's are a first step, but autonomous agent security is a field in its infancy.

Model Provider Dependency: Although open protocols like MCP reduce lock-in at the tool level, dependence on specific model providers remains a risk. If Claude Opus 4.6 is the only model that can handle your most complex use case, you have a single point of failure. The multi-model routing strategy I use in Fararoni Flow mitigates this, but does not eliminate it completely.

Data Quality and Bias: 85% of AI project failures are attributed to low-quality data. Autonomous agents that make decisions based on biased or incomplete data can amplify those biases at scale. Data governance is not optional: it is a fundamental requirement.


11. Building in Public: The Commitment to Transparency

Since I started building Fararoni Flow, I have chosen to do it in public. I share real metrics (including failures), explain architectural decisions with their full context, and publish the code so others can learn, criticize, and improve.

This transparency is not pure altruism. It is a system-building strategy that has proven effective time and again: when you know others will see your work, you have an additional incentive to do it well. The "social pressure" of building in public is a quality accelerator.

But there is a deeper benefit. The field of agent orchestration is so new that there is no consolidated "best practices manual." We are all discovering the answers in real time. By sharing what I learn — both successes and failures — I contribute to a collective body of knowledge that benefits all builders in this space.

The Numbers I Share

The image that opens this article shows the current state of the system. It is not a snapshot of a good day: it is the typical state. 31 active agents, 369 missions created, almost 7.3 million tokens processed. The 70% success rate includes experiments that intentionally explore the limits of what is possible. I am not ashamed to admit that some missions fail: every failure is a source of learning.

The Complete Tech Stack

For the technically curious, this is the complete stack of Fararoni Flow in its current state:

| Layer | Technology | Reason for Choice |
|---|---|---|
| Runtime | Java 25 (LTS) + GraalVM Native | Virtual Threads, AOT profiling, cold starts <50ms |
| Framework | Spring Boot 3.5 + WebFlux/Netty | Reactive stack, 1.2M req/s, mature ecosystem |
| Messaging | NATS + JetStream | Sub-millisecond latency, decoupled pub/sub, durable |
| Architecture | Hexagonal + Sidecar Pattern | Logic isolation, interchangeable adapters |
| Protocols | MCP (Model Context Protocol) | Standard for agent-tool, 110M+ downloads/month |
| LLM Models | Claude Opus/Sonnet, Kimi K2.6, DeepSeek V4 | Intelligent routing by task complexity |
| Persistence | PostgreSQL + Redis | Durable state + high-speed cache |
| Observability | Micrometer + Prometheus + Grafana | Real-time metrics, alerting |
| Infrastructure | Docker + Kubernetes | Container orchestration, auto-scaling |

12. Operating in Production: Lessons Only Time Can Teach

Building an orchestrator is easy. Operating it in production for months is where the real lessons appear. I've done both, and these are the lessons you won't find in any academic paper or YouTube tutorial.

The 80/20 Law of Agents

I quickly discovered that 80% of the value Fararoni Flow generates comes from 20% of the agents. The agents for email processing, technology briefing generation, and result validation are the ones that run dozens of times a day. The more exotic agents (such as the one that analyzes cybersecurity trends or the one that generates academic paper summaries) run weekly but are equally valuable when they are needed.

This distribution taught me to think in three categories of agents: workhorses (those that do the heavy daily work), specialists (those that activate for specific tasks), and explorers (those that test new capabilities). Each category has different infrastructure, cost, and monitoring requirements. Workhorses need to be always available and optimized for cost. Specialists can start on demand. Explorers can fail without serious consequences.

The "Zombie Agent" Problem

A phenomenon I did not anticipate was that of "zombie agents": agents that remain in an intermediate state (neither completely active nor completely finished), consuming resources without producing value. Typical cases were an agent stuck waiting for a response from an external API that never arrived, and an agent caught in an infinite retry loop because its success condition was unreachable.

I solved this with an "escalation timeout" system. Each step of a mission has a predetermined timeout based on its DEFCON level. If a step exceeds its timeout, the system not only marks it as failed: it escalates. DEFCON 0 becomes DEFCON 1, which becomes DEFCON 2, until a human intervenes or the mission is automatically aborted. This system has prevented countless situations of agents consuming tokens and resources without purpose.
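An escalation timeout can be sketched with `CompletableFuture.orTimeout` from the JDK. Everything specific here is an assumption for illustration: the per-level timeouts, the choice to retry the step at each escalated level with a longer timeout (the text does not say whether steps are retried), and the "hand off to a human after the last level" rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical escalation-timeout sketch; levels and timeouts are illustrative.
class EscalatingStep {
    static final long[] TIMEOUT_MS = {200, 400, 800}; // index = DEFCON level 0, 1, 2
    // Virtual threads are daemon threads, so this executor never blocks JVM exit.
    private static final ExecutorService EXECUTOR = Executors.newVirtualThreadPerTaskExecutor();

    static Optional<String> run(Supplier<String> step, List<String> log) {
        for (int level = 0; level < TIMEOUT_MS.length; level++) {
            try {
                String result = CompletableFuture.supplyAsync(step, EXECUTOR)
                        .orTimeout(TIMEOUT_MS[level], TimeUnit.MILLISECONDS)
                        .join();
                log.add("DEFCON " + level + ": completed");
                return Optional.of(result);
            } catch (CompletionException e) {
                if (!(e.getCause() instanceof TimeoutException)) {
                    throw e; // genuine errors are not escalation cases in this sketch
                }
                // orTimeout only completes the future; the stuck work keeps running,
                // so a real system must also cancel it or it becomes a zombie.
                log.add("DEFCON " + level + ": timed out after " + TIMEOUT_MS[level] + " ms");
            }
        }
        log.add("escalated to human: mission paused");
        return Optional.empty();
    }

    // Simulated step that takes `millis` to produce `value` (stands in for an API call).
    static Supplier<String> slowStep(long millis, String value) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return value;
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Optional<String> result = run(slowStep(5_000, "late answer"), log);
        log.forEach(System.out::println);
        System.out.println("result present: " + result.isPresent());
    }
}
```

A step that never answers walks through DEFCON 0, 1, and 2 and then stops at a human handoff instead of retrying forever, which is the bounded behavior that keeps zombie agents from accumulating.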

The Importance of Agent "Memory"

An agent without memory is like an employee with amnesia: it restarts from zero in every conversation. Fararoni Flow implements three types of memory:

Session Memory: The context of the current mission. What the agent has learned in previous steps, intermediate results, and decisions made. This memory lives in Redis and is lost when the mission ends.

Working Memory: Accumulated learnings about success and failure patterns. If an agent discovers that a certain approach works better for a type of task, that learning persists in PostgreSQL and is loaded in future missions of the same type. This memory is the foundation of continuous improvement.

System Memory: Static knowledge about the domain. Documentation, business rules, and templates that the agent uses as reference. This memory is updated manually by system operators.

The combination of these three memories transforms agents from isolated task executors into continuous learners. Every failed mission is a learning opportunity that benefits future missions.
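The three tiers can be sketched as one small class. In this sketch, in-memory maps stand in for Redis (session), PostgreSQL (working), and operator-maintained reference data (system); the method names and the plain-text context format are assumptions for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

class AgentMemory {
    // Session memory: per-mission scratchpad, dropped when the mission ends (Redis in the text).
    private final Map<String, Map<String, String>> session = new ConcurrentHashMap<>();
    // Working memory: lessons keyed by task type, kept across missions (PostgreSQL in the text).
    private final Map<String, String> working = new ConcurrentHashMap<>();
    // System memory: static domain knowledge; agents read it, operators update it.
    private final Map<String, String> system;

    AgentMemory(Map<String, String> systemKnowledge) {
        this.system = Map.copyOf(systemKnowledge);
    }

    void remember(String missionId, String key, String value) {
        // Sorted map so the assembled context has a stable order.
        session.computeIfAbsent(missionId, id -> new ConcurrentSkipListMap<>()).put(key, value);
    }

    void learn(String taskType, String lesson) {
        working.put(taskType, lesson);
    }

    void endMission(String missionId) {
        session.remove(missionId);
    }

    // Context for the next step: system rules, then prior lesson, then mission scratchpad.
    String contextFor(String missionId, String taskType) {
        StringBuilder sb = new StringBuilder();
        Optional.ofNullable(system.get(taskType)).ifPresent(r -> sb.append("rules=").append(r).append(';'));
        Optional.ofNullable(working.get(taskType)).ifPresent(l -> sb.append("lesson=").append(l).append(';'));
        session.getOrDefault(missionId, Map.of())
               .forEach((k, v) -> sb.append(k).append('=').append(v).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        AgentMemory memory = new AgentMemory(Map.of("email-triage", "never auto-reply to legal"));
        memory.remember("mission-1", "sender", "cfo");
        memory.learn("email-triage", "group by thread first");
        System.out.println(memory.contextFor("mission-1", "email-triage"));
        memory.endMission("mission-1");
        System.out.println(memory.contextFor("mission-2", "email-triage"));
    }
}
```

The lesson learned during mission 1 survives into mission 2, while mission 1's scratchpad does not, mirroring the session vs. working memory split described above.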

Real Costs vs. Projected Costs

When I designed the system, I projected a monthly operating cost based on public benchmarks. The real costs were different — not necessarily higher, but different in their distribution. I discovered that:

  • 60% of token cost goes to "premium" models (Claude Opus) but they only represent 15% of calls. Those calls are the most critical.
  • 25% of cost goes to "workhorse" models (Kimi K2.6) and they represent 55% of calls. This is where intelligent routing pays dividends.
  • The remaining 15% goes to "budget" models (DeepSeek V4) and they represent 30% of calls. Simple tasks that don't justify expensive models.
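These shares also reveal what an average call costs in each tier relative to the fleet-wide average call: divide each tier's cost share by its call share. A quick check in Java, using only the percentages listed above:

```java
// Worked arithmetic on the cost and call shares listed above.
class CostShares {
    record Tier(String name, double costShare, double callShare) {
        // Cost per call relative to the fleet-wide average call (1.0 = average).
        double relativeCostPerCall() {
            return costShare / callShare;
        }
    }

    public static void main(String[] args) {
        Tier premium = new Tier("Claude Opus", 0.60, 0.15);
        Tier workhorse = new Tier("Kimi K2.6", 0.25, 0.55);
        Tier budget = new Tier("DeepSeek V4", 0.15, 0.30);
        for (Tier t : new Tier[] {premium, workhorse, budget}) {
            System.out.printf("%-12s %.2fx the average call%n", t.name(), t.relativeCostPerCall());
        }
        System.out.printf("Opus call vs Kimi call: %.1fx%n",
                premium.relativeCostPerCall() / workhorse.relativeCostPerCall());
    }
}
```

On these numbers an average Opus call costs 4x the average call and about 8.8x an average Kimi call, while the budget and workhorse tiers land close together per call (0.50x versus roughly 0.45x).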

This distribution validated my original hypothesis: you don't need a supermodel for every task. You need a system that assigns the right model to the right task. The difference between a system that spends $1,000/month and one that spends $200/month is not output quality: it is routing intelligence.
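A routing rule does not have to be sophisticated to capture the idea. In this sketch the complexity score (1 to 10), the thresholds, the `critical` flag, and the model identifier strings are all hypothetical placeholders, not Fararoni Flow's configuration.

```java
// Hypothetical complexity-based router; thresholds and identifiers are placeholders.
class ModelRouter {
    enum Tier { BUDGET, WORKHORSE, PREMIUM }

    // complexity: 1 (trivial) .. 10 (hardest); critical: output feeds an important decision
    record Task(String description, int complexity, boolean critical) {}

    static Tier route(Task task) {
        if (task.critical() || task.complexity() >= 8) {
            return Tier.PREMIUM;   // few calls, highest stakes
        }
        if (task.complexity() >= 4) {
            return Tier.WORKHORSE; // the bulk of everyday work
        }
        return Tier.BUDGET;        // simple tasks that don't justify expensive models
    }

    static String modelFor(Tier tier) {
        return switch (tier) {
            case PREMIUM -> "claude-opus";   // placeholder identifiers, not real API model names
            case WORKHORSE -> "kimi-k2.6";
            case BUDGET -> "deepseek-v4";
        };
    }

    public static void main(String[] args) {
        Task[] tasks = {
            new Task("tag incoming email", 2, false),
            new Task("draft technology briefing", 5, false),
            new Task("investment briefing review", 6, true),
        };
        for (Task t : tasks) {
            System.out.println(t.description() + " -> " + modelFor(route(t)));
        }
    }
}
```

Keeping the decision in one function makes it easy to tune the thresholds against measured cost and quality, which is where the routing intelligence actually lives.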

The Human Dimension

Technically, Fararoni Flow is a software system. Operationally, it is a human-machine team. Agents handle 80% of routine work, but humans remain essential for:

  • Validating critical outputs: A briefing for an investment decision doesn't go out without human review.
  • Handling edge cases: Agents are good at the common; humans are better at the unexpected.
  • Training new agents: Each new agent requires human supervision during its first executions.
  • Defining strategy: Agents execute; humans decide what to execute.

Ignoring this human dimension is one of the most common mistakes I see in AI implementations. Systems that try to completely replace humans fail. Systems that amplify human capabilities — delegating the routine so humans can focus on the strategic — succeed.


13. The Question You Should Ask Yourself

If you've made it this far, you're probably considering how AI can transform your work, your team, or your company. The question you should ask yourself is not "which model should I use?" or "how much does Claude Code cost?" The question is: "what is my orchestration strategy?"

The data is clear. 80% of AI projects fail. Costs can vary by orders of magnitude depending on how you route your model calls. The tools that dominate today may be obsolete in a year. In this environment, the only sustainable advantage is not knowing the latest model. It is having an architecture that allows you to adapt faster than the competition.

I've spent the past few months building that architecture for my own use. Fararoni Flow is not a finished product (it is a living system that evolves weekly), but it is the concrete answer to an abstract question: how do you orchestrate dozens of specialized agents to work together as a coherent system, with resilience, observability, and controlled costs?

Who this article is for

If you are a CTO or VP of Engineering considering an AI strategy for your company, the data I presented on failure rates and costs should be your starting point. Don't start with "which model do we buy." Start with "what architecture allows us to try, fail, and adapt quickly."

If you are a senior developer who uses Claude Code, GitHub Copilot, or Cursor daily, the numbers about model routing should resonate with you. You don't need to give up Claude Code to save money. You need a system that uses Claude for what Claude does best, and cheaper models for everything else.

If you are a software architect building distributed systems, the patterns I described — hexagonal, sidecar, event-driven with NATS — should be familiar. The novelty is not in the individual patterns, but in how they compose to solve a new problem: the orchestration of autonomous agents.

If you are an entrepreneur thinking about building something in the AI agent space, I want you to know that the field is wide open. The big players are busy building models. There is a massive opportunity in the orchestration layer, the protocol layer, and the tooling layer that makes agents productive in the real world.

How to Connect

If you're curious to try Fararoni Flow, if you're building something similar and want to exchange ideas, or if you simply want to better understand how an agent orchestrator works on Java 25, NATS, and hexagonal architecture: contact me through ebercruz.com. I'm building this in public, learning in public, and sharing what I learn.

I don't promise that Fararoni Flow will be the perfect solution for your use case. But I promise the conversation will be honest, technical, and results-oriented. In a field where most are selling smoke, I prefer to build concrete — and share how I do it.

If you persist, the results come.

Not always on the timeline you expect. Not always in the form you imagined. But they come. That is the final lesson Fararoni Flow has taught me: in the building of intelligent systems, as in life, disciplined strategy beats uncontrolled speed. Thoughtful architecture beats impulsive prompts. And persistence — that ability to keep iterating when everything seems broken — is the definitive differentiator between those who transform AI into competitive advantage and those who become another failure statistic in the next RAND Corporation report.


References and Sources

  1. RAND Corporation (2025). AI Project Failure Analysis: 2,400+ Enterprise Initiatives. Retrieved from: (Folio3 AI) (https://www.folio3.ai/blog/ai-project-failure-rate-stats)
  2. MIT Project NANDA (2025). The GenAI Divide: State of AI in Business. Retrieved from: (TechTarget) (https://www.techtarget.com/searchenterpriseai/feature/AI-deployments-gone-wrong-The-fallout-and-lessons-learned)
  3. S&P Global Market Intelligence (2025). AI Initiative Abandonment Rates. Retrieved from: (Folio3 AI) (https://www.folio3.ai/blog/ai-project-failure-rate-stats)
  4. Gartner (2024). Predicts 30% of GenAI Projects Will Be Abandoned After POC By End of 2025. Retrieved from: (Gartner) (https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025)
  5. Fortune (2026). Microsoft lost its way in the AI race. Can Copilot get it back on course? Retrieved from: (Fortune) (https://fortune.com/2026/05/21/microsoft-copilot-ai-openai-satya-nadella-gemini-claude/)
  6. DeepSeek API Documentation (2026). Models & Pricing. Retrieved from: (deepseek.com) (https://api-docs.deepseek.com/quick_start/pricing)
  7. InfoWorld (2026). DeepSeek's steep V4-Pro price cut escalates AI pricing war. Retrieved from: (InfoWorld) (https://www.infoworld.com/article/4176709/deepseeks-steep-v4-pro-price-cut-escalates-ai-pricing-war.html)
  8. Medium/AI tentenco (2026). Kimi K2.6 & Kimi Code Review: Saving 88% Coding Costs? Retrieved from: (Medium) (https://medium.com/@tentenco/kimi-k2-6-kimi-code-review-saving-88-coding-costs-b7e8c5eaf5f1)
  9. Ideas2IT (2026). Claude Code With Kimi, DeepSeek vs Claude: Cost & Benchmarks. Retrieved from: (ideas2it.com) (https://www.ideas2it.com/blogs/claude-code-alternative-models)
  10. Uvik (2026). Claude Code vs Cursor vs Copilot vs Codex. Retrieved from: (uvik.net) (https://uvik.net/blog/claude-code-vs-cursor-vs-copilot-vs-codex-2026/)
  11. Menlo Ventures (2025). State of Generative AI in the Enterprise. Referenced in: (beam.ai) (https://beam.ai/agentic-insights/the-great-ai-flip-why-76-of-enterprises-stopped-building-ai-in-house)
  12. Caylent (2026). POC to PROD: Hard Lessons from 200+ Enterprise Generative AI Deployments. Retrieved from: (Caylent) (https://caylent.com/blog/poc-to-prod-hard-lessons-from-200-enterprise-generative-ai-deployments-part-2)
  13. IBM (2025). What is AI Agent Orchestration? Retrieved from: (IBM) (https://www.ibm.com/think/topics/ai-agent-orchestration)
  14. Lyzr AI (2026). Agent Orchestration 101. Retrieved from: (lyzr.ai) (https://www.lyzr.ai/blog/agent-orchestration/)
  15. Model Context Protocol (2026). MCP Roadmap and Technical Direction. Retrieved from: (getknit.dev) (https://www.getknit.dev/blog/the-future-of-mcp-roadmap-enhancements-and-whats-next)
  16. Java 25 LTS Release Notes (2025). Performance Improvements in JDK 25. Retrieved from: (inside.java) (https://inside.java/2025/10/20/jdk-25-performance-improvements/)
  17. GitHub - loom-webflux-benchmarks (2026). Benchmarks of Spring Boot REST service comparing Java Virtual Threads with WebFlux. Retrieved from: (GitHub) (https://github.com/chrisgleissner/loom-webflux-benchmarks)
  18. OpenAI Developer Community (2025). Prompt Engineering Is Dead, and Context Engineering Is Already Obsolete. Retrieved from: (OpenAI API Community Forum) (https://community.openai.com/t/prompt-engineering-is-dead-and-context-engineering-is-already-obsolete-why-the-future-is-automated-workflow-architecture-with-llms/1314011)

Eber Cruz Fararoni is a software architect specialized in distributed systems, event-driven architectures, and applied artificial intelligence. He builds Fararoni Flow, an open-source AI agent orchestrator, on Java 25, NATS, and hexagonal architecture. He writes at ebercruz.com about the intersection of software engineering and artificial intelligence.

If you found this article useful, share it with someone navigating the complex enterprise AI landscape in 2026. And if you want to try Fararoni Flow or exchange ideas about agent orchestration: contact me.
