✓ Human-authored analysis; AI used for formatting and proofreading.
Meta's Capacity Efficiency architecture is further along the completeness spectrum than any AI agent system published this year. The comparison matters because it shows exactly where the industry's best work still stops short.
The dominant agent architecture — "Agent = Model + Harness," as described in LangChain's widely-read harness articles and OpenAI's harness engineering blog — has a model (the tool), prompts and AGENTS.md files (a weak engine), CI/CD (the transmission), self-verification (no independent control), and no enforced boundaries (no casing). The Harness is Half the Architecture diagnosed the gaps: no independent oracle, no declared intent, no coordination protocols, no subtraction discipline.
Meta's Capacity Efficiency architecture advances past that baseline in three specific ways. The remaining gap is smaller and more precise — which makes it more actionable.
First: the tools/skills separation replaces the undifferentiated "harness." Where the LangChain architecture lumps everything that isn't the model into one subordinate category, Meta separates capability (standardized tool interfaces — query profiling data, fetch experiment results, search code) from judgment (skills that encode domain expertise about what to look for and how to interpret it). That's a genuine Engine/Tool separation. The skills aren't prompts. They're encoded reasoning patterns of senior engineers, composable and reusable across offense, defense, and new capabilities. The harness article's "everything serves the model" frame doesn't describe what Meta built. Meta built tools that serve the system and skills that encode human judgment independently of any specific model.
Second: offense/defense on one platform is architectural unification the harness articles never attempt. Finding optimizations and catching regressions are the same structure — gather context, apply domain expertise, produce a resolution — differing only in the skills. Most organizations, and the harness articles, build these as separate concerns. Meta saw the shared structure and built one platform. Each new capability (conversational assistants, capacity planning, guided investigations) composes existing tools with new skills, requiring few to no new data integrations. That's composability through a shared Transmission layer — standardized interfaces, not a shared filesystem.
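To make the shared structure concrete, here is a minimal sketch of one pipeline shape serving both offense and defense. It is not Meta's implementation: `Agent`, `TOOLS`, `optimization_skill`, and `regression_skill` are hypothetical names, and the tools are stubs standing in for real data integrations.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical types: a tool fetches data, a skill interprets it.
Tool = Callable[[str], str]               # target name -> raw context
Skill = Callable[[Dict[str, str]], str]   # gathered context -> resolution


@dataclass(frozen=True)
class Agent:
    """One pipeline shape: gather context, apply expertise, produce a resolution."""
    tools: Dict[str, Tool]
    skill: Skill

    def run(self, target: str) -> str:
        # Capability: every tool is called through the same interface.
        context = {name: tool(target) for name, tool in self.tools.items()}
        # Judgment: only the skill decides what the context means.
        return self.skill(context)


# Shared capability layer (stubbed; real tools would query live systems).
TOOLS: Dict[str, Tool] = {
    "profile": lambda target: f"profile data for {target}",
    "code_search": lambda target: f"call sites of {target}",
}


def optimization_skill(context: Dict[str, str]) -> str:
    return "candidate optimization based on: " + ", ".join(sorted(context))


def regression_skill(context: Dict[str, str]) -> str:
    return "fix-forward proposal based on: " + ", ".join(sorted(context))


# Offense and defense differ only in the skill they plug in.
offense = Agent(tools=TOOLS, skill=optimization_skill)
defense = Agent(tools=TOOLS, skill=regression_skill)
```

In this sketch, adding a new capability means writing a new skill function and reusing `TOOLS` unchanged, which is the composability property the paragraph describes.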
Third: FBDetect is independent detection, not self-verification. The harness architecture's approach to correctness is the model reviewing its own output. Meta's FBDetect is a separate system — behavioral monitoring that catches regressions as small as 0.005% in noisy production environments, independent of the coding agent that produced the change. That's closer to the independent Control Unit the completeness model requires than anything in the harness articles. It has correlated failure modes of its own (behavioral monitoring can only catch what manifests), but it is genuinely independent of the generator — which the harness's self-verification is not.
Skills encode how to investigate. Nothing declares what must be true.
The skills in Meta's architecture encode procedural knowledge: the reasoning patterns senior engineers developed over years. "Consult the top GraphQL endpoints for endpoint latency regressions." "Look for recent schema changes if the affected function handles serialization." "Check recent configuration deployments that could have caused a step change in resource usage."
These are investigation recipes. They tell the agent how to look for problems that have already manifested. They're encoded expertise about diagnosis and they're good at it.
What's absent is declarative knowledge. Statements about what must be true, regardless of whether a violation has manifested yet. "Serialization functions in this service must not exceed N milliseconds per call." "No single code change may increase fleet power consumption by more than X% without explicit approval." "This function must remain memoized — any change that removes memoization violates the performance contract."
This is the distinction the industry keeps missing: procedural knowledge is reactive. It tells you how to investigate after something went wrong. Declarative knowledge is proactive. It tells you what must hold, and a machine can check it before anything goes wrong.
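As a rough illustration of what "a machine can check it" could look like, the sketch below stores invariants as data and evaluates them against pre-deployment measurements. Everything here is an assumption for illustration: the `UpperBound` type, the metric names, and the limits `2.0` and `0.1`, which stand in for the N and X placeholders in the examples above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UpperBound:
    """Declares that a measured metric must not exceed a limit."""
    metric: str    # hypothetical metric name
    limit: float   # placeholder value, not a real threshold
    reason: str


def violations(invariants: List[UpperBound], measured: Dict[str, float]) -> List[str]:
    """Return one message per violated or unmeasured invariant."""
    problems = []
    for inv in invariants:
        if inv.metric not in measured:
            # A missing measurement is treated as a failure, not a pass.
            problems.append(f"{inv.metric}: no measurement supplied")
        elif measured[inv.metric] > inv.limit:
            problems.append(
                f"{inv.metric}: {measured[inv.metric]} exceeds {inv.limit} ({inv.reason})"
            )
    return problems


INVARIANTS = [
    UpperBound("serialize_ms_per_call", 2.0, "serialization latency contract"),
    UpperBound("fleet_power_delta_pct", 0.1, "needs explicit approval above this"),
]
```

The check needs no knowledge of how to investigate a regression. It only needs the declaration and a measurement, which is what makes it enforceable before anything has gone wrong in production.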
Meta's skills are the best procedural knowledge system published anywhere. The declarative layer — invariants that prevent regressions before deployment rather than catching them after — doesn't exist in the architecture.
FBDetect catches what manifests. Silent violations are invisible.
FBDetect is behavioral monitoring. It observes production time-series data and detects step changes in resource usage. It catches regressions that produce observable signals. At 0.005% sensitivity, it catches very small signals. That's real and impressive.
But the class of problems it structurally cannot see is the class that compounds silently:
Sub-threshold accumulation. A code change degrades performance by 0.004%, just below FBDetect's threshold. Then another. Then another. Each one is invisible individually. After fifty such changes, the fleet is about 0.2% slower (50 × 0.004%), consuming measurable additional power, and no single regression was detected because none crossed the threshold. A declared invariant such as "this function's P99 latency must not exceed N microseconds" is checked against an absolute budget rather than against the previous commit, so it would flag the change that pushes the function past that budget at deployment time, before the drift compounds further in production.
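The arithmetic can be checked with a small simulation. The 0.004% step and 0.005% detection threshold come from the text; the baseline of 1000 µs and the budget of 1001 µs are hypothetical numbers chosen only to make the effect visible, and the per-change comparison is a deliberate simplification, not FBDetect's actual algorithm.

```python
STEP_PCT = 0.004        # per-change slowdown, from the text
DETECT_PCT = 0.005      # per-change detection threshold, from the text
BASELINE_US = 1000.0    # hypothetical baseline P99 latency
BUDGET_US = 1001.0      # hypothetical declared budget (0.1% headroom)


def simulate(changes: int):
    """Apply `changes` small slowdowns and compare the two kinds of check."""
    latency = BASELINE_US
    detected = []         # changes flagged by per-change step detection
    first_breach = None   # first change that violates the declared budget
    for n in range(1, changes + 1):
        new_latency = latency * (1 + STEP_PCT / 100)
        # Relative check: compares each change only with its predecessor.
        if (new_latency - latency) / latency * 100 >= DETECT_PCT:
            detected.append(n)
        # Absolute check: compares each change with the declared budget.
        if first_breach is None and new_latency > BUDGET_US:
            first_breach = n
        latency = new_latency
    total_pct = (latency / BASELINE_US - 1) * 100
    return detected, first_breach, total_pct


detected, first_breach, total_pct = simulate(50)
print(detected, first_breach, round(total_pct, 3))
```

Under these assumed numbers the relative check never fires across all fifty changes, while the absolute budget is first exceeded at change 25, with total drift of roughly 0.2% by change 50. A tighter or looser budget moves that point, but the relative check stays silent either way.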
Structural violations with no immediate behavioral signal. A code change removes memoization from a function that was memoized for performance reasons. The function still works. Tests pass. No regression signal fires immediately because the workload that exercises the hot path hasn't peaked yet. When it does — during the next traffic spike — the regression appears as a sudden step change that looks like a load-driven failure, not a code-driven one. A declared invariant — "this function must remain memoized" — would have caught it at PR time.
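A structural invariant like "this function must remain memoized" can be checked from source alone. The sketch below does this for Python code using the standard `ast` module. It assumes memoization is expressed with the `functools.cache` or `functools.lru_cache` decorators; a real codebase in another language or with custom caching would need its own detector, and `unmemoized` is a hypothetical helper name.

```python
import ast

# functools decorators treated as memoization (an assumption of this sketch).
MEMOIZERS = {"cache", "lru_cache"}


def _decorator_name(node: ast.expr) -> str:
    """Reduce @cache, @functools.cache, and @lru_cache(maxsize=...) to a bare name."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def unmemoized(source: str, required: set) -> set:
    """Return the required function names that are missing or lack a memoizing decorator."""
    memoized = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if any(_decorator_name(d) in MEMOIZERS for d in node.decorator_list):
                memoized.add(node.name)
    return set(required) - memoized
```

Run against the post-change source of a PR, a non-empty result fails the check. No traffic spike is needed to discover the violation, because the property is visible in the syntax tree.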
Configuration-driven performance risk. A configuration change modifies a timeout, a batch size, or a retry policy in a way that degrades throughput under specific conditions. The conditions haven't occurred yet. FBDetect sees nothing because nothing has manifested. A declared invariant on the configuration — "batch size for this pipeline must remain between X and Y" — would catch the violation before deployment.
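A configuration invariant of this kind reduces to a range check on the proposed values. In the sketch below, the keys and bounds in `CONFIG_RANGES` are invented placeholders for the X and Y in the example, and `check_config_change` is a hypothetical name for a pre-deployment hook.

```python
from typing import Dict, List, Tuple

# Hypothetical declared ranges: key -> (minimum, maximum), both inclusive.
CONFIG_RANGES: Dict[str, Tuple[float, float]] = {
    "pipeline.batch_size": (64, 512),
    "pipeline.timeout_ms": (100, 5000),
}


def check_config_change(changed: Dict[str, float]) -> List[str]:
    """Validate only the keys a change touches; undeclared keys pass through."""
    errors = []
    for key, value in changed.items():
        if key not in CONFIG_RANGES:
            continue  # no invariant declared for this key yet
        low, high = CONFIG_RANGES[key]
        if not low <= value <= high:
            errors.append(f"{key}={value} is outside declared range [{low}, {high}]")
    return errors
```

The check runs on the diff, not on production behavior, so it rejects an out-of-range batch size even if the conditions that would expose the throughput problem never occur during rollout.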
These are deducible problems. The answer is computable from the code or configuration alone, without waiting for production behavior. FBDetect is the best behavioral detection system in the industry. It cannot see what hasn't happened yet. Declared invariants can.
The ratchet that would make the system self-improving
Here's the pattern Meta's architecture repeats and doesn't close:
- A code change ships. It introduces a performance regression.
- FBDetect catches it. The AI Regression Solver investigates.
- The solver produces a fix-forward PR. A human reviews it.
- The human approves. The fix deploys. The regression is resolved.
That human, while reviewing the fix-forward PR, just learned something: this class of change causes this class of regression. They now know that removing memoization from serialization functions in this codebase causes latency spikes. Increasing logging verbosity in this service causes CPU regressions. Changing batch sizes in this pipeline degrades throughput.
That knowledge should become a specification — a declared invariant that prevents the same class of regression from ever deploying again. "Serialization functions in this codebase must remain memoized." "Logging verbosity in this service must not exceed level N." "Batch size for this pipeline must stay within range X–Y."
Instead, it stays as a fixed PR. The regression is resolved. The knowledge evaporates. Next month, a different engineer makes the same class of change in a different function, and the cycle repeats: deploy, regress, detect, investigate, fix. The AI Regression Solver makes each cycle faster. It doesn't make the cycles rarer. The same class of error recurs because nobody converted the one-time fix into a permanent rule.
The ratchet: every regression fix becomes a declared invariant. The human catches it once. The machine enforces it forever. Each cycle through the defense pipeline permanently expands the set of things the machine prevents, permanently shrinking the set of things FBDetect has to catch. Over time, the defense pipeline handles fewer regressions, because the specification layer prevents them before deployment.
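One way to picture the ratchet is as an append-only registry of checks, where each approved fix contributes one entry and nothing is ever removed. This is a sketch under stated assumptions: `Ratchet`, `add_from_fix`, and the dictionary-shaped change description are hypothetical, and a real system would persist the registry and review each new invariant.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A check takes a description of a proposed change and returns True if allowed.
Check = Callable[[Dict[str, object]], bool]


@dataclass
class Ratchet:
    """Append-only set of invariants; each approved fix adds one."""
    checks: Dict[str, Check] = field(default_factory=dict)

    def add_from_fix(self, name: str, check: Check) -> None:
        # Deliberately no remove method: the set only grows.
        if name in self.checks:
            raise ValueError(f"invariant {name!r} already declared")
        self.checks[name] = check

    def blocked_by(self, change: Dict[str, object]) -> List[str]:
        """Names of every declared invariant the proposed change violates."""
        return [name for name, check in self.checks.items() if not check(change)]


ratchet = Ratchet()
change = {"logging_level": 5}

# Before the first incident nothing is declared, so the change would ship.
before = ratchet.blocked_by(change)

# After a human approves the fix, the lesson becomes a permanent rule.
ratchet.add_from_fix("logging-verbosity", lambda c: c.get("logging_level", 0) <= 3)
after = ratchet.blocked_by(change)
```

Here `before` is empty and `after` names the new invariant: the same change that shipped once is blocked on every later attempt, without anyone needing to remember the original incident.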
This is the same ratchet every safety-critical domain uses. Aviation doesn't just fix each incident. It converts each incident into a regulation that prevents recurrence. Nuclear doesn't just resolve each event. It converts each event into a technical specification the interlocks enforce. The investigation gets faster with each cycle (Meta has this). The prevention gets broader with each cycle (Meta doesn't have this — yet).
The offense side has the same gap
On offense — finding optimizations — the architecture is: gather context, apply encoded expertise, produce a candidate fix. The agent looks up opportunity metadata, documentation, past examples, specific files and functions, and validation criteria.
The validation criteria are the closest thing to declared invariants in the architecture. But they're per-opportunity, not systemic. An optimization can satisfy its validation criteria and still violate a performance invariant elsewhere, for example by introducing a latency regression in a downstream service while improving CPU usage in the target service. That violation is invisible to per-opportunity validation. It requires a cross-cutting invariant: "end-to-end latency for this user flow must not exceed N milliseconds, regardless of which service is optimized."
The same layer that prevents defensive regressions would also bound offensive optimizations: declared performance invariants that all changes, regression fixes and optimizations alike, must satisfy. An optimization is valid only if it satisfies the local validation criteria and doesn't violate any cross-cutting invariant. Without the invariant layer, every optimization is a local improvement that may be a global regression. The system relies on FBDetect to catch the global regression after it ships, rather than preventing it at PR time.
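The two-part acceptance rule can be sketched as a single function that evaluates the local criterion and every cross-cutting budget together. The function name, the CPU-saving criterion, and the flow names are assumptions for illustration, not details from Meta's article.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Verdict:
    accepted: bool
    reasons: List[str]


def evaluate_optimization(
    cpu_saving_pct: float,
    min_saving_pct: float,
    flow_latency_ms: Dict[str, float],
    flow_budget_ms: Dict[str, float],
) -> Verdict:
    """Accept only if the local criterion AND every cross-cutting budget hold."""
    reasons = []
    # Local, per-opportunity validation.
    if cpu_saving_pct < min_saving_pct:
        reasons.append("local: CPU saving below the opportunity's target")
    # Cross-cutting invariants: apply regardless of which service was changed.
    for flow, budget in flow_budget_ms.items():
        observed = flow_latency_ms.get(flow)
        if observed is None:
            reasons.append(f"cross-cutting: no latency measurement for {flow}")
        elif observed > budget:
            reasons.append(f"cross-cutting: {flow} at {observed} ms exceeds {budget} ms")
    return Verdict(accepted=not reasons, reasons=reasons)
```

An optimization that saves CPU in the target service but pushes a user flow past its end-to-end budget comes back rejected with a cross-cutting reason, which is the case per-opportunity validation alone would miss.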
Completing Meta's architecture
Meta's architecture has:
- Tools — standardized interfaces for data, profiling, code search, documentation. Present and excellent.
- Skills — encoded domain expertise for investigation and resolution. Present and growing.
- Detection — FBDetect, behavioral monitoring at 0.005% sensitivity. Present and industry-leading.
- Resolution — AI Regression Solver producing fix-forward PRs automatically. Present and compressing investigation time by 20x.
What completes it:
- Declared invariants — human-authored performance contracts that code and configuration must satisfy, checked mechanically at PR time, before deployment. Not behavioral. Not reactive. Deterministic and proactive.
- The ratchet — every regression fix, once approved by a human, becomes a declared invariant the machine enforces on every future commit. Each defensive cycle permanently expands the prevention surface.
- Cross-cutting verification — invariants that span services, so an optimization in service A that degrades service B is caught at PR time, not after both changes have deployed to production.
The tools and skills stay exactly as they are; they're excellent. FBDetect stays exactly as it is: it's the behavioral safety net for problems the invariants don't yet cover. The AI Regression Solver stays. It handles the regressions that make it past the invariant layer (Vassilev's proof guarantees some always will). What changes is that a growing layer of declared invariants prevents an increasing share of regressions from deploying in the first place, and each time the solver fixes one, the invariant layer grows.
The end state: a system that prevents more than it detects
Meta's stated goal is "a self-sustaining efficiency engine where AI handles the long tail." The current architecture approaches this from the detection side — catch more, investigate faster, fix automatically. That works, and it scales with better skills and better tools.
The invariant-first version approaches it from the prevention side — declare more, enforce mechanically, prevent before deployment. Each ratchet cycle moves a class of regression from "detected and fixed" to "prevented and never shipped." The long tail doesn't get handled — it gets shortened.
A self-sustaining efficiency engine that only detects and fixes runs forever at the same rate. There are always new regressions to catch. A self-sustaining efficiency engine that also prevents runs at a decreasing rate. Each cycle makes the next one smaller. The first is a treadmill. The second is a ratchet. Meta has the best treadmill. The ratchet is one layer away.
References: Meta, "Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale" (2026). The tools/skills architecture in that piece is the most sophisticated AI agent system published this year. The gap is the layer above it: declared invariants that prevent regressions before deployment, and the ratchet that converts each fix into a permanent rule. If you're on the Meta Capacity Efficiency team and you've already explored this direction, that's the conversation worth having.