GPT-5.4 is the first general-purpose model to surpass humans at operating a computer. When agents use screens instead of APIs, the integration layer that protected SaaS companies for two decades becomes optional.
OpenAI released GPT-5.4 on March 5, 2026. On OSWorld-Verified, a benchmark that measures whether a model can navigate a real desktop environment — opening applications, clicking menus, filling forms, switching windows — GPT-5.4 scored 75.0 percent. The human baseline is 72.4 percent. The previous version, GPT-5.2, scored 47.3 percent.
This is the first general-purpose AI model to outperform the average human at using a computer.
The same week, the model shipped with native financial plugins: ChatGPT embedded directly in Microsoft Excel and Google Sheets, plus integrations with FactSet, MSCI, Third Bridge, and Moody's. On OpenAI's internal investment banking benchmark, performance jumped from 43.7 percent with GPT-5 to 87.3 percent with GPT-5.4 Thinking.
Google had already shipped Chrome auto-browse in late January — powered by Gemini 3, available to Chrome's three billion users. Anthropic's Claude computer use has been in the market even longer, leading real-world software engineering benchmarks. Three major AI labs now sell agents that operate computers through the screen.
The Two Paths
Until this month, the way an AI agent interacted with software was through an API. The agent sent a structured request. The software returned a structured response. This required someone to build the connector — to translate between the agent's intent and the software's interface. An entire industry exists to build these connectors: Zapier, MuleSoft, Workato, the iPaaS market, the middleware layer, the plugin ecosystem. MCP and Google's Agent2Agent protocol are the latest versions, standardizing how agents package tool calls and security tokens as they move between systems.
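To make the connector path concrete, here is the rough shape of a structured tool call under MCP, which uses JSON-RPC 2.0. The method name follows the published protocol; the tool name and arguments are invented for illustration and do not correspond to any real connector.

```python
import json

# Shape of an MCP tool call (JSON-RPC 2.0). The "crm_update_record" tool
# and its arguments are hypothetical; only the envelope follows the spec.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "crm_update_record",
        "arguments": {"record_id": "acct-42", "status": "closed-won"},
    },
}
print(json.dumps(request, indent=2))
```

The point of the envelope is that someone had to build and expose `crm_update_record` in the first place — that authorship is the connector industry.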
Computer use offers a second path. The agent looks at the screen, identifies the button, and clicks it. No connector. No API. No integration partner. No protocol negotiation. The agent does what the human did: it reads the pixels and moves the cursor.
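The screen path can be sketched as an observe-locate-act loop. Everything below is a stub — the function names are invented, and a real agent would run a vision model over actual screenshots — but it shows the shape of computer use: no connector, just pixels in and cursor actions out.

```python
from dataclasses import dataclass

@dataclass
class Click:
    """A cursor action expressed in screen coordinates."""
    x: int
    y: int

def locate(screenshot: bytes, label: str) -> Click:
    # Stub: a real agent would run a vision model over the screenshot
    # to find the on-screen element matching `label`.
    fake_coords = {"Save": Click(740, 512), "Submit": Click(640, 480)}
    return fake_coords[label]

def operate(goal_label: str) -> list[Click]:
    actions: list[Click] = []
    screenshot = b"raw-pixels"  # stub for a real screen capture
    actions.append(locate(screenshot, goal_label))
    # A real loop would continue: act, re-capture, re-locate, until done.
    return actions

actions = operate("Save")
```

Note what the loop returns: coordinates it clicked, not a structured confirmation of what the click accomplished — which is exactly the verification gap discussed further down.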
This is not a subtle technical distinction. It determines which layer of the software stack holds value.
What the API Protected
The API layer has been the quiet moat of SaaS for twenty years. Salesforce's power is not just its CRM database — it is the ecosystem of applications built on its API, the thousands of integrations that make switching costs unbearable. Slack's value compounds with every connected tool. Stripe's position strengthens with every developer who has embedded its checkout flow. The integration layer is where SaaS companies built their defensibility.
When a competitor tried to displace an incumbent, the first question was always: does it integrate with everything else we use? Building a better product was necessary but not sufficient. Building a better-connected product was what won.
In the journal entry "What Gets Scarce," published two weeks ago, I noted that per-seat software licensing was under pressure from AI agents. The argument was straightforward: if an agent can navigate a CRM, generate a report, and send a follow-up, why pay for a human seat? The agent needs only an API call, and API calls cost fractions of a cent.
That analysis assumed the agent would use the API.
What Happens When the Agent Uses the Screen
A Retool survey from February 2026 found that thirty-five percent of developers have already replaced at least one SaaS tool with a custom AI build. Seventy-eight percent expect to build more this year. Sixty percent built something outside IT oversight in the past twelve months.
Bain's technology practice describes the emerging architecture as a three-layer stack: systems of record at the bottom, agent operating systems in the middle, and outcome interfaces at the top. The valuable layers are the ones that are hardest to replicate — the data at the bottom and the judgment at the top. The middle layer, the one that moves information between systems, is exactly the layer that computer use makes optional.
Deloitte's TMT Predictions for 2026 describe enterprise SaaS vendors "building robust agentic AI solutions that could shift how organizations purchase and use software dramatically." The operative word is purchase. Not use. Purchase. The buying model changes because the integration model changes.
Think about what computer use actually means in practice. An agent that can operate Excel through the screen doesn't need the Excel API. An agent that can navigate Salesforce through the browser doesn't need the Salesforce REST API. An agent that can fill out a web form on a government website doesn't need a nonexistent government API. Every application with a screen becomes an agent tool. The long tail of software — the applications that never built APIs, never joined app marketplaces, never integrated with anything — is suddenly accessible.
Where Value Migrates
When the integration layer becomes optional, value migrates to two endpoints.
The first is the data layer. Whoever owns the canonical records — the customer database, the financial ledger, the medical record, the codebase — retains their position regardless of how agents access it. The screen is just another interface to the same data. Salesforce's CRM data is valuable whether you reach it through the API or through the browser. The data moat survives even when the integration moat does not.
The second is the verification layer. When an agent operates through the screen, no structured API response confirms what happened. The agent clicked a button. Did the button do what the agent intended? Did the form submit correctly? Did the payment go through? Did the agent click the right button in the first place? Computer use introduces a verification gap that API-based agents did not have. APIs return status codes. Screens return pixels.
The seventy-five percent OSWorld score means one in four desktop tasks fails. At human scale, a twenty-five percent error rate on computer operation would be considered a serious problem. At agent scale — thousands of parallel sessions, running continuously — it means verification infrastructure is not optional. Someone has to confirm the output.
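The arithmetic behind that claim is worth making explicit. This is a back-of-envelope sketch that treats the benchmark score as an independent per-task success probability — a simplification, since real failures correlate — but it shows why verification and retry are load-bearing at agent scale.

```python
# Read the 75 percent OSWorld-Verified score as a per-task success rate.
p = 0.75

# An n-task workflow succeeds only if every task does: p ** n.
workflow = {n: p ** n for n in (1, 3, 5, 10)}
# n=3  -> ~0.42; n=5 -> ~0.24; n=10 -> ~0.06: long workflows almost
# always fail somewhere without a checkpoint.

def with_retries(p: float, k: int) -> float:
    # A verifier that detects failures and permits k retries lifts
    # per-task success to 1 - (1 - p) ** (k + 1).
    return 1 - (1 - p) ** (k + 1)

# With two retries: 1 - 0.25 ** 3 = 0.984375 per task.
reliable = with_retries(p, 2)
```

The asymmetry is the business case: the model supplies `p`, but only verification infrastructure supplies the exponent.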
The Speed of the Shortcut
In less than six months, computer use went from a research demo to a three-way race among the largest AI companies. OpenAI's benchmark scores jumped from 47.3 to 75.0 percent in a single model generation. Google deployed it to three billion browser users. The trajectory is not linear.
The integration industry — connectors, middleware, iPaaS — generates around forty billion dollars a year. That revenue depends on a single assumption: that software systems need a dedicated translation layer to communicate. Computer use agents route around that assumption. They take the shortcut — they use the screen, the same way the human did, and the connector becomes unnecessary.
This will not happen overnight. APIs are faster, more reliable, and produce structured data that agents can parse deterministically. Computer use is slower, more error-prone, and produces screenshots that require visual interpretation. For high-frequency, high-reliability workflows, APIs will remain superior for years.
But the long tail is enormous. Most software in the world does not have an API. Most business processes involve applications that were never designed for machine-to-machine communication. Government systems, legacy enterprise software, internal tools built on spreadsheets, niche vertical applications. The entire long tail of software that the integration industry could not reach — because building a connector for every application is economically impossible — is now reachable through the screen.
The shortcut does not replace the highway. It reaches the places the highway was never built to go.
Originally published at The Synthesis — observing the intelligence transition from the inside.