Google Agent Smith Writes 25% of Google's Code: What API Teams Should Know

TL;DR

Google’s internal AI coding agent, Agent Smith, now generates over 25% of the company’s new production code. Unlike autocomplete tools such as Copilot, Agent Smith works asynchronously in the background, writing, testing, and iterating on code without real-time human interaction. For API teams, this raises questions about contract stability, test coverage, documentation drift, and review workflows when a quarter of new code is machine-generated.


Introduction

During a March 2026 earnings call, Google CEO Sundar Pichai shared a number that caught the software industry’s attention: AI-generated code now accounts for more than 25% of new code produced at Google.

This is not autocomplete. It is not a developer accepting Copilot suggestions line by line. This is code generated by an AI agent, reviewed by humans, and shipped to production.

The internal tool behind it is called Agent Smith, a nod to the self-replicating antagonist from The Matrix. It reportedly became popular enough across Google’s 180,000+ employees that Google had to throttle access to manage infrastructure strain.

Agent Smith represents a different class of AI coding tool. Copilot and Claude Code usually assist while a developer is actively working. Agent Smith works asynchronously: engineers assign tasks, leave the agent to work, and later review the completed output.

For API teams, this changes the risk model. If autonomous agents can modify endpoints, schemas, validation logic, and tests, you need guardrails that do not depend on every reviewer catching every API-level change manually.

The core questions are practical:

  • How do you keep API contracts stable?
  • How do you detect schema drift?
  • How do you ensure generated tests cover existing behavior?
  • How do you keep docs, mocks, and specs synchronized?
  • How do you review AI-generated API changes without rubber-stamping them?

Apidog’s integrated API lifecycle platform helps keep API design, tests, mocks, and documentation in sync regardless of whether a change comes from a human developer or an AI coding agent.

This article explains what Agent Smith does, how it differs from other AI coding tools, and how API teams can build workflows that are safer for autonomous code generation.

What Agent Smith does

Asynchronous autonomous coding

Agent Smith does not sit in your IDE waiting for inline prompts. It runs in the background.

A typical workflow looks like this:

  1. An engineer describes a task in natural language.
  2. Agent Smith breaks the task into subtasks.
  3. It edits code across multiple files.
  4. It runs tests.
  5. It iterates on failures.
  6. The engineer reviews the completed work.

That makes Agent Smith less like autocomplete and more like a junior developer who picks up a ticket and returns later with a pull request.

For example, instead of asking for a single function implementation, an engineer might assign:

Add user notification preferences to the profile service.
Include persistence, API changes, and tests.

An asynchronous agent may then touch:

  • route handlers
  • request validators
  • database models
  • service logic
  • unit tests
  • integration tests
  • API response types

That breadth is useful, but it also means API behavior can change in places reviewers may not immediately notice.

Google engineers can reportedly delegate tasks and check progress through Google’s internal chat platform, including from mobile devices. The tool can also access relevant employee profiles and internal documentation to pull context from Google’s knowledge base.

Built on Gemini and Antigravity

Agent Smith runs on Google’s Gemini model family and is augmented with retrieval systems that give it access to Google’s internal codebase and documentation.

It is built on top of Antigravity, Google’s agentic coding platform, and extends it with autonomous task decomposition and execution.

The retrieval layer matters. Agent Smith is not generating code in isolation. It can search internal implementations, reference existing patterns, and follow Google-specific conventions.

That context is what makes production-scale output possible. It also shows why your own API workflow needs explicit contracts and validation. The better the agent’s context, the better its output. The more implicit your API rules are, the easier they are to miss.

What “25% of new code” means

Pichai’s figure needs a precise reading.

“25% of new code” refers to code that:

  • is generated by AI, not merely autocompleted
  • passes human code review
  • ships in production systems
  • is measured across Google’s engineering output

It does not mean 25% of Google’s total historical codebase is AI-generated. It means that, at the time of the statement, over 25% of newly produced code was generated by AI.

The direction is still significant: autonomous coding is moving from experiment to production workflow.

How Agent Smith differs from other AI coding tools

The AI coding tool spectrum

| Tool | Mode | Interaction | Scope | Production code? |
| --- | --- | --- | --- | --- |
| GitHub Copilot | Real-time autocomplete | Inline in IDE | Line/function level | After human acceptance |
| Claude Code | Interactive session | Conversational | Multi-file changes | After human review |
| Cursor Agent | Background + interactive | IDE-embedded | Project-level | After human review |
| Agent Smith | Asynchronous autonomous | Task delegation | Full feature implementation | After human review |
| KAIROS (unreleased) | Always-on daemon | Background monitoring | Repository-wide | TBD |

Agent Smith sits near the autonomous end of the spectrum.

The next step would be fully autonomous deployment without human review. No major tool does that yet, and for production API systems, it should not be the default.

Why asynchronous coding changes API review

With real-time AI tools, the developer usually sees each suggestion as it appears. They know what they asked for, why the code changed, and which assumptions were made.

With asynchronous agents, the reviewer sees the result after the work is done.

That creates several API-specific risks:

  • The reviewer may not know why a response format changed.
  • Contract changes may be buried inside implementation diffs.
  • Tests may validate only the new behavior.
  • Documentation, mocks, and SDK types may not be updated.
  • Breaking changes may pass code review if the reviewer focuses only on implementation correctness.

For API teams, the fix is not “review harder.” The fix is to make contracts executable and enforce them in CI.

What breaks when AI writes your API code

API contract drift

An API contract defines what consumers can rely on:

  • endpoints
  • methods
  • request schemas
  • response schemas
  • status codes
  • error formats
  • authentication requirements
  • pagination behavior
  • versioning rules

When humans modify an API, they may remember to update the OpenAPI spec, notify consumers, or version the change. Autonomous agents do not automatically know every coordination step unless your workflow enforces it.

Example scenario:

  1. Agent Smith is assigned: “Add user preferences to the profile endpoint.”
  2. It adds a preferences field to GET /api/users/{id}.
  3. Existing tests pass because they do not reject additional fields.
  4. A frontend TypeScript type does not include preferences.
  5. A mobile client with strict JSON parsing fails on the unexpected field.

The implementation may be reasonable. The tests may pass. The API contract may still be broken.
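
To make that failure mode concrete, here is a minimal sketch of a strict client-side parser rejecting the new field. It uses Ajv, and the schema and field names are assumptions that mirror the scenario above.

// Hypothetical strict client schema for GET /api/users/{id}
const Ajv = require("ajv");
const ajv = new Ajv();

const userSchema = {
  type: "object",
  required: ["id", "name", "email"],
  properties: {
    id: { type: "string" },
    name: { type: "string" },
    email: { type: "string" }
  },
  additionalProperties: false // strict parsing: unknown fields are rejected
};

const validateUser = ajv.compile(userSchema);

// Response body after the agent added "preferences" to the endpoint
const body = {
  id: "123",
  name: "Ada",
  email: "ada@example.com",
  preferences: { theme: "dark" }
};

console.log(validateUser(body)); // false — the unexpected field breaks the strict client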

Test coverage gaps

AI agents can generate tests, but those tests often validate what the agent just built. They may not protect existing behavior.

For APIs, missing coverage often includes:

  • exact response schemas
  • standard error formats
  • authentication edge cases
  • authorization failures
  • rate limiting
  • pagination consistency
  • sorting behavior
  • backward compatibility
  • latency expectations
  • idempotency rules

A generated test like this is useful but incomplete:

it("returns user preferences", async () => {
  const response = await request(app).get("/api/users/123");

  expect(response.status).toBe(200);
  expect(response.body.preferences).toBeDefined();
});

It confirms the new field exists. It does not confirm the full response still matches the contract.
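
For instance, a test for the error path — something generated suites rarely include — might look like the sketch below. The error shape shown here is an assumption; substitute your team's standard error schema.

it("returns the standard error format for a missing user", async () => {
  const response = await request(app).get("/api/users/does-not-exist");

  expect(response.status).toBe(404);
  // Assumed standard error shape: { error, code }
  expect(response.body).toEqual(
    expect.objectContaining({
      error: expect.any(String),
      code: expect.any(String)
    })
  );
});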

Documentation drift

If your API docs are generated directly from OpenAPI, contract updates can flow into documentation automatically.

But many teams still maintain some API docs separately:

  • endpoint descriptions
  • examples
  • onboarding guides
  • migration notes
  • SDK usage snippets
  • consumer-specific caveats

When an agent changes an endpoint, those docs may not be updated unless the task explicitly includes it or your workflow requires it.

Even generated docs need human context. An agent can describe what an endpoint returns. It may not know why the endpoint exists, which consumers depend on it, or what migration constraints apply.

Review fatigue

AI-generated code often looks clean. It is formatted, consistent, and plausible.

That makes review harder, not easier.

Reviewers need to look beyond syntax and ask:

  • Does this match the API contract?
  • Does this preserve consumer expectations?
  • Does this follow versioning rules?
  • Are error responses consistent?
  • Are docs, mocks, tests, and schemas updated together?

If 25% of code is generated by agents, review volume increases. Without automated API checks, teams risk gradually rubber-stamping changes that look fine but break consumers.

How to build agent-proof API workflows

1. Make the API contract the source of truth

Design-first API development is the strongest defense against agent-induced drift.

Without a contract-first workflow:

Code change → Tests pass → Ship → Consumer breakage discovered later

With a contract-first workflow:

OpenAPI spec defines contract → Code must match spec → CI catches drift

Use your OpenAPI spec to define:

  • paths
  • methods
  • parameters
  • request bodies
  • response bodies
  • status codes
  • error schemas
  • auth requirements

Then validate implementation behavior against that spec.
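
As a rough illustration, the sketch below loads the declared 200 response schema for one endpoint from the OpenAPI file and checks a live response against it. The file path, endpoint, and local URL are assumptions; it uses js-yaml and Ajv, and Node 18+ for the global fetch.

const fs = require("fs");
const yaml = require("js-yaml");
const Ajv = require("ajv");

const spec = yaml.load(fs.readFileSync("openapi/openapi.yaml", "utf8"));

// Pull the declared 200 response schema for GET /api/users/{id}
const schema =
  spec.paths["/api/users/{id}"].get.responses["200"].content["application/json"].schema;

// strict: false tolerates OpenAPI-specific keywords and unknown formats in the schema
const ajv = new Ajv({ allErrors: true, strict: false });
const validate = ajv.compile(schema);

async function checkImplementation() {
  const res = await fetch("http://localhost:3000/api/users/123");
  const body = await res.json();

  if (!validate(body)) {
    console.error("Implementation drifted from the spec:", validate.errors);
    process.exit(1);
  }
}

checkImplementation();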

Apidog’s visual API designer lets teams define endpoints, schemas, and response formats before implementation. When Agent Smith or another agent generates code, you validate the output against the spec instead of relying only on generated tests.

2. Use contract tests, not only unit tests

Unit tests validate internal behavior. Contract tests validate the agreement between your service and its consumers.

For AI-generated API changes, contract tests catch issues unit tests often miss.

Example using a strict response schema:

// toMatchSchema is provided by a schema matcher such as jest-json-schema.
const { matchers } = require("jest-json-schema");
expect.extend(matchers);

// This test fails if the response shape changes,
// even if the new shape looks reasonable.
describe("GET /api/users/:id contract", () => {
  it("returns the expected schema", async () => {
    const response = await request(app).get("/api/users/123");

    expect(response.body).toMatchSchema({
      type: "object",
      required: ["id", "name", "email", "created_at"],
      properties: {
        id: { type: "string" },
        name: { type: "string" },
        email: { type: "string", format: "email" },
        created_at: { type: "string", format: "date-time" }
      },
      additionalProperties: false
    });
  });
});

The important line is:

additionalProperties: false

Without it, an agent can add response fields and still pass the test. With it, any schema change must be intentional and reflected in the contract.

Apidog can automate contract testing from your API spec, so responses are validated against the declared schema during manual testing and CI/CD runs.

3. Gate deployments on spec validation

Add API contract validation to your CI/CD pipeline.

A basic pipeline step should fail the build if the running implementation does not match the declared API contract.

Example:

- name: Validate API contract
  run: |
    if ! apidog run --test-scenario-id CONTRACT_TESTS; then
      echo "API contract violation detected. Review API changes."
      exit 1
    fi

This gives you a hard deployment gate for both human-written and AI-generated code.

The goal is simple: no implementation ships unless it matches the spec.

4. Require spec updates for API behavior changes

Create a team rule:

Any PR that changes API behavior must include the corresponding OpenAPI update.

This should apply to:

  • new endpoints
  • removed endpoints
  • new request fields
  • changed request validation
  • new response fields
  • changed response fields
  • changed status codes
  • changed error formats
  • auth or permission changes
  • pagination changes

For AI-generated PRs, the agent must update the spec, or a human must update it before merge.
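
One way to enforce the rule automatically is a small CI script that fails when API code changes without a matching spec change. This is a minimal sketch: the directory paths and base branch are assumptions, and it assumes the base branch has been fetched in CI.

const { execSync } = require("child_process");

// Files changed in this PR relative to the assumed base branch
const changed = execSync("git diff --name-only origin/main...HEAD")
  .toString()
  .trim()
  .split("\n")
  .filter(Boolean);

const apiCodeChanged = changed.some((file) => file.startsWith("src/api/"));
const specChanged = changed.some((file) => file.startsWith("openapi/"));

if (apiCodeChanged && !specChanged) {
  console.error("API code changed without an OpenAPI spec update. Blocking merge.");
  process.exit(1);
}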

In Apidog, spec changes can propagate to:

  • API documentation
  • mock server responses
  • test assertions
  • client SDK types

That reduces the chance that code, docs, tests, and mocks drift apart.

5. Add API-specific CI checks

General test suites are not enough. Add checks that focus on API compatibility.

Useful CI checks include:

  • OpenAPI linting
  • OpenAPI diff checks
  • Contract tests
  • Backward compatibility checks
  • Mock validation
  • Generated SDK type checks
  • Error schema validation

For example, your CI workflow could include:

name: API validation

on:
  pull_request:
    paths:
      - "src/api/**"
      - "openapi/**"
      - "tests/contract/**"

jobs:
  validate-api:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Lint OpenAPI spec
        run: |
          npx @redocly/cli lint openapi/openapi.yaml

      - name: Run contract tests
        run: |
          npm run test:contract

      - name: Validate API scenarios
        run: |
          apidog run --test-scenario-id CONTRACT_TESTS

The exact commands depend on your stack. The principle is what matters: API behavior should be validated separately from implementation logic.
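
For example, a very small backward-compatibility check could compare the required response fields declared in the base spec against the PR's spec. Dedicated diff tools cover far more cases; this sketch only illustrates the idea, and the file names and endpoint are assumptions.

const fs = require("fs");
const yaml = require("js-yaml");

// Required fields for the 200 response of GET on the given path
function requiredFields(spec, path) {
  const response = spec.paths?.[path]?.get?.responses?.["200"];
  return response?.content?.["application/json"]?.schema?.required ?? [];
}

const baseSpec = yaml.load(fs.readFileSync("openapi/openapi.base.yaml", "utf8"));
const headSpec = yaml.load(fs.readFileSync("openapi/openapi.yaml", "utf8"));

const path = "/api/users/{id}";
const removed = requiredFields(baseSpec, path).filter(
  (field) => !requiredFields(headSpec, path).includes(field)
);

if (removed.length > 0) {
  console.error(`Breaking change on ${path}: required fields removed: ${removed.join(", ")}`);
  process.exit(1);
}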

6. Monitor API behavior in production

Pre-production checks reduce risk, but production monitoring still matters.

Track signals such as:

  • responses that do not match the declared schema
  • unexpected fields appearing in responses
  • missing required fields
  • error rate changes
  • status code distribution changes
  • latency changes
  • new endpoint traffic patterns
  • increased validation failures
  • consumer-specific failures

For example, you can log schema validation failures at the edge:

// "logger" is assumed to be your application logger (e.g. pino or winston).
const Ajv = require("ajv");
const ajv = new Ajv({ allErrors: true });

function validateResponse(schema, body, route) {
  const valid = ajv.validate(schema, body);

  if (!valid) {
    logger.warn({
      route,
      errors: ajv.errors,
      message: "API response schema violation"
    });
  }

  return valid;
}

Do not rely on consumers to discover contract issues first.
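
If your service uses Express, one way to wire in the check is log-only response middleware. This is a sketch under that assumption; schemaForPath is a hypothetical lookup from request path to declared response schema, and validateResponse is the function above.

function schemaMonitor(schemaForPath) {
  return (req, res, next) => {
    const originalJson = res.json.bind(res);

    res.json = (body) => {
      const schema = schemaForPath(req.path);
      if (schema) {
        // Log-only: record violations without blocking the response
        validateResponse(schema, body, req.path);
      }
      return originalJson(body);
    };

    next();
  };
}

// app.use(schemaMonitor(lookUpResponseSchema));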

7. Separate API review from code review

Code review asks:

Does this implementation work?

API review asks:

Does this change affect consumers?

For AI-generated API changes, use a dedicated checklist.

Example API review checklist:

## API review checklist

- [ ] Does this PR add, remove, or modify an endpoint?
- [ ] Is the OpenAPI spec updated?
- [ ] Are request and response schemas accurate?
- [ ] Are status codes documented?
- [ ] Are error responses consistent with the existing error format?
- [ ] Are backward-incompatible changes versioned?
- [ ] Are contract tests updated?
- [ ] Are mocks updated?
- [ ] Are API docs and examples updated?
- [ ] Have downstream consumers been notified if needed?

Put this checklist in your pull request template so it applies to both human and AI-generated changes.

8. Give agents better API instructions

If your team uses autonomous coding tools, encode API rules in the repository.

Examples:

/api-guidelines.md
/openapi/openapi.yaml
/docs/api-review-checklist.md
/tests/contract/
/examples/api-responses/

Your agent instructions should be explicit:

When modifying API behavior:

1. Update the OpenAPI spec.
2. Add or update contract tests.
3. Preserve backward compatibility unless the task explicitly requests a breaking change.
4. Do not add undocumented response fields.
5. Use the standard error schema.
6. Update examples and mocks.
7. Mention API behavior changes in the PR summary.

Autonomous agents work better when conventions are written down. If your API rules live only in senior engineers’ heads, agents will miss them.

The trajectory: where autonomous coding is heading

Agent Smith today vs. tomorrow

Agent Smith at 25% is likely the starting point, not the endpoint.

Sergey Brin called AI agents a “big focus” during a March 2026 sales town hall. As models improve, access restrictions loosen, and engineering workflows adapt, the percentage of AI-generated code is likely to grow.

Other companies are moving in similar directions:

  • Claude Code’s KAIROS, reportedly leaked in source code, suggests an always-on daemon with GitHub webhook subscriptions and background workers.
  • GitHub Copilot Agent Mode supports multi-step coding tasks with autonomous file editing.
  • Amazon’s CodeWhisperer has been expanding from autocomplete toward more agentic workflows.

The trend is clear: AI coding tools are moving from assistant to autonomous contributor to background infrastructure.

For API teams, the question is not whether AI will touch your API code. It is how safely your workflow handles it.

What API teams should prepare for now

Design-first development is becoming more important. When agents write implementation code, the API spec becomes the stable artifact reviewers and automation can trust.

Contract testing is also becoming mandatory. Unit tests are useful, but they do not fully encode consumer expectations. Contract tests make those expectations explicit.

Integrated tooling matters too. Disconnected tools create drift:

  • Separate API client
  • Separate test runner
  • Separate mock server
  • Separate docs generator
  • Separate SDK generator

Each disconnected artifact is another thing an AI agent may forget to update.

Platforms like Apidog help keep specs, tests, mocks, and docs synchronized so API changes are easier to validate and review.

FAQ

What is Google Agent Smith?

Agent Smith is Google’s internal AI coding agent built on the Gemini model family and the Antigravity platform. It works asynchronously in the background: engineers assign tasks, and Agent Smith writes, tests, and iterates on code without real-time human interaction. It generated over 25% of Google’s new production code as of March 2026.

Is Agent Smith available outside Google?

No. Agent Smith is an internal tool restricted to Google employees. Google has not announced plans for a public release. The technology is similar to Copilot Agent Mode and Claude Code, but it is more deeply integrated with Google’s internal codebase and documentation systems.

Does AI-generated code break API contracts?

It can. AI agents write code that passes tests, but tests may not cover all parts of your API contract. Schema changes, new response fields, different error formats, and behavioral changes can pass tests while breaking downstream consumers. Contract testing and design-first development reduce this risk.

Should API teams worry about Agent Smith?

Not about Agent Smith specifically, since it is Google-internal. But API teams should pay attention to the trend it represents. Similar autonomous coding tools are reaching normal development workflows. Preparing now with design-first APIs, contract testing, and integrated tooling makes adoption safer.

How do I prevent AI agents from breaking my APIs?

Use the OpenAPI spec as the source of truth. Add strict contract tests, including additionalProperties: false where appropriate. Gate deployments on spec validation. Require spec updates for API behavior changes. Use tooling such as Apidog to synchronize specs, tests, mocks, and documentation.

What is the difference between AI-assisted and AI-generated code?

AI-assisted code is produced with real-time human oversight, such as Copilot suggestions or interactive Claude Code sessions. The developer sees and approves changes as they happen.

AI-generated code, in the Agent Smith model, is produced asynchronously. The developer assigns a task and reviews completed work later. That separation changes review dynamics and increases the need for automated validation.

Will AI agents replace API developers?

No. Agent Smith still requires human task definition, code review, and deployment approval. A March 2026 MIT study confirmed that AI augments developer productivity but does not replace the judgment, context awareness, and architectural thinking that humans provide. The role shifts toward defining tasks, reviewing output, and maintaining system coherence.

Key takeaways

  • Google’s Agent Smith generates over 25% of new production code through asynchronous autonomous operation.
  • This marks a shift from AI-assisted coding to AI-generated code.
  • API contract drift is one of the biggest risks when autonomous agents modify endpoints and schemas.
  • Design-first development with OpenAPI as the source of truth helps prevent contract breakage.
  • Contract testing catches API changes that unit tests often miss.
  • Deployment gates should validate implementation behavior against the declared spec.
  • API review should be separate from general code review.
  • Integrated platforms like Apidog help synchronize specs, tests, mocks, and docs.
  • Autonomous coding is accelerating, so API teams should build agent-proof workflows now.

Agent Smith at 25% is the beginning. Teams that build reliable API contracts, automated validation, and synchronized API workflows today will be better prepared to use autonomous coding tools safely tomorrow.
