- Book: Prompt Engineering Pocket Guide
- Also by me: AI Agents Pocket Guide
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
On April 15, 2026, Evan Spiegel sent a memo (per the Snap Newsroom statement and TechCrunch coverage linked below). About 1,000 people lost their jobs. Another 300 open roles were closed before anyone filled them. Counting both, that's roughly a quarter of the headcount Snap had planned to operate with this year. In the same memo, Spiegel told staff that AI now generates more than 65% of new code at the company. Snap's stock jumped 7% on the news.
Read the Snap Newsroom statement and the TechCrunch coverage and you get the corporate framing: a "new way of working," smaller teams shipping more, $500 million off the annualized cost base by H2 2026. The Variety write-up and the Deadline piece round out the on-record numbers.
The number that should make you sit forward is the second one. 65% of new code, written by a model. Two of every three lines. That is a real number from a real company shipping a product hundreds of millions of people use. It is also a number that gets badly misread by everyone who sees it for the first time.
What "65% AI-written" actually means
Two-thirds of the keystrokes are gone. The engineering work is not.
Those are different bills.
If you've spent any real time working with Claude Code, Cursor, Copilot, or one of the in-house variants, you already know the shape. The model fills in the obvious. CRUD endpoints. Test scaffolding. SQL plumbing. The third migration that looks like the second migration. The handler that wraps a third-party SDK in your project's idioms. The model is excellent at the part of the job that used to feel like typing.
What it is not excellent at is the rest of the job. It does not know which third-party SDK to wrap. It does not catch a migration that is unsafe under load. The test it wrote passes for the wrong reason. The endpoint is correct in isolation and catastrophic next to the rate limiter that ships next sprint.
Snap is telling you AI did 65% of the typing. The company decided it could ship the same product with fewer typists. That is not the same claim as 65% of the engineering.
The 35% that's left is the part that pays. That 35% is what Snap kept hiring for, even as the layoff memo went out. Look at any tech-layoff round in 2026 and you'll find the same shape. Headcount down. Senior, system-savvy, evals-and-observability-fluent headcount up.
The defensive move is not "code faster"
A reasonable engineer reads a memo like Spiegel's and feels the urge to type faster, with better autocomplete, a second IDE, three more keyboard shortcuts. None of that wins the round. You are competing with a model that types at 200 tokens a second and never gets tired.
The defensive move is to become the person who wires the system around AI-generated code so it doesn't blow up. There are four jobs inside that, and senior engineers who do them well are the ones who survived the round at Snap and will survive the next round somewhere else.
Evals. Not unit tests. Behavioral evals that check whether the model still does the thing you wanted on the last 200 production prompts. Run them on every prompt change and every model upgrade. Every team I see ship LLM features and then panic six weeks later is a team that did not write the eval rig until the panic.
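Here is the shape, as a minimal sketch rather than a framework. It assumes you've dumped recent production inputs to a prompts.jsonl file, one JSON object per line with an "input" string and a machine-checkable "expect" block; the file name, schema, and 95% floor are all illustrative choices, not a standard. Run something like this in CI on every prompt change and model upgrade.

import json

from openai import OpenAI

client = OpenAI()


def run_model(prompt: str) -> str:
    # The call under test: same model and prompt path production uses.
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return r.choices[0].message.content


def passes(output: str, expect: dict) -> bool:
    # Cheap deterministic checks first; save model-graded rubrics
    # for the cases string matching can't cover.
    return all(s in output for s in expect.get("must_contain", []))


def run_evals(path: str = "prompts.jsonl", floor: float = 0.95) -> None:
    with open(path) as f:
        cases = [json.loads(line) for line in f]
    failed = [c for c in cases if not passes(run_model(c["input"]), c["expect"])]
    rate = 1 - len(failed) / len(cases)
    print(f"pass rate {rate:.1%} across {len(cases)} cases")
    assert rate >= floor, f"{len(failed)} failures, first: {failed[0]['input'][:80]!r}"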
Observability. Token counts, latency percentiles, tool-call traces, retrieval recall. If a model-driven feature degrades on Tuesday at 3pm, you should know which model, which prompt version, which retrieval index, and which downstream tool. Most teams have logs. Few have spans you can actually pivot on.
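If you're starting from plain logs, one span around the model call gets you most of the way. A sketch using OpenTelemetry's Python API; the attribute names are my own convention rather than a standard, and you still need a tracer provider and exporter configured elsewhere for the spans to go anywhere.

from opentelemetry import trace

tracer = trace.get_tracer("answer-service")


def traced_completion(client, question: str, prompt_version: str) -> str:
    # One span per model call, tagged with everything you'd want
    # to pivot on when Tuesday at 3pm goes sideways.
    with tracer.start_as_current_span("llm.chat") as span:
        span.set_attribute("llm.model", "gpt-4o-mini")
        span.set_attribute("llm.prompt_version", prompt_version)
        r = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
        )
        span.set_attribute("llm.input_tokens", r.usage.prompt_tokens)
        span.set_attribute("llm.output_tokens", r.usage.completion_tokens)
        return r.choices[0].message.content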
Idempotency. Models retry. Agents retry. Users retry. If your AI handler isn't safe under duplicate calls, you are one network blip from charging a card twice or sending two confirmation emails.
Review. The model wrote it; you own it. The reviewer's job is to ask the questions the model didn't ask itself: what happens when this argument is null, what happens at scale, what happens when the underlying API changes its rate limit.
A small "ship + measure" pattern
Here is the kind of code an AI assistant produces in five seconds. It works. It will pass review on a tired Friday afternoon.
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI()


class Req(BaseModel):
    user_id: str
    question: str


@app.post("/answer")
def answer(req: Req):
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": req.question}],
    )
    return {"answer": r.choices[0].message.content}
This is the code that gets you fired six months from now. Not because it crashes. Because nobody knows what it's doing in production. Here is what a senior engineer adds on top, without changing the contract.
import hashlib
import logging
import time
import uuid
from contextvars import ContextVar

from fastapi import FastAPI, HTTPException, Request
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI()
log = logging.getLogger("answer")
request_id: ContextVar[str] = ContextVar("request_id", default="-")
_idempotency_cache: dict[str, dict] = {}  # in-memory; replace with Redis/Postgres in prod


class Req(BaseModel):
    user_id: str
    question: str


def _key(user_id: str, question: str) -> str:
    raw = f"{user_id}:{question}".encode()
    return hashlib.sha256(raw).hexdigest()


@app.middleware("http")
async def trace(req: Request, call_next):
    rid = req.headers.get("x-request-id", str(uuid.uuid4()))
    request_id.set(rid)
    start = time.monotonic()
    resp = await call_next(req)
    log.info(
        "http",
        extra={
            "rid": rid,
            "path": req.url.path,
            "ms": int((time.monotonic() - start) * 1000),
            "status": resp.status_code,
        },
    )
    return resp


@app.post("/answer")
def answer(req: Req):
    key = _key(req.user_id, req.question)
    if key in _idempotency_cache:
        log.info("cache_hit", extra={"rid": request_id.get()})
        return _idempotency_cache[key]
    t0 = time.monotonic()
    try:
        r = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": req.question}],
            timeout=15,
        )
    except Exception as e:
        log.exception("llm_error", extra={"rid": request_id.get()})
        raise HTTPException(status_code=502, detail="upstream") from e
    out = {
        "answer": r.choices[0].message.content,
        "model": r.model,
        "input_tokens": r.usage.prompt_tokens,
        "output_tokens": r.usage.completion_tokens,
        "latency_ms": int((time.monotonic() - t0) * 1000),
        "rid": request_id.get(),
    }
    _idempotency_cache[key] = out
    log.info("llm_ok", extra=out)
    return out
Same shape and same contract, with a very different production lifespan.
A request id you can grep across services. A latency you can chart. Token counts you can budget against. An idempotency key so a retry doesn't double-bill the model. A timeout so a slow upstream doesn't pile up workers. An exception path that returns a clean 502 instead of bleeding the OpenAI traceback to the client. None of this is rocket science. None of it gets generated by default. It is also the kind of work that survives the next round of layoffs, because no model does it on its own.
Replace the in-memory dict with Redis or a row in Postgres for real idempotency. Send the structured logs to whatever pipeline you have. Add a span around the LLM call. The pattern stays the same: the AI wrote the happy path, you wired the boundary.
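For the Redis version, the one subtlety is writing the cached result with NX so two racing retries can't clobber each other, and with a TTL so the keyspace can't grow forever. A sketch using redis-py; the key prefix and one-hour TTL are choices, not requirements.

import json

import redis

rdb = redis.Redis()  # assumes a reachable Redis; configure host/auth for real use


def get_cached(key: str) -> dict | None:
    raw = rdb.get(f"idem:{key}")
    return json.loads(raw) if raw else None


def set_cached(key: str, value: dict, ttl_s: int = 3600) -> None:
    # nx=True: first writer wins, so a concurrent duplicate can't overwrite.
    # ex=ttl_s: entries expire instead of accumulating forever.
    rdb.set(f"idem:{key}", json.dumps(value), nx=True, ex=ttl_s)

Swap these in for the dict lookups in the handler above and the contract stays the same.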
What the Snap memo really tells you
Spiegel's memo is the kind of document that makes engineers update their resumes. That is a fine instinct. While you're at it, update what's on the resume.
If your last bullet is "shipped X feature using React and Node," that is a 2022 bullet. The model can do that. The 2026 bullet reads something like "owned the eval pipeline that caught a 9-point regression on prompt v23 before it hit production," or "instrumented the agent's tool-call layer with distributed tracing and cut p95 meaningfully," or "wrote the idempotency layer that prevents duplicate model calls during retry storms." Those are not flashy bullets. They are the bullets that survive.
Snap got a stock bump for cutting 1,000 people. The market read the memo and decided AI productivity is real and the headcount math is real. The market is not wrong about either. What the market does not show on the ticker is which 1,000, and which engineers get the next email.
The way to not be on that list is to do the part of the job the model cannot. Wire the evals and own the spans. Make the handler safe under retry. Review like the model is a junior who is fast, confident, and wrong, call it 8% of the time, in ways that cost money.
That 8% is your job security. Spiegel is not lying about the 65%. He is telling you exactly where the line is.
If this was useful
The Prompt Engineering Pocket Guide covers the prompt-side discipline that makes the 65% land where you want it: patterns for evals, structured outputs, refusal handling, and the prompt versioning that keeps regressions out of production. The AI Agents Pocket Guide goes one layer up: orchestration, tool-call safety, retries, and the boundary work that keeps an agent from doing something expensive twice.