
Felix Huttmann


How Kubernetes-Inspired API Design Helps LLMs

A recent blog post from Vercel on MCP API design triggered me.

According to the blog post, it is not a great idea to simply wrap your normal API with an MCP server. Instead, the developer should build new MCP operations around workflows, each bundling what would otherwise be multiple sequential API calls.


Vercel's Workflow Proposal

They give the following example for an implementation of a workflow-style MCP operation called deploy_project:

server.tool(
  "deploy_project",
  "Deploy a project with environment variables and custom domain",
  {
    repo_url: z.string(),
    domain: z.string(),
    environment_variables: z.record(z.string()),
    branch: z.string().default("main")
  },
  async ({ repo_url, domain, environment_variables, branch }) => {
    // Handle the complete workflow internally
    const project = await createProject(repo_url, branch);
    await addEnvironmentVariables(project.id, environment_variables);
    const deployment = await deployProject(project.id);
    await addCustomDomain(project.id, domain);

    return {
      content: [{
        type: "text",
        text: `Project deployed successfully at ${domain}. Build completed in ${deployment.duration}s.`
      }]
    };
  }
);

This bundles what would otherwise be four individual operations:

  • create_project(name, repo)
  • add_environment_variables(project_id, variables)
  • create_deployment(project_id, branch)
  • add_domain(project_id, domain)

This is not obviously a bad idea.
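
For comparison, the unbundled variant would expose each of these steps as its own tool. A minimal sketch of the first one, reusing the same registration style and the hypothetical createProject helper from the workflow above:

server.tool(
  "create_project",
  "Create a new project from a Git repository",
  {
    name: z.string(),
    repo_url: z.string(),
    branch: z.string().default("main")
  },
  async ({ name, repo_url, branch }) => {
    // Same hypothetical helper as in the bundled workflow above
    const project = await createProject(repo_url, branch);

    return {
      content: [{
        type: "text",
        text: `Created project ${name} with id ${project.id}.`
      }]
    };
  }
);

Note the server-generated project.id in the response: with granular tools, the agent has to carry that ID into every follow-up call.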


Why Workflows Help

The benefit of bundling multiple API calls into workflows comes from several sources:

  • Fewer LLM re-invocations: Every time an LLM agent calls a tool and receives the result, another LLM invocation follows. Prompt prefix caching reduces the cost (the conversation history stays the same), but cached tokens still cost money. For Claude 4 Sonnet, where cached tokens cost 10% of uncached ones, a complex workflow with on the order of ten or more sequential tool invocations can end up with its cost dominated by cached tokens (see the back-of-the-envelope sketch after this list).

  • Lower chance of errors: The LLM does not have to take the result from one tool call and use it as a parameter to a subsequent tool call. Servers often generate random IDs upon entity creation, and these need to be used in follow-up calls. LLM agents can hallucinate IDs or lose track of their task mid-workflow. Bundling the workflow reduces this surface area for error.
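
To make the first point concrete, here is a back-of-the-envelope calculation. The prefix size, number of tool calls, and prices are illustrative assumptions, not measurements:

// Rough cost of re-reading a cached prompt prefix on every tool round trip
const promptPrefixTokens = 30_000;  // system prompt + tool definitions + conversation so far (assumed)
const toolRoundTrips = 15;          // sequential tool calls in one workflow (assumed)
const inputPricePerMTok = 3.0;      // USD per million uncached input tokens (example rate)
const cacheReadDiscount = 0.1;      // cached tokens billed at ~10% of the uncached rate

// Each round trip re-reads at least the cached prefix once
const cachedTokensRead = promptPrefixTokens * toolRoundTrips;  // 450,000 tokens
const cachedCostUsd = (cachedTokensRead / 1_000_000) * inputPricePerMTok * cacheReadDiscount;

console.log(cachedCostUsd.toFixed(3));  // ~0.135 USD per run, before any output or uncached tokens

Bundling the same workflow into a single tool call reads that prefix roughly once instead of fifteen times.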


The Downside

But there are problems with workflows, too:

  • Loss of flexibility: If only workflows are exposed as MCP tools, the individual operations (create_project, add_environment_variables, etc.) can no longer be run in isolation. For example, the agent might want to add more environment variables or domains to a pre-existing project.

  • Token overhead: If the new workflow-based MCP tools are offered in addition to the low-level MCP tools, they increase the number of tokens the MCP tool definitions consume.

  • Extra design effort: Deciding what makes a “good workflow” is nontrivial. And if a workflow is good, then why expose it only via MCP, and not to other API users? Humans might also want that simpler API.


Kubernetes Shows the Way

For an example of what I would regard as a great "workflow," take a look at the Kubernetes deployment entity.

Suppose you are tasked with deploying a new version of your application without downtime, using only the low-level API calls to create and delete pods. You would have to:

  • Create new pods
  • Wait for them to start up and ensure they're healthy
  • Stop the old pods

The k8s deployment entity can take care of all of this for you! You create the deployment entity, and the k8s-internal controllers will perform the complex multi-step tasks needed to achieve the final goal of rolling out a new version without downtime.
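
For reference, a minimal Deployment that requests exactly that, written here as a plain TypeScript object rather than the usual YAML manifest; the name, image, and replica count are placeholders:

// A Kubernetes Deployment expressed as a plain object (the YAML form is equivalent).
// Declaring this is all the client does; the Deployment controller performs the rollout.
const deployment = {
  apiVersion: "apps/v1",
  kind: "Deployment",
  metadata: { name: "my-app" },
  spec: {
    replicas: 3,
    // Never take a pod away before its replacement is up
    strategy: { type: "RollingUpdate", rollingUpdate: { maxUnavailable: 0, maxSurge: 1 } },
    selector: { matchLabels: { app: "my-app" } },
    template: {
      metadata: { labels: { app: "my-app" } },
      spec: { containers: [{ name: "my-app", image: "my-app:v2" }] }
    }
  }
};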

At the same time, Kubernetes does not take away low-level control. You can still delete or create pods in an ad-hoc manner if needed.


Names, Not Random IDs

The Kubernetes API provides inspiration for another thing: avoid requiring the client to deal with server-generated random identifiers.

In Kubernetes, related entities instead refer to each other by client-chosen names or labels. As a result, you can typically create a multi-entity config in the k8s control plane in one go and let Kubernetes figure out how to wire the entities together.


Bringing It Together

Applying these ideas to Vercel's deploy_project example from above would look like this:

apiVersion: vercel.com/v1
kind: project
metadata:
  name: $project_name
spec:
  repo: $project_repo
  environment_variables: $env_variables
---
apiVersion: vercel.com/v1
kind: deployment
metadata:
  name: $deployment_name
spec:
  branch: $branch
  project: $project_name
---
apiVersion: vercel.com/v1
kind: domain
metadata:
  name: $domain_name
spec:
  domain: $domain
  project: $project_name

All of this could be applied in a single step, analogous to:

kubectl apply -f .
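
On the MCP side, the equivalent could be a single declarative tool. A minimal sketch, reusing the zod-based registration style from Vercel's example; applyDesiredState is a hypothetical server-side helper:

server.tool(
  "apply_config",
  "Apply a declarative config; the server reconciles projects, deployments and domains to match it",
  {
    // Entities reference each other by client-chosen names, never by server-generated IDs
    resources: z.array(z.object({
      kind: z.enum(["project", "deployment", "domain"]),
      name: z.string(),
      spec: z.record(z.any())
    }))
  },
  async ({ resources }) => {
    // Hypothetical helper: stores the desired state; server-side controllers reconcile asynchronously
    const applied = await applyDesiredState(resources);

    return {
      content: [{
        type: "text",
        text: `Accepted ${applied.length} resources; reconciliation started.`
      }]
    };
  }
);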

To know whether it worked, the LLM would need another tool that waits for the objects to become ready or report failure, similar to:

kubectl wait
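
That second tool could again be a small MCP operation; waitForReconciliation is a hypothetical helper that polls the control plane:

server.tool(
  "wait_for_ready",
  "Wait until the named resources are ready or have failed, then report their status",
  {
    names: z.array(z.string()),
    timeout_seconds: z.number().default(120)
  },
  async ({ names, timeout_seconds }) => {
    // Hypothetical helper: polls each resource until it is Ready or Failed, or the timeout expires
    const statuses = await waitForReconciliation(names, timeout_seconds);

    return {
      content: [{
        type: "text",
        text: statuses.map(s => `${s.name}: ${s.phase}`).join("\n")
      }]
    };
  }
);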

Conclusion

Kubernetes shows us a design pattern that is:

  • Declarative: Clients state intent, not steps
  • Composable: Both low- and high-level operations remain possible
  • Clean: Server-generated IDs never leak to the client
  • Efficient: Agents only need to handle the final result

Designing APIs in this style wouldn’t just make LLM agents more capable. It would also make APIs more usable for humans and easier to manage with infrastructure-as-code tooling.

Instead of choosing between “wrapping APIs” and “building workflows,” we could be asking: What would the Kubernetes version of this API look like?


This blog post was partly inspired by experience working on an internal AI assistant at my employer TNG Technology Consulting.
