
Joshua Phillips

The Full Stack Is One Layer Deeper. You've Been Building It.

This is a cross-post of a piece I wrote for RocketRide. The original is on Medium. If you're building anything production-grade with LLMs, this is the argument I've been trying to make for a while. Questions and pushback welcome in the comments.

In 2015, Mathias Biilmann was building Netlify and kept running into the same problem: engineers were doing something that had no name. They were decoupling frontends from monolithic backends, pre-rendering markup, serving assets from CDNs, and consuming APIs for dynamic functionality. The pattern was everywhere. Nobody knew what to call it. Biilmann coined JAMstack, presented it at SmashingConf in 2016, and the industry reorganized around it almost immediately. Not because the work changed. Because the work finally had a name.

That isn't a new story in software. LAMP was coined in 1998, years after its components were already in common use together. MEAN followed in 2013, describing a combination of tools developers had already assembled on their own. Serverless entered the vocabulary around the time AWS Lambda launched in 2014, giving a name to a pattern engineers were already running in production. In every case, the name did not create the stack. The stack already existed. The name gave everyone permission to treat it as a real thing: to design around it, hire for it, and build tooling on top of it.

We are in that moment again, right now, with the AI layer of the stack.

The name did not create the stack. The stack already existed. The name gave everyone permission to treat it as a real thing.

You Have Already Built This

In 2023, I worked on a data breach reporting system for an internal company hackathon. We had access to a distributed data catalog and the idea was to use an LLM to generate breach reports from classified data across multiple sources. The first version was simple: feed the data in raw, let the model do its thing. It fell apart immediately. Inconsistent report formats. Errors on edge cases. Outputs that looked plausible but were structured differently every run.

So we fixed it the way any engineer fixes a broken system. We scrubbed and formatted the data before sending it. We wrote a system prompt that described the shape of the input, the task, and the exact shape of the expected output. We added explicit fallback handling for every edge case we could anticipate. By the time it worked reliably, we had built something with real engineering discipline behind it. Shoutout to Ryan Christensen and Jimmy Tucker, who built it with me. We won.

At the time, I didn't have a name for what we had built. I called it "making it work." Looking back, we had built prompt management, input normalization, output contract design, and edge case handling into a pipeline. Each of those is a named engineering discipline now. We were doing the work before the vocabulary existed to describe it.

If you have shipped anything with an LLM in the last two years, you have done the same. You iterated on a prompt until the output was consistent. That is prompt management. You wrote retry logic for when the model timed out or returned garbage. That is reliability engineering applied to a nondeterministic system. You figured out how much context to send without hitting limits or muddying the response. That is context management. You built some way of knowing whether a change made things better or worse. That is an eval, even if you never called it one.

The work exists. It has always existed, every time an engineer has wired up an LLM in a real project. What has not existed, until recently, is a clear name for the layer all of that work belongs to.

We were doing the work before the vocabulary existed to describe it.

Why the Lack of a Name Matters

Here is a complaint I hear from engineers constantly: there are too many AI tools, they move too fast, and by the time you pick one and integrate it, something better has come out. It feels like an impossible treadmill.

That frustration is real, but it's being diagnosed wrong. The problem isn't the volume of tools. The problem is that without a formally defined layer with clear responsibilities and interfaces, every new tool looks like a potential re-architecture of your entire product. You are not evaluating whether a tool fits inside a well-defined slot. You are asking where it even lives in your system, what it replaces, and whether adopting it means rewriting everything that touches it.

Think about how the API layer solved a similar problem. Before REST became the obvious answer, a lot of applications had database calls tangled directly into view logic. When a new database technology came out, adopting it meant touching everything. The separation of concerns that a well-defined API layer provides means that today, swapping a backend service is bounded. It's a change inside the layer, not a re-architecture of the product.

The AI layer has not yet reached that level of formalization for most teams. So every new model release, every new orchestration framework, every new retrieval approach feels like a potential disruption to the entire system. The churn isn't the problem. The missing layer boundary is the problem. Name the layer, define its responsibilities, and the churn becomes a set of bounded decisions inside a defined space.

The churn is not the problem. The missing layer boundary is the problem.

What the Layer Is

Think about how a traditional stack layer works. The layers above and below it are long, cohesive slabs: your frontend is one thing, your backend is one thing, your data layer is one thing. The AI layer is different. It isn't a single elongated component. It's a composite, more like a section of brickwork than a single slab. Multiple discrete components, each with a specific job, that together form one coherent layer. Remove one brick and the layer is structurally compromised.

That composite structure is what makes the AI layer distinct from simply "adding an API call to an LLM." The layer sits between your data and API layers on one side and your application's behavior on the other. Its job is to take inputs from your system and produce intelligent, reliable, observable outputs through language models. Here is what it's made of.

What the Layer Actually Contains

Pipelines

A pipeline is a directed sequence of functions, each with a specific job, that collectively get data into a shape the model can use and results into a shape your application can trust. It might include database connections, data transformation, context retrieval, an LLM call, output validation, and a write to storage. Order matters. Branching matters. Whether steps run in sequence or in parallel is a design decision, not an implementation detail. A pipeline isn't a single model call. It's the infrastructure around the model call that makes it reliable enough to ship.
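A minimal sketch of that idea, with all names and the model call invented for illustration: each stage is a plain function, and the pipeline is their ordered composition around a stubbed LLM call.

```python
# Minimal pipeline sketch. `call_model` is a stub standing in for a
# real LLM call; every other name here is illustrative.
from typing import Callable

def normalize(record: dict) -> dict:
    # Input normalization: scrub and reshape raw data for the model.
    return {"text": record["text"].strip()}

def build_prompt(record: dict) -> str:
    # Prompt assembly: wrap the data in explicit instructions.
    return f"Summarize in one sentence:\n{record['text']}"

def call_model(prompt: str) -> str:
    # Stub for the LLM call; swap in your provider's SDK here.
    return f"SUMMARY({len(prompt)} chars)"

def validate(output: str) -> str:
    # Output contract: fail loudly instead of shipping a malformed result.
    if not output.startswith("SUMMARY"):
        raise ValueError("model output violated the contract")
    return output

def run_pipeline(record: dict) -> str:
    # Order matters: normalize -> prompt -> model -> validate.
    stages: list[Callable] = [normalize, build_prompt, call_model, validate]
    value = record
    for stage in stages:
        value = stage(value)
    return value

result = run_pipeline({"text": "  Quarterly revenue rose 12%.  "})
```

The point is not the stubs; it is that each step has one job, the sequence is explicit, and the validation step is part of the pipeline rather than an afterthought.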

Prompt Management

A prompt is a recipe you are writing for the LLM. You hand it a set of ingredients and instructions and it produces something. The more specific the instructions, the more consistent the result. When you get something unexpected back, you go back into the recipe and add a fallback case, a way to handle the edge you missed. That is exactly how you would approach defensive programming, just applied to natural language instead of code.

The important difference is that a prompt is an un-sequenced list of expectations, not a deterministic program. You are writing a contract with a probabilistic system. That is what makes prompt management its own engineering discipline and not just "write a good comment." Prompts have versions. They have inputs and outputs. Changes to them have measurable effects. Treating them as static strings scattered through your codebase is the equivalent of inlining SQL queries in your view templates.
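One way to make that concrete, as a sketch with invented names (there is no standard registry API; this is the pattern, not a library): prompts as versioned, typed artifacts rather than inline strings.

```python
# Sketch of prompts as versioned artifacts. The registry and prompt
# names are made up for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Prompt:
    name: str
    version: str
    template: str  # uses str.format placeholders as its input contract

    def render(self, **inputs: str) -> str:
        # Rendering fails loudly if a required input is missing,
        # the same way a typed function signature would.
        return self.template.format(**inputs)

REGISTRY = {
    ("breach-report", "1.2.0"): Prompt(
        name="breach-report",
        version="1.2.0",
        template=(
            "You are generating a breach report.\n"
            "Input data:\n{data}\n"
            "Respond with JSON matching this schema: {schema}"
        ),
    ),
}

prompt = REGISTRY[("breach-report", "1.2.0")]
text = prompt.render(data="<rows>", schema='{"summary": "..."}')
```

With versions pinned like this, a prompt change is a diff you can review and roll back, which is what makes its effects measurable.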

You are writing a contract with a probabilistic system.

LLM Routing and Orchestration

Not every call should go to the same model. When selecting between models, the decision tree is straightforward: can you afford it? If not, it's eliminated. Does it produce the quality of output your use case requires? If not, eliminated. Is it fast enough that users will not notice the latency? If not, eliminated. Whatever survives all three is your candidate pool, and you are optimizing within it.

This is the same tradeoff reasoning you apply to any external dependency, just with a different set of variables: cost, output quality, latency, and capability fit. The engineer selecting an LLM isn't doing research. They are walking a tightrope between budgetary constraints and the quality and reliability the product demands. Traditional software engineering paradigms still apply: users will not use your product if a request hangs on an LLM taking too long to respond.
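The three-gate elimination above can be written down directly. The model table and thresholds here are made up for illustration; the structure is the point.

```python
# Sketch of the cost/quality/latency elimination gates. All numbers
# and model names are invented.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_1k: float   # dollars per 1k tokens
    quality: float       # internal eval score, 0-1
    p95_latency_ms: int

CANDIDATES = [
    Model("big-flagship", cost_per_1k=0.0300, quality=0.95, p95_latency_ms=4200),
    Model("mid-tier",     cost_per_1k=0.0030, quality=0.88, p95_latency_ms=1100),
    Model("small-fast",   cost_per_1k=0.0004, quality=0.71, p95_latency_ms=300),
]

def route(models, max_cost, min_quality, max_latency_ms):
    # Each gate eliminates; whatever survives is the candidate pool.
    pool = [m for m in models
            if m.cost_per_1k <= max_cost
            and m.quality >= min_quality
            and m.p95_latency_ms <= max_latency_ms]
    # Within the pool you optimize -- here, cheapest first.
    return sorted(pool, key=lambda m: m.cost_per_1k)

pool = route(CANDIDATES, max_cost=0.01, min_quality=0.80, max_latency_ms=2000)
# With these thresholds, only "mid-tier" survives all three gates.
```

In production the quality score would come from your own evals against your own use case, not a leaderboard.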

Context and Memory Management

Language models are stateless. Any sense of continuity or awareness has to be constructed explicitly, every time. Managing context well is its own skill, and it has failure modes on both ends.

Too much context is as dangerous as too little. Bombarding the model with information does not produce better responses. At a certain point it muddies the results, making it harder for the model to respond with anything meaningful. The Liu et al. "Lost in the Middle" study, published in the Transactions of the ACL, confirmed this empirically: model performance degrades when relevant information is buried in the middle of a long context, even in models explicitly built for long-context use. The quality of what you give the model directly determines the quality of what you get back.

There is a third failure mode worth naming: stale context. Forcing a conversation to continue when the model is stuck is like arguing with someone who isn't getting what you are saying and just talking louder. Sometimes the right engineering decision is to take a breath, start fresh, and carry over only the parts of the previous context that actually matter. That isn't a failure. It's a deliberate design pattern.
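One common shape for this work is bounding the context explicitly: keep the system message, then fill the remaining budget with the most recent turns. A sketch, using word count as a crude stand-in for a real tokenizer:

```python
# Sketch of context bounding. Word count approximates token count
# here; a real system would use the model's tokenizer.
def trim_context(system: str, turns: list[str], budget: int) -> list[str]:
    def cost(text: str) -> int:
        return len(text.split())  # crude stand-in for a tokenizer

    remaining = budget - cost(system)
    kept: list[str] = []
    # Walk backwards so the newest turns survive first.
    for turn in reversed(turns):
        if cost(turn) > remaining:
            break
        kept.append(turn)
        remaining -= cost(turn)
    return [system] + list(reversed(kept))

history = ["old question one", "old answer one", "new question"]
ctx = trim_context("You are a helpful assistant", history, budget=10)
# Oldest turn is dropped; the system message and newest turns survive.
```

The same function is where a "start fresh" policy lives: resetting a stale conversation is just calling it with an empty history and a short summary of what mattered.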

Tool Use and Agents

The word "agent" has been diluted by hype to the point where it means almost nothing in most conversations. The clearest definition comes from Franklin and Graesser's 1997 taxonomy: an autonomous agent is a system situated within an environment, sensing and acting over time in pursuit of its agenda, so as to affect what it senses in the future.

That definition is precise and it's still the right one. An agent is a nondeterministic system, running inside an environment, given tools it decides how to use in order to complete a task. Think of it this way: if you give a person instructions to drive a nail into something and hand them a hammer, you are not specifying every motion of their arm. You are defining the task and the available tool and trusting the system to figure out the execution. Upon delivery, you don't care how it got done. You care that the nail is in, the wall is intact, and the job is done within the scope of the instruction given.

Agents introduce nondeterminism that static pipelines don't. They require careful design around what tools are exposed, what permissions they carry, and how failures propagate. That isn't a reason to avoid them. It's a reason to understand the boundary between a pipeline and an agent before you build one.
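That boundary is easier to see in code. A compact sketch, where the "model" is a deterministic stub deciding which tool to call: the loop senses results and acts again until the task is done, under a hard step cap so failures cannot run forever.

```python
# Sketch of a tool-use loop. `stub_model` stands in for the LLM's
# tool-choice step; the tools and task are invented for illustration.
def tool_search(query: str) -> str:
    return f"3 results for '{query}'"

def tool_finish(answer: str) -> str:
    return answer

TOOLS = {"search": tool_search, "finish": tool_finish}

def stub_model(task: str, observations: list[str]) -> tuple[str, str]:
    # Stand-in for the LLM's decision: search once, then finish.
    if not observations:
        return ("search", task)
    return ("finish", f"answer based on: {observations[-1]}")

def run_agent(task: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):  # hard cap: failures must not loop forever
        tool, arg = stub_model(task, observations)
        result = TOOLS[tool](arg)
        if tool == "finish":
            return result
        observations.append(result)
    raise RuntimeError("agent exceeded step budget")
```

Note what the caller controls: which tools exist, what they are allowed to do, and the step budget. The model controls only the choice within those bounds, which is exactly the hammer-and-nail contract.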

Evals and Observability

Most teams skip this until something breaks in production and they have no idea why. The instinct is to try to guarantee the output, to lock down the model until it's predictable. That instinct is wrong. You will never achieve a guaranteed output from a nondeterministic system regardless of how much effort you apply.

The right approach is to accept a certain degree of unpredictability and build measurement around it. Log every input and output. When the system misbehaves, you have a record you can analyze, and you iron out the issue in the next release. You are not testing for exact output. You are tracking behavior over time and using deviation as the signal. That is what a mature eval practice looks like.
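As a sketch of what "tracking behavior, not exact output" means in practice: run the system over a fixed suite of inputs, apply property checks instead of string equality, and watch the pass rates across releases. The checks, suite, and model stub below are invented for illustration.

```python
# Sketch of a behavioral eval suite. `model` stubs the pipeline
# under test; checks assert properties, not exact strings.
def model(prompt: str) -> str:
    # Stand-in for the real pipeline under test.
    return f'{{"summary": "{prompt[:10]}"}}'

SUITE = ["incident in eu-west", "credential leak", "phishing wave"]

CHECKS = [
    ("valid_json_shape", lambda out: out.startswith("{") and out.endswith("}")),
    ("has_summary_key",  lambda out: '"summary"' in out),
]

def run_evals() -> dict[str, float]:
    # Pass rate per check across the suite. Deviation between runs
    # (or releases) is the signal, not exact string equality.
    rates = {}
    for name, check in CHECKS:
        passed = sum(check(model(p)) for p in SUITE)
        rates[name] = passed / len(SUITE)
    return rates

rates = run_evals()
```

Store those rates per release and a prompt regression shows up as a dropped number, not a production incident.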

The Stack Overflow Developer Survey from 2025 is a data point worth sitting with: 84% of developers are using or planning to use AI tools, but only 29% trust the accuracy of the output, down from 43% the year before. That data is from mid-2025, and the exact numbers will have moved since. But the direction matters more than the number: engineers were shipping AI systems faster than they were building the infrastructure to trust them. That isn't a model quality problem. It's an observability problem.

Engineers were shipping AI systems faster than they were building the infrastructure to trust them.

What Owning This Layer Means

If you are a full stack engineer building a product that uses AI in any meaningful way, you own the AI layer. That isn't an addition to the job. It's the job.

What changes is what you have to reason about. You are no longer thinking only about request latency and database query plans. You are thinking about context window limits, prompt regression, model version drift, retrieval quality, and the specific failure modes of nondeterministic systems. Some of these have analogs in traditional software. Most require building new intuitions.

The tooling is maturing. A year ago, you were largely building this layer by hand. Today you have options: LangChain and LlamaIndex for pipeline composition and retrieval, Weights and Biases and Braintrust for evals and observability, and runtimes like RocketRide, which treat the pipeline as the deployable unit of the AI layer the same way a container is the deployable unit of your application layer. The layer now has dedicated tooling the same way the API layer has Express and FastAPI.

The engineers who will do this work best are not the ones who know the most about transformer architectures. They are the ones who apply the same discipline to the AI layer they already apply everywhere else in the stack: clear interfaces, observable behavior, testable contracts, and a bias toward simplicity before complexity.

The Stack Is One Layer Deeper

Working inside a chaotic system, with no clear map of where things are or where they are going, is one of the most intimidating experiences in engineering. But this is a pattern. It happens in every period of growth, in every discipline, not just software. The chaos eventually gets formalized. Names get assigned. Structure emerges. What felt novel becomes the new normal.

We are in that period right now with the AI layer. It's painful. But the shape of what comes next is already visible, because engineers are already building it.

The most useful thing you can do in this moment is reframe how you see the work you have already been doing. Assign a structure and a name to it. Know where the AI layer lives in your stack, what its responsibilities are, and what each of its components does. That clarity isn't just semantic. It's the map that pulls you out of the feeling of uncertainty, the beginning of the end of the chaos.

You have been building toward this the whole time. Now you have a name for it: the AI layer. It belongs in the full stack the same way the API layer and the data layer do. Treat it that way.

Now you have a name for it: the AI layer. It belongs in the full stack the same way the API layer and the data layer do. Treat it that way.

About RocketRide

RocketRide is an open-source AI runtime built to be the execution layer in your stack. It gives engineers a structured way to build, run, and manage AI pipelines in production: the brickwork this article describes, with first-class support for parallel execution, multi-model orchestration, and the integrations your stack already depends on.

It's MIT licensed and free to use.

Try it out. Drop RocketRide into your IDE, terminal, or agent and see the AI layer in practice. rocketride.ai

Give us a star. If this resonates, the best thing you can do is star the repo. github.com/rocketride-org/rocketride-server

Join the community. The runtime is open source and the direction is community-driven. Open a PR, fork it, or drop an issue. Come shape it. discord.gg/rocketride
