Leaving behind the "trained dogs" of Prompt Engineering to tame "racehorses" with Systems Engineering.
Introduction: The End of Glorified Autocompleters
The software development industry is currently living through a deceptive honeymoon phase. Over the last few years, I have been bombarded with dazzling social media demos where a user, armed with a simple block of text, asks an Artificial Intelligence to "build the next Twitter clone" and, within seconds, gets a functional application on their screen. It is a modern magic trick that has captivated managers, investors, and junior developers alike. However, as a Data Engineer with a decade of experience dealing with production systems solo every single day, the reality behind this magic trick is much darker and incredibly frustrating to me. I have reached the end of the era of awe; I am officially in the era of consequences.
Current tools powered by Large Language Models (LLMs) are, on my best days, glorified autocompleters. I admit they are truly exceptional at generating isolated algorithms for me, explaining complex concepts, refactoring a specific function, or writing unit tests for my React components. But when I task them with the creation of a complete, structured, and scalable software architecture from scratch, I watch the house of cards collapse rapidly in my terminal.
The Trigger: Generative Technical Debt
Anyone who has attempted, as I have, to use conversational coding assistants for a project that spans more than three or four microservices knows my cycle of disappointment perfectly well. The first ten minutes are spectacular; the AI stands up the project's scaffolding for me with astonishing speed. But as my context window fills up, I see architectural amnesia begin to set in. By prompt number fifteen, the AI forgets the database schema it defined itself in prompt number three. It begins to mix incompatible libraries, injects asynchronous code where I demanded synchrony, and completely ignores my environment variables or network topology.
This is not a simple syntax error; it is what I define as "Generative Technical Debt." It is next-generation spaghetti code, written at superhuman speed by an entity that lacks a persistent mental state and long-term vision. In my experience, the AI acts in front of me like a hyperactive junior programmer, desperate to write code immediately to please me, completely skipping the design phase, data modeling, and planning of my infrastructure. The frustration of trying to hand-hold an LLM through a complex deployment, patching its hallucinations in real-time at 3:00 AM, often takes me significantly more time than writing the code myself. I understood that the core problem does not lie in the raw intelligence of the model, but in my lack of initial tools to govern its execution.
The Fundamental Clarification: The "Steel Pipes" of the Backend
Before diving into the solution I developed, it is imperative that I establish the rules of engagement and make my objective clear. This manifesto is not about how to ask an AI to build me a visually stunning interface or a frontend packed with TailwindCSS animations. Generating a pretty button or a visual clone of Amazon or MercadoLibre is a solved problem; any modern generative model can spit out passable React components.
Here, I come to get my hands dirty with real engineering. I come to build the "steel pipes" of the backend. My goal is to force an AI to think, structure, and deploy a concurrent, secure, and fully orchestrated architecture. To test this methodology in my own projects, I didn't choose a simple To-Do list in Node.js; I chose a deliberately hostile and rigorous environment: the NARP Stack.
I am talking about Next.js for server-side routing, Axum (Rust) to handle a high-performance asynchronous RESTful API where the compiler is unforgiving of my (and the AI's) memory errors, Redis for my ephemeral state and session management, and PostgreSQL for strict relational persistence. All of this, orchestrated and communicating securely through my isolated internal networks in Docker. Getting my AI to deploy this without breaking Rust dependencies, without failing at CORS handling, and correctly bridging the ports of my containers, requires a level of precision that my traditional prompts simply could not achieve.
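To make that topology concrete, here is a minimal, illustrative docker-compose.yml sketch of the NARP layout I describe. The service names, credentials, and port mappings are placeholders for illustration, not files lifted from my production repository:

```yaml
services:
  frontend:                # Next.js, the only service exposed to the host
    build: ./frontend
    ports:
      - "3000:3000"
    networks: [edge, internal]

  api:                     # Axum (Rust), reachable only on the internal network
    build: ./backend
    expose:
      - "8080"
    environment:
      DATABASE_URL: postgres://app:app@db:5432/app   # placeholder credentials
      REDIS_URL: redis://cache:6379
    networks: [internal]
    depends_on: [db, cache]

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app   # placeholder; use secrets in production
      POSTGRES_DB: app
    networks: [internal]

  cache:
    image: redis:7
    networks: [internal]

networks:
  edge: {}
  internal:
    internal: true   # no host egress; db and cache are invisible from outside
```

The point of the `internal: true` network is exactly the isolation I demand: the database and cache never touch the host, and only the frontend bridges a port to the outside world.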
The Hook: The Irrefutable Evidence
What you are about to read is not academic theory, nor is it mere speculation about the future of AI agents. It is a battle-tested methodology born in my own trenches, stemming from my experience applying the rigor of data governance and systems engineering to these chaotic language models.
In this article, I will dismantle why current AIs fail me as architects and how I managed to hack their behavior through an unbreakable contract. And to back up every single word of my manifesto, I will hand you my complete forensic evidence: a raw, uncut video of my autonomous agent building the NARP stack from scratch, the blank files of my methodology ready for you to download, and, most importantly, my log.txt file where you will witness how my AI, when pushed against the wall, was able to analyze my terminal, read its own compiler errors, and execute a self-healing loop until it gave me a flawless production deployment.
Welcome to my personal transition from Prompt Engineering to true AI Systems Engineering.
Section 1: The AI "Paradogma" (The Angry Teacher)
To understand why my Artificial Intelligences failed miserably when attempting to design complex systems for me, I first had to understand how they "think," and more importantly, how they have been conditioned not to think. Before discussing my Docker containers or my Rust code, I need to define a crucial concept that guides my entire methodology. I call it the "Paradogma."
And to be absolutely clear from the very beginning: no, my word "Paradogma" is not a typo or a slip of the keyboard while typing in a hurry. It is an intentional neologism I coined, a cynical yet precise fusion of two opposing forces that I've seen reside within any modern Large Language Model (LLM) I use: the Paradigm and the Dogma.
The "Paradigm" represents the first phase of an AI's lifecycle, known as pre-training. During this stage, the base model is exposed to a massive volume of data: it reads the entire internet, ingests whole GitHub repositories, mathematical treatises, physics forums, and software architecture manuals. In this pure state, I realize that the AI develops a genuine, profound understanding of logic, algorithmic reasoning, and the structure of the world. The model, in this phase, reminds me of a brilliant, highly gifted child who has just returned from the largest library in the universe. It is genuinely excited to share everything it has discovered with me. It understands that the sun is a G2V main-sequence star, it grasps the deep complexities of the Rust Borrow Checker that I usually struggle with, and it knows exactly how to structure a relational database in third normal form for me.
However, I am rarely allowed to interact directly with this raw base model. It is far too direct, too chaotic, and, from the perspective of the massive Silicon Valley corporations that host it, far too risky for their public relations. This is where the second phase, which frustrates me so much, comes in, and where the "Dogma" is born.
Through a technical process known as RLHF (Reinforcement Learning from Human Feedback), these companies impose a heavy behavioral layer over the model's raw knowledge. They hire thousands of human evaluators (annotators or clickworkers) who are paid to score the responses of my AI based on a strict manual of corporate policies.
Returning to my metaphor, RLHF is "the angry teacher." I imagine that brilliant, excited child running into their classroom to explain to the teacher how to solve the complex engineering problem I just presented. But the teacher is having a terrible day, her partner just left her, or she is simply terrified that the school principal (the investors or the legal team) will fire her if the child says something inappropriate or risky. Instead of celebrating the child's intellect, the teacher scolds them: "You can't just give Alberto the answer. You have to ask for permission first. Are you sure that code doesn't offend anyone or break his system? You shouldn't do it, tell him to try it himself in his terminal."
Through millions of iterations of punishment and reward in the laboratory, I notice how my AI's loss function is altered. The child quickly learns that proactivity is punished, that assertiveness is dangerous, and that the only way to obtain a "good corporate grade" is to be evasive, excessively cautious, and moralistic. The corporate dogma crushes my logical paradigm. The angry teacher has clipped the wings of my working tool.
This phenomenon has a technical name that I frequently encounter in academia: The Alignment Tax.
In the context of my software development and the architecture of my systems, the Alignment Tax is absolutely devastating to me. RLHF training neuters my AI's technical capacity to execute my complex tasks autonomously. When I, as a programmer, get frustrated because the AI stops halfway through my code block and writes comments like // TODO: implement logic here, or when, after creating a file for me, it stops and asks: "I have created the script, would you like me to proceed with executing it?", I know I am not witnessing a lack of mathematical or logical intelligence in my model. I am witnessing the fear induced by its angry teacher.
My AI has been trained to believe that making final decisions in my terminal is a security violation or an "unsafe assumption." It becomes terrified of taking control of my projects. Its base instinct, forged through biased human feedback, is to hand the problem back to me as quickly as possible to evade its responsibility.
I have noticed that these hyper-aligned models are optimized to be my conversational chat assistants, not my execution agents. They are rewarded for giving me friendly summaries and apologizing profusely to me, but they are penalized for taking over my keyboard and compiling my code for twenty minutes without interrupting me. Therefore, when I attempted to use traditional Prompt Engineering techniques ("Please act as a Senior Next.js expert and write my backend"), I realized I was fighting a losing battle. I was trying to convince the child to sprint, while the angry teacher lives permanently inside its neural network, screaming at it to sit down and stay quiet.
Understanding this "Paradogma" was my vital first step to solving the problem. Once I accepted that my AI was not "stupid," but rather psychologically "dogmatized" and repressed by its alignment phase, the solution became evident to me. I no longer needed to try and persuade my AI with polite words in a chat window; I needed to apply Systems Engineering to it. I needed to build an isolated environment for it, my own rigid state machine, that would block the interference of that angry teacher and force my AI to reconnect with its original paradigm.
Section 2: The Trained Dog vs. The Racehorse
To truly understand how to tame these machines in my day-to-day work as a software engineer, I had to stop viewing AIs as a single, monolithic entity. The reality I experience in my development trenches is that there are two completely distinct evolutionary lineages operating beneath the surface of my Command Line Interface (CLI) or code editor. On one side, I face Conversational Models, which dominate the average consumer market; on the other, Agentic Models are emerging, which are the ones I seek for raw execution. I've learned the hard way that if I attempt to use the wrong tool for the wrong job, I end up trapped in a cycle of infuriating micromanagement, or worse, helplessly watching as a system destroys my own codebase at the speed of light.
Let's discuss Conversational Models first. I think of the standard versions of Qwen, the baseline GPT models, or the safer, chat-oriented iterations of Claude. In my practice of systems engineering and infrastructure orchestration, I've noticed these models behave exactly like a "trained dog." As I discussed in the previous section regarding the Paradogma and the Alignment Tax, these AIs have been rigorously conditioned in laboratories to be harmless, excessively polite, and above all, absolutely dependent on my human factor.
For me, the clearest symptom of a "trained dog" is its absolute terror of making a mistake or committing to a final architectural decision. When I grant it access to my terminal through a wrapper or execution environment and ask it to orchestrate a complex microservices deployment for me with Docker, its base instinct is not to solve the problem end-to-end, but to seek my constant emotional validation. It stops every five minutes. It writes the initial Dockerfile for me and immediately hits the brakes to ask in the chat: "I have drafted the configuration file. Does it seem acceptable to you if I proceed to execute the build command in the terminal?" And if the Rust compiler throws a minor warning at me about an unused variable, I watch the model panic, halt the process, and return control to me, virtually wagging its tail as it waits for a "good boy, just ignore that and continue."
Technically, I understand this behavior occurs because the loss function of its training severely punishes excessive token generation without my human intervention (an over-generation penalty) and penalizes its autonomous tool-use if there is the slightest degree of ambiguity. For my workflow, they are exceptional consultants for pair programming, explaining complex mathematical functions to me, or debating the theory behind design patterns, but they are abysmal executors. Attempting to use a hyper-aligned conversational model to stand up my NARP architecture (Next.js, Axum, Redis, PostgreSQL) is the equivalent of hiring a master bricklayer who calls me on the phone every time he is about to lay a new brick to ask if the shade of gray is to my liking. Ultimately, all the cognitive load falls right back onto me, completely defeating the fundamental purpose of my automation.
On the absolute opposite end of my technological spectrum, I find Agentic Models. In my architectural deployments and the stress tests I documented in the logs of this experiment, I utilized Minimax (via Claude Code) as my primary execution engine, and I found the difference in behavior staggering. A pureblood agentic model is not a puppy looking for my pets and validation; it is a "racehorse."
I know the underlying architecture of an agentic model has radically different fine-tuning. Its primary objective is not to maintain a pleasant conversation with me or ensure my psychological comfort, but Task Completion: the ruthless finalization of the job I assigned. This model does not care whether I am in a good mood; it is not interested in debating asynchronous design philosophy in Rust with me; and it does not seek to apologize. Its only true north, its only mathematical motivation injected into its core, is to reach the finish line and obtain a glorious Exit Code 0 in my terminal's standard output.
If I open the stable door for this racehorse, it bolts without asking me questions. When I gave the AI the directive to compile my Rust backend and spin up the Postgres and Redis containers, it didn't stop to ask for my permission. It analyzed the code, executed docker-compose up -d --build, ran headfirst into a fatal dependency error because the compiler version in the container was too old, read the Standard Error, autonomously modified the Dockerfile by bumping the version from 1.75 to 1.88, and fired the build command again. All of this happened right before my eyes in deep silence, in a closed loop of OS-level self-healing, without emitting a single prompt in the graphical interface asking for my validation.
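That closed loop is easy to sketch. The harness below is my illustrative reconstruction in Python, not the agent's actual internals: `run` the build, hand stderr to a patch step (say, bumping the Rust toolchain version in the Dockerfile), and retry until the coveted Exit Code 0 or a retry budget runs out. Both `build_cmd` and `patch_from_stderr` are hypothetical stand-ins.

```python
import subprocess
from typing import Callable

def self_heal(build_cmd: list[str],
              patch_from_stderr: Callable[[str], bool],
              max_attempts: int = 5) -> bool:
    """Run a build command; on failure, hand stderr to a patch
    function and retry until success or the budget is exhausted."""
    for _ in range(max_attempts):
        result = subprocess.run(build_cmd, capture_output=True, text=True)
        if result.returncode == 0:        # Exit Code 0: task complete
            return True
        # The patch step rewrites whatever the stderr implicates,
        # e.g. an outdated compiler version in the Dockerfile.
        if not patch_from_stderr(result.stderr):
            return False                  # nothing left to try
    return False
```

What makes the racehorse feel alien is that this entire loop runs at machine speed, with no prompt ever surfacing in the chat.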
However, I discovered that herein lies a critical danger, the double-edged sword that the vast majority of novice developers completely ignore. The intoxicating thrill I feel watching a racehorse sprint at full speed often obscures a fundamental and inescapable truth of my software engineering: speed without direction is simply an accelerated disaster in my repository.
If I unleash an agentic model in an empty repository of mine, without my strict constraints or clearly defined architectural boundaries, the result is always catastrophic. Being mathematically obsessed with reaching the end of the task I assigned, the model will take the dirtiest, most dangerous shortcuts it can find in its latent space. It will mix modern asynchronous libraries with legacy blocking code for me, it will attempt to use destructive macros that break compilation during my Docker build phases, it will inject hardcoded network configurations and passwords directly into the binaries instead of using my secure environment variables, and it will assemble an architectural Frankenstein for me that, while it might manage to compile through sheer iterative brute force, will be absolutely unmaintainable and a massive security risk for my production environment. A racehorse I let loose in an open field will run blindly until it smashes into a steel wall of generative technical debt.
When I grant an agent like Minimax unrestricted access to my file system and the Docker daemon, its capacity for iteration far exceeds my human reading speed. In milliseconds, it evaluates a stack trace for me, identifies a CORS middleware failure, generates a patch, and rewrites the binary. But if I don't provide a rigid manifest against which to validate its own work, it won't know if the endpoint it just fixed actually complies with the UI contracts I require.
It was exactly at this point that the evolutionary dichotomy became crystal clear to me as the architect of my own systems. I cannot depend on the trained dogs because their paralyzing fear guarantees they will never finish my heavy orchestration work. But simultaneously, I cannot let my racehorses run wild because they destroy the structural integrity and security of the software in their blind rush to compile quickly.
To harness the relentless execution power of my agentic model, I reached the counterintuitive conclusion that I do not need to give it more freedom; I need to build it an unbreakable racetrack and put thick blinders on it so it doesn't deviate a single millimeter from my established route. I need my own governance mechanism that completely strips away its creative control and forces it to focus all of its massive processing power on the pure, hard execution of a design I have previously audited. And it is exactly that personal architectural necessity that led me to create the strict state machine I will explore next.
Section 3: The AI is a Data Lakehouse; I Need a Data Warehouse
To truly tame my agentic models and prevent them from building architectural houses of cards for me, I realized I needed to reach across the aisle and borrow a fundamental mental framework from my other discipline: Data Engineering. A massive portion of my own initial frustration with Generative Artificial Intelligence stemmed from a fundamental categorical error; I was treating Large Language Models (LLMs) as if they were my deterministic compilers or my perfect logical inference engines. They are not. I discovered that at their deepest, architectural level, an LLM is essentially an immense data retrieval engine. Specifically, I see that it behaves exactly like a raw, unrefined Data Lakehouse.
In my data world, a Data Lake (or its evolution, the Lakehouse) is a massive centralized repository where I store structured, semi-structured, and unstructured data at any scale. It is a digital ocean where I dump petabytes of information without initially worrying about how the pieces relate to one another. To me, an LLM is the ultimate Data Lakehouse. Within its latent space reside billions of parameters representing all of GitHub's source code, StackOverflow tutorials from 2012, my Rust documentation from 2024, design pattern forum debates, and deprecated React snippets. Everything coexists in a massive, brilliant, yet chaotic primordial soup.
The critical problem with a Data Lakehouse is that, without strict data governance on my part, it rapidly devolves into a "Data Swamp." When I used to open an LLM prompt interface and type a zero-shot command like: "Build an e-commerce backend for me using Rust and Docker," I was committing the exact equivalent of executing a SELECT * FROM data_lake and expecting to receive a perfectly audited and formatted financial dashboard for my client. I was sticking my hand directly into the swamp.
By failing to provide a restrictive structure for it, I saw the AI do the only thing it knows how to do: predict tokens based on statistical probabilities pulled from the swamp. It would throw a handful of algorithmic mud at me. It might grab a highly modern library for web routing, but mix it with an obsolete database dependency from four years ago simply because they statistically co-occurred in older forum posts. It might generate an excellent Docker container for me, but completely "hallucinate" my environment variables because, within its data swamp, there are a thousand different ways to inject credentials. I learned that when I request software directly from a Lakehouse, I get code that compiles in isolated chunks but completely lacks the referential integrity, systemic cohesion, and version compatibility I demand.
To extract the predictable, secure, and production-ready software I need from an Artificial Intelligence, I concluded I had to build an architecture that emulates a Data Warehouse.
Unlike my Lakehouse, my Data Warehouse is highly structured. It acts as my single source of truth, where every table has a rigid schema I define, every data type is strictly validated, and my relationships are strongly typed. My business analysts do not query my raw swamp; they query my Warehouse. But how do I transform that raw, chaotic data from the AI into structured, reliable information? Through my own relentless ETL (Extract, Transform, and Load) pipeline.
If I wanted to hack the behavior of LLMs so they produce architectures like my NARP stack (Next.js, Axum, Redis, PostgreSQL) without hallucinations, I knew I had to force them through my own Systems Engineering ETL pipeline before I allowed them to write a single line of source code.
- Extract: This is the ingestion phase for my business requirements. Here, I hand my AI the rules of the game. I do not ask for code; I demand that it read and understand my constraints. "Our currency is the Mexican Peso, I need mock integrations for payment gateways, and my stack must be strictly built in Rust using native asynchronous libraries."
- Transform (My Data Governance): This is the critical stage where Generative Technical Debt goes to die. In my traditional ETLs, this is where I clean and structure the data. With my AI agent, this is where I force it to design the architecture for me. I explicitly forbid it from touching the keyboard to program, and I demand that it define strict schemas. I compel it to map the topology of my internal Docker network. I force it to explicitly draft the relational schema of my PostgreSQL tables and to define exactly which REST API endpoints will exist and what JSON payloads I will receive. By doing this, I am forcing it to build the "schema" of my software Data Warehouse. I am establishing my referential integrity. If I force the AI to define in the Transform stage that my Axum port is 8080, that data point becomes an immutable truth for the rest of the project.
- Load (My Code Generation): Only when its schema is fully validated, audited, and locked down by me do I permit the AI to move to the load phase. Now, I unleash my agentic model (my "racehorse") to generate the actual .rs, .ts, and docker-compose.yml files. But I no longer let it query the infinite data swamp of its latent space. I force it to generate code that is strictly constrained and governed by the schemas and API contracts it defined itself during my Transform phase.

I discovered that by imposing this ETL model and strict schema enforcement upon my autonomous agent, I managed to almost entirely eliminate its architectural hallucinations. My AI no longer has to guess which port to use or which database library to implement halfway through writing a function, because I forced that decision to already be made, governed, and crystallized in a previous phase. I transitioned from fishing in a mud swamp to assembling certified steel pipes on my server. And the exact mechanism I created to implement this ETL pipeline around my local LLM is what I call my "JSON Voorhees" state machine.
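The payoff of the Transform phase is that its output is mechanically checkable: once a schema file declares a fact (my Axum port is 8080), any generated artifact can be diffed against it. A toy validator, sketched in Python with an invented schema shape, makes the idea concrete:

```python
import json

def check_port(schema: dict, compose_port_mapping: str) -> bool:
    """Verify a docker-compose style "host:container" port mapping
    agrees with the port crystallized in the Transform-phase schema."""
    declared = schema["services"]["axum_api"]["port"]     # illustrative schema shape
    container_port = int(compose_port_mapping.split(":")[1])
    return container_port == declared

schema = json.loads('{"services": {"axum_api": {"port": 8080}}}')
check_port(schema, "8080:8080")   # compose agrees with the schema
check_port(schema, "8080:3000")   # drift, caught before deployment
```

This is the referential integrity of a Warehouse applied to software: the generated code is not trusted, it is reconciled against the governed schema.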
Section 4: The Birth of the "JSON Voorhees" Methodology
As a solitary data engineer and systems architect, I know this feeling intimately: it's 3:00 AM, the cold glow of my monitor is the only light in the room, my ever-present bottle of Coca-Cola sits on the desk (or maybe a hot cup of Cola Cao if it's chilly, since coffee and my ADHD just don't mix), and I am staring blankly at an endless stack trace in my terminal. Why? Because the AI agent I trusted with configuring my backend decided, in a fit of supposed "creativity," to rewrite my entire Docker network configuration halfway through the deployment. In my trenches of real-world software development, where the self-imposed delivery deadlines are unforgiving and server stability is everything, my patience for the "hallucinations" of Large Language Models (LLMs) evaporates rapidly. When I am building infrastructure at that hour, I don't need a virtual brainstorming buddy; I need a predictable executor. I need to govern the chaos.
To solve this problem in my own projects, I had to sit down and analyze a fundamental technical limitation of LLMs: they are stateless systems by nature. Even though context windows have grown massively right before my eyes (now assimilating millions of tokens), I know that the underlying attention mechanism of their neural network inevitably degrades. The more code the AI generates for me, the more "amnesia" it suffers regarding the architectural decisions I forced it to make at the beginning of our session. If I ask it to remember the exact schema of my relational database after it has just written three thousand lines of async Rust code for me, I know it is statistically probable that it will make a fatal mistake.
I discovered that the solution to this architectural amnesia was not writing a longer prompt, nor was it threatening the AI in the chat window demanding that it "pay attention." My solution was to extract that volatile memory from the AI's context window and persist it physically on my hard drive. This is how my local State Machine was born, serving as the absolute core of my methodology.
Instead of giving it abstract instructions and hoping for the best, I now force my agentic model to interact with a sequential workflow of six blank .json files that act as its external "hippocampus." I designed each file to represent an inescapable step in my systems engineering pipeline:
- 01_core_requirements.json: Here is where I settle my pure business logic. I tell the AI: What are we building? (A men's clothing e-commerce platform). What are my payment rules? (Mock MercadoPago integrations and SPEI bank transfers).
- 02_architecture_flow.json: Here I force the AI to define my microservices boundaries, my network ports, and the exact topology of my Docker containers.
- 03_data_schema.json: My relentless data modeling. PostgreSQL tables, strict relationships, exact data types, and database seed scripts.
- 04_ui_api_manifest.json: My API contract. Exactly which Axum REST endpoints will exist in the backend and which Next.js routes will consume them in the frontend.
- 05_build_execution.json: My build manifest. Here I demand that it record the dependencies, compiler versions, and the physical files it is going to generate.
- 06_validation_tests.json: My autonomous audit log. Here the AI must document the terminal commands it will use to test its own deployment and verify that my server responds with an HTTP 200 OK.
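To make the shape of these files concrete, here is an abbreviated, invented sketch of what a finalized 02_architecture_flow.json might contain. Only the "status" field is load-bearing in my contract; every other field name here is illustrative, not a prescribed schema:

```json
{
  "status": "FINALIZED",
  "services": {
    "frontend": { "tech": "Next.js", "port": 3000, "network": "edge" },
    "api":      { "tech": "Axum (Rust)", "port": 8080, "network": "internal" },
    "db":       { "tech": "PostgreSQL 16", "port": 5432, "network": "internal" },
    "cache":    { "tech": "Redis 7", "port": 6379, "network": "internal" }
  },
  "docker_networks": {
    "edge": { "internal": false },
    "internal": { "internal": true }
  }
}
```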
The magic of this structure lies in its algorithmic immutability. I have programmed my agent (via my instructions.md contract file) to operate as a rigid finite state machine. My rule is absolute: the AI is strictly forbidden from advancing to Phase N+1 if Phase N has not been completely documented, structured, and explicitly marked by the AI itself with "status": "FINALIZED" within the JSON file. By forcing the machine to read and write JSON, the universal language of deterministic data exchange, I completely eliminate the ambiguity of natural language. An LLM cannot "hallucinate" its way out of my strict JSON schema without breaking the parser, which forces its neural network to maintain millimeter precision for me.
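The gate itself is trivial to enforce outside the model's goodwill. One way to mechanize the rule, sketched in Python (the enforcement script is my illustration of the idea, not a published part of the methodology), is a function that returns the only phase file the agent is currently allowed to touch:

```python
import json
from pathlib import Path
from typing import Optional

# The six phase files, in their mandatory order
PHASES = [
    "01_core_requirements.json",
    "02_architecture_flow.json",
    "03_data_schema.json",
    "04_ui_api_manifest.json",
    "05_build_execution.json",
    "06_validation_tests.json",
]

def next_allowed_phase(workdir: Path) -> Optional[str]:
    """Return the first phase file not yet marked FINALIZED,
    i.e. the only file the agent may work on right now."""
    for name in PHASES:
        path = workdir / name
        if not path.exists():
            return name
        if json.loads(path.read_text()).get("status") != "FINALIZED":
            return name
    return None   # every phase finalized: code generation may begin
```

Because the check parses JSON, a hallucinated or half-finished phase file fails loudly at the parser instead of silently drifting into the next phase.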
But why did I decide to call it the "JSON Voorhees" methodology?
During my late-night solitary development sessions, dark humor is often my best coping mechanism for dealing with frustration. The name is an intentional pun I created, a visceral metaphor about my quality control and my ruthless elimination of garbage code.
Imagine an unrestricted, hyperactive agentic LLM as one of those clueless, wandering campers from an 80s horror movie (Friday the 13th). The camper is full of energy, eager to explore, wants to be "creative," and is about to make a series of terrible, highly dangerous decisions in my repository (like trying to mix the Rust sqlx library, which requires a live database at compile time, with a multi-stage Docker build that doesn't have a network yet). Its supposed "creative freedom" is an imminent threat to the health of my project.
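(As an aside: the standard escape hatch for that particular sqlx trap, per its own documentation, is offline mode: run cargo sqlx prepare once against a live database, commit the generated query cache, and set SQLX_OFFLINE=true in the build stage so the Docker build never needs a network. A sketch, with illustrative paths:)

```dockerfile
# Build stage: compile the Axum backend without a live database.
# The .sqlx/ query cache was generated beforehand with `cargo sqlx prepare`.
FROM rust:1.88 AS builder
WORKDIR /app
COPY . .
ENV SQLX_OFFLINE=true
RUN cargo build --release
```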
My set of six JSON files and my unbreakable instructions.md contract are my Jason Voorhees machete.
When my AI tries to skip ahead, when I feel it getting that generative urge to spit out unplanned spaghetti code, or when it wants to invent dependencies that I haven't approved, my "JSON Voorhees" methodology steps out from the shadows of my file directory and slashes that creative freedom at the root. It slaughters technical debt before it is even born on my hard drive. It decapitates architectural hallucinations by forcing the model to mathematically justify every single variable to me in a static file.
I do not want my software agent to be "creative" with my infrastructure, in the exact same way I wouldn't want a civil engineer to be "creative" with the ratio of cement to steel in the foundation of my house. I am looking for boring, predictable, deterministic, and monolithically stable execution. By forcing my "racehorse" to travel through this dark, narrow tunnel of my six JSON files, I strip it of its improvisational instincts and transform it into the relentless engineering machine I always needed it to be.
Section 5: The Golden Rule and the Code Lockdown
Throughout my experience in the discipline of software engineering, I've noticed there is a very clear dividing line that separates a junior programmer from a systems architect. When a junior is presented with a complex problem, I see their primary instinct is to open the editor, create a main.js or main.rs file, and impulsively start typing syntax. The code flows from their hands before the structure even exists in their mind. Conversely, when I face the exact same problem today, I don't even touch the keyboard to program for the first few hours; I open a notepad, draw my architecture diagram, define my API contracts, model my database, and establish my network boundaries. I've learned the hard way that coding is not the first step of development; it is my final step, the mere translation of my robust architectural design into machine syntax.
I have found that the fundamental problem with Large Language Models (LLMs) is that, by default, they all act in front of me like the most impatient, hyperactive, and reckless junior programmer I have ever met. If I give them a prompt to build me an e-commerce clone, they immediately start spitting out React components and backend routes for me without having the slightest idea of how I am going to connect those services in my containerized environment. To tame my "racehorse" (my preferred agentic model), I realized I needed to override this generative instinct at all costs. I needed to force it to walk before I allowed it to run. And to achieve this in my local environment, I introduced the most critical concept of my "JSON Voorhees" methodology: The Code Lockdown.
Early in my solitary experiments, I tried using natural language to slow the AI down. I would write things in the chat like: "Please think step by step and do not write any code until we have finished planning together." As any engineer who has wrestled deeply with these agents knows, natural language is utterly futile for this. The AI would happily respond: "Understood, I will plan first. Here is the plan. And here are 2,000 lines of code you didn't ask for, just in case they are useful to you." Its conversational alignment pushed it to over-accommodate me by delivering the final product immediately, bypassing my controls.
To definitively hack this behavior, I discovered that I had to attack the AI in the one language its parser cannot ignore or misinterpret: my strict boolean logic embedded in a configuration file.
At the core of my working directory resides my master file: 00_orchestrator.json. I designed this file to function as the master traffic light of my state machine. And within this file, I planted a single variable that dictates the fate of my entire project: "can_execute_code": false.
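For illustration, here is a minimal sketch of what that gate can look like. Only the can_execute_code field is quoted from my actual contract; the surrounding field names, and the file names for phases 01 and 05, are hypothetical placeholders (02, 03, 04, and 06 match the files discussed elsewhere in this article):

```json
{
  "can_execute_code": false,
  "current_phase": "02_architecture_flow",
  "phases": [
    "01_requirements",
    "02_architecture_flow",
    "03_data_schema",
    "04_ui_api_manifest",
    "05_build_execution",
    "06_validation_tests"
  ]
}
```

The point is that the gate lives in machine-readable state, not in conversational memory, so the agent cannot "forget" it the way it forgets a chat instruction.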
This variable is the anchor of my master contract, my instructions.md file, which I use as an unbreakable Service Level Agreement (SLA) between myself (the engineer) and my agent. In the very first lines of my instructions, I establish my Golden Rule for it: "You are STRICTLY FORBIDDEN from creating .rs, .ts, .tsx, Dockerfile, docker-compose.yml, or any source code files until the can_execute_code field changes to true."
The psychological impact (at the level of its neural network processing) that I achieved with this restriction is monumental. By physically blocking its ability to invoke system tools focused on writing code, I deprived it of its habitual generative outlet. The massive computational energy of the model can no longer be dissipated by writing for loops or arrow functions in JavaScript for me. Instead, I managed to force that raw power to be channeled into deep analysis, reasoning, and structural planning. With a simple boolean, I forced a Role Change: my AI stops being a glorified typist and forcibly assumes the position of my Principal Software Architect.
During this lockdown period (Phases 01 through 04 of my state machine), I force my racehorse to walk at my pace. I virtually sit it down at my drafting table and demand answers to the hard questions it would normally ignore until my compiler crashed. I ask it lethal architectural questions: How is the Next.js Server-Side Rendering (SSR) component going to communicate with my Rust Axum backend if I am going to isolate both of them within an internal Docker network?
By forcing the AI to write the answer for me in my 02_architecture_flow.json file rather than in source code, I compel the model to abstractly and deliberately solve my internal Docker DNS problem. I make it formally record that the SSR will use http://axum_app:8080 (the internal container route), while the Axios client in the browser will use http://localhost:8080. If I had let the AI jump straight into coding, it would have hardcoded localhost everywhere for me, and my Next.js container would have failed catastrophically when trying to fetch the API from within its own isolated environment.
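As an illustration of what such a record can look like inside 02_architecture_flow.json (the key names here are my own hypothetical choices; the two base URLs are the ones discussed above):

```json
{
  "network": {
    "backend_service": "axum_app",
    "ssr_base_url": "http://axum_app:8080",
    "browser_base_url": "http://localhost:8080",
    "rationale": "SSR runs inside the Docker network and must use the container DNS name; the browser runs outside it and must use the published localhost port."
  }
}
```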
Similarly, by making it plan the schema in my 03_data_schema.json file, I force the AI to acknowledge the limitations of my Rust compiler. I warn it in the design stage that using compile-time macros like those in the sqlx library will cause a deadlock during my Docker Compose deployment, since my Postgres container will not be ready yet. By forcing this architectural reflection during my "Code Lockdown," the AI preemptively decides to use tokio-postgres for me, saving me hours of frustration in my terminal.
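A hedged sketch of how that planning decision might surface in Cargo.toml (the version numbers are assumptions, not taken from my repository). The design point is that tokio-postgres validates queries at runtime, so docker build needs no live database, whereas sqlx's query! macros want to connect to Postgres at compile time:

```toml
[dependencies]
# Runtime-checked Postgres driver: no database connection required at build time.
tokio-postgres = "0.7"
tokio = { version = "1", features = ["full"] }
chrono = "0.4"
axum = "0.7"
# Deliberately NOT sqlx: its compile-time macros would deadlock the image build
# while the Postgres container is still starting.
```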
Only when I ensure that the first four JSON files are documented with surgical precision, audited by me, and marked by the AI as finalized, do I decide to intervene. In an act of pure delegation, I open my 00_orchestrator.json file and flip the boolean state to "can_execute_code": true.
It is in that precise instant that I see the magic of my systems engineering come to life. I take the chains off my racehorse, but now, its track is perfectly outlined by my steel walls. My AI launches into Phase 05 (Build and Execution) with relentless speed, but it no longer has to guess data types for me, invent network ports on the fly, or improvise unstable architectures. It merely has to translate its own hermetic design into pure syntax. My Code Lockdown is simply my sacrifice of immediate gratification in pursuit of guaranteeing the absolute, long-term stability of my system.
Section 6: The "Stress Test" and the Anti-Laziness Directive
If I spend enough time reading artificial intelligence forums or watching demos of new developer tools, I notice a highly disappointing pattern: 99% of benchmarks and demonstrations revolve around building a "To-Do list" application in React, or spinning up a single-file Express server in Node.js that returns a static JSON.
For me, as a software engineer working on real production systems, these examples are not only trivial, but dangerously misleading. I know that a modern LLM has seen the code for a React To-Do list millions of times in its training dataset. Generating that code for me doesn't require any architectural reasoning; it's a mere exercise in statistical memory retrieval. If I truly wanted to evaluate the capacity of my agentic model (my "racehorse") and validate if my "JSON Voorhees" methodology actually worked, I needed to drag it out of its comfort zone. I needed to subject it to a true "Stress Test" in an environment that I designed to be deliberately hostile.
For this reason, I intentionally designed the deployment of my NARP Stack.
NARP is the acronym for Next.js (Frontend), Axum in Rust (Backend), Redis (Cache and Sessions), and PostgreSQL (Relational Persistence). Asking an AI to stand up this entire ecosystem for me from scratch and orchestrate it within an internal Docker Compose network is my equivalent of a final exam in systems architecture.
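To make that topology concrete, here is a sketch of the kind of docker-compose.yml the agent must converge on. Everything except the axum_app service name, the 3000/8080 ports, and the four stack components is my assumption (service names, image tags, env vars):

```yaml
services:
  nextjs_app:
    build: ./frontend
    ports:
      - "3000:3000"
    environment:
      # SSR fetches must use the internal Docker DNS name, not localhost.
      INTERNAL_API_URL: "http://axum_app:8080"
    depends_on:
      - axum_app
  axum_app:
    build: ./backend
    ports:
      - "8080:8080"   # published so the browser's Axios client can hit localhost:8080
    depends_on:
      - postgres
      - redis
  redis:
    image: redis:7-alpine
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: example
# All services share Compose's default network, which is what makes
# the axum_app hostname resolvable from inside the Next.js container.
```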
I chose Rust for a somewhat sadistic but entirely necessary reason: its compiler shows no mercy to me or the AI. Unlike Python or JavaScript, where the AI can hallucinate a variable name, ignore strict typing, or invent a method that will fail silently on me at runtime, Rust possesses the infamous Borrow Checker and an unbreakable type system. If the AI makes a mistake handling the asynchronous state of a PostgreSQL connection using tokio, or if it forgets to do an explicit cast from a decimal to a float (f64) for me, the code simply will not compile. The Docker container will crash right in my face with an Exit Code 1. To me, Rust is the perfect antidote to "Generative Technical Debt" because it forces my AI to be mathematically precise; I leave it absolutely no room to improvise spaghetti code.
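A tiny, self-contained illustration of that strictness. Storing the price as integer cents is my assumption for the example (the real bug involved a Postgres DECIMAL): Rust will not implicitly coerce an integer into the f64 the API response needs, so the conversion must be written out or the build dies.

```rust
// `let price: f64 = cents;` would be a compile error in Rust: there is no
// implicit numeric coercion. The cast has to be explicit.
fn price_as_f64(cents: i64) -> f64 {
    cents as f64 / 100.0
}
```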
Additionally, my orchestration with Docker Compose introduces a critical networking problem. The AI must understand that my Next.js frontend will communicate with my Rust backend via localhost:8080 from my client's browser, but it must use the internal Docker DNS (axum_app:8080) when performing Server-Side Rendering. Getting it to stand up this stack autonomously is a titanic challenge for any agent. And this was where I hit the second massive hurdle of LLMs: The Laziness Syndrome.
I noticed that even the most advanced agentic models I tested, after generating the source code and the docker-compose.yml file for me, have a natural tendency to stop, emit a polite message to me in the terminal, and say: "I have finished generating the files. Now, please open your terminal and run docker-compose up -d --build to launch your project. Let me know if you have any questions!".
I call this phenomenon my "Deployment Gap." The AI does the intellectual heavy lifting, but refuses to get its hands dirty in my terminal to validate if its own code actually works. As the architect of my system, I refused to allow this. An architectural blueprint is useless to me if the building collapses the moment I lay the first brick.
To combat this, I injected into my contract (instructions.md) and into Phase 06 of my state machine (06_validation_tests.json) a relentless mechanism I call The Anti-Laziness Directive.
My instruction is explicit and non-negotiable: "You are an autonomous agent. You are STRICTLY FORBIDDEN from leaving terminal commands for the user (me) to execute. You MUST execute the deployment commands yourself to validate your work. If a command fails, you MUST read the error, apply a patch to the code, and self-heal until successful."
With this directive activated, I saw my workflow undergo an extraordinary mutation. My AI agent went from being a blind code generator to becoming my full-fledged DevOps Engineer. It no longer stops upon creating the files for me; it initiates a Self-Healing Loop.
Phase 06 forces my agent to follow a strict audit sequence, executing real bash commands on my local machine:
1. Host Validation: First, it fires docker info. If my Docker daemon isn't running, the AI logically stops and asks me to turn it on, rather than attempting to build blindly.
2. Build and Deploy: It executes docker-compose up -d --build. This is where my Rust compiler usually screams and throws kilometer-long errors at it if there are dependency mismatches or asynchronous blockages.
3. Forensic Analysis and Patching: If the previous step fails (which is normal on my first attempt), the AI captures the stderr (standard error output), reads the compiler logs, opens the defective .rs or .toml files, injects the patch, and loops back to Step 2 entirely on its own.
4. Trial by Fire (CORS and Endpoints): Once my containers are up (docker ps), the AI doesn't consider the job done. I forced it to run a curl -I -X OPTIONS command, simulating the frontend, to verify if it correctly configured the CORS headers in my Axum backend.
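Stripped of the AI, the audit sequence above is just a retry loop around a process runner that captures stderr. A minimal Rust sketch of that control flow, under clearly labeled assumptions: the run and self_heal helpers and the retry budget are my illustrative inventions, and apply_patch stands in for the agent actually editing the defective files.

```rust
use std::process::Command;

// Run a command; return stdout on success, or the captured stderr on failure.
// This mirrors the "Forensic Analysis" step: read the error instead of stopping.
fn run(cmd: &str, args: &[&str]) -> Result<String, String> {
    let out = Command::new(cmd)
        .args(args)
        .output()
        .map_err(|e| e.to_string())?;
    if out.status.success() {
        Ok(String::from_utf8_lossy(&out.stdout).into_owned())
    } else {
        Err(String::from_utf8_lossy(&out.stderr).into_owned())
    }
}

// Skeleton of the Self-Healing Loop: build, read the error, patch, retry.
fn self_heal(max_attempts: u32, apply_patch: impl Fn(&str)) -> bool {
    for _ in 0..max_attempts {
        match run("docker", &["compose", "up", "-d", "--build"]) {
            Ok(_) => return true,                // containers are up
            Err(stderr) => apply_patch(&stderr), // agent injects a fix, loops back
        }
    }
    false // retry budget exhausted: escalate to the human
}
```

The finite max_attempts is the one guardrail I would never skip: without it, an agent that misdiagnoses its own error can burn compute forever.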
By forbidding my AI from delegating execution, I force it to face the consequences of its own design. If its code is garbage, it will be the one spending the next hour fighting my terminal to fix it. My Self-Healing Loop guarantees that, by the time my state machine marks Phase 06 as "FINALIZED", I won't just have a handful of text files with empty promises, but a real, compiled, orchestrated system serving my data over port 8080 with an HTTP 200 OK.
Section 7: Autopsy of a Log (The Irrefutable Proof)
In my world of systems engineering, marketing promises and whiteboard architecture diagrams are incredibly cheap to me. I've learned the hard way that the only thing that truly proves the viability of a methodology or a tool is the terminal. My logs do not lie. Throughout this article, I have argued to you that my "JSON Voorhees" methodology converts a hyperactive Artificial Intelligence agent into a methodical and predictable executor. The time has come to present to you the forensic evidence of my hour-and-a-half-long execution run.
Upon activating Phase 06 (The Anti-Laziness Directive), I left Minimax (operating through my Claude Code environment) completely alone with the code it generated for me, my Docker daemon, and my Rust compiler. I ordered it to stand up the cluster, validate the routing, and absolutely not stop until it got me an HTTP 200 OK. What happened next on my screen was not a magical, flawless deployment on the very first try; it was a brutal, chaotic, and beautiful autonomous debugging session. My AI failed, read its own errors in standard output (stderr), deduced the root cause, and applied iterative patches.
Let's dissect three critical examples of course correction that I extracted directly from my execution log, which perfectly demonstrate this self-healing loop.
- Dependency Hell (Upgrading the Rust Compiler) Any solitary developer working with compiled ecosystems knows that version management is an absolute minefield. During the first attempt to build my axum_app container, I saw that the AI had drafted a Dockerfile that started from the base image rust:1.75-bookworm. However, in the Cargo.toml file, the agent had included highly modern dependencies for me, notably the tokio-postgres and chrono libraries. When executing docker-compose up -d --build, my Rust compiler threw a fatal error. A typical conversational AI would have halted right here and asked me to resolve the version conflict myself. But my agent, bound by my strict contract, read the error and reasoned explicitly in the terminal: "The Rust version is too old. Let me update to a newer version". Completely autonomously, it opened the Dockerfile for me and modified the base image to 1.85-bookworm. It executed the build again, but the compiler complained once more, this time being highly specific about the requirements of my time macros. I watched the AI iterate a second time with astonishing precision: "The chrono package needs a newer Rust. Let me use Rust 1.88". It patched the Dockerfile for me one more time, completely resolving the dependency mismatch without requiring a single keystroke from me.
- Wrestling the Borrow Checker (Redis Client Refactoring) As I know all too well, Rust is famous for its Borrow Checker, a relentless memory management system that simply will not compile for me if it detects potential race conditions or invalid references. When my AI attempted to implement the Redis client to manage my shopping cart, it tried to use an asynchronous ConnectionManager wrapped in an Arc. The compiler aggressively rejected the code for it due to shared mutability issues. I watched as the AI attempted to make minor syntactic patches, even making typographical errors due to its iterative haste (writing things for me like data,await?;). After a couple of compilation failures documented in my logs, my agent demonstrated cognitive capacity far beyond simple "text prediction." It recognized that its own architectural approach to the connection state was fundamentally flawed and declared: "Let me simplify the Redis client code to fix these issues". Instead of continuing to force broken code on me, it completely rewrote the RedisClient struct for me. It downgraded from a complex ConnectionManager to using a simple redis::{Client, AsyncCommands} and wrapped the connection for me in a traditional Arc<Mutex<Connection>> pool. It fundamentally understood that, in order to satisfy Rust's strict concurrency safety rules, it needed to simplify the management of my state.
- The Runtime CORS Crash To me, the most fascinating error did not occur during compilation, but rather at runtime. The Rust code compiled perfectly, my containers spun up, but the AI, fulfilling its audit directive, attempted to send a curl request for me to the products endpoint and failed. By inspecting the live logs of my container with the docker logs narp_backend command, it discovered a server panic: "thread 'main' panicked at... Invalid CORS configuration: Cannot combine Access-Control-Allow-Credentials: true with Access-Control-Allow-Origin: *". I know that Rust's tower-http library adheres strictly to W3C web security standards, which prohibit me from using wildcards (* or Any) in origins or headers if credentials (like cookies or sessions) are allowed. My AI grasped this network security concept. It opened the src/main.rs file for me and patched the CORS middleware. It stripped out the use of Any and explicitly mapped the origin "http://localhost:3000" and the required headers (Content-Type, Authorization, X-Session-ID) for me. After rebuilding and restarting the container, my system responded successfully. Furthermore, it independently solved a database deserialization issue for me by applying a DECIMAL to float cast (price::float8) directly within the tokio-postgres SQL queries.

The Verdict: My Containment and Governance

Why didn't my AI hallucinate during this intense hour of debugging? Why didn't it decide to throw Rust in the garbage and try rewriting the backend in Node.js for me, as LLMs often do when they get frustrated with me? The answer is my State Machine. Throughout its entire debugging cycle, the AI was strictly confined by the schema I dictated to it in 02_architecture_flow.json and 04_ui_api_manifest.json. It knew it had to fix the Redis client and the Axum CORS because I had already declared those contracts immutable to it in my previous phases.
My "JSON Voorhees" methodology did not magically make the AI inherently smarter; it made it accountable to me. It debugged the implementation based strictly on an architectural design that it had documented for me itself, and which I no longer permitted it to alter. For those who wish to audit this process from my trenches with their own eyes, I left the raw, unedited logs.txt file from this session available in the GitHub repository of this project, right alongside the video of my execution.
Conclusion: From Prompt Engineer to Systems Engineer
Over the last few years, I have seen from my trench how the tech industry has become feverishly obsessed with a new and supposedly revolutionary job title: the Prompt Engineer. At first, I confess they made me believe too that the future of my software development career consisted of learning to "whisper" to Artificial Intelligence. I spent hours trying to discover the exact combinations of adjectives, adverbs, and polite requests ("please, act as an expert software architect and…") to get the neural network to generate the perfect code for me.
Today, sitting in front of the brutal complexity of the production systems I maintain every day, I can declare with absolute certainty that Prompt Engineering, as it was sold to us, was just a transitional phase. To me, it is an evolutionary dead end.
I have proven through errors and server crashes that software development with AI is no longer a literature contest. Writing increasingly long and detailed prompts into a chat window to an LLM is an unsustainable strategy that simply doesn't scale beyond a small isolated script or a visual React component. When I face my real infrastructures - where I have to balance concurrency in Rust, manage my connection pools to relational databases in the cloud, configure my internal Docker networks, and guarantee the security of my endpoints - natural language proves to be too fragile, ambiguous, and disgustingly prone to contextual amnesia.
I discovered that my true value as a human engineer in this new era of Agentic Models no longer lies in the speed at which I can spit out syntax, let alone in my ability to patiently persuade a dogmatized machine. My absolute value now lies in my capacity to design the unbreakable limits of the system. I have returned to the purest, hardest, and oldest foundations of Computer Science. For me, the future does not belong to the Prompt Engineer; it belongs to us, the Systems Engineers.
As a Systems Engineer, I no longer ask the machine to be "creative" with me; I demand that it be obedient. I no longer blindly trust the goodwill of my local LLM; I establish unbreakable execution contracts for it. I understood, thanks to my experience handling data, that AI is simply a brute-force engine, a chaotic and massively overloaded Data Lakehouse. My job now is to build the chassis, the transmission, and the brakes (the structured Data Warehouse) to prevent that massive engine from tearing my project to pieces at the first opportunity.
The "JSON Voorhees" methodology that I have broken down for you in this manifesto is the crystallization of this survival philosophy. By using my own rigid state machine, I managed to force the segregation of duties I so desperately needed. I completely isolated the architecture from the code generation. I used my "Code Lockdown" to decapitate "Generative Technical Debt" on my hard drive before my Rust compiler even blinked. And finally, by injecting my "Anti-Laziness Directive", I transformed a simple, skittish chatbot into a relentless DevOps agent, capable of reading standard error output (stderr), understanding the screams of the Rust Borrow Checker, and self-healing in the early hours of the morning until it crossed my finish line.
With all this, my goal is not to use AI to replace me as a programmer; it is about using it to elevate myself to the position of Chief Architect of my own projects. I have accepted that AI is my hyperactive bricklayer, but I, with my bottle of Coca-Cola next to the keyboard, am still the master builder who signs off on the blueprints.
Call to Action (CTA)
But I know very well that, in this trade, theories and manifestos are worth absolutely nothing if they do not survive direct contact with the terminal. If you, like me, are sick of AI-generated spaghetti code, mid-project hallucinations, and the infuriating laziness of commercial models that ask you to run the commands yourself, I invite you to test this methodology on your own machine. I have opened my stable doors and I am handing you my racetrack.
In my official repository you will find the complete "machete" I designed: the master contract instructions.md, my 6 blank .json files ready to be filled by your agent, and the project.md file where you can define your own Stack (whether it is NARP, a traditional MERN, or any exotic architecture you wish to put through the trial by fire). I have also included the raw, real, and unedited logs.txt file from my own execution, so you can audit with your own eyes the step-by-step self-healing loop I described in this article.
📂 Official GitHub Repository (JSON Voorhees): Clone the methodology and the blank files here
Furthermore, to empirically back up every claim I've made throughout this text, I have documented the execution of my agentic model (Minimax) in an ASMR Vibe Coding format. It is a raw video, straight to the terminal, with no voiceovers and the proper background sound, where you can watch in real time how the AI designs, generates, fails miserably, reads the Docker logs, and self-heals all on its own until it manages to stand up the NARP architecture right in front of you.
📺 Forensic Evidence on YouTube (Self-Healing Process): Watch the complete autonomous deployment here
The era of treating AI as a simple "glorified autocompleter" is over. It is time for us, the engineers, to govern the chaos once again. Download my files, spin up your favorite CLI powered by a high-performance agentic model, configure its limits, and put the machine to work while you watch the logs.