Marcelo Martins

The Cross-Agent Development Method

Note: Resist the temptation to summarize this document. A summary will give you the general idea but will leave out a lot of useful information. Reading it takes about 10 minutes.


A NEW MOMENT

We are in 2026, and it has never been this fun to develop software. I have been working in this field for more than 30 years, and it feels like endorphins have never been so closely tied to software development. Last year was marked by the mass adoption of artificial intelligence agents that generate code. Software development will never again be what it was before 2025. Along with the changes came the promises. "In 12 months, programmers will no longer exist". "In 6 months, all code will be generated by artificial intelligence". Just commercial talk from vendors desperate to give their products some importance in order to attract investors.

The truth is that developing software now involves far more planning and verification than typing code. Code is one part of the process, a large part, often a boring one. And that part can now be agile and fast. So the friction involved in doing something complex has become much smaller. But building software well still requires planning and verification, and the software remains the responsibility of the developers; on that point, nothing has changed.

In April 2025, Dario Amodei, CEO of Anthropic, the company behind the Claude models, published a piece on his blog called "The Urgency of Interpretability" [https://www.darioamodei.com/post/the-urgency-of-interpretability], in which he says plainly: "we do not understand how our own AI creations work". I found that fantastic; I was excited when I read it. With neural networks, that is even understandable: weights, temperature during inference, probabilities. But the feeling I had was that not understanding exactly how the models work means there is no known limit; everything seems possible, and we do not know how far it can go. That also means chaos.

And that is where we are. All code is going to be generated by artificial intelligence, right, but how exactly is that going to work? Nobody knows. What is the best process for generating the best code? Nobody knows. How do I ensure that the final result will be reliable, secure, maintainable, scalable, efficient, and aligned with the original intent? Well, nobody knows. At least for now.

What I intend to show here is the cross-agent method I developed based on my research and tests. I have not seen people talking about this, and I believe organizing these ideas can help a lot of people improve the final result of their work.


THE CROSS-AGENT METHOD

Generative language models have become very good at generating code and ready-made systems. The internet is full of examples of excellent results generated from just a single prompt. The failures of last year are becoming increasingly rare. And "copy-and-paste" errors hardly exist anymore. It is truly incredible to watch agents generate code, run the build on their own, find the errors, and fix them until everything is correct. They really are excellent, but they are not perfect (at least not for now).

However reliably Claude Code generates correct, compilable, functional code, and however well GPT Codex makes every change I need and delivers what I asked for, there are always, always loose ends.

The most common problems in generated code, in my experience, are security flaws. Endpoints left open that should never have been reachable from the internet. Data returned to the screen unnecessarily, exposing more than it should. And notice: none of these problems makes the system stop working or behave incorrectly; when you use the system, everything works, and that is exactly where the danger lies. Systems generated by artificial intelligence are making data leaks grow exponentially. This needs to be avoided.
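As an illustration of the data-exposure problem (the record shape and field names below are invented for the sketch), an explicit allow-list serializer avoids returning raw database records to the client:

```python
# Hypothetical sketch: an AI-generated handler often returns the raw
# database record, leaking fields the client never needed.
user_record = {
    "id": 42,
    "email": "ana@example.com",
    "password_hash": "$2b$12$...",   # should never leave the server
    "is_admin": False,
    "display_name": "Ana",
}

def serialize_user_unsafe(record: dict) -> dict:
    # Works perfectly in every manual test -- and leaks everything.
    return dict(record)

# Safer pattern: an explicit allow-list of fields for the response.
PUBLIC_FIELDS = ("id", "display_name")

def serialize_user(record: dict) -> dict:
    return {field: record[field] for field in PUBLIC_FIELDS}
```

The system behaves identically in a manual test either way, which is precisely why only a review, human or cross-agent, tends to catch it.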

That was my motivation for looking for unexplored solutions, and it led to an interesting discovery. When a model is in code-generation mode, the entire neural-network process is focused on generating good code that produces what was requested. As I said, the models do that well; it is just not enough. But when the models shift into evaluation mode, they can go much further than generating code. I realized that critical analysis of code can be far more accurate than generation of code. When a language model performs a critical analysis of something that is already finished, it takes a "broader view": it seems to worry not about the next token, but about the tokens that are missing, or the ones that are there but should not be.

This makes it possible to generate safer code, with fewer critical flaws, and to make development much more practical, truly taking advantage of code generation by artificial intelligence to speed up the final result. The cross-agent method is the generation of process artifacts and code by one language model, and the critical evaluation by another language model. Always, ALWAYS, there are corrections to be made, no matter which model generated the code. There are always errors; there are always security gaps.

And I deliberately say artifacts. I am not talking only about checking code. Claude Code and GPT Codex have great code review tools. But I am not talking only about code review. As important as code review is, reviewing the planning is just as important. In fact, reviewing the planning may be even more important for achieving a high-quality result.

It is also very important to emphasize that cross-agent validation does not eliminate the need for human code review. Code review is still necessary and is part of high-quality software development. The problem is when the artificial intelligence agent spends 1 hour and 20 minutes working and generates 15,000 lines of code. Do you really want to do code review on a 1,500-line interface file?

Code review once let me discover that an application where I was testing code generation only validated credentials at the /login endpoint; every other endpoint in the authenticated area could be called without credentials and would return all the data. The cross-agent method could have warned me right at the beginning and saved me time. When I asked Claude Code for a new feature, which it delivered correctly, and GPT Codex quickly warned me that the new endpoints were in a controller without an authentication attribute, that is exactly what I am talking about.
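A minimal sketch of that class of flaw, in plain Python with a hypothetical request shape (no real framework assumed): a deny-by-default decorator makes authentication the default behavior instead of something each new endpoint must remember to opt into.

```python
from functools import wraps

def require_auth(handler):
    """Reject any request that carries no credentials before the handler runs."""
    @wraps(handler)
    def wrapper(request: dict):
        if not request.get("user"):           # no credentials -> reject
            return {"status": 401, "body": "unauthorized"}
        return handler(request)
    return wrapper

@require_auth
def list_orders(request: dict):
    # Without the decorator, this handler would happily return all the
    # data to an anonymous caller -- exactly the flaw described above.
    return {"status": 200, "body": f"orders for {request['user']}"}
```

The safer variant of this idea is to enforce the check globally (a middleware or base controller) and mark the few public endpoints explicitly, so a forgotten attribute fails closed instead of open.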

That is the general idea of the cross-agent method: generate the artifact in one model, and use the critical-analysis mode of a second model for corrections and improvements. Next, I am going to go deeper into the development steps, and how I use the cross-agent method to make the models work for more than an hour and deliver the expected result, developing high-quality software with the best advantages of artificial intelligence models.

It is not within the scope here to talk about governance, where this will run, how to deploy, pipeline steps, or even testing methodology. I am talking about generating quality code, and that is the focus. You are a software developer and you already understand those other details (or you can ask an artificial intelligence to help you with that).


HOW I START

Getting straight to the point: starting a project will never begin with an artificial intelligence prompt. I will never hand the model a blank page and let it do whatever it wants. I will always guide it in the clearest and most restrictive way possible. Starting a project today is the same as it was years ago, but with the advantage that it is easier to resolve doubts because artificial intelligence exists to help us.

So I am going to use artificial intelligence to clear up doubts, explore options, understand new concepts, and from there, I am going to create the foundation of the project. I am going to choose where the project will run. Today there is an endless number of frameworks and languages, and new ones emerge every day. You need knowledge and you need to understand the choices. I choose, and I create the initial project; until I see the empty project running, nothing is generated by artificial intelligence.

Besides the framework and all the details involved in starting the project, one point that usually takes me more time at the beginning is understanding how the application's security will be handled. Bearer token, stateful session, HMAC, passkeys, MFA—there are several options that are part of software security and need to be defined before the project begins. It is necessary to understand this part and make the right choices at the beginning so the software has quality and security in the end.
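To make one of those options concrete, here is a minimal sketch of HMAC-signed tokens using only the Python standard library. The secret handling and the payload format are illustrative, not a production design:

```python
import hashlib
import hmac

SECRET = b"server-side-secret"   # assumption: loaded from secure config, never hardcoded

def sign(payload: str) -> str:
    # Append an HMAC-SHA256 tag so the server can detect tampering.
    mac = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"

def verify(token: str) -> bool:
    payload, _, mac = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels on the comparison.
    return hmac.compare_digest(mac, expected)
```

Whatever scheme you choose, the point is the same: the decision has to be made, and understood, before the first line of generated code.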

The next step is to start the documents with the software's definitions and rules. Usually, this starts with AGENTS.md, which will be a living document and will be updated constantly throughout the software's development:

  • The purpose of the software
  • The architecture decisions
  • Resources available to the system (database, etc.)
  • Explanation of the chosen frameworks
  • Links to online documentation
  • Instructions for code generation
  • How to build, run, and test the software
  • Explain what not to do (as important as what to do)
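
Put together, those items might yield a skeleton like this (the project, framework, and commands are invented for the example):

```markdown
# AGENTS.md

## Purpose
Invoice management API for small accounting firms.

## Architecture
Monolith: REST API plus server-rendered admin. PostgreSQL 16.

## Frameworks
FastAPI (chosen for async support); SQLAlchemy 2.x.
Docs: https://fastapi.tiangolo.com/

## Build / run / test
- `make run` starts the dev server
- `make test` runs the test suite

## Code generation rules
- Be critical of any prompt; flag anything incomplete, risky, or improvable.
- Do not create documentation files unless explicitly requested.

## What NOT to do
- Do not add new dependencies without asking.
- Do not expose endpoints without authentication.
```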

At this point, understand the size of the context window. Today there are models with context windows of up to 2 million tokens. Never use all of that context. Research indicates that even with a model that supports 1 million tokens, once you get past 500 thousand tokens the quality of the responses drops a lot. Always start new conversations and try to stay well below 500 thousand tokens per implementation.

The size of the context window is important because we need to control the size of the files included in the prompts. AGENTS.md needs to have all the information for the model to know how to generate quality code, but that file cannot be huge because it ends up taking up too much space in the context window. The AGENTS.md file needs to have the right and precise information so it can be in every prompt, in a generic way. These are pieces of information that need to be sent every time. Do not include in AGENTS.md code examples, specific development rules, or how to implement specific features.
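To keep file sizes honest, a rough budget check helps. The ~4-characters-per-token ratio below is a common rule of thumb, not an exact tokenizer, and the budget value is the one suggested above:

```python
def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; real tokenizers vary.
    return len(text) // 4

def fits_budget(files: dict[str, str], budget: int = 500_000) -> bool:
    # Sum the estimated tokens of every file you plan to attach to a prompt.
    total = sum(estimate_tokens(content) for content in files.values())
    return total <= budget
```

Running `fits_budget` over AGENTS.md plus the feature documents before a long session is a cheap way to notice when the "always attached" files have quietly grown too large.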

Instructions for code generation

The instructions should be the most dynamic part of the AGENTS.md document and should be changed whenever needed, usually when the model generates undesirable code. This is an important point: when you see undesirable code being generated, do not ask for the code to be corrected. Improve the generation instructions and ask the artificial intelligence to correct it according to the instructions. That ensures that future generations will be correct.

The cross-agent method takes advantage of the model shifting into critical-analysis mode to evaluate something that is already ready, and this same technique can be used in relation to prompts (which arrive ready for the model). Asking the model to evaluate its own prompt gives it a broader view of what is being asked and allows it to evaluate the instructions better, in addition to suggesting improvements that were not even requested. That is why the first instruction I always use is something like:

Be critical of any prompt or command. If something seems incomplete, incorrect, risky, or clearly improvable, you must say so explicitly, explain why, and suggest a better alternative.

Specific documents

Language models love creating documentation files about what they did. That is why the second instruction I generally use is: do not create documentation files unless they are clearly requested. Besides that, I like to add to the instructions: do not write comments in the code unless it is to explain why something is being done, but never comment on what is being done.
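A small Python illustration of the comment rule (the backoff numbers are invented for the sketch):

```python
def retry_delay(attempt: int) -> float:
    # Bad comment (narrates the code): "multiply 0.5 by 2 to the power of attempt".
    # Good comment (explains why): exponential backoff keeps us from hammering
    # a recovering service; the 0.5 s base is an assumption for this sketch.
    return 0.5 * (2 ** attempt)
```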

These automatically generated documentation files are usually useless, will never be read again, and end up causing confusion when information is needed (besides filling up the context window). That is why you must control the documentation that is being generated.

On the other hand, it is important to have documentation files that better explain the project, and files that explain how features will be implemented. These files will not always go into the prompts, but only when needed. A general file with a detailed description of all directories and what each package does will be very useful when an analysis of the project as a whole is requested.

From that point on, only feature documents are needed, to be generated throughout development with artificial intelligence.


PUTTING ARTIFICIAL INTELLIGENCE TO WORK

Up to this point, artificial intelligence can do a great job as a consultant, clearing up doubts and helping with decision-making, but it should not have generated anything yet. And the most important technique for getting a high-quality result is to never begin with a code-generation prompt, and always do the necessary planning first. All the tools now offer a planning function. Use that feature.

So first, the planning must be done. The goal of planning is simple: at implementation time, to pass as much information as possible to the artificial intelligence and ensure that it has no doubts and does exactly what we need. Think of planning mode as the artificial intelligence helping you build a prompt that it will later use itself. Here, too, you can use the Skills feature offered by the tools; very likely there is already a Skill that can help with what you are going to build.

Planning should happen through several iterations, with multiple messages exchanged between the developer and the agent. I like to start with:

Understand this project in @AGENTS.md

This will ensure that the model reads the rules, since some tools ignore AGENTS.md, and after that I follow with something like:

Plan the implementation of X feature, which will be used for X purpose, include tests in X way, plan while thinking about X use in X context.

Notice that this first prompt to begin planning can be quite vague. Being vague may even help you refine it afterward, because the artificial intelligence may think of things you were not considering. And since this is only planning, nothing will be built yet; they are just ideas, a brainstorm to be refined afterward.

With that, the model will generate an implementation plan. And then the fun part begins: you will read the whole plan and start refining it: "do not use services that way, prefer this other approach", "Use database A instead of B", "Use JavaScript library X instead of Y". These are examples. With each correction, the artificial intelligence refines the plan and shows the updated plan. Require the plan to have detail down to the class level, the database level, or whatever is necessary.

And this, for me, is the most important process in software development in the era of artificial intelligence with generative models. A well-made plan will make you worry less about implementation and, above all, it will give you certainty that it was done the right way. Spend hours on planning, spend days. The important thing is to have a well-crafted plan.

When you feel the refinement is done and you can no longer think of any improvement, it is time to put the cross-agent method into action. Ask the artificial intelligence to write the plan to a file, with no implementation. Then open another model in the same project and ask for a critical analysis:

Understand this project in @AGENTS.md. Attached is the implementation plan for such-and-such thing. Read it, understand it, and perform a critical analysis of the plan. Look for critical points of attention, missing mandatory items, and security flaws. Think deeply and be thorough. If necessary, look up information online. Do not implement anything and generate a report at the end. Plan @IMPLEMENTATION_PLAN.md

A prompt like that, in an artificial intelligence model different from the one used to generate the plan, will be extremely useful, and very probably (in my experience) it will say something like:

The plan is good, quite complete, focuses on all the important aspects, but it needs to be improved in the following points:

This process generates a list of improvements and security validations for the original plan. In my experience, the second model has never failed to point out areas for improvement. And they are suggestions; you can ask the second model to improve the plan or not. Your choice.

From there, the iterations begin again: "improve this point according to your suggestion". "Improve the plan on this point based on your suggestion, but do it differently". And so on. Refine, refine again, until you think it is good.

Try not to leave any loose ends for the artificial intelligence to decide on its own. When you do not say exactly what needs to be done, the artificial intelligence fills that gap with what it thinks is right. It may work the first time, but no one can guarantee it will work the next time. There is a running joke that systems generated by artificial intelligence start with much more than they need, and that refinement consists of removing the unnecessary features. Do not let that happen.


IMPLEMENTING

Now that you have a solid, complete, and secure plan, you are ready to ask for code generation. With a very complete plan, the implementation does not even have to be done with the best model; it is worth testing, and that may save money. With the plan document in hand, just type into your preferred model:

Understand the project in @AGENTS.md. Implement the plan @IMPLEMENTATION_PLAN.md

That is it, as simple as that. Depending on the size of the implementation, the artificial intelligence may finish quickly or take hours.

After it is implemented, I like to test whether it worked before doing the code review, just to get that nice feeling of: it implemented everything with a two-line prompt. :)

Usually, some detail is missing at this point. A configuration, an API key, a migration that is not perfect. But these are small details. What I have learned is that this is the best possible experience for software development with artificial intelligence.

If an implementation goes wrong, evaluate whether something should change in AGENTS.md, or in the plan, or whether a simple prompt will fix it. Usually, the more generic the problem, the more it is worth evaluating. The criterion should be: can this error happen again? If yes, the fix should go into a more generic document.

The code review

As I said, I do code review on most of the generated code. And where security is involved, my scrutiny is very rigorous. You should do that too; after all, it is your job to deliver quality code, whether it was typed or generated by artificial intelligence. If something goes wrong when the software is in production, it is your responsibility.

But there are some things I really do not care that much about, especially when it comes to the interface. And in that case, use the cross-agent method again. You always have the option of going back to the alternate model and asking:

Understand the project in @AGENTS.md.
The implementation of @IMPLEMENTATION_PLAN.md has been completed and it is ready and working. Perform a critical analysis of the implementation code, looking for points of attention and security flaws.
Be thorough, looking for convenience implementations in place of correct implementations.

You can ask the artificial intelligence to act as a security consultant, to focus more on that point. You can ask it to look up online and up-to-date information about the implementations and choices made, to have an extra layer of validation for your choices.

A major problem with generated code is convenience implementations. Artificial intelligence generates code that works, you test it and it works, but it is full of unnecessary IFs, repeated logic, and has twice as many lines as it should per file. We are talking about code quality.
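A toy example of what a convenience implementation looks like next to its cleaner equivalent (the countries and rates are made up); both return exactly the same values:

```python
def shipping_cost_bloated(country: str, weight: float) -> float:
    # Typical generated code: the same branch logic copy-pasted per country.
    if country == "BR":
        if weight <= 1:
            return 5.0
        else:
            return 5.0 + (weight - 1) * 2.0
    elif country == "US":
        if weight <= 1:
            return 8.0
        else:
            return 8.0 + (weight - 1) * 2.0
    else:
        if weight <= 1:
            return 12.0
        else:
            return 12.0 + (weight - 1) * 2.0

BASE_RATE = {"BR": 5.0, "US": 8.0}   # made-up rates; 12.0 is the fallback

def shipping_cost(country: str, weight: float) -> float:
    # Same behavior: base rate plus 2.0 per kg above the first.
    base = BASE_RATE.get(country, 12.0)
    return base + max(weight - 1, 0) * 2.0
```

Both pass every manual test, which is why this kind of bloat survives unless a reviewer, human or cross-agent, is explicitly asked to look for it.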

Maybe no one will ever touch this code, because changes will also be made by artificial intelligence, but bad code has an execution cost. Worry about that.


DOCUMENTATION FOR ARTIFICIAL INTELLIGENCE

After a feature has been implemented correctly, it is very common for there to be other features in the software that need a similar implementation. The simplest examples are CRUD-style forms; software usually has several of them, and you will want similar implementations, with similar behavior. So documentation of the implementation must be generated. This is not the implementation plan; it is an X-ray of what was done, including the logic that was implemented, resource details, and code snippets. You should ask the artificial intelligence to generate this documentation, "including everything that will be necessary to replicate the feature in a future implementation."

Another necessary piece of documentation that I use is documentation of the project as a whole. It is more detailed documentation than AGENTS.md, with general details, but more in-depth. Sometimes I have system projects with several smaller projects that are part of something bigger. So I lay that out in a single document so that when the artificial intelligence is going to perform a general analysis, I include that file in the context to provide that overall view for the analysis.

What I include in this more generic and detailed documentation:

  • More architecture details
  • Physical directories
  • How the screens work
  • API contracts
  • Data model
  • Error/validation patterns
  • Coding rules

CONCLUDING

Generative models are still going to evolve a lot. We probably have no idea what is coming. It feels like we are at the beginning of a great revolution. We need to keep ourselves constantly up to date with the new developments, and at the same time ignore the noise, and ignore the alarmist vendors. Using artificial intelligence in 2026 to develop commercial software is no longer optional; it is the only path, there are no alternatives. For us, old-guard developers and newcomers who are just starting out, all that remains is for us to be eternal learners.

Happy studying.

Marcelo Martins
marcelomartins@gmail.com
minhainternet.com/marcelomartins
