Simon Massey

Augmented Intelligence (AI) Coding using Markdown-Driven Development

TL;DR: Deep research the feature, write the documentation first, go YOLO, work backwards... Then magic. ✩₊˚.⋆☾⋆⁺₊✧

In my last post, I outlined how I was using Readme-Driven Development. In this post, I will outline how I implemented a 50-page RFC over the course of a weekend.

My steps are:

Step 1: Design the feature documentation with an online thinking model
Step 2: Export a description-only "coding prompt"
Step 3: Paste to an Agent in YOLO mode (--dangerously-skip-permissions)
Step 4: Force the Agent to "Work Backwards"

Step 1: Design the feature documentation with an online thinking model

Open a new chat with an LLM that can search the web or do "deep research". Discuss what the feature should achieve. Do not let the online LLM write code. Create the user documentation for the feature you will write (e.g., README.md or a blog page). I start with an open-ended question to research the feature; that primes the model. Your exit criterion is that you like the documentation or promotional material enough to want to write the code.

To exit this step, have it create a "documentation artefact" in markdown (e.g. the README.md or blog post). Save that to disk so that you can point the coding agent at it.

If you don't want to pay for a subscription to an expensive model, you can install Dive AI Desktop and use pay-as-you-go models that are much better value. Here is a video on setting up Dive AI to do web research with Mistral:

Step 2: Export a description-only "coding prompt"

Next, tell the online model to "create a description only coding prompt (do not write the code!)". Do not accept the first answer. The more effort you put into perfecting both the markdown feature documentation and the coding prompt, the better.

If the coding prompt is too long, then the artefact is too big! Start a fresh chat and create something smaller. This is Augmented Intelligence ticket grooming in action!

Step 3: Paste to an Agent in YOLO mode (--dangerously-skip-permissions)

Paste in the groomed coding prompt and the documentation, and let it run. I always use a git branch so that I can let the agent go flat out. Cursor background agents, Copilot agents, and OpenHands are all getting better.

I only restrict git commit and git push. I first ask it to create a GitHub issue using the gh CLI, then tell it to make a branch and a PR.

Step 4: Force the Agent to "Work Backwards"

The models love to dive into code, break it all, get distracted, forget to update the documentation, hit compaction, and leave you with a mess. Do not let them be a caffeine-fuelled flying squirrel!

The primary tool I am using now prints out a Todos list. The order it picks is usually the opposite of the correct way to do things safely!

```
⏺ Update Todos
  ⎿ ☐ Remove all compatibility mode handling
     ☐ Make `{}` always compile as strict
     ☐ Update Test_X to expect failures for `{}`
     ☐ Add regression test Test_Y
     ☐ Add INFO log warning when `{}` is compiled
     ☐ Update README.md with Empty Schema Semantics section
     ☐ Update AGENTS.md with guidance
```

That list is in a perilous order. Logically, it is this:

  1. Delete logic (broken code, invalid tests)
  2. Change logic (more broken code, more invalid tests)
  3. Change one test (which is at least close to what you are doing)
  4. Add one test (finally! the objective!)
  5. Change the README.md and AGENTS.md

If the agent context compacts, things go sideways, or you get distracted, you will end up with a bag of broken code.

I either set it to "plan mode" or immediately interrupted it, to reorder the Todo list:

  1. Change the README.md and AGENTS.md first
  2. Add one test, as sketched below (insist the test is not run yet!)
  3. Change one test (insist the test is not run yet!)
  4. Add/Change logic
  5. Now run the tests
  6. Delete things last
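
To make step 2 concrete, here is a minimal sketch of the kind of test you would add before any logic changes. JtdSchemaCompiler and JtdValidator are hypothetical names standing in for whatever the real project calls its compile and validate entry points; the assertion itself just follows RFC 8927's "type" form.

```java
// Hypothetical names: JtdSchemaCompiler / JtdValidator are stand-ins for the
// project's real entry points. Per RFC 8927, {"type": "string"} accepts a
// JSON string and nothing else.
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

class StringTypeFormTest {

    @Test
    void stringTypeSchemaAcceptsStringsAndRejectsNumbers() {
        // Written before the logic change, and deliberately not run yet:
        // it captures the behaviour the documentation now promises.
        JtdValidator validator = JtdSchemaCompiler.compile("{\"type\": \"string\"}");

        assertTrue(validator.isValid("\"hello\""));  // a JSON string conforms
        assertFalse(validator.isValid("42"));        // a JSON number does not
    }
}
```

The point is the ordering rather than the specific assertion: the new test exists before the agent touches the implementation, and is only run once the logic lands.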

Todos Are All You Need?

I am not actually a big fan of the built-in Todos lists from the two big AI labs. The models really struggle with any changes to the plan. Kimi K2 Turbo seems more capable of pivoting. I have a few tricks for that, but I will save them for another post.

Does This Work For Real Code?

This past weekend I decided to write an RFC 8927 JSON Type Definition validator based on the experimental JDK java.util.json parser. The PDF of the spec is 51 pages. There is a ~4000-line compatibility test suite. A jQwik property test generates 1000 random JTDs, which flushed out several bugs. The total number of unit tests written was 509.
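
For the property-based part, here is a minimal sketch of the kind of jQwik property involved, assuming a hypothetical JtdSchemaCompiler entry point; the real generator builds far richer, nested schemas than this.

```java
// A minimal jQwik sketch: generate random (tiny) JTD schemas and check that
// every one of them compiles. JtdSchemaCompiler is a hypothetical stand-in
// for the project's real entry point.
import net.jqwik.api.Arbitraries;
import net.jqwik.api.Arbitrary;
import net.jqwik.api.ForAll;
import net.jqwik.api.Property;
import net.jqwik.api.Provide;

class RandomSchemaProperties {

    @Property(tries = 1000)
    boolean everyGeneratedSchemaCompiles(@ForAll("jtdTypeSchemas") String schema) {
        // The property: the compiler never rejects a well-formed JTD schema.
        return JtdSchemaCompiler.compile(schema) != null;
    }

    @Provide
    Arbitrary<String> jtdTypeSchemas() {
        Arbitrary<String> types = Arbitraries.of(
                "boolean", "string", "timestamp",
                "int8", "uint8", "int16", "uint16", "int32", "uint32",
                "float32", "float64");
        // Wrap each RFC 8927 primitive type in either a bare "type" form
        // or an "elements" (array) form around it.
        return types.flatMap(t -> Arbitraries.of(
                "{\"type\": \"" + t + "\"}",
                "{\"elements\": {\"type\": \"" + t + "\"}}"));
    }
}
```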

End Notes

Using a single model family is a Bad Idea (tm). For online research I alternate between full-fat ChatGPT Desktop, Claude Desktop, and Dive Desktop to use GPT5-High, Opus 4.1, and Kimi K2 Turbo in turn.

For Agents I have used all the models and many services. Microsoft kindly allows me to use full-fat Copilot with Agents for open-source projects for free ❤️ I have a Cursor sub to use their background agents. I use Codex, Claude Code, and Gemini. The model seems less important than writing the documentation first and writing tight prompts. I am currently using an open-weights model at $3 per million tokens, pay-as-you-go, for the heavy lifting, yet I cross-check its plans with GPT5 and Sonnet.
