Have Antigravity review prompts update themselves when your codebase changes

#antigravity #vibecoding

I built a self-improving, self-healing workflow for Antigravity 2.0 where your AI review prompts update themselves based on your actual codebase. No more stale checklists that still reference packages you removed months ago.

The core idea: meta-prompts — instructions that tell the AI to scan your package.json, GEMINI.md, skills, and MCP config, then generate hyper-specific review prompts from scratch. Every time.
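
To make the idea concrete, here is a minimal Python sketch of the "derive the prompt from the manifest" step. It is not the repo's implementation (there, the AI does the scanning by following the meta-prompt); the function names `summarize_stack` and `build_review_prompt`, the prompt wording, and the example manifest are all hypothetical.

```python
import json


def summarize_stack(package_json_text: str) -> dict:
    """Extract sorted dependency names from the text of a package.json file."""
    manifest = json.loads(package_json_text)
    return {
        "dependencies": sorted(manifest.get("dependencies", {})),
        "devDependencies": sorted(manifest.get("devDependencies", {})),
    }


def build_review_prompt(stack: dict) -> str:
    """Render a review prompt that only names packages that are installed now."""
    lines = ["Review this codebase. Only reference packages listed below:"]
    for section, names in stack.items():
        listed = ", ".join(names) if names else "(none)"
        lines.append(f"{section}: {listed}")
    return "\n".join(lines)


# Hypothetical manifest, for illustration only.
example = (
    '{"dependencies": {"react": "^19.0.0", "zod": "^3.23.0"},'
    ' "devDependencies": {"vitest": "^2.0.0"}}'
)
prompt = build_review_prompt(summarize_stack(example))
print(prompt)
```

Because the prompt is rebuilt from the manifest on every run, a package you removed simply never appears in it again.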

What's in the repo:

🧠 Self-optimizing prompts for code review, spring cleaning, and architecture review
🔄 Dual-model implementation review (Gemini drafts the plan, Claude stress-tests it)
🛠️ Modular .skills/ the AI loads on demand instead of bloating every conversation
📊 Timestamped review reports with retention policies — resolved issues stay resolved
📱 A script that consolidates all your docs into a single .md for a Gemini Gem, so you can query your repo from your phone
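
The doc-consolidation item in the list above could look roughly like this in Python. This is a sketch under assumptions, not the script shipped in the repo: `consolidate` and `load_markdown` are hypothetical names, and the "one heading per file" layout is my own choice.

```python
from pathlib import Path


def consolidate(docs: dict[str, str]) -> str:
    """Merge {relative_path: markdown_text} into one markdown document."""
    sections = []
    for path in sorted(docs):
        # One second-level heading per source file, so origins stay traceable.
        sections.append(f"## {path}\n\n{docs[path].strip()}\n")
    return "# Consolidated docs\n\n" + "\n".join(sections)


def load_markdown(root: str) -> dict[str, str]:
    """Read every .md file under root, keyed by its path relative to root."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): p.read_text(encoding="utf-8")
        for p in base.rglob("*.md")
    }


# In-memory example; on a real project you would pass load_markdown(".").
combined = consolidate({"docs/setup.md": "Install deps.", "README.md": "Intro."})
print(combined)
```

Writing `combined` to a single file gives you one upload for the Gem instead of many.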

GitHub: antigravity-self-evolving-reviews

The README has a TLDR — you can be running self-evolving reviews on your own project in about 10 minutes. Download, unzip into your workspace, run /run-meta-prompts, then run /spring-cleaning or /code-review. That's it.

If you try it, I'd really like to know: does /run-meta-prompts generate prompts that actually match your stack and produce meaningful code-review and spring-cleaning results, or does it miss things? That's the part I'm least sure about on projects different from mine.

Feedback, suggestions, and PRs welcome!

Top comments (4)

Harjot Singh • May 31

Self-updating review prompts that track codebase changes are a clever fix for prompt staleness: the review prompt that was right last month silently reviews against an outdated mental model of your code. Tying the prompt to the current state so it can't drift from reality is the right instinct. The risk to watch: the auto-update itself becomes an unobserved process. If it regenerates the prompt wrong, your reviews are now confidently checking the wrong things and nothing flags it. A verify step on the generated prompt closes that loop. I deal with this same self-modifying-config problem in Moonshift. How do you validate that the updated prompt is actually better, not just different?

T.M. Jensen • May 31 • Edited

I totally agree that it's an unobserved process unless you verify that the new prompts are actually better than the old ones and that they review new stuff the right way.
What I have done is use different AI models, Gemini Pro and Claude Opus, both for the meta-prompts that generate the review prompts and for performing the actual review. I found that:
Using Gemini Pro for meta-prompts produces review prompts that find more stuff, but also more false positives and suggestions I choose not to follow.
Using Claude Opus for meta-prompts produces review prompts that find the same things as Gemini's, but with fewer false positives and almost nothing I do not want to fix. Also, the review reports from the Claude-generated prompts are cleaner and more informative.

Does either produce prompts that find everything that should be found during reviews and cleaning? Probably not. But they find enough for me to feel confident that I have a healthy codebase, workspace, and tech stack.

Regarding executing the actual review prompts, I have also tried different models and compared results from reviewing the exact same codebase. Claude Sonnet 4.6 is simply the best, but Gemini 3.1 Pro can yield some interesting things when doing the architecture review.

Regarding revisiting my meta-prompts, I have done it from time to time, but it seems the AIs can only find cosmetic things to change. I sometimes manually add specific things I want to start reviewing. But the meta-prompts do what I need for my project and the tech stack I use.

So my gut feeling is that if you can get the meta-prompts to a level where they cover your tech stack and architecture well, then this setup will work nicely. But if someone has a completely different tech stack, or uses some of the same tech in a completely different way, the meta-prompts would need some work.

Harjot Singh • May 31

The cross-model finding is the gold here: Gemini Pro meta-prompts find more but with false positives, while Claude Opus finds the same with fewer false positives and cleaner reports. That's a real, useful asymmetry, and it maps to using each where its bias helps: the higher-recall model where missing things is expensive, the higher-precision one where false positives waste review time. The honest caveat you already named is the one I'd underline: self-updating prompts are an unobserved process unless you verify the new prompts are actually better, and a gut feeling of a confident codebase isn't quite a measurement. The thing that would close that loop is a small fixed benchmark of known issues (a few seeded bugs, a few known-clean files) that you re-run the regenerated prompts against, so a prompt update has to demonstrate it still catches the planted issues before it's promoted. Otherwise drift is invisible. Different models for meta vs. review, plus a held-out check on the regenerated prompts, is the combo I'd trust to leave running. That verify-the-self-update-against-ground-truth instinct is core to how I think about Moonshift. Have you tried pinning a small known-issue set to validate a regeneration, or is it eyeball-the-reports so far?
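
A minimal Python sketch of that promotion gate, assuming each review run can be reduced to a set of flagged file names. The names `score_prompt` and `should_promote`, the file names, and the "no worse on either metric" rule are all hypothetical choices, not something from the repo.

```python
def score_prompt(flagged: set[str], seeded: set[str], clean: set[str]) -> dict:
    """Score one review run against a fixed benchmark.

    flagged: files the review reported issues in
    seeded:  files that contain a deliberately planted bug
    clean:   files known to be free of issues
    """
    if not seeded:
        raise ValueError("benchmark needs at least one seeded issue")
    return {
        "recall": len(flagged & seeded) / len(seeded),
        "false_positives": len(flagged & clean),
    }


def should_promote(candidate: dict, baseline: dict) -> bool:
    """Promote a regenerated prompt only if it is no worse on both metrics."""
    return (
        candidate["recall"] >= baseline["recall"]
        and candidate["false_positives"] <= baseline["false_positives"]
    )


# Hypothetical benchmark: three planted bugs, two known-clean files.
seeded = {"auth.ts", "db.ts", "api.ts"}
clean = {"utils.ts", "types.ts"}

old = score_prompt({"auth.ts", "db.ts", "api.ts"}, seeded, clean)
new = score_prompt({"auth.ts", "db.ts", "utils.ts"}, seeded, clean)
print(old, new, should_promote(new, old))
```

Here the regenerated prompt misses one planted bug and flags a clean file, so it would not be promoted over the old one.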

T.M. Jensen • Jun 1

Regarding the benchmark: the professional route would be what you describe (a few seeded bugs, a few known-clean files).
But I am a solo dev who vibecodes, so I don't really have the skills or time to do that.
I am considering having my AI pull past review prompts and create a matrix of what it reviews today vs. 1, 2, and 3 months ago, to get an overview of the drift over time.
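
A rough Python sketch of such a drift matrix, assuming each past prompt has already been reduced to a set of topics it checks (how those topics get extracted is left open). `drift_matrix`, the snapshot labels, and the topic names are hypothetical.

```python
def drift_matrix(snapshots: dict[str, set[str]]) -> dict[str, dict[str, bool]]:
    """Map each topic to {snapshot_label: whether that snapshot checks it}."""
    topics = sorted(set().union(*snapshots.values()))
    return {
        topic: {label: topic in checks for label, checks in snapshots.items()}
        for topic in topics
    }


# Hypothetical snapshots of what the review prompts covered.
snapshots = {
    "3 months ago": {"sql injection", "dead code"},
    "today": {"sql injection", "prompt injection"},
}
matrix = drift_matrix(snapshots)

# Topics that were reviewed three months ago but are no longer covered.
dropped = [t for t, row in matrix.items() if row["3 months ago"] and not row["today"]]
print(dropped)
```

Topics that silently disappear between snapshots are the ones worth a manual look.
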
I see my setup as most useful for solo vibe coders, not teams that do PRs and peer reviews anyway. I just need to have AI review for me, and if it finds the obvious ("compilable bugs", design flaws, security holes, etc.), I am happy. And it does. For example, it has found "prompt injection" security issues. Has it found them all? No clue, but my AI has promised me that all five cases where prompts are not hardcoded are safe.
My setup is definitely not enterprise-ready. For dev teams, maybe it works as something you do before creating a PR on a new feature, but not without some kind of formal benchmarking.
I add to the meta-prompts more or less manually, and I review and approve changes the AI makes to them, and seldom remove anything. So in essence it should review and check more and more. But with AI, more is not always better :-)