Automation consultant. I build AI-powered workflows using Claude, n8n, and open-source tools. Sharing practical guides on AI agents, no-code automation, and cost optimization.
AI-generated specs as the failing test source is a really underrated pattern. One gotcha I've hit: when you let the model write the OpenAPI schema and the tests from the same prompt, any hallucination gets "verified" by its own stub — the test passes, the spec is wrong, and the bug ships. I now run the spec and tests through separate model calls with different system prompts so at least one catches drift. Curious if you've layered schema validation (like pydantic / zod) on top of the generated tests to catch that third failure mode?
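A minimal sketch of the "separate model calls" idea described above, under some assumptions: `callModel` is a hypothetical client function (not any real SDK's API), both system prompts are made-up placeholders, and `findDrift` assumes you have already extracted field names from the spec and from the tests by some other means.

```typescript
// Hypothetical model client: (system prompt, user prompt) -> generated text.
type ModelCall = (systemPrompt: string, userPrompt: string) => Promise<string>;

// Assumed prompts, for illustration only.
const SPEC_SYSTEM_PROMPT = "You write OpenAPI schemas. Output only the schema.";
const TEST_SYSTEM_PROMPT = "You write failing tests for a feature. Output only test code.";

// Two independent calls: neither call sees the other's output,
// so a hallucination in one is not copied into the other.
async function generateIndependently(
  callModel: ModelCall,
  feature: string
): Promise<{ spec: string; tests: string }> {
  const [spec, tests] = await Promise.all([
    callModel(SPEC_SYSTEM_PROMPT, feature),
    callModel(TEST_SYSTEM_PROMPT, feature),
  ]);
  return { spec, tests };
}

// Report field names that appear on only one side,
// i.e. places where the spec and the tests disagree.
function findDrift(specFields: string[], testFields: string[]): string[] {
  const inSpec = new Set(specFields);
  const inTests = new Set(testFields);
  const onlyInTests = testFields.filter((field) => !inSpec.has(field));
  const onlyInSpec = specFields.filter((field) => !inTests.has(field));
  return onlyInTests.concat(onlyInSpec);
}
```

A non-empty result from `findDrift` does not say which side is wrong, only that the two calls disagree and a human should look.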
I now run the spec and tests through separate model calls with different system prompts so at least one catches drift.
This is a great practice. I think as developers we are still learning how to work with AI tools, and one of the mistakes we make is trying to do everything at once instead of directing the tools to work effectively.
I use zod for schema validation (I am a big TS fan; if you check my GitHub, you will see my dislike of "any"). Also, as projects grow and definitions become clear, the "system prompts" that act as a guide and guardrail on top of the "instruction prompt" used to generate schemas will evolve as well.
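To make the validation layer concrete, here is a small dependency-free sketch of the idea: check at runtime that a response matches the shape the spec promises, taking `unknown` rather than `any` as input. The `User` shape and the `parseUser` helper are hypothetical. The hand-written guard is only a stand-in so the example runs without installing anything; in a real project a zod schema would replace it, and the result type here loosely mirrors the success/failure shape that zod's `safeParse` returns.

```typescript
// Hypothetical response shape taken from a generated spec.
interface User {
  id: number;
  email: string;
}

// Success/failure result, so callers must check before using the data.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Hand-written stand-in for a schema library: validates unknown input at runtime.
function parseUser(input: unknown): ParseResult<User> {
  if (typeof input !== "object" || input === null) {
    return { success: false, error: "expected an object" };
  }
  const { id, email } = input as Record<string, unknown>;
  if (typeof id !== "number") {
    return { success: false, error: "id must be a number" };
  }
  if (typeof email !== "string") {
    return { success: false, error: "email must be a string" };
  }
  return { success: true, data: { id, email } };
}
```

Running generated tests against a check like this means a stub that returns the wrong shape fails loudly instead of passing quietly.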
In this blog post I wanted to show folks how to think about generating specs and how to think in terms of features. Maybe a follow-up blog post could go deeper into this concept.
Thanks for your comment!