In the first post I asked whether the AI surprised you with what it caught or missed. After running this locally against Sauce Demo, here is my honest answer.
It did surprise me. I went in expecting the human to win. That is not quite what happened.
With the three docs indexed into ChromaDB and cy.prompt() grounded in that context, I ran both tests: the same app, the same flows, one written by a human and one generated from RAG context.
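To make the retrieval step concrete, here is a minimal stand-in sketch. The real pipeline used ChromaDB embeddings; this version scores chunks by plain keyword overlap just to show the shape of "grounding" a prompt in indexed docs. All names here (`docs`, `retrieveContext`, the doc snippets) are illustrative, not taken from the actual pipeline.

```javascript
// Stand-in for embedding retrieval: score each doc by keyword overlap
// with the query, return the top-k as context for the prompt.
// Doc contents below are invented placeholders, not the real docs.
const docs = [
  { id: 'bug-history', text: 'locked_out_user gets an error on login ...' },
  { id: 'components', text: 'login form uses [data-test=username] and [data-test=password] ...' },
  { id: 'api-spec', text: 'POST /login responds 401 for locked out users ...' },
];

function tokenize(s) {
  return new Set(s.toLowerCase().match(/[a-z0-9_-]+/g) || []);
}

// Return the k docs whose tokens overlap the query the most.
function retrieveContext(query, k = 2) {
  const q = tokenize(query);
  return docs
    .map((d) => {
      const t = tokenize(d.text);
      let score = 0;
      for (const w of q) if (t.has(w)) score++;
      return { ...d, score };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const hits = retrieveContext('locked_out_user login error');
console.log(hits.map((h) => h.id)); // 'bug-history' scores highest
```

An embedding store like ChromaDB does the same job semantically instead of lexically, which is why the bug history doc surfaces even when the query does not repeat its exact words.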
The AI knew the locked_out_user scenario because it was in the bug history doc. It knew the exact selectors because they were in the component doc. It did not guess. It worked from what I gave it.
Here are both tests it generated:
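The verbatim cy.prompt() output is not reproduced below; what follows is a representative sketch of the locked-out-user spec, assuming Sauce Demo's public `data-test` selectors and standard credentials (which the generated tests also used, since they were in the indexed docs):

```javascript
// Representative sketch, not the verbatim generated spec.
// Selectors and credentials are Sauce Demo's public ones.
describe('login', () => {
  it('blocks locked_out_user', () => {
    cy.visit('https://www.saucedemo.com');
    cy.get('[data-test="username"]').type('locked_out_user');
    cy.get('[data-test="password"]').type('secret_sauce');
    cy.get('[data-test="login-button"]').click();
    // The generated test stopped at this level: an error element exists.
    cy.get('[data-test="error"]').should('be.visible');
  });

  it('logs in standard_user', () => {
    cy.visit('https://www.saucedemo.com');
    cy.get('[data-test="username"]').type('standard_user');
    cy.get('[data-test="password"]').type('secret_sauce');
    cy.get('[data-test="login-button"]').click();
    cy.url().should('include', '/inventory.html');
  });
});
```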
But here is where it gets interesting. The AI verified that an error message existed. It did not verify that the message said "Sorry, this user has been locked out." That is intent knowledge. It lives in someone's head, not in a doc. The human catches that. The AI does not.
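Closing that gap takes one extra assertion, and it is exactly the line only someone who knows the intended copy can write (the message text here is the one Sauce Demo actually shows):

```javascript
// What the generated test asserted: an error element exists.
cy.get('[data-test="error"]').should('be.visible');

// What a human adds, knowing the intended copy:
cy.get('[data-test="error"]').should(
  'contain.text',
  'Sorry, this user has been locked out'
);
```

The first assertion passes even if the error message is wrong, empty of meaning, or swapped for an unrelated one. The second encodes intent.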
And anything that was never documented will not show up in the tests either. A flow built last Tuesday that never made it into any spec or component doc is invisible to the pipeline. The RAG context is only as good as what you indexed.
So neither wins cleanly. The AI covers breadth. The human covers intent. The most useful thing is not picking a winner, it is understanding where each one has blind spots and using both accordingly.
One thing I did not expect: cy.prompt() requires a Cypress Cloud account to authenticate. It is not a fully local feature. That was a real discovery during the setup and worth knowing before you go too far down this path.
If you have tried this, I am genuinely curious what your results looked like. And if you found a better chunking strategy for API specs after reading Post 2, I would love to hear that too.

