Shipping AI features without evals is just vibes-based engineering.
You don't actually know if your last prompt change made things better or worse. You find out when a user complains.
I'm building a tool that gives AI teams proper regression testing, output scoring, and trace logging so you catch regressions before your users do.
If this is a problem you're dealing with, I'd love 20 mins to understand your setup.
For further actions, you may consider blocking this person and/or reporting abuse
Top comments (0)