Enterprise Eval Library, Now Open Source

#opensource #ai #monitoring

Unpopular opinion: using GPT-4 as a judge to evaluate other models is like grading your own homework.

At Future AGI, we built an open-source eval library because reliable evaluation needs more than a single judge: multiple independent signals, edge-case stress testing, and production monitoring.
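
To make "multiple signals" concrete, here is a minimal sketch in plain Python. It does not use the Future AGI library or its API; every name in it (`exact_match`, `token_overlap`, `evaluate`, `disagreement`) is hypothetical and chosen only for illustration. The idea is to score one output with several independent checks and look at how far they disagree, rather than trusting a single judge.

```python
from typing import Callable, Dict

# A signal maps (output, reference) to a score in [0, 1].
# All names here are illustrative assumptions, not a real library API.
Signal = Callable[[str, str], float]


def exact_match(output: str, reference: str) -> float:
    """1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def token_overlap(output: str, reference: str) -> float:
    """Jaccard overlap between the whitespace-separated token sets."""
    out_tokens = set(output.lower().split())
    ref_tokens = set(reference.lower().split())
    if not out_tokens and not ref_tokens:
        return 1.0  # two empty strings are treated as identical
    return len(out_tokens & ref_tokens) / len(out_tokens | ref_tokens)


def evaluate(output: str, reference: str, signals: Dict[str, Signal]) -> Dict[str, float]:
    """Run every signal on the same (output, reference) pair."""
    return {name: fn(output, reference) for name, fn in signals.items()}


def disagreement(scores: Dict[str, float]) -> float:
    """Spread between the highest and lowest signal; a large gap merits a closer look."""
    return max(scores.values()) - min(scores.values())


signals = {"exact_match": exact_match, "token_overlap": token_overlap}
scores = evaluate("the cat sat", "the cat ran", signals)
print(scores, disagreement(scores))
```

In a real setup an LLM judge could be one more entry in `signals`, so its score gets checked against the others instead of being taken as the final word.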