DEV Community: Edwin Lisowski

ContextCheck: LLM & RAG Evaluation Framework

Edwin Lisowski — Wed, 27 Nov 2024 08:51:21 +0000

Hi all! We open-sourced a framework for testing LLMs, RAGs, and chatbots. The tool automates query generation, completion requests, regression detection, penetration testing, and hallucination assessment. It is designed for developers, researchers, and businesses, and we are looking for contributors! Feel free to try it out and share your feedback!

Repo on GitHub

ContextCheck: An open-source framework for testing and evaluating LLMs, RAGs, Chatbots

Edwin Lisowski — Thu, 21 Nov 2024 10:20:15 +0000

Hey devs!

We just open-sourced ContextCheck, a framework for testing and evaluating LLMs, RAGs, and chatbots 🚀

What it does:

Generates queries and handles completions
Detects regressions and hallucinations
Runs penetration tests
Works in CI pipelines (YAML-configurable)

We built it while developing our AI Knowledge Base Assistant to solve real headaches with testing and validating LLMs. Now it’s out there for you to use, break, and improve.

Try it out and let us know what you think! ➡️ GitHub repo

ContextCheck: An open-source framework for testing and evaluating LLMs, RAGs, Chatbots

Edwin Lisowski — Thu, 21 Nov 2024 10:04:20 +0000

Hey everyone!

I’m one of the co-founders of Addepto, and I’m excited to share ContextCheck—a new open-source framework we’ve developed for testing and evaluating LLMs, RAGs, and chatbots.

ContextCheck offers tools to:

Automatically generate queries and request completions
Detect regressions and assess hallucinations
Perform penetration testing
Ensure the robustness and reliability of AI systems

It’s fully configurable via YAML and integrates seamlessly into CI pipelines for automated testing.

We built ContextCheck during the development of our AI-powered Knowledge Base Assistant to solve the challenges we faced with testing and validating Large Language Models. It’s a tool designed by developers for developers to tackle real-world issues.

We’d love for you to try it out, contribute, and share your feedback!

GitHub repo