DEV Community

Himanshu Bamoria

Athina AI: Monitor & Evaluate LLM Outputs in 5 Mins!

TL;DR: Athina helps you monitor and evaluate your LLM-powered app: plug-and-play evals in production, with a 5-minute setup.


👋 Hey everyone! We’re thrilled to announce the launch of Athina AI, a suite of tools that helps LLM developers build and ship AI products with confidence.

What is Athina AI?

(Screenshot: Athina Monitoring Dashboard)

Athina AI is a Monitoring & Evaluation platform for LLM developers.

Developers use Athina’s evaluation framework and production monitoring platform to improve the performance and reliability of AI applications through real-time monitoring, analytics, and automatic evaluations.

Problem

  • It is difficult to measure the quality of Generative AI responses.
  • Manually eyeballing production responses is tedious and doesn't scale.
  • There is no easy way to detect unreliable or bad outputs, especially in production.
  • There is low visibility into LLM touchpoints.

As a result, LLM developers typically have to build a lot of in-house infrastructure for monitoring and evaluation.

Solution: Athina AI

  • Quick Setup: Get started in just 5 minutes! The entire integration is a single POST request that logs each inference, and it doesn’t interfere with your LLM calls.
  • Comprehensive Monitoring Platform: Full visibility into your LLM touchpoints. Search, sort, filter, compare, debug.
  • Prebuilt Evaluations:
    • You can configure automatic evaluations in just a few clicks: use one of our preset evals or define a custom eval.
    • These evals will run against logged inferences automatically.
    • You can also use our open-source library to run evals and iterate rapidly during development.
  • Granular Analytics:
    • Tracks usage metrics like response time, cost, token usage, feedback, and more.
    • Athina also tracks metrics from the evals, like Faithfulness, Answer Relevance, Context Sufficiency, etc.
    • You can segment these metrics by any property: customer ID, environment, model, prompt, etc.
      • For example, you could use Athina to see how prompt/v4 is performing for customer ID nike-usa, and how gpt-4 compares to a Llama fine-tune.
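To make the logging, custom-eval, and segmentation ideas above concrete, here is a minimal Python sketch of the general pattern. It is not Athina's SDK or API: the endpoint `LOG_URL`, the `X-Api-Key` header, and every payload field name are hypothetical placeholders, and `context_overlap` is a toy custom eval standing in for real metrics like Faithfulness. The sketch builds (without sending) the single POST that logs an inference after the LLM call has returned, scores each logged response with the toy eval, and averages a metric per property such as model or customer ID.

```python
import json
import urllib.request
from collections import defaultdict

# HYPOTHETICAL endpoint and header name: placeholders, not Athina's real API.
LOG_URL = "https://example.com/api/log/inference"
API_KEY_HEADER = "X-Api-Key"


def build_log_payload(prompt_slug, model, query, response, context,
                      response_time_ms, customer_id, environment):
    """Assemble one inference record. All field names are illustrative."""
    return {
        "prompt_slug": prompt_slug,
        "model": model,
        "query": query,
        "response": response,
        "context": context,
        "response_time_ms": response_time_ms,
        "customer_id": customer_id,
        "environment": environment,
    }


def build_log_request(payload, api_key):
    """Build (but do not send) the single POST that logs an inference.

    It is issued after the LLM call has returned, so the LLM call itself
    is not proxied or modified.
    """
    return urllib.request.Request(
        LOG_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


def context_overlap(response, context):
    """Toy custom eval: fraction of distinct response words found in the context.

    A crude grounding heuristic, NOT Athina's Faithfulness metric.
    """
    resp_words = set(response.lower().split())
    ctx_words = set(context.lower().split())
    if not resp_words:
        return 0.0
    return len(resp_words & ctx_words) / len(resp_words)


def segment_mean(records, key, metric):
    """Average `metric` for each distinct value of the property `key`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record[key]] += record[metric]
        counts[record[key]] += 1
    return {value: totals[value] / counts[value] for value in totals}


if __name__ == "__main__":
    context = "Paris is the capital of France"
    logs = [
        build_log_payload("prompt/v4", "gpt-4", "Capital of France?",
                          "Paris is the capital", context, 800, "nike-usa", "production"),
        build_log_payload("prompt/v4", "gpt-4", "Capital of France?",
                          "Lyon is the capital", context, 1000, "nike-usa", "production"),
        build_log_payload("prompt/v4", "llama-finetune", "Capital of France?",
                          "Paris is the capital", context, 600, "nike-usa", "production"),
    ]
    # Attach the toy eval score to each logged record, mimicking an automatic eval.
    for record in logs:
        record["overlap"] = context_overlap(record["response"], record["context"])

    request = build_log_request(logs[0], api_key="YOUR_API_KEY")
    print(request.get_method(), request.full_url)
    print(segment_mean(logs, "model", "response_time_ms"))
    print(segment_mean(logs, "model", "overlap"))
```

Keeping the log call separate from the LLM call is one way to honor the "don't interfere" property: if the logging request fails, the user-facing response is unaffected. Any real integration should follow Athina's own documentation for the endpoint and payload schema.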

(Screenshot: Athina Evaluation Dashboard)

Our Story

As a team of engineers and hackers, we spent a summer trying to build various LLM-powered applications for developers.

While working with LLMs, we found that the most challenging part was evaluating the Generative AI output and systematically improving model performance.

We discovered a major gap in the tools engineers need to build production-grade applications with LLMs effectively, and we set out to solve this problem.

Get Started

Athina AI is a comprehensive suite of tools to supercharge your LLM development lifecycle and help you ship high-performing, reliable AI applications with confidence.


Top comments (2)

Danny

Is this open source?

Himanshu Bamoria

Hi @ai_dan21
Yes, our evaluators are open source. You can have a look here: github.com/athina-ai/athina-evals

Would love to know your thoughts.
