DEV Community

Himanshu Bamoria

Athina AI: Monitor & Evaluate LLM Outputs in 5 Minutes!

TL;DR: Athina helps you monitor and evaluate your LLM-powered app. Plug-and-play evals in production. 5-minute setup.


👋 Hey everyone! We’re thrilled to announce the launch of Athina AI, a suite of tools for LLM developers to build and ship AI products with confidence.

What is Athina AI?

Athina Monitoring Dashboard

Athina AI is a Monitoring & Evaluation platform for LLM developers.

Developers use Athina’s evaluation framework and production monitoring platform to improve the performance and reliability of AI applications through real-time monitoring, analytics, and automatic evaluations.

Problem

  • It is difficult to measure the quality of Generative AI responses.
  • Eyeballing production responses is tough.
  • No easy way to detect unreliable or bad outputs (especially in production).
  • Low visibility into LLM touchpoints.

LLM developers typically have to build lots of in-house infrastructure for monitoring and evaluation.

Solution: Athina AI

  • Quick Setup: Get started in just 5 minutes! The entire integration is a single POST request (and we don’t interfere with your LLM calls).
  • Comprehensive Monitoring Platform: Full visibility into your LLM touchpoints. Search, sort, filter, compare, debug.
  • Prebuilt Evaluations:
    • You can configure automatic evaluations in just a few clicks - use one of our preset evals or define a custom eval.
    • These evals will run against logged inferences automatically.
    • You can also use our open-source library to run evals and iterate rapidly during development.
  • Granular Analytics:
    • Tracks usage metrics like response time, cost, token usage, feedback, and more.
    • Athina also tracks metrics from the evals, like Faithfulness, Answer Relevance, Context Sufficiency, etc.
    • You can segment these metrics by any property: customer ID, environment, model, prompt, etc.
      • For example, you could use Athina to see how prompt/v4 is performing for customer ID nike-usa, and how gpt-4’s performance compares to a Llama fine-tune.
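
The integration is described as a single POST request sent alongside, not in place of, your own LLM call. As a minimal sketch, assuming a placeholder endpoint, a hypothetical `X-Api-Key` header, and illustrative field names (none of these come from Athina's documentation), building such a logging request with Python's standard library could look like this:

```python
import json
import urllib.request

# Placeholder endpoint: substitute the real URL from Athina's docs.
LOG_URL = "https://example.invalid/api/v1/log/inference"

def build_log_request(api_key, prompt, response, model, metadata):
    """Build (but do not send) a POST request that logs one LLM inference.

    The header name and every payload field are hypothetical, for illustration.
    """
    payload = {
        **metadata,          # e.g. customer_id, environment, prompt version
        "model": model,      # core fields come last so metadata cannot overwrite them
        "prompt": prompt,
        "response": response,
    }
    return urllib.request.Request(
        LOG_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req, timeout=5)`, ideally off the request path (for example in a background thread), so logging never delays or blocks the LLM call itself.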
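
To make "evals run against logged inferences" and "segment these metrics by any property" concrete, here is a toy sketch, not Athina's implementation: a hypothetical custom eval (`no_apology_eval`) is run over every logged record, and its pass rate is grouped by properties such as customer ID and model. The records and field names are made up for illustration.

```python
from collections import defaultdict

def no_apology_eval(record):
    """Hypothetical custom eval: pass if the response does not open with an apology."""
    return not record["response"].lower().startswith(("sorry", "i apologize"))

def pass_rate_by(records, eval_fn, keys):
    """Run eval_fn on every logged record and return the pass rate per segment."""
    totals = defaultdict(lambda: [0, 0])  # segment tuple -> [passed, total]
    for rec in records:
        segment = tuple(rec[k] for k in keys)
        totals[segment][0] += int(eval_fn(rec))
        totals[segment][1] += 1
    return {seg: passed / total for seg, (passed, total) in totals.items()}

# Toy logged inferences (illustrative only).
logs = [
    {"customer_id": "nike-usa", "model": "gpt-4", "response": "Your order ships Monday."},
    {"customer_id": "nike-usa", "model": "gpt-4", "response": "Sorry, I can't help with that."},
    {"customer_id": "nike-usa", "model": "llama-finetune", "response": "Your order ships Monday."},
]

rates = pass_rate_by(logs, no_apology_eval, ["customer_id", "model"])
# rates maps ("nike-usa", "gpt-4") to 0.5 and ("nike-usa", "llama-finetune") to 1.0
```

Swapping `keys` for `["prompt"]` or `["environment"]` gives the other segmentations mentioned above, provided those fields are logged with each inference.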

Athina Evaluation Dashboard

Our Story

As a team of engineers and hackers, we spent a summer trying to build various LLM-powered applications for developers.

While working with LLMs, we found that the most challenging part was evaluating the Generative AI output and systematically improving model performance.

We discovered a major gap in the tools that engineers need to effectively build production-grade applications using LLMs, and set out to solve this problem.

Get Started

Athina AI is a comprehensive suite of tools to supercharge your LLM development lifecycle and help you ship high-performing, reliable AI applications with confidence.


Top comments (2)

Danny (@ai_dan21)

Is this open source?

Himanshu Bamoria (@hbamoria)

Hi @ai_dan21, yes, our evaluators are open source. You can have a look here: github.com/athina-ai/athina-evals

Would love to know your thoughts.
