Giovanna

Klarity – Open-source tool to analyze uncertainty/entropy in LLM output (github.com/klara-research)

We've open-sourced Klarity - a tool for analyzing uncertainty and decision-making in LLM token generation. It provides structured insights into how models choose tokens and where they show uncertainty.
What Klarity does:

  • Real-time analysis of model uncertainty during generation
  • Dual analysis combining log probabilities and semantic understanding
  • Structured JSON output with actionable insights
  • Fully self-hostable with customizable analysis models
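To make the log-probability side of this concrete, here is a minimal sketch of how per-step entropy can be computed from a model's next-token logits. It is not Klarity's actual implementation: the helper names `softmax` and `token_entropy` and the toy logit values are assumptions made for illustration only.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy, in nats, of the next-token distribution for one step."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Toy logits over a 4-token vocabulary (hypothetical values, not model output).
confident_step = [10.0, 0.0, 0.0, 0.0]   # one token dominates
uncertain_step = [1.0, 1.0, 1.0, 1.0]    # all tokens equally likely

print(token_entropy(confident_step))  # close to 0
print(token_entropy(uncertain_step))  # ln(4), about 1.386
```

Low entropy means the model strongly prefers one token. High entropy means probability is spread across several candidates, which is the kind of step an uncertainty analysis would flag.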

The tool analyzes each step of text generation and returns a structured JSON object with the following fields:

  • uncertainty_points: array of {step, entropy, options[], type}
  • high_confidence: array of {step, probability, token, context}
  • risk_areas: array of {type, steps[], motivation}
  • suggestions: array of {issue, improvement}
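As a rough illustration of how a report with these fields might be consumed, the sketch below parses a hand-written sample and lists the steps whose entropy crosses a threshold. The field names come from the list above. Every value, the value types (for example, `options` as a list of strings), and the helper `steps_above` are hypothetical and not taken from real Klarity output.

```python
import json

# Hand-written sample shaped like the fields described above.
# All values are invented for illustration, not produced by Klarity.
sample_report = json.loads("""
{
  "uncertainty_points": [
    {"step": 3, "entropy": 1.92, "options": ["Paris", "Lyon"], "type": "factual"}
  ],
  "high_confidence": [
    {"step": 0, "probability": 0.98, "token": "The", "context": "sentence start"}
  ],
  "risk_areas": [
    {"type": "factual", "steps": [3], "motivation": "several plausible city names"}
  ],
  "suggestions": [
    {"issue": "ambiguous question", "improvement": "name the country explicitly"}
  ]
}
""")

def steps_above(report, threshold):
    """Return the generation steps whose entropy is at or above the threshold."""
    return [
        point["step"]
        for point in report["uncertainty_points"]
        if point["entropy"] >= threshold
    ]

print(steps_above(sample_report, 1.0))
```

A filter like this could feed a logging or alerting step, for example to review only the generations that contain at least one high-entropy step.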

Klarity currently supports Hugging Face Transformers, with more frameworks coming. We tested it extensively with Qwen2.5 models (0.5B to 7B), but it should work with most Hugging Face LLMs.

Installation is simple: pip install git+https://github.com/klara-research/klarity.git

We are building open-source interpretability and explainability tools to visualize and analyze attention maps, saliency maps, and similar signals, and we want to understand your pain points with LLM behavior. What insights would actually help you debug these black-box systems?

Let me know in the comments if you find it useful, and share any other feedback you have!

Top comments (1)

Davide Cifarelli

This seems interesting for reasoning models: it could help avoid generating useless CoT by calling a reasoning model only when necessary. Does it support DeepSeek R1?
