I Built an Adaptive EDA Tool That Learns How You Explore Data

#machinelearning #datascience #python #opensource

Most exploratory data analysis tools generate static reports.

You upload a dataset, get dozens of charts, scroll for a few minutes, and leave with information overload instead of actual insight.

After running into this problem repeatedly, I decided to build something different.

So I built and open-sourced XAdaptiveEDA, a Python + Streamlit tool that adapts its recommendations based on how you interact with your data.

GitHub: https://github.com/AshayK003/XadaptiveEDA

What Makes It Different?

Traditional EDA tools treat every dataset and every user the same way.

XAdaptiveEDA tries to behave like an adaptive system rather than a one-time report generator.

You upload a CSV, Excel, or JSON file, and the app:

ranks analyses by relevance
tracks your feedback with 👍 and 👎 interactions
adapts future recommendations in real time
avoids repetitive analyses
prioritizes columns and patterns you explore frequently
lets you chat with your dataset using natural language

The goal was to make exploratory data analysis feel more interactive and personalized.
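
To make the feedback loop concrete, here is a minimal sketch of how 👍 and 👎 votes could be turned into a preference score that fades over time. It is an illustration, not the actual XAdaptiveEDA code: the `FeedbackTracker` class, the half-life parameter, and the +1/-1 vote weights are all assumptions.

```python
from collections import defaultdict


class FeedbackTracker:
    """Keeps a time-decayed preference score per analysis type (hypothetical design)."""

    def __init__(self, half_life_s: float = 600.0):
        self.half_life_s = half_life_s      # assumed: a vote loses half its weight after this long
        self.scores = defaultdict(float)    # analysis type -> preference score
        self.last_seen = {}                 # analysis type -> timestamp of the last vote

    def record(self, key: str, liked: bool, now: float) -> None:
        # Decay the stored score up to `now`, then add the new vote.
        self.scores[key] = self.preference(key, now)
        self.last_seen[key] = now
        self.scores[key] += 1.0 if liked else -1.0

    def preference(self, key: str, now: float) -> float:
        # Read the decayed score as of `now` without changing any state.
        if key not in self.last_seen:
            return 0.0
        elapsed = now - self.last_seen[key]
        return self.scores[key] * 0.5 ** (elapsed / self.half_life_s)


tracker = FeedbackTracker(half_life_s=10.0)
tracker.record("correlation", liked=True, now=0.0)
print(tracker.preference("correlation", now=10.0))  # 0.5, one half-life later
```

Passing `now` explicitly keeps the sketch deterministic; a real app would likely use `time.time()`.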

Features

Current capabilities include:

Core Analysis

Distribution analysis
Correlation analysis
Missing value detection
Outlier analysis
Categorical analysis
Time series analysis
Clustering
Feature importance

Adaptive Recommendation Engine

The recommendation engine combines:

data relevance
user preferences
novelty scoring
diversity penalties
temporal decay
affinity tracking
ε-greedy exploration

Instead of dumping every possible chart, the tool tries to surface the analyses most likely to matter.
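
As a rough illustration of how signals like these can be combined, the sketch below scores each candidate analysis and then picks one with ε-greedy exploration. The weights, the `Candidate` fields, and the flat diversity penalty are assumptions made for the example; the project's real scoring may differ.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # e.g. "correlation"
    relevance: float   # 0..1, how well the analysis fits the data
    preference: float  # learned from feedback, roughly -1..1
    times_shown: int   # how often it has appeared this session


def score(c: Candidate, shown_recently: list[str]) -> float:
    # Novelty shrinks as an analysis is shown more often.
    novelty = 1.0 / (1 + c.times_shown)
    # Flat penalty for repeating something the user just saw (assumed value).
    diversity_penalty = 0.3 if c.name in shown_recently else 0.0
    # Illustrative weights, not the project's actual ones.
    return 0.5 * c.relevance + 0.3 * c.preference + 0.2 * novelty - diversity_penalty


def pick(candidates, shown_recently, epsilon=0.1, rng=random):
    # With probability epsilon, explore: pick a random candidate.
    if rng.random() < epsilon:
        return rng.choice(candidates)
    # Otherwise exploit: pick the highest-scoring candidate.
    return max(candidates, key=lambda c: score(c, shown_recently))


candidates = [
    Candidate("correlation", relevance=0.9, preference=0.2, times_shown=2),
    Candidate("outliers", relevance=0.6, preference=0.8, times_shown=0),
]
best = pick(candidates, shown_recently=["correlation"], epsilon=0.0)
print(best.name)  # outliers
```

Because each term is a plain number, the score can be broken down and shown to the user, which is one simple way to keep recommendations explainable.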

Built-in AI Features

I also added optional LLM integration for:

chatting with datasets
AI-generated analysis insights
smart column naming
natural language query classification

Supported providers:

Ollama (local-first)
OpenRouter
Groq
Custom APIs

Privacy was one of my top priorities.

If you use Ollama locally, your data never leaves your machine.
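
For context, a local Ollama server listens on `localhost:11434` by default, so a prompt about your data goes to a process on your own machine. The sketch below shows what such a call could look like using only the standard library. The helper names and the `llama3` model name are assumptions, and this is not how XAdaptiveEDA necessarily wires it up.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    # "stream": False asks for a single JSON response instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3") -> str:
    # Requires a running Ollama server with the model already pulled.
    with urllib.request.urlopen(build_request(prompt, model), timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("Which columns have the most missing values?")` would only work with Ollama running locally; the request never targets a remote host.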

Tech Stack

The project is intentionally lightweight.

Built with:

Streamlit
Plotly
pandas
NumPy
SQLite
Ollama

No massive infrastructure setup required.

The entire system currently runs with just 6 dependencies.

Engineering Details

Some things I focused on while building this:

explainable recommendation scoring
session persistence with SQLite
progressive sampling for large datasets
GPU acceleration support through Ollama
rate limiting for remote APIs
modular architecture
fully local workflows
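
As one example from this list, session persistence with SQLite can be quite small. The sketch below stores feedback votes and reads back a per-analysis total for a session. The table layout and function names are hypothetical and do not reflect the project's actual schema.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # Use a file path instead of ":memory:" to keep sessions across restarts.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feedback (
               session_id TEXT NOT NULL,
               analysis   TEXT NOT NULL,
               liked      INTEGER NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_feedback(conn: sqlite3.Connection, session_id: str, analysis: str, liked: bool) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO feedback (session_id, analysis, liked) VALUES (?, ?, ?)",
            (session_id, analysis, int(liked)),
        )


def net_votes(conn: sqlite3.Connection, session_id: str) -> dict:
    # Net score per analysis: +1 for each like, -1 for each dislike.
    rows = conn.execute(
        "SELECT analysis, SUM(CASE WHEN liked THEN 1 ELSE -1 END) "
        "FROM feedback WHERE session_id = ? GROUP BY analysis",
        (session_id,),
    )
    return dict(rows.fetchall())


conn = open_store()
save_feedback(conn, "demo", "correlation", liked=True)
save_feedback(conn, "demo", "correlation", liked=True)
save_feedback(conn, "demo", "outliers", liked=False)
print(net_votes(conn, "demo"))
```

Since `sqlite3` ships with Python, this kind of persistence adds no extra install step.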

The project currently has:

68 passing tests
MIT license
modular analysis pipeline
explainable scoring system

Why I Open Sourced It

I strongly believe useful developer tools should be accessible and hackable.

A lot of data tooling today feels either:

too enterprise-focused
too rigid
too expensive
or too opaque

I wanted to build something developers could actually inspect, extend, and experiment with.

What’s Next

Planned improvements include:

plugin system for custom analyses
exportable reports
dashboard mode
multi-dataset comparison
collaborative sessions

I also want to significantly improve recommendation quality and the overall UX.

Looking for Feedback

I’d genuinely love feedback from: