The problem I kept running into
Every time I finished training a model, the same conversation happened:
Manager: "Why did it predict that?"
Me: opens SHAP plot
Manager: glazed eyes
SHAP and LIME are powerful — but they output numbers and plots that
only data scientists can read. Nobody builds the bridge to plain English.
Nobody automates the bias check. Nobody generates a report your legal
team can actually use.
So I built XAI-Agent to do all of that — powered by Hermes Agent's
autonomous multi-step planning pipeline.
What it does
Upload any trained ML model (.pkl) + dataset (.csv) →
Hermes Agent runs 5 tools autonomously →
You get a full plain-English explainability report in under 3 minutes.
The 5-step Hermes Agent pipeline:
- file_reader: loads the model, auto-detects the task type, picks the right explainer
- shap_analyzer: runs real SHAP, ranks all features by impact + direction
- lime_explainer: explains 3 individual predictions in plain English
- bias_checker: scans for demographic features, flags disparities
- report_writer: writes a structured Markdown report, downloadable instantly
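To make the bias_checker step more concrete, here is a minimal sketch of what "scan for demographic features, flag disparities" could look like. This is not the code from the repo: the keyword list, the function names, and the 0.1 threshold are all assumptions chosen for illustration, and the disparity measure is a simple gap in positive-prediction rates between groups.

```python
from collections import defaultdict

# Hypothetical keyword list; the real bias_checker may look for other terms.
DEMOGRAPHIC_KEYWORDS = ("age", "gender", "sex", "race", "ethnicity", "religion")


def find_demographic_columns(columns):
    """Return column names that contain a demographic keyword as a whole token."""
    found = []
    for name in columns:
        tokens = name.lower().replace("-", "_").replace(" ", "_").split("_")
        if any(token in DEMOGRAPHIC_KEYWORDS for token in tokens):
            found.append(name)
    return found


def positive_rate_gap(groups, predictions):
    """Return (gap, rates): the largest difference in positive-prediction
    rate between any two groups, plus the per-group rates."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for group, pred in zip(groups, predictions):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    rates = {g: pos / total for g, (pos, total) in counts.items()}
    return max(rates.values()) - min(rates.values()), rates


def flag_disparity(groups, predictions, threshold=0.1):
    """Flag the feature when the gap exceeds an (assumed) threshold."""
    gap, _ = positive_rate_gap(groups, predictions)
    return gap > threshold
```

Matching on whole tokens keeps a column like "mileage" from being mistaken for "age". On a dataset with no demographic columns, such as the breast cancer data, the scan simply comes back empty.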
What makes this genuinely agentic
Context flows between all 5 tools. The model type from Step 1
determines which SHAP explainer Step 2 uses. The feature ranking
from Step 2 informs Step 3's LIME analysis. The bias verdict from
Step 4 shapes Step 5's recommendations.
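The context hand-off described above can be sketched in plain Python. This does not use the Hermes Agent API, and the step bodies are stand-ins: the shared `ctx` dictionary, its key names, and the placeholder values are assumptions made only to show how each step reads what an earlier step wrote.

```python
def file_reader(ctx):
    # Stand-in: a real step would load the .pkl and inspect the estimator.
    ctx["model_type"] = "RandomForestClassifier"
    return ctx


def shap_analyzer(ctx):
    # Reads step 1's output to choose an explainer family.
    tree_models = ("RandomForest", "XGB", "LGBM")
    is_tree = ctx["model_type"].startswith(tree_models)
    ctx["explainer"] = "tree" if is_tree else "kernel"
    # Placeholder ranking, copied from the sample output in this post.
    ctx["top_features"] = ["worst area", "worst concave points", "mean concave points"]
    return ctx


def lime_explainer(ctx):
    # Reads step 2's ranking to decide which features to describe first.
    ctx["lime_focus"] = ctx["top_features"][:2]
    return ctx


def bias_checker(ctx):
    ctx["bias_detected"] = False  # placeholder verdict
    return ctx


def report_writer(ctx):
    # Reads step 4's verdict to shape the recommendation.
    if ctx["bias_detected"]:
        ctx["recommendation"] = "Review flagged features before deployment."
    else:
        ctx["recommendation"] = "No demographic bias was detected."
    return ctx


PIPELINE = [file_reader, shap_analyzer, lime_explainer, bias_checker, report_writer]


def run_pipeline(steps):
    """Run each step in order, threading one shared context through all of them."""
    ctx = {}
    for step in steps:
        ctx = step(ctx)
    return ctx
```

Because every step reads keys written by earlier steps, running them out of order fails fast with a `KeyError` instead of silently producing a report from missing inputs.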
It also handles a real edge case many tutorials miss: for classifiers, newer
SHAP versions can return a single 3D array (samples, features, classes) where
older versions returned a list of per-class 2D arrays. The agent detects the
layout automatically and slices out the right class, avoiding a shape mismatch
that breaks code written against the older layout.
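Here is a minimal sketch of that shape handling, using only NumPy. The function name `to_2d_shap` and the default `class_index=1` are assumptions for illustration; the agent's actual implementation may differ.

```python
import numpy as np


def to_2d_shap(shap_values, class_index=1):
    """Return a (samples, features) array for one class, whatever the input layout."""
    if isinstance(shap_values, list):
        # List layout: one (samples, features) array per class.
        return np.asarray(shap_values[class_index])
    values = np.asarray(shap_values)
    if values.ndim == 3:
        # 3D layout: (samples, features, classes), so slice the last axis.
        return values[:, :, class_index]
    if values.ndim == 2:
        # Regression and single-output models are already 2D.
        return values
    raise ValueError(f"Unexpected SHAP shape: {values.shape}")
```

Normalizing to one 2D array right after the explainer runs means the ranking and report steps never need to know which SHAP version produced the values.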
Sample output
Running on the breast cancer dataset (569 patients, 30 features):
Executive Summary (auto-generated):
This RandomForestClassifier was analyzed across 569 samples and
30 features. The most influential predictor is 'worst area'.
No demographic bias was detected.
SHAP top features:
- worst area — 0.0756 — ↑ increases malignancy prediction
- worst concave points — 0.0538 — ↑ increases malignancy prediction
- mean concave points — 0.0503 — ↑ increases malignancy prediction
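A ranking like the one above is commonly built from the mean absolute SHAP value per feature. The sketch below shows that calculation with NumPy. The direction label is an assumption: it uses the sign of the covariance between a feature's values and its SHAP values as a simple heuristic, which may not be how XAI-Agent derives its arrows.

```python
import numpy as np


def rank_features(shap_2d, X, feature_names, top_k=3):
    """Rank features by mean |SHAP| and attach a direction label to each.

    shap_2d and X are both (samples, features) arrays.
    """
    shap_2d = np.asarray(shap_2d, dtype=float)
    X = np.asarray(X, dtype=float)
    importance = np.abs(shap_2d).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]

    ranked = []
    for j in order:
        # Heuristic (assumption): positive covariance means higher feature
        # values tend to push the prediction up.
        cov = np.mean((X[:, j] - X[:, j].mean()) * (shap_2d[:, j] - shap_2d[:, j].mean()))
        if cov > 0:
            direction = "increases"
        elif cov < 0:
            direction = "decreases"
        else:
            direction = "neutral"
        ranked.append((feature_names[j], float(importance[j]), direction))
    return ranked
```

The input here is the per-class 2D SHAP array, so for a classifier the "increases" and "decreases" labels are relative to whichever class was sliced out.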
Prediction explained in plain English:
Row 0 — Predicted benign at 94% confidence.
'worst area' was well below the malignancy threshold
(impact: −0.141). 'worst concave points' also supported
benign classification (impact: −0.089).
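Turning signed per-feature weights into a sentence like this is mostly string templating. Here is a minimal sketch, assuming the weights arrive as (feature, weight) pairs (the shape LIME's `as_list()` returns) and that positive weights push toward a named positive class. The function name, parameters, and wording are hypothetical, and the output is plainer than the report text above.

```python
def explain_row(row_index, label, confidence, contributions,
                positive_class="malignant", top_n=2):
    """Build a one-paragraph explanation from signed feature weights.

    contributions: list of (feature_name, weight) pairs, where a positive
    weight pushes the prediction toward positive_class (an assumption).
    """
    # Keep only the strongest contributions, by absolute weight.
    top = sorted(contributions, key=lambda fw: abs(fw[1]), reverse=True)[:top_n]
    sentences = [f"Row {row_index}: predicted {label} at {confidence:.0%} confidence."]
    for feature, weight in top:
        if weight > 0:
            effect = f"pushed toward {positive_class}"
        else:
            effect = f"pushed away from {positive_class}"
        sentences.append(f"'{feature}' {effect} (impact: {weight:+.3f}).")
    return " ".join(sentences)


# Usage with the impact values from the sample output above.
print(explain_row(0, "benign", 0.94, [
    ("worst area", -0.141),
    ("worst concave points", -0.089),
]))
```

Sorting by absolute weight before truncating keeps the explanation focused on the features that mattered most for that row, whichever direction they pushed.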
Why this matters beyond the challenge
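The EU AI Act imposes transparency and human-oversight requirements on high-risk AI systems.
GDPR gives individuals rights around solely automated decisions, including
meaningful information about the logic involved.
US lenders must give specific reasons in adverse action notices, including
when an ML credit model makes the call.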
Commercial platforms such as Fiddler, Arize, and Arthur AI can cost $50K+/year.
XAI-Agent is free, open-source, runs locally, works in 3 minutes.
Tech stack
- Hermes Agent (autonomous multi-step planning)
- SHAP + LIME (real explainability — not simulated)
- Streamlit (UI)
- scikit-learn, XGBoost, LightGBM
Try it yourself
GitHub: https://github.com/SimranShaikh20/xai-agent
git clone https://github.com/SimranShaikh20/xai-agent
cd xai-agent
pip install -r requirements.txt
streamlit run app.py
Test files (sample_model.pkl + sample_dataset.csv) included —
runs in 3 minutes with zero extra setup.
What model would YOU run this on first? Drop it in the comments 👇