dinesh0666
I Built a Claude Skill That Turns Any CSV Into an Executive Report — Here's How

Every data person knows the drill.

Someone drops a new dataset in your Slack. You open it, stare at 40 columns and 50,000 rows, and spend the next two hours doing what you always do — profiling nulls, checking distributions, writing up "here's what the data says" for the third time this month.

Those two hours? I automated them. With a Claude Skill.

It's called DataStory — upload any CSV or Excel file, and it produces a full executive data narrative report in seconds. Zero manual analysis. I open-sourced it at github.com/dinesh0666/data-story.

Here's exactly how it works.


What is a Claude Skill?

If you haven't seen these yet — Claude Skills are installable instruction sets (SKILL.md files) that teach Claude a specific repeatable workflow. When you install a skill and then trigger it (by uploading a file, using certain phrases, etc.), Claude reads the skill definition and executes the workflow step by step.

Think of it like a .github/workflows YAML, but for AI-driven tasks instead of CI/CD.

The pattern I've been exploring lately is Claude-in-Claude — where the Claude artifact or skill itself calls the Anthropic API to do the heavy analytical lifting. The UI is Claude, the analyst is also Claude. It's surprisingly powerful.


The DataStory architecture

Here's the full pipeline:

CSV/XLSX file
    ↓
profile_data.py        ← pandas: shape, nulls, distributions, outliers
    ↓
JSON profile
    ↓
Anthropic API          ← claude-sonnet-4-20250514: narrative generation
    ↓
JSON narrative
    ↓
generate_report.js     ← docx npm: 9-section Word report
    ↓
datastory_<file>.docx

Three scripts, one orchestrator shell script, and a React artifact that does all of this in-browser.


Step 1: Profile the data (Python)

The first script — profile_data.py — runs pure pandas profiling. No ML, no magic. Just stats.

For every column, it outputs:

  • Numeric: min, max, mean, median, std, q1, q3, outlier_count (IQR method)
  • Categorical: top 5 value frequencies
  • Datetime: range in days
  • All: null count, null %, unique count

# Outlier detection via IQR on a numeric pandas Series `nums`
q1, q3 = nums.quantile(0.25), nums.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = int(((nums < lower) | (nums > upper)).sum())

Files over 10k rows get automatically sampled. CSV encoding failures fall back to latin-1. The output is a clean JSON blob.
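The loading step the paragraph above describes can be sketched like this (the function and variable names are mine, not necessarily what the repo's profile_data.py uses, and the 10k sample size is my reading of the threshold):

```python
import pandas as pd

SAMPLE_ROWS = 10_000  # row threshold from the post; names here are illustrative

def load_frame(path: str) -> pd.DataFrame:
    """Read a CSV, falling back to latin-1 on encoding errors,
    then downsample files over the row threshold."""
    try:
        df = pd.read_csv(path)
    except UnicodeDecodeError:
        # Non-UTF-8 exports (Excel on Windows, older tools) usually decode as latin-1
        df = pd.read_csv(path, encoding="latin-1")
    if len(df) > SAMPLE_ROWS:
        df = df.sample(n=SAMPLE_ROWS, random_state=42)
    return df
```

The latin-1 fallback matters because it can decode any byte sequence, so the pipeline never dies on a weird export.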


Step 2: Generate the narrative (Claude API)

This is where the Claude-in-Claude magic happens. I send the profile JSON to claude-sonnet-4-20250514 with max_tokens: 4000 and a strict system prompt:

You are a senior data analyst writing an executive data story report.
Given a statistical profile, return ONLY valid JSON with these keys:
executive_summary, dataset_overview, key_findings (array of 5),
anomalies, column_insights, data_quality_score (0-100),
data_quality_label, data_quality_assessment, recommended_next_steps.

A critical lesson from building this: always set max_tokens high enough. My first version used 1000 and got "Unterminated string in JSON" errors because the response was being cut off mid-object. Bumping it to 4000 fixed it cleanly.
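In Python-SDK terms, the request shape looks roughly like this (the repo's actual call is in JavaScript; `SYSTEM_PROMPT`, `build_request`, and the payload wiring here are illustrative):

```python
import json

SYSTEM_PROMPT = (
    "You are a senior data analyst writing an executive data story report. "
    "Given a statistical profile, return ONLY valid JSON with these keys: ..."
)  # abbreviated; the full prompt is the one quoted above

def build_request(slim_profile: dict) -> dict:
    """Assemble a Messages API payload; note max_tokens well above 1000."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 4000,  # 1000 truncated the JSON mid-object
        "system": SYSTEM_PROMPT,
        "messages": [
            {"role": "user", "content": json.dumps(slim_profile)},
        ],
    }

# With the real SDK this would be roughly:
#   client = anthropic.Anthropic()
#   msg = client.messages.create(**build_request(slim_profile))
```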

I also slim the profile before sending — only 3 sample rows and 3 top values per column — to keep the prompt tight without losing analytical context.

const slimProfile = {
  ...profile,
  sample_rows: profile.sample_rows.slice(0, 3),
  columns: profile.columns.map(c => ({
    ...c,
    top_values: c.top_values
      ? Object.fromEntries(Object.entries(c.top_values).slice(0, 3))
      : undefined
  }))
};

The JSON parsing is hardened too — strips markdown fences, falls back to regex-extracting the first {...} block if Claude adds any preamble:

try {
  return JSON.parse(clean);
} catch {
  const match = clean.match(/\{[\s\S]*\}/);
  if (match) return JSON.parse(match[0]);
  throw new Error(`Unexpected response`);
}

Step 3: Generate the report (Node.js + docx)

The generate_report.js script takes the combined profile + narrative and builds a 9-section Word document using the docx npm package:

  1. Cover (title, filename, date)
  2. Executive Summary (callout box)
  3. Dataset Overview (stats table)
  4. Key Findings (numbered list)
  5. Anomalies & Concerns (amber callout — auto-skipped if empty)
  6. Column Profiles (full table: type, nulls, stats, outliers)
  7. Column Insights (bullet list)
  8. Data Quality Assessment (score + paragraph)
  9. Recommended Next Steps + Appendix

A few docx gotchas that burned me and might save you:

// ❌ NEVER use WidthType.PERCENTAGE — breaks in Google Docs
// ✅ Always use WidthType.DXA
width: { size: 9360, type: WidthType.DXA }

// ❌ NEVER ShadingType.SOLID — renders as solid black
// ✅ Always ShadingType.CLEAR
shading: { fill: "D6E4F0", type: ShadingType.CLEAR }

// Tables need DUAL widths — both on table AND each cell
// Forgetting one causes rendering issues on some platforms

The React artifact (Claude-in-Claude in the browser)

The app/DataStory.jsx file does the whole pipeline in-browser — no backend, no server. Just a React component that:

  1. Parses the uploaded file client-side using SheetJS (xlsx)
  2. Runs the profiling in pure JS (rewritten from the Python logic)
  3. Calls the Anthropic API directly from the browser
  4. Renders the full interactive report

And critically — there's a Download report button that generates a self-contained HTML file. Open it in any browser, hit Ctrl+P, save as PDF. That's your shareable report.

The quality ring (SVG animated arc) was a fun touch:

const circ = 2 * Math.PI * r;      // full circumference of the ring (r = radius)
const dash = (score / 100) * circ; // visible arc length scales with the score
<circle cx="42" cy="42" r={r} stroke={color}
  strokeDasharray={`${dash} ${circ}`}
  strokeLinecap="round"
  transform="rotate(-90 42 42)"
  style={{ transition: "stroke-dasharray 1s ease" }} />

The SKILL.md

The skill definition is what makes this installable in Claude.ai. The description field is critical — it's how Claude decides when to trigger the skill:

description: >
  Auto-generates an executive data narrative report from any uploaded CSV or Excel file.
  Trigger when user uploads a CSV, XLSX, or data file and wants analysis, a data story,
  a data profile, a summary report, insights, anomaly detection, or an executive summary.
  Also trigger when user says: "analyze this data", "what does this dataset say",
  "generate a report from this file", "profile my data", "summarize this spreadsheet".

The description needs to be pushy — Claude tends to undertrigger skills if the description is too vague. List out the exact phrases people will use.


Running it

# CLI — full pipeline
export ANTHROPIC_API_KEY=sk-ant-...
bash skill/scripts/run_datastory.sh sales_data.csv
# → datastory_sales_data.docx

# Just profile (no API key needed)
python3 skill/scripts/profile_data.py data.csv > profile.json

Or just use the React artifact — drag and drop your file, done.


What I learned building this

The Claude-in-Claude pattern is underrated. Having the artifact call the Anthropic API directly removes the need for any backend while still getting full model power. The browser becomes the compute layer.

Skill descriptions are the whole triggering mechanism. Spending 10 minutes writing a precise, phrase-rich description is worth more than spending hours on the skill logic if Claude never triggers it.

Data profiling was already a solved problem — the narrative is the hard part. Pandas stats take seconds. Writing "what this means" used to take hours. That's the gap the LLM fills.


What's next

I'm planning to add:

  • Multi-sheet Excel support — currently reads sheet 1 only
  • Correlation matrix — for numeric columns, spot which variables move together
  • Time series detection — if a datetime column exists, auto-generate trend narrative
  • PDF output — skip the browser print step
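Of those, the correlation matrix is a near one-liner with pandas. A sketch (not code from the repo — the helper name and sample data are made up):

```python
import pandas as pd

def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlations over numeric columns only."""
    return df.select_dtypes(include="number").corr()

# Toy example: revenue moves perfectly with units; region is skipped as non-numeric
df = pd.DataFrame({
    "units":   [1, 2, 3, 4],
    "revenue": [10, 20, 30, 40],
    "region":  ["N", "S", "E", "W"],
})
corr = numeric_correlations(df)
```

Feeding that matrix into the narrative prompt would let Claude call out which variables move together.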

Repo: github.com/dinesh0666/data-story

If you build something with it or find a bug, open an issue. And if this sparked an idea for your own Claude Skill — I'd love to see what you build.
