Alan Lima
From CSV to Insights: Building a Local AI Data Analysis Pipeline

A few days ago I decided to run a small experiment: how far could I go building a data analysis system with AI agents running locally? The initial idea was simple: upload a dataset and generate something useful from it. The result ended up being more interesting than I expected.

In about 30 minutes of prototyping, I built a pipeline of agents capable of receiving any CSV or JSON file and producing an executive report with insights, patterns, and recommendations based on the data.

The whole system runs 100% locally, without relying on external APIs.

*(Screenshot: agents dashboard)*

The architecture is based on a pipeline of specialized agents. Each agent has a specific responsibility, and the output from one becomes context for the next. This creates a progressive chain of analysis where insights accumulate as the pipeline moves forward.

The flow looks roughly like this:

Agent 1 — Schema understanding
The first agent inspects the dataset structure: columns, data types, initial distributions, and possible inconsistencies. It also tries to detect structural anomalies early in the process.
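As a rough sketch of what this first stage's pre-processing might look like (this is not the project's actual code; `profileColumns` and the `ColumnProfile` shape are hypothetical), you can build a compact schema summary for the model to reason over instead of sending raw rows:

```typescript
// Hypothetical helper: derive a lightweight per-column profile from
// parsed CSV rows, so the schema agent gets structure, not raw data.
type ColumnProfile = {
  name: string;
  inferredType: "number" | "string" | "empty";
  missing: number;        // count of empty cells
  sampleValues: string[]; // a few example values for the prompt
};

function profileColumns(rows: Record<string, string>[]): ColumnProfile[] {
  if (rows.length === 0) return [];
  return Object.keys(rows[0]).map((name) => {
    const values = rows.map((r) => r[name] ?? "");
    const nonEmpty = values.filter((v) => v.trim() !== "");
    const numeric = nonEmpty.every((v) => !Number.isNaN(Number(v)));
    return {
      name,
      inferredType:
        nonEmpty.length === 0 ? "empty" : numeric ? "number" : "string",
      missing: values.length - nonEmpty.length,
      sampleValues: nonEmpty.slice(0, 3),
    };
  });
}
```

A profile like this also gives the agent something concrete to flag as an inconsistency, for example a "numeric" column with many missing cells.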

Agent 2 — Statistics and correlations
This stage focuses on more traditional data analysis metrics: averages, distributions, outliers, and correlations between variables.
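The numeric part of this stage doesn't need a model at all. As an illustrative sketch (again, not the project's actual code), plain helpers can compute the values the agent then interprets:

```typescript
// Illustrative stats helpers the second agent could cite in its analysis.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Pearson correlation between two equal-length numeric columns.
function pearson(xs: number[], ys: number[]): number {
  const mx = mean(xs);
  const my = mean(ys);
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < xs.length; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    dx += (xs[i] - mx) ** 2;
    dy += (ys[i] - my) ** 2;
  }
  return num / Math.sqrt(dx * dy);
}
```

Feeding the model pre-computed numbers like these, rather than asking it to do arithmetic over raw rows, keeps the statistical claims in the report grounded.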

Agent 3 — Business patterns
Using the statistical output and previous context, this agent attempts to extract more interpretable patterns — recurring behaviors, trends, or relationships that may have business meaning.

Agent 4 — Executive report
The final agent synthesizes everything into a concise report focused on insights and recommendations someone could actually use to make decisions.

One detail that made a big difference was passing context between the agents. Instead of each agent analyzing the dataset independently, each stage receives the output from the previous one. This allows insights to compound throughout the pipeline rather than producing isolated analyses.
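The chaining idea can be sketched in a few lines. Assuming each agent is a function from the dataset summary plus accumulated context to a text result (the `Agent` type and names here are hypothetical, not the repository's API):

```typescript
// Minimal sketch of context chaining: each stage's output is appended
// to a running context string that the next stage receives.
type Agent = {
  name: string;
  run: (dataSummary: string, context: string) => Promise<string>;
};

async function runPipeline(
  agents: Agent[],
  dataSummary: string
): Promise<string[]> {
  const outputs: string[] = [];
  let context = "";
  for (const agent of agents) {
    const result = await agent.run(dataSummary, context);
    outputs.push(result);
    // This stage's output becomes part of the next stage's context.
    context += `\n## ${agent.name}\n${result}`;
  }
  return outputs;
}
```

The final report agent therefore sees the schema notes, the statistics, and the business patterns all at once, which is what lets its recommendations reference earlier findings.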

The stack is fairly straightforward:

  • Node.js + TypeScript on the backend
  • React on the frontend
  • Server-Sent Events (SSE) for streaming results

With SSE, the user can watch each agent complete its step in real time instead of waiting for the entire pipeline to finish before seeing results. It’s a small UX detail, but it makes the system feel much faster and more interactive.
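On the server, SSE is just a long-lived HTTP response with a specific wire format. A minimal sketch with plain Node (the `/analyze` route and event name are hypothetical, not the project's actual endpoints):

```typescript
// Sketch of the SSE side: each completed agent step is pushed to the
// browser as a named event over a kept-open HTTP response.
import http from "node:http";

// SSE wire format: optional "event:" line, a "data:" line, then a blank line.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

const server = http.createServer((req, res) => {
  if (req.url === "/analyze") {
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    });
    // In the real pipeline, a write like this happens as each agent finishes.
    res.write(sseFrame("agent-done", { agent: "schema", summary: "..." }));
  }
});
```

On the React side, a standard `EventSource` pointed at that route receives each frame as it arrives, so the UI can mark agents complete one by one.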

I also included basic observability from the start. To understand how the pipeline behaves, I added:

  • structured logs
  • execution metrics
  • per-agent duration tracking
  • token usage estimation
  • error rate monitoring

This eventually turned into a small observability dashboard for the pipeline, which makes it easier to see where the system spends the most time.
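Per-agent duration tracking can be as simple as a timing wrapper around each agent call. A hypothetical sketch (the `timed` helper and `AgentMetric` shape are illustrative, not the project's code):

```typescript
// Hypothetical instrumentation: wrap each agent call and record a
// structured metric entry the dashboard can aggregate later.
type AgentMetric = {
  agent: string;
  durationMs: number;
  ok: boolean;
};

const metrics: AgentMetric[] = [];

async function timed<T>(agentName: string, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    const result = await fn();
    metrics.push({
      agent: agentName,
      durationMs: performance.now() - start,
      ok: true,
    });
    return result;
  } catch (err) {
    // Failures are recorded too, which feeds the error-rate metric.
    metrics.push({
      agent: agentName,
      durationMs: performance.now() - start,
      ok: false,
    });
    throw err;
  }
}
```

Because every stage goes through the same wrapper, "where does the pipeline spend its time" becomes a simple aggregation over `metrics`.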

*(Screenshot: metrics dashboard)*

The project works with any model available in Ollama, so it’s easy to experiment with different local models and compare results.
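For reference, a call to a local Ollama model goes through its REST API on port 11434; the sketch below uses the documented `/api/generate` route, while `callModel` and the payload builder are placeholder names of my own, not the project's:

```typescript
// Sketch of a non-streaming call to Ollama's local REST API.
// Any model you have pulled (e.g. via `ollama pull`) can be named here.
function buildRequest(model: string, prompt: string) {
  return { model, prompt, stream: false as const };
}

async function callModel(model: string, prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildRequest(model, prompt)),
  });
  const json = (await res.json()) as { response: string };
  return json.response;
}
```

Swapping models is then just a matter of changing the `model` string, which makes side-by-side comparisons cheap.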

If you want to explore the idea or adapt it, the code and documentation are available on GitHub:

https://github.com/alanrslima/data-analyst-agents

This started as a quick experiment but opened up some interesting possibilities, especially for people exploring multi-agent architectures and automated data analysis without relying on external AI services.
