DEV Community

PhatTai

I built an open-source Local RAG & Markdown Extractor because AIs kept choking on my Data Analytics files

I’m taking the Google Data Analytics course right now, which means constantly feeding massive PDFs, huge CSVs, and long documents into LLMs. Almost every tool I tried either lost context, crashed, or was a nightmare to set up.

So I built GoldPan AI to handle the heavy lifting. It's a multimodal extractor and Local RAG workspace that doesn't choke.

The stack:

Takes anything -> outputs Markdown: PDFs, huge CSVs, audio, YouTube, and JS-heavy web pages (uses MarkItDown, Playwright, and Gemini 1.5 Flash for vision/audio).
Parallel Processing: Chunking and parsing run concurrently via ThreadPoolExecutor, which is how it gets through massive files without crashing.
100% Local Vector DB: Embeds straight into a local ChromaDB instance using all-MiniLM-L6-v2. Zero cloud dependency for embedding and storage.
Workspace UI: Next.js/Tailwind frontend to chat with your documents natively.
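
To give a rough idea of what the parallel chunking step can look like, here is a minimal sketch using only the standard library. It is not GoldPan's actual code: the function names (chunk_text, chunk_documents), the character-window chunking strategy, and the default sizes are all assumptions made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of at most `size`, overlapping by `overlap`.

    Hypothetical helper: the real project may chunk by tokens or headings instead.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def chunk_documents(docs: dict[str, str], max_workers: int = 4) -> dict[str, list[str]]:
    """Chunk several Markdown documents concurrently, keyed by document name."""
    names = list(docs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so zip pairs each
        # name with the chunks of its own document.
        results = pool.map(chunk_text, (docs[name] for name in names))
        return dict(zip(names, results))
```

Calling chunk_documents({"report.md": markdown_text}) returns a dict mapping each name to its list of chunks, ready to be embedded. Worth noting: in CPython, threads mostly pay off when the work is I/O-bound (reading files, calling an API), so the biggest gains would likely come from the parsing side rather than from the string slicing shown here.
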

I built this out of personal frustration, but figured some of you might find it useful for your local setups.

GitHub Repo: https://github.com/ptai-eng/GoldPan