I built an open-source Local RAG & Markdown Extractor because AIs kept choking on my Data Analytics files

#ai #opensource #python #nextjs

I’m taking the Google Data Analytics course right now, which means I'm constantly feeding massive PDFs, huge CSVs, and long documents into LLMs. Almost every tool I tried either lost context, crashed, or was a nightmare to set up.

So I built GoldPan AI to handle the heavy lifting. It's a multimodal extractor and Local RAG workspace that doesn't choke.

The stack:

Takes anything -> outputs Markdown: PDFs, huge CSVs, audio, YouTube, and JS-heavy web pages. (Uses MarkItDown, Playwright, and Gemini 1.5 Flash for vision/audio.)
Parallel Processing: Chunking and parsing run concurrently via ThreadPoolExecutor, which is what keeps it from crashing on massive files.
100% Local Vector DB: Embeds with all-MiniLM-L6-v2 straight into a local ChromaDB instance. No cloud dependency for embedding or storage.
Workspace UI: Next.js/Tailwind frontend to chat with your documents natively.
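
To give a feel for the parallel processing item above, here's a minimal sketch of chunking a document and handling the chunks concurrently with ThreadPoolExecutor. This is not GoldPan's actual code: `chunk_text`, `process_chunk`, `process_document`, and the chunk size, overlap, and worker count are all hypothetical names and values assumed for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks that overlap slightly.

    Hypothetical helper: the size and overlap defaults are assumptions.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def process_chunk(chunk: str) -> str:
    """Stand-in for real per-chunk work (parsing, cleanup, embedding).

    Here it only collapses runs of whitespace.
    """
    return " ".join(chunk.split())


def process_document(text: str, max_workers: int = 4) -> list[str]:
    """Chunk a document and process the chunks on a thread pool."""
    chunks = chunk_text(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input chunks.
        return list(pool.map(process_chunk, chunks))
```

Threads pay off most when the per-chunk work is I/O-bound (API calls, file reads). For CPU-heavy pure-Python parsing the GIL limits the speedup, so a process pool may be the better fit there.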
I built this out of personal frustration, but figured some of you might find it useful for your local setups.

GitHub Repo: https://github.com/ptai-eng/GoldPan
