I’m taking the Google Data Analytics course right now, which means constantly feeding massive PDFs, huge CSVs, and long documents into LLMs. Almost every tool I tried either lost context, crashed, or was a nightmare to set up.
So I built GoldPan AI to handle the heavy lifting. It's a multimodal extractor and local RAG workspace that doesn't choke on large inputs.
The stack:
Takes anything -> outputs Markdown: PDFs, huge CSVs, audio, YouTube videos, and JS-heavy web pages (uses MarkItDown, Playwright, and Gemini 1.5 Flash for vision/audio).
Parallel Processing: Chunking and parsing run concurrently via ThreadPoolExecutor, so a massive file is handled in pieces rather than in one pass. This is why it doesn't crash on massive files.
100% Local Vector DB: Embeds straight into a local ChromaDB instance (all-MiniLM-L6-v2). Zero cloud dependency.
Workspace UI: Next.js/Tailwind frontend to chat with your documents natively.
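To give a rough idea of the parallel chunking step, here is a minimal sketch using only the standard library. It is not the actual GoldPan code: the names `chunk_text` and `process_documents`, the character-based window, and the default sizes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly.

    Hypothetical helper: the size and overlap defaults are illustrative only.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def process_documents(docs: dict[str, str], max_workers: int = 4) -> dict[str, list[str]]:
    """Chunk each document on a worker thread and collect the results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one chunking job per document, then wait for each result.
        futures = {name: pool.submit(chunk_text, text) for name, text in docs.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    docs = {"report": "x" * 2500, "note": "hello"}
    for name, chunks in process_documents(docs).items():
        print(name, len(chunks))
```

Threads tend to help most when the per-document work is I/O-bound (reading files, fetching pages, calling an API). Pure-Python string slicing like this is limited by the GIL, so treat the sketch as a picture of the structure rather than a benchmark.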
I built this out of personal frustration, but figured some of you might find it useful for your local setups.
GitHub Repo: https://github.com/ptai-eng/GoldPan

