Hey everyone! I’m finally jumping into the dev.to community. To kick things off, I wanted to share two tools I’ve been developing at the University of Jaén that tackle two common headaches in the AI space: running out of VRAM, and keeping your API chats truly private.
🦥 Quansloth: TurboQuant Local AI Server
The Problem: Standard LLM inference hits a "memory wall" with long documents: the KV cache grows linearly with context length, so your GPU eventually runs out of memory (OOM) and the process crashes.
The Solution: Quansloth is a fully private, air-gapped AI server that brings elite KV cache compression to consumer hardware. By bridging a Gradio Python frontend with a highly optimized llama.cpp CUDA backend, it prevents GPU crashes and lets you run massive contexts on a budget.
Key Features:
- 75% VRAM Savings: Implements Google's TurboQuant (ICLR 2026) to compress the KV cache, the model's working "memory", from 16-bit to 4-bit.
- Punch Above Your Hardware: Run 32k+ token contexts natively on a 6GB RTX 3060 (a workload that normally demands a 24GB RTX 4090).
- Live Analytics & Stability: Intercepts C++ engine logs to report exact VRAM allocation in real-time, keeping the model within physical limits.
- Context Injector: Upload long PDFs directly into the chat stream.
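To see where the 75% figure comes from, here is a back-of-envelope KV-cache sizing sketch. The model dimensions (32 layers, 8 KV heads, head dim 128, roughly a 7B-class model) are illustrative assumptions, not Quansloth's exact configuration:

```typescript
// KV cache size = 2 (K and V tensors) × layers × kv_heads × head_dim
//               × context_tokens × bytes_per_value
function kvCacheBytes(
  contextTokens: number,
  layers: number,
  kvHeads: number,
  headDim: number,
  bitsPerValue: number,
): number {
  return 2 * layers * kvHeads * headDim * contextTokens * (bitsPerValue / 8);
}

const ctx = 32768; // 32k-token context
const fp16 = kvCacheBytes(ctx, 32, 8, 128, 16); // 16-bit baseline
const q4 = kvCacheBytes(ctx, 32, 8, 128, 4);    // 4-bit quantized

console.log(`fp16: ${fp16 / 2 ** 30} GiB, 4-bit: ${q4 / 2 ** 30} GiB`);
// 4 GiB shrinks to 1 GiB: exactly the 75% saving, which is what makes
// a 32k context feasible alongside the model weights on a 6GB card.
```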
🏗️ API2CHAT: Zero-Knowledge, Serverless GUI
The Problem: You want a clean interface to talk to various LLMs, but you don't want to deal with bloated backends, monthly subscriptions, or sending your private files to a centralized server.
The Solution: API2CHAT is an ultra-lightweight (under 9 KB) client-side GUI that connects to any OpenAI-compatible endpoint. It runs entirely in your browser's volatile memory and works on any low-end shared host (e.g., Namecheap).
Key Features:
- 100% Zero-Knowledge: No data or API keys are ever stored. Refreshing the page destroys the session.
- Local File Reading: Files (like PDFs) are read locally by your browser and injected into the prompt. Zero uploads to any server.
- Host Anywhere: Requires no PHP, Node.js, or Python. Host it on GitHub Pages, an S3 bucket, or just double-click index.html on your desktop, on any OS.
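The zero-knowledge model boils down to one idea: the API key and chat history live only in JavaScript variables, and the request goes straight from the browser to the provider. A minimal sketch of how such a client-side call can be assembled (function and variable names here are hypothetical, not API2CHAT's actual internals):

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Builds the request for any OpenAI-compatible /v1/chat/completions endpoint.
// The key is passed in as a plain argument: nothing is written to
// localStorage or cookies, so refreshing the page destroys the session.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Key goes straight to the provider; no middleman server ever sees it.
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Usage (in the browser):
//   const { url, init } = buildChatRequest(endpoint, key, "my-model", history);
//   const reply = await fetch(url, init).then(r => r.json());
```

Locally read files (the PDF feature) slot into the same flow: the browser's FileReader extracts the text, which is simply appended to `messages` before the request is built.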
Both projects are open-source (Apache 2.0). I’d love for you to check them out, leave a star if you find them useful, or drop some feedback in the issues if you end up deploying them!