DEV Community: SmartCity Jaen

FrugalSloth trains small neural nets directly in your browser using WebGL/WebAssembly. Fully private

SmartCity Jaen — Thu, 21 May 2026 08:03:18 +0000

Tired of uploading datasets to someone else's GPU? FrugalSloth trains small neural nets directly in your browser using WebGL/WebAssembly. Fully private — weights never leave your machine. Export to ONNX when you're done. One-click demo included, zero install required.

github.com/PacifAIst/Frugalsloth

Offtoco — count GPT, Claude and Gemini tokens offline for web/CLI/desktop

SmartCity Jaen — Fri, 15 May 2026 13:33:48 +0000

I built Offtoco (https://github.com/PacifAIst/Offtoco) — a zero-knowledge offline token counter for GPT, Claude and Gemini

Tired of pasting prompts into online token counters that send your text to a server you don't control? I built Offtoco to fix that.

It counts tokens for GPT (o200k_base), Claude and Gemini simultaneously, gives you a SHA-256 fingerprint of your text, and does all of it 100% locally — no API calls, no telemetry, no internet required after download.

It ships three ways:

Web app — unzip, open index.html in any browser. Works on a USB stick, an air-gapped machine, or any static server. No Node.js required.
CLI — standalone executables for Windows, Linux and macOS (~90 MB, no dependencies). Pipe text, pass files, get JSON output for scripting.
Windows desktop — system tray app with Explorer right-click integration. Right-click any file → token count popup instantly.

GPL-3.0. Everything in the repo, audit it yourself.

ProxyFace: Give Your AI a Face & Emotions (100% Local, Zero Telemetry)

SmartCity Jaen — Sun, 10 May 2026 19:36:56 +0000

Hey everyone 333,

I wanted to share an open-source project called ProxyFace. If you're interacting with LLMs and want a more engaging experience, this adds a real-time, pixel-art avatar that reacts to the AI's output with actual emotions—and it runs entirely on your own machine.

Your AI now has a face, voice, and ears, but with zero telemetry and zero cloud dependencies for inference.

✨ What makes it special:
100% Local Emotion Brain: Runs a highly optimized 4MB TinyBERT model at 60ms via WebGPU/WASM. The face reacts to the AI's text (embarrassed, curious, delighted, etc.) without hitting any external APIs.

Hands-Free Voice Interaction: Hold Alt+T to speak and release to send. The AI replies and reacts, making it awesome for language learning or just natural conversation.

On-Device Eye Tracking: Uses MediaPipe locally so the avatar’s pupils follow your gaze. Video never leaves your computer.

Customizable Pixel Art: Comes with 40+ characters. You can easily drop in your own sprite sheet and instantly use your own custom avatar.
️ The Tech Stack: Built with React 18, Vite, Tailwind CSS, ONNX Runtime Web, and packaged for desktop with Electron. It is fully open-source under the GPL-3.0 license.

We are actively looking for feedback, developers, and pixel artists who want to submit their own characters to the official gallery (email us at yes@proxyface.com).

If you find the project interesting, giving us a ⭐ on GitHub helps out a lot. Let me know what you think of the tech stack or if you have any questions!

Sharing Two Open-Source Projects for Local AI & Secure LLM Access 🚀

SmartCity Jaen — Sat, 04 Apr 2026 08:38:51 +0000

Hey everyone! I’m finally jumping into the dev.to community. To kick things off, I wanted to share two tools I’ve been developing at the University of Jaén that tackle two common headaches in the AI space: running out of VRAM, and keeping your API chats truly private.

🦥 Quansloth: TurboQuant Local AI Server
The Problem: Standard LLM inference hits a "Memory Wall" with long documents. As context grows, your GPU runs out of memory (OOM) and crashes.
The Solution: Quansloth is a fully private, air-gapped AI server that brings elite KV cache compression to consumer hardware. By bridging a Gradio Python frontend with a highly optimized llama.cpp CUDA backend, it prevents GPU crashes and lets you run massive contexts on a budget.

Key Features:

75% VRAM Savings: Based on Google's TurboQuant (ICLR 2026) implementation, it compresses the AI's "memory" from 16-bit to 4-bit.
Punch Above Your Hardware: Run 32k+ token contexts natively on a 6GB RTX 3060 (a workload that normally demands a 24GB RTX 4090).
Live Analytics & Stability: Intercepts C++ engine logs to report exact VRAM allocation in real-time, keeping the model within physical limits.
Context Injector: Upload long PDFs directly into the chat stream.

🏗️ API2CHAT: Zero-Knowledge, Serverless GUI
The Problem: You want a clean interface to talk to various LLMs, but you don't want to deal with bloated backends, monthly subscriptions, or sending your private files to a centralized server.
The Solution: API2CHAT is an ultra-lightweight (under 9KBs) client-side GUI that connects to any OpenAI-compatible endpoint. It runs entirely in your browser's volatile memory and in any low-end webhosting like NameCheap.

Key Features:

100% Zero-Knowledge: No data or API keys are ever stored. Refreshing the page destroys the session.
Local File Reading: Files (like PDFs) are read locally by your browser and injected into the prompt. Zero uploads to any server.
Host Anywhere: Requires no PHP, Node.js, or Python. Host it on GitHub Pages, an S3 bucket, or literally just double-click index.html on your desktop in any OS.

Both projects are open-source (Apache 2.0). I’d love for you to check them out, leave a star if you find them useful, or drop some feedback in the issues if you end up deploying them!