This system runs 100% locally — no cloud APIs, no data leakage, full privacy.
AI IDE is a voice-enabled coding assistant that allows developers to interact with their development environment using natural voice commands.
🚀 Overview
AI IDE is a voice-enabled AI coding assistant that lets users interact with a development environment using natural speech. Unlike traditional AI tools, it runs entirely offline using local models, ensuring privacy, low latency, and full control over execution.
The project combines speech recognition, large language models, and an interactive code editor into a seamless, VS Code-inspired interface.
⸻
🧠 Motivation
Most AI coding assistants rely heavily on cloud APIs, which introduce latency, cost, and privacy concerns. The goal of AI IDE was to:
• Eliminate cloud dependency
• Run AI models locally
• Enable natural voice-based coding workflows
• Maintain full user control over generated code
⸻
🏗️ Architecture
AI IDE follows a modular client-server architecture optimized for real-time interaction:
Frontend (HTML, CSS, JavaScript)
• Three-pane UI: File Explorer, Code Editor, AI Chat
• Voice captured using MediaRecorder API
• Displays AI-generated code for user validation
• “Review & Confirm” system ensures safe execution
Backend (Python Flask)
• Handles API endpoints (/process, /write-file)
• Manages communication between UI and AI models
• Processes audio, routes intent, and executes actions
AI Processing Pipeline
• Audio → Text (Speech-to-Text)
• Text → Intent Detection (LLM)
• Intent → Action Execution
• Response → UI rendering
File Safety Workflow
• Generated code is not directly saved
• User must approve changes
• Prevents unintended file modifications
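To make this concrete, here is a minimal sketch of the two endpoints. The request and response shapes are assumptions, and transcribe() and detect_intent() stand in for the Whisper and Llama 3 helpers sketched in the Models section below:

```python
import os
import tempfile

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/process", methods=["POST"])
def process():
    # Persist the browser's MediaRecorder blob to a temporary file.
    upload = request.files["audio"]
    tmp = tempfile.NamedTemporaryFile(suffix=".webm", delete=False)
    tmp.close()  # release the handle before anything else opens the path
    upload.save(tmp.name)
    try:
        text = transcribe(tmp.name)      # Whisper: audio -> text
        result = detect_intent(text)     # Llama 3: text -> intent + code
    finally:
        os.unlink(tmp.name)
    # Generated code is only returned for review, never written to disk here.
    return jsonify(result)

@app.route("/write-file", methods=["POST"])
def write_file():
    # Reached only after the user clicks "Review & Confirm" in the UI.
    data = request.get_json()
    with open(data["filename"], "w") as f:
        f.write(data["code"])
    return jsonify({"status": "saved"})
```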
⸻
🔄 Example Workflow
1. User says: “Create a Python function for sorting a list”
2. Audio is captured and sent to backend
3. Whisper converts speech → text
4. LLM detects intent: write_code
5. Code is generated and displayed in editor
6. User reviews and confirms
7. File is saved locally
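For the spoken request above, step 4 might produce a structured result along these lines (the exact schema is an assumption; only the write_code intent name comes from the project):

```python
# Hypothetical intent-detection result for the command above
result = {
    "intent": "write_code",
    "code": "def sort_list(items):\n    return sorted(items)",
}
```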
⸻
🤖 Models Used & Justification
🔹 Whisper (Speech-to-Text)
• High accuracy transcription
• Works well on CPU
• Handles accents and noise effectively
• Ideal for offline usage
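A minimal transcription helper using the open-source whisper package; the "base" model size matches the latency notes later in this post, and the function name is illustrative:

```python
import whisper

# Load the "base" checkpoint once at startup; it runs acceptably on CPU.
model = whisper.load_model("base")

def transcribe(path: str) -> str:
    # Whisper shells out to ffmpeg to decode the audio file.
    result = model.transcribe(path)
    return result["text"]
```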
🔹 Llama 3 (8B) via Ollama
• Runs locally with strong performance
• OpenAI-compatible API
• Supports structured outputs (JSON intent classification)
• Eliminates cloud dependency
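Because Ollama exposes an OpenAI-compatible API, intent detection can be a single chat call. The prompt, intent labels, and JSON schema below are illustrative, not the project's exact ones:

```python
import json

from openai import OpenAI

# Ollama's OpenAI-compatible server; the api_key value is ignored but required.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def detect_intent(text: str) -> dict:
    response = client.chat.completions.create(
        model="llama3:8b",
        messages=[
            {"role": "system", "content": (
                "Classify the user's request. Respond with JSON only: "
                '{"intent": "write_code" or "chat", "code": "<code or empty>"}'
            )},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},  # structured JSON output
    )
    return json.loads(response.choices[0].message.content)
```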
⸻
⚙️ Tech Stack
• Frontend: HTML, CSS, JavaScript
• Backend: Python (Flask)
• STT: Whisper (local)
• LLM: Llama 3 via Ollama
• Audio Processing: ffmpeg
⸻
⚡ Challenges Faced
- FFmpeg Dependency Issues
Whisper requires FFmpeg at runtime, and a missing PATH entry caused runtime errors.
Solution:
Explicitly extended the process PATH before loading Whisper:

```python
import os

# Make Homebrew's ffmpeg binary visible to Whisper (macOS-specific path)
os.environ["PATH"] += ":/opt/homebrew/bin"
```
⸻
- File Locking & Concurrency Issues
Temporary audio files caused “Double-Open Lock” errors during concurrent processing.
Solution:
Carefully managed file lifecycle using NamedTemporaryFile with explicit closing before reuse.
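A sketch of that lifecycle: create the file with delete=False, close the handle before any other component opens the path, and delete it when done (names are illustrative):

```python
import os
import tempfile

def save_upload_to_temp(audio_bytes: bytes) -> str:
    # delete=False lets us close the handle before another process opens the
    # path, avoiding the double-open lock during concurrent processing.
    tmp = tempfile.NamedTemporaryFile(suffix=".webm", delete=False)
    try:
        tmp.write(audio_bytes)
    finally:
        tmp.close()  # fully release the handle before Whisper/ffmpeg reads it
    return tmp.name

# Usage: path = save_upload_to_temp(blob); text = transcribe(path); os.unlink(path)
```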
⸻
- Markdown Noise in LLM Output
Generated code often included markdown syntax and extra text.
Solution:
• Strong prompt constraints
• Custom regex cleaner to extract pure code
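One possible cleaner; the pattern below is a sketch rather than the project's exact regex:

```python
import re

# Match a fenced code block, skipping an optional language tag after the fence.
FENCE = re.compile(r"`{3}[a-zA-Z0-9_+-]*\n(.*?)`{3}", re.DOTALL)

def extract_code(llm_output: str) -> str:
    # Prefer the first fenced block; otherwise assume the reply is bare code.
    match = FENCE.search(llm_output)
    if match:
        return match.group(1).strip()
    return llm_output.strip()
```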
⸻
- Latency in Voice Processing
End-to-end pipeline (audio → AI → response) introduced delays.
Solution:
• Lightweight models (Whisper base, Llama 3 8B)
• Efficient request handling
⸻
📈 Future Improvements
• Faster streaming responses
• Multi-language voice support
• Advanced code execution sandbox
• Plugin-based architecture
⸻
🏁 Conclusion
Building AI IDE demonstrated that powerful AI systems can be built entirely on local infrastructure. By combining speech recognition with LLMs, we can create intuitive, private developer tools that run directly on-device.
This project reflects a shift toward edge AI — where intelligence runs closer to the user, offering better performance, privacy, and control.
⸻
🔗 Demo
💻 Source Code