Hey everyone! It's been a while since I last posted about this project. Thanks to Claude Code's assistance, I've been continuously iterating on this open-source project in my spare time. Today, I want to share what we've built and where we're heading.
Before diving in, a quick ask: If you're a developer and find this project valuable, please give us a Star on GitHub ⭐️
Repository: https://github.com/DeepFundAI/ai-browser
Stars not only encourage us but also help more people discover this tool. We've gained some traction, but we need more support to keep going.
📊 Current Status
Since the project launch, here's what makes me proud:
- ✅ Received valuable feedback and suggestions
- ✅ Developers are starting to follow and explore the project
- ✅ Cross-platform support (Mac, Windows) running stably
But honestly, as a young open-source project, we need more visibility and support. That's why I'm asking you to give us a Star on GitHub — it really matters.
🎉 What We've Built Recently
1. History Playback + Continue Conversations
Previous pain point: History was read-only, so you couldn't pick a past conversation back up
Now:
- ✅ Click any historical task to replay the full execution (with typewriter effects)
- ✅ Support play/pause/speed control
- ✅ Continue the conversation from where you left off
- ✅ Preview attached files directly
Technical Implementation:
We built a PlaybackEngine that breaks message streams into atomic fragments (AtomicFragment) — the smallest replayable units. This allows precise control over playback progress and speed. Task data is persisted via IndexedDB for offline viewing. When resuming, we restore the complete execution context (workflow, steps, attachments, etc.) to ensure seamless continuation.
This dramatically improves task continuity. For example, if the AI helped you collect data yesterday, you can continue analyzing it today without starting over.
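To make the fragment idea concrete, here's a minimal TypeScript sketch. It is not the project's actual PlaybackEngine: the `PlaybackSketch` class, the `StoredMessage` shape, the fixed fragment size, and the 40 ms base delay are all assumptions made for illustration, and IndexedDB persistence is left out entirely.

```typescript
// Hypothetical shapes: the real AtomicFragment and PlaybackEngine may differ.
interface AtomicFragment {
  messageId: string;
  text: string; // the chunk rendered in one playback tick
}

interface StoredMessage {
  id: string;
  content: string;
}

// Split each message into fixed-size fragments (Array.from keeps code points intact).
function toFragments(messages: StoredMessage[], size = 4): AtomicFragment[] {
  const fragments: AtomicFragment[] = [];
  for (const message of messages) {
    const chars = Array.from(message.content);
    for (let i = 0; i < chars.length; i += size) {
      fragments.push({ messageId: message.id, text: chars.slice(i, i + size).join("") });
    }
  }
  return fragments;
}

class PlaybackSketch {
  speed = 1; // 2 plays twice as fast, 0.5 plays at half speed
  private index = 0;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private fragments: AtomicFragment[];
  private onFragment: (fragment: AtomicFragment) => void;
  private baseDelayMs: number;

  constructor(
    fragments: AtomicFragment[],
    onFragment: (fragment: AtomicFragment) => void,
    baseDelayMs = 40,
  ) {
    this.fragments = fragments;
    this.onFragment = onFragment;
    this.baseDelayMs = baseDelayMs;
  }

  // Fraction of fragments already emitted, between 0 and 1.
  get progress(): number {
    return this.fragments.length === 0 ? 1 : this.index / this.fragments.length;
  }

  // Emit one fragment; returns false once playback has finished.
  step(): boolean {
    if (this.index >= this.fragments.length) return false;
    this.onFragment(this.fragments[this.index]);
    this.index += 1;
    return true;
  }

  // Emit fragments on a timer whose delay shrinks as speed grows.
  play(): void {
    if (this.timer !== null) return; // already playing
    const tick = (): void => {
      this.timer = null;
      if (!this.step()) return;
      this.timer = setTimeout(tick, this.baseDelayMs / this.speed);
    };
    tick();
  }

  pause(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }

  // Jump to a fragment index, clamped to the valid range.
  seek(index: number): void {
    this.index = Math.max(0, Math.min(index, this.fragments.length));
  }
}
```

Because playback state is just an index into the fragment list, pause, seek, and speed changes all stay cheap: none of them needs to re-process the original message stream.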
2. Human Interaction Capability
Scenario: AI encounters situations requiring human decisions
Solution:
- ✅ AI can ask questions during execution
- ✅ After you respond, AI continues
- ✅ Useful for login confirmations, option selections, etc.
Example:
Task: Help me collect data from a login-required website
AI: Login required. Are you logged in?
You: Yes, already logged in
AI: Got it, continuing data collection...
Technical Implementation:
Based on the eko framework's HumanInteraction message type, the AI can issue an interaction request in the middle of execution. A bidirectional channel over Electron IPC connects the main and renderer processes. When the AI needs to ask something, the workflow pauses and waits for the user's response; once the answer comes back over IPC, the Agent resumes execution. State management and error handling cover the whole round trip.
This lets AI Browser handle complex tasks that need a human decision partway through.
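The pause-and-resume pattern can be sketched as a small broker that parks a Promise until the user answers. Everything here is hypothetical (`HumanInteractionBroker`, `HumanQuestion`, the request id format, the five-minute timeout); it shows the general shape under those assumptions, not the eko framework's or the app's actual code, and it leaves the Electron wiring out so it runs standalone.

```typescript
// Hypothetical names throughout; the real message types and channels may differ.
interface HumanQuestion {
  requestId: string;
  prompt: string;
}

type SendToUi = (question: HumanQuestion) => void;

interface PendingEntry {
  resolve: (answer: string) => void;
  timer: ReturnType<typeof setTimeout>;
}

class HumanInteractionBroker {
  private pending = new Map<string, PendingEntry>();
  private nextId = 0;
  private sendToUi: SendToUi;
  private timeoutMs: number;

  constructor(sendToUi: SendToUi, timeoutMs = 300000) {
    this.sendToUi = sendToUi;
    this.timeoutMs = timeoutMs;
  }

  // Called by the agent: resolves when the user answers, rejects on timeout.
  ask(prompt: string): Promise<string> {
    this.nextId += 1;
    const requestId = `q-${this.nextId}`;
    return new Promise<string>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        reject(new Error(`No answer received for ${requestId}`));
      }, this.timeoutMs);
      this.pending.set(requestId, { resolve, timer });
      this.sendToUi({ requestId, prompt });
    });
  }

  // Called when the UI's reply arrives; returns false for unknown or stale ids.
  answer(requestId: string, text: string): boolean {
    const entry = this.pending.get(requestId);
    if (entry === undefined) return false;
    clearTimeout(entry.timer);
    this.pending.delete(requestId);
    entry.resolve(text);
    return true;
  }
}
```

In an Electron app, `sendToUi` could wrap `webContents.send(...)` in the main process, and an `ipcMain.on(...)` listener could call `answer(...)` when the renderer replies. The agent side then simply awaits `broker.ask("Login required. Are you logged in?")` and continues once the Promise resolves.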
3. Voice Input Support
Features:
- ✅ Support voice input for tasks (no typing needed!)
- ✅ Support Vosk offline speech recognition
- ✅ Auto-switch recognition models based on language
Technical Implementation:
We use Vosk's local offline speech recognition engine by default, so no internet connection is required and audio never leaves your machine. The app loads the Vosk recognition model that matches the selected language (Chinese/English). We plan to support Microsoft Azure and iFlytek cloud services as optional alternatives.
This feature is especially useful for:
- Times when you're too lazy to type
- Quickly inputting complex tasks
- Accessibility needs
Note: Since we use offline speech recognition, we currently embed relatively simple Chinese/English models. Chinese recognition accuracy isn't ideal yet.
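Language-based model switching could look roughly like the sketch below. The directory names, the `ModelCache` class, and the injected `load` callback are assumptions for illustration; the callback stands in for whatever actually constructs a Vosk model from a directory, so no Vosk API is called here.

```typescript
type UiLanguage = "zh" | "en";

// Hypothetical directory names; the bundled models may be laid out differently.
const MODEL_DIRS: Record<UiLanguage, string> = {
  zh: "models/vosk-zh",
  en: "models/vosk-en",
};

// Map a locale tag such as "zh-CN" or "en_US" onto a supported language.
function resolveLanguage(locale: string, fallback: UiLanguage = "en"): UiLanguage {
  const primary = locale.toLowerCase().split(/[-_]/)[0];
  return primary === "zh" || primary === "en" ? primary : fallback;
}

// Loads each language's model at most once and reuses it afterwards.
class ModelCache<T> {
  private loaded = new Map<UiLanguage, T>();
  private load: (modelDir: string) => T;

  constructor(load: (modelDir: string) => T) {
    this.load = load;
  }

  get(locale: string): T {
    const language = resolveLanguage(locale);
    let model = this.loaded.get(language);
    if (model === undefined) {
      model = this.load(MODEL_DIRS[language]);
      this.loaded.set(language, model);
    }
    return model;
  }
}
```

Caching matters because offline models are comparatively expensive to load, so switching back to a language that was already used should not pay that cost twice.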
4. Multi-Language Internationalization
Support:
- ✅ Chinese/English interface switching
- ✅ Complete translation coverage
- ✅ Date/time localization
Technical Implementation:
We built the i18n layer on i18next + react-i18next. Translation resources are organized by module (main.json, history.json, agent-config.json, etc.), each in its own namespace. The current language lives in a Zustand global store, so switching languages needs no page refresh. Dates and times are formatted with date-fns locales. Adding a new language only requires adding the corresponding JSON translation files.
We hope this tool can serve more users beyond just Chinese speakers.
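For readers unfamiliar with namespaced translations, here's a dependency-free sketch of the idea. It deliberately does not import i18next or Zustand so that it runs on its own: `translate` and `createLanguageStore` are hypothetical stand-ins that mimic namespace lookup with an English fallback and a subscribable language state, and the keys and strings are made up.

```typescript
type Lang = "en" | "zh";
type Namespace = "main" | "history";
type Dictionary = Record<string, string | undefined>;

// Hypothetical resources; in the app these would come from main.json, history.json, etc.
const resources: Record<Lang, Record<Namespace, Dictionary>> = {
  en: {
    main: { title: "AI Browser" },
    history: { title: "History", replay: "Replay" },
  },
  zh: {
    main: { title: "AI 浏览器" },
    history: { title: "历史记录" }, // "replay" intentionally missing to show the fallback
  },
};

// Look up a key inside one namespace, falling back to English, then to the key itself.
function translate(lang: Lang, ns: Namespace, key: string): string {
  return resources[lang][ns][key] ?? resources.en[ns][key] ?? key;
}

// Tiny observable store standing in for the global language state.
function createLanguageStore(initial: Lang) {
  let current = initial;
  const listeners = new Set<(lang: Lang) => void>();
  return {
    get: (): Lang => current,
    set(lang: Lang): void {
      current = lang;
      listeners.forEach((listener) => listener(lang));
    },
    // Returns an unsubscribe function.
    subscribe(listener: (lang: Lang) => void): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}
```

Subscribers re-render when `set` is called, which is why a language switch can take effect without reloading the page.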
5. Agent Configuration System
Features:
- ✅ Customize Agent prompts (make AI fit your needs)
- ✅ Manage MCP tools (CRUD operations)
- ✅ Configure different Agent capabilities
This makes AI Browser much more flexible and customizable.
6. Toolbox Page
Improvements:
- ✅ Centralized access to all system features
- ✅ Clearer navigation
- ✅ One-click jump to config, scheduled tasks, history, etc.
🗺️ What's Next
Based on feedback and our roadmap, here's what we're prioritizing:
Phase 1 (Near-term, 1-2 weeks)
Quick wins and iterations:
- Task Working Directory Isolation
  - Each task uses an independent working directory
  - Avoid file interference between tasks
  - Clearer file management
- Windows Background Running Optimization
  - Improve Windows background running characteristics
  - Reduce resource usage
  - Enhance stability
- Generated File Download Support
  - Direct download of AI-generated files
  - Batch download support
  - Better file management
- Playback Speed Control
  - Adjust history playback speed
  - Fast-forward/slow-motion support
  - More flexible playback experience
Phase 2 (Mid-term, 2-4 weeks)
User experience improvements:
- Performance Optimization
  - Virtual scrolling for long conversations (100+ messages without lag)
  - Memory optimization
  - Faster startup time
- Multi-Language Enhancement
  - Auto-detect system language
  - Dynamic download of language-specific offline packages
  - Support dynamic configuration of online speech recognition (Microsoft, iFlytek)
- Theme Customization
  - Dark mode
  - Multiple color schemes
  - User-defined colors
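On the virtual scrolling item: the core of the technique is rendering only the rows near the viewport. Here's a minimal sketch of that calculation, assuming every row has the same height. Real chat messages vary in height and would need measured offsets, and `visibleRange` and its parameters are hypothetical names, not planned API.

```typescript
interface VisibleRange {
  start: number; // first row index to render (inclusive)
  end: number; // last row index to render (exclusive)
  offsetTop: number; // pixels of spacer to place above the first rendered row
}

// Assumes uniform row heights; overscan renders a few extra rows to hide gaps while scrolling.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
  overscan = 3,
): VisibleRange {
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisibleExclusive = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  const end = Math.min(totalRows, lastVisibleExclusive + overscan);
  const start = Math.min(end, Math.max(0, firstVisible - overscan));
  return { start, end, offsetTop: start * rowHeight };
}
```

With a 600 px viewport and 60 px rows, only about 16 rows are mounted at any time regardless of how long the conversation is.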
Phase 3 (Long-term, 1-2 months)
Core capability expansion:
- Visual Workflow Editor
  - Support workflow step adjustment
  - Support saving specific workflows
  - Import saved workflows when creating scheduled tasks
- Plugin Marketplace
  - Official MCP tool library supporting HTTP, stdio, SSE
  - Community plugin sharing
  - One-click install/update
- More Agent Support
  - ShellAgent (command execution)
  - EmailAgent (email send/receive)
  - NotionAgent (Notion operations)
🤔 What We Need
As an open-source project, we need three types of support:
1. ⭐️ Stars (Simple but Important)
Why does it matter?
- Helps more people discover the project
- Attracts potential contributors
- Gives us motivation to keep developing
It takes just 5 seconds: https://github.com/DeepFundAI/ai-browser
2. 💬 Feedback and Suggestions
What's your use case?
- What problems have you encountered?
- What features would you like to see?
- Any improvement suggestions?
Tell us on GitHub Issues or in comments!
3. 🤝 Code Contributions
If you're a developer:
- PRs for bug fixes are welcome
- Contribute new features
- Improve documentation
We take every contribution seriously.
📌 Quick Links
- 🌟 GitHub: https://github.com/DeepFundAI/ai-browser
- 📥 Download: https://www.deepfundai.com/altas/download
- 📖 Configuration Guide: https://github.com/DeepFundAI/ai-browser/blob/main/docs/CONFIGURATION.md
- 💬 Issue Tracker: https://github.com/DeepFundAI/ai-browser/issues
Final Thoughts
From initial concept to a tool that solves real problems — this journey has been challenging but fulfilling.
Every Star, every piece of feedback, every user is our motivation to keep going.
If you haven't tried AI Browser yet, download it and give it a spin. If you're already using it, we'd love to hear about your experience.
Most importantly, if you find this project valuable, please give us a Star on GitHub ⭐
👉 https://github.com/DeepFundAI/ai-browser
Let's make AI Browser better together!
#artificialintelligence #browserautomation #opensource #productivity #ai
Follow me for more updates and insights on AI tool development!
Questions or feedback? Drop a comment below!