lsustc

Posted on Dec 7, 2025 • Edited on Dec 10, 2025

AI Browser Updates: How Far Have We Come?

#webdev #ai #programming #agents

Hey everyone! It's been a while since I last posted about this project. Thanks to Claude Code's assistance, I've been continuously iterating on this open-source project in my spare time. Today, I want to share what we've built and where we're heading.

Before diving in, a quick ask: If you're a developer and find this project valuable, please give us a Star on GitHub ⭐️

Repository: https://github.com/DeepFundAI/ai-browser

Stars not only encourage us but also help more people discover this tool. We've gained some traction, but we need more support to keep going.

📊 Current Status

Since the project launch, here's what makes me proud:

✅ Received valuable feedback and suggestions
✅ Developers are starting to follow and explore the project
✅ Cross-platform support (macOS, Windows) that runs stably

But honestly, as a young open-source project, we need more visibility and support. That's why I'm asking you to give us a Star on GitHub — it really matters.

🎉 What We've Built Recently

1. History Playback + Continue Conversations

Previous pain point: History was read-only, so you couldn't pick a past task back up.

Now:

✅ Click any historical task to replay the full execution (with typewriter effects)
✅ Support play/pause/speed control
✅ Continue the conversation from where you left off
✅ Preview attached files directly

Technical Implementation:
We built a PlaybackEngine that breaks message streams into atomic fragments (AtomicFragment) — the smallest replayable units. This allows precise control over playback progress and speed. Task data is persisted via IndexedDB for offline viewing. When resuming, we restore the complete execution context (workflow, steps, attachments, etc.) to ensure seamless continuation.
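To make the fragment idea concrete, here is a minimal sketch of fragment-based playback. It is not the project's actual code: the field names on `AtomicFragment`, the `RecordedMessage` type, the `toFragments` helper, and the `PlaybackSketch` class are all assumptions made for illustration. Each message is split into one-character fragments, and a time budget scaled by `speed` decides how many fragments are due on each tick.

```typescript
// Hypothetical shapes: the post names AtomicFragment but not its fields.
interface AtomicFragment {
  messageId: string;
  text: string;     // smallest chunk revealed at once
  delayMs: number;  // wait before showing this fragment at 1x speed
}

interface RecordedMessage {
  id: string;
  content: string;
}

// Split every message into single-character fragments (typewriter effect).
function toFragments(messages: RecordedMessage[], delayMs: number = 20): AtomicFragment[] {
  const fragments: AtomicFragment[] = [];
  for (const message of messages) {
    for (const ch of Array.from(message.content)) {
      fragments.push({ messageId: message.id, text: ch, delayMs });
    }
  }
  return fragments;
}

class PlaybackSketch {
  speed = 1;
  private fragments: AtomicFragment[];
  private cursor = 0;
  private budgetMs = 0;
  private playing = false;

  constructor(fragments: AtomicFragment[]) {
    this.fragments = fragments;
  }

  play(): void { this.playing = true; }
  pause(): void { this.playing = false; }

  // Fraction of fragments already emitted, between 0 and 1.
  get progress(): number {
    return this.fragments.length === 0 ? 1 : this.cursor / this.fragments.length;
  }

  // Call from a timer with the real time elapsed since the last call.
  // Returns the fragments that became due; nothing is emitted while paused.
  advance(elapsedMs: number): AtomicFragment[] {
    if (!this.playing) return [];
    this.budgetMs += elapsedMs * this.speed;
    const due: AtomicFragment[] = [];
    while (this.cursor < this.fragments.length) {
      const next = this.fragments[this.cursor];
      if (next === undefined || this.budgetMs < next.delayMs) break;
      this.budgetMs -= next.delayMs;
      due.push(next);
      this.cursor += 1;
    }
    return due;
  }
}
```

Because progress is just a cursor into the fragment array, pausing, resuming, and changing speed do not need to touch the recorded data itself.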

This dramatically improves task continuity. For example, if AI helped you collect data yesterday, you can continue analyzing it today without starting over.

2. Human Interaction Capability

Scenario: AI encounters situations requiring human decisions

Solution:

✅ AI can ask questions during execution
✅ After you respond, AI continues
✅ Useful for login confirmations, option selections, etc.

Example:

Task: Help me collect data from a login-required website

AI: Login required. Are you logged in?
You: Yes, already logged in
AI: Got it, continuing data collection...

Technical Implementation:
Using the eko framework's HumanInteraction message type, the AI can initiate interaction requests during execution. We established a bidirectional communication channel between the main and renderer processes via Electron IPC. When the AI needs to ask something, the workflow pauses and waits for the user's response. Once the answer comes back over IPC, the Agent resumes execution. State is tracked and errors are handled throughout the pause/resume cycle.
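The pause-and-resume pattern can be sketched with a promise that stays pending until the user answers. This is an illustration under assumptions, not the project's code: `HumanQuestion` and `HumanInteractionBroker` are hypothetical names, and the `send` callback stands in for whatever IPC call delivers the question to the renderer.

```typescript
// Hypothetical message shape; eko's real HumanInteraction type is not shown here.
interface HumanQuestion {
  id: number;
  question: string;
}

class HumanInteractionBroker {
  private nextId = 1;
  private pending = new Map<number, (answer: string) => void>();
  private send: (q: HumanQuestion) => void;

  // `send` is a stand-in for the main-to-renderer IPC call (an assumption).
  constructor(send: (q: HumanQuestion) => void) {
    this.send = send;
  }

  // Agent side: awaiting the returned promise pauses the workflow.
  ask(question: string): Promise<string> {
    const id = this.nextId++;
    return new Promise<string>((resolve) => {
      this.pending.set(id, resolve);
      this.send({ id, question });
    });
  }

  // UI side: called when the user's reply arrives. Returns false for
  // unknown or already-answered ids so stale replies are ignored.
  answer(id: number, text: string): boolean {
    const resolve = this.pending.get(id);
    if (!resolve) return false;
    this.pending.delete(id);
    resolve(text);
    return true;
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

An agent step would then read something like `const reply = await broker.ask("Login required. Are you logged in?")`, and execution continues only after `answer` is called with the matching id.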

This lets AI Browser handle complex tasks that can't be finished without a human decision along the way.

3. Voice Input Support

Features:

✅ Support voice input for tasks (no typing needed!)
✅ Support Vosk offline speech recognition
✅ Auto-switch recognition models based on language

Technical Implementation:
We use Vosk's local offline speech recognition engine by default. It needs no internet connection, which protects user privacy. The matching recognition model (Chinese/English) is loaded automatically based on the selected language. We plan to support Microsoft Azure and iFlytek cloud services as optional alternatives.

This feature is especially useful for:

When you're too lazy to type
Quickly inputting complex tasks
Accessibility needs

Note: Since we use offline speech recognition, we currently embed relatively simple Chinese/English models. Chinese recognition accuracy isn't ideal yet.
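For the language-based model switching described in this section, a rough sketch might look like the following. The directory names, `toSpeechLanguage`, and `ModelSwitcher` are hypothetical, and the loader is injected so the example does not depend on any particular Vosk binding. The point is simply that a model is reloaded only when the language actually changes.

```typescript
type SpeechLanguage = "zh" | "en";

// Hypothetical model locations; the real bundled paths are not given in the post.
const MODEL_DIRS: Record<SpeechLanguage, string> = {
  zh: "models/zh",
  en: "models/en",
};

// Map a UI locale such as "zh-CN" or "en-US" to a supported model language.
// Anything that is not Chinese falls back to English (an assumption).
function toSpeechLanguage(locale: string): SpeechLanguage {
  return locale.toLowerCase().startsWith("zh") ? "zh" : "en";
}

class ModelSwitcher<M> {
  private current: { lang: SpeechLanguage; model: M } | undefined;
  private load: (dir: string) => M;

  // `load` would wrap the actual model constructor in a real app.
  constructor(load: (dir: string) => M) {
    this.load = load;
  }

  // Return the model for the locale, loading it only on a language change.
  use(locale: string): M {
    const lang = toSpeechLanguage(locale);
    if (this.current === undefined || this.current.lang !== lang) {
      this.current = { lang, model: this.load(MODEL_DIRS[lang]) };
    }
    return this.current.model;
  }
}
```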

4. Multi-Language Internationalization

Support:

✅ Chinese/English interface switching
✅ Complete translation coverage
✅ Date/time localization

Technical Implementation:
We built the i18n layer on i18next + react-i18next. Translation resources are organized by module (main.json, history.json, agent-config.json, etc.) with namespace isolation. The current language lives in Zustand global state, so switching languages doesn't require a page refresh. Dates and times are formatted with date-fns locales. Adding a new language only requires adding the corresponding JSON translation files.
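To show what namespace isolation buys you, here is a dependency-free sketch of a namespaced lookup with an English fallback. It only imitates the idea; it is not i18next's implementation, and the sample keys and the `translate` helper are made up for illustration. Each namespace corresponds to one JSON file per language, so the same key name can exist in several modules without clashing.

```typescript
// locale -> namespace -> key -> text. Sample entries are invented.
const resources: Record<string, Record<string, Record<string, string>>> = {
  en: {
    main: { send: "Send" },
    history: { replay: "Replay", continue: "Continue conversation" },
  },
  zh: {
    main: { send: "发送" },
    history: { replay: "回放" },
  },
};

// Resolve "namespace:key" (or a bare key in the "main" namespace).
// Falls back to the fallback locale, then to the key itself.
function translate(locale: string, key: string, fallback: string = "en"): string {
  const sep = key.indexOf(":");
  const ns = sep === -1 ? "main" : key.slice(0, sep);
  const name = sep === -1 ? key : key.slice(sep + 1);
  return resources[locale]?.[ns]?.[name] ?? resources[fallback]?.[ns]?.[name] ?? key;
}
```

With this layout, supporting another language means adding one more top-level entry (in the real app, one more set of JSON files) without touching any calling code.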

We hope this tool can serve more users beyond just Chinese speakers.

5. Agent Configuration System

Features:

✅ Customize Agent prompts (make AI fit your needs)
✅ Manage MCP tools (CRUD operations)
✅ Configure different Agent capabilities

This makes AI Browser much more flexible and customizable.

6. Toolbox Page

Improvements:

✅ Centralized access to all system features
✅ Clearer navigation
✅ One-click jump to config, scheduled tasks, history, etc.

🗺️ What's Next

Based on feedback and our roadmap, here's what we're prioritizing:

Phase 1 (Near-term, 1-2 weeks)

Quick wins and iterations:

Task Working Directory Isolation
- Each task uses an independent working directory
- Avoid file interference between tasks
- Clearer file management
Windows Background Running Optimization
- Improve Windows background running characteristics
- Reduce resource usage
- Enhance stability
Generated File Download Support
- Direct download of AI-generated files
- Batch download support
- Better file management
Playback Speed Control
- Adjust history playback speed
- Fast-forward/slow-motion support
- More flexible playback experience

Phase 2 (Mid-term, 2-4 weeks)

User experience improvements:

Performance Optimization
- Virtual scrolling for long conversations (100+ messages without lag)
- Memory optimization
- Faster startup time
Multi-Language Enhancement
- Auto-detect system language
- Dynamic download of language-specific offline packages
- Support dynamic configuration of online speech recognition (Microsoft, iFlytek)
Theme Customization
- Dark mode
- Multiple color schemes
- User-defined colors
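For the virtual scrolling item under Performance Optimization, the core calculation is small enough to sketch. This version assumes every row has the same fixed height, which real chat messages do not, so treat it as a starting point rather than the planned implementation. `visibleRange` and its parameters are hypothetical names.

```typescript
interface VisibleRange {
  start: number;      // index of the first row to render
  end: number;        // index one past the last row to render
  offsetTop: number;  // pixels of spacer above the first rendered row
  totalHeight: number; // full scrollable height in pixels
}

// Compute which rows need to be in the DOM for the current scroll position.
// `overscan` renders a few extra rows on each side to hide pop-in while scrolling.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan: number = 3,
): VisibleRange {
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisibleExclusive = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  const start = Math.max(0, firstVisible - overscan);
  const end = Math.min(rowCount, lastVisibleExclusive + overscan);
  return { start, end, offsetTop: start * rowHeight, totalHeight: rowCount * rowHeight };
}
```

With 1,000 messages and a 600px viewport, only around 18 rows are rendered at a time instead of all 1,000, which is where the lag reduction comes from.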

Phase 3 (Long-term, 1-2 months)

Core capability expansion:

Visual Workflow Editor
- Support workflow step adjustment
- Support saving specific workflows
- Import saved workflows when creating scheduled tasks
Plugin Marketplace
- Official MCP tool library supporting HTTP, stdio, SSE
- Community plugin sharing
- One-click install/update
More Agent Support
- ShellAgent (command execution)
- EmailAgent (email send/receive)
- NotionAgent (Notion operations)

🤔 What We Need

As an open-source project, we need three types of support:

1. ⭐️ Stars (Simple but Important)

Why does it matter?

Helps more people discover the project
Attracts potential contributors
Gives us motivation to keep developing

It takes just 5 seconds: https://github.com/DeepFundAI/ai-browser

2. 💬 Feedback and Suggestions

What's your use case?

What problems have you encountered?
What features would you like to see?
Any improvement suggestions?

Tell us on GitHub Issues or in comments!

3. 🤝 Code Contributions

If you're a developer:

PRs for bug fixes are welcome
Contribute new features
Improve documentation

We take every contribution seriously.

📌 Quick Links

🌟 GitHub: https://github.com/DeepFundAI/ai-browser
📥 Download: https://www.deepfundai.com/altas/download
📖 Configuration Guide: https://github.com/DeepFundAI/ai-browser/blob/main/docs/CONFIGURATION.md
💬 Issue Tracker: https://github.com/DeepFundAI/ai-browser/issues

Final Thoughts

From initial concept to a tool that solves real problems — this journey has been challenging but fulfilling.

Every Star, every piece of feedback, every user is our motivation to keep going.

If you haven't tried AI Browser yet, download it and give it a spin. If you're already using it, we'd love to hear about your experience.

Most importantly, if you find this project valuable, please give us a Star on GitHub ⭐
