This is part of my journey building the Kai ecosystem—a fully local, offline-first voice assistant that keeps your data yours.
Well, I started building an app for myself first.
I collaborated with Claude to build layered, natural-language time parsing, and my goal was simple: a functional app that actually does what it's designed to do.
Kai Lite: 5-Point Summary
- Privacy-first voice assistant - Complete offline functionality, zero cloud data sharing, all data stays on your device
- Natural voice commands - Add reminders, create memos, check calendar using speech-to-text with pattern-based parsing
- Local-first architecture - Flutter mobile app with SQLite storage, works in airplane mode, no internet required
- User data control - Export/delete everything anytime, transparent permissions, visual indicators when mic is active
- Future ecosystem foundation - Designed to sync with Kai Laptop/Desktop while maintaining privacy and user control
This week, I'm sharing what actually happened when I tried to build a voice agent that works completely offline. Turns out, it's harder than expected, even for AI-native builders.
App Demo
My AI Collaborator This Week
Claude: My main implementation partner throughout this build. From initial architecture decisions to debugging regex patterns, Claude helped me think through each technical challenge and iterate quickly on solutions.
What I Actually Built (The Messy Reality)
Attempt 1: "Let's Build Alexa-Level Voice Commands"
The goal was ambitious: voice commands that work as smoothly as Alexa, but completely local.
Started with the standard Flutter voice setup:
dependencies:
  speech_to_text: ^6.3.0
  flutter_tts: ^3.8.3
  permission_handler: ^11.0.1
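That last dependency is there because the mic needs a runtime permission prompt before any listening works; a minimal sketch of gating on it (the helper name is mine):
import 'package:permission_handler/permission_handler.dart';

// Ask for microphone access up front; don't start listening if it's denied.
Future<bool> ensureMicPermission() async {
  final status = await Permission.microphone.request();
  return status.isGranted;
}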
Basic voice service structure:
class VoiceService {
  final SpeechToText _speech = SpeechToText();
  final FlutterTts _tts = FlutterTts();

  Future<void> initialize() async {
    await _speech.initialize();
    // Kai's calm voice settings
    await _tts.setSpeechRate(0.9);
    await _tts.setPitch(1.0);
  }

  // Spoken responses go through TTS (used by the conversation handler later on)
  Future<void> speak(String text) async {
    await _tts.speak(text);
  }
}
The reality check:
Spent a day testing and realized that even with onDevice: true, the accuracy wasn't consistent enough for the "Alexa-level" experience I wanted.
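The testing itself was nothing fancy: call listen with on-device recognition and watch the confidence scores come back. Roughly:
// Rough shape of the accuracy test: listen on-device and log what comes back.
await _speech.listen(
  onDevice: true,
  partialResults: false,
  onResult: (result) {
    // recognizedWords and confidence come from the speech_to_text result object
    print('${result.recognizedWords} (confidence: ${result.confidence})');
  },
);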
Result: Needed a completely different approach.
Attempt 2: Comprehensive Pattern-Based Parser (What Actually Works)
Claude suggested focusing on pattern-based parsing instead of trying to build a mini-Alexa.
Smart advice—I used AI to help design the VoiceCommandParser architecture and generate comprehensive regex patterns for different ways people naturally speak.
class VoiceCommandParser {
  static final Map<String, List<RegExp>> patterns = {
    'calendar_add': [
      RegExp(r'remind me to (.*?) at (.*)'),
      RegExp(r'add (.*?) to calendar at (.*)'),
      RegExp(r'schedule (.*?) for (.*)'),
      RegExp(r'set reminder (.*?) at (.*)'),
      RegExp(r'(.*?) at (.*?) today'),
      RegExp(r'(.*?) at (.*?) tomorrow'),
    ],
    'calendar_check': [
      RegExp(r"what'?s on my calendar\??"),
      RegExp(r"what do i have today\??"),
      RegExp(r"show my schedule"),
      RegExp(r"any events today\??"),
    ],
    'memo_add': [
      RegExp(r'note to self[,:]? (.*)'),
      RegExp(r'remember that (.*)'),
      RegExp(r'make a note[,:]? (.*)'),
      RegExp(r'write down (.*)'),
    ],
  };

  static VoiceCommand parse(String input) {
    input = input.toLowerCase().trim();

    // Check each pattern category
    for (final entry in patterns.entries) {
      final intent = entry.key;
      final patternList = entry.value;

      for (final pattern in patternList) {
        final match = pattern.firstMatch(input);
        if (match != null) {
          return _extractCommand(intent, input, match);
        }
      }
    }

    // Fuzzy matching fallback
    return _fuzzyMatch(input);
  }
}
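For reference, parse returns a VoiceCommand value object that isn't shown above; based on how it's used later (intent, title, time, confidence), a minimal sketch looks like this:
// Minimal value object returned by the parser; fields inferred from how it's used below.
class VoiceCommand {
  final String intent;      // e.g. 'calendar_add', 'memo_add', 'unknown'
  final String? title;
  final String? time;
  final double confidence;  // fuzzy matches score lower than exact pattern hits

  VoiceCommand({
    required this.intent,
    this.title,
    this.time,
    this.confidence = 1.0,
  });
}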
Added smart time parsing that handles natural language:
static String? _parseTime(String timeStr) {
  // Natural language conversions
  final conversions = {
    'morning': '9:00 AM',
    'afternoon': '2:00 PM',
    'evening': '6:00 PM',
    'night': '9:00 PM',
    'noon': '12:00 PM',
    'midnight': '12:00 AM',
  };

  // Check natural language first
  for (final entry in conversions.entries) {
    if (timeStr.contains(entry.key)) {
      return entry.value;
    }
  }

  // Parse actual times (3pm, 3:30pm, 15:00)
  final timeMatch = RegExp(r'(\d{1,2})(?::(\d{2}))?\s*(am|pm)?',
      caseSensitive: false).firstMatch(timeStr);

  if (timeMatch != null) {
    var hour = int.parse(timeMatch.group(1) ?? '0');
    final minute = timeMatch.group(2) ?? '00';
    var ampm = timeMatch.group(3)?.toUpperCase();

    // Smart guessing for ambiguous times
    if (ampm == null) {
      if (hour >= 7 && hour <= 11) {
        ampm = 'AM';
      } else if (hour >= 1 && hour <= 6) {
        ampm = 'PM';
      } else if (hour >= 13 && hour <= 23) {
        hour = hour - 12;
        ampm = 'PM';
      }
    }
    // Anything still ambiguous (a bare "12", for example) defaults to PM
    ampm ??= 'PM';

    return '$hour:$minute $ampm';
  }
  return null;
}
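A few concrete inputs and what the parser above turns them into, to make the guessing visible:
"morning" → "9:00 AM" (natural-language lookup)
"3" → "3:00 PM" (ambiguous hour, afternoon guess)
"15:30" → "3:30 PM" (24-hour input normalized)
"7:45am" → "7:45 AM"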
Multi-turn conversation handler for missing information:
class ConversationHandler {
  ConversationContext _context = ConversationContext();
  final VoiceService _voice = VoiceService(); // spoken responses go through the TTS wrapper

  Future<void> handleCommand(String input) async {
    final command = VoiceCommandParser.parse(input);

    if (command.confidence < 0.7) {
      await _voice.speak("I'm not sure. Did you want to add a calendar event or create a memo?");
      return;
    }

    // Handle missing information
    if (command.intent == 'calendar_add') {
      if (command.title == null) {
        _context.state = ConversationState.waitingForTitle;
        await _voice.speak("What would you like me to remind you about?");
        return;
      }
      if (command.time == null) {
        _context.state = ConversationState.waitingForTime;
        await _voice.speak("What time should I set the reminder for?");
        return;
      }
      await _createCalendarEvent(command);
    }
  }
}
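ConversationContext and ConversationState are referenced above but not shown; they're just enough state to remember what Kai is waiting for between turns. A minimal sketch (the exact naming is mine):
// Tracks what the assistant is waiting on between turns.
enum ConversationState { idle, waitingForTitle, waitingForTime }

class ConversationContext {
  ConversationState state = ConversationState.idle;
  String? pendingTitle; // filled in once the user answers the follow-up question
  String? pendingTime;
}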
Performance after this approach:
- Recognition accuracy: 90% for supported patterns
- Response time: <300ms end-to-end
- Memory usage: 45MB while active
- Battery impact: <2% over full day of testing
Real example that works:
User: "Remind me to call mom tomorrow at three"
↓
STT: "remind me to call mom tomorrow at three"
↓
Pattern match: RegExp(r'remind me to (.*?) at (.*)')
↓
Extract: title="call mom tomorrow", time="three"
↓
Time parsing: "three" → "3:00 PM" (afternoon guess)
↓
Date parsing: "tomorrow" → DateTime.now().add(Duration(days: 1))
↓
Create task in SQLite
↓
TTS: "Perfect! I've added 'call mom' for 3 PM tomorrow"
Attempt 3: Rethinking the Alexa-Level Goal
Realized I was thinking about this wrong. Instead of trying to match Alexa, I built something simpler that works reliably.
My actual architecture:
// 1. Local STT with better settings
await _speech.listen(
  onDevice: true,
  listenFor: Duration(seconds: 3), // Shorter timeout
  cancelOnError: true,
  partialResults: false, // Wait for complete result
);
// 2. Pattern-based parsing with multiple variations
static VoiceCommand parse(String input) {
  input = input.toLowerCase().trim();

  // Check each pattern category
  for (final entry in patterns.entries) {
    final intent = entry.key;
    final patternList = entry.value;

    for (final pattern in patternList) {
      final match = pattern.firstMatch(input);
      if (match != null) {
        return _extractCommand(intent, input, match);
      }
    }
  }
  return VoiceCommand(intent: 'unknown');
}
// 3. Smart time parsing
static String? _parseTime(String timeStr) {
  final conversions = {
    'morning': '9:00 AM',
    'afternoon': '2:00 PM',
    'evening': '6:00 PM',
    'noon': '12:00 PM',
  };

  // Handle natural language first
  for (final entry in conversions.entries) {
    if (timeStr.contains(entry.key)) {
      return entry.value;
    }
  }

  // Then handle actual times like "3pm" or "3:30"
  final timeMatch = RegExp(r'(\d{1,2})(?::(\d{2}))?\s*(am|pm)?')
      .firstMatch(timeStr);
  // ... parsing logic
}
Real example of what works:
User says: "Remind me to call mom at three"
↓
Local STT: "remind me to call mom at three"
↓
Pattern match: RegExp(r'remind me to (.*?) at (.*)')
↓
Extract: title="call mom", time="three"
↓
Parse time: "three" → "3:00 PM" (smart guess for afternoon)
↓
Create task in SQLite
↓
Response: "Added 'call mom' for 3:00 PM today"
Performance after optimization:
- Recognition time: 200-400ms
- Memory usage: 40MB while active
- Accuracy: 85% for supported commands
- Battery impact: <2% over full day
The Privacy Architecture I Actually Built
Problem: How do you prove to users that nothing leaves their phone?
My solution - complete transparency:
1. Visual indicators everywhere:
// Kai bubble pulses when listening
AnimatedContainer(
  duration: Duration(milliseconds: 300),
  decoration: BoxDecoration(
    color: _isListening
        ? Color(0xFF9C7BD9).withOpacity(0.8) // Active purple
        : Color(0xFF9C7BD9).withOpacity(0.2), // Calm purple
    shape: BoxShape.circle,
  ),
)
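The _isListening flag driving that animation comes straight from the speech plugin's status callback, so the pulse only runs while the mic is genuinely open. Roughly:
// The status string fires with 'listening' / 'notListening' / 'done';
// mirror it into the flag that drives the pulsing bubble.
await _speech.initialize(
  onStatus: (status) {
    setState(() => _isListening = status == 'listening');
  },
);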
2. Data export built in from day 1:
class DataExportService {
  Future<String> exportAllUserData() async {
    final tasks = await CalendarService().getAllTasks();
    final memos = await MemoService().getAllMemos();

    return jsonEncode({
      'export_date': DateTime.now().toIso8601String(),
      'tasks': tasks.map((t) => t.toMap()).toList(),
      'memos': memos.map((m) => m.toMap()).toList(),
    });
  }
}
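To actually hand that JSON to the user, it just gets written to a file they can grab or share; a minimal sketch, assuming path_provider is in the project (the file name is arbitrary):
import 'dart:io';
import 'package:path_provider/path_provider.dart';

// Write the export JSON somewhere the user can actually reach.
Future<File> saveExport() async {
  final json = await DataExportService().exportAllUserData();
  final dir = await getApplicationDocumentsDirectory();
  return File('${dir.path}/kai_export.json').writeAsString(json);
}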
3. One-tap delete everything:
Future<void> deleteAllUserData() async {
  await CalendarService().clearAllTasks();
  await MemoService().clearAllMemos();
  await SharedPreferences.getInstance().then((prefs) => prefs.clear());
  // Show confirmation: "All data deleted"
}
What surprised me: in testing, I (as my own first user) cared more about seeing the "Export my data" and "Delete everything" buttons than about perfect voice accuracy. Just knowing I had control felt satisfying.
Database Design That Actually Works Offline
Used SQLite with sync-ready fields from the start:
class Task {
  final String id;
  final String title;
  final DateTime? date;
  final String? time;
  final bool isCompleted;

  // Sync-ready fields for future
  final DateTime lastModified;
  final String sourceDevice;
  final String status; // 'active' | 'deleted'

  Task({
    required this.id,
    required this.title,
    this.date,
    this.time,
    this.isCompleted = false,
    required this.lastModified,
    this.sourceDevice = 'kai-lite-android',
    this.status = 'active',
  });
}
Why this works:
- Everything works offline immediately
- Sync fields ready for when I build cross-device features
- Soft deletes mean data recovery is possible
- Device tracking for multi-device scenarios
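For completeness, the tasks table mirrors those fields one-to-one, sync columns included; a sketch of the schema (column names follow the Dart model, the SQLite types are my assumption):
// Table behind the Task model; columns mirror the Dart fields, sync columns included.
await db.execute('''
  CREATE TABLE IF NOT EXISTS tasks (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    date TEXT,
    time TEXT,
    isCompleted INTEGER NOT NULL DEFAULT 0,
    lastModified TEXT NOT NULL,
    sourceDevice TEXT NOT NULL DEFAULT 'kai-lite-android',
    status TEXT NOT NULL DEFAULT 'active'
  )
''');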
Performance Debugging (The Fun Stuff)
Issue 1: Memory leaks during voice processing
// Problem: Not disposing speech service
@override
void dispose() {
  _speech.stop();   // Added this
  _speech.cancel(); // And this
  super.dispose();
}
Issue 2: Battery drain from overlay
// Problem: Overlay always active
// Solution: Smart hiding
void _hideOverlayDuringCalls() {
  if (_phoneStateService.isInCall()) {
    _overlay.hide();
  }
}
Issue 3: SQLite performance with 1000+ tasks
// Added indexing for date queries
await db.execute('''
CREATE INDEX IF NOT EXISTS idx_task_date_status
ON tasks(date, status)
''');
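That index lines up with the hot path: fetching a day's active tasks. Roughly (dateKey is a placeholder for however the date is keyed):
// The query the index was added for: active tasks on a given date.
final rows = await db.query(
  'tasks',
  where: 'date = ? AND status = ?',
  whereArgs: [dateKey, 'active'],
  orderBy: 'time ASC',
);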
What I Learned (Technical & Otherwise)
Technical insights:
- SQLite performs way better than expected on mobile
- Local speech processing is viable if you optimize for specific use cases
- Pattern matching beats AI models for simple command parsing
- Flutter overlays are battery killers if not managed properly
UX insights:
- Privacy needs to feel empowering, not defensive
- Visual feedback builds more trust than explanations
- Reliable simple commands can feel smoother overall than unreliable complex ones
Architecture insights:
- Build offline-first from day 1, add sync later
- Start with the simplest solution that could work
- Real user testing catches issues you never thought of
The Current State
What actually ships:
- 15+ voice command patterns that work reliably
- Complete offline functionality (no internet required)
- Export/delete controls for full data ownership
- <300ms voice response time