All tests run on an 8-year-old MacBook Air. All results from shipping 7 Mac apps as a solo developer. No sponsored opinion.
Waiting for a full AI response before showing anything feels broken in 2026. Streaming makes the UX feel instant. Here's how I implemented Gemini streaming in Rust with Tauri events.
The Approach
Gemini supports server-sent events (SSE) for streaming responses. The flow:
- Rust backend opens an SSE connection to the Gemini API
- As chunks arrive, emit Tauri events to the frontend
- Frontend appends chunks to the UI in real time
No waiting. No spinner for 3 seconds. Text appears as it generates.
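For reference, an SSE stream is plain text: each event is one or more `field: value` lines followed by a blank line, and lines starting with a colon are comments. The sketch below shows one way to classify a single line. The names `SseLine` and `classify_line` are made up for illustration, and the sample stream in `main` is a simplified stand-in, not real Gemini output.

```rust
/// What a single line of an SSE stream means to this app.
#[derive(Debug, PartialEq)]
enum SseLine<'a> {
    /// A `data:` field carrying a payload (JSON, in Gemini's case).
    Data(&'a str),
    /// The `[DONE]` sentinel that some streaming APIs send as a final payload.
    Done,
    /// A blank line, which terminates the current event.
    EventEnd,
    /// Comments (lines starting with ':') and fields this app does not use.
    Ignored,
}

fn classify_line(line: &str) -> SseLine<'_> {
    // Tolerate CRLF line endings.
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    if line.is_empty() {
        return SseLine::EventEnd;
    }
    match line.strip_prefix("data:") {
        Some(rest) => {
            // The SSE format allows one optional space after the colon.
            let payload = rest.strip_prefix(' ').unwrap_or(rest);
            if payload == "[DONE]" {
                SseLine::Done
            } else {
                SseLine::Data(payload)
            }
        }
        None => SseLine::Ignored,
    }
}

fn main() {
    // Simplified sample stream (hypothetical payloads).
    let wire = "data: {\"text\":\"Hel\"}\n\ndata: {\"text\":\"lo\"}\n\n: keep-alive\ndata: [DONE]\n";
    for line in wire.lines() {
        println!("{:?}", classify_line(line));
    }
}
```

Keeping the classification in a pure function like this makes it easy to unit test without a network connection.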
The Rust Side
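use futures_util::StreamExt;
use tauri::{AppHandle, Emitter}; // Emitter provides emit() in Tauri v2

// AppError is the app's own error type (not shown). The `?` operators below
// assume it implements From<reqwest::Error>, and Tauri commands need the
// error type to be serializable.
#[tauri::command]
async fn stream_gemini(
    handle: AppHandle,
    prompt: String,
    api_key: String,
) -> Result<(), AppError> {
    let client = reqwest::Client::new();

    // Explicitly serialize body + set Content-Type header
    // Note: .json() can silently drop the body in Universal Binary release builds
    let body = serde_json::to_string(&serde_json::json!({
        "contents": [{"parts": [{"text": prompt}]}]
    }))
    .map_err(|e| AppError::Serialize(e.to_string()))?;

    let mut stream = client
        .post(format!(
            "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:streamGenerateContent?alt=sse&key={}",
            api_key
        ))
        .header("Content-Type", "application/json")
        .body(body)
        .send()
        .await?
        .bytes_stream();

    // A network chunk can end mid-line (or mid-character), so buffer the raw
    // bytes and only parse complete lines.
    let mut buf: Vec<u8> = Vec::new();
    'stream: while let Some(chunk) = stream.next().await {
        buf.extend_from_slice(&chunk?);
        while let Some(pos) = buf.iter().position(|&b| b == b'\n') {
            let raw: Vec<u8> = buf.drain(..=pos).collect();
            let line = String::from_utf8_lossy(&raw);
            if let Some(data) = line.trim_end().strip_prefix("data: ") {
                if data == "[DONE]" {
                    // Leave the outer loop too, not just the line loop
                    break 'stream;
                }
                if let Ok(json) = serde_json::from_str::<serde_json::Value>(data) {
                    // Forward only the generated text to the frontend
                    if let Some(content) = json
                        .pointer("/candidates/0/content/parts/0/text")
                        .and_then(|v| v.as_str())
                    {
                        handle.emit("gemini-chunk", content).ok();
                    }
                }
            }
        }
    }

    handle.emit("gemini-done", ()).ok();
    Ok(())
}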
The Frontend Side
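import { invoke } from '@tauri-apps/api/core' // Tauri v2; in v1 this is '@tauri-apps/api/tauri'
import { listen } from '@tauri-apps/api/event'

// setDisplayText, setIsStreaming and setError are assumed to be state setters
// (for example from React's useState) defined in the surrounding component.
async function startStreaming(prompt: string, apiKey: string) {
  let response = ''

  // Register both listeners before invoking, so no early chunk is missed
  const unlisten = await listen<string>('gemini-chunk', (event) => {
    response += event.payload
    setDisplayText(response)
  })

  const unlistenDone = await listen('gemini-done', () => {
    unlisten()
    unlistenDone()
    setIsStreaming(false)
  })

  try {
    await invoke('stream_gemini', { prompt, apiKey })
  } catch (e) {
    // Clean up listeners on error too
    unlisten()
    unlistenDone()
    setIsStreaming(false)
    setError(String(e))
  }
}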
The Gotchas
SSE parsing is fragile. The data: prefix, blank lines between events, and [DONE] markers all need handling, and a network chunk can end in the middle of a line. The code above handles the common case; add logging for malformed chunks in production.
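One way to make the parser robust against chunk boundaries is to buffer raw bytes and hand out only complete lines. This is a minimal std-only sketch, not code from the app: `LineBuffer` and `push` are hypothetical names, and the chunks in `main` are invented to show a line split mid-character.

```rust
/// Reassembles complete lines from arbitrarily split byte chunks.
#[derive(Default)]
struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Appends a chunk and returns every line it completes, without the
    /// trailing line ending. Bytes after the last newline stay buffered.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let raw: Vec<u8> = self.pending.drain(..=pos).collect();
            // Decode whole lines only, so a multi-byte character split across
            // two chunks is never decoded in halves.
            let line = String::from_utf8_lossy(&raw);
            let trimmed = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
            lines.push(trimmed.to_string());
        }
        lines
    }
}

fn main() {
    let mut buf = LineBuffer::default();
    // One event split mid-line and mid-character ("é" is two bytes in UTF-8).
    let chunks: [&[u8]; 3] = [b"data: {\"text\":\"caf", b"\xC3", b"\xA9\"}\n\n"];
    for chunk in chunks {
        for line in buf.push(chunk) {
            println!("complete line: {:?}", line);
        }
    }
}
```

Splitting on the newline byte is safe for UTF-8 because that byte never appears inside a multi-byte character.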
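Always clean up listeners. If the user navigates away mid-stream, orphaned listeners keep firing against dead state. The gemini-done handler and the catch block above cover the success and error paths; if the view can unmount mid-stream, call both unlisten functions from its teardown as well.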
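Rate limits. Gemini can answer with a 429. An HTTP status arrives with the response headers, before the first chunk, so check it right after send() and emit a dedicated error event. On the frontend, listen for gemini-error and run the same cleanup as the catch block, because gemini-done is never emitted on this path:
// Right after .send().await?, with the result bound to `response`
// (a name assumed here) and before calling .bytes_stream()
if response.status() == reqwest::StatusCode::TOO_MANY_REQUESTS {
    handle.emit("gemini-error", "Rate limit hit, please retry").ok();
    return Ok(());
}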
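If more than one status needs handling, the decision can be pulled into a small pure function that is easy to test. This is only a sketch: `error_event_for_status` is a hypothetical helper, and the message wording for statuses other than 429 is an assumption, not something the Gemini API prescribes.

```rust
/// Maps an HTTP status to the (event name, message) to emit, or None when
/// the response is fine and streaming should proceed.
fn error_event_for_status(status: u16) -> Option<(&'static str, String)> {
    match status {
        200..=299 => None,
        429 => Some(("gemini-error", "Rate limit hit, please retry".to_string())),
        // Generic fallback message (assumed wording).
        other => Some(("gemini-error", format!("Request failed with HTTP {}", other))),
    }
}

fn main() {
    for status in [200u16, 429, 503] {
        match error_event_for_status(status) {
            Some((event, message)) => println!("{} -> emit {}: {}", status, event, message),
            None => println!("{} -> start streaming", status),
        }
    }
}
```

In the command, the status would come from the response as a `u16` (reqwest exposes this via `response.status().as_u16()`), and the returned pair would be passed to `handle.emit`.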
Don't use .json() in release builds. On Universal Binary (Intel + Apple Silicon) DMG builds, reqwest's .json() can silently drop the request body. Always use explicit .header("Content-Type", "application/json").body(serde_json::to_string(&payload)?) instead.
The Result
Streaming makes AI features feel responsive even on slow networks. The implementation is ~50 lines of Rust and ~20 lines of TypeScript.
Worth it for every AI feature in every app.
If this was useful, a ❤️ helps more than you'd think!
👉 HiyokoBar → https://hiyokomtp.lemonsqueezy.com/checkout/buy/f9b85321-6878-40aa-a472-ff748d6de2d5
X → @hiyoyok