I spent 4 months of nights and weekends building CapyBro — a Windows tray app that runs AI on any selected text via a global hotkey. Native .NET 8 + WPF (not Electron), MIT-licensed, ~49 MB installer. Two backends: cloud (OpenRouter) or fully local (Ollama). The hardest technical problem turned out to be Win32 paste-back into child controls. This post walks through that rabbit hole + the architecture decisions that paid off.
Why I built this
For most of 2025, my AI workflow was this loop:
1. Read something in Slack / a doc / a Telegram message
2. Alt+Tab → ChatGPT tab → paste
3. Type my prompt
4. Wait
5. Copy result
6. Alt+Tab back → paste over the original
I caught myself doing this 30+ times a day for trivial things — fixing one comma, translating a paragraph for a client email, rewording a DM so it doesn't sound passive-aggressive.
Then it hit me: AI is currently trapped in a browser tab. But every other utility on my PC — clipboard manager, screenshot tool, voice typer, password manager, snippet expander — is one hotkey away. Why isn't AI?
So I built it.
What CapyBro does
You select text anywhere on Windows. Word, Telegram, VS Code, your browser, Notepad, Discord, an email draft, a YAML file in your terminal — doesn't matter. The OS-level selection is the input.
You press Ctrl+Shift+E. A small popup appears. You pick a prompt (or type a custom one). AI runs. The result replaces the original text in the same app.
That's the entire product. The whole magic is in step 3 — replacing text in the source app. Sounds trivial. Took me three iterations of Win32 plumbing to get right.
The Win32 paste-back rabbit hole
This is the most undervalued part of the project. I expected ~100 lines of code. Got 130 lines of comments around a 50-line method that does two things: capture focused-child HWND before showing UI, then restore foreground + focus when user clicks Accept.
Naive solution (doesn't work)
SendKeys.SendWait("^v");
Why it fails:
- No control over which window receives the Ctrl+V
- If your modal dialog is still open, Ctrl+V goes THERE
- If foreground changed between Accept and send → paste lands somewhere random
Better but still broken
IntPtr originalForeground = GetForegroundWindow();
ShowDialog();
// User clicks Accept
SetForegroundWindow(originalForeground);
Thread.Sleep(50);
SendKeys.SendWait("^v");
Works for Notepad. Fails for Notepad++, VS Code, Office. Why?
-
SetForegroundWindowis gated by OS focus rules — caller must have received last input event, no active foreground lock, target not minimized. Any check fails → it silently returns false. - The actual editor lives in a child control (e.g. Scintilla inside Notepad++).
SetForegroundWindowactivates the top-level frame, but keyboard focus stays elsewhere.SendInput Ctrl+Vthen lands on the WindowProc of a non-input frame.
Working solution: AttachThreadInput sandwich
The trick is AttachThreadInput — an API that lets your thread temporarily share input state with another thread. While attached, the OS treats both threads as one for focus/foreground purposes, bypassing the "didn't receive last input event" check.
Plus: I need to know which child HWND had keyboard focus before my modal stole it. GetGUIThreadInfo.hwndFocus returns exactly that.
Here's the production code (extracted from Platform/ForegroundRestorer.cs):
Phase 1: capture, BEFORE showing UI
public static (IntPtr TopLevel, IntPtr FocusedChild) CaptureForegroundFocus()
{
var topLevel = NativeMethods.GetForegroundWindow();
if (topLevel == IntPtr.Zero)
return (IntPtr.Zero, IntPtr.Zero);
var targetThreadId = NativeMethods.GetWindowThreadProcessId(topLevel, out _);
if (targetThreadId == 0)
return (topLevel, IntPtr.Zero);
var info = new NativeMethods.GUITHREADINFO
{
CbSize = (uint)Marshal.SizeOf<NativeMethods.GUITHREADINFO>(),
};
if (NativeMethods.GetGUIThreadInfo(targetThreadId, ref info))
return (topLevel, info.HwndFocus);
return (topLevel, IntPtr.Zero);
}
Phase 2: restore, AFTER user clicks Accept
public static bool RestoreToForeground(IntPtr topLevel, IntPtr focusedChild)
{
if (topLevel == IntPtr.Zero) return false;
var targetThreadId = NativeMethods.GetWindowThreadProcessId(topLevel, out _);
if (targetThreadId == 0)
return NativeMethods.SetForegroundWindow(topLevel); // window died, best-effort
if (NativeMethods.IsIconic(topLevel))
NativeMethods.ShowWindowAsync(topLevel, NativeMethods.SwRestore);
var ourThreadId = NativeMethods.GetCurrentThreadId();
if (ourThreadId == targetThreadId)
return NativeMethods.SetForegroundWindow(topLevel); // AttachThreadInput on same thread is undefined
var attached = NativeMethods.AttachThreadInput(ourThreadId, targetThreadId, true);
try
{
NativeMethods.BringWindowToTop(topLevel);
var fgOk = NativeMethods.SetForegroundWindow(topLevel);
// CRITICAL: SetFocus on the FOCUSED CHILD, not the top-level frame.
// Without this, SendInput Ctrl+V echoes into the non-input frame.
var focusTarget = focusedChild != IntPtr.Zero ? focusedChild : topLevel;
NativeMethods.SetFocus(focusTarget);
return fgOk;
}
finally
{
// ALWAYS detach. Forget this, and the user loses keyboard control
// system-wide for the lifetime of your process.
if (attached)
NativeMethods.AttachThreadInput(ourThreadId, targetThreadId, false);
}
}
Pieces that earned their place
-
AttachThreadInput— bypasses focus-stealing protection. Without it,SetForegroundWindowsilently no-ops. -
SetFocuson child HWND — Notepad++'s actual edit is a Scintilla control nested inside its frame.SetFocuson the frame leaves keyboard focus on the wrong WindowProc. -
BringWindowToTopbeforeSetForegroundWindow— raises z-order even when the foreground call is rejected. Belt-and-braces. -
IsIconic+ShowWindowAsync(SW_RESTORE)—SetForegroundWindowno-ops on minimized targets. Restore first. -
finally+AttachThreadInput(false)— I forgot this once during development. Lost system-wide keyboard input until I rebooted. Don't be me. -
SendInputnotSendKeys—SendKeysuses scan codes that break on non-Latin layouts.SendInputworks with virtual-key codes.
Bonus rabbit hole: the clipboard is single-owner
Win32 clipboard is a single-owner resource. Clipboard managers, RDP virtual channels, antivirus, even the OS shell briefly hold it. Without retry, any concurrent open throws CLIPBRD_E_CANT_OPEN (HRESULT 0x800401D0) and you lose either the AI result or the user's original selection.
I wrap every clipboard call in an async retry loop (not sync — sync Thread.Sleep between attempts freezes the WPF UI for up to 500ms):
private static async Task<T> RetryAsync<T>(Func<T> action, CancellationToken ct)
{
const int RetryAttempts = 10;
var retryDelay = TimeSpan.FromMilliseconds(50);
for (var attempt = 1; attempt <= RetryAttempts; attempt++)
{
ct.ThrowIfCancellationRequested();
try
{
return action();
}
catch (COMException ex)
when (ex.HResult == unchecked((int)0x800401D0)
&& attempt < RetryAttempts)
{
await Task.Delay(retryDelay, ct).ConfigureAwait(true);
}
}
throw new InvalidOperationException();
}
The Win32 calls themselves are synchronous (the API has no cancellation hook), but the gaps between retries release the dispatcher so WPF can pump messages, repaint, and respond to input.
Two AI backends, one interface
CapyBro supports OpenRouter (cloud — one API key, ~300 models) and Ollama (local — text never leaves the machine). Both stream responses.
The pain: OpenRouter speaks SSE (data: {...}\n\n, terminated by data: [DONE]), Ollama speaks NDJSON (one JSON object per line, terminated by {"done": true}). Different error shapes, different rate-limit signaling.
The abstraction:
public interface ILlmProvider
{
IAsyncEnumerable<string> ImproveStreamAsync(
string apiKey, // OpenRouter uses; Ollama ignores
string model,
string promptText,
string userText,
TimeSpan timeout,
bool preserveLanguage,
CancellationToken ct = default);
Task<IReadOnlyList<string>> GetModelsAsync(string apiKey, CancellationToken ct = default);
bool RequiresApiKey { get; }
}
RequiresApiKey is the cute bit — it lets TextProcessor pre-flight the request. If Provider=OpenRouter and key is empty, show an actionable toast ("set your key in Settings") instead of round-tripping to a 401.
ILlmProviderFactory.Resolve is a switch that throws on unknown enum values, not a fall-back to OpenRouter. A future 3rd provider added without matching switch arm + DI registration will crash on first user interaction instead of silently routing to the wrong backend. That's intentional — silent fallbacks are how you ship "why does my Anthropic key work everywhere except CapyBro?" bug reports.
Ollama edge case: stream truncated vs empty result
A subtle bug I found while testing: Ollama can complete a request without a done:true frame — connection drop, proxy timeout, antivirus interception. The total content length is 0, but it's not "the model returned nothing" — it's "the network died."
var sawDoneFrame = false;
var totalContentLength = 0;
await foreach (var frame in ReadNdjsonFramesAsync(response, cts.Token))
{
if (frame.Done) sawDoneFrame = true;
if (frame.Delta.Length == 0) continue;
totalContentLength += frame.Delta.Length;
yield return frame.Delta;
}
if (totalContentLength == 0)
{
throw new OpenRouterException(
sawDoneFrame
? _translator["api_empty_result"] // model returned ""
: _translator["api_server_error"]); // stream interrupted
}
Two different toasts: "the model didn't produce output for your prompt" vs "check whether ollama serve is running." Different remediation paths for the user.
Installer size: 150 MB → 49 MB
Self-contained .NET 8 + WPF publish folder is ~150 MB. Single-file .exe is tempting but:
- Single-file decompresses into memory on every launch → visible cold-start latency
- Self-extracting native libraries unpack to
%TEMP%on first run → first-launch hit - For a tray app the user opens dozens of times a day, that's noticeable
So: folder build + NSIS LZMA SOLID compression. The csproj:
<PropertyGroup Condition="'$(_IsPublishing)' == 'true'">
<SelfContained>true</SelfContained>
<RuntimeIdentifier>win-x64</RuntimeIdentifier>
<PublishReadyToRun>true</PublishReadyToRun>
<DebugType>none</DebugType>
</PropertyGroup>
<!-- Strip 13 culture-specific satellite assembly folders. -->
<!-- Our UI translations live in Translator.cs, not satellite assemblies. -->
<SatelliteResourceLanguages>en</SatelliteResourceLanguages>
NSIS:
SetCompressor /SOLID lzma
SetCompressorDictSize 64
File /r "..\publish\win-x64\*.*"
LZMA SOLID archives everything as one stream rather than per-file. Repeated bytes across files compress much better. ~49 MB installer, ~150 MB unpacked.
What I didn't do: PublishTrimmed. WPF heavily uses reflection for XAML binding + resource lookup. Trimmer eagerly removes "unused" types, then runtime XAML lookup explodes with Type not found. I tried TrimMode=partial and got 25 MB savings + 12 runtime regressions. Reverted.
Open source as a trust mechanism, not ideology
This is a utility that reads my text — sometimes confidential (client emails, draft docs). Would I trust it if it were closed-source from an unknown indie dev?
No.
So why should other people trust me?
Answer: open the source. Remove "trust me bro" and show what happens.
I picked MIT. Almost went "source-available" (popular among indie SaaS right now) but decided:
- If someone forks and fixes a bug I missed → that saves me work, doesn't "compete" with my product
- If someone forks and sells their own version → they still don't have my community, my support, my updates. The product isn't the code.
- "Source-available" gets a negative reaction in the dev community. MIT gets a positive one.
API keys live in Windows Credential Manager via Meziantou.Framework.Win32.CredentialManager, not config.json. DPAPI encryption under the hood, bound to the user account, non-portable across machines by design.
Lessons after 4 months
Win32 is alive. Microsoft didn't replace it — they hid it behind WPF/WinUI. Build anything non-trivial system-side, and you're back to user32.dll. My P/Invoke list for the core workflow:
RegisterHotKey,SendInput,GetForegroundWindow,GetGUIThreadInfo,AttachThreadInput,SetFocus,BringWindowToTop,IsIconic,ShowWindowAsync. That's just the baseline.Native beats Electron for tray utilities. Not ideology — pragmatism. 49 MB vs 250 MB installer, <1s cold start vs 3-5s, ~80 MB RAM idle vs ~400 MB. For something that lives in the background, that's the difference between "I don't notice it" and "oh there you are."
Local\Mutex, notGlobal\. Singletons usually useGlobal\namespace, which requiresSeCreateGlobalPrivilege. That right is granted to interactive users by default but stripped on locked-down domain machines (kiosks, AppLocker configs). On those systems, my app crashed at startup withUnauthorizedAccessException.Local\(per-session) has no such restriction and matches the semantics I actually want (one instance per user session, not per machine):
public const string DefaultMutexName = @"Local\CapyBroV2";
var mutex = new Mutex(initiallyOwned: false, name: mutexName, createdNew: out var createdNew);
Also: initiallyOwned: false. With true, I'd get AbandonedMutexException after every crash. With false, process death cleans up silently.
Foreground-poller for popup dismiss, not Mouse.Capture. My first prompt-picker used
Mouse.Capture(this, SubTree)to detect clicks outside. WPF ListBox grabsMouse.Captureinternally for click-drag selection — myLostMouseCapturehandler closed the popup BEFORE the user's MouseLeftButtonUp reached the ListBox. Final version uses a 100msDispatcherTimer+GetForegroundWindow()poll. If foreground isn't my popup → close. Cross-process clicks (browser tabs, Notepad, Telegram) are invisible to WPF's input system — polling Win32 is the only reliable catch-all.STJ with source generation. Not
JsonSerializer.Deserialize<T>(json)(reflection-based). Instead:
[JsonSerializable(typeof(AppConfig))]
public partial class AppConfigJsonContext : JsonSerializerContext { }
var config = JsonSerializer.Deserialize(json, AppConfigJsonContext.Default.AppConfig);
One day to set up [JsonSerializable] attrs for each DTO. Result: AOT-friendly, no runtime reflection, faster parsing, cleaner stack traces on JsonException.
- Tech is ~30% of the work. The other 70% is marketing, docs, screenshots, localizations, SEO, GitHub issue triage, replying on Reddit. As a solo dev, that's not "side activity" — it's the activity after MVP.
Stack receipts
- ~12,000 lines of C# (the WPF app)
- ~3,000 lines of Next.js (marketing site)
- ~4 months, nights + weekends
- ~$130 spent (domain, OpenRouter test credits, stock icons I didn't end up using)
- Coffee: uncountable
What's next
- macOS port via Avalonia (~2 months)
- Browser extension companion for web apps with shadow DOM
- Native AOT once WPF + AOT become compatible (would shave 49 MB → ~25 MB)
Code + links
- Source: https://github.com/phantasmat2018/capy-bro (MIT)
- Site: https://capybro.app
- Installer: https://github.com/phantasmat2018/capy-bro/releases/tag/v2.0.0 (Win 10/11 x64, 49 MB)
If you've built a similar Windows-side AI tool, I'd love to hear what Win32 weirdness you ran into. The Office (Word, Excel) paste-back behavior is something I still haven't 100% nailed — Word works, Excel works only via the F2/Esc edit-mode dance. If anyone has a clean solution, drop it in the comments 🙏
Thanks for reading.
Top comments (0)