
Роман Тихоненко

Originally published at capybro.app

After 4 months solo: shipping a Windows tray AI hotkey on .NET 8 + WPF (and the Win32 paste-back rabbit hole)

I spent 4 months of nights and weekends building CapyBro — a Windows tray app that runs AI on any selected text via a global hotkey. Native .NET 8 + WPF (not Electron), MIT-licensed, ~49 MB installer. Two backends: cloud (OpenRouter) or fully local (Ollama). The hardest technical problem turned out to be Win32 paste-back into child controls. This post walks through that rabbit hole + the architecture decisions that paid off.

Why I built this

For most of 2025, my AI workflow was this loop:

1. Read something in Slack / a doc / a Telegram message
2. Alt+Tab → ChatGPT tab → paste
3. Type my prompt
4. Wait
5. Copy result
6. Alt+Tab back → paste over the original

I caught myself doing this 30+ times a day for trivial things — fixing one comma, translating a paragraph for a client email, rewording a DM so it doesn't sound passive-aggressive.

Then it hit me: AI is currently trapped in a browser tab. But every other utility on my PC — clipboard manager, screenshot tool, voice typer, password manager, snippet expander — is one hotkey away. Why isn't AI?

So I built it.

What CapyBro does

You select text anywhere on Windows. Word, Telegram, VS Code, your browser, Notepad, Discord, an email draft, a YAML file in your terminal — doesn't matter. The OS-level selection is the input.

You press Ctrl+Shift+E. A small popup appears. You pick a prompt (or type a custom one). AI runs. The result replaces the original text in the same app.

That's the entire product. The whole magic is in the last step: replacing text in the source app. Sounds trivial. Took me three iterations of Win32 plumbing to get right.

The Win32 paste-back rabbit hole

This is the part of the project I underestimated most. I expected ~100 lines of code. I got 130 lines of comments around a 50-line method that does two things: capture the focused child HWND before showing UI, then restore foreground + focus when the user clicks Accept.

Naive solution (doesn't work)

SendKeys.SendWait("^v");

Why it fails:

  1. No control over which window receives the Ctrl+V
  2. If your modal dialog is still open, Ctrl+V goes THERE
  3. If foreground changed between Accept and send → paste lands somewhere random

Better but still broken

IntPtr originalForeground = GetForegroundWindow();
ShowDialog();
// User clicks Accept
SetForegroundWindow(originalForeground);
Thread.Sleep(50);
SendKeys.SendWait("^v");

Works for Notepad. Fails for Notepad++, VS Code, Office. Why?

  1. SetForegroundWindow is gated by OS focus rules: the caller must have received the last input event, there must be no active foreground lock, and the target must not be minimized. If any check fails, it silently returns false.
  2. The actual editor lives in a child control (e.g. Scintilla inside Notepad++). SetForegroundWindow activates the top-level frame, but keyboard focus stays elsewhere. The synthesized Ctrl+V then lands on the WindowProc of a non-input frame.

Working solution: AttachThreadInput sandwich

The trick is AttachThreadInput — an API that lets your thread temporarily share input state with another thread. While attached, the OS treats both threads as one for focus/foreground purposes, bypassing the "didn't receive last input event" check.

Plus: I need to know which child HWND had keyboard focus before my modal stole it. GetGUIThreadInfo.hwndFocus returns exactly that.

Here's the production code (extracted from Platform/ForegroundRestorer.cs):

Phase 1: capture, BEFORE showing UI

public static (IntPtr TopLevel, IntPtr FocusedChild) CaptureForegroundFocus()
{
    var topLevel = NativeMethods.GetForegroundWindow();
    if (topLevel == IntPtr.Zero)
        return (IntPtr.Zero, IntPtr.Zero);

    var targetThreadId = NativeMethods.GetWindowThreadProcessId(topLevel, out _);
    if (targetThreadId == 0)
        return (topLevel, IntPtr.Zero);

    var info = new NativeMethods.GUITHREADINFO
    {
        CbSize = (uint)Marshal.SizeOf<NativeMethods.GUITHREADINFO>(),
    };

    if (NativeMethods.GetGUIThreadInfo(targetThreadId, ref info))
        return (topLevel, info.HwndFocus);

    return (topLevel, IntPtr.Zero);
}

Phase 2: restore, AFTER user clicks Accept

public static bool RestoreToForeground(IntPtr topLevel, IntPtr focusedChild)
{
    if (topLevel == IntPtr.Zero) return false;

    var targetThreadId = NativeMethods.GetWindowThreadProcessId(topLevel, out _);
    if (targetThreadId == 0)
        return NativeMethods.SetForegroundWindow(topLevel); // window died, best-effort

    if (NativeMethods.IsIconic(topLevel))
        NativeMethods.ShowWindowAsync(topLevel, NativeMethods.SwRestore);

    var ourThreadId = NativeMethods.GetCurrentThreadId();
    if (ourThreadId == targetThreadId)
        return NativeMethods.SetForegroundWindow(topLevel); // a thread cannot attach its input to itself

    var attached = NativeMethods.AttachThreadInput(ourThreadId, targetThreadId, true);
    try
    {
        NativeMethods.BringWindowToTop(topLevel);
        var fgOk = NativeMethods.SetForegroundWindow(topLevel);

        // CRITICAL: SetFocus on the FOCUSED CHILD, not the top-level frame.
        // Without this, SendInput Ctrl+V echoes into the non-input frame.
        var focusTarget = focusedChild != IntPtr.Zero ? focusedChild : topLevel;
        NativeMethods.SetFocus(focusTarget);

        return fgOk;
    }
    finally
    {
        // ALWAYS detach. Forget this, and the user loses keyboard control
        // system-wide for the lifetime of your process.
        if (attached)
            NativeMethods.AttachThreadInput(ourThreadId, targetThreadId, false);
    }
}
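
Both phases lean on a NativeMethods class that the post doesn't show. Here is a minimal sketch of what those declarations might look like, assuming the member names used in the snippets (CbSize, HwndFocus, SwRestore). It is a hypothetical reconstruction from the Win32 signatures, not the actual file from the repo:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical reconstruction: member names mirror the snippets above,
// but this is not the actual file from the CapyBro repo.
internal static class NativeMethods
{
    public const int SwRestore = 9; // SW_RESTORE

    [StructLayout(LayoutKind.Sequential)]
    public struct RECT { public int Left, Top, Right, Bottom; }

    [StructLayout(LayoutKind.Sequential)]
    public struct GUITHREADINFO
    {
        public uint CbSize;          // must be filled in before calling GetGUIThreadInfo
        public uint Flags;
        public IntPtr HwndActive;
        public IntPtr HwndFocus;     // the control that currently owns keyboard focus
        public IntPtr HwndCapture;
        public IntPtr HwndMenuOwner;
        public IntPtr HwndMoveSize;
        public IntPtr HwndCaret;
        public RECT RcCaret;
    }

    [DllImport("user32.dll")]
    public static extern IntPtr GetForegroundWindow();

    [DllImport("user32.dll")]
    public static extern uint GetWindowThreadProcessId(IntPtr hWnd, out uint processId);

    [DllImport("user32.dll", SetLastError = true)]
    public static extern bool GetGUIThreadInfo(uint threadId, ref GUITHREADINFO info);

    [DllImport("user32.dll")]
    public static extern bool SetForegroundWindow(IntPtr hWnd);

    [DllImport("user32.dll")]
    public static extern bool BringWindowToTop(IntPtr hWnd);

    [DllImport("user32.dll")]
    public static extern IntPtr SetFocus(IntPtr hWnd);

    [DllImport("user32.dll")]
    public static extern bool IsIconic(IntPtr hWnd);

    [DllImport("user32.dll")]
    public static extern bool ShowWindowAsync(IntPtr hWnd, int cmdShow);

    [DllImport("user32.dll")]
    public static extern bool AttachThreadInput(uint attachId, uint attachToId, bool attach);

    [DllImport("kernel32.dll")]
    public static extern uint GetCurrentThreadId();
}
```

The struct layout is the part worth double-checking: if CbSize doesn't match what user32 expects, GetGUIThreadInfo simply returns false and you fall back to the top-level HWND.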

Pieces that earned their place

  1. AttachThreadInput: bypasses focus-stealing protection. Without it, SetForegroundWindow silently no-ops.
  2. SetFocus on child HWND: Notepad++'s actual edit is a Scintilla control nested inside its frame. SetFocus on the frame leaves keyboard focus on the wrong WindowProc.
  3. BringWindowToTop before SetForegroundWindow: raises z-order even when the foreground call is rejected. Belt-and-braces.
  4. IsIconic + ShowWindowAsync(SW_RESTORE): SetForegroundWindow no-ops on minimized targets. Restore first.
  5. finally + AttachThreadInput(false): I forgot this once during development. Lost system-wide keyboard input until I rebooted. Don't be me.
  6. SendInput, not SendKeys: SendKeys resolves characters through the active keyboard layout, so "^v" breaks on non-Latin layouts. SendInput takes explicit virtual-key codes, which are layout-independent.
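
The post doesn't show the SendInput side, so here is a rough sketch of a layout-independent Ctrl+V, assuming nothing beyond the documented Win32 INPUT layout. The type and method names (PasteKeystroke, BuildCtrlV, SendCtrlV) are illustrative and not taken from CapyBro:

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch only: names are illustrative, not from the CapyBro source.
internal static class PasteKeystroke
{
    private const uint InputKeyboard = 1;        // INPUT_KEYBOARD
    private const uint KeyEventFKeyUp = 0x0002;  // KEYEVENTF_KEYUP
    private const ushort VkControl = 0x11;       // VK_CONTROL
    private const ushort VkV = 0x56;             // virtual-key code for 'V'

    [StructLayout(LayoutKind.Sequential)]
    public struct MOUSEINPUT { public int Dx, Dy; public uint MouseData, Flags, Time; public IntPtr ExtraInfo; }

    [StructLayout(LayoutKind.Sequential)]
    public struct KEYBDINPUT { public ushort Vk, Scan; public uint Flags, Time; public IntPtr ExtraInfo; }

    [StructLayout(LayoutKind.Explicit)]
    public struct InputUnion
    {
        [FieldOffset(0)] public MOUSEINPUT Mouse;    // largest member; keeps sizeof(INPUT) correct
        [FieldOffset(0)] public KEYBDINPUT Keyboard;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct INPUT { public uint Type; public InputUnion U; }

    [DllImport("user32.dll", SetLastError = true)]
    private static extern uint SendInput(uint count, INPUT[] inputs, int size);

    // Ctrl down, V down, V up, Ctrl up, all by virtual-key code.
    public static INPUT[] BuildCtrlV() => new[]
    {
        Key(VkControl, keyUp: false),
        Key(VkV, keyUp: false),
        Key(VkV, keyUp: true),
        Key(VkControl, keyUp: true),
    };

    // Returns true only if the OS accepted all four events.
    public static bool SendCtrlV()
    {
        var inputs = BuildCtrlV();
        return SendInput((uint)inputs.Length, inputs, Marshal.SizeOf<INPUT>()) == inputs.Length;
    }

    private static INPUT Key(ushort vk, bool keyUp) => new INPUT
    {
        Type = InputKeyboard,
        U = new InputUnion { Keyboard = new KEYBDINPUT { Vk = vk, Flags = keyUp ? KeyEventFKeyUp : 0u } },
    };
}
```

Sending all four events in one SendInput call keeps them contiguous in the input stream, so another process can't interleave its own keystrokes between Ctrl and V.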

Bonus rabbit hole: the clipboard is single-owner

Win32 clipboard is a single-owner resource. Clipboard managers, RDP virtual channels, antivirus, even the OS shell briefly hold it. Without retry, any concurrent open throws CLIPBRD_E_CANT_OPEN (HRESULT 0x800401D0) and you lose either the AI result or the user's original selection.

I wrap every clipboard call in an async retry loop (not sync — sync Thread.Sleep between attempts freezes the WPF UI for up to 500ms):

private static async Task<T> RetryAsync<T>(Func<T> action, CancellationToken ct)
{
    const int RetryAttempts = 10;
    var retryDelay = TimeSpan.FromMilliseconds(50);

    for (var attempt = 1; attempt <= RetryAttempts; attempt++)
    {
        ct.ThrowIfCancellationRequested();
        try
        {
            return action();
        }
        catch (COMException ex)
            when (ex.HResult == unchecked((int)0x800401D0)
                && attempt < RetryAttempts)
        {
            await Task.Delay(retryDelay, ct).ConfigureAwait(true);
        }
    }
    throw new InvalidOperationException("Unreachable: the last attempt either returns or rethrows.");
}

The Win32 calls themselves are synchronous (the API has no cancellation hook), but the gaps between retries release the dispatcher so WPF can pump messages, repaint, and respond to input.
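
To sanity-check the retry behavior without fighting over the real clipboard, the loop can be exercised with a fake action. This is a sketch, not CapyBro's test code: FlakyClipboard is a hypothetical stand-in for a call like Clipboard.GetText(), and RetryAsync is repeated in the same shape only so the example compiles on its own.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

internal static class ClipboardRetryDemo
{
    private const int ClipbrdECantOpen = unchecked((int)0x800401D0); // CLIPBRD_E_CANT_OPEN

    // Same shape as the RetryAsync above, repeated so this sketch is self-contained.
    public static async Task<T> RetryAsync<T>(Func<T> action, CancellationToken ct)
    {
        const int retryAttempts = 10;
        for (var attempt = 1; ; attempt++)
        {
            ct.ThrowIfCancellationRequested();
            try
            {
                return action();
            }
            catch (COMException ex) when (ex.HResult == ClipbrdECantOpen && attempt < retryAttempts)
            {
                // Yield between attempts instead of blocking the calling thread.
                await Task.Delay(TimeSpan.FromMilliseconds(50), ct).ConfigureAwait(true);
            }
        }
    }

    // Hypothetical fake: throws "clipboard busy" `failures` times, then returns `text`.
    public static Func<string> FlakyClipboard(int failures, string text)
    {
        var calls = 0;
        return () =>
        {
            if (calls++ < failures)
                throw new COMException("clipboard busy", ClipbrdECantOpen);
            return text;
        };
    }
}
```

With FlakyClipboard(3, "hello") the fourth attempt succeeds and the caller never sees the contention. With ten or more failures the final COMException propagates, because the exception filter stops matching on the last attempt.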

Two AI backends, one interface

CapyBro supports OpenRouter (cloud — one API key, ~300 models) and Ollama (local — text never leaves the machine). Both stream responses.

The pain: OpenRouter speaks SSE (data: {...}\n\n, terminated by data: [DONE]), Ollama speaks NDJSON (one JSON object per line, terminated by {"done": true}). Different error shapes, different rate-limit signaling.
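
To make the difference concrete, here is a rough sketch of parsing one line from each stream. It assumes OpenRouter's OpenAI-compatible chunk shape (choices[0].delta.content) and Ollama's /api/generate frames ({"response": ..., "done": ...}). The StreamFrames class and its method names are illustrative and not CapyBro's actual parser:

```csharp
using System;
using System.Text.Json;

// Sketch only: class and method names are hypothetical.
internal static class StreamFrames
{
    // SSE line: "data: {json}" or the terminator "data: [DONE]".
    public static (string Delta, bool Done) ParseSseLine(string line)
    {
        if (!line.StartsWith("data:", StringComparison.Ordinal))
            return ("", false); // blank separators and ": keep-alive" comments

        var payload = line.Substring(5).Trim();
        if (payload == "[DONE]")
            return ("", true);

        using var doc = JsonDocument.Parse(payload);
        if (!doc.RootElement.TryGetProperty("choices", out var choices) || choices.GetArrayLength() == 0)
            return ("", false);
        if (!choices[0].TryGetProperty("delta", out var delta))
            return ("", false);

        return delta.TryGetProperty("content", out var content) && content.ValueKind == JsonValueKind.String
            ? (content.GetString() ?? "", false)
            : ("", false);
    }

    // NDJSON line: one complete JSON object; the final one carries "done": true.
    public static (string Delta, bool Done) ParseNdjsonLine(string line)
    {
        if (string.IsNullOrWhiteSpace(line))
            return ("", false);

        using var doc = JsonDocument.Parse(line);
        var root = doc.RootElement;
        var text = root.TryGetProperty("response", out var r) && r.ValueKind == JsonValueKind.String
            ? r.GetString() ?? ""
            : "";
        var done = root.TryGetProperty("done", out var d) && d.ValueKind == JsonValueKind.True;
        return (text, done);
    }
}
```

Normalizing both formats to the same (Delta, Done) pair is what lets a single interface sit on top: everything downstream only ever sees text chunks and an end-of-stream signal.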

The abstraction:

public interface ILlmProvider
{
    IAsyncEnumerable<string> ImproveStreamAsync(
        string apiKey,         // OpenRouter uses; Ollama ignores
        string model,
        string promptText,
        string userText,
        TimeSpan timeout,
        bool preserveLanguage,
        CancellationToken ct = default);

    Task<IReadOnlyList<string>> GetModelsAsync(string apiKey, CancellationToken ct = default);

    bool RequiresApiKey { get; }
}

RequiresApiKey is the cute bit — it lets TextProcessor pre-flight the request. If Provider=OpenRouter and key is empty, show an actionable toast ("set your key in Settings") instead of round-tripping to a 401.

ILlmProviderFactory.Resolve is a switch that throws on unknown enum values, not a fall-back to OpenRouter. A future 3rd provider added without matching switch arm + DI registration will crash on first user interaction instead of silently routing to the wrong backend. That's intentional — silent fallbacks are how you ship "why does my Anthropic key work everywhere except CapyBro?" bug reports.
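
The post doesn't include the factory itself, so here is a minimal sketch of the idea. Every name in it (LlmProviderKind, ILlmBackend, CanSend) is hypothetical. It only illustrates a switch with no silent default, plus the RequiresApiKey pre-flight from the previous paragraph:

```csharp
using System;

// All names are hypothetical; the post only says Resolve is a switch that throws.
public enum LlmProviderKind { OpenRouter, Ollama }

public interface ILlmBackend { bool RequiresApiKey { get; } }
public sealed class OpenRouterBackend : ILlmBackend { public bool RequiresApiKey => true; }
public sealed class OllamaBackend : ILlmBackend { public bool RequiresApiKey => false; }

public sealed class LlmProviderFactory
{
    private readonly ILlmBackend _openRouter = new OpenRouterBackend();
    private readonly ILlmBackend _ollama = new OllamaBackend();

    public ILlmBackend Resolve(LlmProviderKind kind) => kind switch
    {
        LlmProviderKind.OpenRouter => _openRouter,
        LlmProviderKind.Ollama => _ollama,
        // No fallback arm: an unmapped enum value fails loudly on first use.
        _ => throw new ArgumentOutOfRangeException(nameof(kind), kind, "No provider registered for this value."),
    };

    // Pre-flight: false means "show the settings toast and skip the network call".
    public bool CanSend(LlmProviderKind kind, string apiKey) =>
        !Resolve(kind).RequiresApiKey || !string.IsNullOrWhiteSpace(apiKey);
}
```

An enum value with no switch arm, such as (LlmProviderKind)42, throws ArgumentOutOfRangeException instead of quietly resolving to OpenRouter.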

Ollama edge case: stream truncated vs empty result

A subtle bug I found while testing: Ollama's stream can end without a done:true frame (connection drop, proxy timeout, antivirus interception). If that happens before any content arrives, the total content length is 0, but the cause is not "the model returned nothing"; it's "the network died."

var sawDoneFrame = false;
var totalContentLength = 0;

await foreach (var frame in ReadNdjsonFramesAsync(response, cts.Token))
{
    if (frame.Done) sawDoneFrame = true;
    if (frame.Delta.Length == 0) continue;
    totalContentLength += frame.Delta.Length;
    yield return frame.Delta;
}

if (totalContentLength == 0)
{
    throw new OpenRouterException(
        sawDoneFrame
            ? _translator["api_empty_result"]   // model returned ""
            : _translator["api_server_error"]); // stream interrupted
}

Two different toasts: "the model didn't produce output for your prompt" vs "check whether ollama serve is running." Different remediation paths for the user.

Installer size: 150 MB → 49 MB

Self-contained .NET 8 + WPF publish folder is ~150 MB. Single-file .exe is tempting but:

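  • A compressed single-file bundle (EnableCompressionInSingleFile) decompresses into memory on every launch → visible cold-start latency
  • Native libraries bundled for self-extraction (IncludeNativeLibrariesForSelfExtract) unpack to %TEMP% on first run → first-launch hit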
  • For a tray app the user opens dozens of times a day, that's noticeable

So: folder build + NSIS LZMA SOLID compression. The csproj:

<PropertyGroup Condition="'$(_IsPublishing)' == 'true'">
  <SelfContained>true</SelfContained>
  <RuntimeIdentifier>win-x64</RuntimeIdentifier>
  <PublishReadyToRun>true</PublishReadyToRun>
  <DebugType>none</DebugType>
</PropertyGroup>

<PropertyGroup>
  <!-- Strip 13 culture-specific satellite assembly folders. -->
  <!-- Our UI translations live in Translator.cs, not satellite assemblies. -->
  <SatelliteResourceLanguages>en</SatelliteResourceLanguages>
</PropertyGroup>

NSIS:

SetCompressor /SOLID lzma
SetCompressorDictSize 64
File /r "..\publish\win-x64\*.*"

LZMA SOLID archives everything as one stream rather than per-file. Repeated bytes across files compress much better. ~49 MB installer, ~150 MB unpacked.

What I didn't do: PublishTrimmed. WPF heavily uses reflection for XAML binding + resource lookup. Trimmer eagerly removes "unused" types, then runtime XAML lookup explodes with Type not found. I tried TrimMode=partial and got 25 MB savings + 12 runtime regressions. Reverted.

Open source as a trust mechanism, not ideology

This is a utility that reads my text — sometimes confidential (client emails, draft docs). Would I trust it if it were closed-source from an unknown indie dev?

No.

So why should other people trust me?

Answer: open the source. Remove "trust me bro" and show what happens.

I picked MIT. Almost went "source-available" (popular among indie SaaS right now) but decided:

  • If someone forks and fixes a bug I missed → that saves me work, doesn't "compete" with my product
  • If someone forks and sells their own version → they still don't have my community, my support, my updates. The product isn't the code.
  • "Source-available" gets a negative reaction in the dev community. MIT gets a positive one.

API keys live in Windows Credential Manager via Meziantou.Framework.Win32.CredentialManager, not config.json. DPAPI encryption under the hood, bound to the user account, non-portable across machines by design.

Lessons after 4 months

  1. Win32 is alive. Microsoft didn't replace it — they hid it behind WPF/WinUI. Build anything non-trivial system-side, and you're back to user32.dll. My P/Invoke list for the core workflow: RegisterHotKey, SendInput, GetForegroundWindow, GetGUIThreadInfo, AttachThreadInput, SetFocus, BringWindowToTop, IsIconic, ShowWindowAsync. That's just the baseline.

  2. Native beats Electron for tray utilities. Not ideology — pragmatism. 49 MB vs 250 MB installer, <1s cold start vs 3-5s, ~80 MB RAM idle vs ~400 MB. For something that lives in the background, that's the difference between "I don't notice it" and "oh there you are."

  3. Local\ Mutex, not Global\. Singletons usually use Global\ namespace, which requires SeCreateGlobalPrivilege. That right is granted to interactive users by default but stripped on locked-down domain machines (kiosks, AppLocker configs). On those systems, my app crashed at startup with UnauthorizedAccessException. Local\ (per-session) has no such restriction and matches the semantics I actually want (one instance per user session, not per machine):

   public const string DefaultMutexName = @"Local\CapyBroV2";
   var mutex = new Mutex(initiallyOwned: false, name: mutexName, createdNew: out var createdNew);

Also: initiallyOwned: false. With true, I'd get AbandonedMutexException after every crash. With false, process death cleans up silently.

  4. Foreground-poller for popup dismiss, not Mouse.Capture. My first prompt-picker used Mouse.Capture(this, SubTree) to detect clicks outside. WPF ListBox grabs Mouse.Capture internally for click-drag selection, so my LostMouseCapture handler closed the popup BEFORE the user's MouseLeftButtonUp reached the ListBox. Final version uses a 100ms DispatcherTimer + GetForegroundWindow() poll. If foreground isn't my popup → close. Cross-process clicks (browser tabs, Notepad, Telegram) are invisible to WPF's input system, so polling Win32 is the only reliable catch-all.

  5. System.Text.Json (STJ) with source generation. Not JsonSerializer.Deserialize<T>(json) (reflection-based). Instead:

   [JsonSerializable(typeof(AppConfig))]
   public partial class AppConfigJsonContext : JsonSerializerContext { }

   var config = JsonSerializer.Deserialize(json, AppConfigJsonContext.Default.AppConfig);

One day to set up [JsonSerializable] attrs for each DTO. Result: AOT-friendly, no runtime reflection, faster parsing, cleaner stack traces on JsonException.

  6. Tech is ~30% of the work. The other 70% is marketing, docs, screenshots, localizations, SEO, GitHub issue triage, replying on Reddit. As a solo dev, that's not "side activity"; it's the activity after MVP.

Stack receipts

  • ~12,000 lines of C# (the WPF app)
  • ~3,000 lines of Next.js (marketing site)
  • ~4 months, nights + weekends
  • ~$130 spent (domain, OpenRouter test credits, stock icons I didn't end up using)
  • Coffee: uncountable

What's next

  • macOS port via Avalonia (~2 months)
  • Browser extension companion for web apps with shadow DOM
  • Native AOT once WPF + AOT become compatible (would shave 49 MB → ~25 MB)

Code + links

If you've built a similar Windows-side AI tool, I'd love to hear what Win32 weirdness you ran into. The Office (Word, Excel) paste-back behavior is something I still haven't 100% nailed — Word works, Excel works only via the F2/Esc edit-mode dance. If anyone has a clean solution, drop it in the comments 🙏

Thanks for reading.
