Building a dictation app for Windows sounds simple until you try to actually get text into other applications. After shipping dictate.app, I learned more about Windows text injection than I ever wanted to know. Here's the full picture.
The Problem
You've transcribed audio to text. Now you need to insert that text wherever the user's cursor is — Notepad, VS Code, Excel, a chat app, a browser field, or a terminal running as Administrator. Each app handles input differently. Some block you entirely.
There is no single API that works everywhere. You need a layered approach.
Method 1: SendInput (Win32)
The most direct route. SendInput injects keyboard events at the OS level, simulating actual keypresses.
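// Electron / Node.js using ffi-napi to call Win32
const ffi = require("ffi-napi");
const user32 = ffi.Library("user32", {
  SendInput: ["uint", ["uint", "pointer", "int"]],
});
const INPUT_KEYBOARD = 1;
const KEYEVENTF_UNICODE = 0x0004;
const KEYEVENTF_KEYUP = 0x0002;
// sizeof(INPUT) is 40 bytes in a 64-bit process and 28 bytes in a 32-bit one
const INPUT_SIZE = ["x64", "arm64"].includes(process.arch) ? 40 : 28;
// The KEYBDINPUT union member starts after the type field (plus padding on 64-bit)
const KI_OFFSET = INPUT_SIZE === 40 ? 8 : 4;
function sendChar(char) {
  // Build two INPUT structs (keydown + keyup) for one UTF-16 code unit
  const buf = Buffer.alloc(INPUT_SIZE * 2); // zero-filled, so wVk stays 0
  [0, KEYEVENTF_KEYUP].forEach((upFlag, i) => {
    const base = i * INPUT_SIZE;
    buf.writeUInt32LE(INPUT_KEYBOARD, base); // type
    buf.writeUInt16LE(char.charCodeAt(0), base + KI_OFFSET + 2); // ki.wScan
    buf.writeUInt32LE(KEYEVENTF_UNICODE | upFlag, base + KI_OFFSET + 4); // ki.dwFlags
  });
  // Returns the number of events inserted into the input stream
  return user32.SendInput(2, buf, INPUT_SIZE);
}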
This works well for standard apps. The catch: it types character by character, which is slow for long transcriptions and can misfire if the user moves focus mid-injection.
Works for: Most desktop apps running at the same privilege level.
Fails for: Elevated processes (apps running as Administrator), games with anti-cheat, some terminals.
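SendInput takes an array of events, so one way to cut the per-character overhead and shrink the window for focus changes is to pack the whole transcription into one buffer and submit it in a single call. The sketch below only builds that buffer. It assumes a 64-bit process, where an INPUT struct is 40 bytes, and the name buildUnicodeInputs is made up for illustration.

```javascript
// Layout of INPUT in a 64-bit process (assumption: x64 or arm64 build of Node/Electron).
const INPUT_SIZE = 40;           // sizeof(INPUT) on 64-bit Windows
const KI_OFFSET = 8;             // KEYBDINPUT starts after the 4-byte type plus 4 bytes of padding
const INPUT_KEYBOARD = 1;
const KEYEVENTF_KEYUP = 0x0002;
const KEYEVENTF_UNICODE = 0x0004;

// Hypothetical helper: returns { count, buffer } for SendInput(count, buffer, INPUT_SIZE).
function buildUnicodeInputs(text) {
  // Each UTF-16 code unit needs a keydown and a keyup event.
  // Characters outside the BMP are emitted as two code units (a surrogate pair).
  const count = text.length * 2;
  // Buffer.alloc zero-fills, so wVk, time and dwExtraInfo stay 0.
  const buffer = Buffer.alloc(count * INPUT_SIZE);

  for (let i = 0; i < text.length; i++) {
    const codeUnit = text.charCodeAt(i);
    [0, KEYEVENTF_KEYUP].forEach((upFlag, j) => {
      const base = (i * 2 + j) * INPUT_SIZE;
      buffer.writeUInt32LE(INPUT_KEYBOARD, base);                             // type
      buffer.writeUInt16LE(codeUnit, base + KI_OFFSET + 2);                   // ki.wScan
      buffer.writeUInt32LE(KEYEVENTF_UNICODE | upFlag, base + KI_OFFSET + 4); // ki.dwFlags
    });
  }
  return { count, buffer };
}
```

With the user32 binding from the snippet above, the result would be submitted as user32.SendInput(count, buffer, INPUT_SIZE). Very long transcriptions may still be worth splitting into chunks so that a focus change is noticed between batches.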
Method 2: UI Automation (UIAutomation API)
Microsoft's UIAutomation framework lets you interact with app controls directly — no simulated keypresses. You find the focused element and set its value.
// Using edge-js or a native addon to call UIAutomation COM interfaces
// Pseudocode — actual implementation uses COM interop
const focusedElement = automation.GetFocusedElement();
const valuePattern = focusedElement.GetCurrentPattern(UIA_ValuePatternId);
if (valuePattern) {
  valuePattern.SetValue(text);
}
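This is cleaner than SendInput: it sets the value in one call, with no per-character latency. One caveat is that ValuePattern's SetValue replaces the control's entire value rather than inserting at the caret, so any existing text has to be read first and written back with the transcription spliced in. Accessibility tools like screen readers use this same path.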
Works for: Apps that expose UIA ValuePattern — most native Windows controls, some Electron apps, Office.
Fails for: Custom-drawn controls, Chromium-based apps (they partially support UIA but it's inconsistent), elevated processes.
Method 3: WM_PASTE (Windows Messages)
Another approach: put text on the clipboard, then send WM_PASTE directly to the target window.
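const ffi = require("ffi-napi");
const { clipboard } = require("electron");
const user32 = ffi.Library("user32", {
  // user32 exports PostMessageA/PostMessageW; there is no plain "PostMessage" symbol
  PostMessageW: ["int", ["pointer", "uint", "pointer", "pointer"]],
  GetForegroundWindow: ["pointer", []],
});
const WM_PASTE = 0x0302;
function pasteText(text) {
  clipboard.writeText(text);
  // This targets the top-level window. Edit controls are usually child windows,
  // so a fuller version would resolve the focused control first.
  const hwnd = user32.GetForegroundWindow();
  // Nonzero means the message was queued, not that the app acted on it
  return user32.PostMessageW(hwnd, WM_PASTE, null, null) !== 0;
}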
This is fast and reliable for text editors, but many apps ignore WM_PASTE entirely. Rich text editors handle it differently from plain text fields. And if the user has something important on their clipboard — it's now gone.
Works for: Notepad, WordPad, some chat apps.
Fails for: Browsers, terminals, most modern apps that handle paste internally.
The Hard Part: Elevated Processes
Here's where things get painful. Windows has a security boundary called UIPI — User Interface Privilege Isolation. An app running at medium integrity level (normal user) cannot send input events to a process running at high integrity (Administrator).
This means if the user has a terminal open as Admin, or a system utility elevated via UAC, SendInput calls silently fail. No error. The keystrokes just vanish.
UIAutomation has the same restriction. Cross-process UIA calls across integrity levels are blocked.
Your options:
- Run your own app as Administrator — terrible UX, requires UAC prompt on launch, massive security footprint
- Use a system-level hook — requires a kernel driver or at minimum an elevated service, complex to sign and deploy
- Clipboard injection — the practical solution
The Reliable Fallback: Clipboard Injection
When everything else fails, fall back to clipboard-based injection. The clipboard is shared between processes at different integrity levels, so the write itself is not subject to UIPI; only the keystroke that triggers the paste is.
The flow:
- Save the current clipboard contents
- Write the transcribed text to the clipboard
- Send Ctrl+V via SendInput (SendInput is itself subject to UIPI, so this is the step that can be blocked when the target window is elevated)
- Restore the original clipboard contents after a short delay
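const { clipboard } = require("electron");
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
// sendKeyCombination is a helper (not shown) that wraps SendInput for key chords
async function injectViaClipboard(text) {
  // Save original (text only; images and rich formats are not preserved)
  const original = clipboard.readText();
  // Write transcription
  clipboard.writeText(text);
  // Small delay to ensure clipboard is set
  await sleep(50);
  // Send Ctrl+V
  sendKeyCombination("ctrl", "v");
  // Restore after paste completes
  await sleep(150);
  clipboard.writeText(original);
}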
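Wait, I said SendInput fails for elevated processes, and that caveat applies here too. Microsoft documents SendInput as subject to UIPI regardless of which keys it carries, so a synthesized Ctrl+V is no more guaranteed to reach a higher-integrity window than individual characters are. What does cross the boundary is the clipboard write: if the paste keystroke is dropped, the text is still on the clipboard for a manual Ctrl+V, provided the original contents have not been restored yet. The behavior is subtle and depends on the specific Windows version and app, so test against the elevated targets you care about.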
In practice, clipboard + Ctrl+V is the most reliable method across the widest range of apps.
Tradeoffs of clipboard injection:
- Briefly overwrites clipboard (restored after ~150ms, but race conditions exist)
- Doesn't work if the app has a custom paste handler that ignores Ctrl+V
- If the app is slow to respond, the original clipboard restore can happen before the paste completes
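One of the races above can be softened by restoring the clipboard only if it still holds the injected text. The sketch below takes its dependencies as parameters so it can run outside Electron. The function name, the options object, and sendPaste (a stand-in for whatever synthesizes Ctrl+V) are all assumptions for illustration. It does not solve the case where a slow app reads the clipboard after the restore, and like the version above it preserves text only.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `clipboard` needs readText()/writeText(); Electron's clipboard module provides both.
// `sendPaste` is a hypothetical function that synthesizes Ctrl+V.
// Resolves to true if the original clipboard text was restored, false if it was left alone.
async function injectViaClipboardGuarded(
  text,
  { clipboard, sendPaste, sleep = defaultSleep, settleMs = 150 }
) {
  const original = clipboard.readText();
  clipboard.writeText(text);
  await sleep(50);        // give the clipboard write time to land
  await sendPaste();
  await sleep(settleMs);  // give the target app time to read the clipboard

  // Restore only if the clipboard still holds our text. If it changed, the user
  // or another app copied something in the meantime, so leave that copy in place.
  if (clipboard.readText() === text) {
    clipboard.writeText(original);
    return true;
  }
  return false;
}
```

In an Electron main process this could be called as injectViaClipboardGuarded(text, { clipboard, sendPaste: () => sendKeyCombination("ctrl", "v") }), with settleMs raised for apps known to paste slowly.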
What dictate.app Does
The injection order in dictate.app is:
- Try UIAutomation ValuePattern (fastest, no clipboard disruption)
- Fall back to SendInput character-by-character (works for most apps)
- Fall back to clipboard injection (the last resort for apps where the first two fail, plus other edge cases)
The fallback chain runs automatically. Users never see it — they just see their text appear.
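A chain like this can be sketched as a loop over strategies that stops at the first one reporting success. This is not dictate.app's actual code: the function name, the { name, inject } shape, and the convention that inject resolves to true on success are assumptions. The hard part in practice is that convention itself, since a SendInput call blocked by UIPI does not report an error.

```javascript
// Each strategy is { name, inject }, where inject(text) resolves to true on success
// and either resolves to false or throws on failure (an assumed convention).
async function injectWithFallback(text, strategies) {
  const errors = [];
  for (const { name, inject } of strategies) {
    try {
      if (await inject(text)) {
        return { method: name, errors };
      }
      errors.push({ name, reason: "reported failure" });
    } catch (err) {
      errors.push({ name, reason: err.message });
    }
  }
  // Nothing worked; the caller decides how to surface this to the user.
  return { method: null, errors };
}
```

The returned errors array records why earlier strategies were skipped, which is useful for logging which apps force the slower paths.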
What I'd Do Differently
The UIAutomation path deserves more investment. For apps that support it, it's the cleanest solution — atomic, fast, no clipboard side effects. The challenge is that Chromium-based apps (Electron, Chrome, Edge) have inconsistent UIA support, and a huge percentage of Windows apps are now Electron-based.
For truly bulletproof injection across all scenarios including kernel-level game anti-cheat and maximum-security environments, a signed kernel driver is the real answer. But that's a significant engineering and signing overhead that's hard to justify for a productivity tool.
By my estimate, clipboard injection with careful save/restore covers 95%+ of real-world cases. The other 5% tends to be niche enough that users don't file bug reports.
If you're building something that needs to inject text into Windows apps, I hope this saves you the week of debugging it cost me. And if you just want dictation that works — dictate.app handles all of this for you.