Building a JavaScript Keylogger: How Keystroke Capture Works in Node.js

#javascript #node #windows #security

When people hear "keylogger," they usually think C or C++ — low-level Windows API calls, hook chains, kernel drivers. But JavaScript is fully capable of building a functional keystroke capture system, especially when paired with Node.js native bindings.

This article breaks down the architecture behind JavaScript-based keystroke logging — not as a guide for misuse, but as a technical deep-dive for Windows security researchers.

The Raw Keypress Model

At the lowest level, keyboard input on Windows generates scancodes — hardware-level identifiers for each physical key press and release. The Windows kernel translates these into virtual key codes (VK codes), which applications receive through the message queue.

There are several ways to capture these:

Method	Level	Description
Windows Hook (WH_KEYBOARD_LL)	System-wide	Low-level keyboard hook via `SetWindowsHookEx`
Raw Input API	Per-application	Register for raw input device data
GetAsyncKeyState	Polling	Check key states at intervals
Windows Hook (WH_KEYBOARD)	Per-thread	Application-level hook for a specific thread

The most practical approach for system-wide capture is the low-level keyboard hook — it doesn't require a DLL injection and works from user mode.

Node.js Native Binding Architecture

JavaScript can't call SetWindowsHookEx directly — it needs a native bridge. The architecture looks like this:

Node.js (JS)
    ↓
ffi-napi / koffi / node-addon-api
    ↓
user32.dll → SetWindowsHookEx(WH_KEYBOARD_LL, ...)
    ↓
Callback → VK code → JS
    ↓
Log stream

Using ffi-napi for Windows API Calls

The ffi-napi package lets you call native Windows DLL functions directly from JavaScript:

const ffi = require('ffi-napi');
const user32 = ffi.Library('user32', {
  'SetWindowsHookExA': ['pointer', ['int', 'pointer', 'pointer', 'uint32']],
  'UnhookWindowsHookEx': ['bool', ['pointer']],
  'CallNextHookEx': ['pointer', ['pointer', 'int', 'pointer', 'pointer']],
  'GetMessageA': ['bool', ['pointer', 'pointer', 'uint32', 'uint32']],
});

The hook callback receives a KBDLLHOOKSTRUCT containing:

vkCode — the virtual key code (1-254)
scanCode — the hardware scancode
flags — LLKHF_* flags (extended key, alt pressed, keyup, etc.)
time — timestamp of the event

Translating VK Codes to Text

VK codes aren't ASCII — they're hardware position codes. Translating them into readable text requires:

Key state tracking — maintain a buffer of currently pressed modifier keys (Shift, Ctrl, Caps Lock)
VK-to-char mapping — map VK codes to characters based on current keyboard layout and modifier state
Positional buffer — track cursor position for accurate text reconstruction

Here's a simplified translator:

const SHIFTED_CHARS = {
  0x30: ')', 0x31: '!', 0x32: '@', 0x33: '#', 0x34: '$',
  0x35: '%', 0x36: '^', 0x37: '&', 0x38: '*', 0x39: '(',
};

function vkToChar(vkCode, shiftDown, capsLock) {
  // Letter keys
  if (vkCode >= 0x41 && vkCode <= 0x5A) {
    const isUpper = shiftDown !== capsLock; // XOR
    return String.fromCharCode(isUpper ? vkCode : vkCode + 32);
  }
  // Number/symbol keys
  if (vkCode >= 0x30 && vkCode <= 0x39) {
    return shiftDown ? SHIFTED_CHARS[vkCode] : String.fromCharCode(vkCode);
  }
  // Space
  if (vkCode === 0x20) return ' ';
  // Enter
  if (vkCode === 0x0D) return '\n';
  // Tab
  if (vkCode === 0x09) return '\t';

  return null; // non-printable
}

Session Reconstruction

A keylogger isn't just about capturing individual keystrokes — it's about reconstructing coherent sessions. The captured data stream includes:

Timestamps for every event
Window focus changes — which application was active
Mouse events — clicking context
Clipboard captures — pasted content

The reconstructed output needs to handle:

Input Token	Reconstruction
`[Backspace]`	Remove previous character
`[Delete]`	Remove next character
`[LeftShift]+a`	Capital 'A'
`[Enter]`	Newline
`[Tab]`	Tab indentation
`[Left]` / `[Right]`	Cursor movement (overwrite mode)
`[CLIPBOARD]`	Paste marker with captured content

This is non-trivial — the buffer must track cursor position, handle text insertion at arbitrary locations, and correctly resolve dead keys.

Beyond Keystrokes

A production keystroke capture system typically includes:

Periodic screenshot capture — saving screen state at configurable intervals alongside keystroke data

Clipboard monitoring — intercepting clipboard changes via AddClipboardFormatListener API for capturing password manager fills and seed phrase pastes

Active window tracking — logging the foreground window title to identify which application is receiving input (browser URL bar, terminal, password manager, etc.)

Auto-sync pipeline — uploading captured data to a remote server at configurable intervals through encrypted channels

How Platforms Like V-Entity Handle This

A full-stack keystroke capture and analysis platform requires:

Native agent deployment — compiled executable that survives reboots, runs as a hidden service
Multi-method capture — keyboard hook, clipboard listener, screen capture, camera streaming
Remote management — per-system configuration for intervals, target paths, and data types
Web dashboard — real-time log viewing, session reconstruction, credential extraction
Build uniqueness — every agent compiled per-deployment to avoid signature-based detection

V-Entity provides all of this in a single platform: custom-compiled Windows agents with keystroke logging, clipboard monitoring, credential extraction, live camera streaming, and an interactive PowerShell takeover shell — all managed from a private web dashboard. Every build is cryptographically unique, compiled with your chosen settings, icon, and compiler backend.

This article is for educational purposes. Understanding how keystroke capture works helps researchers build better defensive tooling. Always work within authorized testing boundaries.