Claudia

Building a JavaScript Keylogger: How Keystroke Capture Works in Node.js

When people hear "keylogger," they usually think C or C++ — low-level Windows API calls, hook chains, kernel drivers. But JavaScript is fully capable of building a functional keystroke capture system, especially when paired with Node.js native bindings.

This article breaks down the architecture behind JavaScript-based keystroke logging — not as a guide for misuse, but as a technical deep-dive for Windows security researchers.

The Raw Keypress Model

At the lowest level, keyboard input on Windows generates scancodes — hardware-level identifiers for each physical key press and release. The Windows kernel translates these into virtual key codes (VK codes), which applications receive through the message queue.

There are several ways to capture these:

Method Level Description
Windows Hook (WH_KEYBOARD_LL) System-wide Low-level keyboard hook via SetWindowsHookEx
Raw Input API Per-application Register for raw input device data
GetAsyncKeyState Polling Check key states at intervals
Windows Hook (WH_KEYBOARD) Per-thread Application-level hook for a specific thread

The most practical approach for system-wide capture is the low-level keyboard hook — it doesn't require a DLL injection and works from user mode.

Node.js Native Binding Architecture

JavaScript can't call SetWindowsHookEx directly — it needs a native bridge. The architecture looks like this:

Node.js (JS)
    ↓
ffi-napi / koffi / node-addon-api
    ↓
user32.dll → SetWindowsHookEx(WH_KEYBOARD_LL, ...)
    ↓
Callback → VK code → JS
    ↓
Log stream

Using ffi-napi for Windows API Calls

The ffi-napi package lets you call native Windows DLL functions directly from JavaScript:

const ffi = require('ffi-napi');
const user32 = ffi.Library('user32', {
  'SetWindowsHookExA': ['pointer', ['int', 'pointer', 'pointer', 'uint32']],
  'UnhookWindowsHookEx': ['bool', ['pointer']],
  'CallNextHookEx': ['pointer', ['pointer', 'int', 'pointer', 'pointer']],
  'GetMessageA': ['bool', ['pointer', 'pointer', 'uint32', 'uint32']],
});

The hook callback receives a KBDLLHOOKSTRUCT containing:

  • vkCode — the virtual key code (1-254)
  • scanCode — the hardware scancode
  • flags — LLKHF_* flags (extended key, alt pressed, keyup, etc.)
  • time — timestamp of the event

Translating VK Codes to Text

VK codes are not characters. They are device-independent key identifiers (the scancode, not the VK code, reflects the physical key position), so turning them into readable text requires:

  1. Key state tracking — maintain a buffer of currently pressed modifier keys (Shift, Ctrl, Caps Lock)
  2. VK-to-char mapping — map VK codes to characters based on current keyboard layout and modifier state
  3. Positional buffer — track cursor position for accurate text reconstruction

Here's a simplified translator:

const SHIFTED_CHARS = {
  0x30: ')', 0x31: '!', 0x32: '@', 0x33: '#', 0x34: '$',
  0x35: '%', 0x36: '^', 0x37: '&', 0x38: '*', 0x39: '(',
};

function vkToChar(vkCode, shiftDown, capsLock) {
  // Letter keys
  if (vkCode >= 0x41 && vkCode <= 0x5A) {
    const isUpper = shiftDown !== capsLock; // XOR
    return String.fromCharCode(isUpper ? vkCode : vkCode + 32);
  }
  // Number/symbol keys
  if (vkCode >= 0x30 && vkCode <= 0x39) {
    return shiftDown ? SHIFTED_CHARS[vkCode] : String.fromCharCode(vkCode);
  }
  // Space
  if (vkCode === 0x20) return ' ';
  // Enter
  if (vkCode === 0x0D) return '\n';
  // Tab
  if (vkCode === 0x09) return '\t';

  return null; // non-printable
}

Session Reconstruction

A keylogger isn't just about capturing individual keystrokes — it's about reconstructing coherent sessions. The captured data stream includes:

  • Timestamps for every event
  • Window focus changes — which application was active
  • Mouse events — clicking context
  • Clipboard captures — pasted content

The reconstructed output needs to handle:

Input Token Reconstruction
[Backspace] Remove previous character
[Delete] Remove next character
[LeftShift]+a Capital 'A'
[Enter] Newline
[Tab] Tab indentation
[Left] / [Right] Cursor movement (overwrite mode)
[CLIPBOARD] Paste marker with captured content

This is non-trivial — the buffer must track cursor position, handle text insertion at arbitrary locations, and correctly resolve dead keys.

Beyond Keystrokes

A production keystroke capture system typically includes:

Periodic screenshot capture — saving screen state at configurable intervals alongside keystroke data

Clipboard monitoring — intercepting clipboard changes via AddClipboardFormatListener API for capturing password manager fills and seed phrase pastes

Active window tracking — logging the foreground window title to identify which application is receiving input (browser URL bar, terminal, password manager, etc.)

Auto-sync pipeline — uploading captured data to a remote server at configurable intervals through encrypted channels

How Platforms Like V-Entity Handle This

A full-stack keystroke capture and analysis platform requires:

  • Native agent deployment — compiled executable that survives reboots, runs as a hidden service
  • Multi-method capture — keyboard hook, clipboard listener, screen capture, camera streaming
  • Remote management — per-system configuration for intervals, target paths, and data types
  • Web dashboard — real-time log viewing, session reconstruction, credential extraction
  • Build uniqueness — every agent compiled per-deployment to avoid signature-based detection

V-Entity provides all of this in a single platform: custom-compiled Windows agents with keystroke logging, clipboard monitoring, credential extraction, live camera streaming, and an interactive PowerShell takeover shell — all managed from a private web dashboard. Every build is cryptographically unique, compiled with your chosen settings, icon, and compiler backend.


This article is for educational purposes. Understanding how keystroke capture works helps researchers build better defensive tooling. Always work within authorized testing boundaries.
