iamsopotatoe

I built a PE packer with a custom VM that randomizes its own instruction set every build — single .cpp file, no deps

PE packers are one of those things that sound complex until you actually build one. then they sound complex again but for different reasons.

I built TinyLoad — a Windows PE packer in a single C++ file with no external dependencies. v3 just shipped and the main change is replacing rolling XOR with a custom virtual machine for decryption.

what it does

you give it an exe. it spits out a new exe that:

  1. stores your original exe compressed and VM-encrypted inside itself
  2. at runtime, spins up a tiny VM interpreter, runs the decryption bytecode, and loads the result directly into RAM via manual PE mapping

the packed exe never writes anything to disk. the original binary just materializes in memory and runs.

why a VM instead of XOR

rolling XOR is trivially broken. an analyst sets a breakpoint on VirtualAlloc, runs the stub, and dumps the decrypted payload from memory. takes about 30 seconds. the decryption loop is native x86 and immediately recognizable to any disassembler.

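the VM changes the static-analysis side of the problem. instead of decrypting with native x86 instructions, the stub runs a custom bytecode interpreter. the decryption logic is stored as VM bytecode, not x86, so there's no native decryption loop to find. (a memory dump taken after unpacking still works, as with any packer. what the VM raises is the cost of understanding and automating the unpacking.)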

more importantly: the opcode table is randomly shuffled at pack time and baked into the binary, so every packed file encodes the same 20 operations with a different byte-to-opcode mapping. an analyst can't reuse an opcode map from one build on the next. they have to reverse the interpreter first, then figure out what the bytecode is doing.

how the VM works

20 opcodes: loads, moves, arithmetic, bitwise ops, memory read/write, jumps, halt. nothing exotic.

at pack time:

  • generate a random permutation of the 20 opcode IDs
  • emit the decryption program using the shuffled opcode table
  • encrypt the payload with the matching key
  • store: opmap + bytecode + encrypted payload
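a minimal sketch of the shuffle step above, not TinyLoad's actual code. the names `OpMap`, `make_opmap`, and `invert` are made up for illustration, and the seed parameter is an assumption so the example is reproducible (a real packer would presumably seed from OS entropy instead):

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>

// the post says the VM has 20 opcodes; everything else here is hypothetical
constexpr std::size_t kOpcodeCount = 20;

// opmap[logical opcode] = byte value written into this build's bytecode
using OpMap = std::array<std::uint8_t, kOpcodeCount>;

// build a random permutation of 0..19 from a seed
OpMap make_opmap(std::uint32_t seed) {
    OpMap map{};
    std::iota(map.begin(), map.end(), std::uint8_t{0});
    std::mt19937 rng(seed);  // not a cryptographic RNG, fine for a sketch
    std::shuffle(map.begin(), map.end(), rng);
    return map;
}

// inverse table: byte value found in the bytecode -> logical opcode
OpMap invert(const OpMap& map) {
    OpMap inv{};
    for (std::size_t logical = 0; logical < kOpcodeCount; ++logical) {
        inv[map[logical]] = static_cast<std::uint8_t>(logical);
    }
    return inv;
}
```

the emitter would write `map[op]` for every instruction, and the interpreter would look bytes up through the inverse table, so the two sides only agree when they share the same stored opmap.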

at runtime:

  • read the opmap from the binary
  • run the interpreter against the bytecode
  • the bytecode decrypts the payload byte by byte
  • load and execute
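to make the runtime steps concrete, here is a toy interpreter under heavy assumptions: a hypothetical 7-opcode subset instead of the real 20, fixed 4-byte instructions, four registers, and a plain XOR program standing in for the real cipher. none of these names or encodings come from TinyLoad.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// hypothetical 7-opcode subset; the real VM has 20 and its own encoding
enum Op : std::uint8_t { LDI, LDB, STB, XOR, ADDI, JLT, HALT, kOpCount };

// opmap[logical opcode] = byte value used in this build's bytecode
using OpMap = std::array<std::uint8_t, kOpCount>;

struct Insn { Op op; std::uint8_t a, b, c; };  // fixed 4-byte instructions

// pack-time side: translate logical opcodes through the shuffled table
std::vector<std::uint8_t> encode(const std::vector<Insn>& prog, const OpMap& opmap) {
    std::vector<std::uint8_t> code;
    for (const Insn& i : prog) {
        code.insert(code.end(), {opmap[i.op], i.a, i.b, i.c});
    }
    return code;
}

// runtime side: r0 starts at 0 and r3 holds data.size(); returns false on
// malformed bytecode or an out-of-range memory access
bool run(const std::vector<std::uint8_t>& code, const OpMap& opmap,
         std::vector<std::uint8_t>& data) {
    std::array<std::uint8_t, kOpCount> decode{};
    for (std::size_t op = 0; op < kOpCount; ++op) {
        if (opmap[op] >= kOpCount) return false;
        decode[opmap[op]] = static_cast<std::uint8_t>(op);
    }
    std::array<std::uint32_t, 4> r{};
    r[3] = static_cast<std::uint32_t>(data.size());
    std::size_t pc = 0;
    while (pc + 4 <= code.size()) {
        const std::uint8_t raw = code[pc];
        if (raw >= kOpCount) return false;
        const std::uint8_t a = code[pc + 1], b = code[pc + 2], c = code[pc + 3];
        std::uint32_t& ra = r[a & 3];
        std::uint32_t& rb = r[b & 3];
        pc += 4;
        switch (static_cast<Op>(decode[raw])) {
            case LDI:  ra = c; break;
            case LDB:  if (rb >= data.size()) return false;
                       ra = data[rb]; break;
            case STB:  if (ra >= data.size()) return false;
                       data[ra] = static_cast<std::uint8_t>(rb); break;
            case XOR:  ra ^= rb; break;
            case ADDI: ra += c; break;
            case JLT:  if (ra < rb) pc = std::size_t{c} * 4; break;
            case HALT: return true;
            default:   return false;
        }
    }
    return false;  // ran off the end without reaching HALT
}

// XOR every byte of data with a key carried as an immediate in the bytecode
std::vector<Insn> xor_program(std::uint8_t key) {
    return {
        {LDI, 1, 0, key},  // r1 = key
        {LDB, 2, 0, 0},    // r2 = data[r0]
        {XOR, 2, 1, 0},    // r2 ^= r1
        {STB, 0, 2, 0},    // data[r0] = r2
        {ADDI, 0, 0, 1},   // r0 += 1
        {JLT, 0, 3, 1},    // if r0 < r3, jump back to instruction 1
        {HALT, 0, 0, 0},
    };
}
```

encoding `xor_program(0x5A)` with two different opmaps gives two different byte sequences that do the same thing, which is the whole point of the shuffle. the key only ever appears as an operand of `LDI`, which mirrors the "keys as immediates" idea.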

the cipher is a 128-bit stream cipher using rotl/rotr key mixing. the keys are embedded as immediates directly inside the VM program — there are no separate key fields in the file format.
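the post doesn't spell out the cipher, so this is only a guess at the general shape: a toy keystream generator with 128 bits of state (four 32-bit words) mixed with rotl/rotr. the round function, rotation amounts, and constant are all made up, it is not TinyLoad's cipher, and it is not cryptographically secure. it's written as native C++ for readability; in the packer the equivalent logic would live in VM bytecode with the key words as immediates.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Key128 = std::array<std::uint32_t, 4>;  // 4 x 32 bits = 128 bits

// rotate helpers; n must be in 1..31
static std::uint32_t rotl(std::uint32_t v, unsigned n) { return (v << n) | (v >> (32 - n)); }
static std::uint32_t rotr(std::uint32_t v, unsigned n) { return (v >> n) | (v << (32 - n)); }

// XORs data with a keystream derived from the key. because it is a plain
// XOR stream, the same call both encrypts and decrypts.
std::vector<std::uint8_t> apply_keystream(std::vector<std::uint8_t> data, Key128 s) {
    for (std::uint8_t& byte : data) {
        // hypothetical mixing round: one state update per output byte
        s[0] = rotl(s[0] ^ s[3], 5) + s[1];
        s[1] = rotr(s[1], 7) ^ s[2];
        s[2] += 0x9E3779B9u;
        s[3] = rotl(s[3] + s[0], 13);
        byte ^= static_cast<std::uint8_t>(s[0] ^ s[2]);
    }
    return data;
}
```

calling `apply_keystream` twice with the same key returns the original bytes, which is why the packer and the stub can share one routine.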

the compression

custom LZ77 with hash-chain matching, a 64KB sliding window, and lazy matching. it runs on the raw PE first, then VM encryption goes on top. on a 3MB test exe the payload compressed down to ~878KB before encryption, so the final packed output was ~1.9MB including the ~1MB stub.

PE files compress well because of their structure — repeated import patterns, padding sections, predictable headers.
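for readers who haven't seen hash-chain matching, here is a simplified sketch of the LZ77 stage described above. it is greedy (no lazy matching), emits unpacked tokens instead of a real bitstream, and every constant except the 64KB window is an assumption, so treat it as an illustration rather than TinyLoad's compressor.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Token {
    std::uint16_t offset;  // 0 means "literal", otherwise distance back
    std::uint16_t length;  // match length when offset != 0
    std::uint8_t literal;  // byte value when offset == 0
};

constexpr std::size_t kWindow = 65535;       // ~64KB window, fits in uint16_t
constexpr std::size_t kMinMatch = 3;         // assumed
constexpr std::size_t kMaxMatch = 258;       // assumed
constexpr std::size_t kMaxChain = 64;        // assumed search depth
constexpr std::size_t kHashSize = 1u << 15;  // assumed table size

// hash of the next 3 bytes, reduced to 15 bits
static std::size_t hash3(const std::uint8_t* p) {
    const std::uint32_t v = p[0] | (p[1] << 8) | (std::uint32_t{p[2]} << 16);
    return (v * 2654435761u) >> 17;
}

std::vector<Token> compress(const std::vector<std::uint8_t>& in) {
    const std::size_t n = in.size();
    // head[h] = most recent position with hash h; prev[pos] = the one before it
    std::vector<std::ptrdiff_t> head(kHashSize, -1), prev(n, -1);
    std::vector<Token> out;
    auto insert = [&](std::size_t pos) {
        if (pos + kMinMatch > n) return;
        const std::size_t h = hash3(&in[pos]);
        prev[pos] = head[h];
        head[h] = static_cast<std::ptrdiff_t>(pos);
    };
    std::size_t i = 0;
    while (i < n) {
        std::size_t best_len = 0, best_off = 0;
        if (i + kMinMatch <= n) {
            std::ptrdiff_t cand = head[hash3(&in[i])];
            for (std::size_t steps = 0; cand >= 0 && steps < kMaxChain; ++steps) {
                const std::size_t c = static_cast<std::size_t>(cand);
                if (i - c > kWindow) break;  // older entries are even farther
                std::size_t len = 0;
                while (len < kMaxMatch && i + len < n && in[c + len] == in[i + len]) ++len;
                if (len > best_len) { best_len = len; best_off = i - c; }
                cand = prev[c];
            }
        }
        if (best_len >= kMinMatch) {
            out.push_back({static_cast<std::uint16_t>(best_off),
                           static_cast<std::uint16_t>(best_len), 0});
            for (std::size_t k = 0; k < best_len; ++k) insert(i + k);
            i += best_len;
        } else {
            out.push_back({0, 0, in[i]});
            insert(i);
            ++i;
        }
    }
    return out;
}

// returns an empty vector if a token points before the start of the output
std::vector<std::uint8_t> decompress(const std::vector<Token>& tokens) {
    std::vector<std::uint8_t> out;
    for (const Token& t : tokens) {
        if (t.offset == 0) { out.push_back(t.literal); continue; }
        if (t.offset > out.size()) return {};
        const std::size_t start = out.size() - t.offset;
        for (std::size_t k = 0; k < t.length; ++k) {
            const std::uint8_t b = out[start + k];  // may overlap bytes just written
            out.push_back(b);
        }
    }
    return out;
}
```

a lazy variant would, before committing to a match at position `i`, also check whether position `i + 1` has a longer one and emit a literal first if so. that check is left out here to keep the sketch short.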

single file, no dependencies

the whole thing compiles with:

g++ -o TinyLoad.exe TinyLoad.cpp -static -O2 -s

no cmake, no vcpkg, no submodules. the LZ77, VM interpreter, bytecode emitter, PE loader, and resource cloner are all in one ~400-line .cpp file.

usage

TinyLoad.exe --i myapp.exe --vm --c

--vm for VM encryption, --c for LZ77 compression. you need at least one.

what's next

v4 is going to be PE-aware compression — preprocessing the binary structure before LZ77 to get better ratios on code sections specifically. probably also a more complex key schedule inside the VM program.

source: https://github.com/iamsopotatoe-coder/TinyLoad
