TinyLoad v6 — split opcode tables, encrypted dispatch, and control flow flattening

#cpp #windows #packer #security

TinyLoad v6 is out. if you haven't seen it before: it's a PE packer for Windows. one .cpp file, no dependencies, MIT. repo link is at the bottom of the post.

v5 hardened the stub itself with encrypted strings, IAT wiping, and opmap obfuscation. v6 goes after the two biggest remaining fingerprints: the switch statement in the VM interpreter, and the single contiguous opcode table. both are gone.

the problem with a switch statement

every version of TinyLoad up to v5 had a vmRun function with a giant switch(op) dispatching 28 opcode handlers. this is the most fingerprint-able thing in any custom VM — disassemblers recognise the pattern immediately, and once you have the handler layout you can reconstruct what each opcode does without ever running the code.

v6 replaces it entirely with a computed-goto dispatch table using GCC's &&label extension:

static void* s_tbl[32] = {};  // file scope, filled at runtime (&&label is only valid inside the interpreter function)

// pack time: read live label addresses, encrypt with random key
uint64_t dispKey = rng3();
for (int i = 0; i < 32; i++) {
    uintptr_t addr = (uintptr_t)s_tbl[i];
    s_tbl[i] = (void*)(uintptr_t)(addr ^ dispKey);  // go through uintptr_t so the cast is size-correct
}
// store dispKey in tail

// runtime: decode the opcode, decrypt its handler address, jump
dispatch:
    uint8_t raw = vmCode[ip++];
    uint8_t sub = raw >> 3, slot = raw & 7;  // raw is (subtable_index << 3) | slot
    uint8_t op = decodeOp(sub, slot);
    void* handler = (void*)(uintptr_t)((uintptr_t)s_tbl[op] ^ dispKey);
    goto *handler;

the label addresses are never plaintext in the packed binary. the packer reads them from its own running process, XORs them with a random key, and stores the result in the tail struct. the packed stub decrypts and recomputes the table at runtime. there's no static jump table to dump, no switch to fingerprint.
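to make the mechanism concrete, here's a minimal sketch of an XOR-masked computed-goto dispatcher. it is not TinyLoad's actual code: the three-opcode VM, the name runTiny, and the bounds checks are all made up for illustration, and the table is masked in-process as a stand-in for the pack-time step. it needs GCC or Clang because &&label is a compiler extension.

```cpp
#include <cstddef>
#include <cstdint>

// hypothetical 3-opcode VM: 0 = push next byte, 1 = add top two, 2 = halt.
// returns the top of the stack on halt, or -1 on malformed bytecode.
int64_t runTiny(const uint8_t* code, size_t len, uintptr_t key) {
    // &&label is only valid inside the function that owns the labels
    void* tbl[3] = { &&op_push, &&op_add, &&op_halt };

    // stand-in for the pack-time step: XOR every entry with the key,
    // so the table never holds a usable address between dispatches
    for (void*& e : tbl)
        e = reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(e) ^ key);

    int64_t stack[16];
    size_t sp = 0, ip = 0;
    uint8_t op = 0;

dispatch:
    if (ip >= len) return -1;          // ran off the end without a halt
    op = code[ip++];
    if (op > 2) return -1;             // unknown opcode
    // unmask exactly one entry, right before the jump
    goto *reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(tbl[op]) ^ key);

op_push:
    if (ip >= len || sp == 16) return -1;
    stack[sp++] = code[ip++];
    goto dispatch;

op_add:
    if (sp < 2) return -1;
    stack[sp - 2] += stack[sp - 1];
    sp--;
    goto dispatch;

op_halt:
    return sp ? stack[sp - 1] : -1;
}
```

feeding it the bytecode {0, 2, 0, 3, 1, 2} (push 2, push 3, add, halt) returns 5 whatever key you pass, because the same key masks and unmasks the table. one thing the sketch skips: raw absolute addresses only survive a process restart if the image loads at the same base, so a real packer would probably want to store offsets or rebase them.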

split opcode decoder

v5 had a single 32-entry opmap, encrypted with one FNV-derived key. one key crack = full opcode table.

v6 splits 28 opcodes across four independent 8-entry subtables, each with its own key derived from different slices of the payload and VM bytecode:

// 4 subtables, each XOR-encrypted with independent key
BYTE sub0[8], sub1[8], sub2[8], sub3[8];

// key for each subtable derived from different data slices
// (pseudocode: x[a..b] means bytes a through b of x)
uint32_t k0 = fnv(origSz, packSz, vmCode[0..7]);
uint32_t k1 = fnv(packSz, vmCodeSz, payload[0..7]);
uint32_t k2 = fnv(vmCodeSz, origSz, vmCode[8..15]);
uint32_t k3 = fnv(origSz ^ packSz, vmCodeSz, payload[8..15]);

for (int i = 0; i < 8; i++) {
    // byte (i % 4) of the 32-bit key, so slots i and i+4 share a key byte
    sub0[i] ^= (uint8_t)(k0 >> ((i % 4) * 8));
    sub1[i] ^= (uint8_t)(k1 >> ((i % 4) * 8));
    // ... same for sub2/k2 and sub3/k3
}

cracking one subtable key reveals at most 8 of 28 opcodes. the other 20 are behind three different keys derived from different data. opcodes are encoded as (subtable_index << 3) | slot — every packed file gets a completely different encoding across all four tables.
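here's a rough sketch of how a split decoder like this could fit together. treat it as an illustration under assumptions, not the real implementation: SplitMap, deriveKey, xorSubtables and this decodeOp signature are hypothetical names, and the hash is plain 32-bit FNV-1a, which may differ from the variant TinyLoad uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// standard 32-bit FNV-1a (offset basis 2166136261, prime 16777619)
uint32_t fnv1a(const uint8_t* p, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 16777619u; }
    return h;
}

// hypothetical key derivation: hash two size fields plus an 8-byte slice
uint32_t deriveKey(uint32_t a, uint32_t b, const uint8_t* slice8) {
    uint8_t buf[16];
    for (int i = 0; i < 4; i++) {
        buf[i]     = (uint8_t)(a >> (i * 8));
        buf[4 + i] = (uint8_t)(b >> (i * 8));
    }
    std::memcpy(buf + 8, slice8, 8);
    return fnv1a(buf, sizeof buf);
}

struct SplitMap {
    uint8_t  sub[4][8];  // four 8-entry subtables
    uint32_t key[4];     // one independent key per subtable
};

// XOR is its own inverse, so this both encrypts and decrypts in place
void xorSubtables(SplitMap& m) {
    for (int t = 0; t < 4; t++)
        for (int i = 0; i < 8; i++)
            m.sub[t][i] ^= (uint8_t)(m.key[t] >> ((i % 4) * 8));
}

// raw byte layout: (subtable_index << 3) | slot; decrypts one entry on the fly
uint8_t decodeOp(const SplitMap& m, uint8_t raw) {
    uint8_t t = (raw >> 3) & 3, slot = raw & 7;
    return m.sub[t][slot] ^ (uint8_t)(m.key[t] >> ((slot % 4) * 8));
}
```

decoding one entry at a time means the plaintext tables never have to sit in memory all at once. the & 3 mask on the subtable index is a defensive choice in this sketch, so a corrupt bytecode byte can't index past the fourth table.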

staged entry point and control flow noise

tryRun and runInMem are both broken into stages dispatched through function pointer tables — no linear flow to trace:

tryRun: s_chk → s_ld → s_prs → s_vm → s_dc → s_ex
runInMem: sp_hdr → sp_map → sp_reloc → sp_import → sp_go

on top of that, noiseDecrypt() fires at every stage transition and every 64 VM iterations — calling sdec2 on throwaway buffers with random keys. in a dynamic trace the real string decryption calls (the IAT hook lookups) are buried in identical-looking noise. you can't tell which call matters without executing every path.
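the general shape of staged dispatch with noise at each transition can be sketched like this. everything here is hypothetical (Ctx, the stage names, and the noise routine, which is a plain XOR over a throwaway buffer standing in for whatever sdec2 really does); it only shows the control-flow pattern.

```cpp
#include <cstddef>
#include <cstdint>

struct Ctx {
    int      trace[8]   = {};  // records which stages ran, in order
    int      n          = 0;
    uint32_t noiseCalls = 0;
    uint32_t seed       = 1;
};

// each stage returns the index of the next stage, or -1 to stop
using Stage = int (*)(Ctx&);

// stand-in for a decoy decrypt: scramble a throwaway buffer with a fresh key
void noise(Ctx& c) {
    volatile uint8_t junk[16];
    c.seed = c.seed * 1664525u + 1013904223u;  // simple LCG for the key
    for (size_t i = 0; i < 16; i++)
        junk[i] = (uint8_t)(i ^ (c.seed >> 24));
    c.noiseCalls++;
}

int stCheck(Ctx& c) { c.trace[c.n++] = 0; return 1; }
int stLoad(Ctx& c)  { c.trace[c.n++] = 1; return 2; }
int stExec(Ctx& c)  { c.trace[c.n++] = 2; return -1; }

// drives the stages through a pointer table instead of direct calls
int runStages(Ctx& c) {
    static const Stage stages[] = { stCheck, stLoad, stExec };
    const int count = (int)(sizeof stages / sizeof stages[0]);
    int next = 0;
    while (next >= 0 && next < count) {
        next = stages[next](c);
        noise(c);  // decoy work at every stage transition
    }
    return c.n;
}
```

running runStages on a fresh Ctx visits the three stages in order and fires noise three times. because every hop is an indirect call through the table, a static call graph shows no direct edge from one stage to the next.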

other stuff in v6

full resource cloning: switched from hardcoded RT_ICON/RT_VERSION/RT_MANIFEST to EnumResourceTypesA, so all resource types survive packing now
LZ compressor fix: found a hash-chain self-loop bug that was silently degrading match quality. compression improved ~2% across tested files
PE loader hardening: SizeOfBlock underflow guard, reloc bounds validation, negative e_lfanew rejection, import thunk iteration cap, better error propagation throughout
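for anyone wondering what the SizeOfBlock underflow guard protects against: each base relocation block starts with an 8-byte header, and the entry count is (SizeOfBlock - 8) / 2, so a SizeOfBlock below 8 wraps around to a huge count. below is a hedged sketch of that kind of check, not the loader's actual code. RelocBlock mirrors the layout of IMAGE_BASE_RELOCATION so the example builds without windows.h, and countRelocEntries is a made-up helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// same layout as IMAGE_BASE_RELOCATION: two 32-bit fields, then 16-bit entries
struct RelocBlock {
    uint32_t VirtualAddress;
    uint32_t SizeOfBlock;  // header + entries, in bytes
};

// walks the relocation directory and returns the total number of 16-bit
// entries, or -1 if any block header is malformed
long countRelocEntries(const uint8_t* dir, size_t dirSize) {
    size_t off = 0;
    long total = 0;
    // off never exceeds dirSize, so dirSize - off cannot wrap
    while (dirSize - off >= sizeof(RelocBlock)) {
        RelocBlock b;
        std::memcpy(&b, dir + off, sizeof b);  // avoids unaligned reads

        // underflow guard: SizeOfBlock - 8 would wrap if it is below 8
        if (b.SizeOfBlock < sizeof(RelocBlock)) return -1;
        // bounds check: the block must fit in what is left of the directory
        if (b.SizeOfBlock > dirSize - off) return -1;

        total += (long)((b.SizeOfBlock - sizeof(RelocBlock)) / 2);
        off += b.SizeOfBlock;
    }
    return total;
}
```

a 12-byte block (8-byte header plus two entries) yields 2, while a header claiming SizeOfBlock = 4, or one claiming more bytes than the directory holds, is rejected with -1. this sketch treats a zero-size block as malformed, which is stricter than some loaders.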

current usage

TinyLoad.exe --i myapp.exe --vm --c

build from source:

g++ -o TinyLoad.exe TinyLoad.cpp -static -O2 -s

grab the binary from releases.

what's left for v7

the one criticism from the RE community that's still open: a memory dump still gives you everything. right now, once the payload is decrypted in RAM it's self-contained, so a dump is all an analyst needs. making the payload call back into the stub at runtime, so that a dump without the stub is broken, is the hard problem v7 needs to solve.

also thinking about more opaque predicate variety and bytecode encryption on top of the split subtables.

if you find files it breaks on, open an issue. a star helps a lot ❤️

repo: github.com/iamsopotatoe-coder/TinyLoad
blog: iamsopotatoe-coder.github.io/TinyLoad/#blog

don't use this to pack malware — legitimate use only.