azqzazq1

Posted on May 24

Beyond Brute Force: Building LXPEN, a Memory-Assisted NTLM Exploration Engine

#algorithms #cybersecurity #security #tooling

Modern password cracking tools are incredibly fast.

On modern GPUs, tools like Hashcat can test tens or even hundreds of billions of NTLM candidates per second. Yet despite this enormous throughput, one fundamental problem remains:

Most candidate passwords are meaningless.

Traditional cracking workflows focus on increasing compute throughput:
more GPU cores, more parallelism, more masks, larger wordlists, more rules.

But what if the real optimization target is not hash throughput, but candidate quality?

This idea became the foundation of LXPEN.

The NTLM Problem

NTLM remains one of the most common password representations encountered during:

Active Directory assessments
Internal penetration tests
Red team operations
Credential audits
CTF environments

Unlike modern password hashing schemes such as bcrypt or Argon2, the NT hash that NTLM relies on is a single unsalted MD4 pass with no work factor:

MD4(UTF-16LE(password))

This creates several properties:

no salt
extremely fast hashing
deterministic outputs
highly SIMD-friendly
highly precomputation-friendly
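To make the formula concrete, here is a minimal pure-Python sketch of MD4(UTF-16LE(password)), written from RFC 1320. It is for illustration only and is not LXPEN's C core. The names md4 and nt_hash are my own. MD4 is implemented by hand because some Python builds no longer expose it through hashlib.

```python
import struct

def _rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit word left by `shift` bits."""
    return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF

# One entry per MD4 round (RFC 1320):
# (mixing function, additive constant, message-word order, rotation amounts)
_ROUNDS = (
    (lambda x, y, z: (x & y) | (~x & z), 0x00000000,
     (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), (3, 7, 11, 19)),
    (lambda x, y, z: (x & y) | (x & z) | (y & z), 0x5A827999,
     (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
    (lambda x, y, z: x ^ y ^ z, 0x6ED9EBA1,
     (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
)

def md4(data: bytes) -> bytes:
    """Return the 16-byte MD4 digest of `data`."""
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    # Padding: 0x80, zeros up to 56 mod 64, then the bit length (64-bit little-endian).
    msg = data + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack("<Q", (len(data) * 8) & 0xFFFFFFFFFFFFFFFF)
    for offset in range(0, len(msg), 64):
        words = struct.unpack("<16I", msg[offset:offset + 64])
        a, b, c, d = state
        for mix, const, order, shifts in _ROUNDS:
            for step, k in enumerate(order):
                total = (a + mix(b, c, d) + words[k] + const) & 0xFFFFFFFF
                a = _rotl32(total, shifts[step % 4])
                # Rotate register roles: ABCD -> DABC -> CDAB -> BCDA -> ABCD.
                a, b, c, d = d, a, b, c
        state = [(s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d))]
    return struct.pack("<4I", *state)

def nt_hash(password: str) -> str:
    """NT hash as lowercase hex: MD4 over the UTF-16LE encoding of the password."""
    return md4(password.encode("utf-16-le")).hex()
```

nt_hash("password") yields the well-known value 8846f7eaee8fb117ad06bdd830b7586c. The whole computation is one short, fixed sequence of 32-bit operations, which is why the function is so fast and so easy to vectorize.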

Traditional tools exploit this by maximizing brute-force throughput.

LXPEN explores a different direction.

The Human Password Problem

People do not generate passwords randomly.

They construct them through mental templates.

Examples:

Michael1994
password123
Galatasaray1905!
P@ssw0rd123
shadow99

These passwords are not random strings.
They are combinations of:

known words
names
years
symbols
keyboard patterns
l33t transformations

In other words:

Human passwords are structured.

Traditional wordlist-based cracking exploits this structure only indirectly, through whatever words and mutation rules happen to be supplied.

LXPEN attempts to model this behavior explicitly.

Hierarchical Probabilistic Decomposition (HPD)

At the core of LXPEN is HPD:
Hierarchical Probabilistic Decomposition.

Instead of reading candidates from a wordlist, HPD models password generation as a layered probabilistic system.

Layer 1 — Structural Templates

Examples:

[LowerWord + Digits]
[CapWord + Year]
[Name + Year + Symbol]
[L33t + Digits]

Each structure is assigned a probability weight derived from observed password behavior.
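As a rough illustration of how template labels and their weights could be derived from observed passwords, here is a simplified Python sketch. The regex tokenizer, the extra "Other" label, and the function names template_of and template_weights are my own assumptions, not LXPEN's implementation. It does not detect Name or L33t slots, so a password like P@ssw0rd123 fragments into many small tokens.

```python
import re
from collections import Counter

# Ordered alternatives: the first one that matches at a position wins.
# Only ASCII letters are treated as words; anything else non-alphanumeric is a Symbol.
_TOKEN = re.compile(
    r"(?P<Year>(?:19|20)\d{2}(?!\d))"   # four digits starting 19 or 20, not followed by a digit
    r"|(?P<Digits>\d+)"
    r"|(?P<CapWord>[A-Z][a-z]+)"
    r"|(?P<LowerWord>[a-z]+)"
    r"|(?P<Symbol>[^A-Za-z0-9]+)"
    r"|(?P<Other>[A-Z]+)"               # leftover uppercase runs
)

def template_of(password: str) -> str:
    """Map a password to a structural template such as 'CapWord + Year'."""
    return " + ".join(m.lastgroup for m in _TOKEN.finditer(password))

def template_weights(passwords):
    """Relative frequency of each template in a sample of passwords."""
    counts = Counter(template_of(p) for p in passwords)
    total = sum(counts.values())
    return {template: n / total for template, n in counts.items()}
```

With this sketch, Michael1994 maps to "CapWord + Year" and Galatasaray1905! maps to "CapWord + Year + Symbol". A real model would need a name list and l33t normalization before tokenizing.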

Layer 2 — Slot Frequency Modeling

Each slot contains weighted entries.

Examples:

LowerWord:
password
shadow
dragon
butterfly
galatasaray

Year:
2024
1994
1907

Symbol:
!
@
#

The engine does not treat all candidates equally.

It prioritizes:

high-frequency structures
high-frequency slot entries
culturally relevant patterns
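One way to read the two layers together: a candidate's score is the template weight multiplied by the weight of each slot entry. The Python sketch below assumes the slots are independent. All counts are invented for illustration, and the template ("LowerWord", "Year", "Symbol") and the function names are hypothetical.

```python
from math import prod

# Hypothetical raw counts, invented for illustration only.
# Real weights would come from observed password data.
TEMPLATE_COUNTS = {
    ("LowerWord", "Year"): 70,
    ("LowerWord", "Year", "Symbol"): 30,
}
SLOT_COUNTS = {
    "LowerWord": {"password": 40, "shadow": 20, "dragon": 20,
                  "butterfly": 10, "galatasaray": 10},
    "Year": {"2024": 50, "1994": 30, "1907": 20},
    "Symbol": {"!": 60, "@": 30, "#": 10},
}

def normalise(counts):
    """Turn raw counts into weights that sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

TEMPLATES = normalise(TEMPLATE_COUNTS)
SLOTS = {name: normalise(table) for name, table in SLOT_COUNTS.items()}

def candidate_probability(template, fills):
    """P(template) times the product of P(fill | slot), assuming independent slots."""
    if len(template) != len(fills):
        raise ValueError("one fill is required per slot")
    # Entries missing from a slot table get weight 0.
    return TEMPLATES[template] * prod(
        SLOTS[slot].get(fill, 0.0) for slot, fill in zip(template, fills)
    )
```

Under these made-up counts, password2024 scores 0.7 × 0.4 × 0.5 = 0.14, while galatasaray1907# scores 0.3 × 0.1 × 0.2 × 0.1 = 0.0006, so the first would be tried far earlier.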

Layer 3 — Probability-Ordered Candidate Generation

Candidates are generated dynamically in probability order.

This creates a fundamentally different exploration strategy compared to:

brute force
static wordlists
blind mutation rules

LXPEN attempts to minimize wasted computation.
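Probability-ordered generation can be sketched as a best-first search over slot indexes with a priority queue. This is a common technique and not necessarily how LXPEN does it. The function name generate and the demo tables are assumptions with invented weights. Each slot table is sorted by descending weight, so stepping any index forward can never raise the score, which is what makes the heap order valid.

```python
import heapq
from math import prod

def generate(templates, slots):
    """Yield (probability, candidate) pairs in non-increasing probability order.

    templates: {tuple of slot names: weight}
    slots:     {slot name: {entry: weight}}, every table non-empty
    """
    ordered = {name: sorted(table.items(), key=lambda kv: -kv[1])
               for name, table in slots.items()}

    def score(template, index):
        return templates[template] * prod(
            ordered[slot][i][1] for slot, i in zip(template, index))

    heap, seen = [], set()
    for template in templates:
        start = (0,) * len(template)          # most likely entry in every slot
        heap.append((-score(template, start), template, start))
        seen.add((template, start))
    heapq.heapify(heap)

    while heap:
        neg_p, template, index = heapq.heappop(heap)
        yield -neg_p, "".join(ordered[s][i][0] for s, i in zip(template, index))
        # Successors: advance exactly one slot to its next most likely entry.
        for pos, slot in enumerate(template):
            nxt = index[:pos] + (index[pos] + 1,) + index[pos + 1:]
            if nxt[pos] < len(ordered[slot]) and (template, nxt) not in seen:
                seen.add((template, nxt))
                heapq.heappush(heap, (-score(template, nxt), template, nxt))

# Hypothetical demo weights, invented for illustration only.
DEMO_TEMPLATES = {("LowerWord", "Year"): 0.7, ("LowerWord", "Year", "Symbol"): 0.3}
DEMO_SLOTS = {
    "LowerWord": {"password": 0.6, "shadow": 0.4},
    "Year": {"2024": 0.7, "1994": 0.3},
    "Symbol": {"!": 1.0},
}
```

Because generate is lazy, a caller can take only the first N candidates with itertools.islice and never materialize the full candidate space. The seen set grows with the explored frontier, which is one place where RAM is traded for less wasted hashing.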

CPU + RAM Cooperative Architecture

Another design goal behind LXPEN was reducing dependence on large GPU setups.

Modern cracking systems are usually compute-centric.

LXPEN experiments with a different model:

RAM becomes an active cracking component.

Instead of using memory only for buffering, the engine uses:

pattern indexes
slot tables
candidate caches
lookup structures
multi-target state tracking
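Because NT hashes are unsalted, one computed digest can be compared against every target at once. Here is a minimal Python sketch of multi-target state tracking, assuming a plain in-memory dictionary keyed by hex digest. TargetTable and its methods are hypothetical names, not LXPEN's API.

```python
from dataclasses import dataclass, field

@dataclass
class TargetTable:
    """In-memory index from NT hash (lowercase hex) to the accounts that share it."""
    pending: dict = field(default_factory=dict)   # digest -> list of account names
    cracked: dict = field(default_factory=dict)   # account name -> recovered password

    def add(self, account: str, digest_hex: str) -> None:
        """Register a target; accounts with the same hash share one entry."""
        self.pending.setdefault(digest_hex.lower(), []).append(account)

    def check(self, candidate: str, digest_hex: str) -> list:
        """Look up a candidate's digest; return the accounts it cracks (possibly none)."""
        accounts = self.pending.pop(digest_hex.lower(), None)
        if accounts is None:
            return []
        for account in accounts:
            self.cracked[account] = candidate
        return accounts

    def remaining(self) -> int:
        """Number of distinct hashes still uncracked."""
        return len(self.pending)
```

Each candidate costs one hash plus one average constant-time lookup, regardless of how many targets are loaded. Removing cracked digests from pending also gives a natural stopping condition: the session can end once remaining() reaches zero.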

The architecture is split into two layers:

Crystal Layer

Responsible for:

orchestration
pattern management
slot modeling
candidate-space exploration

C Core

Responsible for:

NTLM hashing
UTF-16LE conversion
SIMD-friendly hot loops
multi-threaded candidate execution

This separation allowed rapid iteration at the exploration layer while keeping the hashing path extremely lightweight.

Candidate Efficiency vs Hash Throughput

Most cracking discussions focus on:

hashes per second

LXPEN instead focuses on:

useful candidates per second

This distinction matters.

Generating 10 billion low-quality candidates is not always superior to generating 1 million highly probable ones.
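The comparison can be made slightly more precise with a back-of-the-envelope model: expected hits are roughly the number of candidates times the per-candidate hit probability. The Python sketch below assumes a uniform, independent hit probability within each run, which is a simplification. The function names are my own.

```python
def expected_hits(candidates: int, hit_probability: float) -> float:
    """Expected number of cracked hashes under a uniform per-candidate hit probability."""
    return candidates * hit_probability

def break_even_hit_ratio(large_run: int, small_run: int) -> float:
    """Fraction of the small run's per-candidate hit probability that the
    large run needs in order to match the small run's expected hits."""
    return small_run / large_run
```

For the figures above, break_even_hit_ratio(10_000_000_000, 1_000_000) is 0.0001. The 10 billion run only comes out ahead if each of its candidates is at least 1/10,000 as likely to hit as a candidate from the 1 million run.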

In benchmark scenarios focused on human-structured passwords, LXPEN showed significantly higher candidate efficiency than traditional wordlist-based approaches.

The goal is not replacing Hashcat or John the Ripper.

The goal is exploring a different optimization axis:
reducing unnecessary candidate exploration.

Why This Matters

As password policies evolve, users continue to follow predictable mental models.

Even when complexity requirements are added, patterns remain:

company names
sports teams
years
keyboard sequences
predictable symbol placement
l33t substitutions

This creates an opportunity for structure-aware exploration systems.

LXPEN is an experiment in that direction.

Current Limitations

LXPEN is not magic.

It does not solve:

cryptographically random passwords
high-entropy secrets
modern slow hash functions
truly unpredictable password generation

Passwords like:

xK9#mZ2pLq
Tr0ub4dor&3

remain difficult, as they should.

The engine is specifically designed around human-generated structure.

Future Work

Several areas remain unexplored:

adaptive probability updates during cracking sessions
AVX2 / AVX-512 optimization
memory-mapped precompute zones
distributed candidate-space exploration
Markov-assisted slot ordering
PCFG (probabilistic context-free grammar) integration
hybrid exploration modes

The most interesting direction may not be faster hashing.

It may be better understanding of how humans construct passwords.

Closing Thoughts

Brute force scales compute.

LXPEN attempts to scale understanding.

Instead of asking:
“How can we hash faster?”

the project asks:
“How can we avoid hashing unnecessary candidates in the first place?”

That shift in perspective became the foundation of LXPEN.

GitHub:
https://github.com/azqzazq1/lxpen

DOI:
https://doi.org/10.5281/zenodo.20366383
