LNATION for LNATION

Posted on Mar 29

Horus, Apophis, and Sekhmet: A C/XS Identifier Stack for Perl

#perl #c #xs #programming

Three modules, one goal: fast, correct identifier generation in Perl with zero runtime dependencies. Horus generates UUIDs. Sekhmet generates ULIDs. Apophis uses deterministic UUIDs to build content-addressable storage. All three are implemented in C, exposed through XS, and designed to work together.

Horus: Every UUID Version, One Module

Horus implements all UUID versions defined in RFC 9562 -- v1 through v8, plus NIL and MAX. The entire engine is C, compiled once, called millions of times per second.

use Horus qw(:all);

my $random   = uuid_v4();                         # 122 random bits
my $sortable = uuid_v7();                         # timestamp + random, sortable
my $fixed    = uuid_v5(UUID_NS_DNS, "example.com"); # deterministic, always the same

Why multiple versions matter

Each version solves a different problem:

v4 is the workhorse. 122 bits of randomness, no coordination needed. Use it for session tokens, request IDs, anything where uniqueness is all you need.

v7 embeds a millisecond timestamp in the high bits, making UUIDs lexicographically sortable. Database indexes love this -- new rows append instead of scattering across B-tree pages. Horus guarantees monotonic ordering within the same millisecond.

my @ids = map { uuid_v7() } 1..3;
# 019d38f6-3e9a-765c-ae1c-1cfeb0c30000
# 019d38f6-3e9a-765c-ae1c-1cfeb0c40000
# 019d38f6-3e9a-765c-ae1c-1cfeb0c50000
# String sort == chronological sort
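
To make the v7 layout concrete, here is a minimal pure-Perl sketch of how the bits are arranged under RFC 9562. It is not Horus's implementation: `sketch_uuid_v7` is a hypothetical name, `rand()` stands in for a real CSPRNG, and there is no same-millisecond monotonic counter.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build a version 7 UUID from a millisecond timestamp and 10 random bytes.
# Layout: 48-bit timestamp | 4-bit version | 12 random bits |
#         2-bit variant    | 62 random bits.
sub sketch_uuid_v7 {
    my ($ms, $rand) = @_;
    $ms //= int(time() * 1000);
    # rand() is NOT cryptographically secure; it keeps the sketch dependency-free.
    $rand //= pack 'C10', map { int rand 256 } 1 .. 10;

    # Split the 48-bit timestamp into 16 high bits and 32 low bits.
    my $hi = int($ms / 4294967296);
    my $lo = $ms - $hi * 4294967296;
    my @b  = unpack 'C16', pack('n N', $hi, $lo) . $rand;

    $b[6] = ($b[6] & 0x0f) | 0x70;    # version nibble = 7
    $b[8] = ($b[8] & 0x3f) | 0x80;    # variant bits = 10

    my $hex = unpack 'H32', pack('C16', @b);
    return join '-', unpack('A8 A4 A4 A4 A12', $hex);
}

print sketch_uuid_v7(), "\n" for 1 .. 3;
```

Because the timestamp occupies the most significant bytes, IDs from different milliseconds sort by time as plain strings. Ordering within one millisecond needs the extra counter logic that this sketch leaves out.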

v5 is deterministic. Given the same namespace and name, it always produces the same UUID. This is the foundation Apophis builds on: the same content, the same identifier, every time.

my $first  = uuid_v5(UUID_NS_DNS, "example.com");
my $second = uuid_v5(UUID_NS_DNS, "example.com");
# $first eq $second -- always
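
The determinism follows directly from how RFC 9562 defines v5: a SHA-1 digest over the namespace bytes plus the name, truncated to 16 bytes. A rough sketch using only the core Digest::SHA module (`sketch_uuid_v5` is a hypothetical helper, not part of Horus):

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1);

# Name-based (version 5) UUID: SHA-1 over namespace bytes + name,
# truncated to 16 bytes, with version and variant bits overwritten.
sub sketch_uuid_v5 {
    my ($namespace, $name) = @_;
    (my $ns_hex = $namespace) =~ tr/-//d;

    # sha1() returns 20 raw bytes; 'C16' keeps only the first 16.
    my @b = unpack 'C16', sha1(pack('H32', $ns_hex) . $name);

    $b[6] = ($b[6] & 0x0f) | 0x50;    # version nibble = 5
    $b[8] = ($b[8] & 0x3f) | 0x80;    # variant bits = 10

    my $hex = unpack 'H32', pack('C16', @b);
    return join '-', unpack('A8 A4 A4 A4 A12', $hex);
}

# The DNS namespace UUID defined by RFC 9562.
my $NS_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

my $once  = sketch_uuid_v5($NS_DNS, 'example.com');
my $again = sketch_uuid_v5($NS_DNS, 'example.com');
print "$once\n";
print $once eq $again ? "identical\n" : "different\n";
```

The sketch assumes the name is a byte string. Text containing wide characters would need encoding (for example to UTF-8) before hashing.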

Ten output formats

Every generator accepts a format parameter, and uuid_convert() re-renders an existing UUID in another format. Five of the ten formats:

my $id = uuid_v4();

uuid_convert($id, UUID_FMT_STR);      # 550e8400-e29b-41d4-a716-446655440000
uuid_convert($id, UUID_FMT_HEX);      # 550e8400e29b41d4a716446655440000
uuid_convert($id, UUID_FMT_BRACES);   # {550e8400-e29b-41d4-a716-446655440000}
uuid_convert($id, UUID_FMT_URN);      # urn:uuid:550e8400-e29b-41d4-a716-446655440000
uuid_convert($id, UUID_FMT_BASE64);   # VQ6EAOKbQdSnFkRmVUQAAA
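
All of these formats are different renderings of the same 16 bytes. As an illustration only, the following sketch derives the five shown above from a canonical string in pure Perl. `sketch_formats` and its hash keys are hypothetical names, and the base64 form is assumed to be standard-alphabet base64 with the padding stripped, which matches the example output.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Re-render a UUID string (hyphenated or bare hex) in several shapes.
sub sketch_formats {
    my ($uuid) = @_;
    (my $hex = lc $uuid) =~ tr/-//d;
    die "not a UUID: $uuid\n" unless $hex =~ /\A[0-9a-f]{32}\z/;

    my $str = join '-', unpack('A8 A4 A4 A4 A12', $hex);

    # 16 raw bytes encode to 24 base64 characters including "==" padding;
    # dropping the padding leaves 22.
    (my $b64 = encode_base64(pack('H32', $hex), '')) =~ s/=+\z//;

    return {
        str    => $str,
        hex    => $hex,
        braces => "{$str}",
        urn    => "urn:uuid:$str",
        base64 => $b64,
    };
}

my $f = sketch_formats('550e8400-e29b-41d4-a716-446655440000');
printf "%-6s %s\n", $_, $f->{$_} for qw(str hex braces urn base64);
```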

Bulk generation

When you need thousands of IDs, crossing the Perl/C boundary once beats crossing it thousands of times:

my @ids = uuid_v4_bulk(10_000);  # single call, 10k UUIDs back

Utilities

uuid_validate($string);          # is this a valid UUID?
uuid_version($string);           # which version? (1-8)
uuid_time($v7_uuid);             # extract epoch seconds from v7/v6
uuid_cmp($a, $b);                # sort comparison (-1, 0, 1)
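
What these utilities read is visible in the string itself. The sketch below is a hypothetical pure-Perl stand-in (not Horus code) that checks the shape of a hyphenated UUID, reads the version nibble, and for v7 recovers the millisecond timestamp from the first 48 bits. It checks shape only and does not inspect the variant bits.

```perl
use strict;
use warnings;

# Return { version => N, time_ms => ... } for a well-formed UUID string,
# or nothing if the string does not look like a UUID.
sub sketch_inspect {
    my ($uuid) = @_;
    return unless defined $uuid
        && $uuid =~ /\A([0-9a-f]{8})-([0-9a-f]{4})-([0-9a-f])[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}\z/i;
    my ($t_hi, $t_lo, $version) = ($1, $2, hex $3);

    my %info = (version => $version);
    if ($version == 7) {
        # The first group holds the top 32 timestamp bits, the second the low 16.
        $info{time_ms} = hex($t_hi) * 65536 + hex($t_lo);
    }
    return \%info;
}

for my $candidate ('019d38f6-3e9a-765c-ae1c-1cfeb0c30000', 'not-a-uuid') {
    my $info = sketch_inspect($candidate);
    if (!$info) {
        print "$candidate: invalid\n";
        next;
    }
    print "$candidate: v$info->{version}";
    print ", $info->{time_ms} ms since the epoch" if defined $info->{time_ms};
    print "\n";
}
```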

Sekhmet: ULIDs for When You Need Sortable and Compact

A ULID is 26 characters of Crockford base32 encoding: 10 characters of millisecond timestamp followed by 16 characters of randomness. They sort lexicographically by time, they are URL-safe, and they are shorter than UUIDs.

use Sekhmet qw(:all);

my $ulid = ulid();
# 06EKHXHYKAT25K0YQJHN6A6YJR

Monotonic mode

If you generate multiple ULIDs within the same millisecond, the random component increments to guarantee strict ordering:

my $first  = ulid_monotonic();
my $second = ulid_monotonic();
my $third  = ulid_monotonic();
# $first lt $second lt $third -- guaranteed, even within the same millisecond
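
The increment-on-collision idea can be sketched in a few lines. This is an illustration of the technique, not Sekhmet's code: `sketch_monotonic_generator` and its optional clock argument are hypothetical, `rand()` replaces the CSPRNG, and the result is returned as 16 raw bytes rather than Crockford base32.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Returns a generator of 16-byte identifiers: a 6-byte millisecond timestamp
# followed by 10 random bytes. Within one millisecond the random part is
# incremented instead of re-drawn, so successive values strictly increase.
sub sketch_monotonic_generator {
    my ($clock) = @_;    # optional clock override, handy for testing
    $clock //= sub { int(time() * 1000) };
    my ($last_ms, @rand) = (-1);

    return sub {
        my $ms = $clock->();
        if ($ms > $last_ms) {
            $last_ms = $ms;
            @rand = map { int rand 256 } 1 .. 10;    # not a CSPRNG
        }
        else {
            # Same (or earlier) millisecond: add 1 to the 80-bit big-endian
            # value, carrying through any trailing 0xff bytes.
            my $i = $#rand;
            $rand[$i--] = 0 while $i >= 0 && $rand[$i] == 255;
            die "random component overflowed within one millisecond\n" if $i < 0;
            $rand[$i]++;
        }
        my $hi = int($last_ms / 4294967296);
        return pack 'n N C10', $hi, $last_ms - $hi * 4294967296, @rand;
    };
}

my $next_id = sketch_monotonic_generator();
print unpack('H32', $next_id->()), "\n" for 1 .. 3;
```

Treating a clock that steps backwards the same way as a repeated millisecond is a design choice in this sketch: ordering is preserved at the cost of a slightly stale timestamp.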

Time extraction

The timestamp is baked into the ULID. Extract it without a database lookup:

my $ulid = ulid();
my $epoch = ulid_time($ulid);       # 1774777155.226
my $ms    = ulid_time_ms($ulid);    # 1774777155226

UUID interoperability

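ULIDs and UUID v7 share the same basic structure -- a 48-bit millisecond timestamp followed by random fill -- so a ULID can be re-expressed as a UUID with its timestamp and sort order intact. One caveat: a v7 UUID reserves 6 bits for version and variant, leaving 74 random bits against a ULID's 80, so the random tail cannot always carry over bit for bit: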

my $ulid = ulid();
my $uuid = ulid_to_uuid($ulid);    # standard UUID v7 string
# Useful when your API expects UUIDs but you generate ULIDs internally

When to use Sekhmet vs Horus

Use Sekhmet (ulid()) when you want compact, sortable, human-friendly identifiers for log entries, event streams, anything displayed in a UI. Use Horus (uuid_v7()) when you need standard UUID format for compatibility with systems that expect 36-character hyphenated strings. Use Horus (uuid_v4()) when you need pure randomness with no timestamp leakage.

Apophis: Content-Addressable Storage

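Apophis answers the question: "Have I seen this content before?" It hashes content with UUID v5 to produce a deterministic identifier, then stores the content in a sharded directory tree. Same content always maps to the same path. Different content maps to a different path in practice: a v5 identifier carries 122 bits of a SHA-1 digest, so accidental collisions are vanishingly unlikely, though SHA-1 is not collision-resistant against deliberately crafted inputs.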

use Apophis;

my $store = Apophis->new(
    namespace => 'my-app',
    store_dir => '/var/data/cas',
);

my $id = $store->store(\"Hello, world!");
# 3e856e0f-c7ac-569e-827b-40df723c326f

my $id2 = $store->store(\"Hello, world!");
# 3e856e0f-c7ac-569e-827b-40df723c326f  -- same content, same ID

How storage works

Content is stored in a two-level hex-sharded directory tree derived from the UUID. The first four hex characters become two directory levels:

/var/data/cas/
  3e/85/3e856e0f-c7ac-569e-827b-40df723c326f

This gives 65,536 possible leaf directories (256 x 256) -- enough to keep any single directory from growing too large, even with millions of files.
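
The mapping from identifier to path is simple enough to sketch. `sketch_shard_path` is a hypothetical helper that reproduces the layout shown above. How Apophis builds paths internally is not stated here.

```perl
use strict;
use warnings;
use File::Spec;

# Map an identifier to <root>/<hex chars 0-1>/<hex chars 2-3>/<identifier>.
sub sketch_shard_path {
    my ($root, $id) = @_;
    die "unexpected identifier: $id\n" unless $id =~ /\A[0-9a-f]{8}-/;
    return File::Spec->catfile($root, substr($id, 0, 2), substr($id, 2, 2), $id);
}

print sketch_shard_path('/var/data/cas', '3e856e0f-c7ac-569e-827b-40df723c326f'), "\n";
```

Two hex characters per level give 256 entries at each level, which is where the 256 x 256 figure comes from.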

Writes are atomic: content goes to a temporary file first, then is renamed into place. A crash mid-write leaves no partial file at the final path.
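
The write-then-rename pattern looks roughly like this in pure Perl with core modules. It is a sketch of the general technique, not Apophis's C code. `sketch_atomic_write` is a hypothetical name, and the sketch does not fsync, so it protects against partial files rather than against power loss.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Temp ();

# Write $$content_ref to $path so that readers see either no file or the
# complete file, never a half-written one.
sub sketch_atomic_write {
    my ($path, $content_ref) = @_;
    my $dir = dirname($path);
    make_path($dir);

    # The temp file lives in the target directory so that rename() stays on
    # one filesystem, which is what makes it atomic on POSIX systems.
    my $tmp = File::Temp->new(DIR => $dir, UNLINK => 0);
    binmode $tmp;
    print {$tmp} $$content_ref or die "write failed: $!";
    close $tmp                 or die "close failed: $!";

    unless (rename $tmp->filename, $path) {
        my $err = $!;
        unlink $tmp->filename;    # do not leave the temp file behind
        die "rename failed: $err";
    }
    return $path;
}

my $demo_dir = File::Temp::tempdir(CLEANUP => 1);
my $written  = sketch_atomic_write("$demo_dir/3e/85/example-blob", \"Hello, world!");
print "wrote $written\n";
```

A crash before the rename can still leave a stray temporary file in the shard directory. A real store would need some way to clean those up.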

Identification without storage

Sometimes you just want the identifier:

my $id      = $store->identify(\"some content");          # UUID, no write
my $file_id = $store->identify_file("/path/to/big.iso");  # streams in 64KB chunks

File identification is streaming -- a 10GB file uses the same memory as a 10KB file.
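
Constant memory use falls out of feeding the hash incrementally. The sketch below shows the idea with the core Digest::SHA module: only one chunk is held in memory at a time, and the final digest is shaped into a v5 UUID. `sketch_identify_file` and its arguments are hypothetical. The 64KB default mirrors the chunk size mentioned above.

```perl
use strict;
use warnings;
use Digest::SHA ();
use File::Temp ();

# Hash a file of any size in fixed-size chunks, then shape the digest
# as a version 5 UUID under the given namespace.
sub sketch_identify_file {
    my ($namespace, $file, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;
    (my $ns_hex = $namespace) =~ tr/-//d;

    my $sha = Digest::SHA->new(1);        # algorithm 1 = SHA-1
    $sha->add(pack 'H32', $ns_hex);       # namespace bytes go first

    open my $fh, '<:raw', $file or die "cannot open $file: $!";
    while (read $fh, my $buf, $chunk_size) {
        $sha->add($buf);                  # only one chunk in memory at a time
    }
    close $fh;

    my @b = unpack 'C16', $sha->digest;
    $b[6] = ($b[6] & 0x0f) | 0x50;        # version nibble = 5
    $b[8] = ($b[8] & 0x3f) | 0x80;        # variant bits = 10
    my $hex = unpack 'H32', pack('C16', @b);
    return join '-', unpack('A8 A4 A4 A4 A12', $hex);
}

# Demo on a throwaway 200,000-byte file.
my $sample = File::Temp->new;
print {$sample} 'x' x 200_000;
close $sample;
print sketch_identify_file('6ba7b810-9dad-11d1-80b4-00c04fd430c8', $sample->filename), "\n";
```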

Metadata

Attach arbitrary metadata as a sidecar:

my $id = $store->store(\"image data", meta => {
    mime_type     => 'image/png',
    original_name => 'photo.png',
    uploaded_by   => 'user-42',
});

my $meta = $store->meta($id);
# { mime_type => 'image/png', original_name => 'photo.png', ... }

Namespace isolation

The namespace parameter creates a separate UUID v5 namespace. The same content under different namespaces produces different identifiers:

my $uploads = Apophis->new(namespace => 'uploads');
my $cache   = Apophis->new(namespace => 'cache');

$uploads->identify(\"data") ne $cache->identify(\"data");  # different IDs

This lets you run multiple independent stores without collision.
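
One plausible way to get this isolation is to derive a namespace UUID from the namespace string first, then hash content under it. The sketch below does exactly that. The derivation scheme, the choice of root namespace, and the names `sketch_uuid_v5` and `sketch_store_identifier` are all assumptions for illustration. Apophis may derive its namespaces differently.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1);

# Stand-in root: the RFC 9562 DNS namespace UUID (an arbitrary choice here).
my $ROOT_NS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

# Version 5 UUID: SHA-1 over namespace bytes + name, truncated to 16 bytes.
sub sketch_uuid_v5 {
    my ($namespace, $name) = @_;
    (my $ns_hex = $namespace) =~ tr/-//d;
    my @b = unpack 'C16', sha1(pack('H32', $ns_hex) . $name);
    $b[6] = ($b[6] & 0x0f) | 0x50;
    $b[8] = ($b[8] & 0x3f) | 0x80;
    my $hex = unpack 'H32', pack('C16', @b);
    return join '-', unpack('A8 A4 A4 A4 A12', $hex);
}

# Returns an identify-style closure bound to one namespace string.
sub sketch_store_identifier {
    my ($namespace_name) = @_;
    my $ns_uuid = sketch_uuid_v5($ROOT_NS, $namespace_name);
    return sub { sketch_uuid_v5($ns_uuid, ${ $_[0] }) };    # takes a scalar ref
}

my $uploads_id = sketch_store_identifier('uploads');
my $cache_id   = sketch_store_identifier('cache');

print 'uploads: ', $uploads_id->(\'data'), "\n";
print 'cache:   ', $cache_id->(\'data'), "\n";
```

Because the namespace bytes are hashed ahead of the content, changing the namespace changes every identifier derived under it.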

Verification

Content-addressable storage has a built-in integrity check: re-hash the content and compare to the filename.

if ($store->verify($id)) {
    # content matches its identifier -- no corruption
}

How They Fit Together

Horus (Foundation)
  |-- UUID v1-v8, NIL, MAX
  |-- C headers reused by downstream XS modules
  |
  |--- Apophis (Content-addressable storage)
  |      Uses UUID v5 for deterministic content identification
  |
  |--- Sekhmet (ULID generation)
         Uses Horus C primitives for Crockford base32, CSPRNG, timestamps

Horus is the foundation. Its C headers are standalone -- no Perl types, no interpreter context. Apophis and Sekhmet include them at compile time via Horus->include_dir().

A practical example using all three:

use Horus qw(:all);
use Apophis;
use Sekhmet qw(:all);
use JSON::PP qw(encode_json);   # provides encode_json() used below

# Event tracking system
my $event_id  = ulid_monotonic();              # sortable event identifier
my $session   = uuid_v4();                     # random session token
my $store     = Apophis->new(namespace => 'events', store_dir => '/var/events');

# Store event payload, get content-addressable ID
my $payload = encode_json({ action => 'click', target => 'button-1' });
my $content_id = $store->store(\$payload, meta => {
    event_id   => $event_id,
    session_id => $session,
    timestamp  => ulid_time($event_id),
});

# Later: retrieve by content hash
my $data = $store->fetch($content_id);

# Or find when the event happened from the ULID
my $when = ulid_time($event_id);

Each module handles one concern well. Horus generates identifiers. Sekhmet adds time-sortable, compact identifiers. Apophis maps content to identifiers and manages storage. No module tries to do what another already does.

Performance

All three modules use custom ops on Perl 5.14+ to eliminate subroutine dispatch overhead. The hot paths are pure C with no Perl API calls.

Getting Started

cpanm Horus
cpanm Sekhmet
cpanm Apophis

All three are on CPAN under the Artistic License 2.0.
