LNATION for LNATION

Horus, Apophis, and Sekhmet: A C/XS Identifier Stack for Perl

Three modules, one goal: fast, correct identifier generation in Perl with zero runtime dependencies. Horus generates UUIDs. Sekhmet generates ULIDs. Apophis uses deterministic UUIDs to build content-addressable storage. All three are implemented in C, exposed through XS, and designed to work together.

Horus: Every UUID Version, One Module

Horus implements all UUID versions defined in RFC 9562 -- v1 through v8, plus NIL and MAX. The entire engine is C, compiled once, called millions of times per second.

use Horus qw(:all);

my $random   = uuid_v4();                         # 122 random bits
my $sortable = uuid_v7();                         # timestamp + random, sortable
my $fixed    = uuid_v5(UUID_NS_DNS, "example.com"); # deterministic, always the same

Why multiple versions matter

Each version solves a different problem:

v4 is the workhorse. 122 bits of randomness, no coordination needed. Use it for session tokens, request IDs, anything where uniqueness is all you need.

v7 embeds a millisecond timestamp in the high bits, making UUIDs lexicographically sortable. Database indexes love this -- new rows append instead of scattering across B-tree pages. Horus guarantees monotonic ordering within the same millisecond.

my @ids = map { uuid_v7() } 1..3;
# 019d38f6-3e9a-765c-ae1c-1cfeb0c30000
# 019d38f6-3e9a-765c-ae1c-1cfeb0c40000
# 019d38f6-3e9a-765c-ae1c-1cfeb0c50000
# String sort == chronological sort
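The layout that makes v7 sortable can be sketched in a few lines of pure Perl. This is an illustration of the RFC 9562 field layout, not Horus's C code: the sketch_uuid_v7 name is hypothetical, rand() stands in for a real CSPRNG, a 64-bit perl is assumed, and the same-millisecond monotonic counter that Horus provides is deliberately left out.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: lays out a v7 UUID as 48 bits of Unix milliseconds
# followed by 10 bytes of random data carrying the version/variant bits.
sub sketch_uuid_v7 {
    my ($ms) = @_;
    $ms //= int(time() * 1000);               # caller may pass a fixed timestamp

    my @rand = map { int rand 256 } 1 .. 10;  # NOT cryptographically secure
    $rand[0] = ($rand[0] & 0x0f) | 0x70;      # high nibble = version 7
    $rand[2] = ($rand[2] & 0x3f) | 0x80;      # top two bits = variant 10

    my $hex = sprintf('%012x', $ms) . join '', map { sprintf '%02x', $_ } @rand;
    return join '-', unpack 'A8 A4 A4 A4 A12', $hex;
}

print sketch_uuid_v7(), "\n";
```

Because the timestamp occupies the leading hex digits, comparing two such strings with `lt` compares their creation times first, which is what keeps index inserts append-mostly.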

v5 is deterministic. Given the same namespace and name, it always produces the same UUID. This is the foundation Apophis builds on: the same content, the same identifier, every time.

my $first  = uuid_v5(UUID_NS_DNS, "example.com");
my $second = uuid_v5(UUID_NS_DNS, "example.com");
# $first eq $second -- always ($a and $b are avoided: they are Perl's sort variables)
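To make the determinism concrete, here is a minimal pure-Perl sketch of how RFC 9562 derives a version 5 UUID: SHA-1 over the namespace bytes followed by the name, truncated to 16 bytes, with the version and variant bits overwritten. The sketch_uuid_v5 name is hypothetical and this is not how Horus is implemented internally; it only illustrates the algorithm using the core Digest::SHA module.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1);

# The DNS namespace UUID defined by RFC 9562.
my $NS_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

# Hypothetical helper: namespace UUID string + name (bytes) -> v5 UUID string.
sub sketch_uuid_v5 {
    my ($namespace, $name) = @_;
    (my $ns_hex = $namespace) =~ tr/-//d;

    # Hash namespace bytes + name, keep the first 16 of SHA-1's 20 bytes.
    my @bytes = unpack 'C16', sha1(pack('H32', $ns_hex) . $name);
    $bytes[6] = ($bytes[6] & 0x0f) | 0x50;   # high nibble = version 5
    $bytes[8] = ($bytes[8] & 0x3f) | 0x80;   # top two bits = variant 10

    return sprintf '%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-'
                 . '%02x%02x%02x%02x%02x%02x', @bytes;
}

# No randomness and no clock are involved, so repeated calls agree.
print "deterministic\n"
    if sketch_uuid_v5($NS_DNS, 'example.com') eq sketch_uuid_v5($NS_DNS, 'example.com');
```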

Ten output formats

Every generator accepts a format parameter, and uuid_convert() translates an existing UUID between formats. Five of the ten are shown here:

my $id = '550e8400-e29b-41d4-a716-446655440000';  # fixed v4 UUID so the outputs below are reproducible

uuid_convert($id, UUID_FMT_STR);      # 550e8400-e29b-41d4-a716-446655440000
uuid_convert($id, UUID_FMT_HEX);      # 550e8400e29b41d4a716446655440000
uuid_convert($id, UUID_FMT_BRACES);   # {550e8400-e29b-41d4-a716-446655440000}
uuid_convert($id, UUID_FMT_URN);      # urn:uuid:550e8400-e29b-41d4-a716-446655440000
uuid_convert($id, UUID_FMT_BASE64);   # VQ6EAOKbQdSnFkRmVUQAAA
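All of these formats are views of the same 16 bytes. As a rough illustration (not Horus's implementation), the hypothetical sketch_uuid_formats helper below rebuilds the five formats shown above from a hyphenated string, using only the core MIME::Base64 module. The unpadded base64 form is an assumption taken from the sample output.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: returns a hash ref of string renderings of one UUID.
sub sketch_uuid_formats {
    my ($uuid) = @_;
    (my $hex = lc $uuid) =~ tr/-//d;               # 32 hex digits
    my $str = join '-', unpack 'A8 A4 A4 A4 A12', $hex;

    my $b64 = encode_base64(pack('H32', $hex), ''); # 16 raw bytes, no newline
    $b64 =~ s/=+\z//;                               # drop '=' padding (assumed)

    my %formats = (
        str    => $str,
        hex    => $hex,
        braces => "{$str}",
        urn    => "urn:uuid:$str",
        base64 => $b64,
    );
    return \%formats;
}

my $f = sketch_uuid_formats('550e8400-e29b-41d4-a716-446655440000');
print "$_: $f->{$_}\n" for qw(str hex braces urn base64);
```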

Bulk generation

When you need thousands of IDs, crossing the Perl/C boundary once beats crossing it thousands of times:

my @ids = uuid_v4_bulk(10_000);  # single call, 10k UUIDs back

Utilities

uuid_validate($string);          # is this a valid UUID?
uuid_version($string);           # which version? (1-8)
uuid_time($v7_uuid);             # extract epoch seconds from v7/v6
uuid_cmp($a, $b);                # sort comparison (-1, 0, 1)
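These utilities are cheap because everything they report is encoded in fixed positions of the string. The pure-Perl sketch below shows the idea for validation, version detection, and v7 timestamp extraction. The sketch_* names are hypothetical, the validation only checks the hyphenated shape, and Horus's own checks may be stricter.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string has the 8-4-4-4-12 hex shape.
sub sketch_uuid_validate {
    my ($s) = @_;
    return $s =~ /\A[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\z/i ? 1 : 0;
}

# Hypothetical helper: the version is the first hex digit of the third group.
sub sketch_uuid_version {
    my ($uuid) = @_;
    return hex substr($uuid, 14, 1);
}

# Hypothetical helper: a v7 UUID starts with 48 bits of Unix milliseconds.
# The value is split into 16 + 32 bits so hex() never sees more than 32 bits.
sub sketch_uuid_v7_ms {
    my ($uuid) = @_;
    (my $hex = $uuid) =~ tr/-//d;
    return hex(substr($hex, 0, 4)) * 4_294_967_296 + hex(substr($hex, 4, 8));
}

printf "%d\n", sketch_uuid_v7_ms('019d38f6-3e9a-765c-ae1c-1cfeb0c30000');  # prints 1774777155226
```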

Sekhmet: ULIDs for When You Need Sortable and Compact

A ULID is 26 characters of Crockford base32 encoding: 10 characters of millisecond timestamp followed by 16 characters of randomness. They sort lexicographically by time, they are URL-safe, and they are shorter than UUIDs.

use Sekhmet qw(:all);

my $ulid = ulid();
# 06EKHXHYKAT25K0YQJHN6A6YJR

Monotonic mode

If you generate multiple ULIDs within the same millisecond, the random component increments to guarantee strict ordering:

my $first  = ulid_monotonic();
my $second = ulid_monotonic();
my $third  = ulid_monotonic();
# $first lt $second lt $third -- guaranteed, even within the same millisecond
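The increment-on-collision rule can be sketched in pure Perl as follows. This follows the public ULID specification's encoding (10 timestamp characters, then 16 random characters, Crockford base32) and makes no claim about Sekhmet's internal bit layout, so its strings are not expected to match Sekhmet's output byte for byte. The sketch_ulid_monotonic name is hypothetical, rand() stands in for a CSPRNG, and a 64-bit perl is assumed.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my @ALPHABET = split //, '0123456789ABCDEFGHJKMNPQRSTVWXYZ';  # Crockford base32

# State kept between calls: last timestamp and last random part,
# the latter stored as 16 base-32 digits (16 * 5 = 80 bits).
my ($last_ms, @last_rand) = (-1);

# Hypothetical helper: returns a 26-character ULID-style string.
sub sketch_ulid_monotonic {
    my ($ms) = @_;
    $ms //= int(time() * 1000);

    if ($ms <= $last_ms) {
        # Same millisecond (or the clock stepped back): keep the timestamp
        # and add 1 to the random part, carrying from the last digit.
        $ms = $last_ms;
        my $i = $#last_rand;
        while ($i >= 0 && ++$last_rand[$i] > 31) {
            $last_rand[$i--] = 0;
        }
        die "random component overflowed within one millisecond\n" if $i < 0;
    }
    else {
        $last_ms   = $ms;
        @last_rand = map { int rand 32 } 1 .. 16;   # NOT cryptographically secure
    }

    # Encode the 48-bit timestamp as 10 base-32 digits, most significant first.
    my @time_digits;
    my $t = $ms;
    for (1 .. 10) {
        unshift @time_digits, $t % 32;
        $t = int($t / 32);
    }
    return join '', map { $ALPHABET[$_] } @time_digits, @last_rand;
}
```

Calling sketch_ulid_monotonic() repeatedly in a tight loop yields strings that compare strictly ascending under `lt`, because the Crockford alphabet is already in ASCII order.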

Time extraction

The timestamp is baked into the ULID. Extract it without a database lookup:

my $ulid = ulid();
my $epoch = ulid_time($ulid);       # 1774777155.226
my $ms    = ulid_time_ms($ulid);    # 1774777155226

UUID interoperability

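ULIDs and UUID v7 share the same basic layout -- a 48-bit millisecond timestamp followed by random bits -- although UUID v7 also reserves 6 bits for its version and variant fields. Convert a ULID to a UUID with its timestamp intact: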

my $ulid = ulid();
my $uuid = ulid_to_uuid($ulid);    # standard UUID v7 string
# Useful when your API expects UUIDs but you generate ULIDs internally

When to use Sekhmet vs Horus

Use Sekhmet (ulid()) when you want compact, sortable, human-friendly identifiers for log entries, event streams, or anything displayed in a UI. Use Horus (uuid_v7()) when you need the standard UUID format for compatibility with systems that expect 36-character hyphenated strings. Use Horus (uuid_v4()) when you need pure randomness with no timestamp leakage.

Apophis: Content-Addressable Storage

Apophis answers the question: "Have I seen this content before?" It hashes content with UUID v5 to produce a deterministic identifier, then stores the content in a sharded directory tree. Same content always maps to the same path. Different content is extremely unlikely to collide by accident, though not impossible, since UUID v5 is a truncated SHA-1 hash.

use Apophis;

my $store = Apophis->new(
    namespace => 'my-app',
    store_dir => '/var/data/cas',
);

my $id = $store->store(\"Hello, world!");
# 3e856e0f-c7ac-569e-827b-40df723c326f

my $id2 = $store->store(\"Hello, world!");
# 3e856e0f-c7ac-569e-827b-40df723c326f  -- same content, same ID

How storage works

Content is stored in a two-level hex-sharded directory tree derived from the UUID. The first four hex characters become two directory levels:

/var/data/cas/
  3e/85/3e856e0f-c7ac-569e-827b-40df723c326f

This gives 65,536 possible directories -- enough to keep any single directory from growing too large, even with millions of files.

Writes are atomic: content goes to a temporary file first, then is renamed into place. A crash mid-write leaves no partial files.
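The sharding and write-then-rename steps described above can be sketched with core modules only. This is an illustration under stated assumptions, not Apophis's code: the sketch_content_id and sketch_store names are hypothetical, the namespace is passed as 32 hex digits of an existing namespace UUID (how Apophis turns a string such as 'my-app' into a namespace is not covered here), and the temporary file is created inside the target directory so that rename() stays on one filesystem.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempfile);

# Hypothetical helper: UUID v5 of the content under a namespace (32 hex digits).
sub sketch_content_id {
    my ($ns_hex, $content_ref) = @_;
    my $hex = substr(sha1_hex(pack('H32', $ns_hex) . $$content_ref), 0, 32);
    substr($hex, 12, 1) = '5';                                       # version 5
    substr($hex, 16, 1) = sprintf '%x', (hex(substr($hex, 16, 1)) & 0x3) | 0x8;  # variant 10
    return join '-', unpack 'A8 A4 A4 A4 A12', $hex;
}

# Hypothetical helper: store content under root/xx/yy/<uuid>, atomically.
sub sketch_store {
    my ($root, $ns_hex, $content_ref) = @_;
    my $id   = sketch_content_id($ns_hex, $content_ref);
    my $dir  = File::Spec->catdir($root, substr($id, 0, 2), substr($id, 2, 2));
    my $path = File::Spec->catfile($dir, $id);
    return $id if -e $path;                  # same content was stored before

    make_path($dir);
    my ($fh, $tmp) = tempfile(DIR => $dir);  # write to a temp file first
    binmode $fh;
    print {$fh} $$content_ref or die "write failed: $!";
    close $fh                 or die "close failed: $!";
    rename $tmp, $path        or die "rename failed: $!";  # atomic on POSIX filesystems
    return $id;
}
```

A call such as sketch_store('/tmp/cas', '6ba7b8109dad11d180b400c04fd430c8', \"Hello") returns the identifier and leaves either the complete file or nothing at the final path, since readers never see the temporary name.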

Identification without storage

Sometimes you just want the identifier:

my $id      = $store->identify(\"some content");          # UUID, no write
my $file_id = $store->identify_file("/path/to/big.iso");  # streams in 64KB chunks

File identification is streaming -- a 10GB file uses the same memory as a 10KB file.
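Constant memory use follows from SHA-1 being an incremental hash: the namespace bytes are fed in first, then the file one chunk at a time. A minimal sketch with the core Digest::SHA module is shown below. The sketch_identify_file name is hypothetical, the namespace is assumed to be given as 32 hex digits, and the 64KB chunk size simply mirrors the figure quoted above.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: UUID v5 of a file's bytes, read 64KB at a time.
sub sketch_identify_file {
    my ($ns_hex, $path) = @_;

    my $sha = Digest::SHA->new(1);              # 1 selects SHA-1
    $sha->add(pack 'H32', $ns_hex);             # namespace bytes come first

    open my $fh, '<:raw', $path or die "cannot open $path: $!";
    while (read($fh, my $chunk, 65_536)) {      # only one chunk is held in memory
        $sha->add($chunk);
    }
    close $fh;

    my $hex = substr($sha->hexdigest, 0, 32);   # first 16 bytes of the digest
    substr($hex, 12, 1) = '5';                                       # version 5
    substr($hex, 16, 1) = sprintf '%x', (hex(substr($hex, 16, 1)) & 0x3) | 0x8;  # variant 10
    return join '-', unpack 'A8 A4 A4 A4 A12', $hex;
}
```

Feeding the data in pieces yields the same digest as hashing it in one call, so the streamed identifier matches the one computed from an in-memory copy of the same bytes.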

Metadata

Attach arbitrary metadata as a sidecar:

my $id = $store->store(\"image data", meta => {
    mime_type     => 'image/png',
    original_name => 'photo.png',
    uploaded_by   => 'user-42',
});

my $meta = $store->meta($id);
# { mime_type => 'image/png', original_name => 'photo.png', ... }

Namespace isolation

The namespace parameter creates a separate UUID v5 namespace. The same content under different namespaces produces different identifiers:

my $uploads = Apophis->new(namespace => 'uploads');
my $cache   = Apophis->new(namespace => 'cache');

$uploads->identify(\"data") ne $cache->identify(\"data");  # different IDs

This lets you run multiple independent stores without collision.

Verification

Content-addressable storage has a built-in integrity check: re-hash the content and compare to the filename.

if ($store->verify($id)) {
    # content matches its identifier -- no corruption
}
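The check itself amounts to recomputing the identifier from the bytes on disk and comparing it with the file's name. The sketch below shows that idea with core modules. The sketch_verify name is hypothetical, the namespace is assumed to be given as 32 hex digits, and the file is assumed to be named after its UUID as in the directory listing earlier; Apophis's own verify() takes an identifier rather than a path.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Basename qw(basename);

# Hypothetical helper: true if the file's content still hashes to its name.
sub sketch_verify {
    my ($ns_hex, $path) = @_;

    my $sha = Digest::SHA->new(1);      # SHA-1, as used by UUID v5
    $sha->add(pack 'H32', $ns_hex);     # namespace bytes first
    $sha->addfile($path, 'b');          # then the file, read in binary mode

    my $hex = substr($sha->hexdigest, 0, 32);
    substr($hex, 12, 1) = '5';                                       # version 5
    substr($hex, 16, 1) = sprintf '%x', (hex(substr($hex, 16, 1)) & 0x3) | 0x8;  # variant 10
    my $id = join '-', unpack 'A8 A4 A4 A4 A12', $hex;

    return $id eq basename($path) ? 1 : 0;
}
```

Any single changed byte alters the recomputed identifier, so a mismatch signals corruption or tampering without needing a separate checksum file.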

How They Fit Together

Horus (Foundation)
  |-- UUID v1-v8, NIL, MAX
  |-- C headers reused by downstream XS modules
  |
  |--- Apophis (Content-addressable storage)
  |      Uses UUID v5 for deterministic content identification
  |
  |--- Sekhmet (ULID generation)
         Uses Horus C primitives for Crockford base32, CSPRNG, timestamps

Horus is the foundation. Its C headers are standalone -- no Perl types, no interpreter context. Apophis and Sekhmet include them at compile time via Horus->include_dir().

A practical example using all three:

use Horus qw(:all);
use Apophis;
use Sekhmet qw(:all);
use JSON::PP qw(encode_json);   # core module; supplies encode_json used below

# Event tracking system
my $event_id  = ulid_monotonic();              # sortable event identifier
my $session   = uuid_v4();                     # random session token
my $store     = Apophis->new(namespace => 'events', store_dir => '/var/events');

# Store event payload, get content-addressable ID
my $payload = encode_json({ action => 'click', target => 'button-1' });
my $content_id = $store->store(\$payload, meta => {
    event_id   => $event_id,
    session_id => $session,
    timestamp  => ulid_time($event_id),
});

# Later: retrieve by content hash
my $data = $store->fetch($content_id);

# Or find when the event happened from the ULID
my $when = ulid_time($event_id);

Each module handles one concern well. Horus generates identifiers. Sekhmet adds compact, time-sortable identifiers. Apophis maps content to identifiers and manages storage. No module tries to do what another already does.

Performance

All three modules use custom ops on Perl 5.14+ to eliminate subroutine dispatch overhead. The hot paths are pure C with no Perl API calls.

Getting Started

cpanm Horus
cpanm Sekhmet
cpanm Apophis

All three are on CPAN under the Artistic License 2.0.
