Ready, Set, Compile... you slow Camel

"Perl is slow."

I've heard this for years, well, since I started. You probably have too. And honestly? For a long time, I didn't have a great rebuttal. Sure, Perl's fast enough for most things; it's well known for text processing, glue code, and quick scripts. But when it came to object-heavy code, the critics had a point.

We will begin by looking at the myth of Perl being slow a little more deeply. Here's a benchmark between Perl and Python using CPU seconds, a fair comparison that measures actual work done:

=== PERL (5 CPU seconds per test) ===
Integer arithmetic             1,072,800/s
Float arithmetic                 398,800/s
String concat                    970,000/s
Array push/iterate               368,800/s
Hash insert/iterate               84,800/s
Function calls                   244,000/s
Regex match                   12,921,200/s

=== PYTHON (5 CPU seconds per test) ===
Integer arithmetic               777,200/s
Float arithmetic                 512,400/s
String concat                    627,200/s
List append/iterate              476,400/s
Dict insert/iterate              140,600/s
Function calls                   331,400/s
Regex match                   10,543,713/s

The results are more nuanced than the "Perl is slow" narrative suggests:

Operation            Winner   Margin
Integer arithmetic   Perl     1.4x faster
Float arithmetic     Python   1.3x faster
String concat        Perl     1.5x faster
Array/List ops       Python   1.3x faster
Hash/Dict ops        Python   1.7x faster
Function calls       Python   1.4x faster
Regex match          Perl     1.2x faster

Perl wins at what it's always been good at: integers, strings, and regex. Python wins at floats, data structures, and function calls, areas where I am told Python 3.x has seen heavy optimisation work.
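
Numbers like these typically come from the core Benchmark module's negative-count mode, which runs each snippet for a minimum of CPU seconds rather than wall time. A minimal sketch (the test bodies here are my assumptions, not the original harness):

use Benchmark qw(cmpthese);

# a negative count means "run each sub for at least 5 CPU seconds",
# yielding per-second rates measured in CPU time, not wall time
cmpthese(-5, {
    'Integer arithmetic' => sub { my $x = 0; $x += $_ for 1 .. 1_000 },
    'String concat'      => sub { my $s = ''; $s .= 'x' for 1 .. 1_000 },
    'Regex match'        => sub { 'abc123' =~ /\d+/ for 1 .. 1_000 },
});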

But here's the thing that surprised me: neither language is dramatically faster than the other for basic operations. The differences are measured in fractions, not orders of magnitude. So where does the "Perl is slow" reputation actually come from?

Object-oriented code. Let's run that same fair comparison:

=== Object creation + 2 method calls (5M iterations) ===
Perl bless:    4,155,178/s  (1.20 sec)
Python class:  5,781,818/s  (0.86 sec)
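
For reference, "Perl bless" here means a hand-rolled class along these lines; this is my assumption about the shape of the harness, not its verbatim code:

package Cat;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, age => $args{age} }, $class;
}

# plain accessors: one sub call, one hash lookup, nothing else
sub name { $_[0]{name} }
sub age  { $_[0]{age} }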

Okay, this is not so bad. Python's only about 40% faster. But now let's look at what people actually use these days: Moo.

=== Object creation + 2 method calls (5M iterations) ===
Perl bless:    4,176,222/s  (1.20 sec)
Moo class:       843,708/s  (5.93 sec)
Python class:  5,590,052/s  (0.89 sec)

Wait, what? Moo is 6.6x slower than Python. And it's 5x slower than plain bless.

Layer this with actual business logic and you have, I'd guess, where "Perl is slow" actually comes from. It all comes down to layers. Every Moo accessor isn't just a hash lookup; it's a stack of subroutine calls, each adding overhead:

$obj->name
  └─> accessor method (generated sub)
        └─> type constraint check
              └─> coercion check
                    └─> trigger check
                          └─> lazy builder check
                                └─> finally: $self->{name}

Each of those subroutine calls means:

  • Push arguments onto the stack (~3-5 ops)
  • Create a new scope (localizing variables)
  • Execute the check (even if it's just "return true")
  • Pop the stack and return (~3-5 ops)

Even a "simple" Moo accessor with just a type constraint involves roughly 30+ additional operations compared to a plain hash access. The type constraint alone might call:

  1. has_type_constraint() - is there a constraint?
  2. type_constraint() - get the constraint object
  3. check() - call the constraint's check method
  4. The actual validation logic

Multiply that by two accessors per iteration, five million iterations, and suddenly you're spending 5 seconds instead of 1.
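
To make the layering concrete, here's the sort of Moo class whose accessors pay that tax; a minimal sketch using Type::Tiny constraints from Types::Standard, my illustration rather than the benchmark's code:

package Cat;

use Moo;
use Types::Standard qw(Str Int);

# each accessor below is a generated Perl sub that routes through
# Moo's generic machinery; the Str/Int checks run via Type::Tiny
has name => ( is => 'ro', isa => Str, required => 1 );
has age  => ( is => 'rw', isa => Int, default  => 0 );

1;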

This is the trade-off Moo makes: flexibility and safety in exchange for speed. And for most applications it's the right trade-off; even Python does this with Pydantic, which roughly halves the performance of plain Python objects.

I've spent more time than I'd care to admit thinking about this question. Not in a "let's rewrite everything in Rust" kind of way, but genuinely asking: what would it take to make Perl's object system competitive with languages people actually consider fast?

The answer, it turns out, was inside a CPAN module first released on 'Mon Jul 24 11:23:25 2000'. It was highlighted to me by another author's work; I am indeed one of the three people who not only read their blog posts but also often find themselves lost within their interesting coding patterns.

So this is the story of the four modules that changed how I think about Perl performance: Marlin, Meow, Inline and XS::JIT. They're different tools with different philosophies, but together they represent something I never quite expected to see: Perl object access that's actually faster than Python's equivalent. Not "almost as fast." Faster.

The Marlin story: A faster fish in the Moose family

If you've written any serious Perl in the last fifteen years, you've probably used Moose. Or Moo. Or Mouse. The naming convention is... well, it's a thing we do now.

Marlin fits right into that tradition, and the name's not accidental. Marlins are among the fastest fish in the ocean. That's the pitch: everything you love about Moose-style OO, but with speed as a first-class concern.

Toby Inkster released Marlin in late 2025, and it caught my attention; as I said before, many of his projects do. I'd previously attempted to write a fast OO system myself (Meow), but was struggling to even compete with Moo despite being entirely XS. Partly ability, partly still learning, mostly not doing the work at the right compile-time stage.

With my interest piqued, I installed Marlin, played with the API, and ran some benchmarks:

Benchmark: 1,000,000 iterations
            Rate   Meow    Moo Marlin  Mouse
Meow    606,061/s     --    -1%   -45%   -47%
Moo     609,756/s     1%     --   -45%   -46%
Marlin 1,098,901/s    81%    80%     --    -3%
Mouse  1,136,364/s    87%    86%     3%     --

Marlin performed well. Meow, at that point, was... not impressive. But I liked Marlin's API and, understanding my own implementation's limitations, I was satisfied enough with the speed to build my Claude modules around it, while also expecting its performance would likely improve.

A few weeks later (and a lot happened in between), on a Friday evening I randomly decided to revisit my Meow directory. Could I fix some of the flaws based on my recent learnings? I managed to, and saw a huge improvement in my own benchmarks. So I updated to the latest Marlin for a fair comparison.

I was expecting Meow to be faster now, since it does much less with its minimalist approach. But what I actually found surprised me:

Benchmark: 10,000,000 iterations
            Rate    Moo  Mouse   Meow Marlin
Moo     868,810/s     --   -47%   -60%   -81%
Mouse  1,626,016/s    87%     --   -26%   -64%
Meow   2,183,406/s   151%    34%     --   -52%
Marlin 4,504,505/s   418%   177%   106%     --

Marlin had gotten dramatically faster, over 4x improvement from the version I'd first tested. Toby had clearly been busy. And while Meow had improved too, it was still only half of Marlin's speed.

This was the moment that changed everything. I needed to understand how Marlin achieved this. What was I missing?

Just-in-time optimisation

As I mentioned, I read other people's code. I read Toby's posts on Marlin and how he'd studied Mouse's optimisation strategy: only validate when you absolutely need to. But when I started tracing through Marlin's actual implementation, something clicked.

The key insight is in Marlin::Attribute::install_accessors. Here's what happens when Marlin sets up a reader:

if ( $type eq 'reader' and !$me->has_simple_reader and $me->xs_reader ) {
    $me->{_implementation}{$me->{$type}} = 'CXSR';  # Class::XSReader
}
elsif ( HAS_CXSA and $me->has_simple_reader ) {
    # Use Class::XSAccessor for simple cases
    Class::XSAccessor->import( class => $me->{package}, ... );
}

Marlin makes a compile-time decision: what kind of accessor does this attribute actually need?

  • Simple getter (no default, no lazy, no type check on read)? → Use Class::XSAccessor, which is pure XS and blindingly fast
  • Getter with lazy default or type coercion? → Use Class::XSReader, which handles the complexity in optimised C
  • Something exotic (auto_deref, custom behaviour)? → Fall back to generated Perl

This is the magic. Most Moo-style accessors go through a generic code path that handles every possible feature, even features you're not using. Marlin analyses your attribute definition at compile time and generates the minimal accessor that satisfies your requirements.
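
For the simple-getter case, the installed accessor amounts to ordinary Class::XSAccessor usage; a sketch of the idea rather than Marlin's actual generated code:

package Cat;

# a pure-XS getter: no generated Perl sub sits between the method
# call and the hash lookup
use Class::XSAccessor
    getters => { name => 'name' };   # method name => hash key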

Consider a read-only attribute with a type but no default:

# Moo accessor path:
$obj->name
   check if lazy builder needed     # nope, but we still check
   check if default needed          # nope, but we still check  
   check if coercion needed         # nope, but we still check
   finally: $self->{name}

# Marlin accessor (Class::XSAccessor):
$obj->name
   $self->{name}                    # that's it. One XS call.

The type constraint? Marlin validates it in the constructor, not the getter. Once an object is built, reading an attribute is just a hash lookup: no validation, no subroutine calls, no stack manipulation.
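
The principle is easy to sketch in plain Perl: validate once in the constructor so that reads stay bare. A hand-rolled illustration:

package Cat;

sub new {
    my ($class, %args) = @_;
    # pay for the type check once, at construction time
    die "name must be a string"
        unless defined $args{name} && !ref $args{name};
    return bless { name => $args{name} }, $class;
}

# the reader never re-validates: just a hash lookup
sub name { $_[0]{name} }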

This is why Marlin went from 1.1M ops/sec to 4.5M ops/sec between versions. Toby wasn't just optimising code. He was eliminating entire categories of runtime work by moving decisions to compile time.

The same principle applies to the constructor via Class::XSConstructor. Instead of a chain of Perl subroutines processing each attribute, Marlin generates a specialised XS constructor that knows exactly which attributes need defaults, which need type checking, and in what order to process them.

It's JIT compilation, but done at module load time rather than runtime. By the time your code calls ->new or ->name, all the decisions have been made. All that's left is the actual work.

This was my revelation: the path to fast Perl OO isn't avoiding features, it's avoiding runtime feature detection. Know what you need at compile time, generate optimised code for exactly that, and get out of the way.

Now the question became: could I apply this same principle to Meow? It was already set up to build a simple hash that represented the object, so I had what I needed, but I wanted to do this in a backwards-compatible way.

Enter Inline::C

Armed with the understanding of why Marlin was fast, I had a hypothesis: if I could generate XS accessors at compile time tailored to each attribute's needs, Meow could achieve the same performance.

I needed a way to generate custom C code and then execute it. For Perl, that tool was written by Ingy döt Net back in 2000: Inline::C.

The idea was simple: when Meow sees ro name => Str, it should generate C code for an accessor that:

  1. Takes the object
  2. Returns the value at the slot index for name
  3. That's it. No method dispatch, no type checking, no feature checking.

I didn't want to just break everything, so I leaned into the Moose catalog and added a make_immutable phase. When this is called, it compiles the C code needed to generate an optimised XS package, and this is fed into Inline::C. The first run would compile; subsequent runs would use the cached .so.
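
Stripped of Meow's internals, the mechanism looks roughly like this; Inline's documented bind interface takes a string of C and compiles it at runtime (the accessor body here is a simplified assumption, not Meow's generated code):

use Inline;

my $c_code = <<'END_C';
SV* get_name (SV* self) {
    AV *av = (AV*)SvRV(self);        /* the object is an array ref */
    SV **slot = av_fetch(av, 0, 0);  /* slot 0 = name */
    return slot ? newSVsv(*slot) : newSV(0);
}
END_C

# first call compiles and caches a .so; later runs just load it
Inline->bind( C => $c_code );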

And it worked. I had to switch the benchmark to CPU seconds to get a fair result. I've also included a Cor test here, which, unlike Marlin and Meow, does no type checking.

Benchmark: running Cor, Marlin, Meow for at least 5 CPU seconds...
       Cor:  5 wallclock secs ( 5.13 usr +  0.02 sys =  5.15 CPU) @ 2,886,788/s
    Marlin:  5 wallclock secs ( 5.01 usr +  0.11 sys =  5.12 CPU) @ 4,523,074/s
      Meow:  5 wallclock secs ( 5.16 usr +  0.02 sys =  5.18 CPU) @ 4,558,344/s

As you can see, Meow had caught Marlin. Actually, it was slightly faster, 4.56M vs 4.52M ops/sec, but that's to be expected, as Meow does a lot less than Marlin.

But my bottleneck was now Inline::C itself, and, well, nobody wants to write C/XS, let alone concatenate it:

  1. Startup overhead: First compilation was slow, several seconds for a complex class
  2. Dependencies: Inline::C pulls in Parse::RecDescent, adding complexity to the dependency chain
  3. Build process: It generates a full Makefile.PL and runs the ExtUtils::MakeMaker machinery
  4. Caching: The caching mechanism is designed for "write once" scripts, not dynamic code generation

For a proof of concept, Inline::C was perfect. But for a production module, I needed something leaner. That's when I started looking at what Inline::C actually does under the hood, and wondering how much of it I could strip away.

Under the hood: XS::JIT as the secret weapon

Inline::C proved the concept worked, but it came with baggage. Every compile spawned a full Makefile.PL build process. Dependencies bloated the install. And the caching system, designed for write-once scripts, wasn't ideal for dynamic code generation.

So I started picking apart what Inline::C actually does:

  1. Parse C code to find function signatures
  2. Generate XS wrapper code
  3. Generate a Makefile.PL
  4. Run perl Makefile.PL && make
  5. Load the resulting .so

And yes, this happens even when you use Inline->bind(C => ...) instead of the use form. The bind method just defers compilation to runtime rather than compile time. It doesn't change what gets done, only when. You still get the full Parse::RecDescent parsing, the xsubpp processing, the MakeMaker dance. The only difference is whether it happens at use time or when bind is called.

Most of this was unnecessary for my use case. I didn't need function parsing, I already knew what functions I was generating. I didn't need XS wrappers, I was writing XS-native code directly. And I definitely didn't need the Makefile.PL dance.

XS::JIT strips all of that away. It's a single-purpose tool: take C code, compile it, load it, install the functions. No parsing. No xsubpp. No make. Direct compiler invocation.

Here's what the C API looks like:

#include "xs_jit.h"

/* Function mapping - where to install what */
XS_JIT_Func funcs[] = {
    { "Cat::new",  "cat_new",  0, 1 },  /* target, source, varargs, xs_native */
    { "Cat::name", "cat_name", 0, 1 },
    { "Cat::age",  "cat_age",  0, 1 },
};

/* Compile and install in one call */
int ok = xs_jit_compile(aTHX_
    c_code,           /* Your generated C code */
    "Meow::JIT::Cat", /* Unique name for caching */
    funcs,            /* Function mapping array */
    3,                /* Number of functions */
    "_CACHED_XS",     /* Cache directory */
    0                 /* Don't force recompile */
);

That's it. One function call. The first time it runs, XS::JIT:

  1. Generates a boot function that registers all the XS functions
  2. Compiles directly with the system compiler (cc -shared -fPIC ...)
  3. Loads the .so with DynaLoader
  4. Installs each function into its target namespace

Subsequent runs? It hashes the C code, finds the cached .so, and just loads it. The compile step vanishes entirely.
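
Conceptually, the caching is content-addressed; something like this sketch, where compile_to_so is a hypothetical helper standing in for the direct compiler invocation, and SHA-256 is my assumption for the hash:

use Digest::SHA qw(sha256_hex);
require DynaLoader;

# identical C source => identical key => the compile step is skipped
my $key = sha256_hex($c_code);
my $so  = "_CACHED_XS/$key.so";

compile_to_so($c_code, $so) unless -e $so;      # hypothetical helper
my $libref = DynaLoader::dl_load_file($so, 0);  # load the cached .so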

The key insight is the is_xs_native flag. When set, XS::JIT creates a simple alias: no wrapper function, no stack manipulation, no overhead. Your C function is the XS function:

XS_EUPXS(cat_name) {
    dVAR; dXSARGS;
    SV *self = ST(0);
    AV *av = (AV*)SvRV(self);
    SV **slot = av_fetch(av, 0, 0);  /* slot 0 = name */
    ST(0) = slot ? *slot : &PL_sv_undef;
    XSRETURN(1);
}

No wrapper. No intermediate calls.

This is exactly what Meow needed. During make_immutable, it:

  1. Analyses each attribute's requirements (type constraint? coercion? trigger?)
  2. Generates minimal XS accessor code for each one
  3. Generates an optimised XS constructor that handles all attributes in one pass
  4. Hands the code to XS::JIT for compilation
  5. Gets back installed functions ready to call

The entire JIT compilation happens once per class, at module load time. By the time your code runs, everything is native XS.

Comparing the approaches

Here's what actually happens at runtime for each framework:

Moo accessor call:

$obj->name
  → Perl method dispatch
    → Generated Perl subroutine
      → has_type_constraint() check
        → type_constraint() fetch
          → check() call
            → finally: $self->{name}

Stack frames: 4-6. Operations: ~30.

Marlin accessor call (Class::XSAccessor):

$obj->name
  → Perl method dispatch
    → XS accessor
      → $self->{name}

Stack frames: 1. Operations: ~5.

Note: Toby also has some slot magic in play here.

Meow accessor call (XS::JIT):

$obj->name
  → Perl method dispatch
    → XS accessor
      → $self->[SLOT_INDEX]

Stack frames: 1. Operations: ~4 (arrays are slightly faster than hashes).

The benchmark results

With XS::JIT in place, here's where Meow now landed:

Benchmark: running Cor, Marlin, Meow for at least 5 CPU seconds... (Marlin and Meow have type constraint checking)
       Cor:  5 wallclock secs ( 5.13 usr +  0.02 sys =  5.15 CPU) @ 2886788.16/s (n=14866959)
    Marlin:  5 wallclock secs ( 5.01 usr +  0.11 sys =  5.12 CPU) @ 4523074.80/s (n=23158143)
      Meow:  5 wallclock secs ( 5.16 usr + -0.01 sys =  5.15 CPU) @ 5196218.06/s (n=26760523)
Benchmark: running Marlin, Meow, Moo, Mouse for at least 5 CPU seconds...
    Marlin:  5 wallclock secs ( 5.22 usr +  0.13 sys =  5.35 CPU) @ 4814728.04/s (n=25758795)
      Meow:  5 wallclock secs ( 5.23 usr +  0.01 sys =  5.24 CPU) @ 5203329.96/s (n=27265449)
       Moo:  4 wallclock secs ( 5.28 usr +  0.00 sys =  5.28 CPU) @ 860649.81/s (n=4544231)
     Mouse:  6 wallclock secs ( 5.29 usr +  0.01 sys =  5.30 CPU) @ 1603849.25/s (n=8500401)
            Rate    Moo  Mouse Marlin   Meow
Moo     860650/s     --   -46%   -82%   -83%
Mouse  1603849/s    86%     --   -67%   -69%
Marlin 4814728/s   459%   200%     --    -7%
Meow   5203330/s   505%   224%     8%     --

I must be honest: around this time I had not implemented the full benchmarks against Perl and Python. I didn't fully understand the difference, so I suspected I was hitting limitations of my own hardware (it was late, or early in the morning). Anyway, I kept pushing and ran a benchmark where I accessed the slot directly as an array reference. This got me excited:

Meow (direct) 7,172,481/s     778%    347%     50%     14%

I was seeing a huge improvement. I spent some time making an API that was a little nicer by exposing constants as slot indexes:

{
    package Cat;
    use Meow;
    ro name => Str;
    ro age => Int;
    make_immutable;  # Creates $Cat::NAME, $Cat::AGE
}

# Direct slot access
my $name = $cat->[$Cat::NAME];

I was now on par with Python, but I wanted more. There had to be a way to get that array access without the ugly syntax.

So I dug deeper into Perl's internals and found the missing magic: cv_set_call_checker and custom ops.

The entersub bypass: Custom ops

Here's what normally happens when you call a method in Perl:

name($cat)
  → OP_ENTERSUB (the "call function" op)
    → Push arguments onto stack
    → Look up the CV (code value)
    → Set up new stack frame
    → Execute the XS function
    → Pop stack frame
    → Return

Even for our minimal XS accessor, there's overhead: the entersub op itself, the stack frame setup, the CV lookup. What if we could eliminate all of that?

Perl provides a hook called cv_set_call_checker. It allows you to register a "call checker" function that runs at compile time when the parser sees a call to your subroutine. The checker can inspect the op tree and, crucially, replace it with something else entirely.

Here's what Meow does:

static void _register_inline_accessor(pTHX_ CV *cv, IV slot_index, int is_ro) {
    SV *ckobj = newSViv(slot_index);  /* Store slot index for later */
    cv_set_call_checker_flags(cv, S_ck_meow_get, ckobj, 0);
}

When the checker sees name($cat), it:

  1. Extracts the $cat argument from the op tree
  2. Frees the entire entersub operation
  3. Creates a new custom op with the slot index baked in
  4. Returns that instead

The custom op is trivially simple:

static OP *S_pp_meow_get(pTHX) {
    dSP;
    SV *self = TOPs;
    PADOFFSET slot_index = PL_op->op_targ;  /* Baked into the op */

    SV **ary = AvARRAY((AV*)SvRV(self));
    SETs(ary[slot_index] ? ary[slot_index] : &PL_sv_undef);

    return NORMAL;
}

That's the entire accessor. No function call. No stack frame. No CV lookup. The slot index is embedded directly in the op structure. The Perl runloop executes this op directly; it's as close to $cat->[$NAME] as you can get while still looking like name($cat).

This is the same technique that builtin::true and builtin::false use in Perl 5.36+. It's also how List::Util::first can be optimised when given a simple block.
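
You can watch this class of rewrite happen with B::Concise; on Perl 5.36+, something like the following shows a const op where an entersub would otherwise appear:

perl -MO=Concise -E 'use builtin qw(true); my $x = true'
# builtin::true's call checker replaces the entersub at compile
# time, so the listing contains a constant instead of a call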

The final benchmark

With custom ops in place via import_accessors, here's how the Perl OO frameworks compare:

Benchmark: running Marlin, Meow, Meow (direct), Meow (op), Moo, Mouse for at least 5 CPU seconds...
    Marlin:  6 wallclock secs ( 5.09 usr +  0.11 sys =  5.20 CPU) @ 4766685.58/s (n=24786765)
      Meow:  5 wallclock secs ( 5.29 usr +  0.01 sys =  5.30 CPU) @ 6289606.79/s (n=33334916)
Meow (direct):  5 wallclock secs ( 5.32 usr +  0.01 sys =  5.33 CPU) @ 7172480.86/s (n=38229323)
 Meow (op):  5 wallclock secs ( 5.16 usr +  0.01 sys =  5.17 CPU) @ 7394453.19/s (n=38229323)
       Moo:  4 wallclock secs ( 5.44 usr +  0.02 sys =  5.46 CPU) @ 816865.93/s (n=4460088)
     Mouse:  4 wallclock secs ( 5.18 usr +  0.01 sys =  5.19 CPU) @ 1605727.55/s (n=8333726)
                   Rate      Moo   Mouse  Marlin    Meow Meow (direct) Meow (op)
Moo            816866/s       --    -49%    -83%    -87%          -89%      -89%
Mouse         1605728/s      97%      --    -66%    -74%          -78%      -78%
Marlin        4766686/s     484%    197%      --    -24%          -34%      -36%
Meow          6289607/s     670%    292%     32%      --          -12%      -15%
Meow (direct) 7172481/s     778%    347%     50%     14%            --       -3%
Meow (op)     7394453/s     805%    361%     55%     18%            3%        --

Now let's test that directly against Python:

============================================================
Python Direct Benchmark (slots + property accessors)
============================================================
Python version: 3.9.6 (default, Dec  2 2025, 07:27:58)
[Clang 17.0.0 (clang-1700.6.3.2)]
Iterations: 5,000,000
Runs: 5
------------------------------------------------------------
Run 1: 0.649s (7,704,306/s)
Run 2: 0.647s (7,733,902/s)
Run 3: 0.646s (7,736,307/s)
Run 4: 0.648s (7,720,909/s)
Run 5: 0.649s (7,702,520/s)
------------------------------------------------------------
Median rate: 7,720,909/s
============================================================
============================================================
Perl/Meow Benchmark Comparison
============================================================
Perl version: 5.042000
Iterations: 5000000
Runs: 5
------------------------------------------------------------
Inline Op (one($foo)):
  Run 1: 0.638s (7,841,811/s)
  Run 2: 0.629s (7,954,031/s)
  Run 3: 0.631s (7,929,850/s)
  Run 4: 0.631s (7,926,316/s)
  Run 5: 0.633s (7,901,675/s)
  Median: 7,926,316/s
============================================================
Summary:
------------------------------------------------------------
  Inline Op:    7,926,316/s
============================================================

Conclusion: Why JIT might be the right approach

Looking back at this journey, a pattern emerges. The fastest code isn't the cleverest code. It's the code that does the least work at runtime.

Moo is slow because it makes decisions at runtime that could be made at compile time. Every accessor call checks for features that aren't being used. Every type check goes through layers of indirection that exist to support edge cases.

Marlin proved that you could have Moo's features without Moo's overhead by making smart choices at compile time. If an accessor doesn't need lazy building, don't generate code that checks for lazy building.

Meow pushed this further: if you're going to generate code at compile time anyway, why not generate exactly the code you need? Not a generic accessor that handles many cases, but a specific accessor for this specific attribute on this specific class.

And XS::JIT made that practical. Without a lightweight JIT compiler, dynamic XS generation would require shipping a C toolchain with every module, or adding multi-megabyte dependencies. XS::JIT strips the problem down to its essence: take C code, compile it, load it.

The result is object access that competes with, and sometimes beats, languages that have had decades of optimisation work. Not because Perl's interpreter got faster, but because we stopped asking it to do unnecessary work.

Is this approach right for every project? No. Most applications don't need 7 million object accesses per second.

But for the times when performance matters (hot loops, high-frequency trading, real-time systems), it's good to know the ceiling isn't as low as we thought. Perl can be fast. We just needed to get out of its way.


The modules discussed in this post: Marlin, Meow, Inline::C and XS::JIT.
