DEV Community

Yoshihiro Hasegawa


One Stack Machine, Two Runtimes: Cross-Validating Scala and Perl with a Shared JSON IR

This follows up on Mainframes, Monads, and Stack Machines.
Short version of that post: we model computation as a small stack machine whose
programs are assembled in Dhall, executed in Scala on the JVM, and orchestrated as
Spring Batch steps under Pekko actor supervision.
This post is about what happens when you want to verify that the computation
is reproducible outside the JVM.


At some point in every polyglot project, you ask yourself: how do I know the Scala
implementation and the $OTHER_LANGUAGE implementation actually agree?

The usual answers are either "write integration tests" (but what format do they
exchange data in?) or "generate the other language's code" (but then you're testing
a code generator, not the logic itself).

What we landed on for Siunertaq was neither. It turned out the cleanest solution
was to define a shared intermediate representation in JSON and write independent
implementations on both sides that consume it. The Perl side ended up being a single
.pm file, 80 lines, no CPAN dependencies.

Here is how it came together.


🎯 What We're Cross-Validating

The core of Siunertaq is a small stack machine. A Program is a Vector[Instr]
where Instr is a sealed enum:

enum Instr derives CanEqual:
  case PushScalar(n: Int)
  case PushVec3(x: Int, y: Int, z: Int)
  case AddScalar
  case AddVec3
  case MulScalar
  case DotVec3

ProgramEval.exec runs the program on a JVM stack and returns
Either[String, Value]. The Value is either a ScalarValue(n: Int) or a
Vec3Value(x, y, z).
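If you are coming in cold, here is a minimal sketch of what a reference evaluator for this enum can look like. It is not the project's ProgramEval. Three choices are assumptions made for illustration: Value is modelled as an enum, the result is the top of the stack, and stack underflow or type mismatches are reported as Left.

```scala
// Illustrative sketch, not Siunertaq's ProgramEval. Assumptions: Value is an
// enum, the result is the top of the stack, and errors become Left messages.
enum Instr derives CanEqual:
  case PushScalar(n: Int)
  case PushVec3(x: Int, y: Int, z: Int)
  case AddScalar
  case AddVec3
  case MulScalar
  case DotVec3

enum Value derives CanEqual:
  case ScalarValue(n: Int)
  case Vec3Value(x: Int, y: Int, z: Int)

object EvalSketch:
  import Instr.*
  import Value.*

  // The stack is a List whose head is the top. For binary ops the head is the
  // right operand, matching the Perl side's `my ($r, $l) = (pop, pop)`.
  private def step(stack: List[Value], instr: Instr): Either[String, List[Value]] =
    (instr, stack) match
      case (PushScalar(n), _)     => Right(ScalarValue(n) :: stack)
      case (PushVec3(x, y, z), _) => Right(Vec3Value(x, y, z) :: stack)
      case (AddScalar, ScalarValue(r) :: ScalarValue(l) :: rest) =>
        Right(ScalarValue(l + r) :: rest)
      case (MulScalar, ScalarValue(r) :: ScalarValue(l) :: rest) =>
        Right(ScalarValue(l * r) :: rest)
      case (AddVec3, Vec3Value(x2, y2, z2) :: Vec3Value(x1, y1, z1) :: rest) =>
        Right(Vec3Value(x1 + x2, y1 + y2, z1 + z2) :: rest)
      case (DotVec3, Vec3Value(x2, y2, z2) :: Vec3Value(x1, y1, z1) :: rest) =>
        Right(ScalarValue(x1 * x2 + y1 * y2 + z1 * z2) :: rest)
      case (op, _) => Left(s"$op: stack underflow or type mismatch (stack = $stack)")

  // Fold over an initially empty stack; once a step fails, its Left is
  // carried through unchanged to the end.
  def exec(program: Vector[Instr]): Either[String, Value] =
    program
      .foldLeft[Either[String, List[Value]]](Right(Nil))((acc, instr) => acc.flatMap(step(_, instr)))
      .flatMap(_.headOption.toRight("empty stack at end of program"))
```

With this sketch, Vector(PushVec3(1, 2, 3), PushVec3(4, 5, 6), DotVec3) evaluates to Right(ScalarValue(32)), because 1*4 + 2*5 + 3*6 = 32.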

Because this stack machine is what actually drives our Spring Batch steps (each
step receives a Program from Dhall, evaluates it, and stores the result in the
JobExecutionContext), getting it right matters. We wanted a second opinion from a
completely independent runtime.


โŒ The First Attempt: Generate Inline Perl

The obvious approach is to translate Program into Perl code directly:

// First attempt at toPerlScript
case Instr.PushScalar(n) => s"push @stack, $n;\n"
case Instr.AddScalar     => "{ my $r=pop @stack; my $l=pop @stack; push @stack, $l+$r; }\n"
// ...

This works. The generated script is self-contained, and perl script.pl gives the
right answer.

But the test we wrote to verify it was unsatisfying:

val code = PerlBridge.generatePerl(prog, printExpr)
code should include ("push @stack, 42;")
code should include ("$l+$r")

We're testing the textual output of a code generator. If we add AddVec3
and forget to handle it, the test fails with a confusing "string not found" error
rather than a "wrong result" error. More importantly, the generated code is only
inspected, never executed: nothing runs it to prove that the Perl and Scala
computations actually match.

There was also a structural problem: the generated Perl was different every time
we changed the generator, even when the semantics hadn't changed.
It was fragile to maintain.


💡 The Insight: JSON as the Shared IR

The codebase already had a JSON representation for stack instructions.
Specifically, ClassASTBridge reads JVM .class files and emits:

[{"PushScalar":{"n":5}}, {"PushScalar":{"n":3}}, {"AddScalar":{}}]

This JSON is what gets stored in the PostgreSQL instructions column
(via ForthRegistrar.registerStep). The format was already a stable contract.

The question was: why not use the same JSON for the Perl bridge?

Instead of generating Perl code that performs the computation, we would:

  1. Serialize the Program to that same JSON format.
  2. Send the JSON to Perl.
  3. Let Perl parse and execute it.

The generated .pl script becomes almost trivially thin:

use Siunertaq::StackMachine;
my $sm = Siunertaq::StackMachine->new();
$sm->execute_json('[{"PushScalar":{"n":5}},{"PushScalar":{"n":3}},{"AddScalar":{}}]');
$sm->print_scalar();

All the logic lives in StackMachine.pm. The script is just calling the library.


📦 Program.toJson: One Format, Three Consumers

We added toJson directly to object Program in the core module
(circe was already a dependency there):

import io.circe.Json
import io.circe.syntax.*   // for .asJson

object Program:
  def toJson(program: Program): Json =
    Json.fromValues(program.map {
      case Instr.PushScalar(n)     => Json.obj("PushScalar" -> Json.obj("n" -> n.asJson))
      case Instr.PushVec3(x, y, z) => Json.obj("PushVec3"  -> Json.obj(
                                        "x" -> x.asJson, "y" -> y.asJson, "z" -> z.asJson))
      case Instr.AddScalar         => Json.obj("AddScalar"  -> Json.obj())
      case Instr.AddVec3           => Json.obj("AddVec3"    -> Json.obj())
      case Instr.MulScalar         => Json.obj("MulScalar"  -> Json.obj())
      case Instr.DotVec3           => Json.obj("DotVec3"    -> Json.obj())
    })
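The format is simple enough to spell out by hand. As an illustration (not the project's code), this dependency-free sketch renders a program as the string that Program.toJson(program).noSpaces should produce. It assumes circe keeps object fields in insertion order. A hand-written string like this can also serve as a golden value in tests of the circe version.

```scala
// Illustrative, dependency-free rendering of the wire format. Assumption: it
// should match Program.toJson(program).noSpaces, i.e. circe keeps object
// fields in insertion order and emits no whitespace.
enum Instr derives CanEqual:
  case PushScalar(n: Int)
  case PushVec3(x: Int, y: Int, z: Int)
  case AddScalar
  case AddVec3
  case MulScalar
  case DotVec3

object WireFormatSketch:
  import Instr.*

  // One single-key object per instruction: {"<Name>":{<operands>}}
  private def encode(instr: Instr): String = instr match
    case PushScalar(n)     => s"""{"PushScalar":{"n":$n}}"""
    case PushVec3(x, y, z) => s"""{"PushVec3":{"x":$x,"y":$y,"z":$z}}"""
    // Operand-free instructions carry an empty object; an enum case's
    // toString is its name, e.g. {"AddScalar":{}}
    case op @ (AddScalar | AddVec3 | MulScalar | DotVec3) => s"""{"$op":{}}"""

  def toJsonString(program: Vector[Instr]): String =
    program.map(encode).mkString("[", ",", "]")
```

With this sketch, Vector(PushScalar(5), PushScalar(3), AddScalar) renders to exactly the ClassASTBridge example shown earlier. Listing the operand-free cases explicitly, instead of using a catch-all, keeps the match exhaustive, so the compiler flags any new Instr variant.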

The same format is now shared by three components:

ClassASTBridge.opcodeToInstr   (.class bytecode → JSON) ──┐
Program.toJson                 (Program → JSON)          ─┼──► PostgreSQL JSONB
                                                          │    ClickHouse analytics
StackMachine->execute_json     (Perl ← JSON)              ┘

PostgreSQL is the single source of truth for what ran. Perl proves the computation
is reproducible outside the JVM without any shared code.


๐Ÿช Siunertaq::StackMachine.pm

The Perl module lives at
modules/batch-bridge/src/main/resources/perl/Siunertaq/StackMachine.pm
and is bundled inside the batchBridge JAR as a classpath resource.

package Siunertaq::StackMachine;
use strict;
use warnings;
use JSON::PP;   # core module: no CPAN, no cpanm

sub new {
    my ($class, %opt) = @_;
    return bless {
        stack => [],
        out   => $opt{out} // \*STDOUT,  # swappable for tests
    }, $class;
}

sub execute_json {
    my ($self, $json_str) = @_;
    my $instrs = JSON::PP::decode_json($json_str);
    for my $instr (@$instrs) {
        if    (exists $instr->{PushScalar}) {
            push @{$self->{stack}}, $instr->{PushScalar}{n};
        }
        elsif (exists $instr->{PushVec3}) {
            my $v = $instr->{PushVec3};
            push @{$self->{stack}}, [$v->{x}, $v->{y}, $v->{z}];
        }
        elsif (exists $instr->{AddScalar}) { $self->_add_scalar() }
        elsif (exists $instr->{AddVec3})   { $self->_add_vec3()   }
        elsif (exists $instr->{MulScalar}) { $self->_mul_scalar() }
        elsif (exists $instr->{DotVec3})   { $self->_dot_vec3()   }
        else  { die "unknown instr: " . JSON::PP::encode_json($instr) . "\n" }
    }
}

sub print_scalar { print { $_[0]->{out} } $_[0]->{stack}[-1], "\n" }
sub print_vec3   {
    my $v = $_[0]->{stack}[-1];
    print { $_[0]->{out} } "$v->[0] $v->[1] $v->[2]\n";
}

sub _add_scalar {
    my $self = shift;
    my ($r, $l) = (pop @{$self->{stack}}, pop @{$self->{stack}});
    push @{$self->{stack}}, $l + $r;
}
# _add_vec3, _mul_scalar, _dot_vec3 follow the same pattern (omitted here)

1;  # a module loaded via use must return a true value

Three things worth noting:

JSON::PP is a core module. It has shipped with Perl since 5.14, so no installation
is needed, whether on ubuntu-latest or on Strawberry Perl. This was a deliberate
constraint: the Perl side should have zero external dependencies so it can run
anywhere perl runs.

out => $fh makes unit testing easy. You can redirect output to an in-memory
scalar ref without spawning a subprocess:

my $buf = '';
open my $fh, '>', \$buf;
my $sm = Siunertaq::StackMachine->new(out => $fh);
$sm->execute_json('[{"PushScalar":{"n":3}},{"PushScalar":{"n":4}},{"AddScalar":{}}]');
$sm->print_scalar();
# $buf eq "7\n"  ← no subprocess, no temp file

die on unknown instructions. If a new Instr variant is added on the Scala side
but not handled in StackMachine.pm, the Perl side fails loudly the first time it
sees one. No silent wrong answers.
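The same fail-loudly rule can be mirrored on the Scala side when JSON comes back in, for example from the PostgreSQL instructions column. The sketch below is hypothetical: decodeInstr and decodeProgram are illustrative names. It also skips the actual JSON parsing and assumes each array element has already been parsed into a Map[String, Map[String, Int]], which circe or any other JSON library could produce.

```scala
// Hypothetical decoder sketch: turn already-parsed JSON elements back into
// Instr values, failing loudly (Left) on unknown names or missing fields,
// in the spirit of StackMachine.pm's `die "unknown instr: ..."`.
// Assumption: each element was parsed into Map[String, Map[String, Int]].
enum Instr derives CanEqual:
  case PushScalar(n: Int)
  case PushVec3(x: Int, y: Int, z: Int)
  case AddScalar
  case AddVec3
  case MulScalar
  case DotVec3

object DecodeSketch:
  import Instr.*

  type Elem = Map[String, Map[String, Int]]

  def decodeInstr(elem: Elem): Either[String, Instr] =
    elem.toList match
      // Exactly one key per element: the instruction name.
      case List((name, fields)) =>
        def field(key: String): Either[String, Int] =
          fields.get(key).toRight(s"$name: missing field '$key'")
        name match
          case "PushScalar" => field("n").map(PushScalar(_))
          case "PushVec3" =>
            for (x <- field("x"); y <- field("y"); z <- field("z")) yield PushVec3(x, y, z)
          case "AddScalar" => Right(AddScalar)
          case "AddVec3"   => Right(AddVec3)
          case "MulScalar" => Right(MulScalar)
          case "DotVec3"   => Right(DotVec3)
          case other       => Left(s"unknown instr: $other")
      case _ => Left(s"expected exactly one key, got: ${elem.keys.mkString(", ")}")

  // Decode a whole program; the first bad element aborts with its error.
  def decodeProgram(elems: List[Elem]): Either[String, Vector[Instr]] =
    elems.foldLeft[Either[String, Vector[Instr]]](Right(Vector.empty)) { (acc, elem) =>
      for (prog <- acc; instr <- decodeInstr(elem)) yield prog :+ instr
    }
```

Returning Left instead of throwing keeps the failure in the same Either channel that ProgramEval.exec already uses.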


🔌 PerlBridge: the Scala Glue Layer

PerlBridge is an object in batch-bridge that manages the OS-specific perl
binary detection and the temp-directory lifecycle:

object PerlBridge:

  // OS branching: Strawberry Perl on Windows, system perl on Linux
  private lazy val perlBinary: Option[String] =
    if isWindows then
      List(raw"C:\Strawberry\perl\bin\perl.exe", ...)
        .find(p => Files.isExecutable(Paths.get(p)))
        .orElse(findInPath("perl.exe", "perl"))
    else
      List("/usr/bin/perl", "/usr/local/bin/perl")
        .find(p => Files.isExecutable(Paths.get(p)))
        .orElse(findInPath("perl"))

  private def toPerlScript(program: Program, printMethod: String): String =
    val json = Program.toJson(program).noSpaces
    s"""#!/usr/bin/perl
       |use strict; use warnings;
       |use FindBin; use lib "$$FindBin::Bin";
       |use Siunertaq::StackMachine;
       |my $$sm = Siunertaq::StackMachine->new();
       |$$sm->execute_json('${json.replace("'", "\\'")}');
       |$$sm->$printMethod();
       |""".stripMargin
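The perlBinary lookup above relies on a findInPath helper that the post doesn't show. Below is a minimal sketch of one possible implementation. It assumes the usual PATH semantics: directories are searched in order and the first executable match wins. The name is illustrative, and the explicit pathVar parameter is an addition so the function can be tested without touching the real environment.

```scala
import java.io.File
import java.nio.file.{Files, Paths}

object PathLookupSketch:
  // Hypothetical helper: search the directories of `pathVar` in order and,
  // within each, try every candidate name; return the first executable file.
  def findInPath(pathVar: String, names: String*): Option[String] =
    val dirs = pathVar.split(File.pathSeparator).toList.filter(_.nonEmpty)
    val hits =
      for
        dir       <- dirs.iterator
        name      <- names.iterator
        candidate =  Paths.get(dir, name)
        if Files.isRegularFile(candidate) && Files.isExecutable(candidate)
      yield candidate.toString
    hits.nextOption() // lazy: stops at the first hit
```

In PerlBridge this would be called along the lines of findInPath(sys.env.getOrElse("PATH", ""), "perl.exe", "perl").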

executePerl creates a temp directory, deploys StackMachine.pm from the JAR's
classpath resources, runs perl, then recursively deletes the directory:

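import cats.effect.IO
import java.nio.file.{Files, Path}
import java.util.Comparator
import scala.util.Using

def executePerl(script: String, perl: String, label: String): IO[Either[String, String]] =
  IO.blocking:
    val tmpDir     = Files.createTempDirectory("siunertaq-perl-")
    val scriptFile = tmpDir.resolve(s"$label.pl")
    val pmDir      = tmpDir.resolve("Siunertaq")
    try
      Files.writeString(scriptFile, script)
      Files.createDirectories(pmDir)
      Files.writeString(pmDir.resolve("StackMachine.pm"), stackMachinePm)
      // ... run perl, capture output
    finally
      // Files.walk holds open directory handles, so close the stream;
      // reverse order deletes children before their parent directories.
      Using.resource(Files.walk(tmpDir)) { paths =>
        paths.sorted(Comparator.reverseOrder[Path]()).forEach(p => Files.deleteIfExists(p): Unit)
      }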

use FindBin; use lib "$FindBin::Bin" resolves Siunertaq/StackMachine.pm
relative to the script file, so no PERL5LIB manipulation is needed.
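The executePerl snippet elides the step that actually runs perl and captures its output. One way to do that with the JDK's ProcessBuilder is sketched below. runCapture is a hypothetical name. Merging stderr into stdout, so that a die message from StackMachine.pm ends up in the Left, is a design choice in this sketch, not necessarily what PerlBridge does.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object RunSketch:
  // Hypothetical helper: run `cmd` and return its combined stdout+stderr.
  // Exit code 0 gives Right(output); anything else gives Left("exit N: output").
  def runCapture(cmd: Seq[String]): Either[String, String] =
    val proc = new ProcessBuilder(cmd*)
      .redirectErrorStream(true) // die messages land in the same stream
      .start()
    // Read all output before waiting, so a child with lots of output
    // cannot block on a full pipe buffer.
    val out  = new String(proc.getInputStream.readAllBytes(), UTF_8)
    val code = proc.waitFor()
    if code == 0 then Right(out) else Left(s"exit $code: $out")
```

Inside executePerl it would be called as runCapture(Seq(perl, scriptFile.toString)), before the finally block deletes the temp directory.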

The public API exposed for testing:

def isAvailable: Boolean                              // perlBinary.isDefined
def perlBin: String                                   // path or "perl"
def generatePerl(program: Program, method: String): String
def runViaPerl(program: Program): IO[Either[String, Value]]
def crossCheck(program: Program): IO[Either[String, Value]]  // = runViaPerl
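runViaPerl has to turn Perl's stdout back into a Value. The spec below lists a TypedParser typeclass for that, but its code isn't shown. The sketch here is a simpler, hypothetical stand-in that parses what print_scalar ("7\n") and print_vec3 ("1 2 3\n") emit. Value is redefined as an enum so the block compiles on its own.

```scala
// Hypothetical stand-in for the output-parsing layer: turns what
// print_scalar ("7\n") and print_vec3 ("1 2 3\n") write to stdout back
// into a Value. Value is redefined here so the block compiles on its own.
enum Value derives CanEqual:
  case ScalarValue(n: Int)
  case Vec3Value(x: Int, y: Int, z: Int)

object OutputParsing:
  // Split on whitespace and parse every token as an Int.
  private def ints(stdout: String): Either[String, List[Int]] =
    val tokens = stdout.trim.split("\\s+").toList.filter(_.nonEmpty)
    val parsed = tokens.map(_.toIntOption)
    if parsed.forall(_.isDefined) then Right(parsed.flatten)
    else Left(s"non-integer output: '${stdout.trim}'")

  def parseScalar(stdout: String): Either[String, Value] =
    ints(stdout).flatMap {
      case List(n) => Right(Value.ScalarValue(n))
      case other   => Left(s"expected 1 integer, got ${other.size}")
    }

  def parseVec3(stdout: String): Either[String, Value] =
    ints(stdout).flatMap {
      case List(x, y, z) => Right(Value.Vec3Value(x, y, z))
      case other         => Left(s"expected 3 integers, got ${other.size}")
    }
```

A caller in the style of runViaPerl would choose parseScalar or parseVec3 to match the print method it wrote into the script.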

✅ PerlBridgeSpec: 32 Tests, All Green

PerlBridgeSpec:
  ProgramLifter.liftProgram    5 / 5  ✓   (Lowering round-trips)
  ProgramLifter.liftTyped      3 / 3  ✓   (GADT dispatch)
  PerlBridge.generatePerl      7 / 7  ✓   (JSON-based assertions)
  TypedParser typeclass        8 / 8  ✓   (output parsing)
  PerlBridge.runViaPerl        5 / 5  ✓   (Perl subprocess)
  PerlBridge.crossCheck        4 / 4  ✓   (Scala == Perl)
  ─────────────────────────────────────
  Total                       32 / 32     1 second 826 milliseconds

The §3 tests that used to check code should include ("push @stack, 42;") now check
the JSON content:

it("PushScalar(42) → JSON contains \"PushScalar\" and 42"):
  val code = PerlBridge.generatePerl(Vector(Instr.PushScalar(42)), "print_scalar")
  code.should(include("PushScalar"))
  code.should(include("42"))

The §5/§6 integration tests run a real Perl subprocess and compare results:

it("3 + 4 = 7  (Scalar AddScalar)"):
  ensurePerl()
  val prog = Vector(Instr.PushScalar(3), Instr.PushScalar(4), Instr.AddScalar)
  PerlBridge.runViaPerl(prog).unsafeRunSync().shouldBe(Right(ScalarValue(7)))

it("Vec3 Dot: crossCheck == Scala result"):
  ensurePerl()
  val prog = Vector(Instr.PushVec3(1, 2, 3), Instr.PushVec3(4, 5, 6), Instr.DotVec3)
  PerlBridge.crossCheck(prog).unsafeRunSync().shouldBe(ProgramEval.exec(prog))
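Each crossCheck test above compares a single program. The same comparison extends naturally to a table of programs. The sketch below is a hypothetical harness: reference and candidate stand in for ProgramEval.exec and a synchronous wrapper around PerlBridge.crossCheck (for example, one calling unsafeRunSync()). Treating two failures as agreement is a deliberate choice here, because error messages from the two runtimes will not match word for word.

```scala
// Hypothetical harness: run every program through two evaluators and
// collect the programs on which they disagree.
final case class Mismatch[P, V](program: P, reference: Either[String, V], candidate: Either[String, V])

object CrossCheckSketch:
  def crossCheckAll[P, V](programs: Seq[P])(
      reference: P => Either[String, V],
      candidate: P => Either[String, V]
  ): List[Mismatch[P, V]] =
    programs.toList.flatMap { p =>
      val r = reference(p)
      val c = candidate(p)
      val agree = (r, c) match
        case (Right(a), Right(b)) => a == b
        case (Left(_), Left(_))   => true // both rejected; messages differ across runtimes
        case _                    => false
      if agree then None else Some(Mismatch(p, r, c))
    }
```

In the spec this could be driven by something like crossCheckAll(programs)(ProgramEval.exec, p => PerlBridge.crossCheck(p).unsafeRunSync()), asserting that the result is empty.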

perl is pre-installed on ubuntu-latest, so no extra setup step is needed.
The tests are gated behind RUN_PERL_CROSSCHECK=1 and skip cleanly when Perl
is unavailable (e.g., on a developer machine that hasn't installed it).
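The gating logic can be kept as a pure function, which makes the skip conditions themselves easy to test. Below is a hypothetical sketch of the decision behind ensurePerl(), assuming both conditions must hold. In ScalaTest, a Skip would typically become assume(...) or cancel(...), which report the test as canceled rather than failed.

```scala
// Hypothetical sketch of the decision behind ensurePerl(): run only when
// RUN_PERL_CROSSCHECK=1 and a perl binary was found; otherwise skip with a reason.
enum Gate derives CanEqual:
  case Run
  case Skip(reason: String)

object PerlGateSketch:
  def decide(env: Map[String, String], perlBinary: Option[String]): Gate =
    if !env.get("RUN_PERL_CROSSCHECK").contains("1") then
      Gate.Skip("RUN_PERL_CROSSCHECK is not set to 1")
    else if perlBinary.isEmpty then
      Gate.Skip("perl not found on this machine")
    else Gate.Run
```

A test would call decide(sys.env, perlBinary) and cancel itself with the reason on Skip.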


📦 CI: .pl and .pm as Artifacts

One side effect of this architecture is that the .pl scripts (JSON + two method
calls) and StackMachine.pm can be saved as CI artifacts for inspection:

env:
  RUN_PERL_CROSSCHECK: "1"
  PERL_SCRIPT_SAVE_DIR: ci-out/perl-scripts

# artifact structure after a successful run:
# ci-out/
#   perl-scripts/
#     Siunertaq/StackMachine.pm
#     runViaPerl.pl
#   classes/
#     core/io/siunertaq/expr/Program.class
#     ...

The .class files that ClassASTBridge reads and the .pl scripts that PerlBridge
generates land in the same artifact bundle, and both map onto the same JSON schema.
Reviewing them side-by-side in a failed CI run is genuinely useful.
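How PerlBridge consumes PERL_SCRIPT_SAVE_DIR isn't shown. Here is a minimal, hypothetical sketch that reproduces the artifact layout from the YAML comment above. When the variable is set, it writes the generated script and StackMachine.pm into that directory; otherwise it does nothing.

```scala
import java.nio.file.{Files, Path, Paths}

object ArtifactSketch:
  // Hypothetical: persist the generated script and StackMachine.pm when
  // PERL_SCRIPT_SAVE_DIR is set (non-empty); return the script's path.
  def saveIfRequested(
      env: Map[String, String],
      label: String,
      script: String,
      stackMachinePm: String
  ): Option[Path] =
    env.get("PERL_SCRIPT_SAVE_DIR").filter(_.nonEmpty).map { dir =>
      val root  = Paths.get(dir)
      val pmDir = root.resolve("Siunertaq")
      Files.createDirectories(pmDir) // also creates `root` if missing
      Files.writeString(pmDir.resolve("StackMachine.pm"), stackMachinePm)
      Files.writeString(root.resolve(s"$label.pl"), script) // returns the written path
    }
```

Calling it with sys.env and the label "runViaPerl" would yield the ci-out/perl-scripts/runViaPerl.pl and Siunertaq/StackMachine.pm entries listed above.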


🔮 What's Next: RubyBridge

The natural next step is Siunertaq::StackMachine.rb consuming the same JSON via
Ruby's built-in json library. The toPerlScript / toRubyScript generators would
then share a ScriptBackend trait:

trait ScriptBackend:
  def preamble(libDir: String): String
  def executeJsonCall(json: String): String
  def printMethod(ty: Ty): String

At that point, adding another language is a matter of implementing one trait and
writing an 80-line mirror of StackMachine.pm. The JSON schema stays fixed;
the per-language logic stays isolated.
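To make the trait concrete, here is one possible Perl backend plus a language-agnostic renderer. It is a sketch under two assumptions. First, Ty is not shown in the post, so it is modelled as a two-case enum. Second, printMethod returns the whole print statement rather than just a method name, so that renderScript needs no Perl-specific knowledge. Unlike toPerlScript, the quoting also escapes backslashes, since Perl collapses \\ to \ inside single quotes. The current JSON contains only integers, so toPerlScript does not need this yet.

```scala
// Sketch under assumptions: Ty is a two-case enum, and printMethod returns
// the whole print statement so renderScript stays language-agnostic.
// PerlBackend mirrors the lines toPerlScript emits.
enum Ty derives CanEqual:
  case Scalar, Vec3

trait ScriptBackend:
  def preamble(libDir: String): String
  def executeJsonCall(json: String): String
  def printMethod(ty: Ty): String

object PerlBackend extends ScriptBackend:
  def preamble(libDir: String): String =
    s"""#!/usr/bin/perl
       |use strict; use warnings;
       |use FindBin; use lib "$libDir";
       |use Siunertaq::StackMachine;
       |my $$sm = Siunertaq::StackMachine->new();
       |""".stripMargin

  // Perl single-quoted string: escape backslashes first, then single quotes.
  def executeJsonCall(json: String): String =
    val quoted = json.replace("\\", "\\\\").replace("'", "\\'")
    s"$$sm->execute_json('$quoted');\n"

  def printMethod(ty: Ty): String = ty match
    case Ty.Scalar => "$sm->print_scalar();\n"
    case Ty.Vec3   => "$sm->print_vec3();\n"

// Assemble a full script from any backend.
def renderScript(backend: ScriptBackend, libDir: String, json: String, ty: Ty): String =
  backend.preamble(libDir) + backend.executeJsonCall(json) + backend.printMethod(ty)
```

With this sketch, renderScript(PerlBackend, "$FindBin::Bin", json, Ty.Scalar) reproduces the layout of toPerlScript's output. A Ruby backend would be a second object implementing the same three methods.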


Code is at github.com/Yoshyhyrro/Siunertaq.
The PerlBridge and Siunertaq::StackMachine.pm live in modules/batch-bridge.
Issues and PRs are welcome, especially if you know a more idiomatic way to handle
the FindBin dance on Windows Strawberry Perl. 🙏
