
Posted on May 22

We just hit 100% TypeScript-Python rail parity in our agent payments SDK. Here's the testing strategy.

#ai #python #opensource #testing

mnemopay 1.1.0 hit PyPI yesterday with full Paystack + Lightning Network rail support. That closes a parity gap I've been chasing for months — TypeScript (@mnemopay/sdk) and Python (mnemopay) now ship the same six payment rails with the same interface.

Pick your language, pick your rail:

# Python
import os
from mnemopay.rails.paystack import PaystackRail
from mnemopay.rails.lightning import LightningRail

rail = PaystackRail(secret_key=os.environ["PAYSTACK_SECRET"])
result = rail.hold(amount=5000, currency="NGN", reference="agent-tx-42")

// TypeScript
import { PaystackRail, LightningRail } from "@mnemopay/sdk/rails";

const rail = new PaystackRail({ secretKey: process.env.PAYSTACK_SECRET });
const result = await rail.hold({ amount: 5000, currency: "NGN", reference: "agent-tx-42" });

Same shape. Same hold / capture / release / reverse / settle lifecycle. Same PaymentRailResult dataclass / type. The agent code on top doesn't care which rail or which language.

What I want to share is the testing strategy that caught the bugs. Cross-language parity sounds simple — copy the spec, write the code, ship. It is not simple. Three pitfalls bit me. Here's how to catch each one without a four-week debugging cycle.

Pitfall 1 — Inherited dataclass default-field placement

When I first ported PaystackHoldResult from TypeScript to Python, this looked fine:

from dataclasses import dataclass
from .types import PaymentRailResult

@dataclass
class PaystackHoldResult(PaymentRailResult):
    paystack_reference: str
    authorization_code: str | None = None

It blew up at import time:

TypeError: non-default argument 'paystack_reference' follows default argument

Python's @dataclass generates an __init__ whose parameters follow field order, inherited fields first, and a parameter without a default cannot follow one that has a default. The base class PaymentRailResult has default fields (ok: bool = True, error: str | None = None), so the subclass's non-default paystack_reference lands after them. TypeScript has no equivalent rule: object types are built from named properties rather than a positional constructor, so required and optional properties can be declared in any order.

The fix is to give the subclass fields defaults too, or to flatten the hierarchy:

@dataclass
class PaystackHoldResult(PaymentRailResult):
    paystack_reference: str = ""              # default, ordering ok
    authorization_code: str | None = None

Detection strategy: every rail's result class gets a one-line construction test in tests/test_rails.py:

def test_paystack_result_constructs():
    r = PaystackHoldResult(paystack_reference="ref_1")
    assert r.paystack_reference == "ref_1"
    assert r.ok is True

If the dataclass declaration is malformed, the module fails to import, so pytest reports a collection error before any actual rail logic runs. It catches the Python-specific issue at the cheapest possible boundary.

Pitfall 2 — Timing-safe HMAC comparison

The TypeScript PaystackRail.verifyWebhook() uses Node's crypto.timingSafeEqual. The naïve Python port did this:

import hmac

def verify_webhook(self, body: bytes, signature: str) -> bool:
    expected = hmac.new(self.secret_key.encode(), body, "sha512").hexdigest()
    return expected == signature   # ← timing-leak bug

That == is the bug. String comparison short-circuits on the first mismatched byte. A bad actor can guess the signature byte-by-byte by measuring how long the comparison takes. This is a textbook timing attack.

The fix is hmac.compare_digest:

import hmac

def verify_webhook(self, body: bytes, signature: str) -> bool:
    expected = hmac.new(self.secret_key.encode(), body, "sha512").hexdigest()
    return hmac.compare_digest(expected, signature)

Detection strategy: I added a tests/test_rails_security.py that asserts every webhook verifier in every rail uses hmac.compare_digest. Not by mocking, but by literally running inspect.getsource() on the function and checking the text for compare_digest. If anyone removes it in a future PR, the test fails:

import inspect

from mnemopay.rails.paystack import PaystackRail

def test_paystack_webhook_uses_timing_safe_compare():
    src = inspect.getsource(PaystackRail.verify_webhook)
    assert "compare_digest" in src, "must use hmac.compare_digest, not =="

Yes, it's a grep-on-source test, which feels hacky. It's also the only kind of test I know that still fails when someone "cleans up the comparison" back to ==, because a functional test passes either way. Sometimes the right test is the dumb one.
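If the substring check ever feels too loose (a comment that merely mentions compare_digest would satisfy it), a slightly less dumb variant parses the source with the standard ast module and requires an actual call. A sketch; source_calls and calls_compare_digest are hypothetical helper names:

```python
import ast
import inspect
import textwrap
from typing import Callable


def source_calls(source: str, func_name: str) -> bool:
    """True if `source` contains a real call to func_name, as f(...) or obj.f(...)."""
    # dedent so that method source copied out of a class body parses cleanly.
    tree = ast.parse(textwrap.dedent(source))
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            target = node.func
            if isinstance(target, ast.Attribute) and target.attr == func_name:
                return True
            if isinstance(target, ast.Name) and target.id == func_name:
                return True
    return False  # comments and string literals never produce ast.Call nodes


def calls_compare_digest(func: Callable) -> bool:
    # inspect.getsource needs the function to be defined in a real source file.
    return source_calls(inspect.getsource(func), "compare_digest")
```

In the security test, the assertion would become assert calls_compare_digest(PaystackRail.verify_webhook).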

Pitfall 3 — SSRF via LightningRail's `lnd_rest_url`

Lightning Network Daemon (LND) exposes a REST API that LightningRail talks to. Naïve port:

class LightningRail:
    def __init__(self, lnd_rest_url: str, macaroon: str):
        self.url = lnd_rest_url
        self.macaroon = macaroon

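What if lnd_rest_url is http://169.254.169.254/latest/meta-data/? That's the AWS instance metadata service; if the rail returns or logs what it fetched, you've just leaked IAM credentials. Or http://127.0.0.1:11434/, the default port of a local Ollama server? Now the agent's payment rail is sending requests to an LLM API on its own host that was never meant to be reachable from outside.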

The TypeScript version had SSRF mitigations. The Python port needed them too. I added a _is_loopback_or_private check that filters:

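Loopback IPv4: 127.0.0.0/8
Loopback IPv6: ::1
Private IPv4: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
Link-local IPv4, which includes the cloud metadata address 169.254.169.254: 169.254.0.0/16
Unique-local (private) IPv6: fc00::/7
Hexadecimal IP literals: 0x7f000001 (which many resolvers and HTTP clients parse as 127.0.0.1 without any DNS lookup)
Octal IP literals: 0177.0.0.1 (also 127.0.0.1)
Well-known internal hostnames: localhost, metadata, metadata.google.internal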

def _is_loopback_or_private(host: str) -> bool:
    ...  # full impl in src

Detection strategy: parameterized pytest with a long list of malicious-looking URLs that the constructor must reject:

import pytest

from mnemopay.rails.lightning import LightningRail

@pytest.mark.parametrize("bad_url", [
    "http://127.0.0.1:8080",
    "http://0x7f000001:8080",
    "http://0177.0.0.1:8080",
    "http://localhost/",
    "http://metadata.google.internal/",
    "http://169.254.169.254/latest/meta-data/",
    "http://[::1]/",
    "http://10.0.0.5:443",
])
def test_lightning_ssrf_rejected(bad_url):
    with pytest.raises(ValueError, match="loopback|private"):
        LightningRail(lnd_rest_url=bad_url, macaroon="x")

These are the URLs people actually attack with. If you write the test against 127.0.0.1 only and forget the octal form, the test passes and the bug ships. Parameterize liberally.
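Literal checks only cover the host string itself. A hostname you don't control can still resolve to 10.0.0.5 or 169.254.169.254. One mitigation, sketched below as an assumption rather than what the SDK does, is to resolve the host and reject it if any returned address is internal. resolves_to_internal is a hypothetical name, and the resolver is injectable so tests don't need network access. This narrows DNS rebinding but does not close it: the HTTP client may resolve again at connect time, so a complete defense pins the connection to the address you checked.

```python
import ipaddress
import socket
from typing import Any, Callable

# Same call shape as socket.getaddrinfo, so a fake can be injected in tests.
Resolver = Callable[..., list[tuple[Any, ...]]]


def resolves_to_internal(host: str, port: int = 443,
                         resolve: Resolver = socket.getaddrinfo) -> bool:
    """True if any address `host` resolves to is loopback, private, link-local, or unspecified."""
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return True  # fail closed: an unresolvable host is not a usable LND endpoint
    if not infos:
        return True
    for _family, _type, _proto, _canonname, sockaddr in infos:
        # sockaddr[0] is the address string; drop any IPv6 "%scope" suffix.
        ip = ipaddress.ip_address(sockaddr[0].split("%", 1)[0])
        if ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_unspecified:
            return True
    return False
```

Checking every returned address, not just the first, matters: a host with one public and one private record should be rejected.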

The takeaway

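Cross-language parity work is mostly about catching gotchas that only bite in the target language, or protections the source language got from an explicit call the port forgot. TypeScript doesn't care about field ordering; Python dataclasses do. Neither Python's == nor JavaScript's === is timing-safe, so crypto.timingSafeEqual has to be carried over as hmac.compare_digest by hand. Neither Node's fetch nor Python's usual HTTP clients refuse private or loopback destinations by default, so the SSRF guard has to be ported too. Each language ships its own subtle footguns.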

The fix isn't more code review. The fix is a small set of dumb-looking tests that catch each class of gotcha at the boundary:

Construction tests catch dataclass shape issues at import time.
Source-inspection tests catch timing-attack regressions on every PR.
Parameterized URL tests catch SSRF holes before they ship.

Each test costs ~5 lines. Each pays for itself the first time someone "refactors for readability" and removes the safety net.

The code is at github.com/mnemopay/mnemopay-python. 435 tests, all green, Apache-2.0. If you find a rail bug or a missing safety check, the issues board is open.

— Jeremiah (@mnemopay)
