Samet GOKTEPE

Posted on Jun 20

Five days with eventferry — MySQL, SQL Server, MSK IAM, and a core that didn't move

#node #kafka #typescript #npm

Five days ago I posted I built a transactional outbox toolkit for Node.js — meet eventferry. The pitch was small and specific: Postgres + Kafka, the dual-write bug, a Goldilocks alternative between Debezium and a hand-rolled outbox. The roadmap was a single line — "MySQL, SQL Server, MongoDB are next".

This is the follow-up. Same library, same problem, much wider surface — and a slightly absurd amount of code for one work-week.

Tally as of today:

✅ 3 databases: Postgres, MySQL / MariaDB, SQL Server (Azure SQL DB + Managed Instance + on-prem)
✅ 8 packages under @eventferry/*
✅ A dedicated AWS MSK IAM helper package
✅ A CDC-driven waker for SQL Server (separate package, on purpose)
✅ A 23-page wiki + llms.txt for AI assistants
✅ @eventferry/core did not need to change for any of it

That last bullet is the one I'm proudest of. I'll come back to it at the end.

What got added (in one breath)

Two new database adapters: MySQL / MariaDB and SQL Server
A new AWS MSK IAM helper package (@eventferry/kafka-iam)
A CDC-driven waker package (@eventferry/mssql-cdc-relay)
A Service Broker waker for SQL Server (inside @eventferry/mssql)
Kafka: producer-fenced restart, librdkafka stats hook, publisher.admin(), ensureTopics, validateTopicsOnConnect, publisher.healthCheck()
@eventferry/kafka/consume subpath — decode() + extractTraceContext() for consumers
Schema Registry: subject naming strategies (TopicName / RecordName / TopicRecordName), basic + bearer auth, serializeKey, autoRegister toggle
mTLS + SNI, SASL/OAUTHBEARER, full driver tuning, error classification with errorKind, OpenTelemetry semconv tracing

The two new database adapters

MySQL / MariaDB

import { MysqlStore, createMigrationSql } from "@eventferry/mysql";

await pool.query(createMigrationSql("outbox"));
const store = new MysqlStore({ pool, table: "outbox" });

MySQL 8.0.1+ / MariaDB 10.6+ — both engines use SELECT ... FOR UPDATE SKIP LOCKED for the concurrent claim. There's also a binlog-driven streaming relay (MysqlBinlogRelay) for sub-millisecond wakes on ROW-format binlogs.

Shipping caught one real bug worth telling: MariaDB exposes the JSON type as a LONGTEXT alias and the mysql2 driver doesn't auto-parse it. On MySQL 8 the same column comes back as an already-parsed object. The integration suite caught it on the first MariaDB container run — three tests went red, fix was a defensive JSON.parse for string payloads. The kind of bug you only find by running on the actual engine.

For shops stuck on MySQL 5.7 or MariaDB < 10.6 (no SKIP LOCKED), the README documents an UPDATE ... ORDER BY id LIMIT n + claim-token fallback. Throughput floor is lower, but race-free and shippable.

SQL Server

import { MssqlStore, createMigrationSql } from "@eventferry/mssql";

await pool.request().batch(createMigrationSql("outbox", { schema: "dbo" }));
const store = new MssqlStore({ pool, table: "outbox", schema: "dbo" });

This was the bigger build. Three things in the claim path bit me during design:

READCOMMITTEDLOCK dodges a real RCSI gotcha. When the database has READ_COMMITTED_SNAPSHOT = ON, READPAST is silently rejected under default READ COMMITTED. The table hint forces locking semantics for the statement and works on both RCSI and non-RCSI databases.
ROWLOCK is mandatory with READPAST. Without it, SQL Server's page-lock escalation turns your skip-locked queue into a serial queue. No error, just collapsed throughput.
OUTPUT inserted.* INTO @t is trigger-compatible. Raw OUTPUT inserted.* fails with error 334 the moment any trigger is enabled on the table — even if no triggers exist today.

Schema-side: BIGINT IDENTITY(1,1) clustered (not uniqueidentifier — that fragments the B-tree), NVARCHAR(MAX) CHECK (ISJSON(value) = 1) for payload/headers, DATETIME2(3) for every timestamp (SYSUTCDATETIME() returns datetime2(7) and the comparison is direct — no CONVERT_IMPLICIT penalty that would disable index seeks).

Integration tests against a real SQL Server 2022 testcontainer cover strict head-of-aggregate ordering, no-double-claim under 4 concurrent claimers across 20 aggregates, BIGINT-as-JS-string round-trip (tedious returns BIGINT as a string by default), unicode + emoji payload, and markFailed(null, 'failed') as a TypeError (instant re-claim hot loop guard).

Sub-second waking for SQL Server

Polling works but leaves you with 250 ms–1 s of latency on idle. Two options shipped, picked by your hosting story:

MssqlServiceBrokerWaker (in-package, opt-in)

import { MssqlServiceBrokerWaker, createServiceBrokerSetupSql } from "@eventferry/mssql";

await pool.request().batch(createServiceBrokerSetupSql({ schema: "dbo", table: "outbox" }));

const waker = new MssqlServiceBrokerWaker({
  poolConfig,  // dedicated pool — NOT the main store pool
  schema: "dbo",
  table: "outbox",
});

new Relay({ store, publisher, waker });

The setup helper provisions queues, contracts, message types, an activation procedure that drains EndDialog messages so sys.conversation_endpoints doesn't grow unbounded (the classic Rusanu leak), and an AFTER INSERT trigger that drops a wakeup per statement.

It auto-detects Azure SQL Database (EngineEdition = 5) at start() and refuses with a clear error. Service Broker is unsupported there per Microsoft's own migration-assessment rule — the waker won't lie to you about it.

The dedicated pool is non-negotiable. WAITFOR (RECEIVE ...) holds a connection slot indefinitely; sharing it with the store pool would starve everything else.

@eventferry/mssql-cdc-relay (separate package)

import { MssqlCdcWaker, createCdcEnablementSql } from "@eventferry/mssql-cdc-relay";

await pool.request().batch(createCdcEnablementSql({ schema: "dbo", table: "outbox" }));

new Relay({
  store,
  publisher,
  waker: new MssqlCdcWaker({ pool, schema: "dbo", table: "outbox" }),
});

CDC requires SQL Server Agent, and Agent isn't available on Azure SQL Database. So CDC went into a separate package — bundling it would have excluded the most common managed footprint from the main mssql package.

The waker reads sys.fn_cdc_get_max_lsn() on a small poll loop and wakes the core Relay when the LSN advances past a persisted watermark. The polling Relay then runs the same MssqlStore.claimBatch — CDC is the wake signal, not a parallel publisher. Strict head-of-aggregate still flows through the claim path; I didn't try to re-implement it in CDC space.

Failure modes are typed (CdcNotEnabledError, CdcRetentionExceededError, WatermarkBelowMinLsnError, CdcCaptureJobStoppedError) and surface via onError. The waker never throws into the relay's claim loop — operator-facing errors stay where operators can see them.

What grew on the Kafka side

Five days ago the Kafka publisher was a thin wrapper. Now it's the layer most teams will spend the most config time on.

Auth and TLS

new KafkaPublisher({
  brokers,
  ssl: {
    ca: readFileSync("/etc/ssl/kafka-ca.pem"),
    cert: readFileSync("/etc/ssl/client.pem"),
    key: readFileSync("/etc/ssl/client-key.pem"),
    servername: "broker.example.com",
  },
  sasl: {
    mechanism: "oauthbearer",
    oauthBearerProvider: async () => ({ value: jwt, principal: "service", lifetime: 3600_000 }),
  },
});

Full TlsConfig (CA pinning + mTLS + SNI), every SASL mechanism (PLAIN / SCRAM-SHA-256 / SCRAM-SHA-512 / OAUTHBEARER). The driver asymmetry on OAUTHBEARER is documented in the type — kafkajs reads value; the confluent driver requires value + principal + lifetime and accepts extensions. rejectUnauthorized: false is not an option — TLS verification is non-negotiable; bring a ca.

AWS MSK IAM, as a separate package

import { createMskIamSasl } from "@eventferry/kafka-iam";

new KafkaPublisher({
  brokers,
  ssl: true,
  sasl: createMskIamSasl({ region: "us-east-1" }),
});

Token cache with a configurable refresh-ahead window (default 60 s on the 15-min MSK token lifetime), concurrent-refresh dedup so a thundering herd hits the SigV4 signer once, transient-failure recovery. Lives outside the main kafka package so non-AWS users don't pull aws-sdk into their tree.

Producer-fenced restart

new KafkaPublisher({
  brokers,
  transactional: true,
  transactionalId: () => `${process.env.POD_NAME}-${replicaIndex()}`,  // callable
  autoRecoverFromFence: true,
  hooks: { onProducerFenced: (err) => metrics.counter("kafka.fenced", 1) },
});

PRODUCER_FENCED and INVALID_PRODUCER_EPOCH split into a dedicated errorKind: "fenced". With autoRecoverFromFence: true, the publisher does one transparent disconnect → connect → re-send cycle, with initTransactions re-running as part of the reconnect. Concurrent fenced publishes share a single in-flight reconnect.

For multi-instance setups the documented advice is: leave the option off, use a callable transactionalId derived from per-pod runtime context, and treat a cross-instance fence as the broker telling the loser instance to stop.

Admin + healthcheck

await publisher.ensureTopics([
  { topic: "orders.created",     numPartitions: 12, replicationFactor: 3 },
  { topic: "orders.created.dlq", numPartitions: 3,  replicationFactor: 3 },
]);

app.get("/healthz", async (_req, res) => {
  const status = await publisher.healthCheck({ timeoutMs: 3_000 });
  res.status(status.ok ? 200 : 503).json(status);
});

publisher.admin() borrows a typed admin client; ensureTopics is idempotent; validateTopicsOnConnect catches "topic doesn't exist" surprises at startup on managed Kafka with auto.create.topics.enable=false. healthCheck is documented as "broker reachable + auth still good", not "publisher fully operational" — a fenced transactional producer would still answer healthy here, and the docs say so.

Consumer side, in the same package

import { decode, extractTraceContext } from "@eventferry/kafka/consume";

await consumer.run({
  eachMessage: async ({ message }) => {
    const m = decode(message, { decoder: "utf8" });
    const trace = extractTraceContext(message.headers);
    if (trace) startConsumerSpan(trace.traceId, trace.spanId);
    await handle(m.value);
  },
});

A producer-free subpath. decode() normalizes the raw shape both kafkajs and confluent deliver; extractTraceContext() strictly validates W3C traceparent headers (rejects all-zero IDs, version: ff, malformed hex). Pair with tracer.inject on the publisher and you have end-to-end W3C trace propagation.

@eventferry/all is now genuinely "all"

npm i @eventferry/all

Last week this was 4 packages. Today it installs all 8: core + postgres + mysql + mssql + mssql-cdc-relay + kafka + kafka-iam + schema-registry. Cross-adapter name collisions resolve via Mysql* / Mssql* prefixes — the convention I picked up the moment a second database adapter joined.

For production services that only need one database adapter, the individual packages are still the right install. @eventferry/all is the prototyping convenience, not the production default.

The part I'm proudest of (in five days)

Slice	What it shipped	Did `@eventferry/core` need to change?
Phase A	mTLS, OAUTHBEARER, OTel, tuning, error classifier	No
Phase B	Admin, consume subpath, MSK IAM helper, Schema Registry enhancements	No
Phase C	librdkafka stats, fenced restart, healthCheck, TLS docs	One additive line — added "fenced" to the errorKind union
MySQL adapter	MysqlStore, binlog relay	No
SQL Server adapter	MssqlStore, claim path, migrations	No
Service Broker + CDC waker	Two OutboxWaker implementations	No

Six slices of substantial work, one additive extension on the core's union type, zero breaking changes downstream.

The core defines five interfaces — OutboxStore<Tx>, Publisher, OutboxWaker, Serializer, Logger. Every adapter implements one of these. The relay orchestrates them without knowing what database or broker it's talking to. Dependency Inversion Principle, applied with intent at the start, paying off five days later when "add SQL Server" turned out to mean "implement OutboxStore<mssql.Transaction>" rather than "redesign the core."

This is the boring win nobody writes blog posts about, so here's mine: picking the right seams up-front meant a five-day expansion didn't cascade into a coordinated multi-package rewrite. Anyone still on the version of @eventferry/core that shipped at launch can pull every adapter improvement without an upgrade.

What's next

MongoDB is up next in the roadmap. After that, CockroachDB and SQLite. On the Kafka side: an onMessageRetried hook, per-topic retry policy, a Prometheus metrics helper, OpenTelemetry messaging metrics once the OTel SDK stabilizes the counters.

The bigger thing this week was documentation. There's now a 23-page wiki covering every surface — Getting Started, Core Concepts, every adapter, Schema Registry, Authentication and TLS, AWS MSK IAM, Transactions and EOS, Admin Operations, Observability, Consuming Events, Dead-Letter Queue, Reliability and Error Handling, Operations Guide, Migrations and Upgrades, Troubleshooting, API Reference, FAQ, Roadmap. And an llms.txt for AI assistants that want to recommend the right package.

Try it

npm i @eventferry/all                                                        # everything
npm i @eventferry/core @eventferry/postgres @eventferry/kafka pg kafkajs     # just postgres + kafka
npm i @eventferry/core @eventferry/mssql mssql                               # just sql server

Repository: https://github.com/SametGoktepe/eventferry
Wiki: https://github.com/SametGoktepe/eventferry/wiki
npm scope: https://www.npmjs.com/org/eventferry
Roadmap: https://github.com/SametGoktepe/eventferry/blob/main/ROADMAP.md

If you tried the library when it was Postgres-only and bounced because you were on MySQL or SQL Server, those doors are open now. If you're on the Kafka side and ran into auth / EOS / Schema Registry sharp edges, those got the most polish this week. And if your team has a Node.js / TypeScript service with a database write + a Kafka publish that needs to stay in sync — I'd still love to hear what's working, what's missing, and what you'd want next.

Feedback, issues, and stars very welcome ⭐

DEV Community