DEV Community

Rizwan Saleem
Rizwan Saleem

Posted on

Designing an Event-Sourced Audit Trail for a Financial Microservice

Designing an Event-Sourced Audit Trail for a Financial Microservice

Designing an Event-Sourced Audit Trail for a Financial Microservice

Auditing is essential in financial software. A robust audit trail not only records what happened, but when, why, and by whom. This guide walks you through designing an event-sourced audit trail for a microservice architecture. You’ll learn the rationale, data model, storage strategy, consistency considerations, and practical implementation with code sketches in a modern stack (TypeScript, Node.js, PostgreSQL, and Kafka). The approach emphasizes reliability, privacy controls, replayability, and observability.

Overview and motivation

  • Why event sourcing for audit trails: immutability, replayability, and a single source of truth for historical state changes. In finance, you often need to reconstruct events for investigations, regulatory reporting, or disaster recovery.
  • Goals:
    • Immutable, append-only event log for all sensitive operations.
    • Time-based queries and reproducible state derivations.
    • Efficient long-term storage with partitioning and compaction.
    • Privacy controls: redact or protect PII when exporting logs.
    • Observability: end-to-end traceability across services with correlation IDs.

Architectural sketch

  • Core idea: every business operation emits domain events to an audit event store. The audience is the audit analysts and compliance tooling.
  • System components:
    • Domain services: emit events when business actions occur (e.g., money transfer initiated, approval granted).
    • Event bus: a durable, scalable backbone (Kafka or Pulsar) to transport events.
    • Audit write model: append-only store optimized for inserts and time-based queries (PostgreSQL with partitioning, or a dedicated log store like Apache Pinot/ClickHouse for analytics).
    • Read models and export: materialized views, dashboards, and export pipelines for regulatory reports.
    • Replay and compliance tooling: replay engine to reconstruct state for a given time window or scenario.
  • Data flows:
    • A domain command is processed by a service.
    • The service emits one or more audit events with metadata (timestamp, actor, correlation_id, transaction_id, sensitive fields masked as needed).
    • The event bus persists events durably.
    • The audit store ingests events from the bus and builds time-partitioned tables.
    • Consumers (reporting, alerting) subscribe to the event stream or read from the audit store.

Key design decisions

  • Event schema vs. log schema:
    • Use a stable, versioned event schema with a discriminator for event_type, a payload field as a JSON blob, and metadata (timestamp, actor_id, correlation_id, tenant_id).
    • Keep domain event names expressive (e.g., account_created, funds_transfer_initiated, funds_transfer_completed, audit_redaction_applied).
  • Idempotence and deduplication:
    • Each event includes a unique sequence_id or event_id. Use idempotent writes to the audit store.
    • Idempotent consumers: if the same event is delivered twice, the store should ignore duplicates.
  • Privacy and data minimization:
    • Avoid storing PII directly in payloads. Store references (masked or tokenized identifiers) and redact sensitive fields for exports.
    • Implement a data-access layer that can redact or tokenize on export, without altering the immutable event log.
  • Time and ordering:
    • Events are ordered by the event_timestamp, but also preserve a logical order via a monotonically increasing sequence per partition.
    • Use correlation_id to tie together events belonging to the same business transaction.
  • Long-term storage:
    • Partition by month or day, with archiving to cheaper storage for older partitions.
    • Optional: compact or summarize verbose events into a summarized changelog for very old data, while keeping the original events immutable.
  • Operational concerns:
    • Observability: metrics on event emission latency, failed deliveries, backlog size, and redaction/export counts.
    • Security: restrict who can read sensitive fields. Enforce least privilege on the audit store and export tooling.
    • Compliance: enable legal holds and tamper-evident logging (append-only, immutable storage) where required.

Data model design

  • Event envelope (common structure):

    • event_id: UUID
    • event_type: string
    • timestamp: ISO 8601 timestamp
    • actor_id: string (who performed the action)
    • correlation_id: string (trace across services)
    • tenant_id: string
    • version: integer (schema version)
    • payload: JSONB (domain-specific data)
    • redacted_flags: JSONB (per-field redaction status)
  • Example events (conceptual):

    • account_created
    • payload: { account_id, customer_id, initial_balance, account_type }
    • funds_transfer_initiated
    • payload: { transfer_id, from_account, to_account, amount, currency }
    • funds_transfer_completed
    • payload: { transfer_id, status, fees, exchange_rate }
    • audit_redaction_applied
    • payload: { field, original_value, redacted_value }
  • Postgres table schema (simplified):

    • audit_events
    • event_id UUID PRIMARY KEY
    • event_type VARCHAR NOT NULL
    • event_timestamp TIMESTAMPTZ NOT NULL
    • actor_id VARCHAR
    • correlation_id VARCHAR
    • tenant_id VARCHAR
    • version INT
    • payload JSONB
    • redacted_flags JSONB
    • audit_events_partitioned (monthly)
    • In production, create declarative partitioning by date_trunc('month', event_timestamp)

Event bus and ingestion

  • Choose a durable message bus: Kafka is a common default due to strong tooling and ecosystem.
  • Producer side:
    • Each domain service publishes to a topic like audit.events.
    • Use a single topic or topic-per-aggregate? A single topic simplifies consumption; topic-per-aggregate can improve isolation but increases complexity.
    • Ensure producer transactions if you want to atomically emit domain events and audit events together. If you emit separately, rely on correlation_id to link them.
  • Consumer side:
    • Audit store ingester reads events, validates schema, enforces deduplication, and writes into partitioned Postgres or a columnar store for analytics.

Implementation guide (TypeScript/Node.js)

  • Tech stack:
    • Node.js 18+ with TypeScript
    • Kafka (via kafka-node or kafkajs)
    • PostgreSQL (13+ or 15+ with partitioning)
    • Optional: ClickHouse for analytics or Apache Pinot for fast queries
  • Project structure (simplified):

    • src/
    • events/
      • index.ts (event type registry)
      • accountCreated.ts
      • fundsTransferInitiated.ts
      • fundsTransferCompleted.ts
    • producers/
      • auditProducer.ts
    • consumers/
      • auditIngestor.ts
    • models/
      • auditEvent.ts
    • services/
      • accountService.ts
    • infrastructure/
    • kafka.ts
    • postgres.ts
    • config/
    • environment.ts
  • Example event type definitions (simplified):

// src/events/accountCreated.ts
export interface AccountCreatedPayload {
account_id: string;
customer_id: string;
initial_balance: number;
account_type: string;
}
export const AccountCreatedEvent = {
event_type: 'account_created',
toPayload: (p: AccountCreatedPayload) => ({
account_id: p.account_id,
customer_id: p.customer_id,
initial_balance: p.initial_balance,
account_type: p.account_type
})
}

  • Publishing an audit event:

// src/producers/auditProducer.ts
import { Kafka, Producer, RecordMetadata } from 'kafkajs';
const kafka = new Kafka({ clientId: 'audit-service', brokers: ['kafka:9092'] });
const producer: Producer = kafka.producer();
await producer.connect();

export async function publishAuditEvent(eventType: string, eventPayload: any, metadata: { actor_id: string; correlation_id: string; tenant_id: string; event_timestamp?: string; version?: number }) {
const payload = {
event_type: eventType,
event_timestamp: metadata.event_timestamp ?? new Date().toISOString(),
actor_id: metadata.actor_id,
correlation_id: metadata.correlation_id,
tenant_id: metadata.tenant_id,
version: metadata.version ?? 1,
payload: eventPayload,
redacted_flags: {}
};
await producer.send({
topic: 'audit.events',
messages: [{ key: payload.event_type, value: JSON.stringify(payload) }]
});
}

  • Ingestion into PostgreSQL (simplified):

// src/consumers/auditIngestor.ts
import { Client } from 'pg';
const client = new Client({ /* connection config */ });
await client.connect();

// consume from Kafka, parse message
// upsert or insert into audit_events with partitioning by month
const insertText =
INSERT INTO audit_events (event_id, event_type, event_timestamp, actor_id, correlation_id, tenant_id, version, payload, redacted_flags)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
ON CONFLICT (event_id) DO NOTHING
;
// generate event_id, map fields, etc.

  • Partitioning strategy in PostgreSQL (simplified):

    • Create table: CREATE TABLE audit_events ( event_id UUID PRIMARY KEY, event_type VARCHAR NOT NULL, event_timestamp TIMESTAMPTZ NOT NULL, actor_id VARCHAR, correlation_id VARCHAR, tenant_id VARCHAR, version INT, payload JSONB, redacted_flags JSONB ) PARTITION BY RANGE (event_timestamp);
    • Create monthly partitions: CREATE TABLE audit_events_2026_01 PARTITION OF audit_events FOR VALUES FROM ('2026-01-01') TO ('2026-02-01'); ... automate for future months.
  • Efficient querying examples:

    • Get all events for a transaction: SELECT * FROM audit_events WHERE correlation_id = 'abc-123' ORDER BY event_timestamp;
    • Time-bounded query for a period: SELECT * FROM audit_events WHERE event_timestamp >= '2026-03-01' AND event_timestamp < '2026-04-01' ORDER BY event_timestamp;
    • Redaction-aware export:
    • Build a view that presents redacted data for exports, or implement a separate export pipeline that applies redaction rules.

Privacy controls and redaction

  • Redaction policy:
    • Define sensitive fields in a redaction policy per event type.
    • Example: in funds_transfer_completed, mask account numbers except last 4 digits, and mask amount if needed for certain exports.
  • Implementation approach:
    • Persist redacted_flags per event to indicate what was redacted and why.
    • Provide an export API or job that materializes redacted views for sharing with external systems.
    • Keep the original payload immutable; apply redaction only when exporting or exposing data externally.

Observability and operations

  • Metrics to monitor:
    • emit_latency_ms: time from domain action to event published to Kafka.
    • ingest_latency_ms: time from Kafka message receipt to insert into audit store.
    • backlog_count: number of unprocessed messages in the consumer group.
    • completion_rate: percentage of events successfully ingested.
    • redact_export_count: number of redacted exports generated.
  • Tracing:
    • Propagate trace identifiers via correlation_id and include trace_id in event payload for end-to-end tracing.
  • Backups and durability:
    • Regular backups of the audit store; consider point-in-time recovery.
    • WORM (write-once, read-many) storage for compliance archives if required.

Operational example: end-to-end scenario

  • A funds transfer is initiated:
    • Domain service emits funds_transfer_initiated event to Kafka with payload containing transfer_id, from_account, to_account, amount, currency; includes correlation_id and tenant_id.
    • Audit stream ingests the event and writes to audit_events.
    • A downstream service completes the transfer and emits funds_transfer_completed with status and metadata.
    • Analysts can replay all events for a given correlation_id to reconstruct the entire transaction lifecycle.

Testing tips

  • Test event schema compatibility:
    • Use schema versioning in event envelopes; tests should verify forward/backward compatibility.
  • Test idempotency:
    • Publish the same event twice and ensure only a single record is created in the audit store.
  • End-to-end tests:
    • Simulate a transaction, emit both partial and final events, then query the audit store to validate ordering and completeness.
  • Privacy tests:
    • Validate that exports redact sensitive fields according to policy.

A minimal, runnable scaffold

  • If you want a quick-start scaffold to experiment:
    • Create a small Node.js service with:
    • kafkaJS to produce and consume from a local Kafka (or use a Dockerized Kafka).
    • A PostgreSQL instance with a simple audit_events table and a partition for the current month.
    • Implement a single event type (account_created) to verify the end-to-end flow.
    • Add a simple REST endpoint to trigger an event and a separate endpoint to export recent events with redaction rules applied.

Common pitfalls and how to avoid them

  • Pitfall: treating the audit log as the system of record for business state.
    • Fix: keep an actual source of truth for business state in the domain services; audit events are immutable logs of actions, not the canonical state.
  • Pitfall: overexposing PII in payloads.
    • Fix: design a robust redaction/export policy; store only references and use redact-on-export.
  • Pitfall: brittle event schemas breaking consumers.
    • Fix: version your events, keep a registry, and implement adapters that can handle multiple versions.

Illustrative example: event envelope JSON

{
"event_id": "a1b2c3d4-e5f6-...-9abc",
"event_type": "funds_transfer_initiated",
"event_timestamp": "2026-06-03T01:15:42.123Z",
"actor_id": "user_987",
"correlation_id": "corr-555",
"tenant_id": "tenant_acme",
"version": 1,
"payload": {
"transfer_id": "tx-20260603-001",
"from_account": "acct-000123456",
"to_account": "acct-000654321",
"amount": 250.00,
"currency": "GBP"
},
"redacted_flags": {
"from_account": false,
"to_account": false,
"amount": false
}
}

Would you like a concrete starter repository skeleton (with Docker Compose for Kafka and PostgreSQL) and a sample TypeScript module set to implement this pattern? If so, I can draft a ready-to-run starter with minimal code, tests, and CI workflow.

-

Rizwan Saleem | https://rizwansaleem.co

Top comments (0)