In enterprise applications dealing with payroll, compensation, and benefits data, every change matters. When a single data point can cascade through multiple systems, affecting someone's salary or benefits package, having a complete audit trail isn't just nice to have—it's mission-critical. But capturing these events is only half the battle. The real challenge? Managing millions of audit records without turning your database into a bottleneck.
Let me walk you through how we built a scalable audit logging pipeline for our Elixir-based application, handling over 5 million events monthly per client while keeping our core services fast and our data accessible.
The Challenge
Our core application is built entirely in Elixir, operating in both public and corporate domains. The complexity of our data flows is significant: each data point typically touches multiple workers and goes through an average of 6-7 transactions before settling. While Elixir's concurrency model handles millions of operations with ease, audit logging presents unique challenges that can't simply be solved by throwing more processes at the problem.
The issue compounds quickly. With over 5 million audit events per month for a single client, we're looking at 60+ million records annually. Left unchecked, this growth pattern threatens to turn your primary database into a bottleneck, slowing down queries, bloating backups, and eventually impacting the performance of your core business operations.
Our Solution Architecture
We designed a pipeline that separates audit capture from audit storage, leveraging the outbox pattern to ensure reliable event delivery while keeping our operational database lean.
Architecture Flow
At a high level, Carbonite captures changes in Postgres and records them in an outbox; an outbox processor publishes those events to LavinMQ; Vector.dev consumes, transforms, and routes them; and Backblaze B2 stores them long term.
The Components
1. Carbonite: Capturing Changes at the Database Level
Carbonite is an Elixir library that uses PostgreSQL triggers to automatically capture every change to your tables. This approach offers several advantages:
Zero application code changes: You don't need to manually log changes throughout your codebase
Complete coverage: Every INSERT, UPDATE, and DELETE is captured automatically
Transactional consistency: Audit records are created in the same transaction as the data changes
Rich metadata: Captures before/after values, timestamps, and transaction context
The beauty of Carbonite is that it operates at the database level. Even if you have complex business logic spread across multiple modules and processes, you get comprehensive audit coverage without littering your code with logging statements.
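Getting set up is mostly a migration concern. The sketch below follows Carbonite's documented migration helpers; the :salaries table and the "audit_log" outbox name are placeholders, and the patch number passed to up/1 and down/1 depends on the Carbonite release you install, so treat this as a starting point rather than a drop-in migration.

```elixir
defmodule MyApp.Repo.Migrations.InstallCarbonite do
  use Ecto.Migration

  def up do
    # Install Carbonite's schema, trigger function, and transaction tables.
    # Repeat for each patch level your Carbonite version documents.
    Carbonite.Migrations.up(1)

    # Attach the change-capture trigger to every table you want audited.
    Carbonite.Migrations.create_trigger(:salaries)

    # Create an outbox so captured transactions can be shipped downstream.
    Carbonite.Migrations.create_outbox("audit_log")
  end

  def down do
    Carbonite.Migrations.drop_outbox("audit_log")
    Carbonite.Migrations.drop_trigger(:salaries)
    Carbonite.Migrations.down(1)
  end
end
```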
2. The Outbox Pattern: Reliable Event Publishing
The outbox pattern solves a critical problem: how do you reliably publish events from a transactional system? By writing audit events to an outbox table within the same database transaction as your business data, you ensure that either both succeed or both fail—no orphaned audit records or missed events.
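With Carbonite, that outbox write is the audit transaction record itself. Here is a minimal sketch of the write path, assuming a hypothetical Salary schema and changeset: Carbonite.Multi.insert_transaction/2 puts the audit transaction (and any metadata you attach) into the same database transaction as the business change, so both commit or roll back together.

```elixir
defmodule MyApp.Payroll do
  alias Ecto.Multi

  # Hypothetical schema and changeset; the point is that the audit
  # transaction record and the business change share one database transaction.
  def update_salary(%MyApp.Salary{} = salary, attrs) do
    Multi.new()
    # Insert Carbonite's transaction record (with optional metadata) first...
    |> Carbonite.Multi.insert_transaction(%{meta: %{type: "salary_updated"}})
    # ...then apply the change; the Postgres trigger records the diff alongside it.
    |> Multi.update(:salary, MyApp.Salary.changeset(salary, attrs))
    |> MyApp.Repo.transaction()
  end
end
```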
Carbonite's outbox utility reads pending transactions from these outbox tables and hands them to your code for publishing to external systems. This decouples the write path (fast database operations) from the publish path (potentially slower network operations).
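Below is a hedged sketch of that publisher, assuming Carbonite's process/4 outbox API, Jason for encoding, and the amqp Hex package (LavinMQ speaks standard AMQP 0.9.1, so the stock Elixir client works). The queue name, outbox name, and payload shape are illustrative, and the exact callback contract may differ between Carbonite versions.

```elixir
defmodule MyApp.AuditPublisher do
  @moduledoc "Drains the Carbonite outbox and publishes audit events to LavinMQ."

  def run do
    {:ok, conn} = AMQP.Connection.open("amqp://guest:guest@lavinmq:5672")
    {:ok, chan} = AMQP.Channel.open(conn)
    {:ok, _info} = AMQP.Queue.declare(chan, "audit_events", durable: true)

    # Carbonite hands us batches of unprocessed transactions; returning :cont
    # advances the outbox only after the batch has been published.
    Carbonite.process(MyApp.Repo, "audit_log", fn transactions, _memo ->
      Enum.each(transactions, fn tx ->
        payload =
          Jason.encode!(%{
            id: tx.id,
            meta: tx.meta,
            # Assumes change records are preloaded; field names are illustrative.
            changes: Enum.map(tx.changes, &Map.take(&1, [:table_name, :op, :changed]))
          })

        :ok = AMQP.Basic.publish(chan, "", "audit_events", payload, persistent: true)
      end)

      :cont
    end)

    AMQP.Connection.close(conn)
  end
end
```

Once a batch has been shipped, Carbonite's purge helper can delete processed transactions from the primary database, which is what actually keeps the operational tables lean.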
3. LavinMQ: Lightweight Message Broker
We chose LavinMQ as our message broker for several reasons:
AMQP protocol support: Industry-standard messaging protocol
Lightweight and fast: Designed for high throughput with minimal resource overhead
Reliable delivery: Ensures messages aren't lost between components
Buffering capability: Handles traffic spikes gracefully
LavinMQ acts as the shock absorber in our pipeline, allowing the audit capture rate to differ from the processing rate without losing data.
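Vector is the consumer in our pipeline, but a few lines of Elixir against the same amqp client illustrate why the broker can absorb spikes: the queue is durable, and a consumer only holds a bounded number of unacknowledged messages (prefetch), so a slow consumer makes the queue grow rather than dropping events or blocking producers. This snippet is purely illustrative, not part of the production pipeline.

```elixir
# Illustrative consumer; Vector plays this role in the real pipeline.
{:ok, conn} = AMQP.Connection.open("amqp://guest:guest@lavinmq:5672")
{:ok, chan} = AMQP.Channel.open(conn)

# At most 50 unacknowledged messages in flight; the rest wait in the queue.
:ok = AMQP.Basic.qos(chan, prefetch_count: 50)
{:ok, _consumer_tag} = AMQP.Basic.consume(chan, "audit_events")

receive do
  {:basic_deliver, payload, meta} ->
    IO.inspect(payload, label: "audit event")
    AMQP.Basic.ack(chan, meta.delivery_tag)
end
```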
4. Vector.dev: Event Processing and Routing
Vector is a high-performance observability data pipeline that consumes messages from LavinMQ and handles:
Transformation: Reshaping events into the desired format
Enrichment: Adding metadata or contextual information
Routing: Directing events to appropriate destinations
Batching: Optimizing write patterns to storage
Vector's configuration-as-code approach makes it easy to modify the pipeline without redeploying applications.
5. Backblaze: Cost-Effective Long-Term Storage
For long-term storage, we use Backblaze B2 via the S3-compatible API. Backblaze offers:
Low cost: Significantly cheaper than traditional cloud object storage
S3 compatibility: Works with existing S3 tooling and libraries
Durability: Enterprise-grade data protection
Scalability: Handles petabytes of data without configuration changes
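Vector does the writing in our setup, but because B2 exposes an S3-compatible endpoint, any S3 client can read the archive back for ad-hoc investigations or restores. Here is a rough sketch using the ex_aws and ex_aws_s3 packages; the bucket, prefix, endpoint, and environment variable names are placeholders.

```elixir
# B2 hands out a region-specific S3 endpoint; all values here are placeholders.
b2 = [
  host: "s3.us-west-004.backblazeb2.com",
  region: "us-west-004",
  access_key_id: System.fetch_env!("B2_KEY_ID"),
  secret_access_key: System.fetch_env!("B2_APP_KEY")
]

# List the archived audit objects for one client and month.
{:ok, %{body: listing}} =
  ExAws.S3.list_objects("audit-archive", prefix: "client-42/2024-06/")
  |> ExAws.request(b2)

IO.inspect(listing.contents, label: "archived audit objects")
```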
Key Benefits of This Architecture
Decoupling Operations
By separating audit capture, transport, and storage into distinct layers, each component can be scaled, maintained, and upgraded independently. Your core application doesn't need to know or care about where audit logs ultimately end up.
Multiple Destinations
Once audit events are flowing through the pipeline, routing them to multiple destinations becomes trivial. You might:
Send recent events to a hot analytics database for real-time dashboards
Archive all events to object storage for compliance
Stream specific events to client-accessible APIs
Feed events into a data lake used to build a data warehouse
Build data lineage graphs showing how values changed over time
Specialized Tooling
Each component in our pipeline is purpose-built for its role. We're not trying to make our Elixir application handle message queuing, or forcing our database to store years of historical data. We use mature protocols (AMQP, S3) and battle-tested services, reducing the surface area for bugs and operational issues.
Performance at Scale
Our operational database stays fast because we're continuously moving audit data out. LavinMQ handles traffic spikes without blocking database commits. Vector batches writes to optimize storage operations. The result is a system that handles millions of events per month without breaking a sweat.
Implementation Considerations
Monitoring and Observability
With a multi-component pipeline, observability is crucial. We monitor:
Outbox table size and processing lag
LavinMQ queue depths and consumption rates
Vector processing rates and error counts
Backblaze write success rates and latency
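The first of those, outbox backlog, can be turned into a metric with a small periodic job. Below is a sketch using a GenServer and the :telemetry package; it assumes Carbonite's default "carbonite_default" schema prefix and that processed transactions are purged after shipping, so the row count approximates the unshipped backlog.

```elixir
defmodule MyApp.AuditMetrics do
  use GenServer

  @interval :timer.minutes(1)

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    schedule()
    {:ok, %{}}
  end

  @impl true
  def handle_info(:measure, state) do
    # Rows still in Carbonite's transactions table; with post-publish purging
    # in place, this approximates the outbox backlog waiting to be shipped.
    %{rows: [[pending]]} =
      MyApp.Repo.query!("SELECT count(*) FROM carbonite_default.transactions")

    :telemetry.execute([:audit, :outbox], %{pending: pending}, %{})

    schedule()
    {:noreply, state}
  end

  defp schedule, do: Process.send_after(self(), :measure, @interval)
end
```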
Data Retention and Compliance
The pipeline makes it easy to implement sophisticated retention policies. Recent data might stay in a fast query layer for 90 days, then move to cheaper storage for 7 years to meet compliance requirements, then be deleted or pushed to even colder archival storage.
Backpressure Handling
Each component needs to handle backpressure gracefully. If Vector can't keep up with LavinMQ, messages queue in the broker. If Backblaze is slow, Vector batches and retries. The outbox pattern ensures no data is lost even if downstream systems are temporarily unavailable.
Conclusion
Building a scalable audit logging pipeline requires thinking beyond simple database inserts. By leveraging the outbox pattern with Carbonite, and building a decoupled event pipeline with LavinMQ, Vector.dev, and Backblaze, we've created a system that:
Captures every change automatically at the database level
Handles millions of events per month without impacting application performance
Provides flexibility to route audit data to multiple destinations
Uses mature, purpose-built tools for each layer
Scales horizontally without architectural changes
For applications in regulated industries or where data lineage is critical, this architecture provides the foundation for comprehensive audit logging that doesn't compromise on performance or scalability. The key is recognizing that audit logging is a data pipeline problem, not just a database problem, and designing accordingly.
