Rizwan Saleem

Posted on Jun 4

Designing a Durable Event-Sourced Analytics Platform for Real-Time Dashboards

#frontend #webdev

Designing a Durable Event-Sourced Analytics Platform for Real-Time Dashboards

Event-sourced analytics is a powerful approach for building real-time dashboards that remain accurate, auditable, and scalable as data volume grows. This guide walks through a practical architecture, patterns, and implementation steps to design a robust system that ingests high-velocity events, stores them in an append-only log, materializes views for dashboards, and supports evolving analytics needs without breaking downstream consumers.

Overview and goals

Real-time, reliable dashboards: near-constant latency from event ingestion to fresh metrics.
Scalable storage: append-only event log plus compacted, query-friendly read models.
Strong consistency guarantees where it matters: idempotent producers, at-least-once delivery, and deterministic event handling.
Evolvable analytics: schema-less events with optional schemas; versioned read models and backward-compatible evolutions.
Observability and operability: tracing, metrics, schema migrations, and rollback strategies.

High-level architecture

Event producers: services emitting domain events (e.g., user_sign_up, item_purchased, page_view).
Event bus / log: a durable, immutable append-only log (e.g., Apache Kafka, AWS Kinesis, or a similar system).
Stream processing: processors that read from the log, perform enrichment, deduplication, and materialize read models.
Read models: denormalized, query-optimized projections stored in a database suitable for dashboards (e.g., PostgreSQL for ad-hoc queries, Redis for fast dashboards, or specialized OLAP stores for large-scale analytics).
Query layer: API or GraphQL/REST endpoints that expose dashboards and metrics.
Observability: centralized logging, metrics, tracing, and dashboards for system health.

Core design decisions

Event model
- Each event should have: event_id, event_type, aggregate_id, timestamp, payload, and optional metadata.
- Use a stable, monotonically increasing offset or timestamp for ordering. Include causality metadata when possible.
- Make events immutable and append-only; avoid updating events in place.
Idempotency and deduplication
- Assign a global idempotency key (e.g., event_id or a combination of source_id and sequence number).
- Processors should be idempotent, able to handle duplicate events without side effects.
Schema evolution
- Favor schemaless/loosely structured payloads (e.g., JSON with optional fields).
- Maintain a central registry of event types and payload schemas; versions can be added, with backward-compatible reader behavior.
- Read models should be versioned; new projections may leverage new fields while old projections remain intact.
Read-models and projections
- Projections are deterministic, incremental, and append-only to the read model store.
- Materialize common aggregates (counts, funnels, time-series) and more complex metrics (RFM, cohort analyses) as separate read models.
- Use windowing (tumbling/hopping windows) to produce time-bounded dashboards.
Consistency and latency
- Accept at-least-once delivery guarantees; design processors to handle retries gracefully.
- For dashboards requiring strong accuracy, implement reconciliation passes to correct drift between log and read models.
Operational concerns
- Back-pressure handling: scale producers and processors independently; implement backpressure signals.
- Exactly-once semantics in read models can be expensive; prefer idempotent processing and at-least-once semantics with deduplication.
- Monitoring: track event lag, processing lag, error rates, storage growth, and query latency.

A concrete architecture blueprint

Event bus/log: Kafka (or Kinesis)
- Topics per event source or per domain (e.g., user_events, commerce_events).
- Each producer writes to its topic with a stable schema.
Event processing layer: stream processors
- Consumer groups for each read-model projection.
- Enrichment: join with static data (e.g., user profile, product catalog) if needed; cache dimension data for fast enrichment.
- Deduplication: use a cache-backed deduper keyed by event_id within a retention window.
Read models storage
- PostgreSQL for relational read models and dashboards requiring SQL.
- Redis for ultra-fast, time-series dashboards and counters.
- Optional: ClickHouse or Druid for large-scale analytics and fast aggregations if data volume is huge.
Query API
- REST or GraphQL endpoints that fetch the latest read models and time-series data.
- Support for slicing by time window, user segments, and event types.
Observability
- OpenTelemetry instrumentation; trace end-to-end latency from event production to dashboard render.
- Metrics: event lag per topic, processor throughput, read-model update lag, error rates.
- Centralized dashboards (Grafana, Kibana) with alerting on critical thresholds.

Step-by-step guide: building a minimal but extensible system
Phase 1: define events and topics

Identify core events that drive dashboards: user_signup, login, product_view, add_to_cart, order_placed.
For each event, define:
- event_type, event_id, aggregate_id (e.g., user_id, session_id), timestamp, payload (fields relevant to the event).
Example payloads (simplified):
- user_signup: { user_id, signup_source, plan }
- product_view: { user_id, product_id, view_duration }
- order_placed: { user_id, order_id, total_amount, currency, items: [{product_id, quantity}] }

Phase 2: choose storage and streaming components

Event log: Kafka with retries and compacted keys where appropriate.
Read-model store: PostgreSQL for rich queries; Redis for hot dashboards.
Processing framework: use a lightweight stream processor (e.g., Kafka Streams, ksqlDB, or a microservice consuming Kafka and producing to read models) depending on team familiarity.

Phase 3: build a small set of initial projections

Daily active users (DAU) and monthly active users (MAU)
- Projection reads user_signup and login events; updates a daily count table and a rolling MAU view.
Revenue by day and by product
- Projection processes order_placed events; aggregates totals by day and product_id.
Session length per user
- Projection computes session duration from login and logout or from successive events within a session_id.

Phase 4: implement idempotent processors

Deduplicate events by event_id within a processing window (e.g., 24 hours).
Use upserts when updating read models to avoid duplicate counters.
Log duplicates for visibility but do not alter state.

Phase 5: schema evolution strategy

Maintain an event_type registry with a version per event payload shape.
If a payload changes, emit a new version of the event_type and support both versions in readers for a period.
Write read-model migrations as separate steps; add new read-model tables as needed without removing old ones immediately.

Phase 6: observability and resilience

Instrument producers and processors with request/latency metrics and error counters.
Track lag: difference between latest event_offset and read-model update offset.
Add alert rules for lag thresholds and processing failures.

Example implementation: a minimal Python-based event producer and a simple processor

Tech choices: Python, Kafka, PostgreSQL, Redis.

Event producer snippet (Python, using confluent-kafka)

This code emits a user_signup event when a new user registers.

from confluent_kafka import Producer
import json
import time
import uuid

p = Producer({'bootstrap.servers': 'kafka-broker:9092'})

def delivery_report(err, msg):
if err is not None:
print(f"Delivery failed for record {msg.key()}: {err}")
else:
pass # success

def publish_user_signup(user_id, signup_source, plan):
event = {
"event_id": str(uuid.uuid4()),
"event_type": "user_signup",
"aggregate_id": user_id,
"timestamp": int(time.time() * 1000),
"payload": {"user_id": user_id, "signup_source": signup_source, "plan": plan}
}
p.produce('user_events', key=event["event_id"], value=json.dumps(event), callback=delivery_report)
p.flush()

Example usage

publish_user_signup("user-123", "email_campaign", "premium")

Event processor snippet (Python, consuming Kafka and updating PostgreSQL)

A simple deduplicating projection for daily active users.

import json
import psycopg2
from confluent_kafka import Consumer

consumer = Consumer({
'bootstrap.servers': 'kafka-broker:9092',
'group.id': 'ua_projection',
'auto.offset.reset': 'earliest'
})

consumer.subscribe(['user_events'])

conn = psycopg2.connect(
dbname='analytics', user='analytics', password='secret', host='db-host'
)
cur = conn.cursor()

def main():
while True:
msg = consumer.poll(1.0)
if msg is None:
continue
if msg.error():
print("Consumer error: {}".format(msg.error()))
continue

    event = json.loads(msg.value().decode('utf-8'))
    if event.get('event_type') != 'user_signup':
        continue

    user_id = event['payload']['user_id']
    event_id = event['event_id']
    event_ts = event['timestamp']

    # Simple deduplication using a log table
    cur.execute("SELECT 1 FROM event_log WHERE event_id = %s", (event_id,))
    if cur.fetchone():
        conn.commit()
        continue

    # Insert event into log
    cur.execute(
        "INSERT INTO event_log (event_id, event_type, aggregate_id, ts, payload) VALUES (%s, %s, %s, to_timestamp(%s/1000.0), %s)",
        (event_id, event['event_type'], user_id, event_ts, json.dumps(event['payload']))
    )

    # Update DAU projection
    day = time.strftime('%Y-%m-%d', time.gmtime(event_ts/1000.0))
    cur.execute("""
        INSERT INTO dau_by_day (day, user_id)
        VALUES (%s, %s)
        ON CONFLICT (day, user_id) DO NOTHING
    """, (day, user_id))

    conn.commit()

if name == "main":
main()

Notes on the code

The event producer emits events to a topic with a unique event_id; consumers deduplicate using event_id.
The processor maintains an event_log to enable reconciliation or rollback if needed.
The DAU projection uses a per-day granularity; you can consolidate to hourly or finer granularity as needed.

Testing and validation

End-to-end tests: simulate a sequence of events, ensure read-models reflect expected aggregates.
Idempotency tests: replay the same events; ensure read models do not double-count.
Back-pressure tests: simulate spikes and observe lag; ensure processors queue and scale.

Operational considerations

Scaling
- Producers scale by topic and partition count; ensure event_id generation remains unique across partitions.
- Processors scale by consumer groups; each projection can run in its own service or container, enabling independent scaling.
Storage management
- Retain event logs long enough for reconciliation (e.g., 90 days or as required by compliance).
- Periodically compress/read-aggregate older data into summarized tables to save space.
Security
- Encrypt sensitive payload fields and enforce strict access control to read-model stores.
- Validate inputs at producers and enforce schema checks at readers.

Extending the system

Add more read models
- Funnel analytics: steps from page_view to purchase with time-to-conversion.
- Cohorts: track user behavior by signup date and observe retention.
- Product performance: revenue and units sold by category and region.
Real-time dashboards
- Use Redis/TimescaleDB or ClickHouse for fast rolling-window aggregations.
- Build a frontend that subscribes to a live API or WebSocket stream for near-instant updates.
Data quality and governance
- Implement schema validation for event payloads.
- Build a lightweight lineage system: which dashboards depend on which events.

Illustrative example: a nightly reconciliation cycle

Every night, a reconciliation job scans the event_log against read-models to detect discrepancies.
If a mutation occurred (e.g., late event arrives), the job reconstructs the affected read-model segment and replays events to bring the projection in sync.
This approach balances the performance of real-time processing with the reliability of batched consistency checks.

Final thoughts

An event-sourced analytics architecture provides a strong foundation for scalable, auditable dashboards. By starting with a compact set of events and simple projections, you can ship quickly while keeping room to grow. Prioritize idempotent processing, backward-compatible schema evolution, and robust observability to maintain a healthy system as your data and user needs expand.

Would you like a concrete starter repository structure with Docker Compose files and a minimal frontend dashboard example to go along with this architecture?

Rizwan Saleem | https://rizwansaleem.co