Sangwoo Lee

Solving the 100M-Row Problem: A Summary Table Pattern for High-Volume Push Notification Logs

How I redesigned a push notification logging system that had accumulated 100 million rows — without a single minute of downtime — using a summary table pattern, idempotent aggregation, and safe batch deletion.

When your push notification service sends 500,000 messages per job and those logs never get cleaned up, you end up with 100 million rows in a single MSSQL table. Every statistics query becomes a full-scan nightmare. Storage costs spiral. And your dashboards get slower every week.

Here's the complete story of how I solved it — including the architecture decisions, the idempotency design, the MSSQL deadlock pitfalls, and the BullMQ integration that tied it all together.

The Problem: 100 Million Rows and Growing

What Was Happening

Each time the push notification API was called, up to 500,000 FCM messages were sent over ~1.5 hours via a BullMQ worker. Every single message was logged individually to [c3tv].[c3tv].[push_notification_log] in MSSQL:

push_notification_log (MSSQL)
├── id           BIGINT (PK)
├── job_id       VARCHAR(200)   ← Same value for all 500k rows in one send
├── member_seq   INT
├── push_token   VARCHAR(500)
├── is_success   BIT
├── sent_at      DATETIME2
└── ...          (20+ more columns)

After multiple sends per day, the table grew to 100 million rows.

The Real Cost

Every statistics API call — "how many succeeded in this send?" — did this:

-- This was running against 100M rows
SELECT
  COUNT(*) AS total,
  SUM(CASE WHEN is_success = 1 THEN 1 ELSE 0 END) AS success,
  SUM(CASE WHEN is_success = 0 THEN 1 ELSE 0 END) AS failure
FROM push_notification_log
WHERE job_id = 'conditional-abc-123'

Even with an index on job_id, scanning 500,000 rows per query on a 100M-row table created significant I/O pressure, especially when multiple admins were checking stats simultaneously.

The numbers:

  • Storage: ~80 GB for raw logs (growing 500K rows x 20+ columns per send)
  • Stats query: 1.2 to 3.8 seconds average response time
  • Cost: disproportionate MSSQL DTU consumption

What the API Consumers Expected

The admin dashboard called GET /message/conditional-sends/:jobId/stats and expected:

{
  "total": 498234,
  "success": 489102,
  "failure": 9132,
  "successRate": 98.17,
  "deliveryRate": 98.17
}

This had to work even after the raw logs were deleted.
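The response above is derived entirely from three counters, which is what makes the summary table viable. A minimal sketch of building that payload from a summary row (the function and interface names here are mine, not the project's):

```typescript
// Hypothetical helper: builds the stats payload the dashboard expects
// from the three counters stored in the summary table.
interface SummaryCounts {
  total_count: number;
  success_count: number;
  failure_count: number;
}

function buildStatsResponse(s: SummaryCounts) {
  const rate =
    s.total_count > 0
      ? parseFloat(((s.success_count / s.total_count) * 100).toFixed(2))
      : 0;
  return {
    total: s.total_count,
    success: s.success_count,
    failure: s.failure_count,
    successRate: rate,
    deliveryRate: rate, // same definition in this system
  };
}

const resp = buildStatsResponse({
  total_count: 498234,
  success_count: 489102,
  failure_count: 9132,
});
// resp.successRate === 98.17
```

Because only counters are needed, the raw rows can be discarded without changing the response shape.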


The Solution: Summary Table Pattern + Batch Deletion

Architecture Overview

Before (problematic):

FCM Send (500K msgs)
  --> push_notification_log (MSSQL)
        500,000 rows per send x N sends = 100M+
        COUNT(*) on every stats call (expensive, forever)
  --> Stats API Response

After (optimized):

FCM Send (500K msgs)
  --> push_notification_log (MSSQL)
        500,000 rows (temporary)
        Aggregated immediately on job complete
  --> push_notification_log_summary (MySQL)
        1 row per send job
        job_id        = 'conditional-abc-123'
        total_count   = 498234
        success_count = 489102
        failure_count = 9132
        first_sent_at = 2026-04-16 09:00:00
        last_sent_at  = 2026-04-16 10:31:22
        logs_deleted_at   = null  (becomes datetime after deletion)
        logs_deleted_count = 0    (becomes 498234 after deletion)
        Batch delete raw logs (runs immediately after aggregation)
  --> push_notification_log (MSSQL)
        0 rows remaining for this job_id
        Stats API reads from summary (1-row lookup, instant)
  --> Stats API Response

Key Design Constraints

Constraint          Requirement
Data integrity      success_count + failure_count = total_count, always
Idempotency         Re-running aggregation must be safe (no duplicate rows)
Batch deletion      No table locks, no deadlocks, no CPU spikes
API continuity      Stats APIs work before AND after log deletion
Failure recovery    Resume from where it left off if the server crashes mid-delete

Implementation Deep Dive

Step 1: The Summary Table Entity

// src/entities/push-notification-log-summary.entity.ts
@Entity({ name: 'push_notification_log_summary' })
@Index(['created_at'])
@Index(['first_sent_at'])
export class PushNotificationLogSummary {
  @PrimaryColumn({ type: 'varchar', length: 200, name: 'job_id' })
  job_id: string;

  // Aggregated send counts
  @Column({ type: 'int', name: 'total_count', default: 0 })
  total_count: number;          // COUNT(*) WHERE job_id = ?

  @Column({ type: 'int', name: 'success_count', default: 0 })
  success_count: number;        // COUNT(*) WHERE is_success = 1

  @Column({ type: 'int', name: 'failure_count', default: 0 })
  failure_count: number;        // COUNT(*) WHERE is_success = 0
  // INVARIANT: success_count + failure_count = total_count (always)

  // Timestamps (preserved after log deletion)
  @Column({ type: 'datetime', precision: 6, nullable: true })
  scheduled_send_date: Date | null;  // For scheduled-* jobs only

  @Column({ type: 'datetime', precision: 6, nullable: true })
  first_sent_at: Date | null;   // MIN(sent_at) from raw logs

  @Column({ type: 'datetime', precision: 6, nullable: true })
  last_sent_at: Date | null;    // MAX(sent_at) from raw logs

  // Deletion tracking
  @Column({ type: 'datetime', nullable: true })
  logs_deleted_at: Date | null; // null = not yet deleted

  @Column({ type: 'int', default: 0 })
  logs_deleted_count: number;   // Must equal total_count when done

  @CreateDateColumn({ type: 'datetime', name: 'created_at' })
  created_at: Date;
}

Why logs_deleted_at = null is a critical sentinel:

  • null means raw logs still exist in MSSQL (cleanup in progress or pending)
  • A datetime value means logs are confirmed deleted and logs_deleted_count matches total_count

This single column drives the idempotency logic across the entire system.

Step 2: Triggering Aggregation on Job Completion

I used BullMQ's Worker.on('completed') event with a fire-and-forget pattern:

// src/routes/firebase/firebase.process.ts
private registerCompletedHandler() {
  this.worker.on('completed', async (job: Job) => {
    const { jobId, scheduleSeq } = job.data;

    // STEP 1: Fire-and-forget aggregation + deletion
    // Runs independently of Redis/DB progress updates.
    // If this fails, the job is already 'completed' in BullMQ.
    // Re-runnable via admin API without re-sending notifications.
    if (jobId && !String(jobId).startsWith('backfill-')) {
      this.firebaseService
        .aggregateAndCleanupPushLogs(jobId)
        .then(() => this.logger.log(`[Completed] Cleanup done: ${jobId}`))
        .catch(err => this.logger.error(`[Completed] Cleanup failed: ${jobId}`, err));
    }

    // STEP 2: Update Redis progress and DB timestamps
    // Independent of STEP 1 — failure here does not block cleanup.
    try {
      // ... Redis progress update, actual_send_end_date recording
    } catch (error) {
      this.logger.error(`[Completed] Progress update failed:`, error);
    }
  });
}

Why fire-and-forget?

The completed event fires after BullMQ marks the job as done. If aggregation blocks this handler and throws, BullMQ might retry the job — causing duplicate push notifications. Fire-and-forget ensures:

  1. Job completion is recorded immediately
  2. Aggregation runs independently
  3. Aggregation failure is logged but does not re-trigger notification sending

Step 3: The Core Aggregation Function

This is the heart of the solution. It must be idempotent — safe to call multiple times:

// src/routes/firebase/firebase.service.ts
async aggregateAndCleanupPushLogs(jobId: string): Promise<void> {
  const logger = new Logger('AggregateAndCleanup');

  // Batch deletion constants
  const BATCH_SIZE       = 4_500;  // Below MSSQL lock escalation threshold (5,000)
  const BATCH_DELAY_MS   = 200;    // Yield to concurrent INSERT operations
  const MAX_ITERATIONS   = 500;    // Safety cap: 4,500 x 500 = 2.25M rows max
  const DEADLOCK_RETRIES = 3;

  // ============================================================
  // STAGE 1: Idempotency check
  // ============================================================
  const existing = await this.pushNotificationLogSummaryRepository.findOne({
    where: { job_id: jobId },
  });

  if (existing?.logs_deleted_at) {
    // Both aggregation AND deletion already completed -> skip entirely
    logger.log(`Already fully processed: ${jobId}`);
    return;
  }

  // ============================================================
  // STAGE 2: Aggregation (only if summary does not exist yet)
  // ============================================================
  if (!existing) {
    const aggregateSQL = `
      SELECT
        COUNT(*)                                          AS total_count,
        SUM(CASE WHEN is_success = 1 THEN 1 ELSE 0 END)  AS success_count,
        SUM(CASE WHEN is_success = 0 THEN 1 ELSE 0 END)  AS failure_count,
        MIN(sent_at)                                      AS first_sent_at,
        MAX(sent_at)                                      AS last_sent_at,
        MIN(created_at)                                   AS log_created_at
      FROM [c3tv].[c3tv].[push_notification_log]
        WITH (INDEX(idx_push_log_job_id))
      WHERE job_id = @0
    `;

    const rows = await this.pushNotificationLog.query(aggregateSQL, [jobId]);
    const row = rows?.[0];

    if (!row || Number(row.total_count) === 0) {
      logger.warn(`No logs found for aggregation: ${jobId}`);
      return;
    }

    try {
      // Use INSERT (not save/upsert) to detect concurrent duplicates
      await this.pushNotificationLogSummaryRepository.insert({
        job_id:             jobId,
        total_count:        Number(row.total_count),
        success_count:      Number(row.success_count),
        failure_count:      Number(row.failure_count),
        first_sent_at:      row.first_sent_at ? new Date(row.first_sent_at) : null,
        last_sent_at:       row.last_sent_at  ? new Date(row.last_sent_at)  : null,
        logs_deleted_at:    null,   // Sentinel: deletion not yet done
        logs_deleted_count: 0,
        created_at:         row.log_created_at
                              ? new Date(row.log_created_at)
                              : new Date(),
      });
      logger.log(`Summary saved: ${jobId} -> total: ${row.total_count}`);

    } catch (insertError: any) {
      if (insertError.code === 'ER_DUP_ENTRY' || insertError.errno === 1062) {
        // Blue/Green concurrent execution hit the same job.
        // The other container already inserted — safe to continue to deletion.
        logger.warn(`Concurrent insert detected, proceeding to deletion: ${jobId}`);
      } else {
        throw insertError;
      }
    }
  } else {
    // Summary exists but logs_deleted_at is null.
    // Previous run crashed during deletion -> resume from here.
    logger.log(`Resuming deletion (aggregation already done): ${jobId}`);
  }

  // ============================================================
  // STAGE 3: Batch deletion with deadlock handling
  // ============================================================
  const deleteSQL = `
    SET NOCOUNT ON;
    DELETE TOP(${BATCH_SIZE})
    FROM [c3tv].[c3tv].[push_notification_log] WITH (READPAST)
    WHERE job_id = @0;
    SELECT @@ROWCOUNT AS affected;
  `;

  let totalDeleted = 0;
  let iteration = 0;

  while (iteration < MAX_ITERATIONS) {
    iteration++;

    const affected = await this.executeDeleteBatchWithRetry(
      deleteSQL,
      jobId,
      DEADLOCK_RETRIES,
      logger,
    );

    totalDeleted += affected;
    if (affected === 0) break; // All rows deleted

    if (iteration % 10 === 0) {
      logger.log(`Deletion progress: ${totalDeleted.toLocaleString()} rows`);
    }

    await delay(BATCH_DELAY_MS); // Yield to concurrent writes
  }

  // ============================================================
  // STAGE 4: Record deletion completion
  // Use SQL expression for atomic accumulation (handles retries)
  // ============================================================
  await this.pushNotificationLogSummaryRepository
    .createQueryBuilder()
    .update(PushNotificationLogSummary)
    .set({
      logs_deleted_at:    getKSTDate(),
      // If previous attempt deleted N rows, accumulate correctly
      logs_deleted_count: () => `logs_deleted_count + ${totalDeleted}`,
    })
    .where('job_id = :jobId', { jobId })
    .execute();

  logger.log(`Cleanup complete: ${jobId}, deleted: ${totalDeleted.toLocaleString()} rows`);
}

Step 4: Deadlock-Safe Batch Deletion

MSSQL deadlocks are a real concern when deleting large volumes of rows while other transactions are writing. Here's the retry logic:

private async executeDeleteBatchWithRetry(
  sql: string,
  jobId: string,
  maxRetries: number,
  logger: Logger,
): Promise<number> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await this.pushNotificationLog.query(sql, [jobId]);
      return Number(result?.[0]?.affected ?? 0);
    } catch (err: any) {
      const isDeadlock =
        err.number === 1205 ||
        err.code === 'EDEADLOCK' ||
        /deadlock/i.test(err.message ?? '');

      if (isDeadlock && attempt < maxRetries) {
        // Exponential backoff after attempts 1 and 2: 500ms, then 1000ms
        const backoffMs = 500 * Math.pow(2, attempt - 1);
        logger.warn(
          `Deadlock on attempt ${attempt}/${maxRetries}, retrying in ${backoffMs}ms`
        );
        await delay(backoffMs);
        continue;
      }
      throw err;
    }
  }
  return 0;
}

Why BATCH_SIZE = 4,500?

MSSQL has a lock escalation threshold: when a single transaction holds 5,000 or more row locks, it automatically escalates to a table lock. This blocks all concurrent reads and writes.

Lock escalation threshold : 5,000 row locks
Our batch size            : 4,500 rows
Safety margin             : 10% below threshold

Result: Each DELETE batch holds at most 4,500 row locks
        -> No table lock escalation
        -> Concurrent selects/inserts continue unblocked

Why WITH (READPAST) instead of WITH (NOLOCK)?

Hint        Behavior                                        Risk
NOLOCK      Reads uncommitted data                          Dirty reads (wrong counts)
READPAST    Skips locked rows; handles them next batch      Slightly slower, zero dirty reads
(none)      Waits for locks to release                      Deadlock risk, blocking

READPAST is the right choice: it does not wait for locked rows (avoiding deadlocks) and does not read dirty data (avoiding incorrect deletion).

Step 5: All Statistics APIs Now Read from Summary

Before this change, every stats API queried the 100M-row table. Now they all hit the summary table first:

// getDeliveryStats() — O(1) lookup instead of O(N) scan
async getDeliveryStats(jobId: string): Promise<FcmErrorStats> {
  // Primary: MySQL summary table (1-row lookup, instant)
  const summary = await this.pushNotificationLogSummaryRepository.findOne({
    where: { job_id: jobId },
  });

  if (summary) {
    const { total_count: total, success_count: success, failure_count: failure } = summary;
    const successRate = total > 0
      ? parseFloat(((success / total) * 100).toFixed(2))
      : 0;

    return {
      total,
      success,
      invalidToken: failure,
      temporary: 0,
      quota: 0,
      other: 0,
      deliveryRate: successRate,
      successRate,
    };
  }

  // Fallback: MSSQL query (only during the brief ~5s aggregation window)
  const sendStats = await this.pushNotificationLog
    .createQueryBuilder('log')
    .select([
      'COUNT(*) AS total',
      'SUM(CASE WHEN log.is_success = 1 THEN 1 ELSE 0 END) AS success',
    ])
    .where('log.job_id = :jobId', { jobId })
    .getRawOne();

  // ... return computed stats
}

The same dual-source pattern applies to getClickStats(), getInvalidTokens(), and getScheduleDetail().


The Idempotency State Machine

This is the most important design element. The system can be interrupted at any point and safely resumed:

State 1: No summary row
  -> Aggregate from MSSQL
  -> INSERT summary (logs_deleted_at = null)
  -> Proceed to deletion

State 2: Summary exists, logs_deleted_at = null
  -> Skip aggregation (already done)
  -> Resume deletion from current state
  -> logs_deleted_count accumulates via SQL expression

State 3: Summary exists, logs_deleted_at = datetime
  -> Fully complete -> return immediately

Crash scenario:
  Server crashes during deletion at iteration 47 (211,500 rows deleted)
    |
    v
  Restart -> aggregateAndCleanupPushLogs() called again
    |
    v
  State 2 detected (summary exists, logs_deleted_at = null)
    |
    v
  Deletion resumes — remaining rows still exist in MSSQL
    |
    v
  logs_deleted_count: SQL expression `logs_deleted_count + 288500`
    |
    v
  Final: logs_deleted_count = 211500 + 288500 = 500000 (correct)
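The three states above reduce to one pure function over the summary row. A sketch under my own naming (not taken from the codebase):

```typescript
// Hypothetical classifier for the three-state idempotency machine.
// `summary` is the row from push_notification_log_summary, if one exists.
type CleanupState = 'aggregate' | 'resume-deletion' | 'done';

interface SummaryRow {
  logs_deleted_at: Date | null;
}

function classifyCleanupState(summary: SummaryRow | undefined): CleanupState {
  if (!summary) return 'aggregate';                        // State 1: no summary row yet
  if (!summary.logs_deleted_at) return 'resume-deletion';  // State 2: crashed mid-delete
  return 'done';                                           // State 3: fully processed
}
```

Keeping the decision in one place means every entry point (worker event, admin re-run, restart) resolves the same state the same way.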

The SQL expression for accumulation is key:

// Correct: accumulates safely on retry
logs_deleted_count: () => `logs_deleted_count + ${totalDeleted}`

// Wrong: would reset the count on every retry
// logs_deleted_count: totalDeleted
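The difference between the two updates can be simulated in a few lines. A toy model of the crash-and-retry scenario above (the numbers are from the scenario; the mechanics are mine):

```typescript
// Simulates the crash scenario: the first run deletes 211,500 rows and dies,
// the retry deletes the remaining 288,500. An accumulating update is correct;
// an overwriting update loses the first run's work.
function accumulate(stored: number, thisRun: number): number {
  return stored + thisRun; // mirrors `logs_deleted_count + ${totalDeleted}`
}

function overwrite(_stored: number, thisRun: number): number {
  return thisRun; // mirrors `logs_deleted_count = totalDeleted`
}

const runs = [211_500, 288_500];
const accumulated = runs.reduce((count, run) => accumulate(count, run), 0);
const overwritten = runs.reduce((count, run) => overwrite(count, run), 0);
// accumulated === 500_000 (correct), overwritten === 288_500 (wrong)
```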

Backfilling Historical Data

With 100 million existing rows accumulated before this feature shipped, I needed to retroactively build summaries for all past jobs. I built a dedicated backfill system:

POST /message/admin/backfill-log-summary
  ?batchSize=100
  &batchDelayMs=1000
  &jobIdType=all
  &dryRun=false

Response:
{
  "backfillJobId": "backfill-abc-123",
  "progressUrl": "/message/admin/backfill-log-summary/backfill-abc-123/progress"
}

Backfill flow:

MSSQL: 100M rows, ~200 distinct job_ids
  |
  | SELECT DISTINCT job_id (paginated, cursor-based)
  | TOP 100 WHERE job_id > @cursor ORDER BY job_id ASC
  v
For each job_id:
  1. Aggregate: COUNT(*), SUM(is_success=1), MIN/MAX(sent_at)
  2. Get time info from push_notification_schedule or push_notification_conditional
  3. INSERT into push_notification_log_summary (logs_deleted_at = null)
     -> Skip if already exists (idempotent)
  4. Original logs NOT deleted (backfill is read-only)

Progress tracked in Redis (TTL 24h):
{
  "status": "in_progress",
  "totalJobIds": 200,
  "processedJobIds": 87,
  "insertedSummaries": 84,
  "skippedJobIds": 3,
  "progressPercentage": 43,
  "estimatedEndAt": "2026-04-16T11:30:00Z"
}

Cursor-based pagination instead of OFFSET:

-- OFFSET gets slower as offset increases
SELECT DISTINCT job_id FROM push_notification_log
ORDER BY job_id
OFFSET 5000 ROWS FETCH NEXT 100 ROWS ONLY

-- Cursor: constant performance regardless of position
SELECT TOP 100 job_id, COUNT(*) AS cnt
FROM push_notification_log WITH (INDEX(idx_push_log_job_id))
WHERE job_id > 'conditional-abc-123'  -- last processed cursor
GROUP BY job_id
ORDER BY job_id ASC
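The application-side loop driving this query can be sketched as follows. Here `fetchPage` stands in for the real MSSQL call and just pages over an in-memory sorted list; everything else (names included) is illustrative:

```typescript
// Cursor-based pagination sketch: each page asks for job_ids strictly greater
// than the last one seen, so cost does not grow with how deep we've paged.
function fetchPage(allJobIds: string[], cursor: string | null, pageSize: number): string[] {
  // Stand-in for: SELECT TOP(pageSize) job_id ... WHERE job_id > @cursor ORDER BY job_id
  const sorted = [...allJobIds].sort();
  const start = cursor === null ? 0 : sorted.findIndex(id => id > cursor);
  return start === -1 ? [] : sorted.slice(start, start + pageSize);
}

function collectAllJobIds(allJobIds: string[], pageSize: number): string[] {
  const result: string[] = [];
  let cursor: string | null = null;
  while (true) {
    const page = fetchPage(allJobIds, cursor, pageSize);
    if (page.length === 0) break;
    result.push(...page);
    cursor = page[page.length - 1]; // advance the cursor, never an OFFSET
  }
  return result;
}
```

An empty page is the natural termination signal, the same role @@ROWCOUNT plays in the deletion loop.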

Timeline: What Happens After a Send

T+0:00    500,000 messages sent via FCM
          All 500,000 rows written to push_notification_log (MSSQL)

T+0:00    Promise.all(logSavePromises) resolves
          sendConditionalNotifications() returns
          BullMQ marks job as 'completed'

T+0:01    Worker.on('completed') fires
          aggregateAndCleanupPushLogs() called (fire-and-forget)

T+0:05    Aggregation query completes (~4 seconds for 500K rows with index)
          push_notification_log_summary row inserted:
            total_count   = 498234
            success_count = 489102
            failure_count = 9132
            logs_deleted_at = null

T+0:10    Batch deletion begins
          DELETE TOP(4500) ... WITH (READPAST)

T+4:10    ~111 batches x (per-batch delete time + 200ms delay) = ~4 minutes
          All 498,234 rows deleted

T+4:10    Summary updated:
            logs_deleted_at    = 2026-04-16 09:04:10
            logs_deleted_count = 498234

Stats API at any point after T+0:05:
          -> 1-row lookup from push_notification_log_summary (MySQL)
          -> Response in under 5ms
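The ~4-minute figure in the timeline follows directly from batch count and pacing. A back-of-the-envelope estimator (the ~2s per-batch DELETE time is my assumption for illustration, not a measured value):

```typescript
// Rough cleanup-duration estimate for a batch deletion run.
// avgBatchMs (per-batch DELETE time) is an assumed figure, not a measurement.
function estimateCleanupMs(
  totalRows: number,
  batchSize: number,
  delayMs: number,
  avgBatchMs: number,
): { batches: number; totalMs: number } {
  const batches = Math.ceil(totalRows / batchSize);
  return { batches, totalMs: batches * (avgBatchMs + delayMs) };
}

const est = estimateCleanupMs(498_234, 4_500, 200, 2_000);
// est.batches === 111; est.totalMs === 244_200 ms, i.e. about 4 minutes
```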

Performance Results

Before vs After

Metric                           Before                  After                      Improvement
getDeliveryStats response time   1.2 to 3.8 s            under 5 ms                 ~760x faster
MSSQL table size                 100M rows (~80 GB)      under 1M rows (~0.8 GB)    99% reduction
Storage cost                     High (DTU intensive)    Minimal (MySQL summary)    ~90% reduction
Stats query complexity           O(N) per call           O(1) always                Constant time
Deletion impact on production    N/A                     Near-zero (READPAST)       No blocking

Resource Usage During Cleanup

Batch deletion run (500K rows, 4,500 batch size):

CPU (MSSQL):
  Aggregation query : peak 45% for ~4 seconds
  Batch deletion    : sustained 15-20% for ~4 minutes
  (No CPU spike — smooth, predictable load)

MSSQL Locks:
  Max concurrent row locks : 4,500 (below 5,000 escalation threshold)
  Table locks              : ZERO (READPAST + batch size design)

Concurrent write impact:
  Ongoing push sends : ZERO slowdown (READPAST skips their rows)

Gotchas and Lessons Learned

1. MSSQL Lock Escalation Is a Silent Killer

Initial attempt: DELETE TOP(10000) seemed reasonable.

What happened: At 5,000+ row locks, MSSQL escalated to a table lock. All concurrent INSERT operations from ongoing push sends were blocked. The notification worker stalled.

Fix: BATCH_SIZE = 4,500 keeps every delete transaction below the escalation threshold.

-- Lock escalation behavior:
-- Row locks 1 to 4,999 : table is fine
-- Row locks 5,000+     : table lock acquired, everything blocks

-- Our fix:
DELETE TOP(4500) FROM push_notification_log WITH (READPAST)
WHERE job_id = @0
-- Maximum 4,500 row locks per transaction -> no escalation

2. @@ROWCOUNT Is the Only Reliable Termination Signal

Naive approach: Count rows before deletion and loop until expected count is deleted.

Problem: Concurrent INSERTs from ongoing send jobs could shift the count. The loop might run forever.

Fix: Trust @@ROWCOUNT — the actual rows affected by the last DELETE:

DELETE TOP(4500) FROM push_notification_log WITH (READPAST) WHERE job_id = @0;
SELECT @@ROWCOUNT AS affected;
-- When affected = 0, we are done. No counting, no guessing.
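The termination rule is easy to model: loop until the last delete affects zero rows, with no upfront counting. A toy in-memory version of the loop (the array stands in for the MSSQL table):

```typescript
// In-memory model of the delete loop: `rows` is the log table, and each
// iteration deletes up to `batchSize` rows for one job, stopping when a
// batch affects zero rows -- the @@ROWCOUNT rule.
function deleteAllForJob(rows: { job_id: string }[], jobId: string, batchSize: number): number {
  let totalDeleted = 0;
  while (true) {
    let affected = 0;
    for (let i = rows.length - 1; i >= 0 && affected < batchSize; i--) {
      if (rows[i].job_id === jobId) {
        rows.splice(i, 1);
        affected++;
      }
    }
    totalDeleted += affected;
    if (affected === 0) break; // @@ROWCOUNT = 0 -> done
  }
  return totalDeleted;
}

const table = [
  ...Array.from({ length: 10 }, () => ({ job_id: 'job-a' })),
  ...Array.from({ length: 3 }, () => ({ job_id: 'job-b' })),
];
const deleted = deleteAllForJob(table, 'job-a', 4);
// deleted === 10; table retains only the 3 'job-b' rows
```

Rows inserted for other jobs while the loop runs never affect termination, because the exit condition depends only on what the last batch actually deleted.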

3. insert() Over save() for Idempotency

Problem with save(): TypeORM's save() behaves like an upsert and may silently overwrite correctly aggregated values on retry.

Fix: Use insert() which throws on duplicate PK:

// Throws ER_DUP_ENTRY if already exists -> catch and continue to deletion
await this.pushNotificationLogSummaryRepository.insert({ job_id: jobId, ... });

Combined with catching ER_DUP_ENTRY, this gives safe concurrent execution between Blue and Green containers.
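The duplicate-key behavior that makes this safe can be modeled with a Map standing in for the summary table (the error code mirrors MySQL's ER_DUP_ENTRY; the repository class itself is a toy of mine):

```typescript
// Toy repository modeling INSERT-only semantics: a second insert for the
// same primary key throws, which the caller treats as "another container won".
class DuplicateKeyError extends Error {
  code = 'ER_DUP_ENTRY';
}

class SummaryRepo {
  private rows = new Map<string, { total_count: number }>();

  insert(jobId: string, row: { total_count: number }): void {
    if (this.rows.has(jobId)) throw new DuplicateKeyError(`Duplicate: ${jobId}`);
    this.rows.set(jobId, row);
  }

  get(jobId: string) {
    return this.rows.get(jobId);
  }
}

// Two containers (Blue/Green) race on the same job: the loser catches
// ER_DUP_ENTRY and proceeds to deletion without overwriting the winner's row.
const repo = new SummaryRepo();
repo.insert('conditional-abc-123', { total_count: 498_234 });
let concurrentInsertLost = false;
try {
  repo.insert('conditional-abc-123', { total_count: 0 });
} catch (e: any) {
  if (e.code === 'ER_DUP_ENTRY') concurrentInsertLost = true;
  else throw e;
}
// concurrentInsertLost === true; the stored row is untouched
```

An upsert-style save() would have silently taken the second write, which is exactly the overwrite hazard being avoided.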

4. SQL Expression for Retry-Safe Accumulation

// Wrong: resets the count on retry
logs_deleted_count: 300000

// Correct: accumulates across retries
// Previous run: 200,000 + this run: 300,000 = correct total: 500,000
logs_deleted_count: () => `logs_deleted_count + ${totalDeleted}`

5. Keep first_sent_at in Summary for Click Rate Calculations

Click-through rate calculations need to know when the first notification was sent. After raw logs are deleted, this information was gone.

Fix: Include MIN(sent_at) in the aggregation and store it in the summary:

SELECT
  MIN(sent_at) AS first_sent_at,  -- Preserved even after log deletion
  MAX(sent_at) AS last_sent_at,
  ...
FROM push_notification_log WHERE job_id = @0

Now getClickStats() can compute "average time from send to click" even months later.

6. The backfill- Job Prefix Guard

With fire-and-forget in Worker.on('completed'), every completing job triggers aggregateAndCleanupPushLogs(). Backfill jobs themselves complete and have a jobId field — but they are not push notification jobs.

// Without the guard: aggregateAndCleanupPushLogs('backfill-xyz') queries
// MSSQL for a job_id that doesn't exist there. Wasteful and noisy.

// With the guard:
if (jobId && !String(jobId).startsWith('backfill-')) {
  this.firebaseService.aggregateAndCleanupPushLogs(jobId)...
}
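The guard itself is a one-line predicate; extracting it (my naming) makes the intent testable in isolation:

```typescript
// Hypothetical extraction of the guard: only real push-notification jobs
// should trigger aggregation; backfill jobs merely share the same worker.
function shouldTriggerCleanup(jobId: unknown): boolean {
  return Boolean(jobId) && !String(jobId).startsWith('backfill-');
}
```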

API Design After the Change

Statistics APIs — Now Dual-Source

All stats APIs follow the same pattern:

GET /message/conditional-sends/:jobId/stats

Decision tree:
  1. Does push_notification_log_summary have this job_id?
     YES: Return summary data (O(1), instant)
     NO:  Fall back to MSSQL query (O(N), only during brief aggregation window)

Window where fallback is needed:
  T+0:00  Job completes, logs exist in MSSQL
  T+0:05  Aggregation done, summary row exists  <- Fallback no longer needed
  T+4:10  Raw logs deleted
  T+inf   Summary row always available

New Monitoring Endpoint

// GET /message/conditional-sends/:jobId/log-summary
// Lets admins see exactly where each job's cleanup stands

// Response:
{
  "job_id": "conditional-abc-123",
  "total_count": 498234,
  "success_count": 489102,
  "failure_count": 9132,
  "first_sent_at": "2026-04-16T09:00:00.000Z",
  "last_sent_at": "2026-04-16T10:31:22.000Z",
  "logs_deleted_at": "2026-04-16T10:35:40.000Z",
  "logs_deleted_count": 498234,
  "created_at": "2026-04-16T09:00:03.500Z"
}
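A dashboard consuming this endpoint can derive cleanup status from two fields. A sketch of that check (field names from the response above; the function name and status labels are mine):

```typescript
// Derives a human-readable cleanup status from a summary row.
interface LogSummary {
  total_count: number;
  logs_deleted_at: string | null;
  logs_deleted_count: number;
}

function cleanupStatus(s: LogSummary): 'pending' | 'complete' | 'mismatch' {
  if (!s.logs_deleted_at) return 'pending'; // sentinel: raw logs still exist
  // deleted count should equal the aggregated total; anything else is a red flag
  return s.logs_deleted_count === s.total_count ? 'complete' : 'mismatch';
}
```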

When to Use This Pattern

This Summary Table Pattern with Batch Deletion is appropriate when:

Use it when:

  • You have append-only logs where historical granularity matters at write time but not read time
  • Log volume is orders of magnitude larger than the number of logical events (1 job = 500K rows)
  • Statistics queries are read-heavy (admins checking stats repeatedly)
  • You can tolerate a brief aggregation window (~5 seconds) between log creation and summary availability
  • You need provable data integrity (success + failure = total)
Avoid it when:

  • You need to query individual log rows post-cleanup (e.g., "which specific users failed?")
  • Logs have regulatory retention requirements
  • The ratio of unique events to total rows is close to 1:1

Conclusion

Solving the 100-million-row problem required rethinking the data lifecycle from the ground up:

  1. Write raw logs during the send (necessary for real-time diagnostics)
  2. Aggregate immediately on job completion (fire-and-forget, idempotent)
  3. Delete in safe batches, respecting MSSQL's lock escalation limits
  4. Serve all stats from the 1-row summary (forever, even after deletion)

The key engineering decisions:

  • insert() over save() for race-condition-safe idempotency
  • BATCH_SIZE = 4,500 to stay below MSSQL's 5,000-lock escalation threshold
  • WITH (READPAST) to avoid deadlocks against concurrent write transactions
  • @@ROWCOUNT as the termination signal for reliable loop exit
  • SQL expression accumulation (logs_deleted_count + N) for retry-safe bookkeeping
  • logs_deleted_at sentinel to drive the three-state idempotency machine

The result: stats queries went from 1 to 4 seconds down to sub-5ms, MSSQL storage dropped by over 90%, and the system now handles 100K+ notifications daily without accumulating unbounded data.

Key Takeaways

  • Summary table pattern converts O(N) aggregation queries to O(1) lookups permanently
  • Lock escalation thresholds are MSSQL-specific — know your database's limits before batch-deleting
  • READPAST is the right hint for high-concurrency deletion (not NOLOCK)
  • Design for idempotency first — retry-safe systems are operationally resilient
  • Store timestamps in summary rows — you will regret losing them after deletion
  • Backfill is a feature, not an afterthought — plan for historical data migration from day one
  • Fire-and-forget with proper error handling gives you cleanup guarantees without coupling
  • Choose simplicity over features until you outgrow it
