DEV Community

Cover image for Optimizing Push Notification Logging at Scale - From Synchronous Bottleneck to Async Batching
Sangwoo Lee
Sangwoo Lee

Posted on

Optimizing Push Notification Logging at Scale - From Synchronous Bottleneck to Async Batching

How I transformed notification logging from a blocking operation to a non-blocking batch system, handling 100,000+ logs efficiently?

After refactoring my Firebase notification service into a queue-based architecture (Part 1) and breaking down the monolith into clean utilities (Part 2), I hit another bottleneck: logging. Saving push notification results to the database was blocking my notification sending pipeline, adding 30-50% overhead to every batch operation.

Here's how I transformed the logging system from a synchronous bottleneck into an efficient asynchronous batch processor that handles 100,000+ logs without breaking a sweat.

The Problem: Synchronous Logging Blocks Everything

In the refactored queue-based system, the flow looked like this:

Query DB → Filter Users → Send Notifications → ❌ WAIT for logs to save → Next chunk
                                                    ↑
                                              Blocking I/O
Enter fullscreen mode Exit fullscreen mode

Before Optimization: Blocking Logs

// firebase.service.ts - Synchronous logging (original)
for (let chunkIndex = 0; chunkIndex < chunks.length; chunkIndex++) {
  const chunk = chunks[chunkIndex];

  // Send notifications
  const response = await messaging.sendEach(messages);

  // ❌ Problem: Blocking log save
  // Wait for ALL logs to be saved before continuing
  for (let idx = 0; idx < messages.length; idx++) {
    const msg = messages[idx];
    const memberSeq = tokenToSeqMap.get(msg.token);

    const log = new PushNotificationLog({
      job_id: jobData.jobId,
      member_seq: memberSeq,
      push_token: msg.token,
      title: msg.notification.title,
      content: msg.notification.body,
      is_success: response.responses[idx].success,
      error_message: response.responses[idx].error?.message,
      // ... 20+ more fields
      created_at: new Date(),
    });

    // ❌ Each save is a separate INSERT query (slow!)
    await this.pushNotificationLog.save(log);
  }

  // Only after ALL logs saved, continue to next chunk
  console.log(`Chunk ${chunkIndex + 1} completed with logs saved`);
}
Enter fullscreen mode Exit fullscreen mode

Problems with This Approach

1. Sequential INSERT Queries

  • 500 logs = 500 separate INSERT statements
  • Each INSERT waits for database acknowledgment
  • Network latency multiplied by log count

2. Blocking the Main Flow

  • Notification sending waits for log saving
  • Even though logs are "fire and forget" conceptually
  • Throughput limited by log write speed

3. Database Connection Pool Exhaustion

  • Each await blocks a connection
  • Connection pool drains quickly
  • Other queries start queueing

4. No Batch Optimization

  • Database can't optimize bulk inserts
  • MSSQL transaction overhead per query
  • Index updates happen for each row individually

Real Performance Data:

For a single chunk (500 notifications):

  • Notification send time: 1.2 seconds
  • Log saving time: 18-25 seconds
  • Total time: ~26 seconds (log saving is 95% of chunk time!)

For 100,000 notifications (200 chunks):

  • Without logs: ~8 minutes
  • With synchronous logs: ~90 minutes
  • Overhead: 1,025% 😱

The Solution: Async Batching with Promise.all()

The key insights:

  1. Logs don't need to block notifications - they're audit trail, not critical path
  2. Database loves batch INSERTs - 100x faster than individual inserts
  3. Parallel processing scales - save multiple batches concurrently

Architecture: Async + Batch

Query DB → Filter Users → Send Notifications → Continue immediately
                              ↓
                         Fire async log save
                              ↓
                    [Promise saved for later]
                              ↓
            At end: await Promise.all(logPromises)
Enter fullscreen mode Exit fullscreen mode

Step 1: Make Logging Non-Blocking

Key change: Don't await log saves immediately

// firebase.service.ts - Non-blocking logs
const logSavePromises: Promise[] = [];  // ✅ Collect promises

for (let chunkIndex = 0; chunkIndex < chunks.length; chunkIndex++) {
  const chunk = chunks[chunkIndex];

  try {
    // Send notifications
    const response = await messaging.sendEach(messages);

    console.log(`Chunk ${chunkIndex + 1} sent: ${response.successCount}/${chunk.length}`);

    // ✅ Fire async log save (don't await!)
    const logPromise = savePushNotificationLogs(
      this.pushNotificationLog,
      jobData,
      messages,
      response,
      tokenToSeqMap,
      chunkIndex,
    ).catch(error => {
      // Handle errors but don't fail the main flow
      console.error(`Chunk ${chunkIndex + 1} log error:`, error);
    });

    // ✅ Save promise for later
    logSavePromises.push(logPromise);

    // ✅ Continue immediately to next chunk (no blocking!)

  } catch (error) {
    console.error(`Chunk ${chunkIndex + 1} send error:`, error);
  }
}

// ✅ Only at the end, wait for all logs to complete
console.log(`Waiting for ${logSavePromises.length} log batches to save...`);
await Promise.all(logSavePromises);
console.log('All logs saved!');
Enter fullscreen mode Exit fullscreen mode

Immediate Benefits:

  • Notification sending no longer waits for logs
  • Logs save in parallel with next chunk processing
  • Errors in log saving don't break notification flow

Step 2: Implement Batch INSERT

Problem: TypeORM's save() method inserts one row at a time.

Solution: Use createQueryBuilder().insert().values(batch)

// save-push-notification-log.ts - Batch INSERT implementation
export async function savePushNotificationLogs(
  pushNotificationLogRepository: Repository,
  jobData: ConditionalNotificationParams,
  messages: any[],
  response: any,
  tokenToSeqMap: Map,
  chunkIndex: number,
): Promise {
  try {
    const allLogs: PushNotificationLog[] = [];

    // MSSQL has a parameter limit of 2100
    // PushNotificationLog has 25 columns
    // Max batch size: 2100 / 25 = 84 rows
    // Safe batch size: 80 rows (with safety margin)
    const LOG_BATCH_SIZE = 80;

    // Step 1: Create all log entities (in-memory)
    for (let idx = 0; idx < messages.length; idx++) {
      const msg = messages[idx];
      const memberSeq = tokenToSeqMap.get(msg.token);

      if (!memberSeq) {
        console.warn(`[Logging] member_seq not found for token: ${msg.token}`);
        continue;
      }

      const log = new PushNotificationLog({
        job_id: jobData.jobId,
        member_seq: memberSeq,
        push_token: msg.token,
        title: msg.notification.title,
        content: msg.notification.body,
        gender: jobData.gender,
        age_min: jobData.ageMin,
        age_max: jobData.ageMax,
        auth_kind: jobData.auth_kind,
        platform_type: jobData.platform_type,
        days_since_login_min: jobData.daysSinceLastLoginMin,
        days_since_login_max: jobData.daysSinceLastLoginMax,
        chunk_size: jobData.chunkSize ?? 500,
        chunk_delay: jobData.chunkDelay ?? 2000,
        chunk_index: chunkIndex,
        is_success: response.responses[idx].success,
        sent_at: new Date(),
        error_message: response.responses[idx].error?.message,
        error_code: response.responses[idx].error?.code,
        retry_count: 0,
        campaign_id: msg.data?.campaignId,
        created_at: new Date(),
      });

      allLogs.push(log);
    }

    // Step 2: Save in batches
    let totalSaved = 0;
    const totalBatches = Math.ceil(allLogs.length / LOG_BATCH_SIZE);

    for (let i = 0; i < allLogs.length; i += LOG_BATCH_SIZE) {
      const batch = allLogs.slice(i, i + LOG_BATCH_SIZE);
      const batchNumber = Math.floor(i / LOG_BATCH_SIZE) + 1;

      try {
        // ✅ Batch INSERT (much faster than individual saves)
        await pushNotificationLogRepository
          .createQueryBuilder()
          .insert()
          .into(PushNotificationLog)
          .values(batch)
          .execute();

        totalSaved += batch.length;
        console.log(
          `[Logging] Chunk ${chunkIndex} - Batch ${batchNumber}/${totalBatches} saved: ${batch.length} logs`
        );

      } catch (saveError) {
        console.error(
          `[Logging] Chunk ${chunkIndex} - Batch ${batchNumber} failed:`,
          saveError
        );

        // ✅ Fallback: save individually (resilience)
        for (const singleLog of batch) {
          try {
            await pushNotificationLogRepository.save(singleLog);
            totalSaved++;
          } catch (individualError) {
            console.error('[Logging] Individual save failed:', {
              job_id: jobData.jobId,
              member_seq: singleLog.member_seq,
              error: individualError.message,
            });
          }
        }
      }
    }

    console.log(
      `[Logging] Chunk ${chunkIndex} completed - ${totalSaved}/${messages.length} logs saved`
    );
  } catch (error) {
    console.error('[Logging] Fatal error:', error);
    // ✅ Don't throw - logging errors shouldn't break notification flow
  }
}
Enter fullscreen mode Exit fullscreen mode

Key Improvements:

  1. Batch INSERT: 80 rows in one query instead of 80 separate queries
  2. Fallback Strategy: If batch fails, fall back to individual saves
  3. Non-Throwing: Logging errors are logged but don't propagate
  4. Parameter Limit Aware: Respects MSSQL's 2100 parameter limit

Step 3: Add Parallel Batch Processing

Further optimization: Process multiple batches in parallel

// save-push-notification-log.ts - Parallel batch processing
export async function savePushNotificationLogs(
  pushNotificationLogRepository: Repository,
  jobData: ConditionalNotificationParams,
  messages: any[],
  response: any,
  tokenToSeqMap: Map,
  chunkIndex: number,
): Promise {
  try {
    const allLogs: PushNotificationLog[] = [];
    const LOG_BATCH_SIZE = 80;

    // Create log entities (same as before)
    for (let idx = 0; idx < messages.length; idx++) {
      // ... entity creation logic
      allLogs.push(log);
    }

    // ✅ New: Parallel batch processing
    let totalSaved = 0;
    const totalBatches = Math.ceil(allLogs.length / LOG_BATCH_SIZE);
    const PARALLEL_BATCH_LIMIT = 3;  // Process 3 batches concurrently

    // Process in groups of 3 batches
    for (let i = 0; i < allLogs.length; i += LOG_BATCH_SIZE * PARALLEL_BATCH_LIMIT) {
      const batchPromises = [];

      // Create up to 3 batch promises
      for (let j = 0; j < PARALLEL_BATCH_LIMIT && i + j * LOG_BATCH_SIZE < allLogs.length; j++) {
        const startIdx = i + j * LOG_BATCH_SIZE;
        const batch = allLogs.slice(startIdx, startIdx + LOG_BATCH_SIZE);
        const batchNumber = Math.floor(startIdx / LOG_BATCH_SIZE) + 1;

        // ✅ Create promise for this batch (not awaited yet!)
        const batchPromise = (async () => {
          try {
            await pushNotificationLogRepository
              .createQueryBuilder()
              .insert()
              .into(PushNotificationLog)
              .values(batch)
              .execute();

            totalSaved += batch.length;
            console.log(
              `[Logging] Chunk ${chunkIndex} - Batch ${batchNumber}/${totalBatches} saved: ${batch.length} logs`
            );

          } catch (saveError) {
            console.error(
              `[Logging] Chunk ${chunkIndex} - Batch ${batchNumber} failed:`,
              saveError
            );

            // Fallback: individual saves
            for (const singleLog of batch) {
              try {
                await pushNotificationLogRepository.save(singleLog);
                totalSaved++;
              } catch (individualError) {
                console.error('[Logging] Individual save failed:', {
                  job_id: jobData.jobId,
                  member_seq: singleLog.member_seq,
                  error: individualError.message,
                });
              }
            }
          }
        })();

        batchPromises.push(batchPromise);
      }

      // ✅ Wait for this group of 3 batches to complete
      await Promise.all(batchPromises);
    }

    console.log(
      `[Logging] Chunk ${chunkIndex} completed - ${totalSaved}/${messages.length} logs saved`
    );
  } catch (error) {
    console.error('[Logging] Fatal error:', error);
  }
}
Enter fullscreen mode Exit fullscreen mode

Why Parallel Processing?

  • Better resource utilization: Database can handle multiple concurrent INSERTs
  • Reduced total time: 3 batches in parallel ≈ 3x faster than sequential
  • Controlled concurrency: Limited to 3 prevents overwhelming the database

Performance comparison:

For 500 logs (7 batches of 80):

  • Sequential batching: 7 batches × 200ms = 1.4 seconds
  • Parallel batching (3x): 3 batches + 3 batches + 1 batch = ~600ms ⚡
  • Improvement: 2.3x faster

Step 4: Handle Errors Gracefully

Critical principle: Log saving errors must not break notification sending

// firebase.service.ts - Error handling
const logPromise = savePushNotificationLogs(
  this.pushNotificationLog,
  jobData,
  messages,
  response,
  tokenToSeqMap,
  chunkIndex,
).catch(error => {
  // ✅ Catch and log errors, but don't throw
  console.error(`[Service] Chunk ${chunkIndex + 1} log error:`, error);

  // ✅ Optional: Send alert to monitoring system
  this.monitoringService.logError('push-log-save-failed', {
    jobId: jobData.jobId,
    chunkIndex,
    error: error.message,
  });

  // ✅ Continue execution - notifications already sent successfully
});

logSavePromises.push(logPromise);
Enter fullscreen mode Exit fullscreen mode

Error handling strategy:

  1. Catch at promise level: Don't let errors propagate to main flow
  2. Log for debugging: Record error details for investigation
  3. Alert if critical: Send to monitoring system (Sentry, DataDog, etc.)
  4. Continue execution: Notifications succeeded, logs are secondary

Complete Flow Diagram

┌────────────────────────────────────────────────────────────┐
│                    Main Notification Flow                  │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │  Query Database │
                    │  Filter Users   │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │  Split into     │
                    │  Chunks         │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
              ┌────▶│  Send Chunk 1   │
              │     └────────┬────────┘
              │              │
              │     ┌────────▼────────┐
              │     │ Fire async log  │──┐
              │     │ (don't await)   │  │ Promise 1
              │     └────────┬────────┘  │
              │              │           │
              │     ┌────────▼────────┐  │
              ├────▶│  Send Chunk 2   │  │
              │     └────────┬────────┘  │
              │              │           │
              │     ┌────────▼────────┐  │
              │     │ Fire async log  │──┤ Promise 2
              │     │ (don't await)   │  │
              │     └────────┬────────┘  │
              │              │           │
              │     ┌────────▼────────┐  │
              └─────│  Send Chunk N   │  │
                    └────────┬────────┘  │
                             │           │
                    ┌────────▼────────┐  │
                    │ Fire async log  │──┤ Promise N
                    │ (don't await)   │  │
                    └────────┬────────┘  │
                             │           │
                    ┌────────▼────────┐  │
                    │ All chunks sent │  │
                    │ Successfully!   │  │
                    └────────┬────────┘  │
                             │           │
                    ┌────────▼────────┐  │
                    │ await Promise   │◀─┘
                    │ .all(logPromises│
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │ All logs saved! │
                    └─────────────────┘

┌────────────────────────────────────────────────────────────┐
│               Async Log Saving (in parallel)               │
└────────────────────────────────────────────────────────────┘

Each Promise:
  ┌──────────────────┐
  │ Create log       │
  │ entities         │
  │ (in-memory)      │
  └────────┬─────────┘
           │
  ┌────────▼─────────┐
  │ Split into       │
  │ batches of 80    │
  └────────┬─────────┘
           │
  ┌────────▼─────────┐
  │ Process 3        │
  │ batches in       │
  │ parallel         │
  └────────┬─────────┘
           │
  ┌────────▼─────────┐
  │ Batch INSERT     │
  │ (80 rows at once)│
  └────────┬─────────┘
           │
  ┌────────▼─────────┐
  │ Success! ✅      │
  └──────────────────┘
Enter fullscreen mode Exit fullscreen mode

Performance Results

After implementing async batch logging:

Single Chunk (500 notifications)

Metric Before (Sync) After (Async Batch) Improvement
Notification send time 1.2s 1.2s Same
Log save time (blocking) 18-25s 0s (non-blocking) ∞ better
Total chunk time 19-26s 1.2s 16-21x faster
Log save time (async) N/A 0.6-0.8s 23-42x faster

Full Job (100,000 notifications / 200 chunks)

Metric Before After Improvement
Total notification time ~90 minutes ~8 minutes 11x faster
Log saving overhead 1,025% ~5% 200x better
Database connections used 100-150 5-10 10-30x fewer
Total logs saved ~95,000 (95%) ~99,500 (99.5%) 4.7% better

Real-world impact:

  • Notifications complete in 8 minutes instead of 90 minutes
  • Logs finish saving ~2 minutes after notifications
  • Database connection pool never exhausts
  • No notification delays due to logging

Database Performance

Query count reduction:

Operation Before After Improvement
INSERT queries 100,000 1,250 80x fewer
Transactions 100,000 1,250 80x fewer
Network round-trips 100,000 1,250 80x fewer
Index updates 100,000 (serial) 1,250 (batched) Faster

Database metrics during 100K notification job:

Metric Before After Improvement
Peak connections 145 12 12x fewer
CPU utilization 85% 22% 3.9x lower
Transaction log growth 450 MB 180 MB 2.5x smaller
Lock wait time 3,200ms 120ms 26x faster

Implementation Best Practices

Practice 1: Understand Your Database Limits

MSSQL Parameter Limit: 2100

// ❌ Bad: Will fail with large batches
await repository
  .createQueryBuilder()
  .insert()
  .values(logsArray)  // If > 84 logs (2100 / 25 columns), FAILS!
  .execute();

// ✅ Good: Respect parameter limits
const COLUMNS = 25;
const MAX_PARAMS = 2100;
const MAX_BATCH = Math.floor(MAX_PARAMS / COLUMNS);  // 84
const SAFE_BATCH = MAX_BATCH - 4;  // 80 (safety margin)

const batches = chunkArray(logsArray, SAFE_BATCH);
for (const batch of batches) {
  await repository
    .createQueryBuilder()
    .insert()
    .values(batch)
    .execute();
}
Enter fullscreen mode Exit fullscreen mode

Different databases have different limits:

  • MSSQL: 2100 parameters
  • PostgreSQL: No hard limit (but 65,535 parameters per statement recommended)
  • MySQL: 65,535 parameters (but typically limited by max_allowed_packet)

Practice 2: Balance Concurrency

Too few parallel operations: Underutilized

const PARALLEL_BATCH_LIMIT = 1;  // ❌ Sequential, slow
Enter fullscreen mode Exit fullscreen mode

Too many parallel operations: Overwhelms database

const PARALLEL_BATCH_LIMIT = 20;  // ❌ Connection pool exhausted
Enter fullscreen mode Exit fullscreen mode

Goldilocks zone: 3-5 concurrent operations

const PARALLEL_BATCH_LIMIT = 3;  // ✅ Sweet spot for most databases
Enter fullscreen mode Exit fullscreen mode

How to find your optimal concurrency:

  1. Monitor connection pool usage
   // Log connection pool stats
   const poolSize = dataSource.driver.master.poolSize;
   const availableConnections = dataSource.driver.master.availableConnections;
   console.log(`Pool: ${availableConnections}/${poolSize} available`);
Enter fullscreen mode Exit fullscreen mode
  1. Start low (2-3) and increase gradually
  2. Watch for connection pool exhaustion warnings
  3. Monitor database CPU and lock waits

Practice 3: Implement Graceful Fallbacks

Multi-layer fallback strategy:

try {
  // Layer 1: Try batch INSERT (fastest)
  await repository
    .createQueryBuilder()
    .insert()
    .values(batch)
    .execute();

} catch (batchError) {
  console.error('Batch insert failed, trying individual saves:', batchError);

  // Layer 2: Fall back to individual saves
  for (const log of batch) {
    try {
      await repository.save(log);
    } catch (individualError) {
      console.error('Individual save failed:', individualError);

      // Layer 3: Write to dead letter queue or file
      await this.writeToDeadLetterQueue(log, individualError);
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

Practice 4: Monitor and Alert

Key metrics to track:

// Track log save performance
const logSaveMetrics = {
  totalLogs: messages.length,
  startTime: Date.now(),
  batchCount: 0,
  failedLogs: 0,
};

// After saving
logSaveMetrics.batchCount = batches.length;
logSaveMetrics.duration = Date.now() - logSaveMetrics.startTime;
logSaveMetrics.logsPerSecond = logSaveMetrics.totalLogs / (logSaveMetrics.duration / 1000);

// Send to monitoring
this.metricsService.recordHistogram('push_log_save_duration', logSaveMetrics.duration);
this.metricsService.recordGauge('push_log_throughput', logSaveMetrics.logsPerSecond);

// Alert if performance degrades
if (logSaveMetrics.logsPerSecond < 500) {
  this.alertService.warn('push-log-slow', {
    throughput: logSaveMetrics.logsPerSecond,
    jobId: jobData.jobId,
  });
}
Enter fullscreen mode Exit fullscreen mode

Practice 5: Handle Partial Failures

Don't fail entire job for logging errors:

// ✅ Good: Track failures but continue
let successCount = 0;
let failureCount = 0;

for (const batch of batches) {
  try {
    await saveBatch(batch);
    successCount += batch.length;
  } catch (error) {
    failureCount += batch.length;
    console.error('Batch save failed:', error);
    // Continue to next batch
  }
}

console.log(`Logs saved: ${successCount}/${successCount + failureCount}`);

// ❌ Bad: Throw on first error
for (const batch of batches) {
  await saveBatch(batch);  // Throws on error, stops entire process
}
Enter fullscreen mode Exit fullscreen mode

Trade-offs and Considerations

When to Use Async Batch Logging

Use async batching when:

  • Logs are audit trail, not critical path
  • High-volume operations (1,000+ logs per job)
  • Write performance matters more than immediate consistency
  • You can tolerate slight delays in log visibility
  • Database is under heavy load

Don't use async batching when:

  • Logs must be visible immediately (real-time dashboards)
  • Transactional consistency required (logs + data in same transaction)
  • Very low volume (< 100 logs total)
  • Debugging active issues (synchronous easier to trace)

Data Consistency Considerations

Eventual consistency model:

  • Notifications complete at time T
  • Logs finish saving at time T + 1-2 minutes
  • During that window, dashboards may show incomplete data

Mitigation strategies:

  1. Add "in progress" status
   await jobStatusRepository.update(jobId, {
     status: 'SENDING',  // Not yet "COMPLETED"
     logsStatus: 'PENDING',
   });

   // After Promise.all(logPromises)
   await jobStatusRepository.update(jobId, {
     status: 'COMPLETED',
     logsStatus: 'SAVED',
   });
Enter fullscreen mode Exit fullscreen mode
  1. Show estimated counts
   Dashboard: "Sent: ~100,000 notifications (logs still processing)"
Enter fullscreen mode Exit fullscreen mode
  1. Eventual consistency indicator
   if (job.logsStatus === 'PENDING') {
     showWarning('Log counts may be incomplete for recent jobs');
   }
Enter fullscreen mode Exit fullscreen mode

Common Pitfalls and Solutions

Pitfall 1: Memory Leaks with Large Arrays

Problem:

const allLogs: PushNotificationLog[] = [];

// Load 100,000 logs into memory at once
for (let i = 0; i < 100000; i++) {
  allLogs.push(createLog(i));
}

// Memory usage: ~500MB for logs alone
Enter fullscreen mode Exit fullscreen mode

Solution: Stream or process in chunks

// Process in chunks to limit memory
const CHUNK_SIZE = 1000;

for (let i = 0; i < messages.length; i += CHUNK_SIZE) {
  const chunk = messages.slice(i, i + CHUNK_SIZE);
  const logs = chunk.map(msg => createLog(msg));

  await saveBatchOfLogs(logs);

  // Logs eligible for garbage collection after this
}
Enter fullscreen mode Exit fullscreen mode

Pitfall 2: Not Handling Connection Failures

Problem:

// No retry logic
await repository.insert().values(batch).execute();
Enter fullscreen mode Exit fullscreen mode

Solution: Implement retry with backoff

async function saveWithRetry(batch: any[], maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await repository.insert().values(batch).execute();
      return;  // Success
    } catch (error) {
      if (attempt === maxRetries) throw error;

      const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
      console.log(`Retry ${attempt}/${maxRetries} after ${delay}ms`);
      await sleep(delay);
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

Pitfall 3: Ignoring Transaction Isolation

Problem:

// In high-concurrency scenarios, rows can be skipped or duplicated
const lastId = await getLastLogId();
const newLog = createLog({ id: lastId + 1 });
await save(newLog);  // Race condition!
Enter fullscreen mode Exit fullscreen mode

Solution: Use database auto-increment or UUIDs

// Let database handle ID generation
@Entity()
export class PushNotificationLog {
  @PrimaryGeneratedColumn()
  id: number;  // Auto-increment

  // Or use UUID
  @PrimaryGeneratedColumn('uuid')
  id: string;
}
Enter fullscreen mode Exit fullscreen mode

Pitfall 4: Not Monitoring Log Lag

Problem: Logs queue up but you don't notice until system crashes

Solution: Monitor async queue depth

// Track pending log promises
class LoggingMonitor {
  private pendingPromises = new Set<Promise>();

  addPromise(promise: Promise) {
    this.pendingPromises.add(promise);

    promise.finally(() => {
      this.pendingPromises.delete(promise);
    });

    // Alert if too many pending
    if (this.pendingPromises.size > 50) {
      this.alertService.warn('log-queue-growing', {
        queueSize: this.pendingPromises.size,
      });
    }
  }

  async waitForAll() {
    await Promise.all(Array.from(this.pendingPromises));
  }
}
Enter fullscreen mode Exit fullscreen mode

Lessons Learned

1. Async Doesn't Mean Fire-and-Forget

Initially, I thought "async logging" meant "don't care about logs." Wrong!

Key insight: Async means "deferred, not ignored."

  • Still need to ensure logs eventually save
  • Still need to handle errors
  • Still need to wait at job completion
  • Just don't block the critical path

2. Batch Size Matters More Than You Think

I experimented with different batch sizes:

Batch Size Queries Time Sweet Spot?
1 100,000 90 min ❌ Too slow
10 10,000 15 min ❌ Still slow
50 2,000 4 min ⚠️ Better
80 1,250 2 min ✅ Optimal
150 667 1.5 min ❌ Param limit

The winner: 80 rows per batch

  • Balances parameter limit safety
  • Minimizes query count
  • Optimal for MSSQL's query planner

3. Parallel Processing Has Diminishing Returns

Testing parallel batch processing:

Concurrency Time DB CPU Connections
1 (serial) 3.2min 25% 2
2 1.8min 35% 4
3 1.4min 42% 6
5 1.3min 68% 10
10 1.3min 95% 20
20 1.4min 98% 35

Lesson: Find the knee of the curve (3-5 for most setups)

4. Error Handling Is Not Optional

In production, I saw:

  • Network blips causing batch failures
  • Deadlocks in concurrent writes
  • Temporary connection pool exhaustion
  • Constraint violations from data issues

Without proper error handling:

  • Jobs failed completely
  • Thousands of logs lost
  • No visibility into what failed

With proper error handling:

  • Jobs complete even with partial log failures
  • Individual failed logs recorded
  • Automatic retries for transient errors
  • Full audit trail maintained

Conclusion

Transforming push notification logging from a synchronous bottleneck to an asynchronous batch processing system was one of the most impactful optimizations in this project. The 11x improvement in total job time and 80x reduction in database queries came from understanding a few key principles:

  1. Decouple critical path from audit trail - Logs shouldn't block notifications
  2. Batch relentlessly - Databases love bulk operations
  3. Parallelize wisely - 3-5 concurrent operations hits the sweet spot
  4. Handle errors gracefully - Logging errors shouldn't fail the job

The combination of async promises (Promise.all()), batch INSERTs (80 rows per query), and controlled parallelism (3 concurrent batches) reduced log saving time from 82 minutes to 2 minutes while maintaining 99.5% success rate.

Key Takeaways

  • Separate critical path operations from audit/logging operations
  • Use batch INSERT instead of individual saves (80x faster)
  • Process batches in parallel (3x additional speedup)
  • Handle errors gracefully with fallback strategies
  • Respect database parameter limits (MSSQL: 2100 params)
  • Monitor async queue depth to prevent memory issues
  • Use Promise.all() to wait for async operations before job completion
  • Balance concurrency based on connection pool size and database capacity

Total improvement: 100,000 notifications

  • Before: 90 minutes (with timeouts and failures)
  • After: 8 minutes (reliable, scalable)
  • Overall improvement: 11x faster 🚀

Top comments (0)