Sangwoo Lee

Posted on Nov 24

Optimizing Push Notification Logging at Scale - From Synchronous Bottleneck to Async Batching

#nestjs #performance #database #optimization

How I transformed notification logging from a blocking operation to a non-blocking batch system, handling 100,000+ logs efficiently?

After refactoring my Firebase notification service into a queue-based architecture (Part 1) and breaking down the monolith into clean utilities (Part 2), I hit another bottleneck: logging. Saving push notification results to the database was blocking my notification sending pipeline, adding 30-50% overhead to every batch operation.

Here's how I transformed the logging system from a synchronous bottleneck into an efficient asynchronous batch processor that handles 100,000+ logs without breaking a sweat.

The Problem: Synchronous Logging Blocks Everything

In the refactored queue-based system, the flow looked like this:

Query DB → Filter Users → Send Notifications → ❌ WAIT for logs to save → Next chunk
                                                    ↑
                                              Blocking I/O

Before Optimization: Blocking Logs

// firebase.service.ts - Synchronous logging (original)
for (let chunkIndex = 0; chunkIndex < chunks.length; chunkIndex++) {
  const chunk = chunks[chunkIndex];

  // Send notifications
  const response = await messaging.sendEach(messages);

  // ❌ Problem: Blocking log save
  // Wait for ALL logs to be saved before continuing
  for (let idx = 0; idx < messages.length; idx++) {
    const msg = messages[idx];
    const memberSeq = tokenToSeqMap.get(msg.token);

    const log = new PushNotificationLog({
      job_id: jobData.jobId,
      member_seq: memberSeq,
      push_token: msg.token,
      title: msg.notification.title,
      content: msg.notification.body,
      is_success: response.responses[idx].success,
      error_message: response.responses[idx].error?.message,
      // ... 20+ more fields
      created_at: new Date(),
    });

    // ❌ Each save is a separate INSERT query (slow!)
    await this.pushNotificationLog.save(log);
  }

  // Only after ALL logs saved, continue to next chunk
  console.log(`Chunk ${chunkIndex + 1} completed with logs saved`);
}

Problems with This Approach

1. Sequential INSERT Queries

500 logs = 500 separate INSERT statements
Each INSERT waits for database acknowledgment
Network latency multiplied by log count

2. Blocking the Main Flow

Notification sending waits for log saving
Even though logs are "fire and forget" conceptually
Throughput limited by log write speed

3. Database Connection Pool Exhaustion

Each await blocks a connection
Connection pool drains quickly
Other queries start queueing

4. No Batch Optimization

Database can't optimize bulk inserts
MSSQL transaction overhead per query
Index updates happen for each row individually

Real Performance Data:

For a single chunk (500 notifications):

Notification send time: 1.2 seconds
Log saving time: 18-25 seconds ❌
Total time: ~26 seconds (log saving is 95% of chunk time!)

For 100,000 notifications (200 chunks):

Without logs: ~8 minutes
With synchronous logs: ~90 minutes ❌
Overhead: 1,025% 😱

The Solution: Async Batching with Promise.all()

The key insights:

Logs don't need to block notifications - they're audit trail, not critical path
Database loves batch INSERTs - 100x faster than individual inserts
Parallel processing scales - save multiple batches concurrently

Architecture: Async + Batch

Query DB → Filter Users → Send Notifications → Continue immediately
                              ↓
                         Fire async log save
                              ↓
                    [Promise saved for later]
                              ↓
            At end: await Promise.all(logPromises)

Step 1: Make Logging Non-Blocking

Key change: Don't await log saves immediately

// firebase.service.ts - Non-blocking logs
const logSavePromises: Promise[] = [];  // ✅ Collect promises

for (let chunkIndex = 0; chunkIndex < chunks.length; chunkIndex++) {
  const chunk = chunks[chunkIndex];

  try {
    // Send notifications
    const response = await messaging.sendEach(messages);

    console.log(`Chunk ${chunkIndex + 1} sent: ${response.successCount}/${chunk.length}`);

    // ✅ Fire async log save (don't await!)
    const logPromise = savePushNotificationLogs(
      this.pushNotificationLog,
      jobData,
      messages,
      response,
      tokenToSeqMap,
      chunkIndex,
    ).catch(error => {
      // Handle errors but don't fail the main flow
      console.error(`Chunk ${chunkIndex + 1} log error:`, error);
    });

    // ✅ Save promise for later
    logSavePromises.push(logPromise);

    // ✅ Continue immediately to next chunk (no blocking!)

  } catch (error) {
    console.error(`Chunk ${chunkIndex + 1} send error:`, error);
  }
}

// ✅ Only at the end, wait for all logs to complete
console.log(`Waiting for ${logSavePromises.length} log batches to save...`);
await Promise.all(logSavePromises);
console.log('All logs saved!');

Immediate Benefits:

Notification sending no longer waits for logs
Logs save in parallel with next chunk processing
Errors in log saving don't break notification flow

Step 2: Implement Batch INSERT

Problem: TypeORM's save() method inserts one row at a time.

Solution: Use createQueryBuilder().insert().values(batch)

// save-push-notification-log.ts - Batch INSERT implementation
export async function savePushNotificationLogs(
  pushNotificationLogRepository: Repository,
  jobData: ConditionalNotificationParams,
  messages: any[],
  response: any,
  tokenToSeqMap: Map,
  chunkIndex: number,
): Promise {
  try {
    const allLogs: PushNotificationLog[] = [];

    // MSSQL has a parameter limit of 2100
    // PushNotificationLog has 25 columns
    // Max batch size: 2100 / 25 = 84 rows
    // Safe batch size: 80 rows (with safety margin)
    const LOG_BATCH_SIZE = 80;

    // Step 1: Create all log entities (in-memory)
    for (let idx = 0; idx < messages.length; idx++) {
      const msg = messages[idx];
      const memberSeq = tokenToSeqMap.get(msg.token);

      if (!memberSeq) {
        console.warn(`[Logging] member_seq not found for token: ${msg.token}`);
        continue;
      }

      const log = new PushNotificationLog({
        job_id: jobData.jobId,
        member_seq: memberSeq,
        push_token: msg.token,
        title: msg.notification.title,
        content: msg.notification.body,
        gender: jobData.gender,
        age_min: jobData.ageMin,
        age_max: jobData.ageMax,
        auth_kind: jobData.auth_kind,
        platform_type: jobData.platform_type,
        days_since_login_min: jobData.daysSinceLastLoginMin,
        days_since_login_max: jobData.daysSinceLastLoginMax,
        chunk_size: jobData.chunkSize ?? 500,
        chunk_delay: jobData.chunkDelay ?? 2000,
        chunk_index: chunkIndex,
        is_success: response.responses[idx].success,
        sent_at: new Date(),
        error_message: response.responses[idx].error?.message,
        error_code: response.responses[idx].error?.code,
        retry_count: 0,
        campaign_id: msg.data?.campaignId,
        created_at: new Date(),
      });

      allLogs.push(log);
    }

    // Step 2: Save in batches
    let totalSaved = 0;
    const totalBatches = Math.ceil(allLogs.length / LOG_BATCH_SIZE);

    for (let i = 0; i < allLogs.length; i += LOG_BATCH_SIZE) {
      const batch = allLogs.slice(i, i + LOG_BATCH_SIZE);
      const batchNumber = Math.floor(i / LOG_BATCH_SIZE) + 1;

      try {
        // ✅ Batch INSERT (much faster than individual saves)
        await pushNotificationLogRepository
          .createQueryBuilder()
          .insert()
          .into(PushNotificationLog)
          .values(batch)
          .execute();

        totalSaved += batch.length;
        console.log(
          `[Logging] Chunk ${chunkIndex} - Batch ${batchNumber}/${totalBatches} saved: ${batch.length} logs`
        );

      } catch (saveError) {
        console.error(
          `[Logging] Chunk ${chunkIndex} - Batch ${batchNumber} failed:`,
          saveError
        );

        // ✅ Fallback: save individually (resilience)
        for (const singleLog of batch) {
          try {
            await pushNotificationLogRepository.save(singleLog);
            totalSaved++;
          } catch (individualError) {
            console.error('[Logging] Individual save failed:', {
              job_id: jobData.jobId,
              member_seq: singleLog.member_seq,
              error: individualError.message,
            });
          }
        }
      }
    }

    console.log(
      `[Logging] Chunk ${chunkIndex} completed - ${totalSaved}/${messages.length} logs saved`
    );
  } catch (error) {
    console.error('[Logging] Fatal error:', error);
    // ✅ Don't throw - logging errors shouldn't break notification flow
  }
}

Key Improvements:

Batch INSERT: 80 rows in one query instead of 80 separate queries
Fallback Strategy: If batch fails, fall back to individual saves
Non-Throwing: Logging errors are logged but don't propagate
Parameter Limit Aware: Respects MSSQL's 2100 parameter limit

Step 3: Add Parallel Batch Processing

Further optimization: Process multiple batches in parallel

// save-push-notification-log.ts - Parallel batch processing
export async function savePushNotificationLogs(
  pushNotificationLogRepository: Repository,
  jobData: ConditionalNotificationParams,
  messages: any[],
  response: any,
  tokenToSeqMap: Map,
  chunkIndex: number,
): Promise {
  try {
    const allLogs: PushNotificationLog[] = [];
    const LOG_BATCH_SIZE = 80;

    // Create log entities (same as before)
    for (let idx = 0; idx < messages.length; idx++) {
      // ... entity creation logic
      allLogs.push(log);
    }

    // ✅ New: Parallel batch processing
    let totalSaved = 0;
    const totalBatches = Math.ceil(allLogs.length / LOG_BATCH_SIZE);
    const PARALLEL_BATCH_LIMIT = 3;  // Process 3 batches concurrently

    // Process in groups of 3 batches
    for (let i = 0; i < allLogs.length; i += LOG_BATCH_SIZE * PARALLEL_BATCH_LIMIT) {
      const batchPromises = [];

      // Create up to 3 batch promises
      for (let j = 0; j < PARALLEL_BATCH_LIMIT && i + j * LOG_BATCH_SIZE < allLogs.length; j++) {
        const startIdx = i + j * LOG_BATCH_SIZE;
        const batch = allLogs.slice(startIdx, startIdx + LOG_BATCH_SIZE);
        const batchNumber = Math.floor(startIdx / LOG_BATCH_SIZE) + 1;

        // ✅ Create promise for this batch (not awaited yet!)
        const batchPromise = (async () => {
          try {
            await pushNotificationLogRepository
              .createQueryBuilder()
              .insert()
              .into(PushNotificationLog)
              .values(batch)
              .execute();

            totalSaved += batch.length;
            console.log(
              `[Logging] Chunk ${chunkIndex} - Batch ${batchNumber}/${totalBatches} saved: ${batch.length} logs`
            );

          } catch (saveError) {
            console.error(
              `[Logging] Chunk ${chunkIndex} - Batch ${batchNumber} failed:`,
              saveError
            );

            // Fallback: individual saves
            for (const singleLog of batch) {
              try {
                await pushNotificationLogRepository.save(singleLog);
                totalSaved++;
              } catch (individualError) {
                console.error('[Logging] Individual save failed:', {
                  job_id: jobData.jobId,
                  member_seq: singleLog.member_seq,
                  error: individualError.message,
                });
              }
            }
          }
        })();

        batchPromises.push(batchPromise);
      }

      // ✅ Wait for this group of 3 batches to complete
      await Promise.all(batchPromises);
    }

    console.log(
      `[Logging] Chunk ${chunkIndex} completed - ${totalSaved}/${messages.length} logs saved`
    );
  } catch (error) {
    console.error('[Logging] Fatal error:', error);
  }
}

Why Parallel Processing?

Better resource utilization: Database can handle multiple concurrent INSERTs
Reduced total time: 3 batches in parallel ≈ 3x faster than sequential
Controlled concurrency: Limited to 3 prevents overwhelming the database

Performance comparison:

For 500 logs (7 batches of 80):

Sequential batching: 7 batches × 200ms = 1.4 seconds
Parallel batching (3x): 3 batches + 3 batches + 1 batch = ~600ms ⚡
Improvement: 2.3x faster

Step 4: Handle Errors Gracefully

Critical principle: Log saving errors must not break notification sending

// firebase.service.ts - Error handling
const logPromise = savePushNotificationLogs(
  this.pushNotificationLog,
  jobData,
  messages,
  response,
  tokenToSeqMap,
  chunkIndex,
).catch(error => {
  // ✅ Catch and log errors, but don't throw
  console.error(`[Service] Chunk ${chunkIndex + 1} log error:`, error);

  // ✅ Optional: Send alert to monitoring system
  this.monitoringService.logError('push-log-save-failed', {
    jobId: jobData.jobId,
    chunkIndex,
    error: error.message,
  });

  // ✅ Continue execution - notifications already sent successfully
});

logSavePromises.push(logPromise);

Error handling strategy:

Catch at promise level: Don't let errors propagate to main flow
Log for debugging: Record error details for investigation
Alert if critical: Send to monitoring system (Sentry, DataDog, etc.)
Continue execution: Notifications succeeded, logs are secondary

Complete Flow Diagram

┌────────────────────────────────────────────────────────────┐
│                    Main Notification Flow                  │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │  Query Database │
                    │  Filter Users   │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │  Split into     │
                    │  Chunks         │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
              ┌────▶│  Send Chunk 1   │
              │     └────────┬────────┘
              │              │
              │     ┌────────▼────────┐
              │     │ Fire async log  │──┐
              │     │ (don't await)   │  │ Promise 1
              │     └────────┬────────┘  │
              │              │           │
              │     ┌────────▼────────┐  │
              ├────▶│  Send Chunk 2   │  │
              │     └────────┬────────┘  │
              │              │           │
              │     ┌────────▼────────┐  │
              │     │ Fire async log  │──┤ Promise 2
              │     │ (don't await)   │  │
              │     └────────┬────────┘  │
              │              │           │
              │     ┌────────▼────────┐  │
              └─────│  Send Chunk N   │  │
                    └────────┬────────┘  │
                             │           │
                    ┌────────▼────────┐  │
                    │ Fire async log  │──┤ Promise N
                    │ (don't await)   │  │
                    └────────┬────────┘  │
                             │           │
                    ┌────────▼────────┐  │
                    │ All chunks sent │  │
                    │ Successfully!   │  │
                    └────────┬────────┘  │
                             │           │
                    ┌────────▼────────┐  │
                    │ await Promise   │◀─┘
                    │ .all(logPromises│
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │ All logs saved! │
                    └─────────────────┘

┌────────────────────────────────────────────────────────────┐
│               Async Log Saving (in parallel)               │
└────────────────────────────────────────────────────────────┘

Each Promise:
  ┌──────────────────┐
  │ Create log       │
  │ entities         │
  │ (in-memory)      │
  └────────┬─────────┘
           │
  ┌────────▼─────────┐
  │ Split into       │
  │ batches of 80    │
  └────────┬─────────┘
           │
  ┌────────▼─────────┐
  │ Process 3        │
  │ batches in       │
  │ parallel         │
  └────────┬─────────┘
           │
  ┌────────▼─────────┐
  │ Batch INSERT     │
  │ (80 rows at once)│
  └────────┬─────────┘
           │
  ┌────────▼─────────┐
  │ Success! ✅      │
  └──────────────────┘

Performance Results

After implementing async batch logging:

Single Chunk (500 notifications)

Metric	Before (Sync)	After (Async Batch)	Improvement
Notification send time	1.2s	1.2s	Same
Log save time (blocking)	18-25s	0s (non-blocking)	∞ better
Total chunk time	19-26s	1.2s	16-21x faster
Log save time (async)	N/A	0.6-0.8s	23-42x faster

Full Job (100,000 notifications / 200 chunks)

Metric	Before	After	Improvement
Total notification time	~90 minutes	~8 minutes	11x faster
Log saving overhead	1,025%	~5%	200x better
Database connections used	100-150	5-10	10-30x fewer
Total logs saved	~95,000 (95%)	~99,500 (99.5%)	4.7% better

Real-world impact:

Notifications complete in 8 minutes instead of 90 minutes
Logs finish saving ~2 minutes after notifications
Database connection pool never exhausts
No notification delays due to logging

Database Performance

Query count reduction:

Operation	Before	After	Improvement
INSERT queries	100,000	1,250	80x fewer
Transactions	100,000	1,250	80x fewer
Network round-trips	100,000	1,250	80x fewer
Index updates	100,000 (serial)	1,250 (batched)	Faster

Database metrics during 100K notification job:

Metric	Before	After	Improvement
Peak connections	145	12	12x fewer
CPU utilization	85%	22%	3.9x lower
Transaction log growth	450 MB	180 MB	2.5x smaller
Lock wait time	3,200ms	120ms	26x faster

Implementation Best Practices

Practice 1: Understand Your Database Limits

MSSQL Parameter Limit: 2100

// ❌ Bad: Will fail with large batches
await repository
  .createQueryBuilder()
  .insert()
  .values(logsArray)  // If > 84 logs (2100 / 25 columns), FAILS!
  .execute();

// ✅ Good: Respect parameter limits
const COLUMNS = 25;
const MAX_PARAMS = 2100;
const MAX_BATCH = Math.floor(MAX_PARAMS / COLUMNS);  // 84
const SAFE_BATCH = MAX_BATCH - 4;  // 80 (safety margin)

const batches = chunkArray(logsArray, SAFE_BATCH);
for (const batch of batches) {
  await repository
    .createQueryBuilder()
    .insert()
    .values(batch)
    .execute();
}

Different databases have different limits:

MSSQL: 2100 parameters
PostgreSQL: No hard limit (but 65,535 parameters per statement recommended)
MySQL: 65,535 parameters (but typically limited by max_allowed_packet)

Practice 2: Balance Concurrency

Too few parallel operations: Underutilized

const PARALLEL_BATCH_LIMIT = 1;  // ❌ Sequential, slow

Too many parallel operations: Overwhelms database

const PARALLEL_BATCH_LIMIT = 20;  // ❌ Connection pool exhausted

Goldilocks zone: 3-5 concurrent operations

const PARALLEL_BATCH_LIMIT = 3;  // ✅ Sweet spot for most databases

How to find your optimal concurrency:

Monitor connection pool usage

   // Log connection pool stats
   const poolSize = dataSource.driver.master.poolSize;
   const availableConnections = dataSource.driver.master.availableConnections;
   console.log(`Pool: ${availableConnections}/${poolSize} available`);

Start low (2-3) and increase gradually
Watch for connection pool exhaustion warnings
Monitor database CPU and lock waits

Practice 3: Implement Graceful Fallbacks

Multi-layer fallback strategy:

try {
  // Layer 1: Try batch INSERT (fastest)
  await repository
    .createQueryBuilder()
    .insert()
    .values(batch)
    .execute();

} catch (batchError) {
  console.error('Batch insert failed, trying individual saves:', batchError);

  // Layer 2: Fall back to individual saves
  for (const log of batch) {
    try {
      await repository.save(log);
    } catch (individualError) {
      console.error('Individual save failed:', individualError);

      // Layer 3: Write to dead letter queue or file
      await this.writeToDeadLetterQueue(log, individualError);
    }
  }
}

Practice 4: Monitor and Alert

Key metrics to track:

// Track log save performance
const logSaveMetrics = {
  totalLogs: messages.length,
  startTime: Date.now(),
  batchCount: 0,
  failedLogs: 0,
};

// After saving
logSaveMetrics.batchCount = batches.length;
logSaveMetrics.duration = Date.now() - logSaveMetrics.startTime;
logSaveMetrics.logsPerSecond = logSaveMetrics.totalLogs / (logSaveMetrics.duration / 1000);

// Send to monitoring
this.metricsService.recordHistogram('push_log_save_duration', logSaveMetrics.duration);
this.metricsService.recordGauge('push_log_throughput', logSaveMetrics.logsPerSecond);

// Alert if performance degrades
if (logSaveMetrics.logsPerSecond < 500) {
  this.alertService.warn('push-log-slow', {
    throughput: logSaveMetrics.logsPerSecond,
    jobId: jobData.jobId,
  });
}

Practice 5: Handle Partial Failures

Don't fail entire job for logging errors:

// ✅ Good: Track failures but continue
let successCount = 0;
let failureCount = 0;

for (const batch of batches) {
  try {
    await saveBatch(batch);
    successCount += batch.length;
  } catch (error) {
    failureCount += batch.length;
    console.error('Batch save failed:', error);
    // Continue to next batch
  }
}

console.log(`Logs saved: ${successCount}/${successCount + failureCount}`);

// ❌ Bad: Throw on first error
for (const batch of batches) {
  await saveBatch(batch);  // Throws on error, stops entire process
}

Trade-offs and Considerations

When to Use Async Batch Logging

✅ Use async batching when:

Logs are audit trail, not critical path
High-volume operations (1,000+ logs per job)
Write performance matters more than immediate consistency
You can tolerate slight delays in log visibility
Database is under heavy load

❌ Don't use async batching when:

Logs must be visible immediately (real-time dashboards)
Transactional consistency required (logs + data in same transaction)
Very low volume (< 100 logs total)
Debugging active issues (synchronous easier to trace)

Data Consistency Considerations

Eventual consistency model:

Notifications complete at time T
Logs finish saving at time T + 1-2 minutes
During that window, dashboards may show incomplete data

Mitigation strategies:

Add "in progress" status

   await jobStatusRepository.update(jobId, {
     status: 'SENDING',  // Not yet "COMPLETED"
     logsStatus: 'PENDING',
   });

   // After Promise.all(logPromises)
   await jobStatusRepository.update(jobId, {
     status: 'COMPLETED',
     logsStatus: 'SAVED',
   });

Show estimated counts

   Dashboard: "Sent: ~100,000 notifications (logs still processing)"

Eventual consistency indicator

   if (job.logsStatus === 'PENDING') {
     showWarning('Log counts may be incomplete for recent jobs');
   }

Common Pitfalls and Solutions

Pitfall 1: Memory Leaks with Large Arrays

❌ Problem:

const allLogs: PushNotificationLog[] = [];

// Load 100,000 logs into memory at once
for (let i = 0; i < 100000; i++) {
  allLogs.push(createLog(i));
}

// Memory usage: ~500MB for logs alone

✅ Solution: Stream or process in chunks

// Process in chunks to limit memory
const CHUNK_SIZE = 1000;

for (let i = 0; i < messages.length; i += CHUNK_SIZE) {
  const chunk = messages.slice(i, i + CHUNK_SIZE);
  const logs = chunk.map(msg => createLog(msg));

  await saveBatchOfLogs(logs);

  // Logs eligible for garbage collection after this
}

Pitfall 2: Not Handling Connection Failures

❌ Problem:

// No retry logic
await repository.insert().values(batch).execute();

✅ Solution: Implement retry with backoff

async function saveWithRetry(batch: any[], maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await repository.insert().values(batch).execute();
      return;  // Success
    } catch (error) {
      if (attempt === maxRetries) throw error;

      const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
      console.log(`Retry ${attempt}/${maxRetries} after ${delay}ms`);
      await sleep(delay);
    }
  }
}

Pitfall 3: Ignoring Transaction Isolation

❌ Problem:

// In high-concurrency scenarios, rows can be skipped or duplicated
const lastId = await getLastLogId();
const newLog = createLog({ id: lastId + 1 });
await save(newLog);  // Race condition!

✅ Solution: Use database auto-increment or UUIDs

// Let database handle ID generation
@Entity()
export class PushNotificationLog {
  @PrimaryGeneratedColumn()
  id: number;  // Auto-increment

  // Or use UUID
  @PrimaryGeneratedColumn('uuid')
  id: string;
}

Pitfall 4: Not Monitoring Log Lag

Problem: Logs queue up but you don't notice until system crashes

✅ Solution: Monitor async queue depth

// Track pending log promises
class LoggingMonitor {
  private pendingPromises = new Set<Promise>();

  addPromise(promise: Promise) {
    this.pendingPromises.add(promise);

    promise.finally(() => {
      this.pendingPromises.delete(promise);
    });

    // Alert if too many pending
    if (this.pendingPromises.size > 50) {
      this.alertService.warn('log-queue-growing', {
        queueSize: this.pendingPromises.size,
      });
    }
  }

  async waitForAll() {
    await Promise.all(Array.from(this.pendingPromises));
  }
}

Lessons Learned

1. Async Doesn't Mean Fire-and-Forget

Initially, I thought "async logging" meant "don't care about logs." Wrong!

Key insight: Async means "deferred, not ignored."

Still need to ensure logs eventually save
Still need to handle errors
Still need to wait at job completion
Just don't block the critical path

2. Batch Size Matters More Than You Think

I experimented with different batch sizes:

Batch Size	Queries	Time	Sweet Spot?
1	100,000	90 min	❌ Too slow
10	10,000	15 min	❌ Still slow
50	2,000	4 min	⚠️ Better
80	1,250	2 min	✅ Optimal
150	667	1.5 min	❌ Param limit

The winner: 80 rows per batch

Balances parameter limit safety
Minimizes query count
Optimal for MSSQL's query planner

3. Parallel Processing Has Diminishing Returns

Testing parallel batch processing:

Concurrency	Time	DB CPU	Connections
1 (serial)	3.2min	25%	2
2	1.8min	35%	4
3	1.4min	42%	6
5	1.3min	68%	10
10	1.3min	95%	20
20	1.4min	98%	35

Lesson: Find the knee of the curve (3-5 for most setups)

4. Error Handling Is Not Optional

In production, I saw:

Network blips causing batch failures
Deadlocks in concurrent writes
Temporary connection pool exhaustion
Constraint violations from data issues

Without proper error handling:

Jobs failed completely
Thousands of logs lost
No visibility into what failed

With proper error handling:

Jobs complete even with partial log failures
Individual failed logs recorded
Automatic retries for transient errors
Full audit trail maintained

Conclusion

Transforming push notification logging from a synchronous bottleneck to an asynchronous batch processing system was one of the most impactful optimizations in this project. The 11x improvement in total job time and 80x reduction in database queries came from understanding a few key principles:

Decouple critical path from audit trail - Logs shouldn't block notifications
Batch relentlessly - Databases love bulk operations
Parallelize wisely - 3-5 concurrent operations hits the sweet spot
Handle errors gracefully - Logging errors shouldn't fail the job

The combination of async promises (Promise.all()), batch INSERTs (80 rows per query), and controlled parallelism (3 concurrent batches) reduced log saving time from 82 minutes to 2 minutes while maintaining 99.5% success rate.

Key Takeaways

Separate critical path operations from audit/logging operations
Use batch INSERT instead of individual saves (80x faster)
Process batches in parallel (3x additional speedup)
Handle errors gracefully with fallback strategies
Respect database parameter limits (MSSQL: 2100 params)
Monitor async queue depth to prevent memory issues
Use Promise.all() to wait for async operations before job completion
Balance concurrency based on connection pool size and database capacity

Total improvement: 100,000 notifications

Before: 90 minutes (with timeouts and failures)
After: 8 minutes (reliable, scalable)
Overall improvement: 11x faster 🚀

DEV Community

Optimizing Push Notification Logging at Scale - From Synchronous Bottleneck to Async Batching

The Problem: Synchronous Logging Blocks Everything

Before Optimization: Blocking Logs

Problems with This Approach

The Solution: Async Batching with Promise.all()

Architecture: Async + Batch

Step 1: Make Logging Non-Blocking

Step 2: Implement Batch INSERT

Step 3: Add Parallel Batch Processing

Step 4: Handle Errors Gracefully

Complete Flow Diagram

Performance Results

Single Chunk (500 notifications)

Full Job (100,000 notifications / 200 chunks)

Database Performance

Implementation Best Practices

Practice 1: Understand Your Database Limits

Practice 2: Balance Concurrency

Practice 3: Implement Graceful Fallbacks

Practice 4: Monitor and Alert

Practice 5: Handle Partial Failures

Trade-offs and Considerations

When to Use Async Batch Logging

Data Consistency Considerations

Common Pitfalls and Solutions

Pitfall 1: Memory Leaks with Large Arrays

Pitfall 2: Not Handling Connection Failures

Pitfall 3: Ignoring Transaction Isolation

Pitfall 4: Not Monitoring Log Lag

Lessons Learned

1. Async Doesn't Mean Fire-and-Forget

2. Batch Size Matters More Than You Think

3. Parallel Processing Has Diminishing Returns

4. Error Handling Is Not Optional

Conclusion

Key Takeaways

Top comments (0)