How I transformed notification logging from a blocking operation to a non-blocking batch system, handling 100,000+ logs efficiently?
After refactoring my Firebase notification service into a queue-based architecture (Part 1) and breaking down the monolith into clean utilities (Part 2), I hit another bottleneck: logging. Saving push notification results to the database was blocking my notification sending pipeline, adding 30-50% overhead to every batch operation.
Here's how I transformed the logging system from a synchronous bottleneck into an efficient asynchronous batch processor that handles 100,000+ logs without breaking a sweat.
The Problem: Synchronous Logging Blocks Everything
In the refactored queue-based system, the flow looked like this:
Query DB → Filter Users → Send Notifications → ❌ WAIT for logs to save → Next chunk
↑
Blocking I/O
Before Optimization: Blocking Logs
// firebase.service.ts - Synchronous logging (original)
for (let chunkIndex = 0; chunkIndex < chunks.length; chunkIndex++) {
const chunk = chunks[chunkIndex];
// Send notifications
const response = await messaging.sendEach(messages);
// ❌ Problem: Blocking log save
// Wait for ALL logs to be saved before continuing
for (let idx = 0; idx < messages.length; idx++) {
const msg = messages[idx];
const memberSeq = tokenToSeqMap.get(msg.token);
const log = new PushNotificationLog({
job_id: jobData.jobId,
member_seq: memberSeq,
push_token: msg.token,
title: msg.notification.title,
content: msg.notification.body,
is_success: response.responses[idx].success,
error_message: response.responses[idx].error?.message,
// ... 20+ more fields
created_at: new Date(),
});
// ❌ Each save is a separate INSERT query (slow!)
await this.pushNotificationLog.save(log);
}
// Only after ALL logs saved, continue to next chunk
console.log(`Chunk ${chunkIndex + 1} completed with logs saved`);
}
Problems with This Approach
1. Sequential INSERT Queries
- 500 logs = 500 separate INSERT statements
- Each INSERT waits for database acknowledgment
- Network latency multiplied by log count
2. Blocking the Main Flow
- Notification sending waits for log saving
- Even though logs are "fire and forget" conceptually
- Throughput limited by log write speed
3. Database Connection Pool Exhaustion
- Each await blocks a connection
- Connection pool drains quickly
- Other queries start queueing
4. No Batch Optimization
- Database can't optimize bulk inserts
- MSSQL transaction overhead per query
- Index updates happen for each row individually
Real Performance Data:
For a single chunk (500 notifications):
- Notification send time: 1.2 seconds
- Log saving time: 18-25 seconds ❌
- Total time: ~26 seconds (log saving is 95% of chunk time!)
For 100,000 notifications (200 chunks):
- Without logs: ~8 minutes
- With synchronous logs: ~90 minutes ❌
- Overhead: 1,025% 😱
The Solution: Async Batching with Promise.all()
The key insights:
- Logs don't need to block notifications - they're audit trail, not critical path
- Database loves batch INSERTs - 100x faster than individual inserts
- Parallel processing scales - save multiple batches concurrently
Architecture: Async + Batch
Query DB → Filter Users → Send Notifications → Continue immediately
↓
Fire async log save
↓
[Promise saved for later]
↓
At end: await Promise.all(logPromises)
Step 1: Make Logging Non-Blocking
Key change: Don't await log saves immediately
// firebase.service.ts - Non-blocking logs
const logSavePromises: Promise[] = []; // ✅ Collect promises
for (let chunkIndex = 0; chunkIndex < chunks.length; chunkIndex++) {
const chunk = chunks[chunkIndex];
try {
// Send notifications
const response = await messaging.sendEach(messages);
console.log(`Chunk ${chunkIndex + 1} sent: ${response.successCount}/${chunk.length}`);
// ✅ Fire async log save (don't await!)
const logPromise = savePushNotificationLogs(
this.pushNotificationLog,
jobData,
messages,
response,
tokenToSeqMap,
chunkIndex,
).catch(error => {
// Handle errors but don't fail the main flow
console.error(`Chunk ${chunkIndex + 1} log error:`, error);
});
// ✅ Save promise for later
logSavePromises.push(logPromise);
// ✅ Continue immediately to next chunk (no blocking!)
} catch (error) {
console.error(`Chunk ${chunkIndex + 1} send error:`, error);
}
}
// ✅ Only at the end, wait for all logs to complete
console.log(`Waiting for ${logSavePromises.length} log batches to save...`);
await Promise.all(logSavePromises);
console.log('All logs saved!');
Immediate Benefits:
- Notification sending no longer waits for logs
- Logs save in parallel with next chunk processing
- Errors in log saving don't break notification flow
Step 2: Implement Batch INSERT
Problem: TypeORM's save() method inserts one row at a time.
Solution: Use createQueryBuilder().insert().values(batch)
// save-push-notification-log.ts - Batch INSERT implementation
export async function savePushNotificationLogs(
pushNotificationLogRepository: Repository,
jobData: ConditionalNotificationParams,
messages: any[],
response: any,
tokenToSeqMap: Map,
chunkIndex: number,
): Promise {
try {
const allLogs: PushNotificationLog[] = [];
// MSSQL has a parameter limit of 2100
// PushNotificationLog has 25 columns
// Max batch size: 2100 / 25 = 84 rows
// Safe batch size: 80 rows (with safety margin)
const LOG_BATCH_SIZE = 80;
// Step 1: Create all log entities (in-memory)
for (let idx = 0; idx < messages.length; idx++) {
const msg = messages[idx];
const memberSeq = tokenToSeqMap.get(msg.token);
if (!memberSeq) {
console.warn(`[Logging] member_seq not found for token: ${msg.token}`);
continue;
}
const log = new PushNotificationLog({
job_id: jobData.jobId,
member_seq: memberSeq,
push_token: msg.token,
title: msg.notification.title,
content: msg.notification.body,
gender: jobData.gender,
age_min: jobData.ageMin,
age_max: jobData.ageMax,
auth_kind: jobData.auth_kind,
platform_type: jobData.platform_type,
days_since_login_min: jobData.daysSinceLastLoginMin,
days_since_login_max: jobData.daysSinceLastLoginMax,
chunk_size: jobData.chunkSize ?? 500,
chunk_delay: jobData.chunkDelay ?? 2000,
chunk_index: chunkIndex,
is_success: response.responses[idx].success,
sent_at: new Date(),
error_message: response.responses[idx].error?.message,
error_code: response.responses[idx].error?.code,
retry_count: 0,
campaign_id: msg.data?.campaignId,
created_at: new Date(),
});
allLogs.push(log);
}
// Step 2: Save in batches
let totalSaved = 0;
const totalBatches = Math.ceil(allLogs.length / LOG_BATCH_SIZE);
for (let i = 0; i < allLogs.length; i += LOG_BATCH_SIZE) {
const batch = allLogs.slice(i, i + LOG_BATCH_SIZE);
const batchNumber = Math.floor(i / LOG_BATCH_SIZE) + 1;
try {
// ✅ Batch INSERT (much faster than individual saves)
await pushNotificationLogRepository
.createQueryBuilder()
.insert()
.into(PushNotificationLog)
.values(batch)
.execute();
totalSaved += batch.length;
console.log(
`[Logging] Chunk ${chunkIndex} - Batch ${batchNumber}/${totalBatches} saved: ${batch.length} logs`
);
} catch (saveError) {
console.error(
`[Logging] Chunk ${chunkIndex} - Batch ${batchNumber} failed:`,
saveError
);
// ✅ Fallback: save individually (resilience)
for (const singleLog of batch) {
try {
await pushNotificationLogRepository.save(singleLog);
totalSaved++;
} catch (individualError) {
console.error('[Logging] Individual save failed:', {
job_id: jobData.jobId,
member_seq: singleLog.member_seq,
error: individualError.message,
});
}
}
}
}
console.log(
`[Logging] Chunk ${chunkIndex} completed - ${totalSaved}/${messages.length} logs saved`
);
} catch (error) {
console.error('[Logging] Fatal error:', error);
// ✅ Don't throw - logging errors shouldn't break notification flow
}
}
Key Improvements:
- Batch INSERT: 80 rows in one query instead of 80 separate queries
- Fallback Strategy: If batch fails, fall back to individual saves
- Non-Throwing: Logging errors are logged but don't propagate
- Parameter Limit Aware: Respects MSSQL's 2100 parameter limit
Step 3: Add Parallel Batch Processing
Further optimization: Process multiple batches in parallel
// save-push-notification-log.ts - Parallel batch processing
export async function savePushNotificationLogs(
pushNotificationLogRepository: Repository,
jobData: ConditionalNotificationParams,
messages: any[],
response: any,
tokenToSeqMap: Map,
chunkIndex: number,
): Promise {
try {
const allLogs: PushNotificationLog[] = [];
const LOG_BATCH_SIZE = 80;
// Create log entities (same as before)
for (let idx = 0; idx < messages.length; idx++) {
// ... entity creation logic
allLogs.push(log);
}
// ✅ New: Parallel batch processing
let totalSaved = 0;
const totalBatches = Math.ceil(allLogs.length / LOG_BATCH_SIZE);
const PARALLEL_BATCH_LIMIT = 3; // Process 3 batches concurrently
// Process in groups of 3 batches
for (let i = 0; i < allLogs.length; i += LOG_BATCH_SIZE * PARALLEL_BATCH_LIMIT) {
const batchPromises = [];
// Create up to 3 batch promises
for (let j = 0; j < PARALLEL_BATCH_LIMIT && i + j * LOG_BATCH_SIZE < allLogs.length; j++) {
const startIdx = i + j * LOG_BATCH_SIZE;
const batch = allLogs.slice(startIdx, startIdx + LOG_BATCH_SIZE);
const batchNumber = Math.floor(startIdx / LOG_BATCH_SIZE) + 1;
// ✅ Create promise for this batch (not awaited yet!)
const batchPromise = (async () => {
try {
await pushNotificationLogRepository
.createQueryBuilder()
.insert()
.into(PushNotificationLog)
.values(batch)
.execute();
totalSaved += batch.length;
console.log(
`[Logging] Chunk ${chunkIndex} - Batch ${batchNumber}/${totalBatches} saved: ${batch.length} logs`
);
} catch (saveError) {
console.error(
`[Logging] Chunk ${chunkIndex} - Batch ${batchNumber} failed:`,
saveError
);
// Fallback: individual saves
for (const singleLog of batch) {
try {
await pushNotificationLogRepository.save(singleLog);
totalSaved++;
} catch (individualError) {
console.error('[Logging] Individual save failed:', {
job_id: jobData.jobId,
member_seq: singleLog.member_seq,
error: individualError.message,
});
}
}
}
})();
batchPromises.push(batchPromise);
}
// ✅ Wait for this group of 3 batches to complete
await Promise.all(batchPromises);
}
console.log(
`[Logging] Chunk ${chunkIndex} completed - ${totalSaved}/${messages.length} logs saved`
);
} catch (error) {
console.error('[Logging] Fatal error:', error);
}
}
Why Parallel Processing?
- Better resource utilization: Database can handle multiple concurrent INSERTs
- Reduced total time: 3 batches in parallel ≈ 3x faster than sequential
- Controlled concurrency: Limited to 3 prevents overwhelming the database
Performance comparison:
For 500 logs (7 batches of 80):
- Sequential batching: 7 batches × 200ms = 1.4 seconds
- Parallel batching (3x): 3 batches + 3 batches + 1 batch = ~600ms ⚡
- Improvement: 2.3x faster
Step 4: Handle Errors Gracefully
Critical principle: Log saving errors must not break notification sending
// firebase.service.ts - Error handling
const logPromise = savePushNotificationLogs(
this.pushNotificationLog,
jobData,
messages,
response,
tokenToSeqMap,
chunkIndex,
).catch(error => {
// ✅ Catch and log errors, but don't throw
console.error(`[Service] Chunk ${chunkIndex + 1} log error:`, error);
// ✅ Optional: Send alert to monitoring system
this.monitoringService.logError('push-log-save-failed', {
jobId: jobData.jobId,
chunkIndex,
error: error.message,
});
// ✅ Continue execution - notifications already sent successfully
});
logSavePromises.push(logPromise);
Error handling strategy:
- Catch at promise level: Don't let errors propagate to main flow
- Log for debugging: Record error details for investigation
- Alert if critical: Send to monitoring system (Sentry, DataDog, etc.)
- Continue execution: Notifications succeeded, logs are secondary
Complete Flow Diagram
┌────────────────────────────────────────────────────────────┐
│ Main Notification Flow │
└────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Query Database │
│ Filter Users │
└────────┬────────┘
│
┌────────▼────────┐
│ Split into │
│ Chunks │
└────────┬────────┘
│
┌────────▼────────┐
┌────▶│ Send Chunk 1 │
│ └────────┬────────┘
│ │
│ ┌────────▼────────┐
│ │ Fire async log │──┐
│ │ (don't await) │ │ Promise 1
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
├────▶│ Send Chunk 2 │ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Fire async log │──┤ Promise 2
│ │ (don't await) │ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
└─────│ Send Chunk N │ │
└────────┬────────┘ │
│ │
┌────────▼────────┐ │
│ Fire async log │──┤ Promise N
│ (don't await) │ │
└────────┬────────┘ │
│ │
┌────────▼────────┐ │
│ All chunks sent │ │
│ Successfully! │ │
└────────┬────────┘ │
│ │
┌────────▼────────┐ │
│ await Promise │◀─┘
│ .all(logPromises│
└────────┬────────┘
│
┌────────▼────────┐
│ All logs saved! │
└─────────────────┘
┌────────────────────────────────────────────────────────────┐
│ Async Log Saving (in parallel) │
└────────────────────────────────────────────────────────────┘
Each Promise:
┌──────────────────┐
│ Create log │
│ entities │
│ (in-memory) │
└────────┬─────────┘
│
┌────────▼─────────┐
│ Split into │
│ batches of 80 │
└────────┬─────────┘
│
┌────────▼─────────┐
│ Process 3 │
│ batches in │
│ parallel │
└────────┬─────────┘
│
┌────────▼─────────┐
│ Batch INSERT │
│ (80 rows at once)│
└────────┬─────────┘
│
┌────────▼─────────┐
│ Success! ✅ │
└──────────────────┘
Performance Results
After implementing async batch logging:
Single Chunk (500 notifications)
| Metric | Before (Sync) | After (Async Batch) | Improvement |
|---|---|---|---|
| Notification send time | 1.2s | 1.2s | Same |
| Log save time (blocking) | 18-25s | 0s (non-blocking) | ∞ better |
| Total chunk time | 19-26s | 1.2s | 16-21x faster |
| Log save time (async) | N/A | 0.6-0.8s | 23-42x faster |
Full Job (100,000 notifications / 200 chunks)
| Metric | Before | After | Improvement |
|---|---|---|---|
| Total notification time | ~90 minutes | ~8 minutes | 11x faster |
| Log saving overhead | 1,025% | ~5% | 200x better |
| Database connections used | 100-150 | 5-10 | 10-30x fewer |
| Total logs saved | ~95,000 (95%) | ~99,500 (99.5%) | 4.7% better |
Real-world impact:
- Notifications complete in 8 minutes instead of 90 minutes
- Logs finish saving ~2 minutes after notifications
- Database connection pool never exhausts
- No notification delays due to logging
Database Performance
Query count reduction:
| Operation | Before | After | Improvement |
|---|---|---|---|
| INSERT queries | 100,000 | 1,250 | 80x fewer |
| Transactions | 100,000 | 1,250 | 80x fewer |
| Network round-trips | 100,000 | 1,250 | 80x fewer |
| Index updates | 100,000 (serial) | 1,250 (batched) | Faster |
Database metrics during 100K notification job:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Peak connections | 145 | 12 | 12x fewer |
| CPU utilization | 85% | 22% | 3.9x lower |
| Transaction log growth | 450 MB | 180 MB | 2.5x smaller |
| Lock wait time | 3,200ms | 120ms | 26x faster |
Implementation Best Practices
Practice 1: Understand Your Database Limits
MSSQL Parameter Limit: 2100
// ❌ Bad: Will fail with large batches
await repository
.createQueryBuilder()
.insert()
.values(logsArray) // If > 84 logs (2100 / 25 columns), FAILS!
.execute();
// ✅ Good: Respect parameter limits
const COLUMNS = 25;
const MAX_PARAMS = 2100;
const MAX_BATCH = Math.floor(MAX_PARAMS / COLUMNS); // 84
const SAFE_BATCH = MAX_BATCH - 4; // 80 (safety margin)
const batches = chunkArray(logsArray, SAFE_BATCH);
for (const batch of batches) {
await repository
.createQueryBuilder()
.insert()
.values(batch)
.execute();
}
Different databases have different limits:
- MSSQL: 2100 parameters
- PostgreSQL: No hard limit (but 65,535 parameters per statement recommended)
-
MySQL: 65,535 parameters (but typically limited by
max_allowed_packet)
Practice 2: Balance Concurrency
Too few parallel operations: Underutilized
const PARALLEL_BATCH_LIMIT = 1; // ❌ Sequential, slow
Too many parallel operations: Overwhelms database
const PARALLEL_BATCH_LIMIT = 20; // ❌ Connection pool exhausted
Goldilocks zone: 3-5 concurrent operations
const PARALLEL_BATCH_LIMIT = 3; // ✅ Sweet spot for most databases
How to find your optimal concurrency:
- Monitor connection pool usage
// Log connection pool stats
const poolSize = dataSource.driver.master.poolSize;
const availableConnections = dataSource.driver.master.availableConnections;
console.log(`Pool: ${availableConnections}/${poolSize} available`);
- Start low (2-3) and increase gradually
- Watch for connection pool exhaustion warnings
- Monitor database CPU and lock waits
Practice 3: Implement Graceful Fallbacks
Multi-layer fallback strategy:
try {
// Layer 1: Try batch INSERT (fastest)
await repository
.createQueryBuilder()
.insert()
.values(batch)
.execute();
} catch (batchError) {
console.error('Batch insert failed, trying individual saves:', batchError);
// Layer 2: Fall back to individual saves
for (const log of batch) {
try {
await repository.save(log);
} catch (individualError) {
console.error('Individual save failed:', individualError);
// Layer 3: Write to dead letter queue or file
await this.writeToDeadLetterQueue(log, individualError);
}
}
}
Practice 4: Monitor and Alert
Key metrics to track:
// Track log save performance
const logSaveMetrics = {
totalLogs: messages.length,
startTime: Date.now(),
batchCount: 0,
failedLogs: 0,
};
// After saving
logSaveMetrics.batchCount = batches.length;
logSaveMetrics.duration = Date.now() - logSaveMetrics.startTime;
logSaveMetrics.logsPerSecond = logSaveMetrics.totalLogs / (logSaveMetrics.duration / 1000);
// Send to monitoring
this.metricsService.recordHistogram('push_log_save_duration', logSaveMetrics.duration);
this.metricsService.recordGauge('push_log_throughput', logSaveMetrics.logsPerSecond);
// Alert if performance degrades
if (logSaveMetrics.logsPerSecond < 500) {
this.alertService.warn('push-log-slow', {
throughput: logSaveMetrics.logsPerSecond,
jobId: jobData.jobId,
});
}
Practice 5: Handle Partial Failures
Don't fail entire job for logging errors:
// ✅ Good: Track failures but continue
let successCount = 0;
let failureCount = 0;
for (const batch of batches) {
try {
await saveBatch(batch);
successCount += batch.length;
} catch (error) {
failureCount += batch.length;
console.error('Batch save failed:', error);
// Continue to next batch
}
}
console.log(`Logs saved: ${successCount}/${successCount + failureCount}`);
// ❌ Bad: Throw on first error
for (const batch of batches) {
await saveBatch(batch); // Throws on error, stops entire process
}
Trade-offs and Considerations
When to Use Async Batch Logging
✅ Use async batching when:
- Logs are audit trail, not critical path
- High-volume operations (1,000+ logs per job)
- Write performance matters more than immediate consistency
- You can tolerate slight delays in log visibility
- Database is under heavy load
❌ Don't use async batching when:
- Logs must be visible immediately (real-time dashboards)
- Transactional consistency required (logs + data in same transaction)
- Very low volume (< 100 logs total)
- Debugging active issues (synchronous easier to trace)
Data Consistency Considerations
Eventual consistency model:
- Notifications complete at time T
- Logs finish saving at time T + 1-2 minutes
- During that window, dashboards may show incomplete data
Mitigation strategies:
- Add "in progress" status
await jobStatusRepository.update(jobId, {
status: 'SENDING', // Not yet "COMPLETED"
logsStatus: 'PENDING',
});
// After Promise.all(logPromises)
await jobStatusRepository.update(jobId, {
status: 'COMPLETED',
logsStatus: 'SAVED',
});
- Show estimated counts
Dashboard: "Sent: ~100,000 notifications (logs still processing)"
- Eventual consistency indicator
if (job.logsStatus === 'PENDING') {
showWarning('Log counts may be incomplete for recent jobs');
}
Common Pitfalls and Solutions
Pitfall 1: Memory Leaks with Large Arrays
❌ Problem:
const allLogs: PushNotificationLog[] = [];
// Load 100,000 logs into memory at once
for (let i = 0; i < 100000; i++) {
allLogs.push(createLog(i));
}
// Memory usage: ~500MB for logs alone
✅ Solution: Stream or process in chunks
// Process in chunks to limit memory
const CHUNK_SIZE = 1000;
for (let i = 0; i < messages.length; i += CHUNK_SIZE) {
const chunk = messages.slice(i, i + CHUNK_SIZE);
const logs = chunk.map(msg => createLog(msg));
await saveBatchOfLogs(logs);
// Logs eligible for garbage collection after this
}
Pitfall 2: Not Handling Connection Failures
❌ Problem:
// No retry logic
await repository.insert().values(batch).execute();
✅ Solution: Implement retry with backoff
async function saveWithRetry(batch: any[], maxRetries = 3) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
await repository.insert().values(batch).execute();
return; // Success
} catch (error) {
if (attempt === maxRetries) throw error;
const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
console.log(`Retry ${attempt}/${maxRetries} after ${delay}ms`);
await sleep(delay);
}
}
}
Pitfall 3: Ignoring Transaction Isolation
❌ Problem:
// In high-concurrency scenarios, rows can be skipped or duplicated
const lastId = await getLastLogId();
const newLog = createLog({ id: lastId + 1 });
await save(newLog); // Race condition!
✅ Solution: Use database auto-increment or UUIDs
// Let database handle ID generation
@Entity()
export class PushNotificationLog {
@PrimaryGeneratedColumn()
id: number; // Auto-increment
// Or use UUID
@PrimaryGeneratedColumn('uuid')
id: string;
}
Pitfall 4: Not Monitoring Log Lag
Problem: Logs queue up but you don't notice until system crashes
✅ Solution: Monitor async queue depth
// Track pending log promises
class LoggingMonitor {
private pendingPromises = new Set<Promise>();
addPromise(promise: Promise) {
this.pendingPromises.add(promise);
promise.finally(() => {
this.pendingPromises.delete(promise);
});
// Alert if too many pending
if (this.pendingPromises.size > 50) {
this.alertService.warn('log-queue-growing', {
queueSize: this.pendingPromises.size,
});
}
}
async waitForAll() {
await Promise.all(Array.from(this.pendingPromises));
}
}
Lessons Learned
1. Async Doesn't Mean Fire-and-Forget
Initially, I thought "async logging" meant "don't care about logs." Wrong!
Key insight: Async means "deferred, not ignored."
- Still need to ensure logs eventually save
- Still need to handle errors
- Still need to wait at job completion
- Just don't block the critical path
2. Batch Size Matters More Than You Think
I experimented with different batch sizes:
| Batch Size | Queries | Time | Sweet Spot? |
|---|---|---|---|
| 1 | 100,000 | 90 min | ❌ Too slow |
| 10 | 10,000 | 15 min | ❌ Still slow |
| 50 | 2,000 | 4 min | ⚠️ Better |
| 80 | 1,250 | 2 min | ✅ Optimal |
| 150 | 667 | 1.5 min | ❌ Param limit |
The winner: 80 rows per batch
- Balances parameter limit safety
- Minimizes query count
- Optimal for MSSQL's query planner
3. Parallel Processing Has Diminishing Returns
Testing parallel batch processing:
| Concurrency | Time | DB CPU | Connections |
|---|---|---|---|
| 1 (serial) | 3.2min | 25% | 2 |
| 2 | 1.8min | 35% | 4 |
| 3 | 1.4min | 42% | 6 |
| 5 | 1.3min | 68% | 10 |
| 10 | 1.3min | 95% | 20 |
| 20 | 1.4min | 98% | 35 |
Lesson: Find the knee of the curve (3-5 for most setups)
4. Error Handling Is Not Optional
In production, I saw:
- Network blips causing batch failures
- Deadlocks in concurrent writes
- Temporary connection pool exhaustion
- Constraint violations from data issues
Without proper error handling:
- Jobs failed completely
- Thousands of logs lost
- No visibility into what failed
With proper error handling:
- Jobs complete even with partial log failures
- Individual failed logs recorded
- Automatic retries for transient errors
- Full audit trail maintained
Conclusion
Transforming push notification logging from a synchronous bottleneck to an asynchronous batch processing system was one of the most impactful optimizations in this project. The 11x improvement in total job time and 80x reduction in database queries came from understanding a few key principles:
- Decouple critical path from audit trail - Logs shouldn't block notifications
- Batch relentlessly - Databases love bulk operations
- Parallelize wisely - 3-5 concurrent operations hits the sweet spot
- Handle errors gracefully - Logging errors shouldn't fail the job
The combination of async promises (Promise.all()), batch INSERTs (80 rows per query), and controlled parallelism (3 concurrent batches) reduced log saving time from 82 minutes to 2 minutes while maintaining 99.5% success rate.
Key Takeaways
- Separate critical path operations from audit/logging operations
- Use batch INSERT instead of individual saves (80x faster)
- Process batches in parallel (3x additional speedup)
- Handle errors gracefully with fallback strategies
- Respect database parameter limits (MSSQL: 2100 params)
- Monitor async queue depth to prevent memory issues
- Use
Promise.all()to wait for async operations before job completion - Balance concurrency based on connection pool size and database capacity
Total improvement: 100,000 notifications
- Before: 90 minutes (with timeouts and failures)
- After: 8 minutes (reliable, scalable)
- Overall improvement: 11x faster 🚀
Top comments (0)