How we measure actual device reach rates by analyzing production send results, classify error types, and maintain 85%+ delivery rates across 50M monthly notifications
"Our dry-run predicted 79% delivery rate. Production hit 80.2%."
My engineering lead looked at me skeptically. "That's... suspiciously accurate. How do we know the tokens we validated are still valid when we actually send?"
He had a point. Token validation (dry-run) tells you if a token was valid at validation time. Actual delivery tells you if the token was valid at send time. In high-volume push systems with millions of users, those two moments can be minutes, hours, or days apart.
In this post, I'll show you how we measure real device delivery rates by analyzing actual FCM send results, classify errors to understand why messages fail, and use that data to maintain 85%+ delivery rates across 50 million monthly notifications.
The gap between validation and reality
Here's what most tutorials don't tell you: Firebase dry-run and actual sends can produce different results.
Dry-run validation (covered in previous post):
const response = await admin.messaging().send(message, true); // dryRun = true
// Resolves with a message ID: the token is valid RIGHT NOW
Actual send (this post):
const response = await admin.messaging().send(message); // dryRun defaults to false
// Resolves with a message ID: FCM ACCEPTED the message for delivery
Why the difference matters:
| Scenario | Dry-run Result | Actual Send Result |
|---|---|---|
| User uninstalled app 5 minutes ago | ✅ Success | ❌ invalid-registration-token |
| User's device is offline | ✅ Success | ✅ Success (queued for delivery) |
| FCM quota exceeded | ✅ Success | ❌ quota-exceeded |
| Network timeout during send | ✅ Success | ❌ unavailable |
The reality: You need BOTH validation and delivery measurement.
- Validation (dry-run): Pre-send health check
- Delivery measurement (actual send): Ground truth
This post focuses on the latter: measuring what actually happened.
Why measure delivery rates at all?
Business impact:
When we started measuring delivery rates in January 2025, we discovered:
- Reported: "Sent to 500,000 users"
- Reality: Delivered to 350,000 devices (70% delivery rate)
- Gap: 150,000 users never had a chance to see the notification
Stakeholder impact:
Product Manager: "We sent to 500K users, why did only 80K click?"
Dev Team: "That's a 16% click rate!"
Product Manager: "No, it's actually 23% click rate (80K / 350K actual reach)"
Without delivery rate tracking, we were misunderstanding our engagement metrics.
Cost impact:
- Processing 150K invalid tokens = 24 minutes of server time per campaign
- 50 campaigns/month × 24 minutes = 20 hours/month wasted
- Database writes for failed sends = 7.5M unnecessary operations/month
How Firebase FCM responses reveal delivery truth
Every FCM send returns detailed response data:
const response = await admin.messaging().sendEachForMulticast({
tokens: ['token1', 'token2', ...],
notification: { title: 'Hello', body: 'World' },
data: { campaignId: '12345' },
});
console.log(response);
Response structure:
{
successCount: 7243,
failureCount: 2757,
responses: [
// Success example
{
success: true,
messageId: 'projects/my-app/messages/0:1234567890'
},
// Failure examples
{
success: false,
error: {
code: 'messaging/invalid-registration-token',
message: 'The registration token is not valid anymore'
}
},
{
success: false,
error: {
code: 'messaging/server-unavailable',
message: 'The server is temporarily unavailable'
}
},
// ... 10,000 total responses (one per token)
]
}
The goldmine: The error.code field tells you exactly WHY a send failed.
Error classification: the key to actionable metrics
Not all failures are equal. Some are permanent (bad token), others are temporary (retry might work).
// fcm-error-classifier.ts
export type FcmErrorType = 'invalid_token' | 'temporary' | 'quota' | 'other';
export function classifyFcmError(errorCode?: string): FcmErrorType {
if (!errorCode) return 'other';
// ❌ PERMANENT FAILURES - Token is completely dead
const INVALID_TOKEN_ERRORS = [
'messaging/invalid-registration-token',
'messaging/registration-token-not-registered',
'messaging/invalid-argument',
];
if (INVALID_TOKEN_ERRORS.includes(errorCode)) {
return 'invalid_token';
}
// ⏳ TEMPORARY FAILURES - Retry might succeed
const TEMPORARY_ERRORS = [
'messaging/unavailable',
'messaging/internal-error',
'messaging/server-unavailable',
'messaging/timeout',
'messaging/unknown-error',
];
if (TEMPORARY_ERRORS.includes(errorCode)) {
return 'temporary';
}
// 🚫 QUOTA EXCEEDED - Rate limiting
if (errorCode === 'messaging/quota-exceeded') {
return 'quota';
}
// ❓ UNKNOWN ERRORS
return 'other';
}
Why this classification matters:
Invalid Token (❌ Permanent):
- Action: Remove from database immediately
- Retry: Pointless - will always fail
- Cause: User uninstalled app, token expired, device was factory reset
Temporary (⏳ Retry-able):
- Action: Retry after backoff delay
- Retry: 70-80% success rate on retry
- Cause: Network hiccup, FCM infrastructure maintenance, device temporarily offline
Quota (🚫 Rate Limit):
- Action: Wait and retry later
- Retry: 100% success after rate limit window
- Cause: Too many requests too fast
Other (❓ Unknown):
- Action: Log for investigation
- Retry: Case-by-case decision
- Cause: New error types, unexpected scenarios
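The four buckets above map one-to-one to a remediation action. Here is a minimal dispatcher sketch; the action labels are mine, not the post's code, so wire them to your own cleanup queue and retry scheduler:

```typescript
// Map each error class to the remediation described above.
// Reuses the FcmErrorType union from fcm-error-classifier.ts.
type FcmErrorType = 'invalid_token' | 'temporary' | 'quota' | 'other';

interface FailureAction {
  kind: 'remove_token' | 'retry_with_backoff' | 'wait_for_quota' | 'log_for_triage';
  retryable: boolean;
}

export function actionFor(errorType: FcmErrorType): FailureAction {
  switch (errorType) {
    case 'invalid_token': // permanent: the token will never work again
      return { kind: 'remove_token', retryable: false };
    case 'temporary': // transient: backoff + retry recovers most of these
      return { kind: 'retry_with_backoff', retryable: true };
    case 'quota': // rate-limited: succeeds after the limit window passes
      return { kind: 'wait_for_quota', retryable: true };
    default: // unknown: log and investigate before retrying blindly
      return { kind: 'log_for_triage', retryable: false };
  }
}
```

Keeping the mapping in one pure function makes the send pipeline easy to unit-test without touching FCM.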
Storing delivery results: the audit trail
We store every single send result in the database for analysis:
// push-notification-log.entity.ts
@Entity({ name: 'push_notification_log' })
export class PushNotificationLog {
@PrimaryGeneratedColumn({ type: 'bigint' })
id: number;
@Column({ type: 'varchar', length: 200 })
job_id: string; // e.g., "production-blackfriday-2025"
@Column({ type: 'int' })
member_seq: number; // User identifier
@Column({ type: 'varchar', length: 500 })
push_token: string;
// ⭐ Success/failure tracking
@Column({ type: 'bit', default: false })
is_success: boolean;
@Column({ type: 'datetime2' })
sent_at: Date;
// ⭐ Error details (null if success)
@Column({ type: 'varchar', length: 50, nullable: true })
error_code: string; // FCM error code
@Column({ type: 'nvarchar', length: 500, nullable: true })
error_message: string;
// ⭐ Error classification for analytics
@Column({ type: 'varchar', length: 30, nullable: true })
error_type: 'invalid_token' | 'temporary' | 'quota' | 'other';
// Campaign details
@Column({ type: 'nvarchar', length: 200 })
title: string;
@Column({ type: 'nvarchar', length: 1000 })
content: string;
// Metadata
@Column({ type: 'int', nullable: true })
chunk_index: number; // Which batch was this part of
@Column({ type: 'bit', nullable: true, default: false })
is_dry_run: boolean; // false = production send
// Retry tracking
@Column({ type: 'int', default: 0 })
retry_count: number; // How many times retried
@Column({ type: 'bit', default: false })
retry_success: boolean; // Did retry succeed?
}
Indexes for fast queries:
@Index(['job_id', 'is_success']) // Fast delivery rate queries
@Index(['job_id', 'error_type']) // Fast error breakdown
@Index(['sent_at']) // Time-series analysis
@Index(['member_seq', 'sent_at']) // User-level tracking
export class PushNotificationLog { /* ... */ }
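As a usage sketch, the (job_id, is_success) index lets a delivery-rate lookup run as two cheap COUNTs instead of loading every log row. The function names and the structural repository type below are mine (a stand-in for the relevant slice of a TypeORM repository), not the post's code:

```typescript
// Minimal structural type for the one repository method this sketch uses,
// so it stands alone without importing typeorm.
interface CountableRepo {
  count(options: { where: Record<string, unknown> }): Promise<number>;
}

// Percentage helper, matching the rounding used elsewhere in the post.
export function toRate(delivered: number, total: number): number {
  return total > 0 ? parseFloat(((delivered / total) * 100).toFixed(2)) : 0;
}

// Two indexed COUNTs against (job_id, is_success) — no table scan.
export async function deliveryRateFor(repo: CountableRepo, jobId: string): Promise<number> {
  const total = await repo.count({ where: { job_id: jobId } });
  const delivered = await repo.count({ where: { job_id: jobId, is_success: true } });
  return toRate(delivered, total);
}
```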
Implementation: capturing delivery results
Here's how we save every send result:
// firebase.service.ts
async sendConditionalNotifications(
jobData: ConditionalNotificationParams
): Promise<{
success: boolean;
totalTokens: number;
deliveredCount: number;
failedCount: number;
deliveryRate: number;
errorStats: Record<FcmErrorType, number>;
}> {
// ... Get target tokens from database ...
const tokens = await this.getTargetTokens(jobData);
const chunks = chunkArray(tokens, 500); // 500 tokens per chunk
let totalSuccess = 0;
let totalFailed = 0;
const errorStats = {
invalid_token: 0,
temporary: 0,
quota: 0,
other: 0,
};
for (let chunkIndex = 0; chunkIndex < chunks.length; chunkIndex++) {
const chunk = chunks[chunkIndex];
// Build FCM messages
const messages = chunk.map(token => ({
token,
notification: {
title: jobData.title,
body: jobData.content
},
data: {
job_id: jobData.jobId,
campaign_id: jobData.campaignId,
},
}));
try {
// ★ ACTUAL SEND (not dry-run)
const response = await this.firebaseApp
.messaging()
.sendEachForMulticast({
tokens: chunk,
notification: messages[0].notification,
data: messages[0].data,
});
console.log(`
Chunk ${chunkIndex + 1}/${chunks.length}:
✅ Success: ${response.successCount}
❌ Failed: ${response.failureCount}
`);
// ⭐ ANALYZE EACH RESPONSE
for (let i = 0; i < response.responses.length; i++) {
const resp = response.responses[i];
const message = messages[i];
// Create log entry
const log = new PushNotificationLog({
job_id: jobData.jobId,
member_seq: await this.getMemberSeq(message.token),
push_token: message.token,
title: jobData.title,
content: jobData.content,
sent_at: new Date(),
chunk_index: chunkIndex,
is_dry_run: false, // ✅ Production send
});
if (resp.success) {
// ✅ Success - message delivered
log.is_success = true;
totalSuccess++;
} else {
// ❌ Failure - classify error
log.is_success = false;
log.error_code = resp.error?.code;
log.error_message = resp.error?.message;
log.error_type = classifyFcmError(resp.error?.code);
totalFailed++;
errorStats[log.error_type]++;
// Log detailed error for debugging
console.error(`
Token: ${message.token.substring(0, 30)}...
Error: ${resp.error?.code}
Message: ${resp.error?.message}
`);
}
// ⭐ SAVE TO DATABASE
await this.pushNotificationLog.save(log);
}
// Rate limiting (prevent quota-exceeded)
if (chunkIndex < chunks.length - 1) {
await delay(2000); // 2 seconds between chunks
}
} catch (error) {
console.error(`Chunk ${chunkIndex + 1} failed completely:`, error);
// Save error logs for entire chunk
for (const message of messages) {
const log = new PushNotificationLog({
job_id: jobData.jobId,
member_seq: await this.getMemberSeq(message.token),
push_token: message.token,
title: jobData.title,
content: jobData.content,
sent_at: new Date(),
chunk_index: chunkIndex,
is_success: false,
error_code: 'CHUNK_FAILURE',
error_message: error.message,
error_type: 'other',
is_dry_run: false,
});
await this.pushNotificationLog.save(log);
}
totalFailed += chunk.length;
}
}
// Calculate final delivery rate
const deliveryRate = totalSuccess > 0
? parseFloat(((totalSuccess / (totalSuccess + totalFailed)) * 100).toFixed(2))
: 0;
console.log(`
========== SEND COMPLETE ==========
Total tokens: ${(totalSuccess + totalFailed).toLocaleString()}
✅ Delivered: ${totalSuccess.toLocaleString()} (${deliveryRate}%)
❌ Failed: ${totalFailed.toLocaleString()}
Error breakdown:
- Invalid tokens: ${errorStats.invalid_token.toLocaleString()}
- Temporary errors: ${errorStats.temporary.toLocaleString()}
- Quota errors: ${errorStats.quota.toLocaleString()}
- Other errors: ${errorStats.other.toLocaleString()}
===================================
`);
return {
success: true,
totalTokens: totalSuccess + totalFailed,
deliveredCount: totalSuccess,
failedCount: totalFailed,
deliveryRate,
errorStats,
};
}
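The service above leans on two helpers it never defines, chunkArray and delay. Minimal implementations look like this (my sketch; the post's own versions may differ):

```typescript
// Split a token list into FCM-sized batches (sendEachForMulticast caps at 500 tokens).
export function chunkArray<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Promise-based sleep used for the 2-second pause between chunks.
export function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```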
Calculating delivery metrics: beyond simple success rates
With detailed error classification, we calculate multiple metrics:
// fcm-error-classifier.ts
export interface FcmErrorStats {
total: number;
success: number;
invalidToken: number;
temporary: number;
quota: number;
other: number;
deliveryRate: number;
successRate: number;
retryableRate: number;
}
export function calculateErrorStats(
logs: PushNotificationLog[]
): FcmErrorStats {
const total = logs.length;
let success = 0;
let invalidToken = 0;
let temporary = 0;
let quota = 0;
let other = 0;
for (const log of logs) {
if (log.is_success) {
success++;
} else {
switch (log.error_type) {
case 'invalid_token': invalidToken++; break;
case 'temporary': temporary++; break;
case 'quota': quota++; break;
default: other++; break;
}
}
}
// ⭐ DELIVERY RATE
// = Successfully delivered / Total attempted
const deliveryRate = total > 0
? parseFloat(((success / total) * 100).toFixed(2))
: 0;
// ⭐ SUCCESS RATE (same as delivery rate for actual sends)
const successRate = deliveryRate;
// ⭐ RETRYABLE RATE
// = Tokens that could succeed on retry / Total attempted
const retryableRate = total > 0
? parseFloat((((temporary + quota) / total) * 100).toFixed(2))
: 0;
return {
total,
success,
invalidToken,
temporary,
quota,
other,
deliveryRate,
successRate,
retryableRate,
};
}
Example output:
const stats = calculateErrorStats(productionLogs);
console.log(`
📊 Delivery Analysis:
======================
Total Attempted: ${stats.total.toLocaleString()}
✅ Delivered: ${stats.success.toLocaleString()}
❌ Invalid Tokens: ${stats.invalidToken.toLocaleString()}
⏳ Temporary Errors: ${stats.temporary.toLocaleString()}
🚫 Quota Errors: ${stats.quota.toLocaleString()}
❓ Other Errors: ${stats.other.toLocaleString()}
📈 Key Metrics:
- Delivery Rate: ${stats.deliveryRate}%
- Retryable Rate: ${stats.retryableRate}%
- Permanent Failure Rate: ${((stats.invalidToken / stats.total) * 100).toFixed(2)}%
`);
Sample output:
📊 Delivery Analysis:
======================
Total Attempted: 500,000
✅ Delivered: 401,000
❌ Invalid Tokens: 87,000
⏳ Temporary Errors: 11,000
🚫 Quota Errors: 800
❓ Other Errors: 200
📈 Key Metrics:
- Delivery Rate: 80.2%
- Retryable Rate: 2.4%
- Permanent Failure Rate: 17.4%
Retry logic: recovering from temporary failures
Temporary errors (unavailable, timeout) often succeed on retry. Here's our retry strategy:
// fcm-retry.utils.ts
export async function sendEachWithRetry(
messaging: admin.messaging.Messaging,
messages: admin.messaging.Message[],
isDryRun: boolean,
retryConfig: {
maxRetries: number;
initialDelayMs: number;
maxDelayMs: number;
}
): Promise<admin.messaging.BatchResponse> {
const { maxRetries, initialDelayMs, maxDelayMs } = retryConfig;
let attempt = 0;
let lastError: Error | null = null;
while (attempt <= maxRetries) {
try {
// Attempt send
const response = await messaging.sendEach(messages, isDryRun);
// If no failures, return immediately
if (response.failureCount === 0) {
return response;
}
// Check if failures are retryable
const retryableIndices: number[] = [];
response.responses.forEach((resp, idx) => {
if (!resp.success) {
const errorType = classifyFcmError(resp.error?.code);
if (errorType === 'temporary' || errorType === 'quota') {
retryableIndices.push(idx);
}
}
});
// If no retryable failures, return current response
if (retryableIndices.length === 0) {
console.log(`No retryable errors. Returning after attempt ${attempt + 1}`);
return response;
}
// Last attempt - return as-is
if (attempt === maxRetries) {
console.log(`Max retries (${maxRetries}) reached. Returning with ${retryableIndices.length} remaining failures.`);
return response;
}
// Prepare retry batch
const retryMessages = retryableIndices.map(idx => messages[idx]);
// Exponential backoff
const delayMs = Math.min(
initialDelayMs * Math.pow(2, attempt),
maxDelayMs
);
console.log(`
Attempt ${attempt + 1}/${maxRetries + 1}:
- Retryable failures: ${retryableIndices.length}
- Waiting ${delayMs}ms before retry
`);
await delay(delayMs);
// Retry just the failed messages
const retryResponse = await messaging.sendEach(retryMessages, isDryRun);
// Merge retry results back into original response
retryResponse.responses.forEach((retryResp, idx) => {
const originalIdx = retryableIndices[idx];
response.responses[originalIdx] = retryResp;
});
// Recalculate success/failure counts
response.successCount = response.responses.filter(r => r.success).length;
response.failureCount = response.responses.filter(r => !r.success).length;
console.log(`
After retry:
- Success: ${response.successCount}
- Failed: ${response.failureCount}
`);
// If all succeeded after retry, return
if (response.failureCount === 0) {
return response;
}
// Otherwise, continue to next retry attempt
attempt++;
} catch (error) {
console.error(`Attempt ${attempt + 1} threw exception:`, error);
lastError = error;
attempt++;
if (attempt <= maxRetries) {
const delayMs = Math.min(
initialDelayMs * Math.pow(2, attempt - 1),
maxDelayMs
);
await delay(delayMs);
}
}
}
// All retries failed
throw lastError || new Error('All retry attempts failed');
}
Retry configuration:
// Production sends
const response = await sendEachWithRetry(
messaging,
messages,
false, // isDryRun = false (actual send)
{
maxRetries: 3, // Try up to 3 times
initialDelayMs: 1000, // Start with 1 second
maxDelayMs: 5000, // Cap at 5 seconds
}
);
// Results:
// - Attempt 1: Instant
// - Attempt 2: After 1 second (2^0 * 1000ms)
// - Attempt 3: After 2 seconds (2^1 * 1000ms)
// - Attempt 4: After 4 seconds (2^2 * 1000ms, but capped at 5s)
Retry success rates (our production data):
Error Type | 1st Retry | 2nd Retry | 3rd Retry | Overall
--------------------|-----------|-----------|-----------|--------
temporary | 78% | 15% | 4% | 97%
quota | 95% | 4% | 1% | 100%
invalid_token | 0% | 0% | 0% | 0%
other | 45% | 20% | 10% | 75%
Key insight: 97% of temporary errors eventually succeed with retry!
Tracking retry results in the database
We track both initial send and retry outcomes:
// After initial send
const initialLog = new PushNotificationLog({
// ... basic fields ...
is_success: false,
error_type: 'temporary',
retry_count: 0,
retry_success: false,
});
await repository.save(initialLog);
// After successful retry
if (retryResponse.success) {
await repository.update(
{ id: initialLog.id },
{
retry_count: attemptNumber,
retry_success: true,
// Note: is_success stays false - original send failed
}
);
console.log(`Token ${token} succeeded on retry ${attemptNumber}`);
}
Query retry effectiveness:
-- Retry success rate by error type
SELECT
error_type,
COUNT(*) as total_failures,
SUM(CASE WHEN retry_success = 1 THEN 1 ELSE 0 END) as retry_successes,
CAST(SUM(CASE WHEN retry_success = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS DECIMAL(5,2)) as retry_success_rate
FROM push_notification_log
WHERE is_success = 0
AND retry_count > 0
AND sent_at >= DATEADD(month, -1, GETDATE())
GROUP BY error_type
ORDER BY total_failures DESC;
Example result:
error_type | total_failures | retry_successes | retry_success_rate
---------------|----------------|-----------------|-------------------
temporary | 45,230 | 43,875 | 97.0%
quota | 2,180 | 2,180 | 100.0%
other | 1,520 | 1,140 | 75.0%
invalid_token | 87,430 | 0 | 0.0%
API endpoint: real-time delivery statistics
We expose delivery metrics via REST API:
// GET /api/campaigns/:jobId/delivery-stats
async getDeliveryStats(jobId: string): Promise<FcmErrorStats & { retryAttempts: number; retrySuccesses: number; retrySuccessRate: number }> {
try {
const logs = await this.pushNotificationLog.find({
where: { job_id: jobId },
select: ['is_success', 'error_code', 'error_type', 'retry_count', 'retry_success'], // retry_count feeds the retry analysis below
});
if (logs.length === 0) {
throw new NotFoundException(`No logs found for campaign: ${jobId}`);
}
const stats = calculateErrorStats(logs);
// Additional retry analysis
const retriedLogs = logs.filter(log => log.retry_count > 0);
const retrySuccessCount = retriedLogs.filter(log => log.retry_success).length;
const retrySuccessRate = retriedLogs.length > 0
? parseFloat(((retrySuccessCount / retriedLogs.length) * 100).toFixed(2))
: 0;
console.log(`[getDeliveryStats] Campaign: ${jobId}`);
console.log(` Delivery Rate: ${stats.deliveryRate}%`);
console.log(` Retry Success Rate: ${retrySuccessRate}%`);
return {
...stats,
retryAttempts: retriedLogs.length,
retrySuccesses: retrySuccessCount,
retrySuccessRate,
};
} catch (error) {
console.error(`[getDeliveryStats] Error:`, error);
throw error;
}
}
Response example:
GET /api/campaigns/production-blackfriday-2025/delivery-stats
{
"total": 500000,
"success": 401000,
"invalidToken": 87000,
"temporary": 11000,
"quota": 800,
"other": 200,
"deliveryRate": 80.2,
"successRate": 80.2,
"retryableRate": 2.4,
"retryAttempts": 11800,
"retrySuccesses": 11450,
"retrySuccessRate": 97.0
}
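On the consuming side, a monitoring job can turn this payload into a go/no-go signal. A hedged sketch follows; the 85% target and the verdict labels are my assumptions, not part of the API above:

```typescript
// Subset of the /delivery-stats payload the check needs.
interface DeliveryStats {
  deliveryRate: number;  // e.g. 80.2
  retryableRate: number; // e.g. 2.4 (temporary + quota failures)
}

export type Verdict = 'healthy' | 'retry_recommended' | 'investigate';

// If the retryable failures could lift the campaign over the target,
// a retry pass is more useful than paging someone.
export function assessCampaign(stats: DeliveryStats, targetRate = 85): Verdict {
  if (stats.deliveryRate >= targetRate) return 'healthy';
  if (stats.deliveryRate + stats.retryableRate >= targetRate) return 'retry_recommended';
  return 'investigate';
}
```

For the sample payload above (80.2% delivered, 2.4% retryable), even a perfect retry pass tops out at 82.6%, so the verdict is 'investigate': the gap is dominated by invalid tokens, which retries cannot fix.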
Automated token cleanup: removing dead tokens
After each campaign, we automatically clean up invalid tokens:
// Runs after campaign completes
async function cleanupInvalidTokens(jobId: string) {
console.log(`[Cleanup] Starting for campaign: ${jobId}`);
// Get all permanently invalid tokens
const invalidLogs = await pushNotificationLog.find({
where: {
job_id: jobId,
error_type: 'invalid_token', // Permanent failures only
},
select: ['member_seq', 'push_token', 'error_code'],
});
console.log(`[Cleanup] Found ${invalidLogs.length} invalid tokens`);
if (invalidLogs.length === 0) {
console.log(`[Cleanup] Nothing to clean up`);
return;
}
// Batch update member table
const batchSize = 1000;
let updated = 0;
for (let i = 0; i < invalidLogs.length; i += batchSize) {
const batch = invalidLogs.slice(i, i + batchSize);
const memberSeqs = batch.map(log => log.member_seq);
await memberRepository
.createQueryBuilder()
.update(Member)
.set({
push_token_valid: false,
push_token_invalidated_at: () => 'GETDATE()',
push_token_invalid_reason: 'fcm_invalid_token',
})
.whereInIds(memberSeqs)
.execute();
updated += batch.length;
console.log(`[Cleanup] Updated ${updated}/${invalidLogs.length}`);
}
// Log cleanup event
await cleanupEventLog.save({
campaign_job_id: jobId,
tokens_invalidated: invalidLogs.length,
executed_at: new Date(),
});
console.log(`[Cleanup] ✅ Complete - ${invalidLogs.length} tokens marked invalid`);
}
Database schema for token validity:
ALTER TABLE member
ADD push_token_valid BIT DEFAULT 1,
push_token_invalidated_at DATETIME2 NULL,
push_token_invalid_reason VARCHAR(50) NULL;
-- Index for fast filtering
CREATE INDEX idx_member_valid_tokens
ON member(push_token_valid, push_token)
WHERE push_token IS NOT NULL;
Future queries automatically exclude invalid tokens:
// ✅ Only query valid tokens
const tokens = await memberRepository
.createQueryBuilder('m')
.where('m.push_token_valid = 1') // Valid tokens only
.andWhere('m.push_token IS NOT NULL')
.select(['m.push_token', 'm.seq'])
.getMany();
console.log(`Found ${tokens.length} valid tokens (invalid tokens excluded)`);
Measuring improvement over time
With consistent delivery tracking, we can measure token health trends:
-- Monthly delivery rate trends
SELECT
FORMAT(sent_at, 'yyyy-MM') as month,
COUNT(*) as total_sends,
SUM(CASE WHEN is_success = 1 THEN 1 ELSE 0 END) as successful_sends,
CAST(SUM(CASE WHEN is_success = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS DECIMAL(5,2)) as delivery_rate
FROM push_notification_log
WHERE is_dry_run = 0 -- Production sends only
AND sent_at >= DATEADD(month, -6, GETDATE())
GROUP BY FORMAT(sent_at, 'yyyy-MM')
ORDER BY month;
Our 6-month trend (with automated cleanup):
Month | Total Sends | Successful | Delivery Rate
---------|-------------|------------|---------------
2025-01 | 25,000,000 | 17,500,000 | 70.0%
2025-02 | 28,000,000 | 21,280,000 | 76.0%
2025-03 | 30,000,000 | 24,000,000 | 80.0%
2025-04 | 32,000,000 | 26,560,000 | 83.0%
2025-05 | 35,000,000 | 29,750,000 | 85.0%
2025-06 | 38,000,000 | 32,680,000 | 86.0%
Improvement: 70% → 86% delivery rate (+16 percentage points)
Reason: Automated cleanup after each campaign
Production metrics: 50M monthly notifications
Current performance (June 2025):
Monthly Volume: 50,000,000 notifications
Delivery Rate: 86.0%
Retry Success Rate: 97.0%
Error Breakdown:
- Invalid tokens: 12.0% (down from 30% in Jan)
- Temporary errors: 1.8%
- Quota errors: 0.1%
- Other errors: 0.1%
Time Saved (vs. no cleanup):
- 18 percentage points fewer invalid tokens per campaign (30% → 12%)
- ~12 minutes saved per campaign
- 50 campaigns/month × 12 min = 600 min/month = 10 hours/month
Cost savings (compared to January baseline):
Before cleanup automation (Jan 2025):
- Invalid token rate: 30%
- Monthly invalid sends: 15,000,000
- Server time wasted: 40 hours/month
- DB operations wasted: 15M writes/month
After cleanup automation (Jun 2025):
- Invalid token rate: 12%
- Monthly invalid sends: 6,000,000
- Server time wasted: 16 hours/month
- DB operations wasted: 6M writes/month
Savings:
- Server time: 24 hours/month × $0.10/min × 60 = $144/month = $1,728/year
- DB operations: 9M writes/month × $0.0001 = $900/month = $10,800/year
- Total: $12,528/year
Key takeaways
1. Measure actual delivery, not just attempts
// ❌ Misleading metric
console.log(`Sent to ${totalAttempts} users!`);
// ✅ Accurate metric
console.log(`Delivered to ${actualSuccess} devices (${deliveryRate}%)`);
2. Classify errors for actionable insights
- Invalid tokens: Remove immediately
- Temporary errors: Retry with backoff
- Quota errors: Wait and retry
- Other errors: Investigate and classify
3. Retry temporary failures (97% eventually succeed)
const response = await sendEachWithRetry(messaging, messages, false, {
maxRetries: 3,
initialDelayMs: 1000,
maxDelayMs: 5000,
});
4. Automate token cleanup after every campaign
- Improves delivery rate over time (70% → 86% in 6 months)
- Reduces wasted processing (24 hours/month saved)
- Maintains database health automatically
5. Track trends to measure improvement
SELECT FORMAT(sent_at, 'yyyy-MM') AS month,
AVG(delivery_rate) AS avg_delivery_rate
FROM monthly_delivery_stats
GROUP BY FORMAT(sent_at, 'yyyy-MM')
ORDER BY month;
6. Store everything for forensic analysis
- Success/failure for every token
- Error codes and classifications
- Retry attempts and results
- Timestamp for time-series analysis
When to use actual delivery measurement vs dry-run validation
Use both in sequence:
// Phase 1: Dry-run validation (pre-send health check)
const validation = await sendConditionalNotifications({
...campaignData,
jobId: 'dryrun-campaign-123',
limit: 10000, // Sample
isDryRun: true,
});
console.log(`Predicted delivery rate: ${validation.deliveryRate}%`);
// Phase 2: Actual send (ground truth measurement)
const production = await sendConditionalNotifications({
...campaignData,
jobId: 'production-campaign-123',
limit: undefined, // Full send
isDryRun: false,
});
console.log(`Actual delivery rate: ${production.deliveryRate}%`);
console.log(`Prediction gap: ${Math.abs(validation.deliveryRate - production.deliveryRate).toFixed(1)} percentage points`);
Dry-run (validation):
- ✅ Fast (2 minutes for 10K sample)
- ✅ Zero user impact
- ✅ Predicts delivery rate
- ❌ Not 100% accurate (time gap between test and send)
Actual send (delivery measurement):
- ✅ 100% accurate (ground truth)
- ✅ Reveals real-world issues (network, devices, timing)
- ✅ Enables retry logic
- ❌ Slower (60+ minutes for 500K)
- ❌ User impact (notifications sent)
Best practice: Use both
- Dry-run validation before large campaigns (risk mitigation)
- Actual delivery measurement for all sends (ground truth)
- Compare validation vs delivery for continuous improvement