This article was originally published on Jo4 Blog.
Our notification bell was lying to users.
Not maliciously. It just... lagged. A publisher would submit a bid on a brand's campaign, and the brand wouldn't know for up to 30 seconds. In mobile app terms, 30 seconds is an eternity. Users were refreshing manually. Some thought notifications were broken entirely.
They weren't broken. They were polling. And polling on mobile is a sin we needed to repent for.
The Setup: How We Got Here
Our React Native app (Expo, managed workflow) had a NotificationBell component. Simple enough. It used RTK Query's pollingInterval to hit GET /api/v1/protected/notifications/unread-count every 30 seconds.
It worked on web. It worked on mobile. It "worked."
But here's what "worked" actually meant on a phone sitting in someone's pocket:
- 2,880 HTTP requests per day per active user (one every 30 seconds)
- Battery drain from keeping the radio alive for each poll cycle
- Zero delivery when the app was closed — if you killed the app, you got nothing until you opened it again
- Wasted bandwidth — 99.9% of those responses came back with the same count
We were running a distributed denial-of-service attack against our own API. From our own app. On behalf of our own users.
The "Obvious" Solution: Expo Push Service
We're an Expo shop. EAS project ID configured, expo prebuild for native builds. The natural path was Expo Push Service — a single HTTP POST to exp.host/--/api/v2/push/send that fans out to both APNs and FCM. No Firebase SDK, no APNs HTTP/2 client, free up to 600 notifications per second.
We planned it. We spec'd it. We wrote the migration SQL.
Then we paused and asked ourselves: Do we really want a third-party proxy between us and Apple/Google for something this critical?
The Pivot: Direct FCM + APNs
We chose the harder path. Direct integration with both push services. No middleware, no proxy, no Expo dependency at runtime.
Here's why:
1. Zero runtime dependency. Expo Push Service is free and reliable, but it's still someone else's server. If exp.host goes down at 2 AM, our users don't get notified about a time-sensitive bid. With direct integration, the only failure points are Apple, Google, and us.
2. Full payload control. FCM v1 API and APNs have different payload structures, priority levels, and collapse keys. Going direct means we can tune each platform independently — badge counts on iOS, notification channels on Android, silent pushes for cache invalidation.
3. No token translation. Expo Push Tokens (ExponentPushToken[xxx]) are Expo's abstraction. Native device tokens are what FCM and APNs actually consume. By using getDevicePushTokenAsync() on the client instead of getExpoPushTokenAsync(), we skip the translation layer entirely.
The tradeoff? We had to implement JWT authentication for two different providers, each with their own signing algorithm, token format, and error semantics.
The Backend: Two JWT Dialects
FCM (Android): RS256 OAuth Dance
FCM v1 doesn't use a simple API key anymore. It requires a proper OAuth2 service account flow:
- Load the Firebase service account's RSA private key (from an environment variable, never from the JSON file on disk)
- Build a JWT with RS256, scoped to firebase.messaging
- POST that JWT to Google's token endpoint
- Get back an access token (valid ~1 hour)
- Use that access token as a Bearer header on every FCM send
- Cache it, refresh 5 minutes early
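The assertion-building step in the list above can be sketched with nothing but the JDK. This is a minimal illustration, not our production class: the name FcmAssertionSketch and its method names are hypothetical, the claims follow Google's documented service-account flow, and the JSON is concatenated by hand on the assumption that the client email needs no escaping.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.Signature;
import java.time.Instant;
import java.util.Base64;

// Hypothetical helper: builds the RS256-signed JWT that is exchanged for an access token.
final class FcmAssertionSketch {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

    static String buildAssertion(String clientEmail, PrivateKey rsaKey, Instant now) {
        long iat = now.getEpochSecond();
        String header = "{\"alg\":\"RS256\",\"typ\":\"JWT\"}";
        // Assumes clientEmail contains no characters that need JSON escaping.
        String claims = "{"
            + "\"iss\":\"" + clientEmail + "\","
            + "\"scope\":\"https://www.googleapis.com/auth/firebase.messaging\","
            + "\"aud\":\"https://oauth2.googleapis.com/token\","
            + "\"iat\":" + iat + ","
            + "\"exp\":" + (iat + 3600) // one hour is the maximum lifetime
            + "}";
        String signingInput = encode(header) + "." + encode(claims);
        try {
            Signature signer = Signature.getInstance("SHA256withRSA"); // RS256
            signer.initSign(rsaKey);
            signer.update(signingInput.getBytes(StandardCharsets.US_ASCII));
            return signingInput + "." + B64URL.encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not sign FCM assertion", e);
        }
    }

    // Form body for the POST to https://oauth2.googleapis.com/token.
    static String tokenRequestBody(String assertion) {
        return "grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Ajwt-bearer"
            + "&assertion=" + assertion;
    }

    private static String encode(String json) {
        return B64URL.encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }
}
```

The response to that POST carries the access token and its lifetime; caching it and refreshing a few minutes before expiry is then a matter of remembering when it was issued.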
The key separation was deliberate. The jo4-prod-firebase-adminsdk-*.json file lives on the classpath for the client_email field. The actual private key comes from PUSH_FCM_SERVICE_ACCOUNT_KEY as a base64-encoded PEM at runtime. This means the JSON file in source control has no secrets.
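Turning that environment variable into a usable key looks roughly like this. It is a sketch under two assumptions: the variable holds a whole PKCS#8 PEM file (the "BEGIN PRIVATE KEY" form Firebase service accounts use) that has been base64-encoded once more, and the class name ServiceAccountKeyLoader is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Base64;

// Hypothetical loader for a base64-wrapped PKCS#8 PEM private key.
final class ServiceAccountKeyLoader {
    static PrivateKey fromBase64Pem(String base64Pem) {
        // Outer layer: the env var value is the PEM text, base64-encoded as a whole.
        String pem = new String(
            Base64.getDecoder().decode(base64Pem.trim()), StandardCharsets.UTF_8);
        // Inner layer: strip the PEM armor and whitespace to get base64 DER.
        String body = pem
            .replace("-----BEGIN PRIVATE KEY-----", "")
            .replace("-----END PRIVATE KEY-----", "")
            .replaceAll("\\s", "");
        byte[] der = Base64.getDecoder().decode(body);
        try {
            return KeyFactory.getInstance("RSA")
                .generatePrivate(new PKCS8EncodedKeySpec(der));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Not a valid PKCS#8 RSA key", e);
        }
    }
}
```

In use, this would be called once at startup with System.getenv("PUSH_FCM_SERVICE_ACCOUNT_KEY"), failing fast if the variable is missing or malformed.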
APNs (iOS): ES256 Provider Token
Apple's approach is simpler in some ways, weirder in others:
- Load the .p8 key (EC private key, also from an env var)
- Build a JWT with ES256, issuer = team ID, key ID in the header
- That JWT is the auth: no token exchange, just attach it as a bearer header
- Valid for 60 minutes, we refresh at 50
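Here is a rough JDK-only sketch of those steps. The class and method names are hypothetical. One detail worth calling out: a JWT needs the raw 64-byte R||S signature, which the JDK (9 and later) exposes as "SHA256withECDSAinP1363Format"; plain "SHA256withECDSA" returns a DER structure instead, which is not what a JWT verifier expects.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.Signature;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;

// Hypothetical helper: builds the ES256 provider token sent as "authorization: bearer <token>".
final class ApnsProviderTokenSketch {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

    static String buildToken(String teamId, String keyId, PrivateKey ecKey, Instant now) {
        // Assumes teamId and keyId are plain alphanumerics (no JSON escaping needed).
        String header = "{\"alg\":\"ES256\",\"kid\":\"" + keyId + "\"}";
        String claims = "{\"iss\":\"" + teamId + "\",\"iat\":" + now.getEpochSecond() + "}";
        String signingInput = encode(header) + "." + encode(claims);
        try {
            // P1363 format = fixed-length R||S, as JWTs require.
            Signature signer = Signature.getInstance("SHA256withECDSAinP1363Format");
            signer.initSign(ecKey);
            signer.update(signingInput.getBytes(StandardCharsets.US_ASCII));
            return signingInput + "." + B64URL.encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not sign APNs provider token", e);
        }
    }

    // Mirrors the "valid for 60 minutes, refresh at 50" rule from the list above.
    static boolean needsRefresh(Instant issuedAt, Instant now) {
        return Duration.between(issuedAt, now).compareTo(Duration.ofMinutes(50)) >= 0;
    }

    private static String encode(String json) {
        return B64URL.encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }
}
```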
The HTTP/2 requirement is the curveball. APNs requires HTTP/2 — it will reject HTTP/1.1 connections. Java's HttpClient handles this natively (we set HttpClient.Version.HTTP_2 at construction), but it's the kind of thing that silently fails if you're using an older HTTP library.
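For reference, a minimal sketch of that client setup with java.net.http. The class name, parameter names, and the 10-second timeout are illustrative assumptions; the host, path, and apns-* headers are the ones Apple documents for the production endpoint.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical wrapper around the JDK HTTP client for APNs sends.
final class ApnsHttp2Sketch {
    static HttpClient newClient() {
        return HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2) // a preference, negotiated per connection
            .connectTimeout(Duration.ofSeconds(10))
            .build();
    }

    static HttpRequest buildRequest(
            String deviceToken, String providerJwt, String bundleId, String jsonPayload) {
        return HttpRequest.newBuilder(
                URI.create("https://api.push.apple.com/3/device/" + deviceToken))
            .header("authorization", "bearer " + providerJwt)
            .header("apns-topic", bundleId)
            .header("apns-push-type", "alert")
            .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
            .build();
    }

    // The JDK client can fall back to HTTP/1.1 if h2 is not negotiated,
    // so checking the response version makes the failure visible.
    static boolean negotiatedHttp2(HttpResponse<?> response) {
        return response.version() == HttpClient.Version.HTTP_2;
    }
}
```

Logging the result of negotiatedHttp2 on the first response is a cheap way to catch a misconfigured proxy or client before it turns into a debugging session.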
The Token Lifecycle Problem
Push tokens have a lifecycle that most tutorials gloss over. A device token can become invalid, or stop belonging to the current user, for several reasons:
- User uninstalled the app
- User disabled notifications in system settings
- Token was refreshed by the OS (happens periodically on both platforms)
- User logged out and the token should no longer receive their notifications
- User logged into a different account on the same device
We handle each case:
Registration (upsert): When the app boots and the user is authenticated, it calls POST /push-token. If that token already exists for a different user, we reassign it (device changed hands). If it's new, we create it.
Unregistration (logout): Before clearing the session, the app calls DELETE /push-token. This soft-deletes the token so the logged-out device stops receiving pushes. Critically, this happens before the auth token is cleared — otherwise the API call would fail with 401.
Auto-cleanup (delivery failure): When FCM returns UNREGISTERED (404), or APNs returns 410 (Unregistered) or 400 BadDeviceToken, we soft-delete the token automatically. No stale tokens accumulate.
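The auto-cleanup decision reduces to a small predicate over the provider's response. A sketch, with hypothetical names (DeadTokenRules, shouldSoftDelete) and assuming the caller has already pulled the error code out of the response body (FCM's errorCode detail, APNs' reason field):

```java
// Hypothetical classifier: does this delivery failure mean the token is dead?
final class DeadTokenRules {
    enum Provider { FCM, APNS }

    static boolean shouldSoftDelete(Provider provider, int httpStatus, String errorCode) {
        switch (provider) {
            case FCM:
                // FCM v1 reports a dead registration as 404 with errorCode UNREGISTERED.
                return httpStatus == 404 && "UNREGISTERED".equals(errorCode);
            case APNS:
                // 410 means the token is no longer active for the topic;
                // 400 + BadDeviceToken means the token itself is invalid.
                return httpStatus == 410
                    || (httpStatus == 400 && "BadDeviceToken".equals(errorCode));
            default:
                return false;
        }
    }
}
```

Anything else (429s, 5xx, auth failures) is left alone here, on the assumption that those are transient or our fault rather than a property of the token.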
The soft-delete + hard-delete dance: Here's a subtlety. We use soft-deletes everywhere (BaseEntity pattern). But we also have a partial unique index: UNIQUE (push_token) WHERE deleted = false. If a user unregisters and re-registers the same token, the soft-deleted row would violate the uniqueness constraint. So before soft-deleting, we hard-delete any previously soft-deleted rows with the same token. It's a native SQL query that bypasses our ORM's @SQLRestriction("deleted = false") filter.
The @Async + @Transactional Trap
This one nearly cost us a day.
Our push delivery runs inside @Async methods. When a token needs to be soft-deleted (delivery failure), we need a database transaction. The natural instinct is to extract a @Transactional private method.
This does not work.
Spring's @Transactional relies on AOP proxies. When you call a @Transactional method from within the same bean, the call goes through this, not through the proxy. The annotation is silently ignored. Your "transaction" is actually running without one.
Inside an @Async method, you're already past the proxy boundary. Internal calls to @Transactional methods still run, but the annotation has no effect: no transaction is opened.
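You can reproduce the effect without Spring at all. The sketch below uses a plain JDK dynamic proxy as a stand-in for Spring's AOP proxy; the interface and class names are invented for the demo, and the "interceptor" just records which calls it sees.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Stand-in demo: a JDK proxy plays the role of Spring's transactional proxy.
final class SelfInvocationDemo {
    interface PushService {
        void deliver();         // stands in for the @Async entry point
        void softDeleteToken(); // stands in for the @Transactional method
    }

    static final class PushServiceImpl implements PushService {
        public void deliver() {
            // A plain call on `this`: the proxy never sees it.
            softDeleteToken();
        }
        public void softDeleteToken() { }
    }

    // Returns the method names the proxy intercepted during one deliver() call.
    static List<String> interceptedDuringDeliver() {
        List<String> intercepted = new ArrayList<>();
        PushService target = new PushServiceImpl();
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName()); // where a transaction would begin
            return method.invoke(target, args);
        };
        PushService service = (PushService) Proxy.newProxyInstance(
            PushService.class.getClassLoader(),
            new Class<?>[] { PushService.class },
            handler);
        service.deliver();
        return intercepted;
    }

    public static void main(String[] args) {
        System.out.println(interceptedDuringDeliver()); // prints [deliver]
    }
}
```

Only deliver shows up. The nested softDeleteToken call ran, but nothing wrapped it, which is exactly what happens to a self-invoked @Transactional method.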
The fix: TransactionTemplate. Programmatic transaction management that works regardless of proxy context.
void softDeleteToken(UserPushTokenEntity token, String reason) {
    transactionTemplate.executeWithoutResult(status -> {
        userPushTokenRepository.hardDeleteSoftDeletedByPushToken(token.getPushToken());
        token.setDeleted(true);
        token.setDeleteReason(reason);
        userPushTokenRepository.save(token);
    });
}
Not glamorous. But it actually works. Every time.
The Mobile Side: Less Drama, More Plumbing
The React Native side was comparatively calm. A single usePushNotifications hook handles everything:
- Permission request: Notifications.getPermissionsAsync() then requestPermissionsAsync() if needed
- Token retrieval: Notifications.getDevicePushTokenAsync() (native token, not Expo token)
- Backend registration: RTK Query mutation, best-effort (silently fails if backend is unreachable)
- Foreground handling: setNotificationHandler to show banners even when the app is open
- Cache invalidation: When a push arrives in the foreground, we invalidate RTK Query's UnreadCount cache tag. The NotificationBell re-renders with the fresh count. No polling needed.
- Tap routing: When the user taps a notification, we extract actionUrl from the payload data and router.push() to the right screen
The hook stores the push token in a module-level variable (not React state, not Redux). Why? Because during logout, React state may be mid-teardown and Redux may be mid-reset. A simple module variable survives both.
The NotificationBell: From Polling to Push
Before:
const { data } = useGetUnreadCountQuery(undefined, {
  skip: !isAuthenticated,
  pollingInterval: 30000, // The sin
});
After:
// Tracks the previous app state so we can detect a return to the foreground
const appState = useRef(AppState.currentState);
const { data, refetch } = useGetUnreadCountQuery(undefined, {
  skip: !isAuthenticated,
  // No polling. Push notifications invalidate the cache.
});
// Only refetch when user returns to the app (tab switch, unlock)
useEffect(() => {
  const sub = AppState.addEventListener('change', (next) => {
    if (appState.current !== 'active' && next === 'active' && isAuthenticated) {
      refetch();
    }
    appState.current = next;
  });
  return () => sub.remove();
}, [isAuthenticated, refetch]);
The difference: from 2,880 requests/day to maybe 20-30 (one per app foreground event). Server load dropped. Battery usage dropped. And notifications arrive instantly instead of up to 30 seconds late.
What We Shipped
| Aspect | Before | After |
|---|---|---|
| Delivery mechanism | HTTP polling (30s) | FCM (Android) + APNs (iOS) |
| Background delivery | None | Full system-level push |
| Latency | 0-30 seconds | Typically sub-second |
| Requests per user/day | ~2,880 | ~20-30 |
| Third-party dependency | None | None (direct to Apple/Google) |
| Token management | N/A | Auto-cleanup on delivery failure |
| Foreground behavior | Badge update on next poll | Instant banner + badge + sound |
Lessons Learned
- Expo Push Service is good. Direct is better. If you're serious about push reliability and payload control, go direct. The implementation cost is a few hundred lines of JWT plumbing.
- @Async and @Transactional don't compose. Use TransactionTemplate for programmatic transactions inside async methods. This isn't a Spring bug; it's how AOP proxies work.
- Soft-delete + unique constraints need careful choreography. Partial unique indexes (WHERE deleted = false) are powerful but require hard-deleting stale soft-deleted rows before creating new ones.
- Store push tokens outside React state for logout. Module-level variables are ugly but survive the teardown chaos of a logout flow.
- getDevicePushTokenAsync > getExpoPushTokenAsync if you're doing direct FCM/APNs. Skip the Expo token abstraction layer.
- HTTP/2 is mandatory for APNs. Ensure your HTTP client is configured for it explicitly. Silent failures here are painful to debug.
Have you made the polling-to-push jump? What surprised you the most? Drop a comment below.
Building jo4.io — a modern URL shortener with analytics, bio pages, and an affiliate marketplace for creators.