Piyush Gupta

Posted on Apr 25

Master-Class: Sending Real-Time Updates from Server to Clients: Server to Server, Android, iOS

#systemdesign #backend #communication

Real-time communication is now a cornerstone of modern software. Whether you're showing live scores, streaming AI responses, pushing a payment confirmation to a phone, or propagating an event between two microservices — the underlying question is the same: how does the server reach the client before the client asks?

The answer looks very different depending on who the client is. A backend server, an Android device, and an iPhone each live in fundamentally different environments, face different constraints, and have completely different ecosystems waiting to solve this problem. This article walks through each scenario in depth — the concepts, the practical tooling, and real code.

Part 1: The Mental Model — Push vs. Poll

Before jumping into techniques, it's worth understanding why this is even hard.

HTTP was originally designed around a simple request-response cycle: the client asks, the server answers, connection closes. This works perfectly for loading a webpage but is deeply mismatched for "tell me when something changes." Developers have historically compensated in two ways:

Polling: the client repeatedly asks the server "anything new?" on a timer. Simple to implement and universally supported, but wasteful: you spend bandwidth and compute on mostly empty responses.

Long polling: a refinement where the server holds the request open until it actually has something to say, then responds, and the client immediately issues a new request. Latency is better than with plain polling, but every event still costs a full HTTP request/response cycle (headers included), and the server has to park and track pending requests. It was common in the 2000s and early 2010s and is now largely regarded as a legacy fallback.

Push: the server maintains an open channel and sends data down it whenever necessary. This is the modern standard, and the rest of this article is about how it's done in practice.
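To make long polling concrete, here is a minimal in-process sketch using asyncio. The UpdateChannel class and its method names are hypothetical; in a real server, wait_for_update would sit behind an HTTP handler and the client would re-issue the request after every response.

```python
import asyncio


class UpdateChannel:
    """Toy long-poll channel: a waiter is parked until data arrives or a timeout hits."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def publish(self, item):
        await self._queue.put(item)

    async def wait_for_update(self, timeout):
        # Long poll: hold the "request" open instead of answering "nothing new".
        try:
            return await asyncio.wait_for(self._queue.get(), timeout)
        except asyncio.TimeoutError:
            return None  # the client would simply issue the request again


async def demo():
    channel = UpdateChannel()
    # Nothing has been published yet, so this wait ends at the timeout.
    first = await channel.wait_for_update(timeout=0.05)

    async def publish_later():
        await asyncio.sleep(0.01)
        await channel.publish({"score": 1})

    # An update arrives while the next wait is pending; it returns right away.
    task = asyncio.create_task(publish_later())
    second = await channel.wait_for_update(timeout=1.0)
    await task
    return first, second


result = asyncio.run(demo())
```

Plain polling would be the same loop with a timeout of zero: most calls would come back empty.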

Part 2: Server to Server

When the "client" is another server or backend service, you have the most flexibility. There's no battery to drain, no mobile OS gating the connection, and no user permission dialogs. The ecosystem here splits broadly into two families: persistent streaming connections and asynchronous messaging.

WebSockets

WebSockets establish a full-duplex, persistent TCP connection via an HTTP upgrade handshake. Once the handshake completes, either side can send frames at any time with minimal overhead — no HTTP headers are re-sent per message.

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
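In the 101 response, the server proves it understood the upgrade by returning a Sec-WebSocket-Accept header derived from the client's Sec-WebSocket-Key. RFC 6455 defines the derivation; the function name below is just for illustration, and libraries such as ws do this for you.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The key and the resulting value are the sample pair from the RFC itself.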

After this exchange, the connection is a raw duplex channel. A Node.js server using the popular ws library looks like this:

// Server (Node.js)
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws) => {
  console.log('Client connected');

  // Push data to the client at any time
  const interval = setInterval(() => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(JSON.stringify({ type: 'update', data: getLatestMetrics() }));
    }
  }, 1000);

  ws.on('close', () => clearInterval(interval));
});

# Client (Python)
import asyncio
import websockets
import json

async def listen():
    async with websockets.connect("ws://server.example.com:8080") as ws:
        async for message in ws:
            data = json.loads(message)
            print(f"Received: {data}")

asyncio.run(listen())

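WebSockets are the right choice when you need low latency, bidirectional messaging, or binary data. The tradeoff is that they make your servers stateful: each connection lives on one specific instance for its whole lifetime. A load balancer has to support the upgrade and long-lived connections, and as soon as you run more than one instance you need a shared pub/sub layer (Redis is commonly used for this) so that an event produced on one server can reach clients connected to another. Sticky sessions are mainly needed when a library falls back to HTTP long polling.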

Server-Sent Events (SSE)

SSE is a simpler, HTTP-based protocol for one-directional streaming from server to client. The server responds with Content-Type: text/event-stream and never closes the connection, sending data in a simple text format:

data: {"type": "price_update", "value": 142.50}\n\n
data: {"type": "price_update", "value": 143.10}\n\n

Each event is separated by a blank line. Events can also carry an id field for resumption after reconnects, and a named event field for routing on the client.
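The wire format is simple enough to write and read by hand. The sketch below is a simplified take on it, not a full implementation of the parsing rules; the function names format_sse and parse_sse are made up for this example.

```python
import json


def format_sse(data, event=None, event_id=None):
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload has to be sent as several data: lines.
    for chunk in json.dumps(data).splitlines():
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"


def parse_sse(stream_text):
    """Parse a complete chunk of stream text into a list of event dicts."""
    events = []
    for block in stream_text.split("\n\n"):
        fields = {"event": "message", "id": None}
        data_lines = []
        for line in block.split("\n"):
            if not line or line.startswith(":"):
                continue  # blank, or a comment line (often used as a heartbeat)
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "data":
                data_lines.append(value)
            elif field in ("event", "id"):
                fields[field] = value
        if data_lines:  # blocks without data are not dispatched as events
            fields["data"] = "\n".join(data_lines)
            events.append(fields)
    return events
```

A comment-only block such as ": ping" keeps the connection warm without producing an event on the client.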

// Server (Node.js / Express)
const express = require('express');
const app = express();

app.get('/events', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('X-Accel-Buffering', 'no'); // Critical for Nginx
  res.flushHeaders(); // Send headers now so the client sees the stream open

  const sendEvent = (data) => {
    res.write(`id: ${Date.now()}\n`);
    res.write(`data: ${JSON.stringify(data)}\n\n`);
  };

  // getMetrics() stands in for your own data source
  const interval = setInterval(() => sendEvent({ metrics: getMetrics() }), 2000);
  req.on('close', () => clearInterval(interval));
});

app.listen(process.env.PORT || 3000);

# Client consuming SSE (Python)
import sseclient
import requests

response = requests.get('http://server.example.com/events', stream=True)
client = sseclient.SSEClient(response)

for event in client.events():
    print(f"Received event: {event.data}")

SSE has automatic reconnection built into the browser's EventSource API: if the connection drops, the client reconnects and sends the last received event ID in the Last-Event-ID header, allowing the server to resume from where it left off. Non-browser clients, such as the Python example above, usually have to implement the reconnect loop themselves. SSE only transmits UTF-8 text and is strictly unidirectional. For server-to-server communication where the receiving service only needs to consume a stream of events, SSE is a lighter-weight and simpler choice than WebSockets.
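Resuming only works if the server keeps recent events around and can compare them to the ID the client sends back. A minimal sketch of that server-side piece follows. It assumes monotonically increasing integer IDs (the Express example uses timestamps instead), and ReplayBuffer is a hypothetical name.

```python
from collections import deque


class ReplayBuffer:
    """Keeps the most recent events so a reconnecting client can catch up."""

    def __init__(self, capacity=1000):
        self._events = deque(maxlen=capacity)  # oldest events fall off the front
        self._next_id = 1

    def append(self, data):
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        return event_id

    def since(self, last_event_id):
        """Events the client has not seen, given its Last-Event-ID header (or None)."""
        if last_event_id is None:
            return []  # fresh connection: only live events from here on
        last = int(last_event_id)
        return [(eid, data) for eid, data in self._events if eid > last]
```

On reconnect, the handler would write everything returned by since() before switching back to live events. If the client was away longer than the buffer covers, it needs a full refresh instead.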

Important gotcha: corporate proxies and some CDN/reverse proxy configurations buffer SSE streams silently, making events arrive in batches rather than in real time. The X-Accel-Buffering: no header fixes this for Nginx, but intermediaries you don't control remain a problem. For server-to-server communication within a private network, this is rarely an issue.

gRPC Streaming

gRPC is Google's open-source RPC framework, built on HTTP/2 and Protocol Buffers. It supports four communication patterns, including server streaming — where the client makes one request and the server streams back a sequence of responses.

// service.proto
syntax = "proto3";

service MetricsService {
  // Client sends one request, server streams many responses
  rpc StreamMetrics (MetricsRequest) returns (stream MetricsResponse);
}

message MetricsRequest { string service_name = 1; }
message MetricsResponse {
  double cpu_usage = 1;
  double memory_usage = 2;
  int64 timestamp = 3;
}

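# Server (Python)
import time

# Modules generated from service.proto by grpcio-tools
import service_pb2
import service_pb2_grpc

class MetricsServicer(service_pb2_grpc.MetricsServiceServicer):
    def StreamMetrics(self, request, context):
        # Keep streaming until the client cancels or disconnects
        while context.is_active():
            yield service_pb2.MetricsResponse(
                cpu_usage=get_cpu(),        # get_cpu / get_memory: your own collectors
                memory_usage=get_memory(),
                timestamp=int(time.time())
            )
            time.sleep(1)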

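// Client (Go); pb is assumed to be the package generated from service.proto
stream, err := client.StreamMetrics(ctx, &pb.MetricsRequest{ServiceName: "api"})
if err != nil {
    log.Fatalf("open stream: %v", err)
}
for {
    resp, err := stream.Recv()
    if err == io.EOF {
        break // server closed the stream
    }
    if err != nil {
        log.Fatalf("receive: %v", err)
    }
    fmt.Printf("CPU: %.2f%%, Memory: %.2f%%\n", resp.CpuUsage, resp.MemoryUsage)
}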

gRPC is particularly well-suited for internal microservice communication. Binary serialization via Protocol Buffers makes it fast and compact. HTTP/2 multiplexing allows many streams over one connection. The strongly typed contract enforced by .proto files also provides excellent developer ergonomics in polyglot architectures.

Message Queues and Pub/Sub Brokers

For asynchronous delivery — where the receiving server doesn't need to be online at the moment the event is produced — message brokers are the standard approach. They decouple the producer from the consumer and provide durability, retry logic, and fan-out.

Apache Kafka is the dominant choice for high-throughput event streaming. It persists events to disk in append-only, replayable logs: each topic is split into partitions, and ordering is guaranteed within a partition. Consumers can catch up from any retained offset, making it excellent for architectures that need auditability or event sourcing.

# Producer (Python)
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
producer.send('user-events', {'event': 'checkout', 'user_id': 'u123', 'amount': 49.99})
producer.flush()

# Consumer
import json
from kafka import KafkaConsumer
consumer = KafkaConsumer(
    'user-events',
    bootstrap_servers=['kafka:9092'],
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)
for message in consumer:
    process_event(message.value)
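The offset and ordering behavior is easier to see in a toy model. The class below is an in-memory stand-in, not Kafka: ToyTopic is a hypothetical name, and crc32 replaces the murmur2 hash that Kafka's default partitioner uses for keyed messages.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a partitioned, append-only topic."""

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # Same key -> same partition, which is what preserves per-key ordering.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset=0):
        """Replay everything in one partition from the given offset."""
        return self.partitions[partition][offset:]
```

Events for one user land in one partition in the order they were sent, and a consumer that remembers its offset can resume or replay from any point.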

RabbitMQ is better suited for task queues and flexible routing. It supports exchanges with different routing modes — direct, fanout, topic, and headers — and is widely used for work queues where each message should be processed by exactly one consumer.

# Publisher (Python / pika)
import pika, json

connection = pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
channel = connection.channel()
channel.exchange_declare(exchange='notifications', exchange_type='fanout')

channel.basic_publish(
    exchange='notifications',
    routing_key='',
    body=json.dumps({'type': 'payment_confirmed', 'order_id': 'ord_789'})
)

Webhooks deserve a mention here as well. Rather than the consumer server maintaining a persistent connection, the producer simply makes an HTTP POST to a pre-registered URL whenever an event occurs. This is the dominant pattern for third-party integrations (Stripe and other payment processors, GitHub, Twilio, etc.) because it requires no persistent connection on either side. The tradeoff is that the receiving endpoint must be publicly reachable, the producer must handle retries, and the receiver should therefore tolerate duplicate deliveries.
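The producer-side retry logic might look roughly like the sketch below. Everything here is an assumption made for illustration: the function name, the retry policy, and the injected post callable, which stands in for a real HTTP client so the logic can run without a network.

```python
import time


def deliver_webhook(post, url, event, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Try to deliver one event; retry on network errors and 5xx responses.

    `post` is any callable (url, json_body) -> HTTP status code.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            status = post(url, event)
        except OSError:
            status = None  # connection refused, timeout, DNS failure, ...
        if status is not None and 200 <= status < 300:
            return True
        if status is not None and 400 <= status < 500:
            return False  # the receiver rejected the event; retrying will not help
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    return False
```

A production sender would typically persist undelivered events and retry over hours rather than seconds, and may treat some 4xx codes (429, for example) as retryable.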

Part 3: Server to Android

Android introduces meaningful constraints. Maintaining a persistent TCP connection in the background drains the battery, and Android's OS actively restricts background work. The ecosystem has converged on a clear stack.

Firebase Cloud Messaging (FCM)

FCM is the de facto standard for reaching Android devices from a server. It is Google's cloud-based messaging infrastructure that handles the persistent connection to every Android device so your server doesn't have to.

The architecture involves three parties: your application server, the FCM infrastructure, and the device. Your server sends a payload to FCM's API; FCM routes it to the target device using a persistent, battery-optimized connection maintained by Google Play Services.

The flow:

The Android app registers with FCM on first launch and receives a unique registration token.
The app sends this token to your backend, which stores it (typically in a database keyed to the user).
When your server needs to push an update, it posts to FCM's HTTP v1 API with the target token and message payload.
FCM delivers the message to the device, waking the app if necessary.

// Android — FirebaseMessagingService
class MyFirebaseMessagingService : FirebaseMessagingService() {

    // Called when FCM generates a new token (on first install or token refresh)
    override fun onNewToken(token: String) {
        super.onNewToken(token)
        // Send token to your backend so it can reach this device
        sendTokenToServer(token)
    }

    // Called for data messages in any app state, and for notification
    // messages only while the app is in the foreground
    override fun onMessageReceived(remoteMessage: RemoteMessage) {
        if (remoteMessage.data.isNotEmpty()) {
            val updateType = remoteMessage.data["type"]
            val payload = remoteMessage.data["payload"]
            handleUpdate(updateType, payload)
        }
    }
}

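// Server: sending through the Firebase Admin SDK, which calls the FCM HTTP v1 API (Node.js)
const admin = require('firebase-admin');
// Path to your service account key file (placeholder)
const serviceAccount = require('./service-account.json');
admin.initializeApp({ credential: admin.credential.cert(serviceAccount) });

// data must be a flat object of string values,
// e.g. { type: 'order_update', orderId: 'ord_789', status: 'shipped' }
async function sendUpdate(deviceToken, data) {
    await admin.messaging().send({
        token: deviceToken,
        data,
        android: {
            priority: 'high',
            ttl: 3600 * 1000 // 1 hour, in milliseconds
        }
    });
}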
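Backends that do not use the Admin SDK can call the HTTP v1 endpoint directly. The sketch below only builds the URL and JSON body, under the assumption that you obtain an OAuth 2.0 access token for your service account separately and send it as a bearer token; build_fcm_request and the project ID are placeholders.

```python
import json

FCM_SEND_URL = "https://fcm.googleapis.com/v1/projects/{project_id}/messages:send"


def build_fcm_request(project_id, device_token, data, ttl_seconds=3600):
    """Return (url, json_body) for a high-priority data message to one device."""
    message = {
        "token": device_token,
        # FCM data payloads are string-to-string maps, so coerce the values.
        "data": {key: str(value) for key, value in data.items()},
        # In the REST API the TTL is a duration string, not milliseconds.
        "android": {"priority": "HIGH", "ttl": f"{ttl_seconds}s"},
    }
    url = FCM_SEND_URL.format(project_id=project_id)
    return url, json.dumps({"message": message})
```

Note the unit difference: the Node.js Admin SDK takes the TTL in milliseconds, while the REST body uses a string such as "3600s".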

FCM supports two message types. Notification messages are displayed by the system: when the app is in the background, Android shows them in the notification tray without any app code running, which is what most apps use for alerts (in the foreground they are passed to onMessageReceived instead). Data messages deliver a custom key-value payload to your app's onMessageReceived handler, giving you full control over how to process and display the update. You can combine both in a single message.

Token management is a detail that trips up many implementations. Tokens change when the user reinstalls the app, clears data, or on certain OS events. Your server must handle the UNREGISTERED error from FCM and remove stale tokens, and your app must override onNewToken so that refreshed tokens are pushed to the backend.

WebSockets on Android (Foreground Only)

For apps that need real-time streaming while the user is actively using them — a live chat, a trading terminal, a GPS tracker — WebSockets work well on Android. The standard library is OkHttp, which is already a transitive dependency in most Android projects.

// Android WebSocket with OkHttp
val client = OkHttpClient.Builder()
    .pingInterval(30, TimeUnit.SECONDS) // Keep-alive pings
    .build()

val request = Request.Builder().url("wss://api.example.com/stream").build()

val listener = object : WebSocketListener() {
    override fun onOpen(webSocket: WebSocket, response: Response) {
        webSocket.send("""{"action": "subscribe", "channel": "updates"}""")
    }

    override fun onMessage(webSocket: WebSocket, text: String) {
        val update = parseUpdate(text)
        runOnUiThread { updateUI(update) }
    }

    override fun onFailure(webSocket: WebSocket, t: Throwable, response: Response?) {
        // Implement exponential backoff reconnection here
        scheduleReconnect()
    }
}

val webSocket = client.newWebSocket(request, listener)

The critical caveat is lifecycle. Android will kill background services, and a WebSocket in a paused or stopped app is unreliable. For anything that needs to work when the app is not in the foreground, FCM is the right tool.

MQTT — For IoT and Low-Bandwidth Scenarios

MQTT is a lightweight publish/subscribe protocol designed for constrained devices. It runs over TCP and uses a small binary format, making it efficient on poor networks. Apps using Eclipse Paho or HiveMQ client libraries subscribe to topics on a broker; any server can publish to those topics.

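// Android MQTT with Paho
val client = MqttAndroidClient(context, "tcp://broker.example.com:1883", clientId)
// connect() is asynchronous, so subscribe only after it succeeds
client.connect(null, object : IMqttActionListener {
    override fun onSuccess(asyncActionToken: IMqttToken?) {
        client.subscribe("sensors/temperature/#", 1) { topic, message ->
            val payload = String(message.payload)
            updateSensorDisplay(topic, payload)
        }
    }
    override fun onFailure(asyncActionToken: IMqttToken?, exception: Throwable?) {
        scheduleReconnect()
    }
})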

MQTT is particularly common in IoT applications where devices are battery-powered or on metered connections, and where the update cadence is high (sensor data every second, for example). For mainstream consumer apps, FCM is a simpler path.
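The subscription in the example uses a topic filter with a wildcard. MQTT has two: + matches exactly one level, and # matches the parent level plus everything below it and must come last. Here is a small sketch of the matching rule. The function name is made up, and it ignores the special handling of topics that start with $.

```python
def mqtt_topic_matches(topic_filter, topic):
    """Return True if `topic` matches `topic_filter` under MQTT wildcard rules."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level of the filter.
            return index == len(filter_levels) - 1
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    return len(filter_levels) == len(topic_levels)
```

With this rule, "sensors/temperature/#" receives readings from every room, while "sensors/+/kitchen" receives every sensor type for one room.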

Part 4: Server to iOS

iOS has the most controlled environment of the three. Apple enforces strict rules on background execution and network usage, all in service of battery life and user privacy. The practical ecosystem is narrow but well-designed.

Apple Push Notification service (APNs)

APNs is the only sanctioned way to send server-initiated updates to an iOS device when the app is not in the foreground. There is no alternative. Just like FCM on Android, APNs maintains a persistent, encrypted connection to every Apple device so third-party servers don't have to.

The flow mirrors FCM in structure but differs in the details:

The iOS app requests permission from the user to receive notifications.
On approval, it calls UIApplication.shared.registerForRemoteNotifications().
iOS registers with APNs and returns a device token to application(_:didRegisterForRemoteNotificationsWithDeviceToken:).
The app sends this token to your backend.
Your server constructs a JSON payload and sends it to APNs over HTTP/2, authenticated with a JWT signed by your APNs key.
APNs delivers the notification to the device.

// AppDelegate.swift
func application(_ application: UIApplication,
                 didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
    let center = UNUserNotificationCenter.current()
    center.requestAuthorization(options: [.alert, .sound, .badge]) { granted, error in
        guard granted else { return }
        DispatchQueue.main.async {
            UIApplication.shared.registerForRemoteNotifications()
        }
    }
    return true
}

func application(_ application: UIApplication,
                 didRegisterForRemoteNotificationsWithDeviceToken deviceToken: Data) {
    let token = deviceToken.map { String(format: "%02.2hhx", $0) }.joined()
    sendTokenToBackend(token)
}

func application(_ application: UIApplication,
                 didReceiveRemoteNotification userInfo: [AnyHashable: Any],
                 fetchCompletionHandler completionHandler: @escaping (UIBackgroundFetchResult) -> Void) {
    // Handle silent background update
    guard let type = userInfo["type"] as? String else {
        completionHandler(.noData) // The handler must always be called
        return
    }
    handleSilentUpdate(type: type) { result in
        completionHandler(result)
    }
}

APNs payload structure:

{
  "aps": {
    "alert": {
      "title": "Order Shipped",
      "body": "Your order #789 has been dispatched."
    },
    "sound": "default",
    "badge": 1
  },
  "order_id": "ord_789",
  "status": "shipped"
}

For silent background updates, where you want to update the app's data without showing a visible notification, use content-available: 1 with no alert, sound, or badge key, and send the request with the apns-push-type: background header and apns-priority set to 5 (low priority):

{
  "aps": {
    "content-available": 1
  },
  "type": "cache_refresh",
  "resource": "product_catalog"
}
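Because the two payload shapes need different request headers, it can help to derive the headers from the payload rather than set them by hand. This is one possible way to do it; apns_headers is a hypothetical helper, and it only distinguishes the alert and background push types.

```python
def apns_headers(payload, bundle_id):
    """Pick apns-push-type and apns-priority from the shape of the payload."""
    aps = payload.get("aps", {})
    is_silent = aps.get("content-available") == 1 and not (
        "alert" in aps or "sound" in aps or "badge" in aps
    )
    return {
        "apns-topic": bundle_id,
        "apns-push-type": "background" if is_silent else "alert",
        # Background pushes must use priority 5; 10 asks for immediate delivery.
        "apns-priority": "5" if is_silent else "10",
    }
```

Silent pushes are throttled by the system, so they suit cache refreshes better than anything time-critical.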

Authentication: Apple strongly recommends using APNs authentication keys (.p8 files) over the older certificate-based approach. Keys never expire and one key works across all your apps in a team.

# Server — sending to APNs (Python with httpx + PyJWT)
import jwt, time, httpx

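def create_apns_jwt(key_id: str, team_id: str, private_key: str) -> str:
    # iat must be an integer number of seconds. In production, cache the token:
    # APNs expects it to be reused for 20 to 60 minutes, not re-signed per request.
    return jwt.encode(
        {"iss": team_id, "iat": int(time.time())},
        private_key,
        algorithm="ES256",
        headers={"alg": "ES256", "kid": key_id}
    )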

async def send_apns_notification(device_token: str, payload: dict):
    token = create_apns_jwt(KEY_ID, TEAM_ID, PRIVATE_KEY)
    url = f"https://api.push.apple.com/3/device/{device_token}"

    async with httpx.AsyncClient(http2=True) as client:
        response = await client.post(
            url,
            headers={
                "authorization": f"bearer {token}",
                "apns-topic": "com.example.myapp",
                "apns-push-type": "alert",
                "apns-priority": "10",
            },
            json=payload
        )
    return response.status_code == 200

Note that APNs requires HTTP/2. The httpx library with the [http2] extra, or community APNs client libraries (several are published under the name apns2), handle this correctly.
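One detail worth handling on the server: APNs expects provider tokens to be reused. Apple's guidance is to refresh them no more often than every 20 minutes and to never send one older than 60 minutes. A small cache covers this. The class below is a sketch; the sign callable stands in for a JWT-signing function that accepts the issued-at timestamp.

```python
import time


class ProviderTokenCache:
    """Reuse one APNs provider token instead of signing a new JWT per request."""

    def __init__(self, sign, max_age_seconds=30 * 60, clock=time.time):
        self._sign = sign              # callable(issued_at: int) -> str
        self._max_age = max_age_seconds
        self._clock = clock            # injectable for testing
        self._token = None
        self._issued_at = 0

    def get(self):
        now = int(self._clock())
        if self._token is None or now - self._issued_at >= self._max_age:
            self._token = self._sign(now)
            self._issued_at = now
        return self._token
```

The 30-minute default sits between the two limits. It would also make sense to keep one HTTP/2 client open across requests rather than creating one per notification.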

WebSockets on iOS (Foreground)

For in-app real-time features, iOS 13+ ships URLSessionWebSocketTask natively, eliminating the need for third-party libraries for basic use cases.

// Native WebSocket (iOS 13+)
class RealtimeManager {
    private var webSocketTask: URLSessionWebSocketTask?
    private let session = URLSession(configuration: .default)

    func connect() {
        let url = URL(string: "wss://api.example.com/stream")!
        webSocketTask = session.webSocketTask(with: url)
        webSocketTask?.resume()
        receiveMessage()
    }

    private func receiveMessage() {
        webSocketTask?.receive { [weak self] result in
            switch result {
            case .success(let message):
                switch message {
                case .string(let text):
                    self?.handleUpdate(text)
                case .data(let data):
                    self?.handleBinaryUpdate(data)
                @unknown default: break
                }
                self?.receiveMessage() // Continue listening
            case .failure(let error):
                print("WebSocket receive failed: \(error)")
                self?.scheduleReconnect(after: 3.0)
            }
        }
    }

    func disconnect() {
        webSocketTask?.cancel(with: .goingAway, reason: nil)
    }
}

For older iOS versions or more complex scenarios (automatic reconnection, heartbeats, channel-based pub/sub), libraries like Starscream are commonly used.

Combining APNs with In-App Streaming

A well-architected iOS app typically uses both: APNs for background/offline delivery, and WebSockets for in-app streaming. The server checks whether the user has an active WebSocket connection; if so, the update goes out via that channel. If not, it is sent through APNs, which delivers a visible notification or wakes the app for a silent update. If you use the FCM SDK on iOS, the push half still routes through APNs under the hood, which lets one server API target both platforms; the in-app stream remains your own responsibility.
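The routing decision itself is small. A sketch, assuming the backend tracks open sockets per user; deliver_update, open_sockets, and send_push are placeholder names.

```python
def deliver_update(user_id, update, open_sockets, send_push):
    """Prefer the live in-app channel; fall back to platform push otherwise.

    `open_sockets` maps user_id -> a callable that writes to that user's socket,
    and `send_push` is a callable(user_id, update).
    """
    send_over_socket = open_sockets.get(user_id)
    if send_over_socket is not None:
        try:
            send_over_socket(update)
            return "websocket"
        except ConnectionError:
            open_sockets.pop(user_id, None)  # stale entry: the socket has died
    send_push(user_id, update)
    return "push"
```

Falling back to push when the socket write fails covers the common case where the app was backgrounded moments ago and the server has not noticed yet.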

Part 5: Choosing the Right Approach

Here's a practical decision guide based on what the software community actually uses:

Scenario	Recommended Approach
Server → Server (low latency, bidirectional)	WebSockets or gRPC streaming
Server → Server (one-way stream)	SSE or gRPC server streaming
Server → Server (async, durable)	Kafka (high throughput) or RabbitMQ (task routing)
Server → Third-party service	Webhooks (HTTP POST)
Server → Android (background)	FCM
Server → Android (foreground, in-app)	WebSocket (OkHttp)
Server → iOS (background/offline)	APNs (directly or via FCM SDK)
Server → iOS (foreground, in-app)	URLSessionWebSocketTask or Starscream
Server → IoT devices	MQTT

Part 6: Production Considerations

Regardless of which technology you pick, several cross-cutting concerns determine whether a real-time system actually holds up in production.

Reconnection and resilience. Networks fail. Connections drop. Every client implementation needs reconnection logic with exponential backoff, ideally with jitter so that clients do not all reconnect at the same instant. Browser EventSource clients and the FCM SDK handle reconnection automatically; WebSocket implementations on mobile must do it manually. On the receiving side, make event handling idempotent, because clients may receive the same event more than once after a reconnect.
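Both halves of that advice fit in a few lines. The sketch below uses the "full jitter" variant of exponential backoff and deduplicates by event ID; the names and default values are illustrative choices, not requirements.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Delay before reconnect attempt N (0-based): exponential, capped, fully jittered."""
    ceiling = min(cap, base * 2 ** attempt)
    return rng() * ceiling  # a random point in [0, ceiling)


class Deduplicator:
    """Drops events whose ID was already processed, making redelivery harmless."""

    def __init__(self):
        self._seen = set()

    def first_time(self, event_id):
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True
```

In a long-running process the set of seen IDs should be bounded, for example by keeping only the most recent few thousand.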

Missed events. A persistent connection does not buffer anything for a client that is offline, so events produced during an outage can be missed. SSE's Last-Event-ID header helps for short outages. For mobile, FCM stores up to 100 pending messages per app instance and delivers them when the device comes back online (subject to TTL); past that limit the stored messages are dropped and the app is told via onDeletedMessages to do a full sync. For critical business events, use a separate REST endpoint the client can call after reconnecting to fetch the state it missed.

Scalability. A single server can maintain tens of thousands of WebSocket connections, but as you scale horizontally, you need a shared pub/sub layer so that a message produced on Server A can reach a client connected to Server B. Redis Pub/Sub and Redis Streams are the most common solutions for this. Kafka is used when you need durability and replay.
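The fan-out pattern can be shown with an in-memory stand-in for the shared layer. In production the ToyPubSub class below would be Redis Pub/Sub or similar; all names here are hypothetical.

```python
from collections import defaultdict


class ToyPubSub:
    """In-memory stand-in for the shared pub/sub layer."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self._subscribers[channel]:
            callback(message)


class GatewayNode:
    """One WebSocket server: it knows only its own locally connected clients."""

    def __init__(self, name, bus):
        self.name = name
        self.local_clients = {}  # client_id -> list acting as that client's socket
        bus.subscribe("updates", self._on_bus_message)

    def connect(self, client_id):
        self.local_clients[client_id] = []
        return self.local_clients[client_id]

    def _on_bus_message(self, message):
        # Every node sees every message and forwards it only to clients it holds.
        inbox = self.local_clients.get(message["to"])
        if inbox is not None:
            inbox.append(message["body"])
```

A producer publishes once to the bus without knowing which node holds the client. Plain pub/sub keeps no history, so a node that is down during a publish misses it.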

Security. WebSocket connections should always use wss:// (TLS). APNs and FCM connections are encrypted by their respective platforms. JWT or session tokens should be validated during the WebSocket/SSE handshake and re-checked over the life of the connection, since a stream can outlive the token that opened it. For APNs, revoke and replace your .p8 key if it is ever exposed.

Token hygiene for mobile. Device tokens (both FCM and APNs) change and expire. Build logic to handle registration errors returned by FCM (UNREGISTERED) and APNs (BadDeviceToken, Unregistered) and remove stale tokens from your database immediately. Sending to dead tokens wastes quota and can trigger rate limiting.
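The cleanup step can be sketched as follows, using the error codes named above. The shape of token_store and send_results is an assumption for this example; a real backend would read these from its database and from the push provider's responses.

```python
# Error codes that mean "this token will never work again".
STALE_TOKEN_ERRORS = {
    "fcm": {"UNREGISTERED"},
    "apns": {"BadDeviceToken", "Unregistered"},
}


def prune_stale_tokens(token_store, send_results):
    """Remove tokens whose last send failed with a permanent registration error.

    `token_store` maps token -> platform ("fcm" or "apns"); `send_results` maps
    token -> error code string, or None when the send succeeded.
    """
    removed = []
    for token, error in send_results.items():
        platform = token_store.get(token)
        if platform and error in STALE_TOKEN_ERRORS[platform]:
            del token_store[token]
            removed.append(token)
    return removed
```

Transient errors such as timeouts or 5xx responses are deliberately left alone; those call for a retry, not a deletion.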

Conclusion

Real-time server-to-client communication is not a single technology but a landscape of tools, each optimized for a specific environment and use case. WebSockets and SSE cover general-purpose streaming over HTTP; gRPC streaming is often the preferred choice between internal microservices; message brokers like Kafka and RabbitMQ handle asynchronous event propagation at scale. On Android, FCM is the unambiguous standard for background delivery, augmented by WebSockets for in-app streaming. On iOS, everything routes through APNs for background delivery, while URLSessionWebSocketTask handles the foreground case.

The key insight is that mobile operating systems impose constraints that make "just keep a connection open" untenable for background use — which is precisely why platform-managed push infrastructure (FCM, APNs) exists. For server-to-server communication where those constraints don't apply, you have far more freedom, and the choice comes down to latency requirements, directionality, durability needs, and operational complexity you're willing to take on.

Pick the smallest, most appropriate tool for each client type, build reconnection and missed-event recovery into every path, and your real-time system will be robust regardless of what the network throws at it.

DEV Community