DEV Community

Programming Central

Posted on • Originally published at programmingcentral.hashnode.dev

Stop the Wait: Mastering Real-Time AI Token Streaming with Swift and URLSession

The era of the "loading spinner" is dying. If you’ve used ChatGPT, Claude, or any modern generative AI, you’ve noticed the experience isn't about waiting for a monolithic block of text to appear after ten seconds of silence. Instead, the AI "types" to you in real-time. This is token streaming, and it has fundamentally shifted the paradigm of how we build and consume AI-driven applications.

For Swift developers, implementing this isn't just about making things look "cool." It’s about performance, memory efficiency, and perceived latency. In this post, we’ll dive into how to leverage URLSession, AsyncBytes, and Swift’s modern concurrency model to bring real-time AI streaming to your Apple platform apps.

The Paradigm Shift: From Batching to Streaming

Traditionally, networking followed a simple pattern: send a request, wait for the server to finish its work, and receive a complete Data object. While this works for fetching a user profile, it fails for Large Language Models (LLMs). Generating a 500-word response can take significant time; making a user stare at a blank screen for 15 seconds is a recipe for a deleted app.

Token streaming solves this by delivering individual words, punctuation, or sub-words—known as tokens—the moment they are generated.

Why Streaming is Essential for AI:

  • Improved UX: Users see immediate progress, creating a sense of responsiveness.
  • Reduced Memory Footprint: By processing data incrementally, you avoid buffering massive strings in memory.
  • Interactive Interfaces: You can update the UI dynamically, allowing for features like auto-scrolling or even "stop generation" buttons that actually work instantly.

The Core Concept: URLSession and AsyncBytes

The heavy lifting of HTTP streaming in Swift is handled by a powerful addition to URLSession: the bytes(for:) method. Unlike the standard data(for:) method which returns a complete blob of data, bytes(for:) returns a tuple containing URLSession.AsyncBytes.

AsyncBytes is a concrete type that conforms to the AsyncSequence protocol, yielding individual UInt8 bytes as its elements. Think of it as a pipe: as data arrives from the network, it flows through the pipe, and you can "await" each piece as it drops out the other end.

enum StreamingError: Error {
    case invalidResponse
}

func streamRawBytes(from url: URL) async throws {
    let (asyncBytes, response) = try await URLSession.shared.bytes(for: URLRequest(url: url))

    guard let httpResponse = response as? HTTPURLResponse, httpResponse.statusCode == 200 else {
        throw StreamingError.invalidResponse
    }

    // Iterate over the stream as data arrives in real time.
    // Decoding one byte at a time would corrupt multi-byte UTF-8
    // characters, so we use the `lines` view, which buffers bytes
    // until a complete line is available.
    for try await line in asyncBytes.lines {
        // In a real AI scenario, each line typically carries
        // one or more tokens to decode
        print("Received line: \(line)")
    }
}
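Most hosted LLM APIs deliver those lines as server-sent events, each carrying a small JSON payload. Here is a minimal parsing sketch, assuming a hypothetical `StreamChunk` shape and the common `data:` prefix and `[DONE]` sentinel conventions; real providers vary, so check your API's wire format:

```swift
import Foundation

// Hypothetical shape of one streamed chunk; real providers differ.
struct StreamChunk: Decodable {
    let token: String
}

// Parse one server-sent event line ("data: {...}") into a token.
// Returns nil for non-data lines and for the "[DONE]" sentinel.
func parseSSELine(_ line: String) -> String? {
    guard line.hasPrefix("data: ") else { return nil }
    let payload = String(line.dropFirst(6))
    guard payload != "[DONE]",
          let data = payload.data(using: .utf8),
          let chunk = try? JSONDecoder().decode(StreamChunk.self, from: data)
    else { return nil }
    return chunk.token
}
```

Feeding each line from `asyncBytes.lines` through a function like this is all it takes to turn a raw byte stream into displayable tokens.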

Managing State with Actors and @Observable

Streaming data introduces a classic concurrency challenge: shared mutable state. As tokens stream in from a background network task, you need to append them to a string and update your UI. Doing this unsafely will lead to data races and crashes.

To handle this elegantly, we use Actors for logic isolation and the @observable macro (or ObservableObject) for UI reactivity.

The ChatStreamManager Actor

An actor ensures that only one task can modify the message buffer at a time.

@available(iOS 15.0, *)
actor ChatStreamManager {
    private var messageBuffer: String = ""

    func startStreaming(from url: URL, updateHandler: @MainActor @Sendable (String) -> Void) async throws {
        let (asyncBytes, _) = try await URLSession.shared.bytes(for: URLRequest(url: url))

        // The `characters` view buffers raw bytes until a complete
        // UTF-8 character is available, so multi-byte characters
        // (emoji, accented letters) decode correctly.
        for try await character in asyncBytes.characters {
            try Task.checkCancellation() // Support graceful cancellation

            messageBuffer.append(character)

            // Safely push the update to the MainActor for the UI
            await updateHandler(messageBuffer)
        }
    }
}
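To make a "stop generation" button actually work, wrap the call in a Task you keep a handle to. A hypothetical call site, assuming the ChatStreamManager above and a placeholder endpoint:

```swift
import Foundation

// Placeholder endpoint for illustration only.
let url = URL(string: "https://api.example.com/v1/chat")!

// Wrap streaming in a cancellable Task.
let streamTask = Task {
    let manager = ChatStreamManager()
    try await manager.startStreaming(from: url) { text in
        // Push the latest buffer to your UI here
        print(text)
    }
}

// Later, e.g. when the user taps "Stop generation":
// cancellation trips Task.checkCancellation() inside the loop,
// which ends iteration and tears down the connection.
streamTask.cancel()
```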

Connecting to SwiftUI

With the @Observable macro (introduced in iOS 17), your SwiftUI views can react to the incoming stream with almost zero boilerplate.

@Observable
@MainActor
class ChatViewModel {
    var currentResponse: String = ""
    var isProcessing: Bool = false

    func processStream() async {
        isProcessing = true
        defer { isProcessing = false } // Reset even if streaming throws

        let manager = ChatStreamManager()

        do {
            try await manager.startStreaming(from: URL(string: "https://api.example.com/v1/chat")!) { updatedText in
                self.currentResponse = updatedText
            }
        } catch {
            print("Streaming failed: \(error)")
        }
    }
}

In your SwiftUI view, simply reading viewModel.currentResponse will trigger a re-render every time a new token arrives, creating that smooth, "typing" animation users expect.
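A minimal view sketch, assuming the ChatViewModel above (the view name and layout are illustrative):

```swift
import SwiftUI

struct ChatView: View {
    // @Observable classes work with plain @State ownership (iOS 17+)
    @State private var viewModel = ChatViewModel()

    var body: some View {
        ScrollView {
            // Reading currentResponse registers this view as a
            // dependent; each new token triggers a re-render.
            Text(viewModel.currentResponse)
                .frame(maxWidth: .infinity, alignment: .leading)
                .padding()
        }
        .overlay {
            if viewModel.isProcessing {
                ProgressView()
            }
        }
        .task {
            // Tied to the view's lifetime: navigating away
            // cancels the task, and with it the stream.
            await viewModel.processStream()
        }
    }
}
```

Using `.task` here is deliberate: SwiftUI cancels it automatically when the view disappears, which feeds directly into the cancellation behavior discussed below.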

Why This Works: Structured Concurrency

Apple’s design of AsyncBytes isn't just about convenience; it’s about safety and resource management.

  1. Backpressure: The for await loop naturally manages backpressure. If your app’s processing logic slows down, the loop waits, which signals the underlying network layer to throttle the stream.
  2. Cancellation: Because we use Task and Task.checkCancellation(), if a user navigates away from the chat screen, the network connection is severed immediately, saving battery and data.
  3. Sendable Safety: By using Sendable types like String and Data, the compiler guarantees that we aren't passing "unsafe" references between the background streaming task and the main UI thread.

Conclusion: The Future is Incremental

Building AI-powered apps requires moving away from the "request-response" mindset. By embracing URLSession.AsyncBytes and Swift’s structured concurrency, you can build interfaces that feel alive. You aren't just fetching data; you're orchestrating a real-time flow of information from the cloud to the user's fingertips.

Let's Discuss

  1. What is the biggest challenge you've faced when trying to keep your UI responsive during long-running network tasks?
  2. With the rise of local LLMs (running on-device), do you think streaming will remain as important, or will the speed of Apple Silicon make batching viable again?

Leave a comment below and let’s talk Swift concurrency!

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook SwiftUI for AI Apps: building reactive, intelligent interfaces that respond to model outputs, stream tokens, and visualize AI predictions in real time. You can find it here: Leanpub.com or Amazon.

Swift & AI Masterclass:
Book 1: Core ML & Vision Framework.
Book 2: Apple Intelligence & Foundation Models.
Book 3: Natural Language & Speech.
Book 4: SwiftUI for AI Apps.
Book 5: Create ML Studio.
Book 6: MLX Swift & Local LLMs.
Book 7: visionOS & Spatial AI.
Book 8: Swift + OpenAI & LangChain.
Book 9: CoreData, CloudKit & Vector Search.
Book 10: Shipping AI Apps to the App Store.

Check out all the other programming & AI ebooks on Python, TypeScript, C#, Swift, and Kotlin: Leanpub.com or Amazon.
