Programming Central

Posted on • Originally published at programmingcentral.hashnode.dev

Mastering SwiftUI Overlays: How to Draw Real-Time AI Bounding Boxes with GeometryReader and Canvas

Building an AI-powered app is only half the battle. Whether you’re detecting cats in a living room or tracking sports performance, your users don't want to see raw JSON data—they want to see the results. Visualizing AI outputs through bounding boxes, masks, and keypoints is what turns a "cool tech demo" into a trustworthy, professional application.

In the world of SwiftUI, precision drawing on top of dynamic content used to be a headache. However, with the combination of GeometryReader and Canvas, Apple has provided a high-performance toolkit for creating responsive, real-time overlays. Here’s how you can master these tools for your next Vision or Core ML project.

The Layout Challenge: Why We Need GeometryReader

SwiftUI is famously declarative. We tell the framework what we want (e.g., "an image that fits the screen"), and SwiftUI decides the exact pixel coordinates. This is great for standard UI but difficult for AI tasks where you need to draw a box at a specific coordinate relative to an image.

If your AI model says there’s a "Dog" at x: 0.5, y: 0.2, that coordinate is usually normalized (between 0 and 1). To draw that on an iPhone 15 Pro Max versus an iPad, you need to know the exact runtime dimensions of the image view.

Enter the GeometryProxy

GeometryReader acts as a bridge. It’s a container that "spies" on the layout engine and hands you a GeometryProxy. This proxy gives you:

  • size: The width and height of the container.
  • frame(in:): The position of the view relative to the screen or its parent.

By wrapping your image in an .overlay with a GeometryReader, you gain the ability to scale your AI’s normalized coordinates to the actual points on the screen.

High-Performance Drawing with Canvas

While you could use a ZStack with a hundred Rectangle() views, that approach isn’t efficient for real-time video or large numbers of simultaneous detections. This is where Canvas shines.

Introduced in iOS 15 as a declarative wrapper around Core Graphics, Canvas is hardware-accelerated via Metal. It provides a GraphicsContext that allows you to draw paths, text, and shapes with minimal overhead. It is specifically designed for scenarios where the UI needs to update rapidly—perfect for 30 fps camera feeds.

The Power of GraphicsContext

Inside a Canvas, you aren't just placing views; you are painting. You can:

  • Stroke and Fill: Draw the bounding box with custom line widths.
  • Resolve Text: Render labels and confidence scores directly onto the drawing surface.
  • Apply Transforms: Easily handle orientation changes or coordinate flips.
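
That last point matters in practice: the Vision framework reports normalized rectangles with a bottom-left origin, while SwiftUI draws from the top-left. A minimal sketch of flipping the context once instead of converting every rect (the view name and sample data are mine, not from the article):

```swift
import SwiftUI

// Sketch: flip the GraphicsContext so Vision-style (bottom-left origin)
// normalized rects draw correctly in SwiftUI's top-left coordinate space.
struct FlippedOverlay: View {
    let normalizedRects: [CGRect] // bottom-left origin, 0.0–1.0

    var body: some View {
        Canvas { context, size in
            // Translate down by the height, then mirror the y-axis.
            context.translateBy(x: 0, y: size.height)
            context.scaleBy(x: 1, y: -1)

            for rect in normalizedRects {
                let display = CGRect(
                    x: rect.minX * size.width,
                    y: rect.minY * size.height,
                    width: rect.width * size.width,
                    height: rect.height * size.height
                )
                context.stroke(Path(display), with: .color(.yellow), lineWidth: 2)
            }
            // Caveat: text drawn after this flip is mirrored too, so draw
            // labels before flipping, or in a second pass with the flip undone.
        }
    }
}
```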

Putting it Together: The Technical Implementation

To create a reactive AI overlay, you need a clean data flow. With the Observation framework (iOS 17 and later), we use the @Observable macro to ensure the UI updates the moment the AI finishes its inference.

1. The Data Model

First, define a Sendable struct to hold your detection data.

struct BoundingBoxDetection: Identifiable, Sendable {
    let id = UUID()
    let label: String
    let confidence: Double
    let rect: CGRect // Normalized (0.0 to 1.0)
}
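The main view in step 3 reads its boxes from a `viewModel`. As a sketch, that view model might look like this (the `DetectionViewModel` name and its API are my assumptions, not from the article):

```swift
import SwiftUI
import Observation

// Hypothetical view model: @Observable (iOS 17+) makes any change to
// `detections` automatically re-render views that read it.
// BoundingBoxDetection is the struct defined above.
@Observable
final class DetectionViewModel {
    var detections: [BoundingBoxDetection] = []

    // Called from the inference pipeline; hop to the main actor so the
    // UI mutation happens on the main thread.
    @MainActor
    func update(with newDetections: [BoundingBoxDetection]) {
        detections = newDetections
    }
}
```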

2. The Overlay View

Next, create the Canvas that will handle the actual drawing. This view takes the imageSize provided by GeometryReader to calculate the final display coordinates.

struct BoundingBoxOverlayView: View {
    let imageSize: CGSize
    let detections: [BoundingBoxDetection]

    var body: some View {
        Canvas { context, size in
            for detection in detections {
                // Scale normalized coordinates to screen points
                let displayRect = CGRect(
                    x: detection.rect.origin.x * imageSize.width,
                    y: detection.rect.origin.y * imageSize.height,
                    width: detection.rect.size.width * imageSize.width,
                    height: detection.rect.size.height * imageSize.height
                )

                // Draw the box
                let path = Path(roundedRect: displayRect, cornerRadius: 4)
                context.stroke(path, with: .color(.green), lineWidth: 2)

                // Draw the label
                let text = Text("\(detection.label) \(Int(detection.confidence * 100))%")
                    .font(.caption.bold())

                let resolvedText = context.resolve(text)
                let textSize = resolvedText.measure(in: size)
                let textRect = CGRect(x: displayRect.minX, y: displayRect.minY - textSize.height, 
                                      width: textSize.width + 4, height: textSize.height)

                context.fill(Path(textRect), with: .color(.green))
                context.draw(resolvedText, at: CGPoint(x: textRect.midX, y: textRect.midY))
            }
        }
    }
}

3. The Main View Strategy

The "Golden Pattern" for AI overlays in SwiftUI is placing the GeometryReader inside an .overlay modifier. This ensures the drawing canvas is always the exact same size as the media being analyzed.

Image("input_image")
    .resizable()
    .aspectRatio(contentMode: .fit)
    .overlay {
        GeometryReader { geometry in
            BoundingBoxOverlayView(
                imageSize: geometry.size,
                detections: viewModel.detections
            )
        }
    }

Why This Matters for Modern Swift Development

By leveraging async/await and Actors, you can run your AI inference on a background thread and seamlessly update the @Observable view model. Because Canvas is so efficient, the transition from "Inference Finished" to "Box Drawn" is virtually instantaneous, providing a fluid experience for the user.
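
A minimal sketch of that flow, assuming a hypothetical `DetectionEngine` actor wrapping your Vision or Core ML calls (none of these names come from the article):

```swift
import SwiftUI

// Hypothetical inference actor: isolates model work from the main thread.
actor DetectionEngine {
    func detect(in image: CGImage) async -> [BoundingBoxDetection] {
        // Run your Vision / Core ML request here and map the results
        // into normalized BoundingBoxDetection values.
        return []
    }
}

// Driving the overlay from a camera frame might then look like:
//
// Task {
//     let results = await engine.detect(in: frame)
//     await viewModel.update(with: results) // @MainActor, triggers redraw
// }
```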

This approach also respects the "Safe Area" and different device aspect ratios automatically. Since GeometryReader reports the actual rendered size of the Image view, your bounding boxes will stay pinned to the correct pixels whether the user is on an iPhone SE or a Pro Max.

Conclusion

The combination of GeometryReader and Canvas transforms SwiftUI from a simple UI framework into a powerful engine for computer vision visualization. By understanding how to bridge the gap between normalized AI coordinates and screen points, you can build apps that feel intelligent, responsive, and professional.

Let's Discuss

  1. Have you found Canvas to be significantly more performant than using standard SwiftUI Shapes for large numbers of detections?
  2. What is your preferred method for handling coordinate transformations when the AI model's aspect ratio doesn't match the view's .aspectRatio(contentMode: .fit)?
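
For question 2, one common approach is to first compute the letterboxed rect the image actually occupies under `.fit`, then map normalized detections into that sub-rect rather than the full container. A sketch under that assumption (function names are mine):

```swift
import CoreGraphics

// Given the image's pixel size and the container size reported by
// GeometryReader, compute the rect the image occupies under .fit
// (centered, with letterbox bars on the constrained axis).
func aspectFitRect(imageSize: CGSize, container: CGSize) -> CGRect {
    let scale = min(container.width / imageSize.width,
                    container.height / imageSize.height)
    let fitted = CGSize(width: imageSize.width * scale,
                        height: imageSize.height * scale)
    return CGRect(x: (container.width - fitted.width) / 2,
                  y: (container.height - fitted.height) / 2,
                  width: fitted.width,
                  height: fitted.height)
}

// Map a normalized (0.0–1.0) detection rect into the fitted sub-rect.
func mapNormalized(_ rect: CGRect, into fit: CGRect) -> CGRect {
    CGRect(x: fit.minX + rect.minX * fit.width,
           y: fit.minY + rect.minY * fit.height,
           width: rect.width * fit.width,
           height: rect.height * fit.height)
}
```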

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook
SwiftUI for AI Apps: building reactive, intelligent interfaces that respond to model outputs, stream tokens, and visualize AI predictions in real time. You can find it here: Leanpub.com or Amazon.
Also check out the other programming ebooks on Python, TypeScript, C#, and Swift: Leanpub.com or Amazon.
