DEV Community

Iniyarajan

Building Custom Hand Pose Models: From Grief to Innovation

Last month, I found myself in an unexpected place: building an iOS app to honor my recently departed cat, Abu. What started as a grief-driven project to create a digital memorial became a fascinating exploration into custom machine learning models and gesture recognition. This journey taught me something profound about modern iOS development—we don't need to download every app for every need. Sometimes, the most meaningful solutions come from building them ourselves.

The Catalyst: When Loss Drives Creation

After losing Abu, I couldn't bear the thought of using another generic pet tracking app. The existing solutions felt cold and impersonal. Instead, I decided to build something unique—an app where hand gestures could trigger different memories and photo collections. This led me down a rabbit hole of custom hand pose detection, and what I discovered changed how I think about AI integration in iOS apps.

The beauty of modern iOS development is that we can now train custom ML models directly within our apps. Apple's CreateML framework has evolved dramatically, and when combined with Vision framework's hand pose detection, we can create incredibly personalized experiences.

[Image: iOS developer workspace. Photo by Christina Morillo on Pexels]

Setting Up Custom Hand Pose Detection in iOS

Here's where things get interesting. While Apple provides excellent built-in hand pose detection through the Vision framework, training your own model allows for highly specific gestures that matter to your users. Let me show you how I implemented this in my memorial app.

First, I set up the basic Vision framework integration:

import Vision
import AVFoundation

class HandPoseDetector: NSObject {
    private let handPoseRequest = VNDetectHumanHandPoseRequest()
    private var customGestureClassifier: VNCoreMLModel?

    override init() {
        super.init()
        setupCustomModel()
    }

    private func setupCustomModel() {
        guard let modelURL = Bundle.main.url(forResource: "CustomHandGestures", withExtension: "mlmodelc"),
              let model = try? VNCoreMLModel(for: MLModel(contentsOf: modelURL)) else {
            print("Failed to load custom gesture model")
            return
        }

        self.customGestureClassifier = model
    }

    func detectHandPose(in pixelBuffer: CVPixelBuffer, completion: @escaping (String?) -> Void) {
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)

        do {
            try handler.perform([handPoseRequest])

            guard let observation = handPoseRequest.results?.first else {
                completion(nil)
                return
            }

            processHandPoseObservation(observation, completion: completion)
        } catch {
            print("Hand pose detection failed: \(error)")
            completion(nil)
        }
    }

    private func processHandPoseObservation(_ observation: VNHumanHandPoseObservation, completion: @escaping (String?) -> Void) {
        // Extract hand landmarks and classify gesture
        guard let customClassifier = customGestureClassifier else {
            completion(nil)
            return
        }

        // Convert the 21 hand landmarks into the same 63-value feature vector
        // used at training time, then run the custom Core ML classifier.
        // (extractLandmarkFeatures and classifyCustomGesture are omitted here.)
        let landmarks = extractLandmarkFeatures(from: observation)
        classifyCustomGesture(landmarks: landmarks, using: customClassifier, completion: completion)
    }
}
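A note on that `extractLandmarkFeatures` step: Vision reports a confidence score for every joint, and classifying frames with poorly tracked joints produces jittery results. Here is a minimal sketch of the filtering idea in Python, assuming each landmark is recorded as an `(x, y, z, confidence)` tuple and a 0.3 threshold; both the tuple layout and the threshold are my assumptions, not values from the app:

```python
# Each landmark is assumed recorded as (x, y, z, confidence); Vision reports
# a per-joint confidence that is worth checking before classification.
MIN_CONFIDENCE = 0.3  # assumed threshold; tune for your device and lighting

def landmarks_to_features(landmarks, min_confidence=MIN_CONFIDENCE):
    """Flatten landmarks to [x, y, z, ...], or None if any joint is unreliable."""
    if any(conf < min_confidence for (_x, _y, _z, conf) in landmarks):
        return None  # skip the frame rather than classify noisy data
    features = []
    for x, y, z, _conf in landmarks:
        features.extend([x, y, z])
    return features
```

Skipping an unreliable frame entirely is usually better than classifying noisy data; at camera frame rates, the next frame arrives within a few tens of milliseconds anyway.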

Training Your Custom Model: The AI/ML Deep Dive

This is where the real magic happens. Instead of relying solely on pre-trained models, I built a custom gesture classifier with CreateML. The process involves collecting gesture data, preprocessing it (I used a small Python pipeline for this step), and training a model that understands the hand movements specific to your application.

import pandas as pd
import numpy as np

# Data collection and preprocessing for hand pose training
def prepare_hand_pose_data(gesture_recordings):
    """
    Convert recorded hand poses into training data.
    Each recording contains 21 hand landmarks with x, y, z coordinates.
    """
    features = []
    labels = []

    for recording in gesture_recordings:
        gesture_name = recording['label']
        poses = recording['hand_poses']

        for pose in poses:
            # Flatten 21 landmarks (x, y, z) into a 63-feature vector
            feature_vector = []
            for landmark in pose:
                feature_vector.extend([landmark.x, landmark.y, landmark.z])

            # Normalize relative to the wrist position
            normalized_features = normalize_hand_pose(feature_vector)
            features.append(normalized_features)
            labels.append(gesture_name)

    return np.array(features), np.array(labels)

def normalize_hand_pose(landmarks):
    """Normalize hand landmarks relative to the wrist position."""
    wrist_x, wrist_y, wrist_z = landmarks[0:3]  # Wrist is the first landmark
    normalized = []

    for i in range(0, len(landmarks), 3):
        x, y, z = landmarks[i:i+3]
        normalized.extend([
            x - wrist_x,
            y - wrist_y,
            z - wrist_z
        ])

    return normalized

# Export the prepared data for training. CreateML is a Swift framework, so the
# classifier itself is trained on the Swift side (or in Xcode's Create ML app);
# this CSV is the hand-off point between the two.
def export_training_data(features, labels, path='hand_gestures.csv'):
    """Write features and labels to a CSV that CreateML's MLClassifier can consume."""
    df = pd.DataFrame(features, columns=[f'f{i}' for i in range(features.shape[1])])
    df['gesture'] = labels
    df.to_csv(path, index=False)
    return df
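Before running a full CreateML training pass, it is worth sanity-checking whether the normalized features actually separate your gestures. A nearest-centroid classifier is enough for that check; this is a prototyping sketch of that idea, not the model the app ships with:

```python
import numpy as np

def train_centroids(features, labels):
    """Average the normalized feature vectors per gesture into one centroid each."""
    labels = np.asarray(labels)
    features = np.asarray(features, dtype=float)
    return {g: features[labels == g].mean(axis=0) for g in set(labels.tolist())}

def classify(centroids, feature_vector):
    """Pick the gesture whose centroid is nearest in Euclidean distance."""
    v = np.asarray(feature_vector, dtype=float)
    return min(centroids, key=lambda g: np.linalg.norm(centroids[g] - v))
```

If this simple baseline already distinguishes your gestures, a trained classifier almost certainly will; if it cannot, more data or better normalization is needed before touching CreateML.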

[Diagram: ML model training workflow]

Practical Implementation: Beyond the Technical Details

The real breakthrough came when I realized that building custom ML models isn't just about the technology—it's about creating deeply personal experiences. In my memorial app for Abu, different hand gestures trigger different memory collections:

  • A closed fist opens his favorite toy photos
  • An open palm shows his sleeping spots around the house
  • A pointing gesture reveals his favorite outdoor adventures

This approach transforms a simple photo gallery into an interactive memorial that responds to natural human gestures. The technical implementation becomes secondary to the emotional connection.

Here's how I integrated the custom gestures with SwiftUI:

struct MemorialView: View {
    // HandPoseDetector is assumed to conform to ObservableObject and to
    // republish classified gestures through a Combine gesturePublisher.
    @StateObject private var handPoseDetector = HandPoseDetector()
    @State private var currentGesture: String? = nil
    @State private var displayedMemories: [Memory] = []

    var body: some View {
        ZStack {
            // CameraPreviewView wraps the AVCaptureSession preview layer
            // via UIViewRepresentable (implementation not shown).
            CameraPreviewView()
                .onReceive(handPoseDetector.gesturePublisher) { gesture in
                    handleGestureDetected(gesture)
                }

            VStack {
                if let gesture = currentGesture {
                    Text("Gesture: \(gesture)")
                        .font(.headline)
                        .foregroundColor(.white)
                        .padding()
                        .background(Color.black.opacity(0.7))
                        .cornerRadius(10)
                }

                Spacer()

                LazyVGrid(columns: Array(repeating: GridItem(.flexible()), count: 2)) {
                    ForEach(displayedMemories, id: \.id) { memory in
                        MemoryCard(memory: memory)
                    }
                }
            }
            .padding()
        }
    }

    private func handleGestureDetected(_ gesture: String) {
        currentGesture = gesture

        switch gesture {
        case "closed_fist":
            displayedMemories = MemoryStore.shared.toyMemories
        case "open_palm":
            displayedMemories = MemoryStore.shared.sleepingSpots
        case "pointing":
            displayedMemories = MemoryStore.shared.adventures
        default:
            displayedMemories = MemoryStore.shared.allMemories
        }
    }
}
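One practical detail the view above glosses over: a live classifier fires many times per second, and a single misclassified frame would swap the entire memory grid. Requiring the same prediction over several consecutive frames smooths this out. Here is a sketch of that debouncing idea (the frame count of 5 is an assumption to tune, not a value from the app):

```python
class GestureDebouncer:
    """Report a gesture only after it is seen in N consecutive frames."""

    def __init__(self, required_frames=5):
        self.required_frames = required_frames
        self.candidate = None  # gesture currently being counted
        self.count = 0         # consecutive frames for the candidate
        self.current = None    # last confirmed gesture

    def update(self, gesture):
        """Feed one per-frame prediction; returns a gesture only when it is newly confirmed."""
        if gesture == self.candidate:
            self.count += 1
        else:
            self.candidate = gesture
            self.count = 1
        if self.count >= self.required_frames and gesture != self.current:
            self.current = gesture
            return gesture  # newly confirmed gesture
        return None  # nothing new confirmed this frame
```

The same logic ports directly to Swift inside `handleGestureDetected`, so the memory grid only changes when the detector is genuinely confident.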

The Bigger Picture: Stop Downloading, Start Building

This project taught me something crucial about modern app development. We live in an era where we can build incredibly sophisticated, personalized applications without massive teams or budgets. The combination of iOS development tools, accessible ML frameworks, and creative problem-solving means we can create solutions that are deeply meaningful rather than generically functional.

When I started this project, I considered downloading existing pet memorial apps, photo organizers, or gesture recognition apps. None of them would have captured what I needed—a way to interact with memories through meaningful gestures that Abu and I shared.

[Diagram: app architecture flow]

Practical Takeaways for Developers

Start with Emotion, Not Technology: The most compelling apps solve personal problems. Don't begin with "what's cool technically"—start with "what matters personally."

Leverage Custom ML Models: Apple's CreateML makes custom model training accessible. You don't need to be an ML expert to create models that understand your specific use case.

Combine Multiple Frameworks: The real power comes from combining Vision, CoreML, SwiftUI, and AVFoundation. Each framework handles what it does best.

Iterate Based on Real Use: I trained my gesture model by actually using the app daily, collecting data on how I naturally wanted to interact with Abu's memories.

Think Beyond Generic Solutions: Custom gestures, personalized UI, and tailored experiences create emotional connections that generic apps can't match.

The Technical Foundation You Need

To implement similar functionality in your own projects:

  1. Set up Vision framework for basic hand pose detection
  2. Collect training data by recording hand gestures in your app
  3. Use CreateML to train a custom gesture classifier
  4. Integrate the model with real-time camera input
  5. Create meaningful interactions based on detected gestures
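For step 2, a simple durable format for recorded gestures is JSON Lines: one labeled sample appended per line as you record. A minimal sketch of that idea; the file path, schema, and helper names here are illustrative, not from the app:

```python
import json
import time

def record_sample(path, gesture_label, landmarks):
    """Append one labeled hand-pose sample to a JSON Lines file."""
    sample = {
        "label": gesture_label,
        "timestamp": time.time(),
        "landmarks": landmarks,  # flat [x, y, z, ...] list; 63 values expected
    }
    with open(path, "a") as f:
        f.write(json.dumps(sample) + "\n")

def load_samples(path):
    """Read every recorded sample back for the training pipeline."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```

Append-only files survive app crashes mid-session, and the per-sample timestamps make it easy to throw away a bad recording run later.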

The entire process, from idea to working app, took me about two weeks of evening development. The longest part wasn't the coding—it was collecting enough gesture training data and fine-tuning the model for reliable recognition.

Conclusion: Building What Matters

Losing Abu was heartbreaking, but building this memorial app became a healing process. More importantly, it showed me that we're living in an incredible time for iOS development. We have the tools to build deeply personal, AI-powered applications that respond to our individual needs and ways of interacting with technology.

The next time you find yourself about to download another generic app, ask yourself: "Could I build something better? Something that actually understands how I want to interact with this problem?" The answer, more often than you might think, is yes.

Modern iOS development, combined with accessible machine learning tools, gives us the power to create applications that are not just functional, but meaningful. Whether you're honoring a lost pet, solving a family problem, or addressing a personal challenge, the tools exist to build exactly what you need.

Stop downloading. Start building. The most important app you'll ever create might be the one that solves your own problem.
