goh_dev

rtmlib-ts — Real-time Pose Estimation & Object Detection in Browser with TypeScript + YOLO12

Project Overview

Hello dev.to community,

I would like to share a project I have been developing: rtmlib-ts, a TypeScript library for browser-based AI inference with YOLO12 support. It runs real-time object detection, multi-person pose estimation, animal detection, and 3D pose estimation directly in the web browser via WebAssembly or WebGPU, with no backend infrastructure required.

Repository: https://github.com/GOH23/rtmlib-ts

Live Playground: https://rtmlib-playground.vercel.app/


Core Features

| Feature | Description |
| --- | --- |
| Object Detection | 80 COCO classes using YOLO12n |
| Pose Estimation | 17-keypoint skeleton tracking for humans |
| Animal Detection | 30 animal species with pose estimation |
| 3D Pose Estimation | Full-body 3D keypoints (x, y, z coordinates) |
| Custom Models | Run any ONNX model with a flexible API |
| Browser-based | Pure WebAssembly/WebGPU, no server required |
| Video Support | Real-time camera streams and video files |
| Performance | ~30-255 ms inference, depending on detector and configuration |

Installation

npm install rtmlib-ts

Available Detectors

1. ObjectDetector — General Object Detection

Detects 80 COCO classes (person, car, dog, etc.). Models are loaded automatically.

import { ObjectDetector } from 'rtmlib-ts';

const detector = new ObjectDetector({
  classes: ['person', 'car'],  // Optional: filter classes
  confidence: 0.5,
  inputSize: [416, 416],
  backend: 'wasm',  // or 'webgpu'
});

await detector.init();

const objects = await detector.detectFromCanvas(canvas);
console.log(`Found ${objects.length} objects`);

// Change classes dynamically
detector.setClasses(['dog', 'cat']);

Input Sources:

  • detectFromCanvas(canvas) — HTMLCanvasElement
  • detectFromVideo(video) — HTMLVideoElement (real-time)
  • detectFromImage(image) — HTMLImageElement
  • detectFromFile(file) — File upload
  • detectFromBlob(blob) — Camera capture
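Whichever input source is used, a common next step is tallying detections per class. The sketch below assumes each detection exposes a `className` string, as the AnimalDetector results later in this post do; the `Detection` type is hypothetical and not taken from the rtmlib-ts typings.

```typescript
// Hypothetical result shape: only `className` is assumed here.
interface Detection {
  className: string;
}

// Count how many detections fall into each class.
function countByClass(detections: Detection[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const d of detections) {
    counts.set(d.className, (counts.get(d.className) ?? 0) + 1);
  }
  return counts;
}

// Hand-written data standing in for detector output.
const sampleDetections: Detection[] = [
  { className: 'person' },
  { className: 'car' },
  { className: 'person' },
];
const classCounts = countByClass(sampleDetections);
console.log(classCounts.get('person'), classCounts.get('car')); // 2 1
```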

2. PoseDetector — Human Pose Estimation

Combines YOLO12 detection with RTMW pose estimation for 17 keypoints.

import { PoseDetector } from 'rtmlib-ts';

const detector = new PoseDetector({
  detInputSize: [416, 416],
  poseInputSize: [384, 288],
  detConfidence: 0.5,
  poseConfidence: 0.3,
  backend: 'wasm',
});

await detector.init();

const people = await detector.detectFromCanvas(canvas);

people.forEach(person => {
  const visibleKpts = person.keypoints.filter(k => k.visible).length;
  console.log(`Person: ${visibleKpts}/17 keypoints visible`);
});

Keypoints (COCO 17): nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles
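For reference, the standard COCO-17 ordering can be written out as an index-to-name table. The sketch below assumes rtmlib-ts returns keypoints in that standard order, which is worth verifying against the library; the `Keypoint` type is hypothetical and only models the `visible` flag used above.

```typescript
// Standard COCO-17 keypoint order (index -> name).
const COCO_KEYPOINT_NAMES: readonly string[] = [
  'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
  'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
  'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
  'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
];

// Hypothetical keypoint shape: only `visible` is used here.
interface Keypoint {
  visible: boolean;
}

// Return the names of the keypoints flagged as visible.
function visibleKeypointNames(keypoints: Keypoint[]): string[] {
  const names: string[] = [];
  keypoints.forEach((k, i) => {
    const label = COCO_KEYPOINT_NAMES[i];
    if (k.visible && label !== undefined) names.push(label);
  });
  return names;
}

// Fake pose where only the first three keypoints are visible.
const fakePose: Keypoint[] = COCO_KEYPOINT_NAMES.map((_, i) => ({ visible: i < 3 }));
console.log(visibleKeypointNames(fakePose)); // [ 'nose', 'left_eye', 'right_eye' ]
```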


3. AnimalDetector — Animal Detection & Pose

Supports 30 animal species with bounding boxes and 17-keypoint pose estimation.

import { AnimalDetector } from 'rtmlib-ts';

const detector = new AnimalDetector({
  classes: ['dog', 'cat', 'horse'],  // Filter specific animals
  detConfidence: 0.5,
  poseConfidence: 0.3,
  backend: 'wasm',
});

await detector.init();

const animals = await detector.detectFromCanvas(canvas);

animals.forEach(animal => {
  console.log(`${animal.className}: ${(animal.bbox.confidence * 100).toFixed(1)}%`);
  console.log(`Keypoints: ${animal.keypoints.length}`);
});

Supported Animals: gorilla, spider-monkey, zebra, elephant, hippo, tiger, lion, panda, dog, cat, horse, and more (30 species in total).


4. Pose3DDetector — 3D Pose Estimation

Provides 3D coordinates (x, y, z in meters) for each keypoint.

import { Pose3DDetector } from 'rtmlib-ts';

const detector = new Pose3DDetector({
  detInputSize: [640, 640],
  poseInputSize: [384, 288],
  backend: 'wasm',
});

await detector.init();

const result = await detector.detectFromCanvas(canvas);

result.keypoints.forEach((person, i) => {
  person.forEach((kpt, j) => {
    // kpt = [x, y, z] in meters
    console.log(`Person ${i}, Keypoint ${j}: [${kpt[0]}, ${kpt[1]}, ${kpt[2]}]`);
  });
});

Output Structure:

  • keypoints — 3D coordinates [numPeople][17][3]
  • keypoints2d — 2D projection for canvas drawing
  • scores — Confidence scores per keypoint
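Because the 3D keypoints are expressed in meters, simple body measurements reduce to Euclidean distances. The sketch below computes a shoulder width from one person's `[17][3]` array; the shoulder indices (5 and 6) assume standard COCO-17 ordering, and the helper names are illustrative rather than part of rtmlib-ts.

```typescript
type Point3D = [number, number, number];

// Euclidean distance between two 3D points, in the same unit as the inputs.
function distance3D(a: Point3D, b: Point3D): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Assumed COCO-17 indices for the shoulders.
const LEFT_SHOULDER = 5;
const RIGHT_SHOULDER = 6;

// Shoulder width for one person, or undefined if the array is too short.
function shoulderWidth(person: Point3D[]): number | undefined {
  const left = person[LEFT_SHOULDER];
  const right = person[RIGHT_SHOULDER];
  if (!left || !right) return undefined;
  return distance3D(left, right);
}

// Fake person: all keypoints at the origin except the two shoulders.
const fakePerson: Point3D[] = Array.from({ length: 17 }, (): Point3D => [0, 0, 0]);
fakePerson[LEFT_SHOULDER] = [0.2, 1.4, 0];
fakePerson[RIGHT_SHOULDER] = [-0.2, 1.4, 0];
console.log(shoulderWidth(fakePerson)?.toFixed(2)); // 0.40
```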

5. CustomDetector — Any ONNX Model

Runs custom ONNX models with configurable preprocessing and postprocessing.

import { CustomDetector } from 'rtmlib-ts';

const detector = new CustomDetector({
  model: 'path/to/model.onnx',
  inputSize: [224, 224],
  normalization: {
    mean: [123.675, 116.28, 103.53],
    std: [58.395, 57.12, 57.375],
  },
  postprocessing: (outputs) => {
    const output = outputs['output'];
    const scores = Array.from(output.data);
    const predictedClass = scores.indexOf(Math.max(...scores));
    return { predictedClass, confidence: scores[predictedClass] };
  },
});

await detector.init();

const result = await detector.runFromCanvas(canvas);
console.log(`Predicted: ${result.data.predictedClass}`);
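The `mean` and `std` values in the config above match the commonly used ImageNet statistics on a 0-255 pixel scale. As a rough illustration of what such a normalization step usually does, the sketch below converts interleaved RGBA bytes to a planar (channel-first) float array with `(value - mean) / std` per channel. This is an assumption about typical preprocessing, not a description of rtmlib-ts internals; `normalizeToCHW` and `Normalization` are hypothetical names.

```typescript
interface Normalization {
  mean: [number, number, number];
  std: [number, number, number];
}

// Convert interleaved RGBA bytes (as in ImageData.data) to a planar
// CHW Float32Array, applying (value - mean) / std per channel.
function normalizeToCHW(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  norm: Normalization,
): Float32Array {
  const pixels = width * height;
  const out = new Float32Array(3 * pixels);
  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 3; c++) {
      // The alpha byte (4th of each pixel) is skipped.
      out[c * pixels + p] = (rgba[p * 4 + c] - norm.mean[c]) / norm.std[c];
    }
  }
  return out;
}

// Two pixels with an identity normalization, to show the channel-first layout.
const twoPixels = new Uint8ClampedArray([10, 20, 30, 255, 40, 50, 60, 255]);
const identityNorm: Normalization = { mean: [0, 0, 0], std: [1, 1, 1] };
console.log(Array.from(normalizeToCHW(twoPixels, 2, 1, identityNorm)));
// [ 10, 40, 20, 50, 30, 60 ]
```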

Use Cases:

  • Image classification (ResNet, MobileNet)
  • Object detection (YOLO variants)
  • Semantic segmentation
  • Face landmarks
  • Custom models
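For the image classification case, the postprocessing example above reports the raw maximum output value as `confidence`. If the model emits logits rather than probabilities (this depends on the exported model), a softmax turns them into values that sum to 1. The helpers below are a self-contained sketch, not part of rtmlib-ts; the loop-based `argmax` also avoids spreading a very large array into `Math.max`.

```typescript
// Index of the largest value, using a loop instead of Math.max(...values)
// so very large output arrays do not hit argument-count limits.
function argmax(values: ArrayLike<number>): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Numerically stable softmax: subtract the max before exponentiating.
function softmax(logits: ArrayLike<number>): number[] {
  const max = logits[argmax(logits)];
  const exps = Array.from(logits, (v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const exampleLogits = [2, 1, 0];
const probs = softmax(exampleLogits);
const topClass = argmax(probs);
console.log(topClass, probs[topClass].toFixed(3)); // 0 0.665
```

Inside a `postprocessing` callback, the same two helpers could be applied to `output.data` before returning the predicted class and its probability.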

Performance Benchmarks

Typical inference times on an M1 MacBook Pro:

| Detector | Configuration | Time | Use Case |
| --- | --- | --- | --- |
| ObjectDetector | WASM, 416×416 | ~40 ms | Real-time video |
| ObjectDetector | WASM, 640×640 | ~80 ms | High accuracy |
| ObjectDetector | WebGPU, 640×640 | ~30 ms | Fastest |
| PoseDetector | WASM, 416×416 + 384×288 | ~85 ms | Real-time pose |
| PoseDetector | WebGPU, 640×640 + 384×288 | ~60 ms | High accuracy |
| AnimalDetector | WASM, 640×640 + 256×192 | ~190 ms | 3 animals |
| Pose3DDetector | WASM, 640×640 + 384×288 | ~255 ms | 3 people 3D |
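To relate these latencies to frame rates: if inference is the only per-frame cost and frames are processed one after another, the achievable rate is at most 1000 divided by the latency in milliseconds. The snippet below applies that to a few of the figures quoted above; real-world rates will be lower once drawing and camera overhead are included.

```typescript
// Convert a per-frame latency in milliseconds to frames per second.
function latencyToFps(latencyMs: number): number {
  return 1000 / latencyMs;
}

// A few of the latencies quoted in the benchmark table.
const quotedLatenciesMs = [30, 40, 85, 255];
for (const ms of quotedLatenciesMs) {
  console.log(`${ms} ms -> ${latencyToFps(ms).toFixed(1)} FPS`);
}
// 30 ms -> 33.3 FPS
// 40 ms -> 25.0 FPS
// 85 ms -> 11.8 FPS
// 255 ms -> 3.9 FPS
```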

Optimization Tips:

  • Use 416×416 input size for video/real-time applications
  • Use 640×640 for static images where accuracy matters
  • Switch to the WebGPU backend where available (enabled by default in Chrome/Edge 113+)
  • Filter classes to reduce processing overhead
  • Increase confidence thresholds to skip low-quality detections
  • Process every Nth frame in video loops
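The last tip, processing every Nth frame, can be sketched as a small counter-based gate. `everyNthFrame` is an illustrative helper, not part of rtmlib-ts; in a video loop it would decide whether to call the detector on the current frame or reuse the previous results.

```typescript
// Returns a function that answers "should this frame be processed?"
// It says yes on the first frame and then on every `n`th frame after it.
function everyNthFrame(n: number): () => boolean {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  let frameIndex = 0;
  return () => frameIndex++ % n === 0;
}

const shouldProcess = everyNthFrame(3);
const decisions = Array.from({ length: 7 }, () => shouldProcess());
console.log(decisions); // [ true, false, false, true, false, false, true ]
```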

Browser Support

| Browser | Version | Backend |
| --- | --- | --- |
| Chrome | 94+ | WASM; WebGPU from 113 |
| Edge | 94+ | WASM; WebGPU from 113 |
| Firefox | 95+ | WASM |
| Safari | 16.4+ | WASM |
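Since WebGPU support varies by browser, one option is to choose the backend at runtime. The sketch below checks for a `gpu` property on a navigator-like object and falls back to `'wasm'`; `pickBackend` is a hypothetical helper, and the object is passed in as a parameter so the function can be exercised outside a browser.

```typescript
type Backend = 'wasm' | 'webgpu';

// Pick 'webgpu' when the given navigator-like object exposes `gpu`,
// otherwise fall back to 'wasm'.
function pickBackend(nav: object): Backend {
  return 'gpu' in nav ? 'webgpu' : 'wasm';
}

console.log(pickBackend({ gpu: {} })); // webgpu
console.log(pickBackend({}));          // wasm
```

In a page this would be called as `pickBackend(navigator)`. Note that the presence of `navigator.gpu` does not guarantee a usable adapter, since `navigator.gpu.requestAdapter()` can still resolve to `null`, so a stricter check may be worthwhile.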

Complete Example — Real-time Video Detection

import { ObjectDetector, drawResultsOnCanvas } from 'rtmlib-ts';

async function startDetection() {
  // Initialize detector, falling back to WASM when WebGPU is unavailable
  const detector = new ObjectDetector({
    classes: ['person', 'car'],
    inputSize: [416, 416],
    backend: 'gpu' in navigator ? 'webgpu' : 'wasm',
  });

  await detector.init();

  // Look up the page elements once, outside the loop
  const video = document.querySelector('video')!;
  const canvas = document.querySelector('canvas')!;
  const ctx = canvas.getContext('2d')!;

  // Detection loop: runs for as long as the video is playing
  video.addEventListener('play', async () => {
    while (!video.paused && !video.ended) {
      const objects = await detector.detectFromVideo(video);

      // Draw results
      drawResultsOnCanvas(ctx, objects, 'object');

      // Get stats
      const stats = (objects as any).stats;
      console.log(`Detected ${stats.totalCount} objects in ${stats.inferenceTime}ms`);

      // Yield to the browser until the next animation frame
      await new Promise(resolve => requestAnimationFrame(resolve));
    }
  });

  // Setup camera and start playback (the listener above is already registered)
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { width: 1280, height: 720 },
  });
  video.srcObject = stream;
  await video.play();
}

startDetection();

Drawing Utilities

Built-in canvas drawing functions for quick visualization:

import { 
  drawDetectionsOnCanvas,
  drawPoseOnCanvas,
  drawResultsOnCanvas
} from 'rtmlib-ts';

// Draws either object or pose results; the mode is passed explicitly here
drawResultsOnCanvas(ctx, results, 'object');

// Custom drawing
drawDetectionsOnCanvas(ctx, detections, '#00ff00');
drawPoseOnCanvas(ctx, people, 0.3);  // 0.3 confidence threshold
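For fully custom skeleton rendering, the usual approach is to connect pairs of keypoints and skip any pair where either endpoint falls below a confidence threshold. The sketch below shows that filtering step only. It assumes standard COCO-17 index order and a hypothetical `Keypoint2D` shape with a per-keypoint `score`; the limb list is a subset of commonly drawn connections, not the set rtmlib-ts uses.

```typescript
// Hypothetical 2D keypoint shape; `score` is an assumed per-keypoint confidence.
interface Keypoint2D {
  x: number;
  y: number;
  score: number;
}

// A subset of limb connections, as index pairs in standard COCO-17 order.
const LIMBS: ReadonlyArray<[number, number]> = [
  [5, 6],             // shoulders
  [5, 7], [7, 9],     // left arm
  [6, 8], [8, 10],    // right arm
  [11, 12],           // hips
  [5, 11], [6, 12],   // torso sides
  [11, 13], [13, 15], // left leg
  [12, 14], [14, 16], // right leg
];

// Keep only limbs whose two endpoints both reach `minScore`.
function limbsToDraw(
  kpts: Keypoint2D[],
  minScore: number,
): Array<[Keypoint2D, Keypoint2D]> {
  const segments: Array<[Keypoint2D, Keypoint2D]> = [];
  for (const [a, b] of LIMBS) {
    const ka = kpts[a];
    const kb = kpts[b];
    if (ka && kb && ka.score >= minScore && kb.score >= minScore) {
      segments.push([ka, kb]);
    }
  }
  return segments;
}

// Fake pose: every keypoint confident except the left wrist (index 9).
const fakeKeypoints: Keypoint2D[] = Array.from({ length: 17 }, (_, i) => ({
  x: i * 10,
  y: i * 5,
  score: i === 9 ? 0.1 : 1,
}));
console.log(limbsToDraw(fakeKeypoints, 0.3).length); // 11
```

Each returned pair could then be drawn with the Canvas 2D API, for example `ctx.beginPath()`, `ctx.moveTo(a.x, a.y)`, `ctx.lineTo(b.x, b.y)`, `ctx.stroke()`.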

Troubleshooting

| Issue | Solution |
| --- | --- |
| No detections | Lower confidence threshold (try 0.3) |
| Slow inference | Use WebGPU, reduce input size, filter classes |
| Model loading failed | Use HTTP server (not file://), check CORS |
| Unknown class | Use exact COCO class names (see getAvailableClasses()) |

Contributing

Feedback and contributions are welcome:

  • Report bugs or request features via Issues
  • Submit code improvements via Pull Requests
  • Star the repository if you find it useful

License

Apache 2.0 — Free for commercial and open-source use.


If you have any questions or suggestions, please leave a comment or start a discussion in the repository.

