## Project Overview
Hello dev.to community,
I would like to share a project I have been developing — rtmlib-ts. This is a TypeScript library designed for browser-based AI inference, featuring support for YOLO12. It enables real-time object detection, multi-person pose estimation, animal detection, and even 3D pose tracking directly in the web browser using WebAssembly, without requiring any backend infrastructure.
Repository: https://github.com/GOH23/rtmlib-ts
Live Playground: https://rtmlib-playground.vercel.app/
## Core Features
| Feature | Description |
|---|---|
| Object Detection | 80 COCO classes using YOLO12n |
| Pose Estimation | 17 keypoints skeleton tracking for humans |
| Animal Detection | 30 animal species with pose estimation |
| 3D Pose Estimation | Full-body 3D keypoints (x, y, z coordinates) |
| Custom Models | Run any ONNX model with flexible API |
| Browser-based | Pure WebAssembly/WebGPU, no server required |
| Video Support | Real-time camera streams and video files |
| Performance | ~40-200ms inference depending on configuration |
## Installation

```bash
npm install rtmlib-ts
```
## Available Detectors

### 1. ObjectDetector — General Object Detection

Detects 80 COCO classes (person, car, dog, etc.). Models are loaded automatically.

```typescript
import { ObjectDetector } from 'rtmlib-ts';

const detector = new ObjectDetector({
  classes: ['person', 'car'], // Optional: filter classes
  confidence: 0.5,
  inputSize: [416, 416],
  backend: 'wasm', // or 'webgpu'
});

await detector.init();

const objects = await detector.detectFromCanvas(canvas);
console.log(`Found ${objects.length} objects`);

// Change classes dynamically
detector.setClasses(['dog', 'cat']);
```
**Input Sources:**

- `detectFromCanvas(canvas)` — HTMLCanvasElement
- `detectFromVideo(video)` — HTMLVideoElement (real-time)
- `detectFromImage(image)` — HTMLImageElement
- `detectFromFile(file)` — File upload
- `detectFromBlob(blob)` — Camera capture
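Whatever the input source, you get back an array of detections that you can post-process with plain TypeScript. As a small sketch, here is a per-class summary helper; the `className` field is an assumption based on the AnimalDetector results shown later, so adjust it to the actual result shape:

```typescript
// Minimal shape assumed for a single detection result.
interface Detection {
  className: string;
}

// Count how many detections there are per class, e.g. to show a
// quick "2 persons, 1 car" summary in the UI.
function countByClass(objects: Detection[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const obj of objects) {
    counts[obj.className] = (counts[obj.className] ?? 0) + 1;
  }
  return counts;
}

// Usage sketch: countByClass(await detector.detectFromCanvas(canvas))
```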
### 2. PoseDetector — Human Pose Estimation

Combines YOLO12 detection with RTMW pose estimation for 17 keypoints.

```typescript
import { PoseDetector } from 'rtmlib-ts';

const detector = new PoseDetector({
  detInputSize: [416, 416],
  poseInputSize: [384, 288],
  detConfidence: 0.5,
  poseConfidence: 0.3,
  backend: 'wasm',
});

await detector.init();

const people = await detector.detectFromCanvas(canvas);
people.forEach(person => {
  const visibleKpts = person.keypoints.filter(k => k.visible).length;
  console.log(`Person: ${visibleKpts}/17 keypoints visible`);
});
```
**Keypoints (COCO 17):** nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles
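If you need to address keypoints by name rather than index, a lookup table helps. The ordering below follows the standard COCO-17 convention; the exact index assignments are an assumption here, so verify them against the library's output:

```typescript
// Standard COCO-17 keypoint order (assumed; verify against rtmlib-ts output).
const COCO_KEYPOINTS = [
  "nose",
  "left_eye", "right_eye",
  "left_ear", "right_ear",
  "left_shoulder", "right_shoulder",
  "left_elbow", "right_elbow",
  "left_wrist", "right_wrist",
  "left_hip", "right_hip",
  "left_knee", "right_knee",
  "left_ankle", "right_ankle",
] as const;

// Map a keypoint index to its human-readable name.
function keypointName(index: number): string {
  return COCO_KEYPOINTS[index] ?? `unknown_${index}`;
}
```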
### 3. AnimalDetector — Animal Detection & Pose

Supports 30 animal species with bounding boxes and 17-keypoint pose estimation.

```typescript
import { AnimalDetector } from 'rtmlib-ts';

const detector = new AnimalDetector({
  classes: ['dog', 'cat', 'horse'], // Filter specific animals
  detConfidence: 0.5,
  poseConfidence: 0.3,
  backend: 'wasm',
});

await detector.init();

const animals = await detector.detectFromCanvas(canvas);
animals.forEach(animal => {
  console.log(`${animal.className}: ${(animal.bbox.confidence * 100).toFixed(1)}%`);
  console.log(`Keypoints: ${animal.keypoints.length}`);
});
```
Supported Animals: gorilla, spider-monkey, zebra, elephant, hippo, tiger, lion, panda, dog, cat, horse, and 20 more species.
### 4. Pose3DDetector — 3D Pose Estimation

Provides 3D coordinates (x, y, z in meters) for each keypoint.

```typescript
import { Pose3DDetector } from 'rtmlib-ts';

const detector = new Pose3DDetector({
  detInputSize: [640, 640],
  poseInputSize: [384, 288],
  backend: 'wasm',
});

await detector.init();

const result = await detector.detectFromCanvas(canvas);
result.keypoints.forEach((person, i) => {
  person.forEach((kpt, j) => {
    // kpt = [x, y, z] in meters
    console.log(`Person ${i}, Keypoint ${j}: [${kpt[0]}, ${kpt[1]}, ${kpt[2]}]`);
  });
});
```
**Output Structure:**

- `keypoints` — 3D coordinates, shape `[numPeople][17][3]`
- `keypoints2d` — 2D projection for canvas drawing
- `scores` — Confidence scores per keypoint
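Because the coordinates are in meters, metric quantities such as limb lengths come straight out of a Euclidean distance. A minimal sketch (the COCO index assignments mentioned in the comment are an assumption):

```typescript
// One 3D keypoint as produced by Pose3DDetector: [x, y, z] in meters.
type Kpt3D = [number, number, number];

// Euclidean distance between two 3D keypoints, in meters.
function distance3d(a: Kpt3D, b: Kpt3D): number {
  const dx = a[0] - b[0];
  const dy = a[1] - b[1];
  const dz = a[2] - b[2];
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// Usage sketch: forearm length from elbow to wrist of the first person,
// assuming COCO indices 7 (left_elbow) and 9 (left_wrist):
// const forearm = distance3d(result.keypoints[0][7], result.keypoints[0][9]);
```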
### 5. CustomDetector — Any ONNX Model

Maximum flexibility for custom ONNX models with preprocessing/postprocessing.

```typescript
import { CustomDetector } from 'rtmlib-ts';

const detector = new CustomDetector({
  model: 'path/to/model.onnx',
  inputSize: [224, 224],
  normalization: {
    mean: [123.675, 116.28, 103.53],
    std: [58.395, 57.12, 57.375],
  },
  postprocessing: (outputs) => {
    const output = outputs['output'];
    const scores = Array.from(output.data);
    const predictedClass = scores.indexOf(Math.max(...scores));
    return { predictedClass, confidence: scores[predictedClass] };
  },
});

await detector.init();

const result = await detector.runFromCanvas(canvas);
console.log(`Predicted: ${result.data.predictedClass}`);
```
Use Cases:
- Image classification (ResNet, MobileNet)
- Object detection (YOLO variants)
- Semantic segmentation
- Face landmarks
- Custom models
## Performance Benchmarks

Typical inference times on an M1 MacBook Pro:
| Detector | Configuration | Time | Use Case |
|---|---|---|---|
| ObjectDetector | WASM, 416×416 | ~40ms | Real-time video |
| ObjectDetector | WASM, 640×640 | ~80ms | High accuracy |
| ObjectDetector | WebGPU, 640×640 | ~30ms | Fastest |
| PoseDetector | WASM, 416×416 + 384×288 | ~85ms | Real-time pose |
| PoseDetector | WebGPU, 640×640 + 384×288 | ~60ms | High accuracy |
| AnimalDetector | WASM, 640×640 + 256×192 | ~190ms | 3 animals |
| Pose3DDetector | WASM, 640×640 + 384×288 | ~255ms | 3 people 3D |
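To relate these per-frame times to achievable video frame rates, note that an inference time of t milliseconds caps throughput at 1000/t frames per second:

```typescript
// Convert a per-frame inference time (milliseconds) into the maximum
// sustainable frame rate, assuming inference dominates the frame budget.
function msToFps(ms: number): number {
  return 1000 / ms;
}

// e.g. the ~40 ms ObjectDetector config tops out at 25 FPS,
// while the ~200 ms configurations are closer to 5 FPS.
```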
**Optimization Tips:**
- Use 416×416 input size for video/real-time applications
- Use 640×640 for static images where accuracy matters
- Switch to WebGPU backend if available (Chrome/Edge 94+)
- Filter classes to reduce processing overhead
- Increase confidence thresholds to skip low-quality detections
- Process every Nth frame in video loops
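The last tip (processing every Nth frame) can be as simple as a counter check, reusing the previous results for drawing in between; a minimal sketch:

```typescript
// Returns true on frames 0, n, 2n, ... so expensive inference runs
// once per n frames; drawing can reuse cached results otherwise.
function shouldProcessFrame(frameIndex: number, everyN: number): boolean {
  return everyN <= 1 || frameIndex % everyN === 0;
}

// Usage sketch inside a render loop:
// let frame = 0;
// if (shouldProcessFrame(frame++, 3)) {
//   lastResults = await detector.detectFromVideo(video);
// }
// drawResultsOnCanvas(ctx, lastResults, 'object'); // redraw cached results
```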
## Browser Support
| Browser | Version | Backend |
|---|---|---|
| Chrome | 94+ | WASM, WebGPU |
| Edge | 94+ | WASM, WebGPU |
| Firefox | 95+ | WASM |
| Safari | 16.4+ | WASM |
## Complete Example — Real-time Video Detection

```typescript
import { ObjectDetector, drawResultsOnCanvas } from 'rtmlib-ts';

async function startDetection() {
  // Initialize detector
  const detector = new ObjectDetector({
    classes: ['person', 'car'],
    inputSize: [416, 416],
    backend: 'webgpu', // Use WebGPU if available
  });
  await detector.init();

  // Setup camera
  const video = document.querySelector<HTMLVideoElement>('video')!;
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { width: 1280, height: 720 },
  });
  video.srcObject = stream;

  // Detection loop
  video.addEventListener('play', async () => {
    const canvas = document.querySelector<HTMLCanvasElement>('canvas')!;
    const ctx = canvas.getContext('2d')!;
    while (!video.paused && !video.ended) {
      const objects = await detector.detectFromVideo(video);

      // Draw results
      drawResultsOnCanvas(ctx, objects, 'object');

      // Get stats
      const stats = (objects as any).stats;
      console.log(`Detected ${stats.totalCount} objects in ${stats.inferenceTime}ms`);

      await new Promise(resolve => requestAnimationFrame(resolve));
    }
  });

  // Start playback so the 'play' listener above fires
  await video.play();
}

startDetection();
```
## Drawing Utilities

Built-in canvas drawing functions for quick visualization:

```typescript
import {
  drawDetectionsOnCanvas,
  drawPoseOnCanvas,
  drawResultsOnCanvas,
} from 'rtmlib-ts';

// Auto-detects mode (object or pose)
drawResultsOnCanvas(ctx, results, 'object');

// Custom drawing
drawDetectionsOnCanvas(ctx, detections, '#00ff00');
drawPoseOnCanvas(ctx, people, 0.3); // 0.3 confidence threshold
```
## Troubleshooting
| Issue | Solution |
|---|---|
| No detections | Lower confidence threshold (try 0.3) |
| Slow inference | Use WebGPU, reduce input size, filter classes |
| Model loading failed | Use HTTP server (not file://), check CORS |
| Unknown class | Use exact COCO class names (see `getAvailableClasses()`) |
## Contributing
Feedback and contributions are welcome:
- Report bugs or request features via Issues
- Submit code improvements via Pull Requests
- Star the repository if you find it useful
## License
Apache 2.0 — Free for commercial and open-source use.
If you have any questions or suggestions, please leave a comment or start a discussion in the repository.