## Project Overview
Hello dev.to community,
I would like to share a project I have been developing — rtmlib-ts. This is a TypeScript library designed for browser-based AI inference, featuring support for YOLO12. It enables real-time object detection, multi-person pose estimation, animal detection, and even 3D pose tracking directly in the web browser using WebAssembly, without requiring any backend infrastructure.
Repository: https://github.com/GOH23/rtmlib-ts
Live Playground: https://rtmlib-playground.vercel.app/
## Core Features
| Feature | Description |
|---|---|
| Object Detection | 80 COCO classes using YOLO12n |
| Pose Estimation | 17 keypoints skeleton tracking for humans |
| Animal Detection | 30 animal species with pose estimation |
| 3D Pose Estimation | Full-body 3D keypoints (x, y, z coordinates) |
| Custom Models | Run any ONNX model with flexible API |
| Browser-based | Pure WebAssembly/WebGPU, no server required |
| Video Support | Real-time camera streams and video files |
| Performance | ~40-200ms inference depending on configuration |
## Installation

```bash
npm install rtmlib-ts
```
## Available Detectors

### 1. ObjectDetector — General Object Detection

Detects 80 COCO classes (person, car, dog, etc.). Models are loaded automatically.

```typescript
import { ObjectDetector } from 'rtmlib-ts';

const detector = new ObjectDetector({
  classes: ['person', 'car'], // Optional: filter classes
  confidence: 0.5,
  inputSize: [416, 416],
  backend: 'wasm', // or 'webgpu'
});

await detector.init();

const objects = await detector.detectFromCanvas(canvas);
console.log(`Found ${objects.length} objects`);

// Change classes dynamically
detector.setClasses(['dog', 'cat']);
```
**Input Sources:**

- `detectFromCanvas(canvas)` — HTMLCanvasElement
- `detectFromVideo(video)` — HTMLVideoElement (real-time)
- `detectFromImage(image)` — HTMLImageElement
- `detectFromFile(file)` — File upload
- `detectFromBlob(blob)` — Camera capture
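Whatever the input source, you get back an array of detections that you can post-process with plain TypeScript. As a small sketch, here is a per-class summary helper; the `className` field is an assumption based on the AnimalDetector results shown later, so adjust it to the actual result shape:

```typescript
// Minimal shape assumed for a single detection result.
interface Detection {
  className: string;
}

// Count how many detections there are per class, e.g. to show a
// quick "2 persons, 1 car" summary in the UI.
function countByClass(objects: Detection[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const obj of objects) {
    counts[obj.className] = (counts[obj.className] ?? 0) + 1;
  }
  return counts;
}

// Usage sketch: countByClass(await detector.detectFromCanvas(canvas))
```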
### 2. PoseDetector — Human Pose Estimation

Combines YOLO12 detection with RTMW pose estimation for 17 keypoints.

```typescript
import { PoseDetector } from 'rtmlib-ts';

const detector = new PoseDetector({
  detInputSize: [416, 416],
  poseInputSize: [384, 288],
  detConfidence: 0.5,
  poseConfidence: 0.3,
  backend: 'wasm',
});

await detector.init();

const people = await detector.detectFromCanvas(canvas);
people.forEach(person => {
  const visibleKpts = person.keypoints.filter(k => k.visible).length;
  console.log(`Person: ${visibleKpts}/17 keypoints visible`);
});
```
**Keypoints (COCO 17):** nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles
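If you need to address keypoints by name rather than index, a lookup table helps. The ordering below follows the standard COCO-17 convention; the exact index assignments are an assumption here, so verify them against the library's output:

```typescript
// Standard COCO-17 keypoint order (assumed; verify against rtmlib-ts output).
const COCO_KEYPOINTS = [
  "nose",
  "left_eye", "right_eye",
  "left_ear", "right_ear",
  "left_shoulder", "right_shoulder",
  "left_elbow", "right_elbow",
  "left_wrist", "right_wrist",
  "left_hip", "right_hip",
  "left_knee", "right_knee",
  "left_ankle", "right_ankle",
] as const;

// Map a keypoint index to its human-readable name.
function keypointName(index: number): string {
  return COCO_KEYPOINTS[index] ?? `unknown_${index}`;
}
```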
### 3. AnimalDetector — Animal Detection & Pose

Supports 30 animal species with bounding boxes and 17-keypoint pose estimation.

```typescript
import { AnimalDetector } from 'rtmlib-ts';

const detector = new AnimalDetector({
  classes: ['dog', 'cat', 'horse'], // Filter specific animals
  detConfidence: 0.5,
  poseConfidence: 0.3,
  backend: 'wasm',
});

await detector.init();

const animals = await detector.detectFromCanvas(canvas);
animals.forEach(animal => {
  console.log(`${animal.className}: ${(animal.bbox.confidence * 100).toFixed(1)}%`);
  console.log(`Keypoints: ${animal.keypoints.length}`);
});
```
Supported Animals: gorilla, spider-monkey, zebra, elephant, hippo, tiger, lion, panda, dog, cat, horse, and 20 more species.
### 4. Pose3DDetector — 3D Pose Estimation

Provides 3D coordinates (x, y, z in meters) for each keypoint.

```typescript
import { Pose3DDetector } from 'rtmlib-ts';

const detector = new Pose3DDetector({
  detInputSize: [640, 640],
  poseInputSize: [384, 288],
  backend: 'wasm',
});

await detector.init();

const result = await detector.detectFromCanvas(canvas);
result.keypoints.forEach((person, i) => {
  person.forEach((kpt, j) => {
    // kpt = [x, y, z] in meters
    console.log(`Person ${i}, Keypoint ${j}: [${kpt[0]}, ${kpt[1]}, ${kpt[2]}]`);
  });
});
```
**Output Structure:**

- `keypoints` — 3D coordinates, shape `[numPeople][17][3]`
- `keypoints2d` — 2D projection for canvas drawing
- `scores` — Confidence scores per keypoint
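Because the coordinates are in meters, metric quantities such as limb lengths come straight out of a Euclidean distance. A minimal sketch (the COCO index assignments mentioned in the comment are an assumption):

```typescript
// One 3D keypoint as produced by Pose3DDetector: [x, y, z] in meters.
type Kpt3D = [number, number, number];

// Euclidean distance between two 3D keypoints, in meters.
function distance3d(a: Kpt3D, b: Kpt3D): number {
  const dx = a[0] - b[0];
  const dy = a[1] - b[1];
  const dz = a[2] - b[2];
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// Usage sketch: forearm length from elbow to wrist of the first person,
// assuming COCO indices 7 (left_elbow) and 9 (left_wrist):
// const forearm = distance3d(result.keypoints[0][7], result.keypoints[0][9]);
```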
### 5. CustomDetector — Any ONNX Model

Maximum flexibility for custom ONNX models with preprocessing/postprocessing.

```typescript
import { CustomDetector } from 'rtmlib-ts';

const detector = new CustomDetector({
  model: 'path/to/model.onnx',
  inputSize: [224, 224],
  normalization: {
    mean: [123.675, 116.28, 103.53],
    std: [58.395, 57.12, 57.375],
  },
  postprocessing: (outputs) => {
    const output = outputs['output'];
    const scores = Array.from(output.data);
    const predictedClass = scores.indexOf(Math.max(...scores));
    return { predictedClass, confidence: scores[predictedClass] };
  },
});

await detector.init();

const result = await detector.runFromCanvas(canvas);
console.log(`Predicted: ${result.data.predictedClass}`);
```
Use Cases:
- Image classification (ResNet, MobileNet)
- Object detection (YOLO variants)
- Semantic segmentation
- Face landmarks
- Custom models
## Performance Benchmarks

Typical inference times on an M1 MacBook Pro:
| Detector | Configuration | Time | Use Case |
|---|---|---|---|
| ObjectDetector | WASM, 416×416 | ~40ms | Real-time video |
| ObjectDetector | WASM, 640×640 | ~80ms | High accuracy |
| ObjectDetector | WebGPU, 640×640 | ~30ms | Fastest |
| PoseDetector | WASM, 416×416 + 384×288 | ~85ms | Real-time pose |
| PoseDetector | WebGPU, 640×640 + 384×288 | ~60ms | High accuracy |
| AnimalDetector | WASM, 640×640 + 256×192 | ~190ms | 3 animals |
| Pose3DDetector | WASM, 640×640 + 384×288 | ~255ms | 3 people 3D |
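To relate these per-frame times to achievable video frame rates, note that an inference time of t milliseconds caps throughput at 1000/t frames per second:

```typescript
// Convert a per-frame inference time (milliseconds) into the maximum
// sustainable frame rate, assuming inference dominates the frame budget.
function msToFps(ms: number): number {
  return 1000 / ms;
}

// e.g. the ~40 ms ObjectDetector config tops out at 25 FPS,
// while the ~200 ms configurations are closer to 5 FPS.
```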
**Optimization Tips:**
- Use 416×416 input size for video/real-time applications
- Use 640×640 for static images where accuracy matters
- Switch to WebGPU backend if available (Chrome/Edge 94+)
- Filter classes to reduce processing overhead
- Increase confidence thresholds to skip low-quality detections
- Process every Nth frame in video loops
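The last tip (processing every Nth frame) can be as simple as a counter check, reusing the previous results for drawing in between; a minimal sketch:

```typescript
// Returns true on frames 0, n, 2n, ... so expensive inference runs
// once per n frames; drawing can reuse cached results otherwise.
function shouldProcessFrame(frameIndex: number, everyN: number): boolean {
  return everyN <= 1 || frameIndex % everyN === 0;
}

// Usage sketch inside a render loop:
// let frame = 0;
// if (shouldProcessFrame(frame++, 3)) {
//   lastResults = await detector.detectFromVideo(video);
// }
// drawResultsOnCanvas(ctx, lastResults, 'object'); // redraw cached results
```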
## Browser Support
| Browser | Version | Backend |
|---|---|---|
| Chrome | 94+ | WASM, WebGPU |
| Edge | 94+ | WASM, WebGPU |
| Firefox | 95+ | WASM |
| Safari | 16.4+ | WASM |
## Complete Example — Real-time Video Detection

```typescript
import { ObjectDetector, drawResultsOnCanvas } from 'rtmlib-ts';

async function startDetection() {
  // Initialize detector
  const detector = new ObjectDetector({
    classes: ['person', 'car'],
    inputSize: [416, 416],
    backend: 'webgpu', // Use WebGPU if available
  });
  await detector.init();

  // Setup camera
  const video = document.querySelector<HTMLVideoElement>('video')!;
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { width: 1280, height: 720 },
  });
  video.srcObject = stream;

  // Detection loop
  video.addEventListener('play', async () => {
    const canvas = document.querySelector<HTMLCanvasElement>('canvas')!;
    const ctx = canvas.getContext('2d')!;
    while (!video.paused && !video.ended) {
      const objects = await detector.detectFromVideo(video);

      // Draw results
      drawResultsOnCanvas(ctx, objects, 'object');

      // Get stats
      const stats = (objects as any).stats;
      console.log(`Detected ${stats.totalCount} objects in ${stats.inferenceTime}ms`);

      await new Promise(resolve => requestAnimationFrame(resolve));
    }
  });

  // Start playback so the 'play' listener above fires
  await video.play();
}

startDetection();
```
## Drawing Utilities

Built-in canvas drawing functions for quick visualization:

```typescript
import {
  drawDetectionsOnCanvas,
  drawPoseOnCanvas,
  drawResultsOnCanvas,
} from 'rtmlib-ts';

// Auto-detects mode (object or pose)
drawResultsOnCanvas(ctx, results, 'object');

// Custom drawing
drawDetectionsOnCanvas(ctx, detections, '#00ff00');
drawPoseOnCanvas(ctx, people, 0.3); // 0.3 confidence threshold
```
## Troubleshooting
| Issue | Solution |
|---|---|
| No detections | Lower confidence threshold (try 0.3) |
| Slow inference | Use WebGPU, reduce input size, filter classes |
| Model loading failed | Use HTTP server (not file://), check CORS |
| Unknown class | Use exact COCO class names (see `getAvailableClasses()`) |
## Contributing
Feedback and contributions are welcome:
- Report bugs or request features via Issues
- Submit code improvements via Pull Requests
- Star the repository if you find it useful
## License
Apache 2.0 — Free for commercial and open-source use.
If you have any questions or suggestions, please leave a comment or start a discussion in the repository.