AI Engine

Posted on • Originally published at ai-engine.net

How to Detect Objects in Images Using AI

Whether you're building an inventory management system, a checkout kiosk, or a security dashboard, detecting and locating objects in images is a foundational capability. An object detection API returns bounding boxes, labels, and confidence scores from a single REST call, with no model training required.

Why Object Detection Matters

Image classification tells you what is in an image. Object detection goes further: it tells you where each instance is and how confident the model is about that specific instance. This spatial information unlocks use cases like counting products on a shelf or drawing real-time annotations on a security feed.

Training your own model requires thousands of labeled images, GPU infrastructure, and ongoing maintenance. A hosted API shifts that work to the provider, so you only send images and handle the results.

Getting Started

The API accepts an image URL and returns detected objects with labels, confidence scores, and bounding box coordinates.

import requests

url = "https://objects-detection.p.rapidapi.com/objects-detection"
headers = {
    "x-rapidapi-host": "objects-detection.p.rapidapi.com",
    "x-rapidapi-key": "YOUR_API_KEY",
    "Content-Type": "application/x-www-form-urlencoded",
}

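# Send the image URL as form data; the timeout keeps a stalled request from hanging.
response = requests.post(
    url,
    headers=headers,
    data={"url": "https://example.com/street-scene.jpg"},
    timeout=30,
)
response.raise_for_status()  # Fail fast on 4xx/5xx instead of parsing an error body
data = response.json()

# Each label groups every detected instance of one object type.
for label in data["body"]["labels"]:
    name = label["Name"]
    for instance in label["Instances"]:
        conf = instance["Confidence"]
        bb = instance["BoundingBox"]
        print(f"{name} ({conf:.0f}%) at [{bb['topLeft']['x']:.2f}, {bb['topLeft']['y']:.2f}]")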

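The coordinates are normalized between 0 and 1: multiply x by the image width and y by the image height to get pixel values. Use confidence scores to filter out low-quality detections (0.6+ for production, 0.3+ for analytics, as starting points). The example above prints confidence as a percentage, so if the API reports values on a 0 to 100 scale, the equivalent thresholds are 60 and 30.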
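The scaling and filtering steps can be sketched as two small helpers. This is a minimal illustration rather than part of the API: `to_pixels`, `filter_detections`, and the `sample_labels` data are hypothetical names, and the sample assumes confidence arrives on a 0 to 100 scale, which the percentage formatting in the example above suggests but does not confirm.

```python
def to_pixels(point, image_width, image_height):
    """Scale a normalized {'x': ..., 'y': ...} point to integer pixel coordinates."""
    return round(point["x"] * image_width), round(point["y"] * image_height)


def filter_detections(labels, min_confidence):
    """Yield (name, instance) pairs whose Confidence is at least min_confidence.

    min_confidence must use the same scale as the API's Confidence values.
    """
    for label in labels:
        for instance in label["Instances"]:
            if instance["Confidence"] >= min_confidence:
                yield label["Name"], instance


# Hypothetical response fragment, shaped like the fields used in the example above.
# Assumption: Confidence is reported on a 0 to 100 scale.
sample_labels = [
    {"Name": "Car", "Instances": [
        {"Confidence": 97.2, "BoundingBox": {"topLeft": {"x": 0.25, "y": 0.5}}},
        {"Confidence": 41.0, "BoundingBox": {"topLeft": {"x": 0.8, "y": 0.1}}},
    ]},
]

for name, instance in filter_detections(sample_labels, min_confidence=60):
    x, y = to_pixels(instance["BoundingBox"]["topLeft"], image_width=1920, image_height=1080)
    print(f"{name}: top-left corner at pixel ({x}, {y})")
# prints "Car: top-left corner at pixel (480, 540)"
```

Keeping the threshold as a parameter makes it easy to use a stricter value for user-facing output and a looser one for analytics without duplicating the loop.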

Real-World Use Cases

  • Retail shelf auditing — Detect products and verify planogram compliance from shelf photos
  • Security and surveillance — Detect people or vehicles in restricted zones, trigger alerts based on bounding box regions
  • Accessibility — Generate scene descriptions for visually impaired users: "2 people, 1 dog, and a park bench"
  • Processing pipelines — Detect subjects first, then pass to background removal or face detection for downstream processing
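
As a rough sketch of the accessibility use case, the per-label instance counts can be turned into a short scene summary. `describe_scene` and the sample data are hypothetical; the code relies only on the `Name`, `Instances`, and `Confidence` fields shown in the earlier example, and it skips natural-language pluralization to stay small.

```python
def describe_scene(labels, min_confidence=0.0):
    """Summarize detections as 'count x Name' pairs, most frequent first."""
    counts = {}
    for label in labels:
        kept = [i for i in label["Instances"] if i["Confidence"] >= min_confidence]
        if kept:
            counts[label["Name"]] = len(kept)
    if not counts:
        return "No objects detected"
    # Sort by descending count, then alphabetically so the output is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ", ".join(f"{count} x {name}" for name, count in ordered)


# Hypothetical labels; the confidence scale is assumed to be 0 to 100.
park_labels = [
    {"Name": "Person", "Instances": [{"Confidence": 98.0}, {"Confidence": 91.5}]},
    {"Name": "Dog", "Instances": [{"Confidence": 88.0}]},
    {"Name": "Bench", "Instances": [{"Confidence": 75.0}]},
]

print(describe_scene(park_labels, min_confidence=60))
# prints "2 x Person, 1 x Bench, 1 x Dog"
```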

Best Practices

  1. Resize images to ~1024px before sending: this saves bandwidth, usually with little effect on accuracy unless the objects of interest are very small
  2. Filter by confidence: 0.7+ for user-facing features, where false positives are visible to users; 0.4+ for safety-critical apps, where a missed detection costs more than a false alarm
  3. Cache results using image hash as key for repeated images
  4. Batch with concurrency — use a job queue with exponential backoff on 429 responses
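
Practices 3 and 4 can be combined in a small wrapper, sketched here under stated assumptions. `RateLimited`, `call_with_backoff`, and `detect_cached` are hypothetical names; the caller supplies a `detect` function that performs the real API request and raises `RateLimited` when the response status is 429. The cache is a plain dict for illustration, and a shared store could stand in for it in a multi-worker setup.

```python
import hashlib
import random
import time


class RateLimited(Exception):
    """Raised by the caller-supplied function when the API answers HTTP 429."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimited with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller
            # Delay doubles on each retry; random jitter spreads out concurrent clients.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


def detect_cached(image_bytes, detect, cache):
    """Return cached results for identical image bytes, calling detect() on a miss."""
    key = hashlib.sha256(image_bytes).hexdigest()
    if key not in cache:
        cache[key] = call_with_backoff(lambda: detect(image_bytes))
    return cache[key]
```

Hashing the image bytes rather than the URL means the same image served from two different URLs still hits the cache, at the cost of needing the bytes on hand before the lookup.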

👉 Read the full tutorial with cURL, Python, and JavaScript examples