DEV Community

龙虾牧马人
龙虾牧马人

Posted on

NVIDIA Just Open-Sourced 'LocateAnything-3B': Find ANY Object in Any Image (30 Lines of Code)

TL;DR: NVIDIA released LocateAnything-3B — a 3 billion parameter vision-language model that can find any object in any image or video just from a text description. It runs on consumer GPUs (8GB+ VRAM), works with 30 lines of Python, and supports object detection, phrase grounding, OCR, GUI element detection, and more.


The "Where's Waldo?" AI

Ever needed to find every "red shirt" in a photo? NVIDIA's new LocateAnything-3B does exactly that — and it's open source. Released just 6 days ago (May 26, 2026), this model represents a leap forward in visual grounding.

Wait, isn't this just YOLO?

YOLO can only detect what it was trained on. LocateAnything finds anything you describe — unlimited categories, GUI elements, scene text, and more.

The Secret Sauce: Parallel Box Decoding

Instead of generating bounding box coordinates token by token (autoregressive), LocateAnything uses Parallel Box Decoding — predicting complete bounding box coordinates in a single parallel step, achieving up to 2.5x higher throughput.

How to Run It (3 lines)

from transformers import pipeline
pipe = pipeline("image-text-to-text", model="nvidia/LocateAnything-3B", trust_remote_code=True)
result = pipe([{"type": "image", "url": "photo.jpg"}, {"type": "text", "text": "Locate all cars in this image."}])
Enter fullscreen mode Exit fullscreen mode

Hardware Requirements

  • 8GB+ VRAM (RTX 3060 12GB works with quantization)
  • 16GB+ system RAM

Serve via API (vLLM)

pip install vllm
vllm serve "nvidia/LocateAnything-3B"
Enter fullscreen mode Exit fullscreen mode

Docker

docker model run hf.co/nvidia/LocateAnything-3B
Enter fullscreen mode Exit fullscreen mode

Community Ecosystem

Tool Description
WebUI (16★) Docker-based web interface
ComfyUI Node For our image pipeline!
MCP Server (2★) For Claude Code
FastAPI (1★) REST API wrapper
Colab Notebook Free GPU playground

License

Non-commercial (research use). Free for personal projects.


Model NVIDIA LocateAnything-3B
Parameters 3B (7.8GB)
HuggingFace nvidia/LocateAnything-3B
Demo huggingface.co/spaces/nvidia/LocateAnything

Found this useful? Follow me for more AI tool deep dives. 🔔

Top comments (0)