TL;DR: NVIDIA released LocateAnything-3B — a 3 billion parameter vision-language model that can find any object in any image or video just from a text description. It runs on consumer GPUs (8GB+ VRAM), works with 30 lines of Python, and supports object detection, phrase grounding, OCR, GUI element detection, and more.
The "Where's Waldo?" AI
Ever needed to find every "red shirt" in a photo? NVIDIA's new LocateAnything-3B does exactly that — and it's open source. Released just 6 days ago (May 26, 2026), this model represents a leap forward in visual grounding.
Wait, isn't this just YOLO?
YOLO can only detect what it was trained on. LocateAnything finds anything you describe — unlimited categories, GUI elements, scene text, and more.
The Secret Sauce: Parallel Box Decoding
Instead of generating bounding box coordinates token by token (autoregressive), LocateAnything uses Parallel Box Decoding — predicting complete bounding box coordinates in a single parallel step, achieving up to 2.5x higher throughput.
How to Run It (3 lines)
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="nvidia/LocateAnything-3B", trust_remote_code=True)
result = pipe([{"type": "image", "url": "photo.jpg"}, {"type": "text", "text": "Locate all cars in this image."}])
Hardware Requirements
- 8GB+ VRAM (RTX 3060 12GB works with quantization)
- 16GB+ system RAM
Serve via API (vLLM)
pip install vllm
vllm serve "nvidia/LocateAnything-3B"
Docker
docker model run hf.co/nvidia/LocateAnything-3B
Community Ecosystem
| Tool | Description |
|---|---|
| WebUI (16★) | Docker-based web interface |
| ComfyUI Node | For our image pipeline! |
| MCP Server (2★) | For Claude Code |
| FastAPI (1★) | REST API wrapper |
| Colab Notebook | Free GPU playground |
License
Non-commercial (research use). Free for personal projects.
| Model | NVIDIA LocateAnything-3B |
| Parameters | 3B (7.8GB) |
| HuggingFace | nvidia/LocateAnything-3B |
| Demo | huggingface.co/spaces/nvidia/LocateAnything |
Found this useful? Follow me for more AI tool deep dives. 🔔
Top comments (0)