NVIDIA Just Open-Sourced 'LocateAnything-3B': Find ANY Object in Any Image (30 Lines of Code)

#deeplearning

TL;DR: NVIDIA released LocateAnything-3B — a 3 billion parameter vision-language model that can find any object in any image or video just from a text description. It runs on consumer GPUs (8GB+ VRAM), works with 30 lines of Python, and supports object detection, phrase grounding, OCR, GUI element detection, and more.

The "Where's Waldo?" AI

Ever needed to find every "red shirt" in a photo? NVIDIA's new LocateAnything-3B does exactly that — and it's open source. Released just 6 days ago (May 26, 2026), this model represents a leap forward in visual grounding.

Wait, isn't this just YOLO?

YOLO can only detect what it was trained on. LocateAnything finds anything you describe — unlimited categories, GUI elements, scene text, and more.

The Secret Sauce: Parallel Box Decoding

Instead of generating bounding box coordinates token by token (autoregressive), LocateAnything uses Parallel Box Decoding — predicting complete bounding box coordinates in a single parallel step, achieving up to 2.5x higher throughput.

How to Run It (3 lines)

from transformers import pipeline
pipe = pipeline("image-text-to-text", model="nvidia/LocateAnything-3B", trust_remote_code=True)
result = pipe([{"type": "image", "url": "photo.jpg"}, {"type": "text", "text": "Locate all cars in this image."}])

Hardware Requirements

8GB+ VRAM (RTX 3060 12GB works with quantization)
16GB+ system RAM

Serve via API (vLLM)

pip install vllm
vllm serve "nvidia/LocateAnything-3B"

Docker

docker model run hf.co/nvidia/LocateAnything-3B

Community Ecosystem

Tool	Description
WebUI (16★)	Docker-based web interface
ComfyUI Node	For our image pipeline!
MCP Server (2★)	For Claude Code
FastAPI (1★)	REST API wrapper
Colab Notebook	Free GPU playground

License

Non-commercial (research use). Free for personal projects.


Model	NVIDIA LocateAnything-3B
Parameters	3B (7.8GB)
HuggingFace	nvidia/LocateAnything-3B
Demo	huggingface.co/spaces/nvidia/LocateAnything

Found this useful? Follow me for more AI tool deep dives. 🔔

DEV Community