大阪のUber配達員

Meet “Awesome Gemini Robotics” — Copy-Paste Prompts & Demos for Gemini Robotics-ER 1.5

https://github.com/GitHub30/Awesome-Gemini-Robotics

TL;DR

If you’re exploring Gemini Robotics-ER 1.5 (Google/DeepMind’s embodied-reasoning model for robots), this community repo collects ready-to-run prompts, reference images and usage patterns you can drop straight into your own projects. It tracks what ER 1.5 is good at right now—pointing/boxes/trajectories, long-horizon planning, temporal reasoning on video, tool-use (e.g., Search)—and shows practical prompt shapes for each. (GitHub)


Why this repo matters (and why now)

On Sept. 25, 2025, Google announced that Gemini Robotics-ER 1.5 is available to developers via the Gemini API in Google AI Studio. ER 1.5 acts as a “high-level brain” for robots: it plans multi-step tasks, performs spatial/temporal reasoning, and can natively call tools (like Google Search) to ground its plans in the real world. ([Google Developers Blog][2])

The official docs summarize what you can build today: unified object finding/labeling (points & boxes), trajectory planning, video understanding, and orchestration of long-horizon tasks—with the model currently in preview. (Google AI for Developers)

Awesome Gemini Robotics rides that momentum by curating concrete, reproducible prompts and screenshots so you don’t have to start from a blank page. (GitHub)


What’s inside

The repo is a gallery of 10+ use cases (and growing), each with an image and a copy-friendly prompt. Examples include:

  • Pointing to undefined/defined objects (single or multiple categories)
  • Bounding boxes and part localization
  • Counting by pointing
  • Trajectory generation (e.g., moving a pen to an organizer, brushing particles)
  • Obstacle-aware paths
  • Video analysis (step-by-step breakdowns and frame tracking)
  • Orchestrating tasks (e.g., “make room for a laptop,” “pack a lunch”)

Each card shows the expected output structure (normalized coordinates; JSON-friendly answers) so you can plug the results straight into your controller or VLA. (GitHub)

It also includes tips & patterns—like normalizing point/box coordinates to a 0–1000 range, tuning the thinking budget for latency vs. accuracy, and interleaving text with spatial outputs to create “spatially grounded” plans. (GitHub)
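
As a quick illustration of the 0–1000 convention, here is a minimal sketch of mapping a returned [y, x] point back to pixels (the helper name and frame size are my own examples, not from the repo):

def denormalize_point(point, image_width, image_height):
    """Map a [y, x] point on the 0-1000 scale back to pixel coordinates."""
    y_norm, x_norm = point
    return (x_norm / 1000.0 * image_width, y_norm / 1000.0 * image_height)

# On a 1280x720 frame, [500, 500] lands at the image center.
print(denormalize_point([500, 500], 1280, 720))  # (640.0, 360.0)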

Bonus: the README is available in multiple languages (JA/KR/VI/ZH/EN), making it easier to share across global teams. (GitHub)


How it aligns with the official guidance

The AI for Developers robotics overview shows the same pillars this repo foregrounds—points, boxes, trajectories, video, and orchestration—and provides minimal client snippets for ER 1.5 (currently gemini-robotics-er-1.5-preview). If you’re new, start there to wire up your SDK, then use the repo’s prompts as drop-in tests. (Google AI for Developers)
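
For reference, a minimal call with the google-genai Python SDK might look like the sketch below. It assumes GEMINI_API_KEY is set in your environment and a local scene.jpg, and the model name is the preview ID from the docs; confirm both before relying on it.

from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

with open("scene.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        'Point to no more than 5 items in the image. '
        'Return [{"point":[y,x],"label":"<name>"}] with y/x normalized to 0-1000.',
    ],
)
print(response.text)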

The launch blog reinforces the model’s role as an agentic planner that can call tools (e.g., Search) and sequence your robot APIs—exactly the workflows many cards in the repo illustrate. ([Google Developers Blog][2])


Quick look: a reusable prompt pattern

A recurring pattern in the repo (and docs) is to ask for structured spatial outputs you can feed into your planner:

Point to no more than N items in the image.
Return [{"point":[y,x],"label":"<name>"}] with y/x normalized to 0–1000.

This shape keeps responses machine-friendly and model-agnostic, and it scales to parts, boxes, or trajectories with minimal edits. (GitHub)
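
Consuming that output is a few lines of JSON parsing. A minimal sketch, assuming the reply is the bare JSON list the prompt asks for (real responses sometimes arrive wrapped in a markdown code fence, so strip that first if needed):

import json

# Example reply text in the shape the prompt requests.
raw = '[{"point": [480, 620], "label": "pen"}, {"point": [210, 150], "label": "mug"}]'

for item in json.loads(raw):
    y, x = item["point"]  # [y, x], normalized to 0-1000 as requested
    print(f"{item['label']}: y={y}, x={x}")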


Who should use this

  • Robotics researchers & builders prototyping perception-to-action loops and benchmarking prompt formats
  • Product teams validating UX for natural-language robot tasks (clean-up, organizing, sorting, etc.)
  • Educators & hobbyists looking for clear examples of agentic multimodal reasoning grounded in spatial outputs

For deeper context on the two-model system (ER 1.5 as the planner + the action-oriented VLA model), see DeepMind’s overview of Gemini Robotics 1.5 and ER 1.5. (Google DeepMind)


Getting started

  1. Skim the official docs to confirm the current model name, preview status, and input limits. (Google AI for Developers)
  2. Run a minimal image-prompt call in your environment (Python or REST). (Google AI for Developers)
  3. Pick a use case from the repo (e.g., “2D bounding boxes” or “Simple trajectory planning”) and paste the prompt as-is. (GitHub)
  4. Wire responses (points/boxes/paths) into your controller or VLA for execution; increase the thinking budget for harder tasks like counting or affordance reasoning, as sketched below. (Google AI for Developers)
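
The thinking budget in step 4 is a request-level knob. A minimal sketch of turning it up with the google-genai SDK (the budget value here is an arbitrary example, not a recommendation):

from google import genai
from google.genai import types

client = genai.Client()

with open("scene.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Count the screws by pointing to each one.",
    ],
    config=types.GenerateContentConfig(
        # A larger budget trades latency for more deliberate reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=2048),
    ),
)
print(response.text)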

Contributing

The repo welcomes PRs. Add a new folder under cases/<short-name>/ with a short README, your prompt, and an image (or link). Keep prompts copy-runnable and JSON-friendly, and cite primary sources (docs/blogs/videos). (GitHub)
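
Concretely, a new case might look something like this (the case name and file names are illustrative; mirror whatever the existing cases/ folders do):

cases/pack-a-lunch/
  README.md    # one-paragraph description plus the copy-paste prompt
  image.jpg    # your own capture, or link out from the README instead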


License & image attribution

Text is MIT-licensed. Demo images marked as such in the repo reference Google’s public docs/blog; check the source licenses before redistributing them, and replace placeholders with your own captures when in doubt. (GitHub)


Further reading

  • Launch/How-to: Building the Next Generation of Physical Agents with Gemini Robotics-ER 1.5 (availability, capabilities, thinking budget, safety). ([Google Developers Blog][2])
  • Docs: Gemini Robotics-ER 1.5 overview (object finding, trajectories, video, orchestration; example code). (Google AI for Developers)
  • Model pages: DeepMind’s Gemini Robotics 1.5 and ER 1.5 (system roles, roadmaps, trusted-tester context). (Google DeepMind)

If you build something cool with these prompts—open a PR and share it with the community! (GitHub)

[2]: https://developers.googleblog.com/en/building-the-next-generation-of-physical-agents-with-gemini-robotics-er-15/ "Building the Next Generation of Physical Agents with Gemini Robotics-ER 1.5 - Google Developers Blog"