DEV Community

Mwanza Simi
Building Reverse Engineering Reality with Google Gemini

This is a submission for the Built with Google Gemini: Writing Challenge

What I Built with Google Gemini

I built an app called "Reverse Engineering Reality." You upload a photo of any everyday object, and it gives you detailed instructions for assembling or disassembling it. The instructions are fictional but surprisingly detailed, complete with materials, tools, step-by-step guides, and custom illustrations for each step.

The idea came from that moment when you look at something and wonder how it's made. Instead of just wondering, you get an actual blueprint. It's part educational, part creative experiment. You can take a photo of your coffee maker or a lamp and get a full breakdown of how you'd theoretically build it from scratch.

I used gemini-2.5-flash for the core analysis and text generation, and imagen-4.0-generate-001 for creating the step illustrations. The app analyzes your photo, identifies the objects in it, lets you pick which one you want instructions for, then generates everything. After that, there's a chat assistant that knows about the blueprint you just created, so you can ask follow-up questions.

Phone Disassembly

The structured output feature was critical here. I defined a JSON schema that tells Gemini exactly what format I need: object name, materials list, tools, numbered steps, and an image prompt for each step. Without that, I'd be parsing unstructured text and hoping for consistency. With it, I get clean, predictable data every time.
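A schema along these lines is what that looks like in practice. This is only a sketch: the field names (`object_name`, `image_prompt`, and so on) are my guesses at the shape described above, not the app's actual schema, and the small validator just mirrors the schema's required fields.

```python
# Hypothetical blueprint schema in the JSON Schema style that structured
# output accepts. Field names are illustrative, not the app's real ones.
BLUEPRINT_SCHEMA = {
    "type": "object",
    "properties": {
        "object_name": {"type": "string"},
        "materials": {"type": "array", "items": {"type": "string"}},
        "tools": {"type": "array", "items": {"type": "string"}},
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "number": {"type": "integer"},
                    "instruction": {"type": "string"},
                    "image_prompt": {"type": "string"},
                },
                "required": ["number", "instruction", "image_prompt"],
            },
        },
    },
    "required": ["object_name", "materials", "tools", "steps"],
}


def validate_blueprint(data: dict) -> bool:
    """Minimal structural check mirroring the schema's required fields."""
    if not all(key in data for key in BLUEPRINT_SCHEMA["required"]):
        return False
    # Every step must carry its text and the prompt for its illustration.
    return all(
        {"number", "instruction", "image_prompt"} <= step.keys()
        for step in data["steps"]
    )
```

Because the model is constrained to this shape, the UI can bind directly to `steps` and `materials` instead of regex-mining free text.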

For the illustrations, each step includes a text prompt that gets sent to Imagen. So the AI analyzes your photo, writes instructions, writes prompts for diagrams, then generates those diagrams. It's a full multimodal pipeline: image to text to image.
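The chaining can be sketched as a small orchestration function. To keep it self-contained, the two model calls are injected as plain callables standing in for gemini-2.5-flash and imagen-4.0-generate-001; the real app would wire actual API calls into those slots.

```python
from typing import Callable


def blueprint_pipeline(
    photo: bytes,
    analyze: Callable[[bytes], dict],  # stand-in for the gemini-2.5-flash call
    draw: Callable[[str], bytes],      # stand-in for the imagen-4.0-generate-001 call
) -> dict:
    """Image -> text -> image: analyze the photo into a blueprint, then
    render each step's image_prompt as an illustration."""
    blueprint = analyze(photo)
    for step in blueprint["steps"]:
        # The text model wrote the prompt; the image model renders it.
        step["illustration"] = draw(step["image_prompt"])
    return blueprint
```

Keeping the model calls injectable also made each stage easy to test and swap independently, which matches the point below about chaining giving more control over each part.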

Demo

Here is the embedded app you can play with:

What I Learned

Structured outputs changed how I think about building with AI. Instead of treating the model like a black box that returns text you have to wrangle, you can define exactly what you need and get it reliably. That makes the difference between a demo and something you can actually build a UI around.

I also learned that chaining models works better than I expected. Using one model for understanding and another for generation gave me more control over each part of the process. The chat feature was straightforward to add once the main pipeline worked: just pass the generated instructions as context and let users ask questions about them.
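Passing the blueprint as context can be as simple as folding it into a system instruction string. This is my own formatting, not the app's; the function name and layout are hypothetical.

```python
def chat_system_instruction(blueprint: dict) -> str:
    """Fold a generated blueprint into a system instruction so a chat
    model can answer follow-up questions about it."""
    steps = "\n".join(
        f"{s['number']}. {s['instruction']}" for s in blueprint["steps"]
    )
    return (
        f"You are an assistant for a fictional build guide for "
        f"{blueprint['object_name']}.\n"
        f"Materials: {', '.join(blueprint['materials'])}\n"
        f"Tools: {', '.join(blueprint['tools'])}\n"
        f"Steps:\n{steps}\n"
        "Answer questions about this blueprint only."
    )
```

The resulting string goes into the chat session's system instruction, so every user turn is grounded in the blueprint without re-sending the photo.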

Steps

The biggest surprise was how good the generated illustrations turned out. I wasn't sure if Imagen could handle technical-diagram-style images from text prompts, but it consistently produced clear, relevant visuals that actually help explain the steps.

Google Gemini Feedback

The structured output feature worked great. No complaints there: it did exactly what I needed and made the whole project possible.

Items

The multimodal capabilities were solid. Image understanding was accurate enough for object detection and analysis, and the integration between models felt smooth. I didn't have to do much work to get them talking to each other.

The main friction was prompt tuning. Getting the right balance between creative and practical in the instructions took some iteration. Too vague and the steps weren't useful; too rigid and they felt robotic. System instructions helped, but it still took testing to find the sweet spot. AI Studio made that easier since I could experiment with prompts before writing code.
