This is a submission for the Google AI Studio Multimodal Challenge
What I Built
Hey everyone! I built a clone of Higgsfield.ai's "Draw To Edit" feature. If you're not familiar with it, you can check it out here: https://higgsfield.ai/posts/2LrRHrSK4MAkZNurxqPwtm
Basically, the feature lets you edit a base image by giving just a simple command and an arrow. You don't have to write out a full prompt; it's just like drawing on a canvas.
You can give a short text hint and Gemini will fill in the rest for you. You can even drop your own image on top of the base image and have it blend into the scene perfectly: just add the image, then draw an arrow pointing to a specific location in the base image. I built this 3 hours before the submission deadline, but fortunately I managed to submit in the last 5 minutes :).
Demo
https://ai-canvas-scene-creator-318270580130.us-west1.run.app/
How I Used Google AI Studio
I used Google AI Studio to write the code from the very beginning. I also experimented with all of the model features there, which is how I ended up with this idea.
gemini-2.5-pro
I use this model to analyze the base image with the drawings on top of it. It returns a few edit commands that are later passed to gemini-2.5-flash-image-preview to edit the image.
gemini-2.5-flash-image-preview
After gemini-2.5-pro produces the commands, I send them along with the extra images and the base image to this model, and it does the magic.
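The two-step pipeline above can be sketched roughly like this. This is a minimal, stdlib-only sketch of the Gemini REST request shapes; the prompt text and function names are my own assumptions, not the app's actual code:

```python
import base64

API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"

# Assumed prompt; the app's real analysis prompt is not shown in the post.
ANALYSIS_PROMPT = (
    "The user drew arrows and short notes on top of this image. "
    "Turn the drawings into a list of concrete edit commands."
)

def image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an inline_data part for generateContent."""
    return {
        "inline_data": {
            "mime_type": mime_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }

def build_analysis_request(annotated_image: bytes) -> tuple[str, dict]:
    """Step 1: ask gemini-2.5-pro to translate the drawings into commands."""
    url = f"{API_ROOT}/gemini-2.5-pro:generateContent"
    body = {"contents": [{"parts": [{"text": ANALYSIS_PROMPT},
                                    image_part(annotated_image)]}]}
    return url, body

def build_edit_request(commands: str, base_image: bytes,
                       extra_images: list) -> tuple[str, dict]:
    """Step 2: send the commands plus all images to the image-editing model."""
    url = f"{API_ROOT}/gemini-2.5-flash-image-preview:generateContent"
    parts = [{"text": commands}, image_part(base_image)]
    parts += [image_part(img) for img in extra_images]
    body = {"contents": [{"parts": parts}]}
    return url, body
```

To actually send these, POST the JSON body to the URL with an `x-goog-api-key` header; the edited image comes back base64-encoded in the response parts.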
Multimodal Features
gemini-2.5-pro
I use this model as the brain: it analyzes the provided images.
gemini-2.5-flash-image-preview
I use this model to edit the image so the output aligns with the user's expectation.
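For completeness, the edited image has to be pulled back out of the model's response. A sketch of that step, assuming the standard generateContent response shape (the function name is hypothetical):

```python
import base64

def extract_edited_image(response_json: dict):
    """Return the first inline image in a generateContent response, or None.

    The REST API returns camelCase keys ("inlineData"); some SDK dumps use
    snake_case, so both spellings are checked here.
    """
    for candidate in response_json.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            blob = part.get("inlineData") or part.get("inline_data")
            if blob and "data" in blob:
                return base64.b64decode(blob["data"])
    return None
```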