This is a submission for the Google AI Studio Multimodal Challenge
What I Built
Stylo AI is a web-based, multi-tool AI photoshoot studio that automates virtual fashion photography. Designers, photographers, and retailers can use it to generate photorealistic images of models wearing custom garments, change poses, and place subjects in new AI-generated environments. It replaces the slow and costly process of a manual photoshoot, letting anyone produce high-quality fashion visuals with control over style and an efficient workflow.
Demo
Deployed Applet: https://ai.studio/apps/drive/1heo5X_TOhsXa1fHtPX6F-t41QJGEHQ1u
Screenshots:
Avatar creation (neutral garment inpainting)
Try-on results (model in new garment)
Pose change (model in custom pose)
Environment placement (model in new setting)
Video: https://drive.google.com/file/d/1blkhUIjTbAqToA6yRJYXTL11VWXXZloB/view?usp=sharing
How I Used Google AI Studio
Stylo AI is built in Google AI Studio and uses the @google/genai SDK to access Gemini's multimodal image and text generation capabilities. All AI interactions, including image synthesis, pose transformation, and output evaluation, go through a dedicated Gemini Service Layer. The primary models are gemini-2.5-flash-image-preview for image-to-image tasks and gemini-2.5-pro for self-validation and JSON-based evaluation of the generated outputs.
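As a rough illustration of what such a service layer might look like, here is a minimal sketch. It is not the actual Stylo AI code. The type names (`InlineImage`, `GenerateRequest`, `GenerateResponse`, `GenerateFn`) and the helper functions are hypothetical, and the request and response shapes are simplified assumptions modeled on the inline-data parts used for Gemini image requests. The model call is injected as a function so the sketch runs without the SDK or network access. In a real app that function would presumably wrap the SDK's content generation call.

```typescript
// Hypothetical, simplified shapes (assumptions, not the SDK's own types).
interface InlineImage {
  mimeType: string;
  data: string; // base64-encoded image bytes
}
type Part = { text: string } | { inlineData: InlineImage };

interface GenerateRequest {
  model: string;
  contents: { role: "user"; parts: Part[] }[];
}
interface GenerateResponse {
  candidates?: {
    content?: { parts?: Array<{ text?: string; inlineData?: InlineImage }> };
  }[];
}

// Injected model call; assumed to wrap the real SDK in the app.
type GenerateFn = (req: GenerateRequest) => Promise<GenerateResponse>;

// Image model named in the post.
const IMAGE_MODEL = "gemini-2.5-flash-image-preview";

// Build one user turn: reference images first, then the text instruction.
function buildImageRequest(prompt: string, images: InlineImage[]): GenerateRequest {
  const parts: Part[] = [
    ...images.map((img) => ({ inlineData: img })),
    { text: prompt },
  ];
  return { model: IMAGE_MODEL, contents: [{ role: "user", parts }] };
}

// Return the first image part found in the response, or null if there is none.
function extractFirstImage(res: GenerateResponse): InlineImage | null {
  for (const candidate of res.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      if (part.inlineData) return part.inlineData;
    }
  }
  return null;
}

// Single entry point that every tool (Avatar, Try-on, Pose, Environment) could share.
async function generateImage(
  generate: GenerateFn,
  prompt: string,
  images: InlineImage[],
): Promise<InlineImage> {
  const res = await generate(buildImageRequest(prompt, images));
  const image = extractFirstImage(res);
  if (!image) throw new Error("Model returned no image part");
  return image;
}
```

Routing every tool through one function like `generateImage` keeps model names, request construction, and response parsing in a single place, which is the main benefit of a dedicated service layer.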
As this was my first experience with Google AI Studio, it has been eye-opening for me as a seasoned developer: we need to jump on the bandwagon. I was able to prototype and iterate on different ideas very quickly just by holding a conversation with an AI assistant, and the results are impressive. The functionality prototyped here may later be integrated into a bigger product at our company.
Multimodal Features
Stylo AI implements advanced multimodal workflows with the following tools:
Avatar: Precise inpainting to standardize model photos for virtual try-ons using user-provided images.
Try-on: Deep image synthesis integrating model and garment images, with automated output validation via structured JSON scoring for realism and garment fidelity.
Pose: AI-driven pose transformation by matching a reference sketch, ensuring the model’s identity, attire, and background are preserved.
Environment: AI matte painting places the model in a new, thematic environment, matching lighting and ambiance based on an inspiration image.
These features let users manipulate images, sketches, and reference environments, providing end-to-end control over fashion photoshoots and dramatically enhancing creative possibilities, realism, and productivity in virtual photography.
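To make the Try-on tool's structured JSON scoring more concrete, here is a minimal sketch of how an evaluator response could be parsed and checked. It is an illustration under stated assumptions, not the app's actual schema. The field names (`realism`, `garmentFidelity`, `notes`), the 0 to 10 scale, and the pass threshold of 7 are all hypothetical, and the evaluator model is assumed to return raw JSON text.

```typescript
// Hypothetical evaluation schema: both scores are assumed to be on a 0-10 scale.
interface TryOnEvaluation {
  realism: number;
  garmentFidelity: number;
  notes: string;
}

// Assumed acceptance threshold; a real app would tune this.
const PASS_THRESHOLD = 7;

// A score is valid only if it is a finite number within the assumed range.
function isScore(n: unknown): n is number {
  return typeof n === "number" && Number.isFinite(n) && n >= 0 && n <= 10;
}

// Parse the evaluator's JSON text; return null for anything malformed or out of range.
function parseEvaluation(raw: string): TryOnEvaluation | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;

  const record = value as Record<string, unknown>;
  const realism = record.realism;
  const garmentFidelity = record.garmentFidelity;
  if (!isScore(realism) || !isScore(garmentFidelity)) return null;

  const notes = typeof record.notes === "string" ? record.notes : "";
  return { realism, garmentFidelity, notes };
}

// An output is accepted only when both scores reach the threshold.
function passes(evaluation: TryOnEvaluation | null): boolean {
  return (
    evaluation !== null &&
    evaluation.realism >= PASS_THRESHOLD &&
    evaluation.garmentFidelity >= PASS_THRESHOLD
  );
}
```

Treating an unparseable or out-of-range response as a failure means a bad evaluation can never accept an image by accident. The caller can then regenerate the image or show it to the user with a warning.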
Thank you for the opportunity to participate!
Top comments (1)
It seems the video link got broken on drag and drop, so I am dropping it here, as I am not sure whether modifying challenge posts is allowed: [URL]