DEV Community

Drashya Kuruwa


EcommView AI: From a single image to e-commerce-ready product photos, model shots & 360° views

This is a submission for the Google AI Studio Multimodal Challenge.

What I Built

EcommView AI is a multimodal applet that tackles one of the biggest challenges for online businesses: the cost and complexity of professional product photography. It works as an instant virtual photo studio, turning a single, basic product or model photo into a complete set of e-commerce-ready visual assets.

Traditional photoshoots demand significant time, money, and logistics. By using the Gemini API's multimodal image generation, EcommView AI makes professional-grade imagery accessible to businesses of any size, helping them build more engaging online listings.

The workflow is simple. A user uploads one image, and the applet:

  1. Generates a professional product shot by isolating the main subject on a clean white background.
  2. Creates a full-body fashion model photo if the subject is human, preserving their identity while placing them in a standard e-commerce pose.
  3. Produces an interactive 360° view with a draggable scrubber, letting customers rotate the product through eight generated angles.
  4. Places the subject in any custom scene described by the user via a text prompt, generating realistic lifestyle shots on demand.

In short, EcommView AI turns a single image into a complete visual campaign in minutes instead of the weeks a traditional shoot can take.

Demo

The eight 360° frames can be exported together as a ZIP file.

The deployed applet: https://aistudio.google.com/apps/drive/1I8pX6_EpfP6e3jvbSPfP0QiYKils1PfP?showPreview=true&showAssistant=true&fullscreenApplet=true

How I Used Google AI Studio

Google AI Studio was central to the prompt engineering behind this app. Before writing any generation code, I used AI Studio to:

  1. Rapidly Prototype Prompts: I iteratively tested dozens of prompts to find the most effective wording for tasks like subject isolation, full-body model generation, and specific 360° angles. For example, I refined the "fashion model" prompt in AI Studio until it consistently preserved the person's identity while changing their pose and background.
  2. Validate Model Behavior: I used AI Studio to confirm that the gemini-2.5-flash-image-preview model could follow the complex instruction to treat a source image as the "single source of truth" when generating different angles, which was critical for the 360° view feature.
  3. Debug and Refine: When a generated image wasn't quite right, I took the exact inputs (the image and prompt) back into AI Studio to experiment with better approaches, which sped up my development cycle considerably.

This workflow let me move from concept to implementation with confidence, knowing my prompts were already refined for high-quality results.

Multimodal Features

EcommView AI is built around four distinct multimodal features, each designed to turn a single uploaded image into a set of useful e-commerce assets. Together they make a complex, multi-step creative task intuitive.

Automated Subject Isolation & Identification (Image + Text → Image & Text)

  • Functionality: On upload, the app first sends the user's image with a specific text prompt to the gemini-2.5-flash-image-preview model, which generates a clean new image of the main subject isolated on a white background. Next, it sends the same input image to the gemini-2.5-flash model with a different text prompt ("Is this a person?") and gets back a one-word text answer ("yes" or "no").

  • User Experience Enhancement: This is the initial "magic moment": a cluttered, amateur photo instantly becomes a professional, ready-to-use asset. The automatic identification step then tailors the rest of the workflow without any user effort, unlocking the "Fashion Model" feature only when the subject is a person.
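The classifier is asked to reply "yes" or "no", but model replies can vary in case, whitespace, or trailing punctuation, so a small normalization step before gating the Fashion Model feature helps. This TypeScript sketch assumes a one-word reply; the function names and the "unknown" fallback are hypothetical, not taken from the applet.

```typescript
// Hypothetical parser for the "Is this a person?" classification step.
// Assumes the model was asked to answer with a single word, "yes" or "no".
type PersonCheck = "person" | "not-person" | "unknown";

function parsePersonAnswer(raw: string): PersonCheck {
  // Normalize: trim whitespace, lower-case, and drop trailing punctuation
  // such as "Yes." or "no!".
  const answer = raw.trim().toLowerCase().replace(/[.!?]+$/, "");
  if (answer === "yes") return "person";
  if (answer === "no") return "not-person";
  return "unknown"; // unexpected reply: leave the subject unclassified
}

// Unlock the "Fashion Model" feature only on an explicit "yes".
function isFashionModelEnabled(raw: string): boolean {
  return parsePersonAnswer(raw) === "person";
}
```

Treating anything other than a clear "yes" as "not a person" keeps the Fashion Model button hidden when the reply is ambiguous, which is the safer default.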

One-Click Professional Model Generation (Image + Text → Image)

  • Functionality: If the subject is identified as a person, this feature combines the original image with a sophisticated text prompt that instructs the gemini-2.5-flash-image-preview model to act as an expert fashion photographer. It generates a new, photorealistic image of the person in a full-body model pose against a studio backdrop.

  • User Experience Enhancement: This feature replaces the expensive, logistically difficult process of hiring a model and booking a studio with a single click. For a small business owner or creator, it provides access to professional imagery that would otherwise be out of reach.
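The image features send the source image and an instruction together in one request. The sketch below shows one way to assemble that input and read the generated image back, assuming the `inlineData` (`mimeType`, `data`) and `text` part shapes used in the Gemini API's image-editing examples. The types are declared locally so it runs without the SDK, and the prompt wording and helper names are hypothetical; the applet's actual prompt is not published.

```typescript
// Minimal local types mirroring Gemini API content parts
// (declared here so the sketch runs without the SDK installed).
interface InlineDataPart { inlineData: { mimeType: string; data: string } }
interface TextPart { text: string }
type Part = InlineDataPart | TextPart;

// Hypothetical prompt wording for the fashion model shot.
const FASHION_MODEL_PROMPT =
  "You are an expert fashion photographer. Using the person in this image, " +
  "create a photorealistic full-body shot of the same person in a standard " +
  "e-commerce model pose against a plain studio backdrop. Preserve their " +
  "face, hairstyle, and clothing exactly.";

// Build the multimodal input: the source image (base64) plus the instruction.
function buildModelShotContents(base64Image: string, mimeType = "image/png"): Part[] {
  return [
    { inlineData: { mimeType, data: base64Image } },
    { text: FASHION_MODEL_PROMPT },
  ];
}

// Response-shaped types: a part may carry text, image data, or both absent.
interface ResponsePart { text?: string; inlineData?: { mimeType: string; data: string } }
interface ResponseLike { candidates?: Array<{ content?: { parts?: ResponsePart[] } }> }

// Return the first generated image (base64) from the first candidate.
// Image models can return text parts alongside image parts, so skip text.
function firstImageData(response: ResponseLike): string | undefined {
  for (const part of response.candidates?.[0]?.content?.parts ?? []) {
    if (part.inlineData?.data) return part.inlineData.data;
  }
  return undefined;
}
```

With the `@google/genai` SDK, the array from `buildModelShotContents` would be passed as `contents` to `ai.models.generateContent` together with `model: "gemini-2.5-flash-image-preview"`, and the response handed to `firstImageData`.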

Interactive 360° View Generation (Programmatic Image + Text → Image Series)

  • Functionality: This feature programmatically combines a single source image with a series of 8 distinct text prompts, each describing a specific viewing angle (e.g., "Right side profile view"). The gemini-2.5-flash-image-preview model generates a new image for each prompt, resulting in a cohesive set of 8 images.
  • User Experience Enhancement: This elevates the output from static images to an interactive experience. The draggable 360° scrubber lets end customers explore a product in detail, which can help improve conversion rates. The detailed progress UI, which shows in real time which angle is being generated, turns a potentially tedious wait into a transparent process that keeps the user engaged and informed.
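The 360° feature reuses one source image with eight angle prompts and then maps the scrubber position to a frame. This TypeScript sketch assumes a sequential loop (so the progress UI can name the angle being generated) and an injected `generate` function standing in for the real model call. The angle wordings other than "Right side profile view", the prompt template, and all function names are hypothetical.

```typescript
// Hypothetical angle labels, ordered clockwise in 45° steps. The post names
// "Right side profile view"; the other seven wordings are illustrative.
const ANGLES: readonly string[] = [
  "Front view",
  "Front-right three-quarter view",
  "Right side profile view",
  "Back-right three-quarter view",
  "Back view",
  "Back-left three-quarter view",
  "Left side profile view",
  "Front-left three-quarter view",
];

// Placeholder for the call that sends (image, prompt) to the image model
// and resolves to the generated image (e.g. base64 data).
type GenerateImage = (sourceImage: string, prompt: string) => Promise<string>;

interface Progress {
  index: number; // zero-based index of the angle about to be generated
  total: number;
  angle: string;
}

// Generate one frame per angle, always from the original source image,
// reporting progress before each request so the UI can show the current angle.
async function generate360(
  sourceImage: string,
  generate: GenerateImage,
  onProgress: (p: Progress) => void = () => {},
): Promise<string[]> {
  const frames: string[] = [];
  for (let index = 0; index < ANGLES.length; index++) {
    const angle = ANGLES[index];
    onProgress({ index, total: ANGLES.length, angle });
    const prompt =
      "Using this image as the single source of truth for the product's appearance, " +
      `generate a ${angle.toLowerCase()} of the same product on a white background.`;
    frames.push(await generate(sourceImage, prompt));
  }
  return frames;
}

// Map the scrubber's pointer position to a frame: the track is split into
// equal segments, one per frame, and positions outside it are clamped.
function frameIndexForScrub(x: number, trackWidth: number, frameCount = ANGLES.length): number {
  if (trackWidth <= 0 || frameCount <= 0) {
    throw new RangeError("trackWidth and frameCount must be positive");
  }
  const clamped = Math.min(Math.max(x, 0), trackWidth);
  return Math.min(Math.floor((clamped / trackWidth) * frameCount), frameCount - 1);
}
```

Awaiting each request in turn keeps the progress messages in order and avoids sending eight requests at once; a `Promise.all` version would likely finish sooner but makes step-by-step progress reporting harder.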

Creative Scene Co-Creation (Image + User Text → Image)

  • Functionality: This feature puts the user in the director's chair. It takes the AI-generated isolated image and combines it with a text prompt written by the user (e.g., "on a marble countertop next to a plant"). The model then generates a new image depicting that exact scene.
  • User Experience Enhancement: This turns the app from a simple tool into a creative partner. It encourages experimentation and open-ended personalization, letting users generate custom lifestyle shots, marketing materials, or social media content on the fly, with their own words translated directly into a high-quality visual.
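Because the user's scene text is free-form, wrapping it in fixed instructions is one way to keep the product intact while only the setting changes. The sketch below assumes that approach; the template wording, the 300-character cap, and the function name are hypothetical.

```typescript
// Hypothetical cap; the post does not say whether scene prompts are limited.
const MAX_SCENE_LENGTH = 300;

// Wrap the user's free-text scene in fixed instructions so the product stays
// unchanged and only the setting follows the user's description.
function buildScenePrompt(userScene: string): string {
  const scene = userScene
    .trim()
    .replace(/\s+/g, " ")    // collapse newlines and repeated spaces
    .replace(/[\s.]+$/, ""); // drop trailing periods; the template adds its own
  if (scene.length === 0) {
    throw new Error('Describe a scene, e.g. "on a marble countertop next to a plant".');
  }
  if (scene.length > MAX_SCENE_LENGTH) {
    throw new Error(`Scene description must be ${MAX_SCENE_LENGTH} characters or fewer.`);
  }
  return (
    "Place the product from this image into the following scene without changing " +
    `its shape, color, or details: ${scene}. Match the scene's lighting and ` +
    "perspective so the result looks like a realistic lifestyle photo."
  );
}
```

For example, `buildScenePrompt("on a marble countertop next to a plant")` embeds the user's phrase unchanged between the fixed instructions; the resulting string would then be sent with the isolated product image.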
