This is a submission for the Google AI Studio Multimodal Challenge
What I Built
Demo
Multimodal features
ArtfulWhisper is fundamentally multimodal, creating a seamless flow between text and image data to deliver its unique functionality.
1. Text-to-Image Generation: The primary multimodal feature is taking a user's text prompt and transforming it into a rich, complex image using the Imagen 3 model. This is the creative heart of the app.
2. Fusing Text within an Image: The application then takes a second text input (the secret message) and algorithmically embeds it directly into the pixel data of the newly generated image. This goes beyond simple input-output; it's about fusing one modality (text) invisibly inside another (image).
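To make the embedding step concrete, here is a minimal sketch of one common way to hide text in pixel data: least-significant-bit (LSB) steganography. This post does not say which algorithm ArtfulWhisper uses, so the technique, the function names `embed_message` and `extract_message`, and the 4-byte length header are all assumptions made for illustration. The sketch works on a flat sequence of channel bytes (for example, decoded RGB data) rather than on an image file.

```python
# Illustrative LSB steganography sketch; not the app's actual implementation.
# Assumption: `pixels` is a flat sequence of 8-bit channel values (e.g. RGBRGB...).

def embed_message(pixels: bytes, message: str) -> bytearray:
    """Hide `message` in the lowest bit of each channel byte."""
    payload = message.encode("utf-8")
    # Hypothetical framing: a 4-byte big-endian length header, then the payload.
    data = len(payload).to_bytes(4, "big") + payload
    # Flatten the data into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this image")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        out[i] = (out[i] & 0xFE) | bit
    return out


def extract_message(pixels: bytes) -> str:
    """Recover a message written by embed_message."""

    def read_bytes(start: int, count: int) -> bytes:
        # Rebuild `count` bytes from the lowest bits, starting at channel `start`.
        result = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                value = (value << 1) | (pixels[start + b * 8 + k] & 1)
            result.append(value)
        return bytes(result)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length).decode("utf-8")


cover = bytes(range(256)) * 2          # stand-in for real pixel data
stego = embed_message(cover, "meet at dawn")
recovered = extract_message(stego)     # round-trips the hidden text
```

Because only the lowest bit of each channel changes, every value shifts by at most 1, which is why the result looks identical to the original image. One caveat with this approach: the hidden bits survive only lossless formats such as PNG, since lossy compression like JPEG rewrites the low-order bits.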
The user experience is about power. Hiding text inside the image enhances that experience by giving the user a sense of control and secrecy that a simple image-and-text app could never provide. The magic isn't in seeing the two modalities work together; it's in knowing that one is invisibly controlling the other. It's a demonstration of how multimodal AI can be used for more than just cute chatbots and summary tools. It can be used to keep secrets.