Turning Music Into Art — Building a Synesthesia Simulator with Gemini

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built the Synesthesia Simulator, an AI-powered applet designed to translate sound and imagery into a unified, cross-sensory artistic experience. It creatively simulates the neurological trait of synesthesia, allowing users to see music as color and hear pictures as melodies.

The applet provides a creative and exploratory space for users to discover novel connections between their senses. You can upload an audio file, an image file, or both, and the AI generates:

  • A Descriptive Scene – A vivid, artistic narrative describing the blended sensory experience.
  • Creative Prompts – Inspiring ideas for writing, art, or reflection based on the output.
  • A Generated Vision – A unique AI-generated image representing the fusion of your sound and/or image inputs.
  • Creative Chat – An interactive chat session with a creative AI assistant, primed with the context of your generated experience, to explore ideas further.

My goal was to create a tool that not only showcases advanced AI but also serves as a source of inspiration — particularly for creative and neurodiverse individuals who may naturally think in cross-sensory ways. It's not a medical tool, but a canvas for imagination.


Demo

Live Applet Link:

➡️ Launch the Synesthesia Simulator Here

Screenshots & Walkthrough:

Here’s the main interface where you can upload an audio file and an image:

Synesthesia Simulator Upload

After processing, the applet presents the AI's synesthetic interpretation alongside a newly generated piece of art. The app includes a built-in audio visualizer that reacts to your music, with customizable color schemes:

Synesthesia Simulator Output & Visualizer
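
The post doesn't show the visualizer's internals, but a canvas visualizer that reacts to music is typically built on the Web Audio API's `AnalyserNode`. Here's a minimal sketch of that approach; `startVisualizer` and the default color are illustrative names, not the applet's actual code:

```typescript
// Minimal frequency-bar visualizer: AnalyserNode -> <canvas>.
// Call after a user gesture so the browser allows the AudioContext to start.
function startVisualizer(
  audioEl: HTMLAudioElement,
  canvas: HTMLCanvasElement,
  color = "#a78bfa", // swap this out for customizable color schemes
) {
  const ctx = new AudioContext();
  const source = ctx.createMediaElementSource(audioEl);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 256; // 128 frequency bins
  source.connect(analyser).connect(ctx.destination);

  const bins = new Uint8Array(analyser.frequencyBinCount);
  const draw2d = canvas.getContext("2d")!;

  const draw = () => {
    analyser.getByteFrequencyData(bins);
    draw2d.clearRect(0, 0, canvas.width, canvas.height);
    draw2d.fillStyle = color;
    const barWidth = canvas.width / bins.length;
    bins.forEach((v, i) => {
      const h = (v / 255) * canvas.height; // bar height tracks loudness
      draw2d.fillRect(i * barWidth, canvas.height - h, barWidth - 1, h);
    });
    requestAnimationFrame(draw);
  };
  draw();
}
```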

You can also explore the experience further with a context-aware creative AI assistant:

Synesthesia Chat

The applet also keeps a history of your past creations:

History

How I Used Google AI Studio

Google AI Studio and the Gemini API power this entire experience. I combined multiple models in a seamless pipeline to handle complex multimodal tasks:

  • Gemini 2.5 Flash (Multimodal Understanding):

    • Core of the simulator.
    • Handles system prompt + user prompt + audio file bytes + image file bytes all in one request.
    • Outputs structured JSON (descriptiveScene, creativePrompts, imageGenerationInstruction) for reliable integration into the UI (a pipeline sketch follows this list).
  • Imagen 4.0 (Image Generation):

    • Translates the imageGenerationInstruction from Gemini into tangible artwork.
    • Creates visuals that embody the cross-sensory interpretation.
  • Gemini 2.5 Flash (Conversational AI):

    • Powers the Creative Chat.
    • A new chat session is initialized with descriptiveScene + creativePrompts as context.
    • Turns the assistant into a creative partner, offering deeper exploration of the user’s generated experience.
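
Since the post doesn't include source, here's a minimal sketch of how this two-model pipeline can be wired with the `@google/genai` TypeScript SDK. The MIME types, helper names, and the Imagen model ID are my assumptions (the applet's actual IDs may differ); only the JSON field names come from the description above:

```typescript
import { GoogleGenAI, Part } from "@google/genai";

// Matches the structured JSON the post describes.
interface SynesthesiaResult {
  descriptiveScene: string;
  creativePrompts: string[];
  imageGenerationInstruction: string;
}

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Step 1: one Gemini request carrying the prompt plus raw audio/image bytes.
async function interpret(
  audioBase64?: string,
  imageBase64?: string,
): Promise<SynesthesiaResult> {
  const parts: Part[] = [
    { text: "Describe the blended synesthetic experience of these inputs." },
  ];
  // Either input is optional; detect the real MIME type from the upload in practice.
  if (audioBase64) {
    parts.push({ inlineData: { mimeType: "audio/mpeg", data: audioBase64 } });
  }
  if (imageBase64) {
    parts.push({ inlineData: { mimeType: "image/png", data: imageBase64 } });
  }

  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [{ role: "user", parts }],
    config: {
      systemInstruction:
        "You simulate synesthesia: translate sound into color and imagery into melody.",
      responseMimeType: "application/json", // ask for machine-readable output
    },
  });
  return JSON.parse(response.text ?? "{}") as SynesthesiaResult;
}

// Step 2: hand Gemini's imageGenerationInstruction to Imagen.
async function visualize(instruction: string) {
  const response = await ai.models.generateImages({
    model: "imagen-4.0-generate-001", // assumed model ID
    prompt: instruction,
    config: { numberOfImages: 1 },
  });
  return response.generatedImages?.[0]?.image?.imageBytes; // base64-encoded image
}
```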

Multimodal Features

The multimodal capabilities of Gemini are what make this applet possible:

  • Cross-Modal Understanding:

    • Goes beyond analyzing audio and images separately.
    • Interprets emotional tone of melodies, maps rhythms to textures, and links color palettes to musical patterns.
    • Produces the descriptive scene that defines the synesthetic simulation.
  • Sense-Blending for Generation:

    • Uses cross-modal insights to drive Imagen prompts.
    • Example: “Abstract glowing waves of violet and silver flowing in rhythm with deep piano chords.”
    • Generates true synthesis of sound + visual inputs.
  • Contextual Conversation:

    • Creative Chat expands the experience.
    • Users can ask: “What does the color red sound like in this song?” or “Tell me a story based on the third creative prompt.”
    • The assistant responds with context-aware, imaginative answers (a chat sketch follows this list).
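
One way to do that priming is to fold the scene and prompts into the system instruction of a fresh chat session (seeding the chat history would work just as well). A minimal sketch with the same SDK; `createCreativeChat` and `demo` are illustrative names, not the applet's actual code:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Seed a fresh chat with the generated experience so every reply stays on-theme.
function createCreativeChat(descriptiveScene: string, creativePrompts: string[]) {
  return ai.chats.create({
    model: "gemini-2.5-flash",
    config: {
      systemInstruction:
        "You are a creative partner. Ground every answer in this synesthetic scene.\n" +
        `Scene: ${descriptiveScene}\n` +
        `Creative prompts: ${creativePrompts.join("; ")}`,
    },
  });
}

// Example turn from the post:
async function demo(scene: string, prompts: string[]) {
  const chat = createCreativeChat(scene, prompts);
  const reply = await chat.sendMessage({
    message: "What does the color red sound like in this song?",
  });
  console.log(reply.text);
}
```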

✨ Thank you for checking out my project!

Submission by: @sarthak_bhardwaj_05aba55d
