This is a submission for the Google AI Studio Multimodal Challenge
What I Built
AI Thought Visualizer is a tiny, deployable applet that shows how human language can be compressed into a compact, machine-friendly representation and then expanded back into a new visual and a fresh piece of text.
Why this matters: people often ask whether AIs have a “language of their own.” In practice, multi-agent systems tend to communicate via structured data (JSON) or embeddings—dense numeric vectors that carry meaning without human phrasing. This applet turns that idea into an interactive experience:
Input: a phrase, an uploaded image, or your voice.
Compression: Gemini extracts a minimal JSON concept (emotion, elements, setting, time_of_day, mood, temperature); a sample concept is shown after this list.
Generation: Imagen turns that JSON into abstract artwork; Gemini writes a short, poetic description from the JSON alone.
Controls: creativity (temperature), visual style presets, regenerate image, and a small history.
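For illustration, this is roughly the shape of that compressed concept (the field names come from the list above; the sample values are invented, not actual app output):

```ts
// Shape of the compressed concept -- the fields listed above.
interface ThoughtConcept {
  emotion: string;
  elements: string[];   // key objects or sensations in the scene
  setting: string;
  time_of_day: string;
  mood: string;
  temperature: string;  // felt warmth of the scene (the creativity slider is a separate control)
}

// Invented example for "a fleeting memory of a forgotten dream, tasting of salt and summer rain".
const sampleConcept: ThoughtConcept = {
  emotion: "nostalgia",
  elements: ["rain", "sea salt", "fading light"],
  setting: "empty shoreline",
  time_of_day: "dusk",
  mood: "wistful",
  temperature: "cool",
};
```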
It’s an educational and delightful way to “peek” at how an AI might trade human words for compact meaning—and then return to language again.
Demo
Live app (Cloud Run): Open App →
Video (fallback for judging): Watch Video →
Source code: GitHub Link →
Screenshots
Image created from the prompt "A fleeting memory of a forgotten dream, tasting of salt and summer rain."
Visualization & Reconstruction
Side-by-side screenshots: the origin image (left) and the AI-created image (right).
Note: If Imagen becomes temporarily unavailable during judging, the video shows the full flow end-to-end.
How I Used Google AI Studio
Built in Google AI Studio using the “Build apps with Gemini” flow as a starting point, then extended it with microphone input, image understanding, style/creativity controls, history, and share/download.
Models:
- Gemini 2.5 Flash — text understanding + strict JSON + vision (image understanding).
- Imagen 4 for abstract visual generation.
- (Optional) Gemini Live API for voice → transcription → same pipeline.
Deployment: packaged as a small SPA and deployed to Cloud Run (public URL, unauthenticated access).
Gemini 2.5 Pro was used for prototyping inside Google AI Studio; deployment uses Flash for lower latency/cost.
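As a minimal sketch of the concept-extraction step with the @google/genai SDK (not the app's exact code; the prompt wording and environment-variable name are assumptions):

```ts
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Gemini 2.5 Flash returns a strict JSON concept -- no prose allowed.
async function extractConcept(phrase: string, creativity = 0.7) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: `Compress this phrase into a minimal concept: "${phrase}"`,
    config: {
      temperature: creativity, // the creativity slider maps onto this
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          emotion: { type: Type.STRING },
          elements: { type: Type.ARRAY, items: { type: Type.STRING } },
          setting: { type: Type.STRING },
          time_of_day: { type: Type.STRING },
          mood: { type: Type.STRING },
          temperature: { type: Type.STRING },
        },
      },
    },
  });
  return JSON.parse(response.text ?? "{}");
}
```

Image input follows the same pattern: the uploaded picture goes in as an inline image part alongside the instruction, and the same schema constrains the output.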
Minimal architecture
UI (React + Tailwind)
├─ Input: text | voice (Live API) | image
├─ Gemini 2.5 → JSON concept (strict schema)
├─ Imagen ← JSON → abstract artwork (style-aware prompt)
└─ Gemini 2.5 ← JSON → short poetic description
Cloud Run serves the app; Share/Download provide links/assets.
Multimodal Features
Text → JSON: Gemini produces a strict, minimal schema (no prose).
Image → JSON: upload a picture; Gemini extracts scene objects, mood, time, setting.
Voice → Text: Live API transcribes speech and feeds it into the same concept pipeline.
JSON → Image: Imagen renders an abstract visualization of the concept with style presets (Abstract / Neon / Watercolor / Cosmic / Minimal); see the sketch after this list.
JSON → Text: Gemini generates a new, poetic description without seeing the original phrase (only the concept).
UX: creativity slider (temperature), “Regenerate image only,” history (localStorage), Share & Download.
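A rough sketch of the two generation steps (again with @google/genai; the Imagen model id, prompt wording, and helper names are assumptions, not the app's actual code):

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// JSON → Image: render the concept as abstract artwork in the chosen style preset.
async function renderArtwork(concept: object, style = "Abstract"): Promise<string | undefined> {
  const result = await ai.models.generateImages({
    model: "imagen-4.0-generate-001", // assumed Imagen 4 model id
    prompt: `${style} artwork, no text or lettering, evoking: ${JSON.stringify(concept)}`,
    config: { numberOfImages: 1 },
  });
  // Base64 image bytes, ready for a data URL, the Share link, or Download.
  return result.generatedImages?.[0]?.image?.imageBytes;
}

// JSON → Text: reconstruct language from the concept alone; the original phrase is never shown.
async function describeConcept(concept: object): Promise<string | undefined> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: `Write two short, poetic sentences describing only this concept: ${JSON.stringify(concept)}`,
  });
  return response.text;
}
```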
Why this app supports the “AI language” idea
There’s a long-standing observation in multi-agent research: if you optimize agents only for task success, they may develop concise codes instead of human-readable sentences. In production, AI systems don’t swap secret audio—they exchange data:
Structured messages (e.g., JSON) – human-auditable, compact, and task-focused.
Embeddings – vectors that encode concepts directly; think of them as “coordinates of meaning.”
AI Thought Visualizer simulates this: it compresses a human utterance into a minimal JSON (a proxy for the machine representation), generates a visual from that compressed signal, and reconstructs human language from the same signal. The result feels like watching an AI think.
Thanks for reading — and for the Challenge!