
Veronika K.


AI Thought Visualizer ✨

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

AI Thought Visualizer is a tiny, deployable applet that shows how human language can be compressed into a compact, machine-friendly representation and then expanded back into a new visual and a fresh piece of text.

Why this matters: people often ask whether AIs have a “language of their own.” In practice, multi-agent systems tend to communicate via structured data (JSON) or embeddings—dense numeric vectors that carry meaning without human phrasing. This applet turns that idea into an interactive experience:

  • Input: a phrase, an uploaded image, or your voice.

  • Compression: Gemini extracts a minimal JSON concept (emotion, elements, setting, time_of_day, mood, temperature).

  • Generation: Imagen turns that JSON into abstract artwork; Gemini rewrites a short, poetic description only from the JSON.

  • Controls: creativity (temperature), visual style presets, regenerate image, and a small history.

It’s an educational and delightful way to “peek” at how an AI might trade human words for compact meaning—and then return to language again.
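
To make "compact meaning" concrete, here is roughly what the concept looks like. The field names come from the list above; the TypeScript shape and sample values are my illustration rather than the app's source, and I read temperature here as the felt temperature of the scene (the creativity slider is a separate control).

// The minimal "machine representation" the applet passes between steps.
interface ThoughtConcept {
  emotion: string;      // e.g. "bittersweet longing"
  elements: string[];   // key objects or motifs in the scene
  setting: string;      // where the scene takes place
  time_of_day: string;  // e.g. "dusk"
  mood: string;         // overall atmosphere
  temperature: string;  // felt temperature of the scene, e.g. "warm"
}

// An illustrative concept:
const concept: ThoughtConcept = {
  emotion: "bittersweet longing",
  elements: ["salt", "summer rain", "a fading dream"],
  setting: "a half-remembered shoreline",
  time_of_day: "dusk",
  mood: "dreamlike",
  temperature: "warm",
};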

Demo

Screenshots

AI Thought Visualizer — app UI with input, creativity/style controls, JSON concept, generated image, and reconstructed text

Abstract artwork generated by Imagen from the prompt “A fleeting memory of a forgotten dream, tasting of salt and summer rain.”

Visualization & Reconstruction

Each pair below shows an origin image (user upload) next to the AI-created image generated from its JSON concept:

  • Sample 1: source image (user upload) → AI-generated visualization from JSON concept
  • Sample 2: source image (user upload) → AI-generated visualization from JSON concept

Note: If Imagen becomes temporarily unavailable during judging, the video shows the full flow end-to-end.

How I Used Google AI Studio

  • Built in Google AI Studio using the “Build apps with Gemini” flow as a starting point, then extended it with microphone input, image understanding, style/creativity controls, history, and share/download.

  • Models:

    • Gemini 2.5 Flash — text understanding + strict JSON + vision (image understanding).
    • Imagen 4 for abstract visual generation.
    • (Optional) Gemini Live API for voice → transcription → same pipeline.
  • Deployment: packaged as a small SPA and deployed to Cloud Run (public URL, unauthenticated access).

Gemini 2.5 Pro was used for prototyping inside Google AI Studio; the deployed app uses Gemini 2.5 Flash for lower latency and cost.

Minimal architecture

UI (React + Tailwind)
 ├─ Input: text | voice (Live API) | image
 ├─ Gemini 2.5 → JSON concept (strict schema)
 ├─ Imagen ← JSON → abstract artwork (style-aware prompt)
 └─ Gemini 2.5 ← JSON → short poetic description
Cloud Run serves the app; Share/Download provide links/assets.
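
For reference, the compression step (text → strict JSON concept) looks roughly like this with the @google/genai SDK. The prompt wording and schema are a sketch that mirrors the description above (reusing the ThoughtConcept shape from earlier), not the deployed code:

import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Text → JSON: compress a phrase into the minimal concept; no prose allowed.
async function compressToConcept(phrase: string, creativity: number): Promise<ThoughtConcept> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: `Compress this phrase into a minimal concept: "${phrase}"`,
    config: {
      temperature: creativity, // the UI's creativity slider
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          emotion: { type: Type.STRING },
          elements: { type: Type.ARRAY, items: { type: Type.STRING } },
          setting: { type: Type.STRING },
          time_of_day: { type: Type.STRING },
          mood: { type: Type.STRING },
          temperature: { type: Type.STRING },
        },
        required: ["emotion", "elements", "setting", "time_of_day", "mood", "temperature"],
      },
    },
  });
  return JSON.parse(response.text ?? "{}") as ThoughtConcept;
}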

Multimodal Features

  • Text → JSON: Gemini produces a strict, minimal schema (no prose).

  • Image → JSON: upload a picture; Gemini extracts scene objects, mood, time, setting.

  • Voice → Text: Live API transcribes speech and feeds it into the same concept pipeline.

  • JSON → Image: Imagen renders an abstract visualization of the concept with style presets (Abstract / Neon / Watercolor / Cosmic / Minimal); see the sketch after this list.

  • JSON → Text: Gemini generates a new, poetic description without seeing the original phrase (only the concept).

  • UX: creativity slider (temperature), “Regenerate image only,” history (localStorage), Share & Download.
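
For the two expansion steps (JSON → Image and JSON → Text), here is a minimal sketch that reuses the ai client and ThoughtConcept type from the snippets above; the Imagen model id and the prompt wording are assumptions, not the exact app code:

// Both expansions see only the concept, never the original phrase.
async function expandConcept(concept: ThoughtConcept, style: string) {
  const imageResponse = await ai.models.generateImages({
    model: "imagen-4.0-generate-001", // assumed Imagen 4 model id
    prompt:
      `${style} abstract artwork, ${concept.mood}, ${concept.emotion}, ` +
      `${concept.elements.join(", ")}, ${concept.setting}, ${concept.time_of_day}`,
    config: { numberOfImages: 1 },
  });

  const textResponse = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents:
      "Write a short, poetic description of this concept. " +
      "You are given only the JSON, not the original phrase:\n" +
      JSON.stringify(concept),
  });

  return {
    imageBase64: imageResponse.generatedImages?.[0]?.image?.imageBytes, // base64 image data
    poeticText: textResponse.text,
  };
}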

Why this app supports the “AI language” idea

There’s a long-standing observation in multi-agent research: if you optimize agents only for task success, they may develop concise codes instead of human-readable sentences. In production, AI systems don’t swap secret audio—they exchange data:

  • Structured messages (e.g., JSON) – human-auditable, compact, and task-focused.

  • Embeddings – vectors that encode concepts directly; think of them as “coordinates of meaning.”
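
To make the contrast concrete, here is what the two representations look like side by side. This is only an illustration: the app itself exchanges JSON, and the embedding model id below is an assumption.

// Structured message (what this app exchanges): human-auditable and compact.
const structuredMessage = { emotion: "wistful", elements: ["rain", "salt"], mood: "dreamlike" };
console.log(structuredMessage);

// Embedding (what production systems often exchange): raw "coordinates of meaning".
const embeddingResponse = await ai.models.embedContent({
  model: "gemini-embedding-001", // assumed embedding model id
  contents: "a fleeting memory of a forgotten dream",
});
console.log(embeddingResponse.embeddings?.[0]?.values?.slice(0, 5)); // e.g. [0.013, -0.027, ...]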

AI Thought Visualizer simulates this: it compresses a human utterance into a minimal JSON (a proxy for the machine representation), generates a visual from that compressed signal, and reconstructs human language from the same signal. The result feels like watching an AI think.

Thanks for reading — and for the Challenge!
