<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: vinamra sharma</title>
    <description>The latest articles on DEV Community by vinamra sharma (@vnmrsharma).</description>
    <link>https://dev.to/vnmrsharma</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F507746%2Ffe7c434a-042d-437f-8863-d85921d961ee.jpeg</url>
      <title>DEV Community: vinamra sharma</title>
      <link>https://dev.to/vnmrsharma</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vnmrsharma"/>
    <language>en</language>
    <item>
      <title>🎧 ReCDFyi: Bringing the Mixtape Era Back to Life With Cloud, Web, and Kiro</title>
      <dc:creator>vinamra sharma</dc:creator>
      <pubDate>Fri, 05 Dec 2025 20:35:39 +0000</pubDate>
      <link>https://dev.to/vnmrsharma/recdfyi-bringing-the-mixtape-era-back-to-life-with-cloud-web-and-kiro-2m4j</link>
      <guid>https://dev.to/vnmrsharma/recdfyi-bringing-the-mixtape-era-back-to-life-with-cloud-web-and-kiro-2m4j</guid>
      <description>&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/2Y7-A92XyKo"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;There are certain pieces of tech that define your childhood.&lt;br&gt;
For me — a kid growing up in India in the early 2000s — it was the CD.&lt;/p&gt;

&lt;p&gt;Not just the shiny disc itself, but everything around it: the ritual of burning one at home, the excitement of curating the perfect mix of songs, the thrill of carrying it to school and popping it into a completely different computer… and watching your files magically appear.&lt;/p&gt;

&lt;p&gt;That moment felt like a superpower.&lt;/p&gt;

&lt;p&gt;Fast forward a couple of decades, and CDs have quietly faded into obscurity.&lt;br&gt;
Streaming and cloud storage are powerful — but somewhere along the way, we lost the personal, handcrafted charm of the mixtape.&lt;/p&gt;

&lt;p&gt;So I found myself wondering:&lt;/p&gt;

&lt;p&gt;What if we could bring that experience back?&lt;br&gt;
What if the mixtape era could live again — but with all the power of the modern web, cloud platforms, and even AI?&lt;/p&gt;

&lt;p&gt;That question sparked ReCDFyi.&lt;br&gt;
A digital mixtape creator.&lt;br&gt;
A virtual CD burner.&lt;br&gt;
A little love letter to the tech that raised me — rebuilt for the world we live in today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💿 What is ReCDFyi?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ReCDFyi lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload tracks, images, and metadata&lt;/li&gt;
&lt;li&gt;“Burn” them into a virtual CD-style collection&lt;/li&gt;
&lt;li&gt;Automatically generate metadata using AI&lt;/li&gt;
&lt;li&gt;Publish your CD to the cloud&lt;/li&gt;
&lt;li&gt;Share it with a simple link&lt;/li&gt;
&lt;li&gt;Let anyone browse your digital mixtape in a nostalgic, retro-inspired interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s fun.&lt;br&gt;
It’s expressive.&lt;br&gt;
And honestly… it feels like giving people a piece of their childhood back.&lt;/p&gt;

&lt;p&gt;Live website: &lt;a href="https://recd-fyi.vercel.app/" rel="noopener noreferrer"&gt;https://recd-fyi.vercel.app/&lt;/a&gt;&lt;br&gt;
GitHub repo: &lt;a href="https://github.com/vnmrsharma/ReCDFyi" rel="noopener noreferrer"&gt;https://github.com/vnmrsharma/ReCDFyi&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⚙️ How I Built It (and How Kiro Helped)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was my first time fully committing to Kiro’s spec-driven and vibe-coding workflow, and I didn’t expect it to be this transformative.&lt;/p&gt;

&lt;p&gt;Here’s what I did:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🧩 Spec-first development&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I wrote a high-level spec describing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload → metadata → burn → share&lt;/li&gt;
&lt;li&gt;Cloud storage integration&lt;/li&gt;
&lt;li&gt;An optional AI feature&lt;/li&gt;
&lt;li&gt;Modular boundaries for the frontend and services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From that spec, Kiro produced structured, testable, clean code. It honestly felt like pairing with a very disciplined engineer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⚡ Vibe coding for iteration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once the foundation was generated, vibe-coding let me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quickly tweak components&lt;/li&gt;
&lt;li&gt;Add new UI flows&lt;/li&gt;
&lt;li&gt;Improve interactions&lt;/li&gt;
&lt;li&gt;Fix smaller issues without spinning in circles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This back-and-forth workflow became surprisingly natural.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔍 Lessons learned&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Modularity matters&lt;/li&gt;
&lt;li&gt;Specs save time in the long run&lt;/li&gt;
&lt;li&gt;AI-assisted development is becoming a superpower&lt;/li&gt;
&lt;li&gt;Building nostalgic tech requires equal parts design and soul&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ReCDFyi is more polished than most of my hackathon projects — and I credit that entirely to how structured and iterative Kiro kept me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔧 Challenges&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every good project fights back a little.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capturing the nostalgic vibe&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first versions felt too generic.&lt;br&gt;
They didn’t “feel” like a mixtape.&lt;br&gt;
It took UI iterations and layout experiments to get the vibe right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managing optional integrations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud + AI metadata + sharing links…&lt;br&gt;
This stack can get messy fast.&lt;/p&gt;

&lt;p&gt;The modular service architecture saved the day here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handling real-world data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User uploads are unpredictable.&lt;br&gt;
File formats break.&lt;br&gt;
Metadata is missing.&lt;br&gt;
Networks fail.&lt;/p&gt;

&lt;p&gt;I built in fallbacks, validators, and robust upload flows to make the experience smooth.&lt;/p&gt;
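To make the "fallbacks and validators" idea concrete, here is a small TypeScript sketch of the kind of upload checks described above. All names here (validateTrackUpload, fallbackTitle, the 25 MB cap, the allowed MIME types) are illustrative assumptions, not ReCDFyi's actual code.

```typescript
// Hypothetical upload validation, in the spirit of the post's
// "fallbacks, validators, and robust upload flows".

const ALLOWED_TYPES = ["audio/mpeg", "audio/wav", "audio/ogg"]; // assumed formats
const MAX_SIZE_BYTES = 25 * 1024 * 1024; // assumed 25 MB cap

interface UploadCheck {
  ok: boolean;
  errors: string[];
}

function validateTrackUpload(file: { name: string; type: string; size: number }): UploadCheck {
  const errors: string[] = [];
  if (!ALLOWED_TYPES.includes(file.type)) {
    errors.push(`unsupported format: ${file.type || "unknown"}`);
  }
  if (file.size === 0) errors.push("empty file");
  if (file.size > MAX_SIZE_BYTES) errors.push("file too large");
  return { ok: errors.length === 0, errors };
}

// Fallback for missing metadata: derive a display title from the filename.
function fallbackTitle(fileName: string): string {
  return fileName.replace(/\.[^.]+$/, "").replace(/[_-]+/g, " ").trim() || "Untitled Track";
}
```

The point is that every unpredictable input (format, size, missing tags) gets a deterministic answer instead of a crash.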

&lt;p&gt;&lt;strong&gt;🌱 What’s Next&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ReCDFyi doesn’t stop here.&lt;/p&gt;

&lt;p&gt;I want to add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A public community mixtape gallery&lt;/li&gt;
&lt;li&gt;Collaborative mixtapes&lt;/li&gt;
&lt;li&gt;Comments and reactions&lt;/li&gt;
&lt;li&gt;AI-generated album art&lt;/li&gt;
&lt;li&gt;Better streaming + user profiles&lt;/li&gt;
&lt;li&gt;Indie artist showcase pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If CDs were the creative canvas of the early 2000s, maybe digital mixtapes can become the canvas of the 2020s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❤️ Why I Built This&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ReCDFyi isn’t just a technical project for me.&lt;br&gt;
It’s nostalgia.&lt;br&gt;
It’s creativity.&lt;br&gt;
It’s the feeling of making something beautiful and giving it to someone you care about.&lt;/p&gt;

&lt;p&gt;If you ever burned a CD, exchanged a mixtape, or treasured those little plastic discs…&lt;br&gt;
I hope ReCDFyi gives you a spark of that magic again — with a modern twist.&lt;/p&gt;

&lt;p&gt;Thanks for reading.&lt;br&gt;
And if you’re curious, go ahead and burn your first digital CD.&lt;br&gt;
I promise it’s still fun. 🎵💿✨&lt;/p&gt;


</description>
      <category>kiro</category>
      <category>webdev</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Dreamzine</title>
      <dc:creator>vinamra sharma</dc:creator>
      <pubDate>Mon, 15 Sep 2025 07:01:57 +0000</pubDate>
      <link>https://dev.to/vnmrsharma/dreamzine-4m0e</link>
      <guid>https://dev.to/vnmrsharma/dreamzine-4m0e</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I Built&lt;/strong&gt;&lt;br&gt;
I built DreamZine, a web application that serves as a personalized digital magazine of a user's dreams. It solves the problem of dreams being ephemeral and difficult to articulate by providing a tool to capture, visualize, and preserve them in a beautiful and engaging format.&lt;/p&gt;

&lt;p&gt;The core experience is designed to be seamless and magical:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A user records a voice narration of their dream.&lt;/li&gt;
&lt;li&gt;They select a preferred artistic style (e.g., Watercolor, Noir Sketch, Pop Art).&lt;/li&gt;
&lt;li&gt;The application then uses the Gemini API's multimodal capabilities to analyze the audio. It transcribes the narration, identifies key themes, characters, and emotions, and breaks the story down into a sequence of comic book panels.&lt;/li&gt;
&lt;li&gt;For each panel, it generates a unique, surreal illustration in the chosen art style, complete with a caption derived from the original narration. It even generates a creative title for the dream.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The final output is presented as an interactive, page-flipping digital comic book. Each new creation is automatically saved to the user's dashboard, creating a "Zine" they can revisit and reflect on anytime.&lt;/p&gt;

&lt;p&gt;DreamZine transforms the abstract, fleeting nature of dreams into a tangible and shareable artistic artifact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://dreamzine-548350761962.us-west1.run.app/" rel="noopener noreferrer"&gt;Live Link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How I Used Google AI Studio&lt;/strong&gt;&lt;br&gt;
Google AI Studio was instrumental in the rapid prototyping and development of DreamZine, primarily through its powerful and accessible Gemini API.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal Input&lt;/strong&gt;: I leveraged the Gemini 2.5 Flash model's ability to process multiple input modalities simultaneously. The core generateContent call sends both the user's audio recording and a detailed text prompt in a single request. This allows the AI not just to transcribe the audio but to interpret it within the context and instructions provided by the text prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured JSON Output&lt;/strong&gt;: To reliably build the comic book interface, I needed structured data from the AI. I used the responseSchema feature to instruct the Gemini model to return its analysis in a specific JSON format. This schema defines the expected output, including a title for the dream and an array of panels, where each panel object contains a caption and a detailed imagePrompt. This ensured the application received predictable and easily parsable data, eliminating the need for fragile string parsing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image Generation&lt;/strong&gt;: For the visual component, I used the imagen-4.0-generate-001 model. The detailed, surreal imagePrompts generated by the Gemini model in the previous step were fed directly into the image generation model to create the high-quality, stylized panels for the comic.&lt;/li&gt;
&lt;/ol&gt;
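As a rough illustration of the structured-output idea, here is a TypeScript sketch of a schema with the fields named above (title, panels, caption, imagePrompt) and a guard that parses a model reply against it. The plain-object schema syntax and the parseDream helper are my approximation for this post, not DreamZine's actual code.

```typescript
// JSON-schema-style description of the expected reply, mirroring the
// responseSchema fields described in the post.
const dreamSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    panels: {
      type: "array",
      items: {
        type: "object",
        properties: {
          caption: { type: "string" },
          imagePrompt: { type: "string" },
        },
        required: ["caption", "imagePrompt"],
      },
    },
  },
  required: ["title", "panels"],
} as const;

// With a schema like this, the model's JSON reply can be parsed directly,
// with one structural check instead of fragile string parsing.
function parseDream(json: string): { title: string; panels: { caption: string; imagePrompt: string }[] } {
  const data = JSON.parse(json);
  if (typeof data.title !== "string" || !Array.isArray(data.panels)) {
    throw new Error("response does not match schema");
  }
  return data;
}
```

Each parsed panel's imagePrompt can then be handed straight to the image model, which is exactly the pipeline the post describes.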

&lt;p&gt;&lt;strong&gt;Multimodal Features&lt;/strong&gt;&lt;br&gt;
DreamZine's user experience is fundamentally built on its sophisticated use of multimodal AI, which blends different types of information to create something entirely new.&lt;/p&gt;

&lt;p&gt;The primary multimodal feature is the Audio-to-Visual Narrative Translation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Modalities&lt;/strong&gt;: The system takes Audio (the user's voice, capturing the story, tone, and emotion) and Text (a guiding prompt that tells the AI how to behave and which art style to use).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Modalities&lt;/strong&gt;: The system produces Text (the dream's title and panel captions) and a series of Images (the illustrated comic panels).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This enhances the user experience in several key ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Natural and Expressive Input&lt;/strong&gt;: Recounting a dream verbally is far more natural than typing it. Voice captures the subtle emotions—excitement, fear, wonder—that are often lost in text. The AI can infer this emotional subtext from the user's tone and pacing, infusing the generated visuals with the appropriate mood.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creative Transformation&lt;/strong&gt;: The magic of the app lies in its ability to translate from one modality to another. It takes a raw, unstructured audio stream and transforms it into a structured, artistic, visual narrative. This feels less like a simple transcription and more like a creative collaboration between the user and the AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deep Personalization&lt;/strong&gt;: By grounding the entire creative process in the user's own voice, the resulting comic book is deeply personal. It's their story, their words, and their subconscious, visualized in an art style they chose. This creates a powerful sense of ownership and connection to the final artifact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Snaps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73cwx2pa343ww81f9u4q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73cwx2pa343ww81f9u4q.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsuxfb3zazcsn53suajkg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsuxfb3zazcsn53suajkg.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>WhiteBoard-Wiza</title>
      <dc:creator>vinamra sharma</dc:creator>
      <pubDate>Mon, 15 Sep 2025 06:58:54 +0000</pubDate>
      <link>https://dev.to/vnmrsharma/whiteboard-wiza-2c5j</link>
      <guid>https://dev.to/vnmrsharma/whiteboard-wiza-2c5j</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Whiteboard Wizard&lt;/strong&gt; is an AI-powered applet designed to bridge the gap between analog brainstorming and digital development. It solves the common problem of valuable technical diagrams being trapped on physical whiteboards, where they are difficult to share, edit, and analyze.&lt;/p&gt;

&lt;p&gt;The application allows a user to upload a photograph of a hand-drawn diagram and supplement it with an audio recording where they narrate the context, explain the logic, or describe a problem they are facing. Whiteboard Wizard then uses a multimodal AI model to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Digitize&lt;/strong&gt; the drawing into clean, editable Mermaid syntax.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyze&lt;/strong&gt; the logical flow and architecture described in both the diagram and the narration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide a detailed textual analysis&lt;/strong&gt;, identifying potential errors, inefficiencies, or areas for improvement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suggest concrete solutions&lt;/strong&gt;, including updated diagrams and code snippets, to help the user debug and refine their ideas.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Essentially, it acts as an expert pair-programmer, instantly transforming a static image and spoken thoughts into an interactive, digital, and actionable development tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Live link: &lt;a href="https://whiteboard-wizard-548350761962.us-west1.run.app" rel="noopener noreferrer"&gt;Click Here&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;Google AI Studio was instrumental in prototyping and refining the core multimodal prompt that powers Whiteboard Wizard. The application's success hinges on its ability to receive an image, audio, and a text prompt, and return a perfectly structured JSON object.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Engineering&lt;/strong&gt;: I used AI Studio's iterative environment to craft a detailed prompt for the gemini-2.5-flash model. This involved defining the AI's persona as an "expert software architect," outlining its specific tasks, and providing critical rules for generating valid Mermaid syntax—including examples of what not to do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal Input Testing&lt;/strong&gt;: AI Studio was perfect for testing how the model would interpret various combinations of images and audio files. This allowed me to quickly see how different diagram styles or narration qualities affected the output quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Output (JSON Schema)&lt;/strong&gt;: The most critical capability I leveraged was Gemini's JSON mode. I designed a responseSchema and tested it extensively in AI Studio to ensure the model would consistently return the required mermaidCode, analysis, and suggestions fields. This reliability is key to parsing the response and rendering the results in the UI without errors.&lt;/li&gt;
&lt;/ol&gt;
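To show what "consistently return the required fields" buys you on the client side, here is a small TypeScript sketch that checks a model reply for the three fields named above (mermaidCode, analysis, suggestions). The guard function is illustrative for this post, not Whiteboard Wizard's actual code.

```typescript
// Shape of the structured reply the post says the model is asked to return.
interface WizardResponse {
  mermaidCode: string;   // editable Mermaid syntax for the digitized diagram
  analysis: string;      // textual analysis of the architecture
  suggestions: string[]; // concrete improvement suggestions
}

// Parse the JSON reply and fail loudly if any required field is missing,
// so the UI never tries to render a half-formed response.
function parseWizardResponse(json: string): WizardResponse {
  const data = JSON.parse(json);
  if (
    typeof data.mermaidCode !== "string" ||
    typeof data.analysis !== "string" ||
    !Array.isArray(data.suggestions)
  ) {
    throw new Error("model reply is missing a required field");
  }
  return data as WizardResponse;
}
```

Because the responseSchema pins down the shape up front, this check almost never fires in practice; it is a last line of defense rather than a parsing strategy.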

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Whiteboard Wizard&lt;/strong&gt; is built around a core multimodal feature: the fusion of visual (image) and auditory (audio) inputs to generate a comprehensive analysis. This enhances the user experience in several crucial ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Contextual Understanding&lt;/strong&gt;: A diagram by itself lacks intent. The user's audio narration provides the critical "why" behind the "what." It allows the user to explain their goals, point out areas of concern, and ask specific questions. The AI uses this context to provide analysis and suggestions that are highly relevant to the user's actual problem, rather than just performing a generic transcription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural and Efficient Interaction&lt;/strong&gt;: The workflow mimics how developers collaborate in the real world—by pointing to a diagram and talking through it. This is a far more natural and faster way to convey complex information than writing a lengthy text description to accompany an image.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deeper, More Accurate Analysis&lt;/strong&gt;: By processing both modalities simultaneously, the AI gains a much deeper understanding of the user's work. It can correlate a specific shape on the diagram with a concept the user describes in the audio, leading to more insightful and accurate debugging. For example, if a user mentions "scalability concerns" in the audio while pointing to a database symbol in the diagram, the AI can specifically look for and flag potential bottlenecks in that part of the architecture.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Snaps of the App
&lt;/h2&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
