Adam Yates


Challenge Entry: Dataset Crafter

This is a submission for the Google AI Studio Multimodal Challenge

What I built:

I built an AI dataset tool that lets you craft custom dataset examples with the help of Gemini 2.5 Flash. You can add new examples to an existing dataset or build a small dataset from scratch for your specific tuning needs. It is intended for fine-tuning multimodal LLMs and LoRA. It supports text, image, and audio inputs, with the option of entering outputs manually. You can upload files for each modality and have 2.5 Flash generate an output for any input. When your dataset is complete, you can export it in either JSON or CSV format. Funny story: I didn't have any audio clips on my laptop, which is why it has a record audio feature.
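To make the export step concrete, here is a minimal Python sketch of how rows like these could be written out as JSON or CSV. The field names (`modality`, `input`, `output`) and the helper functions are hypothetical; the post does not show the applet's real schema or code, so treat this as an illustration only.

```python
import csv
import io
import json

# Hypothetical record layout: the applet's real schema is not shown in the post.
FIELDS = ["modality", "input", "output"]


def make_example(modality, input_ref, output):
    """Return one dataset row; input_ref is the text itself or a file name."""
    if modality not in {"text", "image", "audio"}:
        raise ValueError(f"unsupported modality: {modality}")
    return {"modality": modality, "input": input_ref, "output": output}


def to_json(examples):
    """Serialize the whole dataset as a JSON array."""
    return json.dumps(examples, indent=2, ensure_ascii=False)


def to_csv(examples):
    """Serialize the dataset as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(examples)
    return buffer.getvalue()


dataset = [
    make_example("text", "Translate 'hello' to French.", "bonjour"),
    make_example("image", "cat.png", "A grey cat sitting on a windowsill."),
    make_example("audio", "clip.webm", "A person says good morning."),
]
json_text = to_json(dataset)
csv_text = to_csv(dataset)
```

Keeping every row in the same flat shape is what makes both exports straightforward: JSON keeps the list as-is, and CSV maps each key to a column.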


*I removed the YAML export option because it caused an issue.

Demo

The demo is running on the free trial, but all services should work; they were functional in AI Studio. Images are included.

How I Used Google AI Studio:

I built all of it by prompting the Gemini 2.5 Pro code assistant in AI Studio with the Build feature, aside from a few licensing changes. It took a multi-step process to refine the input and output. The app showcases Gemini 2.5 Flash's multimodality by using native text, image, and audio understanding to generate output labels or desired responses. I could probably tune the AI-generated image and audio labels a little better. They get a bit long, but they are added to the dataset the same way.
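One way the long labels could be reined in is to ask for a word limit in the prompt and trim the response as a fallback. The Python sketch below only builds a request body in the shape I understand the Gemini `generateContent` REST endpoint to accept (a text part plus a base64 `inline_data` part); the endpoint URL, the prompt wording, and both helper functions are assumptions, not the applet's actual code, and no network call is made.

```python
import base64
import json

# Assumed endpoint and model id; check the current Gemini API docs before use.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.5-flash:generateContent"
)


def build_label_request(media_bytes, mime_type, max_words=15):
    """Build a request body asking for one short label for an image or audio clip."""
    prompt = (
        f"Describe this input in at most {max_words} words. "
        "Return only the label."
    )
    encoded = base64.b64encode(media_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


def trim_label(label, max_words=15):
    """Fallback: cut a label to max_words if the model ignores the limit."""
    return " ".join(label.split()[:max_words])


# Placeholder bytes stand in for a real uploaded file.
body = build_label_request(b"\x89PNG", "image/png", max_words=5)
payload = json.dumps(body)  # what would be sent as the POST body
```

The same builder covers recorded audio by passing the clip's bytes with its MIME type, for example `audio/webm`.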

Multimodal Features:

Vision is showcased by image understanding that generates text output, and native audio understanding is showcased by generating text labels from audio.

edit - afterthought:


That looks really good on mobile. I should have added a capture image option.

Thanks for checking out my applet! The UI isn't very fancy, but it works.
