This is a submission for the Google AI Studio Multimodal Challenge.
I built Vision Stock-Financial, an applet designed to simplify how small business owners manage their inventory and finances.
It tackles a common pain point: tracking inventory and logging finances by hand is tedious and time-consuming. With our applet, a user can simply take a picture of a shelf to update their inventory or snap a photo of a receipt to log an expense or revenue. This makes the process fast, intuitive, and less prone to errors, allowing business owners to focus on what truly matters: growing their business.
Demo
Applet Link: https://github.com/shakarpg/Vision_Estoque_Financeiro_Applet.git
Below are a few screenshots of our applet in action:
Caption: The main interface of the applet.
How I Used Google AI Studio
We used Google AI Studio as our primary tool to develop and prototype the core intelligence of our applet. Specifically, we chose the Gemini 2.5 Flash model for its multimodal (image and text) processing and its speed.
Within AI Studio, we crafted and refined the prompts that instruct the AI to:
- Analyze an image of a store shelf, visually identify the products, and count the units for each item.
- Extract crucial information from a receipt image, such as the total amount, date, vendor name, and line items, and structure this data neatly.
The AI Studio interface let us quickly compare prompting strategies and refine the instructions to improve accuracy, which shortened our development cycle considerably.
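The two prompting tasks described in this section can be sketched in Python. This is a minimal sketch under our own assumptions, not the applet's actual code: the prompt wording, the JSON shapes, and the helper names (`parse_shelf`, `parse_receipt`, `ask_gemini`) are hypothetical. It assumes the `google-genai` SDK for the Gemini 2.5 Flash call and asks the model to answer in JSON, so each response can be validated before it touches any records.

```python
import json
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical prompts; the applet's real prompt text is not published here.
SHELF_PROMPT = (
    "Identify every distinct product visible on this store shelf and count its units. "
    'Respond only with JSON of the form {"items": [{"product": str, "count": int}]}.'
)
RECEIPT_PROMPT = (
    "Extract the vendor name, date (YYYY-MM-DD), total amount, and line items from this receipt. "
    'Respond only with JSON of the form {"vendor": str, "date": str, "total": str, '
    '"line_items": [{"description": str, "amount": str}]}.'
)


@dataclass
class LineItem:
    description: str
    amount: Decimal


@dataclass
class Receipt:
    vendor: str
    date: str
    total: Decimal
    line_items: list[LineItem]


def parse_shelf(json_text: str) -> dict[str, int]:
    """Turn the model's shelf JSON into {product: count}, summing repeated products."""
    counts: dict[str, int] = {}
    for item in json.loads(json_text)["items"]:
        count = int(item["count"])
        if count < 0:
            raise ValueError(f"negative count for {item['product']!r}")
        counts[item["product"]] = counts.get(item["product"], 0) + count
    return counts


def parse_receipt(json_text: str) -> Receipt:
    """Turn the model's receipt JSON into a Receipt; amounts become Decimal, not float."""
    data = json.loads(json_text)
    items = [
        LineItem(li["description"], Decimal(str(li["amount"])))
        for li in data.get("line_items", [])
    ]
    return Receipt(data["vendor"], data["date"], Decimal(str(data["total"])), items)


def ask_gemini(image_bytes: bytes, prompt: str, mime_type: str = "image/jpeg") -> str:
    """Send one image plus a prompt to Gemini 2.5 Flash and return the raw JSON text."""
    # Imported lazily (assumption: google-genai is installed) so the parsers work without it.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[types.Part.from_bytes(data=image_bytes, mime_type=mime_type), prompt],
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return response.text
```

A receipt photo would then flow through `parse_receipt(ask_gemini(image_bytes, RECEIPT_PROMPT))`. Asking for amounts as strings and parsing them with `Decimal` avoids floating-point rounding on money values; a response that does not match the expected shape raises an error instead of silently writing bad data.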
Multimodal Features
Our project's core multimodal feature is inventory and financial management through image analysis.
This dramatically enhances the user experience by eliminating the need for manual data entry. Instead of opening a spreadsheet to type "15 soda cans" or "$10.00 - purchase of cleaning supplies," the user simply points their camera and takes a picture.
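Once a photo has been turned into product counts or a receipt amount, replacing manual entry comes down to a small bookkeeping update. The sketch below is an illustration under our own simplifying assumptions: a shelf photo reports the full current stock for the products it shows (so counts overwrite stored values rather than adding to them, and products not in the photo are left untouched), and each receipt becomes one expense or revenue entry. The `Books` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Entry:
    kind: str  # "expense" or "revenue"
    amount: Decimal
    description: str


@dataclass
class Books:
    inventory: dict[str, int] = field(default_factory=dict)
    ledger: list[Entry] = field(default_factory=list)

    def apply_shelf_counts(self, counts: dict[str, int]) -> dict[str, int]:
        """Overwrite stock with the counts seen in the photo; return each product's change."""
        changes: dict[str, int] = {}
        for product, seen in counts.items():
            changes[product] = seen - self.inventory.get(product, 0)
            self.inventory[product] = seen
        return changes

    def log_receipt(self, kind: str, amount: Decimal, description: str) -> Entry:
        """Append one ledger entry for a photographed receipt."""
        if kind not in ("expense", "revenue"):
            raise ValueError(f"unknown entry kind: {kind!r}")
        entry = Entry(kind, amount, description)
        self.ledger.append(entry)
        return entry

    def balance(self) -> Decimal:
        """Revenue minus expenses across the whole ledger."""
        total = Decimal("0")
        for e in self.ledger:
            total += e.amount if e.kind == "revenue" else -e.amount
        return total
```

Using the examples from the paragraph above: starting from 20 soda cans, `books.apply_shelf_counts({"soda can": 15})` returns `{"soda can": -5}`, and `books.log_receipt("expense", Decimal("10.00"), "purchase of cleaning supplies")` leaves `books.balance()` at `Decimal("-10.00")`. Returning the per-product change gives the user something concrete to confirm before trusting a count read from a photo.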
This multimodal approach makes management:
- Faster: A photo takes seconds, while manual entry can take several minutes.
- More Accurate: It can reduce human errors such as typos and miscounts.
- More Accessible: It offers a far more intuitive and natural way to interact with a management system, especially for users who aren't comfortable with complex software.