DEV Community

Samiksha
Samiksha

Posted on

Automated File Management System using AI Voice Commands

In this project, I created an automated file management system through the development of an AI agent capable of managing files based on voice commands. This way, a user could simply give voice or written commands regarding creation, editing, summarization, or deletion of files, and the system would automatically do it all. The aim was to integrate the frontend, backend, and AI algorithms into a single workflow.

Tech Stack

I used React.js with Vite for the frontend, which provides an interactive UI showcasing the pipeline of command processing. For the backend part, I used FastAPI, which contains all the logic of the application.

For AI algorithms, I used Groq's models:

  • Whisper model for speech recognition
  • LLaMA model for recognizing the command intention

System Pipeline

The process works in an easy to follow pipeline:

  • User provides input (voice/text)
  • Whisper (for voice) converts the input to text
  • Text is fed into the backend
  • LLaMA analyses the text and finds the intent (e.g. create, edit, summarize, etc.)
  • Backend does the file operation as per the intent
  • The output is provided back and shown in the frontend

Features Implemented

The major features of this agent include the ability to provide commands (voice and text), creating file(s) with custom content, editing existing files, summarizing contents of a file or given text, deleting file(s) with confirmations and showing transcription, user intent, command history and results in the frontend. The confirmation option was included to avoid accidents such as accidental deletion or editing of files.

Challenges Encountered

In the course of developing this application, there have been several difficulties.

First of all, it was the configuration of FFmpeg for audio processing. In order to get everything to work on the backend, much effort was invested in setting up the appropriate parameters.

In addition, there was an integration problem between the API and the environment variables. Missing or invalid API keys could cause the entire app to break down, which needed to be taken into account during setup.

Regarding the front-end, one of the encountered difficulties concerned the configuration of Tailwind CSS. Class-based bugs needed to be fixed.

Most of the mentioned difficulties were overcome by extensive testing and debugging.

Conclusion

In this project, I learnt the front end, back end, and AI models integration in one application. At the same time, I acquired practical skills to deal with real-life challenges that can arise from API errors, configuring systems, or debugging the user interface.

Some improvements could still be made, especially when dealing with many intents, but now I have a good proof-of-concept solution based on voice automation.

Project Repository

GitHub Link: https://github.com/samiksha-chandel/Voice-Agent

Demo Video

YouTube (Unlisted): https://youtu.be/VTIUTOWFY-o

Top comments (0)