Abstractive Summarization and Entity Extraction from Minutes of a Meeting
Developed an end-to-end solution that generates the minutes of a meeting from audio or scanned text. Used the BART transformer model, fine-tuned on the AMI Meeting Corpus, for abstractive summarization and achieved strong ROUGE scores. Used a Bi-LSTM-CRF model for Named Entity Recognition to identify names, places, dates, times, etc. Implemented a speaker diarization system using a voice activity detection (VAD) system and a CNN. Developed an intuitive Flutter application to serve these modules, all hosted on Google Cloud Platform and Digital Ocean.
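To make the ROUGE evaluation mentioned above concrete, here is a minimal sketch of ROUGE-N in plain Python. It is not the evaluation code used in the project: the function name `rouge_n`, the whitespace tokenization, and the example sentences are all assumptions made for illustration, and real evaluations usually rely on an established ROUGE package with stemming and ROUGE-L support.

```python
from collections import Counter


def rouge_n(reference: str, candidate: str, n: int = 1) -> dict:
    """Compute ROUGE-N precision, recall and F1 from n-gram overlap.

    Illustrative only: tokens are lowercased and split on whitespace,
    which is simpler than what standard ROUGE tooling does.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref_counts = ngrams(reference)
    cand_counts = ngrams(candidate)

    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())

    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical reference minutes and model summary.
    reference = "the team agreed to ship the prototype on friday"
    candidate = "the team will ship the prototype friday"
    print(rouge_n(reference, candidate, n=1))
    print(rouge_n(reference, candidate, n=2))
```

For this example pair, 6 of the 9 reference unigrams are matched by the 7-word candidate, so ROUGE-1 recall is 6/9, precision is 6/7, and F1 is about 0.75. Recall rewards covering the reference minutes, while precision penalizes padding the summary with unrelated words.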
Demo Link
Link to Code
The code repositories have to be kept private until the end of the semester in May, but this is a test implementation repository that we used for our initial research.
GitHub Repo
How We Built It
These are the tools and technologies we used to develop this project:
- Summarization
  - PyTorch Transformers
  - PyTorch Lightning
  - Google Cloud VM (Model Training)
- Named Entity Recognition
  - PyTorch Transformers
- Speech Recognition and Speaker Diarization
  - Google Speech API
  - PyTorch
- Backend
  - Node.js
  - MongoDB (MongoDB Cloud Atlas - GitHub Education Pack)
  - JSONWebTokens
  - Digital Ocean (Deployment) (GitHub Education Pack)
- Mobile Application
  - Flutter
  - Firebase ML Vision
Also, we used a custom domain with the Digital Ocean server, obtained from NameCheap (GitHub Education Pack).
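As a rough illustration of the voice activity detection step that precedes speaker diarization, the sketch below marks speech regions with a simple energy threshold. This is an assumption-laden stand-in, not the project's VAD or CNN: the function names, the 160-sample frame length, and the 0.1 threshold are hypothetical choices, and the input is assumed to be a list of float samples in the range [-1, 1].

```python
import math


def frame_energies(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame of `frame_len` samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_segments(samples, frame_len=160, threshold=0.1):
    """Return (start_sample, end_sample) pairs for runs of frames whose
    RMS energy is at or above `threshold` (hypothetical default values)."""
    energies = frame_energies(samples, frame_len)
    segments = []
    start = None
    for idx, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = idx * frame_len              # a speech run begins
        elif energy < threshold and start is not None:
            segments.append((start, idx * frame_len))  # the run ends
            start = None
    if start is not None:
        # Speech continued until the last complete frame.
        segments.append((start, len(energies) * frame_len))
    return segments


if __name__ == "__main__":
    # Synthetic signal: silence, a 440 Hz tone standing in for speech, silence.
    rate = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / rate) for t in range(480)]
    signal = [0.0] * 320 + tone + [0.0] * 160
    print(speech_segments(signal))
```

In a diarization pipeline, segments like these would then be passed to a speaker model, such as the CNN mentioned earlier, so that each stretch of speech can be attributed to a speaker. A fixed energy threshold is fragile in noisy rooms, which is one reason learned VAD models are generally preferred in practice.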
Additional Thoughts / Feelings / Stories
This application was conceived during a Students' Council meeting, where one of us always had to write down notes and then dictate them before the next meeting. In a college environment, this feels extremely time-consuming, given that we already have limited time while juggling studies and council activities.