Have you ever tried to build a Language Model from scratch, only to get stuck at the very first step: gathering your own data? That's exactly what happened to me recently.
I wanted to extract messages from my Discord DMs to build a custom dataset for a new LM I'm developing. I started looking for tools to help me scrape and export my own conversations. To my surprise, almost every tool out there that actually works is hidden behind a paywall.
As developers, we know that if a tool doesn't exist (or isn't free), we just build it ourselves. So, I decided to fix the problem.
Enter Discord_Exporter
I built my own completely free and open-source tool to export Discord messages. No subscriptions, no hidden fees, just code.
Link to the repository: https://github.com/PapLion/Discord_Exporter
Why am I sharing this?
This is officially my first open-source project.
I realized that if I had this problem while trying to build my datasets, other developers and AI enthusiasts are probably facing the exact same roadblock. Data should be accessible, especially your own data.
Let's connect!
Since this is my first foray into the open-source world, I would love your feedback.
If you find it useful for your own Machine Learning projects or data analysis, feel free to use it!
If you see room for improvement, PRs are more than welcome. Go ahead and fork it, break it, and make it better.
Don't forget to drop a ⭐ on the repo if it helps you out!
Let me know in the comments: what kind of datasets or models are you currently building?
Happy coding!
- Danilo (Dani.Dev)