Libraries in Python that I Used
1. Beautiful Soup (The OG)
It’s literally what it sounds like. You feed it messy HTML, and it parses it into a tree you can search by tag, class, or CSS selector. Perfect for static sites like Wikipedia or your uni's old-school portal.
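Here's a minimal sketch of that idea. The HTML snippet, the `course` class, and the `extract_courses` helper are all made up for illustration; the only real library calls are `BeautifulSoup`, `select`, and `get_text`. It assumes Python's built-in `html.parser`, so no extra parser needs installing.

```python
from bs4 import BeautifulSoup

# A small, slightly messy snippet standing in for a real page (made-up data).
MESSY_HTML = """
<HTML><body>
<h1>Course List</h1>
<ul>
  <li class="course"><a href="/cs101">   Intro to CS </a></li>
  <li class="course"><a href="/cs202">Data Structures
  </a></li>
  <li class="news"><a href="/news">Campus news</a></li>
</ul>
</body></HTML>
"""


def extract_courses(html: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every link inside a li.course item."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        # strip=True trims the stray whitespace around the link text
        (a.get_text(strip=True), a["href"])
        for a in soup.select("li.course a")
    ]


if __name__ == "__main__":
    for title, link in extract_courses(MESSY_HTML):
        print(f"{title} -> {link}")
```

For a real page you would pass in the downloaded HTML instead of `MESSY_HTML` and adjust the selector to match that site's markup.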
2. Selenium / Playwright (The "Heavy Lifters")
Some sites are annoying and use a ton of JavaScript (looking at you, infinite scrollers). These tools basically open a "ghost" browser and click things for you. It's like having a tiny robot living in your RAM.
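As a rough sketch of the "ghost browser" idea with Playwright's sync API: the helper names `scroll_to_bottom` and `fetch_rendered_html`, the 10,000 px scroll step, and the pause length are assumptions for illustration, not tuned values. It also assumes you have installed Playwright and its browser binaries.

```python
def scroll_to_bottom(page, max_rounds: int = 10, pause_ms: int = 500) -> int:
    """Scroll until the page height stops growing; return the rounds used."""
    last_height = 0
    for round_number in range(1, max_rounds + 1):
        page.mouse.wheel(0, 10_000)      # scroll down by 10,000 px
        page.wait_for_timeout(pause_ms)  # give the JavaScript time to load more
        height = page.evaluate("document.body.scrollHeight")
        if height == last_height:
            return round_number          # nothing new loaded, so stop
        last_height = height
    return max_rounds


def fetch_rendered_html(url: str) -> str:
    """Open a headless Chromium, scroll the page, and return the final HTML."""
    # Imported here so scroll_to_bottom can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        scroll_to_bottom(page)
        html = page.content()
        browser.close()
    return html
```

The returned HTML can then be handed to a parser like Beautiful Soup, since by that point the JavaScript has already filled in the content.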
3. yt-dlp (The MVP)
If you’re trying to download videos or audio, don't reinvent the wheel. This library is a beast: it supports thousands of sites, not just YouTube.
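To give a feel for using yt-dlp from Python rather than the command line, here's a small sketch. The `build_options` and `download` helpers and the `downloads` folder are hypothetical names for illustration (this is not code from any particular project); the option keys and `YoutubeDL.extract_info` are part of yt-dlp's Python API.

```python
from pathlib import Path


def build_options(output_dir: str, audio_only: bool = False) -> dict:
    """Build an options dict for yt_dlp.YoutubeDL (hypothetical helper)."""
    options = {
        # Name each file after the video title, inside output_dir.
        "outtmpl": str(Path(output_dir) / "%(title)s.%(ext)s"),
        "writethumbnail": True,  # save the thumbnail next to the file
        "writeinfojson": True,   # save title, tags, etc. as a JSON file
        "noplaylist": True,      # for a video-in-playlist URL, grab only the video
    }
    if audio_only:
        # Prefer an audio-only stream, falling back to the best single file.
        options["format"] = "bestaudio/best"
    return options


def download(url: str, output_dir: str = "downloads") -> str:
    """Download one URL and return the title reported by yt-dlp."""
    # Imported here so build_options works even without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(build_options(output_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
    return info.get("title", "unknown")
```

Calling `download("<URL>")` with a link you have permission to save would write the media file, its thumbnail, and a metadata JSON into `downloads/`.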
The Real Deal: ULTIMATE-MEDIA-DOWNLOADER
To prove I wasn't just procrastinating on my finals, I put all this together into a project called ULTIMATE-MEDIA-DOWNLOADER (UMD). (Yeah, it's still under development, but it's already a tank).
Check it out here: https://github.com/NK2552003/ULTIMATE-MEDIA-DOWNLOADER
Why is it cool?
- It’s Fast: It uses yt-dlp under the hood, so it doesn't break every time YouTube changes its UI.
- It Looks Sick: I used the Rich library, so instead of boring white text, you get actual progress bars and colors in your terminal. (Makes you look like a hacker in the library, 10/10 would recommend).
- No Setup Pain: I made a script that handles all the annoying installs for you.
Try it out
If you want to play around with it:
Install in just 2 commands - no virtual environment needed!
git clone https://github.com/NK2552003/ULTIMATE-MEDIA-DOWNLOADER.git
cd ULTIMATE-MEDIA-DOWNLOADER
./scripts/install.sh
Windows users:
git clone https://github.com/NK2552003/ULTIMATE-MEDIA-DOWNLOADER.git
cd ULTIMATE-MEDIA-DOWNLOADER
scripts\install.bat
That's it! Once it's ready, just throw a link at it. You can now use it from anywhere:
umd <URL>
Disclaimer (The "Don't Sue Me" Part)
Look, I built this for educational purposes only.
- Be Ethical: Only download content you have permission to access. This tool is meant for personal backups or studying, not for pirating or re-distributing copyrighted stuff.
- Terms of Service: Every website has its own rules. By using this tool, you're responsible for making sure you aren't breaking any laws or ToS.
- No Liability: I (the developer) am not responsible for how you use this tool or any consequences that come from it. Use your brain!
Pro-Tips for My Fellow Colleagues
- Don't be a jerk: If you scrape a site 10,000 times in a second, their servers will hate you and they might block your IP. Use time.sleep().
- Robots.txt is a thing: Check website.com/robots.txt to see if they actually allow scraping. (Most of the time they're cool with it if it's for "educational purposes" lol).
- Metadata is king: Don't just download the file; grab the title, the thumbnail, and the tags. It makes your folders look way cleaner.
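The first two tips can be sketched with nothing but the standard library. The user-agent name, the sample robots.txt text, and the `allowed_urls` / `polite_fetch` helpers are made up for illustration, and the one-second delay is an assumption, not a rule every site expects.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "my-study-scraper"  # hypothetical name; pick your own


def allowed_urls(robots_txt: str, urls: list[str], agent: str = USER_AGENT) -> list[str]:
    """Keep only the URLs that the given robots.txt text allows for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


def polite_fetch(urls, fetch, delay_seconds: float = 1.0) -> list:
    """Call fetch(url) for each URL, sleeping between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # pause so the server isn't hammered
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    sample_robots = "User-agent: *\nDisallow: /private/\n"
    candidates = [
        "https://example.com/index.html",
        "https://example.com/private/grades",
    ]
    ok = allowed_urls(sample_robots, candidates)
    # The lambda is a stand-in for a real download function.
    print(polite_fetch(ok, fetch=lambda url: f"fetched {url}"))
```

Against a live site, `RobotFileParser` can download the file itself via `set_url(...)` followed by `read()` instead of taking the text as a string.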
Final Thoughts
Web scraping is probably the most useful skill I've picked up outside of class. It saves so much time and honestly, it’s just fun to see code doing work for you.
If you like the project, give it a star on GitHub! It helps with my "hired-after-graduation" ego. ⭐️
If you want to contribute, head over to the GitHub repo linked above.
Peace out! ✌️