
DishyDev

Posted on • Originally published at dishy.dev

Creating a Reddit /r/anime Image Scraper

Scraping Images from Reddit /r/anime

Following on from my previous article about scraping images from Reddit, I decided to expand the concept into a more fully featured tool.

I think Reddit's /r/anime is one of the best and most entertaining subreddits. Part of what drives this is a bot that creates a discussion thread for each new episode as it airs. These threads generate a lot of discussion and screenshots for each show.

Example of Anime discussion threads on reddit

I noticed that since the threads are generated by a bot, it's easy to search for them and get a list of all of the discussion threads for a show.


I decided to have a go at implementing an episode search to see how easy it would be.

Search for Episodes

The PRAW library makes it trivial to search Reddit.

# `reddit` is assumed to be an already-configured praw.Reddit instance
subreddit = reddit.subreddit('anime')
matches = subreddit.search('violet evergarden', limit=250)
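To make it clearer what comes back from the search, here is a minimal sketch that pulls a few fields out of each result. The helper name `list_threads` is made up for illustration and isn't from the original project. It assumes `subreddit` is a PRAW Subreddit object, whose search results expose `title`, `permalink` and `num_comments`.

```python
def list_threads(subreddit, query, limit=250):
    """Return (title, permalink, num_comments) for each search hit.

    `subreddit` is assumed to be a PRAW Subreddit, for example the
    object returned by reddit.subreddit('anime') on a configured
    praw.Reddit instance. This helper is hypothetical.
    """
    return [
        (submission.title, submission.permalink, submission.num_comments)
        for submission in subreddit.search(query, limit=limit)
    ]
```

Passing the subreddit in as an argument keeps the helper easy to try out with a stand-in object before pointing it at the real API.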

From here we can turn someone's search string into a regular expression that finds all of the threads matching the ShowName - Episode discussion format.

REGEX_STR = '.*' + filteredName.replace(' ','.+') + '.*Episode\\D+(\\d+).+discussion.*'
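As a rough sketch of how a pattern like this could be applied to thread titles, the snippet below builds the same expression and pulls out the episode number. The helper names and sample titles are made up for illustration. Escaping the user's input with `re.escape` and matching case-insensitively are my own assumptions, not something the original code is known to do.

```python
import re

def build_episode_pattern(show_name):
    # Hypothetical helper mirroring the REGEX_STR expression above.
    # re.escape is an added safeguard so characters like '?' or '('
    # in a show name are treated literally.
    words = [re.escape(word) for word in show_name.split()]
    body = '.+'.join(words)
    return re.compile(
        '.*' + body + r'.*Episode\D+(\d+).+discussion.*',
        re.IGNORECASE,  # assumption: search text may differ in case from titles
    )

def episode_number(pattern, title):
    # Returns the captured episode number, or None if the title doesn't match.
    match = pattern.match(title)
    return int(match.group(1)) if match else None

pattern = build_episode_pattern('violet evergarden')

# Sample titles invented for this example.
titles = [
    'Violet Evergarden - Episode 5 discussion',
    'Violet Evergarden - Episode 12 discussion',
    'What did everyone think of Violet Evergarden?',
]
numbers = [episode_number(pattern, title) for title in titles]
```

Titles that don't follow the bot's format come back as `None`, so they can be filtered out before looking for images.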

This, combined with the image search from the last article, makes for a nice little scraper based on show name.

Frontend

The frontend uses Bulma, a simple CSS-only framework, for the UI. I'm really impressed with how easy it is to get an image window up for the preview feature of the app.

And the loading bars are nice too

This, with a bit of jQuery, lets us search the API and populate the results. It's a bit hacky, so I've not gone into much detail on it; I'll be doing more work with it for the next article. The API is all served through Azure Functions.

You can give it a try on my site: Reddit Anime Scraper Version 2.

Next Steps

  • It'd be nice to allow scrolling through the images while in fullscreen mode, using the arrow keys or buttons on each side of the image.
  • These lookups are slow. There's an opportunity to implement a caching solution that looks at the number of comments or another flag to let us re-use lookups.
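The caching idea could look something like the sketch below, which re-uses a thread's results as long as its comment count hasn't changed. Everything here is hypothetical: the `LookupCache` class, the `fetch_images` callable and the thread ID are stand-ins, and an unchanged comment count is only a heuristic for "nothing new to scrape".

```python
class LookupCache:
    """Re-use a thread's scraped images while its comment count is unchanged."""

    def __init__(self, fetch_images):
        # fetch_images is a hypothetical callable: thread_id -> list of image URLs
        self._fetch_images = fetch_images
        self._entries = {}  # thread_id -> (num_comments, images)

    def get(self, thread_id, num_comments):
        cached = self._entries.get(thread_id)
        if cached is not None and cached[0] == num_comments:
            return cached[1]  # comment count unchanged, skip the slow lookup
        images = self._fetch_images(thread_id)
        self._entries[thread_id] = (num_comments, images)
        return images

# Stand-in for the slow scrape, counting how often it is actually called.
calls = []

def slow_scrape(thread_id):
    calls.append(thread_id)
    return ['https://example.com/' + thread_id + '.png']

cache = LookupCache(slow_scrape)
first = cache.get('abc123', num_comments=10)   # performs the lookup
second = cache.get('abc123', num_comments=10)  # served from the cache
third = cache.get('abc123', num_comments=11)   # new comment, looked up again
```

An in-memory dictionary like this would probably not survive between invocations on a serverless host, so a real version would likely need to keep the entries in some external store.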
