Web Demo: animerecbert.online
I wanted to build an anime recommendation system and came across a GitHub repo that used BERT trained on the MovieLens dataset. The impressive results it achieved inspired me to adapt the approach for anime recommendations. What started as a side project turned into a full-scale experiment with millions of anime ratings and one powerful language model: BERT.
To build a high-quality dataset, I scraped over 4 million user anime lists from:
- AniList
- Kitsu
- MyAnimeList
After filtering for users with at least 10 ratings, I was left with around 1.5 million users. For this experiment, I used a subset of 600,000 users for faster training and prototyping.
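The filtering step above can be sketched in a few lines of Python. This is only an illustration, not the project's actual preprocessing: the `user_lists` layout (user ID mapped to a dict of anime ID and score), the helper names, and the choice of uniform random sampling for the subset are all assumptions.

```python
import random

MIN_RATINGS = 10  # threshold stated in the text


def filter_active_users(user_lists, min_ratings=MIN_RATINGS):
    """Keep users whose list has at least `min_ratings` rated entries."""
    return {
        user: ratings
        for user, ratings in user_lists.items()
        if len(ratings) >= min_ratings
    }


def sample_users(user_lists, k, seed=0):
    """Draw a reproducible random subset of at most `k` users (assumed strategy)."""
    rng = random.Random(seed)
    users = sorted(user_lists)  # sort so the sample does not depend on dict order
    chosen = rng.sample(users, min(k, len(users)))
    return {user: user_lists[user] for user in chosen}


# Toy data in a hypothetical layout: user -> {anime_id: score}
toy_lists = {
    "user_a": {anime_id: 8 for anime_id in range(12)},
    "user_b": {anime_id: 7 for anime_id in range(3)},
    "user_c": {anime_id: 9 for anime_id in range(10)},
}

active = filter_active_users(toy_lists)
subset = sample_users(active, k=1)
print(sorted(active), len(subset))
```

On the real data, `k` would be 600,000 and the input would be the merged lists from the three sites.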
I tested several approaches, including VAE and matrix factorization, and BERT consistently delivered the best results. So I built AnimeRecBERT, a BERT-based system trained on this dataset with 54 million ratings.
The results? It actually gets my taste right.
The Moment of Truth: Testing My Own System
To evaluate the model qualitatively, I gave it a list of my 9 favorite anime:
Classroom of the Elite
Pseudo Harem
Don't Toy With Me, Miss Nagatoro
86 (Eighty-Six)
Mushoku Tensei
Made in Abyss
Shangri-La Frontier
The Case Study of Vanitas
Hell's Paradise
NOTE: The order of the favorites does not affect inference results, because the model uses only which items are present, not their sequence.
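To see why order stops mattering once positional encoding is removed, here is a toy sketch of a single attention step in plain Python. It is not the AnimeRecBERT code: the 4-dimensional random embeddings, the use of token 0 as the mask, and the identity query/key/value projections are all assumptions made to keep the example small.

```python
import math
import random


def attention_at(query, keys, values):
    """Scaled dot-product attention output for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(dim)
    ]


def mask_output(item_ids, embeddings, mask_id):
    """Output at the mask slot when no positional encoding is added."""
    tokens = list(item_ids) + [mask_id]
    vectors = [embeddings[t] for t in tokens]
    return attention_at(embeddings[mask_id], vectors, vectors)


rng = random.Random(0)
MASK = 0  # hypothetical mask token id
embeddings = {token: [rng.uniform(-1, 1) for _ in range(4)] for token in range(10)}

favorites = [3, 7, 1, 5]
shuffled = [5, 1, 3, 7]  # same items, different order

out_a = mask_output(favorites, embeddings, MASK)
out_b = mask_output(shuffled, embeddings, MASK)
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(out_a, out_b))
print(same)
```

The output at the mask slot is a weighted sum over the set of input vectors, so reordering the favorites only reorders the terms of that sum. With positional encodings added to each vector, the two orderings would generally give different outputs.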
Top recommendations: Call of the Night, Summertime Rendering, Heavenly Delusion, and Jujutsu Kaisen.
Out of the top 15 recommendations, 6 were anime I'd already watched and loved.
The rest? Titles like Frieren, Tonikaku Kawaii, Solo Leveling, and Heavenly Delusion fit my taste almost perfectly.
🧠 BERT Model Architecture
This project builds upon BERT4Rec-VAE-Pytorch (https://github.com/jaywonchung/BERT4Rec-VAE-Pytorch), with several key modifications:
- Custom dataset scraped from AniList, Kitsu, and MyAnimeList
- GUI-based inference script for quick testing, plus a web demo
- Positional encoding removed from architecture (no timestamp info in user lists)
- Input sequence length: 128 anime tokens
- Increased dropout for better regularization
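As a rough illustration of how a list of favorites could be turned into a fixed-length model input, here is a minimal sketch. Only the sequence length of 128 comes from the list above. The padding id of 0, left-side padding, the mask id of `NUM_ITEMS + 1`, the item count itself, and the `build_input` helper are assumptions for the example, not confirmed details of the repository.

```python
MAX_LEN = 128      # input sequence length stated above
PAD_ID = 0         # assumed padding id
NUM_ITEMS = 5000   # hypothetical catalogue size
MASK_ID = NUM_ITEMS + 1  # assumed mask token id


def build_input(favorite_ids, mask_id=MASK_ID, max_len=MAX_LEN, pad_id=PAD_ID):
    """Return a list of exactly `max_len` token ids ending in the mask token."""
    if max_len < 2:
        raise ValueError("max_len must leave room for one item and the mask")
    # Remove duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(favorite_ids))
    # Keep the last (max_len - 1) items, then append the mask to be predicted.
    tokens = unique[-(max_len - 1):] + [mask_id]
    # Pad on the left so the mask always sits in the final slot.
    padding = [pad_id] * (max_len - len(tokens))
    return padding + tokens


sequence = build_input([5, 9, 5, 2])
print(len(sequence), sequence[-4:])
```

The model's scores at the final (mask) position would then be ranked to produce recommendations.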
GitHub Repo
You can try the model for yourself!
Setup & Inference instructions available on GitHub:
https://github.com/MRamazan/AnimeRecBERT
Web Demo: animerecbert.online



