Building AnimeRecBERT: How I Used BERT to Create a Personalized Anime Recommendation System

#machinelearning #animation #ai #website

I wanted to build an anime recommendation system and came across a GitHub repo that trained BERT on the MovieLens dataset. Its impressive results inspired me to adapt the approach to anime. What started as a side project turned into a full-scale experiment with millions of anime ratings and one powerful language model: BERT.

To build a high-quality dataset, I scraped over 4 million user anime lists from:

- AniList
- Kitsu
- MyAnimeList

After filtering for users with at least 10 ratings, I was left with around 1.5 million users. For this experiment, I used a subset of 600,000 users for faster training and prototyping.
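The post doesn't show the filtering code. Here is a minimal sketch of the two steps (drop short lists, then sample a subset). It assumes the merged lists are held in memory as a dict mapping each user ID to a list of `(anime_id, score)` pairs, which is a hypothetical layout. `filter_and_sample` and its parameters are illustrative names, not the project's actual code.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layout: user_id -> list of (anime_id, score) pairs.
Ratings = Dict[str, List[Tuple[int, float]]]

def filter_and_sample(ratings: Ratings, min_ratings: int = 10,
                      subset_size: int = 600_000, seed: int = 0) -> Ratings:
    """Keep users with at least `min_ratings` ratings, then sample a subset."""
    # Step 1: drop users whose lists are too short to learn from.
    active = {u: items for u, items in ratings.items() if len(items) >= min_ratings}
    # Step 2: draw a reproducible random subset of the remaining users.
    rng = random.Random(seed)
    user_ids = sorted(active)  # sorted so the sample depends only on the seed
    chosen = rng.sample(user_ids, min(subset_size, len(user_ids)))
    return {u: active[u] for u in chosen}

toy = {"a": [(i, 8.0) for i in range(12)], "b": [(1, 7.0)], "c": [(i, 9.0) for i in range(10)]}
print(sorted(filter_and_sample(toy, subset_size=5)))  # ['a', 'c']
```

Fixing the seed keeps the 600,000-user subset identical across runs, which makes prototyping experiments comparable.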

After testing approaches such as a VAE and matrix factorization, I found that BERT consistently delivered the best results. So I built AnimeRecBERT, a BERT-based system trained on this dataset of 54 million ratings.
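The post doesn't spell out the training objective. BERT-based recommenders such as BERT4Rec are typically trained with a cloze task. A random fraction of a user's items is hidden behind a [MASK] token, and the model learns to predict the hidden IDs. Below is a minimal sketch of building one such training example. It assumes item IDs start at 1 so that 0 can serve as a hypothetical `MASK_ID`, and it uses a 15% masking rate borrowed from the original BERT setup. Whether AnimeRecBERT uses these exact settings is an assumption.

```python
import random
from typing import List, Tuple

MASK_ID = 0    # hypothetical [MASK] token id; assumes real item ids start at 1
IGNORE = -100  # "no loss here" label; matches PyTorch CrossEntropyLoss's default ignore_index

def mask_items(items: List[int], mask_prob: float = 0.15,
               seed: int = 0) -> Tuple[List[int], List[int]]:
    """Build one cloze example: masked inputs plus labels for the hidden items."""
    rng = random.Random(seed)
    inputs: List[int] = []
    labels: List[int] = []
    for item in items:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)  # hide the item from the model...
            labels.append(item)     # ...and ask it to recover the original id
        else:
            inputs.append(item)
            labels.append(IGNORE)   # visible items contribute no loss
    # Make sure every non-empty example has at least one target to predict.
    if items and all(label == IGNORE for label in labels):
        i = rng.randrange(len(items))
        inputs[i], labels[i] = MASK_ID, items[i]
    return inputs, labels

inputs, labels = mask_items([12, 45, 301, 7, 88], mask_prob=0.3)
```

The model only sees `inputs`. The loss is computed only at positions where `labels` holds a real item ID.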

The results? It actually gets my taste right.

The Moment of Truth: Testing My Own System

To evaluate the model qualitatively, I gave it a list of my 9 favorite anime:

- Classroom of the Elite
- Pseudo Harem
- Don’t Toy With Me, Miss Nagatoro
- 86 (Eighty-Six)
- Mushoku Tensei
- Made in Abyss
- Shangri-La Frontier
- The Case Study of Vanitas
- Hell’s Paradise

NOTE: The order of the favorites does not affect inference results, because the model uses only which items are present, not their sequence.
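The post doesn't say how the model ignores order. One common way to get that property is to leave out positional embeddings. Self-attention then treats its input as a set, so shuffling the favorites cannot change what a given query attends to. This is one possible mechanism, assumed here for illustration, and not a description of AnimeRecBERT's architecture. The toy below uses random embeddings and a single attention head, and all names in it are hypothetical. It checks that the output at a [MASK] slot is the same, up to floating-point rounding, for two orderings of the same items.

```python
import math
import random
from typing import Dict, List

Vector = List[float]

def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention (values = keys), no positions."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * key[j] for w, key in zip(weights, keys)) / total for j in range(d)]

DIM = 8
rng = random.Random(42)
# Hypothetical embedding table for item ids 1..9 and a learned [MASK] vector.
EMBEDDINGS: Dict[int, Vector] = {i: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for i in range(1, 10)}
MASK_VEC: Vector = [rng.gauss(0.0, 1.0) for _ in range(DIM)]

def mask_output(item_ids: List[int]) -> Vector:
    """Output at a [MASK] slot that attends over the user's items plus itself."""
    keys = [EMBEDDINGS[i] for i in item_ids] + [MASK_VEC]
    return attend(MASK_VEC, keys)

a = mask_output([1, 2, 3, 4])
b = mask_output([4, 2, 1, 3])
print(all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(a, b)))  # True
```

Adding a position-dependent vector to each key before attention would generally break this equality.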

Top recommendations: Call of the Night, Summertime Rendering, Heavenly Delusion, and Jujutsu Kaisen.

Out of the top 15 recommendations, 6 were anime I’d already watched and loved.
The rest? Titles like Frieren, Tonikaku Kawaii, Solo Leveling, and Heavenly Delusion fit my taste almost perfectly.
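A recommender typically produces a list like this by scoring every anime, dropping the titles the user supplied as input, and keeping the k best. Below is a minimal sketch of that step, plus the overlap check behind the "6 of 15 already watched" observation. `top_k`, `overlap_ratio`, and the score dictionary are hypothetical, and whether AnimeRecBERT filters its inputs this way is an assumption. With the numbers above, the overlap is 6 / 15 = 0.4.

```python
import heapq
from typing import Dict, Iterable, List, Set

def top_k(scores: Dict[int, float], exclude: Iterable[int], k: int = 15) -> List[int]:
    """Return the k highest-scoring anime ids, skipping ones the user gave as input."""
    skip = set(exclude)
    candidates = ((score, item) for item, score in scores.items() if item not in skip)
    # nlargest orders by score; equal scores fall back to comparing item ids.
    return [item for _, item in heapq.nlargest(k, candidates)]

def overlap_ratio(recommended: List[int], watched: Set[int]) -> float:
    """Fraction of recommendations the user has already watched."""
    if not recommended:
        return 0.0
    return sum(1 for item in recommended if item in watched) / len(recommended)

scores = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5}
recs = top_k(scores, exclude=[1], k=3)
print(recs)                      # [2, 3, 4]
print(overlap_ratio(recs, {3}))  # 0.3333333333333333
```

A high overlap with titles the user already loves is a useful sanity check, but the unseen titles in the list are what make a recommender worth using.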