
Ramazan Turan


Building AnimeRecBERT: How I Used BERT to Create a Personalized Anime Recommendation System


Web Demo: animerecbert.online

I wanted to build an anime recommendation system and came across a GitHub repo that used BERT trained on the MovieLens dataset. The impressive results it achieved inspired me to adapt the approach for anime recommendations. What started as a side project turned into a full-scale experiment with millions of anime ratings and one powerful language model: BERT.

To build a high-quality dataset, I scraped over 4 million user anime lists from:

- AniList
- Kitsu
- MyAnimeList

After filtering for users with at least 10 ratings, I was left with around 1.5 million users. For this experiment, I used a subset of 600,000 users for faster training and prototyping.
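The filtering step could look roughly like the sketch below. It is an illustration only, not the project's actual preprocessing code: the `(user_id, anime_id, rating)` tuple layout and the function name `filter_active_users` are assumptions.

```python
from collections import defaultdict


def filter_active_users(ratings, min_ratings=10):
    """Keep only users who have at least `min_ratings` rated anime.

    `ratings` is an iterable of (user_id, anime_id, rating) tuples.
    This layout is hypothetical and used only for illustration.
    """
    by_user = defaultdict(list)
    for user_id, anime_id, rating in ratings:
        by_user[user_id].append((anime_id, rating))
    return {
        user_id: items
        for user_id, items in by_user.items()
        if len(items) >= min_ratings
    }


# Toy data: user "a" has 10 ratings, user "b" has only 3.
toy = [("a", i, 8) for i in range(10)] + [("b", i, 7) for i in range(3)]
kept = filter_active_users(toy)
print(sorted(kept))  # ['a']
```

The threshold of 10 comes from the post; everything else here is a placeholder for whatever format the scraped lists are stored in.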

I tested several approaches, including VAE and Matrix Factorization, and BERT consistently delivered the best results. So I built AnimeRecBERT, a BERT-based system trained on this dataset with 54 million ratings.

The results? It actually gets my taste right.

The Moment of Truth: Testing My Own System

To evaluate the model qualitatively, I gave it a list of my 9 favorite anime:

- Classroom of the Elite
- Pseudo Harem
- Don’t Toy With Me, Miss Nagatoro
- 86 (Eighty-Six)
- Mushoku Tensei
- Made in Abyss
- Shangri-La Frontier
- The Case Study of Vanitas
- Hell’s Paradise

NOTE: The position of favorites does not affect inference results, as the model uses only the presence of items (not sequence).
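Why order stops mattering once positional encoding is gone can be shown with a toy example. The sketch below is a heavily simplified stand-in for the real model: a single attention step with no learned projections, random vectors as hypothetical anime embeddings, and a hypothetical `[MASK]` query. It only demonstrates the general property that attention over a set of embeddings gives the same result for any ordering.

```python
import math
import random


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys
    ]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w / total * value[d] for w, value in zip(weights, values))
        for d in range(len(values[0]))
    ]


random.seed(0)
dim = 4
# Hypothetical embeddings: one random vector per anime, plus a [MASK] query.
favorites = ["Made in Abyss", "Mushoku Tensei", "Hell's Paradise"]
embedding = {t: [random.gauss(0, 1) for _ in range(dim)] for t in favorites}
mask_query = [random.gauss(0, 1) for _ in range(dim)]


def mask_output(order):
    # No positional encoding is added, so each vector depends only on the title.
    vectors = [embedding[title] for title in order]
    return attend(mask_query, vectors, vectors)


original = mask_output(favorites)
shuffled = mask_output(list(reversed(favorites)))
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, shuffled))
print(same)  # True
```

With positional encodings, each vector would also depend on its slot in the list, and the two outputs would generally differ.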

Top recommendations: Call of the Night, Summertime Rendering, Heavenly Delusion, and Jujutsu Kaisen.

Out of the top 15 recommendations, 6 were anime I’d already watched and loved.
The rest? Titles like Frieren, Tonikaku Kawaii, Solo Leveling, and Heavenly Delusion fit my taste almost perfectly.

🧠 BERT Model Architecture

This project builds upon BERT4Rec-VAE-Pytorch (https://github.com/jaywonchung/BERT4Rec-VAE-Pytorch), with several key modifications:

  • Custom dataset scraped from AniList, Kitsu, and MyAnimeList
  • GUI-based inference script for quick testing and Web Demo
  • Positional encoding removed from architecture (no timestamp info in user lists)
  • Input sequence length: 128 anime tokens
  • Increased dropout for better regularization
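To make the 128-token input concrete, here is a minimal sketch of how one user's list might be turned into a fixed-length training example with BERT-style masking (some items are hidden and the model learns to predict them). It is not taken from the AnimeRecBERT code: the padding id `0`, the mask id `num_items + 1`, the 15% mask probability, the left padding, and the function name `build_training_example` are all assumptions for illustration.

```python
import random

PAD_ID = 0  # assumed padding id; real item ids are assumed to start at 1


def build_training_example(item_ids, num_items, max_len=128, mask_prob=0.15, rng=None):
    """Return (tokens, labels), each a list of exactly `max_len` ids.

    Masked positions hold the mask id in `tokens` and the hidden item in
    `labels`; every other label is PAD_ID so a loss function can skip it.
    """
    rng = rng or random.Random()
    mask_id = num_items + 1  # assumed id for the [MASK] token
    tokens, labels = [], []
    for item in item_ids[-max_len:]:  # truncate lists longer than max_len
        if rng.random() < mask_prob:
            tokens.append(mask_id)
            labels.append(item)  # the model should recover this item
        else:
            tokens.append(item)
            labels.append(PAD_ID)
    pad = max_len - len(tokens)
    return [PAD_ID] * pad + tokens, [PAD_ID] * pad + labels


tokens, labels = build_training_example(
    list(range(1, 21)), num_items=1000, rng=random.Random(42)
)
print(len(tokens), len(labels))  # 128 128
```

Because the model ignores order, which 128 items survive truncation for very long lists is a design choice; keeping the last 128 entries here is arbitrary.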

GitHub Repo

📂 You can try the model for yourself!
Setup & Inference instructions available on GitHub:

👉 https://github.com/MRamazan/AnimeRecBERT

Web Demo: animerecbert.online

