MyAnimeList recommendations were broken, so I scraped 108 years of history to fix them.
Standard anime search engines rely on keyword matching. If you search for "Cyberpunk", the best they can do is match a coarse tag like "Sci-Fi". I wanted to search by "Vibe" (e.g., "Anime that feels like a warm hug" or "Neon-soaked tragedy").
So, I spent the last 2 months building AiMi: A production-grade Hybrid RAG engine.
Here is the full technical breakdown of how I built it, the architectural challenges I faced (handling 8,000+ embeddings on CPU vs GPU), and the code behind the viral "Anime Receipts" generator.
1. The Data: 108 Years of History (1917-2025)
Garbage in, garbage out. Before building the model, I needed a dataset that didn't exist.
I aggregated data from AniDB and MAL to create a unified database of 8,248 anime.
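Unifying the two sources came down to a title-keyed merge. Here's a minimal sketch of the idea, assuming records arrive as dicts; the field names (`mal_score`, `anidb_rating`) and helper names are my illustration, not the actual pipeline:

```python
def normalize_title(title: str) -> str:
    """Lowercase and strip punctuation/whitespace so the same show
    matches across sources despite formatting differences."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def merge_sources(mal_rows, anidb_rows):
    """Merge two lists of per-anime dicts into one record per title."""
    merged = {}
    for row in mal_rows:
        merged[normalize_title(row["title"])] = dict(row)
    for row in anidb_rows:
        key = normalize_title(row["title"])
        # Update existing entries so fields from both sources survive
        merged.setdefault(key, {}).update(row)
    return list(merged.values())
```

The normalized-title key is the load-bearing part: without it, trailing spaces or punctuation differences produce duplicate rows.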
The biggest challenge was Normalization.
- Ratings: AniDB uses floats (0-10), MAL uses integers. I preserved the float precision.
- Context: Raw synopses aren't enough for RAG. I engineered a `canonical_embedding_text` field that blends Themes + Character Archetypes + Emotional Tone into a single dense text block for embedding.
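To make the "Context" step concrete, here's a rough sketch of how such a blended field could be assembled. The dict keys are hypothetical; the actual schema isn't published:

```python
def build_embedding_text(anime: dict) -> str:
    """Blend synopsis, themes, archetypes, and tone into one text
    block that embeds as a single dense vector. Keys are illustrative."""
    parts = []
    if anime.get("synopsis"):
        parts.append(anime["synopsis"])
    if anime.get("themes"):
        parts.append("Themes: " + ", ".join(anime["themes"]))
    if anime.get("archetypes"):
        parts.append("Archetypes: " + ", ".join(anime["archetypes"]))
    if anime.get("emotional_tone"):
        parts.append("Tone: " + anime["emotional_tone"])
    return " | ".join(parts)
```

Concatenating the signals into one string means a single embedding captures plot *and* vibe, which is what lets "warm hug" queries land.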
🎁 Free Resource: I’ve open-sourced a 500-row sample of this cleaned dataset on Kaggle for anyone who wants to test their own models:
Download Sample Dataset
2. The Engine: Hybrid RAG Architecture
Most RAG tutorials are "Hello World" toys. I needed this to run in production.
I settled on a Hybrid Search architecture to balance "Vibe" (Semantic) with "Precision" (Keywords).
A. The Embedding Model
I chose Nomic v1.5 over OpenAI.
- Why? It outperforms other open-source models like Jina, Stella, and Alibaba-NLP for structured retrieval tasks.
- Cost: It runs locally. No API bills.
B. The "Keyword Boosting" Layer (BM25 Logic)
Vector search is bad at specific nouns. If a user searches for "Anime about a notebook", vectors might give you "School Life" anime.
I implemented Python-native boosting logic to force exact-term matches to the top.
Here is the actual search logic from my pipeline:
```python
# Inside robust_rag_pipeline.py
def search(self, query: str, top_k: int = 10):
    # 1. Vector Search (Nomic) — FAISS expects a 2D batch of queries
    query_emb = self.model.encode([query])
    scores, ids = self.index.search(query_emb, top_k)

    # 2. Keyword Boosting (Safety Net)
    # Extract rare words (>4 chars) like "Pancreas" or "Notebook"
    query_words = {w.lower() for w in query.split() if len(w) > 4}

    results = []
    for idx, score in zip(ids[0], scores[0]):
        anime = self.dataset.iloc[idx]
        anime_text = (anime['Synopsis'] + " " + anime['Title']).lower()
        # Count exact keyword matches
        matches = sum(1 for q in query_words if q in anime_text)
        # Boost score by 5% per match (capped at 15%)
        boost = min(matches * 0.05, 0.15)
        final_score = min(score + boost, 0.9999)
        results.append((idx, final_score))

    # Re-rank by the boosted score before returning
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:top_k]
```
3. Solving the "Hardware-Aware" Problem
I wanted this to run on a MacBook Air (CPU) and a Gaming Rig (NVIDIA GPU) without changing code.
- On GPU: The system loads a local LLM (Qwen-2.5-1.5B) to enable HyDE (Hypothetical Document Embeddings). It "translates" queries like "No fanservice" into "Wholesome, family friendly" before searching.
- On CPU: It gracefully degrades to "Lightweight Mode" (Nomic + Boosting only) to prevent timeouts.
This "Self-Healing" initialization was critical for deployment:
```python
# Hardware-Aware Initialization (assumes `import torch` at module top)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

if device == 'cuda':
    logger.info("⚡ GPU Detected: Enabling HyDE (Generative Intent).")
    self.llm_pipeline = load_qwen_model()
else:
    logger.warning("⚠️ CPU Detected: Skipping LLM to prevent timeouts.")
    self.llm_pipeline = None
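The HyDE step itself can then be sketched roughly like this. The helper name, prompt, and pipeline call are my illustration (not the actual pipeline code); the important part is the graceful fallback when no LLM was loaded:

```python
def expand_query(llm_pipeline, query: str) -> str:
    """HyDE-style expansion: ask the LLM for a hypothetical anime
    description matching the query, then search on that richer text.
    Falls back to the raw query in Lightweight (CPU) Mode."""
    if llm_pipeline is None:
        return query  # CPU path: embed the raw query directly
    prompt = (
        "Write a two-sentence anime synopsis that matches this request: "
        f"{query}"
    )
    hypothetical_doc = llm_pipeline(prompt)
    # Embed query + hypothetical document together for retrieval
    return f"{query} {hypothetical_doc}"
```

Because both branches return a plain string, the downstream embedding code never needs to know which mode it's running in.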
4. The Viral Feature: Generating Receipts with Playwright
Data is boring if you can't share it. I wanted users to be able to visualize their watch history as "Store Receipts."
I used Playwright (headless browser) to render HTML templates into High-DPI images.
The Challenge: Performance. Generating 100 receipts sequentially took forever.
The Solution: asyncio.gather.
```python
import asyncio
from playwright.async_api import async_playwright

# Batch Processing Receipts
async def convert_all(receipts):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        # One render task per receipt; each opens its own page,
        # so Chromium can work on them concurrently
        tasks = [render_receipt(browser, receipt) for receipt in receipts]
        # Execute all renders in parallel
        results = await asyncio.gather(*tasks)
        await browser.close()
    return results
```
This reduced generation time from 40 seconds to 3 seconds for a batch.
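The speedup comes purely from overlapping the I/O waits. Here's a toy, self-contained illustration of the same pattern, with `asyncio.sleep` standing in for a browser render:

```python
import asyncio
import time

async def fake_render(receipt_id: int) -> int:
    # Stand-in for a Playwright screenshot: mostly I/O wait
    await asyncio.sleep(0.1)
    return receipt_id

async def main():
    start = time.perf_counter()
    # 20 "renders" that would take ~2s sequentially...
    results = await asyncio.gather(*(fake_render(i) for i in range(20)))
    elapsed = time.perf_counter() - start
    # ...finish in roughly 0.1s because the waits overlap
    return results, elapsed

results, elapsed = asyncio.run(main())
```

Note that `asyncio.gather` preserves input order in its results, which keeps receipts matched to the right users.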
5. Build it Yourself (Source Code)
I believe in Source-Available software. You shouldn't have to spend 2 months scraping and refactoring like I did.
I have packaged the Entire Ecosystem into a "Business-in-a-Box" for developers who want to launch their own Anime SaaS or learn advanced RAG patterns.
📦 What's in the box?
- The 8k RAG Dataset (Parquet).
- The Recommendation Engine (FastAPI + Streamlit Source Code).
- The Receipt Generator (Playwright + Async Logic).
- The Asset Library (2.3GB of Posters/Logos).
You can clone this, white-label it, and launch your own version today.
🚀 Launch Special (Limited Time)
To celebrate the launch, I'm offering a 10% Discount on the Ultimate Tier.
- Code: AIMILAUNCH
- Note: I will be raising the prices by $50 after the first 50 sales. Lock it in now.
💎 Get the Ultimate Ecosystem (Tier 3)
(If you just want the raw data, Tier 1 is available for $49 here.)
Let me know if you have any questions about the Nomic vs. OpenAI benchmarks or the HyDE implementation in the comments!
Go make something impossible. 🚀

