DEV Community

Tim Ren

I Built an Open-Source Buddhist Text Search Engine — Here's What I Learned

If you've ever tried researching Buddhist texts, you know the pain: scriptures are scattered across hundreds of databases worldwide, including CBETA, SuttaCentral, BDRC, and 84000. Searching a single topic means juggling a dozen sites, each with its own formats, languages, and level of usability.

So I built FoJin (佛津) to bring it all together.

What is FoJin?

FoJin is an open-source platform that aggregates 504 data sources and 9,200+ Buddhist texts across 21 languages (Chinese, Sanskrit, Pali, Tibetan, and more) into a single, searchable interface.

🔗 Live: fojin.app
📦 Source: github.com/xr843/fojin (Apache 2.0)

Core Features

🔍 Multilingual Full-Text Search

Built on Elasticsearch with ICU tokenization, supporting CJK, Sanskrit, Pali, and Tibetan scripts. Filter by dynasty, category, and source.
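To make that concrete, here is a minimal sketch of what an ICU-based index and a filtered query could look like, written as plain Python dicts so it runs without a cluster. The `icu_tokenizer` and `icu_folding` names come from Elasticsearch's analysis-icu plugin; the analyzer name, field names, and boost are assumptions for illustration, not FoJin's actual mapping.

```python
# Hypothetical index body: analyzer and field names are illustrative only.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "multilingual_text": {
                    "type": "custom",
                    "tokenizer": "icu_tokenizer",  # provided by the analysis-icu plugin
                    "filter": ["icu_folding"],     # folds case and diacritics
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "multilingual_text"},
            "content": {"type": "text", "analyzer": "multilingual_text"},
            "dynasty": {"type": "keyword"},
            "category": {"type": "keyword"},
            "source": {"type": "keyword"},
        }
    },
}


def build_search_query(text, dynasty=None, category=None, source=None):
    """Build a bool query: full-text match plus optional exact-value filters."""
    candidates = (("dynasty", dynasty), ("category", category), ("source", source))
    filters = [
        {"term": {field: value}} for field, value in candidates if value is not None
    ]
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": text, "fields": ["title^2", "content"]}}
                ],
                "filter": filters,
            }
        }
    }


# Example: search for "般若" restricted to one (hypothetical) source value.
query = build_search_query("般若", source="CBETA")
```

Keeping the filters in the `filter` clause rather than `must` means they narrow the result set without affecting relevance scores.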

📖 Parallel Reading

Compare different language translations of the same text side by side — for example, the Heart Sutra in Sanskrit, Chinese, Tibetan, and English.

🤖 AI Q&A (XiaoJin)

A RAG-powered assistant that answers questions by citing canonical sources. Anonymous users get a free quota, and you can bring your own API key (BYOK).
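The citation part of a RAG pipeline is easy to picture: retrieved passages are numbered and placed in the prompt so the model can refer back to them. The sketch below shows only that general pattern. The `Passage` type, the `build_prompt` function, and the prompt wording are hypothetical and are not taken from FoJin's code.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # collection the passage came from
    ref: str     # catalog number or other locator
    text: str


def build_prompt(question, passages):
    """Number each retrieved passage so the model can cite it as [n]."""
    context = "\n".join(
        f"[{i}] ({p.source} {p.ref}) {p.text}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the passages below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Illustrative input: one retrieved passage and a user question.
prompt = build_prompt(
    "What does the Heart Sutra say about form and emptiness?",
    [Passage("CBETA", "T0251", "色即是空，空即是色")],
)
```

Because every passage carries its source and locator, the UI can turn a `[1]` in the answer back into a link to the original text.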

🕸️ Knowledge Graph

9,600 entities and 3,800 relations, visualized with D3. Explore connections between texts, authors, schools, and concepts.
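D3's force layouts typically consume a list of nodes and a list of links with `source` and `target` keys. As a rough sketch of how stored relations could be shaped for that, the function below converts (subject, predicate, object) triples into the node-link form. The triple layout and the example relations are assumptions for illustration, not FoJin's schema.

```python
def to_node_link(relations):
    """Convert (subject, predicate, object) triples into nodes and links."""
    nodes = {}
    links = []
    for subject, predicate, obj in relations:
        # Register each entity once, keeping first-seen order.
        for entity in (subject, obj):
            nodes.setdefault(entity, {"id": entity})
        links.append({"source": subject, "target": obj, "type": predicate})
    return {"nodes": list(nodes.values()), "links": links}


# Illustrative triples only.
graph = to_node_link([
    ("Xuanzang", "translated", "Heart Sutra"),
    ("Xuanzang", "associated_with", "Yogācāra"),
])
```

The resulting dict can be serialized with `json.dumps` and handed to the frontend as is.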

📚 Dictionary Integration

6 authoritative dictionaries with 230,000+ entries, accessible inline while reading.

📝 Academic Export

BibTeX, RIS, and APA citation formats for researchers.
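For a sense of what a BibTeX export involves, here is a minimal sketch that renders a text's metadata as an `@misc` entry. The function name, the chosen fields, and the example values are hypothetical. Note that `translator` is a biblatex field; classic BibTeX styles simply ignore fields they don't know.

```python
def to_bibtex(key, title, translator=None, year=None, url=None):
    """Render a minimal @misc BibTeX entry, skipping empty fields."""
    fields = {"title": title, "translator": translator, "year": year, "url": url}
    body = ",\n".join(
        f"  {name} = {{{value}}}"
        for name, value in fields.items()
        if value is not None
    )
    return f"@misc{{{key},\n{body}\n}}"


entry = to_bibtex("heart_sutra", "Heart Sutra", url="https://fojin.app")
print(entry)
```

A production exporter would also need to escape characters that are special in BibTeX, such as `&`, `%`, and unbalanced braces.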

Tech Stack

Frontend: React 18 + TypeScript + Ant Design 5
Backend: FastAPI + SQLAlchemy (async)
Database: PostgreSQL 15 (pgvector) + Elasticsearch 8 + Redis 7
Deploy: Docker Compose — one command to run everything
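As a quick illustration of what pgvector adds to PostgreSQL: its `<=>` operator returns the cosine distance between two vectors, so ordering by it yields the nearest embeddings first. The sketch below computes the same quantity in plain Python and shows the shape of such a query. The table and column names are made up for the example.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Hypothetical table and column names; %s is a driver placeholder
# for the query embedding.
NEAREST_SQL = """
SELECT id, title
FROM passages
ORDER BY embedding <=> %s
LIMIT 5
"""
```

Identical directions give a distance of 0 and orthogonal vectors give 1, which is why smaller values rank first.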

Why I Built This

I started FoJin as a personal tool. I was frustrated by how fragmented Buddhist digital resources are — hundreds of databases, each with its own search interface, its own text format, its own limitations.

I wanted one place where I could:

  • Search across all major collections at once
  • Read texts in multiple languages side by side
  • Ask questions and get answers grounded in primary sources

After using it privately for a while, I decided to open-source it in case it's useful to others in Buddhist studies or digital humanities.

Architecture Overview

┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
│  React UI   │────▶│   FastAPI    │────▶│  PostgreSQL 15  │
│  Ant Design │     │  async APIs  │     │   (pgvector)    │
└─────────────┘     └──────┬───────┘     └─────────────────┘
                           │
                    ┌──────┴───────┐
                    │              │
              ┌─────▼─────┐   ┌────▼────┐
              │   ES 8    │   │ Redis 7 │
              │ full-text │   │  cache  │
              └───────────┘   └─────────┘

Key technical decisions:

  • Elasticsearch + ICU plugin for proper CJK/Sanskrit/Tibetan tokenization
  • pgvector for semantic search embeddings
  • SSE streaming for AI chat responses
  • Docker Compose for one-command deployment
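For readers who haven't used SSE: the wire format is just text frames, each made of `data:` lines and terminated by a blank line. Here is a minimal sketch of that framing in plain Python. The helper names and the closing `done` event are assumptions for illustration, not FoJin's actual protocol.

```python
def sse_frame(data, event=None):
    """Format one Server-Sent Events frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads need one "data:" line per line of text.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens):
    """Yield one frame per generated token, then a closing event."""
    for token in tokens:
        yield sse_frame(token)
    yield sse_frame("", event="done")


frames = list(stream_answer(["Form", " is", " emptiness"]))
```

In a FastAPI app, a generator like `stream_answer` could be wrapped in `StreamingResponse(..., media_type="text/event-stream")` so the browser's `EventSource` receives tokens as they arrive.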

Get Involved

FoJin is actively maintained and welcomes contributions:

  • 🐛 Report issues
  • 🌐 Help with translations (currently supporting 8 UI languages)
  • 📚 Suggest new data sources to integrate

If you work in Buddhist studies or digital humanities, or just find this interesting, give it a try at fojin.app and let me know what you think!


Star the repo if you find it useful:
github.com/xr843/fojin
