<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tim Ren</title>
    <description>The latest articles on DEV Community by Tim Ren (@xr843).</description>
    <link>https://dev.to/xr843</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3828383%2Fb0f6555c-bf6f-4588-8a6b-cf1b78d2ea46.png</url>
      <title>DEV Community: Tim Ren</title>
      <link>https://dev.to/xr843</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xr843"/>
    <language>en</language>
    <item>
      <title>I Built an Open-Source Buddhist Text Search Engine — Here's What I Learned</title>
      <dc:creator>Tim Ren</dc:creator>
      <pubDate>Sun, 22 Mar 2026 13:10:00 +0000</pubDate>
      <link>https://dev.to/xr843/i-built-an-open-source-buddhist-text-search-engine-heres-what-i-learned-307l</link>
      <guid>https://dev.to/xr843/i-built-an-open-source-buddhist-text-search-engine-heres-what-i-learned-307l</guid>
      <description>&lt;p&gt;If you've ever tried researching Buddhist texts, you know the pain: scriptures are scattered across hundreds of databases worldwide, including CBETA, SuttaCentral, BDRC, and 84000. Searching a single topic means juggling a dozen sites, each with its own formats, languages, and usability quirks.&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://fojin.app" rel="noopener noreferrer"&gt;FoJin (佛津)&lt;/a&gt; to bring it all together.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is FoJin?
&lt;/h2&gt;

&lt;p&gt;FoJin is an open-source platform that aggregates &lt;strong&gt;504 data sources&lt;/strong&gt; and &lt;strong&gt;9,200+ Buddhist texts&lt;/strong&gt; across &lt;strong&gt;21 languages&lt;/strong&gt; (Chinese, Sanskrit, Pali, Tibetan, and more) into a single, searchable interface.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;Live:&lt;/strong&gt; &lt;a href="https://fojin.app" rel="noopener noreferrer"&gt;fojin.app&lt;/a&gt;&lt;br&gt;
📦 &lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/xr843/fojin" rel="noopener noreferrer"&gt;github.com/xr843/fojin&lt;/a&gt; (Apache 2.0)&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🔍 Multilingual Full-Text Search
&lt;/h3&gt;

&lt;p&gt;Built on Elasticsearch with ICU tokenization, supporting CJK, Sanskrit, Pali, and Tibetan scripts. Filter by dynasty, category, and source.&lt;/p&gt;

&lt;h3&gt;
  
  
  📖 Parallel Reading
&lt;/h3&gt;

&lt;p&gt;Compare different language translations of the same text side by side — for example, the Heart Sutra in Sanskrit, Chinese, Tibetan, and English.&lt;/p&gt;

&lt;h3&gt;
  
  
  🤖 AI Q&amp;amp;A (XiaoJin)
&lt;/h3&gt;

&lt;p&gt;RAG-powered assistant that answers questions by citing canonical sources. It includes a free anonymous quota, plus bring-your-own-key (BYOK) support.&lt;/p&gt;

&lt;h3&gt;
  
  
  🕸️ Knowledge Graph
&lt;/h3&gt;

&lt;p&gt;9,600 entities and 3,800 relations, visualized with D3. Explore connections between texts, authors, schools, and concepts.&lt;/p&gt;

&lt;h3&gt;
  
  
  📚 Dictionary Integration
&lt;/h3&gt;

&lt;p&gt;6 authoritative dictionaries with 230,000+ entries, accessible inline while reading.&lt;/p&gt;

&lt;h3&gt;
  
  
  📝 Academic Export
&lt;/h3&gt;

&lt;p&gt;BibTeX, RIS, and APA citation formats for researchers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;p&gt;Frontend:  React 18 + TypeScript + Ant Design 5&lt;br&gt;
Backend:   FastAPI + SQLAlchemy (async)&lt;br&gt;
Database:  PostgreSQL 15 (pgvector) + Elasticsearch 8 + Redis 7&lt;br&gt;
Deploy:    Docker Compose — one command to run everything&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;I started FoJin as a personal tool. I was frustrated by how fragmented Buddhist digital resources are — hundreds of databases, each with its own search interface, its own text format, its own limitations.&lt;/p&gt;

&lt;p&gt;I wanted one place where I could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search across &lt;strong&gt;all&lt;/strong&gt; major collections at once&lt;/li&gt;
&lt;li&gt;Read texts in multiple languages side by side&lt;/li&gt;
&lt;li&gt;Ask questions and get answers grounded in primary sources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After using it privately for a while, I decided to open-source it in case it's useful to others in Buddhist studies or digital humanities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
│   React UI  │────▶│   FastAPI    │────▶│  PostgreSQL 15  │
│  Ant Design │     │  async APIs  │     │   (pgvector)    │
└─────────────┘     └──────┬───────┘     └─────────────────┘
                           │
                    ┌──────┴───────┐
                    │              │
              ┌─────▼─────┐   ┌────▼────┐
              │    ES 8   │   │ Redis 7 │
              │ full-text │   │  cache  │
              └───────────┘   └─────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Key technical decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Elasticsearch + ICU plugin&lt;/strong&gt; for proper CJK/Sanskrit/Tibetan tokenization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pgvector&lt;/strong&gt; for semantic search embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSE streaming&lt;/strong&gt; for AI chat responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker Compose&lt;/strong&gt; for one-command deployment&lt;/li&gt;
&lt;/ul&gt;
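
&lt;p&gt;To make the SSE point concrete, here is a minimal sketch of how streamed chat tokens could be framed as Server-Sent Events. The function names and the JSON payload shape are assumptions for illustration, not FoJin's actual code.&lt;/p&gt;

```python
import json


def sse_event(data, event=None):
    """Format one SSE frame: optional event name, a JSON data line, then a blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # ensure_ascii=False keeps CJK text readable on the wire
    lines.append(f"data: {json.dumps(data, ensure_ascii=False)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens):
    """Yield one SSE frame per token, followed by a final 'done' event."""
    for token in tokens:
        yield sse_event({"token": token})
    yield sse_event({"finished": True}, event="done")


# Hypothetical token stream standing in for LLM output
frames = list(stream_answer(["色", "即是", "空"]))
```

&lt;p&gt;In FastAPI, a generator like this could be wrapped in a &lt;code&gt;StreamingResponse&lt;/code&gt; with &lt;code&gt;media_type="text/event-stream"&lt;/code&gt;, so the browser can render the answer as it arrives.&lt;/p&gt;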

&lt;h2&gt;
  
  
  Get Involved
&lt;/h2&gt;

&lt;p&gt;FoJin is actively maintained and welcomes contributions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🐛 &lt;a href="https://github.com/xr843/fojin/issues" rel="noopener noreferrer"&gt;Report issues&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 Help with translations (currently supporting 8 UI languages)&lt;/li&gt;
&lt;li&gt;📚 Suggest new data sources to integrate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you work in Buddhist studies or digital humanities, or just find this interesting, give it a try at &lt;a href="https://fojin.app" rel="noopener noreferrer"&gt;fojin.app&lt;/a&gt; and let me know what you think!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Star the repo if you find it useful:&lt;/em&gt; ⭐&lt;br&gt;
  &lt;a href="https://github.com/xr843/fojin" rel="noopener noreferrer"&gt;github.com/xr843/fojin&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How I Built an Open-Source Search Engine for 440+ Buddhist Text Sources in 30 Languages</title>
      <dc:creator>Tim Ren</dc:creator>
      <pubDate>Tue, 17 Mar 2026 04:06:34 +0000</pubDate>
      <link>https://dev.to/xr843/how-i-built-an-open-source-search-engine-for-440-buddhist-text-sources-in-30-languages-gli</link>
      <guid>https://dev.to/xr843/how-i-built-an-open-source-search-engine-for-440-buddhist-text-sources-in-30-languages-gli</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Buddhist texts are scattered across hundreds of databases worldwide — CBETA (Chinese Buddhist Canon), SuttaCentral (Pali texts), 84000 (Tibetan translations), BDRC (manuscripts), GRETIL (Sanskrit), and 430+ more. Each has different interfaces, languages, and data formats.&lt;/p&gt;

&lt;p&gt;Researchers spend more time &lt;strong&gt;finding&lt;/strong&gt; texts than &lt;strong&gt;reading&lt;/strong&gt; them.&lt;/p&gt;

&lt;p&gt;I wanted to fix that.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://fojin.app" rel="noopener noreferrer"&gt;FoJin&lt;/a&gt; (佛津) is an open-source platform that aggregates all of these into one unified interface. Think of it as "Google Scholar for Buddhist texts", with extras a general search engine lacks: parallel reading, a knowledge graph, and source-cited AI answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://fojin.app" rel="noopener noreferrer"&gt;fojin.app&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Source code:&lt;/strong&gt; &lt;a href="https://github.com/xr843/fojin" rel="noopener noreferrer"&gt;github.com/xr843/fojin&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multilingual full-text search&lt;/strong&gt; across 440+ sources using Elasticsearch with ICU tokenizer (handles Chinese, Sanskrit, Pali, Tibetan, Japanese, Korean, and 24 more languages)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4,488 fascicles&lt;/strong&gt; available for online reading&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel reading&lt;/strong&gt; — compare translations side-by-side in 30 languages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge graph&lt;/strong&gt; — 9,600+ entities and 3,800+ relationships, visualized with interactive force-directed graphs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG-based AI Q&amp;amp;A&lt;/strong&gt; — ask questions and get answers grounded in ~11 million characters of canonical texts, always with source citations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;6 dictionaries&lt;/strong&gt; — 237,593 entries across Chinese, Sanskrit, Pali, and English&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IIIF manuscript viewer&lt;/strong&gt; — browse digitized manuscripts from BDRC and other institutions&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;p&gt;Here's what powers FoJin under the hood:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React 18 + TypeScript + Vite + Ant Design 5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;Zustand + TanStack Query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI + SQLAlchemy (async) + Pydantic v2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL 15 + pgvector + pg_trgm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search&lt;/td&gt;
&lt;td&gt;Elasticsearch 8 with ICU tokenizer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache&lt;/td&gt;
&lt;td&gt;Redis 7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI/RAG&lt;/td&gt;
&lt;td&gt;Dify + dual retrieval (keyword + semantic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Docker Compose + Nginx + GitHub Actions CI&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  The Hardest Part: Multilingual Search
&lt;/h2&gt;

&lt;p&gt;Searching across Buddhist texts isn't like searching English documents. You're dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Classical Chinese&lt;/strong&gt; (no spaces, no punctuation in original texts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sanskrit&lt;/strong&gt; (complex compound words, multiple romanization schemes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pali&lt;/strong&gt; (diacritical marks: ā, ī, ū, ṃ, ṇ, ṭ)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tibetan&lt;/strong&gt; (unique script with syllable markers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Japanese&lt;/strong&gt; (mixed kanji + kana)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Korean&lt;/strong&gt; (Hangul + Hanja)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The solution: Elasticsearch's &lt;strong&gt;ICU tokenizer&lt;/strong&gt; with language-specific analyzers. Each language gets its own analysis chain, and queries are normalized to handle diacritical variations (so searching "nirvana" also finds "nirvāṇa" and "涅槃").&lt;/p&gt;
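
&lt;p&gt;As a rough illustration of the query normalization idea, the sketch below folds diacritics in plain Python. It is not FoJin's analyzer configuration: in Elasticsearch this job is typically done by the ICU plugin's &lt;code&gt;icu_folding&lt;/code&gt; token filter. The &lt;code&gt;EQUIVALENTS&lt;/code&gt; table is a hypothetical stand-in for the kind of cross-script mapping that linking a Latin term to "涅槃" would need, since folding alone cannot do that.&lt;/p&gt;

```python
import unicodedata


def fold_diacritics(text):
    """Lowercase and strip combining marks, so 'nirvāṇa' becomes 'nirvana'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


# Hypothetical cross-script table; folding alone cannot link Latin and Han forms.
EQUIVALENTS = {"nirvana": {"nirvana", "涅槃"}}


def expand_query(query):
    """Return the set of index terms a folded query should match."""
    key = fold_diacritics(query)
    return EQUIVALENTS.get(key, {key})
```

&lt;p&gt;With this sketch, &lt;code&gt;expand_query("nirvāṇa")&lt;/code&gt; and &lt;code&gt;expand_query("Nirvana")&lt;/code&gt; resolve to the same set of terms.&lt;/p&gt;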
&lt;h2&gt;
  
  
  RAG Over Ancient Texts
&lt;/h2&gt;

&lt;p&gt;The AI Q&amp;amp;A feature ("XiaoJin") uses Retrieval-Augmented Generation to answer questions about Buddhist philosophy, history, and textual analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why RAG matters here:&lt;/strong&gt; Buddhist studies demand accuracy. You can't have an AI hallucinate a sutra citation. Every answer must be grounded in actual canonical texts with traceable sources.&lt;/p&gt;

&lt;p&gt;The pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User asks a question&lt;/li&gt;
&lt;li&gt;Dual retrieval: keyword search (BM25) + semantic search (pgvector embeddings)&lt;/li&gt;
&lt;li&gt;Retrieved passages are fed to the LLM as context&lt;/li&gt;
&lt;li&gt;LLM generates an answer with inline citations pointing to specific texts&lt;/li&gt;
&lt;/ol&gt;
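
&lt;p&gt;Step 2 leaves open how the keyword and semantic result lists are combined. One common option is reciprocal rank fusion (RRF), sketched below. This is an assumption for illustration, not necessarily what FoJin or Dify does, and the passage IDs are made up.&lt;/p&gt;

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of passage IDs; a higher fused score ranks earlier."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID for a deterministic order
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical passage IDs from the two retrievers
bm25_hits = ["T0251:p1", "T0235:p7", "T0220:p3"]
vector_hits = ["T0235:p7", "T1509:p2", "T0251:p1"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

&lt;p&gt;Passages that both retrievers return rise to the top, which is usually what you want before handing context to the LLM.&lt;/p&gt;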

&lt;p&gt;The corpus covers ~11 million characters of canonical texts across multiple languages.&lt;/p&gt;
&lt;h2&gt;
  
  
  Knowledge Graph
&lt;/h2&gt;

&lt;p&gt;The knowledge graph maps relationships between Buddhist concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;People&lt;/strong&gt; → teachers, translators, historical figures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Texts&lt;/strong&gt; → sutras, commentaries, treatises&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schools&lt;/strong&gt; → Theravada, Mahayana, Vajrayana traditions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monasteries&lt;/strong&gt; → historical and active institutions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concepts&lt;/strong&gt; → philosophical terms and doctrines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;9,600+ entities connected by 3,800+ typed relationships, rendered as an interactive force-directed graph using D3.js.&lt;/p&gt;
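
&lt;p&gt;To show what "typed relationships" means in practice, here is a small in-memory sketch. The class names, relation labels, and sample data are assumptions for illustration; the real graph lives in the database and its schema may differ.&lt;/p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str
    kind: str  # relation type, e.g. "translated" (hypothetical label)
    target: str


class KnowledgeGraph:
    def __init__(self, relations):
        # Index outgoing relations by source entity for fast lookup
        self._out = defaultdict(list)
        for rel in relations:
            self._out[rel.source].append(rel)

    def neighbors(self, entity, kind=None):
        """Targets reachable from entity, optionally filtered by relation type."""
        return [
            r.target
            for r in self._out.get(entity, [])
            if kind is None or r.kind == kind
        ]


graph = KnowledgeGraph([
    Relation("Xuanzang", "translated", "Heart Sutra"),
    Relation("Xuanzang", "member_of", "Yogacara"),
    Relation("Kumarajiva", "translated", "Diamond Sutra"),
])
```

&lt;p&gt;Filtering by relation type, as in &lt;code&gt;graph.neighbors("Xuanzang", kind="translated")&lt;/code&gt;, is what lets a visualization show only translation links or only school membership.&lt;/p&gt;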
&lt;h2&gt;
  
  
  Self-Hosting
&lt;/h2&gt;

&lt;p&gt;FoJin is fully self-hostable with Docker Compose:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/xr843/fojin.git
&lt;span class="nb"&gt;cd &lt;/span&gt;fojin
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visit &lt;code&gt;http://localhost:3000&lt;/code&gt; and you're done.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multilingual NLP is hard.&lt;/strong&gt; ICU tokenization solved 80% of the problem, but edge cases in Classical Chinese and Sanskrit required custom analyzers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RAG needs domain-specific tuning.&lt;/strong&gt; Generic chunking strategies don't work well for ancient texts with verse-prose mixed formats. I had to design custom chunking that respects the structural units of Buddhist canons.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Knowledge graphs are underrated.&lt;/strong&gt; The force-directed visualization became one of the most-used features — researchers love exploring conceptual relationships visually.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Docker Compose is the right deployment choice&lt;/strong&gt; for academic tools. Researchers aren't DevOps engineers. One command should get everything running.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
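
&lt;p&gt;For the chunking lesson, here is a simplified sketch of structure-aware chunking. It assumes structural units (a verse, a prose paragraph) are separated by blank lines, which is a stand-in for whatever markers the real canon data uses. The function name and size limit are hypothetical.&lt;/p&gt;

```python
def chunk_by_structure(text, max_chars=200):
    """Pack whole blocks (separated by blank lines) into chunks, never splitting a block."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    chunks, current = [], ""
    for block in blocks:
        candidate = block if not current else current + "\n\n" + block
        if current and len(candidate) > max_chars:
            # Adding this block would overflow, so close the current chunk
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "prose one\n\nverse a\nverse b\n\nprose two"
small_chunks = chunk_by_structure(sample, max_chars=20)
```

&lt;p&gt;A verse is kept intact even if it exceeds the limit on its own, trading uniform chunk size for passages that still make sense when retrieved.&lt;/p&gt;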

&lt;h2&gt;
  
  
  Get Involved
&lt;/h2&gt;

&lt;p&gt;FoJin is Apache 2.0 licensed and actively looking for contributors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Translate the UI&lt;/strong&gt; — we need Japanese, Korean, Thai, Vietnamese, and more (&lt;a href="https://github.com/xr843/fojin/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22" rel="noopener noreferrer"&gt;good first issues&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add data sources&lt;/strong&gt; — know a Buddhist text database we're missing?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improve search&lt;/strong&gt; — better tokenization, ranking, or multilingual support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend/Backend&lt;/strong&gt; — dark mode, mobile responsiveness, citation export&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live: &lt;a href="https://fojin.app" rel="noopener noreferrer"&gt;fojin.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/xr843/fojin" rel="noopener noreferrer"&gt;github.com/xr843/fojin&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Discord: &lt;a href="https://discord.gg/76SZeuJekq" rel="noopener noreferrer"&gt;discord.gg/76SZeuJekq&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;If you're interested in NLP, search engines, knowledge graphs, or digital humanities, I'd love to hear your feedback. Drop a comment or star the repo!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>elasticsearch</category>
      <category>rag</category>
      <category>knowledgegraph</category>
    </item>
  </channel>
</rss>
