Irfan Ghapar

Posted on May 22

AI Fiqh & Retrieval-augmented generation (RAG)

#fiqh #rag #webdev #phase1

Building AIFiqh: Our Journey with Islamic Knowledge and AI

How we're trying to make authentic Islamic scholarship more accessible (and learning a lot along the way)

Why We Built This Thing

Let me be straight here. When we started AIFiqh, we weren't trying to "disrupt" Islamic knowledge or whatever. We just noticed something annoying: finding reliable Islamic rulings online was a mess.

You'd search for something simple like "Can I pray with nail polish?" and get buried in forum discussions, random blog posts, and that one guy who's really confident but probably wrong. Meanwhile, there are literally hundreds of thousands of authentic Islamic texts sitting there, but good luck finding what you need without spending hours digging through PDFs.

So we thought: what if we could build something that actually knows this stuff?

When We Realized This Was Actually Hard

Turns out, building AI for Islamic knowledge isn't like building a chatbot for customer service. Islamic jurisprudence is nuanced. Context matters. Scholarly opinions differ. And you can't just throw GPT at it and hope for the best.

We learned this the hard way when our first prototype confidently told someone that "all fish are halal" without mentioning the Hanafi school's very specific opinions about shellfish. Oops.

That's when we realized we needed to get serious about this.

Why GPT and Claude Fall Short (And Why We Had to Build Our Own)

Before building AIFiqh, we tested the big players. Here's what we found:

| Challenge | GPT-4 | Claude | AIFiqh (Current) |
|---|---|---|---|
| Source Attribution | Generic responses, no citations | Limited Islamic sources | Every answer traces to specific kitab + page number |
| Madhab Awareness | Mixes schools without distinction | Occasional mention of differences | Clearly separates Hanafi, Maliki, Shafi'i, Hanbali opinions |
| Arabic Authenticity | Often paraphrases or translates incorrectly | Better than GPT but still limited | Original Arabic text + verified translations |
| Classical vs Contemporary | Can't distinguish scholarly weight | Similar issue | Properly weights classical scholars over modern opinions |
| Controversial Topics | Avoids or gives watered-down answers | Sometimes too cautious | Presents authentic scholarly positions with context |
| Update Frequency | Knowledge cutoff limitations | Same issue | Live updates from contemporary fatwa councils |
| Specialized Terminology | Generic Islamic terms | Better understanding but limited | Precise fiqh terminology with explanations |
| Infrastructure | Billion-dollar backing, global CDN | Well-funded, enterprise-grade | Running on startup budget (hoping investors are reading this 👀) |

The Real Problem: General AI models are trained on everything, from Wikipedia articles to random blogs and social media posts about Islam. They can't tell the difference between an authentic hadith commentary and someone's opinion on Reddit.

Our Problem: We know exactly what we need to build to compete with the big players, but we're currently bootstrapping this on DigitalOcean droplets while OpenAI has data centers around the world.

We needed something that actually knows the difference between Ibn Taymiyyah and that guy on IslamQA who's really confident but questionably qualified. But let's be honest, we also need the resources to scale this properly.

The Data Mission: 500K Texts and Counting

First things first: we needed the real deal. Not summaries, not interpretations, but actual source texts. So we went hunting.

What we digitized:

The entire Mausu'ah Fiqhiyyah Kuwaitiyah (all 45 volumes)
Al-Qardhawi's Fiqh Zakat
Hundreds of contemporary fatwas
Classical texts on muamalat
Tabung Haji documentation
And about 495,000 other texts that nearly broke our OCR budget

Converting these wasn't just scanning PDFs. Classical Arabic with diacritics, different fonts, handwritten marginalia, weird page layouts—our OCR pipeline had to handle it all. Fun times.

The Tech Stack (Or: How We Keep This Thing Running)

Frontend: React + Next.js

Because life's too short for vanilla JavaScript, and we needed SSR for performance.

Backend: The fun part. We went multi-service:

Python + Flask for the heavy AI lifting
NestJS for our main API (TypeScript makes debugging less painful)
PostgreSQL + Prisma because we like our databases relational and our queries type-safe
ChromaDB for vector storage (more on this later)

AI Stack:

TensorFlow for our custom models
Google Gemini for the really tricky stuff
text-embedding-004 for turning Arabic text into numbers that actually mean something

Infrastructure: DigitalOcean everywhere

S3-compatible storage, droplets, managed databases—the works. No vendor lock-in headaches.

The Two-Tier Cache

Here's where it gets interesting. We built this two-tier caching system that's honestly kind of elegant:

Tier 1: The Speed Demon

LRU cache in memory. Sub-100ms responses for anything we've seen before. Because nobody wants to wait 3 seconds to find out if their prayer is valid.
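For anyone who hasn't built one: an LRU cache holds a fixed number of entries and evicts whichever was used least recently. Here is a minimal Python sketch of Tier 1, assuming answers are keyed by the question text after collapsing whitespace and case. The class name, the key normalization and the default capacity are illustrative assumptions, not AIFiqh's production code.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Tier 1 sketch: bounded in-memory cache that evicts the least recently used entry."""

    def __init__(self, capacity: int = 1024) -> None:
        # Illustrative default, not a tuned production value
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    @staticmethod
    def _key(question: str) -> str:
        # Collapse whitespace and case so trivially different phrasings share a slot
        return " ".join(question.split()).casefold()

    def get(self, question: str) -> Optional[str]:
        key = self._key(question)
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, question: str, answer: str) -> None:
        key = self._key(question)
        self._data[key] = answer
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```

`functools.lru_cache` is the usual shortcut for synchronous functions, but it doesn't work directly on `async def` functions (it would cache the coroutine object, which can only be awaited once), so an explicit class is a simpler fit for an async service.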

Tier 2: The Brain

ChromaDB vector database with all our embeddings. When someone asks something new, we do semantic search across our entire corpus. Cosine similarity, 80% threshold, the works.

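```python
# This is simplified, but you get the idea.
# tier1_cache, embed_text, vector_db, synthesize_answer and gemini_generate
# stand in for our own services.
async def find_answer(question):
    # Tier 1: check the in-memory LRU cache first
    cached = await tier1_cache.get(question)
    if cached:
        return cached

    # Tier 2: semantic search in the vector DB (cosine similarity >= 0.8)
    embedding = await embed_text(question)
    similar_texts = await vector_db.search(embedding, threshold=0.8)

    if similar_texts:
        answer = synthesize_answer(similar_texts)
        await tier1_cache.set(question, answer)
        return answer

    # Novel question: fall back to Gemini, grounded in the closest passages
    # we do have (hypothetical top_k parameter, no threshold applied)
    nearest = await vector_db.search(embedding, top_k=5)
    return await gemini_generate(question, context=nearest)
```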

The cool part? If we don't have a good match in our vector DB, we fall back to Gemini but feed it relevant context from our texts. Best of both worlds.
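ChromaDB runs this search in the real system. The plain-Python sketch below only spells out the logic described above: score every passage by cosine similarity, keep those at or above 0.8 as confident matches, and otherwise hand the k nearest passages to the LLM as grounding context. The function names, `k=3` and the prompt wording are hypothetical, and the brute-force scan stands in for ChromaDB's approximate nearest-neighbour index.

```python
import math
from typing import List, Sequence, Tuple

Corpus = List[Tuple[str, Sequence[float]]]  # (passage text, embedding)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float], corpus: Corpus,
             threshold: float = 0.8, k: int = 3) -> Tuple[List[str], List[str]]:
    """Return (confident matches, fallback context), both best-first."""
    scored = sorted(
        ((cosine_similarity(query, vec), text) for text, vec in corpus),
        reverse=True,
    )
    confident = [text for score, text in scored if score >= threshold]
    nearest = [text for _, text in scored[:k]]  # only needed if `confident` is empty
    return confident, nearest


def build_prompt(question: str, passages: List[str]) -> str:
    # Number the passages so the model can cite them back as [1], [2], ...
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return ("Answer using only the sources below and cite them by number.\n"
            f"Sources:\n{sources}\n\nQuestion: {question}")
```

One detail to watch in a real deployment: a ChromaDB collection configured for cosine space returns a distance (1 minus cosine similarity), so a 0.8 similarity threshold corresponds to a distance of at most 0.2.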

Making Arabic Text Behave

Working with Arabic is... special. Different diacritics, right-to-left text, multiple valid spellings for the same word. Our preprocessing pipeline handles:

Normalization: Converting variant letter forms and diacritics to a standard form
Context preservation: Keeping track of which mazhab (school of thought) each text comes from
Citation tracking: Every piece of knowledge traces back to its source

We spent weeks just getting Arabic text to embed properly. Turns out, most embedding models are trained on English and get confused by Arabic morphology.
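The three preprocessing steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: only the copy used for embedding and search is normalized, while the original (with diacritics) is kept for display and citation. The exact folding rules (which marks to strip, which alef forms to merge) and the `Passage` fields are illustrative choices, not AIFiqh's actual pipeline. Some systems also fold ta marbuta or alef maqsura; whether that helps depends on the corpus, so it is left out here.

```python
import re
import unicodedata
from dataclasses import dataclass

# Harakat (fathatan .. sukun, U+064B-U+0652) and superscript alef (U+0670)
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # kashida: stretches a word visually, carries no meaning
# Alef with hamza above / hamza below / madda, and alef wasla, folded to bare alef
_ALEF_VARIANTS = str.maketrans({
    "\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627", "\u0671": "\u0627",
})


def normalize_arabic(text: str) -> str:
    """Fold an Arabic string to a canonical form for embedding and search."""
    text = unicodedata.normalize("NFKC", text)  # presentation-form glyphs -> base letters
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = text.translate(_ALEF_VARIANTS)
    return " ".join(text.split())  # collapse runs of whitespace


@dataclass(frozen=True)
class Passage:
    search_text: str  # normalized copy: what gets embedded and searched
    original: str     # untouched text with diacritics: what users see and cite
    madhab: str       # e.g. "Hanafi", "Maliki", "Shafi'i", "Hanbali"
    source: str       # kitab title
    volume: int
    page: int


def make_passage(raw: str, madhab: str, source: str, volume: int, page: int) -> Passage:
    return Passage(normalize_arabic(raw), raw, madhab, source, volume, page)
```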

The UI: Making Knowledge Accessible

The interface is deliberately simple: chat-style interaction, because that's what people expect from AI. But under the hood:

Source attribution for every answer (with Arabic text + translation)
Multiple perspectives when scholars disagree
Related questions to guide learning
Progressive disclosure so beginners don't get overwhelmed

We tried to make it clean, focused, and respectful. No flashy animations, just solid information architecture.
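The "multiple perspectives" and "source attribution" items in the list above imply a grouping step between retrieval and display. Here is a minimal sketch, assuming each retrieved passage already carries its madhab and citation. The `Citation` fields, the school ordering and the display format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Citation(NamedTuple):
    madhab: str
    source: str   # kitab or fatwa body
    volume: int
    page: int
    excerpt: str


# The four Sunni schools are always listed in this fixed order
SCHOOL_ORDER = ["Hanafi", "Maliki", "Shafi'i", "Hanbali"]


def group_by_madhab(citations: Iterable[Citation]) -> Dict[str, List[Citation]]:
    groups: Dict[str, List[Citation]] = defaultdict(list)
    for c in citations:
        groups[c.madhab].append(c)
    # Known schools first; anything else (e.g. contemporary councils) after, alphabetically
    order = [m for m in SCHOOL_ORDER if m in groups]
    order += sorted(m for m in groups if m not in SCHOOL_ORDER)
    return {m: groups[m] for m in order}


def render(groups: Dict[str, List[Citation]]) -> str:
    lines: List[str] = []
    for madhab, cites in groups.items():
        lines.append(f"{madhab}:")
        for c in cites:
            lines.append(f"  - {c.excerpt} ({c.source}, vol. {c.volume}, p. {c.page})")
    return "\n".join(lines)
```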

Testing Against the Big Players: A Reality Check

We ran some side-by-side comparisons during our beta. Here's a real example:

Query: "What's the ruling on cryptocurrency in Islam?"

GPT-4 Response: "Cryptocurrency is generally considered permissible in Islam, though some scholars have concerns about volatility and speculation..."

Claude Response: "Islamic scholars have different views on cryptocurrency. Some consider it halal while others have reservations due to gharar (uncertainty)..."

AIFiqh Response: "Contemporary scholars differ on cryptocurrency. Hanafi perspective (Dr. Monzer Kahf, 2018): Permissible as long as not used for gambling. Maliki perspective (European Council for Fatwa, 2019): Cautious approval with conditions. Concerns raised by Dar al-Ifta Egypt (2017): Excessive gharar and lack of intrinsic value. Source: Mausu'ah Fiqhiyyah Kuwaitiyah, Vol 31, pp. 234-237, plus contemporary fatwa compilation."

See the difference? We're not just giving opinions—we're showing you exactly where these opinions come from and how different schools approach the issue.

Real Talk: What We Got Wrong (And Fixed)

Performance Issues: Our first vector search was slow. Like, really slow. We fixed it with better indexing and parallel processing.
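The post doesn't say exactly how the indexing and parallelism were done. One common pattern, sketched here under that assumption, is to split the corpus into shards with their own indexes, query them concurrently with `asyncio.gather`, and merge the per-shard top-k lists. `parallel_search` and the shard-callable shape are hypothetical.

```python
import asyncio
import heapq
from typing import Awaitable, Callable, List, Sequence, Tuple

Hit = Tuple[float, str]  # (similarity score, passage id)
# A shard search takes (query vector, k) and returns that shard's top-k hits
ShardSearch = Callable[[Sequence[float], int], Awaitable[List[Hit]]]


async def parallel_search(query: Sequence[float],
                          shards: Sequence[ShardSearch],
                          k: int = 5) -> List[Hit]:
    # Fire all shard queries at once and wait for every one to finish
    per_shard = await asyncio.gather(*(search(query, k) for search in shards))
    # Any global top-k hit must be in its own shard's top k, so merging
    # the per-shard lists and keeping the k best gives the exact answer
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))
```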

Cultural Blindspots: Beta users taught us things we never considered. Like how different communities have different transliteration preferences.

Source Weighting: Initially, we treated all texts equally. Bad idea. Classical scholars carry more weight than contemporary opinions, and our algorithm now reflects that.
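One simple way to encode "classical scholars carry more weight" is to multiply each hit's similarity by a per-source-type weight before ranking. The categories and numbers below are illustrative assumptions, since the post doesn't publish the real weights.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative multipliers only: the real weights are not published
SOURCE_WEIGHTS: Dict[str, float] = {
    "classical": 1.0,
    "contemporary_council": 0.85,
    "contemporary_individual": 0.7,
}


@dataclass
class ScoredPassage:
    passage_id: str
    similarity: float  # raw cosine similarity from the vector search
    source_type: str   # one of the SOURCE_WEIGHTS keys


def rerank(hits: List[ScoredPassage],
           weights: Dict[str, float] = SOURCE_WEIGHTS) -> List[ScoredPassage]:
    # Unknown source types get the lowest listed weight, never a boost
    floor = min(weights.values())

    def weighted(hit: ScoredPassage) -> float:
        return hit.similarity * weights.get(hit.source_type, floor)

    return sorted(hits, key=weighted, reverse=True)
```

Multiplying rather than sorting by source type first keeps relevance in play: a highly relevant contemporary fatwa can still outrank a marginal classical passage, which a hard "classical first" sort would not allow.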

Context Window: Arabic sentences can be really long. We had to tune our embedding strategy to capture complete thoughts without losing nuance.
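Sentence-aware chunking is one way to capture complete thoughts: pack whole sentences into chunks up to a size budget, never split mid-sentence, and optionally repeat the last sentence of one chunk at the start of the next. The sketch assumes sentence boundaries can be found from Western and Arabic punctuation (classical texts often lack reliable punctuation, so a real pipeline may need more), and `max_chars=800` and `overlap=1` are arbitrary placeholders, not tuned values.

```python
import re
from typing import List

# Split after . ! ? or the Arabic question mark (U+061F), Arabic semicolon
# (U+061B) or Arabic full stop (U+06D4), when followed by whitespace
_SENTENCE_END = re.compile(r"(?<=[.!?\u061F\u061B\u06D4])\s+")


def chunk_sentences(text: str, max_chars: int = 800, overlap: int = 1) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A sentence is never cut in half; one longer than max_chars on its own
    becomes its own chunk. `overlap` carries the last N sentences of a chunk
    into the next one, when they still fit, so context survives the seams.
    """
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail forward only if tail + next sentence fits the budget
            tail = current[-overlap:] if overlap > 0 else []
            current = tail if len(" ".join(tail + [sentence])) <= max_chars else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```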

The Numbers (Because Everyone Loves Metrics)

Current Performance:

555 active beta users across 50+ countries
33.35s average response time (we're working on this!)
4,562 total queries processed
66 queries today and growing

Data Scale:

500,000+ source texts processed
2.3M+ individual rulings extracted
50+ languages of source material
2MB average storage per user

What's Next (If We Can Keep the Servers Running)

Short term: We're working on voice interface (imagine asking fiqh questions while driving) and better mobile optimization. Assuming our DigitalOcean bill doesn't get too scary.

Medium term: Multi-language support, starting with Malay and Urdu. Also planning a scholar verification system—verified Islamic scholars can review and validate AI responses. This stuff needs proper funding though.

Long term: We want to build a complete Islamic knowledge graph. Imagine connecting every hadith to related Quranic verses and fiqh rulings automatically. Think Google's knowledge graph, but for Islamic scholarship. Obviously, this requires resources that match the ambition.

The Reality Check: We're currently a free tool serving 5,000+ users with bootstrap-level infrastructure. ChatGPT has billions in funding and global data centers. We have passion, authentic sources, and really good coffee.

If any investors are reading this: we've proven the concept works. Now we need to scale it properly. The Muslim community deserves AI that actually understands Islamic knowledge, not generic responses from models trained on Wikipedia.

The Human Element

Here's the thing about AI and religious knowledge: the technology is just a tool. We're not trying to replace scholars or traditional learning. We're trying to make authentic knowledge more accessible.

Every response includes source citations. We encourage users to verify important matters with qualified scholars. When scholars disagree, we show multiple perspectives.

The goal isn't to be the final authority; it's to be a reliable starting point for learning.

Privacy & Ethics (The Boring But Important Stuff)

We don't sell user data. Ever.
Sensitive processing happens client-side when possible
Users can delete their data completely
We're transparent about our sources and limitations

Islamic knowledge is sacred. We treat it and our users with respect.

Behind the Scenes

Building this required more than just coding. We worked with Islamic scholars, studied classical Arabic, learned about different schools of jurisprudence. Our team spent months just understanding the domain before writing any code.

The technical challenges were real, but the cultural responsibility was bigger. Every design decision considered: "Does this serve the Muslim community well?"

Try It Yourself

AIFiqh is live at aifiqh.com. We're still in beta, still learning, still improving.

If you're a developer interested in Islamic tech, hit me up. If you're a scholar who wants to help improve our accuracy, we'd love to collaborate.

And if you're just someone who's ever struggled to find a clear Islamic ruling online, well, that's exactly who we built this for.

Built with ❤️ by Muslims, for Muslims. Technical leadership by yours truly, with an amazing team of engineers and Islamic knowledge experts.

Current status: Beta (which means it's pretty good but we're still fixing things)

Questions? Feedback? Email us at hello@aifiqh.com or, better yet, try the platform and ask it directly.

Top comments (1)

Sir Khaliq • Oct 14

This is a good effort. However, there is a small question regarding the comparison made with the "Big Player" to ensure it is not biased.

"Have the questions directed to the 'Big Player' been given specific instructions, or were they random/general questions?"

It is well known that AI responses depend heavily on prompts, especially since ChatGPT introduced Personalization features that let users choose the type of responses they want.

The distinction should be noted: AI Fiqh is specifically designed for Islamic knowledge, whereas the "Big Player" models are built for broader use, and users can set their own prompts to get the responses they want.