DEV Community

Zainalabdeen


🧑‍⚖️ Building a Saudi Labor Law AI Assistant: Bilingual, Semantic, and Context-Aware

The Saudi Labor Law is a complex and evolving legal framework. For HR teams, employers, and employees, understanding its details (from leave entitlements to termination rules) often means scrolling through dozens of pages, interpreting legal text, and trying to connect articles to real-world cases.

I wanted to change that.
So I built the Saudi Labor Law AI Assistant, a bilingual chatbot that answers legal questions instantly, explains the relevant articles, and analyzes employee-specific scenarios. It is powered by vector search, semantic retrieval, and LLMs.


📌 Why This Project Matters

The challenges were clear:

⚠️ The official English translation of the law is outdated; the Arabic version is the authoritative reference.

📚 Searching manually across legal PDFs is slow and error-prone.

🧠 HR teams need contextual interpretations, not just raw text.

The solution? Combine document parsing, embeddings, vector databases, translation, and LLM reasoning into one end-to-end system that delivers article-backed, trustworthy answers in Arabic or English.


🧠 What the AI Assistant Can Do

Here's what the system offers today:

💬 Ask legal questions in Arabic or English; answers come back in the same language.

🧾 Analyze real employee cases, such as leave eligibility, overtime pay, or termination compensation.

🔍 Retrieve the exact legal articles that support every answer.

🧑‍💼 Integrate employee data (age, salary, service years) into the reasoning process for personalized results.

🌐 Handle bilingual queries with automatic translation and context matching.


🔧 How It Works

The assistant is built on the following NLP and retrieval pipeline:

📄 PDF Parsing – The official Arabic labor law is parsed with PyMuPDF, preserving RTL text and diacritics.

🔎 Structured Splitting – The document is split into parts, chapters, and articles, each tagged with metadata.

🌐 Translation – Each article is translated into English with Helsinki-NLP/opus-mt-ar-en for bilingual support.

📊 Vectorization – Both the Arabic and English texts are embedded with intfloat/multilingual-e5-base and stored in a Qdrant vector database.

🤖 Retrieval + Reasoning – A VectorIndexRetriever fetches the most relevant articles, which are then passed to GPT-4o-mini to produce grounded, human-readable answers.

📈 Hybrid Search Evaluation – After comparing semantic-only and hybrid retrieval on 1,245 queries, hybrid search performed better and is used by default.
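The structured-splitting step can be sketched with the standard library alone. This is a minimal sketch, not the project's code: it assumes the text has already been extracted from the PDF (for example with PyMuPDF's `page.get_text()`) and that every heading starts its own line with `الباب` (Part), `الفصل` (Chapter), or `المادة` (Article). The `Article` class, `split_articles`, and the metadata fields are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed heading patterns: each heading is assumed to sit on its own line.
PART_RE = re.compile(r"^\s*(الباب\s+.+?)\s*$")      # Part
CHAPTER_RE = re.compile(r"^\s*(الفصل\s+.+?)\s*$")   # Chapter
ARTICLE_RE = re.compile(r"^\s*(المادة\s+.+?)\s*$")  # Article


@dataclass
class Article:
    title: str                 # e.g. "المادة التاسعة بعد المائة"
    part: str | None           # enclosing Part heading, if any
    chapter: str | None        # enclosing Chapter heading, if any
    lines: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines)


def split_articles(raw_text: str) -> list[Article]:
    """Walk the extracted text line by line, tracking the current Part and Chapter."""
    articles: list[Article] = []
    part: str | None = None
    chapter: str | None = None
    current: Article | None = None
    for line in raw_text.splitlines():
        if m := PART_RE.match(line):
            part, chapter, current = m.group(1), None, None
        elif m := CHAPTER_RE.match(line):
            chapter, current = m.group(1), None
        elif m := ARTICLE_RE.match(line):
            current = Article(title=m.group(1), part=part, chapter=chapter)
            articles.append(current)
        elif current is not None and line.strip():
            current.lines.append(line.strip())  # body text of the open article
    return articles
```

Each `Article` keeps the Part and Chapter it falls under, which is the metadata that Part → Chapter → Article source tracing needs, and it gives translation and embedding a natural one-article-at-a-time unit. A body line that happens to begin with `المادة` would be misread as a heading, so real PDF text may need a stricter pattern.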

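The post does not say how dense and keyword results are combined in hybrid mode. One common choice is reciprocal rank fusion (RRF), sketched below under that assumption. The dense scores stand in for cosine similarity between multilingual-e5-base embeddings (E5 models expect inputs prefixed with `query: ` or `passage: `), the keyword score is a plain term-overlap count rather than BM25, and `k=60` is the constant often used with RRF. All function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def e5_inputs(query: str, passages: list[str]) -> tuple[str, list[str]]:
    """Apply the 'query: ' / 'passage: ' prefixes that E5 embedding models expect."""
    return f"query: {query}", [f"passage: {p}" for p in passages]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_ranking(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[str]:
    """Article ids ordered by cosine similarity to the query embedding."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def keyword_ranking(query: str, docs: dict[str, str]) -> list[str]:
    """Article ids ordered by shared-term count (a simple stand-in for BM25 or sparse vectors)."""
    q_terms = Counter(query.lower().split())

    def overlap(doc_id: str) -> int:
        d_terms = Counter(docs[doc_id].lower().split())
        return sum(min(count, d_terms[term]) for term, count in q_terms.items())

    return sorted(docs, key=overlap, reverse=True)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Score each id by sum(1 / (k + rank)) over every ranking it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Passing both lists for the same query as `reciprocal_rank_fusion([dense, keyword])` rewards articles that rank consistently well in both, so one that is second in each list can outrank one that tops only a single list. RRF works on ranks alone, which avoids having to calibrate cosine scores against keyword scores.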

🧑‍💼 Context-Aware Legal Reasoning

One of the most powerful features is employee-specific reasoning.
For example:

"Is this employee eligible for 30 days of annual leave if he has worked for 6 years?"

The chatbot uses employee metadata (service years, salary, leave days, etc.) to reason about the law in context, delivering precise, actionable answers that always cite the original legal article.
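A minimal sketch of how employee metadata and retrieved articles might be combined into a single grounded prompt for GPT-4o-mini. The `Employee` fields mirror the metadata the post mentions; the field names, the prompt wording, and `build_prompt` itself are hypothetical, and the actual LLM call is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Employee:
    # Hypothetical fields mirroring the metadata the post mentions.
    age: int
    monthly_salary: float
    service_years: float
    leave_days_taken: int


def build_prompt(question: str, employee: Employee | None,
                 articles: list[tuple[str, str]], language: str) -> str:
    """Combine retrieved (reference, text) articles, optional employee data, and the question."""
    sources = "\n\n".join(f"[{ref}]\n{text}" for ref, text in articles)
    if employee is None:
        profile = "No employee data provided."
    else:
        profile = (
            f"Employee: age {employee.age}, monthly salary {employee.monthly_salary:.2f}, "
            f"{employee.service_years} years of service, "
            f"{employee.leave_days_taken} leave days taken."
        )
    answer_language = "Arabic" if language == "ar" else "English"
    return (
        "You are an assistant for the Saudi Labor Law. Answer only from the articles below, "
        "cite the article you rely on, and say so if they do not cover the question.\n\n"
        f"{sources}\n\n{profile}\n\n"
        f"Question: {question}\nAnswer in {answer_language}."
    )
```

Putting the employee profile and the article text side by side lets the model check a question like the 6-years example against the article's own wording rather than answering from memory.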


🖥️ Streamlit Interface

The frontend is built with Streamlit to make the experience intuitive and user-friendly:

🌐 Auto-detect Arabic or English queries.

📄 Optional employee data input.

🔍 Expandable references with similarity scores.

📚 Source tracing from Part → Chapter → Article.
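The auto-detection and source-tracing items above can be sketched without any NLP library. This assumes a simple heuristic that may differ from the app's actual method: a query counts as Arabic when most of its letters fall in the Arabic Unicode block U+0600–U+06FF. `detect_language` and `format_reference` are hypothetical helpers; in Streamlit, the formatted reference could serve as the label of an `st.expander` that holds the article text.

```python
def detect_language(text: str) -> str:
    """Return 'ar' if most letters are in the Arabic block, else 'en' (heuristic)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "en"  # default when there are no letters to judge by
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return "ar" if arabic / len(letters) > 0.5 else "en"


def format_reference(part: str, chapter: str, article: str, score: float) -> str:
    """Render a Part → Chapter → Article trail with its similarity score."""
    return f"{part} → {chapter} → {article} (similarity {score:.2f})"
```

Counting letters rather than characters keeps digits, punctuation, and the Arabic question mark from skewing the ratio, so a mixed query such as an English question quoting one Arabic term still resolves to English.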


🚀 Example in Action

Arabic Example:

👤: ما هي مدة الإجازة السنوية بعد خمس سنوات من الخدمة؟ (What is the length of annual leave after five years of service?)
🤖: يستحق العامل ثلاثين يوماً من الإجازة السنوية… (The worker is entitled to thirty days of annual leave…)
📖: استنادًا إلى المادة التاسعة بعد المائة (Based on Article 109)

English Example:

👤: What are the sick leave entitlements for an employee?
🤖: The employee is entitled to paid sick leave for a specific duration…
📖: Based on Article 117 – Chapter Four


🧭 What's Next

The project is just getting started. Planned enhancements include:

📑 PDF export of Q&A with references

🧮 HR calculators (end-of-service, overtime, vacation accrual)

🔊 Arabic voice interaction

📊 HR analytics dashboard

🧰 Tech Stack

| Component | Technology |
|---|---|
| Frontend | Streamlit |
| LLM | GPT-4o-mini |
| Embeddings | intfloat/multilingual-e5-base |
| Vector DB | Qdrant |
| Retrieval | LlamaIndex |
| Translation | Helsinki-NLP/opus-mt-ar-en |
| Parsing | PyMuPDF (fitz) |

💡 The Saudi Labor Law AI Assistant is open source and licensed under the MIT License. It's built to make labor law understandable, accessible, and actionable for HR teams, companies, and employees across Saudi Arabia.

🔗 Explore the Project

👉 GitHub Repository
I built this project as my final project for the LLM Zoomcamp course.
