How I Built 법마디(Lawmadi) OS — An AI Legal OS with 60 Specialized Agents
TL;DR
I built 법마디(Lawmadi) OS — an AI-powered legal operating system for Korean law with 60 domain-specialized agents. Every statute citation is verified against live government databases in real-time. If verification fails, the system refuses to answer. Here's the architecture and what I learned.
The Problem
In Korea, when you face a legal issue, your options are:
- Expensive lawyers — consultations start at ₩100,000+ (~$75)
- Unreliable internet searches — fragmented, outdated information
- AI chatbots that hallucinate — confidently citing laws that don't exist
That last one is the most dangerous. When ChatGPT tells you "Article 52 of the Labor Standards Act protects you," but that article actually says something completely different, you could make serious mistakes with real legal consequences.
The Solution: Verify Everything
법마디 OS takes a different approach: every statute citation is verified against Korea's official legislative database in real-time. If verification fails, the system refuses to answer rather than risk providing wrong legal information.
Architecture Overview
User Query (Korean or English)
|
+- Stage 0: NLU -> Leader Selection (1 of 60 agents)
|   +- Layer 1: Regex intent patterns (~70% of queries)
|   +- Layer 2: Domain keyword matching (~20%)
|   +- Layer 3: Gemini classification (fallback ~10%)
|
+- Stage 1: RAG Retrieval
|   +- Vertex AI Search (14,601 legal documents)
|
+- Stage 3: LLM Analysis
|   +- Gemini 2.5 Flash (domain-tuned prompts)
|
+- Stage 4: DRF Verification
    +- law.go.kr API (10 government data sources)
    +- Fail-Closed: reject if unverified
Why 60 Agents?
Instead of one generalist legal AI, 법마디 OS has 60 domain-specialized agents — one for each area of Korean law:
- 담우 (Labor Law) — unfair dismissal, unpaid wages, workplace harassment
- 온유 (Lease/Housing) — tenant rights, deposit recovery, jeonse fraud
- 산들 (Divorce/Family) — custody, property division, domestic violence
- 하늬 (Traffic) — accident liability, insurance claims, DUI
- 무결 (Criminal) — fraud complaints, defamation, prosecution process
- And 55 more specialists...
Each agent has its own system prompt tuned for that specific legal domain. When you ask about unfair dismissal, the NLU engine routes your question to the labor law agent — not a generalist that might confuse labor law with contract law.
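To make the "one agent per domain" idea concrete, here is a minimal sketch of what an agent registry could look like. The `Agent` class, the domain keys, and the prompt strings are all hypothetical placeholders for illustration, not the production prompts or data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str           # agent persona name from the article, e.g. "담우"
    domain: str         # hypothetical domain key used by the router
    system_prompt: str  # placeholder text, not the production prompt


# Three of the 60 agents; the prompts below are illustrative only.
AGENTS = {
    agent.domain: agent
    for agent in [
        Agent("담우", "labor", "You specialize in Korean labor law."),
        Agent("온유", "lease", "You specialize in Korean lease and housing law."),
        Agent("산들", "divorce", "You specialize in Korean divorce and family law."),
    ]
}


def get_agent(domain: str) -> Agent:
    """Return the specialist for a domain; raises KeyError if none is registered."""
    return AGENTS[domain]
```

With a structure like this, the router only has to produce a domain key, and `get_agent("labor")` hands back the labor specialist along with its prompt.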
3-Layer NLU Routing
The routing system uses three layers with priority ordering:
Layer 1 — Regex NLU catches ~70% of queries using Korean and English legal intent patterns:
# Simplified example
patterns = {
    "labor": [r"해고|퇴직금|임금체불|unfair.dismissal|unpaid.wages"],
    "lease": [r"전세|보증금|임대차|tenant|deposit|lease"],
    "divorce": [r"이혼|양육권|위자료|divorce|custody|alimony"],
}
Layer 2 — Keyword matching handles ~20% using domain-specific vocabularies.
Layer 3 — Gemini classification serves as the fallback for ambiguous queries.
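This layered approach is much faster and cheaper than pure LLM routing, because roughly 90% of queries never reach the model. It passes all 264 NLU routing tests in our suite (282/282 overall, including the 18 verifier tests).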
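Here is a rough sketch of how the three layers could be chained so that the first layer to match wins. The regexes are the simplified ones from the example above. The `KEYWORDS` vocabularies, the `route` function, and the `llm_classify` callable (a stand-in for the Gemini call) are assumptions made for illustration.

```python
import re
from typing import Callable, Tuple

# Layer 1: regex intent patterns (the simplified ones shown earlier).
REGEX_PATTERNS = {
    "labor": re.compile(r"해고|퇴직금|임금체불|unfair.dismissal|unpaid.wages", re.IGNORECASE),
    "lease": re.compile(r"전세|보증금|임대차|tenant|deposit|lease", re.IGNORECASE),
    "divorce": re.compile(r"이혼|양육권|위자료|divorce|custody|alimony", re.IGNORECASE),
}

# Layer 2: hypothetical domain vocabularies; real ones would be much larger.
KEYWORDS = {
    "labor": {"overtime", "severance", "employer"},
    "lease": {"landlord", "rent", "jeonse"},
    "divorce": {"spouse", "marriage"},
}


def route(query: str, llm_classify: Callable[[str], str]) -> Tuple[str, str]:
    """Return (domain, layer) from the first layer that produces a match."""
    # Layer 1: cheap regex scan.
    for domain, pattern in REGEX_PATTERNS.items():
        if pattern.search(query):
            return domain, "regex"

    # Layer 2: choose the domain with the largest keyword overlap, if any.
    tokens = set(re.findall(r"\w+", query.lower()))
    best = max(KEYWORDS, key=lambda d: len(KEYWORDS[d] & tokens))
    if KEYWORDS[best] & tokens:
        return best, "keyword"

    # Layer 3: only now pay for an LLM call.
    return llm_classify(query), "llm"
```

Returning the layer name alongside the domain makes it easy to log how often each layer fires, which is how a split like 70/20/10 can be measured.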
Real-Time Verification (Stage 4)
This is the key differentiator. After Gemini generates a response, Stage 4:
- Extracts every statute citation from the response
- Queries Korea's official legislative API (법제처 DRF)
- Verifies the law exists, the article number is correct, and the content matches
- Rejects the response if verification fails (fail-closed)
We cross-reference against 10 government data sources to ensure accuracy.
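The extract, look up, reject flow can be sketched roughly as follows. This is not the production verifier: `CITATION_RE` is a deliberately simplified pattern, `fetch_article` is a hypothetical callable standing in for the 법제처 DRF lookup, and the content-matching step is left out.

```python
import re
from typing import Callable, List, Optional, Tuple

# Simplified pattern for citations such as "근로기준법 제23조"; real statute
# names and article formats are more varied than this.
CITATION_RE = re.compile(r"([가-힣]+법)\s*제(\d+)조")


class UnverifiedCitationError(Exception):
    """Raised when a cited statute cannot be confirmed."""


def extract_citations(text: str) -> List[Tuple[str, int]]:
    """Return (law name, article number) pairs found in the text."""
    return [(law, int(article)) for law, article in CITATION_RE.findall(text)]


def verify_response(
    text: str,
    fetch_article: Callable[[str, int], Optional[str]],
) -> str:
    """Return the text unchanged only if every citation is confirmed."""
    for law, article in extract_citations(text):
        try:
            official_text = fetch_article(law, article)
        except Exception as exc:
            # Fail closed: a lookup error counts as a failed verification.
            raise UnverifiedCitationError(f"{law} 제{article}조: lookup failed") from exc
        if official_text is None:
            raise UnverifiedCitationError(f"{law} 제{article}js: not found".replace("js", "조"))
    return text
```

The important design choice is that an API outage is treated the same as a missing article: the caller gets an exception, never an unverified answer.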
Empathy-First Response Framework
Legal questions come from people in distress. Every response follows a 5-stage framework:
- Emotional acknowledgment — validate the user's feelings
- Situation diagnosis — analyze the legal situation
- Action roadmap — specific steps with costs and timelines
- Safety net — legal aid resources, hotlines, free consultation options
- Supportive closing — encouragement and next steps
Tech Stack
| Component | Technology |
|---|---|
| Backend | Python / FastAPI |
| LLM | Google Gemini 2.5 Flash |
| RAG | Vertex AI Search (14,601 docs) |
| Verification | 법제처 DRF API (10 sources) |
| Database | Cloud SQL PostgreSQL 17 |
| Hosting | GCP Cloud Run + Firebase |
| Billing | Paddle (credit-based) |
| CI/CD | GitHub Actions (5-stage pipeline) |
| Tests | 282 automated tests (264 NLU + 18 verifier) |
Results (March 2026)
| Metric | Value |
|---|---|
| Success rate | 100% |
| Error rate | 0.0% |
| Avg response time | ~38s |
| Test suite | 282/282 passing |
| Legal guides | 15 (SEO optimized) |
| Languages | Korean + English |
Most Popular Legal Domains
- Labor Law (담우) — unfair dismissal, unpaid wages
- Housing/Lease (온유) — deposit disputes, tenant rights
- Criminal (무결) — fraud complaints, defamation
- Divorce/Family (산들) — custody, property division
- Traffic (하늬) — accident liability, DUI
Challenges & Lessons Learned
1. Latency (~38s average)
The main bottleneck is Gemini generation. Streaming helps UX but doesn't reduce total time. We're exploring parallel RAG + DRF prefetching to cut latency.
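As a sketch of the parallel prefetching idea, `asyncio.gather` can keep both network calls in flight at once. The two coroutines below are hypothetical stand-ins that just sleep to simulate latency; they are not the real Vertex AI Search or DRF clients.

```python
import asyncio
import time


async def retrieve_documents(query: str) -> list:
    """Stand-in for the RAG call; sleeps to simulate network latency."""
    await asyncio.sleep(0.2)
    return [f"doc for {query}"]


async def prefetch_statutes(query: str) -> dict:
    """Stand-in for prefetching statutes the answer is likely to cite."""
    await asyncio.sleep(0.2)
    return {"근로기준법 제23조": "..."}


async def prepare_context(query: str):
    # Both calls run concurrently, so the wait is roughly max(), not sum().
    docs, statutes = await asyncio.gather(
        retrieve_documents(query),
        prefetch_statutes(query),
    )
    return docs, statutes


if __name__ == "__main__":
    start = time.perf_counter()
    docs, statutes = asyncio.run(prepare_context("부당해고"))
    print(f"context ready in {time.perf_counter() - start:.2f}s")
```

Since generation dominates the ~38s, the saving from this is capped by whatever share of the time goes to retrieval and verification.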
2. English Statute Citation
Korean citation matching works well, but English accuracy is lower. Korean law names and article structures don't map cleanly to English translations. This is an active area of improvement.
3. Monitoring Bot Traffic
We discovered that our own health monitoring workflow was generating 45% of all traffic — running identical test queries every 6 hours. After fixing this, our real user metrics became much cleaner. Lesson: always filter admin/bot traffic from analytics.
Security: Fighting Bot Abuse
Within the first week, we detected automated scraping bots. Our response:
- 3-layer device fingerprinting (IP + canvas fingerprint + UUID token)
- DB-persistent IP blacklist (survives redeployments)
- Rate limiting with automatic blacklisting (20 HTTP 429 responses within 60 seconds trigger a 1-hour ban)
- Admin query filtering — bot traffic excluded from all dashboards
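The automatic blacklisting rule can be sketched as a sliding-window counter. This version is in-memory only and the class and method names are made up for illustration; the thresholds come from the list above, and the real system persists bans in the database.

```python
import time
from collections import defaultdict, deque
from typing import Optional

WINDOW_SECONDS = 60      # look-back window for counting 429 responses
MAX_429_IN_WINDOW = 20   # threshold from the article
BAN_SECONDS = 3600       # 1-hour ban


class AutoBlacklist:
    """In-memory sketch; a production version would persist bans."""

    def __init__(self) -> None:
        self._hits = defaultdict(deque)  # ip -> timestamps of recent 429s
        self._banned_until = {}          # ip -> unix time when the ban ends

    def record_429(self, ip: str, now: Optional[float] = None) -> bool:
        """Record one 429 for this IP; return True if it triggers a ban."""
        now = time.time() if now is None else now
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= MAX_429_IN_WINDOW:
            self._banned_until[ip] = now + BAN_SECONDS
            hits.clear()
            return True
        return False

    def is_banned(self, ip: str, now: Optional[float] = None) -> bool:
        """Return True while the IP's ban has not yet expired."""
        now = time.time() if now is None else now
        return self._banned_until.get(ip, 0.0) > now
```

Passing `now` explicitly keeps the class easy to test without sleeping or patching the clock.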
Pricing
- Free: 2 queries/day, no account needed
- Starter: 20 queries — $1.50
- Standard: 100 queries — $4.99
- Pro: 300 queries — $9.99
Credit-based (no subscription) via Paddle — lower friction for users who just need a few answers.
Try It
Live: https://lawmadi.com (Korean default, English at /en)
I'd love feedback on:
- The multi-agent routing approach — is 60 agents overkill?
- The fail-closed verification — would you trust a legal AI more knowing it refuses to answer when unsure?
- Ideas for reducing latency
Thanks for reading!