I Built an AI Legal OS with 60 Specialized Agents and Real-time Statute Verification

#ai #legaltech #python #webdev

When people in Korea face legal issues, they have three bad options: expensive lawyers ($75+ per session), unreliable internet searches, or AI chatbots that hallucinate laws that don't exist.

I built Lawmadi OS to fix this — an AI legal operating system with 60 domain-specialized agents that verify every answer against live government databases.

Live: lawmadi.com

The Problem with Legal AI

Ask ChatGPT about Korean labor law, and it will confidently cite "Article 27 of the Labor Standards Act" — except that article might not say what it claims, or might not exist at all. In the legal domain, hallucination isn't just annoying — it's dangerous. People make life-changing decisions based on legal information.

How Lawmadi OS Works

3-Layer NLU Routing

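Instead of sending every query through an expensive LLM classification step, we use cascading routing: cheaper layers try to route the query first, and only the queries they cannot resolve fall through to the LLM.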

This gives us low latency for most queries, high accuracy (264/264 test cases passing), and cost efficiency.
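
The three layers are not named in this post, so the following is only a sketch of the cascading idea under assumed layers: an exact keyword lookup, then regex patterns, then an LLM classifier as the last resort. The keyword table, the patterns, and every function name here are hypothetical; only the agent IDs (L09, L08, L01) come from the agent list in this article.

```python
import re
from typing import Callable, Optional

# Hypothetical layer 1: exact keyword lookup (cheapest).
KEYWORDS = {"unfair dismissal": "L09", "unpaid wages": "L09", "deposit": "L08"}

# Hypothetical layer 2: regex patterns for looser phrasing.
PATTERNS = [
    (re.compile(r"\b(fired|laid off)\b", re.I), "L09"),
    (re.compile(r"\b(landlord|tenant)\b", re.I), "L08"),
]


def keyword_layer(query: str) -> Optional[str]:
    q = query.lower()
    for keyword, agent in KEYWORDS.items():
        if keyword in q:
            return agent
    return None


def pattern_layer(query: str) -> Optional[str]:
    for pattern, agent in PATTERNS:
        if pattern.search(query):
            return agent
    return None


def route(query: str, llm_classify: Callable[[str], str]) -> tuple[str, str]:
    """Return (agent_id, layer_name); call the LLM only if cheaper layers miss."""
    for name, layer in (("keyword", keyword_layer), ("pattern", pattern_layer)):
        agent = layer(query)
        if agent is not None:
            return agent, name
    return llm_classify(query), "llm"
```

Passing the LLM classifier in as a callable keeps the cheap layers testable without any model call, and the returned layer name makes it easy to measure how many queries actually reach the expensive path.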

60 Specialized Agents

Each of the 60 agents specializes in a specific area of Korean law:

L09 담우 — Labor Law (unfair dismissal, unpaid wages)
L08 온유 — Lease/Rent Law (전세 deposits, tenant rights)
L03 담슬 — Divorce & Family Law
L10 결휘 — Traffic Accidents
L01 휘율 — Criminal Law
And 55 more covering tax, IP, immigration, inheritance, medical, military, environment, data privacy, startups, etc.

Why 60 agents instead of one generalist? Specialization matters. Each agent has its own domain-tuned prompts, knowledge of the relevant statutes, and response patterns optimized for its area. It's like having a law firm with 60 specialists.

4-Stage Verification Pipeline

This is the core architecture:

Stage 4 is what makes Lawmadi OS different. After Gemini generates a response, we:

Extract all statute citations from the response
Query Korea's official legislative database (법제처, law.go.kr) via DRF API
Verify — Does the law exist? Does the article number exist? Is the content accurate?
Score — Generate a 0-100 verification score
Decide — If score is below threshold, reject the entire response
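
The extract, verify, score, and decide steps above can be sketched roughly as follows. This is not the production pipeline: the citation regex, the scoring formula (share of citations that exist), the threshold of 70, and the in-memory `FAKE_STATUTE_DB` that stands in for the 법제처 lookup are all assumptions made for illustration. The sketch only checks that a law and article exist; it does not check whether the quoted content is accurate.

```python
import re
from dataclasses import dataclass

# Stand-in for the official database: law name -> existing article numbers.
# A real system would query law.go.kr; this dict is purely illustrative.
FAKE_STATUTE_DB = {"Labor Standards Act": {23, 26, 27, 28}}

# Assumed citation shape: "Article 23 of the Labor Standards Act".
CITATION_RE = re.compile(r"Article (\d+) of the ([A-Z][A-Za-z ]+? Act)")


@dataclass
class Verdict:
    score: int        # 0-100
    accepted: bool    # False means the whole response is rejected
    unverified: list  # citations that could not be confirmed


def extract_citations(text: str) -> list:
    return [(law, int(num)) for num, law in CITATION_RE.findall(text)]


def verify_response(text: str, threshold: int = 70) -> Verdict:
    citations = extract_citations(text)
    if not citations:
        # Assumption of this sketch: nothing to verify counts as unverified.
        return Verdict(score=0, accepted=False, unverified=[])
    bad = [(law, n) for law, n in citations
           if n not in FAKE_STATUTE_DB.get(law, set())]
    score = round(100 * (len(citations) - len(bad)) / len(citations))
    return Verdict(score, score >= threshold, bad)
```

With this formula, a response that cites one real article and one nonexistent article scores 50 and is rejected as a whole, which matches the reject-the-entire-response rule described above.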

We cross-reference against 10 government data sources:

Statutes (법령)
Enforcement Decrees (시행령)
Enforcement Rules (시행규칙)
Court Precedents (판례)
Administrative Rules (행정규칙)
And 5 more

Fail-Closed Design

If the verification API is down, the system doesn't fall back to unverified responses. Instead:

Circuit breaker trips after consecutive failures
System enters fail-closed mode
All responses are held until verification is available
We'd rather give no answer than an unverified one
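
A minimal sketch of this fail-closed behavior, assuming a simple consecutive-failure counter and a fixed cooldown. The class name, the default limits, and the exception type are hypothetical, and `verify` is assumed to raise only when the verification service itself is unreachable.

```python
import time


class VerificationUnavailable(Exception):
    """Raised when no verified answer can be given."""


class FailClosedBreaker:
    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed (healthy)

    def call(self, verify, response: str) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                # Breaker is open: withhold the response without calling verify.
                raise VerificationUnavailable("verification is down")
            self.opened_at = None  # cooldown elapsed: allow one trial call
        try:
            verify(response)
        except Exception as exc:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            # Fail closed: never return the unverified response.
            raise VerificationUnavailable("verification failed") from exc
        self.failures = 0
        return response
```

The key property is that every path through `call` either returns a response that passed `verify` or raises; there is no branch that returns unverified text.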

The 5-Stage Empathy Framework

Legal issues are stressful. Every response follows this structure:

Emotional acknowledgment — "This situation must be frustrating..."
Situation diagnosis — Clear analysis of the legal issue
Action roadmap — Specific steps with deadlines
Safety net — Legal aid resources, hotlines, government services
Supportive closing — Encouragement and next steps
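
One way to enforce this structure in code is to model the five stages as fields and refuse to render a response when any stage is empty. This is an illustrative sketch only; the class and field names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class EmpathyResponse:
    acknowledgment: str  # 1. emotional acknowledgment
    diagnosis: str       # 2. situation diagnosis
    roadmap: str         # 3. action roadmap
    safety_net: str      # 4. legal aid resources and hotlines
    closing: str         # 5. supportive closing

    def render(self) -> str:
        # fields() preserves declaration order, so stages render in sequence.
        missing = [f.name for f in fields(self)
                   if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError(f"missing stages: {', '.join(missing)}")
        return "\n\n".join(getattr(self, f.name).strip() for f in fields(self))
```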

Results After 1 Week

Metric	Value
Unique Visitors	114
Queries Processed	481
Success Rate	99.6%
Avg Verification Score	84.7/100
Korean Citation Accuracy	82.5%
English Citation Accuracy	25.6% (improving)
Tests Passing	282/282
Avg Response Time	~40s

Most popular domains: Labor law (90 queries), Housing/Lease (83), Divorce (50), Traffic accidents (48)

Tech Stack

Component	Technology
Backend	FastAPI 0.128.0 + Python 3.10+
LLM	Google Gemini 2.5 Flash
RAG	Vertex AI Search (14,601 docs)
Verification	법제처 DRF API (10 SSOT sources)
Database	Cloud SQL PostgreSQL 17
Hosting	GCP Cloud Run + Firebase
Billing	Paddle (credit-based)
CI/CD	GitHub Actions (5-stage pipeline)
Auth	JWT RBAC + Email OTP
Anti-abuse	IP + Canvas Fingerprint + Device Token

Pricing

Free: 2 queries/day (no account needed)
Starter: 20 queries — .50
Standard: 100 queries — .99
Pro: 300 queries — .99

Credit-based, no subscription. Powered by Paddle.

Challenges & Next Steps

Latency — ~40s avg is too slow. Gemini generation (~30s) is the bottleneck. Exploring parallel RAG + prefetch.
English citations — 25.6% accuracy vs 82.5% Korean. Standardized English translations of Korean law names are inconsistent.
Scale — 60 system prompts to maintain. Considering automated prompt generation.
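
For the latency item, the parallel RAG + prefetch idea could look something like the sketch below, which runs retrieval and a statute prefetch concurrently with `asyncio.gather` instead of one after the other. Both coroutines are hypothetical stand-ins that just sleep. This would only shorten the steps before generation; it does not reduce the Gemini generation time itself.

```python
import asyncio
import time


async def rag_search(query: str) -> list:
    await asyncio.sleep(0.2)  # stand-in for a retrieval call
    return [f"doc for {query}"]


async def prefetch_statutes(query: str) -> list:
    await asyncio.sleep(0.2)  # stand-in for warming a statute cache
    return ["Labor Standards Act"]


async def gather_context(query: str) -> dict:
    # Both awaits overlap, so total wait is roughly the slower of the two.
    docs, statutes = await asyncio.gather(
        rag_search(query), prefetch_statutes(query)
    )
    return {"docs": docs, "statutes": statutes}


def timed_run(query: str) -> tuple:
    start = time.perf_counter()
    context = asyncio.run(gather_context(query))
    return context, time.perf_counter() - start
```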