peter choe

Posted on Mar 18 • Originally published at lawmadi.com

How I Built 법마디(Lawmadi) OS — An AI Legal OS with 60 Specialized Agents and Real-Time Statute Verification

#ai #legaltech #python #architecture

TL;DR

I built 법마디(Lawmadi) OS — an AI-powered legal operating system for Korean law with 60 domain-specialized agents. Every statute citation is verified against live government databases in real-time. If verification fails, the system refuses to answer. Here's the architecture and what I learned.

The Problem

In Korea, when you face a legal issue, your options are:

Expensive lawyers — consultations start at ₩100,000+ (~$75)
Unreliable internet searches — fragmented, outdated information
AI chatbots that hallucinate — confidently citing laws that don't exist

That last one is the most dangerous. If ChatGPT tells you "Article 52 of the Labor Standards Act protects you" when that article actually says something completely different, you could make serious mistakes with real legal consequences.

The Solution: Verify Everything

법마디 OS takes a different approach: every statute citation is verified against Korea's official legislative database in real-time. If verification fails, the system refuses to answer rather than risk providing wrong legal information.

Architecture Overview

User Query (Korean or English)
  |
  +- Stage 0: NLU -> Leader Selection (1 of 60 agents)
  |   +- Layer 1: Regex intent patterns (~70% of queries)
  |   +- Layer 2: Domain keyword matching (~20%)
  |   +- Layer 3: Gemini classification (fallback ~10%)
  |
  +- Stage 1: RAG Retrieval
  |   +- Vertex AI Search (14,601 legal documents)
  |
  +- Stage 3: LLM Analysis
  |   +- Gemini 2.5 Flash (domain-tuned prompts)
  |
  +- Stage 4: DRF Verification
      +- law.go.kr API (10 government data sources)
      +- Fail-Closed: reject if unverified

Why 60 Agents?

Instead of one generalist legal AI, 법마디 OS has 60 domain-specialized agents — one for each area of Korean law:

담우 (Labor Law) — unfair dismissal, unpaid wages, workplace harassment
온유 (Lease/Housing) — tenant rights, deposit recovery, jeonse fraud
산들 (Divorce/Family) — custody, property division, domestic violence
하늬 (Traffic) — accident liability, insurance claims, DUI
무결 (Criminal) — fraud complaints, defamation, prosecution process
And 55 more specialists...

Each agent has its own system prompt tuned for that specific legal domain. When you ask about unfair dismissal, the NLU engine routes your question to the labor law agent — not a generalist that might confuse labor law with contract law.
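As a rough illustration of the "one agent, one system prompt" idea, here is a minimal registry sketch. The `Agent` dataclass, the `select_leader` helper, the domain keys, and the prompt strings are all hypothetical placeholders, not the production code; only three of the 60 agents are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str           # agent persona name, e.g. "담우"
    domain: str         # routing key produced by the NLU stage (assumed naming)
    system_prompt: str  # domain-tuned instructions (placeholder text here)


# Illustrative subset; the real prompts are not published in this post.
AGENTS = {
    agent.domain: agent
    for agent in [
        Agent("담우", "labor", "You answer questions about Korean labor law."),
        Agent("온유", "lease", "You answer questions about leases and housing."),
        Agent("산들", "divorce", "You answer questions about divorce and family law."),
    ]
}


def select_leader(domain: str) -> Agent:
    """Return the agent for a domain, failing loudly if none is registered."""
    try:
        return AGENTS[domain]
    except KeyError:
        raise LookupError(f"no agent registered for domain {domain!r}") from None


print(select_leader("labor").name)
```

Raising on an unknown domain, instead of quietly falling back to a generalist, is one way to keep routing mistakes visible.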

3-Layer NLU Routing

The routing system uses three layers with priority ordering:

Layer 1 — Regex NLU catches ~70% of queries using Korean and English legal intent patterns:

# Simplified example
patterns = {
    "labor": [r"해고|퇴직금|임금체불|unfair.dismissal|unpaid.wages"],
    "lease": [r"전세|보증금|임대차|tenant|deposit|lease"],
    "divorce": [r"이혼|양육권|위자료|divorce|custody|alimony"],
}

Layer 2 — Keyword matching handles ~20% using domain-specific vocabularies.

Layer 3 — Gemini classification serves as the fallback for ambiguous queries.

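This layered approach is much faster and cheaper than pure LLM routing, because only the ~10% of queries that reach Layer 3 need an LLM call. It passes all 264 NLU routing tests (282/282 across the full test suite, which also includes 18 verifier tests).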
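The three layers above can be chained roughly like this. This is a sketch under assumptions: the `KEYWORDS` vocabularies are invented for illustration, and `llm_classify` is a caller-supplied stand-in for the Gemini call, so no real API is invoked here.

```python
import re
from typing import Callable, Tuple

# Layer 1: the simplified regex patterns from the example above, precompiled.
REGEX_PATTERNS = {
    "labor": re.compile(r"해고|퇴직금|임금체불|unfair.dismissal|unpaid.wages", re.IGNORECASE),
    "lease": re.compile(r"전세|보증금|임대차|tenant|deposit|lease", re.IGNORECASE),
    "divorce": re.compile(r"이혼|양육권|위자료|divorce|custody|alimony", re.IGNORECASE),
}

# Layer 2: hypothetical domain vocabularies (not the real ones).
KEYWORDS = {
    "labor": {"overtime", "야근", "severance"},
    "lease": {"landlord", "집주인", "월세"},
    "divorce": {"spouse", "배우자"},
}


def route(query: str, llm_classify: Callable[[str], str]) -> Tuple[str, int]:
    """Return (domain, layer) where layer is the layer that decided."""
    # Layer 1: first regex that matches wins.
    for domain, pattern in REGEX_PATTERNS.items():
        if pattern.search(query):
            return domain, 1

    # Layer 2: pick the domain with the most keyword hits, if any.
    lowered = query.lower()
    scores = {d: sum(kw in lowered for kw in kws) for d, kws in KEYWORDS.items()}
    best = max(scores, key=lambda d: scores[d])
    if scores[best] > 0:
        return best, 2

    # Layer 3: only now pay for an LLM call.
    return llm_classify(query), 3


def fake_llm(query: str) -> str:
    # Stub standing in for the Gemini classifier.
    return "civil"


print(route("회사에서 해고 당했어요", fake_llm))
print(route("My landlord won't fix the heater", fake_llm))
print(route("Can I sue my neighbor?", fake_llm))
```

Returning the deciding layer alongside the domain makes it easy to measure what share of traffic each layer actually handles.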

Real-Time Verification (Stage 4)

This is the key differentiator. After Gemini generates a response, Stage 4:

Extracts every statute citation from the response
Queries Korea's official legislative API (법제처 DRF)
Verifies the law exists, the article number is correct, and the content matches
Rejects the response if verification fails (fail-closed)

We cross-reference against 10 government data sources to ensure accuracy.
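The four steps above can be sketched as follows. Everything here is an assumption made for illustration: `CITATION_RE` only handles simple Korean citations of the form "법 이름 + 제N조", `lookup` is an injected stand-in for the 법제처 DRF query (the real API is not called), and the content-match step is left out because this post does not describe how it is implemented.

```python
import re
from typing import Callable, List, Optional, Tuple

# Simplified pattern: a law name ending in "법", then "제<number>조".
CITATION_RE = re.compile(r"([가-힣]+법)\s*제\s*(\d+)\s*조")

# lookup(law_name, article_no) returns the article text, or None if not found.
Lookup = Callable[[str, int], Optional[str]]


class UnverifiedCitationError(Exception):
    """Raised when a cited statute cannot be confirmed (fail-closed)."""


def extract_citations(text: str) -> List[Tuple[str, int]]:
    return [(law, int(article)) for law, article in CITATION_RE.findall(text)]


def verify_or_reject(response: str, lookup: Lookup) -> str:
    """Return the response only if every citation is confirmed by lookup."""
    for law, article in extract_citations(response):
        if lookup(law, article) is None:
            raise UnverifiedCitationError(f"{law} 제{article}조 could not be verified")
    return response


# Tiny in-memory stand-in for the official database.
FAKE_DB = {("근로기준법", 23): "해고 등의 제한"}


def fake_lookup(law: str, article: int) -> Optional[str]:
    return FAKE_DB.get((law, article))


ok = verify_or_reject("근로기준법 제23조에 따라 부당해고는 제한됩니다.", fake_lookup)
print(ok)
```

Raising an exception, rather than returning a flag, means a caller cannot accidentally show an unverified answer by forgetting to check the result.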

Empathy-First Response Framework

Legal questions come from people in distress. Every response follows a 5-stage framework:

Emotional acknowledgment — validate the user's feelings
Situation diagnosis — analyze the legal situation
Action roadmap — specific steps with costs and timelines
Safety net — legal aid resources, hotlines, free consultation options
Supportive closing — encouragement and next steps

Tech Stack

Component	Technology
Backend	Python / FastAPI
LLM	Google Gemini 2.5 Flash
RAG	Vertex AI Search (14,601 docs)
Verification	법제처 DRF API (10 sources)
Database	Cloud SQL PostgreSQL 17
Hosting	GCP Cloud Run + Firebase
Billing	Paddle (credit-based)
CI/CD	GitHub Actions (5-stage pipeline)
Tests	282 automated tests (264 NLU + 18 verifier)

Results (March 2026)

Metric	Value
Success rate	100%
Error rate	0.0%
Avg response time	~38s
Test suite	282/282 passing
Legal guides	15 (SEO optimized)
Languages	Korean + English

Challenges & Lessons Learned

1. Latency (~38s average)

The main bottleneck is Gemini generation. Streaming helps UX but doesn't reduce total time. We're exploring parallel RAG + DRF prefetching to cut latency.
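A minimal sketch of the prefetching idea, assuming both steps are I/O-bound and can be written as coroutines. `rag_retrieve` and `drf_prefetch` are hypothetical stand-ins that just sleep; the point is only that `asyncio.gather` lets the two waits overlap.

```python
import asyncio
import time
from typing import Dict, List, Tuple


async def rag_retrieve(query: str) -> List[str]:
    await asyncio.sleep(0.3)  # simulated search latency
    return ["doc-1", "doc-2"]


async def drf_prefetch(query: str) -> Dict[str, str]:
    await asyncio.sleep(0.3)  # simulated statute lookup latency
    return {"근로기준법": "prefetched"}


async def prepare_context(query: str) -> Tuple[List[str], Dict[str, str]]:
    # Run both lookups concurrently instead of one after the other.
    docs, statutes = await asyncio.gather(rag_retrieve(query), drf_prefetch(query))
    return docs, statutes


def timed_run(query: str) -> Tuple[List[str], Dict[str, str], float]:
    start = time.perf_counter()
    docs, statutes = asyncio.run(prepare_context(query))
    return docs, statutes, time.perf_counter() - start


docs, statutes, elapsed = timed_run("부당해고")
print(f"{elapsed:.2f}s for both lookups")
```

Run sequentially, the two simulated calls would take about 0.6s; overlapped, they take about as long as the slower one. Since generation dominates the ~38s total, this would trim only the stages around it.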

2. English Statute Citation

Korean citation matching works well, but English accuracy is lower. Korean law names and article structures don't map cleanly to English translations. This is an active area of improvement.

3. Monitoring Bot Traffic

We discovered that our own health monitoring workflow was generating 45% of all traffic — running identical test queries every 6 hours. After fixing this, our real user metrics became much cleaner. Lesson: always filter admin/bot traffic from analytics.
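One simple way to apply that lesson is to tag internal callers and drop them before computing metrics. The sketch below assumes each request log entry is a dict with a `user_agent` field and that the health check sends a recognizable agent string; both are illustrative assumptions, not the actual log schema.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical identifier sent by the monitoring workflow.
INTERNAL_AGENTS = {"lawmadi-healthcheck"}


def split_traffic(
    requests: List[Dict[str, str]], internal_agents: Set[str]
) -> Tuple[List[Dict[str, str]], float]:
    """Return (real user requests, share of traffic that was internal)."""
    real = [r for r in requests if r["user_agent"] not in internal_agents]
    if not requests:
        return real, 0.0
    internal_count = len(requests) - len(real)
    return real, internal_count / len(requests)


# 9 monitoring requests out of 20 total.
log = [{"user_agent": "lawmadi-healthcheck"}] * 9 + [{"user_agent": "Mozilla/5.0"}] * 11
real, internal_share = split_traffic(log, INTERNAL_AGENTS)
print(len(real), f"{internal_share:.0%}")
```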

Security: Fighting Bot Abuse

Within the first week, we detected automated scraping bots. Our response:

3-layer device fingerprinting (IP + canvas fingerprint + UUID token)
DB-persistent IP blacklist (survives redeployments)
Rate limiting with automatic blacklisting (20 HTTP 429 responses within 60 seconds trigger a 1-hour ban)
Admin query filtering — bot traffic excluded from all dashboards
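The automatic blacklisting rule can be sketched with a sliding window per IP. This is an in-memory illustration only: the `AutoBlacklist` class and its method names are hypothetical, and unlike the DB-persistent blacklist described above, this version would lose its state on redeploy. Timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class AutoBlacklist:
    def __init__(self, threshold: int = 20, window_s: float = 60.0, ban_s: float = 3600.0):
        self.threshold = threshold
        self.window_s = window_s
        self.ban_s = ban_s
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._banned_until: Dict[str, float] = {}

    def record_429(self, ip: str, now: float) -> None:
        """Call each time this IP receives an HTTP 429 response."""
        hits = self._hits[ip]
        hits.append(now)
        # Drop 429s that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window_s:
            hits.popleft()
        if len(hits) >= self.threshold:
            self._banned_until[ip] = now + self.ban_s
            hits.clear()

    def is_banned(self, ip: str, now: float) -> bool:
        until = self._banned_until.get(ip)
        if until is None:
            return False
        if now >= until:
            del self._banned_until[ip]  # ban expired
            return False
        return True


bl = AutoBlacklist()
for t in range(20):
    bl.record_429("203.0.113.7", float(t))
print(bl.is_banned("203.0.113.7", 20.0))
```

In a real service, `now` would come from `time.time()` and the ban table would be written to the database so it survives redeployments.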

Pricing

Free: 2 queries/day, no account needed
Starter: 20 queries — $1.50
Standard: 100 queries — $4.99
Pro: 300 queries — $9.99

Credit-based (no subscription) via Paddle — lower friction for users who just need a few answers.

Try It

Live: https://lawmadi.com (Korean default, English at /en)

I'd love feedback on:

The multi-agent routing approach — is 60 agents overkill?
The fail-closed verification — would you trust a legal AI more knowing it refuses to answer when unsure?
Ideas for reducing latency

Thanks for reading!