DEV Community

Moca

Geul: A Native Compiled Language Designed Around Korean Grammar (SOV, Particle Binding, Self-Hosting)

Every non-English programming language I've seen does the same thing: translate
the keywords. if, while, return become local words, but the grammar stays
identical to C or Python.

That never felt right to me for Korean.

Korean isn't just "English with different words." It's a fundamentally different
grammatical structure — SOV word order, agglutinative morphology, and a particle
system that carries meaning independently of word position.

So I built 글 (Geul) — a natively compiled, self-hosting programming language
that actually models Korean grammar instead of just wearing it as a skin.


The Core Problem: Korean is SOV, Not SVO

English (and most programming languages) follow SVO order:

Subject Verb Object → add(5, 3)

Korean follows SOV:

Subject Object Verb → 나는 밥을 먹다 (I rice eat)

But more importantly, Korean uses grammatical particles to mark roles:

| Particle | Role |
| --- | --- |
| 을/를 | Object marker |
| 에 | Location / target |
| 으로/로 | Direction / method |
| 이/가 | Subject marker |

This means word order is flexible in Korean — the particles carry the meaning,
not the position.


What This Looks Like in Code

In Geul, a function call follows Korean SOV structure:

5를 3에 더하다

Breaking it down:

  • 5를 — 5 marked with 를 (object marker)
  • 3에 — 3 marked with 에 (target marker)
  • 더하다 — "to add" (verb form)

Compare this to the equivalent in C:

add(5, 3)

The function declaration mirrors this structure:

[정수 왼쪽을 정수 오른쪽에 더하기]는 -> 정수 {
    반환 왼쪽 + 오른쪽.
}

왼쪽을 declares that the parameter 왼쪽 (left) expects the 를/을 particle.

오른쪽에 declares that 오른쪽 (right) expects the 에 particle.

The parser resolves argument-to-parameter binding through particle matching —
not positional order.
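
As a rough illustration of the idea (not Geul's actual parser), particle matching can be sketched in Python. The `ADD_PARAMS` table, the `TOKEN` pattern, and `bind_arguments` are hypothetical names invented for this sketch, and it only handles integer literals with the three particles used above.

```python
import re

# Hypothetical parameter table for 더하기: particle -> parameter name.
# Both forms of the object particle (을/를) select the same parameter.
ADD_PARAMS = {"을": "왼쪽", "를": "왼쪽", "에": "오른쪽"}

# An argument token here is an integer literal followed by one particle.
TOKEN = re.compile(r"^(-?\d+)(을|를|에)$")


def bind_arguments(tokens, params):
    """Bind each particle-marked token to the parameter its particle selects."""
    bound = {}
    for token in tokens:
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"not a particle-marked literal: {token!r}")
        value, particle = int(match.group(1)), match.group(2)
        name = params[particle]
        if name in bound:
            raise ValueError(f"parameter {name} bound twice")
        bound[name] = value
    return bound


# Word order does not matter: both calls bind 왼쪽 to 5 and 오른쪽 to 3.
print(bind_arguments(["5를", "3에"], ADD_PARAMS))
print(bind_arguments(["3에", "5를"], ADD_PARAMS))
```

Because the lookup is keyed on the particle rather than on the argument's index, swapping the two arguments produces the same binding.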


The Morpheme Problem

Korean verbs conjugate heavily. The call site uses a conjugated verb form, but
the function is declared with a nominalized (noun) form:

| Form | Korean | Meaning |
| --- | --- | --- |
| Function name | 더하기 | "adding" (noun form) |
| Call site | 더하다 | "to add" (verb form) |

The compiler needs to know that 더하다 resolves to 더하기 at parse time.

To solve this, I built a 3-level recursive morpheme analyzer that strips
Korean verb conjugation suffixes and maps call-site verb forms to their
declared function names automatically.
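
To make the mapping concrete, here is a deliberately tiny Python sketch of suffix stripping with a recursion limit of three. It is not the analyzer described above: the `DECLARED` set, the `SUFFIXES` list, and `resolve` are assumptions made for illustration, and real Korean conjugation (contractions, stem changes) needs far more than plain suffix removal.

```python
# Hypothetical set of declared function names (nominalized forms).
DECLARED = {"더하기", "쓰기"}

# A very small, illustrative list of verb endings.
SUFFIXES = ["다", "였", "고", "면"]


def resolve(verb, declared, depth=3):
    """Return the declared name a call-site verb form refers to, or None."""
    candidate = verb + "기"  # nominalize the current stem guess
    if candidate in declared:
        return candidate
    if depth == 0:
        return None
    for suffix in SUFFIXES:
        if verb.endswith(suffix) and len(verb) > len(suffix):
            found = resolve(verb[: -len(suffix)], declared, depth - 1)
            if found is not None:
                return found
    return None


print(resolve("더하다", DECLARED))    # one ending stripped
print(resolve("더하였다", DECLARED))  # two endings stripped
print(resolve("빼다", DECLARED))      # no matching declaration
```

Each recursion level strips one ending and retries the lookup, so a form with stacked endings is resolved in as many steps as it has endings, up to the depth limit.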


Compiler Architecture

Geul has two backends:

.글 source
    │
    ▼
Morpheme Analysis   ← 3-level recursive affix splitting
    │
    ▼
Recursive Descent Parser   ← SOV + particle-aware
    │
    ▼
AST → 3-Address IR + Static Type Checking
    │
    ├──────────────────────┐
    ▼                      ▼
C Transpiler          x86-64 Native
(cl/gcc)              (PE direct write)
    │                      │
    ▼                      ▼
  .exe                   .exe
                    (zero external deps)

The native backend writes PE headers and x86-64 machine code directly:
no cl.exe, no gcc, no external assembler required.
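
For readers unfamiliar with the "AST → 3-Address IR" stage, the following Python sketch shows the general technique of lowering a nested expression into three-address instructions. The tuple-based AST, the `lower` function, and the `t1`, `t2` temporary names are assumptions for this example and say nothing about Geul's real IR format.

```python
import itertools


def lower(expr, code, temps):
    """Lower a nested (op, lhs, rhs) expression to three-address code.

    Returns the name that holds the expression's value and appends
    (op, dst, lhs, rhs) instructions to `code`.
    """
    if isinstance(expr, str):  # a variable or literal leaf
        return expr
    op, lhs, rhs = expr
    left = lower(lhs, code, temps)
    right = lower(rhs, code, temps)
    dst = f"t{next(temps)}"    # fresh temporary for this result
    code.append((op, dst, left, right))
    return dst


# a + b * c becomes two instructions, each with at most two operands.
code = []
result = lower(("+", "a", ("*", "b", "c")), code, itertools.count(1))
for instruction in code:
    print(instruction)
print("result in", result)
```

Flattening expressions this way gives later passes, such as a peephole optimizer or register allocator, a uniform list of simple instructions to work on.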

Optimizations

  • Compare-branch fusion — cmp + jcc direct emit
  • Register pinning — loop variables pinned to callee-saved registers
  • Peephole — dead store elimination, redundant copy removal
  • Loop-invariant code motion — constant loads hoisted out of loops
  • Tail call optimization — self-recursion converted to jmp
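
As one example of the peephole category, redundant copy removal can be sketched in a few lines of Python. The instruction tuples and `remove_redundant_copies` are hypothetical; this only shows the general technique on a toy IR, not the passes Geul actually runs.

```python
def remove_redundant_copies(instructions):
    """Drop copies that cannot change any value.

    Each instruction is a tuple: ("copy", dst, src) or (op, dst, lhs, rhs).
    """
    kept = []
    for ins in instructions:
        if ins[0] == "copy":
            _, dst, src = ins
            if dst == src:
                continue  # x = x has no effect
            if kept and kept[-1] == ("copy", src, dst):
                continue  # previous was src = dst, so dst = src is redundant
        kept.append(ins)
    return kept


program = [
    ("copy", "a", "b"),     # a = b
    ("copy", "b", "a"),     # b = a (redundant: b already equals a)
    ("copy", "c", "c"),     # c = c (self copy)
    ("+", "t1", "a", "c"),  # t1 = a + c
]
for ins in remove_redundant_copies(program):
    print(ins)
```

The pass only looks at the immediately preceding kept instruction, which is what makes it a peephole: it trades completeness for a cheap, local check.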

Self-Hosting

In v0.5, Geul became self-hosting — the compiler is written in Geul and
compiles itself.

Verification: the bootstrap binary compiles the compiler source, and the
SHA256 of the output binary matches at the fixed point. ✓
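
The hash comparison itself is simple to reproduce. Below is a minimal Python sketch using the standard `hashlib` module; `sha256_of` and `is_fixed_point` are names made up for this example, and how the two compiler binaries are produced is left out because the post does not show the build commands.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_fixed_point(previous_stage, next_stage):
    """True when two successive compiler binaries are byte-identical.

    previous_stage: the binary that performed the compilation.
    next_stage: the binary it produced from the compiler's own source.
    """
    return sha256_of(previous_stage) == sha256_of(next_stage)
```

If a compiler binary rebuilds its own source into a byte-identical binary, further rounds cannot change anything, which is why a single hash match is treated as the fixed point.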

This was the hardest milestone. Getting a compiler to compile itself means
every language feature the compiler uses must be fully functional
and stable.


Performance

Measured on an Intel Core Ultra 5 226V; each figure is the median of 5 runs. The C versions were compiled with MSVC /O2.

| Benchmark | Geul | C /O2 | vs C | vs Python |
| --- | --- | --- | --- | --- |
| Recursive fib(40) | 464ms | 321ms | 1.45x | 15.9x faster |
| Sieve of 1M primes | 10ms | 8ms | 1.30x | 16.7x faster |
| Bubble sort 30k | 1564ms | 564ms | 2.77x | 24.5x faster |

The bubble sort gap (2.77x) is the most notable — nested loop register
allocation pressure is the likely cause. It's on the roadmap.
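
For anyone who wants to take comparable measurements, the "median of 5 runs" method can be sketched in Python as follows. `median_runtime_ms` and the small `fib` workload are assumptions for this example; the numbers in the table came from native executables, not from this harness.

```python
import statistics
import time


def median_runtime_ms(func, runs=5):
    """Run func `runs` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fib(n):
    """Naive recursive Fibonacci, used here only as a small workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(f"fib(20): {median_runtime_ms(lambda: fib(20)):.2f} ms")
```

Taking the median rather than the mean keeps a single slow run, for example one hit by background activity, from skewing the reported figure.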


Quick Syntax Tour

Variables and types:

정수 나이 = 25.          (* int *)
실수 원주율 = 3.14159.   (* double *)
문자열 이름 = "홍길동".  (* string *)
참거짓 활성 = 참.        (* bool *)

String interpolation:

"{이름}은 {나이}세입니다\n"을 쓰다.

Variables are embedded directly with automatic type formatting. The string above reads "{name} is {age} years old", and 쓰다 ("to write") prints it.

Control flow (이면 is "if", 아니면 is "else", 동안 is "while"):

값 > 10이면 {
    "크다\n"을 쓰다.
}
아니면 {
    "작다\n"을 쓰다.
}

(번호 < 10)동안 {
    번호를 증가하다.
}

Structs (묶음 declares a struct; the possessive particle 의 accesses a field):

묶음 점은 정수 가로, 정수 세로.

점 위치.
위치의 가로 = 10.
위치의 세로 = 20.

Current State

  • ✅ Windows 10/11 (Linux ELF planned)
  • ✅ VS Code extension (syntax highlight, F5 run, LSP autocomplete)
  • ✅ 9 stdlib modules, 62 functions
  • ✅ Static type checking at compile time
  • ✅ Self-hosting since v0.5
  • 🔲 Package manager (planned)
  • 🔲 Debug info / PDB (planned)
  • 🔲 Error handling syntax (시도/실패, "try"/"fail")

Currently at v0.7.1 (18 releases since March 2026, MIT license).


Try It

No install needed — browser playground:

👉 https://geul-web.vercel.app/playground

Source:

👉 https://github.com/wwoosshh/geul-lang


Curious what approaches others have seen for non-English or non-SVO language
design — are there other languages that go beyond keyword translation?
