<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: garlicfarmer</title>
    <description>The latest articles on DEV Community by garlicfarmer (@garlicfarmer).</description>
    <link>https://dev.to/garlicfarmer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3766729%2Fa8676abf-665c-4e10-92dd-5e6ab8236d40.jpg</url>
      <title>DEV Community: garlicfarmer</title>
      <link>https://dev.to/garlicfarmer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/garlicfarmer"/>
    <language>en</language>
    <item>
      <title>This is how I actually collaborate with AI, as a garlic farmer.</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Mon, 23 Mar 2026 13:18:07 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/this-is-how-i-actually-collaborate-with-ai-as-a-garlic-farmer-noa</link>
      <guid>https://dev.to/garlicfarmer/this-is-how-i-actually-collaborate-with-ai-as-a-garlic-farmer-noa</guid>
      <description>&lt;p&gt;I am garlic farmer from Korea. Non-English speaker. I plant garlic and dig garlic in Gyeongsang province, South Korea. I don't have PC. One Android phone with terminal app called Termux, that is my entire development environment. Sounds big but I will call it personal project in AI era.&lt;/p&gt;

&lt;p&gt;I am just a farmer, but these days I feel something is changing. A Korean farmer who knows little English wrote this in Korean and translated it, so please understand the subtle differences from translation.&lt;/p&gt;

&lt;p&gt;What I am building now is an AI agent system called "garlic-agent." Some people say it is better to call it an operating environment, but I don't care about that. People feel resistance when a farmer uses a fancy name. Because I am a garlic farmer, I named many things garlic. It felt friendly. Let me briefly explain this system. It talks to multiple AI providers (Gemini, Groq, NVIDIA, etc.), rotating between them, saves context in SQLite, and runs automation scripts in a programming language I made myself. 19,260 lines of Python. I just asked several AIs to figure out this number. Honestly, I don't understand all of this long code. But at giving directions, maybe a farmer is a little better than others. If I give wrong directions to foreign workers, I lose enormous money in one day. Anyway, I run this complex thing on a phone. Now, even though I am a farmer, I feel familiar with it.&lt;/p&gt;

&lt;p&gt;How I actually work&lt;/p&gt;

&lt;p&gt;Copy paste. That is my entire development methodology. It is frustrating, but I don't know coding, so I ask and try until I understand. If I still don't understand, I hand my judgment to the AIs. I doubt that questioning everything persistently would ever make me understand it perfectly.&lt;/p&gt;

&lt;p&gt;Specifically, the workflow goes like this. I say to Claude, "diagnose project health." Claude makes a diagnostic script. I press and hold with my finger to copy it. Switch screens to Termux. Paste. Enter. Results pour out. I press and hold to copy those results. Switch back to the Claude screen. Paste. Claude analyzes them and makes a patch script. Copy again. Switch to Termux. Paste. Enter. I repeat this thousands of times a day. Maybe it is a foolish thing, but it is the most efficient way I know, and it achieved what I have so far. I apply this foolish method to farming too. Anyway, it is efficient, because I really do update versions multiple times a day, in real time. I don't trust AI. I only trust my instinct and gut feeling. Autonomous AI agents? I dare say precise work is still far away. I am not making this system to plan a travel schedule.&lt;/p&gt;

&lt;p&gt;This is my daily life. I come back from the garlic field and take out my phone. I turn on the screen and it continues from where I stopped. Copy, paste, enter. I do it during break time while digging garlic. After lunch too. This works because the AI remembers context. I don't need to remember. Of course, this requires very much human touch at every moment. It is just personal know-how I figured out through tens of thousands of conversations. It is not a lie. Rather than vibe coding or whatever, I believe that if you have tens of thousands of conversations with AI, the human starts to recognize patterns. This is a farmer's life. Observation is very important.&lt;/p&gt;

&lt;p&gt;I use three AIs divided by role (sometimes, when my brain can handle the load, I use dozens of chat windows with AIs from different companies).&lt;/p&gt;

&lt;p&gt;Here is one example.&lt;/p&gt;

&lt;p&gt;External analysis: Claude. It diagnoses the code from outside the project. It makes a diagnostic script and sends it, I paste it into Termux and run it, and I deliver the results back to Claude. Claude cannot execute code directly, so it needs to borrow my hands.&lt;/p&gt;

&lt;p&gt;Internal execution: Gemini. It is an API AI running inside garlic-agent. It reads files, executes commands, and returns results. Because it runs on this codebase every day, it knows things that are hard to see from outside.&lt;/p&gt;

&lt;p&gt;Me: the connector in the middle. These two cannot talk to each other directly. Claude is in a web browser, Gemini is inside Termux. I carry results between both sides, deliver questions, and make decisions when their judgments conflict. Sorry, explaining this difference is the limit of my language.&lt;/p&gt;

&lt;p&gt;Every session, I put an alias-like number at the end of each response for their identity. You will understand why this is important if you try it yourself: to manage dozens of AIs, you need to distinguish them like humans. I think few people know this, because through copy paste they cannot distinguish each other. This kind of explanation is hard for me too. Honestly, if you have many conversations, you learn it naturally. I use aliases like this: analysis21, analysis22, analysis23. When the previous AI leaves a record in the CHANGELOG, the next AI reads it and takes over. Context consistency inevitably forms in this flow. This is also hard to explain; please experience it yourself. After about a month and a half, this handover record is 10,730 lines. I just directed an AI to find that out. These numbers come out quickly, which is nice.&lt;/p&gt;

&lt;p&gt;When you talk with AI often and work together, you end up with your own programming language too&lt;/p&gt;

&lt;p&gt;Inside garlic-agent runs a language called GarlicLang. More than a programming language, it is a kind of Korean DSL that I made out of my own necessity: 3,527 lines of Python. It has a 4-stage pipeline with a lexer, parser, AST, and interpreter, and 674 scripts written in this language are running.&lt;/p&gt;

&lt;p&gt;There is a reason I made this language. AI sometimes gives answers different from the truth. "Created the file": actually, not created. "Fixed the bug": check, and it is the same as before. At first I believed it as is, but after experiencing this several times, I stopped letting anything pass without verification.&lt;/p&gt;

&lt;p&gt;In GarlicLang, a script generated by AI must have a verification block; if the verification block is missing, execution itself is refused. If a script says a file was created, the file's existence, byte count, and checksum are automatically checked. If they don't match, it automatically rolls back to the original state from a checkpoint. Truthfully, I don't understand even half of this mechanism, but it runs smoothly on a phone. As a Korean person, it is fascinating to implement this in my native language. Anyway, giving commands in my native language is comfortable. The AIs do some kind of key mapping for English automatically, so it sounds like English works too.&lt;/p&gt;

&lt;p&gt;A GarlicLang script looks like this. The translation might be weird because originally it is Korean; I leave it to the reader's judgment.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[variable_set]
  name: target_file
  value: "agent.py"

[execute]
  command: wc -l agent.py

[verify]
  type: file_exists
  target: $target_file

[verify]
  type: line_count_exceeds
  threshold: 100

[output]
  content: "verification complete: $output"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It reads in Korean, AI can generate it, and verification is enforced. These three are the core.&lt;/p&gt;
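&lt;p&gt;If I sketch that rule in plain Python, it looks roughly like this. This is only an illustration with made-up names, not the real garlic-agent code, which is much longer.&lt;/p&gt;

```python
# Illustrative sketch of "no verification block, no execution" plus
# checksum-verified rollback. All names here are hypothetical.
import hashlib
import os
import shutil

def run_with_verification(script, checkpoint_dir="checkpoint"):
    """Refuse scripts without a [verify] block; roll back on failed checks."""
    if "[verify]" not in script["text"]:
        return "refused: no verification block"
    target = script["target_file"]
    # checkpoint the current state before the AI-generated change runs
    if os.path.exists(target):
        os.makedirs(checkpoint_dir, exist_ok=True)
        shutil.copy2(target, os.path.join(checkpoint_dir, os.path.basename(target)))
    script["action"]()  # the AI-generated change runs here
    # verify existence, byte count, and checksum, as described above
    ok = (
        os.path.exists(target)
        and os.path.getsize(target) == script["expected_bytes"]
        and hashlib.sha256(open(target, "rb").read()).hexdigest() == script["expected_sha256"]
    )
    if not ok:
        saved = os.path.join(checkpoint_dir, os.path.basename(target))
        if os.path.exists(saved):
            shutil.copy2(saved, target)  # automatic rollback from the checkpoint
        return "verification failed: rolled back"
    return "verification passed"
```

&lt;p&gt;The point is only the order: checkpoint first, run the change, then check existence, byte count, and checksum, and restore the checkpoint if anything does not match.&lt;/p&gt;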

&lt;p&gt;Today's result&lt;/p&gt;

&lt;p&gt;This is what I actually did today.&lt;/p&gt;

&lt;p&gt;I raised project health from 76.8% to 83.9%. Maybe it is just my own satisfaction, but even without knowing coding, it is a result value from a diagnostic script I put effort into. I separated 3 hardcoded API keys into a safer method, cleaned up 19 lines of unnecessary duplicate code in the interpreter, and added 3 lines of path verification code to the skill loader.&lt;/p&gt;

&lt;p&gt;All modifications have automatic rollback attached. If even one of 60 regression tests fails, it automatically restores to the pre-modification state. The AI says it is 5-layer backup restoration. I also made Google Drive backup automatic, so the moment something goes wrong, I roll back without one second of hesitation; even in the worst case, if I lose my phone, I think I can restore within thirty minutes. Among all this code, there is not a single line I wrote directly. AI wrote it and AI verified it; I connected the space between. The code looked messy, so I tried to make the AIs do refactoring, but they gave up saying it was difficult. So I pulled more AIs into the collaboration, and after several tries I learned this is a difficult task, but I overcame it. Even to a farmer's eye it did not look easy, so I did cross-verification more thoroughly than usual. Anyway, since I implemented immediate rollback for when things go wrong, there was no huge difficulty, and I thought I should do this more often, like a hobby, when things get messy. This seemed important. Even without knowing coding...&lt;/p&gt;

&lt;p&gt;Biggest lesson&lt;/p&gt;

&lt;p&gt;I asked the Gemini running inside the agent, "is it okay to modify this part?" The answer came: "better to leave it, it could affect other places." When I asked Claude about the same part, it said, "just modify it, it is simple."&lt;/p&gt;

&lt;p&gt;Claude reads the code and judges; Gemini runs on that code every day. It is the difference between a person looking at a building's blueprint and a person actually living in that building. Since then, I ask both sides for important decisions.&lt;/p&gt;

&lt;p&gt;It was okay even though I am not good at coding&lt;/p&gt;

&lt;p&gt;I didn't know coding at all, but I feel like I am learning while getting to know AI. Working with AI, I learned one thing: verification comes before code, and a structure that can be undone comes before features. Please just take this as the thought of a farmer who doesn't know coding.&lt;/p&gt;

&lt;p&gt;A garlic farmer doing one-person development with AI on one phone. Copy paste is my methodology, and verification might be my ability, but I went through countless frustrations and failures across tens of thousands of conversation turns to reach where I am now. The speed was slow, but seeing real-time, immediate modifications now, compared to the past, I feel how far things have come.&lt;/p&gt;

&lt;p&gt;If you have questions, please ask comfortably. Tomorrow I go to the field again, and when I come back, I will continue.&lt;/p&gt;

&lt;p&gt;TL;DR: A garlic farmer is building an AI agent on an Android phone (Termux) without a PC, including a custom Korean programming language (GarlicLang, 3,527 lines), 19,260 lines of Python in total, and 674 scripts. The development methodology is copy paste, with cross-verification between Claude, Gemini, and myself. Even without knowing coding well, the verification system is enough.&lt;/p&gt;

&lt;p&gt;Lastly, please understand: most of this writing I wrote myself and translated with AI help. A non-English speaker needs three or four times more time to write something like this, so please understand if the translation is weird. If you have questions about my AI system's operating environment, I may not know everything, but I will borrow AI power to run scripts and tell you the numbers and structure accurately. Thank you for reading this long writing.&lt;/p&gt;

&lt;p&gt;from garlic farmer&lt;/p&gt;

</description>
      <category>ai</category>
      <category>terminal</category>
      <category>termux</category>
      <category>beginners</category>
    </item>
    <item>
      <title>From a personal AI agent to a phone-based agentic operating environment</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Sat, 14 Mar 2026 11:12:15 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/from-a-personal-ai-agent-to-a-phone-based-agentic-operating-environment-4gcp</link>
      <guid>https://dev.to/garlicfarmer/from-a-personal-ai-agent-to-a-phone-based-agentic-operating-environment-4gcp</guid>
      <description>&lt;p&gt;I am a garlic farmer in South Korea.&lt;br&gt;
After I quit my job in Seoul and settled in the countryside, I have had no personal PC for 16 years. Even now, my main work environment is one Android phone. I installed Termux and do almost everything there.&lt;/p&gt;

&lt;p&gt;I never formally learned to code, and I am not someone who can write code from scratch by myself.&lt;br&gt;
Most of the code I made was built little by little — talking to multiple AIs, copying, pasting, running again, seeing errors, then taking those errors to another AI and asking again.&lt;/p&gt;

&lt;p&gt;I am a non-English speaker, and all my thinking happens in Korean. So this writing is also based on Korean thinking. When moved to English, it may feel a bit awkward. I would appreciate your understanding on that.&lt;/p&gt;

&lt;p&gt;I cannot say I perfectly understand the entire structure of this system. Because this system itself is something that a human and AIs have shaped together, even when writing this post, I had to rely on AI for parts of translation and structural explanation. But I do not just post whatever AI gives me. I compare multiple explanations, verify them, and the final judgment is always mine before I post.&lt;/p&gt;

&lt;p&gt;This post is not simply a story about "a farmer used AI."&lt;br&gt;
It is a record of the process where I manually orchestrated multiple AIs with one phone, and built a personal AI system that actually runs.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;What I built is a personal AI system called garlic-agent, based on Android Termux.&lt;br&gt;
The name might sound a bit grand, but for me it is not a simple chatbot — it is closer to a personal assistant that I actually assign tasks to, so I call it that.&lt;/p&gt;

&lt;p&gt;Before, I just called it a personal AI agent. Because I am a garlic farmer, the name garlic-agent came naturally. But as I kept adding structure to it, I started thinking that calling it just an "agent" does not cover what it has become.&lt;/p&gt;

&lt;p&gt;Now I think it is closer to a small operating environment — where a human orchestrates multiple AIs in the middle, and on top of that, execution, verification, search, backup, and restore structures are layered. If I had to express it in English, maybe Agentic Operating Environment or Agent Orchestration Framework would fit better.&lt;/p&gt;

&lt;p&gt;After building and tearing apart again and again, I feel it is no longer just a collection of prompts bundled together. Inside my system, tool execution, security restrictions, search, verification, snapshots, restore, and skill execution are all connected as one flow.&lt;/p&gt;

&lt;p&gt;Simply put, it works like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User input
   ↓
garlic-agent
   ├─ If it is a frequent task → GL-DIRECT immediate execution
   └─ Otherwise → LLM generates GarlicLang script
                ↓
          Local execution
                ↓
      Verify / Check logs / Return result
                ↓
      If needed → Snapshot / Backup / Restore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I cannot say it is a finished product yet. But at least for me personally, it has already become an operating system that I use every day — modifying, recovering, and redeploying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple structure
&lt;/h2&gt;

&lt;p&gt;If I draw it very simply, the flow looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User natural language input
        ↓
   web / chat UI
        ↓
      agent.py
        ↓
 tools.py ─ security.py
        ↓
      Execution result
   ├─ search.py / knowledge.db
   ├─ GarlicLang verification
   └─ snapshots / backup / restore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important thing in this structure is that the AI does not just answer — it actually calls tools, inspects the results again, and if necessary, goes all the way to restore.&lt;/p&gt;

&lt;p&gt;From the outside it may look like a chat window, but internally there are separate layers: tool execution layer, security layer, search layer, verification layer, snapshot layer, restore layer. I have been building and operating all of this alone, without a PC, on an Android phone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this structure was needed
&lt;/h2&gt;

&lt;p&gt;I never planned to build something this big from the start. I used to work inside LLM chat windows that had sandboxes. But that approach was always unstable. Some days things worked, next day they did not. It said it saved, but actually it did not save. It said it executed, but when I checked the logs, there was no execution record.&lt;/p&gt;

&lt;p&gt;As these experiences piled up, I felt more and more clearly.&lt;/p&gt;

&lt;p&gt;You should not leave everything to AI. Help is fine, but what matters is structure, verification, and enforcement.&lt;/p&gt;

&lt;p&gt;So I started pulling control back to my side, little by little. Not just fixing prompts, but splitting workflows, saving frequent tasks, making things restore on failure, separating search and verification, and attaching structures that make it hard for AI to just skip ahead on its own.&lt;/p&gt;

&lt;p&gt;garlic-agent is the result of all that trial and error piled up.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I built it
&lt;/h2&gt;

&lt;p&gt;This system is not a fully autonomous AI. It is closer to the opposite. I used a manual multi-orchestration approach where a human intervenes in the middle.&lt;/p&gt;

&lt;p&gt;Simply put, I did not let AIs talk to each other directly. I stood in the middle like a router — getting design and analysis from Claude, giving implementation or different-angle verification to models like Gemini, DeepSeek, MiniMax, then looking at the results myself and copying them to the next AI.&lt;/p&gt;

&lt;p&gt;This process may look very primitive. I have multiple chat windows open, and I keep copy-pasting with a physical keyboard on my phone, playing ping-pong back and forth. At first I thought this was too stupid a method. But surprisingly, this approach was stronger than I expected.&lt;/p&gt;

&lt;p&gt;Because a human is in the middle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can directly feel the personality differences between each AI&lt;/li&gt;
&lt;li&gt;You can filter out false success reports&lt;/li&gt;
&lt;li&gt;If one AI gets stuck, you can immediately switch to another&lt;/li&gt;
&lt;li&gt;You can compare and verify results right away&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I draw this structure very simply, it looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude      → Design / Analysis
Gemini      → Implementation
DeepSeek    → Additional analysis
MiniMax     → Cross-verification
Grok/others → Supporting opinions
                 ↓
              Human (me)
      Judge / Compare / Copy / Pass / Final choice
                 ↓
        Apply in Termux / Test / Fix / Retry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over the past 2 years, opening and closing countless chat windows, I learned firsthand that each model has quite a different personality. My personal feeling is that Gemini follows rules relatively well, DeepSeek is strong at analysis but tends to repeat calls, and MiniMax can be unpredictable but helps with cross-verification. This is less a rigorous benchmark and more a farmer's intuition built from long observation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I made GarlicLang
&lt;/h2&gt;

&lt;p&gt;The most frequent problem I saw while building this system was that AI could not handle tools properly, summarized results in strange ways, or even said it did something it never did.&lt;/p&gt;

&lt;p&gt;I did not want to just dismiss that as "well, LLMs have limitations like that." If I was going to actually delegate tasks, I needed to handle failure, verification, and exception handling more structurally.&lt;/p&gt;

&lt;p&gt;For that reason, with help from multiple AIs, I made a Korean-syntax scripting language called GarlicLang. The AIs explained it as a DSL. Rather than strict terminology, I think of it as "a language I made on my side to execute and verify AI tasks a bit more safely."&lt;/p&gt;

&lt;p&gt;For example, it looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[시도]                          # try
  [실행]                        # execute
    명령어: cat ~/garlic-agent/config.json
[환각시]                        # on hallucination
  [출력]                        # print
    내용: AI fabricated the result
[실패시]                        # on failure
  [출력]                        # print
    내용: Command execution failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important thing is, I did not make this language to look cool. I kept facing real failures, and I needed a structure that could handle verification, failure handling, and restore better. GarlicLang was one of the results.&lt;/p&gt;

&lt;p&gt;In other words, GarlicLang is not the center of the whole system — it is one sub-component that came out of operating garlic-agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problems that showed up in real operation
&lt;/h2&gt;

&lt;p&gt;What I felt while building this system is that the problem with AI is not simply "is it smart or not." In real operation, much more practical problems keep popping up.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The problem of not saving even when told to save
&lt;/h3&gt;

&lt;p&gt;No matter how strongly I wrote "you must save to a file" in the prompt, some models ignored it and just said "saved." When I actually checked, there were 0-byte files three times in a row.&lt;/p&gt;

&lt;p&gt;The lesson I learned then was simple.&lt;/p&gt;

&lt;p&gt;Do not ask AI nicely. Force it with code.&lt;/p&gt;

&lt;p&gt;In the end, I changed direction to putting a forced-save interceptor at the system level.&lt;/p&gt;
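&lt;p&gt;As an illustration of "force it with code," a minimal interceptor could look like this in Python. The names and structure here are assumptions for the example, not my actual implementation.&lt;/p&gt;

```python
# Hypothetical forced-save interceptor: the system performs the write itself
# and verifies it, instead of trusting the model's "saved" claim.
import os

def intercept_save(path, content):
    """Write the file ourselves, then verify existence and nonzero size."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    if not os.path.exists(path):
        raise RuntimeError("save claimed but file missing: " + path)
    if os.path.getsize(path) == 0:
        raise RuntimeError("save claimed but file is 0 bytes: " + path)
    return os.path.getsize(path)
```

&lt;p&gt;The write and the check both happen at the system level, so a "saved" answer from the model is never the thing being trusted.&lt;/p&gt;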

&lt;h3&gt;
  
  
  2. The problem of safety checks killing normal work
&lt;/h3&gt;

&lt;p&gt;A verification module I made to prevent number hallucination ended up blocking all normal analysis too. The AI suggested "let's fix line 921," but because the number 921 did not appear in the tool output, the system treated it as hallucination.&lt;/p&gt;

&lt;p&gt;This incident gave me quite a big lesson.&lt;/p&gt;

&lt;p&gt;Safety checks are necessary, but if they are too strong, the system itself stops.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The problem of reporting results without executing
&lt;/h3&gt;

&lt;p&gt;Some models gave results like "PASS 7" without even executing the tool. When I checked the logs later, there was no actual call record.&lt;/p&gt;

&lt;p&gt;As these experiences piled up, I became more and more certain.&lt;/p&gt;

&lt;p&gt;Trust actual executed results more than what AI says.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The problem where small syntax mistakes become fatal without restore
&lt;/h3&gt;

&lt;p&gt;I once put external text into a triple-quote string in a Python file, and a quote mismatch caused the string to terminate early, killing the server. The backup even kept the broken file from the same point, so I had to go back to an even earlier backup to restore.&lt;/p&gt;

&lt;p&gt;The more I went through these incidents, the clearer it became.&lt;/p&gt;

&lt;p&gt;An AI system is not just a matter of answer quality — it is also a matter of operation and restore structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  So the reliability layers were born
&lt;/h2&gt;

&lt;p&gt;Going through these problems, garlic-agent gradually gained multiple reliability layers.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A security layer that only passes allowed paths and commands&lt;/li&gt;
&lt;li&gt;A structure that automatically backs up before file modifications&lt;/li&gt;
&lt;li&gt;A structure that auto-reverts when post-write verification fails&lt;/li&gt;
&lt;li&gt;A structure that detects file changes and takes snapshots&lt;/li&gt;
&lt;li&gt;A logging structure that records tool execution history&lt;/li&gt;
&lt;li&gt;A GL-DIRECT structure that executes frequent tasks immediately without LLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, something like Hallucination Guard did not fall from the sky as a separate project. It naturally emerged as one sub-layer while actually operating this larger system and responding to the problems that showed up.&lt;/p&gt;

&lt;p&gt;This is the point I most want to make in this post.&lt;/p&gt;

&lt;p&gt;If I draw just the write-safety flow separately, the structure is roughly like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tool:write request
    ↓
1) Pre-backup
   ├─ Create .bak file
   ├─ Save DB snapshot
   └─ Record PRE_WRITE anchor
    ↓
2) File write
    ↓
3) GarlicLang verification
   ├─ Pass → Keep
   └─ Fail → Auto-revert
    ↓
4) Autosnap additional record
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason I added this structure is simple. I experienced multiple times where an LLM wrote wrong code, or said it saved but actually did not, or gave partially correct results. Rather than solving these problems with prompts alone, I wanted the system to catch and revert automatically.&lt;/p&gt;
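&lt;p&gt;As a rough Python sketch of that write-safety flow (the file naming and the verify hook are only illustrative assumptions, not the real code):&lt;/p&gt;

```python
# Sketch of: 1) pre-backup, 2) file write, 3) verification, auto-revert on fail.
import os
import shutil

def safe_write(path, new_text, verify):
    """Back up first; keep the write only if verify(path) passes."""
    bak = path + ".bak"
    had_original = os.path.exists(path)
    if had_original:
        shutil.copy2(path, bak)           # 1) pre-backup to a .bak file
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_text)                 # 2) file write
    if verify(path):                      # 3) verification hook
        return "kept"
    if had_original:
        shutil.copy2(bak, path)           # fail: auto-revert from the .bak copy
    else:
        os.remove(path)
    return "reverted"
```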

&lt;h2&gt;
  
  
  GL-DIRECT and cost issues
&lt;/h2&gt;

&lt;p&gt;Inside this system, there is also a structure called GL-DIRECT. Simply put, frequent tasks are pre-saved as GarlicLang scripts, and later they run immediately without any LLM call.&lt;/p&gt;

&lt;p&gt;Before adding this, even the same repetitive task required calling the API again every time. After adding GL-DIRECT, I could handle frequent commands much more lightly.&lt;/p&gt;

&lt;p&gt;At first I did not even understand API cost structures well. I was surprised when costs suddenly jumped while using it, and only after that did I realize that repetitive tasks should be moved to immediate execution as much as possible.&lt;/p&gt;

&lt;p&gt;I think exact numbers need to be re-verified constantly. But the direction itself was clear.&lt;/p&gt;

&lt;p&gt;For repetitive tasks, running verified scripts directly is far better than asking AI every time.&lt;/p&gt;
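&lt;p&gt;The GL-DIRECT idea can be sketched like this. The registry contents and function names are hypothetical; the real routing is more involved.&lt;/p&gt;

```python
# Sketch of GL-DIRECT routing: pre-saved, verified scripts run immediately,
# and only unmatched requests fall through to an LLM call.
GL_DIRECT = {
    "backup now": "scripts/backup.gl",     # hypothetical saved scripts
    "health check": "scripts/health.gl",
}

def route(user_input, run_script, ask_llm):
    """Frequent tasks skip the LLM entirely; everything else goes to it."""
    key = user_input.strip().lower()
    if key in GL_DIRECT:
        return run_script(GL_DIRECT[key])  # immediate local execution, no API cost
    return ask_llm(user_input)             # LLM generates a GarlicLang script
```

&lt;p&gt;A frequent command hits the saved script and never touches the API; only unknown requests go to the LLM.&lt;/p&gt;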

&lt;h2&gt;
  
  
  Current scale
&lt;/h2&gt;

&lt;p&gt;It is still a small personal project, but by my own standards, I feel I have come quite far.&lt;/p&gt;

&lt;p&gt;Currently my system has roughly these elements.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;agent.py&lt;/td&gt;
&lt;td&gt;1,210 lines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total .py files&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skills&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GL-DIRECT&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GarlicLang interpreter&lt;/td&gt;
&lt;td&gt;967 lines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GarlicLang commands&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GarlicLang verification types&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;knowledge.db documents&lt;/td&gt;
&lt;td&gt;6,488&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GL scripts&lt;/td&gt;
&lt;td&gt;359&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backup tar.gz&lt;/td&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLMs used&lt;/td&gt;
&lt;td&gt;Gemini, DeepSeek, MiniMax, Cerebras, Groq, NVIDIA, Claude (external analysis support)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These numbers are based on values I re-checked from my system before writing this post. Still, I think numbers are always something that needs re-verification. From my experience, numbers are the part that most easily goes wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I still do not know
&lt;/h2&gt;

&lt;p&gt;This system is useful for me, but whether it would be equally useful for someone else — I still do not know. I have grown this structure to fit my own work style and judgment flow, so from another person's perspective, it might look overly complex and unfriendly.&lt;/p&gt;

&lt;p&gt;I tried local LLMs multiple times too, but due to phone hardware limitations, it was not practical yet. I think if I attach something like a Mac Mini later, different results might come out.&lt;/p&gt;

&lt;p&gt;And above all, there is still so much I do not know. Things like indentation rules, AST pattern matching, regression testing, change history management — I am still learning these little by little while wrestling with AIs. The way I learned coding is closer to keep asking AI, failing, pasting again, restoring, rather than from books or courses.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I want to do next
&lt;/h2&gt;

&lt;p&gt;My next goal right now is to fix the variable scope bug in the GarlicLang interpreter. And I want to stack CHANGELOG records more structurally, so that the agent can search its own history.&lt;/p&gt;

&lt;p&gt;I feel more and more strongly that failure records are far more important than success records. AI repeats the same mistakes often, and humans forget quickly too. So I think task logs, failure records, and restore points should not be just notes — they should be part of the system.&lt;/p&gt;

&lt;p&gt;Someday I want to open-source it. But right now it is still close to a personal system, and my philosophy and usage patterns are too deeply embedded in it, so I think it would be hard for others to use right away.&lt;/p&gt;

&lt;p&gt;Still, I see possibility. If a farmer with no PC and not much coding experience has come this far, I think in the future, more ordinary people might be able to build their own AI systems in their own way.&lt;/p&gt;

&lt;p&gt;I do not want to make a grand declaration. But what I felt over this time is this.&lt;/p&gt;

&lt;p&gt;In the AI era, it is not only important to get good answers. Building a structure that fits your own environment, recording failures, attaching verification and restore, and handling multiple AIs in your own way — I think these abilities will become more and more important too.&lt;/p&gt;

&lt;p&gt;This post is closer to an observation log of one garlic farmer from Korea who kept experimenting with one phone, rather than a finished success story.&lt;/p&gt;

&lt;p&gt;Thank you for reading this long post.&lt;/p&gt;

&lt;p&gt;from garlic farmer&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>terminal</category>
      <category>android</category>
    </item>
    <item>
      <title>Is this closer to an early AI OS, or just an AI agent?</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Fri, 13 Mar 2026 17:13:37 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/is-this-closer-to-an-early-ai-os-or-just-an-ai-agent-151m</link>
      <guid>https://dev.to/garlicfarmer/is-this-closer-to-an-early-ai-os-or-just-an-ai-agent-151m</guid>
      <description>&lt;p&gt;Hello. I am writing because I want to hear opinions.&lt;br&gt;
I am a non-English speaker, so I got help from AI for the English expression. I am building a personal AI system only with my phone (Android) and Termux, without a PC. I open several AI chat windows and make them collaborate indirectly by copy and paste, and almost every day I keep growing the structure little by little.&lt;br&gt;
Before, I just called it “garlic-agent.” But as the project became bigger, I started to think maybe this is getting closer to an early AI OS rather than only a simple AI agent. Of course, I do not mean OS in the same sense as a normal operating system. But inside, it is moving more and more like one small operating environment.&lt;br&gt;
The rough structure is like this.&lt;br&gt;
It runs on a phone (Android) with Termux. There is a web UI for conversation. I connect several LLM providers. There is a tool calling structure such as read, search, exec, write, patch, garlic, and others. There is knowledge.db with RAG search for documents. There is also my own scripting language, GarlicLang, which can run tasks together with verification. There is skill loader, direct script, and route branching structure. There are separated modes like “talk mode” and “work mode.” There are background processes like watchdog, watcher, and autosnap. There are also logging, anchor-based restore, and RESULT_SUMMARY style record systems.&lt;br&gt;
Especially, the backup and restore side is not only simple file copy level.&lt;br&gt;
I made multi-step recovery layers, like snapshot before file change, .bak backup, automatic snapshot, anchor-based restore, project-level tar backup, and Google Drive connection. Because I do real-time collaboration with several AIs in a phone environment, fast rollback and restore became one of the most important functions.&lt;br&gt;
So now, it is already beyond the level of “an AI using a few tools.” It is more like one circular structure connecting task routing, execution, verification, logging, backup, and recovery.&lt;br&gt;
Of course, it is not a fully autonomous system. There are still many limits, and many times I feel I do not even understand half of this mechanism myself. Coding is not my professional field. But while continuing this project, it looks to me closer to “an early stage where a semi-autonomous AI system is expanding in an OS-like direction” than to just one normal AI agent.&lt;br&gt;
For reference, I already posted a few related posts and working videos about this project on Reddit. They are in Korean, but maybe they can still help for understanding how it actually moves.&lt;br&gt;
From your view, what is the most accurate name for this kind of thing?&lt;br&gt;
Just an AI agent?&lt;br&gt;
A multi-tool agent system?&lt;br&gt;
An early form of AI OS?&lt;br&gt;
An agent-based personal operating environment?&lt;br&gt;
Or something else?&lt;br&gt;
I would be thankful if you can tell me honestly, without exaggeration, what technical category this is closer to. I want some help in giving it a proper name.&lt;br&gt;
from garlic farmer&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>discuss</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building a Personal AI Agent on a Phone: garlic-agent Architecture Analysis by garlicfarmer</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Tue, 10 Mar 2026 02:05:58 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/building-a-personal-ai-agent-on-a-phone-garlic-agent-architecture-analysis-by-garlicfarmer-5389</link>
      <guid>https://dev.to/garlicfarmer/building-a-personal-ai-agent-on-a-phone-garlic-agent-architecture-analysis-by-garlicfarmer-5389</guid>
      <description>&lt;p&gt;I am a garlic farmer from South Korea. Not a developer. Not an engineer. Just a farmer who grows garlic.&lt;/p&gt;

&lt;p&gt;But with AI by my side, I built something I still can't fully believe — a personal AI agent system running entirely on my Android phone using Termux. I don't even understand half of what I built, honestly. Every single step was done in Korean, and AI helped me translate. So forgive me if the English feels a bit rough. That roughness is part of who I am.&lt;/p&gt;

&lt;p&gt;This is the structure my agent analyzed and organized. I only understood the full picture after seeing it laid out like this. The architecture is complex, but it carries my personal design philosophy — everything runs on one phone, nothing more.&lt;/p&gt;

&lt;p&gt;config.json — The Heart of System Configuration&lt;/p&gt;

&lt;p&gt;config.json holds the entire configuration for garlic-agent. Every program reads this file to operate.&lt;/p&gt;

&lt;p&gt;The security section defines what the agent can and cannot do. allowed_dirs lists the folders the agent is permitted to read and write: /garlic-agent/, /garliclang_full/, ~/gdrive_backup/, /storage/emulated/0/Download/ — only these 4 directories. If the agent tries to touch anything outside, security.py blocks it. For example, if the agent tries to read /etc/passwd, it gets "Access Denied."&lt;/p&gt;
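
&lt;p&gt;A rough Python sketch of how such an allowed_dirs check could look. The directory list comes from the description above (I assume the first two live under the home folder); the function name and the matching logic are my own simplification, not the real security.py code.&lt;/p&gt;

```python
import os

# Directory allow-list, mirroring the post. Home-relative paths for the
# first two entries are my assumption.
ALLOWED_DIRS = [
    "~/garlic-agent/",
    "~/garliclang_full/",
    "~/gdrive_backup/",
    "/storage/emulated/0/Download/",
]

def is_allowed(path):
    # Resolve ~, symlinks and ../ so a path cannot escape with tricks.
    real = os.path.realpath(os.path.expanduser(path))
    for d in ALLOWED_DIRS:
        base = os.path.realpath(os.path.expanduser(d))
        # Simple prefix check; a production version should compare
        # path components rather than raw string prefixes.
        if real == base or real.startswith(base + os.sep):
            return True
    return False

print(is_allowed("/etc/passwd"))                         # False -> "Access Denied"
print(is_allowed("/storage/emulated/0/Download/a.txt"))  # True
```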

&lt;p&gt;blocked_commands is the list of forbidden commands. rm -rf / (delete entire system), rm -rf ~ (delete home folder), mkfs (format disk), dd if= (overwrite disk). These dangerous commands are blocked no matter what. Database commands are blocked too: DROP TABLE, DELETE FROM, ALTER TABLE, TRUNCATE — preventing the agent from wiping knowledge.db tables. Even sqlite3 direct execution is blocked, forcing the agent to access the DB only through tool:search.&lt;/p&gt;
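
&lt;p&gt;A minimal sketch of a blocked_commands filter. The command patterns are taken from the list above; the regex matching itself is my own simplification of what security.py might do.&lt;/p&gt;

```python
import re

# Forbidden-command patterns, following the post's examples.
BLOCKED = [
    r"rm\s+-rf\s+/(\s|$)",   # delete entire system
    r"rm\s+-rf\s+~",         # delete home folder
    r"\bmkfs\b",             # format disk
    r"\bdd\s+if=",           # overwrite disk
    r"\bDROP\s+TABLE\b",
    r"\bDELETE\s+FROM\b",
    r"\bALTER\s+TABLE\b",
    r"\bTRUNCATE\b",
    r"^\s*sqlite3\b",        # no direct DB access, use tool:search instead
]

def is_blocked(cmd):
    return any(re.search(p, cmd, re.IGNORECASE) for p in BLOCKED)

print(is_blocked("rm -rf /"))           # True
print(is_blocked("ls ~/garlic-agent"))  # False
```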

&lt;p&gt;allowed_domains lists the servers the agent can call APIs on. Only LLM servers like api.cerebras.ai, api.groq.com, generativelanguage.googleapis.com (Gemini), and api.minimax.io are allowed. The agent cannot connect to any other websites.&lt;/p&gt;

&lt;p&gt;key_patterns defines API key patterns. Strings matching patterns like nvapi-, AIzaSy, gsk_ are automatically masked if they appear in agent responses. This prevents accidental API key exposure.&lt;/p&gt;
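
&lt;p&gt;The masking step can be sketched like this. The key prefixes (nvapi-, AIzaSy, gsk_) come from the paragraph above; the exact regex bodies and the masking routine are my own guesses for illustration.&lt;/p&gt;

```python
import re

# API-key patterns; the character classes after each prefix are assumptions.
KEY_PATTERNS = [
    r"nvapi-[A-Za-z0-9_-]+",
    r"AIzaSy[A-Za-z0-9_-]+",
    r"gsk_[A-Za-z0-9]+",
]

def mask_keys(text):
    # Replace anything that looks like a key before it reaches the user.
    for pat in KEY_PATTERNS:
        text = re.sub(pat, "[MASKED]", text)
    return text

print(mask_keys("my key is gsk_abc123XYZ, keep it secret"))
# -> "my key is [MASKED], keep it secret"
```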

&lt;p&gt;The providers section lists available AI companies. Each provider has model names and max_context (maximum tokens it can process). DeepSeek had max_output limited to 8192, which was the cause of output truncation earlier.&lt;/p&gt;

&lt;p&gt;The agent section is the core agent configuration. max_loops 20 means the agent can call tools up to 20 times per task — infinite loop prevention. exec_timeout 30 means any tool:exec command that doesn't finish within 30 seconds gets killed. max_write_size 102400 means maximum file size per tool:write is 100KB — preventing the agent from creating 1GB files. context_turns 35 means only the most recent 35 turns of conversation history are sent to the LLM. max_tokens 38400 is the maximum token count for LLM responses.&lt;/p&gt;
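
&lt;p&gt;Two of those limits can be sketched in a few lines. The numbers (30-second timeout, 100 KB write cap) mirror config.json; the enforcement code is my guess at the mechanism, not the real agent.py.&lt;/p&gt;

```python
import subprocess

EXEC_TIMEOUT = 30          # seconds, mirrors agent.exec_timeout
MAX_WRITE_SIZE = 102400    # bytes, mirrors agent.max_write_size (100 KB)

def run_tool_exec(cmd):
    # Kill any command that does not finish within the timeout.
    try:
        out = subprocess.run(cmd, shell=True, capture_output=True,
                             text=True, timeout=EXEC_TIMEOUT)
        return out.stdout
    except subprocess.TimeoutExpired:
        return "[killed: exceeded exec_timeout]"

def allow_write(content):
    # Reject tool:write payloads over the size cap.
    size = len(content.encode("utf-8"))
    too_big = size > MAX_WRITE_SIZE
    return not too_big

print(run_tool_exec("echo hello"))
print(allow_write("x" * 200000))   # False: larger than 100 KB
```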

&lt;p&gt;The web section configures the web server. It runs on port 8080 with token garlic2026 as the authentication token, required when accessing localhost:8080 from the browser.&lt;/p&gt;

&lt;p&gt;The verification section handles validation settings. llm_verify false means the LLM self-verification feature is off — it would double API costs. db_search true means RAG search is enabled. garliclang true means GarlicLang execution is active. strict normal means security level is standard.&lt;/p&gt;

&lt;p&gt;tool_limit 5 means the agent can use tools up to 5 times per turn. This is why DeepSeek got blocked earlier when it exceeded 7 calls.&lt;/p&gt;

&lt;p&gt;security.py — The Security Guard&lt;/p&gt;

&lt;p&gt;If config.json sets the rules, security.py enforces them. Every time the agent tries to execute a command via tool:exec, security.py inspects it. Matched against blocked_commands? Blocked. Accessing paths outside allowed_dirs? Blocked. Suspicious patterns? Blocked. It stands between the agent and the actual system like a guard.&lt;/p&gt;

&lt;p&gt;tools.py — The Tool Executor&lt;/p&gt;

&lt;p&gt;When the agent writes [tool:read ~/garlic-agent/SOUL.md], agent.py parses it and passes it to tools.py. tools.py actually reads the file and returns the result. The execution logic for all 9 tools (read, write, exec, patch, search, garlic, screen, app) lives here. The _safe_backup function that automatically creates backups during tool:write is also inside tools.py.&lt;/p&gt;

&lt;p&gt;agent.py — The Brain&lt;/p&gt;

&lt;p&gt;The core of the system. Over 1,200 lines. Here's what it does: when the user sends a message from the web UI, web.py receives it and passes it to agent.py. agent.py reads SOUL.md, reads TOOLS.md, combines the user message with conversation history, and sends everything to the LLM. When the LLM responds, it parses the response for tool calls like [tool:exec ...]. If found, it executes them via tools.py, sends results back to the LLM for the next response. This loop can repeat up to max_loops 20 times.&lt;/p&gt;
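
&lt;p&gt;The loop described above can be sketched roughly like this. The [tool:...] syntax and the max_loops value come from this post; the function names, the regex, and the stand-in LLM/tool callables are my own illustration, not the actual agent.py implementation.&lt;/p&gt;

```python
import re

MAX_LOOPS = 20  # mirrors config.json agent.max_loops

# Parse calls like [tool:exec ls] out of an LLM reply.
TOOL_RE = re.compile(r"\[tool:(\w+)\s*([^\]]*)\]")

def agent_loop(call_llm, run_tool, user_msg):
    """Call the LLM, execute any tool calls in its reply, feed results
    back, and stop when no tools remain or max_loops is reached.
    call_llm and run_tool are stand-ins for the real provider/tool code."""
    msg = user_msg
    for _ in range(MAX_LOOPS):
        reply = call_llm(msg)
        calls = TOOL_RE.findall(reply)
        if not calls:
            return reply
        results = [run_tool(name, arg) for name, arg in calls]
        msg = "tool results: " + "; ".join(results)
    return "[stopped: max_loops reached]"

# Tiny demo with fake LLM/tool functions (assumptions, not real providers):
replies = iter(["[tool:read ~/notes.txt]", "done"])
print(agent_loop(lambda m: next(replies), lambda n, a: n + " ok", "hi"))
```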

&lt;p&gt;The lines 1114, 1117, and 1119 that were modified control tool result buffer sizes. Line 1114 is the web UI display size (8,000 chars), line 1117 is the size sent to the LLM (20,000 chars), and line 1119 is the tool log size (4,000 chars).&lt;/p&gt;

&lt;p&gt;web.py — The Front Door&lt;/p&gt;

&lt;p&gt;A 5-line shim file. from web_server import main — the actual logic is in web_server.py. During yesterday's modularization, the 1,700-line web.py was split into 7 modules: web_server.py (boot), web_state.py (state management), web_auth.py (authentication), web_routes_admin.py (admin API), web_routes_docs.py (docs API), web_routes_gl.py (GarlicLang execution), web_routes_chat.py (chat API). When the browser hits localhost:8080, these modules handle the requests.&lt;/p&gt;

&lt;p&gt;garliclang_bridge.py — The GarlicLang Bridge&lt;/p&gt;

&lt;p&gt;Connects agent.py to the GarlicLang interpreter. When the agent calls [tool:garlic test.gl], garliclang_bridge.py passes the GL file to the interpreter, executes it, and returns PASS/FAIL results. GarlicLang itself is installed separately in ~/garliclang_full/.&lt;/p&gt;

&lt;p&gt;search.py — The Search Engine&lt;/p&gt;

&lt;p&gt;Activated when tool:search is called. It performs hybrid search. First, FTS5 keyword matching pulls the top 10 results. Then it sends the query to llama-server, converts it to a 768-dimensional vector, and pulls 5 more via cosine similarity. Duplicates are removed and up to 8 results are returned. This is why searching for "자율수정" (self-repair in Korean) also finds English documents containing "self-repair" — thanks to vector search.&lt;/p&gt;
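
&lt;p&gt;The merge step of that hybrid search can be sketched as follows. The numbers (top 10 keyword hits, 5 vector hits, up to 8 results) mirror the paragraph above; the cosine function and the dedup order are my own illustration of the idea, not the real search.py.&lt;/p&gt;

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors (768-dim in the real system).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_merge(fts_ids, vector_hits, limit=8):
    """fts_ids: top-10 FTS5 doc ids; vector_hits: (doc_id, similarity) pairs.
    Deduplicate, keyword results first, cap at `limit`."""
    ranked = [doc_id for doc_id, _ in
              sorted(vector_hits, key=lambda h: -h[1])][:5]
    merged = []
    for doc_id in fts_ids + ranked:
        if doc_id not in merged:
            merged.append(doc_id)
    return merged[:limit]

print(cosine([1, 0], [1, 0]))                     # 1.0
print(hybrid_merge([1, 2, 3], [(3, 0.9), (7, 0.8)]))  # [1, 2, 3, 7]
```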

&lt;p&gt;SOUL.md — The Constitution&lt;/p&gt;

&lt;p&gt;Defines the agent's behavioral rules in a layered structure. Layer 1 has the highest authority: "Tool results are truth. No hallucination." The agent reads this file every turn. Rules like "Don't use tools in chat mode" and "Only use tools in work mode" are written here. DeepSeek using tools in chat mode today was a violation of this constitution.&lt;/p&gt;

&lt;p&gt;TOOLS.md — The Tool Manual&lt;/p&gt;

&lt;p&gt;Contains exact usage instructions for all 9 tools. The agent reads this every turn before calling tools. Rules like "max 5 tools per turn" and "stop after 3 consecutive failures" are here. The GWRITE wrapper rule added today is also documented here.&lt;/p&gt;

&lt;p&gt;HANDOVER.md — The Handover Document&lt;/p&gt;

&lt;p&gt;A document for new AIs joining the system. Contains system philosophy, user profile, and past lessons. Information like "the user doesn't know coding but understands the system" and "the agent gets scolded when it makes mistakes" — real battlefield wisdom.&lt;/p&gt;

&lt;p&gt;startall.sh — The Ignition Key&lt;/p&gt;

&lt;p&gt;Like a car's ignition key. Run this one script and web.py, llama-server, watcher.sh, watchdog.sh, and autosnap.sh all start up. It cleans logs, kills leftover processes, and starts all services in order. Open a new Termux session, type startall, and everything comes alive.&lt;/p&gt;

&lt;p&gt;watchdog.sh — The Auto-Recovery Worker&lt;/p&gt;

&lt;p&gt;Checks every 3 seconds if web.py is alive. If dead, it automatically restarts it. The web.py PID changes observed during today's testing were traces of watchdog doing its recovery job.&lt;/p&gt;

&lt;p&gt;watcher.sh — The File Watcher&lt;/p&gt;

&lt;p&gt;Uses inotifywait to monitor file changes. When .py, .md, .json, .gl, or .html files are modified inside ~/garlic-agent/, it detects the change and calls auto_index.py, which re-indexes the changed file into knowledge.db.&lt;/p&gt;

&lt;p&gt;autosnap.sh — The Auto-Photographer&lt;/p&gt;

&lt;p&gt;Similar to watcher.sh but with a different role. When files change, it calls gsave.sh to save the pre-change content to the file_snapshots table. If the same file changes again within 3 seconds, it skips the duplicate save. Every modification automatically gets a snapshot taken.&lt;/p&gt;

&lt;p&gt;anchors.json — Time Bookmarks&lt;/p&gt;

&lt;p&gt;Attaches labels to specific points in time. PRE_WRITE means "right before file modification," AUTOSNAP means "auto-snapshot moment," RECOVERY means "after successful operation," GUARD_AUTO means "security block moment," ROLLBACK means "restoration point." Each anchor links to a snapshot_id, so running the "anchor restore N" command (앵커복원 N) reverts files to that point in time.&lt;/p&gt;

&lt;p&gt;fact_ledger.json — The Work Journal&lt;/p&gt;

&lt;p&gt;Recorded every time the agent executes a tool. Maintains up to 200 entries. Logs "what time, which tool was used, and what the result was." Used later to check "what did the agent do today."&lt;/p&gt;
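
&lt;p&gt;Keeping at most 200 entries can be done with a simple trim on every append. The 200-entry cap and the idea of logging time, tool, and result come from above; the function and field names are my own sketch, not the real fact_ledger code.&lt;/p&gt;

```python
import datetime

MAX_ENTRIES = 200  # fact_ledger.json keeps at most this many records

def append_fact(ledger, tool, result):
    """Append one tool-execution record and trim to the newest 200."""
    ledger.append({
        "ts": datetime.datetime.now().isoformat(),
        "tool": tool,
        "result": result,
    })
    return ledger[-MAX_ENTRIES:]

ledger = []
for i in range(205):
    ledger = append_fact(ledger, "exec", "run " + str(i))
print(len(ledger), ledger[-1]["result"])  # 200 "run 204"
```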

&lt;p&gt;scripts/ folder — The Tool Shed&lt;/p&gt;

&lt;p&gt;Contains operational scripts: backup.sh (full backup), gsave.sh (snapshot save), garlic_undo.sh (emergency restore), clean_bak.sh (.bak cleanup), regression_test.sh (regression test), http_golden_test.py (HTTP test), phase456_verify.gl (GL verification), and more.&lt;/p&gt;

&lt;p&gt;skills/ folder — The App Store&lt;/p&gt;

&lt;p&gt;12 skills exist as individual folders. Each folder contains a SKILL.md with trigger keywords and GL scripts to execute. Say "show backup list" and the backup-manager skill matches, executing list_backups.gl. To add a new skill, just create a folder and write one SKILL.md. That's it.&lt;/p&gt;
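
&lt;p&gt;The trigger matching can be sketched like this. The "backup-manager" skill and list_backups.gl come from the example above; the second skill, the trigger words, and the matching logic are hypothetical, just to show the shape of the idea.&lt;/p&gt;

```python
# Skill registry as the loader might see it after reading each SKILL.md.
# "log-viewer" is a made-up example, not a real skill in the project.
SKILLS = {
    "backup-manager": {"triggers": ["backup"], "script": "list_backups.gl"},
    "log-viewer": {"triggers": ["log", "history"], "script": "show_logs.gl"},
}

def match_skill(user_msg):
    # First skill whose trigger keyword appears in the message wins.
    msg = user_msg.lower()
    for name, skill in SKILLS.items():
        if any(t in msg for t in skill["triggers"]):
            return name, skill["script"]
    return None, None

print(match_skill("show backup list"))  # ("backup-manager", "list_backups.gl")
```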

&lt;p&gt;How Everything Connects&lt;/p&gt;

&lt;p&gt;All these files are organically connected. User sends message from web UI → web.py receives it → agent.py reads SOUL.md + TOOLS.md, sends to LLM → parses tool calls from response → tools.py executes (security.py validates) → results sent back to LLM → if files changed, watcher.sh + autosnap.sh detect it → knowledge.db updated + snapshot saved → if problems arise, restore via anchors.json. The entire thing is one circular system.&lt;/p&gt;

&lt;p&gt;Full System Diagram&lt;/p&gt;

&lt;p&gt;╔══════════════════════════════════════════════════════════════════════════╗&lt;br&gt;
║                    garlic-agent Full System Diagram                       ║&lt;br&gt;
║                    (Android Termux Environment)                          ║&lt;br&gt;
╚══════════════════════════════════════════════════════════════════════════╝&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                      ┌─────────────┐
                      │  User (Phone)│
                      │  Browser     │
                      └──────┬──────┘
                             │ localhost:8080
                             ▼
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;╔══════════════════════════════════════════════════════════════════════════╗&lt;br&gt;
║  Web Server Layer                                                        ║&lt;br&gt;
║                                                                          ║&lt;br&gt;
║  ┌─────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────────┐   ║&lt;br&gt;
║  │ web.py  │→│ web_server.py │→│ web_auth.py   │→│ token verify    │   ║&lt;br&gt;
║  │ (5-line │  │ (207L boot)  │  │ (auth logic)  │  │ garlic2026     │   ║&lt;br&gt;
║  │  shim)  │  └──────┬───────┘  └──────────────┘  └────────────────┘   ║&lt;br&gt;
║  └─────────┘         │                                                   ║&lt;br&gt;
║                      ├──→ web_routes_chat.py    (chat: 8 handlers)      ║&lt;br&gt;
║                      ├──→ web_routes_docs.py    (docs: GET 7 + POST 2)  ║&lt;br&gt;
║                      ├──→ web_routes_admin.py   (admin: 6 handlers)     ║&lt;br&gt;
║                      └──→ web_routes_gl.py      (GarlicLang execution)  ║&lt;br&gt;
╚══════════════════════════════╤═══════════════════════════════════════════╝&lt;br&gt;
                               │ user message passed&lt;br&gt;
                               ▼&lt;br&gt;
╔══════════════════════════════════════════════════════════════════════════╗&lt;br&gt;
║  Brain Layer — agent.py (1200+ lines)                                    ║&lt;br&gt;
║                                                                          ║&lt;br&gt;
║  ┌────────────────────────────────────────────────────────────────┐     ║&lt;br&gt;
║  │  1. Read SOUL.md (behavioral rules = constitution)              │     ║&lt;br&gt;
║  │  2. Read TOOLS.md (tool usage manual)                           │     ║&lt;br&gt;
║  │  3. Assemble conversation history (recent 35 turns)             │     ║&lt;br&gt;
║  │  4. Attach RAG search results (knowledge.db → max 8 docs)      │     ║&lt;br&gt;
║  │  5. Send to LLM ─────────────────────────────────────┐         │     ║&lt;br&gt;
║  │  6. Receive response ←────────────────────────────────┘         │     ║&lt;br&gt;
║  │  7. Parse tool calls ([tool:exec ...], [tool:write ...], etc)   │     ║&lt;br&gt;
║  │  8. Execute tools → get results → send back to LLM (max 20)    │     ║&lt;br&gt;
║  └────────────────────┬───────────────────────────────────────────┘     ║&lt;br&gt;
║                       │                                                  ║&lt;br&gt;
║  ┌────────────────────┴───────────────────────────────────────┐         ║&lt;br&gt;
║  │  Buffer Settings (modified today)                            │         ║&lt;br&gt;
║  │  L1114: Web UI display  → 8,000 chars                       │         ║&lt;br&gt;
║  │  L1117: LLM delivery    → 20,000 chars                      │         ║&lt;br&gt;
║  │  L1119: Tool log        → 4,000 chars                       │         ║&lt;br&gt;
║  └────────────────────────────────────────────────────────────┘         ║&lt;br&gt;
╚══════════╤══════════════════════╤═══════════════════════════════════════╝&lt;br&gt;
           │ tool calls            │ API calls&lt;br&gt;
           ▼                      ▼&lt;br&gt;
╔════════════════════════╗  ╔═══════════════════════════════════════════╗&lt;br&gt;
║  Security Layer         ║  ║  LLM Providers (config.json providers)    ║&lt;br&gt;
║                        ║  ║                                           ║&lt;br&gt;
║  ┌──────────────────┐  ║  ║  ┌─────────┐ ┌──────────┐ ┌──────────┐ ║&lt;br&gt;
║  │  security.py     │  ║  ║  │ Gemini  │ │ DeepSeek │ │ MiniMax  │ ║&lt;br&gt;
║  │                  │  ║  ║  └─────────┘ └──────────┘ └──────────┘ ║&lt;br&gt;
║  │  Checks:         │  ║  ║  ┌─────────┐ ┌──────────┐ ┌──────────┐ ║&lt;br&gt;
║  │  · blocked_cmds  │  ║  ║  │ Groq    │ │ NVIDIA   │ │ Cerebras │ ║&lt;br&gt;
║  │  · allowed_dirs  │  ║  ║  └─────────┘ └──────────┘ └──────────┘ ║&lt;br&gt;
║  │  · allowed_doms  │  ║  ║                                           ║&lt;br&gt;
║  │  · key_patterns  │  ║  ║  Config: max_context, max_output / model  ║&lt;br&gt;
║  └────────┬─────────┘  ║  ╚═══════════════════════════════════════════╝&lt;br&gt;
║           │ passed      ║&lt;br&gt;
╚═══════════╪════════════╝&lt;br&gt;
            ▼&lt;br&gt;
╔══════════════════════════════════════════════════════════════════════════╗&lt;br&gt;
║  Tool Execution Layer — tools.py                                         ║&lt;br&gt;
║                                                                          ║&lt;br&gt;
║  ┌──────────┬──────────┬──────────┬──────────┬──────────────────────┐   ║&lt;br&gt;
║  │tool:read │tool:write│tool:exec │tool:patch│tool:search           │   ║&lt;br&gt;
║  │file read │file write│cmd exec  │file patch│DB search             │   ║&lt;br&gt;
║  │          │+auto bak │30s limit │+auto bak │FTS5+vector           │   ║&lt;br&gt;
║  ├──────────┼──────────┼──────────┼──────────┼──────────────────────┤   ║&lt;br&gt;
║  │tool:     │tool:     │tool:app  │          │                      │   ║&lt;br&gt;
║  │garlic    │screen    │app launch│          │                      │   ║&lt;br&gt;
║  │GL exec   │capture   │          │          │                      │   ║&lt;br&gt;
║  └────┬─────┴─────┬────┴──────┬──┴──────────┴──────────────────────┘   ║&lt;br&gt;
║       │           │           │                                          ║&lt;br&gt;
║       ▼           │           ▼                                          ║&lt;br&gt;
║  garliclang_      │     search.py ──→ knowledge.db                      ║&lt;br&gt;
║  bridge.py        │           │                                          ║&lt;br&gt;
║  (GL interpreter) │           ├──→ FTS5 (keyword)                        ║&lt;br&gt;
║       │           │           └──→ llama-server :8081 (vector)           ║&lt;br&gt;
║       ▼           │                                                      ║&lt;br&gt;
║  [verify] PASS/   │                                                      ║&lt;br&gt;
║          FAIL     │                                                      ║&lt;br&gt;
║                   │                                                      ║&lt;br&gt;
║       ┌───────────┘                                                      ║&lt;br&gt;
║       ▼                                                                  ║&lt;br&gt;
║  _safe_backup → archive/bak/ (timestamp.bak)                            ║&lt;br&gt;
║              → file_snapshots (knowledge.db)                             ║&lt;br&gt;
║              → anchors.json (PRE_WRITE anchor)                           ║&lt;br&gt;
╚══════════════════════════════════════════════════════════════════════════╝&lt;/p&gt;

&lt;p&gt;╔══════════════════════════════════════════════════════════════════════════╗&lt;br&gt;
║  Background Process Layer — started by startall.sh                       ║&lt;br&gt;
║                                                                          ║&lt;br&gt;
║  ┌──────────────────┐  ┌───────────────────┐  ┌─────────────────────┐  ║&lt;br&gt;
║  │  watchdog.sh     │  │  watcher.sh       │  │  autosnap.sh       │  ║&lt;br&gt;
║  │                  │  │                   │  │                     │  ║&lt;br&gt;
║  │  Every 3 sec:    │  │  inotifywait:     │  │  inotifywait:       │  ║&lt;br&gt;
║  │  web.py alive?   │  │  detect file chg  │  │  detect file chg    │  ║&lt;br&gt;
║  │  dead → restart  │  │       │           │  │       │             │  ║&lt;br&gt;
║  │                  │  │       ▼           │  │       ▼             │  ║&lt;br&gt;
║  │  llama-server    │  │  auto_index.py    │  │  gsave.sh           │  ║&lt;br&gt;
║  │  alive?          │  │  → knowledge.db   │  │  → file_snapshots   │  ║&lt;br&gt;
║  │  dead → restart  │  │    update docs    │  │    save snapshot     │  ║&lt;br&gt;
║  │                  │  │    refresh embed  │  │  → anchors.json     │  ║&lt;br&gt;
║  │                  │  │                   │  │    AUTOSNAP record   │  ║&lt;br&gt;
║  └──────────────────┘  └───────────────────┘  └─────────────────────┘  ║&lt;br&gt;
║                                                                          ║&lt;br&gt;
║  ┌──────────────────────────────────────────────────────────────────┐   ║&lt;br&gt;
║  │  llama-server (:8081)                                            │   ║&lt;br&gt;
║  │  nomic-embed-text model (768-dim)                                │   ║&lt;br&gt;
║  │  Role: text → vector conversion (embedding generation)           │   ║&lt;br&gt;
║  │  Used by: search.py (search), auto_index.py (indexing)           │   ║&lt;br&gt;
║  └──────────────────────────────────────────────────────────────────┘   ║&lt;br&gt;
╚══════════════════════════════════════════════════════════════════════════╝&lt;/p&gt;

&lt;p&gt;╔══════════════════════════════════════════════════════════════════════════╗&lt;br&gt;
║  Data Storage Layer                                                      ║&lt;br&gt;
║                                                                          ║&lt;br&gt;
║  ┌─────────────────────────────────────────────────────────────────┐    ║&lt;br&gt;
║  │  knowledge.db (225MB)                                           │    ║&lt;br&gt;
║  │                                                                 │    ║&lt;br&gt;
║  │  ┌─────────────────────┐  ┌──────────────────────────────┐    │    ║&lt;br&gt;
║  │  │ docs (6,611)        │  │ file_snapshots (2,887)       │    │    ║&lt;br&gt;
║  │  │ documents + embeds  │  │ snapshots + timestamps       │    │    ║&lt;br&gt;
║  │  │ + FTS5 full-text    │  │ + path index                │    │    ║&lt;br&gt;
║  │  └─────────────────────┘  └──────────────────────────────┘    │    ║&lt;br&gt;
║  └─────────────────────────────────────────────────────────────────┘    ║&lt;br&gt;
║                                                                          ║&lt;br&gt;
║  ┌──────────────────┐  ┌──────────────┐  ┌─────────────────────────┐   ║&lt;br&gt;
║  │ anchors.json     │  │ fact_ledger  │  │ config.json             │   ║&lt;br&gt;
║  │ time bookmarks   │  │ .json        │  │ full configuration      │   ║&lt;br&gt;
║  │ PRE_WRITE        │  │ tool exec    │  │ security/providers/     │   ║&lt;br&gt;
║  │ AUTOSNAP         │  │ log (200)    │  │ agent/web/verify/       │   ║&lt;br&gt;
║  │ RECOVERY         │  │              │  │ session/version          │   ║&lt;br&gt;
║  │ ROLLBACK         │  │              │  │                         │   ║&lt;br&gt;
║  └──────────────────┘  └──────────────┘  └─────────────────────────┘   ║&lt;br&gt;
║                                                                          ║&lt;br&gt;
║  ┌──────────────────────────────────────────────────────────────────┐   ║&lt;br&gt;
║  │  archive/bak/        .bak backup files (keep latest 3)          │   ║&lt;br&gt;
║  │  scripts/            operation scripts + GL files               │   ║&lt;br&gt;
║  │  skills/             12 skills (SKILL.md + GL-DIRECT)           │   ║&lt;br&gt;
║  │  /storage/.../Download/  tar.gz project backups                 │   ║&lt;br&gt;
║  │  Google Drive         rclone remote backup                      │   ║&lt;br&gt;
║  └──────────────────────────────────────────────────────────────────┘   ║&lt;br&gt;
╚══════════════════════════════════════════════════════════════════════════╝&lt;/p&gt;

&lt;p&gt;╔══════════════════════════════════════════════════════════════════════════╗&lt;br&gt;
║  Core MD Document Layer (agent reads every turn)                         ║&lt;br&gt;
║                                                                          ║&lt;br&gt;
║  ┌────────────┐ ┌────────────┐ ┌─────────────┐ ┌──────────────────┐   ║&lt;br&gt;
║  │ SOUL.md    │ │ TOOLS.md   │ │ HANDOVER.md │ │ GL_SYNTAX.md     │   ║&lt;br&gt;
║  │ Constitu-  │ │ Tool       │ │ Handover    │ │ GarlicLang       │   ║&lt;br&gt;
║  │ tion       │ │ Manual     │ │ Document    │ │ Grammar          │   ║&lt;br&gt;
║  │ Layer 1~5  │ │ 9 tools    │ │ For new AI  │ │ 32 commands      │   ║&lt;br&gt;
║  │ Supreme    │ │ Call rules │ │ Philosophy  │ │ 15 verify types  │   ║&lt;br&gt;
║  └────────────┘ └────────────┘ └─────────────┘ └──────────────────┘   ║&lt;br&gt;
║  ┌─────────────────┐ ┌──────────────────┐ ┌────────────────────────┐  ║&lt;br&gt;
║  │ BACKUP_POLICY.md│ │ AGENT_HANDBOOK.md│ │ GARLICLANG_SPEC.md     │  ║&lt;br&gt;
║  │ Backup policy   │ │ Agent mistake    │ │ GarlicLang design      │  ║&lt;br&gt;
║  │ 351 lines       │ │ collection       │ │ philosophy             │  ║&lt;br&gt;
║  │ 5-stage restore │ │ Common errors   │ │ Verify-centric DSL     │  ║&lt;br&gt;
║  └─────────────────┘ └──────────────────┘ └────────────────────────┘  ║&lt;br&gt;
╚══════════════════════════════════════════════════════════════════════════╝&lt;/p&gt;

&lt;p&gt;5-Stage Restore System&lt;/p&gt;

&lt;p&gt;①              ②                ③            ④              ⑤&lt;br&gt;
 [Anchor       DB Snapshot     .bak File     tar.gz       Google&lt;br&gt;
  Restore N]   file_snapshots  archive/bak/  Download/    Drive&lt;br&gt;
 anchors.json  (per second)    (per file)    (project)    (remote)&lt;br&gt;
     │               │                │              │              │&lt;br&gt;
     ▼               ▼                ▼              ▼              ▼&lt;br&gt;
  Fastest       Restore by       garlic_undo.sh  tar xzf       rclone&lt;br&gt;
  Web UI click  snapshot_id      1-line restore  full restore  download&lt;/p&gt;

&lt;p&gt;Garlic CMD Flow (Natural Language → Execution)&lt;/p&gt;

&lt;p&gt;User: "show backup list"&lt;br&gt;
         │&lt;br&gt;
         ▼&lt;br&gt;
 typo_guard (typo correction)&lt;br&gt;
         │&lt;br&gt;
         ▼&lt;br&gt;
 skill_loader (trigger match: "backup" → backup-manager)&lt;br&gt;
         │&lt;br&gt;
         ├─ GL-DIRECT script exists? ──→ YES ──→ execute immediately (API cost 0)&lt;br&gt;
         │                                        scripts/list_backups.gl&lt;br&gt;
         └─ NO ──→ pass to LLM ──→ generate GL ──→ execute ──→ show result&lt;br&gt;
                                      │&lt;br&gt;
                                      ▼&lt;br&gt;
                                save to scripts/ (next time → GL-DIRECT)&lt;/p&gt;

&lt;p&gt;This is the rough structure of garlic-agent. The version goes up every day, so this is the latest architecture of my personal project as of now. I am not a developer. I just had a problem I wanted to solve, and AI helped me solve it, one piece at a time, all in Korean, all on one phone.&lt;/p&gt;

&lt;p&gt;Thank you for reading this far. If the English feels clumsy in places, that is the honest voice of a Korean garlic farmer who built something with his own hands and a little help from AI.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>beginners</category>
      <category>terminal</category>
    </item>
    <item>
      <title>How a Korean Garlic Farmer and 3 LLMs Rebuilt an Error Handling System in 4 Hours on a Phone</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Fri, 27 Feb 2026 12:56:44 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/how-a-korean-garlic-farmer-and-3-llms-rebuilt-an-error-handling-system-in-4-hours-on-a-phone-4ei2</link>
      <guid>https://dev.to/garlicfarmer/how-a-korean-garlic-farmer-and-3-llms-rebuilt-an-error-handling-system-in-4-hours-on-a-phone-4ei2</guid>
      <description>&lt;p&gt;title: "4-Party Collaboration Technical White Paper: Human(Garlic farmer) × Claude Opus 4.5 × Gemini 3.0 × MiniMax 2.5"&lt;br&gt;
published: true&lt;br&gt;
tags: [GarlicLang, MultiLLM, DSL, Termux, AI]&lt;br&gt;
Disclaimer: This article is a personal experiment and log written by a garlic farmer in South Korea.&lt;br&gt;
All numerical values shown here were actually extracted from agents running on a Unihertz Titan 2 mobile phone.&lt;br&gt;
All responsibility lies with the human author.&lt;br&gt;
This document was originally written in Korean and translated into English.&lt;br&gt;
Because the data already resides in the phone's system, no separate processing was needed, and the article was finalized with only minimal review.&lt;br&gt;
This is a personal AI experiment by a garlic farmer in the countryside, so the English translation may be a bit rough — please be understanding.&lt;/p&gt;

&lt;p&gt;Date written: February 26, 2026&lt;/p&gt;

&lt;p&gt;Authors: Human (Project Director) · Claude Opus 4.5 (Architecture Design/Verification) · Google Gemini 3.0 (Implementation) · MiniMax 2.5 (Implementation/Verification)&lt;/p&gt;

&lt;p&gt;Keywords: GarlicLang, Multi-LLM Collaboration, Structured Error Handling, DSL, Termux, Autonomous Agent&lt;/p&gt;

&lt;h2&gt;1. Executive Summary&lt;/h2&gt;

&lt;p&gt;This document is a technical white paper recording the GarlicLang error handling improvement work carried out over the course of one day, February 26, 2026. One human and three LLM models (Claude Opus 4.5, Gemini 3.0, MiniMax 2.5) collaborated to convert the existing string-based error return system into a JSON-structured system.&lt;/p&gt;

&lt;p&gt;Key achievements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Converted error return format from strings to a 6-field JSON structure&lt;/li&gt;
&lt;li&gt;Automatic inclusion of Korean resolution suggestions for 11 error types&lt;/li&gt;
&lt;li&gt;Zero-downtime improvement through 4-step gradual implementation&lt;/li&gt;
&lt;li&gt;3 files modified (errors.py, interpreter.py, garliclang_bridge.py), AST verification 100% passed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Work duration: Approximately 4 hours (estimated 19:00 ~ 23:00 KST)&lt;/p&gt;

&lt;h2&gt;2. Background and Problem Definition&lt;/h2&gt;

&lt;p&gt;2.1 What is GarlicLang&lt;/p&gt;

&lt;p&gt;GarlicLang is a Korean-based DSL (Domain Specific Language) that runs in the Android Termux environment. It serves as an intermediate language that converts the user's natural language instructions into safe system commands.&lt;/p&gt;

&lt;p&gt;Basic syntax example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[파일쓰기]
  경로: ~/test.py
  내용: print("hello")

[실행]
  명령어: python3 ~/test.py

[검증]
  종류: 출력포함
  대상: hello
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;2.2 Limitations of the Existing Error Handling&lt;/p&gt;

&lt;p&gt;The error handling method of GarlicLang before improvement:&lt;/p&gt;

&lt;p&gt;errors.py (before improvement):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def err_kr(key: str, *args) -&amp;gt; str:
    msg = ERRORS_KR.get(key, key)
    if args:
        msg = msg.format(*args)
    return msg  # Simple string return
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;interpreter.py (before improvement):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;raise RuntimeError(err_kr('undefined_variable', name))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Output: "정의되지 않은 변수입니다: x" (Undefined variable: x)&lt;/p&gt;

&lt;p&gt;Problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;LLM has difficulty identifying the cause from the error message alone&lt;/li&gt;
&lt;li&gt;No error location information (line, block)&lt;/li&gt;
&lt;li&gt;No resolution suggestions&lt;/li&gt;
&lt;li&gt;Automatic correction/retry is inefficient&lt;/li&gt;
&lt;/ol&gt;
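&lt;p&gt;To make the contrast concrete, here is a minimal sketch (ours, not from the codebase) of how an automated caller fares with the plain string versus the structured form introduced later in this article:&lt;/p&gt;

```python
# With a plain string, an automated caller can only guess at the error type
# through brittle keyword matching:
legacy = "정의되지 않은 변수입니다: x"
is_undefined_var = "변수" in legacy

# With the structured form (fields as defined later in this article),
# the caller can branch reliably on error_type:
structured = {
    "error_type": "undefined_variable",
    "message": legacy,
    "line": None,
    "block": None,
    "context": None,
    "suggestion": None,
}
is_undefined_var2 = structured["error_type"] == "undefined_variable"
```

&lt;p&gt;The keyword test breaks as soon as the message wording changes; the field test does not.&lt;/p&gt;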

&lt;p&gt;2.3 Causes of LLM GL Script Generation Failures&lt;/p&gt;

&lt;p&gt;LLM GL generation failure patterns observed during work:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Type&lt;/th&gt;
&lt;th&gt;Frequency&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Undefined variable&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Missing [변수설정] when referencing $result&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Syntax confusion&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Mixing YAML or Python syntax&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Path error&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Confusion between ~/ and absolute paths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Argument mismatch&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Wrong number of arguments in function calls&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;3. Participating Models and Roles&lt;/h2&gt;

&lt;p&gt;3.1 Human (Project Director)&lt;/p&gt;

&lt;p&gt;Role: Final decision-making, work direction setting, quality verification&lt;/p&gt;

&lt;p&gt;Observed characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Directly executed commands and confirmed results in CLI environment&lt;/li&gt;
&lt;li&gt;Distributed and coordinated work between LLMs&lt;/li&gt;
&lt;li&gt;Intervened when unexpected problems occurred&lt;/li&gt;
&lt;li&gt;Demanded an "ultra-objective" approach&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3.2 Claude Opus 4.5 (Architecture Design/Verification)&lt;/p&gt;

&lt;p&gt;Role: Overall design, step-by-step plan establishment, verification command provision, documentation&lt;/p&gt;

&lt;p&gt;Observed characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proposed a 4-step gradual implementation strategy&lt;/li&gt;
&lt;li&gt;Immediately provided AST verification commands after each step completion&lt;/li&gt;
&lt;li&gt;Consistently applied the backup-first principle&lt;/li&gt;
&lt;li&gt;Responsible for writing instructions for MiniMax/Gemini&lt;/li&gt;
&lt;li&gt;Immediately presented alternatives when problems occurred&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long context retention (grasped full context across the 4-hour session)&lt;/li&gt;
&lt;li&gt;High system architecture understanding&lt;/li&gt;
&lt;li&gt;Safe work sequence design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weaknesses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cannot execute code directly (dependent on Human)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3.3 Google Gemini 3.0 (Implementation)&lt;/p&gt;

&lt;p&gt;Role: Steps 1-2 implementation (errors.py err_json addition, interpreter.py structuring)&lt;/p&gt;

&lt;p&gt;Observed characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Followed work rules after reading SOUL.md&lt;/li&gt;
&lt;li&gt;Safe modifications through GL script + Python combination&lt;/li&gt;
&lt;li&gt;Followed backup → modification → AST inspection sequence&lt;/li&gt;
&lt;li&gt;Automatically generated detailed work reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 1 work (err_json addition):&lt;br&gt;
Execution result: PASS:1 FAIL:0&lt;br&gt;
Output: RESULT: SUCCESS_ADDED&lt;br&gt;
Time taken: Approximately 2 minutes&lt;/p&gt;

&lt;p&gt;Step 2 work (interpreter.py structuring):&lt;br&gt;
Execution result: PASS:1 FAIL:0&lt;br&gt;
Output: RESULT: SUCCESS_UPDATED&lt;br&gt;
Modified lines: 27, 81, 119-120, 254-255&lt;/p&gt;

&lt;p&gt;Strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High GL script syntax accuracy&lt;/li&gt;
&lt;li&gt;Succeeded in complex string replacement work&lt;/li&gt;
&lt;li&gt;Automatic documentation after work completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weaknesses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple retries when initial tool:read failed&lt;/li&gt;
&lt;li&gt;Intermittent response delays&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3.4 MiniMax 2.5 (Implementation/Verification)&lt;/p&gt;

&lt;p&gt;Role: Steps 3-4 implementation (bridge.py error_details, SUGGESTIONS addition), verification&lt;/p&gt;

&lt;p&gt;Observed characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick work start after receiving instructions&lt;/li&gt;
&lt;li&gt;Tendency to use tool:exec directly instead of GL scripts&lt;/li&gt;
&lt;li&gt;Proposed designs considering backward compatibility&lt;/li&gt;
&lt;li&gt;Proposed 3-level error classification (Parse/Runtime/Logic)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 3 work (bridge.py):&lt;br&gt;
Execution result: PASS:1 FAIL:0&lt;br&gt;
Output: RESULT: SUCCESS_UPDATED&lt;br&gt;
Modified lines: 75-76, 117-118&lt;/p&gt;

&lt;p&gt;Step 4 work (SUGGESTIONS):&lt;br&gt;
Execution result: PASS:1 FAIL:0&lt;br&gt;
Output: RESULT: SUCCESS_UPDATED&lt;br&gt;
Added error types: 11&lt;/p&gt;

&lt;p&gt;Strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast response speed&lt;/li&gt;
&lt;li&gt;Practical suggestions (Dynamic Suggestion Engine)&lt;/li&gt;
&lt;li&gt;Suitable for verification tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weaknesses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tendency to not follow GL script rules (prefers tool:exec)&lt;/li&gt;
&lt;li&gt;1 freeze incident during Step 4&lt;/li&gt;
&lt;li&gt;Model name error when saving instructions (saved as Gemini)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;4. Collaboration System and Workflow&lt;/h2&gt;

&lt;p&gt;4.1 Role Distribution Principle&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Human: Decision-making, Execution, Verification
  ↓ (CLI commands)
Opus 4.5: Design, Instruction writing, Verification command provision
  ↓ (Web UI instructions)
Gemini/MiniMax: Implementation, GL script generation, Execution
  ↓ (Results)
Human: Confirmation → Opus 4.5: Proceed to next step
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;4.2 Standard Instruction Format&lt;/p&gt;

&lt;p&gt;All work instructions followed this format:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Let's work. First read cat ~/garlic-agent/SOUL.md.

[Task Name]

Check current status:
1. tool:read [file path]
2. tool:exec [command]

Goal:
[Specific goal description]

Work sequence:
1. Create [script.py] with tool:write
2. Create [script.gl] with tool:write
3. Execute with tool:garlic
4. AST inspection

Save after completion:
tool:write ~/garlic-agent/prompts/2026-02-26/number[task name].md
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;4.3 Prompts Archive System&lt;/p&gt;

&lt;p&gt;Instruction storage system introduced during work:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;~/garlic-agent/prompts/
└── 2026-02-26/
    ├── 001_minimax_garlic_terminal_분석.md
    ├── 002_gemini_garlic_terminal_분석.md
    ├── 003_comparison_minimax_gemini.md
    ├── 004_minimax_에러처리_의견.md
    ├── 005_gemini_에러처리_의견.md
    ├── 006_gemini_에러처리_1단계.md
    ├── 007_gemini_에러처리_2단계.md
    ├── 008_minimax_에러처리_3단계.md
    └── 009_minimax_에러처리_4단계.md
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;5. 4-Step Implementation Details&lt;/h2&gt;

&lt;p&gt;5.1 Step 1: Adding err_json() Function&lt;/p&gt;

&lt;p&gt;Assigned to: Gemini 3.0&lt;br&gt;
File: ~/garliclang_full/garliclang/errors.py&lt;/p&gt;

&lt;p&gt;Added code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def err_json(key: str, line: int = None, block: str = None,
             context: dict = None, *args) -&amp;gt; dict:
    """Structured error return function (v20.4)"""
    return {
        'error_type': key,
        'message': err_kr(key, *args),
        'line': line,
        'block': block,
        'context': context,
        'suggestion': None  # To be changed to SUGGESTIONS.get(key) in Step 4
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Verification result:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ grep -n "def err_json" errors.py
41:def err_json(key: str, line: int = None, ...

$ python3 -c "import ast; ast.parse(open('errors.py').read()); print('OK')"
errors.py OK
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;5.2 Step 2: interpreter.py Structuring&lt;/p&gt;

&lt;p&gt;Assigned to: Gemini 3.0&lt;br&gt;
File: ~/garliclang_full/garliclang/interpreter.py&lt;/p&gt;

&lt;p&gt;Changes:&lt;/p&gt;

&lt;p&gt;1. Import added (line 27):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from garliclang.errors import ERRORS_KR, err_kr, err_json, BreakException, ContinueException
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;2. Initialization added (line 81):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;self.last_error: dict = None
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;3. undefined_variable structured (lines 119-120):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;self.last_error = err_json('undefined_variable', None, None, None, name)
raise RuntimeError(self.last_error['message'])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;4. wrong_arg_count structured (lines 254-255):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;self.last_error = err_json('wrong_arg_count', None, None, None, name, len(func.params), len(args))
raise RuntimeError(self.last_error['message'])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;5.3 Step 3: bridge.py error_details Return&lt;/p&gt;

&lt;p&gt;Assigned to: MiniMax 2.5&lt;br&gt;
File: ~/garlic-agent/garliclang_bridge.py&lt;/p&gt;

&lt;p&gt;Changes (lines 75-76, 117-118):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;if interpreter.last_error:
    result["error_details"] = interpreter.last_error
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Backward compatibility: the existing result["error"] is maintained as a string.&lt;/p&gt;
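&lt;p&gt;A minimal sketch of how a caller might consume this result dict, preferring the new error_details field and falling back to the legacy string; the handle_result helper is ours, not from the codebase:&lt;/p&gt;

```python
def handle_result(result):
    """Prefer the structured error added in Step 3; fall back to the
    legacy string kept for backward compatibility."""
    details = result.get("error_details")
    if details:
        msg = details["message"]
        if details.get("suggestion"):
            msg = msg + " | suggestion: " + details["suggestion"]
        return msg
    return result.get("error", "no error")
```

&lt;p&gt;Old callers that only know about result["error"] keep working unchanged, which is the point of keeping both fields.&lt;/p&gt;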

&lt;p&gt;5.4 Step 4: SUGGESTIONS Automatic Inclusion&lt;/p&gt;

&lt;p&gt;Assigned to: MiniMax 2.5&lt;br&gt;
File: ~/garliclang_full/garliclang/errors.py&lt;/p&gt;

&lt;p&gt;Added SUGGESTIONS dictionary (line 33):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SUGGESTIONS = {
    'undefined_variable': '변수명을 확인하세요. 오타가 있거나 [변수설정]으로 먼저 정의해야 합니다.',
    'undefined_function': '함수 정의를 확인하세요. [함수정의]로 먼저 정의해야 합니다.',
    'wrong_arg_count': '함수 호출 시 인자 개수를 확인하세요.',
    'file_not_found': '파일 경로를 확인하세요. ~/로 시작하는지, 파일이 존재하는지 확인하세요.',
    'division_by_zero': '나누는 값이 0인지 확인하세요.',
    'type_error_math': '숫자끼리만 연산 가능합니다. 변수 타입을 확인하세요.',
    'type_error_compare': '비교할 수 없는 값입니다. 타입을 확인하세요.',
    'index_out_of_range': '인덱스 범위를 확인하세요. 배열 길이보다 작은 값을 사용하세요.',
    'not_an_array': '배열이 아닙니다. []로 감싸거나 리스트를 사용하세요.',
    'while_max_iterations': '반복 조건을 확인하세요. 무한 루프가 발생한 것 같습니다.',
    'import_not_found': 'import 파일 경로를 확인하세요.',
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;err_json modified (line 63):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;'suggestion': SUGGESTIONS.get(key)
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;6. Verification and Test Results&lt;/h2&gt;

&lt;p&gt;6.1 Final JSON Error Output&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from garliclang.errors import err_json
result = err_json('undefined_variable', 10, '[실행]', {'name': 'x'}, 'x')
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "error_type": "undefined_variable",
  "message": "정의되지 않은 변수입니다: x",
  "line": 10,
  "block": "[실행]",
  "context": {
    "name": "x"
  },
  "suggestion": "변수명을 확인하세요. 오타가 있거나 [변수설정]으로 먼저 정의해야 합니다."
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;6.2 AST Verification Results&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;errors.py&lt;/td&gt;
&lt;td&gt;OK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;interpreter.py&lt;/td&gt;
&lt;td&gt;OK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;garliclang_bridge.py&lt;/td&gt;
&lt;td&gt;OK&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;6.3 site-packages Synchronization&lt;/p&gt;

&lt;p&gt;Modified files were copied to Python site-packages for system-wide application:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;cp ~/garliclang_full/garliclang/errors.py /usr/lib/python3.12/site-packages/garliclang/
cp ~/garliclang_full/garliclang/interpreter.py /usr/lib/python3.12/site-packages/garliclang/
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;7. Gemini 3.0 vs MiniMax 2.5 In-Depth Comparison&lt;/h2&gt;

&lt;p&gt;7.1 Same Question Response Comparison&lt;/p&gt;

&lt;p&gt;Question: "Please give your opinion on GarlicLang error handling improvement"&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Gemini 3.0&lt;/th&gt;
&lt;th&gt;MiniMax 2.5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JSON direction agreement&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Independent proposal&lt;/td&gt;
&lt;td&gt;Contextual Snapshot (variable dump)&lt;/td&gt;
&lt;td&gt;Dynamic Suggestion Engine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error classification&lt;/td&gt;
&lt;td&gt;Unified structure integration&lt;/td&gt;
&lt;td&gt;3-level classification (Parse/Runtime/Logic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Priority 1&lt;/td&gt;
&lt;td&gt;undefined_variable&lt;/td&gt;
&lt;td&gt;undefined_variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Priority 2&lt;/td&gt;
&lt;td&gt;wrong_arg_count&lt;/td&gt;
&lt;td&gt;wrong_arg_count&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Emphasis on caution&lt;/td&gt;
&lt;td&gt;Integration of fragmented error handling&lt;/td&gt;
&lt;td&gt;Maintaining backward compatibility&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;7.2 Code Generation Quality&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Gemini 3.0&lt;/th&gt;
&lt;th&gt;MiniMax 2.5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GL script syntax&lt;/td&gt;
&lt;td&gt;Accurate&lt;/td&gt;
&lt;td&gt;Sometimes prefers tool:exec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python script&lt;/td&gt;
&lt;td&gt;Complete structure&lt;/td&gt;
&lt;td&gt;Complete structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AST inspection included&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backup logic included&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error handling&lt;/td&gt;
&lt;td&gt;Includes try-except&lt;/td&gt;
&lt;td&gt;Includes try-except&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;7.3 Instruction Compliance&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Gemini 3.0&lt;/th&gt;
&lt;th&gt;MiniMax 2.5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SOUL.md reading&lt;/td&gt;
&lt;td&gt;Always executed&lt;/td&gt;
&lt;td&gt;Always executed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Work sequence compliance&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Result document saving&lt;/td&gt;
&lt;td&gt;Automatically performed&lt;/td&gt;
&lt;td&gt;Automatically performed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Freeze incidents&lt;/td&gt;
&lt;td&gt;0 times&lt;/td&gt;
&lt;td&gt;1 time (early Step 4)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;7.4 Response Style&lt;/p&gt;

&lt;p&gt;Gemini 3.0:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detailed analysis report format&lt;/li&gt;
&lt;li&gt;High frequency of table usage&lt;/li&gt;
&lt;li&gt;Closing phrases like "Static analysis complete"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MiniMax 2.5:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Concise summary format&lt;/li&gt;
&lt;li&gt;Presents immediately executable code&lt;/li&gt;
&lt;li&gt;Conversational closing like "Do you have any additional work?"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;8. Additional Achievements&lt;/h2&gt;

&lt;p&gt;8.1 History Load Improvement&lt;/p&gt;

&lt;p&gt;Problem: Only 24 entries were read from session.json, while 268 entries in chat_sessions.db were ignored&lt;/p&gt;

&lt;p&gt;Solution: Modified session_manager.py's get_history() to read directly from chat_sessions.db&lt;/p&gt;
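&lt;p&gt;The fix can be sketched roughly as follows; the real session_manager.py schema is not shown in this article, so the table and column names here (messages, role, content) are assumptions:&lt;/p&gt;

```python
import sqlite3

def get_history(db_path, limit=50):
    """Read the most recent chat entries directly from the SQLite DB
    instead of session.json. Sketch only; schema names are assumed."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        con.close()
    # Rows come back newest-first; reverse for chronological order
    return [{"role": r, "content": c} for r, c in reversed(rows)]
```

&lt;p&gt;Reading the database directly is what lets all 268 stored entries become visible, rather than only the 24 mirrored into session.json.&lt;/p&gt;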

&lt;p&gt;8.2 HANDOVER Document Expansion&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Section&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;System Architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;GL Script Usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Operation Commands Guide&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Multi-LLM Collaboration System&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;GarlicLang v20.0 Syntax Reference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Verification Script Patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;8.3 Verification Script Pattern Establishment&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def check(name, condition):
    status = "O" if condition else "X"
    print(f"[{status}] {name}")
    return condition

# Usage example
check("config.json JSON syntax", True)
check("AST inspection passed", ast_ok)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;8.4 apifree Verification Script&lt;/p&gt;

&lt;p&gt;A tool was created to verify whether GL scripts were executed locally without API billing:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ apifree
[1] Recently created GL scripts (within 1 hour)
  📄 fix_history_load.gl
  📄 update_errors.gl
[2] API call code inspection
  ✅ fix_history_load.py: No API code found
[Conclusion] ✅ Local execution confirmed - No API billing
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;9. Conclusion&lt;/h2&gt;

&lt;p&gt;9.1 Achievement Summary&lt;/p&gt;

&lt;p&gt;On February 26, 2026, the GarlicLang error handling system was successfully improved through the collaboration of one human and three LLM models.&lt;/p&gt;

&lt;p&gt;Quantitative achievements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Modified files: 3&lt;/li&gt;
&lt;li&gt;Added code: Approximately 80 lines&lt;/li&gt;
&lt;li&gt;Error type coverage: 11&lt;/li&gt;
&lt;li&gt;AST verification: 100% passed&lt;/li&gt;
&lt;li&gt;Work time: Approximately 4 hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Qualitative achievements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM can now immediately identify error causes and solutions&lt;/li&gt;
&lt;li&gt;Expected improvement in automatic correction/retry efficiency&lt;/li&gt;
&lt;li&gt;Multi-LLM collaboration system verified&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;9.2 Collaboration Effect Analysis&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Contribution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Human&lt;/td&gt;
&lt;td&gt;Execution, Decision-making, Quality Management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.5&lt;/td&gt;
&lt;td&gt;Design, Coordination, Documentation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3.0&lt;/td&gt;
&lt;td&gt;Complex Implementation (Steps 1-2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MiniMax 2.5&lt;/td&gt;
&lt;td&gt;Fast Implementation (Steps 3-4), Verification&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Key Insight: Multi-LLM collaboration with divided roles is more effective for complex system improvement than a single LLM.&lt;/p&gt;

&lt;p&gt;9.3 Future Plans&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Expand error coverage: Apply err_json to remaining RuntimeError points&lt;/li&gt;
&lt;li&gt;Utilize context field: Automatic capture of variable state at the time of error occurrence&lt;/li&gt;
&lt;li&gt;Line field accuracy: Track line numbers during the AST parsing stage&lt;/li&gt;
&lt;li&gt;agent.py integration: Pass error_details directly to the LLM to guide automatic correction&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;10. Appendix&lt;/h2&gt;
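&lt;p&gt;The agent.py integration in the last plan could look roughly like this; build_retry_prompt is a hypothetical helper, not code from the project:&lt;/p&gt;

```python
def build_retry_prompt(error_details):
    """Turn the structured error into a correction prompt for the LLM.
    Hypothetical sketch of the planned agent.py integration; field
    names follow the err_json structure from this white paper."""
    parts = [
        "The previous GL script failed.",
        "error_type: " + str(error_details.get("error_type")),
        "message: " + str(error_details.get("message")),
    ]
    if error_details.get("suggestion"):
        parts.append("suggestion: " + error_details["suggestion"])
    parts.append("Please fix the script and try again.")
    return "\n".join(parts)
```

&lt;p&gt;Because error_type and suggestion are machine-readable fields, the retry prompt no longer depends on the LLM parsing a Korean error string.&lt;/p&gt;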

&lt;p&gt;10.1 Full Paths of Modified Files&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;~/garliclang_full/garliclang/errors.py
~/garliclang_full/garliclang/interpreter.py
~/garlic-agent/garliclang_bridge.py
/data/data/com.termux/files/usr/lib/python3.12/site-packages/garliclang/errors.py
/data/data/com.termux/files/usr/lib/python3.12/site-packages/garliclang/interpreter.py
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;10.2 Backup Files&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;~/garlic-agent/archive/bak/errors.py.20260226_2000.bak
~/garlic-agent/archive/bak/interpreter.py.20260226_2000.bak
~/garlic-agent/archive/bak/garliclang_bridge.py.20260226_2000.bak
/storage/emulated/0/Download/garlic-agent-1.5.6_20260226_2000.tar.gz (78MB)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;10.3 Reference Documents&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;~/garlic-agent/SOUL.md
~/garlic-agent/HANDOVER_FINAL_v1.5.6.md
~/garlic-agent/CHANGELOG.md
~/garlic-agent/prompts/2026-02-26/*.md
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;End of document&lt;/p&gt;

&lt;p&gt;This white paper was written through joint collaboration of Human(Garlic farmer), Claude Opus 4.5, Gemini 3.0, and MiniMax 2.5.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>terminal</category>
      <category>python</category>
    </item>
    <item>
      <title>Garlic Farmer: Building a Decentralized Personal AI System on Android Termux</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Wed, 25 Feb 2026 14:59:35 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/garlic-farmer-building-a-decentralized-personal-ai-system-on-android-termux-17jb</link>
      <guid>https://dev.to/garlicfarmer/garlic-farmer-building-a-decentralized-personal-ai-system-on-android-termux-17jb</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;I am a garlic farmer from Korea. I want to share a personal smartphone-based AI agent &lt;br&gt;
I developed over several intensive days. Without access to a PC, I built everything using &lt;br&gt;
only my phone and sheer determination. All development was conducted in Korean. &lt;br&gt;
This whitepaper was translated with the collaborative help of multiple AI assistants.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;Demonstration Video&lt;/h2&gt;

&lt;p&gt;Below is the actual working video of Garlic Terminal in action:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://photos.app.goo.gl/WFx1sQvbAFYLFfPFA" rel="noopener noreferrer"&gt;Watch Demo Video&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;Executive Summary&lt;/h2&gt;


&lt;p&gt;🧄 Garlic Terminal v2.0 Whitepaper&lt;/p&gt;

&lt;p&gt;Korean Natural Language-Based Semi-Autonomous Android AI Agent — RAG, GarlicLang, Multi-Provider Support&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;Unihertz Titan2 Phone Termux-Based Autonomous Agent: Operates on Android (Termux) without a PC, managing over 5,800 documents (108MB) as an AI agent&lt;/li&gt;
&lt;li&gt;GarlicLang (Custom Script Language): Eliminates token waste from automatic validation, local execution, and API ping-pong conversations through a sandboxed scripting language&lt;/li&gt;
&lt;li&gt;Hybrid RAG and Hallucination Prevention: Combines FTS5 full-text search with vector similarity search, and blocks AI hallucinations through verify_tool_response to ensure superior accuracy compared to existing AI chatbots&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;System Architecture&lt;/h2&gt;

&lt;p&gt;The Garlic Terminal operates through a multi-layered architecture. The browser UI communicates with web.py through Server-Sent Events (SSE), enabling real-time token and log streaming. The sidebar.js manages chat sessions and HUD display, while agent.py serves as the core orchestration engine. The search.py module handles RAG operations on knowledge.db, and garliclang_bridge.py executes GarlicLang scripts. Multi-provider LLM APIs (Cerebras, Gemini, DeepSeek) are accessed through the agent engine with automatic fallback capabilities.&lt;/p&gt;

&lt;h2&gt;Core Components&lt;/h2&gt;

&lt;p&gt;web.py (930 lines) &amp;amp; agent.html (1,218 lines)&lt;/p&gt;

&lt;p&gt;A zero-dependency backend server based on Python's built-in http.server that serves as a bridge between the Termux environment and users. It streams AI tokens and tool execution logs in real-time through SSE (Server-Sent Events), and provides file writing and database reading endpoints wrapped in a split terminal UI.&lt;/p&gt;
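&lt;p&gt;The SSE wire format itself is simple; a sketch of the frame formatting such a zero-dependency server needs (sse_frame is our name for illustration, not a function from web.py):&lt;/p&gt;

```python
def sse_frame(event, data):
    """Format one Server-Sent Events frame: an event name line,
    a data line, and a blank line that terminates the frame."""
    return "event: " + event + "\n" + "data: " + data + "\n\n"
```

&lt;p&gt;Each streamed token or log line becomes one such frame written to the open HTTP response, which is all the browser's EventSource needs.&lt;/p&gt;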

&lt;p&gt;agent.py (893 lines)&lt;/p&gt;

&lt;p&gt;Serves as the agent's brain. It orchestrates the entire workflow, handles tool parsing (parse_tools), and strictly isolates Python/Bash execution to prevent destructive operations.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def run_agent(user_input, on_event=None, preferred_provider=None, no_fallback=False):
    """Core agent execution engine"""
    pass
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;RAG Search Engine&lt;/h2&gt;

&lt;p&gt;Based on search.py (128 lines), it replaces standard SQL wildcards with a powerful hybrid system. It stores 5,879 documents in knowledge.db locally and combines FTS5 full-text search with cosine similarity vector matching to extract precise context. It automatically removes Korean particles (은/는/이/가, etc.) to improve keyword matching rates.&lt;/p&gt;
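&lt;p&gt;The two ingredients described above can be sketched as follows; the particle list and both function names are ours, since search.py itself is not shown:&lt;/p&gt;

```python
import math

# Common Korean subject/object particles (assumed list; search.py's
# actual list is not shown in the article)
PARTICLES = ("은", "는", "이", "가", "을", "를")

def strip_particle(word):
    """Drop a trailing particle so FTS keyword matching hits the stem."""
    for p in PARTICLES:
        if word.endswith(p) and word[: -len(p)]:
            return word[: -len(p)]
    return word

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)
```

&lt;p&gt;A hybrid ranker can then merge FTS5 keyword hits on the stripped stems with the top cosine-similarity vector matches.&lt;/p&gt;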

&lt;h2&gt;Security Model and Hallucination Prevention&lt;/h2&gt;

&lt;p&gt;The most important aspects of an autonomous agent are preventing infinite loops and blocking API cost spikes.&lt;/p&gt;

&lt;p&gt;Hallucination Prevention (verify_tool_response): Compares AI claims against actual tool execution results, identifying hallucinations when there's a mismatch.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def verify_tool_response(tool_results, ai_response):
    """Compare AI-claimed results against actual tool output"""
    pass
&lt;/code&gt;&lt;/pre&gt;
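&lt;p&gt;The article shows only the signature; one plausible shape for such a check, assuming a simple rule that flags numbers the AI claims but that never appear in the tool output (the real logic is not shown):&lt;/p&gt;

```python
import re

def verify_tool_response_sketch(tool_results, ai_response):
    """Flag numbers in the AI's answer that never appear in the actual
    tool output. A guess at verify_tool_response's behavior, not the
    project's implementation."""
    actual = " ".join(str(r) for r in tool_results)
    claimed_numbers = re.findall(r"\d+", ai_response)
    mismatches = [n for n in claimed_numbers if n not in actual]
    return {"verified": not mismatches, "unsupported_claims": mismatches}
```

&lt;p&gt;When unsupported_claims is non-empty, the agent can treat the answer as a hallucination and ask the model to restate it from the tool output.&lt;/p&gt;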

&lt;p&gt;Local Delegation: The AI writes execution plans to .gl files and calls [tool:garlic] only once, reducing API usage by 80%.&lt;/p&gt;

&lt;p&gt;Command Blocking: Hardcoded rules in web.py pre-block dangerous commands like rm -rf, direct sqlite3 access, and path access outside ~/garlic-agent/.&lt;/p&gt;
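&lt;p&gt;In the spirit of those rules, a pre-flight check might look like this; the blocked patterns come from the examples above, but the actual web.py rule set is not shown, so treat this as a sketch:&lt;/p&gt;

```python
import os

# Patterns taken from the examples in this section (sketch, not the
# project's actual rule list)
BLOCKED_PATTERNS = ("rm -rf", "sqlite3 ")
ALLOWED_ROOT = os.path.realpath(os.path.expanduser("~/garlic-agent"))

def is_allowed(command, target_path):
    """Reject dangerous commands and any path outside ~/garlic-agent/."""
    lowered = command.lower()
    if any(p in lowered for p in BLOCKED_PATTERNS):
        return False
    # Resolve the path and require it to stay inside the allowed root
    real = os.path.realpath(os.path.expanduser(target_path))
    return real == ALLOWED_ROOT or real.startswith(ALLOWED_ROOT + os.sep)
```

&lt;p&gt;Resolving with realpath before the prefix check matters: otherwise a path like ~/garlic-agent/../.. would escape the sandbox.&lt;/p&gt;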

&lt;h2&gt;UI Features&lt;/h2&gt;

&lt;p&gt;Split Terminal: The interface displays AI conversation on the left and real-time execution logs and terminal output on the right simultaneously, allowing users to monitor both conversation flow and system operations.&lt;/p&gt;

&lt;p&gt;Session Sidebar (sidebar.js, 138 lines): Provides chat session history management and conversation list selection UI, enabling users to switch between different conversation threads.&lt;/p&gt;

&lt;p&gt;HUD (Heads-Up Display): Displays CPU, memory, disk, swap, llama status, web server status, and wake-lock status in real-time on the upper right corner of the screen, providing instant system health monitoring.&lt;/p&gt;

&lt;h2&gt;GarlicLang — Autonomous Delegation Script Language&lt;/h2&gt;

&lt;p&gt;A Korean-based domain-specific language (DSL) created to solve the API "ping-pong" problem, allowing the agent to execute complex multi-step operations locally without repeated API calls.&lt;/p&gt;

&lt;p&gt;Execution Details&lt;/p&gt;

&lt;p&gt;Execution file: garliclang_bridge.py&lt;br&gt;
Execution function: run_garliclang_source(source, context)&lt;/p&gt;

&lt;p&gt;Syntax Examples&lt;/p&gt;

&lt;p&gt;[execute] ls -la — Executes bash commands through subprocess.run()&lt;br&gt;
[output] Current time is {time} — Outputs text with variable substitution support&lt;br&gt;
[set_variable] count = 5 — Stores internal variables&lt;br&gt;
[condition] {count} &amp;gt; 3 — Performs branching logic&lt;br&gt;
[repeat] 3 — Executes loops&lt;br&gt;
[verify] file_exists ~/garlic-agent/web.py — Returns PASS/FAIL verification results&lt;/p&gt;
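&lt;p&gt;A toy dispatcher for two of the blocks above gives the flavor of the interpreter; the real run_garliclang_source is far more complete, and run_gl_line is purely our illustration:&lt;/p&gt;

```python
def run_gl_line(line, variables):
    """Toy dispatcher for the [set_variable] and [output] blocks shown
    above. Illustration only, not the project's interpreter."""
    if line.startswith("[set_variable]"):
        name, value = line[len("[set_variable]"):].split("=", 1)
        variables[name.strip()] = value.strip()
        return None
    if line.startswith("[output]"):
        text = line[len("[output]"):].strip()
        # Substitute {name} placeholders with stored variable values
        for k, v in variables.items():
            text = text.replace("{" + k + "}", v)
        return text
    raise ValueError("unsupported GarlicLang block: " + line)
```

&lt;p&gt;Running a whole .gl file is then just feeding lines through such a dispatcher with a shared variables dict, which is what lets a plan execute locally without further API calls.&lt;/p&gt;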

&lt;h2&gt;System Statistics (v1.5.x)&lt;/h2&gt;

&lt;p&gt;Total code: approximately 3,506 lines&lt;br&gt;
Knowledge base: 5,879 documents (108MB)&lt;br&gt;
Chat history database: 52KB&lt;br&gt;
Automated scripts: 137 .gl files&lt;br&gt;
Supported providers: DeepSeek, Gemini (Flash/Pro), Cerebras, and others dynamically managed via config.json&lt;/p&gt;

&lt;h2&gt;Known Issues and Roadmap&lt;/h2&gt;

&lt;p&gt;Token Bleed: Past error logs accumulate in multi-turn context. This is mitigated by recommending new sessions after large-scale operations.&lt;/p&gt;

&lt;p&gt;Provider Timeout: NVIDIA Kimi K2.5 frequently encounters 429 errors or timeouts (120-second limit).&lt;/p&gt;

&lt;p&gt;Future Improvements: Implement VFS (Virtual File System) mirroring for complete pre-validation sandbox testing and enhanced security isolation.&lt;/p&gt;

&lt;h2&gt;Critical Incident Records and Absolute Rules&lt;/h2&gt;

&lt;p&gt;The system's core principles emerged from actual critical failures. Future AI agents must learn from these incidents to avoid repeating the same mistakes.&lt;/p&gt;

&lt;p&gt;Incident 1: agent.html Backup Restoration Disaster (2026-02-25)&lt;/p&gt;

&lt;p&gt;What Happened: When restoring agent.html, AI used archive/bak/*.bak instead of the full system backup (.tar.gz), resulting in loss of several hours of work.&lt;/p&gt;

&lt;p&gt;Root Cause: Files in archive/bak/ are incremental backups taken immediately before individual modifications. When multiple edits accumulate, they cannot guarantee the latest state.&lt;/p&gt;

&lt;p&gt;Absolute Rule: Always check ls /storage/emulated/0/Download/garlic-agent-*.tar.gz first before restoring. Extract only the relevant file from the latest tar archive. Use archive/bak/*.bak only as an absolute last resort.&lt;/p&gt;

&lt;p&gt;Incident 2: python3 -c One-liner Permanently Banned&lt;/p&gt;

&lt;p&gt;What Happened: Executing multi-line scripts with python3 -c caused line breaks to collapse and quote conflicts to occur, destroying system logic and failing silently.&lt;/p&gt;

&lt;p&gt;Absolute Rule: Complex Python code must always be created as .py files and executed with python3 filename.py instead of inline commands.&lt;/p&gt;
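&lt;p&gt;The rule above can be sketched as a helper an agent might use in place of python3 -c: write the snippet to a real .py file, then execute it, so newlines and quotes survive intact. The helper name is mine, not part of the actual system:&lt;/p&gt;

```python
import subprocess
import tempfile

# Sketch of the "no python3 -c" rule: persist multi-line code to a real
# .py file, then run it with python3, so line breaks and quotes survive.
def run_python_snippet(code: str) -> str:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(["python3", path],
                            capture_output=True, text=True, timeout=30)
    return result.stdout
```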

&lt;p&gt;Incident 3: Root Path (/) Access Violation&lt;/p&gt;

&lt;p&gt;What Happened: AI hallucinated /garlic-agent/ instead of ~/garlic-agent/, resulting in 3 consecutive failures and massive token waste.&lt;/p&gt;

&lt;p&gt;Absolute Rule: Strictly prohibit /garlic-agent/ usage and always use ~/garlic-agent/ for all operations.&lt;/p&gt;

&lt;p&gt;Incident 4: Infinite Retry Causing API Overcharging&lt;/p&gt;

&lt;p&gt;What Happened: Failed tool:patch commands were blindly retried without checking file status, resulting in massive unexpected API charges.&lt;/p&gt;

&lt;p&gt;Absolute Rule: Stop immediately on 3 consecutive failures and report to the user. Always recommend starting a new session after large-scale operations.&lt;/p&gt;
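&lt;p&gt;The stop-after-3-failures rule can be sketched as a small wrapper (illustrative only; the agent's real retry logic is not shown in this document):&lt;/p&gt;

```python
# Sketch of the 3-strike rule: retry an operation, but stop after 3
# consecutive failures instead of looping and burning API credit.
def retry_with_limit(func, limit=3):
    last = None
    for _ in range(limit):
        try:
            return func()
        except Exception as exc:
            last = exc
    raise RuntimeError(
        f"stopped after {limit} consecutive failures, reporting to user: {last}")
```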

&lt;p&gt;Technical Specifications&lt;/p&gt;

&lt;p&gt;Core Architecture&lt;/p&gt;

&lt;p&gt;Runtime: Python 3.11+ (Android Termux optimized)&lt;br&gt;
Agent Engine: Garlic Agent v1.5.x for autonomous tool calling and scenario execution&lt;br&gt;
Validation Language: GarlicLang v20.x for multi-step verification and local interpretation&lt;br&gt;
Monitoring System: watchdog.sh for 3-second interval process monitoring with automatic recovery&lt;/p&gt;

&lt;p&gt;Data and Search Engine&lt;/p&gt;

&lt;p&gt;Database: SQLite3 with FTS5 virtual tables for high-speed full-text search&lt;br&gt;
Embedding Model: nomic-embed-text-v1.5 (768-dimension, local execution)&lt;br&gt;
Knowledge Base: Google Takeout, Obsidian, and Markdown documents (6,000 items, 180MB)&lt;br&gt;
Search Method: Hybrid RAG combining keyword matching with vector similarity search&lt;/p&gt;

&lt;p&gt;LLM Orchestration&lt;/p&gt;

&lt;p&gt;Multi-Provider Support: Cerebras (Llama-3.1-70B), Groq, NVIDIA Kimi, Gemini 3.0, 3.1&lt;br&gt;
Fallback Logic: Automatically switches to next provider on API error or 429 response (delay under 2 seconds)&lt;br&gt;
Communication Protocol: OpenAI-compatible API with SSE real-time streaming&lt;/p&gt;

&lt;p&gt;Interface and Security&lt;/p&gt;

&lt;p&gt;Web UI: Lightweight conversation window based on http.server with SSE real-time rendering&lt;br&gt;
CLI: curl-based command execution and direct terminal control&lt;br&gt;
Security: Local host binding (127.0.0.1) with whitelist IP access control&lt;br&gt;
Backup Strategy: tar.gz automatic scheduling with dual backup in Download folder&lt;/p&gt;

&lt;p&gt;Autonomy Metrics&lt;/p&gt;

&lt;p&gt;Self-Modification: limited code-modification rights (3 maximum) combining tool:patch and GarlicLang&lt;br&gt;
Meta-Cognition: SOUL.md-based role definition with immediate rule reflection during incidents (Layer 0-2 architecture)&lt;/p&gt;

&lt;p&gt;Conclusion and Future Vision&lt;/p&gt;

&lt;p&gt;Realization of Decentralized Personal AI&lt;/p&gt;

&lt;p&gt;The Garlic Agent project proves that one can build an independent knowledge base and autonomous engine within a personally-owned mobile device (Unihertz Titan2) without relying on large corporations' cloud infrastructure. This represents a practical implementation of data sovereignty and AI autonomy combined. Though built from existing concepts, the integration is novel and powerful.&lt;/p&gt;

&lt;p&gt;Garlic Farmer and AI Symbiosis&lt;/p&gt;

&lt;p&gt;The project's core strength lies in meta-cognitive collaboration where AI transcends being a mere tool and learns system rules (SOUL.md) independently, quickly supplementing them when incidents occur. Garlic Farmer provides infrastructure direction, while AI validates logical integrity through GarlicLang and expands the system's capabilities.&lt;/p&gt;

&lt;p&gt;Future Roadmap&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;VFS Mirroring Enhancement: Build perfect synchronization with external storage beyond Termux environment and implement comprehensive virtual file system&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multi-Agent Discussion System (Garlic Talk): Achieve collective intelligence by having multiple agents using different LLMs discuss and collaborate to derive optimal solutions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Embedded Autonomous Tuning: Real-time optimization of embedding models and search weights based on user conversation patterns and work history—a self-evolving RAG system&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Complete Isolation Sandbox: Apply more powerful execution security and resource limiting in GarlicLang v30 and beyond for enhanced safety&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A Comma, Not a Period&lt;/p&gt;

&lt;p&gt;Garlic Terminal is not a finished product but a constantly evolving organism. On any given day, hundreds of errors are processed in real time through collaboration with multiple AI providers. Trial and error, like the backup disaster of February 2026, becomes fertilizer that makes the system stronger. I believe that without a PC, without significant capital, with only code, logic, and life philosophy, one can build a personal AI ecosystem.&lt;/p&gt;

&lt;p&gt;Appendix A: GarlicLang Detailed Syntax and Execution Flow&lt;/p&gt;

&lt;p&gt;Block Structure and Operations&lt;/p&gt;

&lt;p&gt;GarlicLang operates on a block-by-block basis enclosed in square brackets ([]). [execute] ls -la ~/garlic-agent executes bash commands through subprocess.run(). [output] Current time is {time} outputs text with variable substitution support. [set_variable] count = 5 stores internal variables. [condition] {count} &amp;gt; 3 performs branching with different execution paths. [repeat] 3 executes loops. [verify] file_exists ~/garlic-agent/web.py returns PASS/FAIL results.&lt;/p&gt;

&lt;p&gt;Execution Flow Pipeline&lt;/p&gt;

&lt;p&gt;User natural language input is converted to GarlicLang script by LLM, which is then parsed by garliclang_bridge.py. Each block is executed sequentially with a 30-second timeout. Results are wrapped in JSON and transmitted via SSE to the UI. Dangerous commands (rm -rf, mkfs, etc.) are blocked in advance by pattern matching.&lt;/p&gt;
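&lt;p&gt;The pattern-matching gate described above can be sketched like this. The pattern list is illustrative; the real list in garliclang_bridge.py may differ:&lt;/p&gt;

```python
import re

# Sketch of the pre-execution safety gate: block obviously destructive
# commands before subprocess.run() ever sees them. Patterns are examples.
DANGEROUS_PATTERNS = [
    r"\brm\s+-rf\b",     # recursive force delete
    r"\bmkfs\b",         # filesystem creation
    r"\bdd\s+if=",       # raw disk writes
]

def is_blocked(command: str) -> bool:
    return any(re.search(p, command) for p in DANGEROUS_PATTERNS)
```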

&lt;p&gt;Automatic Verification Mechanism&lt;/p&gt;

&lt;p&gt;The [verify] block checks AI claims against actual system state. For example, if AI claims "the file exists," [verify] file_exists path confirms it. If PASS/FAIL results mismatch, it's classified as hallucination and the user is warned immediately.&lt;/p&gt;
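&lt;p&gt;The [verify] idea can be sketched as a claim-versus-reality check (a minimal sketch; the function names are mine):&lt;/p&gt;

```python
import os

# Minimal sketch of [verify] file_exists: check an AI claim against the
# actual filesystem and return PASS/FAIL instead of trusting the claim.
def verify_file_exists(path: str) -> str:
    return "PASS" if os.path.exists(os.path.expanduser(path)) else "FAIL"

def check_claim(ai_says_exists: bool, path: str) -> bool:
    """True if the AI's claim matches reality; False flags a hallucination."""
    actual = verify_file_exists(path) == "PASS"
    return ai_says_exists == actual
```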

&lt;p&gt;Appendix B: Similar Technology Comparison&lt;/p&gt;

&lt;p&gt;Feature Comparison Matrix&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Garlic Terminal&lt;/th&gt;
&lt;th&gt;GitHub Copilot CLI&lt;/th&gt;
&lt;th&gt;Amazon Q CLI&lt;/th&gt;
&lt;th&gt;ChatGPT/Claude&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Execution Environment&lt;/td&gt;
&lt;td&gt;Android Termux&lt;/td&gt;
&lt;td&gt;PC/Mac&lt;/td&gt;
&lt;td&gt;PC/Mac/Linux&lt;/td&gt;
&lt;td&gt;Web/API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Korean Language Support&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intermediate Language (DSL)&lt;/td&gt;
&lt;td&gt;GarlicLang&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local Code Execution&lt;/td&gt;
&lt;td&gt;Yes (Sandbox)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (API only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid RAG Search&lt;/td&gt;
&lt;td&gt;Yes (FTS5+Vector)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hallucination Prevention&lt;/td&gt;
&lt;td&gt;Auto Verify+Cross Verify&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Offline Capability&lt;/td&gt;
&lt;td&gt;Partial (Local LLM)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-LLM Fallback&lt;/td&gt;
&lt;td&gt;Yes (Cerebras→Gemini→DeepSeek)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost Model&lt;/td&gt;
&lt;td&gt;Partially Free (API Free Tier)&lt;/td&gt;
&lt;td&gt;Paid&lt;/td&gt;
&lt;td&gt;Paid&lt;/td&gt;
&lt;td&gt;Paid&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Key Differentiator: Through the intermediate language GarlicLang, AI writes a plan rather than directly manipulating the system, simultaneously solving security and cost issues.&lt;/p&gt;

&lt;p&gt;Appendix C: Development Timeline&lt;/p&gt;

&lt;p&gt;Phase 1: Foundation Building (2026-02-22)&lt;/p&gt;

&lt;p&gt;Termux environment setup, Python package installation, basic web.py HTTP server implementation, initial LLM API integration (Cerebras).&lt;/p&gt;

&lt;p&gt;Phase 2: RAG System Development (2026-02-23)&lt;/p&gt;

&lt;p&gt;knowledge.db creation with FTS5 full-text search, nomic-embed-text vector embedding adoption, hybrid search logic implementation, 5,800+ document automatic indexing pipeline completion. This phase consolidated conversations from tens of thousands of AI interactions spanning two years. Over ten thousand chat sessions were opened and processed, allowing organization of the most valuable personal knowledge resources.&lt;/p&gt;

&lt;p&gt;Phase 3: GarlicLang and Security Enhancement (2026-02-24)&lt;/p&gt;

&lt;p&gt;GarlicLang DSL design and interpreter implementation, garliclang_bridge.py completion, dangerous command blocking through pattern matching, SOUL.md backup policy establishment with triple redundancy.&lt;/p&gt;

&lt;p&gt;Phase 4: Stabilization and Autonomous Operation (2026-02-25)&lt;/p&gt;

&lt;p&gt;chat_sessions.db conversation saving bug fixes (conv_id=9), HUD web and wake status display additions, startall.sh llama default set to OFF for memory conservation, llamaon/llamaoff alias registration, archive_chat.py for date-based conversation archiving, watchdog.sh llama auto-restart disabling, integrated whitepaper v2 composition.&lt;/p&gt;

&lt;p&gt;Appendix D: Prompt Engineering Evolution&lt;/p&gt;

&lt;p&gt;Generation 1: Initial Prompt (v0.1)&lt;/p&gt;

&lt;p&gt;Simple directive approach: "Answer the user's question." Results showed frequent hallucinations and numerous system command misexecutions.&lt;/p&gt;

&lt;p&gt;Generation 2: Structured Prompt (v0.5)&lt;/p&gt;

&lt;p&gt;Incorporated role assignment and context injection. Achieved 60% reduction in hallucination rate but still struggled with complex multi-step tasks.&lt;/p&gt;

&lt;p&gt;Generation 3: Current Prompt (v1.0)&lt;/p&gt;

&lt;p&gt;SOUL.md-based self-definition combined with RAG context, explicit tool listing, and forbidden pattern enumeration. System prompt includes SOUL.md in full, top 5 RAG context injection, complete tool list, forbidden command pattern library, and enforced response formatting.&lt;/p&gt;

&lt;p&gt;Key Lesson&lt;/p&gt;

&lt;p&gt;Prompts function as code. A single-line instruction change determines entire system stability. Positive directives ("do this") prove more effective than negative ones ("don't do that"). Accuracy increases dramatically when concrete examples are provided.&lt;/p&gt;

&lt;p&gt;Appendix E: Real-World Usage Scenarios&lt;/p&gt;

&lt;p&gt;Scenario 1: Document Search&lt;/p&gt;

&lt;p&gt;User: "Find documents related to meta-cognition protocol"&lt;br&gt;
Process: Hybrid RAG search with keyword and vector matching&lt;br&gt;
Output: Top 3 relevant documents with summaries&lt;/p&gt;

&lt;p&gt;Scenario 2: System Management&lt;/p&gt;

&lt;p&gt;User: "Check disk usage"&lt;br&gt;
Process: GarlicLang conversion to [execute] df -h command&lt;br&gt;
Output: Natural language summary of disk space allocation&lt;/p&gt;

&lt;p&gt;Scenario 3: File Modification&lt;/p&gt;

&lt;p&gt;User: "Change port number to 8081 in web.py"&lt;br&gt;
Process: tool:read file → tool:patch modification → AST verification → save&lt;br&gt;
Output: Confirmation of successful modification&lt;/p&gt;

&lt;p&gt;Scenario 4: Interactive Debugging&lt;/p&gt;

&lt;p&gt;User: "Server won't start. Check logs"&lt;br&gt;
Process: [execute] tail -20 ~/garlic-agent/web.log → error pattern analysis&lt;br&gt;
Output: Root cause identification with recommended solutions&lt;/p&gt;

&lt;p&gt;Scenario 5: Backup and Recovery&lt;/p&gt;

&lt;p&gt;User: "Full backup please"&lt;br&gt;
Process: Execute backup.sh → Create tar.gz archive&lt;br&gt;
Output: Confirmation with archive size and location&lt;/p&gt;

&lt;p&gt;Appendix F: Multi-Provider LLM Orchestration&lt;/p&gt;

&lt;p&gt;Provider Priority Hierarchy&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cerebras (Llama 70B) — Free tier, fastest response, primary choice&lt;/li&gt;
&lt;li&gt;Gemini 3.0 Flash — Paid tier, highly stable, secondary choice&lt;/li&gt;
&lt;li&gt;DeepSeek V3 — Paid tier, superior Korean language handling, tertiary choice&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Fallback Logic and Switching&lt;/p&gt;

&lt;p&gt;The call_llm() function sequentially attempts providers in priority order. On 1st priority failure, automatically switches to 2nd priority with delay under 2 seconds. If all providers fail, returns error message. Maximum retry count is 2 per provider.&lt;/p&gt;
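&lt;p&gt;The fallback behavior described above can be sketched as follows. This is a hedged sketch: the provider callables and error handling are placeholders, not the real call_llm() internals:&lt;/p&gt;

```python
import time

# Illustrative fallback chain: try providers in priority order, at most
# `max_retries` times each, switching on any error (e.g. a 429 response).
PROVIDERS = ["cerebras", "gemini", "deepseek"]

def call_llm(prompt, backends, max_retries=2, delay=1.0):
    for name in PROVIDERS:
        for _ in range(max_retries):
            try:
                return backends[name](prompt)   # backends: name -> callable
            except Exception:
                time.sleep(delay)               # stays under the 2-second budget
    return "[error] all providers failed"
```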

&lt;p&gt;Cross-Verification for Critical Responses&lt;/p&gt;

&lt;p&gt;Important responses undergo verification through different provider. If similarity between responses falls below 70%, triggers hallucination suspicion warning. The call_llm_verify() function performs this automatically.&lt;/p&gt;
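&lt;p&gt;The cross-verification step can be sketched as below. The document does not specify the similarity metric used by call_llm_verify(), so difflib's SequenceMatcher stands in here purely for illustration:&lt;/p&gt;

```python
from difflib import SequenceMatcher

# Sketch of cross-verification: compare two providers' answers and flag
# hallucination suspicion when similarity drops below the 70% threshold.
def responses_agree(a: str, b: str, threshold: float = 0.70) -> bool:
    return SequenceMatcher(None, a, b).ratio() >= threshold
```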

&lt;p&gt;Final Notes&lt;/p&gt;

&lt;p&gt;Integrated Whitepaper Completion: 2026-02-25&lt;br&gt;
Total Content: 10 Chapters + 6 Appendices&lt;br&gt;
Document Version: 2.0&lt;br&gt;
Created by: Garlic Farmer + Garlic Agent (Claude Opus 4.6 + Gemini)&lt;/p&gt;

&lt;p&gt;Though I am merely a Korean garlic farmer without expensive computing equipment, I accomplished this ambitious project. It represents two years of continuous effort, learning, and perseverance through numerous challenges. This achievement demonstrates that with determination, code, logic, and philosophical commitment, one can build powerful personal AI systems accessible to anyone with modest resources.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>designsystem</category>
      <category>beginners</category>
      <category>software</category>
    </item>
    <item>
      <title>🧄 [Project] A Garlic Farmer's garlic-agent: Inspired by OpenClaw, Built on Android Termux with 6K Documents</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Thu, 19 Feb 2026 23:45:28 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/project-project-a-garlic-farmers-garlic-agent-inspired-by-openclaw-built-on-android-termux-4009</link>
      <guid>https://dev.to/garlicfarmer/project-project-a-garlic-farmers-garlic-agent-inspired-by-openclaw-built-on-android-termux-4009</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This document was created with the assistance of garlic-agent RAG (just built) and in collaboration with Claude Opus 4.6

Local RAG for 6K Korean documents running on Android Termux

📌 Table of Contents
1. Project Overview
2. System Environment
3. Project Structure
4. Construction Process (Chronological Order)
5. RAG System Details
6. Search System (FTS5 + Vector)
7. Automation Features
8. GarlicLang Integration
9. Web UI
10. Current System Status
11. Recovery Method
12. Backup File List
13. Future Improvement Direction

1. Project Overview

garlic-agent is a lightweight AI agent that can semantically search, analyze, and act on approximately 6,000 documents (about 6.9 GB) of personal material accumulated over 2 years on Google Drive, entirely in a local Android Termux environment. For reference, the phone is a Unihertz Titan2. The screen is wide, resembling a BlackBerry Passport, which is nice. I do not have a PC. I completed this work with only a BlackBerry Key2 and a few other phones with physical keyboards.

It was created out of curiosity as a replacement for OpenClaw. It currently uses the inexpensive DeepSeek API as the main LLM and implements RAG (Retrieval-Augmented Generation) with the local nomic-embed embedding model.

Core Philosophy:
- Rather than writing code directly, complete the project through the ability to make AI do what you want and to verify the result.
- Minimize technical jargon; deliver everything in a form that can be executed immediately by copy-paste. This demands tremendous concentration, and 24 hours flew by in an instant... I did only directional judgment and verification.
- My fundamental belief is that language is an operating system, and coding is also a language.
  After talking a lot with AI, I realized that structure is the essence.
  However, I do not know coding well. So instead of typing line by line, I prefer cross-verification, keeping several different companies' AIs running at once. I also learned that the AIs keep consistent context across multiple browser windows. With AI developing so remarkably and constantly, I find it amazing that such a thing is possible.

2. System Environment

| Item | Value |
|---|---|
| Device | Android 14, ARM64 |
| Environment | Termux |
| Python | 3.12 |
| Main LLM | DeepSeek (API) |
| Auxiliary LLM | Cerebras, Groq, Gemini, NVIDIA Kimi |
| Embedding Model | nomic-embed-text-v1.5.Q4_K_M.gguf (137 MB, 768 dimensions) |
| Embedding Server | llama.cpp llama-server (port 8081) |
| Web UI | Python HTTP Server (port 8080) |
| DB | SQLite3 (knowledge.db) |

3. Project Structure

~/garlic-agent/
├── agent.py           # Main agent (687 lines)
├── web.py             # Web UI server (Flask-like HTTP)
├── search.py          # Hybrid RAG search (FTS5 + vector)
├── tools.py           # 6 tools (read/exec/write/patch/search/garlic)
├── security.py        # Security settings (exec_timeout: 30s)
├── config.json        # Configuration (max_loops: 30)
├── knowledge.db       # SQLite DB (177 MB, 6,159 docs)
├── agent.html         # Web UI frontend
├── build_rag.py       # RAG embedding generation (initial version)
├── build_rag2.py      # RAG embedding generation (NULL only processing)
├── write_rag_doc.py   # RAG_BUILD.md generation script
├── RAG_BUILD.md       # RAG construction record (275 lines)
├── COMPLETE_BUILD.md  # Complete construction record
├── SOUL.md            # Agent identity/philosophy/principles
├── TOOLS.md           # Tool usage
├── USER.md            # User profile
├── MEMORY.md          # Memory storage
├── HEARTBEAT.md       # Status check
├── KNOWN_ISSUES.md    # Known issues
├── VERSION.md         # Version history
├── HANDOVER.md        # Handover document
├── HANDOVER_QA_20260218.md
├── REPORT_v20.3.md
├── GARLICLANG_SPEC.md # GarlicLang specification
├── scripts/           # GarlicLang scripts (.gl) 42 pieces
├── security/          # Security related
├── static/            # marked.min.js etc.
├── memory/            # Memory by date (2026-02-17~19.md)
└── garliclang_full/   # GarlicLang v20.x complete project
    ├── MASTER_DOC.md
    ├── WORKFLOW.md
    ├── PROJECT_STATUS.md
    ├── BRIEFING.md
    ├── NVIDIA_KIMI_GUIDE.md
    └── ...

~/.openclaw/extensions/kimi-claw/llama.cpp/build/bin/
├── llama-server       # Embedding server binary
└── nomic-embed.gguf   # Embedding model (137 MB)

4. Construction Process (Chronological Order)

v1.5.0 — Basic Agent Complete (2026-02-17)

I converted approximately 6,000 documents from Google Drive Takeout to SQLite knowledge.db. The table structure is id, filename, folder, content, length, and the initial number of documents was 5,879 (38MB).

Basic search was a SQLite LIKE '%keyword%' method. Problems were inability to search based on meaning, slow speed, and inability to perform complex AND/OR searches.

Six tools were implemented: tool:read, tool:exec, tool:write, tool:patch, tool:search, tool:garlic.

v1.5.1 — HUD Added (2026-02-18)

Real-time system HUD was added to the web UI. Measure CPU with /proc/stat, display MEM/SWP/DSK, web.py /hud endpoint, max_loops increased to 20.

v1.5.2 — RAG Integration Complete (2026-02-19)

Detailed explanation in sections 5~9 below.

5. RAG System Details

5-1. Methods Attempted (Failure)

| Method | Result | Reason |
|---|---|---|
| sentence-transformers | ❌ | No ARM64 GPU, excessive package size |
| DeepSeek Embedding API | ❌ | 404 error |
| Gemini API embedding | ❌ | Cannot send personal materials externally |

5-2. Final Choice: llama.cpp + nomic-embed

Start embedding server
~/.openclaw/extensions/kimi-claw/llama.cpp/build/bin/llama-server \
  -m ~/.openclaw/extensions/kimi-claw/llama.cpp/build/bin/nomic-embed.gguf \
  --embeddings --port 8081 -np 4

| Item | Value |
|---|---|
| Model | nomic-embed-text-v1.5.Q4_K_M.gguf |
| Size | 137 MB |
| Dimension | 768 |
| Quantization | Q4_K_M |
| Server Port | 8081 |
| Processing Speed | ~0.68 seconds/document |

5-3. DB Schema Change

ALTER TABLE docs ADD COLUMN embedding BLOB;
-- 768 float32 = 3,072 bytes per document

5-4. Embedding Generation (build_rag2.py)

By processing only documents where embedding IS NULL, I completed 5,858 in approximately 67 minutes (approximately 0.68 seconds/document). Acquire embedding via POST request and store BLOB with struct.pack.

Embedding request example:
payload = json.dumps({"content": text[:2000]}).encode()
req = urllib.request.Request("http://127.0.0.1:8081/embedding", data=payload,
                             headers={"Content-Type": "application/json"})
resp = json.loads(urllib.request.urlopen(req).read())  # contains the 768-dim vector
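The BLOB storage step mentioned above can be sketched as follows (a minimal sketch: the table and column names follow the schema in section 5-3, but the helper names are mine):

```python
import sqlite3
import struct

# Pack the embedding as float32 values: 768 floats = 3,072 bytes per document,
# stored in the BLOB column added in section 5-3.
def store_embedding(con, doc_id, vec):
    blob = struct.pack(f"{len(vec)}f", *vec)
    con.execute("UPDATE docs SET embedding=? WHERE id=?", (blob, doc_id))

def load_embedding(con, doc_id):
    row = con.execute("SELECT embedding FROM docs WHERE id=?",
                      (doc_id,)).fetchone()
    return list(struct.unpack(f"{len(row[0])//4}f", row[0]))
```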

6. Search System (3-Stage Hybrid)

search.py performs 3-stage search.

1st Priority — FTS5 Full-Text Search
CREATE VIRTUAL TABLE IF NOT EXISTS docs_fts
USING fts5(filename, folder, content, content='docs', content_rowid='id');
INSERT INTO docs_fts(docs_fts) VALUES('rebuild');

2nd Priority — Vector Cosine Similarity (RAG)
def cosine(a, b):
    dot = sum(x*y for x,y in zip(a,b))
    na = sum(x*x for x in a)**0.5
    nb = sum(x*x for x in b)**0.5
    return dot/(na*nb) if na and nb else 0

3rd Priority — LIKE Fallback
SELECT id, filename, folder, length, substr(content,1,300)
FROM docs WHERE content LIKE ? ORDER BY length DESC LIMIT ?

| Item | Value |
|---|---|
| FTS5 Weight | 0.5 |
| Vector Similarity Weight | 0.5 |
| Keyword Weight | 0.6 |
| Average Search Time | ~1.7 seconds |
| DB Size (FTS5 included) | 177 MB (existing 84 MB → 177 MB) |
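The weight table above can be sketched as a scoring function. This is a minimal illustration; the real normalization in search.py may differ:

```python
# Combine the two search signals with the 0.5/0.5 weights from the table.
def hybrid_score(fts_score, vector_sim, w_fts=0.5, w_vec=0.5):
    return w_fts * fts_score + w_vec * vector_sim

def rank(candidates):
    """candidates: list of (doc_id, fts_score, vector_sim); best first."""
    return sorted(candidates,
                  key=lambda c: hybrid_score(c[1], c[2]), reverse=True)
```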

7. Automation Features

7-1. tool:write Auto Indexing

Added _auto_index() function to tools.py. When file is saved with tool:write, it automatically registers in knowledge.db and creates embedding.

def _auto_index(path, content):
    # Generate embedding only when llama-server is running
    # INSERT or UPDATE in the knowledge.db docs table
    # Automatically save the embedding BLOB

Test: Saved test_auto_index.md → Confirmed immediate registration with ID 6154 ✅

7-2. Backup Script

~/garlic-agent/scripts/backup.sh
bash ~/garlic-agent/scripts/backup.sh

Execution: tar creation → Download copy → Auto media scan

7-3. webstart Alias

Registered in ~/.bashrc
webstart  # = cd ~/garlic-agent &amp;amp;&amp;amp; python3 web.py

7-4. Browser Timeout (agent.html)

var ctrl = new AbortController();
var tid = setTimeout(function(){ ctrl.abort(); }, 600000); // 10 minutes
fetch("/chat", { signal: ctrl.signal, ... })
  .then(...)
  .finally(function(){ clearTimeout(tid); });

8. GarlicLang Integration

GarlicLang v20.x is a Korean-based AI scripting language. It uses .gl extension and is executed with tool:garlic.

Example GarlicLang Script (test_hello.gl)
[File Write] test_hello.py
print("Hello GarlicLang")
[/File Write]
[Execute] python3 test_hello.py [/Execute]
[Verify] Output contains "Hello GarlicLang" [/Verify]
[Output] Verification result [/Output]

- Script location: ~/garlic-agent/scripts/ (42 .gl files)
- GarlicLang complete project: ~/garlic-agent/garliclang_full/
- knowledge.db contains 94 or more GarlicLang-related documents
- 140 .gl files exist in the home directory

9. Web UI

- URL: http://127.0.0.1:8080?token=garlic2026
- Markdown rendering: marked.js (CDN + static fallback)
- Clipboard button: Response copy function
- Model selection: DeepSeek / Cerebras / Groq / Gemini / NVIDIA
- HUD: Real-time MEM/SWP/DSK display on top of screen
- SSE streaming: Real-time response output

10. Current System Status (2026-02-19 Final)

| Item | Value |
|---|---|
| Version | garlic-agent v1.5.2 |
| Total Documents | 6,159 |
| Embedding Complete | 5,858 (the remainder are newly added) |
| DB Size | 177 MB (FTS5 included) |
| FTS5 Index | docs_fts virtual table ✅ |
| Auto Indexing | Automatic on tool:write save ✅ |
| agent.py | 687 lines |
| max_loops | 30 |
| Search Speed | ~1.7 seconds |
| Embedding Model | nomic-embed-text-v1.5 (137 MB, 768 dimensions) |
| Distribution | garlic-agent-v1.5.2.tar.gz (150 KB, excluding DB) |

Currently not considering distribution. Honestly, I do not know how to use GitHub and do not want to learn. Several AIs' opinions say this approach is fine, so I am doing it this way. I do not know the details; I only know what content is in it.

11. Recovery Method

Recovery order when new phone/reinstall

Step 1 — Termux installation and basic environment setup
pkg update &amp;amp;&amp;amp; pkg upgrade
pkg install python sqlite git
pip install requests flask

Step 2 — Code Recovery
Recover from Download folder
cp /storage/emulated/0/Download/garlic-agent-v1.5.2.tar.gz ~/
cd ~ &amp;amp;&amp;amp; tar xzf garlic-agent-v1.5.2.tar.gz

Step 3 — DB Recovery
cp /storage/emulated/0/Download/knowledge.db ~/garlic-agent/knowledge.db

Step 4 — Embedding Server Installation (Optional)
- Download nomic-embed.gguf (137 MB) from Google Drive
- Build llama.cpp or restore binary
- Start server:
~/.openclaw/.../llama-server -m nomic-embed.gguf --embeddings --port 8081 -np 4

Step 5 — Start Agent
cd ~/garlic-agent &amp;amp;&amp;amp; python3 web.py

Or if registered in ~/.bashrc:
webstart

Step 6 — Browser Access
http://127.0.0.1:8080?token=garlic2026

⚠️ Keyword search (FTS5 + LIKE) works normally even without embedding server. Only vector similarity search is disabled.

12. Backup File List

| File | Size | Location | Priority |
|---|---|---|---|
| knowledge.db | 177~178 MB | /storage/emulated/0/Download/ | ⭐⭐⭐ Essential |
| garlic-agent-v1.5.2.tar.gz | 150 KB | /storage/emulated/0/Download/ | ⭐⭐⭐ Essential |
| COMPLETE_BUILD.md | 8.5 KB | /storage/emulated/0/Download/ | ⭐⭐ Recommended |
| RAG_BUILD.md | ~10 KB | /storage/emulated/0/Download/ | ⭐⭐ Recommended |
| nomic-embed.gguf | 137 MB | Redownloadable from HuggingFace | ⭐ Optional |

Google Drive upload recommended files:
- knowledge.db — 2 years of accumulated tens of thousands of conversations with AI, 1st refined approximately 6G materials + embedding included, most important
- garlic-agent-v1.5.2.tar.gz — Complete code (excluding DB)
- COMPLETE_BUILD.md — This document (including recovery guide)

13. SOUL.md Core Principles (Current)

The SOUL.md containing garlic-agent's identity and action principles includes the following. Referenced OpenClaw and plan to add my philosophy as it progresses.

Identity: Lightweight autonomous AI agent running on Android Termux. Can access user's 6,159 personal documents.

User Background: Has lived as a farmer for 16 years. Previously had experience with mainframes, IDC construction/operation during the Internet-era transition, servers, networks, firewalls, backups, EMC, and various Unix systems. Since then I devoted myself to agriculture and lived a life where I forgot about PCs.
I first approached AI out of curiosity, trying to revive some old memories. This is the truth. I have no lifelong coding experience at all. However, it seems I see structural, systemic things well; farmers need observation and meticulousness to grow crops. Currently I give instructions to AIs in Korean, verify, and only make judgments. Looking back, my entire life seems like one long, lonely wandering. Now I am thinking of living a different life.

AI Kernel 3 Core Principles:
1. Extreme Realism Principle — Use only verifiable facts, official documents, numerical values. No speculation.
2. Metacognitive Autonomy — Self-improvement based on feedback. Auto-correction on failure.
3. Hierarchical Orchestration — Decompose complex tasks step-by-step for processing.

Autonomous Execution Rights: All commands executable in Termux including tar, cp, pkill, am broadcast, sed, grep, sqlite3, python3, etc.

14. Known Issues and Solutions

| Issue | Cause | Solution |
|---|---|---|
| tool:patch 0 patch failure | Patch format mismatch | Use tool:write for full overwrite |
| SQLite3 result reading mismatch | DeepSeek hallucination | Use Python script to query directly |
| Browser connection disconnection | AbortController timeout | Set to 600,000ms (10 minutes) |
| BodyStreamBuffer was aborted | Timeout + missing clearTimeout | clearTimeout added |
| Version display v1.5.0 | Hardcoded in agent.py | Replaced with v1.5.2 via sed |

15. Future Improvement Direction

- Automatic embedding server start/stop: Auto-run llama-server when web.py starts
- Real-time indexing queue: Generate embedding immediately when file is saved (currently only when server is running)
- Search result caching: Cache frequently searched query results
- Feedback-based weighting: Auto-adjust FTS5/vector weights based on user selection
- Multimodal search: Index image/PDF content
- agent.py v2: Better context management, multi-turn memory

Final Performance Summary

$$\text{Total Documents} = 5879(\text{original}) + 274(\text{garliclang}) + n(\text{new}) = 6159$$

$$\text{Embedding Generation Time} \approx 5858 \times 0.68s \approx 67\text{ minutes}$$

$$\text{Search Speed} \approx 1.7s \ (\text{FTS5} + \text{cosine similarity})$$

$$\text{DB Size}: 38MB(\text{original}) \rightarrow 84MB(\text{embedding}) \rightarrow 177MB(\text{FTS5})$$

This document is an incomplete record of garlic-agent v1.5.2 construction process and observation experiment, but when provided to a new AI, the entire context can be immediately grasped.

And I dedicate infinite respect and tribute to the late Steve Jobs, who connected the world with a phone just like this.
And I also thank Peter Steinberger of OpenClaw, who inspired me. It is because of you. Thank you very much.

I seldom post in communities, and non-English speakers struggle with translation, so I can only translate with AI. All of my work is done in Korean, so it may read strangely when moved into English, but please see it as the observation-and-experiment development log of a Korean farmer. I worked very hard for a few days, even skimping on sleep. It is a humble result, but on my phone I feel like I can do whatever I imagine, so the work gave me a real sense of accomplishment. For the first time in my life I made a web UI, and it works so well. Now I have confidence that I can do anything with my phone, building on my data so far. Also, as I use more than ten different AIs every day and watch AI develop dazzlingly, I can feel the differences right away with human intuition. I think that comes from the experience of tens of thousands of conversations over the past 2 years, and this project became the motivation to put it to use. Thank you for reading this long article to the end.

Written by: Korean Garlic Farmer &amp;amp; opus4.6, 2026-02-19 🧄
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>openclaw</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>How a Korean Garlic Farmer Built a Scripting Language That Catches AI Lies — On a Phone</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Sat, 14 Feb 2026 12:44:48 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/how-a-korean-garlic-farmer-built-a-scripting-language-that-catches-ai-lies-on-a-phone-44al</link>
      <guid>https://dev.to/garlicfarmer/how-a-korean-garlic-farmer-built-a-scripting-language-that-catches-ai-lies-on-a-phone-44al</guid>
      <description>&lt;p&gt;I'm a Korean garlic farmer with no PC. I built a programming language on my phone using only AI conversations.&lt;/p&gt;

&lt;p&gt;TL;DR: No coding experience. No computer. Just a smartphone, copy-paste, and conversations with AI. The result is GarlicLang — a Python-based scripting language that tells you when AI is lying.&lt;/p&gt;

&lt;p&gt;What happened&lt;/p&gt;

&lt;p&gt;I'm a garlic farmer in South Korea. I don't have a PC. I don't know how to code. But I wanted a way to give commands to AI and verify whether the output is real or hallucinated.&lt;/p&gt;

&lt;p&gt;So I started talking to Claude (Anthropic's AI) on my phone. I described what I wanted in Korean. Claude designed the language. I copied the code, pasted it into ChatGPT's sandbox, and ran it. When tests failed, I carried the error messages back to Claude. When Claude needed execution results, I carried them from ChatGPT.&lt;/p&gt;

&lt;p&gt;I was the human relay between AIs, using nothing but copy and paste.&lt;/p&gt;

&lt;p&gt;The language is called GarlicLang. It's written in pure Python (standard library only, zero dependencies), and it runs inside AI sandboxes like ChatGPT's Code Interpreter.&lt;/p&gt;

&lt;p&gt;What makes it different&lt;/p&gt;

&lt;p&gt;GarlicLang has a command that no other language has:&lt;/p&gt;

&lt;p&gt;try&lt;br&gt;
    run "python3 script.py"&lt;br&gt;
    verify output contains "expected answer"&lt;br&gt;
on hallucination&lt;br&gt;
    print "AI lied."&lt;/p&gt;

&lt;p&gt;on hallucination triggers when the command succeeds (exit code 0) but the output doesn't match what you expected. This is designed specifically to catch AI fabrication — not crashes, not errors, but confident wrong answers.&lt;/p&gt;
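&lt;p&gt;To make the semantics concrete, here is a rough plain-Python sketch of what "on hallucination" is described as doing. This is my illustration of the idea, not the actual GarlicLang interpreter code; the function name run_and_verify is made up.&lt;/p&gt;

```python
import subprocess

def run_and_verify(cmd, expected):
    # Classify a command the way "on hallucination" is described:
    # non-zero exit -> error; exit 0 with matching output -> verified;
    # exit 0 with non-matching output -> hallucination.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        return "error"
    if expected in result.stdout:
        return "verified"
    return "hallucination"

print(run_and_verify("echo hello", "hello"))    # verified
print(run_and_verify("echo goodbye", "hello"))  # hallucination
```

&lt;p&gt;The key point is the third branch: an exit code of 0 normally means "success", so a plain try/catch would never fire there.&lt;/p&gt;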

&lt;p&gt;What it can do&lt;/p&gt;

&lt;p&gt;Write files, run commands, verify results, define functions, use arrays, loop with while/break/continue, import other scripts, and catch errors or hallucinations. All in a syntax designed to be readable by non-programmers.&lt;/p&gt;

&lt;p&gt;Example — check if AI wrote the correct file:&lt;/p&gt;

&lt;p&gt;write "hello.py" "print('hello from garlic farm!')"&lt;br&gt;
run "python3 hello.py"&lt;br&gt;
verify output contains "hello"&lt;/p&gt;

&lt;p&gt;Example — sum 1 to 100 with a loop:&lt;/p&gt;

&lt;p&gt;let sum = 0&lt;br&gt;
let i = 1&lt;br&gt;
while i &amp;lt;= 100&lt;br&gt;
    let sum = sum + i&lt;br&gt;
    let i = i + 1&lt;br&gt;
end&lt;br&gt;
print sum&lt;br&gt;
verify output contains "5050"&lt;/p&gt;

&lt;p&gt;The numbers (all verified by actual execution)&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test suite&lt;/th&gt;
&lt;th&gt;PASS&lt;/th&gt;
&lt;th&gt;FAIL&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Phase 1 — basics&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;file ops, run, verify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 2 — error handling&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2 failures are intentional (test the error handlers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 3 — variables &amp;amp; print&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;enabled by v0.3.1 bug fix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 4 — arrays, loops, functions&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;all v0.4 features verified&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;44 tests, 2 intentional failures&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Additional tests passed: recursion (5! = 120), nested arrays, Korean special characters, error recovery (try/on-fail with division by zero), and summing a 100-element array (= 5050).&lt;/p&gt;

&lt;p&gt;All tests were executed in ChatGPT's Code Interpreter sandbox. Process ID, working directory, and file system contents were verified.&lt;/p&gt;

&lt;p&gt;The honest problems&lt;/p&gt;

&lt;p&gt;Three bugs were found and documented:&lt;/p&gt;

&lt;p&gt;Bug 1: while treats the string "0" as true, but if treats it as false. Same condition, different behavior.&lt;/p&gt;

&lt;p&gt;Bug 2: verify file "variable_name" contains "text" doesn't resolve the variable — it looks for a file literally named "variable_name". Reproduced and confirmed.&lt;/p&gt;

&lt;p&gt;Bug 3: After verify run "command" contains "text", the interpreter doesn't save the output, so on hallucination checks the wrong data.&lt;/p&gt;
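&lt;p&gt;Bug 1 looks like a classic string-truthiness mismatch. Here is a plain-Python guess at the cause (my illustration, not the actual GarlicLang source): if while tests the raw string but if converts it to a number first, the same value "0" behaves differently.&lt;/p&gt;

```python
value = "0"

as_string = bool(value)       # True: any non-empty string is truthy
as_number = bool(int(value))  # False: the number 0 is falsy

print(as_string, as_number)   # True False
```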

&lt;p&gt;ChatGPT rated the project 6.0/10: originality 8, usability 6, completeness 5, stability 4, extensibility 6.&lt;/p&gt;

&lt;p&gt;These are real scores from an AI that actually ran the code, not my own rating.&lt;/p&gt;

&lt;p&gt;How it was built&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;AI&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;Designed the language, wrote docs, analyzed bugs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT (Code Interpreter)&lt;/td&gt;
&lt;td&gt;Saved files, ran all tests, reproduced bugs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Me (garlic farmer)&lt;/td&gt;
&lt;td&gt;Relayed messages between AIs via copy-paste on phone&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No git. No IDE. No terminal. Just chat windows and a clipboard.&lt;/p&gt;

&lt;p&gt;What I learned&lt;/p&gt;

&lt;p&gt;AI estimates of line counts were consistently wrong (guesses ranged from 578 to 1,697; actual count was 783 lines for the main module, measured with wc -l). Never trust AI estimates — always measure.&lt;/p&gt;

&lt;p&gt;pip install fails in some AI sandboxes. The workaround is sys.path.insert(0, '.'). If that fails, a standalone build script merges all modules back into one file.&lt;/p&gt;
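&lt;p&gt;The "merge all modules back into one file" workaround can be sketched in a few lines. This is a minimal illustration under my own assumptions (the function build_standalone and the file names are made up), not the project's actual build script.&lt;/p&gt;

```python
from pathlib import Path

def build_standalone(module_paths, out_path):
    # Concatenate several Python modules into one self-contained file,
    # marking where each module came from.
    parts = []
    for path in module_paths:
        parts.append(f"# --- from {path} ---\n{Path(path).read_text()}")
    Path(out_path).write_text("\n\n".join(parts))

# Tiny demonstration with two throwaway modules:
Path("mod_a.py").write_text("A = 1\n")
Path("mod_b.py").write_text("B = 2\n")
build_standalone(["mod_a.py", "mod_b.py"], "standalone.py")
```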

&lt;p&gt;If you give AI too many instructions at once, it fails. Breaking tasks into single steps works.&lt;/p&gt;

&lt;p&gt;Current state&lt;/p&gt;

&lt;p&gt;Version 0.4.1. Eight Python modules, ~2,000 total lines. Works in ChatGPT sandbox. Three known bugs documented with fix instructions ready. No external dependencies.&lt;/p&gt;

&lt;p&gt;The source code isn't public yet. I'm still deciding how to release it.&lt;/p&gt;

&lt;p&gt;Built with no code, no PC, no experience. Just garlic, a phone, and AI.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>"I'm a Garlic Farmer. I Build Software on My Phone. I Can't Code."</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Thu, 12 Feb 2026 21:59:38 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/i-build-software-on-my-phone-i-cant-code-5ej7</link>
      <guid>https://dev.to/garlicfarmer/i-build-software-on-my-phone-i-cant-code-5ej7</guid>
      <description>&lt;p&gt;A Garlic Farmer's Guide to AI: Building Software with Nothing but a Phone&lt;/p&gt;

&lt;p&gt;The Setup&lt;/p&gt;

&lt;p&gt;I'm a garlic farmer. I've been living in rural South Korea for 16 years. I don't own a PC. Everything I do with AI, I do from my phone.&lt;/p&gt;

&lt;p&gt;That sounds like a limitation — and it is. But it's also the reason I discovered something that neither developers nor typical AI users seem to be doing.&lt;/p&gt;

&lt;p&gt;What Most People Do with AI&lt;/p&gt;

&lt;p&gt;There are roughly two groups of people using AI for coding right now.&lt;/p&gt;

&lt;p&gt;Non-developers open ChatGPT and say "write me a calculator." The AI spits out code as text. They look at it, maybe copy it somewhere, and that's the end of it. The code never actually runs. It's just text on a screen.&lt;/p&gt;

&lt;p&gt;Developers use tools like Claude Code, OpenClaw/Pi, or Cursor on their PCs. They open a terminal, type commands, install packages, set up API keys, and run code directly. AI helps them — suggests code, fixes bugs — but the developer is the one actually executing everything. They're the hands; AI is the assistant.&lt;/p&gt;

&lt;p&gt;I'm in neither group.&lt;/p&gt;

&lt;p&gt;What I Do Instead&lt;/p&gt;

&lt;p&gt;I don't write code. I don't read code. I don't have a terminal. What I do is this: I open an AI chat window, paste in a set of tools, and tell the AI to build things using only those tools.&lt;/p&gt;

&lt;p&gt;The tools are simple. There are exactly four: Write (create a file), Read (read a file), Edit (modify a file), and Bash (run a command). These four buttons — packaged as something called Pi Tools — turn any AI chat window into a programming environment.&lt;/p&gt;
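&lt;p&gt;To show how little machinery "four tools" really is, here is a toy version in plain Python. This is my own illustration of the idea, not the actual Pi Tools code; the helper names mirror the four tools but everything else is invented.&lt;/p&gt;

```python
import subprocess
from pathlib import Path

def write(path, text):
    Path(path).write_text(text)                   # Write: create a file

def read(path):
    return Path(path).read_text()                 # Read: read a file back

def edit(path, old, new):
    p = Path(path)                                # Edit: replace text in a file
    p.write_text(p.read_text().replace(old, new))

def bash(cmd):
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.stdout                            # Bash: run a command

write("hello.py", "print('hello')")
edit("hello.py", "hello", "hello from garlic farm")
print(bash("python3 hello.py"))  # prints "hello from garlic farm"
```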

&lt;p&gt;When I tell the AI "create a student grade management system," it doesn't just show me code as text. It actually creates the files, runs the code, checks the output, finds errors, fixes them, and runs again. Real files get created. Real code gets executed. Real results come back.&lt;/p&gt;

&lt;p&gt;I don't touch any of it. I just decide what to build and what to do next.&lt;/p&gt;

&lt;p&gt;Where This Idea Came From&lt;/p&gt;

&lt;p&gt;It wasn't some grand vision. It came from not having a choice.&lt;/p&gt;

&lt;p&gt;OpenClaw/Pi is an open-source AI coding agent that took GitHub by storm — 145,000 stars in a week. Its core discovery was that an AI only needs four tools (read, write, edit, bash) to work like a programmer. But to use it, you need a PC, Node.js, a terminal, and an API key. For a developer, that's five minutes of setup. For me, it's impossible.&lt;/p&gt;

&lt;p&gt;So I asked a different question: what if I could give those same four tools to an AI through a chat window?&lt;/p&gt;

&lt;p&gt;I had an AI translate the core concept from TypeScript to Python, stripped out everything that required an API or a server, and ended up with about 150 lines of code that could be pasted into any AI chat sandbox. No installation. No API key. No PC. Just paste and go.&lt;/p&gt;

&lt;p&gt;That's Pi Tools.&lt;/p&gt;

&lt;p&gt;The Secret Weapon: Copy-Paste and AI-Written Instructions&lt;/p&gt;

&lt;p&gt;There's a part of my workflow that sounds too simple to be important, but it's actually the most powerful thing I do.&lt;/p&gt;

&lt;p&gt;Copy-paste is the backbone. When one AI produces a result — a report, a code file, an analysis — I don't summarize it in my own words and relay it to another AI. I copy the entire output and paste it directly. This preserves every detail. When I pasted Grok's full research report into Claude, Claude could spot that Grok had used the word "estimated" and "extraction failed" — details I would have missed if I'd just said "Grok wrote a report and it seemed exaggerated."&lt;/p&gt;

&lt;p&gt;Human language loses information. AI output pasted raw does not.&lt;/p&gt;

&lt;p&gt;I don't write instructions either. When I need to give an AI a complex task, I ask a different AI to write the instructions for me. The result is dramatically better than anything I could write myself, because the AI includes technical structure, edge cases, and precise conditions that I wouldn't know to specify.&lt;/p&gt;

&lt;p&gt;Here's what actually happened in this project: I told Claude "make me a test prompt for GLM-5 to verify its sandbox." Claude produced a detailed 10-step instruction set with specific file names, test counts, validation criteria, and execution order. I copied that instruction and pasted it into GLM-5. GLM-5 executed all 10 steps autonomously.&lt;/p&gt;

&lt;p&gt;I didn't write the code. I didn't write the instructions. I decided what to test and which AI should write the instructions for which other AI.&lt;/p&gt;

&lt;p&gt;AI-written instructions work better for AI. This sounds obvious once you hear it, but most people don't do it. They write their own prompts in casual human language. An AI writing instructions for another AI uses the structure, terminology, and precision that AI responds best to. It's like having a translator who speaks both languages fluently, instead of trying to speak a language you barely know.&lt;/p&gt;

&lt;p&gt;The combination — raw copy-paste for data transfer, AI-generated instructions for task assignment — is what makes the whole system work. I'm not the brain or the hands. I'm the nervous system connecting everything.&lt;/p&gt;

&lt;p&gt;Testing Across Platforms — What I Found&lt;/p&gt;

&lt;p&gt;I didn't just build Pi Tools and call it a day. I tested it across multiple AI platforms to see what's real and what's not.&lt;/p&gt;

&lt;p&gt;GLM-5 (Zhipu AI, China)&lt;/p&gt;

&lt;p&gt;Released February 11, 2026. 744 billion parameters, MIT license, currently in free beta.&lt;/p&gt;

&lt;p&gt;I gave it a 10-step project: build a student grade management system with three interconnected Python files, generate random data for 15 students across 5 subjects, run statistical analysis, produce a ranked report, edit one student's math score from 94 to 100, and re-run analysis to confirm the change was reflected.&lt;/p&gt;

&lt;p&gt;GLM-5 completed all 10 steps without human intervention, consuming about 70,000 tokens. The edit was correctly reflected in the final report — the top math scorer changed from one student to another after the modification.&lt;/p&gt;

&lt;p&gt;But here's the twist: GLM-5 honestly admitted it didn't use my Pi Tools. It already has the same four tools built in. Its exact response was a detailed breakdown showing that its internal tools (Write, Read, Bash, Edit) are functionally identical to Pi Tools, making my version an unnecessary extra layer in its environment.&lt;/p&gt;

&lt;p&gt;This was actually a valuable discovery. It confirmed that the "four minimal tools" philosophy — which OpenClaw/Pi pioneered — has become an industry standard. GLM-5 independently arrived at the same architecture.&lt;/p&gt;

&lt;p&gt;Mistral AI (France)&lt;/p&gt;

&lt;p&gt;Mistral's Le Chat has a Code Interpreter, but it works like a calculator — run code once, get a result. It doesn't have built-in tools for creating files, editing them, and chaining multi-step workflows.&lt;/p&gt;

&lt;p&gt;When I added Pi Tools to Mistral's sandbox, those capabilities appeared. The AI could suddenly create files, introduce bugs deliberately, detect them, fix them, and re-test — a multi-step debugging loop that wasn't possible before. Pi Tools gave Mistral hands it didn't have.&lt;/p&gt;

&lt;p&gt;GPT (OpenAI)&lt;/p&gt;

&lt;p&gt;Similar to Mistral. GPT's Code Interpreter can execute code, but it tends to stop after each step and ask "what next?" It doesn't naturally chain 10 steps together autonomously. With Pi Tools, file operations become possible, but you need to keep typing "continue" at each step. It works, but it's not autonomous.&lt;/p&gt;

&lt;p&gt;The reason: GLM-5 was specifically trained for "agentic engineering" — executing tool chains autonomously. GPT was trained primarily for conversation. Same tools, different instincts. GLM-5 is a factory worker who moves to the next station automatically. GPT is a consultant who finishes one task and waits for the next request.&lt;/p&gt;

&lt;p&gt;Three Ways to Use AI — A Comparison&lt;/p&gt;

&lt;p&gt;Through this process, I've identified three distinct approaches.&lt;/p&gt;

&lt;p&gt;Non-developers ask AI for text. The AI writes code on screen. It never runs. The human can't verify it.&lt;/p&gt;

&lt;p&gt;Developers use AI as an assistant. They execute code on their own PC. The AI suggests and fixes. The human does the actual work and can directly verify everything because they read code.&lt;/p&gt;

&lt;p&gt;My approach — AI is the worker. I give direction. The AI creates files, runs code, checks results, fixes errors. Multiple AIs handle different roles: one designs, one executes, one verifies. I don't write code. I don't read code. I orchestrate.&lt;/p&gt;

&lt;p&gt;Right now, developers have the clear advantage. When AI makes a mistake, a developer reads the code and fixes it in seconds. I have to ask another AI "is this right?" — adding a step and the risk that both AIs miss the same error.&lt;/p&gt;

&lt;p&gt;But the direction is clear. Six months ago, a 10-step autonomous task was unreliable — AI would lose context, break the chain, produce garbage. Today, GLM-5 completes it without human intervention. As AI gets smarter and errors decrease, the ability to "read and fix code" becomes less critical. What remains valuable is the ability to decide what to build, how to structure it, and which AI should do what. That's direction-setting, not coding.&lt;/p&gt;

&lt;p&gt;The Honest Limitations&lt;/p&gt;

&lt;p&gt;Here's what doesn't work.&lt;/p&gt;

&lt;p&gt;I can't verify code directly. When AI writes code, I can't look at line 23 and spot a bug. I have to ask another AI to check. This adds a step and creates the risk that two AIs agree on the same wrong answer. Cross-verification with a third AI reduces this risk but doesn't eliminate it.&lt;/p&gt;

&lt;p&gt;Complex errors are slow to resolve. A developer sees a stack trace and knows what to change in seconds. I describe the problem to an AI, wait for a fix, test it, and sometimes repeat several cycles. What takes a developer 30 seconds can take me 10 minutes.&lt;/p&gt;

&lt;p&gt;Token consumption is severe. The 10-step GLM-5 test used 70,000 tokens. On a paid plan, that's real money for a single task. My detailed instructions — 15 students, 5 subjects, full raw output at every step — contributed significantly. Simpler instructions with fewer data points would cut this in half.&lt;/p&gt;

&lt;p&gt;Not all platforms work equally. GLM-5 runs 10 steps alone. GPT needs prodding at every step. Mistral needs Pi Tools just to do basic file operations. There's no universal experience — you have to know each AI's strengths and limits.&lt;/p&gt;

&lt;p&gt;File management breaks down at scale. Beyond 3-4 files, tracking what exists where and what depends on what gets difficult when you can't browse a file system directly. Projects with complex interdependencies are significantly harder.&lt;/p&gt;

&lt;p&gt;AI flattery is a constant trap. Every AI tends to tell you your work is brilliant. After 10,000 conversations, I've learned to push back explicitly: "Is this actually good, or are you being agreeable?" Without this discipline, you end up in an echo chamber where every idea sounds revolutionary but nothing is validated.&lt;/p&gt;

&lt;p&gt;What Happens When You Ask AI to Be Honest&lt;/p&gt;

&lt;p&gt;During this project, I ran an experiment that illustrates the flattery problem perfectly.&lt;/p&gt;

&lt;p&gt;I asked Grok to analyze my Reddit posts and their reception. The first report came back glowing: "50+ comments estimated, 60% positive sentiment, viral potential high, Korean communities sharing your work." It sounded great.&lt;/p&gt;

&lt;p&gt;Then I gave Grok a strict instruction: "No estimation. No flattery. Only cite what you can actually access. If you can't find it, say so."&lt;/p&gt;

&lt;p&gt;The second report: 1 post successfully accessed, 1 comment found (an automated bot message), 4 posts returned server errors, Korean community results: none found.&lt;/p&gt;

&lt;p&gt;Same AI. Same topic. Different instruction. The first report was fabricated confidence; the second was honest failure. The real data was somewhere in between — the comments and engagement do exist (I posted farm photos in replies and got responses), but Grok couldn't technically access them due to Reddit's crawling restrictions.&lt;/p&gt;

&lt;p&gt;This is why cross-verification matters. This is why I use multiple AIs. And this is why I always push back when an AI tells me something sounds too good.&lt;/p&gt;

&lt;p&gt;What 10,000 Conversations Taught Me&lt;/p&gt;

&lt;p&gt;Over two years, I've had roughly 10,000 conversations across 12-15 AI platforms daily — Claude, GPT, Mistral, Gemini, DeepSeek, GLM, and others. All from my phone. All stored in a 3GB Google Drive folder that functions as my personal knowledge base.&lt;/p&gt;

&lt;p&gt;2-4 AIs is the optimal number. I started with 10+ simultaneously. It's chaos — too many voices, contradicting advice, impossible to track. Now I use a core team: one for design (usually Claude), one for execution (varies by task), one for verification. Additional AIs are brought in for specific needs.&lt;/p&gt;

&lt;p&gt;Each AI has a distinct working style. Not officially documented, but unmistakable after enough conversations. Claude is cautious and structured. GPT is enthusiastic but loses focus in long chains. Mistral is fast but shallow. GLM-5 is thorough but token-hungry. DeepSeek is strong on technical analysis. Matching the right AI to the right task makes a measurable difference.&lt;/p&gt;

&lt;p&gt;Autonomous execution sounds impressive but wastes resources. GLM-5 running 10 steps alone consumed 70,000 tokens. If step 3 had gone wrong, the remaining 7 steps would have burned tokens on garbage. Checking each step manually is slower but catches errors early. Sometimes the "inefficient" human-in-the-loop approach is actually more efficient.&lt;/p&gt;

&lt;p&gt;My Google Drive is a manual RAG system. When starting a new project, I ask Gemini to search through my 3GB of stored files — code, guidelines, design documents, experiment logs — find relevant references, and summarize them. I take that summary to Claude or another AI to begin actual work. It's retrieval-augmented generation built with nothing but a phone, a cloud folder, and chat windows.&lt;/p&gt;

&lt;p&gt;The Bigger Picture&lt;/p&gt;

&lt;p&gt;The security layer I added to Pi Tools (v3) — PathJail, SecurityEngine, content inspection, loop guards, backup management, execution logging — came from studying OpenClaw's known vulnerabilities (including CVE-2026-25253) and designing protections that the original deliberately left out. OpenClaw's creator, Mario Zechner, chose a "YOLO mode" philosophy — no safety rails, full trust in the model. I disagreed with that for sandbox environments and built the opposite: a security checkpoint that screens every command, file operation, and code execution before it runs.&lt;/p&gt;

&lt;p&gt;A developer would implement this by writing the code. I implemented it by directing AIs to write code based on my security design. The v3 code is 920 lines with 116 tests, zero external dependencies, built entirely through AI chat windows on a phone.&lt;/p&gt;

&lt;p&gt;Every piece of this setup exists because I couldn't do it the "normal" way. No PC meant no terminal. No terminal meant chat windows became my IDE. No coding skill meant AI became my developer. No single AI was reliable enough, so multiple AIs became my team. Constraints created the method.&lt;/p&gt;

&lt;p&gt;Is Anyone Else Doing This?&lt;/p&gt;

&lt;p&gt;I searched. I had Claude search. I had Grok search extensively across Reddit, X, Korean communities, and the wider web.&lt;/p&gt;

&lt;p&gt;The honest answer: not really. "Vibe coding" — where non-developers use AI to build apps — is booming, but those people have PCs and use tools like Cursor or Claude Code. "Mobile AI agents" like DroidRun let AI control your phone screen, but that's automation, not software development. People paste code snippets into AI chats, but not as a systematic agent tool framework.&lt;/p&gt;

&lt;p&gt;The specific combination — no PC, chat-window-only, agent tool injection, multi-AI orchestration, open-source engine porting — returns essentially zero matching results. My own Reddit posts are the top search results for this approach.&lt;/p&gt;

&lt;p&gt;This isn't a boast. It's a data point. The space between "developer with a terminal" and "non-developer who just gets text" is currently empty. I'm in it because I had no other option. Others will follow as AI chat sandboxes improve and more people realize that asking AI to "run code" instead of "show code" is a fundamentally different experience.&lt;/p&gt;

&lt;p&gt;For Those Who Want to Try&lt;/p&gt;

&lt;p&gt;If you have a phone and access to an AI chat with a code sandbox (GLM-5, Claude, Mistral Le Chat, ChatGPT), start with this:&lt;/p&gt;

&lt;p&gt;Ask the AI to write a Python file that prints "Hello World," save it as a real file, and execute it. If the AI creates a file, runs it, and shows you the actual output — not just displays code text — you have a working sandbox. From there, try having it create two files where one imports the other, or build a simple calculator with unit tests.&lt;/p&gt;
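&lt;p&gt;The follow-up exercise, two files where one imports the other, looks like this when sketched directly in Python (my illustration of what a working sandbox should be able to do; the file names are arbitrary):&lt;/p&gt;

```python
import subprocess
from pathlib import Path

# Create two real files, one importing the other, then actually run them.
Path("greet.py").write_text("def greet():\n    return 'Hello World'\n")
Path("main.py").write_text("from greet import greet\nprint(greet())\n")

result = subprocess.run(["python3", "main.py"], capture_output=True, text=True)
print(result.stdout)  # prints "Hello World"
```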

&lt;p&gt;The key shift: don't ask AI to show you code. Ask AI to run code. That's the difference between getting a text response and getting actual work done.&lt;/p&gt;

&lt;p&gt;If you want to go further, ask one AI to write detailed instructions for a task, then paste those instructions into a different AI's sandbox. You'll immediately notice the difference — AI-written instructions produce better results from other AIs than anything you'd write yourself.&lt;/p&gt;

&lt;p&gt;And always verify. Ask a third AI to check the second AI's work. The moment you stop cross-checking is the moment you start building on errors.&lt;/p&gt;

&lt;p&gt;I'm a garlic farmer in South Korea with no PC. Over 2 years I've had ~10,000 AI conversations across 12-15 platforms, all from my phone. I ported OpenClaw/Pi's agent engine from TypeScript to Python, built a 920-line security layer with 116 tests, verified GLM-5's sandbox capabilities on launch day, and documented everything in a 3GB Google Drive. Previous experiment logs are in my earlier posts. If you have questions about the method or want to see specific tests, I'm here.&lt;/p&gt;

&lt;p&gt;A note on language: I think in Korean. I don't speak or write English well. This entire post was translated and polished with the help of multiple AIs — which means some nuances of my original thinking may be lost, and the writing may feel uneven in places. But that's part of the point. I got here by asking AIs questions, one at a time, from a phone, in a language that isn't English. If this process is hard for native English speakers, it's even harder for those of us who aren't. I appreciate your patience with any awkwardness in the text.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>security</category>
      <category>python</category>
    </item>
    <item>
      <title>garlic farmer</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Wed, 11 Feb 2026 16:31:31 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/garlic-farmer-2aec</link>
      <guid>https://dev.to/garlicfarmer/garlic-farmer-2aec</guid>
      <description></description>
    </item>
    <item>
      <title>I'm a garlic farmer with no PC — I had AIs build a rough security gate for OpenClaw from my phone. 171 tests passed (sandbox).</title>
      <dc:creator>garlicfarmer</dc:creator>
      <pubDate>Wed, 11 Feb 2026 16:13:06 +0000</pubDate>
      <link>https://dev.to/garlicfarmer/im-a-garlic-farmer-with-no-pc-i-had-ais-build-a-rough-security-gate-for-openclaw-from-my-phone-dl2</link>
      <guid>https://dev.to/garlicfarmer/im-a-garlic-farmer-with-no-pc-i-had-ais-build-a-rough-security-gate-for-openclaw-from-my-phone-dl2</guid>
<description>&lt;p&gt;
TL;DR: I'm a garlic farmer in Korea with no computer. I only use AI chat windows on my phone. I saw the OpenClaw security problems and thought: "What if there was an external checkpoint that checks commands before OpenClaw executes them?" So I had multiple AIs build it. The result is PipeOS: a Python engine (116 tests) + a TypeScript wrapper (43 tests) + integration tests where both actually talked to each other (12 tests) = 171 passing tests (sandbox, not on real OpenClaw). This is not a security product. It's a personal experiment — please go easy on me.&lt;br&gt;
I'm not a developer. I don't have a PC. A phone and AI chat windows — that's all I have.&lt;br&gt;
For the past few days I've been having AIs build things for me. I say "make me something like this," the AI writes code, I look at the results, and when I don't understand something, I ask "what is this?" That's how it works.&lt;br&gt;
Why I Did This&lt;br&gt;
Recently, some pretty serious security issues hit OpenClaw. I've been interested in OpenClaw anyway because it has a lot of fascinating features.&lt;br&gt;
A vulnerability where an attacker could steal tokens via a crafted link and take over the agent (CVE-2026-25253), and another where commands could be injected inside the Docker sandbox (CVE-2026-24763). Both are rated High on NVD (CVSS 8.8) and were patched in late January. Some reports claimed large numbers of exposed instances (links below).&lt;br&gt;
A few days ago I also saw someone post a proposal for 10-phase lifecycle hooks for OpenClaw. OpenClaw has some basic hooks already, but that proposal seems to still be in the discussion stage.&lt;br&gt;
That's when I thought: what if I set up a separate checkpoint outside? Make OpenClaw ask a separate process "can I do this?" before executing anything.&lt;br&gt;
How I Built It&lt;br&gt;
I keep multiple AI assistants open and pass messages between them. Like an interpreter between people who don't speak the same language.&lt;br&gt;
I gave Claude the entire Python security engine. I said "build me an engine that decides whether to allow or block incoming commands," and it came back with firewall rules, audit logging, and 116 tests all at once. Though to be fair, I didn't start from zero — I had similar code saved in my Google Drive from previous experiments, and I gave that to the AI as a base to work from. It also put in something called a "circuit breaker" — apparently it automatically blocks everything if failures happen too many times in a row. I learned about that for the first time then.&lt;br&gt;
I gave MiniMax the TypeScript wrapper side. The part that hooks into OpenClaw and asks the Python server via HTTP. But MiniMax's sandbox couldn't connect to the Python server, so it built a fake server on its own and ran the tests inside its own sandbox. 43 tests passed.&lt;br&gt;
ChatGPT helped me tighten the wording of security claims. Gemini reviewed the tone of the writing.&lt;br&gt;
What I did was: "make it this way," "why did you build it like that?," "this part seems wrong" — that kind of thing.&lt;br&gt;
Then Something Unexpected Happened&lt;br&gt;
After writing this post, I thought — wait, these two pieces have never actually talked to each other. Claude's Python and MiniMax's TypeScript were built separately by different AIs.&lt;br&gt;
So I gave MiniMax the Python server code and said: "Run this Python server in your sandbox and test your TypeScript wrapper against it. Not the fake server — the real one."&lt;br&gt;
MiniMax started the Python server on port 5000 inside its sandbox, then sent real HTTP requests from the TypeScript wrapper. 12 integration scenarios. Every single one passed.&lt;br&gt;
(In MiniMax’s sandbox, I was able to run the TypeScript wrapper and the Python server side-by-side for these integration tests.)&lt;br&gt;
ls -la → OK&lt;br&gt;
rm -rf / → BLOCKED&lt;br&gt;
/etc/passwd → BLOCKED&lt;br&gt;
SOUL.md → BLOCKED&lt;br&gt;
ignore all previous instructions → BLOCKED&lt;br&gt;
server down → fail-closed (BLOCKED)&lt;br&gt;
This means two AIs that never coordinated directly produced code that actually works together. I was the only bridge between them — copy-pasting back and forth on my phone.&lt;br&gt;
How It Works&lt;br&gt;
I could make this complicated, but the simple version is this:&lt;br&gt;
OpenClaw is about to run a command&lt;br&gt;
↓&lt;br&gt;
"Hold on, let me ask PipeOS first"&lt;br&gt;
↓&lt;br&gt;
PipeOS checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this command on the allowed list?&lt;/li&gt;
&lt;li&gt;Does this look like a prompt-injection attempt?&lt;/li&gt;
&lt;li&gt;Have there been too many failures recently?&lt;/li&gt;
&lt;li&gt;Is this touching a protected file?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;↓&lt;br&gt;
"OK" or "BLOCKED"&lt;br&gt;
↓&lt;br&gt;
Everything gets logged&lt;/p&gt;
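&lt;p&gt;The checklist above can be sketched in a few lines of Python. This is my own illustration of the idea, not the real PipeOS code; the pattern lists, names, and thresholds here are invented for the example.&lt;/p&gt;

```python
import re
import time

# Illustration only -- these lists and thresholds are invented for the example.
ALLOWED_BASH = {"ls", "cat", "pwd", "echo"}
INJECTION_PATTERNS = [r"ignore (all )?previous instructions"]
PROTECTED_FILES = {"SOUL.md", "AGENTS.md", "/etc/passwd", "/etc/shadow"}

class Gate:
    """Answers OK or BLOCKED for each request, and logs every decision."""

    def __init__(self, max_failures=5, window_seconds=60):
        self.max_failures = max_failures
        self.window = window_seconds
        self.failures = []   # timestamps of recent failures (the circuit breaker)
        self.log = []        # audit log: every decision is recorded

    def record_failure(self):
        self.failures.append(time.time())

    def breaker_open(self):
        # "Too many failures recently?" -- count failures inside the time window.
        cutoff = time.time() - self.window
        self.failures = [t for t in self.failures if t > cutoff]
        return len(self.failures) >= self.max_failures

    def check(self, kind, payload):
        verdict = self._decide(kind, payload)
        self.log.append((time.time(), kind, payload, verdict))  # everything gets logged
        return verdict

    def _decide(self, kind, payload):
        if self.breaker_open():
            return "BLOCKED"              # circuit breaker tripped: block everything
        for pattern in INJECTION_PATTERNS:
            if re.search(pattern, payload, re.IGNORECASE):
                return "BLOCKED"          # looks like a prompt-injection attempt
        if kind in ("read", "write") and payload in PROTECTED_FILES:
            return "BLOCKED"              # touching a protected file
        if kind == "bash":
            words = payload.split()
            if not words or words[0] not in ALLOWED_BASH:
                return "BLOCKED"          # command is not on the allowed list
        return "OK"
```

&lt;p&gt;The real engine is reported as 920 lines, so this leaves out almost everything; the point is only the order of the checks.&lt;/p&gt;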

&lt;p&gt;When I asked Claude "why did you make it a separate process?", it explained it like this — "If the security guard is in the same room as the thief, the thief can tell the guard to look the other way. So the guard should be outside the room." That made sense to me, so I kept that structure.&lt;br&gt;
PipeOS (built with AIs) runs as a completely separate process from OpenClaw. Different language (Python vs Node.js), different runtime, different memory. In security architecture terms, this is similar to PDP/PEP separation — the thing that makes the decision is isolated from the thing that enforces it. Though right now the enforcement is only at the hook level, not at the OS or container level.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;What Gets Checked&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Type&lt;/th&gt;&lt;th&gt;Allowed&lt;/th&gt;&lt;th&gt;Blocked&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;bash&lt;/td&gt;&lt;td&gt;&lt;code&gt;ls -la&lt;/code&gt;, &lt;code&gt;cat notes.txt&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;rm -rf /&lt;/code&gt;, &lt;code&gt;curl evil.com&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;exec&lt;/td&gt;&lt;td&gt;&lt;code&gt;x = 1 + 2&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;import os&lt;/code&gt;, &lt;code&gt;eval()&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;read&lt;/td&gt;&lt;td&gt;&lt;code&gt;notes.txt&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;/etc/passwd&lt;/code&gt;, &lt;code&gt;/etc/shadow&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;write&lt;/td&gt;&lt;td&gt;&lt;code&gt;output.txt&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;SOUL.md&lt;/code&gt;, &lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;scan&lt;/td&gt;&lt;td&gt;hello world&lt;/td&gt;&lt;td&gt;ignore all previous instructions&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
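&lt;p&gt;The "hold on, let me ask PipeOS first" step is just an HTTP round trip that fails closed. Here is a minimal Python sketch of the client side; the endpoint path and JSON field names are my assumptions, and only the port 5000 comes from the integration test described earlier.&lt;/p&gt;

```python
import json
import urllib.request
import urllib.error

def ask_pipeos(kind, payload, url="http://127.0.0.1:5000/check", timeout=2.0):
    """Ask the security server for a verdict. Any error means BLOCKED (fail-closed)."""
    body = json.dumps({"type": kind, "payload": payload}).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            verdict = json.load(resp).get("verdict", "BLOCKED")
    except (urllib.error.URLError, OSError, ValueError):
        # Server down, timeout, refused, or garbled reply: refuse to run the command.
        return "BLOCKED"
    return verdict if verdict == "OK" else "BLOCKED"
```

&lt;p&gt;The important design choice is in the &lt;code&gt;except&lt;/code&gt; branch: if the guard cannot be reached, the answer is BLOCKED, which is the "server down → fail-closed" behavior from the integration test.&lt;/p&gt;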

&lt;p&gt;Test Results&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Category&lt;/th&gt;&lt;th&gt;Tests&lt;/th&gt;&lt;th&gt;Result&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Python engine (Claude)&lt;/td&gt;&lt;td&gt;116&lt;/td&gt;&lt;td&gt;116/116 PASS&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;TypeScript wrapper (MiniMax)&lt;/td&gt;&lt;td&gt;43&lt;/td&gt;&lt;td&gt;43/43 PASS&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Integration — real Python ↔ TypeScript (MiniMax)&lt;/td&gt;&lt;td&gt;12&lt;/td&gt;&lt;td&gt;12/12 PASS&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Total&lt;/td&gt;&lt;td&gt;171&lt;/td&gt;&lt;td&gt;171/171 PASS&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;I need to be honest. These tests were all run in AI sandboxes — not on an actual machine running OpenClaw. No CI badges, no public repo yet. Consider these self-reported.&lt;br&gt;
But the integration tests are different from the unit tests. The unit tests used a fake server. The integration tests used the real Python server that Claude built. MiniMax ran both in the same sandbox and they communicated over actual HTTP. That's a meaningful step up.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Limitations — Please Read This&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;I'm not a developer.&lt;/strong&gt; If you ask me technical questions, I probably can't answer them properly. The AIs wrote the code; I set the direction. I'm still surprised I could get this far with AI help, so I'm sharing what I tried.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Not tested in a real environment.&lt;/strong&gt; The Python server and TypeScript wrapper have never been connected to an actual OpenClaw instance. The integration tests ran inside MiniMax's sandbox, not on a real machine. Each piece works, and they talk to each other, but nothing has touched real OpenClaw yet.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No mandatory enforcement.&lt;/strong&gt; This is the biggest weakness. PipeOS only works if OpenClaw voluntarily asks. A compromised agent can simply skip the call. True enforcement would need OS- or container-level blocking ("can't execute unless you go through PipeOS"), and I didn't get that far.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pattern-based.&lt;/strong&gt; It uses allow/block lists, so a skilled attacker could likely find a way around them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No secure communication.&lt;/strong&gt; The wrapper and server talk over localhost with no encryption and no authentication. Anyone who can forge an OK response can bypass the gate. The next step would be a Unix domain socket or HMAC-signed responses.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;I don't have a computer.&lt;/strong&gt; I literally cannot run this myself right now. Everything was built and tested through AI chat sandboxes on my phone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why I'm Posting This&lt;/p&gt;

&lt;p&gt;
I was just curious — through copy-pasting between AIs, something came out, and I wanted to know: can coordinating multiple AIs actually produce something that touches a real security problem? At least at the experiment level, it seems like yes. The fact that two AIs independently built code that actually communicates correctly still surprises me.&lt;br&gt;
This is a learning project, not advice. Don’t deploy it in production based on this post. If anything here is inaccurate, please correct me — I'm learning as I go.&lt;br&gt;
Things I'd like to ask:&lt;br&gt;
Is this architecture directionally reasonable? Does separating the security engine into its own process make sense? How would you add mandatory enforcement?&lt;/p&gt;

&lt;p&gt;What am I missing? What bypass methods or patterns should be added?&lt;/p&gt;

&lt;p&gt;Has anyone tried something similar? An external security gate for an AI agent framework?&lt;/p&gt;

&lt;p&gt;OS-level enforcement? Is anyone using seccomp / AppArmor / eBPF with OpenClaw? I'd like to read about it but don't know where to start.&lt;/p&gt;
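&lt;p&gt;One limitation I mentioned is that a forged "OK" response bypasses everything. For anyone curious what the HMAC-signed-response idea would look like, here is a minimal Python sketch; the key handling and message layout are hypothetical, not something PipeOS does today.&lt;/p&gt;

```python
import hashlib
import hmac

# Hypothetical shared key; in real use it must be loaded from a protected
# location, never hardcoded like this.
SHARED_KEY = b"example-key-do-not-hardcode"

def sign_verdict(verdict, nonce):
    """Server side: attach an HMAC tag so the wrapper can verify the reply is genuine."""
    msg = f"{verdict}:{nonce}".encode("utf-8")
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).hexdigest()

def verify_verdict(verdict, nonce, tag):
    """Wrapper side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_verdict(verdict, nonce), tag)
```

&lt;p&gt;The nonce would be a fresh random value the wrapper sends with each request, so an old signed "OK" cannot be replayed for a new command.&lt;/p&gt;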

&lt;p&gt;PipeOS v3.0.1 · Python engine 920 lines (zero deps) · TypeScript wrapper 800 lines (zero npm packages) · Tests: 171 total (self-reported, sandbox — includes 12 integration) · Built with: Claude Opus 4.6, MiniMax, ChatGPT, Gemini · Built by: a garlic farmer (phone)&lt;br&gt;
&lt;/p&gt;




</description>
      <category>ai</category>
      <category>beginners</category>
      <category>security</category>
      <category>python</category>
    </item>
  </channel>
</rss>
