DEV Community

c ck

A mere farmer built a personal AI agent with just one Android phone — no PC, no coding experience

Who I am

I grow garlic in rural South Korea. Sixteen years ago I left my job in Seoul, moved to the countryside, and never owned a personal PC again. Everything I do runs on an Android phone with Termux installed. One unexpected advantage of working on a phone: text is all you get, so you stay focused on it. Every task I do is conversation-based AI interaction, and I suspect a PC's flashy visual interface would have scattered my attention. I have never written a line of code on my own. Every piece of code was built by talking to multiple AIs through a physical-keyboard phone, copy-pasting back and forth — a clumsy method I turned into my own workflow over two years.

I should say upfront: my thinking happens in Korean, and this post was translated from Korean with AI help. If some phrasing feels odd, that is why.

What I built

garlic-agent — a personal AI agent that runs in Termux on Android. Whether it deserves the word "agent" I am not sure, but it is my own AI assistant and I gave it a name that fits a garlic farmer.

The setup: 1,210 lines of Python, SQLite3, shell scripts. No heavy frameworks. Zero external library dependencies — standard library only. It connects to multiple LLM APIs (Gemini, DeepSeek, MiniMax, Cerebras, Groq, NVIDIA) with a fallback structure: if one fails, it moves to the next.
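
In stdlib-only Python, that fallback chain can be sketched roughly like this (the provider list comes from the post; `ask_with_fallback` and the injected `ask` callable are hypothetical stand-ins for the real per-provider HTTP calls, which all differ in endpoint and payload):

```python
PROVIDERS = ["gemini", "deepseek", "minimax", "cerebras", "groq", "nvidia"]

def ask_with_fallback(prompt, ask, providers=PROVIDERS):
    """Try each provider in order; return the first answer that succeeds.

    `ask(provider, prompt)` is a placeholder for one provider's real API call.
    """
    errors = {}
    for provider in providers:
        try:
            return ask(provider, prompt)
        except Exception as exc:  # timeout, rate limit, malformed response...
            errors[provider] = str(exc)   # record the failure, move to the next
    raise RuntimeError(f"all providers failed: {errors}")
```

The point of collecting `errors` instead of discarding them is that when every provider fails, the final exception tells you why each one did.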

The core problem I kept running into was LLMs fabricating results and insisting they succeeded. To catch this, I needed verification built into the language grammar itself. So, with help from several AIs, I created GarlicLang — a Korean-based scripting DSL with a hallucination-detection block baked into the syntax. The interpreter is 967 lines. Here is what it looks like:

[시도]
  [실행]
    명령어: cat ~/garlic-agent/config.json
[환각시]
  [출력]
    내용: AI fabricated the result
[실패시]
  [출력]
    내용: Command execution failed

The [시도] block ("try") wraps an [실행] ("execute") step that runs the command; [실패시] ("on failure") handles a command that fails to run; and the [환각시] block (literally "on hallucination") fires when the AI's claimed output does not match what actually ran. I am still surprised I managed to build something like this through AI conversations alone.
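
A minimal Python sketch of the same idea, far smaller than the real 967-line interpreter: run the command locally and treat any mismatch between the claimed and actual output as a hallucination. The function name and the Korean return labels are illustrative, not taken from the real codebase.

```python
import subprocess

def run_with_hallucination_check(command, claimed_output):
    """Run the command for real and compare against what the LLM claims it saw."""
    try:
        actual = subprocess.run(
            command, shell=True, capture_output=True, text=True, check=True
        ).stdout.strip()
    except subprocess.CalledProcessError:
        return "실패시"   # [on failure]: the command itself failed
    if claimed_output.strip() != actual:
        return "환각시"   # [on hallucination]: the claim does not match reality
    return "성공"         # claim verified against the real output
```

The essential move is that verification compares against an execution the system performed itself, never against anything the model reports.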

Current scale: 13 skills, 6 GL-DIRECT commands (run instantly without LLM calls), ~6,488 documents in RAG search, Shizuku integration for screen control. The document base comes from two years of AI conversations — tens of thousands of turns, refined in a first pass and stored in Google Docs. That accumulated knowledge is what ultimately made this agent possible. Before this, I was working painfully inside LLM sandbox chat windows, slow and limited. But that hardship is what trained me to build this.

How I built it

Manual multi-orchestration with a human in the middle. The AIs never talk to each other directly. I wonder if direct AI-to-AI communication would make this more powerful, but on a phone, within these constraints, I am the one who judges, verifies, and sets direction. I keep five to seven chat windows open at once. I am the router.

It is a slow, perhaps crude method. But I can produce multiple working versions in a single day, and I can verify results immediately. The clumsiest approach sometimes has its own efficiency. I also think hard about whether full AI autonomy is even desirable. I have seen too many hallucinations over two years to trust it blindly. LLMs have limits — and I wanted to push past those limits.

The typical workflow: Claude handles foundational design and analysis. Gemini, DeepSeek, or MiniMax handle implementation. I review the result and copy-paste it to the next AI. Two years of conversation across tens of thousands of turns taught me each model's personality, and that knowledge is the real foundation of this method.

Here is a concrete example from one day's work. I built GL-DIRECT — a skill structure where frequently used commands are pre-saved and executed instantly without any API call. Claude designed it, Gemini implemented it. During this process I copy-pasted between Termux and chat windows thousands of times. On a phone, this ping-pong is actually natural — maybe even more fluid than on a PC. Though honestly, I have no PC to compare with, so I simply learned to work within my constraints.

The additions were small: 16 lines in skill_loader.py, 19 lines in web.py, 35 lines total. Testing revealed a JavaScript error. Claude found the cause and wrote the patch. I pasted it into the terminal. On top of that, I cross-checked with non-API Gemini, Grok, and Perplexity to see if there was a better approach. I gathered multiple opinions. The final judgment was mine.

A kind of manual multi-agent collaboration, with a human as the orchestrator.

Then a problem surfaced. GL files were not being auto-saved. No matter how forcefully I wrote "you MUST save this to a file" in the prompt, the LLM ignored it and replied "saved successfully." Three times in a row: 0-byte files. I gave up on trusting LLM autonomy for this and built a system-level interceptor in web.py that force-saves. Problem solved.

Lesson: do not ask the AI — force it in code. I learned this from Gemini 1.5 Pro's comprehensive analysis, cross-verified with other AIs, and made the final call myself.
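
The interceptor idea can be sketched like this; the real web.py is not shown in the post, so `force_save` and its deliberately crude regex are hypothetical. The principle is the part that matters: extract the GL script from the model's reply and write it to disk yourself, then verify the file is non-empty instead of trusting any "saved successfully" claim.

```python
import pathlib
import re

# Crude: grab everything from the first [시도] block to the end of the reply.
GL_BLOCK = re.compile(r"\[시도\].*", re.DOTALL)

def force_save(reply, path):
    """Save the GL script found in `reply` to `path`, ignoring what the model
    claims about having saved it. Returns True only if a non-empty file exists."""
    match = GL_BLOCK.search(reply)
    if not match:
        return False                       # nothing to save
    path.write_text(match.group(0), encoding="utf-8")
    return path.stat().st_size > 0         # never trust a 0-byte file
```

Checking `st_size` after writing is the code-level version of the lesson: the save is confirmed by the filesystem, not by the model's reply.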

Each AI model has a distinct personality. This is human intuition built over two years and tens of thousands of conversation windows. A farmer's habit of observation helped more than I expected. In my system: Gemini follows rules well and excels at combining commands into single lines. DeepSeek is strong at analysis but tends to repeat the same call. MiniMax ignores rules but is useful for cross-verification. The key was understanding each AI's character and placing them where they fit.

Inside the agent itself, everything is automatic. The user types natural language, skill_loader matches triggers, GL-DIRECT commands run instantly, and for everything else the LLM generates a GarlicLang script that executes locally. The system supports Korean-based commands (English too), with typo correction rules and dictionaries built from long experience. I discovered that a five-line Termux command can be described in natural language, and conversely, a natural language description can generate working terminal commands. I believe this kind of interface could make traditional programming far more intuitive for ordinary people.
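
The dispatch order described above might look roughly like this in Python. The trigger strings here are invented examples, since the post does not list the real skill_loader triggers:

```python
# Hypothetical GL-DIRECT registry: trigger phrase -> pre-saved shell command.
GL_DIRECT = {
    "백업 목록": "ls ~/garlic-agent/backups",        # "show backup list"
    "설정 보기": "cat ~/garlic-agent/config.json",   # "show config"
}

def dispatch(user_text):
    """Route natural-language input: GL-DIRECT matches run instantly with
    zero API calls; everything else goes to the LLM to generate a GL script."""
    for trigger, command in GL_DIRECT.items():
        if trigger in user_text:
            return ("direct", command)   # instant, no tokens spent
    return ("llm", user_text)            # LLM writes a GarlicLang script
```

A simple substring match is enough for a personal agent with a handful of triggers; the typo-correction dictionaries the post mentions would sit in front of this lookup.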

Failures

I will be honest.

freeze_gate incident: I built a verification module to catch number hallucinations. It ended up killing valid responses too. When the LLM said "modify line 921," the gate flagged 921 as hallucinated because that number did not appear in the tool output. Every analysis request got blocked regardless of model. Lesson: safety mechanisms, when too aggressive, freeze the entire system. Sometimes flexibility matters more.
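
Why such a gate freezes everything is easy to see in a sketch (a hypothetical function, not the real freeze_gate): if every number in the reply must appear verbatim in the tool output, then a legitimate computed value like a line number is always flagged.

```python
import re

def freeze_gate(reply, tool_output):
    """Return every number in the reply that never appeared in the tool output.
    Over-aggressive by design: it cannot tell a computed value from a fabricated one."""
    allowed = set(re.findall(r"\d+", tool_output))
    return [n for n in re.findall(r"\d+", reply) if n not in allowed]
```

The false positive falls out immediately: a model that correctly counted its way to line 921 of a file gets blocked, because "921" was derived, not quoted.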

LLM ignoring save commands: Described above. Three consecutive 0-byte files. Solved by forcing saves in code. Lesson: never delegate critical operations to LLM goodwill.

MiniMax hallucination: It fabricated a result — "PASS 7" — without ever executing the tool. The execution log showed zero calls, yet a result appeared. This incident is why GarlicLang's [환각시] block exists.

Triple-quote incident: I inserted external text into a Python """ string. The text contained double quotes, which prematurely closed the triple-quote. The server crashed. The backup had been created in the same minute, so it saved the broken file. I had to dig back to an earlier backup to restore.
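
The standard fix for this class of bug, sketched below, is to never splice raw external text into a triple-quoted literal; letting `repr()` (or `json.dumps`) produce a properly escaped string literal means the generated source always parses.

```python
# The incident: splicing text that contains quotes straight into a """ literal.
external = 'He said """quote""" and broke the parser'
broken_source = 'TEXT = """' + external + '"""'
try:
    compile(broken_source, "<generated>", "exec")
    survived = True
except SyntaxError:                  # the embedded """ closes the literal early
    survived = False

# The fix: repr() emits a correctly escaped literal, so exec() succeeds.
generated_source = f"TEXT = {external!r}"
namespace = {}
exec(generated_source, namespace)    # parses cleanly despite the quotes inside
```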

Restoration blunder: Restored from a backup that only contained 4 files. The remaining 83 files had no comparison point. I kept asking "why isn't this working" in circles.

Cost

Before GL-DIRECT: every repeated command hit the API. Running 10 commands individually consumed ~150,000 tokens. I did not even understand this was happening until I saw the bill.

After GL-DIRECT: saved GL scripts run instantly at zero API cost. The same 10 commands bundled as GL use ~30,000 tokens — roughly 80% savings. Running "show backup list" 100 times used to mean 100 API calls. Now it means zero. The token figures are estimates rather than measurements, but they match what I observe: GL-DIRECT responses in my garlic terminal are instant, and an API round-trip never is.

What I do not know yet, and what I want to do next

I tried local LLMs, but phone hardware is not there yet — llama.cpp was not practical. A Mac Mini might change things, and I plan to get one. Whether anyone else can use this system (scalability) is untested. This is a one-person project and there is much I lack. But the possibilities feel real. If a garlic farmer can do this, that says something. Talking to AIs every day, I can see their improvement happening in real time inside our conversations.

Two years ago, my intentions barely made it through to the code. Now the reflection is remarkably accurate. Indentation rules, AST pattern matching, regression tests — I did not know these terms before. I am learning them gradually, and the AIs are teaching me through the process of pasting and asking.

Farming taught me that ignorance about your crop leads to real losses. Experience matters. With the experience I have now, my next goals are: fix the variable scope bug in the GarlicLang interpreter, convert the CHANGELOG into a database so the agent can search its own history. Right now my date management is a mess, but seeing how AIs can diagnose problems just from those messy logs taught me how important it is to record both successes and failures.

Someday I want to open-source this. Right now, the system works perfectly for me, but others would not know my AI philosophy or my intentions, so it would be hard for them to use. I push hundreds of bug fixes and dozens of updates daily, and I am getting used to this rhythm. What still amazes me: even on a limited phone-based agent, I can snapshot, restore, patch, and redeploy in real time. This was unthinkable before. It is real now, and it surprises me every day.
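
A snapshot/restore cycle like the one I rely on can be done with the standard library alone. This is a hedged sketch, not my actual backup script (which is not shown here); timestamping archives to the second keeps every generation, so an older good copy always survives the kind of same-minute overwrite that bit me in the triple-quote incident.

```python
import pathlib
import tarfile
import time

def snapshot(src, backup_dir):
    """Archive `src` into a timestamped tar.gz under `backup_dir`."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")   # second resolution, one file per run
    out = backup_dir / f"garlic-agent-{stamp}.tar.gz"
    with tarfile.open(out, "w:gz") as tar:
        tar.add(src, arcname=src.name)       # store relative to the project name
    return out

def restore(archive, dest):
    """Unpack a snapshot into `dest`, leaving the archive itself untouched."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
```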

I will close by paying respect to the late Steve Jobs, who connected the world and made this entire network ecosystem possible through a small device in our hands.

The numbers in this post were generated by commanding my garlic-agent to pull the latest figures, then double-verified. I trust them because if any number is wrong, it triggers my [환각시] block — no matching tool output means hallucination detected. And the final verification is done by me, a human.

This post is a personal experiment and observation log from a garlic farmer in Korea. It was written by thinking in Korean, verified with multiple AIs, and translated with their help.

My device is a Unihertz Titan 2. I also use a BlackBerry KEY2, but the agent is not on it yet.

Numbers

  • Development period: ~3 weeks (since mid-February 2026)
  • agent.py: 1,210 lines
  • Total .py files: 31
  • Skills: 13 (6 GL-DIRECT)
  • GarlicLang interpreter: 967 lines
  • GarlicLang commands: 32 (Korean)
  • GarlicLang verification types: 17
  • knowledge.db documents: 6,488
  • GL scripts: 359 (341 passed lint, 95.0%)
  • Backup tar.gz files: 128
  • LLMs used: Gemini, DeepSeek, MiniMax, Cerebras, Groq, NVIDIA, Claude (external consultant)

Thank you for reading this far.

from garlic farmer






