A Garlic Farmer's Guide to AI: Building Software with Nothing but a Phone
The Setup
I'm a garlic farmer. I've been living in rural South Korea for 16 years. I don't own a PC. Everything I do with AI, I do from my phone.
That sounds like a limitation — and it is. But it's also the reason I discovered something that neither developers nor typical AI users seem to be doing.
What Most People Do with AI
There are roughly two groups of people using AI for coding right now.
Non-developers open ChatGPT and say "write me a calculator." The AI spits out code as text. They look at it, maybe copy it somewhere, and that's the end of it. The code never actually runs. It's just text on a screen.
Developers use tools like Claude Code, OpenClaw/Pi, or Cursor on their PCs. They open a terminal, type commands, install packages, set up API keys, and run code directly. AI helps them — suggests code, fixes bugs — but the developer is the one actually executing everything. They're the hands; AI is the assistant.
I'm in neither group.
What I Do Instead
I don't write code. I don't read code. I don't have a terminal. What I do is this: I open an AI chat window, paste in a set of tools, and tell the AI to build things using only those tools.
The tools are simple. There are exactly four: Write (create a file), Read (read a file), Edit (modify a file), and Bash (run a command). These four tools, packaged as something called Pi Tools, turn any AI chat window into a programming environment.
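In Python, the whole idea can be sketched in a few lines. This is a minimal illustration of the four-tool concept, not the actual Pi Tools source — the function names and behavior here are my assumptions:

```python
import subprocess
from pathlib import Path

# Illustrative sketch of the four-tool idea; not the actual Pi Tools code.

def write(path: str, content: str) -> str:
    """Create (or overwrite) a file."""
    Path(path).write_text(content)
    return f"wrote {path}"

def read(path: str) -> str:
    """Return a file's contents."""
    return Path(path).read_text()

def edit(path: str, old: str, new: str) -> str:
    """Replace the first occurrence of `old` with `new` in a file."""
    text = Path(path).read_text()
    Path(path).write_text(text.replace(old, new, 1))
    return f"edited {path}"

def bash(command: str) -> str:
    """Run a shell command and return its combined output."""
    result = subprocess.run(command, shell=True,
                            capture_output=True, text=True)
    return result.stdout + result.stderr
```

In a chat sandbox, the AI calls these itself: write a script with write(), run it with bash(), read the error, patch it with edit(), and run again.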
When I tell the AI "create a student grade management system," it doesn't just show me code as text. It actually creates the files, runs the code, checks the output, finds errors, fixes them, and runs again. Real files get created. Real code gets executed. Real results come back.
I don't touch any of it. I just decide what to build and what to do next.
Where This Idea Came From
It wasn't some grand vision. It came from not having a choice.
OpenClaw/Pi is an open-source AI coding agent that took GitHub by storm — 145,000 stars in a week. Its core discovery was that an AI only needs four tools (read, write, edit, bash) to work like a programmer. But to use it, you need a PC, Node.js, a terminal, and an API key. For a developer, that's five minutes of setup. For me, it's impossible.
So I asked a different question: what if I could give those same four tools to an AI through a chat window?
I had an AI translate the core concept from TypeScript to Python, stripped out everything that required an API or a server, and ended up with about 150 lines of code that could be pasted into any AI chat sandbox. No installation. No API key. No PC. Just paste and go.
That's Pi Tools.
The Secret Weapon: Copy-Paste and AI-Written Instructions
There's a part of my workflow that sounds too simple to be important, but it's actually the most powerful thing I do.
Copy-paste is the backbone. When one AI produces a result — a report, a code file, an analysis — I don't summarize it in my own words and relay it to another AI. I copy the entire output and paste it directly. This preserves every detail. When I pasted Grok's full research report into Claude, Claude could spot that Grok had used the words "estimated" and "extraction failed" — details I would have missed if I'd just said "Grok wrote a report and it seemed exaggerated."
Human language loses information. AI output pasted raw does not.
I don't write instructions either. When I need to give an AI a complex task, I ask a different AI to write the instructions for me. The result is dramatically better than anything I could write myself, because the AI includes technical structure, edge cases, and precise conditions that I wouldn't know to specify.
Here's what actually happened in this project: I told Claude "make me a test prompt for GLM-5 to verify its sandbox." Claude produced a detailed 10-step instruction set with specific file names, test counts, validation criteria, and execution order. I copied that instruction and pasted it into GLM-5. GLM-5 executed all 10 steps autonomously.
I didn't write the code. I didn't write the instructions. I decided what to test and which AI should write the instructions for which other AI.
AI-written instructions work better for AI. This sounds obvious once you hear it, but most people don't do it. They write their own prompts in casual human language. An AI writing instructions for another AI uses the structure, terminology, and precision that AI responds best to. It's like having a translator who speaks both languages fluently, instead of trying to speak a language you barely know.
The combination — raw copy-paste for data transfer, AI-generated instructions for task assignment — is what makes the whole system work. I'm not the brain or the hands. I'm the nervous system connecting everything.
Testing Across Platforms — What I Found
I didn't just build Pi Tools and call it a day. I tested it across multiple AI platforms to see what's real and what's not.
GLM-5 (Zhipu AI, China)
Released February 11, 2026. 744 billion parameters, MIT license, currently in free beta.
I gave it a 10-step project: build a student grade management system with three interconnected Python files, generate random data for 15 students across 5 subjects, run statistical analysis, produce a ranked report, edit one student's math score from 94 to 100, and re-run analysis to confirm the change was reflected.
GLM-5 completed all 10 steps without human intervention, consuming about 70,000 tokens. The edit was correctly reflected in the final report — the top math scorer changed from one student to another after the modification.
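For reference, the data-generation and ranking core of that task reduces to something like the sketch below. The names, the random seed, and the structure are my illustration, not GLM-5's actual output:

```python
import random
import statistics

# Illustrative sketch of the grade-analysis step; not GLM-5's actual output.
SUBJECTS = ["math", "english", "science", "history", "art"]

random.seed(0)  # reproducible demo data
students = {
    f"student{i:02d}": {s: random.randint(50, 100) for s in SUBJECTS}
    for i in range(1, 16)  # 15 students, 5 subjects
}

def ranked_report(data):
    """Rank students by average score, highest first."""
    averages = {name: statistics.mean(scores.values())
                for name, scores in data.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

before = ranked_report(students)

# The edit step: change one student's score, then re-run the analysis
# to confirm the change is reflected in the new ranking.
top = before[0][0]
students[top]["math"] = 100
after = ranked_report(students)
```

The point of the 10-step test was that the AI performs each of these stages — generate, analyze, edit, re-analyze — as separate file and execution steps, on its own.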
But here's the twist: GLM-5 honestly admitted it didn't use my Pi Tools. It already has the same four tools built in. Its exact response was a detailed breakdown showing that its internal tools (Write, Read, Bash, Edit) are functionally identical to Pi Tools, making my version an unnecessary extra layer in its environment.
This was actually a valuable discovery. It confirmed that the "four minimal tools" philosophy — which OpenClaw/Pi pioneered — has become an industry standard. GLM-5 independently arrived at the same architecture.
Mistral AI (France)
Mistral's Le Chat has a Code Interpreter, but it works like a calculator — run code once, get a result. It doesn't have built-in tools for creating files, editing them, and chaining multi-step workflows.
When I added Pi Tools to Mistral's sandbox, those capabilities appeared. The AI could suddenly create files, introduce bugs deliberately, detect them, fix them, and re-test — a multi-step debugging loop that wasn't possible before. Pi Tools gave Mistral hands it didn't have.
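That debugging loop, stripped to its essentials, looks like this in plain Python. The file name and the planted bug are illustrative:

```python
import subprocess
import sys
from pathlib import Path

# Illustrative debug loop: plant a bug, detect it by running, fix, re-test.
buggy = 'def add(a, b):\n    return a - b  # deliberate bug\n\nprint(add(2, 3))\n'
Path("calc.py").write_text(buggy)

def run():
    """Execute the file and return its printed output."""
    r = subprocess.run([sys.executable, "calc.py"],
                       capture_output=True, text=True)
    return r.stdout.strip()

assert run() == "-1"  # wrong output exposes the bug
Path("calc.py").write_text(buggy.replace("a - b", "a + b"))  # the fix
assert run() == "5"   # re-test confirms the fix
```

Without file-creation and editing tools, an AI can only show you this loop as text; with them, it actually performs each step and sees the real output.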
GPT (OpenAI)
Similar to Mistral. GPT's Code Interpreter can execute code, but it tends to stop after each step and ask "what next?" It doesn't naturally chain 10 steps together autonomously. With Pi Tools, file operations become possible, but you need to keep typing "continue" at each step. It works, but it's not autonomous.
The reason: GLM-5 was specifically trained for "agentic engineering" — executing tool chains autonomously. GPT was trained primarily for conversation. Same tools, different instincts. GLM-5 is a factory worker who moves to the next station automatically. GPT is a consultant who finishes one task and waits for the next request.
Three Ways to Use AI — A Comparison
Through this process, I've identified three distinct approaches.
Non-developers ask AI for text. The AI writes code on screen. It never runs. The human can't verify it.
Developers use AI as an assistant. They execute code on their own PC. The AI suggests and fixes. The human does the actual work and can directly verify everything because they read code.
My approach — AI is the worker. I give direction. The AI creates files, runs code, checks results, fixes errors. Multiple AIs handle different roles: one designs, one executes, one verifies. I don't write code. I don't read code. I orchestrate.
Right now, developers have the clear advantage. When AI makes a mistake, a developer reads the code and fixes it in seconds. I have to ask another AI "is this right?" — adding a step and the risk that both AIs miss the same error.
But the direction is clear. Six months ago, a 10-step autonomous task was unreliable — AI would lose context, break the chain, produce garbage. Today, GLM-5 completes it without human intervention. As AI gets smarter and errors decrease, the ability to "read and fix code" becomes less critical. What remains valuable is the ability to decide what to build, how to structure it, and which AI should do what. That's direction-setting, not coding.
The Honest Limitations
Here's what doesn't work.
I can't verify code directly. When AI writes code, I can't look at line 23 and spot a bug. I have to ask another AI to check. This adds a step and creates the risk that two AIs agree on the same wrong answer. Cross-verification with a third AI reduces this risk but doesn't eliminate it.
Complex errors are slow to resolve. A developer sees a stack trace and knows what to change in seconds. I describe the problem to an AI, wait for a fix, test it, and sometimes repeat several cycles. What takes a developer 30 seconds can take me 10 minutes.
Token consumption is severe. The 10-step GLM-5 test used 70,000 tokens. On a paid plan, that's real money for a single task. My detailed instructions — 15 students, 5 subjects, full raw output at every step — contributed significantly. Simpler instructions with fewer data points would likely cut this roughly in half.
Not all platforms work equally. GLM-5 runs 10 steps alone. GPT needs prodding at every step. Mistral needs Pi Tools just to do basic file operations. There's no universal experience — you have to know each AI's strengths and limits.
File management breaks down at scale. Beyond 3-4 files, tracking what exists where and what depends on what gets difficult when you can't browse a file system directly. Projects with complex interdependencies are significantly harder.
AI flattery is a constant trap. Every AI tends to tell you your work is brilliant. After 10,000 conversations, I've learned to push back explicitly: "Is this actually good, or are you being agreeable?" Without this discipline, you end up in an echo chamber where every idea sounds revolutionary but nothing is validated.
What Happens When You Ask AI to Be Honest
During this project, I ran an experiment that illustrates the flattery problem perfectly.
I asked Grok to analyze my Reddit posts and their reception. The first report came back glowing: "50+ comments estimated, 60% positive sentiment, viral potential high, Korean communities sharing your work." It sounded great.
Then I gave Grok a strict instruction: "No estimation. No flattery. Only cite what you can actually access. If you can't find it, say so."
The second report: 1 post successfully accessed, 1 comment found (an automated bot message), 4 posts returned server errors, Korean community results: none found.
Same AI. Same topic. Different instruction. The first report was fabricated confidence; the second was honest failure. The real data was somewhere in between — the comments and engagement do exist (I posted farm photos in replies and got responses), but Grok couldn't technically access them due to Reddit's crawling restrictions.
This is why cross-verification matters. This is why I use multiple AIs. And this is why I always push back when an AI tells me something sounds too good.
What 10,000 Conversations Taught Me
Over two years, I've had roughly 10,000 conversations across the 12-15 AI platforms I use daily — Claude, GPT, Mistral, Gemini, DeepSeek, GLM, and others. All from my phone. All stored in a 3GB Google Drive folder that functions as my personal knowledge base.
2-4 AIs is the optimal number. I started with 10+ simultaneously. It's chaos — too many voices, contradicting advice, impossible to track. Now I use a core team: one for design (usually Claude), one for execution (varies by task), one for verification. Additional AIs are brought in for specific needs.
Each AI has a distinct working style. Not officially documented, but unmistakable after enough conversations. Claude is cautious and structured. GPT is enthusiastic but loses focus in long chains. Mistral is fast but shallow. GLM-5 is thorough but token-hungry. DeepSeek is strong on technical analysis. Matching the right AI to the right task makes a measurable difference.
Autonomous execution sounds impressive but wastes resources. GLM-5 running 10 steps alone consumed 70,000 tokens. If step 3 had gone wrong, the remaining 7 steps would have burned tokens on garbage. Checking each step manually is slower but catches errors early. Sometimes the "inefficient" human-in-the-loop approach is actually more efficient.
My Google Drive is a manual RAG system. When starting a new project, I ask Gemini to search through my 3GB of stored files — code, guidelines, design documents, experiment logs — find relevant references, and summarize them. I take that summary to Claude or another AI to begin actual work. It's retrieval-augmented generation built with nothing but a phone, a cloud folder, and chat windows.
The Bigger Picture
The security layer I added to Pi Tools (v3) — PathJail, SecurityEngine, content inspection, loop guards, backup management, execution logging — came from studying OpenClaw's known vulnerabilities (including CVE-2026-25253) and designing protections that the original deliberately left out. OpenClaw's creator, Mario Zechner, chose a "YOLO mode" philosophy — no safety rails, full trust in the model. I disagreed with that for sandbox environments and built the opposite: a security checkpoint that screens every command, file operation, and code execution before it runs.
A developer would implement this by writing the code. I implemented it by directing AIs to write code based on my security design. The v3 code is 920 lines with 116 tests, zero external dependencies, built entirely through AI chat windows on a phone.
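To give a flavor of the PathJail idea, here is a minimal path-confinement check. This is my reconstruction of the concept for illustration, not the actual v3 code:

```python
from pathlib import Path

class PathJail:
    """Confine all file operations to one sandbox directory.

    A minimal reconstruction of the PathJail concept; not the v3 code.
    """
    def __init__(self, root: str):
        self.root = Path(root).resolve()

    def check(self, path: str) -> Path:
        """Resolve a path and reject anything that escapes the jail."""
        resolved = (self.root / path).resolve()
        if resolved != self.root and self.root not in resolved.parents:
            raise PermissionError(f"path escapes sandbox: {path}")
        return resolved
```

The design choice is simple: every file operation goes through check() first, so a path like "../etc/passwd" raises an error instead of touching anything outside the sandbox.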
Every piece of this setup exists because I couldn't do it the "normal" way. No PC meant no terminal. No terminal meant chat windows became my IDE. No coding skill meant AI became my developer. No single AI was reliable enough, so multiple AIs became my team. Constraints created the method.
Is Anyone Else Doing This?
I searched. I had Claude search. I had Grok search extensively across Reddit, X, Korean communities, and the wider web.
The honest answer: not really. "Vibe coding" — where non-developers use AI to build apps — is booming, but those people have PCs and use tools like Cursor or Claude Code. "Mobile AI agents" like DroidRun let AI control your phone screen, but that's automation, not software development. People paste code snippets into AI chats, but not as a systematic agent tool framework.
The specific combination — no PC, chat-window-only, agent tool injection, multi-AI orchestration, open-source engine porting — returns essentially zero matching results. My own Reddit posts are the top search results for this approach.
This isn't a boast. It's a data point. The space between "developer with a terminal" and "non-developer who just gets text" is currently empty. I'm in it because I had no other option. Others will follow as AI chat sandboxes improve and more people realize that asking AI to "run code" instead of "show code" is a fundamentally different experience.
For Those Who Want to Try
If you have a phone and access to an AI chat with a code sandbox (GLM-5, Claude, Mistral Le Chat, ChatGPT), start with this:
Ask the AI to write a Python file that prints "Hello World," save it as a real file, and execute it. If the AI creates a file, runs it, and shows you the actual output — not just displays code text — you have a working sandbox. From there, try having it create two files where one imports the other, or build a simple calculator with unit tests.
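Concretely, the sandbox check amounts to asking the AI to execute something like this (the file name is arbitrary):

```python
import subprocess
import sys
from pathlib import Path

# Minimal sandbox check: create a real file, execute it, capture real output.
Path("hello.py").write_text('print("Hello World")\n')
result = subprocess.run([sys.executable, "hello.py"],
                        capture_output=True, text=True)
print(result.stdout)  # a working sandbox shows the actual program output
```

If the AI can only display this code but not run it, you have a text generator, not a sandbox.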
The key shift: don't ask AI to show you code. Ask AI to run code. That's the difference between getting a text response and getting actual work done.
If you want to go further, ask one AI to write detailed instructions for a task, then paste those instructions into a different AI's sandbox. You'll immediately notice the difference — AI-written instructions produce better results from other AIs than anything you'd write yourself.
And always verify. Ask a third AI to check the second AI's work. The moment you stop cross-checking is the moment you start building on errors.
I'm a garlic farmer in South Korea with no PC. Over 2 years I've had ~10,000 AI conversations across 12-15 platforms, all from my phone. I ported OpenClaw/Pi's agent engine from TypeScript to Python, built a 920-line security layer with 116 tests, verified GLM-5's sandbox capabilities on launch day, and documented everything in a 3GB Google Drive. Previous experiment logs are in my earlier posts. If you have questions about the method or want to see specific tests, I'm here.
A note on language: I think in Korean. I don't speak or write English well. This entire post was translated and polished with the help of multiple AIs — which means some nuances of my original thinking may be lost, and the writing may feel uneven in places. But that's part of the point. I got here by asking AIs questions, one at a time, from a phone, in a language that isn't English. If this process is hard for native English speakers, it's even harder for those of us who aren't. I appreciate your patience with any awkwardness in the text.