This is a submission for the Built with Google Gemini: Writing Challenge
╔══════════════════════════════════════════════════════════════╗
║ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ║
║ ║
║ ██╗ ██╗ ██╗ ███████╗ ██╗ ██╗ ███╗ ██╗ █████╗ ║
║ ██║ ██╔╝ ██║ ╚══███╔╝ ██║ ██║ ████╗ ██║ ██╔══██╗ ║
║ █████╔╝ ██║ ███╔╝ ██║ ██║ ██╔██╗ ██║ ███████║ ║
║ ██╔═██╗ ██║ ███╔╝ ██║ ██║ ██║╚██╗██║ ██╔══██║ ║
║ ██║ ██╗ ██║ ███████╗ ╚██████╔╝ ██║ ╚████║ ██║ ██║ ║
║ ╚═╝ ╚═╝ ╚═╝ ╚══════╝ ╚═════╝ ╚═╝ ╚═══╝ ╚═╝ ╚═╝ ║
║ ║
║ [ The Gemini to Local Environment Bridge ] ║
║ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ║
╚══════════════════════════════════════════════════════════════╝
Integrating AI into our daily coding workflows is a recurring discussion. The discourse focuses heavily on context windows, reasoning models, and whether AI will replace or augment engineers. But I think centering this debate purely on the models themselves is reductive.
The bigger question to me is the environment. How do we actually connect these floating, cloud-based brains to our physical work?
For a long time, my workflow was agonizing. I was stuck in what I call "Copy-Paste Torture." I would give the AI context, copy a file from my IDE, paste it into Google Gemini, ask for a change, copy the resulting code, paste it back, run it, hit an error, and start over.
Gemini was incredibly capable, but the friction of constant context-switching was killing the momentum. A brain in a browser jar has no hands.
So, I decided to build it a nervous system. I called it Kizuna (絆 - meaning bond or connection).
What I Built with Google Gemini
Kizuna is an end-to-end toolchain that transforms the standard Google Gemini web interface into a localized, agentic IDE companion. I didn't want to just build a wrapper; I wanted to create a system that felt intentional, granting the web-based LLM the ability to read, search, and safely patch a local codebase without compromising my machine.
To achieve this, I broke the system down into three foundational pillars:
1. The Engine (Local Daemon) ⚙️
Code is where software lives, but you can't just give an AI raw shell access—that's a massive security risk. I built a local backend service to act as a secure sandbox. It translates strict JSON intents from the browser into optimized local file reads, writes, and Git operations.
╔══════════════════════════════════════════════════════════════╗
║ ░░░░░░░░░░░░░░ THE PATH JAIL (SANDBOX) ░░░░░░░░░░░░░░░░░░░░░ ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ ╭── [ ALLOWED WORKSPACE ] ──────────────────────────────╮ ║
║ │ │ ║
║ │ 📂 /workspace/my-app/ (Sandbox Root) │ ║
║ │ ├── 📄 src/main.js [ ✓ ] OK │ ║
║ │ └── 📄 package.json [ ✓ ] OK │ ║
║ │ │ ║
║ ╰───────────────────────────────────────────────────────╯ ║
║ ║
║ ╭── [ BLOCKED EXTERNALS ] ──────────────────────────────╮ ║
║ │ │ ║
║ │ 🚨 /etc/passwd [ ⨉ ] BLOCKED: 403 │ ║
║ │ │ ║
║ ╰───────────────────────────────────────────────────────╯ ║
║ ║
║ ▪ Engine drops all path traversal requests (../) ║
║ ▪ Symlinks are resolved prior to boundary validation ║
╚══════════════════════════════════════════════════════════════╝
The Engine resolves all symlinks before executing any operation. If the AI hallucinates a path traversal, the Engine rejects the request outright.
2. The Bridge (Chrome Extension)
A web extension that sits quietly on the right side of the Gemini window. Building this was an architectural challenge. Because Gemini streams text, scraping the DOM naively crashes the parser with incomplete JSON. I had to build a MutationObserver that waits for the absolute "completion" state of the chat UI before parsing. It captures the AI's outputs, relays them to my local engine via a Background Worker (to bypass strict browser CORS restrictions), and injects the results back into the chat.
3. The Protocol (Documentation & Rules)
A deterministic system prompt fed to Gemini at the start of every chat. This acts as the "Operating Manual." LLMs are chaotic; they need boundaries. The protocol forces Gemini to use a strict JSON schema instead of conversational markdown.
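A minimal sketch of what such a schema check might look like on the engine side. The action names and required fields here are assumptions modeled on the patch_file example shown later in this post; Kizuna's real protocol may define more.

```javascript
// Hypothetical schema check for protocol intents. Action names and
// required fields are assumptions, not Kizuna's actual schema.
const KNOWN_ACTIONS = new Set(['read_file', 'write_file', 'patch_file']);

// Returns null when the intent is valid, or an error string otherwise.
function validateIntent(intent) {
  if (!intent || typeof intent !== 'object') return 'intent must be an object';
  if (!KNOWN_ACTIONS.has(intent.action)) return `unknown action: ${intent.action}`;
  if (typeof intent.path !== 'string' || intent.path === '') return 'path is required';
  if (intent.action === 'patch_file' &&
      (typeof intent.search !== 'string' || typeof intent.replace !== 'string')) {
    return 'patch_file requires search and replace blocks';
  }
  return null;
}
```

Rejecting malformed intents with a concrete error string (rather than silently ignoring them) is what makes the self-correction loop described later possible.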
Demo
Here is a look at the complete architectural flow of Kizuna:
╔══════════════════════════════════════════════════════════════╗
║ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ BROWSER ENVIRONMENT ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ║
║ ║
║ ┌────────────────┐ [ DOM ] ┌───────────────────────┐ ║
║ │ 🧠 Gemini UI │ ◀───────────▶ │ 🧩 Chrome Extension │ ║
║ └────────────────┘ └───────────────────────┘ ║
║ ▲ │ ║
║ │ (Injects UI Data) │ (WebSockets) ║
╠══════════╪════════════════════════════════════╪══════════════╣
║ │ ▼ ║
║ ┌───────┴────────────────────────────────────┴───────────┐ ║
║ │ 💻 KIZUNA ENGINE (Local Daemon :8080) │ ║
║ │ │ ║
║ │ ╭─────────────────╮ ╭─────────────────────╮ │ ║
║ │ │ 🛡️ Path Sandbox │ ───────▶ │ 🛠️ Tool Dispatcher │ │ ║
║ │ ╰─────────────────╯ ╰───┬──────┬──────┬───╯ │ ║
║ │ │ │ │ │ ║
║ │ [Read] [Write] [Git] │ ║
║ │ │ │ │ │ ║
║ │ ╭─┴──────┴──────┴─╮ │ ║
║ │ │ 📂 Local Storage│ │ ║
║ │ ╰─────────────────╯ │ ║
║ └────────────────────────────────────────────────────────┘ ║
║ ║
║ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ LOCAL ENVIRONMENT ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ║
╚══════════════════════════════════════════════════════════════╝
When I ask Gemini to "Update the database connection," it doesn't write me a markdown tutorial. It outputs its intent directly:
{
"action": "patch_file",
"path": "src/config.js",
"search": "localhost",
"replace": "process.env.DB_HOST"
}
The extension picks this up, the engine verifies the path hasn't escaped the workspace, the file is safely patched, and the success message is injected back into the Gemini chat (or, occasionally, I pasted it back manually).
What I Learned
Through this work, I learned to question the constraints of LLMs rather than accept them as givens.
Abstract Syntax Trees (ASTs) vs. Raw Text
The naive approach to the context window is to just send the AI the whole file. But why send a 1,000-line file when the AI just needs to know what the file does? Sending raw text is a massive waste of tokens and scatters the model's focus.
I built a tool into the engine that parses code into an Abstract Syntax Tree (AST). Instead of returning raw code, it strips the implementation logic and returns a "Skeleton" of class names, imports, and function signatures.
╔══════════════════════════════════════════════════════════════╗
║ ░░░░░░░░░░░░░░░ AST PARSER ROUTING LOGIC ░░░░░░░░░░░░░░░░░░░ ║
╠══════════════════════════════════════════════════════════════╣
║ [ 📄 Gemini Requests File ] ║
║ │ ║
║ ▼ ║
║ ╭───────────────────────╮ ║
║ │ Is File > 300 Lines? │ ║
║ ╰───────────┬───────────╯ ║
║ │ ║
║ ┌───( YES )───────┴───────( NO )────┐ ║
║ │ │ ║
║ ▼ ▼ ║
║ ╭───────────────╮ ╭───────────────╮ ║
║ │ 🌳 AST Parse │ │ 📝 Raw Read │ ║
║ ╰───────┬───────╯ ╰───────┬───────╯ ║
║ │ │ ║
║ ╭───────▼───────╮ ╭───────▼───────╮ ║
║ │ ▪ Classes │ │ ▪ Full Logic │ ║
║ │ ▪ Signatures │ │ ▪ Implement. │ ║
║ │ ▪ Docstrings │ │ ▪ Variables │ ║
║ ╰───────┬───────╯ ╰───────┬───────╯ ║
║ │ │ ║
║ █▓▒░ Token Cost: ~5% Token Cost: 100% ░▒▓█ ║
╚══════════════════════════════════════════════════════════════╝
To make this work, I learned to rely on highly descriptive function names and docstrings. With good docstrings, the AI could understand the codebase's intent just by pinging the AST, mapping out the architecture flawlessly.
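The routing logic in the box above can be sketched as follows. Note this is a deliberately simplified stand-in: the real tool parses code into a proper AST, while this illustration uses a naive line-pattern scan just to show the skeleton idea and the 300-line threshold.

```javascript
// Simplified stand-in for the AST skeleton tool. A real implementation
// would use a proper parser; this regex scan only keeps imports,
// class/function signatures, and a one-line doc comment above each.
function skeleton(source, maxLines = 300) {
  const lines = source.split('\n');
  if (lines.length <= maxLines) return source; // small file: raw read
  const kept = [];
  const signature = /^\s*(import\s|const\s.*=\s*require\(|class\s+\w+|(export\s+)?(async\s+)?function\s+\w+)/;
  lines.forEach((line, i) => {
    if (signature.test(line)) {
      // Keep the doc comment directly above the signature, if present.
      if (i > 0 && /^\s*\/\//.test(lines[i - 1])) kept.push(lines[i - 1]);
      kept.push(line.replace(/\{\s*$/, '{ /* … */ }'));
    }
  });
  return kept.join('\n');
}
```

For a large file, the caller pays tokens only for names, signatures, and doc comments, which is where the "~5% token cost" in the diagram comes from.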
Sandboxing and Heuristics
I used Gemini itself to help build a dataset classifying hundreds of developer CLI commands as "safe" or "harmful". This let me bake heuristic judgments into the local sandbox while disabling shell: true evaluation in Node entirely.
The Ultimate Safety Net (Auto-Commits)
You can never trust an AI blindly. I didn't want it making untracked changes, so I built a feature into the engine that automatically runs a local git commit after almost every file change, and I kept the git metadata backed up while experimenting with commits and refactors. If the AI messed up, I had an instant, local undo button. Git became the AI's eyes and my safety net.
Google Gemini Feedback
I think of designing with AI in two stages: when the system hums, and when the material fights back.
✅ The 70% Magic
When it worked, it felt like magic. About 70% of the time, the system operated perfectly. Reading code worked phenomenally well. Gemini 2.5 Pro could absorb the AST skeletons, navigate the directory, and reason about system design with striking clarity.
⚠️ When the material fights back (The 30% Chaos)
The remaining 30% of the time was a battle against the realities of the medium: hallucinations, syntax errors, and context degradation.
1. The Writing Problem: Surgical Diffs vs. End-to-End
While reading was elegant, writing was clumsy. Gemini 2.5 Pro struggled heavily with surgical, line-by-line insertions. Standard diff patching is notoriously flaky with LLMs—they hallucinate line numbers or forget indentation. When I asked for complex changes across a large file, it would misplace the code entirely.
The Workaround: The solution wasn't to push harder; it was to change the abstraction. I stopped asking for line manipulations. Instead, I updated the protocol to force verbatim block patching or end-to-end rewrites of the entire file. The search block had to match the local file exactly, down to the space, or the Engine rejected it.
2. Amnesia and The Autonomous Self-Healing Loop
Sometimes, the AI would just forget its own rules. It would hallucinate bad JSON syntax (trailing commas, unescaped quotes) or start outputting raw markdown. If I fixed it manually in my IDE, the sync between the "brain" and the "hands" broke.
The Workaround: I built an Autonomous Self-Correction Loop. If the Chrome extension failed to JSON.parse() the output, it automatically generated an error payload and injected it directly back into the chat. Gemini would immediately apologize, fix the syntax, and re-emit the tool call without me typing a single word. If this happened too often, I re-inserted the Protocol docs.
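The extension-side branch of that loop fits in one function. This is a hedged sketch: handleModelOutput is a hypothetical name, and the error-payload wording is illustrative, not Kizuna's actual message.

```javascript
// Sketch of the self-correction branch: on a parse failure, build an
// error payload for injection back into the chat instead of failing
// silently. Function name and message text are illustrative.
function handleModelOutput(text) {
  try {
    return { kind: 'intent', intent: JSON.parse(text) };
  } catch (err) {
    return {
      kind: 'inject',
      message:
        `[KIZUNA ERROR] Your last tool call was not valid JSON ` +
        `(${err.message}). Re-emit it as a single strict JSON object ` +
        `with no trailing commas or markdown.`,
    };
  }
}
```

Because the injected message names the concrete parse error, the model usually fixes exactly that mistake on its next turn.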
╔══════════════════════════════════════════════════════════════╗
║ ░░░░░░░░░░░░░ AUTONOMOUS SELF-HEALING LOOP ░░░░░░░░░░░░░░░░░ ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ [🧠 Gemini UI] [🧩 Extension] [💻 Engine] ║
║ │ │ │ ║
║ │ 1. Invalid JSON │ │ ║
║ │ ───────────────────▶ │ │ ║
║ │ │ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ │ ║
║ │ 2. Catch Error │ ▓ JSON Parse Fails ▓ │ ║
║ │ ◀─────────────────── │ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ │ ║
║ │ │ │ ║
║ │ 3. Fixes Syntax │ │ ║
║ │ ───────────────────▶ │ │ ║
║ │ │ 4. Valid Request │ ║
║ │ │ ━━━━━━━━━━━━━━━━━━━▶ │ ║
║ │ │ │ ║
║ │ │ ╭────────────────╮ │ ║
║ │ │ │ ⚙️ Validates │ │ ║
║ │ │ │ ⚡ Executes │ │ ║
║ │ │ ╰────────────────╯ │ ║
║ │ │ 5. Returns Data │ ║
║ │ │ ◀━━━━━━━━━━━━━━━━━━━ │ ║
║ │ 6. Inject Status │ │ ║
║ │ ◀━━━━━━━━━━━━━━━━━━━ │ │ ║
║ ║
╚══════════════════════════════════════════════════════════════╝
3. Context Degradation (The Final Boss)
This compounds when the conversation gets too long. When dealing with detailed codebases, the LLM eventually loses its understanding of the current situation. The context window fills up, attention mechanisms drift, and it makes bad decisions.
The Master Workaround: To tackle this, I created a specific "Summarization Prompt" within the Chrome extension. When a chat got too long and the AI started losing the plot, I didn't try to salvage it.
I would run the prompt, instructing Gemini to condense its current understanding of the architecture, the problem, and our progress into one cohesive document. I would then open a brand new chat, paste that summary along with my base JSON rules, and resume.
╔══════════════════════════════════════════════════════════════╗
║ ░░░░░░░░░ CONTEXT HANDOVER PROTOCOL (SUMMARIZATION) ░░░░░░░░ ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ [ ⏳ STAGE 1: Degradation ] ║
║ │ ║
║ ╰─▶ 💬 Long Chat Session ──▶ ⚠️ Hallucinations Begin ║
║ ║
║ [ 💉 STAGE 2: The Handoff ] ║
║ │ ║
║ ├─▶ 1. Inject "Summarization Prompt" via Extension ║
║ ╰─▶ 2. Gemini Generates Cohesive 'State Document' 🗂️ ║
║ ║
║ [ ✨ STAGE 3: Resurrection ] ║
║ │ ║
║ ├─▶ 1. Close Degraded Session 🗑️ ║
║ ├─▶ 2. Open Empty Chat Session 🆕 ║
║ ├─▶ 3. Inject [ Base Rules + State Document ] 📥 ║
║ ╰─▶ 4. Fresh AI Instance with Perfect Context 🚀 ║
║ ║
╚══════════════════════════════════════════════════════════════╝
It was like giving the AI a fresh cup of coffee and a perfect handover document.
Elevating the craft
The Kizuna system did not work flawlessly out of the box. But building it taught me that modern LLMs are not a deterministic pipeline; they are a probabilistic search. They hallucinate, they forget, and they make errors.
But by building the right scaffolding—strict constraints, local safety nets, AST parsers, and clever workarounds like the self-healing loop and summarization prompt—you can harness them to do incredible things. It forced me to be intentional about how software is written, and it laid a profound foundation for the systems I want to build next.