
Eugene Kovshilovsky

128GB of RAM, Zero Internet, and a Year of Building AI Infrastructure Nobody Asked For

Most people who know me professionally won't see my name on LinkedIn attached to a company right now. That's intentional. It's not that I'm between roles, it's that the moment you attach a title and a company name to yourself on this platform, your inbox becomes a graveyard of BizDev pitches, QA outsourcing firms promising to "augment your team," and vendors who want to "explore synergies" over a 30-minute call that could have been a "no." I've done that dance for years. I'd rather write code.

What I will say: I'm still a CTO. I'm still leading technical strategy. The scope has changed, and I'm working with a smaller, sharper group of people on things I actually care about, but the role hasn't. I just happen to be writing a lot more code than I have in a decade, and that is the point of this article.

The Shift

About a year ago, I was running e-commerce engineering at CarID. Large teams, an enterprise-scale platform, the usual orbit of architecture reviews and vendor evaluations. Over the years I'd pushed hard for modern data tooling: Delta architectures, real-time pipelines. And I'd test-driven every AI tool that came across my desk: GitHub Copilot, early LLM integrations, various coding assistants. My honest assessment each time was the same: interesting, but I wouldn't let it near a production PR.

Then, around February and March of 2025, something shifted. Not gradually; it felt like someone flipped a switch while I wasn't looking. Models got reliable enough to trust with real work. Claude Code shipped, and it wasn't a toy or a glorified autocomplete: it was a genuine development partner that could hold context, reason about architecture, and do actual engineering. The ecosystem around it started moving fast, and for the first time in years, I felt like I was watching something I needed to be inside of, not evaluating from a distance.

By mid-April 2025, I had left CarID. In my free time, I moved into building AI-powered developer infrastructure: the kind of tooling that makes the difference between AI being a novelty and AI being a force multiplier. I'm still leading and setting technical direction, making architecture decisions, and building systems, but I'm also deeper in the work itself than I've been since the early days of my career.

And I am very willing to get my hands dirty.

What started as "let me properly integrate AI into my workflow" turned into a year-long obsession: building with AI reliably, securely, and without depending on things I can't control. Along the way, I decided to open-source the pieces, because I realized the problems I was solving weren't just mine.

The Problem Nobody Warns You About

Here's the thing about using AI seriously across multiple contexts: the tools assume you're one person with one account doing one thing.

That's adorable.

I have work projects with confidentiality requirements. I have personal projects I'm developing on my own time with my wife and kids. I would never use company resources for personal development, and I'd never mix personal context with professional context. That's not how I operate, and if you've ever worked in enterprise, you know that's not how anyone should operate.

So I ended up with multiple Claude Code licenses, multiple LLMs, multiple agents running simultaneously, and a very practical problem: how do you keep all of this isolated on one machine without everything bleeding into everything else?

I found out the hard way that Claude Code's config isolation is broken. Instructions from one context started showing up in sessions for another. Not obviously, but subtly, in ways that only became visible over time. The digital equivalent of accidentally presenting the wrong client's slides, except it happens silently. That's the kind of thing that erodes trust in your tools, and trust is the whole game when you're delegating real work to AI.

So I Did What Any Reasonable Person Would Do... I Built a VM Orchestrator!

When config-level isolation doesn't work, you need process-level isolation. Separate filesystems. Separate kernels. Real boundaries. Normal people might just log out and log back in. I chose violence.

I built cloister, an open-source CLI that creates lightweight Linux VMs using Apple's Virtualization Framework, where each profile gets its own isolated environment while sharing your code workspace, SSH keys, and Claude Code plugins. Type "cloister work" and your terminal background changes color, tunnels auto-discover host services, and you're in an isolated shell with Claude Code installed. Everything works, nothing leaks.
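To make the isolation model concrete, here is a minimal sketch of how per-profile separation could be structured. This is an illustration of the idea, not cloister's actual implementation; the directory names and mount conventions are my assumptions for the example.

```python
from dataclasses import dataclass
from pathlib import Path

# Paths shared read-only into every profile's VM (illustrative names).
SHARED_MOUNTS = ("~/workspace", "~/.ssh")

@dataclass
class Profile:
    """One isolated context: its own state, plus shared read-only mounts."""
    name: str

    @property
    def state_dir(self) -> Path:
        # Private per-profile state: config, history, plugin settings.
        # Nothing here is visible to any other profile's VM.
        return Path("~/.cloister").expanduser() / "profiles" / self.name

    def mounts(self) -> dict[str, str]:
        shared = {p: "ro" for p in SHARED_MOUNTS}
        # Only this profile's own state directory is writable in its VM.
        return {**shared, str(self.state_dir): "rw"}

work = Profile("work")
personal = Profile("personal")
assert work.state_dir != personal.state_dir        # real boundary
assert work.mounts()["~/workspace"] == "ro"        # shared, but read-only
```

The point of the structure is that "what leaks" becomes a property you can audit in one place, instead of an emergent behavior of a shared config file.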

But getting here wasn't a straight line. Not even close.

The War Stories

My M5 Max has 128GB of RAM. Absolute unit. When I tried to run Ollama with a 27-billion-parameter model on it, it crashed immediately. After a day of debugging Metal shader compiler output, I found the issue: six lines of code mixing bfloat and half types in a matrix multiplication kernel. The fix had been merged upstream three months earlier, but Ollama hadn't synced it yet.

Six lines.

Between a working 27B model and me on the most powerful laptop Apple makes. Sometimes the universe has a sense of humor. I cloned Ollama from source, applied the fix, and wrote up a full diagnostic guide so other M5 owners don't have to go through the same detective work.

Then I learned that Apple does not expose Metal GPU access to Linux VMs through any hypervisor API. None. Zero. Apple looked at that feature request and said "no" in every API they ship. The workaround, a Vulkan translation through three layers, is the GPU equivalent of translating English to French to Japanese to get your point across at a dinner party.

So I tunneled around it. Ollama runs on the host with native Metal acceleration, and an SSH reverse tunnel forwards it into each VM. Full GPU speed, zero translation overhead, and the VM is just a thin client sending prompts and receiving tokens.
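The tunnel itself is plain SSH remote forwarding. Here's a hedged sketch of the idea in Python; the VM hostname is hypothetical, and 11434 is Ollama's default port. The host SSHes into the guest with `-R`, so `localhost:11434` inside the VM loops back to the Metal-accelerated Ollama server on the Mac.

```python
# Sketch of the reverse-tunnel command (hostname is illustrative).
OLLAMA_PORT = 11434  # Ollama's default listen port on the host

def reverse_tunnel_cmd(vm_host: str, port: int = OLLAMA_PORT) -> list[str]:
    """Build an ssh command that exposes the host's Ollama inside the VM."""
    return [
        "ssh", "-N",                       # no remote shell, tunnel only
        "-R", f"{port}:localhost:{port}",  # VM's port -> host's Ollama
        vm_host,
    ]

cmd = reverse_tunnel_cmd("cloister-work")
assert "11434:localhost:11434" in cmd
# subprocess.Popen(cmd) would hold the tunnel open; inside the VM,
# http://localhost:11434 now reaches the GPU-accelerated model directly.
```

Because the model never runs inside the VM, the guest needs no GPU passthrough at all; it's just HTTP over a loopback port.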

Along the way I also discovered that NVM silently breaks bash strict mode, GPG commit signing is a three-layer lock-contention nightmare in VMs, Claude Code switched installers mid-project, and a single Claude session can consume 37GB of virtual memory. I lost a 20-minute coding session to an OOM kill before I learned to allocate proper swap.

Credentials Shouldn't Live in Plaintext

One of the things that kept me up at night about running AI agents in VMs was credential management. The standard approach is to export API keys as environment variables in plaintext, sitting in memory, accessible to any process that asks nicely.

Inside a VM running autonomous agents with a published history of 512 security vulnerabilities? That's not a security concern. That's a security invitation.

I built op-forward, a daemon-tunnel-shim that forwards 1Password CLI commands from the VM to the Mac host, where Touch ID handles the authentication. Every credential access triggers my actual fingerprint. If the VM is compromised, the attacker gets an op binary that responds to every request with the digital equivalent of "go ask your mother."
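The core of the shim idea is that the VM-side `op` binary holds no secrets at all: it serializes its arguments and ships them to the host daemon, which runs the real 1Password CLI behind Touch ID. Here's a minimal sketch of that wire format; the JSON framing and socket path are illustrative assumptions, not op-forward's actual protocol.

```python
import json

# Hypothetical socket path, forwarded from the Mac host over SSH.
OP_SOCKET = "/run/op-forward.sock"

def encode_request(argv: list[str]) -> bytes:
    # One JSON line per invocation of the in-VM `op` shim.
    # The shim never sees a credential, only the request it forwards.
    return json.dumps({"argv": argv}).encode() + b"\n"

def decode_request(line: bytes) -> list[str]:
    # The host daemon parses this and execs the genuine `op` binary;
    # every call triggers a biometric prompt on the Mac before anything
    # is returned to the VM.
    return json.loads(line)["argv"]

req = encode_request(["read", "op://vault/db/password"])
assert decode_request(req) == ["read", "op://vault/db/password"]
```

The security property falls out of the topology: compromising the VM gets you a forwarder, not a keychain, and every request still has to get past a human fingerprint on the host.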

I also built a consent system for the VMs themselves. Interactive profiles get access to host services such as the clipboard, 1Password, and Ollama. Headless agent profiles get nothing unless explicitly whitelisted. SSH keys, GPG keys, and Downloads are locked out by default. Handing an autonomous agent full host access is like giving your house keys to a stranger because they promised to only use the kitchen. So I don't hand over the keys.
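The deny-by-default rule is simple enough to sketch in a few lines. The profile kinds and service names below are illustrative, not cloister's real configuration; the shape of the logic is what matters.

```python
# Hypothetical consent table: what each profile kind may reach on the host.
DEFAULT_GRANTS: dict[str, set[str]] = {
    "interactive": {"clipboard", "1password", "ollama"},
    "headless": set(),  # agents start with zero host access
}

def allowed(kind: str, service: str, whitelist: frozenset[str] = frozenset()) -> bool:
    """Deny by default: a service is reachable only if granted or whitelisted."""
    return service in DEFAULT_GRANTS.get(kind, set()) | set(whitelist)

assert allowed("interactive", "ollama")                       # granted by default
assert not allowed("headless", "ollama")                      # denied by default
assert allowed("headless", "ollama", frozenset({"ollama"}))   # explicit opt-in
assert not allowed("headless", "ssh-keys", frozenset({"ollama"}))
```

An unknown profile kind falls through to the empty set, so a typo in a profile name fails closed rather than open.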

Working at 35,000 Feet

This is the part that matters most to me.

Picture this: I'm on a flight. The guy next to me is watching downloaded Netflix episodes. The WiFi costs $8 and delivers bandwidth that would have embarrassed a 56k modem.

I'm running a 27-billion-parameter language model doing code review on my laptop. Locally. On a GPU that Apple won't let my VMs touch directly, so I tunneled around the restriction.

When the model needs a database password, it asks 1Password through a daemon-tunnel-shim chain that triggers Touch ID. The VM's Claude Code session has its own credentials and history and is completely isolated from every other context on my machine.

No internet. No API calls leaving the aircraft. No tokens counted against a rate limit.

I can disconnect my Mac from the internet and keep working. Fully. The entire AI development stack runs on one machine with zero external dependencies.

That's not an accident. That's what I designed for, because I got tired of my productivity being at the mercy of someone else's infrastructure.

What This Year Taught Me

The biggest shift wasn't technical. It was in how I think about leverage.

For most of my career, leverage meant hiring. More people, more output. Simple math, complicated HR.

Now, leverage means infrastructure. The right local setup lets one person do what used to require a team. Not because AI replaces people (please, let's retire that talking point), but because it changes the bottleneck. The bottleneck used to be typing speed and debugging time. Now it's context management, credential security, and compute orchestration. Nobody taught us this in management training.

The CTO title hasn't changed, but what the job feels like has. I spend less time in meetings about product design and architecture and more time building directly. I spend less time reviewing pull requests and more time pair-programming with an AI that doesn't care if I refactor the same function four times.

The tools I've built this year aren't products I'm trying to sell. They're infrastructure I needed to work the way I want to work. I open-sourced them because I think more people are going to need this same infrastructure, and I'd rather they spend their time building things instead of solving the same plumbing problems I already solved.

What's Next

If you're a senior technical person who's starting to work with AI seriously (not just asking it to explain regex, but actually integrating it into your workflow), then you're going to hit these walls. The identity isolation wall. The GPU acceleration wall. The credential security wall. The "what happens when the internet goes down" wall.

I've hit all of them, usually face-first. Here's what's on the other side.


cloister — Isolated VM environments for AI coding agents and multi-account separation.

op-forward — Forward 1Password CLI across SSH boundaries with biometric authentication.

M5 Metal 4 fix guide — Step-by-step diagnostic and fix for Ollama on Apple M5 chips.
