This article was originally published on aifoss.dev
TL;DR: Google's new Colab CLI (released June 5, 2026, Apache-2.0) provisions remote Colab GPUs from your terminal, so any agent with shell access — Aider, Open Interpreter, Claude Code — can run GPU work without a notebook. The catch: only the T4 is free; A100/H100 burn paid compute units, and serving a model API needs a tunnel.
What you'll have running after this guide:
- A Colab T4/A100/H100 runtime you can drive from your local terminal with one command
- A local LLM (Ollama or vLLM) served from that runtime and reachable over an ngrok tunnel
- Aider or Open Interpreter pointed at that endpoint — agentic coding on rented GPU, billed to your Colab plan instead of a separate cloud bill
Honest take: The Colab CLI is the cheapest way to give a local agent a GPU if you already pay for Colab Pro — but it is a batch/dev tool, not an always-on inference host. For a stable API endpoint, rent a box on RunPod instead.
What the Colab CLI actually is
For years, the only way to use a Colab GPU was the browser notebook. The Colab CLI (github.com/googlecolab/google-colab-cli) breaks that open. It connects your local shell to a remote Colab runtime, so you can ship a script, run it on an H100, and pull the results back — no Jupyter kernel in sight.
That last part is the interesting bit for this site. Any tool that can run shell commands can now provision a GPU. Google ships an agent skill file (skills/colab-operator/ plus an AGENTS.md) so terminal agents like Claude Code and Codex get built-in context on how to drive the CLI. But you do not need a fancy agent — Aider and Open Interpreter work fine because the CLI is just commands.
Key facts, verified against the repo and Google's June 5 announcement:
| Detail | Value |
|---|---|
| License | Apache-2.0 |
| Released | June 5, 2026 |
| Install | pip install google-colab-cli (or uv tool install) |
| Platforms | Linux and macOS only — no Windows |
| GPUs | T4, L4, G4, A100 (40/80GB), H100 |
| TPUs | v5e, v6e |
| Billing | Your active Colab plan's compute units |
This is a Google-published open-source tool, not a third-party wrapper, which matters for trust — the auth flow uses your real Google account and standard ADC/OAuth2.
Install and first run
The tool is a Python package. uv is the cleaner path because it isolates the CLI from your project environments:
uv tool install google-colab-cli
# or: pip install google-colab-cli
colab version
colab auth # opens a browser, links your Google account
Provision a runtime and check what you got:
$ colab new --gpu T4 -s lab
Allocating runtime 'lab'... connected.
Runtime: T4 (16GB) · 12GB RAM · region us-central1
$ colab status -s lab
Session GPU RAM Uptime State
lab T4 12GB 00:01:14 running
Run a one-off snippet on that session by piping it to colab exec:
$ echo "import torch; print(torch.cuda.get_device_name(0))" | colab exec -s lab
Tesla T4
Ship a local file to the runtime and execute it — no manual upload step:
colab exec -s lab -f train_lora.py
colab download -s lab outputs/adapter.gguf ./ # pull results back
colab stop -s lab # release the VM
That colab exec -f workflow alone replaces the copy-paste-into-a-notebook dance for fine-tuning runs. You can wire it straight into a CI runner: colab run --gpu A100 fine_tune.py allocates an A100, runs the script, and stops the VM, with the GPU cost charged to your Colab subscription rather than a separate GPU-cloud invoice.
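If you script that session lifecycle yourself, the main thing to get right is releasing the VM when a step fails. The sketch below wraps the same four commands in a try/finally. The subcommands and flags are copied from the examples in this article and have not been checked against the CLI's own help output; `train_on_session` and its `run` parameter are hypothetical names introduced here for illustration.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Run one CLI command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def train_on_session(
    script: str,
    artifact: str,
    session: str = "lab",
    gpu: str = "T4",
    run: Runner = _run,
) -> None:
    """Allocate a runtime, run a script, fetch one artifact, always stop the VM."""
    # Assumption: these subcommands and flags match the examples in this article.
    run(["colab", "new", "--gpu", gpu, "-s", session])
    try:
        run(["colab", "exec", "-s", session, "-f", script])
        run(["colab", "download", "-s", session, artifact, "./"])
    finally:
        # Release the VM even when the script or download fails,
        # so a crashed job does not keep burning compute units.
        run(["colab", "stop", "-s", session])
```

Calling `train_on_session("train_lora.py", "outputs/adapter.gguf")` would mirror the manual steps above. The injectable `run` argument is there so the sequence can be dry-run or tested without touching a real runtime.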
The cost reality (read before you get excited)
The working title for this topic promised "free T4/A100/H100." That is half true, and the honest half matters.
Colab uses a compute-unit (CU) model. The free tier gives you a T4 when one is available — that part is genuinely free. Premium GPUs are not:
| GPU | ~CU/hr | Free tier? | Practical cost |
|---|---|---|---|
| T4 (16GB) | 1.76 | Yes (when available) | $0, but preemptible |
| A100 (40/80GB) | ~15 | No | ~7 hrs per $9.99 (100 CU) |
| H100 | higher | No | Pro+ tier, burst quota |
Pay-as-you-go is $9.99 for 100 CU (about 57 hours on a T4 or ~7 hours on an A100). Colab Pro is $9.99/month and Pro+ is $49.99/month for larger burst quotas. The real headache is unpredictability: even on a paid plan you can request an A100 and get handed a T4, or find premium GPUs unavailable entirely at peak times.
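The hours-per-pack figures follow directly from the burn rates in the table. Here is the arithmetic as a minimal sketch, using only the approximate numbers quoted in this article (the A100 rate in particular is a rough "~15", so treat the results as estimates, not Google's published pricing):

```python
# Rates and price are the approximate figures quoted in this article.
PRICE_PER_100_CU = 9.99
CU_PER_HOUR = {"T4": 1.76, "A100": 15.0}


def hours_per_100_cu(cu_per_hour: float) -> float:
    """Hours of runtime that 100 compute units buy at a given burn rate."""
    return 100 / cu_per_hour


def dollars_per_hour(cu_per_hour: float) -> float:
    """Effective hourly cost when 100 CU cost PRICE_PER_100_CU."""
    return PRICE_PER_100_CU * cu_per_hour / 100


for gpu, rate in CU_PER_HOUR.items():
    print(f"{gpu}: {hours_per_100_cu(rate):.1f} h per 100 CU, "
          f"${dollars_per_hour(rate):.2f}/h")
# T4: 56.8 h per 100 CU, $0.18/h
# A100: 6.7 h per 100 CU, $1.50/h
```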
So treat the Colab CLI as a way to get a T4 for free, or an A100 for roughly $1.50 an hour at the rates above, without standing up your own infra. It is not a free H100 farm. If you need a guaranteed GPU type at a stable hourly rate, a dedicated RunPod instance is the more predictable spend. For deciding whether to rent at all versus buy, see runaihome.com's home GPU build guides.
Serving a model from Colab to a local agent
Here is the part the launch posts gloss over. A Colab runtime is not directly reachable from the internet, so you can't just start vLLM on port 8000 and point Aider at it. You need a tunnel. ngrok is the well-trodden path.
Write a small server script, serve.py, that starts Ollama and exposes it:
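import os
import subprocess
import time

from pyngrok import ngrok

# Start the Ollama server in the background and give it a moment to bind its port
subprocess.Popen(["ollama", "serve"])
time.sleep(5)
# Pull a small coding model; check=True stops the script if the pull fails
subprocess.run(["ollama", "pull", "qwen3-coder:7b"], check=True)
# Open a tunnel to Ollama's default port and print the public URL
ngrok.set_auth_token(os.environ["NGROK_TOKEN"])
tunnel = ngrok.connect(11434)
print("OLLAMA_PUBLIC_URL:", tunnel.public_url, flush=True)
# Block here so the tunnel stays up; pyngrok shuts ngrok down when the script exits
ngrok.get_ngrok_process().proc.wait()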
Run it on a Colab GPU and grab the public URL from the output:
$ colab install -s lab pyngrok # install deps on the runtime
$ colab exec -s lab -f serve.py
OLLAMA_PUBLIC_URL: https://a1b2-34-56-78-90.ngrok-free.app
Now point a local agent at that endpoint. Aider speaks the OpenAI-compatible API, so:
export OPENAI_API_BASE="https://a1b2-34-56-78-90.ngrok-free.app/v1"
export OPENAI_API_KEY="ollama" # placeholder, Ollama ignores it
aider --model openai/qwen3-coder:7b
Open Interpreter follows the same shape:
interpreter --api_base "https://a1b2-34-56-78-90.ngrok-free.app/v1" \
--model openai/qwen3-coder:7b --api_key ollama
Your agent now runs locally — editing your real files on your machine — while inference happens on the Colab GPU. For a deeper look at the agents themselves, see our Aider review and the Open Interpreter vs Aider vs Claude Code comparison. If you'd rather run vLLM for higher throughput, the vLLM setup guide covers the OpenAI-compatible server flags.
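Before handing the tunnel URL to an agent, it can save some debugging to confirm the endpoint answers and the model is actually loaded. The following is a minimal sketch that assumes the server exposes the OpenAI-style `/v1/models` listing, which Ollama's OpenAI-compatible API does; `list_model_ids` and `fetch_model_ids` are hypothetical helper names, not part of any tool mentioned here.

```python
import json
import urllib.request


def list_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [item["id"] for item in payload.get("data", [])]


def fetch_model_ids(base_url: str, timeout: float = 10.0) -> list[str]:
    """GET {base_url}/models and return the model ids the server reports."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return list_model_ids(json.load(resp))
```

With the example tunnel above, `fetch_model_ids("https://a1b2-34-56-78-90.ngrok-free.app/v1")` should include `qwen3-coder:7b` once the pull has finished. A connection error or an empty list means the agent would fail too, so fix that first.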
A blunt warning on the tunnel: an ngrok URL on a public LLM endpoint is open to anyone who finds it. Keep sessions short, don't paste secrets into prompts, and kill the tunnel when you're done. This is dev-grade plumbing, not a production deployment.
How agents drive the CLI directly
The other integration mode skips the tunnel entirely. Instead of serving a model, you let a terminal agent use the GPU as a tool. Because Google ships the colab-operator skill (and there was an earlier Colab MCP server, released March 2026), an agent can read those instructions and decide on its own to run, say, a heavy embedding job on an A100.
In practice this looks like telling Claude Code or Codex "fine-tune this LoRA on an A100," and the agent calls colab run --gpu A100 ... under the hood, monitors colab status, and downloads the artifact. The CLI's design — predictable subcommands, machine-readable status — is what makes it agent-friendly. Aider and Open Interpreter don't have the skill file baked in, but you can paste the command reference into a system prompt and they'll use it the same way.
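The "monitor colab status" step is easy to picture as a polling loop. The sketch below parses a whitespace-separated table shaped like the sample status output shown earlier in this article; that layout is an assumption taken from the sample, not a documented format, and `parse_status` and `wait_for_state` are hypothetical names. If the CLI offers a structured output mode, prefer that over text parsing.

```python
import time
from typing import Callable


def parse_status(text: str) -> dict[str, dict[str, str]]:
    """Parse a whitespace-separated status table into {session_name: row}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    # Assumption: the first line is a header and the first column is the session name.
    keys = [k.lower() for k in lines[0].split()]
    rows: dict[str, dict[str, str]] = {}
    for line in lines[1:]:
        cells = line.split()
        rows[cells[0]] = dict(zip(keys, cells))
    return rows


def wait_for_state(
    get_status: Callable[[], str],
    session: str,
    wanted: str = "running",
    tries: int = 30,
    delay: float = 2.0,
) -> bool:
    """Poll until the session reports the wanted state; False if it never does."""
    for _ in range(tries):
        row = parse_status(get_status()).get(session)
        if row is not None and row.get("state") == wanted:
            return True
        time.sleep(delay)
    return False
```

In real use, `get_status` would be a small function that runs the status command through `subprocess.run(..., capture_output=True, text=True)` and returns its stdout. Keeping it injectable means the parsing logic can be exercised without a live runtime.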
For cloud-hosted AI coding tools that manage all of this for you (at a subscription