This article was originally published on aifoss.dev
TL;DR: Open Interpreter v0.4.3 gives any LLM the ability to write and execute Python, JavaScript, and shell commands directly on your machine — no sandbox, full filesystem access. The local LLM path works but requires 14B+ models for reliable output; 7B models produce too many errors for real tasks. Cloud API users (Claude or GPT-4o) get the best experience; local-first users should set their expectations accordingly.
| | Open Interpreter | Aider | Cline |
|---|---|---|---|
| Best for | System tasks, file ops, data analysis, OS automation | Git-native code editing, multi-file refactors | VS Code-based autonomous coding agent |
| Install complexity | Medium (pip + Ollama optional) | Low (pip) | Low (VS Code extension) |
| Local model quality | Needs 14B+ for reliability | Works well at 14B+ | Works at 14B+, best with cloud models |
| Hardware needs | 8–16GB VRAM for local, none for cloud | 8GB VRAM minimum for local | 8GB VRAM minimum for local |
| The catch | AGPL-3.0; OS mode is experimental | Git-only workflow | VS Code only |
Honest take: Use Open Interpreter when you need an LLM to actually run things on your computer — data analysis scripts, file manipulation, web scraping. For pure code editing, Aider or Cline are better tools.
What Open Interpreter Actually Does
ChatGPT's Code Interpreter runs your code inside a sandboxed container on OpenAI's servers. It can't touch your local files, install system packages, or browse the web. What you get back is a result inside the chat window.
Open Interpreter removes all of those constraints. When the LLM writes a Python script to analyze your CSV files, that script runs on your actual machine, reading from your actual filesystem. When it installs a package, it's installed in your local Python environment. There's no isolation layer — and that's both the point and the risk.
The project is maintained by the OpenInterpreter team, is licensed under AGPL-3.0, has accumulated over 63,000 GitHub stars, and is currently at version 0.4.3. It supports Python 3.9 through 3.12.
Two distinct modes exist:
Standard mode: you type a task in plain English, the model writes code to accomplish it, shows you the code before running, and waits for your approval. You can disable the approval step (-y flag), but the default is conservative.
OS mode (--os flag): the model gets access to your screen via screenshots and can control the mouse and keyboard to interact with any GUI application. Think "Jarvis" for your desktop, not just your terminal.
Installation
pip install open-interpreter
That's it. Python 3.9 through 3.12 is required, and no CUDA setup is needed if you're using a cloud API. The first run will prompt you to configure an API key or set up a local model.
For Ollama-based local inference, install Ollama separately:
# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a capable model
ollama pull codestral
# or
ollama pull deepseek-coder-v2:16b
Standard Mode in Practice
Start with the default cloud setup (OpenAI key required):
interpreter
Or with an Anthropic key:
interpreter --model claude-opus-4-8
The session opens a terminal chat interface. Ask it something concrete:
> Download the 10 most recent commits from my current git repo, format them as a markdown table, and save to commits.md
The model writes a Python script using subprocess to call git log, formats the output, and writes the file. Before executing, it shows you the code and asks "Would you like to run this?" — hit y and it runs. The result appears in the terminal and the file lands on disk.
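For a sense of what the generated code looks like, here is a minimal sketch of a script for the commits task. It is an illustration, not captured model output: the function names (recent_commits, to_markdown_table, save_commits) and the column layout are assumptions, and it assumes git is on your PATH and you are inside a repository.

```python
import subprocess


def recent_commits(n=10):
    """Return the n most recent commits as (hash, author, date, subject) tuples."""
    # %x1f emits an ASCII unit separator, which is unlikely to appear in a subject line.
    fmt = "%h%x1f%an%x1f%ad%x1f%s"
    result = subprocess.run(
        ["git", "log", "-n", str(n), f"--pretty=format:{fmt}", "--date=short"],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when run outside a repository
    )
    return [tuple(line.split("\x1f")) for line in result.stdout.splitlines() if line]


def to_markdown_table(commits):
    """Format commit tuples as a markdown table, escaping pipes inside cells."""
    lines = ["| Hash | Author | Date | Subject |", "|---|---|---|---|"]
    for commit in commits:
        cells = [field.replace("|", "\\|") for field in commit]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines) + "\n"


def save_commits(path="commits.md", n=10):
    """Write the table for the n most recent commits to path."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(to_markdown_table(recent_commits(n)))
```

Calling save_commits() from the repository root would produce commits.md. Keeping the formatter separate from the git call makes it easy to review what the script will write before it touches the disk.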
This confirmation loop is the right default. You can skip it:
interpreter -y
But only do this if you're running quick, low-stakes tasks. Without confirmation, a confused model can do things you didn't intend.
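If you build your own tooling around LLM-generated code, the same show-then-ask pattern is easy to reproduce. The sketch below is not Open Interpreter's implementation; confirm_and_run and its parameters are hypothetical names chosen for illustration, and it only handles Python snippets.

```python
import subprocess
import sys


def confirm_and_run(code, ask=input, auto_run=False):
    """Show generated Python code and run it only after explicit approval.

    Returns the CompletedProcess if the code ran, or None if it was declined.
    """
    print("Proposed code:\n" + code)
    if not auto_run:
        answer = ask("Would you like to run this? (y/n) ").strip().lower()
        if answer != "y":
            return None
    # A separate process is NOT a sandbox: the code still has full access
    # to the filesystem and network of the current user.
    return subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
```

Passing auto_run=True mirrors skipping the confirmation step, which is why it is off by default here too.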
The Python API is clean for embedding in your own scripts:
from interpreter import interpreter
interpreter.auto_run = True # skip confirmation
interpreter.llm.model = "gpt-4o"
interpreter.chat("Analyze the CSV files in ./data and print summary statistics")
OS Mode: Full Computer Control
Version 0.4.0 shipped --os mode, which is the genuinely unusual capability here. Standard mode executes code in a shell; OS mode can see your screen and drive your mouse and keyboard.
interpreter --os
The model receives a screenshot of your current display. It can:
- Click UI elements by describing them
- Type into text fields
- Scroll, drag, open applications
- Read text from any visible window
It relies on a vision-capable model and the screenpipe integration for real-time screen capture. Claude or GPT-4V currently work best; local Ollama models with vision support are technically possible but unreliable for this use case.
A practical use: "Open Excel, find the spreadsheet named Q1 Sales, sum the revenue column, and put the result in cell B1."
The model figures out how to navigate to the file, click the right cells, enter a formula. It works. Until it doesn't — when a UI element is positioned differently than expected, or the model mis-clicks, or the formula syntax is wrong in a context-specific way. OS mode is genuinely impressive and genuinely fragile.
Requirements for OS mode:
- Vision-capable model (cloud API strongly recommended)
- Screen recording permission granted to your terminal application
- macOS, Windows, or Linux (screenpipe supports all three)
The project explicitly calls it experimental. Don't run it unattended against anything irreversible.
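Conceptually, OS mode is a screenshot, decide, act loop. The sketch below shows that shape with a hard step cap, which is one simple way to keep a fragile agent from running away. Everything in it is hypothetical: Action, run_supervised, and the capture/decide/act callables are placeholders for illustration and do not correspond to Open Interpreter's internals.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str         # e.g. "click", "type", or "done"
    detail: str = ""  # e.g. a UI element description or the text to type


def run_supervised(
    capture: Callable[[], bytes],
    decide: Callable[[bytes], Action],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Run a screenshot -> decide -> act loop.

    Stops when the model returns a "done" action or after max_steps actions,
    whichever comes first. Returns the number of actions performed.
    """
    performed = 0
    for _ in range(max_steps):
        action = decide(capture())  # the model sees a fresh screenshot each step
        if action.kind == "done":
            break
        act(action)
        performed += 1
    return performed
```

Because capture, decide, and act are injected, you can dry-run the loop with fakes that only log actions before letting anything touch a real mouse or keyboard.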
Running Locally with Ollama
The interactive local setup wizard:
interpreter --local
This launches a model explorer menu that lets you pick a model from your local Ollama library and auto-configures the API endpoint. It's the fastest path if you want to stay GUI-free.
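If you want to check what is in your local Ollama library before launching anything, you can query the Ollama server directly. This is a minimal sketch using Ollama's /api/tags endpoint on the default port; it is not how Open Interpreter builds its menu, and the function names are my own.

```python
import json
from urllib.request import urlopen


def parse_model_names(payload):
    """Extract sorted model names from a decoded Ollama /api/tags response."""
    return sorted(model["name"] for model in payload.get("models", []))


def list_local_models(api_base="http://localhost:11434", timeout=5):
    """Ask a running Ollama server which models are installed locally."""
    with urlopen(f"{api_base}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

With Ollama running, list_local_models() returns names such as the ones you pulled earlier; if the call raises a connection error, the server is not up and the local setup will fail for the same reason.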
For manual configuration, use either the CLI:
interpreter --model ollama_chat/codestral --api_base http://localhost:11434
Or via Python:
from interpreter import interpreter
interpreter.offline = True
interpreter.llm.model = "ollama_chat/codestral"
interpreter.llm.api_base = "http://localhost:11434"
interpreter.llm.context_window = 16000 # override the 3000-token default
interpreter.llm.max_tokens = 4096
interpreter.chat()
Note the context_window override. Open Interpreter defaults to 3000 tokens in local mode, which is conservative and will cause models to lose track of multi-step tasks. Bump it to the actual context window your model supports.
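To see why 3000 tokens runs out quickly, a back-of-the-envelope budget helps. The sketch below assumes roughly 4 characters per token, which is a common rule of thumb and not what any particular tokenizer produces; the function names are hypothetical and this is not Open Interpreter's trimming logic.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real tokenizers vary by model and language."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_context(messages, context_window, reserve_for_reply):
    """Check whether the history plus room for the reply fits in the window."""
    used = sum(estimate_tokens(message) for message in messages)
    return used + reserve_for_reply <= context_window


# Five steps of ~2000 characters each (code plus output) is a modest session.
history = ["x" * 2000] * 5
print(fits_context(history, context_window=3000, reserve_for_reply=1000))
print(fits_context(history, context_window=16000, reserve_for_reply=4096))
```

Under these assumptions the five-step history alone is about 2500 tokens, so it no longer fits a 3000-token window once you leave room for a reply, while the 16000-token setting handles it comfortably.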
Profiles let you save a pre-configured setup:
# Use a community-provided codestral profile
interpreter --profile codestral.py
Model Recommendations: Honest Numbers
The project documentation recommends CodeLlama 13B Q8 and DeepSeek Coder 33B Q4 for reliable local inference. Here's the practical breakdown based on community reports:
| Model | VRAM needed | Code reliability | Best for |
|---|---|---|---|
| Qwen2.5-Coder 7B | ~6GB | Low — loops, syntax errors | Simple file ops only |
| CodeLlama 13B Q8 | ~12GB | Medium — handles clear tasks | Data analysis, single-file scripts |
| Devstral / Codestral 22B | ~14GB | Good — comparable to older GPT-3.5 | Most standard mode tasks |
| DeepSeek Coder 33B Q4 | ~20GB | Very good | Complex multi-step tasks |
| Cloud (GPT-4o / Claude) | None | Best available | OS mode, complex automation |
The 7B models can handle "rename all files in this folder matching *.log to *.bak" type tasks reliably. They fall apart on anything requiring multi-step logic, error correction, or understanding of a codebase structure.
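You can sanity-check VRAM figures like the ones in the table with simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8, plus some allowance for the KV cache and runtime. The sketch below uses a flat 2 GB allowance, which is my assumption and not a measured value; real usage depends on the quantization format, context length, and inference backend.

```python
def estimate_vram_gb(params_billion, bits_per_weight, overhead_gb=2.0):
    """Rule-of-thumb VRAM estimate in GB.

    weights: params_billion * 1e9 parameters * (bits_per_weight / 8) bytes each
    overhead_gb: assumed flat allowance for KV cache and runtime buffers
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


for name, params, bits in [("7B Q4", 7, 4), ("22B Q4", 22, 4), ("33B Q4", 33, 4)]:
    print(f"{name}: ~{estimate_vram_gb(params, bits):.1f} GB")
```

For a 33B model at 4 bits this gives 16.5 GB of weights plus the allowance, or 18.5 GB, which is in the same range as the table's ~20GB figure. Treat the result as a lower bound when planning hardware.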
An RTX 4090 (24GB VRAM) is the sweet spot for running Devstral or DeepSeek Coder 33B Q4 locally at tolerable speed — expect 15–30 tokens/second at Q4 quantization. An [RTX 3090](https://www.amazon.com/s?k=RTX+3090&tag=runaihome