Prerequisites
- A machine with at least 16GB RAM (32GB+ recommended for better performance with larger models).
- Operating system: macOS, Linux, or Windows (Ollama supports all).
- Node.js installed (for installing CLI tools like Claude Code and Codex). You can download it from the official Node.js website if not already installed.
- Sufficient storage for models (models can be several GB in size).
- For optimal performance, a GPU (NVIDIA or Apple Silicon) is helpful but not required; CPU-only works.
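Before installing anything, a quick sanity check of these prerequisites can save time later. A minimal sketch, assuming a POSIX shell:

```shell
# Verify Node.js and npm are available (needed for the CLI installs below)
command -v node >/dev/null && node -v || echo "Node.js not found - install it from nodejs.org"
command -v npm >/dev/null && npm -v || echo "npm not found"
# Check free disk space in the current directory (models are several GB each)
df -h .
```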
Step 1: Install Ollama
Ollama is the core tool for running local AI models. On Linux, download and install it with the following command in your terminal (macOS and Windows users can instead download the installer from ollama.com):
curl -fsSL https://ollama.com/install.sh | sh
Verify the installation by running:
ollama -v
This should display the Ollama version (ensure it's v0.15 or later for ollama launch support).
Step 2: Pull a Suitable Coding Model
Ollama needs a local model to power Claude Code and Codex. Choose a model based on your hardware (RAM/VRAM). Here are recommendations for coding tasks:
- For 16-32GB RAM (smaller models, faster on CPU): qwen2.5-coder:7b (a good balance of speed and quality).
ollama pull qwen2.5-coder:7b
- For 32GB+ RAM (larger models, better accuracy): devstral-small-2 (24B parameters) or qwen3-coder:30b.
ollama pull devstral-small-2
- High-performance option (quantized for efficiency): glm-4.7-flash:q8_0 (requires ~19-24GB VRAM for best speed).
ollama pull glm-4.7-flash:q8_0
If a download fails (e.g., checksum error), remove the model and retry:
ollama rm <model-name>
ollama pull <model-name>
Larger models provide better coding assistance but may run slower on limited hardware.
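Once a model has finished downloading, a quick smoke test confirms it loads and responds (this assumes you pulled qwen2.5-coder:7b; substitute your model name):

```shell
# List the models you have downloaded
ollama list
# One-shot prompt: loads the model, prints a reply, then exits
ollama run qwen2.5-coder:7b "Write a one-line Python hello world"
```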
Step 3: Set Up and Use Claude Code with Ollama
Claude Code is Anthropic's AI coding assistant. Instead of paying for API access, you can point it at Ollama's local models.
Installation
Install the Claude Code CLI globally via npm:
npm install -g @anthropic-ai/claude-code
Configuration
Option 1: Use ollama launch for easy setup (recommended):
ollama launch claude --model <model-name> # e.g., ollama launch claude --model qwen2.5-coder:7b
This starts Claude Code, configures it to use your local model, and prompts you to confirm trust (type "yes").
Option 2: Manual configuration (add to your ~/.bashrc or ~/.zshrc file):
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434/v1"
Reload your shell:
source ~/.bashrc # or ~/.zshrc
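Before launching, you can confirm that Ollama's OpenAI-compatible endpoint (which those variables point at) is actually reachable. A minimal check, assuming Ollama is running on its default port and qwen2.5-coder:7b is pulled:

```shell
# Lists your locally available models via the OpenAI-compatible API
curl -s http://localhost:11434/v1/models
# Or send a one-shot chat request
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:7b", "messages": [{"role": "user", "content": "Say hello"}]}'
```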
Then launch:
claude --model <model-name> # e.g., claude --model devstral-small-2
Usage
- Once launched, you'll enter an interactive session. Type your coding query (e.g., "Write a Python function to sort a list").
- Use Tab to switch modes (e.g., planning vs. building code).
- Press Ctrl+P for options like sharing sessions or switching models.
- Exit with Ctrl+C.
- It runs fully offline and privately on your machine.
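Claude Code also supports a non-interactive print mode, which is handy for scripting. A sketch, assuming the configuration above points it at your local Ollama model:

```shell
# -p prints a single response and exits instead of opening the interactive session
claude -p "Explain what a Python list comprehension is" --model qwen2.5-coder:7b
```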
Step 4: Set Up and Use Codex with Ollama
Codex is OpenAI's open-source CLI coding agent. Like Claude Code, it can be pointed at a local Ollama model instead of OpenAI's API.
Installation
Install the Codex CLI globally via npm:
npm install -g @openai/codex
Configuration
Use ollama launch for setup:
ollama launch codex --model <model-name> # e.g., ollama launch codex --model glm-4.7-flash:q8_0
This configures and launches Codex with your local model. Confirm trust if prompted.
For configuration without launching:
ollama launch codex config
Select your model and save.
Usage
- In the interactive session, enter prompts like "Generate a JavaScript function for API calls."
- Navigate with Tab, use Ctrl+P for menu options.
- It supports code completion, explanation, and generation.
- Fully local and free, no API keys needed.
- Exit with Ctrl+C.
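Codex can also run non-interactively for one-off tasks. A sketch, assuming Codex was already configured via ollama launch:

```shell
# codex exec runs a single prompt and exits without entering the interactive session
codex exec "Write a JavaScript function that fetches JSON from a URL"
```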
Tips and Troubleshooting
- Performance: Start with smaller models if your machine is slower. Quantized models (e.g., tags ending in :q8_0) are faster.
- Switching Models: Relaunch with a different --model flag.
- Cloud Hybrid: If local is too slow, try cloud models like minimax-m2.1:cloud (pull with ollama pull minimax-m2.1:cloud), but note potential costs after the free tier.
- Errors: Ensure Ollama is running (run ollama serve if needed). Check RAM usage with system tools.
- Updates: Keep Ollama updated for new features and model support.
- This setup replaces paid services like Anthropic or OpenAI APIs, keeping everything private and cost-free.
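When an agent cannot connect, these checks usually locate the problem (assuming Ollama's default port, 11434):

```shell
# Is the Ollama server up? This returns its version if so.
curl -s http://localhost:11434/api/version
# Which models are currently loaded in memory?
ollama ps
# If the server isn't running, start it manually:
ollama serve
```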