Prerequisites
- A machine with at least 16GB RAM (32GB+ recommended for better performance with larger models).
- Operating system: macOS, Linux, or Windows (Ollama supports all).
- Node.js installed (for installing CLI tools like Claude Code and Codex). You can download it from the official Node.js website if not already installed.
- Sufficient storage for models (models can be several GB in size).
- For optimal performance, a GPU (NVIDIA or Apple Silicon) is helpful but not required; CPU-only works.
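Before installing anything, a quick sanity check of these prerequisites can save time later. A minimal sketch, assuming a POSIX shell:

```shell
# Verify Node.js and npm are available (needed for the CLI installs below)
command -v node >/dev/null && node -v || echo "Node.js not found - install it from nodejs.org"
command -v npm >/dev/null && npm -v || echo "npm not found"
# Check free disk space in the current directory (models are several GB each)
df -h .
```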
Step 1: Install Ollama
Ollama is the core tool for running local AI models. On Linux, download and install it with the following command in your terminal (macOS and Windows users can instead download the installer from ollama.com):
curl -fsSL https://ollama.com/install.sh | sh
Verify the installation by running:
ollama -v
This should display the Ollama version (ensure it's v0.15 or later for ollama launch support).
Step 2: Pull a Suitable Coding Model
Ollama needs a local model to power Claude Code and Codex. Choose a model based on your hardware (RAM/VRAM). Here are recommendations for coding tasks:
- For 16-32GB RAM (smaller models, faster on CPU): qwen2.5-coder:7b (a good balance of speed and quality).
ollama pull qwen2.5-coder:7b
- For 32GB+ RAM (larger models, better accuracy): devstral-small-2 (24B parameters) or qwen3-coder:30b.
ollama pull devstral-small-2
- High-performance option (quantized for efficiency): glm-4.7-flash:q8_0 (requires ~19-24GB VRAM for best speed).
ollama pull glm-4.7-flash:q8_0
If a download fails (e.g., checksum error), remove the model and retry:
ollama rm <model-name>
ollama pull <model-name>
Larger models provide better coding assistance but may run slower on limited hardware.
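Once a model has finished downloading, a quick smoke test confirms it loads and responds (this assumes you pulled qwen2.5-coder:7b; substitute your model name):

```shell
# List the models you have downloaded
ollama list
# One-shot prompt: loads the model, prints a reply, then exits
ollama run qwen2.5-coder:7b "Write a one-line Python hello world"
```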
Step 3: Set Up and Use Claude Code with Ollama
Claude Code is Anthropic's AI coding assistant. Instead of paying for API access, you can point it at Ollama's local models.
Installation
Install the Claude Code CLI globally via npm:
npm install -g @anthropic-ai/claude-code
Configuration
Option 1: Use ollama launch for easy setup (recommended):
ollama launch claude --model <model-name> # e.g., ollama launch claude --model qwen2.5-coder:7b
This starts Claude Code, configures it to use your local model, and prompts you to confirm trust (type "yes").
Option 2: Manual configuration (add to your ~/.bashrc or ~/.zshrc file):
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434/v1"
Reload your shell:
source ~/.bashrc # or ~/.zshrc
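Before launching, you can confirm that Ollama's OpenAI-compatible endpoint (which those variables point at) is actually reachable. A minimal check, assuming Ollama is running on its default port and qwen2.5-coder:7b is pulled:

```shell
# Lists your locally available models via the OpenAI-compatible API
curl -s http://localhost:11434/v1/models
# Or send a one-shot chat request
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:7b", "messages": [{"role": "user", "content": "Say hello"}]}'
```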
Then launch:
claude --model <model-name> # e.g., claude --model devstral-small-2
Usage
- Once launched, you'll enter an interactive session. Type your coding query (e.g., "Write a Python function to sort a list").
- Use Tab to switch modes (e.g., planning vs. building code).
- Press Ctrl+P for options like sharing sessions or switching models.
- Exit with Ctrl+C.
- It runs fully offline and privately on your machine.
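Claude Code also supports a non-interactive print mode, which is handy for scripting. A sketch, assuming the configuration above points it at your local Ollama model:

```shell
# -p prints a single response and exits instead of opening the interactive session
claude -p "Explain what a Python list comprehension is" --model qwen2.5-coder:7b
```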
Step 4: Set Up and Use Codex with Ollama
Codex is OpenAI's open-source CLI coding agent. Like Claude Code, it can be pointed at a local Ollama model instead of OpenAI's API.
Installation
Install the Codex CLI globally via npm:
npm install -g @openai/codex
Configuration
Use ollama launch for setup:
ollama launch codex --model <model-name> # e.g., ollama launch codex --model glm-4.7-flash:q8_0
This configures and launches Codex with your local model. Confirm trust if prompted.
For configuration without launching:
ollama launch codex config
Select your model and save.
Usage
- In the interactive session, enter prompts like "Generate a JavaScript function for API calls."
- Navigate with Tab, use Ctrl+P for menu options.
- It supports code completion, explanation, and generation.
- Fully local and free, no API keys needed.
- Exit with Ctrl+C.
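Codex can also run non-interactively for one-off tasks. A sketch, assuming Codex was already configured via ollama launch:

```shell
# codex exec runs a single prompt and exits without entering the interactive session
codex exec "Write a JavaScript function that fetches JSON from a URL"
```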
Tips and Troubleshooting
- Performance: Start with smaller models if your machine is slower. Quantized models (e.g., tags ending in :q8_0) are faster.
- Switching Models: Relaunch with a different --model flag.
- Cloud Hybrid: If local is too slow, try cloud models like minimax-m2.1:cloud (pull with ollama pull minimax-m2.1:cloud), but note potential costs after the free tier.
- Errors: Ensure Ollama is running (run ollama serve if needed). Check RAM usage with system tools.
- Updates: Keep Ollama updated for new features and model support.
- This setup replaces paid services like Anthropic or OpenAI APIs, keeping everything private and cost-free.
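When an agent cannot connect, these checks usually locate the problem (assuming Ollama's default port, 11434):

```shell
# Is the Ollama server up? This returns its version if so.
curl -s http://localhost:11434/api/version
# Which models are currently loaded in memory?
ollama ps
# If the server isn't running, start it manually:
ollama serve
```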