
Shashi Jagtap

Codex CLI: Running GPT-OSS and Local Coding Models with Ollama, LM Studio, and MLX

Agentic coding is evolving rapidly, reshaping how developers interact with AI to generate code. Instead of being locked inside full-blown IDEs, many are moving back toward lightweight, flexible command-line interfaces. Since the arrival of Claude Code, we’ve seen a wave of new coding CLIs (Gemini CLI, Qwen Code, and others), but each has come with a major limitation: they are tied to a single model provider.

Codex CLI breaks that pattern. It’s the first CLI designed to be truly universal, capable of running any model, cloud-based or open-source, local or remote, through a single, unified interface. No more juggling separate CLIs or switching mental contexts depending on the model you want to use. There are a few toy open-source projects doing something similar, but Codex is the first official CLI from a major model provider that lets developers do this. With Codex CLI, you configure providers once and then switch between them with simple provider entries, profiles, or MCP servers.

It’s still early days, but this opens up a lot of possibilities for agentic coding in the near future.

Codex CLI

Codex CLI is OpenAI’s bold response to the wave of coding assistants like Claude Code and Gemini CLI. OpenAI describes it as “one agent for everywhere you code,” and that vision shows. With a single installation, you get a lightweight yet powerful CLI that brings AI coding directly into your terminal.

Installation is straightforward:

  • If you have Node.js installed, run:
  npm i -g @openai/codex
  • On macOS, you can also use Homebrew:
  brew install codex

Once installed, you’re ready to go. Simply navigate to any project directory and launch:

codex

From there, Codex CLI integrates seamlessly into your workflow, providing an AI assistant without needing an IDE or browser-based environment.
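
Recent Codex CLI builds also accept an initial prompt and ship a non-interactive exec subcommand; run codex --help to confirm what your version supports. A quick sketch:

# Start an interactive session seeded with a prompt
codex "explain this codebase to me"

# Run a one-off task non-interactively (handy for scripts or CI)
codex exec "update the README with setup instructions"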

Cloud Models vs. Open Source Models

OpenAI has recently released two open-source models: GPT-OSS-20B and GPT-OSS-120B, alongside GPT-5.

By default, Codex CLI connects to cloud models like GPT-5. These are great for rapid prototyping, but they also come with tradeoffs: API costs, usage limits, and the need for a constant internet connection.

The real breakthrough is that Codex also supports open-source, self-hosted models. With the --oss flag or a configured profile, you can run inference locally through providers like Ollama, LM Studio, or MLX.

For example:

codex --oss

By default, this checks whether you have the gpt-oss:20b model available through Ollama. You can also specify another model:

codex --oss -m gpt-oss:120b
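
Since --oss looks for the model in Ollama, it’s easiest to pull it ahead of time. The gpt-oss:20b and gpt-oss:120b tags below are the ones published in the Ollama library; adjust if you use different ones:

# Download the GPT-OSS models ahead of time
ollama pull gpt-oss:20b
ollama pull gpt-oss:120b

# Confirm they are available locally
ollama list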

Running models locally unlocks powerful advantages:

  • Run powerful LLMs locally without sending data to external servers
  • Avoid vendor lock-in by swapping providers or models at will
  • Optimize for privacy, speed, and cost while keeping workflows flexible

In short, Codex gives developers the freedom to choose between cutting-edge cloud models and locally hosted OSS models—all from the same CLI.

Configuring Codex with config.toml

When you install Codex CLI, you’ll find a ~/.codex/ directory on your system. This directory contains configuration files and subdirectories. If ~/.codex/config.toml doesn’t exist, create it manually.

This file allows you to configure providers and create profiles for different models. Some options aren’t fully documented yet, but you can explore the Codex source code for details. You can also configure MCP servers here.
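
MCP server entries follow a simple table format, roughly like the sketch below. The server name and package here are placeholders, so swap in whatever MCP server you actually use and check the Codex docs or source for the exact schema:

# Hypothetical MCP server entry in ~/.codex/config.toml
# "docs-fs" and the npx package are placeholders for illustration
[mcp_servers.docs-fs]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]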

Ollama Configuration

Assuming you have a model already downloaded and Ollama running, add the following to your ~/.codex/config.toml:

[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"

[profiles.gpt-oss-120b-ollama]
model_provider = "ollama"
model = "gpt-oss:120b"

Then launch Codex with:

codex --oss --profile gpt-oss-120b-ollama
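
If you find yourself always launching the same profile, the Codex config also supports a top-level profile key that sets the default, so a bare codex picks it up; verify the key name against your Codex version:

# Optional: in ~/.codex/config.toml, make this profile the default
profile = "gpt-oss-120b-ollama"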

LM Studio Configuration

In LM Studio, you’ll need to load a model and start the server (default port is 1234). You can use the LM Studio UI or the CLI:

# List available models
lms ls  

# Load the model
lms load qwen/qwen3-coder-30b  

# Start the server
lms server start

Config for GPT-OSS-120B

[model_providers.lms]
name = "LM Studio"
base_url = "http://localhost:1234/v1"

[profiles.gpt-oss-120b-lms]
model_provider = "lms"
model = "gpt-oss:120b"

Config for Qwen3-Coder-30B

[model_providers.lm_studio]
name = "LM Studio"
base_url = "http://localhost:1234/v1"

[profiles.qwen3-coder-30b-lms]
model_provider = "lm_studio"
model = "qwen/qwen3-coder-30b"

Launch with:

codex --profile gpt-oss-120b-lms  
codex --profile qwen3-coder-30b-lms
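
Since both profiles point at the same LM Studio server, you can also collapse them into a single provider entry. A sketch of that layout is below; note that the openai/gpt-oss-120b identifier is an assumption on my part, so use whatever name lms ls actually reports for your download:

[model_providers.lm_studio]
name = "LM Studio"
base_url = "http://localhost:1234/v1"

[profiles.gpt-oss-120b-lms]
model_provider = "lm_studio"
model = "openai/gpt-oss-120b"  # use the exact identifier shown by `lms ls`

[profiles.qwen3-coder-30b-lms]
model_provider = "lm_studio"
model = "qwen/qwen3-coder-30b"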

MLX Configuration

On Apple Silicon, you can use MLX for faster inference. Install the MLX LM package:

pip install mlx-lm

Start a local server:

mlx_lm.server --model SuperagenticAI/gpt-oss-20b-8bit-mlx --port 8888
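
Before pointing Codex at it, it’s worth a quick sanity check that the server answers OpenAI-style requests. mlx_lm’s server exposes the standard /v1/chat/completions route, though field support can vary slightly between versions:

# Rough sanity check of the local MLX server
curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "SuperagenticAI/gpt-oss-20b-8bit-mlx", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 32}'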

Update your Codex config:

[model_providers.mlx]
name = "MLX LM"
base_url = "http://localhost:8888/v1"

[profiles.gpt-oss-20b-8bit-mlx]
model_provider = "mlx"
model = "SuperagenticAI/gpt-oss-20b-8bit-mlx"

Run with:

codex --profile gpt-oss-20b-8bit-mlx

Watch It in Action

🎥 Demo Video

Context Length

One challenge with local coding models is context length; you may need to adjust it for larger projects (see the examples after this list).

  • Ollama: use /set parameter num_ctx
  • LM Studio: pass --context-length to the lms load command
  • MLX: configure via model/server launch parameters
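
For example, assuming a 32k window (purely illustrative; pick whatever your hardware can handle):

# Ollama: raise the context window inside an interactive session
ollama run gpt-oss:20b
>>> /set parameter num_ctx 32768

# LM Studio: set the context length when loading the model
lms load qwen/qwen3-coder-30b --context-length 32768

# MLX: check `mlx_lm.server --help` for the relevant launch parameters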

Why Run Local Models?

While cloud APIs are convenient, local models bring unique benefits:

  • Privacy: your code never leaves your machine
  • Cost control: no API bills for long-running tasks
  • Flexibility: swap models without waiting for API support
  • Resilience: works offline or in restricted environments

By combining Codex CLI with providers like Ollama, LM Studio, and MLX, you get the best of both worlds: a unified developer experience with full freedom to choose between cloud and local inference.

Final Thoughts

Codex CLI marks a shift in how developers interact with AI coding models. For the first time, you can use one CLI to manage all your models from OpenAI’s cloud APIs to cutting-edge OSS models running locally.

If you’re serious about building with AI while keeping flexibility, privacy, and cost in check, it’s worth setting up Codex CLI with local providers today.
