
Shashi Jagtap

Codex CLI: Running GPT-OSS and Local Coding Models with Ollama, LM Studio, and MLX

Agentic coding is evolving rapidly, reshaping how developers interact with AI to generate code. Instead of being locked inside full-blown IDEs, many are moving back toward lightweight, flexible command-line interfaces. Since the arrival of Claude Code, we’ve seen a wave of new coding CLIs (Gemini CLI, Qwen Code, and others), but each has come with a major limitation: they are tied to a single model provider.

Codex CLI breaks that pattern. It’s the first CLI designed to be truly universal, capable of running any model, cloud-based or open-source, local or remote, through a single, unified interface. No more juggling separate CLIs or switching mental contexts depending on the model you want to use. There are a few toy open-source projects doing something similar, but Codex is the first official CLI from a major model provider that lets developers do this. With Codex CLI, you configure providers once and then switch between them with simple provider entries, profiles, or MCP servers.

It’s still early days, but this opens up a lot of possibilities for agentic coding in the near future.

Codex CLI

Codex CLI is OpenAI’s bold response to the wave of coding assistants like Claude Code and Gemini CLI. OpenAI describes it as “one agent for everywhere you code,” and that vision shows. With a single installation, you get a lightweight yet powerful CLI that brings AI coding directly into your terminal.

Installation is straightforward:

  • If you have Node.js installed, run:
  npm i -g @openai/codex
  • On macOS, you can also use Homebrew:
  brew install codex

Once installed, you’re ready to go. Simply navigate to any project directory and launch:

codex

From there, Codex CLI integrates seamlessly into your workflow, providing an AI assistant without needing an IDE or browser-based environment.
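
Recent Codex CLI builds also accept an initial prompt and ship a non-interactive exec subcommand; run codex --help to confirm what your version supports. A quick sketch:

# Start an interactive session seeded with a prompt
codex "explain this codebase to me"

# Run a one-off task non-interactively (handy for scripts or CI)
codex exec "update the README with setup instructions"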

Cloud Models vs. Open Source Models

OpenAI has recently released two open-source models: GPT-OSS-20B and GPT-OSS-120B, alongside GPT-5.

By default, Codex CLI connects to cloud models like GPT-5. These are great for rapid prototyping, but they also come with tradeoffs: API costs, usage limits, and the need for a constant internet connection.

The real breakthrough is that Codex also supports open-source, self-hosted models. With the --oss flag or a configured profile, you can run inference locally through providers like Ollama, LM Studio, or MLX.

For example:

codex --oss

By default, this checks whether you have the gpt-oss:20b model available through Ollama. You can also specify another model:

codex --oss -m gpt-oss:120b
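
Since --oss looks for the model in Ollama, it’s easiest to pull it ahead of time. The gpt-oss:20b and gpt-oss:120b tags below are the ones published in the Ollama library; adjust if you use different ones:

# Download the GPT-OSS models ahead of time
ollama pull gpt-oss:20b
ollama pull gpt-oss:120b

# Confirm they are available locally
ollama list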

Running models locally unlocks powerful advantages:

  • Run powerful LLMs locally without sending data to external servers
  • Avoid vendor lock-in by swapping providers or models at will
  • Optimize for privacy, speed, and cost while keeping workflows flexible

In short, Codex gives developers the freedom to choose between cutting-edge cloud models and locally hosted OSS models—all from the same CLI.

Configuring Codex with config.toml

When you install Codex CLI, you’ll find a ~/.codex/ directory on your system. This directory contains configuration files and subdirectories. If ~/.codex/config.toml doesn’t exist, create it manually.

This file allows you to configure providers and create profiles for different models. Some options aren’t fully documented yet, but you can explore the Codex source code for details. You can also configure MCP servers here.
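
MCP server entries follow a simple table format, roughly like the sketch below. The server name and package here are placeholders, so swap in whatever MCP server you actually use and check the Codex docs or source for the exact schema:

# Hypothetical MCP server entry in ~/.codex/config.toml
# "docs-fs" and the npx package are placeholders for illustration
[mcp_servers.docs-fs]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]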

Ollama Configuration

Assuming you have a model already downloaded and Ollama running, add the following to your ~/.codex/config.toml:

[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"

[profiles.gpt-oss-120b-ollama]
model_provider = "ollama"
model = "gpt-oss:120b"

Then launch Codex with:

codex --oss --profile gpt-oss-120b-ollama
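
If you find yourself always launching the same profile, the Codex config also supports a top-level profile key that sets the default, so a bare codex picks it up; verify the key name against your Codex version:

# Optional: in ~/.codex/config.toml, make this profile the default
profile = "gpt-oss-120b-ollama"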

LM Studio Configuration

In LM Studio, you’ll need to load a model and start the server (default port is 1234). You can use the LM Studio UI or the CLI:

# List available models
lms ls  

# Load the model
lms load qwen/qwen3-coder-30b  

# Start the server
lms server start

Config for GPT-OSS-120B

[model_providers.lms]
name = "LM Studio"
base_url = "http://localhost:1234/v1"

[profiles.gpt-oss-120b-lms]
model_provider = "lms"
model = "gpt-oss:120b"

Config for Qwen3-Coder-30B

[model_providers.lm_studio]
name = "LM Studio"
base_url = "http://localhost:1234/v1"

[profiles.qwen3-coder-30b-lms]
model_provider = "lm_studio"
model = "qwen/qwen3-coder-30b"

Launch with:

codex --profile gpt-oss-120b-lms  
codex --profile qwen3-coder-30b-lms
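
Since both profiles point at the same LM Studio server, you can also collapse them into a single provider entry. A sketch of that layout is below; note that the openai/gpt-oss-120b identifier is an assumption on my part, so use whatever name lms ls actually reports for your download:

[model_providers.lm_studio]
name = "LM Studio"
base_url = "http://localhost:1234/v1"

[profiles.gpt-oss-120b-lms]
model_provider = "lm_studio"
model = "openai/gpt-oss-120b"  # use the exact identifier shown by `lms ls`

[profiles.qwen3-coder-30b-lms]
model_provider = "lm_studio"
model = "qwen/qwen3-coder-30b"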

MLX Configuration

On Apple Silicon, you can use MLX for faster inference. Install the MLX LM package:

pip install mlx-lm

Start a local server:

mlx_lm.server --model SuperagenticAI/gpt-oss-20b-8bit-mlx --port 8888
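
Before pointing Codex at it, it’s worth a quick sanity check that the server answers OpenAI-style requests. mlx_lm’s server exposes the standard /v1/chat/completions route, though field support can vary slightly between versions:

# Rough sanity check of the local MLX server
curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "SuperagenticAI/gpt-oss-20b-8bit-mlx", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 32}'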

Update your Codex config:

[model_providers.mlx]
name = "MLX LM"
base_url = "http://localhost:8888/v1"

[profiles.gpt-oss-20b-8bit-mlx]
model_provider = "mlx"
model = "SuperagenticAI/gpt-oss-20b-8bit-mlx"

Run with:

codex --profile gpt-oss-20b-8bit-mlx

Watch It in Action

🎥 Demo Video

Context Length

One challenge with local coding models is context length; you may need to adjust it for larger projects (see the examples after this list).

  • Ollama: use /set parameter num_ctx
  • LM Studio: pass --context-length to the lms load command
  • MLX: configure via model/server launch parameters
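
For example, assuming a 32k window (purely illustrative; pick whatever your hardware can handle):

# Ollama: raise the context window inside an interactive session
ollama run gpt-oss:20b
>>> /set parameter num_ctx 32768

# LM Studio: set the context length when loading the model
lms load qwen/qwen3-coder-30b --context-length 32768

# MLX: check `mlx_lm.server --help` for the relevant launch parameters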

Why Run Local Models?

While cloud APIs are convenient, local models bring unique benefits:

  • Privacy: your code never leaves your machine
  • Cost control: no API bills for long-running tasks
  • Flexibility: swap models without waiting for API support
  • Resilience: works offline or in restricted environments

By combining Codex CLI with providers like Ollama, LM Studio, and MLX, you get the best of both worlds: a unified developer experience with full freedom to choose between cloud and local inference.

Final Thoughts

Codex CLI marks a shift in how developers interact with AI coding models. For the first time, you can use one CLI to manage all your models from OpenAI’s cloud APIs to cutting-edge OSS models running locally.

If you’re serious about building with AI while keeping flexibility, privacy, and cost in check, it’s worth setting up Codex CLI with local providers today.
