Requirements
- llama.cpp
- Node.js (if you install Codex with npm)
I'm using an NVIDIA GeForce RTX 3070.
Step 1. Install Codex
First, install Codex on WSL.
If Node.js isn’t installed yet, I recommend installing it with mise.
npm install -g @openai/codex@latest
# or use curl
curl -fsSL https://chatgpt.com/codex/install.sh | sh
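Before moving on, it can help to confirm that the tools you just installed are actually on your PATH. The following is a minimal sketch, not part of the official install steps; the helper name check_tools is hypothetical.

```shell
# check_tools: report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
# (Hypothetical helper, written for POSIX sh.)
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools node npm codex` prints nothing and succeeds when all three commands are available.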
Step 2. Create the .codex folder
To use a local LLM through llama.cpp, we need a config.toml. Codex creates its .codex folder the first time it starts, so run it once:
codex
You don't need to set anything up here. Just press Ctrl+C to exit.
Step 3. Create config.toml
After Codex has run once, your WSL home directory will contain a .codex folder.
Open config.toml with whatever editor you like.
vim ~/.codex/config.toml
config.toml
[model_providers.llama]
name = "llama.cpp"
base_url = "http://localhost:8080/v1"
wire_api = "responses"
stream_idle_timeout_ms = 10000000
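If you would rather not open an editor, the same provider table can be appended from the shell. This is a sketch under a few assumptions: the helper name append_llama_provider is hypothetical, and it appends instead of overwriting in case the file already holds other settings. It skips the write when the table is already present, because defining the same table twice would make the TOML invalid.

```shell
# append_llama_provider DIR: append the llama.cpp provider table to
# DIR/config.toml unless it is already there. (Hypothetical helper.)
append_llama_provider() {
  config="$1/config.toml"
  mkdir -p "$1"
  if [ -f "$config" ] && grep -qF '[model_providers.llama]' "$config"; then
    echo "provider already configured in $config"
    return 0
  fi
  # The leading blank line keeps the table header on its own line even
  # if the existing file does not end with a newline.
  cat >> "$config" <<'EOF'

[model_providers.llama]
name = "llama.cpp"
base_url = "http://localhost:8080/v1"
wire_api = "responses"
stream_idle_timeout_ms = 10000000
EOF
}
```

Calling `append_llama_provider "$HOME/.codex"` targets the folder created in Step 2.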
Step 4. Run llama-server to serve Gemma-4
If llama.cpp isn't built or installed yet, build it yourself or install it via Homebrew.
For this article, I used google--gemma-4-12B-it-Q4_K_M.gguf
My folder structure
drwxr-xr-x 30 root root 4096 Jun 8 23:00 llama.cpp
drwxr-xr-x 6 root root 4096 Jun 8 02:21 quantization
ls -l quantization/
total 276
-rw-r--r-- 1 root root 0 Jun 6 13:00 README.md
drwxr-xr-x 2 root root 4096 Jun 8 23:29 gguf <-- google--gemma-4-12B-it-Q4_K_M.gguf is here
-rw-r--r-- 1 root root 90 Jun 6 13:00 main.py
-rw-r--r-- 1 root root 26 Jun 6 13:17 mise.toml
drwxr-xr-x 3 root root 4096 Jun 8 02:33 models
-rw-r--r-- 1 root root 307 Jun 6 13:00 pyproject.toml
-rwxr-xr-x 1 root root 4060 Jun 8 02:21 quantize.sh
-rw-r--r-- 1 root root 254959 Jun 6 13:00 uv.lock
Now start llama-server.
-c: the context size must be larger than the prompt Codex sends, which was 7959 tokens in my case. The first time, I set -c 4096 and got the following error.
{"error":{"code":400,"message":"request (7959 tokens) exceeds the available context size (4096 tokens), try increasing it","type":"exceed_context_size_error","n_prompt_tokens":7959,"n_ctx":4096}}
cd llama.cpp
./build/bin/llama-server -m ../quantization/gguf/gemma-4-12B-it-qat-UD-Q4_K_XL.gguf -c 100000 --port 8080
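When choosing a value for -c, the prompt size reported in the context-size error gives a useful lower bound. The sketch below assumes the error body has the same flat shape as the one I received; both helper names are hypothetical, and it uses sed rather than a real JSON parser, so treat it as a quick check only.

```shell
# prompt_tokens_from_error: read a llama-server error body on stdin and
# print the number stored in its "n_prompt_tokens" field.
# Prints nothing if the field is absent. (Hypothetical helper.)
prompt_tokens_from_error() {
  sed -n 's/.*"n_prompt_tokens":\([0-9][0-9]*\).*/\1/p'
}

# context_is_enough N_CTX N_PROMPT: succeed only when the context window
# is strictly larger than the prompt, so there is room left for output.
context_is_enough() {
  [ "$1" -gt "$2" ]
}
```

With the error from my first attempt, the first helper prints 7959, `context_is_enough 4096 7959` fails, and `context_is_enough 100000 7959` succeeds.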
Step 5. Run Codex
Now it's time to run Codex.
Open a new tab or session and run Codex:
codex --model ./quantization/gguf/gemma-4-12B-it-qat-UD-Q4_K_XL.gguf -c model_provider=llama --search --dangerously-bypass-approvals-and-sandbox
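If Codex cannot reach the model, llama-server may still be loading it. llama-server's HTTP API includes a /health endpoint that returns a success status once the model is ready, so you can wait on it before launching Codex. This is a minimal sketch; the helper name wait_for_server and the default of 30 attempts are my own assumptions.

```shell
# wait_for_server URL [ATTEMPTS]: poll URL until it answers with an HTTP
# success status, pausing one second after each failed attempt.
# Returns 0 on success, 1 if the server never answers. (Hypothetical helper.)
wait_for_server() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP error codes such as 503 (still loading).
    if curl -fsS --max-time 2 -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_server http://localhost:8080/health` returns as soon as the server from Step 4 is ready.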
I sent a very simple prompt:
Can you create a simple todo app with reactjs and typescript
The resulting app lets me add a new task, check off a task, and delete a task.
