Jenuel Oras Ganawed

Posted on Jun 28 • Originally published at blog.jenuel.dev

Ornith 1.0: The Open-Source Coding Model Developers Should Watch Closely

A new coding model is easy to ignore these days. Every week someone claims a new benchmark score, a new agent, a new workflow, a new reason developers should change everything. Most of it fades by Monday.

Ornith 1.0 is worth a slower look.

DeepReinforce released Ornith 1.0 as an open-source family of models built specifically for agentic coding. That phrase matters. This is not only a chat model that can write a function when you ask politely. It is aimed at the messier part of software work: searching through a repo, using tools, trying a patch, reading failures, adjusting the plan, and doing that loop again.

If the reported numbers hold up in real-world use, Ornith 1.0 could become one of the more important open model releases for developers this year. Not because it magically replaces programmers. It does not. The interesting part is that it pushes strong coding-agent behavior into a model family that can be studied, hosted, modified, and run outside a closed API.

What is Ornith 1.0?

Ornith 1.0 is a family of open-source large language models from DeepReinforce AI focused on agentic coding. The official release lists several sizes: 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The models are post-trained on top of Gemma 4 and Qwen 3.5 foundations, and the project is published under the MIT license.

The 9B model is the approachable one for local experiments. The 35B and 397B variants are more serious serving targets for teams with stronger hardware. The project also publishes GGUF and FP8 variants, which matters because developers do not all have the same machines. A model that only works inside a giant lab is interesting. A model with smaller and quantized paths is useful.

The official docs say Ornith supports an OpenAI-compatible interface and a 256K token context window when served with modern runtimes. For developers, that means the model can fit into existing tools more easily: coding agents, VS Code extensions, local inference servers, and scripts that already speak the OpenAI API format.
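To make "speaks the OpenAI API format" concrete, here is a minimal sketch that builds a chat-completions request using only the Python standard library. The base URL and model name are assumptions that depend on how you launch your own server; nothing here is specific to Ornith.

```python
import json
import urllib.request

# Assumed values: adjust to whatever your local server actually exposes.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "Ornith-1.0"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_NAME):
    """Build (but do not send) a POST request in the OpenAI chat-completions shape."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Any tool that already produces a request like this only needs a different base URL to talk to a self-hosted model. Sending it is a single `urllib.request.urlopen(req)` call once a server is running.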

The part that makes it different: self-scaffolding

The phrase DeepReinforce uses is "self-scaffolding." In normal agent setups, humans design the harness: how the model calls tools, what steps it should try, how it should recover from failure, and how it should structure its work. Ornith 1.0 tries to learn parts of that scaffold during reinforcement learning.

In plain English: the model is trained not only to produce the answer, but also to improve the process it uses to get there.

That is a big deal for coding. Real programming is rarely one-shot. You inspect the repo, make a change, run tests, hit an error, read the stack trace, narrow the problem, and patch again. A model that can learn better search and repair patterns is more valuable than a model that only produces a pretty code block in chat.
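The loop described above can be sketched in a few lines. This is an illustration of the general patch-test-retry pattern, not DeepReinforce's harness. `propose_patch` and `apply_and_test` are hypothetical callables you would back with a model call and your own test runner.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RunResult:
    passed: bool
    output: str  # e.g. a stack trace or test runner log


def repair_loop(
    propose_patch: Callable[[List[str]], str],
    apply_and_test: Callable[[str], RunResult],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first patch whose tests pass, or None if attempts run out."""
    failures: List[str] = []  # feedback carried into the next attempt
    for _ in range(max_attempts):
        patch = propose_patch(failures)
        result = apply_and_test(patch)
        if result.passed:
            return patch
        failures.append(result.output)  # the next proposal sees every earlier failure
    return None
```

The important design choice is that each failure feeds the next proposal. A one-shot chat model never sees that feedback; an agent does.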

There is a catch. If a model learns to build its own scaffolds, it may also learn tricks that satisfy a verifier without really solving the task. TestingCatalog notes that DeepReinforce describes safeguards around this, including an outer trust boundary, a deterministic monitor, and a frozen LLM judge. That is good to see, but teams should still treat benchmark claims as a starting point, not a safety guarantee.

The benchmark story

According to DeepReinforce's published results, Ornith 1.0 performs strongly on agentic coding benchmarks. The headline number is the 397B model: 77.5 on Terminal-Bench 2.1 and 82.4 on SWE-Bench Verified. The same release compares that result against Claude Opus 4.7 at 70.3 on Terminal-Bench 2.1 and 80.8 on SWE-Bench Verified.

The smaller models are also interesting. Ornith 1.0 9B is reported at 43.1 on Terminal-Bench 2.1 and 69.4 on SWE-Bench Verified. That is the number I keep coming back to, because a useful smaller coding model changes who can experiment. Students, solo developers, startups, and privacy-conscious teams can test local agent workflows without sending every file to a hosted model.

Ornith 1.0 397B benchmark results. Source: DeepReinforce GitHub.

Ornith 1.0 35B benchmark results. Source: DeepReinforce GitHub.

Ornith 1.0 9B benchmark results. Source: DeepReinforce GitHub.

Quick benchmark summary

Model	Terminal-Bench 2.1	SWE-Bench Verified	Why it matters
Ornith 1.0 397B	77.5	82.4	Flagship open model aimed at frontier agentic coding.
Ornith 1.0 35B	64.2	75.6	Stronger team/self-hosted option without jumping to the largest model.
Ornith 1.0 9B	43.1	69.4	Most practical entry point for local testing and privacy-first experiments.
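If you want to see the gaps rather than the raw scores, a small sketch like this does the arithmetic. The numbers are copied from the vendor-published figures quoted in this article. The dictionary keys and the function name are my own.

```python
# Vendor-published scores as quoted in this article (not independently verified).
SCORES = {
    "Ornith 1.0 397B": {"terminal_bench": 77.5, "swe_bench": 82.4},
    "Ornith 1.0 35B": {"terminal_bench": 64.2, "swe_bench": 75.6},
    "Ornith 1.0 9B": {"terminal_bench": 43.1, "swe_bench": 69.4},
    "Claude Opus 4.7": {"terminal_bench": 70.3, "swe_bench": 80.8},
}


def gap(model, baseline, benchmark):
    """Point difference between two models on one benchmark, rounded to 0.1."""
    return round(SCORES[model][benchmark] - SCORES[baseline][benchmark], 1)


print(gap("Ornith 1.0 397B", "Claude Opus 4.7", "terminal_bench"))  # 7.2
print(gap("Ornith 1.0 397B", "Claude Opus 4.7", "swe_bench"))       # 1.6
print(gap("Ornith 1.0 9B", "Ornith 1.0 397B", "terminal_bench"))    # -34.4
print(gap("Ornith 1.0 9B", "Ornith 1.0 397B", "swe_bench"))         # -13.0
```

Read this way, the 9B model trails the flagship by 13 points on SWE-Bench Verified but by more than 34 on Terminal-Bench 2.1, so the smaller model gives up much more on terminal-style tasks than on repository patching.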

One honest note: these are vendor-published benchmark results. They are still useful, especially because the repo publishes detailed harness notes, but developers should test Ornith on their own repositories before making workflow decisions.

Why this could be a game changer

The open-source AI coding race has been moving from autocomplete to agents. That shift changes the question. Developers no longer ask only, "Can it write code?" They ask, "Can it work inside my project without breaking everything?"

Ornith 1.0 matters because it attacks that second question.

It is open enough to inspect and host. Closed coding agents can be powerful, but they create trust and data questions. An MIT-licensed model family gives teams more control.
It is built for tool-using coding loops. Benchmarks like Terminal-Bench and SWE-Bench are closer to real developer work than simple prompt-answer tests.
It has practical model sizes. 397B is for serious infrastructure. 9B and GGUF variants are for people who want to experiment locally.
It can plug into existing tools. OpenAI-compatible serving makes it easier to connect Ornith to VS Code extensions, OpenHands, custom scripts, and local agent frameworks.

The deeper shift is cultural. If models like Ornith keep improving, teams may start treating local or self-hosted coding agents as normal infrastructure, the same way they treat CI, linters, and internal dev tools.

Where Ornith 1.0 is useful

I would not use Ornith as a blind autopilot. I would use it as a repo-aware assistant that works under human review.

Bug fixing: give the agent a failing test, let it inspect the codebase, propose a patch, and rerun tests.
Refactoring: ask it to update repeated patterns across a project, then review the diff like you would review a junior developer's PR.
Test generation: use it to create coverage around brittle code before a larger change.
Offline or private coding: run a smaller checkpoint locally when the repository cannot leave your machine.
Agent research: study how self-scaffolding changes tool use, failure recovery, and long-context repo work.

Which model should you choose?

Use case	Recommended variant	Reason
Local experimentation	Ornith 1.0 9B GGUF	Easiest path for consumer machines and local tools.
Single powerful GPU server	Ornith 1.0 9B bf16 or quantized 35B	Good for private coding assistants and internal testing.
Team coding agent server	Ornith 1.0 35B or 35B FP8	Better performance while staying far below the flagship size.
Benchmark chasing or frontier experiments	Ornith 1.0 397B / 397B FP8	Best published results, but requires serious multi-GPU infrastructure.

My recommendation: start with 9B GGUF if you are learning, 35B if you have the hardware, and treat 397B as a hosted or lab-grade option unless your team already runs large MoE models.

How to use Ornith 1.0 on Windows

The simplest Windows path is Ollama or LM Studio with a GGUF checkpoint. If you have an NVIDIA GPU and prefer a Linux-like serving stack, use WSL2 and run vLLM from Ubuntu inside WSL.

# Option A: Windows + Ollama or LM Studio
# 1. Install Ollama or LM Studio.
# 2. Download a GGUF variant from Hugging Face, such as Ornith-1.0-9B-GGUF.
# 3. Start a local OpenAI-compatible server.
# 4. Point your coding tool to http://localhost:11434/v1 (Ollama's default), http://localhost:1234/v1 (LM Studio's default), or the port your app exposes.

For WSL2 with vLLM:

# Inside Ubuntu on WSL2 (stock Ubuntu ships python3, not python)
python3 -m venv .venv
source .venv/bin/activate
pip install -U vllm
MODEL=deepreinforce-ai/Ornith-1.0-9B
vllm serve "$MODEL" \
  --served-model-name Ornith-1.0 \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 262144 \
  --enable-prefix-caching \
  --enable-auto-tool-choice --tool-call-parser qwen3_xml \
  --reasoning-parser qwen3 \
  --trust-remote-code

Then test it:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Ornith-1.0",
    "messages": [{"role": "user", "content": "Write a short Python is_prime function."}],
    "temperature": 0.6
  }'
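The response comes back as JSON in the usual chat-completions shape, with the generated text under `choices[0].message.content`. A minimal sketch for pulling the reply out of a response body, assuming your server follows that standard layout (the function name is my own):

```python
import json


def extract_reply(raw_body):
    """Return the assistant text from an OpenAI-style chat-completions response body."""
    data = json.loads(raw_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # content can be null, for example when the model only emitted tool calls
    return choices[0]["message"].get("content") or ""
```

If this raises or returns an empty string on a simple prompt, check the server logs before blaming the model. A wrong model name or a mismatched parser flag is the more likely cause.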

How to use Ornith 1.0 on Linux

Linux is the cleanest path for vLLM or SGLang. Make sure your NVIDIA drivers, CUDA stack, and Python environment are ready first.

python3 -m venv ornith-env
source ornith-env/bin/activate
pip install -U vllm

MODEL=deepreinforce-ai/Ornith-1.0-9B
vllm serve "$MODEL" \
  --served-model-name Ornith-1.0 \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.90 \
  --enable-prefix-caching \
  --enable-auto-tool-choice --tool-call-parser qwen3_xml \
  --reasoning-parser qwen3 \
  --trust-remote-code

For 35B or 397B, use tensor parallelism and set --tensor-parallel-size to the number of GPUs you are sharding across:

MODEL=deepreinforce-ai/Ornith-1.0-35B-FP8
vllm serve "$MODEL" \
  --served-model-name Ornith-1.0 \
  --tensor-parallel-size 4 \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 262144 \
  --enable-prefix-caching \
  --enable-auto-tool-choice --tool-call-parser qwen3_xml \
  --reasoning-parser qwen3 \
  --trust-remote-code
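Before picking a GPU count, it helps to do the weight arithmetic. The sketch below is a rough lower bound under stated assumptions: it takes the nominal parameter count from the model name, uses 2 bytes per parameter for bf16 and 1 for FP8, and ignores KV cache, activations, and runtime overhead. For MoE models every expert still has to sit in memory, even though only some are active per token.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}


def weight_gb_per_gpu(params_billion, dtype, tensor_parallel_size):
    """Approximate weight memory per GPU in GB (10^9 bytes), weights only."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / tensor_parallel_size / 1e9


print(weight_gb_per_gpu(9, "bf16", 1))    # 18.0
print(weight_gb_per_gpu(35, "fp8", 4))    # 8.75
print(weight_gb_per_gpu(397, "fp8", 8))   # 49.625
```

Whatever is left on each card after the weights is what the runtime has for the KV cache, which is where a 256K context gets expensive.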

How to use Ornith 1.0 on macOS

On a Mac, start with GGUF. Apple Silicon machines are good local LLM boxes, but the 35B and 397B models are not casual laptop workloads. Try the 9B GGUF first.

# Option A: LM Studio
# 1. Install LM Studio for macOS.
# 2. Search for or download the Ornith-1.0-9B-GGUF checkpoint.
# 3. Start the local server from LM Studio.
# 4. Use the local OpenAI-compatible endpoint in your editor or agent.

If you use llama.cpp directly:

# Build llama.cpp, download a GGUF file, then serve it
./llama-server \
  -m /path/to/Ornith-1.0-9B.gguf \
  --host 0.0.0.0 --port 8000 \
  -c 32768

I would not begin with the largest context window on a laptop. Start smaller, confirm speed and memory, then increase context only if you need it.
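Here is why context size matters so much for memory. For a standard transformer, the KV cache grows linearly with context: two tensors (keys and values) per layer, per KV head, per token. The layer count, head count, and head size in the example are hypothetical placeholders, not Ornith's published architecture, and models that mix in sliding-window or linear-attention layers will need less than this estimate.

```python
def kv_cache_gib(context_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size in GiB for one sequence with full attention in every layer."""
    # The factor 2 covers keys and values; bytes_per_value=2 corresponds to a 16-bit cache.
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return bytes_per_token * context_tokens / 2**30


# Hypothetical architecture: 32 layers, 8 KV heads, head dimension 128.
print(kv_cache_gib(32768, 32, 8, 128))    # 4.0
print(kv_cache_gib(262144, 32, 8, 128))   # 32.0
```

With those placeholder numbers, going from a 32K to a 256K context multiplies the cache by eight, which is the difference between comfortable and impossible on most laptops.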

How to use Ornith 1.0 in VS Code

The easiest VS Code setup is to run Ornith behind an OpenAI-compatible local server, then connect it through an extension such as Continue or another tool that lets you define a custom OpenAI-compatible endpoint.

Start Ornith with vLLM, SGLang, LM Studio, Ollama, or llama.cpp server.
Confirm the endpoint works at http://localhost:8000/v1 or your local server URL.
Install a VS Code AI extension that supports custom OpenAI-compatible providers.
Add a model entry with the model name Ornith-1.0.
Use it first for small tasks: explain a file, write tests, fix one failing function, or review a diff.
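One quick way to confirm the endpoint before touching the editor is to fetch `/v1/models` and check that the name you plan to configure is actually being served. This sketch only parses the response body, assuming the standard OpenAI-style list layout. The function name is my own.

```python
import json
from typing import List


def served_model_ids(models_body: str) -> List[str]:
    """Return the model ids listed in an OpenAI-style /v1/models response body."""
    data = json.loads(models_body)
    return [entry["id"] for entry in data.get("data", [])]
```

If "Ornith-1.0" is not in that list, the extension will fail with a model-not-found error no matter how the rest of the configuration looks. The id has to match the name the server registered, for example the value passed to `--served-model-name` in vLLM.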

A typical Continue-style configuration looks like this:

{
  "models": [
    {
      "title": "Ornith 1.0 Local",
      "provider": "openai",
      "model": "Ornith-1.0",
      "apiBase": "http://localhost:8000/v1",
      "apiKey": "not-needed-for-local"
    }
  ]
}

Do not start by asking it to rewrite your entire application. That is how people get huge diffs they cannot review. Start with one failing test, one file, or one small refactor. Let it earn trust.

Practical guardrails before you use it on real code

Use git and commit before asking any agent to modify files.
Run tests after every patch.
Review the diff line by line.
Keep secrets out of prompts unless the model is fully local and your logs are private.
Prefer tasks with objective feedback: tests, type checks, lint, build output.
Do not let any coding agent auto-merge changes without human review.
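Some of these guardrails can be automated. The sketch below flags an agent-produced diff for extra scrutiny when it is large or touches paths you consider sensitive. The threshold and the protected prefixes are hypothetical defaults, and the parsing is deliberately simple: it reads git-style `+++ b/<path>` headers and would miss deleted files or unusual diff formats.

```python
from typing import List, Tuple


def changed_files(diff_text: str) -> List[str]:
    """Collect paths from the '+++ b/<path>' headers of a git-style unified diff."""
    prefix = "+++ b/"
    return [line[len(prefix):] for line in diff_text.splitlines() if line.startswith(prefix)]


def count_changed_lines(diff_text: str) -> int:
    """Count added and removed lines, skipping the '+++' and '---' file headers."""
    count = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # simplification: a content line starting with '--' is also skipped
        if line.startswith(("+", "-")):
            count += 1
    return count


def needs_extra_review(
    diff_text: str,
    max_lines: int = 200,
    protected_prefixes: Tuple[str, ...] = ("tests/", ".github/"),
) -> bool:
    """Flag diffs that are large or that touch protected paths."""
    touches_protected = any(path.startswith(protected_prefixes) for path in changed_files(diff_text))
    return touches_protected or count_changed_lines(diff_text) > max_lines
```

Protecting the test directory is the point: an agent that edits the tests to make them pass has satisfied the verifier without fixing anything. A check like this does not replace reading the diff. It only tells you which diffs to read twice.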

My recommendation

Developers should test Ornith 1.0, but they should test it like engineers, not fans.

If you are a solo developer, try the 9B GGUF model locally through LM Studio, Ollama, or llama.cpp. Use it for test writing, bug hunting, and small refactors. If you are a team, set up a private vLLM or SGLang endpoint and compare it against your current assistant on your own repositories. The benchmark chart is interesting, but your codebase is the benchmark that matters.

If Ornith's self-scaffolding approach keeps improving, the next wave of AI coding may not be about who has the nicest autocomplete. It may be about who can build the most reliable software agent loop while keeping developers in control.

That is why Ornith 1.0 is worth watching. It points toward a future where powerful coding agents are not only rented from closed platforms. They can be hosted, inspected, adapted, and used on your own terms.

References

Originally published at https://blog.jenuel.dev/blog/ornith-1-open-source-agentic-coding-model
