TL;DR
H Company launched Holo3 on March 31, 2026, a mixture-of-experts computer use model scoring 78.85% on OSWorld-Verified, the highest recorded score on the desktop computer use benchmark. H Company reports that it beats GPT-5.4 and Opus 4.6 at a lower per-task cost. The API is live now, and the 35B variant is open-weight on HuggingFace under Apache 2.0.
The computer use gap most developers haven't solved
You may already have API automation and a reliable CI/CD pipeline. But some workflows still resist automation:
- Legacy enterprise software with no API
- Desktop apps that predate REST
- Multi-step workflows across multiple UIs
- Internal tools where selectors and screen layouts change often
Traditional RPA tools such as UiPath and Automation Anywhere often solve this with screen-coordinate scripts. Those scripts are brittle: when the UI moves, the automation breaks.
Computer use AI uses a different loop. The model sees a screenshot, decides the next action, and returns an instruction such as click, type, scroll, or press a key. That makes it useful for GUI workflows where no API exists.
Holo3, released March 31, 2026 by Paris-based H Company, is currently the strongest publicly available model for this task class as measured by OSWorld-Verified.
If you build automation workflows or testing pipelines that touch desktop software, Holo3's API is worth evaluating. If you use Apidog to design and test APIs, you can also use it to validate Holo3 requests, mock responses, and run repeatable test scenarios before connecting automation to a live desktop.
What is Holo3?
Holo3 is a computer use model. You provide:
- A screenshot of a desktop or browser
- A natural-language task
- Screen dimensions
The model returns the next action to execute on that screen. Your agent then performs the action, captures a new screenshot, and repeats until the task is complete.
H Company ships two variants:
- Holo3-122B-A10B: flagship model. 122B total parameters, 10B active parameters using sparse MoE. Hosted API only at hcompany.ai/holo-models-api. Sets the current benchmark record.
- Holo3-35B-A3B: 35B total parameters, 3B active. Open-weight on HuggingFace under Apache 2.0. Available on H Company's inference API free tier and self-hostable.
The MoE, or mixture-of-experts, architecture means only a subset of parameters is active per token. That lowers serving cost compared with dense models of similar total size. H Company states Holo3-122B-A10B costs less than GPT-5.4 and Opus 4.6 on a per-task basis.
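As a rough illustration of the active-parameter figures above, the sketch below computes what share of each variant's weights is active per token. It uses only the parameter counts quoted in this article. The helper name `active_fraction` is hypothetical, and the ratio is not a measure of real serving cost, which also depends on memory footprint, batching, and routing overhead.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters active per token, as a value in 0..1."""
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected total > 0 and 0 <= active <= total")
    return active_params_b / total_params_b


# Parameter counts in billions, as quoted for the two Holo3 variants.
VARIANTS = {
    "Holo3-122B-A10B": (122, 10),
    "Holo3-35B-A3B": (35, 3),
}

for name, (total, active) in VARIANTS.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.1%} of parameters active per token")
```

Both variants activate well under a tenth of their weights per token, which is the basis for the lower compute per token relative to a dense model of the same total size.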
OSWorld-Verified: what the benchmark measures
OSWorld-Verified evaluates whether an AI agent can complete real tasks on a real computer.
Instead of scoring generated text, OSWorld checks the final system state after the agent runs. That makes it closer to end-to-end automation testing than a standard language benchmark.
Tasks include:
- Single-app operations, such as opening a file, filling a form, or copying spreadsheet data
- Cross-app workflows, such as reading a value from a PDF, updating a spreadsheet, and sending an email
- Long-horizon multi-app tasks requiring the agent to preserve context across several systems
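To make the idea of state-based scoring concrete, here is a minimal sketch of a checker that ignores anything the agent says and inspects the resulting filesystem instead. The task, the file name `total.txt`, and the expected value are invented for illustration; they are not taken from OSWorld itself.

```python
import tempfile
from pathlib import Path


def check_report_saved(workdir: Path) -> bool:
    """Pass only if the expected file exists with the expected content.

    Hypothetical task: "save the invoice total to total.txt".
    """
    target = workdir / "total.txt"
    return target.is_file() and target.read_text().strip() == "1250.00"


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    # Before the agent acts, the required final state does not exist.
    print("before:", check_report_saved(workdir))
    # Simulate the side effect a successful agent run would leave behind.
    (workdir / "total.txt").write_text("1250.00\n")
    print("after:", check_report_saved(workdir))
```

A checker like this cannot be satisfied by a plausible-sounding answer; only the side effect on the system counts.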
Holo3-122B-A10B scores 78.85% on OSWorld-Verified. Scores above 40% were considered state-of-the-art until recently. Previous leading models from Anthropic and OpenAI were in the 60–65% range.
The biggest difference appears in harder workflows. H Company's internal H Corporate Benchmarks include 486 tasks across E-commerce, Business software, Collaboration, and Multi-App workflows. Holo3 pulls ahead most on multi-app tasks, where the agent must coordinate data across several applications.
How Holo3 was trained: the Agentic Learning Flywheel
Most computer use models are trained on static demonstrations. H Company describes Holo3's training loop as the Agentic Learning Flywheel.
The loop includes:
- Synthetic Navigation Data: human and generated instructions create scenario-specific navigation examples.
- Out-of-Domain Augmentation: scenarios are extended programmatically to cover unexpected UI states and edge cases.
- Curated Reinforcement Learning: data samples are filtered and used in an RL pipeline to optimize for task completion.
The training data comes from the Synthetic Environment Factory, where coding agents build complete enterprise web applications from scenario specs. These environments include verifiable tasks and end-to-end validation scripts, so the model trains on business-style workflows rather than toy examples.
The result is that Holo3 outperforms base Qwen3.5 models with larger parameter counts on the same benchmark tasks. The gap comes from both architecture and training methodology.
How to call the Holo3 API
The Holo3 API follows a screenshot-action loop:
- Capture the current screen
- Send the screenshot and task to the API
- Execute the returned action
- Capture the next screen
- Repeat until the task is complete
1. Set up authentication
# H Company Inference API base URL
https://api.hcompany.ai/v1
# Headers
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
Get your API key at hcompany.ai/holo-models-api. The free tier covers Holo3-35B-A3B.
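Rather than pasting the key into source code, you may prefer to read it from the environment. The following is a minimal sketch; the environment variable name `HCOMPANY_API_KEY` and the helper `build_headers` are assumptions made for this example, not names defined by H Company.

```python
import os


def build_headers(env_var: str = "HCOMPANY_API_KEY") -> dict[str, str]:
    """Build the request headers, reading the API key from the environment.

    The variable name is an assumption for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} before calling the API")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early with a clear message avoids sending a request with an empty bearer token and then debugging a 401.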
2. Send a screenshot with a task
import base64
import httpx
import pyautogui
# Capture the screen and save it as a PNG
screenshot = pyautogui.screenshot()
screenshot.save("/tmp/screen.png")
# Base64-encode the image so it can travel in a JSON body
with open("/tmp/screen.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
response = httpx.post(
    "https://api.hcompany.ai/v1/computer-use",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "holo3-122b-a10b",
        "task": "Open the invoice folder and find the most recent PDF",
        "screenshot": image_b64,
        "screen_width": 1920,
        "screen_height": 1080,
    },
    timeout=60.0,  # inference may take longer than httpx's 5-second default
)
response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error body
result = response.json()
print(result)
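If you want to build the request body separately from the network call, for example to unit-test it, a small helper can do the encoding and basic validation. This is a sketch under stated assumptions: the field names mirror the request shown above, while the helper name `build_payload` and the PNG signature check are additions of this example, not documented API requirements.

```python
import base64

# First eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_payload(png_bytes: bytes, task: str, width: int, height: int,
                  model: str = "holo3-122b-a10b") -> dict:
    """Build the JSON body for one screenshot-plus-task request."""
    # Assumption: screenshots are sent as PNG, as in the example above.
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot must be PNG-encoded bytes")
    if not task.strip():
        raise ValueError("task must not be empty")
    if width <= 0 or height <= 0:
        raise ValueError("screen dimensions must be positive")
    return {
        "model": model,
        "task": task,
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "screen_width": width,
        "screen_height": height,
    }
```

With pyautogui, the bytes can come from saving the screenshot into an `io.BytesIO` buffer with `format="PNG"`, which avoids the temporary file.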
3. Parse and execute the action
The API returns structured actions that your local agent executes on the host machine.
Example response:
{
  "action": {
    "action_type": "click",
    "coordinate": [245, 380],
    "reasoning": "The invoice folder icon is visible at this position"
  }
}
Supported action types include:
- click
- double_click
- right_click
- type
- key
- scroll
- screenshot_request
- task_complete
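Before executing anything, it can help to parse the response into a typed object and reject unknown action types. The sketch below assumes the fields shown in this article (`action_type`, `coordinate`, `reasoning`). Because the exact response envelope is an assumption here, it accepts the action either nested under an `action` key or at the top level. The `Action` class and `parse_action` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

VALID_ACTIONS = {
    "click", "double_click", "right_click", "type",
    "key", "scroll", "screenshot_request", "task_complete",
}
POINTER_ACTIONS = {"click", "double_click", "right_click"}


@dataclass(frozen=True)
class Action:
    action_type: str
    coordinate: Optional[tuple[int, int]] = None
    reasoning: str = ""


def parse_action(body: dict) -> Action:
    """Validate a response body and return a typed Action."""
    # Accept both {"action": {...}} and a bare top-level action object.
    raw = body.get("action", body)
    action_type = raw.get("action_type")
    if action_type not in VALID_ACTIONS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    coordinate = raw.get("coordinate")
    if coordinate is not None:
        well_formed = (
            isinstance(coordinate, (list, tuple))
            and len(coordinate) == 2
            and all(isinstance(v, int) for v in coordinate)
        )
        if not well_formed:
            raise ValueError(f"malformed coordinate: {coordinate!r}")
        coordinate = (coordinate[0], coordinate[1])
    # Pointer actions are useless without a target position.
    if action_type in POINTER_ACTIONS and coordinate is None:
        raise ValueError(f"{action_type} requires a coordinate")
    return Action(action_type, coordinate, raw.get("reasoning", ""))
```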
4. Loop until completion
A minimal loop looks like this:
# capture_screen, call_holo3_api, and execute_action are helpers you supply.
def run_computer_use_task(task: str, max_steps: int = 20):
    for step in range(max_steps):
        screenshot = capture_screen()
        response = call_holo3_api(task, screenshot)
        action = response["action"]
        # Stop as soon as the model reports completion
        if action["action_type"] == "task_complete":
            print(f"Done in {step + 1} steps")
            return response["result"]
        execute_action(action)
    # The step budget ran out before the model signalled completion
    raise TimeoutError("Task not completed within step limit")
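The loop above relies on an `execute_action` helper. One possible shape for it is sketched below. It dispatches to a backend object that exposes pyautogui-style functions (`click`, `doubleClick`, `rightClick`, `write`, `press`, `scroll`), so the pyautogui module itself could be passed in. The payload field names `text`, `key`, and `amount` are assumptions, since the article only shows the payload for click actions. `RecordingBackend` is a hypothetical test double that lets you try the dispatcher without touching a real desktop.

```python
# Maps pointer action types to pyautogui-style function names.
POINTER_METHODS = {
    "click": "click",
    "double_click": "doubleClick",
    "right_click": "rightClick",
}


def execute_action(action: dict, backend) -> None:
    """Dispatch one model action to a GUI backend."""
    kind = action["action_type"]
    if kind in POINTER_METHODS:
        x, y = action["coordinate"]
        getattr(backend, POINTER_METHODS[kind])(x, y)
    elif kind == "type":
        backend.write(action["text"])     # "text" field name is an assumption
    elif kind == "key":
        backend.press(action["key"])      # "key" field name is an assumption
    elif kind == "scroll":
        backend.scroll(action["amount"])  # "amount" field name is an assumption
    elif kind == "screenshot_request":
        pass  # nothing to execute; the loop captures a fresh screenshot next
    else:
        raise ValueError(f"cannot execute action_type: {kind!r}")


class RecordingBackend:
    """Test double that records calls instead of moving the mouse."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any unknown attribute becomes a function that records its call.
        return lambda *args: self.calls.append((name, args))


backend = RecordingBackend()
execute_action({"action_type": "click", "coordinate": [245, 380]}, backend)
execute_action({"action_type": "type", "text": "invoice"}, backend)
print(backend.calls)
```

Injecting the backend keeps the dispatcher testable in CI, where no display is available.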
In production, add safeguards:
- Limit the maximum number of steps
- Validate coordinates before executing clicks
- Log screenshots and actions for replay
- Require approval before destructive actions
- Run inside a sandboxed VM when testing
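Several of the safeguards above can be combined into a single gate that runs before each action. The sketch below is one hedged way to do it: a bounds check, a crude keyword heuristic for destructive actions, and a JSON Lines log for replay. The keyword list, the function names, and the log format are all assumptions of this example; a real deployment would need a stricter approval policy than keyword matching.

```python
import json
import time

# Hypothetical keywords that should trigger manual approval.
DESTRUCTIVE_HINTS = ("delete", "remove", "format", "send", "pay", "submit")


def within_bounds(coordinate, width: int, height: int) -> bool:
    """Valid pixels run from 0 to width-1 and from 0 to height-1."""
    x, y = coordinate
    return 0 <= x < width and 0 <= y < height


def needs_approval(action: dict) -> bool:
    """Flag actions whose reasoning or typed text mentions a risky keyword."""
    text = f"{action.get('reasoning', '')} {action.get('text', '')}".lower()
    return any(hint in text for hint in DESTRUCTIVE_HINTS)


def guard(action: dict, width: int, height: int, step: int, log_path: str) -> bool:
    """Log the action and return True only if it may run automatically."""
    coordinate = action.get("coordinate")
    in_bounds = coordinate is None or within_bounds(coordinate, width, height)
    auto_execute = in_bounds and not needs_approval(action)
    record = {
        "step": step,
        "time": time.time(),
        "action": action,
        "auto_execute": auto_execute,
    }
    # Append one JSON object per line so a run can be replayed later.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return auto_execute
```

In the loop, call `guard` before `execute_action` and pause for a human decision whenever it returns False.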
Testing Holo3 API calls with Apidog
Once your code can call the Holo3 API, validate the integration before running it on a real machine. Apidog helps with request setup, response assertions, mocking, and test scenarios.
1. Import the endpoint
Create a new HTTP request in Apidog:
POST https://api.hcompany.ai/v1/computer-use
Add the authorization header as an environment variable:
Authorization: Bearer {{HCOMPANY_API_KEY}}
Content-Type: application/json
This keeps API keys out of your request definitions.
2. Define the request body
Use a representative request body:
{
  "model": "holo3-122b-a10b",
  "task": "Open the invoice folder and find the most recent PDF",
  "screenshot": "{{screenshot_base64}}",
  "screen_width": 1920,
  "screen_height": 1080
}
Store screenshot_base64 as an environment or scenario variable when running tests.
3. Validate the response shape
Use Apidog post-response scripts to assert that the API returns an executable action.
pm.test("Action type is valid", () => {
  const validActions = [
    "click",
    "double_click",
    "right_click",
    "type",
    "key",
    "scroll",
    "task_complete",
    "screenshot_request"
  ];
  const json = pm.response.json();
  pm.expect(validActions).to.include(json.action.action_type);
});
Validate coordinates before your automation executes them:
pm.test("Coordinates are within screen bounds", () => {
  const action = pm.response.json().action;
  if (action.coordinate) {
    // within() is inclusive, and valid pixels run from 0 to size - 1
    pm.expect(action.coordinate[0]).to.be.within(0, 1919);
    pm.expect(action.coordinate[1]).to.be.within(0, 1079);
  }
});
You can also assert required fields by action type:
pm.test("Click actions include coordinates", () => {
  const action = pm.response.json().action;
  if (["click", "double_click", "right_click"].includes(action.action_type)) {
    pm.expect(action.coordinate).to.be.an("array");
    pm.expect(action.coordinate).to.have.length(2);
  }
});
4. Mock Holo3 responses during development
Use Apidog's Smart Mock to return realistic Holo3-style responses without calling the live API.
Example mock response:
{
  "action": {
    "action_type": "click",
    "coordinate": [245, 380],
    "reasoning": "The invoice folder icon is visible at this position"
  }
}
Mocking is useful when:
- You want to avoid using API credits during integration work
- Your orchestration layer is not ready to control a real desktop
- Frontend and backend teams need stable test responses
- You want deterministic test data for CI
5. Run multi-step scenarios
Chain multiple Holo3 requests in an Apidog Test Scenario to simulate a full task loop:
- Send screenshot 1
- Assert the returned action
- Store action data
- Send screenshot 2
- Assert the next action
- Continue until task_complete
This helps you validate request/response handling before giving the agent access to a live environment.
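The same scenario idea can be rehearsed in plain Python before wiring anything to the live API. The sketch below replays canned responses in order and asserts the action type at each step. Everything in it is hypothetical scaffolding: `scripted_client` stands in for the real API call, the screenshot values are placeholder strings, and the response shape follows the mock response shown earlier.

```python
from typing import Callable


def scripted_client(responses: list[dict]) -> Callable[[str, str], dict]:
    """Return a stand-in for the API call that replays canned responses."""
    remaining = iter(responses)

    def call(task: str, screenshot_b64: str) -> dict:
        try:
            return next(remaining)
        except StopIteration:
            raise AssertionError("scenario requested more steps than scripted") from None

    return call


def run_scenario(call, task: str, screenshots: list[str],
                 expected_types: list[str]) -> int:
    """Send each screenshot, assert the action type, return the steps used."""
    for step, (shot, expected) in enumerate(zip(screenshots, expected_types), start=1):
        action = call(task, shot)["action"]
        got = action["action_type"]
        assert got == expected, f"step {step}: expected {expected}, got {got}"
        if got == "task_complete":
            return step
    raise AssertionError("scenario ended without task_complete")


SCRIPT = [
    {"action": {"action_type": "click", "coordinate": [245, 380]}},
    {"action": {"action_type": "double_click", "coordinate": [512, 300]}},
    {"action": {"action_type": "task_complete"}},
]

steps = run_scenario(
    scripted_client(SCRIPT),
    "Open the invoice folder and find the most recent PDF",
    ["screenshot-1", "screenshot-2", "screenshot-3"],  # placeholder inputs
    ["click", "double_click", "task_complete"],
)
print(f"scenario passed in {steps} steps")
```

Because the responses are fixed, the test is deterministic and costs no API credits, which makes it suitable for CI.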
Holo3 vs Claude Computer Use vs OpenAI Operator
| Capability | Holo3-122B | Holo3-35B | Claude Computer Use | OpenAI Operator |
|---|---|---|---|---|
| OSWorld-Verified | 78.85% | ~55% (est.) | ~65% | ~62% |
| API access | Yes | Yes, free tier | Yes | Yes |
| Open weights | No | Yes, Apache 2.0 | No | No |
| Self-hostable | No | Yes | No | No |
| Cost vs GPT-5.4 | Lower | Much lower | Comparable | GPT-5.4 pricing |
| Best for | Production enterprise | Dev, testing, OSS | Anthropic ecosystem | OpenAI ecosystem |
The practical choice depends on your stack:
- Use Holo3-122B if you need peak accuracy on complex multi-app workflows and accuracy matters more to you than open weights or self-hosting.
- Use Holo3-35B for development, testing, open-source projects, or self-hosting.
- Use Claude Computer Use if your team is already committed to the Anthropic ecosystem.
- Use OpenAI Operator if your team already uses GPT-5.4 and wants a single vendor relationship.
Enterprise use cases
Holo3 is most useful when there is no clean API-based solution.
Legacy system data entry
Older ERP and CRM systems often have no REST API. Holo3 can navigate the desktop UI and enter or extract data without requiring a modernization project.
Cross-platform reconciliation
Example workflow:
- Open a PDF invoice
- Extract a total
- Compare it against an internal spreadsheet
- Update a third-party dashboard
- Send a confirmation message
This is difficult to automate with APIs when each system has a different interface or no integration support.
Regression testing for web apps
Instead of maintaining Selenium scripts tied to element IDs, you can point Holo3 at a staging environment with a plain-language task description.
This does not remove the need for deterministic tests, but it can reduce selector maintenance for UI flows where layout changes frequently.
Competitive intelligence
Holo3 can browse and extract structured data from websites where standard scraping is blocked or brittle.
H Company's H Corporate Benchmarks show strong results across E-commerce, Business software, Collaboration, and Multi-App workflows. The largest gap appears in Multi-App workflows, where the model has to reason across several applications without losing task state.
What's next: Adaptive Agency
H Company describes its next direction as Adaptive Agency: models that can navigate software they have not seen during training.
Current computer use models, including Holo3, still perform best on software patterns represented in their training environments. A custom internal tool with unfamiliar UI structure may reduce success rates.
Adaptive Agency aims to close that gap. The goal is for the model to reason about a new software interface on first contact, infer how it works, and execute tasks without prior training data for that exact application.
If H Company delivers this, it would address a major limitation for enterprise computer use automation.
Conclusion
Holo3 raises the bar for desktop computer use models. Its 78.85% OSWorld-Verified score puts it ahead of Claude and GPT-based alternatives on complex multi-step tasks. The Holo3-35B-A3B free tier and Apache 2.0 open weights also make it practical for developers to test without upfront cost.
The implementation pattern is simple:
screenshot -> API request -> action -> execute -> repeat
The hard part is making that loop safe and reliable. Apidog helps by validating response structures, mocking API responses during development, and running test scenarios before automation touches live systems.
If you are building workflows that interact with desktop GUIs, test the Holo3 integration path early and validate it before production.
FAQ
What is Holo3?
Holo3 is a computer use AI model from H Company. It takes screenshots as input and returns actions such as clicks, keystrokes, and scrolls to complete tasks on a desktop or browser. It scores 78.85% on OSWorld-Verified.
Is Holo3 open source?
The smaller Holo3-35B-A3B variant is open-weight under Apache 2.0 and downloadable from HuggingFace. The flagship Holo3-122B-A10B is API-only. Both are available through H Company's inference API, with a free tier for the 35B model.
How does OSWorld work?
OSWorld tests AI agents on real computer tasks such as web navigation, file management, and cross-app workflows. Success is verified by checking the actual system state after the agent runs, not by evaluating generated text.
How does Holo3 compare to Claude Computer Use?
Holo3-122B scores higher on OSWorld-Verified: 78.85% versus approximately 65% for Claude. H Company also states it is cheaper per task. Claude Computer Use remains a practical option for teams already using the Anthropic API.
Can I run Holo3 locally?
Yes, if you use Holo3-35B-A3B. The weights are available on HuggingFace under Apache 2.0. The 122B model is inference API only.
What are the main use cases for computer use APIs?
Common use cases include legacy system automation, cross-app data workflows, web app regression testing without brittle selectors, competitive intelligence scraping, and desktop workflows that currently require manual human interaction.
How do I test my Holo3 API integration?
Use Apidog to create the request, configure environment-based authentication, add response validation assertions, mock the API during development, and chain requests into test scenarios.
What is Adaptive Agency?
Adaptive Agency is H Company's roadmap direction for models that can navigate enterprise software they have never seen before. The goal is for the agent to learn the UI structure in real time instead of relying only on prior training data.

