Preecha

Posted on Jun 14

Holo3: The Best Computer Use Model?

TL;DR

H Company launched Holo3 on March 31, 2026. It is a mixture-of-experts computer use model that scores 78.85% on OSWorld-Verified, the highest score recorded on that desktop computer use benchmark. According to H Company, it beats GPT-5.4 and Opus 4.6 at a fraction of the cost. The API is live, and the 35B variant is open-weight on HuggingFace under Apache 2.0.


The computer use gap most developers haven't solved

You may already have API automation and a reliable CI/CD pipeline. But some workflows still break automation:

Legacy enterprise software with no API
Desktop apps that predate REST
Multi-step workflows across multiple UIs
Internal tools where selectors and screen layouts change often

Traditional RPA tools such as UiPath and Automation Anywhere often solve this with screen-coordinate scripts. Those scripts are brittle: when the UI moves, the automation breaks.

Computer use AI uses a different loop. The model sees a screenshot, decides the next action, and returns an instruction such as click, type, scroll, or press a key. That makes it useful for GUI workflows where no API exists.

Holo3, released March 31, 2026 by Paris-based H Company, is currently the highest-scoring publicly available model on OSWorld-Verified, the main benchmark for this task class.

If you build automation workflows or testing pipelines that touch desktop software, Holo3's API is worth evaluating. If you use Apidog to design and test APIs, you can also use it to validate Holo3 requests, mock responses, and run repeatable test scenarios before connecting automation to a live desktop.

What is Holo3?

Holo3 is a computer use model. You provide:

A screenshot of a desktop or browser
A natural-language task
Screen dimensions

The model returns the next action to execute on that screen. Your agent then performs the action, captures a new screenshot, and repeats until the task is complete.

H Company ships two variants:

Holo3-122B-A10B — flagship model. 122B total parameters, 10B active parameters using sparse MoE. Hosted API only at hcompany.ai/holo-models-api. Sets the current benchmark record.
Holo3-35B-A3B — 35B total parameters, 3B active. Open-weight on HuggingFace under Apache 2.0. Available on H Company's inference API free tier and self-hostable.

The MoE, or mixture-of-experts, architecture means only a subset of parameters is active per token. That lowers serving cost compared with dense models of similar total size. H Company states Holo3-122B-A10B costs less than GPT-5.4 and Opus 4.6 on a per-task basis.
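To make the sparse-activation point concrete, the share of parameters used per token can be worked out from the figures above. This is only arithmetic on the stated parameter counts; it says nothing about real latency or price, which also depend on hardware, routing, and batching.

```python
# Parameter counts (in billions) as stated for the two Holo3 variants.
VARIANTS = {
    "Holo3-122B-A10B": {"total_b": 122, "active_b": 10},
    "Holo3-35B-A3B": {"total_b": 35, "active_b": 3},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a sparse MoE model's parameters that is active per token."""
    return active_b / total_b


for name, params in VARIANTS.items():
    fraction = active_fraction(params["total_b"], params["active_b"])
    print(f"{name}: {fraction:.1%} of parameters active per token")
# Holo3-122B-A10B: 8.2% of parameters active per token
# Holo3-35B-A3B: 8.6% of parameters active per token
```

In both variants, fewer than one in ten parameters takes part in any single token, which is where the serving-cost advantage over a dense model of the same total size comes from.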

OSWorld-Verified: what the benchmark measures

OSWorld-Verified evaluates whether an AI agent can complete real tasks on a real computer.

Instead of scoring generated text, OSWorld checks the final system state after the agent runs. That makes it closer to end-to-end automation testing than a standard language benchmark.
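A toy illustration of state-based checking, as opposed to trusting what the agent says, might look like the sketch below. The file name, the expected header, and both "agents" are hypothetical; this is not OSWorld's actual harness, only the idea behind it.

```python
import tempfile
from pathlib import Path


def check_final_state(workdir: Path) -> bool:
    """Pass only if report.csv exists and starts with the expected header.

    The agent's own claims are never consulted, only the files it left behind.
    """
    report = workdir / "report.csv"
    if not report.is_file():
        return False
    lines = report.read_text(encoding="utf-8").splitlines()
    return bool(lines) and lines[0] == "invoice_id,total"


def honest_agent(workdir: Path) -> str:
    """Stand-in agent that really produces the requested file."""
    (workdir / "report.csv").write_text(
        "invoice_id,total\nA-1,120.50\n", encoding="utf-8"
    )
    return "done"


def bluffing_agent(workdir: Path) -> str:
    """Stand-in agent that claims success but changes nothing."""
    return "done"


def evaluate(agent) -> bool:
    """Run an agent in a fresh directory and grade the resulting state."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        agent(workdir)  # The return value is deliberately ignored.
        return check_final_state(workdir)


print(evaluate(honest_agent), evaluate(bluffing_agent))  # True False
```

Both agents report "done", but only the one that changed the system state passes.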

Tasks include:

Single-app operations, such as opening a file, filling a form, or copying spreadsheet data
Cross-app workflows, such as reading a value from a PDF, updating a spreadsheet, and sending an email
Long-horizon multi-app tasks requiring the agent to preserve context across several systems

Holo3-122B-A10B scores 78.85% on OSWorld-Verified. Scores above 40% were considered state-of-the-art until recently. Previous leading models from Anthropic and OpenAI were in the 60–65% range.

The biggest difference appears in harder workflows. H Company's internal H Corporate Benchmarks include 486 tasks across E-commerce, Business software, Collaboration, and Multi-App workflows. Holo3 pulls ahead most on multi-app tasks, where the agent must coordinate data across several applications.

How Holo3 was trained: the Agentic Learning Flywheel

Most computer use models are trained on static demonstrations. H Company describes Holo3's training loop as the Agentic Learning Flywheel.

The loop includes:

Synthetic Navigation Data: Human and generated instructions create scenario-specific navigation examples.
Out-of-Domain Augmentation: Scenarios are extended programmatically to cover unexpected UI states and edge cases.
Curated Reinforcement Learning: Data samples are filtered and used in an RL pipeline to optimize for task completion.

The training data comes from the Synthetic Environment Factory, where coding agents build complete enterprise web applications from scenario specs. These environments include verifiable tasks and end-to-end validation scripts, so the model trains on business-style workflows rather than toy examples.

The result is that Holo3 outperforms base Qwen3.5 models with larger parameter counts on the same benchmark tasks. The gap comes from both architecture and training methodology.

How to call the Holo3 API

The Holo3 API follows a screenshot-action loop:

Capture the current screen
Send the screenshot and task to the API
Execute the returned action
Capture the next screen
Repeat until the task is complete

1. Set up authentication

# H Company Inference API base URL
https://api.hcompany.ai/v1

# Headers
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

Get your API key at hcompany.ai/holo-models-api. The free tier covers Holo3-35B-A3B.
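One way to keep the key out of source code is to build the headers from an environment variable. The sketch below uses only the standard library. The variable name HCOMPANY_API_KEY and the helper build_headers are assumptions made for illustration, not something the API requires.

```python
import os
from typing import Mapping

BASE_URL = "https://api.hcompany.ai/v1"  # Base URL as given above.


def build_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build request headers, reading the key from the environment.

    HCOMPANY_API_KEY is an assumed variable name; pick any name you like.
    """
    api_key = env.get("HCOMPANY_API_KEY")
    if not api_key:
        raise RuntimeError("Set HCOMPANY_API_KEY before calling the API")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Calling build_headers() once at startup makes a missing key fail immediately, rather than partway through a task.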

2. Send a screenshot with a task

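import base64
import io
import os

import httpx
import pyautogui

# Capture the screen and encode it as a base64 PNG, entirely in memory
screenshot = pyautogui.screenshot()
buffer = io.BytesIO()
screenshot.save(buffer, format="PNG")
image_b64 = base64.b64encode(buffer.getvalue()).decode()

# Report the dimensions of the image that is actually sent
screen_width, screen_height = screenshot.size

response = httpx.post(
    "https://api.hcompany.ai/v1/computer-use",
    headers={
        # Read the key from the environment (the variable name is our choice)
        "Authorization": f"Bearer {os.environ['HCOMPANY_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "holo3-122b-a10b",
        "task": "Open the invoice folder and find the most recent PDF",
        "screenshot": image_b64,
        "screen_width": screen_width,
        "screen_height": screen_height,
    },
    timeout=60.0,  # Inference can take longer than httpx's 5-second default
)
response.raise_for_status()  # Fail loudly on 4xx/5xx instead of parsing an error body

result = response.json()
print(result)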

3. Parse and execute the action

The API returns structured actions that your local agent executes on the host machine.

Example response:

{
  "action": {
    "action_type": "click",
    "coordinate": [245, 380],
    "reasoning": "The invoice folder icon is visible at this position"
  }
}

Supported action types include:

click
double_click
right_click
type
key
scroll
screenshot_request
task_complete
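A dispatcher for these action types could be sketched as follows. The backend methods are named after pyautogui's functions (click, doubleClick, rightClick, write, press, scroll), so the pyautogui module itself can be passed in as the backend. The payload fields "text", "key", and "amount" are assumptions, since only "coordinate" appears in the example response. Treating screenshot_request as "do nothing and capture again" is likewise an assumption.

```python
from typing import Any, Protocol


class InputBackend(Protocol):
    """Methods named after pyautogui's functions of the same name."""

    def click(self, x: int, y: int) -> Any: ...
    def doubleClick(self, x: int, y: int) -> Any: ...
    def rightClick(self, x: int, y: int) -> Any: ...
    def write(self, message: str) -> Any: ...
    def press(self, keys: str) -> Any: ...
    def scroll(self, clicks: int) -> Any: ...


class RecordingBackend:
    """Dry-run backend: records calls instead of moving the mouse."""

    def __init__(self) -> None:
        self.calls: list = []

    def __getattr__(self, name: str):
        # Any method name becomes a recorder, e.g. backend.click(1, 2).
        return lambda *args: self.calls.append((name, *args))


CLICK_METHODS = {
    "click": "click",
    "double_click": "doubleClick",
    "right_click": "rightClick",
}


def execute_action(action: dict, backend: InputBackend) -> bool:
    """Run one action; return False when the loop should stop."""
    kind = action["action_type"]
    if kind in CLICK_METHODS:
        x, y = action["coordinate"]
        getattr(backend, CLICK_METHODS[kind])(x, y)
    elif kind == "type":
        backend.write(action["text"])  # "text" is an assumed field name
    elif kind == "key":
        backend.press(action["key"])  # "key" is an assumed field name
    elif kind == "scroll":
        backend.scroll(action["amount"])  # "amount" is an assumed field name
    elif kind == "screenshot_request":
        pass  # Assumed meaning: nothing to run, just capture a new screenshot
    elif kind == "task_complete":
        return False
    else:
        raise ValueError(f"Unsupported action type: {kind!r}")
    return True


backend = RecordingBackend()
execute_action({"action_type": "click", "coordinate": [245, 380]}, backend)
print(backend.calls)  # [('click', 245, 380)]
```

Injecting the backend keeps the dispatcher testable on machines without a display. Swapping RecordingBackend for pyautogui is what turns a dry run into real input.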

4. Loop until completion

A minimal loop looks like this:

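def run_computer_use_task(task: str, max_steps: int = 20):
    """Drive the screenshot-action loop until the model reports completion.

    capture_screen, call_holo3_api, and execute_action are helpers you supply.
    """
    for step in range(max_steps):
        screenshot = capture_screen()
        response = call_holo3_api(task, screenshot)
        action = response["action"]

        if action["action_type"] == "task_complete":
            print(f"Done in {step + 1} steps")
            # "result" may be absent, so avoid a KeyError on success
            return response.get("result")

        execute_action(action)

    raise TimeoutError(f"Task not completed within {max_steps} steps")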

In production, add safeguards:

Limit the maximum number of steps
Validate coordinates before executing clicks
Log screenshots and actions for replay
Require approval before destructive actions
Run inside a sandboxed VM when testing
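Two of these safeguards, coordinate validation and an approval gate, can be sketched in a few lines. The helper names and the list of risky words are hypothetical. The keyword check is a crude heuristic on the model's "reasoning" text, shown only to mark where a real policy would plug in.

```python
CLICK_TYPES = {"click", "double_click", "right_click"}
RISKY_WORDS = ("delete", "remove", "format", "send", "pay")  # illustrative only


def validate_action(action: dict, width: int, height: int) -> list:
    """Return a list of problems; an empty list means the action looks sane."""
    problems = []
    kind = action.get("action_type")
    coord = action.get("coordinate")

    if kind in CLICK_TYPES and coord is None:
        problems.append(f"{kind} action has no coordinate")

    if coord is not None:
        if not (isinstance(coord, (list, tuple)) and len(coord) == 2):
            problems.append("coordinate must be an [x, y] pair")
        else:
            x, y = coord
            # Valid pixel indices run from 0 to dimension - 1.
            if not (0 <= x < width and 0 <= y < height):
                problems.append(f"coordinate {coord} is outside {width}x{height}")
    return problems


def needs_approval(action: dict) -> bool:
    """Flag actions whose stated reasoning mentions a risky verb."""
    reasoning = str(action.get("reasoning", "")).lower()
    return any(word in reasoning for word in RISKY_WORDS)
```

A loop would call validate_action before every execute step and pause for a human whenever needs_approval returns True.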

Testing Holo3 API calls with Apidog

Once your code can call the Holo3 API, validate the integration before running it on a real machine. Apidog helps with request setup, response assertions, mocking, and test scenarios.

1. Import the endpoint

Create a new HTTP request in Apidog:

POST https://api.hcompany.ai/v1/computer-use

Add the authorization header, reading the API key from an environment variable:

Authorization: Bearer {{HCOMPANY_API_KEY}}
Content-Type: application/json

This keeps API keys out of your request definitions.

2. Define the request body

Use a representative request body:

{
  "model": "holo3-122b-a10b",
  "task": "Open the invoice folder and find the most recent PDF",
  "screenshot": "{{screenshot_base64}}",
  "screen_width": 1920,
  "screen_height": 1080
}

Store screenshot_base64 as an environment or scenario variable when running tests.

3. Validate the response shape

Use Apidog post-response scripts to assert that the API returns an executable action.

pm.test("Action type is valid", () => {
  const validActions = [
    "click",
    "double_click",
    "right_click",
    "type",
    "key",
    "scroll",
    "task_complete",
    "screenshot_request"
  ];

  const json = pm.response.json();
  pm.expect(validActions).to.include(json.action.action_type);
});

Validate coordinates before your automation executes them:

pm.test("Coordinates are within screen bounds", () => {
  const action = pm.response.json().action;

  if (action.coordinate) {
    // Valid pixel indices run from 0 to dimension - 1
    pm.expect(action.coordinate[0]).to.be.within(0, 1919);
    pm.expect(action.coordinate[1]).to.be.within(0, 1079);
  }
});

You can also assert required fields by action type:

pm.test("Click actions include coordinates", () => {
  const action = pm.response.json().action;

  if (["click", "double_click", "right_click"].includes(action.action_type)) {
    pm.expect(action.coordinate).to.be.an("array");
    pm.expect(action.coordinate).to.have.length(2);
  }
});

4. Mock Holo3 responses during development

Use Apidog's Smart Mock to return realistic Holo3-style responses without calling the live API.

Example mock response:

{
  "action": {
    "action_type": "click",
    "coordinate": [245, 380],
    "reasoning": "The invoice folder icon is visible at this position"
  }
}

Mocking is useful when:

You want to avoid using API credits during integration work
Your orchestration layer is not ready to control a real desktop
Frontend and backend teams need stable test responses
You want deterministic test data for CI
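If you also want a mock that runs inside a plain test suite, a tiny local server built on Python's standard library can play the same role. This is a stand-in for illustration, not Apidog's Smart Mock. It returns the example mock response above for every POST, regardless of path or body.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

MOCK_RESPONSE = {
    "action": {
        "action_type": "click",
        "coordinate": [245, 380],
        "reasoning": "The invoice folder icon is visible at this position",
    }
}


class MockHolo3Handler(BaseHTTPRequestHandler):
    """Answers every POST with the same canned action."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # Drain the request body; the mock ignores it.
        body = json.dumps(MOCK_RESPONSE).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # Keep test output quiet.


def fetch_mock_action(task: str) -> dict:
    """Start the mock on a free port, send one request, and shut it down."""
    server = HTTPServer(("127.0.0.1", 0), MockHolo3Handler)  # Port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request(
            "POST",
            "/v1/computer-use",
            body=json.dumps({"task": task}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        payload = json.loads(conn.getresponse().read())
        conn.close()
        return payload
    finally:
        server.shutdown()
        server.server_close()


print(fetch_mock_action("Open the invoice folder")["action"]["action_type"])  # click
```

Pointing your client's base URL at the local port instead of the live API gives deterministic responses in CI without spending credits.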

5. Run multi-step scenarios

Chain multiple Holo3 requests in an Apidog Test Scenario to simulate a full task loop:

Send screenshot 1
Assert the returned action
Store action data
Send screenshot 2
Assert the next action
Continue until task_complete

This helps you validate request/response handling before giving the agent access to a live environment.
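The same scenario can be rehearsed in plain Python by feeding the loop a scripted sequence of responses. In this sketch, run_scenario and scripted_api are hypothetical helpers. The response shape mirrors the mock response shown earlier, and the "text" field on the type action is an assumption.

```python
from typing import Callable


def scripted_api(actions: list) -> Callable[[str, bytes], dict]:
    """Build a fake API call that replays a fixed list of actions in order."""
    queue = iter(actions)

    def call_api(task: str, screenshot: bytes) -> dict:
        return {"action": next(queue)}

    return call_api


def run_scenario(
    task: str,
    call_api: Callable[[str, bytes], dict],
    capture_screen: Callable[[], bytes],
    execute_action: Callable[[dict], None],
    max_steps: int = 20,
) -> list:
    """Run the screenshot-action loop and return the actions that were executed."""
    executed = []
    for _ in range(max_steps):
        action = call_api(task, capture_screen())["action"]
        if action["action_type"] == "task_complete":
            return executed
        execute_action(action)
        executed.append(action)
    raise TimeoutError(f"Task not completed within {max_steps} steps")


script = [
    {"action_type": "click", "coordinate": [245, 380]},
    {"action_type": "type", "text": "invoice"},  # "text" is an assumed field
    {"action_type": "task_complete"},
]
done = run_scenario(
    "Find the latest invoice",
    scripted_api(script),
    capture_screen=lambda: b"",  # No real screenshot is needed for a replay
    execute_action=lambda action: None,  # Dry run: nothing touches the desktop
)
print(len(done))  # 2
```

Because every dependency is injected, the same run_scenario function can later be wired to the real API and a real executor without changing the loop.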

Holo3 vs Claude Computer Use vs OpenAI Operator

Capability	Holo3-122B	Holo3-35B	Claude Computer Use	OpenAI Operator
OSWorld-Verified	78.85%	~55% (est.)	~65%	~62%
API access	Yes	Yes, free tier	Yes	Yes
Open weights	No	Yes, Apache 2.0	No	No
Self-hostable	No	Yes	No	No
Cost vs GPT-5.4	Lower	Much lower	Comparable	GPT-5.4 pricing
Best for	Production enterprise	Dev, testing, OSS	Anthropic ecosystem	OpenAI ecosystem

The practical choice depends on your stack:

Use Holo3-122B if you need peak accuracy on complex multi-app workflows and that accuracy matters more than open weights or self-hosting.
Use Holo3-35B for development, testing, open-source projects, or self-hosting.
Use Claude Computer Use if your team is already committed to the Anthropic ecosystem.
Use OpenAI Operator if your team already uses GPT-5.4 and wants a single vendor relationship.

Enterprise use cases

Holo3 is most useful when there is no clean API-based solution.

Legacy system data entry

Older ERP and CRM systems often have no REST API. Holo3 can navigate the desktop UI and enter or extract data without requiring a modernization project.

Cross-platform reconciliation

Example workflow:

Open a PDF invoice
Extract a total
Compare it against an internal spreadsheet
Update a third-party dashboard
Send a confirmation message

This is difficult to automate with APIs when each system has a different interface or no integration support.

Regression testing for web apps

Instead of maintaining Selenium scripts tied to element IDs, you can point Holo3 at a staging environment with a plain-language task description.

This does not remove the need for deterministic tests, but it can reduce selector maintenance for UI flows where layout changes frequently.

Competitive intelligence

Holo3 can browse and extract structured data from websites where standard scraping is blocked or brittle.

H Company's H Corporate Benchmarks show strong results across E-commerce, Business software, Collaboration, and Multi-App workflows. The largest gap appears in Multi-App workflows, where the model has to reason across several applications without losing task state.

What's next: Adaptive Agency

H Company describes its next direction as Adaptive Agency: models that can navigate software they have not seen during training.

Current computer use models, including Holo3, still perform best on software patterns represented in their training environments. A custom internal tool with unfamiliar UI structure may reduce success rates.

Adaptive Agency aims to close that gap. The goal is for the model to reason about a new software interface on first contact, infer how it works, and execute tasks without prior training data for that exact application.

If H Company delivers this, it would address a major limitation for enterprise computer use automation.

Conclusion

Holo3 raises the bar for desktop computer use models. Its 78.85% OSWorld-Verified score puts it ahead of Claude and GPT-based alternatives on complex multi-step tasks. The Holo3-35B-A3B free tier and Apache 2.0 open weights also make it practical for developers to test without upfront cost.

The implementation pattern is simple:

screenshot -> API request -> action -> execute -> repeat

The hard part is making that loop safe and reliable. Apidog helps by validating response structures, mocking API responses during development, and running test scenarios before automation touches live systems.

If you are building workflows that interact with desktop GUIs, test the Holo3 integration path early and validate it before production.

FAQ

What is Holo3?

Holo3 is a computer use AI model from H Company. It takes screenshots as input and returns actions such as clicks, keystrokes, and scrolls to complete tasks on a desktop or browser. It scores 78.85% on OSWorld-Verified.

Is Holo3 open source?

The smaller Holo3-35B-A3B variant is open-weight under Apache 2.0 and downloadable from HuggingFace. The flagship Holo3-122B-A10B is API-only. Both are available through H Company's inference API, with a free tier for the 35B model.

How does OSWorld work?

OSWorld tests AI agents on real computer tasks such as web navigation, file management, and cross-app workflows. Success is verified by checking the actual system state after the agent runs, not by evaluating generated text.

How does Holo3 compare to Claude Computer Use?

Holo3-122B scores higher on OSWorld-Verified: 78.85% versus approximately 65% for Claude. H Company also states it is cheaper per task. Claude Computer Use remains a practical option for teams already using the Anthropic API.

Can I run Holo3 locally?

Yes, if you use Holo3-35B-A3B. The weights are available on HuggingFace under Apache 2.0. The 122B model is inference API only.

What are the main use cases for computer use APIs?

Common use cases include legacy system automation, cross-app data workflows, web app regression testing without brittle selectors, competitive intelligence scraping, and desktop workflows that currently require manual human interaction.

How do I test my Holo3 API integration?

Use Apidog to create the request, configure environment-based authentication, add response validation assertions, mock the API during development, and chain requests into test scenarios.

What is Adaptive Agency?

Adaptive Agency is H Company's roadmap direction for models that can navigate enterprise software they have never seen before. The goal is for the agent to learn the UI structure in real time instead of relying only on prior training data.
