DEV Community

Wanda

Posted on • Originally published at apidog.com

Holo3: The Best Computer Use Model?

TL;DR

H Company launched Holo3 on March 31, 2026—a mixture-of-experts model scoring 78.85% on OSWorld-Verified, setting a new high on the leading desktop computer use benchmark. It outperforms GPT-5.4 and Opus 4.6 at a lower cost. The API is live, and the 35B variant is open-weight on HuggingFace under Apache 2.0.


The computer use gap most developers haven't solved

You've automated your APIs and streamlined your CI/CD pipeline, but legacy enterprise software, old desktop apps, and multi-step workflows spanning several UIs are still hard to automate. RPA tools like UiPath and Automation Anywhere typically rely on screen-coordinate scripts that break whenever the UI changes, so manual work has often been the fallback.

Computer use AI models solve this by interpreting screenshots and issuing GUI actions—click, type, scroll—allowing automation of any GUI, regardless of API support. Holo3, released by H Company, is currently the most capable public model for these tasks.

💡 If you’re automating workflows or testing pipelines involving desktop software, Holo3’s API is worth integrating. Below, learn exactly how to connect Holo3 calls into your workflow using Apidog.

What is Holo3?

Holo3 is a computer use model: provide a screenshot and a task description, and it returns a set of actions (clicks, keystrokes, scrolls) to execute on that UI. Repeat the process—screenshot, task, action—until the workflow completes.

Holo3 UI

Variants:

  • Holo3-122B-A10B: Flagship, 122B parameters (10B active). Hosted API only. Top performance.
  • Holo3-35B-A3B: 35B parameters (3B active). Open-weight on HuggingFace (Apache 2.0). Free API tier and self-hostable.

The MoE architecture means only a subset of parameters are used per token, making Holo3 significantly cheaper to run than parameter count alone suggests. H Company claims Holo3-122B-A10B is less expensive per task than GPT-5.4 and Opus 4.6.

OSWorld-Verified: what the benchmark actually measures

OSWorld-Verified is the main benchmark for AI computer use. Unlike text-only benchmarks, OSWorld evaluates execution: the agent must complete real desktop tasks, and success is determined by the post-task system state.

Task coverage:

  • Single-app tasks (e.g., open a file, fill a form)
  • Cross-app workflows (e.g., extract data from PDF, update spreadsheet, send email)
  • Long-horizon, multi-app sequences requiring context retention

Holo3-122B-A10B scores 78.85% on OSWorld-Verified. For reference, earlier agents hovered around 40%, and the current top models from Anthropic and OpenAI sit in the 60–65% range.
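To make "success is determined by the post-task system state" concrete, here is a toy grader in that style. This is purely illustrative: the file name and header are invented, and OSWorld's real checkers are per-task scripts, but the shape is the same — inspect the machine after the run, not the agent's transcript.

```python
from pathlib import Path


def verify_invoice_task(workdir: Path) -> bool:
    """Toy OSWorld-style grader: pass iff the agent left report.csv with the expected header."""
    report = workdir / "report.csv"
    if not report.exists():
        return False
    return report.read_text().startswith("invoice_id,")
```

A grader like this passes no matter how the agent produced the file — by clicking through a spreadsheet app or by any other route — which is exactly what makes the benchmark execution-based.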

Benchmark comparison

H Company’s internal benchmarks show Holo3 excels in multi-app workflows—tasks that require reasoning and action across several applications at once.

How Holo3 was trained: the Agentic Learning Flywheel

Unlike most models trained on static demos, H Company uses a continuous loop called the Agentic Learning Flywheel:

  1. Synthetic Navigation Data: Human and AI-generated instructions create navigation scenarios.
  2. Out-of-Domain Augmentation: Programmatic extensions cover unexpected UI states and edge cases.
  3. Curated Reinforcement Learning: Each example is filtered and used in RL to directly maximize task completion.

Training data comes from the Synthetic Environment Factory: coding agents build full enterprise web applications from scenario specs, creating realistic, verifiable training environments.

This approach enables Holo3 to outperform much larger base models on benchmark tasks.

How to call the Holo3 API

The Holo3 API uses a standard screenshot-action loop. Here’s how to implement it:

1. Set up authentication

# H Company Inference API base URL
https://api.hcompany.ai/v1

# Headers
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

Get your API key at hcompany.ai/holo-models-api. The free tier covers Holo3-35B-A3B.

2. Send a screenshot with a task

import base64
import httpx
import pyautogui

# Capture a screenshot
screenshot = pyautogui.screenshot()
screenshot.save("/tmp/screen.png")

with open("/tmp/screen.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = httpx.post(
    "https://api.hcompany.ai/v1/computer-use",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "holo3-122b-a10b",
        "task": "Open the invoice folder and find the most recent PDF",
        "screenshot": image_b64,
        "screen_width": 1920,
        "screen_height": 1080
    }
)

action = response.json()
print(action)

3. Parse and execute the action

API responses are structured actions to execute:

{
  "action_type": "click",
  "coordinate": [245, 380],
  "reasoning": "The invoice folder icon is visible at this position"
}

Action types: click, double_click, right_click, type, key, scroll, screenshot_request (model needs a fresh view), and task_complete.
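A minimal dispatcher for these action types might look like the sketch below. The top-level schema follows the example in step 3; field names like `text`, `key`, and `amount` are assumptions, and the `backend` parameter defaults to pyautogui but can be swapped for a fake object in tests:

```python
def execute_action(action: dict, backend=None) -> None:
    """Dispatch one structured Holo3 action to a GUI backend."""
    if backend is None:
        # Default to pyautogui; an injected backend makes the mapping testable headless.
        import pyautogui
        backend = pyautogui

    kind = action["action_type"]
    if kind == "click":
        backend.click(*action["coordinate"])
    elif kind == "double_click":
        backend.doubleClick(*action["coordinate"])
    elif kind == "right_click":
        backend.rightClick(*action["coordinate"])
    elif kind == "type":
        backend.write(action["text"])
    elif kind == "key":
        backend.press(action["key"])
    elif kind == "scroll":
        backend.scroll(action["amount"])
    elif kind in ("screenshot_request", "task_complete"):
        pass  # control-flow actions: handled by the outer loop, not the GUI
    else:
        raise ValueError(f"unknown action type: {kind!r}")
```

With pyautogui installed, `execute_action(action)` drives the real screen; its `click`, `doubleClick`, `rightClick`, `write`, `press`, and `scroll` functions match the backend calls above.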

4. Loop until completion

def run_computer_use_task(task: str, max_steps: int = 20):
    for step in range(max_steps):
        screenshot = capture_screen()
        # The response body is the action dict itself (see step 3)
        action = call_holo3_api(task, screenshot)

        if action["action_type"] == "task_complete":
            print(f"Done in {step + 1} steps")
            return action.get("result")

        execute_action(action)

    raise TimeoutError("Task not completed within step limit")
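The loop leans on helper functions. Minimal versions of `capture_screen` and `call_holo3_api`, reusing the request shape from step 2 (the endpoint and field names are taken from that example, not from official reference docs), might look like:

```python
import base64
import io

API_URL = "https://api.hcompany.ai/v1/computer-use"
API_KEY = "YOUR_API_KEY"


def encode_png(png_bytes: bytes) -> str:
    """Base64-encode raw PNG bytes for the JSON payload."""
    return base64.b64encode(png_bytes).decode()


def capture_screen() -> str:
    """Grab the screen and return it base64-encoded, without touching disk."""
    import pyautogui  # imported lazily so the pure helper above stays testable headless

    buf = io.BytesIO()
    pyautogui.screenshot().save(buf, format="PNG")
    return encode_png(buf.getvalue())


def call_holo3_api(task: str, screenshot_b64: str) -> dict:
    """POST one screenshot + task and return the parsed action dict (see step 3)."""
    import httpx  # lazy for the same reason

    resp = httpx.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "holo3-122b-a10b",
            "task": task,
            "screenshot": screenshot_b64,
            "screen_width": 1920,
            "screen_height": 1080,
        },
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()
```

Encoding in memory with `io.BytesIO` avoids the temp-file round trip from step 2; `raise_for_status()` turns auth or quota errors into exceptions instead of letting the loop execute a garbage action.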

Testing Holo3 API calls with Apidog

To ensure robust integration, use Apidog:

  • Import the endpoint: In Apidog, create a new HTTP request to https://api.hcompany.ai/v1/computer-use. Set the Authorization header as an environment variable.
  • Set up request validation: Use Apidog's test assertions to validate response structure:
// In Apidog's post-response script
pm.test("Action type is valid", () => {
    const validActions = ["click", "double_click", "right_click", "type", "key", "scroll", "task_complete", "screenshot_request"];
    pm.expect(validActions).to.include(pm.response.json().action_type);
});

pm.test("Coordinates are within screen bounds", () => {
    const action = pm.response.json();
    if (action.coordinate) {
        pm.expect(action.coordinate[0]).to.be.within(0, 1920);
        pm.expect(action.coordinate[1]).to.be.within(0, 1080);
    }
});
  • Mock the API during development: Use Smart Mock to simulate Holo3 responses, saving credits and enabling parallel frontend/backend dev.
  • Run test scenarios: Chain multiple Holo3 requests in a Test Scenario to simulate and validate full multi-step workflows before running on live systems.
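The same action-type and bounds checks can also run locally in plain Python before an action touches a live machine (a sketch assuming the top-level schema from step 3):

```python
VALID_ACTIONS = {
    "click", "double_click", "right_click", "type",
    "key", "scroll", "screenshot_request", "task_complete",
}


def validate_action(action: dict, width: int = 1920, height: int = 1080) -> list:
    """Return a list of problems with a Holo3 action dict (empty list = valid)."""
    problems = []
    if action.get("action_type") not in VALID_ACTIONS:
        problems.append(f"unknown action_type: {action.get('action_type')!r}")
    coord = action.get("coordinate")
    if coord is not None:
        x, y = coord
        if not (0 <= x <= width and 0 <= y <= height):
            problems.append(f"coordinate {coord} outside {width}x{height} screen")
    return problems
```

Calling `validate_action(action)` right after each API response and refusing to execute when the list is non-empty is a cheap guard against a mis-grounded click landing somewhere destructive.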

Holo3 vs Claude Computer Use vs OpenAI Operator

| | Holo3-122B | Holo3-35B | Claude Computer Use | OpenAI Operator |
|---|---|---|---|---|
| OSWorld-Verified | 78.85% | ~55% (est.) | ~65% | ~62% |
| API access | Yes | Yes (free tier) | Yes | Yes |
| Open weights | No | Yes (Apache 2.0) | No | No |
| Self-hostable | No | Yes | No | No |
| Cost vs GPT-5.4 | Lower | Much lower | Comparable | GPT-5.4 pricing |
| Best for | Production enterprise | Dev/testing/OSS | Anthropic ecosystem | OpenAI ecosystem |

Choose based on your stack:

  • Holo3-122B: For maximum accuracy on complex workflows; cost is secondary to reliability.
  • Holo3-35B: For development, testing, open source, or if you want to self-host.
  • Claude Computer Use: If you’re already in the Anthropic ecosystem.
  • OpenAI Operator: If you’re using GPT-5.4 and want single-vendor integration.

Enterprise use cases

Holo3 enables automation for workflows with no clean API-based solution:

  • Legacy system data entry: Automate data entry/extraction in old ERP and CRM systems without APIs.
  • Cross-platform reconciliation: Pull data from PDFs, check against spreadsheets, update dashboards—end-to-end.
  • Web app regression testing: Use Holo3 for plain-language task automation, avoiding brittle Selenium selectors.
  • Competitive intelligence: Browse and extract structured data from sites that block typical scraping.

Holo3 performs well across e-commerce, business software, and collaboration tools, and especially on multi-app workflows.

What’s next: Adaptive Agency

H Company is developing Adaptive Agency—models that can navigate and learn new, bespoke enterprise software in real time, beyond what they've seen in training. The goal is on-the-fly reasoning about unfamiliar software structures and workflows.

If delivered, this would close the largest remaining gap in computer use AI for enterprise deployment.

Conclusion

Holo3 sets a new standard for desktop automation with 78.85% on OSWorld-Verified, outperforming Claude and GPT-based models on complex tasks. The free tier and open weights for Holo3-35B-A3B make it easy for developers to start.

The integration workflow is simple: screenshot, POST to API, execute action, repeat. Apidog streamlines this process—validating responses, mocking APIs during development, and running test scenarios before production.

If you’re building desktop GUI automation, start with Apidog and verify your Holo3 integration before deploying to production.

FAQ

What is Holo3?

Holo3 is a computer use AI model from H Company that takes screenshots as input and returns actions (clicks, keystrokes, scrolls) to complete tasks on a desktop or browser. It scores 78.85% on the OSWorld-Verified benchmark.

Is Holo3 open source?

Holo3-35B-A3B is open-weight under Apache 2.0 on HuggingFace. Holo3-122B-A10B is API-only. Both are available through H Company's inference API; 35B has a free tier.

How does the OSWorld benchmark work?

OSWorld tests AI agents on real computer tasks—web navigation, file management, cross-app workflows. Success is verified by checking the post-task system state. Tasks range from single-app to complex multi-app sequences.

How does Holo3 compare to Claude Computer Use?

Holo3-122B scores higher on OSWorld-Verified (78.85% vs ~65% for Claude) and is cheaper per task. Claude remains a solid choice if you’re already using Anthropic APIs.

Can I run Holo3 locally?

Yes, with Holo3-35B-A3B (Apache 2.0, HuggingFace). The 122B model is API-only.

What are main use cases for computer use APIs?

Legacy system automation (no REST API), cross-app workflows, web app regression testing without brittle selectors, competitive intelligence scraping, and any desktop workflow requiring manual intervention.

How do I test my Holo3 API integration?

Use Apidog to import endpoints, set up response validation, mock the API, and chain requests into test scenarios.

What is "Adaptive Agency" in Holo3's roadmap?

H Company is developing models that can navigate and reason about software they’ve never encountered, learning UI structure in real time—removing the final barrier for enterprise-scale computer use AI.
