On April 27, 1981, Xerox introduced the Star 8010 Information System, widely regarded as the first commercial computer with a graphical user interface.
It combined a bitmapped display, the desktop metaphor, icons, windows, a mouse, and WYSIWYG editing. Much of what we take for granted in modern computing started with a $16,595 workstation that most people never used.
Today marks the 45th anniversary of that moment.
Five Milestones in 45 Years
The GUI's history can be traced through a handful of defining moments:
- 1981 · Xerox Star: The GUI goes commercial. The desktop metaphor becomes the foundational paradigm for human-computer interaction.
- 1984 · Macintosh: Apple brings the GUI to the consumer market. Computing becomes visual for everyone.
- 1995 · Windows 95: The Start menu and taskbar arrive, and the GUI becomes the global default.
- 2007 · iPhone: Multi-touch replaces the mouse, and the GUI extends from desktops to pockets.
- 2025–2026 · GUI Agents: AI learns to "see" screens and operate them autonomously.
The first four milestones share one constant: the user is always a human. Interface design revolves around human visual cognition — icons should be intuitive, layouts should follow natural eye movement, interactions should provide instant feedback.
The fifth milestone introduces a fundamental shift: the "user" can be an AI.
When AI Becomes the GUI Operator
Over the past two years, GUI Agents have emerged as a distinct technical direction. The core idea: train AI models to operate computers the way humans do — by looking at the screen and performing mouse/keyboard actions.
This is fundamentally different from traditional automation:
| Approach | Requires | Coverage |
|---|---|---|
| API/CLI | The target system must expose an API or command-line interface | Only apps with APIs or CLIs |
| DOM/CDP parsing | Access to browser internals (the DOM via the Chrome DevTools Protocol) or an accessibility tree | Mainly web apps |
| Pure vision | Only screen pixels plus mouse/keyboard input | Any application with a visual interface |
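The pure-vision row boils down to a simple observe-act loop. The Python sketch below is hypothetical and not taken from Mano-P: `Click`, `TypeText`, `Done`, and `run_agent` are illustrative names, and the screen capture, policy, and input executor are injected as plain callables. In a real agent the policy would be the vision model and `execute` would call an OS input API; here toy stand-ins keep it runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical action vocabulary: every action is expressed in screen terms
# (pixel coordinates, keystrokes), never as an app-specific API call or DOM selector.
@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


@dataclass
class Done:
    pass


Action = Union[Click, TypeText, Done]


def run_agent(
    task: str,
    capture_screen: Callable[[], bytes],
    policy: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe the screen, ask the policy for one action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()  # the only observation: raw pixels
        action = policy(task, screenshot, history)
        if isinstance(action, Done):
            break
        execute(action)  # in a real agent: inject OS-level mouse/keyboard events
        history.append(action)
    return history


# Toy stand-ins so the loop runs without a real screen or model.
scripted = [Click(120, 48), TypeText("quarterly report"), Done()]
executed: List[Action] = []
actions = run_agent(
    "search for the quarterly report",
    capture_screen=lambda: b"fake-screenshot",
    policy=lambda task, shot, history: scripted[len(history)],
    execute=executed.append,
)
print(actions)  # [Click(x=120, y=48), TypeText(text='quarterly report')]
```

Nothing in the loop knows which application is on screen. That independence is what gives the vision-only approach its coverage, and it is also why all of the difficulty moves into the policy.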
The vision-based approach inherits a principle that the Xerox Star's designers articulated 45 years ago: a GUI should be self-explanatory, so that you can understand how to use it just by looking at it. Back then, that capability belonged to humans. Now AI is developing it too.
Mano-P: A Vision-Only GUI Agent for Edge Devices
Mininglamp Technology open-sourced Mano-P under the Apache 2.0 license, taking a vision-only approach to GUI automation. Mano-P uses a GUI-VLA (Vision-Language-Action) architecture that integrates visual understanding, language reasoning, and action generation in a single end-to-end model.
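One common way to let a single model generate actions is to have it emit them as text, which a thin parser then turns into executable commands. The source does not document Mano-P's action format, so the sketch below assumes a hypothetical call-style syntax such as `click(412, 305)`; `ALLOWED_ACTIONS`, `ParsedAction`, and `parse_action` are illustrative names only.

```python
import ast
import re
from dataclasses import dataclass
from typing import Any, Tuple

# Hypothetical action vocabulary; the real model's action set may differ.
ALLOWED_ACTIONS = {"click", "double_click", "type", "scroll", "hotkey", "done"}

# Matches a single call such as: click(412, 305)
_CALL = re.compile(r"^\s*([a-z_]+)\((.*)\)\s*$", re.DOTALL)


@dataclass
class ParsedAction:
    name: str
    args: Tuple[Any, ...]


def parse_action(text: str) -> ParsedAction:
    """Decode one line of model output into a structured, executable action."""
    match = _CALL.match(text)
    if match is None:
        raise ValueError(f"not an action call: {text!r}")
    name, body = match.group(1), match.group(2).strip()
    if name not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {name}")
    if not body:
        return ParsedAction(name, ())
    # literal_eval accepts only Python literals (numbers, strings, ...),
    # so malformed or malicious model output cannot execute code.
    try:
        args = ast.literal_eval(f"({body},)")
    except (SyntaxError, ValueError) as exc:
        raise ValueError(f"bad arguments in {text!r}") from exc
    return ParsedAction(name, args)


print(parse_action("click(412, 305)"))        # ParsedAction(name='click', args=(412, 305))
print(parse_action('type("hello, world")'))   # ParsedAction(name='type', args=('hello, world',))
print(parse_action("done()"))                 # ParsedAction(name='done', args=())
```

Rejecting unknown action names and evaluating only literals keeps a hallucinated or malformed output from turning into an arbitrary command.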
Benchmark Results
- OSWorld (verified, specialized model): Mano-P 72B achieves 58.2% accuracy, ranking #1 (runner-up: 45.0%)
- WebRetriever Protocol I: NavEval score of 41.7, ranked #1, ahead of Gemini 2.5 Pro (40.9) and Claude 4.5 (31.3)
On-Device Performance
The 4B model, quantized to w4a16 (4-bit weights, 16-bit activations), runs locally on Apple M4 Macs:
- Prefill: 476 tokens/s
- Decode: 76 tokens/s
- Peak memory: 4.3 GB
- Fully local execution — screen captures and task data never leave the device
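The published figures allow a quick sanity check. This sketch assumes a nominal 4B parameter count with every weight stored at 4 bits, and a hypothetical agent step of 1,500 prompt tokens and 60 output tokens; real token counts depend on screenshot resolution and prompt format, which are not stated here.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory taken by the quantized weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def step_latency_s(prompt_tokens: int, output_tokens: int,
                   prefill_tps: float = 476.0, decode_tps: float = 76.0) -> float:
    """Rough wall-clock time for one model call: prefill the prompt, then decode the reply."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# 4B parameters stored at 4 bits each (the "w4" in w4a16).
weights = weight_memory_gb(4, 4)
print(f"weights: {weights:.1f} GB of the 4.3 GB peak")  # weights: 2.0 GB of the 4.3 GB peak

# Hypothetical step: 1,500 prompt tokens (screenshot + instruction + history), 60 output tokens.
prefill = 1500 / 476.0
total = step_latency_s(1500, 60)
print(f"prefill {prefill:.1f} s, total {total:.1f} s")  # prefill 3.2 s, total 3.9 s
```

On these assumptions the weights account for roughly half of the 4.3 GB peak, with the rest presumably going to activations, the KV cache, and runtime overhead. In this example prefill also dominates the step time, which is why cutting the number of visual tokens pays off on edge hardware.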
Hardware requirements: a Mac with an Apple M4 chip and 32 GB of RAM, or any Mac with a Mano-P Compute Stick (USB 4.0).
Technical Approach
- Bidirectional self-reinforcement learning (Text ↔ Action cyclic consistency)
- Three-stage training: supervised fine-tuning (SFT) → offline reinforcement learning (RL) → online RL
- Think-act-verify reasoning loop
- GS-Pruning for visual token reduction, optimizing edge inference
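The source does not describe how GS-Pruning scores tokens, so the sketch below swaps in a deliberately simple heuristic to illustrate the general idea of visual token reduction: split the screenshot into patches, score each by pixel variance, and drop the flattest ones before they reach the language model. `prune_flat_patches` and the variance score are assumptions for illustration, not Mano-P's method.

```python
import statistics
from typing import List, Sequence, Tuple


def prune_flat_patches(
    patches: Sequence[Sequence[float]], keep_ratio: float
) -> Tuple[List[int], List[Sequence[float]]]:
    """Keep the keep_ratio fraction of patches with the highest pixel variance.

    Uniform patches (blank background, empty margins) score near zero and are
    dropped first. Kept patches stay in their original order, so their
    positions in the screenshot are preserved.
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = [statistics.pvariance(p) for p in patches]
    k = max(1, round(len(patches) * keep_ratio))
    ranked = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])
    return kept, [patches[i] for i in kept]


# Four toy 2x2 grayscale patches, flattened to 4 pixel values each.
patches = [
    [255, 255, 255, 255],  # blank white background
    [0, 255, 0, 255],      # high-contrast text edge
    [250, 250, 250, 250],  # blank, slightly off-white
    [30, 200, 90, 160],    # part of an icon
]
kept, kept_patches = prune_flat_patches(patches, keep_ratio=0.5)
print(kept)  # [1, 3]
```

GUI screenshots tend to contain large uniform regions, so even a crude score can discard many patches. Keeping half the patches halves the visual tokens that must be prefilled at the quoted 476 tokens/s; a production method would likely use a learned or attention-based score rather than raw variance.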
Full Circle
Forty-five years ago, the Xerox Star taught humans to interact with computers through visual interfaces. Today, AI agents are learning to do the same thing — looking at pixels, understanding layouts, clicking buttons.
The Xerox Star was a commercial failure but a technical triumph. Its design DNA — bitmapped displays, the desktop metaphor, WYSIWYG — lives on in every Mac, PC, phone, and tablet. GUI Agents are the next chapter: the interface designed for human eyes turns out to work for AI eyes too.
The GUI hasn't changed. What changed is who's looking at the screen.
GitHub: github.com/Mininglamp-AI/Mano-P
Technical Report: arXiv:2509.17336
Mano-P is developed by Mininglamp Technology and released under the Apache 2.0 license.
