DEV Community

Mininglamp
Mininglamp

Posted on

Happy 45th Birthday, GUI. Meet Your New Power User.

On April 27, 1981, Xerox introduced the Star 8010 Information System — the first commercial computer with a graphical user interface.

Bitmapped display, desktop metaphor, icons, windows, mouse, WYSIWYG. Everything we take for granted about modern computing started with a $16,595 workstation that most people never used.

Today marks the 45th anniversary of that moment.

Five Milestones in 45 Years

The GUI's history can be traced through a handful of defining moments:

  • 1981 · Xerox Star: GUI is born. The desktop metaphor becomes the foundational paradigm for human-computer interaction.
  • 1984 · Macintosh: Apple brings GUI to the consumer market. Computing becomes visual for everyone.
  • 1995 · Windows 95: The Start menu and taskbar. GUI becomes the global default.
  • 2007 · iPhone: Multi-touch replaces the mouse. GUI extends from desktops to pockets.
  • 2025–2026 · GUI Agents: AI learns to "see" screens and operate them autonomously.

The first four milestones share one constant: the user is always a human. Interface design revolves around human visual cognition — icons should be intuitive, layouts should follow natural eye movement, interactions should provide instant feedback.

The fifth milestone introduces a fundamental shift: the "user" can be an AI.

When AI Becomes the GUI Operator

Over the past two years, GUI Agents have emerged as a distinct technical direction. The core idea: train AI models to operate computers the way humans do — by looking at the screen and performing mouse/keyboard actions.

This is fundamentally different from traditional automation:

Approach Dependency Coverage
API/CLI Target system must expose an API Only apps with APIs
DOM/CDP parsing Requires browser internals or accessible widget trees Primarily web apps
Pure vision None — works with any GUI Any application with a visual interface

The vision-based approach inherits the exact principle that Xerox Star's designers articulated 45 years ago: a GUI should be self-explanatory — you should be able to understand how to use it just by looking at it. Back then, that capability belonged to humans. Now AI is developing it too.

Mano-P: A Vision-Only GUI Agent for Edge Devices

Mininglamp Technology open-sourced Mano-P under the Apache 2.0 license, taking a vision-only approach to GUI automation. Mano-P uses a GUI-VLA (Vision-Language-Action) architecture that integrates visual understanding, language reasoning, and action generation in a single end-to-end model.

Benchmark Results

OSWorld Benchmark

  • OSWorld (verified, specialized model): Mano-P 72B achieves 58.2% accuracy, ranking #1 (runner-up: 45.0%)
  • WebRetriever Protocol I: 41.7 NavEval (ranked #1), surpassing Gemini 2.5 Pro (40.9) and Claude 4.5 (31.3)

On-Device Performance

The 4B quantized model (w4a16) runs locally on Apple M4 Macs:

  • Prefill: 476 tokens/s
  • Decode: 76 tokens/s
  • Peak memory: 4.3 GB
  • Fully local execution — screen captures and task data never leave the device

Hardware requirements: Mac with Apple M4 chip + 32GB RAM, or any Mac with a Mano-P Compute Stick (USB 4.0).

Technical Approach

  • Bidirectional self-reinforcement learning (Text ↔ Action cyclic consistency)
  • Three-stage training: SFT → Offline RL → Online RL
  • Think-act-verify reasoning loop
  • GS-Pruning for visual token reduction, optimizing edge inference

Full Circle

Forty-five years ago, the Xerox Star taught humans to interact with computers through visual interfaces. Today, AI agents are learning to do the same thing — looking at pixels, understanding layouts, clicking buttons.

The Xerox Star was a commercial failure but a technical triumph. Its design DNA — bitmapped displays, the desktop metaphor, WYSIWYG — lives on in every Mac, PC, phone, and tablet. GUI Agents are the next chapter: the interface designed for human eyes turns out to work for AI eyes too.

The GUI hasn't changed. What changed is who's looking at the screen.


GitHub: github.com/Mininglamp-AI/Mano-P
Technical Report: arXiv:2509.17336

Mano-P is developed by Mininglamp Technology and released under the Apache 2.0 license.

Top comments (0)