Computer use is still the most engaging demo in AI today. Typing a request in plain language and then watching an agent independently navigate an obscure website, test code end-to-end, or complete a form feels like witnessing an automated future. Still, something about it feels off to me.
Computer and browser use demos highlight a limitation in how we currently design AI interactions. There are undeniable engineering achievements behind browser and computer use models, workflows, and tools. But these workflows feel inherently retrofitted. They're an attempt to force a fundamentally new paradigm into a legacy form factor. At the risk of sounding dramatic, it feels like the modern equivalent of tapping out text messages in Morse code.
For three years, typed chat has served as our default gateway to AI. I'm not convinced that it will be the long-term interface for advanced intelligence. Using voice dictation to control computer use agents feels like a step in the right direction. But forcing AI agents to mimic human mouse movements, keystrokes, and inputs is a temporary bridge rather than the final destination.
As the AI industry moves beyond initial adoption, new approaches to interface and interaction design should be a focus. As great as the demos are, I hope next year's frontier demos are wildly different from what we have today.