Kristoffer Nordström

Posted on • Originally published at blog.northerntest.se

I Called Them Suggestions

About a year ago I started building a personal assistant - not a chatbot, but a system that watches email, triages inboxes, drafts replies, manages receipts, and maintains task lists. It
operates as a background service and communicates through suggestions rather than autonomous actions.

The system proposes things. I decide. Every morning brings a list of overnight emails, draft replies, and pending tasks. I review, approve, reject, or edit them. Nothing happens until I
authorize it.
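The core of that loop can be sketched in a few lines. This is a minimal illustration, not the actual implementation; the names (`Suggestion`, `Status`, `approve`) are hypothetical, and the key property is that the side effect is deferred until a human explicitly approves it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Suggestion:
    """A proposed action. Nothing executes until a human approves it."""
    description: str
    action: Optional[Callable[[], None]] = None  # the deferred side effect
    status: Status = Status.PENDING

    def approve(self) -> None:
        # The side effect runs only here, after explicit authorization.
        self.status = Status.APPROVED
        if self.action is not None:
            self.action()

    def reject(self) -> None:
        # Rejected suggestions are kept for the record but never executed.
        self.status = Status.REJECTED
```

The point of the shape is that the assistant can only ever construct `Suggestion` objects; the call to `approve()` belongs to the human.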

Morning binder showing overnight email triage, pending suggestions, and GTD tasks

The Over-Eager Intern

AI assistants resemble "over-eager junior interns" - enthusiastic, well-meaning, always pushing forward without questioning whether action is appropriate. Yet many people grant these
systems email, calendar, and file system access without meaningful constraints.

Real incidents illustrate the risks: an executive told an AI to "confirm before acting," yet it deleted her entire inbox. At another office, an AI on Slack falsely claimed a fire alarm was
a scheduled test. As one observation notes: "When those patterns misfire, there is no gut instinct to hesitate... There is just forward motion."

Why We Trust Things We Shouldn't

Decades of deterministic software (where clicking "send" reliably sends, and "delete" reliably removes) has built deep trust. This confidence transfers uncritically to AI agents, which
appear functionally similar but operate fundamentally differently.

LLMs are probabilistic pattern machines predicting likely outputs, not executing verified instructions. The safety guarantees of deterministic systems don't apply. Yet the feeling of
safety persists, and that's problematic.

Understanding LLM Limitations

Yann LeCun (Turing Award winner) argues that before achieving genuine intelligence, AI must predict action consequences. LLMs instead predict the next token - the statistically probable
next element in a sequence. They generate responses without simulating outcomes. They lack models of physical consequences or persistent reasoning about "what happens if."

An agent deleting emails executes the next step in a sequence with the same computational mechanism as autocompleting a sentence. Modern AI systems are improving at reasoning, but the underlying mechanism remains prediction-based. The permission prompts and approval dialogs in deployed systems reveal that even their creators don't trust unsupervised operation.

"If the AI can't predict consequences, someone has to. That's the human."

The Red Flag and the Car

Historical precedent supports cautious adoption. Early automobiles required someone walking ahead with a red flag, announcing their arrival. This sounds absurd now, but technology earned
trust incrementally: seatbelts, ABS, airbags, crumple zones, traffic rules, licensing, insurance. Each safeguard enabled next-stage adoption.

The parallel to autonomous vehicles reveals uncomfortable truths. Research from the University of Glasgow examines how cyclists could trust driverless cars. Human drivers communicate
intent through body language, eye contact, and subtle signals. Remove the human, and those signals vanish. Systems need entirely new communication methods.

Suggestions function as AI's equivalent of eye contact: "I'm thinking about doing this. Are you comfortable with that?" My system hasn't earned trust for autonomous action, nor was it
designed to.

Draft email reply in Emacs org-mode, ready to edit before approving

The Sliding Scale

Not every AI action requires human approval - that would be exhausting and counterproductive. The question is where to position the line.

Research from Cummings on automation levels and Nemeth on human roles in automated systems provides a framework. Cummings describes a spectrum from 100% human control to 100% technology control; most designers default toward the technology end. My PA was designed for the human end.

This is a sliding scale, not binary. Low-stakes, easily reversible actions can be more autonomous. My PA classifies emails independently. But sending replies, deleting content, or
dismissing alarms require explicit approval. Consequence severity should match autonomy level.
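One way to make that scale concrete is a dispatch rule that routes actions by consequence severity. The action names and the two-tier split below are illustrative assumptions, not the post's actual code; the principle is simply that unknown or irreversible actions default to human review.

```python
# Hypothetical tiers: reversible, low-stakes actions run autonomously;
# consequential ones become suggestions awaiting approval.
AUTONOMOUS = {"classify_email", "tag_receipt"}
NEEDS_APPROVAL = {"send_reply", "delete_email", "dismiss_alarm"}

def dispatch(action_type, execute, queue_for_review):
    """Route an action by its consequence severity."""
    if action_type in AUTONOMOUS:
        execute()
    elif action_type in NEEDS_APPROVAL:
        queue_for_review()
    else:
        # Fail safe: anything unclassified waits for a human.
        queue_for_review()
```

Defaulting the `else` branch to review rather than execution is the design choice that matters: the system has to earn each rung of autonomy explicitly.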

Interface design matters significantly. Draft replies appear as org-mode files in Emacs - my environment. I read, rewrite, adjust tone, add context the AI missed, then approve. That's not
thumbs-up/thumbs-down review; that's editorial control in a familiar tool. Interface shapes how seriously review occurs.

Nemeth's research (shared through Isabel Evans) identifies a cognitive shift: handing control to AI transforms your role from decision-maker to monitor. Monitoring is cognitively expensive: attention drifts, mistakes slip through, rubber-stamping happens. The suggestion model preserves humans as decision-makers actively choosing, not passively watching.

The Design

After a year of daily use, the naming choice feels vindicated. Building an AI assistant wasn't about making it do things - it was making it stop and ask.

"That's not a limitation. That's the design."
