On interfaces, accessibility, wicked problems, and the illusion that machines can write code for us.
The first time I used NotebookLM, I fed it an essay I'd been meaning to read for months: Vannevar Bush's "As We May Think," published in The Atlantic in 1945. I knew it was the essay that predicted the internet. What I didn't know was that it was about something much more fundamental.
Bush's central concern was that humanity generates knowledge faster than any individual can absorb it — that the collective output of science, culture, and experience had outgrown the tools we had for navigating it. He proposed a machine called the Memex to help: not a machine that thinks for us, but one that helps us think. A tool for making associative trails through the ever-expanding record of human knowledge, mirroring the way thought actually moves — by meaning, by context, not by alphabetical index.
The synchronicity wasn't lost on me. I was using an AI tool to navigate a piece of that very record — to absorb an essay I hadn't found the time to read. And the tool worked — it helped me absorb the essay faster, make connections, follow the thread. It was doing, in a small way, what Bush imagined eighty years ago: amplifying my thought, not replacing it.
But here's what stayed with me. Bush wasn't just predicting a technology. He was drawing a line. On one side: technology that extends what humans can do. On the other: technology that substitutes for what humans do. He was firmly, explicitly, on the side of amplification.
That line matters now more than ever. Because there is a quiet assumption that most people hold about software: that it is, fundamentally, a problem of logic. You define what you want, you write the rules, the machine executes them. Input, process, output. Deterministic. Clean.
This assumption is wrong. And it is the root of a much larger misunderstanding — one that shapes how we think about AI-generated code, about the role of developers, and about what it actually means to build something that people use.
I. The deterministic illusion
Software runs on deterministic machines. Given the same input, a CPU will produce the same output, every single time. This is a fact about execution, and it is what makes computers useful.
But somewhere along the way, we confused the nature of running software with the nature of creating it. Because the machine is deterministic, we assumed that building software must also be a deterministic process: define the requirements, translate them into code, compile, ship. A pipeline. A factory.
Fred Brooks dismantled this illusion in 1986. In "No Silver Bullet," he distinguished between two kinds of complexity. Accidental complexity is what we create ourselves — the friction of our tools, the limitations of our languages, the overhead of our processes. This is the complexity that better tools can eliminate. Essential complexity is something else entirely. It comes from the problem itself: from what the software needs to do, for whom, in what context, under what constraints. No tool can remove it, because it is not an artifact of the process — it is the substance of it.
Brooks's claim was radical and, forty years later, still widely misunderstood: the hard part of software is not writing code. The hard part is deciding what to build. And that decision is irreducibly human.
In 1992, Jack Reeves took this further with an essay that should be required reading for anyone who writes code. In "What Is Software Design?", he argued that source code is not the construction of software — it is the design. Compiling is construction: it's cheap, it's automatic, you press a button. But the act of writing code? That is a design activity. It is creative, contextual, full of trade-offs and judgment calls. It is, in the deepest sense, a human endeavor.
If code is design, then the developer is not a technician executing a blueprint. The developer is a designer making decisions that directly shape how people experience a product. Every line of code carries assumptions about who will use it, how, and why.
II. Interfaces are wicked problems
In 1973, Horst Rittel and Melvin Webber published a paper that gave vocabulary to something designers and planners had always felt but could never quite name. They called it "wicked problems" — not wicked in the moral sense, but in the sense of being deeply resistant to the kind of linear, analytical problem-solving that works so well in engineering and mathematics.
Wicked problems have no definitive formulation. They have no stopping rule — no signal that tells you the problem is solved. Their solutions are not true or false, only better or worse. Every attempt at a solution changes the problem itself. And they are essentially unique: what works in one context may fail in another.
User interfaces are wicked problems. Not metaphorically — structurally. Consider:
- There is no single correct interface for any given task. The "right" design depends on the user, their device, their abilities, their context, their mental model, their culture, their patience, their internet connection.
- Interfaces are never finished. Every new feature, every new user, every new device introduces new interactions, new edge cases, new failures.
- You cannot test your way to a perfect interface. You can only test your way to a less wrong one, because the space of human variability is infinite.
- Every design decision carries consequences that ripple outward. A color choice excludes people with low vision. A navigation pattern confuses people using screen readers. A gesture-based interaction locks out people with motor disabilities. A modal dialog that "works" for sighted users traps keyboard users in an invisible cage — or, conversely, lets their focus escape into a background that is supposed to be inert.
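The modal case in that last point can be made concrete. A minimal sketch, contrasting a div-based overlay with the native element (the class names and dialog content are illustrative, not from any particular codebase):

```html
<!-- A div-based "modal": it renders fine for sighted mouse users,
     but Tab moves focus into the supposedly inert page behind it,
     Esc does nothing, and nothing is announced as a dialog. -->
<div class="overlay">
  <div class="modal">
    <h2>Delete file?</h2>
    <button type="button">Cancel</button>
    <button type="button">Delete</button>
  </div>
</div>

<!-- The native element: showModal() makes the rest of the page
     inert, moves focus into the dialog, closes on Esc, and exposes
     a dialog role to assistive technology, all without extra script. -->
<dialog id="confirm">
  <h2>Delete file?</h2>
  <button type="button" autofocus>Cancel</button>
  <button type="button">Delete</button>
</dialog>
<script>
  // Illustrative: open the dialog modally.
  document.getElementById("confirm").showModal();
</script>
```

The first version is the invisible cage from the bullet above; the second is what the platform gives away for free.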
Richard Buchanan connected wicked problems to design thinking in 1992, arguing that what many people dismiss as "impossible" may actually be a failure of imagination — one that better design thinking can overcome. Don Norman, through decades of work from The Design of Everyday Things to Living with Complexity, showed that the complexity of our tools must match the complexity of our lives. Simplifying an interface beyond a certain point doesn't make it better — it makes it dishonest about the problem it's trying to solve.
The temptation, always, is to treat interfaces as tame problems. Define the requirements. Implement the spec. Check the boxes. Ship. But the users were never part of the spec, not really — not in all their unpredictable, embodied, diverse humanity.
III. HTML: the most human language in code
Here is something people rarely say about HTML: it is beautiful.
Not in the way that an elegant algorithm is beautiful, or a well-factored function. HTML is beautiful because it is a language of meaning. It is a declarative API — arguably the most successful one ever created — whose purpose is not to instruct a machine but to communicate intent.
When you write <nav>, you are not telling the browser to render a box. You are saying: this is navigation. When you write <button>, you are not just creating a clickable element — you are declaring that this element is interactive, that it can receive focus, that it responds to keyboard events, that it announces itself to assistive technology as something a person can press. A <button> carries decades of human expectations in six characters.
HTML is not a programming language in the classical sense. It has no conditionals, no loops, no variables. And yet it may be the most consequential language on the web, because it is the layer where machine logic meets human meaning. It is the interface between the accessibility tree and the person. Between the browser and the body.
And it is also, without question, the most widely written and most poorly written language on the web. Because it is forgiving. Because a <div> with an onclick looks the same as a <button> — to sighted users, on a good day, with a mouse. The damage is invisible until someone tries to navigate with a keyboard, or a screen reader, or a switch device. And by then, the developer has moved on.
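The difference is easy to see side by side. A sketch, where the handler name is a placeholder:

```html
<!-- Looks like a button, to sighted mouse users on a good day.
     Not focusable, not operable with Enter or Space, and announced
     to a screen reader as nothing in particular. -->
<div class="btn" onclick="save()">Save</div>

<!-- Six characters of tag name buy focusability, keyboard
     activation, and a "button" role for assistive technology. -->
<button type="button" onclick="save()">Save</button>
```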
This is the paradox: the language that carries the most human meaning is treated as though it carries the least. Because it "works." Because it renders. Because the tests pass.
IV. What the machines see
In 2024, Michael Fairchild at Microsoft built a tool to answer a simple question: when you ask an LLM to generate HTML for common UI components, how accessible is the result?
The answer, documented in the A11y LLM Eval project, is sobering. Without explicit accessibility instructions, most models score near zero. The average across all tested models is around 10%. Even the best-performing model only passes 41% of tests. And these are not obscure edge cases — they are fundamental patterns like forms, navigation, dialogs, and data tables.
An important caveat: these percentages do not mean "41% accessible according to WCAG." The tool evaluates code against axe-core (an automated accessibility scanner) plus a set of custom assertions per component. But automated tools can only catch a fraction of real accessibility barriers — perhaps 30-40% of WCAG criteria. Everything that requires human judgment — whether the reading order makes sense, whether an alt text is actually meaningful, whether a focus management pattern is appropriate for the context, whether the right ARIA design pattern was chosen — is beyond what any scanner can measure. So when we say "90% pass rate with good instructions," we mean 90% of the automatable tests pass. The full picture still requires a human who understands what accessible means.
Fairchild is explicit about this: these tests do not fully evaluate WCAG or guarantee accessible results. Manual testing remains essential.
With that caveat in mind, the results are still revealing. Why do models perform so poorly by default? Because approximately 95% of websites have accessibility issues. The training data is the web, and the web is broken. LLMs don't learn what HTML should be — they learn what HTML is, statistically. And statistically, it's <div> soup.
But Fairchild's research reveals something even more interesting. When you add explicit accessibility instructions — custom instruction files that tell the model to use semantic HTML, follow WCAG, ensure keyboard support — the automatable pass rates jump dramatically. A minimal instruction ("All output MUST be accessible") alone produced an 18 percentage point improvement. Detailed, expert-level guidance pushed some models from near-zero to over 90%.
Think about what this means. The models can generate code that passes automated checks. The knowledge is somewhere in the weights. But without a human explicitly asking for it — without someone who understands accessibility framing the problem — the model defaults to the statistical average. The inaccessible average. And even at its best, it still cannot evaluate the things that matter most: the human things, the contextual things, the wicked things.
This is not a bug. This is the fundamental nature of what LLMs do: they index. They find the most probable pattern given the input. Vannevar Bush warned us about this in 1945 — that the artificiality of indexing systems would always be a poor substitute for the associative, contextual, meaning-driven way humans actually think. LLMs are the most sophisticated indexing system ever built. They are extraordinary at finding what is common. They are structurally incapable of understanding what is right.
V. The old tension, resurfaced
There is a deeper reason why AI struggles with interfaces and accessibility, beyond training data. Interfaces are fundamentally human problems. They are wicked problems where context, intent, and user behavior all shape what the right solution looks like.
A modal is not a modal because of its markup. It is a modal because of what it means to the user in that moment: stop, pay attention, deal with this before you continue. A role="status" is not a technical annotation — it is a social contract: this information changed, and you should know, but I won't interrupt you. These are not pattern-matching problems. They are problems of human communication, encoded in HTML.
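That contract is visible in the markup itself. A sketch, using a hypothetical autosave indicator:

```html
<!-- A polite live region: when its text changes, screen readers
     announce the new content at the next natural pause, without
     moving focus or interrupting what the user is doing. -->
<p role="status">Draft saved</p>

<!-- The assertive counterpart: role="alert" interrupts immediately.
     This is the "stop, pay attention" contract a modal also makes. -->
<p role="alert">Connection lost. Changes are not being saved.</p>
```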
AI approaches this as a statistical exercise: given this prompt, what code is most likely? But accessibility — and interface design more broadly — requires understanding why a pattern exists, not just what it looks like. It requires knowing that a person using a screen reader will navigate by headings, not linearly. That a person with low vision needs more than a change in opacity to perceive a state change. That a person with a motor disability needs every interactive element to be reachable by keyboard. That these are not edge cases — they are the full spectrum of how humans meet technology.
Kat Holmes calls this the "mismatch" — the idea that disability is not in the person but in the gap between the person and the design. Sara Hendren goes further: nearly everything we build is assistive technology, designed to bridge the gap between body and world. The question is whether we acknowledge this or pretend the gap doesn't exist.
What AI brings to the table is not a new problem. It is an old tension, resurfaced with new urgency: the temptation to reduce a human problem to pure logic. Herbert Simon dreamed of this in The Sciences of the Artificial — a world where design could be formalized, optimized, computed. Fred Brooks answered him: the complexity of software is essential, not accidental. You cannot abstract it away without abstracting away its meaning.
The illusion is seductive. If software were truly deterministic — if interfaces were tame problems with optimal solutions — then of course machines could write them better than we can. They are faster, cheaper, tireless. But interfaces are not tame. They are wicked. They involve bodies, contexts, cultures, disabilities, emotions, expectations. They involve people. And people resist reduction to statistics.
VI. As we may code
I started this essay describing a small moment: using an AI tool to read a text about the limits of human processing. The tool helped. It genuinely did. It compressed time, surfaced connections, let me absorb an 80-year-old essay in a fraction of the time it would have taken otherwise. Bush would have recognized it as a Memex — a machine that extends human thought.
But the understanding that came from that essay — the realization that its argument about amplification versus substitution was directly relevant to my own work in accessibility and code — that was not something the tool provided. That was association. That was meaning. That was the intricate web of trails carried by the cells of a brain that has spent years thinking about HTML, about screen readers, about the gap between what we build and who we build it for.
Eighty years after Bush, we have tools that can generate code at extraordinary speed. They can produce a login form in seconds, a dashboard in minutes, an entire landing page before you finish your coffee. But speed is not understanding. And production is not design.
The question Bush asked is still open: does our technology amplify human thought, or does it replace it? With LLMs applied to code, we are in danger of choosing the second option — not by conscious decision, but by the quiet assumption that software is a deterministic problem that machines can solve. It isn't. It never has been.
The developers, designers, and accessibility specialists who understand this are not being replaced by AI. They are becoming more essential. Because someone needs to write the custom instructions that turn a 0% automated pass rate into a 90% one — and then do the manual work that covers everything automation can't. Someone needs to know that a <dialog> is not just a <div> with a backdrop. Someone needs to understand that the user on the other side of the screen might not see, might not hear, might not use a mouse, might not think the way the designer assumed — and that this is not an edge case but the human condition.
HTML is a declarative API of meaning. The most successful one ever built. It was designed to be read by machines and understood by people. It is, in its quiet way, a bridge between logic and humanity. And the decision of which element to use — <button> or <div>, <nav> or <div>, <dialog> or <div> — is never a technical decision. It is a decision about who gets to participate.
AI can help us write that code faster. But it cannot tell us what it means.
References
- Bush, V. (1945). "As We May Think." The Atlantic Monthly, July 1945.
- Rittel, H.W. & Webber, M.M. (1973). "Dilemmas in a General Theory of Planning." Policy Sciences, 4(2), 155–169.
- Brooks, F. (1986). "No Silver Bullet: Essence and Accident in Software Engineering." Proceedings of the IFIP Tenth World Computing Conference.
- Reeves, J.W. (1992). "What Is Software Design?" The C++ Journal.
- Buchanan, R. (1992). "Wicked Problems in Design Thinking." Design Issues, 8(2), 5–21.
- Norman, D. (2013). The Design of Everyday Things. Revised and expanded edition. Basic Books.
- Norman, D. (2010). Living with Complexity. MIT Press.
- Simon, H. (1996). The Sciences of the Artificial. 3rd edition. MIT Press.
- Holmes, K. (2018). Mismatch: How Inclusion Shapes Design. MIT Press.
- Hendren, S. (2020). What Can a Body Do? How We Meet the Built World. Riverhead Books.
- Fairchild, M. (2026). "Embedding Accessibility into AI-based Software Development." Tool: github.com/microsoft/a11y-llm-eval.