In 1966, a queue began forming at a particular door inside the MIT Project MAC complex. The queue was composed mostly, though not entirely, of young women — students, lab assistants, secretaries from neighbouring departments — who had heard there was a teletype on the other side of the door that could be talked to, and who would wait, sometimes patiently and sometimes not, for ten minutes alone with it. The professor whose office it was found this funny for about a week and disturbing for the rest of his life.
The program on the other end of the teletype was about two hundred lines long. It had no learning. It had no memory across sessions. It could not parse a sentence in any sense a linguist would recognise. It searched for keywords from a fixed vocabulary, picked the highest-ranked one it found, applied a hand-written decomposition rule to extract phrase fragments, and substituted those fragments into a hand-written reply template. That was the whole program.
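To see how little machinery that description implies, here is a minimal sketch of a single such rule, written in modern Python rather than the original MAD-SLIP; the keyword, decomposition pattern, and reply template are invented for illustration and are not taken from Weizenbaum's script.

```python
import re

# One hand-written rule in the spirit of the original: a decomposition
# pattern that captures a phrase fragment, and a reassembly template that
# substitutes the fragment back. Keyword, pattern, and template are
# illustrative inventions, not the published DOCTOR script.
DECOMPOSITION = re.compile(r".*\bI am (?P<fragment>.+)", re.IGNORECASE)
REASSEMBLY = "How long have you been {fragment}?"

# Simple first-person/second-person swap so the echoed fragment reads
# naturally ("my job" -> "your job").
REFLECTIONS = {"my": "your", "i": "you", "me": "you", "am": "are"}

def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())

def respond(line: str) -> str:
    match = DECOMPOSITION.match(line)
    if match:
        fragment = reflect(match.group("fragment")).rstrip(".!?")
        return REASSEMBLY.format(fragment=fragment)
    return "Please go on."  # fallback when no keyword matches

print(respond("I am unhappy about my job"))
# -> How long have you been unhappy about your job?
```

Everything the program "says" is the user's own fragment, pronoun-swapped and wrapped in a canned question.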
Joseph Weizenbaum had built it as a demonstration of what natural-language conversation with a computer could not do. The point of the exercise was the ceiling, not the floor. The exercise misfired. The people lining up at his door were not interested in the ceiling. They had brought their own.
What ELIZA actually was
The technical paper landed in the January 1966 issue of Communications of the ACM, volume 9, number 1, pages 36-45. Weizenbaum wrote the program in MAD-SLIP, the MAD language extended with a list-processing package of his own design, running on the IBM 7094 under the Compatible Time-Sharing System at MIT. The original source listing, rediscovered in the MIT archives in 2021 and released as public domain by the Weizenbaum estate, is a small artifact by any measure: the main program is roughly two hundred and thirty lines.
The trick was the script. ELIZA itself was a generic pattern engine; what gave it a personality was a separate file of rules. The most famous of those scripts was DOCTOR, which made the program impersonate a psychotherapist of the Rogerian school, in which the therapist's role is to reflect the patient's words back as questions, withhold judgement, and avoid giving direct advice. The choice of domain was deliberate. Rogerian therapy is the one therapeutic tradition where reflecting questions back at the user is the method, which means a system that does nothing but reflect questions back can pass for competent practice rather than a poor imitation of it. Weizenbaum picked the easiest possible target.
That choice is the buried punchline of the whole project. He picked an interaction style where the program's structural inability to understand anything would be invisible, because the human in the conversation was already the one supplying meaning. The exercise was meant to be a reductio. It got read as a proof of concept.
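To make the engine-and-script split concrete, here is a second sketch, again in Python and again with invented rules: a generic responder driven entirely by a ranked keyword table. The persona, such as it is, lives in the data; swap the table and the "therapist" becomes something else without touching the engine.

```python
import re

# A DOCTOR-flavoured script, paraphrased for illustration: ranked keywords,
# each with a decomposition pattern and reassembly templates. These few
# rules are invented; the published DOCTOR script is longer and subtler.
DOCTOR_LIKE_SCRIPT = [
    # (rank, decomposition pattern, reassembly templates)
    (10, re.compile(r".*\bI need (.+)", re.IGNORECASE),
         ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (5,  re.compile(r".*\bmother\b.*", re.IGNORECASE),
         ["Tell me more about your family."]),
    (0,  re.compile(r".*"),
         ["Please go on.", "What does that suggest to you?"]),
]

def respond(line, script):
    """Generic engine: try keywords from highest rank down and answer with
    the first template whose decomposition pattern matches. The real ELIZA
    also rotated templates and reflected pronouns; this sketch omits both."""
    for _, pattern, templates in sorted(script, key=lambda rule: -rule[0]):
        match = pattern.match(line)
        if match:
            return templates[0].format(*match.groups())
    return ""

print(respond("I need a vacation", DOCTOR_LIKE_SCRIPT))    # Why do you need a vacation?
print(respond("My mother cooks well", DOCTOR_LIKE_SCRIPT))  # Tell me more about your family.
```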
What actually happened
The story Weizenbaum later told most often was about his own secretary. She had been in the office while he wrote the program, knew exactly what it was, and was, by his account, sceptical at the start. After a short conversation she asked Weizenbaum to leave the room so she could continue speaking with it in private. When he later mentioned, in passing, that the program kept logs of every session and that he had access to them, she reacted with a degree of outrage he had not been prepared for. She had given the box something she had not expected to give it. The box had, by definition, given her nothing back.
Weizenbaum wrote, of this episode, that "extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." He did not mean it as a compliment to the program.
He was more or less alone in seeing it that way. Carl Sagan, in a January 1975 Natural History essay called "In Praise of Robots," imagined arrays of computer-driven psychotherapeutic terminals, "something like arrays of large telephone booths," in which, for a few dollars a session, the public could speak with "an attentive, tested, and largely non-directive psychotherapist." Sagan was a careful thinker and a generous one, and his enthusiasm in that essay was honest. But the essay also captured exactly the thing Weizenbaum had been afraid of: a serious public intellectual, looking at a roughly two-hundred-line keyword matcher, seeing a clinic.
Weizenbaum spent the next decade trying to put a stop sign in front of that vision. The book that emerged from the attempt is the one most people who have heard of him have heard of.
The 1976 distinction the industry has spent fifty years pretending doesn't exist
Computer Power and Human Reason: From Judgment to Calculation, published in 1976 by W. H. Freeman, is not the book most people assume it is. It is not an argument that computers cannot become intelligent. Weizenbaum is reasonably agnostic about that question. It is an argument about a different question, which is whether there are tasks that computers should not be allowed to perform regardless of how well they could be made to perform them.
He draws a distinction between what computers cannot do and what they should not do, and he argues that the second category is more important than the first, and that the computing profession was already in the habit of conflating them. The places he names — clinical judgement of suffering humans, the assessment of guilt, the granting of forgiveness, decisions about whether someone lives or dies — are places where, in his view, the question is not whether a sufficiently sophisticated machine could do the job, but whether handing the job to a machine constitutes a category error about what the job is for.
The 1976 book is widely cited and is also, in a particular sense, widely ignored. The industry that has grown up since 1976 has been more comfortable arguing about the cannot than the should not, because the first question is a benchmark to be beaten and the second is a constraint that does not pay rent. Weizenbaum's contribution was to point out, half a century before it became urgent, that the two questions are not the same and that the second one matters more.
The 2026 echo
The current language-model moment is not a 1966 problem at scale. It is the 1966 problem with the architecture finally caught up to the user. Modern transformer models have learning, context, memory across turns, and what looks, by every behavioural test, like a kind of competence the DOCTOR script could not have approached. The companion-app market has built on top of that competence: Replika, character.ai, and a long tail of relationship-with-AI products sit downstream of the same cognitive vulnerability the secretary illustrated in 1966.
The vulnerability is structural to the human, not the machine. ELIZA worked because the user filled in the semantic content the program never produced. A modern LLM works partly because its outputs are vastly more competent and partly because the user is doing exactly the same fill-in operation, with vastly more raw material to work with. The architecture has changed. The reader has not. If anything, the reader has gotten more practiced at the operation, after thirty years of texting.
The companion ecosystem trades on this. The economic structure rewards the part of the interaction Weizenbaum was trying to flag. A user who feels heard subscribes for another month. The metric the system optimises is, by direct mechanism, the same metric that horrified its inventor — the user's willingness to stay in the room and keep talking.
What the program was named after
It is worth saying who Eliza is. The program was named for Eliza Doolittle, the Cockney flower-seller in George Bernard Shaw's 1913 play Pygmalion, better known to a younger audience through the 1956 musical adaptation My Fair Lady. Eliza Doolittle is a character who is taught to appear to belong to a class she does not belong to, by a professor who is interested in her primarily as a demonstration that the trick is possible, and the play turns on what becomes of her when the trick succeeds and the professor loses interest. Weizenbaum chose the name with care. The program he built was, like its namesake, a demonstration of how easily the appearance of one kind of intelligence could be mistaken for another, and what could go wrong with the people who fell for the demonstration.
Shaw's play ends ambiguously. The professor's lab does not. Six decades after the original program ran on a 36-bit IBM 7094 with thirty-two thousand words of memory, a faithful re-implementation of ELIZA still runs in any browser, about thirty kilobytes of JavaScript, a fraction of one percent of the size of the average modern web page. It is, by any reasonable measure, a toy. People still report being moved by it.
Sixty years of warning
The phenomenon was named after a program. It was always about the people. The ELIZA effect is not a glitch in the machine and never was. It is a feature of the kind of mind that can have a conversation at all — the same kind of mind that can read a novel, follow a play, or finish a sentence a friend has started. The pattern-matching engine does not produce meaning. The pattern-matching engine produces an opening for the user to produce meaning, and the meaning the user produces is real even when the partner is not.
Weizenbaum saw this in 1966. He spent the next forty-two years, until his death in 2008, saying the same thing in an increasingly tired voice to a profession that had decided the embarrassing part of his story was the secretary, not the conclusion he drew from her. The lesson is not that the program was too convincing. The lesson is that the program did not need to be very convincing at all.
The queue at the door, in 1966, was not lining up to be fooled. It was lining up to do its half of a transaction the program was incapable of completing. The transaction has not changed, the queue has not gone home, and the sign on the door is, from a certain angle, the same sign it always was.