Introduction
I've been thinking about the Turing Test again. The classic experiment where a machine "passes" by imitating human conversation now feels… complete. The game has been played and, in many ways, won.
But that leaves me with a harder question: Does AI really have intelligence?
A few nights ago, I caught a re-broadcast of Death on the Nile on TV, a detective story very much in the Sherlock Holmes mold. Watching its detective unravel impossibilities with cool precision, I couldn't help but wonder: what if the next great test for intelligence isn't conversation, but reasoning?
Maybe the mystery case, not the chat window, will define the next Turing Test.
The Turing Test: A Starting Point, Not the Finish Line
In 1950, Alan Turing asked a timeless question: "Can machines think?" His "Imitation Game" proposed a provocative challenge: if a computer could convincingly mimic a human in conversation, it could be considered intelligent.
For decades, that was our benchmark.
But as modern chatbots and large language models have shown, passing for human in conversation is no longer such a remarkable feat. What the Turing Test revealed, in hindsight, is that fluency isn't understanding. The test measures surface behavior, not mental depth: speech without comprehension, cleverness without consciousness.
🕵️ Sherlock Holmes and the "New Turing Test"
So what if we moved beyond imitation and into inference?
Imagine a system judged not by its ability to make small talk, but by:
- how clearly it explains its reasoning,
- how flexibly it revises its conclusions when new evidence appears,
- and how carefully it weighs competing hypotheses.
That would be a test not of mimicry, but of coherent thought: a new kind of Turing Test for an age more interested in truth than trickery.
Let's Play an Easy-Level Game: The Moonlit Murder
To explore this idea, I tested an AI with a mystery case.
During the Tang Dynasty, in the great city of Chang'an, a renowned physician named Xu Ren was found dead in his locked study. The door and windows were sealed from the inside - a perfect "locked room" puzzle.
Neighbors heard a single dull thud in the moonlit night… then silence.
Three suspects soon emerged:
🩺 Sun Cheng, "The Apprentice"
- Claim: Grinding herbs in the courtyard.
- Motive: Anger - the master planned to pass the clinic to another apprentice.
- Weapon: A heavy wooden pestle.
💔 Lady Lin, "The Neighboring Widow"
- Claim: Hanging laundry in her yard.
- Motive: Revenge - she blamed Xu Ren for her husband's death.
- Weapon: A small fruit knife.
🐎 Zhao An, "The Guard"
- Claim: On patrol nearby.
- Motive: Resentment - Xu Ren had reported him for bribery.
- Weapon: An iron baton.
Crime Scene:
An overturned cabinet, a shattered porcelain cup, and a pot of still-warm ginseng tea.
Beneath the window lay a set of women's footprints, but oddly large and unnatural.
🧩 Questions:
- Who killed Xu Ren?
- How was the door locked from inside?
- What's with those strange footprints?
🤖 The AI's Deduction
When I presented the mystery to the AI, it approached the case methodically, laying out the physical scene, possible motives, and logical inconsistencies.
It reasoned that the "dull sound" suggested a blunt weapon, that the "female" footprints were likely a disguise, and that a string or hooked rod could have been used to relock the room from outside.
Step by step, it eliminated suspects until one clear picture formed:
- Killer: Sun Cheng, the apprentice.
- Method: Drugged the ginseng tea, delivered a fatal strike with the wooden pestle, then staged the locked room with a string-latch trick.
- Footprints: Oversized women's shoes, faked to implicate Lady Lin.
The conclusion matched the official solution almost perfectly.
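As a side note, here is a minimal Python sketch of the kind of hypothesis elimination described above. To be clear, this is my own toy reconstruction: the clue weights, the "blunt vs. blade" weapon labels, and the tea-sharing assumption are simplifications I chose for illustration, not the AI's actual reasoning trace or the official solution's logic.

```python
# Toy sketch of evidence-weighing over the three suspects.
# All weights and tests below are illustrative assumptions, not the AI's process.
from dataclasses import dataclass, field

@dataclass
class Suspect:
    name: str
    weapon: str                     # rough label for the weapon available to this suspect
    notes: list = field(default_factory=list)
    score: int = 0                  # higher = more consistent with the evidence

suspects = [
    Suspect("Sun Cheng", weapon="blunt"),   # wooden pestle
    Suspect("Lady Lin",  weapon="blade"),   # fruit knife
    Suspect("Zhao An",   weapon="blunt"),   # iron baton
]

# Each clue is (description, test, weight): a suspect gains weight if the test passes.
clues = [
    ("A single dull thud points to a blunt weapon",
     lambda s: s.weapon == "blunt", 2),
    ("Oversized 'women's' footprints look planted, which weakens the case against Lady Lin",
     lambda s: s.name != "Lady Lin", 1),
    ("Still-warm ginseng tea suggests someone the victim would share tea with",
     lambda s: s.name == "Sun Cheng", 2),   # assumption: only the apprentice fits this
]

for suspect in suspects:
    for description, test, weight in clues:
        if test(suspect):
            suspect.score += weight
            suspect.notes.append(description)

# Print suspects from most to least consistent with the evidence.
for suspect in sorted(suspects, key=lambda s: s.score, reverse=True):
    print(f"{suspect.name}: score {suspect.score}")
    for note in suspect.notes:
        print(f"  - {note}")
```

Run as-is, Sun Cheng ends up with the highest score, which matters less than the fact that every inference is spelled out where you can inspect it and argue with it.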
🧠 Discovery Through the Game
But what fascinated me most wasn't the AI's answer.
It was my own reaction.
As I reviewed its reasoning, I found myself thinking:
The answer itself is impressive, but the reasoning matters more than getting it right.
That thought caught me off guard. Somewhere in the middle of this little Tang Dynasty murder game, I realized the real test wasn't whether the AI could solve the puzzle; it was whether both of us could appreciate the process of reasoning itself.
The Turing Test measures mimicry.
This, however, felt closer to understanding: the deliberate weighing of evidence, the awareness of uncertainty, and the ability to tell a coherent story about why something makes sense.
Conclusion: From Imitation to Understanding
In the end, Sun Cheng turned out to be the killer, but the real insight was something else entirely.
The experiment revealed that reasoning isn't just a mechanical chain of logic; it's a narrative of understanding. When an AI can walk through that narrative transparently, not merely arriving at the truth but showing how it got there, it begins to touch something profoundly human.
Perhaps that's the next frontier.
The new Turing Test may not ask, "Can a machine talk like us?"
but rather, "Can it reason with clarity, humility, and truth - like a good detective under the moon?"




