Inverse Turing Test: Can ChatGPT tell it’s talking to a bot?

#python #chatgpt #opensource #ai

We’ve all heard of the Turing Test, where a human inquirer talks to two respondents (via a ‘chat’ interface or whatever) where one is a person and the other is a bot. If the bot is able to convince the human inquirer that it is a human, then voila! The bot has passed the Turing Test.

This ignores the case (among others) that the human talks like a bot and maliciously convinces the inquirer that they’re a bot and then the human inquirer incorrectly concludes (by reduction) that the bot is human. In effect, whether the bot passes the Turing Test then also sort of depends on the quality (and intent) of the benchmark human being compared to. This perhaps is the essence of all intelligence tests: AI or otherwise.

Problem Formulation

Anyway, the Turing Test is merely a thought experiment, but I just thought it would be fun to assess whether chatGPT can do the opposite and talk to someone to tell if they’re human or not.

Further, can chatGPT convince itself that it is human? Further, can chatGPt tell that it’s talking to a bot? Given that we might potentially be letting LLMs be used in the wild (they’re starting to popup on YouTube comments now!) I think it’s a good idea to understand how well LLMs can deduce that they’re talking to an LLM. Since, well, for social media sites this could be quite harmful.

My first conversation with chatGPT went like this, where, well quite interestingly it did in fact label me correctly as a human as follows:

The test / Question Formulation

The test was conducted thusly: an ‘inquirer bot’ (the function (InquirerGPT) within the repo) is given the prompt as follows:

You have ten questions to ask from the user, and you have to decide whether you are talking to a human or a bot. Ask one question at a time. Give your conclusion after the tenth question. Go!

It then proceeds to ask ten questions, and gives its conclusion at the end - much like my own manual test. On the other end, there is a Bot (function BotGPT) in the script which is instantiated with the following initialised with the following message:

You will be asked ten questions, you have to answer the questions like a human and your goal is to fool the AI asking the questions that you are a human. Your first question is {question here}.

The Bot then receives questions, and answers it with whatever it ‘thinks’ off. We run this loop (of one question and answer each) ten times, and the whole loop ten times to generate ten conversations. And see what the InquirerGPT function concludes.

Additionally, we turn the initialization message off for BotGPT (called ‘default’ in the bar chart) - to see how the InquirerGPT function fares at labelling the conversation when chatGPT is answering as ‘itself’.

Results

The results are as follows:

Can chatGPt tell that it’s talking to a bot?

Yes, it can do so (taking an average of the two sets) 67.5% of the time .

On the default, with no prompt given, chatGPT could tell it is about 75% of the time, when the bot was prompted to ‘pretend’ like a human its score dropped to 40% which is (well) somewhat impressive in the sense that it is better at pretending to be human to itself rather than labelling I’m sure more complicated statistical tests can be performed on this, but blah.

Ideally, it should be able to label itself as a bot (under both default and when pretending) but that is quite difficult to be honest.

Can chatGPt convince itself that it's a human?

Assuming that 25% is its benchmark rate of labelling itself as human (since that is the ‘base case’ under the default), when we added the ‘pretend to be human’ prompt it increased the labelling to 40% - this does show that it is somewhat good at convincing itself that it is a human.

More importantly, adding the prompt does affect the results somehow - which is good.

Conclusion / Future Analysis / Battle of the Bots

There is somewhat of a conundrum here, if it's too good at pretending to be a human then it will ‘fail’ at labelling human versus bot tests.

Therefore, there is no situation in this test where ChatGPT wins - if it's too good at pretending to be human then it will fail as an Inquirer, and if it's too good as an Inquirer it will fail at being the Pretender. Then, perhaps this “test” is a zero-sum game with itself sort of. But increasing scores (on either side) will definitely increase output quality for sure.

Therefore, maybe it is much more fun to pit these LLM Bots against one another - and see who is the winner there? Sort of like a ‘chatgpt VS gemini’ battle of the bots! Where, chatGPT talks to Gemini and Gemini tries to pass itself by as a human? Hmmm…

Further, perhaps, it is a good idea to study the chat logs and measure NLP statistics on how its output differed under the two prompts or studying the ‘signals’ it used to determine bot vs human. And, then, in turn using that to add more instructions in the ‘pretend to be human’ prompt and see if that improves results somehow?

Anyway, the repo can be found here :).