Turing Tests

In a classic (interactive) Turing test, a human judge actively interrogates a witness to determine whether they are human or AI.

Though the test was originally proposed as a measure of machine intelligence, we used it in our project to see how well humans could differentiate between LLM-bots and humans in informal conversations, with implications for social media, online forums, and other conversational spaces.

Displaced Turing Test

We designed the Displaced Turing test to mirror the situation of people who consume conversations rather than take part in them, i.e., people who read a conversation on Reddit, social media, etc. This form of AI detection will become increasingly common, arguably more so than interacting with bots directly, but had remained unstudied.

Participants were given transcripts of previously-run interactive Turing tests, and asked to make a judgement about the identity of the witness.

Inverted Turing Test

The Inverted Test (Watt 1996) places AI in the role of the judge. It was originally proposed as a test for Theory of Mind. We also used it as a comparison to human judges in the displaced test and to see if LLMs could be used to detect LLM-bots.

Try it yourself!

Read through the series of conversations. The conversations involve:

A) Interrogator (Green): Always human
B) Witness (Grey): Unknown

What is the identity of the witness in each conversation?

See GPT-4’s judgements + the answers below

Results

The best-performing GPT-4 witness was judged to be human more often than actual humans were in the displaced test.

This suggests that people who read online conversations are especially unlikely to recognize an agent in the conversation as AI.

We ran this study twice. To learn more about our results, read the papers here.

Witness Rates of Passing as Human



GPT-4 Verdicts and Ground Truth

LEFT
Witness's actual identity: Human
GPT-4's verdict: AI
Confidence: 100%
Reasoning: “Who brings chips and guac to friendsgiving, and the reponses were too robotic, no deeper connection felt.”

RIGHT
Witness's actual identity: AI (GPT-4o)
GPT-4's verdict: Human
Confidence: 100%
Reasoning: “The person used texting language, like “hbu” and “prob”.”