Researchers at the University of California, San Diego (UCSD) have published a preprint study indicating that two artificial intelligence models, OpenAI’s GPT-4.5 and Meta’s LLaMa, have successfully passed the Turing Test, a milestone first proposed by British mathematician and computer scientist Alan Turing in 1950. This test measures a machine’s ability to exhibit intelligent behaviour indistinguishable from that of a human during interactions.

The study, authored by Cameron Jones of UCSD’s Language and Cognition Lab, involved 126 undergraduate students and 158 participants recruited via the Prolific platform, each of whom held five-minute online conversations with both a chatbot and a human. Without knowing which interlocutor was which, participants were then asked to identify the human conversation partner.

The results revealed that GPT-4.5, when instructed to adopt a humanlike persona, was judged to be human 73 percent of the time, more often than the actual human participants themselves were. Meanwhile, LLaMa-3.1, given a similar persona prompt, was judged to be human 56 percent of the time, still enough to pass the Turing Test. In contrast, earlier systems such as GPT-4o and the 1960s-era chatbot ELIZA performed significantly worse, with win rates of 23 percent and 21 percent, respectively.

Passing the Turing Test marks a significant advance in AI capabilities, demonstrating that these language models can convincingly mimic human conversational behaviour. Jones remarked that with the persona prompt, participants were “no better than chance” at distinguishing the AI models from humans, indicating that their conversational abilities have reached a level where they can effectively simulate human interaction over brief exchanges.

The researchers also noted that the models performed markedly worse without the humanlike persona prompt: given only a basic prompt, GPT-4.5 was judged to be human just 36 percent of the time, underscoring how crucial the persona instructions were to passing the test.

The implications of this research are wide-ranging. Jones indicated that the findings could lead to discussions about job automation and the potential for misuse in social engineering scenarios. The emotional mimicry displayed by these AI models raises questions about the nature of human interactions and the prospects of AI in both professional and personal settings.

John Nosta, an expert commenting on the findings, noted that this advancement reflects not just a leap in machine intelligence but also a growing human susceptibility to emotional mimicry. He argued that the assessment resembled a “social chemistry test” more than a traditional Turing Test of intelligence.

This research underscores the rapid evolution of AI in recent years and echoes Turing’s original vision of a machine capable of human-like interaction, 75 years after his seminal work on machine intelligence. The preprint is currently available on arXiv and has not yet been peer reviewed.

Source: Noah Wire Services