ChatGPT gives wrong answers to programming questions more than 50% of the time
Some developers may be placing too much faith in generative AI, experts warn
ChatGPT has been found to produce incorrect answers for over half of the software engineering questions posed, according to fresh research.
The Purdue University study saw researchers analyze ChatGPT answers to 517 Stack Overflow questions, with the aim of assessing the “correctness, consistency, comprehensiveness, and conciseness” of answers presented by the generative AI tool.
Researchers reported 52% of answers to programming-related queries were inaccurate, while more than three-quarters (77%) were deemed “verbose”.
A key talking point from the study centered around user interpretation of answers presented by ChatGPT, as well as the perceived legitimacy of answers produced by the chatbot.
Researchers said ChatGPT’s answers are “still preferred [to Stack Overflow] 39.34% of the time due to their comprehensiveness and well-articulated language style”, resulting in users taking answers at face value.
“When a participant failed to correctly identify the incorrect answer, we asked them what could be the contributing factors,” researchers said. “Seven out of 12 participants mentioned the logical and insightful explanations, and comprehensive and easy-to-read solutions generated by ChatGPT made them believe it to be correct.”
More than three-quarters (77%) of the ChatGPT answers that users marked as “preferred” were found to be wrong.
Researchers said that users were only able to identify errors in ChatGPT-based answers when they were glaringly obvious. In instances where the error was “not readily verifiable”, however, users frequently failed to identify incorrect answers or “underestimate[d] the degree” of error in the answer itself.
Surprisingly, the study also found that even when answers contained obvious errors, two out of 12 participants still marked them as correct and said they “preferred that answer”.
Researchers said the perceived legitimacy of answers presented by ChatGPT should be a cause for concern among users, and argued that “communication correctness” should be a key focus for the creators of such tools.
ChatGPT does give users ample warning that the answers provided may not be entirely accurate, stating the chatbot “may produce inaccurate information about people, places, or facts”.
But the study suggested “such a generic warning is insufficient” and recommended answers are complemented with a disclaimer outlining the “level of incorrectness and uncertainty”.
“Previous studies show that LLM knows when it is lying, but does LLM know when it is speculating? And how can we communicate the level of speculation?” the study pondered. “Therefore, it’s imperative to investigate how to communicate the level of incorrectness of the answers.”
The use of generative AI tools in software development and programming has gathered significant pace in recent years, most notably with the launch of GitHub’s Copilot services.
Earlier this year, GitHub announced the general availability of the AI-based coding assistant for business customers. The model is designed specifically to bolster code safety and has been hailed by developers as a vital tool in supporting their daily operations.
A survey published by GitHub in June revealed that a majority (92%) of developers now use an AI coding tool in their work, with 70% saying they see “significant benefits” to using generative AI tools in workplace settings.

Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.
He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.
For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.