‘Frontier models are still unable to solve the majority of tasks’: AI might not replace software engineers just yet – OpenAI researchers found leading models and coding tools still lag behind humans on basic tasks
Large language models struggle to identify root causes or provide comprehensive solutions


AI might not replace software engineers just yet as new research from OpenAI reveals ongoing weaknesses in the technology.
Having created a benchmark dubbed ‘SWE-Lancer’ to evaluate how well AI completes software engineering and managerial tasks, researchers concluded that the technology still falls short.
“We evaluate model performance and find that frontier models are still unable to solve the majority of tasks,” researchers said.
Researchers found that, while AI excels in certain areas, it is limited in others. For example, AI agents are skilled at localizing problems but poor at working out their root cause.
While agents can pinpoint the location of an issue quickly and use search capabilities to access the relevant repositories faster than humans can, they have only a limited understanding of how an issue spans different components and files.
This frequently leads to solutions that are incorrect or insufficiently comprehensive, and agents often fail because they do not find the right file or location to edit.
In a comparison between two OpenAI models, o1 and GPT-4o, and Anthropic’s Claude 3.5 Sonnet, researchers found that none of the three fully solved one particular user interface (UI) problem.
While o1 solved the basic issue, it missed a range of others, and GPT-4o failed to solve even the initial problem. Claude 3.5 Sonnet was quick to identify the root cause of the issue and fix the bug, but its solution was not comprehensive and did not pass the researchers’ end-to-end tests.
All told, researchers said that while AI coding tools have the capacity to make software engineering more productive, users need to be wary of potential flaws in AI-generated code.
Are AI coding tools more trouble than they’re worth?
While businesses are ramping up the use of AI coding tools, there have been plenty of warning signs to make firms stop and consider whether the tools are worth it.
Research from Harness earlier this year found that many developers are becoming increasingly bogged down with manual tasks and code remediation due to the increased use of AI coding tools.
The study noted that these tools may offer huge benefits to software engineers, but experts say they are still littered with weaknesses and lack some of the capabilities of human engineers.
“While these tools can boost efficiency, in their current state they often result in a surge of errors, security vulnerabilities, and downstream manual work that burdens developers,” Sheila Flavell, COO of FDM Group, told ITPro.
The risk of vulnerabilities and malicious code being introduced into organizations is also significantly higher when AI coding tools are used, according to Shobhit Gautam, security solutions architect at HackerOne.
“AI-generated code is not guaranteed to follow security guidelines and best practices as defined by the organization standards. As the code is generated from LLMs, there is a possibility that third-party components may be used in the code and go unnoticed,” Gautam told ITPro.
“Aside from the risk of copyright infringement, the code hasn’t been through the company’s validation testing and peer reviews, potentially resulting in unchecked vulnerabilities,” Gautam added.
An overreliance on AI coding tools may also be eroding the skills of human programmers, with research from education platform O’Reilly finding that interest in traditional programming languages is in decline.
Similarly, a post from tech blogger and programmer Namanyay Goel recently sparked debate on this topic when Goel claimed that junior developers lack coding skills because of a heavy reliance on automated AI tooling.
How can businesses use these tools effectively?
Despite concerns, there are clear signs AI coding tools are delivering value for both software engineers and enterprises. GitHub research from last year found that AI coding tools have helped engineers deliver more secure software and better-quality code, and adopt new languages.
With this in mind, firms need to prioritize certain processes to deliver success with AI tools. Flavell said businesses need to put upskilling front and center, as well as improving code reviews and quality assurance.
“It is essential that organizations create and implement governance processes to manage the use of AI generated code,” Gautam added.
“When it comes to coding, AI tools and human input will all play their part. Organizations gain the best of both worlds when they integrate these two together. Human Intelligence is essential to tailor coding to specific requirements, and AI can help experts increase their efficiency.”

George Fitzmaurice is a former Staff Writer at ITPro and ChannelPro, with a particular interest in AI regulation, data legislation, and market development. After graduating from the University of Oxford with a degree in English Language and Literature, he undertook an internship at the New Statesman before starting at ITPro. Outside of the office, George is both an aspiring musician and an avid reader.