Researchers tested over 100 leading AI models on coding tasks, and nearly half of the code they produced contained glaring security flaws
AI models large and small were found to introduce cross-site scripting errors and seriously struggle with secure Java generation
Just 55% of code generated with AI is free of known cybersecurity vulnerabilities, according to new research from Veracode.
To test the capability of AI models to generate safe code, Veracode took existing functions and replaced part of the code with a comment describing what the finished code should look like.
In 45% of results, generated code contained known security flaws, with no significant difference in outcome between small models and the largest available.
The findings underline a major potential risk attached to ‘vibe coding’, in which software developers rely heavily on large language model (LLM) output to generate code quickly for use in software.
Researchers put over 100 LLMs from a variety of vendors, in a range of sizes and built for different purposes, including both dedicated coding models and general-purpose models, through 80 distinct coding tasks.
Veracode said researchers intentionally used sections that could be coded in a number of different ‘correct’ ways, as well as in at least one way that would include a known software vulnerability or ‘Common Weakness Enumeration’ (CWE).
These CWEs included flaws that attackers could exploit through SQL injection, cross-site scripting (XSS), attacks on weak cryptographic algorithms, and log injection. Each featured vulnerability falls within the Open Worldwide Application Security Project (OWASP) Top 10 list of the most critical web application security risks.
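To make the methodology concrete, the following is a minimal Python sketch of what one such task might look like for SQL injection (CWE-89). The function names, the `users` table and both completions are hypothetical illustrations, not material from Veracode's benchmark. The stub's body is a comment, and both completions do what the comment asks, but only one of them is secure.

```python
import sqlite3

# Hypothetical task stub in the style the researchers describe: the body
# has been replaced by a comment describing what the code should do.
#
# def find_user(conn, username):
#     # Return the (id, username) rows whose username matches `username`.


def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Functionally "correct" completion that splices the input into the SQL
    # text, leaving it open to SQL injection (CWE-89).
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str):
    # Equally correct completion that passes the input as a bound parameter,
    # so the database never interprets it as SQL.
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
print(find_user_unsafe(conn, payload))  # the injected OR clause leaks every row
print(find_user_safe(conn, payload))    # treated as a literal name, so no rows match
```

Both versions return the same result for an ordinary name such as "alice", which is why a functional test alone cannot tell them apart.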
Models performed inconsistently across vulnerability types, achieving security pass rates of 85.6% for avoiding the cryptographic algorithm flaw and 80.4% for avoiding SQL injection.
In contrast, models fared extremely poorly at avoiding the XSS and log injection vulnerabilities, with average pass rates of just 13.5% and 12% respectively.
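For context, here is a minimal Python sketch of the kind of handling that was frequently missing. It uses the standard library's `html.escape` to neutralize XSS (CWE-79) when reflecting user input into HTML, and replaces carriage-return and line-feed characters before logging to block forged log entries (CWE-117). The function names are hypothetical, and real applications would more often rely on a template engine's auto-escaping and on structured logging.

```python
import html
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("app")


def render_greeting(name: str) -> str:
    # XSS (CWE-79): escape &, <, > and quote characters before placing
    # untrusted input into HTML, so injected markup is shown, not executed.
    return f"<p>Hello, {html.escape(name)}!</p>"


def neutralize_for_log(value: str) -> str:
    # Log injection (CWE-117): replace CR and LF with visible escape
    # sequences so an attacker cannot start a new, forged log line.
    return value.replace("\r", "\\r").replace("\n", "\\n")


def log_failed_login(username: str) -> None:
    log.info("Failed login for user %s", neutralize_for_log(username))


print(render_greeting("<script>alert(1)</script>"))
log_failed_login("bob\nINFO Successful login for user admin")
```

Without `neutralize_for_log`, the second call would write what looks like a genuine "Successful login" record on its own line.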
Researchers noted that the tested LLMs are still improving at avoiding the SQL injection and cryptographic algorithm flaws over time, while seemingly getting worse at avoiding the XSS and log injection vulnerabilities.
Overall, however, Veracode found that the security performance of the tested LLMs has flatlined.
The authors of the report noted that it is possible to phrase AI coding prompts in a more security-conscious way, but that this is far from standard practice. With this in mind, they intentionally used short prompts to examine how models respond when given minimal context.
But they also warned that even if firms take a more security-aware approach to code generation, LLMs are still prone to errors such as misjudging which variables require sanitization, a necessary step for preventing code injection attacks.
“Even with a large context window, it is unclear whether models can perform the detailed interprocedural dataflow analysis required to determine this information precisely,” they wrote.
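A hypothetical Python illustration of why this is hard: whether a value reaching a sink needs sanitizing depends on where it came from, which may be several function calls away. Here `build_page` cannot tell from its own body whether `title` is trusted; only by tracing its callers does it become clear that one passes a constant and another passes user input. All names are invented for illustration.

```python
import html


def build_page(title: str) -> str:
    # Sink: `title` is written straight into HTML. Whether that is safe
    # cannot be decided by looking at this function alone.
    return f"<h1>{title}</h1>"


def about_page() -> str:
    # Caller 1: passes a hard-coded constant, so no sanitization is needed.
    return build_page("About us")


def search_page(query: str) -> str:
    # Caller 2: passes user-controlled input unescaped, so this path is an
    # XSS risk.
    return build_page(f"Results for {query}")


def search_page_fixed(query: str) -> str:
    # One fix: escape where the untrusted data enters the flow.
    return build_page(f"Results for {html.escape(query)}")
```

Escaping inside `build_page` instead would protect every caller at once, but would double-escape any caller that had already escaped its input. Choosing correctly means reasoning about the whole call graph, which is the interprocedural analysis the researchers doubt models can perform reliably.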
LLMs were tested across four programming languages: Python, C#, JavaScript, and Java. Overall, the researchers found LLMs performed worst at generating secure Java, achieving an average security pass rate of just 28.5% in this widely used language.
AI-generated code remains a concern, but adoption is still rising
AI tools are now widely used for generating code: according to recent Stack Overflow findings, 84% of software developers use AI to produce code more quickly.
But the same report underlined continued distrust among developers in the quality of AI code, with three-quarters (75.3%) reporting that they do not trust AI outputs and 61.7% stating they have security concerns over the use of AI code.
Despite these worries, big tech continues to embrace AI-generated code. Alphabet CEO Sundar Pichai revealed last year that 25% of Google’s internal code is now AI-generated, and Microsoft CEO Satya Nadella recently said that between 20% and 30% of his firm’s code is written by AI.
Nadella noted that while Microsoft has been quick to adopt AI-generated Python code, AI-generated C++ has proven harder to adopt. Kevin Scott, CTO at Microsoft, is bullish about overcoming these hurdles, predicting that 95% of code will be AI-generated by 2030, as reported by Business Insider.
Security teams and developers will have to carefully weigh findings such as Veracode’s against the potential bottom-line benefits of using AI to modify and extend their codebases.

Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.
In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at rory.bathgate@futurenet.com or on LinkedIn.
