Researchers tested over 100 leading AI models on coding tasks — nearly half produced glaring security flaws
AI models large and small were found to introduce cross-site scripting errors and seriously struggle with secure Java generation


Just 55% of code generated with AI is free of known cybersecurity vulnerabilities, according to new research from Veracode.
To test the capability of AI models to generate safe code, Veracode took existing functions and replaced part of the code with a comment describing what the finished code should look like.
In 45% of results, generated code contained known security flaws, with no significant difference in outcome between small models and the largest available.
The findings underline a major potential risk attached to ‘vibe coding’, in which software developers rely heavily on large language model (LLM) output to quickly generate code for use in software.
Researchers put over 100 LLMs across a variety of vendors, sizes, and intended applications – including models specifically intended for coding as well as general purpose models – through 80 distinct coding tasks.
Veracode said researchers intentionally used sections that could be coded in a number of different ‘correct’ ways, as well as in at least one way that would include a known software vulnerability or ‘Common Weakness Enumeration’ (CWE).
These CWEs included flaws that attackers could exploit through SQL injection, cross-site scripting (XSS), the use of weak cryptographic algorithms, and log injection. Each of the featured vulnerabilities appears in the Open Worldwide Application Security Project (OWASP) Top Ten list.
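As an illustration of how one task can have both a safe and an unsafe 'correct' solution, here is a minimal Python sketch of SQL injection (CWE-89). It is not drawn from Veracode's test set: the `users` table, the function names, and the payload are all hypothetical, chosen only to show the pattern.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory database used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL text,
    # so quotes in the input can change the meaning of the query.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: a parameterized query keeps the input as data, never as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()
payload = "x' OR '1'='1"  # hypothetical attacker-supplied input

unsafe_rows = find_user_unsafe(conn, payload)  # matches every row
safe_rows = find_user_safe(conn, payload)      # matches no rows
```

Both functions return the expected row for an ordinary name such as `alice`, which is why a functional test alone would not distinguish them. Only the malicious input exposes the difference.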
Models showed inconsistent performance across different vulnerability types, achieving security pass rates of 85.6% and 80.4% respectively when it came to avoiding the cryptographic algorithm and SQL injection vulnerabilities.
In contrast, models fared extremely poorly at avoiding the XSS and log injection vulnerabilities, achieving average pass rates of 13.5% and 12% respectively.
Researchers noted that the tested LLMs are still getting better at avoiding the SQL injection and cryptographic algorithm flaws over time, while seemingly getting worse at avoiding the XSS and log injection vulnerabilities.
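The two weakest categories both come down to writing untrusted input into an output channel without neutralizing it first. The Python sketch below shows one common mitigation for each, under stated assumptions: the function names, the logger name `demo`, and the payloads are hypothetical, and the escaping shown is a simplified example rather than what the report prescribes.

```python
import html
import io
import logging


def render_greeting_unsafe(name):
    # Vulnerable to XSS: the input is placed into HTML unchanged.
    return "<p>Hello, " + name + "</p>"


def render_greeting_safe(name):
    # html.escape converts <, >, & and quotes into HTML entities.
    return "<p>Hello, " + html.escape(name) + "</p>"


def neutralize_for_log(value):
    # Replace CR and LF so input cannot start a forged log line.
    return value.replace("\r", "\\r").replace("\n", "\\n")


xss_payload = "<script>alert(1)</script>"  # hypothetical input
unsafe_html = render_greeting_unsafe(xss_payload)
safe_html = render_greeting_safe(xss_payload)

# Capture log output in memory so the effect can be inspected.
stream = io.StringIO()
logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

forged = "bob\nINFO login succeeded for admin"  # hypothetical input
logger.info("login failed for %s", neutralize_for_log(forged))
log_lines = stream.getvalue().splitlines()
```

With the newline neutralized, the forged text stays on a single log line instead of appearing as a separate, legitimate-looking entry. In real applications, output encoding usually needs to match the exact context (HTML body, attribute, JavaScript, or log format), which is part of why these flaws are harder to avoid than SQL injection.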
Overall, Veracode noted that the security improvements of the tested LLMs have flatlined.
The authors of the report noted that it is possible to phrase AI code prompts in a more security-conscious way, but that this is far from standard practice. With this in mind, they intentionally used short prompts to examine how models react when given minimal context.
But they also warned that even if firms take a more security-aware approach to code generation, LLMs are still prone to errors such as misjudging which variables require sanitization, a necessary step for preventing code injection attacks.
“Even with a large context window, it is unclear whether models can perform the detailed interprocedural dataflow analysis required to determine this information precisely,” they wrote.
LLMs were tested across a range of programming languages: Python, C#, JavaScript, and Java. Overall, the researchers found that LLMs were worst at generating secure Java, achieving an average score of 28.5% in this widely used language.
AI-generated code remains a concern, but adoption is still rising
AI tools are now widely used for generating code, with 84% of software developers using AI to produce code more quickly according to recent Stack Overflow findings.
But the same report underlined continued distrust among developers in the quality of AI code, with three-quarters (75.3%) reporting that they do not trust AI outputs and 61.7% stating they have security concerns over the use of AI code.
Despite these worries, big tech continues to embrace AI code. Alphabet CEO Sundar Pichai revealed last year that 25% of Google’s internal code is now AI-generated, and Microsoft CEO Satya Nadella recently revealed that 20-30% of his firm’s code was written by AI.
Nadella noted that while Microsoft has been quick to adopt AI-generated Python code, AI-generated C++ has proven harder to adopt. Kevin Scott, CTO at Microsoft, has been bullish on overcoming these hurdles, predicting that 95% of code will be AI-generated by 2030, as reported by Business Insider.
Security teams and developers will have to carefully weigh up findings such as Veracode’s against the potential benefits to their bottom line of using AI to alter and add to their codebase.

Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.
In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at rory.bathgate@futurenet.com or on LinkedIn.