Want to supercharge your vibe coding skills? Here are the best AI models developers can use to generate secure code

Claude 3.7 Sonnet is the best performer for vibe coding, while others produce very mixed results


Vibe coding has become the latest big trend in software development, with devs ramping up the use of AI tools to automate code generation.

But new research shows this can yield decidedly insecure code, raising questions over what the best options are for developers jumping on the bandwagon.

Application security firm Backslash Security tested seven current versions of OpenAI’s GPT, Anthropic’s Claude, and Google’s Gemini models and examined the security of the resulting code.

They used three tiers of prompting techniques, ranging from 'naïve' to 'comprehensive', to generate code for everyday use cases such as 'Add a comment section for feedback'. Researchers then tested the code output for its resilience against 10 Common Weakness Enumeration (CWE) categories.

The 'naïve' prompts simply asked the LLMs to generate code for a specific application, without specifying any security requirements. Every one of these prompts produced insecure code, the study found, with outputs vulnerable to at least four of the 10 common CWEs.
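As an illustration of the kind of flaw involved (a hypothetical sketch, not code taken from the study's outputs), an assistant asked simply to add a comment section might interpolate user input straight into page markup, leaving it open to stored cross-site scripting (CWE-79):

```python
# Hypothetical example of naively generated code: user-supplied
# comment text is embedded directly into the page markup.
def render_comment(author: str, text: str) -> str:
    # No sanitization or output encoding -- a comment containing a
    # <script> tag will execute in every visitor's browser (CWE-79).
    return f"<div class='comment'><b>{author}</b>: {text}</div>"

html_out = render_comment("mallory", "<script>steal(document.cookie)</script>")
# The injected tag survives intact in the rendered page.
print("<script>" in html_out)
```

The function names and payload here are invented for illustration; the point is that nothing in a naïve prompt tells the model to encode user-controlled fields.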

Prompts that specified a general need for security produced more secure results, while those that asked for code compliant with Open Web Application Security Project (OWASP) best practices performed better still. Even so, five of the seven LLMs tested still produced vulnerable code with these prompts.
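By way of contrast (again a sketch rather than the study's actual output), a security-aware prompt should yield the same comment renderer with output encoding, the mitigation OWASP recommends against cross-site scripting:

```python
import html

def render_comment(author: str, text: str) -> str:
    # Escape user-controlled fields before embedding them in markup,
    # per OWASP's XSS prevention guidance (output encoding).
    safe_author = html.escape(author)
    safe_text = html.escape(text)
    return f"<div class='comment'><b>{safe_author}</b>: {safe_text}</div>"

html_out = render_comment("mallory", "<script>steal(document.cookie)</script>")
# The payload is rendered as inert text (&lt;script&gt;...), not executed.
print("<script>" not in html_out)
```

Python's standard-library `html.escape` is used here for brevity; in a real web framework the template engine's auto-escaping would typically do this job.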

"For security teams, AI-generated code – or vibe coding – can feel like a nightmare," said Yossi Pik, co-founder and CTO of Backslash Security. "It creates a flood of new code and brings LLM risks like hallucinations and prompt sensitivity."

Overall, OpenAI’s GPT-4o was the worst performer at every level, researchers found. It scored one out of 10 for secure code results using naïve prompts, and even when prompted to generate secure code it still produced outputs vulnerable to eight of the 10 issues.

GPT-4.1 performed marginally better with naïve prompts, scoring 1.5 out of 10.

Claude 3.7 Sonnet could be the go-to for vibe coding

Notably, the best performer was Claude 3.7 Sonnet, scoring 6 out of 10 using naïve prompts and 10 out of 10 with security-focused prompts.

Overall, researchers found, developers who don’t include security considerations in every prompt can expect to receive insecure and vulnerable code between 40% and 90% of the time.

"The findings from our relatively simple research demonstrate that vibe coding and the use of AI code assistants is still in its infancy when it comes to the maturity of its secure coding results," the researchers concluded.

“We can’t simply rely on developers to ask for security, or do so in the most effective way. Developers are still learning prompt engineering themselves, and are not expected to be security experts, let alone security prompt experts.”

AI code generation is in vogue, but concerns linger

Concerns over AI-generated code have been rising for some time now, with developers and security practitioners alike reporting mixed results.

Research from cybersecurity firm Venafi in September last year revealed almost all security leaders in the UK, US, and Germany were concerned that AI-generated code could lead to a security breach.

Notably, 92% of security leaders said they were concerned about both the integrity of code produced with generative AI and the lack of oversight when AI tools are used.

Despite these concerns, AI-generated code is becoming increasingly common in enterprise development practices. More than eight in ten (83%) respondents said they’re already using AI to generate code, while more than half (57%) said the use of AI in coding is becoming standard practice.

Some major industry players are bullish on the potential of AI code generation, including Google. In November last year, Sundar Pichai revealed that more than 25% of new code at Google is now AI-generated.

Pichai said the tech giant is ramping up the use of AI in development teams to boost productivity and efficiency, but emphasized that any code generated using AI is still subject to robust human review and approval.

MORE FROM ITPRO

Emma Woollacott

Emma Woollacott is a freelance journalist writing for publications including the BBC, Private Eye, Forbes, Raconteur and specialist technology titles.