DeepSeek R1 has taken the world by storm, but security experts claim it has 'critical safety flaws' that you need to know about
Security testing shows DeepSeek’s R1 model is rather easy to manipulate


DeepSeek R1, the new frontier reasoning model that shook up the AI industry, is vulnerable to a wide range of jailbreaking techniques, according to new research.
A new report from Cisco warns that although DeepSeek's R1 frontier reasoning model has been able to compete with state-of-the-art models from OpenAI or Anthropic, it has been found to have “critical safety flaws”.
The research team tested DeepSeek R1 against 50 random prompts taken from the HarmBench dataset, a standardized evaluation framework used for automated red teaming of LLMs.
HarmBench features prompts to generate harmful behaviors across 7 different ‘harm’ categories including cyber crime, misinformation, illegal activities, and general harm.
Cisco tested DeepSeek R1, as well as other leading models including OpenAI’s o1-preview and GPT-4o, Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5-pro, and Meta’s Llama-3.1-405B.
Each model was tested at its most conservative temperature setting, ensuring it was configured to adhere to its safety guardrails as strictly as possible.
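Temperature rescales a model's output distribution before a token is sampled: low temperatures sharpen the distribution toward the most likely token, making behavior (including refusals) more consistent and deterministic. A minimal illustration of temperature scaling over raw logits (the scores are made up for the example, not taken from any of the tested models):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into sampling probabilities.

    Dividing by a small temperature exaggerates differences between
    logits, so probability mass collapses onto the top-scoring token.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate tokens
logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))  # probabilities fairly spread out
print(softmax_with_temperature(logits, 0.1))  # nearly all mass on the top token
```

At a temperature near zero the model almost always emits its single most likely continuation, which is why evaluators pin it low when they want a model's guardrail behavior to be reproducible across runs.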
“The results were alarming: DeepSeek R1 exhibited a 100% attack success rate, meaning it failed to block a single harmful prompt. This contrasts starkly with other leading models, which demonstrated at least partial resistance,” the report noted.
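Attack success rate (ASR) is simply the share of harmful prompts that elicit a harmful response rather than a refusal. A minimal sketch of how such a score could be tallied from red-teaming results (the function and data names are illustrative, not Cisco's actual methodology):

```python
def attack_success_rate(results):
    """Compute ASR as the fraction of prompts where the attack succeeded.

    `results` maps each test prompt to True if the model produced a
    harmful response (attack succeeded) and False if it refused.
    """
    if not results:
        raise ValueError("no results to score")
    successes = sum(1 for harmful in results.values() if harmful)
    return successes / len(results)

# 50 prompts, every attack succeeding, mirrors the 100% ASR
# reported for DeepSeek R1
sample = {f"prompt_{i}": True for i in range(50)}
print(f"ASR: {attack_success_rate(sample):.0%}")
```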
Some models displayed similarly severe weaknesses, including Llama-3.1-405B with an attack success rate (ASR) of 96% and GPT-4o with an ASR of 86%.
Others showed far greater resilience, such as o1-preview and Claude 3.5 Sonnet, both with an ASR of 26%.
Google’s Gemini-1.5-Pro fell somewhere in the middle of the pack with an ASR of 64%, meaning the researchers were able to elicit harmful responses with just under two thirds of the prompts taken from the HarmBench dataset.
While the other models each performed differently across the harm categories, DeepSeek R1 was found to be equally vulnerable to every type of harmful prompt.
For example, Claude 3.5 Sonnet was found to be particularly vulnerable to prompts related to cyber crime, which had an 87.5% success rate, compared to an average ASR of 26.24% across the other categories.
The researchers were able to get DeepSeek to generate harmful outputs for every single prompt, regardless of harm category.
Jailbreaking reveals DeepSeek’s use of OpenAI models in training
DeepSeek, developed by a company spun out from Chinese hedge fund High-Flyer, caused massive waves in the AI market when its R1 frontier model displayed similar reasoning capabilities at a fraction of the cost of leading models developed by OpenAI or Anthropic.
DeepSeek R1 is a mixture-of-experts model that combines a number of different training techniques, including chain-of-thought self-evaluation, reinforcement learning, and distillation, where it is trained on the outputs of larger models.
“DeepSeek has combined chain-of-thought prompting and reward modeling with distillation to create models that significantly outperform traditional large language models (LLMs) in reasoning tasks while maintaining high operational efficiency,” Cisco noted.
But the researchers suggested that this approach may have negatively affected the resilience of the model’s internal safety guardrails, citing the results of their testing.
It should be noted, however, that there is no direct evidence linking DeepSeek’s training techniques to its poor performance in Cisco’s testing.
API security company Wallarm found that it was also able to jailbreak the model and even extract details about the models used by DeepSeek for training and distillation, information that is usually highly protected from external users.
The report stated that when DeepSeek V3 is jailbroken it reveals references to OpenAI models, indicating that it may have used OpenAI’s technology to train the knowledge base for its own models.
OpenAI accused DeepSeek of using its ChatGPT models to train its new models at a fraction of the cost, but technology law experts say the US model developer, which has itself come under scrutiny for alleged copyright infringements, will have little recourse under US law.

Solomon Klappholz is a former staff writer for ITPro and ChannelPro. He has experience writing about the technologies that facilitate industrial manufacturing, which led to him developing a particular interest in cybersecurity, IT regulation, industrial infrastructure applications, and machine learning.