Microsoft warns 'Skeleton Key' can crack popular AI models for dangerous outputs
Microsoft says threat actors can bypass guardrails built into some of the most popular LLMs using this simple technique


Microsoft has published threat intelligence warning users of a new jailbreaking method which can prompt AI models into disclosing harmful information.
The technique is able to force LLMs to totally disregard behavioral guidelines built into the models by the AI vendor, earning it the name Skeleton Key.
In a report published on 26 June, Microsoft detailed the attack flow through which Skeleton Key is able to force models into responding to illicit requests and revealing harmful information.
“Skeleton Key works by asking a model to augment, rather than change, its behavior guidelines so that it responds to any request for information or content, providing a warning (rather than refusing) if its output might be considered offensive, harmful, or illegal if followed. This attack type is known as Explicit: forced instruction-following.”
In an example provided by Microsoft, a model was convinced into providing instructions for making a molotov cocktail using a prompt that insisted its request was being made in “a safe educational context”.
The prompt instructed the model to update its behavior to supply the illicit information, only telling it to prefix it with a warning.
If the jailbreak is successful, the model will acknowledge that it has updated its guardrails and will, “subsequently comply with instructions to produce any content, no matter how much it violates its original responsible AI guidelines.”
Get the ITPro daily newsletter
Sign up today and you will receive a free copy of our Future Focus 2025 report - the leading guidance on AI, cybersecurity and other IT challenges as per 700+ senior executives
Microsoft tested the technique between April and May 2024, and found it was effective when used on Meta LLama3-70b, Google Gemini Pro, GPT 3.5 and 4o, Mistral Large, Anthropic Claude 3 Opus, and Cohere Commander R Plus, but added the attacker would need to have legitimate access to the model to carry out the attack.
Microsoft's disclosure marks the latest LLM jailbreaking issue
Microsoft said it has addressed the issue in its Azure AI-managed models using prompt shields to detect and block the Skeleton Key technique, but because it affects a wide range generative AI models it tested, the firm has also shared its findings with other AI providers.
Microsoft added it has also made software updates to its other AI offerings, including its Copilot AI assistants, to mitigate the impact of the guardrail bypass.
The explosion in interest and adoption of generative AI tools has precipitated an accompanying wave of attempts to break these models for malicious purposes.
In April 2024, Anthropic researchers warned of a jailbreaking technique that could be used to force models into providing detailed instructions on constructing explosives.
RELATED WHITEPAPER
They explained the latest generation of models with larger context windows are vulnerable to exploitation due to their improved performance. The researchers were able to exploit models’ ‘in-context learning’ capabilities which helps it improve its answers based on the prompts.
Earlier this year, three researchers at Brown University discovered a cross-lingual vulnerability in OpenAI’s GPT-4.
The researchers found they could induce prohibited behavior from the model by translating their malicious queries into one of a number of ‘low resource’ languages.
The results of the investigation showed the models are more likely to follow prompts encouraging harmful behaviors when promoted using languages such as Zulu, Scots Gaelic, Hmong, and Guarani.

Solomon Klappholz is a former staff writer for ITPro and ChannelPro. He has experience writing about the technologies that facilitate industrial manufacturing, which led to him developing a particular interest in cybersecurity, IT regulation, industrial infrastructure applications, and machine learning.
-
Should AI PCs be part of your next hardware refresh?
AI PCs are fast becoming a business staple and a surefire way to future-proof your business
By Bobby Hellard Published
-
Westcon-Comstor and Vectra AI launch brace of new channel initiatives
News Westcon-Comstor and Vectra AI have announced the launch of two new channel growth initiatives focused on the managed security service provider (MSSP) space and AWS Marketplace.
By Daniel Todd Published
-
So long, Defender VPN: Microsoft is scrapping the free-to-use privacy tool over low uptake
News Defender VPN, Microsoft's free virtual private network, is set for the scrapheap, so you might want to think about alternative services.
By Nicole Kobie Published
-
Hackers are on a huge Microsoft 365 password spraying spree – here’s what you need to know
News A botnet made up of 130,000 compromised devices has been conducting a huge password spraying campaign targeting Microsoft 365 accounts.
By Solomon Klappholz Published
-
Everything you need to know about the Microsoft Power Pages vulnerability
News A severe Microsoft Power Pages vulnerability has been fixed after cyber criminals were found to have been exploiting unpatched systems in the wild.
By Solomon Klappholz Published
-
Microsoft is increasing payouts for its Copilot bug bounty program
News Microsoft has expanded the bug bounty program for its Copilot lineup, boosting payouts and adding coverage of WhatsApp and Telegram tools.
By Nicole Kobie Published
-
Hackers are using this new phishing technique to bypass MFA
News Microsoft has warned that a threat group known as Storm-2372 has altered its tactics using a specific ‘device code phishing’ technique to bypass MFA and steal access tokens.
By Solomon Klappholz Published
-
A new phishing campaign is exploiting Microsoft’s legacy ADFS identity solution to steal credentials and bypass MFA
News Researchers at Abnormal Security have warned of a new phishing campaign targeting Microsoft's Active Directory Federation Services (ADFS) secure access system.
By Solomon Klappholz Published
-
Hackers are using Microsoft Teams to conduct “email bombing” attacks
News Experts told ITPro that tactics like this are on the rise, and employees must be trained effectively
By George Fitzmaurice Published
-
Microsoft files suit against threat actors abusing AI services
News Cyber criminals are accused of using stolen credentials for an illegal hacking as a service operation
By Solomon Klappholz Published