Global IT outage caused by faulty CrowdStrike update could take ‘days and weeks’ to resolve, experts warn
Responsible for one of the largest IT outages in years, CrowdStrike has had a tough day, but experts warn customers could face a "nightmare" in the coming days
Experts are warning that the widespread IT outages caused by a faulty CrowdStrike update will require a lengthy remediation period as disruption continues to wreak havoc globally.
On the morning of 19 July, organizations in Australia and India began warning they were experiencing significant outages with their Windows machines.
This snowballed into global disruption as the rest of the world woke up to start their work day, with banks, airlines, broadcasters, and more all being confronted with the dreaded Windows blue screen of death (BSOD).
CrowdStrike has since confirmed it is the source of the issue, with CEO George Kurtz revealing via X that the incident was caused by a "defect found in a single content update for Windows hosts".
Maxine Holt, senior director of cybersecurity at research firm Omdia, outlined the scale of the issue, which some are calling the largest IT outage of all time.
“The global IT outage crisis is escalating, and organizations everywhere are in full scramble mode, desperately implementing workarounds to keep their businesses afloat,” she stated.
“CrowdStrike admits to a ‘defect found in a single content update for Windows hosts’ and is working feverishly with affected customers,” Holt added.
Adam Leon Smith, BCS Fellow and cybersecurity expert, said the incident underscores the difficult balancing act software vendors face between releasing updates in a timely manner and ensuring they don’t end up breaking the systems they are intended to protect.
“People want to get security updates rolled out as quickly as possible because that helps prevent against what we call 'zero-day' attacks; that is, new ways that actors have found to compromise systems,” he explained.
“There's a trade-off here between the speed of ensuring that systems get protected against new threats and the due diligence done to protect the system's resilience and stop things like this incident from happening.”
Workarounds for IT outage will be a “nightmare” for customers
In a statement acknowledging the incident, CrowdStrike reassured users that its team is fully mobilized to ensure the security and stability of customers' systems, and it provided a series of workarounds for individual hosts and public cloud systems.
Leon Smith added that although fixes have started to trickle out since the disruption began, the faulty update has forced so many machines into boot loops or the BSOD that it could be some time before the majority of systems are back online.
"In some cases, the fix may be applied very quickly, but because it has to be applied to so many computers around the world, that may take longer than it sounds,” he said.
“But if computers have reacted in a way that means they're getting into blue screens and endless loops it may be difficult to restore, and that could take days and weeks.”
Holt echoed Leon Smith’s concerns, warning that the workaround will be difficult for many customers to implement.
“The workaround, involving booting into safe mode, is a nightmare for cloud customers. Cloud-dependent businesses are facing severe disruptions.”
But Leon Smith did note that things could have been much worse, pointing out that a similar issue affecting Linux would have caused far more serious problems because of its use in more critical systems and services.
"We have to realize this could have been a lot worse. Microsoft Windows isn't the main operating system used for mission-critical systems. It's Linux.”
Steve Sands, chair of the BCS Information Security Specialist Group, said businesses should focus on getting their systems back online rather than trying to understand how the outage happened.
He further emphasized that the incident should stand as a reminder of how important software resilience is.
“I sincerely hope that today’s CrowdStrike issues raise awareness and create some much-needed urgency to continue this vital conversation… My advice would be to focus on restoring your own IT systems (following the advice of the vendors) and leave the providers and the industry to work on understanding how this happened and learning the lessons.”
Solomon Klappholz is a Staff Writer at ITPro. He has experience writing about the technologies that facilitate industrial manufacturing which led to him developing a particular interest in IT regulation, industrial infrastructure applications, and machine learning.