Millions of devices bricked and $5.4 billion in losses: The cost of the CrowdStrike outage continues to mount


As the dust continues to settle in the wake of the CrowdStrike outage, the true extent of the damage caused by the disruption is becoming clearer.

Almost a week after a defective update triggered the outage, the cybersecurity firm announced that more than 97% of Windows sensors were back online as of 24 July.

CrowdStrike CEO George Kurtz took to LinkedIn on 25 July to say that, while most affected machines were back online, the company was still working to restore every impacted system to full functionality.

In an analysis of the incident's cost, cloud outage analytics specialist Parametrix Insurance estimated that the outage caused $5.4 billion in direct losses to US Fortune 500 companies.

This figure covers only Fortune 500 companies and excludes any losses incurred by Microsoft, so the overall financial impact of the incident is likely to be far higher. The report also noted that cyber insurance policies were unlikely to cover the full extent of the losses companies suffered.

“The portion of the loss covered under cyber insurance policies is likely to be no more than 10% to 20%, due to many companies’ large risk retentions, and to low policy limits relative to the potential outage loss.” 

In a report outlining the key details behind the incident, Forrester noted that the legal and regulatory fallout from the outage has only just begun.

“The US Congress has already characterized this as a ‘critical infrastructure’ matter, and regulatory agencies are reportedly in contact with CrowdStrike. EU and UK regulators will want to flex their muscles with their operational resilience and cybersecurity product legislation. Of course, there will be a plethora of lawsuits against CrowdStrike, but ISV contracts usually limit consequential damages.”

Forrester also warned of longer-term effects on the accountability and support that customers will expect from vendors' update procedures.

“In the long term, there will be calls for all software vendors to improve security and quality by design, provide more update control and transparency, and offer greater accountability and support. Operating system (OS) vendors will suddenly find tech leaders demanding to know a lot more about kernel design.”

CrowdStrike incident underscores dangers of rushing out updates 

In its preliminary post-incident review, CrowdStrike reported that the outage was caused by a botched rapid response update that triggered a memory safety issue: specifically, an out-of-bounds read that caused an access violation in the CSAgent driver.

Microsoft's own incident response report confirmed CrowdStrike's initial findings. Microsoft analyzed the crash dumps using its WinDbg kernel debugger and a series of extensions.

Notably, Microsoft revealed that its estimate of 8.5 million affected devices was based only on customers who opted to share their crash reports. Because those customers represent a subset of all impacted machines, the actual number is likely higher.

In its report on the incident, Forrester said “cascading risk management failures fuelled the massive outage”, adding that the available evidence suggests the incident was largely a quality assurance failure.


“This was a content configuration update. For this type of content update, CrowdStrike didn’t conduct the same extensive testing as it did for sensor updates. There were two additional issues that precipitated the outage. First, there was an undetected bug in CrowdStrike’s testing/QA logic that failed to identify problematic content data,” the report stated.

“Second, because CrowdStrike didn’t stagger deployments for content updates, it deployed this update everywhere all at once. CrowdStrike has already committed to fixing both of these issues.”
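The two gaps Forrester describes correspond to two widely used release-engineering safeguards: validating content before it ships, and rolling it out in stages so that a bad update reaches a small group of machines first and can be halted before it spreads. The Python sketch below illustrates the idea only and is not based on CrowdStrike's actual pipeline. The `Ring` type, the field-count check in `validate_content`, the `release` helper and the 1% failure threshold are all hypothetical assumptions.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass
class Ring:
    """A hypothetical deployment ring: a named group of hosts."""
    name: str
    hosts: Sequence[str]


def validate_content(fields: Sequence[str], expected_count: int) -> bool:
    # Hypothetical pre-release check: reject content whose field count
    # differs from what the consuming code expects to read.
    return len(fields) == expected_count


def staged_rollout(rings: Sequence[Ring],
                   deploy: Callable[[str], bool],
                   max_failure_rate: float = 0.01) -> list[str]:
    """Deploy ring by ring; return the names of rings that passed.

    If a ring's failure rate exceeds max_failure_rate, stop there so
    later rings never receive the update.
    """
    passed = []
    for ring in rings:
        failures = sum(1 for host in ring.hosts if not deploy(host))
        rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if rate > max_failure_rate:
            break  # halt the rollout at the first unhealthy ring
        passed.append(ring.name)
    return passed


def release(fields: Sequence[str], expected_count: int,
            rings: Sequence[Ring], deploy: Callable[[str], bool]) -> list[str]:
    # Content that fails validation is never deployed anywhere.
    if not validate_content(fields, expected_count):
        return []
    return staged_rollout(rings, deploy)


# Usage: simulate an update that crashes every host in the "early" ring.
rings = [Ring("canary", ["c1", "c2"]),
         Ring("early", ["e1", "e2", "e3"]),
         Ring("broad", ["b1", "b2", "b3", "b4"])]
deployed: list[str] = []


def fake_deploy(host: str) -> bool:
    deployed.append(host)          # record every host that got the update
    return not host.startswith("e")  # "early" hosts fail


print(release(["a", "b"], 2, rings, fake_deploy))  # ['canary']
```

In this simulation the "broad" ring never receives the update. An all-at-once deployment, by contrast, behaves like a single ring containing every host, so a defect reaches the entire fleet before anyone can observe it.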

Rapid response content is one of two channels CrowdStrike uses to deliver security content configuration updates. It is designed to keep pace with a rapidly evolving threat landscape.

David Ferbrache, managing director of Beyond Blue, said the incident highlights the risks of rapid update mechanisms like this one, especially when the update targets software as widely deployed across enterprise devices as CrowdStrike Falcon.

“[T]he incident also lies in part on the risks of rapid automated updates in live production environments and the need for organizations to have the ability to control how those updates are applied and to balance the risk of a deferred update (potentially leaving security issues open but allowing additional testing) against the risk of immediate application. This is a fine balance, and sophisticated customers need to be able to strike that balance.”

Solomon Klappholz
Staff Writer

Solomon Klappholz is a Staff Writer at ITPro. He has experience writing about the technologies that facilitate industrial manufacturing, which led him to develop a particular interest in IT regulation, industrial infrastructure applications, and machine learning.