CloudFlare network suffers weekend collapse
Attempt to protect service from attack results in crash.
Thousands of websites were taken offline over the weekend, after web services provider CloudFlare suffered a router crash that took its entire network down.
The company first acknowledged the issue on Sunday morning, California time, stating "we're experencing (sic) a network-wide issue. Looking into the root cause," on Twitter.
CloudFlare promises to "protect and accelerate" websites.
"Once your website is a part of the CloudFlare community, its web traffic is routed through our intelligent global network. We automatically optimise the delivery of your web pages so your visitors get the fastest page load times and best performance," the company claims on its website.
"We also block threats and limit abusive bots and crawlers from wasting your bandwidth and server resources. The result: CloudFlare-powered websites see a significant improvement in performance and a decrease in spam and other attacks," it adds.
However, the company's attempt to thwart a distributed denial of service (DDoS) attack accidentally crashed its routers, taking all its clients' websites offline.
The company noticed packets between 99,971 and 99,985 bytes long attempting to access customers' DNS servers, which it took as an indication of a DDoS attack. It therefore wrote a rule to drop those packets, which inadvertently crashed every router that received the rule.
In a blog post explaining the outage, CloudFlare said: "Flowspec accepted the rule and relayed it to our edge network. What should have happened is that no packet should have matched that rule because no packet was actually that large. What happened instead is that the routers encountered the rule and then proceeded to consume all their RAM until they crashed."
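To illustrate why no packet should have matched, here is a minimal sketch that checks a length-based drop rule against the range of lengths a valid IPv4 packet can have. It assumes the lengths are in bytes and relies on the IPv4 total-length field being 16 bits wide, which caps a packet at 65,535 bytes. The function names and constants are hypothetical and are not CloudFlare's tooling or any router vendor's API.

```python
# Hypothetical sanity check for a packet-length match rule.
# Assumption: lengths are in bytes and refer to IPv4 packets.

MIN_IPV4_PACKET_BYTES = 20      # smallest possible IPv4 header, no payload
MAX_IPV4_PACKET_BYTES = 65_535  # total-length field is 16 bits wide


def rule_can_match(min_len: int, max_len: int) -> bool:
    """Return True if any valid IPv4 packet length lies in [min_len, max_len]."""
    if min_len > max_len:
        return False
    lowest = max(min_len, MIN_IPV4_PACKET_BYTES)
    highest = min(max_len, MAX_IPV4_PACKET_BYTES)
    return lowest <= highest


def packet_matches(packet_len: int, min_len: int, max_len: int) -> bool:
    """Return True if a packet of packet_len bytes falls inside the rule's range."""
    return min_len <= packet_len <= max_len


# The range reported in the outage lies entirely above the IPv4 maximum.
print(rule_can_match(99_971, 99_985))  # False
```

A check of this kind, run before a rule is pushed out, would flag a range that no real packet can satisfy. It does not explain the crash itself, which by CloudFlare's account came from how the routers handled the rule.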
The manner in which the crashes happened prevented the routers from automatically rebooting, the company added.
The company has said that accounts covered by SLAs will receive credits, and that it is still investigating what caused the routers to crash.
Jane McCallion is ITPro's Managing Editor, specializing in data centers and enterprise IT infrastructure. Before becoming Managing Editor, she held the role of Deputy Editor and, prior to that, Features Editor, managing a pool of freelance and internal writers, while continuing to specialize in enterprise IT infrastructure and business strategy.
Prior to joining ITPro, Jane was a freelance business journalist writing as both Jane McCallion and Jane Bordenave for titles such as European CEO, World Finance, and Business Excellence Magazine.