Cloudflare servers "panic" after leap second

A leap second added to the end of 2016 sent servers at DNS security service Cloudflare into a "panic", causing some of them to briefly drop offline.

The extra second, which made the final minute of 2016 last 61 seconds, hit a small number of Cloudflare servers at midnight UTC on New Year's Day, when the code was unable to handle the unexpected timestamp.

Any customers affected would have seen an error message saying servers cannot be reached, instead of being directed to the website they were trying to access.

The extra second was added to keep Coordinated Universal Time (UTC) in step with the Earth's rotation, which is gradually slowing down. However, the DNS service used by Cloudflare works under the assumption that 'time cannot go backwards', and the slight extension to 2016 caused the code to perceive a "negative resolution time".

"A number went negative when it should always have been, at worst, zero," said Cloudflare programmer John Graham-Cumming. "A little later this negative value caused RRDNS to panic... the net effect was that some DNS resolutions to some Cloudflare managed web properties failed."
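To illustrate how a resolution time can go negative, here is a minimal Python sketch. It is not Cloudflare's code: the function names, the millisecond timestamps and the one-second backwards step are all assumptions chosen for the example, and the `ValueError` merely stands in for the panic described above.

```python
def elapsed_ms(start_ms: int, end_ms: int) -> int:
    """Naive elapsed time: assumes end_ms is never earlier than start_ms."""
    return end_ms - start_ms


def use_resolution_time(rtt_ms: int) -> int:
    """Hypothetical downstream consumer that rejects negative durations."""
    if rtt_ms < 0:
        raise ValueError(f"negative resolution time: {rtt_ms} ms")
    return rtt_ms


# Simulated wall-clock readings in milliseconds since midnight (assumed values).
# The query starts at 23:59:59.900. The reply arrives 200 ms later, but the
# clock has been stepped back one second in the meantime, so it reads
# 23:59:59.100 instead of 00:00:00.100.
start = 86_399_900
end = 86_399_100

rtt = elapsed_ms(start, end)  # negative, because the clock moved backwards
```

Passing `rtt` to `use_resolution_time` raises the error, which is the general shape of the failure: the subtraction itself succeeds, and the trouble only surfaces later in code that takes a non-negative duration for granted.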

The problem is believed to have affected only a small number of customers using CNAME DNS records with the company, and fewer than 1% of user requests to those customers' servers resulted in an error.

"The most affected machines were patched in 90 minutes and the fix was rolled out worldwide by 0645 UTC," added Graham-Cumming. "We are sorry that our customers were affected, but we thought it was worth writing up the root cause for others to understand."

The new patch will allow the code behind the DNS service to 'normalise' in the unlikely event time is perceived to have skipped backwards.
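The article does not say how the patch normalises the value. One plausible reading is that a negative duration is clamped to zero, which might look like the sketch below. The `normalised_elapsed` helper is hypothetical. The `timed` helper shows a different, commonly used alternative, measuring durations with a monotonic clock, and is not described in the source.

```python
import time


def normalised_elapsed(start: float, end: float) -> float:
    """Return end - start, treating a backwards clock step as zero elapsed time."""
    return max(0.0, end - start)


def timed(fn):
    """Run fn and measure its duration with a monotonic clock.

    time.monotonic() is not affected by adjustments to the system clock,
    so the measured duration cannot be negative.
    """
    t0 = time.monotonic()
    result = fn()
    return result, time.monotonic() - t0
```

With the clamp in place, `normalised_elapsed(10.0, 9.2)` yields `0.0` rather than a negative number, so code further down the line never sees a value it cannot handle.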

Although widespread software meltdowns have yet to materialise after a leap second, the change in timestamps continues to hamper high-profile tech companies. Both Twitter and Android were hit by 2015's mid-year leap second, as the services started to display notifications with incorrect dates and times.

Other major tech providers, including Instagram, Netflix and Amazon Web Services, also experienced crippling web crashes in 2015. This time, however, the disruption appears to be on a much smaller scale.

Google recently announced it would be creating its own unit of time to accommodate 2016's leap second. 'Smeared time' spread the extra second gradually over the course of 31 December 2016 by making each second very slightly longer, meaning the company was able to keep all servers that use Google's Network Time Protocol (NTP) service in time with the changes.
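The arithmetic behind a smear can be sketched as a simple linear ramp. The function below is an illustration only: the 24-hour window, the linear shape and the name `smear_offset` are assumptions for the example, not Google's published configuration.

```python
def smear_offset(elapsed_s: float, window_s: float = 86_400.0,
                 leap_s: float = 1.0) -> float:
    """Portion of the leap second absorbed after elapsed_s seconds of the window.

    Assumes a linear smear: the offset grows evenly from 0 at the start of
    the window to leap_s at the end, and stays at leap_s afterwards.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    fraction = min(max(elapsed_s / window_s, 0.0), 1.0)
    return leap_s * fraction
```

Under these assumptions half of the extra second has been absorbed halfway through the window, and each smeared second is longer than a normal one by about 1/86,400 of a second, or roughly 11.6 microseconds. No clock ever sees a repeated or 61st second.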

Contributor

Dale Walker is a contributor specializing in cybersecurity, data protection, and IT regulations. He is the former managing editor at ITPro, as well as its sibling sites CloudPro and ChannelPro. He spent a number of years reporting for ITPro from numerous domestic and international events, including those held by IBM, Red Hat and Google, and has been a regular reporter at Microsoft's various yearly showcases, including Ignite.