Windows Azure failed the leap year test
Microsoft says its cloud service was downed by a calendar error
Microsoft now says that the real cause of the Windows Azure outage that took the UK government's new G-Cloud appstore offline for several hours yesterday was a leap-year bug, not a power problem as originally reported.
The embarrassing problems, which affected a raft of Azure customers, began appearing just after midnight GMT on Wednesday 29th February and spread over the following hours as Microsoft technicians struggled to locate and fix the bug. Writing on the Azure blog, Bill Laing, Microsoft's corporate VP for server and cloud, claimed: "The fix was successfully deployed to most of the Windows Azure sub-regions and we restored Windows Azure service availability to the majority of our customers and services by 2:57AM PST [10:57AM GMT], Feb 29th."
However, angry Azure customers writing on the MSDN blog reported that problems were still occurring into the morning of Thursday 1st March, and slammed Microsoft for not telling them the truth.
"The service dashboard isn't telling me anything outside of what I already know, except it's saying 'Deployed applications will continue to run.' I can tell you this isn't true as the deployed applications aren't responding," wrote one.
"We were facing lot of issues yesterday. Today we are able to create new deployments but still not able to delete old deployments on services on North Central US," another said.
According to Laing, however, Microsoft had responded as fast as possible. "While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year," he wrote. "Once we discovered the issue we immediately took steps to protect customer services that were already up and running, and began creating a fix for the issue."
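Laing's post does not spell out the faulty calculation, so the following is only a hypothetical illustration of how a date calculation can go wrong on 29th February, not a reconstruction of Microsoft's code. It assumes a routine that computes "the same day next year" by simply adding one to the year; the function names are invented for this sketch.

```python
from datetime import date
import calendar


def naive_one_year_later(d: date) -> date:
    # Hypothetical buggy approach: bump the year and keep month and day.
    # For 29 February this asks for a date that does not exist.
    return date(d.year + 1, d.month, d.day)


def safe_one_year_later(d: date) -> date:
    # Clamp the day to the last valid day of the target month,
    # so 29 February maps to 28 February in a non-leap year.
    year = d.year + 1
    last_day = calendar.monthrange(year, d.month)[1]
    return d.replace(year=year, day=min(d.day, last_day))


leap_day = date(2012, 2, 29)

try:
    naive_one_year_later(leap_day)
except ValueError as exc:
    # Python rejects 29 February 2013 because 2013 is not a leap year.
    print("naive calculation failed:", exc)

print("safe calculation:", safe_one_year_later(leap_day))
```

On any other day of the year the two functions agree, which is one reason a bug of this kind can sit unnoticed until a leap day arrives.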