Things happen, and datacenters go dark sometimes. Still, there were surprisingly few datacenter outages that lasted long enough to make the news this year. But of course, many more private outages may have occurred that no one talked about.
Two of the eight outages on the list — the longest, interestingly — stemmed from physical migrations of datacenters. Apparently, like asbestos, servers are best left undisturbed. Or perhaps datacenter managers don't know exactly what they've got until it's gone. A survey by Aperture Technologies Inc.'s Research Institute revealed that almost half of organizations (49 percent) aren't able to track changes across physical aspects of their datacenter, including space, power and cooling.
Two others were of the "backhoe" variety, where a contractor cut precisely the wrong cable. Power outages, in part caused by energy-gobbling datacenters, account for the rest of the outages that have causes attributed, except one. T-Mobile USA stands out as the only flood-caused datacenter disaster. You would think those expensive servers would be on the second floor, at least.
1. Muffed Migration: More than 165,000 Web sites were offline for up to seven days in early November, thanks to a datacenter migration that went awry for Web-hosting service NaviSite Inc. The migration was planned as a result of NaviSite's acquisition of another hosting company, Alabanza Corp. LLC. The new customers were to be moved to newer, more scalable equipment.
Initially, 200 servers were to be moved by truck while 650 were transferred over the Internet. But the virtual migration process proved too slow, and NaviSite began physically moving all servers. Technical glitches grew from there and soon mushroomed into a week-long ordeal.
2. 15 Hours to Several Days: In August, thousands of customers lost access to their Web sites and hosted business applications for several days, when 3,700 servers were relocated 270 miles. The servers were owned by ValueWeb, a subsidiary of the Web hosting company Hostway Corp. Customers were notified that the move would take 15 to 18 hours.
3. Texas Truckin': Hosting firm Rackspace US Inc. suffered back-to-back outages in just 36 hours. The first outage was caused by a "mechanical failure" in the company's Dallas datacenter on Sunday, November 11, 2007. Customers experienced "intermittent service interruptions" and a team of more than 100 techs was deployed to find and fix the problem. Then, on the following Monday evening, a pickup truck struck a utility pole and brought down the transformer feeding the datacenter.
Emergency generators kicked in and operated as intended, and Rackspace transferred its power to its secondary utility power system and brought its chilling units back online. However, the utility shut down power in order to give emergency workers safe access to the downed transformer. Temperatures rose within the datacenter. Rackspace shut down selected servers in order to avoid overheating all of them.
4. Site Giants Go Down: A power outage hit downtown San Francisco on July 24, knocking out 365 Main Inc. — a 227,000 square-foot facility and datacenter development company. At least three of 365 Main's eight co-location centers were knocked out. Among the Web sites that went down for a few hours were giants like Craigslist, GameSpot, Yelp, Technorati, Typepad and Netflix. Power was restored after 45 minutes.
365 Main later estimated that between 20 and 40 percent of its customers were affected. The company ultimately attributed the disaster to backup generators made by the Dutch firm Hitec, which failed to kick in. It seems that an incorrect setting in one tiny generator component prevented the component's memory from resetting properly.
5. Lycos Lost: A network outage at hosting provider SAVVIS Inc. knocked out several Web sites including Web portal Lycos Inc. in January 2007. The Web sites of CIO magazine and CSO magazine also went offline. Lycos Mail and the Tripod personal Web-hosting service went dark. The outage occurred while the main data line to SAVVIS was being repaired, and the secondary line was severed, leaving the hosting company with no Internet connection. The outage lasted overnight.
6. Millions Without Email: Research In Motion Ltd. suffered a systemwide outage that left millions of customers unable to access their email in April 2007. The company eventually traced the cause of the outage to "a new, noncritical system routine that was designed to provide better optimization of the system's cache." This relatively trivial software had been tested, but the problems didn't occur until it was deployed in production mode. Research In Motion began its failover process to a backup system, but the process failed. The company has since conducted a thorough review of its testing, monitoring and failover procedures.
7. Ongoing Outages: ServerBeach, the discount Web hosting arm of PEER 1, experienced a service outage in November at its San Antonio datacenter. It was caused by the severing of two circuits in a fiber-optic line that the city was relocating. What's remarkable is ServerBeach's record of repeated outages. Its Virginia datacenter went dark during "ongoing maintenance" in October 2007, causing at least four hours of downtime for some customers. Earlier, in June, the same datacenter was affected by a power outage; only one of two backup generators came online. And in June 2006, the Virginia datacenter again went dark during "ongoing maintenance."
8. Flooded Faves: Torrential rains flooded a T-Mobile datacenter in Bothell, Wash., in December. The resulting outage knocked out new activations, "short code" dialing to T-Mobile customer service, access to the company's Web site and "My Faves" settings. The outage lasted less than a day.