The Amazon outage that took place a few weeks ago was rightfully referred to as “the computing equivalent of an airplane crash”. The outage of Amazon’s Elastic Compute Cloud (EC2) left hundreds of clients that rely on its services unable to access data, caused major server interruptions, and shut some sites down entirely.
Most of the media attention surrounding the outage focused on the crashes of popular sites like Reddit, HootSuite, Foursquare and Quora. But it was the start-ups, the small bloggers and an array of online service providers like Heroku that were hit the hardest.
Analysts say that because smaller websites are less apt to pay for extensive backup and recovery services, they suffered permanent data loss. Amazon said it would offer a 10-day credit to customers whose websites were affected.
The outage revived concerns and debate about whether cloud-based services are really ready to meet business needs that require virtually continuous uptime. Still, this was far from the only instance of an outage disrupting web services.
Here are other major outages that shook the web.
When: May 14, 2009
On its blog, Google reported that an error in one of its systems caused it to direct some of its web traffic through Asia, which created a traffic jam. And a major jam it was. It caused Google’s search site, as well as its other properties including YouTube, Gmail, Google Analytics, Google Maps, Google Docs, AdSense and Blogger, to run exceptionally slowly or not at all. Considering the influx of people and businesses migrating to Google services like calendar, docs and email, the blow was felt hard globally.
The productivity of the US workforce rose two-fold and office managers everywhere rejoiced.
Actually, Facebook called it the site’s worst outage in 4 years. A glitch occurred in an automated system that was meant to prevent errors, and the technical difficulties were so severe that Facebook had to shut down the entire site to fix the problem. For two days, the site was unavailable to most of its 550 million users. Even the “Like” button vanished from hundreds of sites during the outage, leaving many websites that use the system with errors on their pages.
When: June 2010
Twitter said June was the site’s worst month in terms of “stability and service”. The popular microblogging site had been working through system modifications in order to provide greater stability. This was, of course, happening while the site was experiencing record traffic. With all the vuvuzela rage during the summer’s World Cup games, Twitter suffered repeated outages when user activity spiked to unprecedented levels.
Twitter had encountered its first major outage during the January earthquake in Haiti.
When: December 6, 2010
Tumblr and its microblogs were down for nearly 24 hours. It seems that during planned maintenance (that wasn’t supposed to interrupt service), an issue arose that took down a critical database cluster. Tumblr’s downtime, similar to that of Twitter, was also the result of its success. By that time, the site was generating over 500 million page views each month.
When: August 19, 2010
A huge Gap promotion ($25 for $50 worth of Gap apparel) drove so much traffic to the Groupon site that it managed to crash the #1 deals site’s servers. Groupon put up a 404 error page explaining that high demand for the Gap deal had brought down the site. Despite the technical difficulties, 400,000 Groupons were sold, and the deal still brought in $11 million in revenue.
When: April 21, 2011
For some companies, outages aren’t temporary. Perhaps the most disastrous outage yet has been the PlayStation Network’s three-week outage. Some 70 million of Sony’s 77 million registered users had their records compromised, including credit card information. At two weeks, the outage had already set Sony back by $20 million. It seems hackers are responsible for the outage. While many speculated that Anonymous was behind the malicious hack (in early April, on its open-posting site AnonNews, the group accused Sony of censorship, violating privacy rights and abusing the legal system), the group denied any responsibility for the outage.
If you are on the PlayStation Network, it is highly recommended that you change your passwords and cancel your credit card, if you haven’t done so already.
Overall, the AWS outage and these other major outages have illustrated the limits of technology and the underlying potential for human error. The biggest lesson these companies can learn is to improve security measures and transparency with their customers in order to keep people in the loop when mistakes happen. Because mistakes are bound to happen.