As reported on many sites (link-1, link-2), Amazon's S3 service was down for more than seven hours today. The outage affected many services, including the popular Twitter and our flagship iPhone application, PhotoShare.
Because we store all of our users' photos on S3, most of those photos became inaccessible while S3 was down. Fortunately, we have an image caching mechanism on our application server (an EC2 instance), which temporarily holds photos when users post them from their iPhones and asynchronously moves them to S3. While S3 was down, this application server acted as a temporary storage server and allowed our users to keep posting photos from their iPhones during the downtime.
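For readers curious what such a staging cache might look like, here is a minimal sketch in Python. It is an illustration under assumptions, not PhotoShare's actual code: the `StagingStore` class, its method names, and the `upload` callable are all hypothetical, and an in-memory dictionary stands in for the application server's disk.

```python
import queue


class StagingStore:
    """Hypothetical staging cache: holds photos locally, moves them out later."""

    def __init__(self, upload):
        # upload(key, data) is an assumed callable that raises on failure.
        self.upload = upload
        self.local = {}               # key -> bytes, stands in for local disk
        self.pending = queue.Queue()  # keys still waiting to be moved

    def post(self, key, data):
        """Accept a photo immediately, whether or not remote storage is up."""
        self.local[key] = data
        self.pending.put(key)

    def get(self, key):
        """Serve a photo from the local copy if it has not been moved yet."""
        return self.local.get(key)

    def drain_once(self):
        """Try every pending key once; return how many were moved."""
        moved, retry = 0, []
        while True:
            try:
                key = self.pending.get_nowait()
            except queue.Empty:
                break
            try:
                self.upload(key, self.local[key])
            except Exception:
                retry.append(key)     # keep the local copy and try again later
                continue
            del self.local[key]       # drop the local copy only after success
            moved += 1
        for key in retry:
            self.pending.put(key)
        return moved
```

A background worker would call `drain_once()` periodically. While the remote store is unreachable, every upload fails, the photos stay in `local`, and posting keeps working.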
The majority of photos posted by users (before and after this incident) are intact, but we unfortunately lost a few photos that our server transferred to S3 at the very beginning of the downtime (we are contacting the owners of those photos about this issue). It seems our application server hit a very small window right between the life and death of the S3 server.
We take this issue very seriously and are installing an additional protection mechanism to secure our users' data, so that a future outage like this won't cause any data loss.
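The post does not say what that protection mechanism is. One common option, sketched here purely as an assumption, is to read each photo back after uploading it and delete the local copy only when the stored bytes match. The function name `move_with_verification` and the `put` and `get` callables are hypothetical stand-ins for whatever storage client is in use.

```python
import hashlib


def move_with_verification(key, local, put, get):
    """Upload local[key], read it back, and delete the local copy only on a match.

    put(key, data) and get(key) are assumed callables for the remote store;
    get is assumed to return the stored bytes, or None if the key is missing.
    """
    data = local[key]
    expected = hashlib.sha256(data).hexdigest()
    try:
        put(key, data)
        stored = get(key)
    except Exception:
        return False                  # remote store failed; keep the local copy
    if stored is None or hashlib.sha256(stored).hexdigest() != expected:
        return False                  # write was accepted but not stored intact
    del local[key]                    # safe to drop the local copy now
    return True
```

With a check like this, an upload that is acknowledged but never actually stored leaves the photo on the application server instead of losing it.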
While we did not like this incident at all, it was a great lesson for us. Cloud computing services like Amazon's S3 really make sense for small companies like us that need flexible scalability. While Amazon has a lot of work to do to raise its uptime, this incident also left some homework for us (such as the additional protection mechanism described above).
I agree completely with your conclusions and with the comments you left on my own blog. Cloud computing is the way forward, but we need to work a little harder to make sure we are not beholden to just a single vendor.
After all, we would never have hosted our servers with an ISP that had just one pipe to the Internet, yet here we all are, flocking to the cloud providers and putting all our eggs in one basket.
Posted by: Alan | July 21, 2008 at 06:05 AM