Numerous service errors on AWS today
We've weathered the last few outages very well, but we got hit badly by this one.
This is what I believe happened to us: We had a lot of Spot instances. After about two hours of not being able to spin up instances, the Spot market went screwy and prices shot up on multiple instance types across multiple availability zones at the same time.
This screwed us. Many of our redundant services were deployed to multiple spot instances across multiple availability zones and most of them were terminated at the same time. Afterwards, we weren't able to spin up replacements and had to scramble to move our key services onto unrelated instances that were spared.
Lesson learned: multiple availability zones aren't enough to protect you from wild spot price fluctuations, and an AWS outage can aggravate spot prices pretty dramatically.
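For anyone who wants to see the failure mode spelled out, here is a rough sketch. It is a toy model, not our actual setup: it assumes a spot instance is terminated as soon as the market price for its type and zone exceeds its bid, and the `Instance` class, the service name, the bids, and the prices are all made up for illustration. It just shows that spreading spot instances across zones doesn't help when prices spike in every zone at once, while a single on-demand instance keeps the service alive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instance:
    service: str
    az: str
    instance_type: str
    max_bid: Optional[float]  # None means on-demand (hypothetical modelling choice)


def survivors(fleet, spot_prices):
    """Return the instances still running once spot prices move to spot_prices.

    Assumption: a spot instance dies when the price for its
    (instance_type, az) pair rises above its bid. On-demand instances
    are not affected by spot prices at all.
    """
    alive = []
    for inst in fleet:
        if inst.max_bid is None:
            alive.append(inst)
            continue
        price = spot_prices.get((inst.instance_type, inst.az))
        if price is None or price <= inst.max_bid:
            alive.append(inst)
    return alive


def services_up(fleet, spot_prices):
    """Names of services that still have at least one running instance."""
    return {inst.service for inst in survivors(fleet, spot_prices)}


zones = ("us-east-1a", "us-east-1b", "us-east-1c")

# Made-up spike: the price jumps in every zone at the same time.
spike = {("m1.large", az): 5.00 for az in zones}

# "Redundant" service: one spot instance per zone, same bid everywhere.
all_spot = [Instance("api", az, "m1.large", 0.50) for az in zones]

# Same service, but the first zone runs on-demand instead of spot.
mixed = [Instance("api", zones[0], "m1.large", None)] + all_spot[1:]

print(services_up(all_spot, spike))  # set()
print(services_up(mixed, spike))     # {'api'}
```

The point is only that zone diversity and market diversity are separate things. How big an on-demand baseline to keep is a cost trade-off this sketch doesn't try to answer.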
Also, as a final note, I feel this outage was severe enough to warrant something other than a green "everything's OK!!!" icon.
I love how I haven't been able to launch a new instance for hours and the status page is still all green check marks. Oh, a few of them have an "i" subscript.
Love how Amazon pushes multi-AZ redundancy when it's always an entire region that gets screwed up. Oh, and us-east-1 again? Seriously?
When I worked there, I always thought the concept of "green eye" was odd, especially how every service can define it themselves. It's one of the few political things I remember from my time there.