We work hard to minimize downtime, but we are reliant on Joyent in the same way you are reliant on us. Yesterday there was a significant outage due to an operator error at Joyent, taking down most of our infrastructure in the US-East-1 datacenter from approximately 4:30pm to 6:15pm Eastern Time (UTC-5).
Since US-East-1 is where our status page runs, it was also down during the outage. We know: "Who relies on their own infrastructure for a status page?!" Our reasoning had been that these kinds of doomsday scenarios are few and far between, but it is exactly at these times that our status page is the most vital part of our communication with customers.
So we decided this could not happen again. Until now, our status page was an application running on Nodejitsu, backed by Redis, that served the current architecture status. Besides depending on the very infrastructure it reported on, our old status page didn't communicate well: it was difficult to tell which parts of our infrastructure were failing and why. We debated several options for improvement, from a static page hosted outside our cloud to a full Node.js app with a backend to manage incidents. Each solution had downsides, ranging from maintainability to the investment required.
This is why we are proud to announce that our new status page is now run by the awesome service provided by statuspage.io.
- you can subscribe to the status page to receive notifications about incidents
- it runs externally, so we no longer rely on our own infrastructure
- it is easy to maintain, allowing us to communicate more effectively
How does it look? See it in action here. Finally, the status page will now provide accurate, minute-by-minute system metrics for vital parts of our infrastructure. Our performance has never been more transparent than it is now.