At ZAG, we have been helping companies prepare for, and recover from, tech disasters for many years. This week was the first time that we personally experienced one. We learned (and relearned) many lessons in this disaster, and the goal of this posting is to share some of those lessons.
We first became aware of the situation when our network abruptly shut down.
On Monday, the air-conditioning system in the suite above us, which is under construction, froze up, causing an enormous amount of water to flow into our server room. As luck would have it, the water hit our most critical corporate operating systems.
Essentially, gallons and gallons of water poured down from the ceiling into our critical systems. The water damaged our Number Four rack from the top down: the switches were a total loss, the servers were badly damaged, the SAN was slightly affected, and the UPS units came through unscathed.
The first and most obvious lesson is that business continuity is as important as backups. Business continuity is key to business survival, especially during a technical emergency.
First, our voicemail system went down. This system handles the routing of support calls coming from our clients. Fortunately, we had planned for such situations by putting a system in place with AT&T whereby all incoming calls are redirected to a different number if our phone system cannot be reached. This enabled us to continue supporting our clients even without a phone system.
Fortunately, the water damage happened after hours, so the vast majority of incoming calls were support related. This meant that the ZAG PRI was not overloaded with incoming calls. Thanks to our business continuity planning, we were able to keep operating and supporting our clients with little delay.
Our second lesson was that our data center design was vindicated. If our backups had been in the same rack as our servers, the experience could have been much worse.
Losing a single rack, which is what happened in our case, is a scenario that is often overlooked when planning a data center layout. ZAG has placed all backup servers in a different location, which ensured that the backup server was protected from the localized disaster we experienced.
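The placement rule described above, keeping backups out of the rack that holds the systems they protect, can be checked against an inventory list. The following is only a minimal sketch in Python: the `System` record, the rack labels, and the sample inventory are hypothetical illustrations, not ZAG's actual tooling or layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class System:
    name: str
    rack: str                        # physical rack, treated as one failure domain
    backup_of: Optional[str] = None  # name of the system this one backs up, if any


def shared_rack_violations(systems: List[System]) -> List[Tuple[str, str, str]]:
    """Return (primary, backup, rack) for every backup that shares a rack with its primary."""
    by_name = {s.name: s for s in systems}
    violations = []
    for s in systems:
        if s.backup_of is None:
            continue  # not a backup system, nothing to check
        primary = by_name.get(s.backup_of)
        if primary is not None and primary.rack == s.rack:
            violations.append((primary.name, s.name, s.rack))
    return violations


# Hypothetical inventory: one backup placed well, one placed badly.
inventory = [
    System("file-server", rack="rack-4"),
    System("file-backup", rack="rack-1", backup_of="file-server"),
    System("mail-server", rack="rack-4"),
    System("mail-backup", rack="rack-4", backup_of="mail-server"),
]

for primary, backup, rack in shared_rack_violations(inventory):
    print(f"{backup} shares {rack} with {primary}")
```

A check like this only covers rack-level separation. The same idea could be extended to rooms, buildings, or sites by comparing a different field.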
The final lesson was the power of virtualization. Had our key systems not been virtualized, and had they taken the damage that several of our virtualization hosts did, we would have been down for much longer.
We completely lost three Hyper-V servers; their motherboards were destroyed by the leaking water. However, we benefited greatly from the fact that we live in a virtual environment and that our SAN suffered only minor damage. Thankfully, we had enough virtual hosts left to bring up our mission-critical servers and keep the business running.
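Whether the remaining hosts can carry the mission-critical servers is a question that can be answered before a disaster rather than during one. The sketch below, in Python, is a simplified illustration that considers memory only and uses a first-fit-decreasing heuristic. The host names, virtual machine names, and sizes are hypothetical, and the heuristic can report a failure in rare cases where a cleverer packing would fit.

```python
from typing import Dict, Optional


def place_critical_vms(
    host_free_gb: Dict[str, int], vm_need_gb: Dict[str, int]
) -> Optional[Dict[str, str]]:
    """Assign each VM to a host with enough free memory, largest VM first.

    Returns a {vm: host} mapping, or None if some VM could not be placed.
    """
    free = dict(host_free_gb)  # copy so the caller's data is not modified
    placement = {}
    for vm, need in sorted(vm_need_gb.items(), key=lambda kv: kv[1], reverse=True):
        host = next((h for h, f in free.items() if f >= need), None)
        if host is None:
            return None  # this VM does not fit on any remaining host
        free[host] -= need
        placement[vm] = host
    return placement


# Hypothetical example: two hosts remain after others are lost.
surviving_hosts = {"host-d": 64, "host-e": 32}
critical_vms = {"mail": 32, "ticketing": 16, "phones": 24}

plan = place_critical_vms(surviving_hosts, critical_vms)
print(plan if plan is not None else "Not enough capacity for critical systems")
```

Running a check like this against each "lose one rack" scenario gives an early warning when spare capacity has quietly been used up.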
Our disaster this week was real. The damage to our systems was great. Nevertheless, we had business continuity practices and a recovery methodology in place, which helped us successfully weather the “storm” without a significant loss in service.