Meta: What can the data center industry learn from the Facebook blackout?

Oct 07, 2021 | Posted by Madaline Dunn

Social media services Facebook, WhatsApp and Instagram recently experienced a six-hour blackout, during which users were unable to log in, message, or browse the platforms. But users weren't the only ones whose access was interrupted. According to reports, tech giant Facebook also lost access to its internal systems and infrastructure, and lost $100M in revenue. So what do staff shortages have to do with the blackout, and could it have been avoided?

Explaining the cause of the blackout, Facebook's vice president for infrastructure, Santosh Janardhan, wrote in a blog post: "Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication."

He added: "This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt."

So, as outlined by Bisnow, the issue didn't stem from a data center mishap. All the same, the outage lasted as long as it did because of the time it took engineers to reach the location of the issue in Santa Clara, California.

Interesting Engineer spoke to a Facebook insider, who commented on why it took so long to get everything back up and running. They said: "There are people now trying to gain access to the peering routers to implement fixes, but the people with physical access is separate from the people with knowledge of how to actually authenticate to the systems and people who know what to actually do, so there is now a logistical challenge with getting all that knowledge unified."

The insider added that this was in part "due to lower staffing in data centers due to pandemic measures." A total of 75% of Facebook's 60,000-strong workforce are currently not in the office, meaning that there's a significant lack of physical presence in its data centers.

This has, of course, put data center staff shortages back under the microscope. It wasn't long ago that research from the Uptime Institute estimated that 300,000 more staff will need to join the industry by 2025. Yet its most recent report found that nearly half of data center owners and operators have difficulty finding skilled candidates to fill positions.

It appears that while automation and remote monitoring are on the rise in the industry, the importance of workers' proximity to and involvement with data centers cannot be overstated.