Sep 17, 2025 | Posted by Abdul-Rahman Oladimeji
Meta deployed a 129,000 Nvidia H100 GPU data center cluster across five adjacent data centers. The VP of engineering for its infrastructure division explained that the company had to move thousands of racks to develop its then-largest supercomputer, emptying out live facilities.
“It's extremely expensive to take down data centers that are in production, because these are expensive investments, and you want them to keep running and doing useful work,” Yee Jiun Song said at the AI Infra Summit. In our case, these data centers were serving live workloads, and we have to take them down as quickly as possible, while not causing any user-perceived outages.”
He added: “In order to do this quickly, we had to redesign the loading docks in our data centers, build brand new robots to move these 1,000-pound racks, and even design crateless packaging for these racks to speed up the moves. We quadrupled the amount of networking in these buildings, which involved pulling out and replacing hundreds of meters of network fiber, and digging new trenches to connect the five buildings together. And we did all of this in a matter of months.”
