So how does AI impact the data center? Well, back in 2014 Google deployed DeepMind AI (using machine learning, an application of AI) in one of its data center facilities. The result? A consistent 40 percent reduction in the amount of energy used for cooling, which equated to a 15 percent reduction in overall PUE overhead after accounting for electrical losses and other non-cooling inefficiencies. It also produced the lowest PUE the site had ever seen. Based on these significant savings, Google looked to deploy the technology across its other sites and suggested that other companies would do the same.
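To see how a 40 percent cut in cooling energy can translate into a 15 percent cut in PUE overhead, a quick back-of-envelope calculation helps. The figures below (a starting PUE of 1.5, with cooling making up a portion of the overhead) are illustrative assumptions, not Google's published numbers:

```python
# Back-of-envelope: how a 40% cooling-energy cut maps to PUE overhead.
# All inputs are illustrative assumptions, not Google's published figures.

it_energy = 1.0          # normalize the IT load to 1 unit of energy
cooling = 0.19           # assumed cooling energy (in units of IT load)
other_overhead = 0.31    # assumed electrical losses + other non-IT energy

pue_before = (it_energy + cooling + other_overhead) / it_energy
cooling_after = cooling * (1 - 0.40)   # the 40% cooling reduction
pue_after = (it_energy + cooling_after + other_overhead) / it_energy

overhead_before = pue_before - 1.0     # PUE overhead is everything above 1.0
overhead_after = pue_after - 1.0
print(f"PUE: {pue_before:.3f} -> {pue_after:.3f}")
print(f"PUE overhead reduced by {1 - overhead_after / overhead_before:.0%}")
```

With these assumed numbers, cooling is a little under 40 percent of the total overhead, so a 40 percent cooling cut lands at roughly a 15 percent overhead reduction.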
Facebook’s mission is to “give people the power to build community and bring the world closer together,” as outlined in its white paper Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, which describes the hardware and software infrastructure that supports machine learning at a global scale.
To give you an idea of how much computing power AI and ML need, Andrew Ng, chief scientist at Baidu’s Silicon Valley Lab, said that training one of Baidu’s Chinese speech recognition models requires not only four terabytes of training data, but also 20 exaflops of compute – that’s 20 billion billion math operations – across the entire training cycle.
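To put 20 exaflops in perspective, consider how long that much arithmetic takes at a given sustained rate. The hardware figures below are hypothetical, chosen only to illustrate the scale – and why the work gets spread across many machines:

```python
# Back-of-envelope: wall-clock time to perform 20 exaflops of training compute.
# The hardware figures below are illustrative assumptions, not Baidu's setup.

TOTAL_OPS = 20e18   # 20 exaflops = 20 billion billion math operations

def training_days(sustained_flops: float, num_machines: int = 1) -> float:
    """Days of continuous compute, assuming perfect scaling across machines."""
    seconds = TOTAL_OPS / (sustained_flops * num_machines)
    return seconds / 86_400

# e.g. one accelerator sustaining 50 teraFLOPS vs. a 32-machine cluster
print(f"{training_days(50e12):.1f} days on one machine")       # ~4.6 days
print(f"{training_days(50e12, num_machines=32):.2f} days on 32 machines")
```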
But what about AI and our data center infrastructure? How does AI impact the design and deployment of the data center facilities – of every size and shape – that we are looking to build, rent, or refresh to accommodate this innovative, cost-saving, and life-saving technology?
ML can run on a single machine, but thanks to the incredible volume of data it processes, it is typically run across multiple machines, all interlinked to ensure continuous communication during the training and data-processing phases – with low latency and no interruption to the service at our fingertips, screens, or audio devices (see the sketch below). Our collective desire for more and more data is driving exponential growth in the amount of bandwidth required to satisfy even the simplest of our whims.
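The dominant pattern behind that multi-machine setup is data-parallel training: each machine computes gradients on its own shard of the data, then all machines synchronize (an “all-reduce”) before applying the same weight update, which is exactly why low-latency, uninterrupted links between them matter so much. The sketch below simulates that synchronization step in plain Python, with in-process “workers” standing in for networked machines:

```python
# Minimal sketch of synchronous data-parallel training: each "worker" holds a
# shard of the data and computes a local gradient; the gradients are then
# averaged (the all-reduce step a real cluster performs over the network).
import random

random.seed(0)
data = [(x, 3.0 * x + 1.0) for x in [random.uniform(-1, 1) for _ in range(400)]]
NUM_WORKERS = 4
shards = [data[i::NUM_WORKERS] for i in range(NUM_WORKERS)]  # split data across "machines"

w, b, lr = 0.0, 0.0, 0.1  # model y = w*x + b, learning rate

def local_gradient(shard, w, b):
    """Gradient of mean squared error on this worker's shard only."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x / len(shard)
        gb += 2 * err / len(shard)
    return gw, gb

for step in range(200):
    grads = [local_gradient(s, w, b) for s in shards]    # computed in parallel in reality
    gw = sum(g[0] for g in grads) / NUM_WORKERS          # all-reduce: average the gradients
    gb = sum(g[1] for g in grads) / NUM_WORKERS
    w, b = w - lr * gw, b - lr * gb                      # every worker applies the same update

print(f"learned w={w:.2f}, b={b:.2f} (target: w=3.00, b=1.00)")
```

In a real cluster that averaging step crosses the network on every iteration, so any latency or packet loss on the links between machines directly slows the entire training job.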
This bandwidth needs to be distributed within and across multiple facilities using more complex architecture designs, where traditional spine-and-leaf networks no longer cut it – we are talking about super-spine and super-leaf networks to provide a highway for all of the complex algorithmic computing to flow between different devices and ultimately back to our receptors.
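To make the fabric arithmetic concrete, the sketch below models a simple two-tier leaf-spine fabric and computes its oversubscription ratio (server-facing bandwidth versus spine-facing bandwidth per leaf switch) – the figure that tells you when a flat spine-and-leaf design stops keeping up and an extra super-spine tier becomes attractive. All port counts and link speeds are hypothetical:

```python
# Minimal sketch: oversubscription in a two-tier leaf-spine fabric.
# All port counts and link speeds are hypothetical, for illustration only.

def oversubscription(server_ports: int, server_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to spine-facing bandwidth per leaf.

    1.0 means non-blocking; higher means traffic can contend on the uplinks.
    """
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)

# A leaf with 48 x 25G server ports and 6 x 100G uplinks to the spine:
ratio = oversubscription(48, 25, 6, 100)
print(f"oversubscription: {ratio:.1f}:1")  # prints 2.0:1

# ML training traffic is east-west and bursty, so designers push toward 1:1,
# for example by doubling the uplinks per leaf:
print(f"non-blocking? {oversubscription(48, 25, 12, 100) == 1.0}")
```

Joining several such non-blocking fabrics together, within one facility or across several, is where the super-spine and super-leaf tiers come in.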