The traditional data center, once dominated by sprawling server farms, is undergoing a seismic shift driven by the rise of generative artificial intelligence (gen AI). As the demands of AI evolve, the conventional server-centric model is rapidly becoming obsolete. This transformation is not merely about upgrading hardware but about redefining how data centers are built and operated. This blog takes a look at some of the technologies making the transformation possible.
As sophisticated AI models push the boundaries of what’s possible in natural language processing (NLP), image generation, and beyond, they also push data centers past their limits.
Take GPT-3, for instance, the precursor to ChatGPT. With its staggering 175 billion parameters, it demanded a distributed system of at least 2,048 GPUs to operate efficiently.[1] While OpenAI has not publicly disclosed the exact number of parameters in GPT-4, credible reporting and expert estimates put the figure at approximately 1.7 to 1.8 trillion.[2] This exponential growth in complexity isn't just a numbers game; it's a clarion call for completely rethinking data center architecture.
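To see why a model of this size cannot live on a single accelerator, a back-of-the-envelope sketch helps; the assumptions below (16-bit weights, 80 GB of memory per GPU) are illustrative choices, not figures from the cited source:

```python
# Back-of-the-envelope estimate: memory needed just to hold model weights.
# Assumptions (illustrative only): 16-bit (2-byte) weights, 80 GB usable per GPU.
params = 175e9              # GPT-3 parameter count
bytes_per_param = 2         # fp16/bf16 storage
gpu_memory_gb = 80          # a high-end data center GPU, assumed

weights_gb = params * bytes_per_param / 1e9
min_gpus = weights_gb / gpu_memory_gb

print(f"Weights alone: {weights_gb:.0f} GB")                    # ~350 GB
print(f"Minimum GPUs just to store the weights: {min_gpus:.1f}")  # ~4.4
```

And that is only storage for the weights. Training also needs room for gradients, optimizer states, and activations, and throughput targets push the device count far higher, toward the thousands of GPUs cited above.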
It's worth noting that computational speed can be just as crucial as computational capacity. Consider applications generating visual content within virtual reality settings. These require a frame rate of 90 fps to reduce dizziness, meaning computational resources must be able to generate each frame in a ninetieth of a second.[3] This requirement underscores the importance of low-latency, high-throughput systems in modern data centers, particularly for applications relying on real-time processing.
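For a sense of scale, the per-frame budget at 90 fps is a simple calculation; the overhead figure below is an assumption purely for illustration:

```python
# Per-frame compute budget at a 90 fps target (illustrative).
target_fps = 90
frame_budget_ms = 1000 / target_fps
print(f"Budget per frame: {frame_budget_ms:.1f} ms")   # ~11.1 ms

# Any fixed overhead (network hops, queuing) eats into what is left for generation.
overhead_ms = 3                                         # assumed, purely illustrative
generation_budget_ms = frame_budget_ms - overhead_ms
print(f"Time left for content generation: {generation_budget_ms:.1f} ms")
```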
With all these new demands, it’s clear that the days of CPU-centric server farms are numbered. As these traditional setups hit the wall of diminishing returns, the industry is pivoting towards heterogeneous architectures that decouple computing, memory, and storage resources. This shift allows for a more nuanced, efficient allocation of resources tailored to the unique demands of gen AI workloads.
High-performance computing (HPC) is essential for running generative AI applications. HPC architecture leverages multiple compute nodes, allowing for parallel processing of complex operations.[4]
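As a minimal sketch of that idea, not tied to any particular HPC framework, the pattern of splitting work into shards, processing them in parallel, and combining partial results looks like this in plain Python (real clusters would use schedulers and libraries such as MPI, but the decomposition is the same):

```python
# Minimal sketch of parallel decomposition across workers (illustrative only).
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for an expensive per-shard computation.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::8] for i in range(8)]   # split the work into 8 shards

    with Pool(processes=8) as pool:           # one worker per shard
        partials = pool.map(process_chunk, chunks)

    print(sum(partials))                      # combine the partial results
```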
Graphics processing units (GPUs) are inherently well-suited to this approach. They contain hundreds to thousands of execution units operating in parallel and can handle AI workloads with aplomb.[5] However, the soaring demand for GPUs across various sectors, including cryptocurrency mining, poses a significant challenge for data center designers.[6] Costs have risen, and parts availability remains constrained.
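A small sketch shows the same principle on a single node, assuming PyTorch and a CUDA-capable GPU are available; the matrix sizes are arbitrary:

```python
# Illustrative only: the same matrix multiply on CPU cores vs. GPU execution units.
# Assumes PyTorch is installed and a CUDA-capable GPU is present.
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

cpu_result = a @ b                     # runs on general-purpose CPU cores

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()  # move the operands into GPU memory
    gpu_result = a_gpu @ b_gpu         # fanned out across thousands of GPU threads
    torch.cuda.synchronize()           # wait for the asynchronous kernel to finish
    print(gpu_result.device)           # e.g., cuda:0
```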
Partly as a result of these shortages, several other specialized processing units are receiving greater attention as building blocks of heterogeneous data centers:[7]
- Data processing units (DPUs), which offload networking, storage, and security tasks from CPUs so compute resources stay focused on application workloads.[8]
- Neural processing units (NPUs), accelerators purpose-built for the matrix and vector operations at the heart of neural networks.[9]
The computational demands of gen AI also translate into increased energy consumption. Considering that, on average, a ChatGPT query uses ten times more energy than a standard Google search, it’s easy to see why data center power demands are projected to surge by 160 percent by 2030 as a result of gen AI.[10] This dramatic uptick presents a significant challenge for data center operators striving to balance performance with sustainability—not to mention operating costs.
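To put that multiplier in rough perspective, here is an illustrative fleet-level estimate; the per-query energy value and the daily query volume are assumptions chosen only to make the arithmetic concrete:

```python
# Illustrative fleet-level energy estimate (all specific figures are assumptions).
search_query_wh = 0.3          # assumed energy for a conventional search query
ai_multiplier = 10             # gen AI query uses ~10x the energy (per source [10])
queries_per_day = 100e6        # hypothetical daily query volume

ai_query_wh = search_query_wh * ai_multiplier
extra_kwh_per_day = (ai_query_wh - search_query_wh) * queries_per_day / 1000
print(f"Extra energy per day at this volume: {extra_kwh_per_day:,.0f} kWh")
```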
One approach to mitigating this hunger for electricity is the development of specialized chip-to-chip communication protocols. These protocols, such as NVIDIA’s direct chip-to-chip interconnects, optimize data transfer between integrated circuits, potentially reducing energy consumption.[11]
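One way to see where such savings come from is the energy spent per bit moved between chips; the figures below are placeholders that show the shape of the calculation, not published specifications for any particular interconnect:

```python
# Illustrative comparison of data-movement energy (placeholder numbers only).
bytes_moved = 100e9                 # hypothetical traffic between two chips: 100 GB
bits_moved = bytes_moved * 8

legacy_pj_per_bit = 10.0            # assumed cost over a board-level link
c2c_pj_per_bit = 1.5                # assumed cost over a direct chip-to-chip link

legacy_joules = bits_moved * legacy_pj_per_bit * 1e-12
c2c_joules = bits_moved * c2c_pj_per_bit * 1e-12
print(f"Legacy link: {legacy_joules:.1f} J, direct chip-to-chip: {c2c_joules:.1f} J")
```

The absolute numbers matter less than the structure: because every inference shuttles large volumes of data between devices, even modest per-bit savings compound at data center scale.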
The gen AI revolution is not just reshaping algorithms—it’s fundamentally altering the physical infrastructure that powers our digital world. As we move forward, data centers must evolve to meet these AI models’ unprecedented demands while addressing critical energy efficiency and sustainability issues.
The future data center will likely be a marvel of heterogeneous architecture, leveraging a mix of specialized processing units and innovative communication protocols. Those who successfully navigate this transition will remain competitive and set the standard for the next generation of digital infrastructure.
As we stand on the brink of this transformation, one thing is clear: tomorrow's data centers will be as intelligent and adaptable as the AI models they host, ushering in a new era of computational capability and efficiency.
Sources
[1] https://ieeexplore.ieee.org/document/10268594
[2] https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/
[3] https://ieeexplore.ieee.org/document/10268594
[4] https://www.netapp.com/data-storage/high-performance-computing/what-is-hpc/
[5] https://www.nvidia.com/en-us/glossary/high-performance-computing/
[6] https://www.ciodive.com/news/nvidia-gpu-data-center-revolution-jensen-huang/708273/
[7] https://www.dataversity.net/future-data-center-heterogeneous-computing/
[8] https://www.kalrayinc.com/blog/dpus-gpus-and-cpus-in-the-data-center/
[9] https://www.purestorage.com/knowledge/what-is-neural-processing-unit.html
[10] https://www.goldmansachs.com/insights/articles/AI-poised-to-drive-160-increase-in-power-demand
[11] https://developer.nvidia.com/blog/strategies-for-maximizing-data-center-energy-efficiency/
Brandon Lewis has been a deep tech journalist, storyteller, and technical writer for more than a decade, covering software startups, semiconductor giants, and everything in between. His focus areas include embedded processors, hardware, software, and tools as they relate to electronic system integration, IoT/industry 4.0 deployments, and edge AI use cases. He is also an accomplished podcaster, YouTuber, event moderator, and conference presenter, and has held roles as editor-in-chief and technology editor at various electronics engineering trade publications. When not inspiring large B2B tech audiences to action, Brandon coaches Phoenix-area sports franchises through the TV.