The traditional data center, once dominated by sprawling server farms, is undergoing a seismic shift driven by the rise of generative artificial intelligence (gen AI). As the demands of AI evolve, the conventional server-centric model is rapidly becoming obsolete. This transformation is not merely about upgrading hardware but about redefining how data centers are built and operated. This blog takes a look at some of the technologies making the transformation possible.
As sophisticated AI models push the boundaries of what’s possible in natural language processing (NLP), image generation, and beyond, they also push data centers past their limits.
Take GPT-3, for instance, the precursor to ChatGPT. With its staggering 175 billion parameters, it demanded a distributed system of at least 2,048 GPUs to operate efficiently.[1] While OpenAI has not publicly disclosed the exact number of parameters in GPT-4, credible reporting and expert estimates put the figure at approximately 1.7 to 1.8 trillion.[2] This exponential growth in complexity isn't just a numbers game; it's a clarion call for completely rethinking data center architecture.
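To see why a model of this size cannot live on a single accelerator, a back-of-the-envelope sketch helps; the assumptions below (16-bit weights, 80 GB of memory per GPU) are illustrative choices, not figures from the cited source:

```python
# Back-of-the-envelope estimate: memory needed just to hold model weights.
# Assumptions (illustrative only): 16-bit (2-byte) weights, 80 GB usable per GPU.
params = 175e9              # GPT-3 parameter count
bytes_per_param = 2         # fp16/bf16 storage
gpu_memory_gb = 80          # a high-end data center GPU, assumed

weights_gb = params * bytes_per_param / 1e9
min_gpus = weights_gb / gpu_memory_gb

print(f"Weights alone: {weights_gb:.0f} GB")                    # ~350 GB
print(f"Minimum GPUs just to store the weights: {min_gpus:.1f}")  # ~4.4
```

And that is only storage for the weights. Training also needs room for gradients, optimizer states, and activations, and throughput targets push the device count far higher, toward the thousands of GPUs cited above.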
It's worth noting that computational speed can be just as crucial as computational capacity. Consider applications generating visual content within virtual reality settings. These require a frame rate of 90 fps to reduce dizziness, meaning computational resources must be able to generate each frame in a ninetieth of a second.[3] This requirement underscores the importance of low-latency, high-throughput systems in modern data centers, particularly for applications relying on real-time processing.
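For a sense of scale, the per-frame budget at 90 fps is a simple calculation; the overhead figure below is an assumption purely for illustration:

```python
# Per-frame compute budget at a 90 fps target (illustrative).
target_fps = 90
frame_budget_ms = 1000 / target_fps
print(f"Budget per frame: {frame_budget_ms:.1f} ms")   # ~11.1 ms

# Any fixed overhead (network hops, queuing) eats into what is left for generation.
overhead_ms = 3                                         # assumed, purely illustrative
generation_budget_ms = frame_budget_ms - overhead_ms
print(f"Time left for content generation: {generation_budget_ms:.1f} ms")
```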
With all these new demands, it’s clear that the days of CPU-centric server farms are numbered. As these traditional setups hit the wall of diminishing returns, the industry is pivoting towards heterogeneous architectures that decouple computing, memory, and storage resources. This shift allows for a more nuanced, efficient allocation of resources tailored to the unique demands of gen AI workloads.
High-performance computing (HPC) is essential for running generative AI applications. HPC architecture leverages multiple compute nodes, allowing for parallel processing of complex operations.[4]
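As a minimal sketch of that idea, not tied to any particular HPC framework, the pattern of splitting work into shards, processing them in parallel, and combining partial results looks like this in plain Python (real clusters would use schedulers and libraries such as MPI, but the decomposition is the same):

```python
# Minimal sketch of parallel decomposition across workers (illustrative only).
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for an expensive per-shard computation.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::8] for i in range(8)]   # split the work into 8 shards

    with Pool(processes=8) as pool:           # one worker per shard
        partials = pool.map(process_chunk, chunks)

    print(sum(partials))                      # combine the partial results
```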
Graphics processing units (GPUs) are inherently well-suited to this approach. They contain hundreds to thousands of execution units operating in parallel and can handle AI workloads with aplomb.[5] However, the soaring demand for GPUs across various sectors, including cryptocurrency mining, poses a significant challenge for data center designers.[6] Costs have risen, and parts availability remains constrained.
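A small sketch shows the same principle on a single node, assuming PyTorch and a CUDA-capable GPU are available; the matrix sizes are arbitrary:

```python
# Illustrative only: the same matrix multiply on CPU cores vs. GPU execution units.
# Assumes PyTorch is installed and a CUDA-capable GPU is present.
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

cpu_result = a @ b                     # runs on general-purpose CPU cores

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()  # move the operands into GPU memory
    gpu_result = a_gpu @ b_gpu         # fanned out across thousands of GPU threads
    torch.cuda.synchronize()           # wait for the asynchronous kernel to finish
    print(gpu_result.device)           # e.g., cuda:0
```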
Partly as a result of these shortages, several other specialized processing units are receiving greater attention as building blocks of heterogeneous data centers:[7]
- Data processing units (DPUs), which offload networking, storage, and security tasks from CPUs so compute resources stay focused on application workloads.[8]
- Neural processing units (NPUs), accelerators purpose-built for the matrix and vector operations at the heart of neural networks.[9]
The computational demands of gen AI also translate into increased energy consumption. Considering that, on average, a ChatGPT query uses ten times more energy than a standard Google search, it’s easy to see why data center power demands are projected to surge by 160 percent by 2030 as a result of gen AI.[10] This dramatic uptick presents a significant challenge for data center operators striving to balance performance with sustainability—not to mention operating costs.
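To put that multiplier in rough perspective, here is an illustrative fleet-level estimate; the per-query energy value and the daily query volume are assumptions chosen only to make the arithmetic concrete:

```python
# Illustrative fleet-level energy estimate (all specific figures are assumptions).
search_query_wh = 0.3          # assumed energy for a conventional search query
ai_multiplier = 10             # gen AI query uses ~10x the energy (per source [10])
queries_per_day = 100e6        # hypothetical daily query volume

ai_query_wh = search_query_wh * ai_multiplier
extra_kwh_per_day = (ai_query_wh - search_query_wh) * queries_per_day / 1000
print(f"Extra energy per day at this volume: {extra_kwh_per_day:,.0f} kWh")
```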
One approach to mitigating this hunger for electricity is the development of specialized chip-to-chip communication protocols. These protocols, such as NVIDIA’s direct chip-to-chip interconnects, optimize data transfer between integrated circuits, potentially reducing energy consumption.[11]
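One way to see where such savings come from is the energy spent per bit moved between chips; the figures below are placeholders that show the shape of the calculation, not published specifications for any particular interconnect:

```python
# Illustrative comparison of data-movement energy (placeholder numbers only).
bytes_moved = 100e9                 # hypothetical traffic between two chips: 100 GB
bits_moved = bytes_moved * 8

legacy_pj_per_bit = 10.0            # assumed cost over a board-level link
c2c_pj_per_bit = 1.5                # assumed cost over a direct chip-to-chip link

legacy_joules = bits_moved * legacy_pj_per_bit * 1e-12
c2c_joules = bits_moved * c2c_pj_per_bit * 1e-12
print(f"Legacy link: {legacy_joules:.1f} J, direct chip-to-chip: {c2c_joules:.1f} J")
```

The absolute numbers matter less than the structure: because every inference shuttles large volumes of data between devices, even modest per-bit savings compound at data center scale.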
The gen AI revolution is not just reshaping algorithms—it’s fundamentally altering the physical infrastructure that powers our digital world. As we move forward, data centers must evolve to meet these AI models’ unprecedented demands while addressing critical energy efficiency and sustainability issues.
The future data center will likely be a marvel of heterogeneous architecture, leveraging a mix of specialized processing units and innovative communication protocols. Those who successfully navigate this transition will remain competitive and set the standard for the next generation of digital infrastructure.
As we stand on the brink of this transformation, one thing is clear: tomorrow's data centers will be as intelligent and adaptable as the AI models they host, ushering in a new era of computational capability and efficiency.
Sources
[1] https://ieeexplore.ieee.org/document/10268594
[2] https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/
[3] https://ieeexplore.ieee.org/document/10268594
[4] https://www.netapp.com/data-storage/high-performance-computing/what-is-hpc/
[5] https://www.nvidia.com/en-us/glossary/high-performance-computing/
[6] https://www.ciodive.com/news/nvidia-gpu-data-center-revolution-jensen-huang/708273/
[7] https://www.dataversity.net/future-data-center-heterogeneous-computing/
[8] https://www.kalrayinc.com/blog/dpus-gpus-and-cpus-in-the-data-center/
[9] https://www.purestorage.com/knowledge/what-is-neural-processing-unit.html
[10] https://www.goldmansachs.com/insights/articles/AI-poised-to-drive-160-increase-in-power-demand
[11] https://developer.nvidia.com/blog/strategies-for-maximizing-data-center-energy-efficiency/
Brandon Lewis has been a deep tech journalist, storyteller, and technical writer for more than a decade, covering software startups, semiconductor giants, and everything in between. His focus areas include embedded processors, hardware, software, and tools as they relate to electronic system integration, IoT/industry 4.0 deployments, and edge AI use cases. He is also an accomplished podcaster, YouTuber, event moderator, and conference presenter, and has held roles as editor-in-chief and technology editor at various electronics engineering trade publications. When not inspiring large B2B tech audiences to action, Brandon coaches Phoenix-area sports franchises through the TV.