What do you really need for your AI project?

Generative AI and its many intriguing use cases have captured our imagination like little else before. Around the world, businesses are scrambling to adopt and deploy the technology, though much of the discussion around generative AI has focused on the software users interact with, rather than the hardware needed to power such large computations.

Like many modern forms of artificial intelligence, generative AI relies on large language models (LLMs), which – as the name suggests – are trained on massive amounts of data. Running such models in-house requires substantial computational resources, with an emphasis on processing power and storage.

The dramatic advancement of processor technology over the last few years has largely been driven by the demands of AI. As such, these small pieces of silicon are the biggest factors in determining how fast you can train and run your machine learning models and how expensive doing so will be in both the short and long term.

Similarly, high-performance servers and storage systems are needed for the data that will power your language models; that data has to be kept locally or somewhere easily accessible. Many organizations opt for a third-party service to handle these requirements, but for some, in-house AI training is preferable.

Hardware considerations

While the software side of generative AI takes most of the plaudits, its success is uniquely tied to advancements in hardware. Upgrades to chipsets and increased storage capabilities have helped to improve speed, capacity, and connectivity, which has ultimately led to advances in both the training and inference of generative AI models.

Running generative AI effectively means executing machine learning algorithms that generate new images, audio, and text, which involves moving a considerable amount of data. Higher data transmission rates and lower power consumption, allowing more efficient processing at low latency, are also key requirements for generative AI.

Similarly, high-bandwidth memory (HBM) has improved AI accelerators' computational speed and energy efficiency through successive increases in memory bandwidth.

Storage capacity will be a major concern for businesses that want to train their own AI models in-house. As some organizations have found in recent years with standard cloud storage, using a third-party provider can prove expensive in the long run, especially if large amounts of data are involved. The same can be the case with third-party AI training services, and largely for the same reason: a rapidly growing dataset that was already big when it was first created will cost proportionately more and more to host. Another parallel is egress fees; once the data is in the third-party provider's infrastructure, it can prove unexpectedly difficult and pricey to get back out again.

For some organizations, therefore, it makes more financial sense in the long term to invest in building their own infrastructure to create LLMs and train generative AI applications in house.

CPUs, GPUs, and NPUs

While storage is clearly a top consideration when deciding to build dedicated AI infrastructure, the importance of chips can't be overlooked.

There are a few different kinds of processors being used across industries for generative AI. These include central processing units (CPUs), accelerated processing units (APUs), and neural processing units (NPUs). Each performs a different kind of computation, and they are often combined in specialized systems to deliver the computational speed that generative AI requires.

Within this heterogeneous architecture, the components complement one another to deliver the computational performance an AI application needs. The CPU is typically used for general-purpose tasks such as data preparation and orchestration, while an AI accelerator speeds up the tensor operations at the heart of neural network training and inference. An NPU, meanwhile, is designed to run lighter inference workloads efficiently at low power.
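To illustrate that division of labor, here is a minimal training-loop sketch (our own illustration, assuming PyTorch; the model, dataset, and sizes are placeholders, not anything prescribed by AMD). CPU worker processes handle the general-purpose work of batching and shuffling data, while the accelerator runs the tensor-heavy forward and backward passes.

```python
# Minimal sketch of the CPU/accelerator division of labor.
# Assumes PyTorch and a single accelerator exposed through the torch.cuda
# device API (ROCm builds of PyTorch reuse this API for AMD GPUs).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# General-purpose work (batching, shuffling) stays on CPU worker processes.
# Wrap in a main guard on platforms that spawn worker processes.
dataset = TensorDataset(torch.randn(4096, 512), torch.randint(0, 10, (4096,)))
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=2)

# Tensor-heavy work (forward, backward, optimizer step) runs on the accelerator.
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for inputs, labels in loader:
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
```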

However, the rapid expansion of generative AI has created massive demand for high-performance graphics processing units (GPUs). The way GPUs parallelize matrix operations is what powers generative AI; the architecture allows LLMs to process vast amounts of data at once, speeding up training times. That added computational throughput is what makes it practical to train complex language models with billions of parameters.
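To make that parallelism concrete, the short sketch below (again assuming PyTorch; the sizes and any timings you see are purely illustrative) runs the same stack of matrix multiplications on the CPU and, where one is available, on an accelerator, where the independent products are executed concurrently rather than one after another.

```python
# Illustrative comparison of a batched matrix multiply on CPU vs. accelerator.
# Assumes PyTorch; results depend entirely on your hardware.
import time
import torch

def timed_matmul(a: torch.Tensor, b: torch.Tensor) -> float:
    start = time.perf_counter()
    torch.matmul(a, b)
    if a.is_cuda:
        torch.cuda.synchronize()  # wait for asynchronous GPU kernels to finish
    return time.perf_counter() - start

# A stack of 64 independent 1024x1024 matrix products, similar in shape to
# the batched operations found in transformer layers.
a = torch.randn(64, 1024, 1024)
b = torch.randn(64, 1024, 1024)

print(f"CPU: {timed_matmul(a, b):.3f}s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    timed_matmul(a_gpu, b_gpu)  # warm-up pass
    print(f"Accelerator: {timed_matmul(a_gpu, b_gpu):.3f}s")
```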

The AMD Instinct MI300A accelerator is a great example of a modern high-powered APU. Widely seen as an industry-leading chip, this data center APU combines the GPU compute of the Instinct series with AMD's EPYC CPU cores, enabling the convergence of AI and high-performance computing.

The next generation of processors

A total of 13 chiplets are used inside the MI300A accelerator to create a chip that combines twenty-four Zen 4 CPU cores with CDNA 3 GPU compute dies and eight stacks of HBM3, totaling 128GB of unified memory. This kind of design relies on 3D stacking, which is seen as an essential route to fitting more transistors into the same space and keeping Moore's Law moving forward.

With its Instinct MI300 family of accelerators, AMD has potentially captured an advantage, particularly in the generative AI market, with its new approach to accelerator architecture. The MI300X, for example, will appeal to businesses looking to train generative AI models in-house thanks to its memory capacity: 192GB of high-bandwidth memory (HBM3), compared with the MI300A's 128GB.
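To put that 192GB figure into context, a rough rule of thumb (our illustration, not an AMD specification) is that each model parameter needs two bytes in 16-bit precision just to hold the weights, before gradients, optimizer states, and activations are accounted for.

```python
# Back-of-the-envelope memory estimate for hosting model weights in HBM.
# Rule-of-thumb only: training adds gradients, optimizer states, and
# activations on top of this, so real requirements are several times higher.
def weight_memory_gb(parameters_billions: float, bytes_per_param: int = 2) -> float:
    return parameters_billions * 1e9 * bytes_per_param / 1e9

for size in (7, 70, 180):
    print(f"{size}B parameters at 16-bit precision: ~{weight_memory_gb(size):.0f} GB of weights")

# On this estimate, a 70B-parameter model's 16-bit weights (~140GB) fit within
# a single 192GB accelerator for inference; training it still needs multiple devices.
```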

The CDNA 3 architecture in the MI300X is designed to handle the wide range of data formats and sparse matrix operations that AI workloads rely on. That ability to process vast datasets efficiently is a key factor in advancing AI and machine learning.

Alongside the MI300 Series hardware, businesses should also look at AMD's ROCm software stack, which is open source and designed for GPU computing. It provides the tools used to develop, test, and deploy GPU software, and is built around AMD's Heterogeneous-computing Interface for Portability (HIP).

ROCm can be used in several domains, including general-purpose computing on GPUs (GPGPU), high-performance computing (HPC), and heterogeneous computing. It also supports several programming models, such as GPU-kernel-based programming and OpenCL, which can be used to write programs that run across heterogeneous platforms.
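For teams working in Python rather than writing HIP kernels directly, ROCm builds of frameworks such as PyTorch expose AMD GPUs through the same torch.cuda device interface, backed by HIP. The snippet below is a sketch of a quick sanity check (assuming a ROCm build of PyTorch is installed).

```python
# Quick sanity check that a ROCm build of PyTorch can see an AMD accelerator.
# Assumes a ROCm build of PyTorch; on such builds the familiar torch.cuda
# namespace is backed by HIP rather than CUDA.
import torch

print("HIP/ROCm version:", torch.version.hip)      # None on builds without ROCm support
print("Accelerator available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    x = torch.randn(2048, 2048, device="cuda")
    print("Sample matmul OK:", torch.matmul(x, x).shape)
```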

While there are other considerations when it comes to moving to in-house AI training that go beyond hardware, it’s a good place to start. There is no time like now to begin an AI project and there is no APU like the AMD MI300A to start it with.
