AWS customers will be able to build multi-trillion parameter LLMs under latest deal with Nvidia

AWS logo pictured at the Tech & Innovation Expo during the South by Southwest (SXSW) Sydney festival in Sydney, Australia, on Wednesday, Oct. 18, 2023
(Image credit: Getty Images)

AWS and Nvidia have announced an extension of their strategic collaboration that will see the new Blackwell GPU platform come to the AWS platform. 

The hyperscaler will offer the Nvidia GB200 Grace Blackwell Superchip and B100 Tensor Core GPUs in a move designed to help customers unlock and leverage new generative AI capabilities.

The collaboration integrates Nvidia’s newest multi-node systems, which feature the chipmaker’s next-gen Blackwell platform and AI software, with AWS’ advanced security through the Nitro System and Key Management Service (AWS KMS), Elastic Fabric Adapter (EFA) petabit-scale networking, and Amazon Elastic Compute Cloud (Amazon EC2) UltraClusters hyperscale clustering.

In an announcement, the companies said this combination of technologies will enable customers to build and run real-time inference on multi-trillion parameter large language models (LLMs) more efficiently than on previous-generation Nvidia GPUs on Amazon EC2.

“NVIDIA’s next-generation Grace Blackwell processor marks a significant step forward in generative AI and GPU computing,” commented Adam Selipsky, CEO at AWS.

“When combined with AWS’s powerful Elastic Fabric Adapter Networking, Amazon EC2 UltraClusters’ hyperscale clustering, and our unique Nitro system’s advanced virtualization and security capabilities, we make it possible for customers to build and run multi-trillion parameter large language models faster, at massive scale, and more securely than anywhere else.”

Accelerated LLMs through AWS

As part of the expanded partnership, Nvidia’s Blackwell platform, which features the GB200 NVL72, will now be available via AWS, complete with 72 Blackwell GPUs and 36 Grace CPUs interconnected by fifth-generation Nvidia NVLink.

The platform will connect to AWS’ EFA networking and will leverage the cloud giant’s advanced Nitro System virtualization and its EC2 UltraClusters hyperscale clustering.

AWS said this combination will enable customers to scale to thousands of GB200 Superchips and speed up inference workloads for resource-intensive multi-trillion parameter language models.

Additionally, AWS is planning to offer EC2 instances featuring the new B100 GPUs deployed in EC2 UltraClusters for accelerating generative AI training and inference at greater scale.

GB200s will be available on Nvidia’s DGX Cloud platform to help accelerate development of generative AI and LLMs that can reach beyond 1 trillion parameters.

Improved security 

AWS and Nvidia are also building on existing AI security measures, with the combination of AWS Nitro System and Nvidia’s GB200 designed to prevent unauthorized users from accessing model weights. 

The GB200 allows physical encryption of the NVLink connections between GPUs and encrypts data transfers from the Grace CPU to the Blackwell GPU, while EFA will encrypt data across servers for distributed training and inference.

The GB200 will also benefit from the AWS Nitro System’s ability to offload I/O for functions from the host CPU/GPU to specialized AWS hardware, while implementing enhanced security to protect customers’ code and data during processing.

With the GB200 on Amazon EC2, AWS said customers will be able to create a trusted execution environment alongside their EC2 instance, leveraging AWS Nitro Enclaves to encrypt training data and weights with AWS KMS.


Users can load the enclave from within their GB200 instance for direct communication with the superchip, enabling AWS KMS to communicate directly with the enclave and transfer key material to it securely.

The enclave is then able to pass that material to the GB200 securely and in a way that prevents the AWS operators from ever accessing the key or decrypting the training data or model weights.
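For illustration, the snippet below sketches the envelope-encryption pattern that underpins this flow, using AWS KMS via boto3 to generate a data key and encrypt model weights locally. It is a minimal sketch rather than the actual enclave workflow: the key alias, region, and file names are placeholder assumptions, and the attestation-gated decryption that Nitro Enclaves perform via the Nitro Enclaves SDK is only noted in comments.

```python
# Minimal, illustrative sketch of envelope-encrypting model weights with AWS KMS.
# NOTE: in the Nitro Enclaves flow described above, the decrypt step runs inside
# the enclave (gated by attestation), so the plaintext key never reaches the
# parent instance or AWS operators. Key alias, region, and paths are placeholders.
import os

import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms", region_name="us-east-1")

# Ask KMS for a fresh 256-bit data key under a customer-managed key (placeholder alias).
resp = kms.generate_data_key(KeyId="alias/model-weights-key", KeySpec="AES_256")
plaintext_key = resp["Plaintext"]        # used locally, then discarded
encrypted_key = resp["CiphertextBlob"]   # safe to store alongside the ciphertext

# Encrypt the weights locally with the plaintext data key (AES-256-GCM).
nonce = os.urandom(12)
with open("model_weights.bin", "rb") as f:
    ciphertext = AESGCM(plaintext_key).encrypt(nonce, f.read(), None)

# Persist only the nonce, ciphertext, and KMS-wrapped data key. Recovering the
# weights later requires kms.decrypt() on encrypted_key, which a Nitro Enclave
# would perform after presenting its attestation document to KMS.
with open("model_weights.enc", "wb") as f:
    f.write(nonce + ciphertext)
```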

More details on ‘Project Ceiba’

Nvidia and AWS are also collaborating to build one of the world’s fastest AI supercomputers, a project first announced at AWS re:Invent 2023.

Dubbed ‘Project Ceiba’, the new supercomputer will be hosted on AWS and used by Nvidia to advance AI for LLMs, graphics and simulation, digital biology, robotics, and self-driving cars, as well as Nvidia Earth-2 for climate prediction.

The supercomputer will boast 20,736 B200 GPUs and is being built using the new Nvidia GB200 NVL72 system, which features fifth-generation NVLink connecting those GPUs to 10,368 Grace CPUs; the figures correspond to 288 NVL72 racks, each pairing 72 Blackwell GPUs with 36 Grace CPUs. It will also leverage fourth-generation EFA networking for scaling, offering up to 800 Gbps per superchip of low-latency, high-bandwidth networking throughput.

The pair said this combination will enable the processing of up to 414 exaflops of AI, a six-fold increase over earlier plans to build Ceiba on the Hopper architecture.

"AI is driving breakthroughs at an unprecedented pace, leading to new applications, business models, and innovation across industries,” commented Jensen Huang, Nvidia founder and CEO.

“Our collaboration with AWS is accelerating new generative AI capabilities and providing customers with unprecedented computing power to push the boundaries of what's possible."

Daniel Todd

Dan is a freelance writer and regular contributor to ChannelPro, covering the latest news stories across the IT, technology, and channel landscapes. His coverage regularly spans cloud technologies, cyber security, software and operating system guides, and the latest mergers and acquisitions.

A journalism graduate from Leeds Beckett University, he combines a passion for the written word with a keen interest in the latest technology and its influence in an increasingly connected world.

He started writing for ChannelPro back in 2016, focusing on a mixture of news and technology guides, before becoming a regular contributor to ITPro. Elsewhere, he has previously written news and features across a range of other topics, including sport, music, and general news.