NVIDIA H100 GPUs set the standard for generative AI in debut MLPerf benchmark

Leading users and industry-standard benchmarks agree: NVIDIA H100 Tensor Core GPUs deliver the best AI performance, especially on the large language models (LLMs) that power generative AI.

H100 GPUs set new records on all eight tests in the latest MLPerf training benchmarks released today, excelling in a new MLPerf test for generative AI. This excellence is delivered both per accelerator and at scale in massive servers.

For example, on a commercially available cluster of 3,584 H100 GPUs co-developed by startup Inflection AI and operated by CoreWeave, a cloud services provider specializing in GPU-accelerated workloads, the system completed the massive training benchmark based on GPT-3 in less than eleven minutes.

"Our customers are building cutting-edge generative AI and LLMs at scale today, powered by our thousands of H100 GPUs on fast, low-latency InfiniBand networks," said Brian Venturo, co-founder and CTO of CoreWeave. "Our joint MLPerf submission with NVIDIA clearly demonstrates the great performance our customers enjoy."

Maximum performance available today

Inflection AI leveraged that performance to build the advanced LLM behind its first personal AI, Pi, which stands for personal intelligence. The company will act as an artificial intelligence studio, creating personal AIs that users can interact with in simple and natural ways.

"Anyone can experience the power of a personal AI today based on our state-of-the-art large language model that was trained on CoreWeave's powerful network of H100 GPUs," said Mustafa Suleyman, CEO of Inflection AI.

Co-founded in early 2022 by Mustafa Suleyman and Karén Simonyan of DeepMind, along with Reid Hoffman, Inflection AI aims to partner with CoreWeave to create one of the largest compute clusters in the world using NVIDIA GPUs.

Tale of the tape

These user experiences reflect the performance demonstrated in the MLPerf benchmarks announced today.

NVIDIA wins all eight tests in MLPerf Training v3.0

The H100 GPUs delivered the highest performance on every benchmark, including large language models, recommenders, computer vision, medical imaging, and speech recognition. They were the only chips to run all eight tests, demonstrating the versatility of the NVIDIA AI platform.

Excellence in execution at scale

Training is typically a job run at scale by many GPUs working in tandem. In every MLPerf test, H100 GPUs set new performance records at scale for AI training.

Optimizations across the entire technology stack enabled near-linear performance scaling in the demanding LLM test as submissions scaled from hundreds to thousands of H100 GPUs.
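To make "near-linear scaling" concrete, here is a minimal sketch of how scaling efficiency can be computed from time-to-train figures. The function name and the run figures below are hypothetical placeholders chosen for illustration; they are not MLPerf results.

```python
def scaling_efficiency(base_gpus, base_minutes, scaled_gpus, scaled_minutes):
    """Return the fraction of ideal (linear) speedup actually achieved.

    1.0 means perfectly linear scaling: N times the GPUs cut the
    time to train by a factor of N.
    """
    ideal_speedup = scaled_gpus / base_gpus
    actual_speedup = base_minutes / scaled_minutes
    return actual_speedup / ideal_speedup


# Hypothetical (GPU count, minutes to train) pairs, NOT measured results.
runs = [(512, 64.0), (1024, 33.0), (2048, 17.5)]

base_gpus, base_minutes = runs[0]
for gpus, minutes in runs:
    eff = scaling_efficiency(base_gpus, base_minutes, gpus, minutes)
    print(f"{gpus:>5} GPUs: {minutes:5.1f} min, efficiency {eff:.0%}")
```

An efficiency close to 100% as the GPU count grows is what near-linear scaling means in practice; communication and synchronization overhead usually pull the figure lower on larger clusters.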

NVIDIA demonstrates efficiency at scale in MLPerf Training v3.0

In addition, CoreWeave delivered performance from the cloud similar to what NVIDIA achieved from an AI supercomputer running in an on-premises data center. This is a testament to the low-latency NVIDIA Quantum-2 InfiniBand networking that CoreWeave uses.

In this round, MLPerf also updated its benchmark for recommender systems.

The new test uses a larger dataset and a more modern AI model to better reflect the challenges facing cloud service providers. NVIDIA was the only company to submit results on the updated benchmark.

A growing NVIDIA AI ecosystem

Nearly a dozen companies submitted results on the NVIDIA platform in this round. Their work demonstrates that NVIDIA AI is supported by the largest machine learning ecosystem in the industry.

Submissions came from major system manufacturers including ASUS, Dell Technologies, GIGABYTE, Lenovo, and QCT. More than 30 submissions ran on H100 GPUs.

This level of participation lets users know they can achieve great performance with NVIDIA AI both in the cloud and on servers running in their data centers.

Performance across all workloads

NVIDIA ecosystem partners participate in MLPerf because they know it is a valuable tool for customers evaluating AI platforms and vendors.

The benchmarks cover workloads users care about: computer vision, translation, and reinforcement learning, as well as generative AI and recommender systems.

Users can rely on MLPerf results to make informed purchasing decisions, because the tests are transparent and objective. The benchmarks enjoy the backing of a broad group including Arm, Baidu, Facebook AI, Google, Harvard, Intel, Microsoft, Stanford and the University of Toronto.

MLPerf results are available today on H100, L4 and NVIDIA Jetson platforms across AI training, inference and HPC benchmarks. We will also be submitting results on NVIDIA Grace Hopper systems in future MLPerf rounds.

The importance of energy efficiency

As AI performance requirements increase, it is essential to improve the efficiency with which that performance is achieved. This is what accelerated computing does.

NVIDIA GPU-accelerated data centers use fewer server nodes, so they consume less rack space and energy. Additionally, accelerated networking increases efficiency and performance, and continued software optimizations drive x-factor gains on the hardware itself.

Energy efficiency is good for the planet and also for business. Increased performance can accelerate time to market and enable organizations to build more advanced applications.

Energy efficiency also reduces costs because NVIDIA GPU-accelerated data centers use fewer server nodes. In fact, NVIDIA powers 22 of the top 30 supercomputers on the latest Green500 list.

Software available to everyone

NVIDIA AI Enterprise, the software layer of the NVIDIA AI platform, enables optimized performance on the leading accelerated computing infrastructure. The software comes with the enterprise-grade support, security, and reliability needed to run AI in the corporate data center.

All of the software used for these tests is available in the MLPerf repository, so virtually anyone can get these top-notch results.

Optimizations are continuously folded into containers available on NGC, NVIDIA's catalog for GPU-accelerated software.

Read this tech blog to learn more about the optimizations that power NVIDIA MLPerf performance and efficiency.

Image Source : blogs.nvidia.com