20 GPUs can carry the equivalent of global Internet traffic, and the Grace CPU Superchip debuts. What did Nvidia release at this GTC?

created at 03-25-2022

Where are the limits of technology?

Ask the technology world, and the answer will most likely be that there are none!

At the GTC 2022 keynote, NVIDIA CEO Jensen Huang (Huang Renxun, also written Jen-Hsun Huang) unveiled the H100 GPU, built on TSMC's 4nm process with 80 billion transistors; the Grace CPU Superchip, based on the latest Arm v9 architecture; Omniverse, NVIDIA's metaverse-oriented platform; and the autonomous driving platform DRIVE Hyperion 9. In both software and hardware, NVIDIA once again broke its own records and the industry's.

20 H100 GPUs can sustain internet traffic equivalent to the entire world?


You can call Nvidia a "chip overlord", or a major player in artificial intelligence computing and the metaverse.

In GPUs, the area it knows best, Nvidia announced its next-generation accelerated computing platform based on the Hopper™ architecture, which delivers an order-of-magnitude performance leap over the previous generation and will power the next wave of AI data centers.

The new architecture, named after the pioneering American computer scientist Grace Hopper, replaces the NVIDIA Ampere architecture introduced two years ago.

In addition to this, Nvidia also released its first Hopper-based GPU, the Nvidia H100.


As the successor to the A100, the H100 moves on from the previous generation's 7nm manufacturing process. It is built on TSMC's most advanced 4nm process and packs 80 billion transistors, with major advances that accelerate AI, HPC, memory bandwidth, interconnect, and communication, including nearly 5 terabytes per second of external connectivity.

In terms of performance, the H100 introduces the new Transformer Engine. The Transformer is now the standard model choice for natural language processing, and the Transformer Engine can speed up these networks by as much as 6x over the previous generation without loss of accuracy.

In addition, the H100 is the first GPU to support PCIe Gen5 and the first to use HBM3, achieving a memory bandwidth of 3TB/s. Twenty H100 GPUs can sustain the equivalent of the entire world's Internet traffic, making it possible for customers to run advanced recommender systems and large language models that perform inference on data in real time.
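The headline claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 5 terabytes per second of external connectivity per H100 (NVIDIA's quoted figure is "nearly 5") and takes the 100 terabytes per second that Huang cites in the keynote as the size of global Internet traffic. Both are vendor figures rather than measurements, and the function names are only illustrative.

```python
import math

# Assumed inputs (rounded vendor figures, not measurements).
H100_EXTERNAL_TBPS = 5.0        # TB/s of external connectivity per H100 ("nearly 5")
INTERNET_TRAFFIC_TBPS = 100.0   # TB/s, the estimate quoted in the keynote


def aggregate_bandwidth_tbps(gpu_count, per_gpu_tbps=H100_EXTERNAL_TBPS):
    """Total external bandwidth of gpu_count GPUs, in TB/s."""
    return gpu_count * per_gpu_tbps


def gpus_needed(target_tbps, per_gpu_tbps=H100_EXTERNAL_TBPS):
    """Smallest number of GPUs whose combined bandwidth reaches target_tbps."""
    return math.ceil(target_tbps / per_gpu_tbps)


print(aggregate_bandwidth_tbps(20))        # combined TB/s of 20 GPUs
print(gpus_needed(INTERNET_TRAFFIC_TBPS))  # GPUs needed to match the estimate
```

Under these rounded assumptions, 20 GPUs come out at the same 100 terabytes per second as the Internet estimate, which appears to be where the headline number comes from.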

In addition to the above, H100 has achieved the following breakthroughs in technology:

  • Second-generation secure Multi-Instance GPU (MIG). MIG technology allows a single GPU to be partitioned into seven smaller, fully isolated instances that handle different types of work. The Hopper architecture extends MIG capabilities by up to 7x over the previous generation by offering secure multi-tenant configurations across each GPU instance in cloud environments.

  • Confidential computing. The H100 is the world's first accelerator with confidential computing capabilities, protecting AI models and customer data while they are being processed. Customers can also apply confidential computing to federated learning in privacy-sensitive industries such as healthcare and financial services, as well as on shared cloud infrastructure.

  • Fourth-generation NVLink. To accelerate the largest AI models, NVIDIA combines NVLink with a new external NVLink Switch, extending NVLink as a scale-up network beyond the server. It connects up to 256 H100 GPUs at 9x higher bandwidth than the previous generation, which used NVIDIA HDR Quantum InfiniBand.

  • DPX instructions. The new DPX instructions accelerate dynamic programming, which is widely used in algorithms for route optimization and genomics, by up to 40x compared with CPUs and up to 7x compared with previous-generation GPUs. Examples include the Floyd-Warshall algorithm, used to find optimal routes for fleets of autonomous robots in dynamic warehouse environments, and the Smith-Waterman algorithm, used in sequence alignment for DNA and protein classification and folding.
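For readers unfamiliar with the two algorithms named in the DPX item, here is a minimal CPU-only sketch of each in plain Python. It only illustrates the dynamic-programming recurrences that DPX is meant to speed up; it does not use DPX instructions or any NVIDIA API. The example graph and the scoring parameters (match, mismatch, gap) are illustrative assumptions.

```python
from math import inf


def floyd_warshall(weights):
    """All-pairs shortest paths.

    weights[i][j] is the cost of the edge i -> j, or inf if there is none.
    Returns a new matrix of shortest-path costs.
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    for k in range(n):              # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            # Local alignment: a cell never drops below zero.
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


# Hypothetical 4-node warehouse graph: the direct route 0 -> 3 costs 10,
# but going 0 -> 1 -> 2 -> 3 costs 4 + 3 + 2.
graph = [
    [0, 4, inf, 10],
    [inf, 0, 3, inf],
    [inf, inf, 0, 2],
    [inf, inf, inf, 0],
]
print(floyd_warshall(graph)[0])
print(smith_waterman("GATTACA", "TTAC"))
```

Both routines fill a table in which each cell depends on a few previously computed cells through a min or max over sums. That access pattern is what makes dynamic programming a natural target for dedicated instructions.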

"Data centers are becoming artificial intelligence factories," said Jen-Hsun Huang. "The NVIDIA H100 is the engine of the global AI infrastructure that enterprises use to accelerate their AI-driven businesses."

It is worth noting that Nvidia has also released a series of products based on the H100.

“Artificial intelligence has fundamentally changed how software functions and is produced. Companies that are revolutionizing their industries with artificial intelligence realize the importance of their AI infrastructure,” said Jen-Hsun Huang. “Our new DGX H100 system will power enterprise AI factories, distilling data into our most valuable resource: intelligence.”

Building on the Hopper-based H100, NVIDIA has launched the DGX H100, the fourth-generation DGX™ system.

Featuring 8 H100 GPUs, the DGX H100 can deliver 32 petaflops of AI performance at the new FP8 precision, providing the scale to meet the large-scale computing needs of large language models, recommender systems, healthcare research, and climate science.

Each GPU in the DGX H100 system is connected by fourth-generation NVLink, which delivers 900GB/s of connectivity, 1.5x more than the previous generation. NVSwitch™ enables all eight H100 GPUs to be connected over NVLink.

Nvidia says it can also connect up to 32 DGX systems (containing a total of 256 H100 GPUs) using its NVLink technology to create a "DGX Pod."

"The bandwidth of the DGX POD is 768 terabytes per second. In comparison, the current bandwidth of the entire Internet is 100 terabytes per second," Huang Renxun explained.

Multiple DGX Pods can be connected together to create DGX SuperPODs, which Huang Renxun calls a "modern AI factory."

On that basis, Nvidia is also building a new supercomputer called Eos, which will be equipped with 18 DGX Pods. In terms of AI processing power, it will be four times as powerful as Fugaku, currently the world's most powerful supercomputer.

Eos is expected to go live in the next few months and will be the fastest AI computer in the world.
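The scale-out figures quoted in this section follow from simple multiplication. The sketch below recomputes them from the numbers in the article: 8 GPUs per DGX H100, 32 systems per Pod, 18 Pods in Eos, 32 petaflops of FP8 per system, and 768 versus 100 terabytes per second. The constant and function names are illustrative, and the per-GPU and Eos totals are derived here rather than quoted from NVIDIA.

```python
# Figures taken from the article.
GPUS_PER_DGX = 8
DGX_PER_POD = 32
PODS_IN_EOS = 18
FP8_PFLOPS_PER_DGX = 32
POD_BANDWIDTH_TBPS = 768
INTERNET_BANDWIDTH_TBPS = 100


def gpus_in_pod():
    """H100 GPUs in one DGX Pod."""
    return GPUS_PER_DGX * DGX_PER_POD


def gpus_in_eos():
    """H100 GPUs in Eos, assuming 18 fully populated Pods."""
    return gpus_in_pod() * PODS_IN_EOS


def fp8_pflops_per_gpu():
    """FP8 petaflops per GPU implied by the DGX H100 figure."""
    return FP8_PFLOPS_PER_DGX / GPUS_PER_DGX


def eos_fp8_pflops():
    """Aggregate FP8 petaflops of Eos implied by the figures above."""
    return FP8_PFLOPS_PER_DGX * DGX_PER_POD * PODS_IN_EOS


def pod_vs_internet():
    """How many times the quoted Internet bandwidth one Pod provides."""
    return POD_BANDWIDTH_TBPS / INTERNET_BANDWIDTH_TBPS


print(gpus_in_pod(), gpus_in_eos())
print(fp8_pflops_per_gpu(), eos_fp8_pflops())
print(pod_vs_internet())
```

By this arithmetic a single Pod has 256 GPUs, matching the article, and its quoted bandwidth is a little under eight times the Internet estimate.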

A super chip composed of two CPUs - Grace CPU Superchip

On the CPU side, Huang Renxun used the keynote to formally present the Grace CPU Superchip, NVIDIA's first Arm-based CPU designed for data centers.


As for why it is called a superchip, Huang Renxun said the chip will deliver twice the memory bandwidth and energy efficiency of today's leading server chips.

In essence, though, this superchip is a combination of two CPUs: it consists of two CPU chips joined together by NVLink-C2C, a new high-speed, low-latency, chip-to-chip interconnect technology.

According to Nvidia, the Grace CPU Superchip is designed to provide the highest performance: it packs 144 Arm Neoverse cores into a single socket and achieves an estimated score of 740 on the SPECrate2017_int_base benchmark.

That is more than 1.5x the performance of the dual CPUs currently shipping with the DGX A100, as estimated in Nvidia's labs using the same class of compilers.

The Grace CPU Superchip's LPDDR5x memory subsystem provides twice the bandwidth of traditional DDR5 designs, up to 1 terabyte per second, while consuming significantly less power: the entire CPU, including memory, draws just 500 watts.

Nvidia says the Grace CPU Superchip will excel in the most demanding HPC, AI, data analytics, scientific computing, and hyperscale computing applications, with the highest performance, memory bandwidth, energy efficiency, and configurability, and that it will begin shipping in early 2023.

The first Omniverse computing system OVX

As a big player in the metaverse, NVIDIA launched a new industrial digital twin computing system, OVX, at this year's GTC developer conference.

OVX was created to run digital twin simulations in Omniverse, which Nvidia describes as "a real-time physically accurate world simulation and 3D design collaboration platform."

"Just as we provided DGX for AI, we now provide OVX for Omniverse," said Jen-Hsun Huang.

OVX is the first Omniverse computing system, consisting of eight Nvidia A40 GPUs, three Nvidia ConnectX-6 Dx 200-Gbps NICs, dual Intel Ice Lake 8362 CPUs, 1TB of system memory, and 16TB of NVMe storage.

When connected to a Spectrum-3 switch fabric, an OVX computing system can scale from a single pod of 8 OVX servers to a SuperPOD of 32 OVX servers. Multiple SuperPODs can also be deployed for larger simulation needs.

According to NVIDIA, "OVX will enable designers, engineers and planners to build physically accurate digital twins of buildings, or create large-scale, realistic simulated environments with precise time synchronization between the physical and virtual worlds."

Jen-Hsun Huang also pointed out in his speech that, due to the complexity of industrial systems, "Omniverse software and computers need to be scalable, low-latency, and support precise timing." Because data centers process data in the shortest possible time rather than at an exact time, Nvidia wanted to create a "synchronized data center" with OVX.

The first-generation OVX system has already been deployed within NVIDIA and with some early customers. The second-generation system is currently in development and will benefit from the new Spectrum-4 Ethernet platform that NVIDIA also announced today.

Spectrum-4 is a 51.2 Tbps Ethernet switch with 100 billion transistors that enables nanosecond timing accuracy.

In addition, on the Omniverse side, NVIDIA released a new product called Omniverse Cloud, a cloud service designed to facilitate real-time 3D design collaboration between creatives and engineers.

Omniverse Cloud is said to eliminate the complexity that arises from the need for multiple designers to work together in a variety of different tools and locations.

"We want Omniverse to reach every one of the tens of millions of designers, creators, roboticists and AI researchers," said Jen-Hsun Huang.

Autonomous Driving DRIVE Hyperion 9

Autonomous driving is a field where the major technology giants have gone head to head in recent years. Everyone knows it is a lucrative prize, but winning it takes real capability.

Unlike Apple, whose car-building vision is to keep the entire software and hardware ecosystem in its own hands, NVIDIA has a clear goal in autonomous driving: to build a fully autonomous driving solution step by step.

Following the release of the Orin chip for autonomous driving in 2019 and the start of its production and sales this month, NVIDIA has now announced its next-generation platform for software-defined autonomous driving, DRIVE Hyperion 9.


According to the official introduction, the DRIVE Hyperion 9 platform adopts an open, modular design that includes the computer architecture, the sensor set, and the full NVIDIA DRIVE Chauffeur and Concierge applications, so developers can take just the parts they need.

NVIDIA has also built compute redundancy into the DRIVE Hyperion 9 architecture. It uses the DRIVE Atlan automotive system-on-chip announced in 2021, which delivers more than twice the performance of the current Orin-based architecture. In terms of sensors, the DRIVE Hyperion 9 architecture includes 14 cameras, 9 radars, 3 lidars, and 20 ultrasonic sensors for automated and autonomous driving, as well as 3 cameras and 1 radar for interior occupant sensing.

Nvidia likens DRIVE Hyperion to the vehicle's nervous system and DRIVE Atlan to its brain. This generation of systems ranges from NCAP to Level 3 driving and Level 4 parking, with advanced AI cockpit capabilities.

Nvidia plans for DRIVE Hyperion 9 to go into production vehicles in 2026. The programmable architecture is built on multiple DRIVE Atlan computers that handle intelligent driving and in-cabin functionality.




