Powering Discoveries
Jorge Salazar
Vista: A Bridge to Horizon
TACC introduces AI-focused supercomputer with an eye on future systems
We at TACC are excited to announce Vista, the new NSF-funded, AI-centric system arriving in late 2023 or early 2024.
Vista will feature compute nodes with new NVIDIA CPUs tightly coupled with AI-focused GPUs and high-speed networking.
“The lower power and higher memory bandwidth we expect from Vista will not only deliver more computation, but also enable bigger models and run with more power efficiency,” said TACC Executive Director Dan Stanzione.
Vista will set the stage for TACC’s Horizon, the forthcoming system that will serve as the flagship of the NSF-funded Leadership Class Computing Facility (LCCF), planned for 2025. Horizon is expected to provide 10 times the capability of Frontera, the current top U.S. academic supercomputer. Frontera users take advantage of its 8,000+ nodes to conduct breakthrough science that is not possible on smaller supercomputers.
Vista will mark a departure from the x86-based architecture used in Frontera, the Stampede systems, and others, moving instead to CPUs based on the ARM architecture. ARM has long been used in mobile phones, laptops, and more. The new NVIDIA “Grace” processor is an ARM-based CPU specifically designed for the needs of AI and scientific computing.
“We’re excited about it,” Stanzione said. “It’s our first-ever system with an ARM-based primary processor.”
The NVIDIA Grace Hopper Superchip, which combines the Grace CPU with a GPU, will be the processor for about half of Vista’s compute nodes. On this superchip, the GPU can seamlessly access CPU memory, which enables bigger AI models. The NVIDIA Grace CPU Superchip, which contains two Grace processors in a single module, will power the remainder of Vista’s nodes, which are intended for non-GPU codes.
NVIDIA’s Quantum-2 InfiniBand networking platform will provide Vista with up to 400Gb/s of networking performance.
Rather than featuring the conventional 'sticks' of Dual In-line Memory Modules (DIMMs) on the motherboard, this system integrates its memory directly into the processor module. It uses LPDDR5 technology, which is akin to the memory found in laptops but tuned to meet the specific demands of data centers. In addition to delivering higher bandwidth, this memory is much more power efficient than traditional DIMMs, with savings that could reach 200 watts per node.
“That is a huge power advantage,” Stanzione said.
On the storage side, TACC has partnered with VAST Data to supply the file system, which will use all-flash, high-performance storage linked to Stampede3.
Everything that TACC learns from using the new NVIDIA processors, including porting software stacks, performance tuning, and tighter coupling of CPU and GPU, will help ease user migration when Horizon is deployed.
“Vista will be where our users get their first taste of what’s to come on Horizon,” Stanzione concluded.