Today, I had the exciting opportunity to set up one of NVIDIA’s most powerful AI computing platforms – the Grace Hopper Superchip (GH200). What makes this setup particularly interesting is that it’s actually two systems in one: the main Grace Hopper CPU-GPU system and the BlueField-3 DPU (Data Processing Unit).
Understanding the Hardware
The GH200 combines NVIDIA’s Grace CPU with a Hopper GPU architecture, creating a powerhouse for AI and high-performance computing workloads. The BlueField-3 DPU is essentially a network card on steroids – it’s a specialized processor that handles networking, storage, and security tasks, freeing up the main system for computational work.
Installation Journey
The main system installation started with Ubuntu as the base OS. Despite NVIDIA’s official documentation suggesting Ubuntu 22.04 as the latest supported version, we decided to push boundaries with Ubuntu 24. This led to our first challenge: package compatibility. Some Debian packages needed direct downloads since the apt repositories weren’t yet updated for Ubuntu 24.
The installation sequence was straightforward:
- Ubuntu base installation
- CUDA toolkit deployment
- Latest NVIDIA driver installation
For the BlueField-3 DPU, we needed to:
- Download the latest DOCA (BFB file) – NVIDIA’s software framework for DPUs
- Flash it onto the DPU
- Configure the relationships between BlueField, its BMC (Baseboard Management Controller), and NIC data ports
Challenges and Solutions
The most interesting challenge came from testing the DPU installation. While there are sophisticated testing methods available, sometimes the simplest solution is the best – we confirmed proper operation by (un)plugging the cables. It’s a reminder that even with cutting-edge technology, basic troubleshooting methods remain valuable.
The End Result
The final setup runs Ubuntu on the main system and NVIDIA’s customized Ubuntu variant on the DPU. This configuration is specifically designed for scientific computing, particularly big data analysis and machine learning operations. The combination of the Grace Hopper’s computational power and BlueField-3’s networking capabilities creates an ideal environment for researchers pushing the boundaries of scientific discovery.
Looking Forward
While the installation had its challenges, particularly around OS version compatibility, the system represents a significant step forward in computing capabilities for scientific research. As software support catches up with hardware innovations, we can expect the setup process to become more streamlined.