Highlights
- Gaudi2 trained ResNet-50 in 36 percent less time than Nvidia’s submission, which equates to roughly 56 percent higher performance.
- Gaudi2 also beat the A100 on BERT, though by a smaller margin of 7 percent.
The most recent results from MLPerf, which has grown into the industry-standard benchmark suite for comparing AI accelerators, show Intel’s Habana surpassing Nvidia. Even though Nvidia has already announced its next-generation GPU, the findings show that competition in deep learning training hardware is intensifying.
Intel acquired Habana in late 2019 for USD 2 billion, and the firm’s first-generation 16nm Gaudi NPU (Neural Processing Unit) went live in Amazon’s AWS cloud last year, boasting 40% better performance per dollar than Nvidia-based instances. Its main rival was Nvidia’s 7nm A100, and Habana added value by costing less than Nvidia rather than outperforming it.
This changed in May, when Habana unveiled Gaudi2 on 7nm, which supports up to 96GB of HBM2e and triples the number of tensor processor cores. Habana asserted that it outperformed the A100, Nvidia’s two-year-old flagship data center GPU, by a comfortable margin. The introduction was timed well to be included in the latest results from MLPerf, an industry effort to standardize deep learning benchmarking.
Performance outcomes
Habana said that because it had only ten days from the launch to submit its results, it could not complete all eight tests and instead concentrated on ResNet-50 (image recognition) and BERT (natural language processing), the two best-known benchmarks. MLPerf submissions then undergo a month of peer review.
Habana added that the short window meant thorough software optimizations had not yet been done. For instance, Gaudi2 introduced support for the new, lower-precision FP8 format, but it was not used in the submission. Whereas Nvidia reportedly uses optimizations not included in its customer-available software, Habana chose to submit results based on the same software that is available to all Habana customers.
The performance gap is even larger in non-optimized scenarios: in Habana’s own tests using open repositories on Azure, Gaudi2 was at least twice as fast as the A100 on ResNet-50 and BERT. Habana contends that these results are more indicative of the out-of-the-box performance that users of open-source software will experience.
Gaudi2 trained ResNet-50 in 36% less time than Nvidia’s submission, equating to roughly 56% higher performance. Although slower than Gaudi2, the PyTorch-based results from the startup MosaicML delivered a training time of 23.8 minutes, which also beat Nvidia’s submission. Additional software improvements might shorten Gaudi2’s time in future submissions.
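As a quick sanity check on how a time reduction maps to a performance gain, the sketch below (a minimal illustration; the helper name speedup_from_time_reduction is ours, not from any benchmark tooling) reproduces the arithmetic: finishing in 36% less time means running in 0.64x the time, and 1 / 0.64 ≈ 1.56, i.e. roughly 56% higher throughput.

```python
# Convert a reduction in training time into a relative throughput gain.
# Finishing in 36% less time means running in 0.64x the time, so throughput
# (work per unit time) rises by 1 / 0.64 - 1, roughly 56%.

def speedup_from_time_reduction(time_reduction: float) -> float:
    """Relative performance gain for a fractional reduction in run time."""
    remaining_time = 1.0 - time_reduction  # e.g. 1 - 0.36 = 0.64
    return 1.0 / remaining_time - 1.0

if __name__ == "__main__":
    gain = speedup_from_time_reduction(0.36)
    print(f"36% less time -> {gain:.0%} higher performance")  # prints ~56%
```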
Gaudi2 beat the A100 on BERT by a narrower margin of 7%. Gaudi2 outperformed the first-generation Gaudi on ResNet-50 and BERT by factors of 3 and 4.7, respectively. All accelerator results are based on 8-card servers. Against a theoretical scaling limit of 32x, Habana’s results for a 256-card system deliver approximately 25x the performance, demonstrating that performance is largely preserved in the scale-out topologies in which these chips are frequently used.
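To make the scale-out claim concrete, here is a minimal sketch under the numbers quoted above (the function name scaling_efficiency is illustrative, not part of any MLPerf tooling) that converts the ~25x result on 256 cards into a scaling-efficiency figure against the 32x linear ideal:

```python
# Rough scaling-efficiency check for the 256-card figure quoted above.
# Going from 8 cards to 256 cards is a 32x increase in hardware; the
# reported ~25x performance gain therefore implies ~78% scaling efficiency.

def scaling_efficiency(baseline_cards: int, scaled_cards: int,
                       measured_speedup: float) -> float:
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    ideal_speedup = scaled_cards / baseline_cards  # 256 / 8 = 32x
    return measured_speedup / ideal_speedup

if __name__ == "__main__":
    eff = scaling_efficiency(baseline_cards=8, scaled_cards=256,
                             measured_speedup=25.0)
    print(f"Scaling efficiency: {eff:.0%}")  # prints ~78%
```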
Next steps
Many AI hardware firms believed they could defeat Nvidia by abandoning GPU technology altogether and concentrating solely on AI-specific hardware. Habana’s Gaudi2 has now beaten Nvidia’s A100, with both built on 7nm manufacturing technology. Despite having only a few days between its official debut and the submission deadline, it has become a formidable contender using out-of-the-box hardware and commercially available software. According to Habana, the performance gap on non-optimized code outside MLPerf can be nearly two times greater. Since each Gaudi2 chip also has 24 embedded 100G Ethernet ports and Habana is likely to price Gaudi2 below Nvidia’s A100, the total-cost-of-ownership advantage may be even larger than what Habana and AWS have already claimed for the first-generation Gaudi.
Even if Habana has won this round of the performance contest, Nvidia has already confirmed that the H100 will be available later this year. Additionally, Habana has not yet announced any Gaudi2 cloud instances.