Perplexity AI Leverages Nvidia’s Latest Hardware for Unmatched Performance

By Patricia Miller

May 14, 2026

2 min read

Perplexity AI enhances performance using Nvidia's latest hardware, achieving significant improvements in speed and cost for language models.

#How is Perplexity AI Improving Performance with Nvidia’s Latest Hardware?

Perplexity AI has ramped up its operations by deploying massive language models on the latest Nvidia hardware. This deployment marks a significant leap in performance metrics that can drastically influence operational costs and speed. The technical research released by the company highlights its integration of Qwen3 235B mixture-of-experts models on Nvidia’s Blackwell-generation GB200 NVL72 racks, demonstrating marked enhancements over earlier Hopper-generation systems.

#What Hardware is Perplexity AI Utilizing?

The setup comprises GB200 NVL72 racks, each equipped with 72 GPUs carrying 180 GB of high-bandwidth memory apiece. The GPUs are interconnected through a 72-way NVLink domain providing 1,800 GB/s of bandwidth. This topology is particularly noteworthy because it lays the foundation for the data-handling and processing capabilities that large-scale AI inference demands.
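As a rough back-of-the-envelope check on those figures (a sketch using only the numbers quoted in the article, not Perplexity's own tooling), the aggregate memory of one rack can be tallied against the weight footprint of a 235-billion-parameter model:

```python
# Back-of-the-envelope sizing for one GB200 NVL72 rack,
# using the figures quoted in the article.

GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 180  # high-bandwidth memory per GPU

total_hbm_gb = GPUS_PER_RACK * HBM_PER_GPU_GB
print(f"Total HBM per rack: {total_hbm_gb / 1000:.2f} TB")  # 12.96 TB

# At 8-bit precision, a 235B-parameter model needs roughly
# 235 GB for weights alone -- a small fraction of one rack's memory.
weights_gb = 235
print(f"Fraction of rack HBM used by weights: {weights_gb / total_hbm_gb:.1%}")
```

Roughly 13 TB of pooled HBM per rack is what makes serving a model of this size inside a single NVLink domain feasible in the first place.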

#What Performance Improvements Are Observed?

The performance metrics reveal a dramatic decline in latency for NVLink all-reduce operations. Latency has decreased from 586.1 microseconds on the H200 Hopper to 313.3 microseconds on the GB200, a 46% improvement. The time taken for MoE prefill combine operations has likewise fallen from 730.1 microseconds to 438.5 microseconds, a 40% improvement.
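Those percentages follow directly from the quoted microsecond figures; a quick check (illustrative arithmetic, not taken from the company's report):

```python
# Verify the relative improvements implied by the quoted latencies.

def improvement(before_us: float, after_us: float) -> float:
    """Relative latency reduction, as a percentage."""
    return (before_us - after_us) / before_us * 100

allreduce = improvement(586.1, 313.3)  # NVLink all-reduce, H200 -> GB200
prefill = improvement(730.1, 438.5)    # MoE prefill combine

print(f"All-reduce: {allreduce:.1f}% faster")       # ~46.5%
print(f"Prefill combine: {prefill:.1f}% faster")    # ~39.9%
```

Both results round to the 46% and 40% figures cited above.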

Notably, Perplexity AI has reported real-time inference speeds up to 30 times faster than H100 baselines under certain configurations, illustrating how strongly hardware upgrades can affect operational efficiency.

#What Software Innovations Are Driving Performance Gains?

Beyond hardware advancements, Perplexity AI has implemented several software-level optimizations that exploit the Blackwell architecture. Blackwell-native quantization decreases the precision of model weights, increasing computational speed without significantly compromising output quality. The system also introduces prefill and decode disaggregation, a method that separates initial prompt processing from the token-by-token generation phase so each can be scheduled and scaled independently. Furthermore, custom kernels have been developed to meet the demands of running a 235-billion-parameter MoE model on this specific hardware.
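To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit weight quantization. This is illustrative only: Perplexity's Blackwell-native scheme targets hardware-accelerated low-precision formats and is not published as code, so the function names and the int8 format here are assumptions for demonstration.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 with a single per-tensor scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error per weight is bounded by half the scale step.
print("max abs error:", np.abs(w - w_hat).max())
```

The trade-off shown here is the general one the article describes: weights occupy a quarter of their float32 footprint and feed faster matrix-multiply paths, at the cost of a small, bounded rounding error.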

#Why Are These Developments Important for AI Hardware?

This strategic deployment solidifies Nvidia's competitive edge against alternatives such as AMD’s MI300X and AWS’s custom Trainium chips. The innovative use of the high-bandwidth 72-GPU NVLink topology, which enables 1,800 GB/s bandwidth, is particularly crucial. Many competing systems still depend on slower interconnects that create bottlenecks when coordinating models reliant on multiple GPUs simultaneously.

In summary, the advancements by Perplexity AI not only highlight the superiority of Nvidia's hardware but also signal a pivotal moment in the AI hardware landscape, potentially reshaping market dynamics. Investors attentive to these technological shifts should consider the implications for future developments in AI and machine learning capabilities.

Important Notice And Disclaimer

This article does not provide any financial advice and is not a recommendation to deal in any securities or product. Investments may fall in value and an investor may lose some or all of their investment. Past performance is not an indicator of future performance.