The obvious answer would be Nvidia’s new GB200 systems, essentially one giant 72-GPU server. But those cost millions, face extreme supply shortages, and aren’t available everywhere, the researchers noted. Meanwhile, H100 and H200 systems are plentiful and relatively cheap.
The catch: running large models across multiple older systems has traditionally meant brutal performance penalties. “There are no viable cross-provider solutions for LLM inference,” the research team wrote, noting that existing libraries either lack AWS support entirely or suffer severe performance degradation on Amazon’s hardware.
TransferEngine aims to change that. “TransferEngine enables portable point-to-point communication for modern LLM architectures, avoiding vendor lock-in while complementing collective libraries for cloud-native deployments,” the researchers wrote.