The obvious answer would be Nvidia’s new GB200 systems, essentially one giant 72-GPU server. But those cost millions, face extreme supply shortages, and aren’t available everywhere, the researchers noted. Meanwhile, H100 and H200 systems are plentiful and relatively cheap.
The catch: running large models across multiple older systems has traditionally meant brutal performance penalties. “There are no viable cross-provider solutions for LLM inference,” the research team wrote, noting that existing libraries either lack AWS support entirely or suffer severe performance degradation on Amazon’s hardware.
TransferEngine aims to change that. “TransferEngine enables portable point-to-point communication for modern LLM architectures, avoiding vendor lock-in while complementing collective libraries for cloud-native deployments,” the researchers wrote.