Alibaba Cloud abandons Nvidia’s interconnect in favor of Ethernet – technology giant uses its own high-performance network to connect 15,000 GPUs in the data center

Alibaba Cloud engineer and researcher Ennan Zhai has published his research paper on GitHub, outlining the design of the cloud provider’s data centers for LLM training. The PDF document, titled “Alibaba HPN: A Data Center Network for Training Large Language Models,” describes how Alibaba used Ethernet to enable its 15,000 GPUs to communicate with each other.

General cloud computing generates steady but small data flows, typically below 10 Gbps. LLM training, on the other hand, generates periodic bursts of data that can reach up to 400 Gbps. According to the paper, this property of LLM training makes Equal-Cost Multi-Path (ECMP), the load-balancing scheme commonly used in traditional data centers, prone to hash polarization, which leads to problems such as uneven traffic distribution.
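To make hash polarization concrete, here is a minimal Python sketch of a two-tier ECMP fabric. It is an illustration under stated assumptions, not the switch logic described in the paper: the flow tuples, the fan-out of four uplinks, and the use of SHA-256 as a stand-in for a switch's hardware hash are all hypothetical. The point is only that when two tiers apply the same hash to the same flow fields, every flow that reached a given second-tier switch picks the same uplink again.

```python
import hashlib
from collections import Counter

NUM_UPLINKS = 4  # hypothetical fan-out at each tier


def ecmp_pick(flow, num_links, seed=b""):
    """Pick a link by hashing the flow tuple.

    SHA-256 is only a stand-in for a switch's hardware hash (assumption);
    `seed` models a per-switch hash seed.
    """
    key = seed + "|".join(map(str, flow)).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links


# Hypothetical flows: (src_ip, dst_ip, src_port, dst_port, protocol)
flows = [
    (f"10.0.0.{i % 8}", f"10.0.1.{i % 16}", 40000 + i, 4791, "udp")
    for i in range(64)
]

# Tier 1: each flow is hashed onto one of the uplinks.
tier1_choice = {flow: ecmp_pick(flow, NUM_UPLINKS) for flow in flows}

# Look at the flows that landed on the tier-2 switch behind uplink 0.
at_switch0 = [flow for flow, link in tier1_choice.items() if link == 0]

# Tier 2 reuses the same hash and seed: every flow picks uplink 0 again.
same_hash = Counter(ecmp_pick(flow, NUM_UPLINKS) for flow in at_switch0)

# Tier 2 with a different seed: the same flows spread out again.
diff_hash = Counter(
    ecmp_pick(flow, NUM_UPLINKS, seed=b"tier2") for flow in at_switch0
)

print("tier 2, same hash:     ", dict(same_hash))
print("tier 2, different seed:", dict(diff_hash))
```

With the shared hash, three of the four second-tier uplinks carry nothing. A handful of 400 Gbps bursts concentrated this way is far more damaging than the same imbalance among many small sub-10 Gbps flows, which is why the problem is more visible in LLM training than in general cloud traffic.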
