FreeBSD: Netflix streams at almost 400 GBit/s per server
September 20, 2021
For its streaming service, Netflix has heavily optimized its AMD hardware and FreeBSD, the operating system it has used for years. Further plans are already in the works.
For a video streaming provider like Netflix, it pays off technically to deliver as much video data as possible, as quickly as possible, without investing too heavily in hardware. Among other things, developer Drew Gallatin therefore optimizes the FreeBSD system Netflix uses so that video delivery comes as close as possible to the technical limit of the network hardware, as he described in a presentation at the EuroBSD conference (PDF). A single server can now reach data rates of almost 400 GBit/s.
The basic problem is that hardware characteristics and certain software processes often prevent the network hardware from reaching its theoretical limit. Thanks to various optimizations, Netflix has nevertheless raised data rates from around 200 GBit/s per server last year to nearly 400 GBit/s now.
Hardware problems and optimizations as a remedy
The servers are built around an AMD Epyc 7502P CPU (32 cores) backed by 256 GBytes of DDR4-3200 RAM. Two Mellanox ConnectX-6 Dx network cards provide the network connection, with a total of four 100GbE ports. At first, however, memory bandwidth held throughput to around 240 GBit/s, well short of the optimum.
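As a back-of-the-envelope illustration, the figures above can be combined into a simple bottleneck estimate: a server can serve no more than its slowest stage allows. This is a minimal sketch under the simplifying assumption that the network ports and the memory-bound limit are the only two constraints; the split into two ports per card and the function names are assumptions for illustration.

```python
# Hypothetical bottleneck model: throughput is capped by the slowest stage.
NICS = 2                  # Mellanox ConnectX-6 Dx cards per server
PORTS_PER_NIC = 2         # assumed 2 x 100GbE per card, 4 x 100GbE in total
PORT_GBITS = 100          # line rate per port in Gbit/s
MEMORY_LIMIT_GBITS = 240  # approximate memory-bandwidth-bound limit cited in the article


def network_capacity_gbits(nics: int, ports_per_nic: int, port_gbits: float) -> float:
    """Aggregate line rate of all network ports, in Gbit/s."""
    return nics * ports_per_nic * port_gbits


def ceiling_gbits(*limits: float) -> float:
    """The achievable rate is bounded by the smallest of the individual limits."""
    return min(limits)


network = network_capacity_gbits(NICS, PORTS_PER_NIC, PORT_GBITS)
ceiling = ceiling_gbits(network, MEMORY_LIMIT_GBITS)
print(f"network: {network:.0f} Gbit/s, ceiling: {ceiling:.0f} Gbit/s")
```

With these inputs the network could carry 400 Gbit/s, but the memory-bound limit of 240 Gbit/s is the binding constraint, which is why the optimizations target memory traffic rather than the NICs.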
The NUMA architecture of the Epyc CPU also constrains the data flow: with so much data in motion, memory accesses compete with one another and the CPU's processing is delayed. Ideally, according to Gallatin, the data would not have to cross between NUMA nodes at all.
Netflix therefore tested using only a single IP address per host, relying on link aggregation. The idea was to keep the processing of a given piece of data on one NUMA node wherever possible, avoiding transfers between nodes. However, switching from one to four NUMA nodes reportedly brought only minimal improvements.
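The placement idea can be sketched as follows: link aggregation picks an egress port for each connection, and the server then handles that connection on the NUMA node that owns the port. This is a minimal sketch under stated assumptions: the one-port-per-node topology, the CRC32 hash (real link-aggregation hash policies differ), and all function names are hypothetical.

```python
import zlib

# Hypothetical topology: four 100GbE ports, each assumed to sit on its own
# NUMA node (the four-node configuration mentioned in the text).
PORT_TO_NUMA_NODE = {0: 0, 1: 1, 2: 2, 3: 3}


def lag_egress_port(conn: tuple[str, int, str, int], n_ports: int = 4) -> int:
    """Pick an egress port for a connection, as a link-aggregation hash might.

    CRC32 over the 4-tuple is a stand-in; real hash policies differ.
    """
    key = "{}:{}->{}:{}".format(*conn).encode()
    return zlib.crc32(key) % n_ports


def siloed_node(conn) -> int:
    """Serve the connection from the NUMA node that owns its egress port."""
    return PORT_TO_NUMA_NODE[lag_egress_port(conn)]


def fixed_node(conn) -> int:
    """Naive placement: always serve from node 0, regardless of the port."""
    return 0


def cross_numa_connections(conns, place) -> int:
    """Count connections whose data must cross the NUMA fabric to reach their port."""
    return sum(1 for c in conns if place(c) != PORT_TO_NUMA_NODE[lag_egress_port(c)])


# One server IP (as in the single-IP-per-host setup), many client connections.
conns = [("192.0.2.1", 443, f"198.51.100.{i % 250}", 50000 + i) for i in range(1000)]
print("siloed:", cross_numa_connections(conns, siloed_node))
print("fixed node 0:", cross_numa_connections(conns, fixed_node))
```

Because the egress port is a pure function of the connection, the matching node can be chosen up front, so the siloed placement never crosses the fabric in this model; the text notes that in practice the gain was only minimal.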
Kernel TLS as a solution
To push speeds higher still, the team set out to take the CPU out of data transport as far as possible. This is made possible by the network hardware now in use, combined with TLS offload through the TLS implementation in the FreeBSD kernel (kTLS).
Gallatin had already described the details in a presentation two years earlier (PDF). The TLS session itself is still established in user space, but the session keys are then handed via the kernel to the network hardware, which performs the encryption. The detour through the CPU is no longer needed.
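Why offloading encryption to the NIC helps a memory-bound server can be illustrated with a deliberately simplified model that counts how often each served byte crosses the memory bus. The four-pass and two-pass breakdowns below are assumptions for illustration (they ignore caches, metadata and TCP overhead), and the bandwidth figure is hypothetical, not a measurement.

```python
# Simplified, assumed model of how often each served byte crosses the memory bus.
SOFTWARE_TLS_PASSES = [
    "storage DMA writes plaintext into RAM",
    "CPU reads plaintext to encrypt it",
    "CPU writes ciphertext back to RAM",
    "NIC DMA reads ciphertext from RAM",
]
NIC_OFFLOAD_PASSES = [
    "storage DMA writes plaintext into RAM",
    "NIC DMA reads plaintext and encrypts it on the card",
]


def memory_bound_ceiling(memory_bandwidth: float, passes: int) -> float:
    """If memory is the bottleneck, each byte served costs `passes` units of bandwidth."""
    return memory_bandwidth / passes


MEM_BW = 1000.0  # hypothetical memory bandwidth, arbitrary units
sw = memory_bound_ceiling(MEM_BW, len(SOFTWARE_TLS_PASSES))
hw = memory_bound_ceiling(MEM_BW, len(NIC_OFFLOAD_PASSES))
print(f"software TLS: {sw:.0f}, NIC offload: {hw:.0f}, gain: {hw / sw:.1f}x")
```

Under these assumptions, halving the memory passes per byte doubles the memory-bound ceiling, independent of the actual bandwidth value chosen.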
With the latest network card firmware and TLS optimizations, Netflix reaches about 380 GBit/s of data transfer with about 400,000 active sessions per server. With ARM server CPUs from Ampere, which were also tested, the team still reaches about 320 GBit/s; the shortfall is probably due to the PCIe connection. The team has not yet been able to test TLS offloading on Intel hardware. It also already has hardware prototypes that could enable network connections of up to 800 GBit/s, but has not tested them yet.
At 400,000 simultaneous sessions per server, Netflix, with its 200 million subscribers, would in theory need about 500 of these servers if every subscriber were streaming at the same time.
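The arithmetic in the last two paragraphs can be checked with a few lines of Python. The `concurrency` parameter is a hypothetical addition for illustration: 1.0 reproduces the article's assumption that all subscribers stream at once, while smaller values model a partial peak load.

```python
import math

THROUGHPUT_GBITS = 380      # per-server throughput reported in the article
SESSIONS_PER_SERVER = 400_000
SUBSCRIBERS = 200_000_000


def avg_mbits_per_session(throughput_gbits: float, sessions: int) -> float:
    """Average bit rate per session in Mbit/s (1 Gbit = 1000 Mbit)."""
    return throughput_gbits * 1000 / sessions


def servers_needed(subscribers: int, sessions_per_server: int, concurrency: float = 1.0) -> int:
    """Servers required if `concurrency` is the share of subscribers streaming at once.

    concurrency=1.0 corresponds to everyone streaming simultaneously.
    """
    return math.ceil(subscribers * concurrency / sessions_per_server)


print(f"{avg_mbits_per_session(THROUGHPUT_GBITS, SESSIONS_PER_SERVER):.2f} Mbit/s per session")
print(servers_needed(SUBSCRIBERS, SESSIONS_PER_SERVER), "servers")
```

This prints an average of 0.95 Mbit/s per session and 500 servers; the per-session figure is only an average across all active sessions, not the bit rate of any individual stream.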