The network interface card (NIC) is a key component for any computer that needs to communicate with an external system. Many motherboards, especially those used in servers, come with an integrated NIC. However, the onboard NIC may not suffice for many use cases because its data transfer rates are inadequate for communication with storage arrays and other servers.
For machine learning, a dedicated NIC is the way to go, especially since prices are competitive due to the number of vendors in the market. The network interface card continues to evolve, with the latest models incorporating multi-core CPUs, DDR4 RAM, their own operating systems, intelligent capabilities, and more. In fact, companies like Mellanox (Nvidia) call the latest ones SmartNICs, which are embedded with DPUs. The DPU takes the NIC to another level, in that many data processing tasks are completely offloaded from the CPU. The ideal NIC for a machine learning system depends on the use case.
What Is a DPU?
According to Nvidia, the DPU (data processing unit) is a new class of processor embedded into a SmartNIC that incorporates a plethora of features. Nvidia's CEO calls it one of the three pillars of computing, along with the CPU and GPU. In short, it acts like an independent system: it plugs into a PCIe slot and runs its own multi-core CPU, RAM, and operating system. By doing so, valuable CPU cycles are reserved for other processes and threads. The DPU is an enterprise product meant for the data center and data-processing-intensive workloads like machine learning, big data, deep learning, security, and storage.
What functionality can the DPU support? Networking, storage, and security. One extreme example is the AWS Nitro System, which is like a DPU on steroids: it handles storage, networking, virtualization, security, and much more. Plug it into a PCIe slot and it turns a server into an AWS system. Current SmartNICs with DPUs are not there yet, but they will be someday. Nvidia lists the following features of its DPUs:
- Multi-core CPU and dedicated RAM
- Software-programmable; supports network virtualization and traffic shaping
- Network speed at line rate
- Supports RDMA (bypasses CPU)
- Designed for high-performance data processing
- Secure root of trust
An Nvidia competitor is Fungible, a startup that focuses on manufacturing DPUs. Instead of using an ARM processor for its DPUs like Nvidia, the startup uses an FPGA (field-programmable gate array). There is a huge debate as to which technology is better. Working with an FPGA requires a hardware programmer with experience in low-level languages like Verilog or VHDL. On the other hand, ARM supports high-level programming languages like Rust and Go. The Nvidia DPU is called BlueField, and its SDK is named DOCA. Companies like Juniper, Red Hat, and F5 have started supporting BlueField.
Ethernet or InfiniBand
There are two interconnect options when it comes to NICs: Ethernet and InfiniBand. Ethernet is used for local area networks and data centers. InfiniBand is used as a network fabric in data centers and HPC settings, connecting compute and storage together. Ethernet is a global networking standard that generates ~$2B in annual revenue, according to the Dell’Oro Group. InfiniBand generated ~$200M in 2019, with the market growing at a 40% CAGR.
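A 40% CAGR compounds quickly. A minimal sketch of where that puts the InfiniBand market, assuming the ~$200M 2019 figure and growth rate above hold (the projection horizons are arbitrary assumptions, not forecasts from the source):

```python
# Project InfiniBand market size from the ~$200M 2019 figure at a 40% CAGR.
base_revenue_m = 200   # ~$200M in 2019 (figure cited above)
cagr = 0.40            # 40% compound annual growth rate

for years in (1, 3, 5):
    projected = base_revenue_m * (1 + cagr) ** years
    print(f"2019 + {years} yr: ~${projected:,.0f}M")
```

At that rate the market would pass $1B in roughly five years, which helps explain the vendor interest despite Ethernet's much larger installed base.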
InfiniBand is an industry standard that has its place in the world of HPC (high-performance computing). In fact, according to an Nvidia SVP, it is the de facto standard for HPC systems; one stat indicates that 77% of new HPC systems use it as an interconnect. In short, InfiniBand powers the computing centers of research institutions, bioscience companies, hyperscale infrastructure like Azure, and much more. Its benefits include the following:
- Full-transport offload network that bypasses the CPU
- Most efficient network protocol
- Lower latency than Ethernet
- Lossless network with zero packet loss
- Supports RDMA where remote systems can access the memory of another host and bypass the CPU
- First to market with higher speeds: 200Gb/s available now and 400Gb/s soon
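To see what those line rates mean in practice, here is a rough back-of-the-envelope sketch of dataset transfer times at the speeds mentioned above (the 500 GB dataset size is a hypothetical assumption, and real-world throughput will fall short of line rate due to protocol overhead):

```python
# Rough time to move a training dataset at various line rates.
# Line rates are in Gb/s (bits); dataset size is in GB (bytes), hence the *8.
dataset_gb = 500  # hypothetical 500 GB training dataset

for name, gbps in [("25GbE", 25), ("100GbE", 100), ("200Gb/s InfiniBand", 200)]:
    seconds = dataset_gb * 8 / gbps  # bytes -> bits, divided by line rate
    print(f"{name}: ~{seconds:.0f} s")
```

Even at this idealized ceiling, the gap between 25GbE and a 200Gb/s fabric is the difference between minutes and seconds per epoch of data movement.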
InfiniBand seems like the way to go for machine learning use cases; however, component pricing is much higher than Ethernet's. For some companies, it will simply be too expensive. Here is a pricing summary for NICs, DPUs, and InfiniBand adapters: NIC pricing is from FS, DPU pricing from the Nvidia Store, and InfiniBand pricing from CDW.
- NIC – Mellanox MCX4121A 25Gbe: $259
- NIC – Mellanox MCX515A 100GbE: $749
- DPU – Nvidia MBF2H332A 25GbE: $1,545
- DPU – Nvidia MBF2M516A 100GbE: $1,995
- InfiniBand – HPE InfiniBand 544+M 40Gb/s PCIe3.0: $1,421
- InfiniBand – HPE InfiniBand HDR100 100Gb/s PCIe4.0: $1,842
- InfiniBand – Mellanox ConnectX-6 VPI Card 200Gb/s: $1,630
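One way to compare these options is normalized cost. A quick sketch computing dollars per Gb/s from the list prices above (card prices only; switches, cables, and transceivers are excluded):

```python
# Price per Gb/s of line rate for the cards listed above.
cards = [
    ("Mellanox MCX4121A 25GbE NIC",      25,  259),
    ("Mellanox MCX515A 100GbE NIC",     100,  749),
    ("Nvidia MBF2H332A 25GbE DPU",       25, 1545),
    ("Nvidia MBF2M516A 100GbE DPU",     100, 1995),
    ("HPE InfiniBand 544+M 40Gb/s",      40, 1421),
    ("HPE InfiniBand HDR100 100Gb/s",   100, 1842),
    ("Mellanox ConnectX-6 VPI 200Gb/s", 200, 1630),
]

for name, gbps, usd in cards:
    print(f"{name}: ${usd / gbps:.2f} per Gb/s")
```

By this measure, the 200Gb/s ConnectX-6 is actually the cheapest card per unit of bandwidth at about $8/Gb/s, while the DPUs carry a clear premium for their onboard compute.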