NCCL (NVIDIA Collective Communications Library) is a standalone NVIDIA library of standard communication routines for GPUs. It implements collective operations such as all-reduce, all-gather, reduce, broadcast, and reduce-scatter, as well as point-to-point send/receive primitives for building other communication patterns.
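The following is a minimal CPU-only sketch in Python that shows what each collective computes. It assumes each GPU's buffer can be modeled as a list of integers and uses sum as the reduction operator. The function names are hypothetical and are not NCCL's API. NCCL's real entry points are C functions such as `ncclAllReduce`, which operate on device memory.

```python
from typing import List

# Each inner list stands in for one rank's (GPU's) buffer.
# CPU-only illustration of collective semantics; not NCCL's implementation.
Buffers = List[List[int]]


def all_reduce(bufs: Buffers) -> Buffers:
    """Every rank receives the element-wise sum of all ranks' buffers."""
    total = [sum(vals) for vals in zip(*bufs)]
    return [list(total) for _ in bufs]


def reduce(bufs: Buffers, root: int) -> Buffers:
    """Only the root receives the element-wise sum.

    In this model, non-root ranks simply keep their input.
    """
    out = [list(b) for b in bufs]
    out[root] = [sum(vals) for vals in zip(*bufs)]
    return out


def broadcast(bufs: Buffers, root: int) -> Buffers:
    """Every rank receives a copy of the root's buffer."""
    return [list(bufs[root]) for _ in bufs]


def all_gather(bufs: Buffers) -> Buffers:
    """Every rank receives all ranks' buffers concatenated in rank order."""
    gathered = [x for b in bufs for x in b]
    return [list(gathered) for _ in bufs]


def reduce_scatter(bufs: Buffers) -> Buffers:
    """Rank r receives the r-th equal-sized chunk of the element-wise sum."""
    n = len(bufs)
    total = [sum(vals) for vals in zip(*bufs)]
    if len(total) % n != 0:
        raise ValueError("buffer length must be divisible by the number of ranks")
    chunk = len(total) // n
    return [total[r * chunk:(r + 1) * chunk] for r in range(n)]
```

For example, `all_reduce([[1, 2], [3, 4], [5, 6]])` gives every rank `[9, 12]`. Also, `all_gather(reduce_scatter(bufs))` produces the same result as `all_reduce(bufs)` whenever the buffer length divides evenly among the ranks. This is why all-reduce is often implemented as a reduce-scatter followed by an all-gather.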
NCCL is optimized for high bandwidth on platforms that use NVLink, PCIe, and NVSwitch interconnects. It supports an arbitrary number of GPUs within a single node or across multiple nodes, and it can be used in either single-process or multi-process applications.
Features of NCCL
- Automatic topology detection for high-bandwidth paths on AMD, ARM, PCIe Gen4, and InfiniBand HDR platforms
- In-network all-reduce operations using SHARPV2 (Scalable Hierarchical Aggregation and Reduction Protocol), which significantly improve bandwidth
- Graph search for the rings and trees with the highest bandwidth and lowest latency
- Internode communication over InfiniBand verbs, RoCE, libfabric, and IP sockets
- Rerouting of traffic away from congested ports using InfiniBand adaptive routing
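A ring is one of the communication structures that NCCL's graph search selects. The classic ring all-reduce illustrates why it is attractive. The sketch below is a CPU-only simulation under simplifying assumptions: integer buffers, sum reduction, and lock-step rounds. It is not NCCL's implementation. NCCL pipelines smaller chunks, runs on GPUs, and can choose between ring and tree algorithms. The function `ring_all_reduce` is hypothetical.

```python
from typing import List, Tuple


def ring_all_reduce(bufs: List[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Simulate a ring all-reduce (sum) over per-rank lists.

    Returns the per-rank results and the number of elements each rank sent.
    """
    if not bufs:
        raise ValueError("need at least one rank")
    n = len(bufs)
    length = len(bufs[0])
    if any(len(b) != length for b in bufs):
        raise ValueError("all ranks must hold buffers of the same length")
    # Split each buffer into n contiguous chunks; chunk i is [bounds[i], bounds[i+1]).
    bounds = [(i * length) // n for i in range(n + 1)]
    data = [list(b) for b in bufs]
    sent = [0] * n

    def run_step(chunk_for_rank, accumulate: bool) -> None:
        # Snapshot every rank's outgoing chunk first so all sends in one
        # step behave as if they happened simultaneously.
        outgoing = []
        for r in range(n):
            c = chunk_for_rank(r)
            outgoing.append((c, data[r][bounds[c]:bounds[c + 1]]))
            sent[r] += bounds[c + 1] - bounds[c]
        # Each rank delivers its chunk to its right-hand neighbour on the ring.
        for r, (c, payload) in enumerate(outgoing):
            dst = (r + 1) % n
            for k, v in zip(range(bounds[c], bounds[c + 1]), payload):
                data[dst][k] = data[dst][k] + v if accumulate else v

    # Phase 1 (reduce-scatter): after n-1 steps, rank r holds the fully
    # reduced chunk (r + 1) % n.
    for step in range(n - 1):
        run_step(lambda r: (r - step) % n, accumulate=True)
    # Phase 2 (all-gather): pass the reduced chunks around the ring until
    # every rank holds all of them.
    for step in range(n - 1):
        run_step(lambda r: (r + 1 - step) % n, accumulate=False)
    return data, sent
```

With 3 ranks and 6-element buffers, each rank sends 8 elements in total. That is 2(n-1)/n = 4/3 of its buffer. Because this per-rank volume stays below twice the buffer size no matter how many GPUs join the ring, rings use bandwidth efficiently. Their latency, however, grows with the 2(n-1) sequential steps, which is one reason tree-based algorithms are attractive in latency-sensitive cases.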