Nvidia NCCL

NCCL is a standalone NVIDIA library of standard communication routines for GPUs. These routines implement all-reduce, broadcast, reduce, all-gather, and reduce-scatter, as well as any send/receive-based communication pattern.
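The following is a minimal CPU-side sketch of what each collective computes. It models every GPU rank as a plain Python list and fixes the reduction operator to summation. The helper names (`all_reduce`, `broadcast`, `reduce`, `all_gather`, `reduce_scatter`) are hypothetical illustrations, not NCCL's API. The real library exposes C functions such as `ncclAllReduce` and `ncclReduceScatter` that operate on device buffers through a communicator. The sketch shows only what each rank ends up holding, not how NCCL moves the data.

```python
from typing import List, Optional

# Each rank's buffer is modeled as a plain Python list of floats.
Buffer = List[float]


def _elementwise_sum(bufs: List[Buffer]) -> Buffer:
    # Sum position i across all ranks (the reduction operator is fixed to "sum" here).
    return [sum(vals) for vals in zip(*bufs)]


def all_reduce(bufs: List[Buffer]) -> List[Buffer]:
    # Every rank receives the full element-wise sum.
    total = _elementwise_sum(bufs)
    return [list(total) for _ in bufs]


def broadcast(bufs: List[Buffer], root: int) -> List[Buffer]:
    # Every rank receives a copy of the root rank's buffer.
    return [list(bufs[root]) for _ in bufs]


def reduce(bufs: List[Buffer], root: int) -> List[Optional[Buffer]]:
    # Only the root receives the sum; other ranks get no result (modeled as None).
    total = _elementwise_sum(bufs)
    return [list(total) if r == root else None for r in range(len(bufs))]


def all_gather(bufs: List[Buffer]) -> List[Buffer]:
    # Every rank receives all inputs concatenated in rank order.
    gathered = [x for buf in bufs for x in buf]
    return [list(gathered) for _ in bufs]


def reduce_scatter(bufs: List[Buffer]) -> List[Buffer]:
    # The sum is split into one equal block per rank; rank r keeps block r.
    n = len(bufs)
    total = _elementwise_sum(bufs)
    if len(total) % n != 0:
        raise ValueError("buffer length must be a multiple of the rank count")
    count = len(total) // n
    return [total[r * count:(r + 1) * count] for r in range(n)]


if __name__ == "__main__":
    ranks = [[1.0, 2.0], [3.0, 4.0]]  # two ranks, two elements each
    print(all_reduce(ranks))      # [[4.0, 6.0], [4.0, 6.0]]
    print(reduce(ranks, root=0))  # [[4.0, 6.0], None]
    print(all_gather(ranks))      # [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
    print(reduce_scatter(ranks))  # [[4.0], [6.0]]
```

Note that all-reduce produces the same result as a reduce-scatter followed by an all-gather. Ring-based implementations exploit this equivalence.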

NCCL is optimized for high bandwidth on platforms using PCIe, NVLink, and NVSwitch. It supports an arbitrary number of GPUs on a single node or across multiple nodes, and it can be used in either single-process or multi-process applications.

Project Background

    • GPU Library: NCCL
    • Author: Nvidia for Lawrence Berkeley National Laboratory
    • Initial Release: N/A 
    • Type: Optimized primitives for multi-GPU use
    • License: BSD 3-Clause
    • Language: C and C++
    • GitHub: Nvidia/nccl has 1.6k stars, 2 releases, and 31 contributors
    • NCCL Test:  To build and use

Features of NCCL

  • Automatic topology detection for high-bandwidth paths on AMD, ARM, PCIe Gen4, and InfiniBand HDR platforms
  • Uses SHARP v2 to significantly improve bandwidth with in-network all-reduce operations
  • Graph search for the highest-bandwidth, lowest-latency rings and trees
  • InfiniBand Verbs, RoCE, libfabric, and IP socket inter-node communication
  • Reroutes traffic around congested ports using InfiniBand adaptive routing
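The "rings" in the feature list are algorithms in which every GPU exchanges data only with its neighbours in a logical ring. The following Python sketch simulates the textbook ring all-reduce as an illustration. It runs a reduce-scatter phase followed by an all-gather phase, and each phase takes n - 1 steps. It assumes a fixed ring order (rank 0 to 1 to ... to n-1 and back to 0) and a buffer length divisible by the number of ranks. The function name `ring_all_reduce` is hypothetical. NCCL picks its ring order through the topology graph search and pipelines the transfers, so this models the idea rather than NCCL's code.

```python
from typing import List


def ring_all_reduce(bufs: List[List[float]]) -> List[List[float]]:
    n = len(bufs)
    length = len(bufs[0])
    assert all(len(b) == length for b in bufs), "all ranks need equal-sized buffers"
    assert length % n == 0, "sketch assumes the buffer splits evenly into n chunks"
    size = length // n
    # data[r][c] is rank r's current copy of chunk c.
    data = [[list(b[c * size:(c + 1) * size]) for c in range(n)] for b in bufs]

    # Phase 1: reduce-scatter. At step s, rank r sends chunk (r - s) % n to its
    # right neighbour, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        # Snapshot all outgoing chunks first so the sends behave as simultaneous.
        msgs = [(r, (r - s) % n, list(data[r][(r - s) % n])) for r in range(n)]
        for r, c, chunk in msgs:
            dst = (r + 1) % n
            data[dst][c] = [a + b for a, b in zip(data[dst][c], chunk)]
    # Rank r now holds the fully reduced chunk (r + 1) % n.

    # Phase 2: all-gather. Reduced chunks travel around the ring and
    # overwrite the stale partial copies on each neighbour.
    for s in range(n - 1):
        msgs = [(r, (r + 1 - s) % n, list(data[r][(r + 1 - s) % n])) for r in range(n)]
        for r, c, chunk in msgs:
            data[(r + 1) % n][c] = chunk

    # Flatten the chunks back into one buffer per rank.
    return [[x for c in range(n) for x in data[r][c]] for r in range(n)]


if __name__ == "__main__":
    ranks = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
    print(ring_all_reduce(ranks))
    # [[111.0, 222.0, 333.0], [111.0, 222.0, 333.0], [111.0, 222.0, 333.0]]
```

In this scheme each rank sends 2(n - 1) chunks of size L/n, where L is the buffer length. That is roughly twice the buffer size no matter how many GPUs join, which is why ring all-reduce scales well in bandwidth terms. Trees trade some of that bandwidth efficiency for lower latency, because a tree needs only a logarithmic number of steps.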
