The GPU (Graphics Processing Unit) plays a central part in machine learning. For these workloads, CPUs alone just don’t cut it. Hubert Yoshida of Hitachi has described CPUs as designed for a single purpose, such as transaction processing. GPUs, by contrast, were designed to be multipurpose and can process tasks and functions in parallel.
Google has developed its own accelerator, the Cloud TPU, and Microsoft Azure offers AMD-powered NVv4 instances that support GPU partitioning. The giant of the industry, however, is Nvidia.
Renting GPUs from cloud providers is an expensive proposition for many organizations. The good news is that there are plenty of GPU options for building your own AI workstation to train, test, and run machine learning models.
If deep learning is involved, a heftier GPU will be required because deep learning is compute-intensive. Some models require millions of calculations and parameter updates at run time. For about $2,500, an engineer can acquire a GPU with 4,608 CUDA cores and 576 Tensor Cores. The Tensor Cores speed up large matrix operations and perform “mixed-precision matrix multiply and accumulate calculations in a single operation”.
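
To make the quoted “mixed-precision matrix multiply and accumulate” idea more concrete, here is a minimal sketch in plain Python. It assumes the commonly described mode where the inputs are half precision (FP16) and the running sum is kept in single precision (FP32). The function names `to_fp16`, `to_fp32`, and `mixed_precision_mma` are hypothetical, and this is only a software emulation of the arithmetic, not how Tensor Cores are actually programmed.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest half-precision (FP16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest single-precision (FP32) value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mixed_precision_mma(a, b, c):
    """Emulate D = A x B + C with FP16 inputs and an FP32 accumulator.

    Hypothetical helper for illustration: a, b, and c are lists of lists.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = to_fp32(c[i][j])  # the accumulator starts from C in FP32
            for p in range(inner):
                # Inputs are rounded to FP16 before being multiplied.
                product = to_fp16(a[i][p]) * to_fp16(b[p][j])
                # The running sum is rounded back to FP32 after each add.
                acc = to_fp32(acc + product)
            d[i][j] = acc
    return d


a = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
c = [[0.5, 0.5], [0.5, 0.5]]
print(mixed_precision_mma(a, identity, c))  # [[1.5, 2.5], [3.5, 4.5]]
print(to_fp16(0.1))  # slightly below 0.1: FP16 cannot store 0.1 exactly
```

The point of the split is that the low-precision inputs keep memory traffic small, while the higher-precision accumulator limits the rounding error that builds up over a long sum.
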
Nvidia calls its latest GPU architecture Turing and bills it as the “greatest leap since the invention of CUDA GPU” in 2006. An interesting feature is real-time ray tracing, which brings 3D environments to life. The Nvidia Titan RTX comes with 24GB of GDDR6 memory, 576 Tensor Cores, and 672 GB/s of memory bandwidth. NVLink support also enables cards to be daisy-chained. Below is a list of Quadro and GeForce cards.
Nvidia | Titan RTX | GeForce | GeForce | Quadro | Quadro | Quadro | Quadro |
---|---|---|---|---|---|---|---|
Specs | Titan RTX | RTX 2080 Ti | RTX 2080 Super | RTX 8000 | RTX 6000 | RTX 5000 | GV100 |
GPU | TU102 | TU102 | TU104 | TU102 | TU102 | TU104 | Volta |
CUDA Cores | 4608 | 4352 | 3072 | 4608 | 4608 | 3072 | 5120 |
Tensor Cores | 576 | 544 | 384 | 576 | 576 | 384 | 640 |
Memory | 24GB | 11GB | 8GB | 48GB | 24GB | 16GB | 32GB |
NVLink | yes | yes | yes | yes | yes | yes | yes |
TFLOPS Single Precision | 16.3 | 13.4 | 11.15 | 16.3 | 16.3 | 11.2 | 14.8 |
Base Clock | 1350MHz | 1350MHz | 1650MHz | 1395MHz | 1440MHz | 1620MHz | 1132MHz |
Boost Clock | 1770MHz | 1545MHz | 1815MHz | 1770MHz | 1770MHz | 1815MHz | 1627MHz |
Memory Bandwidth | 672GB/s | 616GB/s | 496GB/s | 672GB/s | 672GB/s | 448GB/s | 868GB/s |
Power | 280W | 260W | 250W | 295W | 295W | 265W | 250W |
Price | $2,595 | $1,199 | $699 | $5,500 | $4,000 | $2,400 | $11,083 |
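
One rough way to compare the cards in the table is price per single-precision TFLOPS. The sketch below uses only the Price and TFLOPS rows from the table; the dictionary layout and the `dollars_per_tflops` helper are illustrative assumptions. It also ignores memory size and bandwidth, which often matter just as much for training large models.

```python
# (single-precision TFLOPS, price in USD), copied from the table above
cards = {
    "Titan RTX": (16.3, 2595),
    "GeForce RTX 2080 Ti": (13.4, 1199),
    "GeForce RTX 2080 Super": (11.15, 699),
    "Quadro RTX 8000": (16.3, 5500),
    "Quadro RTX 6000": (16.3, 4000),
    "Quadro RTX 5000": (11.2, 2400),
    "Quadro GV100": (14.8, 11083),
}


def dollars_per_tflops(tflops: float, price: float) -> float:
    """Return the price paid for each single-precision TFLOPS."""
    return price / tflops


# Rank the cards from cheapest to most expensive per TFLOPS.
ranked = sorted(cards, key=lambda name: dollars_per_tflops(*cards[name]))

for name in ranked:
    cost = dollars_per_tflops(*cards[name])
    print(f"{name:24s} ${cost:,.0f} per TFLOPS")
```

By this measure alone the GeForce cards come out well ahead, while the Quadro cards charge a premium for larger memory and workstation features.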