GPUs
The modern GPU has enabled the machine learning industry to make significant progress in both model training and inference. For training, working through large datasets can take anywhere from minutes to hours, depending on GPU horsepower.
For inference, processing data on GPU-enabled systems at the edge minimizes latency and improves throughput. Every year, new, more powerful GPUs are introduced to the market at lower price points. For example, the Nvidia RTX 3090 lists for only $1,499 and comes with 24 GB of RAM, more than 10,000 CUDA cores, and nearly 1 TB/s (936 GB/s) of memory bandwidth. The only issue, for the time being, is limited choice.
GPU Competition
When it comes to GPU market competition, there is none. There is Nvidia, the 800 lb. gorilla of the industry, and then everybody else. Nvidia GPUs dominate the global market. However, AMD and Intel are stepping up their game and gunning for Nvidia, and emerging startups like Graphcore are succeeding in providing alternatives. For now, all we can do is cross our fingers and hope the competitive landscape changes in a few years. Here is a list of GPU and AI accelerator manufacturers:
- Nvidia
- AMD Radeon
- Intel Arc
- Habana Labs (acquired by Intel)
- Graphcore
Supply Chain Crisis
Throughout 2021, the GPU market was in crisis. GPU cards that were once cheap (and still are, compared to the past) sometimes more than doubled in price. Several events hit the GPU market at once, causing the shortage. COVID-19 caused a global supply chain crisis in chips: workers got sick, ports backed up, and just about everything that could go wrong did.
Next, according to one research firm, crypto miners (especially Ethereum miners), scalpers, and speculators bought 25% of the global GPU card supply, compounding the shortage. The good news is that Ethereum is switching from Proof of Work (PoW) to Proof of Stake (PoS), a change that is rolling out as we speak. PoW is energy-intensive and relies on GPU mining to perform validation, whereas PoS does not depend on computing power and energy; validators are chosen based on the ETH they stake. Lastly, mining hardware built on ASICs rather than GPUs hit the market a while ago, and these expensive systems are taking over the industry. The result, or at least what everybody is hoping for, is that a huge supply of former mining GPUs will hit the secondary (used) market.
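To make the PoW side concrete, here is a toy hash puzzle in Python. It is a simplified illustration, not Ethereum's actual Ethash algorithm: the SHA-256 target and the `mine`/`verify` helper names are assumptions for the example. Finding a valid nonce takes brute-force hashing, which is why miners threw GPUs at it, while checking a claimed nonce takes a single hash.

```python
import hashlib
from typing import Tuple


def _digest(block_data: str, nonce: int) -> str:
    # Hash the block contents together with the candidate nonce.
    return hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()


def mine(block_data: str, difficulty: int) -> Tuple[int, str]:
    """Toy PoW (hypothetical helper): brute-force nonces until the SHA-256
    hex digest starts with `difficulty` zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = _digest(block_data, nonce)
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed solution costs one hash, regardless of difficulty."""
    return _digest(block_data, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    nonce, digest = mine("block 1", difficulty=4)
    print(nonce, digest)
    print(verify("block 1", nonce, difficulty=4))  # True
```

Each additional leading zero multiplies the expected number of hashes by 16, which is the knob that turns PoW into an arms race for raw compute.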
GPU Cards
There are several GPU options available to ML engineers who take the build-your-own (BYO) system approach. First, however, you must determine how much memory (RAM) a given model requires. Estimating the right amount does not require a rocket science degree. Graphcore, the AI chip startup, wrote in a 2017 blog post that neural networks store “input data, weight parameters, and activations” in memory. ResNet-50 “has 26M weight parameters and computes 16M activations in the forward pass.” Using 32-bit floating point values, Graphcore estimates that 168 MB is needed. In two other examples, the company estimated 2 GB for one model and 7.5 GB for another.
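Graphcore's 168 MB figure follows from simple arithmetic: (26M weights + 16M activations) × 4 bytes per 32-bit float. A minimal Python sketch of that calculation is below. The function name and the FP16 comparison are illustrative additions, and, like the blog's estimate, the result is a lower bound that ignores batch size, gradients, optimizer state, and framework overhead.

```python
def training_memory_bytes(num_weights: int, num_activations: int,
                          bytes_per_value: int = 4) -> int:
    """Lower-bound estimate (hypothetical helper): each weight and each
    forward-pass activation is stored once at the given precision."""
    return (num_weights + num_activations) * bytes_per_value


# ResNet-50 figures quoted from Graphcore's post.
RESNET50_WEIGHTS = 26_000_000
RESNET50_ACTIVATIONS = 16_000_000

if __name__ == "__main__":
    fp32 = training_memory_bytes(RESNET50_WEIGHTS, RESNET50_ACTIVATIONS)
    fp16 = training_memory_bytes(RESNET50_WEIGHTS, RESNET50_ACTIVATIONS,
                                 bytes_per_value=2)
    print(f"FP32: {fp32 / 1e6:.0f} MB")  # FP32: 168 MB
    print(f"FP16: {fp16 / 1e6:.0f} MB")  # FP16: 84 MB
```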
Although memory was a bigger constraint five years ago when that post was published, we’re in a much different era today. A top-of-the-line consumer GPU card with 24 GB of RAM now lists for $1,499, and processors like the AMD EPYC come with 64 cores and 256 MB of L3 cache, which is well suited to training workloads. Here’s a list of GPU cards:
- RTX 3060: 12 GB RAM, 3,584 CUDA cores, 360 GB/s memory bandwidth, $329 (list)
- RTX 3090: 24 GB RAM, 10,496 CUDA cores, 936 GB/s memory bandwidth, $1,499 (list)
- RTX A6000: 48 GB RAM, 10,752 CUDA cores, 768 GB/s memory bandwidth, $4,990 (list)
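With these specs in hand, a quick way to shortlist a card is to pick the cheapest one whose memory covers a model's estimate plus some headroom. The sketch below encodes the list in Python. The 20% headroom factor and the `cheapest_card_for` helper are hypothetical choices, not a vendor sizing rule, and list prices change over time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GpuCard:
    name: str
    memory_gb: int
    cuda_cores: int
    bandwidth_gb_s: int
    list_price_usd: int

    @property
    def usd_per_gb(self) -> float:
        # List price divided by on-board memory.
        return self.list_price_usd / self.memory_gb


# Specs and list prices from the card list.
CARDS: List[GpuCard] = [
    GpuCard("RTX 3060", 12, 3_584, 360, 329),
    GpuCard("RTX 3090", 24, 10_496, 936, 1_499),
    GpuCard("RTX A6000", 48, 10_752, 768, 4_990),
]


def cheapest_card_for(required_gb: float, headroom: float = 1.2) -> Optional[GpuCard]:
    """Hypothetical helper: cheapest card whose memory covers required_gb * headroom."""
    needed_gb = required_gb * headroom
    candidates = [card for card in CARDS if card.memory_gb >= needed_gb]
    return min(candidates, key=lambda card: card.list_price_usd, default=None)


if __name__ == "__main__":
    for card in CARDS:
        print(f"{card.name}: ${card.usd_per_gb:.2f} per GB")
    # 2 GB and 7.5 GB are Graphcore's example estimates; 30 and 45 are arbitrary.
    for required in (2, 7.5, 30, 45):
        card = cheapest_card_for(required)
        print(required, "GB ->", card.name if card else "no single card fits")
```

The cost per GB roughly doubles from the RTX 3060 (about $27) to the RTX 3090 (about $62) and rises again for the RTX A6000 (about $104), so the workstation card mainly pays off when a model simply does not fit in 24 GB.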