Disk drives come in two common forms: HDD and SSD. The Hard Disk Drive (HDD) comes in three common rotational speeds: 5400, 7200, and 10,000 RPM. It attaches to the motherboard via a serial connection known as SATA (Serial Advanced Technology Attachment). There are three generations of SATA, the latest being SATA III. Their speeds are listed below, with NVMe SSD included for comparison:
- SATA I: Runs at 1.5Gb/s with actual bandwidth throughput at 150MB/s
- SATA II: Runs at 3Gb/s with actual bandwidth throughput at 300MB/s
- SATA III: Runs at 6Gb/s with actual bandwidth throughput at 600MB/s
- NVMe SSD: Actual bandwidth throughput is up to about 3.5GB/s for read/write (typical of a PCIe 3.0 x4 drive; exact figures vary by model)
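The gap between the SATA line rate and the actual throughput comes from two things: the line rate is quoted in gigabits (not gigabytes) per second, and SATA uses 8b/10b encoding, so every 8 bits of data travel as 10 bits on the wire. Here is a minimal sketch of that arithmetic; the function and variable names are made up for illustration.

```python
# SATA line rates in gigabits per second (Gb/s), one entry per generation.
SATA_LINE_RATE_GBPS = {"SATA I": 1.5, "SATA II": 3.0, "SATA III": 6.0}


def sata_throughput_mb_per_s(line_rate_gbps: float) -> float:
    """Convert a SATA line rate (Gb/s) into usable throughput (MB/s).

    With 8b/10b encoding each payload byte (8 bits) costs 10 bits on the
    wire, so dividing the line bits by 10 gives payload bytes.
    """
    line_bits_per_s = line_rate_gbps * 1_000_000_000
    payload_bytes_per_s = line_bits_per_s / 10
    return payload_bytes_per_s / 1_000_000


for name, rate in SATA_LINE_RATE_GBPS.items():
    print(f"{name}: {rate} Gb/s -> {sata_throughput_mb_per_s(rate):.0f} MB/s")
```

This reproduces the 150, 300, and 600 MB/s figures in the list. Real drives usually land below these ceilings because of protocol overhead and the speed of the drive itself.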
The two features that impact the price of the HDD are storage capacity and RPM. Here is a sample of prices based on the RPM:
- 2TB drive at 5400 RPM: ~$45
- 2TB drive at 7200 RPM: ~$80
- 1.2TB drive at 10,000 RPM: ~$229
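Because the drives differ in capacity, it is easier to compare them by price per terabyte. The short sketch below does that with the sample prices above; the names are illustrative and the prices are the approximate figures from the list, not live quotes.

```python
# (capacity in TB, RPM, approximate price in USD) taken from the sample list.
hdd_samples = [
    (2.0, 5400, 45.0),
    (2.0, 7200, 80.0),
    (1.2, 10_000, 229.0),
]


def price_per_tb(capacity_tb: float, price_usd: float) -> float:
    """Return the cost of one terabyte of capacity in USD."""
    return price_usd / capacity_tb


for capacity_tb, rpm, price_usd in hdd_samples:
    cost = price_per_tb(capacity_tb, price_usd)
    print(f"{rpm} RPM, {capacity_tb}TB: ${cost:.2f} per TB")
```

With these numbers the 5400 RPM drive costs about $22.50 per TB, the 7200 RPM drive $40.00, and the 10,000 RPM drive roughly $190.83, so the price per TB climbs steeply with RPM.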
The introduction of the SSD was a game changer for the technology industry. Not only is it significantly faster than the HDD (at 3.5GB/s, an NVMe SSD delivers nearly six times the 600MB/s ceiling of SATA III), but prices have also dropped dramatically. And the good news: the global supply chain crisis hasn’t impacted the availability of this component the way it has the GPU. Here is a price comparison:
- 1TB PCIe NVMe M.2: $87.99
- 2TB HDD 5400 RPM: $44.99
NVMe stands for Non-Volatile Memory Express. The memory is non-volatile because, unlike RAM, the stored data is still present after you turn off the machine. Although the terms NVMe and SSD are sometimes used interchangeably, NVMe is a transport protocol for SSDs that acts as the communication interface between the storage and the CPU over PCIe.
For machine learning, SSD is the way to go. However, if vast amounts of storage capacity are required, a combination of the two is ideal. For example, if the datasets or source data used for ML run into the petabytes, a separate storage array built from inexpensive hard drives is one possible solution.
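To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes a hypothetical 1TB working set and a hypothetical 1PB archive, and it reuses the throughput and price figures quoted earlier. It ignores redundancy (RAID), enclosures, and real-world overhead, so treat the results as ballpark figures only.

```python
import math

TB = 1_000_000_000_000            # bytes in a terabyte (decimal)
SATA3_BYTES_PER_S = 600 * 1_000_000        # SATA III ceiling, 600MB/s
NVME_BYTES_PER_S = 3.5 * 1_000_000_000     # NVMe SSD figure, 3.5GB/s


def seconds_to_read(dataset_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to stream a dataset once at a sustained throughput."""
    return dataset_bytes / throughput_bytes_per_s


def drives_needed(dataset_tb: float, drive_tb: float) -> int:
    """Number of drives for raw capacity only (no redundancy)."""
    return math.ceil(dataset_tb / drive_tb)


# Assumption: a 1TB working set that is read once per training pass.
working_set = 1 * TB
for label, rate in (("SATA III ceiling", SATA3_BYTES_PER_S),
                    ("NVMe SSD", NVME_BYTES_PER_S)):
    minutes = seconds_to_read(working_set, rate) / 60
    print(f"{label}: about {minutes:.1f} minutes to read 1TB")

# Assumption: a 1PB (1000TB) archive built from 2TB drives at $44.99 each.
count = drives_needed(1000, 2)
print(f"1PB archive: {count} drives, about ${count * 44.99:,.2f} in raw disks")
```

The pattern this suggests is to keep the active training data on the SSD, where each pass over it takes minutes rather than tens of minutes, and to park the bulk source data on the cheaper hard drive array.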