If you just watched the above YouTube video about how Toy Story 2 was almost erased by a mistyped Linux command, then you’re probably paying close attention to how your media is protected. In addition to having a disaster recovery plan, it’s always wise to protect critical data against disk failures, and RAID provides a great solution for that.
Disk drives have finite life spans. After cloud backup provider Backblaze analyzed 25,000 of their deployed production drives, they found that 22% of drives failed within their first four years, with the failure rate varying over that period (see Figure 1). The remaining 78% of Backblaze drives were still alive after four years.
Using RAID to Protect Against Disk Failures
RAID, short for “redundant array of independent disks,” combines multiple drives into one logical unit for fault tolerance and improved performance. Different RAID architectures strike different balances between storage system goals including reliability, availability, performance, and usable capacity. One of the primary uses of RAID technology is to provide fault tolerance so that in the event of a disk failure there is no downtime and no loss of data. Some RAID types, such as RAID6, can survive multiple simultaneous disk failures. On the other end of the spectrum, RAID0 combines disks into a unit for improved performance but does not provide any disk fault tolerance.
Parity, from the Latin term “paritas,” means equal or equivalent. In RAID it refers to types 5 and 6, where error correction algorithms (XOR and Reed-Solomon) are used to produce additional “parity” data that the system can use to recover all the data in the event a drive fails. RAID types 2, 3, and 4 are not commonly used, as they require special hardware or have design aspects that make them less efficient than RAID5 or RAID6, so we’re going to skip over RAID2/3/4.
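To make the XOR half of that concrete, here is a minimal sketch of single-parity recovery as used in a RAID5-style 3+1 group. The block contents and the `xor_blocks` helper are made up for illustration, and the Reed-Solomon math behind the second RAID6 parity is not shown.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data disk in a 3+1 group.
data = [b"TOY ", b"STOR", b"Y 2!"]

# The parity block is simply the XOR of all the data blocks.
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild its block by
# XOR-ing the surviving data blocks with the parity block.
survivors = [data[0], data[2]]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data[1])
```

Because XOR is its own inverse, the same operation that produces the parity also recovers any single missing block, which is why single parity can tolerate exactly one failed drive.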
Typical RAID configurations include:
- RAID 0 consists of striping, without mirroring or parity and is generally NOT recommended as loss of a single disk drive results in complete loss of all data.
- RAID 1 consists of mirroring and is recommended for small configurations. Usable capacity is 50% of total capacity.
- RAID 5 consists of block-level striping with distributed parity and is recommended for archive configurations, but use no more than 7 drives in a group. It can sustain the loss of one drive but then must be repaired using a spare disk before another disk fails. Large RAID5 groups are risky because the odds of a second disk failure are higher when many disks are in the RAID5 unit. For example, a RAID5 unit with 3 data disks and one parity disk (3+1) will have usable capacity of 75% of the total and be low risk. In contrast, a RAID5 group with 12+1 is higher risk due to the increased probability of a second failure while the unit is rebuilding from a first disk failure.
- RAID 6 consists of block-level striping with double distributed parity and can be slower with some RAID implementations but generally performs close to the performance of RAID5. Again, it’s good for archive, not so good for virtual machines and other high transaction workloads.
- RAID 7 (a non-standard name for triple-parity RAID) consists of block-level striping with triple distributed parity and can sustain three simultaneous disk failures, which makes it ideal for large long-term archive configurations. In ZFS storage pools this layout is referred to as RAIDZ3, indicating the 3 drives' worth of capacity used for Reed-Solomon parity information.
- RAID 10 consists of multiple RAID1 groups that are combined into one large unit using RAID0. It’s also the most recommended RAID layout as it combines fault tolerance with a large boost in IOPS or transactional performance.
- RAID 50 consists of multiple RAID5 groups which are combined into one large unit using RAID0. Using small RAID5 groups of 3 data disks + 1 parity disk or 4 data disks + 1 parity disk, you can use RAID50 for light load virtualization deployments while yielding a higher amount of usable disk space (75% and 80% respectively).
- RAID 60 consists of multiple RAID6 groups which are combined into one large unit using RAID0 and is good for large archive configurations.
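The capacity and risk figures in the list above can be sketched with a little arithmetic. This is a rough model, not a reliability calculator: the function names are made up for illustration, the 5% annual failure rate and 24-hour rebuild window are assumed values rather than measurements, and failures are treated as independent and evenly spread over the year.

```python
def parity_usable_fraction(data_disks, parity_disks):
    """Fraction of raw capacity left for data in one parity group."""
    return data_disks / (data_disks + parity_disks)

def second_failure_probability(surviving_disks, annual_failure_rate, rebuild_hours):
    """Chance that at least one surviving disk fails during the rebuild.

    Assumes independent failures at a constant rate, which is a
    simplification of how real drives behave.
    """
    hours_per_year = 365 * 24
    per_disk = annual_failure_rate * rebuild_hours / hours_per_year
    return 1 - (1 - per_disk) ** surviving_disks

ASSUMED_AFR = 0.05            # assumption: 5% of drives fail per year
ASSUMED_REBUILD_HOURS = 24    # assumption: one day to rebuild onto a spare

print(f"RAID5 3+1 usable capacity: {parity_usable_fraction(3, 1):.0%}")
print(f"RAID5 4+1 usable capacity: {parity_usable_fraction(4, 1):.0%}")

# After one disk is lost, a 3+1 group has 3 survivors and a 12+1 group has 12.
small = second_failure_probability(3, ASSUMED_AFR, ASSUMED_REBUILD_HOURS)
large = second_failure_probability(12, ASSUMED_AFR, ASSUMED_REBUILD_HOURS)
print(f"12+1 is about {large / small:.1f}x as likely as 3+1 to lose a second disk")
```

Under these assumptions the exposure grows almost linearly with the number of surviving disks. In practice, drives from the same batch under the same rebuild load may fail in a correlated way, so a model like this is more likely to understate the risk than overstate it.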
ZFS, QuantaStor’s native file system, supports RAID 0, RAID 1, RAID 5/50 (RAID-Z), RAID 6/60 (RAID-Z2), and RAID 7/70, a triple-parity version called RAID-Z3.
Parity-based RAID for Media and Archive
Because RAID6 employs double parity (called P and Q) and can sustain two simultaneous disk failures with no data loss, it’s a good solution for protecting critical data against drive failures. RAID6 is highly fault tolerant but it does have some drawbacks. To keep parity information consistent, parity-based RAID layouts like RAID5 and RAID6 must update the parity information any time data is written. A small write cannot simply be written in place: the system must first read the old data and old parity (or the rest of the stripe), recompute the parity, and then write both the data and the parity back, no matter how small the block being written is. This means that a 4KB write can take roughly the same amount of time as a full-stripe 1MB write.
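The cost of a small write can be sketched for the single-parity (XOR) case. The block contents below are made up, and the two strategies shown (often called read-modify-write and reconstruct-write) are a simplified illustration rather than a description of any particular controller or of the RAID6 Q parity.

```python
def xor(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# A hypothetical stripe across three data disks, plus its parity block.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor(xor(stripe[0], stripe[1]), stripe[2])

# We want to overwrite only the block on the second disk.
new_block = b"bbbb"

# Read-modify-write: read the old block and the old parity (2 reads),
# cancel the old block out of the parity, fold the new one in,
# then write the new block and the new parity (2 writes).
parity_rmw = xor(xor(parity, stripe[1]), new_block)

# Reconstruct-write: read every other data block in the stripe
# and recompute the parity from scratch.
parity_full = xor(xor(stripe[0], new_block), stripe[2])

print(parity_rmw == parity_full)
```

Either way, one small logical write turns into several physical reads and writes, whereas a full-stripe write can compute the parity from the data already in hand and skip the reads entirely.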
If your workload is mostly reads with only one or two writers that do mostly sequential writes, as is the case with large files, then you’ve got a good candidate for RAID6.
RAID controllers that have a battery-backed or super-capacitor-protected NVRAM cache can hold writes for a period of time and can often combine many 4K writes into larger, more efficient 1MB full-stripe writes. This IO coalescing works great when the IO patterns are sequential, as with many media and archive applications, but it doesn’t work well when the data is being written to disparate areas of the drive, as you see with databases and virtual machines.
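A toy model shows why the access pattern matters so much for coalescing. It assumes a 1MB stripe and 4K writes as in the text; the `classify_writes` function and both workloads are hypothetical, and a real controller cache is considerably more sophisticated.

```python
from collections import defaultdict

BLOCK = 4 * 1024                      # size of each cached write
STRIPE = 1024 * 1024                  # assumed full-stripe size
BLOCKS_PER_STRIPE = STRIPE // BLOCK   # 256 blocks fill one stripe

def classify_writes(offsets):
    """Group cached 4K writes by stripe.

    Returns (full_stripes, partial_stripes): stripes that are completely
    covered can be flushed as one full-stripe write, while partially
    covered stripes still need the slower parity update path.
    """
    blocks_by_stripe = defaultdict(set)
    for offset in offsets:
        blocks_by_stripe[offset // STRIPE].add((offset % STRIPE) // BLOCK)
    full = sum(1 for blocks in blocks_by_stripe.values()
               if len(blocks) == BLOCKS_PER_STRIPE)
    return full, len(blocks_by_stripe) - full

# 512 back-to-back writes, like streaming out a large media file.
sequential = [i * BLOCK for i in range(512)]

# 512 writes that each land in a different stripe, like a busy database.
scattered = [i * STRIPE + (i % BLOCKS_PER_STRIPE) * BLOCK for i in range(512)]

print(classify_writes(sequential))  # (2, 0)
print(classify_writes(scattered))   # (0, 512)
```

The same 2MB of cached data becomes two efficient full-stripe writes in the sequential case and 512 separate partial-stripe updates in the scattered case.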