Although the benefits outlined in this article mostly still hold true in 2017, we’ve been going the route of using SATA/SAS HBAs connected directly to the drives for Ceph deployments. The main benefits of doing so are performance and additional usable capacity. With multi-site configurations using erasure coding (for example, 7k+5m or 11k+7m across three sites), overall performance is much higher and fault tolerance is higher too. We’ve also moved to using NVMe for the journal, so the RAID card’s on-board RAM is of little consequence, and when we move up to Ceph Luminous in 2018 we’ll be introducing support for BlueStore, which will further boost performance by eliminating the double-journal problem. Today our primary use of hardware RAID is mirroring the QuantaStor boot device, as it simplifies maintenance, but in 2018 we’ll be introducing software options for that too, eliminating the need for hardware RAID altogether.
With the advent of many new scale-out file-systems and the high-availability technology built into them (SWIFT, Ceph, Gluster, etc.), there have been a number of articles in the press talking about how RAID is dead. Our testing and hundreds of deployments show a different reality. We’ve found that although hardware RAID is not required when using some of these new scale-out storage technologies, it sure adds a lot of value. Here’s why the trusty hardware RAID controller is still relevant and important:
- Disk failures have zero impact on cluster network load because they’re repaired locally, using a hot-spare device attached to the RAID controller. With the huge amounts of data that must be moved (8TB+ per device), healing over the network with the scale-out file-system can impact production applications for minutes to hours.
- Disk drives are easy to replace by non-technical personnel. RAID controllers detect and proactively replace bad drives using one or more hot-spares, and bad drives are marked with a red LED so they’re easy to identify, pull, and replace.
- Greater usable capacity is had because the storage is already fault-tolerant. As such, the redundancy at the cluster level need only maintain two (2x) copies of the data (instead of the typical 3x used in many scale-out filesystems including OpenStack SWIFT, Ceph, Gluster and others), so the storage efficiency goes up to 40% usable vs. the typical mode of operation, which yields only 33% usable storage from raw. That’s a 20% increase in usable capacity. If erasure coding is used with hardware RAID, the rebuild speed gains are even more pronounced and usable capacity is 64% or higher (e.g. 4d+1p RAID5 + erasure coding with a 4+1 node stripe size).
- RAID controllers bring with them an NVRAM write-back cache which greatly boosts the IOPS and sequential performance of the data storage and the write log journal devices. Writes hitting the NVRAM are completed at DDR3 RAM speed, which outperforms SSD or even NVMe devices.
- Combining devices into larger fault-tolerant logical devices reduces the overall number of devices that the scale-out cluster software needs to manage. This makes the cluster software faster and more efficient so one can grow the capacity of the scale-out filesystem to hyper-scale (>30PB).
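The usable-capacity figures in the list above can be checked with a little arithmetic. The following is a minimal sketch, not a sizing tool: it assumes the 40% figure is based on the same 4d+1p RAID5 layout mentioned for the erasure coding case, and the function names are made up for illustration. Each protection layer keeps some fraction of the capacity below it, and the layers multiply.

```python
from fractions import Fraction


def raid_efficiency(data_disks: int, parity_disks: int) -> Fraction:
    """Fraction of raw capacity left after parity RAID (e.g. 4d+1p)."""
    return Fraction(data_disks, data_disks + parity_disks)


def replica_efficiency(copies: int) -> Fraction:
    """Fraction of capacity left when the cluster keeps N full copies."""
    return Fraction(1, copies)


def erasure_efficiency(k: int, m: int) -> Fraction:
    """Fraction of capacity left with k data + m coding chunks."""
    return Fraction(k, k + m)


def usable_fraction(*layers: Fraction) -> Fraction:
    """Stacked protection layers multiply together."""
    result = Fraction(1)
    for layer in layers:
        result *= layer
    return result


# Assumption: 4d+1p RAID5 underneath, as in the erasure coding example.
raid5 = raid_efficiency(4, 1)

three_copies = usable_fraction(replica_efficiency(3))
raid_two_copies = usable_fraction(raid5, replica_efficiency(2))
raid_plus_ec = usable_fraction(raid5, erasure_efficiency(4, 1))
relative_gain = raid_two_copies / three_copies - 1

for label, value in [
    ("3x replication, no RAID", three_copies),
    ("RAID5 4d+1p + 2x replication", raid_two_copies),
    ("RAID5 4d+1p + 4+1 erasure coding", raid_plus_ec),
    ("Gain of RAID5 + 2x over plain 3x", relative_gain),
]:
    print(f"{label}: {float(value):.1%}")
```

Swapping in a wider RAID set or a different erasure coding profile changes the result, so treat these numbers as a way to compare layouts rather than as a capacity guarantee.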
It’s all about Operations.
Many of the new object storage scale-out systems argue that it’s easier to fail-in-place, which means that dead drives should just be left to rot in the appliances because it’s too expensive to replace them. Oftentimes this is done just because identifying and replacing the drives is hard. Fail-in-place actually results in progressive degradation of the capacity, reliability and maintainability of the cluster as a whole. With a plain HBA, it takes special engineering to communicate with the SES module in the disk backplane and make the drive LED go red when a device fails. SDS vendors (including us / OSNEXUS) write special logic to communicate with the disk backplane to make it easy to identify bad drives. But even with SES integration the RAID controllers do it better, because they have years of institutional knowledge about disk drives and fault detection built into the controller that is just not there with the HBAs. Operational efficiency for storage in data-centers is achieved by systematically replacing bad hardware (as indicated by the red LEDs) on a regular basis.
SDS and Hardware Integration
HBAs pass through all devices to the underlying OS to be managed by software. With QuantaStor SDS we integrate with both RAID controllers and HBAs via custom modules that are tightly integrated with the hardware. We support both hardware and software RAID, as there are important use cases for both, but we’re definitely advocates for combining hardware RAID with scale-out file, block, and object storage deployments. If you’re using just a standard server with Linux on it and a plain HBA, no such luck: you’re in for a headache every time a disk needs to be replaced.
Drives are getting larger in capacity every year, and with 16TB drives and 40TB SSD devices on their way, this is going to put increasing pressure on companies to upgrade their networks, especially in scenarios utilizing pure erasure coding over multiple sites. The problem these large devices pose for scale-out storage technologies is that the repair process for them uses a huge amount of network bandwidth and can crowd out production workloads. In contrast, with hardware RAID in the mix, the scale-out technology is used to repair whole appliance outages, and the more common disk failures are completely invisible to the scale-out storage cluster. No network load, no impact to the OS, just a controller doing some extra work on the local SATA/SAS bus to heal the array using a local hot-spare.
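To get a rough feel for the network cost, here is a back-of-the-envelope sketch in Python. The link speeds and the `share` parameter are assumptions picked for illustration and do not come from any measured deployment. It also treats the repair as one stream over one link, whereas a real cluster spreads recovery across many nodes and usually throttles it, so read the output as an order-of-magnitude estimate only.

```python
def transfer_hours(capacity_tb: float, link_gbps: float, share: float = 1.0) -> float:
    """Hours needed to move capacity_tb (decimal TB) over a link.

    link_gbps is the nominal link speed in gigabits per second and
    share is the fraction of that link given to repair traffic.
    Both are illustrative assumptions.
    """
    bits_to_move = capacity_tb * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * share
    return bits_to_move / usable_bits_per_second / 3600


# Drive sizes mentioned in the article; link speeds are hypothetical.
for capacity_tb in (8, 16, 40):
    for link_gbps in (1, 10, 25):
        hours = transfer_hours(capacity_tb, link_gbps)
        print(f"{capacity_tb} TB over {link_gbps} Gb/s: about {hours:.1f} hours")
```

Passing a smaller `share`, such as `transfer_hours(8, 10, share=0.25)`, models a cluster that caps repair traffic to protect production workloads, at the cost of a longer window with reduced redundancy.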
The new scale-out technologies like Ceph and GlusterFS bring powerful auto-heal and recovery technologies to the fore. They’re a key part of the QuantaStor SDS strategy and are tools we use to deliver the best scale-out block, file, and object storage to our customers. But the value of operational simplicity cannot be overstated. IT professionals have less time and need more value from their hardware and software. In our view there’s still a solid value proposition to bringing hardware RAID into the scale-out mix; the deployments are just clearly better.
Founder & CEO, OSNEXUS