Hardware RAID is dead, long live hardware RAID.

UPDATE 11/28/2017

Although the benefits outlined in this article mostly still hold true in 2017, we’ve been going the route of using SATA/SAS HBAs connected directly to the drives for Ceph deployments. The main benefits of doing so are performance and additional usable capacity. With multi-site configurations using erasure coding (for example, 7k+5m or 11k+7m across three sites), overall performance is much higher and so is fault tolerance. We’ve also moved to using NVMe for the journal, so the RAID card’s on-board RAM is of little consequence. When we move up to Ceph Luminous in 2018 we’ll be introducing support for BlueStore, which will further boost performance by eliminating the double-journal problem. Today our primary use of hardware RAID is mirroring the QuantaStor boot device, as it simplifies maintenance, but in 2018 we’ll be introducing software options for that too, eliminating the need for hardware RAID altogether.
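To make the erasure-coding examples above concrete, here is a rough sketch of the arithmetic behind a k+m profile spread over several sites. It assumes chunks are placed as evenly as possible across sites, which is an assumption of this sketch and not something the update spells out. The class and property names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureProfile:
    """Hypothetical helper: k data chunks + m coding chunks over N sites."""
    k: int
    m: int
    sites: int

    @property
    def usable_fraction(self) -> float:
        # Share of raw capacity that holds user data.
        return self.k / (self.k + self.m)

    @property
    def max_chunks_per_site(self) -> int:
        # Assumes chunks are spread as evenly as possible across sites.
        return math.ceil((self.k + self.m) / self.sites)

    @property
    def survives_site_loss(self) -> bool:
        # Data stays readable as long as no more than m chunks are lost.
        return self.max_chunks_per_site <= self.m

    @property
    def spare_failures_after_site_loss(self) -> int:
        # Additional chunk losses tolerated once a whole site is down.
        return self.m - self.max_chunks_per_site


for profile in (ErasureProfile(7, 5, 3), ErasureProfile(11, 7, 3)):
    print(
        f"{profile.k}k+{profile.m}m over {profile.sites} sites: "
        f"{profile.usable_fraction:.1%} usable, "
        f"survives site loss: {profile.survives_site_loss}, "
        f"spare failures after that: {profile.spare_failures_after_site_loss}"
    )
```

Under that even-placement assumption, both profiles tolerate the loss of a full site plus one more chunk, while keeping roughly 58% to 61% of raw capacity usable.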

ORIGINAL ARTICLE

With the advent of many new scale-out file-systems and the high-availability technology built into them (SWIFT, Ceph, Gluster, etc.) there have been a number of articles in the press talking about how RAID is dead. Our testing and hundreds of deployments show a different reality. We’ve found that although it’s not required when using some of these new scale-out storage technologies, it sure adds a lot of value. Here’s why the trusty hardware RAID controller is still relevant and important:

Disk failures have zero impact on cluster network load because they’re repaired using a local hot-spare device attached to the RAID controller. With the huge amounts of data that must be moved (8TB+ per device), healing over the network with the scale-out file-system can impact production applications for minutes to hours.
Disk drives are easy to replace by non-technical personnel. RAID controllers detect and proactively replace bad drives using one or more hot-spares, and bad drives are marked with a red LED so they’re easy to identify, pull, and replace.
Greater usable capacity is had because the storage is already fault-tolerant. The redundancy at the cluster level need only maintain two copies (2x) of the data instead of the typical three (3x) used in many scale-out filesystems, including OpenStack SWIFT, Ceph, Gluster and others. Storage efficiency goes up to 40% usable vs. the typical mode of operation, which yields only 33% usable storage from raw. That’s a 20% increase in usable capacity. If erasure coding is used with hardware RAID, the rebuild speed gains are even more pronounced and usable capacity is 64% or higher (e.g., 4d+1p RAID5 + erasure coding with a 4+1 node stripe size).
RAID controllers bring with them an NVRAM write-back cache which greatly boosts the IOPS and sequential performance of the data storage and the write log journal devices. Writes hitting the NVRAM are completed at DDR3 RAM speed, which outperforms SSD or even NVMe devices.
Combining devices into larger fault-tolerant logical devices reduces the overall number of devices that the scale-out cluster software needs to manage. This makes the cluster software faster and more efficient so one can grow the capacity of the scale-out filesystem to hyper-scale (>30PB).
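The capacity figures in the list above can be checked with a few lines of arithmetic. This is a sketch only: it assumes the 40% figure is based on the same 4d+1p RAID5 layout that the article names for the erasure-coding case, and the function names are hypothetical.

```python
def raid_efficiency(data_disks: int, parity_disks: int) -> float:
    """Usable fraction of a parity RAID group (e.g. 4d+1p RAID5)."""
    return data_disks / (data_disks + parity_disks)


def replica_efficiency(copies: int) -> float:
    """Usable fraction when the cluster keeps N full copies."""
    return 1 / copies


def erasure_efficiency(k: int, m: int) -> float:
    """Usable fraction of a k+m erasure-coded stripe."""
    return k / (k + m)


raid5 = raid_efficiency(4, 1)                       # assumed 4d+1p layout

three_copies_raw = replica_efficiency(3)            # 3x replicas on bare disks
two_copies_on_raid = raid5 * replica_efficiency(2)  # 2x replicas on RAID5
ec_on_raid = raid5 * erasure_efficiency(4, 1)       # 4+1 node stripe on RAID5
relative_gain = two_copies_on_raid / three_copies_raw - 1

print(f"3x replicas, no RAID: {three_copies_raw:.0%}")
print(f"2x replicas on RAID5: {two_copies_on_raid:.0%}")
print(f"gain over 3x:         {relative_gain:.0%}")
print(f"4+1 EC on RAID5:      {ec_on_raid:.0%}")
```

The two layers multiply: each fault-tolerance scheme takes its share of raw capacity, so 80% from RAID5 times 50% from two copies gives the 40% quoted above.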

It’s all about Operations.

Many of the new object storage scale-out systems argue that it’s easier to fail-in-place, which means that dead drives should just be left to rot in the appliances because it’s too expensive to replace them. Often this is done just because identifying and replacing the drives is hard. Fail-in-place actually results in progressive degradation of the capacity, reliability and maintainability of the cluster as a whole. With plain HBAs it takes special engineering to communicate with the SES module in the disk backplane to make the drive LED go red when a device fails. SDS vendors (including us, OSNEXUS) write special logic to communicate with the disk backplane to make it easy to identify bad drives. But even with SES integration the RAID controllers do it better, because they have years of institutional knowledge about disk drives and fault detection built into the controller that is just not there with the HBAs. Operational efficiency for storage in data-centers is achieved by systematically replacing the bad hardware (as indicated by the red LEDs) on a regular basis.

SDS and Hardware Integration

HBAs pass all devices through to the underlying OS to be managed by software. With QuantaStor SDS we integrate with both RAID controllers and HBAs via custom modules that are tightly integrated with the hardware. We support both hardware and software RAID as there are important use cases for both, but we’re definitely advocates for combining hardware RAID with scale-out file, block, and object storage deployments. If you’re using just a standard server with Linux on it and a plain HBA, no such luck: you’re in for a headache every time a disk needs to be replaced.

Rebuild Performance

Drives are getting larger in capacity every year. With 16TB drives and 40TB SSD devices on their way, this is going to put increasing pressure on companies to upgrade their networks, especially in scenarios utilizing pure erasure coding over multiple sites. The problem these large devices pose for scale-out storage technologies is that the repair process for them uses a huge amount of network bandwidth and can crowd out production workloads. In contrast, with hardware RAID in the mix the scale-out technology is used to repair whole appliance outages, and the more common disk failures are completely invisible to the scale-out storage cluster. No network load, no impact to the OS, just a controller doing some extra work on the local SATA/SAS bus to heal the array using a local hot-spare.
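For a feel of the network cost, here is a back-of-the-envelope estimate of how long a repair traffic stream would occupy a link. It is a deliberately simplified single-link model: the link speeds, the share of bandwidth given to repair, and the function name are all assumptions for illustration, and real clusters spread recovery across many nodes in parallel.

```python
def network_repair_hours(
    device_tb: float,
    link_gbps: float,
    repair_share: float = 1.0,
    fill: float = 1.0,
) -> float:
    """Hours needed to move one device's worth of data over one link.

    device_tb    -- device capacity in decimal terabytes
    link_gbps    -- link speed in gigabits per second (assumed value)
    repair_share -- fraction of the link given to repair traffic (0-1]
    fill         -- fraction of the device that actually holds data [0-1]
    """
    if not 0 < repair_share <= 1:
        raise ValueError("repair_share must be in (0, 1]")
    if not 0 <= fill <= 1:
        raise ValueError("fill must be in [0, 1]")
    bits_to_move = device_tb * 1e12 * 8 * fill
    usable_bits_per_second = link_gbps * 1e9 * repair_share
    return bits_to_move / usable_bits_per_second / 3600


# Hypothetical scenarios: a full 8TB and a full 16TB device on a 10GbE link,
# first with the whole link, then with repair throttled to a quarter of it.
for device_tb in (8, 16):
    for share in (1.0, 0.25):
        hours = network_repair_hours(device_tb, link_gbps=10, repair_share=share)
        print(f"{device_tb}TB at {share:.0%} of 10GbE: {hours:.1f} h")
```

Throttling repair protects production traffic but stretches the window during which the cluster runs with reduced redundancy, which is the trade-off a local hot-spare rebuild avoids.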

Summary

The new scale-out technologies like Ceph and GlusterFS bring powerful auto-heal and recovery technologies to the fore. They’re a key part of the QuantaStor SDS strategy and are tools we use to deliver the best scale-out block, file, and object storage to our customers. But the value of operational simplicity cannot be overstated. IT professionals have less time and need more value from their hardware and software. In our view there’s still a solid value proposition to bringing hardware RAID into the scale-out mix; the deployments are just clearly better.

Steve Umbehocker
Founder & CEO, OSNEXUS

stevenu

I’m the CEO & co-founder of OSNEXUS, a storage appliance software company with a focus on scale-out storage management. We do a lot of hardware and software storage technology integration on Linux with our QuantaStor Software Defined Storage (SDS) platform. This blog is our way to share some thoughts and insights on these technologies and where we’re headed with SDS.
