The Future of High Availability in Software Defined Storage: Ceph and GlusterFS

If storage administrators could claim any one feature as the most important in an SDS solution, it would arguably be High Availability (HA). To date, High Availability has been a challenge for many Software Defined Storage solutions because the traditional mechanism for HA failover requires special hardware and the failover process can be slow. Slow is bad for HA failover because if it takes too long for storage access to come back online, VMs can lock up and then need to be rebooted.

Scale-out open storage technologies such as Ceph and Gluster take a fundamentally different approach and are in the process of changing the storage landscape.

Ceph and Gluster achieve High Availability by making multiple copies of data that are spread across multiple servers in a cluster to ensure there is no single point of failure. Turn off any node, or in some cases even multiple nodes, and there's no downtime and near-instantaneous failover of workloads. This is a major step forward because proprietary and custom hardware is no longer required to achieve fast, reliable failover.
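
To make that concrete, here is roughly how the replica count is expressed with the native Ceph and Gluster tools (a minimal sketch assuming a three-node cluster; pool, volume, brick, and host names are made up for the example):

  # Ceph: create a replicated pool that keeps 3 copies of every object
  ceph osd pool create vmpool 128
  ceph osd pool set vmpool size 3       # three replicas spread across nodes
  ceph osd pool set vmpool min_size 2   # keep serving I/O with two of three copies online

  # Gluster: create a 3-way replicated volume across three nodes
  gluster volume create vmvol replica 3 \
      node1:/bricks/vmvol node2:/bricks/vmvol node3:/bricks/vmvol
  gluster volume start vmvol

With three copies in place, any single node can be powered off and clients continue to read and write against the surviving replicas.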

In our meetings with customers and partners we're seeing Ceph quickly becoming the go-to standard for OpenStack deployments, while Gluster is becoming the de facto standard scale-out file storage platform for Big Data deployments. The RADOS object storage architecture underlying Ceph makes it ideal for virtual machine workloads, while Gluster's file-based architecture is a great fit for unstructured data such as documents, archives, and media, so we see a long-term future for both technologies.

So the key takeaway is that both technologies reduce the cost of deploying highly available storage clouds by a significant factor. With Ceph and Gluster there is no need to purchase expensive proprietary block or file storage. You get the speed, reliability, and features you need, like snapshots, cloning, thin provisioning, and massive scalability, without the vendor lock-in, all on commodity hardware that can be expanded with RAM and solid state drives (SSDs) to accelerate throughput and IOPS performance.

At OSNEXUS, we integrated Gluster into the platform in 2013 and today we're focused on deep integration of Ceph so that a broader audience can set up, manage, and maintain their virtualized environments with ease. We'll be rolling out the first version of QuantaStor (v3.14) with integrated Ceph management in November.

QuantaStor is a scale-out SDS solution that installs on bare metal server hardware or as a VM so that you don’t have to deal with the complexities typically associated with deploying, installing, and managing scale-out storage. For more information on how to get a copy of QuantaStor Trial Edition or QuantaStor Community Edition click here.


Ceph on the QuantaStor SDS Platform 


QuantaStor 3.13 now available featuring enhanced encryption management and one-step GlusterFS peering

The team at OSNEXUS has been hard at work this summer on the latest release of QuantaStor, and today I'm happy to announce that QuantaStor 3.13 is now generally available with new encryption features, one-step GlusterFS peering, and inclusion of the latest maintenance releases of ZFS on Linux (v0.6.3) and GlusterFS (v3.5.2).

Security has always been an important focus for QuantaStor and now we’ve made it even easier to administer and manage encryption at both the Linux OS level with LUKS and at the physical drive level through the QuantaStor command line interface (CLI).

At the software level, QuantaStor now uses the LUKS (Linux Unified Key Setup) system for key management and comes with tools to greatly simplify the configuration and setup of encryption.

The QuantaStor qs-util CLI utility comes with a series of additional commands to encrypt disk devices including cryptformat, cryptopen, cryptclose, cryptdestroy, and cryptswap. There’s also a devicemap command which will scan for and display a list of devices available on the system. You can read more about setting up LUKS software storage encryption management here on the Wiki.
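
As a rough sketch of the workflow (the device path and argument forms below are assumed for illustration only; the Wiki page linked above has the authoritative usage):

  # List the disk devices available on the system
  qs-util devicemap

  # Encryption lifecycle for a device (argument form shown is illustrative)
  qs-util cryptformat /dev/sdb    # initialize LUKS encryption on the device (destroys existing data)
  qs-util cryptopen /dev/sdb      # unlock the encrypted device so it can be used in a pool
  qs-util cryptclose /dev/sdb     # lock the device again
  qs-util cryptswap               # set up encrypted swap
  qs-util cryptdestroy /dev/sdb   # remove the LUKS encryption from the device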


QuantaStor 3.13 showing encrypted drives

From a hardware encryption perspective, QuantaStor now allows you to administer and manage encryption on the RAID controller directly through the qs CLI. There are three CLI commands for setting up hardware encryption with the 'qs' command line utility: 'hw-unit-encrypt', 'hw-controller-create-security-key', and 'hw-controller-change-security-key'. Read more about configuring QuantaStor drive encryption here.
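
A sketch of what that looks like from the shell follows; the angle-bracket placeholders stand in for the controller, unit, and key identifiers on your system, and the exact argument syntax is documented on the Wiki page linked above:

  # Create a security key on the RAID controller (placeholders shown, not real syntax)
  qs hw-controller-create-security-key <controller> <key>

  # Rotate the controller security key later on
  qs hw-controller-change-security-key <controller> <new-key>

  # Enable encryption on a hardware RAID unit managed by that controller
  qs hw-unit-encrypt <unit>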

Gluster Peer Setup

Setting up QuantaStor appliances into a grid allows them to intercommunicate, but it doesn't automatically set up the GlusterFS peer relationships between the appliances. For that we've created the new one-step 'Peer Setup' dialog in the web interface, which lets you select the IP address on each node that you want Gluster to use for communication between the nodes.

The benefit of using Peer Setup in QuantaStor is that the configuration is kept in sync across the nodes, and the nodes can resolve each other's names even if DNS server access is down. Read more on automated Peer Setup here.
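
For reference, Peer Setup automates roughly what the manual GlusterFS peering steps do on a stock Gluster installation (the IP addresses below are examples):

  # From one appliance, probe each of the other nodes by the chosen IP address
  gluster peer probe 10.0.0.2
  gluster peer probe 10.0.0.3

  # Verify that all peers are connected before creating volumes
  gluster peer status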

Gluster Automated Peer Setup


Redapt – A Great New Partner

We continue to go from strength to strength at OSNEXUS. With the announcement of our advanced high availability features we continue to attract top-tier resellers and integrators.

Today we announced a North American reseller distribution agreement with Redapt, Dell's leading large value-added reseller partner. The team at Redapt is building some highly innovative data center infrastructure and cloud solutions on OSNEXUS software and Dell server and storage systems. Our first major customer engagement was with MediaAMP, a media distribution and content delivery platform. Employing multi-tier, multi-tenant data storage and secure end-user applications, MediaAMP provides secure storage and delivery methods that meet information privacy requirements for protected information in the healthcare and financial services sectors.

 

Click here to read the entire press release

 


QuantaStor Updates

We have been so busy adding new features to QuantaStor that we have not had time to share all the great stuff that has already gone in during 2014. If you have not had a chance to check out the new HA capability in QuantaStor v3.9, I highly recommend looking at our last blog post, which includes an overview video.

Some other features that have been added since late last year include:

  • Parallelized backups to dramatically speed up the backup process; we can now support up to 64 concurrent backup streams
  • High Availability for SAN environments
  • Support for Gluster HA virtual network interfaces
  • SNMP agent with get/walk/trap support
  • Support for GlusterFS 3.4.2
  • Multipath support for dual-path SAS HBA connectivity to SAS JBODs
  • Integrated support for newer controllers from LSI and Adaptec, including certification of the LSI MegaRAID 93xx 12G SAS RAID controllers

This list is only a highlight of all that has been added. If you want the full details, check out the change log on the OSNEXUS Wiki:

http://wiki.osnexus.com/mediawiki/index.php/QuantaStor_Version_ChangeLog


Defining Software Defined Storage (SDS): what it is, and what it isn't

Software Defined Storage (SDS) has become something of an overloaded term these days. This article is about how I define it: what it is, and what it isn't.

Fundamentally, SDS is about a sea change that's happening in the storage industry; it's about providing companies with a way to plan for the future and to manage the reality of >40% year-over-year data growth rates. The key feature of SDS that makes managing this growth possible is the use of commodity off-the-shelf (COTS) server hardware. By moving beyond proprietary storage hardware, companies can take control of their largest component of IT spend, namely storage. Much like the migration from mainframes to open systems that took place over a decade ago, today a migration is happening from mainframe-like proprietary hardware storage systems to SDS appliances built on COTS hardware. In my view this is the only way to explain the flat growth of the leading traditional storage companies in an environment of exponential data growth.

Don’t Believe the Hype

There are a lot of storage solutions out there under the banner of SDS, but I'm going to make the argument that these are not all true SDS systems. Here are a few signs that the storage appliance you're looking at may be traditional proprietary hardware under an SDS umbrella:

  • You’re being sold a system in a proprietary chassis or a white-box chassis with a custom bezel
    • Custom bezels look cool and there's a lot of value in having the storage vendor constrain the hardware compatibility list to a narrow subset so that problems are easier to triage and solve. That's great, but be careful that the fancy bezel doesn't eliminate your ability to negotiate the price of the hardware versus the price of the software, or limit your ability to upgrade the system.
  • The price you’re paying on a per TB basis is equal to or higher than traditional Tier 1 systems.
    • If your cost per TB is not considerably lower, you need to ask whether it is sustainable at this price and at your data growth rate for the next 4-5 years, let alone further into the future.
  • You have little to no ability to customize the hardware or software in the appliance such as adding more RAM, NICs, reporting software, etc
    • At OSNEXUS we are frequently asked whether additional software or hardware can be added to the system, and because the platform is modular and Linux-based the answer is usually yes. Just as you can add more RAM and NICs to your VMware servers, you should have the flexibility to expand your storage appliances using approved commodity hardware when you need it.
  • The appliance APIs are not publicly available and/or are not REST APIs
    • If you can't automate the appliance, or if all they provide you with is an antiquated SNMP, SMI-S, or C/Java API interface, that's a red flag that the vendor is selling rehashed legacy kit under an SDS banner. Your appliances should be easily scriptable with REST APIs and a solid CLI with XML output so you can fully automate common provisioning and configuration tasks (see the sketch after this list).
  • The appliance has a completely proprietary IO stack / filesystem / volume manager
    • There’s no doubt plenty of room for great innovations in filesystems but they take a long time to mature (typically 10 years), and if the underlying filesystem is proprietary, you’re effectively locked in.  Great for the storage vendor, not so good for you.
  • The web interface looks like something from a circa ’99 DSL modem
    • You’ve got to maintain the box for the life of the data and that can be a long time.  When you have new engineers join your IT team you need to make sure the interface is intuitive and designed with good access controls.  Without that, your new IT staff can and inevitably will shoot themselves (and you) in the foot.
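
To make the automation point concrete, here is the kind of thing you should be able to do against any true SDS appliance from a script; the host, port, endpoints, and credentials below are purely hypothetical and do not describe any specific product's API:

  # Query appliance inventory over a REST API (hypothetical host, port, and endpoint)
  curl -k -u admin:password "https://sds-appliance.example.com:8443/api/v1/storage-pools"

  # Provision a volume from a script (again, a made-up endpoint purely for illustration)
  curl -k -u admin:password -X POST -H "Content-Type: application/json" \
      -d '{"name": "vol1", "size": "100GB", "pool": "pool0"}' \
      "https://sds-appliance.example.com:8443/api/v1/volumes"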

Taking a brief look at the state of IT today, it's clear that open-source software (OSS) is king. OpenStack, Hadoop, MongoDB, Gluster, Ceph, ZFS, and many other open storage technologies are the tip of the spear of the cloud revolution. It's a market that demands a near zero-cost entry point, with commercial support and management software for commercial deployments. When you choose an SDS solution built on an open storage I/O stack, you not only have a community behind it, you have an insurance policy that the technology will be there long into the future.

A Focus on OSS

This philosophy is central to what we do at OSNEXUS and is why the majority of our innovations are focused on our scale-out storage management layer and on making storage easy to manage. For us it's about making the best and most reliable SDS platform on the market by leveraging and integrating the best enterprise OSS and commodity hardware available. We do that, package the SDS platform up as a downloadable ISO image, and provide fanatical customer support so you have the peace of mind you need when you deploy and realize the benefits of SDS for your business.


Integrating ZFS

We just released QuantaStor v3.6 this month, which completes our integration of ZoL (ZFS on Linux) with QuantaStor. This article outlines some of the new features ZFS brings versus our other storage pool types, which are based on XFS and BTRFS. ZFS was something of a big lift, as our storage pool stack was initially designed for traditional filesystems like XFS or BTRFS layered over Linux's MD software RAID. ZFS is different in that it incorporates both the filesystem and the underlying volume management layer into one, and this brings many advantages. These are often must-have features for virtualization deployments, like being able to grow the storage pool online by adding devices on the fly and hot-adding SSDs as a write cache (ZIL) or read cache (L2ARC) to boost performance.
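
On a plain ZFS on Linux system, those hot-add operations look like the following (pool and device names are examples; QuantaStor drives the equivalent operations through its management layer):

  # Hot-add an SSD as a dedicated write log (ZIL/SLOG) to an existing pool named 'tank'
  zpool add tank log /dev/nvme0n1

  # Hot-add an SSD as a read cache (L2ARC)
  zpool add tank cache /dev/nvme1n1

  # Confirm the new log and cache devices
  zpool status tank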

When to choose ZFS vs XFS?

We're still learning a lot about ZFS performance tuning, but so far we've been able to get ZFS to perform at roughly 80% of XFS's performance for many workloads, and much better for some when using SSD write caching (ZIL). Going forward we will continue to support XFS alongside ZFS even though XFS doesn't have the same depth of features. This is because XFS is an ideal filesystem for the sequential I/O workloads we see in Media & Entertainment and for some archive applications where the data is often already compressed. For virtualization and many other workloads that benefit from ZFS's advanced feature set, we recommend using ZFS now.

The table below gives a breakdown of the differences in capabilities between the storage pool types based on the underlying filesystem.

Note that 'Volumes' (storage volumes) are iSCSI/FC LUNs and 'Network Shares' are CIFS/NFS shares; both reside within, and are provisioned from, a given storage pool.
[Table: storage pool capabilities by filesystem type]
* XFS and BTRFS storage pools can be expanded if they use the 'Linear' software RAID type, but there is a roughly 15-second window during which the pool is taken offline to expand it. BTRFS has software RAID features which would eliminate this issue, but at the time of this writing those features have not matured and are not integrated into QuantaStor.
** Storage pool deduplication can be turned on using the zfs command line utility (zfs set dedup=on <pool>). BTRFS has some user-mode utilities for deduplication but no in-line deduplication yet.
*** Replication of volumes from one QuantaStor appliance to another is done from a consistent point-in-time snapshot when using BTRFS or ZFS. XFS does not have this benefit as it lacks snapshot capabilities.
**** SSD caching is possible using hardware RAID controllers like the LSI Nytro MegaRAID or the LSI CacheCade software add-on for MegaRAID.
 

Combining Hardware and Software RAID: a 'win-win'

Though QuantaStor allows you to use ZFS without a hardware RAID controller, we recommend that you still use one. Hardware RAID controllers are much better at managing drive replacement, hot-spares, drive identification, and more. It's this combination of hardware RAID and software RAID that gives the best of both worlds: you can expand ZFS online with no downtime, and you can easily manage the underlying fault tolerance using an enterprise-grade hardware RAID controller. Hardware RAID controllers like the LSI MegaRAID series also typically have 1GB or more of non-volatile DDR3 write cache, which helps boost ZFS performance, so the combination of software and hardware is a win-win. Last, QuantaStor has integrated hardware RAID management, so you can provision hardware RAID units for your storage pools within the web management interface in just a few clicks. This makes the overhead of provisioning the hardware RAID units negligible and greatly reduces the cost of maintaining the storage appliance.
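
In practice, once the new hardware RAID unit has been provisioned it shows up as a single block device that can be added to the ZFS pool online, along these lines (pool and device names are examples):

  # Expand the pool online by adding the new hardware RAID unit as another device
  zpool add tank /dev/sdd

  # Verify the added capacity and pool layout
  zpool list tank
  zpool status tank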

Thoughts on BTRFS

We're still tracking BTRFS and look forward to seeing it mature in the months ahead. For now we've tagged it as an 'Experimental' pool type until the BTRFS project team takes the 'experimental' badges off and sorts out some remaining issues. Overall we see the filesystem working well, but we've had to add logic to QuantaStor to regularly rebalance the filesystem metadata or else we see -28 (out of space) errors, so there are still some gaps in the BTRFS test matrix. Once those are sorted we'll restructure the software RAID management for BTRFS along the lines of what we've done with ZFS so that we can hot-add storage to BTRFS-based storage pools with no downtime. From there the gaps between ZFS and BTRFS will be a fairly small list: specifically, a mechanism like ZFS send/receive is needed to support efficient remote replication, and support for SSD caching like ZFS's ZIL and L2ARC is similarly needed.

I hope you enjoyed the article, and I look forward to sharing more details on our ZFS tuning in future blog posts.

-Steve

P.S. To try out QuantaStor v3 with ZFS, please see our ISO download and license request page here.
