Deploying a High Availability Storage Cluster with GlusterFS

During the Paris OpenStack Summit earlier this month, Red Hat announced the latest version of GlusterFS, version 3.6.0, with new features including volume snapshots, erasure coding across GlusterFS volumes, improved SSL support, and rewritten automatic file replication code for improved performance.

Today, GlusterFS offers speed, reliability, and features such as snapshots, cloning, and thin provisioning, along with massive scalability. Throughput and IOPS can be accelerated further by adding RAM and solid state drives (SSDs).

As we’ve stated before, we believe that GlusterFS is becoming the de facto standard scale-out file storage platform for Big Data deployments, as its file-based architecture is well suited to unstructured data ranging from documents and archives to media.

Online Upgrades, Mostly

When managing Big Data, the key requirement is high availability. With multi-petabyte archives and potentially hundreds of client applications reading and writing files, it’s typically very difficult to find a maintenance window where the storage can be offline for upgrades. With cluster-based solutions like GlusterFS, however, you can upgrade hardware without imposing downtime on clients, thanks to the replica-based architecture. Multiple replicas provide access to data even if one copy of the data on a given appliance node goes offline.

The trouble is that updating the GlusterFS software itself may require a coordinated upgrade across nodes, and with it a maintenance window. This is because new features can at times be very difficult to synchronize while old versions of the software are still running on other nodes. In general, the GlusterFS team has done a great job with the more recent versions, but when planning any storage deployment you’ll need to factor in a maintenance window. If you can’t afford one, you’ll need to set up replication so that you can fail over to a second storage cluster while the first one is being upgraded.

Boosting Efficiency with Erasure Coding

The downside to using replicas for high availability is the dramatic drop in usable storage. With two copies of every file, only 50 percent of your raw storage is usable, and with three copies only 33 percent is usable. This means that if you have 10PB of files and you are going to maintain two copies of each file so that your solution is highly available, you will need to purchase 20PB of raw storage.

Erasure coding takes a different approach to delivering high availability and fault tolerance: it uses parity information, so your storage overhead can be as low as 10 percent in some cases. Instead of needing to buy 20PB of raw storage for 10PB of files, you might need only ~12PB (roughly 20 percent overhead). For those familiar with RAID technology, you can think of it as loosely similar to network RAID5. This is a new capability for GlusterFS, and it’s critical for deployments that need to scale to tens of petabytes, where the cost of raw hardware and power alone becomes a serious issue under the replica model.
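The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the 10+2 and 10+1 fragment layouts are assumptions chosen to reproduce the ~12PB and 10 percent figures, not layouts stated in the GlusterFS announcement.

```python
def raw_for_replicas(usable_pb, replica_count):
    """Raw capacity needed when every file is stored replica_count times."""
    return usable_pb * replica_count


def raw_for_erasure_coding(usable_pb, data_fragments, redundancy_fragments):
    """Raw capacity needed when each file is split into data fragments
    plus extra redundancy (parity) fragments."""
    total_fragments = data_fragments + redundancy_fragments
    return usable_pb * total_fragments / data_fragments


def usable_fraction(replica_count):
    """Share of raw storage that holds unique data under replication."""
    return 1 / replica_count


if __name__ == "__main__":
    files_pb = 10
    print("2 replicas:", raw_for_replicas(files_pb, 2), "PB raw")
    # Assumed layout: 10 data + 2 redundancy fragments (20 percent overhead)
    print("EC 10+2:", raw_for_erasure_coding(files_pb, 10, 2), "PB raw")
    # Assumed layout: 10 data + 1 redundancy fragment (10 percent overhead)
    print("EC 10+1:", raw_for_erasure_coding(files_pb, 10, 1), "PB raw")
```

The trade-off is that a layout with fewer redundancy fragments tolerates fewer simultaneous failures, so the lowest overhead is not always the right choice.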

Making GlusterFS Easy to Manage

QuantaStor takes a holistic approach to GlusterFS integration by bringing management, monitoring, and NFS/CIFS services together so that deployments can be done faster and more easily, with point-click-provision simplicity.

Provisioning GlusterFS Volumes

Gluster Volumes are provisioned from the ‘Gluster Management’ tab in the QuantaStor web management interface. To make a new Gluster Volume simply right-click on the Gluster Volumes section or choose Create Gluster Volume from the tool bar (Figure 1).

To make a Gluster Volume highly available, be sure to choose a replica count of two or three. If you only need fault tolerance against disk failure, the storage pools already provide that and you can use a replica count of one. In that case, though, if an appliance goes offline, the portion of the data it holds will be inaccessible. With a replica count of two or three, your data remains available even when a node is taken offline.

Figure 2

Figure 1

Auto Healing

When the appliance is turned back on, it will automatically synchronize with the other nodes to bring itself up to the current state via auto-healing. GlusterFS does all the work for you by comparing the contents of the “bricks” and then synchronizing the appliance that was offline to bring it up to date.

High-Availability for Gluster Volumes

When using the native Gluster client from a Linux server, no additional steps are required to make a volume highly available, as the client communicates with the server nodes to get updated peer status information. To see the commands to connect to your QuantaStor appliance via the native Gluster protocol, just right-click on the volume and choose ‘View Mount Command.’

When accessing the Gluster Volume via traditional protocols such as CIFS or NFS, additional steps are required to make the storage highly available because CIFS and NFS clients communicate with a single IP address.

If the appliance serving storage through an interface with that IP address is turned off, the IP address must move to another node to ensure continued storage access on that interface. QuantaStor natively provides this capability by allowing you to create virtual network interfaces for your Gluster Volumes. These interfaces float to another node automatically, maintaining highly available access to your storage via CIFS/NFS in the event that an appliance is turned off.
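The floating-interface idea can be illustrated with a toy model. The sketch below is not QuantaStor’s implementation; the `Node` class, the `place_virtual_ip` function, and the node names are hypothetical and exist only to show the decision being made: keep the virtual IP where it is while its owner is online, otherwise move it to the first node that is still up.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    online: bool = True


def place_virtual_ip(nodes, current_owner):
    """Return the name of the node that should hold the virtual IP,
    or None if no node is online."""
    by_name = {node.name: node for node in nodes}
    owner = by_name.get(current_owner)
    if owner is not None and owner.online:
        return current_owner  # owner is healthy, nothing to move
    for node in nodes:
        if node.online:
            return node.name  # fail over to the first online node
    return None


if __name__ == "__main__":
    cluster = [Node("node-a"), Node("node-b"), Node("node-c")]
    owner = place_virtual_ip(cluster, "node-a")
    print("owner while node-a is up:", owner)
    cluster[0].online = False  # simulate node-a being turned off
    owner = place_virtual_ip(cluster, owner)
    print("owner after node-a goes down:", owner)
```

A real failover mechanism also has to detect failures reliably and avoid two nodes claiming the same address at once, which this sketch leaves out.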

OSNEXUS engineering is actively performing feature validation of GlusterFS 3.6 and the new erasure coding features. We look forward to releasing an updated version of QuantaStor in early 2015 with erasure coding support, to take advantage of the jump in efficiency it provides.

For more in-depth technical information on Managing Scale-out GlusterFS Volumes see the OSNEXUS administrators’ guide.

Categories: GlusterFS, High Availability
