Managing Scale-out NAS File Storage with GlusterFS Volumes

QuantaStor provides scale-out NAS capabilities using CIFS/SMB and NFS as well as via the GlusterFS client protocol for scale-out deployments. For those not familiar with GlusterFS, it’s a scale-out filesystem that ties multiple underlying files systems together across appliances to present them in aggregate as a single filesystem or “single-namespace” as it’s often called.

In QuantaStor appliances, Gluster is layered on top of the QuantaStor Storage Pool architecture enabling the use of QuantaStor appliances for file, block, and scale-out file storage needs all at the same time.

Scale-out NAS using GlusterFS technology is great for unstructured data, archive, and many media use cases. However, due to the current architecture of GlusterFS, it’s not as good for high IOPS workloads such as databases and virtual machines. For those applications, you’ll want to provision block devices (Storage Volumes) or file shares (Network Shares) in the QuantaStor appliances which can deliver the necessary write IOPS performance needed for transactional workloads and VMs.

GlusterFS read/write performance via CIFS/NFS is moderate and can be improved with SSD or hardware RAID controllers with one or more gigabytes of read/write cache as NVRAM. For deployments accessing scale-out NAS storage from Linux, it is ideal to use the native GlusterFS client as it will enable the performance and bandwidth to increase as you scale-out your QuantaStor grid. For Windows, OS/X and other operating systems you’ll need to use the traditional CIFS/NFS protocols.

QuantaStor Grid Setup

To provision the scale-out NAS shares on QuantaStor appliances, the first step is to create a management Grid by right-clicking on the Storage System icon in the tree stack view in the Web Management User Interface (WUI) and choose “Create Grid.” (Figure 1)

Figure 1

After you create the grid you’ll need to add appliances to the grid by right-clicking on the Grid icon and choosing ‘Add Grid Node.’ (Figure 2) Input the node IP address and password for the additional appliances.

Figure 2

After the nodes are added you’ll be able to manage them from the QuantaStor tree menu (Figure 3). User accounts across the appliances will be merged with the elected primary/master node in the grid taking precedence.

Figure 3

Network Setup Procedure

If you plan to use the native GlusterFS client with Linux servers connected directly to QuantaStor nodes then you should setup network bonding to bind multiple network ports on each appliance for additional bandwidth and automatic fail-over.

If you plan to use CIFS/NFS as the primary protocols then you could use either bonding or separate ports into a front-end network for client access and a back-end network for inter-node communication.

Peer Setup

QuantaStor appliance grids allow intercommunication but do not automatically setup GlusterFS peer relationships between appliances automatically. For that you’ll want to select ‘Peer Setup’ and select the IP address on each node to be used for GlusterFS intercommunication. (Figure 4)

Figure 4

Peer Setup creates a “hosts” file (/etc/hosts) on each appliance so that each node can refer to the other grid nodes by name and can also be done via DNS. This will ensure that the configuration is kept in sync across nodes and allows the nodes to resolve names even if DNS server access is down.

Gluster volumes span appliances and on each appliance it places a brick. These Gluster bricks are referenced with a brick path that looks much like a URL. By setting up the IP to hostname mappings QuantaStor is able to create brick paths using hostnames rather than IP addresses making it easier to change the IP address of a node.

Finally, in the Peer Setup dialog, there’s a check box to set up the Gluster Peer relationships. The ‘Gluster peer probe’ command links the nodes together so that Gluster volumes can be created across the appliances. Once the peers are attached, you’ll see them appear in the Gluster Peers section of the WUI and you can then begin to provision Gluster Volumes. Alternatively you can add the peers one at a time using the Peer Attach dialog. (Figure 5)

Figure 5

Provisioning Gluster Volumes

Gluster Volumes are provisioned from the ‘Gluster Management’ tab in the web user interface. To make a new Gluster Volume simply right-click on the Gluster Volumes section or choose Create Gluster Volume from the tool bar. (Figure 6)

To make a Gluster Volume highly-available with two copies of each file, choose a replica count of two (2). If you only need fault tolerance in case of a disk failure then that is supplied by the storage pools and you can use a replica count of one (1). With a replica count of two (2) you have full read/write access to your scale-out Network Share even if one of the appliance is turned off. With a replica count of (1) you will loose read access to some of your data in the event that one of the appliances is turned off. When the appliance is turned back on it will automatically synchronize with the other nodes to bring itself up to the proper current state via auto-healing.

Figure 6

Storage Pool Design Recommendations

When designing large QuantaStor grids using GlusterFS, there are several configurations that we make to ensure maximum reliability and maintainability.

First and foremost, it’s better to create multiple storage pools per appliance to allow for faster rebuilds and filesystem checks than it is to make one large storage pool.

In the following diagram (Figure 7) we show four appliances with one large 128TB storage pool each. With large amounts of data, it seems like a slightly simpler configuration from a setup perspective. However, there’s a very high cost in the event a GlusterFS brick needs to do a full heal given the large brick size (128TB). To put this in perspective, at 100MB/sec (1GbE) it takes at minimum 2.8 hours to transfer one of TB of data (and that’s assuming theoretical maximum throughput) for a 1GB port or 15 days to repair a 128TB brick.

Figure 7

The QuantaStor SDS appliance grid below (Figure 8) shows four storage pools per appliance that are segmented into 32TB. Each of the individual 32TB storage pools can be any RAID layout type but we recommend using hardware RAID5 or RAID50 as that will provide low cost disk fault tolerance within the storage pool and write caching assuming the controller has a CacheVault/MaxCache or other NVRAM type technology. When accounting for a GlusterFS volume configured with two replicas (mirroring) the effective RAID type is RAID5+1 or RAID51.

Figure 8

In contrast, we don’t recommend using RAID0 (which has no fault tolerance) as there is a high cost associated with having GlusterFS repair large bricks. In this scenario a full 32TB would need to be copied from the associated mirror brick and that can take a long time as well as utilize considerable disk and network bandwidth. Be sure to use RAID50 or RAID5 for all storage pools so that the disk rebuild process is localized to the hardware controller and doesn’t impact the entire network and other layers of the system.

Smart Brick Layout

When you provision a scale-out network share (GlusterFS Volume) it is also important that the bricks are selected so that no brick mirror-pair has both bricks on the same appliance. This ensures that there’s no single point of failure in HA configurations. QuantaStor automatically ensures correct Gluster brick placement when you select pools and provision a new Gluster Volume by processing the pool list for the new scale-out Gluster Volume using a round-robin technique so that brick mirror-pairs never reside on the same appliance. (Figure 9)

Figure 9

Once provisioned, the scale-out storage is managed as a standard Network Share in the QuantaStor Grid except that it can be accessed from any appliance. Finally, the use of the highly-available interface for NFS/CIFS access in the diagram below (Figure 10). This ensures that the IP address is always available even in the event that an appliance goes offline that is actively serving that virtual interface. If you’re using the GlusterFS native client then you don’t need to think about setting up a HA virtual network interface as the GlusterFS native client communication mechanism is inherently highly available.