Cloud Containers

Deploying Software Defined Storage with Cloud Containers

Over at The Virtualization Guy blog, Gaston Pantana wrote a great post on how to integrate SoftLayer’s Object Storage with QuantaStor using Cloud Containers. Cloud Containers let you back up storage volumes to Amazon S3, Google Cloud Storage and IBM/SoftLayer Object Storage. The data is compressed, encrypted, and deduplicated and is ideal for unstructured information such as documents.


The following Vine shows a cloud container being created for SoftLayer Object Storage.


Protecting Against Data Loss Using RAID for Post-production Media Storage


If you just watched the above YouTube video about how Toy Story 2 was almost erased out of existence by a mistyped Linux command, then you’re probably paying close attention to why using RAID for media storage is a great idea. In addition to having a disaster recovery plan, it’s always wise to protect critical data against disk failures, and RAID provides a great solution.

Disk drives do have life spans. After cloud backup provider Backblaze analyzed 25,000 of their deployed production drives, they found that 22% of drives failed within their first four years, at varying failure rates (see Figure 1), leaving 78% of Backblaze drives still alive after four years.

Figure 1: Drive failure rates

Using RAID to Protect Against Disk Failures

RAID, also known as “redundant array of independent disks,” combines multiple drives into one logical unit for fault tolerance and improved performance. Different RAID architectures provide a balance between storage system goals including reliability, availability, performance, and usable capacity. One of the primary uses of RAID technology is to provide fault tolerance so that in the event of a disk failure there is no downtime and no loss of data. Some RAID types, such as RAID6, can survive multiple simultaneous disk failures. On the other end of the spectrum, RAID0 combines disks into a unit for improved performance but does not provide disk fault tolerance.

Parity, from the Latin term “paritas,” means equal or equivalent. In RAID it refers to types 5 and 6, where error correction algorithms (XOR and Reed-Solomon) are used to produce additional “parity” data that the system can use to recover all the data in the event a drive fails. RAID types 2, 3, and 4 are not commonly used as they require special hardware or have design aspects that make them less efficient than RAID5 or RAID6, so we’re going to skip over RAID2/3/4.
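
To make the parity idea concrete, here is a minimal Python sketch of single-parity (XOR) recovery, the kind of calculation RAID5 relies on. The block contents and function names are made up for illustration, and the second, Reed-Solomon based parity that RAID6 adds is not shown.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column, one byte from each.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])


# Three hypothetical data blocks, as if striped across three data disks.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Pretend the disk holding data[1] failed and rebuild it from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

Because XOR is its own inverse, XOR-ing the parity with every surviving block leaves exactly the missing block, which is why one parity block is enough to survive one failed disk.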

Typical RAID configurations include:

  • RAID 0 consists of striping, without mirroring or parity and is generally NOT recommended as loss of a single disk drive results in complete loss of all data.
  • RAID 1 consists of mirroring and is recommended for small configurations; usable capacity is 50% of total capacity.
  • RAID 5 consists of block-level striping with distributed parity and is recommended for archive configurations, but use no more than 7 drives in a group. It can sustain the loss of one drive but then must be repaired using a spare disk before another disk fails. Large RAID5 groups are risky because the odds of a second disk failure are higher when many disks are in the RAID5 unit. For example, a RAID5 unit with 3 data disks and one parity disk (3+1) will have a usable capacity of 75% of the total and be low risk. In contrast, a RAID5 group with 12+1 is higher risk due to the increased probability of a second failure while the unit rebuilds from a first disk failure.
  • RAID 6 consists of block-level striping with double distributed parity and can be slower with some RAID implementations but generally performs close to the performance of RAID5. Again, it’s good for archive, not so good for virtual machines and other high transaction workloads.
  • RAID 7 consists of block-level striping with triple distributed parity and can sustain three simultaneous disk failures, which makes it ideal for large long-term archive configurations. With the ZFS storage pool type this layout is referred to as RAID-Z3, indicating the 3 drives used for Reed-Solomon parity information.
  • RAID 10 consists of multiple RAID1 groups that are combined into one large unit using RAID0. It’s also the most recommended RAID layout as it combines fault tolerance with a large boost in IOPS or transactional performance.
  • RAID 50 consists of multiple RAID5 groups which are combined into one large unit using RAID0. Using small RAID5 groups of 3 disks + 1 parity or 4 disks + 1 parity disk you can use RAID50 for light load virtualization deployments while yielding a higher amount of usable disk space (75% and 80% respectively).
  • RAID 60 consists of multiple RAID6 groups which are combined into one large unit using RAID0 and is good for large archive configurations.
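
The capacity and risk figures in the list above can be checked with a short Python sketch. The per-disk failure probability during a rebuild is a made-up number chosen only to compare group sizes, and the model assumes disks fail independently, which is a simplification.

```python
def usable_fraction(data_disks, parity_disks):
    """Fraction of raw capacity left for data in a parity group."""
    return data_disks / (data_disks + parity_disks)


def second_failure_risk(surviving_disks, per_disk_failure_prob):
    """Chance that at least one surviving disk fails during a rebuild,
    assuming independent failures (a simplifying assumption)."""
    return 1.0 - (1.0 - per_disk_failure_prob) ** surviving_disks


# Hypothetical probability that one disk fails during a rebuild window.
ASSUMED_FAILURE_PROB = 0.001

for data, parity in [(3, 1), (4, 1), (12, 1)]:
    survivors = data + parity - 1  # one disk has already failed
    print(
        f"RAID5 {data}+{parity}: "
        f"usable {usable_fraction(data, parity):.0%}, "
        f"second-failure risk {second_failure_risk(survivors, ASSUMED_FAILURE_PROB):.2%}"
    )
```

Under these assumptions the 12+1 group offers more usable space than the 3+1 group, but it has four times as many surviving disks that could fail mid-rebuild, which is the trade-off behind the advice to keep RAID5 groups small.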

ZFS, QuantaStor’s native file system, supports RAID 0, RAID 1, RAID 5/50 (RAID-Z), RAID 6/60 (RAID-Z2), and RAID 7/70, a triple-parity version called RAID-Z3.

Parity-based RAID for Media and Archive

Because RAID6 employs double parity (called P and Q) and can sustain two simultaneous disk failures with no data loss, it’s a good solution for protecting critical data against drive failures. RAID6 is highly fault tolerant but it does have some drawbacks. To keep parity information consistent, parity-based RAID layouts like RAID5 and RAID6 must update the parity information any time data is written. Updating parity requires reading and/or writing from all the disks regardless of the data block size being written. This means that it takes roughly the same amount of time to write 4KB as it does to write 1MB.

If your workload is mostly reads with only one or two writers that do mostly sequential writes, as is the case with large files, then you’ve got a good candidate for RAID6.

RAID controllers that have a battery-backed or super-capacitor-protected NVRAM cache can hold writes for a period of time and can often combine many 4K writes into larger, more efficient 1MB full-stripe writes. This IO coalescing works great when the IO patterns are sequential, as with many media and archive applications, but it doesn’t work well when the data is being written to disparate areas of the drive, as you see with databases and virtual machines.
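
The benefit of coalescing can be estimated with a rough I/O count. The Python sketch below is a simplified model, not a description of any particular controller: it assumes a RAID6 group of 8 data disks with 128KB chunks (so one full stripe holds 1MB of data), and it assumes each uncoalesced small write is handled as a read-modify-write of the data chunk and both parity chunks.

```python
def read_modify_write_ios(parity_disks):
    """Disk I/Os for one small write: read old data and old parity,
    then write new data and new parity."""
    return 2 * (1 + parity_disks)


def full_stripe_write_ios(data_disks, parity_disks):
    """Disk I/Os for one full-stripe write: parity is computed from the
    new data alone, so no reads are needed."""
    return data_disks + parity_disks


# Assumed geometry, chosen so that one full stripe holds 1MB of data.
DATA_DISKS = 8
PARITY_DISKS = 2  # RAID6: P and Q
CHUNK_KB = 128
SMALL_WRITE_KB = 4

stripe_kb = DATA_DISKS * CHUNK_KB
small_writes_per_stripe = stripe_kb // SMALL_WRITE_KB

uncoalesced = small_writes_per_stripe * read_modify_write_ios(PARITY_DISKS)
coalesced = full_stripe_write_ios(DATA_DISKS, PARITY_DISKS)

print(f"{small_writes_per_stripe} separate 4K writes: {uncoalesced} disk I/Os")
print(f"one coalesced 1MB full-stripe write: {coalesced} disk I/Os")
```

The model ignores I/O size, caching, and seek time, but it shows why a cache that can merge sequential 4K writes into full stripes removes most of the parity overhead, and why random writes that cannot be merged do not get that benefit.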

For more information on configuring RAID with QuantaStor, see the Getting Started Guide and be sure to check out the Vine on Creating a RAID6 Storage Pool with QuantaStor.


Use Vines to Learn QuantaStor in Seconds

At OSNEXUS we’re always trying to make it easy for our customers and partners to learn the many features that the QuantaStor SDS platform has to offer. We also realize that everyone is very busy and time is a precious commodity. To that end, we’re happy to announce that in addition to the OSNEXUS support site, we are introducing our new OSNEXUS Vine page, enabling anyone to learn QuantaStor even if they only have six seconds! You can view a few of our new Vines below. Also, be sure to follow our Twitter feed for new Vines that will be added weekly.

  • Create a Gluster Volume with QuantaStor
  • Create a QuantaStor Storage Pool
  • Create a QuantaStor Network Share
  • Add a QuantaStor License Key


How to: Universal Hot-spare Management for ZFS-based Storage Pools

A standard best practice for preventing data loss due to disk failure is to designate one or more disk spares so that fault-tolerant arrays can auto-heal using the spare in the event of an HDD or SSD failure. Universal hot-spare management takes that a step further and lets you repair any array in the system with a given hot spare rather than having to designate specific spares for specific RAID groups (arrays).

Policy Driven Hot Spare Management

QuantaStor’s universal hot-spare management was designed to automatically or manually reconstruct/heal fault-tolerant arrays (RAID 1/10/5/50/6/60/7/70) when one or more disks fail. Depending on the specific needs of a given configuration or storage pool, spare disk devices can be assigned to a specific pool or added to the “Universal” hot spare list, meaning that they can be used by any storage pool.

QuantaStor’s hot spare management system is also distributed and “grid aware” so that spares can be shared between multiple appliances if they are connected to one or more shared disk enclosures (JBODs). The grid-aware spare management system makes decisions about how and when to repair disk pools based on hot-spare management policies set by the IT administrator.

The challenges that QuantaStor’s policy-driven auto-heal system must tackle include repairing pools only with like devices (don’t use that slow HDD to repair an SSD pool!), telling an enclosure that was powered off apart from an actual disk failure, allowing users to set policies and designate disks to pools as needed, and allowing hot spares to be shared and reserved within and across appliances for HA (High Availability) configurations.

With this policy driven hot-spare management, OSNEXUS has developed an advanced system that makes managing HBA (SAS Host Bus Adapter) connected spares for our ZFS-based Storage Pools as easy as managing spares in a hardware RAID controller (also manageable via QuantaStor).

Configuring Storage Pool Hot Spare Policies

The Hot Spare Policy Manager can be found in the Modify Storage Pool dialog box under Storage Pools. (Figure 1)

Figure 1: QuantaStor Hot Spare Policy Manager

The default hot spare policy is “Autoselect best match from assigned or universal spares.” Additional options include auto-select best match from pool assigned spares only, auto-select exact match from assigned or universal spares, auto-select exact match from pool assigned spares only and manual hot spare management only.
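
To illustrate how a policy such as “Autoselect best match from assigned or universal spares” might behave, here is a hypothetical Python sketch. It is not QuantaStor’s implementation: the `Spare` type, the `select_spare` function, and the matching rules (same media type, large enough capacity, pool-assigned spares preferred, “exact match” meaning identical size) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Spare:
    """Hypothetical record for a hot-spare disk."""
    name: str
    media: str                           # "HDD" or "SSD"
    size_gb: int
    assigned_pool: Optional[str] = None  # None means "universal" spare


def select_spare(spares: List[Spare], pool: str, failed_media: str,
                 failed_size_gb: int, allow_universal: bool = True,
                 exact: bool = False) -> Optional[Spare]:
    """Pick a spare for a failed disk, or return None if nothing fits."""
    candidates = []
    for spare in spares:
        if spare.media != failed_media:
            continue  # never repair an SSD pool with an HDD, or vice versa
        if spare.assigned_pool not in (pool, None):
            continue  # pinned to a different pool
        if spare.assigned_pool is None and not allow_universal:
            continue  # policy restricts repair to pool-assigned spares
        if spare.size_gb < failed_size_gb:
            continue  # too small to replace the failed disk
        if exact and spare.size_gb != failed_size_gb:
            continue  # "exact match" policies (assumed to mean same size)
        candidates.append(spare)
    if not candidates:
        return None
    # Prefer pool-assigned spares, then the smallest disk that fits.
    return min(candidates, key=lambda s: (s.assigned_pool is None, s.size_gb))


spares = [
    Spare("spare-hdd-1", "HDD", 4000),
    Spare("spare-ssd-1", "SSD", 960),
    Spare("spare-ssd-2", "SSD", 480, assigned_pool="pool-a"),
]

chosen = select_spare(spares, pool="pool-a", failed_media="SSD", failed_size_gb=480)
print(chosen.name if chosen else "no suitable spare")
```

Setting `allow_universal=False` models the “pool assigned spares only” options, and `exact=True` models the “exact match” options. The “manual hot spare management only” policy would simply skip automatic selection.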

Marking Hot-Spares

To mark or unmark a physical disk as a hot spare, click on the disk under “Physical Disks” and select “Mark as Hot Spare” or “Unmark as Hot Spare” from the dialog box. This applies to the “Universal” hot spare list. (Figure 2)

Figure 2

Pinning Spares to Specific Storage Pools

To add assigned disk spares to a specific storage pool rather than the universal spare list, click on the storage pool and select “Recover Storage Pool/Add Spares.” (Figure 3)

Figure 3: Adding Assigned Disk Spares to Specific Storage Pools

Select disks to be added to the storage pool using the “Add Hot-spares / Recover Storage Pool” dialog box. (Figure 4)

Figure 4: Add Hot-spares / Recover Storage Pool

For more information about Hot Spare Management and Storage Pool Sizing see the OSNEXUS Solution Design Guide.


QuantaStor v3.14 Released

The latest maintenance release of QuantaStor SDS (v3.14) was published on December 30th, 2014 and comes with several new features. Some highlights include:

  • Cascading replication of volumes and shares allows for replicating data in an unlimited chain-linked fashion from appliance to appliance to appliance.
  • Kernel upgrade to Linux 3.13, which adds support for the latest 12Gb/s SAS/SATA HBAs and RAID controllers as well as the latest 40GbE network interface cards.
  • Advanced universal hot-spare management for the ZFS-based storage pool type that’s enclosure aware and makes hot spares universally shared within an appliance and across multiple appliances.

This is also the first release that has some initial Ceph support, but at this time we’re only working with partners via a pilot program around the new Ceph capabilities. For more information about the pilot program please contact us, and note that general availability (GA) of Ceph support is planned for late Q1 2015.

Below is the full list of changes. Linux kernel update instructions can be found on the OSNEXUS support site.

Change Log:

  • ISO DVD image: osn_quantastor_v3.14.0.6993.iso
  • MD5 Hash: osn_quantastor_v3.14.0.6993.md5
  • adds 3.13 linux kernel and SCST driver stack upgrade
  • adds support for Micron PCIe SSD cards
  • adds universal hot-spare management system for ZFS based pools
  • adds support for FC session management and session iostats collection
  • adds disk search/filtering to Storage Pool Create/Grow dialogs in web interface
  • adds configurable replication schedule start offset to replication schedule create/modify dialogs
  • adds support for cascading replication schedules so that you can replicate volumes across appliances A->B->C->D->etc
  • adds wiki documentation for CopperEgg
  • adds significantly more stats/instruments to Librato Metrics integration
  • adds dual mode FC support where FC ports can now be in Target+Initiator mode
  • adds support for management API connection session management to CLI and REST API interfaces
  • adds storage volume instant rollback dialog to web management interface
  • adds sysstats to send logs report
  • adds swap device utilization monitoring and alerting on high swap utilization
  • adds support for unlimited users / removes user count limit license checks for all license editions
  • adds support for scale-out block storage via Ceph FS/RBDs (pilot program only)
  • fix for CLI host-modify command
  • fix for pool discovery reverting IO profile selection back to default at pool start
  • fix for web interface to hide ‘Delete Unit’ for units used for system/boot
  • fix for alert threshold slider setting in web interface ‘Alert Manager’ dialog
  • fix to accelerate pool start/stop operations for FC based systems
  • fix to disk/pool correlation logic
  • fix to allow IO profiles to have spaces and other special characters in the profile name
  • fix to FC ACL removal
  • fix to storage system link setup to use management network IPs
  • fix to remove replication association dialog to greatly simplify it
  • fix to CLI disk and pool operations to allow referencing disks by short names
  • fix for replication schedule create to fixup and validate storage system links
  • fix for replication schedule delta snapshot cleanup logic which ensures that the last delta between source and target is not removed
  • fix for stop replication to support terminating zfs based replication jobs
  • fix for pool freespace detection and alert management
  • fix license checks to support sum of vol, snap, cloud limits across all grid nodes
  • fix to create gluster volume to use round-robin brick allocation across grid nodes/appliances to ensure brick pairs do not land on the same node
  • fix to storage volume snapshot space utilization calculation
  • fix to iSCSI close session logic for when multiple sessions are created between the same pair of target/initiator IP addresses
  • fix to auto update user specific CHAP settings across all grid nodes when modified
  • fix to allow udev more time to generate block device links, resolves issue exposed during high load with replication
  • fix to IO fencing logic to reduce load and make it work better with udev