Btrfs & ZFS, the good, the bad, and some differences.

UPDATE 11/28/2017

It’s been 5 years since I wrote this article and a refresh is due. In short, at OSNEXUS we’re no closer to adopting btrfs, and in the intervening years we’ve put our efforts into making ZFS on Linux better; it started strong and has only gotten stronger. The good news is that btrfs has also improved a lot over the years as it reaches its critical 10th year in the field. That said, it’s still “mostly OK,” and mostly OK doesn’t cut it for production deployments unless you really stick to a tight set of core features. The challenge of writing a filesystem and a volume manager all in one, like ZFS (or VxFS+VxVM), is a large one and takes a long time to get right. In the enterprise Linux distros the response has been mixed: SUSE adopted btrfs as its default filesystem in 2015, but more recently, in August 2017, Red Hat deprecated btrfs and has no plans to support it in the future. On the Ubuntu side there’s now support for both ZFS and btrfs, so you’ve got a choice.

There was a brief product fit for btrfs with Ceph, but now with BlueStore in Ceph Luminous the need for btrfs in that use case has evaporated. The remaining best fit I see is as a mirrored boot device solution for Linux without the need for mdraid or LVM. A secondary use case may be to run it underneath Gluster to lend it additional features (compression and maybe snapshots), but that’s likely to be short lived, and with no support for that combination from Red Hat it’s not likely to ever stabilize. For that matter, don’t get me started on Gluster.

All that said, btrfs will get stronger over the next several years and it’s an important technology for Linux. So be patient and keep supporting it; it’ll get there and will provide a way forward to retire previous generations of filesystems like ext3, ext4, and XFS, and eventually LVM and mdraid. That’s a big reduction in complexity, and that’s good.

ORIGINAL ARTICLE


I should start by saying these are both fantastic filesystems with a lot in common. With the recent announcement that ZFS on Linux (ZoL) is production-ready, we’ve started to integrate ZoL into QuantaStor, so I’m writing to get some thoughts out there on these filesystems while it’s all fresh in my mind.

For those unfamiliar with ZFS, it’s a powerful filesystem developed at Sun Microsystems (now Oracle) that has been the benchmark by which filesystems are judged for many years. ZFS is still used today in many of the Oracle/Sun storage products and by the illumos project, but at a certain point in Sun’s history ZFS was made open source and, through a long chain of events, was eventually ported to Linux. (Actually ported twice: once as a user-mode/FUSE filesystem and once or twice as a kernel-mode filesystem, the latter being the effort the ZoL team is leading today.)

Oddly enough, btrfs (aka “butter fs”) was also started at Oracle under the leadership of principal author Chris Mason (now with FusionIO), is licensed under the GPL, and has many companies actively contributing to it. It’s an exciting project to monitor that’s come a long way in recent years and is one of the most actively developed native filesystems in the Linux kernel.

The Good

What sets these filesystems apart is the myriad of advanced features they bring over more traditional filesystems like XFS, ext3/4, and JFS. Btrfs and ZFS both offer compression, data checksums, and snapshots, and both have built-in disk management / RAID features for easy online filesystem expansion and fault tolerance.
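As a rough sketch of what those features look like in practice (the device names, mount point, and pool name ‘tank’ here are just placeholders):

# btrfs: mirror data and metadata across two disks, then grow the filesystem online
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
mount -o compress=lzo /dev/sdb /mnt/data
btrfs device add /dev/sdd /mnt/data
btrfs balance start /mnt/data

# ZFS: create a raidz pool with compression, then expand it with another vdev
zpool create tank raidz /dev/sde /dev/sdf /dev/sdg
zfs set compression=on tank
zpool add tank raidz /dev/sdh /dev/sdi /dev/sdj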

That’s a lot to crow about. Those features enable a whole set of use cases like VDI, high-density archive, instant recovery from snapshots, remote replication, online expansion, snapshot-based backups, and the list goes on. This is why ZFS & btrfs are such great filesystems for enterprise storage applications and why we initially integrated QuantaStor with both XFS and btrfs back in ’10.

The Bad

It’s not so much a bad thing as the reality with filesystems: they take about 10 years to develop and mature. ZFS development started in 2001 whereas btrfs development started in 2007, but for its relatively short 6 years it’s come a long way. As noted above, I first started using and testing btrfs back in ’10 when it was relatively new, and at the time (kernel 2.6.35) it worked OK but didn’t hold up to our stress testing. We ran into problems like ENOSPC (-28) errors, but in the more recent builds (v3.5 and newer) btrfs has really been solid. We have a series of tests which create millions of files of various sizes and patterns, take snapshots, and run verification passes, and we generally run them continuously for a week at a time; in the more recent Linux builds we tested we were not able to break btrfs. That’s not to say our tests are exhaustive of all corner cases, but no doubt btrfs has matured a lot in the last year or so. From some of the developer discussion threads it looks like the btrfs team may have leveraged some of the filesystem regression test tools developed for XFS, and that seems to have helped the filesystem leap ahead in terms of stability and maturity.

The Different

One of the unique things I really like about ZFS is its ability to create virtual block devices, called zvols. These are special objects in a ZFS storage pool and can be very useful for storage virtualization because you can snapshot them and present them to other systems as block devices over protocols like iSCSI, FC/FCoE, and InfiniBand when used with a SCSI target framework like SCST or LIO. As a storage appliance / SDS software developer I find this a really cool feature, though it may not be as important for general use cases.
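For anyone who hasn’t played with zvols, here’s roughly what it looks like (the pool name ‘tank’ and the sizes are just examples); the resulting block device shows up under /dev/zvol/ and can be handed to SCST, LIO, or anything else that wants a raw block device:

# create a 100GB virtual block device (zvol) in the pool 'tank'
zfs create -V 100G tank/vol1

# snapshot it, then clone the snapshot to get another read/write zvol
zfs snapshot tank/vol1@golden
zfs clone tank/vol1@golden tank/vol1-clone

# the block devices appear under /dev/zvol/<pool>/<name>
ls /dev/zvol/tank/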

To my knowledge btrfs doesn’t have a zvol block device equivalent, but since you can use files and sparse files as LUNs with SCST, that’s the technique we use today in our integration of QuantaStor with btrfs.  An interesting feature that seems to be unique to btrfs is its support for file-level snapshots. This leverages the copy-on-write (CoW) architecture of btrfs to make a space-efficient, instant copy of individual files. As an example, say you have a huge file you want to copy, like a database, LUN, or virtual machine image, but it would take a long time to copy and waste a lot of disk space which may not be available. With btrfs you can use the copy command with the extra ‘reflink’ argument, like so: ‘cp --reflink sourcefile targetfile’, and it’ll make an instant copy of the file that’s space efficient. No bothering with low-level snapshot and clone mechanisms; you just get a completely usable read/write copy of that file in an instant, nice! ZFS has snapshot features too, but the granularity is a bit more coarse, as you make snapshots of filesystems within a pool and of volumes. Snapshots are read-only, but you can make instant clones of them which are read/write.
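To make the comparison concrete, here are the two approaches side by side (the file and dataset names are made up for illustration):

# btrfs: instant, space-efficient copy of a single file via a CoW reflink
cp --reflink=always bigdatabase.db bigdatabase-copy.db

# ZFS: snapshot a whole filesystem, then clone the snapshot to get a writable copy
zfs snapshot tank/vms@monday
zfs clone tank/vms@monday tank/vms-monday-clone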

In working with and testing ZFS over the last couple of weeks, one of the features I’ve really come to like is how it presents configuration information as simple properties (key/value pairs) and how it even lets you set custom properties on your filesystems and volumes. The first time I saw this I was thinking ‘huh, what’s all this?’, but it’s really clever and makes the filesystem very extensible by applications built on top. Btrfs isn’t property-based with its metadata, at least not from what I can tell from its command line interface, but I think this would be a nice feature to borrow from ZFS that would probably pay dividends in the long run.
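A quick example of what I mean (the dataset name and the custom property are made up); user-defined properties just need a ‘namespace:’ prefix to keep them separate from the built-in ones:

# read and change built-in properties
zfs get all tank/fs | head
zfs set atime=off tank/fs

# set and read a custom property an application can use for its own bookkeeping
zfs set com.example:backup-policy=weekly tank/fs
zfs get com.example:backup-policy tank/fs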

No discussion of these advanced filesystems would be complete without mentioning deduplication. ZFS has support for inline deduplication, but as I understand it from friends you’ve got to be careful with it. Deduplication in ZFS can have some pretty bad performance impacts and demands a lot of system memory. For those who are enabling it, the advice I’ve gotten is that it’s best to have a lot of memory (there’s a rough ratio of RAM to deduplicated data) and that it works best with SSDs because it can induce a lot of random IO. For btrfs there are some interesting deduplication tools under development and there are patches for inline deduplication, but it’s not yet in the mainline kernel release. Something to look forward to, and I hope to see it land before long.
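If you do want to experiment with ZFS dedup, you can at least estimate the cost before committing: zdb can simulate the dedup table on an existing pool so you can see the projected ratio before paying the memory price (the pool and dataset names here are just examples):

# simulate deduplication on an existing pool and report the projected dedup ratio
zdb -S tank

# enable inline dedup on one dataset (only if there's enough RAM to hold the dedup table)
zfs set dedup=on tank/fs

# check the realized dedup ratio at the pool level
zpool get dedupratio tank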

Summary

I really like both of these filesystems and hope to see continued competition between them that keeps them both pushing the edge of the envelope for years to come.  I also want to send a big thank you to the ZoL project team for doing such a great job of driving the port of ZFS to Linux and packaging ZoL for such a broad set of platforms. If you’re an Ubuntu user like I am, you need only run a couple of commands to have the power of ZoL at your fingertips:

add-apt-repository ppa:zfs-native/stable
apt-get update
apt-get install ubuntu-zfs

To all the developers working on btrfs I’d like to say congrats on creating a really fantastic filesystem. We plan on supporting both btrfs and ZoL in QuantaStor, and I look forward to seeing btrfs continue to rapidly evolve and mature over the months and years ahead. With btrfs’ newer architecture, GPL licensing, and integral role in the future of Linux, I think we’ll see it play a key role in everything from mobile devices to enterprise storage in the years ahead.

Last, I’d like to give a shout-out to the folks over at Phoronix, who have set a new standard in filesystem testing and have played a key role in quantifying the performance impacts (both good and bad) of the optimizations and improvements across the broad set of Linux filesystems with each new kernel release. Keep them coming!




2 replies

  1. Thanks for your very interesting review!
    Actually I’m using zfs-fuse on Debian and I’m very happy with it. I chose ZFS because it was a mature filesystem and at the time Btrfs was way too new for my use in production. I hope to be able to try Btrfs and ZoL someday soon 🙂

    • Thanks Oliver,
      Yes, the btrfs filesystem looks to have found a good home there at Facebook. I’m looking forward to seeing where Chris Mason and his team take it this year. No doubt it will mature quickly in the mega-datacenter environment there at FB.
      Best,
      -Steve
