The Problem With Jumbo Frames

Data moves across your network and across the internet in chunks called "frames" at the Ethernet layer and "packets" at the IP layer, each carrying up to 1500 bytes of payload by default.  You'll see this set in your network interface settings as the MTU, or "maximum transmission unit", and it sets the maximum packet size your network interface card will send or receive.
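If you're curious what your interfaces are set to right now, on Linux you can check with the iproute2 tools (eth0 here is just an example interface name; substitute your own):

# Show link details, including the current MTU, for one interface
ip link show eth0

# Or list the name and MTU of every interface
ip -o link | awk '{print $2, $5}'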

When larger files or blocks of data are being transmitted, they're chunked up into packets no larger than the MTU before being sent, a process called "segmentation".  With TCP/IP there's a constant back and forth between the communicating devices: the connection starts with a handshake, and the receiver acknowledges the data as the packets arrive.
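To put some numbers on the overhead: with a standard 1500 byte MTU, each TCP segment carries at most 1460 bytes of actual data (1500 minus a 20 byte IP header and a 20 byte TCP header), so transferring a 1 MB file means slicing it into roughly 700 packets, each with its own headers and acknowledgment traffic.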

Overall, the 1500 byte packet size works well for most things, like viewing the web page of this blog article.  But if you're making a large investment in storage systems and compute for your IT environment, and there was a magic wand you could wave to make everything about 20% faster, you'd do it, right?  This is the allure of enabling Jumbo Frames, which is simply an increase of the default packet size from 1500 to (typically) 9000.

This 6x increase in packet size (1500 x 6 = 9000) means your network card will spend less time sending and acknowledging packets, since your data is transferred in larger chunks and those steps happen less often.  It's sort of like swapping a small garden shovel for a snow shovel: each scoop takes longer to fill and send (and be acknowledged with an ACK), but you move more with each scoop, and that efficiency gain translates into more performance.

There have been so many optimizations made over the years with TCP Offload Engines (TOE) that enabling Jumbo Frames doesn't give as much of a boost as it did in the past.  That said, enabling Jumbo Frames for storage workloads like iSCSI (and many others) will typically yield upwards of a 20% performance gain (not bad!), well worth the time for many IT administrators to enable it.

So why the gloom with "The Problem With Jumbo Frames"?   Well, they can be a major headache if you turn them on but the other systems (clients, hosts, and the switches in between) are not properly configured to allow the larger MTU size.  When it's not set up right, you'll see weird network behavior and connection problems, and you'll pull your hair out.

Ok, ok, but the 20% gain, the 20% gain, you say!  I get it, yes, I'd enable it too, I know you're gonna enable it, damn the torpedoes.  So let's get to it: here are some tips I've learned over the years on enabling it successfully, and a bit about the tools we've added at OSNexus to our QuantaStor platform to make it easier to enable, test, and leverage Jumbo Frames.

Don’t Enable Jumbo Frames At First

Make sure your storage cluster is working properly first, and work incrementally.  If you enable Jumbo Frames before things are working well and solidly, you could have multiple network and/or other problems in play all at once, and that leads to a lot of wasted time and effort.  (If you're triaging odd network problems, disable Jumbo Frames to see if that fixes it; it is a very common source of network problems.)

Establish a Baseline of Performance

Measure how much performance you're getting with whatever protocols you're using (NFS, SMB, iSCSI, NVMe-oF) at the standard 1500 MTU.  Make sure there are no other problems in your network like VLAN issues, packet errors, packet overruns, dropped packets, or routing errors.  By establishing a baseline of performance you have a point of comparison, and you'll have ironed out other network configuration issues to set the stage for proper testing.  Record your benchmark numbers and the settings used to achieve them; this is super important, as you'll need them later to run an apples-to-apples performance test.

Keep in mind that enabling Jumbo Frames is going to give you about a 20% boost, so if you're seeing a major performance problem, you probably have a network or other configuration issue elsewhere.  Watch the RX/TX counters on your network ports as you run a performance or large file copy test and make sure the data is flowing through the right interfaces.  If you have static routes, these can be suspect; check your assumptions.
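As a sketch of what establishing a baseline might look like, here's a raw network throughput test with iperf3 alongside a check of the interface counters (iperf3 and the eth0 interface name are assumptions on my part; use whatever benchmark matches your actual workload, such as fio against a mounted share):

# On the storage system: start an iperf3 server
iperf3 -s

# On the client: run a 30 second throughput test against it
iperf3 -c <SERVER-IPADDRESS> -t 30

# On either side: look for errors, drops, and overruns on the data port
ip -s link show eth0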

Enabling Jumbo Frames

Jumbo Frames (i.e., setting the MTU to 9000) must be enabled consistently across your network infrastructure, including client NICs, server NICs, switches, and any routers in between.  If you don't consistently enable it end-to-end, you'll have network problems, it won't work, and you won't get the gains.

Also, if you don’t have buy-in from your network administrator to enable Jumbo Frames at the switch layer, then full stop, make optimizations in other areas.

Assuming you can enable it everywhere, turn it on in your storage systems / storage cluster on the data ports, on your network switches, and on the clients that are using the storage.  Again, if you miss one of these it won't work, and it'll just introduce network problems that are hard to track down, which leads us to testing.
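On a plain Linux host, setting the MTU on a data port looks like the sketch below (eth1 is a hypothetical interface name, and this change is immediate but not persistent; use netplan, NetworkManager, or your distro's network configuration to make it survive a reboot, and in QuantaStor you'd set it through the web UI instead):

# Set the MTU to 9000 on the data port (takes effect immediately, not persistent)
ip link set dev eth1 mtu 9000

# Confirm the change took
ip link show eth1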

Testing Jumbo Frames

Ping is one of the easiest and most readily available tools you can use to make sure you're able to transmit Jumbo Frames from your server (or storage system) to your clients (hosts).  But first, use ping without adjusting the size of the frame to make sure the host you're trying to reach is reachable at all:

ping <DEST-IPADDRESS>

Now we're ready to do a jumbo frame test.  You'll note in this next ping command that we're not using exactly 9000 bytes, and that's because we need to subtract the IP header and the ICMP header (28 bytes total) that will be added on to this 8972 byte block of data.  So yes, we are testing a proper 9000 MTU (8972 data bytes + 20 byte IP header + 8 byte ICMP header), but we need to subtract those 28 bytes in the ping command, or else we'll be sending 9028 bytes and it'll get blocked by a switch set to 9000.

ping -c 4 -w 1 -A -M do -s 8972 <DEST-IPADDRESS>
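The -M do flag is the key part of that command: it sets the "don't fragment" bit, so if any hop in the path can't pass a full 9000 byte packet, the ping fails loudly instead of silently fragmenting.  If some of your clients are Windows hosts, the equivalent test there uses -f (don't fragment) and -l (payload size):

ping -f -l 8972 <DEST-IPADDRESS>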

Once you've verified that the above jumbo frame ping test works, log in to the host you just pinged and run the same ping test in the reverse direction, from the client back to the server.  If it's working in both directions, great!  "Rinse and repeat" this for each of your clients and servers to make sure everything is set up correctly; it'll save you a lot of time and grief.
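If you have a lot of hosts to cover, a small shell loop saves some typing.  Here's a minimal sketch (the IP addresses are placeholders; substitute your actual client and server addresses):

# Jumbo frame ping sweep across a list of hosts
for host in 10.0.0.11 10.0.0.12 10.0.0.13; do
  if ping -c 2 -M do -s 8972 "$host" > /dev/null 2>&1; then
    echo "$host: jumbo frames OK"
  else
    echo "$host: FAILED (check the MTU on the host and the path to it)"
  fi
done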

Retesting Performance with Jumbo Frames Enabled

Remember all those tests you ran to establish a baseline of performance?  It's time to re-run them now to see what gains you've gotten from enabling Jumbo Frames.  If you see anywhere from a 10% to 20% gain, fantastic, enabling it was worth the effort.  If you got more than a 20% gain, even better.  If you got less than 10%, you might consider disabling Jumbo Frames, as the gain is probably not significant enough for your workload, and it could lead to network problems in the future as new servers, clients, and switches are added that may not be properly configured for Jumbo Frames.
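To put numbers on it: if your baseline sequential read test measured 1.00 GB/s and the same test now measures 1.18 GB/s, that's an 18% gain ((1.18 - 1.00) / 1.00 = 0.18), squarely in the range where keeping Jumbo Frames enabled makes sense.  (Those throughput figures are illustrative, not measurements from a real system.)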

Testing Jumbo Frames with QuantaStor Systems

Enabling Jumbo Frames is such a common practice with QuantaStor deployments, and misconfigured MTU is such a common network problem, that we ended up building a feature into QuantaStor to quickly detect MTU and other network configuration issues.  To use it, simply navigate to:

Storage Management tab -> Storage Systems section -> right-click on a System -> Network Connectivity/Ping Checker..

When the dialog opens, it will automatically start running a test with the selected node as the source, pinging all the other QuantaStor systems in the grid.  It will also identify any clients that are connected to the selected node and automatically ping them as well.  The ping testing uses the MTU setting of the interface from which the ping originates, so if the source interface is set to MTU 9000 and the destination is MTU 1500, you'll see a warning and a description of the issue.  If all the ping checks come back "Passed", you're good; next, change the Storage System selection to run the test again from each of the other nodes in the grid.  If everything passes, you're generally good to go.

Note that QuantaStor pings just the clients it can detect, which are those connected at the time.  If you have other clients to include in the ping test, you can manually enter their IP addresses in the "Additional Client IPs" section; for example, all the IPs of your VMware, Hyper-V, Kubernetes, and other compute servers that will be accessing the storage.

Why MTU 9000

Some switch vendors support a higher MTU like 9014, and some go all the way up to around 16000, but in general most hardware NICs and switches support up to 9000, so it's a good choice when enabling Jumbo Frames.  If you're seeing odd performance issues, it can also be useful to try an MTU of 4500 as a triage tool, and in some cases you may get better results with it.
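The same header math from the ping test above applies if you drop to MTU 4500: subtract the 28 bytes of IP and ICMP headers and test with a 4472 byte payload:

ping -c 4 -M do -s 4472 <DEST-IPADDRESS>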

Summary

I hope you enjoyed this article and that it ultimately saves you some time.  Jumbo Frames can provide a nice boost in performance without having to purchase more hardware, but they can cost you a lot of time and grief if you don't enable the MTU settings properly across your network environment.  If you're interested in trying out QuantaStor, we have full-featured Trial and Community Edition licenses available from our web site.  Questions, suggestions?  Please post below or email us at info@osnexus.com.


