In our 2016.5 release we implemented an important performance enhancement for SNMP monitoring. Let’s look at the details.
SNMP is the workhorse protocol of system monitoring, especially when it comes to your networking gear. That includes your switches and routers, but it can also extend to UPSs, PDUs, printers, and other kinds of network devices that support the protocol. Since SNMP is used for so many different kinds of monitoring, it’s essential to get it right, and here at FrameFlow we’ve put a lot of effort into exactly that.
A frequently used operation in SNMP is the "walk." For example, when our SNMP Bandwidth event monitor wants to find a list of all interfaces on a device, it does a walk on the item called "IF-MIB::ifDescr", a standard object implemented by almost all devices that support SNMP.
The walk gives us back a list of the descriptions for each interface. If you pointed a command-line SNMP tool at a Windows box, the result would look something like this:
SNMP Walk Results for a Windows System
This tells us there are 20 interfaces, some physical and a bunch of virtual ones. We follow up by walking several other SNMP items to get values for incoming and outgoing bandwidth, then we sew it all together to give you the result that you see from the event monitor.
So how does the walk work? SNMP offers two command options: GetNext and GetBulk.
With GetNext, the monitoring system has to keep asking for each item one by one until it reaches the end of the list. So in the example above, where we want to get a list of all interface names, that means 20 requests go out to the device and 20 replies are sent back.
SNMP GetNext Requests
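To make the request counting concrete, here is a minimal Python sketch of a GetNext walk. It assumes a made-up in-memory table standing in for a real device, and the names `MIB`, `get_next`, and `walk_with_get_next` are hypothetical, not part of any SNMP library or of FrameFlow. Note that a walk generally needs one extra request beyond the 20 rows, because the client only learns it has reached the end when a reply falls outside the subtree.

```python
from bisect import bisect_right

# OIDs as tuples of integers so that Python's tuple ordering matches
# SNMP's lexicographic OID ordering.
IF_DESCR = (1, 3, 6, 1, 2, 1, 2, 2, 1, 2)  # IF-MIB::ifDescr
IF_TYPE = (1, 3, 6, 1, 2, 1, 2, 2, 1, 3)   # IF-MIB::ifType (next column)

# Hypothetical agent data: 20 interface descriptions, plus one ifType row
# so that there is something after the ifDescr subtree.
MIB = {IF_DESCR + (i,): f"Interface {i}" for i in range(1, 21)}
MIB[IF_TYPE + (1,)] = 6
SORTED_OIDS = sorted(MIB)


def get_next(oid):
    """Simulate one GetNext: return the (oid, value) pair that follows `oid`."""
    pos = bisect_right(SORTED_OIDS, oid)
    if pos == len(SORTED_OIDS):
        return None  # nothing left in the agent's table
    nxt = SORTED_OIDS[pos]
    return nxt, MIB[nxt]


def walk_with_get_next(root):
    """Walk the subtree under `root`, issuing one request per reply."""
    rows, requests, current = [], 0, root
    while True:
        requests += 1
        reply = get_next(current)
        if reply is None or reply[0][:len(root)] != root:
            break  # the reply left the subtree, so the walk is complete
        rows.append(reply)
        current = reply[0]
    return rows, requests


rows, requests = walk_with_get_next(IF_DESCR)
print(len(rows), "rows retrieved with", requests, "requests")
```

Each pass through the loop corresponds to one full round trip on a real network, which is why the cost of a GetNext walk grows with the number of rows.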
With GetBulk, the monitoring system sends one request asking for an item and all following items up to a limit. In some cases the complete data set can be returned with just one request/reply pair, and even when it can’t, the number of requests is greatly reduced.
SNMP GetBulk Requests
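The same idea can be sketched for GetBulk. This is a simplified simulation under stated assumptions: the table is invented, the names `get_bulk` and `walk_with_get_bulk` are hypothetical, and only the max-repetitions limit is modelled (the non-repeaters field and message size limits of the real protocol are ignored).

```python
from bisect import bisect_right

IF_DESCR = (1, 3, 6, 1, 2, 1, 2, 2, 1, 2)  # IF-MIB::ifDescr
IF_TYPE = (1, 3, 6, 1, 2, 1, 2, 2, 1, 3)   # IF-MIB::ifType (next column)

# Hypothetical agent data: 20 interface descriptions plus one following row.
MIB = {IF_DESCR + (i,): f"Interface {i}" for i in range(1, 21)}
MIB[IF_TYPE + (1,)] = 6
SORTED_OIDS = sorted(MIB)


def get_bulk(oid, max_repetitions):
    """Simulate one GetBulk: up to `max_repetitions` rows following `oid`."""
    pos = bisect_right(SORTED_OIDS, oid)
    return [(o, MIB[o]) for o in SORTED_OIDS[pos:pos + max_repetitions]]


def walk_with_get_bulk(root, max_repetitions):
    """Walk the subtree under `root`, fetching rows in batches."""
    rows, requests, current = [], 0, root
    while True:
        requests += 1
        reply = get_bulk(current, max_repetitions)
        for oid, value in reply:
            if oid[:len(root)] != root:
                return rows, requests  # a row outside the subtree ends the walk
            rows.append((oid, value))
        if not reply:
            return rows, requests  # the agent has nothing more to return
        current = reply[-1][0]  # continue from the last row received


rows_25, requests_25 = walk_with_get_bulk(IF_DESCR, 25)
rows_10, requests_10 = walk_with_get_bulk(IF_DESCR, 10)
print("limit 25:", len(rows_25), "rows in", requests_25, "request(s)")
print("limit 10:", len(rows_10), "rows in", requests_10, "request(s)")
```

With a limit of 25 the whole table comes back in a single request. With a limit of 10 it takes three: two full batches, then one more that reveals the end of the subtree.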
This provides some savings in terms of bandwidth, but more importantly it greatly reduces the effect of latency. To illustrate this, imagine a high-latency situation, maybe over a WAN or VPN, where the round trip time is 50ms.
The twenty requests required with GetNext add up to a full second (1000ms) before all of the data has been retrieved.
With GetBulk it’s just a single round trip of 50ms. In theory, that means you can monitor 20 times more systems in the same timeframe. In practice there’s a bit of additional overhead, but not much. In our testing on high-latency networks we saw an 18.5-fold improvement.
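The arithmetic behind that comparison can be written out in a few lines of Python. This sketch only models round trip latency, using the 50ms and 20-interface figures from the example; the function name `walk_time_ms` is made up for illustration, and per-request processing overhead is deliberately left out.

```python
def walk_time_ms(round_trips, rtt_ms):
    """Total time spent waiting on the network for a given number of round trips."""
    return round_trips * rtt_ms


RTT_MS = 50      # round trip time from the example
INTERFACES = 20  # rows returned by the walk

get_next_ms = walk_time_ms(INTERFACES, RTT_MS)  # one round trip per row
get_bulk_ms = walk_time_ms(1, RTT_MS)           # all rows in one round trip
speedup = get_next_ms / get_bulk_ms

print(f"GetNext: {get_next_ms} ms, GetBulk: {get_bulk_ms} ms, ratio: {speedup:.0f}x")
```

The ratio of 20 is the theoretical ceiling for this example. Real overhead is what brings a measured figure below it.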
Not everyone knows about GetBulk. Part of the reason is that it was not available in SNMPv1, and many devices enable only that version by default. It is supported in SNMPv2c and SNMPv3, though, so if your gear supports these newer versions, enable them so you can take advantage of the performance benefits. After doing so, the last step is to go to your SNMP event monitors in FrameFlow and set them to use SNMPv2c.