Checking load average on a Xen host

One of the first metrics encountered when learning to administer a Linux system is the system load average. The load average is available directly through the /proc/loadavg file and included in the output of commands such as 'w', 'uptime', and 'top'. For example:

$ uptime
 12:36:23 up 23 days, 19:58,  1 user,  load average: 0.38, 0.38, 0.43

A quick check of the manual will tell you that the three numbers are the system load averages for the past 1, 5, and 15 minutes. However, this does not explain what "system load" actually measures. As it turns out, the system load is easy to understand but often not clearly explained.
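For scripting, the same three figures can be read directly from /proc/loadavg. The following is a minimal sketch, not part of any tool discussed here: the function names are illustrative, and it assumes the usual Linux layout of that file, where the first three whitespace-separated fields are the 1, 5, and 15 minute averages.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages as a tuple of floats."""
    fields = text.split()
    # Only the first three fields are averages; the rest of the line
    # (task counts, last PID) is ignored here.
    one, five, fifteen = (float(f) for f in fields[:3])
    return one, five, fifteen


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the load averages from a Linux system."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

Calling read_loadavg() on a Linux host returns the same three numbers that uptime prints.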

The system load is the number of processes that are either running (actively executing code) or waiting to run (not actively executing, but will do so as soon as possible). On Linux, a waiting process counts toward the load in two cases:

waiting for CPU time: the process is runnable, but all the processors are busy running some other process
waiting for an I/O request to complete: for example, the process has requested some data from the hard disk and is in uninterruptible sleep until the data has been read

This second point is crucial to understanding why an overwhelmed system may show a load average of 30, 50, or even over 100: in such cases, it is rare that the problem is lack of CPU time. Instead, the high load average is typically caused by an I/O subsystem that is unable to keep up with the request rate.
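One way to tell the two cases apart on a live system is to look at process state codes, for example as printed by "ps -eo state=" on procps-based systems. There, R marks a process that is running or runnable, and D marks one in uninterruptible sleep, which usually means it is waiting on I/O. The helper below is a rough sketch with a hypothetical name; it only tallies state codes that you pass in.

```python
from collections import Counter


def load_contributors(state_codes):
    """Split load contributors into CPU-bound and I/O-bound counts.

    state_codes: iterable of ps-style state strings such as "R", "S", "D".
    """
    # Only the first character matters; ps may append modifiers like "+".
    counts = Counter(code[:1] for code in state_codes if code)
    return {
        "runnable": counts["R"],         # running or waiting for a CPU
        "uninterruptible": counts["D"],  # typically waiting for I/O
    }


print(load_contributors(["R", "S", "D", "D", "S"]))
```

A large "uninterruptible" count next to a small "runnable" count suggests an I/O bottleneck rather than a shortage of CPU time.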

System Load and Xen

Applying this metric to the privileged domain ("Dom0") of a Xen host presents a problem. The Linux system load average counts processes that are running or ready to run, but Xen guest domains are not Linux processes and so do not factor into the calculation.

On a Xen host, we can use the command "xm list" to see a list of the guest domains and their current state. For example:

$ sudo xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1536     2     r----- 11690263.8
dom000040                                  339  1536     4     r----- 192239.1
dom000048                                   15   128     4     ------ 332418.0
dom000067                                    5   512     1     r----- 10394745.1
dom000089                                  323   256     4     -b----  51310.3
dom000149                                  348  2048     4     -b----  25867.2
dom000173                                   10   128     4     -b---- 117079.9
dom000193                                   11   128     1     -b---- 128118.4
dom000514                                  123   192     4     -b---- 767836.0
dom000576                                  340   512     4     -b----  75573.7
dom000657                                  357  2048     4     -b----  94863.6
dom000701                                  185   256     4     -b---- 208035.7
dom000720                                  341  1024     4     -b---- 127005.0
dom000727                                  358  2048     2     -b----   7916.4

Here three domains are in the "running" state (r-----), including Domain-0; one guest is in the ready to run state (------); and the rest are all "blocked" (-b----), which under Xen means they are currently sleeping.
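Counting the active domains from this output can be sketched in a few lines of Python. This is an illustration only, not the xenload source: the function name is made up, and whether Domain-0 should be included, or whether VCPUs rather than domains should be counted, is an assumption left to the caller. The sample is a shortened copy of the listing above.

```python
def active_domains(xm_list_output, include_dom0=True):
    """Count domains that are running or ready to run in `xm list` output."""
    active = 0
    for line in xm_list_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Name ID Mem VCPUs State Time(s)" header.
        if len(fields) < 6 or fields[0] == "Name":
            continue
        name, state = fields[0], fields[-2]
        if name == "Domain-0" and not include_dom0:
            continue
        running = "r" in state
        ready = set(state) == {"-"}  # no flags set: runnable but not scheduled
        if running or ready:
            active += 1
    return active


SAMPLE = """\
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1536     2     r----- 11690263.8
dom000040                                  339  1536     4     r----- 192239.1
dom000048                                   15   128     4     ------ 332418.0
dom000089                                  323   256     4     -b----  51310.3
"""

print(active_domains(SAMPLE))                      # counts Domain-0, dom000040, dom000048
print(active_domains(SAMPLE, include_dom0=False))  # counts dom000040, dom000048
```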

The "xm list" command gives us all the information we need to calculate the system load for Xen: what is missing is a program that samples this information and averages it over time.
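The Linux kernel computes its load average as an exponentially damped moving average of periodic samples, and the same idea can be applied to the count of active domains. The sketch below assumes a five-second sampling interval and the usual 1, 5, and 15 minute windows; whether xenload averages in exactly this way is an assumption, and the names are illustrative.

```python
import math

SAMPLE_INTERVAL = 5.0            # seconds between samples (assumed)
PERIODS = (60.0, 300.0, 900.0)   # 1, 5 and 15 minute windows


def update_averages(averages, active, interval=SAMPLE_INTERVAL):
    """Fold one sample of the active-domain count into the running averages."""
    updated = []
    for avg, period in zip(averages, PERIODS):
        # Older samples fade by this factor every interval.
        decay = math.exp(-interval / period)
        updated.append(avg * decay + active * (1.0 - decay))
    return tuple(updated)


averages = (0.0, 0.0, 0.0)
for _ in range(720):             # one hour of samples with 4 active domains
    averages = update_averages(averages, 4)
print("xen load average: %.2f, %.2f, %.2f" % averages)
```

With a constant input the three averages converge toward that input, the 1 minute figure fastest and the 15 minute figure slowest, which is the behaviour familiar from the Linux load average.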

Introducing xenload

xenload (available from our GitHub account) is a short Python script that runs as a daemon, calculating the Xen system load by parsing the "xm list" output every five seconds and storing the result.

To install xenload, download it to /usr/local/sbin and start the daemon:

wget -q https://raw.github.com/MammothMedia/XenExtras/master/xenload \
-O /usr/local/sbin/xenload
chmod +x /usr/local/sbin/xenload
/usr/local/sbin/xenload --daemon

Wait a while for it to gather some data, and then check the Xen system load:

$ /usr/local/sbin/xenload
xen load average: 4.80, 4.67, 4.81

Or, if you need the data in a format that is easier to parse for software like Cacti:

$ /usr/local/sbin/xenload --cacti
load_1min:4.40 load_5min:5.00 load_15min:4.80
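The name:value pairs in this output are straightforward to consume from other scripts as well. As a small sketch (the function name is illustrative, and the format is assumed to be exactly as shown above):

```python
def parse_cacti(line):
    """Turn 'name:value name:value ...' into a dict of floats."""
    result = {}
    for pair in line.split():
        name, _, value = pair.partition(":")
        result[name] = float(value)
    return result


sample = "load_1min:4.40 load_5min:5.00 load_15min:4.80"
print(parse_cacti(sample)["load_5min"])
```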

(To make sure xenload is always available, place "/usr/local/sbin/xenload --daemon" in /etc/rc.local or an init script so that the daemon starts at system boot.)

This is a simple addition to your system administrator toolbox to provide an "at a glance" metric for system health, and is easy to graph due to its similarity to the Linux system load average.