One of the first metrics encountered when learning to administer
a Linux system is the system load average. The load average is
available directly through the /proc/loadavg file and included in
the output of commands such as 'w', 'uptime', and 'top'. For
example:
$ uptime
12:36:23 up 23 days, 19:58, 1 user, load average: 0.38, 0.38, 0.43
A quick check of the manual will tell you that the three numbers
are the system load averages for the past 1, 5, and 15 minutes.
However, this does not explain what "system load" actually
measures. As it turns out, the system load is easy to understand
but often not clearly explained.
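The same three numbers can be read programmatically. The following is a minimal sketch, assuming a Linux host where /proc/loadavg is readable; the function names are illustrative only. (Python's standard library also offers os.getloadavg(), which returns the same three values.)

```python
from pathlib import Path


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    fields = text.split()
    # Only the first three fields are load averages; the remaining fields
    # (runnable/total scheduling entities and the last PID) are ignored.
    return tuple(float(value) for value in fields[:3])


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the load averages from the given file."""
    return parse_loadavg(Path(path).read_text())
```

Calling read_loadavg() on the machine from the uptime example above would return (0.38, 0.38, 0.43).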
The system load is the number of processes that are in one of
two states: running (actively executing code) or waiting to run
(not actively executing, but will do so as soon as possible).
Processes can be waiting to run for a number of reasons, but the
two most common are:
- waiting for CPU time: all the processors are busy running some
other process
- waiting for an I/O request to complete: for example, the process
has requested some data from the hard disk, and is waiting for the
data to be read (Linux counts these processes while they are in
uninterruptible sleep, shown as state "D" by ps and top)
This second point is crucial to understanding why an overwhelmed
system may show a load average of 30, 50, or even over 100: in such
cases, it is rare that the problem is lack of CPU time. Instead,
the high load average is typically caused by an I/O subsystem that
is unable to keep up with the request rate.
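To make the definition concrete, here is a rough sketch that takes one instantaneous sample of the load by counting processes in state "R" (running or waiting for CPU) or "D" (uninterruptible sleep, usually I/O). It is an approximation under stated assumptions: the kernel counts individual tasks, including threads, and averages many samples over time, whereas this sketch looks only at one snapshot of the top-level entries in /proc. All function names are illustrative.

```python
import os


def state_from_stat(stat_text):
    """Extract the one-letter process state from /proc/<pid>/stat text."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last ")" before reading fields.
    return stat_text.rsplit(")", 1)[1].split()[0]


def instantaneous_load(states):
    """Count states that are runnable (R) or in uninterruptible sleep (D)."""
    return sum(1 for state in states if state in ("R", "D"))


def current_states(proc="/proc"):
    """Yield the state of every process currently visible under /proc."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as handle:
                yield state_from_stat(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
```

On a Linux host, instantaneous_load(current_states()) gives a single sample; a large number of "D" entries relative to "R" entries points at I/O rather than CPU as the bottleneck.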
System Load and Xen
Applying this metric to a privileged domain ("Dom0") in Xen
presents a problem. The Linux system load average counts processes
that are running or ready to run, but Xen guest domains are not
Linux processes and so do not factor into the calculation.
On a Xen host, we can use the command "xm list" to see a list of
the guest domains and their current state. For example:
$ sudo xm list
Name        ID   Mem  VCPUs  State      Time(s)
Domain-0     0  1536      2  r-----  11690263.8
dom000040  339  1536      4  r-----    192239.1
dom000048   15   128      4  ------    332418.0
dom000067    5   512      1  r-----  10394745.1
dom000089  323   256      4  -b----     51310.3
dom000149  348  2048      4  -b----     25867.2
dom000173   10   128      4  -b----    117079.9
dom000193   11   128      1  -b----    128118.4
dom000514  123   192      4  -b----    767836.0
dom000576  340   512      4  -b----     75573.7
dom000657  357  2048      4  -b----     94863.6
dom000701  185   256      4  -b----    208035.7
dom000720  341  1024      4  -b----    127005.0
dom000727  358  2048      2  -b----      7916.4
Here we have three domains in the "running" state (r-----):
Domain-0 and two guests. One guest is in the ready to run state
(------), and the rest are all "blocked" (-b----), which under Xen
means they are currently sleeping.
The "xm list" command gives us all the information we need to
calculate the system load for Xen; what is needed is a program to
sample and average this information.
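A single sample of the Xen load can be taken by counting the domains in the listing above that are running or ready to run. The following is a sketch only, not the xenload source: the function name and the include_dom0 parameter are hypothetical, and whether Domain-0 should be counted is an assumption left to the caller.

```python
def count_active_domains(xm_list_output, include_dom0=True):
    """Count domains that are running or ready to run in `xm list` output."""
    active = 0
    lines = xm_list_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete domain row
        name, state = fields[0], fields[-2]  # State is the second-to-last column
        if name == "Domain-0" and not include_dom0:
            continue
        # Assumption: "r-----" means running and "------" means ready to run;
        # any other flag (blocked, paused, shut down, crashed, dying) is idle.
        if state in ("r-----", "------"):
            active += 1
    return active
```

Applied to the listing above, this returns 4 (three running domains plus one ready to run), or 3 when called with include_dom0=False.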
Introducing xenload
xenload (download it from our GitHub account) is a short Python
script that runs as a daemon, calculating the Xen system load by
parsing the "xm list" output every five seconds and storing the
result.
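One way to turn five-second samples into 1, 5, and 15 minute averages is an exponentially damped moving average, which is the approach the Linux kernel takes for its own load average. The sketch below illustrates that technique; it is an assumption about how such a daemon could work, not a description of how xenload is actually implemented, and the class and constant names are made up for the example.

```python
import math

SAMPLE_INTERVAL = 5.0           # seconds between samples
WINDOWS = (60.0, 300.0, 900.0)  # 1, 5 and 15 minute windows, in seconds


class LoadAverager:
    """Maintain exponentially damped averages of a periodically sampled value."""

    def __init__(self, interval=SAMPLE_INTERVAL, windows=WINDOWS):
        # Each window gets a decay factor: the weight kept by the old average.
        self.decay = [math.exp(-interval / window) for window in windows]
        self.averages = [0.0] * len(windows)

    def update(self, sample):
        """Fold one new sample into every average and return them as a tuple."""
        self.averages = [
            average * decay + sample * (1.0 - decay)
            for average, decay in zip(self.averages, self.decay)
        ]
        return tuple(self.averages)
```

A daemon would call update() once every five seconds with the number of running or ready domains. The 1 minute average reacts quickly to a change in load, while the 15 minute average moves slowly, which is the same behaviour seen in the output of uptime.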
To install xenload, download it to /usr/local/sbin, make it
executable, and start the daemon:
wget -q https://raw.github.com/MammothMedia/XenExtras/master/xenload \
-O /usr/local/sbin/xenload
chmod +x /usr/local/sbin/xenload
/usr/local/sbin/xenload --daemon
Wait a while for it to gather some data, and then check the Xen
system load:
$ /usr/local/sbin/xenload
xen load average: 4.80, 4.67, 4.81
Or, if you need the data in a format that is easier to parse for
software like Cacti:
$ /usr/local/sbin/xenload --cacti
load_1min:4.40 load_5min:5.00 load_15min:4.80
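The name:value output is also simple to consume from your own monitoring scripts. Here is a small sketch, assuming the output always has the space-separated name:value shape shown above; the function name is illustrative.

```python
def parse_cacti_line(line):
    """Turn 'name:value name:value ...' into a dict mapping names to floats."""
    result = {}
    for pair in line.split():
        # partition splits on the first ":" only, leaving the value intact.
        name, _, value = pair.partition(":")
        result[name] = float(value)
    return result
```

For the sample output above, parse_cacti_line returns a dictionary with the keys load_1min, load_5min, and load_15min.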
(To make sure xenload is always available, add
"/usr/local/sbin/xenload --daemon" to /etc/rc.local or an init
script so that the daemon starts at system boot.)
xenload is a simple addition to the system administrator's
toolbox: it provides an "at a glance" metric for the health of a
Xen host, and is easy to graph because it closely resembles the
Linux system load average.