Shinken Monitoring

Shinken is an open source monitoring application written in Python that is designed as a drop-in replacement for Nagios, compatible with its configuration and plugins. Ringfree has an instance running on a DigitalOcean virtual private server that monitors the nodes, the Sansay SBCs, and the VoIPmonitor server. The nodes and voipmon are monitored primarily through the “Nagios Remote Plugin Executor” (NRPE), a service that is installed and running on each server.

NRPE is used over other monitoring methods (such as SNMP) because it allows more granular control over what is monitored and how. A Nagios plugin can be written in practically any language to monitor practically anything, which lets us write one-off plugins and install them on any given server for any given purpose.
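
As an illustration of how little a plugin needs, here is a minimal sketch in Python. It is not one of Ringfree's plugins. It only follows the standard Nagios plugin convention of printing one status line and returning exit code 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The load-average check and the thresholds 4.0 and 8.0 are assumptions chosen for the example.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(load, warn, crit):
    """Map a measured load average to an (exit_code, status_line) pair."""
    if load >= crit:
        return CRITICAL, f"CRITICAL - load average is {load:.2f}"
    if load >= warn:
        return WARNING, f"WARNING - load average is {load:.2f}"
    return OK, f"OK - load average is {load:.2f}"


def main(warn=4.0, crit=8.0):
    """Run the check, print the status line, and return the exit code.

    The default thresholds are illustrative assumptions, not Ringfree values.
    """
    try:
        load1 = os.getloadavg()[0]  # 1-minute load average
    except OSError:
        print("UNKNOWN - could not read load average")
        return UNKNOWN
    code, message = evaluate(load1, warn, crit)
    # Text after the pipe is optional performance data.
    print(f"{message} | load1={load1:.2f}")
    return code
```

To use a script like this as a real plugin, it would end by passing the return value of main() to sys.exit(), so that NRPE sees the exit code.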

The Sansay SBCs are monitored primarily using a Sansay-provided Nagios plugin that makes use of the SOAP API client. An additional plugin, written in-house at Ringfree, monitors the SBC state (active, standby, etc.) and also makes use of the SOAP API client.


Ringfree’s Shinken instance can be accessed at http://shinken.ringfree.biz:7767. If you need access credentials or a password reset, please consult with John or Kendall. The server can be accessed via SSH at either shinken.ringfree.biz or the IP address 159.203.76.166.

On the various servers being monitored, NRPE runs as a service but is NOT configured to start automatically on boot. This is deliberate: after a reboot NRPE stays down, so its checks fail and it is easy to spot that the server was rebooted.

In the event of a warning or error, Shinken communicates with PagerDuty by sending an email. This email triggers whichever escalation policy is in place within PagerDuty to alert whoever is on call.


Shinken Configuration

Configuration for the monitoring of Ringfree’s infrastructure can be found on the server in the /etc/shinken/objects directory. Specific configuration for each box being monitored can be found within the hosts subdirectory. If you need to monitor a new box, begin by creating a definition here using an existing file as a template.

Most of the specifics associated with what is being monitored can be found in the commands and services subdirectories, the former containing the individual commands (most of which run check_nrpe), the latter containing definitions for which hosts/templates should use which commands.

The Shinken server contains an installation of Monitoring Plugins which contains the check_nrpe command along with many additional Nagios plugins.
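
To make the flow concrete, the following Python sketch shows roughly what happens when a check_nrpe command runs: the Shinken side invokes check_nrpe with the target host (-H) and the command alias (-c), then reads the plugin's exit code and output. The check_nrpe path below is an assumption, as is the helper itself; Shinken does this through its own command definitions, not through this code.

```python
import subprocess

# Assumed install location of check_nrpe; adjust to the actual path on the server.
CHECK_NRPE = "/usr/lib64/nagios/plugins/check_nrpe"


def build_check_nrpe_argv(host, alias, check_nrpe=CHECK_NRPE):
    """Build the argument list for one NRPE check against one host."""
    return [check_nrpe, "-H", host, "-c", alias]


def run_check(host, alias, timeout=15):
    """Run the check and return (exit_code, first line of plugin output)."""
    result = subprocess.run(
        build_check_nrpe_argv(host, alias),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    lines = result.stdout.strip().splitlines()
    return result.returncode, lines[0] if lines else ""
```

For example, run_check("node001", "check_openvz") would ask the NRPE service on node001 to run whatever its nrpe.cfg maps to check_openvz.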

There is nothing especially notable about the configuration as it’s all very elementary Nagios. With any prior Nagios/Shinken experience, you should have no problem navigating everything.


Server Configuration

In addition to NRPE, each of the servers also contains an installation of Monitoring Plugins. In the case of the nodes, Monitoring Plugins has been packaged and is available in the Ringfree repository. In other cases (such as voipmon) it has been downloaded and compiled from source.

The NRPE service runs the various commands as the user nrpe. The specific commands, along with their arguments, are defined in the /etc/nagios/nrpe.cfg file. Each command is given an alias, which is the name Shinken uses to call that command on the server in question. An example of a command and alias definition is:

command[check_openvz]=/usr/lib64/nagios/plugins/check_openvz

In this definition, command marks the line as a command definition, [check_openvz] is the alias by which the Shinken instance requests the command, and /usr/lib64/nagios/plugins/check_openvz is the program on the server that NRPE runs when Shinken sends a check_openvz request.
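
The alias-to-command mapping can be sketched as a small Python parser, which may be handy when auditing what a given server exposes. This is only an illustration of the line format described above; the function name and the sample text are assumptions, and NRPE's own parser is not reproduced here.

```python
import re

# Matches lines of the form: command[alias]=/path/to/plugin optional-args
# Commented-out lines (starting with '#') do not match and are skipped.
COMMAND_RE = re.compile(r"^\s*command\[(?P<alias>[^\]]+)\]\s*=\s*(?P<cmdline>.+?)\s*$")


def parse_nrpe_commands(text):
    """Return {alias: command line} for each command definition in nrpe.cfg text."""
    commands = {}
    for line in text.splitlines():
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group("alias")] = match.group("cmdline")
    return commands


# Illustrative excerpt, not a full nrpe.cfg.
sample = """
# command[check_disabled]=/usr/lib64/nagios/plugins/check_disabled
command[check_openvz]=/usr/lib64/nagios/plugins/check_openvz
"""
commands = parse_nrpe_commands(sample)
print(commands)
```

On a real server the text would come from reading /etc/nagios/nrpe.cfg instead of the sample string.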

Some commands require root privileges in order to run correctly. To allow this without exposing a gaping security hole, some configuration has been added to the /etc/sudoers file on the affected servers, and the relevant commands in nrpe.cfg are prefixed with /usr/bin/sudo. For example, on node001 the following can be found within the nrpe.cfg file:

command[check_haproxy]=/usr/bin/sudo /usr/lib64/nagios/plugins/check_haproxy

In /etc/sudoers the following has been added in order to securely allow nrpe to execute the command:

Defaults: nrpe !requiretty
nrpe ALL=(root) NOPASSWD: /usr/lib64/nagios/plugins/check_haproxy ""

The first line indicates that nrpe does not require an active terminal session in order to execute commands through sudo. The second line explicitly allows nrpe to run the check_haproxy command (and no other command) with root privileges, and only when no arguments are passed; the empty string "" at the end of the rule is what forbids arguments. Limiting access on a “per command” basis and limiting arguments are both necessary to ensure proper security.
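
When adding a new sudo-prefixed command, the matching sudoers rule follows mechanically from the nrpe.cfg line. The Python sketch below shows that relationship. The helper name is hypothetical, and it assumes the same conventions as the example above (user nrpe, /usr/bin/sudo prefix, no arguments). Any change to /etc/sudoers should still be made by hand with visudo.

```python
import shlex

SUDO = "/usr/bin/sudo"


def sudoers_rule_for(cmdline, user="nrpe"):
    """Return the no-argument sudoers rule for a sudo-prefixed NRPE command line.

    Returns None if the command does not go through sudo. Raises ValueError
    if the command passes arguments, since a rule ending in "" would not
    permit them.
    """
    parts = shlex.split(cmdline)
    if len(parts) < 2 or parts[0] != SUDO:
        return None
    if len(parts) > 2:
        raise ValueError("command passes arguments; a no-argument rule would not match")
    return f'{user} ALL=(root) NOPASSWD: {parts[1]} ""'


print(sudoers_rule_for("/usr/bin/sudo /usr/lib64/nagios/plugins/check_haproxy"))
```

Rejecting command lines that carry arguments is a deliberate choice here: it keeps the sketch aligned with the policy of allowing one exact command with no arguments.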


Custom Plugins

Several custom plugins have been developed in-house at Ringfree to monitor specifics for which no preexisting plugin was available. These include check_haproxy, check_openvz, and check_voipmon. Most are written in Python, with straightforward syntax and obvious goals.

The most notable of the custom plugins is check_sansay_ha, which is located on the Shinken server in the /var/lib/shinken/libexec directory. This plugin communicates with the Sansay SBCs and pulls down “HA State” information. Any change of state will trigger an alert.
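
The general idea of alerting on a state change can be sketched in Python as follows. This is not the actual check_sansay_ha code. The SOAP call is omitted entirely, and the sketch assumes the current state has already been fetched as a string and that the expected state is supplied by the caller. The function name and the message wording are hypothetical.

```python
# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_ha_state(expected, current):
    """Compare the HA state an SBC reports with the state it is expected to be in.

    `current` is assumed to come from the SBC (None if nothing was returned);
    `expected` is assumed to be configured per SBC.
    """
    if current is None:
        return UNKNOWN, "UNKNOWN - no HA state returned"
    if current != expected:
        return CRITICAL, f"CRITICAL - HA state changed from {expected} to {current}"
    return OK, f"OK - HA state is {current}"


code, message = check_ha_state("active", "standby")
print(message)
```

A real plugin would print the message and exit with the returned code so that Shinken can raise the alert.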