Shinken is an open source monitoring application written in Python that serves as a drop-in replacement for Nagios. Ringfree has an instance running on a DigitalOcean virtual private server that monitors the nodes, the Sansay SBCs, and the VoIPmonitor server (voipmon). The nodes and voipmon are monitored primarily through the “Nagios Remote Plugin Executor” (NRPE), a service that is installed and running on each server.
NRPE is used instead of other monitoring methods (such as SNMP) because it allows more granular control over what is monitored and how. A Nagios plugin can be written in practically any language to monitor practically anything, which lets us write one-off plugins and install them on any given server for any given purpose.
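To illustrate how little a one-off plugin needs, here is a minimal sketch of a Nagios-style check in Python. It is not one of Ringfree's plugins; the disk-usage check, the thresholds, and the function names are all assumptions chosen for illustration. The only fixed conventions are the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the single status line with optional performance data after a "|".

```python
"""Minimal Nagios-style plugin sketch: report disk usage for one path."""
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to (exit_code, status_line).

    The 80/90 thresholds are illustrative defaults, not Ringfree's values.
    """
    if percent_used >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif percent_used >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text before "|" is for humans; text after it is performance data.
    line = (
        f"DISK {label} - {percent_used:.1f}% used"
        f" | used_pct={percent_used:.1f}%;{warn};{crit}"
    )
    return code, line


def main(path="/"):
    """Print one status line and return the exit code for `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the check from running maps to UNKNOWN.
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, line = evaluate(100.0 * usage.used / usage.total)
    print(line)
    return code

# A real plugin would finish by exiting with the code: sys.exit(main())
```

Keeping the threshold logic in its own function, separate from the code that gathers the measurement, makes a plugin like this easy to test without touching the server it will eventually run on.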
The Sansay SBCs are monitored primarily using a Sansay-provided Nagios plugin that makes use of the SOAP API client. An additional plugin, written in-house at Ringfree, monitors the SBC state (active, standby, etc.) and also makes use of the SOAP API client.
Ringfree’s Shinken instance can be accessed at http://shinken.ringfree.biz:7767. If you need access credentials or a password reset, please consult with John or Kendall. The server can be accessed via SSH using either the hostname shinken.ringfree.biz or the IP address 159.203.76.166.
On the various servers being monitored, NRPE runs as a service but is NOT configured to start automatically on boot. This is intentional: after a reboot, NRPE stays down until someone starts it, so its checks fail and a rebooted server is easy to spot.
In the event of a warning or error, Shinken communicates with PagerDuty by sending an email. This email triggers whichever escalation policy is in place within PagerDuty to alert whoever is on call.
Shinken Configuration
Configuration for the monitoring of Ringfree’s infrastructure can be found on the server in the /etc/shinken/objects directory. Specific configuration for each box being monitored can be found within the hosts subdirectory. If you need to monitor a new box, begin by creating a definition here using an existing file as a template.
Most of the specifics associated with what is being monitored can be found in the commands and services subdirectories, the former containing the individual commands (most of which run check_nrpe), the latter containing definitions for which hosts/templates should use which commands.
The Shinken server contains an installation of Monitoring Plugins, which provides the check_nrpe command along with many additional Nagios plugins.
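The call that Shinken ultimately makes for an NRPE-based check can be sketched as follows. This is an illustration rather than a copy of the actual command definitions: the plugin directory on the Shinken server and the helper names are assumptions, while -H (remote host) and -c (command alias) are the standard check_nrpe options.

```python
import subprocess

# Assumption: where Monitoring Plugins is installed on the Shinken server.
PLUGIN_DIR = "/usr/lib64/nagios/plugins"


def build_check_nrpe_argv(host, command, plugin_dir=PLUGIN_DIR):
    """Return the argv that asks NRPE on `host` to run the aliased `command`."""
    # -H selects the remote host; -c names an alias from that host's nrpe.cfg.
    return [f"{plugin_dir}/check_nrpe", "-H", host, "-c", command]


def run_check(host, command, timeout=30):
    """Run the check and return (exit_code, first_line_of_output)."""
    result = subprocess.run(
        build_check_nrpe_argv(host, command),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    lines = result.stdout.splitlines()
    return result.returncode, (lines[0] if lines else "")


print(" ".join(build_check_nrpe_argv("node001", "check_openvz")))
```

Running the same command by hand from the Shinken server is a quick way to tell whether a failing check is a Shinken configuration problem or a problem on the monitored box.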
There is nothing especially notable about the configuration as it’s all very elementary Nagios. With any prior Nagios/Shinken experience, you should have no problem navigating everything.
Server Configuration
In addition to NRPE, each of the servers also contains an installation of Monitoring Plugins. In the case of the nodes, Monitoring Plugins has been packaged and is available in the Ringfree repository. In other cases (such as voipmon) it has been downloaded and compiled from source.
The NRPE service runs the various commands as the user nrpe. The specific commands along with their arguments are defined in the /etc/nagios/nrpe.cfg file. Each command is given an alias, which is the name Shinken uses to call that particular command on the server in question. An example of a command and alias definition is:
command[check_openvz]=/usr/lib64/nagios/plugins/check_openvz
In this definition, command indicates that this is a command definition, [check_openvz] indicates that the command can be run using check_openvz from the Shinken instance, and /usr/lib64/nagios/plugins/check_openvz is the command on the server that will be run when Shinken sends a check_openvz request to NRPE.
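When auditing a server, it can help to list every alias NRPE exposes. The following is a minimal sketch that parses command definitions of the form shown above; the function name and the sample text are illustrative, and it deliberately ignores every other nrpe.cfg directive.

```python
import re

# Matches lines such as:
#   command[check_openvz]=/usr/lib64/nagios/plugins/check_openvz
COMMAND_RE = re.compile(
    r"^\s*command\[(?P<alias>[^\]]+)\]\s*=\s*(?P<cmdline>.+?)\s*$"
)


def parse_nrpe_commands(text):
    """Return {alias: command line} for each command definition in `text`."""
    commands = {}
    for line in text.splitlines():
        # Comments and unrelated directives simply fail to match.
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group("alias")] = match.group("cmdline")
    return commands


# Illustrative excerpt, not a full nrpe.cfg.
sample = """\
# command definitions
command[check_openvz]=/usr/lib64/nagios/plugins/check_openvz
command[check_haproxy]=/usr/bin/sudo /usr/lib64/nagios/plugins/check_haproxy
"""

for alias, cmdline in parse_nrpe_commands(sample).items():
    print(f"{alias} -> {cmdline}")
```

The alias on the left is what Shinken passes to check_nrpe; the command line on the right never leaves the monitored server.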
Some commands require root privileges in order to run correctly. To do this without exposing a gaping security hole, some configuration has been added to the /etc/sudoers file on the affected servers and the commands in nrpe.cfg are prefixed with /usr/bin/sudo. For example, on node001 the following can be found within the nrpe.cfg file:
command[check_haproxy]=/usr/bin/sudo /usr/lib64/nagios/plugins/check_haproxy
In /etc/sudoers the following has been added in order to securely allow nrpe to execute the command:
Defaults: nrpe !requiretty
nrpe ALL=(root) NOPASSWD: /usr/lib64/nagios/plugins/check_haproxy ""
The first line indicates that nrpe does not require an active terminal session in order to execute commands. The second line gives explicit access for nrpe to run the check_haproxy command (and no other commands) with root privileges, provided that no arguments are passed; the trailing "" is what forbids arguments. Limiting access on a “per command” basis and limiting arguments are necessary to ensure proper security.
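Because the nrpe.cfg entry and the sudoers rule must name exactly the same plugin path, it can be convenient to generate both from one place. The helper below is a hypothetical sketch, not an existing Ringfree tool; it simply reproduces the pattern described above for a given alias and plugin path.

```python
def render_sudo_check(alias, plugin_path, user="nrpe"):
    """Return (nrpe_cfg_line, sudoers_lines) for a check that must run as root."""
    if not plugin_path.startswith("/"):
        # sudo matches on the full path, so a relative path would never match.
        raise ValueError("plugin_path must be absolute")
    nrpe_line = f"command[{alias}]=/usr/bin/sudo {plugin_path}"
    sudoers_lines = [
        # Let the service account use sudo without a terminal session.
        f"Defaults: {user} !requiretty",
        # The trailing "" permits the command only when no arguments are given.
        f'{user} ALL=(root) NOPASSWD: {plugin_path} ""',
    ]
    return nrpe_line, sudoers_lines


nrpe_line, sudoers_lines = render_sudo_check(
    "check_haproxy", "/usr/lib64/nagios/plugins/check_haproxy"
)
print(nrpe_line)
print("\n".join(sudoers_lines))
```

Any generated sudoers lines should still be applied through visudo so that a syntax error cannot lock sudo out on the server.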
Custom Plugins
Several custom plugins have been developed in-house at Ringfree to monitor specifics for which no preexisting plugin was available. These include check_haproxy, check_openvz, and check_voipmon. Most of these are written in Python, with straightforward syntax and clearly defined goals.
The most notable of the custom plugins is check_sansay_ha, which is located on the Shinken server in the /var/lib/shinken/libexec directory. This plugin is used to communicate with the Sansay SBCs and pull down “HA State” information. Any change of state will trigger an alert.
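The decision such a plugin has to make can be sketched as below. This is not the actual check_sansay_ha code: the function name is hypothetical, the SOAP call that retrieves the state is omitted entirely, and the sketch assumes that a "change" means the observed state differs from an expected state given to the check.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_ha_state(expected, observed):
    """Compare the observed HA state to the expected one.

    `observed` is whatever string the (omitted) SOAP query returned, or None
    if the query failed. Returns (exit_code, status_line).
    """
    if observed is None:
        return UNKNOWN, "SANSAY HA UNKNOWN - could not read HA state"
    if observed.strip().lower() != expected.strip().lower():
        return (
            CRITICAL,
            f"SANSAY HA CRITICAL - state is {observed}, expected {expected}",
        )
    return OK, f"SANSAY HA OK - state is {observed}"


code, line = evaluate_ha_state("active", "standby")
print(code, line)
```

Reporting a failed query as UNKNOWN rather than CRITICAL keeps an unreachable API distinct from a genuine failover in the alert history.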