Part of the work on the upcoming OpenStack Cookbook involved a refresh of the chapter on monitoring. Specifically, we wanted to deliver an updated section, with some tool changes, that reflects the current state of OpenStack monitoring. That said, after spending weeks exploring other options, we came back to Nagios, in part because it “just works” and in part because of the time constraints that come with book writing. In this post we will explore the installation and configuration of Nagios 3 to monitor both the HTTP APIs and the processes related to OpenStack.
Note: Nagios, out of the box, will only provide part of the view needed into an OpenStack environment. To get deeper into monitoring and HA OpenStack, check out the OpenStack Cookbook here.
Getting Started with OpenStack Monitoring
To get started, you will need an OpenStack environment and a server on which to install Nagios. If you do not already have one, take a look at the aforementioned book, or the Couch to OpenStack program here. If you just want to jump right into a test environment, run the following (assuming you have Git, Vagrant, and VirtualBox installed):
git clone https://github.com/bunchc/Couch_to_OpenStack.git -b monitoring
cd Couch_to_OpenStack
vagrant up
Note: The environment above handles all of the install steps for you, so the rest of this guide serves as a walkthrough of what it does.
How to do it…
Nagios is deployed in a client/server architecture (not especially cloud-like, but it works), so we’ll perform our install in the same order: the Nagios server first, then the Nagios clients.
Installing Nagios 3 – Server
To install the Nagios 3 server components, log into a VM or server running Ubuntu 12.04, and run the following commands:
# Install Nagios
echo "postfix postfix/main_mailer_type select No configuration" | sudo debconf-set-selections
echo "nagios3-cgi nagios3/adminpassword password nagiosadmin" | sudo debconf-set-selections
echo "nagios3-cgi nagios3/adminpassword-repeat password nagiosadmin" | sudo debconf-set-selections
sudo apt-get install -y nagios3 nagios-nrpe-plugin
A quick rundown of what we did: the “echo” commands preseed a number of debconf answers so that we are not prompted for them at Nagios install time. Specifically, we set the Nagios web admin password (to nagiosadmin here; pick something stronger outside of a lab) and told Postfix, the mailer that Nagios will use for notifications, to install with “No configuration”. Finally, we install the nagios3 package and the nagios-nrpe-plugin package that will provide our remote checks.
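If you would rather not hard-code the password, the preseed lines can be generated from a parameter. This is a minimal sketch, assuming the same debconf keys used above; nagios_preseed is a hypothetical helper name of ours, and it relies on debconf-set-selections reading its answers from standard input.

```shell
# Print the debconf answers for an unattended nagios3 install.
# $1 is the password to set for the Nagios web admin user.
# nagios_preseed is a hypothetical helper, not part of any package.
nagios_preseed() {
  pw="$1"
  printf '%s\n' \
    "postfix postfix/main_mailer_type select No configuration" \
    "nagios3-cgi nagios3/adminpassword password ${pw}" \
    "nagios3-cgi nagios3/adminpassword-repeat password ${pw}"
}

# usage: nagios_preseed 'a-better-password' | sudo debconf-set-selections
```

Run the usage line before the apt-get install step so the answers are in place when the packages are configured.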
Configuring Nagios 3 – Server
Once the server is installed, we need to provide some configuration details. Specifically, we need to tell Nagios where our hosts are, and what services to check for on them. First we’ll configure the hosts. To do this, we log into the Nagios server and run the following commands:
# Create our Nagios Hosts
sudo tee /etc/nagios3/conf.d/controller.cfg > /dev/null <<EOF
# Host definition for the controller node
define host{
host_name controller.cook.book
address 172.16.80.200
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
check_command check-host-alive
max_check_attempts 10
notification_interval 0
notification_period 24x7
notification_options d,u,r
contact_groups admins
register 1 ; Register this definition, it is a real host
}
EOF
sudo cp /etc/nagios3/conf.d/controller.cfg /etc/nagios3/conf.d/compute.cfg
sudo cp /etc/nagios3/conf.d/controller.cfg /etc/nagios3/conf.d/cinder.cfg
sudo cp /etc/nagios3/conf.d/controller.cfg /etc/nagios3/conf.d/quantum.cfg
sudo sed -i "s/controller/compute/" /etc/nagios3/conf.d/compute.cfg
sudo sed -i "s/172.16.80.200/172.16.80.201/" /etc/nagios3/conf.d/compute.cfg
sudo sed -i "s/controller/cinder/" /etc/nagios3/conf.d/cinder.cfg
sudo sed -i "s/172.16.80.200/172.16.80.211/" /etc/nagios3/conf.d/cinder.cfg
sudo sed -i "s/controller/quantum/" /etc/nagios3/conf.d/quantum.cfg
sudo sed -i "s/172.16.80.200/172.16.80.202/" /etc/nagios3/conf.d/quantum.cfg
The first command uses a here-document (<<EOF) to write a host definition for the controller and register it with Nagios. We then copy that file and change the host name and IP address in each copy for the additional nodes: compute, cinder, and quantum.
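An alternative to the copy-and-sed approach is to generate every host definition from one small function. The sketch below is ours, not from the book: nagios_host and all_hosts are hypothetical names, and it assumes the generic-host template that Ubuntu's nagios3 package ships in conf.d is still in place, so each host only needs its name and address.

```shell
# Print a Nagios host definition for one node.
# Assumes a "generic-host" template exists (Ubuntu's nagios3 package ships one).
nagios_host() {
  name="$1"; address="$2"
  cat <<EOF
define host {
    use         generic-host
    host_name   ${name}.cook.book
    alias       ${name}
    address     ${address}
}
EOF
}

# Node names and addresses used in this post, one "name address" pair per line.
all_hosts() {
  while read -r name address; do
    nagios_host "$name" "$address"
  done <<EOF
controller 172.16.80.200
compute 172.16.80.201
quantum 172.16.80.202
cinder 172.16.80.211
EOF
}

# usage: all_hosts | sudo tee /etc/nagios3/conf.d/openstack_hosts.cfg > /dev/null
```

Use this instead of, not alongside, the per-node files, since Nagios rejects a configuration that defines the same host twice. The openstack_hosts.cfg file name is only an example.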
Next we configure the checks for each host by running the following commands:
# Nagios services configuration for Compute Services
sudo tee /etc/nagios3/conf.d/openstack_compute_services.cfg > /dev/null <<EOF
# Compute all the computes
define service {
host_name compute.cook.book
service_description Nova Processes
check_command check_nrpe_1arg!check_nova_proc
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
host_name compute.cook.book
service_description Quantum Services
check_command check_nrpe_1arg!check_quantum_proc
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
host_name compute.cook.book
service_description Open vSwitch – ovs-vswitchd
check_command check_nrpe_1arg!check_ovswitch_proc
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
host_name compute.cook.book
service_description Open vSwitch – ovsdb-server
check_command check_nrpe_1arg!check_ovswitch_server_proc
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
EOF
# Nagios services configuration for Cinder Node
sudo tee /etc/nagios3/conf.d/openstack_cinder_services.cfg > /dev/null <<EOF
# Cinder
define service {
host_name cinder.cook.book
service_description Cinder-API-HTTP
check_command check_nrpe_1arg!check_cinder_http
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
host_name cinder.cook.book
service_description Cinder-API-Proc
check_command check_nrpe_1arg!check_cinder_proc
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
EOF
# Nagios services configuration for Quantum Node
sudo tee /etc/nagios3/conf.d/openstack_quantum_services.cfg > /dev/null <<EOF
# Yes Quantum, it’s Grizzly, prior to the name change 😉
define service {
host_name quantum.cook.book
service_description Quantum-ovsdb-server
check_command check_nrpe_1arg!check_ovsdbserver
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
host_name quantum.cook.book
service_description Quantum-ovs-vswitchd
check_command check_nrpe_1arg!check_ovsvswitchd
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
host_name quantum.cook.book
service_description Quantum-API-Proc
check_command check_nrpe_1arg!check_quantum_proc
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
EOF
That is a lot of text, but well worth it. We used the same here-document method to create a configuration file for each node, each defining several services. The important part of each definition is the check_command line: “check_nrpe_1arg!” followed by the name of a check. This uses NRPE (the Nagios Remote Plugin Executor) to run the named command on the target host and report the result back to the server. Finally, we restart the nagios3 service:
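Under the hood, check_nrpe_1arg!name boils down to running the check_nrpe plugin against the host with -c name. Once the NRPE clients are configured (covered later in this post), a sketch like the following can exercise the checks by hand from the Nagios server. The plugin path is an assumption based on where Ubuntu's nagios-nrpe-plugin package installs it, and smoke_test is a hypothetical helper name.

```shell
# Path to the check_nrpe plugin; assumed default for Ubuntu, override if needed.
CHECK_NRPE="${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}"

# Run each named NRPE check against a host.
# Returns non-zero if any check reports a problem or cannot be run.
smoke_test() {
  host="$1"; shift
  failed=0
  for check in "$@"; do
    if ! "$CHECK_NRPE" -H "$host" -c "$check"; then
      failed=1
    fi
  done
  return "$failed"
}

# usage: smoke_test 172.16.80.201 check_nova_proc check_quantum_proc
```

Each check prints the same one-line status that Nagios would show in its web interface, which makes it easy to tell a misconfigured command from a genuinely failing service.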
sudo service nagios3 restart
So far we have installed the Nagios 3 server and configured it to monitor a multi-node OpenStack installation. Specifically, we defined hosts for the “controller” (Keystone, Glance, Nova-Scheduler), the compute node, the Cinder node, and the Neutron / Quantum node, and provided service checks for the compute, Cinder, and Quantum nodes. Our configuration is incomplete, however: we still need to install the NRPE server and configure the respective checks on each host.
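The service files shown above cover the compute, Cinder, and Quantum nodes. Because every stanza has the same shape, a small generator can produce matching definitions for the controller as well. This is a sketch under our own assumptions: nrpe_service and controller_services are hypothetical names, the service descriptions are made up for illustration, and the check names are the NRPE commands defined for the controller later in this post.

```shell
# Print one Nagios service stanza that runs CHECK on HOST through NRPE.
nrpe_service() {
  host="$1"; description="$2"; check="$3"
  cat <<EOF
define service {
    host_name               ${host}
    service_description     ${description}
    check_command           check_nrpe_1arg!${check}
    use                     generic-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
EOF
}

# One stanza per controller check; descriptions are illustrative only.
controller_services() {
  nrpe_service controller.cook.book "Horizon-HTTP"      check_horizon
  nrpe_service controller.cook.book "Keystone-API-HTTP" check_keystone_http
  nrpe_service controller.cook.book "Keystone-Proc"     check_keystone_proc
  nrpe_service controller.cook.book "Glance-API-HTTP"   check_glance_http
  nrpe_service controller.cook.book "Glance-Proc"       check_glance_proc
  nrpe_service controller.cook.book "Quantum-API-HTTP"  check_quantum_api_http
  nrpe_service controller.cook.book "Quantum-API-Proc"  check_quantum_api_proc
}

# usage: controller_services | sudo tee /etc/nagios3/conf.d/openstack_controller_services.cfg > /dev/null
```

After writing the file, restart nagios3 again so the new services are picked up.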
Installing Nagios 3 – Client
In this section, rather than show you the installation four different times, we will show you the NRPE install once, and then provide a configuration file for each host and its services. It will be an exercise for the reader to put these files to the best use.
To begin, ssh into your “controller” node. For the sake of our example, that is the host running Keystone and Glance. From there execute the following commands:
sudo apt-get install -y nagios-nrpe-server
sudo sed -i "s/allowed_hosts=127.0.0.1/allowed_hosts=127.0.0.1,172.16.80.100/" /etc/nagios/nrpe.cfg
These commands install the nagios-nrpe-server package and configure it to accept connections from our Nagios server (172.16.80.100 in this example). You will need to change the addresses to suit your environment.
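The sed one-liner only matches a pristine allowed_hosts line, so it does nothing if the line has already been edited. A slightly more forgiving sketch is shown here; allow_nagios_server is a hypothetical helper of ours, and it assumes nrpe.cfg contains a single allowed_hosts= line with comma-separated addresses.

```shell
# Append SERVER_IP to the allowed_hosts line of an NRPE config file,
# unless it is already listed. Safe to run more than once.
allow_nagios_server() {
  cfg="$1"; server_ip="$2"
  # Current comma-separated list of allowed addresses.
  current=$(sed -n 's/^allowed_hosts=//p' "$cfg")
  if printf '%s\n' "$current" | tr ',' '\n' | grep -qxF "$server_ip"; then
    return 0   # already allowed, nothing to do
  fi
  tmp="${cfg}.tmp"
  sed "s/^allowed_hosts=.*/&,${server_ip}/" "$cfg" > "$tmp" && mv "$tmp" "$cfg"
}

# usage (as root): allow_nagios_server /etc/nagios/nrpe.cfg 172.16.80.100
```

Comparing whole list entries rather than substrings avoids treating 172.16.80.10 as already present when only 172.16.80.100 is listed.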
Configure Nagios 3 – NRPE & OpenStack
Each section below creates NRPE commands to be run on each node that will allow for monitoring of the respective services.
Controller
# Setup our check commands:
sudo tee /etc/nagios/checks.cfg > /dev/null <<EOF
command[check_horizon]=/usr/lib/nagios/plugins/check_http localhost -u /horizon -R username
command[check_keystone_http]=/usr/lib/nagios/plugins/check_http localhost -p 5000 -R application/vnd.openstack.identity-v3
command[check_keystone_proc]=/usr/lib/nagios/plugins/check_procs -w 1 -u keystone
command[check_glance_http]=/usr/lib/nagios/plugins/check_http localhost -p 9292 -R "SUPPORTED"
command[check_glance_proc]=/usr/lib/nagios/plugins/check_procs -w 4: -u glance
command[check_quantum_api_http]=/usr/lib/nagios/plugins/check_http localhost -p 9696 -R "CURRENT"
command[check_quantum_api_proc]=/usr/lib/nagios/plugins/check_procs -w 1 -C python -a quantum-server
EOF
# Include our check commands
echo "include=/etc/nagios/checks.cfg" | sudo tee -a /etc/nagios/nrpe.cfg > /dev/null
# Restart the service
sudo service nagios-nrpe-server stop
sudo service nagios-nrpe-server start
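A note on the -w values used in these checks: as we read the Nagios plugin guidelines, a bare number such as 1 means the range 0 to 1, so it warns only when the count goes above 1, while 1: means "at least 1" and warns when the process is missing entirely. The sketch below is a simplified shell model of that rule. The in_range function is ours, not part of Nagios, and it ignores the ~ and @ forms.

```shell
# Return 0 (no alert) if COUNT falls inside RANGE, 1 (alert) otherwise.
# Handles only the forms "MAX", "MIN:" and "MIN:MAX".
in_range() {
  count="$1"; range="$2"
  case "$range" in
    *:*) min="${range%%:*}"; max="${range#*:}" ;;   # "MIN:" or "MIN:MAX"
    *)   min=0; max="$range" ;;                      # bare "MAX" means 0..MAX
  esac
  [ "$count" -ge "$min" ] || return 1
  # An empty max means there is no upper bound.
  [ -z "$max" ] || [ "$count" -le "$max" ] || return 1
  return 0
}

# usage: in_range 0 1  && echo "no warning with -w 1 and zero processes"
#        in_range 0 1: || echo "warning with -w 1: and zero processes"
```

This can help when deciding whether a process check should read -w 1 or -w 1: for your environment.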
Compute
# Set up our check commands:
sudo tee /etc/nagios/checks.cfg > /dev/null <<EOF
command[check_nova_proc]=/usr/lib/nagios/plugins/check_procs -w 4: -u nova
command[check_quantum_proc]=/usr/lib/nagios/plugins/check_procs -w 1: -u quantum
command[check_ovswitch_proc]=/usr/lib/nagios/plugins/check_procs -w 2 -C ovs-vswitchd
command[check_ovswitch_server_proc]=/usr/lib/nagios/plugins/check_procs -w 2 -C ovsdb-server
EOF
# Include our check commands
echo "include=/etc/nagios/checks.cfg" | sudo tee -a /etc/nagios/nrpe.cfg > /dev/null
# Restart the service
sudo service nagios-nrpe-server stop
sudo service nagios-nrpe-server start
Cinder
# Set up our check commands:
sudo tee /etc/nagios/checks.cfg > /dev/null <<EOF
command[check_cinder_proc]=/usr/lib/nagios/plugins/check_procs -w 4: -u cinder
command[check_cinder_http]=/usr/lib/nagios/plugins/check_http localhost -p 8776 -R "CURRENT"
EOF
# Include our check commands
echo "include=/etc/nagios/checks.cfg" | sudo tee -a /etc/nagios/nrpe.cfg > /dev/null
# Restart the service
sudo service nagios-nrpe-server stop
sudo service nagios-nrpe-server start
Quantum
The Quantum node needs NRPE commands named check_ovsdbserver, check_ovsvswitchd, and check_quantum_proc, matching the service definitions we created on the Nagios server. They follow the same check_procs pattern as the compute node checks above, and are included and activated with the same include and restart steps.
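The service definitions for quantum.cook.book on the Nagios server reference three NRPE commands: check_ovsdbserver, check_ovsvswitchd, and check_quantum_proc. The sketch below shows what those could look like on the Quantum node. The command names come from the server configuration earlier in this post, but the check_procs arguments are our assumption, copied by analogy from the compute node checks, and quantum_checks is a hypothetical helper name.

```shell
# Print NRPE command definitions for the Quantum node.
# Names match the services defined for quantum.cook.book on the Nagios server;
# the thresholds are assumptions borrowed from the compute node checks.
quantum_checks() {
  cat <<'EOF'
command[check_ovsdbserver]=/usr/lib/nagios/plugins/check_procs -w 2 -C ovsdb-server
command[check_ovsvswitchd]=/usr/lib/nagios/plugins/check_procs -w 2 -C ovs-vswitchd
command[check_quantum_proc]=/usr/lib/nagios/plugins/check_procs -w 1: -u quantum
EOF
}

# usage: quantum_checks | sudo tee /etc/nagios/checks.cfg > /dev/null
```

As on the other nodes, the file still has to be included from /etc/nagios/nrpe.cfg and the nagios-nrpe-server service restarted before the checks respond.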
Summary
In this REALLY long post, we’ve covered how to install and configure both the Nagios Server and Nagios clients, using NRPE, to monitor OpenStack services. As stated before, while Nagios monitoring can provide some insight into your environment, you will want to work with some additional tools to complete the coverage.