Monitoring OpenStack – Nagios 3

Part of the work on the upcoming OpenStack Cookbook involved a refresh of the chapter on monitoring. Specifically, we wanted to deliver an updated section with some tool changes that reflect the current state of OpenStack monitoring. That said, after spending weeks exploring other options, we came back to Nagios, in part because it “just works” and in part because of the time constraints that come with book writing. In this post we will explore the installation and configuration of Nagios 3 to monitor both the HTTP APIs and the processes related to OpenStack.

Note: Nagios, out of the box, will only provide part of the view needed into an OpenStack environment. To get deeper into monitoring and HA OpenStack, check out the OpenStack Cookbook here.

Getting Started with OpenStack Monitoring

To get started, you will need an OpenStack environment and a server on which to install Nagios. If you do not already have one, you can take a look at the aforementioned book, or the Couch to OpenStack program here. Further, if you just want to jump right into a test environment, do the following (assuming you have Git, Vagrant, and VirtualBox installed):

git clone https://github.com/bunchc/Couch_to_OpenStack.git -b monitoring

cd Couch_to_OpenStack

vagrant up

Note: The environment above handles all of the install steps for you, so the rest of this guide serves as a walkthrough of what it does.

How to do it…

Nagios is deployed in a client/server architecture (not extremely cloud-like, but it works). So we’ll perform our install in the same order: the Nagios server first, then the Nagios clients.

Installing Nagios3 – Server

To install the Nagios 3 server components, log into a VM or server running Ubuntu 12.04, and run the following commands:

# Install Nagios
echo "postfix postfix/main_mailer_type select No configuration" | sudo debconf-set-selections
echo "nagios3-cgi nagios3/adminpassword password nagiosadmin" | sudo debconf-set-selections
echo "nagios3-cgi nagios3/adminpassword-repeat password nagiosadmin" | sudo debconf-set-selections

sudo apt-get install -y nagios3 nagios-nrpe-plugin

A quick rundown of what we did: the “echo” commands preseed a number of debconf answers so that we are not prompted for them at Nagios install time. Specifically, we set the password for the Nagios web admin user and tell the postfix mailer, which Nagios uses for notifications, to install with no configuration. Finally, we install the nagios3 package and the nagios-nrpe-plugin package that will provide our remote checks.
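If you would rather not repeat the password in two separate commands, the preseed answers can be generated in one place. This is only a sketch: nagios_preseed is a hypothetical helper name, and it assumes the same three debconf questions shown above.

```shell
#!/bin/sh
# nagios_preseed is a hypothetical helper, not part of Nagios or debconf.
# It prints the three debconf answers used above for a given admin password.
nagios_preseed() {
    password="$1"
    printf '%s\n' \
        "postfix postfix/main_mailer_type select No configuration" \
        "nagios3-cgi nagios3/adminpassword password ${password}" \
        "nagios3-cgi nagios3/adminpassword-repeat password ${password}"
}

# Usage (run before apt-get install):
#   nagios_preseed 'nagiosadmin' | sudo debconf-set-selections
```

debconf-set-selections reads one answer per line from standard input, so all three lines can go through a single pipe.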

Configuring Nagios3 – Server

Once the server is installed, we need to provide some configuration details. Specifically, we need to tell Nagios where our hosts are, and what services to check for on them. First we’ll configure the hosts. To do this, we log into the Nagios server and run the following commands:

# Create our Nagios Hosts
sudo tee /etc/nagios3/conf.d/controller.cfg > /dev/null <<EOF
# Host definition for the controller node

define host{
        host_name                       controller.cook.book
        address                         172.16.80.200
        notifications_enabled           1       ; Host notifications are enabled
        event_handler_enabled           1       ; Host event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        check_command                   check-host-alive
        max_check_attempts              10
        notification_interval           0
        notification_period             24x7
        notification_options            d,u,r
        contact_groups                  admins
        register                        1       ; Register this definition: it is a real host, not a template
        }
EOF

sudo cp /etc/nagios3/conf.d/controller.cfg /etc/nagios3/conf.d/compute.cfg
sudo cp /etc/nagios3/conf.d/controller.cfg /etc/nagios3/conf.d/cinder.cfg
sudo cp /etc/nagios3/conf.d/controller.cfg /etc/nagios3/conf.d/quantum.cfg
sudo sed -i "s/controller/compute/" /etc/nagios3/conf.d/compute.cfg
sudo sed -i "s/172.16.80.200/172.16.80.201/" /etc/nagios3/conf.d/compute.cfg
sudo sed -i "s/controller/cinder/" /etc/nagios3/conf.d/cinder.cfg
sudo sed -i "s/172.16.80.200/172.16.80.211/" /etc/nagios3/conf.d/cinder.cfg
sudo sed -i "s/controller/quantum/" /etc/nagios3/conf.d/quantum.cfg
sudo sed -i "s/172.16.80.200/172.16.80.202/" /etc/nagios3/conf.d/quantum.cfg

The first command uses a heredoc to create a host definition for the controller and registers it with Nagios. We then copy that file and change the host name and IP address in each copy for the additional nodes: compute, cinder, quantum.
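The copy-and-substitute steps above can also be wrapped in a small function, which keeps the host name and address for each node on one line. This is a sketch only: clone_host is a hypothetical helper, and it writes the new file next to the template, so it needs root when the template lives in /etc/nagios3/conf.d.

```shell
#!/bin/sh
# clone_host is a hypothetical helper: copy a host definition and swap
# the short host name and the IP address, as the cp/sed pairs above do.
clone_host() {
    template="$1"; old_name="$2"; old_ip="$3"
    new_name="$4"; new_ip="$5"
    conf_dir="$(dirname "$template")"
    # Escape the dots so the old IP matches literally in the sed pattern.
    old_ip_re="$(printf '%s' "$old_ip" | sed 's/\./\\./g')"
    sed -e "s/${old_name}/${new_name}/" -e "s/${old_ip_re}/${new_ip}/" \
        "$template" > "${conf_dir}/${new_name}.cfg"
}

# Usage (as root), with the addresses from this post:
#   clone_host /etc/nagios3/conf.d/controller.cfg controller 172.16.80.200 compute 172.16.80.201
#   clone_host /etc/nagios3/conf.d/controller.cfg controller 172.16.80.200 cinder  172.16.80.211
#   clone_host /etc/nagios3/conf.d/controller.cfg controller 172.16.80.200 quantum 172.16.80.202
```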

Next we configure the checks for each host by running the following commands:

# Nagios services configuration for Compute Services
sudo tee /etc/nagios3/conf.d/openstack_compute_services.cfg > /dev/null <<EOF
# Compute all the computes

define service {
        host_name                       compute.cook.book
        service_description             Nova Processes
        check_command                   check_nrpe_1arg!check_nova_proc
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
        host_name                       compute.cook.book
        service_description             Quantum Services
        check_command                   check_nrpe_1arg!check_quantum_proc
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
        host_name                       compute.cook.book
        service_description             Open vSwitch - ovs-vswitchd
        check_command                   check_nrpe_1arg!check_ovswitch_proc
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}
define service {
        host_name                       compute.cook.book
        service_description             Open vSwitch - ovsdb-server
        check_command                   check_nrpe_1arg!check_ovswitch_server_proc
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

EOF

# Nagios services configuration for Cinder Node
sudo tee /etc/nagios3/conf.d/openstack_cinder_services.cfg > /dev/null <<EOF
# Cinder
define service {
        host_name                       cinder.cook.book
        service_description             Cinder-API-HTTP
        check_command                   check_nrpe_1arg!check_cinder_http
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
        host_name                       cinder.cook.book
        service_description             Cinder-API-Proc
        check_command                   check_nrpe_1arg!check_cinder_proc
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}
EOF

# Nagios services configuration for Quantum Node
sudo tee /etc/nagios3/conf.d/openstack_quantum_services.cfg > /dev/null <<EOF
# Yes, Quantum: this is Grizzly, prior to the name change ;)
define service {
        host_name                       quantum.cook.book
        service_description             Quantum-ovsdb-server
        check_command                   check_nrpe_1arg!check_ovsdbserver
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
        host_name                       quantum.cook.book
        service_description             Quantum-ovs-vswitchd
        check_command                   check_nrpe_1arg!check_ovsvswitchd
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
        host_name                       quantum.cook.book
        service_description             Quantum-API-Proc
        check_command                   check_nrpe_1arg!check_quantum_proc
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}
EOF

A lot of text, but well worth it. What we did there was use the same heredoc method to create a configuration file for each node and define a number of services per host. An important thing to note in these files is the check_command line. Its value is “check_nrpe_1arg!” followed by the name of a check. This uses the Nagios Remote Plugin Executor (NRPE) to run the named command on the defined host and report the results. Finally, we restart the nagios3 service:

sudo service nagios3 restart
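Since every check_nrpe_1arg entry has to match a command defined on the remote host, it can help to list the check names the server now expects. The following is a minimal sketch; list_nrpe_checks is a hypothetical helper name, and it assumes the service files use the check_nrpe_1arg!name form shown above.

```shell
#!/bin/sh
# list_nrpe_checks is a hypothetical helper: print the unique NRPE check
# names referenced via check_nrpe_1arg in the given Nagios service files.
list_nrpe_checks() {
    grep -ho 'check_nrpe_1arg![A-Za-z0-9_]*' "$@" | cut -d'!' -f2 | sort -u
}

# Usage:
#   list_nrpe_checks /etc/nagios3/conf.d/openstack_*_services.cfg
```

Once NRPE is running on a node, a single check can also be exercised by hand from the Nagios server with the check_nrpe plugin, for example /usr/lib/nagios/plugins/check_nrpe -H 172.16.80.201 -c check_nova_proc (the plugin path is an assumption based on where the other plugins in this post live).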

What we’ve done so far is install the Nagios 3 server and configure it to monitor a multi-node OpenStack installation. Specifically, we defined hosts for the “controller” (Keystone, Glance, Nova-Scheduler), compute, Cinder, and Neutron / Quantum nodes, and provided service checks for the compute, Cinder, and Quantum nodes. Our configuration, however, is incomplete: we still need to install the NRPE server and configure the respective checks on each host.
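The listings above define the controller host but no service checks for it. One way to fill that in is sketched below, with some assumptions: nrpe_service and controller_services are hypothetical helper names, the service descriptions are illustrative, and the check names are the NRPE commands that the controller's checks.cfg defines later in this post.

```shell
#!/bin/sh
# nrpe_service is a hypothetical helper that prints one service stanza
# in the same shape as the definitions above.
nrpe_service() {
    host="$1"; description="$2"; check="$3"
    cat <<EOF
define service {
        host_name                       ${host}
        service_description             ${description}
        check_command                   check_nrpe_1arg!${check}
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

EOF
}

# Controller checks; descriptions are illustrative, check names are the
# NRPE commands assumed to exist on the controller node.
controller_services() {
    nrpe_service controller.cook.book "Keystone-API-HTTP" check_keystone_http
    nrpe_service controller.cook.book "Keystone-Proc"     check_keystone_proc
    nrpe_service controller.cook.book "Glance-API-HTTP"   check_glance_http
    nrpe_service controller.cook.book "Glance-Proc"       check_glance_proc
}

# Usage (the file name is an assumption that follows the naming above):
#   controller_services | sudo tee /etc/nagios3/conf.d/openstack_controller_services.cfg > /dev/null
```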

Installing Nagios 3 – Client

In this section, rather than show you the installation four different times, we will show you the NRPE install once, and then provide a configuration file for each host and its services. It will be an exercise for the reader to put these files to the best use.

To begin, ssh into your “controller” node. For the sake of our example, that is the host running Keystone and Glance. From there execute the following commands:

sudo apt-get install -y nagios-nrpe-server
sudo sed -i "s/allowed_hosts=127.0.0.1/allowed_hosts=127.0.0.1,172.16.80.100/" /etc/nagios/nrpe.cfg

What these commands did was install the nagios-nrpe-server components and configure them to allow communication with our Nagios server (172.16.80.100 in this example). You will need to change the addresses to suit your environment.
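The sed command above only matches when allowed_hosts still has its default value. A variant that can be re-run safely might look like the sketch below. It assumes GNU sed (for -i, as on Ubuntu) and a single allowed_hosts line; allow_nagios_host is a hypothetical helper name.

```shell
#!/bin/sh
# allow_nagios_host is a hypothetical helper: append an address to the
# allowed_hosts line of an nrpe.cfg unless it is already listed.
allow_nagios_host() {
    cfg="$1"; addr="$2"
    current="$(sed -n 's/^allowed_hosts=//p' "$cfg")"
    # Compare whole comma-separated entries so 172.16.80.10 does not
    # count as a match for 172.16.80.100.
    case ",${current}," in
        *",${addr},"*) return 0 ;;
    esac
    sed -i "s/^allowed_hosts=.*/&,${addr}/" "$cfg"
}

# Usage (as root):
#   allow_nagios_host /etc/nagios/nrpe.cfg 172.16.80.100
```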

Configure Nagios 3 – NRPE & OpenStack

Each section below creates NRPE commands to be run on each node that will allow for monitoring of the respective services.

Controller

# Setup our check commands:
sudo tee /etc/nagios/checks.cfg > /dev/null <<EOF
command[check_horizon]=/usr/lib/nagios/plugins/check_http localhost -u /horizon -R username
command[check_keystone_http]=/usr/lib/nagios/plugins/check_http localhost -p 5000 -R application/vnd.openstack.identity-v3
command[check_keystone_proc]=/usr/lib/nagios/plugins/check_procs -w 1 -u keystone
command[check_glance_http]=/usr/lib/nagios/plugins/check_http localhost -p 9292 -R "SUPPORTED"
command[check_glance_proc]=/usr/lib/nagios/plugins/check_procs -w 4: -u glance
command[check_quantum_api_http]=/usr/lib/nagios/plugins/check_http localhost -p 9696 -R "CURRENT"
command[check_quantum_api_proc]=/usr/lib/nagios/plugins/check_procs -w 1 -C python -a quantum-server
EOF

# Include our check commands
echo "include=/etc/nagios/checks.cfg" | sudo tee -a /etc/nagios/nrpe.cfg > /dev/null

# Restart the service
sudo service nagios-nrpe-server stop
sudo service nagios-nrpe-server start
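Because the include step appends to nrpe.cfg, running it a second time adds a duplicate line. A guarded version is sketched here; include_checks is a hypothetical helper name, and it only adds the line when that exact line is missing.

```shell
#!/bin/sh
# include_checks is a hypothetical helper: add the include line to an
# nrpe.cfg only when that exact line is not already present.
include_checks() {
    cfg="$1"; checks="$2"
    line="include=${checks}"
    grep -qxF "$line" "$cfg" || printf '%s\n' "$line" >> "$cfg"
}

# Usage (as root):
#   include_checks /etc/nagios/nrpe.cfg /etc/nagios/checks.cfg
```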

Compute

# Setup our check commands:
sudo tee /etc/nagios/checks.cfg > /dev/null <<EOF
command[check_nova_proc]=/usr/lib/nagios/plugins/check_procs -w 4: -u nova
command[check_quantum_proc]=/usr/lib/nagios/plugins/check_procs -w 1: -u quantum
command[check_ovswitch_proc]=/usr/lib/nagios/plugins/check_procs -w 2 -C ovs-vswitchd
command[check_ovswitch_server_proc]=/usr/lib/nagios/plugins/check_procs -w 2 -C ovsdb-server
EOF

# Include our check commands
echo "include=/etc/nagios/checks.cfg" | sudo tee -a /etc/nagios/nrpe.cfg > /dev/null

# Restart the service
sudo service nagios-nrpe-server stop
sudo service nagios-nrpe-server start

Cinder

# Setup our check commands:
sudo tee /etc/nagios/checks.cfg > /dev/null <<EOF
command[check_cinder_proc]=/usr/lib/nagios/plugins/check_procs -w 4: -u cinder
command[check_cinder_http]=/usr/lib/nagios/plugins/check_http localhost -p 8776 -R "CURRENT"
EOF

# Include our check commands
echo "include=/etc/nagios/checks.cfg" | sudo tee -a /etc/nagios/nrpe.cfg > /dev/null

# Restart the service
sudo service nagios-nrpe-server stop
sudo service nagios-nrpe-server start

Quantum

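# Setup our check commands (the check names match openstack_quantum_services.cfg on
# the Nagios server; the process patterns are assumed to mirror the compute node checks):
sudo tee /etc/nagios/checks.cfg > /dev/null <<EOF
command[check_ovsdbserver]=/usr/lib/nagios/plugins/check_procs -w 2 -C ovsdb-server
command[check_ovsvswitchd]=/usr/lib/nagios/plugins/check_procs -w 2 -C ovs-vswitchd
command[check_quantum_proc]=/usr/lib/nagios/plugins/check_procs -w 1: -u quantum
EOF

# Include our check commands
echo "include=/etc/nagios/checks.cfg" | sudo tee -a /etc/nagios/nrpe.cfg > /dev/null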

# Restart the service
sudo service nagios-nrpe-server stop
sudo service nagios-nrpe-server start

Summary

In this REALLY long post, we’ve covered how to install and configure both the Nagios Server and Nagios clients, using NRPE, to monitor OpenStack services. As stated before, while Nagios monitoring can provide some insight into your environment, you will want to work with some additional tools to complete the coverage.
