June 01, 2004 Edition

By Stephan Windischmann (mailto:windi@arslinux.com), Rick Lull (mailto:rick@csbnetworks.com)

 

Introduction

In this edition, we show you how to easily monitor systems on your network using Nagios, tell you about the new Fedora release, and look at the new system OSDL has put in place to govern Linux kernel submissions.

 

Fedora Core 2 released

Fedora Core 2, the second major release of the Fedora distribution, was released on May 18th. Notable improvements over Fedora Core 1 are kernel 2.6 by default and integrated SELinux (Security Enhanced Linux) functionality, as well as GNOME 2.6, KDE 3.2, and more. The distribution is available for both i386 and AMD64 systems, either as one DVD or as 4 CDs. The ISO images are available from your favorite Fedora/Red Hat mirror, and as a torrent from here. While it is possible to update a running FC1 (or FC2 beta release) using yum (or apt), the recommended (and most painless) way is to boot from the CD and use the update option in the installer.

 

OSDL implements new system for Linux kernel submissions

As reported by news.com (http://news.com.com/Linux+contributors+face+new+rules/2100-7344_3-5218724.html?tag=nefd.top), OSDL (Open Source Development Labs), who employ Linus Torvalds, have put a system in place to better track and document changes to the Linux kernel. With the new system, the code submitted by developers will have to be under an "appropriate" open-source license. It also puts in place the Developer's Certificate of Origin (http://www.osdl.org/newsroom/press_releases/2004/2004_05_24_dco.html), which will ensure that developers receive acknowledgement for their contributions and derivative works, and that developers who "receive submissions and pass them, unchanged, up the kernel tree" also receive acknowledgement.

"The DCO is intended to eliminate questions and legal battles over the origin of Linux code contributions. Last year, the SCO Group, which owns a disputed amount of Unix intellectual property, sued IBM, alleging that the company violated its Unix contract by moving Unix technology to Linux that it should have kept secret."

While the new system won't help resolve the current legal difficulties with the SCO Group, it will help make sure that there won't be any more questions regarding the legality of parts of the Linux source code.

"Andrew Morton, who, along with Torvalds, maintains the current Linux 2.6 kernel, endorsed the new system after gaining support for it from other key Linux contributors, the open-source group said. "We've always had transparency, peer review, pride and personal responsibility behind our open-source development method. With the DCO, we're trying to document the process. We want to make it simpler to link submitted code to its contributors. It's like signing your own work," Torvalds said in a statement."

The larger question is whether the DCO will create a barrier to entry that hinders some of the open development of Linux. It may not in the short term, because of the core of dedicated developers. But is it possible that, in the interest of process controls and legal concerns, Linux will become a de facto closed development effort (even if the source remains open), with broader contributions relegated to derivatives and independently-maintained branches?

 

TTT: Tools, Tips and Tweaks

Nagios: system monitoring on the cheap

Have lots of servers but not much time? Tired of finding out that something isn't working from the Help Desk, your users, or (gulp) the CIO?

In that case, Nagios (http://www.nagios.org/) is for you. It is a free, open source software package that runs on your flavor of Unix and monitors your Unix, Linux, NetWare and NT-based servers as well as your printers, switches and routers. We will assume that you have a little bit of knowledge of Linux (since we will be demonstrating the install on a Linux system) and of the platforms you would like to monitor.

How does it work?

Nagios runs as a daemon on your Linux system, just like other services that you might be running, such as an e-mail server, a web server and a file server.

We recommend that you have a dedicated box to run Nagios on, just to make things easier; that's not to say you couldn't run it alongside a bunch of other services on your main file server, just that we recommend you don't. As for system requirements, a middle-of-the-road P3 will do fine. We used a P3 733 with 128MB RAM and a 10GB hard drive to monitor quite a bit of activity. You might be able to get by with something a little less, but remember that this box needs to be able to run your choice of Linux and Apache.

After you configure which hosts and services you want to monitor, Nagios schedules regular checks against each of them. If it finds a problem, Nagios does two things: the web interface served from the monitoring box updates to tell you that "Houston, we have a problem" (if you just leave your web browser of choice on the site, the view refreshes every 90 seconds); and, if configured, you get emails, pages and/or SMS messages.

Once you resolve the issue, Nagios will check again, see that everything is working, and update itself, notifying you that the trouble is resolved. Nagios can perform two types of checks: active and passive. Active checks are run by the box running Nagios against the hosts and services you would like to monitor. Passive checks are performed by the remote hosts, which send their results to Nagios. Usually, you use active checks when there is a regular service you want to monitor (SMTP, IIS, Exchange, Apache, etc.), and passive checks for things that don't happen that regularly or that take place on a host that would be unreachable by a connection initiated from the Nagios server.
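
To make the passive side a bit more concrete, here is a minimal sketch of how a script could hand a service result to Nagios through its external command file. It assumes external commands are enabled in nagios.cfg and that the command file lives at /usr/local/nagios/var/rw/nagios.cmd (check the command_file setting in your own install). The function name and the "Nightly Backup" service are made up for illustration.

```shell
#!/bin/sh
# Format one passive service check result in Nagios's external command syntax:
#   [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
# Return codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
passive_result() {
    # $1 = host name, $2 = service description, $3 = return code, $4 = output text
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4"
}

# Assumed location of the command file; adjust to match your nagios.cfg.
CMD_FILE=${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}

# Example (as a user allowed to write to the command file):
#   passive_result sqltest "Nightly Backup" 0 "Backup finished" > "$CMD_FILE"
```

The host and service named in the result have to match a host and service that Nagios already knows about, or the result will be ignored.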

Getting down and dirty

First things first: the requirements. To start, you need a Linux box. On this Linux box, you should have Apache already installed; if not, use apt-get, rpm, or your package manager of choice to install Apache 1.3 (don't go with 2 for this). You aren't going to need anything except the standard modules, so don't sweat any of that.

Also, you will need to create a nagios user account and a nagios group on your system.
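
If you'd rather script that step, something along these lines should do. This is only a sketch: it assumes the standard groupadd and useradd tools are available, it has to be run as root to have any effect, and the run and ensure_nagios_account helpers are names we made up. Set DRY_RUN=1 to see what it would do without changing anything.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# Create the nagios group and user only if they are missing.
ensure_nagios_account() {
    getent group nagios >/dev/null 2>&1 || run groupadd nagios
    getent passwd nagios >/dev/null 2>&1 || run useradd -g nagios nagios
}

# Example (as root):
#   ensure_nagios_account
```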

Now you are ready to download Nagios. You can either get ready-made binary packages, which exist for every major distribution (e.g. nagios-text, or nagios-pgsql or nagios-mysql if you plan to use a real database instead, plus nagios-plugins for Debian GNU/Linux), or you can compile from source. If you decide to compile it yourself, follow the install instructions (http://nagios.sourceforge.net/docs/1_0/installing.html).

Once Nagios is installed, you will need to configure one host to monitor. We recommend also setting a service to be monitored.

To begin our trip through Nagios's configuration, we are going to start with the main configuration file.

vanhelsing:/usr/local/nagios/etc# vim nagios.cfg-sample
(If you did the samples install above)

A few items to confirm:

enable_notifications=1
date_format=us (Or your format of choice)

You will find each of these settings under its own header in the nagios.cfg file. Make sure you save it as nagios.cfg if you are editing the sample files. After that, we will want to add a host and a service. All IPs have been changed to protect the guilty and/or innocent.
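
If you don't feel like paging through the whole file, a small helper can print what a directive is currently set to. This is a sketch of our own (cfg_value is not part of Nagios); it only understands plain name=value lines and skips commented-out ones.

```shell
#!/bin/sh
# Print the value of a "name=value" directive from a Nagios config file.
# Lines starting with # do not match; if the directive appears more than
# once, the last occurrence is printed.
cfg_value() {
    # $1 = config file, $2 = directive name
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Example:
#   cfg_value /usr/local/nagios/etc/nagios.cfg enable_notifications
#   cfg_value /usr/local/nagios/etc/nagios.cfg date_format
```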

vanhelsing:/usr/local/nagios/etc# vim hosts.cfg

Add the following to your hosts.cfg file, customized for whatever host you want to test with:

# SQLTest host definition
define host{
        use                     generic-host
        host_name               sqltest
        register                1
        alias                   SQL Test Server
        address                 10.1.1.37
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,u,r
        }

An explanation of what this means:

 

use - name of the host template to use; you will already have the generic-host template
host_name - host name (aka computer name for you Windows admins)
register - whether to register this host with Nagios (should always be 1,
  unless you are building a template)
alias - display name in the Nagios alerts and notifications
address - IP address of the host
check_command - check command to use on this host
notification_period - time period during which applicable contacts are notified
notification_options - which state changes trigger a notification
  (d = down, u = unreachable, r = recovered)
max_check_attempts - number of times to retry if the check does not return a host okay
notification_interval - time in minutes to wait before contacts get
  an additional notice that the host is still down

Now for a service:

vanhelsing:/usr/local/nagios/etc# vim services.cfg

As before, customize for your test host.

  # service def for MS SQL
  define service{
        use                     generic-service
        host_name               sqltest
        service_description     MSSQL
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   3
        retry_check_interval    1
        contact_groups          nt-admins
        notification_interval   120
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           check_nt!MSSQLServer
        }

For an explanation of options we have not seen before:

 

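service_description - description of the service being monitored;
  you can make it whatever you want
contact_groups - groups to contact if this service fails
check_command - check command to run to test this service
notification_options - for a service, the letters mean
  w = warning, u = unknown, c = critical, r = recovered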


The check_nt command needs to be added to the checkcommands.cfg file. Add it now if you have NT boxes to watch and are following our examples. Also, don't forget to download and install NSClient on those NT-based machines; the install is quick and simple.
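
As a rough idea of what that command definition can look like, the sketch below appends one to a checkcommands.cfg file of your choosing. Treat the details as assumptions: we use the plugin's SERVICESTATE check so that the text after the "!" in the service definition (MSSQLServer above) is taken as the Windows service name, and we assume NSClient is listening on port 1248. The add_check_nt helper is our own name.

```shell
#!/bin/sh
# Append a check_nt command definition to a checkcommands.cfg file.
# The quoted 'EOF' keeps the shell from touching the Nagios $MACRO$ names.
add_check_nt() {
    # $1 = path to checkcommands.cfg
    cat >> "$1" <<'EOF'

# check_nt: ask NSClient on the target host for the state of a Windows service.
# $ARG1$ is the text after "!" in the service's check_command.
# Port 1248 is an assumption; change it if your NSClient listens elsewhere.
define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -l $ARG1$
        }
EOF
}

# Example:
#   add_check_nt /usr/local/nagios/etc/checkcommands.cfg
```

If your checkcommands.cfg already defines check_nt, edit that definition instead of appending a second one.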

Once you have these files modified, check your configuration. Before you run the check, the rest of the config files have to be in place. We recommend moving most of the samples out of the way, keeping only the following files: checkcommands.cfg, contact-groups.cfg and contacts.cfg. Then create blank files in place of the ones you moved, using touch, for example:

vanhelsing:/usr/local/nagios/bin# touch dependencies.cfg
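
With more than a couple of files, a small loop saves some typing. This is a sketch only: make_placeholders is our own helper, and the file names in the example are placeholders for whatever cfg_file entries your nagios.cfg actually lists. It assumes the config files live in /usr/local/nagios/etc, as in the earlier examples.

```shell
#!/bin/sh
# Create an empty placeholder for each listed config file that doesn't exist yet.
# Files that are already there are left alone.
make_placeholders() {
    # $1 = config directory, remaining arguments = file names
    dir=$1
    shift
    for f in "$@"; do
        [ -e "$dir/$f" ] || touch "$dir/$f"
    done
}

# Example (adapt the list to the cfg_file lines in your nagios.cfg):
#   make_placeholders /usr/local/nagios/etc dependencies.cfg escalations.cfg
```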

At this point, you should have at least one host and one service. Now would be a good time to run the sanity check on your configuration files.

vanhelsing:/usr/local/nagios/bin# ./nagios -v ../etc/nagios.cfg
Nagios 1.2
Copyright (c) 1999-2004 Ethan Galstad (nagios@nagios.org)
Last Modified: 02-02-2004
License: GPL
 
Reading configuration data...
 
Running pre-flight check on configuration data...
 
Checking services...
        Checked 8 services.
Checking hosts...
        Checked 7 hosts. 
...
Checking time periods...
        Checked 4 time periods.
Checking for circular paths between hosts...
Checking for circular service execution dependencies...
Checking global event handlers...
Checking obsessive compulsive service processor command...
Checking misc settings...
 
Total Warnings: 3
Total Errors:   0
 
Things look okay - No serious problems were detected during the pre-flight check
vanhelsing:/usr/local/nagios/bin#

If you come up with warnings, you are usually okay, but errors are bad. For example, if you get a "can't parse file" error or something similar, go back and check your config files for misspellings, brackets, and the like.
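
If you want a script to make that call for you, the "Total Errors" line in the output above is easy to pick out. The sketch below is our own (the function names are not part of Nagios) and relies only on the -v option and the output format shown above. If the check dies before printing a total, the result is treated as a failure.

```shell
#!/bin/sh
# Read "nagios -v" output on stdin and print the number after "Total Errors:".
preflight_errors() {
    sed -n 's/^Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeed only if the pre-flight check reports exactly zero errors.
preflight_ok() {
    # $1 = path to the nagios binary, $2 = main config file
    errors=$("$1" -v "$2" 2>&1 | preflight_errors)
    [ "$errors" = "0" ]
}

# Example:
#   cd /usr/local/nagios/bin
#   preflight_ok ./nagios ../etc/nagios.cfg && echo "safe to start"
```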

After you pass the preflight check, it's time to fire it up.

vanhelsing:/usr/local/nagios/bin# /etc/init.d/nagios start

It should return the PID to you.

To see what is going on via the web, you will need to set up the web interface. This is where a bit of knowledge about Apache comes in handy, but you should be fine if you follow the instructions on how to do so here.

Make sure to enable authentication in the cgi.cfg file, and add the usernames you created for Apache to the parameters that grant access to the global host and service views; otherwise, the web page will not show anything. Look for authorized_for_all_services and authorized_for_all_hosts in the file. Now you are all set and can define contacts and notifications.
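
For reference, the relevant cgi.cfg lines end up looking roughly like what the sketch below prints. The cgi_auth_lines helper and the nagiosadmin user name are just examples of ours; use whatever user name you created for Apache.

```shell
#!/bin/sh
# Print the cgi.cfg lines that turn on authentication and give one web user
# the global host and service views.
cgi_auth_lines() {
    # $1 = user name created for Apache authentication
    cat <<EOF
use_authentication=1
authorized_for_all_services=$1
authorized_for_all_hosts=$1
EOF
}

# Example: print the lines and compare them with your cgi.cfg
#   cgi_auth_lines nagiosadmin
```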

 

/dev/random