wassup

Host Monitor


wassup is a bash script that pings a target host to see whether it is online. If the target host does not respond to the ping, wassup sends email alerts to the specified addresses. When the target responds again on a later run, wassup sends another notification to report that the host is back online.

wassup requires no special privileges and may be run as an ordinary user. For best results, that user should be able to schedule wassup in a crontab. For example, to run it every 5 minutes:

*/5 * * * * /home/bob/wassup/wassup 192.168.1.2 >/dev/null

Installation

Untar wassup and copy the entire wassup directory to the desired location in a user's home directory.

To configure, edit SAFETY, BASEDIR, and MAILTO, which appear at the beginning of the script.

The safety host is a reliable host that is always online, pingable, and reachable from the monitoring machine. If the target host doesn't respond to a ping, wassup checks the safety host to confirm that the network is available, preventing false alarms.

SAFETY="www.example.com"

Lockfiles and logs will be written in BASEDIR, which must be writable by the user running wassup. This is also where the bodies of the notification messages will be stored.

BASEDIR=/home/bob/wassup

Specify the notification address, or a list of addresses separated by commas with no spaces. These accounts must remain reachable while the target is down; for example, don't send alerts to a mailbox hosted on the machine being monitored.

MAILTO=bob
MAILTO=bob,bob@example.com,5551234567@mobile.example.net
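The mail command wassup itself uses isn't shown here, so the following is only a minimal sketch of one way to deliver a notification to a comma-separated MAILTO list. It assumes a `mail` command (as in common mailx implementations) that takes the subject with `-s` and recipients as separate arguments; the function names `split_recipients` and `notify` are hypothetical.

```bash
#!/bin/bash
# Sketch: mail one message file to every address in a comma-separated
# MAILTO list. Assumes a mail(1) command that takes a subject with -s and
# one or more recipients as separate arguments, as common mailx variants do.

# split_recipients LIST: print one address per line.
split_recipients() {
    local -a addrs
    IFS=',' read -r -a addrs <<< "$1"
    printf '%s\n' "${addrs[@]}"
}

# notify SUBJECT BODY_FILE: send BODY_FILE to everyone listed in $MAILTO.
notify() {
    local subject=$1 body=$2
    local -a addrs
    IFS=',' read -r -a addrs <<< "$MAILTO"
    mail -s "$subject" "${addrs[@]}" < "$body"
}
```

For example, with `MAILTO=bob,bob@example.com,5551234567@mobile.example.net`, calling `notify "wassup: 192.168.1.2 down" "$MESSAGE_DOWN"` passes three recipient arguments to `mail`. Splitting on commas is also why the list must not contain spaces: in this sketch a space would become part of the neighbouring address.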
  

Optionally, edit the notification messages:

MESSAGE_DOWN=$BASEDIR/message_down
MESSAGE_UP=$BASEDIR/message_up
  

These files contain the bodies of the email notifications. I have included samples designed to be informative, yet short enough to display on a cell phone or pager. To provide custom messages for each target, change the variable to:

MESSAGE_DOWN=$BASEDIR/${TARGET}.message_down

and make sure matching files exist in the base directory for each monitored host. This obviously increases the maintenance overhead of the application, so avoid this customization unless there is a pressing need.
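If you do switch to per-target messages, a quick check like the following can catch a missing file before wassup needs it. It is a sketch that assumes the `${TARGET}.message_down` naming shown above and, by analogy, a matching `${TARGET}.message_up` if you customize MESSAGE_UP the same way; the function `missing_messages` is hypothetical.

```bash
#!/bin/bash
# Sketch: confirm that per-target message files exist for each host.
# Assumes the $BASEDIR/${TARGET}.message_down naming and, by analogy,
# $BASEDIR/${TARGET}.message_up.

# missing_messages BASEDIR TARGET...: print each missing file;
# return 1 if anything is missing, 0 otherwise.
missing_messages() {
    local basedir=$1 target kind status=0
    shift
    for target in "$@"; do
        for kind in message_down message_up; do
            if [ ! -f "$basedir/$target.$kind" ]; then
                echo "$basedir/$target.$kind"
                status=1
            fi
        done
    done
    return "$status"
}
```

For example, `missing_messages /home/bob/wassup 192.168.1.2 192.168.1.3` prints the path of every file that still needs to be written and exits nonzero if there are any.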

Operation

When wassup runs, it attempts to ping the target host. If the target can't be reached, it tries the safety host. If the safety host does not respond either, wassup writes a log entry and exits. Because this indicates a problem with the local network or interface, no notification is sent.

If the safety host responds, wassup pings the target host again. If the target still doesn't respond, wassup assumes the target is offline and checks for a lockfile. If no lockfile exists, it makes a log entry, emails the alert, and creates a lockfile to prevent repeat notifications on later runs.

If the target host responds to the initial ping, wassup checks for a lockfile. The presence of a lockfile indicates that the target was previously offline, so wassup writes to the log, sends a notification that the host is back online, and removes the lockfile.
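The flow above can be sketched as a bash function. This is an illustration, not wassup's actual code: the lockfile name `$BASEDIR/<target>.lock`, the function names, and the ping options (`-c 1 -W 2`, as understood by Linux iputils ping) are assumptions, and each branch prints the action it would take instead of logging and mailing. The text does not say what happens when the second ping of the target succeeds; the sketch treats that the same as an initial success.

```bash
#!/bin/bash
# Sketch of the check flow described above, not wassup's actual code.
# Assumptions: Linux iputils ping (-c = count, -W = timeout in seconds)
# and a hypothetical per-target lockfile named $BASEDIR/<target>.lock.
# Instead of logging and mailing, each branch prints the action taken.

host_up() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# target_is_up TARGET LOCK: a recovery notice is due only if a lockfile exists.
target_is_up() {
    if [ -e "$2" ]; then
        rm -f "$2"
        echo "recovered $1"          # real script: log, mail MESSAGE_UP
    else
        echo "ok $1"                 # nothing to do
    fi
}

# check_target TARGET: uses $SAFETY and $BASEDIR.
check_target() {
    local target=$1
    local lock="$BASEDIR/$target.lock"

    if host_up "$target"; then
        target_is_up "$target" "$lock"
        return 0
    fi

    # Target is silent: is the local network working at all?
    if ! host_up "$SAFETY"; then
        echo "network_down $SAFETY"  # real script: log and exit, no mail
        return 1
    fi

    # Network is fine, so give the target one more try.
    if host_up "$target"; then
        target_is_up "$target" "$lock"   # assumption: retry success = up
        return 0
    fi

    if [ -e "$lock" ]; then
        echo "still_down $target"    # already alerted; stay quiet
    else
        touch "$lock"
        echo "down $target"          # real script: log, mail MESSAGE_DOWN
    fi
    return 0
}
```

With `SAFETY` and `BASEDIR` set, `check_target 192.168.1.2` prints one of `ok`, `down`, `still_down`, `recovered`, or `network_down` followed by a host name. The lockfile is the only state carried between cron runs, which is why only the first failed run of an outage sends mail.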

Logging

wassup's logging is very simple, using parameter=value pairs. It logs only when the safety host can't be reached, when the target goes offline, and when it comes back online. This keeps the log small enough that it will probably never need to be rotated. If it does grow large, something is seriously wrong with either your script variables or your network.

To disable logging, comment out any lines that echo to the logfile.
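As an illustration of the parameter=value style, the sketch below appends one line per event. The default log path `$BASEDIR/wassup.log`, the `LOGFILE` override, the function `log_event`, and the field names `date`, `target`, and `status` are all hypothetical, not taken from wassup.

```bash
#!/bin/bash
# Sketch: append one line of parameter=value pairs per event.
# The default log path and the field names (date, target, status) are
# hypothetical; wassup's actual keys may differ.

# log_event TARGET STATUS: append a single timestamped line to the log.
log_event() {
    local logfile=${LOGFILE:-$BASEDIR/wassup.log}
    printf 'date=%s target=%s status=%s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" "$1" "$2" >> "$logfile"
}
```

Because every entry is a single line, standard tools are enough to query the log; for example, `grep -c 'status=down' "$BASEDIR/wassup.log"` counts recorded outages. Routing all log writes through one function like this would also reduce disabling logging to editing a single place rather than commenting out each echo.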

Troubleshooting

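In order to work, wassup requires that both the target and the safety host respond to pings. A target that blocks ICMP echo requests (for example, behind a firewall) will always appear to be offline and trigger false alerts.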

Test wassup on the command line before running it in cron, using both a live target and an unreachable host. Once things are working, add it to the user's crontab.
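Before scheduling wassup, a quick way to confirm that both the target and the safety host answer pings is a small loop like this sketch. The function name `ping_report` is hypothetical, and the `-c 1 -W 2` options assume Linux iputils ping (on macOS and FreeBSD, `-W` takes milliseconds instead of seconds).

```bash
#!/bin/bash
# Sketch: report whether each host answers a single ping.
# Assumes Linux iputils ping (-c = count, -W = timeout in seconds).

# ping_report HOST...: print "HOST up" or "HOST down"; return 1 if any is down.
ping_report() {
    local host status=0
    for host in "$@"; do
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host up"
        else
            echo "$host down"
            status=1
        fi
    done
    return "$status"
}
```

For instance, `ping_report 192.168.1.2 www.example.com` checks a target and a safety host in one go. For the unreachable-host test, an address from the documentation-only range 192.0.2.0/24 (TEST-NET-1, RFC 5737), such as 192.0.2.1, should never answer, so running `/home/bob/wassup/wassup 192.0.2.1` by hand exercises the alert path.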

Be sure to use the complete path to the wassup script in the crontab. Also make sure that the script is executable.

BASEDIR and its files must be writable by the user running wassup.

The first line of the script (the #! line) must point to the system's bash executable.
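The checks above can be rolled into a small preflight function that you run once before editing the crontab. It is a sketch: the name `preflight` and its messages are hypothetical, and it only verifies what this section lists (an absolute path, execute permission, a writable BASEDIR and files, and a working interpreter line).

```bash
#!/bin/bash
# Sketch of a preflight check for the troubleshooting items above.
# The function name and messages are illustrative, not part of wassup.

# preflight SCRIPT BASEDIR: print each problem found; return 1 if any.
preflight() {
    local script=$1 basedir=$2 problems=0 shebang="" interp f

    # cron runs with a minimal environment, so use an absolute path.
    case "$script" in
        /*) ;;
        *) echo "not an absolute path: $script"; problems=1 ;;
    esac

    if [ ! -x "$script" ]; then
        echo "not executable: $script"; problems=1
    fi

    # BASEDIR and everything in it must be writable by this user.
    if [ ! -d "$basedir" ] || [ ! -w "$basedir" ]; then
        echo "BASEDIR not writable: $basedir"; problems=1
    fi
    for f in "$basedir"/*; do
        if [ -e "$f" ] && [ ! -w "$f" ]; then
            echo "not writable: $f"; problems=1
        fi
    done

    # The #! line must name an existing interpreter, normally bash.
    if [ -r "$script" ]; then
        shebang=$(head -n 1 "$script")
    fi
    interp=${shebang#'#!'}
    interp=${interp%% *}
    if [ "${shebang:0:2}" != "#!" ] || [ ! -x "$interp" ]; then
        echo "first line is not a valid interpreter line: $shebang"; problems=1
    fi

    return "$problems"
}
```

Typical use: `preflight /home/bob/wassup/wassup /home/bob/wassup && echo ready`.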

Download

wassup.tgz

Read

README
wassup
message_up
message_down

Changelog

v.20080805:

v.20031008:

v.20011129:

