Use nagios monitor PostgreSQL archive status
背景
之前写过一篇BLOG:
PostgreSQL Archived in the Cloud
http://blog.163.com/digoal@126/blog/static/163877040201152321027994/
关于归档状态的监控采用nagios来搞定,如下是一个archive_command输出的文件内容:
cat /tmp/pgarchive.nagios_5432
0
status=0,success.remote
2011-07-0113:09:14
第一行是归档状态,在归档命令里面定义。后面的是详细信息。根据第一行的值判断应该返回给NAGIOS什么值。
nagios脚本的返回值0表示success,1表示warnning,2表示critical,3表示unknown.
# 下面是归档命令里面定义的返回值
# NAGIOS返回值 0 status=0,success.remote
# NAGIOS返回值 1 status=1,success.local
# NAGIOS返回值 1 status=2,handwork_pause_can_continue_at_this_point.remote and local
# NAGIOS返回值 2 status=2,failed_can_continue_at_this_point.remote and local
# NAGIOS返回值 2 status=3,handwork_stop_can_not_continue_at_this_point.remote and local
# NAGIOS返回值 2 status=4,switch_check_unknown_can_continue_at_this_point.remote and local
# NAGIOS返回值 2 错误消息
脚本供nrpe调用.
[root@db-192-168-173-66 ~]# cat /usr/local/nagios/libexec/check_postgresql
#!/bin/bash
if [ $# -ne 2 ]; then
echo -e "Usage : $prog \$1 \$2 "
exit 2
fi
DB_PORT=$2
nohup echo -e "q"|telnet -e "q" 127.0.0.1 $DB_PORT >/dev/null 2>&1
if [ $? -ne 0 ]; then
echo -e "PostgreSQL not run in this node"
exit 0
fi
check_archive()
{
test -f /tmp/pgarchive.nagios_$DB_PORT
if [ $? -ne 0 ]; then
RESULT=2
MESSAGE="`date +%F%T`\n/tmp/pgarchive.nagios_$DB_PORT not exist."
echo -e $MESSAGE
return $RESULT
fi
find /tmp/pgarchive.nagios_$DB_PORT -mmin -45|grep "pgarchive.nagios_$DB_PORT"
if [ $? -ne 0 ]; then
RESULT=2
MESSAGE="`date +%F%T`\nPostgreSQL archive timeout."
echo -e $MESSAGE
return $RESULT
fi
RESULT=`head -n 1 /tmp/pgarchive.nagios_$DB_PORT`
cat /tmp/pgarchive.nagios_$DB_PORT
return $RESULT
}
# See how we were called.
case "$1" in
check_archive)
check_archive
;;
*)
echo $"Usage: $prog {check_archive} port"
exit 1
esac
# 2011-07-01 by Digoal.Zhou
chmod 555 /usr/local/nagios/libexec/check_postgresql
把这个脚本加入到nagios配置文件
vi /usr/local/nagios/etc/nrpe.cfg
添加一条
command[check_postgresql1]=/usr/local/nagios/libexec/check_postgresql check_archive 5432
重启xinetd服务
service xinetd restart
在nagios服务端添加监控项 check_postgresql1