Use nagios monitor PostgreSQL archive status

1 minute read

背景

之前写过一篇BLOG:
PostgreSQL Archived in the Cloud

http://blog.163.com/digoal@126/blog/static/163877040201152321027994/

关于归档状态的监控采用nagios来搞定,如下是一个archive_command输出的文件内容:

cat /tmp/pgarchive.nagios_5432  
0  
status=0,success.remote  
2011-07-0113:09:14  

第一行是归档状态,在归档命令里面定义。后面的是详细信息。根据第一行的值判断应该返回给NAGIOS什么值。

nagios脚本的返回值0表示success,1表示warnning,2表示critical,3表示unknown.

# 下面是归档命令里面定义的返回值  
  
# NAGIOS返回值 0 status=0,success.remote  
# NAGIOS返回值 1 status=1,success.local  
# NAGIOS返回值 1 status=2,handwork_pause_can_continue_at_this_point.remote and local  
# NAGIOS返回值 2 status=2,failed_can_continue_at_this_point.remote and local  
# NAGIOS返回值 2 status=3,handwork_stop_can_not_continue_at_this_point.remote and local  
# NAGIOS返回值 2 status=4,switch_check_unknown_can_continue_at_this_point.remote and local  
# NAGIOS返回值 2 错误消息  

脚本供nrpe调用.

[root@db-192-168-173-66 ~]# cat /usr/local/nagios/libexec/check_postgresql   
  
  
#!/bin/bash  
  
if [ $# -ne 2 ]; then  
echo -e "Usage : $prog \$1 \$2 "  
exit 2  
fi  
  
DB_PORT=$2  
  
nohup echo -e "q"|telnet -e "q" 127.0.0.1 $DB_PORT >/dev/null 2>&1  
if [ $? -ne 0 ]; then  
echo -e "PostgreSQL not run in this node"  
exit 0  
fi  
  
check_archive()  
{  
test -f /tmp/pgarchive.nagios_$DB_PORT  
if [ $? -ne 0 ]; then  
RESULT=2  
MESSAGE="`date +%F%T`\n/tmp/pgarchive.nagios_$DB_PORT not exist."  
echo -e $MESSAGE  
return $RESULT  
fi  
  
find /tmp/pgarchive.nagios_$DB_PORT -mmin -45|grep "pgarchive.nagios_$DB_PORT"  
if [ $? -ne 0 ]; then  
RESULT=2  
MESSAGE="`date +%F%T`\nPostgreSQL archive timeout."  
echo -e $MESSAGE  
return $RESULT  
fi  
  
RESULT=`head -n 1 /tmp/pgarchive.nagios_$DB_PORT`  
cat /tmp/pgarchive.nagios_$DB_PORT  
return $RESULT  
}  
  
# See how we were called.  
case "$1" in  
  check_archive)  
        check_archive  
        ;;  
  *)  
        echo $"Usage: $prog {check_archive} port"  
        exit 1  
esac  
# 2011-07-01 by Digoal.Zhou  
chmod 555 /usr/local/nagios/libexec/check_postgresql  

把这个脚本加入到nagios配置文件

vi /usr/local/nagios/etc/nrpe.cfg  
  
添加一条  
command[check_postgresql1]=/usr/local/nagios/libexec/check_postgresql check_archive 5432  

重启xinetd服务

service xinetd restart  

在nagios服务端添加监控项 check_postgresql1

Flag Counter

digoal’s 大量PostgreSQL文章入口