PostgreSQL monitor - nagios client installation
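Background
This article describes how to install the Nagios client, which must be installed on every monitored host,
and how to configure monitoring.
I. Client configuration
1. Install nagios-plugins
Download the stable release (1.4.16 when this was written)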
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
[root@db-172-16-3-39 soft_bak]# tar -zxvf nagios-plugins-1.4.16.tar.gz
[root@db-172-16-3-39 soft_bak]# cd nagios-plugins-1.4.16
Add the user that runs the monitoring scripts. This step can be skipped, since you may run the checks as the postgres user instead.
[root@db-172-16-3-39 nagios-plugins-1.4.16]# useradd nagios
Compile and install
[root@db-172-16-3-39 nagios-plugins-1.4.16]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --prefix=/opt/nagios
[root@db-172-16-3-39 nagios-plugins-1.4.16]# make
[root@db-172-16-3-39 nagios-plugins-1.4.16]# make install
[root@db-172-16-3-39 nagios-plugins-1.4.16]# chown nagios:nagios /opt/nagios
[root@db-172-16-3-39 nagios-plugins-1.4.16]# chown -R nagios:nagios /opt/nagios/libexec
2. Install the xinetd service
[root@db-172-16-3-39 nagios-plugins-1.4.16]# yum install -y xinetd
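3. Install NRPE - Nagios Remote Plugin Executor
NRPE runs on the remote host. The Nagios server calls it through check_nrpe to run plugins there and monitor the remote services.
The Nagios server can also monitor remote services with check_by_ssh, but check_by_ssh has a higher CPU overhead and does not scale when many services are monitored, so check_nrpe is recommended.
NRPE download URL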
http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz?r=&ts=1363788540&use_mirror=hivelocity
Extract, build, and install
[root@db-172-16-3-39 soft_bak]# tar -zxvf nrpe-2.14.tar.gz
[root@db-172-16-3-39 soft_bak]# cd nrpe-2.14
[root@db-172-16-3-39 nrpe-2.14]# ./configure --prefix=/opt/nagios
[root@db-172-16-3-39 nrpe-2.14]# make all
[root@db-172-16-3-39 nrpe-2.14]# make install-plugin
[root@db-172-16-3-39 nrpe-2.14]# make install-daemon
[root@db-172-16-3-39 nrpe-2.14]# make install-daemon-config
[root@db-172-16-3-39 nrpe-2.14]# make install-xinetd
Edit the xinetd configuration file for nrpe.
Modify only_from to allow connections from the local IP and the Nagios server IP. Add a log_type setting so that nothing is logged.
[root@db-172-16-3-39 nrpe-2.14]# vi /etc/xinetd.d/nrpe
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /opt/nagios/bin/nrpe
server_args = -c /opt/nagios/etc/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
only_from = 127.0.0.1 172.16.3.33
log_type = FILE /dev/null
}
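The only_from line decides which hosts may reach NRPE, so it is worth confirming that the Nagios server address is really listed before debugging anything else. The following is a minimal sketch; the function name xinetd_allows is hypothetical, and it assumes the addresses are written as plain IPs (xinetd also accepts network and hostname forms, which this sketch does not match).

```shell
# Hypothetical helper: succeed if the IP ($2) appears literally on an
# only_from line of the xinetd service file ($1).
# Assumption: entries are plain IP addresses, one line "only_from = a b c".
xinetd_allows() {
    local file=$1 ip=$2
    awk -v ip="$ip" '
        $1 == "only_from" {
            # Field 2 is the "=" operator; the addresses start at field 3.
            for (i = 3; i <= NF; i++) if ($i == ip) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

For example, `xinetd_allows /etc/xinetd.d/nrpe 172.16.3.33 && echo allowed` would report whether the server used in this article is permitted.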
Edit /etc/services
[root@db-172-16-3-39 nrpe-2.14]# vi /etc/services
nrpe 5666/tcp # NRPE; append this line to the end of the file
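If this step is scripted rather than done in vi, it helps to make it idempotent so that repeated runs do not add duplicate entries. A minimal sketch, assuming a hypothetical function name add_nrpe_service and that an existing entry starts with "nrpe" followed by whitespace and 5666/tcp:

```shell
# Hypothetical helper: append the nrpe entry to a services file ($1)
# only if no nrpe 5666/tcp entry is present yet.
add_nrpe_service() {
    local file=$1
    if ! grep -Eq '^nrpe[[:space:]]+5666/tcp' "$file"; then
        printf 'nrpe\t\t5666/tcp\t\t# NRPE\n' >> "$file"
    fi
}
```

It would be called as `add_nrpe_service /etc/services` (as root).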
After each change to /etc/xinetd.d/nrpe or /opt/nagios/etc/nrpe.cfg, restart the xinetd service.
[root@db-172-16-3-39 nrpe-2.14]# service xinetd restart
Stopping xinetd: [FAILED]
Starting xinetd: [ OK ]
Check that nrpe is listening
[root@db-172-16-3-39 nrpe-2.14]# netstat -anpo|grep xinetd
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 933/xinetd off (0.00/0/0)
unix 2 [ ] DGRAM 7922940 933/xinetd
[root@db-172-16-3-39 nrpe-2.14]# netstat -at|grep nrpe
tcp 0 0 *:nrpe *:* LISTEN
Use the check_nrpe command to check that the nrpe daemon responds.
[root@db-172-16-3-39 nrpe-2.14]# /opt/nagios/libexec/check_nrpe -H localhost
NRPE v2.14
The output above means NRPE is working.
Use check_nrpe to call, over TCP, a command defined in /opt/nagios/etc/nrpe.cfg on the remote host.
[root@db-172-16-3-39 nrpe-2.14]# /opt/nagios/libexec/check_nrpe -H localhost -c check_load
OK - load average: 0.00, 0.02, 0.00|load1=0.000;15.000;30.000;0; load5=0.020;10.000;25.000;0; load15=0.000;5.000;20.000;0;
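The output above follows the usual plugin layout: status text, then a "|", then performance data items of the form label=value;warn;crit;min;max. As a rough sketch of how that line can be taken apart, the hypothetical helper perf_value below prints the value of one label. It assumes labels contain no spaces or quotes, which holds for the check_load output shown here.

```shell
# Hypothetical helper: print the value of one performance-data label ($2)
# from a single line of plugin output ($1).
# Assumption: labels are unquoted and contain no whitespace.
perf_value() {
    local output=$1 label=$2 perf item
    case $output in
        *"|"*) perf=${output#*"|"} ;;   # everything after the first "|"
        *)     return 1 ;;              # no performance data present
    esac
    for item in $perf; do               # items are separated by spaces
        if [ "${item%%=*}" = "$label" ]; then
            item=${item#*=}             # drop "label="
            printf '%s\n' "${item%%;*}" # keep only the value field
            return 0
        fi
    done
    return 1
}
```

With the check_load line above as the first argument, `perf_value "$line" load5` prints 0.020.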
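The Nagios server also checks remote services by calling check_nrpe, so the check_nrpe plugin must be installed on the Nagios server.
II. Nagios server configuration
1. Install the check_nrpe plugin. Note that the prefix differs from the one used on the client, and that only the plugin needs to be installed.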
[root@db-172-16-3-33 soft_bak]# tar -zxvf nrpe-2.14.tar.gz
[root@db-172-16-3-33 soft_bak]# cd nrpe-2.14
[root@db-172-16-3-33 nrpe-2.14]# ./configure --prefix=/opt/nagios-3.5.0
[root@db-172-16-3-33 nrpe-2.14]# make all
[root@db-172-16-3-33 nrpe-2.14]# make install-plugin
2. Configure the main configuration file /opt/nagios-3.5.0/etc/nagios.cfg
# You can specify individual object config files as shown below:
cfg_file=/opt/nagios-3.5.0/etc/objects/commands.cfg
cfg_file=/opt/nagios-3.5.0/etc/objects/contacts.cfg
cfg_file=/opt/nagios-3.5.0/etc/objects/timeperiods.cfg
cfg_file=/opt/nagios-3.5.0/etc/objects/templates.cfg
# Definitions for monitoring the local (Linux) host
cfg_file=/opt/nagios-3.5.0/etc/objects/localhost.cfg
Nagios parses and loads these configuration files at startup.
Edit /opt/nagios-3.5.0/etc/objects/commands.cfg
and add a check_nrpe command definition.
[root@db-172-16-3-33 db_servers]# vi /opt/nagios-3.5.0/etc/objects/commands.cfg
# add by digoal
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
The $USER1$ macro used above is defined in the resource file:
/opt/nagios-3.5.0/etc/resource.cfg
# Sets $USER1$ to be the path to the plugins
$USER1$=/opt/nagios-3.5.0/libexec
# Sets $USER2$ to be the path to event handlers
#$USER2$=/opt/nagios-3.5.0/libexec/eventhandlers
# Store some usernames and passwords (hidden from the CGIs)
#$USER3$=someuser
#$USER4$=somepassword
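To make the macro substitution concrete, here is a small sketch of how a service's check_command value such as check_nrpe!check_load becomes a command line. The function expand_command is hypothetical and only illustrates the idea; it handles a single argument after the "!" and is not how Nagios itself is implemented.

```shell
# Hypothetical helper that mimics macro expansion for illustration.
# $1 = command_line template, $2 = plugin directory ($USER1$),
# $3 = host address ($HOSTADDRESS$), $4 = check_command from the service.
expand_command() {
    local line=$1 user1=$2 addr=$3 check=$4 arg1=
    # The text after the first "!" is taken as $ARG1$.
    case $check in
        *'!'*) arg1=${check#*'!'} ;;
    esac
    line=${line//'$USER1$'/$user1}
    line=${line//'$HOSTADDRESS$'/$addr}
    line=${line//'$ARG1$'/$arg1}
    printf '%s\n' "$line"
}
```

Called with the template from commands.cfg, the plugin path from resource.cfg, the address 172.16.3.39, and check_nrpe!check_load, it prints /opt/nagios-3.5.0/libexec/check_nrpe -H 172.16.3.39 -c check_load.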
(Optional) Define a Linux host template named linux-box. The stock templates are also in templates.cfg, and they are used later as well:
[root@db-172-16-3-33 db_servers]# vi /opt/nagios-3.5.0/etc/objects/templates.cfg
# add by digoal
# host template linux-box
define host{
name linux-box ; Name of this template
use generic-host ; Inherit default values
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 30
notification_options d,r
contact_groups admins
register 0 ; DONT REGISTER THIS - ITS A TEMPLATE
}
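You can also use a custom configuration directory: at startup Nagios parses and loads every file in it whose name ends in .cfg.
Edit the main configuration file and add the following line to the OBJECT CONFIGURATION FILE section: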
cfg_dir=/opt/nagios-3.5.0/etc/db_servers
Create the directory and set its ownership
[root@db-172-16-3-33 etc]# mkdir -p /opt/nagios-3.5.0/etc/db_servers
[root@db-172-16-3-33 etc]# chown nagios:nagios db_servers
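Because only files ending in .cfg are picked up from a cfg_dir, a file saved as hosts.cfg.bak or hosts.conf is silently ignored. A quick way to see which files would be considered is sketched below; list_cfg_files is a hypothetical name, and the sketch assumes subdirectories are searched as well.

```shell
# Hypothetical helper: list the files under a cfg_dir ($1) whose names
# end in .cfg, including those in subdirectories, sorted by path.
list_cfg_files() {
    find "$1" -type f -name '*.cfg' | sort
}
```

For the directory created above this would be `list_cfg_files /opt/nagios-3.5.0/etc/db_servers`.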
Create the configuration files in this directory.
Configure the host:
[root@db-172-16-3-33 etc]# cd /opt/nagios-3.5.0/etc/db_servers/
[root@db-172-16-3-33 db_servers]# vi hosts.cfg
define host{
use linux-box ; Inherit default values from a template
host_name db_3_39 ; The name we're giving to this server
alias postgresql_3_39 ; A longer name for the server
address 172.16.3.39 ; IP address of the server
}
[root@db-172-16-3-33 db_servers]# chown nagios:nagios hosts.cfg
Configure the services for this host:
[root@db-172-16-3-33 db_servers]# cd /opt/nagios-3.5.0/etc/db_servers/
[root@db-172-16-3-33 db_servers]# vi services.cfg
define service{
use generic-service
host_name db_3_39 ; this is host.host_name
service_description Current Users
check_command check_nrpe!check_users ; check_users is a command defined in /opt/nagios/etc/nrpe.cfg on the client; the same applies to the services below.
}
define service{
use generic-service
host_name db_3_39
service_description CPU Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name db_3_39
service_description /dev/hda1 Free Space
check_command check_nrpe!check_hda1
}
define service{
use generic-service
host_name db_3_39
service_description Total Processes
check_command check_nrpe!check_total_procs
}
define service{
use generic-service
host_name db_3_39
service_description Zombie Processes
check_command check_nrpe!check_zombie_procs
}
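The five definitions above differ only in service_description and in the NRPE command name, so for more hosts they can be generated instead of typed. A minimal sketch, assuming a hypothetical function service_stanza and the generic-service template used in this article:

```shell
# Hypothetical helper: print one service definition.
# $1 = host_name, $2 = service_description, $3 = command name in nrpe.cfg.
service_stanza() {
    printf 'define service{\n'
    printf '        use                     generic-service\n'
    printf '        host_name               %s\n' "$1"
    printf '        service_description     %s\n' "$2"
    printf '        check_command           check_nrpe!%s\n' "$3"
    printf '}\n'
}
```

For example, `service_stanza db_3_39 "CPU Load" check_load >> services.cfg` would append the second definition shown above.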
Set the ownership of the configuration files
[root@db-172-16-3-33 db_servers]# chown nagios:nagios *.cfg
Verify that the configuration is valid
[root@db-172-16-3-33 db_servers]# /opt/nagios-3.5.0/bin/nagios -v /opt/nagios-3.5.0/etc/nagios.cfg
If the configuration has no errors, restart Nagios
[root@db-172-16-3-33 db_servers]# service nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.
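The verify step and the restart can be chained so that a broken configuration never reaches a restart. The wrapper below is a sketch under one assumption: that the verification command exits with a non-zero status when it finds errors. The name restart_if_valid is hypothetical.

```shell
# Hypothetical wrapper: run a restart command only when the config check
# passes. $1 = nagios binary, $2 = main config file; the remaining
# arguments are the restart command to run.
# Assumption: "<binary> -v <cfg>" exits non-zero on configuration errors.
restart_if_valid() {
    local nagios_bin=$1 cfg=$2
    shift 2
    if "$nagios_bin" -v "$cfg" > /dev/null; then
        "$@"
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

With the paths from this article it would be invoked as `restart_if_valid /opt/nagios-3.5.0/bin/nagios /opt/nagios-3.5.0/etc/nagios.cfg service nagios restart`.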
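The Nagios web interface then shows the new host and its services (screenshot not included here).
PENDING means the check has not run yet, so the status is still unknown.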
III. The corresponding nrpe.cfg configuration on the client
[root@db-172-16-3-39 nrpe-2.14]# vi /opt/nagios/etc/nrpe.cfg
command[check_users]=/opt/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/opt/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/opt/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_zombie_procs]=/opt/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/opt/nagios/libexec/check_procs -w 150 -c 200
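Custom checks are configured the same way, by adding command entries like these. After changing the configuration, restart the xinetd service.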
[root@db-172-16-3-39 nrpe-2.14]# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
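Every check_nrpe!name on the server must have a matching command[name] entry in the client's nrpe.cfg, so a small lookup is handy when a check is reported as undefined. The helper below is a sketch; nrpe_command is a hypothetical name, and it assumes each definition is on one line that starts with command[name]= and has no leading whitespace.

```shell
# Hypothetical helper: print the command line that nrpe.cfg ($1) defines
# for a command name ($2); return non-zero if there is no such entry.
nrpe_command() {
    local file=$1 name=$2 prefix line
    prefix="command[$name]="
    while IFS= read -r line; do
        case $line in
            # The quoted prefix is matched literally, brackets included.
            "$prefix"*) printf '%s\n' "${line#"$prefix"}"; return 0 ;;
        esac
    done < "$file"
    return 1
}
```

For example, `nrpe_command /opt/nagios/etc/nrpe.cfg check_load` prints the check_load command line configured above.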
References
1. http://blog.163.com/digoal@126/blog/static/16387704020135313354383/
2. http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz?r=&ts=1363788540&use_mirror=hivelocity
3. nrpe-2.14/docs/NRPE.pdf