PostgreSQL partition table or inherits table predict count and gap and privilege monitor
背景
当管理的数据库越来越多, 分区表也越来越多, 一个人可能要维护几十万的分区表. 对于分表的监控显得尤为重要.
本文基于PostgreSQL 9.2 进行讲解, 其他版本可能需要略微调整.
PostgreSQL 的分区表除了性能方面需要加强以外, 管理方面也需要加强.
例如,
一、分区表的气泡, 例如以下情况 :
父表:
tbl
子表 :
tbl_2012_01
tbl_2012_02
tbl_2012_03
tbl_2012_05
tbl_2012_06
以上分区表tbl_2012_04不存在, 说明存在气泡.
二、分区表的子表权限和父表权限不一致, 例如以下情况 :
一般来说父表权限应该和子表权限一致, 什么时候可能遇到不一致的情况呢?
1. 新增了子表, 忘记给新增的子表赋予权限.
例如系统中有另外的用户需要读分区表, 在一段时间后新增了子表, 很有可能会忘记赋予权限.
PostgreSQL 9.0开始, 只要给主表赋权, 直接操作主表的话, 不需要子表权限. 但是直接查询子表的话, 还是需要赋权的.
Note how table access permissions are handled. Querying a parent table can automatically access data in child tables without further access privilege checking.
This preserves the appearance that the data is (also) in the parent table.
Accessing the child tables directly is, however, not automatically allowed and would require further privileges to be granted.
三、分区表的过期
例如以下情况 :
父表:
tbl
子表 :
tbl_2012_12
tbl_2013_01
那么到2013年02月份的时候, 子表tbl_2013_02还不存在, 通常会报表不存在的错误.
接下来将针对以上几种情况进行监控, 假设表名遵循以下规则 :
1. 表名中的日期使用标准的yyyy, mm, dd 格式.
例如, tbl_2012_1_1 是不合法的,
应该改成tbl_2012_01_01.
2. 继承表的表名应该包含父表的表名作为前缀.
例如, tbl, tbl_2014.
监控还需要用到pg_inherits这个系统表, 其中inhseqno字段需要注意一下 :
If there is more than one direct parent for a child table (multiple inheritance),
this number tells the order in which the inherited columns are to be arranged. The count starts at 1.
当一个表直接继承多个父表的时候, 可用inhseqno来区分.
如下 :
digoal=> create table t1(id int);
CREATE TABLE
digoal=> create table t2(id int);
CREATE TABLE
digoal=> create table t(id int) inherits(t1);
NOTICE: merging column "id" with inherited definition
CREATE TABLE
digoal=> alter table t inherit t2;
ALTER TABLE
digoal=> select inhparent::regclass,inhrelid::regclass,inhseqno from pg_inherits ;
inhparent | inhrelid | inhseqno
-----------+----------+----------
t1 | t | 1
t2 | t | 2
(2 rows)
解除继承后重新继承的话, inhseqno会增加.
digoal=> alter table t no inherit t1;
ALTER TABLE
digoal=> alter table t inherit t1;
ALTER TABLE
digoal=> select inhparent::regclass,inhrelid::regclass,inhseqno from pg_inherits ;
inhparent | inhrelid | inhseqno
-----------+----------+----------
t2 | t | 2
t1 | t | 3
(2 rows)
在被监控的数据库中创建2个函数, 1个视图 :
通过pg_class.oid获取权限的函数, 用于比较子表的权限和父表的权限是否一致 .
create or replace function mon_part_get_acl(i_oid oid) returns aclitem[] as $$
declare
result aclitem[] := '{}'::aclitem[];
begin
select relacl into result from pg_class where oid=i_oid and relacl is not null;
if found then
return result;
else
return '{}'::aclitem[];
end if;
exception when others then
return '{}'::aclitem[];
end;
$$ language plpgsql;
把表名的数字转成日期的函数, 用于分析gap等.
create or replace function mon_part_conv_to_date(i_relname text, i_prefix text) returns date as $$
declare
v_len int;
v_date text := '';
i text := '';
v_suffix text := '';
begin
v_suffix := replace(i_relname,i_prefix,'');
FOREACH i IN ARRAY regexp_split_to_array(v_suffix, '')
loop
if (i >= '0' and i <= '9') then
v_date := v_date||i;
end if;
end loop;
v_len := length(v_date);
-- the first 2str must 18,19,20,21,22
if (substr(v_date,1,2) >= '18' and substr(v_date,1,2) <= '22' ) then
if (v_len >= 4 and v_len <=8) then
case v_len
when 4 then
return (v_date||'0101')::date;
when 6 then
return (v_date||'01')::date;
when 8 then
return v_date::date;
else
return '19700101'::date;
end case;
else
-- not a valid date
return '22220101';
end if;
else
-- not a valid date
return '22220101';
end if;
exception when others then
raise notice 'partition table name date format error.';
return '19700101'::date;
end;
$$ language plpgsql;
视图 :
根据间隔的平均值来评估是否出现了分区异常.
create or replace view v_mon_partition_tables as
select
*,
(max_pdate-current_date)/predict_gap::numeric AS future_tables_cnt
from
(
select
*,
pdate-lag_pdate AS gap,
case
when pdate-lag_pdate >=365 then -- 预计按年分区
pdate - date_trunc('day', (pdate - interval '1 year'))::date
when pdate-lag_pdate >=28 then -- 预计按月分区
pdate - date_trunc('day', (pdate - interval '1 month'))::date
when pdate-lag_pdate >=1 then -- 预计按日分区
1
else
0.9 -- 分区间隔小于1天
end AS predict_gap
from
(
select
row_number() over (partition by inhparent order by mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text)) AS rn,
current_database() AS dbname,
inhparent::regclass::text AS inhparent,
lag(inhrelid::regclass::text,1) over (partition by inhparent order by mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text)) AS lag_inhrelid,
inhrelid::regclass::text AS inhrelid,
mon_part_get_acl(inhparent) AS acl_inhparent,
mon_part_get_acl(inhrelid) AS acl_inhrelid,
mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text) AS pdate,
lag(mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text), 1) over (partition by inhparent order by mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text)) AS lag_pdate,
max(mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text)) over (partition by inhparent) AS max_pdate
from pg_inherits
) t
) t;
视图结构如下 :
View "payment.v_mon_partition_tables"
Column | Type | Modifiers
-------------------+-----------+-----------
rn | bigint |
dbname | name |
inhparent | text |
lag_inhrelid | text |
inhrelid | text |
acl_inhparent | aclitem[] |
acl_inhrelid | aclitem[] |
pdate | date |
lag_pdate | date |
max_pdate | date |
gap | integer |
predict_gap | numeric |
future_tables_cnt | numeric |
监控日期继承表存在气泡SQL,
select * from v_mon_partition_tables
where
gap<>predict_gap
and pdate <> '22220101'::date
and lag_pdate is not null;
监控过期的子表
例如保持未来有10个子表可用,并且未来2年(731day)都有子表已经创建,
select * from
(
select dbname,inhparent,lag_inhrelid,inhrelid,max_pdate,predict_gap,future_tables_cnt,row_number() over (partition by inhparent order by pdate desc) AS rn
from v_mon_partition_tables
where
(future_tables_cnt<=10 or max_pdate - current_date <731)
and pdate <> '22220101'::date
and lag_pdate is not null
) t order by rn;
继承表的权限是否一致,
select * from v_mon_partition_tables
where acl_inhparent<>acl_inhrelid;
测试 :
1. 监控日期继承表存在气泡SQL,
digoal=> create table tbl(id int, crt_time timestamp(0));
CREATE TABLE
digoal=> create table tbl_201201 (id int, crt_time timestamp(0)) inherits(tbl);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
digoal=> create table tbl_201202 (id int, crt_time timestamp(0)) inherits(tbl);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
digoal=> create table tbl_201204 (id int, crt_time timestamp(0)) inherits(tbl);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
digoal=> create table tbl_201205 (id int, crt_time timestamp(0)) inherits(tbl);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
digoal=> select * from v_mon_partition_tables
where
gap<>predict_gap
and pdate <> '22220101'::date
and lag_pdate is not null;
dbname | inhparent | inhrelid | pdate | lag_pdate | gap | predict_gap
--------+-----------+------------+------------+------------+-----+-------------
digoal | tbl | tbl_201204 | 2012-04-01 | 2012-02-01 | 60 | 31
(1 row)
2. 何时过期监控SQL, 例如保持未来有10个子表可用,并且未来2年(731day)都有子表已经创建,
select * from
(
select dbname,inhparent,lag_inhrelid,inhrelid,max_pdate,predict_gap,future_tables_cnt,row_number() over (partition by inhparent order by pdate desc) AS rn
from v_mon_partition_tables
where
(future_tables_cnt<=10 or max_pdate - current_date <731)
and pdate <> '22220101'::date
and lag_pdate is not null
) t order by rn;
dbname | inhparent | inhrelid | pdate | max_pdate | gap | future_tables_cnt
--------+-----------+------------+------------+------------+-----+---------------------
digoal | tbl | tbl_201202 | 2012-02-01 | 2012-05-01 | 31 | -8.0000000000000000
digoal | tbl | tbl_201204 | 2012-04-01 | 2012-05-01 | 60 | -8.0000000000000000
digoal | tbl | tbl_201205 | 2012-05-01 | 2012-05-01 | 30 | -8.2666666666666667
(3 rows)
创建超过未来2年的子表(现在是2013-01) :
create table tbl_201206 (id int, crt_time timestamp(0)) inherits(tbl);
.....................
create table tbl_201507 (id int, crt_time timestamp(0)) inherits(tbl);
再次执行监控SQL :
select * from
(
select dbname,inhparent,lag_inhrelid,inhrelid,max_pdate,predict_gap,future_tables_cnt,row_number() over (partition by inhparent order by pdate desc) AS rn
from v_mon_partition_tables
where
(future_tables_cnt<=10 or max_pdate - current_date <731)
and pdate <> '22220101'::date
and lag_pdate is not null
) t order by rn;
dbname | inhparent | inhrelid | pdate | max_pdate | gap | future_tables_cnt
--------+-----------+----------+-------+-----------+-----+-------------------
(0 rows)
3. 继承表的权限是否一致,
当前未赋任何权限, 所以权限是一致的.
digoal=> select dbname,inhparent,inhrelid,acl_inhparent,acl_inhrelid from v_mon_partition_tables
digoal-> where acl_inhparent<>acl_inhrelid;
dbname | inhparent | inhrelid | acl_inhparent | acl_inhrelid
--------+-----------+----------+---------------+--------------
(0 rows)
当修改父表的权限后, 如果未给子表赋予权限, 那么权限将不一致, 所以查询有结果 :
digoal=> grant select on tbl to skycac;
GRANT
digoal=> select dbname,inhparent,inhrelid,acl_inhparent,acl_inhrelid from v_mon_partition_tables
where acl_inhparent<>acl_inhrelid;
dbname | inhparent | inhrelid | acl_inhparent | acl_inhrelid
--------+-----------+------------+-----------------------------------------+--------------
digoal | tbl | tbl_201201 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201202 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201204 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201205 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201206 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201207 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201208 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201209 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201210 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201211 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201212 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201301 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201302 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201303 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201304 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201305 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201306 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201307 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201308 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201309 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201310 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201311 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201312 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201401 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201402 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201403 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201404 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201405 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201406 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201407 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201408 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201409 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201410 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201411 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201412 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201501 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201502 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201503 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201504 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201505 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201506 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | tbl | tbl_201507 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
(42 rows)
脚本中未修改, 以上SQL已调整, 注意同步调整以下脚本的内容.
用以上的SQL写个SHELL脚本就可以进行监控了, 或者结合nagios来进行监控.
例如 :
ocz@db-172-16-3-150-> cat t.sh
#!/bin/bash
# 环境变量
export LANG=en_US.utf8
export PGHOME=/opt/pgsql
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
export PATH=$PGHOME/bin:$PATH:.
# 参数判断
if [ $# -ne 3 ]; then
echo -e "Please check your parameter."
echo -e "$0 HOST PORT ROLE \n"
exit 1
fi
TODAY=`date +%Y%m%d`
EMAIL="digoal@126.com"
HOST=$1
PORT=$2
ROLE=$3
for DB in `psql -q -t -h $HOST -p $PORT -U $ROLE postgres -c "select datname from pg_database where datname not in ('postgres','template0','template1')"`
do
echo -e "------`date +%F\ %T`----Partition Table MON----IP:$HOST PORT:$PORT DBNAME:$DB------"
psql -h $HOST -p $PORT -U $ROLE $DB <<EOF
create or replace function mon_part_get_acl(i_oid oid) returns aclitem[] as \$\$
declare
result aclitem[] := '{}'::aclitem[];
begin
select relacl into result from pg_class where oid=i_oid and relacl is not null;
if found then
return result;
else
return '{}'::aclitem[];
end if;
exception when others then
return '{}'::aclitem[];
end;
\$\$ language plpgsql;
create or replace function mon_part_conv_to_date(i_relname text) returns date as \$\$
declare
v_len int;
v_date text := '';
i text := '';
begin
FOREACH i IN ARRAY regexp_split_to_array(i_relname, '')
loop
if (i >= '0' and i <= '9') then
v_date := v_date||i;
end if;
end loop;
v_len := length(v_date);
-- the first 2str must 18,19,20,21,22
if (substr(v_date,1,2) >= '18' and substr(v_date,1,2) <= '22' ) then
if (v_len >= 4 and v_len <=8) then
case v_len
when 4 then
return (v_date||'0101')::date;
when 6 then
return (v_date||'01')::date;
when 8 then
return v_date::date;
else
return '19700101'::date;
end case;
else
-- not a valid date
return '22220101';
end if;
else
-- not a valid date
return '22220101';
end if;
exception when others then
raise notice 'partition table name date format error.';
return '19700101'::date;
end;
\$\$ language plpgsql;
create or replace view v_mon_partition_tables as
select
*,
(max_pdate-current_date)/predict_gap::numeric AS future_tables_cnt
from
(
select
*,
pdate-lag_pdate AS gap,
case
when pdate-lag_pdate >=365 then
pdate - date_trunc('year', (pdate - interval '1 year'))::date
when pdate-lag_pdate >=28 then
pdate - date_trunc('month', (pdate - interval '1 month'))::date
when pdate-lag_pdate >=7 then
pdate-lag_pdate
when pdate-lag_pdate >=1 then
pdate - date_trunc('day', (pdate - interval '1 day'))::date
else
0.9
end AS predict_gap
from
(
select
row_number() over (partition by inhparent order by mon_part_conv_to_date((inhrelid::regclass)::text)) AS rn,
current_database() AS dbname,
(inhparent::regclass)::text AS inhparent,
(inhrelid::regclass)::text AS inhrelid,
mon_part_get_acl(inhparent) AS acl_inhparent,
mon_part_get_acl(inhrelid) AS acl_inhrelid,
mon_part_conv_to_date((inhrelid::regclass)::text) AS pdate,
lag(mon_part_conv_to_date((inhrelid::regclass)::text), 1) over (partition by inhparent order by mon_part_conv_to_date((inhrelid::regclass)::text)) AS lag_pdate,
max(mon_part_conv_to_date((inhrelid::regclass)::text)) over (partition by inhparent) AS max_pdate
from pg_inherits
) t
) t;
select '-- 监控日期继承表气泡:' as monitor;
select dbname,inhparent,inhrelid,pdate,lag_pdate,gap,predict_gap from v_mon_partition_tables
where
gap<>predict_gap
and pdate <> '22220101'::date
and lag_pdate is not null;
select '-- 监控日期继承表过期(需满足大于未来10个表,且大于未来2年(731day)):' as monitor;
select * from
(
select dbname,inhparent,inhrelid,max_pdate,predict_gap,future_tables_cnt,row_number() over (partition by inhparent order by pdate desc) AS rn
from v_mon_partition_tables
where
(future_tables_cnt<=10 or max_pdate - current_date <731)
and pdate <> '22220101'::date
and lag_pdate is not null
) t
where rn=1;
select '-- 监控所有继承表的权限和父表是否一致,输出含有不一致的父表名称:' as monitor;
select server_ip,server_port,dbname,name_inhparent
from mon_inherits
where acl_inhparent<>acl_inhrelid
group by server_ip,server_port,dbname,name_inhparent;
EOF
done
执行 :
ocz@db-172-16-3-150-> ./t.sh 127.0.0.1 9201 postgres
------2013-01-04 17:14:16----Partition Table MON----IP:127.0.0.1 PORT:9201 DBNAME:digoal------
CREATE FUNCTION
CREATE FUNCTION
CREATE VIEW
monitor
----------------------------
-- 监控日期继承表气泡:
(1 row)
dbname | inhparent | inhrelid | pdate | lag_pdate | gap | predict_gap
--------+------------+-------------------+------------+------------+-----+-------------
digoal | digoal.tbl | digoal.tbl_201204 | 2012-04-01 | 2012-02-01 | 60 | 31
(1 row)
monitor
--------------------------------
-- 监控日期继承表过期(需满足大于未来10个表,且大于未来2年(731day)):
(1 row)
dbname | inhparent | inhrelid | max_pdate | predict_gap | future_tables_cnt | rn
--------+-----------+----------+-----------+-------------+-------------------+----
(0 rows)
monitor
------------------
-- 监控所以继承表的权限和父表是否一致,输出不一致的:
(1 row)
dbname | inhparent | inhrelid | acl_inhparent | acl_inhrelid
--------+------------+-------------------+-----------------------------------------+--------------
digoal | digoal.tbl | digoal.tbl_201201 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201202 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201204 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201205 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201206 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201207 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201208 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201209 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201210 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201211 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201212 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201301 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201302 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201303 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201304 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201305 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201306 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201307 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201308 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201309 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201310 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201311 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201312 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201401 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201402 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201403 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201404 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201405 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201406 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201407 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201408 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201409 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201410 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201411 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201412 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201501 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201502 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201503 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201504 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201505 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201506 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
digoal | digoal.tbl | digoal.tbl_201507 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}
(42 rows)
------2013-01-04 17:14:16----Partition Table MON----IP:127.0.0.1 PORT:9201 DBNAME:skycac------
CREATE FUNCTION
CREATE FUNCTION
CREATE VIEW
monitor
----------------------------
-- 监控日期继承表气泡:
(1 row)
dbname | inhparent | inhrelid | pdate | lag_pdate | gap | predict_gap
--------+-----------+----------+-------+-----------+-----+-------------
(0 rows)
monitor
--------------------------------
-- 监控日期继承表过期(需满足大于未来10个表,且大于未来2年(731day)):
(1 row)
dbname | inhparent | inhrelid | max_pdate | predict_gap | future_tables_cnt | rn
--------+-----------+----------+-----------+-------------+-------------------+----
(0 rows)
monitor
------------------
-- 监控所以继承表的权限和父表是否一致,输出不一致的:
(1 row)
dbname | inhparent | inhrelid | acl_inhparent | acl_inhrelid
--------+-----------+----------+---------------+--------------
(0 rows)
如果是远程监控可以编辑PGPASSFILE进行.
其他
1. 当月表要变日表时, 不影响监控, 如下 :
digoal=> create table tbl_20150801(like tbl) inherits(tbl);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
digoal=> create table tbl_20150802(like tbl) inherits(tbl);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
digoal=> create table tbl_20150803(like tbl) inherits(tbl);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
digoal=> select dbname,inhparent,inhrelid,pdate,lag_pdate,gap,predict_gap from v_mon_partition_tables
digoal-> where
digoal-> gap<>predict_gap
digoal-> and pdate <> '22220101'::date
digoal-> and lag_pdate is not null;
dbname | inhparent | inhrelid | pdate | lag_pdate | gap | predict_gap
--------+-----------+------------+------------+------------+-----+-------------
digoal | tbl | tbl_201204 | 2012-04-01 | 2012-02-01 | 60 | 31
(1 row)
当日表想变成月表时, 需要注意一定要补齐月末的表 :
digoal=> create table tbl_201509(like tbl) inherits(tbl);
NOTICE: merging column "id" with inherited definition
NOTICE: merging column "crt_time" with inherited definition
CREATE TABLE
digoal=> select dbname,inhparent,inhrelid,pdate,lag_pdate,gap,predict_gap from v_mon_partition_tables
where
gap<>predict_gap
and lag_pdate is not null;
dbname | inhparent | inhrelid | pdate | lag_pdate | gap | predict_gap
--------+-----------+------------+------------+------------+-----+-------------
digoal | tbl | tbl_201204 | 2012-04-01 | 2012-02-01 | 60 | 31
digoal | tbl | tbl_201509 | 2015-09-01 | 2015-08-03 | 29 | 31
(2 rows)
补齐后, 监控恢复正常 :
create table tbl_20150804(like tbl) inherits(tbl);
................
create table tbl_20150831(like tbl) inherits(tbl);
digoal=> select dbname,inhparent,inhrelid,pdate,lag_pdate,gap,predict_gap from v_mon_partition_tables
digoal-> where
digoal-> gap<>predict_gap
digoal-> and lag_pdate is not null;
dbname | inhparent | inhrelid | pdate | lag_pdate | gap | predict_gap
--------+-----------+------------+------------+------------+-----+-------------
digoal | tbl | tbl_201204 | 2012-04-01 | 2012-02-01 | 60 | 31
(1 row)
2. 非法表名监控, 一定要清理掉所有非法表名的分区表. 否则这个监控就有暗病.
3. 人为的每个月一次例行检查还是要的, 因为这个脚本不能适合所有场景.
统一监控
方法,
1. 将被监控的数据库中的继承表的信息统一收集到一个监控数据库中,
2. 在监控数据库中进行分析.
监控库如下(存放所有收集到的生产数据库的继承表信息) :
HOST=172.16.3.150
PORT=1921
DBNAME=digoal
ROLE=digoal
SCHEMA=digoal
创建存储继承表信息的表, 如下.
create table mon_inherits (
server_ip inet,
server_port int,
dbname name,
inhrelid oid,
inhparent oid,
inhseqno int,
name_inhrelid text,
name_inhparent text,
acl_inhrelid text[],
acl_inhparent text[],
crt_time timestamp
);
创建唯一约束 :
create unique index idx_mon_inherits_1 on mon_inherits(server_ip, server_port, dbname, inhrelid, inhseqno);
创建数组排序函数, 原因见本文末尾 :
create or replace function mon_array_sort(i_arr text[]) returns text[] as $$
declare
result text[];
begin
select array_agg(arr order by arr) into result from unnest(i_arr) t(arr);
return result;
end;
$$ language plpgsql;
创建白名单表(用于过滤不是按日期分表的继承表)
create table mon_inherits_white (name_inhrelid text primary key);
关于array的比较可以参考以下BLOG :
http://blog.163.com/digoal@126/blog/static/16387704020130682522480/
被监控的目标库用到的取ACL的函数 :
创建获取acl的函数, 这个函数可以在统一收集信息的脚本中创建.
drop function mon_part_get_acl(oid);
create or replace function mon_part_get_acl(i_oid oid) returns text[] as $$
declare
result text[] := '{}'::text[];
begin
select relacl::text[] into result from pg_class where oid=i_oid and relacl is not null;
if found then
return result;
else
return '{}'::text[];
end if;
exception when others then
return '{}'::text[];
end;
$$ language plpgsql;
采集服务器, 能连到所有被监控的目标库 :
收集信息的脚本 :
vi /home/postgres/script/coll_inherit_info.sh
#!/bin/bash
# 环境变量
export LANG=en_US.utf8
export PGHOME=/opt/pgsql
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
export PATH=$PGHOME/bin:$PATH:.
# 参数判断
if [ $# -ne 3 ]; then
echo -e "Please check your parameter."
echo -e "$0 HOST PORT ROLE \n"
exit 1
fi
TODAY=`date +%Y%m%d`
EMAIL="digoal@126.com"
MON_HOST=172.16.3.150
MON_PORT=1921
MON_ROLE=digoal
MON_DBNAME=digoal
MON_COPY_SQL="copy mon_inherits(server_ip,server_port,dbname,inhrelid,inhparent,inhseqno,name_inhrelid,name_inhparent,acl_inhrelid,acl_inhparent,crt_time) from stdin"
TARGET_COPY_SQL="copy (select inet_server_addr(),inet_server_port(),current_database(),inhrelid,inhparent,inhseqno,(inhrelid::regclass)::text,(inhparent::regclass)::text,mon_part_get_acl(inhrelid),mon_part_get_acl(inhparent),now() from pg_inherits) to stdout"
HOST=$1
PORT=$2
ROLE=$3
echo -e "-h $HOST -p $PORT -U $ROLE :\n"
for DB in `psql -q -t -h $HOST -p $PORT -U $ROLE postgres -c "select datname from pg_database where datname not in ('postgres','template0','template1')"`
do
echo -e "psql -q -t -h $HOST -p $PORT -U $ROLE $DB :\n"
# ----创建获取acl的函数
psql -h $HOST -p $PORT -U $ROLE $DB <<EOF
drop function mon_part_get_acl(oid);
create or replace function mon_part_get_acl(i_oid oid) returns text[] as \$\$
declare
result text[] := '{}'::text[];
begin
select relacl::text[] into result from pg_class where oid=i_oid and relacl is not null;
if found then
return result;
else
return '{}'::text[];
end if;
exception when others then
return '{}'::text[];
end;
\$\$ language plpgsql;
EOF
# ----获取真实目标数据库数据,避免pgbouncer或者防火墙中转过后的地址或者数据库名的改变造成不能使用.
REAL_HOST=`psql -t -A -h $HOST -p $PORT -U $ROLE $DB -c "select inet_server_addr()"`
REAL_PORT=`psql -t -A -h $HOST -p $PORT -U $ROLE $DB -c "select inet_server_port()"`
REAL_DBNAME=`psql -t -A -h $HOST -p $PORT -U $ROLE $DB -c "select current_database()"`
# ----清除历史数据
psql -h $MON_HOST -p $MON_PORT -U $MON_ROLE $MON_DBNAME -c "delete from mon_inherits where server_ip='$REAL_HOST' and server_port='$REAL_PORT' and dbname='$REAL_DBNAME'"
# ----拷贝继承表数据
psql -h $HOST -p $PORT -U $ROLE $DB -c "$TARGET_COPY_SQL"|psql -h $MON_HOST -p $MON_PORT -U $MON_ROLE $MON_DBNAME -c "$MON_COPY_SQL"
done
chmod 500 /home/postgres/script/coll_inherit_info.sh
采集服务器,
vi ~/.pgpass
172.16.3.150:1921:digoal:digoal:digoal
172.16.3.150:9201:*:postgres:postgres
.....pgpass文件中包含的其他目标库的超级用户连接配置项目略
chmod 400 ~/.pgpass
执行测试 :
ocz@db-172-16-3-150-> ./t.sh 172.16.3.150 9201 postgres
CREATE FUNCTION
DELETE 74
CREATE FUNCTION
DELETE 0
信息收集如下 :
ocz@db-172-16-3-150-> psql -h 127.0.0.1 -p 9201 postgres
psql (9.2.1)
Type "help" for help.
postgres=# \c digoal digoal
You are now connected to database "digoal" as user "digoal".
digoal=> select * from mon_inherits ;
server_ip | server_port | dbname | inhrelid | inhparent | inhseqno | name_inhrelid | name_inhparent | acl_inhrelid |
acl_inhparent | crt_time
--------------+-------------+--------+----------+-----------+----------+---------------------+----------------+--------------+------
-----------------------------------+----------------------------
172.16.3.150 | 9201 | digoal | 98010 | 98007 | 1 | digoal.tbl_201201 | digoal.tbl | {} | {digo
al=arwdDxt/digoal,skycac=r/digoal} | 2013-01-05 09:37:53.100495
172.16.3.150 | 9201 | digoal | 98013 | 98007 | 1 | digoal.tbl_201202 | digoal.tbl | {} | {digo
al=arwdDxt/digoal,skycac=r/digoal} | 2013-01-05 09:37:53.100495
....略(74 rows)
采集服务器, 编辑定期执行脚本 :
vi /home/postgres/script/exec_coll_inherit_info.sh
#!/bin/bash
/home/postgres/script/coll_inherit_info.sh 172.16.3.150 9201 postgres
采集服务器, 编辑采集错误监控脚本 :
vi /home/postgres/script/mutt_err_coll_inherit_info.sh
#!/bin/bash
. /home/postgres/.bash_profile
FATAL=1
ERROR=1
FATAL=`grep -c FATAL /tmp/exec_coll_inherit_info.log`
ERROR=`grep -c ERROR /tmp/exec_coll_inherit_info.log`
if [ $FATAL -ne 0 ] || [ $ERROR -ne 0 ]; then
echo -e "`date +%F%T`exec_coll_inherit_info error\n"|mutt -s "exec_coll_inherit_info error" -a /tmp/exec_coll_inherit_info.log digoal.zhou@sky-mobi.com digoal@126.com
fi
定时任务 :
crontab -l
1 5 * * * /home/postgres/script/exec_coll_inherit_info.sh >/tmp/exec_coll_inherit_info.log 2>&1
1 6 * * * /home/postgres/script/mutt_err_coll_inherit_info.sh
修改权限 :
chmod -R 500 /home/postgres/script/*
在监控集中库中创建表名转日期的函数 :
create or replace function mon_part_conv_to_date(i_relname text) returns date as $$
declare
arr text[] := ARRAY[0];
v_date text := '';
i text := '';
v_len int := 0;
begin
arr := regexp_split_to_array( regexp_replace( rtrim( ltrim( regexp_replace(i_relname,'[^[:digit:]]','_','g') ,'_') ,'_') ,'_+','_','g') ,'_');
FOREACH i IN ARRAY arr
loop
if (length(v_date) = 8) then
exit;
end if;
if (length(i) = 4 and length(v_date)=0 and (substr(i,1,2) >= '18' and substr(i,1,2) <= '22')) then
v_date := v_date||i;
elsif (length(i) = 6 and length(v_date)=0 and (substr(i,1,2) >= '18' and substr(i,1,2) <= '22') and (substr(i,5,2)>='01' and substr(i,5,2)<='12')) then
v_date := v_date||i;
elsif (length(i) = 8 and length(v_date)=0 and (substr(i,1,2) >= '18' and substr(i,1,2) <= '22') and (substr(i,5,2)>='01' and substr(i,5,2)<='12') and (substr(i,7,2)>='01' and substr(i,7,2)<='31')) then
v_date := v_date||i;
elsif (length(i)=4 and length(v_date) = 4 and (substr(i,1,2)>='01' and substr(i,1,2)<='12') and (substr(i,3,2)>='01' and substr(i,3,2)<='31')) then
v_date := v_date||i;
elsif (length(i)=2 and length(v_date) = 4 and (substr(i,1,2)>='01' and substr(i,1,2)<='12')) then
v_date := v_date||i;
elsif (length(i)=2 and length(v_date) = 6 and (substr(i,1,2)>='01' and substr(i,1,2)<='31')) then
v_date := v_date||i;
else
continue;
end if;
end loop;
v_len := length(v_date);
case v_len
when 4 then -- 按年分表
return (v_date||'0101')::date;
when 6 then -- 按月分表
return (v_date||'01')::date;
when 8 then -- 按天分表
return v_date::date;
else -- 长度为5和7非日期分表
return '22220101'::date;
end case;
exception when others then
return '22220101'::date;
end;
$$ language plpgsql;
创建视图 :
create or replace view v_mon_partition_tables as
select
*,
(max_pdate-current_date)/predict_gap::numeric AS future_tables_cnt
from
(
select
*,
pdate-lag_pdate AS gap,
case
when pdate-lag_pdate >=365 then
pdate - date_trunc('year', (pdate - interval '1 year'))::date
when pdate-lag_pdate >=28 then
pdate - date_trunc('month', (pdate - interval '1 month'))::date
when pdate-lag_pdate >=7 then
pdate-lag_pdate
when pdate-lag_pdate >=1 then
pdate - date_trunc('day', (pdate - interval '1 day'))::date
else
0.9
end AS predict_gap
from
(
select
row_number() over (partition by server_ip,server_port,dbname,inhparent order by mon_part_conv_to_date(name_inhrelid)) AS rn,
server_ip,
server_port,
dbname,
name_inhparent,
name_inhrelid,
acl_inhparent,
acl_inhrelid,
mon_part_conv_to_date(name_inhrelid) AS pdate,
lag(mon_part_conv_to_date(name_inhrelid), 1) over (partition by server_ip,server_port,dbname,inhparent order by mon_part_conv_to_date(name_inhrelid)) AS lag_pdate,
max(mon_part_conv_to_date(name_inhrelid)) over (partition by server_ip,server_port,dbname,inhparent) AS max_pdate
from mon_inherits
) t
) t;
视图结构 :
View "inherit_mon.v_mon_partition_tables"
Column | Type | Modifiers
-------------------+---------+-----------
rn | bigint |
server_ip | inet |
server_port | integer |
dbname | name |
name_inhparent | text |
name_inhrelid | text |
acl_inhparent | text[] |
acl_inhrelid | text[] |
pdate | date |
lag_pdate | date |
max_pdate | date |
gap | integer |
predict_gap | numeric |
future_tables_cnt | numeric |
监控脚本示例 :
每天自动监控 :
vi ~/inherit_day_monitor.sh
#!/bin/bash
# 环境变量
export LANG=en_US.utf8
export PGHOME=/opt/pgsql9.1.2
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
export PATH=$PGHOME/bin:$PATH:.
TODAY=`date +%Y%m%d`
EMAIL="digoal@126.com"
HOST=127.0.0.1
PORT=1921
ROLE=digoal
DBNAME=digoal
psql -t -A -h $HOST -p $PORT -U $ROLE $DBNAME <<EOF >/tmp/inherit_day_monitor.log
select '-- 监控继承表的权限和父表是否一致,输出不一致项:' as monitor;
select server_ip,server_port,dbname,name_inhparent,name_inhrelid,mon_array_sort(acl_inhparent),mon_array_sort(acl_inhrelid)
from mon_inherits
where mon_array_sort(acl_inhparent)<>mon_array_sort(acl_inhrelid);
select '-- 监控日期继承表气泡:' as monitor;
select server_ip,server_port,dbname,name_inhparent,name_inhrelid,pdate,lag_pdate,gap,predict_gap
from v_mon_partition_tables
where
gap <> predict_gap
and pdate <> '22220101'::date
and lag_pdate is not null;
select '-- 监控日期继承表过期(需满足大于未来10个表,且大于未来半年(185day)):' as monitor;
select * from
(
select server_ip,server_port,dbname,name_inhparent,name_inhrelid,max_pdate,predict_gap,future_tables_cnt,row_number() over (partition by server_ip,server_port,dbname,name_inhparent order by pdate desc) AS rn
from v_mon_partition_tables
where
(future_tables_cnt<=10 or max_pdate - current_date <185)
and pdate <> '22220101'::date
and lag_pdate is not null
) t
where rn=1;
EOF
CNT=0
CNT=`wc -l /tmp/inherit_day_monitor.log|awk '{print $1}'`
if [ $CNT -gt 3 ]; then
echo -e "partition table abnormal.\n"|mutt -s "partition table abnormal" -a /tmp/inherit_day_monitor.log $EMAIL
fi
每月非法表名检查 :
vi ~/inherit_month_invalidname_monitor.sh
#!/bin/bash
# 环境变量
export LANG=en_US.utf8
export PGHOME=/opt/pgsql9.1.2
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
export PATH=$PGHOME/bin:$PATH:.
TODAY=`date +%Y%m%d`
EMAIL="digoal@126.com"
HOST=127.0.0.1
PORT=1921
ROLE=digoal
DBNAME=digoal
psql -t -A -h $HOST -p $PORT -U $ROLE $DBNAME <<EOF >/tmp/inherit_month_invalidname_monitor.log
select '-- 每月非法表名检查,过滤白名单(mon_inherits_white): ' as monitor;
select server_ip,server_port,dbname,name_inhrelid
from mon_inherits
where mon_part_conv_to_date(name_inhrelid)='22220101'::date
and name_inhrelid not in (select name_inhrelid from mon_inherits_white)
order by server_ip,server_port,dbname,name_inhparent,name_inhrelid;
EOF
CNT=0
CNT=`wc -l /tmp/inherit_month_invalidname_monitor.log|awk '{print $1}'`
if [ $CNT -gt 3 ]; then
echo -e "partition table 每月非法表名检查.\n"|mutt -s "partition table 每月非法表名检查" -a /tmp/inherit_month_invalidname_monitor.log $EMAIL
fi
非法表明确认不是按日期分表的, 并且不需要气泡,过期检查的. 写到白名单表中.
例如 :
insert into mon_inherits_white select name_inhrelid from (select server_ip,server_port,dbname,name_inhrelid
from mon_inherits
where mon_part_conv_to_date(name_inhrelid)='22220101'::date
and name_inhrelid not in (select name_inhrelid from mon_inherits_white)
order by server_ip,server_port,dbname,name_inhparent,name_inhrelid) t group by name_inhrelid;
以后定期维护这个白名单表.
每月人工检查 :
vi ~/inherit_month_human_monitor.sh
#!/bin/bash
# 环境变量
export LANG=en_US.utf8
export PGHOME=/opt/pgsql9.1.2
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
export PATH=$PGHOME/bin:$PATH:.
TODAY=`date +%Y%m%d`
EMAIL="digoal@126.com"
HOST=127.0.0.1
PORT=1921
ROLE=digoal
DBNAME=digoal
psql -t -A -h $HOST -p $PORT -U $ROLE $DBNAME <<EOF >/tmp/inherit_month_human_monitor.log
select '-- 每月人工检查:' as monitor;
select server_ip,server_port,dbname,name_inhparent,name_inhrelid,crt_time
from mon_inherits
order by server_ip,server_port,dbname,name_inhparent,name_inhrelid;
EOF
CNT=0
CNT=`wc -l /tmp/inherit_month_human_monitor.log|awk '{print $1}'`
if [ $CNT -gt 3 ]; then
echo -e "partition table 每月人工检查.\n"|mutt -s "partition table 每月人工检查" -a /tmp/inherit_month_human_monitor.log $EMAIL
fi
mutt环境 :
vi ~/.muttrc
set envelope_from=yes
set from=digoal.zhou@sky-mobi.com
set realname="德哥"
set use_from=yes
set charset="UTF-8"
定时任务 :
crontab -e
1 8 * * * /home/pg91/inherit_day_monitor.sh
1 9 1 * * /home/pg91/inherit_month_invalidname_monitor.sh
1 9 1 * * /home/pg91/inherit_month_human_monitor.sh
注意
1. 如果存在IP,端口,库名相同的情况(例如公司存在多个IDC, 这些IDC中用了同样的内网网段), 以上脚本将不适用. 那请使用多个监控库来区分, 或者多个监控表来区分.
2. 集中监控时为什么acl字段改成了text呢, 因为在监控库中不存在生产库中的用户, 存储不存在的角色时, aclitem[]类型将出错.
3. 权限比较可能出现父表和子表赋权时顺序不一致造成的假象, 这个需要手工调整一下.
例如 :
digoal=> create table tbl(id int);
CREATE TABLE
digoal=> create table tbl_child(id int);
CREATE TABLE
digoal=> grant select on tbl to postgres;
GRANT
digoal=> grant select on tbl to skycac;
GRANT
digoal=> grant select on tbl_child to skycac;
GRANT
digoal=> grant select on tbl_child to postgres;
GRANT
虽然实际的权限是一致的, 但是这样得到的权限比较将不一致 :
digoal=> select relacl from pg_class where relname='tbl';
relacl
-----------------------------------------------------------
{digoal=arwdDxt/digoal,postgres=r/digoal,skycac=r/digoal}
(1 row)
digoal=> select relacl from pg_class where relname='tbl_child';
relacl
-----------------------------------------------------------
{digoal=arwdDxt/digoal,skycac=r/digoal,postgres=r/digoal}
(1 row)
要解决这个问题, 需要对数组进行排序, 我们可以用unnest将数组转成setof rec,然后再按顺序聚合(PostgreSQL 9.0开始支持聚合顺序).
如果不是9.0请自行写聚合函数, 聚合函数范例可参看 :
http://blog.163.com/digoal@126/blog/static/16387704020121118112533410/
下面针对PostgreSQL 9.0+ 举例如下 :
digoal=> create or replace function mon_array_sort(i_arr text[]) returns text[] as $$
declare
result text[];
begin
select array_agg(arr order by arr) into result from unnest(i_arr) t(arr);
return result;
end;
$$ language plpgsql;
排序后 :
digoal=> select mon_array_sort('{h1,b,b,c,g,e}'::text[]);
mon_array_sort
----------------
{b,b,c,e,g,h1}
(1 row)
4. 一般还需要监控的项目,
父表和继承表的表结构一致性.
父表和继承表的索引一致性.
父表和继承表的列权限一致性(pg_attribute.attacl).