PostgreSQL partition table or inherits table predict count and gap and privilege monitor

21 minute read

背景

当管理的数据库越来越多, 分区表也越来越多, 一个人可能要维护几十万的分区表. 对于分表的监控显得尤为重要.

本文基于PostgreSQL 9.2 进行讲解, 其他版本可能需要略微调整.

PostgreSQL 的分区表除了性能方面需要加强以外, 管理方面也需要加强.

例如,

一、分区表的气泡, 例如以下情况 :

父表:

tbl  

子表 :

tbl_2012_01  
tbl_2012_02  
tbl_2012_03  
tbl_2012_05  
tbl_2012_06  

以上分区表tbl_2012_04不存在, 说明存在气泡.

二、分区表的子表权限和父表权限不一致, 例如以下情况 :

一般来说父表权限应该和子表权限一致, 什么时候可能遇到不一致的情况呢?

1. 新增了子表, 忘记给新增的子表赋予权限.

例如系统中有另外的用户需要读分区表, 在一段时间后新增了子表, 很有可能会忘记赋予权限.

PostgreSQL 9.0开始, 只要给主表赋权, 直接操作主表的话, 不需要子表权限. 但是直接查询子表的话, 还是需要赋权的.

Note how table access permissions are handled. Querying a parent table can automatically access data in child tables without further access privilege checking.   
  
This preserves the appearance that the data is (also) in the parent table.   
  
Accessing the child tables directly is, however, not automatically allowed and would require further privileges to be granted.  

三、分区表的过期

例如以下情况 :

父表:

tbl  

子表 :

tbl_2012_12  
tbl_2013_01  

那么到2013年02月份的时候, 子表tbl_2013_02还不存在, 通常会报表不存在的错误.

接下来将针对以上几种情况进行监控, 假设表名遵循以下规则 :

1. 表名中的日期使用标准的yyyy, mm, dd 格式.

例如, tbl_2012_1_1 是不合法的,

应该改成tbl_2012_01_01.

2. 继承表的表名应该包含父表的表名作为前缀.

例如, tbl, tbl_2014.

监控还需要用到pg_inherits这个系统表, 其中inhseqno字段需要注意一下 :

  If there is more than one direct parent for a child table (multiple inheritance),   
  this number tells the order in which the inherited columns are to be arranged. The count starts at 1.  

当一个表直接继承多个父表的时候, 可用inhseqno来区分.

如下 :

digoal=> create table t1(id int);  
CREATE TABLE  
digoal=> create table t2(id int);  
CREATE TABLE  
digoal=> create table t(id int) inherits(t1);  
NOTICE:  merging column "id" with inherited definition  
CREATE TABLE  
digoal=> alter table t inherit t2;  
ALTER TABLE  
digoal=> select inhparent::regclass,inhrelid::regclass,inhseqno from pg_inherits ;  
 inhparent | inhrelid | inhseqno   
-----------+----------+----------  
 t1        | t        |        1  
 t2        | t        |        2  
(2 rows)  

解除继承后重新继承的话, inhseqno会增加.

digoal=> alter table t no inherit t1;  
ALTER TABLE  
digoal=> alter table t inherit t1;  
ALTER TABLE  
digoal=> select inhparent::regclass,inhrelid::regclass,inhseqno from pg_inherits ;  
 inhparent | inhrelid | inhseqno   
-----------+----------+----------  
 t2        | t        |        2  
 t1        | t        |        3  
(2 rows)  

在被监控的数据库中创建2个函数, 1个视图 :

通过pg_class.oid获取权限的函数, 用于比较子表的权限和父表的权限是否一致 .

create or replace function mon_part_get_acl(i_oid oid) returns aclitem[] as $$  
declare  
  result aclitem[] := '{}'::aclitem[];  
begin  
  select relacl into result from pg_class where oid=i_oid and relacl is not null;  
  if found then  
    return result;  
  else  
    return '{}'::aclitem[];  
  end if;  
exception when others then  
  return '{}'::aclitem[];  
end;  
$$ language plpgsql;  

把表名的数字转成日期的函数, 用于分析gap等.

create or replace function mon_part_conv_to_date(i_relname text, i_prefix text) returns date as $$  
declare  
  v_len int;  
  v_date text := '';  
  i text := '';  
  v_suffix text := '';  
begin  
  v_suffix := replace(i_relname,i_prefix,'');  
  FOREACH i IN ARRAY regexp_split_to_array(v_suffix, '')  
  loop  
    if (i >= '0' and i <= '9') then  
      v_date := v_date||i;  
    end if;  
  end loop;  
  v_len := length(v_date);  
  -- the first 2str must 18,19,20,21,22  
  if (substr(v_date,1,2) >= '18' and substr(v_date,1,2) <= '22' ) then  
    if (v_len >= 4 and v_len <=8) then  
      case v_len  
      when 4 then  
        return (v_date||'0101')::date;  
      when 6 then  
        return (v_date||'01')::date;  
      when 8 then  
        return v_date::date;  
      else  
        return '19700101'::date;  
      end case;  
    else   
      -- not a valid date  
      return '22220101';  
    end if;  
  else  
    -- not a valid date  
    return '22220101';  
  end if;  
exception when others then  
  raise notice 'partition table name date format error.';  
  return '19700101'::date;  
end;  
$$ language plpgsql;  

视图 :

根据间隔的平均值来评估是否出现了分区异常.

create or replace view v_mon_partition_tables as   
select   
  *,  
  (max_pdate-current_date)/predict_gap::numeric AS future_tables_cnt  
from   
(  
select   
  *,  
  pdate-lag_pdate AS gap,  
  case   
  when pdate-lag_pdate >=365 then  -- 预计按年分区  
    pdate - date_trunc('day', (pdate - interval '1 year'))::date  
  when pdate-lag_pdate >=28 then  -- 预计按月分区  
    pdate - date_trunc('day', (pdate - interval '1 month'))::date  
  when pdate-lag_pdate >=1 then  -- 预计按日分区  
    1  
  else   
    0.9  -- 分区间隔小于1天  
  end AS predict_gap  
from   
(  
select   
  row_number() over (partition by inhparent order by mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text)) AS rn,  
  current_database() AS dbname,   
  inhparent::regclass::text AS inhparent,   
  lag(inhrelid::regclass::text,1) over (partition by inhparent order by mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text)) AS lag_inhrelid,  
  inhrelid::regclass::text AS inhrelid,   
  mon_part_get_acl(inhparent) AS acl_inhparent,   
  mon_part_get_acl(inhrelid) AS acl_inhrelid,   
  mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text) AS pdate,   
  lag(mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text), 1) over (partition by inhparent order by mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text)) AS lag_pdate,  
  max(mon_part_conv_to_date(inhrelid::regclass::text,inhparent::regclass::text)) over (partition by inhparent) AS max_pdate  
from pg_inherits   
) t   
) t;  

视图结构如下 :

   View "payment.v_mon_partition_tables"  
      Column       |   Type    | Modifiers   
-------------------+-----------+-----------  
 rn                | bigint    |   
 dbname            | name      |   
 inhparent         | text      |   
 lag_inhrelid      | text      |   
 inhrelid          | text      |   
 acl_inhparent     | aclitem[] |   
 acl_inhrelid      | aclitem[] |   
 pdate             | date      |   
 lag_pdate         | date      |   
 max_pdate         | date      |   
 gap               | integer   |   
 predict_gap       | numeric   |   
 future_tables_cnt | numeric   |   

监控日期继承表存在气泡SQL,

select * from v_mon_partition_tables  
  where   
  gap<>predict_gap   
  and pdate <> '22220101'::date   
  and lag_pdate is not null;  

监控过期的子表

例如保持未来有10个子表可用,并且未来2年(731day)都有子表已经创建,

select * from  
(  
select dbname,inhparent,lag_inhrelid,inhrelid,max_pdate,predict_gap,future_tables_cnt,row_number() over (partition by inhparent order by pdate desc) AS rn  
from v_mon_partition_tables  
  where   
  (future_tables_cnt<=10 or max_pdate - current_date <731)   
  and pdate <> '22220101'::date   
  and lag_pdate is not null  
) t order by rn;  

继承表的权限是否一致,

select * from v_mon_partition_tables  
  where acl_inhparent<>acl_inhrelid;  

测试 :

1. 监控日期继承表存在气泡SQL,

digoal=> create table tbl(id int, crt_time timestamp(0));  
CREATE TABLE  
digoal=> create table tbl_201201 (id int, crt_time timestamp(0)) inherits(tbl);  
NOTICE:  merging column "id" with inherited definition  
NOTICE:  merging column "crt_time" with inherited definition  
CREATE TABLE  
digoal=> create table tbl_201202 (id int, crt_time timestamp(0)) inherits(tbl);  
NOTICE:  merging column "id" with inherited definition  
NOTICE:  merging column "crt_time" with inherited definition  
CREATE TABLE  
digoal=> create table tbl_201204 (id int, crt_time timestamp(0)) inherits(tbl);  
NOTICE:  merging column "id" with inherited definition  
NOTICE:  merging column "crt_time" with inherited definition  
CREATE TABLE  
digoal=> create table tbl_201205 (id int, crt_time timestamp(0)) inherits(tbl);  
NOTICE:  merging column "id" with inherited definition  
NOTICE:  merging column "crt_time" with inherited definition  
CREATE TABLE  
digoal=> select * from v_mon_partition_tables  
  where   
  gap<>predict_gap   
  and pdate <> '22220101'::date   
  and lag_pdate is not null;  
 dbname | inhparent |  inhrelid  |   pdate    | lag_pdate  | gap | predict_gap   
--------+-----------+------------+------------+------------+-----+-------------  
 digoal | tbl       | tbl_201204 | 2012-04-01 | 2012-02-01 |  60 |          31  
(1 row)  

2. 何时过期监控SQL, 例如保持未来有10个子表可用,并且未来2年(731day)都有子表已经创建,

select * from  
(  
select dbname,inhparent,lag_inhrelid,inhrelid,max_pdate,predict_gap,future_tables_cnt,row_number() over (partition by inhparent order by pdate desc) AS rn  
from v_mon_partition_tables  
  where   
  (future_tables_cnt<=10 or max_pdate - current_date <731)   
  and pdate <> '22220101'::date   
  and lag_pdate is not null  
) t order by rn;  
  
 dbname | inhparent |  inhrelid  |   pdate    | max_pdate  | gap |  future_tables_cnt    
--------+-----------+------------+------------+------------+-----+---------------------  
 digoal | tbl       | tbl_201202 | 2012-02-01 | 2012-05-01 |  31 | -8.0000000000000000  
 digoal | tbl       | tbl_201204 | 2012-04-01 | 2012-05-01 |  60 | -8.0000000000000000  
 digoal | tbl       | tbl_201205 | 2012-05-01 | 2012-05-01 |  30 | -8.2666666666666667  
(3 rows)  

创建超过未来2年的子表(现在是2013-01) :

create table tbl_201206 (id int, crt_time timestamp(0)) inherits(tbl);  
.....................  
create table tbl_201507 (id int, crt_time timestamp(0)) inherits(tbl);  

再次执行监控SQL :

select * from  
(  
select dbname,inhparent,lag_inhrelid,inhrelid,max_pdate,predict_gap,future_tables_cnt,row_number() over (partition by inhparent order by pdate desc) AS rn  
from v_mon_partition_tables  
  where   
  (future_tables_cnt<=10 or max_pdate - current_date <731)   
  and pdate <> '22220101'::date   
  and lag_pdate is not null  
) t order by rn;  
  
 dbname | inhparent | inhrelid | pdate | max_pdate | gap | future_tables_cnt   
--------+-----------+----------+-------+-----------+-----+-------------------  
(0 rows)  

3. 继承表的权限是否一致,

当前未赋任何权限, 所以权限是一致的.

digoal=>  select dbname,inhparent,inhrelid,acl_inhparent,acl_inhrelid from v_mon_partition_tables  
digoal->   where acl_inhparent<>acl_inhrelid;  
 dbname | inhparent | inhrelid | acl_inhparent | acl_inhrelid   
--------+-----------+----------+---------------+--------------  
(0 rows)  

当修改父表的权限后, 如果未给子表赋予权限, 那么权限将不一致, 所以查询有结果 :

digoal=> grant select on tbl to skycac;  
GRANT  
digoal=>  select dbname,inhparent,inhrelid,acl_inhparent,acl_inhrelid from v_mon_partition_tables  
  where acl_inhparent<>acl_inhrelid;  
 dbname | inhparent |  inhrelid  |              acl_inhparent              | acl_inhrelid   
--------+-----------+------------+-----------------------------------------+--------------  
 digoal | tbl       | tbl_201201 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201202 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201204 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201205 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201206 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201207 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201208 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201209 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201210 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201211 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201212 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201301 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201302 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201303 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201304 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201305 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201306 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201307 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201308 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201309 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201310 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201311 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201312 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201401 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201402 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201403 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201404 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201405 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201406 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201407 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201408 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201409 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201410 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201411 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201412 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201501 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201502 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201503 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201504 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201505 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201506 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | tbl       | tbl_201507 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
(42 rows)  

脚本中未修改, 以上SQL已调整, 注意同步调整以下脚本的内容.


用以上的SQL写个SHELL脚本就可以进行监控了, 或者结合nagios来进行监控.

例如 :

ocz@db-172-16-3-150-> cat t.sh  
#!/bin/bash  
# 环境变量  
export LANG=en_US.utf8  
export PGHOME=/opt/pgsql  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib  
export PATH=$PGHOME/bin:$PATH:.  
  
# 参数判断  
if [ $# -ne 3 ]; then  
echo -e "Please check your parameter."  
echo -e "$0 HOST PORT ROLE \n"  
exit 1  
fi  
  
TODAY=`date +%Y%m%d`  
EMAIL="digoal@126.com"  
  
HOST=$1  
PORT=$2  
ROLE=$3  
  
for DB in `psql -q -t -h $HOST -p $PORT -U $ROLE postgres -c "select datname from pg_database where datname not in ('postgres','template0','template1')"`  
do  
echo -e "------`date +%F\ %T`----Partition Table MON----IP:$HOST PORT:$PORT DBNAME:$DB------"  
psql -h $HOST -p $PORT -U $ROLE $DB <<EOF  
create or replace function mon_part_get_acl(i_oid oid) returns aclitem[] as \$\$  
declare  
  result aclitem[] := '{}'::aclitem[];  
begin  
  select relacl into result from pg_class where oid=i_oid and relacl is not null;  
  if found then  
    return result;  
  else  
    return '{}'::aclitem[];  
  end if;  
exception when others then  
  return '{}'::aclitem[];  
end;  
\$\$ language plpgsql;  
  
create or replace function mon_part_conv_to_date(i_relname text) returns date as \$\$  
declare  
  v_len int;  
  v_date text := '';  
  i text := '';  
begin  
  FOREACH i IN ARRAY regexp_split_to_array(i_relname, '')  
  loop  
    if (i >= '0' and i <= '9') then  
      v_date := v_date||i;  
    end if;  
  end loop;  
  v_len := length(v_date);  
  -- the first 2str must 18,19,20,21,22  
  if (substr(v_date,1,2) >= '18' and substr(v_date,1,2) <= '22' ) then  
    if (v_len >= 4 and v_len <=8) then  
      case v_len  
      when 4 then  
        return (v_date||'0101')::date;  
      when 6 then  
        return (v_date||'01')::date;  
      when 8 then  
        return v_date::date;  
      else  
        return '19700101'::date;  
      end case;  
    else   
      -- not a valid date  
      return '22220101';  
    end if;  
  else  
    -- not a valid date  
    return '22220101';  
  end if;  
exception when others then  
  raise notice 'partition table name date format error.';  
  return '19700101'::date;  
end;  
\$\$ language plpgsql;  
  
create or replace view v_mon_partition_tables as   
select   
  *,  
  (max_pdate-current_date)/predict_gap::numeric AS future_tables_cnt  
from   
(  
select   
  *,  
  pdate-lag_pdate AS gap,  
  case   
  when pdate-lag_pdate >=365 then  
    pdate - date_trunc('year', (pdate - interval '1 year'))::date  
  when pdate-lag_pdate >=28 then  
    pdate - date_trunc('month', (pdate - interval '1 month'))::date  
  when pdate-lag_pdate >=7 then  
    pdate-lag_pdate  
  when pdate-lag_pdate >=1 then  
    pdate - date_trunc('day', (pdate - interval '1 day'))::date  
  else   
    0.9  
  end AS predict_gap  
from   
(  
select   
  row_number() over (partition by inhparent order by mon_part_conv_to_date((inhrelid::regclass)::text)) AS rn,  
  current_database() AS dbname,   
  (inhparent::regclass)::text AS inhparent,   
  (inhrelid::regclass)::text AS inhrelid,   
  mon_part_get_acl(inhparent) AS acl_inhparent,   
  mon_part_get_acl(inhrelid) AS acl_inhrelid,   
  mon_part_conv_to_date((inhrelid::regclass)::text) AS pdate,   
  lag(mon_part_conv_to_date((inhrelid::regclass)::text), 1) over (partition by inhparent order by mon_part_conv_to_date((inhrelid::regclass)::text)) AS lag_pdate,  
  max(mon_part_conv_to_date((inhrelid::regclass)::text)) over (partition by inhparent) AS max_pdate  
from pg_inherits   
) t   
) t;  
  
select '-- 监控日期继承表气泡:' as monitor;  
select dbname,inhparent,inhrelid,pdate,lag_pdate,gap,predict_gap from v_mon_partition_tables  
  where   
  gap<>predict_gap   
  and pdate <> '22220101'::date   
  and lag_pdate is not null;  
  
select '-- 监控日期继承表过期(需满足大于未来10个表,且大于未来2年(731day)):' as monitor;  
select * from  
(  
select dbname,inhparent,inhrelid,max_pdate,predict_gap,future_tables_cnt,row_number() over (partition by inhparent order by pdate desc) AS rn  
from v_mon_partition_tables  
  where   
  (future_tables_cnt<=10 or max_pdate - current_date <731)   
  and pdate <> '22220101'::date   
  and lag_pdate is not null  
) t  
where rn=1;  
  
select '-- 监控所有继承表的权限和父表是否一致,输出含有不一致的父表名称:' as monitor;  
select server_ip,server_port,dbname,name_inhparent  
  from mon_inherits  
  where acl_inhparent<>acl_inhrelid  
  group by server_ip,server_port,dbname,name_inhparent;  
  
EOF  
done  

执行 :

ocz@db-172-16-3-150-> ./t.sh 127.0.0.1 9201 postgres  
------2013-01-04 17:14:16----Partition Table MON----IP:127.0.0.1 PORT:9201 DBNAME:digoal------  
CREATE FUNCTION  
CREATE FUNCTION  
CREATE VIEW  
          monitor             
----------------------------  
 -- 监控日期继承表气泡:  
(1 row)  
  
 dbname | inhparent  |     inhrelid      |   pdate    | lag_pdate  | gap | predict_gap   
--------+------------+-------------------+------------+------------+-----+-------------  
 digoal | digoal.tbl | digoal.tbl_201204 | 2012-04-01 | 2012-02-01 |  60 |          31  
(1 row)  
  
            monitor               
--------------------------------  
 -- 监控日期继承表过期(需满足大于未来10个表,且大于未来2年(731day)):  
(1 row)  
  
 dbname | inhparent | inhrelid | max_pdate | predict_gap | future_tables_cnt | rn   
--------+-----------+----------+-----------+-------------+-------------------+----  
(0 rows)  
  
     monitor        
------------------  
 -- 监控所以继承表的权限和父表是否一致,输出不一致的:  
(1 row)  
  
 dbname | inhparent  |     inhrelid      |              acl_inhparent              | acl_inhrelid   
--------+------------+-------------------+-----------------------------------------+--------------  
 digoal | digoal.tbl | digoal.tbl_201201 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201202 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201204 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201205 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201206 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201207 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201208 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201209 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201210 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201211 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201212 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201301 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201302 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201303 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201304 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201305 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201306 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201307 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201308 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201309 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201310 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201311 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201312 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201401 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201402 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201403 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201404 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201405 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201406 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201407 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201408 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201409 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201410 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201411 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201412 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201501 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201502 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201503 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201504 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201505 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201506 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
 digoal | digoal.tbl | digoal.tbl_201507 | {digoal=arwdDxt/digoal,skycac=r/digoal} | {}  
(42 rows)  
  
------2013-01-04 17:14:16----Partition Table MON----IP:127.0.0.1 PORT:9201 DBNAME:skycac------  
CREATE FUNCTION  
CREATE FUNCTION  
CREATE VIEW  
          monitor             
----------------------------  
 -- 监控日期继承表气泡:  
(1 row)  
  
 dbname | inhparent | inhrelid | pdate | lag_pdate | gap | predict_gap   
--------+-----------+----------+-------+-----------+-----+-------------  
(0 rows)  
  
            monitor               
--------------------------------  
 -- 监控日期继承表过期(需满足大于未来10个表,且大于未来2年(731day)):  
(1 row)  
  
 dbname | inhparent | inhrelid | max_pdate | predict_gap | future_tables_cnt | rn   
--------+-----------+----------+-----------+-------------+-------------------+----  
(0 rows)  
  
     monitor        
------------------  
 -- 监控所以继承表的权限和父表是否一致,输出不一致的:  
(1 row)  
  
 dbname | inhparent | inhrelid | acl_inhparent | acl_inhrelid   
--------+-----------+----------+---------------+--------------  
(0 rows)  

如果是远程监控可以编辑PGPASSFILE进行.

其他

1. 当月表要变日表时, 不影响监控, 如下 :

digoal=> create table tbl_20150801(like tbl) inherits(tbl);  
NOTICE:  merging column "id" with inherited definition  
NOTICE:  merging column "crt_time" with inherited definition  
CREATE TABLE  
digoal=> create table tbl_20150802(like tbl) inherits(tbl);  
NOTICE:  merging column "id" with inherited definition  
NOTICE:  merging column "crt_time" with inherited definition  
CREATE TABLE  
digoal=> create table tbl_20150803(like tbl) inherits(tbl);  
NOTICE:  merging column "id" with inherited definition  
NOTICE:  merging column "crt_time" with inherited definition  
CREATE TABLE  
digoal=> select dbname,inhparent,inhrelid,pdate,lag_pdate,gap,predict_gap from v_mon_partition_tables  
digoal->   where   
digoal->   gap<>predict_gap   
digoal->   and pdate <> '22220101'::date   
digoal->   and lag_pdate is not null;  
 dbname | inhparent |  inhrelid  |   pdate    | lag_pdate  | gap | predict_gap   
--------+-----------+------------+------------+------------+-----+-------------  
 digoal | tbl       | tbl_201204 | 2012-04-01 | 2012-02-01 |  60 |          31  
(1 row)  

当日表想变成月表时, 需要注意一定要补齐月末的表 :

digoal=> create table tbl_201509(like tbl) inherits(tbl);  
NOTICE:  merging column "id" with inherited definition  
NOTICE:  merging column "crt_time" with inherited definition  
CREATE TABLE  
digoal=> select dbname,inhparent,inhrelid,pdate,lag_pdate,gap,predict_gap from v_mon_partition_tables  
  where   
  gap<>predict_gap   
  and lag_pdate is not null;  
 dbname | inhparent |  inhrelid  |   pdate    | lag_pdate  | gap | predict_gap   
--------+-----------+------------+------------+------------+-----+-------------  
 digoal | tbl       | tbl_201204 | 2012-04-01 | 2012-02-01 |  60 |          31  
 digoal | tbl       | tbl_201509 | 2015-09-01 | 2015-08-03 |  29 |          31  
(2 rows)  

补齐后, 监控恢复正常 :

create table tbl_20150804(like tbl) inherits(tbl);  
................  
create table tbl_20150831(like tbl) inherits(tbl);  
digoal=> select dbname,inhparent,inhrelid,pdate,lag_pdate,gap,predict_gap from v_mon_partition_tables  
digoal->   where   
digoal->   gap<>predict_gap   
digoal->   and lag_pdate is not null;  
 dbname | inhparent |  inhrelid  |   pdate    | lag_pdate  | gap | predict_gap   
--------+-----------+------------+------------+------------+-----+-------------  
 digoal | tbl       | tbl_201204 | 2012-04-01 | 2012-02-01 |  60 |          31  
(1 row)  

2. 非法表名监控, 一定要清理掉所有非法表名的分区表. 否则这个监控就有暗病.

3. 人为的每个月一次例行检查还是要的, 因为这个脚本不能适合所有场景.

统一监控

方法,

1. 将被监控的数据库中的继承表的信息统一收集到一个监控数据库中,

2. 在监控数据库中进行分析.

监控库如下(存放所有收集到的生产数据库的继承表信息) :

HOST=172.16.3.150  
PORT=1921  
DBNAME=digoal  
ROLE=digoal  
SCHEMA=digoal  

创建存储继承表信息的表, 如下.

create table mon_inherits (  
  server_ip inet,   
  server_port int,  
  dbname name,  
  inhrelid oid,  
  inhparent oid,  
  inhseqno int,  
  name_inhrelid text,  
  name_inhparent text,  
  acl_inhrelid text[],  
  acl_inhparent text[],  
  crt_time timestamp  
);  

创建唯一约束 :

create unique index idx_mon_inherits_1 on mon_inherits(server_ip, server_port, dbname, inhrelid, inhseqno);  

创建数组排序函数, 原因见本文末尾 :

create or replace function mon_array_sort(i_arr text[]) returns text[] as $$  
declare  
  result text[];  
begin  
  select array_agg(arr order by arr) into result from unnest(i_arr) t(arr);  
  return result;  
end;  
$$ language plpgsql;  

创建白名单表(用于过滤不是按日期分表的继承表)

create table mon_inherits_white (name_inhrelid text primary key);  

关于array的比较可以参考以下BLOG :

http://blog.163.com/digoal@126/blog/static/16387704020130682522480/

被监控的目标库用到的取ACL的函数 :

创建获取acl的函数, 这个函数可以在统一收集信息的脚本中创建.

drop function mon_part_get_acl(oid);  
create or replace function mon_part_get_acl(i_oid oid) returns text[] as $$  
declare  
  result text[] := '{}'::text[];  
begin  
  select relacl::text[] into result from pg_class where oid=i_oid and relacl is not null;  
  if found then  
    return result;  
  else  
    return '{}'::text[];  
  end if;  
exception when others then  
  return '{}'::text[];  
end;  
$$ language plpgsql;  

采集服务器, 能连到所有被监控的目标库 :

收集信息的脚本 :

vi /home/postgres/script/coll_inherit_info.sh  
  
#!/bin/bash  
# 环境变量  
export LANG=en_US.utf8  
export PGHOME=/opt/pgsql  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib  
export PATH=$PGHOME/bin:$PATH:.  
  
# 参数判断  
if [ $# -ne 3 ]; then  
echo -e "Please check your parameter."  
echo -e "$0 HOST PORT ROLE \n"  
exit 1  
fi  
  
TODAY=`date +%Y%m%d`  
EMAIL="digoal@126.com"  
MON_HOST=172.16.3.150  
MON_PORT=1921  
MON_ROLE=digoal  
MON_DBNAME=digoal  
MON_COPY_SQL="copy mon_inherits(server_ip,server_port,dbname,inhrelid,inhparent,inhseqno,name_inhrelid,name_inhparent,acl_inhrelid,acl_inhparent,crt_time) from stdin"  
TARGET_COPY_SQL="copy (select inet_server_addr(),inet_server_port(),current_database(),inhrelid,inhparent,inhseqno,(inhrelid::regclass)::text,(inhparent::regclass)::text,mon_part_get_acl(inhrelid),mon_part_get_acl(inhparent),now() from pg_inherits) to stdout"  
  
HOST=$1  
PORT=$2  
ROLE=$3  
  
echo -e "-h $HOST -p $PORT -U $ROLE :\n"  
for DB in `psql -q -t -h $HOST -p $PORT -U $ROLE postgres -c "select datname from pg_database where datname not in ('postgres','template0','template1')"`  
do  
echo -e "psql -q -t -h $HOST -p $PORT -U $ROLE $DB :\n"  
# ----创建获取acl的函数  
psql -h $HOST -p $PORT -U $ROLE $DB <<EOF  
drop function mon_part_get_acl(oid);  
create or replace function mon_part_get_acl(i_oid oid) returns text[] as \$\$  
declare  
  result text[] := '{}'::text[];  
begin  
  select relacl::text[] into result from pg_class where oid=i_oid and relacl is not null;  
  if found then  
    return result;  
  else  
    return '{}'::text[];  
  end if;  
exception when others then  
  return '{}'::text[];  
end;  
\$\$ language plpgsql;  
EOF  
# ----获取真实目标数据库数据,避免pgbouncer或者防火墙中转过后的地址或者数据库名的改变造成不能使用.  
REAL_HOST=`psql -t -A -h $HOST -p $PORT -U $ROLE $DB -c "select inet_server_addr()"`  
REAL_PORT=`psql -t -A -h $HOST -p $PORT -U $ROLE $DB -c "select inet_server_port()"`  
REAL_DBNAME=`psql -t -A -h $HOST -p $PORT -U $ROLE $DB -c "select current_database()"`  
# ----清除历史数据  
psql -h $MON_HOST -p $MON_PORT -U $MON_ROLE $MON_DBNAME -c "delete from mon_inherits where server_ip='$REAL_HOST' and server_port='$REAL_PORT' and dbname='$REAL_DBNAME'"  
# ----拷贝继承表数据  
psql -h $HOST -p $PORT -U $ROLE $DB -c "$TARGET_COPY_SQL"|psql -h $MON_HOST -p $MON_PORT -U $MON_ROLE $MON_DBNAME -c "$MON_COPY_SQL"  
done  
  
  
chmod 500 /home/postgres/script/coll_inherit_info.sh  

采集服务器,

vi ~/.pgpass  
  
172.16.3.150:1921:digoal:digoal:digoal  
172.16.3.150:9201:*:postgres:postgres  
.....pgpass文件中包含的其他目标库的超级用户连接配置项目略  
  
chmod 400 ~/.pgpass  

执行测试 :

ocz@db-172-16-3-150-> ./t.sh 172.16.3.150 9201 postgres  
CREATE FUNCTION  
DELETE 74  
CREATE FUNCTION  
DELETE 0  

信息收集如下 :

ocz@db-172-16-3-150-> psql -h 127.0.0.1 -p 9201 postgres  
psql (9.2.1)  
Type "help" for help.  
postgres=# \c digoal digoal  
You are now connected to database "digoal" as user "digoal".  
digoal=> select * from mon_inherits ;  
  server_ip   | server_port | dbname | inhrelid | inhparent | inhseqno |    name_inhrelid    | name_inhparent | acl_inhrelid |        
        acl_inhparent              |          crt_time            
--------------+-------------+--------+----------+-----------+----------+---------------------+----------------+--------------+------  
-----------------------------------+----------------------------  
 172.16.3.150 |        9201 | digoal |    98010 |     98007 |        1 | digoal.tbl_201201   | digoal.tbl     | {}           | {digo  
al=arwdDxt/digoal,skycac=r/digoal} | 2013-01-05 09:37:53.100495  
 172.16.3.150 |        9201 | digoal |    98013 |     98007 |        1 | digoal.tbl_201202   | digoal.tbl     | {}           | {digo  
al=arwdDxt/digoal,skycac=r/digoal} | 2013-01-05 09:37:53.100495  
....略(74 rows)  

采集服务器, 编辑定期执行脚本 :

vi /home/postgres/script/exec_coll_inherit_info.sh  
  
#!/bin/bash  
/home/postgres/script/coll_inherit_info.sh 172.16.3.150 9201 postgres  

采集服务器, 编辑采集错误监控脚本 :

vi /home/postgres/script/mutt_err_coll_inherit_info.sh   
  
#!/bin/bash  
. /home/postgres/.bash_profile  
FATAL=1  
ERROR=1  
  
FATAL=`grep -c FATAL /tmp/exec_coll_inherit_info.log`  
ERROR=`grep -c ERROR /tmp/exec_coll_inherit_info.log`  
  
if [ $FATAL -ne 0 ] || [ $ERROR -ne 0 ]; then  
  echo -e "`date +%F%T`exec_coll_inherit_info error\n"|mutt -s "exec_coll_inherit_info error" -a /tmp/exec_coll_inherit_info.log digoal.zhou@sky-mobi.com digoal@126.com  
fi  

定时任务 :

crontab -l  
  
1 5 * * * /home/postgres/script/exec_coll_inherit_info.sh >/tmp/exec_coll_inherit_info.log 2>&1  
1 6 * * * /home/postgres/script/mutt_err_coll_inherit_info.sh  

修改权限 :

chmod -R 500 /home/postgres/script/*  

在监控集中库中创建表名转日期的函数 :

create or replace function mon_part_conv_to_date(i_relname text) returns date as $$  
declare  
  arr text[] := ARRAY[0];  
  v_date text := '';  
  i text := '';  
  v_len int := 0;  
begin  
  arr := regexp_split_to_array( regexp_replace( rtrim( ltrim( regexp_replace(i_relname,'[^[:digit:]]','_','g') ,'_') ,'_') ,'_+','_','g') ,'_');  
  FOREACH i IN ARRAY arr  
  loop  
    if (length(v_date) = 8) then  
      exit;  
    end if;  
    if (length(i) = 4 and length(v_date)=0 and (substr(i,1,2) >= '18' and substr(i,1,2) <= '22')) then  
      v_date := v_date||i;  
    elsif (length(i) = 6 and length(v_date)=0 and (substr(i,1,2) >= '18' and substr(i,1,2) <= '22') and (substr(i,5,2)>='01' and substr(i,5,2)<='12')) then  
      v_date := v_date||i;  
    elsif (length(i) = 8 and length(v_date)=0 and (substr(i,1,2) >= '18' and substr(i,1,2) <= '22') and (substr(i,5,2)>='01' and substr(i,5,2)<='12') and (substr(i,7,2)>='01' and substr(i,7,2)<='31')) then  
      v_date := v_date||i;  
    elsif (length(i)=4 and length(v_date) = 4 and (substr(i,1,2)>='01' and substr(i,1,2)<='12') and (substr(i,3,2)>='01' and substr(i,3,2)<='31')) then  
      v_date := v_date||i;  
    elsif (length(i)=2 and length(v_date) = 4 and (substr(i,1,2)>='01' and substr(i,1,2)<='12')) then  
      v_date := v_date||i;  
    elsif (length(i)=2 and length(v_date) = 6 and (substr(i,1,2)>='01' and substr(i,1,2)<='31')) then  
      v_date := v_date||i;  
    else  
      continue;  
    end if;  
  end loop;  
  v_len := length(v_date);  
  case v_len  
  when 4 then -- 按年分表  
    return (v_date||'0101')::date;  
  when 6 then -- 按月分表  
    return (v_date||'01')::date;  
  when 8 then -- 按天分表  
    return v_date::date;  
  else -- 长度为5和7非日期分表  
    return '22220101'::date;  
  end case;  
exception when others then  
  return '22220101'::date;  
end;  
$$ language plpgsql;  

创建视图 :

create or replace view v_mon_partition_tables as   
select   
  *,  
  (max_pdate-current_date)/predict_gap::numeric AS future_tables_cnt  
from   
(  
select   
  *,  
  pdate-lag_pdate AS gap,  
  case   
  when pdate-lag_pdate >=365 then  
    pdate - date_trunc('year', (pdate - interval '1 year'))::date  
  when pdate-lag_pdate >=28 then  
    pdate - date_trunc('month', (pdate - interval '1 month'))::date  
  when pdate-lag_pdate >=7 then  
    pdate-lag_pdate  
  when pdate-lag_pdate >=1 then  
    pdate - date_trunc('day', (pdate - interval '1 day'))::date  
  else   
    0.9  
  end AS predict_gap  
from   
(  
select   
  row_number() over (partition by server_ip,server_port,dbname,inhparent order by mon_part_conv_to_date(name_inhrelid)) AS rn,  
  server_ip,  
  server_port,  
  dbname,   
  name_inhparent,   
  name_inhrelid,   
  acl_inhparent,   
  acl_inhrelid,   
  mon_part_conv_to_date(name_inhrelid) AS pdate,   
  lag(mon_part_conv_to_date(name_inhrelid), 1) over (partition by server_ip,server_port,dbname,inhparent order by mon_part_conv_to_date(name_inhrelid)) AS lag_pdate,  
  max(mon_part_conv_to_date(name_inhrelid)) over (partition by server_ip,server_port,dbname,inhparent) AS max_pdate  
from mon_inherits   
) t   
) t;  

视图结构 :

View "inherit_mon.v_mon_partition_tables"  
      Column       |  Type   | Modifiers   
-------------------+---------+-----------  
 rn                | bigint  |   
 server_ip         | inet    |   
 server_port       | integer |   
 dbname            | name    |   
 name_inhparent    | text    |   
 name_inhrelid     | text    |   
 acl_inhparent     | text[]  |   
 acl_inhrelid      | text[]  |   
 pdate             | date    |   
 lag_pdate         | date    |   
 max_pdate         | date    |   
 gap               | integer |   
 predict_gap       | numeric |   
 future_tables_cnt | numeric |   

监控脚本示例 :

每天自动监控 :

vi ~/inherit_day_monitor.sh  
  
#!/bin/bash  
# 环境变量  
export LANG=en_US.utf8  
export PGHOME=/opt/pgsql9.1.2  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib  
export PATH=$PGHOME/bin:$PATH:.  
  
TODAY=`date +%Y%m%d`  
EMAIL="digoal@126.com"  
  
HOST=127.0.0.1  
PORT=1921  
ROLE=digoal  
DBNAME=digoal  
  
psql -t -A -h $HOST -p $PORT -U $ROLE $DBNAME <<EOF >/tmp/inherit_day_monitor.log  
select '-- 监控继承表的权限和父表是否一致,输出不一致项:' as monitor;  
select server_ip,server_port,dbname,name_inhparent,name_inhrelid,mon_array_sort(acl_inhparent),mon_array_sort(acl_inhrelid)  
  from mon_inherits  
  where mon_array_sort(acl_inhparent)<>mon_array_sort(acl_inhrelid);  
select '-- 监控日期继承表气泡:' as monitor;  
select server_ip,server_port,dbname,name_inhparent,name_inhrelid,pdate,lag_pdate,gap,predict_gap   
  from v_mon_partition_tables  
  where   
  gap <> predict_gap   
  and pdate <> '22220101'::date   
  and lag_pdate is not null;  
select '-- 监控日期继承表过期(需满足大于未来10个表,且大于未来半年(185day)):' as monitor;  
select * from  
(  
select server_ip,server_port,dbname,name_inhparent,name_inhrelid,max_pdate,predict_gap,future_tables_cnt,row_number() over (partition by server_ip,server_port,dbname,name_inhparent order by pdate desc) AS rn  
from v_mon_partition_tables  
  where   
  (future_tables_cnt<=10 or max_pdate - current_date <185)   
  and pdate <> '22220101'::date   
  and lag_pdate is not null   
) t  
where rn=1;  
EOF  
  
CNT=0  
CNT=`wc -l /tmp/inherit_day_monitor.log|awk '{print $1}'`  
if [ $CNT -gt 3 ]; then  
  echo -e "partition table abnormal.\n"|mutt -s "partition table abnormal" -a /tmp/inherit_day_monitor.log $EMAIL  
fi  

每月非法表名检查 :

vi ~/inherit_month_invalidname_monitor.sh   
  
#!/bin/bash  
# 环境变量  
export LANG=en_US.utf8  
export PGHOME=/opt/pgsql9.1.2  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib  
export PATH=$PGHOME/bin:$PATH:.  
  
TODAY=`date +%Y%m%d`  
EMAIL="digoal@126.com"  
  
HOST=127.0.0.1  
PORT=1921  
ROLE=digoal  
DBNAME=digoal  
  
psql -t -A -h $HOST -p $PORT -U $ROLE $DBNAME <<EOF >/tmp/inherit_month_invalidname_monitor.log  
select '-- 每月非法表名检查,过滤白名单(mon_inherits_white): ' as monitor;  
select server_ip,server_port,dbname,name_inhrelid  
  from mon_inherits   
  where mon_part_conv_to_date(name_inhrelid)='22220101'::date  
  and name_inhrelid not in (select name_inhrelid from mon_inherits_white)  
  order by server_ip,server_port,dbname,name_inhparent,name_inhrelid;  
EOF  
  
CNT=0  
CNT=`wc -l /tmp/inherit_month_invalidname_monitor.log|awk '{print $1}'`  
if [ $CNT -gt 3 ]; then  
  echo -e "partition table 每月非法表名检查.\n"|mutt -s "partition table 每月非法表名检查" -a /tmp/inherit_month_invalidname_monitor.log $EMAIL  
fi  

非法表明确认不是按日期分表的, 并且不需要气泡,过期检查的. 写到白名单表中.

例如 :

insert into mon_inherits_white select name_inhrelid from (select server_ip,server_port,dbname,name_inhrelid  
  from mon_inherits   
  where mon_part_conv_to_date(name_inhrelid)='22220101'::date  
  and name_inhrelid not in (select name_inhrelid from mon_inherits_white)  
  order by server_ip,server_port,dbname,name_inhparent,name_inhrelid) t group by name_inhrelid;  

以后定期维护这个白名单表.

每月人工检查 :

vi ~/inherit_month_human_monitor.sh   
  
#!/bin/bash  
# 环境变量  
export LANG=en_US.utf8  
export PGHOME=/opt/pgsql9.1.2  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib  
export PATH=$PGHOME/bin:$PATH:.  
  
TODAY=`date +%Y%m%d`  
EMAIL="digoal@126.com"  
  
HOST=127.0.0.1  
PORT=1921  
ROLE=digoal  
DBNAME=digoal  
  
psql -t -A -h $HOST -p $PORT -U $ROLE $DBNAME <<EOF >/tmp/inherit_month_human_monitor.log  
select '-- 每月人工检查:' as monitor;  
select server_ip,server_port,dbname,name_inhparent,name_inhrelid,crt_time   
  from mon_inherits   
  order by server_ip,server_port,dbname,name_inhparent,name_inhrelid;  
EOF  
  
CNT=0  
CNT=`wc -l /tmp/inherit_month_human_monitor.log|awk '{print $1}'`  
if [ $CNT -gt 3 ]; then  
  echo -e "partition table 每月人工检查.\n"|mutt -s "partition table 每月人工检查" -a /tmp/inherit_month_human_monitor.log $EMAIL  
fi  

mutt环境 :

vi ~/.muttrc   
  
set envelope_from=yes  
set from=digoal.zhou@sky-mobi.com  
set realname="德哥"  
set use_from=yes  
set charset="UTF-8"  

定时任务 :

crontab -e  
  
1 8 * * * /home/pg91/inherit_day_monitor.sh  
1 9 1 * * /home/pg91/inherit_month_invalidname_monitor.sh  
1 9 1 * * /home/pg91/inherit_month_human_monitor.sh  

注意

1. 如果存在IP,端口,库名相同的情况(例如公司存在多个IDC, 这些IDC中用了同样的内网网段), 以上脚本将不适用. 那请使用多个监控库来区分, 或者多个监控表来区分.

2. 集中监控时为什么acl字段改成了text呢, 因为在监控库中不存在生产库中的用户, 存储不存在的角色时, aclitem[]类型将出错.

3. 权限比较可能出现父表和子表赋权时顺序不一致造成的假象, 这个需要手工调整一下.

例如 :

digoal=> create table tbl(id int);  
CREATE TABLE  
digoal=> create table tbl_child(id int);  
CREATE TABLE  
digoal=> grant select on tbl to postgres;  
GRANT  
digoal=> grant select on tbl to skycac;  
GRANT  
digoal=> grant select on tbl_child to skycac;  
GRANT  
digoal=> grant select on tbl_child to postgres;  
GRANT  

虽然实际的权限是一致的, 但是这样得到的权限比较将不一致 :

digoal=> select relacl from pg_class where relname='tbl';  
                          relacl                             
-----------------------------------------------------------  
 {digoal=arwdDxt/digoal,postgres=r/digoal,skycac=r/digoal}  
(1 row)  
  
digoal=> select relacl from pg_class where relname='tbl_child';  
                          relacl                             
-----------------------------------------------------------  
 {digoal=arwdDxt/digoal,skycac=r/digoal,postgres=r/digoal}  
(1 row)  

要解决这个问题, 需要对数组进行排序, 我们可以用unnest将数组转成setof rec,然后再按顺序聚合(PostgreSQL 9.0开始支持聚合顺序).

如果不是9.0请自行写聚合函数, 聚合函数范例可参看 :

http://blog.163.com/digoal@126/blog/static/16387704020121118112533410/

下面针对PostgreSQL 9.0+ 举例如下 :

digoal=> create or replace function mon_array_sort(i_arr text[]) returns text[] as $$  
declare  
  result text[];  
begin  
  select array_agg(arr order by arr) into result from unnest(i_arr) t(arr);  
  return result;  
end;  
$$ language plpgsql;   

排序后 :

digoal=> select mon_array_sort('{h1,b,b,c,g,e}'::text[]);  
 mon_array_sort   
----------------  
 {b,b,c,e,g,h1}  
(1 row)  

4. 一般还需要监控的项目,

父表和继承表的表结构一致性.

父表和继承表的索引一致性.

父表和继承表的列权限一致性(pg_attribute.attacl).

Flag Counter

digoal’s 大量PostgreSQL文章入口