PostgreSQL数据库监控中的统计学 - 对象SIZE的数据分布图
背景
有时,我们为了直观的显示对象的空间占用及分布情况,我们可能会以图表的形式展示。
通常情况下,我们可以根据需求,以柱状图或者聚集图的形式来展示,从各个维度来了解数据空间占用的分布情况,例如:
1. bucket分布,就是按大小排序,选定要划分为几个bucket,每个bucket放同样数量的对象,输出bucket的边界,形式和pg_stats.histogram_bounds的输出类似。
方法举例,需要用到窗口函数ntile:
postgres=# select bucket,min(size),maxsize),count(*) from (select relname,ntile(10) over( order by pg_relation_size(oid) ) bucket, pg_relation_size(oid) size from pg_class) t group by 1 order by 1;
bucket | min | max | count
--------+-------+----------+-------
1 | 0 | 0 | 31
2 | 0 | 0 | 30
3 | 0 | 0 | 30
4 | 0 | 0 | 30
5 | 0 | 8192 | 30
6 | 8192 | 8192 | 30
7 | 8192 | 16384 | 30
8 | 16384 | 16384 | 30
9 | 16384 | 32768 | 30
10 | 32768 | 36249600 | 30
(10 rows)
2. 按等间距线性分布,例如每100MB输出一组落在对应SIZE的对象。这种方式有点像systemtap的 @hist_linear 分布
https://sourceware.org/systemtap/langref/Statistics_aggregates.html#SECTION00094100000000000000
方法举例:
postgres=# select pg_relation_size(oid)/1024/1024,count(*) from pg_class group by 1 order by 1;
?column? | count
----------+-------
0 | 299
21 | 1
34 | 1
(3 rows)
3. 按2^n间距指数分布,这种方式有点像systemtap的 @hist_log 分布
https://sourceware.org/systemtap/langref/Statistics_aggregates.html#SECTION00094100000000000000
方法举例:
首先要将int转为二进制
http://blog.163.com/digoal@126/blog/static/16387704020132592725462/
create or replace function si32tob(i_num int) returns varbit as $$
declare
o_bit text;
o_len int;
i_conv int;
i_num_abs int;
i_pos int;
begin
if i_num = 0 then return varbit '0'; end if;
o_len := 32;
i_conv := 2;
i_num_abs := abs(i_num);
i_pos := trunc((dlog1(i_num_abs))/0.693147180559945);
o_bit := mod(i_num_abs,i_conv)::text;
if i_pos >= 1 then
for i in 1..i_pos loop
o_bit := mod(i_num_abs>>i, i_conv)||o_bit;
end loop;
end if;
if i_num >=0 then
null;
else
o_len := o_len - char_length(o_bit) - 1;
o_bit := repeat('0', o_len)||o_bit;
o_bit := '1'||o_bit;
end if;
return o_bit::varbit;
end;
$$ language plpgsql;
输出
postgres=# select 2^(bit_length(si32tob((pg_relation_size(oid))::int4))-1), count(*) from pg_class group by 1 order by 1;
?column? | count
----------+-------
1 | 145
8192 | 67
16384 | 60
32768 | 14
65536 | 6
131072 | 4
262144 | 7
524288 | 1
16777216 | 1
33554432 | 1
(10 rows)
4. 聚集分布,可以用k-mean分布插件
查询举例:
postgres=# select class,min(size),max(size),count(*) from (select kmeans(array[pg_relation_size(oid)],10) over() as class,pg_relation_size(oid) size from pg_class ) t group by 1 order by 1;
class | min | max | count
-------+----------+----------+-------
0 | 671744 | 671744 | 1
1 | 483328 | 483328 | 1
2 | 475136 | 475136 | 1
3 | 22487040 | 22487040 | 1
4 | 352256 | 352256 | 1
5 | 36249600 | 36249600 | 1
6 | 278528 | 319488 | 4
7 | 221184 | 221184 | 1
8 | 57344 | 139264 | 10
9 | 0 | 49152 | 285
(10 rows)