PostgreSQL数据库监控中的统计学 - 对象SIZE的数据分布图

1 minute read

背景

有时，我们为了直观的显示对象的空间占用及分布情况，我们可能会以图表的形式展示。

通常情况下，我们可以根据需求，以柱状图或者聚集图的形式来展示，从各个维度来了解数据空间占用的分布情况，例如：

1. bucket分布，就是按大小排序，选定要划分为几个bucket，每个bucket放同样数量的对象，输出bucket的边界，形式和pg_stats.histogram_bounds的输出类似。

方法举例，需要用到窗口函数ntile：

postgres=# select bucket,min(size),maxsize),count(*) from (select relname,ntile(10) over( order by pg_relation_size(oid) ) bucket, pg_relation_size(oid) size from pg_class) t group by 1 order by 1;  
 bucket |  min  |   max    | count   
--------+-------+----------+-------  
|     0 |        0 |    31  
|     0 |        0 |    30  
|     0 |        0 |    30  
|     0 |        0 |    30  
|     0 |     8192 |    30  
|  8192 |     8192 |    30  
|  8192 |    16384 |    30  
| 16384 |    16384 |    30  
| 16384 |    32768 |    30  
| 32768 | 36249600 |    30  
(10 rows)  

2. 按等间距线性分布，例如每100MB输出一组落在对应SIZE的对象。这种方式有点像systemtap的 @hist_linear 分布

https://sourceware.org/systemtap/langref/Statistics_aggregates.html#SECTION00094100000000000000

方法举例：

postgres=# select pg_relation_size(oid)/1024/1024,count(*) from pg_class group by 1 order by 1;  
 ?column? | count   
----------+-------  
        0 |   299  
       21 |     1  
       34 |     1  
(3 rows)  

3. 按2^n间距指数分布，这种方式有点像systemtap的 @hist_log 分布

https://sourceware.org/systemtap/langref/Statistics_aggregates.html#SECTION00094100000000000000

方法举例：

首先要将int转为二进制

http://blog.163.com/digoal@126/blog/static/16387704020132592725462/

create or replace function si32tob(i_num int) returns varbit as $$  
declare  
  o_bit text;  
  o_len int;  
  i_conv int;  
  i_num_abs int;  
  i_pos int;  
begin  
  if i_num = 0 then return varbit '0'; end if;   
   o_len := 32;  
  i_conv := 2;  
  i_num_abs := abs(i_num);  
  i_pos := trunc((dlog1(i_num_abs))/0.693147180559945);  
  o_bit := mod(i_num_abs,i_conv)::text;  
  if i_pos >= 1 then  
    for i in 1..i_pos loop  
      o_bit := mod(i_num_abs>>i, i_conv)||o_bit;  
    end loop;  
  end if;  
  if i_num >=0 then  
    null;  
  else  
    o_len := o_len - char_length(o_bit) - 1;  
    o_bit := repeat('0', o_len)||o_bit;  
    o_bit := '1'||o_bit;  
  end if;  
  return o_bit::varbit;  
end;  
$$ language plpgsql;  

输出

postgres=# select 2^(bit_length(si32tob((pg_relation_size(oid))::int4))-1), count(*) from pg_class group by 1 order by 1;  
 ?column? | count   
----------+-------  
        1 |   145  
     8192 |    67  
    16384 |    60  
    32768 |    14  
    65536 |     6  
   131072 |     4  
   262144 |     7  
   524288 |     1  
 16777216 |     1  
 33554432 |     1  
(10 rows)  

4. 聚集分布，可以用k-mean分布插件

《K-Means 数据聚集算法》

查询举例：

postgres=# select class,min(size),max(size),count(*) from (select kmeans(array[pg_relation_size(oid)],10) over() as class,pg_relation_size(oid) size from pg_class ) t group by 1 order by 1;  
 class |   min    |   max    | count   
-------+----------+----------+-------  
|   671744 |   671744 |     1  
|   483328 |   483328 |     1  
|   475136 |   475136 |     1  
| 22487040 | 22487040 |     1  
|   352256 |   352256 |     1  
| 36249600 | 36249600 |     1  
|   278528 |   319488 |     4  
|   221184 |   221184 |     1  
|    57344 |   139264 |    10  
|        0 |    49152 |   285  
(10 rows)  

digoal’s 大量PostgreSQL文章入口

Twitter Facebook Google+ LinkedIn

Digoal.zhou

PostgreSQL数据库监控中的统计学 - 对象SIZE的数据分布图

背景

digoal’s 大量PostgreSQL文章入口

You May Also Enjoy

PostgreSQL(PPAS 兼容Oracle) 从零开始入门手册 - 珍藏版

PostgreSQL pipelinedb 流计算插件 - IoT应用 - 实时轨迹聚合

PostgreSQL plpgsql 存储过程、函数 - 状态、异常变量打印、异常捕获… - GET [STACKED] DIAGNOSTICS

PostgreSQL datediff 日期间隔（单位转换）兼容SQL用法