How cgroup tells you how much memory a PostgreSQL instance is using


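Background

When several database instances run on one operating system, we may want to control how much memory each instance can use, and cgroup is one way to do that.

Naturally, cgroup can also tell you how much memory an instance is using.

Example:

Create a new group under the memory controller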

[root@digoal ~]# cd /cgroup/memory  
[root@digoal memory]# mkdir cg1  
[root@digoal memory]# cd cg1  

List all PIDs of the database instance we want to observe

[root@digoal memory]# ps -ewf|grep postgres  
postgres  5492     1  0 13:54 pts/0    00:00:23 /opt/pgsql9.4.4/bin/postgres  
postgres  5494  5492  0 13:54 ?        00:00:00 postgres: logger process      
postgres  5496  5492  0 13:54 ?        00:00:02 postgres: checkpointer process     
postgres  5497  5492  0 13:54 ?        00:00:00 postgres: writer process      
postgres  5498  5492  0 13:54 ?        00:00:12 postgres: wal writer process     
postgres  5499  5492  0 13:54 ?        00:00:01 postgres: autovacuum launcher process     
postgres  5500  5492  0 13:54 ?        00:00:01 postgres: stats collector process     

Add the PIDs of this database instance to the new group

[root@digoal cg1]# echo 5492 > tasks   
[root@digoal cg1]# echo 5494 > tasks   
[root@digoal cg1]# echo 5496 > tasks   
[root@digoal cg1]# echo 5497 > tasks   
[root@digoal cg1]# echo 5498 > tasks   
[root@digoal cg1]# echo 5499 > tasks   
[root@digoal cg1]# echo 5500 > tasks   
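The repeated echo commands above can be wrapped in a loop. The following is a minimal sketch, not part of the original session: `add_pids_to_cgroup` is a hypothetical helper name, and it appends with `>>` so that it can also be tried against an ordinary file. On a real cgroup v1 `tasks` file each write still moves exactly one PID.

```shell
# Hypothetical helper (not from the original post): add every PID given
# after the cgroup directory to that group's tasks file, one write per PID.
add_pids_to_cgroup() {
    cgroup_dir=$1
    shift
    for pid in "$@"; do
        # cgroup v1 accepts only one PID per write to tasks
        echo "$pid" >> "$cgroup_dir/tasks" || return 1
    done
}

# Example, run as root, with the postmaster PID from the listing above;
# pgrep -P prints the direct children of the given parent PID:
#   add_pids_to_cgroup /cgroup/memory/cg1 5492 $(pgrep -P 5492)
```

Passing the PIDs as arguments keeps the helper independent of how they were collected, whether from `ps`, `pgrep`, or a pid file.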

The group directory now contains the following files

[root@digoal cg1]# ll  
total 0  
--w--w--w- 1 root root 0 Sep 26 21:12 cgroup.event_control  
-rw-r--r-- 1 root root 0 Sep 26 21:12 cgroup.procs  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.failcnt  
--w------- 1 root root 0 Sep 26 21:12 memory.force_empty  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.limit_in_bytes  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.max_usage_in_bytes  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.memsw.failcnt  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.memsw.limit_in_bytes  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.memsw.max_usage_in_bytes  
-r--r--r-- 1 root root 0 Sep 26 21:12 memory.memsw.usage_in_bytes  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.move_charge_at_immigrate  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.oom_control  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.soft_limit_in_bytes  
-r--r--r-- 1 root root 0 Sep 26 21:12 memory.stat  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.swappiness  
-r--r--r-- 1 root root 0 Sep 26 21:12 memory.usage_in_bytes  
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.use_hierarchy  
-rw-r--r-- 1 root root 0 Sep 26 21:12 notify_on_release  
-rw-r--r-- 1 root root 0 Sep 26 21:13 tasks  

The amount of memory the group is currently allowed to use

[root@digoal cg1]# cat memory.limit_in_bytes   
9223372036854775807  

If usage exceeds this value, an OOM condition is triggered; how the kernel handles it is up to you.

[root@digoal cg1]# cat memory.oom_control   
oom_kill_disable 0  
under_oom 0  

oom_kill_disable = 1 means the OOM killer is not invoked; instead, the process requesting memory hangs

At reading, current status of OOM is shown.  
        oom_kill_disable 0 or 1 (if 1, oom-killer is disabled)  
        under_oom        0 or 1 (if 1, the memcg is under OOM, tasks may  be stopped.)  

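under_oom shows whether the group is currently over its memory limit: 1 means an OOM is in progress, or the group has exceeded the limit and its processes are hung.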
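To check this flag from a script, the two-line format of memory.oom_control shown above can be parsed with awk. This is a sketch under that assumption; `under_oom_flag` is a hypothetical helper name.

```shell
# Hypothetical helper: print the under_oom value (0 or 1) for a group,
# assuming the "name value" line format of memory.oom_control shown above.
under_oom_flag() {
    cgroup_dir=$1
    awk '$1 == "under_oom" { print $2 }' "$cgroup_dir/memory.oom_control"
}

# Example:
#   if [ "$(under_oom_flag /cgroup/memory/cg1)" = "1" ]; then
#       echo "cg1 is over its memory limit"
#   fi
```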

Show the peak memory usage this group has ever reached

[root@digoal cg1]# cat memory.max_usage_in_bytes   
9203712  

Show the group's current memory usage

[root@digoal cg1]# cat memory.usage_in_bytes   
2985984  

These two values give the total memory currently used by the database processes in this group and their historical peak.
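Both counters are plain byte values, so a short wrapper can print them in a more readable unit. A minimal sketch, with `report_memory_usage` as a hypothetical helper name and integer KiB as an arbitrary choice of unit:

```shell
# Hypothetical helper: print current and peak usage of a group in KiB
# (integer division, so any fraction of a KiB is dropped).
report_memory_usage() {
    cgroup_dir=$1
    current=$(cat "$cgroup_dir/memory.usage_in_bytes")
    peak=$(cat "$cgroup_dir/memory.max_usage_in_bytes")
    echo "current: $((current / 1024)) KiB, peak: $((peak / 1024)) KiB"
}

# Example:
#   report_memory_usage /cgroup/memory/cg1
```

With the values shown above (2985984 and 9203712 bytes) this prints `current: 2916 KiB, peak: 8988 KiB`.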

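Note that the PostgreSQL postmaster is the parent of all the other processes of the instance, and by default a child process is placed in its parent's cgroup, so it is very easy to measure and control the memory usage of the whole instance.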
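One way to confirm this is to open a new database connection and check that the new backend's PID appears in the group's tasks file. A minimal sketch; `pid_in_cgroup` is a hypothetical helper name, and the PID in the example is a placeholder.

```shell
# Hypothetical helper: succeed (exit status 0) when the PID is listed
# in the group's tasks file, which holds one PID per line.
pid_in_cgroup() {
    cgroup_dir=$1
    pid=$2
    grep -qx "$pid" "$cgroup_dir/tasks"
}

# Example, where 6001 stands for the PID of a newly started backend:
#   pid_in_cgroup /cgroup/memory/cg1 6001 && echo "backend is in cg1"
```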

The memory statistics file, memory.stat

[root@digoal cg1]# cat memory.stat  
cache 1200128  
rss 1589248  
mapped_file 0  
pgpgin 104831  
pgpgout 104150  
swap 0  
inactive_anon 823296  
active_anon 1798144  
inactive_file 139264  
active_file 28672  
unevictable 0  
hierarchical_memory_limit 10240000  
hierarchical_memsw_limit 9223372036854775807  
total_cache 1200128  
total_rss 1589248  
total_mapped_file 0  
total_pgpgin 104831  
total_pgpgout 104150  
total_swap 0  
total_inactive_anon 823296  
total_active_anon 1798144  
total_inactive_file 139264  
total_active_file 28672  
total_unevictable 0  

Explanation

The memory.stat file gives accounting information. Now, the number of caches, RSS and Active pages/Inactive pages are shown.

5.2 stat file

memory.stat file includes following statistics

cache           - # of bytes of page cache memory.  
rss             - # of bytes of anonymous and swap cache memory.  
pgpgin          - # of pages paged in (equivalent to # of charging events).  
pgpgout         - # of pages paged out (equivalent to # of uncharging events).  
active_anon     - # of bytes of anonymous and  swap cache memory on active  
                  lru list.  
inactive_anon   - # of bytes of anonymous memory and swap cache memory on  
                  inactive lru list.  
active_file     - # of bytes of file-backed memory on active lru list.  
inactive_file   - # of bytes of file-backed memory on inactive lru list.  
unevictable     - # of bytes of memory that cannot be reclaimed (mlocked etc).  
  
The following additional stats are dependent on CONFIG_DEBUG_VM.  
  
recent_rotated_anon     - VM internal parameter. (see mm/vmscan.c)  
recent_rotated_file     - VM internal parameter. (see mm/vmscan.c)  
recent_scanned_anon     - VM internal parameter. (see mm/vmscan.c)  
recent_scanned_file     - VM internal parameter. (see mm/vmscan.c)  
  
Memo:  
        recent_rotated means recent frequency of lru rotation.  
        recent_scanned means recent # of scans to lru.  
        showing for better debug please see the code for meanings.  
  
Note:  
        Only anonymous and swap cache memory is listed as part of 'rss' stat.  
        This should not be confused with the true 'resident set size' or the  
        amount of physical memory used by the cgroup. Per-cgroup rss  
        accounting is not done yet.  
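Individual counters can be pulled out of memory.stat by name. The sketch below assumes the "name value" line format shown above; `stat_field` and `cache_plus_rss` are hypothetical helper names. The field name is matched exactly, so `rss` does not pick up `total_rss`.

```shell
# Hypothetical helper: print the value of one counter from memory.stat.
stat_field() {
    cgroup_dir=$1
    field=$2
    awk -v f="$field" '$1 == f { print $2 }' "$cgroup_dir/memory.stat"
}

# Hypothetical helper: page cache plus anonymous/swap-cache memory, in bytes.
cache_plus_rss() {
    cgroup_dir=$1
    echo $(( $(stat_field "$cgroup_dir" cache) + $(stat_field "$cgroup_dir" rss) ))
}

# Example:
#   stat_field /cgroup/memory/cg1 rss
#   cache_plus_rss /cgroup/memory/cg1
```

For the sample output above, `cache_plus_rss` gives 1200128 + 1589248 = 2789376 bytes.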

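To control an instance's memory usage, as mentioned above, set memory.limit_in_bytes, which caps how much memory the processes in this group can use.

For example, limit the group to 10MB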

[root@digoal cg1]# echo 10M > memory.limit_in_bytes   
[root@digoal cg1]# cat memory.limit_in_bytes   
10485760  
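The kernel expands the size suffix itself, with 10M read back as 10 * 1024 * 1024 = 10485760 bytes. The sketch below reproduces that arithmetic for the K, M and G suffixes only, which is handy when comparing a configured limit against the byte counters. `to_bytes` is a hypothetical helper name, not a kernel or cgroup tool.

```shell
# Hypothetical helper: convert a size such as 10M into bytes,
# using 1024-based units for the K, M and G suffixes.
to_bytes() {
    value=$1
    number=${value%[KkMmGg]}   # strip one trailing suffix letter, if any
    case $value in
        *[Kk]) echo $((number * 1024)) ;;
        *[Mm]) echo $((number * 1024 * 1024)) ;;
        *[Gg]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$number" ;;
    esac
}

# Example:
#   to_bytes 10M
```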

At the same time, memory.oom_control tells the kernel whether to invoke the OOM killer or hang the process when the limit is exceeded.

You can disable oom-killer by writing "1" to memory.oom_control file.  
As.  
        #echo 1 > memory.oom_control  

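Note that hanging may not be a good thing: a hung process may be holding a coarse-grained file system lock, which can affect every operation on that file system across the whole operating system.

Also, the kernel's OOM killer sends the equivalent of kill -9 (SIGKILL); if a database process is killed this way, the whole database restarts and enters recovery. See:

src/backend/postmaster/postmaster.c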

For more on cgroup usage and background, see

/usr/share/doc/kernel-doc-2.6.32/Documentation/cgroups

References

1. /usr/share/doc/kernel-doc-2.6.32/Documentation/cgroups/memory.txt
