cgroup告诉你如何计算 PostgreSQL 数据库实例用了多少内存
背景
当我们在一个操作系统中,启动了多个数据库实例时,我们也许会控制各个实例可以使用的内存,通过cgroup来控制是一种手段。
显然,使用cgroup也可以知道你的实例使用了多少内存。
例子:
在内存组新建一个组
[root@digoal ~]# cd /cgroup/memory
[root@digoal memory]# mkdir cg1
[root@digoal memory]# cd cg1
列出我们需要观察的某个数据库实例的所有PID
[root@digoal memory]# ps -ewf|grep postgres
postgres 5492 1 0 13:54 pts/0 00:00:23 /opt/pgsql9.4.4/bin/postgres
postgres 5494 5492 0 13:54 ? 00:00:00 postgres: logger process
postgres 5496 5492 0 13:54 ? 00:00:02 postgres: checkpointer process
postgres 5497 5492 0 13:54 ? 00:00:00 postgres: writer process
postgres 5498 5492 0 13:54 ? 00:00:12 postgres: wal writer process
postgres 5499 5492 0 13:54 ? 00:00:01 postgres: autovacuum launcher process
postgres 5500 5492 0 13:54 ? 00:00:01 postgres: stats collector process
将这个数据库实例的PID加入到新建的这个组
[root@digoal cg1]# echo 5492 > tasks
[root@digoal cg1]# echo 5494 > tasks
[root@digoal cg1]# echo 5496 > tasks
[root@digoal cg1]# echo 5497 > tasks
[root@digoal cg1]# echo 5498 > tasks
[root@digoal cg1]# echo 5499 > tasks
[root@digoal cg1]# echo 5500 > tasks
当下我们可以看到这些内容
[root@digoal cg1]# ll
total 0
--w--w--w- 1 root root 0 Sep 26 21:12 cgroup.event_control
-rw-r--r-- 1 root root 0 Sep 26 21:12 cgroup.procs
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.failcnt
--w------- 1 root root 0 Sep 26 21:12 memory.force_empty
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.memsw.failcnt
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.memsw.limit_in_bytes
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.memsw.max_usage_in_bytes
-r--r--r-- 1 root root 0 Sep 26 21:12 memory.memsw.usage_in_bytes
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.move_charge_at_immigrate
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.oom_control
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.soft_limit_in_bytes
-r--r--r-- 1 root root 0 Sep 26 21:12 memory.stat
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.swappiness
-r--r--r-- 1 root root 0 Sep 26 21:12 memory.usage_in_bytes
-rw-r--r-- 1 root root 0 Sep 26 21:12 memory.use_hierarchy
-rw-r--r-- 1 root root 0 Sep 26 21:12 notify_on_release
-rw-r--r-- 1 root root 0 Sep 26 21:13 tasks
当前允许你使用的内存
[root@digoal cg1]# cat memory.limit_in_bytes
9223372036854775807
如果使用的内存超过这个值,会触发OOM,当发生OOM时,内核的处理方式由你来定。
[root@digoal cg1]# cat memory.oom_control
oom_kill_disable 0
under_oom 0
如果oom_kill_disable = 1表示不发生oom,但是会hang住申请内存的进程
At reading, current status of OOM is shown.
oom_kill_disable 0 or 1 (if 1, oom-killer is disabled)
under_oom 0 or 1 (if 1, the memcg is under OOM, tasks may be stopped.)
under_oom 显示当前是不是处于超内存的状态,1表示正在发生OOM或已经超内存被hang住了。
显示这个组曾经用过的内存峰值
[root@digoal cg1]# cat memory.max_usage_in_bytes
9203712
显示这个组当前的内存使用情况
[root@digoal cg1]# cat memory.usage_in_bytes
2985984
以上两个值就能表明我们这个组中的数据库进程,总的内存使用量,以及历史的内存使用峰值。
注意PostgreSQL的postmaster进程是所有子进程的父进程,默认会把子进程加入到父进程所在的cgroup 组,所以我们可以非常方便的统计和控制整个实例的内存使用情况。
内存的统计信息文件memory.stat
[root@digoal cg1]# cat memory.stat
cache 1200128
rss 1589248
mapped_file 0
pgpgin 104831
pgpgout 104150
swap 0
inactive_anon 823296
active_anon 1798144
inactive_file 139264
active_file 28672
unevictable 0
hierarchical_memory_limit 10240000
hierarchical_memsw_limit 9223372036854775807
total_cache 1200128
total_rss 1589248
total_mapped_file 0
total_pgpgin 104831
total_pgpgout 104150
total_swap 0
total_inactive_anon 823296
total_active_anon 1798144
total_inactive_file 139264
total_active_file 28672
total_unevictable 0
解释
The memory.stat file gives accounting information. Now, the number of
caches, RSS and Active pages/Inactive pages are shown.
5.2 stat file
memory.stat file includes following statistics
cache - # of bytes of page cache memory. 页缓存
rss - # of bytes of anonymous and swap cache memory. 你们和交换缓存
pgpgin - # of pages paged in (equivalent to # of charging events).
pgpgout - # of pages paged out (equivalent to # of uncharging events).
active_anon - # of bytes of anonymous and swap cache memory on active
lru list.
inactive_anon - # of bytes of anonymous memory and swap cache memory on
inactive lru list.
active_file - # of bytes of file-backed memory on active lru list.
inactive_file - # of bytes of file-backed memory on inactive lru list.
unevictable - # of bytes of memory that cannot be reclaimed (mlocked etc).
The following additional stats are dependent on CONFIG_DEBUG_VM.
recent_rotated_anon - VM internal parameter. (see mm/vmscan.c)
recent_rotated_file - VM internal parameter. (see mm/vmscan.c)
recent_scanned_anon - VM internal parameter. (see mm/vmscan.c)
recent_scanned_file - VM internal parameter. (see mm/vmscan.c)
Memo:
recent_rotated means recent frequency of lru rotation.
recent_scanned means recent # of scans to lru.
showing for better debug please see the code for meanings.
Note:
Only anonymous and swap cache memory is listed as part of 'rss' stat.
This should not be confused with the true 'resident set size' or the
amount of physical memory used by the cgroup. Per-cgroup rss
accounting is not done yet.
如果要控制实例的内存使用情况,前面说了,通过
memory.limit_in_bytes 可以控制这个组内的进程能使用多少内存,
限制到10MB
[root@digoal cg1]# echo 10M > memory.limit_in_bytes
[root@digoal cg1]# cat memory.limit_in_bytes
10485760
同时通过memory.oom_control告诉内核当内存超出限制时该oom还是hang。
You can disable oom-killer by writing "1" to memory.oom_control file.
As.
#echo 1 > memory.oom_control
注意,hang可能不是一个好事情,因为被hang的进程可能持有一种比较大的文件系统锁,可能会影响整个操作系统对该文件系统的操作。
同时内核oom发的是kill -9的信号,如果数据库的进程被kill -9了,会导致整个数据库restart并进入恢复阶段。
src/backend/postmaster/postmaster.c
更多的cgroup用法和介绍可以参考
/usr/share/doc/kernel-doc-2.6.32/Documentation/cgroups
参考
1. /usr/share/doc/kernel-doc-2.6.32/Documentation/cgroups/memory.txt