在OOM现场 谈一谈数据库内存分配, 以及审计的重要性

10 minute read

背景

数据库是比较重内存的应用软件之一,比如排序、聚合、使用较大的列、使用很长的SQL(值SQL本身的内容长度),或者传入很大的变长类型值时,都可能使得单个连接就会消耗很大的内存。

而另一方面,每个连接会消耗一定的内存,比如SYSCACHE , RELCACHE,随着访问的对象变多,如果是长连接,消耗的内存也会越多。 通常来说,长连接当访问了很多元数据时,可能占用几十MB到上百MB不等。

当使用了cgroup来限制数据库实例的总内存时,随着数据库占用的RSS部分的内存越来越多,如果数据库连接在申请内存时,超出了cgroup的限制,则连接可能被OOM掉。

当然,即使不使用cgroup,Linux也会根据内核的配置,以及用户申请内存的动作,当时的剩余内存等情况综合,发生OOM。

有几篇文章可以参考一下

《精确度量Linux下进程占用多少内存的方法》

《一个笛卡尔积的update from语句引发的(内存泄露?)问题》

《PostgreSQL relcache在长连接应用中的内存霸占”坑”》

《Linux page allocation failure 的问题处理 - lowmem_reserve_ratio》

当发生了OOM后,如何找到引起OOM的那个会话,他是罪魁祸首(指单个会话申请了过多的内存),还是压死骆驼的最后一根稻草(指连接数过多)呢?

数据库又是如何申请内存的呢?

rss和cache

简单讲一下rss和cache.

rss是程序申请的内存,不能被内核自动释放,由用户自己来管理它什么时候被释放。

cache,通常指缓存,比如文件系统的缓存,不由用户进程来管理,它可以被内核释放。

我们所指的内存不足,是指即使cache完全被释放,也无法分配足够的内存给用户请求。

在cgroup的memory子系统中,我们可以看到这两个部分,rss很多,cache很少时,就要注意了,可能会发生OOM。

# cat memory.stat   
cache 204800  
rss 0  
mapped_file 0  
pgpgin 974906  
pgpgout 974856  
swap 0  
inactive_anon 155648  
active_anon 0  
inactive_file 49152  
active_file 0  
unevictable 0  
hierarchical_memory_limit 102400000  
hierarchical_memsw_limit 102400000  
total_cache 204800  
total_rss 0  
total_mapped_file 0  
total_pgpgin 974906  
total_pgpgout 974856  
total_swap 0  
total_inactive_anon 155648  
total_active_anon 0  
total_inactive_file 49152  
total_active_file 0  
total_unevictable 0  

数据库申请内存举例

以PostgreSQL数据库为例,为了管理方便,pg使用统一的内存分配和释放API,便于管理,详见src/backend/utils/mmgr/mcxt.c。

用得比较多的比如palloc,它申请的内存某些时候可以自动被回收,比如事务结束,会话断开,QUERY结束时,使用palloc申请的某些内存,会自动被释放。

还有一些内存分配接口详见src/backend/utils/mmgr/mcxt.c。

 * mcxt.c  
 *	  POSTGRES memory context management code.  
 *  
 * This module handles context management operations that are independent  
 * of the particular kind of context being operated on.  It calls  
 * context-type-specific operations via the function pointers in a  
 * context's MemoryContextMethods struct.  
 *  
 *  
 * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group  
 * Portions Copyright (c) 1994, Regents of the University of California  
 *  
 *  
 * IDENTIFICATION  
 *	  src/backend/utils/mmgr/mcxt.c  
  
  
void *  
palloc(Size size)  
{  
	/* duplicates MemoryContextAlloc to avoid increased overhead */  
	void	   *ret;  
  
	AssertArg(MemoryContextIsValid(CurrentMemoryContext));  
  
	if (!AllocSizeIsValid(size))  // 目前变长类型,最大支持1GB,要更大可以使用大对象类型。  
		elog(ERROR, "invalid memory alloc request size %zu", size);  
  
	CurrentMemoryContext->isReset = false;  
  
	ret = (*CurrentMemoryContext->methods->alloc) (CurrentMemoryContext, size);  // 分配方法,用户可以自定义memcontx或者使用已有的。    
	VALGRIND_MEMPOOL_ALLOC(CurrentMemoryContext, ret, size);  
  
	return ret;  
}  

处理单个值时,内存限制。单条SQL的内存限制也是1GB。

 * memutils.h  
 *	  This file contains declarations for memory allocation utility  
 *	  functions.  These are functions that are not quite widely used  
 *	  enough to justify going in utils/palloc.h, but are still part  
 *	  of the API of the memory management subsystem.  
 *  
 *  
 * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group  
 * Portions Copyright (c) 1994, Regents of the University of California  
 *  
 * src/include/utils/memutils.h  
  
#define MaxAllocSize	((Size) 0x3fffffff)		/* 1 gigabyte - 1 */  
  
#define AllocSizeIsValid(size)	((Size) (size) <= MaxAllocSize)  

用户可以根据mem context存储内存分配和释放的方法,应对不同的使用场景,这样保证了接口依旧是palloc。

 * memnodes.h  
 *	  POSTGRES memory context node definitions.  
 *  
 *  
 * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group  
 * Portions Copyright (c) 1994, Regents of the University of California  
 *  
 * src/include/nodes/memnodes.h  
  
/*  
 * MemoryContext  
 *		A logical context in which memory allocations occur.  
 *  
 * MemoryContext itself is an abstract type that can have multiple  
 * implementations, though for now we have only AllocSetContext.  
 * The function pointers in MemoryContextMethods define one specific  
 * implementation of MemoryContext --- they are a virtual function table  
 * in C++ terms.  
 *  
 * Node types that are actual implementations of memory contexts must  
 * begin with the same fields as MemoryContext.  
 *  
 * Note: for largely historical reasons, typedef MemoryContext is a pointer  
 * to the context struct rather than the struct type itself.  
 */  
  
typedef struct MemoryContextMethods  
{  
	void	   *(*alloc) (MemoryContext context, Size size);  
	/* call this free_p in case someone #define's free() */  
	void		(*free_p) (MemoryContext context, void *pointer);  
	void	   *(*realloc) (MemoryContext context, void *pointer, Size size);  
	void		(*init) (MemoryContext context);  
	void		(*reset) (MemoryContext context);  
	void		(*delete_context) (MemoryContext context);  
	Size		(*get_chunk_space) (MemoryContext context, void *pointer);  
	bool		(*is_empty) (MemoryContext context);  
	void		(*stats) (MemoryContext context, int level);  
#ifdef MEMORY_CONTEXT_CHECKING  
	void		(*check) (MemoryContext context);  
#endif  
} MemoryContextMethods;  

我们可以在数据库的类型处理中,大量的用到了palloc,同时也可以很清晰的了解,每一种类型,在使用是需要消耗多少内存。

cd postgresql-9.6.1/src/backend/utils

$grep -r palloc *|less  
adt/rangetypes_spgist.c:        lowerBounds = palloc(sizeof(RangeBound) * in->nTuples);  
adt/rangetypes_spgist.c:        upperBounds = palloc(sizeof(RangeBound) * in->nTuples);  
adt/rangetypes_spgist.c:                out->mapTuplesToNodes = palloc(sizeof(int) * in->nTuples);  
  
...  
  
adt/varchar.c:  char       *res = (char *) palloc(64);  
adt/varchar.c:  result = (BpChar *) palloc(maxlen + VARHDRSZ);  
adt/varchar.c:  result = palloc(maxlen + VARHDRSZ);  
adt/varchar.c:  result = (BpChar *) palloc(VARHDRSZ + 1);  
adt/varchar.c:  /* We use palloc0 here to ensure result is zero-padded */  
adt/varchar.c:  result = (Name) palloc0(NAMEDATALEN);  
  
...  
  
adt/oracle_compat.c:    ret = (text *) palloc(VARHDRSZ + bytelen);  
adt/oracle_compat.c:    ret = (text *) palloc(VARHDRSZ + bytelen);  
adt/oracle_compat.c:                    stringchars = (const char **) palloc(stringlen * sizeof(char *));  
adt/oracle_compat.c:                    stringmblen = (int *) palloc(stringlen * sizeof(int));  
adt/oracle_compat.c:                    setchars = (const char **) palloc(setlen * sizeof(char *));  
adt/oracle_compat.c:                    setmblen = (int *) palloc(setlen * sizeof(int));  
adt/oracle_compat.c:    ret = (bytea *) palloc(VARHDRSZ + m);  
adt/oracle_compat.c:    result = (text *) palloc(worst_len + VARHDRSZ);  
adt/oracle_compat.c:            result = (text *) palloc(VARHDRSZ + bytes);  
adt/oracle_compat.c:            result = (text *) palloc(VARHDRSZ + 1);  
adt/oracle_compat.c:    result = (text *) palloc(tlen);  
  
...  
  
adt/tsvector.c:                                 repalloc(res->pos, newlen * sizeof(WordEntryPos));  
adt/tsvector.c: arr = (WordEntryIN *) palloc(sizeof(WordEntryIN) * arrlen);  
adt/tsvector.c: cur = tmpbuf = (char *) palloc(buflen);  
adt/tsvector.c:                         repalloc((void *) arr, sizeof(WordEntryIN) * arrlen);  
adt/tsvector.c:                 tmpbuf = (char *) repalloc((void *) tmpbuf, buflen);  
adt/tsvector.c: in = (TSVector) palloc0(totallen);  
adt/tsvector.c: curout = outbuf = (char *) palloc(lenbuf);  
adt/tsvector.c: vec = (TSVector) palloc0(len);  
adt/tsvector.c:                 vec = (TSVector) repalloc(vec, len);  
adt/tsvector.c:                  * Pad to 2-byte alignment if necessary. Though we used palloc0  
adt/tsvector.c:                  * for the initial allocation, subsequent repalloc'd memory areas  
  
...  
  
arrayfuncs.c:   dataPtr = (Datum *) palloc(nitems * sizeof(Datum));  
arrayfuncs.c:   nullsPtr = (bool *) palloc(nitems * sizeof(bool));  
arrayfuncs.c:   retval = (ArrayType *) palloc0(nbytes);  
  
...  
varlena.c:              sss = palloc(sizeof(VarStringSortSupport));  
varlena.c:              sss->buf1 = palloc(TEXTBUFLEN);  
varlena.c:              sss->buf2 = palloc(TEXTBUFLEN);  
varlena.c:                      sss->buf1 = palloc(sss->buflen1);  
varlena.c:                      sss->buf2 = palloc(sss->buflen2);  
varlena.c:      result = (bytea *) palloc(len);  
varlena.c:      res = (bytea *) palloc(VARSIZE(v));  
...  
  
  
-rw-r--r-- 1 digoal users 137344 Oct 25 04:08 acl.c  
-rw-r--r-- 1 digoal users   9269 Oct 25 04:08 amutils.c  
-rw-r--r-- 1 digoal users  13270 Oct 25 04:08 array_expanded.c  
-rw-r--r-- 1 digoal users 173266 Oct 25 04:08 arrayfuncs.c  
-rw-r--r-- 1 digoal users  33193 Oct 25 04:08 array_selfuncs.c  
-rw-r--r-- 1 digoal users  26170 Oct 25 04:08 array_typanalyze.c  
-rw-r--r-- 1 digoal users  25379 Oct 25 04:08 array_userfuncs.c  
-rw-r--r-- 1 digoal users   5742 Oct 25 04:08 arrayutils.c  
-rw-r--r-- 1 digoal users   4732 Oct 25 04:08 ascii.c  
-rw-r--r-- 1 digoal users   8139 Oct 25 04:08 bool.c  
-rw-r--r-- 1 digoal users  23931 Oct 25 04:08 cash.c  
-rw-r--r-- 1 digoal users   4442 Oct 25 04:08 char.c  
-rw-r--r-- 1 digoal users  64563 Oct 25 04:08 date.c  
-rw-r--r-- 1 digoal users 126062 Oct 25 04:08 datetime.c  
-rw-r--r-- 1 digoal users  10976 Oct 25 04:08 datum.c  
-rw-r--r-- 1 digoal users  23215 Oct 25 04:08 dbsize.c  
-rw-r--r-- 1 digoal users  10602 Oct 25 04:08 domains.c  
-rw-r--r-- 1 digoal users  10868 Oct 25 04:08 encode.c  
-rw-r--r-- 1 digoal users  12361 Oct 25 04:08 enum.c  
-rw-r--r-- 1 digoal users   3930 Oct 25 04:08 expandeddatum.c  
-rw-r--r-- 1 digoal users  82658 Oct 25 04:08 float.c  
-rw-r--r-- 1 digoal users 136395 Oct 25 04:08 formatting.c  
-rw-r--r-- 1 digoal users  11978 Oct 25 04:08 format_type.c  
-rw-r--r-- 1 digoal users  11402 Oct 25 04:08 genfile.c  
-rw-r--r-- 1 digoal users 124438 Oct 25 04:08 geo_ops.c  
-rw-r--r-- 1 digoal users   2316 Oct 25 04:08 geo_selfuncs.c  
-rw-r--r-- 1 digoal users  18718 Oct 25 04:08 geo_spgist.c  
-rw-r--r-- 1 digoal users   6491 Oct 25 04:08 inet_cidr_ntop.c  
-rw-r--r-- 1 digoal users  12305 Oct 25 04:08 inet_net_pton.c  
-rw-r--r-- 1 digoal users  32342 Oct 25 04:08 int8.c  
-rw-r--r-- 1 digoal users  30325 Oct 25 04:08 int.c  
-rw-r--r-- 1 digoal users  47499 Oct 25 04:08 jsonb.c  
-rw-r--r-- 1 digoal users  15766 Oct 25 04:08 jsonb_gin.c  
-rw-r--r-- 1 digoal users   6007 Oct 25 04:08 jsonb_op.c  
-rw-r--r-- 1 digoal users  50741 Oct 25 04:08 jsonb_util.c  
-rw-r--r-- 1 digoal users  64390 Oct 25 04:08 json.c  
-rw-r--r-- 1 digoal users 102478 Oct 25 04:08 jsonfuncs.c  
-rw-r--r-- 1 digoal users  12710 Oct 25 04:08 levenshtein.c  
-rw-r--r-- 1 digoal users  10743 Oct 25 04:08 like.c  
-rw-r--r-- 1 digoal users   9815 Oct 25 04:08 like_match.c  
-rw-r--r-- 1 digoal users  24444 Oct 25 04:08 lockfuncs.c  
-rw-r--r-- 1 digoal users   6792 Oct 25 04:08 mac.c  
-rw-r--r-- 1 digoal users  22403 Oct 25 04:08 misc.c  
-rw-r--r-- 1 digoal users  37550 Oct 25 04:08 nabstime.c  
-rw-r--r-- 1 digoal users   6947 Oct 25 04:08 name.c  
-rw-r--r-- 1 digoal users  31667 Oct 25 04:08 network.c  
-rw-r--r-- 1 digoal users  22779 Oct 25 04:08 network_gist.c  
-rw-r--r-- 1 digoal users  32861 Oct 25 04:08 network_selfuncs.c  
-rw-r--r-- 1 digoal users 213004 Oct 25 04:08 numeric.c  
-rw-r--r-- 1 digoal users   9498 Oct 25 04:08 numutils.c  
-rw-r--r-- 1 digoal users   9911 Oct 25 04:08 oid.c  
-rw-r--r-- 1 digoal users  22636 Oct 25 04:08 oracle_compat.c  
-rw-r--r-- 1 digoal users  38316 Oct 25 04:08 orderedsetaggs.c  
-rw-r--r-- 1 digoal users  41158 Oct 25 04:08 pg_locale.c  
-rw-r--r-- 1 digoal users   4446 Oct 25 04:08 pg_lsn.c  
-rw-r--r-- 1 digoal users  47784 Oct 25 04:08 pgstatfuncs.c  
-rw-r--r-- 1 digoal users   4902 Oct 25 04:08 pg_upgrade_support.c  
-rw-r--r-- 1 digoal users  14691 Oct 25 04:08 pseudotypes.c  
-rw-r--r-- 1 digoal users   2697 Oct 25 04:08 quote.c  
-rw-r--r-- 1 digoal users  61530 Oct 25 04:08 rangetypes.c  
-rw-r--r-- 1 digoal users  43048 Oct 25 04:08 rangetypes_gist.c  
-rw-r--r-- 1 digoal users  33892 Oct 25 04:08 rangetypes_selfuncs.c  
-rw-r--r-- 1 digoal users  29503 Oct 25 04:08 rangetypes_spgist.c  
-rw-r--r-- 1 digoal users  10402 Oct 25 04:08 rangetypes_typanalyze.c  
-rw-r--r-- 1 digoal users  34862 Oct 25 04:08 regexp.c  
-rw-r--r-- 1 digoal users  48615 Oct 25 04:08 regproc.c  
-rw-r--r-- 1 digoal users 105191 Oct 25 04:08 ri_triggers.c  
-rw-r--r-- 1 digoal users  48261 Oct 25 04:08 rowtypes.c  
-rw-r--r-- 1 digoal users 287720 Oct 25 04:08 ruleutils.c  
-rw-r--r-- 1 digoal users 225938 Oct 25 04:08 selfuncs.c  
-rw-r--r-- 1 digoal users   9692 Oct 25 04:08 tid.c  
-rw-r--r-- 1 digoal users 141798 Oct 25 04:08 timestamp.c  
-rw-r--r-- 1 digoal users   3152 Oct 25 04:08 trigfuncs.c  
-rw-r--r-- 1 digoal users  10077 Oct 25 04:08 tsginidx.c  
-rw-r--r-- 1 digoal users  18599 Oct 25 04:08 tsgistidx.c  
-rw-r--r-- 1 digoal users  25739 Oct 25 04:08 tsquery.c  
-rw-r--r-- 1 digoal users  14812 Oct 25 04:08 tsquery_cleanup.c  
-rw-r--r-- 1 digoal users   6232 Oct 25 04:08 tsquery_gist.c  
-rw-r--r-- 1 digoal users   7431 Oct 25 04:08 tsquery_op.c  
-rw-r--r-- 1 digoal users   8804 Oct 25 04:08 tsquery_rewrite.c  
-rw-r--r-- 1 digoal users   8621 Oct 25 04:08 tsquery_util.c  
-rw-r--r-- 1 digoal users  21322 Oct 25 04:08 tsrank.c  
-rw-r--r-- 1 digoal users  12304 Oct 25 04:08 tsvector.c  
-rw-r--r-- 1 digoal users  53573 Oct 25 04:08 tsvector_op.c  
-rw-r--r-- 1 digoal users   9259 Oct 25 04:08 tsvector_parser.c  
-rw-r--r-- 1 digoal users  13295 Oct 25 04:08 txid.c  
-rw-r--r-- 1 digoal users   9732 Oct 25 04:08 uuid.c  
-rw-r--r-- 1 digoal users  42245 Oct 25 04:08 varbit.c  
-rw-r--r-- 1 digoal users  24654 Oct 25 04:08 varchar.c  
-rw-r--r-- 1 digoal users 136176 Oct 25 04:08 varlena.c  
-rw-r--r-- 1 digoal users    491 Oct 25 04:08 version.c  
-rw-r--r-- 1 digoal users  10358 Oct 25 04:08 windowfuncs.c  
-rw-r--r-- 1 digoal users   4645 Oct 25 04:08 xid.c  
-rw-r--r-- 1 digoal users 104105 Oct 25 04:08 xml.c  

比如字符串,数组,全文检索,这些属于变长类型,最长允许1GB,在申请内存时,根据被处理的值的实际的大小申请。

cgroup例子

为了方便在一台物理机中启动多个实例,隔离资源,使用cgroup是一个很不错的方法。

# mount -t cgroup -o cpu,memory cpu_and_mem /cgroup/memory  
  
# cd /cgroup/memory  
  
# mkdir test  
  
# cd test  
  
# echo 102400000 > memory.limit_in_bytes  
  
# echo 102400000 > memory.memsw.limit_in_bytes  

把实例进程号写入tasks即可。

为了测试方便,我这里限制了100MB, 并且只将测试会话的backend process加入tasks

postgres=# select pg_backend_pid();  
 pg_backend_pid   
----------------  
          85938  
(1 row)  
# echo 85938 > tasks   

查看当前状态

# cat memory.stat   
cache 204800  
rss 0  
mapped_file 0  
pgpgin 974906  
pgpgout 974856  
swap 0  
inactive_anon 155648  
active_anon 0  
inactive_file 49152  
active_file 0  
unevictable 0  
hierarchical_memory_limit 102400000  
hierarchical_memsw_limit 102400000  
total_cache 204800  
total_rss 0  
total_mapped_file 0  
total_pgpgin 974906  
total_pgpgout 974856  
total_swap 0  
total_inactive_anon 155648  
total_active_anon 0  
total_inactive_file 49152  
total_active_file 0  
total_unevictable 0  

模拟数据库进程被OOM

因为限制了100MB,所以这个数据库backend process需要申请超过100MB的内存,才会被OOM。

根据前面讲的,排序、聚合、使用较大的列、使用很长的SQL(值SQL本身的内容长度),或者传入很大的变长类型值时,都可能使得单个连接就会消耗很大的内存。

注意每次被OOM后,重连,并将新的BACKEND PID写入CGROUP的tasks再测试下一轮.

1. 以使用较大的列为例,将1亿个值,聚合为一个数组,由于数组属于变长类型,最长可以放1GB,1亿已经超过1GB了,所以触发了OOM。

postgres=# explain (analyze,verbose,timing,costs,buffers) select array_agg(id::text) from generate_series(1,10000000) t(id);  
server closed the connection unexpectedly  
        This probably means the server terminated abnormally  
        before or while processing the request.  
The connection to the server was lost. Attempting reset: Failed.  
!>   

2. 对较大的表排序,并且设置较大的work_mem

postgres=# set work_mem ='101MB';  
SET  
postgres=# explain (analyze,verbose,timing,costs,buffers) select id from generate_series(1,10000000) t(id) order by id;  
server closed the connection unexpectedly  
        This probably means the server terminated abnormally  
        before or while processing the request.  
The connection to the server was lost. Attempting reset: Failed.  
!>   

使用较小的work_mem不会被OOM,因为使用了临时文件。

postgres=# set work_mem ='10MB';  
SET  
postgres=# explain (analyze,verbose,timing,costs,buffers) select id from generate_series(1,10000000) t(id) order by id;  
                                                                    QUERY PLAN                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------  
 Sort  (cost=59.83..62.33 rows=1000 width=4) (actual time=10291.920..11930.237 rows=10000000 loops=1)  
   Output: id  
   Sort Key: t.id  
   Sort Method: external sort  Disk: 136856kB  
   Buffers: shared hit=3, temp read=34198 written=34197  
   ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..10.00 rows=1000 width=4) (actual time=1573.165..3261.392 rows=10000000 loops=1)  
         Output: id  
         Function Call: generate_series(1, 10000000)  
         Buffers: temp read=17091 written=17090  
 Planning time: 0.232 ms  
 Execution time: 12654.654 ms  
(11 rows)  

3. autovacuum worker进程启动后,单个WORKER进程可能需要申请的内存大小为maintenance_work_mem或者vacuum_work_mem。

4. 并行QUERY

5. 带有多个hash join,多个排序操作的复杂QUERY,可能消耗多份WORK_MEM。

这种操作不需要很多内存:

比如查询了一张很大的表,返回了大批量(比如一亿)记录,即使不使用流式接收,也不需要很多内存。

审计

终于说到审计了,没错,当OOM发生后,我们怎么找到压死骆驼的最后一根稻草,或者是罪魁祸首呢?

由于OOM发的是KILL -9的信号,被KILL的进程根本无法通过捕获信号来记录当时正在执行的QUERY或者当时的状态。

那么审计就很有用了。

有这么几个参数

postgres=# set log_statement='all';  // 在SQL请求时就写日志  
SET  
postgres=# set log_min_duration_statement ='0';  // 在SQL执行结束才写日志  
SET  
postgres=# set log_duration =on;  // 在SQL执行结束才写日志  
SET  

显然,如果我们需要在OOM后,还能找到被OOM进程当时执行QUERY的蛛丝马迹,方法1:在请求时就记录下它在执行什么(开启log_statement=’all’),方法2:记录detail字段,postmaster进程会收集这部分信息,不管什么方法,超过track_activity_query_size长度的QUERY都被截断。

例如开启log_statement=’all’;后,我们能在日志中看到这样的信息。

开启了log_statement='all';后,在客户端发起QUERY请求时的日志。
2017-01-03 16:22:44.612 CST,"postgres","postgres",85938,"127.0.0.1:27719",586b5f18.14fb2,12,"idle",2017-01-03 16:21:44 CST,2/11,0,LOG,00000,"statement: explain (analyze,verbose,timing,costs,buffers) select array_agg(id::text) from generate_series(1,10000000) t(id);",,,,,,,,"exec_simple_query, postgres.c:935","psql"

被KILL后,postmaster进程收集到的收到KILL -9信号的backend process正在执行的SQL日志
2017-01-03 16:22:49.041 CST,,,72682,,586b5a2d.11bea,11,,2017-01-03 16:00:45 CST,,0,LOG,00000,"server process (PID 85938) was terminated by signal 9: Killed","Failed process was running: explain (analyze,verbose,timing,costs,buffers) select array_agg(id::text) from generate_series(1,10000000) t(id);",,,,,,,"LogChildExit, postmaster.c:3502",""

postmaster开始干掉所有进程,然后会进入恢复模式
2017-01-03 16:22:49.041 CST,,,72682,,586b5a2d.11bea,12,,2017-01-03 16:00:45 CST,,0,LOG,00000,"terminating any other active server processes",,,,,,,,"HandleChildCrash, postmaster.c:3222",""

autovacuum launcher进程的日志
2017-01-03 16:22:49.041 CST,,,83546,,586b5e28.1465a,2,,2017-01-03 16:17:44 CST,1/0,0,WARNING,57P02,"terminating connection because of crash of another server process","The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.","In a moment you should be able to reconnect to the database and repeat your command.",,,,,,"quickdie, postgres.c:2601",""

尝试连接数据库的进程,提示还在恢复状态
2017-01-03 16:22:49.044 CST,"postgres","postgres",86571,"127.0.0.1:27747",586b5f59.1522b,2,"",2017-01-03 16:22:49 CST,,0,FATAL,57P03,"the database system is in recovery mode",,,,,,,,"ProcessStartupPacket, postmaster.c:2187",""

postmaster在干掉所有进程后,数据库重新初始化(构造共享内存区,启动服务端进程(autovacuum, log, writer, ...等)等动作)
2017-01-03 16:22:49.147 CST,,,72682,,586b5a2d.11bea,13,,2017-01-03 16:00:45 CST,,0,LOG,00000,"all server processes terminated; reinitializing",,,,,,,,"PostmasterStateMachine, postmaster.c:3746",""

参考

《精确度量Linux下进程占用多少内存的方法》

《一个笛卡尔积的update from语句引发的(内存泄露?)问题》

《PostgreSQL relcache在长连接应用中的内存霸占”坑”》

《Linux page allocation failure 的问题处理 - lowmem_reserve_ratio》

Flag Counter

digoal’s 大量PostgreSQL文章入口