PostgreSQL 开启“审计日志、时间记录”带来的性能影响有多少?

13 minute read

背景

开启审计日志,或者开启数据库的SQL耗时记录,会给数据库带来多大的性能开销?

1、审计日志通过log_statement设置,可以设置为all, ddl, mod, none,分别表示审计所有SQL,DDL SQL,DML SQL,不审计。

开启审计时,在执行SQL的时候(提交SQL请求时),数据库日志中打印类似这样的信息:

2018-06-16 14:48:23.760 CST,"postgres","postgres",19448,"[local]",5b24b270.4bf8,8,"idle",2018-06-16 14:47:12 CST,3/1484181,0,LOG,00000,"statement: select count(*) from pg_class;",,,,,,,,"exec_simple_query, postgres.c:940","psql"      

2、SQL耗时记录,通过设置log_duration参数来指定,设置为ON 或OFF。

如果打开时间记录,那么在SQL执行结束时,打印SQL的耗时。(不包括将结果传输到客户端的时间)

2018-06-16 14:48:23.760 CST,"postgres","postgres",19448,"[local]",5b24b270.4bf8,9,"SELECT",2018-06-16 14:47:12 CST,3/0,0,LOG,00000,"duration: 0.609 ms",,,,,,,,"exec_simple_query, postgres.c:1170","psql"      

开启这两个开关,有多大的性能开销?

1、开启审计,实际上性能开销较小,因为日志是异步管道输出,而且目前的BUFFER已经调得比较大,性能损耗小。

但是,请注意,如果并发非常高,大量写日志可能引发mutex lock冲突,导致性能骤降。PostgreSQL syslogger相关代码有改进优化的地方。

src/backend/postmaster/syslogger.c

/*      
 * Buffers for saving partial messages from different backends.      
 *      
 * Keep NBUFFER_LISTS lists of these, with the entry for a given source pid      
 * being in the list numbered (pid % NBUFFER_LISTS), so as to cut down on      
 * the number of entries we have to examine for any one incoming message.      
 * There must never be more than one entry for the same source pid.      
 *      
 * An inactive buffer is not removed from its list, just held for re-use.      
 * An inactive buffer has pid == 0 and undefined contents of data.      
 */      
typedef struct      
{      
        int32           pid;                    /* PID of source process */      
        StringInfoData data;            /* accumulated data, as a StringInfo */      
} save_buffer;      
      
#define NBUFFER_LISTS 256      
static List *buffer_lists[NBUFFER_LISTS];      

2、开启时间统计,这个开销主要取决于读取时钟的额外开销。

通过pg_test_timing可以测试获取系统时间带来的额外开销。

数据库中如果要统计IO TIMING(track_io_timing = on),以及auto_explain中的log timing,开销也与之类似。

pg_test_timing -d 10      
Testing timing overhead for 10 seconds.      
Per loop time including overhead: 42.38 ns      
Histogram of timing durations:      
  < us   % of total      count      
     1     95.76853  225960491      
     2      4.22695    9973261      
     4      0.00430      10145      
     8      0.00005        109      
    16      0.00016        370      
    32      0.00001         32      
    64      0.00000          3      

实测

测试环境为ECS主机,测出了一个虚拟机的性能缺陷。

1、创建测试数据

pgbench -i -s 300      

2、测试小事务

pgbench -M prepared -v -r -P 1 -c 56 -j 56 -T 120 -S      

3、关闭log_statement与log_duration

transaction type: <builtin: select only>      
scaling factor: 300      
query mode: prepared      
number of clients: 56      
number of threads: 56      
duration: 120 s      
number of transactions actually processed: 25058528      
latency average = 0.268 ms      
latency stddev = 0.217 ms      
tps = 208818.253316 (including connections establishing)      
tps = 208834.342644 (excluding connections establishing)      
script statistics:      
 - statement latencies in milliseconds:      
         0.025  \set aid random(1, 100000 * :scale)      
         0.252  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;      

4、开启log_statement=all

transaction type: <builtin: select only>      
scaling factor: 300      
query mode: prepared      
number of clients: 56      
number of threads: 56      
duration: 120 s      
number of transactions actually processed: 23088368      
latency average = 0.291 ms      
latency stddev = 1.445 ms      
tps = 192401.524926 (including connections establishing)      
tps = 192413.921654 (excluding connections establishing)      
script statistics:      
 - statement latencies in milliseconds:      
         0.018  \set aid random(1, 100000 * :scale)      
         0.282  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;      

5、关闭log_statement,开启log_duration=on

transaction type: <builtin: select only>      
scaling factor: 300      
query mode: prepared      
number of clients: 56      
number of threads: 56      
duration: 120 s      
number of transactions actually processed: 12842222      
latency average = 0.523 ms      
latency stddev = 1.516 ms      
tps = 107017.184761 (including connections establishing)      
tps = 107025.759343 (excluding connections establishing)      
script statistics:      
 - statement latencies in milliseconds:      
         0.019  \set aid random(1, 100000 * :scale)      
         0.520  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;      

6、开启log_statement=all,同时开启log_duration=on

transaction type: <builtin: select only>      
scaling factor: 300      
query mode: prepared      
number of clients: 56      
number of threads: 56      
duration: 120 s      
number of transactions actually processed: 7484093      
latency average = 0.898 ms      
latency stddev = 1.386 ms      
tps = 62366.260357 (including connections establishing)      
tps = 62370.603530 (excluding connections establishing)      
script statistics:      
 - statement latencies in milliseconds:      
         0.016  \set aid random(1, 100000 * :scale)      
         0.897  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;      

7、开启log_statement=all,同时开启log_duration=on,同时开启track_io_timing = on

transaction type: <builtin: select only>      
scaling factor: 300      
query mode: prepared      
number of clients: 56      
number of threads: 56      
duration: 120 s      
number of transactions actually processed: 7161693      
latency average = 0.938 ms      
latency stddev = 1.150 ms      
tps = 59679.955252 (including connections establishing)      
tps = 59684.540717 (excluding connections establishing)      
script statistics:      
 - statement latencies in milliseconds:      
         0.015  \set aid random(1, 100000 * :scale)      
         0.938  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;      

诊断

测试结果与实际的时钟开销不相符,因为时钟开销实际上只有几十纳秒,但是测试结果很显然不止这么多。

profiling

Samples: 1M of event 'cpu-clock', Event count (approx.): 234709749074     
Overhead  Shared Object          Symbol                                   
  41.15%  [kernel]               [k] osq_lock                             
  19.00%  libc-2.17.so           [.] __mcount_internal                    
  10.37%  [kernel]               [k] _raw_spin_unlock_irqrestore    

perf

主要是锁冲突引起的性能下降。

# Children      Self  Command         Shared Object          Symbol                                           
# ........  ........  ..............  .....................  ...............................................  
#  
    55.47%     0.00%  postgres        postgres               [.] PostmasterMain  
            |  
            ---PostmasterMain  
               |            
               |--54.42%--ServerLoop  
               |          |            
               |           --54.20%--PostgresMain  
               |                     |            
               |                     |--40.68%--errfinish  
               |                     |          |            
               |                     |           --40.56%--EmitErrorReport  
               |                     |                     |            
               |                     |                     |--38.51%--__write_nocancel  
               |                     |                     |          |            
               |                     |                     |           --38.44%--system_call_fastpath  
               |                     |                     |                     |            
               |                     |                     |                      --38.43%--sys_write  
               |                     |                     |                                |            
               |                     |                     |                                 --38.39%--vfs_write  
               |                     |                     |                                           |            
               |                     |                     |                                            --38.32%--do_sync_write  
               |                     |                     |                                                      |            
               |                     |                     |                                                       --38.27%--pipe_write  
               |                     |                     |                                                                 |            
               |                     |                     |                                                                 |--21.55%--pipe_wait  
               |                     |                     |                                                                 |          |            
               |                     |                     |                                                                 |          |--20.07%--mutex_lock  
               |                     |                     |                                                                 |          |          |            
               |                     |                     |                                                                 |          |           --20.02%--__mutex_lock_slowpath  
               |                     |                     |                                                                 |          |                     |            
               |                     |                     |                                                                 |          |                     |--17.89%--osq_lock  
               |                     |                     |                                                                 |          |                     |            
               |                     |                     |                                                                 |          |                     |--1.44%--apic_timer_interrupt  
               |                     |                     |                                                                 |          |                     |          smp_apic_timer_interrupt  
               |                     |                     |                                                                 |          |                     |          irq_exit  
               |                     |                     |                                                                 |          |                     |          do_softirq  
               |                     |                     |                                                                 |          |                     |          call_softirq  
               |                     |                     |                                                                 |          |                     |          |            
               |                     |                     |                                                                 |          |                     |           --1.25%--__do_softirq  
               |                     |                     |                                                                 |          |                     |            
               |                     |                     |                                                                 |          |                      --0.61%--mutex_spin_on_owner  
               |                     |                     |                                                                 |          |            
               |                     |                     |                                                                 |           --0.94%--prepare_to_wait  
               |                     |                     |                                                                 |                     |            
               |                     |                     |                                                                 |                      --0.91%--_raw_spin_unlock_irqrestore  
               |                     |                     |                                                                 |            
               |                     |                     |                                                                 |--11.44%--mutex_lock  
               |                     |                     |                                                                 |          |            
               |                     |                     |                                                                 |           --11.35%--__mutex_lock_slowpath  
               |                     |                     |                                                                 |                     |            
               |                     |                     |                                                                 |                     |--10.11%--osq_lock  
               |                     |                     |                                                                 |                     |            
               |                     |                     |                                                                 |                      --0.69%--apic_timer_interrupt  
               |                     |                     |                                                                 |                                smp_apic_timer_interrupt  
               |                     |                     |                                                                 |                                irq_exit  
               |                     |                     |                                                                 |                                do_softirq  
               |                     |                     |                                                                 |                                call_softirq  
               |                     |                     |                                                                 |                                |            
               |                     |                     |                                                                 |                                 --0.60%--__do_softirq  
               |                     |                     |                                                                 |            
               |                     |                     |                                                                  --4.73%--__wake_up_sync_key  
               |                     |                     |                                                                            |            
               |                     |                     |                                                                             --4.68%--_raw_spin_unlock_irqrestore  
               |                     |                     |            
               |                     |                     |--0.72%--log_line_prefix.isra.1  
               |                     |                     |            
               |                     |                      --0.52%--_mcount  
               |                     |            
               |                     |--4.05%--PortalRun  
               |                     |          |            
               |                     |           --4.01%--PortalRunSelect  
               |                     |                     |            
               |                     |                      --3.91%--standard_ExecutorRun  
               |                     |                                |            
               |                     |                                 --3.27%--ExecScan  
               |                     |                                           |            
               |                     |                                            --3.18%--IndexNext  
               |                     |                                                      |            
               |                     |                                                       --2.95%--index_getnext  
               |                     |                                                                 |            
               |                     |                                                                 |--2.19%--index_getnext_tid  
               |                     |                                                                 |          |            
               |                     |                                                                 |           --2.16%--btgettuple  
               |                     |                                                                 |                     |            
               |                     |                                                                 |                      --2.08%--_bt_first  
               |                     |                                                                 |                                |            
               |                     |                                                                 |                                 --1.65%--_bt_search  
               |                     |                                                                 |                                           |            
               |                     |                                                                 |                                            --0.80%--_bt_relandgetbuf  
               |                     |                                                                 |            
               |                     |                                                                  --0.73%--index_fetch_heap  
               |                     |            
               |                     |--2.14%--PortalStart  
               |                     |          |            
               |                     |           --1.90%--standard_ExecutorStart  
               |                     |                     |            
               |                     |                      --1.64%--ExecInitNode  
               |                     |                                |            
               |                     |                                 --1.59%--ExecInitIndexScan  
               |                     |            
               |                     |--1.65%--finish_xact_command.part.4  
               |                     |          |            
               |                     |           --1.64%--CommitTransactionCommand  
               |                     |                     |            
               |                     |                      --1.49%--CommitTransaction  
               |                     |                                |            
               |                     |                                 --0.72%--PreCommit_Portals  
               |                     |                                           |            
               |                     |                                            --0.63%--PortalDrop  
               |                     |                                                      |            
               |                     |                                                       --0.53%--PortalCleanup  
               |                     |            
               |                     |--0.96%--pq_getbyte  
               |                     |          |            
               |                     |           --0.95%--pq_recvbuf  
               |                     |                     |            
               |                     |                      --0.62%--secure_read  
               |                     |            
               |                      --0.90%--ReadyForQuery  
               |                                |            
               |                                 --0.83%--socket_flush  
               |                                           |            
               |                                            --0.82%--internal_flush  
               |                                                      |            
               |                                                       --0.78%--__libc_send  
               |                                                                 |            
               |                                                                  --0.76%--system_call_fastpath  
               |                                                                            |            
               |                                                                             --0.75%--sys_sendto  
               |                                                                                       |            
               |                                                                                        --0.75%--SYSC_sendto  
               |                                                                                                  |            
               |                                                                                                   --0.73%--sock_sendmsg  
               |                                                                                                             |            
               |                                                                                                              --0.69%--unix_stream_sendmsg  
               |                                                                                                                        |            
               |                                                                                                                         --0.57%--sock_def_readable  
               |                                                                                                                                   |            
               |                                                                                                                                    --0.57%--__wake_up_sync_key  
               |                                                                                                                                              |            
               |                                                                                                                                               --0.54%--_raw_spin_unlock_irqrestore  
               |            
                --1.04%--SysLogger_Start  
                          |            
                           --0.84%--__read_nocancel  
                                     |            
                                      --0.84%--system_call_fastpath  
                                                |            
                                                 --0.84%--sys_read  
                                                           |            
                                                            --0.84%--vfs_read  
                                                                      |            
                                                                       --0.84%--do_sync_read  
                                                                                 |            
                                                                                  --0.82%--pipe_read  

为什么只开log_statement,没有触发这个锁冲突呢?实际上只是并发没有够,或者调用elog的频率还不够。(开启log_duration后,调用elog翻倍,所以同样并发更容易触发这个锁冲突)

1、将并发调低,锁冲突问题减少,性能损耗相对来说就很小。

Samples: 672K of event 'cpu-clock', Event count (approx.): 146760374621     
Overhead  Shared Object        Symbol                                       
  29.73%  libc-2.17.so         [.] __mcount_internal                        
   6.37%  [kernel]             [k] _raw_spin_unlock_irqrestore              
   5.93%  libc-2.17.so         [.] _mcount                                  
   4.18%  libc-2.17.so         [.] vfprintf                                 
   2.96%  [kernel]             [k] osq_lock                                 
   2.92%  [kernel]             [k] __do_softirq                             
   1.80%  [kernel]             [k] finish_task_switch                       
   1.34%  postgres             [.] hash_search_with_hash_value              
   1.20%  libc-2.17.so         [.] _IO_default_xsputn     

1、关闭log_statement与log_duration

transaction type: <builtin: select only>    
scaling factor: 300    
query mode: prepared    
number of clients: 28    
number of threads: 28    
duration: 120 s    
number of transactions actually processed: 17770447    
latency average = 0.189 ms    
latency stddev = 0.077 ms    
tps = 148086.294302 (including connections establishing)    
tps = 148093.564454 (excluding connections establishing)    
script statistics:    
 - statement latencies in milliseconds:    
         0.013  \set aid random(1, 100000 * :scale)    
         0.176  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;    

2、开启log_statement=all

transaction type: <builtin: select only>    
scaling factor: 300    
query mode: prepared    
number of clients: 28    
number of threads: 28    
duration: 120 s    
number of transactions actually processed: 17236212    
latency average = 0.195 ms    
latency stddev = 0.321 ms    
tps = 143634.319683 (including connections establishing)    
tps = 143642.034125 (excluding connections establishing)    
script statistics:    
 - statement latencies in milliseconds:    
         0.010  \set aid random(1, 100000 * :scale)    
         0.185  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;    

3、关闭log_statement,开启log_duration=on

transaction type: <builtin: select only>    
scaling factor: 300    
query mode: prepared    
number of clients: 28    
number of threads: 28    
duration: 120 s    
number of transactions actually processed: 16774800    
latency average = 0.200 ms    
latency stddev = 0.977 ms    
tps = 139789.393297 (including connections establishing)    
tps = 139795.661393 (excluding connections establishing)    
script statistics:    
 - statement latencies in milliseconds:    
         0.008  \set aid random(1, 100000 * :scale)    
         0.193  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;    

4、开启log_statement=all,同时开启log_duration=on

transaction type: <builtin: select only>    
scaling factor: 300    
query mode: prepared    
number of clients: 28    
number of threads: 28    
duration: 120 s    
number of transactions actually processed: 14696887    
latency average = 0.229 ms    
latency stddev = 0.277 ms    
tps = 122473.319859 (including connections establishing)    
tps = 122479.381454 (excluding connections establishing)    
script statistics:    
 - statement latencies in milliseconds:    
         0.007  \set aid random(1, 100000 * :scale)    
         0.224  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;    

5、开启log_statement=all,同时开启log_duration=on,同时开启track_io_timing = on

transaction type: <builtin: select only>    
scaling factor: 300    
query mode: prepared    
number of clients: 28    
number of threads: 28    
duration: 120 s    
number of transactions actually processed: 14606445    
latency average = 0.230 ms    
latency stddev = 0.726 ms    
tps = 121719.147924 (including connections establishing)    
tps = 121725.278733 (excluding connections establishing)    
script statistics:    
 - statement latencies in milliseconds:    
         0.007  \set aid random(1, 100000 * :scale)    
         0.226  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;    

2、使用高并发,即使仅开启log_statement也会触发锁问题。

只开log_statement,调用elog是同时开启log_statement, log_duraiton的一半,所以仅开启log_statement时把并发开到一倍(112),就可以观测到问题。

transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 112
number of threads: 112
duration: 120 s
number of transactions actually processed: 11832722
latency average = 1.135 ms
latency stddev = 2.437 ms
tps = 98604.216501 (including connections establishing)
tps = 98621.113426 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.021  \set aid random(1, 100000 * :scale)
         1.149  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

perf 如下

Samples: 557K of event 'cpu-clock', Event count (approx.): 94808854918  
  Children      Self  Shared Object       Symbol                        
+   67.35%     0.22%  postgres            [.] PostgresMain              
+   45.50%     0.01%  [kernel]            [k] system_call_fastpath      
+   41.27%     0.02%  postgres            [.] errfinish                 
+   41.14%     0.27%  postgres            [.] EmitErrorReport           
+   39.62%     0.01%  [kernel]            [k] sys_write                 
+   39.56%     0.03%  [kernel]            [k] vfs_write                 
+   39.48%     0.02%  [kernel]            [k] do_sync_write             
+   39.31%     0.01%  libpthread-2.17.so  [.] __write_nocancel          
+   39.06%     0.08%  [kernel]            [k] pipe_write                
+   35.00%     0.09%  [kernel]            [k] mutex_lock                
+   34.88%     0.12%  [kernel]            [k] __mutex_lock_slowpath     
+   33.71%     0.00%  libc-2.17.so        [.] __libc_start_main         
+   33.61%    33.61%  [kernel]            [k] osq_lock                  
+   32.61%     0.00%  postgres            [.] PostmasterMain            
+   32.61%     0.00%  postgres            [.] main                      
+   32.33%     0.00%  postgres            [.] ServerLoop                
+   31.63%     2.75%  libc-2.17.so        [.] _mcount                   
+   29.31%    29.14%  libc-2.17.so        [.] __mcount_internal         
+   27.19%     0.02%  [kernel]            [k] pipe_wait                 
+   21.76%     0.27%  pgbench             [.] threadRun                 
+   11.30%     0.00%  libpthread-2.17.so  [.] start_thread      

syslogger相关代码

src/backend/tcop/postgres.c

    934         /* Log immediately if dictated by log_statement */  
    935         if (check_log_statement(parsetree_list))  
    936         {  
    937                 ereport(LOG,  
    938                                 (errmsg("statement: %s", query_string),  
    939                                  errhidestmt(true),  
    940                                  errdetail_execute(parsetree_list)));  
    941                 was_logged = true;  
    942         }  
  
  
  
   1162         /*  
   1163          * Emit duration logging if appropriate.  
   1164          */  
   1165         switch (check_log_duration(msec_str, was_logged))  
   1166         {  
   1167                 case 1:  
   1168                         ereport(LOG,  
   1169                                         (errmsg("duration: %s ms", msec_str),  
   1170                                          errhidestmt(true)));  
   1171                         break;  
   1172                 case 2:  
   1173                         ereport(LOG,  
   1174                                         (errmsg("duration: %s ms  statement: %s",  
   1175                                                         msec_str, query_string),  
   1176                                          errhidestmt(true),  
   1177                                          errdetail_execute(parsetree_list)));  
   1178                         break;  
   1179         }  
  
  
  
   1946         /* Log immediately if dictated by log_statement */  
   1947         if (check_log_statement(portal->stmts))  
   1948         {  
   1949                 ereport(LOG,  
   1950                                 (errmsg("%s %s%s%s: %s",  
   1951                                                 execute_is_fetch ?  
   1952                                                 _("execute fetch from") :  
   1953                                                 _("execute"),  
   1954                                                 prepStmtName,  
   1955                                                 *portal_name ? "/" : "",  
   1956                                                 *portal_name ? portal_name : "",  
   1957                                                 sourceText),  
   1958                                  errhidestmt(true),  
   1959                                  errdetail_params(portalParams)));  
   1960                 was_logged = true;  
   1961         }  

src/backend/utils/error/elog.c

/*  
 * Constructs the error message, depending on the Errordata it gets, in a CSV  
 * format which is described in doc/src/sgml/config.sgml.  
 */  
static void  
write_csvlog(ErrorData *edata)  
{  
............  
        /* If in the syslogger process, try to write messages direct to file */  
        if (am_syslogger)  
                write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);  
        else  
                write_pipe_chunks(buf.data, buf.len, LOG_DESTINATION_CSVLOG);  
  
        pfree(buf.data);  
}  
  
  
  
  
  
  
  
  
/*  
 * Send data to the syslogger using the chunked protocol  
 *  
 * Note: when there are multiple backends writing into the syslogger pipe,  
 * it's critical that each write go into the pipe indivisibly, and not  
 * get interleaved with data from other processes.  Fortunately, the POSIX  
 * spec requires that writes to pipes be atomic so long as they are not  
 * more than PIPE_BUF bytes long.  So we divide long messages into chunks  
 * that are no more than that length, and send one chunk per write() call.  
 * The collector process knows how to reassemble the chunks.  
 *  
 * Because of the atomic write requirement, there are only two possible  
 * results from write() here: -1 for failure, or the requested number of  
 * bytes.  There is not really anything we can do about a failure; retry would  
 * probably be an infinite loop, and we can't even report the error usefully.  
 * (There is noplace else we could send it!)  So we might as well just ignore  
 * the result from write().  However, on some platforms you get a compiler  
 * warning from ignoring write()'s result, so do a little dance with casting  
 * rc to void to shut up the compiler.  
 */  
static void  
write_pipe_chunks(char *data, int len, int dest)  
{  
        PipeProtoChunk p;  
        int                     fd = fileno(stderr);  
        int                     rc;  
  
        Assert(len > 0);  
  
        p.proto.nuls[0] = p.proto.nuls[1] = '\0';  
        p.proto.pid = MyProcPid;  
  
        /* write all but the last chunk */  
        while (len > PIPE_MAX_PAYLOAD)  
        {  
                p.proto.is_last = (dest == LOG_DESTINATION_CSVLOG ? 'F' : 'f');  
                p.proto.len = PIPE_MAX_PAYLOAD;  
                memcpy(p.proto.data, data, PIPE_MAX_PAYLOAD);  
                rc = write(fd, &p, PIPE_HEADER_SIZE + PIPE_MAX_PAYLOAD);  
                (void) rc;  
                data += PIPE_MAX_PAYLOAD;  
                len -= PIPE_MAX_PAYLOAD;  
        }  
  
        /* write the last chunk */  
        p.proto.is_last = (dest == LOG_DESTINATION_CSVLOG ? 'T' : 't');  
        p.proto.len = len;  
        memcpy(p.proto.data, data, len);  
        rc = write(fd, &p, PIPE_HEADER_SIZE + len);  
        (void) rc;  
}  
# Overhead          Command                        Shared Object                                                                                                                                                                               
# ........  ...............  ...................................  ............................  
#  
    32.77%         postgres  [kernel.kallsyms]                    [k] __mutex_lock_slowpath  
                   |  
                   --- __mutex_lock_slowpath  
                      |            
                      |--100.00%-- mutex_lock  
                      |          |            
                      |          |--96.97%-- pipe_lock  
                      |          |          pipe_wait  
                      |          |          |            
                      |          |          |--99.99%-- pipe_write  
                      |          |          |          do_sync_write  
                      |          |          |          vfs_write  
                      |          |          |          sys_write  
                      |          |          |          system_call_fastpath  
                      |          |          |          __write_nocancel  
                      |          |          |          |            
                      |          |          |           --100.00%-- write_csvlog  
                      |          |          |                     EmitErrorReport  
                      |          |          |                     errfinish  
                      |          |          |                     PostgresMain  
                      |          |          |                     PostmasterMain  
                      |          |          |                     main  
                      |          |          |                     __libc_start_main  
                      |          |           --0.01%-- [...]  
                      |          |            
                      |          |--2.84%-- pipe_write  
                      |          |          do_sync_write  
                      |          |          vfs_write  
                      |          |          sys_write  
                      |          |          system_call_fastpath  
                      |          |          |            
                      |          |          |--99.58%-- __write_nocancel  
                      |          |          |          |            
                      |          |          |           --100.00%-- write_csvlog  
                      |          |          |                     EmitErrorReport  
                      |          |          |                     errfinish  
                      |          |          |                     PostgresMain  
                      |          |          |                     PostmasterMain  
                      |          |          |                     main  
                      |          |          |                     __libc_start_main  
                      |          |           --0.42%-- [...]  
                      |           --0.19%-- [...]  
                       --0.00%-- [...]  
  
    17.76%         postgres  [kernel.kallsyms]                    [k] mutex_spin_on_owner  
                   |  
                   --- mutex_spin_on_owner  
                      |            
                      |--99.96%-- __mutex_lock_slowpath  
                      |          mutex_lock  
                      |          |            
                      |          |--98.91%-- pipe_lock  
                      |          |          pipe_wait  
                      |          |          |            
                      |          |          |--100.00%-- pipe_write  
                      |          |          |          do_sync_write  
                      |          |          |          vfs_write  
                      |          |          |          sys_write  
                      |          |          |          system_call_fastpath  
                      |          |          |          __write_nocancel  
                      |          |          |          |            
                      |          |          |           --100.00%-- write_csvlog  
                      |          |          |                     EmitErrorReport  
                      |          |          |                     errfinish  
                      |          |          |                     PostgresMain  
                      |          |          |                     PostmasterMain  
                      |          |          |                     main  
                      |          |          |                     __libc_start_main  
                      |          |           --0.00%-- [...]  
                      |          |            
                      |          |--0.96%-- pipe_write  
                      |          |          do_sync_write  
                      |          |          vfs_write  
                      |          |          sys_write  
                      |          |          system_call_fastpath  
                      |          |          |            
                      |          |          |--97.72%-- __write_nocancel  
                      |          |          |          |            
                      |          |          |           --100.00%-- write_csvlog  
                      |          |          |                     EmitErrorReport  
                      |          |          |                     errfinish  
                      |          |          |                     PostgresMain  
                      |          |          |                     PostmasterMain  
                      |          |          |                     main  
                      |          |          |                     __libc_start_main  
                      |          |          |            
                      |          |           --2.28%-- 0x7f89375b59fd  
                      |          |                     |            
                      |          |                     |--58.85%-- 0x3731206e754a205d  
                      |          |                     |            
                      |          |                     |--24.32%-- 0x6574616572635f74  
                      |          |                     |            
                      |          |                      --16.82%-- 0xa746e756f6d65  
                      |           --0.13%-- [...]  
                       --0.04%-- [...]  

小结

PostgreSQL syslogger elog部分还有优化的空间,当调用elog并发非常高时,会触发lock冲突,导致性能下降。

测试结果1:

高并发,log_statement+log_duration 同时开启,56并发时引发lock冲突影响了性能。

测试CASE QPS
关闭审计,关闭时间统计 208834
开启审计 192413
开启SQL时间统计 107025
同时开启审计,SQL时间统计 62370
同时开启审计,SQL时间统计,IO时间统计 59684

测试结果2:

低并发,开启审计和时间统计,未触发lock冲突,性能损耗小。

测试CASE QPS
关闭审计,关闭时间统计 148093
开启审计 143642
开启SQL时间统计 139795
同时开启审计,SQL时间统计 122479
同时开启审计,SQL时间统计,IO时间统计 121725

参考

https://www.postgresql.org/docs/11/static/pgtesttiming.html

《Linux 时钟精度 与 PostgreSQL auto_explain (explain timing 时钟开销估算)》

《PostgreSQL 代码性能诊断之 - OProfile & Systemtap》

《PostgreSQL DaaS设计注意 - schema与database的抉择》

Flag Counter

digoal’s 大量PostgreSQL文章入口