Systemtap EXP: PostgreSQL IN-BUILD mark Class 7 - others(statement,xlog,sort)

10 minute read

背景

本文介绍一下剩余的几个PostgreSQL探针:   
1. 语句状态改变探针, 在pg_stat_activity.status值改变时触发, 值为字符串, 可以从中获取到SQL语句.  
2. xlog探针, 在插入WAL record时触发. 可以获取到被插入record的rmid, info flags.  
3. xlog探针, 在xlog段切换时触发.   
4. 排序开始探针, 可获取到排序类别(heap,index,datum), 是否强制唯一值, 排序的列数量, work_mem kbytes, 是否需要随机访问排序结果.  
5. 排序结束探针, 可以获取到本探针处的排序空间(磁盘 或 内存), 以及消耗的大小blocks or kbytes.  

探针的详细信息如下 :

name parameter desc
statement-status (const char *) Probe that fires anytime the server process updates its pg_stat_activity.status. arg0 is the new status string.
xlog-insert (unsigned char, unsigned char) Probe that fires when a WAL record is inserted. arg0 is the resource manager (rmid) for the record. arg1 contains the info flags.
xlog-switch () Probe that fires when a WAL segment switch is requested.
sort-start (int, bool, int, int, bool) Probe that fires when a sort operation is started. arg0 indicates heap, index or datum sort. arg1 is true for unique-value enforcement. arg2 is the number of key columns. arg3 is the number of kilobytes of work memory allowed. arg4 is true if random access to the sort result is required.
sort-done (bool, long) Probe that fires when a sort is complete. arg0 is true for external sort, false for internal sort. arg1 is the number of disk blocks used for an external sort, or kilobytes of memory used for an internal sort.
[举例]  
1. 语句探针, 当语句状态改变(pg_stat_activity.status)时, 输出语句的当前值.  
[root@db-172-16-3-150 postgresql-9.3.1]# stap -e '  
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status") {printdln("**", pn(), $arg1 ? user_string($arg1) : "0")   
}'  
执行SQL :   
digoal=# begin;  
WARNING:  there is already a transaction in progress  
BEGIN  
digoal=# end;  
COMMIT  
digoal=# select * from t1;  
digoal=# select * from t1 limit 1;  
 id |               info                 
----+----------------------------------  
  1 | 006f3673faa5991478e6db0c01c88716  
(1 row)  
  
digoal=# begin;  
BEGIN  
digoal=# select * from t1 limit 1;  
 id |               info                 
----+----------------------------------  
  1 | 006f3673faa5991478e6db0c01c88716  
(1 row)  
  
digoal=# select * from t1 limit 1;  
 id |               info                 
----+----------------------------------  
  1 | 006f3673faa5991478e6db0c01c88716  
(1 row)  
  
digoal=# select * from t1 limit 1;  
 id |               info                 
----+----------------------------------  
  1 | 006f3673faa5991478e6db0c01c88716  
(1 row)  
digoal=# select * from t12 limit 1;  
ERROR:  relation "t12" does not exist  
LINE 1: select * from t12 limit 1;  
                      ^  
输出 :   
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**begin;  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**end;  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**select * from t1;  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**select * from t1 limit 1;  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**begin;  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**select * from t1 limit 1;  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**select * from t1 limit 1;  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**select * from t1 limit 1;  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**select * from t12 limit 1;  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("statement__status")**0  
2. xlog探针  
xlog record data插入跟踪, 可以得到rmid(枚举,), info flags(不同的rmid, info有不同的含义).  
rmid和info的介绍参考本文源码部分.  
测试stap如下 :   
[root@db-172-16-3-150 postgresql-9.3.1]# stap -e '  
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("xlog__insert") {  
  printf("rmid:%u, info:%d\n", $arg1 ,$arg2)  
}'  
输出如下 :   
digoal=# insert into t1 values (1,'test');  
INSERT 0 1  
OUTPUT :   
rmid:10, info:0  
rmid:1, info:96  
简单介绍, rmid=10代表这是个heap_redo, info=0在heap_redo中的含义是INSERT.  
rmid=1表示xact_redo, info=96在xact_redo中的含义是XLOG_XACT_COMMIT_COMPACT, 如下 .  
src/include/access/xact.h  
/* ----------------  
   99  *      transaction-related XLOG entries  
  100  * ----------------  
  101  */  
  102   
  103 /*  
  104  * XLOG allows to store some information in high 4 bits of log  
  105  * record xl_info field  
  106  */  
  107 #define XLOG_XACT_COMMIT            0x00  
  108 #define XLOG_XACT_PREPARE           0x10  
  109 #define XLOG_XACT_ABORT             0x20  
  110 #define XLOG_XACT_COMMIT_PREPARED   0x30  
  111 #define XLOG_XACT_ABORT_PREPARED    0x40  
  112 #define XLOG_XACT_ASSIGNMENT        0x50  
  113 #define XLOG_XACT_COMMIT_COMPACT    0x60  
详见本文源码部分.  
其他输出不再解释, 可以去查doxygen.  
digoal=# checkpoint;  
CHECKPOINT  
OUTPUT :   
rmid:8, info:16  
rmid:0, info:16  
  
digoal=# select pg_switch_xlog()  
digoal-# ;  
 pg_switch_xlog   
----------------  
 3/4226BC88  
(1 row)  
OUTPUT :   
rmid:0, info:64  
  
digoal=# select pg_start_backup('test');  
 pg_start_backup   
-----------------  
 3/43000028  
(1 row)  
OUTPUT :   
rmid:0, info:64  
rmid:8, info:16  
rmid:0, info:16  
  
digoal=# select pg_stop_backup();  
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup  
 pg_stop_backup   
----------------  
 3/430000F0  
(1 row)  
OUTPUT :   
rmid:0, info:80  
rmid:0, info:64  
3. xlog探针, 切换xlog时触发.  
[root@db-172-16-3-150 postgresql-9.3.1]# stap -e '  
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("xlog__switch") {  
  println(pn())                               
}'  
SQL :   
digoal=# select pg_switch_xlog();  
 pg_switch_xlog   
----------------  
 3/45000728  
(1 row)  
digoal=# select pg_switch_xlog();  
 pg_switch_xlog   
----------------  
 3/5EE76270  
(1 row)  
输出  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("xlog__switch")  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("xlog__switch")  
4. 排序探针  
stap脚本 :   
[root@db-172-16-3-150 postgresql-9.3.1]# stap --vp 10000 -e '  
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("sort__start") {  
  if ($arg1 == 0) st="HEAP_SORT"  
  if ($arg1 == 1) st="INDEX_SORT"  
  if ($arg1 == 2) st="DATUM_SORT"  
  if ($arg1 == 3) st="CLUSTER_SORT"  
  printdln("**",pn(),st,$arg2,$arg3,$arg4,$arg5)  
}  
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("sort__done") {   
  if ($arg1) st="EXTERNAL_SORT"  
  if (! $arg1) st="MEM_SORT"  
  printdln("**",pn(),st,$arg2)  
}'  
SQL :   
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from t1 order by id;  
                                                         QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------  
 Sort  (cost=884772.20..898852.12 rows=5631970 width=11) (actual time=11457.870..13425.913 rows=5635072 loops=1)  
   Output: id, info  
   Sort Key: t1.id  
   Sort Method: external merge  Disk: 118696kB  
   Buffers: shared hit=31965, temp read=57480 written=57480  
   ->  Seq Scan on public.t1  (cost=0.00..88281.70 rows=5631970 width=11) (actual time=0.010..969.990 rows=5635072 loops=1)  
         Output: id, info  
         Buffers: shared hit=31962  
 Total runtime: 14040.272 ms  
(9 rows)  
# 使用了外部文件排序, 耗费14837个数据库, 与stap输出一致.  
digoal=# select 14837*8;  
 ?column?   
----------  
   118696  
(1 row)  
stap输出 :   
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("sort__start")**HEAP_SORT**0**1**1024**0  
第一个0代表非唯一排序, 后面的1表示排序为1个key. 1024KB表示允许的work mem大小, 最后的0表示不需要随机访问排序结果  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("sort__done")**EXTERNAL_SORT**14837  
14837个块, 和analyze结果一致.  
再来两个SQL, 修改work_mem后, 使用了内存排序.  
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from t1 order by id desc,info;  
                                                         QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------  
 Sort  (cost=884772.20..898852.12 rows=5631970 width=11) (actual time=15619.253..18770.273 rows=5635072 loops=1)  
   Output: id, info  
   Sort Key: t1.id, t1.info  
   Sort Method: external merge  Disk: 118704kB  
   Buffers: shared hit=31962, temp read=58144 written=58144  
   ->  Seq Scan on public.t1  (cost=0.00..88281.70 rows=5631970 width=11) (actual time=0.011..961.801 rows=5635072 loops=1)  
         Output: id, info  
         Buffers: shared hit=31962  
 Total runtime: 19390.284 ms  
(9 rows)  
  
digoal=# set work_mem='1024MB';  
SET  
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from t1 order by id desc,info;  
                                                         QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------  
 Sort  (cost=719772.20..733852.12 rows=5631970 width=11) (actual time=4449.092..5611.460 rows=5635072 loops=1)  
   Output: id, info  
   Sort Key: t1.id, t1.info  
   Sort Method: quicksort  Memory: 476753kB  
   Buffers: shared hit=31962  
   ->  Seq Scan on public.t1  (cost=0.00..88281.70 rows=5631970 width=11) (actual time=0.012..901.700 rows=5635072 loops=1)  
         Output: id, info  
         Buffers: shared hit=31962  
 Total runtime: 6203.356 ms  
(9 rows)  
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from t1 group by info,id order by id desc,info;  
                                                            QUERY PLAN                                                              
----------------------------------------------------------------------------------------------------------------------------------  
 Sort  (cost=175868.07..177276.06 rows=563197 width=11) (actual time=3792.241..3793.914 rows=11000 loops=1)  
   Output: id, info  
   Sort Key: t1.id, t1.info  
   Sort Method: quicksort  Memory: 931kB  
   Buffers: shared hit=31962  
   ->  HashAggregate  (cost=116441.55..122073.52 rows=563197 width=11) (actual time=3777.310..3784.205 rows=11000 loops=1)  
         Output: id, info  
         Buffers: shared hit=31962  
         ->  Seq Scan on public.t1  (cost=0.00..88281.70 rows=5631970 width=11) (actual time=0.017..920.271 rows=5635072 loops=1)  
               Output: id, info  
               Buffers: shared hit=31962  
 Total runtime: 3800.721 ms  
(12 rows)  
输出  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("sort__start")**HEAP_SORT**0**2**1024**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("sort__done")**EXTERNAL_SORT**14838  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("sort__start")**HEAP_SORT**0**2**1048576**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("sort__done")**MEM_SORT**476753  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("sort__start")**HEAP_SORT**0**2**1048576**0  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("sort__done")**MEM_SORT**931  

参考

1. http://www.postgresql.org/docs/9.3/static/dynamic-trace.html

2.

src/backend/postmaster/pgstat.c  
src/backend/access/transam/xlog.c  
src/backend/utils/sort/tuplesort.c  
src/include/access/rmgr.h  
src/include/access/rmgrlist.h  
src/include/access/xact.h  

3. 探针信息 :

/* TRACE_POSTGRESQL_STATEMENT_STATUS ( const char *) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_STATEMENT_STATUS_ENABLED() __builtin_expect (statement__status_semaphore, 0)  
#define postgresql_statement__status_semaphore statement__status_semaphore  
#else  
#define TRACE_POSTGRESQL_STATEMENT_STATUS_ENABLED() __builtin_expect (postgresql_statement__status_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_statement__status_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_STATEMENT_STATUS(arg1) \  
DTRACE_PROBE1(postgresql,statement__status,arg1)  
  
/* TRACE_POSTGRESQL_XLOG_INSERT ( unsigned char, unsigned char) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_XLOG_INSERT_ENABLED() __builtin_expect (xlog__insert_semaphore, 0)  
#define postgresql_xlog__insert_semaphore xlog__insert_semaphore  
#else  
#define TRACE_POSTGRESQL_XLOG_INSERT_ENABLED() __builtin_expect (postgresql_xlog__insert_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_xlog__insert_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_XLOG_INSERT(arg1,arg2) \  
DTRACE_PROBE2(postgresql,xlog__insert,arg1,arg2)  
  
/* TRACE_POSTGRESQL_XLOG_SWITCH () */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_XLOG_SWITCH_ENABLED() __builtin_expect (xlog__switch_semaphore, 0)  
#define postgresql_xlog__switch_semaphore xlog__switch_semaphore  
#else  
#define TRACE_POSTGRESQL_XLOG_SWITCH_ENABLED() __builtin_expect (postgresql_xlog__switch_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_xlog__switch_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_XLOG_SWITCH() \  
DTRACE_PROBE(postgresql,xlog__switch)  
  
/* TRACE_POSTGRESQL_SORT_START ( int, char, int, int, char) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_SORT_START_ENABLED() __builtin_expect (sort__start_semaphore, 0)  
#define postgresql_sort__start_semaphore sort__start_semaphore  
#else  
#define TRACE_POSTGRESQL_SORT_START_ENABLED() __builtin_expect (postgresql_sort__start_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_sort__start_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_SORT_START(arg1,arg2,arg3,arg4,arg5) \  
DTRACE_PROBE5(postgresql,sort__start,arg1,arg2,arg3,arg4,arg5)  
  
/* TRACE_POSTGRESQL_SORT_DONE ( char, long) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_SORT_DONE_ENABLED() __builtin_expect (sort__done_semaphore, 0)  
#define postgresql_sort__done_semaphore sort__done_semaphore  
#else  
#define TRACE_POSTGRESQL_SORT_DONE_ENABLED() __builtin_expect (postgresql_sort__done_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_sort__done_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_SORT_DONE(arg1,arg2) \  
DTRACE_PROBE2(postgresql,sort__done,arg1,arg2)  

4. 探针在源码中的信息 :

语句状态改变探针:

src/backend/postmaster/pgstat.c

/* ----------  
 * pgstat_report_activity() -  
 *  
 *      Called from tcop/postgres.c to report what the backend is actually doing  
 *      (but note cmd_str can be NULL for certain cases).  
 *  
 * All updates of the status entry follow the protocol of bumping  
 * st_changecount before and after.  We use a volatile pointer here to  
 * ensure the compiler doesn't try to get cute.  
 * ----------  
 */  
void  
pgstat_report_activity(BackendState state, const char *cmd_str)  
{  
        volatile PgBackendStatus *beentry = MyBEEntry;  
        TimestampTz start_timestamp;  
        TimestampTz current_timestamp;  
        int                     len = 0;  
  
        TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);  
xlog插入和切换探针:   
src/backend/access/transam/xlog.c  
/*  
 * Insert an XLOG record having the specified RMID and info bytes,  
 * with the body of the record being the data chunk(s) described by  
 * the rdata chain (see xlog.h for notes about rdata).  
 *  
 * Returns XLOG pointer to end of record (beginning of next record).  
 * This can be used as LSN for data pages affected by the logged action.  
 * (LSN is the XLOG point up to which the XLOG must be flushed to disk  
 * before the data page can be written out.  This implements the basic  
 * WAL rule "write the log before the data".)  
 *  
 * NB: this routine feels free to scribble on the XLogRecData structs,  
 * though not on the data they reference.  This is OK since the XLogRecData  
 * structs are always just temporaries in the calling code.  
 */  
XLogRecPtr  
XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)  
{  
...  
        /* info's high bits are reserved for use by me */  
        if (info & XLR_INFO_MASK)  
                elog(PANIC, "invalid xlog info mask %02X", info);  
  
        TRACE_POSTGRESQL_XLOG_INSERT(rmid, info);  
...  
        /*  
         * If the record is an XLOG_SWITCH, we must now write and flush all the  
         * existing data, and then forcibly advance to the start of the next  
         * segment.  It's not good to do this I/O while holding the insert lock,  
         * but there seems too much risk of confusion if we try to release the  
         * lock sooner.  Fortunately xlog switch needn't be a high-performance  
         * operation anyway...  
         */  
        if (isLogSwitch)  
        {  
                XLogwrtRqst FlushRqst;  
                XLogRecPtr      OldSegEnd;  
  
                TRACE_POSTGRESQL_XLOG_SWITCH();  
...  
src/include/access/rmgr.h  
typedef uint8 RmgrId;  
...  
src/include/access/rmgrlist.h  
/* symbol name, textual name, redo, desc, startup, cleanup, restartpoint */  
PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, NULL, NULL, NULL)  
PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, NULL, NULL, NULL)  
PG_RMGR(RM_SMGR_ID, "Storage", smgr_redo, smgr_desc, NULL, NULL, NULL)  
PG_RMGR(RM_CLOG_ID, "CLOG", clog_redo, clog_desc, NULL, NULL, NULL)  
PG_RMGR(RM_DBASE_ID, "Database", dbase_redo, dbase_desc, NULL, NULL, NULL)  
PG_RMGR(RM_TBLSPC_ID, "Tablespace", tblspc_redo, tblspc_desc, NULL, NULL, NULL)  
PG_RMGR(RM_MULTIXACT_ID, "MultiXact", multixact_redo, multixact_desc, NULL, NULL, NULL)  
PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, NULL, NULL, NULL)  
PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, NULL, NULL, NULL)  
PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, NULL, NULL, NULL)  
PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, NULL, NULL, NULL)  
PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_xlog_startup, btree_xlog_cleanup, btree_safe_restartpoint)  
PG_RMGR(RM_HASH_ID, "Hash", hash_redo, hash_desc, NULL, NULL, NULL)  
PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_xlog_startup, gin_xlog_cleanup, gin_safe_restartpoint)  
PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_xlog_startup, gist_xlog_cleanup, NULL)  
PG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, NULL, NULL, NULL)  
PG_RMGR(RM_SPGIST_ID, "SPGist", spg_redo, spg_desc, spg_xlog_startup, spg_xlog_cleanup, NULL)  
不同的rmid, info有不同的含义, 例如heap insert  
src/include/access/heapam_xlog.h  
/*  
 * WAL record definitions for heapam.c's WAL operations  
 *  
 * XLOG allows to store some information in high 4 bits of log  
 * record xl_info field.  We use 3 for opcode and one for init bit.  
 */  
#define XLOG_HEAP_INSERT                0x00  
#define XLOG_HEAP_DELETE                0x10  
#define XLOG_HEAP_UPDATE                0x20  
/* 0x030 is free, was XLOG_HEAP_MOVE */  
#define XLOG_HEAP_HOT_UPDATE    0x40  
#define XLOG_HEAP_NEWPAGE               0x50  
#define XLOG_HEAP_LOCK                  0x60  
#define XLOG_HEAP_INPLACE               0x70  
排序探针:   
src/backend/utils/sort/tuplesort.c  
/*  
 *              tuplesort_begin_xxx  
 *  
 * Initialize for a tuple sort operation.  
 *  
 * After calling tuplesort_begin, the caller should call tuplesort_putXXX  
 * zero or more times, then call tuplesort_performsort when all the tuples  
 * have been supplied.  After performsort, retrieve the tuples in sorted  
 * order by calling tuplesort_getXXX until it returns false/NULL.  (If random  
 * access was requested, rescan, markpos, and restorepos can also be called.)  
 * Call tuplesort_end to terminate the operation and release memory/disk space.  
 *  
 * Each variant of tuplesort_begin has a workMem parameter specifying the  
 * maximum number of kilobytes of RAM to use before spilling data to disk.  
 * (The normal value of this parameter is work_mem, but some callers use  
 * other values.)  Each variant also has a randomAccess parameter specifying  
 * whether the caller needs non-sequential access to the sort result.  
 */  
...  
Tuplesortstate *  
tuplesort_begin_heap(TupleDesc tupDesc,  
                                         int nkeys, AttrNumber *attNums,  
                                         Oid *sortOperators, Oid *sortCollations,  
                                         bool *nullsFirstFlags,  
                                         int workMem, bool randomAccess)  
{  
...  
        state->nKeys = nkeys;  
  
        TRACE_POSTGRESQL_SORT_START(HEAP_SORT,  
                                                                false,  /* no unique check */  
                                                                nkeys,  
                                                                workMem,  
                                                                randomAccess);  
...  
Tuplesortstate *  
tuplesort_begin_cluster(TupleDesc tupDesc,  
                                                Relation indexRel,  
                                                int workMem, bool randomAccess)  
{  
...  
        state->nKeys = RelationGetNumberOfAttributes(indexRel);  
  
        TRACE_POSTGRESQL_SORT_START(CLUSTER_SORT,  
                                                                false,  /* no unique check */  
                                                                state->nKeys,  
                                                                workMem,  
                                                                randomAccess);  
...  
Tuplesortstate *  
tuplesort_begin_index_btree(Relation heapRel,  
                                                        Relation indexRel,  
                                                        bool enforceUnique,  
                                                        int workMem, bool randomAccess)  
{  
...  
        state->nKeys = RelationGetNumberOfAttributes(indexRel);  
  
        TRACE_POSTGRESQL_SORT_START(INDEX_SORT,  
                                                                enforceUnique,  
                                                                state->nKeys,  
                                                                workMem,  
                                                                randomAccess);  
...  
Tuplesortstate *  
tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation,  
                                          bool nullsFirstFlag,  
                                          int workMem, bool randomAccess)  
{  
...  
        state->nKeys = 1;                       /* always a one-column sort */  
  
        TRACE_POSTGRESQL_SORT_START(DATUM_SORT,  
                                                                false,  /* no unique check */  
                                                                1,  
                                                                workMem,  
                                                                randomAccess);  
...  
/*  
 * tuplesort_end  
 *  
 *      Release resources and clean up.  
 *  
 * NOTE: after calling this, any pointers returned by tuplesort_getXXX are  
 * pointing to garbage.  Be careful not to attempt to use or free such  
 * pointers afterwards!  
 */  
void  
tuplesort_end(Tuplesortstate *state)  
{  
...  
#ifdef TRACE_SORT  
        if (trace_sort)  
        {  
                if (state->tapeset)  
                        elog(LOG, "external sort ended, %ld disk blocks used: %s",  
                                 spaceUsed, pg_rusage_show(&state->ru_start));  
                else  
                        elog(LOG, "internal sort ended, %ld KB used: %s",  
                                 spaceUsed, pg_rusage_show(&state->ru_start));  
        }  
  
        TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);  
#else  
  
        /*  
         * If you disabled TRACE_SORT, you can still probe sort__done, but you  
         * ain't getting space-used stats.  
         */  
        TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, 0L);  
排序类型定义  
src/backend/utils/sort/tuplesort.c  
/* sort-type codes for sort__start probes */  
#define HEAP_SORT               0  
#define INDEX_SORT              1  
#define DATUM_SORT              2  
#define CLUSTER_SORT    3  

Flag Counter

digoal’s 大量PostgreSQL文章入口