How many xlogs WAL can be generated by PostgreSQL?

12 minute read

背景

PostgreSQL和Oracle类似, 都记录了redo日志, PostgreSQL里面叫做xlog或者WAL.

XLOG可以用来做基于物理文件备份的还原操作. 这一点和Oracle使用RMAN进行还原时需要用到的archive log非常类似.

这里就不阐述PostgreSQL如何做物理备份和还原了, 有兴趣的朋友可以参考一下PostgreSQL Manual, 关于PITR的章节.

下面主要是要说一下PostgreSQL到底能产生多少个XLOG文件呢? 数据库还原时又是如何来定位需要的XLOG文件的呢?

正文

首先回答第一个问题.

1. PostgreSQL到底能产生多少个XLOG文件呢?

在源码中是这么定义的, 如下, 理论上最多可以产生0xFFFFFFFFFFFFFFFFFFFFFFFF个XLOG文件. 基本上任何一个数据库都不可能达到这么多个XLOG.

(但是实际上并不是这样, 要看log和seg是怎么来的, 还有TLI什么时候会变化. 看问题2里面的描述)

src/include/access/xlog_internal.h

/*  
 * These macros encapsulate knowledge about the exact layout of XLog file  
 * names, timeline history file names, and archive-status file names.  
 */  
#define MAXFNAMELEN             64  
#define XLogFileName(fname, tli, log, seg)      \  
        snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli, log, seg)  

接下来要说说第二个问题,

2. 数据库还原时又是如何来定位需要的XLOG文件的呢?

把XLOG想象为内存的话, 那么要查找内存中的数据需要寻址. 而寻址是有上限的, 比如64位的机器, 寻址的上限是2 ^ 64. 字节.

同样, XLOG的寻址也有上限, 它是这么定义的.

src/include/access/xlogdefs.h

00017 /*  
00018  * Pointer to a location in the XLOG.  These pointers are 64 bits wide,  
00019  * because we don't want them ever to overflow.  
00020  */  
00021 typedef uint64 XLogRecPtr;  

uint64是无符号64位整型. 定义如下 :

src/include/c.h

/*  
 * 64-bit integers  
 */  
#ifdef HAVE_LONG_INT_64  
/* Plain "long int" fits, use it */  
  
#ifndef HAVE_INT64  
typedef long int int64;  
#endif  
#ifndef HAVE_UINT64  
typedef unsigned long int uint64;  
#endif  
#elif defined(HAVE_LONG_LONG_INT_64)  
/* We have working support for "long long int", use that */  
  
#ifndef HAVE_INT64  
typedef long long int int64;  
#endif  
#ifndef HAVE_UINT64  
typedef unsigned long long int uint64;  
#endif  
#else  
/* neither HAVE_LONG_INT_64 nor HAVE_LONG_LONG_INT_64 */  
#error must have a working 64-bit integer datatype  
#endif  

也就是说, xlog的寻址上限是 2^64 和 64位机器的内存寻址空间一样. 那么可以计算出最多能产生多少个XLOG文件.

默认的XLOG是16MB一个, 如下是编译的时候的默认值.

  --with-wal-segsize=SEGSIZE  
                          set WAL segment size in MB [16]  
(2^64) /16/1024/1024 = 1099511627776 (0x10000000000) 个XLOG文件.   

这个数字显然和XLOG文件名能够容纳的xlog文件个数有较大差别, 文件名中包含了TLI, LOG, SEG三部分信息. 如下 :

src/include/access/xlog_internal.h

00108 /*  
00109  * The XLOG is split into WAL segments (physical files) of the size indicated  
00110  * by XLOG_SEG_SIZE.  
00111  */  
00112 #define XLogSegSize     ((uint32) XLOG_SEG_SIZE)  
00113 #define XLogSegmentsPerXLogId   (UINT64CONST(0x100000000) / XLOG_SEG_SIZE)  
...  
#define XLogFileName (   fname,  
  tli,  
  logSegNo    
)     
Value:  
snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli,       \  
             (uint32) ((logSegNo) / XLogSegmentsPerXLogId), \  
             (uint32) ((logSegNo) % XLogSegmentsPerXLogId))  

src/include/c.h

#define UINT64CONST(x) ((uint64) x)  

src/include/access/xlogdefs.h

/*  
 * TimeLineID (TLI) - identifies different database histories to prevent  
 * confusion after restoring a prior state of a database installation.  
 * TLI does not change in a normal stop/restart of the database (including  
 * crash-and-recover cases); but we must assign a new TLI after doing  
 * a recovery to a prior state, a/k/a point-in-time recovery.  This makes  
 * the new WAL logfile sequence we generate distinguishable from the  
 * sequence that was generated in the previous incarnation.  
 */  
typedef uint32 TimeLineID;  

如果取XLOG_SEG_SIZE=16MB, 那么

XLogSegmentsPerXLogId = UINT64CONST(0x100000000) / 16MB = 256  

因此可以得出xlog文件名中的最后两个部分LOG和SEG的最大值分别可以 :

LOG = (uint32) ((logSegNo) / XLogSegmentsPerXLogId) = 2^32  
SEG = (uint32) ((logSegNo) % XLogSegmentsPerXLogId)) = 255  

也就是说在同一个时间线(TLI)里面, 当XLOG_SEG_SIZE=16MB时. 最多可以产生 2^32 * 256 个 XLOG 文件.

这样的话, 如果我们的数据库平均1天产生10TB的XLOG数据量, 那么需要

 ((2^32 * 256) *16 * 1024 * 1024) / (10*1024*1024*1024*1024) = 1677721天 = 4712年.   

才能达到XLOG的限制. 这在目前来看也是足够用的.

跟进这个规则,可以推算出当前数据库已经经历了多少个xlog文件. 例如 :

digoal@db-192-168-xxx-xxx-> cd $PGDATA/pg_xlog  
digoal@db-192-168-xxx-xxx-> ll | tail -n 5  
-rw------- 1 digoal digoal 64M Oct 14 13:43 000000020000180F0000000A  
-rw------- 1 digoal digoal 64M Oct 14 13:36 000000020000180F0000000B  
-rw------- 1 digoal digoal 64M Oct 14 13:30 000000020000180F0000000C  
-rw------- 1 digoal digoal 56 Jun 17  2011 00000002.history  
drwx------ 2 digoal digoal 2.0K Oct 15 09:08 archive_status  

这里的xlog文件时64MB的, 所以 :

XLogSegmentsPerXLogId = UINT64CONST(0x100000000) / 64MB = 64 .    

所以SEG最大值是63转换成16进制就是3F. 从xlog文件的SEG段就能看出规律 .

注意以上代码取自PostgreSQL 9.3 devel. 低于9.3的版本, SEG最大值还要减1 .

从xlog的文件名000000020000180F0000000C读到3个信息,

时间线: 00000002  
  
LOG: 0000180F  
  
SEG: 0000000C  

共经历(0x180F * 64) + 0xc = 394176 + 12 = 394188 个xlog文件.

折合 394188 * 64MB = 24TB .

低于9.3 devel的版本, XLogFileName代码如下 :

/*  
 * These macros encapsulate knowledge about the exact layout of XLog file  
 * names, timeline history file names, and archive-status file names.  
 */  
#define MAXFNAMELEN             64  
  
#define XLogFileName(fname, tli, log, seg)      \  
        snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli, log, seg)  

下面通过几个例子来验证一下前面的限制情况.

1. 控制文件, 在控制文件中记录了XLogRecPtr的信息, 分别是

Latest checkpoint location  
Prior checkpoint location  
Latest checkpoint's REDO location  
Minimum recovery ending location  
Backup start location  

以上5个location都是XLogRecPtr类型的数据. 源码如下 :

src/include/catalog/pg_control.h

/*  
 * Body of CheckPoint XLOG records.  This is declared here because we keep  
 * a copy of the latest one in pg_control for possible disaster recovery.  
 * Changing this struct requires a PG_CONTROL_VERSION bump.  
 */  
typedef struct CheckPoint  
{  
        XLogRecPtr      redo;                   /* next RecPtr available when we began to  
                                                                 * create CheckPoint (i.e. REDO start point) */  
.....  
  
/*  
 * Contents of pg_control.  
 *  
 * NOTE: try to keep this under 512 bytes so that it will fit on one physical  
 * sector of typical disk drives.  This reduces the odds of corruption due to  
 * power failure midway through a write.  
 */  
  
typedef struct ControlFileData  
{  
.....  
        XLogRecPtr      checkPoint;             /* last check point record ptr */  
        XLogRecPtr      prevCheckPoint; /* previous check point record ptr */  
.....  
        /*  
         * These two values determine the minimum point we must recover up to  
         * before starting up:  
         *  
         * minRecoveryPoint is updated to the latest replayed LSN whenever we  
         * flush a data change during archive recovery. That guards against  
         * starting archive recovery, aborting it, and restarting with an earlier  
         * stop location. If we've already flushed data changes from WAL record X  
         * to disk, we mustn't start up until we reach X again. Zero when not  
         * doing archive recovery.  
         *  
         * backupStartPoint is the redo pointer of the backup start checkpoint, if  
         * we are recovering from an online backup and haven't reached the end of  
         * backup yet. It is reset to zero when the end of backup is reached, and  
         * we mustn't start up before that. A boolean would suffice otherwise, but  
         * we use the redo pointer as a cross-check when we see an end-of-backup  
         * record, to make sure the end-of-backup record corresponds the base  
         * backup we're recovering from.  
         *  
         * backupEndPoint is the backup end location, if we are recovering from an  
         * online backup which was taken from the standby and haven't reached the  
         * end of backup yet. It is initialized to the minimum recovery point in  
         * pg_control which was backed up last. It is reset to zero when the end  
         * of backup is reached, and we mustn't start up before that.  
         *  
         * If backupEndRequired is true, we know for sure that we're restoring  
         * from a backup, and must see a backup-end record before we can safely  
         * start up. If it's false, but backupStartPoint is set, a backup_label  
         * file was found at startup but it may have been a leftover from a stray  
         * pg_start_backup() call, not accompanied by pg_stop_backup().  
         */  
        XLogRecPtr      minRecoveryPoint;  
        XLogRecPtr      backupStartPoint;  
        XLogRecPtr      backupEndPoint;  

下面使用pg_resetxlog修改控制文件中TLI,LOG,SEG的值, 使它到达XLOG的上限, 再次切换日志, 看看会发生什么?

首先查看当前数据库的WAL segments的SIZE :

pg_controldata  
Bytes per WAL segment:                16777216  

所以SEG最大可以到255. 接下来就使用pg_resetxlog来修改控制文件,

digoal@db-172-16-3-33-> pg_resetxlog -f -l 4294967295,4294967295,254 $PGDATA  
Transaction log reset  

启动数据库

digoal@db-172-16-3-33-> pg_ctl start  
server starting  
  
digoal@db-172-16-3-33-> LOG:  could not create IPv6 socket: Address family not supported by protocol  
LOG:  database system was shut down at 2012-10-14 16:13:21 CST  
LOG:  autovacuum launcher started  
LOG:  database system is ready to accept connections  

执行CHECKPOINT

digoal@db-172-16-3-33-> psql postgres postgres  
psql (9.1.3)  
Type "help" for help.  
postgres=# checkpoint;  
CHECKPOINT  
postgres=# \q  

查看控制文件, Latest checkpoint’s TimeLineID达到最大4294967295. LOG也达到了最大0xFFFFFFFE, SEG=254(离最大只差1).

digoal@db-172-16-3-33-> pg_controldata   
pg_control version number:            903  
Catalog version number:               201105231  
Database system identifier:           5758179757924416507  
Database cluster state:               in production  
pg_control last modified:             Sun 14 Oct 2012 04:13:44 PM CST  
Latest checkpoint location:           FFFFFFFF/FE000078  
Prior checkpoint location:            FFFFFFFF/FE000020  
Latest checkpoint's REDO location:    FFFFFFFF/FE000078  
Latest checkpoint's TimeLineID:       4294967295  
Latest checkpoint's NextXID:          0/1857  
Latest checkpoint's NextOID:          33060  
Latest checkpoint's NextMultiXactId:  1  
Latest checkpoint's NextMultiOffset:  0  
Latest checkpoint's oldestXID:        1670  
Latest checkpoint's oldestXID's DB:   1  
Latest checkpoint's oldestActiveXID:  0  
Time of latest checkpoint:            Sun 14 Oct 2012 04:13:44 PM CST  
Minimum recovery ending location:     0/0  
Backup start location:                0/0  
Current wal_level setting:            minimal  
Current max_connections setting:      100  
Current max_prepared_xacts setting:   0  
Current max_locks_per_xact setting:   64  
Maximum data alignment:               8  
Database block size:                  8192  
Blocks per segment of large relation: 131072  
WAL block size:                       16384  
Bytes per WAL segment:                16777216  
Maximum length of identifiers:        64  
Maximum columns in an index:          32  
Maximum size of a TOAST chunk:        1996  
Date/time type storage:               64-bit integers  
Float4 argument passing:              by value  
Float8 argument passing:              by value  

进入数据库, 执行切换日志的操作 :

digoal@db-172-16-3-33-> psql postgres postgres  
psql (9.1.3)  
Type "help" for help.  
  
-- 第一次切换日志成功  
postgres=# select pg_switch_xlog();  
  pg_switch_xlog     
-------------------  
 FFFFFFFF/FE0000F0  
(1 row)  
postgres=# \set VERBOSITY verbose  
postgres=# checkpoint;  
CHECKPOINT  
  
-- 第二次切换失败, 因为已经到达上限.  
postgres=# select pg_switch_xlog();  
ERROR:  xlog flush request FFFFFFFF/FF000148 is not satisfied --- flushed only to FFFFFFFF/20  
STATEMENT:  select pg_switch_xlog();  
PANIC:  xlog write request FFFFFFFF/4000 is past end of log 0/0  
WARNING:  AbortTransaction while in COMMIT state  
WARNING:  01000: AbortTransaction while in COMMIT state  
LOCATION:  AbortTransaction, xact.c:2236  
ERROR:  XX000: xlog flush request FFFFFFFF/FF000148 is not satisfied --- flushed only to FFFFFFFF/20  
LOCATION:  XLogFlush, xlog.c:2145  
postgres=# LOG:  WAL writer process (PID 14966) was terminated by signal 6: Aborted  
LOG:  terminating any other active server processes  
WARNING:  terminating connection because of crash of another server process  
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.  
HINT:  In a moment you should be able to reconnect to the database and repeat your command.  
WARNING:  terminating connection because of crash of another server process  
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.  
HINT:  In a moment you should be able to reconnect to the database and repeat your command.  
LOG:  all server processes terminated; reinitializing  
LOG:  database system was interrupted; last known up at 2012-10-14 16:20:24 CST  
LOG:  database system was not properly shut down; automatic recovery in progress  
LOG:  could not open file "pg_xlog/FFFFFFFF0000000000000000" (log file 0, segment 0): No such file or directory  
LOG:  redo is not required  
LOG:  autovacuum launcher started  
LOG:  database system is ready to accept connections  

这些错误和TLI的值都没有关系, 时间线是在恢复结束后打开数据库时才会用到的, 自增长的一个值. 用来区分数据库是否是从备份中恢复过来的.

XLOG文件

digoal@db-172-16-3-33-> pwd  
/home/digoal/pg_root/pg_xlog  
digoal@db-172-16-3-33-> ll  
total 33M  
drwx------ 2 digoal digoal 4.0K Jun 26 11:18 archive_status  
-rw------- 1 digoal digoal  16M Oct 14 16:20 FFFFFFFF0000000000000000  
-rw------- 1 digoal digoal  16M Oct 14 16:20 FFFFFFFFFFFFFFFF000000FF  
显然, 文件名被rotate了( FFFFFFFF 000000FF -> 00000000 00000000 ).

2. 系统函数

在PostgreSQL中有一些系统函数

可以用来查看当前插入或者写入的XLogRecPtr,

STANDBY库上查看接收到的XLOG的XLogRecPtr或者是

已经APPLY的XLOG的XLogRecPtr.

又或者是比较两个XLogRecPtr之间的数据量.

pg_current_xlog_insert_location()  
pg_current_xlog_location()  
pg_start_backup(label text [, fast boolean ])  
pg_stop_backup()  
pg_xlogfile_name(location text)  
pg_xlog_location_diff(location text, location text)  

源码如下 :

src/backend/access/transam/xlogfuncs.c

pg_current_xlog_insert_location()

00192 /*  
00193  * Report the current WAL insert location (same format as pg_start_backup etc)  
00194  *  
00195  * This function is mostly for debugging purposes.  
00196  */  
00197 Datum  
00198 pg_current_xlog_insert_location(PG_FUNCTION_ARGS)  
00199 {  
00200     XLogRecPtr  current_recptr;  
00201     char        location[MAXFNAMELEN];  
00202   
00203     if (RecoveryInProgress())  
00204         ereport(ERROR,  
00205                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),  
00206                  errmsg("recovery is in progress"),  
00207                  errhint("WAL control functions cannot be executed during recovery.")));  
00208   
00209     current_recptr = GetXLogInsertRecPtr();  
00210   
00211     snprintf(location, sizeof(location), "%X/%X",  
00212              (uint32) (current_recptr >> 32), (uint32) current_recptr);  
00213     PG_RETURN_TEXT_P(cstring_to_text(location));  
00214 }  

pg_current_xlog_location()

/*  
00167  * Report the current WAL write location (same format as pg_start_backup etc)  
00168  *  
00169  * This is useful for determining how much of WAL is visible to an external  
00170  * archiving process.  Note that the data before this point is written out  
00171  * to the kernel, but is not necessarily synced to disk.  
00172  */  
00173 Datum  
00174 pg_current_xlog_location(PG_FUNCTION_ARGS)  
00175 {  
00176     XLogRecPtr  current_recptr;  
00177     char        location[MAXFNAMELEN];  
00178   
00179     if (RecoveryInProgress())  
00180         ereport(ERROR,  
00181                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),  
00182                  errmsg("recovery is in progress"),  
00183                  errhint("WAL control functions cannot be executed during recovery.")));  
00184   
00185     current_recptr = GetXLogWriteRecPtr();  
00186   
00187     snprintf(location, sizeof(location), "%X/%X",  
00188              (uint32) (current_recptr >> 32), (uint32) current_recptr);  
00189     PG_RETURN_TEXT_P(cstring_to_text(location));  
00190 }  

pg_start_backup(label text [, fast boolean ])

00038 /*  
00039  * pg_start_backup: set up for taking an on-line backup dump  
00040  *  
00041  * Essentially what this does is to create a backup label file in $PGDATA,  
00042  * where it will be archived as part of the backup dump.  The label file  
00043  * contains the user-supplied label string (typically this would be used  
00044  * to tell where the backup dump will be stored) and the starting time and  
00045  * starting WAL location for the dump.  
00046  */  
00047 Datum  
00048 pg_start_backup(PG_FUNCTION_ARGS)  
00049 {  
00050     text       *backupid = PG_GETARG_TEXT_P(0);  
00051     bool        fast = PG_GETARG_BOOL(1);  
00052     char       *backupidstr;  
00053     XLogRecPtr  startpoint;  
00054     char        startxlogstr[MAXFNAMELEN];  
00055   
00056     backupidstr = text_to_cstring(backupid);  
00057   
00058     startpoint = do_pg_start_backup(backupidstr, fast, NULL);  
00059   
00060     snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",  
00061              (uint32) (startpoint >> 32), (uint32) startpoint);  
00062     PG_RETURN_TEXT_P(cstring_to_text(startxlogstr));  
00063 }  

pg_stop_backup()

00065 /*  
00066  * pg_stop_backup: finish taking an on-line backup dump  
00067  *  
00068  * We write an end-of-backup WAL record, and remove the backup label file  
00069  * created by pg_start_backup, creating a backup history file in pg_xlog  
00070  * instead (whence it will immediately be archived). The backup history file  
00071  * contains the same info found in the label file, plus the backup-end time  
00072  * and WAL location. Before 9.0, the backup-end time was read from the backup  
00073  * history file at the beginning of archive recovery, but we now use the WAL  
00074  * record for that and the file is for informational and debug purposes only.  
00075  *  
00076  * Note: different from CancelBackup which just cancels online backup mode.  
00077  */  
00078 Datum  
00079 pg_stop_backup(PG_FUNCTION_ARGS)  
00080 {  
00081     XLogRecPtr  stoppoint;  
00082     char        stopxlogstr[MAXFNAMELEN];  
00083   
00084     stoppoint = do_pg_stop_backup(NULL, true);  
00085   
00086     snprintf(stopxlogstr, sizeof(stopxlogstr), "%X/%X",  
00087              (uint32) (stoppoint >> 32), (uint32) stoppoint);  
00088     PG_RETURN_TEXT_P(cstring_to_text(stopxlogstr));  
00089 }  

pg_xlogfile_name(location text)

/*  
00345  * Compute an xlog file name given a WAL location,  
00346  * such as is returned by pg_stop_backup() or pg_xlog_switch().  
00347  */  
00348 Datum  
00349 pg_xlogfile_name(PG_FUNCTION_ARGS)  
00350 {  
00351     text       *location = PG_GETARG_TEXT_P(0);  
00352     char       *locationstr;  
00353     uint32      hi,  
00354                 lo;  
00355     XLogSegNo   xlogsegno;  
00356     XLogRecPtr  locationpoint;  
00357     char        xlogfilename[MAXFNAMELEN];  
00358   
00359     if (RecoveryInProgress())  
00360         ereport(ERROR,  
00361                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),  
00362                  errmsg("recovery is in progress"),  
00363          errhint("pg_xlogfile_name() cannot be executed during recovery.")));  
00364   
00365     locationstr = text_to_cstring(location);  
00366   
00367     validate_xlog_location(locationstr);  
00368   
00369     if (sscanf(locationstr, "%X/%X", &hi, &lo) != 2)  
00370         ereport(ERROR,  
00371                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),  
00372                  errmsg("could not parse transaction log location \"%s\"",  
00373                         locationstr)));  
00374     locationpoint = ((uint64) hi) << 32 | lo;  
00375   
00376     XLByteToPrevSeg(locationpoint, xlogsegno);  
00377     XLogFileName(xlogfilename, ThisTimeLineID, xlogsegno);  
00378   
00379     PG_RETURN_TEXT_P(cstring_to_text(xlogfilename));  
00380 }  

pg_xlog_location_diff(location text, location text)

00499 /*  
00500  * Compute the difference in bytes between two WAL locations.  
00501  */  
00502 Datum  
00503 pg_xlog_location_diff(PG_FUNCTION_ARGS)  
00504 {  
00505     text       *location1 = PG_GETARG_TEXT_P(0);  
00506     text       *location2 = PG_GETARG_TEXT_P(1);  
00507     char       *str1,  
00508                *str2;  
00509     XLogRecPtr  loc1,  
00510                 loc2;  
00511     Numeric     result;  
00512     uint64      bytes1,  
00513                 bytes2;  
00514     uint32      hi,  
00515                 lo;  
00516   
00517     /*  
00518      * Read and parse input  
00519      */  
00520     str1 = text_to_cstring(location1);  
00521     str2 = text_to_cstring(location2);  
00522   
00523     validate_xlog_location(str1);  
00524     validate_xlog_location(str2);  
00525   
00526     if (sscanf(str1, "%X/%X", &hi, &lo) != 2)  
00527         ereport(ERROR,  
00528                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),  
00529            errmsg("could not parse transaction log location \"%s\"", str1)));  
00530     loc1 = ((uint64) hi) << 32 | lo;  
00531   
00532     if (sscanf(str2, "%X/%X", &hi, &lo) != 2)  
00533         ereport(ERROR,  
00534                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),  
00535            errmsg("could not parse transaction log location \"%s\"", str2)));  
00536     loc2 = ((uint64) hi) << 32 | lo;  
00537   
00538     bytes1 = (uint64) loc1;  
00539     bytes2 = (uint64) loc2;  
00540   
00541     /*  
00542      * result = bytes1 - bytes2.  
00543      *  
00544      * XXX: this won't handle values higher than 2^63 correctly.  
00545      */  
00546     result = DatumGetNumeric(DirectFunctionCall2(numeric_sub,  
00547        DirectFunctionCall1(int8_numeric, Int64GetDatum((int64) bytes1)),  
00548        DirectFunctionCall1(int8_numeric, Int64GetDatum((int64) bytes2))));  
00549   
00550     PG_RETURN_NUMERIC(result);  
00551 }  

注意到, 输出的LOCATION格式如下 :

00211     snprintf(location, sizeof(location), "%X/%X",  
00212              (uint32) (current_recptr >> 32), (uint32) current_recptr);  

recptr 是 uint64的类型, 右移32位后转换成uint32相当于去uint64的高32位数字.

recptr直接转换成uint32相当于取的recptr的低32位数字,

所以输出的 :

digoal=# select pg_current_xlog_insert_location();  
 pg_current_xlog_insert_location   
---------------------------------  
 0/C1F1EA8  
(1 row)  

recptr就是 :

0x000000000C1F1EA8  
  
digoal=# select pg_xlogfile_name('0/C1F1EA8');  
     pg_xlogfile_name       
--------------------------  
 00000001000000000000000C  
(1 row)  

在这里0/C1F1EA8经过

locationpoint = ((uint64) hi) << 32 | lo;   

后还原成XLogRecPtr类型, 也就是0x000000000C1F1EA8

然后返回文件名, 还需要用到时间线ID.xlogsegno.

00376     XLByteToPrevSeg(locationpoint, xlogsegno);  
00377     XLogFileName(xlogfilename, ThisTimeLineID, xlogsegno);  
00378   
00379     PG_RETURN_TEXT_P(cstring_to_text(xlogfilename));  

Flag Counter

digoal’s 大量PostgreSQL文章入口