PostgreSQL 小改动,解决流复制遇到的pg_xlog已删除的问题(主库wal sender读归档目录文件发送给wal sender)

2 minute read

背景

PostgreSQL 通过流复制搭建STANDBY,在某些特定的场景可能会遭遇standby需要的xlog文件已经从master删除的情况。

例如

1. 备库写入XLOG的速度比主库产生XLOG的速度慢。

2. 网络传输速度比产生XLOG的速度慢。

3. 备库故障,中断时间过长。

4. 网络故障,中断时间过长。

目前的解决办法

1. 归档

主库对xlog文件进行归档,当备库需要的文件已经不在pg_xlog目录中时,备库可以通过配置restore_command,从主库的归档来获取需要的xlog文件。这种方法比较通用。

缺点

备库需要有获取归档的通道,例如能访问归档存储,或者能通过ftp, http服务获取到归档文件。如果归档是在磁带库的,要有从磁带库获取归档的路径。

2. slot

slot会记录下备库已经同步到XLOG的什么位置的信息,所以主库不会删除备库还需要的xlog文件。

缺点

因为主库不能删除未发送的XLOG,那么如果备库落后的太多,可能导致主库的空间爆满。

3. wal keep segments

设置主库最多保留的XLOG文件个数,不能彻底解决问题。

另类的解决办法,其实通过归档来解决是比较好的办法,但是有一个空缺,就是备库可能无法访问归档存储或归档服务。

这种情况下,有什么好的方法来解决呢?

首先我们要了解一个问题,主库一定是能访问归档的,或者说概率非常大。

我们以主库能访问归档为前提,当备库需要的XLOG文件已经被删除时,主库主动去归档获取归档文件,并通过流复制发送给备库。

已有的代码:

当BasicOpenFile失败时,表示$PGDATA/pg_xlog中不存在备库请求的xlog信息,返回错误信息requested WAL segment %s has already been removed”。

src/backend/replication/walsender.c

static void  
XLogRead(char *buf, XLogRecPtr startptr, Size count)  
{  
....  
                        XLogFilePath(path, curFileTimeLine, sendSegNo);  
  
                        sendFile = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);  
                        if (sendFile < 0)  
                        {  
                                /*  
                                 * If the file is not found, assume it's because the standby  
                                 * asked for a too old WAL segment that has already been  
                                 * removed or recycled.  
                                 */  
                                if (errno == ENOENT)  
                                        ereport(ERROR,  
                                                        (errcode_for_file_access(),  
                                                         errmsg("requested WAL segment %s has already been removed",  
                                                                XLogFileNameP(curFileTimeLine, sendSegNo))));  
                                else  
                                        ereport(ERROR,  
                                                        (errcode_for_file_access(),  
                                                         errmsg("could not open file \"%s\": %m",  
                                                                        path)));  
                        }  
                        sendOff = 0;  

我们假设归档目录是一个分布式文件系统例如lustre, moosefs或者其他。

归档命令是

archive_command = 'test ! -f /home/digoal/arch/%f && cp %p /home/digoal/arch/%f'  

我们需要新增一个参数, 这个参数是主库的参数。

xlog_search_dir  

通过它告诉主库归档目录在哪里。本文的归档目录是/home/digoal/arch

代码如下:

src/include/replication/walsender.h

extern PGDLLIMPORT char *xlog_search_dir;  

src/backend/utils/misc/guc.c

static struct config_string ConfigureNamesString[] =  
{  
        {  
                {"archive_command", PGC_SIGHUP, WAL_ARCHIVING,  
                        gettext_noop("Sets the shell command that will be called to archive a WAL file."),  
                        NULL  
                },  
                &XLogArchiveCommand,  
                "",  
                NULL, NULL, show_archive_command  
        },  
  
        {  
                {"xlog_search_dir", PGC_SIGHUP, WAL_ARCHIVING,  
                        gettext_noop("Sets the dir by walsender search XLOG from archive dir, when xlog cleaned in pg_xlog file which walreceiver requesting."),  
                        NULL  
                },  
                &xlog_search_dir,  
                "",  
                NULL, NULL, NULL  
        },  

另外还需要调整walsender.c的代码:

src/include/replication/walsender.h

extern PGDLLIMPORT char *xlog_search_dir;  

src/backend/replication/walsender.c

char * xlog_search_dir = NULL;  
  
static void  
XLogRead(char *buf, XLogRecPtr startptr, Size count)  
{  
...  
                if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))  
                {  
                        char            path[MAXPGPATH];  
...  
                        sendFile = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);  
                        if (sendFile < 0)  
                        {  
                                if (xlog_search_dir != NULL && xlog_search_dir[0] != '\0')  
                                {  
                                        snprintf(path, MAXPGPATH, "%s/%08X%08X%08X", xlog_search_dir, curFileTimeLine,  
                                                (uint32) ((sendSegNo) / XLogSegmentsPerXLogId),  
                                                (uint32) ((sendSegNo) % XLogSegmentsPerXLogId));  
  
                                        sendFile = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);  
                                        if (sendFile <0)  
                                         {  
                                                        /*  
                                                         * If the file is not found, assume it's because the standby  
                                                        * asked for a too old WAL segment that has already been  
                                                        * removed or recycled.  
                                                        */  
                                                if (errno == ENOENT)  
                                                        ereport(ERROR,  
                                                                (errcode_for_file_access(),  
                                                                 errmsg("requested WAL segment %s has already been removed",  
                                                                        XLogFileNameP(curFileTimeLine, sendSegNo))));  
                                                else  
                                                        ereport(ERROR,  
                                                                        (errcode_for_file_access(),  
                                                                         errmsg("could not open file \"%s\": %m",  
                                                                                path)));  
                                        }  
                                 }  
                                 else  
                                 {  
                                                        /*  
                                                         * If the file is not found, assume it's because the standby  
                                                        * asked for a too old WAL segment that has already been  
                                                        * removed or recycled.  
                                                        */  
                                                if (errno == ENOENT)  
                                                        ereport(ERROR,  
                                                                (errcode_for_file_access(),  
                                                                 errmsg("requested WAL segment %s has already been removed",  
                                                                        XLogFileNameP(curFileTimeLine, sendSegNo))));  
                                                else  
                                                        ereport(ERROR,  
                                                                        (errcode_for_file_access(),  
                                                                         errmsg("could not open file \"%s\": %m",  
                                                                                path)));  
                                 }  
                        }  
                        sendOff = 0;  
                }  
...  

重新编译,并在postgresql.conf中配置

xlog_search_dir='/home/digoal/arch'  

现在,把备库需要的已归档的XLOG从主库的pg_xlog目录删除,也不会报requested WAL segment %s has already been removed”了。

Flag Counter

digoal’s 大量PostgreSQL文章入口