PostgreSQL 时间点恢复(PITR)时查找wal record的顺序 - loop(pg_wal, restore_command, stream)

8 minute read

背景

PostgreSQL recovery时,如何获取需要的wal record呢?

流程

PostgreSQL recovery时,可以从三个地方获取wal record

1、pg_wal 目录

2、recovery.conf中配置的restore_command

3、recovery.conf中配置的stream replication

优先从1开始,如果找不到则返回FALSE,接下来去RESTORE_COMMAND中找,最后是stream。然后再次循环从1开始找。

代码如下

static bool  
  
typedef enum  
{  
        XLOG_FROM_ANY = 0,                      /* request to read WAL from any source */  
        XLOG_FROM_ARCHIVE,                      /* restored using restore_command */  
        XLOG_FROM_PG_WAL,                       /* existing file in pg_wal */  
        XLOG_FROM_STREAM                        /* streamed from master */  
} XLogSource;  

src/backend/access/transam/xlog.c

/*  
 * Open the WAL segment containing WAL location 'RecPtr'.  
 *  
 * The segment can be fetched via restore_command, or via walreceiver having  
 * streamed the record, or it can already be present in pg_wal. Checking  
 * pg_wal is mainly for crash recovery, but it will be polled in standby mode  
 * too, in case someone copies a new segment directly to pg_wal. That is not  
 * documented or recommended, though.  
 *  
 * If 'fetching_ckpt' is true, we're fetching a checkpoint record, and should  
 * prepare to read WAL starting from RedoStartLSN after this.  
 *  
 * 'RecPtr' might not point to the beginning of the record we're interested  
 * in, it might also point to the page or segment header. In that case,  
 * 'tliRecPtr' is the position of the WAL record we're interested in. It is  
 * used to decide which timeline to stream the requested WAL from.  
 *  
 * If the record is not immediately available, the function returns false  
 * if we're not in standby mode. In standby mode, waits for it to become  
 * available.  
 *  
 * When the requested record becomes available, the function opens the file  
 * containing it (if not open already), and returns true. When end of standby  
 * mode is triggered by the user, and there is no more WAL available, returns  
 * false.  
 */  
static bool  
WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,  
                                                        bool fetching_ckpt, XLogRecPtr tliRecPtr)  
{  
        static TimestampTz last_fail_time = 0;  
        TimestampTz now;  
        bool            streaming_reply_sent = false;  
  
        /*-------  
         * Standby mode is implemented by a state machine:  
         *  
         * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just  
         *        pg_wal (XLOG_FROM_PG_WAL)  
         * 2. Check trigger file  
         * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)  
         * 4. Rescan timelines  
         * 5. Sleep wal_retrieve_retry_interval milliseconds, and loop back to 1.  
         *  
         * Failure to read from the current source advances the state machine to  
         * the next state.  
         *  
         * 'currentSource' indicates the current state. There are no currentSource  
         * values for "check trigger", "rescan timelines", and "sleep" states,  
         * those actions are taken when reading from the previous source fails, as  
         * part of advancing to the next state.  
         *-------  
         */  
...........  
  
  
        for (;;)  
        {  
                int                     oldSource = currentSource;  
  
                /*  
                 * First check if we failed to read from the current source, and  
                 * advance the state machine if so. The failure to read might've  
                 * happened outside this function, e.g when a CRC check fails on a  
                 * record, or within this loop.  
                 */  
                if (lastSourceFailed)  
                {  
                        switch (currentSource)  
                        {  
                                case XLOG_FROM_ARCHIVE:  
                                case XLOG_FROM_PG_WAL:  
  
                                        /*  
                                         * Check to see if the trigger file exists. Note that we  
                                         * do this only after failure, so when you create the  
                                         * trigger file, we still finish replaying as much as we  
                                         * can from archive and pg_wal before failover.  
                                         */  
                                        if (StandbyMode && CheckForStandbyTrigger())  
                                        {  
                                                ShutdownWalRcv();  
                                                return false;  
                                        }  
  
                                        /*  
                                         * Not in standby mode, and we've now tried the archive  
                                         * and pg_wal.  
                                         */  
                                        if (!StandbyMode)  
                                                return false;  
  
                                        /*  
                                         * If primary_conninfo is set, launch walreceiver to try  
                                         * to stream the missing WAL.  
                                         *  
                                         * If fetching_ckpt is true, RecPtr points to the initial  
                                         * checkpoint location. In that case, we use RedoStartLSN  
                                         * as the streaming start position instead of RecPtr, so  
                                         * that when we later jump backwards to start redo at  
                                         * RedoStartLSN, we will have the logs streamed already.  
                                         */  
                                        if (PrimaryConnInfo)  
                                        {  
                                                XLogRecPtr      ptr;  
                                                TimeLineID      tli;  
  
                                                if (fetching_ckpt)  
                                                {  
                                                        ptr = RedoStartLSN;  
                                                        tli = ControlFile->checkPointCopy.ThisTimeLineID;  
                                                }  
                                                else  
                                                {  
                                                        ptr = tliRecPtr;  
                                                        tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);  
  
                                                        if (curFileTLI > 0 && tli < curFileTLI)  
                                                                elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",  
                                                                         (uint32) (ptr >> 32), (uint32) ptr,  
                                                                         tli, curFileTLI);  
                                                }  
                                                curFileTLI = tli;  
                                                RequestXLogStreaming(tli, ptr, PrimaryConnInfo,  
                                                                                         PrimarySlotName);  
                                                receivedUpto = 0;  
                                        }  
  
                                        /*  
                                         * Move to XLOG_FROM_STREAM state in either case. We'll  
                                         * get immediate failure if we didn't launch walreceiver,  
                                         * and move on to the next state.  
                                         */  
                                        currentSource = XLOG_FROM_STREAM;  
                                        break;  
  
                                case XLOG_FROM_STREAM:  
  
                                        /*  
                                         * Failure while streaming. Most likely, we got here  
                                         * because streaming replication was terminated, or  
                                         * promotion was triggered. But we also get here if we  
                                         * find an invalid record in the WAL streamed from master,  
                                         * in which case something is seriously wrong. There's  
                                         * little chance that the problem will just go away, but  
                                         * PANIC is not good for availability either, especially  
                                         * in hot standby mode. So, we treat that the same as  
                                         * disconnection, and retry from archive/pg_wal again. The  
                                         * WAL in the archive should be identical to what was  
                                         * streamed, so it's unlikely that it helps, but one can  
                                         * hope...  
                                         */  
  
                                        /*  
                                         * Before we leave XLOG_FROM_STREAM state, make sure that  
                                         * walreceiver is not active, so that it won't overwrite  
                                         * WAL that we restore from archive.  
                                         */  
                                        if (WalRcvStreaming())  
                                                ShutdownWalRcv();  
  
                                        /*  
                                         * Before we sleep, re-scan for possible new timelines if  
                                         * we were requested to recover to the latest timeline.  
                                         */  
                                        if (recoveryTargetIsLatest)  
  
                                        {  
                                                if (rescanLatestTimeLine())  
                                                {  
                                                        currentSource = XLOG_FROM_ARCHIVE;  
                                                        break;  
                                                }  
                                        }  
  
                                        /*  
                                         * XLOG_FROM_STREAM is the last state in our state  
                                         * machine, so we've exhausted all the options for  
                                         * obtaining the requested WAL. We're going to loop back  
                                         * and retry from the archive, but if it hasn't been long  
                                         * since last attempt, sleep wal_retrieve_retry_interval  
                                         * milliseconds to avoid busy-waiting.  
                                         */  
                                        now = GetCurrentTimestamp();  
                                        if (!TimestampDifferenceExceeds(last_fail_time, now,  
                                                                                                        wal_retrieve_retry_interval))  
                                        {  
                                                long            secs,  
                                                                        wait_time;  
                                                int                     usecs;  
  
                                                TimestampDifference(last_fail_time, now, &secs, &usecs);  
                                                wait_time = wal_retrieve_retry_interval -  
                                                        (secs * 1000 + usecs / 1000);  
  
                                                WaitLatch(&XLogCtl->recoveryWakeupLatch,  
                                                                  WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,  
                                                                  wait_time, WAIT_EVENT_RECOVERY_WAL_STREAM);  
                                                ResetLatch(&XLogCtl->recoveryWakeupLatch);  
                                                now = GetCurrentTimestamp();  
                                        }  
                                        last_fail_time = now;  
                                        currentSource = XLOG_FROM_ARCHIVE;  
                                        break;  
  
                                default:  
                                        elog(ERROR, "unexpected WAL source %d", currentSource);  
                        }  
                }  
                else if (currentSource == XLOG_FROM_PG_WAL)  
                {  
                        /*  
                         * We just successfully read a file in pg_wal. We prefer files in  
                         * the archive over ones in pg_wal, so try the next file again  
                         * from the archive first.  
                         */  
                        if (InArchiveRecovery)  
                                currentSource = XLOG_FROM_ARCHIVE;  
                }  
  
                if (currentSource != oldSource)  
                        elog(DEBUG2, "switched WAL source from %s to %s after %s",  
                                 xlogSourceNames[oldSource], xlogSourceNames[currentSource],  
                                 lastSourceFailed ? "failure" : "success");  
  
                /*  
                 * We've now handled possible failure. Try to read from the chosen  
                 * source.  
                 */  
                lastSourceFailed = false;  
  
                switch (currentSource)  
                {  
                        case XLOG_FROM_ARCHIVE:  
                        case XLOG_FROM_PG_WAL:  
                                /* Close any old file we might have open. */  
                                if (readFile >= 0)  
                                {  
                                        close(readFile);  
                                        readFile = -1;  
                                }  
                                /* Reset curFileTLI if random fetch. */  
                                if (randAccess)  
                                        curFileTLI = 0;  
  
                                /*  
                                 * Try to restore the file from archive, or read an existing  
                                 * file from pg_wal.  
                                 */  
                                readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,  
                                                                                          currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :  
                                                                                          currentSource);  
                                if (readFile >= 0)  
                                        return true;    /* success! */  
  
                                /*  
                                 * Nope, not found in archive or pg_wal.  
                                 */  
                                lastSourceFailed = true;  
                                break;  
  
                        case XLOG_FROM_STREAM:  
                                {  
                                        bool            havedata;  
  
                                        /*  
                                         * Check if WAL receiver is still active.  
                                         */  
                                        if (!WalRcvStreaming())  
                                        {  
                                                lastSourceFailed = true;  
                                                break;  
                                        }  
  
                                        /*  
                                         * Walreceiver is active, so see if new data has arrived.  
                                         *  
                                         * We only advance XLogReceiptTime when we obtain fresh  
                                         * WAL from walreceiver and observe that we had already  
                                         * processed everything before the most recent "chunk"  
                                         * that it flushed to disk.  In steady state where we are  
                                         * keeping up with the incoming data, XLogReceiptTime will  
                                         * be updated on each cycle. When we are behind,  
                                         * XLogReceiptTime will not advance, so the grace time  
                                         * allotted to conflicting queries will decrease.  
                                         */  
                                        if (RecPtr < receivedUpto)  
                                                havedata = true;  
                                        else  
                                        {  
                                                XLogRecPtr      latestChunkStart;  
  
                                                receivedUpto = GetWalRcvWriteRecPtr(&latestChunkStart, &receiveTLI);  
                                                if (RecPtr < receivedUpto && receiveTLI == curFileTLI)  
                                                {  
                                                        havedata = true;  
                                                        if (latestChunkStart <= RecPtr)  
                                                        {  
                                                                XLogReceiptTime = GetCurrentTimestamp();  
                                                                SetCurrentChunkStartTime(XLogReceiptTime);  
                                                        }  
                                                }  
                                                else  
                                                        havedata = false;  
                                        }  
                                        if (havedata)  
                                        {  
                                                /*  
                                                 * Great, streamed far enough.  Open the file if it's  
                                                 * not open already.  Also read the timeline history  
                                                 * file if we haven't initialized timeline history  
                                                 * yet; it should be streamed over and present in  
                                                 * pg_wal by now.  Use XLOG_FROM_STREAM so that source  
                                                 * info is set correctly and XLogReceiptTime isn't  
                                                 * changed.  
                                                 */  
                                                if (readFile < 0)  
                                                {  
                                                        if (!expectedTLEs)  
                                                                expectedTLEs = readTimeLineHistory(receiveTLI);  
                                                        readFile = XLogFileRead(readSegNo, PANIC,  
                                                                                                        receiveTLI,  
                                                                                                        XLOG_FROM_STREAM, false);  
                                                        Assert(readFile >= 0);  
                                                }  
                                                else  
                                                {  
                                                        /* just make sure source info is correct... */  
                                                        readSource = XLOG_FROM_STREAM;  
                                                        XLogReceiptSource = XLOG_FROM_STREAM;  
                                                        return true;  
                                                }  
                                                break;  
                                        }  
  
                                        /*  
                                         * Data not here yet. Check for trigger, then wait for  
                                         * walreceiver to wake us up when new WAL arrives.  
                                         */  
                                        if (CheckForStandbyTrigger())  
                                        {  
                                                /*  
                                                 * Note that we don't "return false" immediately here.  
                                                 * After being triggered, we still want to replay all  
                                                 * the WAL that was already streamed. It's in pg_wal  
                                                 * now, so we just treat this as a failure, and the  
                                                 * state machine will move on to replay the streamed  
                                                 * WAL from pg_wal, and then recheck the trigger and  
                                                 * exit replay.  
                                                 */  
                                                lastSourceFailed = true;  
                                                break;  
                                        }  
  
                                        /*  
                                         * Since we have replayed everything we have received so  
                                         * far and are about to start waiting for more WAL, let's  
                                         * tell the upstream server our replay location now so  
                                         * that pg_stat_replication doesn't show stale  
                                         * information.  
                                         */  
                                        if (!streaming_reply_sent)  
                                        {  
                                                WalRcvForceReply();  
                                                streaming_reply_sent = true;  
                                        }  
  
                                        /*  
                                         * Wait for more WAL to arrive. Time out after 5 seconds  
                                         * to react to a trigger file promptly.  
                                         */  
                                        WaitLatch(&XLogCtl->recoveryWakeupLatch,  
                                                          WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,  
                                                          5000L, WAIT_EVENT_RECOVERY_WAL_ALL);  
                                        ResetLatch(&XLogCtl->recoveryWakeupLatch);  
                                        break;  
                                }  
  
                        default:  
                                elog(ERROR, "unexpected WAL source %d", currentSource);  
                }  
  
                /*  
                 * This possibly-long loop needs to handle interrupts of startup  
                 * process.  
                 */  
                HandleStartupProcInterrupts();  
        }  
  
        return false;                           /* not reached */  

这里决定XLOG从哪里获取,会在pg_wal目录, resotre_command, stream三个SOURCE之间轮询,失败跳到下一个SOURCE获取WAL。

参考

src/backend/access/transam/xlog.c

recovery.conf

#restore_command = ''           # e.g. 'cp /mnt/server/archivedir/%f %p'  
  
#primary_conninfo = ''          # e.g. 'host=localhost port=5432'  

Flag Counter

digoal’s 大量PostgreSQL文章入口