Feedback on PostgreSQL Configuration Reloads, with Source Code Analysis

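Background

Some settings in PostgreSQL's configuration files can be applied with a reload. But if a setting is written incorrectly, how do you find out at reload time?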

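Source code analysis

A reload is implemented by sending the SIGHUP signal to the postmaster process.

The signal can be sent with pg_ctl, with kill, or with the pg_reload_conf() function.

When the postmaster receives the signal, it calls SIGHUP_handler, which performs several tasks: it reloads the configuration files (postgresql.conf, pg_hba.conf, and pg_ident.conf) and calls a number of other handling routines, such as forwarding the signal to its child processes.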

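Judging from the code, the process that initiates the reload never learns the outcome, because its job is done as soon as the signal has been sent.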
src/backend/utils/adt/misc.c

/*
 * Signal to reload the database configuration
 */
Datum
pg_reload_conf(PG_FUNCTION_ARGS)
{
        if (!superuser())
                ereport(ERROR,
                                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                                 (errmsg("must be superuser to signal the postmaster"))));

        if (kill(PostmasterPid, SIGHUP))
        {
                ereport(WARNING,
                                (errmsg("failed to send signal to postmaster: %m")));
                PG_RETURN_BOOL(false);
        }

        PG_RETURN_BOOL(true);
}
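
To make the point concrete, here is a minimal sketch that is not PostgreSQL code. It assumes a single process that plays both roles: `on_sighup` stands in for SIGHUP_handler, `send_reload_signal` stands in for pg_reload_conf(), and `config_is_broken` is a hypothetical flag that simulates a mistake in the configuration file.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for "the config file contains a mistake". */
static volatile sig_atomic_t config_is_broken = 0;
/* Result of the simulated reload; only the receiving side writes it. */
static volatile sig_atomic_t reload_ok = 1;

/* Plays the role of SIGHUP_handler: the outcome stays with the receiver. */
static void
on_sighup(int signo)
{
    (void) signo;
    reload_ok = config_is_broken ? 0 : 1;
}

/* Install the handler; returns 0 on success, like sigaction(). */
static int
install_reload_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/*
 * Plays the role of pg_reload_conf(): true means "the signal was sent",
 * which says nothing about what the handler did with it.
 */
static bool
send_reload_signal(pid_t pid)
{
    return kill(pid, SIGHUP) == 0;
}
```

After calling `install_reload_handler()` and setting `config_is_broken = 1`, `send_reload_signal(getpid())` still returns true while `reload_ok` ends up 0. The sender's return value and the reload result are two separate facts, which is exactly the gap discussed here.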

After receiving SIGHUP, the postmaster process handles it as follows:
src/backend/postmaster/postmaster.c

/*  
 * SIGHUP -- reread config files, and tell children to do same  
 */  
static void  
SIGHUP_handler(SIGNAL_ARGS)  
{  
        int                     save_errno = errno;  
  
        PG_SETMASK(&BlockSig);  
  
        if (Shutdown <= SmartShutdown)  
        {  
                ereport(LOG,  
                                (errmsg("received SIGHUP, reloading configuration files")));  
                ProcessConfigFile(PGC_SIGHUP);  
                SignalChildren(SIGHUP);  
                if (StartupPID != 0)  
                        signal_child(StartupPID, SIGHUP);  
                if (BgWriterPID != 0)  
                        signal_child(BgWriterPID, SIGHUP);  
                if (CheckpointerPID != 0)  
                        signal_child(CheckpointerPID, SIGHUP);  
                if (WalWriterPID != 0)  
                        signal_child(WalWriterPID, SIGHUP);  
                if (WalReceiverPID != 0)  
                        signal_child(WalReceiverPID, SIGHUP);  
                if (AutoVacPID != 0)  
                        signal_child(AutoVacPID, SIGHUP);  
                if (PgArchPID != 0)  
                        signal_child(PgArchPID, SIGHUP);  
                if (SysLoggerPID != 0)  
                        signal_child(SysLoggerPID, SIGHUP);  
                if (PgStatPID != 0)  
                        signal_child(PgStatPID, SIGHUP);  
  
                /* Reload authentication config files too */  
                if (!load_hba())  
                        ereport(WARNING,  
                                        (errmsg("pg_hba.conf not reloaded")));  
  
                if (!load_ident())  
                        ereport(WARNING,  
                                        (errmsg("pg_ident.conf not reloaded")));  
  
#ifdef EXEC_BACKEND  
                /* Update the starting-point file for future children */  
                write_nondefault_variables(PGC_SIGHUP);  
#endif  
        }  
  
        PG_SETMASK(&UnBlockSig);  
  
        errno = save_errno;  
}  

The calls we care about are the ones that reload the configuration files:

ProcessConfigFile(PGC_SIGHUP);  
load_hba()  
load_ident()  

The code that reloads postgresql.conf is shown below. If it finds an error, it calls ereport, which writes the message to the log.

void  
ProcessConfigFile(GucContext context)  
{  
  
        /*  
         * Read and apply the config file.  We don't need to examine the result.  
         */  
        (void) ProcessConfigFileInternal(context, true, elevel);  
...  
ProcessConfigFileInternal(context, true, elevel)  
...  
  
                else if (strchr(item->name, GUC_QUALIFIER_SEPARATOR) == NULL)  
                {  
                        /* Invalid non-custom variable, so complain */  
                        ereport(elevel,  
                                        (errcode(ERRCODE_UNDEFINED_OBJECT),  
                                         errmsg("unrecognized configuration parameter \"%s\" in file \"%s\" line %u",  
                                                        item->name,  
                                                        item->filename, item->sourceline)));  
                        item->errmsg = pstrdup("unrecognized configuration parameter");  
                        error = true;  
                        ConfFileWithError = item->filename;  
                }  
...  
                if (gconf->context < PGC_SIGHUP)  
                {  
                        ereport(elevel,  
                                        (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),  
                                         errmsg("parameter \"%s\" cannot be changed without restarting the server",  
                                                        gconf->name)));  
                        record_config_file_error(psprintf("parameter \"%s\" cannot be changed without restarting the server",  
                                                                                          gconf->name),  
                                                                         NULL, 0,  
                                                                         &head, &tail);  
                        error = true;  
                        continue;  
                }  
...  
                        /* Log the change if appropriate */  
                        if (context == PGC_SIGHUP)  
                                ereport(elevel,  
                                                (errmsg("parameter \"%s\" removed from configuration file, reset to default",  
                                                                gconf->name)));  
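
The two error branches in the excerpt can be summarized with a small standalone model. This is only a sketch under stated assumptions: the `Demo*` types, the three-entry parameter table, and `classify_reload_item` are hypothetical names made up for illustration, and the sketch ignores parameter values entirely, which the elided parts of the real function may well take into account.

```c
#include <stddef.h>
#include <string.h>

/* Simplified contexts, kept in the relative order the excerpt relies on. */
typedef enum { CTX_POSTMASTER, CTX_SIGHUP, CTX_USERSET } DemoContext;

typedef enum
{
    ITEM_OK,            /* can be applied by a reload */
    ITEM_UNRECOGNIZED,  /* unknown name without a qualifier separator */
    ITEM_NEEDS_RESTART, /* known, but its context is below "sighup" */
    ITEM_CUSTOM         /* unknown name of the form prefix.name */
} DemoResult;

typedef struct
{
    const char *name;
    DemoContext context;
} DemoParam;

/* Hypothetical, tiny stand-in for the server's parameter table. */
static const DemoParam demo_params[] = {
    {"shared_buffers", CTX_POSTMASTER},
    {"checkpoint_timeout", CTX_SIGHUP},
    {"work_mem", CTX_USERSET},
};

DemoResult
classify_reload_item(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof demo_params / sizeof demo_params[0]; i++)
    {
        if (strcmp(demo_params[i].name, name) == 0)
        {
            /* Mirrors the "gconf->context < PGC_SIGHUP" test. */
            if (demo_params[i].context < CTX_SIGHUP)
                return ITEM_NEEDS_RESTART;
            return ITEM_OK;
        }
    }
    /* Mirrors the strchr(..., GUC_QUALIFIER_SEPARATOR) == NULL test. */
    if (strchr(name, '.') == NULL)
        return ITEM_UNRECOGNIZED;
    return ITEM_CUSTOM;
}
```

For example, a misspelled name such as `shared_bufers` is classified as `ITEM_UNRECOGNIZED`, while `shared_buffers` itself is classified as `ITEM_NEEDS_RESTART`. In the real server both cases end in an ereport call made by the postmaster.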

The code that reloads pg_hba.conf also reports errors, but once again the output comes from the postmaster process, not from the user's process.

load_hba(void)  
{  
...  
                if ((newline = parse_hba_line(lfirst(line), lfirst_int(line_num), lfirst(raw_line))) == NULL)  
                {  
...  
parse_hba_line(List *line, int line_num, char *raw_line)  
{  
...  
                                ereport(LOG,  
                                                (errcode(ERRCODE_CONFIG_FILE_ERROR),  
                                                 errmsg("hostssl requires SSL to be turned on"),  
                                                 errhint("Set ssl = on in postgresql.conf."),  
                                                 errcontext("line %d of configuration file \"%s\"",  
                                                                        line_num, HbaFileName)));  
...  

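As we can see, when a reload fails, the ereport call is issued by the postmaster. The process that sent SIGHUP (in the case of pg_reload_conf(), the backend process interacting with the user) never receives the warning.

So the way for a user to find out whether a configuration reload failed is to check the database log. For example: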

2016-09-01 18:49:50.617 CST,,,64793,,57c52489.fd19,14,,2016-08-30 14:15:37 CST,,0,LOG,F0000,"end-of-line before role specification",,,,,"line 94 of configuration file ""/u01/digoal/pg_root_1921/pg_hba.conf""",,,"parse_hba_line, hba.c:946",""  
2016-09-01 18:49:50.617 CST,,,64793,,57c52489.fd19,15,,2016-08-30 14:15:37 CST,,0,WARNING,01000,"pg_hba.conf not reloaded",,,,,,,,"SIGHUP_handler, postmaster.c:2494",""  
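
Checking the log by eye does not scale, so here is a rough sketch of how lines like the two above could be scanned by a program. The function names `csv_field` and `is_reload_problem` are hypothetical. The field positions (0-based: 11 for severity, 12 for SQLSTATE, 13 for message) are an assumption read off the sample lines, and the check only recognizes the two patterns seen in them: SQLSTATE F0000, or a message containing "not reloaded". Errors in postgresql.conf carry other error codes, as the excerpt earlier shows, and would need their own rules.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the index-th (0-based) comma-separated field of line into out.
 * Quoted fields may contain commas; "" inside quotes is a literal quote.
 * Returns 0 on success, -1 if the field is missing or does not fit.
 */
int
csv_field(const char *line, int index, char *out, size_t outsz)
{
    const char *p = line;
    int         field = 0;

    if (outsz == 0)
        return -1;
    for (;;)
    {
        size_t n = 0;
        int    quoted = (*p == '"');

        if (quoted)
            p++;                    /* skip the opening quote */
        while (*p != '\0')
        {
            if (quoted && *p == '"')
            {
                if (p[1] == '"')
                    p++;            /* doubled quote: keep one of them */
                else
                {
                    p++;            /* closing quote */
                    quoted = 0;
                    continue;
                }
            }
            else if (!quoted && (*p == ',' || *p == '\n'))
                break;              /* end of this field */
            if (field == index)
            {
                if (n + 1 >= outsz)
                    return -1;
                out[n++] = *p;
            }
            p++;
        }
        if (field == index)
        {
            out[n] = '\0';
            return 0;
        }
        if (*p != ',')
            return -1;              /* ran out of fields */
        p++;
        field++;
    }
}

/* Returns 1 if the line looks like a failed reload of pg_hba.conf. */
int
is_reload_problem(const char *line)
{
    char sqlstate[16];
    char message[512];

    if (csv_field(line, 12, sqlstate, sizeof sqlstate) != 0 ||
        csv_field(line, 13, message, sizeof message) != 0)
        return 0;
    return strcmp(sqlstate, "F0000") == 0 ||
           strstr(message, "not reloaded") != NULL;
}
```

Feeding each new log line to `is_reload_problem` flags both sample lines, while an ordinary "received SIGHUP, reloading configuration files" line is not flagged.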

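At the time of writing, the reload time is updated whether or not the reload succeeded. The value returned by pg_conf_load_time is therefore the time at which SIGHUP was received, and it does not represent the time of the last successful reload.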

 pg_catalog | pg_conf_load_time                        | timestamp with time zone |                     | normal

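How can a backend process obtain the reload status?

So how can the backend process find out that the reload failed?

The backend process returns as soon as the signal has been sent. It only cares that the signal went out and pays no attention to the reload itself. How, then, could the postmaster report a problem back to the backend process?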

One idea that comes to mind is asynchronous messaging. PostgreSQL supports asynchronous notifications, and I have written about them before, for example:
"PostgreSQL Notify/Listen Like ESB"
https://yq.aliyun.com/articles/14606

"A small PostgreSQL toy: async Notification as a chat group"
https://yq.aliyun.com/articles/81

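Greenplum also uses asynchronous messages in some of its management tooling to pass status information around.

PostgreSQL could do the same:
1. The backend process starts listening on a channel (say, a fixed channel named reload) and then calls pg_reload_conf().
2. Once the signal has been sent, the postmaster starts handling it.
3. While parsing or reloading the configuration files, if the postmaster hits an error, it not only calls ereport but also sends an asynchronous notification to that channel.
4. That way, as long as the backend process has not exited, it receives the postmaster's notification and knows whether the reload failed.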
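
The four steps above can be sketched as a toy model. To be clear, this is not PostgreSQL's LISTEN/NOTIFY machinery and not a patch for the postmaster: a forked child plays the postmaster, a pipe plays the channel, and `config_line_is_valid` is a hypothetical check that stands in for real configuration parsing. All function names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical validity check standing in for real config parsing. */
static int
config_line_is_valid(const char *line)
{
    return strncmp(line, "port", 4) == 0;
}

/*
 * Returns 0 if the reload was reported as successful, 1 if it was
 * reported as failed, -1 on a system error. The report is copied to msg.
 */
int
reload_with_feedback(const char *config_line, char *msg, size_t msgsz)
{
    int      channel[2];
    sigset_t hup, old;
    pid_t    child;
    ssize_t  n;

    if (msgsz == 0 || pipe(channel) != 0)
        return -1;

    /* Block SIGHUP before forking so the child cannot miss it. */
    sigemptyset(&hup);
    sigaddset(&hup, SIGHUP);
    sigprocmask(SIG_BLOCK, &hup, &old);

    child = fork();
    if (child < 0)
    {
        sigprocmask(SIG_SETMASK, &old, NULL);
        close(channel[0]);
        close(channel[1]);
        return -1;
    }
    if (child == 0)
    {
        /* "Postmaster": wait for the signal, reload, report the result. */
        int         signo;
        const char *report;

        close(channel[0]);
        sigwait(&hup, &signo);
        report = config_line_is_valid(config_line)
            ? "reload ok"
            : "reload failed: unrecognized parameter";
        if (write(channel[1], report, strlen(report)) < 0)
            _exit(1);
        _exit(0);
    }

    /* "Backend": the channel already exists, so signal and then wait. */
    close(channel[1]);
    sigprocmask(SIG_SETMASK, &old, NULL);
    kill(child, SIGHUP);
    n = read(channel[0], msg, msgsz - 1);
    close(channel[0]);
    waitpid(child, NULL, 0);
    if (n < 0)
        return -1;
    msg[n] = '\0';
    return strncmp(msg, "reload ok", 9) == 0 ? 0 : 1;
}
```

Calling `reload_with_feedback("port = 5432", msg, sizeof msg)` reports success, while a misspelled line such as `"prot = 5432"` comes back as a failure with the reason in `msg`. The ordering matters in the model just as it would in the real design: the channel is set up before the signal is sent, so the report cannot be missed.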
