PostgreSQL reload配置的动作反馈与源码分析
背景
PostgreSQL数据库的配置文件中,有一些配置项是支持reload的,但是如果配置写错了,reload时怎么知道呢?
源码分析
reload其实是通过给postmaster进程发SIGHUP信号来实现的。
通过pg_ctl或者kill或者pg_reload_conf()函数都可以发信号。
postmaster收到这个信号之后,会调用SIGHUP_handler,处理一堆事务,包括重载配置文件(包括postgresql.conf, pg_hba.conf, pg_ident.conf),以及调用一些处理函数。
从代码来看,发起reload的进程,并不知道reload的结果,因为信号发完就了事了。
src/backend/utils/adt/misc.c
/*
* Signal to reload the database configuration
*/
Datum
pg_reload_conf(PG_FUNCTION_ARGS)
{
if (!superuser())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
(errmsg("must be superuser to signal the postmaster"))));
if (kill(PostmasterPid, SIGHUP))
{
ereport(WARNING,
(errmsg("failed to send signal to postmaster: %m")));
PG_RETURN_BOOL(false);
}
PG_RETURN_BOOL(true);
}
postmaster进程收到SIGHUP信号后的处理,如下
src/backend/postmaster/postmaster.c
/*
* SIGHUP -- reread config files, and tell children to do same
*/
static void
SIGHUP_handler(SIGNAL_ARGS)
{
int save_errno = errno;
PG_SETMASK(&BlockSig);
if (Shutdown <= SmartShutdown)
{
ereport(LOG,
(errmsg("received SIGHUP, reloading configuration files")));
ProcessConfigFile(PGC_SIGHUP);
SignalChildren(SIGHUP);
if (StartupPID != 0)
signal_child(StartupPID, SIGHUP);
if (BgWriterPID != 0)
signal_child(BgWriterPID, SIGHUP);
if (CheckpointerPID != 0)
signal_child(CheckpointerPID, SIGHUP);
if (WalWriterPID != 0)
signal_child(WalWriterPID, SIGHUP);
if (WalReceiverPID != 0)
signal_child(WalReceiverPID, SIGHUP);
if (AutoVacPID != 0)
signal_child(AutoVacPID, SIGHUP);
if (PgArchPID != 0)
signal_child(PgArchPID, SIGHUP);
if (SysLoggerPID != 0)
signal_child(SysLoggerPID, SIGHUP);
if (PgStatPID != 0)
signal_child(PgStatPID, SIGHUP);
/* Reload authentication config files too */
if (!load_hba())
ereport(WARNING,
(errmsg("pg_hba.conf not reloaded")));
if (!load_ident())
ereport(WARNING,
(errmsg("pg_ident.conf not reloaded")));
#ifdef EXEC_BACKEND
/* Update the starting-point file for future children */
write_nondefault_variables(PGC_SIGHUP);
#endif
}
PG_SETMASK(&UnBlockSig);
errno = save_errno;
}
我们关心的是重载配置文件的几个调用
ProcessConfigFile(PGC_SIGHUP);
load_hba()
load_ident()
postgresql.conf 配置文件重载的代码如下,如果有错误,会调用ereport,输出到日志。
void
ProcessConfigFile(GucContext context)
{
/*
* Read and apply the config file. We don't need to examine the result.
*/
(void) ProcessConfigFileInternal(context, true, elevel);
...
ProcessConfigFileInternal(context, true, elevel)
...
else if (strchr(item->name, GUC_QUALIFIER_SEPARATOR) == NULL)
{
/* Invalid non-custom variable, so complain */
ereport(elevel,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("unrecognized configuration parameter \"%s\" in file \"%s\" line %u",
item->name,
item->filename, item->sourceline)));
item->errmsg = pstrdup("unrecognized configuration parameter");
error = true;
ConfFileWithError = item->filename;
}
...
if (gconf->context < PGC_SIGHUP)
{
ereport(elevel,
(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
errmsg("parameter \"%s\" cannot be changed without restarting the server",
gconf->name)));
record_config_file_error(psprintf("parameter \"%s\" cannot be changed without restarting the server",
gconf->name),
NULL, 0,
&head, &tail);
error = true;
continue;
}
...
/* Log the change if appropriate */
if (context == PGC_SIGHUP)
ereport(elevel,
(errmsg("parameter \"%s\" removed from configuration file, reset to default",
gconf->name)));
重载pg_hba.conf的代码,如果有错误也会输出,但是同样是postmaster进程的输出,而不是用户进程。
load_hba(void)
{
...
if ((newline = parse_hba_line(lfirst(line), lfirst_int(line_num), lfirst(raw_line))) == NULL)
{
...
parse_hba_line(List *line, int line_num, char *raw_line)
{
...
ereport(LOG,
(errcode(ERRCODE_CONFIG_FILE_ERROR),
errmsg("hostssl requires SSL to be turned on"),
errhint("Set ssl = on in postgresql.conf."),
errcontext("line %d of configuration file \"%s\"",
line_num, HbaFileName)));
...
我们可以看到,如果重载异常,ereport调用是postmaster发出的,发送SIGHUP信号的进程(即与用户交互的backend process)收不到这个告警。
所以,用户可以查看数据库日志的方式,了解重载配置文件是否异常。
2016-09-01 18:49:50.617 CST,,,64793,,57c52489.fd19,14,,2016-08-30 14:15:37 CST,,0,LOG,F0000,"end-of-line before role specification",,,,,"line 94 of configuration file ""/u01/digoal/pg_root_1921/pg_hba.conf""",,,"parse_hba_line, hba.c:946",""
2016-09-01 18:49:50.617 CST,,,64793,,57c52489.fd19,15,,2016-08-30 14:15:37 CST,,0,WARNING,01000,"pg_hba.conf not reloaded",,,,,,,,"SIGHUP_handler, postmaster.c:2494",""
目前,不管reload有没有成功,都会更新reload时间,所以通过pg_conf_load_time获取到的是接收到SIGHUP信号的时间,并不能代表最后的成功reload时间
pg_catalog | pg_conf_load_time | timestamp with time zone | | normal
backend process如何获取reload状态
但是backend process怎么样才能知道reload异常了呢?
因为backend process发完信号就返回了,所以只要信号发成功就可以,至于reload它才不管呢,那么postmaster怎么把问题反馈给backend process呢?
我想到一个思路是异步消息,我们知道PostgreSQL是支持异步消息的,我以前写过一些文档介绍异步消息, 例如
《PostgreSQL Notify/Listen Like ESB》
https://yq.aliyun.com/articles/14606
《PostgreSQL 的小玩具, async Notification as a chat group》
https://yq.aliyun.com/articles/81
其实Greenplum在一些管理手段中也使用了异步消息,用于传递一些状态信息。
PostgreSQL其实也可以这样做:
1. 后台进程调用pg_reload_conf(),并且监听一个channel(例如我们固定命名为reload channel)。
2. 信号发完,postmaster开始处理信号。
3. postmaster在解析配置文件,或者reload配置文件时,如果遇到错误,除了触发ereport之外,同时将异步消息通知到对应的channel。
4. 这样的话,只要backend process不退出,就能收到来自postmaster的通知,知道reload是否异常。