Greenplum max_prepared_transactions 设置不正确时的症状 - too many clients already
背景
Greenplum分布式事务使用2PC提交,因此需要依赖max_prepared_transactions参数的配置。
原文
https://www.cnblogs.com/chenminklutz/p/8946269.html
max_prepared_transactions 应该至少设置为 max_connections 一样大。
正文
max_prepared_transactions是greenplum的一个参数,以下是官方文档
Sets the maximum number of transactions that can be in the prepared state simultaneously. Greenplum uses prepared transactions internally to ensure data integrity across the segments. This value must be at least as large as the value of max_connections on the master. Segment instances should be set to the same value as the master.
Value Range | Default | Set Classifications |
---|---|---|
integer | 250 on master 250 on segments |
local system restart |
参数类型:本地,系统级,需要重启生效。
当这个参数设置的足够小时,如果max_prepared_transactions使用耗尽,会报错如下:
FATAL: sorry, too many clients already.
DETAIL: There are no more available slots in the sharedSnapshotArray.
HINT: Another piece of code should have detected that we have too many clients.this probably means that someone isn't releasing their slot properly.
然后其他人就连不上数据库。
开始解决问题:
查看数据库状态,gpstate -s
正常,segment没有挂
但是发现master current role变成了utility,正常应该是是dispatch啊。
肯定有问题,继续找。
手动pg_terminate_backend()最早的30个进程,正常啦,master current role也回去了,其他人也能连上了。
过一会儿又不行啦。
上源码搜sharedSnapshotArray,发现如下判断就会报这个错
if (arrayP->numSlots >= arrayP->maxSlots || arrayP->nextSlot == -1)
{
/*
* Ooops, no room. this shouldn't happen as something else should have
* complained if we go over MaxBackends.
*/
LWLockRelease(SharedSnapshotLock);
ereport(FATAL,
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
errmsg("sorry, too many clients already."),
errdetail("There are no more available slots in the sharedSnapshotArray."),
errhint("Another piece of code should have detected that we have too many clients."
" this probably means that someone isn't releasing their slot properly.")));
}
继续顺藤摸瓜搜maxSlots,看是怎么确定的值。
/*
* We're the first - initialize.
*/
sharedSnapshotArray->numSlots = 0;
/* TODO: MaxBackends is only somewhat right. What we really want here
* is the MaxBackends value from the QD. But this is at least
* safe since we know we dont need *MORE* than MaxBackends. But
* in general MaxBackends on a QE is going to be bigger than on a
* QE by a good bit. or at least it should be.
*
* But really, max_prepared_transactions *is* what we want (it
* corresponds to the number of connections allowed on the
* master).
*
* slotCount is initialized in SharedSnapshotShmemSize().
*/
sharedSnapshotArray->maxSlots = slotCount;
sharedSnapshotArray->nextSlot = 0;
sharedSnapshotArray->slots = (SharedSnapshotSlot *)&sharedSnapshotArray->xips;
/* xips start just after the last slot structure */
xip_base = (TransactionId *)&sharedSnapshotArray->slots[sharedSnapshotArray->maxSlots];
继续slotCount,上边有注释说slotCount在SharedSnapshotShmemSize里被初始化
/*
* Report shared-memory space needed by CreateSharedSnapshot.
*/
Size
SharedSnapshotShmemSize(void)
{
Size size;
xipEntryCount = MaxBackends + max_prepared_xacts;
slotSize = sizeof(SharedSnapshotSlot);
slotSize += mul_size(sizeof(TransactionId), (xipEntryCount));
slotSize = MAXALIGN(slotSize);
/*
* We only really need max_prepared_xacts; but for safety we
* multiply that by two (to account for slow de-allocation on
* cleanup, for instance).
*/
slotCount = NUM_SHARED_SNAPSHOT_SLOTS;
size = offsetof(SharedSnapshotStruct, xips);
size = add_size(size, mul_size(slotSize, slotCount));
return MAXALIGN(size);
}
全局变量NUM_SHARED_SNAPSHOT_SLOTS
#define NUM_SHARED_SNAPSHOT_SLOTS (2 * max_prepared_xacts)
二倍的max_prepared_transactions参数值。其实上边的英文注释也说啦:我们实际上只需要max_prepared_transactions,但是为了安全我们把他乘二,比如清除时的缓慢反分配。
所以查看系统的max_prepared_transactions的值,发现只有50,此处笑cry,捂脸哭。
修改max_prepared_transactions等于master的max_connections,gpconfig -c max_prepared_transactions -v 1500
,重启数据库gpstop -a -M fast
。
查看max_prepared_transactions,已经生效。
OK问题解决!!
问题模拟
gpconfig -c max_prepared_transactions -v 1
gpstop -M fast -a
gpstart -a
postgres=# show max_prepared_transactions ;
max_prepared_transactions
---------------------------
1
(1 row)
postgres=# create table test(id int, info text);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
postgres=# begin;
BEGIN
postgres=# insert into test select generate_series(1,1000),'test';
INSERT 0 1000
-- 会话2
postgres=# \set VERBOSITY verbose
postgres=# insert into test select generate_series(1,1000),'test';
FATAL: XX000: the limit of 1 distributed transactions has been reached. (cdbtm.c:2569)
DETAIL: The global user configuration (GUC) server parameter max_prepared_transactions controls this limit.
LOCATION: createDtx, cdbtm.c:2569
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.