Does database performance degrade as the number of objects grows? max_locks_per_transaction & the pg_locks entries limit
Background
A reader asked whether PostgreSQL gets slower as the number of stored objects grows.
Let's reason about it this way: the more objects you store, the larger the metadata (system catalogs) becomes. For example:
I create a large batch of tables with the method below.
To create this many tables inside a single transaction, max_locks_per_transaction and/or max_connections must first be raised to provide enough lock slots.
digoal=# do language plpgsql $$
begin
  -- template table; each test table copies its definition (including the PK index)
  create table if not exists tbl (id int primary key, info text, crt_time timestamp);
  for i in 400000..500000 loop
    execute 'create table if not exists tbl_'||i||'(like tbl including all)';
  end loop;
end;
$$;
While this inline block runs (the 400000..500000 range shown is just the last batch; earlier runs evidently covered the lower ranges, which is how the database ends up with 500,000 tables), we can watch pg_locks steadily accumulate entries, including the locks on the newly created relations.
digoal=# select count(*) from pg_locks;
count
-------
18381
(1 row)
digoal=# select count(*) from pg_locks;
count
-------
33569
(1 row)
digoal=# select max(relation) from pg_locks;
max
---------
1185124
(1 row)
digoal=# select max(relation) from pg_locks;
max
---------
1186892
(1 row)
digoal=# select max(relation) from pg_locks;
max
---------
1190188
(1 row)
This is why max_locks_per_transaction and/or max_connections must be raised: to obtain more lock slots. As the PostgreSQL documentation puts it:
The shared lock table tracks locks on
max_locks_per_transaction * (max_connections + max_prepared_transactions) objects (e.g., tables);
hence, no more than this many distinct objects can be locked at any one time.
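A worked example of that formula, as a sketch under assumed stock defaults (max_locks_per_transaction = 64, max_connections = 100, max_prepared_transactions = 0; not necessarily this system's settings):
show max_locks_per_transaction;    -- 64 by default
show max_connections;              -- 100 by default
show max_prepared_transactions;    -- 0 by default
-- shared lock table capacity = 64 * (100 + 0) = 6400 lockable objects.
-- One transaction creating 100,001 tables, each with a primary-key index
-- (LIKE ... INCLUDING ALL copies the index), holds well over 200,000 locks,
-- so max_locks_per_transaction must be raised in postgresql.conf
-- (this parameter requires a server restart) before running the DO block.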
For background, see 《一个事务最多能锁多少对象? how many objects can be locked per transaction》.
For example, after I created 500,000 tables:
digoal=# select count(*) from pg_class;
count
---------
2000322
(1 row)
Time: 721.482 ms
digoal=# select count(*) from pg_class where relname ~ '^tbl';
count
---------
1000002
(1 row)
Time: 1917.384 ms
digoal=# select count(*) from pg_class where relname ~ '^tbl' and relkind='r';
count
--------
500001
(1 row)
Time: 2098.243 ms
The 1,000,002 rows matching ^tbl are the 500,001 tables plus their 500,001 primary-key indexes (named tbl*_pkey, which also match the pattern). And pg_class is not alone: catalogs such as pg_index and pg_attribute also gain huge numbers of rows.
Without a single row of user data inserted, the database already occupies 35GB.
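A quick way to see where that space went, as a sketch (runs on 9.3; pg_database_size, pg_total_relation_size and pg_size_pretty are standard functions):
select pg_size_pretty(pg_database_size(current_database()));  -- overall size
-- the ten largest system catalogs (heap + indexes + toast):
select c.relname, pg_size_pretty(pg_total_relation_size(c.oid)) as total_size
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname = 'pg_catalog'
order by pg_total_relation_size(c.oid) desc
limit 10;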
So when are the catalogs read, and which catalog accesses use indexes and which do not?
See src/backend/utils/cache.
For efficiency, catalog data is cached in memory: each backend populates its syscache and relcache as objects are touched. But if the catalogs are very large, populating those caches takes longer, and catalog lookups also slow down, even though they run in memory.
Normally, catalog lookups are index-backed and not performed at high frequency.
There are, however, accesses that do not use an index. Some application frameworks, for example, query the catalogs directly and issue heavy catalog-management queries. If things get slow after the table count grows, trace whether the application framework is the culprit; the sketch below shows one way to check.
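A diagnostic sketch: run the suspect catalog query under EXPLAIN and see whether it degenerates into a sequential scan over a multi-million-row catalog (the table names here come from the test above):
explain select count(*) from pg_class where relname ~ '^tbl';
-- if the plan shows a Seq Scan on pg_class, the query walks all
-- 2,000,000+ rows, which would match the ~2 s timings seen earlier;
explain select * from pg_class where relname = 'tbl_400000';
-- an equality lookup, by contrast, can use pg_class_relname_nsp_index.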
Even if the framework is fine, a huge number of tables can still exact a price in memory and CPU overhead.
digoal=# select count(*) from pg_class;
count
---------
2000322
(1 row)
Time: 353.556 ms
digoal=# select count(*) from pg_attribute;
count
----------
10502467
(1 row)
Time: 1959.034 ms
For example, if most of those tables are actually touched during normal use, more memory is needed for plan caches; with high concurrency and long-lived connections in particular, the caches are held for a long time.
The CPU overhead typically comes from parent/child (inheritance) tables: both planning cost and scan cost grow with the number of children, as the minimal sketch below illustrates.
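A minimal sketch of the inheritance effect (hypothetical table names):
create table parent (id int, info text);
create table child1 () inherits (parent);
create table child2 () inherits (parent);
explain select * from parent where id = 1;
-- the plan contains an Append node over parent, child1 and child2;
-- every additional child adds another scan to plan and execute.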
If inheritance is not used and operations touch a single table, in theory the impact is minimal. Let's verify: first, single-table performance with 500,000 tables in the system.
postgres@db-172-16-3-221-> psql
psql (9.3.5)
Type "help" for help.
digoal=# create or replace function f_tbl(v_id int) returns void as $$
digoal$# declare
digoal$# begin
digoal$# update tbl set info='test' where id=v_id;
digoal$# if not found then
digoal$# insert into tbl values(v_id,'test');
digoal$# end if;
digoal$# exception when others then
digoal$# return;
digoal$# end;
digoal$# $$ language plpgsql strict;
CREATE FUNCTION
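(Note: f_tbl is a naive update-then-insert upsert; the catch-all exception handler swallows the unique-violation races that concurrent pgbench clients will hit. PostgreSQL 9.3 predates INSERT ... ON CONFLICT, which arrived in 9.5.)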
digoal=# select f_tbl(1);
f_tbl
-------
(1 row)
digoal=# select f_tbl(1);
f_tbl
-------
(1 row)
digoal=# select * from tbl;
id | info | crt_time
----+------+----------
1 | test |
(1 row)
postgres@db-172-16-3-221-> vi test.sql
\setrandom id 1 5000000
select f_tbl(:id);
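(Note: \setrandom is the pgbench syntax of the 9.3 era used here; from 9.6 on it was replaced by \set id random(1, 5000000).)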
postgres@db-172-16-3-221-> pgbench -n -r -f ./test.sql -c 8 -j 4 -T 30
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 1176626
tps = 39220.499301 (including connections establishing)
tps = 39231.529469 (excluding connections establishing)
statement latencies in milliseconds:
0.002908 \setrandom id 1 5000000
0.199018 select f_tbl(:id);
postgres@db-172-16-3-221-> psql
psql (9.3.5)
Type "help" for help.
digoal=# select count(*) from tbl;
count
---------
1048010
(1 row)
Now test single-table performance when the database contains only this one table.
digoal=# \c postgres
You are now connected to database "postgres" as user "postgres".
postgres=# drop database digoal;
DROP DATABASE
postgres=# create database digoal;
CREATE DATABASE
postgres=# \c digoal
You are now connected to database "digoal" as user "postgres".
digoal=# create table tbl(id int primary key, info text, crt_time timestamp);
CREATE TABLE
digoal=# create or replace function f_tbl(v_id int) returns void as $$
digoal$# declare
digoal$# begin
digoal$# update tbl set info='test' where id=v_id;
digoal$# if not found then
digoal$# insert into tbl values(v_id,'test');
digoal$# end if;
digoal$# exception when others then
digoal$# return;
digoal$# end;
digoal$# $$ language plpgsql strict;
CREATE FUNCTION
postgres@db-172-16-3-221-> pgbench -n -r -f ./test.sql -c 8 -j 4 -T 30
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 1191138
tps = 39704.058701 (including connections establishing)
tps = 39714.073461 (excluding connections establishing)
statement latencies in milliseconds:
0.002761 \setrandom id 1 5000000
0.196848 select f_tbl(:id);
digoal=# select count(*) from tbl;
count
---------
1060031
(1 row)
39,231 tps with 500,000 tables versus 39,714 tps with one table, a gap of about 1%: no measurable performance difference.
What users actually need to worry about is a different problem: when a long-lived connection touches many objects, the syscache and relcache entries for all of those objects accumulate in that backend.
With many connections each caching a large syscache/relcache footprint, the memory consumption adds up.
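In-session relief is limited, as this sketch shows: cached plans can be dropped, but relcache/syscache entries cannot, so the usual mitigation is to recycle long-lived connections periodically.
discard plans;  -- drops this session's cached plans only
discard all;    -- plans, temp tables, session state; relcache still stays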
See 《PostgreSQL relcache在长连接应用中的内存霸占"坑"》.
References
1. 《一个事务最多能锁多少对象? how many objects can be locked per transaction》
2. src/backend/storage/ipc/shmem.c
/*
 * ShmemAlloc -- allocate max-aligned chunk from shared memory
 *
 * Assumes ShmemLock and ShmemSegHdr are initialized.
 *
 * Returns: real pointer to memory or NULL if we are out
 * of space.  Has to return a real pointer in order
 * to be compatible with malloc().
 */
void *
ShmemAlloc(Size size)
{
    Size        newStart;
    Size        newFree;
    void       *newSpace;

    /* use volatile pointer to prevent code rearrangement */
    volatile PGShmemHeader *shmemseghdr = ShmemSegHdr;

    /*
     * ensure all space is adequately aligned.
     */
    size = MAXALIGN(size);

    Assert(shmemseghdr != NULL);

    SpinLockAcquire(ShmemLock);

    newStart = shmemseghdr->freeoffset;

    /* extra alignment for large requests, since they are probably buffers */
    if (size >= BLCKSZ)
        newStart = BUFFERALIGN(newStart);

    newFree = newStart + size;
    if (newFree <= shmemseghdr->totalsize)
    {
        newSpace = (void *) ((char *) ShmemBase + newStart);
        shmemseghdr->freeoffset = newFree;
    }
    else
        newSpace = NULL;

    SpinLockRelease(ShmemLock);

    if (!newSpace)
        ereport(WARNING,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("out of shared memory")));

    return newSpace;
}
3. src/backend/storage/lmgr/lock.c
/*
 * LockAcquireExtended - allows us to specify additional options
 *
 * reportMemoryError specifies whether a lock request that fills the
 * lock table should generate an ERROR or not. This allows a priority
 * caller to note that the lock table is full and then begin taking
 * extreme action to reduce the number of other lock holders before
 * retrying the action.
 */
LockAcquireResult
LockAcquireExtended(const LOCKTAG *locktag,
                    LOCKMODE lockmode,
                    bool sessionLock,
                    bool dontWait,
                    bool reportMemoryError)
{
    ....
    /*
     * If this lock could potentially have been taken via the fast-path by
     * some other backend, we must (temporarily) disable further use of the
     * fast-path for this lock tag, and migrate any locks already taken via
     * this method to the main lock table.
     */
    if (ConflictsWithRelationFastPath(locktag, lockmode))
    {
        uint32      fasthashcode = FastPathStrongLockHashPartition(hashcode);

        BeginStrongLockAcquire(locallock, fasthashcode);
        if (!FastPathTransferRelationLocks(lockMethodTable, locktag,
                                           hashcode))
        {
            AbortStrongLockAcquire();
            if (reportMemoryError)
                ereport(ERROR,
                        (errcode(ERRCODE_OUT_OF_MEMORY),
                         errmsg("out of shared memory"),
                         errhint("You might need to increase max_locks_per_transaction.")));
            else
                return LOCKACQUIRE_NOT_AVAIL;
        }
    }
    ......
    /*
     * Find or create lock and proclock entries with this tag
     *
     * Note: if the locallock object already existed, it might have a pointer
     * to the lock already ... but we should not assume that that pointer is
     * valid, since a lock object with zero hold and request counts can go
     * away anytime.  So we have to use SetupLockInTable() to recompute the
     * lock and proclock pointers, even if they're already set.
     */
    proclock = SetupLockInTable(lockMethodTable, MyProc, locktag,
                                hashcode, lockmode);
    if (!proclock)
    {
        AbortStrongLockAcquire();
        LWLockRelease(partitionLock);
        if (reportMemoryError)
            ereport(ERROR,
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("out of shared memory"),
                     errhint("You might need to increase max_locks_per_transaction.")));
        else
            return LOCKACQUIRE_NOT_AVAIL;
    }