PostgreSQL’s Cursor USAGE with SQL MODE - 分页优化
背景
今天谈一下cursor SQL, 函数中CURSOR中的使用另外再谈.
PostgreSQL客户端请求数据库返回大批量数据有几种常见的方法。
一、SELECT
二、CURSOR
三、分页取(ORDER BY OFFSET x LIMIT n)
环境:
PostgreSQL 9.0.2
RHEL 5 x64
服务端 172.16.3.33
客户端 172.16.3.39
测试表:
digoal=> \d tbl_user
Table "digoal.tbl_user"
Column | Type | Modifiers
-----------+-----------------------+-------------------------------------------------------
id | bigint | not null default nextval('tbl_user_id_seq'::regclass)
firstname | character varying(32) |
lastname | character varying(32) |
corp | character varying(32) |
age | smallint |
Indexes:
"tbl_user_pkey" PRIMARY KEY, btree (id)
一、SELECT的情况
postgres@db-172-16-3-39-> psql -h 172.16.3.33 digoal digoal
Password for user digoal:
psql (9.0.2)
Type "help" for help.
digoal=>
查看客户端连接后,数据库服务器上占用的内存情况
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39
5008 postgres 0.0 3512 postgres: digoal digoal 172.16.3.39(32286) idle
28272 root 15.2 2189832 /app/mongodb1.6.5/bin/mongod --config /app/mongodb1.6.5/conf/mongod1954.conf --shardsvr --replSet rep3/172.16.3.39:1953
18476 root 21.1 3040676 /opt/mongodb1.7/bin/mongod --config /opt/mongodb1.7/conf/mongod.conf --replSet reptest/172.16.3.39:5281
28259 root 25.2 3634580 /app/mongodb1.6.5/bin/mongod --config /app/mongodb1.6.5/conf/mongod1953.conf --shardsvr --replSet rep2/172.16.3.39:1954
约占用内存 3512KB
select 测试
postgres@db-172-16-3-39-> psql -h 172.16.3.33 digoal digoal
Password for user digoal:
psql (9.0.2)
Type "help" for help.
digoal=> select * from tbl_user;
许久没有结果返回,观察数据库端的内存占用情况
4019 postgres 2.8 409764 postgres: digoal digoal 172.16.3.39(20211) SELECT
逐渐上升,直到idle
4019 postgres 5.0 721852 postgres: digoal digoal 172.16.3.39(20211) idle
此时该进程占用RSSIZE约721852 KB
再过少许客户端开始返回数据
digoal=> select * from tbl_user;
id | firstname | lastname | corp | age
---------+-----------+----------+----------+-----
1 | zhou | digoal | sky-mobi | 27
2 | zhou | digoal | sky-mobi | 27
3 | zhou | digoal | sky-mobi | 27
4 | zhou | digoal | sky-mobi | 27
5 | zhou | digoal | sky-mobi | 27
..................
二、CURSOR
详细语法参考 http://www.postgresql.org/docs/9.0/static/sql-declare.html
DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
CURSOR [ { WITH | WITHOUT } HOLD ] FOR query
举例讲解:
-- BINARY 不建议使用
-- INSENSITIVE 指出该指针不受UPDATE等语句的影响,即定义完CURSOR后返回的值被圈定在定义CURSOR之时。PG默认情况下就是INSENSITIVE的。
示例一:
SESSION A:
digoal=> truncate table tbl_user;
TRUNCATE TABLE
digoal=> insert into tbl_user select generate_series(1,10),'zhou','digoal','sky-mobi',27;
INSERT 0 10
digoal=> begin;
BEGIN
digoal=> declare cur_test scroll cursor for select * from tbl_user;
DECLARE CURSOR
digoal=> fetch 10 from cur_test;
id | firstname | lastname | corp | age
----+-----------+----------+----------+-----
1 | zhou | digoal | sky-mobi | 27
2 | zhou | digoal | sky-mobi | 27
3 | zhou | digoal | sky-mobi | 27
4 | zhou | digoal | sky-mobi | 27
5 | zhou | digoal | sky-mobi | 27
6 | zhou | digoal | sky-mobi | 27
7 | zhou | digoal | sky-mobi | 27
8 | zhou | digoal | sky-mobi | 27
9 | zhou | digoal | sky-mobi | 27
10 | zhou | digoal | sky-mobi | 27
(10 rows)
SESSION B:
digoal=> insert into tbl_user select generate_series(11,20),'zhou','digoal','sky-mobi',27;
INSERT 0 10
SESSION A:
digoal=> fetch 10 from cur_test;
id | firstname | lastname | corp | age
----+-----------+----------+------+-----
(0 rows)
示例二:
SESSION A:
digoal=> end;
COMMIT
digoal=> truncate table tbl_user;
TRUNCATE TABLE
digoal=> insert into tbl_user select generate_series(1,10),'zhou','digoal','sky-mobi',27;
INSERT 0 10
digoal=> declare cur_test scroll cursor with hold for select * from tbl_user;
DECLARE CURSOR
digoal=> select * from pg_cursors;
name | statement | is_holdable | is_binary | is_scrollable |
creation_time
----------+-----------------------------------------------------------------------+-------------+-----------+---------------+-------
------------------------
cur_test | declare cur_test scroll cursor with hold for select * from tbl_user; | t | f | t | 2011-0
2-16 11:01:15.146791+08
(1 row)
SESSION B:
digoal=> select * from pg_cursors ;
name | statement | is_holdable | is_binary | is_scrollable | creation_time
------+-----------+-------------+-----------+---------------+---------------
(0 rows)
digoal=> insert into tbl_user select generate_series(11,20),'zhou','digoal','sky-mobi',27;
INSERT 0 10
SESSION A:
digoal=> fetch last from cur_test;
id | firstname | lastname | corp | age
----+-----------+----------+----------+-----
10 | zhou | digoal | sky-mobi | 27
(1 row)
digoal=> fetch first from cur_test;
id | firstname | lastname | corp | age
----+-----------+----------+----------+-----
1 | zhou | digoal | sky-mobi | 27
(1 row)
digoal=> fetch 20 from cur_test;
id | firstname | lastname | corp | age
----+-----------+----------+----------+-----
2 | zhou | digoal | sky-mobi | 27
3 | zhou | digoal | sky-mobi | 27
4 | zhou | digoal | sky-mobi | 27
5 | zhou | digoal | sky-mobi | 27
6 | zhou | digoal | sky-mobi | 27
7 | zhou | digoal | sky-mobi | 27
8 | zhou | digoal | sky-mobi | 27
9 | zhou | digoal | sky-mobi | 27
10 | zhou | digoal | sky-mobi | 27
(9 rows)
综上两例,cursor定义后的DML对于FETCH来说是不可见的,甚至在SESSION B删除掉的数据SESSION A的CURSOR还是能够看到的,因此with hold正如PostgreSQL所说,这部分数据已经被保持到内存或临时文件了。
SESSION B:
digoal=> delete from tbl_user;
DELETE 20
SESSION A:
digoal=> fetch first from cur_test;
id | firstname | lastname | corp | age
----+-----------+----------+----------+-----
1 | zhou | digoal | sky-mobi | 27
(1 row)
digoal=> fetch 20 from cur_test;
id | firstname | lastname | corp | age
----+-----------+----------+----------+-----
2 | zhou | digoal | sky-mobi | 27
3 | zhou | digoal | sky-mobi | 27
4 | zhou | digoal | sky-mobi | 27
5 | zhou | digoal | sky-mobi | 27
6 | zhou | digoal | sky-mobi | 27
7 | zhou | digoal | sky-mobi | 27
8 | zhou | digoal | sky-mobi | 27
9 | zhou | digoal | sky-mobi | 27
10 | zhou | digoal | sky-mobi | 27
(9 rows)
SCROLL 指明该指针可以被往前也可以倒退FETCH数据。( 默认情况下不指定scroll或no scroll , PostgreSQL是这么说的The default is to allow scrolling in some cases; this is not the same as specifying SCROLL. )
WITH HOLD | WITHOUT HOLD 默认值为WITHOUT HOLD, 使用WITH HOLD将CURSOR的使用范围扩大到SESSION级别,WITHOUT HOLD是TRANSACTION级别,另外一个区别是,WITH HOLD将消耗更多的资源(内存或临时文件)来保持数据。 |
示例:
往TBL_USER插入更多数据,让它大起来,看看内存占用情况
SESSION A:
digoal=> insert into tbl_user select generate_series(0,9999999),'zhou','digoal','sky-mobi',27;
INSERT 0 10000000
digoal=> select pg_relation_size('tbl_user')/1024;
?column?
----------
588240
(1 row)
SESSION B:
postgres@db-172-16-3-39-> psql -h 172.16.3.33 digoal digoal
Password for user digoal:
psql (9.0.2)
Type "help" for help.
digoal=>
HOST A:
查看SESSION B内存占用情况
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39
11676 postgres 0.0 3508 postgres: digoal digoal 172.16.3.39(34514) idle
3508KB
SESSION B:
digoal=> declare cur_test scroll cursor with hold for select * from tbl_user;
DECLARE CURSOR
HOST A:
查看SESSION B内存占用情况
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39
11676 postgres 4.9 709332 postgres: digoal digoal 172.16.3.39(34514) idle
709332KB
内存占用比表的大小588240KB略大,和存取结构有关系。
WITH HOLD的CURSOR何时被释放,1. 断开SESSION,2. 定义该CURSOR的事务被ABORT。
需要注意的是,定义CURSOR的TRANSACTION如果被ABORT了,CURSOR也会被删掉。来看看例子
SESSION B:
digoal=> begin;
BEGIN
digoal=> declare cur_test scroll cursor with hold for select * from tbl_user;
DECLARE CURSOR
digoal=> select * from pg_cursors;
name | statement | is_holdable | is_binary | is_scrollable |
creation_time
----------+----------------------------------------------------------------------+-------------+-----------+---------------+--------
-----------------------
cur_test | declare cur_test scroll cursor with hold for select * from tbl_user; | t | f | t | 2011-02
-16 11:19:07.506523+08
(1 row)
HOST A:
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39
12205 postgres 0.0 4692 postgres: digoal digoal 172.16.3.39(52980) idle in transaction
SESSION B:
digoal=> fetch 1 from cur_test;
id | firstname | lastname | corp | age
----+-----------+----------+----------+-----
0 | zhou | digoal | sky-mobi | 27
(1 row)
HOST A:
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39
12205 postgres 0.0 5128 postgres: digoal digoal 172.16.3.39(52980) idle in transaction
SESSION B:
digoal=> fetch last from cur_test;
id | firstname | lastname | corp | age
---------+-----------+----------+----------+-----
9999999 | zhou | digoal | sky-mobi | 27
(1 row)
HOST A:
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39
12205 postgres 4.8 703972 postgres: digoal digoal 172.16.3.39(52980) idle in transaction
SESSION B:
digoal=> abort;
ROLLBACK
digoal=> fetch last from cur_test;
ERROR: cursor "cur_test" does not exist
注意看内存占用的变化,对于hold游标,在取第一行的时候占用约1MB,直接跳到LAST的时候马上内存占用就上来了.
ABORT之后CURSOR自动释放.
相关系统表:
digoal=> \d pg_cursors
View "pg_catalog.pg_cursors"
Column | Type | Modifiers
---------------+--------------------------+-----------
name | text |
statement | text |
is_holdable | boolean |
is_binary | boolean |
is_scrollable | boolean |
creation_time | timestamp with time zone |
三、分页取(ORDER BY OFFSET x LIMIT n)
SESSION B:
digoal=> \timing
Timing is on.
digoal=> select * from tbl_user order by id offset 0 limit 1;
id | firstname | lastname | corp | age
----+-----------+----------+----------+-----
0 | zhou | digoal | sky-mobi | 27
(1 row)
Time: 1.354 ms
HOST A:
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39
13172 postgres 0.0 5256 postgres: digoal digoal 172.16.3.39(53042) idle
SESSION B:
digoal=> select * from tbl_user order by id desc offset 0 limit 1;
id | firstname | lastname | corp | age
---------+-----------+----------+----------+-----
9999999 | zhou | digoal | sky-mobi | 27
(1 row)
Time: 0.404 ms
HOST A:
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39
13172 postgres 0.0 5352 postgres: digoal digoal 172.16.3.39(53042) idle
SESSION B:
digoal=> select * from tbl_user order by id offset 9999999 limit 1;
id | firstname | lastname | corp | age
---------+-----------+----------+----------+-----
9999999 | zhou | digoal | sky-mobi | 27
(1 row)
Time: 1679.057 ms
HOST A:
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39
13172 postgres 5.7 832020 postgres: digoal digoal 172.16.3.39(53042) idle
SESSION B: (已经全部在内存,所以现在OFFSET 99999已经比较快了,不过由于走索引,可能有随机硬盘扫描)
digoal=> select * from tbl_user order by id offset 99999 limit 1;
id | firstname | lastname | corp | age
-------+-----------+----------+----------+-----
99999 | zhou | digoal | sky-mobi | 27
(1 row)
Time: 15.248 ms
示例:比较offset 和 move
digoal=> begin;
BEGIN
digoal=> declare cur_test no scroll cursor without hold for select * from tbl_user order by id;
DECLARE CURSOR
digoal=> \timing
Timing is on.
digoal=> fetch 1 from cur_test;
id | firstname | lastname | corp | age
----+-----------+----------+----------+-----
0 | zhou | digoal | sky-mobi | 27
(1 row)
Time: 0.356 ms
digoal=> move 99999 cur_test;
MOVE 99999
Time: 18.613 ms
digoal=> fetch 1 from cur_test;
id | firstname | lastname | corp | age
--------+-----------+----------+----------+-----
100000 | zhou | digoal | sky-mobi | 27
(1 row)
Time: 0.286 ms
digoal=> fetch first from cur_test;
id | firstname | lastname | corp | age
----+-----------+----------+----------+-----
0 | zhou | digoal | sky-mobi | 27
(1 row)
Time: 0.308 ms
digoal=> move 99999 cur_test;
MOVE 99999
Time: 18.567 ms
内存占用:
14622 postgres 0.1 18484 postgres: digoal digoal 172.16.3.39(27450) idle in transaction
OFFSET:
digoal=> \timing
Timing is on.
digoal=> select * from tbl_user order by id offset 99999 limit 1;
id | firstname | lastname | corp | age
-------+-----------+----------+----------+-----
99999 | zhou | digoal | sky-mobi | 27
(1 row)
Time: 19.484 ms
内存占用
14904 postgres 0.1 18496 postgres: digoal digoal 172.16.3.39(27452) idle
消耗的时间和内存的占用两种方法基本相同.不过再看看下面的例子,你会发现,CURSOR的开销是一次性的,而OFFSET是每次都要开销这么多
digoal=> \timing
Timing is on.
digoal=> begin;
BEGIN
Time: 0.287 ms
digoal=> declare cur_test no scroll cursor without hold for select * from tbl_user order by id;
DECLARE CURSOR
Time: 1.121 ms
digoal=> move 100000 from cur_test;
MOVE 100000
Time: 18.638 ms
digoal=> fetch 10 from cur_test;
id | firstname | lastname | corp | age
--------+-----------+----------+----------+-----
100000 | zhou | digoal | sky-mobi | 27
100001 | zhou | digoal | sky-mobi | 27
100002 | zhou | digoal | sky-mobi | 27
100003 | zhou | digoal | sky-mobi | 27
100004 | zhou | digoal | sky-mobi | 27
100005 | zhou | digoal | sky-mobi | 27
100006 | zhou | digoal | sky-mobi | 27
100007 | zhou | digoal | sky-mobi | 27
100008 | zhou | digoal | sky-mobi | 27
100009 | zhou | digoal | sky-mobi | 27
(10 rows)
Time: 0.317 ms
digoal=> fetch 10 from cur_test;
id | firstname | lastname | corp | age
--------+-----------+----------+----------+-----
100010 | zhou | digoal | sky-mobi | 27
100011 | zhou | digoal | sky-mobi | 27
100012 | zhou | digoal | sky-mobi | 27
100013 | zhou | digoal | sky-mobi | 27
100014 | zhou | digoal | sky-mobi | 27
100015 | zhou | digoal | sky-mobi | 27
100016 | zhou | digoal | sky-mobi | 27
100017 | zhou | digoal | sky-mobi | 27
100018 | zhou | digoal | sky-mobi | 27
100019 | zhou | digoal | sky-mobi | 27
(10 rows)
Time: 0.223 ms
OFFSET:
digoal=> select * from tbl_user order by id offset 100000 limit 10;
id | firstname | lastname | corp | age
--------+-----------+----------+----------+-----
100000 | zhou | digoal | sky-mobi | 27
100001 | zhou | digoal | sky-mobi | 27
100002 | zhou | digoal | sky-mobi | 27
100003 | zhou | digoal | sky-mobi | 27
100004 | zhou | digoal | sky-mobi | 27
100005 | zhou | digoal | sky-mobi | 27
100006 | zhou | digoal | sky-mobi | 27
100007 | zhou | digoal | sky-mobi | 27
100008 | zhou | digoal | sky-mobi | 27
100009 | zhou | digoal | sky-mobi | 27
(10 rows)
Time: 15.432 ms
digoal=> select * from tbl_user order by id offset 100010 limit 10;
id | firstname | lastname | corp | age
--------+-----------+----------+----------+-----
100010 | zhou | digoal | sky-mobi | 27
100011 | zhou | digoal | sky-mobi | 27
100012 | zhou | digoal | sky-mobi | 27
100013 | zhou | digoal | sky-mobi | 27
100014 | zhou | digoal | sky-mobi | 27
100015 | zhou | digoal | sky-mobi | 27
100016 | zhou | digoal | sky-mobi | 27
100017 | zhou | digoal | sky-mobi | 27
100018 | zhou | digoal | sky-mobi | 27
100019 | zhou | digoal | sky-mobi | 27
(10 rows)
Time: 15.170 ms
所以有可能的话还是尽量使用CURSOR来做翻页
cursor使用需要注意如下事项:
1. 指针使用完一定要记得用close关掉.
2. 尽量避免在大的结果集下使用with hold游标,因为需要将数据预加载到内存或临时文件,并发大并且结果集较大的话可能导致内存消耗过多。
3. 同时应尽量避免长事务(如等待用户翻页),由于带有without hold指针的事务会影响VACUUM回收空间,这个是非常严重的事情。
其他:
python的fetchall,fetchmany函数分别返回所有和多个