PostgreSQL’s Cursor USAGE with SQL MODE - 分页优化

10 minute read

背景

今天谈一下cursor SQL, 函数中CURSOR中的使用另外再谈.

PostgreSQL客户端请求数据库返回大批量数据有几种常见的方法。

一、SELECT

二、CURSOR

三、分页取(ORDER BY OFFSET x LIMIT n)

环境:

PostgreSQL 9.0.2  
RHEL 5 x64   
服务端 172.16.3.33  
客户端 172.16.3.39  

测试表:

digoal=> \d tbl_user  
                                  Table "digoal.tbl_user"  
  Column   |         Type          |                       Modifiers                         
-----------+-----------------------+-------------------------------------------------------  
 id        | bigint                | not null default nextval('tbl_user_id_seq'::regclass)  
 firstname | character varying(32) |   
 lastname  | character varying(32) |   
 corp      | character varying(32) |   
 age       | smallint              |   
Indexes:  
    "tbl_user_pkey" PRIMARY KEY, btree (id)  

一、SELECT的情况

postgres@db-172-16-3-39-> psql -h 172.16.3.33 digoal digoal  
Password for user digoal:   
psql (9.0.2)  
Type "help" for help.  
  
digoal=>   

查看客户端连接后,数据库服务器上占用的内存情况

postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39  
 5008 postgres  0.0  3512 postgres: digoal digoal 172.16.3.39(32286) idle                   
28272 root     15.2 2189832 /app/mongodb1.6.5/bin/mongod --config /app/mongodb1.6.5/conf/mongod1954.conf --shardsvr --replSet rep3/172.16.3.39:1953  
18476 root     21.1 3040676 /opt/mongodb1.7/bin/mongod --config /opt/mongodb1.7/conf/mongod.conf --replSet reptest/172.16.3.39:5281  
28259 root     25.2 3634580 /app/mongodb1.6.5/bin/mongod --config /app/mongodb1.6.5/conf/mongod1953.conf --shardsvr --replSet rep2/172.16.3.39:1954  

约占用内存 3512KB

select 测试

postgres@db-172-16-3-39-> psql -h 172.16.3.33 digoal digoal  
Password for user digoal:   
psql (9.0.2)  
Type "help" for help.  
digoal=> select * from tbl_user;  

许久没有结果返回,观察数据库端的内存占用情况

 4019 postgres  2.8 409764 postgres: digoal digoal 172.16.3.39(20211) SELECT    

逐渐上升,直到idle

 4019 postgres  5.0 721852 postgres: digoal digoal 172.16.3.39(20211) idle    

此时该进程占用RSSIZE约721852 KB

再过少许客户端开始返回数据

digoal=> select * from tbl_user;  
   id    | firstname | lastname |   corp   | age   
---------+-----------+----------+----------+-----  
       1 | zhou      | digoal   | sky-mobi |  27  
       2 | zhou      | digoal   | sky-mobi |  27  
       3 | zhou      | digoal   | sky-mobi |  27  
       4 | zhou      | digoal   | sky-mobi |  27  
       5 | zhou      | digoal   | sky-mobi |  27  
..................  

二、CURSOR

详细语法参考 http://www.postgresql.org/docs/9.0/static/sql-declare.html

DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]  
    CURSOR [ { WITH | WITHOUT } HOLD ] FOR query  

举例讲解:

-- BINARY 不建议使用  
-- INSENSITIVE 指出该指针不受UPDATE等语句的影响,即定义完CURSOR后返回的值被圈定在定义CURSOR之时。PG默认情况下就是INSENSITIVE的。  

示例一:

SESSION A:  
digoal=> truncate table tbl_user;  
TRUNCATE TABLE  
digoal=> insert into tbl_user select generate_series(1,10),'zhou','digoal','sky-mobi',27;  
INSERT 0 10  
digoal=> begin;  
BEGIN  
digoal=> declare cur_test scroll cursor for select * from tbl_user;  
DECLARE CURSOR  
digoal=> fetch 10 from cur_test;  
 id | firstname | lastname |   corp   | age   
----+-----------+----------+----------+-----  
  1 | zhou      | digoal   | sky-mobi |  27  
  2 | zhou      | digoal   | sky-mobi |  27  
  3 | zhou      | digoal   | sky-mobi |  27  
  4 | zhou      | digoal   | sky-mobi |  27  
  5 | zhou      | digoal   | sky-mobi |  27  
  6 | zhou      | digoal   | sky-mobi |  27  
  7 | zhou      | digoal   | sky-mobi |  27  
  8 | zhou      | digoal   | sky-mobi |  27  
  9 | zhou      | digoal   | sky-mobi |  27  
 10 | zhou      | digoal   | sky-mobi |  27  
(10 rows)  
  
SESSION B:  
digoal=> insert into tbl_user select generate_series(11,20),'zhou','digoal','sky-mobi',27;  
INSERT 0 10  
  
SESSION A:  
digoal=> fetch 10 from cur_test;  
 id | firstname | lastname | corp | age   
----+-----------+----------+------+-----  
(0 rows)  

示例二:

SESSION A:  
digoal=> end;  
COMMIT  
digoal=> truncate table tbl_user;  
TRUNCATE TABLE  
digoal=> insert into tbl_user select generate_series(1,10),'zhou','digoal','sky-mobi',27;  
INSERT 0 10  
digoal=> declare cur_test scroll  cursor with hold for select * from tbl_user;  
DECLARE CURSOR  
digoal=> select * from pg_cursors;  
   name   |                               statement                               | is_holdable | is_binary | is_scrollable |         
  creation_time           
----------+-----------------------------------------------------------------------+-------------+-----------+---------------+-------  
------------------------  
 cur_test | declare cur_test scroll  cursor with hold for select * from tbl_user; | t           | f         | t             | 2011-0  
2-16 11:01:15.146791+08  
(1 row)  
  
SESSION B:  
digoal=> select * from pg_cursors ;  
 name | statement | is_holdable | is_binary | is_scrollable | creation_time   
------+-----------+-------------+-----------+---------------+---------------  
(0 rows)  
  
digoal=> insert into tbl_user select generate_series(11,20),'zhou','digoal','sky-mobi',27;  
INSERT 0 10  
  
SESSION A:  
digoal=> fetch last from cur_test;  
 id | firstname | lastname |   corp   | age   
----+-----------+----------+----------+-----  
 10 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
digoal=> fetch first from cur_test;  
 id | firstname | lastname |   corp   | age   
----+-----------+----------+----------+-----  
  1 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
digoal=> fetch 20 from cur_test;  
 id | firstname | lastname |   corp   | age   
----+-----------+----------+----------+-----  
  2 | zhou      | digoal   | sky-mobi |  27  
  3 | zhou      | digoal   | sky-mobi |  27  
  4 | zhou      | digoal   | sky-mobi |  27  
  5 | zhou      | digoal   | sky-mobi |  27  
  6 | zhou      | digoal   | sky-mobi |  27  
  7 | zhou      | digoal   | sky-mobi |  27  
  8 | zhou      | digoal   | sky-mobi |  27  
  9 | zhou      | digoal   | sky-mobi |  27  
 10 | zhou      | digoal   | sky-mobi |  27  
(9 rows)  

综上两例,cursor定义后的DML对于FETCH来说是不可见的,甚至在SESSION B删除掉的数据SESSION A的CURSOR还是能够看到的,因此with hold正如PostgreSQL所说,这部分数据已经被保持到内存或临时文件了。

SESSION B:  
digoal=> delete from tbl_user;  
DELETE 20  
  
SESSION A:  
digoal=> fetch first from cur_test;  
 id | firstname | lastname |   corp   | age   
----+-----------+----------+----------+-----  
  1 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
digoal=> fetch 20 from cur_test;  
 id | firstname | lastname |   corp   | age   
----+-----------+----------+----------+-----  
  2 | zhou      | digoal   | sky-mobi |  27  
  3 | zhou      | digoal   | sky-mobi |  27  
  4 | zhou      | digoal   | sky-mobi |  27  
  5 | zhou      | digoal   | sky-mobi |  27  
  6 | zhou      | digoal   | sky-mobi |  27  
  7 | zhou      | digoal   | sky-mobi |  27  
  8 | zhou      | digoal   | sky-mobi |  27  
  9 | zhou      | digoal   | sky-mobi |  27  
 10 | zhou      | digoal   | sky-mobi |  27  
(9 rows)  

SCROLL 指明该指针可以被往前也可以倒退FETCH数据。( 默认情况下不指定scroll或no scroll , PostgreSQL是这么说的The default is to allow scrolling in some cases; this is not the same as specifying SCROLL. )

WITH HOLD WITHOUT HOLD 默认值为WITHOUT HOLD, 使用WITH HOLD将CURSOR的使用范围扩大到SESSION级别,WITHOUT HOLD是TRANSACTION级别,另外一个区别是,WITH HOLD将消耗更多的资源(内存或临时文件)来保持数据。

示例:

往TBL_USER插入更多数据,让它大起来,看看内存占用情况

SESSION A:  
digoal=> insert into tbl_user select generate_series(0,9999999),'zhou','digoal','sky-mobi',27;  
INSERT 0 10000000  
digoal=> select pg_relation_size('tbl_user')/1024;  
 ?column?   
----------  
   588240  
(1 row)  
  
SESSION B:  
postgres@db-172-16-3-39-> psql -h 172.16.3.33 digoal digoal  
Password for user digoal:   
psql (9.0.2)  
Type "help" for help.  
digoal=>   

HOST A:

查看SESSION B内存占用情况

postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39  
11676 postgres  0.0  3508 postgres: digoal digoal 172.16.3.39(34514) idle   
3508KB  
  
SESSION B:  
digoal=> declare cur_test scroll cursor with hold for select * from tbl_user;  
DECLARE CURSOR  

HOST A:

查看SESSION B内存占用情况

 postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39  
11676 postgres  4.9 709332 postgres: digoal digoal 172.16.3.39(34514) idle    
709332KB  

内存占用比表的大小588240KB略大,和存取结构有关系。

WITH HOLD的CURSOR何时被释放,1. 断开SESSION,2. 定义该CURSOR的事务被ABORT。

需要注意的是,定义CURSOR的TRANSACTION如果被ABORT了,CURSOR也会被删掉。来看看例子

SESSION B:  
digoal=> begin;  
BEGIN  
digoal=> declare cur_test scroll cursor with hold for select * from tbl_user;  
DECLARE CURSOR  
digoal=> select * from pg_cursors;  
   name   |                              statement                               | is_holdable | is_binary | is_scrollable |          
 creation_time           
----------+----------------------------------------------------------------------+-------------+-----------+---------------+--------  
-----------------------  
 cur_test | declare cur_test scroll cursor with hold for select * from tbl_user; | t           | f         | t             | 2011-02  
-16 11:19:07.506523+08  
(1 row)  
HOST A:  
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39  
12205 postgres  0.0  4692 postgres: digoal digoal 172.16.3.39(52980) idle in transaction  
  
SESSION B:  
digoal=> fetch 1 from cur_test;  
 id | firstname | lastname |   corp   | age   
----+-----------+----------+----------+-----  
  0 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
HOST A:  
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39  
12205 postgres  0.0  5128 postgres: digoal digoal 172.16.3.39(52980) idle in transaction   
  
SESSION B:  
digoal=> fetch last from cur_test;  
   id    | firstname | lastname |   corp   | age   
---------+-----------+----------+----------+-----  
 9999999 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
HOST A:  
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39  
12205 postgres  4.8 703972 postgres: digoal digoal 172.16.3.39(52980) idle in transaction   
  
SESSION B:  
digoal=> abort;  
ROLLBACK  
digoal=> fetch last from cur_test;  
ERROR:  cursor "cur_test" does not exist  

注意看内存占用的变化,对于hold游标,在取第一行的时候占用约1MB,直接跳到LAST的时候马上内存占用就上来了.

ABORT之后CURSOR自动释放.

相关系统表:

digoal=> \d pg_cursors  
             View "pg_catalog.pg_cursors"  
    Column     |           Type           | Modifiers   
---------------+--------------------------+-----------  
 name          | text                     |   
 statement     | text                     |   
 is_holdable   | boolean                  |   
 is_binary     | boolean                  |   
 is_scrollable | boolean                  |   
 creation_time | timestamp with time zone |   

三、分页取(ORDER BY OFFSET x LIMIT n)

SESSION B:  
digoal=> \timing  
Timing is on.  
digoal=> select * from tbl_user order by id offset 0 limit 1;  
 id | firstname | lastname |   corp   | age   
----+-----------+----------+----------+-----  
  0 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
Time: 1.354 ms  
  
HOST A:  
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39  
13172 postgres  0.0  5256 postgres: digoal digoal 172.16.3.39(53042) idle   
  
SESSION B:  
digoal=> select * from tbl_user order by id desc offset 0 limit 1;  
   id    | firstname | lastname |   corp   | age   
---------+-----------+----------+----------+-----  
 9999999 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
Time: 0.404 ms  
  
HOST A:  
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39  
13172 postgres  0.0  5352 postgres: digoal digoal 172.16.3.39(53042) idle     
  
SESSION B:  
digoal=> select * from tbl_user order by id offset 9999999 limit 1;  
   id    | firstname | lastname |   corp   | age   
---------+-----------+----------+----------+-----  
 9999999 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
Time: 1679.057 ms  
  
HOST A:  
postgres@db-172-16-3-33-> ps -eo pid,user,pmem,rssize,cmd --sort rssize|grep 172.16.3.39  
13172 postgres  5.7 832020 postgres: digoal digoal 172.16.3.39(53042) idle  
  
SESSION B: (已经全部在内存,所以现在OFFSET 99999已经比较快了,不过由于走索引,可能有随机硬盘扫描)  
digoal=> select * from tbl_user order by id offset 99999 limit 1;  
  id   | firstname | lastname |   corp   | age   
-------+-----------+----------+----------+-----  
 99999 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
Time: 15.248 ms  

示例:比较offset 和 move

digoal=> begin;  
BEGIN  
digoal=> declare cur_test no scroll cursor without hold for select * from tbl_user order by id;  
DECLARE CURSOR  
digoal=> \timing  
Timing is on.  
digoal=> fetch 1 from cur_test;  
 id | firstname | lastname |   corp   | age   
----+-----------+----------+----------+-----  
  0 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
Time: 0.356 ms  
digoal=> move 99999 cur_test;  
MOVE 99999  
Time: 18.613 ms  
digoal=> fetch 1 from cur_test;  
   id   | firstname | lastname |   corp   | age   
--------+-----------+----------+----------+-----  
 100000 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
Time: 0.286 ms  
digoal=> fetch first from cur_test;  
 id | firstname | lastname |   corp   | age   
----+-----------+----------+----------+-----  
  0 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
  
Time: 0.308 ms  
digoal=> move 99999 cur_test;  
MOVE 99999  
Time: 18.567 ms  

内存占用:

14622 postgres  0.1 18484 postgres: digoal digoal 172.16.3.39(27450) idle in transaction  
OFFSET:  
digoal=> \timing  
Timing is on.  
digoal=> select * from tbl_user order by id offset 99999 limit 1;  
  id   | firstname | lastname |   corp   | age   
-------+-----------+----------+----------+-----  
 99999 | zhou      | digoal   | sky-mobi |  27  
(1 row)  
Time: 19.484 ms  

内存占用

14904 postgres  0.1 18496 postgres: digoal digoal 172.16.3.39(27452) idle     

消耗的时间和内存的占用两种方法基本相同.不过再看看下面的例子,你会发现,CURSOR的开销是一次性的,而OFFSET是每次都要开销这么多

digoal=> \timing  
Timing is on.  
digoal=> begin;  
BEGIN  
Time: 0.287 ms  
digoal=> declare cur_test no scroll cursor without hold for select * from tbl_user order by id;  
DECLARE CURSOR  
Time: 1.121 ms  
digoal=> move 100000 from cur_test;  
MOVE 100000  
Time: 18.638 ms  
digoal=> fetch 10 from cur_test;  
   id   | firstname | lastname |   corp   | age   
--------+-----------+----------+----------+-----  
 100000 | zhou      | digoal   | sky-mobi |  27  
 100001 | zhou      | digoal   | sky-mobi |  27  
 100002 | zhou      | digoal   | sky-mobi |  27  
 100003 | zhou      | digoal   | sky-mobi |  27  
 100004 | zhou      | digoal   | sky-mobi |  27  
 100005 | zhou      | digoal   | sky-mobi |  27  
 100006 | zhou      | digoal   | sky-mobi |  27  
 100007 | zhou      | digoal   | sky-mobi |  27  
 100008 | zhou      | digoal   | sky-mobi |  27  
 100009 | zhou      | digoal   | sky-mobi |  27  
(10 rows)  
Time: 0.317 ms  
digoal=> fetch 10 from cur_test;  
   id   | firstname | lastname |   corp   | age   
--------+-----------+----------+----------+-----  
 100010 | zhou      | digoal   | sky-mobi |  27  
 100011 | zhou      | digoal   | sky-mobi |  27  
 100012 | zhou      | digoal   | sky-mobi |  27  
 100013 | zhou      | digoal   | sky-mobi |  27  
 100014 | zhou      | digoal   | sky-mobi |  27  
 100015 | zhou      | digoal   | sky-mobi |  27  
 100016 | zhou      | digoal   | sky-mobi |  27  
 100017 | zhou      | digoal   | sky-mobi |  27  
 100018 | zhou      | digoal   | sky-mobi |  27  
 100019 | zhou      | digoal   | sky-mobi |  27  
(10 rows)  
Time: 0.223 ms  
  
OFFSET:  
digoal=> select * from tbl_user order by id offset 100000 limit 10;  
   id   | firstname | lastname |   corp   | age   
--------+-----------+----------+----------+-----  
 100000 | zhou      | digoal   | sky-mobi |  27  
 100001 | zhou      | digoal   | sky-mobi |  27  
 100002 | zhou      | digoal   | sky-mobi |  27  
 100003 | zhou      | digoal   | sky-mobi |  27  
 100004 | zhou      | digoal   | sky-mobi |  27  
 100005 | zhou      | digoal   | sky-mobi |  27  
 100006 | zhou      | digoal   | sky-mobi |  27  
 100007 | zhou      | digoal   | sky-mobi |  27  
 100008 | zhou      | digoal   | sky-mobi |  27  
 100009 | zhou      | digoal   | sky-mobi |  27  
(10 rows)  
  
Time: 15.432 ms  
digoal=> select * from tbl_user order by id offset 100010 limit 10;  
   id   | firstname | lastname |   corp   | age   
--------+-----------+----------+----------+-----  
 100010 | zhou      | digoal   | sky-mobi |  27  
 100011 | zhou      | digoal   | sky-mobi |  27  
 100012 | zhou      | digoal   | sky-mobi |  27  
 100013 | zhou      | digoal   | sky-mobi |  27  
 100014 | zhou      | digoal   | sky-mobi |  27  
 100015 | zhou      | digoal   | sky-mobi |  27  
 100016 | zhou      | digoal   | sky-mobi |  27  
 100017 | zhou      | digoal   | sky-mobi |  27  
 100018 | zhou      | digoal   | sky-mobi |  27  
 100019 | zhou      | digoal   | sky-mobi |  27  
(10 rows)  
  
Time: 15.170 ms  

所以有可能的话还是尽量使用CURSOR来做翻页

cursor使用需要注意如下事项:

1. 指针使用完一定要记得用close关掉.

2. 尽量避免在大的结果集下使用with hold游标,因为需要将数据预加载到内存或临时文件,并发大并且结果集较大的话可能导致内存消耗过多。

3. 同时应尽量避免长事务(如等待用户翻页),由于带有without hold指针的事务会影响VACUUM回收空间,这个是非常严重的事情。

其他:

python的fetchall,fetchmany函数分别返回所有和多个

Flag Counter

digoal’s 大量PostgreSQL文章入口