PostgreSQL 黑科技 range 类型及 gist index 助力物联网(IoT)
背景
一位社区的兄弟跟我抱怨MYSQL里面查IP地址库并发几千每秒的查询数据库就抗不住了。
于是问他要来了他们的IP地址库数据和查询用的SQL以及MYSQL里面的表结构。
把数据转到PostgreSQL里面做一下相对应的压力测试,看看PostgreSQL的表现。
在其他的业务中,这样的需求也是屡见不鲜,比如年龄范围,收入范围,频繁活动的范围,地理位置区块,几何区块,线段等。都是用范围来描述的,随着物联网的发展,这类查询需求会越来越旺盛。
如果没有好的索引机制,查询需要消耗大量的CPU,很容易出现性能瓶颈。
本文要给大家介绍的是PostgreSQL 9.2引入的范围类型,以及针对范围类型的索引,大幅提升范围查询的性能。
正文
注意range类型必须包含subtype, 例如int4的range类型叫int4range.
例如1,2,3,4,5作为int4range类型可以写成’(0,6)’::int4range 或 ‘[1,6)’::int4range 或 ‘[1,5]’::int4range 或 ‘(0,5]’::int4range
postgres=# select '(0,6)'::int4range;
int4range
-----------
[1,6)
(1 row)
postgres=# select '[1,6)'::int4range;
int4range
-----------
[1,6)
(1 row)
postgres=# select '[1,5]'::int4range;
int4range
-----------
[1,6)
(1 row)
postgres=# select '(0,5]'::int4range;
int4range
-----------
[1,6)
(1 row)
注意到稀疏类型的range类型, 内部存储的都是[).
中括号表示包含, 大括号表示不包含.
稀疏类型的range类型必须定义CANONICAL函数, 用来转换成[)格式的存储.
稀疏类型可以理解为知道一个值的前一个值和后一个值是多少,例如INT类型的1的后面是2,前面是0.
但是如果numeric的话,1的前面就不知道是多少了(0.9999999999..无穷…),后面也不知道(1.00000000….无穷.1) 。
但是对于连续类型的range类型, 内部存储则是精确存储的, 例如.
postgres=# select '(0,5]'::numrange;
numrange
----------
(0,5]
(1 row)
postgres=# select '[0,5]'::numrange;
numrange
----------
[0,5]
(1 row)
postgres=# select '[0,5)'::numrange;
numrange
----------
[0,5)
(1 row)
postgres=# select '(0,5)'::numrange;
numrange
----------
(0,5)
(1 row)
范围类型的 操作符、索引功能、内置函数 等介绍
PostgreSQL range类型提供了几个很好的功能, 例如包含,不包含,交叉等等。
系统自定义的range类型有
INT4RANGE — Range of INTEGER
INT8RANGE — Range of BIGINT
NUMRANGE — Range of NUMERIC
TSRANGE — Range of TIMESTAMP WITHOUT TIME ZONE
TSTZRANGE — Range of TIMESTAMP WITH TIME ZONE
DATERANGE — Range of DATE
系统表里面能查到的如下 :
digoal=# select oid ,typname from pg_type where typname ~ 'range';
oid | typname
-------+------------
3904 | int4range
3905 | _int4range
3906 | numrange
3907 | _numrange
3908 | tsrange
3909 | _tsrange
3910 | tstzrange
3911 | _tstzrange
3912 | daterange
3913 | _daterange
3926 | int8range
3927 | _int8range
3831 | anyrange
11026 | pg_range
再来看一下有哪些函数和anyrange类型相关, 简单的介绍一下.
digoal=# select proname,proargtypes from pg_proc where proargtypes::text ~ '3831';
proname | proargtypes
-------------------------+----------------------
anyrange_out | 3831
range_out | 3831
range_send | 3831
lower | 3831 -- 这个range的底部的值, 稀疏和连续类型有区别. 见例子
upper | 3831 -- 这个range的顶部的值, 稀疏和连续类型有区别. 见例子
isempty | 3831 -- 这个range里面是否不包含任何元素
lower_inc | 3831 -- 低位是否是包含的. , 稀疏和连续类型有区别. 见例子
upper_inc | 3831 -- 高位是否是包含的. , 稀疏和连续类型有区别. 见例子
lower_inf | 3831 -- 低位是否是无穷小. (注意这里指的不是subtype的无穷类型, 而是未定义低位的意思. 见例子)
upper_inf | 3831 -- 高位是否是无穷大. (注意这里指的不是subtype的无穷类型, 而是未定义高位的意思. 见例子)
range_eq | 3831 3831
range_ne | 3831 3831
range_overlaps | 3831 3831
range_contains_elem | 3831 2283
range_contains | 3831 3831
elem_contained_by_range | 2283 3831
range_contained_by | 3831 3831
range_adjacent | 3831 3831
range_before | 3831 3831
range_after | 3831 3831
range_overleft | 3831 3831
range_overright | 3831 3831
range_union | 3831 3831
range_intersect | 3831 3831
range_minus | 3831 3831
range_cmp | 3831 3831
range_lt | 3831 3831
range_le | 3831 3831
range_ge | 3831 3831
range_gt | 3831 3831
range_gist_consistent | 2281 3831 23 26 2281
range_gist_same | 3831 3831 2281
hash_range | 3831
相关的操作符如下 :
postgres=# select oprname from pg_operator where oprleft=3831 or oprright=3831;
oprname
---------
=
<>
<
<=
>=
>
&&
@>
@>
<@
<@
<<
>>
&<
&>
-|-
+
-
*
内置range类型的使用例子
创建测试表
digoal=# CREATE TABLE reservation ( room int, during TSRANGE );
插入以timestamp为subtype的range类型的测试数据
digoal=# INSERT INTO reservation VALUES
digoal-# ( 1108, '[2010-01-01 14:30, 2010-01-01 15:30)' );
INSERT 0 1
@>判断是否包含
digoal=# SELECT int4range(10, 20) @> 3;
?column?
----------
f
(1 row)
&& 判断两个range是否有交叉
digoal=# SELECT numrange(11.1, 22.2) && numrange(20.0, 30.0);
?column?
----------
t
(1 row)
upper查出顶端, int8range也是个稀疏range
digoal=# SELECT upper(int8range(15, 25));
upper
-------
25
(1 row)
使用numrange, 连续range查顶端, 与它一致.
digoal=# SELECT upper(numrange(15, 25));
upper
-------
25
(1 row)
换种写法可能更好理解, 稀疏输出的是不包含的顶端.换句话说是转换成 [) 后的里面的顶端和底部对应的值.
digoal=# SELECT upper('(15,25)'::int8range);
upper
-------
25
(1 row)
digoal=# SELECT upper('(15,25]'::int8range);
upper
-------
26
(1 row)
连续range, 输入是什么样子的就是什么样子的.
digoal=# SELECT upper('(15,25]'::numrange);
upper
-------
25
(1 row)
digoal=# SELECT upper('(15,25)'::numrange);
upper
-------
25
(1 row)
*符号输出两个range的交叉部分.
digoal=# SELECT int4range(10, 20) * int4range(15, 25);
?column?
----------
[15,20)
(1 row)
isempty函数输出range是否为空
digoal=# SELECT isempty(numrange(1, 5));
isempty
---------
f
(1 row)
isempty函数输出range是否为空
digoal=# SELECT isempty(numrange(1, 1));
isempty
---------
t
(1 row)
介绍一下无穷大
以下为例子
表示从now到无穷大的时间范围。
digoal=# SELECT '(now,)'::tsrange;
tsrange
---------------------------------
("2012-05-17 16:32:43.055233",)
(1 row)
表示从无穷小到无穷大的时间范围。
digoal=# SELECT '(,)'::tsrange;
tsrange
---------
(,)
(1 row)
从无穷小到now的时间范围.
digoal=# SELECT '(,now)'::tsrange;
tsrange
---------------------------------
(,"2012-05-17 16:32:55.800172")
(1 row)
介绍一下range类型的输入格式
(lower-bound,upper-bound)
(lower-bound,upper-bound]
[lower-bound,upper-bound)
[lower-bound,upper-bound]
empty
举个例子, 以下表示这个range为空.没有任何元素.
digoal=# SELECT 'empty'::tsrange;
tsrange
---------
empty
(1 row)
range类型的构造器函数, 这个函数的名字和range类型的名字一致, 带3个参数, 分别是底部值,顶部值,边界格式如( [], (], (), [) ).
例如int4range的构造器函数也叫int4range.
digoal=# select int4range(1,2,'()');
int4range
-----------
empty
(1 row)
digoal=# select int4range(1,2,'(]');
int4range
-----------
[2,3)
(1 row)
digoal=# select int4range(null,2,'(]');
int4range
-----------
(,3)
(1 row)
自建range类型
也会自动创建同名的构造器函数 , 如下 :
digoal=# create type iprange as range (subtype=inet);
CREATE TYPE
digoal=# select iprange('1.1.1.1'::inet,null);
iprange
------------
[1.1.1.1,)
(1 row)
我们来看看创建range类型的语法:
CREATE TYPE name AS RANGE (
SUBTYPE = subtype
[ , SUBTYPE_OPCLASS = subtype_operator_class ]
[ , COLLATION = collation ]
[ , CANONICAL = canonical_function ]
[ , SUBTYPE_DIFF = subtype_diff_function ]
)
这里不多解释, subtype_diff_function是用来提高gist索引的查询性能的, canonical_function是用来定义稀疏range类型的.
接下来我们在range类型上使用gist索引加速查询和某些特殊场景.
A GiST index can accelerate queries involving these range operators: =, &&, <@, @>, <<, >>, -|-, &<, and &>
注意range类型不适合使用btree索引和hash索引.
digoal=# CREATE INDEX reservation_idx ON reservation USING gist (during);
CREATE INDEX
digoal=# \d reservation
Table "public.reservation"
Column | Type | Modifiers
--------+---------+-----------
room | integer |
during | tsrange |
Indexes:
"reservation_idx" gist (during)
digoal=# insert into reservation values (1,'(,now)'::tsrange);
INSERT 0 1
digoal=# select * from reservation ;
room | during
------+-----------------------------------------------
1108 | ["2010-01-01 14:30:00","2010-01-01 15:30:00")
1 | (,"2012-05-17 16:49:13.40783")
(2 rows)
digoal=# explain select * from reservation where during @> '[now,now]'::tsrange;
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on reservation (cost=0.00..1.01 rows=1 width=36)
Filter: (during @> '["2012-05-17 16:50:18.794268","2012-05-17 16:50:18.794268"]'::tsrange)
(2 rows)
记录数太少,没走索引,接下来我们强制让它走索引.
digoal=# set enable_seqscan=off;
SET
digoal=# explain select * from reservation where during @> '[now,now]'::tsrange;
QUERY PLAN
--------------------------------------------------------------------------------------------------
Index Scan using reservation_idx on reservation (cost=0.00..8.27 rows=1 width=36)
Index Cond: (during @> '["2012-05-17 16:50:59.716661","2012-05-17 16:50:59.716661"]'::tsrange)
(2 rows)
exclude约束
这个在PG DBA2000培训中也讲过,
可以参考
http://blog.163.com/digoal@126/blog/static/16387704020124793313702/
range中使用exclude约束的例子:
digoal=# delete from reservation ;
DELETE 2
以下约束表示, 不允许during 存储的时间上有交叉
digoal=# ALTER TABLE reservation
digoal-# ADD EXCLUDE USING gist (during WITH &&);
NOTICE: ALTER TABLE / ADD EXCLUDE will create implicit index "reservation_during_excl" for table "reservation"
ALTER TABLE
digoal=# INSERT INTO reservation VALUES
( 1108, '[2010-01-01 11:30, 2010-01-01 13:00)' );
INSERT 0 1
因为时间上有交叉, 所以插入不成功, 约束有效.
digoal=# INSERT INTO reservation VALUES
( 1108, '[2010-01-01 11:45, 2010-01-01 15:45)' );
ERROR: conflicting key value violates exclusion constraint "reservation_during_excl"
DETAIL: Key (during)=(["2010-01-01 11:45:00","2010-01-01 15:45:00")) conflicts with existing key (during)=(["2010-01-01 11:30:00","2010-01-01 13:00:00")).
STATEMENT: INSERT INTO reservation VALUES
( 1108, '[2010-01-01 11:45, 2010-01-01 15:45)' );
ERROR: conflicting key value violates exclusion constraint "reservation_during_excl"
DETAIL: Key (during)=(["2010-01-01 11:45:00","2010-01-01 15:45:00")) conflicts with existing key (during)=(["2010-01-01 11:30:00","2010-01-01 13:00:00")).
还可以安装btree_gist 模块, 加强exclude约束的功能.
例如,我现在没有安装btree_gist模块, 在int类型的列上使用gist索引则不成功.
没有安装btree_gist模块时, int列上不允许创建gist索引, 所以以下SQL返回错误.
digoal=# ALTER TABLE reservation
ADD EXCLUDE USING gist (room WITH =, during WITH &&);
ERROR: data type integer has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
STATEMENT: ALTER TABLE reservation
ADD EXCLUDE USING gist (room WITH =, during WITH &&);
ERROR: data type integer has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
加载btree_gist模块重试上面的SQL.成功。
digoal=# create extension btree_gist;
CREATE EXTENSION
digoal=# ALTER TABLE reservation
ADD EXCLUDE USING gist (room WITH =, during WITH &&);
NOTICE: ALTER TABLE / ADD EXCLUDE will create implicit index "reservation_room_during_excl" for table "reservation"
ALTER TABLE
这个时候排除了room相等并且during字段上存在交叉的记录插入.
例如,
digoal=# CREATE TABLE room_reservation
digoal-# (
digoal(# room TEXT,
digoal(# during TSRANGE,
digoal(# EXCLUDE USING gist (room WITH =, during WITH &&)
digoal(# );
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "room_reservation_room_during_excl" for table "room_reservation"
CREATE TABLE
digoal=# INSERT INTO room_reservation VALUES
digoal-# ( '123A', '[2010-01-01 14:00, 2010-01-01 15:00)' );
INSERT 0 1
digoal=# INSERT INTO room_reservation VALUES
digoal-# ( '123A', '[2010-01-01 14:30, 2010-01-01 15:30)' );
ERROR: conflicting key value violates exclusion constraint "room_reservation_room_during_excl"
DETAIL: Key (room, during)=(123A, ["2010-01-01 14:30:00","2010-01-01 15:30:00")) conflicts with existing key (room, during)=(123A, ["2010-01-01 14:00:00","2010-01-01 15:00:00")).
STATEMENT: INSERT INTO room_reservation VALUES
( '123A', '[2010-01-01 14:30, 2010-01-01 15:30)' );
ERROR: conflicting key value violates exclusion constraint "room_reservation_room_during_excl"
DETAIL: Key (room, during)=(123A, ["2010-01-01 14:30:00","2010-01-01 15:30:00")) conflicts with existing key (room, during)=(123A, ["2010-01-01 14:00:00","2010-01-01 15:00:00")).
digoal=# INSERT INTO room_reservation VALUES
digoal-# ( '123B', '[2010-01-01 14:30, 2010-01-01 15:30)' );
INSERT 0 1
btree_gist类型支持的类型如下 :
下次介绍btree_gist索引, 它是一个很有趣的索引.
int2, int4, int8, float4, float8, numeric, timestamp with time zone, timestamp without time zone, time with time zone, time without time zone, date, interval, oid, money, char, varchar, text, bytea, bit, varbit, macaddr, inet, and cidr.
实际的应用场景举例
例如通过IP地址定位这个IP是什么地方的.
假设我用iprange存储一段IP对应一个地名, 根据用户提交上来的IP可以去检索出这个IP属于什么地名的.
首先要创建iprange类型.
digoal=# create type iprange as range (subtype=inet);
CREATE TYPE
创建测试表
digoal=# create table ip_info (id serial primary key,iprange iprange,location text);
NOTICE: CREATE TABLE will create implicit sequence "ip_info_id_seq" for serial column "ip_info.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "ip_info_pkey" for table "ip_info"
CREATE TABLE
创建exclude约束, 注意我这里使用了text类型的gist索引, 所以需要先加载btree_gist模块, 否则会创建不成功.
digoal=# alter table ip_info add constraint ck_exclude_iprange exclude using gist(location with =, iprange with &&);
NOTICE: ALTER TABLE / ADD EXCLUDE will create implicit index "ck_exclude_iprange" for table "ip_info"
ALTER TABLE
插入测试数据
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.0'::inet,'192.168.1.10'::inet,'[]'),'北京');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.11'::inet,'192.168.1.20'::inet,'[]'),'上海');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.21'::inet,'192.168.1.30'::inet,'[]'),'南京');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.31'::inet,'192.168.1.40'::inet,'[]'),'杭州');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.41'::inet,'192.168.1.50'::inet,'[]'),'南昌');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.51'::inet,'192.168.1.60'::inet,'[]'),'广州');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.61'::inet,'192.168.1.70'::inet,'[]'),'重庆');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.71'::inet,'192.168.1.80'::inet,'[]'),'香港');
INSERT 0 1
查看当前的表结构
digoal=# \d ip_info
Table "public.ip_info"
Column | Type | Modifiers
----------+---------+------------------------------------------------------
id | integer | not null default nextval('ip_info_id_seq'::regclass)
iprange | iprange |
location | text |
Indexes:
"ip_info_pkey" PRIMARY KEY, btree (id)
"ck_exclude_iprange" EXCLUDE USING gist (location WITH =, iprange WITH &&)
测试查询
digoal=# select * from ip_info where iprange @> '192.168.1.1'::inet;
id | iprange | location
----+----------------------------+----------
1 | [192.168.1.0,192.168.1.10] | 北京
(1 row)
查看执行计划.
digoal=# explain select * from ip_info where iprange @> '192.168.1.1'::inet;
QUERY PLAN
-----------------------------------------------------------------------------------
Index Scan using ck_exclude_iprange on ip_info (cost=0.00..8.27 rows=1 width=68)
Index Cond: (iprange @> '192.168.1.1'::inet)
(2 rows)
参考
http://www.postgresql.org/docs/9.2/static/rangetypes.html
http://www.postgresql.org/docs/9.2/static/btree-gist.html
其他
9.1以前有个temporal模块, 可以实现和tsrange类似的功能, 如下
https://www.pgcon.org/2009/schedule/events/151.en.html
https://github.com/jeff-davis/PostgreSQL-Temporal
http://www.pgxn.org/dist/temporal/