函数稳定性讲解 - 函数索引思考, pay attention to function index used in PostgreSQL
背景
今天一位同事又问到了这个问题, 函数索引. 这是一个老问题了.
PostgreSQL中函数有三种状态, 不稳定, 稳定, 非常稳定.
三种状态分别代表的意思如下 :
《函数稳定性讲解 - Thinking PostgreSQL Function’s Volatility Categories》
如果一个时间字段上要创建to_char(timestamp without time zone, text)的函数索引. 是没有办法创建的, 报错
digoal=> create table t1 (crt_time timestamp);
CREATE TABLE
digoal=> \set VERBOSITY verbose
digoal=> create index t1_1 on t1 (to_char(crt_time,'yyyy-mm-dd'));
ERROR: 42P17: functions in index expression must be marked IMMUTABLE
LOCATION: ComputeIndexAttrs, indexcmds.c:909
为什么呢?
看看to_char函数的稳定性状态 :
digoal=> select proname,provolatile,proargtypes from pg_proc where proname='to_char';
proname | provolatile | proargtypes
---------+-------------+-------------
to_char | s | 20 25
to_char | s | 23 25
to_char | s | 700 25
to_char | s | 701 25
to_char | s | 1114 25
to_char | s | 1184 25
to_char | s | 1186 25
to_char | s | 1700 25
(8 rows)
digoal=> select oid,typname from pg_type where oid in (20,25,23,700,701,1114,1184,1186,1700);
oid | typname
------+-------------
20 | int8
23 | int4
25 | text
700 | float4
701 | float8
1114 | timestamp
1184 | timestamptz
1186 | interval
1700 | numeric
(9 rows)
为什么函数索引一定要immutable的函数呢?
函数索引一旦建立后, 如果执行计划走索引扫描, PostgreSQL必须确保在任何情况下, 函数得到的结果和建立函数索引时得到的结果是一致的, 否则就会出现走索引检索到的和走全表扫描检索到的结果不一致. 这可是灾难性的BUG.
测试如下 :
digoal=> create table test (id int,crt_time timestamp(0) with time zone);
CREATE TABLE
-- 正常情况下创建这个函数索引将报错
digoal=> create index idx_test on test (to_char(crt_time,'yyyymmddhh24'));
ERROR: 42P17: functions in index expression must be marked IMMUTABLE
LOCATION: ComputeIndexAttrs, indexcmds.c:909
-- 手工修改了to_char(timestamp with time zone,text)的strict immutable.
digoal=> \c digoal postgres
You are now connected to database "digoal" as user "postgres".
digoal=# alter function to_char(timestamp with time zone,text) strict immutable;
ALTER FUNCTION
-- 新建函数索引成功
digoal=# \c digoal digoal
You are now connected to database "digoal" as user "digoal".
digoal=> create index idx_test on test (to_char(crt_time,'yyyymmddhh24'));
CREATE INDEX
-- 接下来查看当前的TimeZone
digoal=> show TimeZone;
TimeZone
----------
PRC
(1 row)
-- 插入一条测试数据
digoal=> insert into test values (1,'2012-01-01 12:00:00');
INSERT 0 1
-- 在当前的TimeZone下查看to_char(crt_time,'yyyymmddhh24') = 2012010112
digoal=> select to_char(crt_time,'yyyymmddhh24') from test;
to_char
------------
2012010112
(1 row)
-- 把TimeZone改成了GMT
digoal=> set TimeZone='GMT';
SET
-- 查看to_char(crt_time,'yyyymmddhh24') = 2012010104
-- 同样的参数, 得到的结果不一致.
digoal=> select to_char(crt_time,'yyyymmddhh24') from test;
to_char
------------
2012010104
(1 row)
-- 接下来在执行计划是走索引时的查询结果. 因为建立索引时, 这个to_char(crt_time,'yyyymmddhh24')是等于2012010112的,
-- 所以走索引扫描将返回一条记录.
digoal=> explain select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using idx_test on test (cost=0.00..8.43 rows=10 width=12)
Index Cond: (to_char(crt_time, 'yyyymmddhh24'::text) = '2012010112'::text)
(2 rows)
digoal=> select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';
id | crt_time
----+------------------------
1 | 2012-01-01 04:00:00+00
(1 row)
-- 接下来严重BUG发生了, 走全表扫描的话同样的SQL, 查询结果没了.
-- 因为走全表扫描的话, 这个值 to_char(crt_time,'yyyymmddhh24') 将被重新运算, 结果是2012010104
digoal=> set enable_indexscan=off;
SET
digoal=> set enable_bitmapscan=off;
SET
digoal=> select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';
id | crt_time
----+----------
(0 rows)
digoal=> explain analyze select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..39.10 rows=10 width=12) (actual time=0.031..0.031 rows=0 loops=1)
Filter: (to_char(crt_time, 'yyyymmddhh24'::text) = '2012010112'::text)
Total runtime: 0.066 ms
(3 rows)
-- 查询2012010104则有结果, 返回一条记录
digoal=> select * from test where to_char(crt_time,'yyyymmddhh24')='2012010104';
id | crt_time
----+------------------------
1 | 2012-01-01 04:00:00+00
(1 row)
-- 所以在使用函数索引时需要慎重, 创建函数是strict的选择也要慎重.
-- 如果一定要在时间类型上有类似的请求怎么办, 第一使用timestamp without time zone , 使用immutable的date_trunc函数代替to_char
digoal=> select proname,provolatile,proargtypes from pg_proc where proname='date_trunc';
proname | provolatile | proargtypes
------------+-------------+-------------
date_trunc | s | 25 1184
date_trunc | i | 25 1186
date_trunc | i | 25 1114
(3 rows)
digoal=> select oid,typname from pg_type where oid in (25,1184,1186,1114);
oid | typname
------+-------------
25 | text
1114 | timestamp
1184 | timestamptz
1186 | interval
(4 rows)
其他
1. Thinking PostgreSQL Function’s Volatility Categories
《函数稳定性讲解 - Thinking PostgreSQL Function’s Volatility Categories》
2. 还有几种用法
仅仅适用timestamp without time zone, 带时区的都会有以上类似的问题.
2.1 以下是个折中方案, 不必修改系统函数to_char的strict, 但是必须符合一定条件才可以冒这个险去使用它.
this is intended behaviour as to_char depends on the LC_MESSAGES setting
也就是说在使用环境中不会涉及LC_MESSAGES的变动(或者变动后不会影响to_char的结果.)时方可使用.
CREATE OR REPLACE FUNCTION my_to_char(some_time timestamp)
RETURNS text
AS
$BODY$
select to_char($1, 'yyyy-mm-dd');
$BODY$
LANGUAGE sql
IMMUTABLE;
2.2
cast(crt_time as date)
crt_time必须是不带时区的. 否则会有类似问题
digoal=> set TimeZone='PRC';
SET
digoal=> select cast (timestamp with time zone '2012-01-01 13:00:00+14' as date);
date
------------
2012-01-01
(1 row)
digoal=> set TimeZone='GMT';
SET
digoal=> select cast (timestamp with time zone '2012-01-01 13:00:00+14' as date);
date
------------
2011-12-31
(1 row)
3. 还有一种方法是增加一列, 用来存储需要建立的函数索引to_char()的结果值. 这样就不需要使用函数索引了, 也不需要考虑函数的strict问题.