函数稳定性讲解 - 函数索引思考, pay attention to function index used in PostgreSQL

3 minute read

背景

今天一位同事又问到了这个问题, 函数索引. 这是一个老问题了.

PostgreSQL中函数有三种状态, 不稳定, 稳定, 非常稳定.

三种状态分别代表的意思如下 :

《函数稳定性讲解 - Thinking PostgreSQL Function’s Volatility Categories》

如果一个时间字段上要创建to_char(timestamp without time zone, text)的函数索引. 是没有办法创建的, 报错

digoal=> create table t1 (crt_time timestamp);  
CREATE TABLE  
digoal=> \set VERBOSITY verbose  
digoal=> create index t1_1 on t1 (to_char(crt_time,'yyyy-mm-dd'));  
ERROR:  42P17: functions in index expression must be marked IMMUTABLE  
LOCATION:  ComputeIndexAttrs, indexcmds.c:909  

为什么呢?

看看to_char函数的稳定性状态 :

digoal=> select proname,provolatile,proargtypes from pg_proc where proname='to_char';  
 proname | provolatile | proargtypes   
---------+-------------+-------------  
 to_char | s           | 20 25  
 to_char | s           | 23 25  
 to_char | s           | 700 25  
 to_char | s           | 701 25  
 to_char | s           | 1114 25  
 to_char | s           | 1184 25  
 to_char | s           | 1186 25  
 to_char | s           | 1700 25  
(8 rows)  
  
digoal=> select oid,typname from pg_type where oid in (20,25,23,700,701,1114,1184,1186,1700);  
 oid  |   typname     
------+-------------  
   20 | int8  
   23 | int4  
   25 | text  
  700 | float4  
  701 | float8  
 1114 | timestamp  
 1184 | timestamptz  
 1186 | interval  
 1700 | numeric  
(9 rows)  

为什么函数索引一定要immutable的函数呢?

函数索引一旦建立后, 如果执行计划走索引扫描, PostgreSQL必须确保在任何情况下, 函数得到的结果和建立函数索引时得到的结果是一致的, 否则就会出现走索引检索到的和走全表扫描检索到的结果不一致. 这可是灾难性的BUG.

测试如下 :

digoal=> create table test (id int,crt_time timestamp(0) with time zone);  
CREATE TABLE  
  
-- 正常情况下创建这个函数索引将报错  
  
digoal=> create index idx_test on test (to_char(crt_time,'yyyymmddhh24'));  
ERROR:  42P17: functions in index expression must be marked IMMUTABLE  
LOCATION:  ComputeIndexAttrs, indexcmds.c:909  
  
-- 手工修改了to_char(timestamp with time zone,text)的strict immutable.  
  
digoal=> \c digoal postgres  
You are now connected to database "digoal" as user "postgres".  
digoal=# alter function to_char(timestamp with time zone,text) strict immutable;  
ALTER FUNCTION  
  
-- 新建函数索引成功  
  
digoal=# \c digoal digoal  
You are now connected to database "digoal" as user "digoal".  
digoal=> create index idx_test on test (to_char(crt_time,'yyyymmddhh24'));  
CREATE INDEX  
  
-- 接下来查看当前的TimeZone  
  
digoal=> show TimeZone;  
 TimeZone   
----------  
 PRC  
(1 row)  
  
-- 插入一条测试数据  
  
digoal=> insert into test values (1,'2012-01-01 12:00:00');  
INSERT 0 1  
  
-- 在当前的TimeZone下查看to_char(crt_time,'yyyymmddhh24') = 2012010112  
  
digoal=> select to_char(crt_time,'yyyymmddhh24') from test;  
  to_char     
------------  
 2012010112  
(1 row)  
  
-- 把TimeZone改成了GMT  
  
digoal=> set TimeZone='GMT';  
SET  
  
-- 查看to_char(crt_time,'yyyymmddhh24') = 2012010104  
-- 同样的参数, 得到的结果不一致.   
  
digoal=> select to_char(crt_time,'yyyymmddhh24') from test;  
  to_char     
------------  
 2012010104  
(1 row)  
  
-- 接下来在执行计划是走索引时的查询结果. 因为建立索引时, 这个to_char(crt_time,'yyyymmddhh24')是等于2012010112的,   
-- 所以走索引扫描将返回一条记录.  
  
digoal=> explain select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';  
                                  QUERY PLAN                                    
------------------------------------------------------------------------------  
 Index Scan using idx_test on test  (cost=0.00..8.43 rows=10 width=12)  
   Index Cond: (to_char(crt_time, 'yyyymmddhh24'::text) = '2012010112'::text)  
(2 rows)  
  
digoal=> select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';  
 id |        crt_time          
----+------------------------  
  1 | 2012-01-01 04:00:00+00  
(1 row)  
  
-- 接下来严重BUG发生了, 走全表扫描的话同样的SQL, 查询结果没了.  
-- 因为走全表扫描的话, 这个值 to_char(crt_time,'yyyymmddhh24') 将被重新运算, 结果是2012010104  
  
digoal=> set enable_indexscan=off;  
SET  
digoal=> set enable_bitmapscan=off;  
SET  
digoal=> select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';  
 id | crt_time   
----+----------  
(0 rows)  
  
digoal=> explain analyze select * from test where to_char(crt_time,'yyyymmddhh24')='2012010112';  
                                           QUERY PLAN                                              
-------------------------------------------------------------------------------------------------  
 Seq Scan on test  (cost=0.00..39.10 rows=10 width=12) (actual time=0.031..0.031 rows=0 loops=1)  
   Filter: (to_char(crt_time, 'yyyymmddhh24'::text) = '2012010112'::text)  
 Total runtime: 0.066 ms  
(3 rows)  
  
-- 查询2012010104则有结果, 返回一条记录  
  
digoal=> select * from test where to_char(crt_time,'yyyymmddhh24')='2012010104';  
 id |        crt_time          
----+------------------------  
  1 | 2012-01-01 04:00:00+00  
(1 row)  
  
-- 所以在使用函数索引时需要慎重, 创建函数是strict的选择也要慎重.  
-- 如果一定要在时间类型上有类似的请求怎么办, 第一使用timestamp without time zone , 使用immutable的date_trunc函数代替to_char  
  
digoal=> select proname,provolatile,proargtypes from pg_proc where proname='date_trunc';  
  proname   | provolatile | proargtypes   
------------+-------------+-------------  
 date_trunc | s           | 25 1184  
 date_trunc | i           | 25 1186  
 date_trunc | i           | 25 1114  
(3 rows)  
  
digoal=> select oid,typname from pg_type where oid in (25,1184,1186,1114);  
 oid  |   typname     
------+-------------  
   25 | text  
 1114 | timestamp  
 1184 | timestamptz  
 1186 | interval  
(4 rows)  

其他

1. Thinking PostgreSQL Function’s Volatility Categories

《函数稳定性讲解 - Thinking PostgreSQL Function’s Volatility Categories》

2. 还有几种用法

仅仅适用timestamp without time zone, 带时区的都会有以上类似的问题.

2.1 以下是个折中方案, 不必修改系统函数to_char的strict, 但是必须符合一定条件才可以冒这个险去使用它.

this is intended behaviour as to_char depends on the LC_MESSAGES setting

也就是说在使用环境中不会涉及LC_MESSAGES的变动(或者变动后不会影响to_char的结果.)时方可使用.

CREATE OR REPLACE FUNCTION my_to_char(some_time timestamp)   
  RETURNS text  
AS  
$BODY$  
    select to_char($1, 'yyyy-mm-dd');  
$BODY$  
LANGUAGE sql  
IMMUTABLE;  

2.2

cast(crt_time as date)  

crt_time必须是不带时区的. 否则会有类似问题

digoal=> set TimeZone='PRC';  
SET  
digoal=> select cast (timestamp with time zone '2012-01-01 13:00:00+14' as date);  
    date      
------------  
 2012-01-01  
(1 row)  
  
digoal=> set TimeZone='GMT';  
SET  
digoal=> select cast (timestamp with time zone '2012-01-01 13:00:00+14' as date);  
    date      
------------  
 2011-12-31  
(1 row)  

3. 还有一种方法是增加一列, 用来存储需要建立的函数索引to_char()的结果值. 这样就不需要使用函数索引了, 也不需要考虑函数的strict问题.

Flag Counter

digoal’s 大量PostgreSQL文章入口