HOW to Change PostgreSQL’s TOAST_TUPLE_THRESHOLD

2 minute read

背景

TOAST表的设计,在一个需要频繁更新(小字段更新)操作的表中非常有用,可以大大降低不必要的大字段带来的IO和CPU(大字段上有索引时) 。

优化场景可参考 :

http://blog.163.com/digoal@126/blog/static/16387704020130931040444/

但是PostgreSQL对什么时候启用TOAST存储做了限制,

检测是否需要TOAST的函数如下 :

src/backend/catalog/toasting.c

/*  
 * Check to see whether the table needs a TOAST table.  It does only if  
 * (1) there are any toastable attributes, and (2) the maximum length  
 * of a tuple could exceed TOAST_TUPLE_THRESHOLD.  (We don't want to  
 * create a toast table for something like "f1 varchar(20)".)  
 */  
static bool  
needs_toast_table(Relation rel)  
{  
        int32           data_length = 0;  
        bool            maxlength_unknown = false;  
        bool            has_toastable_attrs = false;  
        TupleDesc       tupdesc;  
        Form_pg_attribute *att;  
        int32           tuple_length;  
        int                     i;  
  
        tupdesc = rel->rd_att;  
        att = tupdesc->attrs;  
  
        for (i = 0; i < tupdesc->natts; i++)  
        {  
                if (att[i]->attisdropped)  
                        continue;  
                data_length = att_align_nominal(data_length, att[i]->attalign);  
                if (att[i]->attlen > 0)  
                {  
                        /* Fixed-length types are never toastable */  
                        data_length += att[i]->attlen;  
                }  
                else  
                {  
                        int32           maxlen = type_maximum_size(att[i]->atttypid,  
                                                                                                   att[i]->atttypmod);  
  
                        if (maxlen < 0)  
                                maxlength_unknown = true;  
                        else  
                                data_length += maxlen;  
                        if (att[i]->attstorage != 'p')  
                                has_toastable_attrs = true;  
                }  
        }  
        if (!has_toastable_attrs)  
                return false;                   /* nothing to toast? */  
        if (maxlength_unknown)  
                return true;                    /* any unlimited-length attrs? */  
        tuple_length = MAXALIGN(offsetof(HeapTupleHeaderData, t_bits) +  
                                                        BITMAPLEN(tupdesc->natts)) +  
                MAXALIGN(data_length);  
        return (tuple_length > TOAST_TUPLE_THRESHOLD);  
}  

TOAST_TUPLE_THRESHOLD如下 :

src/include/access/tuptoaster.h

/*  
 * Find the maximum size of a tuple if there are to be N tuples per page.  
 */  
#define MaximumBytesPerTuple(tuplesPerPage) \  
        MAXALIGN_DOWN((BLCKSZ - \  
                                   MAXALIGN(SizeOfPageHeaderData + (tuplesPerPage) * sizeof(ItemIdData))) \  
                                  / (tuplesPerPage))  
  
/*  
 * These symbols control toaster activation.  If a tuple is larger than  
 * TOAST_TUPLE_THRESHOLD, we will try to toast it down to no more than  
 * TOAST_TUPLE_TARGET bytes through compressing compressible fields and  
 * moving EXTENDED and EXTERNAL data out-of-line.  
 *  
 * The numbers need not be the same, though they currently are.  It doesn't  
 * make sense for TARGET to exceed THRESHOLD, but it could be useful to make  
 * it be smaller.  
 *  
 * Currently we choose both values to match the largest tuple size for which  
 * TOAST_TUPLES_PER_PAGE tuples can fit on a heap page.  
 *  
 * XXX while these can be modified without initdb, some thought needs to be  
 * given to needs_toast_table() in toasting.c before unleashing random  
 * changes.  Also see LOBLKSIZE in large_object.h, which can *not* be  
 * changed without initdb.  
 */  
#define TOAST_TUPLES_PER_PAGE   4  
  
#define TOAST_TUPLE_THRESHOLD   MaximumBytesPerTuple(TOAST_TUPLES_PER_PAGE)  
  
#define TOAST_TUPLE_TARGET              TOAST_TUPLE_THRESHOLD  

从上面定义可以得出, 如果你编译PostgreSQL时指定的数据块是8KB的,

  --with-blocksize=BLOCKSIZE  
                          set table block size in kB [8]  

那么TOAST_TUPLE_THRESHOLD约为2KB

MAXALIGN_DOWN(  
  (BLCKSZ - MAXALIGN(SizeOfPageHeaderData + (tuplesPerPage) * sizeof(ItemIdData)))   
  /   
  (tuplesPerPage)  
)  

如果想要在字段长度[可选压缩后]大于200字节后就启用TOAST, 可以调整TOAST_TUPLES_PER_PAGE为40.

重新编译即可.

正文

重新编译后测试验证 :

postgres=# create table test (info text);  
CREATE TABLE  

改为external, 不压缩. 这样更能看出结果.

postgres=# alter table test alter column info set storage external;  
ALTER TABLE  
postgres=# select pg_column_size(repeat('a', 200));  
 pg_column_size   
----------------  
            204  
(1 row)  
postgres=# insert into test values (repeat('a', 200));  
INSERT 0 1  
postgres=# select pg_relation_size(reltoastrelid::regclass) from pg_class where relname='test';  
 pg_relation_size   
------------------  
             8192  
(1 row)  

可以看出200字节已经存储到TOAST表了.

在未修改TOAST_TUPLES_PER_PAGE的PostgreSQL数据库中测试如下 :

digoal=> truncate test;  
TRUNCATE TABLE  
digoal=> insert into test values (repeat('a', 200));  
INSERT 0 1  
digoal=> select pg_relation_size(reltoastrelid::regclass) from pg_class where relname='test';  
 pg_relation_size   
------------------  
                0  
(1 row)  

可以看出200字节未触发存储到TOAST表的动作

digoal=> insert into test values (repeat('a', 2000));  
INSERT 0 1  
digoal=> select pg_relation_size(reltoastrelid::regclass) from pg_class where relname='test';  
 pg_relation_size   
------------------  
                0  
(1 row)  

2000字节未触发.

digoal=> insert into test values (repeat('a', 2100));  
INSERT 0 1  
digoal=> select pg_relation_size(reltoastrelid::regclass) from pg_class where relname='test';  
 pg_relation_size   
------------------  
             8192  
(1 row)  

2100字节触发TOAST.

参考

1. http://blog.163.com/digoal@126/blog/static/1638770402012116115354333/

2. http://blog.163.com/digoal@126/blog/static/16387704020120524144140/

3. http://blog.163.com/digoal@126/blog/static/163877040201122910531988/

4. src/include/access/tuptoaster.h

5. src/backend/catalog/toasting.c

Flag Counter

digoal’s 大量PostgreSQL文章入口