PostgreSQL 9.0 modify pg_attribute.atttypmod extend variable char length avoid rewrite table

3 minute read

背景

在PostgreSQL 9.1以及以下版本中扩展字段长度时, 需要rewrite table. 如果是大表, 这个操作是比较费时的, 索引很多的话时间还会拉长.

而实际上扩长度理论上是不需要rewrite table的,只需要修改元数据即可。

9.2开始,扩字段不需要rewrite table了。

我们来看手册的介绍

9.1 release

Allow ALTER TABLE ... SET DATA TYPE to avoid table rewrites in appropriate cases (Noah Misch, Robert Haas)  
For example, converting a varchar column to text no longer requires a rewrite of the table.   
However, increasing the length constraint on a varchar column still requires a table rewrite.  

9.2 release

Reduce need to rebuild tables and indexes for certain ALTER TABLE ... ALTER COLUMN TYPE operations (Noah Misch)  
  
Increasing the length limit for a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite.   
Similarly, increasing the allowable precision of a numeric column, or changing a column from constrained numeric to unconstrained numeric, no longer requires a table rewrite.   
Table rewrites are also avoided in similar cases involving the interval, timestamp, and timestamptz types.  

那么对于9.1以及以前的版本,有什么不需要rewrite table的方法来扩字段长度呢?

正文

9.0

psql (9.0.0)  
Type "help" for help.  
postgres=# create table test(id int,info varchar(100),crt_time timestamp)  
postgres-# ;  
CREATE TABLE  
postgres=# insert into test select generate_series(1,1000),md5(random()::text),clock_timestamp();  
INSERT 0 1000  
postgres=# select * from pg_relation_filepath('test');  
 pg_relation_filepath   
----------------------  
 base/11874/9302962  
(1 row)  

使用alter table的方式扩展字段长度, 需要重写表. 也就是数据重新拷贝.

postgres=# alter table test alter column info type varchar(512);  
ALTER TABLE  

数据文件已经发生变更.

postgres=# select * from pg_relation_filepath('test');  
 pg_relation_filepath   
----------------------  
 base/11874/9303169  
(1 row)  

那么有没有更好的办法来解决这个问题呢? 如下.

通过修改pg_attribute.atttypmod字段可以达到同样效果, 同时不需要rewrite table.

postgres=# drop table test;  
DROP TABLE  
postgres=# create table test(id int,info varchar(100),crt_time timestamp);                         
CREATE TABLE  
postgres=# insert into test select generate_series(1,1000),md5(random()::text),clock_timestamp();  
INSERT 0 1000  
postgres=# select * from pg_relation_filepath('test');  
 pg_relation_filepath   
----------------------  
 base/11874/9303305  
(1 row)  

以下SQL插入的时候报错, 意思是超过100个字符.

postgres=# insert into test select generate_series(1,1000),repeat(md5(random()::text),30),clock_timestamp();  
ERROR:  value too long for type character varying(100)  
postgres=# \x  
Expanded display is on.  

查看pg_attribute中的atttypmod字段, 变长字段(可以用作toast存储)的头信息用掉了4个字节, 所以100的长度在这里显示的是104.

postgres=# select * from pg_attribute where attrelid ='test'::regclass and attname='info';  
-[ RECORD 1 ]-+--------  
attrelid      | 9303305  
attname       | info  
atttypid      | 1043  
attstattarget | -1  
attlen        | -1  
attnum        | 2  
attndims      | 0  
attcacheoff   | -1  
atttypmod     | 104  
attbyval      | f  
attstorage    | x  
attalign      | i  
attnotnull    | f  
atthasdef     | f  
attisdropped  | f  
attislocal    | t  
attinhcount   | 0  
attacl        |   
attoptions    |   

更新为516, 那么这个字段的长度限制为512.

postgres=# update pg_attribute set atttypmod=516 where  attrelid ='test'::regclass and attname='info';  
UPDATE 1  
postgres=# select atttypmod from pg_attribute where attrelid ='test'::regclass and attname='info';  
-[ RECORD 1 ]--  
atttypmod | 516  

插入的时候报错, 意思是超过512个字符. 说明已经更新成功了.

postgres=# insert into test select generate_series(1,1000),repeat(md5(random()::text),30),clock_timestamp();  
ERROR:  value too long for type character varying(512)  

将字段长度更新为1028.

postgres=# update pg_attribute set atttypmod=1028 where  attrelid ='test'::regclass and attname='info';  
UPDATE 1  

重新插入成功

postgres=# insert into test select generate_series(1,1000),repeat(md5(random()::text),30),clock_timestamp();  
INSERT 0 1000  
postgres=# \d test  
                Table "public.test"  
  Column  |            Type             | Modifiers   
----------+-----------------------------+-----------  
 id       | integer                     |   
 info     | character varying(1024)     |   
 crt_time | timestamp without time zone |   

数据文件不变.

postgres=# select * from pg_relation_filepath('test');  
-[ RECORD 1 ]--------+-------------------  
pg_relation_filepath | base/11874/9303305  
postgres=# select * from test where id=1;  
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
id       | 1  
info     | 81416de8f27294f4283ce67f6a505463  
crt_time | 2013-06-17 16:40:07.827194  
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
id       | 1  
info     | d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69d2169b5852fd4eff728cae68933f0e69  
crt_time | 2013-06-17 16:41:38.173321  

其他测试

如果原字段上有索引, 也可以正常使用执行计划.

使用pg_dump导出时, 字段长度也可以正常导出更新后的长度.

参考

1. http://blog.163.com/digoal@126/blog/static/163877040201222210555500/

2. http://blog.163.com/digoal@126/blog/static/163877040201112251058216/

3. 变长字段(可选toast存储), 字段头为4字节(4字节用于表述它的真实长度).

如果是定长字段, 那么小于126字节时, 只需要1字节的头.

src/backend/access/common/heaptuple.c

 * Before Postgres 8.3 varlenas always had a 4-byte length header, and  
 * therefore always needed 4-byte alignment (at least).  This wasted space  
 * for short varlenas, for example CHAR(1) took 5 bytes and could need up to  
 * 3 additional padding bytes for alignment.  
 *  
 * Now, a short varlena (up to 126 data bytes) is reduced to a 1-byte header  
 * and we don't align it.  To hide this from datatype-specific functions that  
 * don't want to deal with it, such a datum is considered "toasted" and will  
 * be expanded back to the normal 4-byte-header format by pg_detoast_datum.  
 * (In performance-critical code paths we can use pg_detoast_datum_packed  
 * and the appropriate access macros to avoid that overhead.)  Note that this  
 * conversion is performed directly in heap_form_tuple, without invoking  
 * tuptoaster.c.  

举个例子,其中VARHDRSZ就是4

/*  
 * Return reversed string  
 */  
Datum  
text_reverse(PG_FUNCTION_ARGS)  
{  
        text       *str = PG_GETARG_TEXT_PP(0);  
        const char *p = VARDATA_ANY(str);  
        int                     len = VARSIZE_ANY_EXHDR(str);  
        const char *endp = p + len;  
        text       *result;  
        char       *dst;  
  
        result = palloc(len + VARHDRSZ);  
        dst = (char *) VARDATA(result) + len;  
        SET_VARSIZE(result, len + VARHDRSZ);  
  
        if (pg_database_encoding_max_length() > 1)  
        {  
                /* multibyte version */  
                while (p < endp)  
                {  
                        int                     sz;  
  
                        sz = pg_mblen(p);  
                        dst -= sz;  
                        memcpy(dst, p, sz);  
                        p += sz;  
                }  
        }  
        else  
        {  
                /* single byte version */  
                while (p < endp)  
                        *(--dst) = *p++;  
        }  
  
        PG_RETURN_TEXT_P(result);  
}  

Flag Counter

digoal’s 大量PostgreSQL文章入口