PostgreSQL column storage engine susql (contributed by 志铭)
Background
susql is built on top of PostgreSQL 9.5 and adds column storage, zlib compression, and the ability for COPY to skip broken rows.
It can be downloaded for testing from the following page:
https://github.com/susql/susql/releases
Thanks to 志铭 for the contribution to the PostgreSQL community.
Syntax of the new features:
Synopsis
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
{ column_name data_type [ COLLATE collation ] [ COMPRESS compression ] [ column_constraint [ ... ] ]
| table_constraint
| LIKE source_table [ like_option ... ]
| COMPRESS default_compression }
[, ... ]
] )
[ INHERITS ( parent_table [, ... ] ) ]
[ WITH ( storage_parameter [= value] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE tablespace_name ]
[ STORED AS { HEAP | ORC } ]
[ SORT BY { column_name [, ...] } ]
Parameters
STORED AS
HEAP: row storage format (postgres heap) (default).
ORC: column storage format.
SORT BY
When the table is STORED AS ORC, sort the stored data by the listed columns to optimize reads.
COMPRESS
Sets the column compression type for a table. COMPRESS default_compression sets the default compression type used when a column does not specify its own.
There is a built-in compression type PGLZ, and an extension compression type ZLIB (run create extension dc_zlib first to use ZLIB compression).
Note: data added with INSERT is only converted to column storage after VACUUM, while data loaded with COPY goes straight into the _of file (the column store) without conversion.
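For illustration, a minimal sketch of the syntax above (the table name demo_orc and its columns are hypothetical; it assumes the dc_zlib extension has been installed so the zlib compression type is available):
create extension dc_zlib;          -- provides the ZLIB compression type
create table demo_orc (
    id       int,
    info     text compress zlib,   -- per-column compression type
    crt_time timestamp,
    compress pglz                  -- default compression for columns that do not set their own
)
stored as orc                      -- column storage format
sort by id;                        -- store data sorted by id to optimize reads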
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | DIRECTORY 'directory_name' [ RECURSIVE { 'NUMBER' } ] | STDIN }
[ [ WITH ] ( option [, ...] ) ]
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]
where option can be one of:
FORMAT format_name
OIDS [ boolean ]
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null_string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL ( column_name [, ...] )
FORCE_NULL ( column_name [, ...] )
ENCODING 'encoding_name'
UNSTRICT [ boolean ]
UNSTRICT_NUM { number }
Parameters
DIRECTORY Indicates that the input is a directory; all files in that directory will be copied.
RECURSIVE {NUMBER} Specifies whether the directory is copied recursively.
UNSTRICT Specifies whether the COPY process continues when a broken line is encountered.
UNSTRICT_NUM Specifies how many broken lines are tolerated before the COPY process stops. Only takes effect when UNSTRICT is set.
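A minimal sketch combining these options (the table name and directory path are hypothetical; the quoted recursion depth follows the RECURSIVE { 'NUMBER' } form in the synopsis above):
copy demo_orc
from directory '/data01/susql/load_dir' recursive '2'  -- load every file under the directory, two levels deep
with (format csv, unstrict true, unstrict_num 100);    -- tolerate up to 100 broken lines before aborting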
Compression configurations can be created with create compress configuration.
postgres=# create compress configuration zlib2 (template=zlib, level=9);
CREATE COMPRESS CONFIGURATION
Time: 9.996 ms
postgres=# create compress configuration pglz2 (template=pglz, ...);
The parameters used by pglz are as follows:
* PGLZ_Strategy -
*
* Some values that control the compression algorithm.
*
* min_input_size Minimum input data size to consider compression.
*
* max_input_size Maximum input data size to consider compression.
*
* min_comp_rate Minimum compression rate (0-99%) to require.
* Regardless of min_comp_rate, the output must be
* smaller than the input, else we don't store
* compressed.
*
* first_success_by Abandon compression if we find no compressible
* data within the first this-many bytes.
*
* match_size_good The initial GOOD match size when starting history
* lookup. When looking up the history to find a
* match that could be expressed as a tag, the
* algorithm does not always walk back entirely.
* A good match fast is usually better than the
* best possible one very late. For each iteration
* in the lookup, this value is lowered so the
* longer the lookup takes, the smaller matches
* are considered good.
*
* match_size_drop The percentage by which match_size_good is lowered
* after each history check. Allowed values are
* 0 (no change until end) to 100 (only check
* latest history entry at all).
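Assuming the PGLZ_Strategy fields above are exposed as options of create compress configuration (the option names below simply mirror those fields and are not confirmed for susql), a custom pglz configuration might look like:
create compress configuration pglz2 (
    template = pglz,
    min_input_size   = 32,    -- skip compression of very small values (assumed option name)
    min_comp_rate    = 25,    -- require at least 25% space savings (assumed option name)
    first_success_by = 1024   -- give up if the first 1kB is incompressible (assumed option name)
);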
Installation:
[root@digoal ~]# rpm -ivh susql50-5.0.1-0.el6.x86_64.rpm
error: Failed dependencies:
susql50-libs(x86-64) = 5.0.1-0.el6 is needed by susql50-5.0.1-0.el6.x86_64
[root@digoal ~]# rpm -ivh susql50-libs-5.0.1-0.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:susql50-libs ########################################### [100%]
/sbin/ldconfig: /opt/gcc4.9.3/lib/libstdc++.so.6.0.20-gdb.py is not an ELF file - it has the wrong magic bytes at the start.
/sbin/ldconfig: /opt/gcc4.9.3/lib64/libstdc++.so.6.0.20-gdb.py is not an ELF file - it has the wrong magic bytes at the start.
[root@digoal ~]# rpm -ivh susql50-5.0.1-0.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:susql50 ########################################### [100%]
[root@digoal ~]# rpm -ivh susql50-server-5.0.1-0.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:susql50-server ########################################### [100%]
/sbin/ldconfig: /opt/gcc4.9.3/lib/libstdc++.so.6.0.20-gdb.py is not an ELF file - it has the wrong magic bytes at the start.
/sbin/ldconfig: /opt/gcc4.9.3/lib64/libstdc++.so.6.0.20-gdb.py is not an ELF file - it has the wrong magic bytes at the start.
error reading information on service susql-5.0: No such file or directory
[root@digoal ~]# rpm -ivh susql50-contrib-5.0.1-0.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:susql50-contrib ########################################### [100%]
Initialize the database
[root@digoal ~]# mkdir /data01/susql
[root@digoal ~]# chown susql:susql /data01/susql
[root@digoal ~]# su - susql
-bash-4.1$ vi env_pg.sh
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=1923
export PGDATA=/data01/susql/pgdata/pg_root
export LANG=en_US.utf8
export PGHOME=/usr/susql-5.0
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
[root@digoal ~]# su - susql
-bash-4.1$ . ./env_pg.sh
-bash: unalias: vi: not found
susql@digoal-> initdb -D $PGDATA -E UTF8 --locale=C -U postgres -W
The files belonging to this database system will be owned by user "susql".
This user must also own the server process.
The database cluster will be initialized with locale "C".
The default text search configuration will be set to "english".
Data page checksums are disabled.
creating directory /data01/susql/pgdata/pg_root ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /data01/susql/pgdata/pg_root/base/1 ... ok
initializing pg_authid ... ok
Enter new superuser password:
Enter it again:
setting password ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
pg_ctl -D /data01/susql/pgdata/pg_root -l logfile start
susql@digoal-> cd $PGDATA
Edit the configuration file
susql@digoal-> grep "^[a-z]" postgresql.conf
listen_addresses = '0.0.0.0' # what IP address(es) to listen on;
port = 1923 # (change requires restart)
max_connections = 100 # (change requires restart)
unix_socket_directories = '.' # comma-separated list of directories
unix_socket_permissions = 0700 # begin with 0 to use octal notation
tcp_keepalives_idle = 70 # TCP_KEEPIDLE, in seconds;
tcp_keepalives_interval = 10 # TCP_KEEPINTVL, in seconds;
tcp_keepalives_count = 10 # TCP_KEEPCNT;
shared_buffers = 4096MB # min 128kB
dynamic_shared_memory_type = posix # the default is the first option
bgwriter_delay = 10ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round
synchronous_commit = off # synchronization level;
full_page_writes = off # recover from partial page writes
wal_compression = off # enable compression of full-page writes
wal_log_hints = off # also do full page writes of non-critical updates
wal_buffers = 16MB # min 32kB, -1 sets based on shared_buffers
wal_writer_delay = 10ms # 1-10000 milliseconds
checkpoint_timeout = 35min # range 30s-1h
max_wal_size = 4GB
checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0
effective_cache_size = 4GB
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_directory = 'pg_log' # directory where log files are written,
log_filename = 'postgresql-%a.log' # log file name pattern,
log_truncate_on_rotation = on # If on, an existing log file with the
log_rotation_age = 1d # Automatic rotation of logfiles will
log_rotation_size = 0 # Automatic rotation of logfiles will
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose # terse, default, or verbose messages
log_timezone = 'PRC'
log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C' # locale for system error message
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
enable_copyto_flush = on #When set, Copy To operation will explicit flush the data when finishing
copy_extension = on #When set, Copy Extension Feature Enabled
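After restarting, the susql-specific settings can be verified from psql with the standard SHOW command (the parameter names are the ones configured above):
show enable_copyto_flush;
show copy_extension;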
Testing
susql@digoal-> psql
psql (9.5.0 (SuSQL 5.0.1))
Type "help" for help.
postgres=# create extension dc_zlib;
CREATE EXTENSION
postgres=# create table test_col(
id int,
info text compress zlib,
crt_time timestamp,
c1 int,
c2 int,
c3 int,
c4 int,
c5 int,
c6 int,
c7 int,
c8 int,
c9 int,
c10 int,
c11 int,
c12 int)
stored as orc
sort by id;
postgres=# create table test_heap(
id int,
info text compress zlib,
crt_time timestamp,
c1 int,
c2 int,
c3 int,
c4 int,
c5 int,
c6 int,
c7 int,
c8 int,
c9 int,
c10 int,
c11 int,
c12 int)
stored as heap;
postgres=# create table test_heap_pglz(
id int,
info text compress pglz,
crt_time timestamp,
c1 int,
c2 int,
c3 int,
c4 int,
c5 int,
c6 int,
c7 int,
c8 int,
c9 int,
c10 int,
c11 int,
c12 int)
stored as heap;
postgres=# \d+++ test_heap
Table "public.test_heap"
Column | Type | Modifiers | Storage | Compression | Stats target | Description
----------+-----------------------------+-----------+----------+-------------+--------------+-------------
id | integer | | plain | | |
info | text | | extended | zlib | |
crt_time | timestamp without time zone | | plain | | |
c1 | integer | | plain | | |
c2 | integer | | plain | | |
c3 | integer | | plain | | |
c4 | integer | | plain | | |
c5 | integer | | plain | | |
c6 | integer | | plain | | |
c7 | integer | | plain | | |
c8 | integer | | plain | | |
c9 | integer | | plain | | |
c10 | integer | | plain | | |
c11 | integer | | plain | | |
c12 | integer | | plain | | |
Stored As: HEAP
postgres=# \d+++ test_col
Table "public.test_col"
Column | Type | Modifiers | Storage | Compression | Stats target | Description
----------+-----------------------------+-----------+----------+-------------+--------------+-------------
id | integer | | plain | | |
info | text | | extended | zlib | |
crt_time | timestamp without time zone | | plain | | |
c1 | integer | | plain | | |
c2 | integer | | plain | | |
c3 | integer | | plain | | |
c4 | integer | | plain | | |
c5 | integer | | plain | | |
c6 | integer | | plain | | |
c7 | integer | | plain | | |
c8 | integer | | plain | | |
c9 | integer | | plain | | |
c10 | integer | | plain | | |
c11 | integer | | plain | | |
c12 | integer | | plain | | |
Stored As: ORC
postgres=# \d+++ test_heap_pglz
Table "public.test_heap_pglz"
Column | Type | Modifiers | Storage | Compression | Stats target | Description
----------+-----------------------------+-----------+----------+-------------+--------------+-------------
id | integer | | plain | | |
info | text | | extended | pglz | |
crt_time | timestamp without time zone | | plain | | |
c1 | integer | | plain | | |
c2 | integer | | plain | | |
c3 | integer | | plain | | |
c4 | integer | | plain | | |
c5 | integer | | plain | | |
c6 | integer | | plain | | |
c7 | integer | | plain | | |
c8 | integer | | plain | | |
c9 | integer | | plain | | |
c10 | integer | | plain | | |
c11 | integer | | plain | | |
c12 | integer | | plain | | |
Stored As: HEAP
Performance tests
postgres=# \timing
Timing is on.
postgres=# insert into test_heap select i,repeat(md5(random()::text),64),clock_timestamp(),i,i,i,i,i,i,i,i,i,i,i,i from generate_series(1,1000000) t(i);
INSERT 0 1000000
Time: 18183.374 ms
postgres=# insert into test_col select i,repeat(md5(random()::text),64),clock_timestamp(),i,i,i,i,i,i,i,i,i,i,i,i from generate_series(1,1000000) t(i);
INSERT 0 1000000
Time: 19871.817 ms
postgres=# insert into test_heap select trunc(5000000*random()),repeat(md5(random()::text),64),clock_timestamp(),i,i,i,i,i,i,i,i,i,i,i,i from generate_series(1,1000000) t(i);
INSERT 0 1000000
Time: 20575.763 ms
postgres=# insert into test_col select trunc(5000000*random()),repeat(md5(random()::text),64),clock_timestamp(),i,i,i,i,i,i,i,i,i,i,i,i from generate_series(1,1000000) t(i);
INSERT 0 1000000
Time: 20440.462 ms
postgres=# select count(*) from test_col where id<10;
count
-------
10
(1 row)
Time: 417.611 ms
postgres=# select count(*) from test_col where id<10;
count
-------
10
(1 row)
Time: 386.153 ms
postgres=# select count(*) from test_heap where id<10;
count
-------
12
(1 row)
Time: 204.857 ms
postgres=# select count(*) from test_heap where id<10;
count
-------
12
(1 row)
Time: 210.323 ms
postgres=# select count(*) from test_col ;
count
---------
2000000
(1 row)
Time: 383.868 ms
postgres=# select count(*) from test_heap;
count
---------
2000000
(1 row)
Time: 182.571 ms
postgres=# \dt+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+----------------+-------+----------+--------+-------------
public | test_col | table | postgres | 319 MB |
public | test_heap | table | postgres | 319 MB |
public | test_heap_pglz | table | postgres | 319 MB |
(3 rows)
postgres=# select count(id) from test_heap;
count
---------
2000000
(1 row)
Time: 210.082 ms
postgres=# select count(id) from test_col;
count
---------
2000000
(1 row)
Time: 468.594 ms
postgres=# explain analyze select count(distinct id) from test_col;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=8.83..8.84 rows=1 width=4) (actual time=1743.652..1743.652 rows=1 loops=1)
-> Custom Scan (ORC) on test_col (cost=0.00..7.06 rows=706 width=4) (actual time=0.018..431.255 rows=2000000 loops=1)
Planning time: 0.112 ms
Execution time: 1743.735 ms
(4 rows)
Time: 1744.215 ms
postgres=# explain analyze select count(distinct id) from test_heap;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=65816.71..65816.72 rows=1 width=4) (actual time=1513.416..1513.416 rows=1 loops=1)
-> Seq Scan on test_heap (cost=0.00..60816.77 rows=1999977 width=4) (actual time=0.017..221.881 rows=2000000 loops=1)
Planning time: 0.068 ms
Execution time: 1513.478 ms
(4 rows)
Time: 1513.900 ms
postgres=# explain analyze select count(distinct (id,info)) from test_col;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=8.83..8.84 rows=1 width=36) (actual time=15137.273..15137.274 rows=1 loops=1)
-> Custom Scan (ORC) on test_col (cost=0.00..7.06 rows=706 width=36) (actual time=0.014..513.342 rows=2000000 loops=1)
Planning time: 0.127 ms
Execution time: 15137.395 ms
(4 rows)
Time: 15138.411 ms
postgres=# explain analyze select count(distinct (id,info)) from test_heap;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=65816.71..65816.72 rows=1 width=72) (actual time=13865.787..13865.787 rows=1 loops=1)
-> Seq Scan on test_heap (cost=0.00..60816.77 rows=1999977 width=72) (actual time=0.012..235.289 rows=2000000 loops=1)
Planning time: 0.104 ms
Execution time: 13865.856 ms
(4 rows)
Time: 13866.441 ms
Sorting
postgres=# set work_mem='1GB';
postgres=# explain analyze select id from test_heap order by id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=270132.69..275132.69 rows=2000000 width=4) (actual time=766.044..959.705 rows=2000000 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 142903kB
-> Seq Scan on test_heap (cost=0.00..60817.00 rows=2000000 width=4) (actual time=0.012..254.047 rows=2000000 loops=1)
Planning time: 0.148 ms
Execution time: 1041.256 ms
(6 rows)
postgres=# explain analyze select * from test_heap order by id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Sort (cost=270132.69..275132.69 rows=2000000 width=128) (actual time=820.897..1104.178 rows=2000000 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 580403kB
-> Seq Scan on test_heap (cost=0.00..60817.00 rows=2000000 width=128) (actual time=0.017..202.813 rows=2000000 loops=1)
Planning time: 0.107 ms
Execution time: 1193.899 ms
(6 rows)
postgres=# explain analyze select * from test_col order by id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=229315.69..234315.69 rows=2000000 width=128) (actual time=1269.664..1553.249 rows=2000000 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 580403kB
-> Custom Scan (ORC) on test_col (cost=0.00..20000.00 rows=2000000 width=128) (actual time=0.019..417.134 rows=2000000 loops=1)
Planning time: 0.145 ms
Execution time: 1646.298 ms
(6 rows)
postgres=# explain analyze select id from test_col order by id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=229315.69..234315.69 rows=2000000 width=4) (actual time=1001.483..1207.091 rows=2000000 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 142903kB
-> Custom Scan (ORC) on test_col (cost=0.00..20000.00 rows=2000000 width=4) (actual time=0.020..466.365 rows=2000000 loops=1)
Planning time: 0.112 ms
Execution time: 1293.078 ms
(6 rows)
postgres=# set work_mem='1MB';
SET
postgres=# explain analyze select id from test_col order by id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=284006.69..289006.69 rows=2000000 width=4) (actual time=1837.612..2159.556 rows=2000000 loops=1)
Sort Key: id
Sort Method: external merge Disk: 27392kB
-> Custom Scan (ORC) on test_col (cost=0.00..20000.00 rows=2000000 width=4) (actual time=0.019..465.012 rows=2000000 loops=1)
Planning time: 0.110 ms
Execution time: 2246.295 ms
(6 rows)
postgres=# explain analyze select id from test_heap order by id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=324823.69..329823.69 rows=2000000 width=4) (actual time=1742.758..2080.076 rows=2000000 loops=1)
Sort Key: id
Sort Method: external merge Disk: 27384kB
-> Seq Scan on test_heap (cost=0.00..60817.00 rows=2000000 width=4) (actual time=0.016..278.976 rows=2000000 loops=1)
Planning time: 0.079 ms
Execution time: 2176.640 ms
(6 rows)
Verify whether the column store's sort by id takes effect: from the results below, the data is not stored in sorted order.
postgres=# select id from test_col limit 1 offset 1000000;
id
---------
3492623
(1 row)
postgres=# select id from test_col limit 1 offset 1000001;
id
--------
292687
(1 row)
postgres=# select id from test_col limit 1 offset 1000002;
id
---------
3435092
(1 row)
Inspect the physical files
postgres=# select pg_relation_filepath ('test_heap'::regclass);
pg_relation_filepath
----------------------
base/13245/16436
(1 row)
Time: 0.593 ms
postgres=# select pg_relation_filepath ('test_col'::regclass);
pg_relation_filepath
----------------------
base/13245/16408
(1 row)
Time: 0.456 ms
postgres=# \q
susql@digoal-> cd $PGDATA/base/
susql@digoal-> ll 13245/16436*
-rw------- 1 susql susql 319M Feb 29 09:04 13245/16436
-rw------- 1 susql susql 104K Feb 29 09:04 13245/16436_fsm
susql@digoal-> ll 13245/16408*
-rw------- 1 susql susql 319M Feb 29 09:04 13245/16408
-rw------- 1 susql susql 104K Feb 29 09:04 13245/16408_fsm
-rw------- 1 susql susql 8.0K Feb 29 08:58 13245/16408_of
Data added with INSERT has not been written to the column store; VACUUM must be run to rewrite it.
postgres=# vacuum test_col ;
VACUUM
-rw------- 1 susql susql 0 Feb 29 12:57 16408
-rw------- 1 susql susql 16K Feb 29 12:57 16408_fsm
-rw------- 1 susql susql 319M Feb 29 12:57 16408_of
After the rewrite, sort by takes effect.
postgres=# select id from test_heap offset 1000000 limit 10;
id
---------
4668069
3385308
3298340
2234740
3610229
2569580
1705913
828489
856546
1197041
(10 rows)
Time: 123.598 ms
postgres=# select id from test_col offset 1000000 limit 10;
id
--------
990214
990215
990216
990217
990218
990219
990220
990221
990222
990223
(10 rows)
Time: 100.979 ms
Check the build options
susql@digoal-> pg_config
BINDIR = /usr/susql-5.0/bin
DOCDIR = /usr/susql-5.0/doc
HTMLDIR = /usr/susql-5.0/doc
INCLUDEDIR = /usr/susql-5.0/include
PKGINCLUDEDIR = /usr/susql-5.0/include
INCLUDEDIR-SERVER = /usr/susql-5.0/include/server
LIBDIR = /usr/susql-5.0/lib
PKGLIBDIR = /usr/susql-5.0/lib
LOCALEDIR = /usr/susql-5.0/share/locale
MANDIR = /usr/susql-5.0/share/man
SHAREDIR = /usr/susql-5.0/share
SYSCONFDIR = /etc/sysconfig/susql
PGXS = /usr/susql-5.0/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--enable-rpath' '--prefix=/usr/susql-5.0' '--includedir=/usr/susql-5.0/include' '--mandir=/usr/susql-5.0/share/man' '--datadir=/usr/susql-5.0/share' '--enable-nls' '--with-libxml' '--with-libxslt' '--with-system-tzdata=/usr/share/zoneinfo' '--sysconfdir=/etc/sysconfig/susql' '--docdir=/usr/susql-5.0/doc' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -DLINUX_OOM_ADJ=0' 'LDFLAGS=-Wl,--as-needed'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -DPGPG -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -DLINUX_OOM_ADJ=0
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed -Wl,--as-needed -Wl,-rpath,'/usr/susql-5.0/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lxslt -lxml2 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 9.5.0 (SuSQL 5.0.1)
Verify COPY's ability to skip broken rows
postgres=# copy test_heap to '/data01/susql/test.csv' ;
COPY 2000000
-rw-r--r-- 1 susql susql 4.1G Feb 29 09:22 test.csv
[root@digoal susql]# echo "err" >> test.csv
[root@digoal susql]# echo "err" >> test.csv
[root@digoal susql]# echo "err" >> test.csv
[root@digoal susql]# echo "err" >> test.csv
[root@digoal susql]# echo "err" >> test.csv
[root@digoal susql]# head -n 10 test.csv >> test.csv
postgres=# copy test_col from '/data01/susql/test.csv' with( UNSTRICT true, UNSTRICT_NUM 10);
INFO: 00000: invalid input syntax for integer: "err" (822)
CONTEXT: COPY test_col, file /data01/susql/test.csv, line 2000001, column id: "err"
LOCATION: PerformErrorLoggingForCopy, copy.c:5117
INFO: 00000: invalid input syntax for integer: "err" (822)
CONTEXT: COPY test_col, file /data01/susql/test.csv, line 2000002, column id: "err"
LOCATION: PerformErrorLoggingForCopy, copy.c:5117
INFO: 00000: invalid input syntax for integer: "err" (822)
CONTEXT: COPY test_col, file /data01/susql/test.csv, line 2000003, column id: "err"
LOCATION: PerformErrorLoggingForCopy, copy.c:5117
INFO: 00000: invalid input syntax for integer: "err" (822)
CONTEXT: COPY test_col, file /data01/susql/test.csv, line 2000004, column id: "err"
LOCATION: PerformErrorLoggingForCopy, copy.c:5117
INFO: 00000: invalid input syntax for integer: "err" (822)
CONTEXT: COPY test_col, file /data01/susql/test.csv, line 2000005, column id: "err"
LOCATION: PerformErrorLoggingForCopy, copy.c:5117
COPY 2000010
Compression ratio comparison:
postgres=# create compress configuration zlib0 (template=zlib, level=1);
CREATE COMPRESS CONFIGURATION
Time: 0.814 ms
postgres=# create compress configuration zlib2 (template=zlib, level=9);
CREATE COMPRESS CONFIGURATION
Time: 0.814 ms
postgres=# create table test_heap_zlib0(
id int,
info text compress zlib0,
crt_time timestamp,
c1 int,
c2 int,
c3 int,
c4 int,
c5 int,
c6 int,
c7 int,
c8 int,
c9 int,
c10 int,
c11 int,
c12 int)
stored as heap;
CREATE TABLE
postgres=# create table test_heap_zlib2(
id int,
info text compress zlib2,
crt_time timestamp,
c1 int,
c2 int,
c3 int,
c4 int,
c5 int,
c6 int,
c7 int,
c8 int,
c9 int,
c10 int,
c11 int,
c12 int)
stored as heap;
CREATE TABLE
postgres=# insert into test_heap_zlib0 select * from test_heap;
INSERT 0 4000000
postgres=# insert into test_heap_zlib2 select * from test_heap;
INSERT 0 4000000
postgres=# select pg_size_pretty(pg_total_relation_size('test_heap'));
pg_size_pretty
----------------
638 MB
(1 row)
Time: 1.190 ms
postgres=# select pg_size_pretty(pg_total_relation_size('test_heap_zlib0'));
pg_size_pretty
----------------
638 MB
(1 row)
Time: 0.344 ms
postgres=# select pg_size_pretty(pg_total_relation_size('test_heap_zlib2'));
pg_size_pretty
----------------
638 MB
(1 row)
Summary
1. In this test case, pglz and zlib produced the same compression ratio.
2. In this test case, column-store queries were not faster than row-store queries. The column store uses the custom scan provider interface added in 9.5, similar to the way PG-Strom uses GPUs.
3. At the physical file level, column storage looks the same as row storage; the files are not split per column. Data added with INSERT is only converted to column storage after VACUUM, while data loaded with COPY needs no VACUUM.
4. COPY can skip broken rows and report their line numbers.
The data volume used in this article is fairly small; interested readers can test with larger volumes to see how the column store performs, for example as sketched below.
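A minimal sketch of such a larger run, reusing the INSERT pattern from the tests above (adjust the row count to your hardware, and remember to VACUUM the ORC table afterwards so the rows are converted to column storage):
insert into test_col
select i, repeat(md5(random()::text),64), clock_timestamp(),
       i,i,i,i,i,i,i,i,i,i,i,i
from generate_series(1,100000000) t(i);   -- 100 million rows instead of 1 million
vacuum test_col;                           -- rewrite the INSERTed rows into the column store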
For questions, contact 志铭.