行为、审计日志 (实时索引/实时搜索)建模 - 最佳实践 1
背景
在很多系统中会记录用户的行为日志,行为日志包括浏览行为、社交行为、操作行为等。
典型的应用例如:数据库的SQL审计、企业内部的堡垒机(行为审计)等。
行为、审计日志的量与业务量或者操作量有关,为了满足企业实时查询的需求,通常需要构建搜索引擎,比如使用ES或者使用PostgreSQL的全文检索功能来实现。
如果使用PostgreSQL来构建,有几个优势,可以满足多个需求:
1. 明细存储的需求,除了需要建立索引的字段,明细字段也可以存储在PostgreSQL中。
2. 索引的需求,即建立日志行为字段的全文索引。
3. 多维度索引的需求,除了日志行为字段的索引,还可以建立其他字段的索引,例如时间维度,属性维度的索引。这些索引可以组合使用,满足多个维度的搜索需求。
4. 不需要同步到搜索引擎,满足了实时搜索的需求。
例子
测试机
磁盘,使用空间大、廉价的SATA盘,使用一块SSD作为BCACHE写缓存。
12 * N TB
目录规划,每块盘一个目录
/disk[1-12]
操作系统配置
略
编译PostgreSQL
wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar -jxvf postgresql-snapshot.tar.bz2
cd postgresql-10beta1
export USE_NAMED_POSIX_SEMAPHORES=1
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql10_8k --with-blocksize=8 --with-wal-segsize=1024
LIBS=-lpthread CFLAGS="-O3" make world -j 128
LIBS=-lpthread CFLAGS="-O3" make install-world
环境变量
vi env_pg10.sh
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=$(($1+1920))
export PGDATA=/disk$1/digoal/pgdata/pg_root$(($1+1920))
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql10_8k
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export LD_RUN_PATH=$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=127.0.0.1
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
初始化集群
创建12个数据库集群,对应到每一块磁盘。可以充分利用磁盘的IO。
vi init.sh
for ((i=1;i<13;i++))
do
. ~/env_pg10.sh $i
initdb -D $PGDATA -E SQL_ASCII --locale=C -U postgres
echo "local all all trust" > $PGDATA/pg_hba.conf
echo "host all all 127.0.0.1/32 trust" >> $PGDATA/pg_hba.conf
echo "host all all ::1/128 trust" >> $PGDATA/pg_hba.conf
echo "host all all 0.0.0.0/0 trust" >> $PGDATA/pg_hba.conf
done
. ./init.sh
配置参数
postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
listen_addresses = '0.0.0.0'
max_connections = 400
unix_socket_directories = '.'
shared_buffers = 32GB
maintenance_work_mem = 2GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 500
bgwriter_lru_multiplier = 5.0
bgwriter_flush_after = 0
max_parallel_workers_per_gather = 0
fsync=on
synchronous_commit = off
full_page_writes = off
wal_buffers = 128MB
wal_writer_delay = 2s
wal_writer_flush_after = 0
checkpoint_timeout = 10min
max_wal_size = 32GB
min_wal_size = 4GB
checkpoint_completion_target = 0.5
checkpoint_flush_after = 128kB
max_wal_senders = 0
random_page_cost = 1.5
effective_cache_size = 48GB
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_freeze_max_age = 500000000
autovacuum_multixact_freeze_max_age = 600000000
vacuum_freeze_table_age = 400000000
vacuum_multixact_freeze_table_age = 400000000
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
autovacuum_work_mem = 2GB
autovacuum_max_workers = 6
autovacuum_naptime = 20s
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.2
autovacuum_vacuum_cost_delay = 0
autovacuum_vacuum_cost_limit = 0
zero_damaged_pages = on
cp postgresql.auto.conf /disk1/digoal/pgdata/pg_root1921/
...
...
cp postgresql.auto.conf /disk12/digoal/pgdata/pg_root1932/
启动集群
将数据库实例绑定到不同的CPU核
vi start.sh
for ((i=1;i<13;i++))
do
. /home/digoal/env_pg10.sh $i
if [ $i -eq 12 ]; then
numactl --physcpubind=56-63 pg_ctl start -o "-c port=$PGPORT"
else
numactl --physcpubind="$((($i-1)*(5)+(1)))"-"$(($i*5))" pg_ctl start -o "-c port=$PGPORT"
fi
done
. ./start.sh
停止集群
vi stop.sh
for ((i=1;i<13;i++))
do
. /home/digoal/env_pg10.sh $i
pg_ctl stop -m fast
done
. ./stop.sh
建模
表结构
4个字段,分别存储PK(对应原始明细数据的PK),时间,用户ID,用户行为(tsvector字段)。
检索需求
检索时可能按照时间区间,用户ID,以及分词条件进行检索。
保留时长
日志保留一段时间(例如1个月)后清除。
分区
每个集群中,创建若干个分区表,例如本例使用了12个分区表。
如果条件允许,建议每个小时一个分区表,这样的话可以不建时间索引,查询时间区间的数据使用分区即可。
如果单个用户的数据量很庞大,那么建议按UID再建立哈希或LIST分区,这样的话,按照UID查询,不需要使用索引(可以省去在UID建立索引,甚至省去存储UID这个字段)。
索引
行为字段,全文索引。
用户ID,B-TREE索引。
时间字段,brin块级索引。
模拟数据
时间,时序产生。
用户ID,在一个范围内随机产生。
用户行为数据,长约512字符的字符串,拆分成若干个TOKEN,例如本例为40个长度不等的TOKEN。
初始化数据表
vi init.sql
drop table bptest cascade;
do language plpgsql $$
declare
begin
for i in 1..12 loop
execute 'drop table bptest'||i||' cascade';
end loop;
end;
$$;
create table bptest(pk serial8, uid int, ts int8, content tsvector);
create index idx_bptest_content on bptest using gin (content) with (fastupdate=on,gin_pending_list_limit=2048);
create index idx_bptest_uid on bptest(uid);
create index idx_bptest_ts on bptest using brin(ts);
do language plpgsql $$
declare
begin
for i in 1..12 loop
execute 'create table bptest'||i||'(like bptest including all) inherits(bptest)';
end loop;
end;
$$;
-- 产生若干字符长度的随机字符串
CREATE OR REPLACE FUNCTION public.gen_rand_str(integer)
RETURNS text
LANGUAGE sql
STRICT
AS $function$
select string_agg(a[random()*6+1],'') from generate_series(1,$1), (select array['a','b','c','d','e','f',' ']) t(a);
$function$;
psql -f ./init.sql -p 1921
...
...
psql -f ./init.sql -p 1932
测试
灌入测试数据,例如每张表插入2亿,一个数据库插入24亿(约6TB),总共插入288亿(约72TB)。
每10条一批灌入。
vi test.sh
for ((i=1;i<13;i++))
do
echo "insert into bptest${i} (uid,ts,content) select random()*100000 , extract(epoch from now()), to_tsvector('english',gen_rand_str(512)) from generate_series(1,10);" > ~/test${i}.sql
done
for ((i=1;i<13;i++))
do
. /home/digoal/env_pg10.sh ${i}
for ((x=1;x<13;x++))
do
nohup pgbench -M prepared -n -r -P 3 -f ./test${x}.sql -c 1 -j 1 -t 200000000 >/tmp/bptest_${i}_${x}.log 2>&1 &
done
done
chmod 500 test.sh
查询测试数据如下,数据非常随机,每条记录的content约40个元素,长度限定在512字符。
select * from bptest1 limit 1;
pk | 1
uid | 849185
ts | 1494928859
content | 'aaaefba':14 'acddcfd':39 'acdeeaadbffdbbecceb':50 'aceeedfbaefbdfcbd':59 'adbbeddbecfdcffaeedcedaeeddaeaaeebfbdcdcecfbbebfcebabceffbfdbfbfa':60
'adcdf':61 'aead':47 'afddf':70 'ba':8 'bae':37 'bbaacffbcafeffafefdf':38 'bbe':55 'bbecfdf':32 'bcbfd':27 'bdce':45 'bdeccbcdeaabefbeeebcdbfddd':19
'bed':17 'beedeadccbbbecbfcbf':44 'bfccaeddaddbc':2 'cafdfcf':5 'cbcacefaff':3 'cbcfc':52 'cbfef':63 'ccdcbedb':33 'ccdcd':20 'cd':6 'cfecfeeccabf':42
'cffb':15 'dabdfddeeabfdcefb':16 'dacdeecfbcefebfabeedfabbaccec':57 'daee':1 'daffcdffadddbaffd':68 'dbcddacefcd':9
'dbdbcbfadfffbdddaaabdcbcecdbecbbdecffbfcfecbbfebfebcadefecfceadaeffd':11 'dcdf':23 'dd':53 'ddec':31 'debdcdebfffebdbfdeefffbcfbccbececdbeaffffedfbefdcccbbccadedecfbeccccbbb':48
'deefaeeaabdbbdafcfcbeecc':71 'df':26 'dfcbbcd':46 'e':7,51,56 'eafddcaac':43 'ecbaffa':21 'ecdeeceddbdcbfcabdc':10 'ecedcec':41 'ed':66
'edcbaecfcdfbcbcdedeebdbfceeeececfac':35 'eeca':25 'eeebafeacfebfdbdbddaacabebabbfbfdefeddefccfbeaefdbf':29 'eefdbfcadebcbbfffaefcaecafbddbdbfcf':13
'ef':58 'efbdc':67 'efccdddaebfbdaffcdfcbfdcbdeb':54 'efccebdddededdeda':64 'effcbfdfdeebfbbcfaabfd':12 'f':24,28,65 'fbbfccfcbcba':30 'fc':4
'fcbbdbbaefcefefdf':34 'fd':18 'fdffcbe':69 'fea':62 'feeabdcd':36 'feeadcedecedebaedccffbfddadcfececbefddcbeaedbebfadefedcbd':22 'feffceceaeec':49 'ffaffde':40
用户全文检索请求,输入4个查询条件,流式返回PK。
建议使用流式返回接口,因为结果集可能非常大。
select pk from bptest1 where uid=$1 and ts between $2 and $3 and content @@ to_tsquery('english', $4);
压测
./test.sh
资源使用
dstat
CPU基本耗尽,磁盘的写入也非常的充分
cpu大部分为user的开销,后面使用perf看一下
dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
92 7 1 0 0 1|1075M 2435M|4048B 2297B| 0 0 | 142k 167k
92 7 0 0 0 1|1137M 2075M|2391B 1945B| 0 0 | 135k 161k
91 8 0 0 0 0|1182M 2125M|3483B 2845B| 0 0 | 140k 166k
91 8 0 0 0 1|1193M 1971M|3788B 1633B| 0 0 | 135k 159k
91 8 0 0 0 0|1089M 2305M|2232B 1478B| 0 0 | 139k 159k
92 7 0 0 0 1| 986M 2795M|2176B 1568B| 0 0 | 127k 142k
92 7 1 0 0 0| 760M 2864M|6028B 1408B| 0 0 | 116k 118k
90 8 0 0 0 0|1029M 3057M|1565B 2116B| 0 0 | 132k 150k
90 9 1 0 0 1|1000M 3237M|2336B 4850B| 0 0 | 133k 154k
90 8 1 1 0 1| 659M 4399M|2872B 7992B| 0 0 | 115k 119k
91 7 0 0 0 1| 925M 2996M|1293B 1059B| 0 0 | 122k 127k
90 8 1 1 0 1| 996M 3350M| 664B 574B| 0 0 | 133k 148k
91 7 1 0 0 1| 948M 2927M|3525B 2500B| 0 0 | 132k 146k
90 8 0 1 0 0|1114M 2869M|1751B 2645B| 0 0 | 132k 150k
90 8 0 1 0 1|1267M 2408M|3003B 2244B| 0 0 | 137k 167k
91 8 0 1 0 1|1086M 2539M| 900B 347B| 0 0 | 133k 154k
91 8 0 0 0 1| 998M 2614M|1975B 1757B| 0 0 | 130k 151k
91 8 0 0 0 0|1120M 2150M|1466B 4911B| 0 0 | 130k 154k
92 7 0 0 0 0|1163M 2387M|1356B 498B| 0 0 | 136k 163k
90 8 1 1 0 1| 864M 2656M|2601B 3373B| 0 0 | 130k 143k
91 8 0 0 0 1| 987M 2651M|2052B 898B| 0 0 | 135k 154k
91 8 0 0 0 0|1073M 2205M|2479B 2319B| 0 0 | 130k 144k
90 8 1 1 0 1| 951M 2941M|1390B 1001B| 0 0 | 130k 148k
磁盘使用率
iostat -x
avg-cpu: %user %nice %system %iowait %steal %idle
85.29 0.49 9.89 1.90 0.00 2.43
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 92.40 99.40 389.00 2822.40 136435.20 285.13 2.25 4.60 0.65 31.64
sdc 0.00 33.60 154.20 211.60 4838.40 85700.80 247.51 1.14 3.10 0.63 22.96
sdd 0.00 63.00 232.40 238.40 7316.80 109648.00 248.44 2.17 4.61 0.99 46.54
sde 0.00 78.80 269.60 340.80 7980.80 102419.20 180.87 2.53 4.14 0.94 57.62
sdf 0.00 58.40 283.00 234.20 8204.80 99129.60 207.53 2.30 4.45 0.93 48.20
sdg 0.00 50.80 207.60 236.60 6652.80 94337.60 227.35 1.42 3.19 0.68 30.34
sdh 0.00 102.20 109.40 475.20 3489.60 131211.20 230.42 2.60 4.45 0.52 30.40
sdi 0.00 70.20 107.00 337.00 3228.80 79603.20 186.56 1.35 3.04 0.53 23.38
sdj 0.00 31.00 70.60 158.80 2534.40 85124.80 382.12 0.82 3.59 0.86 19.72
sdk 0.20 58.40 190.60 295.80 5587.20 123539.20 265.47 1.74 3.57 0.68 33.28
sdl 0.00 91.00 162.80 396.40 4441.60 119507.20 221.65 1.98 3.54 0.59 33.26
sdm 0.00 274.80 103.20 359.20 2296.00 158908.80 348.63 3.81 8.23 1.08 50.06
perf
大部分的开销是postgres进程消耗的,建议使用以下开关重新编译一下.
《PostgreSQL 源码性能诊断(perf profiling)指南》
top -ag
PerfTop: 9171 irqs/sec kernel:63.7% exact: 0.0% [1000Hz cycles], (all, 64 CPUs)
----------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ _______________________________________
23044.00 4.5% tsCompareString /home/digoal/pgsql10_8k/bin/postgres
19821.00 3.9% ExecInterpExpr /home/digoal/pgsql10_8k/bin/postgres
12258.00 2.4% gintuple_get_key /home/digoal/pgsql10_8k/bin/postgres
12208.00 2.4% pg_detoast_datum_packed /home/digoal/pgsql10_8k/bin/postgres
11111.00 2.2% hash_search_with_hash_value /home/digoal/pgsql10_8k/bin/postgres
10318.00 2.0% memcpy /lib64/libc-2.12.so
9078.00 1.8% AllocSetAlloc /home/digoal/pgsql10_8k/bin/postgres
8944.00 1.7% advance_aggregates /home/digoal/pgsql10_8k/bin/postgres
8547.00 1.7% cmpEntryAccumulator /home/digoal/pgsql10_8k/bin/postgres
7311.00 1.4% array_seek /home/digoal/pgsql10_8k/bin/postgres
6744.00 1.3% gin_cmp_tslexeme /home/digoal/pgsql10_8k/bin/postgres
6650.00 1.3% __closure_wake_up [bcache]
6550.00 1.3% appendBinaryStringInfo /home/digoal/pgsql10_8k/bin/postgres
6475.00 1.3% TParserGet /home/digoal/pgsql10_8k/bin/postgres
5578.00 1.1% ginFindLeafPage /home/digoal/pgsql10_8k/bin/postgres
5543.00 1.1% PyParser_AddToken /lib64/libpython2.7.so.1.0
5412.00 1.1% array_get_element /home/digoal/pgsql10_8k/bin/postgres
5355.00 1.0% heap_fill_tuple /home/digoal/pgsql10_8k/bin/postgres
4936.00 1.0% entryLocateLeafEntry /home/digoal/pgsql10_8k/bin/postgres
4732.00 0.9% heap_form_minimal_tuple /home/digoal/pgsql10_8k/bin/postgres
4512.00 0.9% rb_insert /home/digoal/pgsql10_8k/bin/postgres
top
top -c -u digoal
top - 19:20:47 up 179 days, 5:38, 8 users, load average: 183.79, 189.01, 166.41
Tasks: 2939 total, 159 running, 2780 sleeping, 0 stopped, 0 zombie
Cpu(s): 87.7%us, 8.9%sy, 0.9%ni, 0.9%id, 1.2%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 529321828k total, 512395020k used, 16926808k free, 299780k buffers
Swap: 0k total, 0k used, 0k free, 482162560k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11258 digoal 20 0 20.7g 2.9g 2.9g R 78.7 0.6 5:11.98 postgres: postgres postgres 127.0.0.1(52848) INSERT
11253 digoal 20 0 20.7g 3.0g 2.9g R 70.7 0.6 5:14.48 postgres: postgres postgres 127.0.0.1(52843) INSERT
11264 digoal 20 0 20.7g 3.0g 2.9g R 68.1 0.6 5:14.63 postgres: postgres postgres 127.0.0.1(52854) INSERT
11263 digoal 20 0 20.7g 3.0g 2.9g R 65.2 0.6 5:14.89 postgres: postgres postgres 127.0.0.1(52853) INSERT
11250 digoal 20 0 20.7g 3.0g 3.0g R 59.4 0.6 5:16.66 postgres: postgres postgres 127.0.0.1(52840) idle
11268 digoal 20 0 20.7g 2.9g 2.9g R 53.3 0.6 5:11.36 postgres: postgres postgres 127.0.0.1(52858) INSERT
11266 digoal 20 0 20.7g 3.0g 3.0g R 52.3 0.6 5:09.00 postgres: postgres postgres 127.0.0.1(52856) INSERT
11068 digoal 20 0 20.7g 2.6g 2.6g R 51.4 0.5 4:44.47 postgres: postgres postgres 127.0.0.1(45823) INSERT
11097 digoal 20 0 20.7g 2.6g 2.5g R 49.5 0.5 4:47.85 postgres: postgres postgres 127.0.0.1(45839) INSERT
11161 digoal 20 0 20.7g 2.6g 2.6g R 49.5 0.5 4:47.87 postgres: postgres postgres 127.0.0.1(44881) INSERT
11256 digoal 20 0 20.7g 3.0g 3.0g R 49.5 0.6 5:14.69 postgres: postgres postgres 127.0.0.1(52846) INSERT
10819 digoal 20 0 20.7g 2.7g 2.7g R 48.8 0.5 4:58.17 postgres: postgres postgres 127.0.0.1(47342) INSERT
11107 digoal 20 0 20.7g 2.7g 2.7g R 48.8 0.5 5:02.00 postgres: postgres postgres 127.0.0.1(59612) INSERT
11255 digoal 20 0 20.7g 3.0g 3.0g R 48.2 0.6 5:15.68 postgres: postgres postgres 127.0.0.1(52845) INSERT
11267 digoal 20 0 20.7g 3.0g 3.0g R 47.9 0.6 5:18.82 postgres: postgres postgres 127.0.0.1(52857) INSERT
11066 digoal 20 0 20.7g 2.6g 2.6g R 46.9 0.5 4:44.97 postgres: postgres postgres 127.0.0.1(45821) INSERT
11222 digoal 20 0 20.7g 2.6g 2.6g R 45.9 0.5 5:00.43 postgres: postgres postgres 127.0.0.1(40379) idle
11207 digoal 20 0 20.7g 2.6g 2.6g R 45.6 0.5 5:04.59 postgres: postgres postgres 127.0.0.1(46160) INSERT
11224 digoal 20 0 20.7g 2.6g 2.6g R 45.3 0.5 5:02.60 postgres: postgres postgres 127.0.0.1(40381) INSERT
11249 digoal 20 0 20.7g 2.6g 2.6g R 45.3 0.5 4:59.58 postgres: postgres postgres 127.0.0.1(46187) INSERT
11045 digoal 20 0 20.7g 2.6g 2.6g R 44.6 0.5 4:39.75 postgres: postgres postgres 127.0.0.1(64424) idle
11064 digoal 20 0 20.7g 2.6g 2.6g R 44.6 0.5 4:44.69 postgres: postgres postgres 127.0.0.1(45819) INSERT
11145 digoal 20 0 20.7g 2.6g 2.6g S 44.3 0.5 4:46.18 postgres: postgres postgres 127.0.0.1(44876) INSERT
10865 digoal 20 0 20.7g 2.6g 2.6g R 44.0 0.5 4:59.89 postgres: postgres postgres 127.0.0.1(49769) INSERT
11080 digoal 20 0 20.7g 2.6g 2.5g R 44.0 0.5 4:43.70 postgres: postgres postgres 127.0.0.1(45825) INSERT
11247 digoal 20 0 20.7g 2.6g 2.6g R 43.4 0.5 5:01.91 postgres: postgres postgres 127.0.0.1(40391) idle
11163 digoal 20 0 20.7g 2.6g 2.6g R 42.7 0.5 4:48.34 postgres: postgres postgres 127.0.0.1(44882) idle
11164 digoal 20 0 20.7g 2.6g 2.6g R 42.4 0.5 4:53.21 postgres: postgres postgres 127.0.0.1(44883) INSERT
10882 digoal 20 0 20.7g 2.6g 2.6g R 41.8 0.5 5:04.78 postgres: postgres postgres 127.0.0.1(49772) INSERT
10868 digoal 20 0 20.7g 2.6g 2.6g R 41.4 0.5 5:00.30 postgres: postgres postgres 127.0.0.1(49770) INSERT
写入tps
换算成单机的写入,约6.5万行/s。
progress: 729.0 s, 55.0 tps, lat 11.610 ms stddev 4.836
progress: 732.0 s, 59.7 tps, lat 20.071 ms stddev 107.984
progress: 735.0 s, 57.0 tps, lat 20.492 ms stddev 125.445
progress: 738.0 s, 38.7 tps, lat 25.891 ms stddev 154.607
progress: 741.0 s, 41.0 tps, lat 24.405 ms stddev 140.247
progress: 744.0 s, 43.0 tps, lat 13.550 ms stddev 10.448
progress: 747.0 s, 60.0 tps, lat 20.691 ms stddev 131.640
progress: 750.0 s, 60.0 tps, lat 17.394 ms stddev 83.385
progress: 753.0 s, 44.3 tps, lat 25.510 ms stddev 146.719
progress: 756.0 s, 25.0 tps, lat 39.819 ms stddev 213.642
progress: 759.0 s, 50.0 tps, lat 11.439 ms stddev 5.319
progress: 762.0 s, 60.0 tps, lat 20.979 ms stddev 106.782
progress: 765.0 s, 60.0 tps, lat 18.778 ms stddev 167.714
progress: 768.0 s, 58.0 tps, lat 18.017 ms stddev 99.949
progress: 771.0 s, 51.0 tps, lat 19.636 ms stddev 124.429
写入性能基本上取决于tsvector字段的元素个数,散列程度,本例每条记录约40个元素。如果元素个数下降一半,性能将提升一倍左右。
postgres=# select array_length(tsvector_to_array(content),1) from bptest1 limit 10;
array_length
--------------
40
37
40
45
35
42
38
46
30
40
(10 rows)
评估每秒构建了多少个索引条目
1. 全文检索索引条目
每条记录约40个元素,当插入的tps=6.5万时,构建的全文检索条目数约 260万/s。
65000*40 = 2600000
2. uid索引条目,较小,忽略不计。
3. ts索引条目,使用BRIN块级索引,忽略不计。
性能影响最大,资源消耗最多的就是全文检索索引条目的构建。
查询性能
举例
postgres=# \dt+ bptest1
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------+-------+----------+---------+-------------
public | bptest1 | table | postgres | 1689 MB |
(1 row)
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from bptest1 where ts between 1494999617 and 1495999617 and content @@ to_tsquery ('english','abc');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.bptest1 (cost=175.95..23691.41 rows=20015 width=811) (actual time=7.017..23.376 rows=19755 loops=1)
Output: uid, ts, content
Recheck Cond: (bptest1.content @@ '''abc'''::tsquery)
Filter: ((bptest1.ts >= 1494999617) AND (bptest1.ts <= 1495999617))
Heap Blocks: exact=18933
Buffers: shared hit=18948
-> Bitmap Index Scan on bptest1_content_idx (cost=0.00..170.94 rows=20019 width=0) (actual time=3.811..3.811 rows=19755 loops=1)
Index Cond: (bptest1.content @@ '''abc'''::tsquery)
Buffers: shared hit=15
Planning time: 0.097 ms
Execution time: 24.517 ms
(11 rows)
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from bptest1 where ts between 1494999617 and 1495999617 and content @@ to_tsquery ('english','abc & bc');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.bptest1 (cost=36.27..2598.42 rows=1996 width=811) (actual time=4.577..6.711 rows=2125 loops=1)
Output: uid, ts, content
Recheck Cond: (bptest1.content @@ '''abc'' & ''bc'''::tsquery)
Filter: ((bptest1.ts >= 1494999617) AND (bptest1.ts <= 1495999617))
Heap Blocks: exact=2116
Buffers: shared hit=2239
-> Bitmap Index Scan on bptest1_content_idx (cost=0.00..35.77 rows=1997 width=0) (actual time=4.291..4.291 rows=2125 loops=1)
Index Cond: (bptest1.content @@ '''abc'' & ''bc'''::tsquery)
Buffers: shared hit=123
Planning time: 0.125 ms
Execution time: 6.849 ms
(11 rows)
纯SSD fsync=on 写入性能
1. 写入TPS
7万/s ,构建的全文检索条目数约 280万/s。
性能比较平稳。
纯SATA+SSD bcache fsync=off 写入性能
1. 写入TPS
7.5万/s ,构建的全文检索条目数约 300万/s。
性能比较平稳。
小结
1. 查询聚合
由于日志数据打散分布在多个集群,多个表内,建议使用plproxy进行查询的聚合。
参考
《A Smart PostgreSQL extension plproxy 2.2 practices》
《阿里云ApsaraDB RDS for PostgreSQL 最佳实践 - 4 水平分库 之 节点扩展》
《阿里云ApsaraDB RDS for PostgreSQL 最佳实践 - 3 水平分库 vs 单机 性能》
《阿里云ApsaraDB RDS for PostgreSQL 最佳实践 - 2 教你RDS PG的水平分库》
2. 写入分片
写入分片,可以在业务层完成,随机打散写入。
实际应用时,可以根据需要,切分成更多的分区。
3. 主要的开销是postgres的开销,如果需要详细的分析,建议重新编译postgres
4. gin索引的优化
https://www.postgresql.org/docs/9.6/static/sql-createindex.html
GIN indexes accept different parameters:
1. fastupdate
This setting controls usage of the fast update technique described in Section 63.4.1.
It is a Boolean parameter: ON enables fast update, OFF disables it.
(Alternative spellings of ON and OFF are allowed as described in Section 19.1.) The default is ON.
Note: Turning fastupdate off via ALTER INDEX prevents future insertions from going into the list of pending index entries,
but does not in itself flush previous entries.
You might want to VACUUM the table or call gin_clean_pending_list function afterward to ensure the pending list is emptied.
2. gin_pending_list_limit
Custom gin_pending_list_limit parameter.
This value is specified in kilobytes.
gin_pending_list_limit的目的是延迟合并,因为一条记录中可能涉及较多的GIN KEY,如果实时更新,GIN索引的写入量会非常大,性能受到影响。
本例gin_pending_list_limit设置为2MB,tps比较平缓,如果设置过大,当CPU资源不足时,抖动会比较严重。
用户可以根据实际测试,设置合理的gin_pending_list_limit值。
5. 如果把PostgreSQL完全当成索引库使用,并且允许数据丢失,那么可以使用fsync=off的开关,(检查点fsync对IO的影响比较大,本例使用的是SATA盘,将会导致较大的性能抖动)。
postgresql.auto.conf
fsync = off
zero_damaged_pages = on
如果有ha的话,丢失的风险又会更小。(但是服务器CRASH后,需要重建备库,这么大的量,还是挺恐怖的。)
建议用更多的数据库实例,每个实例的大小可控(例如 < 2TB),重建的时间也相对可控。
6. 为了达到更好的响应速度(RT),建议明细和索引分开存放,明细要求写入RT低,索引可以存在一定的延迟。 并且索引与明细数据的可靠性要求也不一样。