PostgreSQL 1000亿数据量 正则匹配 速度与激情

16 minute read

背景

本文主要讲解并验证一下PostgreSQL 1000亿数据量级的模糊查询、正则查询,使用gin索引的效率。

承接上一篇

https://yq.aliyun.com/articles/7444

测试环境

测试环境为 8台主机(16c/host)的 PostgreSQL集群,一共240个数据节点,测试数据量1008亿。

性能图表 :

_

如果要获得更快的响应速度,可以通过增加主机和节点数(或者通过增加CPU和节点数),缩短recheck的处理时间。

数据生成方法

#!/bin/bash  
#      截取通过random()计算得到的MD5 128bit hex的前48bit, 转成字符串,得到[0-9]和[a-f]组成的12个随机字符串。  
  
psql digoal digoal -c "create table t_regexp_100billion distributed randomly"  
  
for ((i=1;i<=1008;i++))  
do  
  psql digoal digoal -c "copy (select substring(md5(random()::text),1,12) from generate_series(1,100000000)) to stdout" | psql digoal digoal -c "copy t_regexp_100billion from stdin"  
done  
  
psql digoal digoal -c "set maintenance_work_mem='4GB'; create index idx_t_regexp_100billion_1 on t_regexp_100billion(info)"  
psql digoal digoal -c "set maintenance_work_mem='4GB'; create index idx_t_regexp_100billion_2 on t_regexp_100billion(reverse(info))"  
psql digoal digoal -c "set maintenance_work_mem='4GB'; create index idx_t_regexp_100billion_gin on t_regexp_100billion using gin (info gin_trgm_ops)"  

数据概貌

digoal=> select count(*) from t_regexp_100billion ;  
    count       
--------------  
 100800000000  
(1 row)  
Time: 228721.386 ms  

表大小, 4.1 TB

digoal=> \dt+ t_regexp_100billion   
                           List of relations  
 Schema |        Name         | Type  | Owner  |  Size   | Description   
--------+---------------------+-------+--------+---------+-------------  
 public | t_regexp_100billion | table | digoal | 4158 GB |   
(1 row)  

索引大小

idx_t_regexp_100billion_1     2961 GB  
idx_t_regexp_100billion_1     2961 GB  
idx_t_regexp_100billion_gin   2300 GB  

测试数据展示:

digoal=> select * from t_regexp_100billion offset 1000000 limit 10;  
     info       
--------------  
 bca0fb45367e  
 3051ca8a9a38  
 fadc91a3a4de  
 710b9c60417e  
 279dd9832cc3  
 f4743fe2e83b  
 9ce9e42d4039  
 65e64742fd3f  
 db3d0e0edc52  
 7cfb00bb38ec  
(10 rows)  

重复度取样, 计算random() md5得到的字符串,可以确保非常低的重复度:

digoal=> select count(distinct info) from (select * from t_regexp_100billion offset 1299422811 limit 1000000) t;  
 count    
--------  
 999750  
(1 row)  

统计信息展示:

digoal=> alter table t_regexp_100billion alter column info set statistics 10000;  
ALTER TABLE  
digoal=> analyze t_regexp_100billion ;  
ANALYZE  
  
schemaname             | public  
tablename              | t_regexp_100billion  
attname                | info  
inherited              | f  
null_frac              | 0  
avg_width              | 13  
n_distinct             | -0.836834             # 采样统计信息,约83.6834%的唯一值  
most_common_vals       | (pg_catalog.text){7f68d12d2205,00083380706d,00154b6d79e8,...    
most_common_freqs      | {1e-06,6.66667e-07,6.66667e-07,6.66667e-07,.....        单个最高频值的占比为1e-06, 也就是说1000亿记录中出现10万次。  
histogram_bounds       | (pg_catalog.text){0000008123b7,00066c71c9bb,000d672de234,...  
correlation            | 0.000237291  
most_common_elems      |   
most_common_elem_freqs |   
elem_count_histogram   |   

7f68d12d2205 实际的出现次数,可能是采样时7f68d12d2205被采样到的块较多,所以数据库认为它的占比较多:

digoal=> select count(*) from t_regexp_100billion where info='7f68d12d2205';  
-[ RECORD 1 ]  
count | 54  
  
digoal=> select ctid from t_regexp_100billion where info='7f68d12d2205' order by 1;  
     ctid        
---------------  
 (15343,114)  
 (62134,39)  
 (96808,112)  
 (116492,176)  
 (194615,143)  
 (328074,116)  
 (364037,115)  
 (375240,158)  
 (376187,152)  
 (602144,81)  
 (664026,6)  
 (689501,136)  
 (695345,130)  
 (697374,126)  
 (714719,148)  
 (743169,20)  
 (802326,139)  
 (833830,41)  
 (839417,185)  
 (892417,78)  
 (892493,149)  
 (907979,52)  
 (967078,163)  
 (990313,159)  
 (1007998,27)  
 (1106961,57)  
 (1142731,165)  
 (1148427,67)  
 (1156654,156)  
 (1205854,137)  
 (1243429,68)  
 (1277287,165)  
 (1328836,98)  
 (1331727,150)  
 (1337534,3)  
 (1360947,104)  
 (1438970,97)  
 (1476941,22)  
 (1482022,82)  
 (1486307,69)  
 (1548445,155)  
 (1557209,82)  
 (1564980,158)  
 (1646685,76)  
 (1663018,99)  
 (1678604,77)  
 (1755845,177)  
 (1981937,153)  
 (1984723,98)  
 (2071955,59)  
 (2093147,149)  
 (2199794,102)  
 (2204957,44)  
 (2234820,142)  
(54 rows)  

性能测试

前缀匹配查询速度:

digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ '^80ebcdd47';  
     ctid      | tableoid |     info       
---------------+----------+--------------  
 (124741,60)   |    16677 | 80ebcdd47006  
 (896121,64)   |    16659 | 80ebcdd47006  
 (1124495,97)  |    16659 | 80ebcdd47006  
 (1126474,141) |    16659 | 80ebcdd47006  
 (1059471,62)  |    16659 | 80ebcdd47006  
 (1296562,115) |    16659 | 80ebcdd47006  
 (1190941,122) |    16659 | 80ebcdd47006  
 (680853,129)  |    16659 | 80ebcdd47006  
 (1010667,15)  |    16659 | 80ebcdd47006  
 (1386348,25)  |    16659 | 80ebcdd47006  
 (1522827,90)  |    16659 | 80ebcdd47006  
 (2204071,129) |    16659 | 80ebcdd47006  
 (1570431,114) |    16659 | 80ebcdd47006  
 (888185,38)   |    16659 | 80ebcdd47006  
 (605886,160)  |    16659 | 80ebcdd47006  
 (1306061,123) |    16659 | 80ebcdd47006  
 (757157,47)   |    16659 | 80ebcdd47006  
 (1166290,83)  |    16659 | 80ebcdd47006  
 (419730,1)    |    16659 | 80ebcdd47006  
 (1833853,131) |    16659 | 80ebcdd47006  
 (964866,120)  |    16659 | 80ebcdd47006  
 (904961,175)  |    16659 | 80ebcdd47006  
 (984373,32)   |    16659 | 80ebcdd47006  
 (891018,145)  |    16659 | 80ebcdd47006  
 (1520483,121) |    16659 | 80ebcdd47006  
 (571001,124)  |    16659 | 80ebcdd47006  
 (802093,55)   |    16659 | 80ebcdd47006  
 (6831,172)    |    16659 | 80ebcdd47006  
 (1169137,84)  |    16659 | 80ebcdd47006  
 (77398,164)   |    16659 | 80ebcdd47006  
 (24132,98)    |    16659 | 80ebcdd47006  
 (564322,152)  |    16659 | 80ebcdd47006  
 (357087,172)  |    16659 | 80ebcdd47006  
 (1823628,60)  |    16659 | 80ebcdd47006  
 (2153609,52)  |    16659 | 80ebcdd47006  
 (816401,140)  |    16659 | 80ebcdd47006  
 (542383,53)   |    16662 | 80ebcdd47006  
 (1340971,64)  |    16662 | 80ebcdd47006  
 (1239166,108) |    16662 | 80ebcdd47006  
 (2033648,39)  |    16662 | 80ebcdd47006  
 (1890808,93)  |    16662 | 80ebcdd47006  
 (1213124,4)   |    16662 | 80ebcdd47006  
 (1025184,106) |    16662 | 80ebcdd47006  
 (620238,131)  |    16662 | 80ebcdd47006  
 (583064,74)   |    16662 | 80ebcdd47006  
 (1454680,42)  |    16671 | 80ebcdd47006  
 (417385,74)   |    16671 | 80ebcdd47006  
 (323669,61)   |    16671 | 80ebcdd47006  
 (1759181,138) |    16671 | 80ebcdd47006  
 (2112157,146) |    16671 | 80ebcdd47006  
 (431326,92)   |    16671 | 80ebcdd47006  
 (2097356,110) |    16671 | 80ebcdd47006  
(52 rows)  
Time: 3226.393 ms  
  
digoal=> explain (analyze,verbose,buffers,costs,timing) select ctid,tableoid,info from t_regexp_100billion where info ~ '^80ebcdd47';  
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=3085.502..3112.273 rows=52 loops=1)  
   Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info  
   Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2  
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h  
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d  
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d  
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d  
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data  
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,   
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,  
 h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,   
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_  
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9  
   Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ '^80ebcdd47'::text)  
 Planning time: 0.061 ms  
 Execution time: 3112.296 ms  
(6 rows)  
Time: 3139.928 ms  

后缀匹配查询速度

digoal=> select ctid,tableoid,info from t_regexp_100billion where reverse(info) ~ '^f42d12089b';  
     ctid      | tableoid |     info       
---------------+----------+--------------  
 (124741,26)   |    16677 | f3b98021d24f  
 (1696888,151) |    16659 | f3b98021d24f  
 (1278911,101) |    16659 | f3b98021d24f  
 (1427480,157) |    16659 | f3b98021d24f  
 (449192,30)   |    16659 | f3b98021d24f  
 (1833887,81)  |    16659 | f3b98021d24f  
 (229525,72)   |    16659 | f3b98021d24f  
 (1353789,17)  |    16659 | f3b98021d24f  
 (1875911,148) |    16659 | f3b98021d24f  
 (1847078,35)  |    16659 | f3b98021d24f  
 (316780,156)  |    16659 | f3b98021d24f  
 (1265453,120) |    16659 | f3b98021d24f  
 (100075,60)   |    16659 | f3b98021d24f  
 (1924176,2)   |    16659 | f3b98021d24f  
 (279583,2)    |    16659 | f3b98021d24f  
 (1631226,23)  |    16659 | f3b98021d24f  
 (1906666,50)  |    16659 | f3b98021d24f  
 (1640803,116) |    16659 | f3b98021d24f  
 (629651,46)   |    16659 | f3b98021d24f  
 (134982,13)   |    16659 | f3b98021d24f  
 (380660,123)  |    16659 | f3b98021d24f  
 (2158193,31)  |    16659 | f3b98021d24f  
 (324901,64)   |    16659 | f3b98021d24f  
 (1243973,160) |    16659 | f3b98021d24f  
 (540958,139)  |    16659 | f3b98021d24f  
 (441475,99)   |    16659 | f3b98021d24f  
 (1207114,121) |    16659 | f3b98021d24f  
 (574598,21)   |    16659 | f3b98021d24f  
 (1253283,185) |    16659 | f3b98021d24f  
 (1396717,142) |    16659 | f3b98021d24f  
 (149738,9)    |    16659 | f3b98021d24f  
 (764749,26)   |    16659 | f3b98021d24f  
 (1211899,5)   |    16659 | f3b98021d24f  
 (1626746,65)  |    16659 | f3b98021d24f  
 (1342895,124) |    16659 | f3b98021d24f  
 (733794,136)  |    16659 | f3b98021d24f  
 (417796,2)    |    16659 | f3b98021d24f  
 (555520,163)  |    16659 | f3b98021d24f  
 (232038,105)  |    16659 | f3b98021d24f  
 (355107,127)  |    16659 | f3b98021d24f  
 (352143,175)  |    16662 | f3b98021d24f  
 (1856293,69)  |    16662 | f3b98021d24f  
 (1405106,105) |    16662 | f3b98021d24f  
 (47689,79)    |    16662 | f3b98021d24f  
 (679310,7)    |    16671 | f3b98021d24f  
 (1076234,164) |    16671 | f3b98021d24f  
(46 rows)  
Time: 3140.835 ms  
  
  
digoal=> explain (verbose,costs,timing,buffers,analyze) select ctid,tableoid,info from t_regexp_100billion where reverse(info) ~ '^f42d12089b';  
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=3085.738..3112.216 rows=46 loops=1)  
   Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info  
   Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2  
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h  
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d  
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d  
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d  
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data  
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,   
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,  
 h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,   
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_  
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9  
   Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (reverse(info) ~ '^f42d12089b'::text)  
 Planning time: 0.063 ms  
 Execution time: 3112.236 ms  
(6 rows)  
  
Time: 3139.890 ms  

前后模糊查询速度:

digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ 'e7add04871';  
     ctid      | tableoid |     info       
---------------+----------+--------------  
 (124741,45)   |    16677 | be7add048713  
 (49315,69)    |    16659 | be7add048713  
 (1770876,21)  |    16659 | be7add048713  
 (199079,143)  |    16659 | be7add048713  
 (151110,141)  |    16659 | be7add048713  
 (1597384,137) |    16659 | be7add048713  
 (1693453,25)  |    16659 | be7add048713  
 (101576,132)  |    16659 | be7add048713  
 (1110249,50)  |    16659 | be7add048713  
 (792326,68)   |    16659 | be7add048713  
 (1676705,68)  |    16659 | be7add048713  
 (1269148,101) |    16659 | be7add048713  
 (1027442,113) |    16659 | be7add048713  
 (1078144,100) |    16659 | be7add048713  
 (584038,141)  |    16659 | be7add048713  
 (1245454,80)  |    16659 | be7add048713  
 (1551184,102) |    16659 | be7add048713  
 (1326266,17)  |    16659 | be7add048713  
 (432025,101)  |    16659 | be7add048713  
 (300650,152)  |    16659 | be7add048713  
 (1322140,15)  |    16662 | be7add048713  
 (1424768,25)  |    16662 | be7add048713  
 (391150,31)   |    16662 | be7add048713  
 (254014,170)  |    16662 | be7add048713  
 (1758616,85)  |    16662 | be7add048713  
 (1720990,105) |    16662 | be7add048713  
 (345908,68)   |    16662 | be7add048713  
 (1592333,102) |    16662 | be7add048713  
 (1843902,130) |    16671 | be7add048713  
 (898136,121)  |    16671 | be7add048713  
 (1469985,138) |    16671 | be7add048713  
 (1287666,51)  |    16671 | be7add048713  
(32 rows)  
  
Time: 4970.662 ms  
  
digoal=> explain (analyze,verbose,timing,costs,buffers) select ctid,tableoid,info from t_regexp_100billion where info ~ 'e7add04871';  
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=4712.916..4897.512 rows=32 loops=1)  
   Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info  
   Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2  
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h  
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d  
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d  
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d  
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data  
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,   
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,  
 h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,   
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_  
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9  
   Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ 'e7add04871'::text)  
 Planning time: 0.063 ms  
 Execution time: 4897.532 ms  
(6 rows)  
  
Time: 4925.741 ms  

正则匹配查询速度

digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ '.3918.209f';  
     ctid      | tableoid |     info       
---------------+----------+--------------  
 (124741,29)   |    16677 | 0b39188209f2  
 (1443707,79)  |    16659 | 0b39188209f2  
 (596962,50)   |    16659 | 0b39188209f2  
 (1763787,145) |    16659 | 0b39188209f2  
 (2192691,24)  |    16659 | 0b39188209f2  
 (425121,26)   |    16659 | 0b39188209f2  
 (2157735,117) |    16659 | 0b39188209f2  
 (826685,32)   |    16659 | 0b39188209f2  
 (507417,51)   |    16659 | 0b39188209f2  
 (1168854,22)  |    16659 | 0b39188209f2  
 (178112,96)   |    16659 | 0b39188209f2  
 (1609343,84)  |    16659 | 0b39188209f2  
 (1883190,161) |    16659 | 0b39188209f2  
 (1879921,82)  |    16659 | 0b39188209f2  
 (187722,148)  |    16659 | 0b39188209f2  
 (411680,31)   |    16659 | 0b39188209f2  
 (1103474,74)  |    16659 | 0b39188209f2  
 (1756318,139) |    16659 | 0b39188209f2  
 (760475,112)  |    16659 | 0b39188209f2  
 (656098,47)   |    16659 | 0b39188209f2  
 (2015224,31)  |    16659 | 0b39188209f2  
 (401158,64)   |    16659 | 0b39188209f2  
 (1001315,155) |    16659 | 0b39188209f2  
 (527643,24)   |    16659 | 0b39188209f2  
 (51198,95)    |    16659 | 0b39188209f2  
 (1709591,26)  |    16659 | 0b39188209f2  
 (1235618,22)  |    16659 | 0b39188209f2  
 (542813,107)  |    16659 | 0b39188209f2  
 (331468,156)  |    16659 | 0b39188209f2  
 (940954,68)   |    16662 | 0b39188209f2  
 (1295686,169) |    16662 | 0b39188209f2  
 (825955,109)  |    16668 | 0b39188209f2  
 (2025210,165) |    16671 | 0b39188209f2  
 (1639115,139) |    16671 | 0b39188209f2  
 (422678,79)   |    16671 | 0b39188209f2  
 (379949,175)  |    16671 | 0b39188209f2  
 (455206,96)   |    16671 | 0b39188209f2  
 (1745081,184) |    16671 | 0b39188209f2  
(38 rows)  
Time: 3580.536 ms  
  
digoal=> explain (verbose,analyze,timing,costs,buffers) select ctid,tableoid,info from t_regexp_100billion where info ~ '.3918.209f';  
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=3407.156..3621.601 rows=38 loops=1)  
   Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info  
   Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2  
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h  
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d  
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d  
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d  
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data  
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,   
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,  
 h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,   
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_  
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9  
   Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ '.3918.209f'::text)  
 Planning time: 0.072 ms  
 Execution time: 3621.626 ms  
(6 rows)  
Time: 3650.045 ms  
  
digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ 'ab2..d[1|f]3c8';  
     ctid      | tableoid |     info       
---------------+----------+--------------  
 (899065,160)  |    16659 | 4eab207df3c8  
 (2100060,157) |    16659 | a2ab2fbdf3c8  
 (162213,12)   |    16659 | a2ab2fbdf3c8  
 (637030,50)   |    16659 | 4eab207df3c8  
 (1325830,35)  |    16659 | a2ab2fbdf3c8  
 (197454,129)  |    16659 | 4eab207df3c8  
 (2000258,158) |    16659 | a2ab2fbdf3c8  
 (765698,19)   |    16659 | a2ab2fbdf3c8  
 (935743,59)   |    16659 | 4eab207df3c8  
 (2203339,96)  |    16659 | a2ab2fbdf3c8  
 (701234,118)  |    16659 | a2ab2fbdf3c8  
 (971717,57)   |    16659 | a2ab2fbdf3c8  
 (1164498,54)  |    16659 | 4eab207df3c8  
 (393227,147)  |    16659 | 4eab207df3c8  
 (1439445,94)  |    16659 | a2ab2fbdf3c8  
 (1549135,146) |    16659 | 4eab207df3c8  
 (1551991,36)  |    16659 | 4eab207df3c8  
 (2206488,3)   |    16659 | a2ab2fbdf3c8  
 (481614,118)  |    16659 | 4eab207df3c8  
 (1809085,7)   |    16659 | a2ab2fbdf3c8  
 (173214,139)  |    16659 | 4eab207df3c8  
 (1021816,28)  |    16659 | 4eab207df3c8  
 (829846,43)   |    16659 | a2ab2fbdf3c8  
 (1899020,79)  |    16659 | 4eab207df3c8  
 (6241,163)    |    16659 | 4eab207df3c8  
 (1205920,5)   |    16659 | a2ab2fbdf3c8  
 (412014,52)   |    16659 | 4eab207df3c8  
 (1122051,14)  |    16659 | 4eab207df3c8  
 (284493,87)   |    16659 | 4eab207df3c8  
 (374322,83)   |    16659 | a2ab2fbdf3c8  
 (189124,19)   |    16659 | 4eab207df3c8  
 (747428,175)  |    16659 | a2ab2fbdf3c8  
 (795035,152)  |    16659 | a2ab2fbdf3c8  
 (1949396,25)  |    16659 | a2ab2fbdf3c8  
 (154445,167)  |    16659 | a2ab2fbdf3c8  
 (859513,82)   |    16659 | a2ab2fbdf3c8  
 (31337,41)    |    16659 | a2ab2fbdf3c8  
 (1393343,136) |    16659 | 4eab207df3c8  
 (63555,82)    |    16659 | a2ab2fbdf3c8  
 (608980,177)  |    16659 | 4eab207df3c8  
 (250484,31)   |    16659 | a2ab2fbdf3c8  
 (1696502,87)  |    16659 | 4eab207df3c8  
 (2021326,68)  |    16659 | a2ab2fbdf3c8  
 (397967,70)   |    16659 | a2ab2fbdf3c8  
 (2083071,101) |    16659 | a2ab2fbdf3c8  
 (98554,23)    |    16659 | a2ab2fbdf3c8  
 (1247891,182) |    16659 | 4eab207df3c8  
 (1533143,51)  |    16659 | a2ab2fbdf3c8  
 (1280652,28)  |    16659 | a2ab2fbdf3c8  
 (1337921,119) |    16659 | a2ab2fbdf3c8  
 (446914,180)  |    16659 | a2ab2fbdf3c8  
 (1810263,161) |    16659 | a2ab2fbdf3c8  
 (350272,51)   |    16659 | 4eab207df3c8  
 (909148,37)   |    16659 | 4eab207df3c8  
 (197153,108)  |    16659 | a2ab2fbdf3c8  
 (207423,96)   |    16659 | 4eab207df3c8  
 (1097934,22)  |    16659 | a2ab2fbdf3c8  
 (12605,49)    |    16659 | 4eab207df3c8  
 (65244,28)    |    16659 | 4eab207df3c8  
 (10274,30)    |    16659 | a2ab2fbdf3c8  
 (1547771,91)  |    16659 | 4eab207df3c8  
 (55044,64)    |    16659 | 4eab207df3c8  
 (1286116,136) |    16659 | 4eab207df3c8  
 (797831,10)   |    16659 | a2ab2fbdf3c8  
 (450949,98)   |    16659 | a2ab2fbdf3c8  
 (563308,46)   |    16659 | 4eab207df3c8  
 (1815443,179) |    16659 | a2ab2fbdf3c8  
 (279403,105)  |    16659 | 4eab207df3c8  
 (1953284,11)  |    16659 | 4eab207df3c8  
 (2068896,15)  |    16659 | 4eab207df3c8  
 (1230212,18)  |    16659 | 4eab207df3c8  
 (1513277,18)  |    16659 | 4eab207df3c8  
 (1675223,30)  |    16659 | a2ab2fbdf3c8  
 (966609,80)   |    16662 | a2ab2fbdf3c8  
 (118085,180)  |    16662 | a2ab2fbdf3c8  
 (1557051,116) |    16662 | a2ab2fbdf3c8  
 (1848877,62)  |    16662 | 4eab207df3c8  
 (2224775,3)   |    16662 | 4eab207df3c8  
 (1196571,72)  |    16662 | 4eab207df3c8  
 (1799448,154) |    16662 | 4eab207df3c8  
 (2246230,68)  |    16662 | a2ab2fbdf3c8  
 (984529,120)  |    16662 | a2ab2fbdf3c8  
 (1361482,97)  |    16662 | 4eab207df3c8  
 (1935512,51)  |    16662 | a2ab2fbdf3c8  
 (816119,95)   |    16662 | a2ab2fbdf3c8  
 (770381,45)   |    16662 | 4eab207df3c8  
 (1943960,146) |    16662 | a2ab2fbdf3c8  
 (346006,160)  |    16671 | a2ab2fbdf3c8  
 (1873262,96)  |    16671 | 4eab207df3c8  
 (1219041,118) |    16671 | a2ab2fbdf3c8  
 (418076,24)   |    16671 | a2ab2fbdf3c8  
 (724463,28)   |    16671 | a2ab2fbdf3c8  
 (1471492,164) |    16671 | a2ab2fbdf3c8  
 (975490,122)  |    16671 | a2ab2fbdf3c8  
 (1885629,34)  |    16671 | 4eab207df3c8  
(95 rows)  
Time: 4718.459 ms  
  
digoal=> explain (verbose,timing,costs,buffers,analyze) select ctid,tableoid,info from t_regexp_100billion where info ~ 'ab2..d[1|f]3c8';  
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=4386.010..4648.614 rows=95 loops=1)  
   Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info  
   Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2  
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h  
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d  
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d  
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d  
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data  
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,   
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,  
 h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,   
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_  
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9  
   Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ 'ab2..d[1|f]3c8'::text)  
 Planning time: 0.058 ms  
 Execution time: 4648.638 ms  
(6 rows)  
Time: 4676.919 ms  

Flag Counter

digoal’s 大量PostgreSQL文章入口