PostgreSQL sharding : citus 系列2 - TPC-H

6 minute read

背景

紧接着上一篇文档,本文测试citus的tpc-h能力(包括兼容性).

《PostgreSQL sharding : citus 系列1 - 多机部署(含OLTP(TPC-B)测试)》

《(TPC-H测试 SF=10,SF=200) PostgreSQL 11 vs 10 vs Deepgreen》

https://github.com/digoal/gp_tpch

实际测试过程中,发现CITUS对TPC-H的SQL支持并不完整。

citus tpc-h 测试

1、下载gp_tpch包

git clone https://github.com/digoal/gp_tpch  

2、生成200G测试数据

cd gp_tpch  
ln -s `pwd` /tmp/dss-data  
./dbgen -s 200  

3、将数据转换为PG识别的格式

for i in `ls *.tbl`; do sed 's/|$//' $i > ${i/tbl/csv}; echo $i; done;  

4、生成测试SQL

SF=200  
  
mkdir dss/queries  
for q in `seq 1 22`  
do  
    DSS_QUERY=dss/templates ./qgen -s $SF $q > dss/queries/$q.sql  
    sed 's/^select/explain select/' dss/queries/$q.sql > dss/queries/$q.explain.sql  
done  

5、修改citus 的几个参数,确保在跑两类QUERY的时候不报错(末尾会提到报错原因)。

alter role postgres set citus.enable_repartition_joins =on;  
alter role postgres set citus.max_intermediate_result_size =-1;  

6、测试TPC-H(与coordinator同一台主机上测试)

./tpch.sh ./results 127.0.0.1 port tpch-db tpch-user password citus  

tpc-h 性能

有些SQL不支持,显示0.

2018-08-29 19:24:30 [1535541870] :     query 1 finished OK (12 seconds)
2018-08-29 19:24:30 [1535541870] :     query 2 finished OK (0 seconds)
2018-08-29 19:25:48 [1535541948] :     query 3 finished OK (77 seconds)
2018-08-29 19:25:50 [1535541950] :     query 4 finished OK (2 seconds)
2018-08-29 19:29:45 [1535542185] :     query 5 finished OK (234 seconds)
2018-08-29 19:29:47 [1535542187] :     query 6 finished OK (1 seconds)
2018-08-29 19:37:33 [1535542653] :     query 7 finished OK (465 seconds)
2018-08-29 19:44:30 [1535543070] :     query 8 finished OK (415 seconds)
2018-08-29 19:58:29 [1535543909] :     query 9 finished OK (837 seconds)
2018-08-29 20:00:26 [1535544026] :     query 10 finished OK (116 seconds)
2018-08-29 20:00:26 [1535544026] :     query 11 finished OK (0 seconds)
2018-08-29 20:00:32 [1535544032] :     query 12 finished OK (6 seconds)
2018-08-29 20:00:33 [1535544033] :     query 13 finished OK (0 seconds)
2018-08-29 20:01:40 [1535544100] :     query 14 finished OK (67 seconds)
2018-08-29 20:05:33 [1535544333] :     query 15 finished OK (232 seconds)
2018-08-29 20:05:34 [1535544334] :     query 16 finished OK (0 seconds)
2018-08-29 20:05:34 [1535544334] :     query 17 finished OK (0 seconds)
2018-08-29 20:05:34 [1535544334] :     query 18 finished OK (0 seconds)
2018-08-29 20:06:51 [1535544411] :     query 19 finished OK (76 seconds)
2018-08-29 20:06:51 [1535544411] :     query 20 finished OK (0 seconds)
2018-08-29 20:06:52 [1535544412] :     query 21 finished OK (0 seconds)
2018-08-29 20:06:52 [1535544412] :     query 22 finished OK (0 seconds)
2018-08-29 20:06:52 [1535544412] : finished TPC-H benchmark

citus tpc-h SQL文件讲解

1、分片字段

select create_distributed_table('part','p_partkey');  
select create_distributed_table('region','r_regionkey');  
select create_distributed_table('nation','n_nationkey');  
select create_distributed_table('supplier','s_suppkey');  
select create_distributed_table('customer','c_custkey');  
select create_distributed_table('partsupp','ps_suppkey');  
select create_distributed_table('orders','o_orderkey');  
select create_distributed_table('lineitem','l_orderkey');  

2、colocate(默认情况下是同一个分组,不需要colocate)

SELECT mark_tables_colocated('part', ARRAY['region', 'nation', 'supplier', 'customer', 'partsupp', 'orders', 'lineitem']);  
  
默认分组,在创建分片表时colocate默认为default, 

只要表的shard数量、分片类型(append, or hash)、colocate组 都一致,则他们就是colocate的。

create_distributed_table

参数:colocate_with: (Optional) include current table in the co-location group of another table. 
By default tables are co-located when they are distributed by columns of the same type, 
have the same shard count, and have the same replication factor. 
Possible values for colocate_with are default, none to start a new co-location group, 
or the name of another table to co-locate with that table. (See Co-Locating Tables.)

3、索引

ALTER TABLE PART ADD constraint pk1 PRIMARY KEY (P_PARTKEY);  
ALTER TABLE SUPPLIER ADD constraint pk2 PRIMARY KEY (S_SUPPKEY);  
ALTER TABLE PARTSUPP ADD constraint pk3 PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY);  
ALTER TABLE CUSTOMER ADD constraint pk4 PRIMARY KEY (C_CUSTKEY);  
ALTER TABLE ORDERS ADD constraint pk5 PRIMARY KEY (O_ORDERKEY);  
ALTER TABLE LINEITEM ADD constraint pk6 PRIMARY KEY (L_ORDERKEY, L_LINENUMBER);  
ALTER TABLE NATION ADD constraint pk7 PRIMARY KEY (N_NATIONKEY);  
ALTER TABLE REGION ADD constraint pk8 PRIMARY KEY (R_REGIONKEY);  
  
CREATE INDEX idx_nation_regionkey ON public.nation USING btree (n_regionkey);  
CREATE INDEX idx6 ON public.nation USING btree (n_nationkey, n_regionkey);  
  
CREATE INDEX idx5 ON public.region USING btree (r_name, r_regionkey);  
  
CREATE INDEX IDX_CUSTOMER_NATIONKEY ON CUSTOMER (C_NATIONKEY);  
  
CREATE INDEX IDX_LINEITEM_PART_SUPP ON LINEITEM (L_PARTKEY,L_SUPPKEY);  
CREATE INDEX idx_lineitem_shipdate ON public.lineitem USING btree (l_shipdate, l_discount, l_quantity);  
CREATE INDEX idx_lineitem__2 ON public.lineitem USING btree (l_partkey);  
CREATE INDEX idx_lineitem__3 ON public.lineitem USING btree (l_suppkey);  
CREATE INDEX idx_lineitem__11 ON public.lineitem USING btree (l_shipdate);  
CREATE INDEX idx_lineitem_orderkey ON public.lineitem USING btree (l_orderkey);  
CREATE INDEX idx1 ON public.lineitem USING btree (l_orderkey) WHERE (l_commitdate < l_receiptdate);  
  
CREATE INDEX idx_orders__6 ON public.orders USING btree (o_orderpriority);  
CREATE INDEX idx_orders_orderdate ON public.orders USING btree (o_orderdate);  
CREATE INDEX idx_orders_custkey ON public.orders USING btree (o_custkey)  
  
CREATE INDEX idx_part__5 ON public.part USING btree (p_type);  
CREATE INDEX idx_part__6 ON public.part USING btree (p_size);  
CREATE INDEX idx_part_1 ON public.part USING btree (p_container, p_brand);  
  
CREATE INDEX idx_supplier_nation_key ON public.supplier USING btree (s_nationkey);  
CREATE INDEX idx4 ON public.supplier USING btree (s_suppkey, s_nationkey);  
  
CREATE INDEX idx ON public.partsupp USING btree (ps_partkey, ps_suppkey, ps_supplycost);  
CREATE INDEX idx_partsupp_partkey ON public.partsupp USING btree (ps_partkey);  
CREATE INDEX idx_partsupp_suppkey ON public.partsupp USING btree (ps_suppkey);  
CREATE INDEX idx_partsupp__4 ON public.partsupp USING btree (ps_supplycost);  

citus测试TPC-H遇到的兼容性报错与解决方法

错误1

ERROR:  could not run distributed query with subquery outside the FROM and WHERE clauses  
HINT:  Consider using an equality filter on the distributed table's partition column.  

报错SQL

select  
        ps_partkey,  
        sum(ps_supplycost * ps_availqty) as value  
from  
        partsupp,  
        supplier,  
        nation  
where  
        ps_suppkey = s_suppkey  
        and s_nationkey = n_nationkey  
        and n_name = 'SAUDI ARABIA'  
group by  
        ps_partkey having  
                sum(ps_supplycost * ps_availqty) > (  
                        select  
                                sum(ps_supplycost * ps_availqty) * 0.0000005000  
                        from  
                                partsupp,  
                                supplier,  
                                nation  
                        where  
                                ps_suppkey = s_suppkey  
                                and s_nationkey = n_nationkey  
                                and n_name = 'SAUDI ARABIA'  
                )  
order by  
        value desc  
LIMIT 1;  

改成

select  
        ps_partkey,  
        sum(ps_supplycost * ps_availqty) as value  
from  
        partsupp,  
        supplier,  
        nation  
where  
        ps_suppkey = s_suppkey  
        and s_nationkey = n_nationkey  
        and n_name = 'SAUDI ARABIA'  
group by  
        ps_partkey having  
                sum(ps_supplycost * ps_availqty) >   
		             ( sum(ps_supplycost * ps_availqty) filter   
			       (where  
                                ps_suppkey = s_suppkey  
                                and s_nationkey = n_nationkey  
                                and n_name = 'SAUDI ARABIA'  
			        )   
			     ) * 0.0000005000  
                         
order by  
        value desc  
LIMIT 1;  

错误2

ERROR:  the query contains a join that requires repartitioning
HINT:  Set citus.enable_repartition_joins to on to enable repartitioning

报错SQL

select
        l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate,
        o_shippriority
from
        customer,
        orders,
        lineitem
where
        c_mktsegment = 'AUTOMOBILE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < date '1995-03-08'
        and l_shipdate > date '1995-03-08'
group by
        l_orderkey,
        o_orderdate,
        o_shippriority
order by
        revenue desc,
        o_orderdate
LIMIT 10;

有些SQL需要repartition join,需要设置如下参数

alter role postgres set citus.enable_repartition_joins =on;  

错误3

ERROR:  the intermediate result size exceeds citus.max_intermediate_result_size (currently 1048576 kB)  
DETAIL:  Citus restricts the size of intermediate results of complex subqueries and CTEs to avoid accidentally pulling large result sets into once place.  
HINT:  To run the current query, set citus.max_intermediate_result_size to a higher value or -1 to disable.  

有些SQL的中间结果比较大,可以设置为无限制,重新测试

alter role postgres set citus.max_intermediate_result_size =-1;  

错误4

ERROR:  0A000: complex joins are only supported when all distributed tables are joined on their distribution columns with equal operator  

复杂JOIN,仅支持分布键作为JOIN的字段,并且只能使用等值JOIN。

postgres=# explain select          
        sum(l_extendedprice) / 7.0 as avg_yearly  
from  
        lineitem join   
        part on (p_partkey = l_partkey and p_brand = 'Brand#13' and p_container = 'SM PKG') join  
        (SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg  
on (agg_partkey = l_partkey) limit 10;  
  
ERROR:  0A000: complex joins are only supported when all distributed tables are joined on their distribution columns with equal operator  
LOCATION:  DeferErrorIfUnsupportedSubqueryPushdown, query_pushdown_planning.c:469  

对应的代码如下

https://github.com/citusdata/citus/blob/d63cbf382288358533b71d907f6a3a7fc8a1df5f/src/backend/distributed/planner/query_pushdown_planning.c

	else if (!RestrictionEquivalenceForPartitionKeys(plannerRestrictionContext))  
	{  
		return DeferredError(ERRCODE_FEATURE_NOT_SUPPORTED,  
							 "complex joins are only supported when all distributed tables are "  
							 "joined on their distribution columns with equal operator",  
							 NULL, NULL);  
	}  

错误5

ERROR:  failed to execute task 18  

报错SQL

select  
        ps_partkey,  
        sum(ps_supplycost * ps_availqty) as value  
from  
        partsupp,  
        supplier,  
        nation  
where  
        ps_suppkey = s_suppkey  
        and s_nationkey = n_nationkey  
        and n_name = 'SAUDI ARABIA'  
group by  
        ps_partkey having  
                sum(ps_supplycost * ps_availqty) > ( sum(ps_supplycost * ps_availqty) filter (where  
                                ps_suppkey = s_suppkey  
                                and s_nationkey = n_nationkey  
                                and n_name = 'SAUDI ARABIA') ) * 0.0000005000  
                         
order by  
        value desc  
LIMIT 1;  

错误6

ERROR:  0A000: cannot run outer join query if join is not on the partition column
DETAIL:  Outer joins requiring repartitioning are not supported.
LOCATION:  FixedJoinOrderList, multi_join_order.c:189

报错SQL

select
        c_count,
        count(*) as custdist
from
        (
                select
                        c_custkey,
                        count(o_orderkey)
                from
                        customer left outer join orders on
                                c_custkey = o_custkey
                                and o_comment not like '%unusual%requests%'
                group by
                        c_custkey
        ) as c_orders (c_custkey, c_count)
group by
        c_count
order by
        custdist desc,
        c_count desc
LIMIT 1;

小结

citus的复杂SQL的语法支持还不是非常完备,如果要实现比较复杂的运算,建议Writing in SQL, thinking in MapReduce

https://www.citusdata.com/blog/2018/08/17/breaking-down-citus-real-time-executor/

这样的话,基本上你可以避免掉复杂的SQL,使用分解动作来实现。

Flag Counter

digoal’s 大量PostgreSQL文章入口