PostgreSQL 9.6 sharding + 单元化 (based on postgres_fdw) 最佳实践 - 通用水平分库场景设计与实践

12 minute read

背景

20161004_01.md这篇文档讲解了PostgreSQL postgres_fdw的用法以及9.6的增强。

本文将以实践为主,定一个小目标,讲解一下如何使用postgres_fdw实现sharding。

单元化则可以结合bdr插件来实现。

一、通用场景设计

我以一个最简单易懂的场景来举例,例如所有的业务表都包含了用户ID这个字段,并且业务访问数据时,一定会带上用户ID进行访问。

同时用户没有跨用户ID的访问需求。

以上设计是大多数公司使用数据库sharding的最佳实践设计。

目标场景

以典型的关系系统为例,构建一个测试场景,每个表都带有用户ID,以用户ID哈希取模为分片规则。

1. 用户信息表

create table userinfo(uid int8 primary key, info text, crt_time timestamp);

2. 日志表

create table user_log(uid int8, logintime timestamp, logouttime timestamp, client_addr inet, client_port int);

3. 用户正向关系表(user like who)

create table user_membership(uid1 int8, uid2 int8, unique (uid1,uid2) );

聚合物化视图

create materialized view mv_user_membership as select uid1,array_agg(uid2) uid2s from user_membership group by uid1;
create unique index idx_mv_user_membership_1 on mv_user_membership (uid1);
refresh materialized view CONCURRENTLY mv_user_membership with data;

4. 用户反向关系表(who like user)

create table user_membership_rev(uid2 int8, uid1 int8, unique (uid2,uid1) );

聚合物化视图

create materialized view mv_user_membership_rev as select uid2,array_agg(uid1) uid1s from user_membership_rev group by uid2;
create unique index idx_mv_user_membership_rev_1 on mv_user_membership_rev (uid2);
refresh materialized view CONCURRENTLY mv_user_membership_rev with data;

ddl-1.sql如下

create table userinfo(uid int8 primary key, info text, crt_time timestamp);

create table user_log(uid int8, logintime timestamp, logouttime timestamp, client_addr inet, client_port int);

create table user_membership(uid1 int8, uid2 int8, unique (uid1,uid2) );

create materialized view mv_user_membership as select uid1,array_agg(uid2) uid2s from user_membership group by uid1;
create unique index idx_mv_user_membership_1 on mv_user_membership (uid1);
refresh materialized view CONCURRENTLY mv_user_membership with data;

create table user_membership_rev(uid2 int8, uid1 int8, unique (uid2,uid1) );

create materialized view mv_user_membership_rev as select uid2,array_agg(uid1) uid1s from user_membership_rev group by uid2;
create unique index idx_mv_user_membership_rev_1 on mv_user_membership_rev (uid2);
refresh materialized view CONCURRENTLY mv_user_membership_rev with data;

业务需求

userinfo 增删改查
新增用户、销毁用户、修改用户资料、查询用户资料

user_log 增查
新增用户登陆日志、查询用户日志

user_membership 增删查
新增用户关系,删除用户关系,查询用户关系

user_membership_rev 增删查
新增反向用户关系,删除反向用户关系,查询反向用户关系

mv_user_membership 刷新、查询
mv_user_membership_rev 刷新、查询
聚合物化视图主要是提高关系查询效率用的,刷新物化视图,查询物化视图

构建测试环境

初设256个分片,4个datanode,每个datanode存放64个分片。

测试时使用本地环境模拟,真实环境修改为对应的IP地址和端口即可。

segment 1 : 127.0.0.1:5281:db0:role0:pwd , schema_name role0

segment 2 : 127.0.0.1:5281:db1:role1:pwd , schema_name role1

segment 3 : 127.0.0.1:5281:db2:role2:pwd , schema_name role2

segment 4 : 127.0.0.1:5281:db3:role3:pwd , schema_name role3

master 1 : 127.0.0.1:5281:mas1:digoal:pwd , schema_name digoal

master 2 : 127.0.0.1:5281:mas2:digoal:pwd , schema_name digoal

修改pg_hba.conf,确保127.0.0.1使用md5认证

0. 初始化测试环境

create role role0 nosuperuser login encrypted password 'pwd';
create role role1 nosuperuser login encrypted password 'pwd';
create role role2 nosuperuser login encrypted password 'pwd';
create role role3 nosuperuser login encrypted password 'pwd';
create role digoal nosuperuser login encrypted password 'pwd';

create database db0 with template template0 encoding 'UTF8';
create database db1 with template template0 encoding 'UTF8';
create database db2 with template template0 encoding 'UTF8';
create database db3 with template template0 encoding 'UTF8';
create database mas1 with template template0 encoding 'UTF8';
create database mas2 with template template0 encoding 'UTF8';

grant all on database db0 to role0;
grant all on database db1 to role1;
grant all on database db2 to role2;
grant all on database db3 to role3;
grant all on database mas1 to digoal;
grant all on database mas2 to digoal;

\c db0 role0
create schema role0;
\c db1 role1
create schema role1;
\c db2 role2
create schema role2;
\c db3 role3
create schema role3;
\c mas1 digoal
create schema digoal;
\c mas2 digoal
create schema digoal;

\c mas1 postgres
create extension postgres_fdw;
grant usage on foreign data wrapper postgres_fdw to digoal;
\c mas2 postgres
create extension postgres_fdw;
grant usage on foreign data wrapper postgres_fdw to digoal;

1. 初始化segments (datanodes)

初始化segment1

\c db0 role0

执行 ddl-1.sql  

然后执行 :   
do language plpgsql $$
declare
  i int;
  mod int := 64;
  nod int := 0;
  sql text;
begin
  for i in (mod*nod)..(mod*(nod+1)-1) loop
      sql := 'create table userinfo_'||i||'(like userinfo including all, check(mod(abs(uid), 256) = '||i||' ))';
      execute sql;
      sql := 'create table user_log_'||i||'(like user_log including all, check(mod(abs(uid), 256) = '||i||' ))';
      execute sql;
      sql := 'create table user_membership_'||i||'(like user_membership including all, check(mod(abs(uid1), 256) = '||i||' ))';
      execute sql;
      sql := 'create table user_membership_rev_'||i||'(like user_membership_rev including all, check(mod(abs(uid2), 256) = '||i||' ))';
      execute sql;
      sql := 'create materialized view mv_user_membership_'||i||' as select uid1,array_agg(uid2) uid2s from user_membership_'||i||' group by uid1';
      execute sql;
      sql := 'create unique index idx_mv_user_membership_'||i||'_1 on mv_user_membership_'||i||' (uid1)';
      execute sql;
      sql := 'create materialized view mv_user_membership_rev_'||i||' as select uid2,array_agg(uid1) uid1s from user_membership_rev_'||i||' group by uid2';
      execute sql;
      sql := 'create unique index idx_mv_user_membership_rev_'||i||'_1 on mv_user_membership_rev_'||i||' (uid2)';
      execute sql;
  end loop;
end;
$$;

create or replace function refresh_mv(nsp name, mvname name) returns void as $$
declare
begin
  execute 'refresh materialized view CONCURRENTLY '||quote_ident($1)||'.'||quote_ident($2)||' with data';
end;
$$ language plpgsql strict;

create view v_refresh_mv0 as 
select refresh_mv('role0'::name, mvname::name)::text from 
( 
select id, 'mv_user_membership_'||id mvname from generate_series(0,255) t(id) 
union all 
select id, 'mv_user_membership_rev_'||id mvname from generate_series(0,255) t(id) 
) t 
where mod(abs(id), 256) >= 0 and mod(abs(id), 256) <= 63;

初始化segment2

\c db1 role1

执行 ddl-1.sql  

修改为
  nod int := 1;

修改为
create view v_refresh_mv1 as 
select refresh_mv('role1'::name, mvname::name)::text from 
( 
select id, 'mv_user_membership_'||id mvname from generate_series(0,255) t(id) 
union all 
select id, 'mv_user_membership_rev_'||id mvname from generate_series(0,255) t(id) 
) t 
where mod(abs(id), 256) >= 64 and mod(abs(id), 256) <= 127;

初始化segment3

\c db2 role2

执行 ddl-1.sql  

修改为
  nod int := 2;

修改为
create view v_refresh_mv2 as 
select refresh_mv('role2'::name, mvname::name)::text from 
( 
select id, 'mv_user_membership_'||id mvname from generate_series(0,255) t(id) 
union all 
select id, 'mv_user_membership_rev_'||id mvname from generate_series(0,255) t(id) 
) t 
where mod(abs(id), 256) >= 128 and mod(abs(id), 256) <= 191;

初始化segment4

\c db3 role3

执行 ddl-1.sql  

修改为
  nod int := 3;

修改为
create view v_refresh_mv3 as 
select refresh_mv('role3'::name, mvname::name)::text from 
( 
select id, 'mv_user_membership_'||id mvname from generate_series(0,255) t(id) 
union all 
select id, 'mv_user_membership_rev_'||id mvname from generate_series(0,255) t(id) 
) t 
where mod(abs(id), 256) >= 192 and mod(abs(id), 256) <= 255;

2. 初始化masters (coordinators)
master节点只存储元数据,无状态,完全对称,可以横向或者。

初始化 master_1

导入外部表结构

\c mas1 digoal

create server db0 foreign data wrapper postgres_fdw OPTIONS (host '127.0.0.1', port '5281', dbname 'db0', use_remote_estimate 'true');
create server db1 foreign data wrapper postgres_fdw OPTIONS (host '127.0.0.1', port '5281', dbname 'db1', use_remote_estimate 'true');
create server db2 foreign data wrapper postgres_fdw OPTIONS (host '127.0.0.1', port '5281', dbname 'db2', use_remote_estimate 'true');
create server db3 foreign data wrapper postgres_fdw OPTIONS (host '127.0.0.1', port '5281', dbname 'db3', use_remote_estimate 'true');

create user mapping FOR digoal SERVER db0 options (user 'role0', password 'pwd');
create user mapping FOR digoal SERVER db1 options (user 'role1', password 'pwd');
create user mapping FOR digoal SERVER db2 options (user 'role2', password 'pwd');
create user mapping FOR digoal SERVER db3 options (user 'role3', password 'pwd');

import foreign schema role0 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db0 into digoal;
import foreign schema role1 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db1 into digoal;
import foreign schema role2 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db2 into digoal;
import foreign schema role3 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db3 into digoal;

构建约束

do language plpgsql $$
declare
  i int;
  x int := 0;
  y int := 255;
  sql text;
begin
  for i in x..y loop
      sql := 'alter foreign table userinfo_'||i||' add constraint ck_userinfo_'||i||' check(mod(abs(uid), 256) = '||i||' )';
      execute sql;
      sql := 'alter foreign table user_log_'||i||' add constraint ck_user_log_'||i||' check(mod(abs(uid), 256) = '||i||' )';
      execute sql;
      sql := 'alter foreign table user_membership_'||i||' add constraint ck_user_membership_'||i||' check(mod(abs(uid1), 256) = '||i||' )';
      execute sql;
      sql := 'alter foreign table user_membership_rev_'||i||' add constraint ck_user_membership_rev_'||i||' check(mod(abs(uid2), 256) = '||i||' )';
      execute sql;
      sql := 'alter foreign table mv_user_membership_'||i||' add constraint ck_mv_user_membership_'||i||' check(mod(abs(uid1), 256) = '||i||' )';
      execute sql;
      sql := 'alter foreign table mv_user_membership_rev_'||i||' add constraint ck_mv_user_membership_rev_'||i||' check(mod(abs(uid2), 256) = '||i||' )';
      execute sql;
  end loop;
end;
$$;

构建继承关系

create table userinfo(uid int8 primary key, info text, crt_time timestamp);

create table user_log(uid int8, logintime timestamp, logouttime timestamp, client_addr inet, client_port int);

create table user_membership(uid1 int8, uid2 int8, unique (uid1,uid2) );

create table mv_user_membership (uid1 int8, uid2s int8[]);

create table user_membership_rev(uid2 int8, uid1 int8, unique (uid2,uid1) );

create table mv_user_membership_rev (uid2 int8, uid1s int8[]);
do language plpgsql $$
declare
  i int;
  x int := 0;
  y int := 255;
  sql text;
begin
  for i in x..y loop
      sql := 'alter foreign table userinfo_'||i||' inherit userinfo';
      execute sql;
      sql := 'alter foreign table user_log_'||i||' inherit user_log';
      execute sql;
      sql := 'alter foreign table user_membership_'||i||' inherit user_membership';
      execute sql;
      sql := 'alter foreign table user_membership_rev_'||i||' inherit user_membership_rev';
      execute sql;
      sql := 'alter foreign table mv_user_membership_'||i||' inherit mv_user_membership';
      execute sql;
      sql := 'alter foreign table mv_user_membership_rev_'||i||' inherit mv_user_membership_rev';
      execute sql;
  end loop;
end;
$$;

构建插入触发器

对于业务需要插入数据的表,构建插入触发器,例子使用动态SQL,未优化效率。

userinfo 增删改查
新增用户、销毁用户、修改用户资料、查询用户资料

user_log 增查
新增用户登陆日志、查询用户日志

user_membership 增删查
新增用户关系,删除用户关系,查询用户关系

user_membership_rev 增删查
新增反向用户关系,删除反向用户关系,查询反向用户关系
create or replace function tg_userinfo_ins() returns trigger as $$
declare
  i int;
begin
  i := mod(abs(NEW.uid), 256);
  execute 'insert into userinfo_'||i||'(uid,info,crt_time) values ('||quote_nullable(NEW.uid)||','||quote_nullable(NEW.info)||','||quote_nullable(NEW.crt_time)||')';
  return null;
end;
$$ language plpgsql strict;

create or replace function tg_user_log_ins() returns trigger as $$
declare
  i int;
begin
  i := mod(abs(NEW.uid), 256);
  execute 'insert into user_log_'||i||'(uid,logintime,logouttime,client_addr,client_port) values ('||quote_nullable(NEW.uid)||','||quote_nullable(NEW.logintime)||','||quote_nullable(NEW.logouttime)||','||quote_nullable(NEW.client_addr)||','||quote_nullable(NEW.client_port)||')';
  return null;
end;
$$ language plpgsql strict;

create or replace function tg_user_membership_ins() returns trigger as $$
declare
  i int;
begin
  i := mod(abs(NEW.uid1), 256);
  execute 'insert into user_membership_'||i||'(uid1,uid2) values ('||quote_nullable(NEW.uid1)||','||quote_nullable(NEW.uid2)||')';
  return null;
end;
$$ language plpgsql strict;

create or replace function tg_user_membership_rev_ins() returns trigger as $$
declare
  i int;
begin
  i := mod(abs(NEW.uid2), 256);
  execute 'insert into user_membership_rev_'||i||'(uid2,uid1) values ('||quote_nullable(NEW.uid2)||','||quote_nullable(NEW.uid1)||')';
  return null;
end;
$$ language plpgsql strict;
create trigger tg_ins before insert on userinfo for each row execute procedure tg_userinfo_ins();
create trigger tg_ins before insert on user_log for each row execute procedure tg_user_log_ins();
create trigger tg_ins before insert on user_membership for each row execute procedure tg_user_membership_ins();
create trigger tg_ins before insert on user_membership_rev for each row execute procedure tg_user_membership_rev_ins();

初始化 master_n

从mas1克隆一个结构即可  

pg_dump -s -d mas1 -U digoal | psql -U postgres -d mas2 -f -  

如果元数据发生变更,注意同步master的meta信息。

可以使用逻辑复制的方式,复制出多个一样的master。

初始化测试数据

\c mas1 digoal

-- 生成用户信息
insert into userinfo select generate_series(1,100000), md5(random()::text), now();

-- 生成用户日志信息
insert into user_log select uid,now(),null,null,null from userinfo;

-- 生成uid1喜欢uid2信息
insert into user_membership select uid1, uid2 from (select trunc(100000*random()) uid1,trunc(100000*random()) uid2 from generate_series(1,100000)) t where uid1<>uid2 group by 1,2; 

-- 生成uid2被uid1喜欢信息
insert into user_membership_rev select uid2, uid1 from user_membership; 

-- 刷新物化视图
select count(*) from v_refresh_mv0 union all select count(*) from v_refresh_mv1 union all select count(*) from v_refresh_mv2 union all select count(*) from v_refresh_mv3 ;

业务SQL 设计

1. 所有SQL都需要带上分区条件,例如

where uid=? and mod(abs(uid), 256) = mod(abs(?), 256);

where uid1=? and mod(abs(uid1), 256) = mod(abs(?), 256);

where uid2=? and mod(abs(uid2), 256) = mod(abs(?), 256);

2. userinfo 增删改查
新增用户、销毁用户、修改用户资料、查询用户资料

insert into userinfo(uid,info,text) values (?,?,?);

delete from userinfo where uid=? and mod(abs(uid), 256) = mod(abs(?), 256);

update userinfo set xxx=xxx where uid=? and mod(abs(uid), 256) = mod(abs(?), 256);

select x,x,x from userinfo where uid=? and mod(abs(uid), 256) = mod(abs(?), 256);

3. user_log 增查
新增用户登陆日志、查询用户日志

insert into user_log(uid,logintime,logouttime,client_addr,client_port) values (?,?,?,?,?);

select x,x,x,x,x from user_log where uid=? and mod(abs(uid), 256) = mod(abs(?), 256);

4. user_membership 增删查
新增用户关系,删除用户关系,查询用户关系

insert into user_membership(uid1, uid2) values (?,?);

delete from user_membership where uid1=? and mod(abs(uid1), 256) = mod(abs(?), 256);

select x,x from user_membership where uid1=? and mod(abs(uid1), 256) = mod(abs(?), 256);

5. user_membership_rev 增删查
新增反向用户关系,删除反向用户关系,查询反向用户关系

insert into user_membership_rev(uid2, uid1) values (?,?);

delete from user_membership_rev where uid2=? and mod(abs(uid2), 256) = mod(abs(?), 256);

select x,x from user_membership_rev where uid2=? and mod(abs(uid2), 256) = mod(abs(?), 256);

6. 物化视图
mv_user_membership 刷新、查询
mv_user_membership_rev 刷新、查询
聚合物化视图主要是提高关系查询效率用的,刷新物化视图,查询物化视图

select count(*) from v_refresh_mv0 union all select count(*) from v_refresh_mv1 union all select count(*) from v_refresh_mv2 union all select count(*) from v_refresh_mv3 ;

select x,x from mv_user_membership where uid1=? and mod(abs(uid1), 256) = mod(abs(?), 256);

select x,x from mv_user_membership_rev where uid2=? and mod(abs(uid2), 256) = mod(abs(?), 256);

执行计划查看

1. 查看执行计划,SQL是否下推到对应节点执行,是否只访问目标数据节点。

查询
mas1=> explain verbose select * from userinfo where uid=100 and mod(abs(uid), 256) = mod(abs(100), 256);
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..102.19 rows=2 width=48)
   ->  Seq Scan on digoal.userinfo  (cost=0.00..0.00 rows=1 width=48)
         Output: userinfo.uid, userinfo.info, userinfo.crt_time
         Filter: ((userinfo.uid = 100) AND (mod(abs(userinfo.uid), '256'::bigint) = 100))
   ->  Foreign Scan on digoal.userinfo_100  (cost=100.15..102.19 rows=1 width=49)
         Output: userinfo_100.uid, userinfo_100.info, userinfo_100.crt_time
         Remote SQL: SELECT uid, info, crt_time FROM role1.userinfo_100 WHERE ((uid = 100)) AND ((mod(abs(uid), 256::bigint) = 100))
(7 rows)

排序
mas1=> explain verbose select * from user_membership where uid1=100 and mod(abs(uid1), 256) = mod(abs(100), 256) order by uid2;
                                                                               QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=102.33..102.34 rows=2 width=16)
   Output: user_membership.uid1, user_membership.uid2
   Sort Key: user_membership.uid2
   ->  Append  (cost=0.00..102.32 rows=2 width=16)
         ->  Seq Scan on digoal.user_membership  (cost=0.00..0.00 rows=1 width=16)
               Output: user_membership.uid1, user_membership.uid2
               Filter: ((user_membership.uid1 = 100) AND (mod(abs(user_membership.uid1), '256'::bigint) = 100))
         ->  Foreign Scan on digoal.user_membership_100  (cost=100.27..102.32 rows=1 width=16)
               Output: user_membership_100.uid1, user_membership_100.uid2
               Remote SQL: SELECT uid1, uid2 FROM role1.user_membership_100 WHERE ((uid1 = 100)) AND ((mod(abs(uid1), 256::bigint) = 100)) ORDER BY uid2 ASC NULLS LAST
(10 rows)

更新
mas1=> explain verbose update userinfo set info='new' where uid=100 and mod(abs(uid), 256) = mod(abs(100), 256);
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Update on digoal.userinfo  (cost=0.00..102.20 rows=2 width=54)
   Update on digoal.userinfo
   Foreign Update on digoal.userinfo_100
   ->  Seq Scan on digoal.userinfo  (cost=0.00..0.00 rows=1 width=54)
         Output: userinfo.uid, 'new'::text, userinfo.crt_time, userinfo.ctid
         Filter: ((userinfo.uid = 100) AND (mod(abs(userinfo.uid), '256'::bigint) = 100))
   ->  Foreign Update on digoal.userinfo_100  (cost=100.15..102.20 rows=1 width=54)
         Remote SQL: UPDATE role1.userinfo_100 SET info = 'new'::text WHERE ((uid = 100)) AND ((mod(abs(uid), 256::bigint) = 100))
(8 rows)

删除
mas1=> explain verbose delete from user_membership where uid1=1001 and mod(abs(uid1), 256) = mod(abs(1001), 256);
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Delete on digoal.user_membership  (cost=0.00..102.33 rows=2 width=6)
   Delete on digoal.user_membership
   Foreign Delete on digoal.user_membership_233
   ->  Seq Scan on digoal.user_membership  (cost=0.00..0.00 rows=1 width=6)
         Output: user_membership.ctid
         Filter: ((user_membership.uid1 = 1001) AND (mod(abs(user_membership.uid1), '256'::bigint) = 233))
   ->  Foreign Delete on digoal.user_membership_233  (cost=100.27..102.33 rows=1 width=6)
         Remote SQL: DELETE FROM role3.user_membership_233 WHERE ((uid1 = 1001)) AND ((mod(abs(uid1), 256::bigint) = 233))
(8 rows)

JOIN
mas1=> explain verbose select * from userinfo t1 join user_membership t2 on (t1.uid=t2.uid1) 
where 
t2.uid1=1001 and mod(abs(t2.uid1), 256) = mod(abs(1001), 256) 
and 
t1.uid=1001 and mod(abs(t1.uid), 256) = mod(abs(1001), 256);
                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..204.57 rows=4 width=64)
   Output: t1.uid, t1.info, t1.crt_time, t2.uid1, t2.uid2
   ->  Append  (cost=0.00..102.19 rows=2 width=48)
         ->  Seq Scan on digoal.userinfo t1  (cost=0.00..0.00 rows=1 width=48)
               Output: t1.uid, t1.info, t1.crt_time
               Filter: ((t1.uid = 1001) AND (mod(abs(t1.uid), '256'::bigint) = 233))
         ->  Foreign Scan on digoal.userinfo_233 t1_1  (cost=100.15..102.19 rows=1 width=49)
               Output: t1_1.uid, t1_1.info, t1_1.crt_time
               Remote SQL: SELECT uid, info, crt_time FROM role3.userinfo_233 WHERE ((uid = 1001)) AND ((mod(abs(uid), 256::bigint) = 233))
   ->  Materialize  (cost=0.00..102.33 rows=2 width=16)
         Output: t2.uid1, t2.uid2
         ->  Append  (cost=0.00..102.32 rows=2 width=16)
               ->  Seq Scan on digoal.user_membership t2  (cost=0.00..0.00 rows=1 width=16)
                     Output: t2.uid1, t2.uid2
                     Filter: ((t2.uid1 = 1001) AND (mod(abs(t2.uid1), '256'::bigint) = 233))
               ->  Foreign Scan on digoal.user_membership_233 t2_1  (cost=100.27..102.32 rows=1 width=16)
                     Output: t2_1.uid1, t2_1.uid2
                     Remote SQL: SELECT uid1, uid2 FROM role3.user_membership_233 WHERE ((uid1 = 1001)) AND ((mod(abs(uid1), 256::bigint) = 233))
(18 rows)

以上SQL满足业务需求。

对于SQL中带的用户新增的函数或其他条件,如果要下推,则必须使用extension来管理这些函数,master与所有的segment都需要创建这些extension,并且在创建server时包含这些extension。

语法见我上一篇讲postgres_fdw的文档。

全局约束设计

1. 全局唯一序列

步调一致,起始值不一致。

create sequence seq increment by 10000 start with 0;
create sequence seq increment by 10000 start with 1;
create sequence seq increment by 10000 start with 2;
......
create sequence seq increment by 10000 start with 255;

master横向扩展

由于master只存储了META数据,因此master 都是对等的,如果master成为瓶颈的话,我们可以新增master节点,上层使用负载均衡的方式连接多个master即可。

通常master不会成为瓶颈,因为master不参与计算,对于9.6来说,单节点的master已经可以处理百万级别的QPS(基于PK的查询)。

pic1

fdw的一些使用限制

1. 外部表暂时不支持truncate

mas1=> truncate userinfo ;
ERROR:  42809: "userinfo_0" is not a table
LOCATION:  truncate_check_rel, tablecmds.c:1300
Time: 5.950 ms

2. 外部表暂时不支持透传upsert

mas1=> insert into userinfo values (1,'test',now()) on conflict (uid) do update set info=excluded.info;
ERROR:  23505: duplicate key value violates unique constraint "userinfo_1_pkey"
DETAIL:  Key (uid)=(1) already exists.
CONTEXT:  Remote SQL command: INSERT INTO role0.userinfo_1(uid, info, crt_time) VALUES ($1, $2, $3)
SQL statement "insert into userinfo_1(uid,info,crt_time) values ('1','test','2016-10-05 16:50:15.794812')"
PL/pgSQL function tg_userinfo_ins() line 6 at EXECUTE
LOCATION:  pgfdw_report_error, connection.c:574
Time: 0.931 ms

3. 目前没有办法过滤主表的扫描(除非使用 https://github.com/postgrespro/pg_pathman/ 插件)

mas1=> explain verbose select * from user_membership where uid1=100 and mod(abs(uid1), 256) = mod(abs(100), 256) order by uid2;
                                                                               QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=102.33..102.34 rows=2 width=16)
   Output: user_membership.uid1, user_membership.uid2
   Sort Key: user_membership.uid2
   ->  Append  (cost=0.00..102.32 rows=2 width=16)
         ->  Seq Scan on digoal.user_membership  (cost=0.00..0.00 rows=1 width=16)
               Output: user_membership.uid1, user_membership.uid2
               Filter: ((user_membership.uid1 = 100) AND (mod(abs(user_membership.uid1), '256'::bigint) = 100))
         ->  Foreign Scan on digoal.user_membership_100  (cost=100.27..102.32 rows=1 width=16)
               Output: user_membership_100.uid1, user_membership_100.uid2
               Remote SQL: SELECT uid1, uid2 FROM role1.user_membership_100 WHERE ((uid1 = 100)) AND ((mod(abs(uid1), 256::bigint) = 100)) ORDER BY uid2 ASC NULLS LAST
(10 rows)

简单场景未包含需要在多个segment之间复制的维度表,小表。

下一篇将为大家介绍

二、复杂场景设计

三、单元化

负载均衡  
  
对等  
  
单元化  

大表,水平拆分  

小表,但是需要与其他表JOIN操作,逻辑复制   

维度表,复制  

一些单元化相关的表,可以垂直拆分或隔离,按单元拆分  

同时需要考虑全局约束,序列步调等  

还将包括,复制怎么做?  

如何避免环路,如何解决冲突。    

Flag Counter

digoal’s 大量PostgreSQL文章入口