PostgreSQL PGCluster-II like Oracle’s RAC can used in PG-XC or other distributed db based pg improve DBsys’s Availbility
背景
昨天和一位朋友聊天, 得知他们想在PostgreSQL 中加入类似Oracle RAC的功能,
这个功能的好处之一: 当一台主机DOWN掉后, 可以平滑的切到另一台主机, 不需要中断会话, 中断未提交的事务等.
当然, 不中断事务的话, 可能对性能影响较大, 因为事务信息必须在多个主机间同步.
不中断会话对性能的影响没有那么大, 除非是短连接的业务.
其他的功能也可以建立在此基础之上, 例如数据库负载均衡, 利用主机的计算能力并行处理等. (当然也需要考虑类似Oracle rac的gc buffer带来的影响问题, 如果使用infiniband这样的设备可能可以减轻交互带来的影响)
如果能实现的话, 对于分布式数据库系统也是非常有帮助的, 因为分布式数据库的HA比较麻烦, 例如PG-XC, 当一个数据节点挂掉之后, 实际上会对整个分布式数据库带来影响, 一般可以通过流复制, shared disk HA等方法来提高datanode的可用性, 但是不管使用流复制HA还是shared disk HA, 都有一个问题: 已经建立的会话肯定是要重新建立的, 正在运行的事务肯定是会失败的.
如果PostgreSQL有类似Oracle RAC的特性, 那么就可以将基于PostgreSQL的分布式数据库如PG-XC的可用性提高一个层次.
以前有一个项目叫PGCluster-II, 和Oracle RAC的思路非常相似, 但是这个项目现在好像没有踪迹了.
有兴趣的朋友可以阅读一下以下paper.
http://www.pgcon.org/2007/schedule/events/6.en.html
http://www.pgcon.org/2007/schedule/attachments/1-pgcluster-II.pdf
http://www.pgcon.org/2007/schedule/attachments/26-pgcluster2pgcon.pdf
http://www.emm.usp.br/downloads/pg/pgcluster.pdf
[
[
PGCluster-II
Clustering system of PostgreSQL that is using shared data
PGCluster-II is a shared-disk based clustering system for PostgreSQL, implemented with virtual IPC for session and lock management. It supports both high availability and load balancing needs.
For the past 8 years, Atsushi Mitani has developed a number of web applications which were used PostgreSQL as the backend database. To support high availability for these sites, he developed PGCluster, a multi-master and synchronous replication system for PostgreSQL.
However, PGCluster did not help with load-balancing data writes and new web applications are more write-intensive, as well as requiring more links between disparate data in the database. PGCluster-II is engineered to support load balancing and high availability for this new generation of web applications.
Atsushi will explain how PGCluster is designed and show you how to install and test it.