PostgreSQL HOT STANDBY using log shipping

8 minute read

背景

PostgreSQL HOT STANDBY by log shipping

一、准备硬件

1. 主节点硬件配置

DISK : 146GB*6  
MEM : 14GB  
CPU : 2.83GHz*8  

2. standby节点硬件配置

DISK : 146GB*4  
MEM : 8GB  
CPU : 2.0GHz*8  

二、准备环境

1. 系统

Red Hat Enterprise Linux Server release 5.5 (Tikanga) x64  

2. 时钟同步

8 * * * * /usr/sbin/ntpdate asia.pool.ntp.org && /sbin/hwclock --systohc  

3. 配置目录

mkdir -p /database/pgdata/tbs1  
mkdir -p /database/pgdata/tbs2  
mkdir -p /database/pgdata/tbs3  
mkdir -p /database/pgdata/tbs4  
mkdir -p /database/pgdata/tbs5  
  
fdisk  
  
mkfs.ext3  
  
mount /dev/cciss/c0d1p1 /database/pgdata/tbs1  
mount /dev/cciss/c0d2p1 /database/pgdata/tbs2  
mount /dev/cciss/c0d3p1 /database/pgdata/tbs3  
mount /dev/cciss/c0d4p1 /database/pgdata/tbs4  
mount /dev/cciss/c0d5p1 /database/pgdata/tbs5  

master节点:

[root@db-172-16-3-33 ~]# df -h  
Filesystem            Size  Used Avail Use% Mounted on  
/dev/cciss/c0d0p1      31G  8.1G   21G  29% /  
/dev/cciss/c0d0p3      88G  1.7G   81G   3% /opt  
tmpfs                 6.9G     0  6.9G   0% /dev/shm  
/dev/cciss/c0d1p1     135G   76M  128G   1% /database/pgdata/tbs1  
/dev/cciss/c0d2p1     135G  6.1G  122G   5% /database/pgdata/tbs2  
/dev/cciss/c0d3p1     135G  3.3G  125G   3% /database/pgdata/tbs3  
/dev/cciss/c0d4p1     135G  5.6G  123G   5% /database/pgdata/tbs4  
/dev/cciss/c0d5p1     135G   16G  113G  13% /database/pgdata/tbs5  

slave节点:

Filesystem            Size  Used Avail Use% Mounted on  
/dev/sda1              31G  3.5G   26G  13% /  
/dev/sda3              94G  386M   89G   1% /opt  
tmpfs                 3.9G     0  3.9G   0% /dev/shm  
/dev/sdb1             134G   76M  128G   1% /database/pgdata/tbs1  
/dev/sdc1             134G  188M  127G   1% /database/pgdata/tbs2  
/dev/sdd1             134G  2.9G  125G   3% /database/pgdata/tbs3  
172.16.3.33:/database/pgdata/tbs4  
                      135G  5.6G  123G   5% /database/pgdata/tbs4  

fstab

vi /etc/fstab  

4. 在主节点配置nfs,将wal归档目录export出去,

(这里没有使用中央日志服务器,有条件的话还是需要一个比较大的日志服务器为好,以便支持更多的slave节点)

/database/pgdata/tbs4 172.16.3.39/32(rw,no_root_squash,sync)  

slave节点mount这个目录.

确保master节点和slave节点的postgres用户gid uid相同,否则可能有权限的问题.

5. 配置sysctl内核参数, limits.conf资源使用限制等

kernel.shmmni = 4096  
kernel.sem = 501000 6412800000 501000 12800  
fs.file-max = 767246  
net.ipv4.ip_local_port_range = 1024 65000  
net.core.rmem_default = 1048576  
net.core.rmem_max = 1048576  
net.core.wmem_default = 262144  
net.core.wmem_max = 262144  
net.ipv4.tcp_tw_recycle=1   
net.ipv4.tcp_max_syn_backlog=4096   
net.core.netdev_max_backlog=10000  
vm.overcommit_memory=0  
net.ipv4.ip_conntrack_max=655360  
  
  
  
*  soft    nofile  131072  
*  hard    nofile  131072  
*  soft    nproc   131072  
*  hard    nproc   131072  
*  soft    core    unlimited  
*  hard    core    unlimited  
*  soft    memlock 50000000  
*  hard    memlock 50000000  

6. 配置系统服务

chkconfig --level 35 nfs on  
chkconfig --level 35 portmap pn  

7. 配置防火墙

vi /etc/sysconfig/iptables  

8. 升级操作系统补丁,驱动等

三、安装PostgreSQL 9.0.2

1. postgres user profile:

export PS1="$USER@`/bin/hostname -s`-> "  
export PGPORT=1921  
export PGDATA=/database/pgdata/tbs1/pg_root  
export PGARCHIVE=/database/pgdata/tbs4/pg_arch  
  
export LANG=en_US.utf8  
  
export PGHOME=/opt/pgsql  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib  
export DATE=`date +"%Y%m%d%H%M"`  
export PATH=$PGHOME/bin:$PATH:.  
export MANPATH=$PGHOME/share/man:$MANPATH  
alias rm='rm -i'  
alias ll='ls -lh'  

2. 配置数据库相关目录

2.1 log

  /var/applog/pg_log  

2.2 pghome

  /opt/pgsql  

2.3 pgdata

  /database/pgdata/tbs1/pg_root  

2.4 pgarchive

  /database/pgdata/tbs4/pg_arch  

3. 初始化数据库

initdb -D /database/pgdata/tbs1/pg_root -E UTF8 --locale=C -U postgres -X /database/pgdata/tbs2/pg_xlog -W   

四、配置master节点

1. pg_hba.conf

# "local" is for Unix domain socket connections only  
local   all             all                                     trust  
# IPv4 local connections:  
host    all             all             127.0.0.1/32            trust  
# IPv6 local connections:  
# host    all             all             ::1/128                 trust  
host all all 0.0.0.0/0  md5  

2. postgresql.conf

listen_addresses = '*'          # what IP address(es) to listen on;  
port = 1921                             # (change requires restart)  
max_connections = 2000                  # (change requires restart)  
unix_socket_directory = '/database/pgdata/tbs1/pg_root'         # (change requires restart)  
unix_socket_permissions = 0700          # begin with 0 to use octal notation  
password_encryption = on  
shared_buffers = 2048MB                 # min 128kB  
maintenance_work_mem = 2048MB           # min 1MB  
max_stack_depth = 8MB                   # min 100kB  
wal_level = hot_standby                 # minimal, archive, or hot_standby  
synchronous_commit = off                # immediate fsync at commit  
wal_sync_method = fdatasync             # the default is the first option   
wal_buffers = 128000kB                  # min 32kB  
wal_writer_delay = 20ms                 # 1-10000 milliseconds  
checkpoint_segments = 64                # in logfile segments, min 1, 16MB each  
checkpoint_timeout = 30min              # range 30s-1h  
archive_mode = on               # allows archiving to be done  
archive_command = 'cp %p $PGARCHIVE/%f'         # command to use to archive a logfile segment  
max_wal_senders = 30            # max number of walsender processes  
random_page_cost = 2.0                  # same scale as above  
effective_cache_size = 12800MB  
constraint_exclusion = partition        # on, off, or partition  
log_destination = 'csvlog'              # Valid values are combinations of  
logging_collector = on          # Enable capturing of stderr and csvlog  
log_directory = '/var/applog/pg_log'            # directory where log files are written,  
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,  
log_truncate_on_rotation = on           # If on, an existing log file of the  
log_rotation_age = 1d                   # Automatic rotation of logfiles will  
log_rotation_size = 10MB                # Automatic rotation of logfiles will   
log_min_duration_statement = 1000ms     # -1 is disabled, 0 logs all statements  
log_checkpoints = on  
log_lock_waits = on                     # log lock waits >= deadlock_timeout  
log_statement = 'ddl'                   # none, ddl, mod, all  
track_activity_query_size = 2048        # (change requires restart)  
autovacuum = on                 # Enable autovacuum subprocess?  'on'   
log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and  
check_function_bodies = on  
bytea_output = 'escape'                 # hex, escape  
datestyle = 'iso, mdy'  
lc_messages = 'C'                       # locale for system error message  
lc_monetary = 'C'                       # locale for monetary formatting  
lc_numeric = 'C'                        # locale for number formatting  
lc_time = 'C'                           # locale for time formatting  
default_text_search_config = 'pg_catalog.english'  
deadlock_timeout = 1s  

3. 启动主节点.

五、传输基础文件至slave节点,模拟一个正在运行的数据库生成复制库的操作.

1. on the master

select pg_start_backup('replication backup');  

2. on the master

scp -R $PGDATA $SLAVE_IP:$PGDATA  

3. on the master

select pg_stop_backup();  

六、配置slave节点

1. on the slave

chown -R postgres:postgres $PGDATA  
su - postgres  
cd $PGDATA  
rm postmaster.pid  
rm -rf pg_xlog  
ln -s /database/pgdata/tbs2/pg_xlog ./pg_xlog  

2. 配置postgresql.conf

listen_addresses = '*'          # what IP address(es) to listen on;  
port = 1921                             # (change requires restart)  
max_connections = 2000                  # (change requires restart)  
unix_socket_directory = '/database/pgdata/tbs1/pg_root'         # (change requires restart)  
unix_socket_permissions = 0700          # begin with 0 to use octal notation  
password_encryption = on  
shared_buffers = 2048MB                 # min 128kB  
maintenance_work_mem = 2048MB           # min 1MB  
max_stack_depth = 8MB                   # min 100kB  
wal_level = hot_standby                 # minimal, archive, or hot_standby  
synchronous_commit = off                # immediate fsync at commit  
wal_sync_method = fdatasync             # the default is the first option   
wal_buffers = 128000kB                  # min 32kB  
wal_writer_delay = 20ms                 # 1-10000 milliseconds  
checkpoint_segments = 64                # in logfile segments, min 1, 16MB each  
checkpoint_timeout = 30min              # range 30s-1h  
archive_mode = on               # allows archiving to be done  
archive_command = 'cp %p $PGARCHIVE/%f'         # command to use to archive a logfile segment  
max_wal_senders = 30            # max number of walsender processes  
hot_standby = off                        # 这个参数在初始化slave的时候关闭是比较明智的选择,在初始同步完成后在开启  
random_page_cost = 2.0                  # same scale as above  
effective_cache_size = 12800MB  
constraint_exclusion = partition        # on, off, or partition  
log_destination = 'csvlog'              # Valid values are combinations of  
logging_collector = on          # Enable capturing of stderr and csvlog  
log_directory = '/var/applog/pg_log'            # directory where log files are written,  
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,  
log_truncate_on_rotation = on           # If on, an existing log file of the  
log_rotation_age = 1d                   # Automatic rotation of logfiles will  
log_rotation_size = 10MB                # Automatic rotation of logfiles will   
log_min_duration_statement = 1000ms     # -1 is disabled, 0 logs all statements  
log_checkpoints = on  
log_lock_waits = on                     # log lock waits >= deadlock_timeout  
log_statement = 'ddl'                   # none, ddl, mod, all  
track_activity_query_size = 2048        # (change requires restart)  
autovacuum = on                 # Enable autovacuum subprocess?  'on'   
log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and  
check_function_bodies = on  
bytea_output = 'escape'                 # hex, escape  
datestyle = 'iso, mdy'  
lc_messages = 'C'                       # locale for system error message  
lc_monetary = 'C'                       # locale for monetary formatting  
lc_numeric = 'C'                        # locale for number formatting  
lc_time = 'C'                           # locale for time formatting  
default_text_search_config = 'pg_catalog.english'  
deadlock_timeout = 1s  

3. 配置recovery.conf

restore_command = 'cp $PGARCHIVE/%f %p'         # e.g. 'cp /mnt/server/archivedir/%f %p'  
archive_cleanup_command = 'pg_archivecleanup $PGARCHIVE %r'  
standby_mode = 'on'  
trigger_file = '/database/pgdata/tbs1/pg_root/postgresql.trigger.1921'  

4. 启动slave节点

启动完后,可以通过top看到slave节点在拼命的恢复pg_start_backup以来的所有wal.

恢复完后修改hot_standby = on,重启slave节点

七、测试

1. (on master)新建用户

create role digoal nosuperuser login encrypted password 'digoal';  

2. 新建表空间

on master  
su - postgres  
mkdir /database/pgdata/tbs3/tbs_test  
on slave  
su - postgres  
mkdir /database/pgdata/tbs3/tbs_test  
on master  
create tablespace tbs_digoal owner test location '/database/pgdata/tbs3/tbs_digoal';  

3. (on master)新建数据库

create database digoal with owner digoal template template0 encoding 'UTF8' tablespace tbs_digoal;  

4. (on master)新建schema

\c digoal digoal  
create schema digoal authorization digoal;  

5. (on master)新建表

\c digoal digoal  
create table tbl_users (id int8 , nick varchar(32));  

6. (on master)插入测试数据

insert into tbl_users select generate_series(1,10000000),'digoal';  
由于插入数据量比较大,可以很明显的看到pg_arch目录中的WAL在增加,如  
-rw------- 1 postgres postgres      254 Dec 30 15:17 000000010000000000000004.00000020.backup  
-rw------- 1 postgres postgres 67108864 Dec 30 15:19 000000010000000100000024  
-rw------- 1 postgres postgres 67108864 Dec 30 15:19 000000010000000100000025  
生成WAL后,slave节点又开始recover事件,recover完成后等待下一个wal如:  
25456 postgres  18   0 2389m 1364  736 S  0.0  0.0   0:00.00 postgres: startup process   waiting for 00000001000000010000000E   

7. (on master)使用DDL测试冲突

on master

alter table tbl_users add column first_name default 'zhou';  
alter table tbl_users add column last_name default 'digoal';  

在slave恢复期间,在slave节点执行 select count(*) from tbl_users;发生等待事件.

on slave

/var/applog/pg_log中查看最近一个日志文件,  
2010-12-30 15:04:01.462 CST,"digoal","digoal",25240,"127.0.0.1:43079",4d1c2edf.6298,1,"SELECT waiting",2010-12-30 15:03:59 CST,2/14,0,LOG,00000,"process 25240 still waiting for AccessShareLock on relation 16388 of database 16386 after 1000.564 ms",,,,,,"select count(*) from tbl_users;",22,,"psql"  

如果数据库没有其他操作了,不再发生ARCHIVE操作时,你可能会发现主节点已经alter完了,slave节点还是在等待.

原因是alter完的log信息存在的XLOG还没有发生归档,slave节点会一直等待下去(这时可以手工执行pg_switch_xlog).

8. (on master)测试checkpoint

在PostgreSQL中发生checkpoint后,在此之前的WAL在做数据库恢复时就用不到了,因为确保数据都写入数据文件了.

pg_archivecleanup也是根据checkpoint来判断和删除不需要的WAL的.

9. (on slave)测试cleanarchive

在做checkpoint前,去看$PGARCHIVE目录,已经被apply的文件还存在,并没有被pg_archivecleanup命令清除掉,原因就是这些文件是最近一次checkpoint以来的WAL文件,在数据库恢复时是需要用到的.

如果你手工执行pg_archivecleanup $PGARCHIVE 000000010000000200000031 (假设000000010000000200000031这个是在$PGARCHIVE中的一个WAL的文件名)

这条命令将删除000000010000000200000031以前生成的所有WAL文件,一定要小心操作,万一不小心把最近一次CHECKPOINT以来的WAL删除了,

补救的方法是赶紧到master上做一次checkpoint,让slave知道这次checkpoint,否则的话下次slave启动还会读到000000010000000200000031这个文件以前的文件,那时候就只能找到这些文件或重建slave了.

10. (on slave)测试active slave

激活SLAVE很简单,了解到已经apply了最新的WAL后,执行以下

su - postgres  
touch /database/pgdata/tbs1/pg_root/postgresql.trigger.1921  

数据库会触发激活的动作,激活后/database/pgdata/tbs1/pg_root/postgresql.trigger.1921这个文件会自动删掉,并且recovery.conf被重命名为recovery.done.

激活后的slave不可逆转为slave了.需要重建.

11. 监控

pg_current_xlog_insert_location  
pg_current_xlog_location  
pg_last_xlog_receive_location  
pg_last_xlog_replay_location  
top  
CREATE OR REPLACE VIEW pg_stat_replication AS  
    SELECT  
            S.procpid,  
            S.usesysid,  
            U.rolname AS usename,  
            S.application_name,  
            S.client_addr,  
            S.client_port,  
            S.backend_start  
    FROM pg_stat_get_activity(NULL) AS S, pg_authid U  
    WHERE S.usesysid = U.oid AND S.datid = 0;  

八、附pgctl.sh脚本

#!/bin/bash  
  
# environment.  
# Get the aliases and functions  
if [ -f ~/.bashrc ]; then  
        . ~/.bashrc  
fi  
  
# User specific environment and startup programs  
  
export PGHOME=/opt/pgsql  
export PATH=$PGHOME/bin:$PATH  
export PGDATA=/database/pgdata/tbs1/pg_root  
export PGPORT=1921  
export LANG='en_US.utf8'  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib  
  
RETVAL=1  
  
start() {  
su - postgres -c "/usr/bin/nohup $PGHOME/bin/postgres -D $PGDATA -p $PGPORT >/dev/null 2>>/var/applog/pg_log/start_err.log  </dev/null &"  
         RETVAL=$?  
         return $RETVAL  
}  
  
stop() {  
su - postgres -c "$PGHOME/bin/pg_ctl stop -D $PGDATA -m fast"  
         RETVAL=$?  
         return $RETVAL  
}  
  
reload() {  
su - postgres -c "$PGHOME/bin/pg_ctl reload -D $PGDATA"  
         RETVAL=$?  
         return $RETVAL  
}  
  
# See how we were called.  
case "$1" in  
  start)  
        start  
        ;;  
  stop)  
        stop  
        ;;  
  restart)  
        stop  
        start  
        ;;  
  reload)  
        reload  
        ;;  
  *)  
        echo $"Usage: $prog {start|stop|restart|reload}"  
        exit 2  
esac  
  
exit $RETVAL  
# Auth Digoal.Zhou  
# Corp. Sky-Mobi  

久、其他

1. 自9.0以后,PostgreSQL引入了一个叫pg_archivecleanup的模块,简化了standby的配置.以前通过pg_standby来实现的.

2. 第二象限开发的基于PostgreSQL 内部复制的产品

http://projects.2ndquadrant.com/repmgr

3. other blog

PostgreSQL 9.1 Allow standby recovery to switch to a new timeline automatically

http://blog.163.com/digoal@126/blog/static/163877040201182395310376/

PostgreSQL 9.2 devel adding cascading replication support

http://blog.163.com/digoal@126/blog/static/1638770402012012361519/

PostgreSQL HOT STANDBY using Stream

http://blog.163.com/digoal@126/blog/static/16387704020110442050808/

PostgreSQL cluster role switchover between primary and standby

http://blog.163.com/digoal@126/blog/static/163877040201141154024306/

We can ignore the performance influence when use sync replication in PostgreSQL 9.1

http://blog.163.com/digoal@126/blog/static/163877040201192203458765/

PostgreSQL 9.1 Replication role privilege change to REPLICATION from SUPERUSER

http://blog.163.com/digoal@126/blog/static/16387704020114112379185/

PostgreSQL 9.0.2 Replication Best Practices

http://blog.163.com/digoal@126/blog/static/1638770402010113034232645/

PostgreSQL replication monitor

http://blog.163.com/digoal@126/blog/static/163877040201141134748660/

New replication mode: async, write, fsync, replay

http://blog.163.com/digoal@126/blog/static/16387704020121231117557/

PostgreSQL HOT STANDBY using log shipping

http://blog.163.com/digoal@126/blog/static/1638770402010113053825671/

Flag Counter

digoal’s 大量PostgreSQL文章入口