PostgreSQL Linux 下 僵尸状态的处理
背景
在某些非常特殊的情况下,可能遇到数据库无法停库,也无法连接(几乎处于不可用状态)的情况。即使使用-m immediate这样的暴力停库都无法停下来。
例如,当数据库的某些用户进程进入了defunct状态时。
2951 20411 37111 0 14:28 ? 00:00:00 postgres: digoal postgres [local] idle
2951 37111 1 0 2017 ? 00:05:48 /home/digoal/pgsql/bin/postgres -D /home/digoal/pgdata
2951 37121 37111 0 2017 ? 00:01:27 [postgres] <defunct>
2951 37128 37111 0 2017 ? 00:09:09 [postgres] <defunct>
2951 37129 37111 0 2017 ? 00:01:28 postgres: stats collector process
2951 38325 37111 0 19:33 ? 00:00:00 [postgres] <defunct>
.................
37111是PG的主进程,其他是它的子进程。
此时使用如下方法都无法停库。
pg_ctl stop -m fast -D /home/digoal/pgdata
pg_ctl stop -m immediate -D /home/digoal/pgdata
Linux 处理 Z 僵尸进程
如何kill杀掉linux系统中的僵尸defunct进程
在 Unix系统管理中,当用ps命令观察进程的执行状态时,经常看到某些进程的状态栏为defunct,这就是所谓的“僵尸”进程。“僵尸”进程是一个早已 死亡的进程,但在进程表(processs table)中仍占了一个位置(slot)。由于进程表的容量是有限的,所以,defunct进程不仅占用系统的内存资源,影响系统的性能,而且如果其数 目太多,还会导致系统瘫痪。
我们知道,每个Unix进程在进程表里都有一个进入点(entry),核心程序执行该进程时使用到的一切信息都存储在进入点。当用ps命令察看系统中的进程信息时,看到的就是进程表中的相关数据。当以fork()系统调用建立一个新的进程后,核心进程就会在进程表中给这个新进程分配一个进入点,然后将相关信息存储在该进入点所对应的进程表内。这些信息中有一项是其父进程的识别码。当这个进程走完了自己的生命周期后,它会执行exit()系统调用,此时原来进 程表中的数据会被该进程的退出码(exit code)、执行时所用的CPU时间等数据所取代,这些数据会一直保留到系统将它传递给它的父进程为止。由此可见,defunct进程的出现时间是在子进 程终止后,但是父进程尚未读取这些数据之前。
defunct进程是不能直接kill -9杀掉的,否则就不叫僵尸进程了。
知道了defunct进程产生的原因,就可以轻易的kill掉defunct进程。
方法有二:
1,重启服务器电脑,这个是最简单,最易用的方法,但是如果你服务器电脑上运行有其他的程序,那么这个方法,代价很大。所以,尽量使用下面一种方法。
2,找到该defunct僵尸进程的父进程,将该进程的父进程杀掉,则此defunct进程将自动消失。如何找到defunct僵尸进程的父进程?很简单,一句命令就够了:ps -ef | grep defunct_process_pid。 |
PostgreSQL 进入僵尸状态后的处理
如开篇提到的CASE,如何把PG停下来呢?
分为三个级别的操作,请依次尝试,最后万不得已使用第三种方法。
1、温柔停库
pg_ctl stop -m fast -D /home/digoal/pgdata
2、紧急快速停库(不刷盘,通常用于紧急停电只有UPS的处理)
pg_ctl stop -m immediate -D /home/digoal/pgdata
3、KILL清理
首先要查看启动数据库的主进程
cd /home/digoal/pgdata
cat postmaster.pid
37111
/home/digoal/pgdata
1511749309
1921
/tmp
*
1921001 2007007232
1、杀掉postmaster 进程的所有子进程
ps -ewf|grep 37111|grep -v grep|awk '{print "kill -9 "$2}'|grep -v 37111
2、杀掉postmaster 进程
kill -9 37111
3、清理shared memory和sem
$ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x002dcaa9 2007007232 digoal 600 48 0
------ Semaphore Arrays --------
key semid owner perms nsems
0x0002243f 65538 admin 666 2
0x002dcaa9 1912963075 digoal 600 17
0x002dcaaa 1912995844 digoal 600 17
0x002dcaab 1913028613 digoal 600 17
0x002dcaac 1913061382 digoal 600 17
0x002dcaad 1913094151 digoal 600 17
0x002dcaae 1913126920 digoal 600 17
0x002dcaaf 1913159689 digoal 600 17
0x002dcab0 1913192458 digoal 600 17
0x002dcab1 1913225227 digoal 600 17
0x002dcab2 1913257996 digoal 600 17
0x002dcab3 1913290765 digoal 600 17
0x002dcab4 1913323534 digoal 600 17
0x002dcab5 1913356303 digoal 600 17
0x002dcab6 1913389072 digoal 600 17
0x002dcab7 1913421841 digoal 600 17
0x002dcab8 1913454610 digoal 600 17
0x002dcab9 1913487379 digoal 600 17
0x002dcaba 1913520148 digoal 600 17
0x002dcabb 1913552917 digoal 600 17
0x002dcabc 1913585686 digoal 600 17
0x002dcabd 1913618455 digoal 600 17
0x002dcabe 1913651224 digoal 600 17
0x002dcabf 1913683993 digoal 600 17
0x002dcac0 1913716762 digoal 600 17
0x002dcac1 1913749531 digoal 600 17
0x002dcac2 1913782300 digoal 600 17
0x002dcac3 1913815069 digoal 600 17
0x002dcac4 1913847838 digoal 600 17
0x002dcac5 1913880607 digoal 600 17
0x002dcac6 1913913376 digoal 600 17
0x002dcac7 1913946145 digoal 600 17
0x002dcac8 1913978914 digoal 600 17
0x002dcac9 1914011683 digoal 600 17
0x002dcaca 1914044452 digoal 600 17
0x002dcacb 1914077221 digoal 600 17
0x002dcacc 1914109990 digoal 600 17
0x002dcacd 1914142759 digoal 600 17
0x002dcace 1914175528 digoal 600 17
0x002dcacf 1914208297 digoal 600 17
0x002dcad0 1914241066 digoal 600 17
0x002dcad1 1914273835 digoal 600 17
0x002dcad2 1914306604 digoal 600 17
0x002dcad3 1914339373 digoal 600 17
0x002dcad4 1914372142 digoal 600 17
0x002dcad5 1914404911 digoal 600 17
0x002dcad6 1914437680 digoal 600 17
0x002dcad7 1914470449 digoal 600 17
0x002dcad8 1914503218 digoal 600 17
0x002dcad9 1914535987 digoal 600 17
0x002dcada 1914568756 digoal 600 17
0x002dcadb 1914601525 digoal 600 17
0x002dcadc 1914634294 digoal 600 17
0x002dcadd 1914667063 digoal 600 17
0x002dcade 1914699832 digoal 600 17
0x002dcadf 1914732601 digoal 600 17
0x002dcae0 1914765370 digoal 600 17
0x002dcae1 1914798139 digoal 600 17
0x002dcae2 1914830908 digoal 600 17
0x002dcae3 1914863677 digoal 600 17
0x002dcae4 1914896446 digoal 600 17
0x002dcae5 1914929215 digoal 600 17
0x002dcae6 1914961984 digoal 600 17
0x002dcae7 1914994753 digoal 600 17
0x002dcae8 1915027522 digoal 600 17
0x002dcae9 1915060291 digoal 600 17
0x002dcaea 1915093060 digoal 600 17
0x002dcaeb 1915125829 digoal 600 17
0x002dcaec 1915158598 digoal 600 17
0x002dcaed 1915191367 digoal 600 17
0x002dcaee 1915224136 digoal 600 17
0x002dcaef 1915256905 digoal 600 17
0x002dcaf0 1915289674 digoal 600 17
0x002dcaf1 1915322443 digoal 600 17
0x002dcaf2 1915355212 digoal 600 17
0x002dcaf3 1915387981 digoal 600 17
0x002dcaf4 1915420750 digoal 600 17
0x002dcaf5 1915453519 digoal 600 17
0x002dcaf6 1915486288 digoal 600 17
0x002dcaf7 1915519057 digoal 600 17
0x002dcaf8 1915551826 digoal 600 17
0x002dcaf9 1915584595 digoal 600 17
0x002dcafa 1915617364 digoal 600 17
0x002dcafb 1915650133 digoal 600 17
0x002dcafc 1915682902 digoal 600 17
0x002dcafd 1915715671 digoal 600 17
0x002dcafe 1915748440 digoal 600 17
0x002dcaff 1915781209 digoal 600 17
0x002dcb00 1915813978 digoal 600 17
0x002dcb01 1915846747 digoal 600 17
0x002dcb02 1915879516 digoal 600 17
0x002dcb03 1915912285 digoal 600 17
0x002dcb04 1915945054 digoal 600 17
0x002dcb05 1915977823 digoal 600 17
0x002dcb06 1916010592 digoal 600 17
0x002dcb07 1916043361 digoal 600 17
0x002dcb08 1916076130 digoal 600 17
0x002dcb09 1916108899 digoal 600 17
0x002dcb0a 1916141668 digoal 600 17
0x002dcb0b 1916174437 digoal 600 17
0x002dcb0c 1916207206 digoal 600 17
0x002dcb0d 1916239975 digoal 600 17
0x002dcb0e 1916272744 digoal 600 17
0x002dcb0f 1916305513 digoal 600 17
0x002dcb10 1916338282 digoal 600 17
0x002dcb11 1916371051 digoal 600 17
0x002dcb12 1916403820 digoal 600 17
0x002dcb13 1916436589 digoal 600 17
0x002dcb14 1916469358 digoal 600 17
0x002dcb15 1916502127 digoal 600 17
0x002dcb16 1916534896 digoal 600 17
0x002dcb17 1916567665 digoal 600 17
0x002dcb18 1916600434 digoal 600 17
0x002dcb19 1916633203 digoal 600 17
0x002dcb1a 1916665972 digoal 600 17
0x002dcb1b 1916698741 digoal 600 17
0x002dcb1c 1916731510 digoal 600 17
0x002dcb1d 1916764279 digoal 600 17
0x002dcb1e 1916797048 digoal 600 17
0x002dcb1f 1916829817 digoal 600 17
0x002dcb20 1916862586 digoal 600 17
0x002dcb21 1916895355 digoal 600 17
0x002dcb22 1916928124 digoal 600 17
0x002dcb23 1916960893 digoal 600 17
0x002dcb24 1916993662 digoal 600 17
0x002dcb25 1917026431 digoal 600 17
0x002dcb26 1917059200 digoal 600 17
0x002dcb27 1917091969 digoal 600 17
0x002dcb28 1917124738 digoal 600 17
0x002dcb29 1917157507 digoal 600 17
0x002dcb2a 1917190276 digoal 600 17
0x002dcb2b 1917223045 digoal 600 17
0x002dcb2c 1917255814 digoal 600 17
------ Message Queues --------
key msqid owner perms used-bytes messages
注意,只清理37111这个PG实例创建的shared memory和sem,不要把别人的清理掉了。
1、清理共享内存段
$ipcrm -m 2007007232
2、清理sem
ipcs|awk '{print "ipcrm -s "$2}'
......
ipcrm -s 1912995844
ipcrm -s 1913028613
ipcrm -s 1913061382
ipcrm -s 1913094151
ipcrm -s 1913126920
......
ipcrm -s 1917124738
ipcrm -s 1917157507
ipcrm -s 1917190276
ipcrm -s 1917223045
ipcrm -s 1917255814
然后数据库就干净的关闭了,可以启动数据库了。这样就做到了不重启服务器把数据库从僵尸状态复活。
pg_ctl start -D /home/digoal/pgdata
参考
http://www.cnblogs.com/zeushuang/archive/2012/11/29/2794375.html