PostgreSQL Linux 下 僵尸状态的处理

4 minute read

背景

在某些非常特殊的情况下,可能遇到数据库无法停库,也无法连接(几乎处于不可用状态)的情况。即使使用-m immediate这样的暴力停库都无法停下来。

例如,当数据库的某些用户进程进入了defunct状态时。

2951     20411 37111  0 14:28 ?        00:00:00 postgres: digoal postgres [local] idle                                           
2951     37111     1  0  2017 ?        00:05:48 /home/digoal/pgsql/bin/postgres -D /home/digoal/pgdata  
2951     37121 37111  0  2017 ?        00:01:27 [postgres] <defunct>  
2951     37128 37111  0  2017 ?        00:09:09 [postgres] <defunct>  
2951     37129 37111  0  2017 ?        00:01:28 postgres: stats collector process                                                   
2951     38325 37111  0 19:33 ?        00:00:00 [postgres] <defunct>  
.................  

37111是PG的主进程,其他是它的子进程。

此时使用如下方法都无法停库。

pg_ctl stop -m fast -D /home/digoal/pgdata  
  
pg_ctl stop -m immediate -D /home/digoal/pgdata  

Linux 处理 Z 僵尸进程

如何kill杀掉linux系统中的僵尸defunct进程

在 Unix系统管理中,当用ps命令观察进程的执行状态时,经常看到某些进程的状态栏为defunct,这就是所谓的“僵尸”进程。“僵尸”进程是一个早已 死亡的进程,但在进程表(processs table)中仍占了一个位置(slot)。由于进程表的容量是有限的,所以,defunct进程不仅占用系统的内存资源,影响系统的性能,而且如果其数 目太多,还会导致系统瘫痪。

我们知道,每个Unix进程在进程表里都有一个进入点(entry),核心程序执行该进程时使用到的一切信息都存储在进入点。当用ps命令察看系统中的进程信息时,看到的就是进程表中的相关数据。当以fork()系统调用建立一个新的进程后,核心进程就会在进程表中给这个新进程分配一个进入点,然后将相关信息存储在该进入点所对应的进程表内。这些信息中有一项是其父进程的识别码。当这个进程走完了自己的生命周期后,它会执行exit()系统调用,此时原来进 程表中的数据会被该进程的退出码(exit code)、执行时所用的CPU时间等数据所取代,这些数据会一直保留到系统将它传递给它的父进程为止。由此可见,defunct进程的出现时间是在子进 程终止后,但是父进程尚未读取这些数据之前。

defunct进程是不能直接kill -9杀掉的,否则就不叫僵尸进程了。

知道了defunct进程产生的原因,就可以轻易的kill掉defunct进程。

方法有二:

1,重启服务器电脑,这个是最简单,最易用的方法,但是如果你服务器电脑上运行有其他的程序,那么这个方法,代价很大。所以,尽量使用下面一种方法。

2,找到该defunct僵尸进程的父进程,将该进程的父进程杀掉,则此defunct进程将自动消失。如何找到defunct僵尸进程的父进程?很简单,一句命令就够了:ps -ef grep defunct_process_pid。

PostgreSQL 进入僵尸状态后的处理

如开篇提到的CASE,如何把PG停下来呢?

分为三个级别的操作,请依次尝试,最后万不得已使用第三种方法。

1、温柔停库

pg_ctl stop -m fast -D /home/digoal/pgdata  

2、紧急快速停库(不刷盘,通常用于紧急停电只有UPS的处理)

pg_ctl stop -m immediate -D /home/digoal/pgdata  

3、KILL清理

首先要查看启动数据库的主进程

《explain postmaster.pid》

cd /home/digoal/pgdata  
  
cat postmaster.pid  
  
37111  
/home/digoal/pgdata  
1511749309  
1921  
/tmp  
*  
  1921001 2007007232  

1、杀掉postmaster 进程的所有子进程

ps -ewf|grep 37111|grep -v grep|awk '{print "kill -9 "$2}'|grep -v 37111  

2、杀掉postmaster 进程

kill -9 37111  

3、清理shared memory和sem

$ipcs  
  
------ Shared Memory Segments --------  
key        shmid      owner      perms      bytes      nattch     status        
0x002dcaa9 2007007232 digoal  600        48         0                         
  
------ Semaphore Arrays --------  
key        semid      owner      perms      nsems       
0x0002243f 65538      admin      666        2           
0x002dcaa9 1912963075 digoal  600        17          
0x002dcaaa 1912995844 digoal  600        17          
0x002dcaab 1913028613 digoal  600        17          
0x002dcaac 1913061382 digoal  600        17          
0x002dcaad 1913094151 digoal  600        17          
0x002dcaae 1913126920 digoal  600        17          
0x002dcaaf 1913159689 digoal  600        17          
0x002dcab0 1913192458 digoal  600        17          
0x002dcab1 1913225227 digoal  600        17          
0x002dcab2 1913257996 digoal  600        17          
0x002dcab3 1913290765 digoal  600        17          
0x002dcab4 1913323534 digoal  600        17          
0x002dcab5 1913356303 digoal  600        17          
0x002dcab6 1913389072 digoal  600        17          
0x002dcab7 1913421841 digoal  600        17          
0x002dcab8 1913454610 digoal  600        17          
0x002dcab9 1913487379 digoal  600        17          
0x002dcaba 1913520148 digoal  600        17          
0x002dcabb 1913552917 digoal  600        17          
0x002dcabc 1913585686 digoal  600        17          
0x002dcabd 1913618455 digoal  600        17          
0x002dcabe 1913651224 digoal  600        17          
0x002dcabf 1913683993 digoal  600        17          
0x002dcac0 1913716762 digoal  600        17          
0x002dcac1 1913749531 digoal  600        17          
0x002dcac2 1913782300 digoal  600        17          
0x002dcac3 1913815069 digoal  600        17          
0x002dcac4 1913847838 digoal  600        17          
0x002dcac5 1913880607 digoal  600        17          
0x002dcac6 1913913376 digoal  600        17          
0x002dcac7 1913946145 digoal  600        17          
0x002dcac8 1913978914 digoal  600        17          
0x002dcac9 1914011683 digoal  600        17          
0x002dcaca 1914044452 digoal  600        17          
0x002dcacb 1914077221 digoal  600        17          
0x002dcacc 1914109990 digoal  600        17          
0x002dcacd 1914142759 digoal  600        17          
0x002dcace 1914175528 digoal  600        17          
0x002dcacf 1914208297 digoal  600        17          
0x002dcad0 1914241066 digoal  600        17          
0x002dcad1 1914273835 digoal  600        17          
0x002dcad2 1914306604 digoal  600        17          
0x002dcad3 1914339373 digoal  600        17          
0x002dcad4 1914372142 digoal  600        17          
0x002dcad5 1914404911 digoal  600        17          
0x002dcad6 1914437680 digoal  600        17          
0x002dcad7 1914470449 digoal  600        17          
0x002dcad8 1914503218 digoal  600        17          
0x002dcad9 1914535987 digoal  600        17          
0x002dcada 1914568756 digoal  600        17          
0x002dcadb 1914601525 digoal  600        17          
0x002dcadc 1914634294 digoal  600        17          
0x002dcadd 1914667063 digoal  600        17          
0x002dcade 1914699832 digoal  600        17          
0x002dcadf 1914732601 digoal  600        17          
0x002dcae0 1914765370 digoal  600        17          
0x002dcae1 1914798139 digoal  600        17          
0x002dcae2 1914830908 digoal  600        17          
0x002dcae3 1914863677 digoal  600        17          
0x002dcae4 1914896446 digoal  600        17          
0x002dcae5 1914929215 digoal  600        17          
0x002dcae6 1914961984 digoal  600        17          
0x002dcae7 1914994753 digoal  600        17          
0x002dcae8 1915027522 digoal  600        17          
0x002dcae9 1915060291 digoal  600        17          
0x002dcaea 1915093060 digoal  600        17          
0x002dcaeb 1915125829 digoal  600        17          
0x002dcaec 1915158598 digoal  600        17          
0x002dcaed 1915191367 digoal  600        17          
0x002dcaee 1915224136 digoal  600        17          
0x002dcaef 1915256905 digoal  600        17          
0x002dcaf0 1915289674 digoal  600        17          
0x002dcaf1 1915322443 digoal  600        17          
0x002dcaf2 1915355212 digoal  600        17          
0x002dcaf3 1915387981 digoal  600        17          
0x002dcaf4 1915420750 digoal  600        17          
0x002dcaf5 1915453519 digoal  600        17          
0x002dcaf6 1915486288 digoal  600        17          
0x002dcaf7 1915519057 digoal  600        17          
0x002dcaf8 1915551826 digoal  600        17          
0x002dcaf9 1915584595 digoal  600        17          
0x002dcafa 1915617364 digoal  600        17          
0x002dcafb 1915650133 digoal  600        17          
0x002dcafc 1915682902 digoal  600        17          
0x002dcafd 1915715671 digoal  600        17          
0x002dcafe 1915748440 digoal  600        17          
0x002dcaff 1915781209 digoal  600        17          
0x002dcb00 1915813978 digoal  600        17          
0x002dcb01 1915846747 digoal  600        17          
0x002dcb02 1915879516 digoal  600        17          
0x002dcb03 1915912285 digoal  600        17          
0x002dcb04 1915945054 digoal  600        17          
0x002dcb05 1915977823 digoal  600        17          
0x002dcb06 1916010592 digoal  600        17          
0x002dcb07 1916043361 digoal  600        17          
0x002dcb08 1916076130 digoal  600        17          
0x002dcb09 1916108899 digoal  600        17          
0x002dcb0a 1916141668 digoal  600        17          
0x002dcb0b 1916174437 digoal  600        17          
0x002dcb0c 1916207206 digoal  600        17          
0x002dcb0d 1916239975 digoal  600        17          
0x002dcb0e 1916272744 digoal  600        17          
0x002dcb0f 1916305513 digoal  600        17          
0x002dcb10 1916338282 digoal  600        17          
0x002dcb11 1916371051 digoal  600        17          
0x002dcb12 1916403820 digoal  600        17          
0x002dcb13 1916436589 digoal  600        17          
0x002dcb14 1916469358 digoal  600        17          
0x002dcb15 1916502127 digoal  600        17          
0x002dcb16 1916534896 digoal  600        17          
0x002dcb17 1916567665 digoal  600        17          
0x002dcb18 1916600434 digoal  600        17          
0x002dcb19 1916633203 digoal  600        17          
0x002dcb1a 1916665972 digoal  600        17          
0x002dcb1b 1916698741 digoal  600        17          
0x002dcb1c 1916731510 digoal  600        17          
0x002dcb1d 1916764279 digoal  600        17          
0x002dcb1e 1916797048 digoal  600        17          
0x002dcb1f 1916829817 digoal  600        17          
0x002dcb20 1916862586 digoal  600        17          
0x002dcb21 1916895355 digoal  600        17          
0x002dcb22 1916928124 digoal  600        17          
0x002dcb23 1916960893 digoal  600        17          
0x002dcb24 1916993662 digoal  600        17          
0x002dcb25 1917026431 digoal  600        17          
0x002dcb26 1917059200 digoal  600        17          
0x002dcb27 1917091969 digoal  600        17          
0x002dcb28 1917124738 digoal  600        17          
0x002dcb29 1917157507 digoal  600        17          
0x002dcb2a 1917190276 digoal  600        17          
0x002dcb2b 1917223045 digoal  600        17          
0x002dcb2c 1917255814 digoal  600        17          
  
------ Message Queues --------  
key        msqid      owner      perms      used-bytes   messages      

注意,只清理37111这个PG实例创建的shared memory和sem,不要把别人的清理掉了。

1、清理共享内存段

$ipcrm -m 2007007232  

2、清理sem

ipcs|awk '{print "ipcrm -s "$2}'  
......  
ipcrm -s 1912995844  
ipcrm -s 1913028613  
ipcrm -s 1913061382  
ipcrm -s 1913094151  
ipcrm -s 1913126920  
......  
ipcrm -s 1917124738  
ipcrm -s 1917157507  
ipcrm -s 1917190276  
ipcrm -s 1917223045  
ipcrm -s 1917255814  

然后数据库就干净的关闭了,可以启动数据库了。这样就做到了不重启服务器把数据库从僵尸状态复活。

pg_ctl start -D /home/digoal/pgdata  

参考

http://www.cnblogs.com/zeushuang/archive/2012/11/29/2794375.html

《explain postmaster.pid》

Flag Counter

digoal’s 大量PostgreSQL文章入口