Replace an offline or FAULTED device in a ZPOOL

Background

This morning I found that one disk in the zpool of a zfsonlinux host had accumulated too many write errors and gone into the FAULTED state, leaving the raidz1 pool DEGRADED.

However, the hot spare still showed AVAIL, i.e. the hot spare had not been brought in automatically.

(This later proved to be exactly the case: a hot spare is not used automatically, manual intervention is required.)

Current zpool status; sdl has already faulted:

[root@db-192-168-173-219 ~]# zpool status zp1  
  pool: zp1  
 state: DEGRADED  
status: One or more devices are faulted in response to persistent errors.  
        Sufficient replicas exist for the pool to continue functioning in a  
        degraded state.  
action: Replace the faulted device, or use 'zpool clear' to mark the device  
        repaired.  
  scan: none requested  
config:  
  
        NAME                                            STATE     READ WRITE CKSUM  
        zp1                                             DEGRADED     0     0     0  
          raidz1-0                                      DEGRADED     0     0     0  
            sdb                                         ONLINE       0     0     0  
            sdc                                         ONLINE       0     0     0  
            sdd                                         ONLINE       0     0     0  
            sde                                         ONLINE       0     0     0  
            sdf                                         ONLINE       0     0     0  
            sdg                                         ONLINE       0     0     0  
            sdh                                         ONLINE       0     0     0  
            sdi                                         ONLINE       0     0     0  
            sdj                                         ONLINE       0     0     0  
            sdk                                         ONLINE       0     0     0  
            sdl                                         FAULTED     11   586     0  too many errors  
        logs  
          scsi-36c81f660eb18e8001af8e4ec0420e21f-part4  ONLINE       0     0     0  
        spares  
          scsi-36c81f660eb18e8001b32c5c61a48318a        AVAIL     
  
errors: No known data errors  

A large number of error messages can be read from dmesg:

sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10): 28 00 68 93 be f0 00 00 08 00  
INFO: task txg_sync:25712 blocked for more than 120 seconds.  
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
txg_sync      D 0000000000000000     0 25712      2 0x00000080  
 ffff8804ddaa7b70 0000000000000046 0000000000000001 ffff880232635530  
 0000000000000000 0000000000000000 ffff8804ddaa7af0 ffffffff81065e02  
 ffff8804ddb8b058 ffff8804ddaa7fd8 000000000000fbc8 ffff8804ddb8b058  
Call Trace:  
 [<ffffffff81065e02>] ? default_wake_function+0x12/0x20  
 [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0  
 [<ffffffff815280a3>] io_schedule+0x73/0xc0  
 [<ffffffffa0177bcc>] cv_wait_common+0xac/0x1c0 [spl]  
 [<ffffffffa02e7480>] ? zio_execute+0x0/0x140 [zfs]  
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40  
 [<ffffffffa0177cf8>] __cv_wait_io+0x18/0x20 [spl]  
 [<ffffffffa02e76bb>] zio_wait+0xfb/0x1b0 [zfs]  
 [<ffffffffa02787e3>] dsl_pool_sync+0xb3/0x440 [zfs]  
 [<ffffffffa028c67b>] spa_sync+0x40b/0xae0 [zfs]  
 [<ffffffffa02a0bb4>] txg_sync_thread+0x384/0x5e0 [zfs]  
 [<ffffffff81059329>] ? set_user_nice+0xc9/0x130  
 [<ffffffffa02a0830>] ? txg_sync_thread+0x0/0x5e0 [zfs]  
 [<ffffffffa016f948>] thread_generic_wrapper+0x68/0x80 [spl]  
 [<ffffffffa016f8e0>] ? thread_generic_wrapper+0x0/0x80 [spl]  
 [<ffffffff8109aef6>] kthread+0x96/0xa0  
 [<ffffffff8100c20a>] child_rip+0xa/0x20  
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0  
 [<ffffffff8100c200>] ? child_rip+0x0/0x20  
INFO: task nfsd:28814 blocked for more than 120 seconds.  
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
nfsd          D 0000000000000006     0 28814      2 0x00000080  
 ffff8806d77adab0 0000000000000046 0000000000000000 0000000000000003  
 0000000000000001 0000000000000086 ffff8806d77ada60 ffffffff81058d53  
 ffff8806d7497098 ffff8806d77adfd8 000000000000fbc8 ffff8806d7497098  
Call Trace:  
 [<ffffffff81058d53>] ? __wake_up+0x53/0x70  
 [<ffffffff815280a3>] io_schedule+0x73/0xc0  
 [<ffffffffa0177bcc>] cv_wait_common+0xac/0x1c0 [spl]  
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40  
 [<ffffffffa02e8473>] ? zio_nowait+0xb3/0x170 [zfs]  
 [<ffffffffa0177cf8>] __cv_wait_io+0x18/0x20 [spl]  
 [<ffffffffa02e76bb>] zio_wait+0xfb/0x1b0 [zfs]  
 [<ffffffffa02e39e0>] zil_commit+0x3b0/0x700 [zfs]  
 [<ffffffffa02d83e2>] zfs_fsync+0x92/0x120 [zfs]  
 [<ffffffffa02ee8ee>] zpl_commit_metadata+0x3e/0x60 [zfs]  
 [<ffffffffa04f8e10>] commit_metadata+0x40/0x70 [nfsd]  
 [<ffffffff8119775e>] ? fsnotify_create+0x5e/0x80  
 [<ffffffff811983dc>] ? vfs_create+0xfc/0x110  
 [<ffffffffa04fc444>] nfsd_create_v3+0x444/0x530 [nfsd]  
 [<ffffffffa0503c13>] nfsd3_proc_create+0x123/0x1b0 [nfsd]  
 [<ffffffffa04f4425>] nfsd_dispatch+0xe5/0x230 [nfsd]  
 [<ffffffffa04a37e4>] svc_process_common+0x344/0x640 [sunrpc]  
 [<ffffffff81065df0>] ? default_wake_function+0x0/0x20  
 [<ffffffffa04a3e20>] svc_process+0x110/0x160 [sunrpc]  
 [<ffffffffa04f4b52>] nfsd+0xc2/0x160 [nfsd]  
 [<ffffffffa04f4a90>] ? nfsd+0x0/0x160 [nfsd]  
 [<ffffffff8109aef6>] kthread+0x96/0xa0  
 [<ffffffff8100c20a>] child_rip+0xa/0x20  
 [<ffffffff8109ae60>] ? kthread+0x0/0xa0  
 [<ffffffff8100c200>] ? child_rip+0x0/0x20  
INFO: task postgres:46313 blocked for more than 120 seconds.  
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
postgres      D 0000000000000004     0 46313  46300 0x00000080  
 ffff8803806519b8 0000000000000082 0000000000000000 ffff880380651a08  
 0000000000000001 ffff8800404c5930 ffff880623f52040 0000000000000000  
 ffff8804270d3058 ffff880380651fd8 000000000000fbc8 ffff8804270d3058  
Call Trace:  
 [<ffffffff815280a3>] io_schedule+0x73/0xc0  
 [<ffffffffa0177bcc>] cv_wait_common+0xac/0x1c0 [spl]  
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40  
 [<ffffffffa024dde9>] ? dbuf_rele_and_unlock+0x169/0x210 [zfs]  
 [<ffffffffa0177cf8>] __cv_wait_io+0x18/0x20 [spl]  
 [<ffffffffa02e76bb>] zio_wait+0xfb/0x1b0 [zfs]  
 [<ffffffffa0264805>] dmu_tx_count_write+0x695/0x6f0 [zfs]  
 [<ffffffff8116fc6c>] ? __kmalloc+0x20c/0x220  
 [<ffffffffa016e20f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]  
 [<ffffffffa02648af>] dmu_tx_hold_write+0x4f/0x70 [zfs]  
 [<ffffffffa02dada6>] zfs_write+0x406/0xcf0 [zfs]  
 [<ffffffff8119938a>] ? __link_path_walk+0x7ca/0xff0  
 [<ffffffff81528f0e>] ? mutex_lock+0x1e/0x50  
 [<ffffffff8122752f>] ? security_inode_permission+0x1f/0x30  
 [<ffffffffa02ef3f2>] zpl_write_common+0x52/0x80 [zfs]  
 [<ffffffffa02ef488>] zpl_write+0x68/0xa0 [zfs]  
 [<ffffffff812263c6>] ? security_file_permission+0x16/0x20  
 [<ffffffff81188f78>] vfs_write+0xb8/0x1a0  
 [<ffffffff81189871>] sys_write+0x51/0x90  
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290  
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b  
INFO: task postgres:46554 blocked for more than 120 seconds.  
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
postgres      D 0000000000000005     0 46554  46541 0x00000080  
 ffff88048fc799b8 0000000000000082 0000000000000000 ffff88048fc79a08  
 0000000000000001 ffff880240a8e4a0 ffff880606aa5750 0000000000000000  
 ffff8806d775f058 ffff88048fc79fd8 000000000000fbc8 ffff8806d775f058  
Call Trace:  
 [<ffffffff815280a3>] io_schedule+0x73/0xc0  
 [<ffffffffa0177bcc>] cv_wait_common+0xac/0x1c0 [spl]  
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40  
 [<ffffffffa024dde9>] ? dbuf_rele_and_unlock+0x169/0x210 [zfs]  
 [<ffffffffa0177cf8>] __cv_wait_io+0x18/0x20 [spl]  
 [<ffffffffa02e76bb>] zio_wait+0xfb/0x1b0 [zfs]  
 [<ffffffffa0264805>] dmu_tx_count_write+0x695/0x6f0 [zfs]  
 [<ffffffff8116fc6c>] ? __kmalloc+0x20c/0x220  
 [<ffffffffa016e20f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]  
 [<ffffffffa02648af>] dmu_tx_hold_write+0x4f/0x70 [zfs]  
 [<ffffffffa02dada6>] zfs_write+0x406/0xcf0 [zfs]  
 [<ffffffff814c5b8a>] ? inet_sendmsg+0x4a/0xb0  
 [<ffffffff81447e03>] ? sock_sendmsg+0x123/0x150  
 [<ffffffffa02ef3f2>] zpl_write_common+0x52/0x80 [zfs]  
 [<ffffffffa02ef488>] zpl_write+0x68/0xa0 [zfs]  
 [<ffffffff812263c6>] ? security_file_permission+0x16/0x20  
 [<ffffffff81188f78>] vfs_write+0xb8/0x1a0  
 [<ffffffff81189871>] sys_write+0x51/0x90  
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290  
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b  
INFO: task postgres:46555 blocked for more than 120 seconds.  
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
postgres      D 0000000000000006     0 46555  46541 0x00000080  
 ffff8807c1cab9b8 0000000000000086 0000000000000000 ffff8807c1caba08  
 0000000000000001 ffff8801e8ea73e0 ffff88063991f230 0000000000000000  
 ffff880806b9c638 ffff8807c1cabfd8 000000000000fbc8 ffff880806b9c638  
Call Trace:  
 [<ffffffff815280a3>] io_schedule+0x73/0xc0  
 [<ffffffffa0177bcc>] cv_wait_common+0xac/0x1c0 [spl]  
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40  
 [<ffffffffa024dde9>] ? dbuf_rele_and_unlock+0x169/0x210 [zfs]  
 [<ffffffffa0177cf8>] __cv_wait_io+0x18/0x20 [spl]  
 [<ffffffffa02e76bb>] zio_wait+0xfb/0x1b0 [zfs]  
 [<ffffffffa0264805>] dmu_tx_count_write+0x695/0x6f0 [zfs]  
 [<ffffffff8116fc6c>] ? __kmalloc+0x20c/0x220  
 [<ffffffffa016e20f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]  
 [<ffffffffa02648af>] dmu_tx_hold_write+0x4f/0x70 [zfs]  
 [<ffffffffa02dada6>] zfs_write+0x406/0xcf0 [zfs]  
 [<ffffffff8119938a>] ? __link_path_walk+0x7ca/0xff0  
 [<ffffffff81528f0e>] ? mutex_lock+0x1e/0x50  
 [<ffffffff8122752f>] ? security_inode_permission+0x1f/0x30  
 [<ffffffffa017936f>] ? tsd_exit+0x5f/0x2b0 [spl]  
 [<ffffffffa02ef3f2>] zpl_write_common+0x52/0x80 [zfs]  
 [<ffffffffa02ef488>] zpl_write+0x68/0xa0 [zfs]  
 [<ffffffff812263c6>] ? security_file_permission+0x16/0x20  
 [<ffffffff81188f78>] vfs_write+0xb8/0x1a0  
 [<ffffffff81189871>] sys_write+0x51/0x90  
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290  
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b  
INFO: task postgres:47162 blocked for more than 120 seconds.  
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
postgres      D 0000000000000005     0 47162  47129 0x00000080  
 ffff8802d1dbf9b8 0000000000000086 0000000000000000 ffff8802d1dbfa08  
 0000000000000001 ffff8800bb012870 ffff880673838a80 0000000000000000  
 ffff8804218f9058 ffff8802d1dbffd8 000000000000fbc8 ffff8804218f9058  
Call Trace:  
 [<ffffffff815280a3>] io_schedule+0x73/0xc0  
 [<ffffffffa0177bcc>] cv_wait_common+0xac/0x1c0 [spl]  
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40  
 [<ffffffffa024dde9>] ? dbuf_rele_and_unlock+0x169/0x210 [zfs]  
 [<ffffffffa0177cf8>] __cv_wait_io+0x18/0x20 [spl]  
 [<ffffffffa02e76bb>] zio_wait+0xfb/0x1b0 [zfs]  
 [<ffffffffa0264805>] dmu_tx_count_write+0x695/0x6f0 [zfs]  
 [<ffffffff8116fc6c>] ? __kmalloc+0x20c/0x220  
 [<ffffffffa016e20f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]  
 [<ffffffffa02648af>] dmu_tx_hold_write+0x4f/0x70 [zfs]  
 [<ffffffffa02dada6>] zfs_write+0x406/0xcf0 [zfs]  
 [<ffffffff814c5b8a>] ? inet_sendmsg+0x4a/0xb0  
 [<ffffffff81528f0e>] ? mutex_lock+0x1e/0x50  
 [<ffffffff81447e03>] ? sock_sendmsg+0x123/0x150  
 [<ffffffffa02ef3f2>] zpl_write_common+0x52/0x80 [zfs]  
 [<ffffffffa02ef488>] zpl_write+0x68/0xa0 [zfs]  
 [<ffffffff812263c6>] ? security_file_permission+0x16/0x20  
 [<ffffffff81188f78>] vfs_write+0xb8/0x1a0  
 [<ffffffff81189871>] sys_write+0x51/0x90  
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290  
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b  
INFO: task postgres:47184 blocked for more than 120 seconds.  
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
postgres      D 0000000000000004     0 47184  47129 0x00000080  
 ffff8802afd519b8 0000000000000082 0000000000000000 ffff8802afd51a08  
 0000000000000001 ffff88010ea124a0 ffff880373f1aa80 0000000000000000  
 ffff8804260a9058 ffff8802afd51fd8 000000000000fbc8 ffff8804260a9058  
Call Trace:  
 [<ffffffff815280a3>] io_schedule+0x73/0xc0  
 [<ffffffffa0177bcc>] cv_wait_common+0xac/0x1c0 [spl]  
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40  
 [<ffffffffa024dde9>] ? dbuf_rele_and_unlock+0x169/0x210 [zfs]  
 [<ffffffffa0177cf8>] __cv_wait_io+0x18/0x20 [spl]  
 [<ffffffffa02e76bb>] zio_wait+0xfb/0x1b0 [zfs]  
 [<ffffffffa0264805>] dmu_tx_count_write+0x695/0x6f0 [zfs]  
 [<ffffffff8116fc6c>] ? __kmalloc+0x20c/0x220  
 [<ffffffffa016e20f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]  
 [<ffffffffa02648af>] dmu_tx_hold_write+0x4f/0x70 [zfs]  
 [<ffffffffa02dada6>] zfs_write+0x406/0xcf0 [zfs]  
 [<ffffffff81449ab3>] ? sock_recvmsg+0x133/0x160  
 [<ffffffff8108b16e>] ? send_signal+0x3e/0x90  
 [<ffffffffa02ef3f2>] zpl_write_common+0x52/0x80 [zfs]  
 [<ffffffffa02ef488>] zpl_write+0x68/0xa0 [zfs]  
 [<ffffffff812263c6>] ? security_file_permission+0x16/0x20  
 [<ffffffff81188f78>] vfs_write+0xb8/0x1a0  
 [<ffffffff81189871>] sys_write+0x51/0x90  
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290  
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b  
INFO: task postgres:39751 blocked for more than 120 seconds.  
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
postgres      D 0000000000000000     0 39751  39591 0x00000080  
 ffff8805dbd1b9b8 0000000000000082 0000000000000040 ffff8805dbd1ba08  
 0000000000000001 ffff88020b036c40 ffff8804c9b02e58 0000000000000000  
 ffff8808270125f8 ffff8805dbd1bfd8 000000000000fbc8 ffff8808270125f8  
Call Trace:  
 [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0  
 [<ffffffff815280a3>] io_schedule+0x73/0xc0  
 [<ffffffffa0177bcc>] cv_wait_common+0xac/0x1c0 [spl]  
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40  
 [<ffffffffa024dde9>] ? dbuf_rele_and_unlock+0x169/0x210 [zfs]  
 [<ffffffffa0177cf8>] __cv_wait_io+0x18/0x20 [spl]  
 [<ffffffffa02e76bb>] zio_wait+0xfb/0x1b0 [zfs]  
 [<ffffffffa0264805>] dmu_tx_count_write+0x695/0x6f0 [zfs]  
 [<ffffffff8116fc6c>] ? __kmalloc+0x20c/0x220  
 [<ffffffffa016e20f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]  
 [<ffffffffa02648af>] dmu_tx_hold_write+0x4f/0x70 [zfs]  
 [<ffffffffa02dada6>] zfs_write+0x406/0xcf0 [zfs]  
 [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0  
 [<ffffffff81447e03>] ? sock_sendmsg+0x123/0x150  
 [<ffffffffa02ef3f2>] zpl_write_common+0x52/0x80 [zfs]  
 [<ffffffffa02ef488>] zpl_write+0x68/0xa0 [zfs]  
 [<ffffffff812263c6>] ? security_file_permission+0x16/0x20  
 [<ffffffff81188f78>] vfs_write+0xb8/0x1a0  
 [<ffffffff81189871>] sys_write+0x51/0x90  
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290  
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b  
INFO: task postgres:12310 blocked for more than 120 seconds.  
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
postgres      D 0000000000000000     0 12310  39591 0x00000080  
 ffff8804844bd9b8 0000000000000086 0000000000000040 ffff8804844bda08  
 0000000000000001 ffff8803319737b0 ffff8803b97b06a8 0000000000000000  
 ffff880583917af8 ffff8804844bdfd8 000000000000fbc8 ffff880583917af8  
Call Trace:  
 [<ffffffff810a70a1>] ? ktime_get_ts+0xb1/0xf0  
 [<ffffffff815280a3>] io_schedule+0x73/0xc0  
 [<ffffffffa0177bcc>] cv_wait_common+0xac/0x1c0 [spl]  
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40  
 [<ffffffffa024dde9>] ? dbuf_rele_and_unlock+0x169/0x210 [zfs]  
 [<ffffffffa0177cf8>] __cv_wait_io+0x18/0x20 [spl]  
 [<ffffffffa02e76bb>] zio_wait+0xfb/0x1b0 [zfs]  
 [<ffffffffa0264805>] dmu_tx_count_write+0x695/0x6f0 [zfs]  
 [<ffffffff8116fc6c>] ? __kmalloc+0x20c/0x220  
 [<ffffffffa016e20f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]  
 [<ffffffffa02648af>] dmu_tx_hold_write+0x4f/0x70 [zfs]  
 [<ffffffffa02dada6>] zfs_write+0x406/0xcf0 [zfs]  
 [<ffffffff81449ab3>] ? sock_recvmsg+0x133/0x160  
 [<ffffffffa02ef3f2>] zpl_write_common+0x52/0x80 [zfs]  
 [<ffffffffa02ef488>] zpl_write+0x68/0xa0 [zfs]  
 [<ffffffff812263c6>] ? security_file_permission+0x16/0x20  
 [<ffffffff81188f78>] vfs_write+0xb8/0x1a0  
 [<ffffffff81189871>] sys_write+0x51/0x90  
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290  
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b  
INFO: task postgres:19243 blocked for more than 120 seconds.  
      Tainted: P           ---------------    2.6.32-431.el6.x86_64 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.  
postgres      D 0000000000000000     0 19243  46300 0x00000080  
 ffff8801c2fe59b8 0000000000000086 0000000000000000 ffff8801c2fe5a08  
 0000000000000001 ffff88001beec250 ffff88021e804fa0 0000000000000000  
 ffff880421903af8 ffff8801c2fe5fd8 000000000000fbc8 ffff880421903af8  
Call Trace:  
 [<ffffffff815280a3>] io_schedule+0x73/0xc0  
 [<ffffffffa0177bcc>] cv_wait_common+0xac/0x1c0 [spl]  
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40  
 [<ffffffffa024dde9>] ? dbuf_rele_and_unlock+0x169/0x210 [zfs]  
 [<ffffffffa0177cf8>] __cv_wait_io+0x18/0x20 [spl]  
 [<ffffffffa02e76bb>] zio_wait+0xfb/0x1b0 [zfs]  
 [<ffffffffa0264805>] dmu_tx_count_write+0x695/0x6f0 [zfs]  
 [<ffffffff8116fc6c>] ? __kmalloc+0x20c/0x220  
 [<ffffffffa016e20f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]  
 [<ffffffffa02648af>] dmu_tx_hold_write+0x4f/0x70 [zfs]  
 [<ffffffffa02dada6>] zfs_write+0x406/0xcf0 [zfs]  
 [<ffffffff81449ab3>] ? sock_recvmsg+0x133/0x160  
 [<ffffffff8108b16e>] ? send_signal+0x3e/0x90  
 [<ffffffffa02ef3f2>] zpl_write_common+0x52/0x80 [zfs]  
 [<ffffffffa02ef488>] zpl_write+0x68/0xa0 [zfs]  
 [<ffffffff812263c6>] ? security_file_permission+0x16/0x20  
 [<ffffffff81188f78>] vfs_write+0xb8/0x1a0  
 [<ffffffff81189871>] sys_write+0x51/0x90  
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290  
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10): 28 00 68 a4 ae 60 00 00 08 00  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10): 28 00 68 75 e4 08 00 00 08 00  
scanning ...  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10)  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10): 28 00 68 50 4a 60 00 00 10 00  
: 28 00 00 00 0a 10 00 00 10 00  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10): 28 00 67 e5 6d f8 00 00 08 00  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10): 28 00 66 95 36 98 00 00 10 00  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10): 28 00 68 7b fe 48 00 00 08 00  
  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(16): 88 00 00 00 00 01 d1 af b4 10 00 00 00 10 00 00  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10): 28 00 68 93 bd 78 00 00 08 00  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Write(10): 2a 00 68 ad 3d 20 00 00 28 00  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10): 28 00 5f 5c 66 48 00 00 08 00  
sd 0:2:11:0: [sdl] Unhandled error code  
sd 0:2:11:0: [sdl] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK  
sd 0:2:11:0: [sdl] CDB: Read(10): 28 00 5f 5f 44 90 00 00 10 00  
end_request: I/O error, dev sdl, sector 1715667304  
end_request: I/O error, dev sdl, sector 7812920848  
end_request: I/O error, dev sdl, sector 1716598240  
end_request: I/O error, dev sdl, sector 2576  
end_request: I/O error, dev sdl, sector 7812920336  
end_request: I/O error, dev sdl, sector 7812920848  
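Before pulling the drive, it can be worth confirming that the physical disk itself is failing. A hedged aside, not part of the original session: dmesg addresses the device as SCSI target 0:2:11:0, i.e. the controller's virtual drive with Target ID 11 (the same ID that shows up in the MegaCli session further down), and since the disks sit behind a MegaRAID controller, smartctl needs the pass-through syntax; the device id N below is not something this log shows.

ls -l /sys/block/sdl/device                 # the symlink target ends in 0:2:11:0
smartctl -a -d megaraid,N /dev/sdl          # N = physical drive device id, e.g. from MegaCli -PDList -aALL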

Manually replace the faulted disk with the hot spare:

[root@db-192-168-173-219 ~]# zpool replace zp1 sdl scsi-36c81f660eb18e8001b32c5c61a48318a  
[root@db-192-168-173-219 ~]# zpool status -v  
  pool: zp1  
 state: DEGRADED  
status: One or more devices is currently being resilvered.  The pool will  
        continue to function, possibly in a degraded state.  
action: Wait for the resilver to complete.  
  scan: resilver in progress since Thu Jul 31 08:31:53 2014  
    3.41G scanned out of 8.62T at 17.4M/s, 144h42m to go  
    300M resilvered, 0.04% done  
config:  
  
        NAME                                            STATE     READ WRITE CKSUM  
        zp1                                             DEGRADED     0     0     0  
          raidz1-0                                      DEGRADED     0     0     0  
            sdb                                         ONLINE       0     0     0  
            sdc                                         ONLINE       0     0     0  
            sdd                                         ONLINE       0     0     0  
            sde                                         ONLINE       0     0     0  
            sdf                                         ONLINE       0     0     0  
            sdg                                         ONLINE       0     0     0  
            sdh                                         ONLINE       0     0     0  
            sdi                                         ONLINE       0     0     0  
            sdj                                         ONLINE       0     0     0  
            sdk                                         ONLINE       0     0     0  
            spare-10                                    FAULTED      0     0     0  
              sdl                                       FAULTED     11   586     0  too many errors  
              scsi-36c81f660eb18e8001b32c5c61a48318a    ONLINE       0     0     0  (resilvering)  
        logs  
          scsi-36c81f660eb18e8001af8e4ec0420e21f-part4  ONLINE       0     0     0  
        spares  
          scsi-36c81f660eb18e8001b32c5c61a48318a        INUSE     currently in use  
  
errors: No known data errors  

The resilver time depends on the scan rate and on how much space the zpool already uses. In this example 8.62T is in use and the scan rate is only 17.4M/s, so the estimate comes out at 144 hours 42 minutes.

  scan: resilver in progress since Thu Jul 31 08:31:53 2014  
    3.41G scanned out of 8.62T at 17.4M/s, 144h42m to go  
    300M resilvered, 0.04% done  
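The estimate is easy to sanity-check by hand: remaining data divided by scan rate. A quick back-of-envelope, assuming the T/G/M figures are binary units:

# (8.62T - 3.41G) left to scan, at 17.4M/s, converted to hours:
echo "scale=1; (8.62*1024*1024 - 3.41*1024) / 17.4 / 3600" | bc
144.2

which lines up with the reported 144h42m; the small gap comes from rounding in the displayed figures.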

If a replacement disk is going to be installed, it is best to take the bad disk offline first:

[root@db-192-168-173-219 test1]# zpool offline zp1 sdl  
[root@db-192-168-173-219 opt]# zpool status -v zp1  
  pool: zp1  
 state: DEGRADED  
status: One or more devices is currently being resilvered.  The pool will  
        continue to function, possibly in a degraded state.  
action: Wait for the resilver to complete.  
  scan: resilver in progress since Thu Jul 31 09:43:21 2014  
    246M scanned out of 8.63T at 12.3M/s, 204h27m to go  
    22.1M resilvered, 0.00% done  
config:  
  
        NAME                                            STATE     READ WRITE CKSUM  
        zp1                                             DEGRADED     0     0     0  
          raidz1-0                                      DEGRADED     0     0     0  
            sdb                                         ONLINE       0     0     0  
            sdc                                         ONLINE       0     0     0  
            sdd                                         ONLINE       0     0     0  
            sde                                         ONLINE       0     0     0  
            sdf                                         ONLINE       0     0     0  
            sdg                                         ONLINE       0     0     0  
            sdh                                         ONLINE       0     0     0  
            sdi                                         ONLINE       0     0     0  
            sdj                                         ONLINE       0     0     0  
            sdk                                         ONLINE       0     0     0  
            spare-10                                    DEGRADED     0     0     0  
              sdl                                       OFFLINE     11   586     0  
              scsi-36c81f660eb18e8001b32c5c61a48318a    ONLINE       0     0     0  (resilvering)  
        logs  
          scsi-36c81f660eb18e8001af8e4ec0420e21f-part4  ONLINE       0     0     0  
        spares  
          scsi-36c81f660eb18e8001b32c5c61a48318a        INUSE     currently in use  
  
errors: No known data errors  

Then swap in the new disk; once it is installed, use zpool replace so that the spare is released:

zpool replace zp1 sdl sdl  
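For completeness (this follows the hot-spares section of man zpool, it is not from this session): once the resilver onto the spare has finished, the spare replacement can also be resolved without a new disk.

# promote the spare to a permanent member of the vdev (drops the faulted sdl):
zpool detach zp1 sdl
# ...or cancel the spare replacement and return the spare to AVAIL:
zpool detach zp1 scsi-36c81f660eb18e8001b32c5c61a48318a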

The disks are very large, so the resilvering takes a long time; zpool status -v shows how much time remains.

The actual replacement went as follows. Note that the first MegaCli attempt to create the new virtual drive in write-back (WB) mode failed because the controller still held preserved cache for the missing virtual drive; after discarding that cache the VD could be created (in write-through, WT, mode this time), and the new /dev/sdl was swapped in with zpool replace -f:

[root@digoal ~]# MegaCli -CfgLdAdd -r0 [32:10] WB Direct -a0  
                                       
  
Adapter 0: Configure Adapter Failed  
  
FW error description:   
  The current operation is not allowed because the controller has data in cache for offline or missing virtual drives.    
  
Exit Code: 0x54  
  
[root@digoal ~]# MegaCli -GetPreservedCacheList -aALL  
                                       
Adapter #0  
  
Virtual Drive(Target ID 11): Missing.  
  
Exit Code: 0x00  
[root@digoal ~]# MegaCli -DiscardPreservedCache -L11 -a0  
                                       
Adapter #0  
  
Virtual Drive(Target ID 11): Preserved Cache Data Cleared.  
  
Exit Code: 0x00  
[root@digoal ~]# MegaCli -CfgLdAdd -r0 [32:10] WT Direct -a0  
                                       
Adapter 0: Created VD 11  
  
Adapter 0: Configured the Adapter!!  
  
Exit Code: 0x00  
  
[root@digoal ~]# zpool replace zp1 /dev/sdl /dev/sdl  
invalid vdev specification  
use '-f' to override the following errors:  
/dev/sdl does not contain an EFI label but it may contain partition  
information in the MBR.  
[root@digoal ~]# zpool replace -f zp1 /dev/sdl /dev/sdl  
[root@digoal ~]# zpool status -v  
  pool: zp1  
 state: DEGRADED  
status: One or more devices is currently being resilvered.  The pool will  
        continue to function, possibly in a degraded state.  
action: Wait for the resilver to complete.  
  scan: resilver in progress since Tue Aug  5 15:47:35 2014  
    100M scanned out of 9.45T at 16.7M/s, 165h6m to go  
    9.08M resilvered, 0.00% done  
config:  
  
        NAME                                            STATE     READ WRITE CKSUM  
        zp1                                             DEGRADED     0     0     0  
          raidz1-0                                      DEGRADED     0     0     0  
            sdb                                         ONLINE       0     0     0  
            sdc                                         ONLINE       0     0     0  
            sdd                                         ONLINE       0     0     0  
            sde                                         ONLINE       0     0     0  
            sdf                                         ONLINE       0     0     0  
            sdg                                         ONLINE       0     0     0  
            sdh                                         ONLINE       0     0     0  
            sdi                                         ONLINE       0     0     0  
            sdj                                         ONLINE       0     0     0  
            sdk                                         ONLINE       0     0     0  
            spare-10                                    DEGRADED     0     0     0  
              replacing-0                               OFFLINE      0     0     0  
                old                                     OFFLINE     11   586     0  
                sdl                                     ONLINE       0     0     0  (resilvering)  
              scsi-36c81f660eb18e8001b32c5c61a48318a    ONLINE       0     0     0  
        logs  
          scsi-36c81f660eb18e8001af8e4ec0420e21f-part4  ONLINE       0     0     0  
        spares  
          scsi-36c81f660eb18e8001b32c5c61a48318a        INUSE     currently in use  
  
errors: No known data errors  

Next, let's simulate the whole procedure: offline a disk, bring the hot spare in to cover for it, replace the physical disk, substitute the replacement for the bad disk, and let the hot spare be released automatically back to the AVAIL state.

Create three files (800MB each) to serve as vdevs:

# dd if=/dev/zero of=/opt/zfs.disk1 bs=8192 count=102400  
# dd if=/dev/zero of=/opt/zfs.disk2 bs=8192 count=102400  
# dd if=/dev/zero of=/opt/zfs.disk3 bs=8192 count=102400  

Create the zpool:

[root@db-192-168-173-219 opt]# zpool create -o ashift=12 -o autoreplace=off zp2 mirror /opt/zfs.disk1 /opt/zfs.disk2 spare /opt/zfs.disk3  
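One way to confirm the pool really got ashift=12 is to dump the cached configuration with zdb (a sketch, assuming the default cachefile is in use):

zdb -C zp2 | grep ashift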

Check the current status:

[root@db-192-168-173-219 opt]# zpool status zp2  
  pool: zp2  
 state: ONLINE  
  scan: none requested  
config:  
  
        NAME                STATE     READ WRITE CKSUM  
        zp2                 ONLINE       0     0     0  
          mirror-0          ONLINE       0     0     0  
            /opt/zfs.disk1  ONLINE       0     0     0  
            /opt/zfs.disk2  ONLINE       0     0     0  
        spares  
          /opt/zfs.disk3    AVAIL     
  
errors: No known data errors  

Manually offline one of the disks:

[root@db-192-168-173-219 opt]# zpool offline zp2 /opt/zfs.disk1  

Check the status again:

[root@db-192-168-173-219 test1]# zpool status -v zp2  
  pool: zp2  
 state: DEGRADED  
status: One or more devices has been taken offline by the administrator.  
        Sufficient replicas exist for the pool to continue functioning in a  
        degraded state.  
action: Online the device using 'zpool online' or replace the device with  
        'zpool replace'.  
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu Jul 31 09:20:25 2014  
config:  
  
        NAME                STATE     READ WRITE CKSUM  
        zp2                 DEGRADED     0     0     0  
          mirror-0          DEGRADED     0     0     0  
            /opt/zfs.disk1  OFFLINE      0     0     0  
            /opt/zfs.disk2  ONLINE       0     0     0  
        spares  
          /opt/zfs.disk3    AVAIL     
  
errors: No known data errors  

Use the hot spare to stand in for the offlined disk:

[root@db-192-168-173-219 opt]# zpool replace zp2 /opt/zfs.disk1 /opt/zfs.disk3  
[root@db-192-168-173-219 opt]# zpool status zp2  
  pool: zp2  
 state: DEGRADED  
status: One or more devices has been taken offline by the administrator.  
        Sufficient replicas exist for the pool to continue functioning in a  
        degraded state.  
action: Online the device using 'zpool online' or replace the device with  
        'zpool replace'.  
  scan: resilvered 1000K in 0h0m with 0 errors on Thu Jul 31 09:26:25 2014  
config:  
  
        NAME                  STATE     READ WRITE CKSUM  
        zp2                   DEGRADED     0     0     0  
          mirror-0            DEGRADED     0     0     0  
            spare-0           OFFLINE      0     0     0  
              /opt/zfs.disk1  OFFLINE      0     0     0  
              /opt/zfs.disk3  ONLINE       0     0     0  
            /opt/zfs.disk2    ONLINE       0     0     0  
        spares  
          /opt/zfs.disk3      INUSE     currently in use  

Trying to replace with the original disk directly now fails, because the disk still carries the pool's label; this acts as a safeguard:

[root@db-192-168-173-219 test1]# zpool replace zp2 /opt/zfs.disk1 /opt/zfs.disk1  
invalid vdev specification  
use '-f' to override the following errors:  
/opt/zfs.disk1 is part of active pool 'zp2'  
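On a real disk there is of course no backing file to delete. A hedged note: if your zpool build has the labelclear subcommand, the stale ZFS label can be wiped from the device instead (destructive, so double-check the device name first):

zpool labelclear -f /dev/sdl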

Since these are file-backed vdevs, delete the file behind the original disk and create a new one to take its place:

[root@db-192-168-173-219 opt]# rm -f zfs.disk1  
[root@db-192-168-173-219 opt]# zpool replace zp2 /opt/zfs.disk1 /opt/zfs.disk1  
cannot resolve path '/opt/zfs.disk1'  
[root@db-192-168-173-219 opt]# dd if=/dev/zero of=/opt/zfs.disk1 bs=8192 count=102400   
102400+0 records in  
102400+0 records out  
838860800 bytes (839 MB) copied, 1.48687 s, 564 MB/s  

Use the newly created file to replace the offlined disk; the hot spare returns to the AVAIL state:

[root@db-192-168-173-219 opt]# zpool replace zp2 /opt/zfs.disk1 /opt/zfs.disk1  
[root@db-192-168-173-219 opt]# zpool status -v zp2  
  pool: zp2  
 state: ONLINE  
  scan: resilvered 1.05M in 0h0m with 0 errors on Thu Jul 31 09:27:28 2014  
config:  
  
        NAME                STATE     READ WRITE CKSUM  
        zp2                 ONLINE       0     0     0  
          mirror-0          ONLINE       0     0     0  
            /opt/zfs.disk1  ONLINE       0     0     0  
            /opt/zfs.disk2  ONLINE       0     0     0  
        spares  
          /opt/zfs.disk3    AVAIL     
  
errors: No known data errors  
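Once a replacement has resilvered, it is good practice to follow up with a scrub, so that all data, not just the blocks ZFS knew to be out of date, gets verified (see the man page excerpt in the notes below):

zpool scrub zp2
zpool status zp2      # reports scrub progress and, on completion, the results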

Notes

1. The zpool autoreplace property has nothing to do with hot spares; it concerns only the original device's slot: when a new disk is inserted in the physical location of a device that previously belonged to the pool, it automatically joins that device's vdev. See the sketch after the man page excerpt below.

       autoreplace=on | off

           Controls automatic device replacement. If set to "off", device replacement must be initiated by the
           administrator by using the "zpool replace" command. If set to "on", any new device, found in the same
           physical location as a device that previously belonged to the pool, is automatically formatted and
           replaced. The default behavior is "off". This property can also be referred to by its shortened column
           name, "replace".
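For reference, the property can be inspected and toggled on a live pool at any time:

zpool get autoreplace zp1
zpool set autoreplace=on zp1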

2. Judging from the man page excerpt and example below, if the hot spare is still resilvering, it is best to wait for the resilver to finish before replacing the bad disk with the new one.

       zpool scrub [-s] pool ...

           Begins a scrub. The scrub examines all data in the specified pools to verify that it checksums correctly.
           For replicated (mirror or raidz) devices, ZFS automatically repairs any damage discovered during the scrub.
           The "zpool status" command reports the progress of the scrub and summarizes the results of the scrub upon
           completion.

           Scrubbing and resilvering are very similar operations. The difference is that resilvering only examines
           data that ZFS knows to be out of date (for example, when attaching a new device to a mirror or replacing
           an existing device), whereas scrubbing examines all data to discover silent errors due to hardware faults
           or disk failure.

           Because scrubbing and resilvering are I/O-intensive operations, ZFS only allows one at a time. If a scrub
           is already in progress, the "zpool scrub" command terminates it and starts a new scrub. If a resilver is
           in progress, ZFS does not allow a scrub to be started until the resilver completes.

           -s    Stop scrubbing.

       Example 11 Managing Hot Spares

       The following command creates a new pool with an available hot spare:

         # zpool create tank mirror sda sdb spare sdc

       If one of the disks were to fail, the pool would be reduced to the degraded state. The failed device can be
       replaced using the following command:

         # zpool replace tank sda sdd

       Once the data has been resilvered, the spare is automatically removed and is made available for use should
       another device fail. The hot spare can be permanently removed from the pool using the following command:

         # zpool remove tank sdc

References

1. man zpool

2. https://zpool.org/

3. http://www.zfsbuild.com/

4. http://www.minghao.hk/bbs/read.php?tid=757

