ZPOOL health check and repair using scrub
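Background
A zpool health check (scrub) verifies the checksums of the pool's data blocks. If the vdev is a mirror or raidz, damaged blocks are repaired automatically from the other devices. A scrub is I/O-intensive, so run it when the system is not busy. It only examines allocated blocks and skips free ones, so its duration depends on how much of the pool is in use: on a very large zpool with low usage, a scrub still finishes quickly.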
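One way to follow the "run it when the system is not busy" advice is to gate the scrub on the time of day. The sketch below is only an illustration: the helper name `is_off_peak` and the 01:00 to 05:59 window are assumptions, not something ZFS provides.

```shell
# Hypothetical helper: succeeds when the given hour (00-23) falls inside
# an assumed off-peak window of 01:00 to 05:59. Adjust to your workload.
is_off_peak() {
    hour=${1#0}        # drop one leading zero so "05" is read as 5
    hour=${hour:-0}    # a bare "0" becomes empty above; restore it
    [ "$hour" -ge 1 ] && [ "$hour" -lt 6 ]
}

# Example use (not run here):
#   is_off_peak "$(date +%H)" && zpool scrub zp1
```

Starting the scrub from cron at a fixed quiet hour would achieve much the same thing without a helper.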
Usage:
# zpool scrub zp1
Check the zpool status. As shown below, a scrub is in progress:
# zpool status zp1
  pool: zp1
 state: ONLINE
  scan: scrub in progress since Tue Jun 17 15:17:01 2014
    19.7G scanned out of 1.23T at 56.7M/s, 6h11m to go
    0 repaired, 1.57% done
config:

        NAME         STATE     READ WRITE CKSUM
        zp1          ONLINE       0     0     0
          vd01vol01  ONLINE       0     0     0
          vd02vol01  ONLINE       0     0     0
          vd03vol01  ONLINE       0     0     0
          vd04vol01  ONLINE       0     0     0

errors: No known data errors
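The progress figures in this output can be pulled out and sanity-checked with a little awk. This is a rough sketch: the function names are hypothetical, the parsing assumes the progress line contains "% done" as in the output above (the layout differs between ZFS versions), and the estimate assumes "T", "G" and "M" are binary units.

```shell
# Print the "N% done" figure from `zpool status` text read on stdin.
# Assumes a progress line like "0 repaired, 1.57% done".
scrub_percent() {
    awk '/% done/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) { sub(/%$/, "", $i); print $i; exit }
    }'
}

# Rough remaining seconds: (total - scanned) in GiB divided by a rate in MiB/s.
# args: total_gib scanned_gib rate_mib_per_s
scrub_eta_seconds() {
    awk -v t="$1" -v s="$2" -v r="$3" \
        'BEGIN { printf "%d\n", (t - s) * 1024 / r }'
}

# Example use (not run here):
#   zpool status zp1 | scrub_percent
#   scrub_eta_seconds 1259.52 19.7 56.7   # 1.23T taken as 1.23 * 1024 GiB
```

With the numbers shown above, the estimate comes to roughly 6.2 hours, in line with the "6h11m to go" that zpool reports; the small difference comes from the rounded values in the display.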
# zpool get all zp1
NAME PROPERTY VALUE SOURCE
zp1 size 9.75T -
zp1 capacity 12% -
zp1 altroot - default
zp1 health ONLINE -
zp1 guid 5877722976139588848 default
zp1 version - default
zp1 bootfs - default
zp1 delegation on default
zp1 autoreplace off default
zp1 cachefile - default
zp1 failmode wait default
zp1 listsnapshots off default
zp1 autoexpand off default
zp1 dedupditto 0 default
zp1 dedupratio 1.01x -
zp1 free 8.52T -
zp1 allocated 1.23T -
zp1 readonly off -
zp1 ashift 0 default
zp1 comment - default
zp1 expandsize 0 -
zp1 freeing 0 default
zp1 feature@async_destroy enabled local
zp1 feature@empty_bpobj active local
zp1 feature@lz4_compress active local
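Because a scrub reads only allocated blocks, the property that determines the amount of work is `allocated` (1.23T here, matching "scanned out of 1.23T" in the status output), not `size` (9.75T). A minimal sketch for extracting a single property from this listing follows; the function name `pool_prop` is hypothetical, and it assumes the four-column NAME PROPERTY VALUE SOURCE layout shown above.

```shell
# Print the VALUE column for one property from `zpool get all` text on stdin.
# arg: property name, e.g. allocated
pool_prop() {
    awk -v p="$1" '$2 == p { print $3 }'
}

# Example use (not run here):
#   zpool get all zp1 | pool_prop allocated
```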
While the scrub runs, the devices' I/O capacity is almost exhausted.
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 160.00 0.00 982.00 0.00 6.14 0.61 3.81 3.80 60.80
sdc 0.00 0.00 164.00 0.00 864.00 0.00 5.27 0.59 3.57 3.57 58.50
sdd 0.00 0.00 168.00 0.00 1100.00 0.00 6.55 0.61 3.63 3.64 61.20
sde 0.00 0.00 172.00 0.00 1116.00 0.00 6.49 0.61 3.52 3.52 60.60
sdf 0.00 0.00 160.00 0.00 1018.00 0.00 6.36 0.61 3.80 3.80 60.80
sdg 0.00 0.00 164.00 0.00 857.00 0.00 5.23 0.57 3.49 3.50 57.40
sdh 0.00 0.00 169.00 0.00 995.00 0.00 5.89 0.61 3.63 3.60 60.90
sdi 0.00 0.00 173.00 0.00 1066.00 0.00 6.16 0.60 3.49 3.46 59.90
dm-0 0.00 0.00 320.00 0.00 2000.00 0.00 6.25 1.99 6.21 3.12 100.00
dm-1 0.00 0.00 328.00 0.00 1721.00 0.00 5.25 1.91 5.83 3.01 98.60
dm-2 0.00 0.00 337.00 0.00 2095.00 0.00 6.22 1.99 5.92 2.97 100.00
dm-3 0.00 0.00 345.00 0.00 2182.00 0.00 6.32 1.99 5.77 2.90 100.00
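To spot saturated devices in a listing like this without reading every row, the last column (%util) can be filtered against a threshold. The sketch below assumes extended iostat output in the layout shown above, presumably from `iostat -x`, with %util as the final field; the function name `busy_devices` is hypothetical.

```shell
# List devices whose %util (last column) is at or above a threshold.
# arg: threshold in percent; iostat text is read on stdin.
busy_devices() {
    awk -v min="$1" \
        '$1 !~ /^Device/ && NF > 1 && $NF + 0 >= min + 0 { print $1 }'
}

# Example use (not run here):
#   iostat -x 1 2 | busy_devices 90
```

Applied to the sample above with a threshold of 90, this selects the four dm-* devices, while the underlying sd* disks sit at around 60%.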
After the scrub finishes, use -xv to check for errors.
# zpool status zp1 -xv
pool 'zp1' is healthy
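In a script, the same check can be turned into an exit status. This sketch matches the message shown above plus the "all pools are healthy" message that `zpool status -x` prints when no pool is named; the function name `pool_is_healthy` is hypothetical, and matching on message text is an assumption that may not hold for every ZFS version.

```shell
# Succeeds when `zpool status -x` text on stdin reports no problems.
pool_is_healthy() {
    grep -Eq "is healthy|all pools are healthy"
}

# Example use (not run here):
#   zpool status -x zp1 | pool_is_healthy || echo "zp1 needs attention"
```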
References
- man zpool
zpool scrub [-s] pool ...
Begins a scrub. The scrub examines all data in the specified pools to verify that it checksums correctly.
For replicated (mirror or raidz) devices, ZFS automatically repairs any damage discovered during the scrub.
The "zpool status" command reports the progress of the scrub and summarizes the results of the scrub upon
completion.
Scrubbing and resilvering are very similar operations. The difference is that resilvering only examines
data that ZFS knows to be out of date (for example, when attaching a new device to a mirror or replacing an
existing device), whereas scrubbing examines all data to discover silent errors due to hardware faults or
disk failure.
Because scrubbing and resilvering are I/O-intensive operations, ZFS only allows one at a time. If a scrub
is already in progress, the "zpool scrub" command terminates it and starts a new scrub. If a resilver is in
progress, ZFS does not allow a scrub to be started until the resilver completes.
-s Stop scrubbing.
zpool status [-xvD] [-T d | u] [pool] ... [interval [count]]
Displays the detailed health status for the given pools. If no pool is specified, then the status of each
pool in the system is displayed. For more information on pool and device health, see the "Device Failure
and Recovery" section.
If a scrub or resilver is in progress, this command reports the percentage done and the estimated time to
completion. Both of these are only approximate, because the amount of data in the pool and the other
workloads on the system can change.
-x Only display status for pools that are exhibiting errors or are otherwise unavailable. Warnings
about pools not using the latest on-disk format will not be included.
-v Displays verbose data error information, printing out a complete list of all data errors since
the last complete pool scrub.
-D Display a histogram of deduplication statistics, showing the allocated (physically present on
disk) and referenced (logically referenced in the pool) block counts and sizes by reference
count.
-T d | u Display a time stamp.
Specify u for a printed representation of the internal representation of time. See time(2).
Specify d for standard date format. See date(1).
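According to the excerpt above, running "zpool scrub" on a pool that is already being scrubbed terminates that scrub and starts a new one, so a scheduled job may want to check first. A minimal sketch follows; the function name `scrub_running` is hypothetical, and it relies on the "scrub in progress" wording seen in the status output earlier.

```shell
# Succeeds when `zpool status` text on stdin shows a scrub under way.
scrub_running() {
    grep -q "scrub in progress"
}

# Example use (not run here): only start a scrub if none is running.
#   zpool status zp1 | scrub_running || zpool scrub zp1
```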