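SSD hangs because of NCQ: failed command: WRITE FPDMA QUEUED / tag 28 ncq 4096 out
Background
A newly purchased Lite-On ZETA 256G SSD, running CentOS 7.2, suddenly hung on I/O while I was measuring IOPS with pg_test_fsync, the fsync test tool bundled with PostgreSQL.
dmesg reported a flood of timeouts like these: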
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 895.604149] ata1.00: status: { DRDY }
[ 895.606940] ata1.00: failed command: WRITE FPDMA QUEUED
[ 895.609389] ata1.00: cmd 61/08:e0:38:bd:06/00:00:00:00:00/40 tag 28 ncq 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 895.614144] ata1.00: status: { DRDY }
[ 895.616516] ata1.00: failed command: WRITE FPDMA QUEUED
[ 895.618665] ata1.00: cmd 61/10:e8:00:90:06/02:00:00:00:00/40 tag 29 ncq 270336 out
res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 895.622940] ata1.00: status: { DRDY }
[ 895.625089] ata1.00: failed command: WRITE FPDMA QUEUED
[ 895.627236] ata1.00: cmd 61/00:f0:00:8c:06/04:00:00:00:00/40 tag 30 ncq 524288 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 895.631176] ata1.00: status: { DRDY }
[ 895.633133] ata1: hard resetting link
[ 895.937682] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 895.940816] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 895.940830] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 895.941234] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 895.941243] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 895.941314] ata1.00: configured for UDMA/133
[ 895.941356] ata1.00: device reported invalid CHS sector 0
[ 895.941362] ata1.00: device reported invalid CHS sector 0
[ 895.941366] ata1.00: device reported invalid CHS sector 0
[ 895.941369] ata1.00: device reported invalid CHS sector 0
[ 895.941374] ata1.00: device reported invalid CHS sector 0
[ 895.941377] ata1.00: device reported invalid CHS sector 0
[ 895.941381] ata1.00: device reported invalid CHS sector 0
[ 895.941384] ata1.00: device reported invalid CHS sector 0
[ 895.941388] ata1.00: device reported invalid CHS sector 0
[ 895.941392] ata1.00: device reported invalid CHS sector 0
[ 895.941395] ata1.00: device reported invalid CHS sector 0
[ 895.941399] ata1.00: device reported invalid CHS sector 0
[ 895.941403] ata1.00: device reported invalid CHS sector 0
[ 895.941408] ata1.00: device reported invalid CHS sector 0
[ 895.941434] ata1: EH complete
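To see at a glance which device and which NCQ tags timed out, a log like the one above can be condensed with a small awk filter. This is a minimal sketch, not an established tool. `summarize_ncq_timeouts` is a hypothetical helper name. It assumes the message layout shown above: an `ataN.DD: cmd ... tag T ncq BYTES out` line followed by a `res ... Emask 0x4 (timeout)` line.

```shell
# Hypothetical helper: summarize libata NCQ command timeouts from a kernel
# log read on stdin. Assumes each "ataN.DD: cmd ... tag T ncq ..." line is
# followed by its "res ... Emask 0x4 (timeout)" result line.
summarize_ncq_timeouts() {
    awk '
        # Remember the device and tag of the most recent queued command.
        /ata[0-9]+\.[0-9]+: cmd .* tag [0-9]+ ncq/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ata[0-9]+\.[0-9]+:$/) dev = substr($i, 1, length($i) - 1)
                if ($i == "tag") tag = $(i + 1)
            }
            pending = 1
            next
        }
        # A following "(timeout)" result line means that command timed out.
        pending && /Emask 0x4 \(timeout\)/ {
            count[dev]++
            if (dev in tags) tags[dev] = tags[dev] "," tag
            else tags[dev] = tag
            pending = 0
        }
        # One summary line per device: timeout count and the tags involved.
        END {
            for (d in count) printf "%s timeouts=%d tags=%s\n", d, count[d], tags[d]
        }
    '
}
```

Usage: `dmesg | summarize_ncq_timeouts`. Tags are collected in log order, so a burst of consecutive tags (28, 29, 30 above) timing out together is easy to spot.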
The symptom matches reports found online; many SSDs have this problem:
https://bugzilla.kernel.org/show_bug.cgi?id=15573
https://communities.intel.com/thread/77801?start=0&tstart=0
http://www.cnblogs.com/welhzh/p/4469206.html
http://patchwork.ozlabs.org/patch/49365/
The usual recommendation is to disable NCQ.
What is NCQ?
http://baike.baidu.com/view/17501.htm
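NCQ (Native Command Queuing) is a technique designed to improve hard disk performance and reliability under increasing load. When an application sends multiple commands to the disk, an NCQ drive can optimize the order in which it completes them, reducing mechanical load and thereby improving performance. NCQ lets the drive optimize its own workload by reordering the commands in its internal queue, working around the performance limits imposed by its mechanical parts.
This seems to matter little on an SSD, which has no moving parts to schedule around, so on an SSD it can be turned off (though an SSD can still use the queue to service several commands in parallel, so disabling it may cost some throughput).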
Searching the boot messages for ncq shows the AHCI controller's capability flags, which include ncq:
# dmesg | grep ncq
[ 4.157792] ahci 0000:00:1f.2: flags: 64bit ncq sntf ilck pm led clo pio slum part ems apst
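That flag only says the controller supports NCQ. It does not say whether a particular drive is actually using it. At probe time libata prints a per-device line that, as far as I know, contains `NCQ (depth H/D)` (or `NCQ (depth D)`) when NCQ is in use and `NCQ (not used)` when it has been disabled for that device. The sketch below pulls that state out of the log. `ncq_status` is a hypothetical helper, and the message wording is an assumption based on libata's usual probe output.

```shell
# Hypothetical helper: report the NCQ state libata chose for each ATA device,
# reading a kernel log on stdin. Assumes probe lines of the form
# "ataN.DD: ... NCQ (depth 31/32)" or "ataN.DD: ... NCQ (not used)".
ncq_status() {
    awk '
        /ata[0-9]+\.[0-9]+: .*NCQ \(/ {
            match($0, /ata[0-9]+\.[0-9]+/)
            dev = substr($0, RSTART, RLENGTH)
            match($0, /NCQ \([^)]*\)/)
            # A later probe (e.g. after a reset) overwrites the earlier state.
            status[dev] = substr($0, RSTART, RLENGTH)
        }
        END { for (d in status) print d ": " status[d] }
    '
}
```

Usage: `dmesg | ncq_status`. After booting with NCQ disabled, the SSD's entry should read `NCQ (not used)` if the assumed message format holds.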
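Solution:
Disable NCQ by adding libata.force=noncq to the kernel boot parameters, appending it to the existing GRUB_CMDLINE_LINUX value:
[root@digoal ahci]# vi /etc/default/grub
GRUB_CMDLINE_LINUX="rhgb quiet libata.force=noncq"
Regenerate grub.cfg so the change takes effect (on UEFI systems the output file is /boot/efi/EFI/centos/grub.cfg), then reboot:
[root@digoal ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
[root@digoal ~]# reboot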
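Alternatively, edit /boot/grub2/grub.cfg directly and append the following after rhgb quiet on the kernel line (a manual edit is lost the next time grub2-mkconfig regenerates the file):
libata.force=noncq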
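After rebooting, it is worth confirming that the parameter actually reached the running kernel, which is recorded in /proc/cmdline. This is a minimal sketch: `has_noncq` is a hypothetical helper. The file argument exists only so the check can be run against a sample file and defaults to /proc/cmdline.

```shell
# Hypothetical helper: succeed if a libata.force=... option containing
# "noncq" is on the kernel command line. Defaults to /proc/cmdline.
has_noncq() {
    cmdline_file=${1:-/proc/cmdline}
    # Split the command line into one option per line, then look for
    # libata.force with noncq (global or per-port, e.g. 1.00:noncq).
    tr ' ' '\n' < "$cmdline_file" | grep -q '^libata\.force=.*noncq'
}
```

Usage: `has_noncq && echo "NCQ disabled via libata.force"`.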
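(What if the machine has both spinning disks and SSDs?)
(The spinning disks want NCQ, while the SSD does not.)
(Matching by drive model would require patching libata's code. However, libata.force also accepts a PORT.DEVICE prefix, so NCQ can be disabled on the SSD's port alone, e.g. libata.force=1.00:noncq for the ata1.00 device in the log above.)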
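Another option is to set queue_depth per disk; a queue_depth of 1 is roughly equivalent to disabling NCQ.
To apply this at every boot, put the following in a startup script (/etc/conf.d/local.start on Gentoo; on CentOS 7, /etc/rc.d/rc.local, which must be made executable), replacing sdX with the SSD's device name: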
echo 1 > /sys/block/sdX/device/queue_depth
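The echo above targets one named disk. For the mixed HDD/SSD case, one option is to pick the SSDs by their `queue/rotational` flag (0 for non-rotational devices) and leave spinning disks at their default depth. This is a minimal sketch under that assumption. `limit_ssd_queue_depth` and the `SYSFS_ROOT` override are hypothetical, and the override exists only for testing (on a real system it is simply /sys).

```shell
# Hypothetical helper: set queue_depth=1 (NCQ effectively off) on every
# non-rotational sd* disk; rotational disks keep their current depth.
# SYSFS_ROOT is a hypothetical test override; it defaults to /sys.
limit_ssd_queue_depth() {
    sysfs=${SYSFS_ROOT:-/sys}
    for dev in "$sysfs"/block/sd*; do
        # Skip anything without the expected attributes (or an unmatched glob).
        [ -e "$dev/queue/rotational" ] || continue
        [ -w "$dev/device/queue_depth" ] || continue
        if [ "$(cat "$dev/queue/rotational")" = "0" ]; then
            echo 1 > "$dev/device/queue_depth"
            echo "${dev##*/}: queue_depth set to 1"
        fi
    done
}
```

Calling `limit_ssd_queue_depth` from the startup script replaces the hard-coded sdX, so device-name changes between boots do not matter.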
What libata.force=noncq means
Look at the libata module information:
[root@digoal ~]# modinfo libata
filename: /lib/modules/3.10.0-327.el7.x86_64/kernel/drivers/ata/libata.ko
version: 3.00
license: GPL
description: Library module for ATA devices
author: Jeff Garzik
rhelversion: 7.2
srcversion: 042B7B276FD3988FFBEFB88
depends:
intree: Y
vermagic: 3.10.0-327.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: 79:AD:88:6A:11:3C:A0:22:35:26:33:6C:0F:82:5B:8A:94:29:6A:B3
sig_hashalgo: sha256
parm: acpi_gtf_filter:filter mask for ACPI _GTF commands, set to filter out (0x1=set xfermode, 0x2=lock/freeze lock, 0x4=DIPM, 0x8=FPDMA non-zero offset, 0x10=FPDMA DMA Setup FIS auto-activate) (int)
parm: force:Force ATA configurations including cable type, link speed and transfer mode (see Documentation/kernel-parameters.txt for details) (string)
parm: atapi_enabled:Enable discovery of ATAPI devices (0=off, 1=on [default]) (int)
parm: atapi_dmadir:Enable ATAPI DMADIR bridge support (0=off [default], 1=on) (int)
parm: atapi_passthru16:Enable ATA_16 passthru for ATAPI devices (0=off, 1=on [default]) (int)
parm: fua:FUA support (0=off [default], 1=on) (int)
parm: ignore_hpa:Ignore HPA limit (0=keep BIOS limits, 1=ignore limits, using full disk) (int)
parm: dma:DMA enable/disable (0x1==ATA, 0x2==ATAPI, 0x4==CF) (int)
parm: ata_probe_timeout:Set ATA probing timeout (seconds) (int)
parm: noacpi:Disable the use of ACPI in probe/suspend/resume (0=off [default], 1=on) (int)
parm: allow_tpm:Permit the use of TPM commands (0=off [default], 1=on) (int)
parm: atapi_an:Enable ATAPI AN media presence notification (0=0ff [default], 1=on) (int)
There is a force parameter, and its description points to the kernel documentation for details.
[root@digoal ~]# less /usr/share/doc/kernel-doc-3.10.0/Documentation/kernel-parameters.txt
The corresponding entry reads:
libata.force= [LIBATA] Force configurations. The format is comma
separated list of "[ID:]VAL" where ID is
PORT[.DEVICE]. PORT and DEVICE are decimal numbers
matching port, link or device. Basically, it matches
the ATA ID string printed on console by libata. If
the whole ID part is omitted, the last PORT and DEVICE
values are used. If ID hasn't been specified yet, the
configuration applies to all ports, links and devices.
If only DEVICE is omitted, the parameter applies to
the port and all links and devices behind it. DEVICE
number of 0 either selects the first device or the
first fan-out link behind PMP device. It does not
select the host link. DEVICE number of 15 selects the
host link and device attached to it.
The VAL specifies the configuration to force. As long
as there's no ambiguity shortcut notation is allowed.
For example, both 1.5 and 1.5G would work for 1.5Gbps.
The following configurations can be forced.
* Cable type: 40c, 80c, short40c, unk, ign or sata.
Any ID with matching PORT is used.
* SATA link speed limit: 1.5Gbps or 3.0Gbps.
* Transfer mode: pio[0-7], mwdma[0-4] and udma[0-7].
udma[/][16,25,33,44,66,100,133] notation is also
allowed.
* [no]ncq: Turn on or off NCQ.   # the part relevant to this article
* nohrst, nosrst, norst: suppress hard, soft
and both resets.
* rstonce: only attempt one reset during
hot-unplug link recovery
* dump_id: dump IDENTIFY data.
* atapi_dmadir: Enable ATAPI DMADIR bridge support
* disable: Disable this device.
If there are multiple matching configurations changing
the same attribute, the last one is used.
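Because VAL can carry a PORT.DEVICE prefix, noncq can be limited to specific devices rather than applied to every port. For example, `1.00:noncq` targets the ata1.00 device from the log above. The sketch below builds such a value from the probe messages of drives whose model string contains a given pattern. `noncq_force_for_model` is a hypothetical helper. The probe-line layout `ataN.DD: ATA-x: <model>, <firmware>, max ...` and the `ZETA` pattern used in the usage note are assumptions.

```shell
# Hypothetical helper: print a libata.force value that disables NCQ only on
# devices whose libata probe line (assumed "ataN.DD: ATA-x: model, fw, max ...")
# contains PATTERN. Reads the kernel log on stdin.
noncq_force_for_model() {
    pattern=$1
    awk -v pat="$pattern" '
        /ata[0-9]+\.[0-9]+: ATA-[0-9]+: / && index($0, pat) {
            match($0, /ata[0-9]+\.[0-9]+/)
            id = substr($0, RSTART + 3, RLENGTH - 3)   # "ata1.00" -> "1.00"
            if (!(id in seen)) {
                seen[id] = 1
                if (out == "") out = id ":noncq"
                else out = out "," id ":noncq"
            }
        }
        END { if (out != "") print "libata.force=" out }
    '
}
```

Usage: `dmesg | noncq_force_for_model ZETA`, then copy the printed value into GRUB_CMDLINE_LINUX. Per the documentation above, a comma-separated list with one ID per entry applies each setting only to its own device.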
Module parameters can also be inspected here:
[root@digoal ~]# cd /sys/module/libata/parameters/
[root@digoal parameters]# ll
total 0
-rw-r--r-- 1 root root 4096 Dec 20 21:17 acpi_gtf_filter
-r--r--r-- 1 root root 4096 Dec 20 21:17 allow_tpm
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_an
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_dmadir
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_enabled
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_passthru16
-r--r--r-- 1 root root 4096 Dec 20 21:17 ata_probe_timeout
-r--r--r-- 1 root root 4096 Dec 20 21:17 dma
-r--r--r-- 1 root root 4096 Dec 20 21:17 fua
-rw-r--r-- 1 root root 4096 Dec 20 21:17 ignore_hpa
-r--r--r-- 1 root root 4096 Dec 20 21:17 noacpi
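The listing shows file names only. To print each parameter together with its current value, one option is a loop over the directory. Note that force does not appear in the listing above. It seems not to be exported to sysfs, so /proc/cmdline remains the place to confirm it. `show_module_params` is a hypothetical helper, and `SYSFS_ROOT` is a hypothetical test override that defaults to /sys.

```shell
# Hypothetical helper: print name=value for every readable parameter of a
# loaded module, e.g. show_module_params libata.
# SYSFS_ROOT is a hypothetical test override; it defaults to /sys.
show_module_params() {
    module=$1
    dir=${SYSFS_ROOT:-/sys}/module/$module/parameters
    for f in "$dir"/*; do
        # Skip unreadable files (and the literal pattern if the glob is empty).
        [ -r "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}
```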