SSD 因 NCQ hang,failed command: WRITE FPDMA QUEUED / tag 28 ncq 4096 out

4 minute read

背景

新购入的建兴ZETA 256G,在CentOS 7.2中,用PostgreSQL自带的fsync测试工具pg_test_fsync测试IOPS时,突然IO hang住了。

dmesg报了一堆这样的超时:

	 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)  
[  895.604149] ata1.00: status: { DRDY }  
[  895.606940] ata1.00: failed command: WRITE FPDMA QUEUED  
[  895.609389] ata1.00: cmd 61/08:e0:38:bd:06/00:00:00:00:00/40 tag 28 ncq 4096 out  
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)  
[  895.614144] ata1.00: status: { DRDY }  
[  895.616516] ata1.00: failed command: WRITE FPDMA QUEUED  
[  895.618665] ata1.00: cmd 61/10:e8:00:90:06/02:00:00:00:00/40 tag 29 ncq 270336 out  
         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)  
[  895.622940] ata1.00: status: { DRDY }  
[  895.625089] ata1.00: failed command: WRITE FPDMA QUEUED  
[  895.627236] ata1.00: cmd 61/00:f0:00:8c:06/04:00:00:00:00/40 tag 30 ncq 524288 out  
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)  
[  895.631176] ata1.00: status: { DRDY }  
[  895.633133] ata1: hard resetting link  
[  895.937682] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)  
[  895.940816] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out  
[  895.940830] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out  
[  895.941234] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out  
[  895.941243] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out  
[  895.941314] ata1.00: configured for UDMA/133  
[  895.941356] ata1.00: device reported invalid CHS sector 0  
[  895.941362] ata1.00: device reported invalid CHS sector 0  
[  895.941366] ata1.00: device reported invalid CHS sector 0  
[  895.941369] ata1.00: device reported invalid CHS sector 0  
[  895.941374] ata1.00: device reported invalid CHS sector 0  
[  895.941377] ata1.00: device reported invalid CHS sector 0  
[  895.941381] ata1.00: device reported invalid CHS sector 0  
[  895.941384] ata1.00: device reported invalid CHS sector 0  
[  895.941388] ata1.00: device reported invalid CHS sector 0  
[  895.941392] ata1.00: device reported invalid CHS sector 0  
[  895.941395] ata1.00: device reported invalid CHS sector 0  
[  895.941399] ata1.00: device reported invalid CHS sector 0  
[  895.941403] ata1.00: device reported invalid CHS sector 0  
[  895.941408] ata1.00: device reported invalid CHS sector 0  
[  895.941434] ata1: EH complete  

现象和网上描述的类似,很多SSD有这样的问题。

https://bugzilla.kernel.org/show_bug.cgi?id=15573

https://communities.intel.com/thread/77801?start=0&tstart=0

http://www.cnblogs.com/welhzh/p/4469206.html

http://patchwork.ozlabs.org/patch/49365/

建议关闭ncq。

什么是NCQ?

http://baike.baidu.com/view/17501.htm

NCQ(Native Command Queuing,全速命令队列)是被设计用于改进在日益增加的负荷情况下硬盘的性能和稳定性的技术。当用户的应用程序发送多条指令到用户的硬盘,NCQ硬盘可以优化完成这些指令的顺序,从而降低机械负荷达到提升性能的目的。 NCQ技术是一种使硬盘内部优化工作负荷执行顺序,通过对内部队列中的命令进行重新排序实现智能数据管理,改善硬盘因机械部件而受到的各种性能制约。

貌似对SSD没什么用,所以是SSD的话,可以关闭它。

查看了一下,装载ncq的信息如下:

# dmesg|gerp ncq  
  
[    4.157792] ahci 0000:00:1f.2: flags: 64bit ncq sntf ilck pm led clo pio slum part ems apst   

解决办法:

禁用ncq,启动项中加入libata.force=noncq

[root@digoal ahci]# vi /etc/default/grub   
  
GRUB_CMDLINE_LINUX="rhgb quiet libata.force=noncq"  

重启。

或者修改/boot/grub2/grub.cfg 加到rhgb quiet后面

libata.force=noncq   

(如果我有机械盘,又有SSD,怎么处理呢?)

(机械盘需要ncq,而SSD不需要NCQ。)

(此时需要patch libata的代码才行,针对硬盘型号来处理。)

针对不同的盘设置不同的queue_depth,设置为1和禁用ncq功能相当。

Disabling ncq by putting the following in /etc/conf.d/local.start.   
echo 1 > /sys/block/sdX/device/queue_depth   

解释一下 libata.force=noncq

通过查看libata的模块信息

[root@digoal ~]# modinfo libata  
filename:       /lib/modules/3.10.0-327.el7.x86_64/kernel/drivers/ata/libata.ko  
version:        3.00  
license:        GPL  
description:    Library module for ATA devices  
author:         Jeff Garzik  
rhelversion:    7.2  
srcversion:     042B7B276FD3988FFBEFB88  
depends:          
intree:         Y  
vermagic:       3.10.0-327.el7.x86_64 SMP mod_unload modversions   
signer:         CentOS Linux kernel signing key  
sig_key:        79:AD:88:6A:11:3C:A0:22:35:26:33:6C:0F:82:5B:8A:94:29:6A:B3  
sig_hashalgo:   sha256  
parm:           acpi_gtf_filter:filter mask for ACPI _GTF commands, set to filter out (0x1=set xfermode, 0x2=lock/freeze lock, 0x4=DIPM, 0x8=FPDMA non-zero offset, 0x10=FPDMA DMA Setup FIS auto-activate) (int)  
parm:           force:Force ATA configurations including cable type, link speed and transfer mode (see Documentation/kernel-parameters.txt for details) (string)  
parm:           atapi_enabled:Enable discovery of ATAPI devices (0=off, 1=on [default]) (int)  
parm:           atapi_dmadir:Enable ATAPI DMADIR bridge support (0=off [default], 1=on) (int)  
parm:           atapi_passthru16:Enable ATA_16 passthru for ATAPI devices (0=off, 1=on [default]) (int)  
parm:           fua:FUA support (0=off [default], 1=on) (int)  
parm:           ignore_hpa:Ignore HPA limit (0=keep BIOS limits, 1=ignore limits, using full disk) (int)  
parm:           dma:DMA enable/disable (0x1==ATA, 0x2==ATAPI, 0x4==CF) (int)  
parm:           ata_probe_timeout:Set ATA probing timeout (seconds) (int)  
parm:           noacpi:Disable the use of ACPI in probe/suspend/resume (0=off [default], 1=on) (int)  
parm:           allow_tpm:Permit the use of TPM commands (0=off [default], 1=on) (int)  
parm:           atapi_an:Enable ATAPI AN media presence notification (0=0ff [default], 1=on) (int)  

看到有一个force参数,它提示详见内核文档。

[root@digoal ~]# less /usr/share/doc/kernel-doc-3.10.0/Documentation/kernel-parameters.txt  

找到了对应的解释

	libata.force=   [LIBATA] Force configurations.  The format is comma  
                        separated list of "[ID:]VAL" where ID is  
                        PORT[.DEVICE].  PORT and DEVICE are decimal numbers  
                        matching port, link or device.  Basically, it matches  
                        the ATA ID string printed on console by libata.  If  
                        the whole ID part is omitted, the last PORT and DEVICE  
                        values are used.  If ID hasn't been specified yet, the  
                        configuration applies to all ports, links and devices.  
  
                        If only DEVICE is omitted, the parameter applies to  
                        the port and all links and devices behind it.  DEVICE  
                        number of 0 either selects the first device or the  
                        first fan-out link behind PMP device.  It does not  
                        select the host link.  DEVICE number of 15 selects the  
                        host link and device attached to it.  
  
                        The VAL specifies the configuration to force.  As long  
                        as there's no ambiguity shortcut notation is allowed.  
                        For example, both 1.5 and 1.5G would work for 1.5Gbps.  
                        The following configurations can be forced.  
  
                        * Cable type: 40c, 80c, short40c, unk, ign or sata.  
                          Any ID with matching PORT is used.  
  
                        * SATA link speed limit: 1.5Gbps or 3.0Gbps.  
  
                        * Transfer mode: pio[0-7], mwdma[0-4] and udma[0-7].  
                          udma[/][16,25,33,44,66,100,133] notation is also  
                          allowed.  
  
                        * [no]ncq: Turn on or off NCQ.  # 和本文相关的部分。  
  
                        * nohrst, nosrst, norst: suppress hard, soft  
                          and both resets.  
  
                        * rstonce: only attempt one reset during  
                          hot-unplug link recovery  
  
                        * dump_id: dump IDENTIFY data.  
  
                        * atapi_dmadir: Enable ATAPI DMADIR bridge support  
  
                        * disable: Disable this device.  
  
                        If there are multiple matching configurations changing  
                        the same attribute, the last one is used.  

模块参数也可以在这里查看。

[root@digoal ~]# cd /sys/module/libata/parameters/  
[root@digoal parameters]# ll  
total 0  
-rw-r--r-- 1 root root 4096 Dec 20 21:17 acpi_gtf_filter  
-r--r--r-- 1 root root 4096 Dec 20 21:17 allow_tpm  
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_an  
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_dmadir  
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_enabled  
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_passthru16  
-r--r--r-- 1 root root 4096 Dec 20 21:17 ata_probe_timeout  
-r--r--r-- 1 root root 4096 Dec 20 21:17 dma  
-r--r--r-- 1 root root 4096 Dec 20 21:17 fua  
-rw-r--r-- 1 root root 4096 Dec 20 21:17 ignore_hpa  
-r--r--r-- 1 root root 4096 Dec 20 21:17 noacpi  

Flag Counter

digoal’s 大量PostgreSQL文章入口