Handling Linux page allocation failure - zone_reclaim_mode

Background

The Linux kernel fails to allocate memory. Symptoms:

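Once memory usage reaches a certain level, the system hangs.

dmesg may contain errors like the ones below. The system hangs, cannot be reached over the network, and has to be rebooted to recover.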

page allocation failure  
  
  
Oct 24 11:27:42  kernel: : [21289.479063] python2.6: page allocation failure. order:1, mode:0x20  
  
kernel: swapper: page allocation failure. order:1, mode:0x20  
kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.2.1.el6.x86_64 #1  
kernel: Call Trace:  
kernel: <IRQ>  [<ffffffff8112c207>] ? __alloc_pages_nodemask+0x757/0x8d0  
kernel: [<ffffffff81166ab2>] ? kmem_getpages+0x62/0x170  
kernel: [<ffffffff811676ca>] ? fallback_alloc+0x1ba/0x270  
kernel: [<ffffffff8116711f>] ? cache_grow+0x2cf/0x320  
kernel: [<ffffffff81167449>] ? ____cache_alloc_node+0x99/0x160  
kernel: [<ffffffff811683cb>] ? kmem_cache_alloc+0x11b/0x190  
kernel: [<ffffffff81439d58>] ? sk_prot_alloc+0x48/0x1c0  
kernel: [<ffffffff8143ae32>] ? sk_clone+0x22/0x2e0  
kernel: [<ffffffff81489d66>] ? inet_csk_clone+0x16/0xd0  
kernel: [<ffffffff814a2c73>] ? tcp_create_openreq_child+0x23/0x450  
kernel: [<ffffffff814a046d>] ? tcp_v4_syn_recv_sock+0x4d/0x310  
kernel: [<ffffffff814a2a16>] ? tcp_check_req+0x226/0x460  
kernel: [<ffffffff8149ff0b>] ? tcp_v4_do_rcv+0x35b/0x430  
kernel: [<ffffffff81082034>] ? mod_timer+0x144/0x220  
kernel: [<ffffffff814a171e>] ? tcp_v4_rcv+0x4fe/0x8d0  
kernel: [<ffffffff814a171e>] ? tcp_v4_rcv+0x4fe/0x8d0  
kernel: [<ffffffff8147f50d>] ? ip_local_deliver_finish+0xdd/0x2d0  
kernel: [<ffffffff8147f798>] ? ip_local_deliver+0x98/0xa0  
kernel: [<ffffffff8147ec5d>] ? ip_rcv_finish+0x12d/0x440  
kernel: [<ffffffff8147f1e5>] ? ip_rcv+0x275/0x350  
kernel: [<ffffffff814483bb>] ? __netif_receive_skb+0x4ab/0x750  
kernel: [<ffffffff8144a798>] ? netif_receive_skb+0x58/0x60  
kernel: [<ffffffffa008b975>] ? vmxnet3_rq_rx_complete+0x365/0x890 [vmxnet3]  
kernel: [<ffffffff8128d2b0>] ? swiotlb_map_page+0x0/0x100  
kernel: [<ffffffffa008c0f3>] ? vmxnet3_poll_rx_only+0x43/0xc0 [vmxnet3]  
kernel: [<ffffffff8144cf63>] ? net_rx_action+0x103/0x2f0  
kernel: [<ffffffff81076fb1>] ? __do_softirq+0xc1/0x1e0  
kernel: [<ffffffff810e1720>] ? handle_IRQ_event+0x60/0x170  
kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30  
kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0  
kernel: [<ffffffff81076d95>] ? irq_exit+0x85/0x90  
kernel: [<ffffffff81516f15>] ? do_IRQ+0x75/0xf0  
kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11  
kernel: <EOI>  [<ffffffff8103b90b>] ? native_safe_halt+0xb/0x10  
kernel: [<ffffffff8101495d>] ? default_idle+0x4d/0xb0  
kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110  
kernel: [<ffffffff81506d9c>] ? start_secondary+0x2ac/0x2ef  
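
To read the first line of such a report, it helps to turn `order` into an actual size. The sketch below is only an illustration: `parse_failure` and `PAGE_SIZE` are hypothetical names, and the 4 KiB page size is an assumption (the usual x86_64 default). An order-N request asks for 2^N physically contiguous pages, so `order:1` means two adjacent pages. The `mode` value is returned as a raw number; on 2.6.32-era kernels `0x20` is usually read as GFP_ATOMIC, but that mapping depends on the kernel version and is not decoded here.

```python
import re

# Assumption: 4 KiB base pages, the usual default on x86_64.
PAGE_SIZE = 4096

# Matches e.g. "python2.6: page allocation failure. order:1, mode:0x20"
FAILURE_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def parse_failure(line):
    """Return a dict describing the failed request, or None if the line does not match."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    order = int(m.group("order"))
    pages = 1 << order  # an order-N request needs 2**N contiguous pages
    return {
        "comm": m.group("comm"),          # process name that hit the failure
        "order": order,
        "pages": pages,
        "bytes": pages * PAGE_SIZE,
        "mode": int(m.group("mode"), 16),  # raw GFP flags, left undecoded
    }
```

For the `python2.6` line above this gives order 1, that is 2 contiguous pages or 8192 bytes under the 4 KiB assumption. Even such a small request can fail when free memory is fragmented into single pages.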

Solution - upgrade the kernel

1. Upgrade to kernel-2.6.32-358.el6 or later. (This does not eliminate the problem completely; it only mitigates it.)

Update to kernel-2.6.32-358.el6 or higher, which contains the enhancement described in the Root Cause section below.  
  
Please note, this update (or newer) does not completely eliminate the possibility of the occurrence of the page allocation failure.  
The below mentioned workaround also works in 2.6.32-358.el6 and newer if the issue still persists even after the update.  

Solution - tune kernel parameters

vi /etc/sysctl.conf or vi /etc/sysctl.d/xxx.conf  
  
vm.zone_reclaim_mode = 1  
vm.min_free_kbytes = 512000  
  
sysctl -w vm.zone_reclaim_mode=1  
sysctl -w vm.min_free_kbytes=512000  
  
The following tunables can be used in an attempt to alleviate or prevent the reported condition:  
  
Increase vm.min_free_kbytes value, for example to a higher value than a single allocation request.  
Change vm.zone_reclaim_mode to 1 if it's set to zero, so the system can reclaim back memory from cached memory.  
Both settings can be set in /etc/sysctl.conf, and loaded using sysctl -p /etc/sysctl.conf.  
  
For more information on these tunables, install the kernel-doc package and refer to the file /usr/share/doc/kernel-doc-2.6.32/Documentation/sysctl/vm.txt.  
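
Before and after changing the settings, it can be useful to confirm what the running kernel actually has. The following is a minimal sketch, not a replacement for `sysctl`: it relies on the fact that a sysctl name such as `vm.min_free_kbytes` is exposed as the file `/proc/sys/vm/min_free_kbytes`. The helper names (`sysctl_path`, `read_sysctl`, `report`) are made up for this example, and the `root` parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name (e.g. 'vm.min_free_kbytes') to its /proc/sys file."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a tunable as a stripped string."""
    return sysctl_path(name, root).read_text().strip()


def report(desired, root="/proc/sys"):
    """Return {name: (current, wanted)} for every tunable whose value differs."""
    diffs = {}
    for name, wanted in desired.items():
        current = read_sysctl(name, root)
        if current != str(wanted):
            diffs[name] = (current, str(wanted))
    return diffs
```

Calling `report({"vm.zone_reclaim_mode": 1, "vm.min_free_kbytes": 512000})` on the affected host returns an empty dict once both values from this post are in effect.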

Root cause

In releases before RHEL 6.4, kswapd does not handle this situation:

Before RHEL 6.4, kswapd does not try to free contiguous pages.

This can cause GFP_ATOMIC allocations requests to fail repeatedly,
when nothing else in the system defragments memory.

With RHEL 6.4 and newer, kswapd will compact (defragment) free memory, when required.

Please note that allocation failures can still happen.

For example, a large burst of GFP_ATOMIC allocations may occur that kswapd struggles to keep up with.

However, these allocations should eventually succeed.
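
One way to see whether contiguous free memory is really scarce is to look at `/proc/buddyinfo`. That file is not mentioned in the quoted text and is brought in here only as an illustration. Each line lists, for one node and zone, the number of free blocks at order 0, 1, 2 and so on. The sketch below parses that layout; the function names are hypothetical.

```python
def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts, where the list index is the order."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: Node <n>, zone <name> <count order 0> <count order 1> ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        zone = fields[3]
        result[(node, zone)] = [int(n) for n in fields[4:]]
    return result


def free_blocks_at_or_above(counts, order):
    """Count free blocks large enough to satisfy a request of the given order."""
    return sum(counts[order:])
```

If `free_blocks_at_or_above(counts, 1)` is zero or close to zero for the Normal zone while the order-0 count is large, free memory exists but is fragmented, which fits the order:1 failures shown earlier.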

There are also other more specific cases that can result in page allocation failures and cause additional issues.
Please refer to the following articles for more information

zone_reclaim_mode explained

Zone_reclaim_mode allows someone to set more or less aggressive approaches to  
reclaim memory when a zone runs out of memory. If it is set to zero then no  
zone reclaim occurs. Allocations will be satisfied from other zones / nodes  
in the system.  
  
This is value ORed together of  
  
1 = Zone reclaim on  
2 = Zone reclaim writes dirty pages out  
4 = Zone reclaim swaps pages  
  
zone_reclaim_mode is set during bootup to 1 if it is determined that pages  
from remote zones will cause a measurable performance reduction. The  
page allocator will then reclaim easily reusable pages (those page  
cache pages that are currently not used) before allocating off node pages.  
  
0: It may be beneficial to switch off zone reclaim if the system is  
used for a file server and all of memory should be used for caching files  
from disk. In that case the caching effect is more important than  
data locality.  
  
2: Allowing zone reclaim to write out pages stops processes that are  
writing large amounts of data from dirtying pages on other nodes. Zone  
reclaim will write out dirty pages if a zone fills up and so effectively  
throttle the process. This may decrease the performance of a single process  
since it cannot use all of system memory to buffer the outgoing writes  
anymore but it preserves the memory on other nodes so that the performance  
of other processes running on other nodes will not be affected.  
  
4: Allowing regular swap effectively restricts allocations to the local  
node unless explicitly overridden by memory policies or cpuset  
configurations.  
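
Because the value is a bitmask, a setting such as 3 means "1 and 2 together". A small sketch of decoding it, using only the three bits listed above (the function and table names are made up for this example):

```python
# Bit meanings as listed in the quoted vm.txt excerpt.
ZONE_RECLAIM_BITS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value):
    """Return the list of behaviours enabled by a zone_reclaim_mode value."""
    if value == 0:
        return ["zone reclaim off"]
    # Keep every label whose bit is set in the value.
    return [label for bit, label in ZONE_RECLAIM_BITS.items() if value & bit]
```

With the setting recommended in this post, `decode_zone_reclaim_mode(1)` returns only `["zone reclaim on"]`: clean page cache can be reclaimed, but dirty page writeout and swapping are not enabled for zone reclaim.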

References

http://www.zbuse.com/2014/07/837.html

https://serverfault.com/questions/236170/page-allocation-failure-am-i-running-out-of-memory

https://access.redhat.com/solutions/90883

"Handling Linux page allocation failure - lowmem_reserve_ratio"
