netfilter内核模块知识 - 解决nf_conntrack: table full, dropping packet

9 minute read

背景

netfilter是一个Linux内核网络包管理模块。支持包过滤、规则转发、会话状态跟踪等功能。

很多人可能遇到过nf_conntrack table full, dropping packet的问题,(正常情况下不应该出这样的问题,除非是DDos攻击,把netfilter的会话跟踪表打满了),又或者应用程序设计有问题,没有正常的关闭会话导致netfilter会话跟踪表打满。

netfilter涉及到一系列的内核参数,本文会一一介绍。

如何解决呢?

一、如何启用netfilter

启动iptables后,会加载netfilter相关模块。

例如设置一个这样的iptables规则:

vi /etc/sysconfig/iptables    
    
# Firewall configuration written by system-config-firewall    
# Manual customization of this file is not recommended.    
*filter    
:INPUT ACCEPT [0:0]    
:FORWARD ACCEPT [0:0]    
:OUTPUT ACCEPT [0:0]    
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT    
-A INPUT -p icmp -j ACCEPT    
-A INPUT -i lo -j ACCEPT    
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT    
-A INPUT -j REJECT --reject-with icmp-host-prohibited    
-A FORWARD -j REJECT --reject-with icmp-host-prohibited    
COMMIT    

启动iptables

service iptables start    

如果你启动了iptables,但是发现nf_conntrack表是空的,那么需要启动一下nf_conntrack_ipv4模块。

cat /proc/net/nf_conntrack    
没有内容    

加载对应跟踪模块即可(以CentOS 6为例):

[root@plop ~]# modprobe /proc/net/nf_conntrack_ipv4    
[root@plop ~]# lsmod | grep nf_conntrack    
nf_conntrack_ipv4       9506  0    
nf_defrag_ipv4          1483  1 nf_conntrack_ipv4    
nf_conntrack_ipv6       8748  2    
nf_defrag_ipv6         11182  1 nf_conntrack_ipv6    
nf_conntrack           79758  3 nf_conntrack_ipv4,nf_conntrack_ipv6,xt_state    
ipv6                  317340  28 sctp,ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6    

现在就可以看到nf_conntrack表的内容了

cat /proc/net/nf_conntrack    
ipv4     2 tcp      6 296 ESTABLISHED src=172.19.39.244 dst=140.205.140.205 sport=40518 dport=80 src=140.205.140.205 dst=172.19.39.244 sport=80 dport=40518 mark=0 zone=0 use=2    
ipv4     2 tcp      6 293 ESTABLISHED src=58.60.108.26 dst=172.19.39.244 sport=58826 dport=22 src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58826 mark=0 zone=0 use=2    
ipv4     2 udp      17 29 src=172.19.39.244 dst=10.143.33.51 sport=123 dport=123 [UNREPLIED] src=10.143.33.51 dst=172.19.39.244 sport=123 dport=123 mark=0 zone=0 use=2    
ipv4     2 udp      17 175 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 [ASSURED] mark=0 zone=0 use=2    
ipv4     2 tcp      6 299 ESTABLISHED src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58856 src=58.60.108.26 dst=172.19.39.244 sport=58856 dport=22 [ASSURED] mark=0 zone=0 use=2    

二、如何查看netfilter会话表

1、查看proc文件系统,可以查看会话表。

cat /proc/net/nf_conntrack    
    
ipv4     2 tcp      6 296 ESTABLISHED src=172.19.39.244 dst=140.205.140.205 sport=40518 dport=80 src=140.205.140.205 dst=172.19.39.244 sport=80 dport=40518 mark=0 zone=0 use=2    
ipv4     2 tcp      6 293 ESTABLISHED src=58.60.108.26 dst=172.19.39.244 sport=58826 dport=22 src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58826 mark=0 zone=0 use=2    
ipv4     2 udp      17 29 src=172.19.39.244 dst=10.143.33.51 sport=123 dport=123 [UNREPLIED] src=10.143.33.51 dst=172.19.39.244 sport=123 dport=123 mark=0 zone=0 use=2    
ipv4     2 udp      17 175 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 [ASSURED] mark=0 zone=0 use=2    
ipv4     2 tcp      6 299 ESTABLISHED src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58856 src=58.60.108.26 dst=172.19.39.244 sport=58856 dport=22 [ASSURED] mark=0 zone=0 use=2    

2、通过conntrack命令行工具查看conntrack的内容

# yum install -y conntrack  
  
# conntrack -L  
tcp      6 431974 ESTABLISHED src=172.19.39.244 dst=140.205.140.205 sport=40518 dport=80 src=140.205.140.205 dst=172.19.39.244 sport=80 dport=40518 [ASSURED] mark=0 use=1  
tcp      6 299 ESTABLISHED src=58.60.108.26 dst=172.19.39.244 sport=58826 dport=22 src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58826 [ASSURED] mark=0 use=1  
tcp      6 34 TIME_WAIT src=172.19.39.244 dst=100.100.2.148 sport=55898 dport=80 src=100.100.2.148 dst=172.19.39.244 sport=80 dport=55898 [ASSURED] mark=0 use=1  
tcp      6 34 TIME_WAIT src=172.19.39.244 dst=100.100.2.148 sport=55896 dport=80 src=100.100.2.148 dst=172.19.39.244 sport=80 dport=55896 [ASSURED] mark=0 use=1  
udp      17 176 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 [ASSURED] mark=0 use=1  
tcp      6 431990 ESTABLISHED src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58856 src=58.60.108.26 dst=172.19.39.244 sport=58856 dport=22 [ASSURED] mark=0 use=1  
conntrack v1.4.4 (conntrack-tools): 6 flow entries have been shown.  

3、会话表的内容解释请参考如下(我们关注第五列,还有多少秒这条会话信息会从跟踪表清除):

https://stackoverflow.com/questions/16034698/details-of-proc-net-ip-conntrack-nf-conntrack

http://en.wikipedia.org/w/index.php?title=OSI_protocols&oldid=545448188

The format of a line from /proc/net/ip_conntrack is the same as for /proc/net/nf_conntrack, except the first two columns are missing.

I’ll try to summarize the format of the latter file, as I understand it from the net/netfilter/nf_conntrack_standalone.c, net/netfilter/nf_conntrack_acct.c and the net/netfilter/nf_conntrack_proto_*.c kernel source files. The term layer refers to the OSI protocol layer model.

  • First column: The network layer protocol name (eg. ipv4).
  • Second column: The network layer protocol number.
  • Third column: The transmission layer protocol name (eg. tcp).
  • Fourth column: The transmission layer protocol number.
  • Fifth column: The seconds until the entry is invalidated.
  • Sixth column (Not all protocols): The connection state.
  • All other columns are named (key=value) or represent flags ([UNREPLIED], [ASSURED], ...). A line can contain up to two columns having the same name (eg. src and dst). Then, the first occurrence relates to the request direction and the second occurrence relates to the response direction.

Meaning of the flags:

  • [ASSURED]: Traffic has been seen in both (ie. request and response) direction.
  • [UNREPLIED]: Traffic has not been seen in response direction yet. In case the connection tracking cache overflows, these connections are dropped first.

Please note that some column names appear only for specific protocols (eg. sport and dport for TCP and UDP, type and code for ICMP). Other column names (eg. mark) appear only if the kernel was built with specific options.

Examples:

  • ipv4 2 tcp 6 300 ESTABLISHED src=1.1.1.2 dst=2.2.2.2 sport=2000 dport=80 src=2.2.2.2 dst=1.1.1.1 sport=80 dport=12000 [ASSURED] mark=0 use=2 belongs to an established TCP connection from host 1.1.1.2, port 2000, to host 2.2.2.2, port 80, from which responses are sent to host 1.1.1.1, port 12000, timing out in five minutes. For this connection, packets have been seen in both directions.
  • ipv4 2 icmp 1 3 src=1.1.1.2 dst=1.1.1.1 type=8 code=0 id=32354 src=1.1.1.1 dst=1.1.1.2 type=0 code=0 id=32354 mark=0 use=2 belongs to an ICMP echo request packet from host 1.1.1.2 to host 1.1.1.1 with an expected echo reply packet from host 1.1.1.1 to host 1.1.1.2, timing out in three seconds.
    The response destination host is not necessarily the same as the request source host, as the request source address may have been masqueraded by the response destination host.

Please note that the following information might not be up-to-date!

Fields available for all entries:

  • bytes (if accounting is enabled, request and response)
  • delta-time (if CONFIG_NF_CONNTRACK_TIMESTAMP is enabled)
  • dst (request and response)
  • mark (if CONFIG_NF_CONNTRACK_MARK is enabled)
  • packets (if accounting is enabled, request and response)
  • secctx (if CONFIG_NF_CONNTRACK_SECMARK is enabled)
  • src (request and response)
  • use
  • zone (if CONFIG_NF_CONNTRACK_ZONES is enabled)

Fields available for dccp, sctp, tcp, udp and udplite transmission layer protocols:

  • dport (request and response)
  • sport (request and response)

Fields available for icmp transmission layer protocol:

  • code (request and response)
  • id (request and response)
  • type (request and response)

Fields available for gre transmission layer protocol:

  • dstkey (request and response)
  • srckey (request and response)
  • stream_timeout
  • timeout

Allowed values for the sixth field:

  • dccp transmission layer protocol
  • CLOSEREQ
  • CLOSING
  • IGNORE
  • INVALID
  • NONE
  • OPEN
  • PARTOPEN
  • REQUEST
  • RESPOND
  • TIME_WAIT

  • sctp transmission layer protocol
  • CLOSED
  • COOKIE_ECHOED
  • COOKIE_WAIT
  • ESTABLISHED
  • NONE
  • SHUTDOWN_ACK_SENT
  • SHUTDOWN_RECD
  • SHUTDOWN_SENT

  • tcp transmission layer protocol
  • CLOSE
  • CLOSE_WAIT
  • ESTABLISHED
  • FIN_WAIT
  • LAST_ACK
  • NONE
  • SYN_RECV
  • SYN_SENT
  • SYN_SENT2
  • TIME_WAIT

三、netfilter的相关内核参数和解释

参考内核帮助文档

/usr/share/doc/kernel-doc-3.10.0/Documentation/networking/nf_conntrack-sysctl.txt  
/proc/sys/net/netfilter/nf_conntrack_* Variables:  
  
nf_conntrack_acct - BOOLEAN  
        0 - disabled (default)  
        not 0 - enabled  
  
        Enable connection tracking flow accounting. 64-bit byte and packet  
        counters per flow are added.  
  
nf_conntrack_buckets - INTEGER (read-only)  
        Size of hash table. If not specified as parameter during module  
        loading, the default size is calculated by dividing total memory  
        by 16384 to determine the number of buckets but the hash table will  
        never have fewer than 32 and limited to 16384 buckets. For systems  
        with more than 4GB of memory it will be 65536 buckets.  
  
nf_conntrack_checksum - BOOLEAN  
        0 - disabled  
        not 0 - enabled (default)  
  
        Verify checksum of incoming packets. Packets with bad checksums are  
        in INVALID state. If this is enabled, such packets will not be  
        considered for connection tracking.  
  
nf_conntrack_count - INTEGER (read-only)  
        Number of currently allocated flow entries.  
  
nf_conntrack_events - BOOLEAN  
        0 - disabled  
        not 0 - enabled (default)  
  
        If this option is enabled, the connection tracking code will  
        provide userspace with connection tracking events via ctnetlink.  
  
nf_conntrack_events_retry_timeout - INTEGER (seconds)  
        default 15  
  
        This option is only relevant when "reliable connection tracking  
        events" are used.  Normally, ctnetlink is "lossy", that is,  
        events are normally dropped when userspace listeners can't keep up.  
  
        Userspace can request "reliable event mode".  When this mode is  
        active, the conntrack will only be destroyed after the event was  
        delivered.  If event delivery fails, the kernel periodically  
        re-tries to send the event to userspace.  
  
        This is the maximum interval the kernel should use when re-trying  
        to deliver the destroy event.  
  
        A higher number means there will be fewer delivery retries and it  
        will take longer for a backlog to be processed.  
  
nf_conntrack_expect_max - INTEGER  
        Maximum size of expectation table.  Default value is  
        nf_conntrack_buckets / 256. Minimum is 1.  
  
nf_conntrack_frag6_high_thresh - INTEGER  
        default 262144  
  
        Maximum memory used to reassemble IPv6 fragments.  When  
        nf_conntrack_frag6_high_thresh bytes of memory is allocated for this  
        purpose, the fragment handler will toss packets until  
        nf_conntrack_frag6_low_thresh is reached.  
  
nf_conntrack_frag6_low_thresh - INTEGER  
        default 196608  
  
        See nf_conntrack_frag6_low_thresh  
  
nf_conntrack_frag6_timeout - INTEGER (seconds)  
        default 60  
  
        Time to keep an IPv6 fragment in memory.  
  
nf_conntrack_generic_timeout - INTEGER (seconds)  
        default 600  
  
        Default for generic timeout.  This refers to layer 4 unknown/unsupported  
        protocols.  
  
nf_conntrack_helper - BOOLEAN  
        0 - disabled  
        not 0 - enabled (default)  
  
        Enable automatic conntrack helper assignment.  
  
nf_conntrack_icmp_timeout - INTEGER (seconds)  
        default 30  
  
        Default for ICMP timeout.  
  
nf_conntrack_icmpv6_timeout - INTEGER (seconds)  
        default 30  
  
        Default for ICMP6 timeout.  
  
nf_conntrack_log_invalid - INTEGER  
        0   - disable (default)  
        1   - log ICMP packets  
        6   - log TCP packets  
        17  - log UDP packets  
        33  - log DCCP packets  
        41  - log ICMPv6 packets  
        136 - log UDPLITE packets  
        255 - log packets of any protocol  
  
        Log invalid packets of a type specified by value.  
  
nf_conntrack_max - INTEGER  
        Size of connection tracking table.  Default value is  
        nf_conntrack_buckets value * 4.  
  
nf_conntrack_tcp_be_liberal - BOOLEAN  
        0 - disabled (default)  
        not 0 - enabled  
  
        Be conservative in what you do, be liberal in what you accept from others.  
        If it's non-zero, we mark only out of window RST segments as INVALID.  
  
nf_conntrack_tcp_loose - BOOLEAN  
        0 - disabled  
        not 0 - enabled (default)  
  
        If it is set to zero, we disable picking up already established  
        connections.  
  
nf_conntrack_tcp_max_retrans - INTEGER  
        default 3  
  
        Maximum number of packets that can be retransmitted without  
        received an (acceptable) ACK from the destination. If this number  
        is reached, a shorter timer will be started.  
  
nf_conntrack_tcp_timeout_close - INTEGER (seconds)  
        default 10  
  
nf_conntrack_tcp_timeout_close_wait - INTEGER (seconds)  
        default 60  
  
nf_conntrack_tcp_timeout_established - INTEGER (seconds)  
        default 432000 (5 days)  
  
nf_conntrack_tcp_timeout_fin_wait - INTEGER (seconds)  
        default 120  
  
nf_conntrack_tcp_timeout_last_ack - INTEGER (seconds)  
        default 30  
  
nf_conntrack_tcp_timeout_max_retrans - INTEGER (seconds)  
        default 300  
  
nf_conntrack_tcp_timeout_syn_recv - INTEGER (seconds)  
        default 60  
  
nf_conntrack_tcp_timeout_syn_sent - INTEGER (seconds)  
        default 120  
  
nf_conntrack_tcp_timeout_time_wait - INTEGER (seconds)  
        default 120  
  
nf_conntrack_tcp_timeout_unacknowledged - INTEGER (seconds)  
        default 300  
  
nf_conntrack_timestamp - BOOLEAN  
        0 - disabled (default)  
        not 0 - enabled  
  
        Enable connection tracking flow timestamping.  
  
nf_conntrack_udp_timeout - INTEGER (seconds)  
        default 30  
  
nf_conntrack_udp_timeout_stream2 - INTEGER (seconds)  
        default 180  
  
        This extended timeout will be used in case there is an UDP stream  
        detected.  

还有多少秒这条会话信息会从跟踪表清除,取决于超时参数的配置,以及是否有包传输,有包传输时,这个时间会重置为超时时间。

四、什么时候会话表会满

当会话表中的记录大于内核设置nf_conntrack_max的值时,会导致会话表满。

nf_conntrack_max - INTEGER  
        Size of connection tracking table.  Default value is  
        nf_conntrack_buckets value * 4.  

错误例子:

less /var/log/messages  
  
Nov  3 23:30:27 digoal_host kernel: : [63500383.870591] nf_conntrack: table full, dropping packet.  
Nov  3 23:30:27 digoal_host kernel: : [63500383.962423] nf_conntrack: table full, dropping packet.  
Nov  3 23:30:27 digoal_host kernel: : [63500384.060399] nf_conntrack: table full, dropping packet.  

五、会话表满的解决办法

nf_conntrack table full的问题,会导致丢包,影响网络质量,严重时甚至导致网络不可用。

解决方法举例:

1、排查是否DDoS攻击,如果是,从预防攻击层面解决问题。

2、清空会话表。

重启iptables,会自动清空nf_conntrack table。注意,重启前先保存当前iptables配置(iptables-save > /etc/sysconfig/iptables ; service iptables restart)。

3、应用程序正常关闭会话

设计应用时,正常关闭会话很重要。

4、加大表的上限(需要考虑内存的消耗)

sysctl -w net.nf_conntrack_max = 10240000  

永久生效

vi /etc/sysctl.conf  
net.nf_conntrack_max = 10240000  

计算方法,参考:

《[转载]解决 nf_conntrack: table full, dropping packet 的几种思路》

5、设置更短的会话跟踪超时时间

查看当前设置:

# sysctl -a|grep netfilter  

net.netfilter.nf_conntrack_acct = 0  
net.netfilter.nf_conntrack_buckets = 8192  
net.netfilter.nf_conntrack_checksum = 1  
net.netfilter.nf_conntrack_count = 5  
net.netfilter.nf_conntrack_events = 1  
net.netfilter.nf_conntrack_events_retry_timeout = 15  
net.netfilter.nf_conntrack_expect_max = 124  
net.netfilter.nf_conntrack_generic_timeout = 600  
net.netfilter.nf_conntrack_helper = 1  
net.netfilter.nf_conntrack_icmp_timeout = 30  
net.netfilter.nf_conntrack_log_invalid = 0  
net.netfilter.nf_conntrack_max = 31760  
net.netfilter.nf_conntrack_tcp_be_liberal = 0  
net.netfilter.nf_conntrack_tcp_loose = 1  
net.netfilter.nf_conntrack_tcp_max_retrans = 3  
net.netfilter.nf_conntrack_tcp_timeout_close = 10  
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60  
net.netfilter.nf_conntrack_tcp_timeout_established = 432000  
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120  
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30  
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300  
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60  
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120  
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120  
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300  
net.netfilter.nf_conntrack_timestamp = 0  
net.netfilter.nf_conntrack_udp_timeout = 30  
net.netfilter.nf_conntrack_udp_timeout_stream = 180  
net.netfilter.nf_log.0 = NONE  
net.netfilter.nf_log.1 = NONE  
net.netfilter.nf_log.10 = NONE  
net.netfilter.nf_log.11 = NONE  
net.netfilter.nf_log.12 = NONE  
net.netfilter.nf_log.2 = NONE  
net.netfilter.nf_log.3 = NONE  
net.netfilter.nf_log.4 = NONE  
net.netfilter.nf_log.5 = NONE  
net.netfilter.nf_log.6 = NONE  
net.netfilter.nf_log.7 = NONE  
net.netfilter.nf_log.8 = NONE  
net.netfilter.nf_log.9 = NONE  

修改设置:

建议参考

https://security.stackexchange.com/questions/43205/nf-conntrack-table-full-dropping-packet

The message means your connection tracking table is full.   
There are no security implications other than DoS.   
You can partially mitigate this by increasing the maximum number of connections being tracked,   
reducing the tracking timeouts or by disabling connection tracking altogether,   
which is doable on server, but not on a NAT router,   
because the latter will cease to function.  
  
单位秒  
  
sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=54000  
sysctl -w net.netfilter.nf_conntrack_generic_timeout=120  
sysctl -w net.ipv4.netfilter.ip_conntrack_max=<more than currently set>  

六、备份和恢复iptables规则

1、保存当前iptables规则到配置文件

iptables-save > /etc/sysconfig/iptables    

2、从配置文件,恢复iptables规则

iptables-restore < /etc/sysconfig/iptables    

3、启动iptables服务

service iptables start    
或    
iptables-restore < /etc/sysconfig/iptables    

4、关闭iptables服务

彻底关闭    
service iptables stop    
rmmod iptable_filter    
    
或,使用如下方法清空对应表    
iptables -F -t nat    
iptables -F -t filter    
iptables -F -t raw    
iptables -F -t mangle    

5、查看当前iptables规则

iptables-save    

iptables -L -v -n -t filter    
iptables -L -v -n -t nat    
iptables -L -v -n -t raw    
iptables -L -v -n -t mangle    

参考

《[转载]解决 nf_conntrack: table full, dropping packet 的几种思路》

《转载 - nf_conntrack: table full, dropping packet. 终结篇》

http://netfilter.org/

/usr/share/doc/kernel-doc-3.10.0/Documentation/networking/nf_conntrack-sysctl.txt

https://stackoverflow.com/questions/20327518/need-to-drop-established-connections-with-iptables

https://security.stackexchange.com/questions/43205/nf-conntrack-table-full-dropping-packet

https://unix.stackexchange.com/questions/127081/conntrack-tcp-timeout-for-state-stablished-not-working

https://en.wikipedia.org/wiki/Netfilter#Connection_tracking

https://unix.stackexchange.com/questions/227259/why-is-proc-net-nf-conntrack-empty

https://stackoverflow.com/questions/16034698/details-of-proc-net-ip-conntrack-nf-conntrack

Flag Counter

digoal’s 大量PostgreSQL文章入口