netfilter内核模块知识 - 解决nf_conntrack: table full, dropping packet
背景
netfilter是一个Linux内核网络包管理模块。支持包过滤、规则转发、会话状态跟踪等功能。
很多人可能遇到过nf_conntrack table full, dropping packet的问题,(正常情况下不应该出这样的问题,除非是DDos攻击,把netfilter的会话跟踪表打满了),又或者应用程序设计有问题,没有正常的关闭会话导致netfilter会话跟踪表打满。
netfilter涉及到一系列的内核参数,本文会一一介绍。
如何解决呢?
一、如何启用netfilter
启动iptables后,会加载netfilter相关模块。
例如设置一个这样的iptables规则:
vi /etc/sysconfig/iptables
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
启动iptables
service iptables start
如果你启动了iptables,但是发现nf_conntrack表是空的,那么需要启动一下nf_conntrack_ipv4模块。
cat /proc/net/nf_conntrack
没有内容
加载对应跟踪模块即可(以CentOS 6为例):
[root@plop ~]# modprobe /proc/net/nf_conntrack_ipv4
[root@plop ~]# lsmod | grep nf_conntrack
nf_conntrack_ipv4 9506 0
nf_defrag_ipv4 1483 1 nf_conntrack_ipv4
nf_conntrack_ipv6 8748 2
nf_defrag_ipv6 11182 1 nf_conntrack_ipv6
nf_conntrack 79758 3 nf_conntrack_ipv4,nf_conntrack_ipv6,xt_state
ipv6 317340 28 sctp,ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6
现在就可以看到nf_conntrack表的内容了
cat /proc/net/nf_conntrack
ipv4 2 tcp 6 296 ESTABLISHED src=172.19.39.244 dst=140.205.140.205 sport=40518 dport=80 src=140.205.140.205 dst=172.19.39.244 sport=80 dport=40518 mark=0 zone=0 use=2
ipv4 2 tcp 6 293 ESTABLISHED src=58.60.108.26 dst=172.19.39.244 sport=58826 dport=22 src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58826 mark=0 zone=0 use=2
ipv4 2 udp 17 29 src=172.19.39.244 dst=10.143.33.51 sport=123 dport=123 [UNREPLIED] src=10.143.33.51 dst=172.19.39.244 sport=123 dport=123 mark=0 zone=0 use=2
ipv4 2 udp 17 175 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 299 ESTABLISHED src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58856 src=58.60.108.26 dst=172.19.39.244 sport=58856 dport=22 [ASSURED] mark=0 zone=0 use=2
二、如何查看netfilter会话表
1、查看proc文件系统,可以查看会话表。
cat /proc/net/nf_conntrack
ipv4 2 tcp 6 296 ESTABLISHED src=172.19.39.244 dst=140.205.140.205 sport=40518 dport=80 src=140.205.140.205 dst=172.19.39.244 sport=80 dport=40518 mark=0 zone=0 use=2
ipv4 2 tcp 6 293 ESTABLISHED src=58.60.108.26 dst=172.19.39.244 sport=58826 dport=22 src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58826 mark=0 zone=0 use=2
ipv4 2 udp 17 29 src=172.19.39.244 dst=10.143.33.51 sport=123 dport=123 [UNREPLIED] src=10.143.33.51 dst=172.19.39.244 sport=123 dport=123 mark=0 zone=0 use=2
ipv4 2 udp 17 175 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 299 ESTABLISHED src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58856 src=58.60.108.26 dst=172.19.39.244 sport=58856 dport=22 [ASSURED] mark=0 zone=0 use=2
2、通过conntrack命令行工具查看conntrack的内容
# yum install -y conntrack
# conntrack -L
tcp 6 431974 ESTABLISHED src=172.19.39.244 dst=140.205.140.205 sport=40518 dport=80 src=140.205.140.205 dst=172.19.39.244 sport=80 dport=40518 [ASSURED] mark=0 use=1
tcp 6 299 ESTABLISHED src=58.60.108.26 dst=172.19.39.244 sport=58826 dport=22 src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58826 [ASSURED] mark=0 use=1
tcp 6 34 TIME_WAIT src=172.19.39.244 dst=100.100.2.148 sport=55898 dport=80 src=100.100.2.148 dst=172.19.39.244 sport=80 dport=55898 [ASSURED] mark=0 use=1
tcp 6 34 TIME_WAIT src=172.19.39.244 dst=100.100.2.148 sport=55896 dport=80 src=100.100.2.148 dst=172.19.39.244 sport=80 dport=55896 [ASSURED] mark=0 use=1
udp 17 176 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 src=127.0.0.1 dst=127.0.0.1 sport=57900 dport=57900 [ASSURED] mark=0 use=1
tcp 6 431990 ESTABLISHED src=172.19.39.244 dst=58.60.108.26 sport=22 dport=58856 src=58.60.108.26 dst=172.19.39.244 sport=58856 dport=22 [ASSURED] mark=0 use=1
conntrack v1.4.4 (conntrack-tools): 6 flow entries have been shown.
3、会话表的内容解释请参考如下(我们关注第五列,还有多少秒这条会话信息会从跟踪表清除):
https://stackoverflow.com/questions/16034698/details-of-proc-net-ip-conntrack-nf-conntrack
http://en.wikipedia.org/w/index.php?title=OSI_protocols&oldid=545448188
The format of a line from /proc/net/ip_conntrack is the same as for /proc/net/nf_conntrack, except the first two columns are missing.
I’ll try to summarize the format of the latter file, as I understand it from the net/netfilter/nf_conntrack_standalone.c, net/netfilter/nf_conntrack_acct.c and the net/netfilter/nf_conntrack_proto_*.c kernel source files. The term layer
refers to the OSI protocol layer model.
- First column: The network layer protocol name (eg.
ipv4
). - Second column: The network layer protocol number.
- Third column: The transmission layer protocol name (eg.
tcp
). - Fourth column: The transmission layer protocol number.
- Fifth column: The seconds until the entry is invalidated.
- Sixth column (Not all protocols): The connection state.
- All other columns are named (
key=value
) or represent flags ([UNREPLIED], [ASSURED], ...
). A line can contain up to two columns having the same name (eg.src
anddst
). Then, the first occurrence relates to the request direction and the second occurrence relates to the response direction.
Meaning of the flags:
[ASSURED]
: Traffic has been seen in both (ie. request and response) direction.[UNREPLIED]
: Traffic has not been seen in response direction yet. In case the connection tracking cache overflows, these connections are dropped first.
Please note that some column names appear only for specific protocols (eg. sport
and dport
for TCP and UDP, type
and code
for ICMP). Other column names (eg. mark
) appear only if the kernel was built with specific options.
Examples:
ipv4 2 tcp 6 300 ESTABLISHED src=1.1.1.2 dst=2.2.2.2 sport=2000 dport=80 src=2.2.2.2 dst=1.1.1.1 sport=80 dport=12000 [ASSURED] mark=0 use=2
belongs to an established TCP connection from host 1.1.1.2, port 2000, to host 2.2.2.2, port 80, from which responses are sent to host 1.1.1.1, port 12000, timing out in five minutes. For this connection, packets have been seen in both directions.ipv4 2 icmp 1 3 src=1.1.1.2 dst=1.1.1.1 type=8 code=0 id=32354 src=1.1.1.1 dst=1.1.1.2 type=0 code=0 id=32354 mark=0 use=2
belongs to an ICMP echo request packet from host 1.1.1.2 to host 1.1.1.1 with an expected echo reply packet from host 1.1.1.1 to host 1.1.1.2, timing out in three seconds.
The response destination host is not necessarily the same as the request source host, as the request source address may have been masqueraded by the response destination host.
Please note that the following information might not be up-to-date!
Fields available for all entries:
- bytes (if accounting is enabled, request and response)
- delta-time (if CONFIG_NF_CONNTRACK_TIMESTAMP is enabled)
- dst (request and response)
- mark (if CONFIG_NF_CONNTRACK_MARK is enabled)
- packets (if accounting is enabled, request and response)
- secctx (if CONFIG_NF_CONNTRACK_SECMARK is enabled)
- src (request and response)
- use
- zone (if CONFIG_NF_CONNTRACK_ZONES is enabled)
Fields available for dccp, sctp, tcp, udp and udplite transmission layer protocols:
- dport (request and response)
- sport (request and response)
Fields available for icmp transmission layer protocol:
- code (request and response)
- id (request and response)
- type (request and response)
Fields available for gre transmission layer protocol:
- dstkey (request and response)
- srckey (request and response)
- stream_timeout
- timeout
Allowed values for the sixth field:
- dccp transmission layer protocol
- CLOSEREQ
- CLOSING
- IGNORE
- INVALID
- NONE
- OPEN
- PARTOPEN
- REQUEST
- RESPOND
-
TIME_WAIT
- sctp transmission layer protocol
- CLOSED
- COOKIE_ECHOED
- COOKIE_WAIT
- ESTABLISHED
- NONE
- SHUTDOWN_ACK_SENT
- SHUTDOWN_RECD
-
SHUTDOWN_SENT
- tcp transmission layer protocol
- CLOSE
- CLOSE_WAIT
- ESTABLISHED
- FIN_WAIT
- LAST_ACK
- NONE
- SYN_RECV
- SYN_SENT
- SYN_SENT2
- TIME_WAIT
三、netfilter的相关内核参数和解释
参考内核帮助文档
/usr/share/doc/kernel-doc-3.10.0/Documentation/networking/nf_conntrack-sysctl.txt
/proc/sys/net/netfilter/nf_conntrack_* Variables:
nf_conntrack_acct - BOOLEAN
0 - disabled (default)
not 0 - enabled
Enable connection tracking flow accounting. 64-bit byte and packet
counters per flow are added.
nf_conntrack_buckets - INTEGER (read-only)
Size of hash table. If not specified as parameter during module
loading, the default size is calculated by dividing total memory
by 16384 to determine the number of buckets but the hash table will
never have fewer than 32 and limited to 16384 buckets. For systems
with more than 4GB of memory it will be 65536 buckets.
nf_conntrack_checksum - BOOLEAN
0 - disabled
not 0 - enabled (default)
Verify checksum of incoming packets. Packets with bad checksums are
in INVALID state. If this is enabled, such packets will not be
considered for connection tracking.
nf_conntrack_count - INTEGER (read-only)
Number of currently allocated flow entries.
nf_conntrack_events - BOOLEAN
0 - disabled
not 0 - enabled (default)
If this option is enabled, the connection tracking code will
provide userspace with connection tracking events via ctnetlink.
nf_conntrack_events_retry_timeout - INTEGER (seconds)
default 15
This option is only relevant when "reliable connection tracking
events" are used. Normally, ctnetlink is "lossy", that is,
events are normally dropped when userspace listeners can't keep up.
Userspace can request "reliable event mode". When this mode is
active, the conntrack will only be destroyed after the event was
delivered. If event delivery fails, the kernel periodically
re-tries to send the event to userspace.
This is the maximum interval the kernel should use when re-trying
to deliver the destroy event.
A higher number means there will be fewer delivery retries and it
will take longer for a backlog to be processed.
nf_conntrack_expect_max - INTEGER
Maximum size of expectation table. Default value is
nf_conntrack_buckets / 256. Minimum is 1.
nf_conntrack_frag6_high_thresh - INTEGER
default 262144
Maximum memory used to reassemble IPv6 fragments. When
nf_conntrack_frag6_high_thresh bytes of memory is allocated for this
purpose, the fragment handler will toss packets until
nf_conntrack_frag6_low_thresh is reached.
nf_conntrack_frag6_low_thresh - INTEGER
default 196608
See nf_conntrack_frag6_low_thresh
nf_conntrack_frag6_timeout - INTEGER (seconds)
default 60
Time to keep an IPv6 fragment in memory.
nf_conntrack_generic_timeout - INTEGER (seconds)
default 600
Default for generic timeout. This refers to layer 4 unknown/unsupported
protocols.
nf_conntrack_helper - BOOLEAN
0 - disabled
not 0 - enabled (default)
Enable automatic conntrack helper assignment.
nf_conntrack_icmp_timeout - INTEGER (seconds)
default 30
Default for ICMP timeout.
nf_conntrack_icmpv6_timeout - INTEGER (seconds)
default 30
Default for ICMP6 timeout.
nf_conntrack_log_invalid - INTEGER
0 - disable (default)
1 - log ICMP packets
6 - log TCP packets
17 - log UDP packets
33 - log DCCP packets
41 - log ICMPv6 packets
136 - log UDPLITE packets
255 - log packets of any protocol
Log invalid packets of a type specified by value.
nf_conntrack_max - INTEGER
Size of connection tracking table. Default value is
nf_conntrack_buckets value * 4.
nf_conntrack_tcp_be_liberal - BOOLEAN
0 - disabled (default)
not 0 - enabled
Be conservative in what you do, be liberal in what you accept from others.
If it's non-zero, we mark only out of window RST segments as INVALID.
nf_conntrack_tcp_loose - BOOLEAN
0 - disabled
not 0 - enabled (default)
If it is set to zero, we disable picking up already established
connections.
nf_conntrack_tcp_max_retrans - INTEGER
default 3
Maximum number of packets that can be retransmitted without
received an (acceptable) ACK from the destination. If this number
is reached, a shorter timer will be started.
nf_conntrack_tcp_timeout_close - INTEGER (seconds)
default 10
nf_conntrack_tcp_timeout_close_wait - INTEGER (seconds)
default 60
nf_conntrack_tcp_timeout_established - INTEGER (seconds)
default 432000 (5 days)
nf_conntrack_tcp_timeout_fin_wait - INTEGER (seconds)
default 120
nf_conntrack_tcp_timeout_last_ack - INTEGER (seconds)
default 30
nf_conntrack_tcp_timeout_max_retrans - INTEGER (seconds)
default 300
nf_conntrack_tcp_timeout_syn_recv - INTEGER (seconds)
default 60
nf_conntrack_tcp_timeout_syn_sent - INTEGER (seconds)
default 120
nf_conntrack_tcp_timeout_time_wait - INTEGER (seconds)
default 120
nf_conntrack_tcp_timeout_unacknowledged - INTEGER (seconds)
default 300
nf_conntrack_timestamp - BOOLEAN
0 - disabled (default)
not 0 - enabled
Enable connection tracking flow timestamping.
nf_conntrack_udp_timeout - INTEGER (seconds)
default 30
nf_conntrack_udp_timeout_stream2 - INTEGER (seconds)
default 180
This extended timeout will be used in case there is an UDP stream
detected.
还有多少秒这条会话信息会从跟踪表清除,取决于超时参数的配置,以及是否有包传输,有包传输时,这个时间会重置为超时时间。
四、什么时候会话表会满
当会话表中的记录大于内核设置nf_conntrack_max的值时,会导致会话表满。
nf_conntrack_max - INTEGER
Size of connection tracking table. Default value is
nf_conntrack_buckets value * 4.
错误例子:
less /var/log/messages
Nov 3 23:30:27 digoal_host kernel: : [63500383.870591] nf_conntrack: table full, dropping packet.
Nov 3 23:30:27 digoal_host kernel: : [63500383.962423] nf_conntrack: table full, dropping packet.
Nov 3 23:30:27 digoal_host kernel: : [63500384.060399] nf_conntrack: table full, dropping packet.
五、会话表满的解决办法
nf_conntrack table full的问题,会导致丢包,影响网络质量,严重时甚至导致网络不可用。
解决方法举例:
1、排查是否DDoS攻击,如果是,从预防攻击层面解决问题。
2、清空会话表。
重启iptables,会自动清空nf_conntrack table。注意,重启前先保存当前iptables配置(iptables-save > /etc/sysconfig/iptables ; service iptables restart)。
3、应用程序正常关闭会话
设计应用时,正常关闭会话很重要。
4、加大表的上限(需要考虑内存的消耗)
sysctl -w net.nf_conntrack_max = 10240000
永久生效
vi /etc/sysctl.conf
net.nf_conntrack_max = 10240000
计算方法,参考:
《[转载]解决 nf_conntrack: table full, dropping packet 的几种思路》
5、设置更短的会话跟踪超时时间
查看当前设置:
# sysctl -a|grep netfilter
net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_buckets = 8192
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 5
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.netfilter.nf_conntrack_expect_max = 124
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 1
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 31760
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_log.0 = NONE
net.netfilter.nf_log.1 = NONE
net.netfilter.nf_log.10 = NONE
net.netfilter.nf_log.11 = NONE
net.netfilter.nf_log.12 = NONE
net.netfilter.nf_log.2 = NONE
net.netfilter.nf_log.3 = NONE
net.netfilter.nf_log.4 = NONE
net.netfilter.nf_log.5 = NONE
net.netfilter.nf_log.6 = NONE
net.netfilter.nf_log.7 = NONE
net.netfilter.nf_log.8 = NONE
net.netfilter.nf_log.9 = NONE
修改设置:
建议参考
https://security.stackexchange.com/questions/43205/nf-conntrack-table-full-dropping-packet
The message means your connection tracking table is full.
There are no security implications other than DoS.
You can partially mitigate this by increasing the maximum number of connections being tracked,
reducing the tracking timeouts or by disabling connection tracking altogether,
which is doable on server, but not on a NAT router,
because the latter will cease to function.
单位秒
sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=54000
sysctl -w net.netfilter.nf_conntrack_generic_timeout=120
sysctl -w net.ipv4.netfilter.ip_conntrack_max=<more than currently set>
六、备份和恢复iptables规则
1、保存当前iptables规则到配置文件
iptables-save > /etc/sysconfig/iptables
2、从配置文件,恢复iptables规则
iptables-restore < /etc/sysconfig/iptables
3、启动iptables服务
service iptables start
或
iptables-restore < /etc/sysconfig/iptables
4、关闭iptables服务
彻底关闭
service iptables stop
rmmod iptable_filter
或,使用如下方法清空对应表
iptables -F -t nat
iptables -F -t filter
iptables -F -t raw
iptables -F -t mangle
5、查看当前iptables规则
iptables-save
或
iptables -L -v -n -t filter
iptables -L -v -n -t nat
iptables -L -v -n -t raw
iptables -L -v -n -t mangle
参考
《[转载]解决 nf_conntrack: table full, dropping packet 的几种思路》
《转载 - nf_conntrack: table full, dropping packet. 终结篇》
http://netfilter.org/
/usr/share/doc/kernel-doc-3.10.0/Documentation/networking/nf_conntrack-sysctl.txt
https://stackoverflow.com/questions/20327518/need-to-drop-established-connections-with-iptables
https://security.stackexchange.com/questions/43205/nf-conntrack-table-full-dropping-packet
https://unix.stackexchange.com/questions/127081/conntrack-tcp-timeout-for-state-stablished-not-working
https://en.wikipedia.org/wiki/Netfilter#Connection_tracking
https://unix.stackexchange.com/questions/227259/why-is-proc-net-nf-conntrack-empty
https://stackoverflow.com/questions/16034698/details-of-proc-net-ip-conntrack-nf-conntrack