summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)AuthorFilesLines
2019-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller5-8/+18
The netfilter conflicts were rather simple overlapping changes. However, the cls_tcindex.c stuff was a bit more complex. On the 'net' side, Cong is fixing several races and memory leaks. Whilst on the 'net-next' side we have Vlad adding the rtnl-ness support. What I've decided to do, in order to resolve this, is revert the conversion over to using a workqueue that Cong did, bringing us back to pure RCU. I did it this way because I believe that either Cong's races don't apply with have Vlad did things, or Cong will have to implement the race fix slightly differently. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15netfilter: nf_tables: fix flush after rule deletion in the same batchPablo Neira Ayuso1-0/+3
Flush after rule deletion bogusly hits -ENOENT. Skip rules that have been already from nft_delrule_by_chain() which is always called from the flush path. Fixes: cf9dc09d0949 ("netfilter: nf_tables: fix missing rules flushing per table") Reported-by: Phil Sutter <phil@nwl.cc> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-13netfilter: nft_compat: use-after-free when deleting targetsPablo Neira Ayuso1-1/+2
Fetch pointer to module before target object is released. Fixes: 29e3880109e3 ("netfilter: nf_tables: fix use-after-free when deleting compat expressions") Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12ipvs: fix dependency on nf_defrag_ipv6Andrea Claudi3-6/+15
ipvs relies on nf_defrag_ipv6 module to manage IPv6 fragmentation, but lacks proper Kconfig dependencies and does not explicitly request defrag features. As a result, if netfilter hooks are not loaded, when IPv6 fragmented packet are handled by ipvs only the first fragment makes through. Fix it properly declaring the dependency on Kconfig and registering netfilter hooks on ip_vs_add_service() and ip_vs_new_dest(). Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in ↵Chieh-Min Wang1-3/+11
__nf_conntrack_confirm For bridge(br_flood) or broadcast/multicast packets, they could clone skb with unconfirmed conntrack which break the rule that unconfirmed skb->_nfct is never shared. With nfqueue running on my system, the race can be easily reproduced with following warning calltrace: [13257.707525] CPU: 0 PID: 12132 Comm: main Tainted: P W 4.4.60 #7744 [13257.707568] Hardware name: Qualcomm (Flattened Device Tree) [13257.714700] [<c021f6dc>] (unwind_backtrace) from [<c021bce8>] (show_stack+0x10/0x14) [13257.720253] [<c021bce8>] (show_stack) from [<c0449e10>] (dump_stack+0x94/0xa8) [13257.728240] [<c0449e10>] (dump_stack) from [<c022a7e0>] (warn_slowpath_common+0x94/0xb0) [13257.735268] [<c022a7e0>] (warn_slowpath_common) from [<c022a898>] (warn_slowpath_null+0x1c/0x24) [13257.743519] [<c022a898>] (warn_slowpath_null) from [<c06ee450>] (__nf_conntrack_confirm+0xa8/0x618) [13257.752284] [<c06ee450>] (__nf_conntrack_confirm) from [<c0772670>] (ipv4_confirm+0xb8/0xfc) [13257.761049] [<c0772670>] (ipv4_confirm) from [<c06e7a60>] (nf_iterate+0x48/0xa8) [13257.769725] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0) [13257.777108] [<c06e7af0>] (nf_hook_slow) from [<c07f20b4>] (br_nf_post_routing+0x274/0x31c) [13257.784486] [<c07f20b4>] (br_nf_post_routing) from [<c06e7a60>] (nf_iterate+0x48/0xa8) [13257.792556] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0) [13257.800458] [<c06e7af0>] (nf_hook_slow) from [<c07e5580>] (br_forward_finish+0x94/0xa4) [13257.808010] [<c07e5580>] (br_forward_finish) from [<c07f22ac>] (br_nf_forward_finish+0x150/0x1ac) [13257.815736] [<c07f22ac>] (br_nf_forward_finish) from [<c06e8df0>] (nf_reinject+0x108/0x170) [13257.824762] [<c06e8df0>] (nf_reinject) from [<c06ea854>] (nfqnl_recv_verdict+0x3d8/0x420) [13257.832924] [<c06ea854>] (nfqnl_recv_verdict) from [<c06e940c>] (nfnetlink_rcv_msg+0x158/0x248) [13257.841256] [<c06e940c>] (nfnetlink_rcv_msg) from [<c06e5564>] (netlink_rcv_skb+0x54/0xb0) [13257.849762] [<c06e5564>] (netlink_rcv_skb) from [<c06e4ec8>] (netlink_unicast+0x148/0x23c) [13257.858093] [<c06e4ec8>] (netlink_unicast) from [<c06e5364>] (netlink_sendmsg+0x2ec/0x368) [13257.866348] [<c06e5364>] (netlink_sendmsg) from [<c069fb8c>] (sock_sendmsg+0x34/0x44) [13257.874590] [<c069fb8c>] (sock_sendmsg) from [<c06a03dc>] (___sys_sendmsg+0x1ec/0x200) [13257.882489] [<c06a03dc>] (___sys_sendmsg) from [<c06a11c8>] (__sys_sendmsg+0x3c/0x64) [13257.890300] [<c06a11c8>] (__sys_sendmsg) from [<c0209b40>] (ret_fast_syscall+0x0/0x34) The original code just triggered the warning but do nothing. It will caused the shared conntrack moves to the dying list and the packet be droppped (nf_ct_resolve_clash returns NF_DROP for dying conntrack). - Reproduce steps: +----------------------------+ | br0(bridge) | | | +-+---------+---------+------+ | eth0| | eth1| | eth2| | | | | | | +--+--+ +--+--+ +---+-+ | | | | | | +--+-+ +-+--+ +--+-+ | PC1| | PC2| | PC3| +----+ +----+ +----+ iptables -A FORWARD -m mark --mark 0x1000000/0x1000000 -j NFQUEUE --queue-num 100 --queue-bypass ps: Our nfq userspace program will set mark on packets whose connection has already been processed. PC1 sends broadcast packets simulated by hping3: hping3 --rand-source --udp 192.168.1.255 -i u100 - Broadcast racing flow chart is as follow: br_handle_frame BR_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, br_handle_frame_finish) // skb->_nfct (unconfirmed conntrack) is constructed at PRE_ROUTING stage br_handle_frame_finish // check if this packet is broadcast br_flood_forward br_flood list_for_each_entry_rcu(p, &br->port_list, list) // iterate through each port maybe_deliver deliver_clone skb = skb_clone(skb) __br_forward BR_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,...) // queue in our nfq and received by our userspace program // goto __nf_conntrack_confirm with process context on CPU 1 br_pass_frame_up BR_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,...) // goto __nf_conntrack_confirm with softirq context on CPU 0 Because conntrack confirm can happen at both INPUT and POSTROUTING stage. So with NFQUEUE running, skb->_nfct with the same unconfirmed conntrack could race on different core. This patch fixes a repeating kernel splat, now it is only displayed once. Signed-off-by: Chieh-Min Wang <chiehminw@synology.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12netfilter: xt_recent: Use struct_size() in kvzalloc()Gustavo A. R. Silva1-3/+1
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; size = sizeof(struct foo) + count * sizeof(void *); instance = alloc(size, GFP_KERNEL) Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: size = struct_size(instance, entry, count); instance = alloc(size, GFP_KERNEL) Notice that, in this case, variable sz is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12ipvs: Use struct_size() helperGustavo A. R. Silva1-4/+2
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; size = sizeof(struct foo) + count * sizeof(struct boo); instance = alloc(size, GFP_KERNEL) Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: size = struct_size(instance, entry, count); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12netfilter: conntrack: fix indentation issueColin Ian King1-1/+1
A statement in an if block is not indented correctly. Fix this. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12netfilter: compat: initialize all fields in xt_initFrancesco Ruggeri1-1/+1
If a non zero value happens to be in xt[NFPROTO_BRIDGE].cur at init time, the following panic can be caused by running % ebtables -t broute -F BROUTING from a 32-bit user level on a 64-bit kernel. This patch replaces kmalloc_array with kcalloc when allocating xt. [ 474.680846] BUG: unable to handle kernel paging request at 0000000009600920 [ 474.687869] PGD 2037006067 P4D 2037006067 PUD 2038938067 PMD 0 [ 474.693838] Oops: 0000 [#1] SMP [ 474.697055] CPU: 9 PID: 4662 Comm: ebtables Kdump: loaded Not tainted 4.19.17-11302235.AroraKernelnext.fc18.x86_64 #1 [ 474.707721] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.0 06/28/2013 [ 474.714313] RIP: 0010:xt_compat_calc_jump+0x2f/0x63 [x_tables] [ 474.720201] Code: 40 0f b6 ff 55 31 c0 48 6b ff 70 48 03 3d dc 45 00 00 48 89 e5 8b 4f 6c 4c 8b 47 60 ff c9 39 c8 7f 2f 8d 14 08 d1 fa 48 63 fa <41> 39 34 f8 4c 8d 0c fd 00 00 00 00 73 05 8d 42 01 eb e1 76 05 8d [ 474.739023] RSP: 0018:ffffc9000943fc58 EFLAGS: 00010207 [ 474.744296] RAX: 0000000000000000 RBX: ffffc90006465000 RCX: 0000000002580249 [ 474.751485] RDX: 00000000012c0124 RSI: fffffffff7be17e9 RDI: 00000000012c0124 [ 474.758670] RBP: ffffc9000943fc58 R08: 0000000000000000 R09: ffffffff8117cf8f [ 474.765855] R10: ffffc90006477000 R11: 0000000000000000 R12: 0000000000000001 [ 474.773048] R13: 0000000000000000 R14: ffffc9000943fcb8 R15: ffffc9000943fcb8 [ 474.780234] FS: 0000000000000000(0000) GS:ffff88a03f840000(0063) knlGS:00000000f7ac7700 [ 474.788612] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 474.794632] CR2: 0000000009600920 CR3: 0000002037422006 CR4: 00000000000606e0 [ 474.802052] Call Trace: [ 474.804789] compat_do_replace+0x1fb/0x2a3 [ebtables] [ 474.810105] compat_do_ebt_set_ctl+0x69/0xe6 [ebtables] [ 474.815605] ? try_module_get+0x37/0x42 [ 474.819716] compat_nf_setsockopt+0x4f/0x6d [ 474.824172] compat_ip_setsockopt+0x7e/0x8c [ 474.828641] compat_raw_setsockopt+0x16/0x3a [ 474.833220] compat_sock_common_setsockopt+0x1d/0x24 [ 474.838458] __compat_sys_setsockopt+0x17e/0x1b1 [ 474.843343] ? __check_object_size+0x76/0x19a [ 474.847960] __ia32_compat_sys_socketcall+0x1cb/0x25b [ 474.853276] do_fast_syscall_32+0xaf/0xf6 [ 474.857548] entry_SYSENTER_compat+0x6b/0x7a Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller7-115/+108
An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-05netfilter: nft_compat: don't use refcount_inc on newly allocated entryFlorian Westphal1-39/+23
When I moved the refcount to refcount_t type I missed the fact that refcount_inc() will result in use-after-free warning with CONFIG_REFCOUNT_FULL=y builds. The correct fix would be to init the reference count to 1 at allocation time, but, unfortunately we cannot do this, as we can't undo that in case something else fails later in the batch. So only solution I see is to special-case the 'new entry' condition and replace refcount_inc() with a "delayed" refcount_set(1) in this case, as done here. The .activate callback can be removed to simplify things, we only need to make sure that deactivate() decrements/unlinks the entry from the list at end of transaction phase (commit or abort). Fixes: 12c44aba6618 ("netfilter: nft_compat: use refcnt_t type for nft_xt reference count") Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04netfilter: ipv6: avoid indirect calls for IPV6=y caseFlorian Westphal2-15/+7
indirect calls are only needed if ipv6 is a module. Add helpers to abstract the v6ops indirections and use them instead. fragment, reroute and route_input are kept as indirect calls. The first two are not not used in hot path and route_input is only used by bridge netfilter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04netfilter: nf_tables: unbind set in rule from commit pathPablo Neira Ayuso6-79/+72
Anonymous sets that are bound to rules from the same transaction trigger a kernel splat from the abort path due to double set list removal and double free. This patch updates the logic to search for the transaction that is responsible for creating the set and disable the set list removal and release, given the rule is now responsible for this. Lookup is reverse since the transaction that adds the set is likely to be at the tail of the list. Moreover, this patch adds the unbind step to deliver the event from the commit path. This should not be done from the worker thread, since we have no guarantees of in-order delivery to the listener. This patch removes the assumption that both activate and deactivate callbacks need to be provided. Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") Reported-by: Mikhail Morfikov <mmorfikov@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04netfilter: nft_tunnel: Add NFTA_TUNNEL_MODE optionswenxu1-2/+32
nft "tunnel" expr match both the tun_info of RX and TX. This patch provide the NFTA_TUNNEL_MODE to individually match the tun_info of RX or TX. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04netfilter: nf_nat: skip nat clash resolution for same-origin entriesMartynas Pumputis1-0/+16
It is possible that two concurrent packets originating from the same socket of a connection-less protocol (e.g. UDP) can end up having different IP_CT_DIR_REPLY tuples which results in one of the packets being dropped. To illustrate this, consider the following simplified scenario: 1. Packet A and B are sent at the same time from two different threads by same UDP socket. No matching conntrack entry exists yet. Both packets cause allocation of a new conntrack entry. 2. get_unique_tuple gets called for A. No clashing entry found. conntrack entry for A is added to main conntrack table. 3. get_unique_tuple is called for B and will find that the reply tuple of B is already taken by A. It will allocate a new UDP source port for B to resolve the clash. 4. conntrack entry for B cannot be added to main conntrack table because its ORIGINAL direction is clashing with A and the REPLY directions of A and B are not the same anymore due to UDP source port reallocation done in step 3. This patch modifies nf_conntrack_tuple_taken so it doesn't consider colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal. [ Florian: simplify patch to not use .allow_clash setting and always ignore identical flows ] Signed-off-by: Martynas Pumputis <martynas@weave.works> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-47/+158
2019-01-29netfilter: nf_tables: add NFTA_RULE_POSITION_ID to nla_policyFlorian Westphal1-0/+1
Fixes: 75dd48e2e420a ("netfilter: nf_tables: Support RULE_ID reference in new rule") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller43-1567/+1034
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree: 1) Introduce a hashtable to speed up object lookups, from Florian Westphal. 2) Make direct calls to built-in extension, also from Florian. 3) Call helper before confirming the conntrack as it used to be originally, from Florian. 4) Call request_module() to autoload br_netfilter when physdev is used to relax the dependency, also from Florian. 5) Allow to insert rules at a given position ID that is internal to the batch, from Phil Sutter. 6) Several patches to replace conntrack indirections by direct calls, and to reduce modularization, from Florian. This also includes several follow up patches to deal with minor fallout from this rework. 7) Use RCU from conntrack gre helper, from Florian. 8) GRE conntrack module becomes built-in into nf_conntrack, from Florian. 9) Replace nf_ct_invert_tuplepr() by calls to nf_ct_invert_tuple(), from Florian. 10) Unify sysctl handling at the core of nf_conntrack, from Florian. 11) Provide modparam to register conntrack hooks. 12) Allow to match on the interface kind string, from wenxu. 13) Remove several exported symbols, not required anymore now after a bit of de-modulatization work has been done, from Florian. 14) Remove built-in map support in the hash extension, this can be done with the existing userspace infrastructure, from laura. 15) Remove indirection to calculate checksums in IPVS, from Matteo Croce. 16) Use call wrappers for indirection in IPVS, also from Matteo. 17) Remove superfluous __percpu parameter in nft_counter, patch from Luc Van Oostenryck. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28netfilter: ipv4: remove useless export_symbolFlorian Westphal1-0/+19
Only one caller; place it where needed and get rid of the EXPORT_SYMBOL. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28netfilter: conntrack: fix error path in nf_conntrack_pernet_init()Cong Wang1-2/+2
When nf_ct_netns_get() fails, it should clean up itself, its caller doesn't need to call nf_conntrack_fini_net(). nf_conntrack_init_net() is called after registering sysctl and proc, so its cleanup function should be called before unregistering sysctl and proc. Fixes: ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks") Fixes: b884fa461776 ("netfilter: conntrack: unify sysctl handling") Reported-and-tested-by: syzbot+fcee88b2d87f0539dfe9@syzkaller.appspotmail.com Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28netfilter: nft_counter: remove wrong __percpu of nft_counter_resest()'s argLuc Van Oostenryck1-1/+1
nft_counter_rest() has its first argument declared as struct nft_counter_percpu_priv __percpu *priv but this structure is not percpu (it only countains a member 'counter' which is, correctly, a pointer to a percpu struct nft_counter). So, remove the '__percpu' from the argument's declaration. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28ipvs: use indirect call wrappersMatteo Croce3-10/+45
Use the new indirect call wrappers in IPVS when calling the TCP or UDP protocol specific functions. This avoids an indirect calls in IPVS, and reduces the performance impact of the Spectre mitigation. Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28ipvs: avoid indirect calls when calculating checksumsMatteo Croce4-15/+19
The function pointer ip_vs_protocol->csum_check is only used in protocol specific code, and never in the generic one. Remove the function pointer from struct ip_vs_protocol and call the checksum functions directly. This reduces the performance impact of the Spectre mitigation, and should give a small improvement even with RETPOLINES disabled. Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28netfilter: nfnetlink_osf: add missing fmatch checkFernando Fernandez Mancera1-0/+4
When we check the tcp options of a packet and it doesn't match the current fingerprint, the tcp packet option pointer must be restored to its initial value in order to do the proper tcp options check for the next fingerprint. Here we can see an example. Assumming the following fingerprint base with two lines: S10:64:1:60:M*,S,T,N,W6: Linux:3.0::Linux 3.0 S20:64:1:60:M*,S,T,N,W7: Linux:4.19:arch:Linux 4.1 Where TCP options are the last field in the OS signature, all of them overlap except by the last one, ie. 'W6' versus 'W7'. In case a packet for Linux 4.19 kicks in, the osf finds no matching because the TCP options pointer is updated after checking for the TCP options in the first line. Therefore, reset pointer back to where it should be. Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-24ipvs: Fix signed integer overflow when setsockopt timeoutZhangXiaoxu1-0/+12
There is a UBSAN bug report as below: UBSAN: Undefined behaviour in net/netfilter/ipvs/ip_vs_ctl.c:2227:21 signed integer overflow: -2147483647 * 1000 cannot be represented in type 'int' Reproduce program: #include <stdio.h> #include <sys/types.h> #include <sys/socket.h> #define IPPROTO_IP 0 #define IPPROTO_RAW 255 #define IP_VS_BASE_CTL (64+1024+64) #define IP_VS_SO_SET_TIMEOUT (IP_VS_BASE_CTL+10) /* The argument to IP_VS_SO_GET_TIMEOUT */ struct ipvs_timeout_t { int tcp_timeout; int tcp_fin_timeout; int udp_timeout; }; int main() { int ret = -1; int sockfd = -1; struct ipvs_timeout_t to; sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); if (sockfd == -1) { printf("socket init error\n"); return -1; } to.tcp_timeout = -2147483647; to.tcp_fin_timeout = -2147483647; to.udp_timeout = -2147483647; ret = setsockopt(sockfd, IPPROTO_IP, IP_VS_SO_SET_TIMEOUT, (char *)(&to), sizeof(to)); printf("setsockopt return %d\n", ret); return ret; } Return -EINVAL if the timeout value is negative or max than 'INT_MAX / HZ'. Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-23netfilter: conntrack: fix bogus port values for other l4 protocolsFlorian Westphal1-11/+35
We must only extract l4 proto information if we can track the layer 4 protocol. Before removal of pkt_to_tuple callback, the code to extract port information was only reached for TCP/UDP/LITE/DCCP/SCTP. The other protocols were handled by the indirect call, and the 'generic' tracker took care of other protocols that have no notion of 'ports'. After removal of the callback we must be more strict here and only init port numbers for those protocols that have ports. Fixes: df5e1629087a ("netfilter: conntrack: remove pkt_to_tuple callback") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-23netfilter: conntrack: fix IPV6=n buildsFlorian Westphal2-0/+10
Stephen Rothwell reports: After merging the netfilter-next tree, today's linux-next build (powerpc ppc64_defconfig) failed like this: ERROR: "nf_conntrack_invert_icmpv6_tuple" [nf_conntrack.ko] undefined! ERROR: "nf_conntrack_icmpv6_packet" [nf_conntrack.ko] undefined! ERROR: "nf_conntrack_icmpv6_init_net" [nf_conntrack.ko] undefined! ERROR: "icmpv6_pkt_to_tuple" [nf_conntrack.ko] undefined! ERROR: "nf_ct_gre_keymap_destroy" [nf_conntrack.ko] undefined! icmpv6 related errors are due to lack of IS_ENABLED(CONFIG_IPV6) (no icmpv6 support is builtin if kernel has CONFIG_IPV6=n), the nf_ct_gre_keymap_destroy error is due to lack of PROTO_GRE check. Fixes: a47c54048162 ("netfilter: conntrack: handle builtin l4proto packet functions via direct calls") Fixes: e2e48b471634 ("netfilter: conntrack: handle icmp pkt_to_tuple helper via direct calls") Fixes: 197c4300aec0 ("netfilter: conntrack: remove invert_tuple callback") Fixes: 2a389de86e4a ("netfilter: conntrack: remove l4proto init and get_net callbacks") Fixes: e56894356f60 ("netfilter: conntrack: remove l4proto destroy hook") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18Revert "netfilter: nft_hash: add map lookups for hashing operations"Laura Garcia Liebana1-121/+0
A better way to implement this from userspace has been found without specific code in the kernel side, revert this. Fixes: b9ccc07e3f31 ("netfilter: nft_hash: add map lookups for hashing operations") Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: nat: un-export nf_nat_used_tupleFlorian Westphal1-2/+1
Not used since 203f2e78200c27e ("netfilter: nat: remove l4proto->unique_tuple") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: nft_meta: Add NFT_META_I/OIFKIND meta typewenxu1-0/+12
In the ip_rcv the skb goes through the PREROUTING hook first, then kicks in vrf device and go through the same hook again. When conntrack dnat works with vrf, there will be some conflict with rules because the packet goes through the hook twice with different nf status. ip link add user1 type vrf table 1 ip link add user2 type vrf table 2 ip l set dev tun1 master user1 ip l set dev tun2 master user2 nft add table firewall nft add chain firewall zones { type filter hook prerouting priority - 300 \; } nft add rule firewall zones counter ct zone set iif map { "tun1" : 1, "tun2" : 2 } nft add chain firewall rule-1000-ingress nft add rule firewall rule-1000-ingress ct zone 1 tcp dport 22 ct state new counter accept nft add rule firewall rule-1000-ingress counter drop nft add chain firewall rule-1000-egress nft add rule firewall rule-1000-egress tcp dport 22 ct state new counter drop nft add rule firewall rule-1000-egress counter accept nft add chain firewall rules-all { type filter hook prerouting priority - 150 \; } nft add rule firewall rules-all ip daddr vmap { "2.2.2.11" : jump rule-1000-ingress } nft add rule firewall rules-all ct zone vmap { 1 : jump rule-1000-egress } nft add rule firewall dnat-all ct zone vmap { 1 : jump dnat-1000 } nft add rule firewall dnat-1000 ip daddr 2.2.2.11 counter dnat to 10.0.0.7 For a package with ip daddr 2.2.2.11 and tcp dport 22, first time accept in the rule-1000-ingress and dnat to 10.0.0.7. Then second time the packet goto the wrong chain rule-1000-egress which leads the packet drop With this patch, userspace can add the 'don't re-do entire ruleset for vrf' policy itself via: nft add rule firewall rules-all meta iifkind "vrf" counter accept Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: nf_conntrack: provide modparam to always register conntrack hooksPablo Neira Ayuso1-4/+24
The connection tracking hooks can be optionally registered per netns when conntrack is specifically invoked from the ruleset since 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset"). Then, since 4d3a57f23dec ("netfilter: conntrack: do not enable connection tracking unless needed"), the default behaviour is changed to always register them on demand. This patch provides a toggle that allows users to always register them. Without this toggle, in order to use conntrack for statistics collection, you need a dummy rule that refers to conntrack, eg. iptables -I INPUT -m state --state NEW This patch allows users to restore the original behaviour via modparam, ie. always register connection tracking, eg. modprobe nf_conntrack enable_hooks=1 Hence, no dummy rule is required. Reported-by: Laura Garcia <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove nf_ct_l4proto_find_getFlorian Westphal9-174/+43
Its now same as __nf_ct_l4proto_find(), so rename that to nf_ct_l4proto_find and use it everywhere. It never returns NULL and doesn't need locks or reference counts. Before this series: 302824 net/netfilter/nf_conntrack.ko 21504 net/netfilter/nf_conntrack_proto_gre.ko text data bss dec hex filename 6281 1732 4 8017 1f51 nf_conntrack_proto_gre.ko 108356 20613 236 129205 1f8b5 nf_conntrack.ko After: 294864 net/netfilter/nf_conntrack.ko text data bss dec hex filename 106979 19557 240 126776 1ef38 nf_conntrack.ko so, even with builtin gre, total size got reduced. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove l4proto destroy hookFlorian Westphal2-18/+11
Only one user (gre), add a direct call and remove this facility. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove l4proto init and get_net callbacksFlorian Westphal9-217/+56
Those were needed we still had modular trackers. As we don't have those anymore, prefer direct calls and remove all the (un)register infrastructure associated with this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove sysctl registration helpersFlorian Westphal1-76/+1
After previous patch these are not used anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: unify sysctl handlingFlorian Westphal9-460/+391
Due to historical reasons, all l4 trackers register their own sysctls. This leads to copy&pasted boilerplate code, that does exactly same thing, just with different data structure. Place all of this in a single file. This allows to remove the various ctl_table pointers from the ct_netns structure and reduces overall code size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: avoid unneeded nf_conntrack_l4proto lookupsFlorian Westphal5-61/+18
after removal of the packet and invert function pointers, several places do not need to lookup the l4proto structure anymore. Remove those lookups. The function nf_ct_invert_tuplepr becomes redundant, replace it with nf_ct_invert_tuple everywhere. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove pernet l4 proto register interfaceFlorian Westphal1-16/+12
No used anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove remaining l4proto indirect packet callsFlorian Westphal3-48/+24
Now that all l4trackers are builtin, no need to use a mix of direct and indirect calls. This removes the last two users: gre and the generic l4 protocol tracker. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove module owner fieldFlorian Westphal4-17/+0
No need to get/put module owner reference, none of these can be removed anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove invert_tuple callbackFlorian Westphal3-8/+10
Only used by icmp(v6). Prefer a direct call and remove this function from the l4proto struct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove pkt_to_tuple callbackFlorian Westphal3-16/+6
GRE is now builtin, so we can handle it via direct call and remove the callback. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove net_idFlorian Westphal1-6/+2
No users anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: gre: switch module to be built-inFlorian Westphal5-85/+27
This makes the last of the modular l4 trackers 'bool'. After this, all infrastructure to handle dynamic l4 protocol registration becomes obsolete and can be removed in followup patches. Old: 302824 net/netfilter/nf_conntrack.ko 21504 net/netfilter/nf_conntrack_proto_gre.ko New: 313728 net/netfilter/nf_conntrack.ko Old: text data bss dec hex filename 6281 1732 4 8017 1f51 nf_conntrack_proto_gre.ko 108356 20613 236 129205 1f8b5 nf_conntrack.ko New: 112095 21381 240 133716 20a54 nf_conntrack.ko The size increase is only temporary. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: gre: convert rwlock to rcuFlorian Westphal1-22/+15
We can use gre. Lock is only needed when a new expectation is added. In case a single spinlock proves to be problematic we can either add one per netns or use an array of locks combined with net_hash_mix() or similar to pick the 'correct' one. But given this is only needed for an expectation rather than per packet a single one should be ok. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: handle icmp pkt_to_tuple helper via direct callsFlorian Westphal3-8/+12
rather than handling them via indirect call, use a direct one instead. This leaves GRE as the last user of this indirect call facility. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: handle builtin l4proto packet functions via direct callsFlorian Westphal7-44/+76
The l4 protocol trackers are invoked via indirect call: l4proto->packet(). With one exception (gre), all l4trackers are builtin, so we can make .packet optional and use a direct call for most protocols. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: nf_tables: Support RULE_ID reference in new rulePhil Sutter1-0/+9
To allow for a batch to contain rules in arbitrary ordering, introduce NFTA_RULE_POSITION_ID attribute which works just like NFTA_RULE_POSITION but contains the ID of another rule within the same batch. This helps iptables-nft-restore handling dumps with mixed insert/append commands correctly. Note that NFTA_RULE_POSITION takes precedence over NFTA_RULE_POSITION_ID, so if the former is present, the latter is ignored. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: physdev: relax br_netfilter dependencyFlorian Westphal1-2/+7
Following command: iptables -D FORWARD -m physdev ... causes connectivity loss in some setups. Reason is that iptables userspace will probe kernel for the module revision of the physdev patch, and physdev has an artificial dependency on br_netfilter (xt_physdev use makes no sense unless a br_netfilter module is loaded). This causes the "phydev" module to be loaded, which in turn enables the "call-iptables" infrastructure. bridged packets might then get dropped by the iptables ruleset. The better fix would be to change the "call-iptables" defaults to 0 and enforce explicit setting to 1, but that breaks backwards compatibility. This does the next best thing: add a request_module call to checkentry. This was a stray '-D ... -m physdev' won't activate br_netfilter anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove helper hook againFlorian Westphal1-106/+36
place them into the confirm one. Old: hook (300): ipv4/6_help() first call helper, then seqadj. hook (INT_MAX): confirm Now: hook (INT_MAX): confirm, first call helper, then seqadj, then confirm Not having the extra call is noticeable in bechmarks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>