summaryrefslogtreecommitdiff
path: root/net/netfilter
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2024-03-13 03:44:08 +0300
committerLinus Torvalds <torvalds@linux-foundation.org>2024-03-13 03:44:08 +0300
commit9187210eee7d87eea37b45ea93454a88681894a4 (patch)
tree31b4610e62cdd5e1dfb700014aa619e41145d7d3 /net/netfilter
parent1f440397665f4241346e4cc6d93f8b73880815d1 (diff)
parented1f164038b50c5864aa85389f3ffd456f050cca (diff)
downloadlinux-9187210eee7d87eea37b45ea93454a88681894a4.tar.xz
Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "Core & protocols: - Large effort by Eric to lower rtnl_lock pressure and remove locks: - Make commonly used parts of rtnetlink (address, route dumps etc) lockless, protected by RCU instead of rtnl_lock. - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback. - Remove locks / serialization in the socket diag interface. - Remove 6 calls to synchronize_rcu() while holding rtnl_lock. - Remove the dev_base_lock, depend on RCU where necessary. - Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults. - Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible. - Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems. - Support TCP_NOTSENT_LOWAT in MPTCP. - Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec. - Support forwarding of ICMP Error messages in IPSec, per RFC 4301. - Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine. - Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks. - Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc. - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets. - Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths). - Speed up scanning for expired routes by keeping a dedicated list. - Speed up "generic" XDP by trying harder to avoid large allocations. - Support attaching arbitrary metadata to netconsole messages. Things we sprinkled into general kernel code: - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena). - Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass). Netfilter: - Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership. - Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures. BPF: - Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. - Introduce bpf_arena which is sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs. - Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it. - Extend the BPF verifier to enable static subprog calls in spin lock critical sections. - Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type. - Add support for retrieval of cookies for perf/kprobe multi links. - Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls. - Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects. Wireless: - Add SPP (signaling and payload protected) AMSDU support. - Support wider bandwidth OFDMA, as required for EHT operation. Driver API: - Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers. - Define an API for querying per netdev queue statistics from drivers. - IPSec: account in global stats for fully offloaded sessions. - Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code. - Enable Rx hashing (RSS) on GTP protocol fields. Misc: - Improvements and refactoring all over networking selftests. - Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies. - Address all missing MODULE_DESCRIPTION() warnings in networking. - Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type". Drivers: - Ethernet high-speed NICs: - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF. - Intel (100G, ice, idpf): - support E825-C devices - nVidia/Mellanox: - support devices with one port and multiple PCIe links - Broadcom (bnxt): - support n-tuple filters - support configuring the RSS key - Wangxun (ngbe/txgbe): - implement irq_domain for TXGBE's sub-interrupts - Pensando/AMD: - support XDP - optimize queue submission and wakeup handling (+17% bps) - optimize struct layout, saving 28% of memory on queues - Ethernet NICs embedded and virtual: - Google cloud vNIC: - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory - Synopsys (stmmac): - obey queueMaxSDU and implement counters required by 802.1Qbv - Renesas (ravb): - support packet checksum offload - suspend to RAM and runtime PM support - Ethernet switches: - nVidia/Mellanox: - support for nexthop group statistics - Microchip: - ksz8: implement PHY loopback - add support for KSZ8567, a 7-port 10/100Mbps switch - PTP: - New driver for RENESAS FemtoClock3 Wireless clock generator. - Support OCP PTP cards designed and built by Adva. - CAN: - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets. - Support for esd GmbH PCIe/402 CAN device family. - m_can: - Rx/Tx submission coalescing - wake on frame Rx - WiFi: - Intel (iwlwifi): - enable signaling and payload protected A-MSDUs - support wider-bandwidth OFDMA - support for new devices - bump FW API to 89 for AX devices; 90 for BZ/SC devices - MediaTek (mt76): - mt7915: newer ADIE version support - mt7925: radio temperature sensor support - Qualcomm (ath11k): - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) - QCA6390 & WCN6855: support 2 concurrent station interfaces - QCA2066 support - Qualcomm (ath12k): - refactoring in preparation for Multi-Link Operation (MLO) support - 1024 Block Ack window size support - firmware-2.bin support - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) - QCN9274: support split-PHY devices - WCN7850: enable Power Save Mode in station mode - WCN7850: P2P support - RealTek: - rtw88: support for more rtw8811cu and rtw8821cu devices - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL - rtlwifi: speed up USB firmware initialization - rtwl8xxxu: - RTL8188F: concurrent interface support - Channel Switch Announcement (CSA) support in AP mode - Broadcom (brcmfmac): - per-vendor feature support - per-vendor SAE password setup - DMI nvram filename quirk for ACEPC W5 Pro" * tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits) nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y nexthop: Fix out-of-bounds access during attribute validation nexthop: Only parse NHA_OP_FLAGS for dump messages that require it nexthop: Only parse NHA_OP_FLAGS for get messages that require it bpf: move sleepable flag from bpf_prog_aux to bpf_prog bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes() selftests/bpf: Add kprobe multi triggering benchmarks ptp: Move from simple ida to xarray vxlan: Remove generic .ndo_get_stats64 vxlan: Do not alloc tstats manually devlink: Add comments to use netlink gen tool nfp: flower: handle acti_netdevs allocation failure net/packet: Add getsockopt support for PACKET_COPY_THRESH net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID selftests/bpf: Add bpf_arena_htab test. selftests/bpf: Add bpf_arena_list test. selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages bpf: Add helper macro bpf_addr_space_cast() libbpf: Recognize __arena global variables. bpftool: Recognize arena map type ...
Diffstat (limited to 'net/netfilter')
-rw-r--r--net/netfilter/Kconfig12
-rw-r--r--net/netfilter/ipvs/ip_vs_conn.c4
-rw-r--r--net/netfilter/nf_bpf_link.c2
-rw-r--r--net/netfilter/nf_conncount.c8
-rw-r--r--net/netfilter/nf_conntrack_bpf.c4
-rw-r--r--net/netfilter/nf_conntrack_core.c2
-rw-r--r--net/netfilter/nf_conntrack_expect.c4
-rw-r--r--net/netfilter/nf_log.c9
-rw-r--r--net/netfilter/nf_nat_bpf.c4
-rw-r--r--net/netfilter/nf_queue.c106
-rw-r--r--net/netfilter/nf_synproxy_core.c2
-rw-r--r--net/netfilter/nf_tables_api.c35
-rw-r--r--net/netfilter/nfnetlink_queue.c142
-rw-r--r--net/netfilter/nft_osf.c11
-rw-r--r--net/netfilter/nft_set_pipapo.c193
-rw-r--r--net/netfilter/nft_set_pipapo.h37
-rw-r--r--net/netfilter/nft_set_pipapo_avx2.c59
-rw-r--r--net/netfilter/utils.c37
-rw-r--r--net/netfilter/x_tables.c3
19 files changed, 385 insertions, 289 deletions
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 441d1f134110..df2dc21304ef 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -818,7 +818,7 @@ config NETFILTER_XT_TARGET_AUDIT
config NETFILTER_XT_TARGET_CHECKSUM
tristate "CHECKSUM target support"
- depends on IP_NF_MANGLE || IP6_NF_MANGLE
+ depends on IP_NF_MANGLE || IP6_NF_MANGLE || NFT_COMPAT
depends on NETFILTER_ADVANCED
help
This option adds a `CHECKSUM' target, which can be used in the iptables mangle
@@ -869,7 +869,7 @@ config NETFILTER_XT_TARGET_CONNSECMARK
config NETFILTER_XT_TARGET_CT
tristate '"CT" target support'
depends on NF_CONNTRACK
- depends on IP_NF_RAW || IP6_NF_RAW
+ depends on IP_NF_RAW || IP6_NF_RAW || NFT_COMPAT
depends on NETFILTER_ADVANCED
help
This options adds a `CT' target, which allows to specify initial
@@ -880,7 +880,7 @@ config NETFILTER_XT_TARGET_CT
config NETFILTER_XT_TARGET_DSCP
tristate '"DSCP" and "TOS" target support'
- depends on IP_NF_MANGLE || IP6_NF_MANGLE
+ depends on IP_NF_MANGLE || IP6_NF_MANGLE || NFT_COMPAT
depends on NETFILTER_ADVANCED
help
This option adds a `DSCP' target, which allows you to manipulate
@@ -896,7 +896,7 @@ config NETFILTER_XT_TARGET_DSCP
config NETFILTER_XT_TARGET_HL
tristate '"HL" hoplimit target support'
- depends on IP_NF_MANGLE || IP6_NF_MANGLE
+ depends on IP_NF_MANGLE || IP6_NF_MANGLE || NFT_COMPAT
depends on NETFILTER_ADVANCED
help
This option adds the "HL" (for IPv6) and "TTL" (for IPv4)
@@ -1080,7 +1080,7 @@ config NETFILTER_XT_TARGET_TPROXY
depends on NETFILTER_ADVANCED
depends on IPV6 || IPV6=n
depends on IP6_NF_IPTABLES || IP6_NF_IPTABLES=n
- depends on IP_NF_MANGLE
+ depends on IP_NF_MANGLE || NFT_COMPAT
select NF_DEFRAG_IPV4
select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES != n
select NF_TPROXY_IPV4
@@ -1147,7 +1147,7 @@ config NETFILTER_XT_TARGET_TCPMSS
config NETFILTER_XT_TARGET_TCPOPTSTRIP
tristate '"TCPOPTSTRIP" target support'
- depends on IP_NF_MANGLE || IP6_NF_MANGLE
+ depends on IP_NF_MANGLE || IP6_NF_MANGLE || NFT_COMPAT
depends on NETFILTER_ADVANCED
help
This option adds a "TCPOPTSTRIP" target, which allows you to strip
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index a743db073887..98d7dbe3d787 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1511,9 +1511,7 @@ int __init ip_vs_conn_init(void)
return -ENOMEM;
/* Allocate ip_vs_conn slab cache */
- ip_vs_conn_cachep = kmem_cache_create("ip_vs_conn",
- sizeof(struct ip_vs_conn), 0,
- SLAB_HWCACHE_ALIGN, NULL);
+ ip_vs_conn_cachep = KMEM_CACHE(ip_vs_conn, SLAB_HWCACHE_ALIGN);
if (!ip_vs_conn_cachep) {
kvfree(ip_vs_conn_tab);
return -ENOMEM;
diff --git a/net/netfilter/nf_bpf_link.c b/net/netfilter/nf_bpf_link.c
index 0e4beae421f8..5257d5e7eb09 100644
--- a/net/netfilter/nf_bpf_link.c
+++ b/net/netfilter/nf_bpf_link.c
@@ -314,7 +314,7 @@ static bool nf_is_valid_access(int off, int size, enum bpf_access_type type,
static const struct bpf_func_proto *
bpf_nf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
- return bpf_base_func_proto(func_id);
+ return bpf_base_func_proto(func_id, prog);
}
const struct bpf_verifier_ops netfilter_verifier_ops = {
diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 5d8ed6c90b7e..8715617b02fe 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -605,15 +605,11 @@ static int __init nf_conncount_modinit(void)
for (i = 0; i < CONNCOUNT_SLOTS; ++i)
spin_lock_init(&nf_conncount_locks[i]);
- conncount_conn_cachep = kmem_cache_create("nf_conncount_tuple",
- sizeof(struct nf_conncount_tuple),
- 0, 0, NULL);
+ conncount_conn_cachep = KMEM_CACHE(nf_conncount_tuple, 0);
if (!conncount_conn_cachep)
return -ENOMEM;
- conncount_rb_cachep = kmem_cache_create("nf_conncount_rb",
- sizeof(struct nf_conncount_rb),
- 0, 0, NULL);
+ conncount_rb_cachep = KMEM_CACHE(nf_conncount_rb, 0);
if (!conncount_rb_cachep) {
kmem_cache_destroy(conncount_conn_cachep);
return -ENOMEM;
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 475358ec8212..d2492d050fe6 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -467,7 +467,7 @@ __bpf_kfunc int bpf_ct_change_status(struct nf_conn *nfct, u32 status)
__bpf_kfunc_end_defs();
-BTF_SET8_START(nf_ct_kfunc_set)
+BTF_KFUNCS_START(nf_ct_kfunc_set)
BTF_ID_FLAGS(func, bpf_xdp_ct_alloc, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_xdp_ct_lookup, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_skb_ct_alloc, KF_ACQUIRE | KF_RET_NULL)
@@ -478,7 +478,7 @@ BTF_ID_FLAGS(func, bpf_ct_set_timeout, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_ct_change_timeout, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_ct_set_status, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_ct_change_status, KF_TRUSTED_ARGS)
-BTF_SET8_END(nf_ct_kfunc_set)
+BTF_KFUNCS_END(nf_ct_kfunc_set)
static const struct btf_kfunc_id_set nf_conntrack_kfunc_set = {
.owner = THIS_MODULE,
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 5b876fa7f9af..c63868666bd9 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -2530,7 +2530,7 @@ void nf_conntrack_cleanup_net_list(struct list_head *net_exit_list)
* netfilter framework. Roll on, two-stage module
* delete...
*/
- synchronize_net();
+ synchronize_rcu_expedited();
i_see_dead_people:
busy = 0;
list_for_each_entry(net, net_exit_list, exit_list) {
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 81ca348915c9..21fa550966f0 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -722,9 +722,7 @@ int nf_conntrack_expect_init(void)
nf_ct_expect_hsize = 1;
}
nf_ct_expect_max = nf_ct_expect_hsize * 4;
- nf_ct_expect_cachep = kmem_cache_create("nf_conntrack_expect",
- sizeof(struct nf_conntrack_expect),
- 0, 0, NULL);
+ nf_ct_expect_cachep = KMEM_CACHE(nf_conntrack_expect, 0);
if (!nf_ct_expect_cachep)
return -ENOMEM;
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index e16f158388bb..370f8231385c 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -31,10 +31,10 @@ static struct nf_logger *__find_logger(int pf, const char *str_logger)
int i;
for (i = 0; i < NF_LOG_TYPE_MAX; i++) {
- if (loggers[pf][i] == NULL)
+ log = nft_log_dereference(loggers[pf][i]);
+ if (!log)
continue;
- log = nft_log_dereference(loggers[pf][i]);
if (!strncasecmp(str_logger, log->name, strlen(log->name)))
return log;
}
@@ -156,6 +156,11 @@ int nf_logger_find_get(int pf, enum nf_log_type type)
struct nf_logger *logger;
int ret = -ENOENT;
+ if (pf >= ARRAY_SIZE(loggers))
+ return -EINVAL;
+ if (type >= NF_LOG_TYPE_MAX)
+ return -EINVAL;
+
if (pf == NFPROTO_INET) {
ret = nf_logger_find_get(NFPROTO_IPV4, type);
if (ret < 0)
diff --git a/net/netfilter/nf_nat_bpf.c b/net/netfilter/nf_nat_bpf.c
index 6e3b2f58855f..481be15609b1 100644
--- a/net/netfilter/nf_nat_bpf.c
+++ b/net/netfilter/nf_nat_bpf.c
@@ -54,9 +54,9 @@ __bpf_kfunc int bpf_ct_set_nat_info(struct nf_conn___init *nfct,
__bpf_kfunc_end_defs();
-BTF_SET8_START(nf_nat_kfunc_set)
+BTF_KFUNCS_START(nf_nat_kfunc_set)
BTF_ID_FLAGS(func, bpf_ct_set_nat_info, KF_TRUSTED_ARGS)
-BTF_SET8_END(nf_nat_kfunc_set)
+BTF_KFUNCS_END(nf_nat_kfunc_set)
static const struct btf_kfunc_id_set nf_bpf_nat_kfunc_set = {
.owner = THIS_MODULE,
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index e2f334f70281..7f12e56e6e52 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -248,109 +248,3 @@ int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,
return 0;
}
EXPORT_SYMBOL_GPL(nf_queue);
-
-static unsigned int nf_iterate(struct sk_buff *skb,
- struct nf_hook_state *state,
- const struct nf_hook_entries *hooks,
- unsigned int *index)
-{
- const struct nf_hook_entry *hook;
- unsigned int verdict, i = *index;
-
- while (i < hooks->num_hook_entries) {
- hook = &hooks->hooks[i];
-repeat:
- verdict = nf_hook_entry_hookfn(hook, skb, state);
- if (verdict != NF_ACCEPT) {
- *index = i;
- if (verdict != NF_REPEAT)
- return verdict;
- goto repeat;
- }
- i++;
- }
-
- *index = i;
- return NF_ACCEPT;
-}
-
-static struct nf_hook_entries *nf_hook_entries_head(const struct net *net, u8 pf, u8 hooknum)
-{
- switch (pf) {
-#ifdef CONFIG_NETFILTER_FAMILY_BRIDGE
- case NFPROTO_BRIDGE:
- return rcu_dereference(net->nf.hooks_bridge[hooknum]);
-#endif
- case NFPROTO_IPV4:
- return rcu_dereference(net->nf.hooks_ipv4[hooknum]);
- case NFPROTO_IPV6:
- return rcu_dereference(net->nf.hooks_ipv6[hooknum]);
- default:
- WARN_ON_ONCE(1);
- return NULL;
- }
-
- return NULL;
-}
-
-/* Caller must hold rcu read-side lock */
-void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
-{
- const struct nf_hook_entry *hook_entry;
- const struct nf_hook_entries *hooks;
- struct sk_buff *skb = entry->skb;
- const struct net *net;
- unsigned int i;
- int err;
- u8 pf;
-
- net = entry->state.net;
- pf = entry->state.pf;
-
- hooks = nf_hook_entries_head(net, pf, entry->state.hook);
-
- i = entry->hook_index;
- if (WARN_ON_ONCE(!hooks || i >= hooks->num_hook_entries)) {
- kfree_skb(skb);
- nf_queue_entry_free(entry);
- return;
- }
-
- hook_entry = &hooks->hooks[i];
-
- /* Continue traversal iff userspace said ok... */
- if (verdict == NF_REPEAT)
- verdict = nf_hook_entry_hookfn(hook_entry, skb, &entry->state);
-
- if (verdict == NF_ACCEPT) {
- if (nf_reroute(skb, entry) < 0)
- verdict = NF_DROP;
- }
-
- if (verdict == NF_ACCEPT) {
-next_hook:
- ++i;
- verdict = nf_iterate(skb, &entry->state, hooks, &i);
- }
-
- switch (verdict & NF_VERDICT_MASK) {
- case NF_ACCEPT:
- case NF_STOP:
- local_bh_disable();
- entry->state.okfn(entry->state.net, entry->state.sk, skb);
- local_bh_enable();
- break;
- case NF_QUEUE:
- err = nf_queue(skb, &entry->state, i, verdict);
- if (err == 1)
- goto next_hook;
- break;
- case NF_STOLEN:
- break;
- default:
- kfree_skb(skb);
- }
-
- nf_queue_entry_free(entry);
-}
-EXPORT_SYMBOL(nf_reinject);
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
index fbbc4fd37349..5b140c12b7df 100644
--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -800,7 +800,7 @@ synproxy_build_ip_ipv6(struct net *net, struct sk_buff *skb,
skb_reset_network_header(skb);
iph = skb_put(skb, sizeof(*iph));
ip6_flow_hdr(iph, 0, 0);
- iph->hop_limit = net->ipv6.devconf_all->hop_limit;
+ iph->hop_limit = READ_ONCE(net->ipv6.devconf_all->hop_limit);
iph->nexthdr = IPPROTO_TCP;
iph->saddr = *saddr;
iph->daddr = *daddr;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 1683dc196b59..e93f905e60b6 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1195,8 +1195,10 @@ static void nf_tables_table_disable(struct net *net, struct nft_table *table)
#define __NFT_TABLE_F_INTERNAL (NFT_TABLE_F_MASK + 1)
#define __NFT_TABLE_F_WAS_DORMANT (__NFT_TABLE_F_INTERNAL << 0)
#define __NFT_TABLE_F_WAS_AWAKEN (__NFT_TABLE_F_INTERNAL << 1)
+#define __NFT_TABLE_F_WAS_ORPHAN (__NFT_TABLE_F_INTERNAL << 2)
#define __NFT_TABLE_F_UPDATE (__NFT_TABLE_F_WAS_DORMANT | \
- __NFT_TABLE_F_WAS_AWAKEN)
+ __NFT_TABLE_F_WAS_AWAKEN | \
+ __NFT_TABLE_F_WAS_ORPHAN)
static int nf_tables_updtable(struct nft_ctx *ctx)
{
@@ -1216,8 +1218,11 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
if ((nft_table_has_owner(ctx->table) &&
!(flags & NFT_TABLE_F_OWNER)) ||
- (!nft_table_has_owner(ctx->table) &&
- flags & NFT_TABLE_F_OWNER))
+ (flags & NFT_TABLE_F_OWNER &&
+ !nft_table_is_orphan(ctx->table)))
+ return -EOPNOTSUPP;
+
+ if ((flags ^ ctx->table->flags) & NFT_TABLE_F_PERSIST)
return -EOPNOTSUPP;
/* No dormant off/on/off/on games in single transaction */
@@ -1246,6 +1251,13 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
}
}
+ if ((flags & NFT_TABLE_F_OWNER) &&
+ !nft_table_has_owner(ctx->table)) {
+ ctx->table->nlpid = ctx->portid;
+ ctx->table->flags |= NFT_TABLE_F_OWNER |
+ __NFT_TABLE_F_WAS_ORPHAN;
+ }
+
nft_trans_table_update(trans) = true;
nft_trans_commit_list_add_tail(ctx->net, trans);
@@ -4238,23 +4250,18 @@ static bool nft_set_ops_candidate(const struct nft_set_type *type, u32 flags)
* given, in that case the amount of memory per element is used.
*/
static const struct nft_set_ops *
-nft_select_set_ops(const struct nft_ctx *ctx,
- const struct nlattr * const nla[],
+nft_select_set_ops(const struct nft_ctx *ctx, u32 flags,
const struct nft_set_desc *desc)
{
struct nftables_pernet *nft_net = nft_pernet(ctx->net);
const struct nft_set_ops *ops, *bops;
struct nft_set_estimate est, best;
const struct nft_set_type *type;
- u32 flags = 0;
int i;
lockdep_assert_held(&nft_net->commit_mutex);
lockdep_nfnl_nft_mutex_not_held();
- if (nla[NFTA_SET_FLAGS] != NULL)
- flags = ntohl(nla_get_be32(nla[NFTA_SET_FLAGS]));
-
bops = NULL;
best.size = ~0;
best.lookup = ~0;
@@ -5146,7 +5153,7 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
if (!(info->nlh->nlmsg_flags & NLM_F_CREATE))
return -ENOENT;
- ops = nft_select_set_ops(&ctx, nla, &desc);
+ ops = nft_select_set_ops(&ctx, flags, &desc);
if (IS_ERR(ops))
return PTR_ERR(ops);
@@ -10435,6 +10442,10 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
} else if (trans->ctx.table->flags & __NFT_TABLE_F_WAS_AWAKEN) {
trans->ctx.table->flags &= ~NFT_TABLE_F_DORMANT;
}
+ if (trans->ctx.table->flags & __NFT_TABLE_F_WAS_ORPHAN) {
+ trans->ctx.table->flags &= ~NFT_TABLE_F_OWNER;
+ trans->ctx.table->nlpid = 0;
+ }
trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
nft_trans_destroy(trans);
} else {
@@ -11361,6 +11372,10 @@ again:
list_for_each_entry(table, &nft_net->tables, list) {
if (nft_table_has_owner(table) &&
n->portid == table->nlpid) {
+ if (table->flags & NFT_TABLE_F_PERSIST) {
+ table->flags &= ~NFT_TABLE_F_OWNER;
+ continue;
+ }
__nft_release_hook(net, table);
list_del_rcu(&table->list);
to_delete[deleted++] = table;
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 5cf38fc0a366..00f4bd21c59b 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -225,6 +225,148 @@ find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
return entry;
}
+static unsigned int nf_iterate(struct sk_buff *skb,
+ struct nf_hook_state *state,
+ const struct nf_hook_entries *hooks,
+ unsigned int *index)
+{
+ const struct nf_hook_entry *hook;
+ unsigned int verdict, i = *index;
+
+ while (i < hooks->num_hook_entries) {
+ hook = &hooks->hooks[i];
+repeat:
+ verdict = nf_hook_entry_hookfn(hook, skb, state);
+ if (verdict != NF_ACCEPT) {
+ *index = i;
+ if (verdict != NF_REPEAT)
+ return verdict;
+ goto repeat;
+ }
+ i++;
+ }
+
+ *index = i;
+ return NF_ACCEPT;
+}
+
+static struct nf_hook_entries *nf_hook_entries_head(const struct net *net, u8 pf, u8 hooknum)
+{
+ switch (pf) {
+#ifdef CONFIG_NETFILTER_FAMILY_BRIDGE
+ case NFPROTO_BRIDGE:
+ return rcu_dereference(net->nf.hooks_bridge[hooknum]);
+#endif
+ case NFPROTO_IPV4:
+ return rcu_dereference(net->nf.hooks_ipv4[hooknum]);
+ case NFPROTO_IPV6:
+ return rcu_dereference(net->nf.hooks_ipv6[hooknum]);
+ default:
+ WARN_ON_ONCE(1);
+ return NULL;
+ }
+
+ return NULL;
+}
+
+static int nf_ip_reroute(struct sk_buff *skb, const struct nf_queue_entry *entry)
+{
+#ifdef CONFIG_INET
+ const struct ip_rt_info *rt_info = nf_queue_entry_reroute(entry);
+
+ if (entry->state.hook == NF_INET_LOCAL_OUT) {
+ const struct iphdr *iph = ip_hdr(skb);
+
+ if (!(iph->tos == rt_info->tos &&
+ skb->mark == rt_info->mark &&
+ iph->daddr == rt_info->daddr &&
+ iph->saddr == rt_info->saddr))
+ return ip_route_me_harder(entry->state.net, entry->state.sk,
+ skb, RTN_UNSPEC);
+ }
+#endif
+ return 0;
+}
+
+static int nf_reroute(struct sk_buff *skb, struct nf_queue_entry *entry)
+{
+ const struct nf_ipv6_ops *v6ops;
+ int ret = 0;
+
+ switch (entry->state.pf) {
+ case AF_INET:
+ ret = nf_ip_reroute(skb, entry);
+ break;
+ case AF_INET6:
+ v6ops = rcu_dereference(nf_ipv6_ops);
+ if (v6ops)
+ ret = v6ops->reroute(skb, entry);
+ break;
+ }
+ return ret;
+}
+
+/* caller must hold rcu read-side lock */
+static void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
+{
+ const struct nf_hook_entry *hook_entry;
+ const struct nf_hook_entries *hooks;
+ struct sk_buff *skb = entry->skb;
+ const struct net *net;
+ unsigned int i;
+ int err;
+ u8 pf;
+
+ net = entry->state.net;
+ pf = entry->state.pf;
+
+ hooks = nf_hook_entries_head(net, pf, entry->state.hook);
+
+ i = entry->hook_index;
+ if (WARN_ON_ONCE(!hooks || i >= hooks->num_hook_entries)) {
+ kfree_skb_reason(skb, SKB_DROP_REASON_NETFILTER_DROP);
+ nf_queue_entry_free(entry);
+ return;
+ }
+
+ hook_entry = &hooks->hooks[i];
+
+ /* Continue traversal iff userspace said ok... */
+ if (verdict == NF_REPEAT)
+ verdict = nf_hook_entry_hookfn(hook_entry, skb, &entry->state);
+
+ if (verdict == NF_ACCEPT) {
+ if (nf_reroute(skb, entry) < 0)
+ verdict = NF_DROP;
+ }
+
+ if (verdict == NF_ACCEPT) {
+next_hook:
+ ++i;
+ verdict = nf_iterate(skb, &entry->state, hooks, &i);
+ }
+
+ switch (verdict & NF_VERDICT_MASK) {
+ case NF_ACCEPT:
+ case NF_STOP:
+ local_bh_disable();
+ entry->state.okfn(entry->state.net, entry->state.sk, skb);
+ local_bh_enable();
+ break;
+ case NF_QUEUE:
+ err = nf_queue(skb, &entry->state, i, verdict);
+ if (err == 1)
+ goto next_hook;
+ break;
+ case NF_STOLEN:
+ break;
+ default:
+ kfree_skb(skb);
+ }
+
+ nf_queue_entry_free(entry);
+}
+
static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict)
{
const struct nf_ct_hook *ct_hook;
diff --git a/net/netfilter/nft_osf.c b/net/netfilter/nft_osf.c
index 7f61506e5b44..7fec57ff736f 100644
--- a/net/netfilter/nft_osf.c
+++ b/net/netfilter/nft_osf.c
@@ -63,7 +63,6 @@ static int nft_osf_init(const struct nft_ctx *ctx,
{
struct nft_osf *priv = nft_expr_priv(expr);
u32 flags;
- int err;
u8 ttl;
if (!tb[NFTA_OSF_DREG])
@@ -83,13 +82,9 @@ static int nft_osf_init(const struct nft_ctx *ctx,
priv->flags = flags;
}
- err = nft_parse_register_store(ctx, tb[NFTA_OSF_DREG], &priv->dreg,
- NULL, NFT_DATA_VALUE,
- NFT_OSF_MAXGENRELEN);
- if (err < 0)
- return err;
-
- return 0;
+ return nft_parse_register_store(ctx, tb[NFTA_OSF_DREG], &priv->dreg,
+ NULL, NFT_DATA_VALUE,
+ NFT_OSF_MAXGENRELEN);
}
static int nft_osf_dump(struct sk_buff *skb,
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index aa1d9e93a9a0..c0ceea068936 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -359,11 +359,13 @@
*
* Return: -1 on no match, bit position on 'match_only', 0 otherwise.
*/
-int pipapo_refill(unsigned long *map, int len, int rules, unsigned long *dst,
- union nft_pipapo_map_bucket *mt, bool match_only)
+int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules,
+ unsigned long *dst,
+ const union nft_pipapo_map_bucket *mt, bool match_only)
{
unsigned long bitset;
- int k, ret = -1;
+ unsigned int k;
+ int ret = -1;
for (k = 0; k < len; k++) {
bitset = map[k];
@@ -412,9 +414,9 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
struct nft_pipapo_scratch *scratch;
unsigned long *res_map, *fill_map;
u8 genmask = nft_genmask_cur(net);
+ const struct nft_pipapo_match *m;
+ const struct nft_pipapo_field *f;
const u8 *rp = (const u8 *)key;
- struct nft_pipapo_match *m;
- struct nft_pipapo_field *f;
bool map_index;
int i;
@@ -505,6 +507,7 @@ out:
* @data: Key data to be matched against existing elements
* @genmask: If set, check that element is active in given genmask
* @tstamp: timestamp to check for expired elements
+ * @gfp: the type of memory to allocate (see kmalloc).
*
* This is essentially the same as the lookup function, except that it matches
* key data against the uncommitted copy and doesn't use preallocated maps for
@@ -515,22 +518,26 @@ out:
static struct nft_pipapo_elem *pipapo_get(const struct net *net,
const struct nft_set *set,
const u8 *data, u8 genmask,
- u64 tstamp)
+ u64 tstamp, gfp_t gfp)
{
struct nft_pipapo_elem *ret = ERR_PTR(-ENOENT);
struct nft_pipapo *priv = nft_set_priv(set);
- struct nft_pipapo_match *m = priv->clone;
unsigned long *res_map, *fill_map = NULL;
- struct nft_pipapo_field *f;
+ const struct nft_pipapo_match *m;
+ const struct nft_pipapo_field *f;
int i;
- res_map = kmalloc_array(m->bsize_max, sizeof(*res_map), GFP_ATOMIC);
+ m = priv->clone;
+ if (m->bsize_max == 0)
+ return ret;
+
+ res_map = kmalloc_array(m->bsize_max, sizeof(*res_map), gfp);
if (!res_map) {
ret = ERR_PTR(-ENOMEM);
goto out;
}
- fill_map = kcalloc(m->bsize_max, sizeof(*res_map), GFP_ATOMIC);
+ fill_map = kcalloc(m->bsize_max, sizeof(*res_map), gfp);
if (!fill_map) {
ret = ERR_PTR(-ENOMEM);
goto out;
@@ -608,7 +615,8 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set,
struct nft_pipapo_elem *e;
e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
- nft_genmask_cur(net), get_jiffies_64());
+ nft_genmask_cur(net), get_jiffies_64(),
+ GFP_ATOMIC);
if (IS_ERR(e))
return ERR_CAST(e);
@@ -616,6 +624,65 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set,
}
/**
+ * pipapo_realloc_mt() - Reallocate mapping table if needed upon resize
+ * @f: Field containing mapping table
+ * @old_rules: Amount of existing mapped rules
+ * @rules: Amount of new rules to map
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+static int pipapo_realloc_mt(struct nft_pipapo_field *f,
+ unsigned int old_rules, unsigned int rules)
+{
+ union nft_pipapo_map_bucket *new_mt = NULL, *old_mt = f->mt;
+ const unsigned int extra = PAGE_SIZE / sizeof(*new_mt);
+ unsigned int rules_alloc = rules;
+
+ might_sleep();
+
+ if (unlikely(rules == 0))
+ goto out_free;
+
+ /* growing and enough space left, no action needed */
+ if (rules > old_rules && f->rules_alloc > rules)
+ return 0;
+
+ /* downsize and extra slack has not grown too large */
+ if (rules < old_rules) {
+ unsigned int remove = f->rules_alloc - rules;
+
+ if (remove < (2u * extra))
+ return 0;
+ }
+
+ /* If set needs more than one page of memory for rules then
+ * allocate another extra page to avoid frequent reallocation.
+ */
+ if (rules > extra &&
+ check_add_overflow(rules, extra, &rules_alloc))
+ return -EOVERFLOW;
+
+ new_mt = kvmalloc_array(rules_alloc, sizeof(*new_mt), GFP_KERNEL);
+ if (!new_mt)
+ return -ENOMEM;
+
+ if (old_mt)
+ memcpy(new_mt, old_mt, min(old_rules, rules) * sizeof(*new_mt));
+
+ if (rules > old_rules) {
+ memset(new_mt + old_rules, 0,
+ (rules - old_rules) * sizeof(*new_mt));
+ }
+out_free:
+ f->rules_alloc = rules_alloc;
+ f->mt = new_mt;
+
+ kvfree(old_mt);
+
+ return 0;
+}
+
+/**
* pipapo_resize() - Resize lookup or mapping table, or both
* @f: Field containing lookup and mapping tables
* @old_rules: Previous amount of rules in field
@@ -627,12 +694,15 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set,
*
* Return: 0 on success, -ENOMEM on allocation failure.
*/
-static int pipapo_resize(struct nft_pipapo_field *f, int old_rules, int rules)
+static int pipapo_resize(struct nft_pipapo_field *f,
+ unsigned int old_rules, unsigned int rules)
{
long *new_lt = NULL, *new_p, *old_lt = f->lt, *old_p;
- union nft_pipapo_map_bucket *new_mt, *old_mt = f->mt;
- size_t new_bucket_size, copy;
- int group, bucket;
+ unsigned int new_bucket_size, copy;
+ int group, bucket, err;
+
+ if (rules >= NFT_PIPAPO_RULE0_MAX)
+ return -ENOSPC;
new_bucket_size = DIV_ROUND_UP(rules, BITS_PER_LONG);
#ifdef NFT_PIPAPO_ALIGN
@@ -672,27 +742,18 @@ static int pipapo_resize(struct nft_pipapo_field *f, int old_rules, int rules)
}
mt:
- new_mt = kvmalloc(rules * sizeof(*new_mt), GFP_KERNEL);
- if (!new_mt) {
+ err = pipapo_realloc_mt(f, old_rules, rules);
+ if (err) {
kvfree(new_lt);
- return -ENOMEM;
- }
-
- memcpy(new_mt, f->mt, min(old_rules, rules) * sizeof(*new_mt));
- if (rules > old_rules) {
- memset(new_mt + old_rules, 0,
- (rules - old_rules) * sizeof(*new_mt));
+ return err;
}
if (new_lt) {
f->bsize = new_bucket_size;
- NFT_PIPAPO_LT_ASSIGN(f, new_lt);
+ f->lt = new_lt;
kvfree(old_lt);
}
- f->mt = new_mt;
- kvfree(old_mt);
-
return 0;
}
@@ -843,8 +904,8 @@ static void pipapo_lt_8b_to_4b(int old_groups, int bsize,
*/
static void pipapo_lt_bits_adjust(struct nft_pipapo_field *f)
{
+ unsigned int groups, bb;
unsigned long *new_lt;
- int groups, bb;
size_t lt_size;
lt_size = f->groups * NFT_PIPAPO_BUCKETS(f->bb) * f->bsize *
@@ -894,7 +955,7 @@ static void pipapo_lt_bits_adjust(struct nft_pipapo_field *f)
f->groups = groups;
f->bb = bb;
kvfree(f->lt);
- NFT_PIPAPO_LT_ASSIGN(f, new_lt);
+ f->lt = new_lt;
}
/**
@@ -911,7 +972,7 @@ static void pipapo_lt_bits_adjust(struct nft_pipapo_field *f)
static int pipapo_insert(struct nft_pipapo_field *f, const uint8_t *k,
int mask_bits)
{
- int rule = f->rules, group, ret, bit_offset = 0;
+ unsigned int rule = f->rules, group, ret, bit_offset = 0;
ret = pipapo_resize(f, f->rules, f->rules + 1);
if (ret)
@@ -1216,7 +1277,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
else
end = start;
- dup = pipapo_get(net, set, start, genmask, tstamp);
+ dup = pipapo_get(net, set, start, genmask, tstamp, GFP_KERNEL);
if (!IS_ERR(dup)) {
/* Check if we already have the same exact entry */
const struct nft_data *dup_key, *dup_end;
@@ -1238,7 +1299,8 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
if (PTR_ERR(dup) == -ENOENT) {
/* Look for partially overlapping entries */
- dup = pipapo_get(net, set, end, nft_genmask_next(net), tstamp);
+ dup = pipapo_get(net, set, end, nft_genmask_next(net), tstamp,
+ GFP_KERNEL);
}
if (PTR_ERR(dup) != -ENOENT) {
@@ -1251,8 +1313,14 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
/* Validate */
start_p = start;
end_p = end;
+
+ /* some helpers return -1, or 0 >= for valid rule pos,
+ * so we cannot support more than INT_MAX rules at this time.
+ */
+ BUILD_BUG_ON(NFT_PIPAPO_RULE0_MAX > INT_MAX);
+
nft_pipapo_for_each_field(f, i, m) {
- if (f->rules >= (unsigned long)NFT_PIPAPO_RULE0_MAX)
+ if (f->rules >= NFT_PIPAPO_RULE0_MAX)
return -ENOSPC;
if (memcmp(start_p, end_p,
@@ -1358,18 +1426,25 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
if (!new_lt)
goto out_lt;
- NFT_PIPAPO_LT_ASSIGN(dst, new_lt);
+ dst->lt = new_lt;
memcpy(NFT_PIPAPO_LT_ALIGN(new_lt),
NFT_PIPAPO_LT_ALIGN(src->lt),
src->bsize * sizeof(*dst->lt) *
src->groups * NFT_PIPAPO_BUCKETS(src->bb));
- dst->mt = kvmalloc(src->rules * sizeof(*src->mt), GFP_KERNEL);
- if (!dst->mt)
- goto out_mt;
+ if (src->rules > 0) {
+ dst->mt = kvmalloc_array(src->rules_alloc,
+ sizeof(*src->mt), GFP_KERNEL);
+ if (!dst->mt)
+ goto out_mt;
+
+ memcpy(dst->mt, src->mt, src->rules * sizeof(*src->mt));
+ } else {
+ dst->mt = NULL;
+ dst->rules_alloc = 0;
+ }
- memcpy(dst->mt, src->mt, src->rules * sizeof(*src->mt));
src++;
dst++;
}
@@ -1423,10 +1498,10 @@ out_scratch:
*
* Return: Number of rules that originated from the same entry as @first.
*/
-static int pipapo_rules_same_key(struct nft_pipapo_field *f, int first)
+static unsigned int pipapo_rules_same_key(struct nft_pipapo_field *f, unsigned int first)
{
struct nft_pipapo_elem *e = NULL; /* Keep gcc happy */
- int r;
+ unsigned int r;
for (r = first; r < f->rules; r++) {
if (r != first && e != f->mt[r].e)
@@ -1479,8 +1554,9 @@ static int pipapo_rules_same_key(struct nft_pipapo_field *f, int first)
* 0 1 2
* element pointers: 0x42 0x42 0x44
*/
-static void pipapo_unmap(union nft_pipapo_map_bucket *mt, int rules,
- int start, int n, int to_offset, bool is_last)
+static void pipapo_unmap(union nft_pipapo_map_bucket *mt, unsigned int rules,
+ unsigned int start, unsigned int n,
+ unsigned int to_offset, bool is_last)
{
int i;
@@ -1586,8 +1662,8 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
{
struct nft_pipapo *priv = nft_set_priv(set);
struct net *net = read_pnet(&set->net);
+ unsigned int rules_f0, first_rule = 0;
u64 tstamp = nft_net_tstamp(net);
- int rules_f0, first_rule = 0;
struct nft_pipapo_elem *e;
struct nft_trans_gc *gc;
@@ -1597,8 +1673,8 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
while ((rules_f0 = pipapo_rules_same_key(m->f, first_rule))) {
union nft_pipapo_map_bucket rulemap[NFT_PIPAPO_MAX_FIELDS];
- struct nft_pipapo_field *f;
- int i, start, rules_fx;
+ const struct nft_pipapo_field *f;
+ unsigned int i, start, rules_fx;
start = first_rule;
rules_fx = rules_f0;
@@ -1792,7 +1868,8 @@ static void *pipapo_deactivate(const struct net *net, const struct nft_set *set,
{
struct nft_pipapo_elem *e;
- e = pipapo_get(net, set, data, nft_genmask_next(net), nft_net_tstamp(net));
+ e = pipapo_get(net, set, data, nft_genmask_next(net),
+ nft_net_tstamp(net), GFP_KERNEL);
if (IS_ERR(e))
return NULL;
@@ -1976,7 +2053,7 @@ static void nft_pipapo_remove(const struct net *net, const struct nft_set *set,
{
struct nft_pipapo *priv = nft_set_priv(set);
struct nft_pipapo_match *m = priv->clone;
- int rules_f0, first_rule = 0;
+ unsigned int rules_f0, first_rule = 0;
struct nft_pipapo_elem *e;
const u8 *data;
@@ -2039,9 +2116,9 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
{
struct nft_pipapo *priv = nft_set_priv(set);
struct net *net = read_pnet(&set->net);
- struct nft_pipapo_match *m;
- struct nft_pipapo_field *f;
- int i, r;
+ const struct nft_pipapo_match *m;
+ const struct nft_pipapo_field *f;
+ unsigned int i, r;
rcu_read_lock();
if (iter->genmask == nft_genmask_cur(net))
@@ -2145,6 +2222,9 @@ static int nft_pipapo_init(const struct nft_set *set,
field_count = desc->field_count ? : 1;
+ BUILD_BUG_ON(NFT_PIPAPO_MAX_FIELDS > 255);
+ BUILD_BUG_ON(NFT_PIPAPO_MAX_FIELDS != NFT_REG32_COUNT);
+
if (field_count > NFT_PIPAPO_MAX_FIELDS)
return -EINVAL;
@@ -2166,7 +2246,11 @@ static int nft_pipapo_init(const struct nft_set *set,
rcu_head_init(&m->rcu);
nft_pipapo_for_each_field(f, i, m) {
- int len = desc->field_len[i] ? : set->klen;
+ unsigned int len = desc->field_len[i] ? : set->klen;
+
+ /* f->groups is u8 */
+ BUILD_BUG_ON((NFT_PIPAPO_MAX_BYTES *
+ BITS_PER_BYTE / NFT_PIPAPO_GROUP_BITS_LARGE_SET) >= 256);
f->bb = NFT_PIPAPO_GROUP_BITS_INIT;
f->groups = len * NFT_PIPAPO_GROUPS_PER_BYTE(f);
@@ -2175,7 +2259,8 @@ static int nft_pipapo_init(const struct nft_set *set,
f->bsize = 0;
f->rules = 0;
- NFT_PIPAPO_LT_ASSIGN(f, NULL);
+ f->rules_alloc = 0;
+ f->lt = NULL;
f->mt = NULL;
}
@@ -2211,7 +2296,7 @@ static void nft_set_pipapo_match_destroy(const struct nft_ctx *ctx,
struct nft_pipapo_match *m)
{
struct nft_pipapo_field *f;
- int i, r;
+ unsigned int i, r;
for (i = 0, f = m->f; i < m->field_count - 1; i++, f++)
;
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index 3842c7341a9f..24cd1ff73f98 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -70,15 +70,9 @@
#define NFT_PIPAPO_ALIGN_HEADROOM \
(NFT_PIPAPO_ALIGN - ARCH_KMALLOC_MINALIGN)
#define NFT_PIPAPO_LT_ALIGN(lt) (PTR_ALIGN((lt), NFT_PIPAPO_ALIGN))
-#define NFT_PIPAPO_LT_ASSIGN(field, x) \
- do { \
- (field)->lt_aligned = NFT_PIPAPO_LT_ALIGN(x); \
- (field)->lt = (x); \
- } while (0)
#else
#define NFT_PIPAPO_ALIGN_HEADROOM 0
#define NFT_PIPAPO_LT_ALIGN(lt) (lt)
-#define NFT_PIPAPO_LT_ASSIGN(field, x) ((field)->lt = (x))
#endif /* NFT_PIPAPO_ALIGN */
#define nft_pipapo_for_each_field(field, index, match) \
@@ -110,22 +104,20 @@ union nft_pipapo_map_bucket {
/**
* struct nft_pipapo_field - Lookup, mapping tables and related data for a field
- * @groups: Amount of bit groups
* @rules: Number of inserted rules
* @bsize: Size of each bucket in lookup table, in longs
+ * @rules_alloc: Number of allocated rules, always >= rules
+ * @groups: Amount of bit groups
* @bb: Number of bits grouped together in lookup table buckets
* @lt: Lookup table: 'groups' rows of buckets
- * @lt_aligned: Version of @lt aligned to NFT_PIPAPO_ALIGN bytes
* @mt: Mapping table: one bucket per rule
*/
struct nft_pipapo_field {
- int groups;
- unsigned long rules;
- size_t bsize;
- int bb;
-#ifdef NFT_PIPAPO_ALIGN
- unsigned long *lt_aligned;
-#endif
+ unsigned int rules;
+ unsigned int bsize;
+ unsigned int rules_alloc;
+ u8 groups;
+ u8 bb;
unsigned long *lt;
union nft_pipapo_map_bucket *mt;
};
@@ -145,15 +137,15 @@ struct nft_pipapo_scratch {
/**
* struct nft_pipapo_match - Data used for lookup and matching
* @field_count: Amount of fields in set
- * @scratch: Preallocated per-CPU maps for partial matching results
* @bsize_max: Maximum lookup table bucket size of all fields, in longs
+ * @scratch: Preallocated per-CPU maps for partial matching results
* @rcu: Matching data is swapped on commits
* @f: Fields, with lookup and mapping tables
*/
struct nft_pipapo_match {
- int field_count;
+ u8 field_count;
+ unsigned int bsize_max;
struct nft_pipapo_scratch * __percpu *scratch;
- size_t bsize_max;
struct rcu_head rcu;
struct nft_pipapo_field f[] __counted_by(field_count);
};
@@ -186,8 +178,9 @@ struct nft_pipapo_elem {
struct nft_set_ext ext;
};
-int pipapo_refill(unsigned long *map, int len, int rules, unsigned long *dst,
- union nft_pipapo_map_bucket *mt, bool match_only);
+int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules,
+ unsigned long *dst,
+ const union nft_pipapo_map_bucket *mt, bool match_only);
/**
* pipapo_and_field_buckets_4bit() - Intersect 4-bit buckets
@@ -195,7 +188,7 @@ int pipapo_refill(unsigned long *map, int len, int rules, unsigned long *dst,
* @dst: Area to store result
* @data: Input data selecting table buckets
*/
-static inline void pipapo_and_field_buckets_4bit(struct nft_pipapo_field *f,
+static inline void pipapo_and_field_buckets_4bit(const struct nft_pipapo_field *f,
unsigned long *dst,
const u8 *data)
{
@@ -223,7 +216,7 @@ static inline void pipapo_and_field_buckets_4bit(struct nft_pipapo_field *f,
* @dst: Area to store result
* @data: Input data selecting table buckets
*/
-static inline void pipapo_and_field_buckets_8bit(struct nft_pipapo_field *f,
+static inline void pipapo_and_field_buckets_8bit(const struct nft_pipapo_field *f,
unsigned long *dst,
const u8 *data)
{
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index a3a8ddca9918..d08407d589ea 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -212,8 +212,9 @@ static int nft_pipapo_avx2_refill(int offset, unsigned long *map,
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_4b_2(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
int i, ret = -1, m256_size = f->bsize / NFT_PIPAPO_LONGS_PER_M256, b;
u8 pg[2] = { pkt[0] >> 4, pkt[0] & 0xf };
@@ -274,8 +275,9 @@ nothing:
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_4b_4(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
int i, ret = -1, m256_size = f->bsize / NFT_PIPAPO_LONGS_PER_M256, b;
u8 pg[4] = { pkt[0] >> 4, pkt[0] & 0xf, pkt[1] >> 4, pkt[1] & 0xf };
@@ -350,8 +352,9 @@ nothing:
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_4b_8(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
u8 pg[8] = { pkt[0] >> 4, pkt[0] & 0xf, pkt[1] >> 4, pkt[1] & 0xf,
pkt[2] >> 4, pkt[2] & 0xf, pkt[3] >> 4, pkt[3] & 0xf,
@@ -445,8 +448,9 @@ nothing:
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_4b_12(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
u8 pg[12] = { pkt[0] >> 4, pkt[0] & 0xf, pkt[1] >> 4, pkt[1] & 0xf,
pkt[2] >> 4, pkt[2] & 0xf, pkt[3] >> 4, pkt[3] & 0xf,
@@ -534,8 +538,9 @@ nothing:
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_4b_32(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
u8 pg[32] = { pkt[0] >> 4, pkt[0] & 0xf, pkt[1] >> 4, pkt[1] & 0xf,
pkt[2] >> 4, pkt[2] & 0xf, pkt[3] >> 4, pkt[3] & 0xf,
@@ -669,8 +674,9 @@ nothing:
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_8b_1(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
int i, ret = -1, m256_size = f->bsize / NFT_PIPAPO_LONGS_PER_M256, b;
unsigned long *lt = f->lt, bsize = f->bsize;
@@ -726,8 +732,9 @@ nothing:
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_8b_2(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
int i, ret = -1, m256_size = f->bsize / NFT_PIPAPO_LONGS_PER_M256, b;
unsigned long *lt = f->lt, bsize = f->bsize;
@@ -790,8 +797,9 @@ nothing:
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_8b_4(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
int i, ret = -1, m256_size = f->bsize / NFT_PIPAPO_LONGS_PER_M256, b;
unsigned long *lt = f->lt, bsize = f->bsize;
@@ -865,8 +873,9 @@ nothing:
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_8b_6(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
int i, ret = -1, m256_size = f->bsize / NFT_PIPAPO_LONGS_PER_M256, b;
unsigned long *lt = f->lt, bsize = f->bsize;
@@ -950,8 +959,9 @@ nothing:
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_8b_16(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
int i, ret = -1, m256_size = f->bsize / NFT_PIPAPO_LONGS_PER_M256, b;
unsigned long *lt = f->lt, bsize = f->bsize;
@@ -1042,8 +1052,9 @@ nothing:
* word index to be checked next (i.e. first filled word).
*/
static int nft_pipapo_avx2_lookup_slow(unsigned long *map, unsigned long *fill,
- struct nft_pipapo_field *f, int offset,
- const u8 *pkt, bool first, bool last)
+ const struct nft_pipapo_field *f,
+ int offset, const u8 *pkt,
+ bool first, bool last)
{
unsigned long bsize = f->bsize;
int i, ret = -1, b;
@@ -1119,9 +1130,9 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
struct nft_pipapo *priv = nft_set_priv(set);
struct nft_pipapo_scratch *scratch;
u8 genmask = nft_genmask_cur(net);
+ const struct nft_pipapo_match *m;
+ const struct nft_pipapo_field *f;
const u8 *rp = (const u8 *)key;
- struct nft_pipapo_match *m;
- struct nft_pipapo_field *f;
unsigned long *res, *fill;
bool map_index;
int i, ret = 0;
diff --git a/net/netfilter/utils.c b/net/netfilter/utils.c
index acef4155f0da..008419db815a 100644
--- a/net/netfilter/utils.c
+++ b/net/netfilter/utils.c
@@ -179,43 +179,6 @@ int nf_route(struct net *net, struct dst_entry **dst, struct flowi *fl,
}
EXPORT_SYMBOL_GPL(nf_route);
-static int nf_ip_reroute(struct sk_buff *skb, const struct nf_queue_entry *entry)
-{
-#ifdef CONFIG_INET
- const struct ip_rt_info *rt_info = nf_queue_entry_reroute(entry);
-
- if (entry->state.hook == NF_INET_LOCAL_OUT) {
- const struct iphdr *iph = ip_hdr(skb);
-
- if (!(iph->tos == rt_info->tos &&
- skb->mark == rt_info->mark &&
- iph->daddr == rt_info->daddr &&
- iph->saddr == rt_info->saddr))
- return ip_route_me_harder(entry->state.net, entry->state.sk,
- skb, RTN_UNSPEC);
- }
-#endif
- return 0;
-}
-
-int nf_reroute(struct sk_buff *skb, struct nf_queue_entry *entry)
-{
- const struct nf_ipv6_ops *v6ops;
- int ret = 0;
-
- switch (entry->state.pf) {
- case AF_INET:
- ret = nf_ip_reroute(skb, entry);
- break;
- case AF_INET6:
- v6ops = rcu_dereference(nf_ipv6_ops);
- if (v6ops)
- ret = v6ops->reroute(skb, entry);
- break;
- }
- return ret;
-}
-
/* Only get and check the lengths, not do any hop-by-hop stuff. */
int nf_ip6_check_hbh_len(struct sk_buff *skb, u32 *plen)
{
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 21624d68314f..da5d929c7c85 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1142,7 +1142,8 @@ void xt_compat_target_from_user(struct xt_entry_target *t, void **dstptr,
if (target->compat_from_user)
target->compat_from_user(t->data, ct->data);
else
- memcpy(t->data, ct->data, tsize - sizeof(*ct));
+ unsafe_memcpy(t->data, ct->data, tsize - sizeof(*ct),
+ /* UAPI 0-sized destination */);
tsize += off;
t->u.user.target_size = tsize;