summaryrefslogtreecommitdiff
path: root/net/netfilter/nft_set_rbtree.c
AgeCommit message (Collapse)AuthorFilesLines
2024-02-08netfilter: nft_set_rbtree: skip end interval element from gcPablo Neira Ayuso1-3/+3
rbtree lazy gc on insert might collect an end interval element that has been just added in this transactions, skip end interval elements that are not yet active. Fixes: f718863aca46 ("netfilter: nft_set_rbtree: fix overlap expiration walk") Cc: stable@vger.kernel.org Reported-by: lonial con <kongln9170@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08netfilter: nf_tables: use timestamp to check for set element timeoutPablo Neira Ayuso1-4/+7
Add a timestamp field at the beginning of the transaction, store it in the nftables per-netns area. Update set backend .insert, .deactivate and sync gc path to use the timestamp, this avoids that an element expires while control plane transaction is still unfinished. .lookup and .update, which are used from packet path, still use the current time to check if the element has expired. And .get path and dump also since this runs lockless under rcu read size lock. Then, there is async gc which also needs to check the current time since it runs asynchronously from a workqueue. Fixes: c3e1b005ed1c ("netfilter: nf_tables: add set element timeout support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-14netfilter: nft_set_rbtree: Remove unused variable nft_netYang Li1-2/+0
The code that uses nft_net has been removed, and the nft_pernet function is merely obtaining a reference to shared data through the net pointer. The content of the net pointer is not modified or changed, so both of them should be removed. silence the warning: net/netfilter/nft_set_rbtree.c:627:26: warning: variable ‘nft_net’ set but not used Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7103 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: set->ops->insert returns opaque set element in case of ↵Pablo Neira Ayuso1-5/+5
EEXIST Return struct nft_elem_priv instead of struct nft_set_ext for consistency with ("netfilter: nf_tables: expose opaque set element as struct nft_elem_priv") and to prepare the introduction of element timeout updates from control path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: shrink memory consumption of set elementsPablo Neira Ayuso1-18/+7
Instead of copying struct nft_set_elem into struct nft_trans_elem, store the pointer to the opaque set element object in the transaction. Adapt set backend API (and set backend implementations) to take the pointer to opaque set element representation whenever required. This patch deconstifies .remove() and .activate() set backend API since these modify the set element opaque object. And it also constify nft_set_elem_ext() this provides access to the nft_set_ext struct without updating the object. According to pahole on x86_64, this patch shrinks struct nft_trans_elem size from 216 to 24 bytes. This patch also reduces stack memory consumption by removing the template struct nft_set_elem object, using the opaque set element object instead such as from the set iterator API, catchall elements and the get element command. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: expose opaque set element as struct nft_elem_privPablo Neira Ayuso1-20/+26
Add placeholder structure and place it at the beginning of each struct nft_*_elem for each existing set backend, instead of exposing elements as void type to the frontend which defeats compiler type checks. Use this pointer to this new type to replace void *. This patch updates the following set backend API to use this new struct nft_elem_priv placeholder structure: - update - deactivate - flush - get as well as the following helper functions: - nft_set_elem_ext() - nft_set_elem_init() - nft_set_elem_destroy() - nf_tables_set_elem_destroy() This patch adds nft_elem_priv_cast() to cast struct nft_elem_priv to native element representation from the corresponding set backend. BUILD_BUG_ON() makes sure this .priv placeholder is always at the top of the opaque set element representation. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: set backend .flush always succeedsPablo Neira Ayuso1-3/+1
.flush is always successful since this results from iterating over the set elements to toggle mark the element as inactive in the next generation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nft_set_rbtree: prefer sync gc to async workerFlorian Westphal1-59/+65
There is no need for asynchronous garbage collection, rbtree inserts can only happen from the netlink control plane. We already perform on-demand gc on insertion, in the area of the tree where the insertion takes place, but we don't do a full tree walk there for performance reasons. Do a full gc walk at the end of the transaction instead and remove the async worker. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nft_set_rbtree: rename gc deactivate+erase functionFlorian Westphal1-5/+6
Next patch adds a cllaer that doesn't hold the priv->write lock and will need a similar function. Rename the existing function to make it clear that it can only be used for opportunistic gc during insertion. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-18netfilter: nft_set_rbtree: .deactivate fails if element has expiredPablo Neira Ayuso1-0/+2
This allows to remove an expired element which is not possible in other existing set backends, this is more noticeable if gc-interval is high so expired elements remain in the tree. On-demand gc also does not help in this case, because this is delete element path. Return NULL if element has expired. Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failureFlorian Westphal1-17/+29
nft_rbtree_gc_elem() walks back and removes the end interval element that comes before the expired element. There is a small chance that we've cached this element as 'rbe_ge'. If this happens, we hold and test a pointer that has been queued for freeing. It also causes spurious insertion failures: $ cat test-testcases-sets-0044interval_overlap_0.1/testout.log Error: Could not process rule: File exists add element t s { 0 - 2 } ^^^^^^ Failed to insert 0 - 2 given: table ip t { set s { type inet_service flags interval,timeout timeout 2s gc-interval 2s } } The set (rbtree) is empty. The 'failure' doesn't happen on next attempt. Reason is that when we try to insert, the tree may hold an expired element that collides with the range we're adding. While we do evict/erase this element, we can trip over this check: if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new)) return -ENOTEMPTY; rbe_ge was erased by the synchronous gc, we should not have done this check. Next attempt won't find it, so retry results in successful insertion. Restart in-kernel to avoid such spurious errors. Such restart are rare, unless userspace intentionally adds very large numbers of elements with very short timeouts while setting a huge gc interval. Even in this case, this cannot loop forever, on each retry an existing element has been removed. As the caller is holding the transaction mutex, its impossible for a second entity to add more expiring elements to the tree. After this it also becomes feasible to remove the async gc worker and perform all garbage collection from the commit path. Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-08netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GCPablo Neira Ayuso1-1/+1
pipapo needs to enqueue GC transactions for catchall elements through nft_trans_gc_queue_sync(). Add nft_trans_gc_catchall_sync() and nft_trans_gc_catchall_async() to handle GC transaction queueing accordingly. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-08netfilter: nft_set_rbtree: use read spinlock to avoid datapath contentionPablo Neira Ayuso1-4/+2
rbtree GC does not modify the datastructure, instead it collects expired elements and it enqueues a GC transaction. Use a read spinlock instead to avoid data contention while GC worker is running. Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-06netfilter: nft_set_rbtree: skip sync GC for new elements in this transactionPablo Neira Ayuso1-2/+6
New elements in this transaction might expired before such transaction ends. Skip sync GC for such elements otherwise commit path might walk over an already released object. Once transaction is finished, async GC will collect such expired element. Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-23netfilter: nf_tables: defer gc run if previous batch is still pendingFlorian Westphal1-0/+3
Don't queue more gc work, else we may queue the same elements multiple times. If an element is flagged as dead, this can mean that either the previous gc request was invalidated/discarded by a transaction or that the previous request is still pending in the system work queue. The latter will happen if the gc interval is set to a very low value, e.g. 1ms, and system work queue is backlogged. The sets refcount is 1 if no previous gc requeusts are queued, so add a helper for this and skip gc run if old requests are pending. Add a helper for this and skip the gc run in this case. Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-10netfilter: nf_tables: adapt set backend to use GC transaction APIPablo Neira Ayuso1-57/+87
Use the GC transaction API to replace the old and buggy gc API and the busy mark approach. No set elements are removed from async garbage collection anymore, instead the _DEAD bit is set on so the set element is not visible from lookup path anymore. Async GC enqueues transaction work that might be aborted and retried later. rbtree and pipapo set backends does not set on the _DEAD bit from the sync GC path since this runs in control plane path where mutex is held. In this case, set elements are deactivated, removed and then released via RCU callback, sync GC never fails. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-09netfilter: nf_tables: don't skip expired elements during walkFlorian Westphal1-2/+0
There is an asymmetry between commit/abort and preparation phase if the following conditions are met: 1. set is a verdict map ("1.2.3.4 : jump foo") 2. timeouts are enabled In this case, following sequence is problematic: 1. element E in set S refers to chain C 2. userspace requests removal of set S 3. kernel does a set walk to decrement chain->use count for all elements from preparation phase 4. kernel does another set walk to remove elements from the commit phase (or another walk to do a chain->use increment for all elements from abort phase) If E has already expired in 1), it will be ignored during list walk, so its use count won't have been changed. Then, when set is culled, ->destroy callback will zap the element via nf_tables_set_elem_destroy(), but this function is only safe for elements that have been deactivated earlier from the preparation phase: lack of earlier deactivate removes the element but leaks the chain use count, which results in a WARN splat when the chain gets removed later, plus a leak of the nft_chain structure. Update pipapo_get() not to skip expired elements, otherwise flush command reports bogus ENOENT errors. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-26netfilter: nft_set_rbtree: fix overlap expiration walkFlorian Westphal1-6/+14
The lazy gc on insert that should remove timed-out entries fails to release the other half of the interval, if any. Can be reproduced with tests/shell/testcases/sets/0044interval_overlap_0 in nftables.git and kmemleak enabled kernel. Second bug is the use of rbe_prev vs. prev pointer. If rbe_prev() returns NULL after at least one iteration, rbe_prev points to element that is not an end interval, hence it should not be removed. Lastly, check the genmask of the end interval if this is active in the current generation. Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal <fw@strlen.de>
2023-06-20netfilter: nf_tables: drop map element references from preparation phasePablo Neira Ayuso1-2/+3
set .destroy callback releases the references to other objects in maps. This is very late and it results in spurious EBUSY errors. Drop refcount from the preparation phase instead, update set backend not to drop reference counter from set .destroy path. Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the reference counter because the transaction abort path releases the map references for each element since the set is unbound. The abort path also deals with releasing reference counter for new elements added to unbound sets. Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-05-17netfilter: nft_set_rbtree: fix null deref on element insertionFlorian Westphal1-7/+13
There is no guarantee that rb_prev() will not return NULL in nft_rbtree_gc_elem(): general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] nft_add_set_elem+0x14b0/0x2990 nf_tables_newsetelem+0x528/0xb30 Furthermore, there is a possible use-after-free while iterating, 'node' can be free'd so we need to cache the next value to use. Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-23netfilter: nft_set_rbtree: skip elements in transaction from garbage collectionPablo Neira Ayuso1-1/+15
Skip interference with an ongoing transaction, do not perform garbage collection on inactive elements. Reset annotated previous end interval if the expired element is marked as busy (control plane removed the element right before expiration). Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-23netfilter: nft_set_rbtree: Switch to node list walk for overlap detectionPablo Neira Ayuso1-127/+189
...instead of a tree descent, which became overly complicated in an attempt to cover cases where expired or inactive elements would affect comparisons with the new element being inserted. Further, it turned out that it's probably impossible to cover all those cases, as inactive nodes might entirely hide subtrees consisting of a complete interval plus a node that makes the current insertion not overlap. To speed up the overlap check, descent the tree to find a greater element that is closer to the key value to insert. Then walk down the node list for overlap detection. Starting the overlap check from rb_first() unconditionally is slow, it takes 10 times longer due to the full linear traversal of the list. Moreover, perform garbage collection of expired elements when walking down the node list to avoid bogus overlap reports. For the insertion operation itself, this essentially reverts back to the implementation before commit 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion"), except that cases of complete overlap are already handled in the overlap detection phase itself, which slightly simplifies the loop to find the insertion point. Based on initial patch from Stefano Brivio, including text from the original patch description too. Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-22netfilter: nft_set_rbtree: overlap detection with element re-addition after ↵Pablo Neira Ayuso1-1/+5
deletion This patch fixes spurious EEXIST errors. Extend d2df92e98a34 ("netfilter: nft_set_rbtree: handle element re-addition after deletion") to deal with elements with same end flags in the same transation. Reset the overlap flag as described by 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion"). Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Fixes: d2df92e98a34 ("netfilter: nft_set_rbtree: handle element re-addition after deletion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: nf_tables: prefer direct calls for set lookupsFlorian Westphal1-2/+3
Extend nft_set_do_lookup() to use direct calls when retpoline feature is enabled. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27netfilter: nftables: add catch-all set element supportPablo Neira Ayuso1-0/+6
This patch extends the set infrastructure to add a special catch-all set element. If the lookup fails to find an element (or range) in the set, then the catch-all element is selected. Users can specify a mapping, expression(s) and timeout to be attached to the catch-all element. This patch adds a catchall list to the set, this list might contain more than one single catch-all element (e.g. in case that the catch-all element is removed and a new one is added in the same transaction). However, most of the time, there will be either one element or no elements at all in this list. The catch-all element is identified via NFT_SET_ELEM_CATCHALL flag and such special element has no NFTA_SET_ELEM_KEY attribute. There is a new nft_set_elem_catchall object that stores a reference to the dummy catch-all element (catchall->elem) whose layout is the same of the set element type to reuse the existing set element codebase. The set size does not apply to the catch-all element, users can define a catch-all element even if the set is full. The check for valid set element flags hava been updates to report EOPNOTSUPP in case userspace requests flags that are not supported when using new userspace nftables and old kernel. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-21netfilter: nft_set_rbtree: Detect partial overlap with start endpoint matchStefano Brivio1-1/+33
Getting creative with nft and omitting the interval_overlap() check from the set_overlap() function, without omitting set_overlap() altogether, led to the observation of a partial overlap that wasn't detected, and would actually result in replacement of the end element of an existing interval. This is due to the fact that we'll return -EEXIST on a matching, pre-existing start element, instead of -ENOTEMPTY, and the error is cleared by API if NLM_F_EXCL is not given. At this point, we can insert a matching start, and duplicate the end element as long as we don't end up into other intervals. For instance, inserting interval 0 - 2 with an existing 0 - 3 interval would result in a single 0 - 2 interval, and a dangling '3' end element. This is because nft will proceed after inserting the '0' start element as no error is reported, and no further conflicting intervals are detected on insertion of the end element. This needs a different approach as it's a local condition that can be detected by looking for duplicate ends coming from left and right, separately. Track those and directly report -ENOTEMPTY on duplicated end elements for a matching start. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-21netfilter: nft_set_rbtree: Handle outcomes of tree rotations in overlap ↵Stefano Brivio1-9/+14
detection Checks for partial overlaps on insertion assume that end elements are always descendant nodes of their corresponding start, because they are inserted later. However, this is not the case if a previous delete operation caused a tree rotation as part of rebalancing. Taking the issue reported by Andreas Fischer as an example, if we omit delete operations, the existing procedure works because, equivalently, we are inserting a start item with value 40 in the this region of the red-black tree with single-sized intervals: overlap flag 10 (start) / \ false 20 (start) / \ false 30 (start) / \ false 60 (start) / \ false 50 (end) / \ false 20 (end) / \ false 40 (start) if we now delete interval 30 - 30, the tree can be rearranged in a way similar to this (note the rotation involving 50 - 50): overlap flag 10 (start) / \ false 20 (start) / \ false 25 (start) / \ false 70 (start) / \ false 50 (end) / \ true (from rule a1.) 50 (start) / \ true 40 (start) and we traverse interval 50 - 50 from the opposite direction compared to what was expected. To deal with those cases, add a start-before-start rule, b4., that covers traversal of existing intervals from the right. We now need to restrict start-after-end rule b3. to cases where there are no occurring nodes between existing start and end elements, because addition of rule b4. isn't sufficient to ensure that the pre-existing end element we encounter while descending the tree corresponds to a start element of an interval that we already traversed entirely. Different types of overlap detection on trees with rotations resulting from re-balancing will be covered by nft test case sets/0044interval_overlap_1. Reported-by: Andreas Fischer <netfilter@d9c.eu> Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1449 Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-29netfilter: nft_set_rbtree: Use sequence counter with associated rwlockAhmed S. Darwish1-2/+2
A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_rwlock_t data type, which allows to associate a rwlock with the sequence counter. This enables lockdep to verify that the rwlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-16-a.darwish@linutronix.de
2020-06-08netfilter: nft_set_rbtree: Don't account for expired elements on insertionStefano Brivio1-7/+14
While checking the validity of insertion in __nft_rbtree_insert(), we currently ignore conflicting elements and intervals only if they are not active within the next generation. However, if we consider expired elements and intervals as potentially conflicting and overlapping, we'll return error for entries that should be added instead. This is particularly visible with garbage collection intervals that are comparable with the element timeout itself, as reported by Mike Dillinger. Other than the simple issue of denying insertion of valid entries, this might also result in insertion of a single element (opening or closing) out of a given interval. With single entries (that are inserted as intervals of size 1), this leads in turn to the creation of new intervals. For example: # nft add element t s { 192.0.2.1 } # nft list ruleset [...] elements = { 192.0.2.1-255.255.255.255 } Always ignore expired elements active in the next generation, while checking for conflicts. It might be more convenient to introduce a new macro that covers both inactive and expired items, as this type of check also appears quite frequently in other set back-ends. This is however beyond the scope of this fix and can be deferred to a separate patch. Other than the overlap detection cases introduced by commit 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion"), we also have to cover the original conflict check dealing with conflicts between two intervals of size 1, which was introduced before support for timeout was introduced. This won't return an error to the user as -EEXIST is masked by nft if NLM_F_EXCL is not given, but would result in a silent failure adding the entry. Reported-by: Mike Dillinger <miked@softtalker.com> Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-12netfilter: nft_set_rbtree: Add missing expired checksPhil Sutter1-0/+11
Expired intervals would still match and be dumped to user space until garbage collection wiped them out. Make sure they stop matching and disappear (from users' perspective) as soon as they expire. Fixes: 8d8540c4f5e03 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-06netfilter: nft_set_rbtree: Drop spurious condition for overlap detection on ↵Stefano Brivio1-12/+11
insertion Case a1. for overlap detection in __nft_rbtree_insert() is not a valid one: start-after-start is not needed to detect any type of interval overlap and it actually results in a false positive if, while descending the tree, this is the only step we hit after starting from the root. This introduced a regression, as reported by Pablo, in Python tests cases ip/ip.t and ip/numgen.t: ip/ip.t: ERROR: line 124: add rule ip test-ip4 input ip hdrlength vmap { 0-4 : drop, 5 : accept, 6 : continue } counter: This rule should not have failed. ip/numgen.t: ERROR: line 7: add rule ip test-ip4 pre dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200}: This rule should not have failed. Drop case a1. and renumber others, so that they are a bit clearer. In order for these diagrams to be readily understandable, a bigger rework is probably needed, such as an ASCII art of the actual rbtree (instead of a flattened version). Shell script test sets/0044interval_overlap_0 should cover all possible cases for false negatives, so I consider that test case still sufficient after this change. v2: Fix comments for cases a3. and b3. Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-9/+78
Overlapping header include additions in macsec.c A bug fix in 'net' overlapping with the removal of 'version' string in ena_netdev.c Overlapping test additions in selftests Makefile Overlapping PCI ID table adjustments in iwlwifi driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24netfilter: nft_set_rbtree: Detect partial overlaps on insertionStefano Brivio1-3/+67
...and return -ENOTEMPTY to the front-end in this case, instead of proceeding. Currently, nft takes care of checking for these cases and not sending them to the kernel, but if we drop the set_overlap() call in nft we can end up in situations like: # nft add table t # nft add set t s '{ type inet_service ; flags interval ; }' # nft add element t s '{ 1 - 5 }' # nft add element t s '{ 6 - 10 }' # nft add element t s '{ 4 - 7 }' # nft list set t s table ip t { set s { type inet_service flags interval elements = { 1-3, 4-5, 6-7 } } } This change has the primary purpose of making the behaviour consistent with nft_set_pipapo, but is also functional to avoid inconsistent behaviour if userspace sends overlapping elements for any reason. v2: When we meet the same key data in the tree, as start element while inserting an end element, or as end element while inserting a start element, actually check that the existing element is active, before resetting the overlap flag (Pablo Neira Ayuso) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()Stefano Brivio1-6/+11
Replace negations of nft_rbtree_interval_end() with a new helper, nft_rbtree_interval_start(), wherever this helps to visualise the problem at hand, that is, for all the occurrences except for the comparison against given flags in __nft_rbtree_get(). This gets especially useful in the next patch. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15netfilter: nf_tables: make all set structs constFlorian Westphal1-2/+1
They do not need to be writeable anymore. v2: remove left-over __read_mostly annotation in set_pipapo.c (Stefano) Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27netfilter: nf_tables: Support for sets with multiple ranged fieldsStefano Brivio1-0/+3
Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used to specify the length of each field in a set concatenation. This allows set implementations to support concatenation of multiple ranged items, as they can divide the input key into matching data for every single field. Such set implementations would be selected as they specify support for NFT_SET_INTERVAL and allow desc->field_count to be greater than one. Explicitly disallow this for nft_set_rbtree. In order to specify the interval for a set entry, userspace would include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass range endpoints as two separate keys, represented by attributes NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END. While at it, export the number of 32-bit registers available for packet matching, as nftables will need this to know the maximum number of field lengths that can be specified. For example, "packets with an IPv4 address between 192.0.2.0 and 192.0.2.42, with destination port between 22 and 25", can be expressed as two concatenated elements: NFTA_SET_ELEM_KEY: 192.0.2.0 . 22 NFTA_SET_ELEM_KEY_END: 192.0.2.42 . 25 and NFTA_SET_DESC_CONCAT attribute would contain: NFTA_LIST_ELEM NFTA_SET_FIELD_LEN: 4 NFTA_LIST_ELEM NFTA_SET_FIELD_LEN: 2 v4: No changes v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY v2: No changes Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-09netfilter: nft_set_rbtree: bogus lookup/get on consecutive elements in named ↵Pablo Neira Ayuso1-5/+16
sets The existing rbtree implementation might store consecutive elements where the closing element and the opening element might overlap, eg. [ a, a+1) [ a+1, a+2) This patch removes the optimization for non-anonymous sets in the exact matching case, where it is assumed to stop searching in case that the closing element is found. Instead, invalidate candidate interval and keep looking further in the tree. The lookup/get operation might return false, while there is an element in the rbtree. Moreover, the get operation returns true as if a+2 would be in the tree. This happens with named sets after several set updates. The existing lookup optimization (that only works for the anonymous sets) might not reach the opening [ a+1,... element if the closing ...,a+1) is found in first place when walking over the rbtree. Hence, walking the full tree in that case is needed. This patch fixes the lookup and get operations. Fixes: e701001e7cbe ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates") Fixes: ba0e4d9917b4 ("netfilter: nf_tables: get set elements via netlink") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-13netfilter: nf_tables: add missing prototypes.Valdis Klētnieks1-1/+1
Sparse rightly complains about undeclared symbols. CHECK net/netfilter/nft_set_hash.c net/netfilter/nft_set_hash.c:647:21: warning: symbol 'nft_set_rhash_type' was not declared. Should it be static? net/netfilter/nft_set_hash.c:670:21: warning: symbol 'nft_set_hash_type' was not declared. Should it be static? net/netfilter/nft_set_hash.c:690:21: warning: symbol 'nft_set_hash_fast_type' was not declared. Should it be static? CHECK net/netfilter/nft_set_bitmap.c net/netfilter/nft_set_bitmap.c:296:21: warning: symbol 'nft_set_bitmap_type' was not declared. Should it be static? CHECK net/netfilter/nft_set_rbtree.c net/netfilter/nft_set_rbtree.c:470:21: warning: symbol 'nft_set_rbtree_type' was not declared. Should it be static? Include nf_tables_core.h rather than nf_tables.h to pick up the additional definitions. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner1-4/+1
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-18netfilter: nft_set_rbtree: check for inactive element after flag mismatchPablo Neira Ayuso1-4/+3
Otherwise, we hit bogus ENOENT when removing elements. Fixes: e701001e7cbe ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates") Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-11netfilter: nft_set_rbtree: allow loose matching of closing element in intervalPablo Neira Ayuso1-2/+8
Allow to find closest matching for the right side of an interval (end flag set on) so we allow lookups in inner ranges, eg. 10-20 in 5-25. Fixes: ba0e4d9917b4 ("netfilter: nf_tables: get set elements via netlink") Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-28netfilter: nft_set_rbtree: add missing rb_erase() in GC routineTaehee Yoo1-14/+14
The nft_set_gc_batch_check() checks whether gc buffer is full. If gc buffer is full, gc buffer is released by the nft_set_gc_batch_complete() internally. In case of rbtree, the rb_erase() should be called before calling the nft_set_gc_batch_complete(). therefore the rb_erase() should be called before calling the nft_set_gc_batch_check() too. test commands: table ip filter { set set1 { type ipv4_addr; flags interval, timeout; gc-interval 10s; timeout 1s; elements = { 1-2, 3-4, 5-6, ... 10000-10001, } } } %nft -f test.nft splat looks like: [ 430.273885] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 430.282158] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 430.283116] CPU: 1 PID: 190 Comm: kworker/1:2 Tainted: G B 4.18.0+ #7 [ 430.283116] Workqueue: events_power_efficient nft_rbtree_gc [nf_tables_set] [ 430.313559] RIP: 0010:rb_next+0x81/0x130 [ 430.313559] Code: 08 49 bd 00 00 00 00 00 fc ff df 48 bb 00 00 00 00 00 fc ff df 48 85 c0 75 05 eb 58 48 89 d4 [ 430.313559] RSP: 0018:ffff88010cdb7680 EFLAGS: 00010207 [ 430.313559] RAX: 0000000000b84854 RBX: dffffc0000000000 RCX: ffffffff83f01973 [ 430.313559] RDX: 000000000017090c RSI: 0000000000000008 RDI: 0000000000b84864 [ 430.313559] RBP: ffff8801060d4588 R08: fffffbfff09bc349 R09: fffffbfff09bc349 [ 430.313559] R10: 0000000000000001 R11: fffffbfff09bc348 R12: ffff880100f081a8 [ 430.313559] R13: dffffc0000000000 R14: ffff880100ff8688 R15: dffffc0000000000 [ 430.313559] FS: 0000000000000000(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000 [ 430.313559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 430.313559] CR2: 0000000001551008 CR3: 000000005dc16000 CR4: 00000000001006e0 [ 430.313559] Call Trace: [ 430.313559] nft_rbtree_gc+0x112/0x5c0 [nf_tables_set] [ 430.313559] process_one_work+0xc13/0x1ec0 [ 430.313559] ? _raw_spin_unlock_irq+0x29/0x40 [ 430.313559] ? pwq_dec_nr_in_flight+0x3c0/0x3c0 [ 430.313559] ? set_load_weight+0x270/0x270 [ 430.313559] ? __switch_to_asm+0x34/0x70 [ 430.313559] ? __switch_to_asm+0x40/0x70 [ 430.313559] ? __switch_to_asm+0x34/0x70 [ 430.313559] ? __switch_to_asm+0x34/0x70 [ 430.313559] ? __switch_to_asm+0x40/0x70 [ 430.313559] ? __switch_to_asm+0x34/0x70 [ 430.313559] ? __switch_to_asm+0x40/0x70 [ 430.313559] ? __switch_to_asm+0x34/0x70 [ 430.313559] ? __switch_to_asm+0x34/0x70 [ 430.313559] ? __switch_to_asm+0x40/0x70 [ 430.313559] ? __switch_to_asm+0x34/0x70 [ 430.313559] ? __schedule+0x6d3/0x1f50 [ 430.313559] ? find_held_lock+0x39/0x1c0 [ 430.313559] ? __sched_text_start+0x8/0x8 [ 430.313559] ? cyc2ns_read_end+0x10/0x10 [ 430.313559] ? save_trace+0x300/0x300 [ 430.313559] ? sched_clock_local+0xd4/0x140 [ 430.313559] ? find_held_lock+0x39/0x1c0 [ 430.313559] ? worker_thread+0x353/0x1120 [ 430.313559] ? worker_thread+0x353/0x1120 [ 430.313559] ? lock_contended+0xe70/0xe70 [ 430.313559] ? __lock_acquire+0x4500/0x4500 [ 430.535635] ? do_raw_spin_unlock+0xa5/0x330 [ 430.535635] ? do_raw_spin_trylock+0x101/0x1a0 [ 430.535635] ? do_raw_spin_lock+0x1f0/0x1f0 [ 430.535635] ? _raw_spin_lock_irq+0x10/0x70 [ 430.535635] worker_thread+0x15d/0x1120 [ ... ] Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: nft_set: fix allocation size overflow in privsize callback.Taehee Yoo1-2/+2
In order to determine allocation size of set, ->privsize is invoked. At this point, both desc->size and size of each data structure of set are used. desc->size means number of element that is given by user. desc->size is u32 type. so that upperlimit of set element is 4294967295. but return type of ->privsize is also u32. hence overflow can occurred. test commands: %nft add table ip filter %nft add set ip filter hash1 { type ipv4_addr \; size 4294967295 \; } %nft list ruleset splat looks like: [ 1239.202910] kasan: CONFIG_KASAN_INLINE enabled [ 1239.208788] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 1239.217625] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 1239.219329] CPU: 0 PID: 1603 Comm: nft Not tainted 4.18.0-rc5+ #7 [ 1239.229091] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set] [ 1239.229091] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16 [ 1239.229091] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246 [ 1239.229091] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001 [ 1239.229091] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410 [ 1239.229091] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030 [ 1239.229091] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0 [ 1239.229091] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000 [ 1239.229091] FS: 00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 1239.229091] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1239.229091] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0 [ 1239.229091] Call Trace: [ 1239.229091] ? nft_hash_remove+0xf0/0xf0 [nf_tables_set] [ 1239.229091] ? memset+0x1f/0x40 [ 1239.229091] ? __nla_reserve+0x9f/0xb0 [ 1239.229091] ? memcpy+0x34/0x50 [ 1239.229091] nf_tables_dump_set+0x9a1/0xda0 [nf_tables] [ 1239.229091] ? __kmalloc_reserve.isra.29+0x2e/0xa0 [ 1239.229091] ? nft_chain_hash_obj+0x630/0x630 [nf_tables] [ 1239.229091] ? nf_tables_commit+0x2c60/0x2c60 [nf_tables] [ 1239.229091] netlink_dump+0x470/0xa20 [ 1239.229091] __netlink_dump_start+0x5ae/0x690 [ 1239.229091] nft_netlink_dump_start_rcu+0xd1/0x160 [nf_tables] [ 1239.229091] nf_tables_getsetelem+0x2e5/0x4b0 [nf_tables] [ 1239.229091] ? nft_get_set_elem+0x440/0x440 [nf_tables] [ 1239.229091] ? nft_chain_hash_obj+0x630/0x630 [nf_tables] [ 1239.229091] ? nf_tables_dump_obj_done+0x70/0x70 [nf_tables] [ 1239.229091] ? nla_parse+0xab/0x230 [ 1239.229091] ? nft_get_set_elem+0x440/0x440 [nf_tables] [ 1239.229091] nfnetlink_rcv_msg+0x7f0/0xab0 [nfnetlink] [ 1239.229091] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink] [ 1239.229091] ? debug_show_all_locks+0x290/0x290 [ 1239.229091] ? sched_clock_cpu+0x132/0x170 [ 1239.229091] ? find_held_lock+0x39/0x1b0 [ 1239.229091] ? sched_clock_local+0x10d/0x130 [ 1239.229091] netlink_rcv_skb+0x211/0x320 [ 1239.229091] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink] [ 1239.229091] ? netlink_ack+0x7b0/0x7b0 [ 1239.229091] ? ns_capable_common+0x6e/0x110 [ 1239.229091] nfnetlink_rcv+0x2d1/0x310 [nfnetlink] [ 1239.229091] ? nfnetlink_rcv_batch+0x10f0/0x10f0 [nfnetlink] [ 1239.229091] ? netlink_deliver_tap+0x829/0x930 [ 1239.229091] ? lock_acquire+0x265/0x2e0 [ 1239.229091] netlink_unicast+0x406/0x520 [ 1239.509725] ? netlink_attachskb+0x5b0/0x5b0 [ 1239.509725] ? find_held_lock+0x39/0x1b0 [ 1239.509725] netlink_sendmsg+0x987/0xa20 [ 1239.509725] ? netlink_unicast+0x520/0x520 [ 1239.509725] ? _copy_from_user+0xa9/0xc0 [ 1239.509725] __sys_sendto+0x21a/0x2c0 [ 1239.509725] ? __ia32_sys_getpeername+0xa0/0xa0 [ 1239.509725] ? retint_kernel+0x10/0x10 [ 1239.509725] ? sched_clock_cpu+0x132/0x170 [ 1239.509725] ? find_held_lock+0x39/0x1b0 [ 1239.509725] ? lock_downgrade+0x540/0x540 [ 1239.509725] ? up_read+0x1c/0x100 [ 1239.509725] ? __do_page_fault+0x763/0x970 [ 1239.509725] ? retint_user+0x18/0x18 [ 1239.509725] __x64_sys_sendto+0x177/0x180 [ 1239.509725] do_syscall_64+0xaa/0x360 [ 1239.509725] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1239.509725] RIP: 0033:0x7f5a8f468e03 [ 1239.509725] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb d0 0f 1f 84 00 00 00 00 00 83 3d 49 c9 2b 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 [ 1239.509725] RSP: 002b:00007ffd78d0b778 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 1239.509725] RAX: ffffffffffffffda RBX: 00007ffd78d0c890 RCX: 00007f5a8f468e03 [ 1239.509725] RDX: 0000000000000034 RSI: 00007ffd78d0b7e0 RDI: 0000000000000003 [ 1239.509725] RBP: 00007ffd78d0b7d0 R08: 00007f5a8f15c160 R09: 000000000000000c [ 1239.509725] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd78d0b7e0 [ 1239.509725] R13: 0000000000000034 R14: 00007f5a8f9aff60 R15: 00005648040094b0 [ 1239.509725] Modules linked in: nf_tables_set nf_tables nfnetlink ip_tables x_tables [ 1239.670713] ---[ end trace 39375adcda140f11 ]--- [ 1239.676016] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set] [ 1239.682834] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16 [ 1239.705108] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246 [ 1239.711115] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001 [ 1239.719269] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410 [ 1239.727401] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030 [ 1239.735530] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0 [ 1239.743658] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000 [ 1239.751785] FS: 00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 1239.760993] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1239.767560] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0 [ 1239.775679] Kernel panic - not syncing: Fatal exception [ 1239.776630] Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 1239.776630] Rebooting in 5 seconds.. Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nft_set_rbtree: fix panic when destroying set by GCTaehee Yoo1-2/+5
This patch fixes below. 1. check null pointer of rb_next. rb_next can return null. so null check routine should be added. 2. add rcu_barrier in destroy routine. GC uses call_rcu to remove elements. but all elements should be removed before destroying set and chains. so that rcu_barrier is added. test script: %cat test.nft table inet aa { map map1 { type ipv4_addr : verdict; flags interval, timeout; elements = { 0-1 : jump a0, 3-4 : jump a0, 6-7 : jump a0, 9-10 : jump a0, 12-13 : jump a0, 15-16 : jump a0, 18-19 : jump a0, 21-22 : jump a0, 24-25 : jump a0, 27-28 : jump a0, } timeout 1s; } chain a0 { } } flush ruleset table inet aa { map map1 { type ipv4_addr : verdict; flags interval, timeout; elements = { 0-1 : jump a0, 3-4 : jump a0, 6-7 : jump a0, 9-10 : jump a0, 12-13 : jump a0, 15-16 : jump a0, 18-19 : jump a0, 21-22 : jump a0, 24-25 : jump a0, 27-28 : jump a0, } timeout 1s; } chain a0 { } } flush ruleset splat looks like: [ 2402.419838] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 2402.428433] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 2402.429343] CPU: 1 PID: 1350 Comm: kworker/1:1 Not tainted 4.18.0-rc2+ #1 [ 2402.429343] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 03/23/2017 [ 2402.429343] Workqueue: events_power_efficient nft_rbtree_gc [nft_set_rbtree] [ 2402.429343] RIP: 0010:rb_next+0x1e/0x130 [ 2402.429343] Code: e9 de f2 ff ff 0f 1f 80 00 00 00 00 41 55 48 89 fa 41 54 55 53 48 c1 ea 03 48 b8 00 00 00 0 [ 2402.429343] RSP: 0018:ffff880105f77678 EFLAGS: 00010296 [ 2402.429343] RAX: dffffc0000000000 RBX: ffff8801143e3428 RCX: 1ffff1002287c69c [ 2402.429343] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000 [ 2402.429343] RBP: 0000000000000000 R08: ffffed0016aabc24 R09: ffffed0016aabc24 [ 2402.429343] R10: 0000000000000001 R11: ffffed0016aabc23 R12: 0000000000000000 [ 2402.429343] R13: ffff8800b6933388 R14: dffffc0000000000 R15: ffff8801143e3440 [ 2402.534486] kasan: CONFIG_KASAN_INLINE enabled [ 2402.534212] FS: 0000000000000000(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 2402.534212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2402.534212] CR2: 0000000000863008 CR3: 00000000a3c16000 CR4: 00000000001006e0 [ 2402.534212] Call Trace: [ 2402.534212] nft_rbtree_gc+0x2b5/0x5f0 [nft_set_rbtree] [ 2402.534212] process_one_work+0xc1b/0x1ee0 [ 2402.540329] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 2402.534212] ? _raw_spin_unlock_irq+0x29/0x40 [ 2402.534212] ? pwq_dec_nr_in_flight+0x3e0/0x3e0 [ 2402.534212] ? set_load_weight+0x270/0x270 [ 2402.534212] ? __schedule+0x6ea/0x1fb0 [ 2402.534212] ? __sched_text_start+0x8/0x8 [ 2402.534212] ? save_trace+0x320/0x320 [ 2402.534212] ? sched_clock_local+0xe2/0x150 [ 2402.534212] ? find_held_lock+0x39/0x1c0 [ 2402.534212] ? worker_thread+0x35f/0x1150 [ 2402.534212] ? lock_contended+0xe90/0xe90 [ 2402.534212] ? __lock_acquire+0x4520/0x4520 [ 2402.534212] ? do_raw_spin_unlock+0xb1/0x350 [ 2402.534212] ? do_raw_spin_trylock+0x111/0x1b0 [ 2402.534212] ? do_raw_spin_lock+0x1f0/0x1f0 [ 2402.534212] worker_thread+0x169/0x1150 Fixes: 8d8540c4f5e0("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-06netfilter: nf_tables: place all set backends in one single modulePablo Neira Ayuso1-18/+1
This patch disallows rbtree with single elements, which is causing problems with the recent timeout support. Before this patch, you could opt out individual set representations per module, which is just adding extra complexity. Fixes: 8d8540c4f5e0("netfilter: nft_set_rbtree: add timeout support") Reported-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-1/+1
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) Reject non-null terminated helper names from xt_CT, from Gao Feng. 2) Fix KASAN splat due to out-of-bound access from commit phase, from Alexey Kodanev. 3) Missing conntrack hook registration on IPVS FTP helper, from Julian Anastasov. 4) Incorrect skbuff allocation size in bridge nft_reject, from Taehee Yoo. 5) Fix inverted check on packet xmit to non-local addresses, also from Julian. 6) Fix ebtables alignment compat problems, from Alin Nastac. 7) Hook mask checks are not correct in xt_set, from Serhey Popovych. 8) Fix timeout listing of element in ipsets, from Jozsef. 9) Cap maximum timeout value in ipset, also from Jozsef. 10) Don't allow family option for hash:mac sets, from Florent Fourcot. 11) Restrict ebtables to work with NFPROTO_BRIDGE targets only, this Florian. 12) Another bug reported by KASAN in the rbtree set backend, from Taehee Yoo. 13) Missing __IPS_MAX_BIT update doesn't include IPS_OFFLOAD_BIT. From Gao Feng. 14) Missing initialization of match/target in ebtables, from Florian Westphal. 15) Remove useless nft_dup.h file in include path, from C. Labbe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-06netfilter: nft_set_rbtree: fix parameter of __nft_rbtree_lookup()Taehee Yoo1-1/+1
The parameter this doesn't have a flags value. so that it can't be used by nft_rbtree_interval_end(). test commands: %nft add table ip filter %nft add set ip filter s { type ipv4_addr \; flags interval \; } %nft add element ip filter s {0-1} %nft add element ip filter s {2-10} %nft add chain ip filter input { type filter hook input priority 0\; } %nft add rule ip filter input ip saddr @s Splat looks like: [ 246.752502] BUG: KASAN: slab-out-of-bounds in __nft_rbtree_lookup+0x677/0x6a0 [nft_set_rbtree] [ 246.752502] Read of size 1 at addr ffff88010d9efa47 by task http/1092 [ 246.752502] CPU: 1 PID: 1092 Comm: http Not tainted 4.17.0-rc6+ #185 [ 246.752502] Call Trace: [ 246.752502] <IRQ> [ 246.752502] dump_stack+0x74/0xbb [ 246.752502] ? __nft_rbtree_lookup+0x677/0x6a0 [nft_set_rbtree] [ 246.752502] print_address_description+0xc7/0x290 [ 246.752502] ? __nft_rbtree_lookup+0x677/0x6a0 [nft_set_rbtree] [ 246.752502] kasan_report+0x22c/0x350 [ 246.752502] __nft_rbtree_lookup+0x677/0x6a0 [nft_set_rbtree] [ 246.752502] nft_rbtree_lookup+0xc9/0x2d2 [nft_set_rbtree] [ 246.752502] ? sched_clock_cpu+0x144/0x180 [ 246.752502] nft_lookup_eval+0x149/0x3a0 [nf_tables] [ 246.752502] ? __lock_acquire+0xcea/0x4ed0 [ 246.752502] ? nft_lookup_init+0x6b0/0x6b0 [nf_tables] [ 246.752502] nft_do_chain+0x263/0xf50 [nf_tables] [ 246.752502] ? __nft_trace_packet+0x1a0/0x1a0 [nf_tables] [ 246.752502] ? sched_clock_cpu+0x144/0x180 [ ... ] Fixes: f9121355eb6f ("netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: nft_set_rbtree: add timeout supportPablo Neira Ayuso1-3/+72
Add garbage collection logic to expire elements stored in the rb-tree representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-24netfilter: nf_tables: Simplify set backend selectionPhil Sutter1-20/+16
Drop nft_set_type's ability to act as a container of multiple backend implementations it chooses from. Instead consolidate the whole selection logic in nft_select_set_ops() and the actual backend provided estimate() callback. This turns nf_tables_set_types into a list containing all available backends which is traversed when selecting one matching userspace requested criteria. Also, this change allows to embed nft_set_ops structure into nft_set_type and pull flags field into the latter as it's only used during selection phase. A crucial part of this change is to make sure the new layout respects hash backend constraints formerly enforced by nft_hash_select_ops() function: This is achieved by introduction of a specific estimate() callback for nft_hash_fast_ops which returns false for key lengths != 4. In turn, nft_hash_estimate() is changed to return false for key lengths == 4 so it won't be chosen by accident. Also, both callbacks must return false for unbounded sets as their size estimate depends on a known maximum element count. Note that this patch partially reverts commit 4f2921ca21b71 ("netfilter: nf_tables: meter: pick a set backend that supports updates") by making nft_set_ops_candidate() not explicitly look for an update callback but make NFT_SET_EVAL a regular backend feature flag which is checked along with the others. This way all feature requirements are checked in one go. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-07netfilter: nf_tables: get set elements via netlinkPablo Neira Ayuso1-0/+73
This patch adds a new get operation to look up for specific elements in a set via netlink interface. You can also use it to check if an interval already exists. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>