summaryrefslogtreecommitdiff
path: root/net/sched
AgeCommit message (Collapse)AuthorFilesLines
2023-04-23net/sched: act_pedit: rate limit datapath messagesPedro Tammela1-7/+5
Unbounded info messages in the pedit datapath can flood the printk ring buffer quite easily depending on the action created. As these messages are informational, usually printing some, not all, is enough to bring attention to the real issue. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net/sched: act_pedit: remove extra check for key typePedro Tammela1-22/+7
The netlink parsing already validates the key 'htype'. Remove the datapath check as it's redundant. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net/sched: act_pedit: check static offsets a prioriPedro Tammela1-6/+14
Static key offsets should always be on 32 bit boundaries. Validate them on create/update time for static offsets and move the datapath validation for runtime offsets only. iproute2 already errors out if a given offset and data size cannot be packed to a 32 bit boundary. This change will make sure users which create/update pedit instances directly via netlink also error out, instead of finding out when packets are traversing. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net/sched: act_pedit: use extack in 'ex' parsing errorsPedro Tammela1-4/+13
We have extack available when parsing 'ex' keys, so pass it to tcf_pedit_keys_ex_parse and add more detailed error messages. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net/sched: act_pedit: use NLA_POLICY for parsing 'ex' keysPedro Tammela1-8/+3
Transform two checks in the 'ex' key parsing into netlink policies removing extra if checks. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: sched: Print msecs when transmit queue time outYajun Deng1-5/+5
The kernel will print several warnings in a short period of time when it stalls. Like this: First warning: [ 7100.097547] ------------[ cut here ]------------ [ 7100.097550] NETDEV WATCHDOG: eno2 (xxx): transmit queue 8 timed out [ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x260/0x270 ... Second warning: [ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU [ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x400000000000000 softirq=367 3137/3673146 fqs=13844 [ 7147.756960] (t=60001 jiffies g=4322709 q=133381) [ 7147.756962] NMI backtrace for cpu 24 ... We calculate that the transmit queue start stall should occur before 7095s according to watchdog_timeo, the rcu start stall at 7087s. These two times are close together, it is difficult to confirm which happened first. To let users know the exact time the stall started, print msecs when the transmit queue time out. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-6/+10
Adjacent changes: net/mptcp/protocol.h 63740448a32e ("mptcp: fix accept vs worker race") 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19net: skbuff: hide csum_not_inet when CONFIG_IP_SCTP not setJakub Kicinski1-2/+1
SCTP is not universally deployed, allow hiding its bit from the skb. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17net/sched: clear actions pointer in miss cookie init failPedro Tammela1-0/+3
Palash reports a UAF when using a modified version of syzkaller[1]. When 'tcf_exts_miss_cookie_base_alloc()' fails in 'tcf_exts_init_ex()' a call to 'tcf_exts_destroy()' is made to free up the tcf_exts resources. In flower, a call to '__fl_put()' when 'tcf_exts_init_ex()' fails is made; Then calling 'tcf_exts_destroy()', which triggers an UAF since the already freed tcf_exts action pointer is lingering in the struct. Before the offending patch, this was not an issue since there was no case where the tcf_exts action pointer could linger. Therefore, restore the old semantic by clearing the action pointer in case of a failure to initialize the miss_cookie. [1] https://github.com/cmu-pasta/linux-kernel-enriched-corpus v1->v2: Fix compilation on configs without tc actions (kernel test robot) Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action") Reported-by: Palash Oswal <oswalpalash@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_aggGwangun Jung1-6/+7
If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device. The MTU of the loopback device can be set up to 2^31-1. As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX. Due to the invalid lmax value, an index is generated that exceeds the QFQ_MAX_INDEX(=24) value, causing out-of-bounds read/write errors. The following reports a oob access: [ 84.582666] BUG: KASAN: slab-out-of-bounds in qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313) [ 84.583267] Read of size 4 at addr ffff88810f676948 by task ping/301 [ 84.583686] [ 84.583797] CPU: 3 PID: 301 Comm: ping Not tainted 6.3.0-rc5 #1 [ 84.584164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 84.584644] Call Trace: [ 84.584787] <TASK> [ 84.584906] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) [ 84.585108] print_report (mm/kasan/report.c:320 mm/kasan/report.c:430) [ 84.585570] kasan_report (mm/kasan/report.c:538) [ 84.585988] qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313) [ 84.586599] qfq_enqueue (net/sched/sch_qfq.c:1255) [ 84.587607] dev_qdisc_enqueue (net/core/dev.c:3776) [ 84.587749] __dev_queue_xmit (./include/net/sch_generic.h:186 net/core/dev.c:3865 net/core/dev.c:4212) [ 84.588763] ip_finish_output2 (./include/net/neighbour.h:546 net/ipv4/ip_output.c:228) [ 84.589460] ip_output (net/ipv4/ip_output.c:430) [ 84.590132] ip_push_pending_frames (./include/net/dst.h:444 net/ipv4/ip_output.c:126 net/ipv4/ip_output.c:1586 net/ipv4/ip_output.c:1606) [ 84.590285] raw_sendmsg (net/ipv4/raw.c:649) [ 84.591960] sock_sendmsg (net/socket.c:724 net/socket.c:747) [ 84.592084] __sys_sendto (net/socket.c:2142) [ 84.593306] __x64_sys_sendto (net/socket.c:2150) [ 84.593779] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 84.593902] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [ 84.594070] RIP: 0033:0x7fe568032066 [ 84.594192] Code: 0e 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c09[ 84.594796] RSP: 002b:00007ffce388b4e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c Code starting with the faulting instruction =========================================== [ 84.595047] RAX: ffffffffffffffda RBX: 00007ffce388cc70 RCX: 00007fe568032066 [ 84.595281] RDX: 0000000000000040 RSI: 00005605fdad6d10 RDI: 0000000000000003 [ 84.595515] RBP: 00005605fdad6d10 R08: 00007ffce388eeec R09: 0000000000000010 [ 84.595749] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 84.595984] R13: 00007ffce388cc30 R14: 00007ffce388b4f0 R15: 0000001d00000001 [ 84.596218] </TASK> [ 84.596295] [ 84.596351] Allocated by task 291: [ 84.596467] kasan_save_stack (mm/kasan/common.c:46) [ 84.596597] kasan_set_track (mm/kasan/common.c:52) [ 84.596725] __kasan_kmalloc (mm/kasan/common.c:384) [ 84.596852] __kmalloc_node (./include/linux/kasan.h:196 mm/slab_common.c:967 mm/slab_common.c:974) [ 84.596979] qdisc_alloc (./include/linux/slab.h:610 ./include/linux/slab.h:731 net/sched/sch_generic.c:938) [ 84.597100] qdisc_create (net/sched/sch_api.c:1244) [ 84.597222] tc_modify_qdisc (net/sched/sch_api.c:1680) [ 84.597357] rtnetlink_rcv_msg (net/core/rtnetlink.c:6174) [ 84.597495] netlink_rcv_skb (net/netlink/af_netlink.c:2574) [ 84.597627] netlink_unicast (net/netlink/af_netlink.c:1340 net/netlink/af_netlink.c:1365) [ 84.597759] netlink_sendmsg (net/netlink/af_netlink.c:1942) [ 84.597891] sock_sendmsg (net/socket.c:724 net/socket.c:747) [ 84.598016] ____sys_sendmsg (net/socket.c:2501) [ 84.598147] ___sys_sendmsg (net/socket.c:2557) [ 84.598275] __sys_sendmsg (./include/linux/file.h:31 net/socket.c:2586) [ 84.598399] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 84.598520] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [ 84.598688] [ 84.598744] The buggy address belongs to the object at ffff88810f674000 [ 84.598744] which belongs to the cache kmalloc-8k of size 8192 [ 84.599135] The buggy address is located 2664 bytes to the right of [ 84.599135] allocated 7904-byte region [ffff88810f674000, ffff88810f675ee0) [ 84.599544] [ 84.599598] The buggy address belongs to the physical page: [ 84.599777] page:00000000e638567f refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10f670 [ 84.600074] head:00000000e638567f order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 84.600330] flags: 0x200000000010200(slab|head|node=0|zone=2) [ 84.600517] raw: 0200000000010200 ffff888100043180 dead000000000122 0000000000000000 [ 84.600764] raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000 [ 84.601009] page dumped because: kasan: bad access detected [ 84.601187] [ 84.601241] Memory state around the buggy address: [ 84.601396] ffff88810f676800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.601620] ffff88810f676880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.601845] >ffff88810f676900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602069] ^ [ 84.602243] ffff88810f676980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602468] ffff88810f676a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602693] ================================================================== [ 84.602924] Disabling lock debugging due to kernel taint Fixes: 3015f3d2a3cd ("pkt_sched: enable QFQ to support TSO/GSO") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: Gwangun Jung <exsociety@gmail.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14net/sched: taprio: allow per-TC user input of FP adminStatusVladimir Oltean1-13/+52
This is a duplication of the FP adminStatus logic introduced for tc-mqprio. Offloading is done through the tc_mqprio_qopt_offload structure embedded within tc_taprio_qopt_offload. So practically, if a device driver is written to treat the mqprio portion of taprio just like standalone mqprio, it gets unified handling of frame preemption. I would have reused more code with taprio, but this is mostly netlink attribute parsing, which is hard to transform into generic code without having something that stinks as a result. We have the same variables with the same semantics, just different nlattr type values (TCA_MQPRIO_TC_ENTRY=5 vs TCA_TAPRIO_ATTR_TC_ENTRY=12; TCA_MQPRIO_TC_ENTRY_FP=2 vs TCA_TAPRIO_TC_ENTRY_FP=3, etc) and consequently, different policies for the nest. Every time nla_parse_nested() is called, an on-stack table "tb" of nlattr pointers is allocated statically, up to the maximum understood nlattr type. That array size is hardcoded as a constant, but when transforming this into a common parsing function, it would become either a VLA (which the Linux kernel rightfully doesn't like) or a call to the allocator. Having FP adminStatus in tc-taprio can be seen as addressing the 802.1Q Annex S.3 "Scheduling and preemption used in combination, no HOLD/RELEASE" and S.4 "Scheduling and preemption used in combination with HOLD/RELEASE" use cases. HOLD and RELEASE events are emitted towards the underlying MAC Merge layer when the schedule hits a Set-And-Hold-MAC or a Set-And-Release-MAC gate operation. So within the tc-taprio UAPI space, one can distinguish between the 2 use cases by choosing whether to use the TC_TAPRIO_CMD_SET_AND_HOLD and TC_TAPRIO_CMD_SET_AND_RELEASE gate operations within the schedule, or just TC_TAPRIO_CMD_SET_GATES. A small part of the change is dedicated to refactoring the max_sdu nlattr parsing to put all logic under the "if" that tests for presence of that nlattr. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-14net/sched: mqprio: allow per-TC user input of FP adminStatusVladimir Oltean3-1/+143
IEEE 802.1Q-2018 clause 6.7.2 Frame preemption specifies that each packet priority can be assigned to a "frame preemption status" value of either "express" or "preemptible". Express priorities are transmitted by the local device through the eMAC, and preemptible priorities through the pMAC (the concepts of eMAC and pMAC come from the 802.3 MAC Merge layer). The FP adminStatus is defined per packet priority, but 802.1Q clause 12.30.1.1.1 framePreemptionAdminStatus also says that: | Priorities that all map to the same traffic class should be | constrained to use the same value of preemption status. It is impossible to ignore the cognitive dissonance in the standard here, because it practically means that the FP adminStatus only takes distinct values per traffic class, even though it is defined per priority. I can see no valid use case which is prevented by having the kernel take the FP adminStatus as input per traffic class (what we do here). In addition, this also enforces the above constraint by construction. User space network managers which wish to expose FP adminStatus per priority are free to do so; they must only observe the prio_tc_map of the netdev (which presumably is also under their control, when constructing the mqprio netlink attributes). The reason for configuring frame preemption as a property of the Qdisc layer is that the information about "preemptible TCs" is closest to the place which handles the num_tc and prio_tc_map of the netdev. If the UAPI would have been any other layer, it would be unclear what to do with the FP information when num_tc collapses to 0. A key assumption is that only mqprio/taprio change the num_tc and prio_tc_map of the netdev. Not sure if that's a great assumption to make. Having FP in tc-mqprio can be seen as an implementation of the use case defined in 802.1Q Annex S.2 "Preemption used in isolation". There will be a separate implementation of FP in tc-taprio, for the other use cases. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-14net/sched: pass netlink extack to mqprio and taprio offloadVladimir Oltean2-3/+14
With the multiplexed ndo_setup_tc() model which lacks a first-class struct netlink_ext_ack * argument, the only way to pass the netlink extended ACK message down to the device driver is to embed it within the offload structure. Do this for struct tc_mqprio_qopt_offload and struct tc_taprio_qopt_offload. Since struct tc_taprio_qopt_offload also contains a tc_mqprio_qopt_offload structure, and since device drivers might effectively reuse their mqprio implementation for the mqprio portion of taprio, we make taprio set the extack in both offload structures to point at the same netlink extack message. In fact, the taprio handling is a bit more tricky, for 2 reasons. First is because the offload structure has a longer lifetime than the extack structure. The driver is supposed to populate the extack synchronously from ndo_setup_tc() and leave it alone afterwards. To not have any use-after-free surprises, we zero out the extack pointer when we leave taprio_enable_offload(). The second reason is because taprio does overwrite the extack message on ndo_setup_tc() error. We need to switch to the weak form of setting an extack message, which preserves a potential message set by the driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-14net/sched: mqprio: add an extack message to mqprio_parse_opt()Vladimir Oltean1-1/+4
Ferenc reports that a combination of poor iproute2 defaults and obscure cases where the kernel returns -EINVAL make it difficult to understand what is wrong with this command: $ ip link add veth0 numtxqueues 8 numrxqueues 8 type veth peer name veth1 $ tc qdisc add dev veth0 root mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 RTNETLINK answers: Invalid argument Hopefully with this patch, the cause is clearer: Error: Device does not support hardware offload. The kernel was (and still is) rejecting this because iproute2 defaults to "hw 1" if this command line option is not specified. Link: https://lore.kernel.org/netdev/ede5e9a2f27bf83bfb86d3e8c4ca7b34093b99e2.camel@inf.elte.hu/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-14net/sched: mqprio: add extack to mqprio_parse_nlattr()Vladimir Oltean1-7/+23
Netlink attribute parsing in mqprio is a minesweeper game, with many options having the possibility of being passed incorrectly and the user being none the wiser. Try to make errors less sour by giving user space some information regarding what went wrong. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-14net/sched: mqprio: simplify handling of nlattr portion of TCA_OPTIONSVladimir Oltean1-19/+13
In commit 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio"), the TCA_OPTIONS format of mqprio was extended to contain a fixed portion (of size NLA_ALIGN(sizeof struct tc_mqprio_qopt)) and a variable portion of other nlattrs (in the TCA_MQPRIO_* type space) following immediately afterwards. In commit feb2cf3dcfb9 ("net/sched: mqprio: refactor nlattr parsing to a separate function"), we've moved the nlattr handling to a smaller function, but yet, a small parse_attr() still remains, and the larger mqprio_parse_nlattr() still does not have access to the beginning, and the length, of the TCA_OPTIONS region containing these other nlattrs. In a future change, the mqprio qdisc will need to iterate through this nlattr region to discover other attributes, so eliminate parse_attr() and add 2 variables in mqprio_parse_nlattr() which hold the beginning and the length of the nlattr range. We avoid the need to memset when nlattr_opt_len has insufficient length by pre-initializing the table "tb". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06net/sched: sch_mqprio: use netlink payload helpersPedro Tammela1-4/+4
For the sake of readability, use the netlink payload helpers from the 'nla_get_*()' family to parse the attributes. tdc results: 1..5 ok 1 9903 - Add mqprio Qdisc to multi-queue device (8 queues) ok 2 453a - Delete nonexistent mqprio Qdisc ok 3 5292 - Delete mqprio Qdisc twice ok 4 45a9 - Add mqprio Qdisc to single-queue device ok 5 2ba9 - Show mqprio class Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230404203449.1627033-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-31net/sched: act_tunnel_key: add support for "don't fragment"Davide Caratti1-0/+5
extend "act_tunnel_key" to allow specifying TUNNEL_DONT_FRAGMENT. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-24fix typos in net/sched/* filesTaichi Nishimura3-3/+3
This patch fixes typos in net/sched/* files. Signed-off-by: Taichi Nishimura <awkrail01@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-23net/sched: act_api: use the correct TCA_ACT attributes in dumpPedro Tammela1-4/+4
4 places in the act api code are using 'TCA_' definitions where they should be using 'TCA_ACT_', which is confusing for the reader, although functionally they are equivalent. Cc: Hangbin Liu <haliu@redhat.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Hangbin Liu <haliu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-23net/sched: remove two skb_mac_header() usesEric Dumazet2-2/+2
tcf_mirred_act() and tcf_mpls_act() can use skb_network_offset() instead of relying on skb_mac_header(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-23sch_cake: do not use skb_mac_header() in cake_overhead()Eric Dumazet1-3/+3
We want to remove our use of skb_mac_header() in tx paths, eg remove skb_reset_mac_header() from __dev_queue_xmit(). Idea is that ndo_start_xmit() can get the mac header simply looking at skb->data. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-4/+4
net/wireless/nl80211.c b27f07c50a73 ("wifi: nl80211: fix puncturing bitmap policy") cbbaf2bb829b ("wifi: nl80211: add a command to enable/disable HW timestamping") https://lore.kernel.org/all/20230314105421.3608efae@canb.auug.org.au tools/testing/selftests/net/Makefile 62199e3f1658 ("selftests: net: Add VXLAN MDB test") 13715acf8ab5 ("selftest: Add test for bind() conflicts.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-17net/sched: act_api: add specific EXT_WARN_MSG for tc actionHangbin Liu1-4/+4
In my previous commit 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") I didn't notice the tc action use different enum with filter. So we can't use TCA_EXT_WARN_MSG directly for tc action. Let's add a TCA_ROOT_EXT_WARN_MSG for tc action specifically and put this param before going to the TCA_ACT_TAB nest. Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-17Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"Hangbin Liu1-2/+2
This reverts commit 923b2e30dc9cd05931da0f64e2e23d040865c035. This is not a correct fix as TCA_EXT_WARN_MSG is not a hierarchy to TCA_ACT_TAB. I didn't notice the TC actions use different enum when adding TCA_EXT_WARN_MSG. To fix the difference I will add a new WARN enum in TCA_ROOT_MAX as Jamal suggested. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10net: sched: remove qdisc_watchdog->last_expiresEric Dumazet1-2/+4
This field mirrors hrtimer softexpires, we can instead use the existing helpers. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230308182648.1150762-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-01net/sched: flower: fix fl_change() error recovery pathEric Dumazet1-4/+6
The two "goto errout;" paths in fl_change() became wrong after cited commit. Indeed we only must not call __fl_put() until the net pointer has been set in tcf_exts_init_ex() This is a minimal fix. We might in the future validate TCA_FLOWER_FLAGS before we allocate @fnew. BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:72 [inline] BUG: KASAN: null-ptr-deref in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] BUG: KASAN: null-ptr-deref in refcount_read include/linux/refcount.h:147 [inline] BUG: KASAN: null-ptr-deref in __refcount_add_not_zero include/linux/refcount.h:152 [inline] BUG: KASAN: null-ptr-deref in __refcount_inc_not_zero include/linux/refcount.h:227 [inline] BUG: KASAN: null-ptr-deref in refcount_inc_not_zero include/linux/refcount.h:245 [inline] BUG: KASAN: null-ptr-deref in maybe_get_net include/net/net_namespace.h:269 [inline] BUG: KASAN: null-ptr-deref in tcf_exts_get_net include/net/pkt_cls.h:260 [inline] BUG: KASAN: null-ptr-deref in __fl_put net/sched/cls_flower.c:513 [inline] BUG: KASAN: null-ptr-deref in __fl_put+0x13e/0x3b0 net/sched/cls_flower.c:508 Read of size 4 at addr 000000000000014c by task syz-executor548/5082 CPU: 0 PID: 5082 Comm: syz-executor548 Not tainted 6.2.0-syzkaller-05251-g5b7c4cabbb65 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_report mm/kasan/report.c:420 [inline] kasan_report+0xec/0x130 mm/kasan/report.c:517 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x141/0x190 mm/kasan/generic.c:189 instrument_atomic_read include/linux/instrumented.h:72 [inline] atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] refcount_read include/linux/refcount.h:147 [inline] __refcount_add_not_zero include/linux/refcount.h:152 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] maybe_get_net include/net/net_namespace.h:269 [inline] tcf_exts_get_net include/net/pkt_cls.h:260 [inline] __fl_put net/sched/cls_flower.c:513 [inline] __fl_put+0x13e/0x3b0 net/sched/cls_flower.c:508 fl_change+0x101b/0x4ab0 net/sched/cls_flower.c:2341 tc_new_tfilter+0x97c/0x2290 net/sched/cls_api.c:2310 rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:722 [inline] sock_sendmsg+0xde/0x190 net/socket.c:745 ____sys_sendmsg+0x334/0x900 net/socket.c:2504 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2558 __sys_sendmmsg+0x18f/0x460 net/socket.c:2644 __do_sys_sendmmsg net/socket.c:2673 [inline] __se_sys_sendmmsg net/socket.c:2670 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2670 Fixes: 08a0063df3ae ("net/sched: flower: Move filter handle initialization earlier") Reported-by: syzbot+baabf3efa7c1e57d28b2@syzkaller.appspotmail.com Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paul Blakey <paulb@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-01net/sched: act_connmark: handle errno on tcf_idr_check_allocPedro Tammela1-0/+3
Smatch reports that 'ci' can be used uninitialized. The current code ignores errno coming from tcf_idr_check_alloc, which will lead to the incorrect usage of 'ci'. Handle the errno as it should. Fixes: 288864effe33 ("net/sched: act_connmark: transition to percpu stats and rcu") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-27net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchyPedro Tammela1-2/+2
TCA_EXT_WARN_MSG is currently sitting outside of the expected hierarchy for the tc actions code. It should sit within TCA_ACT_TAB. Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-26net/sched: act_sample: fix action bind logicPedro Tammela1-2/+9
The TC architecture allows filters and actions to be created independently. In filters the user can reference action objects using: tc action add action sample ... index 1 tc filter add ... action pedit index 1 In the current code for act_sample this is broken as it checks netlink attributes for create/update before actually checking if we are binding to an existing action. tdc results: 1..29 ok 1 9784 - Add valid sample action with mandatory arguments ok 2 5c91 - Add valid sample action with mandatory arguments and continue control action ok 3 334b - Add valid sample action with mandatory arguments and drop control action ok 4 da69 - Add valid sample action with mandatory arguments and reclassify control action ok 5 13ce - Add valid sample action with mandatory arguments and pipe control action ok 6 1886 - Add valid sample action with mandatory arguments and jump control action ok 7 7571 - Add sample action with invalid rate ok 8 b6d4 - Add sample action with mandatory arguments and invalid control action ok 9 a874 - Add invalid sample action without mandatory arguments ok 10 ac01 - Add invalid sample action without mandatory argument rate ok 11 4203 - Add invalid sample action without mandatory argument group ok 12 14a7 - Add invalid sample action without mandatory argument group ok 13 8f2e - Add valid sample action with trunc argument ok 14 45f8 - Add sample action with maximum rate argument ok 15 ad0c - Add sample action with maximum trunc argument ok 16 83a9 - Add sample action with maximum group argument ok 17 ed27 - Add sample action with invalid rate argument ok 18 2eae - Add sample action with invalid group argument ok 19 6ff3 - Add sample action with invalid trunc size ok 20 2b2a - Add sample action with invalid index ok 21 dee2 - Add sample action with maximum allowed index ok 22 560e - Add sample action with cookie ok 23 704a - Replace existing sample action with new rate argument ok 24 60eb - Replace existing sample action with new group argument ok 25 2cce - Replace existing sample action with new trunc argument ok 26 59d1 - Replace existing sample action with new control argument ok 27 0a6e - Replace sample action with invalid goto chain control ok 28 3872 - Delete sample action with valid index ok 29 a394 - Delete sample action with invalid index Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-26net/sched: act_mpls: fix action bind logicPedro Tammela1-29/+37
The TC architecture allows filters and actions to be created independently. In filters the user can reference action objects using: tc action add action mpls ... index 1 tc filter add ... action mpls index 1 In the current code for act_mpls this is broken as it checks netlink attributes for create/update before actually checking if we are binding to an existing action. tdc results: 1..53 ok 1 a933 - Add MPLS dec_ttl action with pipe opcode ok 2 08d1 - Add mpls dec_ttl action with pass opcode ok 3 d786 - Add mpls dec_ttl action with drop opcode ok 4 f334 - Add mpls dec_ttl action with reclassify opcode ok 5 29bd - Add mpls dec_ttl action with continue opcode ok 6 48df - Add mpls dec_ttl action with jump opcode ok 7 62eb - Add mpls dec_ttl action with trap opcode ok 8 09d2 - Add mpls dec_ttl action with opcode and cookie ok 9 c170 - Add mpls dec_ttl action with opcode and cookie of max length ok 10 9118 - Add mpls dec_ttl action with invalid opcode ok 11 6ce1 - Add mpls dec_ttl action with label (invalid) ok 12 352f - Add mpls dec_ttl action with tc (invalid) ok 13 fa1c - Add mpls dec_ttl action with ttl (invalid) ok 14 6b79 - Add mpls dec_ttl action with bos (invalid) ok 15 d4c4 - Add mpls pop action with ip proto ok 16 91fb - Add mpls pop action with ip proto and cookie ok 17 92fe - Add mpls pop action with mpls proto ok 18 7e23 - Add mpls pop action with no protocol (invalid) ok 19 6182 - Add mpls pop action with label (invalid) ok 20 6475 - Add mpls pop action with tc (invalid) ok 21 067b - Add mpls pop action with ttl (invalid) ok 22 7316 - Add mpls pop action with bos (invalid) ok 23 38cc - Add mpls push action with label ok 24 c281 - Add mpls push action with mpls_mc protocol ok 25 5db4 - Add mpls push action with label, tc and ttl ok 26 7c34 - Add mpls push action with label, tc ttl and cookie of max length ok 27 16eb - Add mpls push action with label and bos ok 28 d69d - Add mpls push action with no label (invalid) ok 29 e8e4 - Add mpls push action with ipv4 protocol (invalid) ok 30 ecd0 - Add mpls push action with out of range label (invalid) ok 31 d303 - Add mpls push action with out of range tc (invalid) ok 32 fd6e - Add mpls push action with ttl of 0 (invalid) ok 33 19e9 - Add mpls mod action with mpls label ok 34 1fde - Add mpls mod action with max mpls label ok 35 0c50 - Add mpls mod action with mpls label exceeding max (invalid) ok 36 10b6 - Add mpls mod action with mpls label of MPLS_LABEL_IMPLNULL (invalid) ok 37 57c9 - Add mpls mod action with mpls min tc ok 38 6872 - Add mpls mod action with mpls max tc ok 39 a70a - Add mpls mod action with mpls tc exceeding max (invalid) ok 40 6ed5 - Add mpls mod action with mpls ttl ok 41 77c1 - Add mpls mod action with mpls ttl and cookie ok 42 b80f - Add mpls mod action with mpls max ttl ok 43 8864 - Add mpls mod action with mpls min ttl ok 44 6c06 - Add mpls mod action with mpls ttl of 0 (invalid) ok 45 b5d8 - Add mpls mod action with mpls ttl exceeding max (invalid) ok 46 451f - Add mpls mod action with mpls max bos ok 47 a1ed - Add mpls mod action with mpls min bos ok 48 3dcf - Add mpls mod action with mpls bos exceeding max (invalid) ok 49 db7c - Add mpls mod action with protocol (invalid) ok 50 b070 - Replace existing mpls push action with new ID ok 51 95a9 - Replace existing mpls push action with new label, tc, ttl and cookie ok 52 6cce - Delete mpls pop action ok 53 d138 - Flush mpls actions Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-26net/sched: act_pedit: fix action bind logicPedro Tammela1-27/+31
The TC architecture allows filters and actions to be created independently. In filters the user can reference action objects using: tc action add action pedit ... index 1 tc filter add ... action pedit index 1 In the current code for act_pedit this is broken as it checks netlink attributes for create/update before actually checking if we are binding to an existing action. tdc results: 1..69 ok 1 319a - Add pedit action that mangles IP TTL ok 2 7e67 - Replace pedit action with invalid goto chain ok 3 377e - Add pedit action with RAW_OP offset u32 ok 4 a0ca - Add pedit action with RAW_OP offset u32 (INVALID) ok 5 dd8a - Add pedit action with RAW_OP offset u16 u16 ok 6 53db - Add pedit action with RAW_OP offset u16 (INVALID) ok 7 5c7e - Add pedit action with RAW_OP offset u8 add value ok 8 2893 - Add pedit action with RAW_OP offset u8 quad ok 9 3a07 - Add pedit action with RAW_OP offset u8-u16-u8 ok 10 ab0f - Add pedit action with RAW_OP offset u16-u8-u8 ok 11 9d12 - Add pedit action with RAW_OP offset u32 set u16 clear u8 invert ok 12 ebfa - Add pedit action with RAW_OP offset overflow u32 (INVALID) ok 13 f512 - Add pedit action with RAW_OP offset u16 at offmask shift set ok 14 c2cb - Add pedit action with RAW_OP offset u32 retain value ok 15 1762 - Add pedit action with RAW_OP offset u8 clear value ok 16 bcee - Add pedit action with RAW_OP offset u8 retain value ok 17 e89f - Add pedit action with RAW_OP offset u16 retain value ok 18 c282 - Add pedit action with RAW_OP offset u32 clear value ok 19 c422 - Add pedit action with RAW_OP offset u16 invert value ok 20 d3d3 - Add pedit action with RAW_OP offset u32 invert value ok 21 57e5 - Add pedit action with RAW_OP offset u8 preserve value ok 22 99e0 - Add pedit action with RAW_OP offset u16 preserve value ok 23 1892 - Add pedit action with RAW_OP offset u32 preserve value ok 24 4b60 - Add pedit action with RAW_OP negative offset u16/u32 set value ok 25 a5a7 - Add pedit action with LAYERED_OP eth set src ok 26 86d4 - Add pedit action with LAYERED_OP eth set src & dst ok 27 f8a9 - Add pedit action with LAYERED_OP eth set dst ok 28 c715 - Add pedit action with LAYERED_OP eth set src (INVALID) ok 29 8131 - Add pedit action with LAYERED_OP eth set dst (INVALID) ok 30 ba22 - Add pedit action with LAYERED_OP eth type set/clear sequence ok 31 dec4 - Add pedit action with LAYERED_OP eth set type (INVALID) ok 32 ab06 - Add pedit action with LAYERED_OP eth add type ok 33 918d - Add pedit action with LAYERED_OP eth invert src ok 34 a8d4 - Add pedit action with LAYERED_OP eth invert dst ok 35 ee13 - Add pedit action with LAYERED_OP eth invert type ok 36 7588 - Add pedit action with LAYERED_OP ip set src ok 37 0fa7 - Add pedit action with LAYERED_OP ip set dst ok 38 5810 - Add pedit action with LAYERED_OP ip set src & dst ok 39 1092 - Add pedit action with LAYERED_OP ip set ihl & dsfield ok 40 02d8 - Add pedit action with LAYERED_OP ip set ttl & protocol ok 41 3e2d - Add pedit action with LAYERED_OP ip set ttl (INVALID) ok 42 31ae - Add pedit action with LAYERED_OP ip ttl clear/set ok 43 486f - Add pedit action with LAYERED_OP ip set duplicate fields ok 44 e790 - Add pedit action with LAYERED_OP ip set ce, df, mf, firstfrag, nofrag fields ok 45 cc8a - Add pedit action with LAYERED_OP ip set tos ok 46 7a17 - Add pedit action with LAYERED_OP ip set precedence ok 47 c3b6 - Add pedit action with LAYERED_OP ip add tos ok 48 43d3 - Add pedit action with LAYERED_OP ip add precedence ok 49 438e - Add pedit action with LAYERED_OP ip clear tos ok 50 6b1b - Add pedit action with LAYERED_OP ip clear precedence ok 51 824a - Add pedit action with LAYERED_OP ip invert tos ok 52 106f - Add pedit action with LAYERED_OP ip invert precedence ok 53 6829 - Add pedit action with LAYERED_OP beyond ip set dport & sport ok 54 afd8 - Add pedit action with LAYERED_OP beyond ip set icmp_type & icmp_code ok 55 3143 - Add pedit action with LAYERED_OP beyond ip set dport (INVALID) ok 56 815c - Add pedit action with LAYERED_OP ip6 set src ok 57 4dae - Add pedit action with LAYERED_OP ip6 set dst ok 58 fc1f - Add pedit action with LAYERED_OP ip6 set src & dst ok 59 6d34 - Add pedit action with LAYERED_OP ip6 dst retain value (INVALID) ok 60 94bb - Add pedit action with LAYERED_OP ip6 traffic_class ok 61 6f5e - Add pedit action with LAYERED_OP ip6 flow_lbl ok 62 6795 - Add pedit action with LAYERED_OP ip6 set payload_len, nexthdr, hoplimit ok 63 1442 - Add pedit action with LAYERED_OP tcp set dport & sport ok 64 b7ac - Add pedit action with LAYERED_OP tcp sport set (INVALID) ok 65 cfcc - Add pedit action with LAYERED_OP tcp flags set ok 66 3bc4 - Add pedit action with LAYERED_OP tcp set dport, sport & flags fields ok 67 f1c8 - Add pedit action with LAYERED_OP udp set dport & sport ok 68 d784 - Add pedit action with mixed RAW/LAYERED_OP #1 ok 69 70ca - Add pedit action with mixed RAW/LAYERED_OP #2 Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers") Fixes: f67169fef8db ("net/sched: act_pedit: fix WARN() in the traffic path") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-26net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy()Nathan Chancellor1-1/+1
When CONFIG_NET_CLS_ACT is disabled: ../net/sched/cls_api.c:141:13: warning: 'tcf_exts_miss_cookie_base_destroy' defined but not used [-Wunused-function] 141 | static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Due to the way the code is structured, it is possible for a definition of tcf_exts_miss_cookie_base_destroy() to be present without actually being used. Its single callsite is in an '#ifdef CONFIG_NET_CLS_ACT' block but a definition will always be present in the file. The version of tcf_exts_miss_cookie_base_destroy() that actually does something depends on CONFIG_NET_TC_SKB_EXT, so the stub function is used in both CONFIG_NET_CLS_ACT=n and CONFIG_NET_CLS_ACT=y + CONFIG_NET_TC_SKB_EXT=n configurations. Move the call to tcf_exts_miss_cookie_base_destroy() in tcf_exts_destroy() out of the '#ifdef CONFIG_NET_CLS_ACT', so that it always appears used to the compiler, while not changing any behavior with any of the various configuration combinations. Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-21net/sched: flower: Support hardware miss to tc actionPaul Blakey1-1/+12
To support hardware miss to tc action in actions on the flower classifier, implement the required getting of filter actions, and setup filter exts (actions) miss by giving it the filter's handle and actions. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21net/sched: flower: Move filter handle initialization earlierPaul Blakey1-27/+35
To support miss to action during hardware offload the filter's handle is needed when setting up the actions (tcf_exts_init()), and before offloading. Move filter handle initialization earlier. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21net/sched: cls_api: Support hardware miss to tc actionPaul Blakey2-11/+206
For drivers to support partial offload of a filter's action list, add support for action miss to specify an action instance to continue from in sw. CT action in particular can't be fully offloaded, as new connections need to be handled in software. This imposes other limitations on the actions that can be offloaded together with the CT action, such as packet modifications. Assign each action on a filter's action list a unique miss_cookie which drivers can then use to fill action_miss part of the tc skb extension. On getting back this miss_cookie, find the action instance with relevant cookie and continue classifying from there. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21net/sched: Rename user cookie and act cookiePaul Blakey2-27/+27
struct tc_action->act_cookie is a user defined cookie, and the related struct flow_action_entry->act_cookie is used as an handle similar to struct flow_cls_offload->cookie. Rename tc_action->act_cookie to user_cookie, and flow_action_entry->act_cookie to cookie so their names would better fit their usage. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20net/sched: taprio: dynamic max_sdu larger than the max_mtu is unlimitedVladimir Oltean1-0/+2
It makes no sense to keep randomly large max_sdu values, especially if larger than the device's max_mtu. These are visible in "tc qdisc show". Such a max_sdu is practically unlimited and will cause no packets for that traffic class to be dropped on enqueue. Just set max_sdu_dynamic to U32_MAX, which in the logic below causes taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user space of 0 (unlimited). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net/sched: taprio: don't allow dynamic max_sdu to go negative after stab ↵Vladimir Oltean1-1/+7
adjustment The overhead specified in the size table comes from the user. With small time intervals (or gates always closed), the overhead can be larger than the max interval for that traffic class, and their difference is negative. What we want to happen is for max_sdu_dynamic to have the smallest non-zero value possible (1) which means that all packets on that traffic class are dropped on enqueue. However, since max_sdu_dynamic is u32, a negative is represented as a large value and oversized dropping never happens. Use max_t with int to force a truncation of max_frm_len to no smaller than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no smaller than 1. Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20net/sched: taprio: fix calculation of maximum gate durationsVladimir Oltean1-17/+17
taprio_calculate_gate_durations() depends on netdev_get_num_tc() and this returns 0. So it calculates the maximum gate durations for no traffic class. I had tested the blamed commit only with another patch in my tree, one which in the end I decided isn't valuable enough to submit ("net/sched: taprio: mask off bits in gate mask that exceed number of TCs"). The problem is that having this patch threw off my testing. By moving the netdev_set_num_tc() call earlier, we implicitly gave to taprio_calculate_gate_durations() the information it needed. Extract only the portion from the unsubmitted change which applies the mqprio configuration to the netdev earlier. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-15-vladimir.oltean@nxp.com/ Fixes: a306a90c8ffe ("net/sched: taprio: calculate tc gate durations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-17Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-3/+3
Some of the devlink bits were tricky, but I think I got it right. Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-16net/sched: act_pedit: use percpu overlimit counter when availablePedro Tammela1-3/+1
Since act_pedit now has access to percpu counters, use the tcf_action_inc_overlimit_qstats wrapper that will use the percpu counter whenever they are available. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: act_gate: use percpu statsPedro Tammela1-14/+16
The tc action act_gate was using shared stats, move it to percpu stats. tdc results: 1..12 ok 1 5153 - Add gate action with priority and sched-entry ok 2 7189 - Add gate action with base-time ok 3 a721 - Add gate action with cycle-time ok 4 c029 - Add gate action with cycle-time-ext ok 5 3719 - Replace gate base-time action ok 6 d821 - Delete gate action with valid index ok 7 3128 - Delete gate action with invalid index ok 8 7837 - List gate actions ok 9 9273 - Flush gate actions ok 10 c829 - Add gate action with duplicate index ok 11 3043 - Add gate action with invalid index ok 12 2930 - Add gate action with cookie Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: act_connmark: transition to percpu stats and rcuPedro Tammela1-39/+68
The tc action act_connmark was using shared stats and taking the per action lock in the datapath. Improve it by using percpu stats and rcu. perf before: - 13.55% tcf_connmark_act - 81.18% _raw_spin_lock 80.46% native_queued_spin_lock_slowpath perf after: - 2.85% tcf_connmark_act tdc results: 1..15 ok 1 2002 - Add valid connmark action with defaults ok 2 56a5 - Add valid connmark action with control pass ok 3 7c66 - Add valid connmark action with control drop ok 4 a913 - Add valid connmark action with control pipe ok 5 bdd8 - Add valid connmark action with control reclassify ok 6 b8be - Add valid connmark action with control continue ok 7 d8a6 - Add valid connmark action with control jump ok 8 aae8 - Add valid connmark action with zone argument ok 9 2f0b - Add valid connmark action with invalid zone argument ok 10 9305 - Add connmark action with unsupported argument ok 11 71ca - Add valid connmark action and replace it ok 12 5f8f - Add valid connmark action with cookie ok 13 c506 - Replace connmark with invalid goto chain control ok 14 6571 - Delete connmark action with valid index ok 15 3426 - Delete connmark action with invalid index Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: act_nat: transition to percpu stats and rcuPedro Tammela1-23/+49
The tc action act_nat was using shared stats and taking the per action lock in the datapath. Improve it by using percpu stats and rcu. perf before: - 10.48% tcf_nat_act - 81.83% _raw_spin_lock 81.08% native_queued_spin_lock_slowpath perf after: - 0.48% tcf_nat_act tdc results: 1..27 ok 1 7565 - Add nat action on ingress with default control action ok 2 fd79 - Add nat action on ingress with pipe control action ok 3 eab9 - Add nat action on ingress with continue control action ok 4 c53a - Add nat action on ingress with reclassify control action ok 5 76c9 - Add nat action on ingress with jump control action ok 6 24c6 - Add nat action on ingress with drop control action ok 7 2120 - Add nat action on ingress with maximum index value ok 8 3e9d - Add nat action on ingress with invalid index value ok 9 f6c9 - Add nat action on ingress with invalid IP address ok 10 be25 - Add nat action on ingress with invalid argument ok 11 a7bd - Add nat action on ingress with DEFAULT IP address ok 12 ee1e - Add nat action on ingress with ANY IP address ok 13 1de8 - Add nat action on ingress with ALL IP address ok 14 8dba - Add nat action on egress with default control action ok 15 19a7 - Add nat action on egress with pipe control action ok 16 f1d9 - Add nat action on egress with continue control action ok 17 6d4a - Add nat action on egress with reclassify control action ok 18 b313 - Add nat action on egress with jump control action ok 19 d9fc - Add nat action on egress with drop control action ok 20 a895 - Add nat action on egress with DEFAULT IP address ok 21 2572 - Add nat action on egress with ANY IP address ok 22 37f3 - Add nat action on egress with ALL IP address ok 23 6054 - Add nat action on egress with cookie ok 24 79d6 - Add nat action on ingress with cookie ok 25 4b12 - Replace nat action with invalid goto chain control ok 26 b811 - Delete nat action with valid index ok 27 a521 - Delete nat action with invalid index Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: Retire rsvp classifierJamal Hadi Salim5-846/+0
The rsvp classifier has served us well for about a quarter of a century but has has not been getting much maintenance attention due to lack of known users. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: Retire tcindex classifierJamal Hadi Salim3-728/+0
The tcindex classifier has served us well for about a quarter of a century but has not been getting much TLC due to lack of known users. Most recently it has become easy prey to syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: Retire dsmark qdiscJamal Hadi Salim3-530/+0
The dsmark qdisc has served us well over the years for diffserv but has not been getting much attention due to other more popular approaches to do diffserv services. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: Retire ATM qdiscJamal Hadi Salim3-721/+0
The ATM qdisc has served us well over the years but has not been getting much TLC due to lack of known users. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: Retire CBQ qdiscJamal Hadi Salim3-1745/+0
While this amazing qdisc has served us well over the years it has not been getting any tender love and care and has bitrotted over time. It has become mostly a shooting target for syzkaller lately. For this reason, we are retiring it. Goodbye CBQ - we loved you. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>