summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-02-05net: gro: minor optimization for dev_gro_receive()Paolo Abeni2-35/+32
While inspecting some perf report, I noticed that the compiler emits suboptimal code for the napi CB initialization, fetching and storing multiple times the memory for flags bitfield. This is with gcc 10.3.1, but I observed the same with older compiler versions. We can help the compiler to do a nicer work clearing several fields at once using an u32 alias. The generated code is quite smaller, with the same number of conditional. Before: objdump -t net/core/gro.o | grep " F .text" 0000000000000bb0 l F .text 0000000000000357 dev_gro_receive After: 0000000000000bb0 l F .text 000000000000033c dev_gro_receive v1 -> v2: - use struct_group (Alexander and Alex) RFC -> v1: - use __struct_group to delimit the zeroed area (Alexander) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: gro: avoid re-computing truesize twice on recyclePaolo Abeni1-1/+0
After commit 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock reference") and commit af352460b465 ("net: fix GRO skb truesize update") the truesize of the skb with stolen head is properly updated by the GRO engine, we don't need anymore resetting it at recycle time. v1 -> v2: - clarify the commit message (Alexander) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: dsa: qca8k: check correct variable in qca8k_phy_eth_command()Dan Carpenter1-1/+1
This is a copy and paste bug. It was supposed to check "clear_skb" instead of "write_skb". Fixes: 2cd548566384 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05Merge branch 'lan966x-mcast-snooping'David S. Miller5-2/+166
Horatiu Vultur says: ==================== net: lan966x: add support for mcast snooping Implement the switchdev callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED to allow to enable/disable multicast snooping. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: lan966x: Update mdb when enabling/disabling mcast_snoopingHoratiu Vultur3-0/+51
When the multicast snooping is disabled, the mdb entries should be removed from the HW, but they still need to be kept in memory for when the mcast_snooping will be enabled again. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLEDHoratiu Vultur4-1/+110
The callback allows to enable/disable multicast snooping. When the snooping is enabled, all IGMP and MLD frames are redirected to the CPU, therefore make sure not to set the skb flag 'offload_fwd_mark'. The HW will not flood multicast ipv4/ipv6 data frames. When the snooping is disabled, the HW will flood IGMP, MLD and multicast ipv4/ipv6 frames according to the mcast_flood flag. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net: lan966x: Update the PGID used by IPV6 data framesHoratiu Vultur1-1/+5
When enabling the multicast snooping, the forwarding of the IPV6 frames has it's own forwarding mask. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net/sched: Enable tc skb ext allocation on chain miss only when neededPaul Blakey5-23/+56
Currently tc skb extension is used to send miss info from tc to ovs datapath module, and driver to tc. For the tc to ovs miss it is currently always allocated even if it will not be used by ovs datapath (as it depends on a requested feature). Export the static key which is used by openvswitch module to guard this code path as well, so it will be skipped if ovs datapath doesn't need it. Enable this code path once ovs datapath needs it. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05Merge branch 'mptcp-improve-set-flags-command-and-update-self-tests'Jakub Kicinski4-388/+595
Mat Martineau says: ==================== mptcp: Improve set-flags command and update self tests Patches 1-3 allow more flexibility in the combinations of features and flags allowed with the MPTCP_PM_CMD_SET_FLAGS netlink command, and add self test case coverage for the new functionality. Patches 4-6 and 9 refactor the mptcp_join.sh self tests to allow them to configure all of the test cases using either the pm_nl_ctl utility (part of the mptcp self tests) or the 'ip mptcp' command (from iproute2). The default remains to use pm_nl_ctl. Patches 7 and 8 update the pm_netlink.sh self tests to cover the use of endpoint ids to set endpoint flags (instead of just addresses). ==================== Link: https://lore.kernel.org/r/20220205000337.187292-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05selftests: mptcp: set ip_mptcp in command lineGeliang Tang1-3/+9
This patch added a command line option '-i' for mptcp_join.sh to use 'ip mptcp' commands instead of using 'pm_nl_ctl' commands to deal with PM netlink. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05selftests: mptcp: add set_flags tests in pm_netlink.shGeliang Tang1-0/+18
This patch added the setting flags test cases, using both addr-based and id-based lookups for the setting address. The output looks like this: set flags (backup) [ OK ] (nobackup) [ OK ] (fullmesh) [ OK ] (nofullmesh) [ OK ] (backup,fullmesh) [ OK ] Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05selftests: mptcp: add the id argument for set_flagsGeliang Tang1-21/+42
This patch added the id argument for setting the address flags in pm_nl_ctl. Usage: pm_nl_ctl set id 1 flags backup Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05selftests: mptcp: add wrapper for setting flagsGeliang Tang1-3/+21
This patch implemented a new function named pm_nl_set_endpoint(), wrapped the PM netlink commands 'ip mptcp endpoint change flags' and 'pm_nl_ctl set flags' in it, and used a new argument 'ip_mptcp' to choose which one to use to set the flags of the PM endpoint. 'ip mptcp' used the ID number argument to find out the address to change flags, while 'pm_nl_ctl' used the address and port number arguments. So we need to parse the address ID from the PM dump output as well as the address and port number. Used this wrapper in do_transfer() instead of using the pm_nl_ctl command directly. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05selftests: mptcp: add wrapper for showing addrsGeliang Tang1-28/+50
This patch implemented a new function named pm_nl_show_endpoints(), wrapped the PM netlink commands 'ip mptcp endpoint show' and 'pm_nl_ctl dump' in it, used a new argument 'ip_mptcp' to choose which one to use to show all the PM endpoints. Used this wrapper in do_transfer() instead of using the pm_nl_ctl commands directly. The original 'pos+=5' in the remoing tests only works for the output of 'pm_nl_ctl show': id 1 flags subflow 10.0.1.1 It doesn't work for the output of 'ip mptcp endpoint show': 10.0.1.1 id 1 subflow So implemented a more flexible approach to get the address ID from the PM dump output to fit for both commands. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05selftests: mptcp: add ip mptcp wrappersGeliang Tang1-330/+407
This patch added four basic 'ip mptcp' wrappers: pm_nl_set_limits() pm_nl_add_endpoint() pm_nl_del_endpoint() pm_nl_flush_endpoint(). Wrapped the PM netlink commands 'ip mptcp' and 'pm_nl_ctl' in them, and used a new argument 'ip_mptcp' to choose which one to use for setting the PM limits, adding or deleting the PM endpoint. Used the wrappers in all the selftests in mptcp_join.sh instead of using the pm_nl_ctl commands directly. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05selftests: mptcp: add backup with port testcaseGeliang Tang1-5/+39
This patch added the backup testcase using an address with a port number. The original backup tests only work for the output of 'pm_nl_ctl dump' without the port number. It chooses the last item in the dump to parse the address in it, and in this case, the address is showed at the end of the item. But it doesn't work for the dump with the port number, in this case, the port number is showed at the end of the item, not the address. So implemented a more flexible approach to get the address and the port number from the dump to fit for the port number case. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05selftests: mptcp: add the port argument for set_flagsGeliang Tang1-1/+13
This patch added the port argument for setting the address flags in pm_nl_ctl. Usage: pm_nl_ctl set 10.0.2.1 flags backup port 10100 Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05mptcp: allow to use port and non-signal in set_flagsGeliang Tang1-7/+6
It's illegal to use both port and non-signal flags for adding address. But it's legal to use both of them for setting flags, which always uses non-signal flags, backup or fullmesh. This patch moves this non-signal flag with port check from mptcp_pm_parse_addr() to mptcp_nl_cmd_add_addr(). Do the check only when adding addresses, not setting flags or deleting addresses. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05Merge branch 'support-for-the-ioam-insertion-frequency'Jakub Kicinski2-2/+66
Justin Iurman says: ==================== Support for the IOAM insertion frequency The insertion frequency is represented as "k/n", meaning IOAM will be added to {k} packets over {n} packets, with 0 < k <= n and 1 <= {k,n} <= 1000000. Therefore, it provides the following percentages of insertion frequency: [0.0001% (min) ... 100% (max)]. Not only this solution allows an operator to apply dynamic frequencies based on the current traffic load, but it also provides some flexibility, i.e., by distinguishing similar cases (e.g., "1/2" and "2/4"). "1/2" = Y N Y N Y N Y N ... "2/4" = Y Y N N Y Y N N ... ==================== Link: https://lore.kernel.org/r/20220202142554.9691-1-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05ipv6: ioam: Insertion frequency in lwtunnel outputJustin Iurman1-2/+57
Add support for the IOAM insertion frequency inside its lwtunnel output function. This patch introduces a new (atomic) counter for packets, based on which the algorithm will decide if IOAM should be added or not. Default frequency is "1/1" (i.e., applied to all packets) for backward compatibility. The iproute2 patch is ready and will be submitted as soon as this one is accepted. Previous iproute2 command: ip -6 ro ad fc00::1/128 encap ioam6 [ mode ... ] ... New iproute2 command: ip -6 ro ad fc00::1/128 encap ioam6 [ freq k/n ] [ mode ... ] ... Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05uapi: ioam: Insertion frequencyJustin Iurman1-0/+9
Add the insertion frequency uapi for IOAM lwtunnels. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05net: don't include ndisc.h from ipv6.hJakub Kicinski7-1/+6
Nothing in ipv6.h needs ndisc.h, drop it. Link: https://lore.kernel.org/r/20220203043457.2222388-1-kuba@kernel.org Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Link: https://lore.kernel.org/r/20220203231240.2297588-1-kuba@kernel.org Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04Merge branch 'ipa-RX-replenish'David S. Miller4-81/+60
Alex Elder says: ==================== net: ipa: improve RX buffer replenishing This series revises the algorithm used for replenishing receive buffers on RX endpoints. Currently there are two atomic variables that track how many receive buffers can be sent to the hardware. The new algorithm obviates the need for those, by just assuming we always want to provide the hardware with buffers until it can hold no more. The first patch eliminates an atomic variable that's not required. The next moves some code into the main replenish function's caller, making one of the called function's arguments unnecessary. The next six refactor things a bit more, adding a new helper function that allows us to eliminate an additional atomic variable. And the final two implement two more minor improvements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: determine replenish doorbell differentlyAlex Elder2-6/+10
Rather than tracking the number of receive buffer transactions that have been submitted without a doorbell, just track the total number of transactions that have been issued. Then ring the doorbell when that number modulo the replenish batch size is 0. The effect is roughly the same, but the new count is slightly more interesting, and this approach will someday allow the replenish batch size to be tuned at runtime. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: replenish after delivering payloadAlex Elder1-3/+3
Replenishing is now solely driven by whether transactions are available for a channel, and it doesn't really matter whether we replenish before or after we deliver received packets to the network stack. Replenishing before delivering the payload adds a little latency. Eliminate that by requesting a replenish after the payload is delivered. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: kill replenish_backlogAlex Elder2-9/+0
We no longer use the replenish_backlog atomic variable to decide when we've got work to do providing receive buffers to hardware. Basically, we try to keep the hardware as full as possible, all the time. We keep supplying buffers until the hardware has no more space for them. As a result, we can get rid of the replenish_backlog field and the atomic operations performed on it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: introduce gsi_channel_trans_idle()Alex Elder3-12/+26
Create a new function that returns true if all transactions for a channel are available for use. Use it in ipa_endpoint_replenish_enable() to see whether to start replenishing, and in ipa_endpoint_replenish() to determine whether it's necessary after a failure to schedule delayed work to ensure a future replenish attempt occurs. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: don't use replenish_backlogAlex Elder1-5/+2
Rather than determining when to stop replenishing using the replenish backlog, just stop when we have exhausted all available transactions. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: allocate transaction in replenish loopAlex Elder1-24/+16
When replenishing, have ipa_endpoint_replenish() allocate a transaction, and pass that to ipa_endpoint_replenish_one() to fill. Then, if that produces no error, commit the transaction within the replenish loop as well. In this way we can distinguish between transaction failures and buffer allocation/mapping failures. Failure to allocate a transaction simply means the hardware already has as many receive buffers as it can hold. In that case we can break out of the replenish loop because there's nothing more to do. If we fail to allocate or map pages for the receive buffer, just try again later. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: decide on doorbell in replenish loopAlex Elder1-9/+12
Decide whether the doorbell should be signaled when committing a replenish transaction in the main replenish loop, rather than in ipa_endpoint_replenish_one(). This is a step to facilitate the next patch. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: increment backlog in replenish callerAlex Elder1-20/+9
Three spots call ipa_endpoint_replenish(), and just one of those requests that the backlog be incremented after completing the replenish operation. Instead, have the caller increment the backlog, and get rid of the add_one argument to ipa_endpoint_replenish(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: allocate transaction before pages when replenishingAlex Elder1-8/+8
A transaction failure only occurs if no more transactions are available for an endpoint. It's a very cheap test. When replenishing an RX endpoint buffer, there's no point in allocating pages if transactions are exhausted. So don't bother doing so unless the transaction allocation succeeds. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: kill replenish_savedAlex Elder2-15/+4
The replenish_saved field keeps track of the number of times a new buffer is added to the backlog when replenishing is disabled. We don't really use it though, so there's no need for us to track it separately. Whether replenishing is enabled or not, we can simply increment the backlog. Get rid of replenish_saved, and initialize and increment the backlog where it would have otherwise been used. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04tls: cap the output scatter list to something reasonableJakub Kicinski2-1/+19
TLS recvmsg() passes user pages as destination for decrypt. The decrypt operation is repeated record by record, each record being 16kB, max. TLS allocates an sg_table and uses iov_iter_get_pages() to populate it with enough pages to fit the decrypted record. Even though we decrypt a single message at a time we size the sg_table based on the entire length of the iovec. This leads to unnecessarily large allocations, risking triggering OOM conditions. Use iov_iter_truncate() / iov_iter_reexpand() to construct a "capped" version of iov_iter_npages(). Alternatively we could parametrize iov_iter_npages() to take the size as arg instead of using i->count, or do something else.. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: dsa: realtek: convert to phylink_generic_validate()Russell King (Oracle)1-34/+12
Populate the supported interfaces and MAC capabilities for the Realtek rtl8365 DSA switch and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04Merge branch '40GbE' of ↵David S. Miller7-60/+255
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2022-02-03 This series contains updates to the i40e client header file and driver. Mateusz disables HW TC offload by default. Joe Damato removes a no longer used statistic. Jakub Kicinski removes an unused enum from the client header file. Jedrzej changes some admin queue commands to occur under atomic context and adds new functions for admin queue MAC VLAN filters to avoid a potential race that could occur due storing results in a structure that could be overwritten by the next admin queue call. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: lan966x: use .mac_select_pcs() interfaceHoratiu Vultur2-1/+9
Convert lan966x to use the mac_select_interface instead of phylink_set_pcs. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20220202114949.833075-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04selftests: rtnetlink: Use more sensible tos valuesGuillaume Nault1-2/+2
Using tos 0x1 with 'ip route get <IPv4 address> ...' doesn't test much of the tos option handling: 0x1 just sets an ECN bit, which is cleared by inet_rtm_getroute() before doing the fib lookup. Let's use 0x10 instead, which is actually taken into account in the route lookup (and is less surprising for the reader). For consistency, use 0x10 for the IPv6 route lookup too (IPv6 currently doesn't clear ECN bits, but might do so in the future). Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/d61119e68d01ba7ef3ba50c1345a5123a11de123.1643815297.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04selftests: fib offload: use sensible tos valuesGuillaume Nault1-6/+6
Although both iproute2 and the kernel accept 1 and 2 as tos values for new routes, those are invalid. These values only set ECN bits, which are ignored during IPv4 fib lookups. Therefore, no packet can actually match such routes. This selftest therefore only succeeds because it doesn't verify that the new routes do actually work in practice (it just checks if the routes are offloaded or not). It makes more sense to use tos values that don't conflict with ECN. This way, the selftest won't be affected if we later decide to warn or even reject invalid tos configurations for new routes. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/5e43b343720360a1c0e4f5947d9e917b26f30fbf.1643826556.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04net: minor __dev_alloc_name() optimizationEric Dumazet1-2/+2
__dev_alloc_name() allocates a private zeroed page, then sets bits in it while iterating through net devices. It can use __set_bit() to avoid unnecessary locked operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220203064609.3242863-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski405-2043/+4030
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04gcc-plugins/stackleak: Use noinstr in favor of notraceKees Cook1-3/+2
While the stackleak plugin was already using notrace, objtool is now a bit more picky. Update the notrace uses to noinstr. Silences the following objtool warnings when building with: CONFIG_DEBUG_ENTRY=y CONFIG_STACK_VALIDATION=y CONFIG_VMLINUX_VALIDATION=y CONFIG_GCC_PLUGIN_STACKLEAK=y vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section Note that the plugin's addition of calls to stackleak_track_stack() from noinstr functions is expected to be safe, as it isn't runtime instrumentation and is self-contained. Cc: Alexander Popov <alex.popov@linux.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-04Merge tag 'net-5.17-rc3' of ↵Linus Torvalds78-453/+877
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, netfilter, and ieee802154. Current release - regressions: - Partially revert "net/smc: Add netlink net namespace support", fix uABI breakage - netfilter: - nft_ct: fix use after free when attaching zone template - nft_byteorder: track register operations Previous releases - regressions: - ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback - phy: qca8081: fix speeds lower than 2.5Gb/s - sched: fix use-after-free in tc_new_tfilter() Previous releases - always broken: - tcp: fix mem under-charging with zerocopy sendmsg() - tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data() - neigh: do not trigger immediate probes on NUD_FAILED from neigh_managed_work, avoid a deadlock - bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN false-positives - netfilter: nft_reject_bridge: fix for missing reply from prerouting - smc: forward wakeup to smc socket waitqueue after fallback - ieee802154: - return meaningful error codes from the netlink helpers - mcr20a: fix lifs/sifs periods - at86rf230, ca8210: stop leaking skbs on error paths - macsec: add missing un-offload call for NETDEV_UNREGISTER of parent - ax25: add refcount in ax25_dev to avoid UAF bugs - eth: mlx5e: - fix SFP module EEPROM query - fix broken SKB allocation in HW-GRO - IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows - eth: amd-xgbe: - fix skb data length underflow - ensure reset of the tx_timer_active flag, avoid Tx timeouts - eth: stmmac: fix runtime pm use in stmmac_dvr_remove() - eth: e1000e: handshake with CSME starts from Alder Lake platforms" * tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) ax25: fix reference count leaks of ax25_dev net: stmmac: ensure PTP time register reads are consistent net: ipa: request IPA register values be retained dt-bindings: net: qcom,ipa: add optional qcom,qmp property tools/resolve_btfids: Do not print any commands when building silently bpf: Use VM_MAP instead of VM_ALLOC for ringbuf net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data() net: sparx5: do not refer to skb after passing it on Partially revert "net/smc: Add netlink net namespace support" net/mlx5e: Avoid field-overflowing memcpy() net/mlx5e: Use struct_group() for memcpy() region net/mlx5e: Avoid implicit modify hdr for decap drop rule net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic net/mlx5e: Don't treat small ceil values as unlimited in HTB offload net/mlx5: E-Switch, Fix uninitialized variable modact net/mlx5e: Fix handling of wrong devices during bond netevent net/mlx5e: Fix broken SKB allocation in HW-GRO net/mlx5e: Fix wrong calculation of header index in HW_GRO ...
2022-02-04Merge tag 'selinux-pr-20220203' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "One small SELinux patch to ensure that a policy structure field is properly reset after freeing so that we don't inadvertently do a double-free on certain error conditions" * tag 'selinux-pr-20220203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: fix double free of cond_list on error paths
2022-02-04Merge tag 'linux-kselftest-fixes-5.17-rc3' of ↵Linus Torvalds15-177/+209
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "Important fixes to several tests and documentation clarification on running mainline kselftest on stable releases. A few notable fixes: - fix kselftest run hang due to child processes that haven't been terminated. Fix signals all child processes - fix false pass/fail results from vdso_test_abi, openat2, mincore - build failures when using -j (multiple jobs) option - exec test build failure due to incorrect build rule for a run-time created "pipe" - zram test fixes related to interaction with zram-generator to make sure zram test to coordinate deleted with zram-generator - zram test compression ratio calculation fix and skipping max_comp_streams. - increasing rtc test timeout - cpufreq test to write test results to stdout which will necessary on automated test systems" * tag 'linux-kselftest-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kselftest: Fix vdso_test_abi return status selftests: skip mincore.check_file_mmap when fs lacks needed support selftests: openat2: Skip testcases that fail with EOPNOTSUPP selftests: openat2: Add missing dependency in Makefile selftests: openat2: Print also errno in failure messages selftests: futex: Use variable MAKE instead of make selftests/exec: Remove pipe from TEST_GEN_FILES selftests/zram: Adapt the situation that /dev/zram0 is being used selftests/zram01.sh: Fix compression ratio calculation selftests/zram: Skip max_comp_streams interface on newer kernel docs/kselftest: clarify running mainline tests on stables kselftest: signal all child processes selftests: cpufreq: Write test output to stdout as well selftests: rtc: Increase test timeout so that all tests run
2022-02-04ax25: fix reference count leaks of ax25_devDuoming Zhou4-19/+41
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") introduces refcount into ax25_dev, but there are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(), ax25_rt_add(), ax25_rt_del() and ax25_rt_opt(). This patch uses ax25_dev_put() and adjusts the position of ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev. Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04net: stmmac: ensure PTP time register reads are consistentYannick Vignon1-7/+12
Even if protected from preemption and interrupts, a small time window remains when the 2 register reads could return inconsistent values, each time the "seconds" register changes. This could lead to an about 1-second error in the reported time. Add logic to ensure the "seconds" and "nanoseconds" values are consistent. Fixes: 92ba6888510c ("stmmac: add the support for PTP hw clock driver") Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20220203160025.750632-1-yannick.vignon@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski7-236/+11
Daniel Borkmann says: ==================== pull-request: bpf 2022-02-03 We've added 6 non-merge commits during the last 10 day(s) which contain a total of 7 files changed, 11 insertions(+), 236 deletions(-). The main changes are: 1) Fix BPF ringbuf to allocate its area with VM_MAP instead of VM_ALLOC flag which otherwise trips over KASAN, from Hou Tao. 2) Fix unresolved symbol warning in resolve_btfids due to LSM callback rename, from Alexei Starovoitov. 3) Fix a possible race in inc_misses_counter() when IRQ would trigger during counter update, from He Fengqing. 4) Fix tooling infra for cross-building with clang upon probing whether gcc provides the standard libraries, from Jean-Philippe Brucker. 5) Fix silent mode build for resolve_btfids, from Nathan Chancellor. 6) Drop unneeded and outdated lirc.h header copy from tooling infra as BPF does not require it anymore, from Sean Young. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: tools/resolve_btfids: Do not print any commands when building silently bpf: Use VM_MAP instead of VM_ALLOC for ringbuf tools: Ignore errors from `which' when searching a GCC toolchain tools headers UAPI: remove stale lirc.h bpf: Fix possible race in inc_misses_counter bpf: Fix renaming task_getsecid_subj->current_getsecid_subj. ==================== Link: https://lore.kernel.org/r/20220203155815.25689-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04i40e: Fix race condition while adding/deleting MAC/VLAN filtersJedrzej Jagielski1-11/+13
There was a race condition in access to hw->aq.asq_last_status while adding and deleting MAC/VLAN filters causing incorrect error status to be printed as ERROR OK instead of the correct error. Change calls to i40e_aq_add_macvlan in i40e_aqc_add_filters and i40e_aq_remove_macvlan in i40e_aqc_del_filters to _v2 versions that return Admin Queue status on the stack to avoid race conditions in access to hw->aq.asq_last_status. Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-04i40e: Add new version of i40e_aq_add_macvlan functionJedrzej Jagielski2-20/+77
ASQ send command functions are returning only i40e status codes yet some calling functions also need Admin Queue status that is stored in hw->aq.asq_last_status. Since hw object is stored on a heap it introduces a possibility for a race condition in access to hw if calling function is not fast enough to read hw->aq.asq_last_status before next send ASQ command is executed. Add new _v2 version of i40e_aq_add_macvlan that is using new _v2 versions of ASQ send command functions and returns the Admin Queue status on the stack. Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>