summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2023-02-14ipv6: icmp6: add drop reason support to ndisc_rcv()Eric Dumazet2-7/+8
Creates three new drop reasons: SKB_DROP_REASON_IPV6_NDISC_FRAG: invalid frag (suppress_frag_ndisc). SKB_DROP_REASON_IPV6_NDISC_HOP_LIMIT: invalid hop limit. SKB_DROP_REASON_IPV6_NDISC_BAD_CODE: invalid NDISC icmp6 code. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14ipv6: icmp6: add drop reason support to icmpv6_notify()Eric Dumazet1-8/+17
Accurately reports what happened in icmpv6_notify() when handling a packet. This makes use of the new IPV6_BAD_EXTHDR drop reason. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-13net: ethtool: extend ringparam set/get APIs for rx_pushShannon Nelson2-3/+16
Similar to what was done for TX_PUSH, add an RX_PUSH concept to the ethtool interfaces. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13net/sched: fix error recovery in qdisc_create()Eric Dumazet1-7/+8
If TCA_STAB attribute is malformed, qdisc_get_stab() returns an error, and we end up calling ops->destroy() while ops->init() has not been called yet. While we are at it, call qdisc_put_stab() after ops->destroy(). Fixes: 1f62879e3632 ("net/sched: make stab available before ops->init() call") Reported-by: syzbot+d44d88f1d11e6ca8576b@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Cc: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: add forgotten devlink instance lock assertion to ↵Jiri Pirko1-0/+2
devl_param_driverinit_value_set() Driver calling devl_param_driverinit_value_set() has to hold devlink instance lock while doing that. Put an assertion there. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: allow to call devl_param_driverinit_value_get() without holding ↵Jiri Pirko1-2/+11
instance lock If the driver maintains following basic sane behavior, the devl_param_driverinit_value_get() function could be called without holding instance lock: 1) Driver ensures a call to devl_param_driverinit_value_get() cannot race with registering/unregistering the parameter with the same parameter ID. 2) Driver ensures a call to devl_param_driverinit_value_get() cannot race with devl_param_driverinit_value_set() call with the same parameter ID. 3) Driver ensures a call to devl_param_driverinit_value_get() cannot race with reload operation. By the nature of params usage, these requirements should be trivially achievable. If the driver for some off reason is not able to comply, it has to take the devlink->lock while calling devl_param_driverinit_value_get(). Remove the lock assertion and add comment describing the locking requirements. This fixes a splat in mlx5 driver introduced by the commit referenced in the "Fixes" tag. Lore: https://lore.kernel.org/netdev/719de4f0-76ac-e8b9-38a9-167ae239efc7@amd.com/ Reported-by: Kim Phillips <kim.phillips@amd.com> Fixes: 075935f0ae0f ("devlink: protect devlink param list by instance lock") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: convert param list to xarrayJiri Pirko3-39/+39
Loose the linked list for params and use xarray instead. Note that this is required to be eventually possible to call devl_param_driverinit_value_get() without holding instance lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: use xa_for_each_start() helper in devlink_nl_cmd_port_get_dump_one()Jiri Pirko1-8/+2
As xarray has an iterator helper that allows to start from specified index, use this directly and avoid repeated iteration from 0. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: fix the name of value arg of devl_param_driverinit_value_get()Jiri Pirko1-3/+4
Probably due to copy-paste error, the name of the arg is "init_val" which is misleading, as the pointer is used to point to struct where to store the current value. Rename it to "val" and change the arg comment a bit on the way. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: make sure driver does not read updated driverinit param before reloadJiri Pirko3-4/+28
The driverinit param purpose is to serve the driver during init/reload time to provide a value, either default or set by user. Make sure that driver does not read value updated by user before the reload is performed. Hold the new value in a separate struct and switch it during reload. Note that this is required to be eventually possible to call devl_param_driverinit_value_get() without holding instance lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: don't use strcpy() to copy param valueJiri Pirko1-12/+3
No need to treat string params any different comparing to other types. Rely on the struct assign to copy the whole struct, including the string. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13devlink: stop using NL_SET_ERR_MSG_MODJacob Keller2-70/+65
NL_SET_ERR_MSG_MOD inserts the KBUILD_MODNAME and a ':' before the actual extended error message. The devlink feature hasn't been able to be compiled as a module since commit f4b6bcc7002f ("net: devlink: turn devlink into a built-in"). Stop using NL_SET_ERR_MSG_MOD, and just use the base NL_SET_ERR_MSG. This aligns the extended error messages better with the NL_SET_ERR_MSG_ATTR messages as well. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-13rds: rds_rm_zerocopy_callback() correct order for list_add_tail()Pietro Borrello1-1/+1
rds_rm_zerocopy_callback() uses list_add_tail() with swapped arguments. This links the list head with the new entry, losing the references to the remaining part of the list. Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification") Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-11bridge: mcast: Move validation to a policyIdo Schimmel1-18/+27
Future patches are going to move parts of the bridge MDB code to the common rtnetlink code in preparation for VXLAN MDB support. To facilitate code sharing between both drivers, move the validation of the top level attributes in RTM_{NEW,DEL}MDB messages to a policy that will eventually be moved to the rtnetlink code. Use 'NLA_NESTED' for 'MDBA_SET_ENTRY_ATTRS' instead of NLA_POLICY_NESTED() as this attribute is going to be validated using different policies in the underlying drivers. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11bridge: mcast: Remove pointless sequence generation counter assignmentIdo Schimmel1-2/+0
The purpose of the sequence generation counter in the netlink callback is to identify if a multipart dump is consistent or not by calling nl_dump_check_consistent() whenever a message is generated. The function is not invoked by the MDB code, rendering the sequence generation counter assignment pointless. Remove it. Note that even if the function was invoked, we still could not accurately determine if the dump is consistent or not, as there is no sequence generation counter for MDB entries, unlike nexthop objects, for example. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11bridge: mcast: Use correct define in MDB dumpIdo Schimmel1-1/+1
'MDB_PG_FLAGS_PERMANENT' and 'MDB_PERMANENT' happen to have the same value, but the latter is uAPI and cannot change, so use it when dumping an MDB entry. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11Merge tag 'for-net-next-2023-02-09' of ↵Jakub Kicinski4-34/+33
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== pull-request: bluetooth-next - Add new PID/VID 0489:e0f2 for MT7921 - Add VID:PID 13d3:3529 for Realtek RTL8821CE - Add CIS feature bits to controller information - Set Per Platform Antenna Gain(PPAG) for Intel controllers * tag 'for-net-next-2023-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: Bluetooth: btintel: Set Per Platform Antenna Gain(PPAG) Bluetooth: Make sure LE create conn cancel is sent when timeout Bluetooth: Free potentially unfreed SCO connection Bluetooth: hci_qca: get wakeup status from serdev device handle Bluetooth: L2CAP: Fix potential user-after-free Bluetooth: MGMT: add CIS feature bits to controller information Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeeds Bluetooth: HCI: Replace zero-length arrays with flexible-array members Bluetooth: qca: Fix sparse warnings Bluetooth: btusb: Add VID:PID 13d3:3529 for Realtek RTL8821CE Bluetooth: btusb: Add new PID/VID 0489:e0f2 for MT7921 Bluetooth: Fix issue with Actions Semi ATS2851 based devices ==================== Link: https://lore.kernel.org/r/20230209234922.3756173-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11Daniel Borkmann says:Jakub Kicinski16-84/+370
==================== pull-request: bpf-next 2023-02-11 We've added 96 non-merge commits during the last 14 day(s) which contain a total of 152 files changed, 4884 insertions(+), 962 deletions(-). There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c between commit 5b246e533d01 ("ice: split probe into smaller functions") from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev() is otherwise there a 2nd time, and add XDP features to the existing ice_cfg_netdev() one: [...] ice_set_netdev_features(netdev); netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); [...] Stephen's merge conflict mail: https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/ The main changes are: 1) Add support for BPF trampoline on s390x which finally allows to remove many test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich. 2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski. 3) Add capability to export the XDP features supported by the NIC. Along with that, add a XDP compliance test tool, from Lorenzo Bianconi & Marek Majtyka. 4) Add __bpf_kfunc tag for marking kernel functions as kfuncs, from David Vernet. 5) Add a deep dive documentation about the verifier's register liveness tracking algorithm, from Eduard Zingerman. 6) Fix and follow-up cleanups for resolve_btfids to be compiled as a host program to avoid cross compile issues, from Jiri Olsa & Ian Rogers. 7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted when testing on different NICs, from Jesper Dangaard Brouer. 8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang. 9) Extend libbpf to add an option for when the perf buffer should wake up, from Jon Doron. 10) Follow-up fix on xdp_metadata selftest to just consume on TX completion, from Stanislav Fomichev. 11) Extend the kfuncs.rst document with description on kfunc lifecycle & stability expectations, from David Vernet. 12) Fix bpftool prog profile to skip attaching to offline CPUs, from Tonghao Zhang. ==================== Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11net: extract nf_ct_handle_fragments to nf_conntrack_ovsXin Long3-88/+51
Now handle_fragments() in OVS and TC have the similar code, and this patch removes the duplicate code by moving the function to nf_conntrack_ovs. Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be done only when defrag returns 0, as it does in other places in kernel. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11net: sched: move frag check and tc_skb_cb update out of handle_fragmentsXin Long1-32/+39
This patch has no functional changes and just moves frag check and tc_skb_cb update out of handle_fragments, to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11openvswitch: move key and ovs_cb update out of handle_fragmentsXin Long1-12/+25
This patch has no functional changes and just moves key and ovs_cb update out of handle_fragments, and skb_clear_hash() and skb->ignore_df change into handle_fragments(), to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Note that it changes to pass info->family to handle_fragments() instead of key for the packet type check, as info->family is set according to key->eth.type in ovs_ct_copy_action() when creating the action. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11net: extract nf_ct_skb_network_trim function to nf_conntrack_ovsXin Long3-58/+31
There are almost the same code in ovs_skb_network_trim() and tcf_ct_skb_network_trim(), this patch extracts them into a function nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11net: create nf_conntrack_ovs for ovs and tc useXin Long6-98/+110
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack_ovs to get these functions shared by OVS and TC only. There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper in this patch, and will be more in the following patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: skbuff: drop the word head from skb cacheJakub Kicinski3-20/+18
skbuff_head_cache is misnamed (perhaps for historical reasons?) because it does not hold heads. Head is the buffer which skb->data points to, and also where shinfo lives. struct sk_buff is a metadata structure, not the head. Eric recently added skb_small_head_cache (which allocates actual head buffers), let that serve as an excuse to finally clean this up :) Leave the user-space visible name intact, it could possibly be uAPI. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10Merge tag 'rxrpc-next-20230208' of ↵David S. Miller5-9/+15
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc development Here are some miscellaneous changes for rxrpc: (1) Use consume_skb() rather than kfree_skb_reason(). (2) Fix unnecessary waking when poking and already-poked call. (3) Add ack.rwind to the rxrpc_tx_ack tracepoint as this indicates how many incoming DATA packets we're telling the peer that we are currently willing to accept on this call. (4) Reduce duplicate ACK transmission. We send ACKs to let the peer know that we're increasing the receive window (ack.rwind) as we consume packets locally. Normal ACK transmission is triggered in three places and that leads to duplicates being sent. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10openvswitch: Use string_is_terminated() helperAndy Shevchenko1-2/+3
Use string_is_terminated() helper instead of cpecific memchr() call. This shows better the intention of the call. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-3-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10genetlink: Use string_is_terminated() helperAndy Shevchenko1-2/+2
Use string_is_terminated() helper instead of cpecific memchr() call. This shows better the intention of the call. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10string_helpers: Move string_is_valid() to the headerAndy Shevchenko1-10/+6
Move string_is_valid() to the header for wider use. While at it, rename to string_is_terminated() to be precise about its semantics. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: introduce default_rps_mask netns attributePaolo Abeni2-1/+43
If RPS is enabled, this allows configuring a default rps mask, which is effective since receive queue creation time. A default RPS mask allows the system admin to ensure proper isolation, avoiding races at network namespace or device creation time. The default RPS mask is initially empty, and can be modified via a newly added sysctl entry. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net-sysctl: factor-out rpm mask manipulation helpersPaolo Abeni2-30/+44
Will simplify the following patch. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net-sysctl: factor out cpumask parsing helperPaolo Abeni1-18/+28
Will be used by the following patch to avoid code duplication. No functional changes intended. The only difference is that now flow_limit_cpu_sysctl() will always compute the flow limit mask on each read operation, even when read() will not return any byte to user-space. Note that the new helper is placed under a new #ifdef at the file start to better fit the usage in the later patch Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Bluetooth: Make sure LE create conn cancel is sent when timeoutArchie Pusaka1-0/+2
When sending LE create conn command, we set a timer with a duration of HCI_LE_CONN_TIMEOUT before timing out and calling create_le_conn_complete. Additionally, when receiving the command complete, we also set a timer with the same duration to call le_conn_timeout. Usually the latter will be triggered first, which then sends a LE create conn cancel command. However, due to the nature of racing, it is possible for the former to be called first, thereby calling the chain hci_conn_failed -> hci_conn_del -> cancel_delayed_work, thereby preventing LE create conn cancel to be sent. In this situation, the controller will be stuck in trying the LE connection. This patch flushes le_conn_timeout on create_le_conn_complete to make sure we always send LE create connection cancel, if necessary. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-02-10Bluetooth: Free potentially unfreed SCO connectionArchie Pusaka1-1/+8
It is possible to initiate a SCO connection while deleting the corresponding ACL connection, e.g. in below scenario: (1) < hci setup sync connect command (2) > hci disconn complete event (for the acl connection) (3) > hci command complete event (for(1), failure) When it happens, hci_cs_setup_sync_conn won't be able to obtain the reference to the SCO connection, so it will be stuck and potentially hinder subsequent connections to the same device. This patch prevents that by also deleting the SCO connection if it is still not established when the corresponding ACL connection is deleted. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-02-10Bluetooth: L2CAP: Fix potential user-after-freeLuiz Augusto von Dentz2-24/+8
This fixes all instances of which requires to allocate a buffer calling alloc_skb which may release the chan lock and reacquire later which makes it possible that the chan is disconnected in the meantime. Fixes: a6a5568c03c4 ("Bluetooth: Lock the L2CAP channel when sending") Reported-by: Alexander Coffin <alex.coffin@matician.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-02-10Bluetooth: MGMT: add CIS feature bits to controller informationPauli Virtanen1-0/+12
Userspace needs to know whether the adapter has feature support for Connected Isochronous Stream - Central/Peripheral, so it can set up LE Audio features accordingly. Expose these feature bits as settings in MGMT controller info. Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-02-10Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeedsKees Cook1-9/+3
The compiler thinks "conn" might be NULL after a call to hci_bind_bis(), which cannot happen. Avoid any confusion by just making it not return a value since it cannot fail. Fixes the warnings seen with GCC 13: In function 'arch_atomic_dec_and_test', inlined from 'atomic_dec_and_test' at ../include/linux/atomic/atomic-instrumented.h:576:9, inlined from 'hci_conn_drop' at ../include/net/bluetooth/hci_core.h:1391:6, inlined from 'hci_connect_bis' at ../net/bluetooth/hci_conn.c:2124:3: ../arch/x86/include/asm/rmwcc.h:37:9: warning: array subscript 0 is outside array bounds of 'atomic_t[0]' [-Warray-bounds=] 37 | asm volatile (fullop CC_SET(cc) \ | ^~~ ... In function 'hci_connect_bis': cc1: note: source object is likely at address zero Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections") Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski19-42/+176
net/devlink/leftover.c / net/core/devlink.c: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global") f05bd8ebeb69 ("devlink: move code to a dedicated directory") 687125b5799c ("devlink: split out core code") https://lore.kernel.org/all/20230208094657.379f2b1a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net: enable usercopy for skb_small_head_cacheEric Dumazet1-1/+7
syzbot and other bots reported that we have to enable user copy to/from skb->head. [1] We can prevent access to skb_shared_info, which is a nice improvement over standard kmem_cache. Layout of these kmem_cache objects is: < SKB_SMALL_HEAD_HEADROOM >< struct skb_shared_info > usercopy: Kernel memory overwrite attempt detected to SLUB object 'skbuff_small_head' (offset 32, size 20)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102 ! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6-syzkaller-01425-gcb6b2e11a42d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 RIP: 0010:usercopy_abort+0xbd/0xbf mm/usercopy.c:102 Code: e8 ee ad ba f7 49 89 d9 4d 89 e8 4c 89 e1 41 56 48 89 ee 48 c7 c7 20 2b 5b 8a ff 74 24 08 41 57 48 8b 54 24 20 e8 7a 17 fe ff <0f> 0b e8 c2 ad ba f7 e8 7d fb 08 f8 48 8b 0c 24 49 89 d8 44 89 ea RSP: 0000:ffffc90000067a48 EFLAGS: 00010286 RAX: 000000000000006b RBX: ffffffff8b5b6ea0 RCX: 0000000000000000 RDX: ffff8881401c0000 RSI: ffffffff8166195c RDI: fffff5200000cf3b RBP: ffffffff8a5b2a60 R08: 000000000000006b R09: 0000000000000000 R10: 0000000080000000 R11: 0000000000000000 R12: ffffffff8bf2a925 R13: ffffffff8a5b29a0 R14: 0000000000000014 R15: ffffffff8a5b2960 FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000c48e000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __check_heap_object+0xdd/0x110 mm/slub.c:4761 check_heap_object mm/usercopy.c:196 [inline] __check_object_size mm/usercopy.c:251 [inline] __check_object_size+0x1da/0x5a0 mm/usercopy.c:213 check_object_size include/linux/thread_info.h:199 [inline] check_copy_size include/linux/thread_info.h:235 [inline] copy_from_iter include/linux/uio.h:186 [inline] copy_from_iter_full include/linux/uio.h:194 [inline] memcpy_from_msg include/linux/skbuff.h:3977 [inline] qrtr_sendmsg+0x65f/0x970 net/qrtr/af_qrtr.c:965 sock_sendmsg_nosec net/socket.c:722 [inline] sock_sendmsg+0xde/0x190 net/socket.c:745 say_hello+0xf6/0x170 net/qrtr/ns.c:325 qrtr_ns_init+0x220/0x2b0 net/qrtr/ns.c:804 qrtr_proto_init+0x59/0x95 net/qrtr/af_qrtr.c:1296 do_one_initcall+0x141/0x790 init/main.c:1306 do_initcall_level init/main.c:1379 [inline] do_initcalls init/main.c:1395 [inline] do_basic_setup init/main.c:1414 [inline] kernel_init_freeable+0x6f9/0x782 init/main.c:1634 kernel_init+0x1e/0x1d0 init/main.c:1522 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> Fixes: bf9f1baa279f ("net: add dedicated kmem_cache for typical/small skb->head") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/linux-next/CA+G9fYs-i-c2KTSA7Ai4ES_ZESY1ZnM=Zuo8P1jN00oed6KHMA@mail.gmail.com Link: https://lore.kernel.org/r/20230208142508.3278406-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09rds: rds_rm_zerocopy_callback() use list_first_entry()Pietro Borrello1-3/+3
rds_rm_zerocopy_callback() uses list_entry() on the head of a list causing a type confusion. Use list_first_entry() to actually access the first element of the rs_zcookie_queue list. Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification") Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Link: https://lore.kernel.org/r/20230202-rds-zerocopy-v3-1-83b0df974f9a@diag.uniroma1.it Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-09Merge tag 'ipsec-2023-02-08' of ↵Jakub Kicinski5-20/+73
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== ipsec 2023-02-08 1) Fix policy checks for nested IPsec tunnels when using xfrm interfaces. From Benedict Wong. 2) Fix netlink message expression on 32=>64-bit messages translators. From Anastasia Belova. 3) Prevent potential spectre v1 gadget in xfrm_xlate32_attr. From Eric Dumazet. 4) Always consistently use time64_t in xfrm_timer_handler. From Eric Dumazet. 5) Fix KCSAN reported bug: Multiple cpus can update use_time at the same time. From Eric Dumazet. 6) Fix SCP copy from IPv4 to IPv6 on interfamily tunnel. From Christian Hopps. * tag 'ipsec-2023-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix bug with DSCP copy to v6 from v4 tunnel xfrm: annotate data-race around use_time xfrm: consistently use time64_t in xfrm_timer_handler() xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr() xfrm: compat: change expression for switch in xfrm_xlate64 Fix XFRM-I support for nested ESP tunnels ==================== Link: https://lore.kernel.org/r/20230208114322.266510-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08can: raw: use temp variable instead of rolling back configOliver Hartkopp1-5/+6
Introduce a temporary variable to check for an invalid configuration attempt from user space. Before this patch the value was copied to the real config variable and rolled back in the case of an error. Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230203090807.97100-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-08net/sched: taprio: don't segment unnecessarilyVladimir Oltean1-11/+20
Improve commit 497cc00224cf ("taprio: Handle short intervals and large packets") to only perform segmentation when skb->len exceeds what taprio_dequeue() expects. In practice, this will make the biggest difference when a traffic class gate is always open in the schedule. This is because the max_frm_len will be U32_MAX, and such large skb->len values as Kurt reported will be sent just fine unsegmented. What I don't seem to know how to handle is how to make sure that the segmented skbs themselves are smaller than the maximum frame size given by the current queueMaxSDU[tc]. Nonetheless, we still need to drop those, otherwise the Qdisc will hang. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net/sched: taprio: split segmentation logic from qdisc_enqueue()Vladimir Oltean1-30/+36
The majority of the taprio_enqueue()'s function is spent doing TCP segmentation, which doesn't look right to me. Compilers shouldn't have a problem in inlining code no matter how we write it, so move the segmentation logic to a separate function. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net/sched: taprio: automatically calculate queueMaxSDU based on TC gate ↵Vladimir Oltean1-10/+60
durations taprio today has a huge problem with small TC gate durations, because it might accept packets in taprio_enqueue() which will never be sent by taprio_dequeue(). Since not much infrastructure was available, a kludge was added in commit 497cc00224cf ("taprio: Handle short intervals and large packets"), which segmented large TCP segments, but the fact of the matter is that the issue isn't specific to large TCP segments (and even worse, the performance penalty in segmenting those is absolutely huge). In commit a54fc09e4cba ("net/sched: taprio: allow user input of per-tc max SDU"), taprio gained support for queueMaxSDU, which is precisely the mechanism through which packets should be dropped at qdisc_enqueue() if they cannot be sent. After that patch, it was necessary for the user to manually limit the maximum MTU per TC. This change adds the necessary logic for taprio to further limit the values specified (or not specified) by the user to some minimum values which never allow oversized packets to be sent. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net/sched: keep the max_frm_len information inside struct sched_gate_listVladimir Oltean1-13/+40
I have one practical reason for doing this and one concerning correctness. The practical reason has to do with a follow-up patch, which aims to mix 2 sources of max_sdu (one coming from the user and the other automatically calculated based on TC gate durations @current link speed). Among those 2 sources of input, we must always select the smaller max_sdu value, but this can change at various link speeds. So the max_sdu coming from the user must be kept separated from the value that is operationally used (the minimum of the 2), because otherwise we overwrite it and forget what the user asked us to do. To solve that, this patch proposes that struct sched_gate_list contains the operationally active max_frm_len, and q->max_sdu contains just what was requested by the user. The reason having to do with correctness is based on the following observation: the admin sched_gate_list becomes operational at a given base_time in the future. Until then, it is inactive and applies no shaping, all gates are open, etc. So the queueMaxSDU dropping shouldn't apply either (this is a mechanism to ensure that packets smaller than the largest gate duration for that TC don't hang the port; clearly it makes little sense if the gates are always open). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net/sched: taprio: warn about missing size tableVladimir Oltean1-0/+5
Vinicius intended taprio to take the L1 overhead into account when estimating packet transmission time through user input, specifically through the qdisc size table (man tc-stab). Something like this: tc qdisc replace dev $eth root stab overhead 24 taprio \ num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 \ sched-entry S 0x7e 9000000 \ sched-entry S 0x82 1000000 \ max-sdu 0 0 0 0 0 0 0 200 \ flags 0x0 clockid CLOCK_TAI Without the overhead being specified, transmission times will be underestimated and will cause late transmissions. For an offloading driver, it might even cause TX hangs if there is no open gate large enough to send the maximum sized packets for that TC (including L1 overhead). Properly knowing the L1 overhead will ensure that we are able to auto-calculate the queueMaxSDU per traffic class just right, and avoid these hangs due to head-of-line blocking. We can't make the stab mandatory due to existing setups, but we can warn the user that it's important with a warning netlink extack. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220505160357.298794-1-vladimir.oltean@nxp.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net/sched: make stab available before ops->init() callVladimir Oltean1-18/+11
Some qdiscs like taprio turn out to be actually pretty reliant on a well configured stab, to not underestimate the skb transmission time (by properly accounting for L1 overhead). In a future change, taprio will need the stab, if configured by the user, to be available at ops->init() time. It will become even more important in upcoming work, when the overhead will be used for the queueMaxSDU calculation that is passed to an offloading driver. However, rcu_assign_pointer(sch->stab, stab) is called right after ops->init(), making it unavailable, and I don't really see a good reason for that. Move it earlier, which nicely seems to simplify the error handling path as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net/sched: taprio: calculate guard band against actual TC gate close timeVladimir Oltean1-6/+34
taprio_dequeue_from_txq() looks at the entry->end_time to determine whether the skb will overrun its traffic class gate, as if at the end of the schedule entry there surely is a "gate close" event for it. Hint: maybe there isn't. For each schedule entry, introduce an array of kernel times which actually tracks when in the future will there be an *actual* gate close event for that traffic class, and use that in the guard band overrun calculation. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net/sched: taprio: calculate budgets per traffic classVladimir Oltean1-8/+46
Currently taprio assumes that the budget for a traffic class expires at the end of the current interval as if the next interval contains a "gate close" event for this traffic class. This is, however, an unfounded assumption. Allow schedule entry intervals to be fused together for a particular traffic class by calculating the budget until the gate *actually* closes. This means we need to keep budgets per traffic class, and we also need to update the budget consumption procedure. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net/sched: taprio: rename close_time to end_timeVladimir Oltean1-26/+26
There is a confusion in terms in taprio which makes what is called "close_time" to be actually used for 2 things: 1. determining when an entry "closes" such that transmitted skbs are never allowed to overrun that time (?!) 2. an aid for determining when to advance and/or restart the schedule using the hrtimer It makes more sense to call this so-called "close_time" "end_time", because it's not clear at all to me what "closes". Future patches will hopefully make better use of the term "to close". This is an absolutely mechanical change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>