summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2023-09-26wifi: cfg80211: avoid leaking stack data into traceBenjamin Berg1-1/+1
If the structure is not initialized then boolean types might be copied into the tracing data without being initialised. This causes data from the stack to leak into the trace and also triggers a UBSAN failure which can easily be avoided here. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://lore.kernel.org/r/20230925171855.a9271ef53b05.I8180bae663984c91a3e036b87f36a640ba409817@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25wifi: mac80211: allow transmitting EAPOL frames with tainted keyWen Gong1-1/+2
Lower layer device driver stop/wake TX by calling ieee80211_stop_queue()/ ieee80211_wake_queue() while hw scan. Sometimes hw scan and PTK rekey are running in parallel, when M4 sent from wpa_supplicant arrive while the TX queue is stopped, then the M4 will pending send, and then new key install from wpa_supplicant. After TX queue wake up by lower layer device driver, the M4 will be dropped by below call stack. When key install started, the current key flag is set KEY_FLAG_TAINTED in ieee80211_pairwise_rekey(), and then mac80211 wait key install complete by lower layer device driver. Meanwhile ieee80211_tx_h_select_key() will return TX_DROP for the M4 in step 12 below, and then ieee80211_free_txskb() called by ieee80211_tx_dequeue(), so the M4 will not send and free, then the rekey process failed becaue AP not receive M4. Please see details in steps below. There are a interval between KEY_FLAG_TAINTED set for current key flag and install key complete by lower layer device driver, the KEY_FLAG_TAINTED is set in this interval, all packet including M4 will be dropped in this interval, the interval is step 8~13 as below. issue steps: TX thread install key thread 1. stop_queue -idle- 2. sending M4 -idle- 3. M4 pending -idle- 4. -idle- starting install key from wpa_supplicant 5. -idle- =>ieee80211_key_replace() 6. -idle- =>ieee80211_pairwise_rekey() and set currently key->flags |= KEY_FLAG_TAINTED 7. -idle- =>ieee80211_key_enable_hw_accel() 8. -idle- =>drv_set_key() and waiting key install complete from lower layer device driver 9. wake_queue -waiting state- 10. re-sending M4 -waiting state- 11. =>ieee80211_tx_h_select_key() -waiting state- 12. drop M4 by KEY_FLAG_TAINTED -waiting state- 13. -idle- install key complete with success/fail success: clear flag KEY_FLAG_TAINTED fail: start disconnect Hence add check in step 11 above to allow the EAPOL send out in the interval. If lower layer device driver use the old key/cipher to encrypt the M4, then AP received/decrypt M4 correctly, after M4 send out, lower layer device driver install the new key/cipher to hardware and return success. If lower layer device driver use new key/cipher to send the M4, then AP will/should drop the M4, then it is same result with this issue, AP will/ should kick out station as well as this issue. issue log: kworker/u16:4-5238 [000] 6456.108926: stop_queue: phy1 queue:0, reason:0 wpa_supplicant-961 [003] 6456.119737: rdev_tx_control_port: wiphy_name=phy1 name=wlan0 ifindex=6 dest=ARRAY[9e, 05, 31, 20, 9b, d0] proto=36488 unencrypted=0 wpa_supplicant-961 [003] 6456.119839: rdev_return_int_cookie: phy1, returned 0, cookie: 504 wpa_supplicant-961 [003] 6456.120287: rdev_add_key: phy1, netdev:wlan0(6), key_index: 0, mode: 0, pairwise: true, mac addr: 9e:05:31:20:9b:d0 wpa_supplicant-961 [003] 6456.120453: drv_set_key: phy1 vif:wlan0(2) sta:9e:05:31:20:9b:d0 cipher:0xfac04, flags=0x9, keyidx=0, hw_key_idx=0 kworker/u16:9-3829 [001] 6456.168240: wake_queue: phy1 queue:0, reason:0 kworker/u16:9-3829 [001] 6456.168255: drv_wake_tx_queue: phy1 vif:wlan0(2) sta:9e:05:31:20:9b:d0 ac:0 tid:7 kworker/u16:9-3829 [001] 6456.168305: cfg80211_control_port_tx_status: wdev(1), cookie: 504, ack: false wpa_supplicant-961 [003] 6459.167982: drv_return_int: phy1 - -110 issue call stack: nl80211_frame_tx_status+0x230/0x340 [cfg80211] cfg80211_control_port_tx_status+0x1c/0x28 [cfg80211] ieee80211_report_used_skb+0x374/0x3e8 [mac80211] ieee80211_free_txskb+0x24/0x40 [mac80211] ieee80211_tx_dequeue+0x644/0x954 [mac80211] ath10k_mac_tx_push_txq+0xac/0x238 [ath10k_core] ath10k_mac_op_wake_tx_queue+0xac/0xe0 [ath10k_core] drv_wake_tx_queue+0x80/0x168 [mac80211] __ieee80211_wake_txqs+0xe8/0x1c8 [mac80211] _ieee80211_wake_txqs+0xb4/0x120 [mac80211] ieee80211_wake_txqs+0x48/0x80 [mac80211] tasklet_action_common+0xa8/0x254 tasklet_action+0x2c/0x38 __do_softirq+0xdc/0x384 Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Link: https://lore.kernel.org/r/20230801064751.25803-1-quic_wgong@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25wifi: mac80211: work around Cisco AP 9115 VHT MPDU lengthJohannes Berg6-7/+44
Cisco AP module 9115 with FW 17.3 has a bug and sends a too large maximum MPDU length in the association response (indicating 12k) that it cannot actually process. Work around that by taking the minimum between what's in the association response and the BSS elements (from beacon or probe response). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230918140607.d1966a9a532e.I090225babb7cd4d1081ee9acd40e7de7e41c15ae@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25wifi: cfg80211: Fix 6GHz scan configurationIlan Peer1-0/+4
When the scan request includes a non broadcast BSSID, when adding the scan parameters for 6GHz collocated scanning, do not include entries that do not match the given BSSID. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230918140607.6d31d2a96baf.I6c4e3e3075d1d1878ee41f45190fdc6b86f18708@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25wifi: mac80211: fix potential key leakJohannes Berg1-5/+15
When returning from ieee80211_key_link(), the key needs to have been freed or successfully installed. This was missed in a number of error paths, fix it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25wifi: mac80211: fix potential key use-after-freeJohannes Berg2-1/+4
When ieee80211_key_link() is called by ieee80211_gtk_rekey_add() but returns 0 due to KRACK protection (identical key reinstall), ieee80211_gtk_rekey_add() will still return a pointer into the key, in a potential use-after-free. This normally doesn't happen since it's only called by iwlwifi in case of WoWLAN rekey offload which has its own KRACK protection, but still better to fix, do that by returning an error code and converting that to success on the cfg80211 boundary only, leaving the error for bad callers of ieee80211_gtk_rekey_add(). Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: fdf7cb4185b6 ("mac80211: accept key reinstall without changing anything") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-18rfkill: sync before userspace visibility/changesJohannes Berg1-6/+26
If userspace quickly opens /dev/rfkill after a new instance was created, it might see the old state of the instance from before the sync work runs and may even _change_ the state, only to have the sync work change it again. Fix this by doing the sync inline where needed, not just for /dev/rfkill but also for sysfs. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13wifi: mac80211: fix mesh id corruption on 32 bit systemsFelix Fietkau2-5/+5
Since the changed field size was increased to u64, mesh_bss_info_changed pulls invalid bits from the first 3 bytes of the mesh id, clears them, and passes them on to ieee80211_link_info_change_notify, because ifmsh->mbss_changed was not updated to match its size. Fix this by turning into ifmsh->mbss_changed into an unsigned long array with 64 bit size. Fixes: 15ddba5f4311 ("wifi: mac80211: consistently use u64 for BSS changes") Reported-by: Thomas Hühn <thomas.huehn@hs-nordhausen.de> Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230913050134.53536-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11wifi: cfg80211: fix cqm_config access raceJohannes Berg3-41/+73
Max Schulze reports crashes with brcmfmac. The reason seems to be a race between userspace removing the CQM config and the driver calling cfg80211_cqm_rssi_notify(), where if the data is freed while cfg80211_cqm_rssi_notify() runs it will crash since it assumes wdev->cqm_config is set. This can't be fixed with a simple non-NULL check since there's nothing we can do for locking easily, so use RCU instead to protect the pointer, but that requires pulling the updates out into an asynchronous worker so they can sleep and call back into the driver. Since we need to change the free anyway, also change it to go back to the old settings if changing the settings fails. Reported-and-tested-by: Max Schulze <max.schulze@online.de> Closes: https://lore.kernel.org/r/ac96309a-8d8d-4435-36e6-6d152eb31876@online.de Fixes: 4a4b8169501b ("cfg80211: Accept multiple RSSI thresholds for CQM") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11wifi: cfg80211: validate AP phy operation before starting itAditya Kumar Singh1-0/+19
Many regulatories can have HE/EHT Operation as not permitted. In such cases, AP should not be allowed to start if it is using a channel having the no operation flag set. However, currently there is no such check in place. Fix this issue by validating such IEs sent during start AP against the channel flags. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230905064857.1503-1-quic_adisi@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11wifi: cfg80211/mac80211: hold link BSSes when assoc fails for MLO connectionWen Gong2-6/+8
When connect to MLO AP with more than one link, and the assoc response of AP is not success, then cfg80211_unhold_bss() is not called for all the links' cfg80211_bss except the primary link which means the link used by the latest successful association request. Thus the hold value of the cfg80211_bss is not reset to 0 after the assoc fail, and then the __cfg80211_unlink_bss() will not be called for the cfg80211_bss by __cfg80211_bss_expire(). Then the AP always looks exist even the AP is shutdown or reconfigured to another type, then it will lead error while connecting it again. The detail info are as below. When connect with muti-links AP, cfg80211_hold_bss() is called by cfg80211_mlme_assoc() for each cfg80211_bss of all the links. When assoc response from AP is not success(such as status_code==1), the ieee80211_link_data of non-primary link(sdata->link[link_id]) is NULL because ieee80211_assoc_success()->ieee80211_vif_update_links() is not called for the links. Then struct cfg80211_rx_assoc_resp resp in cfg80211_rx_assoc_resp() and struct cfg80211_connect_resp_params cr in __cfg80211_connect_result() will only have the data of the primary link, and finally function cfg80211_connect_result_release_bsses() only call cfg80211_unhold_bss() for the primary link. Then cfg80211_bss of the other links will never free because its hold is always > 0 now. Hence assign value for the bss and status from assoc_data since it is valid for this case. Also assign value of addr from assoc_data when the link is NULL because the addrs of assoc_data and link both represent the local link addr and they are same value for success connection. Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Link: https://lore.kernel.org/r/20230825070055.28164-1-quic_wgong@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-29Merge tag 'net-next-6.6' of ↵Linus Torvalds289-13709/+16734
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations - Store netdevs in an xarray, to simplify iterating over netdevs - Refactor nexthop selection for multipath routes - Improve sched class lifetime handling - Add backup nexthop ID support for bridge - Implement drop reasons support in openvswitch - Several data races annotations and fixes - Constify the sk parameter of routing functions - Prepend kernel version to netconsole message Protocols: - Implement support for TCP probing the peer being under memory pressure - Remove hard coded limitation on IPv6 specific info placement inside the socket struct - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor - Scaling-up the IPv6 expired route GC via a separated list of expiring routes - In-kernel support for the TLS alert protocol - Better support for UDP reuseport with connected sockets - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size - Get rid of additional ancillary per MPTCP connection struct socket - Implement support for BPF-based MPTCP packet schedulers - Format MPTCP subtests selftests results in TAP - Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation BPF: - Multi-buffer support in AF_XDP - Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds - Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it - Add SO_REUSEPORT support for TC bpf_sk_assign - Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64 - Support defragmenting IPv(4|6) packets in BPF - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling - Introduce bpf map element count and enable it for all program types - Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress - Add uprobe support for the bpf_get_func_ip helper - Check skb ownership against full socket - Support for up to 12 arguments in BPF trampoline - Extend link_info for kprobe_multi and perf_event links Netfilter: - Speed-up process exit by aborting ruleset validation if a fatal signal is pending - Allow NLA_POLICY_MASK to be used with BE16/BE32 types Driver API: - Page pool optimizations, to improve data locality and cache usage - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers - Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info - Extend and use the yaml devlink specs to [re]generate the split ops - Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes - Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool - Remove phylink legacy mode support - Support offload LED blinking to phy - Add devlink port function attributes for IPsec New hardware / drivers: - Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy - WiFi: - MediaTek mt7981 support - Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips - Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925 Drivers: - Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution - Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers - Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support - Ethernet PHYs: - convert mv88e6xxx to phylink_pcs - WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support - Connector: - support for event filtering" * tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits) net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler net: stmmac: clarify difference between "interface" and "phy_interface" r8152: add vendor/device ID pair for D-Link DUB-E250 devlink: move devlink_notify_register/unregister() to dev.c devlink: move small_ops definition into netlink.c devlink: move tracepoint definitions into core.c devlink: push linecard related code into separate file devlink: push rate related code into separate file devlink: push trap related code into separate file devlink: use tracepoint_enabled() helper devlink: push region related code into separate file devlink: push param related code into separate file devlink: push resource related code into separate file devlink: push dpipe related code into separate file devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper devlink: push shared buffer related code into separate file devlink: push port related code into separate file devlink: push object register/unregister notifications into separate helpers inet: fix IP_TRANSPARENT error handling ...
2023-08-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni4-9/+28
Merge in late fixes to prepare for the 6.6 net-next PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-28Merge tag 'v6.6-vfs.ctime' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...
2023-08-28devlink: move devlink_notify_register/unregister() to dev.cJiri Pirko4-64/+30
At last, move the last bits out of leftover.c, the devlink_notify_register/unregister() functions to dev.c Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-16-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: move small_ops definition into netlink.cJiri Pirko3-253/+251
Move the generic netlink small_ops definition where they are consumed, into netlink.c Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-15-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: move tracepoint definitions into core.cJiri Pirko2-6/+6
Move remaining tracepoint definitions to most suitable file core.c. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-14-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: push linecard related code into separate fileJiri Pirko4-615/+626
Cut out another chunk from leftover.c and put linecard related code into a separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-13-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: push rate related code into separate fileJiri Pirko4-719/+728
Cut out another chunk from leftover.c and put rate related code into a separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-12-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: push trap related code into separate fileJiri Pirko4-1862/+1873
Cut out another chunk from leftover.c and put trap related code into a separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-11-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: use tracepoint_enabled() helperJiri Pirko1-1/+1
In preparation for the trap code move, use tracepoint_enabled() helper instead of trace_devlink_trap_report_enabled() which would not be defined in that scope. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-10-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: push region related code into separate fileJiri Pirko4-1257/+1267
Cut out another chunk from leftover.c and put region related code into a separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-9-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: push param related code into separate fileJiri Pirko4-860/+875
Cut out another chunk from leftover.c and put param related code into a separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-8-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: push resource related code into separate fileJiri Pirko4-575/+582
Cut out another chunk from leftover.c and put resource related code into a separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-7-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: push dpipe related code into separate fileJiri Pirko4-895/+926
Cut out another chunk from leftover.c and put dpipe related code into a separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-6-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: move and rename devlink_dpipe_send_and_alloc_skb() helperJiri Pirko3-25/+26
Since both dpipe and resource code is using this helper, in preparation for code split to separate files, move devlink_dpipe_send_and_alloc_skb() helper into netlink.c. Rename it on the way. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-5-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: push shared buffer related code into separate fileJiri Pirko4-991/+1006
Cut out another chunk from leftover.c and put sb related code into a separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-4-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: push port related code into separate fileJiri Pirko4-1530/+1547
Cut out another chunk from leftover.c and put port related code into a separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-3-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: push object register/unregister notifications into separate helpersJiri Pirko1-84/+163
In preparations of leftover.c split to individual files, avoid need to have object structures exposed in devl_internal.h and allow to have them maintained in object files. The register/unregister notifications need to know the structures to iterate lists. To avoid the need, introduce per-object register/unregister notification helpers and use them. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230828061657.300667-2-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28inet: fix IP_TRANSPARENT error handlingEric Dumazet1-5/+3
My recent patch forgot to change error handling for IP_TRANSPARENT socket option. WARNING: bad unlock balance detected! 6.5.0-rc7-syzkaller-01717-g59da9885767a #0 Not tainted ------------------------------------- syz-executor151/5028 is trying to release lock (sk_lock-AF_INET) at: [<ffffffff88213983>] sockopt_release_sock+0x53/0x70 net/core/sock.c:1073 but there are no more locks to release! other info that might help us debug this: 1 lock held by syz-executor151/5028: stack backtrace: CPU: 0 PID: 5028 Comm: syz-executor151 Not tainted 6.5.0-rc7-syzkaller-01717-g59da9885767a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106 __lock_release kernel/locking/lockdep.c:5438 [inline] lock_release+0x4b5/0x680 kernel/locking/lockdep.c:5781 sock_release_ownership include/net/sock.h:1824 [inline] release_sock+0x175/0x1b0 net/core/sock.c:3527 sockopt_release_sock+0x53/0x70 net/core/sock.c:1073 do_ip_setsockopt+0x12c1/0x3640 net/ipv4/ip_sockglue.c:1364 ip_setsockopt+0x59/0xe0 net/ipv4/ip_sockglue.c:1419 raw_setsockopt+0x218/0x290 net/ipv4/raw.c:833 __sys_setsockopt+0x2cd/0x5b0 net/socket.c:2305 __do_sys_setsockopt net/socket.c:2316 [inline] __se_sys_setsockopt net/socket.c:2313 [inline] Fixes: 4bd0623f04ee ("inet: move inet->transparent to inet->inet_flags") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Simon Horman <horms@kernel.org> Cc: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28net: Make consumed action consistent in sch_handle_egressDaniel Borkmann1-0/+2
While looking at TC_ACT_* handling, the TC_ACT_CONSUMED is only handled in sch_handle_ingress but not sch_handle_egress. This was added via cd11b164073b ("net/tc: introduce TC_ACT_REINSERT.") and e5cf1baf92cb ("act_mirred: use TC_ACT_REINSERT when possible") and later got renamed into TC_ACT_CONSUMED via 720f22fed81b ("net: sched: refactor reinsert action"). The initial work was targeted for ovs back then and only needed on ingress, and the mirred action module also restricts it to only that. However, given it's an API contract it would still make sense to make this consistent to sch_handle_ingress and handle it on egress side in the same way, that is, setting return code to "success" and returning NULL back to the caller as otherwise an action module sitting on egress returning TC_ACT_CONSUMED could lead to an UAF when untreated. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28net: Fix skb consume leak in sch_handle_egressDaniel Borkmann1-0/+1
Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}: [...] unreferenced object 0xffff88818bcb4f00 (size 232): comm "softirq", pid 0, jiffies 4299085078 (age 134.028s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 80 70 61 81 88 ff ff 00 41 31 14 81 88 ff ff ..pa.....A1..... backtrace: [<ffffffff9991b938>] kmem_cache_alloc_node+0x268/0x400 [<ffffffff9b3d9231>] __alloc_skb+0x211/0x2c0 [<ffffffff9b3f0c7e>] alloc_skb_with_frags+0xbe/0x6b0 [<ffffffff9b3bf9a9>] sock_alloc_send_pskb+0x6a9/0x870 [<ffffffff9b6b3f00>] __ip_append_data+0x14d0/0x3bf0 [<ffffffff9b6ba24e>] ip_append_data+0xee/0x190 [<ffffffff9b7e1496>] icmp_push_reply+0xa6/0x470 [<ffffffff9b7e4030>] icmp_reply+0x900/0xa00 [<ffffffff9b7e42e3>] icmp_echo.part.0+0x1a3/0x230 [<ffffffff9b7e444d>] icmp_echo+0xcd/0x190 [<ffffffff9b7e9566>] icmp_rcv+0x806/0xe10 [<ffffffff9b699bd1>] ip_protocol_deliver_rcu+0x351/0x3d0 [<ffffffff9b699f14>] ip_local_deliver_finish+0x2b4/0x450 [<ffffffff9b69a234>] ip_local_deliver+0x174/0x1f0 [<ffffffff9b69a4b2>] ip_sublist_rcv_finish+0x1f2/0x420 [<ffffffff9b69ab56>] ip_sublist_rcv+0x466/0x920 [...] I was able to reproduce this via: ip link add dev dummy0 type dummy ip link set dev dummy0 up tc qdisc add dev eth0 clsact tc filter add dev eth0 egress protocol ip prio 1 u32 match ip protocol 1 0xff action mirred egress redirect dev dummy0 ping 1.1.1.1 <stolen> After the fix, there are no kmemleak reports with the reproducer. This is in line with what is also done on the ingress side, and from debugging the skb_unref(skb) on dummy xmit and sch_handle_egress() side, it is visible that these are two different skbs with both skb_unref(skb) as true. The two seen skbs are due to mirred doing a skb_clone() internally as use_reinsert is false in tcf_mirred_act() for egress. This was initially reported by Gal. Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support") Reported-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/bdfc2640-8f65-5b56-4472-db8e2b161aab@nvidia.com Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28dccp: Fix out of bounds access in DCCP error handlerJann Horn2-9/+19
There was a previous attempt to fix an out-of-bounds access in the DCCP error handlers, but that fix assumed that the error handlers only want to access the first 8 bytes of the DCCP header. Actually, they also look at the DCCP sequence number, which is stored beyond 8 bytes, so an explicit pskb_may_pull() is required. Fixes: 6706a97fec96 ("dccp: fix out of bound access in dccp_v4_err()") Fixes: 1aa9d1a0e7ee ("ipv6: dccp: fix out of bound access in dccp_v6_err()") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28netrom: Deny concurrent connect().Kuniyuki Iwashima1-0/+5
syzkaller reported null-ptr-deref [0] related to AF_NETROM. This is another self-accept issue from the strace log. [1] syz-executor creates an AF_NETROM socket and calls connect(), which is blocked at that time. Then, sk->sk_state is TCP_SYN_SENT and sock->state is SS_CONNECTING. [pid 5059] socket(AF_NETROM, SOCK_SEQPACKET, 0) = 4 [pid 5059] connect(4, {sa_family=AF_NETROM, sa_data="..." <unfinished ...> Another thread calls connect() concurrently, which finally fails with -EINVAL. However, the problem here is the socket state is reset even while the first connect() is blocked. [pid 5060] connect(4, NULL, 0 <unfinished ...> [pid 5060] <... connect resumed>) = -1 EINVAL (Invalid argument) As sk->state is TCP_CLOSE and sock->state is SS_UNCONNECTED, the following listen() succeeds. Then, the first connect() looks up itself as a listener and puts skb into the queue with skb->sk itself. As a result, the next accept() gets another FD of itself as 3, and the first connect() finishes. [pid 5060] listen(4, 0 <unfinished ...> [pid 5060] <... listen resumed>) = 0 [pid 5060] accept(4, NULL, NULL <unfinished ...> [pid 5060] <... accept resumed>) = 3 [pid 5059] <... connect resumed>) = 0 Then, accept4() is called but blocked, which causes the general protection fault later. [pid 5059] accept4(4, NULL, 0x20000400, SOCK_NONBLOCK <unfinished ...> After that, another self-accept occurs by accept() and writev(). [pid 5060] accept(4, NULL, NULL <unfinished ...> [pid 5061] writev(3, [{iov_base=...}] <unfinished ...> [pid 5061] <... writev resumed>) = 99 [pid 5060] <... accept resumed>) = 6 Finally, the leader thread close()s all FDs. Since the three FDs reference the same socket, nr_release() does the cleanup for it three times, and the remaining accept4() causes the following fault. [pid 5058] close(3) = 0 [pid 5058] close(4) = 0 [pid 5058] close(5) = -1 EBADF (Bad file descriptor) [pid 5058] close(6) = 0 [pid 5058] <... exit_group resumed>) = ? [ 83.456055][ T5059] general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN To avoid the issue, we need to return an error for connect() if another connect() is in progress, as done in __inet_stream_connect(). [0]: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] CPU: 0 PID: 5059 Comm: syz-executor.0 Not tainted 6.5.0-rc5-syzkaller-00194-gace0ab3a4b54 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 RIP: 0010:__lock_acquire+0x109/0x5de0 kernel/locking/lockdep.c:5012 Code: 45 85 c9 0f 84 cc 0e 00 00 44 8b 05 11 6e 23 0b 45 85 c0 0f 84 be 0d 00 00 48 ba 00 00 00 00 00 fc ff df 4c 89 d1 48 c1 e9 03 <80> 3c 11 00 0f 85 e8 40 00 00 49 81 3a a0 69 48 90 0f 84 96 0d 00 RSP: 0018:ffffc90003d6f9e0 EFLAGS: 00010006 RAX: ffff8880244c8000 RBX: 1ffff920007adf6c RCX: 0000000000000003 RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000018 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000018 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f51d519a6c0(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f51d5158d58 CR3: 000000002943f000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> lock_acquire kernel/locking/lockdep.c:5761 [inline] lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x3a/0x50 kernel/locking/spinlock.c:162 prepare_to_wait+0x47/0x380 kernel/sched/wait.c:269 nr_accept+0x20d/0x650 net/netrom/af_netrom.c:798 do_accept+0x3a6/0x570 net/socket.c:1872 __sys_accept4_file net/socket.c:1913 [inline] __sys_accept4+0x99/0x120 net/socket.c:1943 __do_sys_accept4 net/socket.c:1954 [inline] __se_sys_accept4 net/socket.c:1951 [inline] __x64_sys_accept4+0x96/0x100 net/socket.c:1951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f51d447cae9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f51d519a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000120 RAX: ffffffffffffffda RBX: 00007f51d459bf80 RCX: 00007f51d447cae9 RDX: 0000000020000400 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00007f51d44c847a R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000800 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f51d459bf80 R15: 00007ffc25c34e48 </TASK> Link: https://syzkaller.appspot.com/text?tag=CrashLog&x=152cdb63a80000 [1] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+666c97e4686410e79649@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=666c97e4686410e79649 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28tls: get cipher_name from cipher_desc in tls_set_sw_offloadSabrina Dubroca1-25/+4
tls_cipher_desc also contains the algorithm name needed by crypto_alloc_aead, use it. Finally, use get_cipher_desc to check if the cipher_type coming from userspace is valid, and remove the cipher_type switch. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/53d021d80138aa125a9cef4468aa5ce531975a7b.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: use tls_cipher_desc to access per-cipher crypto_info in tls_set_sw_offloadSabrina Dubroca1-76/+13
The crypto_info_* helpers allow us to fetch pointers into the per-cipher crypto_info's data. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/c23af110caf0af6b68de2f86c58064913e2e902a.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: use tls_cipher_desc to get per-cipher sizes in tls_set_sw_offloadSabrina Dubroca1-63/+16
We can get rid of some local variables, but we have to keep nonce_size because tls1.3 uses nonce_size = 0 for all ciphers. We can also drop the runtime sanity checks on iv/rec_seq/tag size, since we have compile time checks on those values. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/deed9c4430a62c31751a72b8c03ad66ffe710717.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: use tls_cipher_desc to simplify do_tls_getsockopt_confSabrina Dubroca1-163/+11
Every cipher uses the same code to update its crypto_info struct based on the values contained in the cctx, with only the struct type and size/offset changing. We can get those from tls_cipher_desc, and use a single pair of memcpy and final copy_to_user. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/c21a904b91e972bdbbf9d1c6d2731ccfa1eedf72.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: get crypto_info size from tls_cipher_desc in do_tls_setsockopt_confSabrina Dubroca1-31/+8
We can simplify do_tls_setsockopt_conf using tls_cipher_desc. Also use get_cipher_desc's result to check if the cipher_type coming from userspace is valid. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/e97658eb4c6a5832f8ba20a06c4f36a77763c59e.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: expand use of tls_cipher_desc in tls_sw_fallback_initSabrina Dubroca1-14/+6
tls_sw_fallback_init already gets the key and tag size from tls_cipher_desc. We can now also check that the cipher type is valid, and stop hard-coding the algorithm name passed to crypto_alloc_aead. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/c8c94b8fcafbfb558e09589c1f1ad48dbdf92f76.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: allocate the fallback aead after checking that the cipher is validSabrina Dubroca1-10/+10
No need to allocate the aead if we're going to fail afterwards. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/335e32511ed55a0b30f3f81a78fa8f323b3bdf8f.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: expand use of tls_cipher_desc in tls_set_device_offloadSabrina Dubroca1-18/+4
tls_set_device_offload is already getting iv and rec_seq sizes from tls_cipher_desc. We can now also check if the cipher_type coming from userspace is valid and can be offloaded. We can also remove the runtime check on rec_seq, since we validate it at compile time. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/8ab71b8eca856c7aaf981a45fe91ac649eb0e2e9.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: validate cipher descriptions at compile timeSabrina Dubroca1-0/+18
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/b38fb8cf60e099e82ae9979c3c9c92421042417c.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: extend tls_cipher_desc to fully describe the ciphersSabrina Dubroca2-9/+64
- add nonce, usually equal to iv_size but not for chacha - add offsets into the crypto_info for each field - add algorithm name - add offloadable flag Also add helpers to access each field of a crypto_info struct described by a tls_cipher_desc. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/39d5f476d63c171097764e8d38f6f158b7c109ae.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: rename tls_cipher_size_desc to tls_cipher_descSabrina Dubroca4-52/+52
We're going to add other fields to it to fully describe a cipher, so the "_size" name won't match the contents. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/76ca6c7686bd6d1534dfa188fb0f1f6fabebc791.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: reduce size of tls_cipher_size_descSabrina Dubroca4-9/+20
tls_cipher_size_desc indexes ciphers by their type, but we're not using indices 0..50 of the array. Each struct tls_cipher_size_desc is 20B, so that's a lot of unused memory. We can reindex the array starting at the lowest used cipher_type. Introduce the get_cipher_size_desc helper to find the right item and avoid out-of-bounds accesses, and make tls_cipher_size_desc's size explicit so that gcc reminds us to update TLS_CIPHER_MIN/MAX when we add a new cipher. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/5e054e370e240247a5d37881a1cd93a67c15f4ca.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: add TLS_CIPHER_ARIA_GCM_* to tls_cipher_size_descSabrina Dubroca1-0/+2
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/b2e0fb79e6d0a4478be9bf33781dc9c9281c9d56.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28tls: move tls_cipher_size_desc to net/tls/tls.hSabrina Dubroca1-0/+10
It's only used in net/tls/*, no need to bloat include/net/tls.h. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/dd9fad80415e5b3575b41f56b331871038362eab.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: Expose port function commands to control IPsec packet offloadsDima Chumak1-0/+52
Expose port function commands to enable / disable IPsec packet offloads, this is used to control the port IPsec capabilities. When IPsec packet is disabled for a function of the port (default), function cannot offload IPsec packet operations (encapsulation and XFRM policy offload). When enabled, IPsec packet operations can be offloaded by the function of the port, which includes crypto operation (Encrypt/Decrypt), IPsec encapsulation and XFRM state and policy offload. Example of a PCI VF port which supports IPsec packet offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_packet disable $ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_packet enable Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-28devlink: Expose port function commands to control IPsec crypto offloadsDima Chumak1-0/+52
Expose port function commands to enable / disable IPsec crypto offloads, this is used to control the port IPsec capabilities. When IPsec crypto is disabled for a function of the port (default), function cannot offload any IPsec crypto operations (Encrypt/Decrypt and XFRM state offloading). When enabled, IPsec crypto operations can be offloaded by the function of the port. Example of a PCI VF port which supports IPsec crypto offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable $ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>