summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-08-26octeontx2-af: CN10KB: fix PFC configurationHariprasad Kelam1-8/+9
Suppose user has enabled pfc with prio 0,1 on a PF netdev(eth0) dcb pfc set dev eth0 prio-pfc o:on 1:on later user enabled pfc priorities 2 and 3 on the VF interface(eth1) dcb pfc set dev eth1 prio-pfc 2:on 3:on Instead of enabling pfc on all priorities (0..3), the driver only enables on priorities 2,3. This patch corrects the issue by using the proper CSR address. Fixes: b9d0fedc6234 ("octeontx2-af: cn10kb: Add RPM_USX MAC support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230824081032.436432-3-sumang@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-26octeontx2-pf: Fix PFC TX scheduler freeSuman Ghosh2-11/+5
During PFC TX schedulers free, flag TXSCHQ_FREE_ALL was being set which caused free up all schedulers other than the PFC schedulers. This patch fixes that to free only the PFC Tx schedulers. Fixes: 99c969a83d82 ("octeontx2-pf: Add egress PFC support") Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230824081032.436432-2-sumang@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-25Merge branch 'mlxsw-fixes'David S. Miller2-3/+5
Petr Machata says: ==================== mlxsw: Assorted fixes This patchset contains several fixes for the mlxsw driver. Patch #1 - Fixes buffer size in I2C mailbox buffer. Patch #2 - Sets limitation of chunk size in I2C transaction. Patch #3 - Fixes module label names based on MTCAP sensor counter ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25mlxsw: core_hwmon: Adjust module label names based on MTCAP sensor counterVadim Pasternak1-1/+2
Transceiver module temperature sensors are indexed after ASIC and platform sensors. The current label printing method does not take this into account and simply prints the index of the transceiver module sensor. On new systems that have platform sensors this results in incorrect (shifted) transceiver module labels being printed: $ sensors [...] front panel 002: +37.0°C (crit = +70.0°C, emerg = +75.0°C) front panel 003: +47.0°C (crit = +70.0°C, emerg = +75.0°C) [...] Fix by taking the sensor count into account. After the fix: $ sensors [...] front panel 001: +37.0°C (crit = +70.0°C, emerg = +75.0°C) front panel 002: +47.0°C (crit = +70.0°C, emerg = +75.0°C) [...] Fixes: a53779de6a0e ("mlxsw: core: Add QSFP module temperature label attribute to hwmon") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25mlxsw: i2c: Limit single transaction buffer sizeVadim Pasternak1-1/+2
Maximum size of buffer is obtained from underlying I2C adapter and in case adapter allows I2C transaction buffer size greater than 100 bytes, transaction will fail due to firmware limitation. As a result driver will fail initialization. Limit the maximum size of transaction buffer by 100 bytes to fit to firmware. Remove unnecessary calculation: max_t(u16, MLXSW_I2C_BLK_DEF, quirk_size). This condition can not happened. Fixes: 3029a693beda ("mlxsw: i2c: Allow flexible setting of I2C transactions size") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25mlxsw: i2c: Fix chunk size setting in output mailbox bufferVadim Pasternak1-1/+1
The driver reads commands output from the output mailbox. If the size of the output mailbox is not a multiple of the transaction / block size, then the driver will not issue enough read transactions to read the entire output, which can result in driver initialization errors. Fix by determining the number of transactions using DIV_ROUND_UP(). Fixes: 3029a693beda ("mlxsw: i2c: Allow flexible setting of I2C transactions size") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: arcnet: Do not call kfree_skb() under local_irq_disable()Jinjie Ruan1-1/+1
It is not allowed to call kfree_skb() from hardware interrupt context or with hardware interrupts being disabled. So replace kfree_skb() with dev_kfree_skb_irq() under local_irq_disable(). Compile tested only. Fixes: 05fcd31cc472 ("arcnet: add err_skb package for package status feedback") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25octeontx2-pf: fix page_pool creation fail for rings > 32kRatheesh Kannoth2-1/+3
octeontx2 driver calls page_pool_create() during driver probe() and fails if queue size > 32k. Page pool infra uses these buffers as shock absorbers for burst traffic. These pages are pinned down over time as working sets varies, due to the recycling nature of page pool, given page pool (currently) don't have a shrinker mechanism, the pages remain pinned down in ptr_ring. Instead of clamping page_pool size to 32k at most, limit it even more to 2k to avoid wasting memory. This have been tested on octeontx2 CN10KA hardware. TCP and UDP tests using iperf shows no performance regressions. Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool") Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25ice: avoid executing commands on other ports when driving syncJacob Keller2-6/+52
The ice hardware has a synchronization mechanism used to drive the simultaneous application of commands on both PHY ports and the source timer in the MAC. When issuing a sync via ice_ptp_exec_tmr_cmd(), the hardware will simultaneously apply the commands programmed for the main timer and each PHY port. Neither the main timer command register, nor the PHY port command registers auto clear on command execution. During the execution of a timer command intended for a single port on E822 devices, such as those used to configure a PHY during link up, the driver is not correctly clearing the previous commands. This results in unintentionally executing the last programmed command on the main timer and other PHY ports whenever performing reconfiguration on E822 ports after link up. This results in unintended side effects on other timers, depending on what command was previously programmed. To fix this, the driver must ensure that the main timer and all other PHY ports are properly initialized to perform no action. The enumeration for timer commands does not include an enumeration value for doing nothing. Introduce ICE_PTP_NOP for this purpose. When writing a timer command to hardware, leave the command bits set to zero which indicates that no operation should be performed on that port. Modify ice_ptp_one_port_cmd() to always initialize all ports. For all ports other than the one being configured, write their timer command register to ICE_PTP_NOP. This ensures that no side effect happens on the timer command. To fix this for the PHY ports, modify ice_ptp_one_port_cmd() to always initialize all other ports to ICE_PTP_NOP. This ensures that no side effects happen on the other ports. Call ice_ptp_src_cmd() with a command value if ICE_PTP_NOP in ice_sync_phy_timer_e822() and ice_start_phy_timer_e822(). With both of these changes, the driver should no longer execute a stale command on the main timer or another PHY port when reconfiguring one of the PHY ports after link up. Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support") Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net: handle ARPHRD_PPP in dev_is_mac_header_xmit()Nicolas Dichtel1-0/+4
The goal is to support a bpf_redirect() from an ethernet device (ingress) to a ppp device (egress). The l2 header is added automatically by the ppp driver, thus the ethernet header should be removed. CC: stable@vger.kernel.org Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Siwar Zitouni <siwar.zitouni@6wind.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-25net/mlx5: Dynamic cyclecounter shift calculation for PTP free running clockRahul Rameshbabu1-5/+27
Use a dynamic calculation to determine the shift value for the internal timer cyclecounter that will lead to the highest precision frequency adjustments. Previously used a constant for the shift value assuming all devices supported by the driver had a nominal frequency of 1GHz. However, there are devices that operate at different frequencies. The previous shift value constant would break the PHC functionality for those devices. Reported-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Closes: https://lore.kernel.org/netdev/20230815151507.3028503-1-vadfed@meta.com/ Fixes: 6a4010927562 ("net/mlx5: Update cyclecounter shift value to improve ptp free running mode precision") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Tested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230821230554.236210-1-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24Merge tag 'net-6.5-rc8' of ↵Linus Torvalds79-356/+664
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from wifi, can and netfilter. Fixes to fixes: - nf_tables: - GC transaction race with abort path - defer gc run if previous batch is still pending Previous releases - regressions: - ipv4: fix data-races around inet->inet_id - phy: fix deadlocking in phy_error() invocation - mdio: fix C45 read/write protocol - ipvlan: fix a reference count leak warning in ipvlan_ns_exit() - ice: fix NULL pointer deref during VF reset - i40e: fix potential NULL pointer dereferencing of pf->vf in i40e_sync_vsi_filters() - tg3: use slab_build_skb() when needed - mtk_eth_soc: fix NULL pointer on hw reset Previous releases - always broken: - core: validate veth and vxcan peer ifindexes - sched: fix a qdisc modification with ambiguous command request - devlink: add missing unregister linecard notification - wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning - batman: - do not get eth header before batadv_check_management_packet - fix batadv_v_ogm_aggr_send memory leak - bonding: fix macvlan over alb bond support - mlxsw: set time stamp fields also when its type is MIRROR_UTC" * tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits) selftests: bonding: add macvlan over bond testing selftest: bond: add new topo bond_topo_2d1c.sh bonding: fix macvlan over alb bond support rtnetlink: Reject negative ifindexes in RTM_NEWLINK netfilter: nf_tables: defer gc run if previous batch is still pending netfilter: nf_tables: fix out of memory error handling netfilter: nf_tables: use correct lock to protect gc_list netfilter: nf_tables: GC transaction race with abort path netfilter: nf_tables: flush pending destroy work before netlink notifier netfilter: nf_tables: validate all pending tables ibmveth: Use dcbf rather than dcbfl i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters() net/sched: fix a qdisc modification with ambiguous command request igc: Fix the typo in the PTM Control macro batman-adv: Hold rtnl lock during MTU update via netlink igb: Avoid starting unnecessary workqueues can: raw: add missing refcount for memory leak fix can: isotp: fix support for transmission of SF without flow control bnx2x: new flag for track HW resource allocation sfc: allocate a big enough SKB for loopback selftest packet ...
2023-08-24Merge tag 'nf-23-08-23' of ↵Paolo Abeni5-11/+37
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter updates for net This PR contains nf_tables updates for your *net* tree. First patch fixes table validation, I broke this in 6.4 when tracking validation state per table, reported by Pablo, fixup from myself. Second patch makes sure objects waiting for memory release have been released, this was broken in 6.1, patch from Pablo Neira Ayuso. Patch three is a fix-for-fix from previous PR: In case a transaction gets aborted, gc sequence counter needs to be incremented so pending gc requests are invalidated, from Pablo. Same for patch 4: gc list needs to use gc list lock, not destroy lock, also from Pablo. Patch 5 fixes a UaF in a set backend, but this should only occur when failslab is enabled for GFP_KERNEL allocations, broken since feature was added in 5.6, from myself. Patch 6 fixes a double-free bug that was also added via previous PR: We must not schedule gc work if the previous batch is still queued. netfilter pull request 2023-08-23 * tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: defer gc run if previous batch is still pending netfilter: nf_tables: fix out of memory error handling netfilter: nf_tables: use correct lock to protect gc_list netfilter: nf_tables: GC transaction race with abort path netfilter: nf_tables: flush pending destroy work before netlink notifier netfilter: nf_tables: validate all pending tables ==================== Link: https://lore.kernel.org/r/20230823152711.15279-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24Merge branch 'fix-macvlan-over-alb-bond-support'Paolo Abeni7-127/+272
Hangbin Liu says: ==================== fix macvlan over alb bond support Currently, the macvlan over alb bond is broken after commit 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds"). Fix this and add relate tests. ==================== Link: https://lore.kernel.org/r/20230823071907.3027782-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24selftests: bonding: add macvlan over bond testingHangbin Liu2-1/+101
Add a macvlan over bonding test with mode active-backup, balance-tlb and balance-alb. ]# ./bond_macvlan.sh TEST: active-backup: IPv4: client->server [ OK ] TEST: active-backup: IPv6: client->server [ OK ] TEST: active-backup: IPv4: client->macvlan_1 [ OK ] TEST: active-backup: IPv6: client->macvlan_1 [ OK ] TEST: active-backup: IPv4: client->macvlan_2 [ OK ] TEST: active-backup: IPv6: client->macvlan_2 [ OK ] TEST: active-backup: IPv4: macvlan_1->macvlan_2 [ OK ] TEST: active-backup: IPv6: macvlan_1->macvlan_2 [ OK ] TEST: active-backup: IPv4: server->client [ OK ] TEST: active-backup: IPv6: server->client [ OK ] TEST: active-backup: IPv4: macvlan_1->client [ OK ] TEST: active-backup: IPv6: macvlan_1->client [ OK ] TEST: active-backup: IPv4: macvlan_2->client [ OK ] TEST: active-backup: IPv6: macvlan_2->client [ OK ] TEST: active-backup: IPv4: macvlan_2->macvlan_2 [ OK ] TEST: active-backup: IPv6: macvlan_2->macvlan_2 [ OK ] [...] TEST: balance-alb: IPv4: client->server [ OK ] TEST: balance-alb: IPv6: client->server [ OK ] TEST: balance-alb: IPv4: client->macvlan_1 [ OK ] TEST: balance-alb: IPv6: client->macvlan_1 [ OK ] TEST: balance-alb: IPv4: client->macvlan_2 [ OK ] TEST: balance-alb: IPv6: client->macvlan_2 [ OK ] TEST: balance-alb: IPv4: macvlan_1->macvlan_2 [ OK ] TEST: balance-alb: IPv6: macvlan_1->macvlan_2 [ OK ] TEST: balance-alb: IPv4: server->client [ OK ] TEST: balance-alb: IPv6: server->client [ OK ] TEST: balance-alb: IPv4: macvlan_1->client [ OK ] TEST: balance-alb: IPv6: macvlan_1->client [ OK ] TEST: balance-alb: IPv4: macvlan_2->client [ OK ] TEST: balance-alb: IPv6: macvlan_2->client [ OK ] TEST: balance-alb: IPv4: macvlan_2->macvlan_2 [ OK ] TEST: balance-alb: IPv6: macvlan_2->macvlan_2 [ OK ] Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24selftest: bond: add new topo bond_topo_2d1c.shHangbin Liu4-113/+167
Add a new testing topo bond_topo_2d1c.sh which is used more commonly. Make bond_topo_3d1c.sh just source bond_topo_2d1c.sh and add the extra link. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24bonding: fix macvlan over alb bond supportHangbin Liu2-13/+4
The commit 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds") aims to enable the use of macvlans on top of rlb bond mode. However, the current rlb bond mode only handles ARP packets to update remote neighbor entries. This causes an issue when a macvlan is on top of the bond, and remote devices send packets to the macvlan using the bond's MAC address as the destination. After delivering the packets to the macvlan, the macvlan will rejects them as the MAC address is incorrect. Consequently, this commit makes macvlan over bond non-functional. To address this problem, one potential solution is to check for the presence of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev) and return NULL in the rlb_arp_xmit() function. However, this approach doesn't fully resolve the situation when a VLAN exists between the bond and macvlan. So let's just do a partial revert for commit 14af9963ba1e in rlb_arp_xmit(). As the comment said, Don't modify or load balance ARPs that do not originate locally. Fixes: 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds") Reported-by: susan.zheng@veritas.com Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24rtnetlink: Reject negative ifindexes in RTM_NEWLINKIdo Schimmel1-0/+3
Negative ifindexes are illegal, but the kernel does not validate the ifindex in the ancillary header of RTM_NEWLINK messages, resulting in the kernel generating a warning [1] when such an ifindex is specified. Fix by rejecting negative ifindexes. [1] WARNING: CPU: 0 PID: 5031 at net/core/dev.c:9593 dev_index_reserve+0x1a2/0x1c0 net/core/dev.c:9593 [...] Call Trace: <TASK> register_netdevice+0x69a/0x1490 net/core/dev.c:10081 br_dev_newlink+0x27/0x110 net/bridge/br_netlink.c:1552 rtnl_newlink_create net/core/rtnetlink.c:3471 [inline] __rtnl_newlink+0x115e/0x18c0 net/core/rtnetlink.c:3688 rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3701 rtnetlink_rcv_msg+0x439/0xd30 net/core/rtnetlink.c:6427 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline] netlink_unicast+0x536/0x810 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:728 [inline] sock_sendmsg+0xd9/0x180 net/socket.c:751 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2538 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2592 __sys_sendmsg+0x117/0x1e0 net/socket.c:2621 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 38f7b870d4a6 ("[RTNETLINK]: Link creation API") Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230823064348.2252280-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24Merge tag 'acpi-6.5-rc8' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Make an existing ACPI IRQ override quirk for PCSpecialist Elimina Pro 16 M work as intended (Hans de Goede)" * tag 'acpi-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: resource: Fix IRQ override quirk for PCSpecialist Elimina Pro 16 M
2023-08-23Merge tag 'platform-drivers-x86-v6.5-5' of ↵Linus Torvalds3-0/+13
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "Final set of three small fixes for 6.5" * tag 'platform-drivers-x86-v6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/mellanox: Fix mlxbf-tmfifo not handling all virtio CONSOLE notifications platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL platform/x86: lenovo-ymc: Add Lenovo Yoga 7 14ACN6 to ec_trigger_quirk_dmi_table
2023-08-23platform/mellanox: Fix mlxbf-tmfifo not handling all virtio CONSOLE ↵Shih-Yi Chen1-0/+1
notifications rshim console does not show all entries of dmesg. Fixed by setting MLXBF_TM_TX_LWM_IRQ for every CONSOLE notification. Signed-off-by: Shih-Yi Chen <shihyic@nvidia.com> Reviewed-by: Liming Sung <limings@nvidia.com> Reviewed-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20230821150627.26075-1-shihyic@nvidia.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-08-23netfilter: nf_tables: defer gc run if previous batch is still pendingFlorian Westphal3-0/+11
Don't queue more gc work, else we may queue the same elements multiple times. If an element is flagged as dead, this can mean that either the previous gc request was invalidated/discarded by a transaction or that the previous request is still pending in the system work queue. The latter will happen if the gc interval is set to a very low value, e.g. 1ms, and system work queue is backlogged. The sets refcount is 1 if no previous gc requeusts are queued, so add a helper for this and skip gc run if old requests are pending. Add a helper for this and skip the gc run in this case. Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-23netfilter: nf_tables: fix out of memory error handlingFlorian Westphal1-3/+10
Several instances of pipapo_resize() don't propagate allocation failures, this causes a crash when fault injection is enabled for gfp_kernel slabs. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-23netfilter: nf_tables: use correct lock to protect gc_listPablo Neira Ayuso1-2/+2
Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to protect the gc list. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-23netfilter: nf_tables: GC transaction race with abort pathPablo Neira Ayuso1-1/+5
Abort path is missing a synchronization point with GC transactions. Add GC sequence number hence any GC transaction losing race will be discarded. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-23netfilter: nf_tables: flush pending destroy work before netlink notifierPablo Neira Ayuso1-1/+1
Destroy work waits for the RCU grace period then it releases the objects with no mutex held. All releases objects follow this path for transactions, therefore, order is guaranteed and references to top-level objects in the hierarchy remain valid. However, netlink notifier might interfer with pending destroy work. rcu_barrier() is not correct because objects are not release via RCU callback. Flush destroy work before releasing objects from netlink notifier path. Fixes: d4bc8271db21 ("netfilter: nf_tables: netlink notifier might race to release objects") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-23netfilter: nf_tables: validate all pending tablesFlorian Westphal2-4/+8
We have to validate all tables in the transaction that are in VALIDATE_DO state, the blamed commit below did not move the break statement to its right location so we only validate one table. Moreover, we can't init table->validate to _SKIP when a table object is allocated. If we do, then if a transcaction creates a new table and then fails the transaction, nfnetlink will loop and nft will hang until user cancels the command. Add back the pernet state as a place to stash the last state encountered. This is either _DO (we hit an error during commit validation) or _SKIP (transaction passed all checks). Fixes: 00c320f9b755 ("netfilter: nf_tables: make validation state per table") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-23ibmveth: Use dcbf rather than dcbflMichael Ellerman1-1/+1
When building for power4, newer binutils don't recognise the "dcbfl" extended mnemonic. dcbfl RA, RB is equivalent to dcbf RA, RB, 1. Switch to "dcbf" to avoid the build error. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()Andrii Staikov1-2/+3
Add check for pf->vf not being NULL before dereferencing pf->vf[vsi->vf_id] in updating VSI filter sync. Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted in the condition for clearing promisc mode bit. Fixes: c87c938f62d8 ("i40e: Add VF VLAN pruning") Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23net/sched: fix a qdisc modification with ambiguous command requestJamal Hadi Salim1-13/+40
When replacing an existing root qdisc, with one that is of the same kind, the request boils down to essentially a parameterization change i.e not one that requires allocation and grafting of a new qdisc. syzbot was able to create a scenario which resulted in a taprio qdisc replacing an existing taprio qdisc with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL leading to create and graft scenario. The fix ensures that only when the qdisc kinds are different that we should allow a create and graft, otherwise it goes into the "change" codepath. While at it, fix the code and comments to improve readability. While syzbot was able to create the issue, it did not zone on the root cause. Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down. v1->V2 changes: - remove "inline" function definition (Vladmir) - remove extrenous braces in branches (Vladmir) - change inline function names (Pedro) - Run tdc tests (Victor) v2->v3 changes: - dont break else/if (Simon) Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/ Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23igc: Fix the typo in the PTM Control macroSasha Neftin1-1/+1
The IGC_PTM_CTRL_SHRT_CYC defines the time between two consecutive PTM requests. The bit resolution of this field is six bits. That bit five was missing in the mask. This patch comes to correct the typo in the IGC_PTM_CTRL_SHRT_CYC macro. Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-23batman-adv: Hold rtnl lock during MTU update via netlinkSven Eckelmann1-0/+3
The automatic recalculation of the maximum allowed MTU is usually triggered by code sections which are already rtnl lock protected by callers outside of batman-adv. But when the fragmentation setting is changed via batman-adv's own batadv genl family, then the rtnl lock is not yet taken. But dev_set_mtu requires that the caller holds the rtnl lock because it uses netdevice notifiers. And this code will then fail the check for this lock: RTNL: assertion failed at net/core/dev.c (1953) Cc: stable@vger.kernel.org Reported-by: syzbot+f8812454d9b3ac00d282@syzkaller.appspotmail.com Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU") Signed-off-by: Sven Eckelmann <sven@narfation.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7bfe861e@narfation.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-23igb: Avoid starting unnecessary workqueuesAlessio Igor Bogani1-12/+12
If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting PTP related workqueues. In this way we can fix this: BUG: unable to handle page fault for address: ffffc9000440b6f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0 Oops: 0000 [#1] PREEMPT SMP [...] Workqueue: events igb_ptp_overflow_check RIP: 0010:igb_rd32+0x1f/0x60 [...] Call Trace: igb_ptp_read_82580+0x20/0x50 timecounter_read+0x15/0x60 igb_ptp_overflow_check+0x1a/0x50 process_one_work+0x1cb/0x3c0 worker_thread+0x53/0x3f0 ? rescuer_thread+0x370/0x370 kthread+0x142/0x160 ? kthread_associate_blkcg+0xc0/0xc0 ret_from_fork+0x1f/0x30 Fixes: 1f6e8178d685 ("igb: Prevent dropped Tx timestamps via work items and interrupts.") Fixes: d339b1331616 ("igb: add PTP Hardware Clock code") Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821171927.2203644-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-23Merge branch '100GbE' of ↵Jakub Kicinski5-33/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-08-21 (ice) This series contains updates to ice driver only. Jesse fixes an issue on calculating buffer size. Petr Oros reverts a commit that does not fully resolve VF reset issues and implements one that provides a fuller fix. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix NULL pointer deref during VF reset Revert "ice: Fix ice VF reset during iavf initialization" ice: fix receive buffer size miscalculation ==================== Link: https://lore.kernel.org/r/20230821171633.2203505-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-23Merge branch 'can-fixes-for-6-5-rc7'Jakub Kicinski2-24/+33
Oliver Hartkopp says: ==================== CAN fixes for 6.5-rc7 The isotp fix removes an unnecessary check which leads to delays and/or a wrong error notification. The fix for the CAN_RAW socket solves the last issue that has been introduced with commit ee8b94c8510c ("can: raw: fix receiver memory leak") in this upstream cycle (detected by Eric Dumazet). ==================== Link: https://lore.kernel.org/r/20230821144547.6658-1-socketcan@hartkopp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-23can: raw: add missing refcount for memory leak fixOliver Hartkopp1-9/+26
Commit ee8b94c8510c ("can: raw: fix receiver memory leak") introduced a new reference to the CAN netdevice that has assigned CAN filters. But this new ro->dev reference did not maintain its own refcount which lead to another KASAN use-after-free splat found by Eric Dumazet. This patch ensures a proper refcount for the CAN nedevice. Fixes: ee8b94c8510c ("can: raw: fix receiver memory leak") Reported-by: Eric Dumazet <edumazet@google.com> Cc: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20230821144547.6658-3-socketcan@hartkopp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-23can: isotp: fix support for transmission of SF without flow controlOliver Hartkopp1-15/+7
The original implementation had a very simple handling for single frame transmissions as it just sent the single frame without a timeout handling. With the new echo frame handling the echo frame was also introduced for single frames but the former exception ('simple without timers') has been maintained by accident. This leads to a 1 second timeout when closing the socket and to an -ECOMM error when CAN_ISOTP_WAIT_TX_DONE is selected. As the echo handling is always active (also for single frames) remove the wrong extra condition for single frames. Fixes: 9f39d36530e5 ("can: isotp: add support for transmission without flow control") Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20230821144547.6658-2-socketcan@hartkopp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-23bnx2x: new flag for track HW resource allocationThinh Tran4-28/+44
While injecting PCIe errors to the upstream PCIe switch of a BCM57810 NIC, system hangs/crashes were observed. After several calls to bnx2x_tx_timout() complete, bnx2x_nic_unload() is called to free up HW resources and bnx2x_napi_disable() is called to release NAPI objects. Later, when the EEH driver calls bnx2x_io_slot_reset() to complete the recovery process, bnx2x attempts to disable NAPI again by calling bnx2x_napi_disable() and freeing resources which have already been freed, resulting in a hang or crash. Introduce a new flag to track the HW resource and NAPI allocation state, refactor duplicated code into a single function, check page pool allocation status before freeing, and reduces debug output when a TX timeout event occurs. Reviewed-by: Manish Chopra <manishc@marvell.com> Tested-by: Abdul Haleem <abdhalee@in.ibm.com> Tested-by: David Christensen <drc@linux.vnet.ibm.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20230818161443.708785-2-thinhtr@linux.vnet.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22Merge tag 'devicetree-fixes-for-6.5-2' of ↵Linus Torvalds4-27/+15
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Fix DT node refcount when creating platform devices - Fix deadlock in changeset code due to printing with devtree_lock held - Fix unittest EXPECT strings for parse_phandle_with_args_map() test - Fix IMA kexec memblock freeing * tag 'devicetree-fixes-for-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of/platform: increase refcount of fwnode of: dynamic: Refactor action prints to not use "%pOF" inside devtree_lock of: unittest: Fix EXPECT for parse_phandle_with_args_map() test mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer
2023-08-22sfc: allocate a big enough SKB for loopback selftest packetEdward Cree3-3/+3
Cited commits passed a size to alloc_skb that was only big enough for the actual packet contents, but the following skb_put + memcpy writes the whole struct efx_loopback_payload including leading and trailing padding bytes (which are then stripped off with skb_pull/skb_trim). This could cause an skb_over_panic, although in practice we get saved by kmalloc_size_roundup. Pass the entire size we use, instead of the size of the final packet. Reported-by: Andy Moreton <andy.moreton@amd.com> Fixes: cf60ed469629 ("sfc: use padding to fix alignment in loopback test") Fixes: 30c24dd87f3f ("sfc: siena: use padding to fix alignment in loopback test") Fixes: 1186c6b31ee1 ("sfc: falcon: use padding to fix alignment in loopback test") Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821180153.18652-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22Merge tag 'wireless-2023-08-22' of ↵Jakub Kicinski3-2/+12
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Two fixes: - reorder buffer filter checks can cause bad shift/UBSAN warning with newer HW, avoid the check (mac80211) - add Kconfig dependency for iwlwifi for PTP clock usage * tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning wifi: iwlwifi: mvm: add dependency for PTP clock ==================== Link: https://lore.kernel.org/r/20230822124206.43926-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22leds: trigger: netdev: rename 'hw_control' sysfs entry to 'offloaded'Marek Behún2-14/+14
Commit b655892ffd6d ("leds: trigger: netdev: expose hw_control status via sysfs") exposed to sysfs the flag that tells whether the LED trigger is offloaded to hardware, under the name "hw_control", since that is the name under which this setting is called in the code. Everywhere else in kernel when some work that is normally done in software can be made to be done by hardware instead, we use the word "offloading" to describe this, e.g. "LED blinking is offloaded to hardware". Normally renaming sysfs entries is a no-go because of backwards compatibility. But since this patch was not yet released in a stable kernel, I think it is still possible to rename it, if there is consensus. Fixes: b655892ffd6d ("leds: trigger: netdev: expose hw_control status via sysfs") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230821121453.30203-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22net: ethernet: mtk_eth_soc: fix NULL pointer on hw resetDaniel Golle1-2/+10
When a hardware reset is triggered on devices not initializing WED the calls to mtk_wed_fe_reset and mtk_wed_fe_reset_complete dereference a pointer on uninitialized stack memory. Break out of both functions in case a hw_list entry is 0. Fixes: 08a764a7c51b ("net: ethernet: mtk_wed: add reset/reset_complete callbacks") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/5465c1609b464cc7407ae1530c40821dcdf9d3e6.1692634266.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22Merge tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds5-23/+35
Pull NFS client fixes from Trond Myklebust: - fix a use after free in nfs_direct_join_group() (Cc: stable) - fix sysfs server name memory leak - fix lock recovery hang in NFSv4.0 - fix page free in the error path for nfs42_proc_getxattr() and __nfs4_get_acl_uncached() - SUNRPC/rdma: fix receive buffer dma-mapping after a server disconnect * tag 'nfs-for-6.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: xprtrdma: Remap Receive buffers after a reconnect NFSv4: fix out path in __nfs4_get_acl_uncached NFSv4.2: fix error handling in nfs42_proc_getxattr NFS: Fix sysfs server name memory leak NFS: Fix a use after free in nfs_direct_join_group() NFSv4: Fix dropped lock for racing OPEN and delegation return
2023-08-22Merge tag 'selinux-pr-20230821' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "A small fix for a potential problem when cleaning up after a failed SELinux policy load (list next pointer not being properly initialized to NULL early enough)" * tag 'selinux-pr-20230821' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: set next pointer before attaching to list
2023-08-22tg3: Use slab_build_skb() when neededKees Cook1-1/+4
The tg3 driver will use kmalloc() under some conditions. Check the frag_size and use slab_build_skb() when frag_size is 0. Silences the warning introduced by commit ce098da1497c ("skbuff: Introduce slab_build_skb()"): Use slab_build_skb() instead ... tg3_poll_work+0x638/0xf90 [tg3] Fixes: ce098da1497c ("skbuff: Introduce slab_build_skb()") Reported-by: Fiona Ebner <f.ebner@proxmox.com> Closes: https://lore.kernel.org/all/1bd4cb9c-4eb8-3bdb-3e05-8689817242d1@proxmox.com Cc: Siva Reddy Kallam <siva.kallam@broadcom.com> Cc: Prashant Sreedharan <prashant@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230818175417.never.273-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22selftests: bonding: do not set port down before adding to bondHangbin Liu1-2/+2
Before adding a port to bond, it need to be set down first. In the lacpdu test the author set the port down specifically. But commit a4abfa627c38 ("net: rtnetlink: Enslave device before bringing it up") changed the operation order, the kernel will set the port down _after_ adding to bond. So all the ports will be down at last and the test failed. In fact, the veth interfaces are already inactive when added. This means there's no need to set them down again before adding to the bond. Let's just remove the link down operation. Fixes: a4abfa627c38 ("net: rtnetlink: Enslave device before bringing it up") Reported-by: Zhengchao Shao <shaozhengchao@huawei.com> Closes: https://lore.kernel.org/netdev/a0ef07c7-91b0-94bd-240d-944a330fcabd@huawei.com/ Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20230817082459.1685972-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22of/platform: increase refcount of fwnodePeng Fan1-2/+2
commit 0f8e5651095b ("of/platform: Propagate firmware node by calling device_set_node()") use of_fwnode_handle to replace of_node_get, which introduces a side effect that the refcount is not increased. Then the out of tree jailhouse hypervisor enable/disable test will trigger kernel dump in of_overlay_remove, with the following sequence " of_changeset_revert(&overlay_changeset); of_changeset_destroy(&overlay_changeset); of_overlay_remove(&overlay_id); " So increase the refcount to avoid issues. This patch also release the refcount when releasing amba device to avoid refcount leakage. Fixes: 0f8e5651095b ("of/platform: Propagate firmware node by calling device_set_node()") Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20230821023928.3324283-2-peng.fan@oss.nxp.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-08-21ice: Fix NULL pointer deref during VF resetPetr Oros1-7/+8
During stress test with attaching and detaching VF from KVM and simultaneously changing VFs spoofcheck and trust there was a NULL pointer dereference in ice_reset_vf that VF's VSI is null. More than one instance of ice_reset_vf() can be running at a given time. When we rebuild the VSI in ice_reset_vf, another reset can be triaged from ice_service_task. In this case we can access the currently uninitialized VSI and cause panic. The window for this racing condition has been around for a long time but it's much worse after commit 227bf4500aaa ("ice: move VSI delete outside deconfig") because the reset runs faster. ice_reset_vf() using vf->cfg_lock and when we move this lock before accessing to the VF VSI, we can fix BUG for all cases. Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes in ice_vsi_stop_all_rx_rings() With our reproducer, we can hit BUG: ~8h before commit 227bf4500aaa ("ice: move VSI delete outside deconfig"). ~20m after commit 227bf4500aaa ("ice: move VSI delete outside deconfig"). After this fix we are not able to reproduce it after ~48h There was commit cf90b74341ee ("ice: Fix call trace with null VSI during VF reset") which also tried to fix this issue, but it was only partially resolved and the bug still exists. [ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 6420.665382] #PF: supervisor read access in kernel mode [ 6420.670521] #PF: error_code(0x0000) - not-present page [ 6420.675659] PGD 0 [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1 [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022 [ 6420.698729] Workqueue: ice ice_service_task [ice] [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice] [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83 [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246 [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000 [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828 [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000 [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0 [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff [ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000 [ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0 [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002) [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6420.811841] PKRU: 55555554 [ 6420.811842] Call Trace: [ 6420.811843] <TASK> [ 6420.811844] ice_reset_vf+0x9a/0x450 [ice] [ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice] [ 6420.841343] ice_service_task+0x23b/0x600 [ice] [ 6420.845884] ? __schedule+0x212/0x550 [ 6420.849550] process_one_work+0x1e2/0x3b0 [ 6420.853563] ? rescuer_thread+0x390/0x390 [ 6420.857577] worker_thread+0x50/0x3a0 [ 6420.861242] ? rescuer_thread+0x390/0x390 [ 6420.865253] kthread+0xdd/0x100 [ 6420.868400] ? kthread_complete_and_exit+0x20/0x20 [ 6420.873194] ret_from_fork+0x1f/0x30 [ 6420.876774] </TASK> [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk r Fixes: efe41860008e ("ice: Fix memory corruption in VF driver") Fixes: f23df5220d2b ("ice: Fix spurious interrupt during removal of trusted VF") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-21Revert "ice: Fix ice VF reset during iavf initialization"Petr Oros4-25/+4
This reverts commit 7255355a0636b4eff08d5e8139c77d98f151c4fc. After this commit we are not able to attach VF to VM: virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52 error: Failed to attach interface error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable ice_check_vf_ready_for_cfg() already contain waiting for reset. New condition in ice_check_vf_ready_for_reset() causing only problems. Fixes: 7255355a0636 ("ice: Fix ice VF reset during iavf initialization") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>