path: root/include
Age  Commit message  Author  Files  Lines
2024-02-29Merge tag 'net-6.8-rc7' of ↵Linus Torvalds5-11/+15
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, WiFi and netfilter. We have one outstanding issue with the stmmac driver, which may be a LOCKDEP false positive, not a blocker. Current release - regressions: - netfilter: nf_tables: re-allow NFPROTO_INET in nft_(match/target)_validate() - eth: ionic: fix error handling in PCI reset code Current release - new code bugs: - eth: stmmac: complete meta data only when enabled, fix null-deref - kunit: fix again checksum tests on big endian CPUs Previous releases - regressions: - veth: try harder when allocating queue memory - Bluetooth: - hci_bcm4377: do not mark valid bd_addr as invalid - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST Previous releases - always broken: - info leak in __skb_datagram_iter() on netlink socket - mptcp: - map v4 address to v6 when destroying subflow - fix potential wake-up event loss due to sndbuf auto-tuning - fix double-free on socket dismantle - wifi: nl80211: reject iftype change with mesh ID change - fix small out-of-bound read when validating netlink be16/32 types - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr() - ip_tunnel: prevent perpetual headroom growth with huge number of tunnels on top of each other - mctp: fix skb leaks on error paths of mctp_local_output() - eth: ice: fixes for DPLL state reporting - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device tree" * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) dpll: fix build failure due to rcu_dereference_check() on unknown type kunit: Fix again checksum tests on big endian CPUs tls: fix use-after-free on failed backlog decryption tls: separate no-async decryption request handling from async tls: fix peeking with sync+async decryption tls: decrement 
decrypt_pending if no async completion will be called gtp: fix use-after-free and null-ptr-deref in gtp_newlink() net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames igb: extend PTP timestamp adjustments to i211 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back tools: ynl: fix handling of multiple mcast groups selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout ...
2024-02-29cgroup/cpuset: Remove cpuset_do_slab_mem_spread()Xiongwei Song1-10/+0
The SLAB allocator has been removed since 6.8-rc1 [1], so there is no user with SLAB_MEM_SPREAD and cpuset_do_slab_mem_spread(). Then SLAB_MEM_SPREAD is marked as unused by [2]. Here we can remove cpuset_do_slab_mem_spread(). For more details, please check [3]. [1] https://lore.kernel.org/linux-mm/20231120-slab-remove-slab-v2-0-9c9c70177183@suse.cz/ [2] https://lore.kernel.org/linux-kernel/20240223-slab-cleanup-flags-v2-0-02f1753e8303@suse.cz/T/ [3] https://lore.kernel.org/lkml/32bc1403-49da-445a-8c00-9686a3b0d6a3@redhat.com/T/#mf14b838c5e0e77f4756d436bac3d8c0447ea4350 Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-02-29dpll: fix build failure due to rcu_dereference_check() on unknown typeEric Dumazet1-4/+4
Tasmiya reports that their compiler complains that we deref a pointer to unknown type with rcu_dereference_rtnl(): include/linux/rcupdate.h:439:9: error: dereferencing pointer to incomplete type ‘struct dpll_pin’ Unclear what compiler it is, at the moment, and we can't report but since DPLL can't be a module - move the code from the header into the source file. Fixes: 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com> Link: https://lore.kernel.org/all/3fcf3a2c-1c1b-42c1-bacb-78fdcd700389@linux.vnet.ibm.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240229190515.2740221-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29Merge tag 'nf-24-02-29' of ↵Paolo Abeni1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin. Patch #2 fixes an issue with bridge netfilter and broadcast/multicast packets. There is a day 0 bug in br_netfilter when used with connection tracking. Conntrack assumes that an nf_conn structure that is not yet added to hash table ("unconfirmed"), is only visible by the current cpu that is processing the sk_buff. For bridge this isn't true, sk_buff can get cloned in between, and clones can be processed in parallel on different cpu. This patch disables NAT and conntrack helpers for multicast packets. Patch #3 adds a selftest to cover for the br_netfilter bug. netfilter pull request 24-02-29 * tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() ==================== Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29drm/xe/uapi: Remove unused flagsFrancois Dugast1-19/+0
Those cases missed in previous uAPI cleanups were mostly accidentally brought in from i915 or created to exercise the possibilities of gpuvm but they are not used by userspace yet, so let's remove them. They can still be brought back later if needed. v2: - Fix XE_VM_FLAG_FAULT_MODE support in xe_lrc.c (Brian Welty) - Leave DRM_XE_VM_BIND_OP_UNMAP_ALL (José Roberto de Souza) - Ensure invalid flag values are rejected (Rodrigo Vivi) v3: Rebase after removal of persistent exec_queues (Francois Dugast) v4: Rodrigo: Rebase after the new dumpable flag. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240222232356.175431-1-rodrigo.vivi@intel.com (cherry picked from commit 84a1ed5e67565b09b8fd22a26754d2897de55ce0) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-29drm/xe/uapi: Remove DRM_XE_VM_BIND_FLAG_ASYNC comment left overJosé Roberto de Souza1-1/+0
This is a comment left over of commit d3d767396a02 ("drm/xe/uapi: Remove sync binds"). Fixes: d3d767396a02 ("drm/xe/uapi: Remove sync binds") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231226172321.61518-1-jose.souza@intel.com (cherry picked from commit f031c3a7af8ea06790dd0a71872c4f0175084baa) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-29drm/xe: Add uapi for dumpable bosMaarten Lankhorst1-0/+1
Add the flag XE_VM_BIND_FLAG_DUMPABLE to notify devcoredump that this mapping should be dumped. This is not hooked up, but the uapi should be ready before merging. It's likely easier to dump the contents of the bo's at devcoredump readout time, so it's better if the bos will stay unmodified after a hang. The NEEDS_CPU_MAPPING flag is removed as requirement. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240221133024.898315-3-maarten.lankhorst@linux.intel.com (cherry picked from commit 76a86b58d2b3de31e88acb487ebfa0c3cc7c41d2) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-29Merge tag 'asoc-fix-v6.8-rc5' of ↵Takashi Iwai1-0/+2
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.8 A few small fixes, some driver specific and one slightly larger one from Richard which adds a new core helper and updates a small clutch of drivers to deal with the fact that they were using a helper which requires the lock for the list of controls to be held, without holding that lock. We also have some quirks for new AMD based Lenovo systems.
2024-02-29ublk: add UBLK_CMD_DEL_DEV_ASYNCMing Lei1-0/+2
The current command UBLK_CMD_DEL_DEV won't return until the device is released. This looks more reliable, but makes userspace more difficult to implement, especially regarding ordering: unmap command buffer (which holds one ublkc reference), ublkc close, io_uring_file_unregister, ublkb close. Add UBLK_CMD_DEL_DEV_ASYNC so that device deletion won't wait for release; then userspace needn't worry about the above order. Actually both loop and nbd are deleted in this async way. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240223075539.89945-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-29netfilter: bridge: confirm multicast packets before passing them up the stackFlorian Westphal1-0/+1
conntrack nf_confirm logic cannot handle cloned skbs referencing the same nf_conn entry, which will happen for multicast (broadcast) frames on bridges. Example: macvlan0 | br0 / \ ethX ethY ethX (or Y) receives a L2 multicast or broadcast packet containing an IP packet, flow is not yet in conntrack table. 1. skb passes through bridge and fake-ip (br_netfilter)Prerouting. -> skb->_nfct now references a unconfirmed entry 2. skb is broad/mcast packet. bridge now passes clones out on each bridge interface. 3. skb gets passed up the stack. 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb and schedules a work queue to send them out on the lower devices. The clone skb->_nfct is not a copy, it is the same entry as the original skb. The macvlan rx handler then returns RX_HANDLER_PASS. 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb. The Macvlan broadcast worker and normal confirm path will race. This race will not happen if step 2 already confirmed a clone. In that case later steps perform skb_clone() with skb->_nfct already confirmed (in hash table). This works fine. But such confirmation won't happen when eb/ip/nftables rules dropped the packets before they reached the nf_confirm step in postrouting. Pablo points out that nf_conntrack_bridge doesn't allow use of stateful nat, so we can safely discard the nf_conn entry and let inet call conntrack again. This doesn't work for bridge netfilter: skb could have a nat transformation. Also bridge nf prevents re-invocation of inet prerouting via 'sabotage_in' hook. Work around this problem by explicit confirmation of the entry at LOCAL_IN time, before upper layer has a chance to clone the unconfirmed entry. The downside is that this disables NAT and conntrack helpers. 
An alternative fix would be to add locking to all code parts that deal with unconfirmed packets, but even if that could be done in a sane way this opens up other problems, for example: -m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4 -m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5 For the multicast case, only one of such conflicting mappings will be created, since conntrack only handles 1:1 NAT mappings. Users should create a setup that explicitly marks such traffic NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass them: the ruleset might have accept rules for untracked traffic already, so user-visible behaviour would change. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-28kunit: Annotate _MSG assertion variants with gnu printf specifiersDavid Gow1-6/+6
KUnit's assertion macros have variants which accept a printf format string, to allow tests to specify a more detailed message on failure. These (and the related KUNIT_FAIL() macro) ultimately wrap the __kunit_do_failed_assertion() function, which accepted a printf format specifier, but did not have the __printf attribute, so gcc couldn't warn on incorrect arguments. It turns out there were quite a few tests with such incorrect arguments. Add the __printf() specifier now that we've fixed these errors, to prevent them from recurring. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Gow <davidgow@google.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
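A minimal userspace sketch of what the attribute buys, assuming GCC or Clang. The kernel's `__printf(a, b)` corresponds to `__attribute__((format(printf, a, b)))`; the macro name `PRINTF_LIKE` and the function `report_failure()` below are hypothetical stand-ins for illustration and do not reflect the real signature of `__kunit_do_failed_assertion()`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical userspace counterpart of the kernel's __printf() macro. */
#define PRINTF_LIKE(fmt_idx, first_arg) \
	__attribute__((format(printf, fmt_idx, first_arg)))

/*
 * Hypothetical failure reporter: formats the message into buf and
 * returns the length vsnprintf() reports. Argument 3 is the format
 * string and the variadic arguments start at position 4.
 */
PRINTF_LIKE(3, 4)
int report_failure(char *buf, size_t len, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(buf && fmt);
	va_start(args, fmt);
	n = vsnprintf(buf, len, fmt, args);
	va_end(args);
	return n;
}
```

With the attribute in place, a call such as `report_failure(buf, sizeof(buf), "%s", 42)` should draw a `-Wformat` diagnostic at compile time, where an unannotated function would accept it silently.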
2024-02-28uapi: in6: replace temporary label with rfc9486Justin Iurman1-1/+1
Not really a fix per se, but IPV6_TLV_IOAM is still tagged as "TEMPORARY IANA allocation for IOAM", while RFC 9486 is available for some time now. Just update the reference. Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240226124921.9097-1-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28Merge tag 'mm-hotfixes-stable-2024-02-27-14-52' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Six hotfixes. Three are cc:stable and the remainder address post-6.7 issues or aren't considered appropriate for backporting" * tag 'mm-hotfixes-stable-2024-02-27-14-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/debug_vm_pgtable: fix BUG_ON with pud advanced test mm: cachestat: fix folio read-after-free in cache walk MAINTAINERS: add memory mapping entry with reviewers mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index kasan: revert eviction of stack traces in generic mode stackdepot: use variable size records for non-evictable entries
2024-02-28libfs: Drop generic_set_encrypted_ci_d_opsGabriel Krisman Bertazi1-1/+0
No filesystems depend on it anymore, and it is generally a bad idea. Since all dentries should have the same set of dentry operations in case-insensitive capable filesystems, it should be propagated through ->s_d_op. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-11-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-28libfs: Add helper to choose dentry operations at mount-timeGabriel Krisman Bertazi1-0/+1
In preparation to drop the similar helper that sets d_op at lookup time, add a version to set the right d_op filesystem-wide, through sb->s_d_op. The operations structures are shared across filesystems supporting fscrypt and/or casefolding, therefore we can keep it in common libfs code. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-7-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-28fscrypt: Drop d_revalidate once the key is addedGabriel Krisman Bertazi1-7/+22
When a key is added, existing directory dentries in the DCACHE_NOKEY_NAME form are moved by the VFS to the plaintext version. But, since they have the DCACHE_OP_REVALIDATE flag set, revalidation will be done at each lookup only to return immediately, since plaintext dentries can't go stale until eviction. This patch optimizes this case, by dropping the flag once the nokey_name dentry becomes plain-text. Note that non-directory dentries are not moved this way, so they won't be affected. Of course, this can only be done if fscrypt is the only thing requiring revalidation for a dentry. For this reason, we only disable d_revalidate if the .d_revalidate hook is fscrypt_d_revalidate itself. It is safe to do it here because when moving the dentry to the plain-text version, we are holding the d_lock. We might race with a concurrent RCU lookup but this is harmless because, at worst, we will get an extra d_revalidate on the keyed dentry, which will still find the dentry to be valid. Finally, now that we do more than just clear the DCACHE_NOKEY_NAME in fscrypt_handle_d_move, skip it entirely for plaintext dentries, to avoid extra costs. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-5-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-28fscrypt: Drop d_revalidate for valid dentries during lookupGabriel Krisman Bertazi1-0/+22
Unencrypted dentries and encrypted dentries where the key is available don't need to be revalidated by fscrypt, since they don't go stale from under VFS and the key cannot be removed for the encrypted case without evicting the dentry. Disable their d_revalidate hook on the first lookup, to avoid repeated revalidation later. This is done in preparation to always configuring d_op through sb->s_d_op. The only detail is that, since the filesystem might have other features that require revalidation, we only apply this optimization if the d_revalidate handler is fscrypt_d_revalidate itself. Finally, we need to clean the dentry->flags even for unencrypted dentries, so the ->d_lock might be acquired even for them. In order to avoid doing it for filesystems that don't care about fscrypt at all, we peek ->d_flags without the lock at first, and only acquire it if we actually need to write the flag. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-4-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
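The "peek first, lock only to write" idea in the message above can be sketched in plain C. Everything here is a simplified assumption: `struct dentry_sketch`, the counting fake lock, and the numeric value of `DCACHE_OP_REVALIDATE` are illustrative only, and the check that the handler is `fscrypt_d_revalidate` itself is left out.

```c
#include <assert.h>

#define DCACHE_OP_REVALIDATE 0x4u	/* value is an assumption for this sketch */

struct dentry_sketch {
	unsigned int d_flags;
	int lock_acquisitions;	/* stands in for ->d_lock; counts how often it is taken */
};

static void d_lock(struct dentry_sketch *d)   { d->lock_acquisitions++; }
static void d_unlock(struct dentry_sketch *d) { (void)d; }

/* Clear the revalidate flag, taking the lock only when a write is needed. */
void drop_revalidate(struct dentry_sketch *d)
{
	assert(d);

	/* Unlocked peek: dentries without the flag never touch the lock. */
	if (!(d->d_flags & DCACHE_OP_REVALIDATE))
		return;

	d_lock(d);
	d->d_flags &= ~DCACHE_OP_REVALIDATE;
	d_unlock(d);
}
```

The point of the design is cost: filesystems that never set the flag pay for one read and no lock traffic.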
2024-02-28fscrypt: Factor out a helper to configure the lookup dentryGabriel Krisman Bertazi1-0/+15
Both fscrypt_prepare_lookup_partial and fscrypt_prepare_lookup will set DCACHE_NOKEY_NAME for dentries when the key is not available. Extract out a helper to set this flag in a single place, in preparation to also add the optimization that will disable ->d_revalidate if possible. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-3-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-28ovl: Always reject mounting over case-insensitive directoriesGabriel Krisman Bertazi1-0/+9
overlayfs relies on the filesystem setting DCACHE_OP_HASH or DCACHE_OP_COMPARE to reject mounting over case-insensitive directories. Since commit bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops"), we set ->d_op through a hook in ->d_lookup, which means the root dentry won't have them, causing the mount to accidentally succeed. In v6.7-rc7, the following sequence will succeed to mount, but any dentry other than the root dentry will be a "weird" dentry to ovl and fail with EREMOTE. mkfs.ext4 -O casefold lower.img mount -O loop lower.img lower mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work ovl /mnt Mounting on a subdirectory fails, as expected, because DCACHE_OP_HASH and DCACHE_OP_COMPARE are properly set by ->lookup. Fix by explicitly rejecting superblocks that allow case-insensitive dentries. Yes, this will be solved when we move d_op configuration back to ->s_d_op. Yet, we better have an explicit fix to avoid messing up again. While there, re-sort the entries to have more descriptive error messages first. Fixes: bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops") Acked-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-2-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-27workqueue: Introduce from_work() helper for cleaner callback declarationsAllen Pais1-0/+3
To streamline the transition from tasklets to workqueues, a new helper function, from_work(), is introduced. This helper, inspired by existing from_() patterns, utilizes container_of() and eliminates the redundancy of declaring variable types, leading to more concise and readable code. The modified code snippet demonstrates the enhanced clarity achieved with from_work(): void callback(struct work_struct *w) { - struct some_data_structure *local = container_of(w, struct some_data_structure, work); + struct some_data_structure *local = from_work(local, w, work); This change aims to facilitate a smoother transition and uphold code quality standards. Based on: git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git disable_work-v3 Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
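As a rough userspace illustration of how such a helper can take the type from the variable being declared, here is a sketch assuming GCC or Clang (`__typeof__`). The `struct work_struct` stand-in and the `container_of()` and `from_work()` definitions below are simplified assumptions for this example, not copies of the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel type (assumption for this sketch). */
struct work_struct { int pending; };

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The enclosing type is derived from the variable, so it is not spelled twice. */
#define from_work(var, callback_work, work_fieldname) \
	container_of(callback_work, __typeof__(*var), work_fieldname)

struct some_data_structure {
	int value;
	struct work_struct work;
};

/* Work callback: recover the enclosing structure from the embedded work item. */
int callback(struct work_struct *w)
{
	struct some_data_structure *local = from_work(local, w, work);

	assert(local != NULL);
	return local->value;
}
```

Using `local` inside its own initializer is fine here because `__typeof__` does not evaluate its operand; only the declared type is consulted.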
2024-02-27io_uring/kbuf: flag request if buffer pool is empty after buffer pickJens Axboe1-0/+3
Normally we do an extra roundtrip for retries even if the buffer pool has depleted, as we don't check that upfront. Rather than add this check, have the buffer selection methods mark the request with REQ_F_BL_EMPTY if the used buffer group is out of buffers after this selection. This is very cheap to do once we're all the way inside there anyway, and it gives the caller a chance to make better decisions on how to proceed. For example, recv/recvmsg multishot could check this flag when it decides whether to keep receiving or not. Signed-off-by: Jens Axboe <axboe@kernel.dk>
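A small sketch of the idea, under stated assumptions: the flag name `REQ_F_BL_EMPTY` comes from the message above, while its bit position, the `struct buf_group` and `struct request` layouts, and both function names are hypothetical and much simpler than the real io_uring structures.

```c
#include <assert.h>
#include <stddef.h>

#define REQ_F_BL_EMPTY (1u << 0)	/* bit position is an assumption */

struct buf_group {		/* hypothetical pool of provided buffers */
	void **bufs;
	size_t nr_avail;
};

struct request {		/* hypothetical request */
	unsigned int flags;
};

/* Pick one buffer; flag the request if that pick drained the pool. */
void *pick_buffer(struct request *req, struct buf_group *grp)
{
	void *buf;

	assert(req && grp);
	if (grp->nr_avail == 0)
		return NULL;

	buf = grp->bufs[--grp->nr_avail];
	if (grp->nr_avail == 0)
		req->flags |= REQ_F_BL_EMPTY;
	return buf;
}

/* Caller-side decision, e.g. a multishot receive choosing whether to go on. */
int should_keep_receiving(const struct request *req)
{
	return !(req->flags & REQ_F_BL_EMPTY);
}
```

The check costs one comparison at a point where the pool state is already in hand, which is why it is cheaper than probing the pool up front on every retry.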
2024-02-27dpll: rely on rcu for netdev_dpll_pin()Eric Dumazet2-10/+12
This fixes a possible UAF in if_nlmsg_size(), which can run without RTNL. Add rcu protection to "struct dpll_pin" Move netdev_dpll_pin() from netdevice.h to dpll.h to decrease name pollution. Note: This looks possible to no longer acquire RTNL in netdev_dpll_pin_assign() later in net-next. v2: do not force rcu_read_lock() in rtnl_dpll_pin_size() (Jiri Pirko) Fixes: 5f1842692880 ("netdev: expose DPLL pin handle for netdevice") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223123208.3543319-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27Merge branches 'rcu-doc.2024.02.14a', 'rcu-nocb.2024.02.14a', ↵Boqun Feng3-3/+4
'rcu-exp.2024.02.14a', 'rcu-tasks.2024.02.26a' and 'rcu-misc.2024.02.14a' into rcu.2024.02.26a
2024-02-26Merge drm/drm-fixes into drm-misc-fixesMaxime Ripard58-128/+377
Sima needs a more recent release to apply a patch. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-02-26rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocksPaul E. McKenney1-0/+2
Holding a mutex across synchronize_rcu_tasks() and acquiring that same mutex in code called from do_exit() after its call to exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop() results in deadlock. This is by design, because tasks that are far enough into do_exit() are no longer present on the tasks list, making it a bit difficult for RCU Tasks to find them, let alone wait on them to do a voluntary context switch. However, such deadlocks are becoming more frequent. In addition, lockdep currently does not detect such deadlocks and they can be difficult to reproduce. In addition, if a task voluntarily context switches during that time (for example, if it blocks acquiring a mutex), then this task is in an RCU Tasks quiescent state. And with some adjustments, RCU Tasks could just as well take advantage of that fact. This commit therefore adds the data structures that will be needed to rely on these quiescent states and to eliminate these deadlocks. Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/ Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Reported-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2024-02-25Merge tag 'pull-fixes.pathwalk-rcu-2' of ↵Linus Torvalds2-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull RCU pathwalk fixes from Al Viro: "We still have some races in filesystem methods when exposed to RCU pathwalk. This series is a result of code audit (the second round of it) and it should deal with most of that stuff. Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate(). Up to maintainers (a note for NTFS folks - when documentation says that a method may not block, it *does* imply that blocking allocations are to be avoided. Really)" [ More explanations for people who aren't familiar with the vagaries of RCU path walking: most of it is hidden from filesystems, but if a filesystem actively participates in the low-level path walking it needs to make sure the fields involved in that walk are RCU-safe. That "actively participate in low-level path walking" includes things like having its own ->d_hash()/->d_compare() routines, or by having its own directory permission function that doesn't just use the common helpers. Having a ->d_revalidate() function will also have this issue. Note that instead of making everything RCU safe you can also choose to abort the RCU pathwalk if your operation cannot be done safely under RCU, but that obviously comes with a performance penalty. One common pattern is to allow the simple cases under RCU, and abort only if you need to do something more complicated. So not everything needs to be RCU-safe, and things like the inode etc that the VFS itself maintains obviously already are. But these fixes tend to be about properly RCU-delaying things like ->s_fs_info that are maintained by the filesystem and that got potentially released too early. 
- Linus ] * tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext4_get_link(): fix breakage in RCU mode cifs_get_link(): bail out in unsafe case fuse: fix UAF in rcu pathwalks procfs: make freeing proc_fs_info rcu-delayed procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() nfs: fix UAF on pathwalk running into umount nfs: make nfs_set_verifier() safe for use in RCU pathwalk afs: fix __afs_break_callback() / afs_drop_open_mmap() race hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper affs: free affs_sb_info with kfree_rcu() rcu pathwalk: prevent bogus hard errors from may_lookup() fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
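The "allow the simple cases under RCU, and abort only if you need to do something more complicated" pattern from the merge message above can be sketched as follows. The kernel convention is that a method called in RCU mode returns -ECHILD to request a retry in the blocking mode; the structure, the function name, and the numeric value of `LOOKUP_RCU` here are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>

#define LOOKUP_RCU 0x0040u	/* flag value assumed for this sketch */

struct entry_sketch {		/* hypothetical cached entry */
	int cached_valid;	/* 1 if validity is known without blocking */
	int server_says_valid;	/* result of a (blocking) remote check */
};

/*
 * Returns 1 if valid, 0 if stale, or -ECHILD to ask the caller to
 * retry outside RCU mode.
 */
int revalidate_sketch(const struct entry_sketch *e, unsigned int flags)
{
	assert(e);

	/* Simple case: answer from cached state, safe in either mode. */
	if (e->cached_valid)
		return 1;

	/* Anything further would block, which RCU mode does not permit. */
	if (flags & LOOKUP_RCU)
		return -ECHILD;

	return e->server_says_valid;
}
```

Bailing out trades performance for safety: the fast path stays lock-free, and only the uncommon slow case pays for the retry.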
2024-02-25Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-0/+1
Pull vfs fixes from Al Viro: "A couple of fixes - revert of regression from this cycle and a fix for erofs failure exit breakage (had been there since way back)" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: erofs: fix handling kern_mount() failure Revert "get rid of DCACHE_GENOCIDE"
2024-02-25bdev: make struct bdev_handle private to the block layerChristian Brauner2-13/+0
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-29-adbd023e19cc@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25bdev: make bdev_{release, open_by_dev}() private to block layerChristian Brauner1-3/+0
Move both of them to the private block header. There's no caller in the tree anymore that uses them directly. Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-28-adbd023e19cc@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25bdev: remove bdev_open_by_path()Christian Brauner1-2/+0
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-27-adbd023e19cc@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25pktcdvd: port block device access to fileChristian Brauner1-2/+2
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-9-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25swap: port block device usage to fileChristian Brauner1-1/+1
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-5-adbd023e19cc@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25md: port block device access to fileChristian Brauner1-1/+1
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-4-adbd023e19cc@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25bdev: open block device as filesChristian Brauner2-2/+15
Add two new helpers to allow opening block devices as files. This is not the final infrastructure. This still opens the block device before opening a struct file. Until we have removed all references to struct bdev_handle we can't switch the order: * Introduce blk_to_file_flags() to translate from block specific to flags usable to open a new file. * Introduce bdev_file_open_by_{dev,path}(). * Introduce temporary sb_bdev_handle() helper to retrieve a struct bdev_handle from a block device file and update places that directly reference struct bdev_handle to rely on it. * Don't count block device opens against the number of open files. A bdev_file_open_by_{dev,path}() file is never installed into any file descriptor table. One idea that came to mind was to use kernel_tmpfile_open() which would require us to pass a path and it would then call do_dentry_open() going through the regular fops->open::blkdev_open() path. But then we're back to the problem of routing block specific flags such as BLK_OPEN_RESTRICT_WRITES through the open path and would have to waste FMODE_* flags every time we add a new one. With this we can avoid using a flag bit and we have more leeway in how we open block devices from bdev_open_by_{dev,path}(). Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25file: add alloc_file_pseudo_noaccount()Christian Brauner1-0/+2
When we open block devices as files we want to make sure to not charge them against the open file limit of the caller as that can cause spurious failures. Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25procfs: make freeing proc_fs_info rcu-delayedAl Viro1-0/+1
makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns() is still synchronous, but that's not a problem - it does rcu-delay everything that needs to be) Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25nfs: fix UAF on pathwalk running into umountAl Viro1-0/+2
NFS ->d_revalidate(), ->permission() and ->get_link() need to access some parts of nfs_server when called in RCU mode: server->flags server->caps *(server->io_stats) and, worst of all, call server->nfs_client->rpc_ops->have_delegation (the last one - as NFS_PROTO(inode)->have_delegation()). We really don't want to RCU-delay the entire nfs_free_server() (it would have to be done with schedule_work() from RCU callback, since it can't be made to run from interrupt context), but actual freeing of nfs_server and ->io_stats can be done via call_rcu() just fine. nfs_client part is handled simply by making nfs_free_client() use kfree_rcu(). Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25Merge tag 'iommu-fixes-v6.8-rc5' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - Intel VT-d fixes for nested domain handling: - Cache invalidation for changes in a parent domain - Dirty tracking setting for parent and nested domains - Fix a constant-out-of-range warning - ARM SMMU fixes: - Fix CD allocation from atomic context when using SVA with SMMUv3 - Revert the conversion of SMMUv2 to domain_alloc_paging(), as it breaks the boot for Qualcomm MSM8996 devices - Restore SVA handle sharing in core code as it turned out there are still drivers relying on it * tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/sva: Restore SVA handle sharing iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlock iommu/vt-d: Fix constant-out-of-range warning iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking iommu/vt-d: Add missing dirty tracking set for parent domain iommu/vt-d: Wrap the dirty tracking loop to be a helper iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking() iommu/vt-d: Add missing device iotlb flush for parent domain iommu/vt-d: Update iotlb in nested domain attach iommu/vt-d: Add missing iotlb flush for parent domain iommu/vt-d: Add __iommu_flush_iotlb_psi() iommu/vt-d: Track nested domains in parent Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"
2024-02-25Merge tag 'cxl-fixes-6.8-rc6' of ↵Linus Torvalds2-18/+2
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dan Williams: "A collection of significant fixes for the CXL subsystem. The largest change in this set, that bordered on "new development", is the fix for the fact that the location of the new qos_class attribute did not match the Documentation. The fix ends up deleting more code than it added, and it has a new unit test to backstop basic errors in this interface going forward. So the "red-diff" and unit test saved the "rip it out and try again" response. In contrast, the new notification path for firmware reported CXL errors (CXL CPER notifications) has a locking context bug that can not be fixed with a red-diff. Given where the release cycle stands, it is not comfortable to squeeze in that fix in these waning days. So, that receives the "back it out and try again later" treatment. There is a regression fix in the code that establishes memory NUMA nodes for platform CXL regions. That has an ack from x86 folks. There are a couple more fixups for Linux to understand (reassemble) CXL regions instantiated by platform firmware. The policy around platforms that do not match host-physical-address with system-physical-address (i.e. systems that have an address translation mechanism between the address range reported in the ACPI CEDT.CFMWS and endpoint decoders) has been softened to abort driver load rather than teardown the memory range (can cause system hangs). Lastly, there is a robustness / regression fix for cases where the driver would previously continue in the face of error, and a fixup for PCI error notification handling. 
Summary: - Fix NUMA initialization from ACPI CEDT.CFMWS - Fix region assembly failures due to async init order - Fix / simplify export of qos_class information - Fix cxl_acpi initialization vs single-window-init failures - Fix handling of repeated 'pci_channel_io_frozen' notifications - Workaround platforms that violate host-physical-address == system-physical address assumptions - Defer CXL CPER notification handling to v6.9" * tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/acpi: Fix load failures due to single window creation failure acpi/ghes: Remove CXL CPER notifications cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window cxl/test: Add support for qos_class checking cxl: Fix sysfs export of qos_class for memdev cxl: Remove unnecessary type cast in cxl_qos_class_verify() cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf' cxl/region: Allow out of order assembly of autodiscovered regions cxl/region: Handle endpoint decoders in cxl_region_find_decoder() x86/numa: Fix the sort compare func used in numa_fill_memblks() x86/numa: Fix the address overlap check in numa_fill_memblks() cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
2024-02-24sched: Add a new function to compare if two cpus have the same capacityQais Yousef1-0/+6
The new helper function is needed to help blk-mq check if it needs to dispatch the softirq on another CPU to match the performance level the IO requester is running at. This is important on HMP systems where not all CPUs have the same compute capacity. Signed-off-by: Qais Yousef <qyousef@layalina.io> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240223155749.2958009-2-qyousef@layalina.io Signed-off-by: Jens Axboe <axboe@kernel.dk>
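As a rough illustration of what such a comparison helper might look like, here is a userspace sketch. The capacity table, DEMO_NR_CPUS and demo_cpus_equal_capacity() are hypothetical; the kernel helper reads capacities from the scheduler, not from a static array.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

/* Hypothetical HMP layout: two little cores followed by two big cores. */
static const unsigned long demo_cpu_capacity[DEMO_NR_CPUS] = {
	512, 512, 1024, 1024,
};

/* Return true when both CPUs offer the same compute capacity. */
static bool demo_cpus_equal_capacity(int this_cpu, int that_cpu)
{
	assert(this_cpu >= 0 && this_cpu < DEMO_NR_CPUS);
	assert(that_cpu >= 0 && that_cpu < DEMO_NR_CPUS);

	/* A CPU trivially matches itself. */
	if (this_cpu == that_cpu)
		return true;

	return demo_cpu_capacity[this_cpu] == demo_cpu_capacity[that_cpu];
}
```

A completion path could use a check of this kind to decide whether finishing the IO locally is acceptable or whether the softirq should be raised on the requester's CPU.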
2024-02-24Merge tag 'scsi-fixes' of ↵Linus Torvalds1-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Six fixes: the four driver ones are pretty trivial. The larger two core changes are to try to fix various USB attached devices which have somewhat eccentric ways of handling the VPD and other mode pages which necessitate multiple revalidates (that were removed in the interests of efficiency) and updating the heuristic for supported VPD pages" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: jazz_esp: Only build if SCSI core is builtin scsi: smartpqi: Fix disable_managed_interrupts scsi: ufs: Uninitialized variable in ufshcd_devfreq_target() scsi: target: pscsi: Fix bio_put() for error case scsi: core: Consult supported VPD page list prior to fetching page scsi: sd: usb_storage: uas: Access media prior to querying device properties
2024-02-24stackdepot: use variable size records for non-evictable entriesMarco Elver1-0/+3
With the introduction of stack depot evictions, each stack record is now fixed size, so that future reuse after an eviction can safely store differently sized stack traces. In all cases that do not make use of evictions, this wastes lots of space. Fix it by re-introducing variable size stack records (up to the max allowed size) for entries that will never be evicted. We know if an entry will never be evicted if the flag STACK_DEPOT_FLAG_GET is not provided, since a later stack_depot_put() attempt is undefined behavior. With my current kernel config that enables KASAN and also SLUB owner tracking, I observe (after a kernel boot) a whopping reduction of 296 stack depot pools, which translates into 4736 KiB saved. The savings here are from SLUB owner tracking only, because KASAN generic mode still uses refcounting. Before: pools: 893 allocations: 29841 frees: 6524 in_use: 23317 freelist_size: 3454 After: pools: 597 refcounted_allocations: 17547 refcounted_frees: 6477 refcounted_in_use: 11070 freelist_size: 3497 persistent_count: 12163 persistent_bytes: 1717008 [elver@google.com: fix -Wstringop-overflow warning] Link: https://lore.kernel.org/all/20240201135747.18eca98e@canb.auug.org.au/ Link: https://lkml.kernel.org/r/20240201090434.1762340-1-elver@google.com Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Link: https://lkml.kernel.org/r/20240129100708.39460-1-elver@google.com Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> 
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
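The savings quoted in the stack depot commit can be checked with simple arithmetic. The sketch below uses the pool counts from the message; the 16 KiB pool size is an assumption inferred from the reported figures (4736 KiB / 296 pools), and the demo_* names are hypothetical.

```c
#include <assert.h>

/* Pool counts reported in the commit message (before and after). */
enum { DEMO_POOLS_BEFORE = 893, DEMO_POOLS_AFTER = 597 };

/* Assumed pool size in KiB, inferred from 4736 KiB saved over 296 pools. */
enum { DEMO_POOL_SIZE_KIB = 16 };

/* Number of pools no longer needed after the change. */
static unsigned int demo_pools_saved(unsigned int before, unsigned int after)
{
	assert(before >= after);
	return before - after;
}

/* Memory saved in KiB, given the assumed per-pool size. */
static unsigned int demo_kib_saved(unsigned int before, unsigned int after)
{
	return demo_pools_saved(before, after) * DEMO_POOL_SIZE_KIB;
}
```

With the reported counts this gives 893 - 597 = 296 pools and 296 * 16 = 4736 KiB, matching the numbers in the message.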
2024-02-23KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changingSean Christopherson1-0/+26
Retry page faults without acquiring mmu_lock, and without even faulting the page into the primary MMU, if the resolved gfn is covered by an active invalidation. Contending for mmu_lock is especially problematic on preemptible kernels as the mmu_notifier invalidation task will yield mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation, and ultimately increase the latency of resolving the page fault. And in the worst case scenario, yielding will be accompanied by a remote TLB flush, e.g. if the invalidation covers a large range of memory and vCPUs are accessing addresses that were already zapped. Faulting the page into the primary MMU is similarly problematic, as doing so may acquire locks that need to be taken for the invalidation to complete (the primary MMU has finer grained locks than KVM's MMU), and/or may cause unnecessary churn (getting/putting pages, marking them accessed, etc). Alternatively, the yielding issue could be mitigated by teaching KVM's MMU iterators to perform more work before yielding, but that wouldn't solve the lock contention and would negatively affect scenarios where a vCPU is trying to fault in an address that is NOT covered by the in-progress invalidation. Add a dedicated lockless version of the range-based retry check to avoid false positives on the sanity check on start+end WARN, and so that it's super obvious that checking for a racing invalidation without holding mmu_lock is unsafe (though obviously useful). Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that pre-checking invalidation in a loop won't put KVM into an infinite loop, e.g. due to caching the in-progress flag and never seeing it go to '0'. Force a load of mmu_invalidate_seq as well, even though it isn't strictly necessary to avoid an infinite loop, as doing so improves the probability that KVM will detect an invalidation that already completed before acquiring mmu_lock and bailing anyways. 
Do the pre-check even for non-preemptible kernels, as waiting to detect the invalidation until mmu_lock is held guarantees the vCPU will observe the worst case latency in terms of handling the fault, and can generate even more mmu_lock contention. E.g. the vCPU will acquire mmu_lock, detect retry, drop mmu_lock, re-enter the guest, retake the fault, and eventually re-acquire mmu_lock. This behavior is also why there are no new starvation issues due to losing the fairness guarantees provided by rwlocks: if the vCPU needs to retry, it _must_ drop mmu_lock, i.e. waiting on mmu_lock doesn't guarantee forward progress in the face of _another_ mmu_notifier invalidation event. Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE() may generate a load into a register instead of doing a direct comparison (MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost is a few bytes of code and maaaaybe a cycle or three. Reported-by: Yan Zhao <yan.y.zhao@intel.com> Closes: https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com Reported-by: Friedrich Weber <f.weber@proxmox.com> Cc: Kai Huang <kai.huang@intel.com> Cc: Yan Zhao <yan.y.zhao@intel.com> Cc: Yuan Yao <yuan.yao@linux.intel.com> Cc: Xu Yilun <yilun.xu@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/r/20240222012640.2820927-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
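The lockless pre-check described above can be sketched in userspace as follows. This is an approximation under stated assumptions: struct demo_kvm, the range field names, demo_retry_gfn_unsafe() and the DEMO_READ_ONCE() macro are hypothetical stand-ins (the macro relies on the GCC/Clang __typeof__ extension), not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace approximation of READ_ONCE(): force a fresh load via volatile. */
#define DEMO_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical subset of the invalidation state tracked per VM. */
struct demo_kvm {
	unsigned long mmu_invalidate_seq;
	long mmu_invalidate_in_progress;
	unsigned long mmu_invalidate_range_start;
	unsigned long mmu_invalidate_range_end;
};

/*
 * Lockless ("unsafe") retry check: the answer may be stale by the time the
 * caller acts on it, so it only serves as an early bail-out before taking
 * the lock, never as the final decision.
 */
static bool demo_retry_gfn_unsafe(struct demo_kvm *kvm, unsigned long mmu_seq,
				  unsigned long gfn)
{
	assert(kvm != NULL);

	/* An in-progress invalidation covering gfn means the fault must retry. */
	if (DEMO_READ_ONCE(kvm->mmu_invalidate_in_progress) &&
	    gfn >= kvm->mmu_invalidate_range_start &&
	    gfn < kvm->mmu_invalidate_range_end)
		return true;

	/* A changed sequence count means an invalidation completed meanwhile. */
	return DEMO_READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
}
```

The forced loads matter when the check sits in a loop: without them the compiler may keep the flag in a register and never observe it returning to zero.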
2024-02-23Merge tag 'mm-hotfixes-stable-2024-02-22-15-02' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "A batch of MM (and one non-MM) hotfixes. Ten are cc:stable and the remainder address post-6.7 issues or aren't considered appropriate for backporting" * tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: kasan: guard release_free_meta() shadow access with kasan_arch_is_ready() mm/damon/lru_sort: fix quota status loss due to online tunings mm/damon/reclaim: fix quota stauts loss due to online tunings MAINTAINERS: mailmap: update Shakeel's email address mm/damon/sysfs-schemes: handle schemes sysfs dir removal before commit_schemes_quota_goals mm: memcontrol: clarify swapaccount=0 deprecation warning mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array mm/zswap: invalidate duplicate entry when !zswap_enabled lib/Kconfig.debug: TEST_IOV_ITER depends on MMU mm/swap: fix race when skipping swapcache mm/swap_state: update zswap LRU's protection range with the folio locked selftests/mm: uffd-unit-test check if huge page size is 0 mm/damon/core: check apply interval in damon_do_apply_schemes() mm: zswap: fix missing folio cleanup in writeback race path
2024-02-23Merge tag 'drm-fixes-2024-02-23' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2-1/+14
Pull drm fixes from Dave Airlie: "This is the weekly drm fixes. Non-drivers there is a fbdev/sparc fix, syncobj, ttm and buddy fixes. On the driver side, ivpu, meson, i915 have a small fix each. Then amdgpu and xe have a bunch. Nouveau has some minor uapi additions to give userspace some useful info along with a Kconfig change to allow the new GSP firmware paths to be used by default on the GPUs it supports. Seems about the usual amount for this time of release cycle. fbdev: - fix sparc undefined reference syncobj: - fix sync obj fence waiting - handle NULL fence in syncobj eventfd code ttm: - fix invalid free buddy: - fix list handling - fix 32-bit build meson: - don't remove bridges from other drivers nouveau: - fix build warnings - add two minor info parameters - add a Kconfig to allow GSP by default on some GPUs ivpu: - allow fw to do initial tile config i915: - fix TV mode amdgpu: - Suspend/resume fixes - Backlight error fix - DCN 3.5 fixes - Misc fixes xe: - Remove support for persistent exec_queues - Drop a redundant sysfs newline printout - A three-patch fix for a VM_BIND rebind optimization path - Fix a modpost warning on an xe KUNIT module" * tag 'drm-fixes-2024-02-23' of git://anongit.freedesktop.org/drm/drm: (27 commits) nouveau: add an ioctl to report vram usage nouveau: add an ioctl to return vram bar size. 
nouveau/gsp: add kconfig option to enable GSP paths by default drm/amdgpu: Fix the runtime resume failure issue drm/amd/display: fix null-pointer dereference on edid reading drm/amd/display: Fix memory leak in dm_sw_fini() drm/amd/display: fix input states translation error for dcn35 & dcn351 drm/amd/display: Fix potential null pointer dereference in dc_dmub_srv drm/amd/display: Only allow dig mapping to pwrseq in new asic drm/amd/display: adjust few initialization order in dm drm/syncobj: handle NULL fence in syncobj_eventfd_entry_func drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is set drm/ttm: Fix an invalid freeing on already freed page in error path sparc: Fix undefined reference to fb_is_primary_device drm/xe: Fix modpost warning on xe_mocs kunit module drm/xe/xe_gt_idle: Drop redundant newline in name drm/xe: Return 2MB page size for compact 64k PTEs drm/xe: Add XE_VMA_PTE_64K VMA flag drm/xe: Fix xe_vma_set_pte_size drm/xe/uapi: Remove support for persistent exec_queues ...
2024-02-23iommu/sva: Restore SVA handle sharingJason Gunthorpe1-0/+3
Prior to commit 092edaddb660 ("iommu: Support mm PASID 1:n with sva domains") the code allowed a SVA handle to be bound multiple times to the same (mm, device) pair. This was alluded to in the kdoc comment, but we had understood this to be more a remark about allowing multiple devices, not a literal same-driver re-opening the same SVA. It turns out uacce and idxd were both relying on the core code to handle reference counting for same-device same-mm scenarios. As this looks hard to resolve in the drivers bring it back to the core code. The new design has changed the meaning of the domain->users refcount to refer to the number of devices that are sharing that domain for the same mm. This is part of the design to lift the SVA domain de-duplication out of the drivers. Return the old behavior by explicitly de-duplicating the struct iommu_sva handle. The same (mm, device) will return the same handle pointer and the core code will handle tracking this. The last unbind of the handle will destroy it. Fixes: 092edaddb660 ("iommu: Support mm PASID 1:n with sva domains") Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org> Closes: https://lore.kernel.org/all/20240221110658.529-1-zhangfei.gao@linaro.org/ Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-9455fc497a6f+3b4-iommu_sva_sharing_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
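The handle de-duplication described above can be modelled with a small userspace sketch: binding the same (mm, device) pair twice returns the same handle and bumps a reference count, and the last unbind destroys it. All names here (struct demo_sva_handle, demo_sva_bind(), demo_sva_unbind()) are hypothetical and the global list is unlocked, unlike the real core code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle keyed by an (mm, device) pair. */
struct demo_sva_handle {
	const void *mm;
	const void *dev;
	unsigned int users;
	struct demo_sva_handle *next;
};

/* Global list of live handles (no locking in this sketch). */
static struct demo_sva_handle *demo_handles;

/* Return the existing handle for (mm, dev) or create a new one. */
static struct demo_sva_handle *demo_sva_bind(const void *mm, const void *dev)
{
	struct demo_sva_handle *h;

	for (h = demo_handles; h; h = h->next) {
		if (h->mm == mm && h->dev == dev) {
			h->users++;	/* same pair: share the handle */
			return h;
		}
	}

	h = malloc(sizeof(*h));
	if (!h)
		return NULL;
	h->mm = mm;
	h->dev = dev;
	h->users = 1;
	h->next = demo_handles;
	demo_handles = h;
	return h;
}

/* Drop one reference; returns the remaining user count (0 = destroyed). */
static unsigned int demo_sva_unbind(struct demo_sva_handle *handle)
{
	struct demo_sva_handle **pp;

	assert(handle != NULL && handle->users > 0);

	if (--handle->users > 0)
		return handle->users;

	/* Last user: unlink from the list and free. */
	for (pp = &demo_handles; *pp; pp = &(*pp)->next) {
		if (*pp == handle) {
			*pp = handle->next;
			break;
		}
	}
	free(handle);
	return 0;
}
```

Keeping the count in the handle means drivers that re-open the same device for the same mm need no bookkeeping of their own.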
2024-02-23drm/bridge: aux-hpd: separate allocation and registrationJohan Hovold1-0/+15
Combining allocation and registration is an anti-pattern that should be avoided. Add two new functions for allocating and registering a dp-hpd bridge with a proper 'devm' prefix so that it is clear that these are device managed interfaces. devm_drm_dp_hpd_bridge_alloc() devm_drm_dp_hpd_bridge_add() The new interface will be used to fix a use-after-free bug in the Qualcomm PMIC GLINK driver and may prevent similar issues from being introduced elsewhere. The existing drm_dp_hpd_bridge_register() is reimplemented using the above and left in place for now. Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240217150228.5788-3-johan+linaro@kernel.org
2024-02-23ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()Richard Fitzgerald1-0/+2
snd_soc_card_get_kcontrol() must be holding a read lock on card->controls_rwsem while walking the controls list. Compare with snd_ctl_find_numid(). The existing function is renamed snd_soc_card_get_kcontrol_locked() so that it can be called from contexts that are already holding card->controls_rwsem (for example, control get/put functions). There are few direct or indirect callers of snd_soc_card_get_kcontrol(), and most are safe. Three require changes, which have been included in this patch: codecs/cs35l45.c: cs35l45_activate_ctl() is called from a control put() function so is changed to call snd_soc_card_get_kcontrol_locked(). codecs/cs35l56.c: cs35l56_sync_asp1_mixer_widgets_with_firmware() is called from control get()/put() functions so is changed to call snd_soc_card_get_kcontrol_locked(). fsl/fsl_xcvr.c: fsl_xcvr_activate_ctl() is called from three places, one of which already holds card->controls_rwsem: 1. fsl_xcvr_mode_put(), a control put function, which will already be holding card->controls_rwsem. 2. fsl_xcvr_startup(), a DAI startup function. 3. fsl_xcvr_shutdown(), a DAI shutdown function. To fix this, fsl_xcvr_activate_ctl() has been changed to call snd_soc_card_get_kcontrol_locked() so that it is safe to call directly from fsl_xcvr_mode_put(). The fsl_xcvr_startup() and fsl_xcvr_shutdown() functions have been changed to take a read lock on card->controls_rwsem around calls to fsl_xcvr_activate_ctl(). While this is not very elegant, it keeps the change small, to avoid this patch creating a large collateral churn in fsl/fsl_xcvr.c. Analysis of other callers of snd_soc_card_get_kcontrol() shows that they do not need any changes: they are not holding card->controls_rwsem when they call snd_soc_card_get_kcontrol(). 
Direct callers of snd_soc_card_get_kcontrol(): fsl/fsl_spdif.c: fsl_spdif_dai_probe() - DAI probe function fsl/fsl_micfil.c: voice_detected_fn() - IRQ handler Indirect callers via soc_component_notify_control(): codecs/cs42l43: cs42l43_mic_shutter() - IRQ handler codecs/cs42l43: cs42l43_spk_shutter() - IRQ handler codecs/ak4118.c: ak4118_irq_handler() - IRQ handler codecs/wm_adsp.c: wm_adsp_write_ctl() - not currently used Indirect callers via snd_soc_limit_volume(): qcom/sc8280xp.c: sc8280xp_snd_init() - DAIlink init function ti/rx51.c: rx51_aic34_init() - DAI init function I don't have hardware to test the fsl/*, qcom/sc8280xp.c, ti/rx51.c and ak4118.c changes. Backport note: The fsl/, qcom/, cs35l45, cs35l56 and cs42l43 callers were added since the Fixes commit so won't all be present on older kernels. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Fixes: 209c6cdfd283 ("ASoC: soc-card: move snd_soc_card_get_kcontrol() to soc-card") Link: https://lore.kernel.org/r/20240221123710.690224-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-02-23ALSA: pcm: clarify and fix default msbits value for all formatsJaroslav Kysela1-2/+2
Return the used most significant bits from the sample bit-width rather than the whole physical sample word size. The starting bit offset is defined in the format itself. The behaviour is not changed for 32-bit formats like S32_LE. But with this change, an msbits value of 24 instead of 32 is returned for 24-bit formats like S24_LE etc. Also, commit 2112aa034907 ("ALSA: pcm: Introduce MSBITS subformat interface") compares the sample bit-width, not the physical sample bit-width, to reset the MSBITS_MAX bit from the subformat bitmask. Probably no applications are using the msbits value for other than S32_LE/U32_LE formats, because no drivers are reducing the msbits value for other formats (with the msb offset) at the moment. For sanity, increase the PCM protocol version, letting user space detect the changed behaviour. Signed-off-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20240222173649.1447549-1-perex@perex.cz Signed-off-by: Takashi Iwai <tiwai@suse.de>
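The difference between sample bit-width and physical width can be shown with a tiny lookup sketch. The table and demo_default_msbits() are hypothetical illustrations, not ALSA code; the widths listed are the usual ones for these formats (S24_LE carries 24 significant bits in a 32-bit container).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical format table: significant bits vs. storage size in bits. */
struct demo_pcm_format {
	const char *name;
	int width;	/* sample bit-width */
	int phys_width;	/* physical word size */
};

static const struct demo_pcm_format demo_formats[] = {
	{ "S16_LE", 16, 16 },
	{ "S24_LE", 24, 32 },
	{ "S32_LE", 32, 32 },
};

/* Default msbits: the sample bit-width, not the physical width. */
static int demo_default_msbits(const char *name)
{
	size_t i;

	assert(name != NULL);

	for (i = 0; i < sizeof(demo_formats) / sizeof(demo_formats[0]); i++) {
		if (strcmp(demo_formats[i].name, name) == 0)
			return demo_formats[i].width;
	}
	return -1;	/* unknown format in this sketch */
}
```

Only formats whose two widths differ, such as S24_LE, see a different default after the change; S32_LE still reports 32.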
2024-02-23net: mctp: take ownership of skb in mctp_local_outputJeremy Kerr1-0/+1
Currently, mctp_local_output only takes ownership of skb on success, and we may leak an skb if mctp_local_output fails in specific states; the skb ownership isn't transferred until the actual output routing occurs. Instead, make mctp_local_output free the skb on all error paths up to the route action, so it always consumes the passed skb. Fixes: 833ef3b91de6 ("mctp: Populate socket implementation") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240220081053.1439104-1-jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
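The ownership rule described above (the output function always consumes the buffer, on success and on every error path) can be sketched in userspace like this. struct demo_buf, the demo_* helpers and the error codes are hypothetical; the real code works on struct sk_buff, and the final hand-off to the route is modelled here as a plain free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer type standing in for an skb. */
struct demo_buf {
	size_t len;
};

/* Counts frees so callers (and tests) can observe consumption. */
static int demo_frees;

static struct demo_buf *demo_buf_alloc(size_t len)
{
	struct demo_buf *buf = malloc(sizeof(*buf));

	if (buf)
		buf->len = len;
	return buf;
}

static void demo_buf_free(struct demo_buf *buf)
{
	demo_frees++;
	free(buf);
}

/* Always consumes buf: the caller must not touch it after the call. */
static int demo_local_output(struct demo_buf *buf, int have_route)
{
	int rc;

	assert(buf != NULL);

	if (buf->len == 0) {
		rc = -1;	/* hypothetical "invalid message" error */
		goto out_free;
	}
	if (!have_route) {
		rc = -2;	/* hypothetical "no route" error */
		goto out_free;
	}

	/* Hand-off point: the route action owns the buffer from here on. */
	demo_buf_free(buf);
	return 0;

out_free:
	/* Every error path before the hand-off releases the buffer. */
	demo_buf_free(buf);
	return rc;
}
```

Funnelling all early exits through one label keeps the "callee always consumes" contract easy to audit, since a new error path only has to jump to out_free.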