summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-04-22Merge tag 'sound-5.18-rc4' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "At this time, the majority of changes are for pending ASoC fixes while a few usual HD-audio and USB-audio quirks are found. Almost all patches are small device-specific fixes, and nothing worrisome stands out, so far" * tag 'sound-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (37 commits) ALSA: hda/realtek: Add quirk for Clevo NP70PNP ALSA: hda: intel-dsp-config: Add RaptorLake PCI IDs ALSA: hda/realtek: Enable mute/micmute LEDs and limit mic boost on EliteBook 845/865 G9 ALSA: usb-audio: Clear MIDI port active flag after draining ALSA: usb-audio: add mapping for MSI MAG X570S Torpedo MAX. ALSA: hda/i915: Fix one too many pci_dev_put() ALSA: hda/hdmi: add HDMI codec VID for Raptorlake-P ALSA: hda/hdmi: fix warning about PCM count when used with SOF sound/oss/dmasound: fix 'dmasound_setup' defined but not used firmware: cs_dsp: Fix overrun of unterminated control name string ASoC: codecs: Fix an error handling path in (rx|tx|va)_macro_probe() ASoC: Intel: sof_es8336: Add a quirk for Huawei Matebook D15 ASoC: Intel: sof_es8336: add a quirk for headset at mic1 port ASoC: Intel: sof_es8336: support a separate gpio to control headphone ASoC: Intel: sof_es8336: simplify speaker gpio naming ASoC: wm8731: Disable the regulator when probing fails ASoC: Intel: soc-acpi: correct device endpoints for max98373 ASoC: codecs: wcd934x: do not switch off SIDO Buck when codec is in use ASoC: SOF: topology: Fix memory leak in sof_control_load() ASoC: SOF: topology: cleanup dailinks on widget unload ...
2022-04-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-0/+28
Merge misc fixes from Andrew Morton: "13 patches. Subsystems affected by this patch series: mm (memory-failure, memcg, userfaultfd, hugetlbfs, mremap, oom-kill, kasan, hmm), and kcov" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() kcov: don't generate a warning on vm_insert_page()'s failure MAINTAINERS: add Vincenzo Frascino to KASAN reviewers oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup selftest/vm: add skip support to mremap_test selftest/vm: support xfail in mremap_test selftest/vm: verify remap destination address in mremap_test selftest/vm: verify mmap addr in mremap_test mm, hugetlb: allow for "high" userspace addresses userfaultfd: mark uffd_wp regardless of VM_WRITE flag memcg: sync flush only if periodic flush is delayed mm/memory-failure.c: skip huge_zero_page in memory_failure() mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
2022-04-22oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanupNico Pache1-0/+1
The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can be targeted by the oom reaper. This mapping is used to store the futex robust list head; the kernel does not keep a copy of the robust list and instead references a userspace address to maintain the robustness during a process death. A race can occur between exit_mm and the oom reaper that allows the oom reaper to free the memory of the futex robust list before the exit path has handled the futex death: CPU1 CPU2 -------------------------------------------------------------------- page_fault do_exit "signal" wake_oom_reaper oom_reaper oom_reap_task_mm (invalidates mm) exit_mm exit_mm_release futex_exit_release futex_cleanup exit_robust_list get_user (EFAULT- can't access memory) If the get_user EFAULT's, the kernel will be unable to recover the waiters on the robust_list, leaving userspace mutexes hung indefinitely. Delay the OOM reaper, allowing more time for the exit path to perform the futex cleanup. Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer Based on a patch by Michal Hocko. Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1] Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Joel Savitz <jsavitz@redhat.com> Signed-off-by: Nico Pache <npache@redhat.com> Co-developed-by: Joel Savitz <jsavitz@redhat.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Herton R. Krzesinski <herton@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Savitz <jsavitz@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-22mm, hugetlb: allow for "high" userspace addressesChristophe Leroy1-0/+8
This is a fix for commit f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") for hugetlb. This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of hugetlb_get_unmapped_area() function. However, arm64 uses the generic version of that function. So take into account arch_get_mmap_base() and arch_get_mmap_end() in hugetlb_get_unmapped_area(). To allow that, move those two macros out of mm/mmap.c into include/linux/sched/mm.h If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them. For the time being, only ARM64 is affected by this change. Catalin (ARM64) said "We should have fixed hugetlb_get_unmapped_area() as well when we added support for 52-bit VA. The reason for commit f6795053dac8 was to prevent normal mmap() from returning addresses above 48-bit by default as some user-space had hard assumptions about this. It's a slight ABI change if you do this for hugetlb_get_unmapped_area() but I doubt anyone would notice. It's more likely that the current behaviour would cause issues, so I'd rather have them consistent. Basically when arm64 gained support for 52-bit addresses we did not want user-space calling mmap() to suddenly get such high addresses, otherwise we could have inadvertently broken some programs (similar behaviour to x86 here). Hence we added commit f6795053dac8. But we missed hugetlbfs which could still get such high mmap() addresses. So in theory that's a potential regression that should have bee addressed at the same time as commit f6795053dac8 (and before arm64 enabled 52-bit addresses)" Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu Fixes: f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-22memcg: sync flush only if periodic flush is delayedShakeel Butt1-0/+5
Daniel Dao has reported [1] a regression on workloads that may trigger a lot of refaults (anon and file). The underlying issue is that flushing rstat is expensive. Although rstat flush are batched with (nr_cpus * MEMCG_BATCH) stat updates, it seems like there are workloads which genuinely do stat updates larger than batch value within short amount of time. Since the rstat flush can happen in the performance critical codepaths like page faults, such workload can suffer greatly. This patch fixes this regression by making the rstat flushing conditional in the performance critical codepaths. More specifically, the kernel relies on the async periodic rstat flusher to flush the stats and only if the periodic flusher is delayed by more than twice the amount of its normal time window then the kernel allows rstat flushing from the performance critical codepaths. Now the question: what are the side-effects of this change? The worst that can happen is the refault codepath will see 4sec old lruvec stats and may cause false (or missed) activations of the refaulted page which may under-or-overestimate the workingset size. Though that is not very concerning as the kernel can already miss or do false activations. There are two more codepaths whose flushing behavior is not changed by this patch and we may need to come to them in future. One is the writeback stats used by dirty throttling and second is the deactivation heuristic in the reclaim. For now keeping an eye on them and if there is report of regression due to these codepaths, we will reevaluate then. Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com [1] Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Daniel Dao <dqminh@cloudflare.com> Tested-by: Ivan Babrou <ivan@cloudflare.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Frank Hofmann <fhofmann@cloudflare.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-22mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()Naoya Horiguchi2-0/+14
There is a race condition between memory_failure_hugetlb() and hugetlb free/demotion, which causes setting PageHWPoison flag on the wrong page. The one simple result is that wrong processes can be killed, but another (more serious) one is that the actual error is left unhandled, so no one prevents later access to it, and that might lead to more serious results like consuming corrupted data. Think about the below race window: CPU 1 CPU 2 memory_failure_hugetlb struct page *head = compound_head(p); hugetlb page might be freed to buddy, or even changed to another compound page. get_hwpoison_page -- page is not what we want now... The current code first does prechecks roughly and then reconfirms after taking refcount, but it's found that it makes code overly complicated, so move the prechecks in a single hugetlb_lock range. A newly introduced function, try_memory_failure_hugetlb(), always takes hugetlb_lock (even for non-hugetlb pages). That can be improved, but memory_failure() is rare in principle, so should not be a big problem. Link: https://lkml.kernel.org/r/20220408135323.1559401-2-naoya.horiguchi@linux.dev Fixes: 761ad8d7c7b5 ("mm: hwpoison: introduce memory_failure_hugetlb()") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21Merge tag 'net-5.18-rc4' of ↵Linus Torvalds3-6/+11
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from xfrm and can. Current release - regressions: - rxrpc: restore removed timer deletion Current release - new code bugs: - gre: fix device lookup for l3mdev use-case - xfrm: fix egress device lookup for l3mdev use-case Previous releases - regressions: - sched: cls_u32: fix netns refcount changes in u32_change() - smc: fix sock leak when release after smc_shutdown() - xfrm: limit skb_page_frag_refill use to a single page - eth: atlantic: invert deep par in pm functions, preventing null derefs - eth: stmmac: use readl_poll_timeout_atomic() in atomic state Previous releases - always broken: - gre: fix skb_under_panic on xmit - openvswitch: fix OOB access in reserve_sfa_size() - dsa: hellcreek: calculate checksums in tagger - eth: ice: fix crash in switchdev mode - eth: igc: - fix infinite loop in release_swfw_sync - fix scheduling while atomic" * tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits) drivers: net: hippi: Fix deadlock in rr_close() selftests: mlxsw: vxlan_flooding_ipv6: Prevent flooding of unwanted packets selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets nfc: MAINTAINERS: add Bug entry net: stmmac: Use readl_poll_timeout_atomic() in atomic state doc/ip-sysctl: add bc_forwarding netlink: reset network and mac headers in netlink_dump() net: mscc: ocelot: fix broken IP multicast flooding net: dsa: hellcreek: Calculate checksums in tagger net: atlantic: invert deep par in pm functions, preventing null derefs can: isotp: stop timeout monitoring when no first frame was sent bonding: do not discard lowest hash bit for non layer3+4 hashing net: lan966x: Make sure to release ptp interrupt ipv6: make ip6_rt_gc_expire an atomic_t net: Handle l3mdev in ip_tunnel_init_flow l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu net/sched: cls_u32: fix possible leak in u32_init_knode() net/sched: cls_u32: fix netns refcount changes in u32_change() powerpc: Update MAINTAINERS for ibmvnic and VAS net: restore alpha order to Ethernet devices in config ...
2022-04-19vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAPSong Liu1-2/+2
Huge page backed vmalloc memory could benefit performance in many cases. However, some users of vmalloc may not be ready to handle huge pages for various reasons: hardware constraints, potential pages split, etc. VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt-out huge pages. However, it is not easy to track down all the users that require the opt-out, as the allocation are passed different stacks and may cause issues in different layers. To address this issue, replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that users that benefit from huge pages could ask specificially. Also, remove vmalloc_no_huge() and add opt-in helper vmalloc_huge(). Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP") Link: https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/" Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Rik van Riel <riel@surriel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-19fs: fix acl translationChristian Brauner1-0/+4
Last cycle we extended the idmapped mounts infrastructure to support idmapped mounts of idmapped filesystems (No such filesystem yet exist.). Since then, the meaning of an idmapped mount is a mount whose idmapping is different from the filesystems idmapping. While doing that work we missed to adapt the acl translation helpers. They still assume that checking for the identity mapping is enough. But they need to use the no_idmapping() helper instead. Note, POSIX ACLs are always translated right at the userspace-kernel boundary using the caller's current idmapping and the initial idmapping. The order depends on whether we're coming from or going to userspace. The filesystem's idmapping doesn't matter at the border. Consequently, if a non-idmapped mount is passed we need to make sure to always pass the initial idmapping as the mount's idmapping and not the filesystem idmapping. Since it's irrelevant here it would yield invalid ids and prevent setting acls for filesystems that are mountable in a userns and support posix acls (tmpfs and fuse). I verified the regression reported in [1] and verified that this patch fixes it. A regression test will be added to xfstests in parallel. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1] Fixes: bd303368b776 ("fs: support mapped mounts of mapped filesystems") Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # 5.17 Cc: <regressions@lists.linux.dev> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-19Merge tag 'asoc-fix-v5.18-rc3' of ↵Takashi Iwai750-7992/+17155
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.18 A collection of fixes that came in since the merge window, plus one new device ID for an x86 laptop. Nothing that really stands out with particularly big impact outside of the affected device.
2022-04-17Merge tag 'gpio-fixes-for-v5.18-rc3' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: "A single fix for gpio-sim and two patches for GPIO ACPI pulled from Andy: - fix the set/get_multiple() callbacks in gpio-sim - use correct format characters in gpiolib-acpi - use an unsigned type for pins in gpiolib-acpi" * tag 'gpio-fixes-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: sim: fix setting and getting multiple lines gpiolib: acpi: Convert type for pin to be unsigned gpiolib: acpi: use correct format characters
2022-04-17Merge tag 'random-5.18-rc3-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator fixes from Jason Donenfeld: - Per your suggestion, random reads now won't fail if there's a page fault after some non-zero amount of data has been read, which makes the behavior consistent with all other reads in the kernel. - Rather than an inconsistent mix of random_get_entropy() returning an unsigned long or a cycles_t, now it just returns an unsigned long. - A memcpy() was replaced with an memmove(), because the addresses are sometimes overlapping. In practice the destination is always before the source, so not really an issue, but better to be correct than not. * tag 'random-5.18-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: random: use memmove instead of memcpy for remaining 32 bytes random: make random_get_entropy() return an unsigned long random: allow partial reads if later user copies fail
2022-04-16Merge tag 'scsi-fixes' of ↵Linus Torvalds2-5/+8
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "13 fixes, all in drivers. The most extensive changes are in the iscsi series (affecting drivers qedi, cxgbi and bnx2i), the next most is scsi_debug, but that's just a simple revert and then minor updates to pm80xx" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: iscsi: MAINTAINERS: Add Mike Christie as co-maintainer scsi: qedi: Fix failed disconnect handling scsi: iscsi: Fix NOP handling during conn recovery scsi: iscsi: Merge suspend fields scsi: iscsi: Fix unbound endpoint error handling scsi: iscsi: Fix conn cleanup and stop race during iscsid restart scsi: iscsi: Fix endpoint reuse regression scsi: iscsi: Release endpoint ID when its freed scsi: iscsi: Fix offload conn cleanup when iscsid restarts scsi: iscsi: Move iscsi_ep_disconnect() scsi: pm80xx: Enable upper inbound, outbound queues scsi: pm80xx: Mask and unmask upper interrupt vectors 32-63 Revert "scsi: scsi_debug: Address races following module load"
2022-04-16Merge tag 'intel-gpio-v5.18-2' of ↵Bartosz Golaszewski1-1/+7
gitolite.kernel.org:pub/scm/linux/kernel/git/andy/linux-gpio-intel into gpio/for-current intel-gpio for v5.18-2 * Couple of fixes related to handling unsigned value of the pin from ACPI gpiolib: - acpi: Convert type for pin to be unsigned - acpi: use correct format characters
2022-04-16Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+24
Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: MAINTAINERS, binfmt, and mm (tmpfs, secretmem, kasan, kfence, pagealloc, zram, compaction, hugetlb, vmalloc, and kmemleak)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: kmemleak: take a full lowmem check in kmemleak_*_phys() mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE" revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders" hugetlb: do not demote poisoned hugetlb pages mm: compaction: fix compiler warning when CONFIG_COMPACTION=n mm: fix unexpected zeroed page mapping with zram swap mm, page_alloc: fix build_zonerefs_node() mm, kfence: support kmem_dump_obj() for KFENCE objects kasan: fix hw tags enablement when KUNIT tests are disabled irq_work: use kasan_record_aux_stack_noalloc() record callstack mm/secretmem: fix panic when growing a memfd_secret tmpfs: fix regressions from wider use of ZERO_PAGE MAINTAINERS: Broadcom internal lists aren't maintainers
2022-04-16mm, kfence: support kmem_dump_obj() for KFENCE objectsMarco Elver1-0/+24
Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been producing garbage data due to the object not actually being maintained by SLAB or SLUB. Fix this by implementing __kfence_obj_info() that copies relevant information to struct kmem_obj_info when the object was allocated by KFENCE; this is called by a common kmem_obj_info(), which also calls the slab/slub/slob specific variant now called __kmem_obj_info(). For completeness, kmem_dump_obj() now displays if the object was allocated by KFENCE. Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com Fixes: b89fb5ef0ce6 ("mm, kfence: insert KFENCE hooks for SLUB") Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by: kernel test robot <oliver.sang@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> [slab] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-16ipv6: make ip6_rt_gc_expire an atomic_tEric Dumazet1-2/+2
Reads and Writes to ip6_rt_gc_expire always have been racy, as syzbot reported lately [1] There is a possible risk of under-flow, leading to unexpected high value passed to fib6_run_gc(), although I have not observed this in the field. Hosts hitting ip6_dst_gc() very hard are under pretty bad state anyway. [1] BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1: ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311 dst_alloc+0x9b/0x160 net/core/dst.c:86 ip6_dst_alloc net/ipv6/route.c:344 [inline] icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261 mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807 mld_send_cr net/ipv6/mcast.c:2119 [inline] mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0: ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311 dst_alloc+0x9b/0x160 net/core/dst.c:86 ip6_dst_alloc net/ipv6/route.c:344 [inline] icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261 mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807 mld_send_cr net/ipv6/mcast.c:2119 [inline] mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 value changed: 0x00000bb3 -> 0x00000ba9 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d3bad-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: mld mld_ifc_work Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-16net: Handle l3mdev in ip_tunnel_init_flowDavid Ahern1-2/+9
Ido reported that the commit referenced in the Fixes tag broke a gre use case with dummy devices. Add a check to ip_tunnel_init_flow to see if the oif is an l3mdev port and if so set the oif to 0 to avoid the oif comparison in fib_lookup_good_nhc. Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15Merge tag 'block-5.18-2022-04-15' of git://git.kernel.dk/linux-blockLinus Torvalds3-10/+10
Pull block fixes from Jens Axboe: - Moving of lower_48_bits() to the block layer and a fix for the unaligned_be48 added with that originally (Alexander, Keith) - Fix a bad WARN_ON() for trim size checking (Ming) - A polled IO timeout fix for null_blk (Ming) - Silence IO error printing for dead disks (Christoph) - Compat mode range fix (Khazhismel) - NVMe pull request via Christoph: - Tone down the error logging added this merge window a bit (Chaitanya Kulkarni) - Quirk devices with non-unique unique identifiers (Christoph) * tag 'block-5.18-2022-04-15' of git://git.kernel.dk/linux-block: block: don't print I/O error warning for dead disks block/compat_ioctl: fix range check in BLKGETSIZE nvme-pci: disable namespace identifiers for Qemu controllers nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202 nvme: add a quirk to disable namespace identifiers nvme: don't print verbose errors for internal passthrough requests block: null_blk: end timed out poll request block: fix offset/size check in bio_trim() asm-generic: fix __get_unaligned_be48() on 32 bit platforms block: move lower_48_bits() to block
2022-04-15Merge tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull io_uring fixes from Jens Axboe: - Ensure we check and -EINVAL any use of reserved or struct padding. Although we generally always do that, it's missed in two spots for resource updates, one for the ring fd registration from this merge window, and one for the extended arg. Make sure we have all of them handled. (Dylan) - A few fixes for the deferred file assignment (me, Pavel) - Add a feature flag for the deferred file assignment so apps can tell we handle it correctly (me) - Fix a small perf regression with the current file position fix in this merge window (me) * tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-block: io_uring: abort file assignment prior to assigning creds io_uring: fix poll error reporting io_uring: fix poll file assign deadlock io_uring: use right issue_flags for splice/tee io_uring: verify pad field is 0 in io_get_ext_arg io_uring: verify resv is 0 in ringfd register/unregister io_uring: verify that resv2 is 0 in io_uring_rsrc_update2 io_uring: move io_uring_rsrc_update2 validation io_uring: fix assign file locking issue io_uring: stop using io_wq_work as an fd placeholder io_uring: move apoll->events cache io_uring: io_kiocb_update_pos() should not touch file for non -1 offset io_uring: flag the fact that linked file assignment is sane
2022-04-15Merge branch 'master' of ↵David S. Miller1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-04-14 1) Fix the output interface for VRF cases in xfrm_dst_lookup. From David Ahern. 2) Fix write out of bounds by doing COW on esp output when the packet size is larger than a page. From Sabrina Dubroca. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-15Merge tag 'vfio-v5.18-rc3' of https://github.com/awilliam/linux-vfioLinus Torvalds1-0/+2
Pull vfio fix from Alex Williamson: - Fix VF token checking for vfio-pci variant drivers (Jason Gunthorpe) * tag 'vfio-v5.18-rc3' of https://github.com/awilliam/linux-vfio: vfio/pci: Fix vf_token mechanism when device-specific VF drivers are used
2022-04-14Merge tag 'net-5.18-rc3' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from wireless and netfilter. Current release - regressions: - smc: fix af_ops of child socket pointing to released memory - wifi: ath9k: fix usage of driver-private space in tx_info Previous releases - regressions: - ipv6: fix panic when forwarding a pkt with no in6 dev - sctp: use the correct skb for security_sctp_assoc_request - smc: fix NULL pointer dereference in smc_pnet_find_ib() - sched: fix initialization order when updating chain 0 head - phy: don't defer probe forever if PHY IRQ provider is missing - dsa: revert "net: dsa: setup master before ports" - dsa: felix: fix tagging protocol changes with multiple CPU ports - eth: ice: - fix use-after-free when freeing @rx_cpu_rmap - revert "iavf: fix deadlock occurrence during resetting VF interface" - eth: lan966x: stop processing the MAC entry is port is wrong Previous releases - always broken: - sched: - flower: fix parsing of ethertype following VLAN header - taprio: check if socket flags are valid - nfc: add flush_workqueue to prevent uaf - veth: ensure eth header is in skb's linear part - eth: stmmac: fix altr_tse_pcs function when using a fixed-link - eth: macb: restart tx only if queue pointer is lagging - eth: macvlan: fix leaking skb in source mode with nodst option" * tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits) net: bcmgenet: Revert "Use stronger register read/writes to assure ordering" rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies net: dsa: felix: fix tagging protocol changes with multiple CPU ports tun: annotate access to queue->trans_start nfc: nci: add flush_workqueue to prevent uaf net: dsa: realtek: don't parse compatible string for RTL8366S net: dsa: realtek: fix Kconfig to assure consistent driver linkage net: ftgmac100: access hardware register after clock ready Revert "net: dsa: setup master before ports" macvlan: Fix leaking skb in source mode with nodst option netfilter: nf_tables: nft_parse_register can return a negative value net: lan966x: Stop processing the MAC entry is port is wrong. net: lan966x: Fix when a port's upper is changed. net: lan966x: Fix IGMP snooping when frames have vlan tag net: lan966x: Update lan966x_ptp_get_nominal_value sctp: Initialize daddr on peeled off socket net/smc: Fix af_ops of child socket pointing to released memory net/smc: Fix NULL pointer dereference in smc_pnet_find_ib() net/smc: use memcpy instead of snprintf to avoid out of bounds read net: macb: Restart tx only if queue pointer is lagging ...
2022-04-14Merge tag 'sound-5.18-rc3' of ↵Linus Torvalds2-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This became an unexpectedly large pull request due to various regression fixes in the previous kernels. The majority of fixes are a series of patches to address the regression at probe errors in devres'ed drivers, while there are yet more fixes for the x86 SG allocations and for USB-audio buffer management. In addition, a few HD-audio quirks and other small fixes are found" * tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits) ALSA: usb-audio: Limit max buffer and period sizes per time ALSA: memalloc: Add fallback SG-buffer allocations for x86 ALSA: nm256: Don't call card private_free at probe error path ALSA: mtpav: Don't call card private_free at probe error path ALSA: rme9652: Fix the missing snd_card_free() call at probe error ALSA: hdspm: Fix the missing snd_card_free() call at probe error ALSA: hdsp: Fix the missing snd_card_free() call at probe error ALSA: oxygen: Fix the missing snd_card_free() call at probe error ALSA: lx6464es: Fix the missing snd_card_free() call at probe error ALSA: cmipci: Fix the missing snd_card_free() call at probe error ALSA: aw2: Fix the missing snd_card_free() call at probe error ALSA: als300: Fix the missing snd_card_free() call at probe error ALSA: lola: Fix the missing snd_card_free() call at probe error ALSA: bt87x: Fix the missing snd_card_free() call at probe error ALSA: sis7019: Fix the missing error handling ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error ALSA: via82xx: Fix the missing snd_card_free() call at probe error ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error ALSA: rme96: Fix the missing snd_card_free() call at probe error ALSA: rme32: Fix the missing snd_card_free() call at probe error ...
2022-04-14Merge tag 'fscache-fixes-20220413' of ↵Linus Torvalds1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache fixes from David Howells: "Here's a collection of fscache and cachefiles fixes and misc small cleanups. The two main fixes are: - Add a missing unmark of the inode in-use mark in an error path. - Fix a KASAN slab-out-of-bounds error when setting the xattr on a cachefiles volume due to the wrong length being given to memcpy(). In addition, there's the removal of an unused parameter, removal of an unused Kconfig option, conditionalising a bit of procfs-related stuff and some doc fixes" * tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: remove FSCACHE_OLD_API Kconfig option fscache: Use wrapper fscache_set_cache_state() directly when relinquishing fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS fscache: Remove the cookie parameter from fscache_clear_page_bits() docs: filesystems: caching/backend-api.rst: fix an object withdrawn API docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr cachefiles: unmark inode in use in error path
2022-04-13vfio/pci: Fix vf_token mechanism when device-specific VF drivers are usedJason Gunthorpe1-0/+2
get_pf_vdev() tries to check if a PF is a VFIO PF by looking at the driver: if (pci_dev_driver(physfn) != pci_dev_driver(vdev->pdev)) { However now that we have multiple VF and PF drivers this is no longer reliable. This means that security tests realted to vf_token can be skipped by mixing and matching different VFIO PCI drivers. Instead of trying to use the driver core to find the PF devices maintain a linked list of all PF vfio_pci_core_device's that we have called pci_enable_sriov() on. When registering a VF just search the list to see if the PF is present and record the match permanently in the struct. PCI core locking prevents a PF from passing pci_disable_sriov() while VF drivers are attached so the VFIO owned PF becomes a static property of the VF. In common cases where vfio does not own the PF the global list remains empty and the VF's pointer is statically NULL. This also fixes a lockdep splat from recursive locking of the vfio_group::device_lock between vfio_device_get_from_name() and vfio_device_get_from_dev(). If the VF and PF share the same group this would deadlock. Fixes: ff53edf6d6ab ("vfio/pci: Split the pci_driver code out of vfio_pci_core.c") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v3-876570980634+f2e8-vfio_vf_token_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-04-13random: make random_get_entropy() return an unsigned longJason A. Donenfeld1-1/+1
Some implementations were returning type `unsigned long`, while others that fell back to get_cycles() were implicitly returning a `cycles_t` or an untyped constant int literal. That makes for weird and confusing code, and basically all code in the kernel already handled it like it was an `unsigned long`. I recently tried to handle it as the largest type it could be, a `cycles_t`, but doing so doesn't really help with much. Instead let's just make random_get_entropy() return an unsigned long all the time. This also matches the commonly used `arch_get_random_long()` function, so now RDRAND and RDTSC return the same sized integer, which means one can fallback to the other more gracefully. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Theodore Ts'o <tytso@mit.edu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-13esp: limit skb_page_frag_refill use to a single pageSabrina Dubroca1-2/+0
Commit ebe48d368e97 ("esp: Fix possible buffer overflow in ESP transformation") tried to fix skb_page_frag_refill usage in ESP by capping allocsize to 32k, but that doesn't completely solve the issue, as skb_page_frag_refill may return a single page. If that happens, we will write out of bounds, despite the check introduced in the previous patch. This patch forces COW in cases where we would end up calling skb_page_frag_refill with a size larger than a page (first in esp_output_head with tailen, then in esp_output_tail with skb->data_len). Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-04-13ALSA: memalloc: Add fallback SG-buffer allocations for x86Takashi Iwai1-0/+5
The recent change for memory allocator replaced the SG-buffer handling helper for x86 with the standard non-contiguous page handler. This works for most cases, but there is a corner case I obviously overlooked, namely, the fallback of non-contiguous handler without IOMMU. When the system runs without IOMMU, the core handler tries to use the continuous pages with a single SGL entry. It works nicely for most cases, but when the system memory gets fragmented, the large allocation may fail frequently. Ideally the non-contig handler could deal with the proper SG pages, it's cumbersome to extend for now. As a workaround, here we add new types for (minimalistic) SG allocations, instead, so that the allocator falls back to those types automatically when the allocation with the standard API failed. BTW, one better (but pretty minor) improvement from the previous SG-buffer code is that this provides the proper mmap support without the PCM's page fault handling. Fixes: 2c95b92ecd92 ("ALSA: memalloc: Unify x86 SG-buffer handling (take#3)") BugLink: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/2272 BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1198248 Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220413054808.7547-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-13Merge tag 'hardening-v5.18-rc3' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - latent_entropy: Use /dev/urandom instead of small GCC seed (Jason Donenfeld) - uapi/stddef.h: add missed include guards (Tadeusz Struk) * tag 'hardening-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: latent_entropy: use /dev/urandom uapi/linux/stddef.h: Add include guards
2022-04-13Merge tag 'nfsd-5.18-1' of ↵Linus Torvalds2-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a write performance regression - Fix crashes during request deferral on RDMA transports * tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix the svc_deferred_event trace class SUNRPC: Fix NFSD's request deferral on RDMA transports nfsd: Clean up nfsd_file_put() nfsd: Fix a write performance regression SUNRPC: Return true/false (not 1/0) from bool functions
2022-04-13asm-generic: fix __get_unaligned_be48() on 32 bit platformsAlexander Lobakin1-1/+1
While testing the new macros for working with 48 bit containers, I faced a weird problem: 32 + 16: 0x2ef6e8da 0x79e60000 48: 0xffffe8da + 0x79e60000 All the bits starting from the 32nd were getting 1d in 9/10 cases. The debug showed: p[0]: 0x00002e0000000000 p[1]: 0x00002ef600000000 p[2]: 0xffffffffe8000000 p[3]: 0xffffffffe8da0000 p[4]: 0xffffffffe8da7900 p[5]: 0xffffffffe8da79e6 that the value becomes a garbage after the third OR, i.e. on `p[2] << 24`. When the 31st bit is 1 and there's no explicit cast to an unsigned, it's being considered as a signed int and getting sign-extended on OR, so `e8000000` becomes `ffffffffe8000000` and messes up the result. Cast the @p[2] to u64 as well to avoid this. Now: 32 + 16: 0x7ef6a490 0xddc10000 48: 0x7ef6a490 + 0xddc10000 p[0]: 0x00007e0000000000 p[1]: 0x00007ef600000000 p[2]: 0x00007ef6a4000000 p[3]: 0x00007ef6a4900000 p[4]: 0x00007ef6a490dd00 p[5]: 0x00007ef6a490ddc1 Fixes: c2ea5fcf53d5 ("asm-generic: introduce be48 unaligned accessors") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Link: https://lore.kernel.org/r/20220412215220.75677-1-alobakin@pm.me Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-12ALSA: core: Add snd_card_free_on_error() helperTakashi Iwai1-0/+1
This is a small helper function to handle the error path more easily when an error happens during the probe for the device with the device-managed card. Since devres releases in the reverser order of the creations, usually snd_card_free() gets called at the last in the probe error path unless it already reached snd_card_register() calls. Due to this nature, when a driver expects the resource releases in card->private_free, this might be called too lately. As a workaround, one should call the probe like: static int __some_probe(...) { // do real probe.... } static int some_probe(...) { return snd_card_free_on_error(dev, __some_probe(dev, ...)); } so that the snd_card_free() is called explicitly at the beginning of the error path from the probe. This function will be used in the upcoming fixes to address the regressions by devres usages. Fixes: e8ad415b7a55 ("ALSA: core: Add managed card creation") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220412093141.8008-2-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-12scsi: iscsi: Fix NOP handling during conn recoveryMike Christie1-1/+1
If a offload driver doesn't use the xmit workqueue, then when we are doing ep_disconnect libiscsi can still inject PDUs to the driver. This adds a check for if the connection is bound before trying to inject PDUs. Link: https://lore.kernel.org/r/20220408001314.5014-9-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-12scsi: iscsi: Merge suspend fieldsMike Christie1-4/+5
Move the tx and rx suspend fields into one flags field. Link: https://lore.kernel.org/r/20220408001314.5014-8-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-12scsi: iscsi: Fix conn cleanup and stop race during iscsid restartMike Christie1-0/+2
If iscsid is doing a stop_conn at the same time the kernel is starting error recovery we can hit a race that allows the cleanup work to run on a valid connection. In the race, iscsi_if_stop_conn sees the cleanup bit set, but it calls flush_work on the clean_work before iscsi_conn_error_event has queued it. The flush then returns before the queueing and so the cleanup_work can run later and disconnect/stop a conn while it's in a connected state. The patch: Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") added the late stop_conn call bug originally, and the patch: Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") attempted to fix it but only fixed the normal EH case and left the above race for the iscsid restart case. For the normal EH case we don't hit the race because we only signal userspace to start recovery after we have done the queueing, so the flush will always catch the queued work or see it completed. For iscsid restart cases like boot, we can hit the race because iscsid will call down to the kernel before the kernel has signaled any error, so both code paths can be running at the same time. This adds a lock around the setting of the cleanup bit and queueing so they happen together. Link: https://lore.kernel.org/r/20220408001314.5014-6-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-12scsi: iscsi: Release endpoint ID when its freedMike Christie1-1/+1
We can't release the endpoint ID until all references to the endpoint have been dropped or it could be allocated while in use. This has us use an idr instead of looping over all conns to find a free ID and then free the ID when all references have been dropped instead of when the device is only deleted. Link: https://lore.kernel.org/r/20220408001314.5014-4-michael.christie@oracle.com Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Reviewed-by: Wu Bo <wubo40@huawei.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-12block: move lower_48_bits() to blockKeith Busch2-9/+9
The function is not generally applicable enough to be included in the core kernel header. Move it to block since it's the only subsystem using it. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20220327173316.315-1-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-11io_uring: flag the fact that linked file assignment is saneJens Axboe1-0/+1
Give applications a way to tell if the kernel supports sane linked files, as in files being assigned at the right time to be able to reliably do <open file direct into slot X><read file from slot X> while using IOSQE_IO_LINK to order them. Not really a bug fix, but flag it as such so that it gets pulled in with backports of the deferred file assignment. Fixes: 6bf9c47a3989 ("io_uring: defer file assignment") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-10Merge tag 'driver-core-5.18-rc2' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here are two small driver core changes for 5.18-rc2. They are the final bits in the removal of the default_attrs field in struct kobj_type. I had to wait until after 5.18-rc1 for all of the changes to do this came in through different development trees, and then one new user snuck in. So this series has two changes: - removal of the default_attrs field in the powerpc/pseries/vas code. The change has been acked by the PPC maintainers to come through this tree - removal of default_attrs from struct kobj_type now that all in-kernel users are removed. This cleans up the kobject code a little bit and removes some duplicated functionality that confused people (now there is only one way to do default groups) Both of these have been in linux-next for all of this week with no reported problems" * tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: kobject: kobj_type: remove default_attrs powerpc/pseries/vas: use default_groups in kobj_type
2022-04-10Merge tag 'locking_urgent_for_v5.18_rc2' of ↵Linus Torvalds2-23/+31
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Allow the compiler to optimize away unused percpu accesses and change the local_lock_* macros back to inline functions - A couple of fixes to static call insn patching * tag 'locking_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "mm/page_alloc: mark pagesets as __maybe_unused" Revert "locking/local_lock: Make the empty local_lock_*() function a macro." x86/percpu: Remove volatile from arch_raw_cpu_ptr(). static_call: Remove __DEFINE_STATIC_CALL macro static_call: Properly initialise DEFINE_STATIC_CALL_RET0() static_call: Don't make __static_call_return0 static x86,static_call: Fix __static_call_return0 for i386
2022-04-10Merge tag 'gpio-fixes-for-v5.18-rc2' of ↵Linus Torvalds1-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fix from Bartosz Golaszewski: - fix a race condition with consumers accessing the fields of GPIO IRQ chips before they're fully initialized * tag 'gpio-fixes-for-v5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: Restrict usage of GPIO chip irq members before initialization
2022-04-09Merge tag 'acpi-5.18-rc2' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These revert a problematic commit from the 5.17 development cycle and finalize the elimination of acpi_bus_get_device() that mostly took place during the recent merge window. Specifics: - Revert an ACPI processor driver change related to cache invalidation in acpi_idle_play_dead() that clearly was a mistake and introduced user-visible regressions (Akihiko Odaki). - Replace the last instance of acpi_bus_get_device() added during the recent merge window and drop the function to prevent more users of it from being added (Rafael Wysocki)" * tag 'acpi-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: bus: Eliminate acpi_bus_get_device() Revert "ACPI: processor: idle: Only flush cache on entering C3"
2022-04-09Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-4/+7
Merge fixes from Andrew Morton: "9 patches. Subsystems affected by this patch series: mm (migration, highmem, sparsemem, mremap, mempolicy, and memcg), lz4, mailmap, and MAINTAINERS" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS: add Tom as clang reviewer mm/list_lru.c: revert "mm/list_lru: optimize memcg_reparent_list_lru_node()" mailmap: update Vasily Averin's email address mm/mempolicy: fix mpol_new leak in shared_policy_replace mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning lz4: fix LZ4_decompress_safe_partial read out of bound highmem: fix checks in __kmap_local_sched_{in,out} mm: migrate: use thp_order instead of HPAGE_PMD_ORDER for new page allocation.
2022-04-09mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warningWaiman Long1-4/+7
The gcc 12 compiler reports a "'mem_section' will never be NULL" warning on the following code: static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME if (!mem_section) return NULL; #endif if (!mem_section[SECTION_NR_TO_ROOT(nr)]) return NULL; : It happens with CONFIG_SPARSEMEM_EXTREME off. The mem_section definition is #ifdef CONFIG_SPARSEMEM_EXTREME extern struct mem_section **mem_section; #else extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif In the !CONFIG_SPARSEMEM_EXTREME case, mem_section is a static 2-dimensional array and so the check "!mem_section[SECTION_NR_TO_ROOT(nr)]" doesn't make sense. Fix this warning by moving the "!mem_section[SECTION_NR_TO_ROOT(nr)]" check up inside the CONFIG_SPARSEMEM_EXTREME block and adding an explicit NR_SECTION_ROOTS check to make sure that there is no out-of-bound array access. Link: https://lkml.kernel.org/r/20220331180246.2746210-1-longman@redhat.com Fixes: 3e347261a80b ("sparsemem extreme implementation") Signed-off-by: Waiman Long <longman@redhat.com> Reported-by: Justin Forbes <jforbes@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-09fscache: Remove the cookie parameter from fscache_clear_page_bits()Yue Hu1-3/+1
The cookie is not used at all, remove it and update the usage in io.c and afs/write.c (which is the only user outside of fscache currently) at the same time. [DH: Amended the documentation also] Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Link: https://listman.redhat.com/archives/linux-cachefs/2022-April/006659.html
2022-04-08Merge tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds3-4/+3
Pull NFS client fixes from Trond Myklebust: "Stable fixes: - SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() Bugfixes: - Fix an Oopsable condition due to SLAB_ACCOUNT setting in the NFSv4.2 xattr code. - Fix for open() using an file open mode of '3' in NFSv4 - Replace readdir's use of xxhash() with hash_64() - Several patches to handle malloc() failure in SUNRPC" * tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg() SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec() SUNRPC: Handle allocation failure in rpc_new_task() NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename() NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutget SUNRPC: Handle low memory situations in call_status() SUNRPC: Handle ENOMEM in call_transmit_status() NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() NFS: Replace readdir's use of xxhash() with hash_64() SUNRPC: handle malloc failure in ->request_prepare NFSv4: fix open failure with O_ACCMODE flag Revert "NFSv4: Handle the special Linux file open access mode"
2022-04-08Merge tag 'arm64-fixes' of ↵Linus Torvalds1-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The two main things to note are: (1) The bulk of the diffstat is us reverting a horrible bodge we had in place to ease the merging of maple tree during the merge window (which turned out not to be needed, but anyway) (2) The TLB invalidation fix is done in core code, as suggested by (and Acked-by) Peter. Summary: - Revert temporary bodge in MTE coredumping to ease maple tree integration - Fix stack frame size warning reported with 64k pages - Fix stop_machine() race with instruction text patching - Ensure alternatives patching routines are not instrumented - Enable Spectre-BHB mitigation for Cortex-A78AE - Fix hugetlb TLB invalidation when contiguous hint is used - Minor perf driver fixes - Fix some typos" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: perf/imx_ddr: Fix undefined behavior due to shift overflowing the constant arm64: Add part number for Arm Cortex-A78AE arm64: patch_text: Fixup last cpu should be master tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entry arm64: alternatives: mark patch_alternative() as `noinstr` perf: MARVELL_CN10K_DDR_PMU should depend on ARCH_THUNDER perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator arm64: Fix comments in macro __init_el2_gicv3 arm64: fix typos in comments arch/arm64: Fix topology initialization for core scheduling arm64: mte: Fix the stack frame size warning in mte_dump_tag_range() Revert "arm64: Change elfcore for_each_mte_vma() to use VMA iterator"
2022-04-08Merge tag 'folio-5.18e' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds1-2/+6
Pull folio fixes from Matthew Wilcox: "Fewer bug reports than I was expecting from enabling large folios. One that doesn't show up on x86 but does on arm64, one that shows up with hugetlbfs memory failure testing and one that shows up with page migration, which it turns out I wasn't testing because my last NUMA machine died. Need to set up a qemu fake NUMA machine so I don't skip testing that in future. Summary: - Remove the migration code's assumptions about large pages being PMD sized - Don't call pmd_page() on a non-leaf PMD - Fix handling of hugetlbfs pages in page_vma_mapped_walk" * tag 'folio-5.18e' of git://git.infradead.org/users/willy/pagecache: mm/rmap: Fix handling of hugetlbfs pages in page_vma_mapped_walk mm/mempolicy: Use vma_alloc_folio() in new_page() mm: Add vma_alloc_folio() mm/migrate: Use a folio in migrate_misplaced_transhuge_page() mm/migrate: Use a folio in alloc_migration_target() mm/huge_memory: Avoid calling pmd_page() on a non-leaf PMD
2022-04-08Merge tag 'mmc-v5.18-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC updates from Ulf Hansson: "MMC core: - Improve API to make it clear that mmc_hw_reset() is for cards - Fixup support for writeback-cache for eMMC and SD - Check for errors after writes on SPI MMC host: - renesas_sdhi: A couple of fixes of TAP settings for eMMC HS400 mode - mmci_stm32: Fixup check of all elements in sg list - sdhci-xenon: Revert unnecessary fix for annoying 1.8V regulator warning" * tag 'mmc-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: improve API to make clear mmc_hw_reset is for cards mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete mmc: renesas_sdhi: special 4tap settings only apply to HS400 mmc: core: Fixup support for writeback-cache for eMMC and SD mmc: block: Check for errors after write on SPI mmc: mmci: stm32: correctly check all elements of sg list Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning"