summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-06-27Merge tag 'x86_cc_for_v6.5' of ↵Linus Torvalds44-307/+1448
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 confidential computing update from Borislav Petkov: - Add support for unaccepted memory as specified in the UEFI spec v2.9. The gist of it all is that Intel TDX and AMD SEV-SNP confidential computing guests define the notion of accepting memory before using it and thus preventing a whole set of attacks against such guests like memory replay and the like. There are a couple of strategies of how memory should be accepted - the current implementation does an on-demand way of accepting. * tag 'x86_cc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: virt: sevguest: Add CONFIG_CRYPTO dependency x86/efi: Safely enable unaccepted memory in UEFI x86/sev: Add SNP-specific unaccepted memory support x86/sev: Use large PSC requests if applicable x86/sev: Allow for use of the early boot GHCB for PSC requests x86/sev: Put PSC struct on the stack in prep for unaccepted memory support x86/sev: Fix calculation of end address based on number of pages x86/tdx: Add unaccepted memory support x86/tdx: Refactor try_accept_one() x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in boot stub efi/unaccepted: Avoid load_unaligned_zeropad() stepping into unaccepted memory efi: Add unaccepted memory support x86/boot/compressed: Handle unaccepted memory efi/libstub: Implement support for unaccepted memory efi/x86: Get full memory map in allocate_e820() mm: Add support for unaccepted memory
2023-06-27Merge tag 'x86_cache_for_v6.5' of ↵Linus Torvalds2-15/+163
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Implement a rename operation in resctrlfs to facilitate handling of application containers with dynamically changing task lists - When reading the tasks file, show the tasks' pid which are only in the current namespace as opposed to showing the pids from the init namespace too - Other fixes and improvements * tag 'x86_cache_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/x86: Documentation for MON group move feature x86/resctrl: Implement rename op for mon groups x86/resctrl: Factor rdtgroup lock for multi-file ops x86/resctrl: Only show tasks' pid in current pid namespace
2023-06-27Merge tag 'x86_build_for_v6.5' of ↵Linus Torvalds2-5/+50
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build update from Borislav Petkov: - Remove relocation information from vmlinux as it is not needed by other tooling and thus a slimmer binary is generated. This is important for distros who have to distribute vmlinux blobs with their kernel packages too and that extraneous unnecessary data bloats them for no good reason * tag 'x86_build_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Avoid relocation information in final vmlinux
2023-06-27Merge tag 'x86_alternatives_for_v6.5' of ↵Linus Torvalds6-168/+361
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 instruction alternatives updates from Borislav Petkov: - Up until now the Fast Short Rep Mov optimizations implied the presence of the ERMS CPUID flag. AMD decoupled them with a BIOS setting so decouple that dependency in the kernel code too - Teach the alternatives machinery to handle relocations - Make debug_alternative accept flags in order to see only that set of patching done one is interested in - Other fixes, cleanups and optimizations to the patching code * tag 'x86_alternatives_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternative: PAUSE is not a NOP x86/alternatives: Add cond_resched() to text_poke_bp_batch() x86/nospec: Shorten RESET_CALL_DEPTH x86/alternatives: Add longer 64-bit NOPs x86/alternatives: Fix section mismatch warnings x86/alternative: Optimize returns patching x86/alternative: Complicate optimize_nops() some more x86/alternative: Rewrite optimize_nops() some x86/lib/memmove: Decouple ERMS from FSRM x86/alternative: Support relocations in alternatives x86/alternative: Make debug-alternative selective
2023-06-27Merge tag 'ras_core_for_v6.5' of ↵Linus Torvalds9-58/+513
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Add initial support for RAS hardware found on AMD server GPUs (MI200). Those GPUs and CPUs are connected together through the coherent fabric and the GPU memory controllers report errors through x86's MCA so EDAC needs to support them. The amd64_edac driver supports now HBM (High Bandwidth Memory) and thus such heterogeneous memory controller systems - Other small cleanups and improvements * tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/amd64: Cache and use GPU node map EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh EDAC/amd64: Document heterogeneous system enumeration x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors x86/amd_nb: Re-sort and re-indent PCI defines x86/amd_nb: Add MI200 PCI IDs ras/debugfs: Fix error checking for debugfs_create_dir() x86/MCE: Check a hw error's address to determine proper recovery action
2023-06-27Merge tag 'edac_updates_for_v6.5' of ↵Linus Torvalds8-5/+623
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - amd64_edac: Add support for Zen4 client hardware - amd64_edac: Remove the version string as it is useless and actively confusing when looking at backported versions of the driver - Add a driver for the Nuvoton NPCM memory controller - A debugfs error checking cleanup * tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/npcm: Add NPCM memory controller driver dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller EDAC/thunderx: Check debugfs file creation retval properly EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh EDAC/amd64: Remove module version string
2023-06-27Merge tag 'x86-core-2023-06-26' of ↵Linus Torvalds5-72/+217
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Thomas Gleixner: "A set of fixes for kexec(), reboot and shutdown issues: - Ensure that the WBINVD in stop_this_cpu() has been completed before the control CPU proceedes. stop_this_cpu() is used for kexec(), reboot and shutdown to park the APs in a HLT loop. The control CPU sends an IPI to the APs and waits for their CPU online bits to be cleared. Once they all are marked "offline" it proceeds. But stop_this_cpu() clears the CPU online bit before issuing WBINVD, which means there is no guarantee that the AP has reached the HLT loop. This was reported to cause intermittent reboot/shutdown failures due to some dubious interaction with the firmware. This is not only a problem of WBINVD. The code to actually "stop" the CPU which runs between clearing the online bit and reaching the HLT loop can cause large enough delays on its own (think virtualization). That's especially dangerous for kexec() as kexec() expects that all APs are in a safe state and not executing code while the boot CPU jumps to the new kernel. There are more issues vs kexec() which are addressed separately. Cure this by implementing an explicit synchronization point right before the AP reaches HLT. This guarantees that the AP has completed the full stop proceedure. - Fix the condition for WBINVD in stop_this_cpu(). The WBINVD in stop_this_cpu() is required for ensuring that when switching to or from memory encryption no dirty data is left in the cache lines which might cause a write back in the wrong more later. This checks CPUID directly because the feature bit might have been cleared due to a command line option. But that CPUID check accesses leaf 0x8000001f::EAX unconditionally. Intel CPUs return the content of the highest supported leaf when a non-existing leaf is read, while AMD CPUs return all zeros for unsupported leafs. So the result of the test on Intel CPUs is lottery and on AMD its just correct by chance. While harmless it's incorrect and causes the conditional wbinvd() to be issued where not required, which caused the above issue to be unearthed. - Make kexec() robust against AP code execution Ashok observed triple faults when doing kexec() on a system which had been booted with "nosmt". It turned out that the SMT siblings which had been brought up partially are parked in mwait_play_dead() to enable power savings. mwait_play_dead() is monitoring the thread flags of the AP's idle task, which has been chosen as it's unlikely to be written to. But kexec() can overwrite the previous kernel text and data including page tables etc. When it overwrites the cache lines monitored by an AP that AP resumes execution after the MWAIT on eventually overwritten text, stack and page tables, which obviously might end up in a triple fault easily. Make this more robust in several steps: 1) Use an explicit per CPU cache line for monitoring. 2) Write a command to these cache lines to kick APs out of MWAIT before proceeding with kexec(), shutdown or reboot. The APs confirm the wakeup by writing status back and then enter a HLT loop. 3) If the system uses INIT/INIT/STARTUP for AP bringup, park the APs in INIT state. HLT is not a guarantee that an AP won't wake up and resume execution. HLT is woken up by NMI and SMI. SMI puts the CPU back into HLT (+/- firmware bugs), but NMI is delivered to the CPU which executes the NMI handler. Same issue as the MWAIT scenario described above. Sending an INIT/INIT sequence to the APs puts them into wait for STARTUP state, which is safe against NMI. There is still an issue remaining which can't be fixed: #MCE If the AP sits in HLT and receives a broadcast #MCE it will try to handle it with the obvious consequences. INIT/INIT clears CR4.MCE in the AP which will cause a broadcast #MCE to shut down the machine. So there is a choice between fire (HLT) and frying pan (INIT). Frying pan has been chosen as it's at least preventing the NMI issue. On systems which are not using INIT/INIT/STARTUP there is not much which can be done right now, but at least the obvious and easy to trigger MWAIT issue has been addressed" * tag 'x86-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/smp: Put CPUs into INIT on shutdown if possible x86/smp: Split sending INIT IPI out into a helper function x86/smp: Cure kexec() vs. mwait_play_dead() breakage x86/smp: Use dedicated cache-line for mwait_play_dead() x86/smp: Remove pointless wmb()s from native_stop_other_cpus() x86/smp: Dont access non-existing CPUID leaf x86/smp: Make stop_other_cpus() more robust
2023-06-27Merge tag 'timers-core-2023-06-26' of ↵Linus Torvalds29-587/+777
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Time, timekeeping and related device driver updates: Core: - A set of fixes, cleanups and enhancements to the posix timer code: - Prevent another possible live lock scenario in the exit() path, which affects POSIX_CPU_TIMERS_TASK_WORK enabled architectures. - Fix a loop termination issue which was reported syzcaller/KSAN in the posix timer ID allocation code. That triggered a deeper look into the posix-timer code which unearthed more small issues. - Add missing READ/WRITE_ONCE() annotations - Fix or remove completely outdated comments - Document places which are subtle and completely undocumented. - Add missing hrtimer modes to the trace event decoder - Small cleanups and enhancements all over the place Drivers: - Rework the Hyper-V clocksource and sched clock setup code - Remove a deprecated clocksource driver - Small fixes and enhancements all over the place" * tag 'timers-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe dt-bindings: timers: Add Ralink SoCs timer clocksource/drivers/hyper-v: Rework clocksource and sched clock setup dt-bindings: timer: brcm,kona-timer: convert to YAML clocksource/drivers/imx-gpt: Fold <soc/imx/timer.h> into its only user clk: imx: Drop inclusion of unused header <soc/imx/timer.h> hrtimer: Add missing sparse annotations to hrtimer locking clocksource/drivers/imx-gpt: Use only a single name for functions clocksource/drivers/loongson1: Move PWM timer to clocksource framework dt-bindings: timer: Add Loongson-1 clocksource MIPS: Loongson32: Remove deprecated PWM timer clocksource clocksource/drivers/ingenic-timer: Use pm_sleep_ptr() macro tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode(). posix-timers: Add sys_ni_posix_timers() prototype tick/rcu: Fix bogus ratelimit condition alarmtimer: Remove unnecessary (void *) cast alarmtimer: Remove unnecessary initialization of variable 'ret' posix-timers: Refer properly to CONFIG_HIGH_RES_TIMERS posix-timers: Polish coding style in a few places posix-timers: Remove pointless comments ...
2023-06-26Merge tag 'smp-core-2023-06-26' of ↵Linus Torvalds66-1011/+1077
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP updates from Thomas Gleixner: "A large update for SMP management: - Parallel CPU bringup The reason why people are interested in parallel bringup is to shorten the (kexec) reboot time of cloud servers to reduce the downtime of the VM tenants. The current fully serialized bringup does the following per AP: 1) Prepare callbacks (allocate, intialize, create threads) 2) Kick the AP alive (e.g. INIT/SIPI on x86) 3) Wait for the AP to report alive state 4) Let the AP continue through the atomic bringup 5) Let the AP run the threaded bringup to full online state There are two significant delays: #3 The time for an AP to report alive state in start_secondary() on x86 has been measured in the range between 350us and 3.5ms depending on vendor and CPU type, BIOS microcode size etc. #4 The atomic bringup does the microcode update. This has been measured to take up to ~8ms on the primary threads depending on the microcode patch size to apply. On a two socket SKL server with 56 cores (112 threads) the boot CPU spends on current mainline about 800ms busy waiting for the APs to come up and apply microcode. That's more than 80% of the actual onlining procedure. This can be reduced significantly by splitting the bringup mechanism into two parts: 1) Run the prepare callbacks and kick the AP alive for each AP which needs to be brought up. The APs wake up, do their firmware initialization and run the low level kernel startup code including microcode loading in parallel up to the first synchronization point. (#1 and #2 above) 2) Run the rest of the bringup code strictly serialized per CPU (#3 - #5 above) as it's done today. Parallelizing that stage of the CPU bringup might be possible in theory, but it's questionable whether required surgery would be justified for a pretty small gain. If the system is large enough the first AP is already waiting at the first synchronization point when the boot CPU finished the wake-up of the last AP. That reduces the AP bringup time on that SKL from ~800ms to ~80ms, i.e. by a factor ~10x. The actual gain varies wildly depending on the system, CPU, microcode patch size and other factors. There are some opportunities to reduce the overhead further, but that needs some deep surgery in the x86 CPU bringup code. For now this is only enabled on x86, but the core functionality obviously works for all SMP capable architectures. - Enhancements for SMP function call tracing so it is possible to locate the scheduling and the actual execution points. That allows to measure IPI delivery time precisely" * tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) trace,smp: Add tracepoints for scheduling remotelly called functions trace,smp: Add tracepoints around remotelly called functions MAINTAINERS: Add CPU HOTPLUG entry x86/smpboot: Fix the parallel bringup decision x86/realmode: Make stack lock work in trampoline_compat() x86/smp: Initialize cpu_primary_thread_mask late cpu/hotplug: Fix off by one in cpuhp_bringup_mask() x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it x86/smpboot: Support parallel startup of secondary CPUs x86/smpboot: Implement a bit spinlock to protect the realmode stack x86/apic: Save the APIC virtual base address cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE x86/apic: Provide cpu_primary_thread mask x86/smpboot: Enable split CPU startup cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism cpu/hotplug: Reset task stack state in _cpu_up() cpu/hotplug: Remove unused state functions riscv: Switch to hotplug core state synchronization parisc: Switch to hotplug core state synchronization ...
2023-06-26Merge tag 'x86-boot-2023-06-26' of ↵Linus Torvalds44-352/+194
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Thomas Gleixner: "Initialize FPU late. Right now FPU is initialized very early during boot. There is no real requirement to do so. The only requirement is to have it done before alternatives are patched. That's done in check_bugs() which does way more than what the function name suggests. So first rename check_bugs() to arch_cpu_finalize_init() which makes it clear what this is about. Move the invocation of arch_cpu_finalize_init() earlier in start_kernel() as it has to be done before fork_init() which needs to know the FPU register buffer size. With those prerequisites the FPU initialization can be moved into arch_cpu_finalize_init(), which removes it from the early and fragile part of the x86 bringup" * tag 'x86-boot-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build x86/fpu: Move FPU initialization into arch_cpu_finalize_init() x86/fpu: Mark init functions __init x86/fpu: Remove cpuinfo argument from init functions x86/init: Initialize signal frame size late init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init() init: Invoke arch_cpu_finalize_init() earlier init: Remove check_bugs() leftovers um/cpu: Switch to arch_cpu_finalize_init() sparc/cpu: Switch to arch_cpu_finalize_init() sh/cpu: Switch to arch_cpu_finalize_init() mips/cpu: Switch to arch_cpu_finalize_init() m68k/cpu: Switch to arch_cpu_finalize_init() loongarch/cpu: Switch to arch_cpu_finalize_init() ia64/cpu: Switch to arch_cpu_finalize_init() ARM: cpu: Switch to arch_cpu_finalize_init() x86/cpu: Switch to arch_cpu_finalize_init() init: Provide arch_cpu_finalize_init()
2023-06-26Merge tag 'irq-core-2023-06-26' of ↵Linus Torvalds23-297/+384
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core: - Convert the interrupt descriptor storage to a maple tree to overcome the limitations of the radixtree + fixed size bitmap. This allows us to handle very large servers with a huge number of guests without imposing a huge memory overhead on everyone - Implement optional retriggering of interrupts which utilize the fasteoi handler to work around a GICv3 architecture issue Drivers: - A set of fixes and updates for the Loongson/Loongarch related drivers - Workaound for an ASR8601 integration hickup which ends up with CPU numbering which can't be represented in the GIC implementation - The usual set of boring fixes and updates all over the place" * tag 'irq-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) Revert "irqchip/mxs: Include linux/irqchip/mxs.h" irqchip/jcore-aic: Fix missing allocation of IRQ descriptors irqchip/stm32-exti: Fix warning on initialized field overwritten irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype irqchip/mxs: Include linux/irqchip/mxs.h irqchip/clps711x: Remove unused clps711x_intc_init() function irqchip/mmp: Remove non-DT codepath irqchip/ftintc010: Mark all function static irqdomain: Include internals.h for function prototypes irqchip/loongson-eiointc: Add DT init support dt-bindings: interrupt-controller: Add Loongson EIOINTC irqchip/loongson-eiointc: Fix irq affinity setting during resume irqchip/loongson-liointc: Add IRQCHIP_SKIP_SET_WAKE flag irqchip/loongson-liointc: Fix IRQ trigger polarity irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment irqchip/loongson-pch-pic: Fix initialization of HT vector register irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs genirq: Allow fasteoi handler to resend interrupts on concurrent handling genirq: Expand doc for PENDING and REPLAY flags ...
2023-06-26Merge tag 'core-debugobjects-2023-06-26' of ↵Linus Torvalds1-0/+9
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debugobjects update from Thomas Gleixner: "A single update for debug objects: - Recheck whether debug objects is enabled before reporting a problem to avoid spamming the logs with messages which are caused by a concurrent OOM" * tag 'core-debugobjects-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobjects: Recheck debug_objects_enabled before reporting
2023-06-26Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds218-3879/+4742
Pull block updates from Jens Axboe: - NVMe pull request via Keith: - Various cleanups all around (Irvin, Chaitanya, Christophe) - Better struct packing (Christophe JAILLET) - Reduce controller error logs for optional commands (Keith) - Support for >=64KiB block sizes (Daniel Gomez) - Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner) - bcache updates via Coly: - Fix a race at init time (Mingzhe Zou) - Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye) - use page pinning in the block layer for dio (David) - convert old block dio code to page pinning (David, Christoph) - cleanups for pktcdvd (Andy) - cleanups for rnbd (Guoqing) - use the unchecked __bio_add_page() for the initial single page additions (Johannes) - fix overflows in the Amiga partition handling code (Michael) - improve mq-deadline zoned device support (Bart) - keep passthrough requests out of the IO schedulers (Christoph, Ming) - improve support for flush requests, making them less special to deal with (Christoph) - add bdev holder ops and shutdown methods (Christoph) - fix the name_to_dev_t() situation and use cases (Christoph) - decouple the block open flags from fmode_t (Christoph) - ublk updates and cleanups, including adding user copy support (Ming) - BFQ sanity checking (Bart) - convert brd from radix to xarray (Pankaj) - constify various structures (Thomas, Ivan) - more fine grained persistent reservation ioctl capability checks (Jingbo) - misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan, Jordy, Li, Min, Yu, Zhong, Waiman) * tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits) scsi/sg: don't grab scsi host module reference ext4: Fix warning in blkdev_put() block: don't return -EINVAL for not found names in devt_from_devname cdrom: Fix spectre-v1 gadget block: Improve kernel-doc headers blk-mq: don't insert passthrough request into sw queue bsg: make bsg_class a static const structure ublk: make ublk_chr_class a static const structure aoe: make aoe_class a static const structure block/rnbd: make all 'class' structures const block: fix the exclusive open mask in disk_scan_partitions block: add overflow checks for Amiga partition support block: change all __u32 annotations to __be32 in affs_hardblocks.h block: fix signed int overflow in Amiga partition support block: add capacity validation in bdev_add_partition() block: fine-granular CAP_SYS_ADMIN for Persistent Reservation block: disallow Persistent Reservation on partitions reiserfs: fix blkdev_put() warning from release_journal_dev() block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions() block: document the holder argument to blkdev_get_by_path ...
2023-06-26Merge tag 'for-6.5/io_uring-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds21-334/+416
Pull io_uring updates from Jens Axboe: "Nothing major in this release, just a bunch of cleanups and some optimizations around networking mostly. - clean up file request flags handling (Christoph) - clean up request freeing and CQ locking (Pavel) - support for using pre-registering the io_uring fd at setup time (Josh) - Add support for user allocated ring memory, rather than having the kernel allocate it. Mostly for packing rings into a huge page (me) - avoid an unnecessary double retry on receive (me) - maintain ordering for task_work, which also improves performance (me) - misc cleanups/fixes (Pavel, me)" * tag 'for-6.5/io_uring-2023-06-23' of git://git.kernel.dk/linux: (39 commits) io_uring: merge conditional unlock flush helpers io_uring: make io_cq_unlock_post static io_uring: inline __io_cq_unlock io_uring: fix acquire/release annotations io_uring: kill io_cq_unlock() io_uring: remove IOU_F_TWQ_FORCE_NORMAL io_uring: don't batch task put on reqs free io_uring: move io_clean_op() io_uring: inline io_dismantle_req() io_uring: remove io_free_req_tw io_uring: open code io_put_req_find_next io_uring: add helpers to decode the fixed file file_ptr io_uring: use io_file_from_index in io_msg_grab_file io_uring: use io_file_from_index in __io_sync_cancel io_uring: return REQ_F_ flags from io_file_get_flags io_uring: remove io_req_ffs_set io_uring: remove a confusing comment above io_file_get_flags io_uring: remove the mode variable in io_file_get_flags io_uring: remove __io_file_supports_nowait io_uring: wait interruptibly for request completions on exit ...
2023-06-26Merge tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds68-621/+694
Pull splice updates from Jens Axboe: "This kills off ITER_PIPE to avoid a race between truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes memory corruption. Instead, we either use (a) filemap_splice_read(), which invokes the buffered file reading code and splices from the pagecache into the pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads into it and then pushes the filled pages into the pipe; or (c) handle it in filesystem-specific code. Summary: - Rename direct_splice_read() to copy_splice_read() - Simplify the calculations for the number of pages to be reclaimed in copy_splice_read() - Turn do_splice_to() into a helper, vfs_splice_read(), so that it can be used by overlayfs and coda to perform the checks on the lower fs - Make vfs_splice_read() jump to copy_splice_read() to handle direct-I/O and DAX - Provide shmem with its own splice_read to handle non-existent pages in the pagecache. We don't want a ->read_folio() as we don't want to populate holes, but filemap_get_pages() requires it - Provide overlayfs with its own splice_read to call down to a lower layer as overlayfs doesn't provide ->read_folio() - Provide coda with its own splice_read to call down to a lower layer as coda doesn't provide ->read_folio() - Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs and random files as they just copy to the output buffer and don't splice pages - Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3, ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation - Make cifs use filemap_splice_read() - Replace pointers to generic_file_splice_read() with pointers to filemap_splice_read() as DIO and DAX are handled in the caller; filesystems can still provide their own alternate ->splice_read() op - Remove generic_file_splice_read() - Remove ITER_PIPE and its paraphernalia as generic_file_splice_read was the only user" * tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
2023-06-26Merge tag 'for-6.5-tag' of ↵Linus Torvalds75-2667/+2737
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Mainly core changes, refactoring and optimizations. Performance is improved in some areas, overall there may be a cumulative improvement due to refactoring that removed lookups in the IO path or simplified IO submission tracking. Core: - submit IO synchronously for fast checksums (crc32c and xxhash), remove high priority worker kthread - read extent buffer in one go, simplify IO tracking, bio submission and locking - remove additional tracking of redirtied extent buffers, originally added for zoned mode but actually not needed - track ordered extent pointer in bio to avoid rbtree lookups during IO - scrub, use recovered data stripes as cache to avoid unnecessary read - in zoned mode, optimize logical to physical mappings of extents - remove PageError handling, not set by VFS nor writeback - cleanups, refactoring, better structure packing - lots of error handling improvements - more assertions, lockdep annotations - print assertion failure with the exact line where it happens - tracepoint updates - more debugging prints Performance: - speedup in fsync(), better tracking of inode logged status can avoid transaction commit - IO path structures track logical offsets in data structures and does not need to look it up User visible changes: - don't commit transaction for every created subvolume, this can reduce time when many subvolumes are created in a batch - print affected files when relocation fails - trigger orphan file cleanup during START_SYNC ioctl Notable fixes: - fix crash when disabling quota and relocation - fix crashes when removing roots from drity list - fix transacion abort during relocation when converting from newer profiles not covered by fallback - in zoned mode, stop reclaiming block groups if filesystem becomes read-only - fix rare race condition in tree mod log rewind that can miss some btree node slots - with enabled fsverity, drop up-to-date page bit in case the verification fails" * tag 'for-6.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (194 commits) btrfs: fix race between quota disable and relocation btrfs: add comment to struct btrfs_fs_info::dirty_cowonly_roots btrfs: fix race when deleting free space root from the dirty cow roots list btrfs: fix race when deleting quota root from the dirty cow roots list btrfs: tracepoints: also show actual number of the outstanding extents btrfs: update i_version in update_dev_time btrfs: make btrfs_compressed_bioset static btrfs: add handling for RAID1C23/DUP to btrfs_reduce_alloc_profile btrfs: scrub: remove btrfs_fs_info::scrub_wr_completion_workers btrfs: scrub: remove scrub_ctx::csum_list member btrfs: do not BUG_ON after failure to migrate space during truncation btrfs: do not BUG_ON on failure to get dir index for new snapshot btrfs: send: do not BUG_ON() on unexpected symlink data extent btrfs: do not BUG_ON() when dropping inode items from log root btrfs: replace BUG_ON() at split_item() with proper error handling btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr() btrfs: do not BUG_ON() on tree mod log failures at insert_ptr() btrfs: do not BUG_ON() on tree mod log failure at insert_new_root() btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert() btrfs: abort transaction at update_ref_for_cow() when ref count is zero ...
2023-06-26Merge tag 'zonefs-6.5-rc1' of ↵Linus Torvalds3-98/+121
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs updates from Damien Le Moal: - Modify the synchronous direct write path to use iomap instead of manually coding issuing zone append write BIOs (me) - Use the FMODE_CAN_ODIRECT file flag to indicate support from direct IO instead of using the old way with noop direct_io methods (Christoph) * tag 'zonefs-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method zonefs: use iomap for synchronous direct writes
2023-06-26Merge tag 'erofs-for-6.5-rc1' of ↵Linus Torvalds8-783/+438
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "No outstanding new feature for this cycle. Most of these commits are decompression cleanups which are part of the ongoing development for subpage/folio compression support as well as xattr cleanups for the upcoming xattr bloom filter optimization [1]. In addition, there are bugfixes to address some corner cases of compressed images due to global data de-duplication and arm64 16k pages. Summary: - Fix rare I/O hang on deduplicated compressed images due to loop hooked chains - Fix compact compression layout of 16k blocks on arm64 devices - Fix atomic context detection of async decompression - Decompression/Xattr code cleanups" Link: https://lore.kernel.org/r/20230621083209.116024-1-jefflexu@linux.alibaba.com [1] * tag 'erofs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: clean up zmap.c erofs: remove unnecessary goto erofs: Fix detection of atomic context erofs: use separate xattr parsers for listxattr/getxattr erofs: unify inline/shared xattr iterators for listxattr/getxattr erofs: make the size of read data stored in buffer_ofs erofs: unify xattr_iter structures erofs: use absolute position in xattr iterator erofs: fix compact 4B support for 16k block size erofs: convert erofs_read_metabuf() to erofs_bread() for xattr erofs: use poison pointer to replace the hard-coded address erofs: use struct lockref to replace handcrafted approach erofs: adapt managed inode operations into folios erofs: kill hooked chains to avoid loops on deduplicated compressed images erofs: avoid on-stack pagepool directly passed by arguments erofs: allocate extra bvec pages directly instead of retrying erofs: clean up z_erofs_pcluster_readmore() erofs: remove the member readahead from struct z_erofs_decompress_frontend erofs: fold in z_erofs_decompress()
2023-06-26Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linuxLinus Torvalds12-362/+299
Pull fsverity updates from Eric Biggers: "Several updates for fs/verity/: - Do all hashing with the shash API instead of with the ahash API. This simplifies the code and reduces API overhead. It should also make things slightly easier for XFS's upcoming support for fsverity. It does drop fsverity's support for off-CPU hash accelerators, but that support was incomplete and not known to be used - Update and export fsverity_get_digest() so that it's ready for overlayfs's upcoming support for fsverity checking of lowerdata - Improve the documentation for builtin signature support - Fix a bug in the large folio support" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: improve documentation for builtin signature support fsverity: rework fsverity_get_digest() again fsverity: simplify error handling in verify_data_block() fsverity: don't use bio_first_page_all() in fsverity_verify_bio() fsverity: constify fsverity_hash_alg fsverity: use shash API instead of ahash API
2023-06-26Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds2-6/+6
Pull fscrypt update from Eric Biggers: "Just one flex array conversion patch" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: Replace 1-element array with flexible array
2023-06-26Merge tag 'nfsd-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds31-434/+801
Pull nfsd updates from Chuck Lever: - Clean-ups in the READ path in anticipation of MSG_SPLICE_PAGES - Better NUMA awareness when allocating pages and other objects - A number of minor clean-ups to XDR encoding - Elimination of a race when accepting a TCP socket - Numerous observability enhancements * tag 'nfsd-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (46 commits) nfsd: remove redundant assignments to variable len svcrdma: Fix stale comment NFSD: Distinguish per-net namespace initialization nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net SUNRPC: Address RCU warning in net/sunrpc/svc.c SUNRPC: Use sysfs_emit in place of strlcpy/sprintf SUNRPC: Remove transport class dprintk call sites SUNRPC: Fix comments for transport class registration svcrdma: Remove an unused argument from __svc_rdma_put_rw_ctxt() svcrdma: trace cc_release calls svcrdma: Convert "might sleep" comment into a code annotation NFSD: Add an nfsd4_encode_nfstime4() helper SUNRPC: Move initialization of rq_stime SUNRPC: Optimize page release in svc_rdma_sendto() svcrdma: Prevent page release when nothing was received svcrdma: Revert 2a1e4f21d841 ("svcrdma: Normalize Send page handling") SUNRPC: Revert 579900670ac7 ("svcrdma: Remove unused sc_pages field") SUNRPC: Revert cc93ce9529a6 ("svcrdma: Retain the page backing rq_res.head[0].iov_base") NFSD: add encoding of op_recall flag for write delegation NFSD: Add "official" reviewers for this subsystem ...
2023-06-26Merge tag 'v6.5/vfs.mount' of ↵Linus Torvalds4-82/+417
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: "This contains the work to extend move_mount() to allow adding a mount beneath the topmost mount of a mount stack. There are two LWN articles about this. One covers the original patch series in [1]. The other in [2] summarizes the session and roughly the discussion between Al and me at LSFMM. The second article also goes into some good questions from attendees. Since all details are found in the relevant commit with a technical dive into semantics and locking at the end I'm only adding the motivation and core functionality for this from commit message and leave out the invasive details. The code is also heavily commented and annotated as well which was explicitly requested. TL;DR: > mount -t ext4 /dev/sda /mnt | └─/mnt /dev/sda ext4 > mount --beneath -t xfs /dev/sdb /mnt | └─/mnt /dev/sdb xfs └─/mnt /dev/sda ext4 > umount /mnt | └─/mnt /dev/sdb xfs The longer motivation is that various distributions are adding or are in the process of adding support for system extensions and in the future configuration extensions through various tools. A more detailed explanation on system and configuration extensions can be found on the manpage which is listed below at [3]. System extension images may – dynamically at runtime — extend the /usr/ and /opt/ directory hierarchies with additional files. This is particularly useful on immutable system images where a /usr/ and/or /opt/ hierarchy residing on a read-only file system shall be extended temporarily at runtime without making any persistent modifications. When one or more system extension images are activated, their /usr/ and /opt/ hierarchies are combined via overlayfs with the same hierarchies of the host OS, and the host /usr/ and /opt/ overmounted with it ("merging"). When they are deactivated, the mount point is disassembled — again revealing the unmodified original host version of the hierarchy ("unmerging"). Merging thus makes the extension's resources suddenly appear below the /usr/ and /opt/ hierarchies as if they were included in the base OS image itself. Unmerging makes them disappear again, leaving in place only the files that were shipped with the base OS image itself. System configuration images are similar but operate on directories containing system or service configuration. On nearly all modern distributions mount propagation plays a crucial role and the rootfs of the OS is a shared mount in a peer group (usually with peer group id 1): TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID / / ext4 shared:1 29 1 On such systems all services and containers run in a separate mount namespace and are pivot_root()ed into their rootfs. A separate mount namespace is almost always used as it is the minimal isolation mechanism services have. But usually they are even much more isolated up to the point where they almost become indistinguishable from containers. Mount propagation again plays a crucial role here. The rootfs of all these services is a slave mount to the peer group of the host rootfs. This is done so the service will receive mount propagation events from the host when certain files or directories are updated. In addition, the rootfs of each service, container, and sandbox is also a shared mount in its separate peer group: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID / / ext4 shared:24 master:1 71 47 For people not too familiar with mount propagation, the master:1 means that this is a slave mount to peer group 1. Which as one can see is the host rootfs as indicated by shared:1 above. The shared:24 indicates that the service rootfs is a shared mount in a separate peer group with peer group id 24. A service may run other services. Such nested services will also have a rootfs mount that is a slave to the peer group of the outer service rootfs mount. For containers things are just slighly different. A container's rootfs isn't a slave to the service's or host rootfs' peer group. The rootfs mount of a container is simply a shared mount in its own peer group: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID /home/ubuntu/debian-tree / ext4 shared:99 61 60 So whereas services are isolated OS components a container is treated like a separate world and mount propagation into it is restricted to a single well known mount that is a slave to the peer group of the shared mount /run on the host: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID /propagate/debian-tree /run/host/incoming tmpfs master:5 71 68 Here, the master:5 indicates that this mount is a slave to the peer group with peer group id 5. This allows to propagate mounts into the container and served as a workaround for not being able to insert mounts into mount namespaces directly. But the new mount api does support inserting mounts directly. For the interested reader the blogpost in [4] might be worth reading where I explain the old and the new approach to inserting mounts into mount namespaces. Containers of course, can themselves be run as services. They often run full systems themselves which means they again run services and containers with the exact same propagation settings explained above. The whole system is designed so that it can be easily updated, including all services in various fine-grained ways without having to enter every single service's mount namespace which would be prohibitively expensive. The mount propagation layout has been carefully chosen so it is possible to propagate updates for system extensions and configurations from the host into all services. The simplest model to update the whole system is to mount on top of /usr, /opt, or /etc on the host. The new mount on /usr, /opt, or /etc will then propagate into every service. This works cleanly the first time. However, when the system is updated multiple times it becomes necessary to unmount the first update on /opt, /usr, /etc and then propagate the new update. But this means, there's an interval where the old base system is accessible. This has to be avoided to protect against downgrade attacks. The vfs already exposes a mechanism to userspace whereby mounts can be mounted beneath an existing mount. Such mounts are internally referred to as "tucked". The patch series exposes the ability to mount beneath a top mount through the new MOVE_MOUNT_BENEATH flag for the move_mount() system call. This allows userspace to seamlessly upgrade mounts. After this series the only thing that will have changed is that mounting beneath an existing mount can be done explicitly instead of just implicitly. The crux is that the proposed mechanism already exists and that it is so powerful as to cover cases where mounts are supposed to be updated with new versions. Crucially, it offers an important flexibility. Namely that updates to a system may either be forced or can be delayed and the umount of the top mount be left to a service if it is a cooperative one" Link: https://lwn.net/Articles/927491 [1] Link: https://lwn.net/Articles/934094 [2] Link: https://man7.org/linux/man-pages/man8/systemd-sysext.8.html [3] Link: https://brauner.io/2023/02/28/mounting-into-mount-namespaces.html [4] Link: https://github.com/flatcar/sysext-bakery Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_1 Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_2 Link: https://github.com/systemd/systemd/pull/26013 * tag 'v6.5/vfs.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: allow to mount beneath top mount fs: use a for loop when locking a mount fs: properly document __lookup_mnt() fs: add path_mounted()
2023-06-26Merge tag 'v6.5/vfs.file' of ↵Linus Torvalds9-62/+205
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs file handling updates from Christian Brauner: "This contains Amir's work to fix a long-standing problem where an unprivileged overlayfs mount can be used to avoid fanotify permission events that were requested for an inode or superblock on the underlying filesystem. Some background about files opened in overlayfs. If a file is opened in overlayfs @file->f_path will refer to a "fake" path. What this means is that while @file->f_inode will refer to inode of the underlying layer, @file->f_path refers to an overlayfs {dentry,vfsmount} pair. The reasons for doing this are out of scope here but it is the reason why the vfs has been providing the open_with_fake_path() helper for overlayfs for very long time now. So nothing new here. This is for sure not very elegant and everyone including the overlayfs maintainers agree. Improving this significantly would involve more fragile and potentially rather invasive changes. In various codepaths access to the path of the underlying filesystem is needed for such hybrid file. The best example is fsnotify where this becomes security relevant. Passing the overlayfs @file->f_path->dentry will cause fsnotify to skip generating fsnotify events registered on the underlying inode or superblock. To fix this we extend the vfs provided open_with_fake_path() concept for overlayfs to create a backing file container that holds the real path and to expose a helper that can be used by relevant callers to get access to the path of the underlying filesystem through the new file_real_path() helper. This pattern is similar to what we do in d_real() and d_real_inode(). The first beneficiary is fsnotify and fixes the security sensitive problem mentioned above. There's a couple of nice cleanups included as well. Over time, the old open_with_fake_path() helper added specifically for overlayfs a long time ago started to get used in other places such as cachefiles. Even though cachefiles have nothing to do with hybrid files. The only reason cachefiles used that concept was that files opened with open_with_fake_path() aren't charged against the caller's open file limit by raising FMODE_NOACCOUNT. It's just mere coincidence that both overlayfs and cachefiles need to ensure to not overcharge the caller for their internal open calls. So this work disentangles FMODE_NOACCOUNT use cases and backing file use-cases by adding the FMODE_BACKING flag which indicates that the file can be used to retrieve the backing file of another filesystem. (Fyi, Jens will be sending you a really nice cleanup from Christoph that gets rid of 3 FMODE_* flags otherwise this would be the last fmode_t bit we'd be using.) So now overlayfs becomes the sole user of the renamed open_with_fake_path() helper which is now named backing_file_open(). For internal kernel users such as cachefiles that are only interested in FMODE_NOACCOUNT but not in FMODE_BACKING we add a new kernel_file_open() helper which opens a file without being charged against the caller's open file limit. All new helpers are properly documented and clearly annotated to mention their special uses. We also rename vfs_tmpfile_open() to kernel_tmpfile_open() to clearly distinguish it from vfs_tmpfile() and align it the other kernel_*() internal helpers" * tag 'v6.5/vfs.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ovl: enable fsnotify events on underlying real files fs: use backing_file container for internal files with "fake" f_path fs: move kmem_cache_zalloc() into alloc_empty_file*() helpers fs: use a helper for opening kernel internal files fs: rename {vfs,kernel}_tmpfile_open()
2023-06-26Merge tag 'v6.5/vfs.rename.locking' of ↵Linus Torvalds7-74/+89
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs rename locking updates from Christian Brauner: "This contains the work from Jan to fix problems with cross-directory renames originally reported in [1]. To quickly sum it up some filesystems (so far we know at least about ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to lock the directory when it is being renamed into another directory. This is because we need to update the parent pointer in the directory in that case and if that races with other operations on the directory, in particular a conversion from one directory format into another, bad things can happen. So far we've done the locking in the filesystem code but recently Darrick pointed out in [2] that the RENAME_EXCHANGE case was missing. That one is particularly nasty because RENAME_EXCHANGE can arbitrarily mix regular files and directories and proper lock ordering is not achievable in the filesystems alone. This patch set adds locking into vfs_rename() so that not only parent directories but also moved inodes, regardless of whether they are directories or not, are locked when calling into the filesystem. This means establishing a locking order for unrelated directories. New helpers are added for this purpose and our documentation is updated to cover this in detail. The locking is now actually easier to follow as we now always lock source and target. We've always locked the target independent of whether it was a directory or file and we've always locked source if it was a regular file. The exact details for why this came about can be found in [3] and [4]" Link: https://lore.kernel.org/all/20230117123735.un7wbamlbdihninm@quack3 [1] Link: https://lore.kernel.org/all/20230517045836.GA11594@frogsfrogsfrogs [2] Link: https://lore.kernel.org/all/20230526-schrebergarten-vortag-9cd89694517e@brauner [3] Link: https://lore.kernel.org/all/20230530-seenotrettung-allrad-44f4b00139d4@brauner [4] * tag 'v6.5/vfs.rename.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Restrict lock_two_nondirectories() to non-directory inodes fs: Lock moved directories fs: Establish locking order for unrelated directories Revert "f2fs: fix potential corruption when moving a directory" Revert "udf: Protect rename against modification of moved directory" ext4: Remove ext4 locking of moved directory
2023-06-26Merge tag 'v6.5/vfs.misc' of ↵Linus Torvalds40-132/+188
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Miscellaneous features, cleanups, and fixes for vfs and individual fs Features: - Use mode 0600 for file created by cachefilesd so it can be run by unprivileged users. This aligns them with directories which are already created with mode 0700 by cachefilesd - Reorder a few members in struct file to prevent some false sharing scenarios - Indicate that an eventfd is used a semaphore in the eventfd's fdinfo procfs file - Add a missing uapi header for eventfd exposing relevant uapi defines - Let the VFS protect transitions of a superblock from read-only to read-write in addition to the protection it already provides for transitions from read-write to read-only. Protecting read-only to read-write transitions allows filesystems such as ext4 to perform internal writes, keeping writers away until the transition is completed Cleanups: - Arnd removed the architecture specific arch_report_meminfo() prototypes and added a generic one into procfs.h. Note, we got a report about a warning in amdpgpu codepaths that suggested this was bisectable to this change but we concluded it was a false positive - Remove unused parameters from split_fs_names() - Rename put_and_unmap_page() to unmap_and_put_page() to let the name reflect the order of the cleanup operation that has to unmap before the actual put - Unexport buffer_check_dirty_writeback() as it is not used outside of block device aops - Stop allocating aio rings from highmem - Protecting read-{only,write} transitions in the VFS used open-coded barriers in various places. Replace them with proper little helpers and document both the helpers and all barrier interactions involved when transitioning between read-{only,write} states - Use flexible array members in old readdir codepaths Fixes: - Use the correct type __poll_t for epoll and eventfd - Replace all deprecated strlcpy() invocations, whose return value isn't checked with an equivalent strscpy() call - Fix some kernel-doc warnings in fs/open.c - Reduce the stack usage in jffs2's xattr codepaths finally getting rid of this: fs/jffs2/xattr.c:887:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] royally annoying compilation warning - Use __FMODE_NONOTIFY instead of FMODE_NONOTIFY where an int and not fmode_t is required to avoid fmode_t to integer degradation warnings - Create coredumps with O_WRONLY instead of O_RDWR. There's a long explanation in that commit how O_RDWR is actually a bug which we found out with the help of Linus and git archeology - Fix "no previous prototype" warnings in the pipe codepaths - Add overflow calculations for remap_verify_area() as a signed addition overflow could be triggered in xfstests - Fix a null pointer dereference in sysv - Use an unsigned variable for length calculations in jfs avoiding compilation warnings with gcc 13 - Fix a dangling pipe pointer in the watch queue codepath - The legacy mount option parser provided as a fallback by the VFS for filesystems not yet converted to the new mount api did prefix the generated mount option string with a leading ',' causing issues for some filesystems - Fix a repeated word in a comment in fs.h - autofs: Update the ctime when mtime is updated as mandated by POSIX" * tag 'v6.5/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) readdir: Replace one-element arrays with flexible-array members fs: Provide helpers for manipulating sb->s_readonly_remount fs: Protect reconfiguration of sb read-write from racing writes eventfd: add a uapi header for eventfd userspace APIs autofs: set ctime as well when mtime changes on a dir eventfd: show the EFD_SEMAPHORE flag in fdinfo fs/aio: Stop allocating aio rings from HIGHMEM fs: Fix comment typo fs: unexport buffer_check_dirty_writeback fs: avoid empty option when generating legacy mount string watch_queue: prevent dangling pipe pointer fs.h: Optimize file struct to prevent false sharing highmem: Rename put_and_unmap_page() to unmap_and_put_page() cachefiles: Allow the cache to be non-root init: remove unused names parameter in split_fs_names() jfs: Use unsigned variable for length calculations fs/sysv: Null check to prevent null-ptr-deref bug fs: use UB-safe check for signed addition overflow in remap_verify_area procfs: consolidate arch_report_meminfo declaration fs: pipe: reveal missing function protoypes ...
2023-06-26Merge tag 'v6.5/fs.ntfs' of ↵Linus Torvalds4-21/+23
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull ntfs updates from Christian Brauner: "A pile of various smaller fixes for ntfs" * tag 'v6.5/fs.ntfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ntfs: do not dereference a null ctx on error ntfs: Remove unneeded semicolon ntfs: Correct spelling ntfs: remove redundant initialization to pointer cb_sb_start
2023-06-26Merge tag 'auxdisplay-6.5' of https://github.com/ojeda/linuxLinus Torvalds2-2/+2
Pull auxdisplay update from Miguel Ojeda: "A single cleanup for i2c drivers to switch them back to use '.probe()'" * tag 'auxdisplay-6.5' of https://github.com/ojeda/linux: auxdisplay: Switch i2c drivers back to use .probe()
2023-06-26Merge tag 'rust-6.5' of https://github.com/Rust-for-Linux/linuxLinus Torvalds36-795/+1658
Pull rust updates from Miguel Ojeda: "A fairly small one in terms of feature additions. Most of the changes in terms of lines come from the upgrade to the new version of the toolchain (which in turn is big due to the vendored 'alloc' crate). Upgrade to Rust 1.68.2: - This is the first such upgrade, and we will try to update it often from now on, in order to remain close to the latest release, until a minimum version (which is "in the future") can be established. The upgrade brings the stabilization of 4 features we used (and 2 more that we used in our old 'rust' branch). Commit 3ed03f4da06e ("rust: upgrade to Rust 1.68.2") contains the details and rationale. pin-init API: - Several internal improvements and fixes to the pin-init API, e.g. allowing to use 'Self' in a struct definition with '#[pin_data]'. 'error' module: - New 'name()' method for the 'Error' type (with 'errname()' integration), used to implement the 'Debug' trait for 'Error'. - Add error codes from 'include/linux/errno.h' to the list of Rust 'Error' constants. - Allow specifying error type on the 'Result' type (with the default still being our usual 'Error' type). 'str' module: - 'TryFrom' implementation for 'CStr', and new 'to_cstring()' method based on it. 'sync' module: - Implement 'AsRef' trait for 'Arc', allowing to use 'Arc' in code that is generic over smart pointer types. - Add 'ptr_eq' method to 'Arc' for easier, less error prone comparison between two 'Arc' pointers. - Reword the 'Send' safety comment for 'Arc', and avoid referencing it from the 'Sync' one. 'task' module: - Implement 'Send' marker for 'Task'. 'types' module: - Implement 'Send' and 'Sync' markers for 'ARef<T>' when 'T' is 'AlwaysRefCounted', 'Send' and 'Sync'. Other changes: - Documentation improvements and '.gitattributes' change to start using the Rust diff driver" * tag 'rust-6.5' of https://github.com/Rust-for-Linux/linux: rust: error: `impl Debug` for `Error` with `errname()` integration rust: task: add `Send` marker to `Task` rust: specify when `ARef` is thread safe rust: sync: reword the `Arc` safety comment for `Sync` rust: sync: reword the `Arc` safety comment for `Send` rust: sync: implement `AsRef<T>` for `Arc<T>` rust: sync: add `Arc::ptr_eq` rust: error: add missing error codes rust: str: add conversion from `CStr` to `CString` rust: error: allow specifying error type on `Result` rust: init: update macro expansion example in docs rust: macros: replace Self with the concrete type in #[pin_data] rust: macros: refactor generics parsing of `#[pin_data]` into its own function rust: macros: fix usage of `#[allow]` in `quote!` docs: rust: point directly to the standalone installers .gitattributes: set diff driver for Rust source code files rust: upgrade to Rust 1.68.2 rust: arc: fix intra-doc link in `Arc<T>::init` rust: alloc: clarify what is the upstream version
2023-06-26Merge tag 's390-6.4-4' of ↵Linus Torvalds6-11/+27
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Use correct type for size of memory allocated for ELF core header on kernel crash. - Fix insecure W+X mapping warning when KASAN shadow memory range is not aligned on page boundary. - Avoid allocation of short by one page KASAN shadow memory when the original memory range is less than (PAGE_SIZE << 3). - Fix virtual vs physical address confusion in physical memory enumerator. It is not a real issue, since virtual and physical addresses are currently the same. - Set CONFIG_NET_TC_SKB_EXT=y in s390 config files as it is required for offloading TC as well as bridges on switchdev capable ConnectX devices. * tag 's390-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/defconfigs: set CONFIG_NET_TC_SKB_EXT=y s390/boot: fix physmem_info virtual vs physical address confusion s390/kasan: avoid short by one page shadow memory s390/kasan: fix insecure W+X mapping warning s390/crash: use the correct type for memory allocation
2023-06-26Merge tag 'nios2_updates_for_v6.5' of ↵Linus Torvalds3-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux Pull nios2 updates from Dinh Nguyen: - Convert pgtable constructor/destructors to ptdesc - Replace strlcpy with strscpy * tag 'nios2_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: nios2: Replace all non-returning strlcpy with strscpy nios2: Convert __pte_free_tlb() to use ptdescs
2023-06-26Merge tag 'irqchip-6.5' of ↵Thomas Gleixner1039-5611/+9989
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - A number of Loogson/Loogarch fixes - Allow the core code to retrigger an interrupt that has fired while the same interrupt is being handled on another CPU, papering over a GICv3 architecture issue - Work around an integration problem on ASR8601, where the CPU numbering isn't representable in the GIC implementation... - Add some missing interrupt to the STM32 irqchip - A bunch of warning squashing triggered by W=1 builds Link: https://lore.kernel.org/r/20230623224345.3577134-1-maz@kernel.org
2023-06-26Merge tag 'timers-v6.5-rc1' of ↵Thomas Gleixner20-374/+442
https://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull clockevent/source updates from Daniel Lezcano: - Fix memory leak on Cadence TTC at probe time (Feng Mingxi) - Use the pm_sleep_ptr macro for the Ingenic driver (Paul Cercueil) - Relocate the PMW timer Loongson from the mips arch directory to the drivers/clocksource (Keguang Zhang) - Use the same function names instead of using aliases and move data defined in the header to the driver directly as this one is the only user of the header file and remove this one on i.MX GPT (Uwe Kleine-König) - Convert Broadcom Kona family timer bindings to DT schema (Michael Kelley) - Add DT bindings for Ralink SoCs timer (Sergio Paracuellos)
2023-06-26Linux 6.4Linus Torvalds1-1/+1
2023-06-26Merge tag 'i2c-for-6.4-rc8' of ↵Linus Torvalds3-9/+17
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Nothing fancy. Two driver and one DT binding fix" * tag 'i2c-for-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle i2c: qup: Add missing unwind goto in qup_i2c_probe() dt-bindings: i2c: opencores: Add missing type for "regstep"
2023-06-25Merge tag 'perf_urgent_for_v6.4' of ↵Linus Torvalds2-4/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Drop the __weak attribute from a function prototype as it otherwise leads to the function getting replaced by a dummy stub - Fix the umask value setup of the frontend event as former is different on two Intel cores * tag 'perf_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL perf/core: Drop __weak attribute from arch_perf_update_userpage() prototype
2023-06-25Merge tag 'objtool_urgent_for_v6.4' of ↵Linus Torvalds7-0/+59
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: - Add a ORC format hash to vmlinux and modules in order for other tools which use it, to detect changes to it and adapt accordingly * tag 'objtool_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Add ELF section with ORC version identifier
2023-06-25Merge tag 'x86_urgent_for_v6.4' of ↵Linus Torvalds2-5/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Do not use set_pgd() when updating the KASLR trampoline pgd entry because that updates the user PGD too on KPTI builds, resulting in memory corruption - Prevent a panic in the IO-APIC setup code due to conflicting command line parameters * tag 'x86_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys x86/mm: Avoid using set_pgd() outside of real PGD pages
2023-06-24Merge tag 'drm-fixes-2023-06-23' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2-3/+3
Pull drm fixes from Dave Airlie: "Very quiet last week, just two misc fixes, one dp-mst and one qaic: qaic: - dma-buf import fix dp-mst: - fix NULL ptr deref" [ It turns out it was a quiet week because Alex Deucher hadn't sent in his pending AMD changes. So they are coming next - Linus ] * tag 'drm-fixes-2023-06-23' of git://anongit.freedesktop.org/drm/drm: drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2 accel/qaic: Call DRM helper function to destroy prime GEM
2023-06-24Merge tag 'arm-fixes-6.4-3' of ↵Linus Torvalds16-48/+69
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "The final bug fixes for Qualcomm and Rockchips came in, all of them for devicetree files: - Devices on Qualcomm SC7180/SC7280 that are cache coherent are now marked so correctly to fix a regression after a change in kernel behavior - Rockchips has a few minor changes for correctness of regulator and cache properties, as well as fixes for incorrect behavior of the RK3568 PCI controller and reset pins on two boards" * tag 'arm-fixes-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent arm64: dts: rockchip: Fix rk356x PCIe register and range mappings arm64: dts: rockchip: fix button reset pin for nanopi r5c arm64: dts: rockchip: fix nEXTRST on SOQuartz arm64: dts: rockchip: add missing cache properties arm64: dts: rockchip: fix USB regulator on ROCK64
2023-06-24Merge tag 'for-6.4-rc7-tag' of ↵Linus Torvalds5-28/+40
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "Unfortunately the recent u32 overflow fix was not complete, there was one conversion left, assertion not triggered by my tests but caught by Qu's fstests case. The "cleanup for later" has been promoted to a proper fix and wraps all uses of the stripe left shift so the diffstat has grown but leaves no potentially problematic uses. We should have done it that way before, sorry" * tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix remaining u32 overflows when left shifting stripe_nr
2023-06-24Merge tag 'block-6.4-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds1-2/+3
Pull block fix from Jens Axboe: "It's apparently the week of 'fixup something from last week', because the same is true for this block pull request. Fix up a lock grab that needs to be IRQ saving, rather than just IRQ disabling, in the block cgroup code" * tag 'block-6.4-2023-06-23' of git://git.kernel.dk/linux: block: make sure local irq is disabled when calling __blkcg_rstat_flush
2023-06-24Merge tag 'iommu-fix-v6.4-rc7' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fix from Joerg Roedel: - Fix potential memory leak in AMD IOMMU domain allocation path * tag 'iommu-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Fix possible memory leak of 'domain'
2023-06-24Merge tag 'sound-6.4' of ↵Linus Torvalds2-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Three oneliner fixes: one for a thinko in SOF SoundWire code and two HD-audio quirks for ASUS laptops. All device-specific and should be safe to apply" * tag 'sound-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: Add quirk for ASUS ROG GV601V ALSA: hda/realtek: Add quirk for ASUS ROG G634Z ASoC: intel: sof_sdw: Fixup typo in device link checking
2023-06-24Merge tag 'gpio-fixes-for-v6.4' of ↵Linus Torvalds3-3/+24
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix IRQ initialization in gpiochip_irqchip_add_domain() - add a missing return value check for platform_get_irq() in gpio-sifive - don't free irq_domains which GPIOLIB does not manage * tag 'gpio-fixes-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: Fix irq_domain resource tracking for gpiochip_irqchip_add_domain() gpio: sifive: add missing check for platform_get_irq gpiolib: Fix GPIO chip IRQ initialization restriction
2023-06-23Merge tag 'qcom-arm64-fixes-for-6.4-2' of ↵Arnd Bergmann6-2/+19
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes One last Qualcomm ARM64 DeviceTree fix for v6.4 Changes related to cache management for DMA memory caused WiFi to stop work on SC7180 and SC7280 based products, using TF-A. These changes marks the relevant device dma-coherent to correct the behavior. * tag 'qcom-arm64-fixes-for-6.4-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent Link: https://lore.kernel.org/r/20230622203248.106422-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-06-23workqueue: clean up WORK_* constant types, clarify maskingLinus Torvalds2-12/+16
Dave Airlie reports that gcc-13.1.1 has started complaining about some of the workqueue code in 32-bit arm builds: kernel/workqueue.c: In function ‘get_work_pwq’: kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] 713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK); | ^ [ ... a couple of other cases ... ] and while it's not immediately clear exactly why gcc started complaining about it now, I suspect it's some C23-induced enum type handlign fixup in gcc-13 is the cause. Whatever the reason for starting to complain, the code and data types are indeed disgusting enough that the complaint is warranted. The wq code ends up creating various "helper constants" (like that WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of confused. The mask needs to be 'unsigned long', not some unspecified enum type. To make matters worse, the actual "mask and cast to a pointer" is repeated a couple of times, and the cast isn't even always done to the right pointer, but - as the error case above - to a 'void *' with then the compiler finishing the job. That's now how we roll in the kernel. So create the masks using the proper types rather than some ambiguous enumeration, and use a nice helper that actually does the type conversion in one well-defined place. Incidentally, this magically makes clang generate better code. That, admittedly, is really just a sign of clang having been seriously confused before, and cleaning up the typing unconfuses the compiler too. Reported-by: Dave Airlie <airlied@gmail.com> Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/ Cc: Arnd Bergmann <arnd@arndb.de> Cc: Tejun Heo <tj@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-23scsi/sg: don't grab scsi host module referenceYu Kuai1-3/+3
In order to prevent request_queue to be freed before cleaning up blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage") use scsi_device_get(), however, scsi_device_get() will also grab scsi module reference and scsi module can't be removed. It's reported that blktests can't unload scsi_debug after block/001: blktests (master) # ./check block block/001 (stress device hotplugging) [failed] +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19 Running block/001 Stressing sd +modprobe: FATAL: Module scsi_debug is in use. Fix this problem by grabbing request_queue reference directly, so that scsi host module can still be unloaded while request_queue will be pinged by sg device. Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com> Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/ Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230621160111.1433521-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-23io_uring: merge conditional unlock flush helpersPavel Begunkov1-12/+1
There is no reason not to use __io_cq_unlock_post_flush for intermediate aux CQE flushing, all ->task_complete should apply there, i.e. if set it should be the submitter task. Combine them, get rid of of __io_cq_unlock_post() and rename the left function. This place was also taking a couple percents of CPU according to profiles for max throughput net benchmarks due to multishot recv flooding it with completions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bbed60734cbec2e833d9c7bdcf9741aada5d8aab.1687518903.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-23io_uring: make io_cq_unlock_post staticPavel Begunkov2-3/+1
io_cq_unlock_post() is exclusively used in io_uring/io_uring.c, mark it static and don't expose to other files. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3dc8127dda4514e1dd24bb32035faac887c5fa37.1687518903.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-23io_uring: inline __io_cq_unlockPavel Begunkov1-8/+4
__io_cq_unlock is not very helpful, and users should be calling flush variants anyway. Open code the function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d875c4cfb69f38ccecb58a57111446c77a614caa.1687518903.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>