summaryrefslogtreecommitdiff
path: root/include/trace/events
AgeCommit message (Collapse)AuthorFilesLines
2024-05-23Merge tag 'block-6.10-20240523' of git://git.kernel.dk/linuxLinus Torvalds1-1/+1
Pull more block updates from Jens Axboe: "Followup block updates, mostly due to NVMe being a bit late to the party. But nothing major in there, so not a big deal. In detail, this contains: - NVMe pull request via Keith: - Fabrics connection retries (Daniel, Hannes) - Fabrics logging enhancements (Tokunori) - RDMA delete optimization (Sagi) - ublk DMA alignment fix (me) - null_blk sparse warning fixes (Bart) - Discard support for brd (Keith) - blk-cgroup list corruption fixes (Ming) - blk-cgroup stat propagation fix (Waiman) - Regression fix for plugging stall with md (Yu) - Misc fixes or cleanups (David, Jeff, Justin)" * tag 'block-6.10-20240523' of git://git.kernel.dk/linux: (24 commits) null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues' blk-throttle: remove unused struct 'avg_latency_bucket' block: fix lost bio for plug enabled bio based device block: t10-pi: add MODULE_DESCRIPTION() blk-mq: add helper for checking if one CPU is mapped to specified hctx blk-cgroup: Properly propagate the iostat update up the hierarchy blk-cgroup: fix list corruption from reorder of WRITE ->lqueued blk-cgroup: fix list corruption from resetting io stat cdrom: rearrange last_media_change check to avoid unintentional overflow nbd: Fix signal handling nbd: Remove a local variable from nbd_send_cmd() nbd: Improve the documentation of the locking assumptions nbd: Remove superfluous casts nbd: Use NULL to represent a pointer brd: implement discard support null_blk: Fix two sparse warnings ublk_drv: set DMA alignment mask to 3 nvme-rdma, nvme-tcp: include max reconnects for reconnect logging nvmet-rdma: Avoid o(n^2) loop in delete_ctrl nvme: do not retry authentication failures ...
2024-05-23tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)52-303/+296
With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-20Merge tag 'f2fs-for-6.10.rc1' of ↵Linus Torvalds1-21/+21
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've tried to address some performance issues on zoned storage such as direct IO and write_hints. In addition, we've migrated some IO paths using folio. Meanwhile, there are multiple bug fixes in the compression paths, sanity check conditions, and error handlers. Enhancements: - allow direct io of pinned files for zoned storage - assign the write hint per stream by default - convert read paths and test_writeback to folio - avoid allocating WARM_DATA segment for direct IO Bug fixes: - fix false alarm on invalid block address - fix to add missing iput() in gc_data_segment() - fix to release node block count in error path of f2fs_new_node_page() - compress: - don't allow unaligned truncation on released compress inode - cover {reserve,release}_compress_blocks() w/ cp_rwsem lock - fix error path of inc_valid_block_count() - fix to update i_compr_blocks correctly - fix block migration when section is not aligned to pow2 - don't trigger OPU on pinfile for direct IO - fix to do sanity check on i_xattr_nid in sanity_check_inode() - write missing last sum blk of file pinning section - clear writeback when compression failed - fix to adjust appropirate defragment pg_end As usual, there are several minor code clean-ups, and fixes to manage missing corner cases in the error paths" * tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (50 commits) f2fs: initialize last_block_in_bio variable f2fs: Add inline to f2fs_build_fault_attr() stub f2fs: fix some ambiguous comments f2fs: fix to add missing iput() in gc_data_segment() f2fs: allow dirty sections with zero valid block for checkpoint disabled f2fs: compress: don't allow unaligned truncation on released compress inode f2fs: fix to release node block count in error path of f2fs_new_node_page() f2fs: compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock f2fs: compress: fix error path of inc_valid_block_count() f2fs: compress: fix typo in f2fs_reserve_compress_blocks() f2fs: compress: fix to update i_compr_blocks correctly f2fs: check validation of fault attrs in f2fs_build_fault_attr() f2fs: fix to limit gc_pin_file_threshold f2fs: remove unused GC_FAILURE_PIN f2fs: use f2fs_{err,info}_ratelimited() for cleanup f2fs: fix block migration when section is not aligned to pow2 f2fs: zone: fix to don't trigger OPU on pinfile for direct IO f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode() f2fs: fix to avoid allocating WARM_DATA segment for direct IO f2fs: remove redundant parameter in is_next_segment_free() ...
2024-05-20Merge tag 'mm-nonmm-stable-2024-05-19-11-56' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-mm updates from Andrew Morton: "Mainly singleton patches, documented in their respective changelogs. Notable series include: - Some maintenance and performance work for ocfs2 in Heming Zhao's series "improve write IO performance when fragmentation is high". - Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes exposed by fstests". - kfifo header rework from Andy Shevchenko in the series "kfifo: Clean up kfifo.h". - GDB script fixes from Florian Rommel in the series "scripts/gdb: Fixes for $lx_current and $lx_per_cpu". - After much discussion, a coding-style update from Barry Song explaining one reason why inline functions are preferred over macros. The series is "codingstyle: avoid unused parameters for a function-like macro"" * tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (62 commits) fs/proc: fix softlockup in __read_vmcore nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON() scripts: checkpatch: check unused parameters for function-like macro Documentation: coding-style: ask function-like macros to evaluate parameters nilfs2: use __field_struct() for a bitwise field selftests/kcmp: remove unused open mode nilfs2: remove calls to folio_set_error() and folio_clear_error() kernel/watchdog_perf.c: tidy up kerneldoc watchdog: allow nmi watchdog to use raw perf event watchdog: handle comma separated nmi_watchdog command line nilfs2: make superblock data array index computation sparse friendly squashfs: remove calls to set the folio error flag squashfs: convert squashfs_symlink_read_folio to use folio APIs scripts/gdb: fix detection of current CPU in KGDB scripts/gdb: make get_thread_info accept pointers scripts/gdb: fix parameter handling in $lx_per_cpu scripts/gdb: fix failing KGDB detection during probe kfifo: don't use "proxy" headers media: stih-cec: add missing io.h media: rc: add missing io.h ...
2024-05-19Merge tag 'mm-stable-2024-05-17-19-19' of ↵Linus Torvalds4-17/+17
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...
2024-05-16Merge tag 'platform-drivers-x86-v6.10-1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: - New drivers/platform/arm64 directory for arm64 embedded-controller drivers - New drivers: - Acer Aspire 1 embedded controllers (for arm64 models) - ACPI quickstart PNP0C32 buttons - Dell All-In-One backlight support (dell-uart-backlight) - Lenovo WMI camera buttons - Lenovo Yoga Tablet 2 Pro 1380F/L fast charging - MeeGoPad ANX7428 Type-C Cross Switch (power sequencing only) - MSI WMI sensors (fan speed sensors only for now) - Asus WMI: - 2024 ROG Mini-LED support - MCU powersave support - Vivobook GPU MUX support - Misc. other improvements - Ideapad laptop: - Export FnLock LED as LED class device - Switch platform profiles using thermal management key - Intel drivers: - IFS: various improvements - PMC: Lunar Lake support - SDSI: various improvements - TPMI/ISST: various improvements - tools: intel-speed-select: various improvements - MS Surface drivers: - Fan profile switching support - Surface Pro thermal sensors support - ThinkPad ACPI: - Reworked hotkey support to use sparse keymaps - Add support for new trackpoint-doubletap, Fn+N and Fn+G hotkeys - WMI core: - New WMI driver development guide - x86 Android tablets: - Lenovo Yoga Tablet 2 Pro 1380F/L support - Xiaomi MiPad 2 status LED and bezel touch buttons backlight support - Miscellaneous cleanups / fixes / improvements * tag 'platform-drivers-x86-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (128 commits) platform/x86: Add new MeeGoPad ANX7428 Type-C Cross Switch driver devm-helpers: Fix a misspelled cancellation in the comments tools arch x86: Add dell-uart-backlight-emulator platform/x86: Add new Dell UART backlight driver platform/x86: x86-android-tablets: Create LED device for Xiaomi Pad 2 bottom bezel touch buttons platform/x86: x86-android-tablets: Xiaomi pad2 RGB LED fwnode updates platform/x86: x86-android-tablets: Pass struct device to init() platform/x86/amd: pmc: Add new ACPI ID AMDI000B platform/x86/amd: pmf: Add new ACPI ID AMDI0105 platform/x86: p2sb: Don't init until unassigned resources have been assigned platform/surface: aggregator: Log critical errors during SAM probing platform/x86: ISST: Support SST-BF and SST-TF per level platform/x86/fujitsu-laptop: Replace sprintf() with sysfs_emit() tools/power/x86/intel-speed-select: v1.19 release tools/power/x86/intel-speed-select: Display CPU as None for -1 tools/power/x86/intel-speed-select: SST BF/TF support per level tools/power/x86/intel-speed-select: Increase number of CPUs displayed tools/power/x86/intel-speed-select: Present all TRL levels for turbo-freq tools/power/x86/intel-speed-select: Fix display for unsupported levels tools/power/x86/intel-speed-select: Support multiple dies ...
2024-05-16Merge tag 'wq-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-1/+3
Pull workqueue updates from Tejun Heo: - Work items can now be disabled and enabled, and cancel_work_sync() and disable_work() can be called form atomic contexts for BH work items. This closes feature gap with tasklet and should allow converting all existing tasklet users to BH workqueues. - Improve pool sharing for unbound workqueues with strict affinity. - Misc changes including doc updates, improved debug annotations and cleanups. * tag 'wq-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Use "@..." in function comment to describe variable length argument workqueue: Add destroy_work_on_stack() in workqueue_softirq_dead() workqueue: remove unnecessary import and function in wq_monitor.py workqueue: Introduce enable_and_queue_work() convenience function workqueue: add function in event of workqueue_activate_work workqueue: Cleanup subsys attribute registration workqueue: Use list_last_entry() to get the last idle worker workqueue: Move attrs->cpumask out of worker_pool's properties when attrs->affn_strict workqueue: Use INIT_WORK_ONSTACK in workqueue_softirq_dead() workqueue: Allow cancel_work_sync() and disable_work() from atomic contexts on BH work items workqueue: Remember whether a work item was on a BH workqueue workqueue: Remove WORK_OFFQ_CANCELING workqueue: Implement disable/enable for (delayed) work items workqueue: Preserve OFFQ bits in cancel[_sync] paths
2024-05-16Merge tag 'cgroup-for-6.10' of ↵Linus Torvalds1-0/+92
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - The locking around cpuset hotplug processing has always been a bit of mess which was worked around by making hotplug processing asynchronous. The asynchronity isn't great and led to other issues. We tried to make the behavior synchronous a while ago but that led to lockdep splats. Waiman took another stab at cleaning up and making it synchronous. The patch has been in -next for well over a month and there haven't been any complaints, so fingers crossed. - Tracepoints added to help understanding rstat lock contentions. - A bunch of minor changes - doc updates, code cleanups and selftests. * tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits) cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints selftests/cgroup: Drop define _GNU_SOURCE docs: cgroup-v1: Update page cache removal functions selftests/cgroup: fix uninitialized variables in test_zswap.c selftests/cgroup: cpu_hogger init: use {} instead of {NULL} selftests/cgroup: fix clang warnings: uninitialized fd variable selftests/cgroup: fix clang build failures for abs() calls cgroup/cpuset: Remove outdated comment in sched_partition_write() cgroup/cpuset: Fix incorrect top_cpuset flags cgroup/cpuset: Avoid clearing CS_SCHED_LOAD_BALANCE twice cgroup/cpuset: Statically initialize more members of top_cpuset cgroup: Avoid unnecessary looping in cgroup_no_v1() cgroup, legacy_freezer: update comment for freezer_css_offline() docs, cgroup: add entries for pids to cgroup-v2.rst cgroup: don't call cgroup1_pidlist_destroy_all() for v2 cgroup_freezer: update comment for freezer_css_online() cgroup/rstat: desc member cgrp in cgroup_rstat_flush_release cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints cgroup/pids: Remove superfluous zeroing docs: cgroup-v1: Fix description for css_online ...
2024-05-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-15/+0
Pull KVM updates from Paolo Bonzini: "ARM: - Move a lot of state that was previously stored on a per vcpu basis into a per-CPU area, because it is only pertinent to the host while the vcpu is loaded. This results in better state tracking, and a smaller vcpu structure. - Add full handling of the ERET/ERETAA/ERETAB instructions in nested virtualisation. The last two instructions also require emulating part of the pointer authentication extension. As a result, the trap handling of pointer authentication has been greatly simplified. - Turn the global (and not very scalable) LPI translation cache into a per-ITS, scalable cache, making non directly injected LPIs much cheaper to make visible to the vcpu. - A batch of pKVM patches, mostly fixes and cleanups, as the upstreaming process seems to be resuming. Fingers crossed! - Allocate PPIs and SGIs outside of the vcpu structure, allowing for smaller EL2 mapping and some flexibility in implementing more or less than 32 private IRQs. - Purge stale mpidr_data if a vcpu is created after the MPIDR map has been created. - Preserve vcpu-specific ID registers across a vcpu reset. - Various minor cleanups and improvements. LoongArch: - Add ParaVirt IPI support - Add software breakpoint support - Add mmio trace events support RISC-V: - Support guest breakpoints using ebreak - Introduce per-VCPU mp_state_lock and reset_cntx_lock - Virtualize SBI PMU snapshot and counter overflow interrupts - New selftests for SBI PMU and Guest ebreak - Some preparatory work for both TDX and SNP page fault handling. This also cleans up the page fault path, so that the priorities of various kinds of fauls (private page, no memory, write to read-only slot, etc.) are easier to follow. x86: - Minimize amount of time that shadow PTEs remain in the special REMOVED_SPTE state. This is a state where the mmu_lock is held for reading but concurrent accesses to the PTE have to spin; shortening its use allows other vCPUs to repopulate the zapped region while the zapper finishes tearing down the old, defunct page tables. - Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID field, which is defined by hardware but left for software use. This lets KVM communicate its inability to map GPAs that set bits 51:48 on hosts without 5-level nested page tables. Guest firmware is expected to use the information when mapping BARs; this avoids that they end up at a legal, but unmappable, GPA. - Fixed a bug where KVM would not reject accesses to MSR that aren't supposed to exist given the vCPU model and/or KVM configuration. - As usual, a bunch of code cleanups. x86 (AMD): - Implement a new and improved API to initialize SEV and SEV-ES VMs, which will also be extendable to SEV-SNP. The new API specifies the desired encryption in KVM_CREATE_VM and then separately initializes the VM. The new API also allows customizing the desired set of VMSA features; the features affect the measurement of the VM's initial state, and therefore enabling them cannot be done tout court by the hypervisor. While at it, the new API includes two bugfixes that couldn't be applied to the old one without a flag day in userspace or without affecting the initial measurement. When a SEV-ES VM is created with the new VM type, KVM_GET_REGS/KVM_SET_REGS and friends are rejected once the VMSA has been encrypted. Also, the FPU and AVX state will be synchronized and encrypted too. - Support for GHCB version 2 as applicable to SEV-ES guests. This, once more, is only accessible when using the new KVM_SEV_INIT2 flow for initialization of SEV-ES VMs. x86 (Intel): - An initial bunch of prerequisite patches for Intel TDX were merged. They generally don't do anything interesting. The only somewhat user visible change is a new debugging mode that checks that KVM's MMU never triggers a #VE virtualization exception in the guest. - Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig VM-Exit to L1, as per the SDM. Generic: - Use vfree() instead of kvfree() for allocations that always use vcalloc() or __vcalloc(). - Remove .change_pte() MMU notifier - the changes to non-KVM code are small and Andrew Morton asked that I also take those through the KVM tree. The callback was only ever implemented by KVM (which was also the original user of MMU notifiers) but it had been nonfunctional ever since calls to set_pte_at_notify were wrapped with invalidate_range_start and invalidate_range_end... in 2012. Selftests: - Enhance the demand paging test to allow for better reporting and stressing of UFFD performance. - Convert the steal time test to generate TAP-friendly output. - Fix a flaky false positive in the xen_shinfo_test due to comparing elapsed time across two different clock domains. - Skip the MONITOR/MWAIT test if the host doesn't actually support MWAIT. - Avoid unnecessary use of "sudo" in the NX hugepage test wrapper shell script, to play nice with running in a minimal userspace environment. - Allow skipping the RSEQ test's sanity check that the vCPU was able to complete a reasonable number of KVM_RUNs, as the assert can fail on a completely valid setup. If the test is run on a large-ish system that is otherwise idle, and the test isn't affined to a low-ish number of CPUs, the vCPU task can be repeatedly migrated to CPUs that are in deep sleep states, which results in the vCPU having very little net runtime before the next migration due to high wakeup latencies. - Define _GNU_SOURCE for all selftests to fix a warning that was introduced by a change to kselftest_harness.h late in the 6.9 cycle, and because forcing every test to #define _GNU_SOURCE is painful. - Provide a global pseudo-RNG instance for all tests, so that library code can generate random, but determinstic numbers. - Use the global pRNG to randomly force emulation of select writes from guest code on x86, e.g. to help validate KVM's emulation of locked accesses. - Allocate and initialize x86's GDT, IDT, TSS, segments, and default exception handlers at VM creation, instead of forcing tests to manually trigger the related setup. Documentation: - Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (225 commits) selftests/kvm: remove dead file KVM: selftests: arm64: Test vCPU-scoped feature ID registers KVM: selftests: arm64: Test that feature ID regs survive a reset KVM: selftests: arm64: Store expected register value in set_id_regs KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope KVM: arm64: Only reset vCPU-scoped feature ID regs once KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs() KVM: arm64: Rename is_id_reg() to imply VM scope KVM: arm64: Destroy mpidr_data for 'late' vCPU creation KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support KVM: arm64: Fix hvhe/nvhe early alias parsing KVM: SEV: Allow per-guest configuration of GHCB protocol version KVM: SEV: Add GHCB handling for termination requests KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests KVM: SEV: Add support to handle AP reset MSR protocol KVM: x86: Explicitly zero kvm_caps during vendor module load KVM: x86: Fully re-initialize supported_mce_cap on vendor module load KVM: x86: Fully re-initialize supported_vm_types on vendor module load KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values ...
2024-05-16Merge branch 'for-6.10' into test-merge-for-6.10Tejun Heo1-1/+3
2024-05-15Merge tag 'sound-6.10-rc1' of ↵Linus Torvalds3-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "This one became bigger than usual, not in the total size but rather containing lots of small changes all over the places. The majority of changes are about ASoC, especially SOF / Intel stuff, and we see an interesting work for ASoC DAPM graph visualization, while there are many other code cleanup and refactoring, too. Core: - A deadlock fix at device disconnection - A new tool dapm-graph for visualising the DAPM state ASoC: - Large updates throughout the Intel audio drivers - Fixes and clarifications for the DAPM documentation - Cleanups of accessors for driver data, module labelling, and for constification - Modernsation and cleanup work in the Mediatek drivers - Several fixes and features for the DaVinci I2S driver - New drivers for several AMD and Intel platforms, Nuvoton NAU8325, Rockchip RK3308 and Texas Instruments PCM6240 HD-audio: - Cleanup for CONFIG_PM dependencies - Cirrus HD-audio codec fixes and quirks Others: - Series of tree-wide fixes in Makefiles to use *-y - Additions of missing module descriptions - Scarlett2 USB mixer enhancements - A series of legacy emu10k1 fixes and improvements" * tag 'sound-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (603 commits) ALSA: hda/realtek: Drop doubly quirk entry for 103c:8a2e ALSA: hda/realtek - fixed headset Mic not show ASoC: SOF: amd: Fix build error with built-in config ALSA: scarlett2: Increase mixer range to +12dB ALSA: scarlett2: Add S/PDIF source selection controls ALSA: core: Remove superfluous CONFIG_PM ALSA: Fix deadlocks with kctl removals at disconnection ASoC: audio-graph-card2: call of_node_get() before of_get_next_child() ASoC: SOF: amd: Correct spaces in Makefile ASoC: rt715-sdca-sdw: Fix wrong complete waiting in rt715_dev_resume() ASoC: Intel: sof_sdw_rt_amp: use dai parameter ASoC: Intel: sof_sdw: add dai parameter to rtd_init callback ASoC: Intel: sof_sdw: use .controls/.widgets to add controls/widgets ASoC: Intel: sof_sdw: add controls and dapm widgets in codec_info ASoC: Intel: sof_sdw: use generic name for controls/widgets ASoC: Intel: sof_sdw_cs_amp: rename Speakers to Speaker ASoC: Intel: maxim-common: change max98373 data to static ASoC: Intel: sof_sdw: add max98373 dapm routes ASoC: Intel: sof_rt5682: use max_98373_dai_link function ASoC: Intel: sof_nau8825: use max_98373_dai_link function ...
2024-05-15Merge tag 'net-next-6.10' of ↵Linus Torvalds7-112/+245
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Complete rework of garbage collection of AF_UNIX sockets. AF_UNIX is prone to forming reference count cycles due to fd passing functionality. New method based on Tarjan's Strongly Connected Components algorithm should be both faster and remove a lot of workarounds we accumulated over the years. - Add TCP fraglist GRO support, allowing chaining multiple TCP packets and forwarding them together. Useful for small switches / routers which lack basic checksum offload in some scenarios (e.g. PPPoE). - Support using SMP threads for handling packet backlog i.e. packet processing from software interfaces and old drivers which don't use NAPI. This helps move the processing out of the softirq jumble. - Continue work of converting from rtnl lock to RCU protection. Don't require rtnl lock when reading: IPv6 routing FIB, IPv6 address labels, netdev threaded NAPI sysfs files, bonding driver's sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics, TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot of the link information available via rtnetlink. - Small optimizations from Eric to UDP wake up handling, memory accounting, RPS/RFS implementation, TCP packet sizing etc. - Allow direct page recycling in the bulk API used by XDP, for +2% PPS. - Support peek with an offset on TCP sockets. - Add MPTCP APIs for querying last time packets were received/sent/acked and whether MPTCP "upgrade" succeeded on a TCP socket. - Add intra-node communication shortcut to improve SMC performance. - Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol driver. - Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver. - Add reset reasons for tracing what caused a TCP reset to be sent. - Introduce direction attribute for xfrm (IPSec) states. State can be used either for input or output packet processing. Things we sprinkled into general kernel code: - Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS(). This required touch-ups and renaming of a few existing users. - Add Endian-dependent __counted_by_{le,be} annotations. - Make building selftests "quieter" by printing summaries like "CC object.o" rather than full commands with all the arguments. Netfilter: - Use GFP_KERNEL to clone elements, to deal better with OOM situations and avoid failures in the .commit step. BPF: - Add eBPF JIT for ARCv2 CPUs. - Support attaching kprobe BPF programs through kprobe_multi link in a session mode, meaning, a BPF program is attached to both function entry and return, the entry program can decide if the return program gets executed and the entry program can share u64 cookie value with return program. "Session mode" is a common use-case for tetragon and bpftrace. - Add the ability to specify and retrieve BPF cookie for raw tracepoint programs in order to ease migration from classic to raw tracepoints. - Add an internal-only BPF per-CPU instruction for resolving per-CPU memory addresses and implement support in x86, ARM64 and RISC-V JITs. This allows inlining functions which need to access per-CPU state. - Optimize x86 BPF JIT's emit_mov_imm64, and add support for various atomics in bpf_arena which can be JITed as a single x86 instruction. Support BPF arena on ARM64. - Add a new bpf_wq API for deferring events and refactor process-context bpf_timer code to keep common code where possible. - Harden the BPF verifier's and/or/xor value tracking. - Introduce crypto kfuncs to let BPF programs call kernel crypto APIs. - Support bpf_tail_call_static() helper for BPF programs with GCC 13. - Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF program to have code sections where preemption is disabled. Driver API: - Skip software TC processing completely if all installed rules are marked as HW-only, instead of checking the HW-only flag rule by rule. - Add support for configuring PoE (Power over Ethernet), similar to the already existing support for PoDL (Power over Data Line) config. - Initial bits of a queue control API, for now allowing a single queue to be reset without disturbing packet flow to other queues. - Common (ethtool) statistics for hardware timestamping. Tests and tooling: - Remove the need to create a config file to run the net forwarding tests so that a naive "make run_tests" can exercise them. - Define a method of writing tests which require an external endpoint to communicate with (to send/receive data towards the test machine). Add a few such tests. - Create a shared code library for writing Python tests. Expose the YAML Netlink library from tools/ to the tests for easy Netlink access. - Move netfilter tests under net/, extend them, separate performance tests from correctness tests, and iron out issues found by running them "on every commit". - Refactor BPF selftests to use common network helpers. - Further work filling in YAML definitions of Netlink messages for: nftables, team driver, bonding interfaces, vlan interfaces, VF info, TC u32 mark, TC police action. - Teach Python YAML Netlink to decode attribute policies. - Extend the definition of the "indexed array" construct in the specs to cover arrays of scalars rather than just nests. - Add hyperlinks between definitions in generated Netlink docs. Drivers: - Make sure unsupported flower control flags are rejected by drivers, and make more drivers report errors directly to the application rather than dmesg (large number of driver changes from Asbjørn Sloth Tønnesen). - Ethernet high-speed NICs: - Broadcom (bnxt): - support multiple RSS contexts and steering traffic to them - support XDP metadata - make page pool allocations more NUMA aware - Intel (100G, ice, idpf): - extract datapath code common among Intel drivers into a library - use fewer resources in switchdev by sharing queues with the PF - add PFCP filter support - add Ethernet filter support - use a spinlock instead of HW lock in PTP clock ops - support 5 layer Tx scheduler topology - nVidia/Mellanox: - 800G link modes and 100G SerDes speeds - per-queue IRQ coalescing configuration - Marvell Octeon: - support offloading TC packet mark action - Ethernet NICs consumer, embedded and virtual: - stop lying about skb->truesize in USB Ethernet drivers, it messes up TCP memory calculations - Google cloud vNIC: - support changing ring size via ethtool - support ring reset using the queue control API - VirtIO net: - expose flow hash from RSS to XDP - per-queue statistics - add selftests - Synopsys (stmmac): - support controllers which require an RX clock signal from the MII bus to perform their hardware initialization - TI: - icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices - icssg_prueth: add SW TX / RX Coalescing based on hrtimers - cpsw: minimal XDP support - Renesas (ravb): - support describing the MDIO bus - Realtek (r8169): - add support for RTL8168M - Microchip Sparx5: - matchall and flower actions mirred and redirect - Ethernet switches: - nVidia/Mellanox: - improve events processing performance - Marvell: - add support for MV88E6250 family internal PHYs - Microchip: - add DCB and DSCP mapping support for KSZ switches - vsc73xx: convert to PHYLINK - Realtek: - rtl8226b/rtl8221b: add C45 instances and SerDes switching - Many driver changes related to PHYLIB and PHYLINK deprecated API cleanup - Ethernet PHYs: - Add a new driver for Airoha EN8811H 2.5 Gigabit PHY. - micrel: lan8814: add support for PPS out and external timestamp trigger - WiFi: - Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices drivers. Modern devices can only be configured using nl80211. - mac80211/cfg80211 - handle color change per link for WiFi 7 Multi-Link Operation - Intel (iwlwifi): - don't support puncturing in 5 GHz - support monitor mode on passive channels - BZ-W device support - P2P with HE/EHT support - re-add support for firmware API 90 - provide channel survey information for Automatic Channel Selection - MediaTek (mt76): - mt7921 LED control - mt7925 EHT radiotap support - mt7920e PCI support - Qualcomm (ath11k): - P2P support for QCA6390, WCN6855 and QCA2066 - support hibernation - ieee80211-freq-limit Device Tree property support - Qualcomm (ath12k): - refactoring in preparation of multi-link support - suspend and hibernation support - ACPI support - debugfs support, including dfs_simulate_radar support - RealTek: - rtw88: RTL8723CS SDIO device support - rtw89: RTL8922AE Wi-Fi 7 PCI device support - rtw89: complete features of new WiFi 7 chip 8922AE including BT-coexistence and Wake-on-WLAN - rtw89: use BIOS ACPI settings to set TX power and channels - rtl8xxxu: enable Management Frame Protection (MFP) support - Bluetooth: - support for Intel BlazarI and Filmore Peak2 (BE201) - support for MediaTek MT7921S SDIO - initial support for Intel PCIe BT driver - remove HCI_AMP support" * tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits) selftests: netfilter: fix packetdrill conntrack testcase net: gro: fix napi_gro_cb zeroed alignment Bluetooth: btintel_pcie: Refactor and code cleanup Bluetooth: btintel_pcie: Fix warning reported by sparse Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1 Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config Bluetooth: btintel_pcie: Fix compiler warnings Bluetooth: btintel_pcie: Add *setup* function to download firmware Bluetooth: btintel_pcie: Add support for PCIe transport Bluetooth: btintel: Export few static functions Bluetooth: HCI: Remove HCI_AMP support Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init() Bluetooth: qca: Fix error code in qca_read_fw_build_info() Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning Bluetooth: btintel: Add support for Filmore Peak2 (BE201) Bluetooth: btintel: Add support for BlazarI LE Create Connection command timeout increased to 20 secs dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth Bluetooth: compute LE flow credits based on recvbuf space Bluetooth: hci_sync: Use cmd->num_cis instead of magic number ...
2024-05-15Merge tag 'firewire-updates-6.10' of ↵Linus Torvalds1-0/+348
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire updates from Takashi Sakamoto: "During the development period of v6.8 kernel, it became evident that there was a lack of helper utilities to trace the initial state of bus, while investigating certain PHYs compliant with different versions of IEEE 1394 specification. This series of changes includes the addition of tracepoints events, provided by 'firewire' subsystem. These events enable tracing of how firewire core functions during bus reset and asynchronous communication over IEEE 1394 bus. When implementing the tracepoints events, it was found that the existing serialization and deserialization helpers for several types of asynchronous packets are scattered across both firewire-core and firewire-ohci kernel modules. A set of inline functions is newly added to address it, along with some KUnit tests, serving as the foundation for the tracepoints events. This renders the dispersed code obsolete. The remaining changes constitute the final steps in phasing out the usage of deprecated PCI MSI APIs, in continuation from the previous version" * tag 'firewire-updates-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: (29 commits) firewire: obsolete usage of *-objs in Makefile for KUnit test firewire: core: remove flag and width from u64 formats of tracepoints events firewire: core: fix type of timestamp for async_inbound_template tracepoints events firewire: core: add tracepoint event for handling bus reset Revert "firewire: core: option to log bus reset initiation" firewire: core: add tracepoints events for initiating bus reset firewire: ohci: obsolete OHCI_PARAM_DEBUG_BUSRESETS from debug module parameter firewire: ohci: add bus-reset event for initial set of handled irq firewire: core: add tracepoints event for asynchronous inbound phy packet firewire: core/cdev: add tracepoints events for asynchronous phy packet firewire: core: add tracepoints events for asynchronous outbound response firewire: core: add tracepoint event for asynchronous inbound request firewire: core: add tracepoints event for asynchronous inbound response firewire: core: add tracepoints events for asynchronous outbound request firewire: core: add support for Linux kernel tracepoints firewire: core: replace local macros with common inline functions for isochronous packet header firewire: core: add common macro to serialize/deserialize isochronous packet header firewire: core: obsolete tcode check macros with inline functions firewire: ohci: replace hard-coded values with common macros firewire: ohci: replace hard-coded values with inline functions for asynchronous packet header ...
2024-05-15Merge tag 'dlm-6.10' of ↵Linus Torvalds1-27/+19
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes some small fixes, and some big internal changes: - Fix a long standing race between the unlock callback for the last lkb struct, and removing the rsb that became unused after the final unlock. This could lead different nodes to inconsistent info about the rsb master node. - Remove unnecessary refcounting on callback structs, returning to the way things were done in the past. - Do message processing in softirq context. This allows dlm messages to be cleared more quickly and efficiently, reducing long lists of incomplete requests. A future change to run callbacks directly from this context will make this more effective. - The softirq message processing involved a number of patches changing mutexes to spinlocks and rwlocks, and a fair amount of code re-org in preparation. - Use an rhashtable for rsb structs, rather than our old internal hash table implementation. This also required some re-org of lists and locks preparation for the change. - Drop the dlm_scand kthread, and use timers to clear unused rsb structs. Scanning all rsb's periodically was a lot of wasted work. - Fix recent regression in logic for copying LVB data in user space lock requests" * tag 'dlm-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (34 commits) dlm: return -ENOMEM if ls_recover_buf fails dlm: fix sleep in atomic context dlm: use rwlock for lkbidr dlm: use rwlock for rsb hash table dlm: drop dlm_scand kthread and use timers dlm: do not use ref counts for rsb in the toss state dlm: switch to use rhashtable for rsbs dlm: add rsb lists for iteration dlm: merge toss and keep hash table lists into one list dlm: change to single hashtable lock dlm: increment ls_count for dlm_scand dlm: do message processing in softirq context dlm: use spin_lock_bh for message processing dlm: remove schedule in receive path dlm: convert ls_recv_active from rw_semaphore to rwlock dlm: avoid blocking receive at the end of recovery dlm: convert res_lock to spinlock dlm: convert ls_waiters_mutex to spinlock dlm: drop mutex use in waiters recovery dlm: add new struct to save position in dlm_copy_master_names ...
2024-05-15Merge tag 'for-6.10-tag' of ↵Linus Torvalds1-37/+121
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "This update brings a few minor performance improvements, otherwise there's a lot of refactoring, cleanups and other sort of not user visible changes. Performance improvements: - inline b-tree locking functions, improvement in metadata-heavy changes - relax locking on a range that's being reflinked, allows read operations to run in parallel - speed up NOCOW write checks (throughput +9% on a sample test) - extent locking ranges have been reduced in several places, namely around delayed ref processing Core: - more page to folio conversions: - relocation - send - compression - inline extent handling - super block write and wait - extent_map structure optimizations: - reduced structure size - code simplifications - add shrinker for allocated objects, the numbers can go high and could exhaust memory on smaller systems (reported) as they may not get an opportunity to be freed fast enough - extent locking optimizations: - reduce locking ranges where it does not seem to be necessary and are safe due to other means of synchronization - potential improvements due to lower contention, allocation/freeing and state management operations of extent state tracking structures - delayed ref cleanups and simplifications - updated trace points - improved error handling, warnings and assertions - cleanups and refactoring, unification of error handling paths" * tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (122 commits) btrfs: qgroup: fix initialization of auto inherit array btrfs: count super block write errors in device instead of tracking folio error state btrfs: use the folio iterator in btrfs_end_super_write() btrfs: convert super block writes to folio in write_dev_supers() btrfs: convert super block writes to folio in wait_dev_supers() bio: Export bio_add_folio_nofail to modules btrfs: remove duplicate included header from fs.h btrfs: add a cached state to extent_clear_unlock_delalloc btrfs: push extent lock down in submit_one_async_extent btrfs: push lock_extent down in cow_file_range() btrfs: move can_cow_file_range_inline() outside of the extent lock btrfs: push lock_extent into cow_file_range_inline btrfs: push extent lock into cow_file_range btrfs: push extent lock into run_delalloc_cow btrfs: remove unlock_extent from run_delalloc_compressed btrfs: push extent lock down in run_delalloc_nocow btrfs: adjust while loop condition in run_delalloc_nocow btrfs: push extent lock into run_delalloc_nocow btrfs: push the extent lock into btrfs_run_delalloc_range btrfs: lock extent when doing inline extent in compression ...
2024-05-14cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepointsJesper Dangaard Brouer1-6/+50
This closely resembles helpers added for the global cgroup_rstat_lock in commit fc29e04ae1ad ("cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints"). This is for the per CPU lock cgroup_rstat_cpu_lock. Based on production workloads, we observe the fast-path "update" function cgroup_rstat_updated() is invoked around 3 million times per sec, while the "flush" function cgroup_rstat_flush_locked(), walking each possible CPU, can see periodic spikes of 700 invocations/sec. For this reason, the tracepoints are split into normal and fastpath versions for this per-CPU lock. Making it feasible for production to continuously monitor the non-fastpath tracepoint to detect lock contention issues. The reason for monitoring is that lock disables IRQs which can disturb e.g. softirq processing on the local CPUs involved. When the global cgroup_rstat_lock stops disabling IRQs (e.g converted to a mutex), this per CPU lock becomes the next bottleneck that can introduce latency variations. A practical bpftrace script for monitoring contention latency: bpftrace -e ' tracepoint:cgroup:cgroup_rstat_cpu_lock_contended { @start[tid]=nsecs; @cnt[probe]=count()} tracepoint:cgroup:cgroup_rstat_cpu_locked { if (args->contended) { @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);} @cnt[probe]=count()} interval:s:1 {time("%H:%M:%S "); print(@wait_ns); print(@cnt); clear(@cnt);}' Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-14nbd: Use NULL to represent a pointerBart Van Assche1-1/+1
This patch fixes the following sparse warnings: drivers/block/nbd.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/nbd.h): ./include/trace/events/nbd.h:61:1: warning: Using plain integer as NULL pointer drivers/block/nbd.c: note: in included file (through include/trace/perf.h, include/trace/define_trace.h, include/trace/events/nbd.h): ./include/trace/events/nbd.h:61:1: warning: Using plain integer as NULL pointer Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <jbacik@fb.com> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240510202313.25209-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-14Merge tag 'x86-cpu-2024-05-13' of ↵Linus Torvalds1-4/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: - Rework the x86 CPU vendor/family/model code: introduce the 'VFM' value that is an 8+8+8 bit concatenation of the vendor/family/model value, and add macros that work on VFM values. This simplifies the addition of new Intel models & families, and simplifies existing enumeration & quirk code. - Add support for the AMD 0x80000026 leaf, to better parse topology information - Optimize the NUMA allocation layout of more per-CPU data structures - Improve the workaround for AMD erratum 1386 - Clear TME from /proc/cpuinfo as well, when disabled by the firmware - Improve x86 self-tests - Extend the mce_record tracepoint with the ::ppin and ::microcode fields - Implement recovery for MCE errors in TDX/SEAM non-root mode - Misc cleanups and fixes * tag 'x86-cpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/mm: Switch to new Intel CPU model defines x86/tsc_msr: Switch to new Intel CPU model defines x86/tsc: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines x86/resctrl: Switch to new Intel CPU model defines x86/microcode/intel: Switch to new Intel CPU model defines x86/mce: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines x86/cpu/intel_epb: Switch to new Intel CPU model defines x86/aperfmperf: Switch to new Intel CPU model defines x86/apic: Switch to new Intel CPU model defines perf/x86/msr: Switch to new Intel CPU model defines perf/x86/intel/uncore: Switch to new Intel CPU model defines perf/x86/intel/pt: Switch to new Intel CPU model defines perf/x86/lbr: Switch to new Intel CPU model defines perf/x86/intel/cstate: Switch to new Intel CPU model defines x86/bugs: Switch to new Intel CPU model defines x86/bugs: Switch to new Intel CPU model defines x86/cpu/vfm: Update arch/x86/include/asm/intel-family.h x86/cpu/vfm: Add new macros to work with (vendor/family/model) values ...
2024-05-14Merge tag 'sched-core-2024-05-13' of ↵Linus Torvalds2-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Add cpufreq pressure feedback for the scheduler - Rework misfit load-balancing wrt affinity restrictions - Clean up and simplify the code around ::overutilized and ::overload access. - Simplify sched_balance_newidle() - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES handling that changed the output. - Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch() - Reorganize, clean up and unify most of the higher level scheduler balancing function names around the sched_balance_*() prefix - Simplify the balancing flag code (sched_balance_running) - Miscellaneous cleanups & fixes * tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) sched/pelt: Remove shift of thermal clock sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure() thermal/cpufreq: Remove arch_update_thermal_pressure() sched/cpufreq: Take cpufreq feedback into account cpufreq: Add a cpufreq pressure feedback for the scheduler sched/fair: Fix update of rd->sg_overutilized sched/vtime: Do not include <asm/vtime.h> header s390/irq,nmi: Include <asm/vtime.h> header directly s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover sched/vtime: Get rid of generic vtime_task_switch() implementation sched/vtime: Remove confusing arch_vtime_task_switch() declaration sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized() sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded() sched/fair: Rename root_domain::overload to ::overloaded sched/fair: Use helper functions to access root_domain::overload sched/fair: Check root_domain::overload value before update sched/fair: Combine EAS check with root_domain::overutilized access sched/fair: Simplify the continue_balancing logic in sched_balance_newidle() ...
2024-05-14Merge tag 'execve-6.10-rc1' of ↵Linus Torvalds1-0/+35
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - Provide knob to change (previously fixed) coredump NOTES size (Allen Pais) - Add sched_prepare_exec tracepoint (Marco Elver) - Make /proc/$pid/auxv work under binfmt_elf_fdpic (Max Filippov) - Convert ARCH_HAVE_EXTRA_ELF_NOTES to proper Kconfig (Vignesh Balasubramanian) - Leave a gap between .bss and brk * tag 'execve-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: fs/coredump: Enable dynamic configuration of max file note size binfmt_elf_fdpic: fix /proc/<pid>/auxv binfmt_elf: Leave a gap between .bss and brk Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig tracing: Add sched_prepare_exec tracepoint
2024-05-13Merge tag 'vfs-6.10.netfs' of ↵Linus Torvalds1-13/+237
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull netfs updates from Christian Brauner: "This reworks the netfslib writeback implementation so that pages read from the cache are written to the cache through ->writepages(), thereby allowing the fscache page flag to be retired. The reworking also: - builds on top of the new writeback_iter() infrastructure - makes it possible to use vectored write RPCs as discontiguous streams of pages can be accommodated - makes it easier to do simultaneous content crypto and stream division - provides support for retrying writes and re-dividing a stream - replaces the ->launder_folio() op, so that ->writepages() is used instead - uses mempools to allocate the netfs_io_request and netfs_io_subrequest structs to avoid allocation failure in the writeback path Some code that uses the fscache page flag is retained for compatibility purposes with nfs and ceph. The code is switched to using the synonymous private_2 label instead and marked with deprecation comments. The merge commit contains additional details on the new algorithm that I've left out of here as it would probably be excessively detailed. On top of the netfslib infrastructure this contains the work to convert cifs over to netfslib" * tag 'vfs-6.10.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits) cifs: Enable large folio support cifs: Remove some code that's no longer used, part 3 cifs: Remove some code that's no longer used, part 2 cifs: Remove some code that's no longer used, part 1 cifs: Cut over to using netfslib cifs: Implement netfslib hooks cifs: Make add_credits_and_wake_if() clear deducted credits cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs cifs: Set zero_point in the copy_file_range() and remap_file_range() cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c cifs: Replace the writedata replay bool with a netfs sreq flag cifs: Make wait_mtu_credits take size_t args cifs: Use more fields from netfs_io_subrequest cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest cifs: Use alternative invalidation to using launder_folio netfs, afs: Use writeback retry to deal with alternate keys netfs: Miscellaneous tidy ups netfs: Remove the old writeback code netfs: Cut over to using new writeback code ...
2024-05-13Merge tag 'rcu.next.v6.10' of https://github.com/urezki/linuxLinus Torvalds1-0/+27
Pull RCU updates from Uladzislau Rezki: - Fix a lockdep complain for lazy-preemptible kernel, remove redundant BH disable for TINY_RCU, remove redundant READ_ONCE() in tree.c, fix false positives KCSAN splat and fix buffer overflow in the print_cpu_stall_info(). - Misc updates related to bpf, tracing and update the MAINTAINERS file. - An improvement of a normal synchronize_rcu() call in terms of latency. It maintains a separate track for sync. users only. This approach bypasses per-cpu nocb-lists thus sync-users do not depend on nocb-list length and how fast regular callbacks are processed. - RCU tasks: switch tasks RCU grace periods to sleep at TASK_IDLE priority, fix some comments, add some diagnostic warning to the exit_tasks_rcu_start() and fix a buffer overflow in the show_rcu_tasks_trace_gp_kthread(). - RCU torture: Increase memory to guest OS, fix a Tasks Rude RCU testing, some updates for TREE09, dump mode information to debug GP kthread state, remove redundant READ_ONCE(), fix some comments about RCU_TORTURE_PIPE_LEN and pipe_count, remove some redundant pointer initialization, fix a hung splat task by when the rcutorture tests start to exit, fix invalid context warning, add '--do-kvfree' parameter to torture test and use slow register unregister callbacks only for rcutype test. * tag 'rcu.next.v6.10' of https://github.com/urezki/linux: (48 commits) rcutorture: Use rcu_gp_slow_register/unregister() only for rcutype test torture: Scale --do-kvfree test time rcutorture: Fix invalid context warning when enable srcu barrier testing rcutorture: Make stall-tasks directly exit when rcutorture tests end rcutorture: Removing redundant function pointer initialization rcutorture: Make rcutorture support print rcu-tasks gp state rcutorture: Use the gp_kthread_dbg operation specified by cur_ops rcutorture: Re-use value stored to ->rtort_pipe_count instead of re-reading rcutorture: Fix rcu_torture_one_read() pipe_count overflow comment rcutorture: Remove extraneous rcu_torture_pipe_update_one() READ_ONCE() rcu: Allocate WQ with WQ_MEM_RECLAIM bit set rcu: Support direct wake-up of synchronize_rcu() users rcu: Add a trace event for synchronize_rcu_normal() rcu: Reduce synchronize_rcu() latency rcu: Fix buffer overflow in print_cpu_stall_info() rcu: Mollify sparse with RCU guard rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow rcu-tasks: Fix the comments for tasks_rcu_exit_srcu_stall_timer rcu-tasks: Replace exit_tasks_rcu_start() initialization with WARN_ON_ONCE() rcu: Remove redundant CONFIG_PROVE_RCU #if condition ...
2024-05-13Merge tag 'soc-drivers-6.10' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "As usual, these are updates for drivers that are specific to certain SoCs or firmware running on them. Notable updates include - The new STMicroelectronics STM32 "firewall" bus driver that is used to provide a barrier between different parts of an SoC - Lots of updates for the Qualcomm platform drivers, in particular SCM, which gets a rewrite of its initialization code - Firmware driver updates for Arm FF-A notification interrupts and indirect messaging, SCMI firmware support for pin control and vendor specific interfaces, and TEE firmware interface changes across multiple TEE drivers - A larger cleanup of the Mediatek CMDQ driver and some related bits - Kconfig changes for riscv drivers to prepare for adding Kanaan k230 support - Multiple minor updates for the TI sysc bus driver, memory controllers, hisilicon hccs and more" * tag 'soc-drivers-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (103 commits) firmware: qcom: uefisecapp: Allow on sc8180x Primus and Flex 5G soc: qcom: pmic_glink: Make client-lock non-sleeping dt-bindings: soc: qcom,wcnss: fix bluetooth address example soc/tegra: pmc: Add EQOS wake event for Tegra194 and Tegra234 bus: stm32_firewall: fix off by one in stm32_firewall_get_firewall() bus: etzpc: introduce ETZPC firewall controller driver firmware: arm_ffa: Avoid queuing work when running on the worker queue bus: ti-sysc: Drop legacy idle quirk handling bus: ti-sysc: Drop legacy quirk handling for smartreflex bus: ti-sysc: Drop legacy quirk handling for uarts bus: ti-sysc: Add a description and copyrights bus: ti-sysc: Move check for no-reset-on-init soc: hisilicon: kunpeng_hccs: replace MAILBOX dependency with PCC soc: hisilicon: kunpeng_hccs: Add the check for obtaining complete port attribute firmware: arm_ffa: Fix memory corruption in ffa_msg_send2() bus: rifsc: introduce RIFSC firewall controller driver of: property: fw_devlink: Add support for "access-controller" soc: mediatek: mtk-socinfo: Correct the marketing name for MT8188GV soc: mediatek: mtk-socinfo: Add entry for MT8395AV/ZA Genio 1200 soc: mediatek: mtk-mutex: Add support for MT8188 VPPSYS ...
2024-05-13Merge tag 'asoc-v6.10' of ↵Takashi Iwai2-2/+4
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Updates for v6.10 This is a very big update, in large part due to extensive work the Intel people have been doing in their drivers though it's also been busy elsewhere. There's also a big overhaul of the DAPM documentation from Luca Ceresoli arising from the work he did putting together his recent ELC talk, and he also contributed a new tool for visualising the DAPM state. - A new tool dapm-graph for visualising the DAPM state. - Substantial fixes and clarifications for the DAPM documentation. - Very large updates throughout the Intel audio drivers. - Cleanups of accessors for driver data, module labelling, and for constification. - Modernsation and cleanup work in the Mediatek drivers. - Several fixes and features for the DaVinci I2S driver. - New drivers for several AMD and Intel platforms, Nuvoton NAU8325, Rockchip RK3308 and Texas Instruments PCM6240.
2024-05-12nilfs2: use __field_struct() for a bitwise fieldBart Van Assche1-1/+5
As one can see in include/trace/stages/stage4_event_fields.h, the implementation of __field() uses the is_signed_type() macro. As one can see in commit dcf8e5633e2e ("tracing: Define the is_signed_type() macro once"), there has been an attempt to not make is_signed_type() trigger sparse warnings for bitwise types. Despite that change, sparse complains when passing a bitwise type to is_signed_type(). The reason is that in its definition below, an inequality comparison will be made against bitwise types, which are random collections of bits (the casts to bitwise types themselves are semantically valid and not problematic): #define is_signed_type(type) (((type)(-1)) < (__force type)1) So, as a workaround, follow the example of <trace/events/initcall.h> and suppress the following sparse warnings by changing __field() into __field_struct() that doesn't use is_signed_type(): fs/nilfs2/segment.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/nilfs2.h): ./include/trace/events/nilfs2.h:191:1: warning: cast to restricted blk_opf_t ./include/trace/events/nilfs2.h:191:1: warning: restricted blk_opf_t degrades to integer ./include/trace/events/nilfs2.h:191:1: warning: restricted blk_opf_t degrades to integer [konishi.ryusuke: describe the reason for the warnings per Linus's explanation] Link: https://lkml.kernel.org/r/20240507222041.4876-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240507142454.3344-1-konishi.ryusuke@gmail.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401092241.I4mm9OWl-lkp@intel.com/ Reported-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Closes: https://lore.kernel.org/all/20240430080019.4242-2-konishi.ryusuke@gmail.com/ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-10Merge tag 'loongarch-kvm-6.10' of ↵Paolo Bonzini2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.10 1. Add ParaVirt IPI support. 2. Add software breakpoint support. 3. Add mmio trace events support.
2024-05-09ALSA/ASoC: include: clarify Copyright informationPierre-Louis Bossart2-2/+2
For some reason a number of files included the "All rights reserved" statement. Good old copy-paste made sure this mistake proliferated. Remove the "All rights reserved" in all Intel-copyright to align with internal guidance. Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://lore.kernel.org/r/20240503140359.259762-10-pierre-louis.bossart@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-08net/ipv4: add tracepoint for icmp_sendPeilin He1-0/+67
Introduce a tracepoint for icmp_send, which can help users to get more detail information conveniently when icmp abnormal events happen. 1. Giving an usecase example: ============================= When an application experiences packet loss due to an unreachable UDP destination port, the kernel will send an exception message through the icmp_send function. By adding a trace point for icmp_send, developers or system administrators can obtain detailed information about the UDP packet loss, including the type, code, source address, destination address, source port, and destination port. This facilitates the trouble-shooting of UDP packet loss issues especially for those network-service applications. 2. Operation Instructions: ========================== Switch to the tracing directory. cd /sys/kernel/tracing Filter for destination port unreachable. echo "type==3 && code==3" > events/icmp/icmp_send/filter Enable trace event. echo 1 > events/icmp/icmp_send/enable 3. Result View: ================ udp_client_erro-11370 [002] ...s.12 124.728002: icmp_send: icmp_send: type=3, code=3. From 127.0.0.1:41895 to 127.0.0.1:6666 ulen=23 skbaddr=00000000589b167a Signed-off-by: Peilin He <he.peilin@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Yunkai Zhang <zhang.yunkai@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Liu Chun <liu.chun2@zte.com.cn> Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-07btrfs: add tracepoints for extent map shrinker eventsFilipe Manana1-0/+99
Add some tracepoints for the extent map shrinker to help debug and analyse main events. These have proved useful during development of the shrinker. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: stop referencing btrfs_delayed_tree_ref directlyJosef Bacik1-1/+0
We only ever need to use this to get the level of the tree block ref, so use the btrfs_delayed_ref_owner() helper, which returns the level for the given reference. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: stop referencing btrfs_delayed_data_ref directlyJosef Bacik1-1/+0
Now that most of our elements are inside of btrfs_delayed_ref_node directly and we have helpers for the delayed_data_ref bits, go ahead and remove all direct usage of btrfs_delayed_data_ref and use the helpers where needed. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_nodeJosef Bacik1-4/+4
These two members are shared by both the tree refs and data refs, so move them into btrfs_delayed_ref_node proper. This allows us to greatly simplify the comparison code, as the shared refs always only sort on parent, and the non shared refs always sort first on ref_root, and then only data refs sort on their specific fields. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: simplify delayed ref tracepointsJosef Bacik1-33/+21
Now that all of the delayed ref information is in the delayed ref node, drastically simplify the delayed ref tracepoints by simply passing in the btrfs_delayed_ref_node and populating the tracepoints with the values from the structure itself. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: remove not needed mod_start and mod_len from struct extent_mapFilipe Manana1-2/+1
The mod_start and mod_len fields of struct extent_map were introduced by commit 4e2f84e63dc1 ("Btrfs: improve fsync by filtering extents that we want") in order to avoid too low performance when fsyncing a file that keeps getting extent maps merge, because it resulted in each fsync logging again csum ranges that were already merged before. We don't need this anymore as extent maps in the list of modified extents are never merged with other extent maps and once we log an extent map we remove it from the list of modified extent maps, so it's never logged twice. So remove the mod_start and mod_len fields from struct extent_map and use instead the start and len fields when logging checksums in the fast fsync path. This also makes EXTENT_FLAG_FILLING unused so remove it as well. Running the reproducer from the commit mentioned before, with a larger number of extents and against a null block device, so that IO is fast and we can better see any impact from searching checksums items and logging them, gave the following results from dd: Before this change: 409600000 bytes (410 MB, 391 MiB) copied, 22.948 s, 17.8 MB/s After this change: 409600000 bytes (410 MB, 391 MiB) copied, 22.9997 s, 17.8 MB/s So no changes in throughput. The test was done in a release kernel (non-debug, Debian's default kernel config) and its steps are the following: $ mkfs.btrfs -f /dev/nullb0 $ mount /dev/sdb /mnt $ dd if=/dev/zero of=/mnt/foobar bs=4k count=100000 oflag=sync $ umount /mnt This also reduces the size of struct extent_map from 128 bytes down to 112 bytes, so now we can have 36 extents maps per 4K page instead of 32. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-06firewire: core: remove flag and width from u64 formats of tracepoints eventsTakashi Sakamoto1-3/+3
The pointer to fw_packet structure is passed to ring buffer of tracepoints framework as the value of u64 type. '0x%016llx' is used for the print format of value, while the flag and width are useless in the case. This commit removes them. Link: https://lore.kernel.org/r/20240506082154.396077-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: fix type of timestamp for async_inbound_template tracepoints ↵Takashi Sakamoto1-1/+1
events The type of time stamp should be u16, instead of u8. Link: https://lore.kernel.org/r/20240506082154.396077-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoint event for handling bus resetTakashi Sakamoto1-1/+27
The core function expects hardware drivers to call fw_core_handle_bus_reset() when changing bus topology. The 1394 OHCI driver calls it when handling selfID event as a result of any bus-reset. This commit adds a tracepoints event for it. Link: https://lore.kernel.org/r/20240501073238.72769-6-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoints events for initiating bus resetTakashi Sakamoto1-0/+33
At a commit 673249124304 ("firewire: core: option to log bus reset initiation"), some kernel log messages were added to trace initiation of bus reset. The kernel log messages are really helpful, while nowadays it is not preferable just for debugging purpose. For the purpose, Linux kernel tracepoints is more preferable. This commit adds some alternative tracepoints events. Link: https://lore.kernel.org/r/20240501073238.72769-4-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoints event for asynchronous inbound phy packetTakashi Sakamoto1-0/+30
At the former commit, a pair of tracepoints events is added to trace asynchronous outbound phy packet. This commit adds a tracepoints event to trace inbound phy packet. It includes transaction status as well as the content of phy packet. This is an example for Remote Reply Packet as a response to Remote Access Packet sent by lsfirewirephy command in linux-firewire-utils: async_phy_inbound: \ packet=0xffff955fc02b4e10 generation=1 status=1 timestamp=0x0619 \ first_quadlet=0x001c8208 second_quadlet=0xffe37df7 Link: https://lore.kernel.org/r/20240430001404.734657-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core/cdev: add tracepoints events for asynchronous phy packetTakashi Sakamoto1-0/+48
In IEEE 1394 bus, the type of asynchronous packet without any offset to node address space is called as phy packet. The destination of packet is IEEE 1394 phy itself. This type of packet is used for several purposes, mainly for selfID at the state of bus reset, to force selection of root node, and to adjust gap count. This commit adds tracepoints events for the type of asynchronous outbound packet. Like asynchronous outbound transaction packets, a pair of events are added to trace initiation and completion of transmission. In the case that the phy packet is sent by kernel API, the match between the initiation and completion is not so easy, since the data of 'struct fw_packet' is allocated statically. In the case that it is sent by userspace applications via cdev, the match is easy, since the data is allocated per each. This example is for Remote Access Packet by lsfirewirephy command in linux-firewire-utils: async_phy_outbound_initiate: \ packet=0xffff89fb34e42e78 generation=1 first_quadlet=0x00148200 \ second_quadlet=0xffeb7dff async_phy_outbound_complete: \ packet=0xffff89fb34e42e78 generation=1 status=1 timestamp=0x0619 Link: https://lore.kernel.org/r/20240430001404.734657-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoints events for asynchronous outbound responseTakashi Sakamoto1-0/+24
In a view of core transaction service, the asynchronous outbound response consists of two stages; initiation and completion. This commit adds a pair of events for the asynchronous outbound response. The following example is for asynchronous write quadlet request as IEC 61883-1 FCP response to node 0xffc1. async_response_outbound_initiate: \ transaction=0xffff89fa08cf16c0 generation=4 scode=2 dst_id=0xffc1 \ tlabel=25 tcode=2 src_id=0xffc0 rcode=0 \ header={0xffc16420,0xffc00000,0x0,0x0} data={} async_response_outbound_complete: \ transaction=0xffff89fa08cf16c0 generation=4 scode=2 status=1 \ timestamp=0x0000 Link: https://lore.kernel.org/r/20240429043218.609398-6-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoint event for asynchronous inbound requestTakashi Sakamoto1-0/+21
This commit adds an event for asynchronous inbound request. The following example is for asynchronous block write request as IEC 61883-1 FCP request from node 0xffc1. async_request_inbound: \ transaction=0xffff89fa08cf16c0 generation=4 scode=2 status=2 \ timestamp=0x00b3 dst_id=0xffc0 tlabel=19 tcode=1 src_id=0xffc1 \ offset=0xfffff0000d00 header={0xffc04d10,0xffc1ffff,0xf0000d00,0x80000} \ data={0x19ff08,0xffff0090} Link: https://lore.kernel.org/r/20240429043218.609398-5-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoints event for asynchronous inbound responseTakashi Sakamoto1-0/+50
In the transaction of IEEE 1394, the node to receive the asynchronous request transfers any response packet to the requester except for the unified transaction. This commit adds an event for the inbound packet. Note that the code to decode the packet header is moved, against the note about the sanity check. The following example is for asynchronous lock response with compare_and_swap code. async_response_inbound: \ transaction=0xffff955fc6a07a10 generation=5 scode=2 status=1 \ timestamp=0x0089 dst_id=0xffc1 tlabel=54 tcode=11 src_id=0xffc0 \ rcode=0 header={0xffc1d9b0,0xffc00000,0x0,0x40002} data={0x50800080} Link: https://lore.kernel.org/r/20240429043218.609398-4-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoints events for asynchronous outbound requestTakashi Sakamoto1-1/+102
In a view of core transaction service, the asynchronous outbound request consists of two stages; initiation and completion. This commit adds a pair of event for them. The following example is for asynchronous lock request with compare_swap code to offset 0x'ffff'f000'0904 in node 0xffc0. async_request_outbound_initiate: \ transaction=0xffff955fc6a07a10 generation=5 scode=2 dst_id=0xffc0 \ tlabel=54 tcode=9 src_id=0xffc1 offset=0xfffff0000904 \ header={0xffc0d990,0xffc1ffff,0xf0000904,0x80002} data={0x80,0x940181} async_request_outbound_complete: \ transaction=0xffff955fc6a07a10 generation=5 scode=2 status=2 \ timestamp=0xd887 Link: https://lore.kernel.org/r/20240429043218.609398-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add support for Linux kernel tracepointsTakashi Sakamoto1-0/+15
The Linux Kernel Tracepoints framework is enough useful to trace packet data inbound to and outbound from core. This commit adds firewire subsystem to use the framework. Link: https://lore.kernel.org/r/20240429043218.609398-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06trace/events/page_ref: trace the raw page mapcount valueDavid Hildenbrand1-2/+2
We want to limit the use of page_mapcount() to the places where it is absolutely necessary. We already trace raw page->refcount, raw page->flags and raw page->mapping, and don't involve any folios. Let's also trace the raw mapcount value that does not consider the entire mapcount of large folios, and we don't add "1" to it. When dealing with typed folios, this makes a lot more sense. ... and it's for debugging purposes only either way. Link: https://lkml.kernel.org/r/20240409192301.907377-16-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Chang <richardycc@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+1
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01cifs: Implement netfslib hooksDavid Howells1-0/+1
Provide implementation of the netfslib hooks that will be used by netfslib to ask cifs to set up and perform operations. Of particular note are (*) cifs_clamp_length() - This is used to negotiate the size of the next subrequest in a read request, taking into account the credit available and the rsize. The credits are attached to the subrequest. (*) cifs_req_issue_read() - This is used to issue a subrequest that has been set up and clamped. (*) cifs_prepare_write() - This prepares to fill a subrequest by picking a channel, reopening the file and requesting credits so that we can set the maximum size of the subrequest and also sets the maximum number of segments if we're doing RDMA. (*) cifs_issue_write() - This releases any unneeded credits and issues an asynchronous data write for the contiguous slice of file covered by the subrequest. This should possibly be folded in to all ->async_writev() ops and that called directly. (*) cifs_begin_writeback() - This gets the cached writable handle through which we do writeback (this does not affect writethrough, unbuffered or direct writes). At this point, cifs is not wired up to actually *use* netfslib; that will be done in a subsequent patch. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-05-01netfs: Miscellaneous tidy upsDavid Howells1-5/+1
Do a couple of miscellaneous tidy ups: (1) Add a qualifier into a file banner comment. (2) Put the writeback folio traces back into alphabetical order. (3) Remove some unused folio traces. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: New writeback implementationDavid Howells1-3/+229
The current netfslib writeback implementation creates writeback requests of contiguous folio data and then separately tiles subrequests over the space twice, once for the server and once for the cache. This creates a few issues: (1) Every time there's a discontiguity or a change between writing to only one destination or writing to both, it must create a new request. This makes it harder to do vectored writes. (2) The folios don't have the writeback mark removed until the end of the request - and a request could be hundreds of megabytes. (3) In future, I want to support a larger cache granularity, which will require aggregation of some folios that contain unmodified data (which only need to go to the cache) and some which contain modifications (which need to be uploaded and stored to the cache) - but, currently, these are treated as discontiguous. There's also a move to get everyone to use writeback_iter() to extract writable folios from the pagecache. That said, currently writeback_iter() has some issues that make it less than ideal: (1) there's no way to cancel the iteration, even if you find a "temporary" error that means the current folio and all subsequent folios are going to fail; (2) there's no way to filter the folios being written back - something that will impact Ceph with it's ordered snap system; (3) and if you get a folio you can't immediately deal with (say you need to flush the preceding writes), you are left with a folio hanging in the locked state for the duration, when really we should unlock it and relock it later. In this new implementation, I use writeback_iter() to pump folios, progressively creating two parallel, but separate streams and cleaning up the finished folios as the subrequests complete. Either or both streams can contain gaps, and the subrequests in each stream can be of variable size, don't need to align with each other and don't need to align with the folios. Indeed, subrequests can cross folio boundaries, may cover several folios or a folio may be spanned by multiple folios, e.g.: +---+---+-----+-----+---+----------+ Folios: | | | | | | | +---+---+-----+-----+---+----------+ +------+------+ +----+----+ Upload: | | |.....| | | +------+------+ +----+----+ +------+------+------+------+------+ Cache: | | | | | | +------+------+------+------+------+ The progressive subrequest construction permits the algorithm to be preparing both the next upload to the server and the next write to the cache whilst the previous ones are already in progress. Throttling can be applied to control the rate of production of subrequests - and, in any case, we probably want to write them to the server in ascending order, particularly if the file will be extended. Content crypto can also be prepared at the same time as the subrequests and run asynchronously, with the prepped requests being stalled until the crypto catches up with them. This might also be useful for transport crypto, but that happens at a lower layer, so probably would be harder to pull off. The algorithm is split into three parts: (1) The issuer. This walks through the data, packaging it up, encrypting it and creating subrequests. The part of this that generates subrequests only deals with file positions and spans and so is usable for DIO/unbuffered writes as well as buffered writes. (2) The collector. This asynchronously collects completed subrequests, unlocks folios, frees crypto buffers and performs any retries. This runs in a work queue so that the issuer can return to the caller for writeback (so that the VM can have its kswapd thread back) or async writes. (3) The retryer. This pauses the issuer, waits for all outstanding subrequests to complete and then goes through the failed subrequests to reissue them. This may involve reprepping them (with cifs, the credits must be renegotiated, and a subrequest may need splitting), and doing RMW for content crypto if there's a conflicting change on the server. [!] Note that some of the functions are prefixed with "new_" to avoid clashes with existing functions. These will be renamed in a later patch that cuts over to the new algorithm. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org