summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2024-01-11Merge tag 'net-next-6.8' of ↵Linus Torvalds111-975/+3246
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "The most interesting thing is probably the networking structs reorganization and a significant amount of changes is around self-tests. Core & protocols: - Analyze and reorganize core networking structs (socks, netdev, netns, mibs) to optimize cacheline consumption and set up build time warnings to safeguard against future header changes This improves TCP performances with many concurrent connections up to 40% - Add page-pool netlink-based introspection, exposing the memory usage and recycling stats. This helps indentify bad PP users and possible leaks - Refine TCP/DCCP source port selection to no longer favor even source port at connect() time when IP_LOCAL_PORT_RANGE is set. This lowers the time taken by connect() for hosts having many active connections to the same destination - Refactor the TCP bind conflict code, shrinking related socket structs - Refactor TCP SYN-Cookie handling, as a preparation step to allow arbitrary SYN-Cookie processing via eBPF - Tune optmem_max for 0-copy usage, increasing the default value to 128KB and namespecifying it - Allow coalescing for cloned skbs coming from page pools, improving RX performances with some common configurations - Reduce extension header parsing overhead at GRO time - Add bridge MDB bulk deletion support, allowing user-space to request the deletion of matching entries - Reorder nftables struct members, to keep data accessed by the datapath first - Introduce TC block ports tracking and use. This allows supporting multicast-like behavior at the TC layer - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and classifiers (RSVP and tcindex) - More data-race annotations - Extend the diag interface to dump TCP bound-only sockets - Conditional notification of events for TC qdisc class and actions - Support for WPAN dynamic associations with nearby devices, to form a sub-network using a specific PAN ID - Implement SMCv2.1 virtual ISM device support - Add support for Batman-avd mulicast packet type BPF: - Tons of verifier improvements: - BPF register bounds logic and range support along with a large test suite - log improvements - complete precision tracking support for register spills - track aligned STACK_ZERO cases as imprecise spilled registers. This improves the verifier "instructions processed" metric from single digit to 50-60% for some programs - support for user's global BPF subprogram arguments with few commonly requested annotations for a better developer experience - support tracking of BPF_JNE which helps cases when the compiler transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like - several fixes - Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload - Fix kCFI bugs in BPF all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y - Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques - Support uid/gid options when mounting bpffs - Add a new kfunc which acquires the associated cgroup of a task within a specific cgroup v1 hierarchy where the latter is identified by its id - Extend verifier to allow bpf_refcount_acquire() of a map value field obtained via direct load which is a use-case needed in sched_ext - Add BPF link_info support for uprobe multi link along with bpftool integration for the latter - Support for VLAN tag in XDP hints - Remove deprecated bpfilter kernel leftovers given the project is developed in user-space (https://github.com/facebook/bpfilter) Misc: - Support for parellel TC self-tests execution - Increase MPTCP self-tests coverage - Updated the bridge documentation, including several so-far undocumented features - Convert all the net self-tests to run in unique netns, to avoid random failures due to conflict and allow concurrent runs - Add TCP-AO self-tests - Add kunit tests for both cfg80211 and mac80211 - Autogenerate Netlink families documentation from YAML spec - Add yml-gen support for fixed headers and recursive nests, the tool can now generate user-space code for all genetlink families for which we have specs - A bunch of additional module descriptions fixes - Catch incorrect freeing of pages belonging to a page pool Driver API: - Rust abstractions for network PHY drivers; do not cover yet the full C API, but already allow implementing functional PHY drivers in rust - Introduce queue and NAPI support in the netdev Netlink interface, allowing complete access to the device <> NAPIs <> queues relationship - Introduce notifications filtering for devlink to allow control application scale to thousands of instances - Improve PHY validation, requesting rate matching information for each ethtool link mode supported by both the PHY and host - Add support for ethtool symmetric-xor RSS hash - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD platform - Expose pin fractional frequency offset value over new DPLL generic netlink attribute - Convert older drivers to platform remove callback returning void - Add support for PHY package MMD read/write New hardware / drivers: - Ethernet: - Octeon CN10K devices - Broadcom 5760X P7 - Qualcomm SM8550 SoC - Texas Instrument DP83TG720S PHY - Bluetooth: - IMC Networks Bluetooth radio Removed: - WiFi: - libertas 16-bit PCMCIA support - Atmel at76c50x drivers - HostAP ISA/PCMCIA style 802.11b driver - zd1201 802.11b USB dongles - Orinoco ISA/PCMCIA 802.11b driver - Aviator/Raytheon driver - Planet WL3501 driver - RNDIS USB 802.11b driver Driver updates: - Ethernet high-speed NICs: - Intel (100G, ice, idpf): - allow one by one port representors creation and removal - add temperature and clock information reporting - add get/set for ethtool's header split ringparam - add again FW logging - adds support switchdev hardware packet mirroring - iavf: implement symmetric-xor RSS hash - igc: add support for concurrent physical and free-running timers - i40e: increase the allowable descriptors - nVidia/Mellanox: - Preparation for Socket-Direct multi-dev netdev. That will allow in future releases combining multiple PFs devices attached to different NUMA nodes under the same netdev - Broadcom (bnxt): - TX completion handling improvements - add basic ntuple filter support - reduce MSIX vectors usage for MQPRIO offload - add VXLAN support, USO offload and TX coalesce completion for P7 - Marvell Octeon EP: - xmit-more support - add PF-VF mailbox support and use it for FW notifications for VFs - Wangxun (ngbe/txgbe): - implement ethtool functions to operate pause param, ring param, coalesce channel number and msglevel - Netronome/Corigine (nfp): - add flow-steering support - support UDP segmentation offload - Ethernet NICs embedded, slower, virtual: - Xilinx AXI: remove duplicate DMA code adopting the dma engine driver - stmmac: add support for HW-accelerated VLAN stripping - TI AM654x sw: add mqprio, frame preemption & coalescing - gve: add support for non-4k page sizes. - virtio-net: support dynamic coalescing moderation - nVidia/Mellanox Ethernet datacenter switches: - allow firmware upgrade without a reboot - more flexible support for bridge flooding via the compressed FID flooding mode - Ethernet embedded switches: - Microchip: - fine-tune flow control and speed configurations in KSZ8xxx - KSZ88X3: enable setting rmii reference - Renesas: - add jumbo frames support - Marvell: - 88E6xxx: add "eth-mac" and "rmon" stats support - Ethernet PHYs: - aquantia: add firmware load support - at803x: refactor the driver to simplify adding support for more chip variants - NXP C45 TJA11xx: Add MACsec offload support - Wifi: - MediaTek (mt76): - NVMEM EEPROM improvements - mt7996 Extremely High Throughput (EHT) improvements - mt7996 Wireless Ethernet Dispatcher (WED) support - mt7996 36-bit DMA support - Qualcomm (ath12k): - support for a single MSI vector - WCN7850: support AP mode - Intel (iwlwifi): - new debugfs file fw_dbg_clear - allow concurrent P2P operation on DFS channels - Bluetooth: - QCA2066: support HFP offload - ISO: more broadcast-related improvements - NXP: better recovery in case receiver/transmitter get out of sync" * tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits) lan78xx: remove redundant statement in lan78xx_get_eee lan743x: remove redundant statement in lan743x_ethtool_get_eee bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer() bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel() bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter() tcp: Revert no longer abort SYN_SENT when receiving some ICMP Revert "mlx5 updates 2023-12-20" Revert "net: stmmac: Enable Per DMA Channel interrupt" ipvlan: Remove usage of the deprecated ida_simple_xx() API ipvlan: Fix a typo in a comment net/sched: Remove ipt action tests net: stmmac: Use interrupt mode INTM=1 for per channel irq net: stmmac: Add support for TX/RX channel interrupt net: stmmac: Make MSI interrupt routine generic dt-bindings: net: snps,dwmac: per channel irq net: phy: at803x: make read_status more generic net: phy: at803x: add support for cdt cross short test for qca808x net: phy: at803x: refactor qca808x cable test get status function net: phy: at803x: generalize cdt fault length function net: ethernet: cortina: Drop TSO support ...
2024-01-11Merge tag 'asm-generic-6.8' of ↵Linus Torvalds3-1/+27
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanups from Arnd Bergmann: "A series from Baoquan He cleans up the asm-generic/io.h to remove the ioremap_uc() definition from everything except x86, which still needs it for pre-PAT systems. This series notably contains a patch from Jiaxun Yang that converts MIPS to use asm-generic/io.h like every other architecture does, enabling future cleanups. Some of my own patches fix -Wmissing-prototype warnings in architecture specific code across several architectures. This is now needed as the warning is enabled by default. There are still some remaining warnings in minor platforms, but the series should catch most of the widely used ones make them more consistent with one another. David McKay fixes a bug in __generic_cmpxchg_local() when this is used on 64-bit architectures. This could currently only affect parisc64 and sparc64. Additional cleanups address from Linus Walleij, Uwe Kleine-König, Thomas Huth, and Kefeng Wang help reduce unnecessary inconsistencies between architectures" * tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: Fix 32 bit __generic_cmpxchg_local Hexagon: Make pfn accessors statics inlines ARC: mm: Make virt_to_pfn() a static inline mips: remove extraneous asm-generic/iomap.h include sparc: Use $(kecho) to announce kernel images being ready arm64: vdso32: Define BUILD_VDSO32_64 to correct prototypes csky: fix arch_jump_label_transform_static override arch: add do_page_fault prototypes arch: add missing prepare_ftrace_return() prototypes arch: vdso: consolidate gettime prototypes arch: include linux/cpu.h for trap_init() prototype arch: fix asm-offsets.c building with -Wmissing-prototypes arch: consolidate arch_irq_work_raise prototypes hexagon: Remove CONFIG_HEXAGON_ARCH_VERSION from uapi header asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() mips: io: remove duplicated codes arch/*/io.h: remove ioremap_uc in some architectures mips: add <asm-generic/io.h> including
2024-01-11Merge tag 'modules-6.8-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "Just one cleanup and one documentation improvement change. No functional changes" * tag 'modules-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: kernel/module: improve documentation for try_module_get() module: Remove redundant TASK_UNINTERRUPTIBLE
2024-01-11Merge tag 'sysctl-6.8-rc1' of ↵Linus Torvalds1-7/+0
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. In the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7 we had all arch/ and drivers/ modified to remove the sentinel. For v6.8-rc1 we get a few more updates for fs/ directory only. The kernel/ directory is left but we'll save that for v6.9-rc1 as those patches are still being reviewed. After that we then can expect also the removal of the no longer needed check for procname == NULL. Let us recap the purpose of this work: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files Thomas Weißschuh also sent a few cleanups, for v6.9-rc1 we expect to see further work by Thomas Weißschuh with the constificatin of the struct ctl_table. Due to Joel Granados's work, and to help bring in new blood, I have suggested for him to become a maintainer and he's accepted. So for v6.9-rc1 I look forward to seeing him sent you a pull request for further sysctl changes. This also removes Iurii Zaikin as a maintainer as he has moved on to other projects and has had no time to help at all" * tag 'sysctl-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: remove struct ctl_path sysctl: delete unused define SYSCTL_PERM_EMPTY_DIR coda: Remove the now superfluous sentinel elements from ctl_table array sysctl: Remove the now superfluous sentinel elements from ctl_table array fs: Remove the now superfluous sentinel elements from ctl_table array cachefiles: Remove the now superfluous sentinel element from ctl_table array sysclt: Clarify the results of selftest run sysctl: Add a selftest for handling empty dirs sysctl: Fix out of bounds access for empty sysctl registers MAINTAINERS: Add Joel Granados as co-maintainer for proc sysctl MAINTAINERS: remove Iurii Zaikin from proc sysctl
2024-01-11Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefsLinus Torvalds64-767/+972
Pull header cleanups from Kent Overstreet: "The goal is to get sched.h down to a type only header, so the main thing happening in this patchset is splitting out various _types.h headers and dependency fixups, as well as moving some things out of sched.h to better locations. This is prep work for the memory allocation profiling patchset which adds new sched.h interdepencencies" * tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits) Kill sched.h dependency on rcupdate.h kill unnecessary thread_info.h include Kill unnecessary kernel.h include preempt.h: Kill dependency on list.h rseq: Split out rseq.h from sched.h LoongArch: signal.c: add header file to fix build error restart_block: Trim includes lockdep: move held_lock to lockdep_types.h sem: Split out sem_types.h uidgid: Split out uidgid_types.h seccomp: Split out seccomp_types.h refcount: Split out refcount_types.h uapi/linux/resource.h: fix include x86/signal: kill dependency on time.h syscall_user_dispatch.h: split out *_types.h mm_types_task.h: Trim dependencies Split out irqflags_types.h ipc: Kill bogus dependency on spinlock.h shm: Slim down dependencies workqueue: Split out workqueue_types.h ...
2024-01-11Merge tag 'nfs-for-6.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds6-15/+17
Pull nfs client updates from Anna Schumaker: "New Features: - Always ask for type with READDIR - Remove nfs_writepage() Bugfixes: - Fix a suspicious RCU usage warning - Fix a blocklayoutdriver reference leak - Fix the block driver's calculation of layoutget size - Fix handling NFS4ERR_RETURNCONFLICT - Fix _xprt_switch_find_current_entry() - Fix v4.1 backchannel request timeouts - Don't add zero-length pnfs block devices - Use the parent cred in nfs_access_login_time() Cleanups: - A few improvements when dealing with referring calls from the server - Clean up various unused variables, struct fields, and function calls - Various tracepoint improvements" * tag 'nfs-for-6.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (21 commits) NFSv4.1: Use the nfs_client's rpc timeouts for backchannel SUNRPC: Fixup v4.1 backchannel request timeouts rpc_pipefs: Replace one label in bl_resolve_deviceid() nfs: Remove writepage NFS: drop unused nfs_direct_req bytes_left pNFS: Fix the pnfs block driver's calculation of layoutget size nfs: print fileid in lookup tracepoints nfs: rename the nfs_async_rename_done tracepoint nfs: add new tracepoint at nfs4 revalidate entry point SUNRPC: fix _xprt_switch_find_current_entry logic NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICT NFSv4.1: if referring calls are complete, trust the stateid argument NFSv4: Track the number of referring calls in struct cb_process_state NFS: Use parent's objective cred in nfs_access_login_time() NFSv4: Always ask for type with READDIR pnfs/blocklayout: Don't add zero-length pnfs_block_dev blocklayoutdriver: Fix reference leak of pnfs_device_node SUNRPC: Fix a suspicious RCU usage warning SUNRPC: Create a helper function for accessing the rpc_clnt's xprt_switch SUNRPC: Remove unused function rpc_clnt_xprt_switch_put() ...
2024-01-11Merge tag 'ext4_for_linus-6.8-rc1' of ↵Linus Torvalds1-11/+26
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Various ext4 bug fixes and cleanups. The fixes are mostly in the fstrim and mballoc code paths. Also enable dioread_nolock in the case where the block size is less than the page size (dioread_nolock has been default in the bs == ps case for quite some time)" * tag 'ext4_for_linus-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix inconsistent between segment fstrim and full fstrim ext4: fallback to complex scan if aligned scan doesn't work ext4: convert ext4_da_do_write_end() to take a folio ext4: allow for the last group to be marked as trimmed ext4: move ext4_check_bdev_write_error() into nojournal mode jbd2: abort journal when detecting metadata writeback error of fs dev jbd2: remove unused 'JBD2_CHECKPOINT_IO_ERROR' and 'j_atomic_flags' jbd2: replace journal state flag by checking errseq jbd2: add errseq to detect client fs's bdev writeback error ext4: improving calculation of 'fe_{len|start}' in mb_find_extent() ext4: clarify handling of unwritten bh in __ext4_block_zero_page_range() ext4: treat end of range as exclusive in ext4_zero_range() ext4: enable dioread_nolock as default for bs < ps case ext4: delete redundant calculations in ext4_mb_get_buddy_page_lock() ext4: reduce unnecessary memory allocation in alloc_flex_gd() ext4: avoid online resizing failures due to oversized flex bg ext4: remove unnecessary check from alloc_flex_gd() ext4: unify the type of flexbg_size to unsigned int
2024-01-10Merge tag 'v6.8-p1' of ↵Linus Torvalds4-36/+134
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add incremental lskcipher/skcipher processing Algorithms: - Remove SHA1 from drbg - Remove CFB and OFB Drivers: - Add comp high perf mode configuration in hisilicon/zip - Add support for 420xx devices in qat - Add IAA Compression Accelerator driver" * tag 'v6.8-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (172 commits) crypto: iaa - Account for cpu-less numa nodes crypto: scomp - fix req->dst buffer overflow crypto: sahara - add support for crypto_engine crypto: sahara - remove error message for bad aes request size crypto: sahara - remove unnecessary NULL assignments crypto: sahara - remove 'active' flag from sahara_aes_reqctx struct crypto: sahara - use dev_err_probe() crypto: sahara - use devm_clk_get_enabled() crypto: sahara - use BIT() macro crypto: sahara - clean up macro indentation crypto: sahara - do not resize req->src when doing hash operations crypto: sahara - fix processing hash requests with req->nbytes < sg->length crypto: sahara - improve error handling in sahara_sha_process() crypto: sahara - fix wait_for_completion_timeout() error handling crypto: sahara - fix ahash reqsize crypto: sahara - handle zero-length aes requests crypto: skcipher - remove excess kerneldoc members crypto: shash - remove excess kerneldoc members crypto: qat - generate dynamically arbiter mappings crypto: qat - add support for ring pair level telemetry ...
2024-01-10Merge tag 'hardening-v6.8-rc1' of ↵Linus Torvalds3-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Introduce the param_unknown_fn type and other clean ups (Andy Shevchenko) - Various __counted_by annotations (Christophe JAILLET, Gustavo A. R. Silva, Kees Cook) - Add KFENCE test to LKDTM (Stephen Boyd) - Various strncpy() refactorings (Justin Stitt) - Fix qnx4 to avoid writing into the smaller of two overlapping buffers - Various strlcpy() refactorings * tag 'hardening-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: qnx4: Use get_directory_fname() in qnx4_match() qnx4: Extract dir entry filename processing into helper atags_proc: Add __counted_by for struct buffer and use struct_size() tracing/uprobe: Replace strlcpy() with strscpy() params: Fix multi-line comment style params: Sort headers params: Use size_add() for kmalloc() params: Do not go over the limit when getting the string length params: Introduce the param_unknown_fn type lkdtm: Add kfence read after free crash type nvme-fc: replace deprecated strncpy with strscpy nvdimm/btt: replace deprecated strncpy with strscpy nvme-fabrics: replace deprecated strncpy with strscpy drm/modes: replace deprecated strncpy with strscpy_pad afs: Add __counted_by for struct afs_acl and use struct_size() VMCI: Annotate struct vmci_handle_arr with __counted_by i40e: Annotate struct i40e_qvlist_info with __counted_by HID: uhid: replace deprecated strncpy with strscpy samples: Replace strlcpy() with strscpy() SUNRPC: Replace strlcpy() with strscpy()
2024-01-10Merge tag 'nfsd-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds6-166/+204
Pull nfsd updates from Chuck Lever: "The bulk of the patches for this release are clean-ups and minor bug fixes. There is one significant revert to mention: support for RDMA Read operations in the server's RPC-over-RDMA transport implementation has been fixed so it waits for Read completion in a way that avoids tying up an nfsd thread. This prevents a possible DoS vector if an RPC-over-RDMA client should become unresponsive during RDMA Read operations. As always I am grateful to NFSD contributors, reviewers, and testers" * tag 'nfsd-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits) nfsd: rename nfsd_last_thread() to nfsd_destroy_serv() SUNRPC: discard sv_refcnt, and svc_get/svc_put svc: don't hold reference for poolstats, only mutex. SUNRPC: remove printk when back channel request not found svcrdma: Implement multi-stage Read completion again svcrdma: Copy construction of svc_rqst::rq_arg to rdma_read_complete() svcrdma: Add back svcxprt_rdma::sc_read_complete_q svcrdma: Add back svc_rdma_recv_ctxt::rc_pages svcrdma: Clean up comment in svc_rdma_accept() svcrdma: Remove queue-shortening warnings svcrdma: Remove pointer addresses shown in dprintk() svcrdma: Optimize svc_rdma_cc_init() svcrdma: De-duplicate completion ID initialization helpers svcrdma: Move the svc_rdma_cc_init() call svcrdma: Remove struct svc_rdma_read_info svcrdma: Update the synopsis of svc_rdma_read_special() svcrdma: Update the synopsis of svc_rdma_read_call_chunk() svcrdma: Update synopsis of svc_rdma_read_multiple_chunks() svcrdma: Update synopsis of svc_rdma_copy_inline_range() svcrdma: Update the synopsis of svc_rdma_read_data_item() ...
2024-01-10Merge tag 'afs-fix-rotation-20240105' of ↵Linus Torvalds3-342/+455
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull afs updates from David Howells: "The majority of the patches are aimed at fixing and improving the AFS filesystem's rotation over server IP addresses, but there are also some fixes from Oleg Nesterov for the use of read_seqbegin_or_lock(). - Fix fileserver probe handling so that the next round of probes doesn't break ongoing server/address rotation by clearing all the probe result tracking. This could occasionally cause the rotation algorithm to drop straight through, give a 'successful' result without actually emitting any RPC calls, leaving the reply buffer in an undefined state. Instead, detach the probe results into a separate struct and allocate a new one each time we start probing and update the pointer to it. Probes are also sent in order of address preference to try and improve the chance that the preferred one will complete first. - Fix server rotation so that it uses configurable address preferences across on the probes that have completed so far than ranking them by RTT as the latter doesn't necessarily give the best route. The preference list can be altered by writing into /proc/net/afs/addr_prefs. - Fix the handling of Read-Only (and Backup) volume callbacks as there is one per volume, not one per file, so if someone performs a command that, say, offlines the volume but doesn't change it, when it comes back online we don't spam the server with a status fetch for every vnode we're using. Instead, check the Creation timestamp in the VolSync record when prompted by a callback break. - Handle volume regression (ie. a RW volume being restored from a backup) by scrubbing all cache data for that volume. This is detected from the VolSync creation timestamp. - Adjust abort handling and abort -> error mapping to match better with what other AFS clients do. - Fix offline and busy volume state handling as they only apply to individual server instances and not entire volumes and the rotation algorithm should go and look at other servers if available. Also make it sleep briefly before each retry if all the volume instances are unavailable" * tag 'afs-fix-rotation-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (40 commits) afs: trace: Log afs_make_call(), including server address afs: Fix offline and busy message emission afs: Fix fileserver rotation afs: Overhaul invalidation handling to better support RO volumes afs: Parse the VolSync record in the reply of a number of RPC ops afs: Don't leave DONTUSE/NEWREPSITE servers out of server list afs: Fix comment in afs_do_lookup() afs: Apply server breaks to mmap'd files in the call processor afs: Move the vnode/volume validity checking code into its own file afs: Defer volume record destruction to a workqueue afs: Make it possible to find the volumes that are using a server afs: Combine the endpoint state bools into a bitmask afs: Keep a record of the current fileserver endpoint state afs: Dispatch vlserver probes in priority order afs: Dispatch fileserver probes in priority order afs: Mark address lists with configured priorities afs: Provide a way to configure address priorities afs: Remove the unimplemented afs_cmp_addr_list() afs: Add some more info to /proc/net/afs/servers rxrpc: Create a procfile to display outstanding client conn bundles ...
2024-01-10Merge tag 'for-6.8-tag' of ↵Linus Torvalds1-48/+30
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "There are no exciting changes for users, it's been mostly API conversions and some fixes or refactoring. The mount API conversion is a base for future improvements that would come with VFS. Metadata processing has been converted to folios, not yet enabling the large folios but it's one patch away once everything gets tested enough. Core changes: - convert extent buffers to folios: - direct API conversion where possible - performance can drop by a few percent on metadata heavy workloads, the folio sizes are not constant and the calculations add up in the item helpers - both regular and subpage modes - data cannot be converted yet, we need to port that to iomap and there are some other generic changes required - convert mount to the new API, should not be user visible: - options deprecated long time ago have been removed: inode_cache, recovery - the new logic that splits mount to two phases slightly changes timing of device scanning for multi-device filesystems - LSM options will now work (like for selinux) - convert delayed nodes radix tree to xarray, preserving the preload-like logic that still allows to allocate with GFP_NOFS - more validation of sysfs value of scrub_speed_max - refactor chunk map structure, reduce size and improve performance - extent map refactoring, smaller data structures, improved performance - reduce size of struct extent_io_tree, embedded in several structures - temporary pages used for compression are cached and attached to a shrinker, this may slightly improve performance - in zoned mode, remove redirty extent buffer tracking, zeros are written in case an out-of-order is detected and proper data are written to the actual write pointer - cleanups, refactoring, error message improvements, updated tests - verify and update branch name or tag - remove unwanted text" * tag 'for-6.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (89 commits) btrfs: pass btrfs_io_geometry into btrfs_max_io_len btrfs: pass struct btrfs_io_geometry to set_io_stripe btrfs: open code set_io_stripe for RAID56 btrfs: change block mapping to switch/case in btrfs_map_block btrfs: factor out block mapping for single profiles btrfs: factor out block mapping for RAID5/6 btrfs: reduce scope of data_stripes in btrfs_map_block btrfs: factor out block mapping for RAID10 btrfs: factor out block mapping for DUP profiles btrfs: factor out RAID1 block mapping btrfs: factor out block-mapping for RAID0 btrfs: re-introduce struct btrfs_io_geometry btrfs: factor out helper for single device IO check btrfs: migrate btrfs_repair_io_failure() to folio interfaces btrfs: migrate eb_bitmap_offset() to folio interfaces btrfs: migrate various end io functions to folios btrfs: migrate subpage code to folio interfaces btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios btrfs: don't double put our subpage reference in alloc_extent_buffer btrfs: cleanup metadata page pointer usage ...
2024-01-10Merge tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-0/+1
Pull xfs updates from Chandan Babu: "New features/functionality: - Online repair: - Reserve disk space for online repairs - Fix misinteraction between the AIL and btree bulkloader because of which the bulk load fails to queue a buffer for writeback if it happens to be on the AIL list - Prevent transaction reservation overflows when reaping blocks during online repair - Whenever possible, bulkloader now copies multiple records into a block - Support repairing of 1. Per-AG free space, inode and refcount btrees 2. Ondisk inodes 3. File data and attribute fork mappings - Verify the contents of 1. Inode and data fork of realtime bitmap file 2. Quota files - Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks about a pmem device being removed Bug fixes: - Fix memory leak of recovered attri intent items - Fix UAF during log intent recovery - Fix realtime geometry integer overflows - Prevent scrub from live locking in xchk_iget - Prevent fs shutdown when removing files during low free disk space - Prevent transaction reservation overflow when extending an RT device - Prevent incorrect warning from being printed when extending a filesystem - Fix an off-by-one error in xreap_agextent_binval - Serialize access to perag radix tree during deletion operation - Fix perag memory leak during growfs - Allow allocation of minlen realtime extent when the maximum sized realtime free extent is minlen in size Cleanups: - Remove duplicate boilerplate code spread across functionality associated with different log items - Cleanup resblks interfaces - Pass defer ops pointer to defer helpers instead of an enum - Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings - Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size of structures and structure member offsets. This is done in order to be able to share the code with userspace - Move XFS documentation under a new directory specific to XFS - Do not invoke deferred ops' ->create_done callback if the deferred operation does not have an intent item associated with it - Remove duplicate inclusion of header files from scrub/health.c - Refactor Realtime code - Cleanup attr code" * tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (123 commits) xfs: use the op name in trace_xlog_intent_recovery_failed xfs: fix a use after free in xfs_defer_finish_recovery xfs: turn the XFS_DA_OP_REPLACE checks in xfs_attr_shortform_addname into asserts xfs: remove xfs_attr_sf_hdr_t xfs: remove struct xfs_attr_shortform xfs: use xfs_attr_sf_findname in xfs_attr_shortform_getvalue xfs: remove xfs_attr_shortform_lookup xfs: simplify xfs_attr_sf_findname xfs: move the xfs_attr_sf_lookup tracepoint xfs: return if_data from xfs_idata_realloc xfs: make if_data a void pointer xfs: fold xfs_rtallocate_extent into xfs_bmap_rtalloc xfs: simplify and optimize the RT allocation fallback cascade xfs: reorder the minlen and prod calculations in xfs_bmap_rtalloc xfs: remove XFS_RTMIN/XFS_RTMAX xfs: remove rt-wrappers from xfs_format.h xfs: factor out a xfs_rtalloc_sumlevel helper xfs: tidy up xfs_rtallocate_extent_exact xfs: merge the calls to xfs_rtallocate_range in xfs_rtallocate_block xfs: reflow the tail end of xfs_rtallocate_extent_block ...
2024-01-10Merge tag 'fsnotify_for_v6.8-rc1' of ↵Linus Torvalds1-8/+6
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fanotify changes allowing use of fanotify directory events even for filesystems such as FUSE which don't report proper fsid" * tag 'fsnotify_for_v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: allow "weak" fsid when watching a single filesystem fanotify: store fsid in mark instead of in connector
2024-01-10Merge tag 'fs_for_v6.8-rc1' of ↵Linus Torvalds1-10/+5
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull small quota cleanup from Jan Kara. * tag 'fs_for_v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: convert dquot_claim_space_nodirty() to return void
2024-01-10Merge tag 'linux_kselftest-kunit-6.8-rc1' of ↵Linus Torvalds6-15/+134
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan: - a new feature that adds APIs for managing devices introducing a set of helper functions which allow devices (internally a struct kunit_device) to be created and managed by KUnit. These devices will be automatically unregistered on test exit. These helpers can either use a user-provided struct device_driver, or have one automatically created and managed by KUnit. In both cases, the device lives on a new kunit_bus. - changes to switch drm/tests to use kunit devices - several fixes and enhancements to attribute feature - changes to reorganize deferred action function introducing KUNIT_DEFINE_ACTION_WRAPPER - new feature adds ability to run tests after boot using debugfs - fixes and enhancements to string-stream-test: - parse ERR_PTR in string_stream_destroy() - unchecked dereference in bug fix in debugfs_print_results() - handling errors from alloc_string_stream() - NULL-dereference bug fix in kunit_init_suite() * tag 'linux_kselftest-kunit-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (27 commits) kunit: Fix some comments which were mistakenly kerneldoc kunit: Protect string comparisons against NULL kunit: Add example of kunit_activate_static_stub() with pointer-to-function kunit: Allow passing function pointer to kunit_activate_static_stub() kunit: Fix NULL-dereference in kunit_init_suite() if suite->log is NULL kunit: Reset test->priv after each param iteration kunit: Add example for using test->priv drm/tests: Switch to kunit devices ASoC: topology: Replace fake root_device with kunit_device in tests overflow: Replace fake root_device with kunit_device fortify: test: Use kunit_device kunit: Add APIs for managing devices Documentation: Add debugfs docs with run after boot kunit: add ability to run tests after boot using debugfs kunit: add is_init test attribute kunit: add example suite to test init suites kunit: add KUNIT_INIT_TABLE to init linker section kunit: move KUNIT_TABLE out of INIT_DATA kunit: tool: add test for parsing attributes kunit: tool: fix parsing of test attributes ...
2024-01-10Merge tag 'efi-next-for-v6.8' of ↵Linus Torvalds1-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Fix a syzbot reported issue in efivarfs where concurrent accesses to the file system resulted in list corruption - Add support for accessing EFI variables via the TEE subsystem (and a trusted application in the secure world) instead of via EFI runtime firmware running in the OS's execution context - Avoid linker tricks to discover the image base on LoongArch * tag 'efi-next-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: memmap: fix kernel-doc warnings efi/loongarch: Directly position the loaded image file efivarfs: automatically update super block flag efi: Add tee-based EFI variable driver efi: Add EFI_ACCESS_DENIED status code efi: expose efivar generic ops register function efivarfs: Move efivarfs list into superblock s_fs_info efivarfs: Free s_fs_info on unmount efivarfs: Move efivar availability check into FS context init efivarfs: force RO when remounting if SetVariable is not supported
2024-01-10Merge tag 'platform-drivers-x86-v6.8-1' of ↵Linus Torvalds4-14/+117
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: - Intel PMC / PMT / TPMI / uncore-freq / vsec improvements and new platform support - AMD PMC / PMF improvements and new platform support - AMD ACPI based Wifi band RFI mitigation feature (WBRF) - WMI bus driver cleanups and improvements (Armin Wolf) - acer-wmi Predator PHN16-71 support - New Silicom network appliance EC LEDs / GPIOs driver * tag 'platform-drivers-x86-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (96 commits) platform/x86/amd/pmc: Modify SMU message port for latest AMD platform platform/x86/amd/pmc: Add 1Ah family series to STB support list platform/x86/amd/pmc: Add idlemask support for 1Ah family platform/x86/amd/pmc: call amd_pmc_get_ip_info() during driver probe platform/x86/amd/pmc: Add VPE information for AMDI000A platform platform/x86/amd/pmc: Send OS_HINT command for AMDI000A platform platform/x86/amd/pmf: Return a status code only as a constant in two functions platform/x86/amd/pmf: Return directly after a failed apmf_if_call() in apmf_sbios_heartbeat_notify() platform/x86: wmi: linux/wmi.h: fix Excess kernel-doc description warning platform/x86/intel/pmc: Add missing extern platform/x86/intel/pmc/lnl: Add GBE LTR ignore during suspend platform/x86/intel/pmc/arl: Add GBE LTR ignore during suspend platform/x86: intel-uncore-freq: Add additional client processors platform/x86: Remove "X86 PLATFORM DRIVERS - ARCH" from MAINTAINERS platform/x86: hp-bioscfg: Removed needless asm-generic platform/x86/intel/pmc: Add Lunar Lake M support to intel_pmc_core driver platform/x86/intel/pmc: Add Arrow Lake S support to intel_pmc_core driver platform/x86/intel/pmc: Add ssram_init flag in PMC discovery in Meteor Lake platform/x86/intel/pmc: Move common code to core.c platform/x86/intel/pmc: Add PSON residency counter for Alder Lake ...
2024-01-10Merge tag 'pm-6.8-rc1' of ↵Linus Torvalds2-17/+13
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add support for new processors (Sierra Forest, Grand Ridge and Meteor Lake) to the intel_idle driver, make intel_pstate run on Emerald Rapids without HWP support and adjust it to utilize EPP values supplied by the platform firmware, fix issues, clean up code and improve documentation. The most significant fix addresses deadlocks in the core system-wide resume code that occur if async_schedule_dev() attempts to run its argument function synchronously (for example, due to a memory allocation failure). It rearranges the code in question which may increase the system resume time in some cases, but this basically is a removal of a premature optimization. That optimization will be added back later, but properly this time. Specifics: - Add support for the Sierra Forest, Grand Ridge and Meteorlake SoCs to the intel_idle cpuidle driver (Artem Bityutskiy, Zhang Rui) - Do not enable interrupts when entering idle in the haltpoll cpuidle driver (Borislav Petkov) - Add Emerald Rapids support in no-HWP mode to the intel_pstate cpufreq driver (Zhenguo Yao) - Use EPP values programmed by the platform firmware as balanced performance ones by default in intel_pstate (Srinivas Pandruvada) - Add a missing function return value check to the SCMI cpufreq driver to avoid unexpected behavior (Alexandra Diupina) - Fix parameter type warning in the armada-8k cpufreq driver (Gregory CLEMENT) - Rework trans_stat_show() in the devfreq core code to avoid buffer overflows (Christian Marangi) - Synchronize devfreq_monitor_[start/stop] so as to prevent a timer list corruption from occurring when devfreq governors are switched frequently (Mukesh Ojha) - Fix possible deadlocks in the core system-wide PM code that occur if device-handling functions cannot be executed asynchronously during resume from system-wide suspend (Rafael J. Wysocki) - Clean up unnecessary local variable initializations in multiple places in the hibernation code (Wang chaodong, Li zeming) - Adjust core hibernation code to avoid missing wakeup events that occur after saving an image to persistent storage (Chris Feng) - Update hibernation code to enforce correct ordering during image compression and decompression (Hongchen Zhang) - Use kmap_local_page() instead of kmap_atomic() in copy_data_page() during hibernation and restore (Chen Haonan) - Adjust documentation and code comments to reflect recent tasks freezer changes (Kevin Hao) - Repair excess function parameter description warning in the hibernation image-saving code (Randy Dunlap) - Fix _set_required_opps when opp is NULL (Bryan O'Donoghue) - Use device_get_match_data() in the OPP code for TI (Rob Herring) - Clean up OPP level and other parts and call dev_pm_opp_set_opp() recursively for required OPPs (Viresh Kumar)" * tag 'pm-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits) OPP: Rename 'rate_clk_single' OPP: Pass rounded rate to _set_opp() OPP: Relocate dev_pm_opp_sync_regulators() PM: sleep: Fix possible deadlocks in core system-wide PM code OPP: Move dev_pm_opp_icc_bw to internal opp.h async: Introduce async_schedule_dev_nocall() async: Split async_schedule_node_domain() cpuidle: haltpoll: Do not enable interrupts when entering idle OPP: Fix _set_required_opps when opp is NULL OPP: The level field is always of unsigned int type PM: hibernate: Repair excess function parameter description warning PM: sleep: Remove obsolete comment from unlock_system_sleep() cpufreq: intel_pstate: Add Emerald Rapids support in no-HWP mode Documentation: PM: Adjust freezing-of-tasks.rst to the freezer changes PM: hibernate: Use kmap_local_page() in copy_data_page() intel_idle: add Sierra Forest SoC support intel_idle: add Grand Ridge SoC support PM / devfreq: Synchronize devfreq_monitor_[start/stop] cpufreq: armada-8k: Fix parameter type warning PM: hibernate: Enforce ordering during image compression/decompression ...
2024-01-10Merge tag 'thermal-6.8-rc1' of ↵Linus Torvalds2-5/+26
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "These add support for the D1/T113s THS controller to the sun8i driver and a DT-based mechanism for platforms to indicate a preference to reboot (instead of shutting down) on crossing a critical trip point, fix issues, make other improvements (in the IPA governor, the Intel HFI driver, the exynos driver and the thermal netlink interface among other places) and clean up code. One long-standing issue addressed here is that trip point crossing notifications sent to user space might be unreliable due to the incorrect handling of trip point hysteresis in the thermal core: multiple notifications might be sent for the same event or there might be events without any notification at all. Specifics: - Add dynamic thresholds for trip point crossing detection to prevent trip point crossing notifications from being sent at incorrect times or not at all in some cases (Rafael J. Wysocki) - Fix synchronization issues related to the resume of thermal zones during a system-wide resume and allow thermal zones to be resumed concurrently (Rafael J. Wysocki) - Modify the thermal zone unregistration to wait for the given zone to go away completely before returning to the caller and rework the sysfs interface for trip points on top of that (Rafael J. Wysocki) - Fix a possible NULL pointer dereference in thermal zone registration error path (Rafael J. Wysocki) - Clean up the IPA thermal governor and modify it (with the help of a new governor callback) to avoid allocating and freeing memory every time its throttling callback is invoked (Lukasz Luba) - Make the IPA thermal governor handle thermal instance weight changes via sysfs correctly (Lukasz Luba) - Update the thermal netlink code to avoid sending messages if there are no recipients (Stanislaw Gruszka) - Convert Mediatek Thermal to the json-schema (Rafał Miłecki) - Fix thermal DT bindings issue on Loongson (Binbin Zhou) - Fix returning NULL instead of -ENODEV during thermal probe on Loogsoon (Binbin Zhou) - Add thermal DT binding for tsens on the SM8650 platform (Neil Armstrong) - Add reboot on the critical trip point crossing option feature (Fabio Estevam) - Use DEFINE_SIMPLE_DEV_PM_OPS do define PM functions for thermal suspend/resume on AmLogic (Uwe Kleine-König) - Add D1/T113s THS controller support to the Sun8i thermal control driver (Maxim Kiselev) - Fix example in the thermal DT binding for QCom SPMI (Johan Hovold) - Fix compilation warning in the tmon utility (Florian Eckert) - Add support for interrupt-based thermal configuration on Exynos along with a set of related cleanups (Mateusz Majewski) - Make the Intel HFI thermal driver enable an HFI instance (eg. processor package) from its first online CPU and disable it when the last CPU in it goes offline (Ricardo Neri) - Fix a kernel-doc warning and a spello in the cpuidle_cooling thermal driver (Randy Dunlap) - Move the .get_temp() thermal zone callback presence check to the thermal zone registration code (Daniel Lezcano) - Use the for_each_trip() macro for trip points table walks in a few places in the thermal core (Rafael J. Wysocki) - Make all trip point updates (via sysfs as well as from the platform firmware) trigger trip change notifications (Rafael J. Wysocki) - Drop redundant code from the thermal core and make one function in it take a const pointer argument (Rafael J. Wysocki)" * tag 'thermal-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits) thermal: trip: Constify thermal zone argument of thermal_zone_trip_id() thermal: intel: hfi: Disable an HFI instance when all its CPUs go offline thermal: intel: hfi: Enable an HFI instance from its first online CPU thermal: intel: hfi: Refactor enabling code into helper functions thermal/drivers/exynos: Use set_trips ops thermal/drivers/exynos: Use BIT wherever possible thermal/drivers/exynos: Split initialization of TMU and the thermal zone thermal/drivers/exynos: Stop using the threshold mechanism on Exynos 4210 thermal/drivers/exynos: Simplify regulator (de)initialization thermal/drivers/exynos: Handle devm_regulator_get_optional return value correctly thermal/drivers/exynos: Wwitch from workqueue-driven interrupt handling to threaded interrupts thermal/drivers/exynos: Drop id field thermal/drivers/exynos: Remove an unnecessary field description tools/thermal/tmon: Fix compilation warning for wrong format dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Clean up examples dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Fix example node names thermal/drivers/sun8i: Add D1/T113s THS controller support dt-bindings: thermal: sun8i: Add binding for D1/T113s THS controller thermal: amlogic: Use DEFINE_SIMPLE_DEV_PM_OPS for PM functions thermal: amlogic: Make amlogic_thermal_disable() return void ...
2024-01-10Merge tag 'acpi-6.8-rc1' of ↵Linus Torvalds5-26/+188
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "From the new features standpoint, the most significant change here is the addition of CSI-2 and MIPI DisCo for Imaging support to the ACPI device enumeration code that will allow MIPI cameras to be enumerated through the platform firmware on systems using ACPI. Also significant is the switch-over to threaded interrupt handlers for the ACPI SCI and the dedicated EC interrupt (on systems where the former is not used) which essentially allows all ACPI code to run with local interrupts enabled. That should improve responsiveness significantly on systems where multiple GPEs are enabled and the handling of one SCI involves many I/O address space accesses which previously had to be carried out in one go with disabled interrupts on the local CPU. Apart from the above, the ACPI thermal zone driver will use the Thermal fast Sampling Period (_TFP) object if available, which should allow temperature changes to be followed more accurately on some systems, the ACPI Notify () handlers can run on all CPUs (not just on CPU0), which should generally speed up the processing of events signaled through the ACPI SCI, and the ACPI power button driver will trigger wakeup key events via the input subsystem (on systems where it is a system wakeup device) In addition to that, there are the usual bunch of fixes and cleanups. Specifics: - Add CSI-2 and DisCo for Imaging support to the ACPI device enumeration code (Sakari Ailus, Rafael J. Wysocki) - Adjust the cpufreq thermal reduction algorithm in the ACPI processor driver for Tegra241 (Srikar Srimath Tirumala, Arnd Bergmann) - Make acpi_proc_quirk_mwait_check() x86-specific (Rafael J. Wysocki) - Switch over ACPI to using a threaded interrupt handler for the SCI (Rafael J. Wysocki) - Allow ACPI Notify () handlers to run on all CPUs and clean up the ACPI interface for deferred events processing (Rafael J. Wysocki) - Switch over the ACPI EC driver to using a threaded handler for the dedicated IRQ on systems without the EC GPE (Rafael J. Wysocki) - Adjust code using ACPICA spinlocks and the ACPI EC driver spinlock to keep local interrupts on (Rafael J. Wysocki) - Adjust the USB4 _OSC handshake to correctly handle cases in which certain types of OS control are denied by the platform (Mika Westerberg) - Correct and clean up the generic function for parsing ACPI data-only tables with array structure (Yuntao Wang) - Modify acpi_dev_uid_match() to support different types of its second argument and adjust its users accordingly (Raag Jadav) - Clean up code related to acpi_evaluate_reference() and ACPI device lists (Rafael J. Wysocki) - Use generic ACPI helpers for evaluating trip point temperature objects in the ACPI thermal zone driver (Rafael J. Wysockii, Arnd Bergmann) - Add Thermal fast Sampling Period (_TFP) support to the ACPI thermal zone driver (Jeff Brasen) - Modify the ACPI LPIT table handling code to avoid u32 multiplication overflows in state residency computations (Nikita Kiryushin) - Drop an unused helper function from the ACPI backlight (video) driver and add a clarifying comment to it (Hans de Goede) - Update the ACPI backlight driver to avoid using uninitialized memory in some cases (Nikita Kiryushin) - Add ACPI backlight quirk for the Colorful X15 AT 23 laptop (Yuluo Qiu) - Add support for vendor-defined error types to the ACPI APEI error injection code (Avadhut Naik) - Adjust APEI to properly set MF_ACTION_REQUIRED on synchronous memory failure events, so they are handled differently from the asynchronous ones (Shuai Xue) - Fix NULL pointer dereference check in the ACPI extlog driver (Prarit Bhargava) - Adjust the ACPI extlog driver to clear the Extended Error Log status when RAS_CEC handled the error (Tony Luck) - Add IRQ override quirks for some Infinity laptops and for TongFang GMxXGxx (David McFarland, Hans de Goede) - Clean up the ACPI NUMA code and fix it to ensure that fake_pxm is not the same as one of the real pxm values (Yuntao Wang) - Fix the fractional clock divider flags in the ACPI LPSS (Intel SoC) driver so as to prevent miscalculation of the values in the clock divider (Andy Shevchenko) - Adjust comments in the ACPI watchdog driver to prevent kernel-doc from complaining during documentation builds (Randy Dunlap) - Make the ACPI button driver send wakeup key events to user space in addition to power button events on systems that can be woken up by the power button (Ken Xue) - Adjust pnpacpi_parse_allocated_vendor() to use memcpy() on a full structure field (Dmitry Antipov)" * tag 'acpi-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (56 commits) ACPI: resource: Add Infinity laptops to irq1_edge_low_force_override ACPI: button: trigger wakeup key events ACPI: resource: Add another DMI match for the TongFang GMxXGxx ACPI: EC: Use a spin lock without disabing interrupts ACPI: EC: Use a threaded handler for dedicated IRQ ACPI: OSL: Use spin locks without disabling interrupts ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events ACPI: utils: Introduce helper for _DEP list lookup ACPI: utils: Fix white space in struct acpi_handle_list definition ACPI: utils: Refine acpi_handle_list_equal() slightly ACPI: utils: Return bool from acpi_evaluate_reference() ACPI: utils: Rearrange in acpi_evaluate_reference() ACPI: arm64: export acpi_arch_thermal_cpufreq_pctg() ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error ACPI: LPSS: Fix the fractional clock divider flags ACPI: NUMA: Fix the logic of getting the fake_pxm value ACPI: NUMA: Optimize the check for the availability of node values ACPI: NUMA: Remove unnecessary check in acpi_parse_gi_affinity() ACPI: watchdog: fix kernel-doc warnings ACPI: extlog: fix NULL pointer dereference check ...
2024-01-10Merge tag 'mtd/for-6.8' of ↵Linus Torvalds1-0/+15
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd updates from Miquel Raynal: "MTD: - Apart from preventing the mtdblk to run on top of ftl or ubiblk (which may cause security issues and has no meaning anyway), there are a few misc fixes. Raw NAND: - Two meaningful changes this time. The conversion of the brcmnand driver to the ->exec_op() API, this series brought additional changes to the core in order to help controller drivers to handle themselves the WP pin during destructive operations when relevant. - There is also a series bringing important fixes to the sequential read feature. - As always, there is as well a whole bunch of miscellaneous W=1 fixes, together with a few runtime fixes (double free, timeout value, OOB layout, missing register initialization) and the usual load of remove callbacks turned into void (which led to switch the txx9ndfmc driver to use module_platform_driver()). SPI NOR: - SPI NOR comes with die erase support for multi die flashes, with new octal protocols (1-1-8 and 1-8-8) parsed from SFDP and with an updated documentation about what the contributors shall consider when proposing flash additions or updates. - Michael Walle stepped out from the reviewer role to maintainer" * tag 'mtd/for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (39 commits) mtd: rawnand: Clarify conditions to enable continuous reads mtd: rawnand: Prevent sequential reads with on-die ECC engines mtd: rawnand: Fix core interference with sequential reads mtd: rawnand: Prevent crossing LUN boundaries during sequential reads mtd: Fix gluebi NULL pointer dereference caused by ftl notifier dt-bindings: mtd: partitions: u-boot: Fix typo mtd: rawnand: s3c2410: fix Excess struct member description kernel-doc warnings MAINTAINERS: change my mail to the kernel.org one mtd: spi-nor: sfdp: get the 1-1-8 and 1-8-8 protocol from SFDP mtd: spi-nor: drop superfluous debug prints mtd: spi-nor: sysfs: hide the flash name if not set mtd: spi-nor: mark the flash name as obsolete mtd: spi-nor: print flash ID instead of name mtd: maps: vmu-flash: Fix the (mtd core) switch to ref counters mtd: ssfdc: Remove an unused variable mtd: rawnand: diskonchip: fix a potential double free in doc_probe mtd: rawnand: rockchip: Add missing title to a kernel doc comment mtd: rawnand: rockchip: Rename a structure mtd: rawnand: pl353: Fix kernel doc mtd: spi-nor: micron-st: Add support for mt25qu01g ...
2024-01-10Merge tag 'spi-v6.8' of ↵Linus Torvalds2-16/+51
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "A moderately busy release for SPI, the main core update was the merging of support for multiple chip selects, used in some flash configurations. There were also big overhauls for the AXI SPI Engine and PL022 drivers, plus some new device support for ST. There's a few patches for other trees, API updates to allow the multiple chip select support and one of the naming modernisations touched a controller embedded in the USB code. - Support for multiple chip selects. - A big overhaul for the AXI SPI engine driver, modernising it and adding a bunch of new features. - Modernisation of the PL022 driver, fixing some issues with submitting messages while in atomic context in the process. - Many drivers were converted to use new APIs which avoid outdated terminology for devices and controllers. - Support for ST Microelectronics STM32F7 and STM32MP25, and Renesas RZ/Five" * tag 'spi-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (83 commits) spi: stm32: add st,stm32mp25-spi compatible supporting STM32MP25 soc dt-bindings: spi: stm32: add st,stm32mp25-spi compatible spi: stm32: use dma_get_slave_caps prior to configuring dma channel spi: axi-spi-engine: fix struct member doc warnings spi: pl022: update description of internal_cs_control() spi: pl022: delete description of cur_msg spi: dw: Remove Intel Thunder Bay SOC support spi: dw: Remove Intel Thunder Bay SOC support spi: sh-msiof: Enforce fixed DTDL for R-Car H3 spi: ljca: switch to use devm_spi_alloc_host() spi: cs42l43: switch to use devm_spi_alloc_host() spi: zynqmp-gqspi: switch to use modern name spi: zynq-qspi: switch to use modern name spi: xtensa-xtfpga: switch to use modern name spi: xlp: switch to use modern name spi: xilinx: switch to use modern name spi: xcomm: switch to use modern name spi: uniphier: switch to use modern name spi: topcliff-pch: switch to use modern name spi: wpcm-fiu: switch to use devm_spi_alloc_host() ...
2024-01-10Merge tag 'regulator-v6.8' of ↵Linus Torvalds4-52/+110
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "The main updates for this release are around monitoring of regulators, largely for error handling purposes. We allow the stream of regulator events to be seen by userspace as netlink events and allow system integrators to describe individual regulators as system critical with information on how long the system is expected to last on error. The system level error handling is very much about best effort problem mitigation rather than providing something fully robust, the initial drive was to provide a mechanism for trying to avoid initiating any new writes to flash once we notice the power going out. Otherwise it's very quiet, mainly several new Qualcomm devices. - Support for marking regulators as system critical and providing information on how long the system might last with those regulators in a failure state, hooked into the existing critical shutdown error handling. - Optional support for generating netlink events for events, there are use cases for system monitoring UIs and error handling. - A command line option to leave unused controllable regulators enabled, useful for debugging. We already only disable regulators we were explicitly given permission to control. - Support for Quacomm MP5496, PM8010 and PM8937" * tag 'regulator-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (31 commits) regulator: event: Ensure atomicity for sequence number uapi: regulator: Fix typo regulator: Reuse LINEAR_RANGE() in REGULATOR_LINEAR_RANGE() dt-bindings: regulator: qcom,usb-vbus-regulator: clean up example regulator: qcom_smd: Add LDO5 MP5496 regulator regulator: qcom-rpmh: add support for pm8010 regulators regulator: dt-bindings: qcom,rpmh: add compatible for pm8010 regulator: qcom-rpmh: extend to support multiple linear voltage ranges regulator: wm8350: Convert to platform remove callback returning void regulator: virtual: Convert to platform remove callback returning void regulator: userspace-consumer: Convert to platform remove callback returning void regulator: uniphier: Convert to platform remove callback returning void regulator: stm32-vrefbuf: Convert to platform remove callback returning void regulator: db8500-prcmu: Convert to platform remove callback returning void regulator: bd9571mwv: Convert to platform remove callback returning void regulator: arizona-ldo1: Convert to platform remove callback returning void regulator: event: Add regulator netlink event support regulator: event: Add regulator netlink event support regulator: stpmic1: Fix kernel-doc notation warnings regulator: palmas: remove redundant initialization of pointer pdata ...
2024-01-10Merge tag 'integrity-v6.8' of ↵Linus Torvalds2-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: - Add a new IMA/EVM maintainer and reviewer - Disable EVM on overlayfs The EVM HMAC and the original file signatures contain filesystem specific metadata (e.g. i_ino, i_generation and s_uuid), preventing the security.evm xattr from directly being copied up to the overlay. Further before calculating and writing out the overlay file's EVM HMAC, EVM must first verify the existing backing file's 'security.evm' value. For now until a solution is developed, disable EVM on overlayfs. - One bug fix and two cleanups * tag 'integrity-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: overlay: disable EVM evm: add support to disable EVM on unsupported filesystems evm: don't copy up 'security.evm' xattr MAINTAINERS: Add Eric Snowberg as a reviewer to IMA MAINTAINERS: Add Roberto Sassu as co-maintainer to IMA and EVM KEYS: encrypted: Add check for strsep ima: Remove EXPERIMENTAL from Kconfig ima: Reword IMA_KEYRINGS_PERMIT_SIGNED_BY_BUILTIN_OR_SECONDARY
2024-01-09Merge tag 'lsm-pr-20240105' of ↵Linus Torvalds6-9/+175
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull security module updates from Paul Moore: - Add three new syscalls: lsm_list_modules(), lsm_get_self_attr(), and lsm_set_self_attr(). The first syscall simply lists the LSMs enabled, while the second and third get and set the current process' LSM attributes. Yes, these syscalls may provide similar functionality to what can be found under /proc or /sys, but they were designed to support multiple, simultaneaous (stacked) LSMs from the start as opposed to the current /proc based solutions which were created at a time when only one LSM was allowed to be active at a given time. We have spent considerable time discussing ways to extend the existing /proc interfaces to support multiple, simultaneaous LSMs and even our best ideas have been far too ugly to support as a kernel API; after +20 years in the kernel, I felt the LSM layer had established itself enough to justify a handful of syscalls. Support amongst the individual LSM developers has been nearly unanimous, with a single objection coming from Tetsuo (TOMOYO) as he is worried that the LSM_ID_XXX token concept will make it more difficult for out-of-tree LSMs to survive. Several members of the LSM community have demonstrated the ability for out-of-tree LSMs to continue to exist by picking high/unused LSM_ID values as well as pointing out that many kernel APIs rely on integer identifiers, e.g. syscalls (!), but unfortunately Tetsuo's objections remain. My personal opinion is that while I have no interest in penalizing out-of-tree LSMs, I'm not going to penalize in-tree development to support out-of-tree development, and I view this as a necessary step forward to support the push for expanded LSM stacking and reduce our reliance on /proc and /sys which has occassionally been problematic for some container users. Finally, we have included the linux-api folks on (all?) recent revisions of the patchset and addressed all of their concerns. - Add a new security_file_ioctl_compat() LSM hook to handle the 32-bit ioctls on 64-bit systems problem. This patch includes support for all of the existing LSMs which provide ioctl hooks, although it turns out only SELinux actually cares about the individual ioctls. It is worth noting that while Casey (Smack) and Tetsuo (TOMOYO) did not give explicit ACKs to this patch, they did both indicate they are okay with the changes. - Fix a potential memory leak in the CALIPSO code when IPv6 is disabled at boot. While it's good that we are fixing this, I doubt this is something users are seeing in the wild as you need to both disable IPv6 and then attempt to configure IPv6 labeled networking via NetLabel/CALIPSO; that just doesn't make much sense. Normally this would go through netdev, but Jakub asked me to take this patch and of all the trees I maintain, the LSM tree seemed like the best fit. - Update the LSM MAINTAINERS entry with additional information about our process docs, patchwork, bug reporting, etc. I also noticed that the Lockdown LSM is missing a dedicated MAINTAINERS entry so I've added that to the pull request. I've been working with one of the major Lockdown authors/contributors to see if they are willing to step up and assume a Lockdown maintainer role; hopefully that will happen soon, but in the meantime I'll continue to look after it. - Add a handful of mailmap entries for Serge Hallyn and myself. * tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (27 commits) lsm: new security_file_ioctl_compat() hook lsm: Add a __counted_by() annotation to lsm_ctx.ctx calipso: fix memory leak in netlbl_calipso_add_pass() selftests: remove the LSM_ID_IMA check in lsm/lsm_list_modules_test MAINTAINERS: add an entry for the lockdown LSM MAINTAINERS: update the LSM entry mailmap: add entries for Serge Hallyn's dead accounts mailmap: update/replace my old email addresses lsm: mark the lsm_id variables are marked as static lsm: convert security_setselfattr() to use memdup_user() lsm: align based on pointer length in lsm_fill_user_ctx() lsm: consolidate buffer size handling into lsm_fill_user_ctx() lsm: correct error codes in security_getselfattr() lsm: cleanup the size counters in security_getselfattr() lsm: don't yet account for IMA in LSM_CONFIG_COUNT calculation lsm: drop LSM_ID_IMA LSM: selftests for Linux Security Module syscalls SELinux: Add selfattr hooks AppArmor: Add selfattr hooks Smack: implement setselfattr and getselfattr hooks ...
2024-01-09Merge tag 'mm-nonmm-stable-2024-01-09-10-33' of ↵Linus Torvalds9-26/+17
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Quite a lot of kexec work this time around. Many singleton patches in many places. The notable patch series are: - nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio conversions for file paths'. - Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2: Folio conversions for directory paths'. - IA64 remnant removal in Heiko Carstens's 'Remove unused code after IA-64 removal'. - Arnd Bergmann has enabled the -Wmissing-prototypes warning everywhere in 'Treewide: enable -Wmissing-prototypes'. This had some followup fixes: - Nathan Chancellor has cleaned up the hexagon build in the series 'hexagon: Fix up instances of -Wmissing-prototypes'. - Nathan also addressed some s390 warnings in 's390: A couple of fixes for -Wmissing-prototypes'. - Arnd Bergmann addresses the same warnings for MIPS in his series 'mips: address -Wmissing-prototypes warnings'. - Baoquan He has made kexec_file operate in a top-down-fitting manner similar to kexec_load in the series 'kexec_file: Load kernel at top of system RAM if required' - Baoquan He has also added the self-explanatory 'kexec_file: print out debugging message if required'. - Some checkstack maintenance work from Tiezhu Yang in the series 'Modify some code about checkstack'. - Douglas Anderson has disentangled the watchdog code's logging when multiple reports are occurring simultaneously. The series is 'watchdog: Better handling of concurrent lockups'. - Yuntao Wang has contributed some maintenance work on the crash code in 'crash: Some cleanups and fixes'" * tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits) crash_core: fix and simplify the logic of crash_exclude_mem_range() x86/crash: use SZ_1M macro instead of hardcoded value x86/crash: remove the unused image parameter from prepare_elf_headers() kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE scripts/decode_stacktrace.sh: strip unexpected CR from lines watchdog: if panicking and we dumped everything, don't re-enable dumping watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting watchdog/hardlockup: adopt softlockup logic avoiding double-dumps kexec_core: fix the assignment to kimage->control_page x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init() lib/trace_readwrite.c:: replace asm-generic/io with linux/io nilfs2: cpfile: fix some kernel-doc warnings stacktrace: fix kernel-doc typo scripts/checkstack.pl: fix no space expression between sp and offset x86/kexec: fix incorrect argument passed to kexec_dprintk() x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs nilfs2: add missing set_freezable() for freezable kthread kernel: relay: remove relay_file_splice_read dead code, doesn't work docs: submit-checklist: remove all of "make namespacecheck" ...
2024-01-09Merge tag 'mm-stable-2024-01-08-15-31' of ↵Linus Torvalds34-475/+1317
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintanance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...
2024-01-09Merge tag 'slab-for-6.8' of ↵Linus Torvalds4-349/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - SLUB: delayed freezing of CPU partial slabs (Chengming Zhou) Freezing is an operation involving double_cmpxchg() that makes a slab exclusive for a particular CPU. Chengming noticed that we use it also in situations where we are not yet installing the slab as the CPU slab, because freezing also indicates that the slab is not on the shared list. This results in redundant freeze/unfreeze operation and can be avoided by marking separately the shared list presence by reusing the PG_workingset flag. This approach neatly avoids the issues described in 9b1ea29bc0d7 ("Revert "mm, slub: consider rest of partial list if acquire_slab() fails"") as we can now grab a slab from the shared list in a quick and guaranteed way without the cmpxchg_double() operation that amplifies the lock contention and can fail. As a result, lkp has reported 34.2% improvement of stress-ng.rawudp.ops_per_sec - SLAB removal and SLUB cleanups (Vlastimil Babka) The SLAB allocator has been deprecated since 6.5 and nobody has objected so far. We agreed at LSF/MM to wait until the next LTS, which is 6.6, so we should be good to go now. This doesn't yet erase all traces of SLAB outside of mm/ so some dead code, comments or documentation remain, and will be cleaned up gradually (some series are already in the works). Removing the choice of allocators has already allowed to simplify and optimize the code wiring up the kmalloc APIs to the SLUB implementation. * tag 'slab-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (34 commits) mm/slub: free KFENCE objects in slab_free_hook() mm/slub: handle bulk and single object freeing separately mm/slub: introduce __kmem_cache_free_bulk() without free hooks mm/slub: fix bulk alloc and free stats mm/slub: optimize free fast path code layout mm/slub: optimize alloc fastpath code layout mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers mm/slab: move kmalloc() functions from slab_common.c to slub.c mm/slab: move kmalloc_slab() to mm/slab.h mm/slab: move kfree() from slab_common.c to slub.c mm/slab: move struct kmem_cache_node from slab.h to slub.c mm/slab: move memcg related functions from slab.h to slub.c mm/slab: move pre/post-alloc hooks from slab.h to slub.c mm/slab: consolidate includes in the internal mm/slab.h mm/slab: move the rest of slub_def.h to mm/slab.h mm/slab: move struct kmem_cache_cpu declaration to slub.c mm/slab: remove mm/slab.c and slab_def.h mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs mm/slab: remove CONFIG_SLAB code from slab common code cpu/hotplug: remove CPUHP_SLAB_PREPARE hooks ...
2024-01-09Merge tag 'cgroup-for-6.8' of ↵Linus Torvalds5-6/+31
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Yafang Shao added task_get_cgroup1() helper to enable a similar BPF helper so that BPF progs can be more useful on cgroup1 hierarchies. While cgroup1 is mostly in maintenance mode, this addition is very small while having an outsized usefulness for users who are still on cgroup1. Yafang also optimized root cgroup list access by making it RCU protected in the process. - Waiman Long optimized rstat operation leading to substantially lower and more consistent lock hold time while flushing the hierarchical statistics. As the lock can be acquired briefly in various hot paths, this reduction has cascading benefits. - Waiman also improved the quality of isolation for cpuset's isolated partitions. CPUs which are allocated to isolated partitions are now excluded from running unbound work items and cpu_is_isolated() test which is used by vmstat and memcg to reduce interference now includes cpuset isolated CPUs. While it isn't there yet, the hope is eventually reaching parity with the isolation level provided by the `isolcpus` boot param but in a dynamic manner. * tag 'cgroup-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Move rcu_head up near the top of cgroup_root cgroup/cpuset: Include isolated cpuset CPUs in cpu_is_isolated() check cgroup: Avoid false cacheline sharing of read mostly rstat_cpu cgroup/rstat: Optimize cgroup_rstat_updated_list() cgroup: Fix documentation for cpu.idle cgroup/cpuset: Expose cpuset.cpus.isolated workqueue: Move workqueue_set_unbound_cpumask() and its helpers inside CONFIG_SYSFS cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked() cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask cgroup/cpuset: Keep track of CPUs in isolated partitions selftests/cgroup: Minor code cleanup and reorganization of test_cpuset_prs.sh workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask selftests: cgroup: Fixes a typo in a comment cgroup: Add a new helper for cgroup1 hierarchy cgroup: Add annotation for holding namespace_sem in current_cgns_cgroup_from_root() cgroup: Eliminate the need for cgroup_mutex in proc_cgroup_show() cgroup: Make operations on the cgroup root_list RCU safe cgroup: Remove unnecessary list_empty()
2024-01-09Merge tag 'sched-core-2024-01-08' of ↵Linus Torvalds8-53/+66
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Energy scheduling: - Consolidate how the max compute capacity is used in the scheduler and how we calculate the frequency for a level of utilization. - Rework interface between the scheduler and the schedutil governor - Simplify the util_est logic Deadline scheduler: - Work more towards reducing SCHED_DEADLINE starvation of low priority tasks (e.g., SCHED_OTHER) tasks when higher priority tasks monopolize CPU cycles, via the introduction of 'deadline servers' (nested/2-level scheduling). "Fair servers" to make use of this facility are not introduced yet. EEVDF: - Introduce O(1) fastpath for EEVDF task selection NUMA balancing: - Tune the NUMA-balancing vma scanning logic some more, to better distribute the probability of a particular vma getting scanned. Plus misc fixes, cleanups and updates" * tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) sched/fair: Fix tg->load when offlining a CPU sched/fair: Remove unused 'next_buddy_marked' local variable in check_preempt_wakeup_fair() sched/fair: Use all little CPUs for CPU-bound workloads sched/fair: Simplify util_est sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) arm64/amu: Use capacity_ref_freq() to set AMU ratio cpufreq/cppc: Set the frequency used for computing the capacity cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}() energy_model: Use a fixed reference frequency cpufreq/schedutil: Use a fixed reference frequency cpufreq: Use the fixed and coherent frequency for scaling capacity sched/topology: Add a new arch_scale_freq_ref() method freezer,sched: Clean saved_state when restoring it during thaw sched/fair: Update min_vruntime for reweight_entity() correctly sched/doc: Update documentation after renames and synchronize Chinese version sched/cpufreq: Rework iowait boost sched/cpufreq: Rework schedutil governor performance estimation sched/pelt: Avoid underestimation of task utilization sched/timers: Explain why idle task schedules out on remote timer enqueue sched/cpuidle: Comment about timers requirements VS idle handler ...
2024-01-09Merge tag 'perf-core-2024-01-08' of ↵Linus Torvalds2-1/+34
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: - Add branch stack counters ABI extension to better capture the growing amount of information the PMU exposes via branch stack sampling. There's matching tooling support. - Fix race when creating the nr_addr_filters sysfs file - Add Intel Sierra Forest and Grand Ridge intel/cstate PMU support - Add Intel Granite Rapids, Sierra Forest and Grand Ridge uncore PMU support - Misc cleanups & fixes * tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Factor out topology_gidnid_map() perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology() perf/x86/amd: Reject branch stack for IBS events perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge perf/x86/intel/uncore: Support IIO free-running counters on GNR perf/x86/intel/uncore: Support Granite Rapids perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR perf: Fix the nr_addr_filters fix perf/x86/intel/cstate: Add Grand Ridge support perf/x86/intel/cstate: Add Sierra Forest support x86/smp: Export symbol cpu_clustergroup_mask() perf/x86/intel/cstate: Cleanup duplicate attr_groups perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file perf/x86/intel: Support branch counters logging perf/x86/intel: Reorganize attrs and is_visible perf: Add branch_sample_call_stack perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag perf: Add branch stack counters
2024-01-09Merge tag 'timers-core-2024-01-08' of ↵Linus Torvalds1-10/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer subsystem updates from Ingo Molnar: - Various preparatory cleanups & enhancements of the timer-wheel code, in preparation for the WIP 'pull timers at expiry' timer migration model series (which will replace the current 'push timers at enqueue' migration model), by Anna-Maria Behnsen: - Update comments and clean up confusing variable names - Add debug check to warn about time travel - Improve/expand timer-wheel tracepoints - Optimize away unnecessary IPIs for deferrable timers - Restructure & clean up next_expiry_recalc() - Clean up forward_timer_base() - Introduce __forward_timer_base() and use it to simplify and micro-optimize get_next_timer_interrupt() - Restructure the get_next_timer_interrupt()'s idle logic for better readability and to enable a minor optimization. - Fix the nextevt calculation when no timers are pending - Fix the sysfs_get_uname() prototype declaration * tag 'timers-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Fix nextevt calculation when no timers are pending timers: Rework idle logic timers: Use already existing function for forwarding timer base timers: Split out forward timer base functionality timers: Clarify check in forward_timer_base() timers: Move store of next event into __next_timer_interrupt() timers: Do not IPI for deferrable timers tracing/timers: Add tracepoint for tracking timer base is_idle flag tracing/timers: Enhance timer_start tracepoint tick-sched: Warn when next tick seems to be in the past tick/sched: Cleanup confusing variables tick-sched: Fix function names in comments time: Make sysfs_get_uname() function visible in header
2024-01-09Merge tag 'smp-core-2024-01-08' of ↵Linus Torvalds1-15/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Ingo Molnar: - Remove unused CPU hotplug states - Increase the number of dynamic CPU hotplug states from 30 to 40, because existing drivers can exhaust the allocation space * tag 'smp-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Increase the number of dynamic states cpu/hotplug: Remove unused CPU hotplug states
2024-01-09Merge tag 'core-entry-2024-01-08' of ↵Linus Torvalds1-4/+91
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull generic syscall updates from Ingo Molnar: "Move various entry functions from kernel/entry/common.c to a header file, and always-inline them, to improve syscall entry performance on s390 by ~11%" * tag 'core-entry-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Move syscall_enter_from_user_mode() to header file entry: Move enter_from_user_mode() to header file entry: Move exit to usermode functions to header file
2024-01-09Merge tag 'locking-core-2024-01-08' of ↵Linus Torvalds6-9/+99
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molar: "Lock guards: - Use lock guards in the ptrace code - Introduce conditional guards to extend to conditional lock primitives like mutex_trylock()/mutex_lock_interruptible()/etc. lockdep: - Optimize 'struct lock_class' to be smaller - Update file patterns in MAINTAINERS mutexes: - Document mutex lifetime rules a bit more" * tag 'locking-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/mutex: Clarify that mutex_unlock(), and most other sleeping locks, can still use the lock object after it's unlocked locking/mutex: Document that mutex_unlock() is non-atomic ptrace: Convert ptrace_attach() to use lock guards locking/lockdep: Slightly reorder 'struct lock_class' to save some memory MAINTAINERS: Add include/linux/lockdep*.h cleanup: Add conditional guard support
2024-01-09Merge tag 'arm64-upstream' of ↵Linus Torvalds5-20/+48
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "CPU features: - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye olde Thunder-X machines - Avoid mapping KPTI trampoline when it is not required - Make CPU capability API more robust during early initialisation Early idreg overrides: - Remove dependencies on core kernel helpers from the early command-line parsing logic in preparation for moving this code before the kernel is mapped FPsimd: - Restore kernel-mode fpsimd context lazily, allowing us to run fpsimd code sequences in the kernel with pre-emption enabled KBuild: - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y - Makefile cleanups LPA2 prep: - Preparatory work for enabling the 'LPA2' extension, which will introduce 52-bit virtual and physical addressing even with 4KiB pages (including for KVM guests). Misc: - Remove dead code and fix a typo MM: - Pass NUMA node information for IRQ stack allocations Perf: - Add perf support for the Synopsys DesignWare PCIe PMU - Add support for event counting thresholds (FEAT_PMUv3_TH) introduced in Armv8.8 - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver. - Minor PMU driver fixes and optimisations RIP VPIPT: - Remove what support we had for the obsolete VPIPT I-cache policy Selftests: - Improvements to the SVE and SME selftests Stacktrace: - Refactor kernel unwind logic so that it can used by BPF unwinding and, eventually, reliable backtracing Sysregs: - Update a bunch of register definitions based on the latest XML drop from Arm" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits) kselftest/arm64: Don't probe the current VL for unsupported vector types efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad arm64: properly install vmlinuz.efi arm64/sysreg: Add missing system instruction definitions for FGT arm64/sysreg: Add missing system register definitions for FGT arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1 arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1 arm64: memory: remove duplicated include arm: perf: Fix ARCH=arm build with GCC arm64: Align boot cpucap handling with system cpucap handling arm64: Cleanup system cpucap handling MAINTAINERS: add maintainers for DesignWare PCIe PMU driver drivers/perf: add DesignWare PCIe PMU driver PCI: Move pci_clear_and_set_dword() helper to PCI header PCI: Add Alibaba Vendor ID to linux/pci_ids.h docs: perf: Add description for Synopsys DesignWare PCIe PMU driver arm64: irq: set the correct node for shadow call stack Revert "perf/arm_dmc620: Remove duplicate format attribute #defines" arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD arm64: fpsimd: Preserve/restore kernel mode NEON at context switch ...
2024-01-09Merge tag 'm68k-for-v6.8-tag1' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - make the NuBus bus type static and constant - defconfig updates * tag 'm68k-for-v6.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: defconfig: Update defconfigs for v6.7-rc1 nubus: Make nubus_bus_type static and constant
2024-01-09mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDERKirill A. Shutemov4-12/+12
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has changed the definition of MAX_ORDER to be inclusive. This has caused issues with code that was not yet upstream and depended on the previous definition. To draw attention to the altered meaning of the define, rename MAX_ORDER to MAX_PAGE_ORDER. Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-09mm, treewide: introduce NR_PAGE_ORDERSKirill A. Shutemov2-3/+5
NR_PAGE_ORDERS defines the number of page orders supported by the page allocator, ranging from 0 to MAX_ORDER, MAX_ORDER + 1 in total. NR_PAGE_ORDERS assists in defining arrays of page orders and allows for more natural iteration over them. [kirill.shutemov@linux.intel.com: fixup for kerneldoc warning] Link: https://lkml.kernel.org/r/20240101111512.7empzyifq7kxtzk3@box Link: https://lkml.kernel.org/r/20231228144704.14033-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-09Merge tag 'edac_updates_for_v6.8' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - The EDAC drivers part of the effort to make the ->remove() platform driver callback return void - Add support for AMD AI accelerators - Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P, Meteor Lake-{P,PS} - Random fixes and cleanups all over the place * tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (39 commits) EDAC/skx_common: Filter out the invalid address EDAC, pnd2: Sort headers alphabetically EDAC, pnd2: Correct misleading error message in mk_region_mask() EDAC, pnd2: Apply bit macros and helpers where it makes sense EDAC, pnd2: Replace custom definition by one from sizes.h EDAC/igen6: Add Intel Meteor Lake-P SoCs support EDAC/igen6: Add Intel Meteor Lake-PS SoCs support EDAC/igen6: Add Intel Raptor Lake-P SoCs support EDAC/igen6: Add Intel Alder Lake-N SoCs support EDAC/igen6: Make get_mchbar() helper function EDAC/amd64: Add support for family 0x19, models 0x90-9f devices EDAC/mc: Add support for HBM3 memory type EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer EDAC/armada_xp: Explicitly include correct DT includes EDAC/pci_sysfs: Use PCI_HEADER_TYPE_MASK instead of literals EDAC/thunderx: Fix possible out-of-bounds string access EDAC/fsl_ddr: Convert to platform remove callback returning void EDAC/zynqmp: Convert to platform remove callback returning void EDAC/xgene: Convert to platform remove callback returning void EDAC/ti: Convert to platform remove callback returning void ...
2024-01-08Merge tag 'vfs-6.8.iov_iter' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iov_iter cleanups from Christian Brauner: "This contains a minor cleanup. The patches drop an unused argument from import_single_range() allowing to replace import_single_range() with import_ubuf() and dropping import_single_range() completely" * tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iov_iter: replace import_single_range() with import_ubuf() iov_iter: remove unused 'iov' argument from import_single_range()
2024-01-08Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds4-45/+169
Pull vfs rw updates from Christian Brauner: "This contains updates from Amir for read-write backing file helpers for stacking filesystems such as overlayfs: - Fanotify is currently in the process of introducing pre content events. Roughly, a new permission event will be added indicating that it is safe to write to the file being accessed. These events are used by hierarchical storage managers to e.g., fill the content of files on first access. During that work we noticed that our current permission checking is inconsistent in rw_verify_area() and remap_verify_area(). Especially in the splice code permission checking is done multiple times. For example, one time for the whole range and then again for partial ranges inside the iterator. In addition, we mostly do permission checking before we call file_start_write() except for a few places where we call it after. For pre-content events we need such permission checking to be done before file_start_write(). So this is a nice reason to clean this all up. After this series, all permission checking is done before file_start_write(). As part of this cleanup we also massaged the splice code a bit. We got rid of a few helpers because we are alredy drowning in special read-write helpers. We also cleaned up the return types for splice helpers. - Introduce generic read-write helpers for backing files. This lifts some overlayfs code to common code so it can be used by the FUSE passthrough work coming in over the next cycles. Make Amir and Miklos the maintainers for this new subsystem of the vfs" * tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) fs: fix __sb_write_started() kerneldoc formatting fs: factor out backing_file_mmap() helper fs: factor out backing_file_splice_{read,write}() helpers fs: factor out backing_file_{read,write}_iter() helpers fs: prepare for stackable filesystems backing file helpers fsnotify: optionally pass access range in file permission hooks fsnotify: assert that file_start_write() is not held in permission hooks fsnotify: split fsnotify_perm() into two hooks fs: use splice_copy_file_range() inline helper splice: return type ssize_t from all helpers fs: use do_splice_direct() for nfsd/ksmbd server-side-copy fs: move file_start_write() into direct_splice_actor() fs: fork splice_file_range() from do_splice_direct() fs: create {sb,file}_write_not_started() helpers fs: create file_write_started() helper fs: create __sb_write_started() helper fs: move kiocb_start_write() into vfs_iocb_iter_write() fs: move permission hook out of do_iter_read() fs: move permission hook out of do_iter_write() fs: move file_start_write() into vfs_iter_write() ...
2024-01-08Merge tag 'vfs-6.8.mount' of ↵Linus Torvalds5-4/+88
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: "This contains the work to retrieve detailed information about mounts via two new system calls. This is hopefully the beginning of the end of the saga that started with fsinfo() years ago. The LWN articles in [1] and [2] can serve as a summary so we can avoid rehashing everything here. At LSFMM in May 2022 we got into a room and agreed on what we want to do about fsinfo(). Basically, split it into pieces. This is the first part of that agreement. Specifically, it is concerned with retrieving information about mounts. So this only concerns the mount information retrieval, not the mount table change notification, or the extended filesystem specific mount option work. That is separate work. Currently mounts have a 32bit id. Mount ids are already in heavy use by libmount and other low-level userspace but they can't be relied upon because they're recycled very quickly. We agreed that mounts should carry a unique 64bit id by which they can be referenced directly. This is now implemented as part of this work. The new 64bit mount id is exposed in statx() through the new STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is returned. If it is raised and the kernel supports the new 64bit mount id the flag is raised in the result mask and the new 64bit mount id is returned. New and old mount ids do not overlap so they cannot be conflated. Two new system calls are introduced that operate on the 64bit mount id: statmount() and listmount(). A summary of the api and usage can be found on LWN as well (cf. [3]) but of course, I'll provide a summary here as well. Both system calls rely on struct mnt_id_req. Which is the request struct used to pass the 64bit mount id identifying the mount to operate on. It is extensible to allow for the addition of new parameters and for future use in other apis that make use of mount ids. statmount() mimicks the semantics of statx() and exposes a set flags that userspace may raise in mnt_id_req to request specific information to be retrieved. A statmount() call returns a struct statmount filled in with information about the requested mount. Supported requests are indicated by raising the request flag passed in struct mnt_id_req in the @mask argument in struct statmount. Currently we do support: - STATMOUNT_SB_BASIC: Basic filesystem info - STATMOUNT_MNT_BASIC Mount information (mount id, parent mount id, mount attributes etc) - STATMOUNT_PROPAGATE_FROM Propagation from what mount in current namespace - STATMOUNT_MNT_ROOT Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla) - STATMOUNT_MNT_POINT Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt) - STATMOUNT_FS_TYPE Name of the filesystem type as the magic number isn't enough due to submounts The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE are appended to the end of the struct. Userspace can use the offsets in @fs_type, @mnt_root, and @mnt_point to reference those strings easily. The struct statmount reserves quite a bit of space currently for future extensibility. This isn't really a problem and if this bothers us we can just send a follow-up pull request during this cycle. listmount() is given a 64bit mount id via mnt_id_req just as statmount(). It takes a buffer and a size to return an array of the 64bit ids of the child mounts of the requested mount. Userspace can thus choose to either retrieve child mounts for a mount in batches or iterate through the child mounts. For most use-cases it will be sufficient to just leave space for a few child mounts. But for big mount tables having an iterator is really helpful. Iterating through a mount table works by setting @param in mnt_id_req to the mount id of the last child mount retrieved in the previous listmount() call" Link: https://lwn.net/Articles/934469 [1] Link: https://lwn.net/Articles/829212 [2] Link: https://lwn.net/Articles/950569 [3] * tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: add selftest for statmount/listmount fs: keep struct mnt_id_req extensible wire up syscalls for statmount/listmount add listmount(2) syscall statmount: simplify string option retrieval statmount: simplify numeric option retrieval add statmount(2) syscall namespace: extract show_path() helper mounts: keep list of mounts in an rbtree add unique mount ID
2024-01-08Merge tag 'vfs-6.8.super' of ↵Linus Torvalds3-15/+41
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs super updates from Christian Brauner: "This contains the super work for this cycle including the long-awaited series by Jan to make it possible to prevent writing to mounted block devices: - Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore syzbot comes with more and more involved examples how to corrupt block device under a mounted filesystem leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are allowed. Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes. Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening. We expect that this will be interesting to quite a few workloads. Btrfs is currently opted out of this because they still haven't merged patches we require for this to work from three kernel releases ago. - Reimplement block device freezing and thawing as holder operations on the block device. This allows us to extend block device freezing to all devices associated with a superblock and not just the main device. It also allows us to remove get_active_super() and thus another function that scans the global list of superblocks. Freezing via additional block devices only works if the filesystem chooses to use @fs_holder_ops for these additional devices as well. That currently only includes ext4 and xfs. Earlier releases switched get_tree_bdev() and mount_bdev() to use @fs_holder_ops. The remaining nilfs2 open-coded version of mount_bdev() has been converted to rely on @fs_holder_ops as well. So block device freezing for the main block device will continue to work as before. There should be no regressions in functionality. The only special case is btrfs where block device freezing for the main block device never worked because sb->s_bdev isn't set. Block device freezing for btrfs can be fixed once they can switch to @fs_holder_ops but that can happen whenever they're ready" * tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) block: Fix a memory leak in bdev_open_by_dev() super: don't bother with WARN_ON_ONCE() super: massage wait event mechanism ext4: Block writes to journal device xfs: Block writes to log device fs: Block writes to mounted block devices btrfs: Do not restrict writes to btrfs devices block: Add config option to not allow writing to mounted devices block: Remove blkdev_get_by_*() functions bcachefs: Convert to bdev_open_by_path() fs: handle freezing from multiple devices fs: remove dead check nilfs2: simplify device handling fs: streamline thaw_super_locked ext4: simplify device handling xfs: simplify device handling fs: simplify setup_bdev_super() calls blkdev: comment fs_holder_ops porting: document block device freeze and thaw changes fs: remove unused helper ...
2024-01-08Merge tag 'vfs-6.8.misc' of ↵Linus Torvalds8-44/+64
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Add Jan Kara as VFS reviewer - Show correct device and inode numbers in proc/<pid>/maps for vma files on stacked filesystems. This is now easily doable thanks to the backing file work from the last cycles. This comes with selftests Cleanups: - Remove a redundant might_sleep() from wait_on_inode() - Initialize pointer with NULL, not 0 - Clarify comment on access_override_creds() - Rework and simplify eventfd_signal() and eventfd_signal_mask() helpers - Process aio completions in batches to avoid needless wakeups - Completely decouple struct mnt_idmap from namespaces. We now only keep the actual idmapping around and don't stash references to namespaces - Reformat maintainer entries to indicate that a given subsystem belongs to fs/ - Simplify fput() for files that were never opened - Get rid of various pointless file helpers - Rename various file helpers - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from last cycle - Make relatime_need_update() return bool - Use GFP_KERNEL instead of GFP_USER when allocating superblocks - Replace deprecated ida_simple_*() calls with their current ida_*() counterparts Fixes: - Fix comments on user namespace id mapping helpers. They aren't kernel doc comments so they shouldn't be using /** - s/Retuns/Returns/g in various places - Add missing parameter documentation on can_move_mount_beneath() - Rename i_mapping->private_data to i_mapping->i_private_data - Fix a false-positive lockdep warning in pipe_write() for watch queues - Improve __fget_files_rcu() code generation to improve performance - Only notify writer that pipe resizing has finished after setting pipe->max_usage otherwise writers are never notified that the pipe has been resized and hang - Fix some kernel docs in hfsplus - s/passs/pass/g in various places - Fix kernel docs in ntfs - Fix kcalloc() arguments order reported by gcc 14 - Fix uninitialized value in reiserfs" * tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits) reiserfs: fix uninit-value in comp_keys watch_queue: fix kcalloc() arguments order ntfs: dir.c: fix kernel-doc function parameter warnings fs: fix doc comment typo fs tree wide selftests/overlayfs: verify device and inode numbers in /proc/pid/maps fs/proc: show correct device and inode numbers in /proc/pid/maps eventfd: Remove usage of the deprecated ida_simple_xx() API fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation fs/hfsplus: wrapper.c: fix kernel-doc warnings fs: add Jan Kara as reviewer fs/inode: Make relatime_need_update return bool pipe: wakeup wr_wait after setting max_usage file: remove __receive_fd() file: stop exposing receive_fd_user() fs: replace f_rcuhead with f_task_work file: remove pointless wrapper file: s/close_fd_get_file()/file_close_fd()/g Improve __fget_files_rcu() code generation (and thus __fget_light()) file: massage cleanup of files that failed to open fs/pipe: Fix lockdep false-positive in watchqueue pipe_write() ...
2024-01-08asm-generic: make sparse happy with odd-sized put_unaligned_*()Dmitry Torokhov1-12/+12
__put_unaligned_be24() and friends use implicit casts to convert larger-sized data to bytes, which trips sparse truncation warnings when the argument is a constant: CC [M] drivers/input/touchscreen/hynitron_cstxxx.o CHECK drivers/input/touchscreen/hynitron_cstxxx.c drivers/input/touchscreen/hynitron_cstxxx.c: note: in included file (through arch/x86/include/generated/asm/unaligned.h): include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (aa01a0 becomes a0) include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (aa01 becomes 1) include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (ab00d0 becomes d0) include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (ab00 becomes 0) To avoid this let's mask off upper bits explicitly, the resulting code should be exactly the same, but it will keep sparse happy. Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Closes: https://lore.kernel.org/oe-kbuild-all/202401070147.gqwVulOn-lkp@intel.com/ Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-08Merge branch 'pm-sleep'Rafael J. Wysocki1-0/+2
Merge system-wide power management updates for 6.8-rc1: - Fix possible deadlocks in the core system-wide PM code that occur if device-handling functions cannot be executed asynchronously during resune from system-wide suspend (Rafael J. Wysocki). - Clean up unnecessary local variable initializations in multiple places in the hibernation code (Wang chaodong, Li zeming). - Adjust core hibernation code to avoid missing wakeup events that occur after saving an image to persistent storage (Chris Feng). - Update hibernation code to enforce correct ordering during image compression and decompression (Hongchen Zhang). - Use kmap_local_page() instead of kmap_atomic() in copy_data_page() during hibernation and restore (Chen Haonan). - Adjust documentation and code comments to reflect recent task freezer changes (Kevin Hao). - Repair excess function parameter description warning in the hibernation image-saving code (Randy Dunlap). * pm-sleep: PM: sleep: Fix possible deadlocks in core system-wide PM code async: Introduce async_schedule_dev_nocall() async: Split async_schedule_node_domain() PM: hibernate: Repair excess function parameter description warning PM: sleep: Remove obsolete comment from unlock_system_sleep() Documentation: PM: Adjust freezing-of-tasks.rst to the freezer changes PM: hibernate: Use kmap_local_page() in copy_data_page() PM: hibernate: Enforce ordering during image compression/decompression PM: hibernate: Avoid missing wakeup events during hibernation PM: hibernate: Do not initialize error in snapshot_write_next() PM: hibernate: Do not initialize error in swap_write_page() PM: hibernate: Drop unnecessary local variable initialization
2024-01-08Merge tag 'opp-updates-6.8' of ↵Rafael J. Wysocki1-17/+11
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp Merge OPP (Operating Performance Points) updates for 6.8 from Viresh Kumar: "- Fix _set_required_opps when opp is NULL (Bryan O'Donoghue). - ti: Use device_get_match_data() (Rob Herring). - Minor cleanups around OPP level and other parts and call dev_pm_opp_set_opp() recursively for required OPPs (Viresh Kumar)." * tag 'opp-updates-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: OPP: Rename 'rate_clk_single' OPP: Pass rounded rate to _set_opp() OPP: Relocate dev_pm_opp_sync_regulators() OPP: Move dev_pm_opp_icc_bw to internal opp.h OPP: Fix _set_required_opps when opp is NULL OPP: The level field is always of unsigned int type OPP: Check for invalid OPP in dev_pm_opp_find_level_ceil() OPP: Don't set OPP recursively for a parent genpd OPP: Call dev_pm_opp_set_opp() for required OPPs OPP: Use _set_opp_level() for single genpd case OPP: Level zero is valid opp: ti: Use device_get_match_data()
2024-01-08Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes ↵Ingo Molnar7-58/+39
for the v6.8 merge window This fix didn't make it upstream in time, pick it up for the v6.8 merge window. Signed-off-by: Ingo Molnar <mingo@kernel.org>