summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2022-05-27Merge tag 'modules-5.19-rc1' of ↵Linus Torvalds23-1971/+2276
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules updates from Luis Chamberlain: - It was time to tidy up kernel/module.c and one way of starting with that effort was to split it up into files. At my request Aaron Tomlin spearheaded that effort with the goal to not introduce any functional at all during that endeavour. The penalty for the split is +1322 bytes total, +112 bytes in data, +1210 bytes in text while bss is unchanged. One of the benefits of this other than helping make the code easier to read and review is summoning more help on review for changes with livepatching so kernel/module/livepatch.c is now pegged as maintained by the live patching folks. The before and after with just the move on a defconfig on x86-64: $ size kernel/module.o text data bss dec hex filename 38434 4540 104 43078 a846 kernel/module.o $ size -t kernel/module/*.o text data bss dec hex filename 4785 120 0 4905 1329 kernel/module/kallsyms.o 28577 4416 104 33097 8149 kernel/module/main.o 1158 8 0 1166 48e kernel/module/procfs.o 902 108 0 1010 3f2 kernel/module/strict_rwx.o 3390 0 0 3390 d3e kernel/module/sysfs.o 832 0 0 832 340 kernel/module/tree_lookup.o 39644 4652 104 44400 ad70 (TOTALS) - Aaron added module unload taint tracking (MODULE_UNLOAD_TAINT_TRACKING), to enable tracking unloaded modules which did taint the kernel. - Christophe Leroy added CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC which lets architectures to request having modules data in vmalloc area instead of module area. There are three reasons why an architecture might want this: a) On some architectures (like book3s/32) it is not possible to protect against execution on a page basis. The exec stuff can be mapped by different arch segment sizes (on book3s/32 that is 256M segments). By default the module area is in an Exec segment while vmalloc area is in a NoExec segment. Using vmalloc lets you muck with module data as NoExec on those architectures whereas before you could not. b) By pushing more module data to vmalloc you also increase the probability of module text to remain within a closer distance from kernel core text and this reduces trampolines, this has been reported on arm first and powerpc folks are following that lead. c) Free'ing module_alloc() (Exec by default) area leaves this exposed as Exec by default, some architectures have some security enhancements to set this as NoExec on free, and splitting module data with text let's future generic special allocators be added to the kernel without having developers try to grok the tribal knowledge per arch. Work like Rick Edgecombe's permission vmalloc interface [0] becomes easier to address over time. [0] https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/#r - Masahiro Yamada's symbol search enhancements * tag 'modules-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (33 commits) module: merge check_exported_symbol() into find_exported_symbol_in_section() module: do not binary-search in __ksymtab_gpl if fsa->gplok is false module: do not pass opaque pointer for symbol search module: show disallowed symbol name for inherit_taint() module: fix [e_shstrndx].sh_size=0 OOB access module: Introduce module unload taint tracking module: Move module_assert_mutex_or_preempt() to internal.h module: Make module_flags_taint() accept a module's taints bitmap and usable outside core code module.h: simplify MODULE_IMPORT_NS powerpc: Select ARCH_WANTS_MODULES_DATA_IN_VMALLOC on book3s/32 and 8xx module: Remove module_addr_min and module_addr_max module: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC module: Introduce data_layout module: Prepare for handling several RB trees module: Always have struct mod_tree_root module: Rename debug_align() as strict_align() module: Rework layout alignment to avoid BUG_ON()s module: Move module_enable_x() and frob_text() in strict_rwx.c module: Make module_enable_x() independent of CONFIG_ARCH_HAS_STRICT_MODULE_RWX module: Move version support into a separate file ...
2022-05-27Merge tag 'sysctl-5.19-rc1' of ↵Linus Torvalds16-472/+542
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "For two kernel releases now kernel/sysctl.c has been being cleaned up slowly, since the tables were grossly long, sprinkled with tons of #ifdefs and all this caused merge conflicts with one susbystem or another. This tree was put together to help try to avoid conflicts with these cleanups going on different trees at time. So nothing exciting on this pull request, just cleanups. Thanks a lot to the Uniontech and Huawei folks for doing some of this nasty work" * tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (28 commits) sched: Fix build warning without CONFIG_SYSCTL reboot: Fix build warning without CONFIG_SYSCTL kernel/kexec_core: move kexec_core sysctls into its own file sysctl: minor cleanup in new_dir() ftrace: fix building with SYSCTL=y but DYNAMIC_FTRACE=n fs/proc: Introduce list_for_each_table_entry for proc sysctl mm: fix unused variable kernel warning when SYSCTL=n latencytop: move sysctl to its own file ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y ftrace: Fix build warning ftrace: move sysctl_ftrace_enabled to ftrace.c kernel/do_mount_initrd: move real_root_dev sysctls to its own file kernel/delayacct: move delayacct sysctls to its own file kernel/acct: move acct sysctls to its own file kernel/panic: move panic sysctls to its own file kernel/lockdep: move lockdep sysctls to its own file mm: move page-writeback sysctls to their own file mm: move oom_kill sysctls to their own file kernel/reboot: move reboot sysctls to its own file sched: Move energy_aware sysctls to topology.c ...
2022-05-26Merge tag 'mm-stable-2022-05-25' of ↵Linus Torvalds3-4/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Almost all of MM here. A few things are still getting finished off, reviewed, etc. - Yang Shi has improved the behaviour of khugepaged collapsing of readonly file-backed transparent hugepages. - Johannes Weiner has arranged for zswap memory use to be tracked and managed on a per-cgroup basis. - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime enablement of the recent huge page vmemmap optimization feature. - Baolin Wang contributes a series to fix some issues around hugetlb pagetable invalidation. - Zhenwei Pi has fixed some interactions between hwpoisoned pages and virtualization. - Tong Tiangen has enabled the use of the presently x86-only page_table_check debugging feature on arm64 and riscv. - David Vernet has done some fixup work on the memcg selftests. - Peter Xu has taught userfaultfd to handle write protection faults against shmem- and hugetlbfs-backed files. - More DAMON development from SeongJae Park - adding online tuning of the feature and support for monitoring of fixed virtual address ranges. Also easier discovery of which monitoring operations are available. - Nadav Amit has done some optimization of TLB flushing during mprotect(). - Neil Brown continues to labor away at improving our swap-over-NFS support. - David Hildenbrand has some fixes to anon page COWing versus get_user_pages(). - Peng Liu fixed some errors in the core hugetlb code. - Joao Martins has reduced the amount of memory consumed by device-dax's compound devmaps. - Some cleanups of the arch-specific pagemap code from Anshuman Khandual. - Muchun Song has found and fixed some errors in the TLB flushing of transparent hugepages. - Roman Gushchin has done more work on the memcg selftests. ... and, of course, many smaller fixes and cleanups. Notably, the customary million cleanup serieses from Miaohe Lin" * tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits) mm: kfence: use PAGE_ALIGNED helper selftests: vm: add the "settings" file with timeout variable selftests: vm: add "test_hmm.sh" to TEST_FILES selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests selftests: vm: add migration to the .gitignore selftests/vm/pkeys: fix typo in comment ksm: fix typo in comment selftests: vm: add process_mrelease tests Revert "mm/vmscan: never demote for memcg reclaim" mm/kfence: print disabling or re-enabling message include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace" include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion" mm: fix a potential infinite loop in start_isolate_page_range() MAINTAINERS: add Muchun as co-maintainer for HugeTLB zram: fix Kconfig dependency warning mm/shmem: fix shmem folio swapoff hang cgroup: fix an error handling path in alloc_pagecache_max_30M() mm: damon: use HPAGE_PMD_SIZE tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate nodemask.h: fix compilation error with GCC12 ...
2022-05-26Merge tag 'kbuild-v5.19' of ↵Linus Torvalds2-10/+2
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add HOSTPKG_CONFIG env variable to allow users to override pkg-config - Support W=e as a shorthand for KCFLAGS=-Werror - Fix CONFIG_IKHEADERS build to support toybox cpio - Add scripts/dummy-tools/pahole to ease distro packagers' life - Suppress false-positive warnings from checksyscalls.sh for W=2 build - Factor out the common code of arch/*/boot/install.sh into scripts/install.sh - Support 'kernel-install' tool in scripts/prune-kernel - Refactor module-versioning to link the symbol versions at the final link of vmlinux and modules - Remove CONFIG_MODULE_REL_CRCS because module-versioning now works in an arch-agnostic way - Refactor modpost, Makefiles * tag 'kbuild-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (56 commits) genksyms: adjust the output format to modpost kbuild: stop merging *.symversions kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS modpost: extract symbol versions from *.cmd files modpost: add sym_find_with_module() helper modpost: change the license of EXPORT_SYMBOL to bool type modpost: remove left-over cross_compile declaration kbuild: record symbol versions in *.cmd files kbuild: generate a list of objects in vmlinux modpost: move *.mod.c generation to write_mod_c_files() modpost: merge add_{intree_flag,retpoline,staging_flag} to add_header scripts/prune-kernel: Use kernel-install if available kbuild: factor out the common installation code into scripts/install.sh modpost: split new_symbol() to symbol allocation and hash table addition modpost: make sym_add_exported() always allocate a new symbol modpost: make multiple export error modpost: dump Module.symvers in the same order of modules.order modpost: traverse the namespace_list in order modpost: use doubly linked list for dump_lists modpost: traverse unresolved symbols in order ...
2022-05-26Merge tag 'fsnotify_for_v5.19-rc1' of ↵Linus Torvalds3-20/+21
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "The biggest part of this is support for fsnotify inode marks that don't pin inodes in memory but rather get evicted together with the inode (they are useful if userspace needs to exclude receipt of events from potentially large subtrees using fanotify ignore marks). There is also a fix for more consistent handling of events sent to parent and a fix of sparse(1) complaints" * tag 'fsnotify_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: fix incorrect fmode_t casts fsnotify: consistent behavior for parent not watching children fsnotify: introduce mark type iterator fanotify: enable "evictable" inode marks fanotify: use fsnotify group lock helpers fanotify: implement "evictable" inode marks fanotify: factor out helper fanotify_mark_update_flags() fanotify: create helper fanotify_mark_user_flags() fsnotify: allow adding an inode mark without pinning inode dnotify: use fsnotify group lock helpers nfsd: use fsnotify group lock helpers audit: use fsnotify group lock helpers inotify: use fsnotify group lock helpers fsnotify: create helpers for group mark_mutex lock fsnotify: make allow_dups a property of the group fsnotify: pass flags argument to fsnotify_alloc_group() fsnotify: fix wrong lockdep annotations inotify: move control flags from mask to mark flags inotify: show inotify mask flags in proc fdinfo
2022-05-26Merge tag 'dma-mapping-5.19-2022-05-25' of ↵Linus Torvalds4-112/+109
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - don't over-decrypt memory (Robin Murphy) - takes min align mask into account for the swiotlb max mapping size (Tianyu Lan) - use GFP_ATOMIC in dma-debug (Mikulas Patocka) - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me) - don't fail on highmem CMA pages in dma_direct_alloc_pages (me) - cleanup swiotlb initialization and share more code with swiotlb-xen (me, Stefano Stabellini) * tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping: (23 commits) dma-direct: don't over-decrypt memory swiotlb: max mapping size takes min align mask into account swiotlb: use the right nslabs-derived sizes in swiotlb_init_late swiotlb: use the right nslabs value in swiotlb_init_remap swiotlb: don't panic when the swiotlb buffer can't be allocated dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm x86: remove cruft from <asm/dma-mapping.h> swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl swiotlb: merge swiotlb-xen initialization into swiotlb swiotlb: provide swiotlb_init variants that remap the buffer swiotlb: pass a gfp_mask argument to swiotlb_init_late swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction swiotlb: make the swiotlb_init interface more useful x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled x86: remove the IOMMU table infrastructure MIPS/octeon: use swiotlb_init instead of open coding it arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region swiotlb: rename swiotlb_late_init_with_default_size ...
2022-05-26Merge tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-0/+1
Pull drm updates from Dave Airlie: "Intel have enabled DG2 on certain SKUs for laptops, AMD has started some new GPU support, msm has user allocated VA controls dma-buf: - add dma_resv_replace_fences - add dma_resv_get_singleton - make dma_excl_fence private core: - EDID parser refactorings - switch drivers to drm_mode_copy/duplicate - DRM managed mutex initialization display-helper: - put HDMI, SCDC, HDCP, DSC and DP into new module gem: - rework fence handling ttm: - rework bulk move handling - add common debugfs for resource managers - convert to kvcalloc format helpers: - support monochrome formats - RGB888, RGB565 to XRGB8888 conversions fbdev: - cfb/sys_imageblit fixes - pagelist corruption fix - create offb platform device - deferred io improvements sysfb: - Kconfig rework - support for VESA mode selection bridge: - conversions to devm_drm_of_get_bridge - conversions to panel_bridge - analogix_dp - autosuspend support - it66121 - audio support - tc358767 - DSI to DPI support - icn6211 - PLL/I2C fixes, DT property - adv7611 - enable DRM_BRIDGE_OP_HPD - anx7625 - fill ELD if no monitor - dw_hdmi - add audio support - lontium LT9211 support, i.MXMP LDB - it6505: Kconfig fix, DPCD set power fix - adv7511 - CEC support for ADV7535 panel: - ltk035c5444t, B133UAN01, NV3052C panel support - DataImage FG040346DSSWBG04 support - st7735r - DT bindings fix - ssd130x - fixes i915: - DG2 laptop PCI-IDs ("motherboard down") - Initial RPL-P PCI IDs - compute engine ABI - DG2 Tile4 support - DG2 CCS clear color compression support - DG2 render/media compression formats support - ATS-M platform info - RPL-S PCI IDs added - Bump ADL-P DMC version to v2.16 - Support static DRRS - Support multiple eDP/LVDS native mode refresh rates - DP HDR support for HSW+ - Lots of display refactoring + fixes - GuC hwconfig support and query - sysfs support for multi-tile - fdinfo per-client gpu utilisation - add geometry subslices query - fix prime mmap with LMEM - fix vm open count and remove vma refcounts - contiguous allocation fixes - steered register write support - small PCI BAR enablement - GuC error capture support - sunset igpu legacy mmap support for newer devices - GuC version 70.1.1 support amdgpu: - Initial SoC21 support - SMU 13.x enablement - SMU 13.0.4 support - ttm_eu cleanups - USB-C, GPUVM updates - TMZ fixes for RV - RAS support for VCN - PM sysfs code cleanup - DC FP rework - extend CG/PG flags to 64-bit - SI dpm lockdep fix - runtime PM fixes amdkfd: - RAS/SVM fixes - TLB flush fixes - CRIU GWS support - ignore bogus MEC signals more efficiently msm: - Fourcc modifier for tiled but not compressed layouts - Support for userspace allocated IOVA (GPU virtual address) - DPU: DSC (Display Stream Compression) support - DP: eDP support - DP: conversion to use drm_bridge and drm_bridge_connector - Merge DPU1 and MDP5 MDSS driver - DPU: writeback support nouveau: - make some structures static - make some variables static - switch to drm_gem_plane_helper_prepare_fb radeon: - misc fixes/cleanups mxsfb: - rework crtc mode setting - LCDIF CRC support etnaviv: - fencing improvements - fix address space collisions - cleanup MMU reference handling gma500: - GEM/GTT improvements - connector handling fixes komeda: - switch to plane reset helper mediatek: - MIPI DSI improvements omapdrm: - GEM improvements qxl: - aarch64 support vc4: - add a CL submission tracepoint - HDMI YUV support - HDMI/clock improvements - drop is_hdmi caching virtio: - remove restriction of non-zero blob types vmwgfx: - support for cursormob and cursorbypass 4 - fence improvements tidss: - reset DISPC on startup solomon: - SPI support - DT improvements sun4i: - allwinner D1 support - drop is_hdmi caching imx: - use swap() instead of open-coding - use devm_platform_ioremap_resource - remove redunant initializations ast: - Displayport support rockchip: - Refactor IOMMU initialisation - make some structures static - replace drm_detect_hdmi_monitor with drm_display_info.is_hdmi - support swapped YUV formats, - clock improvements - rk3568 support - VOP2 support mediatek: - MT8186 support tegra: - debugabillity improvements" * tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm: (1740 commits) drm/i915/dsi: fix VBT send packet port selection for ICL+ drm/i915/uc: Fix undefined behavior due to shift overflowing the constant drm/i915/reg: fix undefined behavior due to shift overflowing the constant drm/i915/gt: Fix use of static in macro mismatch drm/i915/audio: fix audio code enable/disable pipe logging drm/i915: Fix CFI violation with show_dynamic_id() drm/i915: Fix 'mixing different enum types' warnings in intel_display_power.c drm/i915/gt: Fix build error without CONFIG_PM drm/msm/dpu: handle pm_runtime_get_sync() errors in bind path drm/msm/dpu: add DRM_MODE_ROTATE_180 back to supported rotations drm/msm: don't free the IRQ if it was not requested drm/msm/dpu: limit writeback modes according to max_linewidth drm/amd: Don't reset dGPUs if the system is going to s2idle drm/amdgpu: Unmap legacy queue when MES is enabled drm: msm: fix possible memory leak in mdp5_crtc_cursor_set() drm/msm: Fix fb plane offset calculation drm/msm/a6xx: Fix refcount leak in a6xx_gpu_init drm/msm/dsi: don't powerup at modeset time for parade-ps8640 drm/rockchip: Change register space names in vop2 dt-bindings: display: rockchip: make reg-names mandatory for VOP2 ...
2022-05-25Merge tag 'net-next-5.19' of ↵Linus Torvalds33-683/+2705
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core ---- - Support TCPv6 segmentation offload with super-segments larger than 64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP). - Generalize skb freeing deferral to per-cpu lists, instead of per-socket lists. - Add a netdev statistic for packets dropped due to L2 address mismatch (rx_otherhost_dropped). - Continue work annotating skb drop reasons. - Accept alternative netdev names (ALT_IFNAME) in more netlink requests. - Add VLAN support for AF_PACKET SOCK_RAW GSO. - Allow receiving skb mark from the socket as a cmsg. - Enable memcg accounting for veth queues, sysctl tables and IPv6. BPF --- - Add libbpf support for User Statically-Defined Tracing (USDTs). - Speed up symbol resolution for kprobes multi-link attachments. - Support storing typed pointers to referenced and unreferenced objects in BPF maps. - Add support for BPF link iterator. - Introduce access to remote CPU map elements in BPF per-cpu map. - Allow middle-of-the-road settings for the kernel.unprivileged_bpf_disabled sysctl. - Implement basic types of dynamic pointers e.g. to allow for dynamically sized ringbuf reservations without extra memory copies. Protocols --------- - Retire port only listening_hash table, add a second bind table hashed by port and address. Avoid linear list walk when binding to very popular ports (e.g. 443). - Add bridge FDB bulk flush filtering support allowing user space to remove all FDB entries matching a condition. - Introduce accept_unsolicited_na sysctl for IPv6 to implement router-side changes for RFC9131. - Support for MPTCP path manager in user space. - Add MPTCP support for fallback to regular TCP for connections that have never connected additional subflows or transmitted out-of-sequence data (partial support for RFC8684 fallback). - Avoid races in MPTCP-level window tracking, stabilize and improve throughput. - Support lockless operation of GRE tunnels with seq numbers enabled. - WiFi support for host based BSS color collision detection. - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets. - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2). - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile). - Allow matching on the number of VLAN tags via tc-flower. - Add tracepoint for tcp_set_ca_state(). Driver API ---------- - Improve error reporting from classifier and action offload. - Add support for listing line cards in switches (devlink). - Add helpers for reporting page pool statistics with ethtool -S. - Add support for reading clock cycles when using PTP virtual clocks, instead of having the driver convert to time before reporting. This makes it possible to report time from different vclocks. - Support configuring low-latency Tx descriptor push via ethtool. - Separate Clause 22 and Clause 45 MDIO accesses more explicitly. New hardware / drivers ---------------------- - Ethernet: - Marvell's Octeon NIC PCI Endpoint support (octeon_ep) - Sunplus SP7021 SoC (sp7021_emac) - Add support for Renesas RZ/V2M (in ravb) - Add support for MediaTek mt7986 switches (in mtk_eth_soc) - Ethernet PHYs: - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting) - TI DP83TD510 PHY - Microchip LAN8742/LAN88xx PHYs - WiFi: - Driver for pureLiFi X, XL, XC devices (plfxlc) - Driver for Silicon Labs devices (wfx) - Support for WCN6750 (in ath11k) - Support Realtek 8852ce devices (in rtw89) - Mobile: - MediaTek T700 modems (Intel 5G 5000 M.2 cards) - CAN: - ctucanfd: add support for CTU CAN FD open-source IP core from Czech Technical University in Prague Drivers ------- - Delete a number of old drivers still using virt_to_bus(). - Ethernet NICs: - intel: support TSO on tunnels MPLS - broadcom: support multi-buffer XDP - nfp: support VF rate limiting - sfc: use hardware tx timestamps for more than PTP - mlx5: multi-port eswitch support - hyper-v: add support for XDP_REDIRECT - atlantic: XDP support (including multi-buffer) - macb: improve real-time perf by deferring Tx processing to NAPI - High-speed Ethernet switches: - mlxsw: implement basic line card information querying - prestera: add support for traffic policing on ingress and egress - Embedded Ethernet switches: - lan966x: add support for packet DMA (FDMA) - lan966x: add support for PTP programmable pins - ti: cpsw_new: enable bc/mc storm prevention - Qualcomm 802.11ax WiFi (ath11k): - Wake-on-WLAN support for QCA6390 and WCN6855 - device recovery (firmware restart) support - support setting Specific Absorption Rate (SAR) for WCN6855 - read country code from SMBIOS for WCN6855/QCA6390 - enable keep-alive during WoWLAN suspend - implement remain-on-channel support - MediaTek WiFi (mt76): - support Wireless Ethernet Dispatch offloading packet movement between the Ethernet switch and WiFi interfaces - non-standard VHT MCS10-11 support - mt7921 AP mode support - mt7921 IPv6 NS offload support - Ethernet PHYs: - micrel: ksz9031/ksz9131: cabletest support - lan87xx: SQI support for T1 PHYs - lan937x: add interrupt support for link detection" * tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits) ptp: ocp: Add firmware header checks ptp: ocp: fix PPS source selector debugfs reporting ptp: ocp: add .init function for sma_op vector ptp: ocp: vectorize the sma accessor functions ptp: ocp: constify selectors ptp: ocp: parameterize input/output sma selectors ptp: ocp: revise firmware display ptp: ocp: add Celestica timecard PCI ids ptp: ocp: Remove #ifdefs around PCI IDs ptp: ocp: 32-bit fixups for pci start address Revert "net/smc: fix listen processing for SMC-Rv2" ath6kl: Use cc-disable-warning to disable -Wdangling-pointer selftests/bpf: Dynptr tests bpf: Add dynptr data slices bpf: Add bpf_dynptr_read and bpf_dynptr_write bpf: Dynptr support for ring buffers bpf: Add bpf_dynptr_from_mem for local dynptrs bpf: Add verifier support for dynptrs bpf: Suppress 'passing zero to PTR_ERR' warning bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack ...
2022-05-25Merge branch 'for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-1/+1
Pull workqueue update from Tejun Heo: "A lone commit fixing CPU offline handling for per-cpu wq workers so that they don't bother isolated CPUs" * 'for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs
2022-05-25Merge branch 'for-5.19' of ↵Linus Torvalds2-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing too interesting. This adds cpu controller selftests and there are a couple code cleanup patches" * 'for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: remove the superfluous judgment cgroup: Make cgroup_debug static kseltest/cgroup: Make test_stress.sh work if run interactively kselftest/cgroup: fix test_stress.sh to use OUTPUT dir cgroup: Add config file to cgroup selftest suite cgroup: Add test_cpucg_max_nested() testcase cgroup: Add test_cpucg_max() testcase cgroup: Add test_cpucg_nested_weight_underprovisioned() testcase cgroup: Adding test_cpucg_nested_weight_overprovisioned() testcase cgroup: Add test_cpucg_weight_underprovisioned() testcase cgroup: Add test_cpucg_weight_overprovisioned() testcase cgroup: Add test_cpucg_stats() testcase to cgroup cpu selftests cgroup: Add new test_cpu.c test suite in cgroup selftests
2022-05-25Merge tag 'linux-kselftest-kunit-5.19-rc1' of ↵Linus Torvalds1-18/+13
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan: "Several fixes, cleanups, and enhancements to tests and framework: - introduce _NULL and _NOT_NULL macros to pointer error checks - rework kunit_resource allocation policy to fix memory leaks when caller doesn't specify free() function to be used when allocating memory using kunit_add_resource() and kunit_alloc_resource() funcs. - add ability to specify suite-level init and exit functions" * tag 'linux-kselftest-kunit-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (41 commits) kunit: tool: Use qemu-system-i386 for i386 runs kunit: fix executor OOM error handling logic on non-UML kunit: tool: update riscv QEMU config with new serial dependency kcsan: test: use new suite_{init,exit} support kunit: tool: Add list of all valid test configs on UML kunit: take `kunit_assert` as `const` kunit: tool: misc cleanups kunit: tool: minor cosmetic cleanups in kunit_parser.py kunit: tool: make parser stop overwriting status of suites w/ no_tests kunit: tool: remove dead parse_crash_in_log() logic kunit: tool: print clearer error message when there's no TAP output kunit: tool: stop using a shell to run kernel under QEMU kunit: tool: update test counts summary line format kunit: bail out of test filtering logic quicker if OOM lib/Kconfig.debug: change KUnit tests to default to KUNIT_ALL_TESTS kunit: Rework kunit_resource allocation policy kunit: fix debugfs code to use enum kunit_status, not bool kfence: test: use new suite_{init/exit} support, add .kunitconfig kunit: add ability to specify suite-level init and exit functions kunit: rename print_subtest_{start,end} for clarity (s/subtest/suite) ...
2022-05-25Merge tag 'printk-for-5.19' of ↵Linus Torvalds7-291/+953
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Offload writing printk() messages on consoles to per-console kthreads. It prevents soft-lockups when an extensive amount of messages is printed. It was observed, for example, during boot of large systems with a lot of peripherals like disks or network interfaces. It prevents live-lockups that were observed, for example, when messages about allocation failures were reported and a CPU handled consoles instead of reclaiming the memory. It was hard to solve even with rate limiting because it would need to take into account the amount of messages and the speed of all consoles. It is a must to have for real time. Otherwise, any printk() might break latency guarantees. The per-console kthreads allow to handle each console on its own speed. Slow consoles do not longer slow down faster ones. And printk() does not longer unpredictably slows down various code paths. There are situations when the kthreads are either not available or not reliable, for example, early boot, suspend, or panic. In these situations, printk() uses the legacy mode and tries to handle consoles immediately. - Add documentation for the printk index. * tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk, tracing: fix console tracepoint printk: remove @console_locked printk: extend console_lock for per-console locking printk: add kthread console printers printk: add functions to prefer direct printing printk: add pr_flush() printk: move buffer definitions into console_emit_next_record() caller printk: refactor and rework printing logic printk: add con_printk() macro for console details printk: call boot_delay_msec() in printk_delay() printk: get caller_id/timestamp after migration disable printk: wake waiters for safe and NMI contexts printk: wake up all waiters printk: add missing memory barrier to wake_up_klogd() printk: cpu sync always disable interrupts printk: rename cpulock functions printk/index: Printk index feature documentation MAINTAINERS: Add printk indexing maintainers on mention of printk_index
2022-05-25Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds1-3/+4
Pull page cache updates from Matthew Wilcox: - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio * tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits) nilfs2: Fix some kernel-doc comments Appoint myself page cache maintainer fs: Remove aops->freepage secretmem: Convert to free_folio nfs: Convert to free_folio orangefs: Convert to free_folio fs: Add free_folio address space operation fs: Convert drop_buffers() to use a folio fs: Change try_to_free_buffers() to take a folio jbd2: Convert release_buffer_page() to use a folio jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio reiserfs: Convert release_buffer_page() to use a folio fs: Remove last vestiges of releasepage ubifs: Convert to release_folio reiserfs: Convert to release_folio orangefs: Convert to release_folio ocfs2: Convert to release_folio nilfs2: Remove comment about releasepage nfs: Convert to release_folio jfs: Convert to release_folio ...
2022-05-25Merge tag 'pm-5.19-rc1' of ↵Linus Torvalds5-66/+49
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add support for 'artificial' Energy Models in which power numbers for different entities may be in different scales, add support for some new hardware, fix bugs and clean up code in multiple places. Specifics: - Update the Energy Model support code to allow the Energy Model to be artificial, which means that the power values may not be on a uniform scale with other devices providing power information, and update the cpufreq_cooling and devfreq_cooling thermal drivers to support artificial Energy Models (Lukasz Luba). - Make DTPM check the Energy Model type (Lukasz Luba). - Fix policy counter decrementation in cpufreq if Energy Model is in use (Pierre Gondois). - Add CPU-based scaling support to passive devfreq governor (Saravana Kannan, Chanwoo Choi). - Update the rk3399_dmc devfreq driver (Brian Norris). - Export dev_pm_ops instead of suspend() and resume() in the IIO chemical scd30 driver (Jonathan Cameron). - Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and PM-runtime counterparts (Jonathan Cameron). - Move symbol exports in the IIO chemical scd30 driver into the IIO_SCD30 namespace (Jonathan Cameron). - Avoid device PM-runtime usage count underflows (Rafael Wysocki). - Allow dynamic debug to control printing of PM messages (David Cohen). - Fix some kernel-doc comments in hibernation code (Yang Li, Haowen Bai). - Preserve ACPI-table override during hibernation (Amadeusz Sławiński). - Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson). - Make Intel RAPL power capping driver support the RaptorLake and AlderLake N processors (Zhang Rui, Sumeet Pawnikar). - Remove redundant store to value after multiply in the RAPL power capping driver (Colin Ian King). - Add AlderLake processor support to the intel_idle driver (Zhang Rui). - Fix regression leading to no genpd governor in the PSCI cpuidle driver and fix the riscv-sbi cpuidle driver to allow a genpd governor to be used (Ulf Hansson). - Fix cpufreq governor clean up code to avoid using kfree() directly to free kobject-based items (Kevin Hao). - Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe Leroy). - Make intel_pstate notify frequency invariance code when no_turbo is turned on and off (Chen Yu). - Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas Pandruvada). - Make cpufreq avoid unnecessary frequency updates due to mismatch between hardware and the frequency table (Viresh Kumar). - Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify code (Viresh Kumar). - Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the calling convention for some driver callbacks consistent (Rafael Wysocki). - Avoid accessing half-initialized cpufreq policies from the show() and store() sysfs functions (Schspa Shi). - Rearrange cpufreq_offline() to make the calling convention for some driver callbacks consistent (Schspa Shi). - Update CPPC handling in cpufreq (Pierre Gondois). - Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski). - Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf Hansson). - Improve the way genpd deals with its governors (Ulf Hansson). - Update the turbostat utility to version 2022.04.16 (Len Brown, Dan Merillat, Sumeet Pawnikar, Zephaniah E. Loss-Cutler-Hull, Chen Yu)" * tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (94 commits) PM: domains: Trust domain-idle-states from DT to be correct by genpd PM: domains: Measure power-on/off latencies in genpd based on a governor PM: domains: Allocate governor data dynamically based on a genpd governor PM: domains: Clean up some code in pm_genpd_init() and genpd_remove() PM: domains: Fix initialization of genpd's next_wakeup PM: domains: Fixup QoS latency measurements for IRQ safe devices in genpd PM: domains: Measure suspend/resume latencies in genpd based on governor PM: domains: Move the next_wakeup variable into the struct gpd_timing_data PM: domains: Allocate gpd_timing_data dynamically based on governor PM: domains: Skip another warning in irq_safe_dev_in_sleep_domain() PM: domains: Rename irq_safe_dev_in_no_sleep_domain() in genpd PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd PM: domains: Drop redundant code for genpd always-on governor PM: domains: Add GENPD_FLAG_RPM_ALWAYS_ON for the always-on governor powercap: intel_rapl: remove redundant store to value after multiply cpufreq: CPPC: Enable dvfs_possible_from_any_cpu cpufreq: CPPC: Enable fast_switch ACPI: CPPC: Assume no transition latency if no PCCT ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported ACPI: CPPC: Check _OSC for flexible address space ...
2022-05-24Merge tag 'seccomp-v5.19-rc1' of ↵Linus Torvalds1-3/+41
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - Rework USER_NOTIF notification ordering and kill logic (Sargun Dhillon) - Improved PTRACE_O_SUSPEND_SECCOMP selftest (Jann Horn) - Gracefully handle failed unshare() in selftests (Yang Guang) - Spelling fix (Colin Ian King) * tag 'seccomp-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests/seccomp: Fix spelling mistake "Coud" -> "Could" selftests/seccomp: Add test for wait killable notifier selftests/seccomp: Refactor get_proc_stat to split out file reading code seccomp: Add wait_killable semantic to seccomp user notifier selftests/seccomp: Ensure that notifications come in FIFO order seccomp: Use FIFO semantics to order notifications selftests/seccomp: Add SKIP for failed unshare() selftests/seccomp: Test PTRACE_O_SUSPEND_SECCOMP without CAP_SYS_ADMIN
2022-05-24Merge tag 'kernel-hardening-v5.19-rc1' of ↵Linus Torvalds2-43/+64
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening updates from Kees Cook: - usercopy hardening expanded to check other allocation types (Matthew Wilcox, Yuanzheng Song) - arm64 stackleak behavioral improvements (Mark Rutland) - arm64 CFI code gen improvement (Sami Tolvanen) - LoadPin LSM block dev API adjustment (Christoph Hellwig) - Clang randstruct support (Bill Wendling, Kees Cook) * tag 'kernel-hardening-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (34 commits) loadpin: stop using bdevname mm: usercopy: move the virt_addr_valid() below the is_vmalloc_addr() gcc-plugins: randstruct: Remove cast exception handling af_unix: Silence randstruct GCC plugin warning niu: Silence randstruct warnings big_keys: Use struct for internal payload gcc-plugins: Change all version strings match kernel randomize_kstack: Improve docs on requirements/rationale lkdtm/stackleak: fix CONFIG_GCC_PLUGIN_STACKLEAK=n arm64: entry: use stackleak_erase_on_task_stack() stackleak: add on/off stack variants lkdtm/stackleak: check stack boundaries lkdtm/stackleak: prevent unexpected stack usage lkdtm/stackleak: rework boundary management lkdtm/stackleak: avoid spurious failure stackleak: rework poison scanning stackleak: rework stack high bound handling stackleak: clarify variable names stackleak: rework stack low bound handling stackleak: remove redundant check ...
2022-05-24Merge tag 'random-5.19-rc1-for-linus' of ↵Linus Torvalds2-2/+15
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "These updates continue to refine the work began in 5.17 and 5.18 of modernizing the RNG's crypto and streamlining and documenting its code. New for 5.19, the updates aim to improve entropy collection methods and make some initial decisions regarding the "premature next" problem and our threat model. The cloc utility now reports that random.c is 931 lines of code and 466 lines of comments, not that basic metrics like that mean all that much, but at the very least it tells you that this is very much a manageable driver now. Here's a summary of the various updates: - The random_get_entropy() function now always returns something at least minimally useful. This is the primary entropy source in most collectors, which in the best case expands to something like RDTSC, but prior to this change, in the worst case it would just return 0, contributing nothing. For 5.19, additional architectures are wired up, and architectures that are entirely missing a cycle counter now have a generic fallback path, which uses the highest resolution clock available from the timekeeping subsystem. Some of those clocks can actually be quite good, despite the CPU not having a cycle counter of its own, and going off-core for a stamp is generally thought to increase jitter, something positive from the perspective of entropy gathering. Done very early on in the development cycle, this has been sitting in next getting some testing for a while now and has relevant acks from the archs, so it should be pretty well tested and fine, but is nonetheless the thing I'll be keeping my eye on most closely. - Of particular note with the random_get_entropy() improvements is MIPS, which, on CPUs that lack the c0 count register, will now combine the high-speed but short-cycle c0 random register with the lower-speed but long-cycle generic fallback path. - With random_get_entropy() now always returning something useful, the interrupt handler now collects entropy in a consistent construction. - Rather than comparing two samples of random_get_entropy() for the jitter dance, the algorithm now tests many samples, and uses the amount of differing ones to determine whether or not jitter entropy is usable and how laborious it must be. The problem with comparing only two samples was that if the cycle counter was extremely slow, but just so happened to be on the cusp of a change, the slowness wouldn't be detected. Taking many samples fixes that to some degree. This, combined with the other improvements to random_get_entropy(), should make future unification of /dev/random and /dev/urandom maybe more possible. At the very least, were we to attempt it again today (we're not), it wouldn't break any of Guenter's test rigs that broke when we tried it with 5.18. So, not today, but perhaps down the road, that's something we can revisit. - We attempt to reseed the RNG immediately upon waking up from system suspend or hibernation, making use of the various timestamps about suspend time and such available, as well as the usual inputs such as RDRAND when available. - Batched randomness now falls back to ordinary randomness before the RNG is initialized. This provides more consistent guarantees to the types of random numbers being returned by the various accessors. - The "pre-init injection" code is now gone for good. I suspect you in particular will be happy to read that, as I recall you expressing your distaste for it a few months ago. Instead, to avoid a "premature first" issue, while still allowing for maximal amount of entropy availability during system boot, the first 128 bits of estimated entropy are used immediately as it arrives, with the next 128 bits being buffered. And, as before, after the RNG has been fully initialized, it winds up reseeding anyway a few seconds later in most cases. This resulted in a pretty big simplification of the initialization code and let us remove various ad-hoc mechanisms like the ugly crng_pre_init_inject(). - The RNG no longer pretends to handle the "premature next" security model, something that various academics and other RNG designs have tried to care about in the past. After an interesting mailing list thread, these issues are thought to be a) mainly academic and not practical at all, and b) actively harming the real security of the RNG by delaying new entropy additions after a potential compromise, making a potentially bad situation even worse. As well, in the first place, our RNG never even properly handled the premature next issue, so removing an incomplete solution to a fake problem was particularly nice. This allowed for numerous other simplifications in the code, which is a lot cleaner as a consequence. If you didn't see it before, https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/ may be a thread worth skimming through. - While the interrupt handler received a separate code path years ago that avoids locks by using per-cpu data structures and a faster mixing algorithm, in order to reduce interrupt latency, input and disk events that are triggered in hardirq handlers were still hitting locks and more expensive algorithms. Those are now redirected to use the faster per-cpu data structures. - Rather than having the fake-crypto almost-siphash-based random32 implementation be used right and left, and in many places where cryptographically secure randomness is desirable, the batched entropy code is now fast enough to replace that. - As usual, numerous code quality and documentation cleanups. For example, the initialization state machine now uses enum symbolic constants instead of just hard coding numbers everywhere. - Since the RNG initializes once, and then is always initialized thereafter, a pretty heavy amount of code used during that initialization is never used again. It is now completely cordoned off using static branches and it winds up in the .text.unlikely section so that it doesn't reduce cache compactness after the RNG is ready. - A variety of functions meant for waiting on the RNG to be initialized were only used by vsprintf, and in not a particularly optimal way. Replacing that usage with a more ordinary setup made it possible to remove those functions. - A cleanup of how we warn userspace about the use of uninitialized /dev/urandom and uninitialized get_random_bytes() usage. Interestingly, with the change you merged for 5.18 that attempts to use jitter (but does not block if it can't), the majority of users should never see those warnings for /dev/urandom at all now, and the one for in-kernel usage is mainly a debug thing. - The file_operations struct for /dev/[u]random now implements .read_iter and .write_iter instead of .read and .write, allowing it to also implement .splice_read and .splice_write, which makes splice(2) work again after it was broken here (and in many other places in the tree) during the set_fs() removal. This was a bit of a last minute arrival from Jens that hasn't had as much time to bake, so I'll be keeping my eye on this as well, but it seems fairly ordinary. Unfortunately, read_iter() is around 3% slower than read() in my tests, which I'm not thrilled about. But Jens and Al, spurred by this observation, seem to be making progress in removing the bottlenecks on the iter paths in the VFS layer in general, which should remove the performance gap for all drivers. - Assorted other bug fixes, cleanups, and optimizations. - A small SipHash cleanup" * tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (49 commits) random: check for signals after page of pool writes random: wire up fops->splice_{read,write}_iter() random: convert to using fops->write_iter() random: convert to using fops->read_iter() random: unify batched entropy implementations random: move randomize_page() into mm where it belongs random: remove mostly unused async readiness notifier random: remove get_random_bytes_arch() and add rng_has_arch_random() random: move initialization functions out of hot pages random: make consistent use of buf and len random: use proper return types on get_random_{int,long}_wait() random: remove extern from functions in header random: use static branch for crng_ready() random: credit architectural init the exact amount random: handle latent entropy and command line from random_init() random: use proper jiffies comparison macro random: remove ratelimiting for in-kernel unseeded randomness random: move initialization out of reseeding hot path random: avoid initializing twice in credit race random: use symbolic constants for crng_init states ...
2022-05-24lockdown: also lock down previous kgdb useDaniel Thompson2-3/+83
KGDB and KDB allow read and write access to kernel memory, and thus should be restricted during lockdown. An attacker with access to a serial port (for example, via a hypervisor console, which some cloud vendors provide over the network) could trigger the debugger so it is important that the debugger respect the lockdown mode when/if it is triggered. Fix this by integrating lockdown into kdb's existing permissions mechanism. Unfortunately kgdb does not have any permissions mechanism (although it certainly could be added later) so, for now, kgdb is simply and brutally disabled by immediately exiting the gdb stub without taking any action. For lockdowns established early in the boot (e.g. the normal case) then this should be fine but on systems where kgdb has set breakpoints before the lockdown is enacted than "bad things" will happen. CVE: CVE-2022-21499 Co-developed-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-24Merge tag 'sched-core-2022-05-23' of ↵Linus Torvalds16-299/+190
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Updates to scheduler metrics: - PELT fixes & enhancements - PSI fixes & enhancements - Refactor cpu_util_without() - Updates to instrumentation/debugging: - Remove sched_trace_*() helper functions - can be done via debug info - Fix double update_rq_clock() warnings - Introduce & use "preemption model accessors" to simplify some of the Kconfig complexity. - Make softirq handling RT-safe. - Misc smaller fixes & cleanups. * tag 'sched-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: topology: Remove unused cpu_cluster_mask() sched: Reverse sched_class layout sched/deadline: Remove superfluous rq clock update in push_dl_task() sched/core: Avoid obvious double update_rq_clock warning smp: Make softirq handling RT safe in flush_smp_call_function_queue() smp: Rename flush_smp_call_function_from_idle() sched: Fix missing prototype warnings sched/fair: Remove cfs_rq_tg_path() sched/fair: Remove sched_trace_*() helper functions sched/fair: Refactor cpu_util_without() sched/fair: Revise comment about lb decision matrix sched/psi: report zeroes for CPU full at the system level sched/fair: Delete useless condition in tg_unthrottle_up() sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq sched/fair: Move calculate of avg_load to a better location mailmap: Update my email address to @redhat.com MAINTAINERS: Add myself as scheduler topology reviewer psi: Fix trigger being fired unexpectedly at initial ftrace: Use preemption model accessors for trace header printout kcsan: Use preemption model accessors
2022-05-24Merge tag 'perf-core-2022-05-23' of ↵Linus Torvalds2-4/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: "Platform PMU changes: - x86/intel: - Add new Intel Alder Lake and Raptor Lake support - x86/amd: - AMD Zen4 IBS extensions support - Add AMD PerfMonV2 support - Add AMD Fam19h Branch Sampling support Generic changes: - signal: Deliver SIGTRAP on perf event asynchronously if blocked Perf instrumentation can be driven via SIGTRAP, but this causes a problem when SIGTRAP is blocked by a task & terminate the task. Allow user-space to request these signals asynchronously (after they get unblocked) & also give the information to the signal handler when this happens: "To give user space the ability to clearly distinguish synchronous from asynchronous signals, introduce siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is required in future). The resolution to the problem is then to (a) no longer force the signal (avoiding the terminations), but (b) tell user space via si_perf_flags if the signal was synchronous or not, so that such signals can be handled differently (e.g. let user space decide to ignore or consider the data imprecise). " - Unify/standardize the /sys/devices/cpu/events/* output format. - Misc fixes & cleanups" * tag 'perf-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) perf/x86/amd/core: Fix reloading events for SVM perf/x86/amd: Run AMD BRS code only on supported hw perf/x86/amd: Fix AMD BRS period adjustment perf/x86/amd: Remove unused variable 'hwc' perf/ibs: Fix comment perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute perf/amd/ibs: Add support for L3 miss filtering perf/amd/ibs: Use ->is_visible callback for dynamic attributes perf/amd/ibs: Cascade pmu init functions' return value perf/x86/uncore: Add new Alder Lake and Raptor Lake support perf/x86/uncore: Clean up uncore_pci_ids[] perf/x86/cstate: Add new Alder Lake and Raptor Lake support perf/x86/msr: Add new Alder Lake and Raptor Lake support perf/x86: Add new Alder Lake and Raptor Lake support perf/amd/ibs: Use interrupt regs ip for stack unwinding perf/x86/amd/core: Add PerfMonV2 overflow handling perf/x86/amd/core: Add PerfMonV2 counter control perf/x86/amd/core: Detect available counters perf/x86/amd/core: Detect PerfMonV2 support x86/msr: Add PerfCntrGlobal* registers ...
2022-05-24Merge tag 'objtool-core-2022-05-23' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Comprehensive interface overhaul: ================================= Objtool's interface has some issues: - Several features are done unconditionally, without any way to turn them off. Some of them might be surprising. This makes objtool tricky to use, and prevents porting individual features to other arches. - The config dependencies are too coarse-grained. Objtool enablement is tied to CONFIG_STACK_VALIDATION, but it has several other features independent of that. - The objtool subcmds ("check" and "orc") are clumsy: "check" is really a subset of "orc", so it has all the same options. The subcmd model has never really worked for objtool, as it only has a single purpose: "do some combination of things on an object file". - The '--lto' and '--vmlinux' options are nonsensical and have surprising behavior. Overhaul the interface: - get rid of subcmds - make all features individually selectable - remove and/or clarify confusing/obsolete options - update the documentation - fix some bugs found along the way - Fix x32 regression - Fix Kbuild cleanup bugs - Add scripts/objdump-func helper script to disassemble a single function from an object file. - Rewrite scripts/faddr2line to be section-aware, by basing it on 'readelf', moving it away from 'nm', which doesn't handle multiple sections well, which can result in decoding failure. - Rewrite & fix symbol handling - which had a number of bugs wrt. object files that don't have global symbols - which is rare but possible. Also fix a bunch of symbol handling bugs found along the way. * tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) objtool: Fix objtool regression on x32 systems objtool: Fix symbol creation scripts/faddr2line: Fix overlapping text section failures scripts: Create objdump-func helper script objtool: Remove libsubcmd.a when make clean objtool: Remove inat-tables.c when make clean objtool: Update documentation objtool: Remove --lto and --vmlinux in favor of --link objtool: Add HAVE_NOINSTR_VALIDATION objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION" objtool: Make noinstr hacks optional objtool: Make jump label hack optional objtool: Make static call annotation optional objtool: Make stack validation frame-pointer-specific objtool: Add CONFIG_OBJTOOL objtool: Extricate sls from stack validation objtool: Rework ibt and extricate from stack validation objtool: Make stack validation optional objtool: Add option to print section addresses objtool: Don't print parentheses in function addresses ...
2022-05-24Merge tag 'locking-core-2022-05-23' of ↵Linus Torvalds14-82/+167
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - rwsem cleanups & optimizations/fixes: - Conditionally wake waiters in reader/writer slowpaths - Always try to wake waiters in out_nolock path - Add try_cmpxchg64() implementation, with arch optimizations - and use it to micro-optimize sched_clock_{local,remote}() - Various force-inlining fixes to address objdump instrumentation-check warnings - Add lock contention tracepoints: lock:contention_begin lock:contention_end - Misc smaller fixes & cleanups * tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote} locking/atomic/x86: Introduce arch_try_cmpxchg64 locking/atomic: Add generic try_cmpxchg64 support futex: Remove a PREEMPT_RT_FULL reference. locking/qrwlock: Change "queue rwlock" to "queued rwlock" lockdep: Delete local_irq_enable_in_hardirq() locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning locking: Apply contention tracepoints in the slow path locking: Add lock contention tracepoints locking/rwsem: Always try to wake waiters in out_nolock path locking/rwsem: Conditionally wake waiters in reader/writer slowpaths locking/rwsem: No need to check for handoff bit if wait queue empty lockdep: Fix -Wunused-parameter for _THIS_IP_ x86/mm: Force-inline __phys_addr_nodebug() x86/kvm/svm: Force-inline GHCB accessors task_stack, x86/cea: Force-inline stack helpers
2022-05-24kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCSMasahiro Yamada1-9/+1
include/{linux,asm-generic}/export.h defines a weak symbol, __crc_* as a placeholder. Genksyms writes the version CRCs into the linker script, which will be used for filling the __crc_* symbols. The linker script format depends on CONFIG_MODULE_REL_CRCS. If it is enabled, __crc_* holds the offset to the reference of CRC. It is time to get rid of this complexity. Now that modpost parses text files (.*.cmd) to collect all the CRCs, it can generate C code that will be linked to the vmlinux or modules. Generate a new C file, .vmlinux.export.c, which contains the CRCs of symbols exported by vmlinux. It is compiled and linked to vmlinux in scripts/link-vmlinux.sh. Put the CRCs of symbols exported by modules into the existing *.mod.c files. No additional build step is needed for modules. As before, *.mod.c are compiled and linked to *.ko in scripts/Makefile.modfinal. No linker magic is used here. The new C implementation works in the same way, whether CONFIG_RELOCATABLE is enabled or not. CONFIG_MODULE_REL_CRCS is no longer needed. Previously, Kbuild invoked additional $(LD) to update the CRCs in objects, but this step is unneeded too. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nicolas Schier <nicolas@fjasle.eu> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
2022-05-24Merge tag 'arm64-upstream' of ↵Linus Torvalds3-2/+31
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Initial support for the ARMv9 Scalable Matrix Extension (SME). SME takes the approach used for vectors in SVE and extends this to provide architectural support for matrix operations. No KVM support yet, SME is disabled in guests. - Support for crashkernel reservations above ZONE_DMA via the 'crashkernel=X,high' command line option. - btrfs search_ioctl() fix for live-lock with sub-page faults. - arm64 perf updates: support for the Hisilicon "CPA" PMU for monitoring coherent I/O traffic, support for Arm's CMN-650 and CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup. - Kselftest updates for SME, BTI, MTE. - Automatic generation of the system register macros from a 'sysreg' file describing the register bitfields. - Update the type of the function argument holding the ESR_ELx register value to unsigned long to match the architecture register size (originally 32-bit but extended since ARMv8.0). - stacktrace cleanups. - ftrace cleanups. - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(), avoid executable mappings in kexec/hibernate code, drop TLB flushing from get_clear_flush() (and rename it to get_clear_contig()), ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits) arm64/sysreg: Generate definitions for FAR_ELx arm64/sysreg: Generate definitions for DACR32_EL2 arm64/sysreg: Generate definitions for CSSELR_EL1 arm64/sysreg: Generate definitions for CPACR_ELx arm64/sysreg: Generate definitions for CONTEXTIDR_ELx arm64/sysreg: Generate definitions for CLIDR_EL1 arm64/sve: Move sve_free() into SVE code section arm64: Kconfig.platforms: Add comments arm64: Kconfig: Fix indentation and add comments arm64: mm: avoid writable executable mappings in kexec/hibernate code arm64: lds: move special code sections out of kernel exec segment arm64/hugetlb: Implement arm64 specific huge_ptep_get() arm64/hugetlb: Use ptep_get() to get the pte value of a huge page arm64: kdump: Do not allocate crash low memory if not needed arm64/sve: Generate ZCR definitions arm64/sme: Generate defintions for SVCR arm64/sme: Generate SMPRI_EL1 definitions arm64/sme: Automatically generate SMPRIMAP_EL2 definitions arm64/sme: Automatically generate SMIDR_EL1 defines arm64/sme: Automatically generate defines for SMCR ...
2022-05-24Merge tag 's390-5.19-1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Make use of the IBM z16 processor activity instrumentation facility to count cryptography operations: add a new PMU device driver so that perf can make use of this. - Add new IBM z16 extended counter set to cpumf support. - Add vdso randomization support. - Add missing KCSAN instrumentation to barriers and spinlocks, which should make s390's KCSAN support complete. - Add support for IPL-complete-control facility: notify the hypervisor that kexec finished work and the kernel starts. - Improve error logging for PCI. - Various small changes to workaround llvm's integrated assembler limitations, and one bug, to make it finally possible to compile the kernel with llvm's integrated assembler. This also requires to raise the minimum clang version to 14.0.0. - Various other small enhancements, bug fixes, and cleanups all over the place. * tag 's390-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits) s390/head: get rid of 31 bit leftovers scripts/min-tool-version.sh: raise minimum clang version to 14.0.0 for s390 s390/boot: do not emit debug info for assembly with llvm's IAS s390/boot: workaround llvm IAS bug s390/purgatory: workaround llvm's IAS limitations s390/entry: workaround llvm's IAS limitations s390/alternatives: remove padding generation code s390/alternatives: provide identical sized orginal/alternative sequences s390/cpumf: add new extended counter set for IBM z16 s390/preempt: disable __preempt_count_add() optimization for PROFILE_ALL_BRANCHES s390/stp: clock_delta should be signed s390/stp: fix todoff size s390/pai: add support for cryptography counters entry: Rename arch_check_user_regs() to arch_enter_from_user_mode() s390/compat: cleanup compat_linux.h header file s390/entry: remove broken and not needed code s390/boot: convert parmarea to C s390/boot: convert initial lowcore to C s390/ptrace: move short psw definitions to ptrace header file s390/head: initialize all new psws ...
2022-05-24Merge tag 'platform-drivers-x86-v5.19-1' of ↵Linus Torvalds1-0/+21
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: "This includes some small changes to kernel/stop_machine.c and arch/x86 which are deps of the new Intel IFS support. Highlights: - New drivers: - Intel "In Field Scan" (IFS) support - Winmate FM07/FM07P buttons - Mellanox SN2201 support - AMD PMC driver enhancements - Lots of various other small fixes and hardware-id additions" * tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (54 commits) platform/x86/intel/ifs: Add CPU_SUP_INTEL dependency platform/x86: intel_cht_int33fe: Set driver data platform/x86: intel-hid: fix _DSM function index handling platform/x86: toshiba_acpi: use kobj_to_dev() platform/x86: samsung-laptop: use kobj_to_dev() platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu tools/power/x86/intel-speed-select: Display error on turbo mode disabled Documentation: In-Field Scan platform/x86/intel/ifs: add ABI documentation for IFS trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations platform/x86/intel/ifs: Add IFS sysfs interface platform/x86/intel/ifs: Add scan test support platform/x86/intel/ifs: Authenticate and copy to secured memory platform/x86/intel/ifs: Check IFS Image sanity platform/x86/intel/ifs: Read IFS firmware image platform/x86/intel/ifs: Add stub driver for In-Field Scan stop_machine: Add stop_core_cpuslocked() for per-core operations x86/msr-index: Define INTEGRITY_CAPABILITIES MSR x86/microcode/intel: Expose collect_cpu_info_early() for IFS ...
2022-05-24Merge tag 'x86_splitlock_for_v5.19_rc1' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 splitlock updates from Borislav Petkov: - Add Raptor Lake to the set of CPU models which support splitlock - Make life miserable for apps using split locks by slowing them down considerably while the rest of the system remains responsive. The hope is it will hurt more and people will really fix their misaligned locks apps. As a result, free a TIF bit. * tag 'x86_splitlock_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/split_lock: Enable the split lock feature on Raptor Lake x86/split-lock: Remove unused TIF_SLD bit x86/split_lock: Make life miserable for split lockers
2022-05-24Merge tag 'x86_core_for_v5.19_rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Borislav Petkov: - Remove all the code around GS switching on 32-bit now that it is not needed anymore - Other misc improvements * tag 'x86_core_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: bug: Use normal relative pointers in 'struct bug_entry' x86/nmi: Make register_nmi_handler() more robust x86/asm: Merge load_gs_index() x86/32: Remove lazy GS macros ELF: Remove elf_core_copy_kernel_regs() x86/32: Simplify ELF_CORE_COPY_REGS
2022-05-24Merge tag 'x86_build_for_v5.19_rc1' of ↵Linus Torvalds1-0/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Borislav Petkov: - Add a "make x86_debug.config" target which enables a bunch of useful config debug options when trying to debug an issue - A gcc-12 build warnings fix * tag 'x86_build_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Wrap literal addresses in absolute_pointer() x86/configs: Add x86 debugging Kconfig fragment plus docs
2022-05-24Merge tag 'x86_tdx_for_v5.19_rc1' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull Intel TDX support from Borislav Petkov: "Intel Trust Domain Extensions (TDX) support. This is the Intel version of a confidential computing solution called Trust Domain Extensions (TDX). This series adds support to run the kernel as part of a TDX guest. It provides similar guest protections to AMD's SEV-SNP like guest memory and register state encryption, memory integrity protection and a lot more. Design-wise, it differs from AMD's solution considerably: it uses a software module which runs in a special CPU mode called (Secure Arbitration Mode) SEAM. As the name suggests, this module serves as sort of an arbiter which the confidential guest calls for services it needs during its lifetime. Just like AMD's SNP set, this series reworks and streamlines certain parts of x86 arch code so that this feature can be properly accomodated" * tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/tdx: Fix RETs in TDX asm x86/tdx: Annotate a noreturn function x86/mm: Fix spacing within memory encryption features message x86/kaslr: Fix build warning in KASLR code in boot stub Documentation/x86: Document TDX kernel architecture ACPICA: Avoid cache flush inside virtual machines x86/tdx/ioapic: Add shared bit for IOAPIC base address x86/mm: Make DMA memory shared for TD guest x86/mm/cpa: Add support for TDX shared memory x86/tdx: Make pages shared in ioremap() x86/topology: Disable CPU online/offline control for TDX guests x86/boot: Avoid #VE during boot for TDX platforms x86/boot: Set CR0.NE early and keep it set during the boot x86/acpi/x86/boot: Add multiprocessor wake-up support x86/boot: Add a trampoline for booting APs via firmware handoff x86/tdx: Wire up KVM hypercalls x86/tdx: Port I/O: Add early boot support x86/tdx: Port I/O: Add runtime hypercalls x86/boot: Port I/O: Add decompression-time support for TDX x86/boot: Port I/O: Allow to hook up alternative helpers ...
2022-05-24Merge tag 'timers-core-2022-05-23' of ↵Linus Torvalds7-53/+121
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and timekeeping updates from Thomas Gleixner: - Expose CLOCK_TAI to instrumentation to aid with TSN debugging. - Ensure that the clockevent is stopped when there is no timer armed to avoid pointless wakeups. - Make the sched clock frequency handling and rounding consistent. - Provide a better debugobject hint for delayed works. The timer callback is always the same, which makes it difficult to identify the underlying work. Use the work function as a hint instead. - Move the timer specific sysctl code into the timer subsystem. - The usual set of improvements and cleanups * tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Provide a better debugobjects hint for delayed works time/sched_clock: Fix formatting of frequency reporting code time/sched_clock: Use Hz as the unit for clock rate reporting below 4kHz time/sched_clock: Round the frequency reported to nearest rather than down timekeeping: Consolidate fast timekeeper timekeeping: Annotate ktime_get_boot_fast_ns() with data_race() timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped timekeeping: Introduce fast accessor to clock tai tracing/timer: Add missing argument documentation of trace points clocksource: Replace cpumask_weight() with cpumask_empty() timers: Move timer sysctl into the timer code clockevents: Use dedicated list iterator variable timers: Simplify calc_index() timers: Initialize base::next_expiry_recalc in timers_prepare_cpu()
2022-05-24Merge tag 'irq-core-2022-05-23' of ↵Linus Torvalds7-20/+46
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt handling updates from Thomas Gleixner: "Core code: - Make the managed interrupts more robust by shutting them down in the core code when the assigned affinity mask does not contain online CPUs. - Make the irq simulator chip work on RT - A small set of cpumask and power manageent cleanups Drivers: - A set of changes which mark GPIO interrupt chips immutable to prevent the GPIO subsystem from modifying it under the hood. This provides the necessary infrastructure and converts a set of GPIO and pinctrl drivers over. - A set of changes to make the pseudo-NMI handling for GICv3 more robust: a missing barrier and consistent handling of the priority mask. - Another set of GICv3 improvements and fixes, but nothing outstanding - The usual set of improvements and cleanups all over the place - No new irqchip drivers and not even a new device tree binding! 100+ interrupt chips are truly enough" * tag 'irq-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) irqchip: Add Kconfig symbols for sunxi drivers irqchip/gic-v3: Fix priority mask handling irqchip/gic-v3: Refactor ISB + EOIR at ack time irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling genirq/irq_sim: Make the irq_work always run in hard irq context irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x irqchip/gic: Improved warning about incorrect type irqchip/csky: Return true/false (not 1/0) from bool functions irqchip/imx-irqsteer: Add runtime PM support irqchip/imx-irqsteer: Constify irq_chip struct irqchip/armada-370-xp: Enable MSI affinity configuration irqchip/aspeed-scu-ic: Fix irq_of_parse_and_map() return value irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value irqchip/sun6i-r: Use NULL for chip_data irqchip/xtensa-mx: Fix initial IRQ affinity in non-SMP setup irqchip/exiu: Fix acknowledgment of edge triggered interrupts irqchip/gic-v3: Claim iomem resources dt-bindings: interrupt-controller: arm,gic-v3: Make the v2 compat requirements explicit irqchip/gic-v3: Relax polling of GIC{R,D}_CTLR.RWP irqchip/gic-v3: Detect LPI invalidation MMIO registers ...
2022-05-24Merge tag 'smp-core-2022-05-23' of ↵Linus Torvalds2-9/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: - Initialize the per-CPU structures during early boot so that the state is consistent from the very beginning. - Make the virtualization hotplug state handling more robust and let the core bringup CPUs which timed out in an earlier attempt again. - Make the x86/xen CPU state tracking consistent on a failed online attempt, so a consecutive bringup does not fall over the inconsistent state. * tag 'smp-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Initialise all cpuhp_cpu_state structs earlier cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again. x86/xen: Allow to retry if cpu_initialize_context() failed.
2022-05-24Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski17-198/+1120
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-05-23 We've added 113 non-merge commits during the last 26 day(s) which contain a total of 121 files changed, 7425 insertions(+), 1586 deletions(-). The main changes are: 1) Speed up symbol resolution for kprobes multi-link attachments, from Jiri Olsa. 2) Add BPF dynamic pointer infrastructure e.g. to allow for dynamically sized ringbuf reservations without extra memory copies, from Joanne Koong. 3) Big batch of libbpf improvements towards libbpf 1.0 release, from Andrii Nakryiko. 4) Add BPF link iterator to traverse links via seq_file ops, from Dmitrii Dolgov. 5) Add source IP address to BPF tunnel key infrastructure, from Kaixi Fan. 6) Refine unprivileged BPF to disable only object-creating commands, from Alan Maguire. 7) Fix JIT blinding of ld_imm64 when they point to subprogs, from Alexei Starovoitov. 8) Add BPF access to mptcp_sock structures and their meta data, from Geliang Tang. 9) Add new BPF helper for access to remote CPU's BPF map elements, from Feng Zhou. 10) Allow attaching 64-bit cookie to BPF link of fentry/fexit/fmod_ret, from Kui-Feng Lee. 11) Follow-ups to typed pointer support in BPF maps, from Kumar Kartikeya Dwivedi. 12) Add busy-poll test cases to the XSK selftest suite, from Magnus Karlsson. 13) Improvements in BPF selftest test_progs subtest output, from Mykola Lysenko. 14) Fill bpf_prog_pack allocator areas with illegal instructions, from Song Liu. 15) Add generic batch operations for BPF map-in-map cases, from Takshak Chahande. 16) Make bpf_jit_enable more user friendly when permanently on 1, from Tiezhu Yang. 17) Fix an array overflow in bpf_trampoline_get_progs(), from Yuntao Wang. ==================== Link: https://lore.kernel.org/r/20220523223805.27931-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-24bpf: Add dynptr data slicesJoanne Koong2-0/+51
This patch adds a new helper function void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len); which returns a pointer to the underlying data of a dynptr. *len* must be a statically known value. The bpf program may access the returned data slice as a normal buffer (eg can do direct reads and writes), since the verifier associates the length with the returned pointer, and enforces that no out of bounds accesses occur. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220523210712.3641569-6-joannelkoong@gmail.com
2022-05-24bpf: Add bpf_dynptr_read and bpf_dynptr_writeJoanne Koong1-0/+78
This patch adds two helper functions, bpf_dynptr_read and bpf_dynptr_write: long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset); long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len); The dynptr passed into these functions must be valid dynptrs that have been initialized. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220523210712.3641569-5-joannelkoong@gmail.com
2022-05-24bpf: Dynptr support for ring buffersJoanne Koong3-7/+137
Currently, our only way of writing dynamically-sized data into a ring buffer is through bpf_ringbuf_output but this incurs an extra memcpy cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra memcpy, but it can only safely support reservation sizes that are statically known since the verifier cannot guarantee that the bpf program won’t access memory outside the reserved space. The bpf_dynptr abstraction allows for dynamically-sized ring buffer reservations without the extra memcpy. There are 3 new APIs: long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); These closely follow the functionalities of the original ringbuf APIs. For example, all ringbuffer dynptrs that have been reserved must be either submitted or discarded before the program exits. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20220523210712.3641569-4-joannelkoong@gmail.com
2022-05-24bpf: Add bpf_dynptr_from_mem for local dynptrsJoanne Koong2-0/+71
This patch adds a new api bpf_dynptr_from_mem: long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr); which initializes a dynptr to point to a bpf program's local memory. For now only local memory that is of reg type PTR_TO_MAP_VALUE is supported. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220523210712.3641569-3-joannelkoong@gmail.com
2022-05-24bpf: Add verifier support for dynptrsJoanne Koong1-3/+185
This patch adds the bulk of the verifier work for supporting dynamic pointers (dynptrs) in bpf. A bpf_dynptr is opaque to the bpf program. It is a 16-byte structure defined internally as: struct bpf_dynptr_kern { void *data; u32 size; u32 offset; } __aligned(8); The upper 8 bits of *size* is reserved (it contains extra metadata about read-only status and dynptr type). Consequently, a dynptr only supports memory less than 16 MB. There are different types of dynptrs (eg malloc, ringbuf, ...). In this patchset, the most basic one, dynptrs to a bpf program's local memory, is added. For now only local memory that is of reg type PTR_TO_MAP_VALUE is supported. In the verifier, dynptr state information will be tracked in stack slots. When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR | MEM_UNINIT), the stack slots corresponding to the frame pointer where the dynptr resides at are marked STACK_DYNPTR. For helper functions that take in initialized dynptrs (eg bpf_dynptr_read + bpf_dynptr_write which are added later in this patchset), the verifier enforces that the dynptr has been initialized properly by checking that their corresponding stack slots have been marked as STACK_DYNPTR. The 6th patch in this patchset adds test cases that the verifier should successfully reject, such as for example attempting to use a dynptr after doing a direct write into it inside the bpf program. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20220523210712.3641569-2-joannelkoong@gmail.com
2022-05-24bpf: Suppress 'passing zero to PTR_ERR' warningKumar Kartikeya Dwivedi1-1/+1
Kernel Test Robot complains about passing zero to PTR_ERR for the said line, suppress it by using PTR_ERR_OR_ZERO. Fixes: c0a5a21c25f3 ("bpf: Allow storing referenced kptr in map") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220521132620.1976921-1-memxor@gmail.com
2022-05-24bpf: Introduce bpf_arch_text_invalidate for bpf_prog_packSong Liu1-0/+8
Introduce bpf_arch_text_invalidate and use it to fill unused part of the bpf_prog_pack with illegal instructions when a BPF program is freed. Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator") Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220520235758.1858153-4-song@kernel.org
2022-05-24bpf: Fill new bpf_prog_pack with illegal instructionsSong Liu1-4/+6
bpf_prog_pack enables sharing huge pages among multiple BPF programs. These pages are marked as executable before the JIT engine fill it with BPF programs. To make these pages safe, fill the hole bpf_prog_pack with illegal instructions before making it executable. Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator") Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220520235758.1858153-2-song@kernel.org
2022-05-23Merge tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-blockLinus Torvalds2-13/+14
Pull block updates from Jens Axboe: "Here are the core block changes for 5.19. This contains: - blk-throttle accounting fix (Laibin) - Series removing redundant assignments (Michal) - Expose bio cache via the bio_set, so that DM can use it (Mike) - Finish off the bio allocation interface cleanups by dealing with the weirdest member of the family. bio_kmalloc combines a kmalloc for the bio and bio_vecs with a hidden bio_init call and magic cleanup semantics (Christoph) - Clean up the block layer API so that APIs consumed by file systems are (almost) only struct block_device based, so that file systems don't have to poke into block layer internals like the request_queue (Christoph) - Clean up the blk_execute_rq* API (Christoph) - Clean up various lose end in the blk-cgroup code to make it easier to follow in preparation of reworking the blkcg assignment for bios (Christoph) - Fix use-after-free issues in BFQ when processes with merged queues get moved to different cgroups (Jan) - BFQ fixes (Jan) - Various fixes and cleanups (Bart, Chengming, Fanjun, Julia, Ming, Wolfgang, me)" * tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block: (83 commits) blk-mq: fix typo in comment bfq: Remove bfq_requeue_request_body() bfq: Remove superfluous conversion from RQ_BIC() bfq: Allow current waker to defend against a tentative one bfq: Relax waker detection for shared queues blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE() blk-throttle: Set BIO_THROTTLED when bio has been throttled blk-cgroup: Remove unnecessary rcu_read_lock/unlock() blk-cgroup: always terminate io.stat lines block, bfq: make bfq_has_work() more accurate block, bfq: protect 'bfqd->queued' by 'bfqd->lock' block: cleanup the VM accounting in submit_bio block: Fix the bio.bi_opf comment block: reorder the REQ_ flags blk-iocost: combine local_stat and desc_stat to stat block: improve the error message from bio_check_eod block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone block: remove superfluous calls to blkcg_bio_issue_init kthread: unexport kthread_blkcg blk-cgroup: cleanup blkcg_maybe_throttle_current ...
2022-05-23Merge tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-blockLinus Torvalds1-6/+19
Pull io_uring updates from Jens Axboe: "Here are the main io_uring changes for 5.19. This contains: - Fixes for sparse type warnings (Christoph, Vasily) - Support for multi-shot accept (Hao) - Support for io_uring managed fixed files, rather than always needing the applicationt o manage the indices (me) - Fix for a spurious poll wakeup (Dylan) - CQE overflow fixes (Dylan) - Support more types of cancelations (me) - Support for co-operative task_work signaling, rather than always forcing an IPI (me) - Support for doing poll first when appropriate, rather than always attempting a transfer first (me) - Provided buffer cleanups and support for mapped buffers (me) - Improve how io_uring handles inflight SCM files (Pavel) - Speedups for registered files (Pavel, me) - Organize the completion data in a struct in io_kiocb rather than keep it in separate spots (Pavel) - task_work improvements (Pavel) - Cleanup and optimize the submission path, in general and for handling links (Pavel) - Speedups for registered resource handling (Pavel) - Support sparse buffers and file maps (Pavel, me) - Various fixes and cleanups (Almog, Pavel, me)" * tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits) io_uring: fix incorrect __kernel_rwf_t cast io_uring: disallow mixed provided buffer group registrations io_uring: initialize io_buffer_list head when shared ring is unregistered io_uring: add fully sparse buffer registration io_uring: use rcu_dereference in io_close io_uring: consistently use the EPOLL* defines io_uring: make apoll_events a __poll_t io_uring: drop a spurious inline on a forward declaration io_uring: don't use ERR_PTR for user pointers io_uring: use a rwf_t for io_rw.flags io_uring: add support for ring mapped supplied buffers io_uring: add io_pin_pages() helper io_uring: add buffer selection support to IORING_OP_NOP io_uring: fix locking state for empty buffer group io_uring: implement multishot mode for accept io_uring: let fast poll support multishot io_uring: add REQ_F_APOLL_MULTISHOT for requests io_uring: add IORING_ACCEPT_MULTISHOT for accept io_uring: only wake when the correct events are set io_uring: avoid io-wq -EAGAIN looping for !IOPOLL ...
2022-05-23Merge tag 'rcu.2022.05.19a' of ↵Linus Torvalds22-375/+1082
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU update from Paul McKenney: - Documentation updates - Miscellaneous fixes - Callback-offloading updates, mainly simplifications - RCU-tasks updates, including some -rt fixups, handling of systems with sparse CPU numbering, and a fix for a boot-time race-condition failure - Put SRCU on a memory diet in order to reduce the size of the srcu_struct structure - Torture-test updates fixing some bugs in tests and closing some testing holes - Torture-test updates for the RCU tasks flavors, most notably ensuring that building rcutorture and friends does not change the RCU-tasks-related Kconfig options - Torture-test scripting updates - Expedited grace-period updates, most notably providing milliseconds-scale (not all that) soft real-time response from synchronize_rcu_expedited(). This is also the first time in almost 30 years of RCU that someone other than me has pushed for a reduction in the RCU CPU stall-warning timeout, in this case by more than three orders of magnitude from 21 seconds to 20 milliseconds. This tighter timeout applies only to expedited grace periods * tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits) rcu: Move expedited grace period (GP) work to RT kthread_worker rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT srcu: Drop needless initialization of sdp in srcu_gp_start() srcu: Prevent expedited GPs and blocking readers from consuming CPU srcu: Add contention check to call_srcu() srcu_data ->lock acquisition srcu: Automatically determine size-transition strategy at boot rcutorture: Make torture.sh allow for --kasan rcutorture: Make torture.sh refscale and rcuscale specify Tasks Trace RCU rcutorture: Make kvm.sh allow more memory for --kasan runs torture: Save "make allmodconfig" .config file scftorture: Remove extraneous "scf" from per_version_boot_params rcutorture: Adjust scenarios' Kconfig options for CONFIG_PREEMPT_DYNAMIC torture: Enable CSD-lock stall reports for scftorture torture: Skip vmlinux check for kvm-again.sh runs scftorture: Adjust for TASKS_RCU Kconfig option being selected rcuscale: Allow rcuscale without RCU Tasks Rude/Trace rcuscale: Allow rcuscale without RCU Tasks refscale: Allow refscale without RCU Tasks Rude/Trace refscale: Allow refscale without RCU Tasks rcutorture: Allow specifying per-scenario stat_interval ...
2022-05-23Merge branches 'pm-em' and 'pm-cpuidle'Rafael J. Wysocki1-29/+36
Marge Energy Model support updates and cpuidle updates for 5.19-rc1: - Update the Energy Model support code to allow the Energy Model to be artificial, which means that the power values may not be on a uniform scale with other devices providing power information, and update the cpufreq_cooling and devfreq_cooling thermal drivers to support artificial Energy Models (Lukasz Luba). - Make DTPM check the Energy Model type (Lukasz Luba). - Fix policy counter decrementation in cpufreq if Energy Model is in use (Pierre Gondois). - Add AlderLake processor support to the intel_idle driver (Zhang Rui). - Fix regression leading to no genpd governor in the PSCI cpuidle driver and fix the riscv-sbi cpuidle driver to allow a genpd governor to be used (Ulf Hansson). * pm-em: PM: EM: Decrement policy counter powercap: DTPM: Check for Energy Model type thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling Documentation: EM: Add artificial EM registration description PM: EM: Remove old debugfs files and print all 'flags' PM: EM: Change the order of arguments in the .active_power() callback PM: EM: Use the new .get_cost() callback while registering EM PM: EM: Add artificial EM flag PM: EM: Add .get_cost() callback * pm-cpuidle: cpuidle: riscv-sbi: Fix code to allow a genpd governor to be used cpuidle: psci: Fix regression leading to no genpd governor intel_idle: Add AlderLake support
2022-05-23Merge branches 'pm-core', 'pm-sleep' and 'powercap'Rafael J. Wysocki4-37/+13
Merge PM core changes, updates related to system sleep and power capping updates for 5.19-rc1: - Export dev_pm_ops instead of suspend() and resume() in the IIO chemical scd30 driver (Jonathan Cameron). - Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and PM-runtime counterparts (Jonathan Cameron). - Move symbol exports in the IIO chemical scd30 driver into the IIO_SCD30 namespace (Jonathan Cameron). - Avoid device PM-runtime usage count underflows (Rafael Wysocki). - Allow dynamic debug to control printing of PM messages (David Cohen). - Fix some kernel-doc comments in hibernation code (Yang Li, Haowen Bai). - Preserve ACPI-table override during hibernation (Amadeusz Sławiński). - Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson). - Make Intel RAPL power capping driver support the RaptorLake and AlderLake N processors (Zhang Rui, Sumeet Pawnikar). - Remove redundant store to value after multiply in the RAPL power capping driver (Colin Ian King). * pm-core: PM: runtime: Avoid device usage count underflows iio: chemical: scd30: Move symbol exports into IIO_SCD30 namespace PM: core: Add NS varients of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and runtime pm equiv iio: chemical: scd30: Export dev_pm_ops instead of suspend() and resume() * pm-sleep: cpuidle: PSCI: Improve support for suspend-to-RAM for PSCI OSI mode PM: runtime: Allow to call __pm_runtime_set_status() from atomic context PM: hibernate: Don't mark comment as kernel-doc x86/ACPI: Preserve ACPI-table override during hibernation PM: hibernate: Fix some kernel-doc comments PM: sleep: enable dynamic debug support within pm_pr_dbg() PM: sleep: Narrow down -DDEBUG on kernel/power/ files * powercap: powercap: intel_rapl: remove redundant store to value after multiply powercap: intel_rapl: add support for ALDERLAKE_N powercap: RAPL: Add Power Limit4 support for RaptorLake powercap: intel_rapl: add support for RaptorLake
2022-05-23dma-direct: don't over-decrypt memoryRobin Murphy1-2/+2
The original x86 sev_alloc() only called set_memory_decrypted() on memory returned by alloc_pages_node(), so the page order calculation fell out of that logic. However, the common dma-direct code has several potential allocators, not all of which are guaranteed to round up the underlying allocation to a power-of-two size, so carrying over that calculation for the encryption/decryption size was a mistake. Fix it by rounding to a *number* of pages, rather than an order. Until recently there was an even worse interaction with DMA_DIRECT_REMAP where we could have ended up decrypting part of the next adjacent vmalloc area, only averted by no architecture actually supporting both configs at once. Don't ask how I found that one out... Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent buffers in common code") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Rientjes <rientjes@google.com>
2022-05-21bpf: refine kernel.unprivileged_bpf_disabled behaviourAlan Maguire1-1/+13
With unprivileged BPF disabled, all cmds associated with the BPF syscall are blocked to users without CAP_BPF/CAP_SYS_ADMIN. However there are use cases where we may wish to allow interactions with BPF programs without being able to load and attach them. So for example, a process with required capabilities loads/attaches a BPF program, and a process with less capabilities interacts with it; retrieving perf/ring buffer events, modifying map-specified config etc. With all BPF syscall commands blocked as a result of unprivileged BPF being disabled, this mode of interaction becomes impossible for processes without CAP_BPF. As Alexei notes "The bpf ACL model is the same as traditional file's ACL. The creds and ACLs are checked at open(). Then during file's write/read additional checks might be performed. BPF has such functionality already. Different map_creates have capability checks while map_lookup has: map_get_sys_perms(map, f) & FMODE_CAN_READ. In other words it's enough to gate FD-receiving parts of bpf with unprivileged_bpf_disabled sysctl. The rest is handled by availability of FD and access to files in bpffs." So key fd creation syscall commands BPF_PROG_LOAD and BPF_MAP_CREATE are blocked with unprivileged BPF disabled and no CAP_BPF. And as Alexei notes, map creation with unprivileged BPF disabled off blocks creation of maps aside from array, hash and ringbuf maps. Programs responsible for loading and attaching the BPF program can still control access to its pinned representation by restricting permissions on the pin path, as with normal files. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/1652970334-30510-2-git-send-email-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-21bpf: Allow kfunc in tracing and syscall programs.Benjamin Tissoires1-0/+6
Tracing and syscall BPF program types are very convenient to add BPF capabilities to subsystem otherwise not BPF capable. When we add kfuncs capabilities to those program types, we can add BPF features to subsystems without having to touch BPF core. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220518205924.399291-2-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>