summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2021-07-01Merge tag 'net-next-5.14' of ↵Linus Torvalds137-516/+2887
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - BPF: - add syscall program type and libbpf support for generating instructions and bindings for in-kernel BPF loaders (BPF loaders for BPF), this is a stepping stone for signed BPF programs - infrastructure to migrate TCP child sockets from one listener to another in the same reuseport group/map to improve flexibility of service hand-off/restart - add broadcast support to XDP redirect - allow bypass of the lockless qdisc to improving performance (for pktgen: +23% with one thread, +44% with 2 threads) - add a simpler version of "DO_ONCE()" which does not require jump labels, intended for slow-path usage - virtio/vsock: introduce SOCK_SEQPACKET support - add getsocketopt to retrieve netns cookie - ip: treat lowest address of a IPv4 subnet as ordinary unicast address allowing reclaiming of precious IPv4 addresses - ipv6: use prandom_u32() for ID generation - ip: add support for more flexible field selection for hashing across multi-path routes (w/ offload to mlxsw) - icmp: add support for extended RFC 8335 PROBE (ping) - seg6: add support for SRv6 End.DT46 behavior - mptcp: - DSS checksum support (RFC 8684) to detect middlebox meddling - support Connection-time 'C' flag - time stamping support - sctp: packetization Layer Path MTU Discovery (RFC 8899) - xfrm: speed up state addition with seq set - WiFi: - hidden AP discovery on 6 GHz and other HE 6 GHz improvements - aggregation handling improvements for some drivers - minstrel improvements for no-ack frames - deferred rate control for TXQs to improve reaction times - switch from round robin to virtual time-based airtime scheduler - add trace points: - tcp checksum errors - openvswitch - action execution, upcalls - socket errors via sk_error_report Device APIs: - devlink: add rate API for hierarchical control of max egress rate of virtual devices (VFs, SFs etc.) - don't require RCU read lock to be held around BPF hooks in NAPI context - page_pool: generic buffer recycling New hardware/drivers: - mobile: - iosm: PCIe Driver for Intel M.2 Modem - support for Qualcomm MSM8998 (ipa) - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU) - NXP SJA1110 Automotive Ethernet 10-port switch - Qualcomm QCA8327 switch support (qca8k) - Mikrotik 10/25G NIC (atl1c) Driver changes: - ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP (our first foray into MAC/PHY description via ACPI) - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx - Mellanox/Nvidia NIC (mlx5) - NIC VF offload of L2 bridging - support IRQ distribution to Sub-functions - Marvell (prestera): - add flower and match all - devlink trap - link aggregation - Netronome (nfp): connection tracking offload - Intel 1GE (igc): add AF_XDP support - Marvell DPU (octeontx2): ingress ratelimit offload - Google vNIC (gve): new ring/descriptor format support - Qualcomm mobile (rmnet & ipa): inline checksum offload support - MediaTek WiFi (mt76) - mt7915 MSI support - mt7915 Tx status reporting - mt7915 thermal sensors support - mt7921 decapsulation offload - mt7921 enable runtime pm and deep sleep - Realtek WiFi (rtw88) - beacon filter support - Tx antenna path diversity support - firmware crash information via devcoredump - Qualcomm WiFi (wcn36xx) - Wake-on-WLAN support with magic packets and GTK rekeying - Micrel PHY (ksz886x/ksz8081): add cable test support" * tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits) tcp: change ICSK_CA_PRIV_SIZE definition tcp_yeah: check struct yeah size at compile time gve: DQO: Fix off by one in gve_rx_dqo() stmmac: intel: set PCI_D3hot in suspend stmmac: intel: Enable PHY WOL option in EHL net: stmmac: option to enable PHY WOL with PMT enabled net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} net: use netdev_info in ndo_dflt_fdb_{add,del} ptp: Set lookup cookie when creating a PTP PPS source. net: sock: add trace for socket errors net: sock: introduce sk_error_report net: dsa: replay the local bridge FDB entries pointing to the bridge dev too net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev net: dsa: include fdb entries pointing to bridge in the host fdb list net: dsa: include bridge addresses which are local in the host fdb list net: dsa: sync static FDB entries on foreign interfaces to hardware net: dsa: install the host MDB and FDB entries in the master's RX filter net: dsa: reference count the FDB addresses at the cross-chip notifier level net: dsa: introduce a separate cross-chip notifier type for host FDBs net: dsa: reference count the MDB entries at the cross-chip notifier level ...
2021-07-01Merge tag 'audit-pr-20210629' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Another merge window, another small audit pull request. Four patches in total: one is cosmetic, one removes an unnecessary initialization, one renames some enum values to prevent name collisions, and one converts list_del()/list_add() to list_move(). None of these are earth shattering and all pass the audit-testsuite tests while merging cleanly on top of your tree from earlier today" * tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: remove unnecessary 'ret' initialization audit: remove trailing spaces and tabs audit: Use list_move instead of list_del/list_add audit: Rename enum audit_state constants to avoid AUDIT_DISABLED redefinition audit: add blank line after variable declarations
2021-07-01Merge tag 'selinux-pr-20210629' of ↵Linus Torvalds3-8/+7
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux updates from Paul Moore: - The slow_avc_audit() function is now non-blocking so we can remove the AVC_NONBLOCKING tricks; this also includes the 'flags' variant of avc_has_perm(). - Use kmemdup() instead of kcalloc()+copy when copying parts of the SELinux policydb. - The InfiniBand device name is now passed by reference when possible in the SELinux code, removing a strncpy(). - Minor cleanups including: constification of avtab function args, removal of useless LSM/XFRM function args, SELinux kdoc fixes, and removal of redundant assignments. * tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit() selinux: slow_avc_audit has become non-blocking selinux: Fix kernel-doc selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC lsm_audit,selinux: pass IB device name by reference selinux: Remove redundant assignment to rc selinux: Corrected comment to match kernel-doc comment selinux: delete selinux_xfrm_policy_lookup() useless argument selinux: constify some avtab function arguments selinux: simplify duplicate_policydb_cond_list() by using kmemdup()
2021-07-01Merge tag 'clang-features-v5.14-rc1' of ↵Linus Torvalds4-14/+27
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull clang feature updates from Kees Cook: - Add CC_HAS_NO_PROFILE_FN_ATTR in preparation for PGO support in the face of the noinstr attribute, paving the way for PGO and fixing GCOV. (Nick Desaulniers) - x86_64 LTO coverage is expanded to 32-bit x86. (Nathan Chancellor) - Small fixes to CFI. (Mark Rutland, Nathan Chancellor) * tag 'clang-features-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR compiler_attributes.h: cleanups for GCC 4.9+ compiler_attributes.h: define __no_profile, add to noinstr x86, lto: Enable Clang LTO for 32-bit as well CFI: Move function_nocfi() into compiler.h MAINTAINERS: Add Clang CFI section
2021-06-30Merge tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-blockLinus Torvalds3-2/+17
Pull block driver updates from Jens Axboe: "Pretty calm round, mostly just NVMe and a bit of MD: - NVMe updates (via Christoph) - improve the APST configuration algorithm (Alexey Bogoslavsky) - look for StorageD3Enable on companion ACPI device (Mario Limonciello) - allow selecting the network interface for TCP connections (Martin Belanger) - misc cleanups (Amit Engel, Chaitanya Kulkarni, Colin Ian King, Christoph) - move the ACPI StorageD3 code to drivers/acpi/ and add quirks for certain AMD CPUs (Mario Limonciello) - zoned device support for nvmet (Chaitanya Kulkarni) - fix the rules for changing the serial number in nvmet (Noam Gottlieb) - various small fixes and cleanups (Dan Carpenter, JK Kim, Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert Uytterhoeven, Daniel Wagner) - MD updates (Via Song) - iostats rewrite (Guoqing Jiang) - raid5 lock contention optimization (Gal Ofri) - Fall through warning fix (Gustavo) - Misc fixes (Gustavo, Jiapeng)" * tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block: (78 commits) nvmet: use NVMET_MAX_NAMESPACES to set nn value loop: Fix missing discard support when using LOOP_CONFIGURE nvme.h: add missing nvme_lba_range_type endianness annotations nvme: remove zeroout memset call for struct nvme-pci: remove zeroout memset call for struct nvmet: remove zeroout memset call for struct nvmet: add ZBD over ZNS backend support nvmet: add Command Set Identifier support nvmet: add nvmet_req_bio put helper for backends nvmet: add req cns error complete helper block: export blk_next_bio() nvmet: remove local variable nvmet: use nvme status value directly nvmet: use u32 type for the local variable nsid nvmet: use u32 for nvmet_subsys max_nsid nvmet: use req->cmd directly in file-ns fast path nvmet: use req->cmd directly in bdev-ns fast path nvmet: make ver stable once connection established nvmet: allow mn change if subsys not discovered nvmet: make sn stable once connection was established ...
2021-06-30Merge tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-blockLinus Torvalds8-26/+58
Pull core block updates from Jens Axboe: - disk events cleanup (Christoph) - gendisk and request queue allocation simplifications (Christoph) - bdev_disk_changed cleanups (Christoph) - IO priority improvements (Bart) - Chained bio completion trace fix (Edward) - blk-wbt fixes (Jan) - blk-wbt enable/disable fix (Zhang) - Scheduler dispatch improvements (Jan, Ming) - Shared tagset scheduler improvements (John) - BFQ updates (Paolo, Luca, Pietro) - BFQ lock inversion fix (Jan) - Documentation improvements (Kir) - CLONE_IO block cgroup fix (Tejun) - Remove of ancient and deprecated block dump feature (zhangyi) - Discard merge fix (Ming) - Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas, Yang) * tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits) block: fix discard request merge block/mq-deadline: Remove a WARN_ON_ONCE() call blk-mq: update hctx->dispatch_busy in case of real scheduler blk: Fix lock inversion between ioc lock and bfqd lock bfq: Remove merged request already in bfq_requests_merged() block: pass a gendisk to bdev_disk_changed block: move bdev_disk_changed block: add the events* attributes to disk_attrs block: move the disk events code to a separate file block: fix trace completion for chained bio block/partitions/msdos: Fix typo inidicator -> indicator block, bfq: reset waker pointer with shared queues block, bfq: check waker only for queues with no in-flight I/O block, bfq: avoid delayed merge of async queues block, bfq: boost throughput by extending queue-merging times block, bfq: consider also creation time in delayed stable merge block, bfq: fix delayed stable merge check block, bfq: let also stably merged queues enjoy weight raising blk-wbt: make sure throttle is enabled properly blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled() ...
2021-06-30Merge branch 'for-linus' of ↵Linus Torvalds2-2/+27
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID updates from Jiri Kosina: - patch series that ensures that hid-multitouch driver disables touch and button-press reporting on hid-mt devices during suspend when the device is not configured as a wakeup-source, from Hans de Goede - support for ISH DMA on Intel EHL platform, from Even Xu - support for Renoir and Cezanne SoCs, Ambient Light Sensor and Human Presence Detection sensor for amd-sfh driver, from Basavaraj Natikar - other assorted code cleanups and device-specific fixes/quirks * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (45 commits) HID: thrustmaster: Switch to kmemdup() when allocate change_request HID: multitouch: Disable event reporting on suspend when the device is not a wakeup-source HID: logitech-dj: Implement may_wakeup ll-driver callback HID: usbhid: Implement may_wakeup ll-driver callback HID: core: Add hid_hw_may_wakeup() function HID: input: Add support for Programmable Buttons HID: wacom: Correct base usage for capacitive ExpressKey status bits HID: amd_sfh: Add initial support for HPD sensor HID: amd_sfh: Extend ALS support for newer AMD platform HID: amd_sfh: Extend driver capabilities for multi-generation support HID: surface-hid: Fix get-report request HID: sony: fix freeze when inserting ghlive ps3/wii dongles HID: usbkbd: Avoid GFP_ATOMIC when GFP_KERNEL is possible HID: amd_sfh: change in maintainer HID: intel-ish-hid: ipc: Specify that EHL no cache snooping HID: intel-ish-hid: ishtp: Add dma_no_cache_snooping() callback HID: intel-ish-hid: Set ISH driver depends on x86 HID: hid-input: add Surface Go battery quirk HID: intel-ish-hid: Fix minor typos in comments HID: usbmouse: Avoid GFP_ATOMIC when GFP_KERNEL is possible ...
2021-06-30Merge tag 'platform-drivers-x86-v5.14-1' of ↵Linus Torvalds7-5/+133
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: "Highlights: - New think-lmi driver adding support for changing Lenovo Thinkpad BIOS settings from within Linux using the standard firmware- attributes class sysfs API - MS Surface aggregator-cdev now also supports forwarding events to user-space (for debugging / new driver development purposes only) - New intel_skl_int3472 driver this provides the necessary glue to translate ACPI table information to GPIOs, regulators, etc. for camera sensors on Intel devices with IPU3 attached MIPI cameras - A whole bunch of other fixes + device-specific quirk additions - New devm_work_autocancel() devm-helpers.h function" * tag 'platform-drivers-x86-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (83 commits) platform/x86: dell-wmi-sysman: Change user experience when Admin/System Password is modified platform/x86: intel_skl_int3472: Uninitialized variable in skl_int3472_handle_gpio_resources() platform/x86: think-lmi: Move kfree(setting->possible_values) to tlmi_attr_setting_release() platform/x86: think-lmi: Split current_value to reflect only the value platform/x86: think-lmi: Fix issues with duplicate attributes platform/x86: think-lmi: Return EINVAL when kbdlang gets set to a 0 length string platform/x86: intel_cht_int33fe: Move to its own subfolder platform/x86: intel_skl_int3472: Move to intel/ subfolder platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_clock() platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_regulator() platform/x86: intel_skl_int3472: Use ACPI GPIO resource directly platform/x86: intel_skl_int3472: Fix dependencies (drop CLKDEV_LOOKUP) platform/x86: intel_skl_int3472: Free ACPI device resources after use platform/x86: Remove "default n" entries platform/x86: ISST: Use numa node id for cpu pci dev mapping platform/x86: ISST: Optimize CPU to PCI device mapping tools/power/x86/intel-speed-select: v1.10 release tools/power/x86/intel-speed-select: Fix uncore memory frequency display extcon: extcon-max8997: Simplify driver using devm extcon: extcon-max8997: Fix IRQ freeing at error path ...
2021-06-30Merge tag 'mailbox-v5.14' of ↵Linus Torvalds2-6/+45
git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: - imx: add support for i.MX8ULP - mtk: code change around callback struct - qcom: add sm6125, MSM8939 fix for channel exhaustion - microchip: add support for polarfire controller - misc: cosmetic changes to bcm-2835,flexrm,pdc, arm-mhu and hisilicon * tag 'mailbox-v5.14' of git://git.linaro.org/landing-teams/working/fujitsu/integration: (26 commits) MAINTAINERS: add entry for polarfire soc mailbox dt-bindings: add bindings for polarfire soc system controller mbox: add polarfire soc system controller mailbox dt-bindings: add bindings for polarfire soc mailbox mailbox: imx: Avoid using val uninitialized in imx_mu_isr() mailbox: qcom: Add MSM8939 APCS support mailbox: qcom: Use PLATFORM_DEVID_AUTO to register platform device dt-bindings: mailbox: qcom: Add MSM8939 APCS compatible mailbox: qcom-apcs: Add SM6125 compatible dt-bindings: mailbox: Add binding for sm6125 mailbox: mtk-cmdq: Fix uninitialized variable in cmdq_mbox_flush() mailbox: bcm-flexrm-mailbox: Remove redundant dev_err call in flexrm_mbox_probe() mailbox: bcm2835: Remove redundant dev_err call in bcm2835_mbox_probe() mailbox: qcom-ipcc: Fix IPCC mbox channel exhaustion mailbox: mtk-cmdq: Add struct cmdq_pkt in struct cmdq_cb_data mailbox: mtk-cmdq: Use mailbox rx_callback mailbox: mtk-cmdq: Remove cmdq_cb_status mailbox: imx-mailbox: support i.MX8ULP MU mailbox: imx: add xSR/xCR register array mailbox: imx: replace the xTR/xRR array with single register ...
2021-06-30Merge branch 'for-5.14/multitouch' into for-linusJiri Kosina1-0/+18
- patch series that ensures that hid-multitouch driver disables touch and button-press reporting on hid-mt devices during suspend when the device is not configured as a wakeup-source, from Hans de Goede
2021-06-30Merge branch 'for-5.14/intel-ish' into for-linusJiri Kosina1-2/+8
- support for ISH DMA on EHL platform from Even Xu - various code style fixes and cleanups from Lee Jones and Uwe Kleine-König
2021-06-30Merge branch 'for-5.14/core' into for-linusJiri Kosina1-0/+1
- device unbinding locking fix from Dmitry Torokhov - support for programmable buttons (mapping to KEY_MACRO# event codes) from Thomas Weißschuh - various other small fixes and code style improvements
2021-06-30Merge tag '5.14-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds1-0/+8
Pull cifs updates from Steve French: - improve fallocate emulation - DFS fixes - minor multichannel fixes - various cleanup patches, many to address Coverity warnings * tag '5.14-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (38 commits) smb3: prevent races updating CurrentMid cifs: fix missing spinlock around update to ses->status cifs: missing null pointer check in cifs_mount smb3: fix possible access to uninitialized pointer to DACL cifs: missing null check for newinode pointer cifs: remove two cases where rc is set unnecessarily in sid_to_id SMB3: Add new info level for query directory cifs: fix NULL dereference in smb2_check_message() smbdirect: missing rc checks while waiting for rdma events cifs: Avoid field over-reading memcpy() smb311: remove dead code for non compounded posix query info cifs: fix SMB1 error path in cifs_get_file_info_unix smb3: fix uninitialized value for port in witness protocol move cifs: fix unneeded null check cifs: use SPDX-Licence-Identifier cifs: convert list_for_each to entry variant in cifs_debug.c cifs: convert list_for_each to entry variant in smb2misc.c cifs: avoid extra calls in posix_info_parse cifs: retry lookup and readdir when EAGAIN is returned. cifs: fix check of dfs interlinks ...
2021-06-30Merge tag 'fs.openat2.unknown_flags.v5.14' of ↵Linus Torvalds1-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull openat2 fixes from Christian Brauner: - Remove the unused VALID_UPGRADE_FLAGS define we carried from an extension to openat2() that we haven't merged. Aleksa might be getting back to it at some point but just not right now. - openat2() used to accidently ignore unknown flag values in the upper 32 bits. The new openat2() syscall verifies that no unknown O-flag values are set and returns an error to userspace if they are while the older open syscalls like open() and openat() simply ignore unknown flag values: #define O_FLAG_CURRENTLY_INVALID (1 << 31) struct open_how how = { .flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID, .resolve = 0, }; /* fails */ fd = openat2(-EBADF, "/dev/null", &how, sizeof(how)); /* succeeds */ fd = openat(-EBADF, "/dev/null", O_RDONLY | O_FLAG_CURRENTLY_INVALID); However, openat2() silently truncates the upper 32 bits meaning: #define O_FLAG_CURRENTLY_INVALID_LOWER32 (1 << 31) #define O_FLAG_CURRENTLY_INVALID_UPPER32 (1 << 40) struct open_how how_lowe32 = { .flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID_LOWER32, }; struct open_how how_upper32 = { .flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID_UPPER32, }; /* fails */ fd = openat2(-EBADF, "/dev/null", &how_lower32, sizeof(how_lower32)); /* succeeds */ fd = openat2(-EBADF, "/dev/null", &how_upper32, sizeof(how_upper32)); Fix this by preventing the immediate truncation in build_open_flags() and add a compile-time check to catch when we add flags in the upper 32 bit range. * tag 'fs.openat2.unknown_flags.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: test: add openat2() test for invalid upper 32 bit flag value open: don't silently ignore unknown O-flags in openat2() fcntl: remove unused VALID_UPGRADE_FLAGS
2021-06-30Merge tag 'fs.mount_setattr.nosymfollow.v5.14' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull mount_setattr updates from Christian Brauner: "A few releases ago the old mount API gained support for a mount options which prevents following symlinks on a given mount. This adds support for it in the new mount api through the MOUNT_ATTR_NOSYMFOLLOW flag via mount_setattr() and fsmount(). With mount_setattr() that flag can even be applied recursively. There's an additional ack from Ross Zwisler who originally authored the nosymfollow patch. As I've already had the patches in my for-next I didn't add his ack explicitly" * tag 'fs.mount_setattr.nosymfollow.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: test MOUNT_ATTR_NOSYMFOLLOW with mount_setattr() mount: Support "nosymfollow" in new mount api
2021-06-30Merge branch 'akpm' (patches from Andrew)Linus Torvalds39-215/+342
Merge misc updates from Andrew Morton: "191 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab, slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap, mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits) mm,hwpoison: make get_hwpoison_page() call get_any_page() mm,hwpoison: send SIGBUS with error virutal address mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes mm/page_alloc: allow high-order pages to be stored on the per-cpu lists mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA docs: remove description of DISCONTIGMEM arch, mm: remove stale mentions of DISCONIGMEM mm: remove CONFIG_DISCONTIGMEM m68k: remove support for DISCONTIGMEM arc: remove support for DISCONTIGMEM arc: update comment about HIGHMEM implementation alpha: remove DISCONTIGMEM and NUMA mm/page_alloc: move free_the_page mm/page_alloc: fix counting of managed_pages mm/page_alloc: improve memmap_pages dbg msg mm: drop SECTION_SHIFT in code comments mm/page_alloc: introduce vm.percpu_pagelist_high_fraction mm/page_alloc: limit the number of pages on PCP lists when reclaim is active mm/page_alloc: scale the number of pages that are batch freed ...
2021-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski10-20/+50
Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-30Merge tag 'devprop-5.14-rc1' of ↵Linus Torvalds3-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework updates from Rafael Wysocki: "These unify device properties access in some pieces of code and make related changes. Specifics: - Handle device properties with software node API in the ACPI IORT table parsing code (Heikki Krogerus). - Unify of_node access in the common device properties code, constify the acpi_dma_supported() argument pointer and fix up CONFIG_ACPI=n stubs of some functions related to device properties (Andy Shevchenko)" * tag 'devprop-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: device property: Unify access to of_node ACPI: scan: Constify acpi_dma_supported() helper function ACPI: property: Constify stubs for CONFIG_ACPI=n case ACPI: IORT: Handle device properties with software node API device property: Retrieve fwnode from of_node via accessor
2021-06-29Merge tag 'acpi-5.14-rc1' of ↵Linus Torvalds9-6/+215
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to the 20210604 upstream revision, add preliminary support for the Platform Runtime Mechanism (PRM), address issues related to the handling of device dependencies in the ACPI device eunmeration code, improve the tracking of ACPI power resource states, improve the ACPI support for suspend-to-idle on AMD systems, continue the unification of message printing in the ACPI code, address assorted issues and clean up the code in a number of places. Specifics: - Update ACPICA code in the kernel to upstrea revision 20210604 including the following changes: - Add defines for the CXL Host Bridge Structureand and add the CFMWS structure definition to CEDT (Alison Schofield). - iASL: Finish support for the IVRS ACPI table (Bob Moore). - iASL: Add support for the SVKL table (Bob Moore). - iASL: Add full support for RGRT ACPI table (Bob Moore). - iASL: Add support for the BDAT ACPI table (Bob Moore). - iASL: add disassembler support for PRMT (Erik Kaneda). - Fix memory leak caused by _CID repair function (Erik Kaneda). - Add support for PlatformRtMechanism OpRegion (Erik Kaneda). - Add PRMT module header to facilitate parsing (Erik Kaneda). - Add _PLD panel positions (Fabian Wüthrich). - MADT: add Multiprocessor Wakeup Mailbox Structure and the SVKL table headers (Kuppuswamy Sathyanarayanan). - Use ACPI_FALLTHROUGH (Wei Ming Chen). - Add preliminary support for the Platform Runtime Mechanism (PRM) to allow the AML interpreter to call PRM functions (Erik Kaneda). - Address some issues related to the handling of device dependencies reported by _DEP in the ACPI device enumeration code and clean up some related pieces of it (Rafael Wysocki). - Improve the tracking of states of ACPI power resources (Rafael Wysocki). - Improve ACPI support for suspend-to-idle on AMD systems (Alex Deucher, Mario Limonciello, Pratik Vishwakarma). - Continue the unification and cleanup of message printing in the ACPI code (Hanjun Guo, Heiner Kallweit). - Fix possible buffer overrun issue with the description_show() sysfs attribute method (Krzysztof Wilczyński). - Improve the acpi_mask_gpe kernel command line parameter handling and clean up the core ACPI code related to sysfs (Andy Shevchenko, Baokun Li, Clayton Casciato). - Postpone bringing devices in the general ACPI PM domain to D0 during resume from system-wide suspend until they are really needed (Dmitry Torokhov). - Make the ACPI processor driver fix up C-state latency if not ordered (Mario Limonciello). - Add support for identifying devices depening on the given one that are not its direct descendants with the help of _DEP (Daniel Scally). - Extend the checks related to ACPI IRQ overrides on x86 in order to avoid false-positives (Hui Wang). - Add battery DPTF participant for Intel SoCs (Sumeet Pawnikar). - Rearrange the ACPI fan driver and device power management code to use a common list of device IDs (Rafael Wysocki). - Fix clang CFI violation in the ACPI BGRT table parsing code and clean it up (Nathan Chancellor). - Add GPE-related quirks for some laptops to the EC driver (Chris Chiu, Zhang Rui). - Make the ACPI PPTT table parsing code populate the cache-id value if present in the firmware (James Morse). - Remove redundant clearing of context->ret.pointer from acpi_run_osc() (Hans de Goede). - Add missing acpi_put_table() in acpi_init_fpdt() (Jing Xiangfeng). - Make ACPI APEI handle ARM Processor Error CPER records like Memory Error ones to avoid user space task lockups (Xiaofei Tan). - Stop warning about disabled ACPI in APEI (Jon Hunter). - Fix fall-through warning for Clang in the SBSHC driver (Gustavo A. R. Silva). - Add custom DSDT file as Makefile prerequisite (Richard Fitzgerald). - Initialize local variable to avoid garbage being returned (Colin Ian King). - Simplify assorted pieces of code, address assorted coding style and documentation issues and comment typos (Baokun Li, Christophe JAILLET, Clayton Casciato, Liu Shixin, Shaokun Zhang, Wei Yongjun, Yang Li, Zhen Lei)" * tag 'acpi-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (97 commits) ACPI: PM: postpone bringing devices to D0 unless we need them ACPI: tables: Add custom DSDT file as makefile prerequisite ACPI: bgrt: Use sysfs_emit ACPI: bgrt: Fix CFI violation ACPI: EC: trust DSDT GPE for certain HP laptop ACPI: scan: Simplify acpi_table_events_fn() ACPI: PM: Adjust behavior for field problems on AMD systems ACPI: PM: s2idle: Add support for new Microsoft UUID ACPI: PM: s2idle: Add support for multiple func mask ACPI: PM: s2idle: Refactor common code ACPI: PM: s2idle: Use correct revision id ACPI: sysfs: Remove tailing return statement in void function ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros ACPI: sysfs: Sort headers alphabetically ACPI: sysfs: Refactor param_get_trace_state() to drop dead code ACPI: sysfs: Unify pattern of memory allocations ACPI: sysfs: Allow bitmap list to be supplied to acpi_mask_gpe ACPI: sysfs: Make sparse happy about address space in use ACPI: scan: Fix race related to dropping dependencies ACPI: scan: Reorganize acpi_device_add() ...
2021-06-29Merge tag 'pm-5.14-rc1' of ↵Linus Torvalds2-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add hybrid processors support to the intel_pstate driver and make it work with more processor models when HWP is disabled, make the intel_idle driver use special C6 idle state paremeters when package C-states are disabled, add cooling support to the tegra30 devfreq driver, rework the TEO (timer events oriented) cpuidle governor, extend the OPP (operating performance points) framework to use the required-opps DT property in more cases, fix some issues and clean up a number of assorted pieces of code. Specifics: - Make intel_pstate support hybrid processors using abstract performance units in the HWP interface (Rafael Wysocki). - Add Icelake servers and Cometlake support in no-HWP mode to intel_pstate (Giovanni Gherdovich). - Make cpufreq_online() error path be consistent with the CPU device removal path in cpufreq (Rafael Wysocki). - Clean up 3 cpufreq drivers and the statistics code (Hailong Liu, Randy Dunlap, Shaokun Zhang). - Make intel_idle use special idle state parameters for C6 when package C-states are disabled (Chen Yu). - Rework the TEO (timer events oriented) cpuidle governor to address some theoretical shortcomings in it (Rafael Wysocki). - Drop unneeded semicolon from the TEO governor (Wan Jiabing). - Modify the runtime PM framework to accept unassigned suspend and resume callback pointers (Ulf Hansson). - Improve pm_runtime_get_sync() documentation (Krzysztof Kozlowski). - Improve device performance states support in the generic power domains (genpd) framework (Ulf Hansson). - Fix some documentation issues in genpd (Yang Yingliang). - Make the operating performance points (OPP) framework use the required-opps DT property in use cases that are not related to genpd (Hsin-Yi Wang). - Make lazy_link_required_opp_table() use list_del_init instead of list_del/INIT_LIST_HEAD (Yang Yingliang). - Simplify wake IRQs handling in the core system-wide sleep support code and clean up some coding style inconsistencies in it (Tian Tao, Zhen Lei). - Add cooling support to the tegra30 devfreq driver and improve its DT bindings (Dmitry Osipenko). - Fix some assorted issues in the devfreq core and drivers (Chanwoo Choi, Dong Aisheng, YueHaibing)" * tag 'pm-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits) PM / devfreq: passive: Fix get_target_freq when not using required-opp cpufreq: Make cpufreq_online() call driver->offline() on errors opp: Allow required-opps to be used for non genpd use cases cpuidle: teo: remove unneeded semicolon in teo_select() dt-bindings: devfreq: tegra30-actmon: Add cooling-cells dt-bindings: devfreq: tegra30-actmon: Convert to schema PM / devfreq: userspace: Use DEVICE_ATTR_RW macro PM: runtime: Clarify documentation when callbacks are unassigned PM: runtime: Allow unassigned ->runtime_suspend|resume callbacks PM: runtime: Improve path in rpm_idle() when no callback PM: hibernate: remove leading spaces before tabs PM: sleep: remove trailing spaces and tabs PM: domains: Drop/restore performance state votes for devices at runtime PM PM: domains: Return early if perf state is already set for the device PM: domains: Split code in dev_pm_genpd_set_performance_state() cpuidle: teo: Use kerneldoc documentation in admin-guide cpuidle: teo: Rework most recent idle duration values treatment cpuidle: teo: Change the main idle state selection logic cpuidle: teo: Cosmetic modification of teo_select() cpuidle: teo: Cosmetic modifications of teo_update() ...
2021-06-29Merge tag 'timers-core-2021-06-29' of ↵Linus Torvalds3-2/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Time and clocksource/clockevent related updates: Core changes: - Infrastructure to support per CPU "broadcast" devices for per CPU clockevent devices which stop in deep idle states. This allows us to utilize the more efficient architected timer on certain ARM SoCs for normal operation instead of permanentely using the slow to access SoC specific clockevent device. - Print the name of the broadcast/wakeup device in /proc/timer_list - Make the clocksource watchdog more robust against delays between reading the current active clocksource and the watchdog clocksource. Such delays can be caused by NMIs, SMIs and vCPU preemption. Handle this by reading the watchdog clocksource twice, i.e. before and after reading the current active clocksource. In case that the two watchdog reads shows an excessive time delta, the read sequence is repeated up to 3 times. - Improve the debug output and add a test module for the watchdog mechanism. - Reimplementation of the venerable time64_to_tm() function with a faster and significantly smaller version. Straight from the source, i.e. the author of the related research paper contributed this! Driver changes: - No new drivers, not even new device tree bindings! - Fixes, improvements and cleanups and all over the place" * tag 'timers-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) time/kunit: Add missing MODULE_LICENSE() time: Improve performance of time64_to_tm() clockevents: Use list_move() instead of list_del()/list_add() clocksource: Print deviation in nanoseconds when a clocksource becomes unstable clocksource: Provide kernel module to test clocksource watchdog clocksource: Reduce clocksource-skew threshold clocksource: Limit number of CPUs checked for clock synchronization clocksource: Check per-CPU clock synchronization when marked unstable clocksource: Retry clock read if long delays detected clockevents: Add missing parameter documentation clocksource/drivers/timer-ti-dm: Drop unnecessary restore clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround clocksource/drivers/arm_global_timer: Remove duplicated argument in arm_global_timer clocksource/drivers/arm_global_timer: Make symbol 'gt_clk_rate_change_nb' static arm: zynq: don't disable CONFIG_ARM_GLOBAL_TIMER due to CONFIG_CPU_FREQ anymore clocksource/drivers/arm_global_timer: Implement rate compensation whenever source clock changes clocksource/drivers/ingenic: Rename unreasonable array names clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG clocksource/drivers/mediatek: Ack and disable interrupts on suspend clocksource/drivers/samsung_pwm: Constify source IO memory ...
2021-06-29Merge tag 'irq-core-2021-06-29' of ↵Linus Torvalds4-39/+49
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core changes: - Cleanup and simplification of common code to invoke the low level interrupt flow handlers when this invocation requires irqdomain resolution. Add the necessary core infrastructure. - Provide a proper interface for modular PMU drivers to set the interrupt affinity. - Add a request flag which allows to exclude interrupts from spurious interrupt detection. Useful especially for IPI handlers which always return IRQ_HANDLED which turns the spurious interrupt detection into a pointless waste of CPU cycles. Driver changes: - Bulk convert interrupt chip drivers to the new irqdomain low level flow handler invocation mechanism. - Add device tree bindings for the Renesas R-Car M3-W+ SoC - Enable modular build of the Qualcomm PDC driver - The usual small fixes and improvements" * tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) dt-bindings: interrupt-controller: arm,gic-v3: Describe GICv3 optional properties irqchip: gic-pm: Remove redundant error log of clock bulk irqchip/sun4i: Remove unnecessary oom message irqchip/irq-imx-gpcv2: Remove unnecessary oom message irqchip/imgpdc: Remove unnecessary oom message irqchip/gic-v3-its: Remove unnecessary oom message irqchip/gic-v2m: Remove unnecessary oom message irqchip/exynos-combiner: Remove unnecessary oom message irqchip: Bulk conversion to generic_handle_domain_irq() genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ() genirq: Add generic_handle_domain_irq() helper irqchip/nvic: Convert from handle_IRQ() to handle_domain_irq() irqdesc: Fix __handle_domain_irq() comment genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and co irqdomain: Introduce irq_resolve_mapping() irqdomain: Protect the linear revmap with RCU irqdomain: Cache irq_data instead of a virq number in the revmap irqdomain: Use struct_size() helper when allocating irqdomain irqdomain: Make normal and nomap irqdomains exclusive powerpc: Move the use of irq_domain_add_nomap() behind a config option ...
2021-06-29Merge tag 'printk-for-5.14' of ↵Linus Torvalds3-2/+43
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add %pt[RT]s modifier to vsprintf(). It overrides ISO 8601 separator by using ' ' (space). It produces "YYYY-mm-dd HH:MM:SS" instead of "YYYY-mm-ddTHH:MM:SS". - Correctly parse long row of numbers by sscanf() when using the field width. Add extensive sscanf() selftest. - Generalize re-entrant CPU lock that has already been used to serialize dump_stack() output. It is part of the ongoing printk rework. It will allow to remove the obsoleted printk_safe buffers and introduce atomic consoles. - Some code clean up and sparse warning fixes. * tag 'printk-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: fix cpu lock ordering lib/dump_stack: move cpu lock to printk.c printk: Remove trailing semicolon in macros random32: Fix implicit truncation warning in prandom_seed_state() lib: test_scanf: Remove pointless use of type_min() with unsigned types selftests: lib: Add wrapper script for test_scanf lib: test_scanf: Add tests for sscanf number conversion lib: vsprintf: Fix handling of number field widths in vsscanf lib: vsprintf: scanf: Negative number must have field width > 1 usb: host: xhci-tegra: Switch to use %ptTs nilfs2: Switch to use %ptTs kdb: Switch to use %ptTs lib/vsprintf: Allow to override ISO 8601 date and time separator
2021-06-29tcp: change ICSK_CA_PRIV_SIZE definitionEric Dumazet1-1/+1
Instead of a magic number (13 currently) and having to change it every other year, use sizeof_field() macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29net: stmmac: option to enable PHY WOL with PMT enabledLing Pei Lee1-0/+1
The current stmmac driver WOL implementation will enable MAC WOL if MAC HW PMT feature is on. Else, the driver will check for PHY WOL support. There is another case where MAC HW PMT is enabled but the platform still goes for the PHY WOL option. E.g, Intel platform are designed for PHY WOL but not MAC WOL although HW MAC PMT features are enabled. Introduce use_phy_wol platform data to select PHY WOL instead of depending on HW PMT features. Set use_phy_wol will disable the plat->pmt which currently used to determine the system to wake up by MAC WOL or PHY WOL. Signed-off-by: Ling Pei Lee <pei.lee.ling@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29net: sock: add trace for socket errorsAlexander Aring1-0/+60
This patch will add tracers to trace inet socket errors only. A user space monitor application can track connection errors indepedent from socket lifetime and do additional handling. For example a cluster manager can fence a node if errors occurs in a specific heuristic. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29net: sock: introduce sk_error_reportAlexander Aring3-2/+4
This patch introduces a function wrapper to call the sk_error_report callback. That will prepare to add additional handling whenever sk_error_report is called, for example to trace socket errors. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29Merge tag 'hyperv-next-signed-20210629' of ↵Linus Torvalds1-10/+51
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: "Just a few minor enhancement patches and bug fixes" * tag 'hyperv-next-signed-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: PCI: hv: Add check for hyperv_initialized in init_hv_pci_drv() Drivers: hv: Move Hyper-V extended capability check to arch neutral code drivers: hv: Fix missing error code in vmbus_connect() x86/hyperv: fix logical processor creation hv_utils: Fix passing zero to 'PTR_ERR' warning scsi: storvsc: Use blk_mq_unique_tag() to generate requestIDs Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer hv_balloon: Remove redundant assignment to region_start
2021-06-29mm,hwpoison: send SIGBUS with error virutal addressNaoya Horiguchi1-0/+5
Now an action required MCE in already hwpoisoned address surely sends a SIGBUS to current process, but the SIGBUS doesn't convey error virtual address. That's not optimal for hwpoison-aware applications. To fix the issue, make memory_failure() call kill_accessing_process(), that does pagetable walk to find the error virtual address. It could find multiple virtual addresses for the same error page, and it seems hard to tell which virtual address is correct one. But that's rare and sending incorrect virtual address could be better than no address. So let's report the first found virtual address for now. [naoya.horiguchi@nec.com: fix walk_page_range() return] Link: https://lkml.kernel.org/r/20210603051055.GA244241@hori.linux.bs1.fc.nec.co.jp Link: https://lkml.kernel.org/r/20210521030156.2612074-4-nao.horiguchi@gmail.com Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Aili Yao <yaoaili@kingsoft.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jue Wang <juew@google.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/page_alloc: allow high-order pages to be stored on the per-cpu listsMel Gorman1-1/+19
The per-cpu page allocator (PCP) only stores order-0 pages. This means that all THP and "cheap" high-order allocations including SLUB contends on the zone->lock. This patch extends the PCP allocator to store THP and "cheap" high-order pages. Note that struct per_cpu_pages increases in size to 256 bytes (4 cache lines) on x86-64. Note that this is not necessarily a universal performance win because of how it is implemented. High-order pages can cause pcp->high to be exceeded prematurely for lower-orders so for example, a large number of THP pages being freed could release order-0 pages from the PCP lists. Hence, much depends on the allocation/free pattern as observed by a single CPU to determine if caching helps or hurts a particular workload. That said, basic performance testing passed. The following is a netperf UDP_STREAM test which hits the relevant patches as some of the network allocations are high-order. netperf-udp 5.13.0-rc2 5.13.0-rc2 mm-pcpburst-v3r4 mm-pcphighorder-v1r7 Hmean send-64 261.46 ( 0.00%) 266.30 * 1.85%* Hmean send-128 516.35 ( 0.00%) 536.78 * 3.96%* Hmean send-256 1014.13 ( 0.00%) 1034.63 * 2.02%* Hmean send-1024 3907.65 ( 0.00%) 4046.11 * 3.54%* Hmean send-2048 7492.93 ( 0.00%) 7754.85 * 3.50%* Hmean send-3312 11410.04 ( 0.00%) 11772.32 * 3.18%* Hmean send-4096 13521.95 ( 0.00%) 13912.34 * 2.89%* Hmean send-8192 21660.50 ( 0.00%) 22730.72 * 4.94%* Hmean send-16384 31902.32 ( 0.00%) 32637.50 * 2.30%* Functionally, a patch like this is necessary to make bulk allocation of high-order pages work with similar performance to order-0 bulk allocations. The bulk allocator is not updated in this series as it would have to be determined by bulk allocation users how they want to track the order of pages allocated with the bulk allocator. Link: https://lkml.kernel.org/r/20210611135753.GC30378@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEMMike Rapoport1-2/+2
After removal of the DISCONTIGMEM memory model the FLAT_NODE_MEM_MAP configuration option is equivalent to FLATMEM. Drop CONFIG_FLAT_NODE_MEM_MAP and use CONFIG_FLATMEM instead. Link: https://lkml.kernel.org/r/20210608091316.3622-10-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMAMike Rapoport4-9/+9
After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA configuration options are equivalent. Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead. Done with $ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \ $(git grep -wl CONFIG_NEED_MULTIPLE_NODES) $ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \ $(git grep -wl NEED_MULTIPLE_NODES) with manual tweaks afterwards. [rppt@linux.ibm.com: fix arm boot crash] Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29arch, mm: remove stale mentions of DISCONIGMEMMike Rapoport1-2/+2
There are several places that mention DISCONIGMEM in comments or have stale code guarded by CONFIG_DISCONTIGMEM. Remove the dead code and update the comments. Link: https://lkml.kernel.org/r/20210608091316.3622-7-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm: remove CONFIG_DISCONTIGMEMMike Rapoport2-36/+9
There are no architectures that support DISCONTIGMEM left. Remove the configuration option and the dead code it was guarding in the generic memory management code. Link: https://lkml.kernel.org/r/20210608091316.3622-6-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm: drop SECTION_SHIFT in code commentsDong Aisheng1-2/+0
Actually SECTIONS_SHIFT is used in the kernel code, so the code comments is strictly incorrect. And since commit bbeae5b05ef6 ("mm: move page flags layout to separate header"), SECTIONS_SHIFT definition has been moved to include/linux/page-flags-layout.h, since code itself looks quite straighforward, instead of moving the code comment into the new place as well, we just simply remove it. This also fixed a checkpatch complain derived from the original code: WARNING: please, no space before tabs + * SECTIONS_SHIFT ^I^I#bits space required to store a section #$ Link: https://lkml.kernel.org/r/20210531091908.1738465-2-aisheng.dong@nxp.com Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Suggested-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Yu Zhao <yuzhao@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/page_alloc: introduce vm.percpu_pagelist_high_fractionMel Gorman1-0/+3
This introduces a new sysctl vm.percpu_pagelist_high_fraction. It is similar to the old vm.percpu_pagelist_fraction. The old sysctl increased both pcp->batch and pcp->high with the higher pcp->high potentially reducing zone->lock contention. However, the higher pcp->batch value also potentially increased allocation latency while the PCP was refilled. This sysctl only adjusts pcp->high so that zone->lock contention is potentially reduced but allocation latency during a PCP refill remains the same. # grep -E "high:|batch" /proc/zoneinfo | tail -2 high: 649 batch: 63 # sysctl vm.percpu_pagelist_high_fraction=8 # grep -E "high:|batch" /proc/zoneinfo | tail -2 high: 35071 batch: 63 # sysctl vm.percpu_pagelist_high_fraction=64 high: 4383 batch: 63 # sysctl vm.percpu_pagelist_high_fraction=0 high: 649 batch: 63 [mgorman@techsingularity.net: fix documentation] Link: https://lkml.kernel.org/r/20210528151010.GQ30378@techsingularity.net Link: https://lkml.kernel.org/r/20210525080119.5455-7-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hdanton@sina.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/page_alloc: limit the number of pages on PCP lists when reclaim is activeMel Gorman1-0/+1
When kswapd is active then direct reclaim is potentially active. In either case, it is possible that a zone would be balanced if pages were not trapped on PCP lists. Instead of draining remote pages, simply limit the size of the PCP lists while kswapd is active. Link: https://lkml.kernel.org/r/20210525080119.5455-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/page_alloc: scale the number of pages that are batch freedMel Gorman1-1/+2
When a task is freeing a large number of order-0 pages, it may acquire the zone->lock multiple times freeing pages in batches. This may unnecessarily contend on the zone lock when freeing very large number of pages. This patch adapts the size of the batch based on the recent pattern to scale the batch size for subsequent frees. As the machines I used were not large enough to test this are not large enough to illustrate a problem, a debugging patch shows patterns like the following (slightly editted for clarity) Baseline vanilla kernel time-unmap-14426 [...] free_pcppages_bulk: free 63 count 378 high 378 time-unmap-14426 [...] free_pcppages_bulk: free 63 count 378 high 378 time-unmap-14426 [...] free_pcppages_bulk: free 63 count 378 high 378 time-unmap-14426 [...] free_pcppages_bulk: free 63 count 378 high 378 time-unmap-14426 [...] free_pcppages_bulk: free 63 count 378 high 378 With patches time-unmap-7724 [...] free_pcppages_bulk: free 126 count 814 high 814 time-unmap-7724 [...] free_pcppages_bulk: free 252 count 814 high 814 time-unmap-7724 [...] free_pcppages_bulk: free 504 count 814 high 814 time-unmap-7724 [...] free_pcppages_bulk: free 751 count 814 high 814 time-unmap-7724 [...] free_pcppages_bulk: free 751 count 814 high 814 Link: https://lkml.kernel.org/r/20210525080119.5455-5-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hdanton@sina.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/page_alloc: adjust pcp->high after CPU hotplug eventsMel Gorman1-1/+1
The PCP high watermark is based on the number of online CPUs so the watermarks must be adjusted during CPU hotplug. At the time of hot-remove, the number of online CPUs is already adjusted but during hot-add, a delta needs to be applied to update PCP to the correct value. After this patch is applied, the high watermarks are adjusted correctly. # grep high: /proc/zoneinfo | tail -1 high: 649 # echo 0 > /sys/devices/system/cpu/cpu4/online # grep high: /proc/zoneinfo | tail -1 high: 664 # echo 1 > /sys/devices/system/cpu/cpu4/online # grep high: /proc/zoneinfo | tail -1 high: 649 Link: https://lkml.kernel.org/r/20210525080119.5455-4-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/page_alloc: delete vm.percpu_pagelist_fractionMel Gorman1-3/+0
Patch series "Calculate pcp->high based on zone sizes and active CPUs", v2. The per-cpu page allocator (PCP) is meant to reduce contention on the zone lock but the sizing of batch and high is archaic and neither takes the zone size into account or the number of CPUs local to a zone. With larger zones and more CPUs per node, the contention is getting worse. Furthermore, the fact that vm.percpu_pagelist_fraction adjusts both batch and high values means that the sysctl can reduce zone lock contention but also increase allocation latencies. This series disassociates pcp->high from pcp->batch and then scales pcp->high based on the size of the local zone with limited impact to reclaim and accounting for active CPUs but leaves pcp->batch static. It also adapts the number of pages that can be on the pcp list based on recent freeing patterns. The motivation is partially to adjust to larger memory sizes but is also driven by the fact that large batches of page freeing via release_pages() often shows zone contention as a major part of the problem. Another is a bug report based on an older kernel where a multi-terabyte process can takes several minutes to exit. A workaround was to use vm.percpu_pagelist_fraction to increase the pcp->high value but testing indicated that a production workload could not use the same values because of an increase in allocation latencies. Unfortunately, I cannot reproduce this test case myself as the multi-terabyte machines are in active use but it should alleviate the problem. The series aims to address both and partially acts as a pre-requisite. pcp only works with order-0 which is useless for SLUB (when using high orders) and THP (unconditionally). To store high-order pages on PCP, the pcp->high values need to be increased first. This patch (of 6): The vm.percpu_pagelist_fraction is used to increase the batch and high limits for the per-cpu page allocator (PCP). The intent behind the sysctl is to reduce zone lock acquisition when allocating/freeing pages but it has a problem. While it can decrease contention, it can also increase latency on the allocation side due to unreasonably large batch sizes. This leads to games where an administrator adjusts percpu_pagelist_fraction on the fly to work around contention and allocation latency problems. This series aims to alleviate the problems with zone lock contention while avoiding the allocation-side latency problems. For the purposes of review, it's easier to remove this sysctl now and reintroduce a similar sysctl later in the series that deals only with pcp->high. Link: https://lkml.kernel.org/r/20210525080119.5455-1-mgorman@techsingularity.net Link: https://lkml.kernel.org/r/20210525080119.5455-2-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hdanton@sina.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/page_alloc: batch the accounting updates in the bulk allocatorMel Gorman1-0/+8
Now that the zone_statistics are simple counters that do not require special protection, the bulk allocator accounting updates can be batch updated without adding too much complexity with protected RMW updates or using xchg. Link: https://lkml.kernel.org/r/20210512095458.30632-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/vmstat: inline NUMA event counter updatesMel Gorman1-1/+9
__count_numa_event is small enough to be treated similarly to __count_vm_event so inline it. Link: https://lkml.kernel.org/r/20210512095458.30632-5-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/vmstat: convert NUMA statistics to basic NUMA countersMel Gorman2-27/+29
NUMA statistics are maintained on the zone level for hits, misses, foreign etc but nothing relies on them being perfectly accurate for functional correctness. The counters are used by userspace to get a general overview of a workloads NUMA behaviour but the page allocator incurs a high cost to maintain perfect accuracy similar to what is required for a vmstat like NR_FREE_PAGES. There even is a sysctl vm.numa_stat to allow userspace to turn off the collection of NUMA statistics like NUMA_HIT. This patch converts NUMA_HIT and friends to be NUMA events with similar accuracy to VM events. There is a possibility that slight errors will be introduced but the overall trend as seen by userspace will be similar. The counters are no longer updated from vmstat_refresh context as it is unnecessary overhead for counters that may never be read by userspace. Note that counters could be maintained at the node level to save space but it would have a user-visible impact due to /proc/zoneinfo. [lkp@intel.com: Fix misplaced closing brace for !CONFIG_NUMA] Link: https://lkml.kernel.org/r/20210512095458.30632-4-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/page_alloc: convert per-cpu list protection to local_lockMel Gorman1-0/+2
There is a lack of clarity of what exactly local_irq_save/local_irq_restore protects in page_alloc.c . It conflates the protection of per-cpu page allocation structures with per-cpu vmstat deltas. This patch protects the PCP structure using local_lock which for most configurations is identical to IRQ enabling/disabling. The scope of the lock is still wider than it should be but this is decreased later. It is possible for the local_lock to be embedded safely within struct per_cpu_pages but it adds complexity to free_unref_page_list. [akpm@linux-foundation.org: coding style fixes] [mgorman@techsingularity.net: work around a pahole limitation with zero-sized struct pagesets] Link: https://lkml.kernel.org/r/20210526080741.GW30378@techsingularity.net [lkp@intel.com: Make pagesets static] Link: https://lkml.kernel.org/r/20210512095458.30632-3-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/page_alloc: split per cpu page lists and zone statsMel Gorman2-12/+14
The PCP (per-cpu page allocator in page_alloc.c) shares locking requirements with vmstat and the zone lock which is inconvenient and causes some issues. For example, the PCP list and vmstat share the same per-cpu space meaning that it's possible that vmstat updates dirty cache lines holding per-cpu lists across CPUs unless padding is used. Second, PREEMPT_RT does not want to disable IRQs for too long in the page allocator. This series splits the locking requirements and uses locks types more suitable for PREEMPT_RT, reduces the time when special locking is required for stats and reduces the time when IRQs need to be disabled on !PREEMPT_RT kernels. Why local_lock? PREEMPT_RT considers the following sequence to be unsafe as documented in Documentation/locking/locktypes.rst local_irq_disable(); spin_lock(&lock); The pcp allocator has this sequence for rmqueue_pcplist (local_irq_save) -> __rmqueue_pcplist -> rmqueue_bulk (spin_lock). While it's possible to separate this out, it generally means there are points where we enable IRQs and reenable them again immediately. To prevent a migration and the per-cpu pointer going stale, migrate_disable is also needed. That is a custom lock that is similar, but worse, than local_lock. Furthermore, on PREEMPT_RT, it's undesirable to leave IRQs disabled for too long. By converting to local_lock which disables migration on PREEMPT_RT, the locking requirements can be separated and start moving the protections for PCP, stats and the zone lock to PREEMPT_RT-safe equivalent locking. As a bonus, local_lock also means that PROVE_LOCKING does something useful. After that, it's obvious that zone_statistics incurs too much overhead and leaves IRQs disabled for longer than necessary on !PREEMPT_RT kernels. zone_statistics uses perfectly accurate counters requiring IRQs be disabled for parallel RMW sequences when inaccurate ones like vm_events would do. The series makes the NUMA statistics (NUMA_HIT and friends) inaccurate counters that then require no special protection on !PREEMPT_RT. The bulk page allocator can then do stat updates in bulk with IRQs enabled which should improve the efficiency. Technically, this could have been done without the local_lock and vmstat conversion work and the order simply reflects the timing of when different series were implemented. Finally, there are places where we conflate IRQs being disabled for the PCP with the IRQ-safe zone spinlock. The remainder of the series reduces the scope of what is protected by disabled IRQs on !PREEMPT_RT kernels. By the end of the series, page_alloc.c does not call local_irq_save so the locking scope is a bit clearer. The one exception is that modifying NR_FREE_PAGES still happens in places where it's known the IRQs are disabled as it's harmless for PREEMPT_RT and would be expensive to split the locking there. No performance data is included because despite the overhead of the stats, it's within the noise for most workloads on !PREEMPT_RT. However, Jesper Dangaard Brouer ran a page allocation microbenchmark on a E5-1650 v4 @ 3.60GHz CPU on the first version of this series. Focusing on the array variant of the bulk page allocator reveals the following. (CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz) ARRAY variant: time_bulk_page_alloc_free_array: step=bulk size Baseline Patched 1 56.383 54.225 (+3.83%) 2 40.047 35.492 (+11.38%) 3 37.339 32.643 (+12.58%) 4 35.578 30.992 (+12.89%) 8 33.592 29.606 (+11.87%) 16 32.362 28.532 (+11.85%) 32 31.476 27.728 (+11.91%) 64 30.633 27.252 (+11.04%) 128 30.596 27.090 (+11.46%) While this is a positive outcome, the series is more likely to be interesting to the RT people in terms of getting parts of the PREEMPT_RT tree into mainline. This patch (of 9): The per-cpu page allocator lists and the per-cpu vmstat deltas are stored in the same struct per_cpu_pages even though vmstats have no direct impact on the per-cpu page lists. This is inconsistent because the vmstats for a node are stored on a dedicated structure. The bigger issue is that the per_cpu_pages structure is not cache-aligned and stat updates either cache conflict with adjacent per-cpu lists incurring a runtime cost or padding is required incurring a memory cost. This patch splits the per-cpu pagelists and the vmstat deltas into separate structures. It's mostly a mechanical conversion but some variable renaming is done to clearly distinguish the per-cpu pages structure (pcp) from the vmstats (pzstats). Superficially, this appears to increase the size of the per_cpu_pages structure but the movement of expire fills a structure hole so there is no impact overall. [mgorman@techsingularity.net: make it W=1 cleaner] Link: https://lkml.kernel.org/r/20210514144622.GA3735@techsingularity.net [mgorman@techsingularity.net: make it W=1 even cleaner] Link: https://lkml.kernel.org/r/20210516140705.GB3735@techsingularity.net [lkp@intel.com: check struct per_cpu_zonestat has a non-zero size] [vbabka@suse.cz: Init zone->per_cpu_zonestats properly] Link: https://lkml.kernel.org/r/20210512095458.30632-1-mgorman@techsingularity.net Link: https://lkml.kernel.org/r/20210512095458.30632-2-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm: optimise nth_page for contiguous memmapMatthew Wilcox (Oracle)1-0/+4
If the memmap is virtually contiguous (either because we're using a virtually mapped memmap or because we don't support a discontig memmap at all), then we can implement nth_page() by simple addition. Contrary to popular belief, the compiler is not able to optimise this itself for a vmemmap configuration. This reduces one example user (sg.c) by four instructions: struct page *page = nth_page(rsv_schp->pages[k], offset >> PAGE_SHIFT); before: 49 8b 45 70 mov 0x70(%r13),%rax 48 63 c9 movslq %ecx,%rcx 48 c1 eb 0c shr $0xc,%rbx 48 8b 04 c8 mov (%rax,%rcx,8),%rax 48 2b 05 00 00 00 00 sub 0x0(%rip),%rax R_X86_64_PC32 vmemmap_base-0x4 48 c1 f8 06 sar $0x6,%rax 48 01 d8 add %rbx,%rax 48 c1 e0 06 shl $0x6,%rax 48 03 05 00 00 00 00 add 0x0(%rip),%rax R_X86_64_PC32 vmemmap_base-0x4 after: 49 8b 45 70 mov 0x70(%r13),%rax 48 63 c9 movslq %ecx,%rcx 48 c1 eb 0c shr $0xc,%rbx 48 c1 e3 06 shl $0x6,%rbx 48 03 1c c8 add (%rax,%rcx,8),%rbx Link: https://lkml.kernel.org/r/20210413194625.1472345-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Douglas Gilbert <dougg@torque.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm: constify page_count and page_ref_countMatthew Wilcox (Oracle)1-2/+2
Now that compound_head() accepts a const struct page pointer, these two functions can be marked as not modifying the page pointer they are passed. Link: https://lkml.kernel.org/r/20210416231531.2521383-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm: constify get_pfnblock_flags_mask and get_pfnblock_migratetypeMatthew Wilcox (Oracle)1-1/+1
The struct page is not modified by these routines, so it can be marked const. Link: https://lkml.kernel.org/r/20210416231531.2521383-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm: make compound_head const-preservingMatthew Wilcox (Oracle)1-5/+5
If you pass a const pointer to compound_head(), you get a const pointer back; if you pass a mutable pointer, you get a mutable pointer back. Also remove an unnecessary forward definition of struct page; we're about to dereference page->compound_head, so it must already have been defined. Link: https://lkml.kernel.org/r/20210416231531.2521383-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/page_owner: constify dump_page_ownerMatthew Wilcox (Oracle)1-3/+3
dump_page_owner() only uses struct page to find the page_ext, and lookup_page_ext() already takes a const argument. Link: https://lkml.kernel.org/r/20210416231531.2521383-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>