summaryrefslogtreecommitdiff
path: root/Documentation
AgeCommit message (Collapse)AuthorFilesLines
2021-11-06Docs/admin-guide/mm/pagemap: wordsmith page flags descriptionsSeongJae Park1-26/+27
Some descriptions of page flags in 'pagemap.rst' are written in assumption of none-rst, which respects every new line, as below: 7 - SLAB page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator When compound page is used, SLUB/SLQB will only set this flag on the head Because rst ignores the new line between the first sentence and second sentence, resulting html looks a little bit weird, as below. 7 - SLAB page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator When ^ compound page is used, SLUB/SLQB will only set this flag on the head page; SLOB will not flag it at all. This change makes it more natural and consistent with other parts in the rendered version. Link: https://lkml.kernel.org/r/20211022090311.3856-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/damon/start: simplify the contentSeongJae Park1-53/+60
Information in 'TL; DR' section of 'Getting Started' is duplicated in other parts of the doc. It is also asking readers to visit the access pattern visualizations gallery web site to show the results of example visualization commands, while the users of the commands can use terminal output. To make the doc simple, this removes the duplicated 'TL; DR' section and replaces the visualization example commands with versions using terminal outputs. Link: https://lkml.kernel.org/r/20211022090311.3856-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/damon/start: fix a wrong linkSeongJae Park1-1/+3
The 'Getting Started' of DAMON is providing a link to DAMON's user interface document while saying about its user space tool's detailed usages. This fixes the link. Link: https://lkml.kernel.org/r/20211022090311.3856-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/damon/start: fix wrong example commandsSeongJae Park1-7/+7
Patch series "Fix trivial nits in Documentation/admin-guide/mm". This patchset fixes trivial nits in admin guide documents for DAMON and pagemap. This patch (of 4): Some of the example commands in DAMON getting started guide are outdated, missing sudo, or just wrong. This fixes those. Link: https://lkml.kernel.org/r/20211022090311.3856-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIMSeongJae Park2-0/+236
This adds an admin-guide document for DAMON-based Reclamation. Link: https://lkml.kernel.org/r/20211019150731.16699-16-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Greg Thelen <gthelen@google.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/DAMON: document physical memory monitoring supportSeongJae Park3-19/+40
This updates the DAMON documents for the physical memory address space monitoring support. Link: https://lkml.kernel.org/r/20211012205711.29216-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rienjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Greg Thelen <gthelen@google.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/damon: document 'init_regions' featureSeongJae Park1-2/+39
This adds description of the 'init_regions' feature in the DAMON usage document. Link: https://lkml.kernel.org/r/20211012205711.29216-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rienjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Greg Thelen <gthelen@google.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/damon: document DAMON-based Operation SchemesSeongJae Park2-2/+60
This adds the description of DAMON-based operation schemes in the DAMON documents. Link: https://lkml.kernel.org/r/20211001125604.29660-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Rienjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Greg Thelen <gthelen@google.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06docs/vm/damon: remove broken referenceSeongJae Park1-1/+0
Building DAMON documents warns for a reference to nonexisting doc, as below: $ time make htmldocs [...] Documentation/vm/damon/index.rst:24: WARNING: toctree contains reference to nonexisting document 'vm/damon/plans' This fixes the warning by removing the wrong reference. Link: https://lkml.kernel.org/r/20210917123958.3819-4-sj@kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Documentation/vm: move user guides to admin-guide/mm/SeongJae Park4-21/+7
Most memory management user guide documents are in 'admin-guide/mm/', but two of those are in 'vm/'. This moves the two docs into 'admin-guide/mm' for easier documents finding. Link: https://lkml.kernel.org/r/20210917123958.3819-2-sj@kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06kfence: default to dynamic branch instead of static keys modeMarco Elver1-4/+8
We have observed that on very large machines with newer CPUs, the static key/branch switching delay is on the order of milliseconds. This is due to the required broadcast IPIs, which simply does not scale well to hundreds of CPUs (cores). If done too frequently, this can adversely affect tail latencies of various workloads. One workaround is to increase the sample interval to several seconds, while decreasing sampled allocation coverage, but the problem still exists and could still increase tail latencies. As already noted in the Kconfig help text, there are trade-offs: at lower sample intervals the dynamic branch results in better performance; however, at very large sample intervals, the static keys mode can result in better performance -- careful benchmarking is recommended. Our initial benchmarking showed that with large enough sample intervals and workloads stressing the allocator, the static keys mode was slightly better. Evaluating and observing the possible system-wide side-effects of the static-key-switching induced broadcast IPIs, however, was a blind spot (in particular on large machines with 100s of cores). Therefore, a major downside of the static keys mode is, unfortunately, that it is hard to predict performance on new system architectures and topologies, but also making conclusions about performance of new workloads based on a limited set of benchmarks. Most distributions will simply select the defaults, while targeting a large variety of different workloads and system architectures. As such, the better default is CONFIG_KFENCE_STATIC_KEYS=n, and re-enabling it is only recommended after careful evaluation. For reference, on x86-64 the condition in kfence_alloc() generates exactly 2 instructions in the kmem_cache_alloc() fast-path: | ... | cmpl $0x0,0x1a8021c(%rip) # ffffffff82d560d0 <kfence_allocation_gate> | je ffffffff812d6003 <kmem_cache_alloc+0x243> | ... which, given kfence_allocation_gate is infrequently modified, should be well predicted by most CPUs. Link: https://lkml.kernel.org/r/20211019102524.2807208-2-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06kfence: add note to documentation about skipping covered allocationsMarco Elver1-0/+11
Add a note briefly mentioning the new policy about "skipping currently covered allocations if pool close to full." Since this has a notable impact on KFENCE's bug-detection ability on systems with large uptimes, it is worth pointing out the feature. Link: https://lkml.kernel.org/r/20210923104803.2620285-5-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Alexander Potapenko <glider@google.com> Cc: Aleksandr Nogikh <nogikh@google.com> Cc: Jann Horn <jannh@google.com> Cc: Taras Madan <tarasmadan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06zram: introduce an aged idle interfaceBrian Geffon1-0/+8
This change introduces an aged idle interface to the existing idle sysfs file for zram. When CONFIG_ZRAM_MEMORY_TRACKING is enabled the idle file now also accepts an integer argument. This integer is the age (in seconds) of pages to mark as idle. The idle file still supports 'all' as it always has. This new approach allows for much more control over which pages get marked as idle. [bgeffon@google.com: use IS_ENABLED and cleanup comment] Link: https://lkml.kernel.org/r/20210924161128.1508015-1-bgeffon@google.com [bgeffon@google.com: Sergey's cleanup suggestions] Link: https://lkml.kernel.org/r/20210929143056.13067-1-bgeffon@google.com Link: https://lkml.kernel.org/r/20210923130115.1344361-1-bgeffon@google.com Signed-off-by: Brian Geffon <bgeffon@google.com> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Jesse Barnes <jsbarnes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06mm/memory_hotplug: remove HIGHMEM leftoversDavid Hildenbrand2-7/+0
We don't support CONFIG_MEMORY_HOTPLUG on 32 bit and consequently not HIGHMEM. Let's remove any leftover code -- including the unused "status_change_nid_high" field part of the memory notifier. Link: https://lkml.kernel.org/r/20210929143600.49379-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Alex Shi <alexs@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06memory-hotplug.rst: document the "auto-movable" online policyDavid Hildenbrand1-20/+121
Commit e83a437faa62 ("mm/memory_hotplug: introduce "auto-movable" online policy") introduced a new memory online policy to automatically select a zone for memory blocks to be onlined. It added a way to set the active online policy and tunables for the auto-movable online policy. Follow-up commits tweaked the "auto-movable" policy to also consider memory device details when selecting zones for memory blocks to be onlined. Let's document the new toggles and how the two online policies we have work. [david@redhat.com: updates] Link: https://lkml.kernel.org/r/20211011082058.6076-4-david@redhat.com Link: https://lkml.kernel.org/r/20210930144117.23641-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06memory-hotplug.rst: fix wrong /sys/module/memory_hotplug/parameters/ pathDavid Hildenbrand1-1/+1
We accidentially added a superfluous "s". Link: https://lkml.kernel.org/r/20210930144117.23641-3-david@redhat.com Fixes: ac3332c44767 ("memory-hotplug.rst: complete admin-guide overhaul") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06memory-hotplug.rst: fix two instances of "movablecore" that should be ↵David Hildenbrand1-2/+2
"movable_node" Patch series "memory-hotplug.rst: document the "auto-movable" online policy". Now that the memory-hotplug.rst overhaul is upstream, proper documentation for the "auto-movable" online policy, documenting all new toggles and options. Along, two fixes for the original overhaul. This patch (of 3): We really want to refer to the "movable_node" kernel command line parameter here. Link: https://lkml.kernel.org/r/20210930144117.23641-2-david@redhat.com Fixes: ac3332c44767 ("memory-hotplug.rst: complete admin-guide overhaul") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06hugetlbfs: extend the definition of hugepages parameter to support node ↵Zhenguo Yao2-4/+16
allocation We can specify the number of hugepages to allocate at boot. But the hugepages is balanced in all nodes at present. In some scenarios, we only need hugepages in one node. For example: DPDK needs hugepages which are in the same node as NIC. If DPDK needs four hugepages of 1G size in node1 and system has 16 numa nodes we must reserve 64 hugepages on the kernel cmdline. But only four hugepages are used. The others should be free after boot. If the system memory is low(for example: 64G), it will be an impossible task. So extend the hugepages parameter to support specifying hugepages on a specific node. For example add following parameter: hugepagesz=1G hugepages=0:1,1:3 It will allocate 1 hugepage in node0 and 3 hugepages in node1. Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zhenguo Yao <yaozhenguo1@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06tools/vm/page_owner_sort.c: count and sort by memZhenliang Wei1-1/+22
When viewing page owner information, we may be more concerned about the total memory rather than the times of stack appears. Therefore, the following adjustments are made: 1. Added the statistics on the total number of pages. 2. Added the optional parameter "-m" to configure the program to sort by memory (total pages). The general output of page_owner is as follows: Page allocated via order XXX, ... PFN XXX ... // Detailed stack Page allocated via order XXX, ... PFN XXX ... // Detailed stack The original page_owner_sort ignores PFN rows, puts the remaining rows in buf, counts the times of buf, and finally sorts them according to the times. General output: XXX times: Page allocated via order XXX, ... // Detailed stack Now, we use regexp to extract the page order value from the buf, and count the total pages for the buf. General output: XXX times, XXX pages: Page allocated via order XXX, ... // Detailed stack By default, it is still sorted by the times of buf; If you want to sort by the pages nums of buf, use the new -m parameter. Link: https://lkml.kernel.org/r/1631678242-41033-1-git-send-email-weizhenliang@huawei.com Signed-off-by: Zhenliang Wei <weizhenliang@huawei.com> Cc: Tang Bin <tangbin@cmss.chinamobile.com> Cc: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Cc: Zhenliang Wei <weizhenliang@huawei.com> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06hugetlb: support node specified when using cma for gigantic hugepagesBaolin Wang1-2/+4
Now the size of CMA area for gigantic hugepages runtime allocation is balanced for all online nodes, but we also want to specify the size of CMA per-node, or only one node in some cases, which are similar with patch [1]. For example, on some multi-nodes systems, each node's memory can be different, allocating the same size of CMA for each node is not suitable for the low-memory nodes. Meanwhile some workloads like DPDK mentioned by Zhenguo in patch [1] only need hugepages in one node. On the other hand, we have some machines with multiple types of memory, like DRAM and PMEM (persistent memory). On this system, we may want to specify all the hugepages only on DRAM node, or specify the proportion of DRAM node and PMEM node, to tuning the performance of the workloads. Thus this patch adds node format for 'hugetlb_cma' parameter to support specifying the size of CMA per-node. An example is as follows: hugetlb_cma=0:5G,2:5G which means allocating 5G size of CMA area on node 0 and node 2 respectively. And the users should use the node specific sysfs file to allocate the gigantic hugepages if specified the CMA size on that node. Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com [1] Link: https://lkml.kernel.org/r/bb790775ca60bb8f4b26956bb3f6988f74e075c7.1634261144.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06hugetlb: add demote hugetlb page sysfs interfacesMike Kravetz1-2/+28
Patch series "hugetlb: add demote/split page functionality", v4. The concurrent use of multiple hugetlb page sizes on a single system is becoming more common. One of the reasons is better TLB support for gigantic page sizes on x86 hardware. In addition, hugetlb pages are being used to back VMs in hosting environments. When using hugetlb pages to back VMs, it is often desirable to preallocate hugetlb pools. This avoids the delay and uncertainty of allocating hugetlb pages at VM startup. In addition, preallocating huge pages minimizes the issue of memory fragmentation that increases the longer the system is up and running. In such environments, a combination of larger and smaller hugetlb pages are preallocated in anticipation of backing VMs of various sizes. Over time, the preallocated pool of smaller hugetlb pages may become depleted while larger hugetlb pages still remain. In such situations, it is desirable to convert larger hugetlb pages to smaller hugetlb pages. Converting larger to smaller hugetlb pages can be accomplished today by first freeing the larger page to the buddy allocator and then allocating the smaller pages. For example, to convert 50 GB pages on x86: gb_pages=`cat .../hugepages-1048576kB/nr_hugepages` m2_pages=`cat .../hugepages-2048kB/nr_hugepages` echo $(($gb_pages - 50)) > .../hugepages-1048576kB/nr_hugepages echo $(($m2_pages + 25600)) > .../hugepages-2048kB/nr_hugepages On an idle system this operation is fairly reliable and results are as expected. The number of 2MB pages is increased as expected and the time of the operation is a second or two. However, when there is activity on the system the following issues arise: 1) This process can take quite some time, especially if allocation of the smaller pages is not immediate and requires migration/compaction. 2) There is no guarantee that the total size of smaller pages allocated will match the size of the larger page which was freed. This is because the area freed by the larger page could quickly be fragmented. In a test environment with a load that continually fills the page cache with clean pages, results such as the following can be observed: Unexpected number of 2MB pages allocated: Expected 25600, have 19944 real 0m42.092s user 0m0.008s sys 0m41.467s To address these issues, introduce the concept of hugetlb page demotion. Demotion provides a means of 'in place' splitting of a hugetlb page to pages of a smaller size. This avoids freeing pages to buddy and then trying to allocate from buddy. Page demotion is controlled via sysfs files that reside in the per-hugetlb page size and per node directories. - demote_size Target page size for demotion, a smaller huge page size. File can be written to chose a smaller huge page size if multiple are available. - demote Writable number of hugetlb pages to be demoted To demote 50 GB huge pages, one would: cat .../hugepages-1048576kB/free_hugepages /* optional, verify free pages */ cat .../hugepages-1048576kB/demote_size /* optional, verify target size */ echo 50 > .../hugepages-1048576kB/demote Only hugetlb pages which are free at the time of the request can be demoted. Demotion does not add to the complexity of surplus pages and honors reserved huge pages. Therefore, when a value is written to the sysfs demote file, that value is only the maximum number of pages which will be demoted. It is possible fewer will actually be demoted. The recently introduced per-hstate mutex is used to synchronize demote operations with other operations that modify hugetlb pools. Real world use cases -------------------- The above scenario describes a real world use case where hugetlb pages are used to back VMs on x86. Both issues of long allocation times and not necessarily getting the expected number of smaller huge pages after a free and allocate cycle have been experienced. The occurrence of these issues is dependent on other activity within the host and can not be predicted. This patch (of 5): Two new sysfs files are added to demote hugtlb pages. These files are both per-hugetlb page size and per node. Files are: demote_size - The size in Kb that pages are demoted to. (read-write) demote - The number of huge pages to demote. (write-only) By default, demote_size is the next smallest huge page size. Valid huge page sizes less than huge page size may be written to this file. When huge pages are demoted, they are demoted to this size. Writing a value to demote will result in an attempt to demote that number of hugetlb pages to an appropriate number of demote_size pages. NOTE: Demote interfaces are only provided for huge page sizes if there is a smaller target demote huge page size. For example, on x86 1GB huge pages will have demote interfaces. 2MB huge pages will not have demote interfaces. This patch does not provide full demote functionality. It only provides the sysfs interfaces. It also provides documentation for the new interfaces. [mike.kravetz@oracle.com: n_mask initialization does not need to be protected by the mutex] Link: https://lkml.kernel.org/r/0530e4ef-2492-5186-f919-5db68edea654@oracle.com Link: https://lkml.kernel.org/r/20211007181918.136982-2-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: David Rientjes <rientjes@google.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Nghia Le <nghialm78@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Documentation: update pagemap with shmem exceptionsTiberiu A Georgescu1-0/+22
This patch follows the discussions on previous documentation patch threads [1][2]. It presents the exception case of shared memory management from the pagemap's point of view. It briefly describes what is missing, why it is missing and alternatives to the pagemap for page info retrieval in user space. In short, the kernel does not keep track of PTEs for swapped out shared pages within the processes that references them. Thus, the proc/pid/pagemap tool cannot print the swap destination of the shared memory pages, instead setting the pagemap entry to zero for both non-allocated and swapped out pages. This can create confusion for users who need information on swapped out pages. The reasons why maintaining the PTEs of all swapped out shared pages among all processes while maintaining similar performance is not a trivial task, or a desirable change, have been discussed extensively [1][3][4][5]. There are also arguments for why this arguably missing information should eventually be exposed to the user in either a future pagemap patch, or by an alternative tool. [1]: https://marc.info/?m=162878395426774 [2]: https://lore.kernel.org/lkml/20210920164931.175411-1-tiberiu.georgescu@nutanix.com/ [3]: https://lore.kernel.org/lkml/20210730160826.63785-1-tiberiu.georgescu@nutanix.com/ [4]: https://lore.kernel.org/lkml/20210807032521.7591-1-peterx@redhat.com/ [5]: https://lore.kernel.org/lkml/20210715201651.212134-1-peterx@redhat.com/ Mention the current missing information in the pagemap and alternatives on how to retrieve it, in case someone stumbles upon unexpected behaviour. Link: https://lkml.kernel.org/r/20210923064618.157046-1-tiberiu.georgescu@nutanix.com Link: https://lkml.kernel.org/r/20210923064618.157046-2-tiberiu.georgescu@nutanix.com Signed-off-by: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com> Reviewed-by: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Reviewed-by: Florian Schmidt <florian.schmidt@nutanix.com> Reviewed-by: Carl Waldspurger <carl.waldspurger@nutanix.com> Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06memcg, kmem: further deprecate kmem.limit_in_bytesShakeel Butt1-9/+2
The deprecation process of kmem.limit_in_bytes started with the commit 0158115f702 ("memcg, kmem: deprecate kmem.limit_in_bytes") which also explains in detail the motivation behind the deprecation. To summarize, it is the unexpected behavior on hitting the kmem limit. This patch moves the deprecation process to the next stage by disallowing to set the kmem limit. In future we might just remove the kmem.limit_in_bytes file completely. [akpm@linux-foundation.org: s/ENOTSUPP/EOPNOTSUPP/] [arnd@arndb.de: mark cancel_charge() inline] Link: https://lkml.kernel.org/r/20211022070542.679839-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20211019153408.2916808-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28Merge tag 'net-5.15-rc8' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from WiFi (mac80211), and BPF. Current release - regressions: - skb_expand_head: adjust skb->truesize to fix socket memory accounting - mptcp: fix corrupt receiver key in MPC + data + checksum Previous releases - regressions: - multicast: calculate csum of looped-back and forwarded packets - cgroup: fix memory leak caused by missing cgroup_bpf_offline - cfg80211: fix management registrations locking, prevent list corruption - cfg80211: correct false positive in bridge/4addr mode check - tcp_bpf: fix race in the tcp_bpf_send_verdict resulting in reusing previous verdict Previous releases - always broken: - sctp: enhancements for the verification tag, prevent attackers from killing SCTP sessions - tipc: fix size validations for the MSG_CRYPTO type - mac80211: mesh: fix HE operation element length check, prevent out of bound access - tls: fix sign of socket errors, prevent positive error codes being reported from read()/write() - cfg80211: scan: extend RCU protection in cfg80211_add_nontrans_list() - implement ->sock_is_readable() for UDP and AF_UNIX, fix poll() for sockets in a BPF sockmap - bpf: fix potential race in tail call compatibility check resulting in two operations which would make the map incompatible succeeding - bpf: prevent increasing bpf_jit_limit above max - bpf: fix error usage of map_fd and fdget() in generic batch update - phy: ethtool: lock the phy for consistency of results - prevent infinite while loop in skb_tx_hash() when Tx races with driver reconfiguring the queue <> traffic class mapping - usbnet: fixes for bad HW conjured by syzbot - xen: stop tx queues during live migration, prevent UAF - net-sysfs: initialize uid and gid before calling net_ns_get_ownership - mlxsw: prevent Rx stalls under memory pressure" * tag 'net-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits) Revert "net: hns3: fix pause config problem after autoneg disabled" mptcp: fix corrupt receiver key in MPC + data + checksum riscv, bpf: Fix potential NULL dereference octeontx2-af: Fix possible null pointer dereference. octeontx2-af: Display all enabled PF VF rsrc_alloc entries. octeontx2-af: Check whether ipolicers exists net: ethernet: microchip: lan743x: Fix skb allocation failure net/tls: Fix flipped sign in async_wait.err assignment net/tls: Fix flipped sign in tls_err_abort() calls net/smc: Correct spelling mistake to TCPF_SYN_RECV net/smc: Fix smc_link->llc_testlink_time overflow nfp: bpf: relax prog rejection for mtu check through max_pkt_offset vmxnet3: do not stop tx queues after netif_device_detach() r8169: Add device 10ec:8162 to driver r8169 ptp: Document the PTP_CLK_MAGIC ioctl number usbnet: fix error return code in usbnet_probe() net: hns3: adjust string spaces of some parameters of tx bd info in debugfs net: hns3: expand buffer len for some debugfs command net: hns3: add more string spaces for dumping packets number of queue info in debugfs net: hns3: fix data endian problem of some functions of debugfs ...
2021-10-28ptp: Document the PTP_CLK_MAGIC ioctl numberRandy Dunlap1-0/+1
Add PTP_CLK_MAGIC to the userspace-api/ioctl/ioctl-number.rst documentation file. Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20211024163831.10200-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25Merge tag 'pinctrl-v5.15-3' of ↵Linus Torvalds2-24/+20
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Some late pin control fixes, the most generally annoying will probably be the AMD IRQ storm fix affecting the Microsoft surface. Summary: - Three fixes pertaining to Broadcom DT bindings. Some stuff didn't work out as inteded, we need to back out - A resume bug fix in the STM32 driver - Disable and mask the interrupts on probe in the AMD pinctrl driver, affecting Microsoft surface" * tag 'pinctrl-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: amd: disable and mask interrupts on probe pinctrl: stm32: use valid pin identifier in stm32_pinctrl_resume() Revert "pinctrl: bcm: ns: support updated DT binding as syscon subnode" dt-bindings: pinctrl: brcm,ns-pinmux: drop unneeded CRU from example Revert "dt-bindings: pinctrl: bcm4708-pinmux: rework binding to use syscon"
2021-10-22Merge tag 'net-5.15-rc7' of ↵Linus Torvalds2-9/+10
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, and can. We'll have one more fix for a socket accounting regression, it's still getting polished. Otherwise things look fine. Current release - regressions: - revert "vrf: reset skb conntrack connection on VRF rcv", there are valid uses for previous behavior - can: m_can: fix iomap_read_fifo() and iomap_write_fifo() Current release - new code bugs: - mlx5: e-switch, return correct error code on group creation failure Previous releases - regressions: - sctp: fix transport encap_port update in sctp_vtag_verify - stmmac: fix E2E delay mechanism (in PTP timestamping) Previous releases - always broken: - netfilter: ip6t_rt: fix out-of-bounds read of ipv6_rt_hdr - netfilter: xt_IDLETIMER: fix out-of-bound read caused by lack of init - netfilter: ipvs: make global sysctl read-only in non-init netns - tcp: md5: fix selection between vrf and non-vrf keys - ipv6: count rx stats on the orig netdev when forwarding - bridge: mcast: use multicast_membership_interval for IGMPv3 - can: - j1939: fix UAF for rx_kref of j1939_priv abort sessions on receiving bad messages - isotp: fix TX buffer concurrent access in isotp_sendmsg() fix return error on FC timeout on TX path - ice: fix re-init of RDMA Tx queues and crash if RDMA was not inited - hns3: schedule the polling again when allocation fails, prevent stalls - drivers: add missing of_node_put() when aborting for_each_available_child_of_node() - ptp: fix possible memory leak and UAF in ptp_clock_register() - e1000e: fix packet loss in burst mode on Tiger Lake and later - mlx5e: ipsec: fix more checksum offload issues" * tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits) usbnet: sanity check for maxpacket net: enetc: make sure all traffic classes can send large frames net: enetc: fix ethtool counter name for PM0_TERR ptp: free 'vclock_index' in ptp_clock_release() sfc: Don't use netif_info before net_device setup sfc: Export fibre-specific supported link modes net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags net/mlx5e: IPsec: Fix a misuse of the software parser's fields net/mlx5e: Fix vlan data lost during suspend flow net/mlx5: E-switch, Return correct error code on group creation failure net/mlx5: Lag, change multipath and bonding to be mutually exclusive ice: Add missing E810 device ids igc: Update I226_K device ID e1000e: Fix packet loss on Tiger Lake and later e1000e: Separate TGP board type from SPT ptp: Fix possible memory leak in ptp_clock_register() net: stmmac: Fix E2E delay mechanism nfc: st95hf: Make spi remove() callback return zero net: hns3: disable sriov before unload hclge layer net: hns3: fix vf reset workqueue cannot exit ...
2021-10-18mctp: unify sockaddr_mctp typesJeremy Kerr1-5/+5
Use the more precise __kernel_sa_family_t for smctp_family, to match struct sockaddr. Also, use an unsigned int for the network member; negative networks don't make much sense. We're already using unsigned for mctp_dev and mctp_skb_cb, but need to change mctp_sock to suit. Fixes: 60fc63981693 ("mctp: Add sockaddr_mctp to uapi") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Acked-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-1/+1
Pull virtio fixes from Michael Tsirkin: "Fixes up some issues in rc5" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-vdpa: Fix the wrong input in config_cb VDUSE: fix documentation underline warning Revert "virtio-blk: Add validation for block size in config space" vhost_vdpa: unset vq irq before freeing irq virtio: write back F_VERSION_1 before validate
2021-10-16Merge branch '100GbE' of ↵Jakub Kicinski1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-10-14 Brett ensures RDMA nodes are removed during release and rebuild. He also corrects fw.mgmt.api to include the patch number for proper identification. Dave stops ida_free() being called when an IDA has not been allocated. Michal corrects the order of parameters being provided and the number of entries skipped for UDP tunnels. ==================== Link: https://lore.kernel.org/r/20211014181953.3538330-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15Merge tag 'spi-fix-v5.15-rc5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few small fixes. Mostly driver specific but there's one in the core which fixes a deadlock when adding devices on spi-mux that's triggered because spi-mux is a SPI device which is itself a SPI controller and so can instantiate devices when registered. We were using a global lock to protect against reusing chip selects but they're a per controller thing so moving the lock per controller resolves that" * tag 'spi-fix-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi-mux: Fix false-positive lockdep splats spi: Fix deadlock when adding SPI controllers on SPI buses spi: bcm-qspi: clear MSPI spifie interrupt during probe spi: spi-nxp-fspi: don't depend on a specific node name erratum workaround spi: mediatek: skip delays if they are 0 spi: atmel: Fix PDC transfer setup bug spi: spidev: Add SPI ID table spi: Use 'flash' node name instead of 'spi-flash' in example
2021-10-15Merge tag 'ntfs3_for_5.15' of ↵Linus Torvalds1-66/+75
git://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 fixes from Konstantin Komarov: "Use the new api for mounting as requested by Christoph. Also fixed: - some memory leaks and panic - xfstests (tested on x86_64) generic/016 generic/021 generic/022 generic/041 generic/274 generic/423 - some typos, wrong returned error codes, dead code, etc" * tag 'ntfs3_for_5.15' of git://github.com/Paragon-Software-Group/linux-ntfs3: (70 commits) fs/ntfs3: Check for NULL pointers in ni_try_remove_attr_list fs/ntfs3: Refactor ntfs_read_mft fs/ntfs3: Refactor ni_parse_reparse fs/ntfs3: Refactor ntfs_create_inode fs/ntfs3: Refactor ntfs_readlink_hlp fs/ntfs3: Rework ntfs_utf16_to_nls fs/ntfs3: Fix memory leak if fill_super failed fs/ntfs3: Keep prealloc for all types of files fs/ntfs3: Remove unnecessary functions fs/ntfs3: Forbid FALLOC_FL_PUNCH_HOLE for normal files fs/ntfs3: Refactoring of ntfs_set_ea fs/ntfs3: Remove locked argument in ntfs_set_ea fs/ntfs3: Use available posix_acl_release instead of ntfs_posix_acl_release fs/ntfs3: Check for NULL if ATTR_EA_INFO is incorrect fs/ntfs3: Refactoring of ntfs_init_from_boot fs/ntfs3: Reject mount if boot's cluster size < media sector size fs/ntfs3: Refactoring lock in ntfs_init_acl fs/ntfs3: Change posix_acl_equiv_mode to posix_acl_update_mode fs/ntfs3: Pass flags to ntfs_set_ea in ntfs_set_acl_ex fs/ntfs3: Refactor ntfs_get_acl_ex for better readability ...
2021-10-15Merge tag 'net-5.15-rc6' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Quite calm. The noisy DSA driver (embedded switches) changes, and adjustment to IPv6 IOAM behavior add to diffstat's bottom line but are not scary. Current release - regressions: - af_unix: rename UNIX-DGRAM to UNIX to maintain backwards compatibility - procfs: revert "add seq_puts() statement for dev_mcast", minor format change broke user space Current release - new code bugs: - dsa: fix bridge_num not getting cleared after ports leaving the bridge, resource leak - dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0, prevent packet drops if pvid is removed - dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware, prevent HW getting confused about station to VLAN mapping Previous releases - regressions: - virtio-net: fix for skb_over_panic inside big mode - phy: do not shutdown PHYs in READY state - dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's, fix link LED staying lit after ifdown - mptcp: fix possible infinite wait on recvmsg(MSG_WAITALL) - mqprio: Correct stats in mqprio_dump_class_stats() - ice: fix deadlock for Tx timestamp tracking flush - stmmac: fix feature detection on old hardware Previous releases - always broken: - sctp: account stream padding length for reconf chunk - icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe() - isdn: cpai: check ctr->cnr to avoid array index out of bound - isdn: mISDN: fix sleeping function called from invalid context - nfc: nci: fix potential UAF of rf_conn_info object - dsa: microchip: prevent ksz_mib_read_work from kicking back in after it's canceled in .remove and crashing - dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports - dsa: sja1105, ocelot: break circular dependency between switch and tag drivers - dsa: felix: improve timestamping in presence of packe loss - mlxsw: thermal: fix out-of-bounds memory accesses Misc: - ipv6: ioam: move the check for undefined bits to improve interoperability" * tag 'net-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits) icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe MAINTAINERS: Update the devicetree documentation path of imx fec driver sctp: account stream padding length for reconf chunk mlxsw: thermal: Fix out-of-bounds memory accesses ethernet: s2io: fix setting mac address during resume NFC: digital: fix possible memory leak in digital_in_send_sdd_req() NFC: digital: fix possible memory leak in digital_tg_listen_mdaa() nfc: fix error handling of nfc_proto_register() Revert "net: procfs: add seq_puts() statement for dev_mcast" net: encx24j600: check error in devm_regmap_init_encx24j600 net: korina: select CRC32 net: arc: select CRC32 net: dsa: felix: break at first CPU port during init and teardown net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header net: mscc: ocelot: deny TX timestamping of non-PTP packets net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb ...
2021-10-14ice: Print the api_patch as part of the fw.mgmt.apiBrett Creeley1-4/+5
Currently when a user uses "devlink dev info", the fw.mgmt.api will be the major.minor numbers as shown below: devlink dev info pci/0000:3b:00.0 pci/0000:3b:00.0: driver ice serial_number 00-01-00-ff-ff-00-00-00 versions: fixed: board.id K91258-000 running: fw.mgmt 6.1.2 fw.mgmt.api 1.7 <--- No patch number included fw.mgmt.build 0xd75e7d06 fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.app.name ICE OS Default Package fw.app 1.3.27.0 fw.app.bundle_id 0xc0000001 fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 stored: fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 There are many features in the driver that depend on the major, minor, and patch version of the FW. Without the patch number in the output for fw.mgmt.api debugging issues related to the FW API version is difficult. Also, using major.minor.patch aligns with the existing firmware version which uses a 3 digit value. Fix this by making the fw.mgmt.api print the major.minor.patch versions. Shown below is the result: devlink dev info pci/0000:3b:00.0 pci/0000:3b:00.0: driver ice serial_number 00-01-00-ff-ff-00-00-00 versions: fixed: board.id K91258-000 running: fw.mgmt 6.1.2 fw.mgmt.api 1.7.9 <--- patch number included fw.mgmt.build 0xd75e7d06 fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.app.name ICE OS Default Package fw.app 1.3.27.0 fw.app.bundle_id 0xc0000001 fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 stored: fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 Fixes: ff2e5c700e08 ("ice: add basic handler for devlink .info_get") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14dt-bindings: pinctrl: brcm,ns-pinmux: drop unneeded CRU from exampleRafał Miłecki1-16/+8
There is no need to include CRU in example of this binding. It wasn't complete / correct anyway. The proper binding can be find in the mfd/brcm,cru.yaml . Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211008205938.29925-2-zajec5@gmail.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-10-14Revert "dt-bindings: pinctrl: bcm4708-pinmux: rework binding to use syscon"Rafał Miłecki2-15/+19
This reverts commit 2ae80900f239484069569380e1fc4340fd6e0089. My rework was unneeded & wrong. It replaced a clear & correct "reg" property usage with a custom "offset" one. Back then I didn't understand how to properly handle CRU block binding. I heard / read about syscon and tried to use it in a totally invalid way. That change also missed Rob's review (obviously). Northstar's pin controller is a simple consistent hardware block that can be cleanly mapped using a 0x24 long reg space. Since the rework commit there wasn't any follow up modifying in-kernel DTS files to use the new binding. Broadcom also isn't known to use that bugged binding. There is close to zero chance this revert may actually cause problems / regressions. This commit is a simple revert. Example binding may (should) be updated / cleaned up but that can be handled separately. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211008205938.29925-1-zajec5@gmail.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-10-13VDUSE: fix documentation underline warningRandy Dunlap1-1/+1
Fix a VDUSE documentation build warning: Documentation/userspace-api/vduse.rst:21: WARNING: Title underline too short. Fixes: 7bc7f61897b6 ("Documentation: Add documentation for VDUSE") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Xie Yongji <xieyongji@bytedance.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: virtualization@lists.linux-foundation.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/20211006202904.30241-1-rdunlap@infradead.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-10-12Merge branch 'for-5.15-fixes' of ↵Linus Torvalds1-14/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "All documentation / comment updates" * 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroupv2, docs: fix misinformation in "device controller" section cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem docs/cgroup: remove some duplicate words
2021-10-08Merge tag 'for-linus-5.15b-rc5-tag' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - fix two minor issues in the Xen privcmd driver plus a cleanup patch for that driver - fix multiple issues related to running as PVH guest and some related earlyprintk fixes for other Xen guest types - fix an issue introduced in 5.15 the Xen balloon driver * tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/balloon: fix cancelled balloon action xen/x86: adjust data placement x86/PVH: adjust function/data placement xen/x86: hook up xen_banner() also for PVH xen/x86: generalize preferred console model from PV to PVH Dom0 xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU xen/x86: allow "earlyprintk=xen" to work for PV Dom0 xen/x86: make "earlyprintk=xen" work better for PVH Dom0 xen/x86: allow PVH Dom0 without XEN_PV=y xen/x86: prevent PVH type from getting clobbered xen/privcmd: drop "pages" parameter from xen_remap_pfn() xen/privcmd: fix error handling in mmap-resource processing xen/privcmd: replace kcalloc() by kvcalloc() when allocating empty pages
2021-10-08Merge tag 'drm-fixes-2021-10-08' of git://anongit.freedesktop.org/drm/drmLinus Torvalds3-12/+3
Pull drm fixes from Dave Airlie: "I've returned from my tropical island retreat, even managed to bring one of my kids on a dive with some turtles. Thanks to Daniel for doing last week's work. Otherwise this is the weekly fixes pull, it's a bit bigger because the vc4 reverts in your tree caused some problems with fixes in the drm-misc tree so it got left out last week, so this week has the misc fixes rebased without the vc4 pieces. Otherwise it's i915, amdgpu with the usual fixes and a scattering over other drivers. I expect things should calm down a bit more next week. core: - Kconfig fix for fb_simple vs simpledrm. i915: - Fix RKL HDMI audio - Fix runtime pm imbalance on i915_gem_shrink() error path - Fix Type-C port access before hw/sw state sync - Fix VBT backlight struct version/size check - Fix VT-d async flip on SKL/BXT with plane stretch workaround amdgpu: - DCN 3.1 DP alt mode fixes - S0ix gfxoff fix - Fix DRM_AMD_DC_SI dependencies - PCIe DPC handling fix - DCN 3.1 scaling fix - Documentation fix amdkfd: - Fix potential memory leak - IOMMUv2 init fixes vc4 (there were some hdmi fixes but things got reverted, sort it out later): - compiler fix nouveau: - Cursor fix - Fix ttm buffer moves for ampere gpu's by adding minimal acceleration support. - memory leak fixes rockchip: - crtc/clk fixup panel: - ili9341 Fix DT bindings indent - y030xx067a - yellow tint init seq fix gbefb: - Fix gbefb when built with COMPILE_TEST" * tag 'drm-fixes-2021-10-08' of git://anongit.freedesktop.org/drm/drm: (33 commits) drm/amd/display: Fix detection of 4 lane for DPALT drm/amd/display: Limit display scaling to up to 4k for DCN 3.1 drm/amd/display: Skip override for preferred link settings during link training drm/nouveau/debugfs: fix file release memory leak drm/nouveau/kms/nv50-: fix file release memory leak drm/nouveau: avoid a use-after-free when BO init fails DRM: delete DRM IRQ legacy midlayer docs video: fbdev: gbefb: Only instantiate device when built for IP32 fbdev: simplefb: fix Kconfig dependencies drm/panel: abt-y030xx067a: yellow tint fix dt-bindings: panel: ili9341: correct indentation drm/nouveau/fifo/ga102: initialise chid on return from channel creation drm/rockchip: Update crtc fixup to account for fractional clk change drm/nouveau/ga102-: support ttm buffer moves via copy engine drm/nouveau/kms/tu102-: delay enabling cursor until after assign_windows drm/sun4i: dw-hdmi: Fix HDMI PHY clock setup drm/vc4: hdmi: Remove unused struct drm/kmb: Enable alpha blended second plane drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume drm/amdgpu: init iommu after amdkfd device init ...
2021-10-08dt-bindings: net: snps,dwmac: add dwmac 3.40a IP versionHerve Codina1-0/+2
dwmac 3.40a is an old ip version that can be found on SPEAr3xx soc. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-08Merge tag 'amd-drm-fixes-5.15-2021-10-06' of ↵Dave Airlie1-2/+2
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.15-2021-10-06: amdgpu: - DCN 3.1 DP alt mode fixes - S0ix gfxoff fix - Fix DRM_AMD_DC_SI dependencies - PCIe DPC handling fix - DCN 3.1 scaling fix - Documentation fix amdkfd: - Fix potential memory leak - IOMMUv2 init fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211006203828.4818-1-alexander.deucher@amd.com
2021-10-07Merge tag 'net-5.15-rc5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from xfrm, bpf, netfilter, and wireless. Current release - regressions: - xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting a new value in the middle of an enum - unix: fix an issue in unix_shutdown causing the other end read/write failures - phy: mdio: fix memory leak Current release - new code bugs: - mlx5e: improve MQPRIO resiliency against bad configs Previous releases - regressions: - bpf: fix integer overflow leading to OOB access in map element pre-allocation - stmmac: dwmac-rk: fix ethernet on rk3399 based devices - netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1 - brcmfmac: revert using ISO3166 country code and 0 rev as fallback - i40e: fix freeing of uninitialized misc IRQ vector - iavf: fix double unlock of crit_lock Previous releases - always broken: - bpf, arm: fix register clobbering in div/mod implementation - netfilter: nf_tables: correct issues in netlink rule change event notifications - dsa: tag_dsa: fix mask for trunked packets - usb: r8152: don't resubmit rx immediately to avoid soft lockup on device unplug - i40e: fix endless loop under rtnl if FW fails to correctly respond to capability query - mlx5e: fix rx checksum offload coexistence with ipsec offload - mlx5: force round second at 1PPS out start time and allow it only in supported clock modes - phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable sequence Misc: - xfrm: slightly rejig the new policy uAPI to make it less cryptic" * tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits) net: prefer socket bound to interface when not in VRF iavf: fix double unlock of crit_lock i40e: Fix freeing of uninitialized misc IRQ vector i40e: fix endless loop under rtnl dt-bindings: net: dsa: marvell: fix compatible in example ionic: move filter sync_needed bit set gve: report 64bit tx_bytes counter from gve_handle_report_stats() gve: fix gve_get_stats() rtnetlink: fix if_nlmsg_stats_size() under estimation gve: Properly handle errors in gve_assign_qpl gve: Avoid freeing NULL pointer gve: Correct available tx qpl check unix: Fix an issue in unix_shutdown causing the other end read/write failures net: stmmac: trigger PCS EEE to turn off on link down net: pcs: xpcs: fix incorrect steps on disable EEE netlink: annotate data races around nlk->bound net: pcs: xpcs: fix incorrect CL37 AN sequence net: sfp: Fix typo in state machine debug string net/sched: sch_taprio: properly cancel timer from taprio_destroy() net: bridge: fix under estimation in br_get_linkxstats_size() ...
2021-10-07Merge tag 'devicetree-fixes-for-5.15-3' of ↵Linus Torvalds9-12/+6
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Add another allowed address for TI sn65dsi86 - Drop more redundant minItems/maxItems - Fix more graph 'unevaluatedProperties' warnings in media bindings * tag 'devicetree-fixes-for-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: drm/bridge: ti-sn65dsi86: Fix reg value dt-bindings: Drop more redundant 'maxItems/minItems' dt-bindings: media: Fix more graph 'unevaluatedProperties' related warnings
2021-10-06dt-bindings: net: dsa: marvell: fix compatible in exampleMarcel Ziswiler1-1/+1
While the MV88E6390 switch chip exists, one is supposed to use a compatible of "marvell,mv88e6190" for it. Fix this in the given example. Signed-off-by: Marcel Ziswiler <marcel@ziswiler.com> Fixes: a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06DRM: delete DRM IRQ legacy midlayer docsRandy Dunlap1-9/+0
Remove documentation associated with the removal of the DRM IRQ legacy midlayer. Eliminates these documentation warnings: ../drivers/gpu/drm/drm_irq.c:1: warning: 'irq helpers' not found ../drivers/gpu/drm/drm_irq.c:1: warning: no structured comments found Fixes: c1736b9008cb ("drm: IRQ midlayer is now legacy") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: dri-devel@lists.freedesktop.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20211005025312.20913-1-rdunlap@infradead.org Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-10-06dt-bindings: panel: ili9341: correct indentationKrzysztof Kozlowski1-1/+1
Correct indentation warning: ilitek,ili9341.yaml:25:9: [warning] wrong indentation: expected 10 but found 8 (indentation) Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210819101020.26368-1-krzysztof.kozlowski@canonical.com (cherry picked from commit 333ba0d9d5d5a2cf1f6bbb754045e4f2cb3ed22d) Link: https://lore.kernel.org/dri-devel/CAL_JsqKcTfgnXNYzGDSFhKS2udhw2Dvk04ODwTxUdDRQjKdT0Q@mail.gmail.com/ Signed-off-by: Maxime Ripard <maxime@cerno.tech> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-10-05Documentation/gpu: remove spurious "+" in amdgpu.rstAlex Deucher1-2/+2
Not sure why that was there. Remove it. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-05xen/x86: make "earlyprintk=xen" work for HVM/PVH DomUJan Beulich1-1/+1
xenboot_write_console() is dealing with these quite fine so I don't see why xenboot_console_setup() would return -ENOENT in this case. Adjust documentation accordingly. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/3d212583-700e-8b2d-727a-845ef33ac265@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-04dt-bindings: drm/bridge: ti-sn65dsi86: Fix reg valueGeert Uytterhoeven1-1/+1
make dtbs_check: arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dt.yaml: bridge@2c: reg:0:0: 45 was expected According to the datasheet, the I2C address can be either 0x2c or 0x2d, depending on the ADDR control input. Fixes: e3896e6dddf0b821 ("dt-bindings: drm/bridge: Document sn65dsi86 bridge bindings") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com> Link: https://lore.kernel.org/r/08f73c2aa0d4e580303357dfae107d084d962835.1632486753.git.geert+renesas@glider.be Signed-off-by: Rob Herring <robh@kernel.org>