summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-bus-cxl15
-rw-r--r--Documentation/ABI/testing/sysfs-class-led-trigger-netdev20
-rw-r--r--Documentation/ABI/testing/sysfs-devices-system-cpu13
-rw-r--r--Documentation/ABI/testing/sysfs-module11
-rw-r--r--Documentation/ABI/testing/sysfs-platform-hidma2
-rw-r--r--Documentation/ABI/testing/sysfs-platform-hidma-mgmt20
-rw-r--r--Documentation/admin-guide/devices.txt2
-rw-r--r--Documentation/admin-guide/hw-vuln/gather_data_sampling.rst109
-rw-r--r--Documentation/admin-guide/hw-vuln/index.rst14
-rw-r--r--Documentation/admin-guide/hw-vuln/spectre.rst11
-rw-r--r--Documentation/admin-guide/hw-vuln/srso.rst150
-rw-r--r--Documentation/admin-guide/kdump/vmcoreinfo.rst6
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt62
-rw-r--r--Documentation/arch/arm64/silicon-errata.rst3
-rw-r--r--Documentation/devicetree/bindings/iio/addac/adi,ad74115.yaml3
-rw-r--r--Documentation/devicetree/bindings/net/mediatek,net.yaml7
-rw-r--r--Documentation/devicetree/bindings/net/rockchip-dwmac.yaml10
-rw-r--r--Documentation/devicetree/bindings/pinctrl/qcom,sa8775p-tlmm.yaml2
-rw-r--r--Documentation/devicetree/bindings/serial/atmel,at91-usart.yaml4
-rw-r--r--Documentation/filesystems/idmappings.rst14
-rw-r--r--Documentation/filesystems/locking.rst13
-rw-r--r--Documentation/filesystems/porting.rst25
-rw-r--r--Documentation/filesystems/tmpfs.rst85
-rw-r--r--Documentation/filesystems/vfs.rst12
-rw-r--r--Documentation/i2c/writing-clients.rst2
-rw-r--r--Documentation/networking/napi.rst13
-rw-r--r--Documentation/networking/nf_conntrack-sysctl.rst4
-rw-r--r--Documentation/process/embargoed-hardware-issues.rst3
-rw-r--r--Documentation/process/security-bugs.rst39
29 files changed, 519 insertions, 155 deletions
diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 6350dd82b9a9..087f762ebfd5 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -82,7 +82,12 @@ Description:
whether it resides in persistent capacity, volatile capacity,
or the LSA, is made permanently unavailable by whatever means
is appropriate for the media type. This functionality requires
- the device to be not be actively decoding any HPA ranges.
+ the device to be disabled, that is, not actively decoding any
+ HPA ranges. This permits avoiding explicit global CPU cache
+ management, relying instead for it to be done when a region
+ transitions between software programmed and hardware committed
+ states. If this file is not present, then there is no hardware
+ support for the operation.
What /sys/bus/cxl/devices/memX/security/erase
@@ -92,7 +97,13 @@ Contact: linux-cxl@vger.kernel.org
Description:
(WO) Write a boolean 'true' string value to this attribute to
secure erase user data by changing the media encryption keys for
- all user data areas of the device.
+ all user data areas of the device. This functionality requires
+ the device to be disabled, that is, not actively decoding any
+ HPA ranges. This permits avoiding explicit global CPU cache
+ management, relying instead for it to be done when a region
+ transitions between software programmed and hardware committed
+ states. If this file is not present, then there is no hardware
+ support for the operation.
What: /sys/bus/cxl/devices/memX/firmware/
diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-netdev b/Documentation/ABI/testing/sysfs-class-led-trigger-netdev
index 78b62a23b14a..f6d9d72ce77b 100644
--- a/Documentation/ABI/testing/sysfs-class-led-trigger-netdev
+++ b/Documentation/ABI/testing/sysfs-class-led-trigger-netdev
@@ -13,7 +13,7 @@ Description:
Specifies the duration of the LED blink in milliseconds.
Defaults to 50 ms.
- With hw_control ON, the interval value MUST be set to the
+ When offloaded is true, the interval value MUST be set to the
default value and cannot be changed.
Trying to set any value in this specific mode will return
an EINVAL error.
@@ -44,8 +44,8 @@ Description:
If set to 1, the LED will blink for the milliseconds specified
in interval to signal transmission.
- With hw_control ON, the blink interval is controlled by hardware
- and won't reflect the value set in interval.
+ When offloaded is true, the blink interval is controlled by
+ hardware and won't reflect the value set in interval.
What: /sys/class/leds/<led>/rx
Date: Dec 2017
@@ -59,21 +59,21 @@ Description:
If set to 1, the LED will blink for the milliseconds specified
in interval to signal reception.
- With hw_control ON, the blink interval is controlled by hardware
- and won't reflect the value set in interval.
+ When offloaded is true, the blink interval is controlled by
+ hardware and won't reflect the value set in interval.
-What: /sys/class/leds/<led>/hw_control
+What: /sys/class/leds/<led>/offloaded
Date: Jun 2023
KernelVersion: 6.5
Contact: linux-leds@vger.kernel.org
Description:
- Communicate whether the LED trigger modes are driven by hardware
- or software fallback is used.
+ Communicate whether the LED trigger modes are offloaded to
+ hardware or whether software fallback is used.
If 0, the LED is using software fallback to blink.
- If 1, the LED is using hardware control to blink and signal the
- requested modes.
+ If 1, the LED blinking in requested mode is offloaded to
+ hardware.
What: /sys/class/leds/<led>/link_10
Date: Jun 2023
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index ecd585ca2d50..77942eedf4f6 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -513,17 +513,18 @@ Description: information about CPUs heterogeneity.
cpu_capacity: capacity of cpuX.
What: /sys/devices/system/cpu/vulnerabilities
+ /sys/devices/system/cpu/vulnerabilities/gather_data_sampling
+ /sys/devices/system/cpu/vulnerabilities/itlb_multihit
+ /sys/devices/system/cpu/vulnerabilities/l1tf
+ /sys/devices/system/cpu/vulnerabilities/mds
/sys/devices/system/cpu/vulnerabilities/meltdown
+ /sys/devices/system/cpu/vulnerabilities/mmio_stale_data
+ /sys/devices/system/cpu/vulnerabilities/retbleed
+ /sys/devices/system/cpu/vulnerabilities/spec_store_bypass
/sys/devices/system/cpu/vulnerabilities/spectre_v1
/sys/devices/system/cpu/vulnerabilities/spectre_v2
- /sys/devices/system/cpu/vulnerabilities/spec_store_bypass
- /sys/devices/system/cpu/vulnerabilities/l1tf
- /sys/devices/system/cpu/vulnerabilities/mds
/sys/devices/system/cpu/vulnerabilities/srbds
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
- /sys/devices/system/cpu/vulnerabilities/itlb_multihit
- /sys/devices/system/cpu/vulnerabilities/mmio_stale_data
- /sys/devices/system/cpu/vulnerabilities/retbleed
Date: January 2018
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Information about CPU vulnerabilities
diff --git a/Documentation/ABI/testing/sysfs-module b/Documentation/ABI/testing/sysfs-module
index 08886367d047..62addab47d0c 100644
--- a/Documentation/ABI/testing/sysfs-module
+++ b/Documentation/ABI/testing/sysfs-module
@@ -60,3 +60,14 @@ Description: Module taint flags:
C staging driver module
E unsigned module
== =====================
+
+What: /sys/module/grant_table/parameters/free_per_iteration
+Date: July 2023
+KernelVersion: 6.5 but backported to all supported stable branches
+Contact: Xen developer discussion <xen-devel@lists.xenproject.org>
+Description: Read and write number of grant entries to attempt to free per iteration.
+
+ Note: Future versions of Xen and Linux may provide a better
+ interface for controlling the rate of deferred grant reclaim
+ or may not need it at all.
+Users: Qubes OS (https://www.qubes-os.org)
diff --git a/Documentation/ABI/testing/sysfs-platform-hidma b/Documentation/ABI/testing/sysfs-platform-hidma
index fca40a54df59..a80aeda85ef6 100644
--- a/Documentation/ABI/testing/sysfs-platform-hidma
+++ b/Documentation/ABI/testing/sysfs-platform-hidma
@@ -2,7 +2,7 @@ What: /sys/devices/platform/hidma-*/chid
/sys/devices/platform/QCOM8061:*/chid
Date: Dec 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Contains the ID of the channel within the HIDMA instance.
It is used to associate a given HIDMA channel with the
diff --git a/Documentation/ABI/testing/sysfs-platform-hidma-mgmt b/Documentation/ABI/testing/sysfs-platform-hidma-mgmt
index 3b6c5c9eabdc..0373745b4e18 100644
--- a/Documentation/ABI/testing/sysfs-platform-hidma-mgmt
+++ b/Documentation/ABI/testing/sysfs-platform-hidma-mgmt
@@ -2,7 +2,7 @@ What: /sys/devices/platform/hidma-mgmt*/chanops/chan*/priority
/sys/devices/platform/QCOM8060:*/chanops/chan*/priority
Date: Nov 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Contains either 0 or 1 and indicates if the DMA channel is a
low priority (0) or high priority (1) channel.
@@ -11,7 +11,7 @@ What: /sys/devices/platform/hidma-mgmt*/chanops/chan*/weight
/sys/devices/platform/QCOM8060:*/chanops/chan*/weight
Date: Nov 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Contains 0..15 and indicates the weight of the channel among
equal priority channels during round robin scheduling.
@@ -20,7 +20,7 @@ What: /sys/devices/platform/hidma-mgmt*/chreset_timeout_cycles
/sys/devices/platform/QCOM8060:*/chreset_timeout_cycles
Date: Nov 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Contains the platform specific cycle value to wait after a
reset command is issued. If the value is chosen too short,
@@ -32,7 +32,7 @@ What: /sys/devices/platform/hidma-mgmt*/dma_channels
/sys/devices/platform/QCOM8060:*/dma_channels
Date: Nov 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Contains the number of dma channels supported by one instance
of HIDMA hardware. The value may change from chip to chip.
@@ -41,7 +41,7 @@ What: /sys/devices/platform/hidma-mgmt*/hw_version_major
/sys/devices/platform/QCOM8060:*/hw_version_major
Date: Nov 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Version number major for the hardware.
@@ -49,7 +49,7 @@ What: /sys/devices/platform/hidma-mgmt*/hw_version_minor
/sys/devices/platform/QCOM8060:*/hw_version_minor
Date: Nov 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Version number minor for the hardware.
@@ -57,7 +57,7 @@ What: /sys/devices/platform/hidma-mgmt*/max_rd_xactions
/sys/devices/platform/QCOM8060:*/max_rd_xactions
Date: Nov 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Contains a value between 0 and 31. Maximum number of
read transactions that can be issued back to back.
@@ -69,7 +69,7 @@ What: /sys/devices/platform/hidma-mgmt*/max_read_request
/sys/devices/platform/QCOM8060:*/max_read_request
Date: Nov 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Size of each read request. The value needs to be a power
of two and can be between 128 and 1024.
@@ -78,7 +78,7 @@ What: /sys/devices/platform/hidma-mgmt*/max_wr_xactions
/sys/devices/platform/QCOM8060:*/max_wr_xactions
Date: Nov 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Contains a value between 0 and 31. Maximum number of
write transactions that can be issued back to back.
@@ -91,7 +91,7 @@ What: /sys/devices/platform/hidma-mgmt*/max_write_request
/sys/devices/platform/QCOM8060:*/max_write_request
Date: Nov 2015
KernelVersion: 4.4
-Contact: "Sinan Kaya <okaya@codeaurora.org>"
+Contact: "Sinan Kaya <okaya@kernel.org>"
Description:
Size of each write request. The value needs to be a power
of two and can be between 128 and 1024.
diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt
index 06c525e01ea5..b1b57f638b94 100644
--- a/Documentation/admin-guide/devices.txt
+++ b/Documentation/admin-guide/devices.txt
@@ -2691,7 +2691,7 @@
45 = /dev/ttyMM1 Marvell MPSC - port 1 (obsolete unused)
46 = /dev/ttyCPM0 PPC CPM (SCC or SMC) - port 0
...
- 47 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5
+ 49 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 3
50 = /dev/ttyIOC0 Altix serial card
...
81 = /dev/ttyIOC31 Altix serial card
diff --git a/Documentation/admin-guide/hw-vuln/gather_data_sampling.rst b/Documentation/admin-guide/hw-vuln/gather_data_sampling.rst
new file mode 100644
index 000000000000..264bfa937f7d
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/gather_data_sampling.rst
@@ -0,0 +1,109 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+GDS - Gather Data Sampling
+==========================
+
+Gather Data Sampling is a hardware vulnerability which allows unprivileged
+speculative access to data which was previously stored in vector registers.
+
+Problem
+-------
+When a gather instruction performs loads from memory, different data elements
+are merged into the destination vector register. However, when a gather
+instruction that is transiently executed encounters a fault, stale data from
+architectural or internal vector registers may get transiently forwarded to the
+destination vector register instead. This will allow a malicious attacker to
+infer stale data using typical side channel techniques like cache timing
+attacks. GDS is a purely sampling-based attack.
+
+The attacker uses gather instructions to infer the stale vector register data.
+The victim does not need to do anything special other than use the vector
+registers. The victim does not need to use gather instructions to be
+vulnerable.
+
+Because the buffers are shared between Hyper-Threads cross Hyper-Thread attacks
+are possible.
+
+Attack scenarios
+----------------
+Without mitigation, GDS can infer stale data across virtually all
+permission boundaries:
+
+ Non-enclaves can infer SGX enclave data
+ Userspace can infer kernel data
+ Guests can infer data from hosts
+ Guest can infer guest from other guests
+ Users can infer data from other users
+
+Because of this, it is important to ensure that the mitigation stays enabled in
+lower-privilege contexts like guests and when running outside SGX enclaves.
+
+The hardware enforces the mitigation for SGX. Likewise, VMMs should ensure
+that guests are not allowed to disable the GDS mitigation. If a host erred and
+allowed this, a guest could theoretically disable GDS mitigation, mount an
+attack, and re-enable it.
+
+Mitigation mechanism
+--------------------
+This issue is mitigated in microcode. The microcode defines the following new
+bits:
+
+ ================================ === ============================
+ IA32_ARCH_CAPABILITIES[GDS_CTRL] R/O Enumerates GDS vulnerability
+ and mitigation support.
+ IA32_ARCH_CAPABILITIES[GDS_NO] R/O Processor is not vulnerable.
+ IA32_MCU_OPT_CTRL[GDS_MITG_DIS] R/W Disables the mitigation
+ 0 by default.
+ IA32_MCU_OPT_CTRL[GDS_MITG_LOCK] R/W Locks GDS_MITG_DIS=0. Writes
+ to GDS_MITG_DIS are ignored
+ Can't be cleared once set.
+ ================================ === ============================
+
+GDS can also be mitigated on systems that don't have updated microcode by
+disabling AVX. This can be done by setting gather_data_sampling="force" or
+"clearcpuid=avx" on the kernel command-line.
+
+If used, these options will disable AVX use by turning off XSAVE YMM support.
+However, the processor will still enumerate AVX support. Userspace that
+does not follow proper AVX enumeration to check both AVX *and* XSAVE YMM
+support will break.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+The mitigation can be disabled by setting "gather_data_sampling=off" or
+"mitigations=off" on the kernel command line. Not specifying either will default
+to the mitigation being enabled. Specifying "gather_data_sampling=force" will
+use the microcode mitigation when available or disable AVX on affected systems
+where the microcode hasn't been updated to include the mitigation.
+
+GDS System Information
+------------------------
+The kernel provides vulnerability status information through sysfs. For
+GDS this can be accessed by the following sysfs file:
+
+/sys/devices/system/cpu/vulnerabilities/gather_data_sampling
+
+The possible values contained in this file are:
+
+ ============================== =============================================
+ Not affected Processor not vulnerable.
+ Vulnerable Processor vulnerable and mitigation disabled.
+ Vulnerable: No microcode Processor vulnerable and microcode is missing
+ mitigation.
+ Mitigation: AVX disabled,
+ no microcode Processor is vulnerable and microcode is missing
+ mitigation. AVX disabled as mitigation.
+ Mitigation: Microcode Processor is vulnerable and mitigation is in
+ effect.
+ Mitigation: Microcode (locked) Processor is vulnerable and mitigation is in
+ effect and cannot be disabled.
+ Unknown: Dependent on
+ hypervisor status Running on a virtual guest processor that is
+ affected but with no way to know if host
+ processor is mitigated or vulnerable.
+ ============================== =============================================
+
+GDS Default mitigation
+----------------------
+The updated microcode will enable the mitigation by default. The kernel's
+default action is to leave the mitigation enabled.
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index e0614760a99e..de99caabf65a 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -13,9 +13,11 @@ are configurable at compile, boot or run time.
l1tf
mds
tsx_async_abort
- multihit.rst
- special-register-buffer-data-sampling.rst
- core-scheduling.rst
- l1d_flush.rst
- processor_mmio_stale_data.rst
- cross-thread-rsb.rst
+ multihit
+ special-register-buffer-data-sampling
+ core-scheduling
+ l1d_flush
+ processor_mmio_stale_data
+ cross-thread-rsb
+ srso
+ gather_data_sampling
diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 4d186f599d90..32a8893e5617 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -484,11 +484,14 @@ Spectre variant 2
Systems which support enhanced IBRS (eIBRS) enable IBRS protection once at
boot, by setting the IBRS bit, and they're automatically protected against
- Spectre v2 variant attacks, including cross-thread branch target injections
- on SMT systems (STIBP). In other words, eIBRS enables STIBP too.
+ Spectre v2 variant attacks.
- Legacy IBRS systems clear the IBRS bit on exit to userspace and
- therefore explicitly enable STIBP for that
+ On Intel's enhanced IBRS systems, this includes cross-thread branch target
+ injections on SMT systems (STIBP). In other words, Intel eIBRS enables
+ STIBP, too.
+
+ AMD Automatic IBRS does not protect userspace, and Legacy IBRS systems clear
+ the IBRS bit on exit to userspace, therefore both explicitly enable STIBP.
The retpoline mitigation is turned on by default on vulnerable
CPUs. It can be forced on or off by the administrator
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
new file mode 100644
index 000000000000..b6cfb51cb0b4
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -0,0 +1,150 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Speculative Return Stack Overflow (SRSO)
+========================================
+
+This is a mitigation for the speculative return stack overflow (SRSO)
+vulnerability found on AMD processors. The mechanism is by now the well
+known scenario of poisoning CPU functional units - the Branch Target
+Buffer (BTB) and Return Address Predictor (RAP) in this case - and then
+tricking the elevated privilege domain (the kernel) into leaking
+sensitive data.
+
+AMD CPUs predict RET instructions using a Return Address Predictor (aka
+Return Address Stack/Return Stack Buffer). In some cases, a non-architectural
+CALL instruction (i.e., an instruction predicted to be a CALL but is
+not actually a CALL) can create an entry in the RAP which may be used
+to predict the target of a subsequent RET instruction.
+
+The specific circumstances that lead to this varies by microarchitecture
+but the concern is that an attacker can mis-train the CPU BTB to predict
+non-architectural CALL instructions in kernel space and use this to
+control the speculative target of a subsequent kernel RET, potentially
+leading to information disclosure via a speculative side-channel.
+
+The issue is tracked under CVE-2023-20569.
+
+Affected processors
+-------------------
+
+AMD Zen, generations 1-4. That is, all families 0x17 and 0x19. Older
+processors have not been investigated.
+
+System information and options
+------------------------------
+
+First of all, it is required that the latest microcode be loaded for
+mitigations to be effective.
+
+The sysfs file showing SRSO mitigation status is:
+
+ /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
+
+The possible values in this file are:
+
+ * 'Not affected':
+
+ The processor is not vulnerable
+
+ * 'Vulnerable: no microcode':
+
+ The processor is vulnerable, no microcode extending IBPB
+ functionality to address the vulnerability has been applied.
+
+ * 'Mitigation: microcode':
+
+ Extended IBPB functionality microcode patch has been applied. It does
+ not address User->Kernel and Guest->Host transitions protection but it
+ does address User->User and VM->VM attack vectors.
+
+ Note that User->User mitigation is controlled by how the IBPB aspect in
+ the Spectre v2 mitigation is selected:
+
+ * conditional IBPB:
+
+ where each process can select whether it needs an IBPB issued
+ around it PR_SPEC_DISABLE/_ENABLE etc, see :doc:`spectre`
+
+ * strict:
+
+ i.e., always on - by supplying spectre_v2_user=on on the kernel
+ command line
+
+ (spec_rstack_overflow=microcode)
+
+ * 'Mitigation: safe RET':
+
+ Software-only mitigation. It complements the extended IBPB microcode
+ patch functionality by addressing User->Kernel and Guest->Host
+ transitions protection.
+
+ Selected by default or by spec_rstack_overflow=safe-ret
+
+ * 'Mitigation: IBPB':
+
+ Similar protection as "safe RET" above but employs an IBPB barrier on
+ privilege domain crossings (User->Kernel, Guest->Host).
+
+ (spec_rstack_overflow=ibpb)
+
+ * 'Mitigation: IBPB on VMEXIT':
+
+ Mitigation addressing the cloud provider scenario - the Guest->Host
+ transitions only.
+
+ (spec_rstack_overflow=ibpb-vmexit)
+
+
+
+In order to exploit vulnerability, an attacker needs to:
+
+ - gain local access on the machine
+
+ - break kASLR
+
+ - find gadgets in the running kernel in order to use them in the exploit
+
+ - potentially create and pin an additional workload on the sibling
+ thread, depending on the microarchitecture (not necessary on fam 0x19)
+
+ - run the exploit
+
+Considering the performance implications of each mitigation type, the
+default one is 'Mitigation: safe RET' which should take care of most
+attack vectors, including the local User->Kernel one.
+
+As always, the user is advised to keep her/his system up-to-date by
+applying software updates regularly.
+
+The default setting will be reevaluated when needed and especially when
+new attack vectors appear.
+
+As one can surmise, 'Mitigation: safe RET' does come at the cost of some
+performance depending on the workload. If one trusts her/his userspace
+and does not want to suffer the performance impact, one can always
+disable the mitigation with spec_rstack_overflow=off.
+
+Similarly, 'Mitigation: IBPB' is another full mitigation type employing
+an indrect branch prediction barrier after having applied the required
+microcode patch for one's system. This mitigation comes also at
+a performance cost.
+
+Mitigation: safe RET
+--------------------
+
+The mitigation works by ensuring all RET instructions speculate to
+a controlled location, similar to how speculation is controlled in the
+retpoline sequence. To accomplish this, the __x86_return_thunk forces
+the CPU to mispredict every function return using a 'safe return'
+sequence.
+
+To ensure the safety of this mitigation, the kernel must ensure that the
+safe return sequence is itself free from attacker interference. In Zen3
+and Zen4, this is accomplished by creating a BTB alias between the
+untraining function srso_alias_untrain_ret() and the safe return
+function srso_alias_safe_ret() which results in evicting a potentially
+poisoned BTB entry and using that safe one for all function returns.
+
+In older Zen1 and Zen2, this is accomplished using a reinterpretation
+technique similar to Retbleed one: srso_untrain_ret() and
+srso_safe_ret().
diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index c18d94fa6470..f8ebb63b6c5d 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -624,3 +624,9 @@ Used to get the correct ranges:
* VMALLOC_START ~ VMALLOC_END : vmalloc() / ioremap() space.
* VMEMMAP_START ~ VMEMMAP_END : vmemmap space, used for struct page array.
* KERNEL_LINK_ADDR : start address of Kernel link and BPF
+
+va_kernel_pa_offset
+-------------------
+
+Indicates the offset between the kernel virtual and physical mappings.
+Used to translate virtual to physical addresses.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1457995fd41..6ff63638ed82 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1623,6 +1623,26 @@
Format: off | on
default: on
+ gather_data_sampling=
+ [X86,INTEL] Control the Gather Data Sampling (GDS)
+ mitigation.
+
+ Gather Data Sampling is a hardware vulnerability which
+ allows unprivileged speculative access to data which was
+ previously stored in vector registers.
+
+ This issue is mitigated by default in updated microcode.
+ The mitigation may have a performance impact but can be
+ disabled. On systems without the microcode mitigation
+ disabling AVX serves as a mitigation.
+
+ force: Disable AVX to mitigate systems without
+ microcode mitigation. No effect if the microcode
+ mitigation is present. Known to cause crashes in
+ userspace with buggy AVX enumeration.
+
+ off: Disable GDS mitigation.
+
gcov_persist= [GCOV] When non-zero (default), profiling data for
kernel modules is saved and remains accessible via
debugfs, even when the module is unloaded/reloaded.
@@ -3273,24 +3293,25 @@
Disable all optional CPU mitigations. This
improves system performance, but it may also
expose users to several CPU vulnerabilities.
- Equivalent to: nopti [X86,PPC]
- if nokaslr then kpti=0 [ARM64]
- nospectre_v1 [X86,PPC]
- nobp=0 [S390]
- nospectre_v2 [X86,PPC,S390,ARM64]
- spectre_v2_user=off [X86]
- spec_store_bypass_disable=off [X86,PPC]
- ssbd=force-off [ARM64]
- nospectre_bhb [ARM64]
+ Equivalent to: if nokaslr then kpti=0 [ARM64]
+ gather_data_sampling=off [X86]
+ kvm.nx_huge_pages=off [X86]
l1tf=off [X86]
mds=off [X86]
- tsx_async_abort=off [X86]
- kvm.nx_huge_pages=off [X86]
- srbds=off [X86,INTEL]
+ mmio_stale_data=off [X86]
no_entry_flush [PPC]
no_uaccess_flush [PPC]
- mmio_stale_data=off [X86]
+ nobp=0 [S390]
+ nopti [X86,PPC]
+ nospectre_bhb [ARM64]
+ nospectre_v1 [X86,PPC]
+ nospectre_v2 [X86,PPC,S390,ARM64]
retbleed=off [X86]
+ spec_store_bypass_disable=off [X86,PPC]
+ spectre_v2_user=off [X86]
+ srbds=off [X86,INTEL]
+ ssbd=force-off [ARM64]
+ tsx_async_abort=off [X86]
Exceptions:
This does not have any effect on
@@ -5501,6 +5522,10 @@
Useful for devices that are detected asynchronously
(e.g. USB and MMC devices).
+ rootwait= [KNL] Maximum time (in seconds) to wait for root device
+ to show up before attempting to mount the root
+ filesystem.
+
rproc_mem=nn[KMG][@address]
[KNL,ARM,CMA] Remoteproc physical memory block.
Memory area to be used by remote processor image,
@@ -5875,6 +5900,17 @@
Not specifying this option is equivalent to
spectre_v2_user=auto.
+ spec_rstack_overflow=
+ [X86] Control RAS overflow mitigation on AMD Zen CPUs
+
+ off - Disable mitigation
+ microcode - Enable microcode mitigation only
+ safe-ret - Enable sw-only safe RET mitigation (default)
+ ibpb - Enable mitigation by issuing IBPB on
+ kernel entry
+ ibpb-vmexit - Issue IBPB only on VMEXIT
+ (cloud-specific mitigation)
+
spec_store_bypass_disable=
[HW] Control Speculative Store Bypass (SSB) Disable mitigation
(Speculative Store Bypass vulnerability)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index 496cdca5cb99..bedd3a1d7b42 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -148,6 +148,9 @@ stable kernels.
| ARM | MMU-700 | #2268618,2812531| N/A |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | GIC-700 | #2941627 | ARM64_ERRATUM_2941627 |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 |
diff --git a/Documentation/devicetree/bindings/iio/addac/adi,ad74115.yaml b/Documentation/devicetree/bindings/iio/addac/adi,ad74115.yaml
index 72d2e910f206..2594fa192f93 100644
--- a/Documentation/devicetree/bindings/iio/addac/adi,ad74115.yaml
+++ b/Documentation/devicetree/bindings/iio/addac/adi,ad74115.yaml
@@ -216,7 +216,6 @@ properties:
description: Whether to enable burnout current for EXT1.
adi,ext1-burnout-current-nanoamp:
- $ref: /schemas/types.yaml#/definitions/uint32
description:
Burnout current in nanoamps to be applied to EXT1.
enum: [0, 50, 500, 1000, 10000]
@@ -233,7 +232,6 @@ properties:
description: Whether to enable burnout current for EXT2.
adi,ext2-burnout-current-nanoamp:
- $ref: /schemas/types.yaml#/definitions/uint32
description: Burnout current in nanoamps to be applied to EXT2.
enum: [0, 50, 500, 1000, 10000]
default: 0
@@ -249,7 +247,6 @@ properties:
description: Whether to enable burnout current for VIOUT.
adi,viout-burnout-current-nanoamp:
- $ref: /schemas/types.yaml#/definitions/uint32
description: Burnout current in nanoamps to be applied to VIOUT.
enum: [0, 1000, 10000]
default: 0
diff --git a/Documentation/devicetree/bindings/net/mediatek,net.yaml b/Documentation/devicetree/bindings/net/mediatek,net.yaml
index acb2b2ac4fe1..31cc0c412805 100644
--- a/Documentation/devicetree/bindings/net/mediatek,net.yaml
+++ b/Documentation/devicetree/bindings/net/mediatek,net.yaml
@@ -293,7 +293,7 @@ allOf:
patternProperties:
"^mac@[0-1]$":
type: object
- additionalProperties: false
+ unevaluatedProperties: false
allOf:
- $ref: ethernet-controller.yaml#
description:
@@ -305,14 +305,9 @@ patternProperties:
reg:
maxItems: 1
- phy-handle: true
-
- phy-mode: true
-
required:
- reg
- compatible
- - phy-handle
required:
- compatible
diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
index 176ea5f90251..7f324c6da915 100644
--- a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
@@ -91,12 +91,18 @@ properties:
$ref: /schemas/types.yaml#/definitions/phandle
tx_delay:
- description: Delay value for TXD timing. Range value is 0~0x7F, 0x30 as default.
+ description: Delay value for TXD timing.
$ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 0x7F
+ default: 0x30
rx_delay:
- description: Delay value for RXD timing. Range value is 0~0x7F, 0x10 as default.
+ description: Delay value for RXD timing.
$ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 0x7F
+ default: 0x10
phy-supply:
description: PHY regulator
diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,sa8775p-tlmm.yaml b/Documentation/devicetree/bindings/pinctrl/qcom,sa8775p-tlmm.yaml
index e608a4f1bcae..e119a226a4b1 100644
--- a/Documentation/devicetree/bindings/pinctrl/qcom,sa8775p-tlmm.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/qcom,sa8775p-tlmm.yaml
@@ -87,7 +87,7 @@ $defs:
emac0_mdc, emac0_mdio, emac0_ptp_aux, emac0_ptp_pps, emac1_mcg0,
emac1_mcg1, emac1_mcg2, emac1_mcg3, emac1_mdc, emac1_mdio,
emac1_ptp_aux, emac1_ptp_pps, gcc_gp1, gcc_gp2, gcc_gp3,
- gcc_gp4, gcc_gp5, hs0_mi2s, hs1_mi2s, hs2_mi2s, ibi_i3c,
+ gcc_gp4, gcc_gp5, gpio, hs0_mi2s, hs1_mi2s, hs2_mi2s, ibi_i3c,
jitter_bist, mdp0_vsync0, mdp0_vsync1, mdp0_vsync2, mdp0_vsync3,
mdp0_vsync4, mdp0_vsync5, mdp0_vsync6, mdp0_vsync7, mdp0_vsync8,
mdp1_vsync0, mdp1_vsync1, mdp1_vsync2, mdp1_vsync3, mdp1_vsync4,
diff --git a/Documentation/devicetree/bindings/serial/atmel,at91-usart.yaml b/Documentation/devicetree/bindings/serial/atmel,at91-usart.yaml
index 30b2131b5860..65cb2e5c5eee 100644
--- a/Documentation/devicetree/bindings/serial/atmel,at91-usart.yaml
+++ b/Documentation/devicetree/bindings/serial/atmel,at91-usart.yaml
@@ -16,7 +16,6 @@ properties:
- enum:
- atmel,at91rm9200-usart
- atmel,at91sam9260-usart
- - microchip,sam9x60-usart
- items:
- const: atmel,at91rm9200-dbgu
- const: atmel,at91rm9200-usart
@@ -24,6 +23,9 @@ properties:
- const: atmel,at91sam9260-dbgu
- const: atmel,at91sam9260-usart
- items:
+ - const: microchip,sam9x60-usart
+ - const: atmel,at91sam9260-usart
+ - items:
- const: microchip,sam9x60-dbgu
- const: microchip,sam9x60-usart
- const: atmel,at91sam9260-dbgu
diff --git a/Documentation/filesystems/idmappings.rst b/Documentation/filesystems/idmappings.rst
index ad6d21640576..d095c5838f94 100644
--- a/Documentation/filesystems/idmappings.rst
+++ b/Documentation/filesystems/idmappings.rst
@@ -146,9 +146,10 @@ For the rest of this document we will prefix all userspace ids with ``u`` and
all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So
an idmapping will be written as ``u0:k10000:r10000``.
-For example, the id ``u1000`` is an id in the upper idmapset or "userspace
-idmapset" starting with ``u1000``. And it is mapped to ``k11000`` which is a
-kernel id in the lower idmapset or "kernel idmapset" starting with ``k10000``.
+For example, within this idmapping, the id ``u1000`` is an id in the upper
+idmapset or "userspace idmapset" starting with ``u0``. And it is mapped to
+``k11000`` which is a kernel id in the lower idmapset or "kernel idmapset"
+starting with ``k10000``.
A kernel id is always created by an idmapping. Such idmappings are associated
with user namespaces. Since we mainly care about how idmappings work we're not
@@ -373,6 +374,13 @@ kernel maps the caller's userspace id down into a kernel id according to the
caller's idmapping and then maps that kernel id up according to the
filesystem's idmapping.
+From the implementation point it's worth mentioning how idmappings are represented.
+All idmappings are taken from the corresponding user namespace.
+
+ - caller's idmapping (usually taken from ``current_user_ns()``)
+ - filesystem's idmapping (``sb->s_user_ns``)
+ - mount's idmapping (``mnt_idmap(vfsmnt)``)
+
Let's see some examples with caller/filesystem idmapping but without mount
idmappings. This will exhibit some problems we can hit. After that we will
revisit/reconsider these examples, this time using mount idmappings, to see how
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index ca3d24becbf5..b7e5a3841aa4 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -85,13 +85,14 @@ prototypes::
struct dentry *dentry, struct fileattr *fa);
int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
struct posix_acl * (*get_acl)(struct mnt_idmap *, struct dentry *, int);
+ struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
locking rules:
all may block
-============== =============================================
+============== ==================================================
ops i_rwsem(inode)
-============== =============================================
+============== ==================================================
lookup: shared
create: exclusive
link: exclusive (both)
@@ -115,7 +116,8 @@ atomic_open: shared (exclusive if O_CREAT is set in open flags)
tmpfile: no
fileattr_get: no or exclusive
fileattr_set: exclusive
-============== =============================================
+get_offset_ctx no
+============== ==================================================
Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem
@@ -558,9 +560,8 @@ mutex or just to use i_size_read() instead.
Note: this does not protect the file->f_pos against concurrent modifications
since this is something the userspace has to take care about.
-->iterate() is called with i_rwsem exclusive.
-
-->iterate_shared() is called with i_rwsem at least shared.
+->iterate_shared() is called with i_rwsem held for reading, and with the
+file f_pos_lock held exclusively
->fasync() is responsible for maintaining the FASYNC bit in filp->f_flags.
Most instances call fasync_helper(), which does that maintenance, so it's
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index d2d684ae7798..0f5da78ef4f9 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -537,7 +537,7 @@ vfs_readdir() is gone; switch to iterate_dir() instead
**mandatory**
-->readdir() is gone now; switch to ->iterate()
+->readdir() is gone now; switch to ->iterate_shared()
**mandatory**
@@ -693,24 +693,19 @@ parallel now.
---
-**recommended**
+**mandatory**
-->iterate_shared() is added; it's a parallel variant of ->iterate().
+->iterate_shared() is added.
Exclusion on struct file level is still provided (as well as that
between it and lseek on the same struct file), but if your directory
has been opened several times, you can get these called in parallel.
Exclusion between that method and all directory-modifying ones is
still provided, of course.
-Often enough ->iterate() can serve as ->iterate_shared() without any
-changes - it is a read-only operation, after all. If you have any
-per-inode or per-dentry in-core data structures modified by ->iterate(),
-you might need something to serialize the access to them. If you
-do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for
-that; look for in-tree examples.
-
-Old method is only used if the new one is absent; eventually it will
-be removed. Switch while you still can; the old one won't stay.
+If you have any per-inode or per-dentry in-core data structures modified
+by ->iterate_shared(), you might need something to serialize the access
+to them. If you do dcache pre-seeding, you'll need to switch to
+d_alloc_parallel() for that; look for in-tree examples.
---
@@ -930,9 +925,9 @@ should be done by looking at FMODE_LSEEK in file->f_mode.
filldir_t (readdir callbacks) calling conventions have changed. Instead of
returning 0 or -E... it returns bool now. false means "no more" (as -E... used
to) and true - "keep going" (as 0 in old calling conventions). Rationale:
-callers never looked at specific -E... values anyway. ->iterate() and
-->iterate_shared() instance require no changes at all, all filldir_t ones in
-the tree converted.
+callers never looked at specific -E... values anyway. -> iterate_shared()
+instances require no changes at all, all filldir_t ones in the tree
+converted.
---
diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
index f18f46be5c0c..56a26c843dbe 100644
--- a/Documentation/filesystems/tmpfs.rst
+++ b/Documentation/filesystems/tmpfs.rst
@@ -21,8 +21,8 @@ explained further below, some of which can be reconfigured dynamically on the
fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
filesystem can be resized but it cannot be resized to a size below its current
usage. tmpfs also supports POSIX ACLs, and extended attributes for the
-trusted.* and security.* namespaces. ramfs does not use swap and you cannot
-modify any parameter for a ramfs filesystem. The size limit of a ramfs
+trusted.*, security.* and user.* namespaces. ramfs does not use swap and you
+cannot modify any parameter for a ramfs filesystem. The size limit of a ramfs
filesystem is how much memory you have available, and so care must be taken if
used so to not run out of memory.
@@ -84,8 +84,6 @@ nr_inodes The maximum number of inodes for this instance. The default
is half of the number of your physical RAM pages, or (on a
machine with highmem) the number of lowmem RAM pages,
whichever is the lower.
-noswap Disables swap. Remounts must respect the original settings.
- By default swap is enabled.
========= ============================================================
These parameters accept a suffix k, m or g for kilo, mega and giga and
@@ -99,36 +97,65 @@ mount with such options, since it allows any user with write access to
use up all the memory on the machine; but enhances the scalability of
that instance in a system with many CPUs making intensive use of it.
+If nr_inodes is not 0, that limited space for inodes is also used up by
+extended attributes: "df -i"'s IUsed and IUse% increase, IFree decreases.
+
+tmpfs blocks may be swapped out, when there is a shortage of memory.
+tmpfs has a mount option to disable its use of swap:
+
+====== ===========================================================
+noswap Disables swap. Remounts must respect the original settings.
+ By default swap is enabled.
+====== ===========================================================
+
tmpfs also supports Transparent Huge Pages which requires a kernel
configured with CONFIG_TRANSPARENT_HUGEPAGE and with huge supported for
your system (has_transparent_hugepage(), which is architecture specific).
The mount options for this are:
-====== ============================================================
-huge=0 never: disables huge pages for the mount
-huge=1 always: enables huge pages for the mount
-huge=2 within_size: only allocate huge pages if the page will be
- fully within i_size, also respect fadvise()/madvise() hints.
-huge=3 advise: only allocate huge pages if requested with
- fadvise()/madvise()
-====== ============================================================
-
-There is a sysfs file which you can also use to control system wide THP
-configuration for all tmpfs mounts, the file is:
-
-/sys/kernel/mm/transparent_hugepage/shmem_enabled
-
-This sysfs file is placed on top of THP sysfs directory and so is registered
-by THP code. It is however only used to control all tmpfs mounts with one
-single knob. Since it controls all tmpfs mounts it should only be used either
-for emergency or testing purposes. The values you can set for shmem_enabled are:
-
-== ============================================================
--1 deny: disables huge on shm_mnt and all mounts, for
- emergency use
--2 force: enables huge on shm_mnt and all mounts, w/o needing
- option, for testing
-== ============================================================
+================ ==============================================================
+huge=never Do not allocate huge pages. This is the default.
+huge=always Attempt to allocate huge page every time a new page is needed.
+huge=within_size Only allocate huge page if it will be fully within i_size.
+ Also respect madvise(2) hints.
+huge=advise Only allocate huge page if requested with madvise(2).
+================ ==============================================================
+
+See also Documentation/admin-guide/mm/transhuge.rst, which describes the
+sysfs file /sys/kernel/mm/transparent_hugepage/shmem_enabled: which can
+be used to deny huge pages on all tmpfs mounts in an emergency, or to
+force huge pages on all tmpfs mounts for testing.
+
+tmpfs also supports quota with the following mount options
+
+======================== =================================================
+quota User and group quota accounting and enforcement
+ is enabled on the mount. Tmpfs is using hidden
+ system quota files that are initialized on mount.
+usrquota User quota accounting and enforcement is enabled
+ on the mount.
+grpquota Group quota accounting and enforcement is enabled
+ on the mount.
+usrquota_block_hardlimit Set global user quota block hard limit.
+usrquota_inode_hardlimit Set global user quota inode hard limit.
+grpquota_block_hardlimit Set global group quota block hard limit.
+grpquota_inode_hardlimit Set global group quota inode hard limit.
+======================== =================================================
+
+None of the quota related mount options can be set or changed on remount.
+
+Quota limit parameters accept a suffix k, m or g for kilo, mega and giga
+and can't be changed on remount. Default global quota limits are taking
+effect for any and all user/group/project except root the first time the
+quota entry for user/group/project id is being accessed - typically the
+first time an inode with a particular id ownership is being created after
+the mount. In other words, instead of the limits being initialized to zero,
+they are initialized with the particular value provided with these mount
+options. The limits can be changed for any user/group id at any time as they
+normally can be.
+
+Note that tmpfs quotas do not support user namespaces so no uid/gid
+translation is done if quotas are enabled inside user namespaces.
tmpfs has a mount option to set the NUMA memory allocation policy for
all files in that instance (if CONFIG_NUMA is enabled) - which can be
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index cb2a97e49872..f8fe815ab1f3 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -260,9 +260,11 @@ filesystem. The following members are defined:
void (*evict_inode) (struct inode *);
void (*put_super) (struct super_block *);
int (*sync_fs)(struct super_block *sb, int wait);
- int (*freeze_super) (struct super_block *);
+ int (*freeze_super) (struct super_block *sb,
+ enum freeze_holder who);
int (*freeze_fs) (struct super_block *);
- int (*thaw_super) (struct super_block *);
+ int (*thaw_super) (struct super_block *sb,
+ enum freeze_wholder who);
int (*unfreeze_fs) (struct super_block *);
int (*statfs) (struct dentry *, struct kstatfs *);
int (*remount_fs) (struct super_block *, int *, char *);
@@ -515,6 +517,7 @@ As of kernel 2.6.22, the following members are defined:
int (*fileattr_set)(struct mnt_idmap *idmap,
struct dentry *dentry, struct fileattr *fa);
int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
+ struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
};
Again, all methods are called without any locks being held, unless
@@ -675,7 +678,10 @@ otherwise noted.
called on ioctl(FS_IOC_SETFLAGS) and ioctl(FS_IOC_FSSETXATTR) to
change miscellaneous file flags and attributes. Callers hold
i_rwsem exclusive. If unset, then fall back to f_op->ioctl().
-
+``get_offset_ctx``
+ called to get the offset context for a directory inode. A
+ filesystem must define this operation to use
+ simple_offset_dir_operations.
The Address Space Object
========================
diff --git a/Documentation/i2c/writing-clients.rst b/Documentation/i2c/writing-clients.rst
index b7d3ae7458f8..41ddc10f1ac7 100644
--- a/Documentation/i2c/writing-clients.rst
+++ b/Documentation/i2c/writing-clients.rst
@@ -46,7 +46,7 @@ driver model device node, and its I2C address.
},
.id_table = foo_idtable,
- .probe_new = foo_probe,
+ .probe = foo_probe,
.remove = foo_remove,
/* if device autodetection is needed: */
.class = I2C_CLASS_SOMETHING,
diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
index a7a047742e93..7bf7b95c4f7a 100644
--- a/Documentation/networking/napi.rst
+++ b/Documentation/networking/napi.rst
@@ -65,15 +65,16 @@ argument - drivers can process completions for any number of Tx
packets but should only process up to ``budget`` number of
Rx packets. Rx processing is usually much more expensive.
-In other words, it is recommended to ignore the budget argument when
-performing TX buffer reclamation to ensure that the reclamation is not
-arbitrarily bounded; however, it is required to honor the budget argument
-for RX processing.
+In other words for Rx processing the ``budget`` argument limits how many
+packets driver can process in a single poll. Rx specific APIs like page
+pool or XDP cannot be used at all when ``budget`` is 0.
+skb Tx processing should happen regardless of the ``budget``, but if
+the argument is 0 driver cannot call any XDP (or page pool) APIs.
.. warning::
- The ``budget`` argument may be 0 if core tries to only process Tx completions
- and no Rx packets.
+ The ``budget`` argument may be 0 if core tries to only process
+ skb Tx completions and no Rx or XDP packets.
The poll method returns the amount of work done. If the driver still
has outstanding work to do (e.g. ``budget`` was exhausted)
diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
index 8b1045c3b59e..c383a394c665 100644
--- a/Documentation/networking/nf_conntrack-sysctl.rst
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -178,10 +178,10 @@ nf_conntrack_sctp_timeout_established - INTEGER (seconds)
Default is set to (hb_interval * path_max_retrans + rto_max)
nf_conntrack_sctp_timeout_shutdown_sent - INTEGER (seconds)
- default 0.3
+ default 3
nf_conntrack_sctp_timeout_shutdown_recd - INTEGER (seconds)
- default 0.3
+ default 3
nf_conntrack_sctp_timeout_shutdown_ack_sent - INTEGER (seconds)
default 3
diff --git a/Documentation/process/embargoed-hardware-issues.rst b/Documentation/process/embargoed-hardware-issues.rst
index df978127f2d7..cb686238f21d 100644
--- a/Documentation/process/embargoed-hardware-issues.rst
+++ b/Documentation/process/embargoed-hardware-issues.rst
@@ -254,7 +254,6 @@ an involved disclosed party. The current ambassadors list:
Samsung Javier González <javier.gonz@samsung.com>
Microsoft James Morris <jamorris@linux.microsoft.com>
- VMware
Xen Andrew Cooper <andrew.cooper3@citrix.com>
Canonical John Johansen <john.johansen@canonical.com>
@@ -263,10 +262,8 @@ an involved disclosed party. The current ambassadors list:
Red Hat Josh Poimboeuf <jpoimboe@redhat.com>
SUSE Jiri Kosina <jkosina@suse.cz>
- Amazon
Google Kees Cook <keescook@chromium.org>
- GCC
LLVM Nick Desaulniers <ndesaulniers@google.com>
============= ========================================================
diff --git a/Documentation/process/security-bugs.rst b/Documentation/process/security-bugs.rst
index 82e29837d589..5a6993795bd2 100644
--- a/Documentation/process/security-bugs.rst
+++ b/Documentation/process/security-bugs.rst
@@ -63,31 +63,28 @@ information submitted to the security list and any followup discussions
of the report are treated confidentially even after the embargo has been
lifted, in perpetuity.
-Coordination
-------------
-
-Fixes for sensitive bugs, such as those that might lead to privilege
-escalations, may need to be coordinated with the private
-<linux-distros@vs.openwall.org> mailing list so that distribution vendors
-are well prepared to issue a fixed kernel upon public disclosure of the
-upstream fix. Distros will need some time to test the proposed patch and
-will generally request at least a few days of embargo, and vendor update
-publication prefers to happen Tuesday through Thursday. When appropriate,
-the security team can assist with this coordination, or the reporter can
-include linux-distros from the start. In this case, remember to prefix
-the email Subject line with "[vs]" as described in the linux-distros wiki:
-<http://oss-security.openwall.org/wiki/mailing-lists/distros#how-to-use-the-lists>
+Coordination with other groups
+------------------------------
+
+The kernel security team strongly recommends that reporters of potential
+security issues NEVER contact the "linux-distros" mailing list until
+AFTER discussing it with the kernel security team. Do not Cc: both
+lists at once. You may contact the linux-distros mailing list after a
+fix has been agreed on and you fully understand the requirements that
+doing so will impose on you and the kernel community.
+
+The different lists have different goals and the linux-distros rules do
+not contribute to actually fixing any potential security problems.
CVE assignment
--------------
-The security team does not normally assign CVEs, nor do we require them
-for reports or fixes, as this can needlessly complicate the process and
-may delay the bug handling. If a reporter wishes to have a CVE identifier
-assigned ahead of public disclosure, they will need to contact the private
-linux-distros list, described above. When such a CVE identifier is known
-before a patch is provided, it is desirable to mention it in the commit
-message if the reporter agrees.
+The security team does not assign CVEs, nor do we require them for
+reports or fixes, as this can needlessly complicate the process and may
+delay the bug handling. If a reporter wishes to have a CVE identifier
+assigned, they should find one by themselves, for example by contacting
+MITRE directly. However under no circumstances will a patch inclusion
+be delayed to wait for a CVE identifier to arrive.
Non-disclosure agreements
-------------------------