Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 04f4230e2f86a4e961ea5466eda3db8c1762004d upstream.
The definition of spectre_bhi_state() incorrectly returns a const char
* const. This causes the a compiler warning when building with W=1:
warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
2812 | static const char * const spectre_bhi_state(void)
Remove the const qualifier from the pointer.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240409230806.1545822-1-daniel.sneddon@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5ce344beaca688f4cdea07045e0b8f03dc537e74 upstream.
When done from a virtual machine, instructions that touch APIC memory
must be emulated. By convention, MMIO accesses are typically performed
via io.h helpers such as readl() or writeq() to simplify instruction
emulation/decoding (ex: in KVM hosts and SEV guests) [0].
Currently, native_apic_mem_read() does not follow this convention,
allowing the compiler to emit instructions other than the MOV
instruction generated by readl(). In particular, when the kernel is
compiled with clang and run as a SEV-ES or SEV-SNP guest, the compiler
would emit a TESTL instruction which is not supported by the SEV-ES
emulator, causing a boot failure in that environment. It is likely the
same problem would happen in a TDX guest as that uses the same
instruction emulator as SEV-ES.
To make sure all emulators can emulate APIC memory reads via MOV, use
the readl() function in native_apic_mem_read(). It is expected that any
emulator would support MOV in any addressing mode as it is the most
generic and is what is usually emitted currently.
The TESTL instruction is emitted when native_apic_mem_read() is inlined
into apic_mem_wait_icr_idle(). The emulator comes from
insn_decode_mmio() in arch/x86/lib/insn-eval.c. It's not worth it to
extend insn_decode_mmio() to support more instructions since, in theory,
the compiler could choose to output nearly any instruction for such
reads which would bloat the emulator beyond reason.
[0] https://lore.kernel.org/all/20220405232939.73860-12-kirill.shutemov@linux.intel.com/
[ bp: Massage commit message, fix typos. ]
Signed-off-by: Adam Dunlap <acdunlap@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Kevin Loughlin <kevinloughlin@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240318230927.2191933-1-acdunlap@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dec8ced871e17eea46f097542dd074d022be4bd1 upstream.
On x86 each struct cpu_hw_events maintains a table for counter assignment but
it missed to update one for the deleted event in x86_pmu_del(). This
can make perf_clear_dirty_counters() reset used counter if it's called
before event scheduling or enabling. Then it would return out of range
data which doesn't make sense.
The following code can reproduce the problem.
$ cat repro.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
struct perf_event_attr attr = {
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
.disabled = 1,
};
void *worker(void *arg)
{
int cpu = (long)arg;
int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
void *p;
do {
ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0);
p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);
ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0);
munmap(p, 4096);
ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0);
} while (1);
return NULL;
}
int main(void)
{
int i;
int n = sysconf(_SC_NPROCESSORS_ONLN);
pthread_t *th = calloc(n, sizeof(*th));
for (i = 0; i < n; i++)
pthread_create(&th[i], NULL, worker, (void *)(long)i);
for (i = 0; i < n; i++)
pthread_join(th[i], NULL);
free(th);
return 0;
}
And you can see the out of range data using perf stat like this.
Probably it'd be easier to see on a large machine.
$ gcc -o repro repro.c -pthread
$ ./repro &
$ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }'
1.001028462 CPU6 196,719,295,683,763 cycles # 194290.996 GHz (71.54%)
1.001028462 CPU3 396,077,485,787,730 branch-misses # 15804359784.80% of all branches (71.07%)
1.001028462 CPU17 197,608,350,727,877 branch-misses # 14594186554.56% of all branches (71.22%)
2.020064073 CPU4 198,372,472,612,140 cycles # 194681.113 GHz (70.95%)
2.020064073 CPU6 199,419,277,896,696 cycles # 195720.007 GHz (70.57%)
2.020064073 CPU20 198,147,174,025,639 cycles # 194474.654 GHz (71.03%)
2.020064073 CPU20 198,421,240,580,145 stalled-cycles-frontend # 100.14% frontend cycles idle (70.93%)
3.037443155 CPU4 197,382,689,923,416 cycles # 194043.065 GHz (71.30%)
3.037443155 CPU20 196,324,797,879,414 cycles # 193003.773 GHz (71.69%)
3.037443155 CPU5 197,679,956,608,205 stalled-cycles-backend # 1315606428.66% backend cycles idle (71.19%)
3.037443155 CPU5 198,571,860,474,851 instructions # 13215422.58 insn per cycle
It should move the contents in the cpuc->assign as well.
Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240306061003.1894224-1-namhyung@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f72b544a514c07d34a0d9d5380f5905b3731e647 upstream.
spi0_lpcg: clock-controller@5a400000 {
... Col0 Col1
clocks = <&clk IMX_SC_R_SPI_0 IMX_SC_PM_CLK_PER>,// 0 1
<&dma_ipg_clk>; // 1 4
clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>;
};
Col1: index, which existing dts try to get.
Col2: actual index in lpcg driver.
lpspi0: spi@5a000000 {
...
clocks = <&spi0_lpcg 0>, <&spi0_lpcg 1>;
^ ^
Should be:
clocks = <&spi0_lpcg IMX_LPCG_CLK_0>, <&spi0_lpcg IMX_LPCG_CLK_4>;
};
Arg0 is divided by 4 in lpcg driver. <&spi0_lpcg 0> and <&spi0_lpcg 1> are
IMX_SC_PM_CLK_PER. Although code can work, code logic is wrong. It should
use IMX_LPCG_CLK_0 and IMX_LPCG_CLK_4 for lpcg arg0.
Cc: stable@vger.kernel.org
Fixes: c4098885e790 ("arm64: dts: imx8dxl: add lpspi support")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1d86c2b3946e69d6b0b93568d312aae6247847c0 upstream.
lpcg's arg0 should use clock indices instead of index.
pwm0_lpcg: clock-controller@5d400000 {
... // Col1 Col2
clocks = <&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>, // 0 0
<&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>, // 1 1
<&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>, // 2 4
<&lsio_bus_clk>, // 3 5
<&clk IMX_SC_R_PWM_0 IMX_SC_PM_CLK_PER>; // 4 6
clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_1>,
<IMX_LPCG_CLK_4>, <IMX_LPCG_CLK_5>,
<IMX_LPCG_CLK_6>;
};
Col1: index, which existing dts try to get.
Col2: actual index in lpcg driver.
pwm1 {
....
clocks = <&pwm1_lpcg 4>, <&pwm1_lpcg 1>;
^^ ^^
should be:
clocks = <&pwm1_lpcg IMX_LPCG_CLK_6>, <&pwm1_lpcg IMX_LPCG_CLK_1>;
};
Arg0 is divided by 4 in lpcg driver, so index 0 and 1 will be get by pwm
driver, which are same as IMX_LPCG_CLK_6 and IMX_LPCG_CLK_1. Even it can
work, but code logic is wrong. Fixed it by use correct indices.
Cc: stable@vger.kernel.org
Fixes: 23fa99b205ea ("arm64: dts: freescale: imx8-ss-lsio: add support for lsio_pwm0-3")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9055d87bce7276234173fa90e9702af31b3f5353 upstream.
adma_pwm_lpcg: clock-controller@5a590000 {
... col1 col2
clocks = <&clk IMX_SC_R_LCD_0_PWM_0 IMX_SC_PM_CLK_PER>,// 0 0
<&dma_ipg_clk>; // 1 4
clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>;
...
};
Col1: index, which existing dts try to get.
Col2: actual index in lpcg driver.
adma_pwm: pwm@5a190000 {
...
clocks = <&adma_pwm_lpcg 1>, <&adma_pwm_lpcg 0>;
^^ ^^
Should be
clocks = <&adma_pwm_lpcg IMX_LPCG_CLK_4>,
<&adma_pwm_lpcg IMX_LPCG_CLK_0>;
};
Arg0 will be divided by 4 in lcpg driver, so pwm will get IMX_SC_PM_CLK_PER
by <&adma_pwm_lpcg 1>, <&adma_pwm_lpcg 0>. Although function can work, code
logic is wrong. Fix it by use correct indices.
Cc: stable@vger.kernel.org
Fixes: f1d6a6b991ef ("arm64: dts: imx8qxp: add adma_pwm in adma")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 808e7716edcdb39d3498b9f567ef6017858b49aa upstream.
usb2_lpcg: clock-controller@5b270000 {
... Col1 Col2
clocks = <&conn_ahb_clk>, <&conn_ipg_clk>; // 0 6
clock-indices = <IMX_LPCG_CLK_6>, <IMX_LPCG_CLK_7>; // 0 7
...
};
Col1: index, which existing dts try to get.
Col2: actual index in lpcg driver.
usbotg1: usb@5b0d0000 {
...
clocks = <&usb2_lpcg 0>;
^^
Should be:
clocks = <&usb2_lpcg IMX_LPCG_CLK_6>;
};
usbphy1: usbphy@5b100000 {
clocks = <&usb2_lpcg 1>;
^^
SHould be:
clocks = <&usb2_lpcg IMX_LPCG_CLK_7>;
};
Arg0 is divided by 4 in lpcg driver. So lpcg will do dummy enable. Fix it
by use correct clock indices.
Cc: stable@vger.kernel.org
Fixes: 8065fc937f0f ("arm64: dts: imx8dxl: add usb1 and usb2 support")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 81975080f14167610976e968e8016e92d836266f upstream.
adc0_lpcg: clock-controller@5ac80000 {
... Col1 Col2
clocks = <&clk IMX_SC_R_ADC_0 IMX_SC_PM_CLK_PER>, // 0 0
<&dma_ipg_clk>; // 1 4
clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>;
};
Col1: index, which existing dts try to get.
Col2: actual index in lpcg driver.
adc0: adc@5a880000 {
clocks = <&adc0_lpcg 0>, <&adc0_lpcg 1>;
^^ ^^
clocks = <&adc0_lpcg IMX_LPCG_CLK_0>, <&adc0_lpcg IMX_LPCG_CLK_4>;
Arg0 is divided by 4 in lpcg driver. So adc get IMX_SC_PM_CLK_PER by
<&adc0_lpcg 0>, <&adc0_lpcg 1>. Although function can work, code logic is
wrong. Fix it by using correct indices.
Cc: stable@vger.kernel.org
Fixes: 1db044b25d2e ("arm64: dts: imx8dxl: add adc0 support")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0893392334b5dffdf616a53679c6a2942c46391b upstream.
can0_lpcg: clock-controller@5acd0000 {
... Col1 Col2
clocks = <&clk IMX_SC_R_CAN_0 IMX_SC_PM_CLK_PER>, // 0 0
<&dma_ipg_clk>, // 1 4
<&dma_ipg_clk>; // 2 5
clock-indices = <IMX_LPCG_CLK_0>,
<IMX_LPCG_CLK_4>,
<IMX_LPCG_CLK_5>;
}
Col1: index, which existing dts try to get.
Col2: actual index in lpcg driver.
flexcan1: can@5a8d0000 {
clocks = <&can0_lpcg 1>, <&can0_lpcg 0>;
^^ ^^
Should be:
clocks = <&can0_lpcg IMX_LPCG_CLK_4>, <&can0_lpcg IMX_LPCG_CLK_0>;
};
Arg0 is divided by 4 in lpcg driver. flexcan driver get IMX_SC_PM_CLK_PER
by <&can0_lpcg 1> and <&can0_lpcg 0>. Although function can work, code
logic is wrong. Fix it by using correct clock indices.
Cc: stable@vger.kernel.org
Fixes: 5e7d5b023e03 ("arm64: dts: imx8qxp: add flexcan in adma")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 00b436182138310bb8d362b912b12a9df8f72ca3 upstream.
can1_lpcg: clock-controller@5ace0000 {
... Col1 Col2
clocks = <&clk IMX_SC_R_CAN_1 IMX_SC_PM_CLK_PER>,// 0 0
<&dma_ipg_clk>, // 1 4
<&dma_ipg_clk>; // 2 5
clock-indices = <IMX_LPCG_CLK_0>,
<IMX_LPCG_CLK_4>,
<IMX_LPCG_CLK_5>;
};
Col1: index, which existing dts try to get.
Col2: actual index in lpcg driver
&flexcan2 {
clocks = <&can1_lpcg 1>, <&can1_lpcg 0>;
^^ ^^
Should be:
clocks = <&can1_lpcg IMX_LPCG_CLK_4>, <&can1_lpcg IMX_LPCG_CLK_0>;
};
Arg0 is divided by 4 in lpcg driver. So flexcan get IMX_SC_PM_CLK_PER by
<&can1_lpcg 1> and <&can1_lpcg 0>. Although function work, code logic is
wrong. Fix it by using correct clock indices.
Cc: stable@vger.kernel.org
Fixes: be85831de020 ("arm64: dts: imx8qm: add can node in devicetree")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c6ddd6e7b166532a0816825442ff60f70aed9647 ]
The actual clock show wrong frequency:
echo on >/sys/devices/platform/bus\@5b000000/5b010000.mmc/power/control
cat /sys/kernel/debug/mmc0/ios
clock: 200000000 Hz
actual clock: 166000000 Hz
^^^^^^^^^
.....
According to
sdhc0_lpcg: clock-controller@5b200000 {
compatible = "fsl,imx8qxp-lpcg";
reg = <0x5b200000 0x10000>;
#clock-cells = <1>;
clocks = <&clk IMX_SC_R_SDHC_0 IMX_SC_PM_CLK_PER>,
<&conn_ipg_clk>, <&conn_axi_clk>;
clock-indices = <IMX_LPCG_CLK_0>, <IMX_LPCG_CLK_4>,
<IMX_LPCG_CLK_5>;
clock-output-names = "sdhc0_lpcg_per_clk",
"sdhc0_lpcg_ipg_clk",
"sdhc0_lpcg_ahb_clk";
power-domains = <&pd IMX_SC_R_SDHC_0>;
}
"per_clk" should be IMX_LPCG_CLK_0 instead of IMX_LPCG_CLK_5.
After correct clocks order:
echo on >/sys/devices/platform/bus\@5b000000/5b010000.mmc/power/control
cat /sys/kernel/debug/mmc0/ios
clock: 200000000 Hz
actual clock: 198000000 Hz
^^^^^^^^
...
Fixes: 16c4ea7501b1 ("arm64: dts: imx8: switch to new lpcg clock binding")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6f8e0aca838e163e81fde176e945161d50679339 ]
When using usb-conn-gpio to control USB role and VBUS, the vbus-supply
property must be present in the usb-conn-gpio node. Additionally it
should not be present in the phy node as that isn't what controls vbus
and will upset the use count.
This resolves an issue where VBUS is enabled with OTG in peripheral
mode.
Fixes: ad9a12f7a522 ("arm64: dts: imx8mp-venice: Fix USB connector description")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8cb10cba124c4798b6cb333245ecdc8dde78aeae ]
When using usb-conn-gpio to control USB role and VBUS, the vbus-supply
property must be present in the usb-conn-gpio node. Additionally it
should not be present in the phy node as that isn't what controls vbus
and will upset the use count.
This resolves an issue where VBUS is enabled with OTG in peripheral
mode.
Fixes: ad9a12f7a522 ("arm64: dts: imx8mp-venice: Fix USB connector description")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4421405e3634a3189b541cf1e34598e44260720d ]
GPIO chip labels are wrong for OMAP2, so the USB does not work. Fix.
Fixes: 8e0285ab95a9 ("ARM/musb: omap2: Remove global GPIO numbers from TUSB6010")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Message-ID: <20240223181656.1099845-1-aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 480d44d0820dd5ae043dc97c0b46dabbe53cb1cf ]
Trying to append a second table for the same dev_id doesn't seem to work.
The second table is just silently ignored. As a result eMMC GPIOs are not
present.
Fix by using separate tables for N800 and N810.
Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Message-ID: <20240223181439.1099750-3-aaro.koskinen@iki.fi>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 95f37eb52e18879a1b16e51b972d992b39e50a81 ]
The GPIO bank width is 32 on OMAP2, so all labels are incorrect.
Fixes: e519f0bb64ef ("ARM/mmc: Convert old mmci-omap to GPIO descriptors")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Message-ID: <20240223181439.1099750-2-aaro.koskinen@iki.fi>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 135f218255b28c5bbf71e9e32a49e5c734cabbe5 upstream.
Since commit 63b0cd30b78e ("media: ov2680: Add bus-cfg / endpoint
property verification") the ov2680 no longer probes on a imx7s-warp7:
ov2680 1-0036: error -EINVAL: supported link freq 330000000 not found
ov2680 1-0036: probe with driver ov2680 failed with error -22
Fix it by passing the required 'link-frequencies' property as
recommended by:
https://www.kernel.org/doc/html/v6.9-rc1/driver-api/media/camera-sensor.html#handling-clocks
Cc: stable@vger.kernel.org
Fixes: 63b0cd30b78e ("media: ov2680: Add bus-cfg / endpoint property verification")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e3ba51ab24fddef79fc212f9840de54db8fd1685 upstream.
KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
pages are collected by VMM and the page table entries become write
protected during live migration. Unfortunately, the operand passed
to the TLBI RANGE instruction isn't correctly sorted out due to the
commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()").
It leads to crash on the destination VM after live migration because
TLBs aren't flushed completely and some of the dirty pages are missed.
For example, I have a VM where 8GB memory is assigned, starting from
0x40000000 (1GB). Note that the host has 4KB as the base page size.
In the middile of migration, kvm_tlb_flush_vmid_range() is executed
to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to
__kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3
and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, isn't supported
by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned
from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop
in the __flush_tlb_range_op() until the variable @scale underflows
and becomes -9, 0xffff708000040000 is set as the operand. The operand
is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to
invalid @scale and @num.
Fix it by extending __TLBI_RANGE_NUM() to support the combination of
SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can
be returned from the macro, meaning the TLBs for 0x200000 pages in the
above example can be flushed in one shoot with SCALE#3 and NUM#31. The
macro TLBI_RANGE_MASK is dropped since no one uses it any more. The
comments are also adjusted accordingly.
Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()")
Cc: stable@kernel.org # v6.6+
Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20240405035852.1532010-2-gshan@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4969d75dd9077e19e175e60f3c5a6c7653252e63 upstream.
In a similar fashion to
b388e57d4628 ("x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o")
annotate vdso-image-x32.o too for objtool so that it gets annotated
properly and the unused return thunk warning doesn't fire.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202403251454.23df6278-lkp@intel.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/202403251454.23df6278-lkp@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 38620fc4e8934f1801c7811ef39a041914ac4c1d ]
When running as PVH or HVM Linux will use holes in the memory map as scratch
space to map grants, foreign domain pages and possibly miscellaneous other
stuff. However the usage of such memory map holes for Xen purposes can be
problematic. The request of holesby Xen happen quite early in the kernel boot
process (grant table setup already uses scratch map space), and it's possible
that by then not all devices have reclaimed their MMIO space. It's not
unlikely for chunks of Xen scratch map space to end up using PCI bridge MMIO
window memory, which (as expected) causes quite a lot of issues in the system.
At least for PVH dom0 we have the possibility of using regions marked as
UNUSABLE in the e820 memory map. Either if the region is UNUSABLE in the
native memory map, or it has been converted into UNUSABLE in order to hide RAM
regions from dom0, the second stage translation page-tables can populate those
areas without issues.
PV already has this kind of logic, where the balloon driver is inflated at
boot. Re-use the current logic in order to also inflate it when running as
PVH. onvert UNUSABLE regions up to the ratio specified in EXTRA_MEM_RATIO to
RAM, while reserving them using xen_add_extra_mem() (which is also moved so
it's no longer tied to CONFIG_PV).
[jgross: fixed build for CONFIG_PVH without CONFIG_XEN_PVH]
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20240220174341.56131-1-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 29297ffffb0bf388778bd4b581a43cee6929ae65 ]
The Revision Guide for AMD Family 19h Model 10-1Fh processors declares
Erratum 1452 which states that non-branch entries may erroneously be
recorded in the Last Branch Record (LBR) stack with the valid and
spec bits set.
Such entries can be recognized by inspecting bit 61 of the corresponding
LastBranchStackToIp register. This bit is currently reserved but if found
to be set, the associated branch entry should be discarded.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://bugzilla.kernel.org/attachment.cgi?id=305518
Link: https://lore.kernel.org/r/3ad2aa305f7396d41a40e3f054f740d464b16b7f.1706526029.git.sandipan.das@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cdea98bf1faef23166262825ce44648be6ebff42 ]
The Asus B1400 with original shipped firmware versions and VMD disabled
cannot resume from suspend: the NVMe device becomes unresponsive and
inaccessible.
This appears to be an untested D3cold transition by the vendor; Intel
socwatch shows that Windows leaves the NVMe device and parent bridge in D0
during suspend, even though these firmware versions have StorageD3Enable=1.
The NVMe device and parent PCI bridge both share the same "PXP" ACPI power
resource, which gets turned off as both devices are put into D3cold during
suspend. The _OFF() method calls DL23() which sets a L23E bit at offset
0xe2 into the PCI configuration space for this root port. This is the
specific write that the _ON() routine is unable to recover from. This
register is not documented in the public chipset datasheet.
Disallow D3cold on the PCI bridge to enable successful suspend/resume.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742
Link: https://lore.kernel.org/r/20240228075316.7404-1-drake@endlessos.org
Signed-off-by: Daniel Drake <drake@endlessos.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jian-Hong Pan <jhp@endlessos.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b388e57d4628eb22782bdad4cd5b83ca87a1b7c9 ]
For CONFIG_RETHUNK kernels, objtool annotates all the function return
sites so they can be patched during boot. By design, after
apply_returns() is called, all tail-calls to the compiler-generated
default return thunk (__x86_return_thunk) should be patched out and
replaced with whatever's needed for any mitigations (or lack thereof).
The commit
4461438a8405 ("x86/retpoline: Ensure default return thunk isn't used at runtime")
adds a runtime check and a WARN_ONCE() if the default return thunk ever
gets executed after alternatives have been applied. This warning is
a sanity check to make sure objtool and apply_returns() are doing their
job.
As Nathan reported, that check found something:
Unpatched return thunk in use. This should not happen!
WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/bugs.c:2856 __warn_thunk+0x27/0x40
RIP: 0010:__warn_thunk+0x27/0x40
Call Trace:
<TASK>
? show_regs
? __warn
? __warn_thunk
? report_bug
? console_unlock
? handle_bug
? exc_invalid_op
? asm_exc_invalid_op
? ia32_binfmt_init
? __warn_thunk
warn_thunk_thunk
do_one_initcall
kernel_init_freeable
? __pfx_kernel_init
kernel_init
ret_from_fork
? __pfx_kernel_init
ret_from_fork_asm
</TASK>
Boris debugged to find that the unpatched return site was in
init_vdso_image_64(), and its translation unit wasn't being analyzed by
objtool, so it never got annotated. So it got ignored by
apply_returns().
This is only a minor issue, as this function is only called during boot.
Still, objtool needs full visibility to the kernel. Fix it by enabling
objtool on vdso-image-{32,64}.o.
Note this problem can only be seen with !CONFIG_X86_KERNEL_IBT, as that
requires objtool to run individually on all translation units rather on
vmlinux.o.
[ bp: Massage commit message. ]
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240215032049.GA3944823@dev-arch.thelio-3990X
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7c6bef576a8891abce08d448165b53328032aa5f ]
The SC7280 GCC binding describes clocks which, due to the difference in
security model, are not accessible on the RB3gen2 - in the same way seen
on QCM6490.
Mark these clocks as protected, to allow the board to boot. In contrast
to the present QCM6490 boards GCC_EDP_CLKREF_EN is left out, as this
does not need to be "protected" and is used on the RB3Gen2 board.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Link: https://lore.kernel.org/r/20240209-qcm6490-gcc-protected-clocks-v2-1-11cd5fc13bd0@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f051b6ace7ffcc48d6d1017191f167c0a85799f6 ]
Fix rk3399 hdmi ports node so that it matches the
rockchip,dw-hdmi.yaml binding.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/a6ab6f75-3b80-40b1-bd30-3113e14becdd@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1d00ba4700d1e0f88ae70d028d2e17e39078fa1c ]
Fix rk3328 hdmi ports node so that it matches the
rockchip,dw-hdmi.yaml binding.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/e5dea3b7-bf84-4474-9530-cc2da3c41104@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 15a5ed03000cf61daf87d14628085cb1bc8ae72c ]
Fix rk322x hdmi ports node so that it matches the
rockchip,dw-hdmi.yaml binding.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/9b84adf0-9312-47fd-becc-cadd06941f70@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 585e4dc07100a6465b3da8d24e46188064c1c925 ]
Fix rk3288 hdmi ports node so that it matches the
rockchip,dw-hdmi.yaml binding with some reordering
to align with the (new) documentation about
property ordering.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/cc3a9b4f-076d-4660-b464-615003b6a066@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7f492d48f08207e4ee23edc926b11de9f720aa61 ]
If cluster domain idle state is enabled on the RB1, the board becomes
significantly less responsive. Under certain circumstances (if some of
the devices are disabled in kernel config) the board can even lock up.
It seems this is caused by the MPM not updating wakeup timer during CPU
idle (in the same way the RPMh updates it when cluster idle state is
entered).
Disable cluster domain idle for the RB1 board until MPM driver is fixed
to cooperate with the CPU idle states.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240130-rb1-suspend-cluster-v2-1-5bc1109b0869@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8385383cc2c2f7039ecc57864043112cdc7026c7 ]
Add definition for three LEDs to make sure they can
be enabled base on QCOM LPG LED driver.
Signed-off-by: Hui Liu <quic_huliu@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240126-lpg-v6-1-f879cecbce69@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 2bb69f5fc72183e1c62547d900f560d0e9334925 upstream.
Part of a merge commit from Linus that adjusted the default setting of
SPECTRE_BHI_ON.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ed2e8d49b54d677f3123668a21a57822d679651f upstream.
Intel processors that aren't vulnerable to BHI will set
MSR_IA32_ARCH_CAPABILITIES[BHI_NO] = 1;. Guests may use this BHI_NO bit to
determine if they need to implement BHI mitigations or not. Allow this bit
to be passed to the guests.
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 95a6ccbdc7199a14b71ad8901cb788ba7fb5167b upstream.
BHI mitigation mode spectre_bhi=auto does not deploy the software
mitigation by default. In a cloud environment, it is a likely scenario
where userspace is trusted but the guests are not trusted. Deploying
system wide mitigation in such cases is not desirable.
Update the auto mode to unconditionally mitigate against malicious
guests. Deploy the software sequence at VMexit in auto mode also, when
hardware mitigation is not available. Unlike the force =on mode,
software sequence is not deployed at syscalls in auto mode.
Suggested-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ec9404e40e8f36421a2b66ecb76dc2209fe7f3ef upstream.
Branch history clearing software sequences and hardware control
BHI_DIS_S were defined to mitigate Branch History Injection (BHI).
Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation:
auto - Deploy the hardware mitigation BHI_DIS_S, if available.
on - Deploy the hardware mitigation BHI_DIS_S, if available,
otherwise deploy the software sequence at syscall entry and
VMexit.
off - Turn off BHI mitigation.
The default is auto mode which does not deploy the software sequence
mitigation. This is because of the hardening done in the syscall
dispatch path, which is the likely target of BHI.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit be482ff9500999f56093738f9219bbabc729d163 upstream.
Mitigation for BHI is selected based on the bug enumeration. Add bits
needed to enumerate BHI bug.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0f4a837615ff925ba62648d280a861adf1582df7 upstream.
Newer processors supports a hardware control BHI_DIS_S to mitigate
Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel
from userspace BHI attacks without having to manually overwrite the
branch history.
Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL.
Mitigation is enabled later.
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7390db8aea0d64e9deb28b8e1ce716f5020c7ee5 upstream.
Branch History Injection (BHI) attacks may allow a malicious application to
influence indirect branch prediction in kernel by poisoning the branch
history. eIBRS isolates indirect branch targets in ring0. The BHB can
still influence the choice of indirect branch predictor entry, and although
branch predictor entries are isolated between modes when eIBRS is enabled,
the BHB itself is not isolated between modes.
Alder Lake and new processors supports a hardware control BHI_DIS_S to
mitigate BHI. For older processors Intel has released a software sequence
to clear the branch history on parts that don't support BHI_DIS_S. Add
support to execute the software sequence at syscall entry and VMexit to
overwrite the branch history.
For now, branch history is not cleared at interrupt entry, as malicious
applications are not believed to have sufficient control over the
registers, since previous register state is cleared at interrupt
entry. Researchers continue to poke at this area and it may become
necessary to clear at interrupt entry as well in the future.
This mitigation is only defined here. It is enabled later.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Co-developed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1e3ad78334a69b36e107232e337f9d693dcc9df2 upstream.
Make <asm/syscall.h> build a switch statement instead, and the compiler can
either decide to generate an indirect jump, or - more likely these days due
to mitigations - just a series of conditional branches.
Yes, the conditional branches also have branch prediction, but the branch
prediction is much more controlled, in that it just causes speculatively
running the wrong system call (harmless), rather than speculatively running
possibly wrong random less controlled code gadgets.
This doesn't mitigate other indirect calls, but the system call indirection
is the first and most easily triggered case.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0cd01ac5dcb1e18eb18df0f0d05b5de76522a437 upstream.
Change the format of the 'spectre_v2' vulnerabilities sysfs file
slightly by converting the commas to semicolons, so that mitigations for
future variants can be grouped together and separated by commas.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9c55461040a9264b7e44444c53d26480b438eda6 upstream.
Currently, the EFI stub invokes the EFI memory attributes protocol to
strip any NX restrictions from the entire loaded kernel, resulting in
all code and data being mapped read-write-execute.
The point of the EFI memory attributes protocol is to remove the need
for all memory allocations to be mapped with both write and execute
permissions by default, and make it the OS loader's responsibility to
transition data mappings to code mappings where appropriate.
Even though the UEFI specification does not appear to leave room for
denying memory attribute changes based on security policy, let's be
cautious and avoid relying on the ability to create read-write-execute
mappings. This is trivially achievable, given that the amount of kernel
code executing via the firmware's 1:1 mapping is rather small and
limited to the .head.text region. So let's drop the NX restrictions only
on that subregion, but not before remapping it as read-only first.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 428080c9b19bfda37c478cd626dbd3851db1aff9 upstream.
In preparation for implementing rigorous build time checks to enforce
that only code that can support it will be called from the early 1:1
mapping of memory, move SEV init code that is called in this manner to
the .head.text section.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240227151907.387873-19-ardb+git@google.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 48204aba801f1b512b3abed10b8e1a63e03f3dd1 upstream.
The .head.text section is the initial primary entrypoint of the core
kernel, and is entered with the CPU executing from a 1:1 mapping of
memory. Such code must never access global variables using absolute
references, as these are based on the kernel virtual mapping which is
not active yet at this point.
Given that the SME startup code is also called from this early execution
context, move it into .head.text as well. This will allow more thorough
build time checks in the future to ensure that early startup code only
uses RIP-relative references to global variables.
Also replace some occurrences of __pa_symbol() [which relies on the
compiler generating an absolute reference, which is not guaranteed] and
an open coded RIP-relative access with RIP_REL_REF().
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240227151907.387873-18-ardb+git@google.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cd0d9d92c8bb46e77de62efd7df13069ddd61e7d upstream.
The early SME/SEV code parses the command line very early, in order to
decide whether or not memory encryption should be enabled, which needs
to occur even before the initial page tables are created.
This is problematic for a number of reasons:
- this early code runs from the 1:1 mapping provided by the decompressor
or firmware, which uses a different translation than the one assumed by
the linker, and so the code needs to be built in a special way;
- parsing external input while the entire kernel image is still mapped
writable is a bad idea in general, and really does not belong in
security minded code;
- the current code ignores the built-in command line entirely (although
this appears to be the case for the entire decompressor)
Given that the decompressor/EFI stub is an intrinsic part of the x86
bootable kernel image, move the command line parsing there and out of
the core kernel. This removes the need to build lib/cmdline.o in a
special way, or to use RIP-relative LEA instructions in inline asm
blocks.
This involves a new xloadflag in the setup header to indicate
that mem_encrypt=on appeared on the kernel command line.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240227151907.387873-17-ardb+git@google.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit 3428faa0d675f6a6284331731b762dc041011b3c which is
commit f2208aa12c27bfada3c15c550c03ca81d42dcac2 upstream.
It is reported to cause problems in the stable branches, so revert it.
Link: https://lore.kernel.org/r/899b7c1419a064a2b721b78eade06659@stwm.de
Reported-by: Wolfgang Walter <linux@stwm.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d14fa1fcf69db9d070e75f1c4425211fa619dfc8 upstream.
childregs represents the registers which are active for the new thread
in user context. For a kernel thread, childregs->gp is never used since
the kernel gp is not touched by switch_to. For a user mode helper, the
gp value can be observed in user space after execve or possibly by other
means.
[From the email thread]
The /* Kernel thread */ comment is somewhat inaccurate in that it is also used
for user_mode_helper threads, which exec a user process, e.g. /sbin/init or
when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have
PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
childregs is the *user* context during syscall execution and it is observable
from userspace in at least five ways:
1. kernel_execve does not currently clear integer registers, so the starting
register state for PID 1 and other user processes started by the kernel has
sp = user stack, gp = kernel __global_pointer$, all other integer registers
zeroed by the memset in the patch comment.
This is a bug in its own right, but I'm unwilling to bet that it is the only
way to exploit the issue addressed by this patch.
2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread
before it execs, but ptrace requires SIGSTOP to be delivered which can only
happen at user/kernel boundaries.
3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for
user_mode_helpers before the exec completes, but gp is not one of the
registers it returns.
4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel
addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses
are also exposed via PERF_SAMPLE_REGS_USER which is permitted under
LOCKDOWN_PERF. I have not attempted to write exploit code.
5. Much of the tracing infrastructure allows access to user registers. I have
not attempted to determine which forms of tracing allow access to user
registers without already allowing access to kernel registers.
Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan O'Rear <sorear@fastmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d080a08b06b6266cc3e0e86c5acfd80db937cb6b upstream.
These macros did not initialize __kr_err, so they could fail even if
the access did not fault.
Cc: stable@vger.kernel.org
Fixes: d464118cdc41 ("riscv: implement __get_kernel_nofault and __put_user_nofault")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240312022030.320789-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 378ca2d2ad410a1cd5690d06b46c5e2297f4c8c0 upstream.
Align system call table on 8 bytes. With sys_call_table entry size
of 8 bytes that eliminates the possibility of a system call pointer
crossing cache line boundary.
Cc: stable@kernel.org
Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b017a0cea627fcbe158fc2c214fe893e18c4d0c4 upstream.
The SVE register sets have two different formats, one of which is a wrapped
version of the standard FPSIMD register set and another with actual SVE
register data. At present we check TIF_SVE to see if full SVE register
state should be provided when reading the SVE regset but if we were in a
syscall we may have saved only floating point registers even though that is
set.
Fix this and simplify the logic by checking and using the format which we
recorded when deciding if we should use FPSIMD or SVE format.
Fixes: 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
Cc: <stable@vger.kernel.org> # 6.2.x
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240325-arm64-ptrace-fp-type-v1-1-8dc846caf11f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c27fa53b858b4ee6552a719aa599c250cf98a586 upstream.
The RISC-V Vector specification states in "Appendix D: Calling
Convention for Vector State" [1] that "Executing a system call causes
all caller-saved vector registers (v0-v31, vl, vtype) and vstart to
become unspecified.". In the RISC-V kernel this is called "discarding
the vstate".
Returning from a signal handler via the rt_sigreturn() syscall, vector
discard is also performed. However, this is not an issue since the
vector state should be restored from the sigcontext, and therefore not
care about the vector discard.
The "live state" is the actual vector register in the running context,
and the "vstate" is the vector state of the task. A dirty live state,
means that the vstate and live state are not in synch.
When vectorized user_from_copy() was introduced, an bug sneaked in at
the restoration code, related to the discard of the live state.
An example when this go wrong:
1. A userland application is executing vector code
2. The application receives a signal, and the signal handler is
entered.
3. The application returns from the signal handler, using the
rt_sigreturn() syscall.
4. The live vector state is discarded upon entering the
rt_sigreturn(), and the live state is marked as "dirty", indicating
that the live state need to be synchronized with the current
vstate.
5. rt_sigreturn() restores the vstate, except the Vector registers,
from the sigcontext
6. rt_sigreturn() restores the Vector registers, from the sigcontext,
and now the vectorized user_from_copy() is used. The dirty live
state from the discard is saved to the vstate, making the vstate
corrupt.
7. rt_sigreturn() returns to the application, which crashes due to
corrupted vstate.
Note that the vectorized user_from_copy() is invoked depending on the
value of CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD. Default is 768, which
means that vlen has to be larger than 128b for this bug to trigger.
The fix is simply to mark the live state as non-dirty/clean prior
performing the vstate restore.
Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-8abdb41-2024-03-26/unpriv-isa-asciidoc.pdf # [1]
Reported-by: Charlie Jenkins <charlie@rivosinc.com>
Reported-by: Vineet Gupta <vgupta@kernel.org>
Fixes: c2a658d41924 ("riscv: lib: vectorize copy_to_user/copy_from_user")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Vineet Gupta <vineetg@rivosinc.com>
Link: https://lore.kernel.org/r/20240403072638.567446-1-bjorn@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 312be9fc2234c8acfb8148a9f4c358b70d358dee upstream.
The MSR_PEBS_DATA_CFG MSR register is used to configure which data groups
should be generated into a PEBS record, and it's shared among all counters.
If there are different configurations among counters, perf combines all the
configurations.
The first perf command as below requires a complete PEBS record
(including memory info, GPRs, XMMs, and LBRs). The second perf command
only requires a basic group. However, after the second perf command is
running, the MSR_PEBS_DATA_CFG register is cleared. Only a basic group is
generated in a PEBS record, which is wrong. The required information
for the first perf command is missed.
$ perf record --intr-regs=AX,SP,XMM0 -a -C 8 -b -W -d -c 100000003 -o /dev/null -e cpu/event=0xd0,umask=0x81/upp &
$ sleep 5
$ perf record --per-thread -c 1 -e cycles:pp --no-timestamp --no-tid taskset -c 8 ./noploop 1000
The first PEBS event is a system-wide PEBS event. The second PEBS event
is a per-thread event. When the thread is scheduled out, the
intel_pmu_pebs_del() function is invoked to update the PEBS state.
Since the system-wide event is still available, the cpuc->n_pebs is 1.
The cpuc->pebs_data_cfg is cleared. The data configuration for the
system-wide PEBS event is lost.
The (cpuc->n_pebs == 1) check was introduced in commit:
b6a32f023fcc ("perf/x86: Fix PEBS threshold initialization")
At that time, it indeed didn't hurt whether the state was updated
during the removal, because only the threshold is updated.
The calculation of the threshold takes the last PEBS event into
account.
However, since commit:
b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
we delay the threshold update, and clear the PEBS data config, which triggers
the bug.
The PEBS data config update scope should not be shrunk during removal.
[ mingo: Improved the changelog & comments. ]
Fixes: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240401133320.703971-1-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|