From 69a7386c1ec25476a0c78ffeb59de08a2a08f495 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 15 Dec 2023 09:58:58 +0100 Subject: x86/smpboot/64: Handle X2APIC BIOS inconsistency gracefully Chris reported that a Dell PowerEdge T340 system stopped to boot when upgrading to a kernel which contains the parallel hotplug changes. Disabling parallel hotplug on the kernel command line makes it boot again. It turns out that the Dell BIOS has x2APIC enabled and the boot CPU comes up in X2APIC mode, but the APs come up inconsistently in xAPIC mode. Parallel hotplug requires that the upcoming CPU reads out its APIC ID from the local APIC in order to map it to the Linux CPU number. In this particular case the readout on the APs uses the MMIO mapped registers because the BIOS failed to enable x2APIC mode. That readout results in a page fault because the kernel does not have the APIC MMIO space mapped when X2APIC mode was enabled by the BIOS on the boot CPU and the kernel switched to X2APIC mode early. That page fault can't be handled on the upcoming CPU that early and results in a silent boot failure. If parallel hotplug is disabled the system boots because in that case the APIC ID read is not required as the Linux CPU number is provided to the AP in the smpboot control word. When the kernel uses x2APIC mode then the APs are switched to x2APIC mode too slightly later in the bringup process, but there is no reason to do it that late. Cure the BIOS bogosity by checking in the parallel bootup path whether the kernel uses x2APIC mode and if so switching over the APs to x2APIC mode before the APIC ID readout. Fixes: 0c7ffa32dbd6 ("x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it") Reported-by: Chris Lindee Signed-off-by: Thomas Gleixner Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Ashok Raj Tested-by: Chris Lindee Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/CA%2B2tU59853R49EaU_tyvOZuOTDdcU0RshGyydccp9R1NX9bEeQ@mail.gmail.com --- arch/x86/kernel/head_64.S | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 086a2c3aaaa0..0f8103240fda 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -255,6 +255,22 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL) testl $X2APIC_ENABLE, %eax jnz .Lread_apicid_msr +#ifdef CONFIG_X86_X2APIC + /* + * If system is in X2APIC mode then MMIO base might not be + * mapped causing the MMIO read below to fault. Faults can't + * be handled at that point. + */ + cmpl $0, x2apic_mode(%rip) + jz .Lread_apicid_mmio + + /* Force the AP into X2APIC mode. */ + orl $X2APIC_ENABLE, %eax + wrmsr + jmp .Lread_apicid_msr +#endif + +.Lread_apicid_mmio: /* Read the APIC ID from the fix-mapped MMIO space. */ movq apic_mmio_base(%rip), %rcx addq $APIC_ID, %rcx -- cgit v1.2.3 From 3ea1704a92967834bf0e64ca1205db4680d04048 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 7 Dec 2023 20:49:24 +0100 Subject: x86/alternatives: Sync core before enabling interrupts text_poke_early() does: local_irq_save(flags); memcpy(addr, opcode, len); local_irq_restore(flags); sync_core(); That's not really correct because the synchronization should happen before interrupts are re-enabled to ensure that a pending interrupt observes the complete update of the opcodes. It's not entirely clear whether the interrupt entry provides enough serialization already, but moving the sync_core() invocation into interrupt disabled region does no harm and is obviously correct. Fixes: 6fffacb30349 ("x86/alternatives, jumplabel: Use text_poke_early() before mm_init()") Signed-off-by: Thomas Gleixner Signed-off-by: Borislav Petkov (AMD) Acked-by: Peter Zijlstra (Intel) Cc: Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com --- arch/x86/kernel/alternative.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 73be3931e4f0..fd44739828f7 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -1685,8 +1685,8 @@ void __init_or_module text_poke_early(void *addr, const void *opcode, } else { local_irq_save(flags); memcpy(addr, opcode, len); - local_irq_restore(flags); sync_core(); + local_irq_restore(flags); /* * Could also do a CLFLUSH here to speed up CPU recovery; but -- cgit v1.2.3 From 2dc4196138055eb0340231aecac4d78c2ec2bea5 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 7 Dec 2023 20:49:26 +0100 Subject: x86/alternatives: Disable interrupts and sync when optimizing NOPs in place apply_alternatives() treats alternatives with the ALT_FLAG_NOT flag set special as it optimizes the existing NOPs in place. Unfortunately, this happens with interrupts enabled and does not provide any form of core synchronization. So an interrupt hitting in the middle of the update and using the affected code path will observe a half updated NOP and crash and burn. The following 3 NOP sequence was observed to expose this crash halfway reliably under QEMU 32bit: 0x90 0x90 0x90 which is replaced by the optimized 3 byte NOP: 0x8d 0x76 0x00 So an interrupt can observe: 1) 0x90 0x90 0x90 nop nop nop 2) 0x8d 0x90 0x90 undefined 3) 0x8d 0x76 0x90 lea -0x70(%esi),%esi 4) 0x8d 0x76 0x00 lea 0x0(%esi),%esi Where only #1 and #4 are true NOPs. The same problem exists for 64bit obviously. Disable interrupts around this NOP optimization and invoke sync_core() before re-enabling them. Fixes: 270a69c4485d ("x86/alternative: Support relocations in alternatives") Reported-by: Paul Gortmaker Signed-off-by: Thomas Gleixner Signed-off-by: Borislav Petkov (AMD) Acked-by: Peter Zijlstra (Intel) Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com --- arch/x86/kernel/alternative.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index fd44739828f7..aae7456ece07 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -255,6 +255,16 @@ static void __init_or_module noinline optimize_nops(u8 *instr, size_t len) } } +static void __init_or_module noinline optimize_nops_inplace(u8 *instr, size_t len) +{ + unsigned long flags; + + local_irq_save(flags); + optimize_nops(instr, len); + sync_core(); + local_irq_restore(flags); +} + /* * In this context, "source" is where the instructions are placed in the * section .altinstr_replacement, for example during kernel build by the @@ -438,7 +448,7 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start, * patch if feature is *NOT* present. */ if (!boot_cpu_has(a->cpuid) == !(a->flags & ALT_FLAG_NOT)) { - optimize_nops(instr, a->instrlen); + optimize_nops_inplace(instr, a->instrlen); continue; } -- cgit v1.2.3 From d5a10b976ecb77fa49b95f3f1016ca2997c122cb Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 15 Dec 2023 15:19:32 +0100 Subject: x86/acpi: Handle bogus MADT APIC tables gracefully The recent fix to ignore invalid x2APIC entries inadvertently broke systems with creative MADT APIC tables. The affected systems have APIC MADT tables where all entries have invalid APIC IDs (0xFF), which means they register exactly zero CPUs. But the condition to ignore the entries of APIC IDs < 255 in the X2APIC MADT table is solely based on the count of MADT APIC table entries. As a consequence, the affected machines enumerate no secondary CPUs at all because the APIC table has entries and therefore the X2APIC table entries with APIC IDs < 255 are ignored. Change the condition so that the APIC table preference for APIC IDs < 255 only becomes effective when the APIC table has valid APIC ID entries. IOW, an APIC table full of invalid APIC IDs is considered to be empty which in consequence enables the X2APIC table entries with a APIC ID < 255 and restores the expected behaviour. Fixes: ec9aedb2aa1a ("x86/acpi: Ignore invalid x2APIC entries") Reported-by: John Sperbeck Reported-by: Andres Freund Signed-off-by: Thomas Gleixner Signed-off-by: Borislav Petkov (AMD) Link: https://lore.kernel.org/r/169953729188.3135.6804572126118798018.tip-bot2@tip-bot2 --- arch/x86/kernel/acpi/boot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 1a0dd80d81ac..85a3ce2a3666 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -293,6 +293,7 @@ acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end) processor->processor_id, /* ACPI ID */ processor->lapic_flags & ACPI_MADT_ENABLED); + has_lapic_cpus = true; return 0; } @@ -1134,7 +1135,6 @@ static int __init acpi_parse_madt_lapic_entries(void) if (!count) { count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC, acpi_parse_lapic, MAX_LOCAL_APIC); - has_lapic_cpus = count > 0; x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC, acpi_parse_x2apic, MAX_LOCAL_APIC); } -- cgit v1.2.3