path: root/drivers/acpi/apei
Age | Commit message | Author | Files | Lines
2024-01-19 | Merge tag 'cxl-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl | Linus Torvalds | 1 | -0/+89
Pull CXL (Compute Express Link) updates from Dan Williams:

"The bulk of this update is support for enumerating the performance capabilities of CXL memory targets and connecting that to a platform CXL memory QoS class. Some follow-on work remains to hook up this data into core-mm policy, but that is saved for v6.9.

The next significant update is unifying how CXL event records (things like background scrub errors) are processed between so called "firmware first" and native error record retrieval. The CXL driver handler that processes the record retrieved from the device mailbox is now the handler for that same record format coming from an EFI/ACPI notification source.

This also contains miscellaneous feature updates, like Get Timestamp, and other fixups.

Summary:

- Add support for parsing the Coherent Device Attribute Table (CDAT)
- Add support for calculating a platform CXL QoS class from CDAT data
- Unify the tracing of EFI CXL Events with native CXL Events.
- Add Get Timestamp support
- Miscellaneous cleanups and fixups"

* tag 'cxl-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (41 commits)
  cxl/core: use sysfs_emit() for attr's _show()
  cxl/pci: Register for and process CPER events
  PCI: Introduce cleanup helpers for device reference counts and locks
  acpi/ghes: Process CXL Component Events
  cxl/events: Create a CXL event union
  cxl/events: Separate UUID from event structures
  cxl/events: Remove passing a UUID to known event traces
  cxl/events: Create common event UUID defines
  cxl/events: Promote CXL event structures to a core header
  cxl: Refactor to use __free() for cxl_root allocation in cxl_endpoint_port_probe()
  cxl: Refactor to use __free() for cxl_root allocation in cxl_find_nvdimm_bridge()
  cxl: Fix device reference leak in cxl_port_perf_data_calculate()
  cxl: Convert find_cxl_root() to return a 'struct cxl_root *'
  cxl: Introduce put_cxl_root() helper
  cxl/port: Fix missing target list lock
  cxl/port: Fix decoder initialization when nr_targets > interleave_ways
  cxl/region: fix x9 interleave typo
  cxl/trace: Pass UUID explicitly to event traces
  cxl/region: use %pap format to print resource_size_t
  cxl/region: Add dev_dbg() detail on failure to allocate HPA space
  ...
2024-01-10 | acpi/ghes: Process CXL Component Events | Ira Weiny | 1 | -0/+89
BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then send these events to the OS via UEFI. UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) format for CXL Component Events. The format is mostly the same as the CXL Common Event Record Format. The difference is the use of a GUID in the Section Type rather than a UUID as part of the event itself. Add GHES support to detect CXL CPER records and call a registered callback with the event. A notifier chain was considered for the callback but the complexity did not justify the use case as only the CXL subsystem requires this event. Enforce that only one callback can be registered at any time. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-7-1bb8a4ca2c7a@intel.com [djbw: fixup checkpatch errors] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
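The single-callback rule described in this commit can be illustrated with a small userspace sketch. It is not the kernel implementation: the names `cper_event_cb`, `register_cper_callback()`, `unregister_cper_callback()`, `deliver_cper_event()` and `count_events()` are hypothetical, the error codes are assumptions, and the locking a real driver would need is omitted.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: receives an opaque event record. */
typedef void (*cper_event_cb)(const void *record);

/* Single registration slot; NULL means no callback is registered.
 * A real implementation would protect this with a lock. */
static cper_event_cb registered_cb;

/* Register a callback; fails if one is already installed. */
int register_cper_callback(cper_event_cb cb)
{
    if (cb == NULL)
        return -EINVAL;
    if (registered_cb != NULL)
        return -EEXIST;   /* assumed error code for "slot taken" */
    registered_cb = cb;
    return 0;
}

/* Unregister; only the currently registered callback may be removed. */
int unregister_cper_callback(cper_event_cb cb)
{
    if (cb == NULL || registered_cb != cb)
        return -EINVAL;
    registered_cb = NULL;
    return 0;
}

/* Hand a record to the registered callback. Returns 1 if delivered. */
int deliver_cper_event(const void *record)
{
    if (registered_cb == NULL)
        return 0;
    registered_cb(record);
    return 1;
}

/* Example callback that only counts the events it receives. */
int delivered_count;

void count_events(const void *record)
{
    (void)record;
    delivered_count++;
}
```

A single slot keeps the code simpler than a notifier chain, which matches the reasoning in the commit message that only one subsystem consumes these events.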
2023-12-21 | ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events | Shuai Xue | 1 | -6/+23
There are two major types of uncorrected recoverable (UCR) errors:

- Synchronous error: The error is detected and raised at the point of consumption in the execution flow, e.g. when a CPU tries to access a poisoned cache line. The CPU takes a synchronous error exception such as a Synchronous External Abort (SEA) on Arm64 or a Machine Check Exception (MCE) on x86. The OS is required to take action (for example, offlining the failed page or killing the failed thread) to recover from this uncorrectable error.

- Asynchronous error: The error is detected outside the processor execution context, e.g. when an error is detected by a background scrubber. Some data in memory is corrupted, but it has not been consumed. The OS may optionally take action to recover from this uncorrectable error.

When APEI firmware first is enabled, a platform may describe one error source for the handling of synchronous errors (e.g. MCE or SEA notification), or for handling asynchronous errors (e.g. SCI or External Interrupt notification). In other words, synchronous errors can be distinguished by the APEI notification type.

For synchronous errors, the kernel kills the current process that is accessing the poisoned page by sending SIGBUS with BUS_MCEERR_AR. For asynchronous errors, the kernel notifies the process that owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in early kill mode. However, the GHES driver always sets mf_flags to 0, so all synchronous errors are handled as asynchronous errors in memory failure.

To fix this, set the memory failure flags to MF_ACTION_REQUIRED on synchronous events.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Tested-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
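The decision this commit describes, choosing memory failure flags from the notification type, might be sketched as below. The enum values, the `SKETCH_MF_ACTION_REQUIRED` bit and both function names are illustrative assumptions, not the ACPI encodings or the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative notification types; the values are NOT the ACPI encodings. */
enum notify_type {
    NOTIFY_SCI,           /* asynchronous: interrupt-style notification */
    NOTIFY_EXTERNAL_IRQ,  /* asynchronous */
    NOTIFY_SEA,           /* synchronous: Arm64 Synchronous External Abort */
    NOTIFY_MCE            /* synchronous: x86 Machine Check Exception */
};

/* Illustrative bit standing in for the kernel's MF_ACTION_REQUIRED. */
#define SKETCH_MF_ACTION_REQUIRED (1 << 1)

/* Synchronous errors are recognised purely by how they were notified. */
bool is_sync_notify(enum notify_type type)
{
    return type == NOTIFY_SEA || type == NOTIFY_MCE;
}

/* Flags to hand to memory-failure handling for a given notification. */
int memory_failure_flags(enum notify_type type)
{
    return is_sync_notify(type) ? SKETCH_MF_ACTION_REQUIRED : 0;
}
```

Before the fix, the equivalent of `memory_failure_flags()` returned 0 for every notification type, which is why synchronous errors were treated as asynchronous ones.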
2023-11-21 | ACPI: APEI: EINJ: Add support for vendor defined error types | Avadhut Naik | 1 | -0/+24
Vendor-Defined Error types are supported by the platform apart from standard error types if bit 31 is set in the output of GET_ERROR_TYPE Error Injection Action.[1] While the errors themselves and the length of their associated "OEM Defined data structure" might vary between vendors, the physical address of this structure can be computed through vendor_extension and length fields of "SET_ERROR_TYPE_WITH_ADDRESS" and "Vendor Error Type Extension" Structures respectively.[2][3] Currently, however, the einj module only computes the physical address of Vendor Error Type Extension Structure. Neither does it compute the physical address of OEM Defined structure nor does it establish the memory mapping required for injecting Vendor-defined errors. Consequently, userspace tools have to establish the very mapping through /dev/mem, nopat kernel parameter and system calls like mmap/munmap initially before injecting Vendor-defined errors. Circumvent the issue by computing the physical address of OEM Defined data structure and establishing the required mapping with the structure. Create a new file "oem_error", if the system supports Vendor-defined errors, to export this mapping, through debugfs_create_blob(). Userspace tools can then populate their respective OEM Defined structure instances and just write to the file as part of injecting Vendor-defined Errors. Similarly, the tools can also read from the file if the system firmware provides some information through the OEM defined structure after error injection. [1] ACPI specification 6.5, section 18.6.4 [2] ACPI specification 6.5, Table 18.31 [3] ACPI specification 6.5, Table 18.32 Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-21 | ACPI: APEI: EINJ: Refactor available_error_type_show() | Avadhut Naik | 1 | -23/+24
OSPM can discover the error injection capabilities of the platform by executing GET_ERROR_TYPE error injection action.[1] The action returns a DWORD representing a bitmap of platform supported error injections.[2] The available_error_type_show() function determines the bits set within this DWORD and provides a verbose output, from einj_error_type_string array, through /sys/kernel/debug/apei/einj/available_error_type file. The function however, assumes one to one correspondence between an error's position in the bitmap and its array entry offset. Consequently, some errors like Vendor Defined Error Type fail this assumption and will incorrectly be shown as not supported, even if their corresponding bit is set in the bitmap and they have an entry in the array. Navigate around the issue by converting einj_error_type_string into an array of structures with a predetermined mask for all error types corresponding to their bit position in the DWORD returned by GET_ERROR_TYPE action. The same breaks the aforementioned assumption resulting in all supported error types by a platform being outputted through the above available_error_type file. [1] ACPI specification 6.5, Table 18.25 [2] ACPI specification 6.5, Table 18.30 Suggested-by: Alexey Kardashevskiy <alexey.kardashevskiy@amd.com> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
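The mask-per-entry approach described in this commit can be sketched in userspace C as follows. The table is an abbreviated, illustrative subset, and `struct error_type_entry` and `list_error_types()` are hypothetical names rather than the kernel's.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Each entry carries its own mask, so array position no longer matters. */
struct error_type_entry {
    uint32_t mask;
    const char *str;
};

/* Abbreviated, illustrative subset of the error type table. */
static const struct error_type_entry error_types[] = {
    { 1u << 0,  "Processor Correctable" },
    { 1u << 1,  "Processor Uncorrectable non-fatal" },
    { 1u << 2,  "Processor Uncorrectable fatal" },
    { 1u << 3,  "Memory Correctable" },
    { 1u << 31, "Vendor Defined Error Types" },
};

/*
 * Write one line per supported error type into buf and return how many
 * entries matched the bitmap. Output that would not fit is dropped.
 */
int list_error_types(uint32_t bitmap, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(error_types) / sizeof(error_types[0]); i++) {
        int n;

        if (!(bitmap & error_types[i].mask))
            continue;
        n = snprintf(buf + used, len - used, "0x%08x\t%s\n",
                     (unsigned int)error_types[i].mask, error_types[i].str);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';  /* discard the partial line */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

With an index-based lookup, bit 31 would need an array of 32 entries to be found; with explicit masks the table can stay sparse and the vendor-defined entry is still reported.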
2023-10-24 | ACPI: APEI: Use ERST timeout for slow devices | Jeshua Smith | 1 | -4/+37
Slow devices such as flash may not meet the default 1ms timeout value, so use the ERST max execution time value that they provide as the timeout if it is larger. Signed-off-by: Jeshua Smith <jeshuas@nvidia.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
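The timeout selection described here amounts to taking the larger of two values. A minimal sketch, assuming the device-provided maximum execution time has already been converted to nanoseconds; `DEFAULT_TIMEOUT_NS` and `pick_timeout_ns()` are hypothetical names:

```c
#include <stdint.h>

/* Default timeout of 1 ms, expressed in nanoseconds. */
#define DEFAULT_TIMEOUT_NS 1000000ULL

/*
 * Use the device-provided maximum execution time only when it is larger
 * than the default; a value of 0 (nothing reported) keeps the default.
 */
uint64_t pick_timeout_ns(uint64_t device_max_exec_ns)
{
    if (device_max_exec_ns > DEFAULT_TIMEOUT_NS)
        return device_max_exec_ns;
    return DEFAULT_TIMEOUT_NS;
}
```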
2023-09-21 | ACPI: APEI: Fix AER info corruption when error status data has multiple sections | Shiju Jose | 1 | -1/+22
ghes_handle_aer() passes AER data to the PCI core for logging and recovery by calling aer_recover_queue() with a pointer to struct aer_capability_regs. The problem was that aer_recover_queue() queues the pointer directly without copying the aer_capability_regs data. The pointer was to the ghes->estatus buffer, which could be reused before aer_recover_work_func() reads the data. To avoid this problem, allocate a new aer_capability_regs structure from the ghes_estatus_pool, copy the AER data from the ghes->estatus buffer into it, pass a pointer to the new struct to aer_recover_queue(), and free it after aer_recover_work_func() has processed it. Reported-by: Bjorn Helgaas <helgaas@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
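The copy-before-queue pattern this fix relies on can be shown with a userspace sketch. It uses `malloc()` in place of the kernel's estatus pool and a tiny ring buffer in place of the real work queue; `struct aer_regs_sketch`, `queue_aer_copy()` and `process_one()` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the register snapshot that is passed to the consumer. */
struct aer_regs_sketch {
    uint32_t uncor_status;
    uint32_t cor_status;
};

#define QUEUE_DEPTH 8

static struct aer_regs_sketch *queue[QUEUE_DEPTH];
static unsigned int q_head, q_tail;

/* Queue a private copy so the shared source buffer can be reused at once. */
int queue_aer_copy(const struct aer_regs_sketch *shared)
{
    struct aer_regs_sketch *copy;

    if (q_tail - q_head == QUEUE_DEPTH)
        return -1;                    /* queue full */
    copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return -1;
    memcpy(copy, shared, sizeof(*copy));
    queue[q_tail++ % QUEUE_DEPTH] = copy;
    return 0;
}

/* Consumer side: read one queued entry, then release its copy. */
int process_one(struct aer_regs_sketch *out)
{
    struct aer_regs_sketch *entry;

    if (q_head == q_tail)
        return 0;                     /* nothing queued */
    entry = queue[q_head++ % QUEUE_DEPTH];
    *out = *entry;
    free(entry);
    return 1;
}
```

Because the consumer owns its copy, overwriting the shared buffer between queueing and processing no longer changes what the consumer sees.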
2023-06-12 | APEI: GHES: correctly return NULL for ghes_get_devices() | Li Yang | 1 | -0/+2
Since 315bada690e0 ("EDAC: Check for GHES preference in the chipset-specific EDAC drivers"), vendor specific EDAC driver will not probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device is present. Make ghes_get_devices() return NULL when the GHES device list is empty to fix the problem. Fixes: 9057a3f7ac36 ("EDAC/ghes: Prepare to make ghes_edac a proper module") Signed-off-by: Li Yang <leoyang.li@nxp.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 | ACPI: APEI: mark bert_disable as __initdata | Miaohe Lin | 1 | -1/+1
It's only used inside the __init section. Mark it __initdata. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-05 | ACPI: APEI: GHES: Remove unused ghes_estatus_pool_size_request() | Miaohe Lin | 1 | -2/+0
ghes_estatus_pool_size_request() is unused now, so remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-25 | efi: fix missing prototype warnings | Arnd Bergmann | 2 | -6/+1
The cper.c file needs to include an extra header, and efi_zboot_entry needs an extern declaration to avoid these 'make W=1' warnings:

drivers/firmware/efi/libstub/zboot.c:65:1: error: no previous prototype for 'efi_zboot_entry' [-Werror=missing-prototypes]
drivers/firmware/efi/efi.c:176:16: error: no previous prototype for 'efi_attr_is_visible' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:626:6: error: no previous prototype for 'cper_estatus_print' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:649:5: error: no previous prototype for 'cper_estatus_check_header' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:662:5: error: no previous prototype for 'cper_estatus_check' [-Werror=missing-prototypes]

To make this easier, move the cper specific declarations to include/linux/cper.h.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-03-27 | ACPI: APEI: EINJ: warn on invalid argument when explicitly indicated by platform | Shuai Xue | 1 | -1/+7
OSPM executes an EXECUTE_OPERATION action to instruct the platform to begin the injection operation, then executes a GET_COMMAND_STATUS action to determine the status of the completed operation. The error codes documented in the ACPI Specification [1] are:

    0 = Success (Linux #define EINJ_STATUS_SUCCESS)
    1 = Unknown failure (Linux #define EINJ_STATUS_FAIL)
    2 = Invalid Access (Linux #define EINJ_STATUS_INVAL)

The original code reports -EBUSY for both the "Unknown Failure" and "Invalid Access" cases. However, firmware can perform platform-dependent sanity checks and return different error codes, e.g. "Invalid Access" to indicate to the user that the parameters they supplied cannot be used for injection.

To this end, return -EINVAL in the __einj_error_inject() error handling path instead of always -EBUSY when the platform explicitly indicates it in the status of the completed operation.

[1] ACPI Specification 6.5 18.6.1. Error Injection Table

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
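The status-to-errno mapping this commit describes can be sketched as a small standalone function. The three status values come from the commit message; the function name `einj_status_to_errno()` is hypothetical, and treating any other status as -EBUSY is an assumption made only for this sketch.

```c
#include <errno.h>

/* Status codes as listed in the commit message. */
#define EINJ_STATUS_SUCCESS 0
#define EINJ_STATUS_FAIL    1
#define EINJ_STATUS_INVAL   2

/* Translate the platform's command status into a Linux error code. */
int einj_status_to_errno(unsigned int status)
{
    switch (status) {
    case EINJ_STATUS_SUCCESS:
        return 0;
    case EINJ_STATUS_INVAL:
        return -EINVAL;   /* platform rejected the supplied parameters */
    case EINJ_STATUS_FAIL:
    default:
        return -EBUSY;    /* unknown failure; default case is an assumption */
    }
}
```

Returning a distinct code for "Invalid Access" lets a userspace tool tell bad parameters apart from a transient failure that may be worth retrying.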
2023-03-20 | ACPI: APEI: EINJ: Add CXL error types | Tony Luck | 1 | -0/+6
ACPI 6.5 added six new error types for CXL. See chapter 18 table 18.30. Add strings for the new types so that Linux will list them in the /sys/kernel/debug/apei/einj/available_error_types file. It seems no other changes are needed. Linux already accepts the CXL codes (on a BIOS that advertises them). Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-30 | ACPI: APEI: EINJ: Limit error type to 32-bit width | Shuai Xue | 1 | -0/+4
The bit map of error types to inject is 32-bit width [1]. Add parameter check to reflect the fact. [1] ACPI Specification 6.4, Section 18.6.4. Error Types Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
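A parameter check of the kind described might look like the following sketch, which assumes the value arrives from userspace as a 64-bit integer. The function name `check_error_type_width()` is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Reject any error type value that does not fit in the 32-bit bitmap. */
int check_error_type_width(uint64_t val)
{
    if (val > UINT32_MAX)
        return -EINVAL;
    return 0;
}
```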
2022-12-26 | treewide: Convert del_timer*() to timer_shutdown*() | Steven Rostedt (Google) | 1 | -1/+1
Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called "shutdown". After a timer is set to this state, then it can no longer be re-armed.

The following script was run to find all the trivial locations where del_timer() or del_timer_sync() is called in the same function that the object holding the timer is freed. It also ignores any locations where the timer->function is modified between the del_timer*() and the free(), as that is not considered a "trivial" case.

This was created by using a coccinelle script and the following commands:

    $ cat timer.cocci
    @@
    expression ptr, slab;
    identifier timer, rfield;
    @@
    (
    -       del_timer(&ptr->timer);
    +       timer_shutdown(&ptr->timer);
    |
    -       del_timer_sync(&ptr->timer);
    +       timer_shutdown_sync(&ptr->timer);
    )
      ... when strict
          when != ptr->timer
    (
            kfree_rcu(ptr, rfield);
    |
            kmem_cache_free(slab, ptr);
    |
            kfree(ptr);
    )

    $ spatch timer.cocci . > /tmp/t.patch
    $ patch -p1 < /tmp/t.patch

Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ]
Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ]
Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-13 | Merge tag 'edac_updates_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras | Linus Torvalds | 1 | -3/+63
Pull EDAC updates from Borislav Petkov:

- Make ghes_edac a simple module like the rest of the EDAC drivers and drop the forced built-in only configuration by disentangling it from GHES (Jia He)
- The usual small cleanups and improvements all over EDAC land

* tag 'edac_updates_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper()
  EDAC/i5400: Fix typo in comment: vaious -> various
  EDAC/mc_sysfs: Increase legacy channel support to 12
  MAINTAINERS: Make Mauro EDAC reviewer
  MAINTAINERS: Make Manivannan Sadhasivam the maintainer of qcom_edac
  EDAC/igen6: Return the correct error type when not the MC owner
  apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()
  EDAC: Check for GHES preference in the chipset-specific EDAC drivers
  EDAC/ghes: Make ghes_edac a proper module
  EDAC/ghes: Prepare to make ghes_edac a proper module
  EDAC/ghes: Add a notifier for reporting memory errors
  efi/cper: Export several helpers for ghes_edac to use
  EDAC/i5000: Mark as BROKEN
2022-12-07 | ACPI: APEI: EINJ: Refactor available_error_type_show() | Jay Lu | 1 | -24/+17
Move error type descriptions into an array and loop over error types to improve readability and maintainability. Replace seq_printf() with seq_puts() as recommended by checkpatch.pl. Signed-off-by: Jay Lu <jaylu102@amd.com> Co-developed-by: Ben Cheatham <benjamin.cheatham@amd.com> Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-07 | ACPI: APEI: EINJ: Fix formatting errors | Jay Lu | 1 | -7/+8
Checkpatch reveals warnings and an error due to missing lines and incorrect indentations. Add the missing lines after declarations and fix the suspect indentations. Signed-off-by: Jay Lu <jaylu102@amd.com> Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-02 | ACPI: APEI: Remove a useless include | Christophe JAILLET | 1 | -1/+0
This file does not use rcu, so there is no point in including <linux/rculist.h>. So just remove it. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23 | ACPI: APEI: Silence missing prototype warnings | Sudeep Holla | 1 | -0/+1
Silence the following warnings when make W=1:

| CC drivers/acpi/apei/apei-base.c
| warning: no previous prototype for 'arch_apei_enable_cmcff' [-Wmissing-prototypes]
| int __weak arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr,
| ^
| CC drivers/acpi/apei/apei-base.c
| warning: no previous prototype for 'arch_apei_report_mem_error' [-Wmissing-prototypes]
| void __weak arch_apei_report_mem_error(int sev,
| ^

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-28 | apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg() | Ard Biesheuvel | 1 | -27/+33
Some documentation first, about how this machinery works:

It seems, the intent of the GHES error records cache is to collect already reported errors - see the ghes_estatus_cached() checks. There's even a sentence trying to say what this does:

    /*
     * GHES error status reporting throttle, to report more kinds of
     * errors, instead of just most frequently occurred errors.
     */

New elements are added to the cache this way:

    if (!ghes_estatus_cached(estatus)) {
            if (ghes_print_estatus(NULL, ghes->generic, estatus))
                    ghes_estatus_cache_add(ghes->generic, estatus);

The intent being, once this new error record is reported, it gets cached so that it doesn't get reported for a while due to too many, same-type error records getting reported in burst-like scenarios. I.e., new, unreported error types can have a higher chance of getting reported.

Now, the loop in ghes_estatus_cache_add() is trying to pick out the oldest element in there. Meaning, something which got reported already but a long while ago, i.e., a LRU-type scheme.

And the cmpxchg() is there presumably to make sure when that selected element slot_cache is removed, it really *is* that element that gets removed and not one which replaced it in the meantime.

Now, ghes_estatus_cache_add() selects a slot, and either succeeds in replacing its contents with a pointer to a newly cached item, or it just gives up and frees the new item again, without attempting to select another slot even if one might be available.

Since only inserting new items is being done here, the race can only cause a failure if the selected slot was updated with another new item concurrently, which means that it is arbitrary which of those two items gets dropped.

And "dropped" here means, the item doesn't get added to the cache so the next time it is seen, it'll get reported again and an insertion attempt will be done again. Eventually, it'll get inserted and all those times when the insertion fails, the item will get reported although the cache is supposed to prevent that and "ratelimit" those repeated error records. Not a big deal in any case.

This means the cmpxchg() and the special case are not necessary. Therefore, just drop the existing item unconditionally.

Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock section since there is no actually dereferencing the pointer at all.

[ bp:
  - Flesh out and summarize what was discussed on the thread now that that cache contraption is understood;
  - Touch up code style. ]

Co-developed-by: Jia He <justin.he@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
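The unconditional replacement described in this commit can be approximated in userspace with C11 atomics. This is a single-threaded sketch under stated assumptions: `struct cache_entry`, `make_entry()`, `cache_add()` and `cache_contains()` are hypothetical names, `atomic_exchange_explicit()` with release ordering stands in for xchg_release(), and plain `free()` stands in for the deferred free via call_rcu(), which would not be safe with concurrent readers.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-in for a cached error record with its insertion time. */
struct cache_entry {
    unsigned long long time_in;
    int id;
};

#define CACHE_SLOTS 4

static _Atomic(struct cache_entry *) slots[CACHE_SLOTS];

/* Allocate an entry; returns NULL if allocation fails. */
struct cache_entry *make_entry(unsigned long long time_in, int id)
{
    struct cache_entry *e = malloc(sizeof(*e));

    if (e != NULL) {
        e->time_in = time_in;
        e->id = id;
    }
    return e;
}

/* Pick an empty slot or the oldest one, then replace it unconditionally. */
void cache_add(struct cache_entry *new_entry)
{
    unsigned long long oldest = ~0ULL;
    struct cache_entry *old;
    int victim = 0;

    if (new_entry == NULL)
        return;

    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e =
            atomic_load_explicit(&slots[i], memory_order_acquire);

        if (e == NULL) {              /* free slot: use it right away */
            victim = i;
            break;
        }
        if (e->time_in < oldest) {    /* remember the oldest entry */
            oldest = e->time_in;
            victim = i;
        }
    }

    /* No compare step: whatever is in the slot now gets dropped. */
    old = atomic_exchange_explicit(&slots[victim], new_entry,
                                   memory_order_release);
    free(old);                        /* the kernel defers this via RCU */
}

/* Return 1 if an entry with the given id is currently cached. */
int cache_contains(int id)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e =
            atomic_load_explicit(&slots[i], memory_order_acquire);

        if (e != NULL && e->id == id)
            return 1;
    }
    return 0;
}
```

If two writers race for the same slot, one of the two new entries is simply evicted by the other, which is the "arbitrary which of those two items gets dropped" outcome the commit message accepts.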
2022-10-28 | ACPI: APEI: Drop unsetting driver data on remove | Uwe Kleine-König | 1 | -2/+0
Since commit 0998d0631001 ("device-core: Ensure drvdata = NULL when no driver is bound") the driver core cares for cleaning driver data, so don't do it in the driver, too. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-24 | apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg() | Ard Biesheuvel | 1 | -27/+33
Some documentation first, about how this machinery works:

It seems, the intent of the GHES error records cache is to collect already reported errors - see the ghes_estatus_cached() checks. There's even a sentence trying to say what this does:

    /*
     * GHES error status reporting throttle, to report more kinds of
     * errors, instead of just most frequently occurred errors.
     */

New elements are added to the cache this way:

    if (!ghes_estatus_cached(estatus)) {
            if (ghes_print_estatus(NULL, ghes->generic, estatus))
                    ghes_estatus_cache_add(ghes->generic, estatus);

The intent being, once this new error record is reported, it gets cached so that it doesn't get reported for a while due to too many, same-type error records getting reported in burst-like scenarios. I.e., new, unreported error types can have a higher chance of getting reported.

Now, the loop in ghes_estatus_cache_add() is trying to pick out the oldest element in there. Meaning, something which got reported already but a long while ago, i.e., a LRU-type scheme.

And the cmpxchg() is there presumably to make sure when that selected element slot_cache is removed, it really *is* that element that gets removed and not one which replaced it in the meantime.

Now, ghes_estatus_cache_add() selects a slot, and either succeeds in replacing its contents with a pointer to a newly cached item, or it just gives up and frees the new item again, without attempting to select another slot even if one might be available.

Since only inserting new items is being done here, the race can only cause a failure if the selected slot was updated with another new item concurrently, which means that it is arbitrary which of those two items gets dropped.

And "dropped" here means, the item doesn't get added to the cache so the next time it is seen, it'll get reported again and an insertion attempt will be done again. Eventually, it'll get inserted and all those times when the insertion fails, the item will get reported although the cache is supposed to prevent that and "ratelimit" those repeated error records. Not a big deal in any case.

This means the cmpxchg() and the special case are not necessary. Therefore, just drop the existing item unconditionally.

Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock section since there is no actually dereferencing the pointer at all.

[ bp:
  - Flesh out and summarize what was discussed on the thread now that that cache contraption is understood;
  - Touch up code style. ]

Co-developed-by: Jia He <justin.he@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com
2022-10-21 | EDAC/ghes: Make ghes_edac a proper module | Jia He | 1 | -4/+0
Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") introduced a bug leading to ghes_edac_register() to be invoked before edac_init(). Because at that time the bus "edac" hadn't been even registered, this created sysfs nodes as /devices/mc0 instead of /sys/devices/system/edac/mc/mc0 on an Ampere eMag server. Fix this by turning ghes_edac into a proper module. The list of GHES devices returned is not protected from being modified concurrently but it is pretty static as it gets created only during GHES init and latter is not a module so... [ bp: Massage. ] Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") Co-developed-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-5-justin.he@arm.com
2022-10-21 | EDAC/ghes: Prepare to make ghes_edac a proper module | Jia He | 1 | -0/+50
To make ghes_edac a proper module, prepare to decouple its dependencies from GHES. Move the ghes_edac.force_load parameter to ghes.c in order to properly control whether ghes_edac should be force-loaded: In ghes_edac_register() it is too late to set the module flag. Introduce a helper ghes_get_devices(), which returns the list of GHES devices which got probed when the platform-check passes on the system. The previous force_load check is not needed in ghes_edac_unregister() since it will be checked in the module's init function of ghes_edac later. [ bp: Massage. ] Suggested-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-4-justin.he@arm.com
2022-10-20 | EDAC/ghes: Add a notifier for reporting memory errors | Jia He | 1 | -1/+15
In order to make it a proper module and disentangle it from facilities, add a notifier for reporting memory errors. Use an atomic notifier because call sites like ghes_proc_in_irq() run in interrupt context. [ bp: Massage commit message. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-3-justin.he@arm.com
2022-10-13 | ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init() | Ashish Kalra | 1 | -1/+1
Change num_ghes from int to unsigned int, preventing an overflow that causes the subsequent vmalloc() to fail.

The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below, as both multiplication operands here are signed int:

    len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);

The following call trace is observed because of this bug:

[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1
[ 9.317131] Call Trace:
[ 9.317134] <TASK>
[ 9.317137] dump_stack_lvl+0x49/0x5f
[ 9.317145] dump_stack+0x10/0x12
[ 9.317146] warn_alloc.cold+0x7b/0xdf
[ 9.317150] ? __device_attach+0x16a/0x1b0
[ 9.317155] __vmalloc_node_range+0x702/0x740
[ 9.317160] ? device_add+0x17f/0x920
[ 9.317164] ? dev_set_name+0x53/0x70
[ 9.317166] ? platform_device_add+0xf9/0x240
[ 9.317168] __vmalloc_node+0x49/0x50
[ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0
[ 9.317176] vmalloc+0x21/0x30
[ 9.317177] ghes_estatus_pool_init+0x43/0xa0
[ 9.317179] acpi_hest_init+0x129/0x19c
[ 9.317185] acpi_init+0x434/0x4a4
[ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a
[ 9.317190] do_one_initcall+0x48/0x200
[ 9.317195] kernel_init_freeable+0x221/0x284
[ 9.317200] ? rest_init+0xe0/0xe0
[ 9.317204] kernel_init+0x1a/0x130
[ 9.317205] ret_from_fork+0x22/0x30
[ 9.317208] </TASK>

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
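The effect of the signed multiplication can be reproduced in a userspace sketch. Signed overflow is undefined in ISO C, so the wraparound is modelled explicitly with unsigned arithmetic instead of being triggered directly. `PREALLOC_MAX_SIZE` is an illustrative per-source size, and `wrap_to_s32()`, `len_with_signed_count()` and `len_with_unsigned_count()` are hypothetical names.

```c
#include <stdint.h>

/* Illustrative per-source preallocation size, not taken from the kernel. */
#define PREALLOC_MAX_SIZE 65536u

/* Reinterpret the low 32 bits of v as a two's-complement signed value. */
static int64_t wrap_to_s32(uint64_t v)
{
    uint32_t low = (uint32_t)v;

    if (low >= 0x80000000u)
        return (int64_t)low - 4294967296LL;
    return (int64_t)low;
}

/*
 * Models the old behaviour: a 32-bit signed product that wraps negative
 * and is then sign-extended when added to a 64-bit unsigned length.
 */
uint64_t len_with_signed_count(uint32_t num)
{
    int64_t product = wrap_to_s32((uint64_t)num * PREALLOC_MAX_SIZE);

    return (uint64_t)product;
}

/* Models the fix: a 32-bit unsigned product is zero-extended instead. */
uint64_t len_with_unsigned_count(uint32_t num)
{
    uint32_t product = (uint32_t)((uint64_t)num * PREALLOC_MAX_SIZE);

    return product;
}
```

For small counts both functions agree; once the product reaches 2^31, the signed model yields a value just below 2^64, which is the kind of request size that appears in the vmalloc error above.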
2022-10-04 | ACPI: APEI: do not add task_work to kernel thread to avoid memory leak | Shuai Xue | 1 | -1/+1
If an error is detected as a result of a user-space process accessing a corrupt memory location, the CPU may take an abort. The platform firmware then notifies the kernel via NMI-like notifications, e.g. NOTIFY_SEA, NOTIFY_SOFTWARE_DELEGATED, etc.

For NMI-like notifications, commit 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors") keeps track of whether memory_failure() work was queued, and makes task_work pending to flush out the queue so that the work is processed before returning to user space.

The code uses init_mm to check whether the error occurred in user space:

    if (current->mm != &init_mm)

The condition is always true, because _nobody_ ever has "init_mm" as a real VM any more.

In addition to aborts, errors can also be signaled as asynchronous exceptions, such as interrupts and SError. In such cases, the interrupted current process could be any kind of thread. When a kernel thread is interrupted, the work ghes_kick_task_work deferred to task_work will never be processed because entry_handler returns to call ret_to_kernel() instead of ret_to_user(). Consequently, the estatus_node allocated from ghes_estatus_pool in ghes_in_nmi_queue_one_entry() will not be freed. After around 200 allocations on our platform, the ghes_estatus_pool runs out of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. As a result, the event fails to be processed:

    sdei: event 805 on CPU 113 failed with error: -2

Finally, a lot of unhandled events may cause platform firmware to exceed some threshold and reboot.

The condition should generally just be

    if (current->mm)

as described in the active_mm.rst documentation.

Then, if an asynchronous error is detected while a kernel thread is running (e.g. when detected by a background scrubber), do not add task_work to it, as the original patch intended.

Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
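The difference between the two conditions can be demonstrated with stand-in types. This sketch assumes, as the commit message explains, that a kernel thread has no address space of its own (a NULL `mm`); `struct mm_sketch`, `struct task_sketch`, `init_mm_sketch` and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's mm_struct and task_struct. */
struct mm_sketch {
    int unused;
};

struct task_sketch {
    struct mm_sketch *mm;   /* NULL for a kernel thread */
};

/* Stand-in for init_mm, which no task uses as its own mm. */
static struct mm_sketch init_mm_sketch;

/* Old condition: true for every task, including kernel threads. */
bool old_should_queue_task_work(const struct task_sketch *t)
{
    return t->mm != &init_mm_sketch;
}

/* Fixed condition: true only for tasks that have a user address space. */
bool new_should_queue_task_work(const struct task_sketch *t)
{
    return t->mm != NULL;
}
```

With the old check, a kernel thread would be handed task_work that never runs, so the associated allocation is never released; the fixed check skips such threads.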
2022-09-24 | ACPI: APEI: Remove unneeded result variables | ye xingchen | 2 | -9/+2
Return the erst_get_record_id_begin() and apei_exec_write_register() return values directly instead of storing them in redundant local variables. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-03ACPI: APEI: Add BERT error log footerDmitry Monakhov1-0/+3
Print the total number of records found during BERT log parsing. This also simplifies dmesg parser implementation for BERT events. Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-06-29ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SPDan Williams1-0/+2
When a platform marks a memory range as "special purpose" it is not onlined as System RAM by default. However, it is still suitable for error injection. Add IORES_DESC_SOFT_RESERVED to einj_error_inject() as a permissible memory type in the sanity checking of the arguments to _EINJ. Fixes: 262b45ae3ab4 ("x86/efi: EFI soft reservation to E820 enumeration") Reviewed-by: Tony Luck <tony.luck@intel.com> Reported-by: Omar Avelar <omar.avelar@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-06-29ACPI: APEI: Better fix to avoid spamming the console with old error logsTony Luck1-8/+23
The fix in commit 3f8dec116210 ("ACPI/APEI: Limit printable size of BERT table data") does not work as intended on systems where the BIOS has a fixed size block of memory for the BERT table, relying on s/w to quit when it finds a record with estatus->block_status == 0. On these systems all errors are suppressed because the check: if (region_len < ACPI_BERT_PRINT_MAX_LEN) always fails. New scheme skips individual CPER records that are too large, and also limits the total number of records that will be printed to 5. Fixes: 3f8dec116210 ("ACPI/APEI: Limit printable size of BERT table data") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
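A rough user-space sketch of the new scheme described above: stop at the first record with a zero block status, skip records that are too large, and print at most five. The names `record_model`, `bert_stats` and `scan_records()` are hypothetical, and the 1024-byte per-record cap is an assumption borrowed from the earlier size-limit commit rather than from this patch.

```c
#include <assert.h>
#include <stddef.h>

#define PRINT_MAX_LEN		1024	/* assumed per-record size cap */
#define PRINT_MAX_RECORDS	5	/* limit stated in the commit message */

/* Hypothetical model of one error status block in the BERT region. */
struct record_model {
	unsigned int block_status;	/* 0 marks the end of valid records */
	size_t len;
};

struct bert_stats {
	int printed;
	int skipped;
};

static struct bert_stats scan_records(const struct record_model *recs, size_t n)
{
	struct bert_stats st = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (recs[i].block_status == 0)
			break;		/* firmware left the rest of the block empty */
		if (recs[i].len > PRINT_MAX_LEN ||
		    st.printed >= PRINT_MAX_RECORDS) {
			st.skipped++;	/* too large, or print budget used up */
			continue;
		}
		st.printed++;		/* real code would print the record here */
	}
	return st;
}
```

Because the size test is applied per record rather than to the whole region, a fixed-size firmware block no longer suppresses every record.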
2022-06-09ACPI: APEI: Fix double word in a commentXiang wangx1-1/+1
Delete the redundant word 'the'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> [ rjw: New subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-22ACPI, APEI, EINJ: Refuse to inject into the zero pageTony Luck1-0/+3
Some validation tests dynamically inject errors into memory used by applications to check that the system can recover from a variety of poison consumption scenarios. But sometimes the virtual address picked by these tests is mapped to the zero page. This causes additional unexpected machine checks as other processes that map the zero page also consume the poison. Disallow injection to the zero page. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13ACPI: APEI: Fix missing ERST record idLiu Xinpeng2-7/+73
When a record has been cleared by another user, a cache entry for the deleted record is still created by erst_get_record_id_next(). On the next enumeration of the records, the cached ID of the deleted record is returned, erst_read() returns -ENOENT and tries to get the next record. Looping back to the first ID returns 0 in __erst_record_id_cache_add_one(), which sets record_id to APEI_ERST_INVALID_RECORD_ID and finishes the read operation. From then on, only the records already in the cache are read. This patch clears the deleted record from the cache, fixing the issue that "./erst-inject -p" shows a record count not equal to "./erst-inject -n". A reproducer of the problem (retry many times): [root@localhost erst-inject]# ./erst-inject -c 0xaaaaa00011 [root@localhost erst-inject]# ./erst-inject -p rc: 273 rcd sig: CPER rcd id: 0xaaaaa00012 rc: 273 rcd sig: CPER rcd id: 0xaaaaa00013 rc: 273 rcd sig: CPER rcd id: 0xaaaaa00014 [root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000006 [root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000007 [root@localhost erst-inject]# ./erst-inject -i 0xaaaaa000008 [root@localhost erst-inject]# ./erst-inject -p rc: 273 rcd sig: CPER rcd id: 0xaaaaa00012 rc: 273 rcd sig: CPER rcd id: 0xaaaaa00013 rc: 273 rcd sig: CPER rcd id: 0xaaaaa00014 [root@localhost erst-inject]# ./erst-inject -n total error record count: 6 Signed-off-by: Liu Xinpeng <liuxp11@chinatelecom.cn> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-22ACPI, APEI: Use the correct variable for sizeof()Jakob Koschel1-1/+1
While the original code is valid, it is not the obvious choice for the sizeof() call and in preparation to limit the scope of the list iterator variable the sizeof should be changed to the size of the variable being allocated. Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
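The idiom this commit moves toward, sizing an allocation from the variable being allocated, can be shown with a small user-space sketch. `struct node_model` and `alloc_node()` are hypothetical, and `malloc()` stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node; not the structure touched by the patch. */
struct node_model {
	int id;
	struct node_model *next;
};

static struct node_model *alloc_node(int id)
{
	/* sizeof(*n) follows the type of n, the variable being allocated. */
	struct node_model *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->id = id;
	n->next = NULL;
	return n;
}
```

Written this way, the size expression does not refer to any other variable, so narrowing the scope of an unrelated iterator cannot break it.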
2022-03-09ACPI/APEI: Limit printable size of BERT table dataDarren Hart1-2/+6
Platforms with large BERT table data can trigger soft lockup errors while attempting to print the entire BERT table data to the console at boot: watchdog: BUG: soft lockup - CPU#160 stuck for 23s! [swapper/0:1] Observed on Ampere Altra systems with a single BERT record of ~250KB. The original bert driver appears to have assumed relatively small table data. Since it is impractical to reassemble large table data from interwoven console messages, and the table data is available in /sys/firmware/acpi/tables/data/BERT limit the size for tables printed to the console to 1024 (for no reason other than it seemed like a good place to kick off the discussion, would appreciate feedback from existing users in terms of what size would maintain their current usage model). Alternatively, we could make printing a CONFIG option, use the bert_disable boot arg (or something similar), or use a debug log level. However, all those solutions require extra steps or change the existing behavior for small table data. Limiting the size preserves existing behavior on existing platforms with small table data, and eliminates the soft lockups for platforms with large table data, while still making it available. Signed-off-by: Darren Hart <darren@os.amperecomputing.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-08ACPI: APEI: fix return value of __setup handlersRandy Dunlap3-3/+3
__setup() handlers should return 1 to indicate that the boot option has been handled. Returning 0 causes a boot option to be listed in the Unknown kernel command line parameters and also added to init's arg list (if no '=' sign) or environment list (if of the form 'a=b'). Unknown kernel command line parameters "erst_disable bert_disable hest_disable BOOT_IMAGE=/boot/bzImage-517rc6", will be passed to user space. Run /sbin/init as init process with arguments: /sbin/init erst_disable bert_disable hest_disable with environment: HOME=/ TERM=linux BOOT_IMAGE=/boot/bzImage-517rc6 Fixes: a3e2acc5e37b ("ACPI / APEI: Add Boot Error Record Table (BERT) support") Fixes: a08f82d08053 ("ACPI, APEI, Error Record Serialization Table (ERST) support") Fixes: 9dc966641677 ("ACPI, APEI, HEST table parsing") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru> Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
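The return-value convention can be modelled outside the kernel as follows. This is a simplified sketch, not the kernel's parser: `setup_table`, `dispatch_option()` and `setup_erst_disable_model()` are hypothetical names, and matching is by exact string comparison for brevity.

```c
#include <assert.h>
#include <string.h>

typedef int (*setup_fn)(const char *arg);

static int erst_disabled_model;

/* Handler returns 1 to say "this option has been handled". */
static int setup_erst_disable_model(const char *arg)
{
	(void)arg;
	erst_disabled_model = 1;
	return 1;
}

struct setup_entry {
	const char *name;
	setup_fn fn;
};

static const struct setup_entry setup_table[] = {
	{ "erst_disable", setup_erst_disable_model },
};

/* Returns 1 if the option was consumed, 0 if it would be passed to init. */
static int dispatch_option(const char *opt)
{
	size_t n = sizeof(setup_table) / sizeof(setup_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(opt, setup_table[i].name) == 0)
			return setup_table[i].fn(opt) ? 1 : 0;
	}
	return 0;
}
```

A handler that returned 0 here would make `dispatch_option()` report the option as unhandled even though it took effect, which mirrors the behaviour the commit fixes.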
2022-03-03ACPI: APEI: rename ghes_init() with an "acpi_" prefixShuai Xue1-1/+1
ghes_init() sticks out in acpi_init() because it is the only function without an "acpi_" prefix. Rename ghes_init() with an "acpi_" prefix, so that all of them look consistent. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-03ACPI: APEI: explicit init of HEST and GHES in apci_init()Shuai Xue1-11/+8
Since commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage the estatus memory pool. On the other hand, ghes_init() relies on sdei_init() to detect the SDEI version and (un)register events. The dependencies are as follows: ghes_init() => acpi_hest_init() => acpi_bus_init() => acpi_init() ghes_init() => sdei_init() HEST is not PCI-specific and initcall ordering is implicit and not well-defined within a level. Based on the above, remove acpi_hest_init() from acpi_pci_root_init() and convert ghes_init() and sdei_init() from initcalls to explicit calls in the following order: acpi_hest_init() ghes_init() sdei_init() Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
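The point of replacing initcalls with explicit calls is that the order becomes visible in one place. A minimal model, with hypothetical `*_model` functions that only record their names, and with the order copied from the list in the message:

```c
#include <assert.h>
#include <string.h>

static char call_log[64];

/* Append one step name to the log; the log is large enough for three. */
static void log_call(const char *name)
{
	strcat(call_log, name);
	strcat(call_log, ";");
}

static void hest_init_model(void) { log_call("hest"); }
static void ghes_init_model(void) { log_call("ghes"); }
static void sdei_init_model(void) { log_call("sdei"); }

/* Explicit calls: the order is whatever this function spells out. */
static const char *acpi_init_model(void)
{
	call_log[0] = '\0';
	hest_init_model();
	ghes_init_model();
	sdei_init_model();
	return call_log;
}
```

With initcalls at the same level, no single function like `acpi_init_model()` exists to read the order from.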
2021-11-15x86/sgx: Add check for SGX pages to ghes_do_memory_failure()Tony Luck1-1/+1
SGX EPC pages do not have a "struct page" associated with them so the pfn_valid() sanity check fails and results in a warning message to the console. Add an additional check to skip the warning if the address of the error is in an SGX EPC page. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20211026220050.697075-8-tony.luck@intel.com
2021-11-15x86/sgx: Add hook to error injection address validationTony Luck1-1/+2
SGX reserved memory does not appear in the standard address maps. Add hook to call into the SGX code to check if an address is located in SGX memory. There are other challenges in injecting errors into SGX. Update the documentation with a sequence of operations to inject. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20211026220050.697075-7-tony.luck@intel.com
2021-10-27ACPI: APEI: mark apei_hest_parse() staticChristoph Hellwig1-2/+3
apei_hest_parse() is only used in hest.c, so mark it static. Signed-off-by: Christoph Hellwig <hch@lst.de> [ rjw: Minor subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-27ACPI: APEI: EINJ: Relax platform response timeout to 1 secondShuai Xue1-7/+8
When injecting an error into the platform, the OSPM executes an EXECUTE_OPERATION action to instruct the platform to begin the injection operation. The OSPM then busy-waits by repeatedly executing the CHECK_BUSY_STATUS action until the platform indicates that the operation is complete. More specifically, the platform currently has to respond within 1 millisecond. This is too strict for some platforms. For example, on an Arm platform, when injecting a Processor Correctable error, the OSPM will warn: Firmware does not respond in time. And a message is printed on the console: echo: write error: Input/output error We observe that the waiting time for DDR error injection is about 10 ms and that for PCIe error injection is about 500 ms on an Arm platform. In this patch, we relax the response timeout to 1 second. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
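The effect of the timeout budget on the polling loop can be sketched with a simulated platform. Everything here is hypothetical: `platform_model`, `check_busy()`, `wait_for_completion_model()` and the 100 us polling step are assumptions for illustration; only the 1 ms and 1 s budgets and the 500 ms PCIe figure come from the message.

```c
#include <assert.h>
#include <stdbool.h>

#define SPIN_UNIT_US	100		/* assumed delay per poll */
#define TIMEOUT_US	1000000		/* 1 second, per the commit message */

/* Hypothetical platform: busy until ready_after_us have elapsed. */
struct platform_model {
	unsigned long elapsed_us;
	unsigned long ready_after_us;
};

/* Models the CHECK_BUSY_STATUS action. */
static bool check_busy(const struct platform_model *p)
{
	return p->elapsed_us < p->ready_after_us;
}

/* Returns 0 when the platform completes, -1 when the budget runs out. */
static int wait_for_completion_model(struct platform_model *p,
				     unsigned long budget_us)
{
	unsigned long waited = 0;

	while (check_busy(p)) {
		if (waited >= budget_us)
			return -1;
		p->elapsed_us += SPIN_UNIT_US;	/* stands in for a delay */
		waited += SPIN_UNIT_US;
	}
	return 0;
}
```

In this model, an operation that needs 500 ms times out under a 1000 us budget and completes under `TIMEOUT_US`.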
2021-06-17ACPI: APEI: fix synchronous external aborts in user-modeXiaofei Tan1-17/+64
Before commit 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work"), do_sea() would unconditionally signal the affected task from the arch code. Since that change, the GHES driver sends the signals. This exposes a problem as errors the GHES driver doesn't understand or doesn't handle effectively are silently ignored. This causes the errors to be taken again and circulate endlessly, and the user-space task gets stuck in this loop. Existing firmware on Kunpeng9xx systems reports cache errors with the 'ARM Processor Error' CPER records. Do memory failure handling for ARM Processor Error Section just like for Memory Error Section. Fixes: 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work") Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-05-21ACPI: APEI: Don't warn if ACPI is disabledJon Hunter1-1/+1
If ACPI is not enabled but support for ACPI and APEI is enabled in the kernel, then the following warning is seen on boot ... WARNING KERN EINJ: ACPI disabled. For ARM64 platforms, the 'acpi_disabled' variable is true by default and hence, the above is often seen on ARM64. Given that it can be normal for ACPI to be disabled, make this an informational print rather than a warning. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-26Merge branch 'acpi-misc'Rafael J. Wysocki2-2/+2
* acpi-misc: ACPI: dock: fix some coding style issues ACPI: sysfs: fix some coding style issues ACPI: PM: add a missed blank line after declarations ACPI: custom_method: fix a coding style issue ACPI: CPPC: fix some coding style issues ACPI: button: fix some coding style issues ACPI: battery: fix some coding style issues ACPI: acpi_pad: add a missed blank line after declarations ACPI: LPSS: add a missed blank line after declarations ACPI: ipmi: remove useless return statement for void function ACPI: processor: fix some coding style issues ACPI: APD: fix a block comment align issue ACPI: AC: fix some coding style issues ACPI: fix various typos in comments
2021-04-21ACPI: APEI: remove redundant assignment to variable rcColin Ian King1-1/+0
The variable rc is being assigned a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-19ACPI: fix various typos in commentsTom Saeger2-2/+2
Fix trivial ACPI driver comment typos. s/notifcations/notifications/ s/Ajust/Adjust/ s/preform/perform/ s/atrributes/attributes/ s/Souce/Source/ s/Evalutes/Evaluates/ s/Evalutes/Evaluates/ s/specifiy/specify/ s/promixity/proximity/ s/presuambly/presumably/ s/Evalute/Evaluate/ s/specificed/specified/ s/rountine/routine/ s/previosuly/previously/ Change comment referencing pcc_send_cmd to send_pcc_cmd. Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-05ACPI: APEI: ERST: remove unneeded semicolonYang Li1-1/+1
Eliminate the following coccicheck warning: ./drivers/acpi/apei/erst.c:691:2-3: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>