summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)AuthorFilesLines
2022-04-14EDAC/synopsys: Read the error count from the correct registerShubhrajyoti Datta1-5/+11
Currently, the error count is read wrongly from the status register. Read the count from the proper error count register (ERRCNT). [ bp: Massage. ] Fixes: b500b4a029d5 ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220414102813.4468-1-shubhrajyoti.datta@xilinx.com
2022-04-11EDAC/mc: Get rid of edac_align_ptr()Borislav Petkov2-57/+0
Get rid of it now that it is unused. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-6-bp@alien8.de
2022-04-11EDAC/device: Sanitize edac_device_alloc_ctl_info() definitionBorislav Petkov1-16/+16
Shorten argument names, correct formatting, sort local vars in reverse x-mas tree order. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-5-bp@alien8.de
2022-04-11EDAC/device: Get rid of the silly one-shot memory allocation in ↵Borislav Petkov3-71/+57
edac_device_alloc_ctl_info() Use boring kzalloc() instead. Add pointers to the different allocated members in struct edac_device_ctl_info for easier freeing later. One of the reasons, perhaps, why it was done this way is to be able to do a single kfree(ctl_info) without having to kfree() the other parts of the struct too but that is not nearly a sensible reason as to why there should be this obscure pointer alignment. There should be no functional changes resulting from this. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-4-bp@alien8.de
2022-04-11EDAC/pci: Get rid of the silly one-shot memory allocation in ↵Borislav Petkov1-13/+12
edac_pci_alloc_ctl_info() Use boring kzalloc() instead. There should be no functional changes resulting from this. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-3-bp@alien8.de
2022-04-11EDAC/mc: Get rid of silly one-shot struct allocation in edac_mc_alloc()Borislav Petkov1-28/+13
This has probably meant something at some point but there's no need for it anymore - the struct mem_ctl_info allocation can happen with normal, boring k*alloc() calls like everyone else does it. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-2-bp@alien8.de
2022-04-08EDAC/ghes: Unify CPER memory error location reportingShuai Xue2-163/+38
Switch the GHES EDAC memory error reporting functions to use the common CPER ones and get rid of code duplication. [ bp: - rewrite commit message, remove useless text - rip out useless reformatting - align function params on the opening brace - rename function to a more descriptive name - drop useless function exports - handle buffer lengths properly when printing other detail - remove useless casting ] Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220308144053.49090-3-xueshuai@linux.alibaba.com
2022-04-05powerpc/85xx: Remove fsl,85... bindingsChristophe Leroy1-14/+0
Since 8a4ab218ef70 ("powerpc/85xx: Change deprecated binding for 85xx-based boards") those bindings are not used anymore. A comment in drivers/edac/mpc85xx_edac.c say they are to be removed with kernel 2.6.30. Remove them now. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Scott Wood <oss@buserror.net> Link: https://lore.kernel.org/r/82a8bc4450a4daee50ee5fada75621fecb3703ff.1648721299.git.christophe.leroy@csgroup.eu
2022-04-05x86/amd_nb: Unexport amd_cache_northbridges()Muralidhara M K1-1/+1
amd_cache_northbridges() is exported by amd_nb.c and is called by amd64-agp.c and amd64_edac.c modules at module_init() time so that NB descriptors are properly cached before those drivers can use them. However, the init_amd_nbs() initcall already does call amd_cache_northbridges() unconditionally and thus makes sure the NB descriptors are enumerated. That initcall is a fs_initcall type which is on the 5th group (starting from 0) of initcalls that gets run in increasing numerical order by the init code. The module_init() call is turned into an __initcall() in the MODULE=n case and those are device-level initcalls, i.e., group 6. Therefore, the northbridges caching is already finished by the time module initialization starts and thus the correct initialization order is retained. Unexport amd_cache_northbridges(), update dependent modules to call amd_nb_num() instead. While at it, simplify the checks in amd_cache_northbridges(). [ bp: Heavily massage and *actually* explain why the change is ok. ] Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324122729.221765-1-nchatrad@amd.com
2022-03-21Merge branch 'edac-amd64' into edac-updates-for-v5.18Borislav Petkov5-25/+114
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-03-16EDAC/altera: Add SDRAM ECC check for U-BootRabara Niravkumar L1-1/+39
A bug in legacy U-Boot causes a crash during SDRAM boot if ECC is not enabled in the bitstream but enabled in the Linux config. Memory mapped read of the ECC Enabled bit was only enabled if U-Boot determined ECC was enabled in the bitstream. The Linux driver checks the ECC enable bit using a memory map read. In the ECC disabled bitstream case, U-Boot didn't enable ECC register memory map reads and since they are not allowed this results in a crash. Always read the ECC Enable register through a SMC call which is always allowed and it works with legacy and current U-Boot. [ bp: Massage commit message. ] Signed-off-by: Rabara Niravkumar L <niravkumar.l.rabara@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Link: https://lore.kernel.org/r/20220305014118.4794-1-niravkumar.l.rabara@intel.com
2022-02-24EDAC/amd64: Add new register offset support and related changesYazen Ghannam2-16/+78
Introduce a "family flags" bitmask that can be used to indicate any special behavior needed on a per-family basis. Add a flag to indicate a system uses the new register offsets introduced with Family 19h Model 10h. Use this flag to account for register offset changes, a new bitfield indicating DDR5 use on a memory controller, and to set the proper number of chip select masks. Rework f17_addr_mask_to_cs_size() to properly handle the change in chip select masks. And update code comments to reflect the updated Chip Select, DIMM, and Mask relationships. [uninitialized variable warning] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: William Roche <william.roche@oracle.com> Link: https://lore.kernel.org/r/20220202144307.2678405-3-yazen.ghannam@amd.com
2022-02-23EDAC/amd64: Set memory type per DIMMYazen Ghannam2-13/+40
Current AMD systems allow mixing of DIMM types within a system. However, DIMMs within a channel, i.e. managed by a single Unified Memory Controller (UMC), must be of the same type. Handle this possible configuration by checking and setting the memory type for each individual "UMC" structure. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: William Roche <william.roche@oracle.com> Link: https://lore.kernel.org/r/20220202144307.2678405-2-yazen.ghannam@amd.com
2022-02-15EDAC: Fix calculation of returned address and next offset in edac_align_ptr()Eliav Farber1-1/+1
Do alignment logic properly and use the "ptr" local variable for calculating the remainder of the alignment. This became an issue because struct edac_mc_layer has a size that is not zero modulo eight, and the next offset that was prepared for the private data was unaligned, causing an alignment exception. The patch in Fixes: which broke this actually wanted to "what we actually care about is the alignment of the actual pointer that's about to be returned." But it didn't check that alignment. Use the correct variable "ptr" for that. [ bp: Massage commit message. ] Fixes: 8447c4d15e35 ("edac: Do alignment logic properly in edac_align_ptr()") Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220113100622.12783-2-farbere@amazon.com
2022-01-30EDAC/xgene: Fix deferred probingSergey Shtylyov1-1/+1
The driver overrides error codes returned by platform_get_irq_optional() to -EINVAL for some strange reason, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the proper error codes to platform driver code upwards. [ bp: Massage commit message. ] Fixes: 0d4429301c4a ("EDAC: Add APM X-Gene SoC EDAC driver") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220124185503.6720-3-s.shtylyov@omp.ru
2022-01-28EDAC/altera: Fix deferred probingSergey Shtylyov1-1/+1
The driver overrides the error codes returned by platform_get_irq() to -ENODEV for some strange reason, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the proper error codes to platform driver code upwards. [ bp: Massage commit message. ] Fixes: 71bcada88b0f ("edac: altera: Add Altera SDRAM EDAC support") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220124185503.6720-2-s.shtylyov@omp.ru
2022-01-26EDAC/mc: Remove unnecessary cast to char * in edac_align_ptr()Eliav Farber1-2/+2
Remove the forgotten (char *) casts as that function returns void *. [ bp: Rewrite commit message. ] Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220113100622.12783-3-farbere@amazon.com
2022-01-23EDAC: Use default_groups in kobj_typeGreg Kroah-Hartman2-10/+15
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the edac sysfs code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that the obsolete default_attrs field can be removed soon. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220104112401.1067148-2-gregkh@linuxfoundation.org
2022-01-23EDAC: Use proper list of struct attribute for attributesGreg Kroah-Hartman2-26/+26
The EDAC sysfs code is doing some crazy casting of the list of attributes that is not necessary at all. Instead, properly point to the correct attribute structure in the lists, which removes the need to cast anything and the code is now properly typesafe (as much as sysfs attribute logic is typesafe...) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220104112401.1067148-1-gregkh@linuxfoundation.org
2022-01-10Merge tag 'edac_updates_for_v5.17_rc1' of ↵Linus Torvalds7-14/+90
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Add support for version 3 of the Synopsys DDR controller to synopsys_edac - Add support for DRR5 and new models 0x10-0x1f and 0x50-0x5f of AMD family 0x19 CPUs to amd64_edac - The usual set of fixes and cleanups * tag 'edac_updates_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Add support for family 19h, models 50h-5fh EDAC/sb_edac: Remove redundant initialization of variable rc RAS/CEC: Remove a repeated 'an' in a comment EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh EDAC: Add RDDR5 and LRDDR5 memory types EDAC/sifive: Fix non-kernel-doc comment dt-bindings: memory: Add entry for version 3.80a EDAC/synopsys: Enable the driver on Intel's N5X platform EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR EDAC/synopsys: Use the quirk for version instead of ddr version
2022-01-10Merge tag 'ras_core_for_v5.17_rc1' of ↵Linus Torvalds2-16/+405
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: "A relatively big amount of movements in RAS-land this time around: - First part of a series to move the AMD address translation code from arch/x86/ to amd64_edac as that is its only user anyway - Some MCE error injection improvements to the AMD side - Reorganization of the #MC handler code and the facilities it calls to make it noinstr-safe - Add support for new AMD MCA bank types and non-uniform banks layout - The usual set of cleanups and fixes" * tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/mce: Reduce number of machine checks taken during recovery x86/mce/inject: Avoid out-of-bounds write when setting flags x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types x86/mce: Check regs before accessing it x86/mce: Mark mce_start() noinstr x86/mce: Mark mce_timed_out() noinstr x86/mce: Move the tainting outside of the noinstr region x86/mce: Mark mce_read_aux() noinstr x86/mce: Mark mce_end() noinstr x86/mce: Mark mce_panic() noinstr x86/mce: Prevent severity computation from being instrumented x86/mce: Allow instrumentation during task work queueing x86/mce: Remove noinstr annotation from mce_setup() x86/mce: Use mce_rdmsrl() in severity checking code x86/mce: Remove function-local cpus variables x86/mce: Do not use memset to clear the banks bitmaps x86/mce/inject: Set the valid bit in MCA_STATUS before error injection x86/mce/inject: Check if a bank is populated before injecting x86/mce: Get rid of cpu_missing ...
2022-01-10Merge branches 'edac-misc' and 'edac-amd64' into edac-updates-for-v5.17Borislav Petkov5-4/+46
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-01-04EDAC/i10nm: Release mdev/mbase when failing to detect HBMQiuxu Zhuo1-0/+9
On systems without HBM (High Bandwidth Memory) mdev/mbase are not released/unmapped. Add the code to release mdev/mbase when failing to detect HBM. [Tony: re-word commit message] Cc: <stable@vger.kernel.org> Fixes: c945088384d0 ("EDAC/i10nm: Add support for high bandwidth memory") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211224091126.1246-1-qiuxu.zhuo@intel.com
2021-12-24EDAC/amd64: Add support for family 19h, models 50h-5fhMarc Bevand2-0/+18
Add the new family 19h models 50h-5fh PCI IDs (device 18h functions 0 and 6) to support Ryzen 5000 APUs ("Cezanne"). Signed-off-by: Marc Bevand <m@zorinaq.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20211221233112.556927-1-m@zorinaq.com
2021-12-22x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumerationYazen Ghannam1-9/+2
AMD systems currently lay out MCA bank types such that the type of bank number "i" is either the same across all CPUs or is Reserved/Read-as-Zero. For example: Bank # | CPUx | CPUy 0 LS LS 1 RAZ UMC 2 CS CS 3 SMU RAZ Future AMD systems will lay out MCA bank types such that the type of bank number "i" may be different across CPUs. For example: Bank # | CPUx | CPUy 0 LS LS 1 RAZ UMC 2 CS NBIO 3 SMU RAZ Change the structures that cache MCA bank types to be per-CPU and update smca_get_bank_type() to handle this change. Move some SMCA-specific structures to amd.c from mce.h, since they no longer need to be global. Break out the "count" for bank types from struct smca_hwid, since this should provide a per-CPU count rather than a system-wide count. Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The values in this array should not change at runtime. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
2021-12-22x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank typesYazen Ghannam1-7/+128
Add HWID and McaType values for new SMCA bank types, and add their error descriptions to edac_mce_amd. The "PHY" bank types all have the same error descriptions, and the NBIF and SHUB bank types have the same error descriptions. So reuse the same arrays where appropriate. [ bp: Remove useless comments over hwid types. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
2021-12-21EDAC/sb_edac: Remove redundant initialization of variable rcColin Ian King1-1/+1
The variable rc is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and thus remove it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211126221848.1125321-1-colin.i.king@gmail.com
2021-12-10EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFhYazen Ghannam2-2/+24
Add a new family type for AMD Family 19h Models 10h to 1Fh. Use this new family type for Models A0h to AFh also. Increase the maximum number of controllers from 8 to 12. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208174356.1997855-3-yazen.ghannam@amd.com
2021-12-10EDAC: Add RDDR5 and LRDDR5 memory typesYazen Ghannam1-0/+2
Include Registered-DDR5 and Load-Reduced DDR5 in the list of memory types. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208174356.1997855-2-yazen.ghannam@amd.com
2021-12-05EDAC/sifive: Fix non-kernel-doc commentRandy Dunlap1-1/+1
scripts/kernel-doc complains about a comment that begins with "/**" but is not in kernel-doc format, so correct it. Prevents this warning: drivers/edac/sifive_edac.c:23: warning: This comment starts with '/**', \ but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * EDAC error callback Fixes: 91abaeaaff35 ("EDAC/sifive: Add EDAC platform driver for SiFive SoCs") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211201030913.10283-1-rdunlap@infradead.org
2021-11-20EDAC/synopsys: Enable the driver on Intel's N5X platformDinh Nguyen1-1/+1
Intel's N5X platform is also using the Synopsys EDAC controller. Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-3-dinguyen@kernel.org
2021-11-20EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDRDinh Nguyen1-7/+42
Add support for version 3.80a of the Synopsys DDR controller. This version of the controller has the following differences: - UE/CE are auto cleared - Interrupts are supported by default Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-2-dinguyen@kernel.org
2021-11-20EDAC/synopsys: Use the quirk for version instead of ddr versionDinh Nguyen1-2/+1
Version 2.40a supports DDR_ECC_INTR_SUPPORT for a quirk, so use that quirk to determine a call to setup_address_map(). Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-1-dinguyen@kernel.org
2021-11-15EDAC/amd64: Add context structYazen Ghannam1-42/+55
Define an address translation context struct. This will hold values that will be passed between multiple functions. Save return address, Node ID, and the Instance ID number to start. Currently, the UMC number is used as the Instance ID, but future DF versions may use another value. Also include a "tmp" field to use when reading registers. This is to avoid having to define a temporary variable in multiple functions. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-5-yazen.ghannam@amd.com
2021-11-15EDAC/amd64: Allow for DF Indirect Broadcast readsYazen Ghannam1-8/+21
The DF Indirect Access method allows for "Broadcast" accesses in which case no specific instance is targeted. Add support using a reserved instance ID of 0xFF to indicate a broadcast access. Set the FICAA register appropriately. Define helpers functions for instance and broadcast reads and use them where appropriate. Drop the "amd_" prefix since these functions are all static. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-4-yazen.ghannam@amd.com
2021-11-15x86/amd_nb, EDAC/amd64: Move DF Indirect Read to AMD64 EDACYazen Ghannam1-0/+50
df_indirect_read() is used only for address translation. Move it to EDAC along with the translation code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-3-yazen.ghannam@amd.com
2021-11-15x86/MCE/AMD, EDAC/amd64: Move address translation to AMD64 EDACYazen Ghannam1-0/+199
The address translation code used for current AMD systems is non-architectural. So move it to EDAC. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-2-yazen.ghannam@amd.com
2021-11-02Merge tag 'edac_updates_for_v5.16' of ↵Linus Torvalds6-44/+49
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: "A small pile of EDAC updates which the autumn wind blew my way. :) - amd64_edac: Add support for three-rank interleaving mode which is present on AMD zen2 servers - The usual fixes and cleanups all over EDAC land" * tag 'edac_updates_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell EDAC/ti: Remove redundant error messages EDAC/amd64: Handle three rank interleaving mode EDAC/mc_sysfs: Print MC-scope sysfs counters unsigned EDAC/al_mc: Make use of the helper function devm_add_action_or_reset() EDAC/mc: Replace strcpy(), sprintf() and snprintf() with strscpy() or scnprintf()
2021-10-14EDAC/armada-xp: Fix output of uncorrectable error counterHans Potsch1-1/+1
The number of correctable errors is displayed as uncorrectable errors because the "SBE" error count is passed to both calls of edac_mc_handle_error(). Pass the correct uncorrectable error count to the second edac_mc_handle_error() call when logging uncorrectable errors. [ bp: Massage commit message. ] Fixes: 7f6998a41257 ("ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC") Signed-off-by: Hans Potsch <hans.potsch@nokia.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20211006121332.58788-1-hans.potsch@nokia.com
2021-10-11EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/HaswellEric Badger1-1/+1
The computation of TOHM is off by one bit. This missed bit results in too low a value for TOHM, which can cause errors in regular memory to incorrectly report: EDAC MC0: 1 CE Error at MMIOH area, on addr 0x000000207fffa680 on any memory Fixes: 50d1bb93672f ("sb_edac: add support for Haswell based systems") Cc: stable@vger.kernel.org Reported-by: Meeta Saggi <msaggi@purestorage.com> Signed-off-by: Eric Badger <ebadger@purestorage.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211010170127.848113-1-ebadger@purestorage.com
2021-10-07EDAC/ti: Remove redundant error messagesTang Bin1-6/+1
In the function ti_edac_probe(), devm_ioremap_resource() and platform_get_irq() already issue error messages when they fail so remove the redundant error messages in the EDAC driver. Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210811112626.27848-1-tangbin@cmss.chinamobile.com
2021-10-07EDAC/amd64: Handle three rank interleaving modeYazen Ghannam1-1/+21
AMD Rome systems and later support interleaving between three identical ranks within a channel. Check for this mode by counting the number of enabled chip selects and comparing their masks. If there are exactly three enabled chip selects and their masks are identical, then three rank interleaving is enabled. The size of a rank is determined from its mask value. However, three rank interleaving doesn't follow the method of swapping an interleave bit with the most significant bit. Rather, the interleave bit is flipped and the most significant bit remains the same. There is only a single interleave bit in this case. Account for this when determining the chip select size by keeping the most significant bit at its original value and ignoring any zero bits. This will return a full bitmask in [MSB:1]. Fixes: e53a3b267fb0 ("EDAC/amd64: Find Chip Select memory size using Address Mask") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211005154419.2060504-1-yazen.ghannam@amd.com
2021-10-07EDAC/mc_sysfs: Print MC-scope sysfs counters unsignedEric Badger1-4/+4
This is cosmetically nicer for counts > INT32_MAX, and aligns the MC-scope format with that of the lower layer sysfs counter files. Signed-off-by: Eric Badger <ebadger@purestorage.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20211003181653.GA685515@ebadger-ThinkPad-T590
2021-09-28EDAC/al_mc: Make use of the helper function devm_add_action_or_reset()Cai Huoqing1-8/+4
The helper function devm_add_action_or_reset() will internally call devm_add_action(), and if devm_add_action() fails then it will execute the action mentioned and return the error code. So use devm_add_action_or_reset() instead of devm_add_action() to simplify the error handling, reduce the code. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Talel Shenhar <talel@amazon.com> Link: https://lkml.kernel.org/r/20210922125924.321-1-caihuoqing@baidu.com
2021-09-16EDAC/dmc520: Assign the proper type to dimm->edac_modeBorislav Petkov1-1/+1
dimm->edac_mode contains values of type enum edac_type - not the corresponding capability flags. Fix that. Fixes: 1088750d7839 ("EDAC: Add EDAC driver for DMC520") Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210916085258.7544-1-bp@alien8.de
2021-09-16EDAC/synopsys: Fix wrong value type assignment for edac_modeSai Krishna Potthuri1-1/+1
dimm->edac_mode contains values of type enum edac_type - not the corresponding capability flags. Fix that. Issue caught by Coverity check "enumerated type mixed with another type." [ bp: Rewrite commit message, add tags. ] Fixes: ae9b56e3996d ("EDAC, synps: Add EDAC support for zynq ddr ecc controller") Signed-off-by: Sai Krishna Potthuri <lakshmi.sai.krishna.potthuri@xilinx.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210818072315.15149-1-shubhrajyoti.datta@xilinx.com
2021-09-15EDAC/mc: Replace strcpy(), sprintf() and snprintf() with strscpy() or ↵Len Baker1-24/+18
scnprintf() strcpy() performs no bounds checking on the destination buffer. This could result in linear overflows beyond the end of the buffer, leading to all kinds of misbehavior. The safe replacement is strscpy(). [1][2] However, to simplify and clarify the code, to concatenate labels use the scnprintf() function. This way it is not necessary to check the return value of strscpy() (-E2BIG if the parameter count is 0 or the src was truncated) since scnprintf() always returns the number of chars written into the buffer. This function always returns a nul-terminated string even if it needs to be truncated. While at it, fix all other broken string generation code that wrongly interprets snprintf()'s return code or just uses sprintf(), implement that using scnprintf() here too. Drop breaks in loops around scnprintf() as it is safe now to loop. Moreover, the check is not needed: for the case when the buffer is exhausted, len never gets zero because scnprintf() takes the full buffer length as input parameter, but excludes the trailing '\0' in its return code and thus, 1 is the minimum len. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [2] https://github.com/KSPP/linux/issues/88 [ rric: Replace snprintf() with scnprintf(), rework sprintf() user, drop breaks in loops around scnprintf(), introduce 'end' pointer to reduce pointer arithmetic, use prefix pattern for e->location, adjust subject and description ] Co-developed-by: Joe Perches <joe@perches.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Len Baker <len.baker@gmx.com> Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210903150539.7282-1-len.baker@gmx.com
2021-08-31Merge tag 'irq-core-2021-08-30' of ↵Linus Torvalds1-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates to the interrupt core and driver subsystems: Core changes: - The usual set of small fixes and improvements all over the place, but nothing stands out MSI changes: - Further consolidation of the PCI/MSI interrupt chip code - Make MSI sysfs code independent of PCI/MSI and expose the MSI interrupts of platform devices in the same way as PCI exposes them. Driver changes: - Support for ARM GICv3 EPPI partitions - Treewide conversion to generic_handle_domain_irq() for all chained interrupt controllers - Conversion to bitmap_zalloc() throughout the irq chip drivers - The usual set of small fixes and improvements" * tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) platform-msi: Add ABI to show msi_irqs of platform devices genirq/msi: Move MSI sysfs handling from PCI to MSI core genirq/cpuhotplug: Demote debug printk to KERN_DEBUG irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy irqdomain: Export irq_domain_disconnect_hierarchy() irqchip/gic-v3: Fix priority comparison when non-secure priorities are used irqchip/apple-aic: Fix irq_disable from within irq handlers pinctrl/rockchip: drop the gpio related codes gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type gpio/rockchip: support next version gpio controller gpio/rockchip: use struct rockchip_gpio_regs for gpio controller gpio/rockchip: add driver for rockchip gpio dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank pinctrl/rockchip: add pinctrl device to gpio bank struct pinctrl/rockchip: separate struct rockchip_pin_bank to a head file pinctrl/rockchip: always enable clock for gpio controller genirq: Fix kernel doc indentation EDAC/altera: Convert to generic_handle_domain_irq() powerpc: Bulk conversion to generic_handle_domain_irq() nios2: Bulk conversion to generic_handle_domain_irq() ...
2021-08-30Merge tag 'edac_updates_for_v5.15' of ↵Linus Torvalds8-38/+202
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: "The usual EDAC stuff which managed to trickle in for 5.15: - Add new HBM2 (High Bandwidth Memory Gen 2) type and add support for it to the Intel SKx drivers - Print additional useful per-channel error information on i10nm, like on SKL - Don't load the AMD EDAC decoder in virtual images - The usual round of fixes and cleanups" * tag 'edac_updates_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/i10nm: Retrieve and print retry_rd_err_log registers EDAC/i10nm: Fix NVDIMM detection EDAC/skx_common: Set the memory type correctly for HBM memory EDAC/altera: Skip defining unused structures for specific configs EDAC/mce_amd: Do not load edac_mce_amd module on guests EDAC/mc: Add new HBM2 memory type EDAC/amd64: Use DEVICE_ATTR helper macros
2021-08-23EDAC/i10nm: Retrieve and print retry_rd_err_log registersYouquan Song4-3/+157
Retrieve and print retry_rd_err_log registers like the earlier change: commit e80634a75aba ("EDAC, skx: Retrieve and print retry_rd_err_log registers") This is a little trickier than on Skylake because of potential interference with BIOS use of the same registers. The default behavior is to ignore these registers. A module parameter retry_rd_err_log(default=0) controls the mode of operation: - 0=off : Default. - 1=bios : Linux doesn't reset any control bits, but just reports values. This is "no harm" mode, but it may miss reporting some data. - 2=linux: Linux tries to take control and resets mode bits, clears valid/UC bits after reading. This should be more reliable (especially if BIOS interference is reduced by disabling eMCA reporting mode in BIOS setup). Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210818175701.1611513-3-tony.luck@intel.com