summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)AuthorFilesLines
2022-04-28Merge tag 'v5.15.36' into dev-5.15Joel Stanley1-5/+11
This is the 5.15.36 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-04-27EDAC/synopsys: Read the error count from the correct registerShubhrajyoti Datta1-5/+11
commit e2932d1f6f055b2af2114c7e64a26dc1b5593d0c upstream. Currently, the error count is read wrongly from the status register. Read the count from the proper error count register (ERRCNT). [ bp: Massage. ] Fixes: b500b4a029d5 ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220414102813.4468-1-shubhrajyoti.datta@xilinx.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08Merge tag 'v5.15.26' into dev-5.15Joel Stanley1-1/+1
This is the 5.15.26 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-02-23EDAC: Fix calculation of returned address and next offset in edac_align_ptr()Eliav Farber1-1/+1
commit f8efca92ae509c25e0a4bd5d0a86decea4f0c41e upstream. Do alignment logic properly and use the "ptr" local variable for calculating the remainder of the alignment. This became an issue because struct edac_mc_layer has a size that is not zero modulo eight, and the next offset that was prepared for the private data was unaligned, causing an alignment exception. The patch in Fixes: which broke this actually wanted to "what we actually care about is the alignment of the actual pointer that's about to be returned." But it didn't check that alignment. Use the correct variable "ptr" for that. [ bp: Massage commit message. ] Fixes: 8447c4d15e35 ("edac: Do alignment logic properly in edac_align_ptr()") Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220113100622.12783-2-farbere@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-17Merge tag 'v5.15.24' into dev-5.15Joel Stanley2-2/+2
This is the 5.15.24 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-02-08EDAC/xgene: Fix deferred probingSergey Shtylyov1-1/+1
commit dfd0dfb9a7cc04acf93435b440dd34c2ca7b4424 upstream. The driver overrides error codes returned by platform_get_irq_optional() to -EINVAL for some strange reason, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the proper error codes to platform driver code upwards. [ bp: Massage commit message. ] Fixes: 0d4429301c4a ("EDAC: Add APM X-Gene SoC EDAC driver") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220124185503.6720-3-s.shtylyov@omp.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08EDAC/altera: Fix deferred probingSergey Shtylyov1-1/+1
commit 279eb8575fdaa92c314a54c0d583c65e26229107 upstream. The driver overrides the error codes returned by platform_get_irq() to -ENODEV for some strange reason, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the proper error codes to platform driver code upwards. [ bp: Massage commit message. ] Fixes: 71bcada88b0f ("edac: altera: Add Altera SDRAM EDAC support") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220124185503.6720-2-s.shtylyov@omp.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-31Merge tag 'v5.15.18' into dev-5.15Joel Stanley1-2/+1
This is the 5.15.18 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-01-27EDAC/synopsys: Use the quirk for version instead of ddr versionDinh Nguyen1-2/+1
[ Upstream commit bd1d6da17c296bd005bfa656952710d256e77dd3 ] Version 2.40a supports DDR_ECC_INTR_SUPPORT for a quirk, so use that quirk to determine a call to setup_address_map(). Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-1-dinguyen@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-14Merge tag 'v5.15.14' into dev-5.15Joel Stanley1-0/+9
This is the 5.15.14 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2022-01-11EDAC/i10nm: Release mdev/mbase when failing to detect HBMQiuxu Zhuo1-0/+9
commit c370baa328022cbd46c59c821d1b467a97f047be upstream. On systems without HBM (High Bandwidth Memory) mdev/mbase are not released/unmapped. Add the code to release mdev/mbase when failing to detect HBM. [Tony: re-word commit message] Cc: <stable@vger.kernel.org> Fixes: c945088384d0 ("EDAC/i10nm: Add support for high bandwidth memory") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211224091126.1246-1-qiuxu.zhuo@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-26Merge tag 'v5.15.5' into dev-5.15Joel Stanley2-2/+22
This is the 5.15.5 stable release Signed-off-by: Joel Stanley <joel@jms.id.au>
2021-11-18EDAC/amd64: Handle three rank interleaving modeYazen Ghannam1-1/+21
[ Upstream commit 9f4873fb6af7966de8fcbd95c36b61351c1c4b1f ] AMD Rome systems and later support interleaving between three identical ranks within a channel. Check for this mode by counting the number of enabled chip selects and comparing their masks. If there are exactly three enabled chip selects and their masks are identical, then three rank interleaving is enabled. The size of a rank is determined from its mask value. However, three rank interleaving doesn't follow the method of swapping an interleave bit with the most significant bit. Rather, the interleave bit is flipped and the most significant bit remains the same. There is only a single interleave bit in this case. Account for this when determining the chip select size by keeping the most significant bit at its original value and ignoring any zero bits. This will return a full bitmask in [MSB:1]. Fixes: e53a3b267fb0 ("EDAC/amd64: Find Chip Select memory size using Address Mask") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211005154419.2060504-1-yazen.ghannam@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/HaswellEric Badger1-1/+1
commit 537bddd069c743759addf422d0b8f028ff0f8dbc upstream. The computation of TOHM is off by one bit. This missed bit results in too low a value for TOHM, which can cause errors in regular memory to incorrectly report: EDAC MC0: 1 CE Error at MMIOH area, on addr 0x000000207fffa680 on any memory Fixes: 50d1bb93672f ("sb_edac: add support for Haswell based systems") Cc: stable@vger.kernel.org Reported-by: Meeta Saggi <msaggi@purestorage.com> Signed-off-by: Eric Badger <ebadger@purestorage.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211010170127.848113-1-ebadger@purestorage.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-01edac: npcm: Add Nuvoton NPCM7xx EDAC driverGeorge Hung3-0/+429
Add support for the Nuvoton NPCM7xx SoC EDAC driver NPCM7xx ECC datasheet from nuvoton.israel-Poleg: "Cadence DDR Controller User’s Manual For DDR3 & DDR4 Memories" Tested: Forcing an ECC error event Write a value to the xor_check_bits parameter that will trigger an ECC event once that word is read For example, to force a single-bit correctable error on bit 0 of the user-word space shown, write 0x75 into that byte of the xor_check_bits parameter and then assert fwc (force write check) bit to 'b1' (mem base: 0xf0824000, xor_check_bits reg addr: 0x178) $ devmem 0xf0824178 32 0x7501 To force a double-bit un-correctable error for the user-word space, write 0x03 into that byte of the xor_check_bits parameter $ devmem 0xf0824178 32 0x301 OpenBMC-Staging-Count: 9 Signed-off-by: George Hung <george.hung@quantatw.com> Reviewed-by: Avi Fishman <avifishman70@gmail.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2021-10-14EDAC/armada-xp: Fix output of uncorrectable error counterHans Potsch1-1/+1
The number of correctable errors is displayed as uncorrectable errors because the "SBE" error count is passed to both calls of edac_mc_handle_error(). Pass the correct uncorrectable error count to the second edac_mc_handle_error() call when logging uncorrectable errors. [ bp: Massage commit message. ] Fixes: 7f6998a41257 ("ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC") Signed-off-by: Hans Potsch <hans.potsch@nokia.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20211006121332.58788-1-hans.potsch@nokia.com
2021-09-16EDAC/dmc520: Assign the proper type to dimm->edac_modeBorislav Petkov1-1/+1
dimm->edac_mode contains values of type enum edac_type - not the corresponding capability flags. Fix that. Fixes: 1088750d7839 ("EDAC: Add EDAC driver for DMC520") Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210916085258.7544-1-bp@alien8.de
2021-09-16EDAC/synopsys: Fix wrong value type assignment for edac_modeSai Krishna Potthuri1-1/+1
dimm->edac_mode contains values of type enum edac_type - not the corresponding capability flags. Fix that. Issue caught by Coverity check "enumerated type mixed with another type." [ bp: Rewrite commit message, add tags. ] Fixes: ae9b56e3996d ("EDAC, synps: Add EDAC support for zynq ddr ecc controller") Signed-off-by: Sai Krishna Potthuri <lakshmi.sai.krishna.potthuri@xilinx.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210818072315.15149-1-shubhrajyoti.datta@xilinx.com
2021-08-31Merge tag 'irq-core-2021-08-30' of ↵Linus Torvalds1-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates to the interrupt core and driver subsystems: Core changes: - The usual set of small fixes and improvements all over the place, but nothing stands out MSI changes: - Further consolidation of the PCI/MSI interrupt chip code - Make MSI sysfs code independent of PCI/MSI and expose the MSI interrupts of platform devices in the same way as PCI exposes them. Driver changes: - Support for ARM GICv3 EPPI partitions - Treewide conversion to generic_handle_domain_irq() for all chained interrupt controllers - Conversion to bitmap_zalloc() throughout the irq chip drivers - The usual set of small fixes and improvements" * tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) platform-msi: Add ABI to show msi_irqs of platform devices genirq/msi: Move MSI sysfs handling from PCI to MSI core genirq/cpuhotplug: Demote debug printk to KERN_DEBUG irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy irqdomain: Export irq_domain_disconnect_hierarchy() irqchip/gic-v3: Fix priority comparison when non-secure priorities are used irqchip/apple-aic: Fix irq_disable from within irq handlers pinctrl/rockchip: drop the gpio related codes gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type gpio/rockchip: support next version gpio controller gpio/rockchip: use struct rockchip_gpio_regs for gpio controller gpio/rockchip: add driver for rockchip gpio dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank pinctrl/rockchip: add pinctrl device to gpio bank struct pinctrl/rockchip: separate struct rockchip_pin_bank to a head file pinctrl/rockchip: always enable clock for gpio controller genirq: Fix kernel doc indentation EDAC/altera: Convert to generic_handle_domain_irq() powerpc: Bulk conversion to generic_handle_domain_irq() nios2: Bulk conversion to generic_handle_domain_irq() ...
2021-08-30Merge tag 'edac_updates_for_v5.15' of ↵Linus Torvalds8-38/+202
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: "The usual EDAC stuff which managed to trickle in for 5.15: - Add new HBM2 (High Bandwidth Memory Gen 2) type and add support for it to the Intel SKx drivers - Print additional useful per-channel error information on i10nm, like on SKL - Don't load the AMD EDAC decoder in virtual images - The usual round of fixes and cleanups" * tag 'edac_updates_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/i10nm: Retrieve and print retry_rd_err_log registers EDAC/i10nm: Fix NVDIMM detection EDAC/skx_common: Set the memory type correctly for HBM memory EDAC/altera: Skip defining unused structures for specific configs EDAC/mce_amd: Do not load edac_mce_amd module on guests EDAC/mc: Add new HBM2 memory type EDAC/amd64: Use DEVICE_ATTR helper macros
2021-08-23EDAC/i10nm: Retrieve and print retry_rd_err_log registersYouquan Song4-3/+157
Retrieve and print retry_rd_err_log registers like the earlier change: commit e80634a75aba ("EDAC, skx: Retrieve and print retry_rd_err_log registers") This is a little trickier than on Skylake because of potential interference with BIOS use of the same registers. The default behavior is to ignore these registers. A module parameter retry_rd_err_log(default=0) controls the mode of operation: - 0=off : Default. - 1=bios : Linux doesn't reset any control bits, but just reports values. This is "no harm" mode, but it may miss reporting some data. - 2=linux: Linux tries to take control and resets mode bits, clears valid/UC bits after reading. This should be more reliable (especially if BIOS interference is reduced by disabling eMCA reporting mode in BIOS setup). Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210818175701.1611513-3-tony.luck@intel.com
2021-08-23EDAC/i10nm: Fix NVDIMM detectionQiuxu Zhuo1-3/+3
MCDDRCFG is a per-channel register and uses bit{0,1} to indicate the NVDIMM presence on DIMM slot{0,1}. Current i10nm_edac driver wrongly uses MCDDRCFG as per-DIMM register and fails to detect the NVDIMM. Fix it by reading MCDDRCFG as per-channel register and using its bit{0,1} to check whether the NVDIMM is populated on DIMM slot{0,1}. Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Reported-by: Fan Du <fan.du@intel.com> Tested-by: Wen Jin <wen.jin@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210818175701.1611513-2-tony.luck@intel.com
2021-08-23EDAC/skx_common: Set the memory type correctly for HBM memoryQiuxu Zhuo1-1/+4
Set the memory type to MEM_HBM2 if it's managed by the HBM2 memory controller. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210720163009.GA1417532@agluck-desk2.amr.corp.intel.com
2021-08-16EDAC/altera: Skip defining unused structures for specific configsKrzysztof Kozlowski1-18/+26
The Altera EDAC driver has several features conditionally built depending on Kconfig options. The edac_device_prv_data structures are conditionally used in of_device_id tables. They reference other functions and structures which can be defined as __maybe_unused. Silence build warnings like: drivers/edac/altera_edac.c:643:37: warning: ‘altr_edac_device_inject_fops’ defined but not used [-Wunused-const-variable=] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Link: https://lkml.kernel.org/r/20210601092704.203555-1-krzysztof.kozlowski@canonical.com
2021-08-12EDAC/altera: Convert to generic_handle_domain_irq()Marc Zyngier1-5/+2
Replace generic_handle_irq(irq_linear_revmap()) with a single call to generic_handle_domain_irq(). Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-09EDAC/mce_amd: Do not load edac_mce_amd module on guestsSmita Koralahalli1-0/+3
Hypervisors likely do not expose the SMCA feature to the guest and loading this module leads to false warnings. This module should not be loaded in guests to begin with, but people tend to do so, especially when testing kernels in VMs. And then they complain about those false warnings. Do the practical thing and do not load this module when running as a guest to avoid all that complaining. [ bp: Rewrite commit message. ] Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Link: https://lkml.kernel.org/r/20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com
2021-07-20EDAC/mc: Add new HBM2 memory typeNaveen Krishna Chatradhi1-0/+1
Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]' for HBM2 (High Bandwidth Memory Gen 2) new memory type. Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210630152828.162659-4-nchatrad@amd.com
2021-07-15EDAC/igen6: fix core dependency AGAINRandy Dunlap1-1/+1
My previous patch had a typo/thinko which prevents this driver from being enabled: change X64_64 to X86_64. Fixes: 0a9ece9ba154 ("EDAC/igen6: fix core dependency") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac@vger.kernel.org Cc: bowsingbetee <bowsingbetee@protonmail.com> Cc: stable@vger.kernel.org Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-13EDAC/amd64: Use DEVICE_ATTR helper macrosDwaipayan Ray1-13/+8
Instead of "open coding" DEVICE_ATTR, use the corresponding helper macros DEVICE_ATTR_{RW,RO,WO} in amd64_edac.c Some function names needed to be changed to match the device conventions <foo>_show and <foo>_store, but the functionality itself is unchanged. The devices using EDAC_DCT_ATTR_SHOW() are left unchanged. Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210713065130.2151-1-dwaipayanray1@gmail.com
2021-07-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+1
Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...
2021-07-01kernel.h: split out panic and oops helpersAndy Shevchenko1-0/+1
kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out panic and oops helpers. There are several purposes of doing this: - dropping dependency in bug.h - dropping a loop by moving out panic_notifier.h - unload kernel.h from something which has its own domain At the same time convert users tree-wide to use new headers, although for the time being include new header back to kernel.h to avoid twisted indirected includes for existing users. [akpm@linux-foundation.org: thread_info.h needs limits.h] [andriy.shevchenko@linux.intel.com: ia64 fix] Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Co-developed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Sebastian Reichel <sre@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30Merge tag 'edac_updates_for_v5.14' of ↵Linus Torvalds11-60/+625
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Tony Luck: "Various fixes and support for new CPUs: - Clean up error messages from thunderx_edac - Add MODULE_DEVICE_TABLE to ti_edac so it will autoload - Use %pR to print resources in aspeed_edac - Add Yazen Ghannam as MAINTAINER for AMD edac drivers - Fix Ice Lake and Sapphire Rapids drivers to report correct "near" or "far" device for errors in 2LM configurations - Add support of on package high bandwidth memory in Sapphire Rapids - New CPU support for three CPUs supporting in-band ECC (IOT SKUs for ICL-NNPI, Tiger Lake and Alder Lake) - Don't even try to load Intel EDAC drivers when running as a guest - Fix Kconfig dependency on X86_MCE_INTEL for EDAC_IGEN6" * tag 'edac_updates_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/igen6: fix core dependency EDAC/Intel: Do not load EDAC driver when running as a guest EDAC/igen6: Add Intel Alder Lake SoC support EDAC/igen6: Add Intel Tiger Lake SoC support EDAC/igen6: Add Intel ICL-NNPI SoC support EDAC/i10nm: Add support for high bandwidth memory EDAC/i10nm: Add detection of memory levels for ICX/SPR servers EDAC/skx_common: Add new ADXL components for 2-level memory MAINTAINERS: Make Yazen Ghannam maintainer for EDAC-AMD64 EDAC/aspeed: Use proper format string for printing resource EDAC/ti: Add missing MODULE_DEVICE_TABLE EDAC/thunderx: Remove irrelevant variable from error messages
2021-06-21EDAC/igen6: fix core dependencyRandy Dunlap1-1/+2
igen6_edac needs mce_register()/unregister() functions, so it should depend on X86_MCE (or X86_MCE_INTEL). That change prevents these build errors: ld: drivers/edac/igen6_edac.o: in function `igen6_remove': igen6_edac.c:(.text+0x494): undefined reference to `mce_unregister_decode_chain' ld: drivers/edac/igen6_edac.o: in function `igen6_probe': igen6_edac.c:(.text+0xf5b): undefined reference to `mce_register_decode_chain' Fixes: 10590a9d4f23e ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210619160203.2026-1-rdunlap@infradead.org
2021-06-18EDAC/Intel: Do not load EDAC driver when running as a guestLuck, Tony4-0/+12
There's little to no point in loading an EDAC driver running in a guest: 1) The CPU model reported by CPUID may not represent actual h/w 2) The hypervisor likely does not pass in access to memory controller devices 3) Hypervisors generally do not pass corrected error details to guests Add a check in each of the Intel EDAC drivers for X86_FEATURE_HYPERVISOR and simply return -ENODEV in the init routine. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210615174419.GA1087688@agluck-desk2.amr.corp.intel.com
2021-06-18EDAC/igen6: Add Intel Alder Lake SoC supportQiuxu Zhuo1-11/+73
Alder Lake SoC shares the same memory controller and In-Band ECC (IBECC) IP with Tiger Lake SoC. Like Tiger Lake, it also has two memory controllers each associated one IBECC instance. The minor differences include the MMIO offset of each memory controller and the type of memory error address logged in the IBECC. So add Alder Lake compute die IDs, adjust the MMIO offset for each memory controller and handle the type of memory error address logged in the IBECC for Alder Lake EDAC support. Tested-by: Vrukesh V Panse <vrukesh.v.panse@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-7-tony.luck@intel.com
2021-06-18EDAC/igen6: Add Intel Tiger Lake SoC supportQiuxu Zhuo1-20/+253
Tiger Lake SoC shares the same memory controller and In-Band ECC (IBECC) IP with Elkhart Lake SoC. The main differences are that Tiger Lake has two memory controllers each associated with one IBECC and uses Machine Check for the memory error notification. So add Tiger Lake compute die IDs, MCE decoding chain registration, and memory slice decoding for Tiger Lake EDAC support. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-6-tony.luck@intel.com
2021-06-18EDAC/igen6: Add Intel ICL-NNPI SoC supportQiuxu Zhuo1-0/+29
The Ice Lake Neural Network Processor for Deep Learning Inference (ICL-NNPI) SoC shares the same memory controller and In-Band ECC with Elkhart Lake SoC. Add the ICL-NNPI compute die IDs for EDAC support. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-5-tony.luck@intel.com
2021-06-18EDAC/i10nm: Add support for high bandwidth memoryQiuxu Zhuo3-19/+148
A future Xeon processor will include in-package HBM (high bandwidth memory). The in-package HBM memory controller shares the same architecture with the regular DDR memory controller. Add the HBM memory controller devices for EDAC support. Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-4-tony.luck@intel.com
2021-06-18EDAC/i10nm: Add detection of memory levels for ICX/SPR serversQiuxu Zhuo2-0/+42
Current i10nm_edac driver is only for system configured in 1-level memory. If the system is configured in 2-level memory, the driver doesn't report the 1st level memory DIMM for the error address, even if the error occurs in the 1st level memory. Both Ice Lake servers and Sapphire Rapids servers can be configured in 2-level memory. Add detection of memory levels to i10nm_edac for the two kinds of servers so that the driver can report the 2nd level memory DIMM or the 1st level memory DIMM according to error source. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-3-tony.luck@intel.com
2021-06-18EDAC/skx_common: Add new ADXL components for 2-level memoryQiuxu Zhuo2-11/+67
Some Intel servers may configure memory in 2 levels, using fast "near" memory (e.g. DDR) as a cache for larger, slower, "far" memory (e.g. 3D X-point). In these configurations the BIOS ADXL address translation for an address in a 2-level memory range will provide details of both the "near" and far components. Current exported ADXL components are only for 1-level memory system or for 2nd level memory of 2-level memory system. So add new ADXL components for 1st level memory of 2-level memory system to fully support 2-level memory system and the detection of memory error source(1st level memory or 2nd level memory). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-2-tony.luck@intel.com
2021-06-04EDAC/mce_amd: Fix typo "FIfo" -> "Fifo"Colin Ian King1-1/+1
There is an uppercase letter I in one of the MCE error descriptions instead of a lowercase one. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20210603103349.79117-1-colin.king@canonical.com
2021-05-27x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank typesMuralidhara M K1-0/+70
Add the (HWID, MCATYPE) tuples and names for new SMCA bank types. Also, add their respective error descriptions to the MCE decoding module edac_mce_amd. Also while at it, optimize the string names for some SMCA banks. [ bp: Drop repeated comments, explain why UMC_V2 is a separate entry. ] Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20210526164601.66228-1-nchatrad@amd.com
2021-05-18EDAC/aspeed: Use proper format string for printing resourceArnd Bergmann1-2/+2
On ARMv7, resource_size_t can be 64-bit, which breaks printing it as %x: drivers/edac/aspeed_edac.c: In function 'init_csrows': drivers/edac/aspeed_edac.c:257:28: error: format '%x' expects argument of \ type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long \ long unsigned int'} [-Werror=format=] 257 | dev_dbg(mci->pdev, "dt: /memory node resources: first page \ r.start=0x%x, resource_size=0x%x, PAGE_SHIFT macro=0x%x\n", Use the special %pR format string to pretty-print the entire resource instead. Fixes: edfc2d73ca45 ("EDAC/aspeed: Add support for AST2400 and AST2600") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Link: https://lkml.kernel.org/r/20210421135500.3518661-1-arnd@kernel.org
2021-05-14EDAC/ti: Add missing MODULE_DEVICE_TABLEBixuan Cui1-0/+1
The module misses MODULE_DEVICE_TABLE() for of_device_id tables and thus never autoloads on ID matches. Add the missing declaration. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tero Kristo <kristo@kernel.org> Link: https://lkml.kernel.org/r/20210512033727.26701-1-cuibixuan@huawei.com
2021-05-10EDAC/thunderx: Remove irrelevant variable from error messagesChristophe JAILLET1-2/+2
'ret' is irrelevant (it is 0) for both dev_err() calls, so just remove it from the error message. [ bp: Massage commit message. ] Fixes: 41003396f932 ("EDAC, thunderx: Add Cavium ThunderX EDAC driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/0c046ef5cfb367a3f707ef4270e21a2bcbf44952.1620280098.git.christophe.jaillet@wanadoo.fr
2021-05-10x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFGBrijesh Singh1-1/+1
The SYSCFG MSR continued being updated beyond the K8 family; drop the K8 name from it. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/20210427111636.1207-4-brijesh.singh@amd.com
2021-03-23EDAC: altera: merge ARCH_SOCFPGA and ARCH_STRATIX10Krzysztof Kozlowski2-7/+12
Simplify 32-bit and 64-bit Intel SoCFPGA Kconfig options by having only one for both of them. This the common practice for other platforms. Additionally, the ARCH_SOCFPGA is too generic as SoCFPGA designs come from multiple vendors. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2021-02-15Merge branch 'edac-misc' into edac-updates-for-v5.12Borislav Petkov2-2/+2
2021-01-22EDAC/amd64: Issue probing messages only on properly detected hardwareBorislav Petkov1-7/+7
amd64_edac was converted to CPU family autoprobing (from PCI device IDs) to not have to add a new PCI device ID each time a new platform is shipped but to support the whole family out-of-the-box. However, this caused a lot of noise in dmesg even when the machine doesn't have ECC DIMMs or ECC has been disabled in the BIOS: EDAC MC: Ver: 3.0.0 EDAC amd64: F17h detected (node 0). EDAC amd64: Node 0: DRAM ECC disabled. EDAC amd64: F17h detected (node 1). EDAC amd64: Node 1: DRAM ECC disabled. EDAC amd64: F17h detected (node 2). EDAC amd64: Node 2: DRAM ECC disabled. EDAC amd64: F17h detected (node 3). EDAC amd64: Node 3: DRAM ECC disabled. EDAC amd64: F17h detected (node 4). EDAC amd64: Node 4: DRAM ECC disabled. EDAC amd64: F17h detected (node 5). EDAC amd64: Node 5: DRAM ECC disabled. EDAC amd64: F17h detected (node 6). EDAC amd64: Node 6: DRAM ECC disabled. EDAC amd64: F17h detected (node 7). EDAC amd64: Node 7: DRAM ECC disabled. or even $ grep EDAC dmesg.log | sed 's/\[.*\] //' | sort | uniq -c 128 EDAC amd64: F17h detected (node 0). 128 EDAC amd64: Node 0: DRAM ECC disabled. 1 EDAC MC: Ver: 3.0.0 on a big machine. Yap, that's once per CPU for 128 of them. So move the init messages after all probing has succeeded to avoid unnecessary spew in dmesg. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210119164141.17417-1-bp@alien8.de
2021-01-19EDAC/xgene: Do not print a failure message to get an IRQ twiceMenglong Dong1-1/+1
Coccinelle reports a redundant error print in xgene_edac_probe() because platform_get_irq() will already print an error message when it is unable to get an IRQ. Use platform_get_irq_optional() instead which avoids the error message and keep the driver-specific one. [ bp: Sanitize commit message. ] Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Robert Richter <rric@kernel.org> Link: https://lkml.kernel.org/r/20210112103540.7818-1-dong.menglong@zte.com.cn