summaryrefslogtreecommitdiff
path: root/drivers/edac/amd64_edac.c
AgeCommit message (Collapse)AuthorFilesLines
2016-12-04EDAC, amd64: Fix improper return valuePan Bian1-1/+1
When the call to zalloc_cpumask_var() fails, returning "false" seems improper. The real value of macro "false" is 0, and 0 means no error. Return -ENOMEM instead. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189071 Signed-off-by: Pan Bian <bianpan2016@163.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1480831638-5361-1-git-send-email-bianpan201604@163.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-12-01EDAC, amd64: Improve amd64-specific printing macrosBorislav Petkov1-10/+6
Prefix the warn and error macros with the respective string so that callers don't have to say "Error" or "Warning". We save us string length this way in the actual calls. While at it, shorten the calls in reserve_mc_sibling_devs(). Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
2016-11-29EDAC, amd64: Autoload amd64_edac_mod on Fam17h systemsYazen Ghannam1-0/+1
Add Fam17h to the list of families to autoload amd64_edac_mod. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-18-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29EDAC, amd64: Define and register UMC error decode functionYazen Ghannam1-3/+86
How we need to decode UMC errors is different from how we decode bus errors, so let's define a new function for this. We also need a way to determine the UMC channel since we're not guaranteed that there is a fixed relation between channel and MCA bank. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1480359593-80369-1-git-send-email-Yazen.Ghannam@amd.com [ Fold in decode_synd_reg(), simplify. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29EDAC, amd64: Determine EDAC capabilities on Fam17h systemsYazen Ghannam1-6/+24
We need to determine the EDAC capabilities from all UMCs on the node. We should only check UMCs that are enabled and make sure they all agree. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-15-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29EDAC, amd64: Determine EDAC MC capabilities on Fam17hYazen Ghannam1-18/+49
The UMCs on Fam17h are independent memory controllers so we need to read the capabilities from all UMCs and make sure they agree. Once we determine what capabilities are available we should save them for convenience. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1480431116-94683-1-git-send-email-Yazen.Ghannam@amd.com [ Simplify f17h_determine_edac_ctl_cap(), preinit edac_mode in init_csrows(). ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29EDAC, amd64: Add Fam17h debug outputYazen Ghannam1-8/+88
Read a few more UMC registers and provide debug output in order to be as similar as possible to older AMD systems. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1480344621-14966-1-git-send-email-Yazen.Ghannam@amd.com [ Remove unneeded K8 check and comments, fixup others. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28EDAC, amd64: Add Fam17h scrubber supportYazen Ghannam1-5/+38
Fam17h has new register offsets and fields for setting up the DRAM scrubber so add support for this. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-17-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28EDAC, amd64: Read MC registers on AMD Fam17hYazen Ghannam1-39/+133
Fam17h has a different set of registers and bitfields. Most of these registers are read through SMN (System Management Network) rather than PCI config space. Also, the derivation of various values is now different. Update amd64_edac to read the appropriate registers and extract the correct values for Fam17h. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-12-git-send-email-Yazen.Ghannam@amd.com [ Save us the indentation level in read_mc_regs(), add defines ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28EDAC, amd64: Reserve correct PCI devices on AMD Fam17hYazen Ghannam1-18/+69
Fam17h needs PCI device functions 0 and 6 instead of 1 and 2 as on older systems. Update struct amd64_pvt to hold the new functions and reserve them if on Fam17h. Also, allocate an array of UMC structs within our newly allocated PVT struct. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-11-git-send-email-Yazen.Ghannam@amd.com [ init_one_instance() error handling, shorten lines, unbreak >80 cols lines. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-24EDAC, amd64: Add AMD Fam17h family type and opsYazen Ghannam1-0/+44
Add a family type and associated ops for Fam17h. Define a struct to hold all the UMC registers that we need. Make this a part of struct amd64_pvt in order to maximize code reuse in the rest of the driver. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-24EDAC, amd64: Extend ecc_enabled() to Fam17hYazen Ghannam1-10/+40
Update the ecc_enabled() function to work on Fam17h. This entails reading a different set of registers and using the SMN (System Management Network) rather than PCI devices. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com [ Fixup ecc_en assignment and get_umc_base(). ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-23EDAC, amd64: Don't force-enable ECC checking on newer systemsYazen Ghannam1-3/+8
It's not recommended for the OS to try and force-enable ECC checking. This is considered a firmware task since it includes memory training, etc, so don't change ECC settings on Fam17h or newer systems and inform the user. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479850816-1595-1-git-send-email-Yazen.Ghannam@amd.com [ Put the "forcing" message in an else branch. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21EDAC, amd64: Add Deferred Error typeYazen Ghannam1-0/+2
Currently, deferred errors are classified as correctable in EDAC. Add a new error type for deferred errors so that they are correctly reported to the user. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21EDAC, amd64: Rename __log_bus_error() to be more specificYazen Ghannam1-2/+2
We only use __log_bus_error() to log DRAM ECC errors, so let's change the name to reflect this. We'll also use this function for DRAM ECC errors on Fam17h, but we'll call it from a different function than decode_bus_error(). Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-6-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21EDAC, amd64: Change target of pci_name from F2 to F3Yazen Ghannam1-1/+1
AMD Fam17h will not be using PCI function 2 for EDAC, but will continue to use function 3. So let's get the name of F3 instead of F2 to support Fam17h and previous families. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-5-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-21EDAC, amd64: Autoload module using x86_cpu_idYazen Ghannam1-0/+9
Reinstate driver autoloading now that PCI dependency is gone. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1473984445-1726-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-08-08EDAC, amd64: Fix channel decode on Fam15hMod60h systemsYazen Ghannam1-3/+12
Fam15hMod60h systems are using the channel decode of Fam15hMod30h which gives incorrect results. Fam15hMod60h systems should use the generic channel decode method plus a couple more cases. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1470236355-30039-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-16EDAC, amd64_edac: Init opstate at the proper time during initBorislav Petkov1-2/+2
It is useless to do it if we're loaded on unsupported hardware so do that only after we have detected at least 1 supported AMD northbridge. Signed-off-by: Borislav Petkov <bp@suse.de>
2016-05-18Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits) gitignore: fix wording mfd: ab8500-debugfs: fix "between" in printk memstick: trivial fix of spelling mistake on management cpupowerutils: bench: fix "average" treewide: Fix typos in printk IB/mlx4: printk fix pinctrl: sirf/atlas7: fix printk spelling serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/ w1: comment spelling s/minmum/minimum/ Blackfin: comment spelling s/divsor/divisor/ metag: Fix misspellings in comments. ia64: Fix misspellings in comments. hexagon: Fix misspellings in comments. tools/perf: Fix misspellings in comments. cris: Fix misspellings in comments. c6x: Fix misspellings in comments. blackfin: Fix misspelling of 'register' in comment. avr32: Fix misspelling of 'definitions' in comment. treewide: Fix typos in printk Doc: treewide : Fix typos in DocBook/filesystem.xml ...
2016-05-09EDAC, amd64_edac: Drop pci_register_driver() useBorislav Petkov1-82/+43
- remove homegrown instances counting. - take F3 PCI device from amd_nb caching instead of F2 which was used with the PCI core. With those changes, the driver doesn't need to register a PCI driver and relies on the northbridges caching which we do anyway on AMD. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com>
2016-04-27EDAC, amd64_edac: Issue driver banner only on successBorislav Petkov1-2/+2
... and don't mislead users into thinking that the driver has loaded successfully. Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-18treewide: Fix typos in printkMasanari Iida1-1/+1
This patch fix spelling typos found in printk within various part of the kernel sources. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-01-25EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr()Dan Carpenter1-1/+1
dct_sel_base_off is declared as a u64 but we're only using the lower 32 bits because of a shift wrapping bug. This can possibly truncate the upper 16 bits of DctSelBaseOffset[47:26], causing us to misdecode the CS row. Fixes: c8e518d5673d ('amd64_edac: Sanitize f10_get_base_addr_offset') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20160120095451.GB19898@mwanda Signed-off-by: Borislav Petkov <bp@suse.de>
2015-11-04Merge branch 'ras-core-for-linus' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS changes from Ingo Molnar: "The main system reliability related changes were from x86, but also some generic RAS changes: - AMD MCE error injection subsystem enhancements. (Aravind Gopalakrishnan) - Fix MCE and CPU hotplug interaction bug. (Ashok Raj) - kcrash bootup robustness fix. (Baoquan He) - kcrash cleanups. (Borislav Petkov) - x86 microcode driver rework: simplify it by unmodularizing it and other cleanups. (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init() x86/mce: Add a Scalable MCA vendor flags bit MAINTAINERS: Unify the microcode driver section x86/microcode/intel: Move #ifdef DEBUG inside the function x86/microcode/amd: Remove maintainers from comments x86/microcode: Remove modularization leftovers x86/microcode: Merge the early microcode loader x86/microcode: Unmodularize the microcode driver x86/mce: Fix thermal throttling reporting after kexec kexec/crash: Say which char is the unrecognized x86/setup/crash: Check memblock_reserve() retval x86/setup/crash: Cleanup some more x86/setup/crash: Remove alignment variable x86/setup: Cleanup crashkernel reservation functions x86/amd_nb, EDAC: Rename amd_get_node_id() x86/setup: Do not reserve crashkernel high memory if low reservation failed x86/microcode/amd: Do not overwrite final patch levels x86/microcode/amd: Extract current patch level read to a function x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts ...
2015-10-21x86/amd_nb, EDAC: Rename amd_get_node_id()Aravind Gopalakrishnan1-3/+3
This function doesn't give us the "Node ID" as the function name suggests. Rather, it receives a PCI device as argument, checks the available F3 PCI device IDs in the system and returns the index of the matching Bus/Device IDs. Rename it to amd_pci_dev_to_node_id(). No functional change is introduced. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-29EDAC, amd64_edac: Extend scrub rate support to F15hM60hAravind Gopalakrishnan1-10/+25
The scrub rate control register has moved to function 2 in PCI config space and is at a different offset on family 0x15, models 0x60 and later. The minimum recommended scrub rate has also changed. (Refer to D18F2x1c9_dct[1:0][DramScrub] in Fam15hM60h BKDG). Adjust set_scrub_rate() and get_scrub_rate() functions to accommodate this. Tested on F15hM60h, Fam15h, models 00h-0fh and Fam10h systems. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1443440593-2316-2-git-send-email-Aravind.Gopalakrishnan@amd.com [ Cleanup conditionals. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-20amd64_edac: enforce synchronous probeLuis R. Rodriguez1-0/+1
While testing asynchronous PCI probe on this driver I noticed it failed because the driver checks if any of the PCI devices have been bound to the driver after registering it, which obviously does not work if probing is asynchronous. While there are patches and discussions on how the driver should behave are ongoing, let's enforce synchronous probe for this driver for now. Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-23EDAC, amd64_edac: Get rid of per-node driver instancesBorislav Petkov1-20/+13
... and do the proper thing using EDAC core facilities. Cc: Daniel J Blueman <daniel@numascale.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: amd64: Use static attribute groupsTakashi Iwai1-36/+11
Instead of calling device_create_file() and device_remove_file() manually, pass the static attribute groups with the new edac_mc_add_mc_with_groups(). The conditional creation of inject sysfs files is done by a proper is_visible callback. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-4-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-17EDAC, amd64_edac: Prevent OOPS with >16 memory controllersDaniel J Blueman1-2/+8
When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16), the kernel fatally dereferences unallocated structures, see splat below; this occurs on at least NumaConnect systems. Fix by checking if a memory controller info structure was found. BUG: unable to handle kernel NULL pointer dereference at 0000000000000320 IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1 Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b 01/28/2015 task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000 RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297 RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6 RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022 R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008 R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000 FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0 Stack: 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010 Call Trace: <IRQ> ? vprintk_default ? printk amd_decode_mce notifier_call_chain atomic_notifier_call_chain mce_log machine_check_poll mce_timer_fn ? mce_cpu_restart call_timer_fn.isra.29 run_timer_softirq __do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt <EOI> ? down_read_trylock __do_page_fault ? __schedule do_page_fault page_fault Signed-off-by: Daniel J Blueman <daniel@numascale.com> Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com Cc: stable@vger.kernel.org [ Boris: massage commit message ] Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-05amd64_edac: Build module on x86-32Tomasz Pala1-0/+5
By popular demand, enable amd64_edac on 32-bit too. Boris: - update Kconfig text. - add a warning on load which states that 32-bit configurations are unsupported. Signed-off-by: Tomasz Pala <gotar@polanet.pl> Link: http://lkml.kernel.org/r/20141102102212.GA7034@polanet.pl Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-30amd64_edac: Add F15h M60h supportAravind Gopalakrishnan1-79/+176
This patch adds support for ECC error decoding for F15h M60h processor. Aside from the usual changes, the patch adds support for some new features in the processor: - DDR4(unbuffered, registered); LRDIMM DDR3 support - relevant debug messages have been modified/added to report these memory types - new dbam_to_cs mappers - if (F15h M60h && LRDIMM); we need a 'multiplier' value to find cs_size. This multiplier value is obtained from the per-dimm DCSM register. So, change the interface to accept a 'cs_mask_nr' value to facilitate this calculation - switch-casing determine_memory_type() - done to cleanse the function of too many if-else statements and improve readability - This is now called early in read_mc_regs() to cache dram_type Misc cleanup: - amd64_pci_table[] is condensed by using PCI_VDEVICE macro. Testing details: Tested the patch by injecting 'ECC' type errors using mce_amd_inj and error decoding works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com [ Boris: determine_memory_type() cleanups ] Signed-off-by: Borislav Petkov <bp@suse.de>
2014-09-23amd64_edac: Modify usage of amd64_read_dct_pci_cfg()Aravind Gopalakrishnan1-66/+80
Rationale behind this change: - F2x1xx addresses were stopped from being mapped explicitly to DCT1 from F15h (OR) onwards. They use _dct[0:1] mechanism to access the registers. So we should move away from using address ranges to select DCT for these families. - On newer processors, the address ranges used to indicate DCT1 (0x140, 0x1a0) have different meanings than what is assumed currently. Changes introduced: - amd64_read_dct_pci_cfg() now takes in dct value and uses it for 'selecting the dct' - Update usage of the function. Keep in mind that different families have specific handling requirements - Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different from amd64_read_pci_cfg() - Move the k8 specific check to amd64_read_pci_cfg - Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg() - Remove now needless .read_dct_pci_cfg Testing: - Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG' and mce_amd_inj - driver obtains info from F2x registers and caches it in pvt structures correctly - ECC decoding works fine Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-27amd64_edac: Add support for newer F16h modelsAravind Gopalakrishnan1-0/+24
Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC turned on using mce_amd_inj module and the patch works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com Tested-by: Arindam Nath <Arindam.Nath@amd.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-07amd64_edac: Fix logic to determine channel for F15 M30h processorsAravind Gopalakrishnan1-3/+11
Update current channel selection logic to include F15h, M30h memory controllers. Refer F15 M30h BKDG D18F2x110[7:6] (DRAM Controller Select Low) (Link:http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf) Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1390338216-3873-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15amd64_edac: Remove "amd64" prefix from static functionsBorislav Petkov1-62/+56
No need for the namespace tagging there. Cleanup setup_pci_device while at it. Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15amd64_edac: Simplify code around decode_bus_errorBorislav Petkov1-9/+4
Drop wrapper function and prefixes. Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15amd64_edac: Mark amd64_decode_bus_error as staticRashika Kheria1-1/+1
This patch marks the function amd64_decode_bus_error() as static because it is not used outside of amd64_edac.c. It also eliminates the following warning: drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06EDAC: Remove DEFINE_PCI_DEVICE_TABLE macroJingoo Han1-1/+1
Currently, there is no other bus that has something like this macro for their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com [ Boris: swap commit message with better one. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06amd64_edac: Fix condition to verify max channels allowed for F15 M30hAravind Gopalakrishnan1-1/+1
The value returned from 'f15_m30h_determine_channel' will always be 0x3 max. The condition (channel > 4 || channel < 0) works as hardware never returns a value of 4, but it leads to static checker analysis errors like http://marc.info/?l=linux-edac&m=138607615131951&w=2. Fix that. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain [ Boris: massage commit message a bit. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-10-22bitops: Introduce a more generic BITMASK macroChen, Gong1-22/+24
GENMASK is used to create a contiguous bitmask([hi:lo]). It is implemented twice in current kernel. One is in EDAC driver, the other is in SiS/XGI FB driver. Move it to a more generic place for other usage. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Winischhofer <thomas@winischhofer.net> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-08-27amd64_edac: Fix incorrect wraparoundsAravind Gopalakrishnan1-5/+6
dct_base and dct_limit obtain 32 bit register values when they read their respective pci config space registers. A left shift beyond 32 bits will cause them to wrap around. Similar case for chan_addr as can be seen from the bug report (link below). In the patch, we rectify this by casting chan_addr to u64 and by comparing dct_base and dct_limit against properly shifted sys_addr in order to compare the correct bits. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountain Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-27amd64_edac: Correct erratum 505 rangeBorislav Petkov1-4/+4
Basically we want to cover all 0x0-0xf models, i.e. Orochi and later. Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-15Merge tag 'edac_for_3.12' of ↵Ingo Molnar1-1/+8
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras Pull RAS/EDAC updates from Boris Petkov: "An amd64_edac fix for single channel configurations + trivial cleanups courtesy of Jingoo Han." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12amd64_edac: Get rid of boot_cpu_data accessesBorislav Petkov1-47/+43
Now that we cache (family, model, stepping) locally, use them instead of boot_cpu_data. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12amd64_edac: Add ECC decoding support for newer F15h modelsAravind Gopalakrishnan1-30/+216
On newer models, support has been included for upto 4 DCT's, however, only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10). Also, the routing DRAM Requests algorithm is different for F15h M30h. Thus it is cleaner to use a brand new function rather than adding quirks to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM Routing Requests" in the BKDG for further info. Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and verified to be functionally correct. While at it, verify if erratum workarounds for E505 and E637 still hold. From email conversations within AMD, the current status of the errata is: * Erratum 505: fixed in model 0x1, stepping 0x1 and later. * Erratum 637: not fixed. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Cleanups, corrections ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-07-29amd64_edac: Fix single-channel setupsBorislav Petkov1-1/+8
It can happen that configurations are running in a single-channel mode even with a dual-channel memory controller, by, say, putting the DIMMs only on the one channel and leaving the other empty. This causes a problem in init_csrows which implicitly assumes that when the second channel is enabled, i.e. channel 1, the struct dimm hierarchy will be present. Which is not. So always allocate two channels unconditionally. This provides for the nice side effect that the data structures are initialized so some day, when memory hotplug is supported, it should just work out of the box when all of a sudden a second channel appears. Reported-and-tested-by: Roger Leigh <rleigh@debian.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-04-19amd64_edac: Add Family 16h supportAravind Gopalakrishnan1-1/+64
Add code to handle DRAM ECC errors decoding for Fam16h. Tested on Fam16h with ECC turned on using the mce_amd_inj facility and works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Boris: cleanups and clarifications ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16EDAC: Merge mci.mem_is_per_rank with mci.csbasedMauro Carvalho Chehab1-1/+0
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the memory controller is csrows based. Merge both fields into one. There's no need for the driver to actually fill it, as the core detects it by checking if one of the layers has the csrows type as part of the memory hierarchy: if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT) per_rank = true; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>