From 4fb6fde74d6724dc6d64ec729f950fbdeefd7f07 Mon Sep 17 00:00:00 2001
From: Aaron Miller
Date: Thu, 3 Nov 2016 15:01:53 -0700
Subject: EDAC: Expose per-DIMM error counts in sysfs

The old csrowX sysfs directories have per-csrow error counters, but the
new dimmX directories do not currently expose error counts. EDAC already
keeps these counts, add them to sysfs so per-DIMM counts are still
available when CONFIG_EDAC_LEGACY_SYSFS=n.

Signed-off-by: Aaron Miller
Cc: linux-edac
Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com
Signed-off-by: Borislav Petkov
---
 Documentation/admin-guide/ras.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

(limited to 'Documentation/admin-guide/ras.rst')

diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index d71340e86c27..9939348bd4a3 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -438,11 +438,13 @@ A typical EDAC system has the following structure under
 	│   │   ├── ce_count
 	│   │   ├── ce_noinfo_count
 	│   │   ├── dimm0
+	│   │   │   ├── dimm_ce_count
 	│   │   │   ├── dimm_dev_type
 	│   │   │   ├── dimm_edac_mode
 	│   │   │   ├── dimm_label
 	│   │   │   ├── dimm_location
 	│   │   │   ├── dimm_mem_type
+	│   │   │   ├── dimm_ue_count
 	│   │   │   ├── size
 	│   │   │   └── uevent
 	│   │   ├── max_location
@@ -457,11 +459,13 @@ A typical EDAC system has the following structure under
 	│   │   ├── ce_count
 	│   │   ├── ce_noinfo_count
 	│   │   ├── dimm0
+	│   │   │   ├── dimm_ce_count
 	│   │   │   ├── dimm_dev_type
 	│   │   │   ├── dimm_edac_mode
 	│   │   │   ├── dimm_label
 	│   │   │   ├── dimm_location
 	│   │   │   ├── dimm_mem_type
+	│   │   │   ├── dimm_ue_count
 	│   │   │   ├── size
 	│   │   │   └── uevent
 	│   │   ├── max_location
@@ -483,6 +487,22 @@ this ``X`` memory module:
 	This attribute file displays, in count of megabytes, the memory
 	that this csrow contains.
 
+- ``dimm_ue_count`` - Uncorrectable Errors count attribute file
+
+	This attribute file displays the total count of uncorrectable
+	errors that have occurred on this DIMM. If panic_on_ue is set
+	this counter will not have a chance to increment, since EDAC
+	will panic the system.
+
+- ``dimm_ce_count`` - Correctable Errors count attribute file
+
+	This attribute file displays the total count of correctable
+	errors that have occurred on this DIMM. This count is very
+	important to examine. CEs provide early indications that a
+	DIMM is beginning to fail. This count field should be
+	monitored for non-zero values and report such information
+	to the system administrator.
+
 - ``dimm_dev_type`` - Device type attribute file
 
 	This attribute file will display what type of DRAM device is
-- 
cgit v1.2.3
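
The ``dimm_ce_count`` text in this patch recommends monitoring the counter for non-zero values. A minimal sketch of such a check is given here for illustration only; it is not part of the patch. It assumes the EDAC tree is mounted at ``/sys/devices/system/edac/mc`` and that the running kernel exposes the two new attributes. The helper names ``read_dimm_counts`` and ``failing_dimms`` are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the EDAC memory controllers; adjust if needed.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_dimm_counts(root=EDAC_MC_ROOT):
    """Return {"mcX/dimmY": {"ce": int, "ue": int}} for every DIMM found."""
    counts = {}
    for dimm_dir in sorted(Path(root).glob("mc*/dimm*")):
        entry = {}
        for key, name in (("ce", "dimm_ce_count"), ("ue", "dimm_ue_count")):
            try:
                entry[key] = int((dimm_dir / name).read_text().strip())
            except (OSError, ValueError):
                # Attribute missing (kernel without this patch) or unreadable.
                continue
        if entry:
            counts[f"{dimm_dir.parent.name}/{dimm_dir.name}"] = entry
    return counts


def failing_dimms(counts):
    """Return the sorted names of DIMMs with a non-zero CE or UE count."""
    return sorted(
        name for name, c in counts.items() if any(v > 0 for v in c.values())
    )


if __name__ == "__main__":
    for name in failing_dimms(read_dimm_counts()):
        print(f"EDAC: errors reported on {name}")
```

Passing a different ``root`` makes the helpers usable against a copied or mocked sysfs tree. A real deployment would more likely track the change in the counters between polls than their absolute values.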