diff options
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r-- | Documentation/admin-guide/RAS/address-translation.rst | 24 | ||||
-rw-r--r-- | Documentation/admin-guide/RAS/error-decoding.rst | 21 | ||||
-rw-r--r-- | Documentation/admin-guide/RAS/index.rst | 7 | ||||
-rw-r--r-- | Documentation/admin-guide/RAS/main.rst (renamed from Documentation/admin-guide/ras.rst) | 10 | ||||
-rw-r--r-- | Documentation/admin-guide/index.rst | 2 |
5 files changed, 60 insertions, 4 deletions
diff --git a/Documentation/admin-guide/RAS/address-translation.rst b/Documentation/admin-guide/RAS/address-translation.rst new file mode 100644 index 000000000000..f0ca17b43cd3 --- /dev/null +++ b/Documentation/admin-guide/RAS/address-translation.rst @@ -0,0 +1,24 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Address translation +=================== + +x86 AMD +------- + +Zen-based AMD systems include a Data Fabric that manages the layout of +physical memory. Devices attached to the Fabric, like memory controllers, +I/O, etc., may not have a complete view of the system physical memory map. +These devices may provide a "normalized", i.e. device physical, address +when reporting memory errors. Normalized addresses must be translated to +a system physical address for the kernel to action on the memory. + +AMD Address Translation Library (CONFIG_AMD_ATL) provides translation for +this case. + +Glossary of acronyms used in address translation for Zen-based systems + +* CCM = Cache Coherent Moderator +* COD = Cluster-on-Die +* COH_ST = Coherent Station +* DF = Data Fabric diff --git a/Documentation/admin-guide/RAS/error-decoding.rst b/Documentation/admin-guide/RAS/error-decoding.rst new file mode 100644 index 000000000000..26a72f3fe5de --- /dev/null +++ b/Documentation/admin-guide/RAS/error-decoding.rst @@ -0,0 +1,21 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Error decoding +============== + +x86 +--- + +Error decoding on AMD systems should be done using the rasdaemon tool: +https://github.com/mchehab/rasdaemon/ + +While the daemon is running, it would automatically log and decode +errors. If not, one can still decode such errors by supplying the +hardware information from the error:: + + $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca + +Also, the user can pass particular family and model to decode the error +string:: + + $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM> diff --git a/Documentation/admin-guide/RAS/index.rst b/Documentation/admin-guide/RAS/index.rst new file mode 100644 index 000000000000..f4087040a7c0 --- /dev/null +++ b/Documentation/admin-guide/RAS/index.rst @@ -0,0 +1,7 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. toctree:: + :maxdepth: 2 + + main + error-decoding + address-translation diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/RAS/main.rst index 8e03751d126d..7ac1d4ccc509 100644 --- a/Documentation/admin-guide/ras.rst +++ b/Documentation/admin-guide/RAS/main.rst @@ -1,8 +1,12 @@ +.. SPDX-License-Identifier: GPL-2.0 .. include:: <isonum.txt> -============================================ -Reliability, Availability and Serviceability -============================================ +================================================== +Reliability, Availability and Serviceability (RAS) +================================================== + +This documents different aspects of the RAS functionality present in the +kernel. RAS concepts ************ diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index fb40a1f6f79e..dfc06fab9432 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -122,7 +122,7 @@ configure specific aspects of kernel behavior to your liking. pmf pnp rapidio - ras + RAS/index rtc serial-console svga |