diff options
Diffstat (limited to 'Documentation/powerpc/firmware-assisted-dump.txt')
-rw-r--r-- | Documentation/powerpc/firmware-assisted-dump.txt | 292 |
1 files changed, 0 insertions, 292 deletions
diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt deleted file mode 100644 index 18c5feef2577..000000000000 --- a/Documentation/powerpc/firmware-assisted-dump.txt +++ /dev/null @@ -1,292 +0,0 @@ - - Firmware-Assisted Dump - ------------------------ - July 2011 - -The goal of firmware-assisted dump is to enable the dump of -a crashed system, and to do so from a fully-reset system, and -to minimize the total elapsed time until the system is back -in production use. - -- Firmware assisted dump (fadump) infrastructure is intended to replace - the existing phyp assisted dump. -- Fadump uses the same firmware interfaces and memory reservation model - as phyp assisted dump. -- Unlike phyp dump, fadump exports the memory dump through /proc/vmcore - in the ELF format in the same way as kdump. This helps us reuse the - kdump infrastructure for dump capture and filtering. -- Unlike phyp dump, userspace tool does not need to refer any sysfs - interface while reading /proc/vmcore. -- Unlike phyp dump, fadump allows user to release all the memory reserved - for dump, with a single operation of echo 1 > /sys/kernel/fadump_release_mem. -- Once enabled through kernel boot parameter, fadump can be - started/stopped through /sys/kernel/fadump_registered interface (see - sysfs files section below) and can be easily integrated with kdump - service start/stop init scripts. - -Comparing with kdump or other strategies, firmware-assisted -dump offers several strong, practical advantages: - --- Unlike kdump, the system has been reset, and loaded - with a fresh copy of the kernel. In particular, - PCI and I/O devices have been reinitialized and are - in a clean, consistent state. --- Once the dump is copied out, the memory that held the dump - is immediately available to the running kernel. And therefore, - unlike kdump, fadump doesn't need a 2nd reboot to get back - the system to the production configuration. - -The above can only be accomplished by coordination with, -and assistance from the Power firmware. The procedure is -as follows: - --- The first kernel registers the sections of memory with the - Power firmware for dump preservation during OS initialization. - These registered sections of memory are reserved by the first - kernel during early boot. - --- When a system crashes, the Power firmware will save - the low memory (boot memory of size larger of 5% of system RAM - or 256MB) of RAM to the previous registered region. It will - also save system registers, and hardware PTE's. - - NOTE: The term 'boot memory' means size of the low memory chunk - that is required for a kernel to boot successfully when - booted with restricted memory. By default, the boot memory - size will be the larger of 5% of system RAM or 256MB. - Alternatively, user can also specify boot memory size - through boot parameter 'crashkernel=' which will override - the default calculated size. Use this option if default - boot memory size is not sufficient for second kernel to - boot successfully. For syntax of crashkernel= parameter, - refer to Documentation/kdump/kdump.txt. If any offset is - provided in crashkernel= parameter, it will be ignored - as fadump uses a predefined offset to reserve memory - for boot memory dump preservation in case of a crash. - --- After the low memory (boot memory) area has been saved, the - firmware will reset PCI and other hardware state. It will - *not* clear the RAM. It will then launch the bootloader, as - normal. - --- The freshly booted kernel will notice that there is a new - node (ibm,dump-kernel) in the device tree, indicating that - there is crash data available from a previous boot. During - the early boot OS will reserve rest of the memory above - boot memory size effectively booting with restricted memory - size. This will make sure that the second kernel will not - touch any of the dump memory area. - --- User-space tools will read /proc/vmcore to obtain the contents - of memory, which holds the previous crashed kernel dump in ELF - format. The userspace tools may copy this info to disk, or - network, nas, san, iscsi, etc. as desired. - --- Once the userspace tool is done saving dump, it will echo - '1' to /sys/kernel/fadump_release_mem to release the reserved - memory back to general use, except the memory required for - next firmware-assisted dump registration. - - e.g. - # echo 1 > /sys/kernel/fadump_release_mem - -Please note that the firmware-assisted dump feature -is only available on Power6 and above systems with recent -firmware versions. - -Implementation details: ----------------------- - -During boot, a check is made to see if firmware supports -this feature on that particular machine. If it does, then -we check to see if an active dump is waiting for us. If yes -then everything but boot memory size of RAM is reserved during -early boot (See Fig. 2). This area is released once we finish -collecting the dump from user land scripts (e.g. kdump scripts) -that are run. If there is dump data, then the -/sys/kernel/fadump_release_mem file is created, and the reserved -memory is held. - -If there is no waiting dump data, then only the memory required -to hold CPU state, HPTE region, boot memory dump and elfcore -header, is usually reserved at an offset greater than boot memory -size (see Fig. 1). This area is *not* released: this region will -be kept permanently reserved, so that it can act as a receptacle -for a copy of the boot memory content in addition to CPU state -and HPTE region, in the case a crash does occur. Since this reserved -memory area is used only after the system crash, there is no point in -blocking this significant chunk of memory from production kernel. -Hence, the implementation uses the Linux kernel's Contiguous Memory -Allocator (CMA) for memory reservation if CMA is configured for kernel. -With CMA reservation this memory will be available for applications to -use it, while kernel is prevented from using it. With this fadump will -still be able to capture all of the kernel memory and most of the user -space memory except the user pages that were present in CMA region. - - o Memory Reservation during first kernel - - Low memory Top of memory - 0 boot memory size | - | | |<--Reserved dump area -->| | - V V | Permanent Reservation | V - +-----------+----------/ /---+---+----+-----------+----+------+ - | | |CPU|HPTE| DUMP |ELF | | - +-----------+----------/ /---+---+----+-----------+----+------+ - | ^ - | | - \ / - ------------------------------------------- - Boot memory content gets transferred to - reserved area by firmware at the time of - crash - Fig. 1 - - o Memory Reservation during second kernel after crash - - Low memory Top of memory - 0 boot memory size | - | |<------------- Reserved dump area ----------- -->| - V V V - +-----------+----------/ /---+---+----+-----------+----+------+ - | | |CPU|HPTE| DUMP |ELF | | - +-----------+----------/ /---+---+----+-----------+----+------+ - | | - V V - Used by second /proc/vmcore - kernel to boot - Fig. 2 - -Currently the dump will be copied from /proc/vmcore to a -a new file upon user intervention. The dump data available through -/proc/vmcore will be in ELF format. Hence the existing kdump -infrastructure (kdump scripts) to save the dump works fine with -minor modifications. - -The tools to examine the dump will be same as the ones -used for kdump. - -How to enable firmware-assisted dump (fadump): -------------------------------------- - -1. Set config option CONFIG_FA_DUMP=y and build kernel. -2. Boot into linux kernel with 'fadump=on' kernel cmdline option. - By default, fadump reserved memory will be initialized as CMA area. - Alternatively, user can boot linux kernel with 'fadump=nocma' to - prevent fadump to use CMA. -3. Optionally, user can also set 'crashkernel=' kernel cmdline - to specify size of the memory to reserve for boot memory dump - preservation. - -NOTE: 1. 'fadump_reserve_mem=' parameter has been deprecated. Instead - use 'crashkernel=' to specify size of the memory to reserve - for boot memory dump preservation. - 2. If firmware-assisted dump fails to reserve memory then it - will fallback to existing kdump mechanism if 'crashkernel=' - option is set at kernel cmdline. - 3. if user wants to capture all of user space memory and ok with - reserved memory not available to production system, then - 'fadump=nocma' kernel parameter can be used to fallback to - old behaviour. - -Sysfs/debugfs files: ------------- - -Firmware-assisted dump feature uses sysfs file system to hold -the control files and debugfs file to display memory reserved region. - -Here is the list of files under kernel sysfs: - - /sys/kernel/fadump_enabled - - This is used to display the fadump status. - 0 = fadump is disabled - 1 = fadump is enabled - - This interface can be used by kdump init scripts to identify if - fadump is enabled in the kernel and act accordingly. - - /sys/kernel/fadump_registered - - This is used to display the fadump registration status as well - as to control (start/stop) the fadump registration. - 0 = fadump is not registered. - 1 = fadump is registered and ready to handle system crash. - - To register fadump echo 1 > /sys/kernel/fadump_registered and - echo 0 > /sys/kernel/fadump_registered for un-register and stop the - fadump. Once the fadump is un-registered, the system crash will not - be handled and vmcore will not be captured. This interface can be - easily integrated with kdump service start/stop. - - /sys/kernel/fadump_release_mem - - This file is available only when fadump is active during - second kernel. This is used to release the reserved memory - region that are held for saving crash dump. To release the - reserved memory echo 1 to it: - - echo 1 > /sys/kernel/fadump_release_mem - - After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region - file will change to reflect the new memory reservations. - - The existing userspace tools (kdump infrastructure) can be easily - enhanced to use this interface to release the memory reserved for - dump and continue without 2nd reboot. - -Here is the list of files under powerpc debugfs: -(Assuming debugfs is mounted on /sys/kernel/debug directory.) - - /sys/kernel/debug/powerpc/fadump_region - - This file shows the reserved memory regions if fadump is - enabled otherwise this file is empty. The output format - is: - <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size> - - e.g. - Contents when fadump is registered during first kernel - - # cat /sys/kernel/debug/powerpc/fadump_region - CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0 - HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0 - DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0 - - Contents when fadump is active during second kernel - - # cat /sys/kernel/debug/powerpc/fadump_region - CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020 - HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000 - DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000 - : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000 - -NOTE: Please refer to Documentation/filesystems/debugfs.txt on - how to mount the debugfs filesystem. - - -TODO: ------ - o Need to come up with the better approach to find out more - accurate boot memory size that is required for a kernel to - boot successfully when booted with restricted memory. - o The fadump implementation introduces a fadump crash info structure - in the scratch area before the ELF core header. The idea of introducing - this structure is to pass some important crash info data to the second - kernel which will help second kernel to populate ELF core header with - correct data before it gets exported through /proc/vmcore. The current - design implementation does not address a possibility of introducing - additional fields (in future) to this structure without affecting - compatibility. Need to come up with the better approach to address this. - The possible approaches are: - 1. Introduce version field for version tracking, bump up the version - whenever a new field is added to the structure in future. The version - field can be used to find out what fields are valid for the current - version of the structure. - 2. Reserve the area of predefined size (say PAGE_SIZE) for this - structure and have unused area as reserved (initialized to zero) - for future field additions. - The advantage of approach 1 over 2 is we don't need to reserve extra space. ---- -Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> -This document is based on the original documentation written for phyp -assisted dump by Linas Vepstas and Manish Ahuja. |