Ramoops oops/panic logger
Sergiu Iordache <sergiu@chromium.org>
Updated: 10 Feb 2021
Introduction
Ramoops is an oops/panic logger that writes its logs to RAM before the system crashes. It works by logging oopses and panics in a circular buffer. Ramoops needs a system with persistent RAM so that the content of that area can survive after a restart.
Ramoops concepts
Ramoops uses a predefined memory area to store the dump. The start, size, and type of the memory area are set using the following variables:
* mem_address for the start.
* mem_size for the size. The memory size will be rounded down to a power of two.
* mem_type to specify the memory type (default is pgprot_writecombine).
* mem_name to specify a memory region defined by the reserve_mem command line parameter.
Typically the default value of mem_type=0 should be used, as that sets the pstore mapping to pgprot_writecombine. Setting mem_type=1 attempts to use pgprot_noncached, which only works on some platforms. This is because pstore depends on atomic operations. At least on ARM, pgprot_noncached causes the memory to be mapped strongly ordered, and atomic operations on strongly ordered memory are implementation defined and won’t work on many ARMs, such as OMAPs. Setting mem_type=2 attempts to treat the memory region as normal memory, which enables full caching on it. This can improve performance.
The memory area is divided into record_size chunks (also rounded down to a power of two), and each kmsg dump writes a record_size chunk of information.
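The rounding and chunking described above can be sketched in userspace C. This is an illustration only, not the driver's code: the helper names round_down_pow2() and dump_record_count() are made up for this sketch, and it assumes the whole area is used for kmsg dump records.

```c
#include <assert.h>

/* Largest power of two that is <= n. n must be non-zero. */
static unsigned long round_down_pow2(unsigned long n)
{
    unsigned long p = 1;

    assert(n != 0);
    /* Comparing against n / 2 avoids overflowing p on very large n. */
    while (p <= n / 2)
        p *= 2;
    return p;
}

/*
 * Number of record_size chunks that fit in the area.
 * Assumption for this sketch: the whole area holds kmsg dump records.
 */
static unsigned long dump_record_count(unsigned long mem_size,
                                       unsigned long record_size)
{
    mem_size = round_down_pow2(mem_size);
    record_size = round_down_pow2(record_size);
    if (record_size > mem_size)
        return 0;
    return mem_size / record_size;
}
```

Under these assumptions, a 0x100000-byte area with a record_size of 0x4000 yields 64 chunks, and a mem_size of 0x180000 would first be rounded down to 0x100000.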
Which kinds of kmsg dumps are stored can be limited via the max_reason value, as defined in include/linux/kmsg_dump.h’s enum kmsg_dump_reason. For example, to store both Oopses and Panics, max_reason should be set to 2 (KMSG_DUMP_OOPS); to store only Panics, max_reason should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0 (KMSG_DUMP_UNDEF) means the reason filtering will be controlled by the printk.always_kmsg_dump boot param: if unset, it’ll be KMSG_DUMP_OOPS, otherwise KMSG_DUMP_MAX.
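The filtering rule can be written down as a small userspace sketch. The enum below is local to the sketch: values 0 to 2 are the ones quoted above, while SKETCH_DUMP_MAX is a placeholder upper bound (the real list lives in include/linux/kmsg_dump.h). The function name should_store() is likewise hypothetical.

```c
#include <stdbool.h>

/*
 * Values 0..2 mirror the numbers quoted in the text. SKETCH_DUMP_MAX is a
 * placeholder standing in for KMSG_DUMP_MAX, not the kernel's value.
 */
enum sketch_dump_reason {
    SKETCH_DUMP_UNDEF = 0,
    SKETCH_DUMP_PANIC = 1,
    SKETCH_DUMP_OOPS  = 2,
    SKETCH_DUMP_MAX   = 100,
};

/* A dump is kept when its reason does not exceed the effective limit. */
static bool should_store(int reason, int max_reason, bool always_kmsg_dump)
{
    /* 0 defers the decision to the printk.always_kmsg_dump setting. */
    if (max_reason == SKETCH_DUMP_UNDEF)
        max_reason = always_kmsg_dump ? SKETCH_DUMP_MAX : SKETCH_DUMP_OOPS;
    return reason <= max_reason;
}
```

With max_reason set to 1, a panic is stored and an oops is not; with max_reason set to 0 and printk.always_kmsg_dump unset, the sketch behaves as if max_reason were 2.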
The module uses a counter to record multiple dumps, but the counter gets reset on restart (i.e. new dumps after the restart will overwrite old ones).
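A minimal sketch of why old dumps are overwritten, assuming the counter simply selects the next chunk and wraps around the circular buffer. The struct and function names are hypothetical and do not come from the driver.

```c
/* State that lives in ordinary kernel memory, so it is lost on restart. */
struct sketch_log {
    unsigned int count;       /* dumps written since boot; starts at 0 */
    unsigned int num_records; /* number of record_size chunks; must be > 0 */
};

/* Return the chunk index for the next dump and advance the counter. */
static unsigned int next_record_slot(struct sketch_log *log)
{
    unsigned int slot = log->count % log->num_records;

    log->count++;
    return slot;
}
```

Because count starts from zero again after a restart, the first new dump lands in chunk 0 regardless of what was stored there before.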
Ramoops also supports software ECC protection of persistent memory regions. This might be useful when a hardware reset was used to bring the machine back to life (i.e. a watchdog triggered). In such cases, RAM may be somewhat corrupt, but usually it is restorable.
Setting the parameters
The ramoops parameters can be set in several ways:
A. Use the module parameters (which have the names of the variables described above). For quick debugging, you can also reserve parts of memory during boot and then use the reserved memory for ramoops. For example, assuming a machine with > 128 MB of memory, the following kernel command line will tell the kernel to use only the first 128 MB of memory, and place an ECC-protected ramoops region at the 128 MB boundary:

    mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1

B. Use Device Tree bindings, as described in Documentation/devicetree/bindings/reserved-memory/ramoops.yaml. For example:

    reserved-memory {
            #address-cells = <2>;
            #size-cells = <2>;
            ranges;

            ramoops@8f000000 {
                    compatible = "ramoops";
                    reg = <0 0x8f000000 0 0x100000>;
                    record-size = <0x4000>;
                    console-size = <0x4000>;
            };
    };

C. Use a platform device and set the platform data. The parameters can then be set through that platform data. An example of doing that is:

    #include <linux/pstore_ram.h>
    [...]

    static struct ramoops_platform_data ramoops_data = {
            .mem_size       = <...>,
            .mem_address    = <...>,
            .mem_type       = <...>,
            .record_size    = <...>,
            .max_reason     = <...>,
            .ecc            = <...>,
    };

    static struct platform_device ramoops_dev = {
            .name = "ramoops",
            .dev = {
                    .platform_data = &ramoops_data,
            },
    };

    [... inside a function ...]
    int ret;

    ret = platform_device_register(&ramoops_dev);
    if (ret) {
            printk(KERN_ERR "unable to register platform device\n");
            return ret;
    }

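As a side note on the Device Tree example in option B: with #address-cells and #size-cells both set to 2, each address and each size in reg is written as two 32-bit cells, high word first. The userspace sketch below shows how such a reg property decodes; the names dt_reg and decode_reg() are hypothetical and are not kernel APIs.

```c
#include <stdint.h>

struct dt_reg {
    uint64_t addr;
    uint64_t size;
};

/* Combine two 32-bit Device Tree cells (high, low) into one 64-bit value. */
static uint64_t dt_cells_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Decode a reg property made of <addr_hi addr_lo size_hi size_lo>. */
static struct dt_reg decode_reg(const uint32_t cells[4])
{
    struct dt_reg r;

    r.addr = dt_cells_to_u64(cells[0], cells[1]);
    r.size = dt_cells_to_u64(cells[2], cells[3]);
    return r;
}
```

Applied to reg = <0 0x8f000000 0 0x100000>, this gives a region starting at 0x8f000000 with a size of 0x100000 bytes (1 MiB).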
D. Use a region of memory reserved via the reserve_mem command line parameter. The address and size will be defined by the reserve_mem parameter. Note that reserve_mem may not always allocate memory in the same location, and cannot be relied upon. Testing will need to be done, and it may not work on every machine, nor every kernel. Consider this a “best effort” approach. The reserve_mem option takes a size, alignment and name as arguments. The name is used to map the memory to a label that can be retrieved by ramoops.

    reserve_mem=2M:4096:oops ramoops.mem_name=oops

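To make the size:alignment:name layout of the reserve_mem value concrete, here is a userspace parsing sketch. It is not the kernel's parser: the struct, the function names and the 32-byte name limit are assumptions of this sketch, and the K/M/G suffix handling is inferred from the 2M in the example.

```c
#include <stdlib.h>
#include <string.h>

struct reserve_mem_arg {
    unsigned long long size;
    unsigned long long align;
    char name[32]; /* length limit chosen for this sketch only */
};

/* Parse a number with an optional K, M or G suffix; *end is set past it. */
static unsigned long long parse_size(const char *s, char **end)
{
    unsigned long long v = strtoull(s, end, 0);

    switch (**end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        (*end)++;
        break;
    default:
        break;
    }
    return v;
}

/* Returns 0 on success, -1 if the string is not size:align:name. */
static int parse_reserve_mem(const char *s, struct reserve_mem_arg *out)
{
    char *p;

    out->size = parse_size(s, &p);
    if (p == s || *p != ':')
        return -1;
    s = p + 1;
    out->align = parse_size(s, &p);
    if (p == s || *p != ':')
        return -1;
    p++;
    if (*p == '\0' || strlen(p) >= sizeof(out->name))
        return -1;
    strcpy(out->name, p);
    return 0;
}
```

For the value 2M:4096:oops, the sketch yields a size of 2097152 bytes, an alignment of 4096 and the label oops, which is the label passed to ramoops.mem_name.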
You can specify either RAM memory or peripheral devices’ memory. However, when specifying RAM, be sure to reserve the memory by issuing memblock_reserve() very early in the architecture code, e.g.:

    #include <linux/memblock.h>

    memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size);

Dump format
The data dump begins with a header, currently defined as ==== followed by a timestamp and a new line. The dump then continues with the actual data.
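A reader of a raw record could separate the header from the data roughly as follows. This userspace sketch relies only on what is stated above (a ==== prefix, then everything up to the first newline is header) and makes no assumption about the timestamp's format; skip_dump_header() is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/*
 * If buf starts with the "====" header, return a pointer to the first byte
 * after the header's newline; otherwise return NULL.
 */
static const char *skip_dump_header(const char *buf, size_t len)
{
    const char *nl;

    if (len < 4 || memcmp(buf, "====", 4) != 0)
        return NULL;
    nl = memchr(buf + 4, '\n', len - 4);
    if (!nl)
        return NULL;
    return nl + 1;
}
```

A NULL result means the buffer either does not start with the header or is cut off before the header's newline.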
Reading the data
The dump data can be read from the pstore filesystem. The format for these files is dmesg-ramoops-N, where N is the record number in memory. To delete a stored record from RAM, simply unlink the respective pstore file.
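A tool that walks the pstore mount point may want to recover N from a file name. The sketch below does this in userspace C for names of exactly the form dmesg-ramoops-N; the function name dmesg_record_number() is hypothetical, and names with any other shape are rejected.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Extract N from "dmesg-ramoops-N"; returns -1 if the name does not match. */
static int dmesg_record_number(const char *name)
{
    static const char prefix[] = "dmesg-ramoops-";
    const size_t plen = sizeof(prefix) - 1;
    const char *p;
    int n = 0;

    if (strncmp(name, prefix, plen) != 0)
        return -1;
    p = name + plen;
    if (*p == '\0')
        return -1; /* prefix with no record number */
    for (; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return -1;
        if (n > (INT_MAX - 9) / 10)
            return -1; /* would overflow int */
        n = n * 10 + (*p - '0');
    }
    return n;
}
```

Once a record has been identified this way, removing it is a plain unlink() on the file's path under the pstore mount point.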
Persistent function tracing
Persistent function tracing might be useful for debugging software or hardware related hangs. The function call chain log is stored in a ftrace-ramoops file. Here is an example of usage:

    # mount -t debugfs debugfs /sys/kernel/debug/
    # echo 1 > /sys/kernel/debug/pstore/record_ftrace
    # reboot -f
    [...]
    # mount -t pstore pstore /mnt/
    # tail /mnt/ftrace-ramoops
    0 ffffffff8101ea64 ffffffff8101bcda native_apic_mem_read <- disconnect_bsp_APIC+0x6a/0xc0
    0 ffffffff8101ea44 ffffffff8101bcf6 native_apic_mem_write <- disconnect_bsp_APIC+0x86/0xc0
    0 ffffffff81020084 ffffffff8101a4b5 hpet_disable <- native_machine_shutdown+0x75/0x90
    0 ffffffff81005f94 ffffffff8101a4bb iommu_shutdown_noop <- native_machine_shutdown+0x7b/0x90
    0 ffffffff8101a6a1 ffffffff8101a437 native_machine_emergency_restart <- native_machine_restart+0x37/0x40
    0 ffffffff811f9876 ffffffff8101a73a acpi_reboot <- native_machine_emergency_restart+0xaa/0x1e0
    0 ffffffff8101a514 ffffffff8101a772 mach_reboot_fixups <- native_machine_emergency_restart+0xe2/0x1e0
    0 ffffffff811d9c54 ffffffff8101a7a0 __const_udelay <- native_machine_emergency_restart+0x110/0x1e0
    0 ffffffff811d9c34 ffffffff811d9c80 __delay <- __const_udelay+0x30/0x40
    0 ffffffff811d9d14 ffffffff811d9c3f delay_tsc <- __delay+0xf/0x20
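
For post-processing such a log, each output line can be split into its fields. The userspace sketch below assumes the layout seen in the example (a CPU number, two hexadecimal addresses, the traced function, then "<-" and the caller); the field names and the interpretation of the two addresses are inferred from that example, and the struct and function names are hypothetical.

```c
#include <stdio.h>

struct ftrace_entry {
    int cpu;
    unsigned long long ip;        /* assumed: address of the traced function */
    unsigned long long parent_ip; /* assumed: address inside the caller */
    char func[64];
    char caller[96];
};

/* Returns 0 on success, -1 if the line does not have the expected shape. */
static int parse_ftrace_line(const char *line, struct ftrace_entry *e)
{
    /* Field widths keep sscanf from overrunning the fixed-size buffers. */
    int n = sscanf(line, "%d %llx %llx %63s <- %95s",
                   &e->cpu, &e->ip, &e->parent_ip, e->func, e->caller);

    return n == 5 ? 0 : -1;
}
```

Fed a line such as the delay_tsc entry, the sketch reports CPU 0, the function delay_tsc and the caller __delay+0xf/0x20.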