pstore block oops/panic logger
Introduction
pstore block (pstore/blk) is an oops/panic logger that writes its logs to a block device and non-block device before the system crashes. You can get these log files by mounting the pstore filesystem like:
mount -t pstore pstore /sys/fs/pstore
pstore block concepts
pstore/blk provides an efficient configuration method, which divides all configurations into two parts: configurations for user and configurations for driver.
Configurations for user determine how pstore/blk works, such as pmsg_size, kmsg_size and so on. All of them support both Kconfig and module parameters, but module parameters have priority over Kconfig.
Configurations for driver are all about block device and non-block device, such as total_size of block device and read/write operations.
Configurations for user
All of these configurations support both Kconfig and module parameters, but module parameters have priority over Kconfig.
Here is an example for module parameters:
pstore_blk.blkdev=/dev/mmcblk0p7 pstore_blk.kmsg_size=64 best_effort=y
The details of each configuration may be of interest to you.
blkdev
The block device to use. Most of the time, it is a partition of a block device. It’s required for pstore/blk. It is also used for MTD devices.
When pstore/blk is built as a module, “blkdev” accepts the following variants:
/dev/<disk_name> represents the device number of disk
/dev/<disk_name><decimal> represents the device number of partition - device number of disk plus the partition number
/dev/<disk_name>p<decimal> - same as the above; this form is used when disk name of partitioned disk ends with a digit.
When pstore/blk is built into the kernel, “blkdev” accepts the following variants:
<hex_major><hex_minor> device number in hexadecimal representation, with no leading 0x, for example b302.
PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of a partition if the partition table provides it. The UUID may be either an EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-filled hex representation of the 32-bit “NT disk signature”, and PP is a zero-filled hex representation of the 1-based partition number.
PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a partition with a known unique id.
<major>:<minor> major and minor number of the device separated by a colon.
It accepts the following variants for MTD device:
<device name> MTD device name. “pstore” is recommended.
<device number> MTD device number.
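To make the `<hex_major><hex_minor>` form more concrete, here is a minimal user-space sketch that splits a string such as `b302` into major and minor numbers. It assumes the value follows the kernel's 32-bit device number encoding (major in bits 8 to 19, minor in the remaining bits); `struct devnum` and `decode_hex_devnum()` are hypothetical names used only for illustration and are not part of pstore/blk.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type, not part of pstore/blk. */
struct devnum {
    unsigned int major;
    unsigned int minor;
};

/*
 * Decode a "<hex_major><hex_minor>" string such as "b302".
 * Assumption: bits 8..19 hold the major number and the remaining
 * bits hold the minor number.
 * Returns 0 on success, -1 if the string is empty, longer than
 * 8 digits, or contains anything but hex digits (so "0x..." fails).
 */
static int decode_hex_devnum(const char *s, struct devnum *out)
{
    const char *p;
    unsigned long v;

    assert(s != NULL && out != NULL);
    for (p = s; *p != '\0'; p++)
        if (!isxdigit((unsigned char)*p))
            return -1;
    if (p == s || p - s > 8)
        return -1;

    v = strtoul(s, NULL, 16);
    out->major = (unsigned int)((v & 0xfff00UL) >> 8);
    out->minor = (unsigned int)((v & 0xffUL) | ((v >> 12) & 0xfff00UL));
    return 0;
}
```

Under that assumption, `b302` decodes to major 179 (0xb3) and minor 2.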
kmsg_size
The chunk size in KB for oops/panic front-end. It MUST be a multiple of 4. It’s optional if you do not care about the oops/panic log.
There are multiple chunks for oops/panic front-end. Their number depends on the space remaining after the other pstore front-ends have been allocated.
pstore/blk will log to oops/panic chunks one by one, and always overwrite the oldest chunk if there is no more free chunk.
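The "one by one, overwrite the oldest" behaviour can be pictured as a simple ring of chunks. The following sketch is only an illustration in user-space C; `struct kmsg_ring` and `next_kmsg_chunk()` are hypothetical names, and the real bookkeeping in pstore/blk may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the oops/panic chunks (not kernel code). */
struct kmsg_ring {
    unsigned int max_cnt;   /* number of chunks that fit in the space */
    unsigned int write_cnt; /* index of the chunk to write next */
};

/*
 * Return the chunk index to use for the next dump and advance the ring.
 * Once every chunk has been used, the index wraps around, so the next
 * dump lands on the chunk that was written longest ago.
 */
static unsigned int next_kmsg_chunk(struct kmsg_ring *r)
{
    unsigned int idx;

    assert(r != NULL && r->max_cnt > 0);
    idx = r->write_cnt;
    r->write_cnt = (idx + 1) % r->max_cnt;
    return idx;
}
```

With three chunks, successive dumps would go to chunks 0, 1, 2 and then back to 0.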
pmsg_size
The chunk size in KB for pmsg front-end. It MUST be a multiple of 4. It’s optional if you do not care about the pmsg log.
Unlike oops/panic front-end, there is only one chunk for pmsg front-end.
Pmsg is a user space accessible pstore object. Writes to /dev/pmsg0 are appended to the chunk. On reboot the contents are available in /sys/fs/pstore/pmsg-pstore-blk-0.
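As an illustration of the user space side, the sketch below writes one message to a pmsg device node using plain POSIX calls. `pmsg_append()` is a hypothetical helper name; the path is passed in as a parameter (normally /dev/pmsg0) so the function can also be tried against an ordinary file.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write one message to a pmsg-style device node.
 * A short write is treated as a failure.
 * Returns 0 on success, -1 on failure.
 */
static int pmsg_append(const char *path, const char *msg)
{
    size_t len;
    ssize_t n;
    int fd;

    assert(path != NULL && msg != NULL);
    len = strlen(msg);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    n = write(fd, msg, len);
    close(fd);

    return (n == (ssize_t)len) ? 0 : -1;
}
```

A typical call would be `pmsg_append("/dev/pmsg0", "boot stage 3 reached\n")`, which usually requires write permission on the device node.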
console_size
The chunk size in KB for console front-end. It MUST be a multiple of 4. It’s optional if you do not care about the console log.
Similar to pmsg front-end, there is only one chunk for console front-end.
All console logs will be appended to the chunk. On reboot the contents are available in /sys/fs/pstore/console-pstore-blk-0.
ftrace_size
The chunk size in KB for ftrace front-end. It MUST be a multiple of 4. It’s optional if you do not care about the ftrace log.
Similar to oops front-end, there are multiple chunks for ftrace front-end depending on the number of CPUs. Each chunk size is equal to ftrace_size / processors_count.
All ftrace logs will be appended to the chunk. On reboot the contents are combined and available in /sys/fs/pstore/ftrace-pstore-blk-0.
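The size rules for the front-ends can be checked with a few lines of arithmetic. This is a rough sketch with hypothetical helper names; it applies the "multiple of 4" rule and the ftrace_size / processors_count formula literally and ignores any extra alignment the kernel might apply.

```c
#include <assert.h>

/* Returns 1 if a size given in KB is a multiple of 4, 0 otherwise. */
static int size_kb_is_valid(unsigned long size_kb)
{
    return size_kb % 4 == 0;
}

/*
 * Per-CPU ftrace chunk size in bytes, taken literally as
 * ftrace_size / processors_count.
 */
static unsigned long ftrace_chunk_bytes(unsigned long ftrace_size_kb,
                                        unsigned int cpus)
{
    assert(cpus > 0);
    return ftrace_size_kb * 1024UL / cpus;
}
```

For example, ftrace_size=64 on a 4-CPU system would give each CPU a 16 KB chunk under this reading.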
Persistent function tracing might be useful for debugging software or hardware related hangs. Here is an example of usage:
# mount -t pstore pstore /sys/fs/pstore
# mount -t debugfs debugfs /sys/kernel/debug/
# echo 1 > /sys/kernel/debug/pstore/record_ftrace
# reboot -f
[...]
# mount -t pstore pstore /sys/fs/pstore
# tail /sys/fs/pstore/ftrace-pstore-blk-0
CPU:0 ts:5914676 c0063828 c0063b94 call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
CPU:0 ts:5914678 c039ecdc c006385c cpuidle_enter_state <- call_cpuidle+0x44/0x48
CPU:0 ts:5914680 c039e9a0 c039ecf0 cpuidle_enter_freeze <- cpuidle_enter_state+0x304/0x314
CPU:0 ts:5914681 c0063870 c039ea30 sched_idle_set_state <- cpuidle_enter_state+0x44/0x314
CPU:1 ts:5916720 c0160f59 c015ee04 kernfs_unmap_bin_file <- __kernfs_remove+0x140/0x204
CPU:1 ts:5916721 c05ca625 c015ee0c __mutex_lock_slowpath <- __kernfs_remove+0x148/0x204
CPU:1 ts:5916723 c05c813d c05ca630 yield_to <- __mutex_lock_slowpath+0x314/0x358
CPU:1 ts:5916724 c05ca2d1 c05ca638 __ww_mutex_lock <- __mutex_lock_slowpath+0x31c/0x358
max_reason
Which kinds of kmsg dumps are stored can be limited via the max_reason value, as defined in include/linux/kmsg_dump.h’s enum kmsg_dump_reason. For example, to store both Oopses and Panics, max_reason should be set to 2 (KMSG_DUMP_OOPS); to store only Panics, max_reason should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0 (KMSG_DUMP_UNDEF) means the reason filtering will be controlled by the printk.always_kmsg_dump boot param: if unset, it’ll be KMSG_DUMP_OOPS, otherwise KMSG_DUMP_MAX.
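The filtering rule can be sketched in user-space C as follows. The enum below is a local mirror of the three values quoted above, and `DUMP_MAX` is a placeholder that only stands in for KMSG_DUMP_MAX (its real value comes from include/linux/kmsg_dump.h). The function names are hypothetical, and the "store if reason does not exceed the limit" comparison is an assumption drawn from the examples in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror for illustration; see enum kmsg_dump_reason for the real one. */
enum dump_reason {
    DUMP_UNDEF = 0,
    DUMP_PANIC = 1,
    DUMP_OOPS  = 2,
    DUMP_MAX   = 1000 /* placeholder standing in for KMSG_DUMP_MAX */
};

/* Resolve max_reason == 0 using the printk.always_kmsg_dump setting. */
static int effective_max_reason(int max_reason, bool always_kmsg_dump)
{
    if (max_reason != DUMP_UNDEF)
        return max_reason;
    return always_kmsg_dump ? DUMP_MAX : DUMP_OOPS;
}

/* Assumption: a dump is kept when its reason does not exceed the limit. */
static bool should_store(int reason, int max_reason, bool always_kmsg_dump)
{
    assert(reason > DUMP_UNDEF);
    return reason <= effective_max_reason(max_reason, always_kmsg_dump);
}
```

With max_reason=1 only panics pass the check, while max_reason=2 lets both panics and oopses through, matching the examples above.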
Configurations for driver
A device driver uses register_pstore_device with struct pstore_device_info to register to pstore/blk.
- int register_pstore_device(struct pstore_device_info *dev)
register non-block device to pstore/blk
Parameters
struct pstore_device_info *dev: non-block device information
Return
0 - OK
Others - an error occurred.
- void unregister_pstore_device(struct pstore_device_info *dev)
unregister non-block device from pstore/blk
Parameters
struct pstore_device_info *dev: non-block device information
Compression and header
Block device is large enough for uncompressed oops data. Actually we do not recommend data compression because pstore/blk will insert some information into the first line of oops/panic data. For example:
Panic: Total 16 times
It means that it’s OOPS|Panic for the 16th time since the first booting. Sometimes the number of occurrences of oops|panic since the first booting is important for judging whether the system is stable.
The following line is inserted by pstore filesystem. For example:
Oops#2 Part1
It means that it’s OOPS for the 2nd time on the last boot.
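A tool that post-processes dump files could pick these counters out with `sscanf()`. The sketch below assumes the two line layouts are exactly as in the examples above; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for the first header line. */
struct dump_header {
    char kind[16];       /* e.g. "Oops" or "Panic" */
    unsigned int total;  /* occurrences since the first booting */
};

/* Parse a line such as "Panic: Total 16 times". Returns 0 on success. */
static int parse_total_line(const char *line, struct dump_header *h)
{
    assert(line != NULL && h != NULL);
    if (sscanf(line, "%15[A-Za-z]: Total %u times", h->kind, &h->total) != 2)
        return -1;
    return 0;
}

/* Parse a line such as "Oops#2 Part1". Returns 0 on success. */
static int parse_part_line(const char *line, unsigned int *count,
                           unsigned int *part)
{
    char kind[16];

    assert(line != NULL && count != NULL && part != NULL);
    if (sscanf(line, "%15[A-Za-z]#%u Part%u", kind, count, part) != 3)
        return -1;
    return 0;
}
```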
Reading the data
The dump data can be read from the pstore filesystem. The format for these files is dmesg-pstore-blk-[N] for oops/panic front-end, pmsg-pstore-blk-0 for pmsg front-end and so on. The timestamp of the dump file records the trigger time. To delete a stored record from block device, simply unlink the respective pstore file.
Attentions in panic read/write APIs
On panic, the kernel is not going to run for much longer: tasks will not be scheduled and most kernel resources will be out of service. The situation resembles a single-threaded program running on a single-core computer.
The following points require special attention for panic read/write APIs:
Can NOT allocate any memory. If you need memory, just allocate while the block driver is initializing rather than waiting until the panic.
Must be polled, NOT interrupt driven. No tasks are scheduled any more. The block driver should delay to ensure the write succeeds, but NOT sleep.
Can NOT take any lock. There is no other task, nor any shared resource; you are safe to break all locks.
Just use CPU to transfer. Do not use DMA to transfer unless you are sure that DMA will not keep lock.
Control registers directly. Please control registers directly rather than use Linux kernel resources. Do I/O map while initializing rather than wait until a panic occurs.
Reset your block device and controller if necessary. If you are not sure of the state of your block device and controller when a panic occurs, you are safe to stop and reset them.
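The points above can be illustrated with a small mock in user-space C. Everything here is hypothetical: `struct mock_regs` stands in for a register block that a real driver would map during initialization, and `panic_write_polled()` is not a pstore/blk API. The sketch only shows the shape of a panic-safe write: a buffer allocated ahead of time, CPU copies, a bounded busy-wait instead of sleeping, and no locks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PANIC_BUF_SIZE 4096
#define POLL_LIMIT     1000000UL
#define MOCK_READY     0x1u

/* Hypothetical register block; a real driver would map it at init time. */
struct mock_regs {
    volatile uint32_t status; /* bit 0 set: ready for the next byte */
    volatile uint32_t data;   /* byte to transfer */
};

/* Allocated statically, so the panic path never allocates memory. */
static uint8_t panic_buf[PANIC_BUF_SIZE];

/*
 * Polled write: no locks, no sleeping, no allocation, CPU copies only.
 * Input longer than PANIC_BUF_SIZE is truncated.
 * Returns the number of bytes written, or -1 if the device did not
 * become ready within POLL_LIMIT polls.
 */
static long panic_write_polled(struct mock_regs *regs, const uint8_t *src,
                               size_t len)
{
    unsigned long spins;
    size_t i;

    assert(regs != NULL && src != NULL);
    if (len > PANIC_BUF_SIZE)
        len = PANIC_BUF_SIZE;

    /* Stage the data in the preallocated buffer using the CPU. */
    for (i = 0; i < len; i++)
        panic_buf[i] = src[i];

    for (i = 0; i < len; i++) {
        /* Bounded busy-wait on the status register; never sleep. */
        for (spins = 0; !(regs->status & MOCK_READY); spins++)
            if (spins >= POLL_LIMIT)
                return -1;
        regs->data = panic_buf[i];
    }
    return (long)len;
}
```

A real driver would replace the mock register accesses with its own hardware accesses and might reset the controller before the first write.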
pstore/blk supports psblk_blkdev_info(), which is defined in linux/pstore_blk.h, to get information about the block device in use, such as the device number, sector count and start sector of the whole disk.
pstore block internals
For developer reference, here are all the important structures and APIs:
- struct psz_buffer
header of zone to flush to storage
Definition:
struct psz_buffer {
#define PSZ_SIG (0x43474244)
    uint32_t sig;
    atomic_t datalen;
    atomic_t start;
    uint8_t data[];
};
Members
sig: signature to indicate header (PSZ_SIG xor PSZONE-type value)
datalen: length of data in data
start: offset into data where the stored bytes begin
data: zone data.
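The member descriptions above can be exercised with a short user-space sketch. The signature helpers follow the "PSZ_SIG xor type" description directly. The `psz_linearize()` helper is an assumption about how `start` and `datalen` might be used to read a wrapped buffer in order (bytes from `start` to the end first, then the bytes before `start`); all function names are hypothetical and none of this is the kernel's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PSZ_SIG 0x43474244u /* value quoted in the definition above */

/* Build the header signature for a zone of the given type value. */
static uint32_t psz_make_sig(uint32_t type)
{
    return PSZ_SIG ^ type;
}

/* Check whether a stored signature belongs to the given type value. */
static int psz_sig_matches(uint32_t sig, uint32_t type)
{
    return sig == (PSZ_SIG ^ type);
}

/*
 * Copy the stored bytes into out in logical order.
 * Assumption: data[start..datalen) holds the oldest bytes and
 * data[0..start) the newest ones. out must hold datalen bytes.
 * Returns the number of bytes copied.
 */
static size_t psz_linearize(const uint8_t *data, size_t datalen,
                            size_t start, uint8_t *out)
{
    assert(data != NULL && out != NULL && start <= datalen);
    memcpy(out, data + start, datalen - start);
    memcpy(out + (datalen - start), data, start);
    return datalen;
}
```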
- struct psz_kmsg_header
kmsg dump-specific header to flush to storage
Definition:
struct psz_kmsg_header {
#define PSTORE_KMSG_HEADER_MAGIC 0x4dfc3ae5
    uint32_t magic;
    struct timespec64 time;
    bool compressed;
    uint32_t counter;
    enum kmsg_dump_reason reason;
    uint8_t data[];
};
Members
magic: magic num for kmsg dump header
time: kmsg dump trigger time
compressed: whether compressed
counter: kmsg dump counter
reason: the kmsg dump reason (e.g. oops, panic, etc)
data: pointer to log data
Description
This is a sub-header for a kmsg dump, trailing after psz_buffer.
- struct pstore_zone
single stored buffer
Definition:
struct pstore_zone {
    loff_t off;
    const char *name;
    enum pstore_type_id type;
    struct psz_buffer *buffer;
    struct psz_buffer *oldbuf;
    size_t buffer_size;
    bool should_recover;
    atomic_t dirty;
};
Members
off: zone offset of storage
name: front-end name for this zone
type: front-end type for this zone
buffer: pointer to data buffer managed by this zone
oldbuf: pointer to old data buffer
buffer_size: bytes in buffer->data
should_recover: whether this zone should recover from storage
dirty: whether the data in buffer is dirty
Description
zone structure in memory.
- struct psz_context
all about running state of pstore/zone
Definition:
struct psz_context {
    struct pstore_zone **kpszs;
    struct pstore_zone *ppsz;
    struct pstore_zone *cpsz;
    struct pstore_zone **fpszs;
    unsigned int kmsg_max_cnt;
    unsigned int kmsg_read_cnt;
    unsigned int kmsg_write_cnt;
    unsigned int pmsg_read_cnt;
    unsigned int console_read_cnt;
    unsigned int ftrace_max_cnt;
    unsigned int ftrace_read_cnt;
    unsigned int oops_counter;
    unsigned int panic_counter;
    atomic_t recovered;
    atomic_t on_panic;
    struct mutex pstore_zone_info_lock;
    struct pstore_zone_info *pstore_zone_info;
    struct pstore_info pstore;
};
Members
kpszs: kmsg dump storage zones
ppsz: pmsg storage zone
cpsz: console storage zone
fpszs: ftrace storage zones
kmsg_max_cnt: max count of kpszs
kmsg_read_cnt: counter of total read kmsg dumps
kmsg_write_cnt: counter of total kmsg dump writes
pmsg_read_cnt: counter of total read pmsg zone
console_read_cnt: counter of total read console zone
ftrace_max_cnt: max count of fpszs
ftrace_read_cnt: counter of max read ftrace zone
oops_counter: counter of oops dumps
panic_counter: counter of panic dumps
recovered: whether finished recovering data from storage
on_panic: whether panic is happening
pstore_zone_info_lock: lock to pstore_zone_info
pstore_zone_info: information from backend
pstore: structure for pstore
- enum psz_flush_mode
flush mode for psz_zone_write()
Constants
FLUSH_NONE: do not flush to storage but update data on memory
FLUSH_PART: just flush part of data including meta data to storage
FLUSH_META: just flush meta data of zone to storage
FLUSH_ALL: flush all of zone
- int psz_recovery(struct psz_context *cxt)
recover data from storage
Parameters
struct psz_context *cxt: the context of pstore/zone
Description
recovery means reading data back from storage after rebooting
Return
0 on success, others on failure.
- struct pstore_zone_info
pstore/zone back-end driver structure
Definition:
struct pstore_zone_info {
    struct module *owner;
    const char *name;
    unsigned long total_size;
    unsigned long kmsg_size;
    int max_reason;
    unsigned long pmsg_size;
    unsigned long console_size;
    unsigned long ftrace_size;
    pstore_zone_read_op read;
    pstore_zone_write_op write;
    pstore_zone_erase_op erase;
    pstore_zone_write_op panic_write;
};
Members
owner: Module which is responsible for this back-end driver.
name: Name of the back-end driver.
total_size: The total size in bytes pstore/zone can use. It must be greater than 4096 and be a multiple of 4096.
kmsg_size: The size of oops/panic zone. Zero means disabled, otherwise, it must be a multiple of SECTOR_SIZE (512 Bytes).
max_reason: Maximum kmsg dump reason to store.
pmsg_size: The size of pmsg zone, which follows the same rules as kmsg_size.
console_size: The size of console zone, which follows the same rules as kmsg_size.
ftrace_size: The size of ftrace zone, which follows the same rules as kmsg_size.
read: The general read operation. Both of the function parameters size and offset are relative values to storage. On success, the number of bytes should be returned; other values mean error.
write: The same as read, but with the following error numbers: -EBUSY means try to write again later. -ENOMSG means to try next zone.
erase: The general erase operation for devices with a special removing job. Both of the function parameters size and offset are relative values to storage. Return 0 on success and others on failure.
panic_write: The write operation only used for the panic case. It’s optional if you do not care about the panic log. The parameters are relative values to storage. On success, the number of bytes should be returned; other values, excluding -ENOMSG, mean error. -ENOMSG means to try next zone.
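The size constraints and write return conventions described above can be summarized in a short user-space sketch. `struct zone_sizes`, `check_zone_sizes()` and `classify_write_result()` are hypothetical names used for illustration only; the checks follow the wording of the member descriptions literally and are not the kernel's own validation code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SECTOR_SIZE 512UL

/* Hypothetical subset of the size fields described above. */
struct zone_sizes {
    unsigned long total_size;
    unsigned long kmsg_size;
    unsigned long pmsg_size;
    unsigned long console_size;
    unsigned long ftrace_size;
};

/* Zero disables a zone; otherwise it must be a multiple of SECTOR_SIZE. */
static int zone_size_ok(unsigned long size)
{
    return size % SECTOR_SIZE == 0;
}

/* Returns 0 if the sizes satisfy the rules as worded above, else -EINVAL. */
static int check_zone_sizes(const struct zone_sizes *z)
{
    assert(z != NULL);
    if (z->total_size <= 4096 || z->total_size % 4096 != 0)
        return -EINVAL;
    if (!zone_size_ok(z->kmsg_size) || !zone_size_ok(z->pmsg_size) ||
        !zone_size_ok(z->console_size) || !zone_size_ok(z->ftrace_size))
        return -EINVAL;
    return 0;
}

/* What a caller might do with the return value of a write operation. */
enum write_action {
    WRITE_DONE,        /* non-negative: number of bytes written */
    WRITE_RETRY_LATER, /* -EBUSY */
    WRITE_NEXT_ZONE,   /* -ENOMSG */
    WRITE_FAILED       /* any other negative value */
};

static enum write_action classify_write_result(long ret)
{
    if (ret >= 0)
        return WRITE_DONE;
    if (ret == -EBUSY)
        return WRITE_RETRY_LATER;
    if (ret == -ENOMSG)
        return WRITE_NEXT_ZONE;
    return WRITE_FAILED;
}
```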
- struct pstore_device_info
back-end pstore/blk driver structure.
Definition:
struct pstore_device_info {
    unsigned int flags;
    struct pstore_zone_info zone;
};
Members
flags: Refer to macros starting with PSTORE_FLAGS defined in linux/pstore.h. It means what front-ends this device supports. Zero means all backends, for compatibility.
zone: The struct pstore_zone_info details.
- struct pstore_blk_config
the pstore_blk backend configuration
Definition:
struct pstore_blk_config {
    char device[80];
    enum kmsg_dump_reason max_reason;
    unsigned long kmsg_size;
    unsigned long pmsg_size;
    unsigned long console_size;
    unsigned long ftrace_size;
};
Members
device: Name of the desired block device
max_reason: Maximum kmsg dump reason to store to block device
kmsg_size: Total size for kmsg dumps
pmsg_size: Total size of the pmsg storage area
console_size: Total size of the console storage area
ftrace_size: Total size for ftrace logging data (for all CPUs)
- int pstore_blk_get_config(struct pstore_blk_config *info)
get a copy of the pstore_blk backend configuration
Parameters
struct pstore_blk_config *info: The struct pstore_blk_config to be filled in
Description
Failure returns negative error code, and success returns 0.