Explicit volatile write back cache control
Introduction
Many storage devices, especially in the consumer market, come with volatile write back caches. That means the devices signal I/O completion to the operating system before the data has actually reached non-volatile storage. This behavior speeds up various workloads, but it means the operating system needs to force data out to non-volatile storage when it performs a data integrity operation such as fsync, sync or an unmount.
The Linux block layer provides two simple mechanisms that let filesystems control the caching behavior of the storage device. These mechanisms are a forced cache flush and the Force Unit Access (FUA) flag for requests.
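Seen from user space, a data integrity operation such as the fsync mentioned above looks like the following minimal sketch. The helper name write_durably is hypothetical and only serves this illustration. Whether the kernel issues a cache flush underneath depends on the filesystem and on the device.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes from buf to path and return only after fsync() has
 * reported success. Returns 0 on success and -1 on any failure.
 * write_durably is a hypothetical helper, not a kernel or libc API.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd;

    assert(path != NULL && buf != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may accept fewer bytes than requested, so loop. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /*
     * Data integrity operation: ask the kernel to push the file's data
     * and metadata to the storage device before returning.
     */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller could use write_durably("/tmp/example.txt", "hello\n", 6) and treat a return value of 0 as "the kernel has reported the data as stable".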
Explicit cache flushes
The REQ_PREFLUSH flag can be ORed into the r/w flags of a bio submitted from the filesystem and will make sure the volatile cache of the storage device has been flushed before the actual I/O operation is started. This explicitly guarantees that previously completed write requests are on non-volatile storage before the flagged bio starts. In addition, the REQ_PREFLUSH flag can be set on an otherwise empty bio structure, which causes only an explicit cache flush without any dependent I/O. It is recommended to use the blkdev_issue_flush() helper for a pure cache flush.
Forced Unit Access
The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the filesystem and will make sure that I/O completion for this request is only signaled after the data has been committed to non-volatile storage.
Implementation details for filesystems
Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to worry about whether the underlying devices need any explicit cache flushing or how the Forced Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags may both be set on a single bio.
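The semantics of the two flags can be illustrated with a small user space model of a device that has a volatile cache. This is a toy sketch, not kernel code: struct toy_dev, the toy_* functions and the TOY_* constants are hypothetical stand-ins for the real bio interface and flag values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NBLOCKS  8
#define TOY_PREFLUSH (1u << 0)  /* stand-in for REQ_PREFLUSH */
#define TOY_FUA      (1u << 1)  /* stand-in for REQ_FUA */
#define TOY_NO_DATA  (-1)       /* marks a request without payload */

struct toy_dev {
    int  media[TOY_NBLOCKS];    /* non-volatile storage */
    int  cache[TOY_NBLOCKS];    /* volatile write back cache */
    bool dirty[TOY_NBLOCKS];    /* cache entry not yet on media */
};

void toy_init(struct toy_dev *d)
{
    memset(d, 0, sizeof(*d));
}

/* Move every dirty cache entry to the non-volatile media. */
void toy_flush(struct toy_dev *d)
{
    for (int i = 0; i < TOY_NBLOCKS; i++) {
        if (d->dirty[i]) {
            d->media[i] = d->cache[i];
            d->dirty[i] = false;
        }
    }
}

/* Model of submitting one write request with optional flags. */
void toy_submit(struct toy_dev *d, int block, int value, unsigned flags)
{
    /* PREFLUSH: earlier completed writes reach media first. */
    if (flags & TOY_PREFLUSH)
        toy_flush(d);

    /* An empty request is a pure cache flush. */
    if (block == TOY_NO_DATA)
        return;

    assert(block >= 0 && block < TOY_NBLOCKS);

    if (flags & TOY_FUA) {
        /* FUA: this write completes only once it is on media. */
        d->media[block] = value;
        d->dirty[block] = false;
    } else {
        /* Plain write: completion is signaled from the cache. */
        d->cache[block] = value;
        d->dirty[block] = true;
    }
}

/* Power loss discards whatever is still only in the cache. */
void toy_power_loss(struct toy_dev *d)
{
    memset(d->cache, 0, sizeof(d->cache));
    memset(d->dirty, 0, sizeof(d->dirty));
}
```

In this model, a plain write followed by a write with TOY_PREFLUSH | TOY_FUA leaves both blocks on the media after toy_power_loss(), while a later plain write with no flush after it is lost.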
Implementation details for bio based block drivers
These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit directly below the submit_bio interface. For remapping drivers the REQ_FUA bit needs to be propagated to the underlying devices, and a global flush needs to be implemented for bios with the REQ_PREFLUSH bit set. For real device drivers that do not have a volatile cache, the REQ_PREFLUSH and REQ_FUA bits on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without data can be completed successfully without doing any work. Drivers for devices with volatile caches need to implement the support for these flags themselves without any help from the block layer.
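The remapping case can be sketched with a user space model of a driver that concatenates two underlying devices. It assumes that "global flush" means the flush is sent to every underlying device, while the FUA bit travels only with the write to the device that receives the data. All rmap_* and RMAP_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define RMAP_PREFLUSH    (1u << 0)  /* stand-in for REQ_PREFLUSH */
#define RMAP_FUA         (1u << 1)  /* stand-in for REQ_FUA */
#define RMAP_LEGS        2          /* number of underlying devices */
#define RMAP_LEG_SECTORS 100L       /* size of each underlying device */

/* Counters recording what each underlying device was asked to do. */
struct rmap_leg {
    unsigned flushes;
    unsigned fua_writes;
    unsigned plain_writes;
};

struct rmap_concat {
    struct rmap_leg leg[RMAP_LEGS];
};

/* Model of a remapping driver handling one incoming write bio. */
void rmap_submit(struct rmap_concat *c, long sector, bool has_data,
                 unsigned flags)
{
    /* Global flush: every underlying device must flush its cache. */
    if (flags & RMAP_PREFLUSH) {
        for (int i = 0; i < RMAP_LEGS; i++)
            c->leg[i].flushes++;
    }

    /* An empty bio carries no write to remap. */
    if (!has_data)
        return;

    assert(sector >= 0 && sector < RMAP_LEGS * RMAP_LEG_SECTORS);

    /* Remap the write and propagate the FUA bit with it. */
    struct rmap_leg *target = &c->leg[sector / RMAP_LEG_SECTORS];

    if (flags & RMAP_FUA)
        target->fua_writes++;
    else
        target->plain_writes++;
}
```

Submitting one write to sector 150 with RMAP_PREFLUSH | RMAP_FUA results in one flush on each leg and a single FUA write on the second leg only.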
Implementation details for request_fn based block drivers
For devices that do not support volatile write caches there is no driver support required; the block layer completes empty REQ_PREFLUSH requests before entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from requests that have a payload. For devices with volatile write caches the driver needs to tell the block layer that it supports flushing caches by doing:
blk_queue_write_cache(sdkp->disk->queue, true, false);
and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that REQ_PREFLUSH requests with a payload are automatically turned into a sequence of an empty REQ_OP_FLUSH request followed by the actual write by the block layer. For devices that also support the FUA bit the block layer needs to be told to pass through the REQ_FUA bit using:
blk_queue_write_cache(sdkp->disk->queue, true, true);
and the driver must handle write requests that have the REQ_FUA bit set in prep_fn/request_fn. If the FUA bit is not natively supported the block layer turns it into an empty REQ_OP_FLUSH request after the actual write.
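The rules in this section can be summarized as a small decision function. The following is a user space sketch of the behavior described above, not the block layer's actual code. The seq_* and SEQ_* names are hypothetical; has_wc and has_fua correspond to the two boolean arguments passed to blk_queue_write_cache(). The model only covers FUA on requests that carry data.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the request flags set by the filesystem. */
#define SEQ_REQ_PREFLUSH   (1u << 0)
#define SEQ_REQ_FUA        (1u << 1)

/* Steps the driver ends up seeing, in this order. */
#define SEQ_STEP_PREFLUSH  (1u << 0)  /* empty flush before the write */
#define SEQ_STEP_DATA      (1u << 1)  /* the actual write */
#define SEQ_STEP_POSTFLUSH (1u << 2)  /* empty flush after the write */

/* Return the set of steps issued to the driver for one request. */
unsigned seq_plan(bool has_wc, bool has_fua, unsigned req_flags,
                  bool has_data)
{
    unsigned steps = 0;

    /* This model does not cover FUA on a request without payload. */
    assert(has_data || !(req_flags & SEQ_REQ_FUA));

    if (has_data)
        steps |= SEQ_STEP_DATA;

    /* Without a volatile cache both flags are simply stripped. */
    if (!has_wc)
        return steps;

    if (req_flags & SEQ_REQ_PREFLUSH)
        steps |= SEQ_STEP_PREFLUSH;

    /* FUA without native support is emulated by a later flush. */
    if ((req_flags & SEQ_REQ_FUA) && !has_fua)
        steps |= SEQ_STEP_POSTFLUSH;

    return steps;
}

/* True if the write reaches the driver with the FUA bit still set. */
bool seq_driver_sees_fua(bool has_wc, bool has_fua, unsigned req_flags)
{
    return has_wc && has_fua && (req_flags & SEQ_REQ_FUA);
}
```

For a write with both flags set, a device with a cache but no native FUA yields all three steps, while a device without a volatile cache sees only the plain write.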