zswap
Overview
Zswap is a lightweight compressed cache for swap pages. It takes pages that are in the process of being swapped out and attempts to compress them into a dynamically allocated RAM-based memory pool. Zswap basically trades CPU cycles for potentially reduced swap I/O. This trade-off can also result in a significant performance improvement if reads from the compressed cache are faster than reads from a swap device.
Note
Zswap is a new feature as of v3.11 and interacts heavily with memory reclaim. This interaction has not been fully explored on the large set of potential configurations and workloads that exist. For this reason, zswap is a work in progress and should be considered experimental.
Some potential benefits:
- Desktop/laptop users with limited RAM capacities can mitigate the performance impact of swapping.
- Overcommitted guests that share a common I/O resource can dramatically reduce their swap I/O pressure, avoiding heavy-handed I/O throttling by the hypervisor. This allows more work to get done with less impact to the guest workload and to guests sharing the I/O subsystem.
- Users with SSDs as swap devices can extend the life of the device by drastically reducing life-shortening writes.
Zswap evicts pages from the compressed cache on an LRU basis to the backing swap device when the compressed pool reaches its size limit. This requirement had been identified in prior community discussions.
Whether zswap is enabled at boot time depends on whether the CONFIG_ZSWAP_DEFAULT_ON Kconfig option is enabled. This setting can be overridden with the zswap.enabled= kernel command line option, for example zswap.enabled=0. Zswap can also be enabled and disabled at runtime using the sysfs interface. An example command to enable zswap at runtime, assuming sysfs is mounted at /sys, is:
echo 1 > /sys/module/zswap/parameters/enabled
When zswap is disabled at runtime it will stop storing pages that are being swapped out. However, it will _not_ immediately write out or fault back into memory all of the pages stored in the compressed pool. The pages stored in zswap will remain in the compressed pool until they are either invalidated or faulted back into memory. In order to force all pages out of the compressed pool, a swapoff on the swap device(s) will fault back into memory all swapped out pages, including those in the compressed pool.
Design
Zswap receives pages for compression through the Frontswap API and is able to evict pages from its own compressed pool on an LRU basis and write them back to the backing swap device in the case that the compressed pool is full.
Zswap makes use of zpool for managing the compressed memory pool. Each allocation in zpool is not directly accessible by address. Rather, a handle is returned by the allocation routine and that handle must be mapped before being accessed. The compressed memory pool grows on demand and shrinks as compressed pages are freed. The pool is not preallocated. By default, a zpool of the type selected in the CONFIG_ZSWAP_ZPOOL_DEFAULT Kconfig option is created, but it can be overridden at boot time by setting the zpool attribute, e.g. zswap.zpool=zbud. It can also be changed at runtime using the sysfs zpool attribute, e.g.:
echo zbud > /sys/module/zswap/parameters/zpool
The zbud type zpool allocates exactly 1 page to store 2 compressed pages, which means the compression ratio will always be 2:1 or worse (because of half-full zbud pages). The zsmalloc type zpool has a more complex compressed page storage method, and it can achieve greater storage densities. However, zsmalloc does not implement compressed page eviction, so once zswap fills, it cannot evict the oldest page; it can only reject new pages.
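The 2:1 bound can be illustrated with a little arithmetic. The following is a simplified sketch, not the zbud allocator itself: it assumes a 4096-byte page, ignores per-page headers and chunk granularity, and pairs compressed pages with a greedy first-fit rule. The function name `zbud_pages_needed` is hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def zbud_pages_needed(compressed_sizes):
    """Count pool pages when each pool page holds at most two compressed pages.

    Greedy first-fit pairing; a real allocator is more involved.
    """
    free_space = []  # bytes left in pool pages that hold exactly one buddy
    pages = 0
    for size in compressed_sizes:
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("compressed size must fit in one page")
        for i, free in enumerate(free_space):
            if size <= free:
                # Second buddy placed: the page is now closed to new entries.
                free_space.pop(i)
                break
        else:
            # No half-full page can take this entry, so open a new page.
            pages += 1
            free_space.append(PAGE_SIZE - size)
    return pages


def storage_ratio(compressed_sizes):
    """Stored pages per pool page (2.0 is the best possible value)."""
    return len(compressed_sizes) / zbud_pages_needed(compressed_sizes)
```

Under these assumptions, even pages that compress to 100 bytes never exceed a ratio of 2.0, and pages that compress poorly (say, to 3000 bytes) each occupy a pool page alone, giving a ratio of 1.0.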
When a swap page is passed from frontswap to zswap, zswap maintains a mapping of the swap entry, a combination of the swap type and swap offset, to the zpool handle that references that compressed swap page. This mapping is achieved with a red-black tree per swap type. The swap offset is the search key for the tree nodes.
During a page fault on a PTE that is a swap entry, frontswap calls the zswap load function to decompress the page into the page allocated by the page fault handler.
Once there are no PTEs referencing a swap page stored in zswap (i.e. the count in the swap_map goes to 0) the swap code calls the zswap invalidate function, via frontswap, to free the compressed entry.
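The store, load and invalidate paths described above all go through the same lookup structure. The sketch below models that structure only; it is an illustration under assumptions, not kernel code. A plain dictionary stands in for each red-black tree, handles are opaque integers, and the class and method names are hypothetical.

```python
class SwapEntryMapSketch:
    """One lookup structure per swap type, keyed by swap offset."""

    def __init__(self):
        # swap type -> {swap offset: zpool handle}
        # (a dict stands in for the per-type red-black tree)
        self.trees = {}

    def store(self, swap_type, offset, handle):
        """Record the handle for a swap entry; return any handle it replaces."""
        tree = self.trees.setdefault(swap_type, {})
        old = tree.get(offset)
        tree[offset] = handle
        return old

    def load(self, swap_type, offset):
        """Look up the handle on a swap-in fault; None if the entry is absent."""
        return self.trees.get(swap_type, {}).get(offset)

    def invalidate(self, swap_type, offset):
        """Drop the mapping once nothing references the swap page any more."""
        return self.trees.get(swap_type, {}).pop(offset, None)
```

Keeping one structure per swap type means the swap offset alone is enough as the search key, which matches the description above.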
Zswap seeks to be simple in its policies. Sysfs attributes allow for one user-controlled policy:
- max_pool_percent - The maximum percentage of memory that the compressed pool can occupy.
The default compressor is selected in the CONFIG_ZSWAP_COMPRESSOR_DEFAULT Kconfig option, but it can be overridden at boot time by setting the compressor attribute, e.g. zswap.compressor=lzo. It can also be changed at runtime using the sysfs “compressor” attribute, e.g.:
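As a worked example of what this percentage means in pages, the sketch below converts max_pool_percent into a page limit. It assumes the limit is taken against total RAM and rounded down; the function name and the sample numbers are illustrative only.

```python
def max_pool_pages(total_ram_pages, max_pool_percent):
    """Upper bound on the compressed pool, in pages (integer division)."""
    if not 0 <= max_pool_percent <= 100:
        raise ValueError("max_pool_percent must be between 0 and 100")
    return total_ram_pages * max_pool_percent // 100


# Hypothetical machine: 4 GiB of RAM in 4 KiB pages, pool capped at 20 percent.
total_pages = (4 * 1024 ** 3) // 4096
limit = max_pool_pages(total_pages, 20)
```

With these sample values the pool may grow to 209715 pages, or roughly 819 MiB of compressed data.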
echo lzo > /sys/module/zswap/parameters/compressor
When the zpool and/or compressor parameter is changed at runtime, any existing compressed pages are not modified; they are left in their own zpool. When a request is made for a page in an old zpool, it is uncompressed using its original compressor. Once all pages are removed from an old zpool, the zpool and its compressor are freed.
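The lifetime rule for old pools can be modelled with a simple entry count per pool. This is a toy model under assumptions, not the kernel's data structures: all class and method names are hypothetical, and "freeing" a pool is represented by dropping it from a list.

```python
class PoolSketch:
    """A zpool together with the compressor its entries were stored with."""

    def __init__(self, zpool_type, compressor):
        self.zpool_type = zpool_type
        self.compressor = compressor
        self.entries = 0  # compressed pages still held in this pool


class ZswapPoolsSketch:
    def __init__(self, zpool_type, compressor):
        self.current = PoolSketch(zpool_type, compressor)
        self.old_pools = []  # retired pools that still hold pages

    def change_params(self, zpool_type=None, compressor=None):
        """Switch to a new pool; existing pages stay where they are."""
        old = self.current
        self.current = PoolSketch(zpool_type or old.zpool_type,
                                  compressor or old.compressor)
        if old.entries:
            self.old_pools.append(old)  # kept alive until it drains

    def store(self):
        """New pages always go to the current pool."""
        self.current.entries += 1
        return self.current

    def remove(self, pool):
        """Remove one page; a drained old pool is released."""
        pool.entries -= 1
        if pool.entries == 0 and pool is not self.current:
            self.old_pools.remove(pool)
```

Because each stored page keeps a reference to the pool it went into, a later load can still find the compressor that was in effect at store time.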
Some of the pages in zswap are same-value filled pages (i.e. the contents of the page have the same value or a repetitive pattern). These pages include zero-filled pages, and they are handled differently. During a store operation, a page is checked to see whether it is a same-value filled page before it is compressed. If it is, the compressed length of the page is set to zero and the pattern or same-filled value is stored.
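The check described above can be sketched as follows. This is an illustration under stated assumptions rather than the kernel routine: it assumes a 4096-byte page and an 8-byte word as the repeating unit, uses zlib as a stand-in for the configured compressor, and represents a stored entry as a plain dictionary. The function names are hypothetical.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes
WORD_SIZE = 8     # assumed size of the repeating unit in bytes


def same_filled_value(page):
    """Return the repeated word if the page is same-value filled, else None."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected exactly one page of data")
    first = page[:WORD_SIZE]
    if page == first * (PAGE_SIZE // WORD_SIZE):
        return int.from_bytes(first, "little")
    return None


def store_page(page):
    """Store a page: keep only the value if same-filled, else compress it."""
    value = same_filled_value(page)
    if value is not None:
        # Length 0 marks the entry as same-filled; no pool space is used.
        return {"length": 0, "value": value}
    data = zlib.compress(page)  # stand-in for the configured compressor
    return {"length": len(data), "data": data}
```

A zero-filled page is simply the case where the repeated word is 0, so it needs no separate handling in this sketch.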
Same-value filled page identification is enabled by default and can be disabled at boot time by setting the same_filled_pages_enabled attribute to 0, e.g. zswap.same_filled_pages_enabled=0. It can also be enabled and disabled at runtime using the sysfs same_filled_pages_enabled attribute, e.g.:
echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled
When zswap same-filled page identification is disabled at runtime, it will stop checking for same-value filled pages during store operations. However, the existing pages which are marked as same-value filled pages remain stored unchanged in zswap until they are either loaded or invalidated.
When zswap is full and swap pressure is high, shrinking the pool only flips pages in and out of the zswap pool, with no real benefit and a performance drop for the system. To prevent this, a special parameter implements a form of hysteresis: once the limit has been hit, zswap refuses to take pages into the pool until it has sufficient free space. To set the threshold at which zswap starts accepting pages again after it has become full, use the sysfs accept_threshold_percent attribute, e.g.:
echo 80 > /sys/module/zswap/parameters/accept_threshold_percent
Setting this parameter to 100 will disable the hysteresis.
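The hysteresis can be pictured as a small state machine. The sketch below is a model under assumptions, not the kernel logic: it takes the threshold as a percentage of the maximum pool size, does not model pool shrinking or writeback, and the exact boundary comparisons are illustrative. The class name is hypothetical.

```python
class PoolGateSketch:
    """Decide whether new pages may enter the pool, with hysteresis."""

    def __init__(self, max_pages, accept_threshold_percent):
        self.max_pages = max_pages
        # Pool size at or below which stores resume after the pool was full.
        self.accept_pages = max_pages * accept_threshold_percent // 100
        self.pool_reached_full = False

    def can_accept(self, pool_pages):
        if pool_pages >= self.max_pages:
            self.pool_reached_full = True  # limit hit: start refusing
            return False
        if self.pool_reached_full:
            if pool_pages > self.accept_pages:
                return False  # still above the threshold: keep refusing
            self.pool_reached_full = False  # drained enough: accept again
        return True
```

With a threshold of 80 and a limit of 1000 pages, a pool that has hit the limit keeps refusing pages until it drops to 800 pages. With a threshold of 100 the two limits coincide, so stores resume as soon as the pool is below its maximum, which is the "hysteresis disabled" case.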
A debugfs interface is provided for various statistics about pool size, number of pages stored, same-value filled pages and various counters for the reasons pages are rejected.
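A minimal sketch for collecting these statistics from a script follows. It assumes that debugfs is mounted at /sys/kernel/debug, that the zswap counters live in a zswap directory beneath it with one integer per file, and that the caller has permission to read them (typically root). The function name is hypothetical, and the directory is a parameter so the location can be adjusted.

```python
from pathlib import Path


def read_zswap_stats(debug_dir="/sys/kernel/debug/zswap"):
    """Return {file name: integer value} for every counter file in debug_dir.

    The default path is an assumption about where debugfs is mounted.
    """
    stats = {}
    for entry in sorted(Path(debug_dir).iterdir()):
        if not entry.is_file():
            continue
        try:
            stats[entry.name] = int(entry.read_text().strip())
        except ValueError:
            continue  # skip files that do not hold a single integer
    return stats
```

Calling `read_zswap_stats()` at intervals and comparing the results gives a rough view of how the pool and the rejection counters change under load.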