Generic Radix Page Table
Generic Radix Page Table is a set of functions and helpers to efficiently parse radix style page tables typically seen in HW implementations. The interface is built to deliver similar code generation as the mm’s pte/pmd/etc system by fully inlining the exact code required to handle each table level.
Like the mm subsystem each format contributes its parsing implementation under common names and the common code implements the required algorithms.
The system is divided into three logical levels:
The page table format and its manipulation functions
Generic helpers to give a consistent API regardless of underlying format
An algorithm implementation (e.g. IOMMU/DRM/KVM/MM)
Multiple implementations are supported. The intention is to have the generic format code be re-usable for whatever specialized implementation is required. The generic code is solely about the format of the radix tree; it does not include memory allocation or higher level decisions that are left for the implementation.
The generic framework supports a superset of functions across many HW implementations:
Entries comprised of contiguous blocks of IO PTEs for larger page sizes
Multi-level tables, up to 6 levels. Runtime selected top level
Runtime variable table level size (ARM’s concatenated tables)
Expandable top level allowing dynamic sizing of table levels
Optional leaf entries at any level
32-bit/64-bit virtual and output addresses, using every address bit
Dirty tracking
Sign extended addressing
- Language used in Generic Page Table
- VA
The input address to the page table, often the virtual address.
- OA
The output address from the page table, often the physical address.
- leaf
An entry that results in an output address.
- start/end
A half-open range, e.g. [0,0) refers to no VA.
- start/last
An inclusive closed range, e.g. [0,0] refers to the VA 0
- common
The generic page table container
struct pt_common
- level
Level 0 is always a table of only leaves with no further table pointers. Increasing levels increase the size of the table items. The least significant VA bits used to index page tables are used to index the Level 0 table. The various labels for table levels used by HW descriptions are not used.
- top_level
The inclusive highest level of the table. A two-level table has a top level of 1.
- table
A linear array of translation items for that level.
- index
The position in a table of an element: item = table[index]
- item
A single index in a table
- entry
A single logical element in a table. If contiguous pages are not supported then item and entry are the same thing, otherwise entry refers to all the items that comprise a single contiguous translation.
- item/entry_size
The number of bytes of VA the table index translates for. If the item is a table entry then the next table covers this size. If the entry translates to an output address then the full OA is: OA | (VA % entry_size)
- contig_count
The number of consecutive items fused into a single entry. item_size * contig_count is the size of that entry’s translation.
- lg2
Indicates the value is encoded as log2, i.e. 1<<x is the actual value. Normally the compiler is fine to optimize divide and mod with log2 values automatically when inlining, however if the values are not constant expressions it can’t. So we do it by hand; we want to avoid 64-bit divmod.
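The lg2 convention and the "OA | (VA % entry_size)" rule from the glossary can be illustrated with a small user-space sketch. The names ex_log2_div(), ex_log2_mod() and ex_full_oa() are hypothetical stand-ins invented for this example; they only mimic the shift-and-mask idea and are not the kernel's helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's helpers. */

/* Divide by 2^lg2 using a shift instead of a 64-bit division. */
static inline uint64_t ex_log2_div(uint64_t val, unsigned int lg2)
{
	assert(lg2 < 64);
	return val >> lg2;
}

/* Remainder modulo 2^lg2 using a mask instead of a 64-bit modulo. */
static inline uint64_t ex_log2_mod(uint64_t val, unsigned int lg2)
{
	assert(lg2 < 64);
	return val & ((UINT64_C(1) << lg2) - 1);
}

/* Full OA of a leaf: the entry OA combined with the VA offset in the entry. */
static inline uint64_t ex_full_oa(uint64_t entry_oa, uint64_t va,
				  unsigned int entry_lg2sz)
{
	/* The entry OA is assumed to be aligned to the entry size. */
	assert(ex_log2_mod(entry_oa, entry_lg2sz) == 0);
	return entry_oa | ex_log2_mod(va, entry_lg2sz);
}
```

For a 2 MiB entry (entry_lg2sz of 21) the low 21 bits of the VA pass straight through to the OA, and no division instruction is needed even when entry_lg2sz is only known at run time.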
Usage
Generic PT is structured as a multi-compilation system. Since each format provides an API using a common set of names there can be only one format active within a compilation unit. This design avoids function pointers around the low level API.
Instead the function pointers can end up at the higher level API (i.e. map/unmap, etc.) and the per-format code can be directly inlined into the per-format compilation unit. For something like IOMMU each format will be compiled into a per-format IOMMU operations kernel module.
For this to work the .c file for each compilation unit will include both the format headers and the generic code for the implementation. For instance in an implementation compilation unit the headers would normally be included as follows:
generic_pt/fmt/iommu_amdv1.c:
#include <linux/generic_pt/common.h>
#include "defs_amdv1.h"
#include "../pt_defs.h"
#include "amdv1.h"
#include "../pt_common.h"
#include "../pt_iter.h"
#include "../iommu_pt.h" /* The IOMMU implementation */
iommu_pt.h includes definitions that will generate the operations functions for map/unmap/etc. using the definitions provided by AMDv1. The resulting module will have exported symbols named like pt_iommu_amdv1_init().
Refer to drivers/iommu/generic_pt/fmt/iommu_template.h for an example of how the IOMMU implementation uses multi-compilation to generate per-format ops struct pointers.
The format code is written so that the common names arise from #defines to distinct format specific names. This is intended to aid debuggability by avoiding symbol clashes across all the different formats.
Exported symbols and other global names are mangled using a per-format string via the NS() helper macro.
The format uses struct pt_common as the top-level struct for the table, and each format will have its own struct pt_xxx which embeds it to store format-specific information.
The implementation will further wrap struct pt_common in its own top-level struct, such as struct pt_iommu_amdv1.
Format functions at the struct pt_common level
- struct pt_common
struct for all page table implementations
Definition:
struct pt_common {
    uintptr_t top_of_table;
    u8 max_oasz_lg2;
    u8 max_vasz_lg2;
    unsigned int features;
};
Members
top_of_table
Encodes the table top pointer and the top level in a single value. Must use READ_ONCE/WRITE_ONCE to access it. The lower bits of the aligned table pointer are used for the level.
max_oasz_lg2
Maximum number of bits the OA can contain. Upper bits must be zero. This may be less than what the page table format supports, but must not be more.
max_vasz_lg2
Maximum number of bits the VA can contain. Upper bits are 0 or 1 depending on pt_full_va_prefix(). This may be less than what the page table format supports, but must not be more. When PT_FEAT_DYNAMIC_TOP is set this reflects the maximum VA capability.
features
Bitmap of enum pt_features
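The top_of_table member packs a pointer and a level into one word. A minimal user-space sketch of that kind of encoding follows, assuming the top table is at least 8-byte aligned so the low three bits are free. The names ex_top_encode(), ex_top_level() and ex_top_table() and the 3-bit mask are assumptions made for this illustration, and the READ_ONCE/WRITE_ONCE access the real member requires is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask: three low bits are enough for levels 0..5. */
#define EX_TOP_LEVEL_MASK ((uintptr_t)7)

/* Pack an aligned table pointer and its level into one word. */
static inline uintptr_t ex_top_encode(void *table, unsigned int level)
{
	uintptr_t ptr = (uintptr_t)table;

	assert((ptr & EX_TOP_LEVEL_MASK) == 0);
	assert(level <= EX_TOP_LEVEL_MASK);
	return ptr | level;
}

/* Recover the level from the low bits. */
static inline unsigned int ex_top_level(uintptr_t top_of_table)
{
	return (unsigned int)(top_of_table & EX_TOP_LEVEL_MASK);
}

/* Recover the table pointer by clearing the low bits. */
static inline void *ex_top_table(uintptr_t top_of_table)
{
	return (void *)(top_of_table & ~EX_TOP_LEVEL_MASK);
}
```

Keeping both values in a single word is what lets a reader observe a consistent pointer and level pair with one load.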
- enum pt_features
Features turned on in the table. Each symbol is a bit position.
Constants
PT_FEAT_DMA_INCOHERENT
Cache flush page table memory before assuming the HW can read it. Otherwise an SMP release is sufficient for HW to read it.
PT_FEAT_FULL_VA
The table can span the full VA range from 0 to PT_VADDR_MAX.
PT_FEAT_DYNAMIC_TOP
The table’s top level can be increased dynamically during map. This requires HW support for atomically setting both the table top pointer and the starting table level.
PT_FEAT_SIGN_EXTEND
The top most bit of the valid VA range sign extends up to the full pt_vaddr_t. This divides the page table into three VA ranges:
0 -> 2^N - 1              Lower
2^N -> (MAX - 2^N - 1)    Non-Canonical
MAX - 2^N -> MAX          Upper
In this mode pt_common::max_vasz_lg2 includes the sign bit and the upper bits that don’t fall within the translation are just validated.
If not set there is no sign extension and valid VA goes from 0 to 2^N - 1.
PT_FEAT_FLUSH_RANGE
IOTLB maintenance is done by flushing IOVA ranges which will clean out any walk cache or any IOPTE fully contained by the range. The optimization objective is to minimize the number of flushes even if ranges include IOVA gaps that do not need to be flushed.
PT_FEAT_FLUSH_RANGE_NO_GAPS
Like PT_FEAT_FLUSH_RANGE except that the optimization objective is to only flush IOVA that has been changed. This mode is suitable for cases like hypervisor shadowing where flushing unchanged ranges may cause the hypervisor to reparse a significant amount of the page table.
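The three VA ranges described for PT_FEAT_SIGN_EXTEND can be told apart by looking at the bits from the sign bit upward. The sketch below assumes a 64-bit VA and a vasz_lg2 that includes the sign bit, as the description of pt_common::max_vasz_lg2 states; the enum and ex_classify_va() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

enum ex_va_range { EX_VA_LOWER, EX_VA_NON_CANONICAL, EX_VA_UPPER };

/*
 * Classify a 64-bit VA. vasz_lg2 counts the sign bit, so the bits from
 * position vasz_lg2 - 1 upward must be all 0s (lower) or all 1s (upper).
 */
static inline enum ex_va_range ex_classify_va(uint64_t va,
					      unsigned int vasz_lg2)
{
	uint64_t upper_bits;

	assert(vasz_lg2 >= 2 && vasz_lg2 <= 64);
	upper_bits = va >> (vasz_lg2 - 1);
	if (upper_bits == 0)
		return EX_VA_LOWER;
	if (upper_bits == (UINT64_MAX >> (vasz_lg2 - 1)))
		return EX_VA_UPPER;
	return EX_VA_NON_CANONICAL;
}
```

With vasz_lg2 of 48 this reproduces the familiar split where addresses whose bits 47..63 are mixed fall into the non-canonical hole.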
- void pt_attr_from_entry(const struct pt_state *pts, struct pt_write_attrs *attrs)
Convert the permission bits back to attrs
Parameters
const struct pt_state *pts
Entry to convert from
struct pt_write_attrs *attrs
Resulting attrs
Description
Fill in the attrs with the permission bits encoded in the current leaf entry. The attrs should be usable with pt_install_leaf_entry() to reconstruct the same entry.
- bool pt_can_have_leaf(const struct pt_state *pts)
True if the current level can have an OA entry
Parameters
const struct pt_state *pts
The current level
Description
True if the current level can support pt_install_leaf_entry(). A leaf entry produces an OA.
- bool pt_can_have_table(const struct pt_state *pts)
True if the current level can have a lower table
Parameters
const struct pt_state *pts
The current level
Description
Every level except 0 is allowed to have a lower table.
- void pt_clear_entries(struct pt_state *pts, unsigned int num_contig_lg2)
Make entries empty (non-present)
Parameters
struct pt_state *pts
Starting table index
unsigned int num_contig_lg2
Number of contiguous items to clear
Description
Clear a run of entries. A cleared entry will load back as PT_ENTRY_EMPTY and does not have any effect on table walking. The starting index must be aligned to num_contig_lg2.
- bool pt_entry_make_write_dirty(struct pt_state *pts)
Make an entry dirty
Parameters
struct pt_state *pts
Table entry to change
Description
Make pt_entry_is_write_dirty() return true for this entry. This can be called asynchronously with any other table manipulation under an RCU lock and must not corrupt the table.
- void pt_entry_make_write_clean(struct pt_state *pts)
Make the entry write clean
Parameters
struct pt_state *pts
Table entry to change
Description
Modify the entry so that pt_entry_is_write_dirty() == false. The HW will eventually be notified of this change via a TLB flush, which is the point that the HW must become synchronized. Any “write dirty” prior to the TLB flush can be lost, but once the TLB flush completes all writes must make their entries write dirty.
The format should alter the entry in a way that is compatible with any concurrent update from HW. The entire contiguous entry is changed.
- bool pt_entry_is_write_dirty(const struct pt_state *pts)
True if the entry has been written to
Parameters
const struct pt_state *pts
Entry to query
Description
“write dirty” means that the HW has written to the OA translated by this entry. If the entry is contiguous then the consolidated “write dirty” for all the items must be returned.
Parameters
struct pt_common *common
Page table to query
- unsigned int pt_entry_num_contig_lg2(const struct pt_state *pts)
Number of contiguous items for this leaf entry
Parameters
const struct pt_state *pts
Entry to query
Description
Return the number of contiguous items this leaf entry spans. If the entry is a single item it returns ilog2(1).
- pt_oaddr_t pt_entry_oa(const struct pt_state *pts)
Output Address for this leaf entry
Parameters
const struct pt_state *pts
Entry to query
Description
Return the output address for the start of the entry. If the entry is contiguous this returns the same value for each sub-item. I.e.:
log2_mod(pt_entry_oa(), pt_entry_oa_lg2sz()) == 0
See pt_item_oa(). The format should implement one of these two functions depending on how it stores the OAs in the table.
- unsigned int pt_entry_oa_lg2sz(const struct pt_state *pts)
Return the size of an OA entry
Parameters
const struct pt_state *pts
Entry to query
Description
If the entry is not contiguous this returns pt_table_item_lg2sz(), otherwise it returns the total VA/OA size of the entire contiguous entry.
- pt_oaddr_t pt_entry_oa_exact(const struct pt_state *pts)
Return the complete OA for an entry
Parameters
const struct pt_state *pts
Entry to query
Description
During iteration the first entry could have a VA with an offset from the natural start of the entry. Return the exact OA including the pts’s VA offset.
Parameters
const struct pt_common *common
Page table to query
Description
This is usually 0, but some formats have their VA space going downward from PT_VADDR_MAX, and will return that instead. This value must always be adjusted by struct pt_common max_vasz_lg2.
- bool pt_has_system_page_size(const struct pt_common *common)
True if level 0 can install a PAGE_SHIFT entry
Parameters
const struct pt_common *common
Page table to query
Description
If true the caller can use, at level 0, pt_install_leaf_entry(PAGE_SHIFT). This is useful to create optimized paths for common cases of PAGE_SIZE mappings.
- void pt_install_leaf_entry(struct pt_state *pts, pt_oaddr_t oa, unsigned int oasz_lg2, const struct pt_write_attrs *attrs)
Write a leaf entry to the table
Parameters
struct pt_state *pts
Table index to change
pt_oaddr_t oa
Output Address for this leaf
unsigned int oasz_lg2
Size in VA/OA for this leaf
const struct pt_write_attrs *attrs
Attributes to modify the entry
Description
A leaf OA entry will return PT_ENTRY_OA from pt_load_entry(). It translates the VA indicated by pts to the given OA.
For a single item non-contiguous entry oasz_lg2 is pt_table_item_lg2sz(). For contiguous it is pt_table_item_lg2sz() + num_contig_lg2.
This must not be called if pt_can_have_leaf() == false. Contiguous sizes not indicated by pt_possible_sizes() must not be specified.
- bool pt_install_table(struct pt_state *pts, pt_oaddr_t table_pa, const struct pt_write_attrs *attrs)
Write a table entry to the table
Parameters
struct pt_state *pts
Table index to change
pt_oaddr_t table_pa
CPU physical address of the lower table’s memory
const struct pt_write_attrs *attrs
Attributes to modify the table index
Description
A table entry will return PT_ENTRY_TABLE from pt_load_entry(). The table_pa is the table at pts->level - 1. This is done by cmpxchg so pts must have the current entry loaded. The pts is updated with the installed entry.
This must not be called if pt_can_have_table() == false.
Return
true if the table was installed successfully.
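The cmpxchg-based install described for pt_install_table() can be pictured with C11 atomics in user space. This is only a sketch: the item encoding (bit 0 marking a table pointer), the name ex_install_table() and the choice to leave the caller's loaded value untouched on failure are all assumptions of this example, not a statement about how any format does it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical item encoding for this sketch: bit 0 marks a table pointer. */
#define EX_ITEM_TABLE UINT64_C(1)

/*
 * Try to replace the previously loaded value with a table pointer. On
 * success *loaded is updated to the installed item. On failure another
 * writer got there first; the caller would reload and decide what to do.
 */
static inline bool ex_install_table(_Atomic uint64_t *item, uint64_t *loaded,
				    uint64_t table_pa)
{
	uint64_t expected = *loaded;
	uint64_t new_item = table_pa | EX_ITEM_TABLE;

	assert((table_pa & EX_ITEM_TABLE) == 0);
	if (!atomic_compare_exchange_strong(item, &expected, new_item))
		return false;
	*loaded = new_item;
	return true;
}
```

The compare-and-exchange only succeeds if the item still holds the value the caller loaded earlier, which is why the description requires pts to have the current entry loaded.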
- pt_oaddr_t pt_item_oa(const struct pt_state *pts)
Output Address for this leaf item
Parameters
const struct pt_state *pts
Item to query
Description
Return the output address for this item. If the item is part of a contiguous entry it returns the value of the OA for this individual sub item.
See pt_entry_oa(). The format should implement one of these two functions depending on how it stores the OAs in the table.
- enum pt_entry_type pt_load_entry_raw(struct pt_state *pts)
Read from the location pts points at into the pts
Parameters
struct pt_state *pts
Table index to load
Description
Return the type of entry that was loaded. pts->entry will be filled in with the entry’s content. See pt_load_entry()
- unsigned int pt_max_oa_lg2(const struct pt_common *common)
Return the maximum OA the table format can hold
Parameters
const struct pt_common *common
Page table to query
Description
The value oalog2_to_max_int(pt_max_oa_lg2()) is the MAX for the OA. This is the absolute maximum address the table can hold. struct pt_common max_oasz_lg2 sets a lower dynamic maximum based on HW capability.
- unsigned int pt_num_items_lg2(const struct pt_state *pts)
Return the number of items in this table level
Parameters
const struct pt_state *pts
The current level
Description
The number of items in a table level defines the number of bits this level decodes from the VA. This function is not called for the top level, so it does not need to compute a special value for the top case. The result for the top is based on pt_common max_vasz_lg2.
The value is used as part of determining the table indexes via the equation:
log2_mod(log2_div(VA, pt_table_item_lg2sz()), pt_num_items_lg2())
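The index equation above can be tried out with concrete numbers. The sketch below assumes a geometry of 4 KiB items at level 0 and 512 items per table, so item_lg2sz is 12 at level 0 and 21 at level 1 with num_items_lg2 of 9; ex_table_index() is a hypothetical name for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index of va within one table level: drop the bits below the item size,
 * then keep only as many bits as the table has items.
 */
static inline unsigned int ex_table_index(uint64_t va,
					  unsigned int item_lg2sz,
					  unsigned int num_items_lg2)
{
	assert(item_lg2sz < 64 && num_items_lg2 < 32);
	return (unsigned int)((va >> item_lg2sz) &
			      ((UINT64_C(1) << num_items_lg2) - 1));
}
```

For VA 0x40312345 this gives index 0x112 in the level 0 table and index 1 in the level 1 table under the assumed geometry.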
- unsigned int pt_pgsz_lg2_to_level(struct pt_common *common, unsigned int pgsize_lg2)
Return the level that maps the page size
Parameters
struct pt_common *common
Page table to query
unsigned int pgsize_lg2
Log2 page size
Description
Returns the table level that will map the given page size. The page size must be part of the pt_possible_sizes() for some level.
- pt_vaddr_t pt_possible_sizes(const struct pt_state *pts)
Return a bitmap of possible output sizes at this level
Parameters
const struct pt_state *pts
The current level
Description
Each level has a list of possible output sizes that can be installed as leaf entries. If pt_can_have_leaf() is false returns zero.
Otherwise the bit in position pt_table_item_lg2sz() should be set indicating that a non-contiguous single item leaf entry is supported. The following pt_num_items_lg2() number of bits can be set indicating contiguous entries are supported. Bit pt_table_item_lg2sz() + pt_num_items_lg2() must not be set, contiguous entries cannot span the entire table.
The OR of pt_possible_sizes() of all levels is the typical bitmask of all supported sizes in the entire table.
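The rules for the pt_possible_sizes() bitmap can be expressed as a small consistency check. This sketch reads the rules as: zero is acceptable for a level without leaves, otherwise the single item bit has to be present and no bit may fall outside the positions from item_lg2sz up to, but not including, item_lg2sz + num_items_lg2. The helper ex_sizes_valid() is hypothetical and exists only to illustrate that reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check a per-level size bitmap against the rules described above. */
static inline bool ex_sizes_valid(uint64_t sizes, unsigned int item_lg2sz,
				  unsigned int num_items_lg2)
{
	uint64_t allowed;

	assert(item_lg2sz + num_items_lg2 < 64);
	/* A level that cannot have leaves reports no sizes at all. */
	if (sizes == 0)
		return true;
	/* Bits [item_lg2sz, item_lg2sz + num_items_lg2) are permitted. */
	allowed = ((UINT64_C(1) << num_items_lg2) - 1) << item_lg2sz;
	if (sizes & ~allowed)
		return false;
	/* The non-contiguous single item size is expected to be present. */
	return (sizes & (UINT64_C(1) << item_lg2sz)) != 0;
}
```

As an example, a level 0 with 4 KiB items, 512 items per table and a 16 item contiguous size would report bits 12 and 16, and a level 1 above it would report bit 21; OR-ing the two gives the overall bitmask for the table.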
- unsigned int pt_table_item_lg2sz(const struct pt_state *pts)
Size of a single item entry in this table level
Parameters
const struct pt_state *pts
The current level
Description
The size of the item specifies how much VA and OA a single item occupies.
See pt_entry_oa_lg2sz() for the same value including the effect of contiguous entries.
- unsigned int pt_table_oa_lg2sz(const struct pt_state *pts)
Return the VA/OA size of the entire table
Parameters
const struct pt_state *pts
The current level
Description
Return the size of VA decoded by the entire table level.
- pt_oaddr_t pt_table_pa(const struct pt_state *pts)
Return the CPU physical address of the table entry
Parameters
const struct pt_state *pts
Entry to query
Description
This is only ever called on PT_ENTRY_TABLE entries. Must return the same value passed to pt_install_table().
- struct pt_table_p *pt_table_ptr(const struct pt_state *pts)
Return a CPU pointer for a table item
Parameters
const struct pt_state *pts
Entry to query
Description
Same as pt_table_pa() but returns a CPU pointer.
- unsigned int pt_max_sw_bit(struct pt_common *common)
Return the maximum software bit usable for any level and entry
Parameters
struct pt_common *common
Page table
Description
The swbit can be passed as bitnr to the other sw_bit functions.
- bool pt_test_sw_bit_acquire(struct pt_state *pts, unsigned int bitnr)
Read a software bit in an item
Parameters
struct pt_state *pts
Entry to read
unsigned int bitnr
Bit to read
Description
Software bits are ignored by HW and can be used for any purpose by the software. This does a test bit and acquire operation.
- void pt_set_sw_bit_release(struct pt_state *pts, unsigned int bitnr)
Set a software bit in an item
Parameters
struct pt_state *pts
Entry to set
unsigned int bitnr
Bit to set
Description
Software bits are ignored by HW and can be used for any purpose by the software. This does a set bit and release operation.
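The acquire and release pairing of the two software bit functions can be sketched with C11 atomics. Everything specific here is an assumption made for illustration: the placement of software bit 0 at item bit 58, the limit of four software bits, and the ex_ prefixed names. Real formats decide where their software bits live.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: software bit 0 lives at item bit 58. */
#define EX_SW_BIT_BASE 58u
#define EX_MAX_SW_BIT 3u

static inline uint64_t ex_sw_bit_mask(unsigned int bitnr)
{
	assert(bitnr <= EX_MAX_SW_BIT);
	return UINT64_C(1) << (EX_SW_BIT_BASE + bitnr);
}

/* Set the bit with release ordering so earlier stores are visible first. */
static inline void ex_set_sw_bit_release(_Atomic uint64_t *item,
					 unsigned int bitnr)
{
	atomic_fetch_or_explicit(item, ex_sw_bit_mask(bitnr),
				 memory_order_release);
}

/* Test the bit with acquire ordering, pairing with the release above. */
static inline bool ex_test_sw_bit_acquire(_Atomic uint64_t *item,
					  unsigned int bitnr)
{
	return (atomic_load_explicit(item, memory_order_acquire) &
		ex_sw_bit_mask(bitnr)) != 0;
}
```

A reader that sees the bit set through the acquire load is then guaranteed to also see whatever the writer stored before its release operation.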
- void pt_load_entry(struct pt_state *pts)
Read from the location pts points at into the pts
Parameters
struct pt_state *pts
Table index to load
Description
Set the type of entry that was loaded. pts->entry and pts->table_lower will be filled in with the entry’s content.
Iteration Helpers
- int pt_check_range(struct pt_range *range)
Validate the range can be iterated
Parameters
struct pt_range *range
Range to validate
Description
Check that VA and last_va fall within the permitted range of VAs. If the format is using PT_FEAT_SIGN_EXTEND then this also checks the sign extension is correct.
- void pt_index_to_va(struct pt_state *pts)
Update range->va to the current pts->index
Parameters
struct pt_state *pts
Iteration State
Description
Adjust range->va to match the current index. This is done in a lazy manner since computing the VA takes several instructions and is rarely required.
- bool pt_entry_fully_covered(const struct pt_state *pts, unsigned int oasz_lg2)
Check if the item or entry is entirely contained within pts->range
Parameters
const struct pt_state *pts
Iteration State
unsigned int oasz_lg2
The size of the item to check, pt_table_item_lg2sz() or pt_entry_oa_lg2sz()
Return
true if the item is fully enclosed by the pts->range.
- unsigned int pt_range_to_index(const struct pt_state *pts)
Starting index for an iteration
Parameters
const struct pt_state *pts
Iteration State
Return
the starting index for the iteration in pts.
- unsigned int pt_range_to_end_index(const struct pt_state *pts)
Ending index for an iteration
Parameters
const struct pt_state *pts
Iteration State
Return
the last index for the iteration in pts.
- void pt_next_entry(struct pt_state *pts)
Advance pts to the next entry
Parameters
struct pt_state *pts
Iteration State
Description
Update pts to go to the next index at this level. If pts is pointing at a contiguous entry then the index may advance by more than one.
- for_each_pt_level_entry
for_each_pt_level_entry(pts)
For loop wrapper over entries in the range
Parameters
pts
Iteration State
Description
This is the basic iteration primitive. It iterates over all the entries in pts->range that fall within the pts’s current table level. Each step does pt_load_entry(pts).
- enum pt_entry_type pt_load_single_entry(struct pt_state *pts)
Version of pt_load_entry() usable within a walker
Parameters
struct pt_state *pts
Iteration State
Description
Alternative to for_each_pt_level_entry() if the walker function uses only a single entry.
Parameters
struct pt_common *common
Table
Description
For PT_FEAT_SIGN_EXTEND this will return the lower range, and cover half the total page table. Otherwise it returns the entire page table.
Parameters
struct pt_common *common
Table
Description
The returned range spans the whole page table. Due to how PT_FEAT_SIGN_EXTEND is supported range->va and range->last_va will be incorrect during the iteration and must not be accessed.
- struct pt_range pt_upper_range(struct pt_common *common)
Return a range that spans part of the top level
Parameters
struct pt_common *common
Table
Description
For PT_FEAT_SIGN_EXTEND this will return the upper range, and cover half the total page table. Otherwise it returns the entire page table.
- struct pt_range pt_make_range(struct pt_common *common, pt_vaddr_t va, pt_vaddr_t last_va)
Return a range that spans part of the table
Parameters
struct pt_common *common
Table
pt_vaddr_t va
Start address
pt_vaddr_t last_va
Last address
Description
The caller must validate the range with pt_check_range() before using it.
- struct pt_state pt_init(struct pt_range *range, unsigned int level, struct pt_table_p *table)
Initialize a pt_state on the stack
Parameters
struct pt_range *range
Range pointer to embed in the state
unsigned int level
Table level for the state
struct pt_table_p *table
Pointer to the table memory at level
Description
Helper to initialize the on-stack pt_state from walker arguments.
- struct pt_state pt_init_top(struct pt_range *range)
Initialize a pt_state on the stack
Parameters
struct pt_range *range
Range pointer to embed in the state
Description
The pt_state points to the top most level.
- int pt_descend(struct pt_state *pts, void *arg, pt_level_fn_t fn)
Recursively invoke the walker for the lower level
Parameters
struct pt_state *pts
Iteration State
void *arg
Value to pass to the function
pt_level_fn_t fn
Walker function to call
Description
pts must point to a table item. Invoke fn as a walker on the table pts points to.
- int pt_walk_range(struct pt_range *range, pt_level_fn_t fn, void *arg)
Walk over a VA range
Parameters
struct pt_range *range
Range pointer
pt_level_fn_t fn
Walker function to call
void *arg
Value to pass to the function
Description
Walk over a VA range. The caller should have done a validity check, at least calling pt_check_range(), when building range. The walk will start at the top most table.
- struct pt_range pt_range_slice(const struct pt_state *pts, unsigned int start_index, unsigned int end_index)
Return a range that spans indexes
Parameters
const struct pt_state *pts
Iteration State
unsigned int start_index
Starting index within pts
unsigned int end_index
Ending index within pts
Description
Create a range that spans an index range of the current table level pt_state points at.
Parameters
struct pt_common *common
Table
uintptr_t top_of_table
Top of table value from _pt_top_set()
Description
Compute the allocation size of the top table. For PT_FEAT_DYNAMIC_TOP this will compute the top size assuming the table will grow.
- unsigned int pt_compute_best_pgsize(pt_vaddr_t pgsz_bitmap, pt_vaddr_t va, pt_vaddr_t last_va, pt_oaddr_t oa)
Determine the best page size for leaf entries
Parameters
pt_vaddr_t pgsz_bitmap
Permitted page sizes
pt_vaddr_t va
Starting virtual address for the leaf entry
pt_vaddr_t last_va
Last virtual address for the leaf entry, sets the max page size
pt_oaddr_t oa
Starting output address for the leaf entry
Description
Compute the largest page size for va, last_va, and oa together and return it in lg2. The largest page size depends on the format’s supported page sizes at this level, and the relative alignment of the VA and OA addresses. 0 means the OA cannot be stored with the provided pgsz_bitmap.
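The selection described for pt_compute_best_pgsize() can be sketched as a plain loop over the candidate sizes. This is a deliberately simple illustration with a hypothetical name, ex_best_pgsize_lg2(); it scans from the largest size downward and should not be read as the way the kernel computes the result.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the lg2 of the largest permitted page size that is aligned for
 * both va and oa and does not extend past last_va, or 0 if none fits.
 */
static inline unsigned int ex_best_pgsize_lg2(uint64_t pgsz_bitmap,
					      uint64_t va, uint64_t last_va,
					      uint64_t oa)
{
	unsigned int lg2;

	assert(last_va >= va);
	for (lg2 = 63; lg2 > 0; lg2--) {
		uint64_t low_mask = (UINT64_C(1) << lg2) - 1;

		/* The size must be one the caller permits. */
		if (!(pgsz_bitmap & (UINT64_C(1) << lg2)))
			continue;
		/* Both addresses must be aligned to the page size. */
		if ((va | oa) & low_mask)
			continue;
		/* The page must fit inside [va, last_va]. */
		if (low_mask > last_va - va)
			continue;
		return lg2;
	}
	return 0;
}
```

With 4 KiB, 2 MiB and 1 GiB permitted, a 2 MiB aligned VA and OA covering at least 2 MiB selects 21, while the same addresses covering only 1 MiB fall back to 12.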
- PT_MAKE_LEVELS
PT_MAKE_LEVELS(fn, do_fn)
Build an unwound walker
Parameters
fn
Name of the walker function
do_fn
Function to call at each level
Description
This builds a function call tree that can be fully inlined. The caller must provide a function body in an __always_inline function:
static __always_inline int do_fn(struct pt_range *range, void *arg, unsigned int level, struct pt_table_p *table, pt_level_fn_t descend_fn)
An inline function will be created for each table level that calls do_fn with a compile time constant for level and a pointer to the next lower function. This generates an optimally inlined walk where each of the functions sees a constant level and can codegen the exact constants/etc for that level.
Note this can produce a lot of code!
Writing a Format
It is best to start from a simple format that is similar to the target. x86_64 is usually a good reference for something simple, and AMDv1 is something fairly complete.
The required inline functions need to be implemented in the format header. These should all follow the standard pattern of:
static inline pt_oaddr_t amdv1pt_entry_oa(const struct pt_state *pts)
{
    [..]
}
#define pt_entry_oa amdv1pt_entry_oa
where a uniquely named per-format inline function provides the implementation and a define maps it to the generic name. This is intended to make debug symbols work better. Inline functions should always be used, as the prototypes in pt_common.h will cause the compiler to validate the function signature to prevent errors.
Review pt_fmt_defaults.h to understand some of the optional inlines.
Once the format compiles then it should be run through the generic page table kunit test in kunit_generic_pt.h using kunit. For example:
$ tools/testing/kunit/kunit.py run --build_dir build_kunit_x86_64 --arch x86_64 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig amdv1_fmt_test.*
[...]
[11:15:08] Testing complete. Ran 9 tests: passed: 9
[11:15:09] Elapsed time: 3.137s total, 0.001s configuring, 2.368s building, 0.311s running
The generic tests are intended to prove out the format functions and give clearer failures to speed up finding the problems. Once those pass then the entire kunit suite should be run.
IOMMU Invalidation Features
Invalidation is how the page table algorithms synchronize with a HW cache of the page table memory, typically called the TLB (or IOTLB for IOMMU cases).
The TLB can store present PTEs, non-present PTEs and table pointers, depending on its design. Every HW has its own approach to describing what has changed so that the changed items are removed from the TLB.
PT_FEAT_FLUSH_RANGE
PT_FEAT_FLUSH_RANGE is the easiest scheme to understand. It tries to generate a single range invalidation for each operation, over-invalidating if there are gaps of VA that don’t need invalidation. This trades off impacted VA for number of invalidation operations. It does not keep track of what is being invalidated; however, if pages have to be freed then page table pointers have to be cleaned from the walk cache. The range can start/end at any page boundary.
PT_FEAT_FLUSH_RANGE_NO_GAPS
PT_FEAT_FLUSH_RANGE_NO_GAPS is similar to PT_FEAT_FLUSH_RANGE; however, it tries to minimize the amount of impacted VA by issuing extra flush operations. This is useful if the cost of processing VA is very high, for instance because a hypervisor is processing the page table with a shadowing algorithm.
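The difference between the two objectives can be shown by planning flushes for the same set of changed ranges. The sketch below is a user-space illustration only: struct ex_range and both planner functions are hypothetical, and the input is assumed to be sorted by address and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive VA range, matching the start/last convention. */
struct ex_range {
	uint64_t start;
	uint64_t last;
};

/* PT_FEAT_FLUSH_RANGE style: one flush covering everything, gaps included. */
static inline size_t ex_plan_flush_range(const struct ex_range *changed,
					 size_t n, struct ex_range *out)
{
	if (n == 0)
		return 0;
	out[0].start = changed[0].start;
	out[0].last = changed[n - 1].last;
	return 1;
}

/* NO_GAPS style: merge only directly adjacent ranges, never span a gap. */
static inline size_t ex_plan_flush_no_gaps(const struct ex_range *changed,
					   size_t n, struct ex_range *out)
{
	size_t num = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(changed[i].start <= changed[i].last);
		if (num && out[num - 1].last != UINT64_MAX &&
		    out[num - 1].last + 1 == changed[i].start)
			out[num - 1].last = changed[i].last;
		else
			out[num++] = changed[i];
	}
	return num;
}
```

For changes at 0x1000..0x1fff, 0x2000..0x2fff and 0x8000..0x8fff the first planner issues a single flush of 0x1000..0x8fff, while the second issues two flushes and leaves 0x3000..0x7fff untouched.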