Generic Radix Page Table

Generic Radix Page Table is a set of functions and helpers to efficiently parse radix style page tables typically seen in HW implementations. The interface is built to deliver similar code generation as the mm’s pte/pmd/etc system by fully inlining the exact code required to handle each table level.

Like the mm subsystem each format contributes its parsing implementation under common names and the common code implements the required algorithms.

The system is divided into three logical levels:

  • The page table format and its manipulation functions

  • Generic helpers to give a consistent API regardless of underlying format

  • An algorithm implementation (e.g. IOMMU/DRM/KVM/MM)

Multiple implementations are supported. The intention is to have the generic format code be re-usable for whatever specialized implementation is required. The generic code is solely about the format of the radix tree; it does not include memory allocation or higher level decisions that are left for the implementation.

The generic framework supports a superset of functions across many HW implementations:

  • Entries comprised of contiguous blocks of IO PTEs for larger page sizes

  • Multi-level tables, up to 6 levels. Runtime selected top level

  • Runtime variable table level size (ARM’s concatenated tables)

  • Expandable top level allowing dynamic sizing of table levels

  • Optional leaf entries at any level

  • 32-bit/64-bit virtual and output addresses, using every address bit

  • Dirty tracking

  • Sign extended addressing

Language used in Generic Page Table

VA

The input address to the page table, often the virtual address.

OA

The output address from the page table, often the physical address.

leaf

An entry that results in an output address.

start/end

A half-open range, e.g. [0,0) refers to no VA.

start/last

An inclusive closed range, e.g. [0,0] refers to the VA 0.

common

The generic page table container struct pt_common

level

Level 0 is always a table of only leaves with no further table pointers. Increasing levels increase the size of the table items. The least significant VA bits used to index page tables are used to index the Level 0 table. The various labels for table levels used by HW descriptions are not used.

top_level

The inclusive highest level of the table. A two-level table has a top level of 1.

table

A linear array of translation items for that level.

index

The position in a table of an element: item = table[index]

item

A single index in a table

entry

A single logical element in a table. If contiguous pages are not supported then item and entry are the same thing, otherwise entry refers to all the items that comprise a single contiguous translation.

item/entry_size

The number of bytes of VA the table index translates for. If the item is a table entry then the next table covers this size. If the entry translates to an output address then the full OA is: OA | (VA % entry_size)

contig_count

The number of consecutive items fused into a single entry. item_size * contig_count is the size of that entry’s translation.

lg2

Indicates the value is encoded as log2, i.e. 1<<x is the actual value. Normally the compiler is fine to optimize divide and mod with log2 values automatically when inlining, however if the values are not constant expressions it can’t. So we do it by hand; we want to avoid 64-bit divmod.
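
The lg2 convention can be illustrated with a small userspace sketch. The helper names below (demo_log2_to_int(), demo_log2_div(), demo_log2_mod(), demo_entry_lg2sz() and demo_leaf_oa()) are hypothetical stand-ins written for this example and are not the kernel's definitions. They only show how divide and mod by a power of two reduce to a shift and a mask, and how item_size * contig_count and OA | (VA % entry_size) look once sizes are carried as log2 values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; the kernel's own helpers may differ. */

/* 1 << a_lg2, i.e. the actual value encoded by a log2 number. */
static inline uint64_t demo_log2_to_int(unsigned int a_lg2)
{
	assert(a_lg2 < 64);
	return UINT64_C(1) << a_lg2;
}

/* a / (1 << b_lg2) without a 64-bit divide. */
static inline uint64_t demo_log2_div(uint64_t a, unsigned int b_lg2)
{
	assert(b_lg2 < 64);
	return a >> b_lg2;
}

/* a % (1 << b_lg2) without a 64-bit modulo. */
static inline uint64_t demo_log2_mod(uint64_t a, unsigned int b_lg2)
{
	return a & (demo_log2_to_int(b_lg2) - 1);
}

/* lg2 size of an entry made of 2^contig_lg2 items of 2^item_lg2sz bytes. */
static inline unsigned int demo_entry_lg2sz(unsigned int item_lg2sz,
					    unsigned int contig_lg2)
{
	return item_lg2sz + contig_lg2;
}

/* Full output address of a leaf: OA | (VA % entry_size). */
static inline uint64_t demo_leaf_oa(uint64_t entry_oa, uint64_t va,
				    unsigned int entry_lg2sz)
{
	return entry_oa | demo_log2_mod(va, entry_lg2sz);
}
```

For example, with 4 KiB items (lg2 of 12) and 16 contiguous items (lg2 of 4), demo_entry_lg2sz(12, 4) is 16, i.e. a 64 KiB entry.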

Usage

Generic PT is structured as a multi-compilation system. Since each format provides an API using a common set of names there can be only one format active within a compilation unit. This design avoids function pointers around the low level API.

Instead the function pointers can end up at the higher level API (i.e. map/unmap, etc.) and the per-format code can be directly inlined into the per-format compilation unit. For something like IOMMU each format will be compiled into a per-format IOMMU operations kernel module.

For this to work the .c file for each compilation unit will include both the format headers and the generic code for the implementation. For instance in an implementation compilation unit the headers would normally be included as follows:

generic_pt/fmt/iommu_amdv1.c:

#include <linux/generic_pt/common.h>
#include "defs_amdv1.h"
#include "../pt_defs.h"
#include "amdv1.h"
#include "../pt_common.h"
#include "../pt_iter.h"
#include "../iommu_pt.h"  /* The IOMMU implementation */

iommu_pt.h includes definitions that will generate the operations functions for map/unmap/etc. using the definitions provided by AMDv1. The resulting module will have exported symbols named like pt_iommu_amdv1_init().

Refer to drivers/iommu/generic_pt/fmt/iommu_template.h for an example of how the IOMMU implementation uses multi-compilation to generate per-format ops struct pointers.

The format code is written so that the common names arise from #defines to distinct format specific names. This is intended to aid debuggability by avoiding symbol clashes across all the different formats.

Exported symbols and other global names are mangled using a per-format string via the NS() helper macro.

The format uses struct pt_common as the top-level struct for the table, and each format will have its own struct pt_xxx which embeds it to store format-specific information.

The implementation will further wrap struct pt_common in its own top-level struct, such as struct pt_iommu_amdv1.

Format functions at the struct pt_common level

struct pt_common

struct for all page table implementations

Definition:

struct pt_common {
    uintptr_t top_of_table;
    u8 max_oasz_lg2;
    u8 max_vasz_lg2;
    unsigned int features;
};

Members

top_of_table

Encodes the table top pointer and the top level in a single value. Must use READ_ONCE/WRITE_ONCE to access it. The lower bits of the aligned table pointer are used for the level.

max_oasz_lg2

Maximum number of bits the OA can contain. Upper bits must be zero. This may be less than what the page table format supports, but must not be more.

max_vasz_lg2

Maximum number of bits the VA can contain. Upper bits are 0 or 1 depending on pt_full_va_prefix(). This may be less than what the page table format supports, but must not be more. When PT_FEAT_DYNAMIC_TOP is set this reflects the maximum VA capability.

features

Bitmap of enum pt_features
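
As a rough illustration of the top_of_table encoding, the sketch below packs a level into the low bits of an aligned pointer. Everything here is an assumption made for the example: the demo_ names, the choice of 3 level bits, and the use of relaxed C11 atomics as a userspace approximation of READ_ONCE/WRITE_ONCE. The kernel's actual encoding may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: 3 low bits are enough to hold levels 0..5. */
#define DEMO_TOP_LEVEL_BITS 3u
#define DEMO_TOP_LEVEL_MASK ((uintptr_t)((1u << DEMO_TOP_LEVEL_BITS) - 1))

/* Hypothetical, reduced stand-in for the common struct. */
struct demo_common {
	_Atomic uintptr_t top_of_table;
};

/* Store the table pointer and the top level as one value. */
static inline void demo_top_set(struct demo_common *common, void *table,
				unsigned int level)
{
	uintptr_t ptr = (uintptr_t)table;

	/* The pointer must be aligned so its low bits are free. */
	assert((ptr & DEMO_TOP_LEVEL_MASK) == 0);
	assert(level <= DEMO_TOP_LEVEL_MASK);
	atomic_store_explicit(&common->top_of_table, ptr | level,
			      memory_order_relaxed);
}

/* Read the value once, then split it into pointer and level. */
static inline void demo_top_get(struct demo_common *common, void **table,
				unsigned int *level)
{
	uintptr_t top = atomic_load_explicit(&common->top_of_table,
					     memory_order_relaxed);

	*table = (void *)(top & ~DEMO_TOP_LEVEL_MASK);
	*level = (unsigned int)(top & DEMO_TOP_LEVEL_MASK);
}
```

Reading the combined value in a single load is what lets a reader observe a pointer and level that belong together, even if a writer replaces the top concurrently.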

enum pt_features

Features turned on in the table. Each symbol is a bit position.

Constants

PT_FEAT_DMA_INCOHERENT

Cache flush page table memory before assuming the HW can read it. Otherwise an SMP release is sufficient for HW to read it.

PT_FEAT_FULL_VA

The table can span the full VA range from 0 to PT_VADDR_MAX.

PT_FEAT_DYNAMIC_TOP

The table’s top level can be increased dynamically during map. This requires HW support for atomically setting both the table top pointer and the starting table level.

PT_FEAT_SIGN_EXTEND

The top most bit of the valid VA range sign extends up to the full pt_vaddr_t. This divides the page table into three VA ranges:

0         -> 2^N - 1             Lower
2^N       -> (MAX - 2^N - 1)     Non-Canonical
MAX - 2^N -> MAX                 Upper

In this mode pt_common::max_vasz_lg2 includes the sign bit and the upper bits that don’t fall within the translation are just validated.

If not set there is no sign extension and valid VA goes from 0 to 2^N - 1.

PT_FEAT_FLUSH_RANGE

IOTLB maintenance is done by flushing IOVA ranges which will clean out any walk cache or any IOPTE fully contained by the range. The optimization objective is to minimize the number of flushes even if ranges include IOVA gaps that do not need to be flushed.

PT_FEAT_FLUSH_RANGE_NO_GAPS

Like PT_FEAT_FLUSH_RANGE except that the optimization objective is to only flush IOVA that has been changed. This mode is suitable for cases like hypervisor shadowing where flushing unchanged ranges may cause the hypervisor to reparse a significant amount of the page table.

void pt_attr_from_entry(const struct pt_state *pts, struct pt_write_attrs *attrs)

Convert the permission bits back to attrs

Parameters

const struct pt_state *pts

Entry to convert from

struct pt_write_attrs *attrs

Resulting attrs

Description

Fill in the attrs with the permission bits encoded in the current leaf entry. The attrs should be usable with pt_install_leaf_entry() to reconstruct the same entry.

bool pt_can_have_leaf(const struct pt_state *pts)

True if the current level can have an OA entry

Parameters

const struct pt_state *pts

The current level

Description

True if the current level can support pt_install_leaf_entry(). A leaf entry produces an OA.

bool pt_can_have_table(const struct pt_state *pts)

True if the current level can have a lower table

Parameters

const struct pt_state *pts

The current level

Description

Every level except 0 is allowed to have a lower table.

void pt_clear_entries(struct pt_state *pts, unsigned int num_contig_lg2)

Make entries empty (non-present)

Parameters

struct pt_state *pts

Starting table index

unsigned int num_contig_lg2

Number of contiguous items to clear

Description

Clear a run of entries. A cleared entry will load back as PT_ENTRY_EMPTY and does not have any effect on table walking. The starting index must be aligned to num_contig_lg2.

bool pt_entry_make_write_dirty(struct pt_state *pts)

Make an entry dirty

Parameters

struct pt_state *pts

Table entry to change

Description

Make pt_entry_is_write_dirty() return true for this entry. This can be called asynchronously with any other table manipulation under an RCU lock and must not corrupt the table.

void pt_entry_make_write_clean(struct pt_state *pts)

Make the entry write clean

Parameters

struct pt_state *pts

Table entry to change

Description

Modify the entry so that pt_entry_is_write_dirty() == false. The HW will eventually be notified of this change via a TLB flush, which is the point that the HW must become synchronized. Any “write dirty” prior to the TLB flush can be lost, but once the TLB flush completes all writes must make their entries write dirty.

The format should alter the entry in a way that is compatible with any concurrent update from HW. The entire contiguous entry is changed.

bool pt_entry_is_write_dirty(const struct pt_state *pts)

True if the entry has been written to

Parameters

const struct pt_state *pts

Entry to query

Description

“write dirty” means that the HW has written to the OA translated by this entry. If the entry is contiguous then the consolidated “write dirty” for all the items must be returned.

bool pt_dirty_supported(struct pt_common *common)

True if the page table supports dirty tracking

Parameters

struct pt_common *common

Page table to query

unsigned int pt_entry_num_contig_lg2(const struct pt_state *pts)

Number of contiguous items for this leaf entry

Parameters

const struct pt_state *pts

Entry to query

Description

Return the number of contiguous items this leaf entry spans. If the entry is a single item it returns ilog2(1).

pt_oaddr_t pt_entry_oa(const struct pt_state *pts)

Output Address for this leaf entry

Parameters

const struct pt_state *pts

Entry to query

Description

Return the output address for the start of the entry. If the entry is contiguous this returns the same value for each sub-item. I.e.:

log2_mod(pt_entry_oa(), pt_entry_oa_lg2sz()) == 0

See pt_item_oa(). The format should implement one of these two functions depending on how it stores the OAs in the table.

unsigned int pt_entry_oa_lg2sz(const struct pt_state *pts)

Return the size of an OA entry

Parameters

const struct pt_state *pts

Entry to query

Description

If the entry is not contiguous this returns pt_table_item_lg2sz(), otherwise it returns the total VA/OA size of the entire contiguous entry.

pt_oaddr_t pt_entry_oa_exact(const struct pt_state *pts)

Return the complete OA for an entry

Parameters

const struct pt_state *pts

Entry to query

Description

During iteration the first entry could have a VA with an offset from the natural start of the entry. Return the exact OA including the pts’s VA offset.

pt_vaddr_t pt_full_va_prefix(const struct pt_common *common)

The top bits of the VA

Parameters

const struct pt_common *common

Page table to query

Description

This is usually 0, but some formats have their VA space going downward from PT_VADDR_MAX, and will return that instead. This value must always be adjusted by struct pt_common max_vasz_lg2.

bool pt_has_system_page_size(const struct pt_common *common)

True if level 0 can install a PAGE_SHIFT entry

Parameters

const struct pt_common *common

Page table to query

Description

If true the caller can use, at level 0, pt_install_leaf_entry(PAGE_SHIFT). This is useful to create optimized paths for common cases of PAGE_SIZE mappings.

void pt_install_leaf_entry(struct pt_state *pts, pt_oaddr_t oa, unsigned int oasz_lg2, const struct pt_write_attrs *attrs)

Write a leaf entry to the table

Parameters

struct pt_state *pts

Table index to change

pt_oaddr_t oa

Output Address for this leaf

unsigned int oasz_lg2

Size in VA/OA for this leaf

const struct pt_write_attrs *attrs

Attributes to modify the entry

Description

A leaf OA entry will return PT_ENTRY_OA from pt_load_entry(). It translates the VA indicated by pts to the given OA.

For a single item non-contiguous entry oasz_lg2 is pt_table_item_lg2sz(). For contiguous it is pt_table_item_lg2sz() + num_contig_lg2.

This must not be called if pt_can_have_leaf() == false. Contiguous sizes not indicated by pt_possible_sizes() must not be specified.

bool pt_install_table(struct pt_state *pts, pt_oaddr_t table_pa, const struct pt_write_attrs *attrs)

Write a table entry to the table

Parameters

struct pt_state *pts

Table index to change

pt_oaddr_t table_pa

CPU physical address of the lower table’s memory

const struct pt_write_attrs *attrs

Attributes to modify the table index

Description

A table entry will return PT_ENTRY_TABLE from pt_load_entry(). The table_pa is the table at pts->level - 1. This is done by cmpxchg so pts must have the current entry loaded. The pts is updated with the installed entry.

This must not be called if pt_can_have_table() == false.

Return

true if the table was installed successfully.
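
The cmpxchg behaviour of pt_install_table() can be sketched in userspace with C11 atomics. The bit layout (DEMO_PRESENT, DEMO_IS_TABLE), struct demo_state and both functions are hypothetical and only model the idea: the install succeeds only if the item still holds the value previously loaded into pts, so a racing installer loses cleanly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, not taken from any real format. */
#define DEMO_PRESENT UINT64_C(0x1)
#define DEMO_IS_TABLE UINT64_C(0x2)

/* Reduced stand-in for the iteration state. */
struct demo_state {
	_Atomic uint64_t *item;	/* location of the item in the table */
	uint64_t entry;		/* value last loaded from *item */
};

/* Read the item into the state, as a caller must do before installing. */
static inline void demo_load_entry(struct demo_state *pts)
{
	pts->entry = atomic_load_explicit(pts->item, memory_order_acquire);
}

/* Install a table pointer only if the item is unchanged since the load. */
static inline bool demo_install_table(struct demo_state *pts,
				      uint64_t table_pa)
{
	uint64_t expected = pts->entry;
	uint64_t new_entry = table_pa | DEMO_IS_TABLE | DEMO_PRESENT;

	/* The address must not overlap the flag bits. */
	assert((table_pa & (DEMO_IS_TABLE | DEMO_PRESENT)) == 0);
	if (!atomic_compare_exchange_strong_explicit(pts->item, &expected,
						     new_entry,
						     memory_order_release,
						     memory_order_relaxed))
		return false;
	/* Keep the state in sync with what is now in the table. */
	pts->entry = new_entry;
	return true;
}
```

A caller that loses the race would presumably release its unused lower table and reload the entry to see what the winner installed; that policy belongs to the implementation rather than the format.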

pt_oaddr_t pt_item_oa(const struct pt_state *pts)

Output Address for this leaf item

Parameters

const struct pt_state *pts

Item to query

Description

Return the output address for this item. If the item is part of a contiguous entry it returns the value of the OA for this individual sub item.

See pt_entry_oa(). The format should implement one of these two functions depending on how it stores the OAs in the table.

enum pt_entry_type pt_load_entry_raw(struct pt_state *pts)

Read from the location pts points at into the pts

Parameters

struct pt_state *pts

Table index to load

Description

Return the type of entry that was loaded. pts->entry will be filled in with the entry’s content. See pt_load_entry().

unsigned int pt_max_oa_lg2(const struct pt_common *common)

Return the maximum OA the table format can hold

Parameters

const struct pt_common *common

Page table to query

Description

The value oalog2_to_max_int(pt_max_oa_lg2()) is the MAX for the OA. This is the absolute maximum address the table can hold. struct pt_common max_oasz_lg2 sets a lower dynamic maximum based on HW capability.

unsigned int pt_num_items_lg2(const struct pt_state *pts)

Return the number of items in this table level

Parameters

const struct pt_state *pts

The current level

Description

The number of items in a table level defines the number of bits this level decodes from the VA. This function is not called for the top level, so it does not need to compute a special value for the top case. The result for the top is based on pt_common max_vasz_lg2.

The value is used as part of determining the table indexes via the equation:

log2_mod(log2_div(VA, pt_table_item_lg2sz()), pt_num_items_lg2())
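
A minimal sketch of this equation, assuming a hypothetical geometry of 4 KiB level 0 items and 512 items per table at every level (similar to a common 4-level layout). The demo_ names and the constants are assumptions for illustration; real formats supply pt_table_item_lg2sz() and pt_num_items_lg2() themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for the example only. */
#define DEMO_PAGE_LG2 12u	/* level 0 item size, lg2 */
#define DEMO_NUM_ITEMS_LG2 9u	/* 512 items per table at every level */

/* lg2 size of the VA covered by one item at the given level. */
static inline unsigned int demo_table_item_lg2sz(unsigned int level)
{
	return DEMO_PAGE_LG2 + DEMO_NUM_ITEMS_LG2 * level;
}

/* log2_mod(log2_div(VA, item_lg2sz), num_items_lg2) as shift and mask. */
static inline unsigned int demo_va_to_index(uint64_t va, unsigned int level)
{
	unsigned int item_lg2sz = demo_table_item_lg2sz(level);
	uint64_t index_mask = (UINT64_C(1) << DEMO_NUM_ITEMS_LG2) - 1;

	assert(item_lg2sz < 64);
	return (unsigned int)((va >> item_lg2sz) & index_mask);
}
```

With this geometry level 0 indexes VA bits 12 to 20, level 1 bits 21 to 29, and so on, which matches the rule that the least significant index bits belong to level 0.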

unsigned int pt_pgsz_lg2_to_level(struct pt_common *common, unsigned int pgsize_lg2)

Return the level that maps the page size

Parameters

struct pt_common *common

Page table to query

unsigned int pgsize_lg2

Log2 page size

Description

Returns the table level that will map the given page size. The page size must be part of the pt_possible_sizes() for some level.

pt_vaddr_t pt_possible_sizes(const struct pt_state *pts)

Return a bitmap of possible output sizes at this level

Parameters

const struct pt_state *pts

The current level

Description

Each level has a list of possible output sizes that can be installed as leaf entries. If pt_can_have_leaf() is false returns zero.

Otherwise the bit in position pt_table_item_lg2sz() should be set indicating that a non-contiguous single item leaf entry is supported. The following pt_num_items_lg2() number of bits can be set indicating contiguous entries are supported. Bit pt_table_item_lg2sz() + pt_num_items_lg2() must not be set, contiguous entries cannot span the entire table.

The OR of pt_possible_sizes() of all levels is the typical bitmask of all supported sizes in the entire table.
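
To make the bitmap layout concrete, here is a small sketch that builds such a bitmap for one level. demo_possible_sizes() and its contig_lg2_mask argument are hypothetical: bit k of contig_lg2_mask means that entries made of 2^k items are supported, and the single item case is always included.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return a bitmap where bit N set means a leaf of
 * size 2^N can be installed at a level with the given geometry.
 */
static inline uint64_t demo_possible_sizes(unsigned int item_lg2sz,
					   unsigned int num_items_lg2,
					   uint64_t contig_lg2_mask)
{
	assert(item_lg2sz + num_items_lg2 < 64);
	/* A contiguous entry cannot span the entire table. */
	assert((contig_lg2_mask >> num_items_lg2) == 0);
	/* Bit 0 of the mask is the single item leaf, always supported. */
	return (contig_lg2_mask | 1) << item_lg2sz;
}
```

With 4 KiB items, 512 items per table and 16-item contiguous entries, level 0 yields bits 12 and 16 (4 KiB and 64 KiB). OR-ing in a level 1 that only supports single 2 MiB items adds bit 21.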

unsigned int pt_table_item_lg2sz(const struct pt_state *pts)

Size of a single item entry in this table level

Parameters

const struct pt_state *pts

The current level

Description

The size of the item specifies how much VA and OA a single item occupies.

See pt_entry_oa_lg2sz() for the same value including the effect of contiguous entries.

unsigned int pt_table_oa_lg2sz(const struct pt_state *pts)

Return the VA/OA size of the entire table

Parameters

const struct pt_state *pts

The current level

Description

Return the size of VA decoded by the entire table level.

pt_oaddr_t pt_table_pa(const struct pt_state *pts)

Return the CPU physical address of the table entry

Parameters

const struct pt_state *pts

Entry to query

Description

This is only ever called on PT_ENTRY_TABLE entries. Must return the same value passed to pt_install_table().

struct pt_table_p *pt_table_ptr(const struct pt_state *pts)

Return a CPU pointer for a table item

Parameters

const struct pt_state *pts

Entry to query

Description

Same as pt_table_pa() but returns a CPU pointer.

unsigned int pt_max_sw_bit(struct pt_common *common)

Return the maximum software bit usable for any level and entry

Parameters

struct pt_common *common

Page table

Description

The swbit can be passed as bitnr to the other sw_bit functions.

bool pt_test_sw_bit_acquire(struct pt_state *pts, unsigned int bitnr)

Read a software bit in an item

Parameters

struct pt_state *pts

Entry to read

unsigned int bitnr

Bit to read

Description

Software bits are ignored by HW and can be used for any purpose by the software. This does a test bit and acquire operation.

void pt_set_sw_bit_release(struct pt_state *pts, unsigned int bitnr)

Set a software bit in an item

Parameters

struct pt_state *pts

Entry to set

unsigned int bitnr

Bit to set

Description

Software bits are ignored by HW and can be used for any purpose by the software. This does a set bit and release operation.

void pt_load_entry(struct pt_state *pts)

Read from the location pts points at into the pts

Parameters

struct pt_state *pts

Table index to load

Description

Set the type of entry that was loaded. pts->entry and pts->table_lower will be filled in with the entry’s content.

Iteration Helpers

int pt_check_range(struct pt_range *range)

Validate the range can be iterated

Parameters

struct pt_range *range

Range to validate

Description

Check that VA and last_va fall within the permitted range of VAs. If the format is using PT_FEAT_SIGN_EXTEND then this also checks the sign extension is correct.
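
The sign extension check can be sketched as follows. This is a simplified userspace model with hypothetical demo_ names, not the kernel's pt_check_range(): it assumes a 64-bit VA type, a vasz_lg2 that includes the sign bit, and it rejects any range whose two ends lie in different halves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if every bit from the sign bit (bit vasz_lg2 - 1) up to bit 63
 * has the same value, i.e. the VA is correctly sign extended.
 */
static inline bool demo_va_is_canonical(uint64_t va, unsigned int vasz_lg2)
{
	uint64_t upper, all_ones;

	assert(vasz_lg2 >= 2 && vasz_lg2 < 64);
	upper = va >> (vasz_lg2 - 1);
	all_ones = UINT64_MAX >> (vasz_lg2 - 1);
	return upper == 0 || upper == all_ones;
}

/* Simplified range check for an inclusive [va, last_va] range. */
static inline bool demo_check_range(uint64_t va, uint64_t last_va,
				    unsigned int vasz_lg2)
{
	if (va > last_va)
		return false;
	if (!demo_va_is_canonical(va, vasz_lg2) ||
	    !demo_va_is_canonical(last_va, vasz_lg2))
		return false;
	/* Both ends must be in the same half or the range crosses the hole. */
	return (va >> 63) == (last_va >> 63);
}
```

For a 48-bit VA including the sign bit, 0x00007fffffffffff and 0xffff800000000000 are the last lower and first upper addresses; everything between them is non-canonical.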

void pt_index_to_va(struct pt_state *pts)

Update range->va to the current pts->index

Parameters

struct pt_state *pts

Iteration State

Description

Adjust range->va to match the current index. This is done in a lazy manner since computing the VA takes several instructions and is rarely required.

bool pt_entry_fully_covered(const struct pt_state *pts, unsigned int oasz_lg2)

Check if the item or entry is entirely contained within pts->range

Parameters

const struct pt_state *pts

Iteration State

unsigned int oasz_lg2

The size of the item to check, pt_table_item_lg2sz() or pt_entry_oa_lg2sz()

Return

true if the item is fully enclosed by the pts->range.

unsigned int pt_range_to_index(const struct pt_state *pts)

Starting index for an iteration

Parameters

const struct pt_state *pts

Iteration State

Return

the starting index for the iteration in pts.

unsigned int pt_range_to_end_index(const struct pt_state *pts)

Ending index iteration

Parameters

const struct pt_state *pts

Iteration State

Return

the last index for the iteration in pts.

void pt_next_entry(struct pt_state *pts)

Advance pts to the next entry

Parameters

struct pt_state *pts

Iteration State

Description

Update pts to go to the next index at this level. If pts is pointing at a contiguous entry then the index may advance by more than one.

for_each_pt_level_entry

for_each_pt_level_entry(pts)

For loop wrapper over entries in the range

Parameters

pts

Iteration State

Description

This is the basic iteration primitive. It iterates over all the entries in pts->range that fall within the pts’s current table level. Each step does pt_load_entry(pts).
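
The shape of this loop, including the way a contiguous entry advances the index by more than one, can be sketched with a toy table. All demo_ types, fields and the demo_for_each_entry() macro are hypothetical simplifications; the real pt_state carries more information and loads entries through the format functions.

```c
#include <assert.h>
#include <stdint.h>

enum demo_entry_type { DEMO_ENTRY_EMPTY, DEMO_ENTRY_TABLE, DEMO_ENTRY_OA };

/* Toy item: a type plus the lg2 number of items fused into its entry. */
struct demo_item {
	enum demo_entry_type type;
	unsigned int contig_lg2;
};

/* Reduced iteration state for one table level. */
struct demo_state {
	const struct demo_item *table;
	unsigned int index;	/* current position */
	unsigned int end_index;	/* one past the last index to visit */
	enum demo_entry_type type;
	unsigned int contig_lg2;
};

/* Copy the item at the current index into the state. */
static inline void demo_load_entry(struct demo_state *pts)
{
	const struct demo_item *item = &pts->table[pts->index];

	pts->type = item->type;
	pts->contig_lg2 = item->type == DEMO_ENTRY_OA ? item->contig_lg2 : 0;
}

/* Step past the whole entry, even when starting in the middle of it. */
static inline void demo_next_entry(struct demo_state *pts)
{
	unsigned int step = 1u << pts->contig_lg2;

	pts->index = (pts->index & ~(step - 1)) + step;
}

/* Load each entry in [index, end_index) before running the loop body. */
#define demo_for_each_entry(pts)                                      \
	for (; (pts)->index < (pts)->end_index &&                     \
	       (demo_load_entry(pts), 1);                             \
	     demo_next_entry(pts))

/* Example walker body: count the leaf entries in the range. */
static inline unsigned int demo_count_leaves(struct demo_state *pts)
{
	unsigned int leaves = 0;

	demo_for_each_entry(pts) {
		if (pts->type == DEMO_ENTRY_OA)
			leaves++;
	}
	return leaves;
}
```

In this toy model four items fused into one contiguous entry are counted once, because the step after loading any of them lands on the first index past the entry.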

enum pt_entry_type pt_load_single_entry(struct pt_state *pts)

Version of pt_load_entry() usable within a walker

Parameters

struct pt_state *pts

Iteration State

Description

Alternative to for_each_pt_level_entry() if the walker function uses only a single entry.

struct pt_range pt_top_range(struct pt_common *common)

Return a range that spans part of the top level

Parameters

struct pt_common *common

Table

Description

For PT_FEAT_SIGN_EXTEND this will return the lower range, and cover half the total page table. Otherwise it returns the entire page table.

struct pt_range pt_all_range(struct pt_common *common)

Return a range that spans the entire page table

Parameters

struct pt_common *common

Table

Description

The returned range spans the whole page table. Due to how PT_FEAT_SIGN_EXTEND is supported range->va and range->last_va will be incorrect during the iteration and must not be accessed.

struct pt_range pt_upper_range(struct pt_common *common)

Return a range that spans part of the top level

Parameters

struct pt_common *common

Table

Description

For PT_FEAT_SIGN_EXTEND this will return the upper range, and cover half the total page table. Otherwise it returns the entire page table.

struct pt_range pt_make_range(struct pt_common *common, pt_vaddr_t va, pt_vaddr_t last_va)

Return a range that spans part of the table

Parameters

struct pt_common *common

Table

pt_vaddr_t va

Start address

pt_vaddr_t last_va

Last address

Description

The caller must validate the range with pt_check_range() before using it.

struct pt_state pt_init(struct pt_range *range, unsigned int level, struct pt_table_p *table)

Initialize a pt_state on the stack

Parameters

struct pt_range *range

Range pointer to embed in the state

unsigned int level

Table level for the state

struct pt_table_p *table

Pointer to the table memory at level

Description

Helper to initialize the on-stack pt_state from walker arguments.

struct pt_state pt_init_top(struct pt_range *range)

Initialize a pt_state on the stack

Parameters

struct pt_range *range

Range pointer to embed in the state

Description

The pt_state points to the top most level.

int pt_descend(struct pt_state *pts, void *arg, pt_level_fn_t fn)

Recursively invoke the walker for the lower level

Parameters

struct pt_state *pts

Iteration State

void *arg

Value to pass to the function

pt_level_fn_t fn

Walker function to call

Description

pts must point to a table item. Invoke fn as a walker on the table pts points to.

int pt_walk_range(struct pt_range *range, pt_level_fn_t fn, void *arg)

Walk over a VA range

Parameters

struct pt_range *range

Range pointer

pt_level_fn_t fn

Walker function to call

void *arg

Value to pass to the function

Description

Walk over a VA range. The caller should have done a validity check, at least calling pt_check_range(), when building range. The walk will start at the top most table.

struct pt_range pt_range_slice(const struct pt_state *pts, unsigned int start_index, unsigned int end_index)

Return a range that spans indexes

Parameters

const struct pt_state *pts

Iteration State

unsigned int start_index

Starting index within pts

unsigned int end_index

Ending index within pts

Description

Create a range that spans an index range of the current table level pt_state points at.

unsigned int pt_top_memsize_lg2(struct pt_common *common, uintptr_t top_of_table)

Parameters

struct pt_common *common

Table

uintptr_t top_of_table

Top of table value from _pt_top_set()

Description

Compute the allocation size of the top table. For PT_FEAT_DYNAMIC_TOP this will compute the top size assuming the table will grow.

unsigned int pt_compute_best_pgsize(pt_vaddr_t pgsz_bitmap, pt_vaddr_t va, pt_vaddr_t last_va, pt_oaddr_t oa)

Determine the best page size for leaf entries

Parameters

pt_vaddr_t pgsz_bitmap

Permitted page sizes

pt_vaddr_t va

Starting virtual address for the leaf entry

pt_vaddr_t last_va

Last virtual address for the leaf entry, sets the max page size

pt_oaddr_t oa

Starting output address for the leaf entry

Description

Compute the largest page size for va, last_va, and oa together and return it in lg2. The largest page size depends on the format’s supported page sizes at this level, and the relative alignment of the VA and OA addresses. 0 means the OA cannot be stored with the provided pgsz_bitmap.
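
One way to picture the selection is the simplified sketch below. demo_best_pgsize_lg2() is a hypothetical linear search, not the kernel's implementation: it walks the bitmap from the largest size down and returns the first size for which va and oa are both aligned and the whole page fits before last_va.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: bit N of pgsz_bitmap set means a 2^N byte page is
 * permitted. Returns the lg2 of the chosen size, or 0 if nothing fits.
 */
static inline unsigned int demo_best_pgsize_lg2(uint64_t pgsz_bitmap,
						uint64_t va, uint64_t last_va,
						uint64_t oa)
{
	assert(va <= last_va);
	for (unsigned int lg2 = 64; lg2-- > 0;) {
		uint64_t mask = (UINT64_C(1) << lg2) - 1;

		if (!(pgsz_bitmap & (UINT64_C(1) << lg2)))
			continue;	/* size not permitted */
		if ((va | oa) & mask)
			continue;	/* VA or OA not aligned to this size */
		if (last_va - va < mask)
			continue;	/* page would run past last_va */
		return lg2;
	}
	return 0;
}
```

With 4 KiB, 2 MiB and 1 GiB permitted, a 2 MiB aligned VA and OA covering exactly 2 MiB selects 21, while shifting the VA by one 4 KiB page drops the result to 12.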

PT_MAKE_LEVELS

PT_MAKE_LEVELS(fn, do_fn)

Build an unwound walker

Parameters

fn

Name of the walker function

do_fn

Function to call at each level

Description

This builds a function call tree that can be fully inlined. The caller must provide a function body in an __always_inline function:

static __always_inline int do_fn(struct pt_range *range, void *arg,
       unsigned int level, struct pt_table_p *table,
       pt_level_fn_t descend_fn)

An inline function will be created for each table level that calls do_fn with a compile time constant for level and a pointer to the next lower function. This generates an optimally inlined walk where each of the functions sees a constant level and can codegen the exact constants/etc for that level.

Note this can produce a lot of code!
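
The unwinding idea can be sketched in plain C. DEMO_MAKE_LEVELS below is a hypothetical, much reduced stand-in for PT_MAKE_LEVELS: it handles only three levels, drops the range and table arguments, and adds a small switch as the entry point. It is meant to show how each generated function passes a constant level and the next lower function to the shared body.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*demo_level_fn_t)(void *arg, unsigned int level);

/*
 * Shared body. At every call site generated below, 'level' is a compile
 * time constant, so an inlining compiler can specialize each copy.
 */
static inline int demo_do_count(void *arg, unsigned int level,
				demo_level_fn_t descend_fn)
{
	unsigned int *visited_mask = arg;

	*visited_mask |= 1u << level;	/* record that this level ran */
	if (!descend_fn)
		return 0;		/* level 0 has nothing below it */
	return descend_fn(arg, level - 1);
}

/* Generate one function per level that forwards a constant level. */
#define DEMO_MAKE_LEVEL(fn, do_fn, lvl, lower_fn)                     \
	static int fn##lvl(void *arg, unsigned int level)             \
	{                                                             \
		(void)level; /* the constant lvl is used instead */   \
		return do_fn(arg, lvl, lower_fn);                     \
	}

/* Generate levels 0..2 plus an entry point that picks the top level. */
#define DEMO_MAKE_LEVELS(fn, do_fn)                                   \
	DEMO_MAKE_LEVEL(fn, do_fn, 0, NULL)                           \
	DEMO_MAKE_LEVEL(fn, do_fn, 1, fn##0)                          \
	DEMO_MAKE_LEVEL(fn, do_fn, 2, fn##1)                          \
	static int fn(void *arg, unsigned int top_level)              \
	{                                                             \
		switch (top_level) {                                  \
		case 0:                                               \
			return fn##0(arg, 0);                         \
		case 1:                                               \
			return fn##1(arg, 1);                         \
		case 2:                                               \
			return fn##2(arg, 2);                         \
		default:                                              \
			return -1;                                    \
		}                                                     \
	}

DEMO_MAKE_LEVELS(demo_walk, demo_do_count)
```

Each extra level adds another specialized copy of the body, which is the source of the code growth noted above.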

Writing a Format

It is best to start from a simple format that is similar to the target. x86_64 is usually a good reference for something simple, and AMDv1 is something fairly complete.

The required inline functions need to be implemented in the format header. These should all follow the standard pattern of:

static inline pt_oaddr_t amdv1pt_entry_oa(const struct pt_state *pts)
{
       [..]
}
#define pt_entry_oa amdv1pt_entry_oa

where a uniquely named per-format inline function provides the implementation and a define maps it to the generic name. This is intended to make debug symbols work better. Inline functions should always be used, as the prototypes in pt_common.h will cause the compiler to validate the function signature to prevent errors.
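
The pattern can be tried out in a single file. In this sketch the format is a made-up "demofmt", struct pt_state and pt_oaddr_t are reduced to simplified stand-ins, and DEMOFMT_OA_MASK and demo_translate() are hypothetical. The prototype under the generic name plays the role the text above attributes to pt_common.h.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the real types. */
typedef uint64_t pt_oaddr_t;
struct pt_state {
	uint64_t entry;
};

/* ---- Format part (hypothetical "demofmt") ---- */

/* Assumed position of the OA field inside an entry. */
#define DEMOFMT_OA_MASK UINT64_C(0x000ffffffffff000)

static inline pt_oaddr_t demofmt_entry_oa(const struct pt_state *pts)
{
	return pts->entry & DEMOFMT_OA_MASK;
}
#define pt_entry_oa demofmt_entry_oa

/* ---- Common part ---- */

/*
 * Prototype under the generic name. After macro expansion this
 * redeclares demofmt_entry_oa(), so a format function with a mismatched
 * signature fails to compile.
 */
static inline pt_oaddr_t pt_entry_oa(const struct pt_state *pts);

/* Generic code only ever uses the generic name. */
static inline pt_oaddr_t demo_translate(const struct pt_state *pts,
					uint64_t va, unsigned int entry_lg2sz)
{
	return pt_entry_oa(pts) | (va & ((UINT64_C(1) << entry_lg2sz) - 1));
}
```

Because the call in demo_translate() resolves to demofmt_entry_oa() at preprocessing time, a backtrace or symbol listing shows the format specific name rather than a shared one.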

Review pt_fmt_defaults.h to understand some of the optional inlines.

Once the format compiles then it should be run through the generic page table kunit test in kunit_generic_pt.h using kunit. For example:

$ tools/testing/kunit/kunit.py run --build_dir build_kunit_x86_64 --arch x86_64 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig amdv1_fmt_test.*
[...]
[11:15:08] Testing complete. Ran 9 tests: passed: 9
[11:15:09] Elapsed time: 3.137s total, 0.001s configuring, 2.368s building, 0.311s running

The generic tests are intended to prove out the format functions and give clearer failures to speed up finding the problems. Once those pass then the entire kunit suite should be run.

IOMMU Invalidation Features

Invalidation is how the page table algorithms synchronize with a HW cache of the page table memory, typically called the TLB (or IOTLB for IOMMU cases).

The TLB can store present PTEs, non-present PTEs and table pointers, depending on its design. Every HW has its own approach to describing what has changed so that the changed items are removed from the TLB.

PT_FEAT_FLUSH_RANGE

PT_FEAT_FLUSH_RANGE is the easiest scheme to understand. It tries to generate a single range invalidation for each operation, over-invalidating if there are gaps of VA that don’t need invalidation. This trades off impacted VA for number of invalidation operations. It does not keep track of what is being invalidated; however, if pages have to be freed then page table pointers have to be cleaned from the walk cache. The range can start/end at any page boundary.

PT_FEAT_FLUSH_RANGE_NO_GAPS

PT_FEAT_FLUSH_RANGE_NO_GAPS is similar to PT_FEAT_FLUSH_RANGE; however, it tries to minimize the amount of impacted VA by issuing extra flush operations. This is useful if the cost of processing VA is very high, for instance because a hypervisor is processing the page table with a shadowing algorithm.
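
The difference between the two schemes can be sketched with a toy gather step. The demo_ names are hypothetical and the input is assumed to be a non-empty list of inclusive, non-overlapping VA ranges sorted by start address. This models only the optimization objective described above, not the kernel's invalidation code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive VA range, following the start/last convention. */
struct demo_range {
	uint64_t start;
	uint64_t last;
};

/*
 * FLUSH_RANGE style: one flush covering everything that changed, gaps
 * included. Writes one range to out and returns the number of flushes.
 */
static inline size_t demo_flush_range(const struct demo_range *changed,
				      size_t n, struct demo_range *out)
{
	assert(n > 0);
	out[0].start = changed[0].start;
	out[0].last = changed[n - 1].last;
	return 1;
}

/*
 * FLUSH_RANGE_NO_GAPS style: merge only ranges that touch, so VA that
 * did not change is never flushed. out must have room for n ranges.
 */
static inline size_t demo_flush_range_no_gaps(const struct demo_range *changed,
					      size_t n, struct demo_range *out)
{
	size_t num_out = 0;

	for (size_t i = 0; i < n; i++) {
		if (num_out && out[num_out - 1].last + 1 == changed[i].start)
			out[num_out - 1].last = changed[i].last;
		else
			out[num_out++] = changed[i];
	}
	return num_out;
}
```

For three changed ranges where the first two touch and the third is far away, the first function issues one large flush and the second issues two smaller ones.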