Memory Management

BO management

TTM manages (placement, eviction, etc...) all BOs in Xe.

BO creation

Create a chunk of memory which can be used by the GPU. Placement rules (sysmem or vram region) are passed in upon creation. TTM handles placement of the BO and can trigger eviction of other BOs to make space for the new BO.

Kernel BOs

A kernel BO is created as part of driver load (e.g. uC firmware images, GuC ADS, etc...) or as part of a user operation which requires a kernel BO (e.g. engine state, memory for page tables, etc...). These BOs are typically mapped in the GGTT (any kernel BO aside from memory for page tables is in the GGTT), are pinned (can't move or be evicted at runtime), have a vmap (Xe can access the memory via the xe_map layer) and have contiguous physical memory.

More details of why kernel BOs are pinned and contiguous below.

User BOs

A user BO is created via the DRM_IOCTL_XE_GEM_CREATE IOCTL. Once it is created the BO can be mmap'd (via DRM_IOCTL_XE_GEM_MMAP_OFFSET) for user access and it can be bound for GPU access (via DRM_IOCTL_XE_VM_BIND). All user BOs are evictable and user BOs are never pinned by Xe. The allocation of the backing store can be deferred from creation time until first use, which is either mmap, bind, or pagefault.
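As an illustration only (not code from this document), a minimal userspace sketch of the create + mmap flow is shown below. Struct and ioctl names follow the xe_drm.h uAPI as understood at the time of writing; the placement mask and cpu_caching value are assumptions.

/*
 * Hedged sketch: create a user BO and mmap it through the fake offset
 * returned by DRM_IOCTL_XE_GEM_MMAP_OFFSET. The backing store may only be
 * allocated on first use (mmap, bind, or pagefault), as described above.
 */
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/xe_drm.h>

static void *create_and_map_bo(int fd, uint64_t size, uint32_t *handle_out)
{
    struct drm_xe_gem_create create = {
        .size = size,
        .placement = 1u << 0,   /* assumption: bit 0 == sysmem instance from the mem-region query */
        .cpu_caching = DRM_XE_GEM_CPU_CACHING_WB,
        .vm_id = 0,             /* 0 == external BO; pass a VM id for a private BO */
    };
    struct drm_xe_gem_mmap_offset mmo = { 0 };
    void *ptr;

    if (ioctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create))
        return NULL;

    mmo.handle = create.handle;
    if (ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo))
        return NULL;

    /* The fake offset is used with the device fd to map the BO. */
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, mmo.offset);
    if (ptr == MAP_FAILED)
        return NULL;

    *handle_out = create.handle;
    return ptr;
}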

Private BOs

A private BO is a user BO created with a valid VM argument passed into the create IOCTL. If a BO is private it cannot be exported via prime FD and mappings can only be created for the BO within the VM it is tied to. Lastly, the BO dma-resv slots / lock point to the VM's dma-resv slots / lock (all private BOs of a VM share common dma-resv slots / lock).

External BOs

An external BO is a user BO created with a NULL VM argument passed into the create IOCTL. An external BO can be shared with different UMDs / devices via prime FD and the BO can be mapped into multiple VMs. An external BO has its own unique dma-resv slots / lock. An external BO will be tracked in an array by every VM which has a mapping of the BO. This allows VMs to lookup and lock all external BOs mapped in the VM as needed.

BO placement

When a user BO is created, a mask of valid placements is passed in, indicating which memory regions are considered valid.

The memory region information is available via query uAPI (TODO: add link).
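As a hedged userspace sketch (extending the create example above), the placement mask can be built from the memory-region query. Struct and ioctl names follow xe_drm.h as understood at the time of writing; the bit-per-instance placement convention is an assumption based on the uAPI comments.

/* Hedged sketch: query memory regions and build a VRAM-only placement mask. */
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <drm/xe_drm.h>

static uint32_t vram_placement_mask(int fd)
{
    struct drm_xe_device_query query = {
        .query = DRM_XE_DEVICE_QUERY_MEM_REGIONS,
    };
    struct drm_xe_query_mem_regions *regions;
    uint32_t mask = 0;

    /* First call with size == 0 returns the required buffer size. */
    if (ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query))
        return 0;

    regions = malloc(query.size);
    if (!regions)
        return 0;

    query.data = (uintptr_t)regions;
    if (!ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query)) {
        for (uint32_t i = 0; i < regions->num_mem_regions; i++)
            if (regions->mem_regions[i].mem_class ==
                DRM_XE_MEM_REGION_CLASS_VRAM)
                mask |= 1u << regions->mem_regions[i].instance;
    }

    free(regions);
    return mask;    /* pass as drm_xe_gem_create.placement */
}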

BO validation

BO validation (ttm_bo_validate) refers to ensuring a BO has a valid placement. If a BO was swapped to temporary storage, a validation call will trigger a move back to a valid placement (a location where the GPU can access the BO). Validation of a BO may evict other BOs to make room for the BO being validated.
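As a rough illustration (not code from this document), a validation call in a TTM-based driver looks roughly like the sketch below; the single-entry placement list, the flags and the error handling are simplified assumptions.

#include <drm/ttm/ttm_bo.h>
#include <drm/ttm/ttm_placement.h>

/* Hedged sketch: revalidate a TTM BO into VRAM, possibly evicting others. */
static int revalidate_into_vram(struct ttm_buffer_object *tbo)
{
    struct ttm_place place = {
        .mem_type = TTM_PL_VRAM,    /* desired placement */
    };
    struct ttm_placement placement = {
        .num_placement = 1,
        .placement = &place,
    };
    struct ttm_operation_ctx ctx = {
        .interruptible = true,
    };

    /* Caller must hold the BO's dma-resv lock (reservation).
     * This may trigger eviction of other BOs to make room. */
    return ttm_bo_validate(tbo, &placement, &ctx);
}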

BO eviction / moving

All eviction (or in other words, moving a BO from one memory location to another) is routed through TTM with a callback into Xe.

Runtime eviction

Runtime eviction refers to TTM deciding, during normal operation, that it needs to move a BO. Typically this is because TTM needs to make room for another BO, and the evicted BO is the first BO on the LRU list that is not locked.

An example of this is a new BO which can only be placed in VRAM but there is no space left in VRAM. There could be multiple BOs which have sysmem and VRAM placement rules and currently reside in VRAM; TTM will trigger a move of one (or multiple) of these BO(s) until there is room in VRAM to place the new BO. The evicted BO(s) are valid but still need new bindings before being used again (exec or compute mode rebind worker).

Another example would be TTM not finding a BO to evict which has another valid placement. In this case TTM will evict one (or multiple) unlocked BO(s) to a temporary, unreachable (invalid) placement. The evicted BO(s) are invalid and before next use need to be moved to a valid placement and rebound.

In both cases, moves of these BOs are scheduled behind the fences in the BO's dma-resv slots.

WW locking tries to ensure that if 2 VMs use 51% of the memory, forward progress is made on both VMs.

Runtime eviction uses a per-GT migration engine (TODO: link to migration engine doc) to do a GPU memcpy from one location to another.

Rebinds after runtime eviction

When a BO is moved, every mapping (VMA) of the BO needs to be rebound before the BO is used again. Every VMA is added to its VM's evicted list when the BO is moved. This is safe because of the VM locking structure (TODO: link to VM locking doc). On the next use of a VM (exec or compute mode rebind worker) the evicted VMA list is checked and rebinds are triggered. In the case of a faulting VM, the rebind is done in the page fault handler.

Suspend / resume eviction of VRAM

During device suspend / resume VRAM may lose power, which means the contents of VRAM are lost. Thus BOs present in VRAM at the time of suspend must be moved to sysmem in order for their contents to be saved.

A simple TTM call (ttm_resource_manager_evict_all) can move all non-pinned (user) BOs to sysmem. External BOs that are pinned need to be manually evicted with a simple loop + xe_bo_evict call. It gets a little trickier with kernel BOs.
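A hedged sketch of the suspend-time eviction step described above is shown below; the function itself and the xe_bo_evict loop placeholder are illustrative assumptions, only the TTM calls are real API.

#include <drm/ttm/ttm_device.h>
#include <drm/ttm/ttm_resource.h>
#include <drm/ttm/ttm_placement.h>

/* Illustrative only: move VRAM contents to sysmem before suspend. */
static int evict_vram_for_suspend(struct ttm_device *bdev)
{
    struct ttm_resource_manager *man = ttm_manager_type(bdev, TTM_PL_VRAM);
    int err;

    /* Moves every non-pinned BO out of VRAM. */
    err = ttm_resource_manager_evict_all(bdev, man);
    if (err)
        return err;

    /* Pinned external BOs then have to be evicted one by one
     * (a loop + xe_bo_evict call in the real driver, omitted here). */
    return 0;
}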

Some kernel BOs are used by the GT migration engine to do moves, thus we can't move all of the BOs via the GT migration engine. For simplicity, use a TTM memcpy (CPU) to move any kernel (pinned) BO on either suspend or resume.

Some kernel BOs need to be restored to the exact same physical location. TTM makes this rather easy but the caveat is the memory must be contiguous. Again for simplicity, we enforce that all kernel (pinned) BOs are contiguous and restored to the same physical location.

Pinned external BOs in VRAM are restored on resume via the GPU.

Rebinds after suspend / resume

Most kernel BOs have GGTT mappings which must be restored during the resume process. All user BOs are rebound after validation on their next use.

Future work

Trim the list of BOs which are saved / restored via TTM memcpy on suspend / resume. All we really need to save / restore via TTM memcpy is the memory required for the GuC to load and the memory for the GT migrate engine to operate.

Do not require kernel BOs to be contiguous in physical memory / restored to the same physical address on resume. In all likelihood the only memory that needs to be restored to the same physical address is memory used for page tables. All of that memory is allocated 1 page at a time, so the contiguous requirement isn't needed. Some work on the vmap code would need to be done if kernel BOs are not contiguous too.

Make some kernel BOs evictable rather than pinned. An example of this would be engine state; in all likelihood, if the dma-resv slots of these BOs were properly used rather than pinning, we could safely evict + rebind these BOs as needed.

Some kernel BOs do not need to be restored on resume (e.g. GuC ADS, as that is repopulated on resume); add a flag to mark such objects as no save / restore.

GGTT

Xe GGTT implements the support for a Global Virtual Address space that is used for resources that are accessible to privileged (i.e. kernel-mode) processes, and not tied to a specific user-level process. For example, the Graphics micro-Controller (GuC) and Display Engine (if present) utilize this Global address space.

The Global GTT (GGTT) translates from the Global virtual address to a physical address that can be accessed by HW. The GGTT is a flat, single-level table.

Xe implements a simplified version of the GGTT, specifically managing only a certain range of it that goes from the Write Once Protected Content Memory (WOPCM) Layout to a predefined GUC_GGTT_TOP. This approach avoids complications related to the GuC (Graphics Microcontroller) hardware limitations. The GuC address space is limited on both ends of the GGTT, because the GuC shim HW redirects accesses to those addresses to other HW areas instead of going through the GGTT. On the bottom end, the GuC can't access offsets below the WOPCM size, while on the top side the limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of checking each object to see if it is accessed by the GuC or not, we just exclude those areas from the allocator. Additionally, to simplify driver load, we use the maximum WOPCM size in this logic instead of the programmed one, so we don't need to wait until the actual size to be programmed is determined (which requires FW fetch) before initializing the GGTT. These simplifications might waste space in the GGTT (about 20-25 MBs depending on the platform) but we can live with this. Another benefit of this is that the GuC bootrom can't access anything below the WOPCM max size, so anything the bootrom needs to access (e.g. a RSA key) needs to be placed in the GGTT above the WOPCM max size. Starting the GGTT allocations above the WOPCM max gives us the correct placement for free.
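As a rough sketch of the arithmetic described above (illustrative only, not the driver's actual init code), the managed range can be derived as below. GUC_GGTT_TOP and xe_wopcm_size() come from driver-internal headers (omitted); the helper itself and its exact form are assumptions.

#include <linux/types.h>

/* Illustrative only: derive the GGTT range handed to the allocator. */
static void ggtt_managed_range(struct xe_device *xe, u64 *start, u64 *size)
{
    /* Per the text above, the maximum WOPCM size is used so init does not
     * have to wait for the programmed value (which requires FW fetch). */
    u64 wopcm = xe_wopcm_size(xe);

    *start = wopcm;                 /* GuC can't reach anything below WOPCM */
    *size = GUC_GGTT_TOP - wopcm;   /* nor anything at/above GUC_GGTT_TOP */
}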

GGTT Internal API

struct xe_ggtt_node

A node in GGTT.

Definition:

struct xe_ggtt_node {
    struct xe_ggtt *ggtt;
    struct drm_mm_node base;
    struct work_struct delayed_removal_work;
    bool invalidate_on_remove;
};

Members

ggtt

Back pointer to xe_ggtt where this region will be inserted at

base

A drm_mm_node

delayed_removal_work

The work struct for the delayed removal

invalidate_on_remove

If it needs invalidation upon removal

Description

This struct is allocated with xe_ggtt_insert_node(,_transform) or xe_ggtt_insert_bo(,_at). It will be deallocated using xe_ggtt_node_remove().
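A hedged usage sketch based on the signatures documented later in this section (error handling trimmed; the size and alignment values are arbitrary, and driver-internal headers are omitted):

#include <linux/err.h>
#include <linux/sizes.h>

/* Illustrative only: reserve a 1 MiB, 64 KiB aligned region and release it. */
static int ggtt_node_example(struct xe_ggtt *ggtt)
{
    struct xe_ggtt_node *node;

    node = xe_ggtt_insert_node(ggtt, SZ_1M, SZ_64K);
    if (IS_ERR(node))
        return PTR_ERR(node);

    /* ... use xe_ggtt_node_addr(node) / xe_ggtt_node_size(node) ... */

    /* true == the node needs GGTT invalidation upon removal */
    xe_ggtt_node_remove(node, true);
    return 0;
}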

struct xe_ggtt_pt_ops

GGTT page table operations, which can vary from platform to platform.

Definition:

struct xe_ggtt_pt_ops {
    u64 (*pte_encode_flags)(struct xe_bo *bo, u16 pat_index);
    xe_ggtt_set_pte_fn ggtt_set_pte;
    u64 (*ggtt_get_pte)(struct xe_ggtt *ggtt, u64 addr);
};

Members

pte_encode_flags

Encode PTE flags for a given BO

ggtt_set_pte

Directly write into GGTT’s PTE

ggtt_get_pte

Directly read from GGTT’s PTE

struct xe_ggtt

Main GGTT struct

Definition:

struct xe_ggtt {
    struct xe_tile *tile;
    u64 start;
    u64 size;
#define XE_GGTT_FLAGS_64K BIT(0)
    unsigned int flags;
    struct xe_bo *scratch;
    struct mutex lock;
    u64 __iomem *gsm;
    const struct xe_ggtt_pt_ops *pt_ops;
    struct drm_mm mm;
    unsigned int access_count;
    struct workqueue_struct *wq;
};

Members

tile

Back pointer to tile where this GGTT belongs

start

Start offset of GGTT

size

Total usable size of this GGTT

flags

Flags for this GGTT. Acceptable flags: XE_GGTT_FLAGS_64K - if PTE size is 64K. Otherwise, regular is 4K.

scratch

Internal object allocation used as a scratch page

lock

Mutex lock to protect GGTT data

gsm

The iomem pointer to the actual location of the translation table located in the GSM for easy PTE manipulation

pt_ops

Page Table operations per platform

mm

The memory manager used to manage individual GGTT allocations

access_count

counts GGTT writes

wq

Dedicated unordered work queue to process node removals

Description

In general, each tile can contain its own Global Graphics Translation Table (GGTT) instance.

u64 xe_ggtt_start(struct xe_ggtt *ggtt)

Get starting offset of GGTT.

Parameters

struct xe_ggtt *ggtt

xe_ggtt

Return

Starting offset for this xe_ggtt.

u64 xe_ggtt_size(struct xe_ggtt *ggtt)

Get size of GGTT.

Parameters

struct xe_ggtt *ggtt

xe_ggtt

Return

Total usable size of this xe_ggtt.

struct xe_ggtt *xe_ggtt_alloc(struct xe_tile *tile)

Allocate a GGTT for a given xe_tile

Parameters

struct xe_tile *tile

xe_tile

Description

Allocates a xe_ggtt for a given tile.

Return

xe_ggtt on success, or NULL when out of memory.

int xe_ggtt_init_early(struct xe_ggtt *ggtt)

Early GGTT initialization

Parameters

struct xe_ggtt *ggtt

the xe_ggtt to be initialized

Description

It allows the creation of new mappings usable by the GuC. Mappings are not usable by the HW engines, as the GGTT doesn't have scratch nor an initial clear done to it yet. That will happen in the regular, non-early GGTT initialization.

Return

0 on success or a negative error code on failure.

void xe_ggtt_node_remove(struct xe_ggtt_node *node, bool invalidate)

Remove a xe_ggtt_node from the GGTT

Parameters

struct xe_ggtt_node *node

the xe_ggtt_node to be removed

bool invalidate

if node needs invalidation upon removal

int xe_ggtt_init(struct xe_ggtt *ggtt)

Regular non-early GGTT initialization

Parameters

struct xe_ggtt *ggtt

the xe_ggtt to be initialized

Return

0 on success or a negative error code on failure.

void xe_ggtt_shift_nodes(struct xe_ggtt *ggtt, u64 new_start)

Shift GGTT nodes to adjust for a change in usable address range.

Parameters

struct xe_ggtt *ggtt

the xe_ggtt struct instance

u64 new_start

new location of area provisioned for current VF

Description

Ensure that all struct xe_ggtt_node are moved to the new_start base address by changing the base offset of the GGTT.

This function may be called multiple times during recovery, but if new_start is unchanged from the current base, it's a noop.

new_start should be a value between xe_wopcm_size() and GUC_GGTT_TOP.

struct xe_ggtt_node *xe_ggtt_insert_node(struct xe_ggtt *ggtt, u32 size, u32 align)

Insert a xe_ggtt_node into the GGTT

Parameters

struct xe_ggtt *ggtt

the xe_ggtt into which the node should be inserted.

u32 size

size of the node

u32 align

alignment constraint of the node

Return

xe_ggtt_node on success or an ERR_PTR on failure.

size_t xe_ggtt_node_pt_size(const struct xe_ggtt_node *node)

Get the size of page table entries needed to map a GGTT node.

Parameters

const struct xe_ggtt_node *node

the xe_ggtt_node

Return

GGTT node page table entries size in bytes.

void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_ggtt_node *node, struct xe_bo *bo, u64 pte)

Map the BO into GGTT

Parameters

struct xe_ggtt *ggtt

the xe_ggtt where node will be mapped

struct xe_ggtt_node *node

the xe_ggtt_node where this BO is mapped

struct xe_bo *bo

the xe_bo to be mapped

u64 pte

The pte flags to append.
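As a usage illustration (a hedged sketch, not driver code), a BO can be given a GGTT mapping by pairing xe_ggtt_insert_node() with the pt_ops pte_encode_flags() hook and xe_ggtt_map_bo(); the pat_index, the 4K alignment and the error handling are assumptions, and driver-internal headers are omitted.

#include <linux/err.h>
#include <linux/sizes.h>
#include <linux/types.h>

/* Illustrative only: reserve GGTT space for a BO and map it there. */
static struct xe_ggtt_node *ggtt_map_bo_example(struct xe_ggtt *ggtt,
                                                struct xe_bo *bo, u32 size,
                                                u16 pat_index)
{
    struct xe_ggtt_node *node;
    u64 pte;

    node = xe_ggtt_insert_node(ggtt, size, SZ_4K);
    if (IS_ERR(node))
        return node;

    /* Encode the PTE flags for this BO (no address), then map it. */
    pte = ggtt->pt_ops->pte_encode_flags(bo, pat_index);
    xe_ggtt_map_bo(ggtt, node, bo, pte);

    return node;
}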

void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo)

Restore a mapping of a BO into GGTT

Parameters

struct xe_ggtt *ggtt

the xe_ggtt where node will be mapped

struct xe_bo *bo

the xe_bo to be mapped

Description

This is used to restore a GGTT mapping after suspend.

struct xe_ggtt_node *xe_ggtt_insert_node_transform(struct xe_ggtt *ggtt, struct xe_bo *bo, u64 pte_flags, u64 size, u32 align, xe_ggtt_transform_cb transform, void *arg)

Insert a newly allocated xe_ggtt_node into the GGTT

Parameters

struct xe_ggtt *ggtt

the xe_ggtt where the node will be inserted/reserved.

struct xe_bo *bo

The bo to be transformed

u64 pte_flags

The extra GGTT flags to add to the mapping.

u64 size

size of the node

u32 align

required alignment for the node

xe_ggtt_transform_cb transform

transformation function that will populate the GGTT node, or NULL for a linear mapping.

void *arg

Extra argument to pass to the transformation function.

Description

This function allows inserting a GGTT node with a custom transformation function. This is useful for display to allow inserting rotated framebuffers into the GGTT.

Return

A pointer to the xe_ggtt_node struct on success. An ERR_PTR otherwise.

int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo, u64 start, u64 end, struct drm_exec *exec)

Insert BO at a specific GGTT space

Parameters

struct xe_ggtt *ggtt

the xe_ggtt where bo will be inserted

struct xe_bo *bo

the xe_bo to be inserted

u64 start

address where it will be inserted

u64 end

end of the range where it will be inserted

struct drm_exec *exec

The drm_exec transaction to use for exhaustive eviction.

Return

0 on success or a negative error code on failure.

int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo, struct drm_exec *exec)

Insert BO into GGTT

Parameters

struct xe_ggtt *ggtt

the xe_ggtt where bo will be inserted

struct xe_bo *bo

the xe_bo to be inserted

struct drm_exec *exec

The drm_exec transaction to use for exhaustive eviction.

Return

0 on success or a negative error code on failure.
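A hedged sketch of inserting a BO into the GGTT and dropping the mapping again with xe_ggtt_remove_bo() (documented next). The drm_exec transaction is assumed to be set up, driven and retried by the caller, which this sketch does not show; driver-internal headers are omitted.

/* Illustrative only: insert a BO into the GGTT and later drop the mapping. */
static int ggtt_insert_remove_example(struct xe_ggtt *ggtt, struct xe_bo *bo,
                                      struct drm_exec *exec)
{
    int err;

    /* The drm_exec transaction is used for exhaustive eviction (see the
     * parameter description above); contention handling is assumed to be
     * done by the caller's drm_exec loop. */
    err = xe_ggtt_insert_bo(ggtt, bo, exec);
    if (err)
        return err;

    /* ... the BO is GGTT-mapped and reachable by GuC / display here ... */

    xe_ggtt_remove_bo(ggtt, bo);
    return 0;
}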

void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)

Remove a BO from the GGTT

Parameters

struct xe_ggtt *ggtt

the xe_ggtt where node will be removed

struct xe_bo *bo

the xe_bo to be removed

u64 xe_ggtt_largest_hole(struct xe_ggtt *ggtt, u64 alignment, u64 *spare)

Largest GGTT hole

Parameters

struct xe_ggtt *ggtt

the xe_ggtt that will be inspected

u64 alignment

minimum alignment

u64 *spare

If not NULL: in: desired memory size to be spared / out: Adjusted possible spare

Return

size of the largest continuous GGTT region

void xe_ggtt_assign(const struct xe_ggtt_node *node, u16 vfid)

assign a GGTT region to the VF

Parameters

const struct xe_ggtt_node *node

the xe_ggtt_node to update

u16 vfid

the VF identifier

Description

This function is used by the PF driver to assign a GGTT region to the VF. In addition to the PTE's VFID bits 11:2, the PRESENT bit 0 is also set, as on some platforms VFs can't modify that either.

int xe_ggtt_node_save(struct xe_ggtt_node *node, void *dst, size_t size, u16 vfid)

Save a xe_ggtt_node to a buffer.

Parameters

struct xe_ggtt_node *node

the xe_ggtt_node to be saved

void *dst

destination buffer

size_t size

destination buffer size in bytes

u16 vfid

VF identifier

Return

0 on success or a negative error code on failure.

int xe_ggtt_node_load(struct xe_ggtt_node *node, const void *src, size_t size, u16 vfid)

Load a xe_ggtt_node from a buffer.

Parameters

struct xe_ggtt_node *node

the xe_ggtt_node to be loaded

const void *src

source buffer

size_t size

source buffer size in bytes

u16 vfid

VF identifier

Return

0 on success or a negative error code on failure.
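A hedged sketch of how the save/load pair above could be used around VF migration. The buffer sizing via xe_ggtt_node_pt_size() follows its entry earlier in this section, while the kvmalloc usage and the surrounding flow are assumptions; driver-internal headers are omitted.

#include <linux/slab.h>
#include <linux/types.h>

/* Illustrative only: snapshot a VF's GGTT PTEs and restore them later. */
static int ggtt_vf_save_restore_example(struct xe_ggtt_node *node, u16 vfid)
{
    size_t size = xe_ggtt_node_pt_size(node);
    void *buf;
    int err;

    buf = kvmalloc(size, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;

    err = xe_ggtt_node_save(node, buf, size, vfid);
    if (!err) {
        /* ... migrate the VF ... */
        err = xe_ggtt_node_load(node, buf, size, vfid);
    }

    kvfree(buf);
    return err;
}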

int xe_ggtt_dump(struct xe_ggtt *ggtt, struct drm_printer *p)

Dump GGTT for debug

Parameters

struct xe_ggtt *ggtt

the xe_ggtt to be dumped

struct drm_printer *p

the drm_printer helper handle to be used to dump the information

Return

0 on success or a negative error code on failure.

u64 xe_ggtt_print_holes(struct xe_ggtt *ggtt, u64 alignment, struct drm_printer *p)

Print holes

Parameters

struct xe_ggtt *ggtt

the xe_ggtt to be inspected

u64 alignment

min alignment

struct drm_printer *p

the drm_printer

Description

Print GGTT ranges that are available and return total size available.

Return

Total available size.

u64 xe_ggtt_encode_pte_flags(struct xe_ggtt *ggtt, struct xe_bo *bo, u16 pat_index)

Get PTE encoding flags for BO

Parameters

struct xe_ggtt *ggtt

xe_ggtt

struct xe_bo *bo

xe_bo

u16 pat_index

The pat_index for the PTE.

Description

This function returns the pte_flags for a given BO, without address. It's used for DPT to fill a GGTT mapped BO with a linear lookup table.

u64 xe_ggtt_read_pte(struct xe_ggtt *ggtt, u64 offset)

Read a PTE from the GGTT

Parameters

struct xe_ggtt *ggtt

xe_ggtt

u64 offset

the offset for which the mapping should be read.

Description

Used by testcases, and by display reading out an inherited bios FB.
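For example, a mapping can be sanity-checked by reading back the first PTE covering a node, using xe_ggtt_node_addr() documented just below. A hedged sketch (driver-internal headers omitted):

#include <linux/types.h>

/* Illustrative only: read back the first PTE covering a GGTT node. */
static u64 ggtt_peek_first_pte(struct xe_ggtt *ggtt,
                               const struct xe_ggtt_node *node)
{
    /* xe_ggtt_node_addr() returns the node's GGTT offset (see below). */
    return xe_ggtt_read_pte(ggtt, xe_ggtt_node_addr(node));
}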

u64 xe_ggtt_node_addr(const struct xe_ggtt_node *node)

Get node offset in GGTT.

Parameters

const struct xe_ggtt_node *node

xe_ggtt_node

Description

Get the GGTT offset for allocated node.

u64 xe_ggtt_node_size(const struct xe_ggtt_node *node)

Get node allocation size.

Parameters

const struct xe_ggtt_node *node

xe_ggtt_node

Description

Get the allocated node’s size.

Pagetable building

Below we use the term "page-table" for both page-directories, containing pointers to lower level page-directories or page-tables, and level 0 page-tables that contain only page-table-entries pointing to memory pages.

When inserting an address range in an already existing page-table tree there will typically be a set of page-tables that are shared with other address ranges, and a set that are private to this address range. The set of shared page-tables can be at most two per level, and those can't be updated immediately because the entries of those page-tables may still be in use by the gpu for other mappings. Therefore when inserting entries into those, we instead stage those insertions by adding insertion data into struct xe_vm_pgtable_update structures. This data (subtrees for the cpu and page-table-entries for the gpu) is then added in a separate commit step. CPU-data is committed while still under the vm lock, the object lock and, for userptr, the notifier lock in read mode. The GPU async data is committed either by the GPU or CPU after fulfilling relevant dependencies.

For non-shared page-tables (and, in fact, for shared ones that aren't existing at the time of staging), we add the data in-place without the special update structures. This private part of the page-table tree will remain disconnected from the vm page-table tree until data is committed to the shared page tables of the vm tree in the commit phase.
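The stage-then-commit flow can be pictured roughly as below. This is a conceptual sketch only: prepare_bind(), commit_cpu() and commit_gpu() are hypothetical stand-ins for the driver's actual staging and commit paths, the array bound is arbitrary, and only struct xe_vm_pgtable_update is named in the text above; driver-internal headers are omitted.

#include <linux/types.h>

/* Hypothetical helpers standing in for the real staging / commit paths. */
int prepare_bind(struct xe_vm *vm, u64 start, u64 end,
                 struct xe_vm_pgtable_update *updates);
void commit_cpu(struct xe_vm *vm, struct xe_vm_pgtable_update *updates, int n);
int commit_gpu(struct xe_vm *vm, struct xe_vm_pgtable_update *updates, int n);

static int bind_range_sketch(struct xe_vm *vm, u64 start, u64 end)
{
    struct xe_vm_pgtable_update updates[16]; /* at most two shared PTs per level */
    int n, err;

    /* Stage: private (non-shared) page-tables are filled in place; entries
     * that touch shared page-tables are recorded in the update structures
     * instead, since the GPU may still be using those entries. */
    n = prepare_bind(vm, start, end, updates);
    if (n < 0)
        return n;

    /* Commit CPU data while holding the vm lock, the object lock and,
     * for userptr, the notifier lock in read mode. */
    commit_cpu(vm, updates, n);

    /* GPU-visible entries are committed asynchronously (by GPU or CPU)
     * once the relevant dependencies are fulfilled. */
    err = commit_gpu(vm, updates, n);
    return err;
}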