Memory Management¶
BO management¶
TTM manages all BOs in Xe: placement, eviction, etc.
BO creation¶
Create a chunk of memory which can be used by the GPU. Placement rules (sysmem or VRAM region) are passed in upon creation. TTM handles placement of the BO and can trigger eviction of other BOs to make space for the new BO.
Kernel BOs¶
A kernel BO is created as part of driver load (e.g. uC firmware images, GuC ADS, etc...) or as part of a user operation which requires a kernel BO (e.g. engine state, memory for page tables, etc...). These BOs are typically mapped in the GGTT (any kernel BO aside from memory for page tables is in the GGTT), are pinned (can't move or be evicted at runtime), have a vmap (Xe can access the memory via the xe_map layer) and have contiguous physical memory.
More details of why kernel BOs are pinned and contiguous below.
User BOs¶
A user BO is created via the DRM_IOCTL_XE_GEM_CREATE IOCTL. Once it is created the BO can be mmap'd (via DRM_IOCTL_XE_GEM_MMAP_OFFSET) for user access and it can be bound for GPU access (via DRM_IOCTL_XE_VM_BIND). All user BOs are evictable and user BOs are never pinned by Xe. The allocation of the backing store can be deferred from creation time until first use, which is either mmap, bind, or pagefault.
Private BOs¶
A private BO is a user BO created with a valid VM argument passed into the create IOCTL. If a BO is private it cannot be exported via prime FD and mappings can only be created for the BO within the VM it is tied to. Lastly, the BO's dma-resv slots / lock point to the VM's dma-resv slots / lock (all private BOs of a VM share common dma-resv slots / lock).
External BOs¶
An external BO is a user BO created with a NULL VM argument passed into the create IOCTL. An external BO can be shared with different UMDs / devices via prime FD and the BO can be mapped into multiple VMs. An external BO has its own unique dma-resv slots / lock. An external BO will be in an array of every VM which has a mapping of the BO. This allows VMs to look up and lock all external BOs mapped in the VM as needed.
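The dma-resv ownership rule above can be sketched in a few lines of C. Everything here is a toy stand-in (the kernel uses struct dma_resv and ttm_buffer_object internals, not these types); it only illustrates the private-vs-external split.

```c
#include <stddef.h>

/* Toy stand-ins for the VM / BO reservation relationship. */
struct toy_resv { int locked; };
struct toy_vm { struct toy_resv resv; };

struct toy_bo {
	struct toy_resv own_resv;	/* used only if the BO is external */
	struct toy_resv *resv;		/* what locking / fencing actually uses */
};

/* vm == NULL creates an external BO, otherwise a private BO. */
static void toy_bo_init(struct toy_bo *bo, struct toy_vm *vm)
{
	if (vm)
		bo->resv = &vm->resv;	  /* private: share the VM's slots / lock */
	else
		bo->resv = &bo->own_resv; /* external: unique slots / lock */
}
```

With this shape, locking a VM's shared resv covers every private BO of that VM at once, while each external BO must be locked individually.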
BO placement¶
When a user BO is created, a mask of valid placements is passed in, indicating which memory regions are considered valid.
The memory region information is available via query uAPI (TODO: add link).
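A placement mask check can be modeled as below. The bit names are hypothetical; the real uAPI describes memory regions via the query IOCTL and the create IOCTL takes a mask over those region instances.

```c
#include <stdint.h>

/* Hypothetical placement bits for illustration only. */
#define PLACEMENT_SYSMEM (1u << 0)
#define PLACEMENT_VRAM0  (1u << 1)
#define PLACEMENT_VRAM1  (1u << 2)

/* Prefer VRAM when the mask allows it, fall back to sysmem.
 * Returns 0 if the mask contains no valid placement. */
static uint32_t pick_placement(uint32_t mask)
{
	if (mask & PLACEMENT_VRAM0)
		return PLACEMENT_VRAM0;
	if (mask & PLACEMENT_VRAM1)
		return PLACEMENT_VRAM1;
	if (mask & PLACEMENT_SYSMEM)
		return PLACEMENT_SYSMEM;
	return 0;
}
```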
BO validation¶
BO validation (ttm_bo_validate) refers to ensuring a BO has a valid placement. If a BO was swapped to temporary storage, a validation call will trigger a move back to a valid placement (a location where the GPU can access the BO). Validation of a BO may evict other BOs to make room for the BO being validated.
BO eviction / moving¶
All eviction (in other words, moving a BO from one memory location to another) is routed through TTM with a callback into Xe.
Runtime eviction¶
Runtime eviction refers to TTM deciding, during normal operation, that it needs to move a BO. Typically this is because TTM needs to make room for another BO, and the evicted BO is the first BO on the LRU list that is not locked.
An example of this is a new BO which can only be placed in VRAM, but there is no space in VRAM. There could be multiple BOs which have sysmem and VRAM placement rules and currently reside in VRAM; TTM will trigger a move of one (or multiple) of these BOs until there is room in VRAM to place the new BO. The evicted BOs are still valid but need new bindings before being used again (exec or compute mode rebind worker).
Another example is when TTM can't find a BO to evict which has another valid placement. In this case TTM will evict one (or multiple) unlocked BOs to a temporary, unreachable (invalid) placement. The evicted BOs are invalid and before their next use need to be moved to a valid placement and rebound.
In both cases, moves of these BOs are scheduled behind the fences in the BO's dma-resv slots.
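The LRU walk described in these examples can be modeled roughly as follows. This is a toy model (the real logic lives inside TTM, not Xe): walk from least to most recently used, skip locked BOs, and evict until enough space is freed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy BO for illustrating TTM's runtime-eviction walk. */
struct toy_bo {
	size_t size;
	bool locked;	/* locked BOs are skipped by eviction */
	bool evicted;
};

/* Evict unlocked BOs in LRU order until `needed` bytes are freed.
 * Returns the number of bytes actually freed. */
static size_t evict_until(struct toy_bo *lru, size_t n, size_t needed)
{
	size_t freed = 0;

	for (size_t i = 0; i < n && freed < needed; i++) {
		if (lru[i].locked || lru[i].evicted)
			continue;
		/* In the driver the move is scheduled behind the fences
		 * in the BO's dma-resv slots; here we just mark it. */
		lru[i].evicted = true;
		freed += lru[i].size;
	}
	return freed;
}
```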
WW locking tries to ensure that if 2 VMs each use 51% of the memory, forward progress is made on both VMs.
Runtime eviction uses a per-GT migration engine (TODO: link to migration engine doc) to do a GPU memcpy from one location to another.
Rebinds after runtime eviction¶
When BOs are moved, every mapping (VMA) of the BO needs to be rebound before the BO is used again. Every VMA is added to its VM's evicted list when the BO is moved. This is safe because of the VM locking structure (TODO: link to VM locking doc). On the next use of a VM (exec or compute mode rebind worker) the evicted VMA list is checked and rebinds are triggered. In the case of a faulting VM, the rebind is done in the page fault handler.
Suspend / resume eviction of VRAM¶
During device suspend / resume, VRAM may lose power, which means the contents of VRAM are lost. Thus BOs present in VRAM at the time of suspend must be moved to sysmem in order for their contents to be saved.
A simple TTM call (ttm_resource_manager_evict_all) can move all non-pinned (user) BOs to sysmem. External BOs that are pinned need to be manually evicted with a simple loop + xe_bo_evict call. It gets a little trickier with kernel BOs.
Some kernel BOs are used by the GT migration engine to do moves, thus we can't move all of the BOs via the GT migration engine. For simplicity, a TTM memcpy (CPU) is used to move any kernel (pinned) BO on either suspend or resume.
Some kernel BOs need to be restored to the exact same physical location. TTM makes this rather easy, but the caveat is that the memory must be contiguous. Again for simplicity, we enforce that all kernel (pinned) BOs are contiguous and restored to the same physical location.
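A minimal sketch of the CPU save / restore idea for a pinned, contiguous kernel BO, assuming a CPU-visible vmap. All names here are hypothetical; the driver's actual path goes through TTM's memcpy move, not these helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Toy pinned BO: contiguous backing store with a CPU mapping. */
struct pinned_bo {
	void *vmap;	/* CPU-visible mapping of the BO's memory */
	size_t size;
	void *shadow;	/* sysmem copy that survives the power cycle */
};

/* Suspend: one memcpy suffices because the BO is contiguous. */
static int pinned_bo_save(struct pinned_bo *bo)
{
	bo->shadow = malloc(bo->size);
	if (!bo->shadow)
		return -1;
	memcpy(bo->shadow, bo->vmap, bo->size);
	return 0;
}

/* Resume: restore to the exact same (contiguous) location. */
static void pinned_bo_restore(struct pinned_bo *bo)
{
	memcpy(bo->vmap, bo->shadow, bo->size);
	free(bo->shadow);
	bo->shadow = NULL;
}
```

The contiguity requirement is what makes the single memcpy (and restoring to the same physical address) trivial; the "Future work" section below discusses relaxing it.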
Pinned external BOs in VRAM are restored on resume via the GPU.
Rebinds after suspend / resume¶
Most kernel BOs have GGTT mappings which must be restored during the resume process. All user BOs are rebound after validation on their next use.
Future work¶
Trim the list of BOs which are saved / restored via TTM memcpy on suspend / resume. All we really need to save / restore via TTM memcpy is the memory required for the GuC to load and the memory for the GT migrate engine to operate.
Do not require kernel BOs to be contiguous in physical memory / restored to the same physical address on resume. In all likelihood the only memory that needs to be restored to the same physical address is the memory used for page tables. All of that memory is allocated one page at a time, so the contiguous requirement isn't needed. Some work on the vmap code would also be needed if kernel BOs are not contiguous.
Make some kernel BOs evictable rather than pinned. An example of this would be engine state; in all likelihood, if the dma-resv slots of these BOs were properly used rather than pinning, we could safely evict + rebind these BOs as needed.
Some kernel BOs do not need to be restored on resume (e.g. GuC ADS, as that is repopulated on resume); add a flag to mark such objects as no save / restore.
GGTT¶
Xe GGTT implements the support for a Global Virtual Address space that is used for resources that are accessible to privileged (i.e. kernel-mode) processes, and not tied to a specific user-level process. For example, the Graphics micro-Controller (GuC) and Display Engine (if present) utilize this Global address space.
The Global GTT (GGTT) translates from the Global virtual address to a physical address that can be accessed by HW. The GGTT is a flat, single-level table.
Xe implements a simplified version of the GGTT, specifically managing only a certain range of it that goes from the Write Once Protected Content Memory (WOPCM) layout to a predefined GUC_GGTT_TOP. This approach avoids complications related to the GuC (Graphics Microcontroller) hardware limitations. The GuC address space is limited on both ends of the GGTT, because the GuC shim HW redirects accesses to those addresses to other HW areas instead of going through the GGTT. On the bottom end, the GuC can't access offsets below the WOPCM size, while on the top side the limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of checking each object to see if it is accessed by the GuC or not, we just exclude those areas from the allocator. Additionally, to simplify driver load, we use the maximum WOPCM size in this logic instead of the programmed one, so we don't need to wait until the actual size to be programmed is determined (which requires a FW fetch) before initializing the GGTT. These simplifications might waste space in the GGTT (about 20-25 MBs depending on the platform) but we can live with this. Another benefit of this is that the GuC bootrom can't access anything below the WOPCM max size, so anything the bootrom needs to access (e.g. a RSA key) needs to be placed in the GGTT above the WOPCM max size. Starting the GGTT allocations above the WOPCM max size gives us the correct placement for free.
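The resulting managed range is simple arithmetic: it runs from the maximum WOPCM size up to GUC_GGTT_TOP. The constant values below are assumptions for illustration only (both are per-platform in the real driver).

```c
#include <stdint.h>

/* Illustrative values; the real constants are platform-specific. */
#define GUC_GGTT_TOP	0xFEE00000ull
#define WOPCM_MAX_SIZE	0x00400000ull	/* assumed 4 MiB for the example */

/* The allocator-managed GGTT range: [WOPCM max, GUC_GGTT_TOP). */
static void ggtt_usable_range(uint64_t *start, uint64_t *size)
{
	*start = WOPCM_MAX_SIZE;
	*size = GUC_GGTT_TOP - WOPCM_MAX_SIZE;
}
```

Because the range starts above the WOPCM max, any object the GuC bootrom must reach (e.g. the RSA key) automatically lands at a reachable offset.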
GGTT Internal API¶
- struct xe_ggtt_node¶
A node in GGTT.
Definition:
struct xe_ggtt_node {
	struct xe_ggtt *ggtt;
	struct drm_mm_node base;
	struct work_struct delayed_removal_work;
	bool invalidate_on_remove;
};

Members
ggtt: Back pointer to the xe_ggtt where this region will be inserted
base: A drm_mm_node
delayed_removal_work: The work struct for the delayed removal
invalidate_on_remove: If it needs invalidation upon removal
Description
This struct is allocated with xe_ggtt_insert_node(,_transform) or xe_ggtt_insert_bo(,_at). It will be deallocated using xe_ggtt_node_remove().
- struct xe_ggtt_pt_ops¶
GGTT page table operations, which can vary from platform to platform.
Definition:
struct xe_ggtt_pt_ops {
	u64 (*pte_encode_flags)(struct xe_bo *bo, u16 pat_index);
	xe_ggtt_set_pte_fn ggtt_set_pte;
	u64 (*ggtt_get_pte)(struct xe_ggtt *ggtt, u64 addr);
};

Members
pte_encode_flags: Encode PTE flags for a given BO
ggtt_set_pte: Directly write into GGTT's PTE
ggtt_get_pte: Directly read from GGTT's PTE
- struct xe_ggtt¶
Main GGTT struct
Definition:
struct xe_ggtt {
	struct xe_tile *tile;
	u64 start;
	u64 size;
#define XE_GGTT_FLAGS_64K BIT(0)
	unsigned int flags;
	struct xe_bo *scratch;
	struct mutex lock;
	u64 __iomem *gsm;
	const struct xe_ggtt_pt_ops *pt_ops;
	struct drm_mm mm;
	unsigned int access_count;
	struct workqueue_struct *wq;
};

Members
tile: Back pointer to the tile where this GGTT belongs
start: Start offset of the GGTT
size: Total usable size of this GGTT
flags: Flags for this GGTT. Acceptable flags: XE_GGTT_FLAGS_64K - if PTE size is 64K. Otherwise, regular is 4K.
scratch: Internal object allocation used as a scratch page
lock: Mutex lock to protect GGTT data
gsm: The iomem pointer to the actual location of the translation table located in the GSM, for easy PTE manipulation
pt_ops: Page table operations per platform
mm: The memory manager used to manage individual GGTT allocations
access_count: Counts GGTT writes
wq: Dedicated unordered work queue to process node removals
Description
In general, each tile can contain its own Global Graphics Translation Table (GGTT) instance.
Parameters
struct xe_tile *tile: xe_tile
Description
Allocates a xe_ggtt for a given tile.
Return
xe_ggtt on success, or NULL when out of memory.
Parameters
struct xe_ggtt *ggtt: the xe_ggtt to be initialized
Description
It allows creating new mappings usable by the GuC. Mappings are not usable by the HW engines, as no scratch nor initial clear has been done to it yet. That will happen in the regular, non-early GGTT initialization.
Return
0 on success or a negative error code on failure.
- void xe_ggtt_node_remove(struct xe_ggtt_node *node, bool invalidate)¶
Remove a xe_ggtt_node from the GGTT
Parameters
struct xe_ggtt_node *node: the xe_ggtt_node to be removed
bool invalidate: if the node needs invalidation upon removal
Parameters
struct xe_ggtt *ggtt: the xe_ggtt to be initialized
Return
0 on success or a negative error code on failure.
- void xe_ggtt_shift_nodes(struct xe_ggtt *ggtt, u64 new_start)¶
Shift GGTT nodes to adjust for a change in usable address range.
Parameters
struct xe_ggtt *ggtt: the xe_ggtt struct instance
u64 new_start: new location of area provisioned for current VF
Description
Ensure that all struct xe_ggtt_node are moved to the new_start base address by changing the base offset of the GGTT.

This function may be called multiple times during recovery, but if new_start is unchanged from the current base, it's a noop.
new_start should be a value between xe_wopcm_size() and GUC_GGTT_TOP.
- struct xe_ggtt_node *xe_ggtt_insert_node(struct xe_ggtt *ggtt, u32 size, u32 align)¶
Insert a xe_ggtt_node into the GGTT
Parameters
struct xe_ggtt *ggtt: the xe_ggtt into which the node should be inserted
u32 size: size of the node
u32 align: alignment constraint of the node
Return
xe_ggtt_node on success or an ERR_PTR on failure.
- size_t xe_ggtt_node_pt_size(const struct xe_ggtt_node *node)¶
Get the size of page table entries needed to map a GGTT node.
Parameters
const struct xe_ggtt_node *node: the xe_ggtt_node
Return
GGTT node page table entries size in bytes.
- void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_ggtt_node *node, struct xe_bo *bo, u64 pte)¶
Map the BO into GGTT
Parameters
struct xe_ggtt *ggtt: the xe_ggtt where the node will be mapped
struct xe_ggtt_node *node: the xe_ggtt_node where this BO is mapped
struct xe_bo *bo: the xe_bo to be mapped
u64 pte: The PTE flags to append.
Parameters
struct xe_ggtt *ggtt: the xe_ggtt where the node will be mapped
struct xe_bo *bo: the xe_bo to be mapped
Description
This is used to restore a GGTT mapping after suspend.
- struct xe_ggtt_node *xe_ggtt_insert_node_transform(struct xe_ggtt *ggtt, struct xe_bo *bo, u64 pte_flags, u64 size, u32 align, xe_ggtt_transform_cb transform, void *arg)¶
Insert a newly allocated xe_ggtt_node into the GGTT
Parameters
struct xe_ggtt *ggtt: the xe_ggtt where the node will be inserted/reserved
struct xe_bo *bo: The bo to be transformed
u64 pte_flags: The extra GGTT flags to add to the mapping.
u64 size: size of the node
u32 align: required alignment for the node
xe_ggtt_transform_cb transform: transformation function that will populate the GGTT node, or NULL for a linear mapping.
void *arg: Extra argument to pass to the transformation function.
Description
This function allows inserting a GGTT node with a custom transformation function. This is useful for display, to allow inserting rotated framebuffers into the GGTT.
Return
A pointer to the xe_ggtt_node struct on success. An ERR_PTR otherwise.
- int xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo, u64 start, u64 end, struct drm_exec *exec)¶
Insert BO at a specific GGTT space
Parameters
struct xe_ggtt *ggtt: the xe_ggtt where the bo will be inserted
struct xe_bo *bo: the xe_bo to be inserted
u64 start: address where it will be inserted
u64 end: end of the range where it will be inserted
struct drm_exec *exec: The drm_exec transaction to use for exhaustive eviction.
Return
0 on success or a negative error code on failure.
Parameters
struct xe_ggtt *ggtt: the xe_ggtt where the bo will be inserted
struct xe_bo *bo: the xe_bo to be inserted
struct drm_exec *exec: The drm_exec transaction to use for exhaustive eviction.
Return
0 on success or a negative error code on failure.
Parameters
struct xe_ggtt *ggtt: the xe_ggtt where the node will be removed
struct xe_bo *bo: the xe_bo to be removed
Parameters
struct xe_ggtt *ggtt: the xe_ggtt that will be inspected
u64 alignment: minimum alignment
u64 *spare: If not NULL: in: desired memory size to be spared / out: adjusted possible spare
Return
size of the largest contiguous GGTT region
- void xe_ggtt_assign(const struct xe_ggtt_node *node, u16 vfid)¶
Assign a GGTT region to the VF
Parameters
const struct xe_ggtt_node *node: the xe_ggtt_node to update
u16 vfid: the VF identifier
Description
This function is used by the PF driver to assign a GGTT region to the VF. In addition to the PTE's VFID bits 11:2, the PRESENT bit 0 is also set, as on some platforms VFs can't modify that either.
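The bit manipulation described here can be sketched as follows. The masks are written out by hand for a standalone example; this is not the driver's PTE encode helper.

```c
#include <stdint.h>

#define PTE_PRESENT	(1ull << 0)
#define PTE_VFID_MASK	(0x3FFull << 2)	/* bits 11:2 */

/* Place the VFID in PTE bits 11:2 and force the PRESENT bit. */
static uint64_t pte_assign_vfid(uint64_t pte, uint16_t vfid)
{
	pte &= ~PTE_VFID_MASK;
	pte |= ((uint64_t)vfid << 2) & PTE_VFID_MASK;
	pte |= PTE_PRESENT;
	return pte;
}
```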
- int xe_ggtt_node_save(struct xe_ggtt_node *node, void *dst, size_t size, u16 vfid)¶
Save a xe_ggtt_node to a buffer.
Parameters
struct xe_ggtt_node *node: the xe_ggtt_node to be saved
void *dst: destination buffer
size_t size: destination buffer size in bytes
u16 vfid: VF identifier
Return
0 on success or a negative error code on failure.
- int xe_ggtt_node_load(struct xe_ggtt_node *node, const void *src, size_t size, u16 vfid)¶
Load a xe_ggtt_node from a buffer.
Parameters
struct xe_ggtt_node *node: the xe_ggtt_node to be loaded
const void *src: source buffer
size_t size: source buffer size in bytes
u16 vfid: VF identifier
Return
0 on success or a negative error code on failure.
- int xe_ggtt_dump(struct xe_ggtt *ggtt, struct drm_printer *p)¶
Dump GGTT for debug
Parameters
struct xe_ggtt *ggtt: the xe_ggtt to be dumped
struct drm_printer *p: the drm_printer helper handle to be used to dump the information
Return
0 on success or a negative error code on failure.
- u64 xe_ggtt_print_holes(struct xe_ggtt *ggtt, u64 alignment, struct drm_printer *p)¶
Print holes
Parameters
struct xe_ggtt *ggtt: the xe_ggtt to be inspected
u64 alignment: min alignment
struct drm_printer *p: the drm_printer
Description
Print GGTT ranges that are available and return total size available.
Return
Total available size.
- u64 xe_ggtt_encode_pte_flags(struct xe_ggtt *ggtt, struct xe_bo *bo, u16 pat_index)¶
Get PTE encoding flags for BO
Parameters
struct xe_ggtt *ggtt: the xe_ggtt
struct xe_bo *bo: the xe_bo
u16 pat_index: The pat_index for the PTE.
Description
This function returns the pte_flags for a given BO, without the address. It's used for DPT to fill a GGTT-mapped BO with a linear lookup table.
Parameters
struct xe_ggtt *ggtt: the xe_ggtt
u64 offset: the offset for which the mapping should be read.
Description
Used by testcases, and by display reading out an inherited bios FB.
- u64 xe_ggtt_node_addr(const struct xe_ggtt_node *node)¶
Get the node offset in the GGTT.
Parameters
const struct xe_ggtt_node *node: the xe_ggtt_node
Description
Get the GGTT offset of the allocated node.
- u64 xe_ggtt_node_size(const struct xe_ggtt_node *node)¶
Get the node allocation size.
Pagetable building¶
Below we use the term "page-table" for both page-directories, containing pointers to lower level page-directories or page-tables, and level 0 page-tables that contain only page-table-entries pointing to memory pages.
When inserting an address range in an already existing page-table tree, there will typically be a set of page-tables that are shared with other address ranges, and a set that are private to this address range. The set of shared page-tables can be at most two per level, and those can't be updated immediately because the entries of those page-tables may still be in use by the GPU for other mappings. Therefore, when inserting entries into those, we instead stage those insertions by adding insertion data into struct xe_vm_pgtable_update structures. This data (subtrees for the CPU and page-table-entries for the GPU) is then added in a separate commit step. CPU data is committed while still under the vm lock, the object lock and, for userptr, the notifier lock in read mode. The GPU async data is committed either by the GPU or CPU after fulfilling relevant dependencies.

For non-shared page-tables (and, in fact, for shared ones that aren't existing at the time of staging), we add the data in-place without the special update structures. This private part of the page-table tree will remain disconnected from the vm page-table tree until data is committed to the shared page tables of the vm tree in the commit phase.
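The staging-versus-in-place split can be modeled with a toy page-table. The real structure is struct xe_vm_pgtable_update; everything below is a simplified, hypothetical stand-in that only shows the two-phase idea: shared page-tables get their entries recorded in an update record and applied at commit, private ones are written immediately.

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_ENTRIES 8	/* tiny page-table for the example */

struct toy_pt {
	uint64_t entry[PT_ENTRIES];
	bool shared;	/* shared with other address ranges? */
};

/* Toy analogue of struct xe_vm_pgtable_update: one contiguous run
 * of staged entries for a single page-table. */
struct toy_pgtable_update {
	struct toy_pt *pt;
	unsigned int ofs, qwords;
	uint64_t val[PT_ENTRIES];
};

static void stage_or_write(struct toy_pt *pt, unsigned int ofs, uint64_t val,
			   struct toy_pgtable_update *u)
{
	if (pt->shared) {
		/* Stage: the GPU may still use this page-table, so the
		 * write is deferred to the commit step. Staged entries
		 * are assumed contiguous from the first staged offset. */
		if (!u->qwords) {
			u->pt = pt;
			u->ofs = ofs;
		}
		u->val[u->qwords++] = val;
	} else {
		/* Private page-table: write in place immediately. */
		pt->entry[ofs] = val;
	}
}

/* Commit phase: apply the staged entries to the shared page-table. */
static void commit(struct toy_pgtable_update *u)
{
	for (unsigned int i = 0; i < u->qwords; i++)
		u->pt->entry[u->ofs + i] = u->val[i];
	u->qwords = 0;
}
```

In the driver, the commit of CPU data happens under the vm lock (plus object and, for userptr, notifier locks), and the GPU-side data is committed asynchronously once its dependencies are fulfilled; the toy collapses all of that into a single `commit()` call.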