I915 Small BAR RFC Section
Starting from DG2 we will have resizable BAR support for device local-memory (i.e. I915_MEMORY_CLASS_DEVICE), but in some cases the final BAR size might still be smaller than the total probed_size. In such cases, only some subset of I915_MEMORY_CLASS_DEVICE will be CPU accessible (for example the first 256M), while the remainder is only accessible via the GPU.
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag
New gem_create_ext flag to tell the kernel that a BO will require CPU access. This becomes important when placing an object in I915_MEMORY_CLASS_DEVICE, where underneath the device has a small BAR, meaning only some portion of it is CPU accessible. Without this flag the kernel will assume that CPU access is not required, and prioritize using the non-CPU visible portion of I915_MEMORY_CLASS_DEVICE.
- struct __drm_i915_gem_create_ext
Existing gem_create behaviour, with added extension support using struct i915_user_extension.
Definition:
    struct __drm_i915_gem_create_ext {
        __u64 size;
        __u32 handle;
    #define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
        __u32 flags;
    #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
    #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
        __u64 extensions;
    };
Members
size
Requested size for the object.
The (page-aligned) allocated size for the object will be returned.
Note that for some devices we might have further minimum page-size restrictions (larger than 4K), like for device local-memory. However in general the final size here should always reflect any rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS extension to place the object in device local-memory. The kernel will always select the largest minimum page-size for the set of possible placements as the value to use when rounding up the size.
handle
Returned handle for the object.
Object handles are nonzero.
flags
Optional flags.
Supported values:
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that the object will need to be accessed via the CPU.
Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and only strictly required on configurations where some subset of the device memory is directly visible/mappable through the CPU (which we also call small BAR), like on some DG2+ systems. Note that this is quite undesirable, but due to various factors like the client CPU, BIOS etc. it’s something we can expect to see in the wild. See __drm_i915_memory_region_info.probed_cpu_visible_size for how to determine whether this applies to the current system.
Note that one of the placements MUST be I915_MEMORY_CLASS_SYSTEM, to ensure the kernel can always spill the allocation to system memory, if the object can’t be allocated in the mappable part of I915_MEMORY_CLASS_DEVICE.
Also note that since the kernel only supports flat-CCS on objects that can only be placed in I915_MEMORY_CLASS_DEVICE, we therefore don’t support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with flat-CCS.
Without this hint, the kernel will assume that non-mappable I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the kernel can still migrate the object to the mappable part, as a last resort, if userspace ever CPU faults this object, but this might be expensive, and so ideally should be avoided.
On older kernels which lack the relevant small-bar uAPI support (see also __drm_i915_memory_region_info.probed_cpu_visible_size), usage of the flag will result in an error, but it should NEVER be possible to end up with a small BAR configuration, assuming we can also successfully load the i915 kernel module. In such cases the entire I915_MEMORY_CLASS_DEVICE region will be CPU accessible, and as such there are zero restrictions on where the object can be placed.
extensions
The chain of extensions to apply to this object.
This will be useful in the future when we need to support several different extensions, and we need to apply more than one when creating the object. See struct i915_user_extension.
If we don’t supply any extensions then we get the same old gem_create behaviour.
For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see struct drm_i915_gem_create_ext_memory_regions.
For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see struct drm_i915_gem_create_ext_protected_content.
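The rounding rule described for the size member can be illustrated with a small sketch. This is not kernel code: round_up_object_size() is a hypothetical helper, the 4K base page size is an assumption, and the per-placement minimum page sizes passed in are example inputs rather than values taken from the uAPI.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the i915 uAPI): round a requested
 * object size up to the largest minimum page size among the candidate
 * placements. Every entry must be a non-zero power of two.
 */
static uint64_t round_up_object_size(uint64_t size,
                                     const uint64_t *min_page_sizes,
                                     size_t count)
{
    uint64_t align = 4096; /* assumed base page size for this sketch */
    size_t i;

    /* Pick the largest minimum page size across all placements. */
    for (i = 0; i < count; i++) {
        if (min_page_sizes[i] > align)
            align = min_page_sizes[i];
    }

    /* Power-of-two round-up. */
    return (size + align - 1) & ~(align - 1);
}
```

For instance, if one placement had a 64K minimum page size (an assumed figure), a 4097 byte request that may land in either placement would be reported back as 64K.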
Description
Note that new buffer flags should be added here, at least for the stuff that is immutable. Previously we would have two ioctls, one to create the object with gem_create, and another to apply various parameters, however this creates some ambiguity for the params which are considered immutable. Also in general we’re phasing out the various SET/GET ioctls.
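The placement rules for I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS could be pre-checked in userspace along the following lines. This is only a sketch of one reading of the rules above: the SKETCH_* names and needs_cpu_access_placements_ok() are hypothetical stand-ins so the example is self-contained, real code would use the definitions from the i915 uAPI header, and the kernel remains the authority on what it accepts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for this sketch, not the real uAPI definitions. */
enum sketch_memory_class {
    SKETCH_MEMORY_CLASS_SYSTEM,
    SKETCH_MEMORY_CLASS_DEVICE,
};
#define SKETCH_FLAG_NEEDS_CPU_ACCESS (1u << 0)

/*
 * Hypothetical pre-check mirroring the documented rules: the flag is
 * only valid with a DEVICE placement, and one of the placements must be
 * SYSTEM so that the kernel can spill the allocation to system memory.
 */
static bool needs_cpu_access_placements_ok(uint32_t flags,
                                           const enum sketch_memory_class *placements,
                                           size_t count)
{
    bool has_system = false, has_device = false;
    size_t i;

    if (!(flags & SKETCH_FLAG_NEEDS_CPU_ACCESS))
        return true; /* nothing to check without the flag */

    for (i = 0; i < count; i++) {
        if (placements[i] == SKETCH_MEMORY_CLASS_SYSTEM)
            has_system = true;
        else if (placements[i] == SKETCH_MEMORY_CLASS_DEVICE)
            has_device = true;
    }

    return has_system && has_device;
}
```

A device-only placement list combined with the flag fails this check, which matches the flat-CCS restriction: such objects can only live in I915_MEMORY_CLASS_DEVICE and so cannot also request CPU access.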
probed_cpu_visible_size attribute
New struct __drm_i915_memory_region_info attribute which returns the total size of the CPU accessible portion, for the particular region. This should only be applicable for I915_MEMORY_CLASS_DEVICE. We also report the unallocated_cpu_visible_size, alongside the unallocated_size.
Vulkan will need this as part of creating a separate VkMemoryHeap with the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT set, to represent the CPU visible portion, where the total size of the heap needs to be known. It also wants to be able to give a rough estimate of how much memory can potentially be allocated.
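As a rough illustration of the heap sizing a Vulkan driver might derive from these values, consider the sketch below. It deliberately avoids the Vulkan headers: struct sketch_heap_sizes and split_device_heaps() are hypothetical names, the sketch assumes a non-zero probed_cpu_visible_size, and how a real driver lays out its heaps is not specified here.

```c
#include <stdint.h>

/* Hypothetical result type: sizes of the two heaps a driver might expose. */
struct sketch_heap_sizes {
    uint64_t host_visible; /* CPU visible portion of device memory */
    uint64_t device_only;  /* remainder, accessible only via the GPU */
};

static struct sketch_heap_sizes split_device_heaps(uint64_t probed_size,
                                                   uint64_t probed_cpu_visible_size)
{
    struct sketch_heap_sizes heaps;

    /* Clamp defensively; the visible size should never exceed the total. */
    if (probed_cpu_visible_size > probed_size)
        probed_cpu_visible_size = probed_size;

    heaps.host_visible = probed_cpu_visible_size;
    heaps.device_only = probed_size - probed_cpu_visible_size;
    return heaps;
}
```

With an 8 GiB region (an assumed figure) and a 256M CPU visible window, this gives a 256M host-visible heap and leaves the remainder as GPU-only memory.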
- struct __drm_i915_memory_region_info
Describes one region as known to the driver.
Definition:
    struct __drm_i915_memory_region_info {
        struct drm_i915_gem_memory_class_instance region;
        __u32 rsvd0;
        __u64 probed_size;
        __u64 unallocated_size;
        union {
            __u64 rsvd1[8];
            struct {
                __u64 probed_cpu_visible_size;
                __u64 unallocated_cpu_visible_size;
            };
        };
    };
Members
region
The class:instance pair encoding
rsvd0
MBZ
probed_size
Memory probed by the driver
Note that it should not be possible to ever encounter a zero value here, also note that no current region type will ever return -1 here. Although for future region types, this might be a possibility. The same applies to the other size fields.
unallocated_size
Estimate of memory remaining
Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this (or if this is an older kernel) the value here will always equal the probed_size. Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_size).
{unnamed_union}
anonymous
rsvd1
MBZ
{unnamed_struct}
anonymous
probed_cpu_visible_size
Memory probed by the driver that is CPU accessible.
This will always be <= probed_size, and the remainder (if there is any) will not be CPU accessible.
On systems without small BAR, the probed_size will always equal the probed_cpu_visible_size, since all of it will be CPU accessible.
Note this is only tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_size).
Note that if the value returned here is zero, then this must be an old kernel which lacks the relevant small-bar uAPI support (including I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on such systems we should never actually end up with a small BAR configuration, assuming we are able to load the kernel module. Hence it should be safe to treat this the same as when probed_cpu_visible_size == probed_size.
unallocated_cpu_visible_size
Estimate of CPU visible memory remaining
Note this is only tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_cpu_visible_size).
Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this the value here will always equal the probed_cpu_visible_size.
If this is an older kernel the value here will be zero, see also probed_cpu_visible_size.
Description
Note this is using both struct drm_i915_query_item and struct drm_i915_query. For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS at drm_i915_query_item.query_id.
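The rules for interpreting probed_cpu_visible_size, including the zero value reported by older kernels, might be applied as in the sketch below. struct sketch_region_sizes is a local mirror of just the two fields needed, and both helper names are hypothetical; filling the values in from an actual DRM_I915_QUERY_MEMORY_REGIONS query is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the two fields this sketch needs (not the uAPI struct). */
struct sketch_region_sizes {
    uint64_t probed_size;
    uint64_t probed_cpu_visible_size;
};

/*
 * A zero probed_cpu_visible_size indicates an older kernel without the
 * small-bar uAPI; treat the whole region as CPU visible in that case.
 */
static uint64_t effective_cpu_visible_size(const struct sketch_region_sizes *r)
{
    if (r->probed_cpu_visible_size == 0)
        return r->probed_size;
    return r->probed_cpu_visible_size;
}

/* Small BAR means only part of the region is CPU visible. */
static bool region_is_small_bar(const struct sketch_region_sizes *r)
{
    return effective_cpu_visible_size(r) < r->probed_size;
}
```

Userspace would only need to consider I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS when such a check reports a small BAR configuration for a device local-memory region.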
Error Capture restrictions
With error capture we have two new restrictions:
1) Error capture is best effort on small BAR systems; if the pages are not CPU accessible, at the time of capture, then the kernel is free to skip trying to capture them.
2) On discrete and newer integrated platforms we now reject error capture on recoverable contexts. In the future the kernel may want to blit during error capture, when for example something is not currently CPU accessible.