12. Linux IOMMU Support
The architecture spec can be obtained from the below location.
This guide gives a quick cheat sheet for some basic understanding.
Some Keywords
- DMAR - DMA remapping
- DRHD - DMA Remapping Hardware Unit Definition
- RMRR - Reserved Memory Region Reporting Structure
- ZLR - Zero length reads from PCI devices
- IOVA - I/O Virtual Address
12.1. Basic stuff
ACPI enumerates and lists the different DMA engines in the platform, and the device scope relationships between PCI devices and the DMA engine that controls them.
12.2. What is RMRR?
There are some devices the BIOS controls, e.g. USB devices that perform PS/2 emulation. The regions of memory used by these devices are marked reserved in the e820 map. When we turn on DMA translation, DMA to those regions will fail. Hence the BIOS uses RMRR to specify these regions along with the devices that need to access them. The OS is expected to set up unity mappings for these regions so that these devices can access them.
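A unity mapping simply means that the I/O virtual address equals the physical address for every page in the reserved region. The following is a minimal sketch of that idea, not the driver's code. The 4 KiB page size and the helper name `unity_map` are assumptions made for illustration; the range is the first RMRR from the boot message sample later in this guide.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def unity_map(base, end):
    """Return {iova: phys} identity mappings covering [base, end] inclusive.

    Hypothetical helper: models a unity (1:1) mapping as a dict in which
    every key (IOVA) equals its value (physical address).
    """
    first = base & ~(PAGE_SIZE - 1)  # round the start down to a page boundary
    last = end & ~(PAGE_SIZE - 1)    # page that contains the last byte
    return {addr: addr for addr in range(first, last + PAGE_SIZE, PAGE_SIZE)}


# RMRR base: 0x00000000000ed000 end: 0x00000000000effff
mappings = unity_map(0x000ED000, 0x000EFFFF)
for iova, phys in sorted(mappings.items()):
    print(f"iova {iova:#x} -> phys {phys:#x}")
```

With such a mapping in place, a device that was programmed by the BIOS with physical addresses keeps working after translation is turned on.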
12.3. How is IOVA generated?
Well-behaved drivers call pci_map_*() before sending a command to a device that needs to perform DMA. Once DMA is completed and the mapping is no longer required, the driver calls pci_unmap_*() to unmap the region.
The Intel IOMMU driver allocates a virtual address space per domain. Each PCIe device has its own domain (hence protection). Devices under p2p bridges share the virtual address space with all devices under the same p2p bridge, due to transaction ID aliasing for p2p bridges.
IOVA generation is pretty generic. We use the same technique as vmalloc(), but these are not global address spaces; they are separate for each domain. Different DMA engines may support different numbers of domains.
We also allocate guard pages with each mapping, so we can attempt to catch any overflow that might happen.
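The two properties described above (a separate address space per domain, and a guard page with each mapping) can be illustrated with a toy model. This is a sketch under stated assumptions, not the driver's allocator: the `Domain` class, the simple bump allocation, the single trailing guard page, and the 4 KiB page size are all hypothetical simplifications.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class Domain:
    """Toy per-domain IOVA space (hypothetical, for illustration only)."""

    def __init__(self, limit_pfn=1 << 20):
        self.next_pfn = 1           # skip page frame 0 so IOVA 0 is never used
        self.limit_pfn = limit_pfn  # first page frame outside the space
        self.ranges = {}            # start IOVA -> number of usable pages

    def alloc(self, size):
        """Reserve enough pages for `size` bytes plus one guard page."""
        pages = -(-size // PAGE_SIZE)  # ceiling division
        total = pages + 1              # usable pages + trailing guard page
        if self.next_pfn + total > self.limit_pfn:
            raise MemoryError("IOVA space exhausted")
        iova = self.next_pfn * PAGE_SIZE
        self.next_pfn += total
        self.ranges[iova] = pages
        return iova

    def is_mapped(self, addr):
        """True if addr falls in a usable page (guard pages are not usable)."""
        return any(start <= addr < start + pages * PAGE_SIZE
                   for start, pages in self.ranges.items())


dom_a, dom_b = Domain(), Domain()
a1 = dom_a.alloc(6000)  # needs two pages, plus one guard page
b1 = dom_b.alloc(100)   # a different domain may reuse the same IOVA
a2 = dom_a.alloc(1)     # lands after the guard page of a1
```

In this model an access that runs past the end of `a1` hits the unmapped guard page rather than the next mapping, which is the overflow-catching behaviour the text describes.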
12.4. Graphics Problems?
If you encounter issues with graphics devices, you can try adding the option intel_iommu=igfx_off to turn off DMA remapping for the integrated graphics engine. If this fixes anything, please ensure you file a bug reporting the problem.
12.5. Some exceptions to IOVA
Interrupt ranges (0xfee00000 - 0xfeefffff) are not address translated. The same is true for peer-to-peer transactions. Hence we reserve the addresses of the PCI MMIO ranges so they are not allocated as IOVAs.
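An allocator that honours these exceptions has to reject any candidate IOVA range that overlaps a reserved range. A minimal sketch of that check follows. The function name `overlaps_reserved` and the list layout are hypothetical. Only the interrupt range is taken from the text; PCI MMIO ranges are platform specific and would have to be appended to the list.

```python
# Interrupt range from the text; both bounds are inclusive.
RESERVED = [(0xFEE00000, 0xFEEFFFFF)]


def overlaps_reserved(iova, size, reserved=RESERVED):
    """Return True if [iova, iova + size) touches any reserved range."""
    last = iova + size - 1
    return any(iova <= end and last >= start for start, end in reserved)


print(overlaps_reserved(0xFEDFF000, 4096))  # ends just below the range
print(overlaps_reserved(0xFEDFF000, 8192))  # spills into the range
```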
12.6. Fault reporting
When errors are reported, the DMA engine signals via an interrupt. The fault reason and the device that caused the fault are printed on the console.
See below for sample.
12.7. Boot Message Sample
Something like this gets printed, indicating the presence of DMAR tables in ACPI.
ACPI: DMAR (v001 A M I OEMDMAR 0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0
When DMAR is being processed and initialized by ACPI, it prints the DMAR locations and any RMRRs processed:
ACPI DMAR:Host address width 36
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
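When checking a boot log by hand becomes tedious, the DRHD and RMRR entries can be pulled out with a small script. The sketch below assumes the exact message format of the sample above, which may differ between kernel versions. The names `parse_dmar`, `DRHD_RE` and `RMRR_RE` are hypothetical.

```python
import re

SAMPLE = """\
ACPI DMAR:Host address width 36
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
"""

# Patterns follow the sample format only (an assumption, not a stable ABI).
DRHD_RE = re.compile(
    r"ACPI DMAR:DRHD \(flags: (0x[0-9a-f]+)\)\s*base: (0x[0-9a-f]+)")
RMRR_RE = re.compile(
    r"ACPI DMAR:RMRR base: (0x[0-9a-f]+) end: (0x[0-9a-f]+)")


def parse_dmar(log):
    """Return ([(flags, base), ...], [(base, end), ...]) as integers."""
    drhds = [(int(f, 16), int(b, 16)) for f, b in DRHD_RE.findall(log)]
    rmrrs = [(int(b, 16), int(e, 16)) for b, e in RMRR_RE.findall(log)]
    return drhds, rmrrs


drhds, rmrrs = parse_dmar(SAMPLE)
for base, end in rmrrs:
    print(f"RMRR {base:#x}-{end:#x}: {end - base + 1} bytes")
```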
When DMAR is enabled for use, you will notice the following:
12.8. PCI-DMA: Using DMAR IOMMU
12.8.1. Fault reporting
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
DMAR:[fault reason 05] PTE Write access is not set
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
DMAR:[fault reason 05] PTE Write access is not set
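Each fault is reported as a pair of lines: the request (type, device, address) followed by the reason. A rough sketch of pairing them up is shown below. It assumes the format of the sample above, which may vary between kernel versions. `parse_faults` and the dictionary keys are hypothetical names.

```python
import re

FAULT_LOG = """\
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
DMAR:[fault reason 05] PTE Write access is not set
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
DMAR:[fault reason 05] PTE Write access is not set
"""

REQ_RE = re.compile(
    r"DMAR:\[(DMA \w+)\] Request device \[([0-9a-f:.]+)\] "
    r"fault addr ([0-9a-f]+)")
REASON_RE = re.compile(r"DMAR:\[fault reason (\d+)\] (.+)")


def parse_faults(log):
    """Pair each request line with the reason line that follows it."""
    faults, current = [], None
    for line in log.splitlines():
        line = line.strip()
        req = REQ_RE.match(line)
        if req:
            current = {"type": req.group(1),
                       "device": req.group(2),
                       "addr": int(req.group(3), 16)}
            continue
        reason = REASON_RE.match(line)
        if reason and current is not None:
            current["reason_code"] = int(reason.group(1))
            current["reason"] = reason.group(2)
            faults.append(current)
            current = None
    return faults


faults = parse_faults(FAULT_LOG)
for f in faults:
    print(f"{f['device']} {f['type']} at {f['addr']:#x}: {f['reason']}")
```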
12.9. TBD
- For compatibility testing, could use a unity map domain for all devices: just provide a 1-1 mapping of all useful memory under a single domain for all devices.
- API for paravirt ops for abstracting functionality for VMM folks.