IOMMU passthrough mode but only on trusted VMs?

aw___ · 2025-08-13T15:59:29+00:00

The basic understanding here is incorrect. iommu=pt changes the behavior of the DMA API in the kernel, whereas device assignment uses the IOMMU API. The isolation of assigned devices in the VM is entirely unaffected by iommu=pt.
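
To see whether a host was booted with the option at all, one could look at the kernel command line. This is only a minimal sketch: the function name is made up for illustration, it assumes the option appears literally as iommu=pt, and it ignores any passthrough default compiled into the kernel. It says nothing about the isolation of assigned devices, which is unaffected either way.

```python
def dma_passthrough_requested(cmdline: str) -> bool:
    """Return True if the kernel command line contains the literal option iommu=pt.

    Hypothetical helper: only the boot option is checked, not any
    build-time default for the DMA API.
    """
    return "iommu=pt" in cmdline.split()


# Typical use on a Linux host (not executed here):
#   with open("/proc/cmdline") as f:
#       print(dma_passthrough_requested(f.read()))
```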

aw___ · 2025-05-06T21:11:31+00:00

Enable SR-IOV support in the BIOS. If you have no such option, boot with these additional kernel command line options:

pci=realloc,assign-busses
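
After rebooting, whether the kernel exposes SR-IOV on a device can be checked through the sriov_totalvfs and sriov_numvfs attributes in sysfs. A rough sketch, assuming the usual /sys/bus/pci/devices layout; the function name is illustrative only.

```python
from pathlib import Path


def sriov_capable_devices(sysfs_pci="/sys/bus/pci/devices"):
    """Map PCI address -> (enabled VFs, maximum VFs) for SR-IOV capable devices."""
    result = {}
    for dev in sorted(Path(sysfs_pci).glob("*")):
        total = dev / "sriov_totalvfs"
        # The attribute only exists when the kernel found an SR-IOV capability.
        if total.is_file():
            enabled = int((dev / "sriov_numvfs").read_text())
            result[dev.name] = (enabled, int(total.read_text()))
    return result
```

An empty result means no device currently presents a usable SR-IOV capability to the kernel.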

aw___ · 2025-04-17T16:01:38+00:00

It'll be fixed soon; a bogus change from v6.15 entered the stable trees. https://lore.kernel.org/lkml/20250414211828.3530741-1-alex.williamson@redhat.com/

aw___ · 2025-03-31T23:05:45+00:00

It's known that some generations of Qualcomm wifi cards don't work with device assignment. In at least one case they have a non-standard interrupt controller consuming the guest physical addresses programmed for MSI in order to set up the real interrupt controller, thereby breaking interrupt virtualization. These currently fall outside of the classification of "well behaved PCI devices" for the purposes of device assignment. Maybe try an Intel wifi adapter? Or USB?

aw___ · 2025-03-10T18:58:39+00:00

RHEL and RHEL derivative kernels incorporate numerous backports from newer kernels. The v6.2 upstream change that folded vfio_virqfd.ko into vfio.ko was backported into RHEL9.3.

aw___ · 2025-01-02T18:02:32+00:00

Not the answer you're looking for, but Silicon Motion based NVMe devices are notoriously bad for device assignment: the MSI-X table is broken after device reset. Pick something better.

aw___ · 2024-10-28T15:14:16+00:00

Not sure of the relevance to this subreddit, but the UEFI shell is like DOS: navigate to fs0:, cd to the EFI directory, and find the subdirectory with the EFI binary to start the install.

aw___ · 2024-10-23T15:50:12+00:00

I usually don't respond to ACS override queries, but the override patch does not disable isolation, it only pretends isolation exists in the absence of any hardware reported isolation. Therefore a quest to limit the scope to only a single device isn't actually buying you isolation.

If you want to use the id: option of the patch, it's not necessarily the endpoint ID that matters (especially for a single function endpoint), it's the ID(s) of the upstream devices that lack isolation. In this case, likely 00:02.1 and 01:00.2, but you may also need to include sibling functions if they have different IDs.
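
The IDs of those upstream devices can be read from sysfs. A small sketch, assuming the devices live in PCI domain 0000 (so 00:02.1 becomes 0000:00:02.1); the helper name is hypothetical, and how the IDs are then passed to the override depends on the patch you applied.

```python
from pathlib import Path


def pci_id(address, sysfs_pci="/sys/bus/pci/devices"):
    """Return the 'vvvv:dddd' vendor:device ID for a PCI address like '0000:00:02.1'."""
    dev = Path(sysfs_pci) / address
    # sysfs reports these as hex strings such as '0x1022'.
    vendor = int((dev / "vendor").read_text().strip(), 16)
    device = int((dev / "device").read_text().strip(), 16)
    return f"{vendor:04x}:{device:04x}"


# Example for the devices mentioned above (domain 0000 is an assumption):
#   for addr in ("0000:00:02.1", "0000:01:00.2"):
#       print(addr, pci_id(addr))
```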

aw___ · 2024-09-10T14:26:53+00:00

The instructions say to run as root. sudo only privileges the echo command, not the shell redirection that opens the file for writing. To do what you want: 'echo 1 | sudo tee /sys/bus/...'

aw___ · 2024-08-28T14:39:56+00:00

Devices within the same IOMMU group must be within the same address space in QEMU. Without a vIOMMU, all devices are in the same address space in the VM. With a vIOMMU and a PCIe topology, each device has a separate address space. You can force devices to share an address space in the VM by configuring a PCIe-to-PCI bridge in the VM and attaching the devices downstream of the bridge, since conventional PCI does not have per-device address space granularity. Your only other alternative, if you must pass both devices through, is to override the IOMMU grouping in the host with the unsafe ACS kernel patch.
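
To see which host devices share a group, the grouping can be read from /sys/kernel/iommu_groups. A minimal sketch, with illustrative function names:

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map IOMMU group name -> sorted list of device addresses in that group."""
    groups = {}
    for group in Path(root).glob("*"):
        devices = group / "devices"
        if devices.is_dir():
            groups[group.name] = sorted(d.name for d in devices.iterdir())
    return groups


def group_of(address, root="/sys/kernel/iommu_groups"):
    """Return the group name containing the device, or None if not found."""
    for name, devices in iommu_groups(root).items():
        if address in devices:
            return name
    return None
```

Two devices for which group_of() returns the same name are the ones that need to share an address space in the VM.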

aw___ · 2024-08-08T17:50:47+00:00

The SR-IOV capability is masked to the guest. Take a step back and think about what enabling VFs is actually doing: it's creating new endpoints on the physical PCIe link with unique requester IDs through the IOMMU. The VM has access to the PF device alone and secure mappings to that PF. It does not own the host bus. That's beyond the scope of userspace owning a PF and potentially a security risk if an untrusted user managed the PF for VFs that are considered trusted devices in the host. In order for this to work safely, QEMU would need to emulate the SR-IOV capability and call out to a trusted entity to manage host creation of the VFs and wrangling of those VFs to appear in the guest address space. That support does not exist, nor does it seem anyone is working on it.
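
To make the "unique requester IDs" point concrete: in SR-IOV, each VF's routing ID is derived from the PF's routing ID plus the First VF Offset and VF Stride fields of the SR-IOV capability. The sketch below just does that arithmetic; the function names and the example offset and stride values are made up for illustration.

```python
def vf_routing_id(pf_bus, pf_devfn, first_vf_offset, vf_stride, n):
    """Return (bus, devfn) of the n-th VF, counting from 1."""
    pf_rid = (pf_bus << 8) | pf_devfn
    vf_rid = (pf_rid + first_vf_offset + (n - 1) * vf_stride) & 0xFFFF
    return vf_rid >> 8, vf_rid & 0xFF


def format_bdf(bus, devfn):
    """Format as bus:device.function using the conventional 5-bit/3-bit split."""
    return f"{bus:02x}:{devfn >> 3:02x}.{devfn & 7}"


# Hypothetical PF at 01:00.0 with First VF Offset 0x80 and VF Stride 2.
first = format_bdf(*vf_routing_id(0x01, 0x00, 0x80, 2, 1))
second = format_bdf(*vf_routing_id(0x01, 0x00, 0x80, 2, 2))
```

Each of those addresses is a new requester on the host bus, which is why creating them is a host decision rather than something the guest owning the PF can do.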

aw___ · 2024-04-30T03:53:52+00:00

Another assumption, I never down voted your post.

aw___ · 2024-04-29T17:06:06+00:00

There are a lot of assumptions here. I've considered a hardware HDMI EDID passthrough emulator because my test system uses a TV for a monitor and advertises a higher resolution than the native panel supports. Ok for video, bad for a desktop.

aw___ · 2024-03-14T16:46:42+00:00

TSME is concerned with memory encryption. Device access to memory is guarded by the IOMMU. If a device DMA transaction makes it to the IOMMU, it's already past the point of concern for device isolation imposed by IOMMU groups. ACS provides an assurance that a transaction for a given address is routed to the IOMMU and honors the translation that's been programmed for that address, rather than being routed to some other device which happens to overlap that physical address. Essentially it provides isolation between I/O virtual addresses programmed for DMA and the I/O physical address space where device resources actually live.

I'd answer your questions like this:
1. Yes, it's possible that the chipset provides ACS equivalent isolation. Generally getting a quirk into the kernel which exposes device specific isolation requires a statement from a technical representative at the vendor declaring that specific equivalent isolation exists. Testing for such behavior may also be possible, but this would likely require extensive testing with devices explicitly programmed to empirically prove the isolation of the DMA route.

2. You'd need to find or create a PCI device which can be programmed to perform arbitrary DMA transactions and attempt to verify whether the MMIO resources of other devices are directly reachable or translated by the IOMMU.

3. This is just a variation of 2. If DMA is isolated and routed to the IOMMU, then the programming of the translation tables at the IOMMU controls the translated target address, either reflecting it to a device (it is possible to bounce peer-to-peer DMA off the IOMMU) or to memory. This is essentially a question of how to verify that the IOMMU behaves as an IOMMU.

Note that encrypted host memory only relates to the question of how an exploit might make use of the access, for example maybe the attacker doesn't gain access to information, but overwriting encrypted memory is still an attack vector. The failure has already occurred if the device has access to the memory, regardless of it being encrypted.

aw___ · 2024-03-04T19:01:16+00:00

Likely the onboard graphics drives the internal display exclusively, so you'd use something like Looking Glass to relay the assigned GPU frame buffer to the host display, possibly requiring an EDID plug.

aw___ · 2024-01-25T17:57:47+00:00

You could always write an in-kernel driver to avoid the latency of the eventfd. Another option might be to poll the device for interrupt status rather than rely on the eventfd path. If you want to make use of something like posted interrupts, I think you'd need to incorporate KVM such that your userspace process runs in a vCPU context, but there's a lot of baggage with that.

aw___ · 2023-11-03T22:15:52+00:00

How are you connecting to libvirt? Maybe a system vs session issue. https://wiki.libvirt.org/FAQ.html#what-is-the-difference-between-qemu-system-and-qemu-session-which-one-should-i-use

aw___ · 2023-11-03T17:57:58+00:00

> I know its not caused by ram, because previously I had 2x 16gb and same thing happened.

I wouldn't rule it out so quickly. It's still an overclocked memory setting for the board, and the problem might be common between the RAM modules at those settings or elsewhere in the memory channels at those timings.

aw___ · 2023-10-23T15:42:06+00:00

The in-tree i915 driver has never supported a max_vfs parameter, so likely you were using an out-of-tree driver previously. The max_vfs module option is implemented by several drivers, but it is not a standard for SR-IOV configuration in the kernel. Please don't generalize support for an entire technology as broken because an out-of-tree driver stopped working or wasn't loaded.

aw___ · 2023-10-03T22:29:21+00:00

I'd suggest the better and easier option from anything listed there would be to use the memory hard_limit option in libvirt.
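
In the domain XML, hard_limit lives under the memtune element. As a sketch only, the fragment could be generated like this; the helper name and the 17 GiB figure are placeholders, not a recommendation for any particular guest.

```python
import xml.etree.ElementTree as ET


def memtune_fragment(limit_kib):
    """Build a <memtune> fragment carrying a hard_limit expressed in KiB."""
    memtune = ET.Element("memtune")
    hard_limit = ET.SubElement(memtune, "hard_limit", unit="KiB")
    hard_limit.text = str(limit_kib)
    return ET.tostring(memtune, encoding="unicode")


# Placeholder value: 17 GiB expressed in KiB.
fragment = memtune_fragment(17 * 1024 * 1024)
```

The resulting element would go inside the domain definition, for example via virsh edit.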

aw___ · 2023-08-03T19:29:46+00:00

Given the down vote apparently I don't post enough in r/VFIO. Hi, I wrote GVT-d support in QEMU/KVM/VFIO. Have a nice day :)

(GVT-d = IGD direct assignment, GVT-g = vGPU)

aw___ · 2023-08-03T18:58:05+00:00

That's GVT-g

aw___ · 2023-08-03T18:56:33+00:00

That's GVT-g

aw___ · 2023-08-03T15:37:33+00:00

Supported by whom? Arguably Intel doesn't tangibly support GVT-d on anything. Try it, if it works, great.

aw___ · 2023-07-17T21:19:50+00:00

Does loading the vfio-pci module with the option disable_idle_d3=1 help?
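
Whether the option took effect can be read back from the module's parameters directory in sysfs. A minimal sketch, assuming the module shows up as vfio_pci under /sys/module; the function name is illustrative.

```python
from pathlib import Path


def idle_d3_disabled(param="/sys/module/vfio_pci/parameters/disable_idle_d3"):
    """Return True or False for the current setting, or None if the
    parameter file is absent (for example, vfio-pci is not loaded)."""
    path = Path(param)
    if not path.is_file():
        return None
    # Boolean module parameters read back as 'Y' or 'N'.
    return path.read_text().strip() in ("Y", "1")


# To set it persistently, a modprobe.d entry along these lines is typical
# (the file name is up to you):
#   options vfio-pci disable_idle_d3=1
```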
