XCP-ng PCIe error messages

XCP-ng runs fine on cheap desktops, which make good low-power virtualization hosts for a homelab. I came across this issue on a Fujitsu Esprimo small form factor machine: the dom0 kernel ring buffer is flooded with messages about PCIe bus errors. The problem looks like this:

pcieport 0000:00:1c.7: AER: Corrected error received: 0000:00:1c.7

pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)

pcieport 0000:00:1c.7:   device [8086:a33f] error status/mask=00001000/00002000

pcieport 0000:00:1c.7:    [12] Timeout  

The device in question appears to be the bridge itself:

00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
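To check which device is behind the address in the log on your own host, something like the following should work from the dom0 shell. The address 00:1c.7 is taken from the messages above and will likely differ on other hardware.

```shell
# Show the device at the bus address reported in the AER messages.
# -nn also prints the numeric vendor:device ID (8086:a33f in the log above).
lspci -nn -s 00:1c.7

# Count how many corrected-error reports are currently in the ring buffer.
dmesg | grep -c 'AER: Corrected error'
```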

It seems that PCIe power management (ASPM, Active State Power Management) does not play well with the dom0 kernel in XCP-ng. The fix is simply to turn the feature off, after which the errors stop. This is done by adding pcie_aspm=off to the dom0 kernel parameters (not the Xen ones!).
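One way to set the parameter is a minimal sketch along these lines, assuming the xen-cmdline helper that XCP-ng ships in /opt/xensource/libexec is present on your host. Editing the GRUB configuration by hand should work as well; either way the dom0 kernel line is the one to change, not the Xen line.

```shell
# Add pcie_aspm=off to the dom0 kernel command line (assumes the xen-cmdline helper exists).
/opt/xensource/libexec/xen-cmdline --set-dom0 "pcie_aspm=off"

# Confirm the value was stored, then reboot the host for it to take effect.
/opt/xensource/libexec/xen-cmdline --get-dom0 pcie_aspm
reboot

# After the reboot: the parameter should show up on the running kernel's command line.
grep -o 'pcie_aspm=[a-z]*' /proc/cmdline
```

If the change took effect, new AER messages should no longer appear in dmesg.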

This was apparently fixed in later kernel versions, but XCP-ng 8.3 still has a relatively old dom0 kernel, so the workaround is still needed there.
