Matthew Garrett (mjg59) wrote,

Adventures in PCI hotplug

I played with an Eee for a bit last time I was in Boston, culminating in a patch to make the eeepc-laptop driver use standard interfaces rather than just having random files in /sys that people need to write custom scripts to use. The world became a better place.

However. Asus implemented the rfkill control on the Eee in a slightly odd way. Disabling the wifi actually causes the entire card to drop off the bus, similar to how Bluetooth is normally handled. The difference is that the Bluetooth dongles are almost exclusively USB, while the Eee's wifi is PCI. Linux supports hotplugging of PCI devices, but nothing seemed to work out of the box on the Eee. Another case of this was the SD reader in the Acer Aspire One. Unless a card was present in the slot during boot, it simply wouldn't appear on the PCI bus. It turned out that Acer have implemented things in such a way that removing the card results in the entire chip being unplugged. This was when I started looking more closely into how this functionality is implemented.

The two common cases of PCI hotplug are native PCIe hotplug and ACPI-mediated hotplug. In the former case, the chipset generates an interrupt when a hotplug event occurs and the OS then rescans the bus. This is a mildly complicated operation, requiring enabling the slot, checking whether there's a card there, powering the card and all its functions up, waiting for the PCIe link to settle and then announcing the new PCI device to the rest of the OS. ACPI-mediated hotplugging puts more of the load on the firmware rather than the OS: the hotplug event generates a notify message that is caught by the ACPI interpreter in the OS, allowing the OS to check for device presence by calling another ACPI method. If the device is present, it's then a simple matter of telling the PCI layer about it.
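To make the ordering of the native case concrete, here is a minimal sketch of those steps as a toy model. It is not kernel code: the Slot class, its fields and handle_hotplug_interrupt are all hypothetical names invented for illustration, and the "link settle" wait is reduced to a bounded polling loop.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Toy model of a hotplug-capable PCIe slot (every field is hypothetical)."""
    card_present: bool
    polls_until_link_up: int = 3  # pretend the link needs a few polls to train
    enabled: bool = False
    powered: bool = False
    polls: int = 0

    def link_is_up(self) -> bool:
        # Each call stands in for one read of the link status.
        self.polls += 1
        return self.powered and self.polls >= self.polls_until_link_up


def handle_hotplug_interrupt(slot: Slot, announce, max_polls: int = 10) -> bool:
    """Run the steps in order; return True if a device was announced."""
    slot.enabled = True                 # 1. enable the slot
    if not slot.card_present:           # 2. is there a card at all?
        return False
    slot.powered = True                 # 3. power up the card and its functions
    for _ in range(max_polls):          # 4. wait for the link to settle
        if slot.link_is_up():
            announce("device added")    # 5. tell the rest of the OS
            return True
    slot.powered = False                # link never came up: power back down
    return False


events = []
handle_hotplug_interrupt(Slot(card_present=True), events.append)
handle_hotplug_interrupt(Slot(card_present=False), events.append)
print(events)
```

Only the first call announces anything; the empty slot bails out at step 2 without ever being powered.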

Native PCIe hotplug has the advantage that there's much less vendor code involved. ACPI is still involved to an extent - an _OSC method on the PCIe bridge is called to allow the OS to tell the firmware that it supports handling hotplug events. This allows the firmware to stop sending any ACPI notifications. ACPI hotplugging requires more support in the firmware, but can work for PCI as well as PCIe.

The general approach taken to getting the Eee's wifi hotplugging to work has been to load the pciehp driver with the pciehp_force=1 argument. This tells the driver to listen for hotplugging events even when there's no _OSC method to tell the firmware that the OS is handling things now. Since the hardware will generate the event anyway, things work. However, this is non-ideal. Some hardware exists where ACPI hotplugging will work, but due to quirks in the hardware design native PCIe hotplugging control will fail. This has been handled in their firmware by having the _OSC method fail, signalling to the pciehp driver that it shouldn't bind to the port. Using pciehp_force overrides that, leading to a situation where hardware could potentially be removed from a port that's powered up. Unfortunate.
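The binding decision described above boils down to a small truth table. The sketch below models it in Python; pciehp_should_bind and its parameters are hypothetical names for illustration, not the driver's actual code, and it assumes the only inputs are whether _OSC exists, whether it granted control, and whether the force option is set.

```python
def pciehp_should_bind(has_osc: bool, osc_granted: bool, force: bool = False) -> bool:
    """Model of whether a native hotplug driver takes over a port."""
    if force:
        # Forcing skips the firmware handshake entirely, including the
        # case where the firmware deliberately refused.
        return True
    return has_osc and osc_granted


# (has_osc, osc_granted) for the situations discussed in the text.
machines = {
    "well-behaved firmware": (True, True),
    "Eee (no _OSC at all)": (False, False),
    "quirky hardware (_OSC fails on purpose)": (True, False),
}

for name, (has_osc, granted) in machines.items():
    normal = pciehp_should_bind(has_osc, granted)
    forced = pciehp_should_bind(has_osc, granted, force=True)
    print(f"{name}: normal={normal} forced={forced}")
```

The last row is the problem case: forcing gets the Eee working, but it also binds to ports where the firmware failed _OSC precisely to keep the native driver away.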

My first approach was to add a new argument to pciehp called pciehp_passive. This would indicate to the pciehp driver that it should only listen for notifications from the hardware. User-triggered events would not be supported, avoiding the situation where anyone could remove the card by accident. This worked on my test machine (an Eee 901 somewhere in Ottawa, since I don't actually have one myself...) but was reported to work less well on a 700. Since the 700 didn't claim to have any support for power control, the code was forced to wait a second on every operation to see whether the link powered up or not. This resulted in long pauses during boot and suspend/resume operations.

The final issue that convinced me that this was the wrong approach was reading a document on Microsoft's site on how PCIe hotplugging is implemented in Windows. It turns out that XP doesn't support native PCIe hotplugging at all - that feature was added in Vista. Both the Eee and the Aspire One are available with XP, but things work there. So PCIe native hotplugging was clearly not the right answer. Time to look further.

Armed with a disassembly of the Aspire One's DSDT, I figured out why the ACPI hotplug driver didn't work on it. The first thing the driver does is walk the list of ACPI devices, looking for any that are removable. That was being implemented by looking for an _EJ0 method. _EJ0 indicates that the device can be ejected under the control of the OS. The Aspire One doesn't have an _EJ0 method on its SD readers. However, it does have an _RMV method. This can be used to indicate that a device is removable but not ejectable - that is, the device can be removed (by physically pulling it out or by the hardware taking it away itself), but there's no standard way to ask the OS to logically disconnect it. A quick patch to acpiphp later and the Aspire One now worked without any forcing or spec contravention. This also has the nice side effect of making ExpressCard hotplug work on a bunch of machines where it otherwise wouldn't.
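The change in detection logic can be sketched as follows. This is a Python model rather than the acpiphp patch itself: is_hotplug_candidate is a hypothetical name, and an ACPI device is represented as a dict mapping method names to zero-argument callables. It assumes _RMV returns 1 for a removable device and 0 otherwise, as the ACPI specification defines.

```python
def is_hotplug_candidate(methods: dict, honour_rmv: bool = True) -> bool:
    """Decide whether a device should be treated as hotpluggable."""
    if "_EJ0" in methods:
        # Ejectable under OS control: the only case the old check accepted.
        return True
    if honour_rmv and "_RMV" in methods:
        # Removable but not ejectable: ask the method for its answer.
        return methods["_RMV"]() == 1
    return False


# Hypothetical device descriptions for illustration.
sd_reader = {"_RMV": lambda: 1}      # like the Aspire One's reader: no _EJ0
docking_slot = {"_EJ0": lambda: None}
soldered_chip = {"_RMV": lambda: 0}

print(is_hotplug_candidate(sd_reader, honour_rmv=False))  # old behaviour
print(is_hotplug_candidate(sd_reader))                    # with _RMV honoured
```

With honour_rmv=False the SD reader is skipped, which matches the original failure; honouring _RMV picks it up while still ignoring devices that report themselves as fixed.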

But back to the Eee. acpiphp still wasn't binding, and a closer examination revealed why. There's nothing to indicate that the Eee's ports are hotpluggable, and there's no topological data in the ACPI tables that ties the wifi function to the PCIe root bridges. However, the Eee firmware was sending an ACPI notification on wifi hotplug. But it was only sending this to the PCIe root bridges, and there's no way to then tell which device had potentially appeared or vanished.

In the end, I gave up on trying to solve this generically. Instead I've got a patch that implements the hotplugging entirely in eeepc-laptop. In an ideal world nobody else will have implemented this in the same way as Asus and we can all be happy.
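The post doesn't show the patch, so the following is only a guess at the general shape of driver-specific handling, not the eeepc-laptop code. The idea it illustrates: because the driver already knows which device the notification must be about, it can compare the radio state with what is on the bus and reconcile the two. The function name, the set standing in for the bus, and the address "01:00.0" are all hypothetical.

```python
def sync_wifi_with_bus(radio_on: bool, bus: set, wifi_addr: str = "01:00.0") -> str:
    """Reconcile the bus contents with the rfkill state after a notification.

    `bus` is a set of device addresses standing in for the PCI bus;
    `wifi_addr` is a made-up address for the wifi card.
    """
    present = wifi_addr in bus
    if radio_on and not present:
        bus.add(wifi_addr)        # stands in for scanning the slot
        return "added"
    if not radio_on and present:
        bus.remove(wifi_addr)     # stands in for removing the device
        return "removed"
    return "unchanged"            # spurious or duplicate notification


bus = {"00:00.0"}
print(sync_wifi_with_bus(True, bus))
print(sync_wifi_with_bus(True, bus))
print(sync_wifi_with_bus(False, bus))
```

Because the function compares state rather than trusting the event, a notification that arrives twice is harmless.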
