Matthew Garrett's Journal
 
[Most Recent Entries] [Calendar View] [Friends View]

Saturday, November 15th, 2008

    Time Event
    5:01p
    Adventures in PCI hotplug
    I played with an Eee for a bit last time I was in Boston, culminating in a patch to make the eeepc-laptop driver use standard interfaces rather than just having random files in /sys that people need to write custom scripts to use. The world became a better place.

    However. Asus implemented the rfkill control on the Eee in a slightly odd way. Disabling the wifi actually causes the entire card to drop off the bus, similar to how Bluetooth is normally handled. The difference is that the Bluetooth dongles are almost exclusively USB, while the Eee's wifi is PCI. Linux supports hotplugging of PCI devices, but nothing seemed to work out of the box on the Eee. Another case of this was the SD reader in the Acer Aspire One. Unless a card was present in the slot during boot, it simply wouldn't appear on the PCI bus. It turned out that Acer have implemented things in such a way that removing the card results in the entire chip being unplugged. This was when I started looking more closely into how this functionality is implemented.

    The two common cases of PCI hotplug are native PCIe hotplug and ACPI mediated hotplug. In the former case, the chipset generates an interrupt when a hotplug event occurs and the OS then rescans the bus. This is a mildly complicated operation, requiring enabling the slot, checking whether there's a card there, powering the card and all its functions up, waiting for the PCIe link to settle and then announcing the new PCI device to the rest of the OS. ACPI-mediated hotplugging puts more of the load on the firmware rather than the OS - the hotplug event generates a notify message that is caught by the ACPI interpreter in the OS, allowing the OS to check for device presence by calling another ACPI method. If the device is present it's then a simple matter of telling the PCI layer about it.

    Native PCIe hotplug has the advantage that there's much less vendor code involved. ACPI is still involved to an extent - an _OSC method on the PCIe bridge is called to allow the OS to tell the firmware that it supports handling hotplug events. This allows the firmware to stop sending any ACPI notifications. ACPI hotplugging requires more support in the firmware, but can work for PCI as well as PCIe.

    The general approach taken to getting the Eee's wifi hotplugging to work has been to load the pciehp driver with the pciehp_force=1 argument. This tells the driver to listen for hotplugging events even when there's no _OSC method to tell the firmware that the OS is handling things now. Since the hardware will generate the event anyway, things work. However, this is non-ideal. Some hardware exists where ACPI hotplugging will work, but due to quirks in the hardware design native PCIe hotplugging control will fail. This has been handled in their firmware by having the _OSC method fail, signalling to the pciehp driver that it shouldn't bind to the port. Using pciehp_force overrides that, leading to a situation where hardware could potentially be removed from a port that's powered up. Unfortunate.

    My first approach was to add a new argument to pciehp called pciehp_passive. This would indicate to the pciehp driver that it should only listen for notifications from the hardware. User-triggered events would not be supported, avoiding the situation where anyone could remove the card by accident. This worked on my test machine (an Eee 901 somewhere in Ottawa, since I don't actually have one myself...) but was reported to work less well on a 700. Since the 700 didn't claim to have any support for power control, the code was forced to wait a second on every operation to see whether the link powered up or not. This resulted in long pauses during boot and suspend/resume operations.

    The final issue that convinced me that this was the wrong approach was reading a document on Microsoft's site on how PCIe hotplugging is implemented in Windows. It turns out that XP doesn't support native PCIe hotplugging at all - that feature was added in Vista. Both the Eee and the Aspire One are available with XP, but things work there. So PCIe native hotplugging was clearly not the right answer. Time to look further.

    Armed with a disassembly of the Aspire One's DSDT, I figured out why the ACPI hotplug driver didn't work on it. The first thing the driver does is walk the list of ACPI devices, looking for any that are removable. That was being implemented by looking for an _EJ0 method. _EJ0 indicates that the device can be ejected under the control of the OS. The Aspire One doesn't have an _EJ0 method on its SD readers. However, it did have an _RMV method. This can be used to indicate that a device is removable but not ejectable - that is, the device can be removed (by physically pulling it out or by the hardware taking it away itself), but there's no standard way to ask the OS to logically disconnect it. A quick patch to acpiphp later and the Aspire One now worked without any forcing or spec contravention. This also has the nice side effect of making expresscard hotplug work on a bunch of machines where it otherwise wouldn't.

    But back to the Eee. acpiphp still wasn't binding, and a closer examination revealed why. There's nothing to indicate that the Eee's ports are hotpluggable, and there's no topological data in the ACPI tables that ties the wifi function to the PCIe root bridges. However, the Eee firmware was sending an ACPI notification on wifi hotplug. But it was only sending this to the PCIe root bridges, and there's no way to then tell which device had potentially appeared or vanished.

    In the end, I gave up on trying to solve this generically. Instead I've got a patch that implements the hotplugging entirely in eeepc-laptop. In an ideal world nobody else will have implemented this in the same way as Asus and we can all be happy.
    6:01p
    Hybrid suspend
    One often requested feature for suspend support on Linux is hybrid suspend, or "suspend to both". This is where the system suspends to disk, but then puts the machine in S3 rather than powering down. If the user resumes without power having been removed they get the benefit of the fast S3 resume. If not, the system resumes from disk and no data is lost.

    This is, clearly, the way suspend should work. We're not planning on adding it by default in Fedora, though, for a couple of reasons. The main reason right now is that the in-kernel suspend to disk is still slow. Triggering a suspend to disk on a machine with gigabytes of RAM (which is a basic laptop configuration these days) will leave you sitting there for an extended period of time when all you actually want to do is pick your machine up and leave. Fixing this properly is less than trivial. TuxOnIce improves the speed somewhat, at the expense of being a >500k patch against upstream that touches all sorts of interesting bits of the kernel such as the vm system. We're not supporting that for fairly obvious reasons. But even then, the suspend to disk process involves discarding some pages. Those need to be pulled in from disk again on resume. With the current implementation, suspend to both is fundamentally slower than suspend to RAM for both the suspend and resume paths.

    So, what other approaches are there? One is to resume from RAM some period of time after suspending and then if the battery is low suspend to disk. Many recent machines will automatically resume when the battery level becomes critical. If the hardware doesn't support that, we can wake up after a set period and measure the battery consumption, set a new alarm and go to sleep again. The downside of this approach is that your system wakes up and does stuff without you being aware of it, which may be bad if it's inside a neoprane cover at the time. Cooking laptops is generally considered unhelpful.

    Using the kexec approach to hibernation provides a more straightforward way of handling the problem. The fundamental problem with the existing approach is that it ties suspend into the vm system and involves making atomic copies of RAM into other bits of RAM. kexec would allow us to pre-allocate enough space on disk to save RAM as-is, and then simply kexec into a new kernel and dump RAM to disk without any of the tedious shrinking required first. Resuming from S3 would kexec back into the old kernel, whereas losing power would just fall back to reading off disk. The extra time taken on the S3 path would be minimal.

    In an ideal world we'd adopt the Vista approach where "off" is synonymous with suspend. There's still more work to be done on enhancing reliability before that can be achieved, though.
    8:32p
    And another thing
    I swear I'm going out in a minute, but:

    Running strings on the firmware for a Dlink wireless bridge I have gives output that includes the following:
    From isolation / Deliver me o Xbox - / Through the ethernet
    Copyright (c) Microsoft Corporation. All Rights Reserved.
    Device is Xbox Compatible. Copyright (c) Microsoft Corporation. All Rights Reserved.
    . This confused me for a while until I plugged it into an Xbox 360 and discovered that despite it being plugged into the ethernet port, I could control the wifi options including network selection and encryption method. Does anyone have the faintest idea how this is implemented? A tcpdump of the Xbox booting reveals some ICMP6 packets, a bunch of DHCP and some uPNP service discovery. uPNP seems like a plausible option, but I've got no idea how to probe a device for uPNP services using Linux. Has anyone played with reverse engineering this stuff? Googling didn't seem to show anything up.

    << Previous Day 2008/11/15
    [Calendar]
    Next Day >>

My Website   About LiveJournal.com

Advertisement