Supporting UEFI secure boot on Linux: the details

(Update, January 18th 2012: you probably want to read this for an explanation of why the technical details described below are not the difficult part of the problem)

An obvious question is why Linux doesn't support UEFI secure booting. Let's ignore the issues of key distribution and the GPL and all of those things, and instead just focus on what would be required. There are two components - the signed binary and the authenticated variables.

The UEFI 2.3.1 spec describes the modification to the binary format required to produce a signed binary. It's not especially difficult - you add an extra entry to the image directory, generate a hash of the entire binary other than the checksum, the certificate directory entry and the signatures themselves, sign that hash with your private key and embed the resulting signature in the binary. The problem has been that there was a disagreement between Microsoft and Intel over whether this signature was supposed to include the PKCS header or not, and until earlier this week the only widely available developer firmware (Intel's) was incompatible with the only widely available signed OS (Microsoft's). There's further hilarity in that the specification lists six supported hash algorithms, but the implementations will only accept two. So pretty normal, really. Developing towards a poorly defined target is a pain. Now that there's more clarity we'll probably have a signing tool before too long.
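
To make the hashing step concrete, here's a minimal sketch in C using OpenSSL's SHA-256 (one of the two digests that implementations actually accept). It follows the simplified description above - hash everything except the checksum field, the certificate directory entry and any attached signatures - and skips the section-sorting details of the full Authenticode algorithm, so treat it as an outline rather than a signing tool:

    /* Sketch: hash a PE binary as described above, skipping the checksum,
     * the certificate directory entry and any attached signatures.
     * Simplified - real Authenticode also sorts sections by file offset.
     * Build with -lcrypto. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <openssl/sha.h>

    int main(int argc, char **argv)
    {
        FILE *f;
        uint8_t *buf, digest[SHA256_DIGEST_LENGTH];
        long len;
        uint32_t pe, opt, cksum, certdir, cert_off;
        uint16_t magic;
        SHA256_CTX ctx;
        int i;

        if (argc < 2 || !(f = fopen(argv[1], "rb")))
            return 1;
        fseek(f, 0, SEEK_END);
        len = ftell(f);
        rewind(f);
        buf = malloc(len);
        if (fread(buf, 1, len, f) != (size_t)len)
            return 1;

        pe = *(uint32_t *)(buf + 0x3c);      /* e_lfanew -> "PE\0\0" */
        opt = pe + 4 + 20;                   /* start of optional header */
        magic = *(uint16_t *)(buf + opt);    /* 0x10b = PE32, 0x20b = PE32+ */
        cksum = opt + 64;                    /* CheckSum field, 4 bytes */
        certdir = opt + (magic == 0x20b ? 112 : 96) + 4 * 8;
                                             /* data directory entry 4 */
        cert_off = *(uint32_t *)(buf + certdir); /* offset of signatures */

        SHA256_Init(&ctx);
        SHA256_Update(&ctx, buf, cksum);                 /* up to checksum */
        SHA256_Update(&ctx, buf + cksum + 4, certdir - cksum - 4);
        SHA256_Update(&ctx, buf + certdir + 8,           /* rest, minus sigs */
                      (cert_off ? cert_off : (uint32_t)len) - certdir - 8);
        SHA256_Final(digest, &ctx);

        for (i = 0; i < SHA256_DIGEST_LENGTH; i++)
            printf("%02x", digest[i]);
        printf("\n");
        return 0;
    }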

Authenticated variables are the other part of the puzzle. If a variable requires authentication, the operating system's attempt to write it will fail unless the new data is appropriately signed. The key databases (the whitelist and blacklist) are examples of authenticated variables. The signing actually takes place in userspace, and the handoff between the kernel and firmware is identical for both this case and the unauthenticated case. The only problem in Linux's support here is that our EFI variable support was written against a pre-1.0 version of the EFI specification, which stated that variables had a maximum size of 1024 bytes, and that limitation ended up exposed to userspace. So all we really need to do there is add a new interface to let arbitrary sized variables be written.
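
For a sense of what such an interface looks like, here's a sketch in C against efivarfs, the filesystem interface Linux eventually grew for exactly this purpose (assuming it's mounted at /sys/firmware/efi/efivars). Each variable is a file whose contents are a 32-bit attribute mask followed by the payload, written in one go; the variable name and GUID below are made up for illustration:

    /* Sketch: write an arbitrarily sized EFI variable through efivarfs.
     * The variable name and GUID are illustrative, not real ones. */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define EFI_VARIABLE_NON_VOLATILE       0x01
    #define EFI_VARIABLE_BOOTSERVICE_ACCESS 0x02
    #define EFI_VARIABLE_RUNTIME_ACCESS     0x04

    int main(void)
    {
        const char *path = "/sys/firmware/efi/efivars/"
                           "Example-12345678-1234-1234-1234-123456789abc";
        const char *data = "hello from userspace";
        uint32_t attrs = EFI_VARIABLE_NON_VOLATILE |
                         EFI_VARIABLE_BOOTSERVICE_ACCESS |
                         EFI_VARIABLE_RUNTIME_ACCESS;
        uint8_t buf[4 + 64];
        int fd;

        memcpy(buf, &attrs, sizeof(attrs));
        memcpy(buf + 4, data, strlen(data));

        fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* Attributes and payload have to land in a single write() */
        if (write(fd, buf, 4 + strlen(data)) < 0)
            perror("write");
        close(fd);
        return 0;
    }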

Summary: We don't really support secure boot right now, but that's ok because you can't buy any hardware that supports it yet. Adding support is probably about a week's worth of effort at most.

UEFI secure booting (part 2)

Updated: Three things happened to defuse this situation:
  1. Microsoft mandated that it be possible to disable Secure Boot on any Windows certified systems
  2. Microsoft mandated that it be possible for the user to replace the original Secure Boot keys on any Windows certified systems
  3. Microsoft were willing to sign alternative OS bootloaders with their signing keys

As a result, the worst case scenario did not come to pass and it's still possible for users to install Linux on their systems.

Original content follows:

Microsoft have responded to suggestions that Windows 8 may make it difficult to boot alternative operating systems. What's interesting is that at no point do they contradict anything I've said. As things stand, Windows 8 certified systems will make it either more difficult or impossible to install alternative operating systems. But let's have some more background.

We became aware of this issue in early August. Since then, we at Red Hat have been discussing the problem with other Linux vendors, hardware vendors and BIOS vendors. We've been making sure that we understood the ramifications of the policy in order to avoid saying anything that wasn't backed up by facts. These are the facts:

  • Windows 8 certification requires that hardware ship with UEFI secure boot enabled.
  • Windows 8 certification does not require that the user be able to disable UEFI secure boot, and we've already been informed by hardware vendors that some hardware will not have this option.
  • Windows 8 certification does not require that the system ship with any keys other than Microsoft's.
  • A system that ships with UEFI secure boot enabled and only includes Microsoft's signing keys will only securely boot Microsoft operating systems.

Microsoft have a dominant position in the desktop operating system market. Despite Apple's huge comeback over the past decade, their worldwide share of the desktop market is below 5%. Linux is far below that. Microsoft own well over 90% of the market. Competition in that market is tough, and vendors will take every break they can get. That includes the Windows logo program, in which Microsoft give incentives to vendors to sell hardware that meets their certification requirements. Vendors who choose not to follow the certification requirements will be at a disadvantage in the marketplace. So while it's up to vendors to choose whether or not to follow the certification requirements, Microsoft's dominant position means that opting out would cost them sales.

Why is this a problem? Because there's no central certification authority for UEFI signing keys. Microsoft can require that hardware vendors include their keys. Their competition can't. A system that ships with Microsoft's signing keys and no others will be unable to perform secure boot of any operating system other than Microsoft's. No other vendor has the same position of power over the hardware vendors. Red Hat is unable to ensure that every OEM carries their signing key. Nor is Canonical. Nor is Nvidia, or AMD or any other PC component manufacturer. Microsoft's influence here is greater than even Intel's.

What does this mean for the end user? Microsoft claim that the customer is in control of their PC. That's true, if by "customer" they mean "hardware manufacturer". The end user is not guaranteed the ability to install extra signing keys in order to securely boot the operating system of their choice. The end user is not guaranteed the ability to disable this functionality. The end user is not guaranteed that their system will include the signing keys that would be required for them to swap their graphics card for one from another vendor, or replace their network card and still be able to netboot, or install a newer SATA controller and have it recognise their hard drive in the firmware. The end user is no longer in control of their PC.

If Microsoft were serious about giving the end user control, they'd be mandating that systems ship without any keys installed. The user would then have the ability to make an informed and conscious decision to limit the flexibility of their system and install the keys. The user would be told what they'd be gaining and what they'd be giving up.

The final irony? If the user has no control over the installed keys, the user has no way to indicate that they don't trust Microsoft products. They can prevent their system booting malware. They can prevent their system booting Red Hat, Ubuntu, FreeBSD, OS X or any other operating system. But they can't prevent their system from running Windows 8.

Microsoft's rebuttal is entirely factually accurate. But it's also misleading. The truth is that Microsoft's move removes control from the end user and places it in the hands of Microsoft and the hardware vendors. The truth is that it makes it more difficult to run anything other than Windows. The truth is that UEFI secure boot is a valuable and worthwhile feature that Microsoft are misusing to gain tighter control over the market. And the truth is that Microsoft haven't even attempted to argue otherwise.

UEFI secure booting

Since there are probably going to be some questions about this in the near future:

The UEFI secure boot protocol is part of recent UEFI specification releases. It permits one or more signing keys to be installed into a system firmware. Once enabled, secure boot prevents executables or drivers from being loaded unless they're signed by one of these keys. Another set of keys (the key exchange keys, or KEKs) permits communication between an OS and the firmware. An OS with a KEK matching one installed in the firmware may add additional keys to the whitelist. Alternatively, it may add keys to a blacklist. Binaries signed with a blacklisted key will not load.
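
In rough C, the load-time decision reduces to this (the types and integer key IDs here are invented for illustration - the real check lives inside the firmware's image verification code and operates on certificates and hashes):

    /* Hypothetical sketch of the verification order: a blacklist match
     * always refuses the binary, otherwise the whitelist must match. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct keydb { const int *ids; size_t n; };

    static bool db_contains(const struct keydb *db, int signer)
    {
        for (size_t i = 0; i < db->n; i++)
            if (db->ids[i] == signer)
                return true;
        return false;
    }

    static bool may_execute(int signer, const struct keydb *whitelist,
                            const struct keydb *blacklist)
    {
        if (db_contains(blacklist, signer))     /* blacklisted: never load */
            return false;
        return db_contains(whitelist, signer);  /* unknown keys also fail */
    }

    int main(void)
    {
        const int wl_ids[] = { 1001 }, bl_ids[] = { 2002 };
        struct keydb wl = { wl_ids, 1 }, bl = { bl_ids, 1 };

        printf("whitelisted signer: %d\n", may_execute(1001, &wl, &bl));
        printf("blacklisted signer: %d\n", may_execute(2002, &wl, &bl));
        printf("unknown signer:     %d\n", may_execute(3003, &wl, &bl));
        return 0;
    }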

There is no centralised signing authority for these UEFI keys. If a vendor key is installed on a machine, the only way to get code signed with that key is to get the vendor to perform the signing. A machine may have several keys installed, but if you are unable to get any of them to sign your binary then it won't be installable.

This impacts both software and hardware vendors. An OS vendor cannot boot their software on a system unless it's signed with a key that's included in the system firmware. A hardware vendor cannot run their hardware inside the EFI environment unless their drivers are signed with a key that's included in the system firmware. If you install a new graphics card that either has unsigned drivers, or drivers that are signed with a key that's not in your system firmware, you'll get no graphics support in the firmware.

Microsoft requires that machines conforming to the Windows 8 logo program and running a client version of Windows 8 ship with secure boot enabled. There are two alternatives here: either Windows is signed with a Microsoft key whose public half is included on all systems, or each OEM includes their own key and signs the pre-installed versions of Windows. The second approach would make it impossible to run boxed copies of Windows on Windows logo hardware, and also impossible to install new versions of Windows unless your OEM provided a new signed copy. The former seems more likely.

A system that ships with only OEM and Microsoft keys will not boot a generic copy of Linux.

Now, obviously, we could provide signed versions of Linux. This poses several problems. Firstly, we'd need a non-GPL bootloader. Grub 2 is released under the GPLv3, which explicitly requires that we provide the signing keys. Grub legacy is under GPLv2, which lacks the explicit requirement for keys, but it could be argued that the requirement to provide the scripts used to control compilation covers them. It's a grey area, and exploiting it would be a pretty good show of bad faith. Secondly, in the near future the design of the kernel will mean that the kernel itself is effectively part of the bootloader, which means that kernels will also have to be signed. Making it impossible for users or developers to build their own kernels is not practical. Finally, if we self-sign, it's still necessary to get our keys included by every OEM.

There's no indication that Microsoft will prevent vendors from providing firmware support for disabling this feature and running unsigned code. However, experience indicates that many firmware vendors and OEMs are interested in providing only the minimum of firmware functionality required for their market. It's almost certainly the case that some systems will ship with the option of disabling this. Equally, it's almost certainly the case that some systems won't.

It's probably not worth panicking yet. But it is worth being concerned.

The Android/GPL situation

There was another upsurge in discussion of Android GPL issues last month, triggered by a couple of posts by Edward Naughton, followed by another by Florian Mueller. The central thrust is that section 4 of GPLv2 terminates your license on violation, and you need the copyright holders to grant you a new one. If they don't then you don't get to distribute any more copies of the code, even if you've now come into compliance. TL;DR: most Android vendors are no longer permitted to distribute Linux.

I'll get to that shortly. There are a few other issues that could do with some clarification. The first is Naughton's insinuation that Google are violating the GPL due to Honeycomb being closed or their "license washing" of some headers. There's no evidence whatsoever that Google have failed to fulfil their GPL obligations in terms of providing source to anyone who received GPL-covered binaries from them. If anyone has some, please do get in touch. Some vendors do appear to be unwilling to hand over code for GPLed bits of Honeycomb. That's an issue with the vendors, not Google.

His second point is more interesting, but the summary is "Google took some GPLed header files and relicensed them under Apache 2.0, and they've taken some other people's GPLv2 code and put it under Apache 2.0 as well". As far as the headers go, there's probably not much to see here. The intent was to produce a set of headers for the C library by taking the kernel headers and removing the kernel-only components. The majority of what's left is just structure definitions and function prototypes, and is almost certainly not copyrightable. And remember that these are the headers that are distributed with the kernel and intended for consumption by userspace. If any of the remaining macros or inline functions are genuinely covered by the GPLv2, any userspace application including them would end up a derived work. This is clearly not the intention of the authors of the code. The risk to Google here is indistinguishable from zero.

How about the repurposing of other code? Naughton's most explicit description is:

For example, Android uses “bootcharting” logic, which uses “the 'bootchartd' script provided by www.bootchart.org, but a C re-implementation that is directly compiled into our init program.” The license that appears at www.bootchart.org is the GPLv2, not the Apache 2.0 license that Google claims for its implementation.

But there's no indication that Google's reimplementation is a derived work of the GPLv2 original.

In summary: No sign that Google's violating the GPL.

Florian's post appears to be pretty much factually correct, other than this bit discussing the SFLC/Best Buy case:

I personally believe that intellectual property rights should usually be enforced against infringing publishers/manufacturers rather than mere resellers, but that's a separate issue.

The case in question was filed against Best Buy because Best Buy were manufacturing infringing devices. It was a set of own-brand Blu-ray players that incorporated Busybox. Best Buy were not a mere reseller.

Anyway. Back to the original point. Nobody appears to disagree that section 4 of the GPLv2 means that violating the license results in total termination of the license. The disagreement is over what happens next. Armijn Hemel, who has done various work on helping companies get back into compliance, believes that simply downloading a new copy of the code will result in a new license being granted, and that he's received legal advice that supports that. Bradley Kuhn disagrees. And the FSF seem to be on his side.

The relevant language in v2 is:

You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License.

The relevant language in v3 is:

You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License

which is awfully similar. However, v3 follows that up with:

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

In other words, with v3 you get your license back providing you're in compliance. This doesn't mesh too well with the assumption that you can get a new license by downloading a new copy of the software. It seems pretty clear that the intent of GPLv2 was that the license termination was final and required explicit reinstatement.

So whose interpretation is correct? At this point we really don't know - the only people who've tried to use this aspect of the GPL are the SFLC, and as part of their settlements they've always reinstated permission to distribute Busybox. There's no clear legal precedent. Which makes things a little awkward.

It's not possible to absolutely say that many Android distributors no longer have the right to distribute Linux. But nor is it possible to absolutely say that they haven't lost that right. Any sufficiently motivated kernel copyright holder probably could engage in a pretty effective shakedown racket against Android vendors. Whether they will do remains to be seen, but honestly if I were an Android vendor I'd be worried. There's plenty of people out there who hold copyright over significant parts of the kernel. Would you really bet on all of them being individuals of extreme virtue?

Booting with EFI

One of the ways in which EFI is *actually* better than BIOS is its native support for multiple boot choices. All EFI systems should have an EFI system partition which holds the OS bootloaders. Collisions are avoided by operating system vendors registering a unique name here, so there's no risk that Microsoft will overwrite the Fedora bootloader or whatever. After installing the bootloader, the OS installer simply sets an NVRAM variable pointing at it along with a descriptive name, and (if it wants) sets the default boot variable to point at that. The firmware will then typically provide some mechanism to override that default by providing a menu of all the configured variables.
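
The variable in question is a Boot#### entry holding a serialised EFI_LOAD_OPTION (on Linux you'd normally let efibootmgr build it for you). Here's a sketch of the serialisation, assuming a little-endian host, with the device path reduced to just the end-of-path terminator - a real one encodes the partition and the path to the .efi binary before it:

    /* Sketch: serialise an EFI_LOAD_OPTION for a Boot#### variable.
     * Layout: Attributes (u32), FilePathListLength (u16), NUL-terminated
     * UTF-16 description, then the device path list. */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define LOAD_OPTION_ACTIVE 0x00000001

    static size_t build_load_option(uint8_t *out, const uint16_t *desc,
                                    size_t desc_chars, /* incl. NUL */
                                    const uint8_t *devpath,
                                    uint16_t devpath_len)
    {
        uint32_t attrs = LOAD_OPTION_ACTIVE;
        size_t off = 0;

        memcpy(out + off, &attrs, 4); off += 4;
        memcpy(out + off, &devpath_len, 2); off += 2;
        memcpy(out + off, desc, desc_chars * 2); off += desc_chars * 2;
        memcpy(out + off, devpath, devpath_len); off += devpath_len;
        return off;   /* number of bytes to write to the variable */
    }

    int main(void)
    {
        /* "Fedora", NUL-terminated UTF-16 */
        const uint16_t desc[] = { 'F', 'e', 'd', 'o', 'r', 'a', 0 };
        /* End-of-device-path node only - a placeholder, not a usable path */
        const uint8_t devpath[] = { 0x7f, 0xff, 0x04, 0x00 };
        uint8_t buf[256];
        size_t len = build_load_option(buf, desc, 7, devpath,
                                       sizeof(devpath));

        printf("load option is %zu bytes\n", len);
        return 0;
    }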

This obviously doesn't work so well for removable media, where otherwise you'd have an awkward chicken and egg problem or have to force people to drop to a shell and run the bootloader themselves. This is handled by looking for EFI/boot/boot(architecture).efi, where architecture depends on the system type - examples include bootia32.efi, bootia64.efi and bootx64.efi. Since vendors have complete control over their media, there's still no risk of collisions.

Why do we care about collisions? The main reason this is helpful is that it means that there's no single part of the disk that every OS wants to control. If you install Windows it'll write stuff in the MBR and set the Windows partition as active. If you install Linux you'll either have to install grub in the MBR or set the Linux partition as active. Multiple Linux installations, more problems. It's very, very annoying to handle the multiple OS case with traditional BIOS.

This was all fine until UEFI 2.3 added section 3.4.1.2 of the spec, which specifies that in the absence of any configured boot variables it is permitted for the firmware to treat the EFI system partition in the same way as removable media - that is, it'll boot EFI/boot/bootx64.efi or whatever. And, if you install Windows via EFI, it'll install an EFI/boot/bootx64.efi fallback bootloader as well as putting one in EFI/microsoft.

Or, in other words, if your system fails to implement the boot variable section of the specification, Windows will still boot correctly.

As we've seen many times in the past, the only thing many hardware vendors do is check that Windows boots correctly. Which means that it's utterly unsurprising to discover that there are some systems that appear to ignore EFI boot variables and just look for the fallback bootloader instead. The fallback bootloader that has no namespacing, guaranteeing collisions if multiple operating systems are installed on the same system.

It could be worse. If there's already a bootloader there, Windows won't overwrite it. So things are marginally better than in the MBR days. But the Windows bootloader won't boot Linux, so if Windows gets there first we still have problems. The only solution I've come up with so far is to have a stub bootloader that is intelligent enough to scan the EFI system partition for any other bootloaders and present them as a menu, and for every Linux install to just blindly overwrite bootx64.efi if it already exists. Spec-compliant firmware should always ignore this and run whatever's in the boot variables instead.
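
Here's a minimal sketch of the scanning half of that idea, done from Linux for convenience (the mount point is an assumption; the stub itself would run in the firmware environment and use EFI file protocols, but the logic is the same): walk EFI/*/ and collect anything ending in .efi as a menu candidate.

    /* Sketch: enumerate candidate bootloaders on the EFI system
     * partition, assumed mounted at /boot/efi. */
    #include <stdio.h>
    #include <string.h>
    #include <strings.h>
    #include <dirent.h>

    int main(void)
    {
        const char *esp = "/boot/efi/EFI";
        DIR *top = opendir(esp);
        struct dirent *vendor, *file;
        char path[512];

        if (!top)
            return 1;
        while ((vendor = readdir(top))) {
            if (vendor->d_name[0] == '.')
                continue;
            snprintf(path, sizeof(path), "%s/%s", esp, vendor->d_name);
            DIR *sub = opendir(path);
            if (!sub)
                continue;
            while ((file = readdir(sub))) {
                size_t n = strlen(file->d_name);
                if (n > 4 && !strcasecmp(file->d_name + n - 4, ".efi"))
                    printf("%s/%s\n", path, file->d_name); /* menu entry */
            }
            closedir(sub);
        }
        closedir(top);
        return 0;
    }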

This is all clearly less than optimal. Welcome to EFI.

IPv6 routers

I have a WRT54G. I've had it for some years. It's run a bunch of different firmware variants over that time, but they've all had something in common. There's no way to configure IPv6 without editing text files, installing packages and punching yourself in the face repeatedly. Adam blogged about doing so today, and I suspect he may be in need of some reconstructive surgery now.

I spent yesterday looking at disassembled ACPI tables and working out the sequence of commands the firmware was sending to the hard drive. I'm planning on spending tomorrow writing x86 assembler to parse EFI memory maps. I spend a lot of time caring about stupidly awkward implementation details worked out from staring at binary dumps. The last thing I want to do is have to spend more than three minutes working out how to get IPv6 working on my home network because that cuts into the time I can spend drinking to forget.

Thankfully this is the future and punching yourself in the face is now an optional extra rather than bundled. Recent versions of Tomato USB (ie, newer than actually released) have a nice web UI for this. I registered with Tunnelbroker.net, got a tunnel, copied the prefix and endpoint addresses into the UI, hit save and ever since then NetworkManager has given me a routable IPv6 address. It's like the future.

Because I'm lazy I ended up getting an unofficial build from here. The std build doesn't seem to include IPv6, so I grabbed the miniipv6 one. The cheat-sheet for identifying builds is here. And I didn't edit a single text file. Excellent.

A use for EFI

Anyone who's been following anything I've written lately may be under the impression that I dislike EFI. They'd be entirely correct. It's an awful thing and I've lost far too much of my life to it. It complicates the process of booting for no real benefit to the OS. The only real advantage we've seen so far is that we can configure boot devices in a vaguely vendor-neutral manner without having to care about BIOS drive numbers. Woo.

But there is something else EFI gives us. We finally have more than 256 bytes of nvram available to us as standard. Enough nvram, in fact, for us to reasonably store crash output. Progress!

This isn't a novel concept. The UEFI spec provides for a specially segregated area of nvram for hardware error reports. This is lovely but not overly helpful for us, because the reports are supposed to be in a well-defined format that doesn't leave much scope for "I found a null pointer where I would really have preferred there not be one" followed by a pile of text, especially if the firmware's supposed to do something with it. Also, the record format has lots of metadata that I really don't care about. Apple have also been using EFI for this, creating a special variable that stores the crash data and letting them get away with just telling the user to turn their computer off and then turn it back on again.

EFI's not the only way this could be done, either. ACPI specifies something called the ERST, or Error Record Serialization Table. The OS can stick errors in here and then they can be retrieved later. Excellent! Except ERST is currently usually only present on high-end servers. But when ERST support was added to Linux, a generic interface called pstore went in as well.

Pstore's very simple. It's a virtual filesystem that has platform-specific plugins. The platform driver (such as ERST) registers with pstore and the ERST errors then get exposed as files in pstore. Deleting the files removes the records. pstore also registers with kmsg_dump, so when an oops happens the kernel output gets dumped back into a series of records. I'd been playing with pstore but really wanted something a little more convenient than an 8-socket server to test it with, so I ended up writing a pstore backend that uses EFI variables. And now whenever I crash the kernel, pstore gives me a backtrace without me having to take photographs of the screen. Progress.
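
From userspace the result looks like an ordinary filesystem. A quick sketch of pulling the records out after a crash (assuming pstore is mounted at /sys/fs/pstore; deleting a file is what asks the backend to free the stored record):

    /* Sketch: dump (and optionally free) pstore records after a crash.
     * Assumes pstore is mounted at /sys/fs/pstore. */
    #include <stdio.h>
    #include <dirent.h>

    int main(void)
    {
        const char *dir = "/sys/fs/pstore";
        DIR *d = opendir(dir);
        struct dirent *ent;
        char path[512], line[256];

        if (!d)
            return 1;
        while ((ent = readdir(d))) {
            if (ent->d_name[0] == '.')
                continue;
            snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
            printf("=== %s ===\n", path);
            FILE *f = fopen(path, "r");
            if (!f)
                continue;
            while (fgets(line, sizeof(line), f))
                fputs(line, stdout);
            fclose(f);
            /* unlink(path); would delete the record from nvram */
        }
        closedir(d);
        return 0;
    }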

Patches are here. I should probably apologise to Seiji Aguchi, who was working on the same problem and posted a preliminary patch for some feedback last month. I replied to the thread without ever reading the patch and then promptly forgot about it, leading to me writing it all from scratch last week. Oops.

(There's an easter egg in the patchset. First person to find it doesn't win a prize. Sorry.)

Rebooting

You'd think it'd be easy to reboot a PC, wouldn't you? But then you'd also think that it'd be straightforward to convince people that at least making some effort to be nice to each other would be a mutually beneficial proposal, and look how well that's worked for us.

Linux has a bunch of different ways to reset an x86. Some of them are 32-bit only and so I'm just going to ignore them because honestly just what are you doing with your life. Also, they're horrible. So, that leaves us with five of them.
  • kbd - reboot via the keyboard controller. The original IBM PC had the CPU reset line tied to the keyboard controller. Writing the appropriate magic value pulses the line and the machine resets. This is all very straightforward, except for the fact that modern machines don't have keyboard controllers (they're actually part of the embedded controller) and even more modern machines don't even pretend to have a keyboard controller. Now, embedded controllers run software. And, as we all know, software is dreadful. But, worse, the software on the embedded controller has been written by BIOS authors. So clearly any pretence that this ever works is some kind of elaborate fiction. Some machines are very picky about hardware being in the exact state that Windows would program. Some machines work 9 times out of 10 and then lock up due to some odd timing issue. And others simply don't work at all. Hurrah!
  • triple - attempt to generate a triple fault. This is done by loading an empty interrupt descriptor table and then calling int(3). The interrupt fails (there's no IDT), the fault handler fails (there's no IDT) and the CPU enters a condition which should, in theory, then trigger a reset. Except there doesn't seem to be a requirement that this happen and it just doesn't work on a bunch of machines.
  • pci - not actually pci. Traditional PCI config space access is achieved by writing a 32-bit value to I/O port 0xcf8 to identify the bus, device, function and config register. Port 0xcfc then contains the register in question. But if you write the appropriate pair of magic values to 0xcf9, the machine will reboot. Spectacular! And not standardised in any way (certainly not part of the PCI spec), so different chipsets may have different requirements. Booo. (Both this and the keyboard controller poke are sketched in code after this list.)
  • efi - EFI runtime services provide an entry point to reboot the machine. It usually even works! As long as EFI runtime services are working at all, which may be a stretch.
  • acpi - Recent versions of the ACPI spec let you provide an address (typically memory or system IO space) and a value to write there. The idea is that writing the value to the address resets the system. It turns out that doing so often fails. It's also impossible to represent the PCI reboot method via ACPI, because the PCI reboot method requires a pair of values and ACPI only gives you one.
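
The two port-based pokes are tiny. Here's a hedged userspace sketch (iopl() needs root, and if either method works it will reset your machine on the spot); the kernel's real implementations live in arch/x86/kernel/reboot.c:

    /* Sketch: the keyboard controller and 0xcf9 reboot pokes, done from
     * userspace for illustration. Root only, and it will (try to)
     * reset the machine immediately. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/io.h>

    static void kbd_reboot(void)
    {
        outb(0xfe, 0x64);       /* pulse the CPU reset line via the i8042 */
    }

    static void cf9_reboot(void)
    {
        unsigned char cf9 = inb(0xcf9) & ~6;
        outb(cf9 | 2, 0xcf9);   /* request a hard reset */
        usleep(50);
        outb(cf9 | 6, 0xcf9);   /* and actually perform it */
    }

    int main(int argc, char **argv)
    {
        if (iopl(3)) {          /* 0xcf9 is above the ioperm() limit */
            perror("iopl");
            return 1;
        }
        if (argc > 1 && !strcmp(argv[1], "kbd"))
            kbd_reboot();
        else
            cf9_reboot();
        return 0;
    }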

Now, I'll admit that this all sounds pretty depressing. But people clearly sell computers with the expectation that they'll reboot correctly, so what's going on here?

A while back I did some tests with Windows running on top of qemu. This is a great way to evaluate OS behaviour, because you've got complete control of what's handed to the OS and what the OS tries to do to the hardware. And what I discovered was a little surprising. In the absence of an ACPI reboot vector, Windows will hit the keyboard controller, wait a while, hit it again and then give up. If an ACPI reboot vector is present, Windows will poke it, try the keyboard controller, poke the ACPI vector again and try the keyboard controller one more time.

This turns out to be important. The first thing it means is that it generates two writes to the ACPI reboot vector. The second is that it leaves a gap between them while it's fiddling with the keyboard controller. And, shockingly, it turns out that on most systems the ACPI reboot vector points at 0xcf9 in system IO space. Even though most implementations nominally require two different values be written, it seems that this isn't a strict requirement and the ACPI method works.

Linux 3.0 will ship with this behaviour by default. It makes various machines work (some Apples, for instance), improves things on some others (some Thinkpads seem to sit around for extended periods of time otherwise) and hopefully avoids the need to add any more machine-specific quirks to the reboot code. There's still some divergence between us and Windows (mostly in how often we write to the keyboard controller), which can be cleaned up if it turns out to make a difference anywhere.

Now. Back to EFI bugs.