Trials and tribulations with EFI

I wrote about some EFI implementation issues I'd seen on Macs a while back. Shortly afterwards we started seeing near-identical bugs on some Intel reference platforms, and fixing them actually became more of a priority.

The fundamental problem is the same. We take the EFI memory map, identify the virtual addresses of the regions that will be required for runtime (mapping them into virtual address space if needed) and then call the firmware's SetVirtualAddressMap() implementation in order to let the firmware convert all its pointers. Sadly it seems that some firmware implementations call into sections of boot services code to do this, which is unfortunate because we've already taken that back to use as RAM. So, given that this is clearly against the spec, how does it ever work?

The tediously dull version is that Linux typically calls SetVirtualAddressMap() in the kernel, and everyone else does it in their bootloaders. The bootloader hasn't set up NX bits or anything, so it just happens to work there. We could just do it in the bootloader in Linux, but that makes doing things like kernel address space randomisation trickier, so it's not the favoured approach. So, instead, we can probably just reserve those ranges until after we've switched to virtual mode, and make sure the pages are executable. This ought to land in 2.6.40, or whatever it ends up being called.

(The alternative approach, of just never transitioning to virtual mode, turns out to mysteriously fail on various machines. Calls to SetVariable() just give errors. We just don't know.)
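
As a rough illustration of the reserve-then-release idea, here's a toy Python model rather than anything resembling the kernel code. The memory type numbers and the EFI_MEMORY_RUNTIME attribute bit are the ones from the UEFI spec; the Region class, the function names and the sample memory map are all made up for the sketch.

```python
from dataclasses import dataclass

# Memory type numbers and the runtime attribute bit as defined by the UEFI spec.
EFI_BOOT_SERVICES_CODE = 3
EFI_BOOT_SERVICES_DATA = 4
EFI_RUNTIME_SERVICES_CODE = 5
EFI_CONVENTIONAL_MEMORY = 7
EFI_MEMORY_RUNTIME = 0x8000000000000000


@dataclass
class Region:
    """Hypothetical stand-in for one EFI memory map descriptor."""
    type: int
    phys_start: int
    num_pages: int
    attribute: int = 0


def runtime_regions(memmap):
    """Regions the firmware needs at runtime; these get virtual mappings."""
    return [r for r in memmap if r.attribute & EFI_MEMORY_RUNTIME]


def boot_services_regions(memmap):
    """Regions the spec says we may reuse, but which some firmware still
    calls into during SetVirtualAddressMap()."""
    kinds = (EFI_BOOT_SERVICES_CODE, EFI_BOOT_SERVICES_DATA)
    return [r for r in memmap if r.type in kinds]


def usable_ram(memmap, in_virtual_mode):
    """Memory the kernel may hand to the allocator at this point in boot."""
    usable = [r for r in memmap if r.type == EFI_CONVENTIONAL_MEMORY]
    if in_virtual_mode:
        # Only safe once the firmware has finished converting its pointers.
        usable += boot_services_regions(memmap)
    return usable


# Made-up three-entry memory map, purely for illustration.
memmap = [
    Region(EFI_CONVENTIONAL_MEMORY, 0x00100000, 256),
    Region(EFI_BOOT_SERVICES_CODE, 0x00200000, 16),
    Region(EFI_RUNTIME_SERVICES_CODE, 0x00210000, 8, EFI_MEMORY_RUNTIME),
]
```

Calling usable_ram(memmap, False) leaves the boot services region alone, and usable_ram(memmap, True) hands it back once the switch to virtual mode is done. The real thing also has to make sure those pages are mapped executable in the meantime, which this model ignores.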

That still leaves the problem of SetVariable() on the test Mac trying to access a random address. That one turned out to be easier. There's 2MB of flash at the top of physical address space, and this was being presented as being broken into four separate EFI regions. While physically contiguous, Linux was mapping these to discontiguous virtual addresses. Apple's firmware appeared to assume that a pointer into one region could just be incremented into another. So because it's still easier to change the kernel than change Apple, 2.6.39 merges these regions to ensure they're contiguous.
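
The merging step is simple enough to sketch. This is a simplified Python model and not the actual kernel change, which may well apply further conditions before merging. The merge_contiguous name, the tuple layout and the example addresses are assumptions for illustration only.

```python
PAGE_SIZE = 4096  # EFI memory map descriptors count 4KB pages


def merge_contiguous(regions):
    """Merge physically adjacent (phys_start, num_pages) regions.

    Hypothetical helper: mapping each merged range as one block keeps the
    virtual addresses contiguous wherever the physical ones are.
    """
    merged = []
    for start, pages in sorted(regions):
        if merged:
            prev_start, prev_pages = merged[-1]
            if prev_start + prev_pages * PAGE_SIZE == start:
                # This region starts exactly where the previous one ends.
                merged[-1] = (prev_start, prev_pages + pages)
                continue
        merged.append((start, pages))
    return merged


# Made-up layout: 2MB of flash just below 4GB, reported as four 512KB regions.
flash = [(0xFFE00000 + i * 0x80000, 128) for i in range(4)]
print(merge_contiguous(flash))
```

With the four regions collapsed into one, a firmware pointer that walks off the end of the first region lands in mapped memory rather than at some random address.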

Remaining problems include some machines seemingly not booting if they have 4GB of RAM or more, and this Apple failing to communicate with its panel over the eDP aux channel. Anyone got any idea how to dump the BIOS compatibility module out of a running EFI session?


Macs and Linux

Firstly: If you want to buy a computer to run Linux on, don't buy a Mac.
Secondly: If you have a Mac and want to run Linux on it, the easiest approach is going to be to run it under virtualisation. VirtualBox is free, and worth every bit of what you're paying.
Thirdly: If you're going to boot Linux on bare-metal Apple hardware, boot it via BIOS emulation.
Fourthly: If you're going to boot Linux on bare-metal Apple hardware via EFI, and it doesn't work, write a patch. Apple's firmware has a number of quirks that I'm aware of and we're working through them, but anyone filing bugs against Apple hardware on EFI right now is likely to be ignored for a significant period of time until there's an expectation that it'll actually work. Maybe in six months or so.


Copyright assignment

The fundamental problem with projects requiring copyright assignment is that there's an economic cost involved in me letting a competitor sell a closed version of my code without letting me sell a closed version of their code. If this cost is perceived as larger than the cost of maintaining my code outside the upstream tree, it's cheaper for me to fork than it is to sign over my rights. So if I have my own engineering resources, what rational benefit is there to me assigning my copyright?


LightDM, or: an examination of a misunderstanding of the problem

LightDM's a from-scratch implementation of an X display manager, ie the piece of software that handles remote X connections, starts any local X servers, provides a login screen and kicks off the initial user session. It's split into a nominally desktop-agnostic core (built directly on xcb and glib) and greeters, the idea being that it's straightforward to implement an environment-specific greeter that integrates nicely with your desktop session. It's about 6500 lines of code in the core, 3500 lines of code in the gtk bindings to the core and about 1000 in the sample gtk greeter, for a total of about 11,000 lines of code for a full implementation. This compares to getting on for 60,000 in gdm. Ubuntu plan to switch to LightDM in their next release (11.10).

This is a ridiculous idea.

To a first approximation, when someone says "Lightweight" what they mean is "I don't understand the problems that the alternative solves". People see gtk and think "Gosh, that's kind of big for a toolkit library. I'll write my own". And at some point they have to write a file dialog. And then they need to implement support for handling remote filesystems. And then they need to know whether the system has a functioning network connection or not, and so they end up querying state from NetworkManager. And then they suddenly have a library that's getting on for the size of gtk, has about the same level of complexity and has had much less testing, because why would you want to use a lightweight toolkit that either does nothing or is 90% of the size of the alternative and crashes all the time?

Adding functionality means that code gets larger. Given two codebases that are of significantly different sizes, the two possible conclusions are either that (a) the smaller one is massively more competently written than the larger one, or (b) the smaller one does less. The gdm authors have never struck me as incompetent, even if some people may disagree with some of their design decisions, and the LightDM authors don't seem to have argued on that basis either. So the obvious conclusion is that LightDM does less.

And, indeed, LightDM does less. Part of this is by design - as the proposal to the Gnome development list shows, one of the key advantages of LightDM is claimed as it not starting a Gnome session. And from that statement alone, we can already see that there's been a massive failure of understanding the complexity of the problem.

Let's go back to the comparisons of code size. LightDM's simple GTK greeter is about 1000 lines of code. gdm's greeter is almost 20,000. Some of this is arbitrary shiny stuff like the slidy effects that occur, but a lot of it is additional functionality. For example, some of it is devoted to handling the interface with AccountsService so gdm can automatically update when users are created or deleted. Some of it is providing UI for accessibility functionality. Some of it is drawing a clock, which I'll admit may be a touch gratuitous.

But if your argument is that your software is better because it's doing less, you should be able to ensure that you can demonstrate that the differences aren't important. And the differences here are important. For example, one of the reasons gdm starts a local gnome session is that it wants gnome-power-manager to be there to handle power policy. Closing the lid of my laptop should suspend the system regardless of whether it's logged in or not. LightDM takes a different approach. Because there's no session, it has to take care of this kind of thing itself. So the backend daemon code speaks to upower directly, and the greeters ask the daemon to implement their policy decisions.

This is pretty obviously miserable. Now you've got two sets of policy - one at the login screen, and one in your session. How do I ensure they're consistent? The only sane solution is to ignore the functionality the backend provides and have my greeter run gnome-power-manager. And now how about accessibility preferences? Again, if I want to have the same selection of policy, I need to run the same code. So you end up with a greeter that's about as complex and large as the gdm one, and unused functionality in the backend. Lighter weight through code duplication. We have always been at war with Eurasia.

The entirety of LightDM's design is based on a false premise - that you can move a large set of common greeter functionality into a daemon and just leave UI presentation up to the greeter code. And if you believe that, then yes, you can absolutely implement a greeter in 1000 lines of code. It'll behave differently to your desktop - the range of policy you can implement will be limited to what the daemon provides, even if your desktop environment has a different range of features. It'll have worse accessibility for much the same reason. And eventually you'll end up with a daemon that's absolutely huge in order to attempt to provide the superset of functionality that each different desktop makes use of.

The only real problem LightDM solves is that it makes it easier to write custom greeters, and if you're really seeking to differentiate your project based on your login screen then maybe your priorities are a little out of line. I'm sure that Ubuntu will release with a beautiful 3D greeter that has a wonderful transition to the desktop. It's just a shame that it won't do any of the useful things that the existing implementations already do.

And if you think that when LightDM finally has the full feature set of gdm, kdm and lxdm it'll still be fewer lines of code and take less memory - I hear the BSD kernel is lighter weight than Linux. Have fun with it.


On platforms

At some stage the seminal KDE vs Gnome paper vanished from its original home, and while it's still available in a few places (such as here) it set me thinking. What are the fundamental differences between Gnome and KDE development? There's lots of little differences (2006: Gnome parties on a beach. Akademy has melted ice cream in the rain) but they're both basically communities made up of people who are interested in developing a functional and interesting desktop experience. So why do the end results have so little in common?

Then I read this and something that had been floating around in my mind began to solidify. KDE assumes a platform and attempts to work around its shortcomings. Gnome helps define the platform and works on fixing its shortcomings.

It's pretty easy to see this across the platform. The developer of the Gnome Bluetooth support has multiple commits to the underlying Bluetooth stack, while nobody who's committed to bluedevil appears to. The main developer of the Gnome NetworkManager support is NetworkManager upstream, with the same applying to the Gnome power management infrastructure. And when Gnome developers find limitations in graphics drivers, those tend to be fixed in the graphics drivers rather than worked around in the UI code. KDE builds on top of what's already there, while Gnome is happy to flatten some mountains first.

I should emphasise that I'm not criticising KDE here[1]. These are both rational development models. One optimises for making things work and will compromise on functionality in order to be more portable to different underlying operating systems. The other optimises for additional functionality at the cost of being tied to a much smaller number of underlying operating systems that have to be completely up to date. But understanding that this distinction exists is key to understanding fundamental differences between the projects, and any argument about which is better or about how there should be more collaboration has to take these fundamentally different approaches into consideration. My personal belief is that a tightly integrated platform is going to produce a more compelling product than one built on top of a series of abstraction layers, but we'll see what happens in the long run.

And then, of course, there's Unity and Canonical's gradual effort to turn Ubuntu into a platform distinct from either Gnome or KDE. But that's a separate post.

[1] Well, except for the melted ice cream at Akademy 2006. But I think that's fair.


HTC are still incredible fuckheads

Update: Despite another email yesterday reasserting the 90-120 days lie, the source code has now landed on HTC's site.

As has been discussed before, HTC have a somewhat "interesting" interpretation of the GPL that allows them to claim they don't need to provide source code until between 90 and 120 days after the release of binaries. It's probably noteworthy that the FSF (who, you know, wrote the license and all) disagree with this interpretation, as do the kernel copyright holders (who, you know, wrote the code that the license covers) I've talked to about it. Anyway, after a pile of screaming and shouting from all sides HTC have tended to release their source code in a timely manner. So things seemed better.

HTC released the Thunderbolt last week and we're back to the 90-120 day song and dance. It's probably worth remembering that by behaving in this way HTC gain a competitive advantage over any vendors who obey the terms of their license - HTC can incorporate improvements made by others without releasing their own until a significant portion of the lifecycle of their phone has passed.

As far as I'm concerned, every single Thunderbolt sold so far embodies a copyright infringement. Wilfully engaging in copyright infringement for commercial benefit is typically frowned upon by courts, especially if by doing so a foreign company is gaining commercial advantage over a domestic one. If you think Microsoft's patent assault on Android is a problem, just imagine what they could do if they hired one significant Linux kernel developer and used their copyrights to attack the overwhelming majority of Android vendors who fail to comply with the GPL. It probably wouldn't be industry ending (companies would merely have to improve their compliance procedures) but it'd do a great deal of damage in the short term. It's insane for companies to behave this way. Don't reward them by giving them your money.

I'll be talking about this at the Linux Foundation Collaboration Summit next month, along with an update on my study of the compliance of Android tablets. I'm hoping that there'll be further developments after that.


Archos update

Archos confirmed to me that they don't have source code for their RK2818-based models at the moment, which means the 7" home tablet (version 2) and the Arnova range all appear to be infringing. For a company that is actually on the better end of the scale for compliance, that's somewhat disheartening. My understanding is that the Arnova and "home tablet" ranges (as opposed to the "internet tablet" range) are subcontracted or rebadging exercises, so there's probably less corporate oversight than for the internally developed hardware. This is, obviously, not an excuse.


Archos tablets

Has anyone tried to obtain the kernel source for the Archos 7 home tablet V2 or the Arnova range (ie, anything Archos is shipping that's based on the RK2818 rather than the RK2808)? If so, what was the response? The source from their site only appears to be for the RK2808 devices.


Further adventures in mobile Linux

I picked up a couple of cheap Linux devices at the weekend. First of all, a $99 Android tablet from CVS, made by Craig. It's a generic RK2818 device and of course it's lacking any kind of GPL offer in the documentation. As far as I know the only company that's released any Rockchip source so far has been Archos, and even then they haven't released the tools you need to actually build an image - they seem to be floating around the internet anyway. But it's straightforward to get it to run the Android market, and it runs Shortyz quite well, so fit for purpose from my point of view. I am, obviously, attempting to contact Craig to find out how they're going to satisfy their obligations but haven't got past their bizarre text-to-speech based support menu system that dumps you to answerphone after 5 minutes of being on hold. Next attempt will involve pressing more buttons.

The other one was a Sharper Image Literati e-reader, $49 from Macy's (on clearance, obviously). This one's interesting by virtue of not being an Android device. Instead it's got a fairly recognisable Busybox-based Linux environment that's even got udev and dbus running. It brings up a framebuffer and just dumps a QTE-based reader (from Kobo) onto it. Other than being woefully underpowered and slow, it actually seems very competent. There seem to be several versions of the hardware - the one I got has an ARM SoC from SiRF on it. SiRF make GPS chipsets, and it turns out that their Atlas 5 platform is actually intended for Linux-based GPS units. The embedded world always seems to find a way. What surprised me more is that it's probably the most polished looking Linux device I've bought for under $300. No bizarre kernel spew. echo mem >/sys/power/state works. Standard backlight interface.

Oh, and no source. Obviously. But an interesting device regardless.


LCA 2011

I'm both back from LCA 2011 and also over the associated brutal jetlag, so if you sent me mail and I still haven't replied then it's fallen down some sort of cliff and you should probably shout at me until I do something about it. LCA was, as usual, excellent. Especially given that the original venue was the same distance above the river as the now somewhat misleadingly described dry dock on the opposite bank.

I did a bunch of talks this year, and they're now all online, so without further ado:

Enterprise power management - a discussion of power management with a focus on enterprise users, given at the plumbers miniconf. LWN did a writeup here, and every time I say "Tera" you should pretend that I'm saying "Giga". I have no good excuse.

Linux license compliance - about the poor observance of the GPL's conditions by vendors, given at the business miniconf. A useful example presented itself in the form of a GPL-violating Android tablet that I bought in literally the first store I went into in Australia this year. Probably also the last appearance of my Knuth t-shirt, because it appears to have several holes in places where it shouldn't.

Making laptops work in Linux - talking about identifying how laptop-specific functionality is wired up, with an emphasis on reverse-engineering ACPI methods to figure out how to make things work. This one was my presentation at the main conference.

The organisers and volunteers deserve incredible gratitude for the way the conference was managed, especially considering that the CBD was underwater a week and a half earlier.