Matthew Garrett (mjg59) wrote,

Power management design philosophy

One of the traditional problems with power management has been balancing power savings against reduced functionality. The two traditional approaches to this have been to have sane defaults or to let the user tweak them, with a hybrid approach being to have sane defaults and an "advanced" box that lets users set their own parameters.

Both of these options suck.

The problem with sane defaults is that power management is an area where the "sane default" genuinely does differ between users. Usage patterns will determine the appropriate spindown period for a hard drive. AHCI's ALPM will disable hotswap detection, which is fine for home users but a pain if people want to rip out a failed RAID element. ALPM also introduces a small amount of latency (on the order of a seek in the most aggressive mode), which is entirely irrelevant for almost everyone but is a pain if you're trying to offer I/O latency guarantees. PCI-E link power management adds small quantities of latency. Letting the CPU run at full speed whenever it wants to is good for power management but potentially bad for thermal management. Screensaver timeouts vary depending on whether you're watching a movie or not.

The problem with leaving everything tweakable is that you're asking users to make choices about things but not giving them the information they need to make those choices. Whether you get a power saving from hard drive spindown depends on whether the drive is idle for long enough to save the power you'll spend spinning it back up. Get it wrong and you'll be putting your drive under extra load, reducing performance and consuming more power than you were to begin with. If you limit the maximum speed of your CPU to keep it cool, you'll be wasting power and performance even when it's well below the thermal limits. Disabling ALPM because you might want to change a RAID device at some stage means you're consuming power despite generally not needing to use the additional functionality.
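To make the spindown trade-off concrete, here's a rough sketch of the break-even calculation. It assumes a simplified model in which the time taken by the spin-down and spin-up themselves is ignored, and every figure in it is illustrative rather than measured from any real drive.

```python
def break_even_idle_seconds(spin_cycle_joules, idle_watts, standby_watts):
    """Idle time beyond which spinning down saves energy.

    Simplified model: the energy cost of one spin-down/spin-up cycle is
    a single lump sum, and the duration of the transition is ignored.
    """
    saving_per_second = idle_watts - standby_watts
    if saving_per_second <= 0:
        raise ValueError("standby must draw less power than idle")
    return spin_cycle_joules / saving_per_second


def spindown_saves_power(idle_seconds, spin_cycle_joules, idle_watts, standby_watts):
    """True if the drive stayed idle long enough to repay the spin cycle."""
    threshold = break_even_idle_seconds(spin_cycle_joules, idle_watts, standby_watts)
    return idle_seconds > threshold


# Illustrative numbers only (hypothetical drive).
print(break_even_idle_seconds(12.0, 1.0, 0.25))  # 16.0
```

A user picking a spindown timeout by hand is implicitly guessing both this threshold and the distribution of idle periods, and has no realistic way of knowing either.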

But while these options both suck in different ways, they share one fundamental failure. They both imply static configuration. The correct power management policy depends on the way that a machine is being used at a specific instant in time, and no matter how many power management profiles you add there's no way for a user to switch between them fast enough to obtain a decent power/performance ratio.

Adaptive power management adopts a different strategy. Rather than define a static policy (or several static policies), the hardware is monitored to determine the best power management settings. An example of this is the ondemand CPUfreq governor. Rather than attempt to provide a static configuration, the processor frequency is automatically changed to meet demand while also keeping the overall power consumption as low as possible. This patch adds a thermal component to its policy, preventing the machine from overheating. Result? No requirement to provide tunables to userspace.
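As a sketch of what that kind of policy looks like, here's a toy frequency selector in the spirit of ondemand with a thermal cap bolted on. This is not the kernel's algorithm or the patch mentioned above; the function name, the thresholds and the step-down-when-hot rule are all assumptions made for illustration.

```python
def next_frequency(load, temperature_c, available_khz, current_khz,
                   up_threshold=0.8, thermal_limit_c=90.0):
    """Pick the next CPU frequency from load and temperature.

    load is the fraction of time the CPU was busy at current_khz.
    All thresholds are hypothetical defaults, not kernel values.
    """
    freqs = sorted(available_khz)

    # Thermal component: when too hot, step down one level
    # regardless of demand.
    if temperature_c >= thermal_limit_c:
        lower = [f for f in freqs if f < current_khz]
        return lower[-1] if lower else freqs[0]

    # Demand is high: jump straight to the top frequency.
    if load > up_threshold:
        return freqs[-1]

    # Otherwise choose the lowest frequency that would keep
    # utilisation at or below the threshold.
    needed_khz = load * current_khz / up_threshold
    for f in freqs:
        if f >= needed_khz:
            return f
    return freqs[-1]
```

Note that nothing here is a user-facing setting. The inputs are all things the system can measure for itself.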

Something similar is achievable with hard-drive spindown. By simply monitoring how often reads or writes actually hit disk, it's possible to automatically adjust the spindown period to obtain decent power savings while reducing the number of spinups. This approach is imperfect - in an ideal world we'd know to spin the drive up before it's needed, but that simply isn't possible. However, the user is almost certainly unable to make a better guess. It's not realistic for a user to be able to predict whether or not a given page is still in cache or has been evicted. Nothing is lost by providing a dynamic policy, and another unnecessary tunable has vanished.
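A minimal sketch of such a policy, assuming we can observe the length of each idle gap between disk accesses. The class name, the doubling/shrinking factors and the default values are hypothetical; a real implementation would want something smarter than this.

```python
class AdaptiveSpindown:
    """Adjust a spindown timeout from observed idle gaps (toy policy)."""

    def __init__(self, break_even_s=16.0, timeout_s=60.0,
                 min_timeout_s=5.0, max_timeout_s=1800.0):
        self.break_even_s = break_even_s
        self.timeout_s = timeout_s
        self.min_timeout_s = min_timeout_s
        self.max_timeout_s = max_timeout_s

    def record_idle_gap(self, gap_s):
        """Feed in an idle period that has just ended with a disk access."""
        if gap_s <= self.timeout_s:
            # The drive never spun down, so there is nothing to learn.
            return self.timeout_s

        asleep_s = gap_s - self.timeout_s
        if asleep_s < self.break_even_s:
            # Spun down and was woken too soon: back off.
            self.timeout_s = min(self.timeout_s * 2, self.max_timeout_s)
        else:
            # The spindown paid for itself: try spinning down sooner.
            self.timeout_s = max(self.timeout_s * 0.75, self.min_timeout_s)
        return self.timeout_s
```

One obvious weakness of this version is that gaps shorter than the current timeout teach it nothing, so it can only shorten the timeout after a successful spindown.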

The difficult part of an adaptive policy comes when roles are switched in ways that don't correspond to changes in hardware status. One of the most common cases is that of screensaver settings when the user is watching a video. It's not practical for the kernel or a system daemon to predict this case, so we have two choices:
  • Get the user to switch to a "media" power policy and then remember to switch back afterwards
  • Have the media player deactivate the screensaver at startup and reactivate it when the video isn't playing
The first of these choices involves the user having to explicitly state something that's obvious - playing a video automatically implies the desire to adopt a different power management policy, and so asking the user to manually adjust that is ridiculous.
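The second choice might look something like the sketch below. ScreensaverService is an in-memory stand-in invented for this example, not a real desktop API; the point is only that the player takes out an inhibition while video is playing and hands it back automatically.

```python
import itertools
from contextlib import contextmanager


class ScreensaverService:
    """Hypothetical stand-in for a real screensaver daemon."""

    def __init__(self):
        self._cookies = itertools.count(1)
        self._inhibitors = {}

    def inhibit(self, app, reason):
        """Register an inhibition and return a cookie identifying it."""
        cookie = next(self._cookies)
        self._inhibitors[cookie] = (app, reason)
        return cookie

    def uninhibit(self, cookie):
        self._inhibitors.pop(cookie, None)

    @property
    def may_blank(self):
        # The screen may blank only when nobody is inhibiting it.
        return not self._inhibitors


@contextmanager
def playing_video(service, app):
    """Inhibit the screensaver for as long as playback lasts."""
    cookie = service.inhibit(app, "playing video")
    try:
        yield
    finally:
        # Released even if playback fails with an exception.
        service.uninhibit(cookie)
```

A player would wrap its playback loop in `with playing_video(service, "player"):` and the user never has to think about it.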

The loss of hotswap notifications when ALPM is enabled? Simply reenable it if SMART starts complaining or the RAID set becomes degraded or a filesystem is unmounted or any of the other things that will have to be done before a drive can be safely removed. Latency an issue? Let userspace tell you what its latency requirements are. When we get down to it, the only settings that we can't infer are things like idle-time-to-suspend, screensaver settings and default brightnesses. Which, curiously, are about the only things that Mac OS exposes[1].
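The ALPM case reduces to a very small rule. In this sketch the event names are made up for illustration; how such events would actually be gathered is left open.

```python
# Hypothetical event names: signs that a drive may be about to be removed.
REMOVAL_HINTS = {"smart_warning", "raid_degraded", "filesystem_unmounted"}


def alpm_should_be_enabled(recent_events):
    """Keep ALPM on unless something suggests a drive is about to be pulled."""
    return not (REMOVAL_HINTS & set(recent_events))
```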

The problem we face is that we don't currently have interfaces to let applications tell us what their requirements are. This is theoretically straightforward, but getting it right is important. Beyond latency and screensaver issues, what constraints do applications want to impose? How can we expose an interface to those constraints? This is the sort of thing we'll be discussing at the Linux Plumbers Conference, so if you have ideas about what they'd need to look like then sign up. Or, even better, submit a paper and be part of the process that gets Linux's power consumption as close to the theoretical minimum as possible.
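For the latency case, one possible shape for such an interface is sketched below. It's purely hypothetical: the class, the function, the state names and the numbers are all invented for illustration. Applications register the worst latency they can tolerate, and the system picks the lowest-power state that still satisfies the tightest request.

```python
class LatencyConstraints:
    """Registry of per-application latency requirements (hypothetical)."""

    def __init__(self):
        self._requests = {}

    def request(self, owner, max_latency_us):
        self._requests[owner] = max_latency_us

    def release(self, owner):
        self._requests.pop(owner, None)

    def effective_limit_us(self):
        """Tightest outstanding requirement, or None if there are none."""
        return min(self._requests.values(), default=None)


def deepest_allowed_state(states, limit_us):
    """Pick the lowest-power state whose exit latency fits the limit.

    states is a list of (name, exit_latency_us, power_mw) tuples.
    """
    allowed = [s for s in states if limit_us is None or s[1] <= limit_us]
    if not allowed:
        # Nothing satisfies the limit: fall back to the fastest state.
        return min(states, key=lambda s: s[1])
    return min(allowed, key=lambda s: s[2])


# Illustrative states and figures only.
STATES = [("active", 0, 1000), ("partial", 10, 400), ("slumber", 10000, 100)]
```

With no requests outstanding the system is free to use the deepest state; an application that cares simply says so and the policy follows.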

[1] My Vista system gives me 25 options which can be set for each of AC and battery.
Tags: advogato, fedora
