Matthew Garrett ([info]mjg59) wrote,
@ 2008-11-18 19:38:00
Entry tags:advogato, fedora

Aggressive graphics power management
My current desktop PC has an RS790-based Radeon on-board graphics controller. It also has a Radeon X1900 plugged in. Playing with my Watts Up, I found that the system was (at idle!) drawing around 35W more power with the X1900 than with the on-board graphics.

This is clearly less than ideal.

Recent Radeons all support dynamic clock gating, a technology where the clocks to various bits of the chip are turned off when not in use. Unfortunately it seems that this is generally already enabled by the BIOS on most hardware, so playing with that didn't give me any power savings. Next I looked at Powerplay, the AMD technology for reducing clocks and voltages. It turns out that my desktop hardware doesn't provide any Powerplay tables, so no joy there either. What next?

Radeons all carry a ROM containing a bunch of tables and scripts written in a straightforward bytecode language called Atom. The idea is that OS-specific drivers can call the Atom tables to perform tasks that are hardware dependent, even without knowledge of the specific low-level nature of the hardware they're driving. You can use Atom to do several things, from card initialisation through mode setting to (crucially) setting the clock frequencies. Jerome Glisse wrote a small utility called Atomtools that lets you execute Atom scripts and set the core and RAM frequencies. Playing with this showed that it was possible to save the best part of 5W by underclocking the graphics core, and about the same again by reducing the memory clock. A total saving of 9-10W was pretty significant.

The main problem with reducing the memory clock was that doing it while the screen is being scanned out results in memory corruption, showing up as big ugly graphical artifacts on the screen. I'm a fan of doing power management as aggressively as possible, which means reclocking the memory whenever the system is idle. Turning the screen off to reclock the memory would avoid the graphical corruption but introduce irritating flicker, so that wasn't really an option. The next plan was to synchronise the memory reclocking to the vertical blanking interval (vblank), the period of time between the bottom of one frame being drawn and the top of the next. Unfortunately, setting the memory frequency through Atom took somewhere between 2 and 20 milliseconds, far too long to finish inside that time period.
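
To put the timing problem in numbers, here is a rough sketch of how long a vertical blanking interval lasts. It assumes the standard VESA 1024x768@60Hz timings (65MHz pixel clock, 1344 total pixels per line, 806 total lines). That mode is an illustrative assumption, not necessarily the one used on the machine above, and the function name is made up for the example.

```python
def vblank_us(pixel_clock_hz, htotal, vactive, vtotal):
    """Return the length of the vertical blanking interval in microseconds."""
    blank_lines = vtotal - vactive          # lines scanned with nothing displayed
    line_time_s = htotal / pixel_clock_hz   # time to scan one full line
    return blank_lines * line_time_s * 1e6

# Assumed example mode: VESA 1024x768 at 60Hz.
blank = vblank_us(65_000_000, 1344, 768, 806)

# The fastest Atom reclock mentioned above took about 2ms (2000us).
fits = blank >= 2000
print(f"vblank lasts about {blank:.0f}us; a 2ms reclock fits: {fits}")
```

With these assumed timings the blanking period comes out well under a millisecond, so even the fastest Atom reclock overruns it.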

So. Just using Atom was clearly not going to be possible. The next step was to try writing the registers directly. Looking at the R500 register documentation showed that the MPLL_FUNC_CNTL register contained the PLL dividers for the memory clock. Simply smacking a new value in here would allow changing the frequency of the memory clock with a single register write. It even worked. Almost. I could change the frequency within small ranges, but going any further resulted in increasingly severe graphical corruption. Unlike the sort I got with the Atom approach to changing the frequency, this corruption manifested itself as a range of effects from shimmering on the screen down to blocks of image gradually disappearing in an impressively trippy (though somewhat disturbing) way.
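
For anyone unfamiliar with PLLs, here is a generic sketch of how dividers set an output clock. The formula is the textbook one. The field names (ref_div, fb_div, post_div) and all the numbers are hypothetical; the real bit layout of MPLL_FUNC_CNTL lives in the R500 register documentation and isn't reproduced here.

```python
def pll_output_khz(ref_khz, ref_div, fb_div, post_div):
    """Textbook PLL relation: out = ref * fb_div / (ref_div * post_div)."""
    return ref_khz * fb_div // (ref_div * post_div)

# Hypothetical values: a 27MHz reference and made-up dividers.
full = pll_output_khz(27_000, ref_div=2, fb_div=100, post_div=2)

# Doubling the post divider halves the output clock.
half = pll_output_khz(27_000, ref_div=2, fb_div=100, post_div=4)

print(full, half)
```

The point is only that a single write changing one divider field is enough to move the clock, which is why the direct register approach is so much faster than running the full Atom script.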

Next step was to perform a register dump before and after changing the frequencies via Atom, and compare them to the registers I was programming. MC_ARB_RATIO_CLK_SEQ was consistently different, which is where things got interesting. The AMD docs helpfully describe this register as "Magic field, please use the excel programming guide. Sets the hclk/sclk ratio in the arbiter", about as helpful as being told that the register contents are defined by careful examination of a series of butterflies kept somewhere in Taiwan. Now what?
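
The comparison itself is simple enough to sketch. This assumes the dumps have already been parsed into dictionaries mapping register names to values; the values below and the OTHER_REG name are invented for illustration and are not real dump contents.

```python
def unexplained_changes(before, after, programmed):
    """Registers that changed between two dumps but that we never wrote."""
    changed = {
        name for name in before.keys() & after.keys()
        if before[name] != after[name]
    }
    return sorted(changed - set(programmed))

# Made-up register values, for illustration only.
before = {"MPLL_FUNC_CNTL": 0x00120704,
          "MC_ARB_RATIO_CLK_SEQ": 0x00000010,
          "OTHER_REG": 0x00000001}
after = {"MPLL_FUNC_CNTL": 0x00120708,
         "MC_ARB_RATIO_CLK_SEQ": 0x00000020,
         "OTHER_REG": 0x00000001}

# Registers the hand-written reclocking code already touches.
programmed = ["MPLL_FUNC_CNTL"]

missing = unexplained_changes(before, after, programmed)
print(missing)
```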

Back to Atomtools. Enabling debugging let me watch a dump of the Atom script as it ran. The relevant part of the dump is here. The most significant point was:

MOVE_REG @ 0xBC09
src: ID[0x0000+B39E].[31:0] -> 0xFF7FFF7F
dst: REG[0xFE16].[31:0] <- 0xFF7FFF7F
This shows that the value in question was being read out of a table in the video BIOS (ID[0x0000+B39E] indicating the base of the ROM plus 0xB39E). Looking further back showed that WS[0x40] contained a number that was used as an index into the table. Grepping the header files gave 0x40 as ATOM_WS_QUOTIENT, containing the quotient of a division operation immediately beforehand. Working back from there showed that the value was derived from a formula involving the divider frequencies of the memory PLL and the source PLL. Reimplementing that was trivial, and now I could program the same register values. Hurrah!
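
The table read in that dump boils down to fetching a 32-bit value from an offset in the ROM image. A minimal sketch, assuming a little-endian ROM image held in memory: the image here is a zero-filled stand-in with the value from the dump planted at 0xB39E, and read_rom_dword is a made-up helper name. How the quotient in WS[0x40] gets turned into a table offset isn't shown, since the formula isn't spelled out above.

```python
import struct

def read_rom_dword(rom, offset):
    """Read one little-endian 32-bit value from a ROM image."""
    return struct.unpack_from("<I", rom, offset)[0]

# Stand-in ROM image: zeros, with the dumped value placed at 0xB39E.
rom = bytearray(0x10000)
struct.pack_into("<I", rom, 0xB39E, 0xFF7FFF7F)

value = read_rom_dword(rom, 0xB39E)
print(f"{value:#010x}")
```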

It didn't work, of course. These things never do. It looked like modifying this value didn't actually do anything unless the memory controller was reinitialised. Looking through the Atom dump showed that this was achieved by calling the MemoryDeviceInit script. Reimplementing this from scratch was one option, but it had a bunch of branches and frankly I'm lazy and that's why I work on this Linux stuff rather than getting a proper job. This particular script was fast, so there was no real reason to do it by hand instead of just using the interpreter. Timing showed that doing so could easily be done within the vblank interval. This time, it even worked.

I've done a proof of concept that involved wedging this into the Radeon DRM code with extreme prejudice, but it needs some rework. However, it demonstrates that it's possible to downclock the memory whenever the screen is idle without there being any observable screen flicker. Combine that with GPU downclocking and we can save about 10W without any noticeable degradation in performance or output. Victory!

I gave the code to someone with an X1300 and it promptly corrupted their screen and locked their machine up. Oh well. Turns out that they have a different memory controller or some such madness.

So, obviously, there's more work to be done on this. I've put some test code here. It's a small program that should be run as root. It should reprogram an Atom-based discrete graphics card[1] to half its memory clock. Running it again will halve it again. I don't recommend doing that. You'll need to reboot to get the full clock back. This isn't vblank synced, so it may introduce some graphical corruption. If the corruption is static (ie, isn't moving or flickering) then that's fine. If it's moving then I (and/or the docs) suck and there's still work to be done. If your machine hangs then I'm interested in knowing what hardware you have and may have some further debugging code to be run. Unless you have an X1300, in which case it's known to break and what were you thinking running this code you crazy mad fool.

Once this is stable it shouldn't take long to integrate it into the DRM and X layers. I'm also trying to get hold of some mobile AMD hardware to test what impact we can have on laptops.

[1] Shockingly enough, it's somewhat harder to underclock graphics memory on a shared memory system



(13 comments) - (Post a new comment)

ATI Mobility Radeon X1600
(Anonymous)
2008-11-18 10:02 pm UTC (link)
I have a MacBook Pro (MA610LL, http://support.apple.com/kb/SP24) with an "ATI Mobility Radeon X1600 with 256MB of GDDR3 SDRAM and dual-link DVI".

Running your avivoram tool the first time reduces power consumption by ~2 Watt according to powertop. Running it the second time saves another Watt. Running it a third time saves maybe 1/2 Watt. Running it a fourth time does not make any difference. Then I stopped testing.

When running it the first time the screen gets statically corrupted - red, yellow, and purple vertical lines appear "overlayed". Text is still more or less readable. The vertical lines are about 10-20 pixels wide. Running it for a second and third time does not make a noticeable difference. After running avivoram the fourth time I noticed some big black blocks flashing on the screen whenever I pressed Enter in the terminal. Additionally, white, slowly blinking pixels appeared.
Upon switching to a text console during reboot the screen went crazy, with large blinking boxes in red, yellow, purple, and white.

btw: there is a patched radeontool with power saving support:
http://vrodic.blogspot.com/2007/11/for-fglrx-using-people-having-idle.html

Hope this helps.
Raphael

--

lspci -vvvv output:
01:00.0 VGA compatible controller: ATI Technologies Inc M56P [Radeon Mobility X1600] (prog-if 00 [VGA controller])
Subsystem: Apple Computer Inc. MacBook Pro
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 256 bytes
Interrupt: pin A routed to IRQ 11
Region 0: Memory at 80000000 (32-bit, prefetchable) [size=128M]
Region 1: I/O ports at 3000 [size=256]
Region 2: Memory at 90300000 (32-bit, non-prefetchable) [size=64K]
Expansion ROM at 90320000 [disabled] [size=128K]
Capabilities:
Kernel modules: fglrx


Re: ATI Mobility Radeon X1600
[info]mjg59
2008-11-18 10:08 pm UTC (link)
Yeah, the static corruption is expected - that's due to it not being synced to vblank. Forcing some redraws should get rid of it. Once the RAM is below a certain speed then it won't be able to satisfy all the requirements of the GPU, so you'll start getting additional corruption. Thanks for testing!



(Anonymous)
2008-11-18 10:07 pm UTC (link)
So "never buy nVidia or AMD graphics" still stands. Go Intel I guess?


X1350 laptop
[info]victor.osadci.myopenid.com
2008-11-18 10:08 pm UTC (link)
I can try to test this on a HP laptop with an ATI X1350 card if you promise it won't format my drive, brick the machine or kill the cat ;-)


Re: X1350 laptop
[info]mjg59
2008-11-18 10:09 pm UTC (link)
It should do none of these things, but, hey, y'know? Running it once should certainly be safe, but I suspect it'll result in bizarreness on an X1350. Would be an interesting datapoint, though.


Re: X1350 laptop
[info]victor.osadci.myopenid.com
2008-11-18 10:31 pm UTC (link)
Yep, the laptop locked up, the screen filled with thin, vertical, coloured lines, which changed colour slightly after about two seconds.


It is nice to know that you are hacking on this.
Thanks!


Re: X1350 laptop
[info]thaytan
2008-11-19 01:02 pm UTC (link)
Hey, that's the exact behaviour my old laptop had before I replaced it :)

(Actually, after I replaced it, I figured out that it was crashing because the video daughterboard relied on the ATI video chip and memory chips having good contact with the metal plate of the top panel of the laptop to provide cooling. They'd gotten separated a bit by 3 years of throwing the laptop around, causing the video daughterboard chips to overheat... and the RAM to deliver unreliable data, which is not that different to the underclocking scenario, so really not at all surprising)



(Anonymous)
2008-11-19 07:09 am UTC (link)
I have a fanless AMD 3450 desktop chip. I'll test it later this week when I'm not swamped with work. This is also interesting because in theory it would let a desktop run cooler and thus quieter (and in a dorm room, I want my machine as silent as possible, hence the fanless graphics card).


Re: X1350 laptop
[info]pjc50
2008-11-24 12:54 pm UTC (link)
"support dynamic clock gating, a technology where the clocks to various bits of the chip are turned off when not in use. Unfortunately it seems that this is generally already enabled by the BIOS on most hardware"

Generally dynamic clock gating gets baked into the hardware and I'm surprised it's even possible to turn it off. Manually poking registers like this is kinda an extreme sport, and I would expect the exact behaviour to vary from one card to another because they're only tested at specific combinations of clock speeds; it might be the case that you have to adjust something elsewhere to keep DRAM refresh up at the right rate, for example.

Did I mention that my employer does software for reducing power of ASIC designs? nVidia use our stuff; I can tell you that because their logo is on our website, NDAs discourage me from giving any specifics.

It's a market thing. At the moment, low power consumption of desktop PCs isn't a marketable issue, and therefore designers don't try too hard to reduce it. As you've seen, the more the entire hardware/software stack is aware of power the greater the savings that can be made. When we see major product reviews including a power consumption figure alongside the performance figures - ideally "how much power did it take to achieve this graphics benchmark" etc. - then there will be real pressure to sort this out.


Re: X1350 laptop
[info]mjg59
2008-11-24 01:28 pm UTC (link)
Interesting. Most chipsets I've worked with (admittedly limited to AMD and Intel) have had register bits to enable or disable clock gating on specific domains. The refresh rate code is thankfully self-explanatory (unlike the memory controller init...), though I haven't yet bothered implementing it. Still, I have some information on the X1300 now, so I ought to be able to look into it this week.

Also, are you likely to be in London at any point? Haven't seen you in an astonishingly long time.


Re: X1350 laptop
[info]pjc50
2008-11-24 02:25 pm UTC (link)
It's been a while :) I'm not routinely in London, but can come down sometime to say hello. Probably best after Christmas sometime.

Enable/disable gating: my first thought on this is that it's like a useless UI option: either it works, in which case there is no reason to turn it off, or they've done something horribly dubious to make it work. It's possible that turning it off allows the chipset to be overclocked more; or maybe it's marginal design and some chips have to have it turned off in order to work (and the firmware knows to do this).


Re: X1350 laptop
[info]mjg59
2008-11-24 02:57 pm UTC (link)
The general case I've seen it being used is when vendors aren't sure that all of their dynamic gating will work. Supporting enabling/disabling at the per-domain level lets them ship hardware and tell firmware authors which ones they need to turn off to avoid chip lockups.


RS780
(Anonymous)
2009-01-04 08:10 pm UTC (link)
Should this also run on RS780?
Here's my output:
$ sudo ./avivoram
ATOM BIOS: B27722 RS780 DDR2 200e/500m
100 6


Thanks!

By the way: Are you still working on it? What about a graphics card energy saving framework for X/kernel?


