Matthew Garrett ([info]mjg59) wrote,
@ 2008-07-27 02:42:00
Entry tags:advogato, fedora

Further Foxconn fun
Ryan kindly sent me a copy of the ACPI tables for his motherboard, so I've had the opportunity to look at them in a little more detail. There's nothing especially surprising. The first method of interest is OSFL, which I've annotated below:

    Method (OSFL, 0, NotSerialized)
    {
        If (LNotEqual (OSVR, Ones))
        {
            Return (OSVR)
        }
This block simply skips the checks if they've already been evaluated and returns the cached value
        If (LEqual (PICM, Zero))
        {
            Store (0xAC, DBG8)
        }
If the programmable interrupt controller has been set up in PIC mode rather than APIC mode, 0xAC is written to i/o port 0x80. This would then show up on a plug-in card if one were attached. Simply debug code
        Store (One, OSVR)
Set OSVR to 1, which in this case clearly means "Unknown OS"
        If (CondRefOf (_OSI, Local1))
This checks whether the OS supports the _OSI method. If it does, the following block is executed. If not, the older _OS method is used to detect the OS
        {
            If (_OSI ("Windows 2000"))
            {
                Store (0x04, OSVR)
            }
Newer versions of Windows will also claim to support the interfaces defined in older versions, so this set of checks is done in release order
            If (_OSI ("Windows 2001"))
            {
                Store (Zero, OSVR)
            }

            If (_OSI ("Windows 2001 SP1"))
            {
                Store (Zero, OSVR)
            }

            If (_OSI ("Windows 2001 SP2"))
            {
                Store (Zero, OSVR)
            }

            If (_OSI ("Windows 2001.1"))
            {
                Store (Zero, OSVR)
            }

            If (_OSI ("Windows 2001.1 SP1"))
            {
                Store (Zero, OSVR)
            }

            If (_OSI ("Windows 2006"))
            {
                Store (Zero, OSVR)
            }
If we've got this far, OSVR is now set to 0. Linux will claim to support all of these interfaces, and so OSVR should be 0 on Linux systems. Note that there is no _OSI check for Linux - the 2.6.24 change to remove Linux from the set of claimed interfaces is therefore irrelevant
        }
        Else
        {
Linux supports _OSI, so we should never be here. But if we somehow are...
            If (MCTH (_OS, "Microsoft Windows NT"))
            {
                Store (0x04, OSVR)
            }
Linux has responded to _OS with "Microsoft Windows NT" since 2.6.9. MCTH is simply a string matching routine defined elsewhere in the DSDT. So, worst case here is that OSVR is 4
            Else
            {
                If (MCTH (_OS, "Microsoft WindowsME: Millennium Edition"))
                {
                    Store (0x02, OSVR)
                }

                If (MCTH (_OS, "Linux"))
                {
                    Store (0x03, OSVR)
                }
..because this could never be true unless you're running 2.6.8.1 or earlier. But even so, getting here would still indicate failure - we've supported _OSI since before then, and so should never come anywhere near this code block.
            }
        }

        Return (OSVR)
    }
In summary, we end up with the following values:
Value  OS
0      Windows XP, 2003 or Vista. Linux (assuming absence of bugs)
1      Unknown OS
2      Windows ME
3      A version of Linux that doesn't implement _OSI and is from before 2.6.9
4      Windows NT 4 and 2000. A version of Linux that doesn't implement _OSI and is 2.6.9 or later (I don't believe any such version exists)
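
For anyone who finds ASL hard to follow, here is a rough Python model of the OSFL logic that reproduces the table above. It is a sketch, not the firmware's code: the function name osfl, its parameters and the XP_AND_LATER tuple are my own, the OSVR caching and the port 0x80 debug write are left out, and MCTH is assumed to behave like plain string equality.

```python
# Rough Python model of the OSFL method. Caching (the OSVR == Ones check)
# and the port 0x80 debug write are deliberately left out.

# _OSI strings that make the firmware store 0 in OSVR.
XP_AND_LATER = (
    "Windows 2001",
    "Windows 2001 SP1",
    "Windows 2001 SP2",
    "Windows 2001.1",
    "Windows 2001.1 SP1",
    "Windows 2006",
)


def osfl(osi_strings, os_name=""):
    """Return the OSVR value the firmware would settle on.

    osi_strings: set of _OSI strings the OS answers true to, or None if
                 the OS does not implement _OSI at all.
    os_name:     the string the OS returns for _OS.
    """
    osvr = 1  # "Unknown OS" until proven otherwise
    if osi_strings is not None:
        if "Windows 2000" in osi_strings:
            osvr = 4
        # Any XP-or-later match overrides the Windows 2000 result.
        if any(s in osi_strings for s in XP_AND_LATER):
            osvr = 0
    else:
        # Assumption: MCTH is modelled as plain string equality.
        if os_name == "Microsoft Windows NT":
            osvr = 4
        else:
            if os_name == "Microsoft WindowsME: Millennium Edition":
                osvr = 2
            if os_name == "Linux":
                osvr = 3
    return osvr
```

Calling osfl with every Windows string claimed (which is what Linux does) lands in the first branch and yields 0; the value 3 is only reachable with osi_strings=None and an _OS string of "Linux".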

Now, where is this used? The majority of the OSFL checks only check whether the return value is 1 or 2, which will only be true for an OS that (a) doesn't claim to be Windows or (b) is Windows ME. Linux doesn't fall into either of these categories, so we can ignore them. The first interesting hit we have is in the HPET code, where _STA will return 0xf (device present and working) if OSFL is 0 and 0xb (device present and working, but should not be shown in the UI) otherwise. This is just to keep the HPET from showing up in versions of Windows that don't know what it is. The only other interesting hit is the following code from the PCI bus initialisation pathway:
 
                                If (LEqual (OSFL (), Zero))
                                {
                                    Store (0x59, SMIC)
                                }
                                Else
                                {
                                    If (LEqual (OSFL (), 0x04))
                                    {
                                        Store (0x5A, SMIC)
                                    }
                                    Else
                                    {
                                        Store (0x58, SMIC)
                                    }
                                }
This writes different values to SMIC (which turns out to be i/o port 0xb2) depending on the OS. Writing to 0xb2 is the standard(ish) way to trigger a system management interrupt, which causes the CPU to execute some code from a memory region that can't be accessed by the OS. This isn't that unusual, but it's a little weird. In any case, note that there's no check for whether OSFL is 3 here (which would be true if the _OS call returned Linux), and so even a kernel that identified itself as Linux would be treated identically to Windows ME and any unknown OS. In reality, OSFL returns 0 or 4 on Linux, so it will be treated identically to either Vista or 2000. This block provides no evidence of conspiracy. Finally, the OS version flag is written to a region of memory before suspend and read back afterwards. Nothing appears to be done with this information - it's conceivable that the low-level resume code in the BIOS has conditionals based on this, but I suspect that it's just boilerplate code that's ignored.
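
The two OSFL consumers described above can be sketched the same way. This is an illustrative Python model only: the function names are hypothetical, and the _STA bit meanings are taken from the ACPI spec (bit 0 present, bit 1 enabled, bit 2 shown in the UI, bit 3 functioning).

```python
def decode_sta(sta):
    """Split an ACPI _STA value into its low four status bits."""
    return {
        "present": bool(sta & 0x1),
        "enabled": bool(sta & 0x2),
        "shown_in_ui": bool(sta & 0x4),
        "functioning": bool(sta & 0x8),
    }


def hpet_sta(osfl_value):
    """HPET _STA: fully visible only when OSFL returned 0."""
    return 0x0F if osfl_value == 0 else 0x0B


def smic_value(osfl_value):
    """Value written to SMIC (i/o port 0xb2) in the PCI init path."""
    if osfl_value == 0:
        return 0x59  # XP, 2003, Vista (and Linux in practice)
    if osfl_value == 4:
        return 0x5A  # NT 4 / 2000
    return 0x58      # everything else: 1, 2 and 3 are not told apart
```

Note that smic_value(3), the hypothetical "identified as Linux" case, gives the same 0x58 as Windows ME and an unknown OS, and that 0x0B differs from 0x0F only in the "shown in UI" bit.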

To summarise:
  • There is no code in this DSDT that could determine that the system is running any Linux kernel of 2.6.9 or later. This may even be true of earlier versions - I'm not sure when _OSI support was added
  • Even if the code did manage to determine that the system was running Linux, there are no codepaths that are Linux specific. Every piece of code is run on at least one version of Windows
What's the problem, then? I've no idea. The only "significant" issue is that the OEMB table provided by the BIOS has an incorrect checksum. Given that the OEMB table is never used by Linux (it's a vendor extension of some kind, with the best hint I've been able to find being that it can be used to pass information from the BIOS to the OS - kind of like the rest of ACPI, then...), this is pretty unimportant. And given that the OEMB table isn't part of the ACPI spec, it's certainly entirely irrelevant when it comes to determining whether the system is ACPI compliant or not.
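
For reference, an ACPI table checksum is just a single byte (offset 9 of the standard table header) chosen so that all the bytes of the table sum to zero modulo 256. A minimal Python sketch of that check follows; the helper names and the demo table are made up for illustration and are not the real OEMB contents.

```python
CHECKSUM_OFFSET = 9  # position of the checksum byte in the ACPI table header


def checksum_ok(table):
    """A table is valid when all of its bytes sum to 0 modulo 256."""
    return sum(table) % 256 == 0


def expected_checksum(table):
    """Checksum byte that would make the table sum to 0 modulo 256."""
    rest = sum(table) - table[CHECKSUM_OFFSET]
    return (-rest) % 256


def make_demo_table(checksum):
    """Build a made-up, header-only 36-byte table (not real OEMB data)."""
    table = bytearray(36)
    table[0:4] = b"OEMB"                      # signature
    table[4:8] = (36).to_bytes(4, "little")   # total length
    table[CHECKSUM_OFFSET] = checksum
    return table


bad = make_demo_table(0x70)
good = make_demo_table(expected_checksum(bad))
```

Each table carries its own checksum byte, so the check only ever looks at the bytes of that one table; nothing in another table such as the DSDT enters the sum.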

Are there ACPI issues with Ryan's system? It sounds like it. The "Error attaching device data" complaints indicate some kind of failure on the part of the kernel to work out how the devices correspond to the ACPI namespace, but I strongly suspect that this is a Linux bug. Failure to reboot after suspend? Could be anything (I'd need direct access to the hardware to figure it out properly), but again it's almost certainly a Linux bug. The standard way Linux reboots systems is to bang the keyboard controller, and it's conceivable that something we're doing on resume is leaving the keyboard controller in a slightly confused state. We're clearly doing something wrong there, given that my Dell comes up without a keyboard about one resume in twenty - I just haven't had time to look into it yet.

The only remaining thing is the mutex handwaving. I've got no clue what's going on there. Ryan's suggested change (from Acquire (MUTE, 0x03E8) to Acquire (MUTE, 0xFFFF)) simply means that the OS will wait forever until it acquires the mutex - in the past it would only wait a second. The reason the compiler generates a warning here is that the firmware never checks whether it acquired the mutex or not! Bumping the timeout to infinity obviously fixes this warning (there's no need to check the return code if you're happy to wait forever rather than failing), but the original code is merely stupid as opposed to a spec violation.
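
To make the timeout semantics concrete, here is a small Python analogue. It is a sketch under assumptions: the acquire helper is my own, with threading.Lock standing in for the ACPI mutex. As in ASL, the timeout is in milliseconds, 0xFFFF means wait forever, and the return value is true when the wait timed out.

```python
import threading

ACPI_WAIT_FOREVER = 0xFFFF


def acquire(mutex, timeout_ms):
    """Mimic ASL Acquire: return True on timeout, False once the mutex is held."""
    if timeout_ms == ACPI_WAIT_FOREVER:
        mutex.acquire()  # blocks until the mutex is available
        return False
    return not mutex.acquire(timeout=timeout_ms / 1000.0)


def careful_access(mutex, timeout_ms=0x03E8):
    """What the compiler warning asks for: check the result before continuing."""
    if acquire(mutex, timeout_ms):
        return "timed out"  # mutex not held, so do not touch the shared state
    try:
        return "ok"
    finally:
        mutex.release()
```

The shipped firmware does the equivalent of calling acquire(mutex, 0x03E8) and ignoring the result. With 0xFFFF there is no timeout result left to check, which is why the warning disappears.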

Take home messages? There's no evidence whatsoever that the BIOS is deliberately targeting Linux. There are also no obvious spec violations, but some further investigation would be required to determine for sure whether the runtime errors are due to a Linux bug or a firmware bug. Ryan's modifications should result in precisely no reasonable functional change to the firmware (if it's ever hitting the mutex timeout, something has already gone horribly wrong), and if they do then it's because Linux isn't working as it's intended to. I can't find any way in which the code Foxconn are shipping is worse than any other typical vendor. This entire controversy is entirely unjustified.



(65 comments) - (Post a new comment)

Then please explain this...
(Anonymous)
2008-07-27 04:14 am UTC (link)
When I change it, the checksum error goes away, the ACPI: Failed to attach Device goes away, the random crashing goes away, etc.

Something is obviously wrong with the BIOS, and some of this is possibly Linux bugs on top, is what I think.

-Ryan

Re: Then please explain this...
[info]mjg59
2008-07-27 06:47 am UTC (link)
It's impossible for the changes you made to affect the checksum error. Completely. Utterly. Impossible. I really can't be any clearer here. Altering the DSDT does nothing whatsoever to influence the other ACPI tables.

Re: Then please explain this...
(Anonymous)
2008-07-27 07:21 am UTC (link)
Actually, I must insist: with my change there is no checksum error AT ALL; with the BIOS as Foxconn shipped it, I get the error on every boot.

OK, you don't _know_ how it's doing that or why, I can accept that, there are probably no doubt other things at play here for this to be happening.

But I swear on my life and a stack of Bibles, the difference is NIGHT and DAY, if you can't explain this, that's all the more cause for me to be alarmed.

-Ryan

Re: Then please explain this...
[info]mjg59
2008-07-27 07:35 am UTC (link)
The checksum code was modified somewhat in 2.6.25, so one possibility is that you're not testing both cases with the same kernel.

Re: Then please explain this... - (Anonymous), 2008-07-27 08:15 am UTC
Re: Then please explain this... - [info]fo0bar, 2008-07-27 09:14 am UTC
Re: Then please explain this... - (Anonymous), 2008-07-27 10:07 am UTC
Re: Then please explain this... - (Anonymous), 2008-07-27 04:18 pm UTC
Re: Then please explain this... - (Anonymous), 2008-07-27 09:24 pm UTC
Re: Then please explain this... - (Anonymous), 2008-07-28 03:08 am UTC
Re: Then please explain this... - (Anonymous), 2008-07-28 03:10 am UTC
Re: Then please explain this... - (Anonymous), 2008-07-27 10:01 pm UTC
Re: Then please explain this... - (Anonymous), 2008-07-28 03:12 am UTC
Re: Then please explain this... - [info]zdzichu.openid.pl, 2008-07-28 08:19 am UTC

(Reply from suspended user)

Re: Then please explain this...
(Anonymous)
2008-07-27 02:11 pm UTC (link)
A stupid question but it can be something that people are overlooking...

Are you sure you don't have anything like "acpi_osi=Linux" on the cmdline? Or you could try and force it with acpi_os_name= if unsure.



Re: Then please explain this...
(Anonymous)
2008-07-28 01:44 pm UTC (link)
http://izanbardprince.wordpress.com/2008/07/28/foxconn-says-acpi-issues-are-amis-fault-is-having-them-repair-the-code/

Foxconn has said they are sending the BIOS to AMI for repair.

-Ryan

oh Failure to reboot after suspend
[info]vuvuvu.wordpress.com
2008-07-27 10:44 am UTC (link)
Oh, I've had this "Failure to reboot after suspend" problem for a while on my Asus V6J.
The strange thing is that the problem appeared after a BIOS upgrade; with the old BIOS the problem didn't exist. If you are interested, I have both DSDTs.

Re: oh Failure to reboot after suspend
(Anonymous)
2008-07-27 03:25 pm UTC (link)
I have sent Matthew the email addresses of the managers at Foxconn who are (now) interested in/willing to resolve this issue.

I would kindly suggest that his energies are better spent working with them than... attacking me on his blog?

I don't understand why Matthew is tossing out some weasel words like "Oh, this is executing some mystery code that the OS can't see, but Ryan is in the Black Helicopter/Tin Foil Hat Society", the point is that he doesn't know the complete story of what they're up to, even having seen what I've sent him, and now that Foxconn is willing to help, he and others who have been _ASKING_ for this since forever shouldn't be complaining about it now.

-Ryan

Don't assume when we can't see their code
(Anonymous)
2008-07-27 06:31 pm UTC (link)
[Part 1 of 2]
>> Even if the code did manage to determine that the system was running Linux, there are no codepaths that are Linux specific. Every piece of code is run on at least one version of Windows

For the quote above, remember that Monopolysoft DOES WANT their old OS to fail. They want users to have to upgrade. The above is thus no suggestion of sloppy play but may actually support foul play.

I read over the post. I have no comment on the middle section yet since I don't think this was covered by Ryan (the SMIC section). [I don't know ACPI, btw, but am trying to follow the logic given.]

The blogger mjg59 claims there are no code paths to track newer Linux. [From what I have read..] That is false. There are paths. it's just that Linux tries to hack around such paths (I think Ryan in a different blog entry suggested that this is because Linux works around particular items when it recognizes the compiler output as being from Microsoft's compiler as appears to be the case here).

Worst case, Linux should get tagged as an unsupported OS, which means that there would likely be a failure due to bitrot. Apparently, Foxconn has no way of "guessing" that they support ACPI except if the Monopolysoft testsuite says so. Foxconn needs to fix this, and the FTC should know about it. Using a clearly biased Monopolysoft testsuite as proof you implement an "open" standard is laughable (but probably done to save a buck).

>> Given that the OEMB table is never used by Linux (it's a vendor extension of some kind, with the best hint I've been able to find being that it can be used to pass information from the BIOS to the OS - kind of like the rest of ACPI, then...), this is pretty unimportant.

Embrace and extend.

Overall, I think there is enough evidence for justice officials to look closer at this matter. The problem may or may not be with Foxconn, but without more careful investigation one side can give the benefit of all doubts (that it's cases of Linux bugs or Foxconn innocent sloppiness/oversight) while the other can assume the worst.

>> The "Error attaching device data" complaints indicate some kind of failure on the part of the kernel to work out how the devices correspond to the ACPI namespace, but I strongly suspect that this is a Linux bug. Failure to reboot after suspend? Could be anything (I'd need direct access to the hardware to figure it out properly), but again it's almost certainly a Linux bug. The standard way Linux reboots systems is to bang the keyboard controller, and it's conceivable that something we're doing on resume is leaving the keyboard controller in a slightly confused state. We're clearly doing something wrong there, given that my Dell comes up without a keyboard about one resume in twenty - I just haven't had time to look into it yet.

In any case, it's very possible that Monopolysoft has fallen short. They are a monopolist and so have a higher standard (to match the extremely lopsided power they wield.. something Gates is very conscious of as it is a clear part of their business strategies to preserve their monopolies). They cannot be negligent or careless or encourage (directly or indirectly) actions in vendors that would lead to certain biases towards them [IANAL].

For example, if they add proprietary extensions to the standard to overcome bugs in their misimplementation of the regular standard, they are violating the law. In the least, they should be forbidden from giving access to their tools to vendors when their tools clearly bias in favor of their OS [I refer to the MS compiler Ryan mentions in another post]

>> The only remaining thing is the mutex handwaving. I've got no clue what's going on there. Ryan's suggested change (from Acquire (MUTE, 0x03E8) to Acquire (MUTE, 0xFFFF)) simply means that the OS will wait forever until it acquires the mutex - in the past it would only wait a second.

I think Ryan (and he could be wrong) said this was a reference to a memory value. So is this a time out value or a memory location?

(cont)

Re: Don't assume when we can't see their code
[info]mjg59
2008-07-27 06:36 pm UTC (link)
The blogger mjg59 claims there are no code paths to track newer Linux. [From what I have read..] That is false. There are paths. it's just that Linux tries to hack around such paths

Wrong. Linux gives no indication that it's anything other than Windows.

I think Ryan (and he could be wrong) said this was a reference to a memory value. So is this a time out value or a memory location?

Ryan is wrong. It's a timeout. Check section 17.5.1 of version 3.0 of the ACPI spec.

Re: Don't assume when we can't see their code
(Anonymous)
2008-07-27 06:40 pm UTC (link)
>> Wrong. Linux gives no indication that it's anything other than Windows.

That is precisely the hack I am talking about. It's a work-around unless the standard says that you have to indicate that you are Windows.

>> Ryan is wrong. It's a timeout. Check section 17.5.1 of version 3.0 of the ACPI spec.

I don't doubt you. Is there a link to a public version for me to check? [Update: it appears this is the link: http://www.acpi.info/spec.htm ]

Re: Don't assume when we can't see their code
[info]mjg59
2008-07-27 06:44 pm UTC (link)
There are no code paths to track newer versions of Linux, since Linux has not responded to _OS("Linux") since 2.6.9 was released almost four years ago. The standard states that you should respond to all _OSI strings that represent interfaces you support, and Linux supports the interfaces implemented in every released version of Windows. It's not a hack, it's how it's meant to work. We don't provide an _OSI("Linux") string because the interface semantics implemented in Linux change with every release, so nobody would be able to do anything useful with it.

Re: Don't assume when we can't see their code - (Anonymous), 2008-07-27 07:12 pm UTC
Re: Don't assume when we can't see their code - [info]mjg59, 2008-07-27 07:15 pm UTC
Re: Don't assume when we can't see their code - (Anonymous), 2008-07-27 07:35 pm UTC
(no subject) - [info]ajaxxx, 2008-07-27 09:10 pm UTC
(no subject) - (Anonymous), 2008-07-27 09:21 pm UTC
(no subject) - [info]ajaxxx, 2008-07-27 09:33 pm UTC
(no subject) - (Anonymous), 2008-07-27 10:02 pm UTC
(no subject) - [info]fooishbar, 2008-07-28 02:26 am UTC
(no subject) - [info]ajaxxx, 2008-07-28 03:28 am UTC
Not any more it's not - [info]Chris C [blondechris.com], 2008-07-29 09:35 pm UTC
Re: Not any more it's not - [info]fooishbar, 2008-07-29 11:24 pm UTC
Don't assume when we can't see their code: part2
(Anonymous)
2008-07-27 06:36 pm UTC (link)
[Part 2 of 2]

>> Take home messages? There's no evidence whatsoever that the BIOS is deliberately targeting Linux. There's also no obvious spec violations, but some further investigation would be required to determine for sure whether the runtime errors are due to a Linux bug or a firmware bug.

There IS evidence that Linux is being targeted. You waved that off by saying that "since Linux bypasses that tripwire, then ...." The fact is that there is an identification for Linux specifically. At least that much seems clear if the quoted code portions are accurate.

What I suspect happened is that Linux bypassed that tripwire, perhaps, but fell into something else due to a lack of knowledge over extensions or because of misimplementations, or likely because of both. If ACPI allows for extensions or for very little to be communicated, then Foxconn might technically be following the spec. We'd have to dig some more. I don't know what the text of ACPI is, so I can't judge.

>> Ryan's modifications should result in precisely no reasonable functional change to the firmware (if it's ever hitting the mutex timeout, something has already gone horribly wrong), and if they do then it's because Linux isn't working as it's intended to.

We can't know about the firmware because we don't have access to firmware code. Linux might be given bad info.

>> I can't find any way in which the code Foxconn are shipping is worse than any other typical vendor.

You also couldn't find a Linux bug, yet you assume one likely exists.

Plus, are other motherboards giving these problems?

I think Foxconn is being negligent but is doing so because it is best for business (since this sort of negligence and bad engineering practices, like using Monopolysoft testsuites to claim standards compliance with an open standard and not even looking at or understanding the code, likely leads to them not getting blackmarked by Monopolysoft). Monopolysoft is almost surely violating the law again as they are in many ways. They hide behind a cover of secrecy and "sloppiness".

The antitrust authorities need to do their work and prevent Monopolysoft from having such an easy time leveraging tie-ins in being allowed to compete in related markets while keeping their code closed. Open standard is not a term that means anything except in the context of cooperative vendors. Clearly, when one vendor is a monopolist reaping monopolist profits, those conditions don't exist.

Re: Don't assume when we can't see their code: part2
[info]fooishbar
2008-07-28 02:30 am UTC (link)
Taking it as axiomatic that the antitrust authorities don't pay any attention whatsoever to anonymous comments left on a LiveJournal containing (but not limited to) uppercase abuse directed towards telcos, I'm assuming that you don't actually want to pursue this beyond constantly parroting 'Monopolysoft' to open source developers.

Re: Don't assume when we can't see their code: part2
(Anonymous)
2008-07-28 06:39 pm UTC (link)
I don't think you are whom I had in mind if all you think I did was parrot Monopolysoft. I get the feeling you get stuck on delivery and miss content. You could have argued any of the various items I mentioned but all you can manage to find interesting mentioning is Monopolysoft.

I forget that the postings here are under anon.

The postings I made here yesterday were:
http://mjg59.livejournal.com/94998.html?view=911894#t911894
http://mjg59.livejournal.com/94998.html?view=912406#t912406
http://mjg59.livejournal.com/94998.html?view=913174#t913174
http://mjg59.livejournal.com/94998.html?view=913686#t913686
http://mjg59.livejournal.com/94998.html?view=913942#t913942
http://mjg59.livejournal.com/94998.html?view=914454#t914454
http://mjg59.livejournal.com/94998.html?view=914966#t914966
http://mjg59.livejournal.com/94998.html?view=916758#t916758

Jose_X

Re: Don't assume when we can't see their code: part2 - [info]fooishbar, 2008-07-28 07:41 pm UTC
Re: Don't assume when we can't see their code: part2 - (Anonymous), 2008-07-29 04:01 am UTC
Re: Don't assume when we can't see their code: part2 - (Anonymous), 2008-07-29 04:03 am UTC
Re: Don't assume when we can't see their code: part2 - [info]mjg59, 2008-07-29 05:51 am UTC
Re: Don't assume when we can't see their code
(Anonymous)
2008-07-27 06:47 pm UTC (link)
>> The blogger mjg59...

Didn't mean to refer to you this way. I wrote the comment originally for posting elsewhere and did not change it for use here.

keyboard fail for me also
(Anonymous)
2008-07-27 06:37 pm UTC (link)
Hrm.

I have a similar no-keyboard problem with my Dell laptop, 1420n, Debian Testing/Unstable, Debian/Linux 2.6.25-2-686. After waking up from sleep about 20-30% of the time my trackpad buttons and keyboard fail to respond to input.

If I plug a USB mouse and keyboard into it then I can log out and reboot, but until I do the reboot the keyboard will never come back. (even if I restart X, switch to console, etc) At least as far as I can tell.

Re: keyboard fail for me also
(Anonymous)
2008-07-27 06:38 pm UTC (link)
Err. That is to say the USB keyboard and mouse I add after the fact works fine, but the onboard keyboard will never work again until after a reboot.

Re: keyboard fail for me also
(Anonymous)
2008-08-05 07:10 pm UTC (link)
Acer Aspire 5672, same problem, but under Windows XP :o
Works fine on Ubuntu Intrepid.

Looks like it's fixed
(Anonymous)
2008-08-03 11:17 am UTC (link)
According to an Ubuntu forum post by a Foxconn China employee, this should soon be fixed.

See http://ubuntuforums.org/showthread.php?t=877721

"Aug 1 05:51:15 ryan-desktop kernel: [18.280909] ACPI Warning (tbutils-0217): Incorrect checksum in table [OEMB] - 96, should be 8F [20070126]

With the release version of the BIOS it said "70, should be 69"

Matthew Garrett says Linux doesn't even need OEMB and this is some vendor-proprietary Windows stuff. (Should be OK to ignore)."

Looks like they are listening to you!
