Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Wed, 27 May 2020 06:27:16 -0700
On 5/27/20 2:39 AM, Andriy Gapon wrote:
> On 27/05/2020 11:13, Andriy Gapon wrote:
>> I added more diagnostics and it seems to support the idea that the problem is
>> related to I/O cycles and bridges.
>>
>> ACPI timer suddenly starts returning 0xffffffff and that lasts for tens of
>> microseconds before the timer goes back to returning normal values with an
>> expected increase.
>> AMD provides a proprietary way to access ACPI registers via MMIO (0xfed808xx).
>> That mechanism is unaffected, ACPI timer register always returns good values.
>>
>> The problem seems to happen when restoring configuration of a particular PCI
>> bridge.  What's interesting is that the bridge decodes one memory range and one
>> I/O range.
>>
>> Looking at pci_cfg_restore() I wonder if it is wise to restore PCIR_COMMAND so
>> early.  Could it be that after the resume the bridge is configured with a wrong
>> I/O range (e.g., too wide) and by writing PCIR_COMMAND we enable that decoding?
>> So, the bridge steals I/O cycles destined for ACPI support hardware.  If there
>> is nothing behind the bridge to handle those ports, then we get those bad readings.
>> Once the bridge configuration is fully restored, the I/O handling goes back to
>> normal.
> 
> From what I see, this looks like a BIOS bug.
> Upon resume, it swaps window configurations of pcib1 and pcib2 (until FreeBSD
> restores them).  pcib1 originally does not have an I/O window.  So, BIOS
> programs both base and limit of pcib2 I/O window to zero.   When FreeBSD writes
> its command register to enable I/O decoding it starts claiming 0x0 - 0xFFF I/O
> port range.  That covers the ACPI ports at 0x8xx.
> 
> Some printf-s.
> From (verbose) boot time:
> pcib1:   domain            0
> pcib1:   secondary bus     1
> pcib1:   subordinate bus   1
> pcib1:   memory decode     0xfea00000-0xfeafffff
> pcib2:   domain            0
> pcib2:   secondary bus     2
> pcib2:   subordinate bus   2
> pcib2:   I/O decode        0xf000-0xffff
> pcib2:   memory decode     0xfe900000-0xfe9fffff
> 
> My printf-s from resume time:
> pcib1: old I/O base (low): 0xf1
> pcib1: old I/O base (high): 0x0
> pcib1: old I/O limit (low): 0x1
> pcib1: old I/O limit (high): 0x0
> pcib2: old I/O base (low): 0x1
> pcib2: old I/O base (high): 0x0
> pcib2: old I/O limit (low): 0x1
> pcib2: old I/O limit (high): 0x0

The "solution" I think is to have resume be multi-pass and to resume all the bridges
first before trying to resume leaf devices (including timers), but that's a fair bit
of work.  It might be that we just need to resume timer interrupts later after the
new-bus resume (I think we currently do it before?), though the reason for that was
to allow resume methods in devices to sleep (I'm not sure if any do).

-- 
John Baldwin
Received on Wed May 27 2020 - 11:27:18 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:24 UTC