On 5/27/20 2:05 PM, Hans Petter Selasky wrote:
> On 2020-05-27 15:41, Justin Hibbits wrote:
>> On Wed, 27 May 2020 06:27:16 -0700
>> John Baldwin <jhb_at_FreeBSD.org> wrote:
>>
>>> On 5/27/20 2:39 AM, Andriy Gapon wrote:
>>>> On 27/05/2020 11:13, Andriy Gapon wrote:
>>>>> I added more diagnostics and it seems to support the idea that the
>>>>> problem is related to I/O cycles and bridges.
>>>>>
>>>>> ACPI timer suddenly starts returning 0xffffffff and that lasts for
>>>>> tens of microseconds before the timer goes back to returning
>>>>> normal values with an expected increase.
>>>>> AMD provides a proprietary way to access ACPI registers via MMIO
>>>>> (0xfed808xx). That mechanism is unaffected, ACPI timer register
>>>>> always returns good values.
>>>>>
>>>>> The problem seems to happen when restoring configuration of a
>>>>> particular PCI bridge. What's interesting is that the bridge
>>>>> decodes one memory range and one I/O range.
>>>>>
>>>>> Looking at pci_cfg_restore() I wonder if it is wise to restore
>>>>> PCIR_COMMAND so early. Could it be that after the resume the
>>>>> bridge is configured with a wrong I/O range (e.g., too wide) and
>>>>> by writing PCIR_COMMAND we enable that decoding. So, the bridge
>>>>> steals I/O cycles destined for ACPI support hardware. If there is
>>>>> nothing behind the bridge to handle those ports, then we get those
>>>>> bad readings. Once the bridge configuration is fully restored, the
>>>>> I/O handling goes back to normal.
>>>>
>>>> From what I see, this looks like a BIOS bug.
>>>> Upon resume, it swaps window configurations of pcib1 and pcib2
>>>> (until FreeBSD restores them). pcib1 originally does not have an
>>>> I/O window. So, BIOS programs both base and limit of pcib2 I/O
>>>> window to zero. When FreeBSD writes its command register to
>>>> enable I/O decoding it starts claiming 0x0 - 0xFFF I/O port range.
>>>> That covers the ACPI ports at 0x8xx.
>>>>
>>>> Some printf-s.
>>>> From (verbose) boot time:
>>>> pcib1: domain 0
>>>> pcib1: secondary bus 1
>>>> pcib1: subordinate bus 1
>>>> pcib1: memory decode 0xfea00000-0xfeafffff
>>>> pcib2: domain 0
>>>> pcib2: secondary bus 2
>>>> pcib2: subordinate bus 2
>>>> pcib2: I/O decode 0xf000-0xffff
>>>> pcib2: memory decode 0xfe900000-0xfe9fffff
>>>>
>>>> My printf-s from resume time:
>>>> pcib1: old I/O base (low): 0xf1
>>>> pcib1: old I/O base (high): 0x0
>>>> pcib1: old I/O limit (low): 0x1
>>>> pcib1: old I/O limit (high): 0x0
>>>> pcib2: old I/O base (low): 0x1
>>>> pcib2: old I/O base (high): 0x0
>>>> pcib2: old I/O limit (low): 0x1
>>>> pcib2: old I/O limit (high): 0x0
>>>
>>> The "solution" I think is to have resume be multi-pass and to resume
>>> all the bridges first before trying to resume leaf devices (including
>>> timers), but that's a fair bit of work. It might be that we just
>>> need to resume timer interrupts later after the new-bus resume (I
>>> think we currently do it before?), though the reason for that was to
>>> allow resume methods in devices to sleep (I'm not sure if any do).
>>>
>>
>> That sounds like a good fit for https://reviews.freebsd.org/D203 .
>> Someone (TM) just needs to take it over the finish line... 6 years
>> later.
>
> Is this perhaps related to:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666

No. I get that constantly on a desktop that never suspends/resumes.
It only started after upgrading to 12.0.

--
John Baldwin

Received on Wed May 27 2020 - 19:38:26 UTC
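
For readers following the register values quoted above, here is a minimal sketch of how a bridge's I/O base/limit bytes turn into a port range. It assumes the standard PCI-to-PCI bridge register layout and that I/O decoding is enabled in the command register. The helper names are hypothetical and are not FreeBSD's own macros or functions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not FreeBSD's own macros). They assume the standard
 * PCI-to-PCI bridge layout: bits 7:4 of the 8-bit I/O base/limit registers
 * hold address bits 15:12, and a low nibble of 0x1 means the "upper 16"
 * registers supply address bits 31:16.
 */
static uint32_t
io_window_addr(uint8_t low, uint16_t high)
{
	uint32_t addr = (uint32_t)(low & 0xf0) << 8;

	if ((low & 0x0f) == 0x01)
		addr |= (uint32_t)high << 16;
	return (addr);
}

/* First port claimed: the low 12 bits of the base are always zero. */
static uint32_t
io_window_base(uint8_t low, uint16_t high)
{
	return (io_window_addr(low, high));
}

/* Last port claimed: the low 12 bits of the limit are always ones. */
static uint32_t
io_window_limit(uint8_t low, uint16_t high)
{
	return (io_window_addr(low, high) | 0xfff);
}

/* A window with base > limit is closed and forwards nothing. */
static bool
io_window_claims(uint8_t base_low, uint16_t base_high,
    uint8_t limit_low, uint16_t limit_high, uint32_t port)
{
	uint32_t base = io_window_base(base_low, base_high);
	uint32_t limit = io_window_limit(limit_low, limit_high);

	return (base <= limit && port >= base && port <= limit);
}
```

Fed the resume-time pcib2 values (base low 0x1, limit low 0x1, both high words 0x0), these helpers give a window of 0x0 through 0xfff, which would include a port in the 0x8xx range such as 0x808 (chosen here only as an example). The resume-time pcib1 values (base low 0xf1, limit low 0x1) give a base above the limit, so that window is closed.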