Re: bge problems when resuming

From: Gonzalo Nemmi <gnemmi_at_gmail.com>
Date: Wed, 22 Jul 2009 16:40:16 -0300
On Wednesday 22 July 2009 8:13:16 am Paul B. Mahol wrote:
> On 7/22/09, Gonzalo Nemmi <gnemmi_at_gmail.com> wrote:
> > On Tuesday 21 July 2009 5:46:10 am Paul B. Mahol wrote:
> >> On 7/20/09, Gonzalo Nemmi <gnemmi_at_gmail.com> wrote:
> >> > On Sunday 19 July 2009 7:53:52 pm Paul B. Mahol wrote:
> >> >> On 7/20/09, Gonzalo Nemmi <gnemmi_at_gmail.com> wrote:
> >> >> > On Sat, Jul 18, 2009 at 12:09 AM, Paul B. Mahol <onemda_at_gmail.com> wrote:
> >> >> >> On 7/17/09, Gonzalo Nemmi <gnemmi_at_gmail.com> wrote:
> >> >> >> > On Wednesday 15 July 2009 8:13:47 am Adam K Kirchhoff wrote:
> >> >> >> >> On Wednesday 15 July 2009 03:20:45 Paul B. Mahol wrote:
> >> >> >> >> > On 7/15/09, Adam K Kirchhoff <adamk_at_voicenet.com> wrote:
> >> >> >> >> > > Hello all,
> >> >> >> >> > >
> >> >> >> >> > > I have a Dell Latitude D610 laptop with 8.0-BETA1
> >> >> >> >> > > installed.  I hadn't tried suspend/resume for a while
> >> >> >> >> > > and decided to give it a shot.  I was pleasantly
> >> >> >> >> > > surprised to see that I could suspend to ram, resume,
> >> >> >> >> > > and have a (relatively) working system (previously
> >> >> >> >> > > the display would never come back up and the serial
> >> >> >> >> > > console I had hooked up remained dead). Great job to
> >> >> >> >> > > everyone who helped make that possible.
> >> >> >> >> > >
> >> >> >> >> > > The only real issue that I seem to have now is that
> >> >> >> >> > > bge is completely unusable after resume.  Another
> >> >> >> >> > > individual seems to have reported similar problems
> >> >> >> >> > > with bge and resume, but he also had other issues
> >> >> >> >> > > that apparently trumped his networking issues:
> >> >> >> >> > >
> >> >> >> >> > > http://lists.freebsd.org/pipermail/freebsd-current/2009-July/009023.html
> >> >> >> >> > >
> >> >> >> >> > > Like him, resuming from suspend gives me:
> >> >> >> >> > >
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY write timed
> >> >> >> >> > > out (phy 1, reg 0, val 32768)
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY read timed
> >> >> >> >> > > out (phy 1, reg 0, val 0xffffffff)
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY write timed
> >> >> >> >> > > out (phy 1, reg 24, val 3072)
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY write timed
> >> >> >> >> > > out (phy 1, reg 23, val 10)
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY write timed
> >> >> >> >> > > out (phy 1, reg 21, val 12555)
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY write timed
> >> >> >> >> > > out (phy 1, reg 23, val 8223)
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY write timed
> >> >> >> >> > > out (phy 1, reg 21, val 38150)
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY write timed
> >> >> >> >> > > out (phy 1, reg 23, val 16415)
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY write timed
> >> >> >> >> > > out (phy 1, reg 21, val 5346)
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY write timed
> >> >> >> >> > > out (phy 1, reg 24, val 1024)
> >> >> >> >> > > Jul 14 12:35:53 scroll kernel: bge0: PHY write timed
> >> >> >> >> > > out (phy 1, reg 24, val 7)
> >> >> >> >> > >
> >> >> >> >> > > And so on and so forth.
> >> >> >> >> > >
> >> >> >> >> > > I thought that compiling if_bge as a module,
> >> >> >> >> > > unloading it before suspend, and reloading it after
> >> >> >> >> > > resume, might get this working. However, doing a
> >> >> >> >> > > "kldload if_bge" after the resume does nothing. Well,
> >> >> >> >> > > the module gets loaded, but the device doesn't show
> >> >> >> >> > > up.  No errors from kldload, and there is nothing new
> >> >> >> >> > > in dmesg.
> >> >> >> >> > >
> >> >> >> >> > > Before the suspend, the device shows up as:
> >> >> >> >> > >
> >> >> >> >> > > bge0_at_pci0:2:0:0:        class=0x020000 card=0x01821028 chip=0x167714e4 rev=0x01 hdr=0x00
> >> >> >> >> > >     vendor     = 'Broadcom Corporation'
> >> >> >> >> > >     device     = 'NetXtreme Gigabit Ethernet PCI Express (BCM5750A1)'
> >> >> >> >> > >     class      = network
> >> >> >> >> > >     subclass   = ethernet
> >> >> >> >> > >
> >> >> >> >> > > After resuming,  and reloading the module, it's:
> >> >> >> >> > >
> >> >> >> >> > > none1_at_pci0:2:0:0:       class=0x020000 card=0x01821028 chip=0x167714e4 rev=0x01 hdr=0x00
> >> >> >> >> > >     vendor     = 'Broadcom Corporation'
> >> >> >> >> > >     device     = 'NetXtreme Gigabit Ethernet PCI Express (BCM5750A1)'
> >> >> >> >> > >     class      = network
> >> >> >> >> > >     subclass   = ethernet
> >> >> >> >> > >
> >> >> >> >> > > If there are no ideas, I'll go ahead and open up a
> >> >> >> >> > > pr. I assume this is just one bug, since both
> >> >> >> >> > > problems (the PHY issues and the inability to reload
> >> >> >> >> > > the driver) are both related to the network device.
> >> >> >> >> >
> >> >> >> >> > Put these lines into loader.conf and reboot:
> >> >> >> >> >
> >> >> >> >> > hw.pci.do_power_nodriver="3"
> >> >> >> >> > hw.pci.do_power_resume="1"
> >> >> >> >> >
> >> >> >> >> > Now, before suspending, unload if_bge and one other
> >> >> >> >> > driver (sound drivers are the best candidates), load
> >> >> >> >> > the sound driver again, then suspend and resume.
> >> >> >> >> > After that, loading if_bge should attach successfully.
> >> >> >> >>
> >> >> >> >> Unfortunately, after doing this, reloading the if_bge
> >> >> >> >> driver causes the laptop to completely lock up...  It
> >> >> >> >> gets as far as:
> >> >> >> >>
> >> >> >> >> bge0: <Broadcom NetXtreme Gigabit Ethernet Controller,
> >> >> >> >> unknown ASIC rev. 0xffff>
> >> >> >> >> mem 0xdfdf0000-0xdfdfffff irq 16 at device 0.0 on pci2
> >> >> >> >>
> >> >> >> >> And then the entire machine hangs.  I'm on ttyv0, so I'd
> >> >> >> >> see any kernel panic, but nothing like that happens.  The
> >> >> >> >> screen stays on, but nothing else happens till I force a
> >> >> >> >> reboot.
> >> >> >> >>
> >> >> >> >> Adam
> >> >> >> >
> >> >> >> > Hi Adam, Paul ...
> >> >> >> > I'm the "another individual" from your OP.
> >> >> >> > I have the same problems you have regarding bge, but they
> >> >> >> > weren't trumped .. I just had an order of priorities ;)
> >> >> >> >
> >> >> >> > Anyways, I tried the solution Paul posted and, just as in
> >> >> >> > your case, I got a hard lock too ...
> >> >> >> >
> >> >> >> > I tried loading if_bge through /boot/loader.conf
> >> >> >> > Then issued a:
> >> >> >> >
> >> >> >> > kldunload if_bge coretemp
> >> >> >>
> >> >> >> coretemp is the wrong module; it must be one of the modules
> >> >> >> that attach to pci.
> >> >> >
> >> >> > Sorry Paul!
> >> >> > I gave it a go with snd_hda and I got the same result except
> >> >> > that this time I also got the following message:
> >> >>
> >> >> After unloading snd_hda you loaded it again before suspending?
> >> >
> >> > Doing so yielded a Fatal trap 12 on BETA2. Yesterday I installed
> >> > BETA2 and here are the results:
> >> >
> >> >
> >> > kldstat
> >> >
> >> > Id Refs Address    Size     Name
> >> >  1   28 0xc0400000 cf6c70   kernel
> >> >  2    1 0xc10f7000 11bc0    if_bge.ko
> >> >  3    1 0xc1109000 1ac4c    snd_hda.ko
> >> >  4    2 0xc1124000 61f78    sound.ko
> >> >  5    1 0xc1186000 2af4     coretemp.ko
> >> >  6    1 0xc1189000 a6d8     i915.ko
> >> >  7    2 0xc1194000 177d4    drm.ko
> >> >
> >> >
> >> > kldunload if_bge snd_hda
> >> >
> >> > Jul 20 17:50:49 gargoyle login: ROOT LOGIN (root) ON ttyv0
> >> > Jul 20 17:51:06 gargoyle kernel: brgphy0: detached
> >> > Jul 20 17:51:06 gargoyle kernel: lock order reversal:
> >> > Jul 20 17:51:06 gargoyle kernel: 1st 0xc0dba45c kernel linker
> >> > (kernel linker) _at_ /usr/src/sys/kern/kern_linker.c:1079
> >> > Jul 20 17:51:06 gargoyle kernel: 2nd 0xc0dbbc64 sysctl lock
> >> > (sysctl lock) _at_ /usr/src/sys/kern/kern_sysctl.c:257
> >> > Jul 20 17:51:06 gargoyle kernel: KDB: stack backtrace:
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > db_trace_self_wrapper(c0c6baf4,e6daba34,c08bc995,c08ad6db,c0c6e989,...)
> >> > at db_trace_self_wrapper+0x26
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > kdb_backtrace(c08ad6db,c0c6e989,c452bc88,c4529e10,e6daba90,...)
> >> > at kdb_backtrace+0x29
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > _witness_debugger(c0c6e989,c0dbbc64,c0c69667,c4529e10,c0c6956e,...)
> >> > at _witness_debugger+0x25
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > witness_checkorder(c0dbbc64,9,c0c6956e,101,0,...) at
> >> > witness_checkorder+0x839
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > _sx_xlock(c0dbbc64,0,c0c6956e,101,c4722c00,...) at
> >> > _sx_xlock+0x85
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > sysctl_ctx_free(c4722c4c,c4722c00,e6dabb18,c08a3c85,c4722c00,...)
> >> > at sysctl_ctx_free+0x30
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > device_sysctl_fini(c4722c00,0,c0d4c848,c472a810,c4ab3400,...) at
> >> > device_sysctl_fini+0x1a
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > device_detach(c4722c00,c4722b80,e6dabb38,c06bc622,c4722b80,...)
> >> > at device_detach+0x1f5
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > bus_generic_detach(c4722b80,c4722b80,e6dabb64,c08a3b1c,c4722b80,...)
> >> > at bus_generic_detach+0x29
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > miibus_detach(c4722b80,c45d6060,c0d4ca68,a3c,c0c76f47,...) at
> >> > miibus_detach+0x12
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > device_detach(c4722b80,c472b008,e6dabb98,c10ff7ff,c4722300,...)
> >> > at device_detach+0x8c
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > bus_generic_detach(c4722300,1,c1104b66,aec,c4722300,...) at
> >> > bus_generic_detach+0x29
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > bge_detach(c4722300,c4677060,c0d4ca68,a3c,c4526300,...) at
> >> > bge_detach+0xbf
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > device_detach(c4722300,c086c843,c0dbb570,c1106c20,c456fb80,...)
> >> > at device_detach+0x8c
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > driver_module_handler(c4526300,1,c1106c20,109,0,...) at
> >> > driver_module_handler+0x29c
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > module_unload(c4526300,c0c652ef,273,270,c08604b6,...) at
> >> > module_unload+0x43
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > linker_file_unload(c4544200,0,c0c652ef,437,c10f7000,...) at
> >> > linker_file_unload+0x15e
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > kern_kldunload(c4b346c0,2,0,e6dabd2c,c0ba8dd3,...) at
> >> > kern_kldunload+0xd5
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > kldunloadf(c4b346c0,e6dabcf8,8,c0c6fa4b,c0d50450,...) at
> >> > kldunloadf+0x2b
> >> > Jul 20 17:51:06 gargoyle kernel: syscall(e6dabd38) at
> >> > syscall+0x2a3
> >> > Jul 20 17:51:06 gargoyle kernel:
> >> > Xint0x80_syscall() at
> >> > Xint0x80_syscall+0x20
> >> > Jul 20 17:51:06 gargoyle kernel: --- syscall (444, FreeBSD
> >> > ELF32, kldunloadf), eip = 0x280d516b, esp = 0xbfbfe47c, ebp =
> >> > 0xbfbfecc8 ---
> >> > Jul 20 17:51:06 gargoyle kernel: miibus0: detached
> >> > Jul 20 17:51:06 gargoyle kernel: bge0: detached
> >> > Jul 20 17:51:06 gargoyle kernel: sysctl_unregister_oid: failed
> >> > to unregister sysctl
> >>
> >> The if_bge driver looks very problematic to me. It probably cannot
> >> detach at all.
> >>
> >> > Jul 20 17:51:06 gargoyle kernel: pcm0: detached
> >> > Jul 20 17:51:06 gargoyle kernel: hdac0: detached
> >> >
> >> >
> >> > kld snd_hda
> >>
> >>   ^^^
> >> You mean kldload.
> >>
> >> > Jul 20 17:52:16 gargoyle kernel: hdac0: <Intel 82801H High
> >> > Definition Audio Controller> mem 0xf6dfc000-0xf6dfffff irq 21 at
> >> > device 27.0 on pci0
> >> > Jul 20 17:52:16 gargoyle kernel: hdac0: HDA Driver Revision:
> >> > 20090624_0136
> >> > Jul 20 17:52:16 gargoyle kernel: hdac0: [ITHREAD]
> >> > Jul 20 17:52:16 gargoyle kernel: hdac0: HDA Codec #0: Sigmatel
> >> > STAC9228X
> >> > Jul 20 17:52:16 gargoyle kernel: bge0: <Broadcom BCM5906 A2,
> >> > ASIC rev. 0xc002> mem 0xf69f0000-0xf69fffff irq 17 at device
> >> > 0.0 on pci9
> >> > Jul 20 17:52:16 gargoyle kernel: miibus0: <MII bus> on bge0
> >> > Jul 20 17:52:16 gargoyle kernel: brgphy0: <BCM5906 10/100baseTX
> >> > PHY> PHY 1 on miibus0
> >> > Jul 20 17:52:16 gargoyle kernel: brgphy0:  10baseT, 10baseT-FDX,
> >> > 100baseTX, 100baseTX-FDX, auto
> >> > Jul 20 17:52:16 gargoyle kernel: bge0: Ethernet address:
> >> > 00:23:ae:04:ba:ca
> >> > Jul 20 17:52:16 gargoyle kernel: bge0: [ITHREAD]
> >> > Jul 20 17:52:16 gargoyle kernel: pcm0: <HDA Sigmatel STAC9228X
> >> > PCM #0 Analog> at cad 0 nid 1 on hdac0
> >> > Jul 20 17:52:16 gargoyle kernel: bge0: link state changed to DOWN
> >> > Jul 20 17:52:18 gargoyle kernel: bge0: link state changed to UP
> >>
> >> Why did bge0 appear again?
> >>
> >> > acpiconf -s 3
> >>
> >> After this command bge0 should not appear at all, because the
> >> driver should not be attached to the device.
> >>
> >> > Jul 20 17:53:51 gargoyle acpi: suspend at 20090720 17:53:51
> >> > Jul 20 17:53:56 gargoyle kernel: fwohci0: fwohci_pci_suspend
> >> > Jul 20 17:54:25 gargoyle kernel: bge0: PHY write timed out (phy
> >> > 1, reg 0, val 32768)
> >> > Jul 20 17:54:25 gargoyle kernel: bge0: PHY read timed out (phy
> >> > 1, reg 0, val 0xffffffff)
> >> > Jul 20 17:54:25 gargoyle kernel: bge0: PHY read timed out (phy
> >> > 1, reg 24, val 0xffffffff)
> >> > Jul 20 17:54:25 gargoyle kernel: bge0: PHY read timed out (phy
> >> > 1, reg 16, val 0xffffffff)
> >> > Jul 20 17:54:25 gargoyle kernel: bge0: PHY write timed out (phy
> >> > 1, reg 16, val 0)
> >> > Jul 20 17:54:25 gargoyle kernel: bge0: PHY read timed out (phy
> >> > 1, reg 16, val 0xffffffff)
> >> > Jul 20 17:54:25 gargoyle kernel: bge0: PHY write timed out (phy
> >> > 1, reg 16, val 0)
> >> > Jul 20 17:54:25 gargoyle kernel: bge0: PHY write timed out (phy
> >> > 1, reg 23, val 18)
> >> > Jul 20 17:54:25 gargoyle kernel: bge0: flow-through queue init failed
> >> > Jul 20 17:54:25 gargoyle kernel: bge0: initialization failure
> >> > Jul 20 17:54:25 gargoyle kernel: fwohci0: Phy 1394a available S400, 1 ports.
> >> > Jul 20 17:54:25 gargoyle kernel: fwohci0: Link S400, max_rec 2048 bytes.
> >> > Jul 20 17:54:25 gargoyle kernel: fwohci0: Initiate bus reset
> >> > Jul 20 17:54:25 gargoyle kernel: fwohci0: fwohci_intr_core: BUS reset
> >> > Jul 20 17:54:25 gargoyle kernel: fwohci0: fwohci_intr_core: node_id=0x00000000, SelfID Count=1, CYCLEMASTER mode
> >> > Jul 20 17:54:25 gargoyle kernel: firewire0: 1 nodes, maxhop <= 0
> >> > cable IRM irm(0)  (me)
> >> > Jul 20 17:54:25 gargoyle kernel: firewire0: bus manager 0
> >> > Jul 20 17:54:25 gargoyle kernel: fwohci0: unrecoverable error
> >> > Jul 20 17:54:25 gargoyle kernel: wakeup from sleeping state
> >> > (slept 00:00:29)
> >> > Jul 20 17:54:25 gargoyle acpi: resumed at 20090720 17:54:25
> >> >
> >> > Should a PR on fwohci and firewire also be filed??
> >>
> >> Try a custom kernel with as few drivers as possible
> >> (use modules instead).
> >> From your mail I don't see where the problem with firewire is.
> >
> > Done.
> >
> > Commented if_bge out of GENERIC, recompiled, loaded if_bge via
> > loader.conf, kldunloaded if_bge snd_hda, kldloaded snd_hda (if_bge
> > did not show up in dmesg this time), went to sleep (acpiconf -s 3),
> > resumed, no bge timeouts (only fwohci and firewire messages), then
> > kldloaded if_bge and got a solid freeze :(
>
> Does kldload of if_bge work after boot? (Remove if_bge_load="YES"
> from /boot/loader.conf and load it after boot.)
> Does kldload, kldunload, and kldload again of if_bge work (without
> suspending the machine this time)?

Yes it does ... a few LORs but it does load and unload correctly.
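
For reference, the no-suspend cycle I ran can be sketched roughly as
below. This is only a sketch: the run() helper, the bge_cycle() function
and the DRY_RUN variable are names I made up for illustration (they are
not part of any FreeBSD tool). With DRY_RUN=1, which is the default
here, the commands are only printed and nothing is loaded or unloaded.

```shell
#!/bin/sh
# Sketch of the load/unload/load cycle for if_bge, without suspending.
# DRY_RUN, run() and bge_cycle() are hypothetical helpers for illustration.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Load, unload and load the module again; stop at the first failure.
bge_cycle() {
    run kldload if_bge   || return 1
    run kldunload if_bge || return 1
    run kldload if_bge   || return 1
}
```

Calling bge_cycle with DRY_RUN=0 as root would run the three commands
for real; any LORs still end up in dmesg, not in the script output.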

You'll find the messages in here: http://pastebin.com/f69916d2

If there's anything else you'd like me to test, just tell me :)
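
So that we are testing the same thing, here is a rough sketch of the
suspend sequence I have been following, plus a small check of how the
device is listed by pciconf -l (a driver name such as bge0 before the
"@" when attached, noneN when no driver is attached). It assumes the
two hw.pci.do_power_* tunables you suggested are already in
/boot/loader.conf. The names run(), pci_attach_state(), suspend_test()
and DRY_RUN are made up for illustration, and with DRY_RUN=1 (the
default) nothing is actually executed.

```shell
#!/bin/sh
# Sketch of the unload / reload / suspend / reload sequence from this thread.
# Assumes hw.pci.do_power_nodriver="3" and hw.pci.do_power_resume="1"
# are set in /boot/loader.conf. All helper names here are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Classify the first field of a `pciconf -l` line, e.g. "bge0@pci0:2:0:0:".
pci_attach_state() {
    case "$1" in
        none[0-9]*@pci*) echo "unattached" ;;
        *@pci*)          echo "attached" ;;
        *)               echo "unknown" ;;
    esac
}

# Unload both drivers, reload only the sound driver, suspend to S3,
# and try to load if_bge again once the machine has resumed.
suspend_test() {
    run kldunload if_bge  || return 1
    run kldunload snd_hda || return 1
    run kldload snd_hda   || return 1
    run acpiconf -s 3     || return 1
    run kldload if_bge    || return 1
}
```

The last step is where I get the solid freeze, so in practice the
script never gets to report anything after it.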

Thanks once again for your concern Paul :D

Best regards
-- 
Blessings
Gonzalo Nemmi
Received on Wed Jul 22 2009 - 17:40:22 UTC
