Re: Strange issue after early AP startup

From: Cy Schubert <Cy.Schubert_at_komquats.com>
Date: Wed, 18 Jan 2017 22:49:59 -0800
In message <1922021.4HJeqFJ74r_at_ralph.baldwin.cx>, John Baldwin writes:
> On Tuesday, January 17, 2017 05:08:58 PM Cy Schubert wrote:
> > In message <1492450.XZfNz8zFfg_at_ralph.baldwin.cx>, John Baldwin writes:
> > > On Tuesday, January 17, 2017 12:53:19 PM Cy Schubert wrote:
> > > > In message <b9c53237-4b1a-a140-f692-bf5837060b18_at_selasky.org>, Hans Petter Selasky writes:
> > > > > Hi,
> > > > > 
> > > > > When booting I observe an additional 30-second delay after this print:
> > > > > 
> > > > > > Timecounters tick every 1.000 msec
> > > > > 
> > > > > ~30 second delay and boot continues like normal.
> > > > > 
> > > > > Checking "vmstat -i" reveals that some timers have been running loose.
> > > > > 
> > > > > > cpu0:timer                         44300        442
> > > > > > cpu1:timer                         40561        404
> > > > > > cpu3:timer                      48462822     483058
> > > > > > cpu2:timer                      48477898     483209
> > > > > 
> > > > > Trying to add delays and/or prints around the Timecounters printout 
> > > > > makes the issue go away. Any ideas for debugging?
> > > > > 
> > > > > Looks like a startup race to me.
> > > > 
> > > > just picking a random email to reply to, I'm seeing a different issue
> > > > with early AP startup. It affects one of my four machines, my laptop.
> > > > My three server systems downstairs have no problem however my laptop
> > > > will reboot repeatedly at:
> > > > 
> > > > Jan 17 11:55:16 slippy kernel: cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
> > > 
> > > So it panics and reboots after this?
> > 
> > Yes, it goes into a panic/reboot loop for a few iterations until it
> > successfully boots. Disabling early AP startup allows it to boot up
> > without the assumed race.
> 
> Can you add DDB to the kernel config (and remove DDB_UNATTENDED) to get it
> to break into DDB when it panics to get the panic message (and a stack trace
> as well)?

I found and fixed the problem. It was in code I added to the bge driver a
long time ago, but have not yet committed, to implement WOL (Wake-on-LAN).
The problem was a lock assertion in that code.


-- 
Cheers,
Cy Schubert <Cy.Schubert_at_cschubert.com>
FreeBSD UNIX:  <cy_at_FreeBSD.org>   Web:  http://www.FreeBSD.org

	The need of the many outweighs the greed of the few.
Received on Thu Jan 19 2017 - 05:50:11 UTC
