Re: Revision 309657 to stack_machdep.c renders unbootable system

From: Mark Johnston <markj_at_freebsd.org>
Date: Wed, 14 Dec 2016 14:10:48 -0800

On Wed, Dec 14, 2016 at 12:14:16PM -0800, Mark Johnston wrote:
> On Wed, Dec 14, 2016 at 11:49:26AM -0800, Steven G. Kargl wrote:
> > Well, after 3 days of bisection, I finally found the commit
> > that renders my system unbootable.  The system does not panic.
> > It simply gets stuck in some state.  Nonfunctional keyboard,
> > so can't break into debugger.  No serial console available.
> > The verbose dmesg.boot for a working kernel from revision
> > 309656 is at
> > 
> > http://troutmask.apl.washington.edu/~kargl/freebsd/dmesg.309656.txt
> > 
> > The kernel config file is at
> > 
> > http://troutmask.apl.washington.edu/~kargl/freebsd/SPEW.txt
> > 
> > In looking at /usr/src/UPDATING, there is no warning that one
> > can create a boat anchor by upgrading to 309657.  If compiling
> > a kernel with 'options DDB' is no longer supported, this should
> > be stated in UPDATING.  Or, UPDATING should state that 'options
> > DDB' requires 'options STACK'.  Or, 'options DDB' should simply
> > do the right thing and pull in whatever 'option STACK' does.
> 
> It is supported though - the point of that change was to fix a problem
> that occurred when DDB is configured but STACK isn't. While testing I
> tried every combination of the two options, and I just tried and
> successfully booted a kernel with DDB and !STACK.
> 
> Does the kernel boot successfully if STACK is added to your
> configuration?

I tried your config (plus virtio drivers) and was able to reproduce the
hang in bhyve. Adding STACK "fixed" the hang, as did reverting part of
my change to re-add dead code into the kernel. My VM was always hanging
after printing

000.000050 [ 426] vtnet_netmap_attach       virtio attached txq=1, txd=1024 rxq=1, rxd=1024

Sure enough, removing "device netmap" from your config also fixes the
hang. When the hang occurs, I can see with "bhyvectl --get-rip" that
we're stuck in DELAY(), but I can't get a stack at that point. I think
my change is an innocent bystander - it just happened to expose a latent
issue elsewhere.
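
The two workarounds described above can be sketched as kernel config
fragments. This is only an illustration, assuming the rest of the SPEW
config is left unchanged; the comments are annotations, not lines from
the original file. Either change on its own avoided the hang in the bhyve
test, and neither addresses the underlying issue.

```
# Workaround 1: keep DDB and add STACK alongside it.
options 	DDB
options 	STACK

# Workaround 2: keep DDB without STACK, but drop netmap.
options 	DDB
#device 	netmap
```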

I don't have much more time to look at this right now, but I'll look
into it more tonight.
Received on Wed Dec 14 2016 - 21:04:36 UTC
