Re: Revision 309657 to stack_machdep.c renders unbootable system

From: Mark Johnston <markj_at_freebsd.org>
Date: Wed, 14 Dec 2016 16:50:21 -0800
On Wed, Dec 14, 2016 at 03:48:04PM -0800, Steven G. Kargl wrote:
> On Wed, Dec 14, 2016 at 02:10:48PM -0800, Mark Johnston wrote:
> > On Wed, Dec 14, 2016 at 12:14:16PM -0800, Mark Johnston wrote:
> > > On Wed, Dec 14, 2016 at 11:49:26AM -0800, Steven G. Kargl wrote:
> > > > Well, after 3 days of bisection, I finally found the commit
> > > > that renders my system unbootable.  The system does not panic.
> > > > It simply gets stuck in some state.  Nonfunctional keyboard,
> > > > so can't break into debugger.  No serial console available.
> > > > The verbose dmesg.boot for a working kernel from revision
> > > > 309656 is at
> > > > 
> > > > http://troutmask.apl.washington.edu/~kargl/freebsd/dmesg.309656.txt
> > > > 
> > > > The kernel config file is at
> > > > 
> > > > http://troutmask.apl.washington.edu/~kargl/freebsd/SPEW.txt
> > > > 
> > > > In looking at /usr/src/UPDATING, there is no warning that one
> > > > can create a boat anchor by upgrading to 309657.  If compiling
> > > > a kernel with 'options DDB' is no longer supported, this should
> > > > be stated in UPDATING.  Or, UPDATING should state that 'options
> > > > DDB' requires 'options STACK'.  Or, 'options DDB' should simply
> > > > to the right thing and pull in whatever 'option STACK' does. 
> > > 
> > > It is supported though - the point of that change was to fix a problem
> > > that occurred when DDB is configured but STACK isn't. While testing I
> > > tried every combination of the two options, and I just tried and
> > > successfully booted a kernel with DDB and !STACK.
> > > 
> > > Does the kernel boot successfully if STACK is added to your
> > > configuration?
> > 
> > I tried your config (plus virtio drivers) and was able to reproduce the
> > hang in bhyve. Adding STACK "fixed" the hang, as did reverting part of
> > my change to re-add dead code into the kernel. My VM was always hanging
> > after printing
> > 
> > 000.000050 [ 426] vtnet_netmap_attach       virtio attached txq=1, txd=1024 rxq=1, rxd=1024
> > 
> > Sure enough, removing "device netmap" from your config also fixes the
> > hang. When the hang occurs, I can see with "bhyvectl --get-rip" that
> > we're stuck in DELAY(), but I can't get a stack at that point. I think
> > my change is an innocent bystander - it just happened to expose a latent
> > issue elsewhere.
> > 
> > I don't have much more time to look at this right now, but I'll look
> > into it more tonight.
> 
> Yes, adding STACK got me to a booting kernel.  I can't remember
> why I added netmap to my config file.  Re-adding dead code seems to
> point to some memory corruption issue or a rogue pointer. :(

It's not quite that bad, as it turns out. The key is that
adding/removing the dead code changes the ordering of the items in the
sysinit linker set. I discovered that if the ctl(4) module is
initialized before the vtnet driver attaches, the hang occurs, and
reverting my commit results in a sysinit order where vtnet comes
_before_ ctl(4). So my change triggers the problem just because it
happens to perturb something in the compile-time linker.

> 
> BTW, I think it would be prudent to add something like 
> 
>   20161206:
>      At revision 309657, 'options STACK' was introduced into
>      sys/x86/x86/mstack_machdep.c.  Old kernel configuration files
>      that included 'options DDB' are now required to include also
>      'options STACK'.
> 
> to UPDATING or some such wording.  I was jumping from circ Oct 10th world
> to top of tree, and got caught by ~3000 commits.

The issue actually seems to be in 4BSD, and more specifically in r308564
and r308565. Switching to ULE or reverting either of those two commits
fixes the hang. Here's what happens:

1. ctl_init() runs and creates ctl_thresh_thread. This thread's main
   loop cause pause(9) when it has no work to do. During boot, pause(9)
   just calls DELAY() and does not yield the CPU.
2. thread0 attaches the vtnet driver. As part of this, it creates and
   starts some high-priority taskqueue threads in
   _taskqueue_thread_start(). They're added to the scheduler with:

   thread_lock();
   sched_pri(...);
   sched_add(...);
   thread_unlock();

   4BSD's sched_add() will call maybe_preempt() in this case, which as
   of r308564 will unconditionally set td_owepreempt in the current
   thread.
3. thread_unlock() will release the critical section held by the current
   thread and because td_owepreempt is set, we'll yield the CPU. The
   taskqueue threads have nothing to do, but ctl_thresh_thread runs
   and ends up busy-waiting in pause() forever.

r308565 removes a check in maybe_preempt() that would have stopped
td_owepreempt from being set. Before r308564, maybe_preempt() would have
switched directly to the new thread and apparently always switched back
immediately.

I'm not sure what the correct fix is - jhb might have an idea. I wonder
if pause() should try to yield periodically when called during boot.
Received on Wed Dec 14 2016 - 23:44:12 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:09 UTC