Re: Exactly that commit (was Re: Latest -current 100% hang at the late boot stage)

From: Kenneth D. Merry <ken_at_freebsd.org>
Date: Tue, 21 Jun 2011 21:54:04 -0600

On Wed, Jun 22, 2011 at 00:49:34 +0400, Andrey Chernov wrote:
> On Tue, Jun 21, 2011 at 10:17:19AM -0600, Kenneth D. Merry wrote:
> > ps
> > alltrace
> > show locks
> > show msgbuf
> > 
> > Hopefully that will give us something to start looking at...
> > 
> > This would really work a lot better if there is any way to get a serial
> > console on the machine.  The above will produce a good bit of output, and
> > would likely need a lot of pictures.
> > 
> > Since we can't reproduce the problem here, some debugging help would be
> > greatly appreciated.
> 
> Sorry, I have no serial console. Here are the photos. I removed the very
> similar-looking USB parts from 'ps' and 'alltrace', and the very general
> parts of 'alltrace' that are always there. I hope the remaining info will
> be enough. USB hotplugging works at this stage, so there is no reason to
> look there. If it is not enough, I'll upload the whole series.

Thanks for uploading all of the photos.  That's a lot of work, but they are
helpful...

I think I see part of the problem, but not the whole problem:

> 'show lock' outputs nothing, which means no locks are held; it just sleeps
> somewhere forever.
> 
> 'ps':
> http://img43.imageshack.us/img43/1424/21062011001j.jpg
> http://img835.imageshack.us/img835/6607/21062011002.jpg
> http://img841.imageshack.us/img841/5401/21062011003.jpg
> 
> 'alltrace':
> http://img864.imageshack.us/img864/6757/21062011004ya.jpg
> http://img542.imageshack.us/img542/4857/21062011005.jpg
> http://img828.imageshack.us/img828/823/21062011006.jpg
> http://img5.imageshack.us/img5/910/21062011007.jpg
> http://img7.imageshack.us/img7/4704/21062011008.jpg
> http://img848.imageshack.us/img848/5487/21062011009.jpg
> http://img641.imageshack.us/img641/2/21062011010.jpg
> http://img7.imageshack.us/img7/7946/21062011011.jpg
> http://img860.imageshack.us/img860/8185/21062011012.jpg
> http://img696.imageshack.us/img696/5276/21062011013.jpg

These two are interesting:

> http://img825.imageshack.us/img825/1249/21062011014m.jpg
> http://img839.imageshack.us/img839/3791/21062011015.jpg

It looks like the GEOM event thread is stuck inside the cd(4) driver.  The
cd(4) driver is trying to acquire the peripheral lock, and is sleeping
until it gets it.

What isn't clear is who is holding it.  The ps output shows an idle thread
running on CPU 1, and thread 100014 (taskq) running on CPU 0.
Unfortunately I don't see a stack trace for thread 100014.  (I might have
missed it.)

Do you happen to have the image with the stack trace for that thread?

> http://img594.imageshack.us/img594/1773/21062011016.jpg
> http://img109.imageshack.us/img109/9937/21062011017.jpg
> http://img51.imageshack.us/img51/6047/21062011018l.jpg
> 
> 'show msgbuf':
> http://img59.imageshack.us/img59/46/21062011019.jpg
> http://img189.imageshack.us/img189/483/21062011020.jpg
> http://img19.imageshack.us/img19/8163/21062011021.jpg
> http://img683.imageshack.us/img683/3171/21062011022.jpg
> http://img819.imageshack.us/img819/5923/21062011023.jpg
> http://img692.imageshack.us/img692/3789/21062011024.jpg
> http://img580.imageshack.us/img580/1550/21062011025.jpg
> http://img560.imageshack.us/img560/7478/21062011026.jpg
> http://img94.imageshack.us/img94/9371/21062011027.jpg
> http://img857.imageshack.us/img857/5185/21062011028.jpg

Thanks,

Ken
-- 
Kenneth Merry
ken_at_FreeBSD.ORG
Received on Wed Jun 22 2011 - 01:54:05 UTC
