On Wed, Jun 22, 2011 at 08:13:25 +0400, Andrey Chernov wrote:
> On Tue, Jun 21, 2011 at 09:54:04PM -0600, Kenneth D. Merry wrote:
> > These two are interesting:
> >
> > > http://img825.imageshack.us/img825/1249/21062011014m.jpg
> > > http://img839.imageshack.us/img839/3791/21062011015.jpg
> >
> > It looks like the GEOM event thread is stuck inside the cd(4) driver. The
> > cd(4) driver is trying to acquire the peripheral lock, and is sleeping
> > until it gets it.
> >
> > What isn't clear is who is holding it. The ps output shows an idle thread
> > running on CPU 1, and thread 100014 (taskq) running on CPU 0.
> > Unfortunately I don't see a stack trace for that. (I might have missed
> > it.)
> >
> > Do you happen to have the image with the stack trace for that thread?
>
> I don't have the image because no disks are mounted at that stage and the
> swap slice is not attached. But I can issue more specific DDB commands to
> narrow it down, just say what you need in detail.
>
> BTW, the machine have 2 DVD both are attached to Marvell IDE plain ATA
> interface, they always works before.
>
> Are you sure that something holding the lock? 'show lock' shows absolutely
> nothing, it is empty.

Well, after looking at the code a little more, it looks like the "lock"
that is being held is the periph lock, which is really just a flag in
periph->flags and not a mutex. So 'show lock' wouldn't show anything
relevant.

Here's cam_periph_hold():

int
cam_periph_hold(struct cam_periph *periph, int priority)
{
	int error;

	/*
	 * Increment the reference count on the peripheral
	 * while we wait for our lock attempt to succeed
	 * to ensure the peripheral doesn't disappear out
	 * from under us while we sleep.
	 */
	if (cam_periph_acquire(periph) != CAM_REQ_CMP)
		return (ENXIO);

	mtx_assert(periph->sim->mtx, MA_OWNED);
	while ((periph->flags & CAM_PERIPH_LOCKED) != 0) {
		periph->flags |= CAM_PERIPH_LOCK_WANTED;
		if ((error = mtx_sleep(periph, periph->sim->mtx, priority,
		    "caplck", 0)) != 0) {
			cam_periph_release_locked(periph);
			return (error);
		}
	}

	periph->flags |= CAM_PERIPH_LOCKED;
	return (0);
}

The GEOM event thread is stuck sleeping in the mtx_sleep() call above.

So that tells me that one of several things is going on:

 - There is a path in the cd(4) driver where it can call cam_periph_hold()
   but not cam_periph_unhold().

 - There is another thread in the system that has called cam_periph_hold(),
   and has gotten stuck before it can call cam_periph_unhold().

 - The hold/unhold logic is broken, and there is a case where a thread
   waiting for the lock can miss the wakeup. After looking at the code, I
   don't think this is the case, but I may have missed something.

So it is probably one of the first two cases.

From the dmesg, I only see cd1 listed, not cd0. So it is possible that cd0
is stuck in the probe code somewhere, and the geom code just gets stuck
trying to open it when the probe hasn't completed.

Seeing the stack trace for the taskq thread that is running on CPU 0
(thread 100014) might be enlightening, but it's hard to say. It may or may
not show the issue.

It's possible that this issue is directly related to the commit in
question; perhaps there is an error being returned that wasn't returned
before and it isn't being handled right in the cd(4) driver. (The cd(4)
driver wasn't touched in the commit.)

It's also possible that the commit in question just changed the timing and
your system is hitting a race that was there previously.

Ken
-- 
Kenneth Merry
ken_at_FreeBSD.ORG

Received on Wed Jun 22 2011 - 18:09:20 UTC
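
Illustrative note on the flag-style "lock" discussed in the message above:
the following is only a userspace sketch of the hold/unhold pattern, not
the kernel code. It assumes a pthread mutex and condition variable as
stand-ins for periph->sim->mtx and the mtx_sleep()/wakeup() sleep channel.
The names demo_periph, demo_init, demo_hold, demo_unhold, DEMO_LOCKED and
DEMO_LOCK_WANTED are hypothetical, and the release half is an assumption
about what a matching unhold has to do; it is not quoted from the CAM
source. It shows why a holder that never calls unhold leaves every later
caller asleep in the while loop, with no mutex owner for a lock inspector
to report.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical flag bits, modeled loosely on CAM_PERIPH_LOCKED and
 * CAM_PERIPH_LOCK_WANTED from the message above. */
#define DEMO_LOCKED      0x01u
#define DEMO_LOCK_WANTED 0x02u

struct demo_periph {
	pthread_mutex_t mtx;   /* stand-in for periph->sim->mtx */
	pthread_cond_t  cv;    /* stand-in for the sleep channel */
	unsigned int    flags; /* the "lock" is only a bit in here */
};

void
demo_init(struct demo_periph *p)
{
	pthread_mutex_init(&p->mtx, NULL);
	pthread_cond_init(&p->cv, NULL);
	p->flags = 0;
}

/* Caller must own p->mtx. Sleeps while another holder has the flag set,
 * then claims the flag. The mutex is dropped while sleeping, so nothing
 * "owns" a lock in the mutex sense during the wait. */
void
demo_hold(struct demo_periph *p)
{
	while ((p->flags & DEMO_LOCKED) != 0) {
		p->flags |= DEMO_LOCK_WANTED;
		pthread_cond_wait(&p->cv, &p->mtx);
	}
	p->flags |= DEMO_LOCKED;
}

/* Caller must own p->mtx. Clears the flag and wakes any waiters. If a
 * code path skips this call, every later demo_hold() sleeps forever. */
void
demo_unhold(struct demo_periph *p)
{
	assert((p->flags & DEMO_LOCKED) != 0);
	p->flags &= ~DEMO_LOCKED;
	if ((p->flags & DEMO_LOCK_WANTED) != 0) {
		p->flags &= ~DEMO_LOCK_WANTED;
		pthread_cond_broadcast(&p->cv);
	}
}
```

In this sketch a caller takes p->mtx, calls demo_hold(), may drop and
retake the mutex while it works, and must eventually call demo_unhold()
with the mutex owned again. The first two cases listed in the message
correspond to that final call never happening.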