on 22/06/2011 23:09 Kenneth D. Merry said the following:
> The GEOM event thread is stuck sleeping in the mtx_sleep() call above. So
> that tells me that one of several things is going on:
>
> - There is a path in the cd(4) driver where it can call cam_periph_hold()
>   but not cam_periph_unhold().
>
> - There is another thread in the system that has called cam_periph_hold(),
>   and has gotten stuck before it can call cam_periph_unhold().
>
> - The hold/unhold logic is broken, and there is a case where a thread
>   waiting for the lock can miss the wakeup. After looking at the code, I
>   don't think this is the case, but I may have missed something.
>
> So it is probably one of the first two cases. From the dmesg, I only see
> cd1 listed, not cd0. So it is possible that cd0 is stuck in the probe code
> somewhere, and the geom code just gets stuck trying to open it when the
> probe hasn't completed.
>
> Seeing the stack trace for the taskq thread that is running on CPU 0
> (process 100014) might be enlightening, it's hard to say. That may or may
> not show the issue.
>
> It's possible that this issue is directly related to the commit in
> question; perhaps there is an error being returned that wasn't returned
> before and it isn't being handled right in the cd(4) driver. (The cd(4)
> driver wasn't touched in the commit.)
>
> It's also possible that the commit in question just changed the timing and
> your system is hitting a race that was there previously.

I have a suspicion that this is actually the case.
More than once I've seen under qemu that the kernel boot non-deterministically
gets stuck in the cd driver. Other people have also bumped into this.
E.g., here's one of the reports that I googled up, it's not exactly the same as
what ache has reported, but somewhat similar:
http://lists.freebsd.org/pipermail/freebsd-current/2010-October/020336.html

-- 
Andriy Gapon

Received on Thu Jun 23 2011 - 10:51:40 UTC
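
The first two cases in the quoted list (a hold that is never followed by an unhold) can be illustrated with a small userland model. This is only a sketch in Python, not the CAM code: PeriphModel, hold(), unhold() and the two demo functions are hypothetical names, and a threading.Condition stands in for the mutex plus sleep channel that mtx_sleep() and wakeup() work with. The timeout is there only so that the demo terminates; a waiter with no timeout would simply stay asleep, as the GEOM event thread appears to.

```python
import threading


class PeriphModel:
    """Toy stand-in for a peripheral guarded by a hold flag (hypothetical)."""

    def __init__(self):
        # The condition variable plays the role of mutex + sleep channel.
        self._cond = threading.Condition()
        self._held = False

    def hold(self, timeout=None):
        """Sleep until the flag is free, then take it.

        Returns True if the hold was obtained, False if the wait timed out.
        """
        with self._cond:
            got_it = self._cond.wait_for(lambda: not self._held, timeout=timeout)
            if not got_it:
                return False
            self._held = True
            return True

    def unhold(self):
        """Drop the flag and wake every sleeper so it can re-check the flag."""
        with self._cond:
            self._held = False
            self._cond.notify_all()


def demo_stuck_opener(timeout=0.2):
    """A 'probe' path takes the hold and never releases it; the opener starves."""
    periph = PeriphModel()
    assert periph.hold()          # probe takes the hold ...
    result = {}                   # ... and never calls unhold()

    def opener():
        result["opened"] = periph.hold(timeout=timeout)

    t = threading.Thread(target=opener)
    t.start()
    t.join()
    return result["opened"]


def demo_balanced(timeout=5.0):
    """Same sequence, but the probe path releases the hold; the opener proceeds."""
    periph = PeriphModel()
    assert periph.hold()
    result = {}

    def opener():
        result["opened"] = periph.hold(timeout=timeout)

    t = threading.Thread(target=opener)
    t.start()
    periph.unhold()               # the matching release
    t.join()
    return result["opened"]


if __name__ == "__main__":
    print("opener with leaked hold:  ", demo_stuck_opener())
    print("opener with balanced hold:", demo_balanced())
```

In this model the third case (a missed wakeup) cannot occur, because the waiter re-checks the flag under the same lock that the releaser takes before notifying. Whether the kernel code has the same property is exactly what the quoted message says was checked by reading the code.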