On 6/22/11 4:09 PM, Kenneth D. Merry wrote:
> On Wed, Jun 22, 2011 at 08:13:25 +0400, Andrey Chernov wrote:
> > On Tue, Jun 21, 2011 at 09:54:04PM -0600, Kenneth D. Merry wrote:
> >> These two are interesting:
> >>
> >>> http://img825.imageshack.us/img825/1249/21062011014m.jpg
> >>> http://img839.imageshack.us/img839/3791/21062011015.jpg
> >>
> >> It looks like the GEOM event thread is stuck inside the cd(4) driver. The
> >> cd(4) driver is trying to acquire the peripheral lock, and is sleeping
> >> until it gets it.
> >>
> >> What isn't clear is who is holding it.
...
> The GEOM event thread is stuck sleeping in the mtx_sleep() call above. So
> that tells me that one of several things is going on:
>
> - There is a path in the cd(4) driver where it can call cam_periph_hold()
> but not cam_periph_unhold().
>
> - There is another thread in the system that has called cam_periph_hold(),
> and has gotten stuck before it can call cam_periph_unhold().
>
> - The hold/unhold logic is broken, and there is a case where a thread
> waiting for the lock can miss the wakeup. After looking at the code, I
> don't think this is the case, but I may have missed something.
>
> So it is probably one of the first two cases.
...

I have a theory for the cause of this hang. The commit that triggers this problem added calls to g_access() during the geom_dev probe. I believe this hit a race in cdregister() where the periph hold lock is dropped around the changer probe code. Why the periph hold lock is dropped there, I do not know, as I haven't fully reviewed the changer probe code.

The drop of the lock in cdregister() can allow geom classes to probe and thus call g_access()->g_disk_access()->cdopen() before a probe is initiated in the "normal way" by cdregister(). cdopen() checks for media presence by issuing immediate ccbs. When the race is exploited, the peripheral will be in the "probe state" when the immediate ccbs are requested.
This will cause the device probe to be performed before the immediate ccb is returned. When the cdopen() activity finally unwinds, cdregister() will again take the periph hold lock and schedule the peripheral, expecting probe processing to complete and release the hold lock. However, since the periph is already in the normal state (due to the successful probe performed indirectly by the cdopen() call), that release never happens, thus wedging the device.

To test this theory, apply the following patch. I do not know if this is safe for changer devices, but I will review the changer code if this patch fixes ache's problem.

--
Justin
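The race described above can be sketched as a small, single-threaded C model. This is an illustration under stated assumptions, not the actual cd(4) or CAM code. struct periph_model, model_hold(), model_unhold(), model_start(), model_open() and model_register() are hypothetical stand-ins for cam_periph_hold(), cam_periph_unhold(), probe handling for a scheduled periph, cdopen() and cdregister(). The real cam_periph_hold() would sleep in mtx_sleep(); model_hold() instead returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not the real FreeBSD CAM/cd(4) structures. */
enum model_state { MODEL_STATE_PROBE, MODEL_STATE_NORMAL };

struct periph_model {
	bool locked;            /* models the periph hold lock being taken */
	enum model_state state; /* models the cd(4) probe/normal state */
};

/* Stand-in for cam_periph_hold(). The real function sleeps until the
 * holder calls cam_periph_unhold(); this model just reports failure. */
static bool model_hold(struct periph_model *p)
{
	if (p->locked)
		return false;   /* a real caller would sleep here */
	p->locked = true;
	return true;
}

/* Stand-in for cam_periph_unhold(); the real one also wakes sleepers. */
static void model_unhold(struct periph_model *p)
{
	p->locked = false;
}

/* Stand-in for processing a scheduled periph. In PROBE state the probe
 * runs and moves the device to NORMAL, and if the registration path
 * handed its hold to the probe, probe completion drops it. In NORMAL
 * state no probe runs, so nothing here drops the hold. */
static void model_start(struct periph_model *p, bool probe_owns_hold)
{
	if (p->state == MODEL_STATE_PROBE) {
		p->state = MODEL_STATE_NORMAL;
		if (probe_owns_hold)
			model_unhold(p);
	}
}

/* Stand-in for cdopen(): take the hold, issue an immediate CCB (which,
 * in PROBE state, causes the probe to run first), then drop the hold. */
static bool model_open(struct periph_model *p)
{
	if (!model_hold(p))
		return false;
	model_start(p, false);
	model_unhold(p);
	return true;
}

/* Stand-in for cdregister(). With racy set, the hold is dropped around
 * the changer probe and a GEOM-triggered cdopen() runs in that window. */
static void model_register(struct periph_model *p, bool racy)
{
	(void)model_hold(p);
	if (racy) {
		model_unhold(p);        /* hold dropped around changer probe */
		(void)model_open(p);    /* g_access() -> ... -> cdopen() */
		(void)model_hold(p);    /* hold taken again */
	}
	model_start(p, true);       /* expects probe completion to unhold */
}
```

Calling model_register() with racy set to false ends in NORMAL state with the hold released. With racy set to true, the state is also NORMAL, but the hold is never dropped. Any later model_open() then fails, which corresponds to the GEOM event thread sleeping indefinitely in cam_periph_hold().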
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:15 UTC