On Fri, Jan 18, 2013 at 02:45:38PM +0900, Kohji Okuno wrote:
> Hi,
>
> When I removed a device (ex. /dev/da0), I have encountered a
> dead-lock between ``g_event'' thread and a thread that is opening
> device file (I call this thread as A).
>
> Would you refer the following?
>
> When the device is removed between dev_refthread() and g_dev_open(),
> thread A incremented dev->si_threadcount, but can't acquire
> topology_lock.
>
> On the other hand, g_event is waiting to set dev->si_threadcount to 0
> with topology_lock.
>
> Regards,
> Kohji Okuno
>
>
> <<< Thread A >>>
> ...
> devfs_open()
> {
>     ...
>     dsw = dev_refthread(dev, &ref);  <= increment dev->si_threadcount
>     ...
>     error = dsw->d_open(...);        <= call g_dev_open()
>     ...
>     dev_relthread(dev, ref);         <= decrement dev->si_threadcount
> }
>
> g_dev_open()
> {
>     ...
>     g_topology_lock();               <= Thread A couldn't acquire
>     ...                                 topology_lock.
> }
>
> <<< g_event >>>
> g_run_events()
> {
>     ...
>     g_topology_lock();               <= g_event acquired topology_lock here.
>     ...
>     one_event()
>     ...
> }
>
> one_event()
> g_orphan_register()
> g_dev_orphan()
> destroy_dev()
> destroy_dev()
> destroy_devl()
> {
>     ...
>     while (dev->si_threadcount != 0) {  <= this count was incremented by Thread A
>         /* Use unique dummy wait ident */
>         msleep(&csw, &devmtx, PRIBIO, "devdrn", hz / 10);
>     }
>     ...
> }

Yes, you are absolutely right. I believe there were some patches
floating around which changed the destroy_dev() call in g_dev_orphan()
to destroy_dev_sched(). I do not remember who the author was.

My reply was that a naive substitution of destroy_dev() with
destroy_dev_sched() is racy, because some requests might still come in
after the call to destroy_dev_sched(). Although destroy_dev_sched()
sets the CDP_SCHED_DTR flag on the devfs node, some thread might have
already entered the cdevsw method.

I do not believe that there was further progress there.
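
The two interleavings discussed above can be replayed in a small userspace model. This is only a sketch, not kernel code: the struct and every model_* function are hypothetical names introduced for illustration, the topology lock is reduced to a boolean, and it assumes that once the CDP_SCHED_DTR flag is set, later attempts to enter a cdevsw method are refused.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant cdev state. */
struct model_dev {
    int  si_threadcount;   /* threads currently inside a cdevsw method */
    bool topology_locked;  /* stands in for the GEOM topology lock */
    bool sched_dtr;        /* stands in for the CDP_SCHED_DTR flag */
};

/* Models dev_refthread(): assumed to refuse entry once destruction
 * has been scheduled, otherwise counts the entering thread. */
static bool model_refthread(struct model_dev *d)
{
    if (d->sched_dtr)
        return false;
    d->si_threadcount++;
    return true;
}

/* Thread A in g_dev_open() can proceed only if the topology lock is free. */
static bool model_open_can_proceed(const struct model_dev *d)
{
    return !d->topology_locked;
}

/* The drain loop in destroy_devl() can finish only when no thread is inside. */
static bool model_destroy_can_finish(const struct model_dev *d)
{
    return d->si_threadcount == 0;
}

/* Replays the reported interleaving; true means neither side can progress. */
static bool model_replay_deadlock(void)
{
    struct model_dev d = {0, false, false};

    bool entered = model_refthread(&d);  /* A: devfs_open() -> dev_refthread() */
    d.topology_locked = true;            /* g_event: g_topology_lock() */

    return entered &&
        !model_open_can_proceed(&d) &&   /* A waits for the topology lock */
        !model_destroy_can_finish(&d);   /* g_event waits for count == 0 */
}

/* Replays the destroy_dev_sched() race: the flag stops later arrivals,
 * but a thread that entered earlier is still inside the cdevsw method. */
static bool model_replay_sched_race(void)
{
    struct model_dev d = {0, false, false};

    bool early = model_refthread(&d);    /* enters before the flag is set */
    d.sched_dtr = true;                  /* destroy_dev_sched() */
    bool late = model_refthread(&d);     /* refused by the flag */

    return early && !late && d.si_threadcount == 1;
}
```

Both replay functions return true in this model: the first shows the wait-for cycle between thread A and g_event, the second shows that setting the flag alone does not account for a thread that is already inside.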