On Sun, 2012-04-08 at 06:58 +0300, Konstantin Belousov wrote: > On Sat, Apr 07, 2012 at 09:10:55PM -0600, Warner Losh wrote: > > > > On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote: > > > > > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: > > >> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > >>> Hello, > > >>> there seems to be a problem with device attach sequence offered by newbus. > > >>> Basically, when device attach method is executing, device is not fully > > >>> initialized yet. Also the device state in the newbus part of the world > > >>> is DS_ALIVE. There is definitely no shattering news in the statements, > > >>> but drivers that e.g. create devfs node to communicate with consumers > > >>> are prone to a race. > > >>> > > >>> If /dev node is created inside device attach method, then usermode > > >>> can start calling cdevsw methods before device fully initialized itself. > > >>> Even more, if device tries to use newbus helpers in cdevsw methods, > > >>> like device_busy(9), then panic occurs "called for unatteched device". > > >>> I get reports from users about this issues, to it is not something > > >>> that only could happen. > > >>> > > >>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > >>> from newbus right after device attach finished and newbus considers > > >>> the device fully initialized. Driver then could create devfs node > > >>> in the after_attach method instead of attach. Please see the patch below. > > >>> > > >>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > >>> index eb720eb..9db74e2 100644 > > >>> --- a/sys/kern/device_if.m > > >>> +++ b/sys/kern/device_if.m > > >>> _at__at_ -43,6 +43,10 _at__at_ INTERFACE device; > > >>> # Default implementations of some methods. > > >>> # > > >>> CODE { > > >>> + static void null_after_attach(device_t dev) > > >>> + { > > >>> + } > > >>> + > > >>> static int null_shutdown(device_t dev) > > >>> { > > >>> return 0; > > >>> _at__at_ -199,6 +203,21 _at__at_ METHOD int attach { > > >>> }; > > >>> > > >>> /** > > >>> + * _at_brief Notify the driver that device is in attached state > > >>> + * > > >>> + * Called after driver is successfully attached to the device and > > >>> + * corresponding device_t is fully operational. Driver now may expose > > >>> + * the device to the consumers, e.g. create devfs nodes. > > >>> + * > > >>> + * _at_param dev the device to probe > > >>> + * > > >>> + * _at_see DEVICE_ATTACH() > > >>> + */ > > >>> +METHOD void after_attach { > > >>> + device_t dev; > > >>> +} DEFAULT null_after_attach; > > >>> + > > >>> +/** > > >>> * _at_brief Detach a driver from a device. > > >>> * > > >>> * This can be called if the user is replacing the > > >>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > >>> index d485b9f..6d849cb 100644 > > >>> --- a/sys/kern/subr_bus.c > > >>> +++ b/sys/kern/subr_bus.c > > >>> _at__at_ -2743,6 +2743,7 _at__at_ device_attach(device_t dev) > > >>> dev->state = DS_ATTACHED; > > >>> dev->flags &= ~DF_DONENOMATCH; > > >>> devadded(dev); > > >>> + DEVICE_AFTER_ATTACH(dev); > > >>> return (0); > > >>> } > > >>> > > >> > > >> Does device_get_softc() work before attach is completed? (I don't have > > >> time to go look in the code right now). If so, then a mutex initialized > > >> and acquired early in the driver's attach routine, and also acquired in > > >> the driver's cdev implementation routines before using any newbus > > >> functions other than device_get_softc(), would solve the problem without > > >> a driver api change that would make it harder to backport/MFC driver > > >> changes. > > > No, 'a mutex' does not solve anything. It only adds enourmous burden > > > on the driver developers, because you cannot sleep under mutex. Changing > > > the mutex to the sleepable lock also does not byy you much, since > > > you need to somehow solve the issues with some cdevsw call waking up > > > thread sleeping into another cdevsw call, just for example. > > > > > > Singlethreading a driver due to this race is just silly. > > > > > > And, what do you mean by 'making it harder to MFC' ? How ? > > > > driver_attach() > > { > > ... > > softc->flags = 0; // redundant, since softc is initialized to 0. > > softc->cdev = device_create...(); > > ... > > softc->flags |= READY; > > } > > > > driver_open(...) > > { > > if (!(softc->flags & READY)) > > return ENXIO; > > ... > > } > > > > What's the big burden here? > The burden is that your proposal does not work. As I described above, > device_busy() calls from cdevsw method panic if open() is called before > DS_ATTACHED is set by newbus. And, DS_ATTACHED is only set after device > attach() method returned, so no workarounds from attach() could solve > this. One thing that keeps floating to the front of my brain is that all the proposals so far (including my not-well-thought-out mutex suggestion) requires changing every existing driver to get the new safe behavior. Hmmm. Looking at the code, not very many drivers call device_busy(). Why is that? I agree that calling device_create() should be deferred until the driver is ready to handle requests. That's only part of the fix if the newbus support routines are still going to have a window where they can panic because the internal state variables haven't yet transitioned to the correct state. Also, the implementation of device_busy() looks to be unsafe unless it's being implicitly protected by some locking in a call chain that isn't jumping out at me with simple grepping of the code. For example, concurrent callers in a device's open() and close() methods for a driver that calls busy/unbusy from cdev open/close could leave the parent device's busy count in an indeterminate state. Could fixing this by enforcing single threading through busy/unbusy provide an opportunity to fix the original problem by having the attach code acquire the same lock, so that an early call to a cdev method that invokes device_busy() ends up sleeping until the attach routine returns? That way you don't single-thread the whole cdev open/close handling, just the part that's currently causing a problem. -- IanReceived on Mon Apr 09 2012 - 12:41:25 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:25 UTC