On Mon, Apr 09, 2012 at 08:41:15AM -0600, Ian Lepore wrote: > On Sun, 2012-04-08 at 06:58 +0300, Konstantin Belousov wrote: > > On Sat, Apr 07, 2012 at 09:10:55PM -0600, Warner Losh wrote: > > > > > > On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote: > > > > > > > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: > > > >> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > >>> Hello, > > > >>> there seems to be a problem with device attach sequence offered by newbus. > > > >>> Basically, when device attach method is executing, device is not fully > > > >>> initialized yet. Also the device state in the newbus part of the world > > > >>> is DS_ALIVE. There is definitely no shattering news in the statements, > > > >>> but drivers that e.g. create devfs node to communicate with consumers > > > >>> are prone to a race. > > > >>> > > > >>> If /dev node is created inside device attach method, then usermode > > > >>> can start calling cdevsw methods before device fully initialized itself. > > > >>> Even more, if device tries to use newbus helpers in cdevsw methods, > > > >>> like device_busy(9), then panic occurs "called for unatteched device". > > > >>> I get reports from users about this issues, to it is not something > > > >>> that only could happen. > > > >>> > > > >>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > > >>> from newbus right after device attach finished and newbus considers > > > >>> the device fully initialized. Driver then could create devfs node > > > >>> in the after_attach method instead of attach. Please see the patch below. > > > >>> > > > >>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > >>> index eb720eb..9db74e2 100644 > > > >>> --- a/sys/kern/device_if.m > > > >>> +++ b/sys/kern/device_if.m > > > >>> _at__at_ -43,6 +43,10 _at__at_ INTERFACE device; > > > >>> # Default implementations of some methods. > > > >>> # > > > >>> CODE { > > > >>> + static void null_after_attach(device_t dev) > > > >>> + { > > > >>> + } > > > >>> + > > > >>> static int null_shutdown(device_t dev) > > > >>> { > > > >>> return 0; > > > >>> _at__at_ -199,6 +203,21 _at__at_ METHOD int attach { > > > >>> }; > > > >>> > > > >>> /** > > > >>> + * _at_brief Notify the driver that device is in attached state > > > >>> + * > > > >>> + * Called after driver is successfully attached to the device and > > > >>> + * corresponding device_t is fully operational. Driver now may expose > > > >>> + * the device to the consumers, e.g. create devfs nodes. > > > >>> + * > > > >>> + * _at_param dev the device to probe > > > >>> + * > > > >>> + * _at_see DEVICE_ATTACH() > > > >>> + */ > > > >>> +METHOD void after_attach { > > > >>> + device_t dev; > > > >>> +} DEFAULT null_after_attach; > > > >>> + > > > >>> +/** > > > >>> * _at_brief Detach a driver from a device. > > > >>> * > > > >>> * This can be called if the user is replacing the > > > >>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > >>> index d485b9f..6d849cb 100644 > > > >>> --- a/sys/kern/subr_bus.c > > > >>> +++ b/sys/kern/subr_bus.c > > > >>> _at__at_ -2743,6 +2743,7 _at__at_ device_attach(device_t dev) > > > >>> dev->state = DS_ATTACHED; > > > >>> dev->flags &= ~DF_DONENOMATCH; > > > >>> devadded(dev); > > > >>> + DEVICE_AFTER_ATTACH(dev); > > > >>> return (0); > > > >>> } > > > >>> > > > >> > > > >> Does device_get_softc() work before attach is completed? (I don't have > > > >> time to go look in the code right now). If so, then a mutex initialized > > > >> and acquired early in the driver's attach routine, and also acquired in > > > >> the driver's cdev implementation routines before using any newbus > > > >> functions other than device_get_softc(), would solve the problem without > > > >> a driver api change that would make it harder to backport/MFC driver > > > >> changes. > > > > No, 'a mutex' does not solve anything. It only adds enourmous burden > > > > on the driver developers, because you cannot sleep under mutex. Changing > > > > the mutex to the sleepable lock also does not byy you much, since > > > > you need to somehow solve the issues with some cdevsw call waking up > > > > thread sleeping into another cdevsw call, just for example. > > > > > > > > Singlethreading a driver due to this race is just silly. > > > > > > > > And, what do you mean by 'making it harder to MFC' ? How ? > > > > > > driver_attach() > > > { > > > ... > > > softc->flags = 0; // redundant, since softc is initialized to 0. > > > softc->cdev = device_create...(); > > > ... > > > softc->flags |= READY; > > > } > > > > > > driver_open(...) > > > { > > > if (!(softc->flags & READY)) > > > return ENXIO; > > > ... > > > } > > > > > > What's the big burden here? > > The burden is that your proposal does not work. As I described above, > > device_busy() calls from cdevsw method panic if open() is called before > > DS_ATTACHED is set by newbus. And, DS_ATTACHED is only set after device > > attach() method returned, so no workarounds from attach() could solve > > this. > > One thing that keeps floating to the front of my brain is that all the > proposals so far (including my not-well-thought-out mutex suggestion) > requires changing every existing driver to get the new safe behavior. > > Hmmm. Looking at the code, not very many drivers call device_busy(). > Why is that? > > I agree that calling device_create() should be deferred until the driver You mean make_dev(9), I assume. > is ready to handle requests. That's only part of the fix if the newbus > support routines are still going to have a window where they can panic > because the internal state variables haven't yet transitioned to the > correct state. > > Also, the implementation of device_busy() looks to be unsafe unless it's > being implicitly protected by some locking in a call chain that isn't > jumping out at me with simple grepping of the code. For example, > concurrent callers in a device's open() and close() methods for a driver > that calls busy/unbusy from cdev open/close could leave the parent > device's busy count in an indeterminate state. Could fixing this by > enforcing single threading through busy/unbusy provide an opportunity to > fix the original problem by having the attach code acquire the same > lock, so that an early call to a cdev method that invokes device_busy() > ends up sleeping until the attach routine returns? That way you don't > single-thread the whole cdev open/close handling, just the part that's > currently causing a problem. I think device_busy() calls require Giant, the same as the whole newbus. The absence of GIANT_REQUIRED assertions in device_busy() and device_unbusy() looks like overlook.
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:25 UTC