On Tue, Apr 10, 2012 at 09:56:06AM -0400, John Baldwin wrote: > On Monday, April 09, 2012 4:05:29 pm Konstantin Belousov wrote: > > On Mon, Apr 09, 2012 at 03:36:08PM -0400, John Baldwin wrote: > > > On Monday, April 09, 2012 3:10:00 pm Konstantin Belousov wrote: > > > > On Mon, Apr 09, 2012 at 11:01:03AM -0400, John Baldwin wrote: > > > > > On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > > > > > > > > > > > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > > > > > > > > > > > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > > > > >> Hello, > > > > > > >> there seems to be a problem with device attach sequence offered by > > > > > newbus. > > > > > > >> Basically, when device attach method is executing, device is not fully > > > > > > >> initialized yet. Also the device state in the newbus part of the world > > > > > > >> is DS_ALIVE. There is definitely no shattering news in the statements, > > > > > > >> but drivers that e.g. create devfs node to communicate with consumers > > > > > > >> are prone to a race. > > > > > > >> > > > > > > >> If /dev node is created inside device attach method, then usermode > > > > > > >> can start calling cdevsw methods before device fully initialized itself. > > > > > > >> Even more, if device tries to use newbus helpers in cdevsw methods, > > > > > > >> like device_busy(9), then panic occurs "called for unatteched device". > > > > > > >> I get reports from users about this issues, to it is not something > > > > > > >> that only could happen. > > > > > > >> > > > > > > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > > > > > >> from newbus right after device attach finished and newbus considers > > > > > > >> the device fully initialized. Driver then could create devfs node > > > > > > >> in the after_attach method instead of attach. Please see the patch below. > > > > > > >> > > > > > > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > > > > >> index eb720eb..9db74e2 100644 > > > > > > >> --- a/sys/kern/device_if.m > > > > > > >> +++ b/sys/kern/device_if.m > > > > > > >> _at__at_ -43,6 +43,10 _at__at_ INTERFACE device; > > > > > > >> # Default implementations of some methods. > > > > > > >> # > > > > > > >> CODE { > > > > > > >> + static void null_after_attach(device_t dev) > > > > > > >> + { > > > > > > >> + } > > > > > > >> + > > > > > > >> static int null_shutdown(device_t dev) > > > > > > >> { > > > > > > >> return 0; > > > > > > >> _at__at_ -199,6 +203,21 _at__at_ METHOD int attach { > > > > > > >> }; > > > > > > >> > > > > > > >> /** > > > > > > >> + * _at_brief Notify the driver that device is in attached state > > > > > > >> + * > > > > > > >> + * Called after driver is successfully attached to the device and > > > > > > >> + * corresponding device_t is fully operational. Driver now may expose > > > > > > >> + * the device to the consumers, e.g. create devfs nodes. > > > > > > >> + * > > > > > > >> + * _at_param dev the device to probe > > > > > > >> + * > > > > > > >> + * _at_see DEVICE_ATTACH() > > > > > > >> + */ > > > > > > >> +METHOD void after_attach { > > > > > > >> + device_t dev; > > > > > > >> +} DEFAULT null_after_attach; > > > > > > >> + > > > > > > >> +/** > > > > > > >> * _at_brief Detach a driver from a device. > > > > > > >> * > > > > > > >> * This can be called if the user is replacing the > > > > > > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > > > > >> index d485b9f..6d849cb 100644 > > > > > > >> --- a/sys/kern/subr_bus.c > > > > > > >> +++ b/sys/kern/subr_bus.c > > > > > > >> _at__at_ -2743,6 +2743,7 _at__at_ device_attach(device_t dev) > > > > > > >> dev->state = DS_ATTACHED; > > > > > > >> dev->flags &= ~DF_DONENOMATCH; > > > > > > >> devadded(dev); > > > > > > >> + DEVICE_AFTER_ATTACH(dev); > > > > > > >> return (0); > > > > > > >> } > > > > > > >> > > > > > > > > > > > > > > Does device_get_softc() work before attach is completed? (I don't have > > > > > > > time to go look in the code right now). If so, then a mutex initialized > > > > > > > and acquired early in the driver's attach routine, and also acquired in > > > > > > > the driver's cdev implementation routines before using any newbus > > > > > > > functions other than device_get_softc(), would solve the problem without > > > > > > > a driver api change that would make it harder to backport/MFC driver > > > > > > > changes. > > > > > > > > > > > > Also, more generally, don't create the dev nodes before you are ready to > > > > > deal with requests. Why do we need to uglify everything here? If you can't > > > > > do that, you can check a bit in the softc and return EBUSY or ENXIO on open if > > > > > that bit says that your driver isn't ready to accept requests. > > > > > > > > > > I agree, this dosen't actually fix anything as the decision for what to put > > > > > in your foo_attach() method rather than foo_after_attach() is non-obvious and > > > > > very arbitrary. > > > > > > > > > > The actual bug appears to only be with using 'device_busy()'. I think > > > > > this should be fixed by making device_busy() better, not by adding > > > > > this type of obfuscation to attach. It should be trivial to make > > > > > device_busy() safe to use on a device that is currently being attached > > > > > which will not require any changes to drivers. > > > > > > > > Could you, please, elaborate your proposal ? How do you think device_busy() > > > > can be enchanced ? > > > > > > > > Obvious idea to sleep inside device_busy() until dev->state becomes != > > > > DS_ATTACHED is no go, IMO. The issue is that this causes immediate deadlocks > > > > if device_attach() method needs to call destroy_dev() to rollback. > > > > > > I think you could have a DS_ATTACHING state and allow device_busy() to work > > > for DS_ATTACHING. The idea being that it is a bug for a driver to invoke > > > device_busy() if it is going to fail attach. You may then need to do a fixup > > > in device_attach() to promote the state from DS_ATTACHED to DS_BUSY when it > > > returns if there is a non-zero busy count. > > This is quite good idea, but it still adds burden to device author, > > although I agree that this is manageable. A scenario I have in mind now > > is the following: > > assume that driver needs to create two devfs nodes, lets name them > > dri/card0 and dri/forcewake0. Driver would perform two make_dev_p(9) > > calls, and while creation of dri/card0 succeed, consequent creation > > of dri/forcewake0 could fail for numerous reasons. > > > > Now, the driver needs to ensure that cdesvw->d_open() on dri/card0 > > would return ENXIO until dri/forcewake0 is created. This can be implemented > > with flag, indeed. But still somewhat muddy, and probably leads to > > user-visible errors (I mostly worry about graphical login managers). > > You could also sleep on the flag in d_open() (you can imagine a two-step > process where you set a "adding cdev's flag", then all d_open() calls block > on it). Then when finished adding cdev's, you set a flag if an error > occurred and wake up all the waiters. If no error occured the waiters can > have the first open() work fine. But even with other proposals you still > have to deal with this problem if you want to fail out entirely if you > have problems creating cdevs. > > > But for single-node drivers it is indeed a nice solution. > > I think we are somewhat stuck with this for other reasons as well. Note > that device_busy() propagates up the tree to parent devices, so imagine > kldloading a driver that creates a tree (e.g. a bus with a few consumers) > where the leaf devices are attached by a call to bus_generic_attach() from > the device_attach() method for the parent device. Even if you add a > DEVICE_AFTER_ATTACH() hook, while it may allow device_busy() to be invoked > on the leaf device, when it tries to propagate device_busy() up to the > parent device it would still be in the middle of attach and blow up anyway. This looks like a bug in my implementation of after_attach, which apparently should be called for new tree after the top level attach finished. Anyway, would you commit your change ? I definitely can work out the driver change after. But this seems to be a large amount of work for driver authors.
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:25 UTC