Re: device_attach(9) and driver initialization

From: Ian Lepore <freebsd_at_damnhippie.dyndns.org>
Date: Mon, 09 Apr 2012 09:42:18 -0600
On Mon, 2012-04-09 at 17:59 +0300, Konstantin Belousov wrote:
> On Mon, Apr 09, 2012 at 08:41:15AM -0600, Ian Lepore wrote:
> > On Sun, 2012-04-08 at 06:58 +0300, Konstantin Belousov wrote:
> > > On Sat, Apr 07, 2012 at 09:10:55PM -0600, Warner Losh wrote:
> > > > 
> > > > On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote:
> > > > 
> > > > > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote:
> > > > >> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote:
> > > > >>> Hello,
> > > > >>> there seems to be a problem with device attach sequence offered by newbus.
> > > > >>> Basically, when device attach method is executing, device is not fully
> > > > >>> initialized yet. Also the device state in the newbus part of the world
> > > > >>> is DS_ALIVE. There is definitely no shattering news in the statements,
> > > > >>> but drivers that e.g. create devfs node to communicate with consumers
> > > > >>> are prone to a race.
> > > > >>> 
> > > > >>> If /dev node is created inside device attach method, then usermode
> > > > >>> can start calling cdevsw methods before device fully initialized itself.
> > > > >>> Even more, if device tries to use newbus helpers in cdevsw methods,
> > > > >>> like device_busy(9), then panic occurs "called for unatteched device".
> > > > >>> I get reports from users about this issues, to it is not something
> > > > >>> that only could happen.
> > > > >>> 
> > > > >>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called
> > > > >>> from newbus right after device attach finished and newbus considers
> > > > >>> the device fully initialized. Driver then could create devfs node
> > > > >>> in the after_attach method instead of attach. Please see the patch below.
> > > > >>> 
> > > > >>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m
> > > > >>> index eb720eb..9db74e2 100644
> > > > >>> --- a/sys/kern/device_if.m
> > > > >>> +++ b/sys/kern/device_if.m
> > > > >>> _at__at_ -43,6 +43,10 _at__at_ INTERFACE device;
> > > > >>> # Default implementations of some methods.
> > > > >>> #
> > > > >>> CODE {
> > > > >>> +	static void null_after_attach(device_t dev)
> > > > >>> +	{
> > > > >>> +	}
> > > > >>> +
> > > > >>> 	static int null_shutdown(device_t dev)
> > > > >>> 	{
> > > > >>> 	    return 0;
> > > > >>> _at__at_ -199,6 +203,21 _at__at_ METHOD int attach {
> > > > >>> };
> > > > >>> 
> > > > >>> /**
> > > > >>> + * _at_brief Notify the driver that device is in attached state
> > > > >>> + *
> > > > >>> + * Called after driver is successfully attached to the device and
> > > > >>> + * corresponding device_t is fully operational. Driver now may expose
> > > > >>> + * the device to the consumers, e.g. create devfs nodes.
> > > > >>> + *
> > > > >>> + * _at_param dev		the device to probe
> > > > >>> + *
> > > > >>> + * _at_see DEVICE_ATTACH()
> > > > >>> + */
> > > > >>> +METHOD void after_attach {
> > > > >>> +	device_t dev;
> > > > >>> +} DEFAULT null_after_attach;
> > > > >>> +
> > > > >>> +/**
> > > > >>>  * _at_brief Detach a driver from a device.
> > > > >>>  *
> > > > >>>  * This can be called if the user is replacing the
> > > > >>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c
> > > > >>> index d485b9f..6d849cb 100644
> > > > >>> --- a/sys/kern/subr_bus.c
> > > > >>> +++ b/sys/kern/subr_bus.c
> > > > >>> _at__at_ -2743,6 +2743,7 _at__at_ device_attach(device_t dev)
> > > > >>> 	dev->state = DS_ATTACHED;
> > > > >>> 	dev->flags &= ~DF_DONENOMATCH;
> > > > >>> 	devadded(dev);
> > > > >>> +	DEVICE_AFTER_ATTACH(dev);
> > > > >>> 	return (0);
> > > > >>> }
> > > > >>> 
> > > > >> 
> > > > >> Does device_get_softc() work before attach is completed?  (I don't have
> > > > >> time to go look in the code right now).  If so, then a mutex initialized
> > > > >> and acquired early in the driver's attach routine, and also acquired in
> > > > >> the driver's cdev implementation routines before using any newbus
> > > > >> functions other than device_get_softc(), would solve the problem without
> > > > >> a driver api change that would make it harder to backport/MFC driver
> > > > >> changes.
> > > > > No, 'a mutex' does not solve anything. It only adds enourmous burden
> > > > > on the driver developers, because you cannot sleep under mutex. Changing
> > > > > the mutex to the sleepable lock also does not byy you much, since
> > > > > you need to somehow solve the issues with some cdevsw call waking up
> > > > > thread sleeping into another cdevsw call, just for example.
> > > > > 
> > > > > Singlethreading a driver due to this race is just silly.
> > > > > 
> > > > > And, what do you mean by 'making it harder to MFC' ? How ?
> > > > 
> > > > driver_attach()
> > > > {
> > > > 	...
> > > > 	softc->flags = 0; // redundant, since softc is initialized to 0.
> > > > 	softc->cdev = device_create...();
> > > > 	...
> > > > 	softc->flags |= READY;
> > > > }
> > > > 
> > > > driver_open(...)
> > > > {
> > > > 	if (!(softc->flags & READY))
> > > > 		return ENXIO;
> > > > 	...
> > > > }
> > > > 
> > > > What's the big burden here?
> > > The burden is that your proposal does not work. As I described above,
> > > device_busy() calls from cdevsw method panic if open() is called before
> > > DS_ATTACHED is set by newbus. And, DS_ATTACHED is only set after device
> > > attach() method returned, so no workarounds from attach() could solve
> > > this.
> > 
> > One thing that keeps floating to the front of my brain is that all the
> > proposals so far (including my not-well-thought-out mutex suggestion)
> > requires changing every existing driver to get the new safe behavior.
> > 
> > Hmmm.  Looking at the code, not very many drivers call device_busy().
> > Why is that?  
> > 
> > I agree that calling device_create() should be deferred until the driver
> You mean make_dev(9), I assume.
> 

Yes, of course, sorry (I was looking at a call to a local convenience
routine in a private driver).

> > is ready to handle requests.  That's only part of the fix if the newbus
> > support routines are still going to have a window where they can panic
> > because the internal state variables haven't yet transitioned to the
> > correct state.  
> > 
> > Also, the implementation of device_busy() looks to be unsafe unless it's
> > being implicitly protected by some locking in a call chain that isn't
> > jumping out at me with simple grepping of the code.  For example,
> > concurrent callers in a device's open() and close() methods for a driver
> > that calls busy/unbusy from cdev open/close could leave the parent
> > device's busy count in an indeterminate state.  Could fixing this by
> > enforcing single threading through busy/unbusy provide an opportunity to
> > fix the original problem by having the attach code acquire the same
> > lock, so that an early call to a cdev method that invokes device_busy()
> > ends up sleeping until the attach routine returns?  That way you don't
> > single-thread the whole cdev open/close handling, just the part that's
> > currently causing a problem.
> I think device_busy() calls require Giant, the same as the whole newbus.
> The absence of GIANT_REQUIRED assertions in device_busy() and device_unbusy()
> looks like overlook.

Hmmm.  Shouldn't device_attach() then require Giant as well?  I see that
device_detach() and device_probe_and_attach() contains GIANT_REQUIRED
but device_attach() does not.  

Of the few devices calling device_busy(), some use D_NEEDGIANT and some
do not (the do-nots include drm, fdc, rp -- I stopped looking at this
point).  Only the fdc code specifically calls mtx_lock(&Giant), but it
appears that it does not do so around the device_busy/unbusy() calls.

If you're supposed to hold Giant during calls to device_busy/unbusy, and
assuming it should be held by whomever calls device_attach(), wouldn't
that fix the panic?  I don't know if that's a good assumption, because
I'm not seeing anything in the manpages or architecture handbook about
holding Giant during newbus calls.

-- Ian
Received on Mon Apr 09 2012 - 13:43:27 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:25 UTC