Re: Bug: devfs is sure to have the bug.

From: Kohji Okuno <okuno.kohji@jp.panasonic.com>
Date: Tue, 09 Aug 2011 11:34:42 +0900 (JST)
Hi Kostik and Jaakko,

> On Fri, Aug 05, 2011 at 06:45:22PM +0300, Jaakko Heinonen wrote:
>> On 2011-08-03, Kostik Belousov wrote:
>> > On Wed, Aug 03, 2011 at 02:44:23PM +0900, Kohji Okuno wrote:
>> > > > devfs_populate(), and the context holds only "dm->dm_lock" in
>> > > > devfs_populate().
>> > > > 
>> > > > On the other hand, "devfs_generation" is incremented in devfs_create()
>> > > > and devfs_destroy() the context holds only "devmtx" in devfs_create()
>> > > > and devfs_destroy().
>> > > > 
>> > > > If one context executes devfs_create() while another context is
>> > > > executing (***), then "dm->dm_generation" is updated to an incorrect
>> > > > value. As a result, we cannot open the most recently created device
>> > > > (open() returns ENOENT).
>> > 
>> > I think the problem you described is real, and the suggested change is
>> > right. Initially, I thought that we should treat devfs_generation as an
>> > atomic type due to the unlocked access in devfs_populate(), but then
>> > convinced myself that this is not needed.
>> > 
>> > But also, I think there is another half of the problem. Namely,
>> > devfs_lookup() calls devfs_populate_vp(), and then does lookup with the
help of devfs_lookupx(). We will miss a generation update that happens
after dm_lock is dropped in devfs_populate_vp() to reacquire the
directory vnode lock.
>> 
>> I don't understand this. devfs_generation is not protected with dm_lock
>> in devfs_create() and devfs_destroy(). On the other hand if you mean
>> that another thread calls devfs_populate() while we drop dm_lock in
>> devfs_populate_vp(), isn't the mount point up to date when we re-lock
>> dm_lock?
> Yes, I was not quite exact in describing what I mean, and the reference
> to dm_lock drop is both vague and not correct.
> 
> What I am after is the fact that we allow a situation where it is
> externally visible that a new cdev node was successfully created, yet a
> lookup of the node's path still returns ENOENT.
> 
>> 
>> > @@ -630,13 +630,15 @@ devfs_populate_loop(struct devfs_mount *dm, int cleanup)
>> >  void
>> >  devfs_populate(struct devfs_mount *dm)
>> >  {
>> > +	unsigned gen;
>> >  
>> >  	sx_assert(&dm->dm_lock, SX_XLOCKED);
>> > -	if (dm->dm_generation == devfs_generation)
>> > +	gen = devfs_generation;
>> > +	if (dm->dm_generation == gen)
>> >  		return;
>> >  	while (devfs_populate_loop(dm, 0))
>> >  		continue;
>> > -	dm->dm_generation = devfs_generation;
>> > +	dm->dm_generation = gen;
>> >  }
>> 
>> After this change dm->dm_generation may be stale although the mount
>> point is up to date? This is probably harmless, though.
> This must be harmless; in the worst case it causes extra calls to
> devfs_populate(). In fact, this even allows dm_generation to roll
> backward, which is again harmless.

In the end, do we only need to fix devfs_populate()?
I have seen no problems so far while testing an 8.1-RELEASE environment
with only devfs_populate() fixed.

Many thanks,
 Kohji Okuno
Received on Tue Aug 09 2011 - 00:34:44 UTC
