On Sun, 14 Dec 2003, Jeff Roberson wrote:

> On Sat, 13 Dec 2003, Don Lewis wrote:
>
> > On 13 Dec, Don Lewis wrote:
> > > On 12 Dec, Jeff Roberson wrote:
> > >
> > >
> > >> fsync: giving up on dirty: 0xc4e18000: tag devfs, type VCHR, usecount 44,
> > >> writecount 0, refcount 14, flags (VI_XLOCK|VV_OBJBUF), lock type devfs: EXCL
> > >> (count 1) by thread 0xc20ff500
> > >
> > > Why are we trying to reuse a vnode with a usecount of 44 and a refcount
> > > of 14? What is thread 0xc20ff500 doing?
> >
> > Following up to myself ...
> >
> > It looks like we're trying to recycle this vnode because of the
> > following sysinstall code, in distExtractTarball():
> >
> >     if (is_base && RunningAsInit && !Fake) {
> >         unmounted_dev = 1;
> >         unmount("/dev", MNT_FORCE);
> >     } else
> >         unmounted_dev = 0;
> >
> > What happens if we forceably umount /dev while /dev/whatever holds a
> > mounted file system? It looks like this is handled by vgonechrl(). It
> > looks to me like vclean() is going to do some scary stuff to this vnode.
> >
>
> Excellent work! I think I may know what's wrong. If you look at rev
> 1.461 of vfs_subr.c I changed the semantics of cleaning a VCHR that was
> being unmounted. I now acquire the xlock around the operation. This may
> be the culprit. I'm too tired to debug this right now, but I can look at
> it in the am.
>

Ok, I think I understand what happens.. The syncer runs, and at the same
time, we're doing the forced unmount. This causes the sync of the device
vnode to fail. This isn't really a problem. After this, while syncing a
ffs volume that is mounted on a VCHR from /dev, we bread() and get a
buffer for this device and then immediately block. The forced unmount
then proceeds, calling vclean() on the device, which goes into the VM via
DESTROYVOBJECT. The VM frees all of the pages associated with the object
etc. Then, the ffs_update() is allowed to run again with a pointer to a
buffer that has pointers to pages that have been freed.
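
To make the ordering concrete, here is a minimal user-space sketch of the
sequence described above. It is not kernel code: every sim_* name and
struct is a hypothetical stand-in (for a VM object, a page, and a buffer),
and the interleaving is replayed sequentially rather than with real
threads. It only assumes what is stated above: the buffer keeps pointers
to wired pages, and destroying the object frees those pages without
regard to who still points at them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NPAGES 4

/* Hypothetical stand-ins for a VM object, a page, and a buffer. */
struct sim_object { int resident_pages; };
struct sim_page   { struct sim_object *object; int wire_count; };
struct sim_buf    { struct sim_page *pages[SIM_NPAGES]; int npages; };

/* Step 1 (bread() analogue): the buffer wires n pages owned by obj. */
static void sim_bread(struct sim_buf *bp, struct sim_object *obj,
                      struct sim_page *pool, int n)
{
    assert(n <= SIM_NPAGES);
    bp->npages = n;
    for (int i = 0; i < n; i++) {
        pool[i].object = obj;
        pool[i].wire_count = 1;
        obj->resident_pages++;
        bp->pages[i] = &pool[i];
    }
}

/* Step 2 (forced unmount analogue): destroying the object detaches
 * every page, even though the buffer still points at them. */
static void sim_destroy_object(struct sim_object *obj,
                               struct sim_page *pool, int n)
{
    for (int i = 0; i < n; i++) {
        if (pool[i].object == obj) {
            pool[i].object = NULL;
            obj->resident_pages--;
        }
    }
}

/* Step 3 (the blocked updater resumes): count pages in the buffer that
 * are still wired once but no longer belong to any object. */
static int sim_count_stale(const struct sim_buf *bp)
{
    int stale = 0;
    for (int i = 0; i < bp->npages; i++) {
        const struct sim_page *p = bp->pages[i];
        if (p->wire_count == 1 && p->object == NULL)
            stale++;
    }
    return stale;
}

/* Replay the three steps in order; returns the number of stale pages. */
static int sim_run(bool forced_unmount)
{
    struct sim_object dev_obj = { 0 };
    struct sim_page pool[SIM_NPAGES];
    struct sim_buf buf;

    sim_bread(&buf, &dev_obj, pool, SIM_NPAGES);
    if (forced_unmount)
        sim_destroy_object(&dev_obj, pool, SIM_NPAGES);
    return sim_count_stale(&buf);
}
```

In this sketch sim_run(true) reports every page in the buffer as stale,
while sim_run(false) reports none.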

This is where vfs_setdirty() comes in and finds a NULL object. The wired
counts on the pages are 1, which is consistent with a page in the
bufcache. Also, the object is NULL, which is the only indication we have
that this is a free page.

I think that if we want to allow unmounting of the underlying device for
VCHR, we need to not call vclean() from vgonechrl(). We need to just
lock, VOP_RECLAIM, cache_purge(), and insmntque to NULL.

I've looked through my changes here, and I don't see how I could have
introduced this bug. We were vclean()ing before, and that seems to be
the main problem. There have been some changes to device aliasing that
could have impacted this. I'm trying to get the scoop from phk now.

I'm going to change the way vgonechrl() works, but I'd really like to
know what changed that broke this.

Cheers,
Jeff

Received on Sun Dec 14 2003 - 09:27:34 UTC