Re: vnode leak in FFS code ... ?

From: Marc G. Fournier <scrappy_at_hub.org> Date: Wed, 1 Sep 2004 23:07:46 -0300 (ADT)

On Wed, 1 Sep 2004, Allan Fields wrote:

>> It's really hard to tell if there is a vnode leak here.  The vnode pool
>> is fairly fluid and has nothing to do with the number of files that are
>> actually 'open'.  Vnodes get created when the VFS layer wants to access
>> an object that isn't already in the cache, and only get destroyed when
>> the object is destroyed.  A vnode that represents a file that was opened
>> will stay 'active' in the system long after the file has been closed,
>> because it's cheaper to keep it active in the cache than it is to
>> discard it and then risk having to go through the pain of a namei()
>> and VOP_LOOKUP() again later.  Only if the maxvnode limit is hit will
>> old vnodes start getting recycled to represent other objects.  [...]
>>
>> So you've obviously bumped up kern.maxvnodes well above the limits that
>> are normally generated from the auto-tuner.  Why did you do that, if not
>> because you knew that you'd have a large working set of referenced (but
>> maybe not open all at once) filesystem objects?  [...]
>
> There was a previous thread I've found which also helps explain
> this further:
> http://lists.freebsd.org/pipermail/freebsd-stable/2003-May/001266.html
>
> Really the same issue now as then?
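
To make the quoted description concrete, here is a toy sketch in Python of the behaviour it describes: entries stay cached after a close, and are only recycled once a cap is hit. This is an illustration under loud assumptions, not the kernel's actual algorithm; the class name ToyVnodeCache, its methods, and the strict least-recently-used recycling order are all made up for the example.

```python
from collections import OrderedDict


class ToyVnodeCache:
    """Toy model only: entries persist after close() and are recycled
    (least recently used first) only once the cap is reached.
    The real kernel's free list and vnlru logic are more involved."""

    def __init__(self, maxvnodes):
        self.maxvnodes = maxvnodes        # stand-in for kern.maxvnodes
        self.vnodes = OrderedDict()       # object name -> placeholder
        self.recycled = 0

    def access(self, name):
        if name in self.vnodes:
            self.vnodes.move_to_end(name)     # cache hit, no new lookup
            return "hit"
        if len(self.vnodes) >= self.maxvnodes:
            self.vnodes.popitem(last=False)   # recycle the oldest entry
            self.recycled += 1
        self.vnodes[name] = True
        return "miss"

    def close(self, name):
        pass  # closing a file does not discard its cached entry


cache = ToyVnodeCache(maxvnodes=3)
for name in ["a", "b", "c"]:
    cache.access(name)
    cache.close(name)

still_cached = len(cache.vnodes)      # all three remain after close()
again = cache.access("a")             # served from the cache
cache.access("d")                     # cap reached, so "b" is recycled
```

In this model nothing is freed until the cap is reached, which is why a large count of cached entries on its own does not indicate a leak.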

I'm not getting the hangs now, it is freeing up vnodes ... but it's having 
to work very hard to do so, or so it seems:

venus# ps aux | grep vnlru
root        7  3.0  0.0     0    0  ??  DL    5Aug04 606:34.54  (vnlru)

I started up the script for monitoring this on Aug 29th ... since then, 
there have been 4331 entries in the log file, of which 1927 show vnlru in 
'vlrup', which I believe means vnlru is running through its lists trying to 
find some vnodes to free up, if I recall the code ... ?

venus# grep vnode /var/log/syswatch | wc -l
     4331
venus# grep vnode /var/log/syswatch | grep vlrup | wc -l
     1927

and this is based on a check every minute ...
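
For a sense of scale, those two counts put vnlru in 'vlrup' in a bit under half of the one-minute samples. A minimal Python sketch of the same tally follows. It assumes each log line ends with the vnlru wait channel after a final " - " separator, which is a guess based on the single sample line quoted for the other server. The function name and the sample lines are hypothetical.

```python
from collections import Counter


def tally_wait_channels(lines):
    """Count samples per vnlru wait channel.

    Assumes each log line ends with ' - <wchan>'; this layout is
    inferred from one sample line, not from the monitoring script.
    """
    counts = Counter()
    for line in lines:
        if "vnode" not in line:          # same filter as `grep vnode`
            continue
        wchan = line.rsplit(" - ", 1)[-1].strip()
        counts[wchan] += 1
    return counts


# Hypothetical sample lines, not real log data.
sample = [
    "debug.numvnodes: 1000 - debug.freevnodes: 10 - debug.vnlru_nowhere: 3 - vlrup",
    "debug.numvnodes: 1000 - debug.freevnodes: 400 - debug.vnlru_nowhere: 3 - vlruwt",
    "some unrelated log line",
]
counts = tally_wait_channels(sample)
share = counts["vlrup"] / sum(counts.values())

# The counts reported above: 1927 of 4331 samples.
reported_share = 1927 / 4331             # roughly 0.445
```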

The other server, running ~19 more VMs (~100 more processes), only up 2 
days now, seems to be faring better:

debug.numvnodes: 344062 - debug.freevnodes: 168285 - debug.vnlru_nowhere: 0 - vlruwt
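
As a rough reading of that line, a short Python sketch that splits it into counters and works out how many vnodes are not on the free list. The "name: value - name: value - wchan" layout is inferred from this one line only, and parse_sample is a hypothetical helper, not part of the monitoring script.

```python
import re


def parse_sample(line):
    """Split one status line into integer counters plus the wait channel.

    The layout is an assumption taken from the single line quoted above.
    """
    counters = {name: int(value)
                for name, value in re.findall(r"([\w.]+):\s*(\d+)", line)}
    wchan = line.rsplit(" - ", 1)[-1].strip()
    return counters, wchan


line = ("debug.numvnodes: 344062 - debug.freevnodes: 168285 - "
        "debug.vnlru_nowhere: 0 - vlruwt")
counters, wchan = parse_sample(line)

# Vnodes not currently on the free list, and the share that are free.
not_free = counters["debug.numvnodes"] - counters["debug.freevnodes"]
free_share = counters["debug.freevnodes"] / counters["debug.numvnodes"]
print(not_free, round(free_share, 3), wchan)   # 175777 0.489 vlruwt
```

So about 49% of the vnodes on that server are on the free list, and vnlru_nowhere is still 0 there.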

I've scheduled 'maintenance' on that server for Saturday ... am going to 
shut down all 'non-host server' processes, and unmount the large file 
system (where all the VMs run off of) ... see if that cleans up any of the 
vnodes without having to do a reboot ...

If that doesn't work, I could cause a panic and have it dump core, if that 
would provide for easier/better debugging ... ?

I have limited flexibility with the server, but it is a 'real' server 
without a fake load on it, and as solid as I've always considered FreeBSD 
to be, I seem to have a knack for pushing it and breaking it :( ... so 
whatever data I can provide to make it that much more solid, even if it 
involves a little bit of downtime to get a good core dump, I'm willing to 
do ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy_at_hub.org           Yahoo!: yscrappy              ICQ: 7615664