Re: file descriptor leak in current with Qmail?

From: David A. Benfell <benfell_at_parts-unknown.org>
Date: Mon, 7 Jun 2004 11:45:51 -0700
On Mon, 07 Jun 2004 13:06:49 -0400, Robert Watson wrote:
> 
> All of these limits have existed for quite a while, but typically aren't
> run into since the default limits typically are pretty high for normal
> application use.  If necessary, you can raise the limit by tweaking the
> global maximum using kern.maxfiles (either as a tunable or sysctl), and
> then as needed adjusting the resource limits that qmail runs with.
> 
Okay, so as a temporary measure, I've raised kern.maxfiles to 20000.
I'm concerned about doing this; what I'm seeing suggests that system
performance gets really ugly as the number of open files increases,
even when it's still well below the old limit.
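
For reference, kern.maxfiles is only the global ceiling; each process also runs under its own descriptor limit, which is the "resource limits that qmail runs with" part of the advice above. A minimal sketch of reading that per-process limit from userland, using Python's standard resource module (the helper name describe_nofile_limits is made up for illustration, and nothing here is specific to qmail):

```python
import resource


def describe_nofile_limits():
    """Return the soft and hard per-process limits on open descriptors.

    RLIMIT_NOFILE is the per-process limit; it is separate from the
    system-wide kern.maxfiles ceiling discussed in this thread.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def fmt(value):
        # RLIM_INFINITY means no limit is enforced at this level.
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return {"soft": fmt(soft), "hard": fmt(hard)}


if __name__ == "__main__":
    print(describe_nofile_limits())
```

A process started by a supervisor inherits the supervisor's limits, so the values that matter are the ones in effect in qmail's own environment, not in an interactive shell.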

> However, I think the more serious element here is the reason why you reach
> the limit: this happens "naturally" under some workloads simply because of
> large numbers of open files and network connections.  However, in some
> workloads, it's a symptom of a system or application bug, such as a
> resource leak.

The part that has me worried is that I'm hitting the limit now, when I
wasn't before.  Unfortunately, I haven't been keeping track of my
upgrades in -CURRENT, so I can't really put a timeframe on when the
problem arose, except that I didn't have the problem before my most
recent upgrade a couple days ago.

> 
> Because the resources were returned when qmail was killed, that largely
> eliminates the possibility of a kernel resource leak (not entirely, but
> largely), as most kernel resource leaks involving file descriptors have
> the symptom that even after the process exits, the resources aren't
> released (i.e., a reference counting bug or race).  This suggests a user
> space issue -- that doesn't eliminate a system bug, as it could be a bug
> in a library that manages descriptors, but it also suggests the
> possibility of an application bug, or at least, a poor application
> interaction with a system bug.  Occasionally, we've seen bugs in the
> threading libraries that result in leaked descriptors, but my recollection
> is that qmail doesn't use threads.  So that suggests either a support
> library (perhaps crypto or the like), or qmail itself.  Or that you just
> hit an extremely high load. :-) 
> 
> In terms of debugging it: your first task is to identify if there's one
> process that's holding all the fd's, or if it is distributed over many
> processes.  After that, you want to track down what kind of fd is being
> left open, which may help you track down why it's left open...
> 
I'm going to have to take this to the qmail list; people there might
be able to track this down.
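
The first debugging step quoted above (one process holding all the descriptors versus many) can be sketched as a tally over fstat(1) output. This is a rough sketch under assumptions: it expects whitespace-separated columns beginning USER CMD PID FD, command names without spaces, and it treats only numeric FD entries (optionally followed by `*`) as descriptors. The function name and the SAMPLE text are invented for illustration and are not real output from this system.

```python
from collections import Counter

# Invented sample in the assumed fstat(1) column layout; not real data.
SAMPLE = """\
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
qmails   qmail-send   412   wd /          1234 drwxr-xr-x     512  r
qmails   qmail-send   412    0 /          5678 crw-rw-rw-    null  r
qmails   qmail-send   412    1 /          5678 crw-rw-rw-    null  w
qmaild   qmail-smtpd  977    0* internet stream tcp c1a2b3d4
"""


def count_fds_per_process(fstat_text):
    """Count descriptor entries per (command, pid) in fstat-style text."""
    counts = Counter()
    for line in fstat_text.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to carry an FD column.
        if len(fields) < 4 or fields[0] == "USER":
            continue
        cmd, pid, fd = fields[1], fields[2], fields[3]
        # Entries such as "wd" or "text" are not descriptors; a trailing
        # "*" marks a non-inode entry such as a socket and is still counted.
        if not pid.isdigit() or not fd.rstrip("*").isdigit():
            continue
        counts[(cmd, int(pid))] += 1
    return counts


if __name__ == "__main__":
    for (cmd, pid), n in count_fds_per_process(SAMPLE).most_common():
        print(f"{cmd} (pid {pid}): {n} descriptors")
```

Feeding it the real output (for example, saved to a file and read back in) and looking at the top of most_common() would show whether a single qmail process dominates; the remaining columns of those lines then indicate what kind of descriptor is being left open.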

Thanks!

-- 
David Benfell, LCP
benfell_at_parts-unknown.org
---
Resume available at http://www.parts-unknown.org/resume.html
Received on Mon Jun 07 2004 - 16:46:03 UTC