Re: file descriptor leak in current with Qmail?

From: Robert Watson <rwatson_at_freebsd.org>
Date: Mon, 7 Jun 2004 13:06:49 -0400 (EDT)

On Mon, 7 Jun 2004, David A. Benfell wrote:

> I'm running current, with qmail and spamassassin. 
> 
> Having finally caught on to the new kernel build process (ahem), I'm now
> having problems with the qmail UIDs (mostly for qmaild but occasionally
> qmails) exceeding the openfiles limit. 
> 
> When I used sysctl to interrogate kern.openfiles, it said 1836.  I have
> not altered the default maximum.  When I shut down qmail, it promptly
> dropped to something like 180.  When I restarted qmail, it went back up
> to something like 194 but system responsiveness dropped through the
> floor.  In basically the time it's taken me to write this much,
> kern.openfiles has climbed to something like 261.
> 
> So I guess I have a couple of idiot questions to ask here: 
> 
> Is the kern.openfiles limit something (relatively) new?  I was running
> current before on this box but hadn't gotten through a build because I
> hadn't caught on to the new kernel build process since before 5.2 was
> released.  Qmail was not a problem before. 
> 
> Is the correct response to this problem to raise the limit?  If so, I
> presume this would be done in rc.conf; what would be the corresponding
> variable in rc.conf? 
> 
> In the time I've composed this far, system responsiveness seems to have
> returned to normal and kern.openfiles has dropped to something like 221. 
> So I assume the responsiveness issue had to do with qmail trying to
> catch up. 
> 
> I'm in between quarters in school right now, so I have a little time to
> play with this if needed. 

Just to make sure we're clear on terminology, kern.openfiles is the number
of open file descriptors in the system.  Several resource limits impact
the ability to allocate new file descriptors:

- kern.maxfiles, the global maximum number of open file descriptors
  permitted.

- Resource limits, which are per-process, and can be viewed for the
  current process using the "limits" command (or some variation depending
  on shell).

- Real system memory constraints, which can result in allocation failures,
  etc, if exceeded.

All of these limits have existed for quite a while, but they are rarely
hit because the defaults are fairly high for normal application use.  If
necessary, you can raise the global maximum by setting kern.maxfiles
(either as a tunable or with sysctl), and then, as needed, adjust the
resource limits that qmail runs with.
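
As a rough illustration of the per-process side only (kern.maxfiles is a
system-wide setting and is not touched here), the sketch below reads the
open-file resource limit of the calling process and raises the soft limit
toward a target without exceeding the hard limit.  The function names
nofile_limits and raise_soft_limit are made up for this example; this is
not how qmail itself is configured.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limit of the calling process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft open-file limit toward target (hypothetical helper).

    The soft limit is never lowered and never pushed past the hard limit.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed hard
    if target <= soft:
        return soft  # never lower the limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    print("soft, hard:", nofile_limits())
```

A limit changed this way applies to the calling process and to children
it starts afterwards, so it only helps if it runs in whatever launches
the daemon.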

However, I think the more serious question here is why you reach the
limit.  Under some workloads this happens "naturally", simply because of
large numbers of open files and network connections.  In others, it's a
symptom of a system or application bug, such as a resource leak.

Because the resources were returned when qmail was killed, that largely
eliminates the possibility of a kernel resource leak (not entirely, but
largely), as most kernel resource leaks involving file descriptors have
the symptom that even after the process exits, the resources aren't
released (i.e., a reference counting bug or race).  This suggests a user
space issue -- that doesn't eliminate a system bug, as it could be a bug
in a library that manages descriptors, but it also suggests the
possibility of an application bug, or at least, a poor application
interaction with a system bug.  Occasionally, we've seen bugs in the
threading libraries that result in leaked descriptors, but my recollection
is that qmail doesn't use threads.  So that suggests either a support
library (perhaps crypto or the like), or qmail itself.  Or that you just
hit an extremely high load. :-) 

In terms of debugging it: your first task is to identify if there's one
process that's holding all the fd's, or if it is distributed over many
processes.  After that, you want to track down what kind of fd is being
left open, which may help you track down why it's left open...
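
For the system-wide view, a tool such as fstat(1) shows which process
holds which descriptors.  To illustrate the second step, here is a
minimal sketch that tallies the kinds of descriptors open in the calling
process only, by probing each descriptor number with fstat().  The names
classify_fd, open_fd_kinds and SCAN_CAP are invented for the example, and
the cap on how far it scans is an arbitrary assumption.

```python
import os
import resource
import stat
from collections import Counter

SCAN_CAP = 65536  # arbitrary upper bound on descriptor numbers to probe


def classify_fd(fd):
    """Return a rough label for the kind of object behind an open fd."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "char device"
    return "other"


def open_fd_kinds():
    """Count the open descriptors of this process, grouped by kind."""
    soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    limit = soft if 0 < soft <= SCAN_CAP else SCAN_CAP
    counts = Counter()
    for fd in range(limit):
        try:
            counts[classify_fd(fd)] += 1
        except OSError:
            continue  # this descriptor number is not open
    return counts


if __name__ == "__main__":
    for kind, count in sorted(open_fd_kinds().items()):
        print(kind, count)
```

If one kind keeps growing between runs (sockets, say, rather than plain
files), that narrows down which code path is failing to close them.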

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Senior Research Scientist, McAfee Research
Received on Mon Jun 07 2004 - 15:07:46 UTC