On Mon, 07 Jun 2004 13:06:49 -0400, Robert Watson wrote:
>
> All of these limits have existed for quite a while, but typically aren't
> run into since the default limits typically are pretty high for normal
> application use. If necessary, you can raise the limit by tweaking the
> global maximum using kern.maxfiles (either as a tunable or sysctl), and
> then as needed adjusting the resource limits that qmail runs with.
>
Okay, so as a temporary measure, I've raised kern.maxfiles to 20000.
I'm concerned about doing this; what I'm seeing suggests that system
performance gets really ugly as the number of open files increases,
even when it's still well below the old limit.

> However, I think the more serious element here is the reason why you reach
> the limit: this happens "naturally" under some workloads simply because of
> large numbers of open files and network connections. However, in some
> workloads, it's a symptom of a system or application bug, such as a
> resource leak.

The part that has me worried is that I'm hitting the limit now, when I
wasn't before. Unfortunately, I haven't been keeping track of my
upgrades in -CURRENT, so I can't really put a timeframe on when the
problem arose, except that I didn't have the problem before my most
recent upgrade a couple of days ago.

>
> Because the resources were returned when qmail was killed, that largely
> eliminates the possibility of a kernel resource leak (not entirely, but
> largely), as most kernel resource leaks involving file descriptors have
> the symptom that even after the process exits, the resources aren't
> released (i.e., a reference counting bug or race). This suggests a user
> space issue -- that doesn't eliminate a system bug, as it could be a bug
> in a library that manages descriptors, but it also suggests the
> possibility of an application bug, or at least, a poor application
> interaction with a system bug. Occasionally, we've seen bugs in the
> threading libraries that result in leaked descriptors, but my recollection
> is that qmail doesn't use threads. So that suggests either a support
> library (perhaps crypto or the like), or qmail itself. Or that you just
> hit an extremely high load. :-)
>
> In terms of debugging it: your first task is to identify if there's one
> process that's holding all the fd's, or if it is distributed over many
> processes. After that, you want to track down what kind of fd is being
> left open, which may help you track down why it's left open...
>
I'm going to have to take this to the qmail list; people there might be
able to track this down.

Thanks!

-- 
David Benfell, LCP
benfell_at_parts-unknown.org
---
Resume available at http://www.parts-unknown.org/resume.html

Received on Mon Jun 07 2004 - 16:46:03 UTC
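
The first debugging step suggested in the quoted text (finding out whether one process holds most of the descriptors, or whether they are spread over many processes) could be approached roughly as in the sketch below. It is only an illustration, not anything from the thread: it assumes fstat(1)-style output whose first four columns are USER, CMD, PID and FD, with sockets and pipes shown as a number followed by "*", and with command names that contain no spaces. The function names and the SAMPLE text are hypothetical.

```python
from collections import Counter

# Hypothetical sample in the assumed fstat(1) column layout; not real output.
SAMPLE = """\
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
qmails   qmail-send   412 text /         12345 -rwxr-xr-x   90112  r
qmails   qmail-send   412   wd /var        678 drwxr-xr-x     512  r
qmails   qmail-send   412    0 /var        901 -rw-------       0 rw
qmails   qmail-send   412    1* pipe c1a2b3c0 <-> c1a2b400      0 rw
qmaild   qmail-smtpd  530    0* internet stream tcp c2d3e4f0
"""


def count_fds_per_process(fstat_text):
    """Count numbered descriptors per (pid, command) in fstat-style text."""
    counts = Counter()
    for line in fstat_text.splitlines():
        fields = line.split()
        # Assumed layout: USER CMD PID FD ...; this skips the header row
        # (its third field is "PID", not a number) and any short lines.
        if len(fields) < 4 or not fields[2].isdigit():
            continue
        fd = fields[3].rstrip("*")  # assumed socket/pipe marker, e.g. "1*"
        if fd.isdigit():  # ignore non-descriptor rows such as "text", "wd"
            counts[(int(fields[2]), fields[1])] += 1
    return counts


def top_holders(fstat_text, n=5):
    """Return the n (pid, command) pairs holding the most descriptors."""
    return count_fds_per_process(fstat_text).most_common(n)


if __name__ == "__main__":
    for (pid, cmd), held in top_holders(SAMPLE):
        print(pid, cmd, held)
```

On a live system the text would come from running fstat itself rather than from SAMPLE. If one PID dominates the result, looking at the remaining columns for that process should indicate what kind of descriptor (file, pipe, or socket) is being left open, which is the second step described above.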