Re: sysctl kern.ipc.somaxconn limit 65535 why?

From: Chuck Swiger <cswiger_at_mac.com>
Date: Wed, 04 Jan 2012 15:24:28 -0800

Hi--

On Jan 4, 2012, at 2:23 PM, Dan The Man wrote:
>> It is not arbitrary.  Systems ought to provide sensible limits, which can be adjusted if needed and appropriate.  The fact that a system might have 50,000 file descriptors globally available does not mean that it would be OK for any random process to consume half of them, even if there is still adequate room left for other tasks.  It's common for "ulimit -n" to be set to 256 or 1024.
> 
> Sensible limits mean a sensible stock default, not an OS-imposed limit on what an admin/developer can set on his own hardware.

In point of fact, protocols like TCP/IP impose limits on what is possible.  It is the job of the OS to say "no" when a developer asks for a TTL of a million via setsockopt(), because RFC 791 makes the "time to live" field 8 bits wide, which limits its maximum value to 255.
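As a small illustration of that limit (a sketch, not a claim about how any particular kernel validates setsockopt() arguments): the IPv4 header reserves a single octet for the TTL, so a value above 255 cannot even be encoded.  The helper names encode_ttl and ttl_fits are hypothetical.

```python
import struct

def encode_ttl(ttl: int) -> bytes:
    """Pack a TTL into the one octet the IPv4 header reserves for it."""
    # "!B" = network byte order, one unsigned 8-bit integer (0..255).
    return struct.pack("!B", ttl)

def ttl_fits(ttl: int) -> bool:
    """Return True if the value can be represented in the TTL field."""
    try:
        encode_ttl(ttl)
        return True
    except struct.error:
        # struct refuses values outside 0..255 for the "B" format.
        return False

print(ttl_fits(255))        # True
print(ttl_fits(1_000_000))  # False
```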

> With the new IBM developments underway of 16-core Atom processors and hundreds of gigabytes of memory, surely a backlog of 100k is manageable.  Or what about the future of 500-core systems with a terabyte of memory?  A 100k listen queue could be processed instantly.

Um.  I gather you don't have much background in operating system design or massively parallelized systems?

Due to locking constraints imposed by whatever synchronization mechanism and communications topology is employed between cores, you cannot simply add more processors to a system and expect it to go faster in a linear fashion.  Having 500 cores contending over a single queue is almost certain to result in horrible performance.  Even though the problem of a bunch of independent requests is "embarrassingly parallelizable", you exploit that by partitioning the queue into multiple pieces that are fed to different groups or pools of processors, to minimize contention over a single data structure.
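A minimal sketch of that partitioning idea, under stated assumptions: the function name process_partitioned, the shard count, and the "double the request id" stand-in for real work are all hypothetical.  It only shows the structure (one queue and one result list per worker, so no worker contends with another over shared state); Python threads will not demonstrate a real multi-core speedup.

```python
import queue
import threading

NUM_SHARDS = 4  # hypothetical: one queue per worker (or worker pool)

def process_partitioned(requests, num_shards=NUM_SHARDS):
    """Spread requests over independent queues, each drained by one worker."""
    shards = [queue.Queue() for _ in range(num_shards)]
    # Each result list is written by exactly one worker, so it needs no lock.
    results = [[] for _ in range(num_shards)]

    def worker(idx):
        while True:
            item = shards[idx].get()
            if item is None:  # sentinel: this shard has no more work
                break
            results[idx].append(item * 2)  # stand-in for handling a request

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_shards)]
    for t in threads:
        t.start()
    for req in requests:
        shards[req % num_shards].put(req)  # partition by request id
    for q in shards:
        q.put(None)
    for t in threads:
        t.join()
    return results
```

Partitioning by request id is only one possible choice; a real server might partition by connection hash or by the CPU that received the packet.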

>> Yes.  If the system doesn't handle connectivity problems via something like exponential backoff, then the weak point is poor software design and not FreeBSD being unwilling to set the socket listen queue to a value in the hundreds of thousands.
> 
> I think what Arnaud and I are trying to say here is: let FreeBSD use a sensible default value, but let the admin dictate the actual policy if he so chooses to change it for stress testing, future proofing or anything else.

FreeBSD does provide a sensible default value for the listen queue size.  It's tunable up to about 1000 times larger, and that maximum is large enough to hold a backlog of several minutes' worth of connections, assuming you can process requests at a very high rate to keep draining the queue.

There probably isn't a reasonable use case for queuing unprocessed requests for longer than MAXTTL, which is about 4 minutes.  So, it's conceivable in theory for a high-volume server to want to set the listen queue to, say, 1000 req/s * 255 (i.e., MAXTTL).  But I manage high-volume servers for a living, and practical experience, including measurements of latency and service performance, suggests that a listen queue on the order of a thousand or so is the inflection point.  Beyond that, it is better (and necessary) for the software to recognize overload and start mitigating it than it is for the OS to blindly queue more requests.
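To make the arithmetic concrete, here is a small sketch using only the numbers from this thread: MAXTTL treated as 255 seconds, and the 65535 limit from the subject line.  The function names are hypothetical.

```python
MAXTTL_SECONDS = 255       # treated as seconds here, i.e. about 4 minutes
SOMAXCONN_CEILING = 65535  # the limit named in the subject line

def backlog_needed(requests_per_second: int,
                   seconds: int = MAXTTL_SECONDS) -> int:
    """Queue depth required to hold `seconds` worth of arrivals."""
    return requests_per_second * seconds

def sustainable_rate(backlog: int, seconds: int = MAXTTL_SECONDS) -> float:
    """Arrival rate at which `backlog` entries represent `seconds` of waiting."""
    return backlog / seconds

print(backlog_needed(1000))                 # 255000
print(sustainable_rate(SOMAXCONN_CEILING))  # 257.0
```

In other words, the theoretical 1000 req/s case would need a queue of 255,000 entries, while a full 65535-entry queue already represents a MAXTTL-long wait at about 257 req/s.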

Put more simply, there comes a point where saying "no", i.e., dropping the connection with a reset, works better.
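A rough application-level sketch of saying "no", assuming a hypothetical BoundedBacklog class: once the pending queue reaches its limit, new requests are refused immediately instead of being queued behind work that may never be served in time.  This models the policy only; it is not how the kernel listen queue is implemented.

```python
from collections import deque

class BoundedBacklog:
    """Hypothetical bounded queue that rejects new work when it is full."""

    def __init__(self, limit: int):
        self.limit = limit
        self.pending = deque()
        self.rejected = 0

    def offer(self, request) -> bool:
        """Queue the request, or refuse it at once if the backlog is full."""
        if len(self.pending) >= self.limit:
            self.rejected += 1  # the caller would reset the connection here
            return False
        self.pending.append(request)
        return True

    def take(self):
        """Remove and return the oldest pending request, or None if empty."""
        return self.pending.popleft() if self.pending else None
```

A limit on the order of a thousand would match the inflection point described above; the right value for any given service is something to measure rather than assume.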

Regards,
-- 
-Chuck