Or perhaps it should be just "Here be dragons"... Whilst attempting to
nail down some serious performance issues (compared with 4.x) in
preparation for a 6.x rollout here, we've come across something of a
fundamental bug.

In this particular environment (a Usenet transit server, so very high
network and disk I/O) we observed that processes were spending a
considerable amount of time in state 'wswbuf', traced back to
getpbuf() in vm/vm_pager.c.

To cut a long story short, the order in which nswbuf is initialized is
completely, totally, and utterly wrong -- this was introduced by
revision 1.132 of vm/vnode_pager.c just over 4 years ago. In
vnode_pager.c we find:

	static void
	vnode_pager_init(void)
	{
		vnode_pbuf_freecnt = nswbuf / 2 + 1;
	}

Unfortunately, nswbuf hasn't been assigned yet; it just happens to be
zero (in all cases), so the calculation yields 0 / 2 + 1 == 1 and the
kernel believes that there is only ever *one* swap buffer available.
kern_vfs_bio_buffer_alloc() in kern/vfs_bio.c, which actually does the
calculation and assignment of nswbuf, is called rather further on in
the boot process, by which time the damage has been done. The net
result is that *any* call involving getpbuf() is unconditionally
serialized, completely destroying any kind of concurrency (and
performance).

Given the memory footprint of our machines, we've hacked a simple

	nswbuf = 0x100;

into vnode_pager_init(), since the calculation ends up giving us the
maximum number anyway. There are a number of possible 'correct' fixes
in terms of re-ordering the startup sequence (one possible shape is
sketched in the P.S. below). With the aforementioned hack, we're now
seeing considerably better machine operation, certainly as good as
similar 4.10-STABLE boxes.

As per $SUBJECT, this affects all of RELENG_5, RELENG_6, and HEAD, and
should, IMO, be considered an absolutely required fix for 6.0-RELEASE.

-aDe
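P.S. To be concrete about what "re-ordering" might look like: one
option, rather than poking nswbuf itself, is to leave
vnode_pager_init() alone and redo the vnode_pbuf_freecnt calculation
from a SYSINIT that runs after kern_vfs_bio_buffer_alloc() has
actually assigned nswbuf. The sketch below is untested;
SI_SUB_KTHREAD_BUF is only a guess at a "late enough" slot and would
need verifying against the tree, as would the absence of any
getpbuf() consumers earlier in boot.

	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/kernel.h>

	/* Set for real by kern_vfs_bio_buffer_alloc() in kern/vfs_bio.c. */
	extern int nswbuf;
	/* The pbuf free count that vnode_pager_init() computed too early. */
	extern int vnode_pbuf_freecnt;

	static void
	vnode_pbuf_fixup(void *dummy __unused)
	{
		/*
		 * By now nswbuf holds its real value, so redo the
		 * calculation that vnode_pager_init() performed while
		 * nswbuf was still zero.
		 */
		vnode_pbuf_freecnt = nswbuf / 2 + 1;
	}
	/* Assumption: SI_SUB_KTHREAD_BUF runs well after cpu_startup(). */
	SYSINIT(vnode_pbuf_fixup, SI_SUB_KTHREAD_BUF, SI_ORDER_FIRST,
	    vnode_pbuf_fixup, NULL);

Unlike the 0x100 hack, something of this shape would also do the right
thing on smaller machines where the nswbuf calculation comes out below
the maximum.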