:All this debate about the merits of process scope threads and fair
:scheduling is great. But tell me, who was working on making this stuff
:work well quickly and reliably (i.e. work well)? No one! I don't care
:what AIX or Solaris or what else may or may not have done, who was
:making this work well for FreeBSD? Having a slow thread subsystem is
:a serious detriment, no matter how nice and flexible it looks on paper.
:
:Scott

I hope Julian won't think badly of me for saying this, but I don't think
M:N support in the kernel is a good idea. M:N is the way to go in my view,
but the 'M' has to be implemented in userland. And, of course, M:N does
*NOT* preclude 1:1. The 'N' is just a number picked out of the ether, after
all, so barring userland support you wind up with 1:1 in the kernel.

There are a couple of people working on LWPs in DragonFly (that is, direct
1:1 kernel support that includes POSIX signal sharing). It is my intention
to take this mechanism, once it is working, and transform it into an M:N
implementation where the LWPs represent the 'N' and the userland thread
library deals with the 'M'.

If one really thinks about it, why is a userland implementation slower than
a KSE implementation, and can it be made more efficient? This is what I
have come up with:

* Extra kevent() calls to register events, extra kevent() calls to poll
  for new events. When the userland thread library issues a non-blocking
  I/O it has to call kevent() to add the descriptor when EWOULDBLOCK is
  returned. When the userland thread scheduler switches threads it needs
  to poll for new events as well.

  But this can be automated. There is no reason why the kernel couldn't
  automatically add a descriptor returning EWOULDBLOCK to a kqueue. Also,
  there is no reason why the kernel couldn't write to a user-supplied
  memory location to notify the userland scheduler that a new kevent is
  pending. (And note that not using EV_ONESHOT in current userland thread
  libraries is NOT an optimal solution to this problem, for reasons that
  should be obvious if you think about it for a few seconds).

* Signal mask handling. The userland thread scheduler needs to block
  signals during certain critical operations and needs to be able to
  adjust the signal mask when switching threads (depending on the scope).
  It must also poll for blocked signals.

  But this doesn't have to be done with system calls, at least not in the
  critical path. There is no reason why userland can't register a signal
  mask pointer and pending signal set with the kernel, where they both
  reside in user memory. So the kernel needs to do a copyin or two when
  processing a signal. Signals so rarely occur that it is extraordinarily
  difficult to justify putting all that overhead in the userland thread
  scheduler's critical path. It makes sense to shift the overhead to the
  actual signal delivery operation. Userland can then adjust the signal
  mask simply by changing a pointer, and poll for blocked signals by
  testing a single variable in memory.

* IPC between threads. Something similar to IPI messaging is needed, where
  the data is passed solely via shared memory. The only system call
  involved would be to queue an upcall to the target (rforked) process.
  This is more a DragonFly-like abstraction though. We are big on cpu
  localization. In an M:N environment, the cpu localization is abstracted
  as the 'N'.

* TLS segment switching. Short of trying to implement a caching scheme in
  the segment descriptor array this probably needs to remain a system
  call. But it isn't very expensive: ~350ns or so on my DragonFly test
  box. Frankly, the kernel can't switch threads all that quickly either.
  It takes the kernel at least 1 uS to switch threads, whereas a userland
  thread switch (including FP), plus the TLS call, winds up being around
  1.2 uS. It really isn't that big a difference. (A rough sketch of the
  kind of per-switch TLS call I mean follows.)
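To make that concrete, here is a rough sketch of the per-switch TLS system
call being described. It uses FreeBSD's amd64_set_fsbase() sysarch wrapper
purely as a stand-in (DragonFly's interface for setting the TLS area is
different), and the uthread structure and swapcontext() usage are only there
to keep the sketch self-contained; a real thread library would use its own
assembly context switch.

/*
 * Rough sketch only.  amd64_set_fsbase() is FreeBSD's sysarch(2) wrapper
 * for setting the %fs segment base and stands in for whatever the native
 * TLS-set call is; struct uthread is hypothetical, and swapcontext() is
 * used only to keep the example self-contained (a real thread library
 * would do its own assembly save/restore of the integer and FP state).
 */
#include <machine/sysarch.h>
#include <ucontext.h>

struct uthread {
        ucontext_t      ctx;            /* saved register + FP state */
        void            *tls_base;      /* this thread's TLS block */
};

static void
uthread_switch(struct uthread *from, struct uthread *to)
{
        /*
         * The one system call in the switch path: repoint the TLS base
         * at the incoming thread's block (~350ns per the figure above).
         */
        amd64_set_fsbase(to->tls_base);

        /* Save the outgoing context and resume the incoming one. */
        swapcontext(&from->ctx, &to->ctx);
}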
* Blocked FILESYSTEM disk I/O. From a performance standpoint blocked disk
  I/O is the biggest issue for an M:N design over a 1:1 design. In fact,
  I think ultimately this is *THE* only issue of any significance.

  It seems to me that the kernel is long, LONG overdue for getting
  filesystem support for O_NONBLOCK. I am NOT talking about AIO here, I
  am talking about making read() or write() to a file in a filesystem
  work efficiently in a threaded environment. Traditionally the kernel
  blocks unconditionally in kernel space for such I/O and does read-ahead
  in 128KB blocks; O_NONBLOCK is ignored. What we want to do is make it
  work with O_NONBLOCK (or perhaps some new flag) in an efficient manner.

  This implies that, with special system calls and/or flags, file I/O
  should be able to return EWOULDBLOCK *AND* should *ALSO* operate
  somewhat like a device when it does so, with the knowledge that the
  user program tried to issue this large read kept intact in the kernel
  so the kernel can do a dependable read-ahead of some of the data (more
  than 128KB... at least 512KB in my view for things to be efficient),
  and then generate an event for that descriptor just like a normal
  device or pipe or socket would. What I am describing here is NOT AIO.
  IMHO AIO as a concept is a complete failure.
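Here is a similarly rough sketch of how a userland thread library might
consume such a facility. The kqueue/kevent plumbing is the existing API;
the part where read() on a regular file honours O_NONBLOCK, returns
EWOULDBLOCK, and later posts an EVFILT_READ event once the kernel's
read-ahead has data is exactly the hypothetical behaviour proposed above
and does not exist today. The helper name is made up.

/*
 * Rough sketch.  The kevent() calls are the existing kqueue API; the
 * EWOULDBLOCK-from-a-regular-file behaviour is the proposal above and
 * is NOT how current kernels behave.  'fd' is assumed to have been
 * opened with O_NONBLOCK (or whatever new flag ends up being used).
 */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <errno.h>
#include <unistd.h>

static ssize_t
threaded_file_read(int kq, int fd, void *buf, size_t len)
{
        struct kevent kev;
        ssize_t n;

        while ((n = read(fd, buf, len)) < 0 && errno == EWOULDBLOCK) {
                /*
                 * Register interest in the descriptor.  With the scheme
                 * described earlier the kernel would do this implicitly
                 * the moment it returned EWOULDBLOCK.
                 */
                EV_SET(&kev, fd, EVFILT_READ, EV_ADD | EV_ONESHOT, 0, 0, NULL);
                if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
                        return (-1);

                /*
                 * A userland thread scheduler would switch to another
                 * runnable thread here and pick the event up on its next
                 * poll; a blocking wait keeps the sketch self-contained.
                 */
                if (kevent(kq, NULL, 0, &kev, 1, NULL) < 0)
                        return (-1);
        }
        return (n);
}

The point of the kernel-side changes described above is that both kevent()
calls fall out of that loop: the registration happens implicitly when
EWOULDBLOCK is returned, and the completion can be noticed by testing a
user-supplied memory location rather than making another system call.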
-Matt
Matthew Dillon <dillon_at_backplane.com>