Re: libthr and 1:1 threading.

From: Matthew Dillon <dillon_at_apollo.backplane.com>
Date: Wed, 2 Apr 2003 17:57:40 -0800 (PST)

:How does this break the read() API?  The read() API, when called
:on a NBIO fd is *supposed* to return EAGAIN, if the request cannot
:be immediately satisfied, but could be satisfied later.  Right now,
:it blocks.  This looks like breakage of disk I/O introducing a
:stall, when socket I/O doesn't.
:
:If this breaks read() semantics, then socket I/O needs fixing to
:unbreak them, right?

    Oh please.  You know very well that every single UNIX out there
    operates on disk files as if their data were immediately available,
    regardless of whether the process blocks in an uninterruptible
    disk wait or not.  What you are suggesting is that we make our
    file interface incompatible with every other UNIX out there... ours
    would return EAGAIN in situations where programs wouldn't expect it.
    Additionally, the EAGAIN behavior would be highly non-deterministic
    and it would be fairly difficult for a program to rely on it because
    there would be no easy way (short of experimentation or a sysctl) for
    it to determine whether the 'feature' is present or not.
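
    To illustrate the kind of program that wouldn't expect it, here is a
    rough sketch of a very common read loop.  The name read_fully() is made
    up for this example.  It retries on EINTR and treats every other error
    as fatal, so an EAGAIN from a disk file would just make it fail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustrative helper (hypothetical name): read up to nbytes from fd.
 * Retries on EINTR; any other error, including EAGAIN, is a hard failure.
 * Returns the number of bytes read (short only at EOF) or -1 on error.
 */
ssize_t
read_fully(int fd, void *buf, size_t nbytes)
{
	size_t done = 0;

	assert(buf != NULL);
	while (done < nbytes) {
		ssize_t n = read(fd, (char *)buf + done, nbytes - done);

		if (n == 0)
			break;			/* end of file */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return (-1);		/* EAGAIN lands here too */
		}
		done += (size_t)n;
	}
	return ((ssize_t)done);
}
```

    Nothing in a loop like this can tell "the data is not cached yet" apart
    from a genuine I/O error, which is the compatibility problem.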

    Also, the idea that the resulting block I/O operation is then queued
    and one returns immediately from way down deep in the filesystem device
    driver code, and that this whole mess is then tied into select()/kqueue()/
    poll(), is just asking for more non-determinism... now it would 
    depend on the filesystem AND the OS supporting the feature, and other
    UNIX implementations (if they were to adopt the mechanism) would likely
    wind up with slightly different semantics, just like O_NONBLOCK on
    listen() sockets has wound up being broken on things like HP-UX.

    For example, how would one deal with, say, issuing a million of these
    special non-blocking read()s, all of which fail?  Do we queue a million
    I/Os?  Do we queue just the last requested I/O?  You see the problem?
    The API would be unstable and almost certainly implemented differently
    on each OS platform.

    A better solution would be to implement a new system call, similar to
    pread(), which simply checks the buffer cache and returns a short read
    or an error if the data is not present.  If the call fails you would
    then know that reading that data would block in the disk subsystem and
    you could back off to a more expensive mechanism like AIO.  If you want
    to select() on it you would then simply use kqueue with EVFILT_AIO and
    AIO.  The call could be a new pread_cache(), or perhaps we could even
    use recvmsg() with a flag.  Such an interface would not have to touch the
    filesystem code, only the buffer cache and the VM page cache, and
    could be implemented in less than a day.
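
    A rough sketch of how a caller might use such a call follows.  It makes
    several assumptions: pread_cache() does not exist, so it is stubbed here
    to always report a cache miss with EWOULDBLOCK (one possible error
    convention); slow_read() and read_cached_or_slow() are made-up names;
    and a plain pread() stands in for the AIO path to keep the example
    short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * HYPOTHETICAL: the proposed call.  It would copy only data already in
 * the buffer cache and never wait on the disk.  This stub always reports
 * a miss so that the fallback path below is exercised.
 */
ssize_t
pread_cache(int fd, void *buf, size_t nbytes, off_t offset)
{
	(void)fd; (void)buf; (void)nbytes; (void)offset;
	errno = EWOULDBLOCK;
	return (-1);
}

/*
 * Stand-in for the expensive path.  A real program would queue an
 * aio_read() here and pick up the completion later (for example through
 * kqueue with EVFILT_AIO); pread() is used only to keep the sketch small.
 */
ssize_t
slow_read(int fd, void *buf, size_t nbytes, off_t offset)
{
	return (pread(fd, buf, nbytes, offset));
}

/*
 * Try the cache first and fall back only for what the cache could not
 * supply.  *went_slow is set to 1 when the expensive path was taken.
 */
ssize_t
read_cached_or_slow(int fd, void *buf, size_t nbytes, off_t offset,
    int *went_slow)
{
	ssize_t n, rest;

	assert(buf != NULL && went_slow != NULL);
	*went_slow = 0;

	n = pread_cache(fd, buf, nbytes, offset);
	if (n >= 0 && (size_t)n == nbytes)
		return (n);		/* fully satisfied from the cache */
	if (n < 0 && errno != EWOULDBLOCK)
		return (-1);		/* a real error, not a cache miss */
	if (n < 0)
		n = 0;			/* miss: nothing copied yet */

	*went_slow = 1;
	rest = slow_read(fd, (char *)buf + n, nbytes - (size_t)n, offset + n);
	if (rest < 0)
		return (n > 0 ? n : -1);
	return (n + rest);
}
```

    The point of this shape is that the decision to block is made by the
    caller, in one place, rather than by whatever filesystem happens to be
    underneath the descriptor.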

						-Matt