Re: libthr and 1:1 threading.

From: Terry Lambert <tlambert2_at_mindspring.com>
Date: Wed, 02 Apr 2003 23:24:03 -0800

Matthew Dillon wrote:
> :How does this break the read() API?  The read() API, when called
> :on a NBIO fd is *supposed* to return EAGAIN, if the request cannot
> :be immediately satisfied, but could be satisfied later.  Right now,
> :it blocks.  This looks like breakage of disk I/O introducing a
> :stall, when socket I/O doesn't.
> :
> :If this breaks read() semantics, then socket I/O needs fixing to
> :unbreak them, right?
> 
>     Oh please.  You know very well that every single UNIX out there
>     operates on disk files as if their data was immediately available
>     regardless of whether the process blocks in an uninterruptible
>     disk wait or not.

False.  SVR4.0.2 and SVR4.2 do not.  They act as I describe.  The
code was written by a guy named Steve Baumel.


>     What you are suggesting is that we make our file interface
>     incompatible with every other unix out there... ours will
>     return EAGAIN in situations where programs wouldn't expect it.

According to the FreeBSD 5.x man page for read(2):

     [EAGAIN]           The file was marked for non-blocking I/O, and no data
                        were ready to be read.

...in other words, if they mark it for non-blocking I/O, they
had *better* expect it!
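
As a minimal sketch of what "expecting it" looks like in a caller,
assuming only the documented read(2) and fcntl(2) behavior: the
helper names set_nonblocking() and try_read() and the TRY_AGAIN
constant are hypothetical, made up for illustration.  Both errno
values are checked because they are the same number on many
systems (FreeBSD among them) but are not required to be.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sentinel: "nothing ready yet, come back later". */
#define TRY_AGAIN (-2)

/* Hypothetical helper: mark fd for non-blocking I/O.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: make one read attempt.
 * Returns the byte count (>= 0), TRY_AGAIN if the descriptor had
 * nothing ready, or -1 for any other error (errno still set). */
static ssize_t try_read(int fd, void *buf, size_t nbytes)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, nbytes);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted */

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return TRY_AGAIN;
    return n;
}
```

A caller that gets TRY_AGAIN goes off and does other work, then
retries; nothing in that logic depends on what kind of object the
descriptor refers to.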

And at least /usr/src/lib/libc_r/uthread/uthread_read[v].c expects
it from the kernel.

                /* Perform a non-blocking read syscall: */
                while ((ret = __sys_read(fd, buf, nbytes)) < 0) {
                        if ((_thread_fd_getflags(fd) & O_NONBLOCK) == 0 &&  
                            (errno == EWOULDBLOCK || errno == EAGAIN)) { 
                                curthread->data.fd.fd = fd;
                                _thread_kern_set_timeout(NULL);


The kernel also certainly expects, if not EAGAIN, EWOULDBLOCK:

        if ((error = fo_read(fp, &auio, td->td_ucred, flags, td))) {
                if (auio.uio_resid != cnt && (error == ERESTART ||
                    error == EINTR || error == EWOULDBLOCK))
                        error = 0;
        }
        cnt -= auio.uio_resid;
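
The effect of that check can be modelled in user space.  This is a
sketch only, not kernel code: struct read_result and settle_read()
are hypothetical names, and ERESTART is left out because it is not
a portable user-space errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical result type for the model below. */
struct read_result {
    int    error;        /* 0 on success, else an errno value  */
    size_t transferred;  /* bytes actually moved to the caller */
};

/* Model of the excerpt: cnt bytes were requested and resid bytes
 * were still outstanding when the lower layer returned `error`. */
static struct read_result settle_read(int error, size_t cnt, size_t resid)
{
    struct read_result r;

    assert(resid <= cnt);
    /* If some data moved before an interruptible or would-block
     * error, report a short read instead of the error. */
    if (error != 0 && resid != cnt &&
        (error == EINTR || error == EWOULDBLOCK))
        error = 0;

    r.error = error;
    r.transferred = cnt - resid;
    return r;
}
```

With nothing transferred, EWOULDBLOCK passes through unchanged;
with a partial transfer it becomes a successful short read.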


>     Additionally, the EAGAIN operation would be highly non-deterministic
>     and it would be fairly difficult for a program to rely on it because
>     there would be no easy way (short of experimentation or a sysctl) for
>     it to determine whether the 'feature' is present or not.

???  It's in the man page!  You *must* handle it, if it's in the
man page!

You know that there are a number of VOP_CLOSE routines that can
return EAGAIN, right?  Including ufs_close().


>     Also, the idea that the resulting block I/O operation is then queued
>     and one returns immediately from way down deep in the filesystem device
>     driver code, and that this whole mess is then tied into select()/kqueue()/
>     poll(), is just asking for more non-determinism... now it would
>     depend on the filesystem AND the OS supporting the feature, and other
>     UNIX implementations (if they were to adopt the mechanism) would likely
>     wind up with slightly different semantics, just like O_NONBLOCK on
>     listen() sockets has wound up being broken on things like HPUX.

No.  It creates no obligations on the part of applications or other
UNIX implementations which are not already there.  It doesn't break
POSIX semantics.


>     For example, how would one deal with, say, issuing a million of these
>     special non-blocking reads() all of which fail.  Do we queue a million
>     I/Os?  Do we queue just the last requested I/O?  You see the problem?
>     The API would be unstable and almost certainly implemented differently
>     on each OS platform.

They aren't "special".  You handle them by issuing an EAGAIN, if
they can't be immediately satisfied.  Just like the man page says.

I don't think you are understanding.  This is not a replacement
for AIO.  It's a way of touching pages to force them into the
buffer cache, rather than blocking.  It's permitted by POSIX
for read(2) to return EAGAIN to do this.

There's no requirement on the queuing of the I/O.  I'd suggest
that you not attempt more than one at a time on a descriptor,
though: it won't do anything for you, because each one that
fails also fails to use the "resid" value to update the file
pointer.

So if you issue a million of these for one page, well... you've
just asked for the same page to be loaded into memory a million
times, because the read(2) system call doesn't advance the file
pointer except by the amount of its non-negative return value.
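
The file pointer rule is easy to observe on an ordinary file.  A
minimal sketch, assuming a writable /tmp; open_scratch_file() and
offset_delta_for_read() are hypothetical helper names.  It only
exercises successful reads and end-of-file, where the offset moves
by exactly the return value.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked scratch file holding
 * `len` bytes of `data`, rewound to offset 0.  Returns an fd, or
 * -1 on error.  Assumes /tmp exists and is writable. */
static int open_scratch_file(const char *data, size_t len)
{
    char path[] = "/tmp/readoff-XXXXXX";
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);                       /* file vanishes on close */
    if (write(fd, data, len) != (ssize_t)len ||
        lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: do one read() and report how far the file
 * offset moved.  The read's own return value is stored in *ret.
 * Returns -1 if the offset could not be queried. */
static off_t offset_delta_for_read(int fd, void *buf, size_t nbytes,
                                   ssize_t *ret)
{
    off_t before, after;

    assert(ret != NULL);
    before = lseek(fd, 0, SEEK_CUR);
    if (before == (off_t)-1)
        return -1;
    *ret = read(fd, buf, nbytes);
    after = lseek(fd, 0, SEEK_CUR);
    if (after == (off_t)-1)
        return -1;
    return after - before;
}
```

On a 10-byte file, a 4-byte read moves the offset by 4, a following
16-byte request moves it by the 6 bytes that remain, and a read at
end-of-file returns 0 and moves it not at all.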

8-) 8-).


>     A better solution would be to implement a new system call, similar to
>     pread(), which simply checks the buffer cache and returns a short read
>     or an error if the data is not present.   If the call fails you would
>     then know that reading that data would block in the disk subsystem and
>     you could back off to a more expensive mechanism like AIO.  If you want
>     to select() on it you would then simply use kqueue with EVFILT_AIO and
>     AIO.  A system call pread_cache(), or perhaps we could even use
>     recvmsg() with a flag.  Such an interface would not have to touch the
>     filesystem code, only the buffer cache and the VM page cache, and
>     could be implemented in less than a day.

The pread(2) call isn't even really supported in libc_r.

You might as well call this function read(2); I think the same
amount of time would be necessary in both cases, and there's no
reason to introduce yet another system call.

People reading fd's are not supposed to care what they point to
under the covers, only about POSIX semantics.

I'm not really convinced by your argument that some people might
be ignoring the manual page and the POSIX.1 standard, and so be
unduly surprised by EAGAIN from a read(2) call on an fd open to
an FS file instead of some other fd to a socket or FIFO or some
other FS object.

-- Terry
Received on Wed Apr 02 2003 - 21:25:32 UTC
