Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

From: Bruce Evans <bde_at_zeta.org.au>
Date: Mon, 16 Jun 2003 20:48:47 +1000 (EST)

On Mon, 16 Jun 2003, Don Lewis wrote:

> On 16 Jun, I wrote:
> > On 16 Jun, Tim Robbins wrote:
>
> >>> This looks like a bug in the named pipe code. Reverting
> >>> sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go
> >>> away. I haven't tracked down exactly what change between RELENG_5_0 and
> >>> RELENG_5_1 caused the problem.
> >>
> >> Looks like revision 1.86 works, but it stops working with 1.87. Moving the
> >> soclose() calls to fifo_inactive() may have caused it.
> >
> > This is an interesting observation, but I'm not sure why it would make a
> > difference.  I haven't looked at the qmail source, but it looks like it
> > is doing a non-blocking open on the fifo, calling select() on the fd,
> > and hoping that select() waits for a writer to open the fifo before
> > returning with an indication that the descriptor is readable.

In my review of 1.87, I forgot to ask you how atomic the close is with part
of it moved out to fifo_inactive().  I think it's important that all
traces of the old open have gone away (as far as applications can tell)
when the last close returns.

> > It looks like the select code is calling the soreadable() macro to
> > determine if the fifo descriptor is readable, and the soreadable() macro
> > returns a true value if the SS_CANTRCVMORE socket flag is set, which
> > would indicate an EOF condition.

fifo_close() sets this flag and the corresponding send flag on last close,
so there is no direct problem here.

> > ...
> > The posted qmail syscall trace looks like what I would expect to see in
> > the present implementation.  I can't explain why it would behave any
> > differently prior to 1.87 ...
>
> The plot thickens ...
>
> I ran this bit of code on both 5.1 current with version 1.88 of
> fifo_vnops.c, and 4.8-stable:
>
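> #include <sys/types.h>
> #include <sys/time.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int
> main(void)
> {
>         int fd;
>         fd_set readfds;
>
>         /* Non-blocking open returns at once, even with no writer. */
>         fd = open("myfifo", O_RDONLY | O_NONBLOCK);
>
>         printf("before the loop\n");
>         while (1) {
>                 FD_ZERO(&readfds);
>                 FD_SET(fd, &readfds);
>                 /* Nothing is ever read, so a readable fifo makes this spin. */
>                 printf("%d %d\n", fd, select(20, &readfds, NULL, NULL, NULL));
>         }
>         exit(0);
> }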
>
> On 4.8-stable, select() immediately returns a "1", whether or not the
> fifo has ever been opened for writing.
>
> On 5.1-current, select() waits forever, even if the fifo has been opened
> for writing by another process.  Select() only returns when something
> has actually been written to the fifo, and since this process doesn't
> read anything from the fifo, it spins on select() forever.
>
> If some data is getting written to the fifo, it doesn't look like qmail
> consumes it, and since fifo_close in 1.87 doesn't destroy the sockets,
> it looks like the data is hanging around in the fifo while neither end
> is open, and qmail stumbles across this data when it calls select()
> after re-opening the fifo.
>
> Now there are two questions that I can't answer:
>
> 	Why is my analysis of select() and the SS_CANTRCVMORE flag
>         incorrect in 5.1-current with version 1.87 or 1.88 of
>         fifo_vnops.c?

I think it is correct, assuming that something writes to the fifo.
Writing might be part of synchronization but actually reading the
data should not be necessary since the last close must discard the
data (POSIX spec).

> 	Why doesn't qmail get stuck in a similar loop in 4.8-stable,
>         since select always returns true for reading on a fifo with no
>         writers?

Don't know.  Maybe it uses autoconfig to handle the 4.8 behaviour.
The 4.8 behaviour is normal compared with the buggy behaviour of
not discarding data on last close, so applications should handle it
better :-).  Maybe qmail spins under 4.8 too, but only until
synchronization is achieved.

Bruce
Received on Mon Jun 16 2003 - 01:48:53 UTC
