Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

From: Don Lewis <truckman_at_freebsd.org>
Date: Mon, 16 Jun 2003 04:09:22 -0700 (PDT)

On 16 Jun, Bruce Evans wrote:
> On Mon, 16 Jun 2003, Don Lewis wrote:
> 
>> On 16 Jun, I wrote:
>> > On 16 Jun, Tim Robbins wrote:
>>
>> >>> This looks like a bug in the named pipe code. Reverting
>> >>> sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go
>> >>> away. I haven't tracked down exactly what change between RELENG_5_0 and
>> >>> RELENG_5_1 caused the problem.
>> >>
>> >> Looks like revision 1.86 works, but it stops working with 1.87. Moving the
>> >> soclose() calls to fifo_inactive() may have caused it.
>> >
>> > This is an interesting observation, but I'm not sure why it would make a
>> > difference.  I haven't looked at the qmail source, but it looks like it
>> > is doing a non-blocking open on the fifo, calling select() on the fd,
>> > and hoping that select() waits for a writer to open the fifo before
>> > returning with an indication that the descriptor is readable.
> 
> In my review of 1.87, I forgot to ask you how atomic the close is with part
> of it moved out to fifo_inactive().  I think it's important that all
> traces of the old open have gone away (as far as applications can tell)
> when the last close returns.

I hadn't taken queued data into consideration.  Now that I've looked at
this more closely, there are other problems in both the old and new
code.  If a process calls fcntl(fd, F_SETOWN, ...) on one end of the
fifo, that should be undone when that end of the fifo is closed.  In the
old implementation, that only happens when both ends of the fifo are
closed and the sockets are deleted.
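
A minimal sketch of how the F_SETOWN case could be checked from userland.
This is not code from fifo_vnops.c; the helper name fifo_owner_probe() and
its arguments are made up for illustration.  It keeps the write end open
so that the fifo as a whole is never fully closed, sets an owner on the
read end, closes and reopens the read end, and reports the owner seen
before and after.  If the owner were undone on close, the second value
would be expected to be 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): creates a fifo at
 * `path`, which must not already exist, and removes it again when done.
 * Stores the F_GETOWN result before and after the read end is closed
 * and reopened.  Returns 0 on success, -1 on any error.
 */
int fifo_owner_probe(const char *path, pid_t *before, pid_t *after)
{
    int rfd = -1, wfd = -1, ret = -1;

    if (mkfifo(path, 0600) == -1)
        return -1;

    /* Non-blocking so the open does not wait for a writer. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd == -1)
        goto out;
    /* Keep the write end open for the whole probe. */
    wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (wfd == -1)
        goto out;

    if (fcntl(rfd, F_SETOWN, getpid()) == -1)
        goto out;
    *before = fcntl(rfd, F_GETOWN);

    /* Close only the read end, then open it again. */
    close(rfd);
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd == -1)
        goto out;
    *after = fcntl(rfd, F_GETOWN);
    ret = 0;
out:
    if (rfd != -1)
        close(rfd);
    if (wfd != -1)
        close(wfd);
    unlink(path);
    return ret;
}
```

Call it with a scratch path and compare *after against 0; a value equal
to the caller's pid would mean the owner survived the close of that end.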


>> On 5.1-current, select() waits forever, even if the fifo has been opened
>> for writing by another process.  Select() only returns when something
>> has actually been written to the fifo, and since this process doesn't
>> read anything from the fifo, it spins on select() forever.
>>
>> If some data is getting written to the fifo, it doesn't look like qmail
>> consumes it, and since fifo_close in 1.87 doesn't destroy the sockets,
>> it looks like the data is hanging around in the fifo while neither end
>> is open, and qmail stumbles across this data when it calls select()
>> after re-opening the fifo.
>>
>> Now there are two questions that I can't answer:
>>
>> 	Why is my analysis of select() and the SS_CANTRCVMORE flag
>>         incorrect in 5.1-current with version 1.87 or 1.88 of
>>         fifo_vnops.c?
> 
> I think it is correct, assuming that something writes to the fifo.
> Writing might be part of synchronization but actually reading the
> data should not be necessary since the last close must discard the
> data (POSIX spec).

It sure looks to me like SS_CANTRCVMORE is always set when the write end
of the fifo is closed, whether the sockets were freshly allocated by a
fifo_open() call on the read end of the fifo or the last writer has just
closed the write end of the fifo.  It also looks like select() should
return immediately if this flag is set, but it is not returning ...

Actually, something seems broken.  I modified my little test program to
actually read the data, which works just fine, but select() still blocks
when the writer closes the fifo, so there doesn't seem to be a way to
detect the EOF.
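
For anyone who wants to reproduce this, here is a rough sketch of such a
test.  It is not the exact program referred to above; the names
readable_within(), struct fifo_probe and probe_fifo() are invented for
illustration, and the 200 ms timeout is an arbitrary choice.  It records
what select() says at each stage (no writer yet, data queued, writer
closed) so the results can be compared between kernels.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if select() reports fd readable within timeout_ms,
 * 0 on timeout, -1 on error. */
int readable_within(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}

struct fifo_probe {
    int sel_no_writer;    /* select() before any writer has opened */
    int sel_with_data;    /* select() after one byte was written */
    ssize_t nread;        /* result of reading that byte */
    int sel_after_close;  /* select() after the writer closed */
    ssize_t nread_eof;    /* read() after that close; 0 means EOF */
};

/* Creates a fifo at `path` (must not exist), runs the stages above,
 * and removes the fifo.  Returns 0 on success, -1 on setup failure. */
int probe_fifo(const char *path, struct fifo_probe *p)
{
    char buf[16];
    int rfd = -1, wfd = -1, ret = -1;

    if (mkfifo(path, 0600) == -1)
        return -1;
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd == -1)
        goto out;
    p->sel_no_writer = readable_within(rfd, 200);

    wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (wfd == -1 || write(wfd, "x", 1) != 1)
        goto out;
    p->sel_with_data = readable_within(rfd, 200);
    p->nread = read(rfd, buf, sizeof(buf));

    close(wfd);
    wfd = -1;
    p->sel_after_close = readable_within(rfd, 200);
    p->nread_eof = read(rfd, buf, sizeof(buf));
    ret = 0;
out:
    if (wfd != -1)
        close(wfd);
    if (rfd != -1)
        close(rfd);
    unlink(path);
    return ret;
}
```

The interesting fields are sel_no_writer and sel_after_close.  The
behaviour described above corresponds to sel_after_close coming back 0
(timeout) even though nread_eof is 0, i.e. read() sees the EOF but
select() never announces it.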

>> 	Why doesn't qmail get stuck in a similar loop in 4.8-stable,
>>         since select always returns true for reading on a fifo with no
>>         writers?
> 
> Don't know.  Maybe it uses autoconfig to handle the 4.8 behaviour.
> The 4.8 behaviour is normal compared with the buggy behaviour of
> not discarding data on last close, so applications should handle it
> better :-).  Maybe qmail spins under 4.8 too, but only until
> synchronization is achieved.
> 
> Bruce
Received on Mon Jun 16 2003 - 02:09:32 UTC
