Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

From: Pawel Jakub Dawidek <pjd_at_FreeBSD.org>
Date: Wed, 13 Mar 2013 22:27:50 +0100
On Wed, Mar 13, 2013 at 11:18:36AM -0400, John Baldwin wrote:
> On Tuesday, March 12, 2013 4:16:32 pm Dirk Engling wrote:
> > While debugging my own daemon I noticed that pidfile_open does not
> > perform the appropriate checks for a running daemon if the caller does
> > not provide a pidptr to pidfile_open
> > 
> > fd = flopen(pfh->pf_path,
> > 	    O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NONBLOCK, mode);
> > 
> > fails when another daemon holds the lock and flopen sets errno to
> > EAGAIN, the check 4 lines below in
> > 
> > 		if (errno == EWOULDBLOCK && pidptr != NULL) {
> > 
> > means that the pidfile_read is never executed.
> > 
> > This results in my second daemon receiving an EAGAIN which clearly was
> > meant to report a race condition between two daemons starting at the
> > same time and the first one not yet finishing pidfile_write.
> > 
> > The expected behavior would be to set errno to EEXIST, even if no pidptr
> > was passed.
> 
> Yes, I think it should actually perform the same logic even if pidptr is
> NULL of waiting for the other daemon to finish starting up.  Something like
> this:
> 
> Index: lib/libutil/pidfile.c
> ===================================================================
> --- pidfile.c	(revision 248162)
> +++ pidfile.c	(working copy)
> _at__at_ -100,6 +100,7 _at__at_ pidfile_open(const char *path, mode_t mode, pid_t
>  	struct stat sb;
>  	int error, fd, len, count;
>  	struct timespec rqtp;
> +	pid_t dummy;
>  
>  	pfh = malloc(sizeof(*pfh));
>  	if (pfh == NULL)
> _at__at_ -126,7 +127,9 _at__at_ pidfile_open(const char *path, mode_t mode, pid_t
>  	fd = flopen(pfh->pf_path,
>  	    O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NONBLOCK, mode);
>  	if (fd == -1) {
> -		if (errno == EWOULDBLOCK && pidptr != NULL) {
> +		if (errno == EWOULDBLOCK) {
> +			if (pidptr == NULL)
> +				pidptr = &dummy;
>  			count = 20;
>  			rqtp.tv_sec = 0;
>  			rqtp.tv_nsec = 5000000;

I agree EEXIST should be returned, but I don't like reading existing
pidfile (including waiting for the other process to write its PID) just
to throw read PID away.

How about this patch?

	http://people.freebsd.org/~pjd/patches/pidfile.c.patch

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl

Received on Wed Mar 13 2013 - 20:26:16 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:35 UTC