Possible bug in or around posix_fadvise after r292326

From: Benno Rice <benno_at_FreeBSD.org>
Date: Mon, 4 Jan 2016 22:05:21 -0800

Hi Konstantin,

I recently updated my dev box to r292962. After doing this I attempted to set up PostgreSQL 9.4. When I ran initdb the last phase hung. Using procstat -kk I found it appeared to be stuck in a loop inside a posix_fadvise syscall. I could not ^C or ^Z the initdb process. I could kill it but a subsequent attempt to rm -rf the /usr/local/pgsql/data directory also got stuck and was unkillable by any means. Rebooting allowed me to remove the directory but the initdb process still hung when I re-ran it.

I tried PostgreSQL 9.3 with similar results.

Looking at the source code for initdb I found that it calls posix_fadvise like so[1]:

     /*
      * We do what pg_flush_data() would do in the backend: prefer to use
      * sync_file_range, but fall back to posix_fadvise.  We ignore errors
      * because this is only a hint.
      */
 #if defined(HAVE_SYNC_FILE_RANGE)
     (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
 #elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
     (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
 #else
 #error PG_FLUSH_DATA_WORKS should not have been defined
 #endif
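
A standalone test along the following lines might help separate the syscall from the rest of initdb. This is only a minimal sketch and not part of the original report: the function name fadvise_dontneed_probe, the 8 kB block size, and the choice to skip fsync before the hint are all assumptions, and it has not been confirmed to trigger the hang.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical probe (not taken from initdb): write nblocks blocks of
 * data to path, then issue the same advisory call that initdb makes.
 *
 * Returns 0 if posix_fadvise succeeded, the error number returned by
 * posix_fadvise if it failed, or -1 if the file could not be set up.
 */
int
fadvise_dontneed_probe(const char *path, size_t nblocks)
{
	char block[8192];	/* assumed block size, chosen arbitrarily */
	size_t i;
	int fd, rc;

	fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
	if (fd < 0)
		return (-1);

	memset(block, 'x', sizeof(block));
	for (i = 0; i < nblocks; i++) {
		if (write(fd, block, sizeof(block)) != (ssize_t)sizeof(block)) {
			(void)close(fd);
			(void)unlink(path);
			return (-1);
		}
	}

	/*
	 * Deliberately no fsync here, on the assumption that dirty
	 * buffers being present matters.  Offset 0 and length 0 mean
	 * "the whole file", matching the initdb call.
	 */
	rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	(void)close(fd);
	(void)unlink(path);
	return (rc);
}
```

Called from a small main() with a scratch path on the affected filesystem, a return value of 0 means the hint was accepted. If the theory about the syscall looping is right, the call would presumably never return on an affected kernel, and procstat -kk on the test process should show the same stack as initdb.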

Looking for recent commits involving POSIX_FADV_DONTNEED I found r292326:

https://svnweb.freebsd.org/changeset/base/292326

Backing this revision out allowed the initdb process to complete.

My current theory is that somehow we’re getting ENOLCK or EAGAIN from the BUF_TIMELOCK call in bnoreuselist:

https://svnweb.freebsd.org/base/head/sys/kern/vfs_subr.c?view=annotate#l1676

Leading to an infinite loop in vop_stdadvise:

https://svnweb.freebsd.org/base/head/sys/kern/vfs_default.c?annotate=292373#l1083

I haven’t managed to dig any deeper than that yet.

Is there any other information I could give you to help narrow this down?

Thanks,
	Benno.

[1] http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/initdb/initdb.c;h=35e39ce4b31b2f437d6e28eaf90500a22d229c6a;hb=HEAD#l631
Received on Tue Jan 05 2016 - 05:05:25 UTC
