Re: LOR on nfs: vfs_vnops.c:301 kern_descrip.c:1580

From: John Baldwin <jhb_at_freebsd.org>
Date: Thu, 19 Aug 2010 09:34:27 -0400
On Thursday, August 19, 2010 5:29:25 am pluknet wrote:
> On 19 August 2010 00:07, John Baldwin <jhb_at_freebsd.org> wrote:
> > On Wednesday, August 18, 2010 3:17:56 pm pluknet wrote:
> >> On 18 August 2010 23:11, pluknet <pluknet_at_gmail.com> wrote:
> >> > On 18 August 2010 17:46, Kostik Belousov <kostikbel_at_gmail.com> wrote:
> >> >> On Wed, Aug 18, 2010 at 02:43:19PM +0400, pluknet wrote:
> >> >>> On 18 August 2010 12:07, pluknet <pluknet_at_gmail.com> wrote:
> >> >>> > On 17 August 2010 20:04, Kostik Belousov <kostikbel_at_gmail.com> wrote:
> >> >>> >
> >> >>> >>
> >> >>> >> Also please take a note of the John' suggestion to use the taskqueue.
> >> >>> >
> >> >>> > I decided to go this road. Thank you both.
> >> >>> > Now I do nfs buildkernel survive and prepare some benchmark results.
> >> >>> >
> >> >>>
> >> >>> So, I modified the patch to defer proc_create() with taskqueue(9).
> >> >>> Below is `time make -j5 buildkernel WITHOUT_MODULES=yes` perf.
> > evaluation.
> >> >>> Done on 4-way CPU on clean /usr/obj with /usr/src & /usr/obj both
> >> >>> nfs-mounted over 1Gbit LAN.
> >> >>>
> >> >>> clean old
> >> >>> 1137.985u 239.411s 7:42.15 298.0%       6538+2133k 87+43388io 226pf+0w
> >> >>>
> >> >>> clean new
> >> >>> 1134.755u 240.032s 7:41.25 298.0%       6553+2133k 87+43367io 224pf+0w
> >> >>>
> >> >>> Patch needs polishing, though it generally works.
> >> >>> Not sure if shep_chan (or whatever name it will get) needs locking.
> >> >> As I said yesterday, if several requests to create nfsiod coming one
> >> >> after another, you would loose all but the last.
> >> >>
> >> >> You should put the requests into the list, probably protected by
> >> >> nfs_iod_mtx.
> >> >>
> >> >
> >> > How about this patch? Still several things to ask.
> >> > 1) I used malloc instance w/ M_NOWAIT, since it's called with nfs_iod_mtx
> > held.
> >> > 2) Probably busy/done gymnastics is a wrong mess. Your help is
> > appreciated.
> >> > 3) if (1) is fine, is it right to use fail: logic (i.e. set
> >> > NFSIOD_NOT_AVAILABLE)
> >> > on memory shortage? Not tested.
> >> >
> >> > There are debug printf() left intentionally to see how 3 contexts run
> > under load
> >> > to each other. I attached these messages as well if that makes sense.
> >> >
> >>
> >> Ah, yes. Sorry, forgot about that.
> >
> > Your task handler needs to run in a loop until the list of requests is empty.
> > If multiple threads call taskqueue_enqueue() before your task is scheduled,
> > they will be consolidated down to a single call of your task handler.
> 
> Thanks for your suggestions.
> 
> So I converted nfs_nfsiodnew_tq() into loop, and as such
> I changed a cleanup SLIST_FOREACH() as well to free a bulk of
> (potentially consolidated) completed requests in one pass.
> kproc_create() is still out of the SLIST_FOREACH() to avoid calling it
> with nfs_iod_mtx held.
> 
> >
> > -       int error, i;
> > -       int newiod;
> > +       int i, newiod, error;
> >
> > You should sort these alphabetically if you are going to change this.  I would
> > probably just leave it as-is.
> 
> Err.. that's resulted after a set of changes. Thanks for noting that.
> 
> >
> > Also, I'm not sure how this works as you don't actually wait for the task to
> > complete.  If the taskqueue_enqueue() doesn't preempt, then you may read
> > ni_error as zero before the kproc has actually been created and return success
> > even though an nfsiod hasn't been created.
> >
> 
> I put taskqueue_drain() right after taskqueue_enqueue() to be in sync with
> task handler. That was part to think about, as I didn't find such a use pattern.
> It seems though that tasks are launched now strictly one-by-one, without
> visible parallelism (as far as debug printf()s don't overlap anymore).

Ah, if it is safe to sleep then I have a couple of suggestions:

- Use M_WAITOK to malloc() so you don't have to worry about the failure case
  (I see Rick already suggested this).

- Use something like this in the code that schedules the task:

	mtx_unlock(&nfs_iod_mtx);
	nip = malloc(..., M_WAITOK);
	/* Initialize nip. */
	mtx_lock(&nfs_iod_mtx);
	SLIST_INSERT_HEAD(...);	/* Maybe use STAILQ_INSERT_TAIL() instead? */
	taskqueue_enqueue(...);
	while (!nip->done)
		mtx_sleep(nip, &nfs_iod_mtx, ...);
	mtx_unlock(&nfs_iod_mtx);
	error = nip->ni_error;
	free(nip);
	/* Existing if (error), etc. code */

and something like this in the task handler:

	mtx_lock(&nfs_iod_mtx);
	while ((nip = STAILQ_FIRST(...)) != NULL) {
		STAILQ_REMOVE_HEAD(...);
		mtx_unlock(&nfs_iod_mtx);
		/* Create thread, setting ni_error, etc. */
		mtx_lock(&nfsd_iod_mtx);
		wakeup(nip);
	}
	mtx_unlock(&nfs_iod_mtx);

The sleep/wakeup pattern is far more common than using taskqueue_drain() for
this sort of thing.  Using STAILQ instead of SLIST would give you "fairer"
FIFO processing btw.

-- 
John Baldwin
Received on Thu Aug 19 2010 - 11:35:29 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:06 UTC