Re: Much improved sendfile(2) kernel implementation

From: Gleb Smirnoff <glebius_at_FreeBSD.org>
Date: Thu, 21 Sep 2006 15:44:31 +0400
  Andre,

On Wed, Sep 20, 2006 at 11:59:13PM +0200, Andre Oppermann wrote:
A> I have rewritten kern_sendfile() to work in two loops, the inner which turns
A> as many pages into mbufs as it can up to the free send socket buffer space.
A> The outer loop then drops the whole mbuf chain into the send socket buffer,
A> calls tcp_output() on it and then waits until 50% of the socket buffer are
A> free again to repeat the cycle.  This way tcp_output() gets the full amount
A> of data to work with and can issue up to 64K sends for TSO to chop up in the
A> network adapter without using any CPU cycles.  Thus it gets very efficient
A> especially with the readahead the VM and I/O system do.
A> 
A> The patch is available here:
A>  http://people.freebsd.org/~andre/sendfile-20060920.diff

I see that mbuf allocations are done in different way depending on the SS_NBIO
flag on socket. This isn't true for the current code, so your patch isn't
only opmization, but is also changing the semantics of the syscall. This should
be avoided in single change/commit.

The non-blocking IO definition is related to the blocking on the socket buffer.
It isn't related to blocking on memory, so this change is incorrect:

-			m_header = m_uiotombuf(hdr_uio, M_DONTWAIT, 0, 0);
-			if (m_header == NULL)
+			m = m_uiotombuf(hdr_uio, (nbio ? M_NOWAIT : M_WAITOK),
+			    0, 0);
+			if (m == NULL) {
+				error = nbio ? EAGAIN : ENOBUFS;

There should be unconditional M_NOWAIT. Oops, the M_DONTWAIT in the current
code is incorrect. It is present since rev. 1.171. If the m_uiotombuf() fails
the current code returns from syscall without error! Before rev. 1.171, there
wasn't m_uiotombuf(), the mbuf header was allocated below, with correct wait
argument.

The wait argument for m_uiotombuf() should be changed to M_WAITOK, but in
a separate commit.

And this one:

+			m0 = m_get((nbio ? M_NOWAIT : M_WAITOK), MT_DATA);
+			if (m0 == NULL) {
+				error = (nbio ? EAGAIN : ENOBUFS);
+				sf_buf_mext((void *)sf_buf_kva(sf), sf);
+				break;
+			}

This one should be M_WAITOK always. It is M_TRYWAIT (equal to M_WAITOK) in
the current code.

Is there RELENG_6 version of your patch. If there is one, we can test it
quite extensively.

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
Received on Thu Sep 21 2006 - 09:44:44 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:00 UTC