Re: jumbograms (& em) & nfs a no go

From: Michal Mertl <mime_at_traveller.cz> Date: Fri, 31 Oct 2003 13:43:20 +0100 (CET)

On Thu, 30 Oct 2003, Doug Ambrisko wrote:

> Michal Mertl writes:
> | On Thu, 30 Oct 2003, Sam Leffler wrote:
> |
> | > On Thursday 30 October 2003 04:46 am, Michal Mertl wrote:
> | > > I wanted to test gigabit network performance and found out that current
> | > > (from 5.0 to up to date -current) doesn't fully work with jumbograms (MTU
> | > > set to 6000), Intel adapters and nfs (both UDP and TCP).
> | > >
> | > > I checked that the same thing works with 4.9.
> | > >
> | > > I then left one computer at 4.9 and upgraded the other to 5.0. When I
> | > > mounted a partition from the 5.0 machine, I found that copying works
> | > > reliably only from 5.0 to 4.9. The other way around I see messages 'em0:
> | > > discard oversize frame (ether type 800 flags 3 len 67582 > max 6014)' on
> | > > 5.0 and the copying stalls. On the 4.9 machine I later see 'nfs server
> | > > 10.0.0.2:/usr: not responding'. The interface is stuck for some time; it
> | > > can be revived by changing the MTU back to 1500 and a down/up sequence.
> | >
> | > I've run many jumbogram tests of machines connected with a cross-over cable
> | > and em devices at each end.  If you've got a switch in the middle make sure it
> | > does the right thing.
> |
> | I also used a crossover cable exclusively. The same configuration worked
> | with 4.9. The problem appears only with NFS.
>
> You might want to try this patch:
>
> Index: if_em.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/em/if_em.c,v
> retrieving revision 1.32
> diff -c -r1.32 if_em.c
> *** if_em.c	15 Oct 2003 05:34:41 -0000	1.32
> --- if_em.c	30 Oct 2003 19:39:49 -0000
> ***************
> *** 2454,2460 ****
>                                  BUS_SPACE_MAXADDR,       /* highaddr */
>                                  NULL, NULL,              /* filter, filterarg */
>                                  MCLBYTES,                /* maxsize */
> !                                1,                       /* nsegments */
>                                  MCLBYTES,                /* maxsegsize */
>                                  BUS_DMA_ALLOCNOW,        /* flags */
>   			       NULL,			/* lockfunc */
> --- 2454,2460 ----
>                                  BUS_SPACE_MAXADDR,       /* highaddr */
>                                  NULL, NULL,              /* filter, filterarg */
>                                  MCLBYTES,                /* maxsize */
> !                                2,                       /* nsegments */
>                                  MCLBYTES,                /* maxsegsize */
>                                  BUS_DMA_ALLOCNOW,        /* flags */
>   			       NULL,			/* lockfunc */
>
> There were a few bugs in the system before in that there was insufficient
> error checking in the bus_dma stuff.  The issue was that HW was writing more
> than was allocated due to (nsegments=1).  This isn't the right fix but
> might help point to the issue.
>
> I don't have access to the HW to test it out anymore.
>
> Doug A.

I'm afraid it doesn't help. The problem doesn't occur with FTP.

For the last tests I've got two -current machines from Oct 30th.  One
exports a filesystem (server) and the other mounts it R/W (client).

Copying /usr/src from server to client stalls (with 'em0: discard
oversize frame...' on the receiver), and copying from client to server
stalls too. Once that happens NFS stops working: cp is uninterruptible and
other access to the remote fs stalls as well. After some time the client
shows 'nfs server 10.0.0.1:/usr: not responding'. While NFS is not working
I can still ping the other machine, so the interface isn't completely stuck.
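
For reference, a rough sketch of where the 'max 6014' in that message
presumably comes from. I am assuming the limit is simply the MTU plus the
14-byte Ethernet header, with 4 more bytes only when the frame still carries
its CRC. The names below are made up for illustration and are not copied
from the kernel source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Standard Ethernet sizes; the SKETCH_ names are hypothetical. */
#define SKETCH_ETHER_HDR_LEN 14  /* dst MAC + src MAC + ether type */
#define SKETCH_ETHER_CRC_LEN 4   /* trailing frame check sequence */

/*
 * Assumed limit: MTU + Ethernet header, plus the CRC only if the
 * driver hands the frame up with the FCS still attached.
 */
static size_t
sketch_max_frame(size_t mtu, bool has_fcs)
{
	return mtu + SKETCH_ETHER_HDR_LEN +
	    (has_fcs ? SKETCH_ETHER_CRC_LEN : 0);
}

/* True when a received packet length exceeds the assumed limit. */
static bool
sketch_is_oversize(size_t pkt_len, size_t mtu, bool has_fcs)
{
	return pkt_len > sketch_max_frame(mtu, has_fcs);
}
```

With an MTU of 6000 and no CRC this gives 6014, which matches the 'max' in
the message. The reported len of 67582 is more than ten times that, so it
does not look like a frame that is just a few bytes too long.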

Copying one large file works from server to client but not the other way
around.

-- 
Michal Mertl