Re: panic in tulip_rx_intr after recent changes

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Mon, 4 Jun 2007 10:01:02 +0900

On Sun, Jun 03, 2007 at 03:39:56PM +0200, Arne H Juul wrote:
 > (this mail didn't make it to the list from my private
 > address, so I'm resending it from work instead; my
 > apologies if it suddenly appears multiple times)
 > 
 > 
 > I'm getting a kernel panic during network startup with the
 > "de" driver.  Here are the messages from the crash dump:
 > 
 > <118>Mounting local file systems:
 > <118>.
 > <118>Setting hostname: bluebox.trondheim.corp.yahoo.com.
 > <118>net.inet6.ip6.auto_linklocal:
 > <118>1
 > <118> ->
 > <118>0
 > <118>
 > de0: unable to load rx map, error = 27
 > panic: tulip_rx_intr
 > cpuid = 0
 > KDB: enter: panic
 > Uptime: 13s
 > 
 > I think this must have been introduced during the last week
 > or so on -CURRENT; my old kernel works OK:
 > 
 > arnej_at_bluebox:~ $ uname -a
 > FreeBSD bluebox 7.0-CURRENT FreeBSD 7.0-CURRENT #13: Tue May 29 08:02:41 CEST 2007 root_at_bluebox:/usr/obj/home/src.cur/sys/GENERIC amd64
 > 
 > As you can see, this is on the amd64 platform.
 > 
 > It crashes here (if_de.c, lines 3557-3563):
 > 
 >                 error = bus_dmamap_load_mbuf(ri->ri_data_tag,
 >                     *nextout->di_map, ms, tulip_dma_map_rxbuf,
 >                     nextout->di_desc, BUS_DMA_NOWAIT);
 >                 if (error) {
 >                     device_printf(sc->tulip_dev,
 >                         "unable to load rx map, error = %d\n", error);
 >                     panic("tulip_rx_intr");         /* XXX */
 >                 }
 > 
 > Errno 27 is EFBIG, and indeed the mbuf is MCLBYTES (2048 bytes) long:
 > 
 > (kgdb) print ms[0].M_dat.MH.MH_pkthdr.len
 > $22 = 2048
 > 
 > while the tag's maximum segment size is lower:
 > 
 > (kgdb) print ri->ri_data_tag[0].maxsegsz
 > $21 = 2032
 > 
 > It looks like this is the triggering change:
 > 
 > RCS file: /usr/cvs/src/sys/amd64/amd64/busdma_machdep.c,v
 > ----------------------------
 > revision 1.81
 > date: 2007/05/29 06:30:25;  author: yongari;  state: Exp;  lines: +2 -0
 > Honor maxsegsz of less than a page size in a DMA tag. Previously it
 > used to return PAGE_SIZE without respect to restrictions of a DMA tag.
 > This affected all of the busdma load functions that use
 > _bus_dmamap_loader_buffer() as their back-end.
 > 
 > So the questions are:
 > 
 > Is the above change wrong?
 > Or is the "de" driver buggy?
 > Or should bus_dmamap_load_mbuf handle this somehow?
 > And does it cause problems in other places too?
 > 

I'm not familiar with de(4), but it seems to need a big cleanup.
All busdma load functions can fail, so it is the driver's job to
recover from a busdma load failure. I think explicitly invoking
panic(9) is a really bad idea.
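For illustration only, here is a user-space sketch of what "recover
instead of panic" could look like. Every name in it (struct rx_ring,
rx_refill(), fake_dmamap_load() and so on) is hypothetical; it is not
de(4) code and not the busdma API. It assumes the common pattern of
loading a fresh buffer first and, if that fails, leaving the old buffer
in the ring and dropping the received packet.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel objects; not the real de(4) types. */
struct rx_buf {
	size_t len;		/* bytes the DMA map is asked to cover */
};

struct rx_slot {
	struct rx_buf *buf;	/* buffer currently owned by the descriptor */
};

struct rx_ring {
	size_t maxsegsz;	/* like the DMA tag's maxsegsz */
	unsigned long dropped;	/* packets dropped because refill failed */
};

/* Hypothetical loader: fails with EFBIG when one segment is not enough. */
static int
fake_dmamap_load(const struct rx_ring *ring, const struct rx_buf *b)
{
	return (b->len > ring->maxsegsz ? EFBIG : 0);
}

/*
 * Refill one Rx slot with a fresh buffer.  On failure the new buffer is
 * freed, the old buffer stays in the slot (so the received packet is
 * dropped and the descriptor is reused), and the error is returned to
 * the caller instead of calling panic().
 */
static int
rx_refill(struct rx_ring *ring, struct rx_slot *slot, size_t newlen)
{
	struct rx_buf *nb;
	int error;

	assert(ring != NULL && slot != NULL);
	nb = malloc(sizeof(*nb));
	if (nb == NULL) {
		ring->dropped++;
		return (ENOBUFS);
	}
	nb->len = newlen;
	error = fake_dmamap_load(ring, nb);
	if (error != 0) {
		free(nb);
		ring->dropped++;	/* keep the old buffer, drop the packet */
		return (error);
	}
	/*
	 * Success: a real driver would hand the old buffer (the received
	 * packet) up the stack; this sketch just frees it.
	 */
	free(slot->buf);
	slot->buf = nb;
	return (0);
}
```

Dropping and counting one packet (a real driver would presumably bump
an interface error counter) keeps every descriptor backed by a buffer,
so the chip never sees an empty slot.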

de(4) sets the maximum size of a DMA segment to TULIP_DATA_PER_DESC
in tulip_busdma_allocring(). I don't know why the author limited the
segment size to TULIP_DATA_PER_DESC, but I guess it comes from a limit
of the hardware's DMA engine (e.g. the hardware can DMA at most
TULIP_DATA_PER_DESC bytes per segment in scatter/gather operations).
In the Rx path it allocates an mbuf with m_getcl(9), so the length of
the mbuf is MCLBYTES, which is greater than the segment size supported
by the hardware. Before revision 1.81 of busdma_machdep.c, amd64 busdma
ignored a maxsegsz smaller than PAGE_SIZE, so this mismatch went
unnoticed; now the load fails with EFBIG.
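The arithmetic behind the EFBIG can be sketched with a simplified
model. Assumptions (not taken from the busdma source): the cluster is
physically contiguous and does not cross a page boundary, the Rx data
tag allows a single segment (nsegments = 1), and before revision 1.81
the loader effectively used PAGE_SIZE as the segment limit.
segs_needed() and sketch_load() are hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* amd64 page size */
#define SKETCH_MCLBYTES  2048u	/* default mbuf cluster size */

/*
 * Simplified model, not the real busdma code: number of segments needed
 * for a contiguous buffer that stays within one page.  Without
 * honor_maxsegsz (pre-1.81 behaviour) the segment limit is PAGE_SIZE;
 * with it, a smaller maxsegsz from the tag is respected.
 */
static size_t
segs_needed(size_t len, size_t maxsegsz, bool honor_maxsegsz)
{
	size_t segsz = SKETCH_PAGE_SIZE;

	assert(maxsegsz > 0);
	if (honor_maxsegsz && maxsegsz < segsz)
		segsz = maxsegsz;
	return ((len + segsz - 1) / segsz);	/* ceil(len / segsz) */
}

/* The load fails with EFBIG when the tag's segment count is exceeded. */
static int
sketch_load(size_t len, size_t maxsegsz, size_t nsegments, bool honor)
{
	return (segs_needed(len, maxsegsz, honor) > nsegments ? EFBIG : 0);
}
```

With maxsegsz = 2032, the old behaviour fits the 2048-byte cluster into
one segment, while the new one needs ceil(2048 / 2032) = 2 segments,
one more than the assumed limit of the tag.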

I see two possible ways to fix de(4):

1. Nuke TULIP_DATA_PER_DESC and use MCLBYTES instead. Of course, this
   assumes the hardware can handle DMA segments of up to MCLBYTES.
2. Set the mbuf length to TULIP_DATA_PER_DESC in the Rx path after
   allocating an mbuf with m_getcl(9). See the attached patch (I don't
   have de(4) hardware, so it's just guesswork, but it should show the
   idea).
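Option 2 boils down to shrinking the length the driver hands to
bus_dmamap_load_mbuf(). A minimal user-space sketch, using a
hypothetical struct sketch_mbuf in place of a real mbuf and assuming
TULIP_DATA_PER_DESC is 2032 (the maxsegsz value printed in kgdb):

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MCLBYTES            2048	/* default cluster size */
#define SKETCH_TULIP_DATA_PER_DESC 2032	/* assumed: maxsegsz seen in kgdb */

/* Hypothetical stand-in for the few mbuf fields this option touches. */
struct sketch_mbuf {
	int m_len;		/* like m->m_len */
	int pkthdr_len;		/* like m->m_pkthdr.len */
	int ext_size;		/* cluster size, like m->m_ext.ext_size */
};

/*
 * Models what the Rx path loads today (hypothetical stand-in for
 * m_getcl(9) plus the length setup): a cluster whose length is MCLBYTES.
 */
static void
sketch_alloc_rx_mbuf(struct sketch_mbuf *m)
{
	m->ext_size = SKETCH_MCLBYTES;
	m->m_len = m->pkthdr_len = SKETCH_MCLBYTES;
}

/*
 * Option 2: shrink the advertised length to what one descriptor can
 * DMA, so the load needs a single segment.  The tail of the cluster
 * simply goes unused.
 */
static void
sketch_rx_trim(struct sketch_mbuf *m)
{
	assert(m->ext_size >= SKETCH_TULIP_DATA_PER_DESC);
	if (m->m_len > SKETCH_TULIP_DATA_PER_DESC)
		m->m_len = m->pkthdr_len = SKETCH_TULIP_DATA_PER_DESC;
}
```

The last 16 bytes of each cluster are wasted. A standard Ethernet frame
is at most 1518 bytes (1522 with an 802.1Q tag), so a 2032-byte buffer
should still hold a full frame.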

However, either way the driver still lacks code to recover from a
busdma load failure. :-(

-- 
Regards,
Pyun YongHyeon