Re: latest -CURRENT kernel panic

From: Vincent Poy <vincepoy_at_gmail.com>
Date: Tue, 9 Nov 2004 16:30:34 -0800
On Tue, 9 Nov 2004 13:39:47 +0000 (GMT), Robert Watson
<rwatson_at_freebsd.org> wrote:
> 
> On Mon, 8 Nov 2004, Vincent Poy wrote:
> 
> > I updated from a March 6, 2004 -CURRENT which had been running without
> > issues to October 19, 2004 -CURRENT.  All -CURRENT since October 19,
> > 2004 to the latest November 5, 2004 -CURRENT have been panicing with the
> > following:
> 
> Looks like it's likely a bug in the if_xl device driver.  Looks like
> Garance has already pointed you at the debugging guide in the handbook, in
> particular using a serial console via a second box to avoid having to
> type.  In addition to that useful stuff, it would be useful to have you
> try the following:

A second box would work except this is running on a notebook sitting
on the floor due to space issues.  The kernel crash dump method seems
to be what I'm looking for except my swap device is 256MB while I have
2048MB of RAM.  What I did yesterday was unplugged the notebook and
took it next to another computer so I can type the output and
hopefully the notebook wouldn't reboot or shutoff before I finish
typing.  Guess the other alternative is to just take a picture of the
screen's output with a digital camera and then type it out. =)  Is it
possible to set up the serial console but only plug the serial cable
in after a panic?
 
> - if debug.mpsafenet=1, please try temporarily setting debug.mpsafenet=0
>  in loader.conf and rebooting to see if the problem goes away.  This
>  might suggest a locking problem in the xl driver, for example.

You're right that it's on  debug.mpsafenet=1 since from reading the
-current list, it
seems that the dc or was it de driver that required the
debug.mpsafenet=0 and I didn't hear anyone complain about panics with
the xl driver.  I've added debug.mpsafenet=0 to /boot/loader.conf and
will try it on the next reboot.  What I noticed was the problem is not
exactly identical each time.  The one before the one posted below IIRC
said something about cannot lock free mbuf.

> - Could you use gdb or addr2line to convert xl_intr+0x262 below into a
>  source line number?

How exactly do I do the above since I noticed ddb doesn't have a
addr2line command in it?

> - Could you run "show pcpu", "show locks", and "ps" and provide that
>  output (serial console highly recommended).

Will do that on the next panic if it occurs after the debug.mpsafenet=1
setting.
 
> - Could you tell me what sort of load the system is experiencing -- in
>  particular, does it have multiple network interfaces, is it performing
>  web serving, packet forwarding, bridging, XWindows, etc?

It has a xl0, fwe0, ath0, plip0, lo0, pflog0, pfsync0 interface
according to ifconfig -a but I'm only using the xl0 interface as the
ath0 is not configured.  The xl0 is basically doing ipfw/dummynet for
outgoing traffic shaping of 0-80 byte TCP Ack packets as well as
acting as a outgoing router for the LAN of 2 windows machines.  It is
performing httpd except during the last panic, my apache wasn't
working as I am in the process of going from 1.2 to 2.0.  I do have
x.org as well as KDE installed but I'm only running it as needed which
I haven't ran for the last year until about 2 hours ago after
converting from XFree86 to X.Org and then to kde3.3.

Typical load looks like this:
USER     PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
root      27  0.5  0.0     0   12  ??  RL    2:22PM   1:13.28 [swi4: clock sio]
root       0  0.0  0.0     0    4  ??  DLs   2:22PM   0:00.02 [swapper]
root       1  0.0  0.0   720  344  ??  ILs   2:22PM   0:00.01 /sbin/init --
root       2  0.0  0.0     0   12  ??  DL    2:22PM   0:00.61 [g_event]
root       3  0.0  0.0     0   12  ??  DL    2:22PM   0:04.22 [g_up]
root       4  0.0  0.0     0   12  ??  DL    2:22PM   0:03.61 [g_down]
root       5  0.0  0.0     0   12  ??  DL    2:22PM   0:00.00 [thread taskq]
root       6  0.0  0.0     0   12  ??  DL    2:22PM   0:00.00 [kqueue taskq]
root       7  0.0  0.0     0   12  ??  IL    2:22PM   0:00.00 [acpi_task0]
root       8  0.0  0.0     0   12  ??  IL    2:22PM   0:00.00 [acpi_task1]
root       9  0.0  0.0     0   12  ??  IL    2:22PM   0:00.00 [acpi_task2]
root      10  0.0  0.0     0   12  ??  DL    2:22PM   0:00.00 [ktrace]
root      11  0.0  0.0     0   12  ??  RL    2:22PM   4:41.92 [idle: cpu0]
root      12  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq0: clk]
root      13  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq1: atkbd0]
root      14  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq3:]
root      15  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq4: sio0]
root      16  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq5: pcm0]
root      17  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq6: fdc0]
root      18  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq7: ppc0]
root      19  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq8: rtc]
root      20  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq9: acpi0]
root      21  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq10:]
root      22  0.0  0.0     0   12  ??  WL    2:22PM   0:01.57 [irq11: cbb0 cbb1
root      23  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq12: psm0]
root      24  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [irq13:]
root      25  0.0  0.0     0   12  ??  WL    2:22PM   0:01.17 [irq14: ata0]
root      26  0.0  0.0     0   12  ??  WL    2:22PM   0:00.09 [irq15: ata1]
root      28  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [swi3: vm]
root      29  0.0  0.0     0   12  ??  WL    2:22PM   0:00.87 [swi1: net]
root      30  0.0  0.0     0   12  ??  DL    2:22PM   0:00.54 [yarrow]
root      31  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [swi5:+]
root      32  0.0  0.0     0   12  ??  WL    2:22PM   0:00.21 [swi6:+]
root      33  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [swi6: task queue
root      34  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [swi6: acpitaskq]
root      35  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [swi2: cambio]
root      36  0.0  0.0     0   12  ??  DL    2:22PM   0:07.74 [acpi_thermal]
root      37  0.0  0.0     0   12  ??  DL    2:22PM   0:00.00 [usb0]
root      38  0.0  0.0     0   12  ??  DL    2:22PM   0:00.00 [usbtask]
root      39  0.0  0.0     0   12  ??  DL    2:22PM   0:00.00 [usb1]
root      40  0.0  0.0     0   12  ??  DL    2:22PM   0:00.00 [cbb0]
root      41  0.0  0.0     0   12  ??  DL    2:22PM   0:00.00 [cbb1]
root      42  0.0  0.0     0   12  ??  DL    2:22PM   0:00.03 [fdc0]
root      43  0.0  0.0     0   12  ??  WL    2:22PM   0:00.00 [swi0: sio]
root      44  0.0  0.0     0   12  ??  DL    2:22PM   0:00.01 [pagedaemon]
root      45  0.0  0.0     0   12  ??  DL    2:22PM   0:00.00 [vmdaemon]
root      46  0.0  0.0     0   12  ??  RL    2:22PM   0:06.67 [pagezero]
root      47  0.0  0.0     0   12  ??  DL    2:22PM   0:17.49 [bufdaemon]
root      48  0.0  0.0     0   12  ??  DL    2:22PM   0:04.29 [syncer]
root      49  0.0  0.0     0   12  ??  DL    2:22PM   0:00.04 [vnlru]
root      50  0.0  0.0     0   12  ??  DL    2:22PM   0:00.02 [hpt_wt]
root      51  0.0  0.0     0   12  ??  IL    2:22PM   0:00.00 [nfsiod 0]
root      52  0.0  0.0     0   12  ??  IL    2:22PM   0:00.00 [nfsiod 1]
root      53  0.0  0.0     0   12  ??  IL    2:22PM   0:00.00 [nfsiod 2]
root      54  0.0  0.0     0   12  ??  IL    2:22PM   0:00.00 [nfsiod 3]
root      55  0.0  0.0     0   12  ??  DL    2:22PM   0:00.60 [schedcpu]
root     145  0.0  0.0  1204  628  ??  Is    2:23PM   0:00.00 adjkerntz -i
root     215  0.0  0.0  1440  900  ??  Ss    2:23PM   0:00.23 /sbin/natd -n xl0
root     295  0.0  0.0   488  364  ??  Is    2:23PM   0:00.00 /sbin/devd
root     372  0.0  0.0  1328  876  ??  Ss    2:23PM   0:00.32 /usr/sbin/syslogd
bind     391  0.0  0.2  4360 3544  ??  Ss    2:23PM   0:00.43 /usr/sbin/named -
root     405  0.0  0.1  1408 1044  ??  Ss    2:23PM   0:00.01 /usr/sbin/rpcbind
root     528  0.0  0.0  1304  880  ??  Ss    2:23PM   0:00.45 diskcheckd: ad2 8
root     539  0.0  0.0  1284  928  ??  Is    2:23PM   0:00.00 /usr/sbin/lpd
root     572  0.0  0.0  1244  788  ??  Ss    2:23PM   0:00.01 /usr/sbin/usbd
root     592  0.0  0.1  3348 2568  ??  Is    2:23PM   0:00.16 /usr/sbin/sshd
root     597  0.0  0.1  3568 2824  ??  Ss    2:23PM   0:00.39 sendmail: accepti
smmsp    603  0.0  0.1  3444 2700  ??  Ss    2:23PM   0:00.12 sendmail: Queue r
root     624  0.0  0.1  1384 1088  ??  Ss    2:23PM   0:00.03 /usr/sbin/cron -s
root     829  0.0  0.1  6080 3120  ??  Is    2:24PM   0:00.05 sshd: vince [priv
vince    842  0.0  0.2  6080 3492  ??  S     2:24PM   0:00.63 sshd: vince_at_ttyp0
root     870  0.0  0.0  1304  880  ??  Ss    2:24PM   0:00.44 diskcheckd: ad2 8
uucp     884  0.0  0.1  2224 1732  ??  Is    2:24PM   0:00.00 /usr/local/sbin/f
uucp     887  0.0  0.1  2232 1896  ??  Is    2:24PM   0:00.01 /usr/local/sbin/h
root     955  0.0  0.3  6880 6188  ??  Ss    2:24PM   0:00.12 /usr/local/bin/pe
root     967  0.0  0.3  7100 6444  ??  Ss    2:24PM   0:00.11 /usr/local/bin/pe
root     986  0.0  0.0  1240  732  ??  Is    2:24PM   0:00.00 /usr/sbin/moused 
root    1015  0.0  0.1  1472 1156  ??  Is    2:24PM   0:00.02 /usr/sbin/inetd -
root    1055  0.0  0.2  6080 3128  ??  Is    2:26PM   0:00.05 sshd: vince [priv
vince   1057  0.0  0.2  6080 3480  ??  S     2:26PM   0:01.06 sshd: vince_at_ttyp1
nobody  7012  0.0  0.1  3104 1832  ??  Is    3:16PM   0:00.02 identd -w -t120
vince    843  0.0  0.1  2628 2212  p0  Ss    2:24PM   0:00.11 -tcsh (tcsh)
vince    961  0.0  0.5 11484 9380  p0  T     2:24PM   0:01.83 pine
root    1029  0.0  0.0  1296  944  v0  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1030  0.0  0.0  1296  944  v1  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1031  0.0  0.0  1296  944  v2  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1032  0.0  0.0  1296  944  v3  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1033  0.0  0.0  1296  944  v4  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1034  0.0  0.0  1296  944  v5  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1035  0.0  0.0  1296  944  v6  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1036  0.0  0.0  1296  944  v7  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1037  0.0  0.0  1296  944  v8  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1038  0.0  0.0  1296  944  v9  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1039  0.0  0.0  1296  944  va  Is+   2:24PM   0:00.00 /usr/libexec/gett
root    1040  0.0  0.0  1296  944  vb  Is+   2:24PM   0:00.00 /usr/libexec/gett

> - Are you running with any tweaked network settings, such as
>  net.isr.enable?

My kernel config does have maxusers=512 and 65536 nmbclusters. 
IPFIREWALL, IPDIVERT, DUMMYNET, BRIDGE, HZ=1000 are also in the kernel
config.  There is also this as a temporary workaround for the kmem_map
too small panic issue ever since it started about 9 or so months ago. 
Don't know if this issue was fixed.

options         VM_KMEM_SIZE_MAX=(384*1048576)
options         VM_KMEM_SIZE_SCALE=2

For sysctl, these are the changes:

net.inet.icmp.bmcastecho 1 -> 0
net.inet.tcp.log_in_vain=0 -> 1 
net.inet.udp.log_in_vain=0 -> 1
kern.maxfiles=16424 -> 65536 
kern.maxfilesperproc=14781 -> 32768
net.inet.tcp.sendspace=32768 -> 65536
kern.ipc.somaxconn=128 -> 8192
net.inet.icmp.drop_redirect=0 -> 1
net.inet.icmp.log_redirect=0 -> 1
/sbin/sysctl -w kern.ipc.maxsockbuf=262144 -> 8388608

Thanks!

Cheers,
Vince

> > Slab at 0xc3d75fa8, freei 14 = 0.
> > panic: Duplicate free of item 0xc3d75e00 from zone 0xc143e9a0(Mbuf)
> > cpuid = 0
> > KDB: enter: panic
> > [thread 100019]
> > Stopped at    kdb_enter+0x2c: leave
> > db> trace
> > kdb_enter(c0852d05,100,c38b0980,e,c3d75000) at kdb_enter+0x2c
> > panic(c086ce77,c3d75e00,c143e9a0,c08515e0,c086ce5b) at panic+0x17f
> > uma_dbg_free(c143e9a0,0,c3d75e00) at uma_dbg_free+0x10e
> > uma_zfree_arg(c143e9a0,c3d75e00,0) at uma_zfree_arg+0x1f6
> > m_freem(c3d75e00,c3a0b21c,c3a0b2f4,c3a0cd80,c38b7400) at m_freem+0x3e
> > xl_intr(c3a09000,0,c38b3c00,0,c0612120) at xl_intr+0x262
> > ithread_loop(c38b7400,e9f95d48,c38b7400,c0612120,0) at ithread_loop+0x186
> > fork_exit(c0612120,c38b7400,e9f95d48) at fork_exit+0x7e
> > fork_trampoline() at fork_trampoline+0x8
> > --- trap 0x1, eip = 0, esp = 0xe9f95d7c, ebp = 0 ---
> > db>
> >
> > Any ideas what could be causing this?  Is there anyway to write
> > everything above to a file instead of manually typing everything?
> > Thanks.
> >
> > Cheers,
> > Vince
> > _______________________________________________
> > freebsd-current_at_freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-current
> > To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> >
> 
>
Received on Tue Nov 09 2004 - 23:30:36 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:21 UTC