Re: Latest -current complete lockup (tcp changes?)

From: Niclas Zeising <niclas.zeising_at_gmail.com>
Date: Sun, 25 Mar 2007 19:40:58 +0200
Giorgos Keramidas wrote:
> On 2007-03-24 15:47, Andrey Chernov <ache_at_freebsd.org> wrote:
>> Very recent -current cause complete lockup in case and after small
>> amount of network activity happens. No panic, no ddb console - nothing
>> just lockup. Previously working kernel is from Mar 22. I suspect
>> recent round of TCP changes.
> 
> I'm seeing kernel panics with a kernel updates to:
> 
>     changeset:   129918:f239d4a0fa1e
>     tag:         tip
>     user:        imp
>     date:        Fri Mar 23 23:47:59 2007 +0000
>     summary:     Default to booting off the SD card.  It is more useful, [...]
> 
> The panic happens after a small amount of network activity, inside the
> TCP SACK code:
> 
> ---------------------------------------------------------------------------
> Physical memory: 495 MB
> Dumping 54 MB: 39 23 7
> 

[SNIP stack trace]

> ---------------------------------------------------------------------------
> 
> Disabling SACK lets me use my laptop again:
> 
>     net.inet.tcp.sack.enable=0
> 
> without any panics.

I've got another panic with current sources, in tcp_output.c
The panic happens right after the system goes to multiuser, sometimes I 
have time to login, but it blows within a minute.  Looking at the stack 
trace (taken from /var/log/messages, I have KDB_TRACE in my kernel) it 
seems like it panics multiple times.
The panic is network-related, booting the system in single user mode 
without networking works like a charm, so I've done a back trace from 
the core file, which is attached.
I still have the core file around, so I can get more information.

Regards!
//Niclas

Here is a stack trace from the panic:
Sleeping on "-" with the following non-sleepable locks held:
exclusive sleep mutex tcp r = 0 (0xc07e90cc) locked _at_ 
/usr/src/sys/netinet/tcp_input.c:619
KDB: stack backtrace:
db_trace_self_wrapper(c071d5f1,e409fc9c,c0580e33,c07e6fec,e409fcb0,...) 
at db_trace_self_wrapper+0x27
kdb_backtrace(c07e6fec,e409fcb0,1,0,c3badbd0,...) at kdb_backtrace+0x2f
witness_warn(5,0,c071b75c,c070bd17,0,...) at witness_warn+0x1af
msleep_spin(c3c46200,c3c4621c,c070bd17,0,1,...) at msleep_spin+0x1eb
taskqueue_thread_loop(c3c4a214,e409fd38,c0717d8c,326,c3baf000,...) at 
taskqueue_thread_loop+0x8d
fork_exit(c057b52a,c3c4a214,e409fd38) at fork_exit+0xcc
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xe409fd70, ebp = 0 ---
kernel boot file is /boot/kernel/kernel
Sleeping thread (tid 100035, pid 30) owns a non-sleepable lock
sched_switch(c3badbd0,0,1,179,c079fdf8,...) at sched_switch+0x10a
mi_switch(1,0,c071e040,1cc,0,...) at mi_switch+0x2d0
sleepq_switch(c3c46200,0,c071e040,21e,e409fcd4,...) at sleepq_switch+0x10d
sleepq_wait(c3c46200,0,c071b75c,c070bd17,0,...) at sleepq_wait+0x66
msleep_spin(c3c46200,c3c4621c,c070bd17,0,1,...) at msleep_spin+0x20d
taskqueue_thread_loop(c3c4a214,e409fd38,c0717d8c,326,c3baf000,...) at 
taskqueue_thread_loop+0x8d
fork_exit(c057b52a,c3c4a214,e409fd38) at fork_exit+0xcc
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xe409fd70, ebp = 0 ---
panic: sleeping thread
KDB: stack backtrace:
db_trace_self_wrapper(c071d5f1,e25cfb90,c054e887,c071b028,c079cae0,...) 
at db_trace_self_wrapper+0x27
kdb_backtrace(c071b028,c079cae0,c071e729,e25cfb9c,100,...) at 
kdb_backtrace+0x2f
panic(c071e729,ffffffff,1e,ae,8,...) at panic+0xb8
propagate_priority(c3abe6c0,0,c071e6ca,287,c07a0380,...) at 
propagate_priority+0xe7
turnstile_wait(c07e90cc,c3badbd0,0,18b,c3badbd2,...) at turnstile_wait+0x499
_mtx_lock_sleep(c07e90cc,c3abe6c0,0,c072ad06,7f,...) at 
_mtx_lock_sleep+0x122
_mtx_lock_flags(c07e90cc,0,c072ad06,7f,e25cfc70,...) at _mtx_lock_flags+0xec
tcp_slowtimo(c054429f,c079d0e4,1,c0719efe,0,...) at tcp_slowtimo+0x3b
pfslowtimo(0,0,c071c0f5,ee,0,...) at pfslowtimo+0x52
softclock(0,c054429f,c079c8b0,1,c3aa66a0,...) at softclock+0x27b
ithread_execute_handlers(c3abdb40,c3abb400,c0717f9c,30e,c3abe6c0,...) at 
ithread_execute_handlers+0x14c
ithread_loop(c3aa66a0,e25cfd38,c0717d8c,326,c3abdb40,...) at 
ithread_loop+0x77
fork_exit(c0535d55,c3aa66a0,e25cfd38) at fork_exit+0xcc
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xe25cfd70, ebp = 0 ---
KDB: enter: panic
Physical memory: 1015 MB
Dumping 97 MB: 82 66 50 34 18 2


# kgdb /boot/kernel/kernel /var/crash/vmcore.2
kgdb: kvm_nlist(_stopped_cpus): 
kgdb: kvm_nlist(_stoppcbs): 
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
Sleeping thread (tid 100035, pid 30) owns a non-sleepable lock
sched_switch(c3badbd0,0,1,179,c079fdf8,...) at sched_switch+0x10a
mi_switch(1,0,c071e040,1cc,0,...) at mi_switch+0x2d0
sleepq_switch(c3c46200,0,c071e040,21e,e409fcd4,...) at sleepq_switch+0x10d
sleepq_wait(c3c46200,0,c071b75c,c070bd17,0,...) at sleepq_wait+0x66
msleep_spin(c3c46200,c3c4621c,c070bd17,0,1,...) at msleep_spin+0x20d
taskqueue_thread_loop(c3c4a214,e409fd38,c0717d8c,326,c3baf000,...) at taskqueue_thread_loop+0x8d
fork_exit(c057b52a,c3c4a214,e409fd38) at fork_exit+0xcc
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xe409fd70, ebp = 0 ---
panic: sleeping thread
KDB: stack backtrace:
db_trace_self_wrapper(c071d5f1,e25cfb90,c054e887,c071b028,c079cae0,...) at db_trace_self_wrapper+0x27
kdb_backtrace(c071b028,c079cae0,c071e729,e25cfb9c,100,...) at kdb_backtrace+0x2f
panic(c071e729,ffffffff,1e,ae,8,...) at panic+0xb8
propagate_priority(c3abe6c0,0,c071e6ca,287,c07a0380,...) at propagate_priority+0xe7
turnstile_wait(c07e90cc,c3badbd0,0,18b,c3badbd2,...) at turnstile_wait+0x499
_mtx_lock_sleep(c07e90cc,c3abe6c0,0,c072ad06,7f,...) at _mtx_lock_sleep+0x122
_mtx_lock_flags(c07e90cc,0,c072ad06,7f,e25cfc70,...) at _mtx_lock_flags+0xec
tcp_slowtimo(c054429f,c079d0e4,1,c0719efe,0,...) at tcp_slowtimo+0x3b
pfslowtimo(0,0,c071c0f5,ee,0,...) at pfslowtimo+0x52
softclock(0,c054429f,c079c8b0,1,c3aa66a0,...) at softclock+0x27b
ithread_execute_handlers(c3abdb40,c3abb400,c0717f9c,30e,c3abe6c0,...) at ithread_execute_handlers+0x14c
ithread_loop(c3aa66a0,e25cfd38,c0717d8c,326,c3abdb40,...) at ithread_loop+0x77
fork_exit(c0535d55,c3aa66a0,e25cfd38) at fork_exit+0xcc
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xe25cfd70, ebp = 0 ---
KDB: enter: panic
Physical memory: 1015 MB
Dumping 97 MB: 82 66 50 34 18 2

#0  doadump () at pcpu.h:172
172	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0xc044b92d in db_fncall (dummy1=0, dummy2=0, dummy3=1999, 
    dummy4=0xe25cf980 " WxÀ") at /usr/src/sys/ddb/db_command.c:486
#2  0xc044b6fc in db_command (last_cmdp=0xc0784e24, cmd_table=0x0)
    at /usr/src/sys/ddb/db_command.c:401
#3  0xc044b7c3 in db_command_loop () at /usr/src/sys/ddb/db_command.c:453
#4  0xc044d731 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:222
#5  0xc05728f0 in kdb_trap (type=0, code=0, tf=0x0)
    at /usr/src/sys/kern/subr_kdb.c:502
#6  0xc06db633 in trap (frame=0xe25cfb1c) at /usr/src/sys/i386/i386/trap.c:621
#7  0xc06cb0db in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#8  0xc0572668 in kdb_enter (msg=0x12 <Address 0x12 out of bounds>)
    at cpufunc.h:60
#9  0xc054e89d in panic (fmt=0xc071e729 "sleeping thread")
    at /usr/src/sys/kern/kern_shutdown.c:547
#10 0xc057c0b7 in propagate_priority (td=0xc3badbd0)
    at /usr/src/sys/kern/subr_turnstile.c:205
#11 0xc057d061 in turnstile_wait (lock=0xc07e90cc, owner=0xc3badbd0, queue=0)
    at /usr/src/sys/kern/subr_turnstile.c:682
#12 0xc05444e5 in _mtx_lock_sleep (m=0xc07e90cc, tid=3282822848, opts=0, 
    file=0x1 <Address 0x1 out of bounds>, line=1)
    at /usr/src/sys/kern/kern_mutex.c:411
#13 0xc0543f4c in _mtx_lock_flags (m=0xc07e90cc, opts=0, 
    file=0xc072ad06 "/usr/src/sys/netinet/tcp_timer.c", line=127)
---Type <return> to continue, or q <return> to quit--- 
    at /usr/src/sys/kern/kern_mutex.c:194
#14 0xc060884b in tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:127
#15 0xc05996c5 in pfslowtimo (arg=0x0) at /usr/src/sys/kern/uipc_domain.c:481
#16 0xc055e1e3 in softclock (dummy=0x0)
    at /usr/src/sys/kern/kern_timeout.c:273
#17 0xc0535c88 in ithread_execute_handlers (p=0xc3abdb40, ie=0xc3abb400)
    at /usr/src/sys/kern/kern_intr.c:682
#18 0xc0535dcc in ithread_loop (arg=0xc3aa66a0)
    at /usr/src/sys/kern/kern_intr.c:766
#19 0xc05349ac in fork_exit (callout=0xc0535d55 <ithread_loop>, arg=0x12, 
    frame=0x12) at /usr/src/sys/kern/kern_fork.c:814
#20 0xc06cb150 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
(kgdb) bt full
#0  doadump () at pcpu.h:172
No locals.
#1  0xc044b92d in db_fncall (dummy1=0, dummy2=0, dummy3=1999, 
    dummy4=0xe25cf980 " WxÀ") at /usr/src/sys/ddb/db_command.c:486
	fn_addr = -1068179472
	args = {-1066111392, -497223348, -1065855328, -497223332, 
  -1069239410, -1065855328, -1066111392, -497223300, -497223348, 0}
	nargs = 0
	retval = -1011678976
	t = 0
#2  0xc044b6fc in db_command (last_cmdp=0xc0784e24, cmd_table=0x0)
    at /usr/src/sys/ddb/db_command.c:401
	cmd = (struct command *) 0xc0746e60
	t = 0
	modif = " WxÀ\000\000\000\000\000É«Ã W\177À`\213yÀ\r\000\000\000\001\000\000\000¼ù\\â1þLÀ\000\001³Ã\aK\004 ä\213yÀàq~À WxÀx\000\000\000 WxÀ\034û\\âàù\\âóÚDÀ\204¯qÀ\bØDÀ\000\000\000\000\020\000\000\000\034û\\â WxÀØÎDÀ WxÀØNxÀx\000\000\000<ú\\â"
	addr = 0
	count = 1999
	have_addr = 0
	result = 0
#3  0xc044b7c3 in db_command_loop () at /usr/src/sys/ddb/db_command.c:453
No locals.
---Type <return> to continue, or q <return> to quit--- 
#4  0xc044d731 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:222
	jb = {{_jb = {-497223108, -497223136, -497223056, 3, -497222884, 
      -1069230384, -1066618544, -497222288, 18, 0, -497223056, -1068029843}}}
	prev_jb = (void *) 0x0
	bkpt = 0
#5  0xc05728f0 in kdb_trap (type=0, code=0, tf=0x0)
    at /usr/src/sys/kern/subr_kdb.c:502
	handled = 0
#6  0xc06db633 in trap (frame=0xe25cfb1c) at /usr/src/sys/i386/i386/trap.c:621
	td = (struct thread *) 0xc3abe6c0
	p = (struct proc *) 0xc3abdb40
	i = 0
	ucode = 0
	type = 3
	code = 0
	addr = -1064743456
	eva = 0
	ksi = {ksi_link = {tqe_next = 0xc071ae56, tqe_prev = 0xe25cfaf0}, 
  ksi_info = {si_signo = 0, si_errno = -1066291153, si_code = -1065760017, 
    si_pid = -497222888, si_uid = 3797744372, si_status = -1069240794, 
    si_addr = 0xe25cfb18, si_value = {sival_int = -1, 
      sival_ptr = 0xffffffff}, _reason = {_fault = {_trapno = -497222940}, 
      _timer = {_timerid = -497222940, _overrun = -1069230212}, _mesgq = {
        _mqd = -497222940}, _poll = {_band = -497222940}, __spare__ = {
---Type <return> to continue, or q <return> to quit---
        __spare1__ = -497222940, __spare2__ = {-1069230212, -497222876, 
          -497222892, -1066277079, 1, -497222892, -1066623191}}}}, 
  ksi_flags = -1012144448, ksi_sigq = 0x0}
#7  0xc06cb0db in calltrap () at /usr/src/sys/i386/i386/exception.s:139
No locals.
#8  0xc0572668 in kdb_enter (msg=0x12 <Address 0x12 out of bounds>)
    at cpufunc.h:60
No locals.
#9  0xc054e89d in panic (fmt=0xc071e729 "sleeping thread")
    at /usr/src/sys/kern/kern_shutdown.c:547
	td = (struct thread *) 0xc3abe6c0
	bootopt = 256
	newpanic = 1
	ap = 0xe25cfb9c "ÿÿÿÿ\036"
	buf = "sleeping thread", '\0' <repeats 240 times>
#10 0xc057c0b7 in propagate_priority (td=0xc3badbd0)
    at /usr/src/sys/kern/subr_turnstile.c:205
	tc = (struct turnstile_chain *) 0xc07e90cc
	ts = (struct turnstile *) 0xc3ab2d40
	pri = 52
#11 0xc057d061 in turnstile_wait (lock=0xc07e90cc, owner=0xc3badbd0, queue=0)
    at /usr/src/sys/kern/subr_turnstile.c:682
	tc = (struct turnstile_chain *) 0xc07a0380
	ts = (struct turnstile *) 0xc3ab2d40
---Type <return> to continue, or q <return> to quit---
	td = (struct thread *) 0xc3abe6c0
	td1 = (struct thread *) 0xc07e90cc
#12 0xc05444e5 in _mtx_lock_sleep (m=0xc07e90cc, tid=3282822848, opts=0, 
    file=0x1 <Address 0x1 out of bounds>, line=1)
    at /usr/src/sys/kern/kern_mutex.c:411
	v = 3238088704
#13 0xc0543f4c in _mtx_lock_flags (m=0xc07e90cc, opts=0, 
    file=0xc072ad06 "/usr/src/sys/netinet/tcp_timer.c", line=127)
    at /usr/src/sys/kern/kern_mutex.c:194
No locals.
#14 0xc060884b in tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:127
No locals.
#15 0xc05996c5 in pfslowtimo (arg=0x0) at /usr/src/sys/kern/uipc_domain.c:481
	dp = (struct domain *) 0xc076dde0
	pr = (struct protosw *) 0xc076d9c8
#16 0xc055e1e3 in softclock (dummy=0x0)
    at /usr/src/sys/kern/kern_timeout.c:273
	c_func = (void (*)(void *)) 0xc0599673 <pfslowtimo>
	c_arg = (void *) 0x0
	c_mtx = (struct mtx *) 0x0
	c_flags = 22
	c = (struct callout *) 0xc1015000
	bucket = (struct callout_tailq *) 0xd7b13170
	curticks = 180004
---Type <return> to continue, or q <return> to quit---
	steps = 0
	depth = 1
	mpcalls = 1
	mtxcalls = 0
	gcalls = 0
#17 0xc0535c88 in ithread_execute_handlers (p=0xc3abdb40, ie=0xc3abb400)
    at /usr/src/sys/kern/kern_intr.c:682
	ih = (struct intr_handler *) 0xc3ab9d00
	ihn = (struct intr_handler *) 0xc3c58400
#18 0xc0535dcc in ithread_loop (arg=0xc3aa66a0)
    at /usr/src/sys/kern/kern_intr.c:766
	ithd = (struct intr_thread *) 0xc3aa66a0
	ie = (struct intr_event *) 0xc3abb400
	td = (struct thread *) 0xc3abe6c0
	p = (struct proc *) 0xc3abdb40
	__func__ = "ithread_loop"
#19 0xc05349ac in fork_exit (callout=0xc0535d55 <ithread_loop>, arg=0x12, 
    frame=0x12) at /usr/src/sys/kern/kern_fork.c:814
	p = (struct proc *) 0xc3abdb40
	td = (struct thread *) 0x1
#20 0xc06cb150 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
No locals.
(kgdb) q
#
Received on Sun Mar 25 2007 - 16:50:15 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:07 UTC