On Tuesday 19 October 2004 05:03 pm, John Baldwin wrote: > On Sunday 17 October 2004 03:17 pm, Robert Watson wrote: > > Ran into an interesting problem relating to WITNESS and a > > multiply-destroyed mutex today, but I didn't get the real panic message > > because of what looks like one or two other bugs. Here's the high-level: > > > > sodealloc() tries to destroy the mutex on a socket more than once, > > causing WITNESS to panic. The soft clock ithread preempts the serial > > output in panic() when a critical section is released, resulting in a > > context switch to that ithread. That ithread then attempts to lock a > > mutex, and is horrified to find out that Giant is held by the thread it > > preempted, but that that thread isn't blocked: > > > > tiger-2# ppanic: process 15955(tcpconnect):2 holds Giant but isn't > > blocked on a lock > > > > % rwatson -- note that double p is from the initiating panic > > > > cpuid = 0 > > KDB: enter: panic > > [thread 100071] > > Stopped at kdb_enter+0x2b: nop > > db> trace > > kdb_enter(c07fc101) at kdb_enter+0x2b > > panic(c07ff1ec,3e53,c5f90588,2,c080c957) at panic+0x127 > > propagate_priority(c547b180,c08c47b8,c5f91780,c08bdbc0,0) at > > propagate_priority+ > > 0x139 > > turnstile_wait(0,c08bdbc0,c5f91780,c08bdbc0,2,c07fb4a5,217) at > > turnstile_wait+0x > > 2de > > _mtx_lock_sleep(c08bdbc0,c547b180,0,c07fd124,f7) at _mtx_lock_sleep+0x15f > > _mtx_lock_flags(c08bdbc0,0,c07fd124,f7,0) at _mtx_lock_flags+0x85 > > softclock(0) at softclock+0x16a > > ithread_loop(c5436e80,e942fd48,c5436e80,c05f7c80,0) at ithread_loop+0x124 > > fork_exit(c05f7c80,c5436e80,e942fd48) at fork_exit+0xa4 > > fork_trampoline() at fork_trampoline+0x8 > > --- trap 0x1, eip = 0, esp = 0xe942fd7c, ebp = 0 --- > > db> show locks > > db> show pcpu > > cpuid = 0 > > curthread = 0xc547b180: pid 87 "swi4: clock sio" > > curpcb = 0xe942fda0 > > fpcurthread = none > > idlethread = 0xc53e4a80: pid 14 "idle: cpu0" > > APIC ID = 0 > > currentldt = 0x28 > > spin locks held: > > db> ps > > pid proc uarea uid ppid pgrp flag stat wmesg wchan > > cmd 15955 c5f90400 ebbb1000 0 525 15955 0004002 [Can run] tcpconnect > > ... > > 87 c548b400 e946e000 0 0 0 000020c [LOCK Giant c5432d00] > > swi4: clock sio > > db> trace 15955 > > sched_switch(c5f91780,c5451d80,6) at sched_switch+0x16f > > mi_switch(6,c5451d80,c5451ed0,c5451d80,ebba492c) at mi_switch+0x264 > > maybe_preempt(c5451d80) at maybe_preempt+0x156 > > sched_add(c5451d80,4,c53e2d80,c5451d80,c5450800) at sched_add+0x153 > > setrunqueue(c5451d80,4) at setrunqueue+0xab > > ithread_schedule(c53e2d80,1f,c5f91780,c5f91780,2580) at > > ithread_schedule+0xb3 > > intr_execute_handlers(c53dc0d8,ebba49a8,1f,ebba49f0,c0780ad3) at > > intr_execute_ha > > ndlers+0xf5 > > lapic_handle_intr(4f) at lapic_handle_intr+0x2e > > Xapic_isr2() at Xapic_isr2+0x33 > > --- interrupt, eip = 0xc0619446, esp = 0xebba49ec, ebp = 0xebba49f0 --- > > critical_exit(3f8,ebba4a30,c076f731,c08fa000,0) at critical_exit+0x92 > > _mtx_unlock_spin_flags(c08fa000,0,c08174fc,be4) at > > _mtx_unlock_spin_flags+0x8d > > siocnputc(c08987c0,70) at siocnputc+0xb9 > > cnputc(70) at cnputc+0x4d > > putchar(70,ebba4b44) at putchar+0x52 > > kvprintf(c07fc2ed,c06219b8,ebba4b44,a,ebba4b64) at kvprintf+0x77 > > printf(c07fc2ed,c08bf060,c08bf060,100,c07ff91e,ebba4b98) at printf+0x43 > > panic(c07ff91e,c07b69b1,c0813aa8,c07ff6af,c5a23ebc) at panic+0xcb > > witness_destroy(c5a23ebc,c5a23dec,ebba4bd0,c063f15f,c5a23ebc) at > > witness_destroy > > +0x40 > > mtx_destroy(c5a23ebc,c5e94e00,c5a23ebc,ebba4be8,c063f4fd) at > > mtx_destroy+0x7a > > sodealloc(c5a23dec,c5a23eb0,c5832bb0,c5a23dec,ebba4c00) at sodealloc+0xf3 > > sofree(c5a23dec) at sofree+0x281 > > soclose(c5a23dec,c5832bb0,0,ebba4c2c,c05efad0) at soclose+0x24d > > soo_close(c5832bb0,c5f91780) at soo_close+0x4b > > fdrop_locked(c5832bb0,c5f91780,c53ccf40,0,c07f90b1) at fdrop_locked+0x84 > > fdrop(c5832bb0,c5f91780,ebba4c78,c0628c5c,c07f90ba) at fdrop+0x24 > > closef(c5832bb0,c5f91780,c5ebc628,0,c07f90b1) at closef+0x1db > > close(c5f91780,ebba4d14,1,14b,292) at close+0xe6 > > syscall(2f,bfbf002f,bfbf002f,3,bfbfedf7) at syscall+0x213 > > Xint0x80_syscall() at Xint0x80_syscall+0x1f > > --- syscall (6, FreeBSD ELF32, close), eip = 0x280c154b, esp = > > 0xbfbfec0c, ebp = > > 0xbfbfec48 --- > > db> show locks 15955 > > exclusive sleep mutex Giant r = 0 (0xc08bdbc0) locked _at_ > > kern/kern_descrip.c:967 > > > > So on face value, it looks like propagate_priority() is unhappy about the > > tcpconnect thread (that was preempted) still holding Giant. However, > > given that preemption is supposed to preempt but leave a thread runnable, > > this would seem not to be a panic-worthy condition. > > Yeah, pp doesn't like a thread in TD_CAN_RUN(). That state can only happen > if you have already panic'd though. I can fix it to allow TD_CAN_RUN if > the kernel has already panic'd though. Actually, this won't work. Here's the problem: the reason you got to pp() is that you blocked on Giant, right. So your ithread is going to block after pp() returns. Since the thread at the end of the chain is TD_CAN_RUN and due to the weird TDF_INPANIC stuff we have in choosethread(), the thread holding the lock that the ithread needs is _never_ going to run, so your ithread will never resume. The end result is that if you made pp() just return rather than panic'ing in this case, you would get a single 'p' character on your output, and then the machine would deadlock. At least this way you get into ddb. :-P -- John Baldwin <jhb_at_FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/Received on Tue Oct 19 2004 - 19:18:42 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:18 UTC