Re: snp panic [Was: Re: panic with tcpdrop]

From: Javier <rako29_at_gmail.com> Date: Wed, 19 Dec 2007 10:01:42 -0300

Hello, sorry, the same panic happened again, this time with snp.c patched
with your modification. The steps leading to the new panic are the same.
How can I help?
Thanks
Javier

> On Sun, Nov 25, 2007 at 02:37:16PM -0300, Rako wrote:
>> It is not reproducible, but I have had 2 of these panics in 1 month.
>>
>> The steps for the last panic:
>>
>> watch -W /dev/ttyv0
>> make buildkernel ....
>> Ctrl-G
>> Panic!
>>
>> The explanation from Kostik may be correct, but I did not try to attach
>> again, I only pressed Ctrl-G; so either that is not the path the
>> explanation describes, or I misunderstand it. Anyway, I can test the
>> changes for 1 month to see whether any panic happens.
>> Thanks!
>> Javier
> C-G seems to be the exact scenario for the bug.
>>
>>> On Sat, Nov 24, 2007 at 09:19:42PM +0000, Robert Watson wrote:
>>>> On Sat, 24 Nov 2007, Rako wrote:
>>>>
>>>>> The patch solves the problem with tcpdrop, thanks!!
>>>>>
>>>>> Another panic occurred, but in a different area: it is in the snp.ko
>>>>> module (watch -W /dev/ttyv0), but I can't get a backtrace. This panic
>>>>> is similar to the one at
>>>>>
>>>>> http://lists.freebsd.org/pipermail/freebsd-current/2007-March/069990.html
>>>>>
>>>>> The problem may be at line 164 of /usr/src/sys/dev/snp/snp.c:
>>>>>
>>>>>     snp = ttytosnp(tp);
>>>>>
>>>>> where snp gets NULL,
>>>>>
>>>>> but I am not familiar with this ... Any idea what I can do to solve the error?
>>>> I'm having trouble reproducing this -- could you give me a detailed set
>>>> of instructions regarding the specific steps I should take to try and get
>>>> this panic, if it's reproducible for you?
>>>>
>>>> Thanks,
>>>>
>>>> Robert N M Watson
>>>> Computer Laboratory
>>>> University of Cambridge
>>>>
>>>>> Regards,
>>>>> Javier
>>>>>
>>>>>
>>>>> Fatal trap 12: page fault while in kernel mode
>>>>> fault virtual address   = 0x24
>>>>> fault code              = supervisor read, page not present
>>>>> instruction pointer     = 0x20:0xc3e4f230
>>>>> stack pointer           = 0x28:0xd66c3b34
>>>>> frame pointer           = 0x28:0xd66c3b88
>>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>>                      = DPL 0, pres 1, def32 1, gran 1
>>>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>>>> current process         = 2216 (make)
>>>>> trap number             = 12
>>>>> panic: page fault
>>>>> KDB: stack backtrace:
>>>>> db_trace_self_wrapper(c0a5f1ea,d66c39d4,c078878a,c0a5d5f4,c0b5bcc0,...) 
>>>>> at db_trace_self_wrapper+0x26
>>>>> kdb_backtrace(c0a5d5f4,c0b5bcc0,c0a1fb8c,d66c39e0,d66c39e0,...) at 
>>>>> kdb_backtrace+0x29
>>>>> panic(c0a1fb8c,c0a7c54d,c3e44770,1,1,...) at panic+0xaa
>>>>> trap_fatal(c3e942b8,0,1,0,c39f5630,...) at trap_fatal+0x303
>>>>> trap_pfault(0,c39f5630,c39f5630,0,c,...) at trap_pfault+0x250
>>>>> trap(d66c3af4) at trap+0x382
>>>>> calltrap() at calltrap+0x6
>>>>> --- trap 0xc, eip = 0xc3e4f230, esp = 0xd66c3b34, ebp = 0xd66c3b88 ---
>>>>> snplwrite(c33bf800,d66c3c60,0,d66c3bbc,c0754bec,...) at snplwrite+0x80
>>>>> ttywrite(c3389600,d66c3c60,0,c39cf5e8,c39f5630,...) at ttywrite+0x39
>>>>> giant_write(c3389600,d66c3c60,0,0,c0abb080,...) at giant_write+0x6c
>>>>> devfs_write_f(c39cf5e8,d66c3c60,c3de4800,0,c39f5630,...) at 
>>>>> devfs_write_f+0x75
>>>>> dofilewrite(d66c3c60,ffffffff,ffffffff,0,c39cf5e8,...) at 
>>>>> dofilewrite+0x97
>>>>> kern_writev(c39f5630,1,d66c3c60,2813c076,0,...) at kern_writev+0x58
>>>>> write(c39f5630,d66c3cfc,c,110,c337e630,...) at write+0x4f
>>>>> syscall(d66c3d38) at syscall+0x335
>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x20
>>>>> --- syscall (4, FreeBSD ELF32, write), eip = 0x8083603, esp = 
>>>>> 0xbfbfd4ec, ebp = 0xbfbfd528 ---
>>>>> Uptime: 19m14s
>>>>> Physical memory: 495 MB
>>>>> Dumping 86 MB: 71 55 39 23 7
>>> I believe I have a plausible explanation for the panic. Please look
>>> at the SNPSTTY command in snpioctl(). First, assume that s >= 0 and that
>>> the snoop device already has an attached tty. Then snp_tty will be
>>> overwritten without detaching the old tty from the snooper. In this case,
>>> ttytosnp() would not find the snp for the old tty and would return NULL.
>>> This would lead to the trace above. This is an old kernel bug.
>>>
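The failure mode described above can be modelled in user space. The following is only a minimal sketch, not the kernel code: the structures are reduced to the two fields the explanation mentions, `snp_table`, `snp_attach()` and `snoop_write()` are hypothetical names invented for illustration, and it assumes that ttytosnp() is a reverse lookup that matches on snp_tty, as the explanation implies. The `guard` parameter mimics an "already attached, return EBUSY" check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TS_SNOOP 0x1            /* simplified stand-in for the tty state flag */
#define MAX_SNP  4              /* hypothetical size of the snooper registry */

struct tty   { int t_state; };
struct snoop { struct tty *snp_tty; };

static struct snoop snp_table[MAX_SNP];   /* hypothetical registry of snoopers */

/* Reverse lookup: find the snooper whose snp_tty points at tp, else NULL. */
static struct snoop *ttytosnp(struct tty *tp)
{
    for (size_t i = 0; i < MAX_SNP; i++)
        if (snp_table[i].snp_tty == tp)
            return &snp_table[i];
    return NULL;
}

/*
 * Attach snp to tp. With guard != 0, refuse when snp already has a tty.
 * Without the guard, a second attach overwrites snp_tty while the old tty
 * keeps TS_SNOOP set, which is the stale state described in the explanation.
 */
static int snp_attach(struct snoop *snp, struct tty *tp, int guard)
{
    assert(snp != NULL && tp != NULL);
    if (guard && snp->snp_tty != NULL)
        return EBUSY;
    tp->t_state |= TS_SNOOP;
    snp->snp_tty = tp;
    return 0;
}

/*
 * Model of the snooped write path.
 * Returns 0 if tp is not snooped, 1 if its snooper was found, and -1 where
 * the real code would go on to dereference a NULL snoop pointer.
 */
static int snoop_write(struct tty *tp)
{
    if (!(tp->t_state & TS_SNOOP))
        return 0;
    struct snoop *snp = ttytosnp(tp);
    if (snp == NULL)
        return -1;
    return 1;
}
```

Attaching the same snooper first to one tty and then to another with `guard` set to 0 leaves the first tty flagged as snooped with no snooper pointing back at it, so `snoop_write()` on it reports -1. With `guard` set to 1 the second attach is refused and the first tty stays consistent.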
>>> Now, I should note that watch(8) does not attach to a new tty without
>>> detaching from the previous one. But since the destroy_dev_sched()
>>> conversion was done for snp(4), the actual detach is asynchronous. Since
>>> watch(8) opens the numbered snpX clone device instead of the master
>>> /dev/snp, it could reopen the same device. The condition is racy, and thus
>>> not easily reproducible.
>>>
>>> The patch below might help with the kernel panic.
>>>
>>> diff --git a/sys/dev/snp/snp.c b/sys/dev/snp/snp.c
>>> index a84e90c..b8f3d63 100644
>>> --- a/sys/dev/snp/snp.c
>>> +++ b/sys/dev/snp/snp.c
>>> @@ -491,7 +491,7 @@ snpioctl(struct cdev *dev, u_long cmd, caddr_t data, int flags,
>>>     struct thread *td)
>>> {
>>> 	struct snoop *snp;
>>> -	struct tty *tp, *tpo;
>>> +	struct tty *tp;
>>> 	struct cdev *tdev;
>>> 	struct file *fp;
>>> 	int s;
>>> @@ -502,6 +502,9 @@ snpioctl(struct cdev *dev, u_long cmd, caddr_t data, int flags,
>>> 		s = *(int *)data;
>>> 		if (s < 0)
>>> 			return (snp_down(snp));
>>> +		if (snp->snp_tty != NULL)
>>> +			return (EBUSY);
>>> +
>>> 		if (fget(td, s, &fp) != 0)
>>> 			return (EINVAL);
>>> 		if (fp->f_type != DTYPE_VNODE ||
>>> @@ -520,13 +523,6 @@ snpioctl(struct cdev *dev, u_long cmd, caddr_t data, int flags,
>>> 			return (EBUSY);
>>>
>>> 		s = spltty();
>>> -
>>> -		if (snp->snp_target == NULL) {
>>> -			tpo = snp->snp_tty;
>>> -			if (tpo)
>>> -				tpo->t_state &= ~TS_SNOOP;
>>> -		}
>>> -
>>> 		tp->t_state |= TS_SNOOP;
>>> 		snp->snp_olddisc = tp->t_line;
>>> 		tp->t_line = snooplinedisc;